GraphQL JSON output to pandas dataframe - python

I tried a few ways to convert the JSON output from a GraphQL query to a pandas dataframe, but I was not able to get it right. What's the best way to convert it into a pandas dataframe?
{'data': {'posts': {'edges': [{'node': {'id': '303843',
'name': 'hipCV',
'tagline': 'Create an impressive resume in minutes',
'votesCount': 71}},
{'node': {'id': '303751',
'name': 'Find Your First Frontend Job',
'tagline': "Find your dream job, even if you've been rejected many times",
'votesCount': 51}},
{'node': {'id': '303665',
'name': 'Epsilon3',
'tagline': 'The OS for spacecraft and complex operations',
'votesCount': 290}}]}}}

Try:
import pandas as pd

df = pd.json_normalize(data['data']['posts']['edges'])  # data is your parsed JSON response
# If needed, strip the "node." prefix from the column names:
df.columns = df.columns.str.split('.').str[1]
output of df:
  node.id                     node.name                                       node.tagline  node.votesCount
0  303843                         hipCV            Create an impressive resume in minutes               71
1  303751  Find Your First Frontend Job  Find your dream job, even if you've been rejec...             51
2  303665                      Epsilon3       The OS for spacecraft and complex operations             290
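
Putting it together, a minimal runnable sketch (the response is trimmed to one node for brevity, and response_json is an illustrative variable name):

import pandas as pd

# Trimmed copy of the GraphQL response above, so the sketch runs standalone
response_json = {'data': {'posts': {'edges': [
    {'node': {'id': '303843', 'name': 'hipCV',
              'tagline': 'Create an impressive resume in minutes',
              'votesCount': 71}},
]}}}

df = pd.json_normalize(response_json['data']['posts']['edges'])
# Drop the "node." prefix so the columns read id, name, tagline, votesCount
df.columns = df.columns.str.split('.').str[1]
print(df)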

Related

How to query over a sort key in the simplest manner?

I have a table stored in DynamoDB whose items look like the following:
{'event': 'A',
 'timestamp': '2017-10-15 16:20:47,009',
 'message': 'AAA'},
{'event': 'B',
 'timestamp': '2018-10-15 16:20:47,009',
 'message': 'BBB'},
{'event': 'A',
 'timestamp': '2019-10-15 16:20:47,009',
 'message': 'BBB'},
{'event': 'B',
 'timestamp': '2020-10-15 16:20:47,009',
 'message': 'AAA'}
I would like to extract only those items whose timestamp is after 2018-10-15 00:00:00.
I used the following code to load the table:
import boto3

session = boto3.Session(profile_name="myprofile")
resource = session.resource("dynamodb", region_name="eu-central-1")
table = resource.Table("mytable")
I read that this relatively easy task is not so easy in DynamoDB, where a global secondary index (GSI) usually has to be created. I tried plenty of things but everything failed. Do you have any idea how I can query over the timestamp in the simplest manner? I would really like to avoid a GSI, because it seems like very complicated machinery for a very simple task.
It turns out the solution is very simple, although I couldn't find it anywhere on the internet:
from boto3.dynamodb.conditions import Key

table.query(
    KeyConditionExpression=Key("event").eq("A")
    & Key("timestamp").gt("2018-10-15 00:00:00")
)
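
Note that KeyConditionExpression only works here because event is the table's partition key and timestamp its sort key, and a query always needs the partition key. If you wanted every item after that date regardless of event, a (less efficient) scan with a filter is one option; a sketch, assuming the table above:

from boto3.dynamodb.conditions import Attr

# A scan reads the whole table and filters afterwards, so it is fine for
# small tables but expensive for large ones (that is what a GSI would fix).
response = table.scan(
    FilterExpression=Attr("timestamp").gt("2018-10-15 00:00:00")
)
items = response["Items"]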

How to relate two objects in a JSON file when searching?

I'm using the Twitter API, and it returns me a JSON file.
There's a sample in their dev documentation if you scroll to the bottom, note the example only includes 1 tweet whereas I'm working with hundreds.
In the data object you have geo, and inside geo you have place_id, which correlates to another field in the includes object: specifically the id field nested under places.
My problem then arises when I have hundreds of tweets in a JSON file in data and their respective geolocation data in the other object includes. How can I extract the geolocation data and relate it to the current tweet I have selected?
Currently, I have a for loop to go through all of the tweets and append the information into a CSV file, then nested in that for loop I have this:
for place in json_response['includes']['places']:
    if geo == place['id']:
        full_name = place['full_name']
        country = place['country']
        country_code = place['country_code']
        new_geo = place['geo']
        place_name = place['name']
        place_type = place['place_type']
However, it only returns the geolocation data for 1 tweet per JSON response because I assumed that each tweet got its own includes object. Now I'm stuck and any help would be appreciated.
To eliminate the double for loop and the if statement in your snippet, a straightforward approach without any additional library is a dict comprehension over all tweets, keyed by place_id:
tweets = {tweet['geo']['place_id']: tweet for tweet in json_response['data']}
this results in the following dict:
{'01a9a39529b27f36': {'text': 'We’re sharing a live demo of the new Twitter Developer Labs program, led by a member of our DevRel team, #jessicagarson #TapIntoTwitter [url_intentionally_removed]',
'id': '1136048014974423040',
'geo': {'place_id': '01a9a39529b27f36'}}}
If the response contained more than one tweet, as in your use case, the dict's keys would look like:
['01a9a39529b27f36', 'some_other_id', ...]
In the next step we build the same kind of dict comprehension for the places, keyed by each place's id; this way we avoid any if statements:
places = { p['id']: p for p in json_response['includes']['places']}
this produces the following result:
{'01a9a39529b27f36': {'geo': {'type': 'Feature',
'bbox': [-74.026675, 40.683935, -73.910408, 40.877483],
'properties': {}},
'country_code': 'US',
'name': 'Manhattan',
'id': '01a9a39529b27f36',
'place_type': 'city',
'country': 'United States',
'full_name': 'Manhattan, NY'}}
Finally, to combine them based on the common key:
for pid, geodata in places.items():
    tweets[pid].update(geodata)
which yields:
{'01a9a39529b27f36': {'text': 'We’re sharing a live demo of the new Twitter Developer Labs program, led by a member of our DevRel team, #jessicagarson #TapIntoTwitter [url_removed_on_purpose]',
'id': '01a9a39529b27f36',
'geo': {'type': 'Feature',
'bbox': [-74.026675, 40.683935, -73.910408, 40.877483],
'properties': {}},
'country_code': 'US',
'name': 'Manhattan',
'place_type': 'city',
'country': 'United States',
'full_name': 'Manhattan, NY'}}
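
A consolidated sketch of the approach, with two small guards added by me (json_response is assumed to be the parsed API payload, and note that update() overwrites the tweet's original 'id' with the place's 'id', as visible above):

# json_response: the parsed Twitter API payload (assumed available)
tweets = {
    tweet['geo']['place_id']: tweet
    for tweet in json_response['data']
    if 'geo' in tweet  # guard: skip tweets without geolocation
}
places = {p['id']: p for p in json_response['includes']['places']}

for pid, geodata in places.items():
    if pid in tweets:  # guard: a place may not match any kept tweet
        # caution: this also overwrites the tweet's 'id' with the place's 'id'
        tweets[pid].update(geodata)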

Parsing nested dictionary to dataframe

I am trying to create a data frame from a JSON file, and each album_details entry holds a nested dict like this:
{'api_path': '/albums/491200',
'artist': {'api_path': '/artists/1421',
'header_image_url': 'https://images.genius.com/f3a1149475f2406582e3531041680a3c.1000x800x1.jpg',
'id': 1421,
'image_url': 'https://images.genius.com/25d8a9c93ab97e9e6d5d1d9d36e64a53.1000x1000x1.jpg',
'iq': 46112,
'is_meme_verified': True,
'is_verified': True,
'name': 'Kendrick Lamar',
'url': 'https://genius.com/artists/Kendrick-lamar'},
'cover_art_url': 'https://images.genius.com/1efc5de2af228d2e49d91bd0dac4dc49.1000x1000x1.jpg',
'full_title': 'good kid, m.A.A.d city (Deluxe Version) by Kendrick Lamar',
'id': 491200,
'name': 'good kid, m.A.A.d city (Deluxe Version)',
'url': 'https://genius.com/albums/Kendrick-lamar/Good-kid-m-a-a-d-city-deluxe-version'}
I want to create another column in the data frame with just the album name, which is the 'name' key in the above dict:
'name': 'good kid, m.A.A.d city (Deluxe Version)',
I have been trying to figure this out for a long time; can someone please help me? Thanks.
If that is the case, use the str accessor to call the dict key:
df['name'] = df['album_details'].str['name']
If you have the dataframe stored in the df variable, you could do:
df['artist_name'] = [x['artist']['name'] for x in df['album_details'].values]
You can use apply with a lambda function:
df['album_name'] = df['album_details'].apply(lambda d: d['name'])
Basically you execute the lambda for each value of the 'album_details' column. Note that the argument d in the lambda is the album dictionary. apply returns a Series of the return values, which you can assign to a new column.
See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
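
A minimal sketch comparing the three answers side by side (the two-row dataframe is made up for illustration):

import pandas as pd

# Made-up miniature version of the dataframe, just for illustration
df = pd.DataFrame({'album_details': [
    {'name': 'good kid, m.A.A.d city (Deluxe Version)',
     'artist': {'name': 'Kendrick Lamar'}},
    {'name': 'DAMN.', 'artist': {'name': 'Kendrick Lamar'}},
]})

df['name'] = df['album_details'].str['name']                       # str accessor
df['artist_name'] = [x['artist']['name'] for x in df['album_details'].values]
df['album_name'] = df['album_details'].apply(lambda d: d['name'])  # apply + lambda
print(df[['name', 'artist_name', 'album_name']])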

How to extract values from json response of Google's distance API and store in python dataframe?

I am new to Python and am struggling with how to insert the values I'm extracting from Google's distance API JSON response into a dataframe.
For example, this is the json response:
{'destination_addresses': ['Victoria, BC, Canada'],
'origin_addresses': ['Vancouver, BC, Canada'],
'rows': [{'elements': [{'distance': {'text': '114 km', 'value': 114210},
'duration': {'text': '3 hours 1 min', 'value': 10862},
'status': 'OK'}]}],
'status': 'OK'}
I have figured out how to extract the values and print them. But I would like to store them in a dataframe. I created a loop to get the four values I need:
level0 = ["distance", "duration"]
level1 = ["value", "text"]
for level in level0:
    for item in level1:
        print(result["rows"][0]["elements"][0][level][item])
Essentially I need to end up with a dataframe with the origin, destination, and the 4 combinations of elements I list above. I'm not sure how to insert those values into a dataframe the way I have this set up. Did I just set up my loop incorrectly?
I was able to put the values into a list but then I'm not sure what to do from there to get the values into a row.
Ultimately I will be looping through many combinations of origins and destinations so I will have many rows, one for each test.
TIA!
Creating a list of dictionaries as Eric pointed out in the comments worked!
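
For anyone who lands here, a sketch of that list-of-dicts approach (api_results is a hypothetical list of responses shaped like the example above):

import pandas as pd

rows = []
for result in api_results:  # api_results: hypothetical list of API responses
    element = result['rows'][0]['elements'][0]
    rows.append({
        'origin': result['origin_addresses'][0],
        'destination': result['destination_addresses'][0],
        'distance_value': element['distance']['value'],
        'distance_text': element['distance']['text'],
        'duration_value': element['duration']['value'],
        'duration_text': element['duration']['text'],
    })

df = pd.DataFrame(rows)  # one row per origin/destination pair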

Convert Messy String to CSV Columns

I have a large list of ids from the Facebook API that I would like to put into a CSV file. Is there a way to parse each ID into its own row under a single id column? I am using Python.
Current Format:
{'media': {'data': [{'id': '17898498243076831'},
                    {'id': '17907011917029111'},
                    {'id': '17906766215033990'},
                    {'id': '17894813609104671'},
                    {'id': '17890124843094721'}]}}
But I would like the format to be this:
id
17898498243076831
17907011917029111
17906766215033990
17894813609104671
17890124843094721
EDITED
The Facebook API seems to return a dictionary inside a dictionary inside a dictionary, which actually makes it easy to manipulate in any language. So here is what I suggest:
You can do something like this
import pandas as pd

# Create a dictionary
dictionary_fb = {'media': {'data': [{'id': '17898498243076831'},
                                    {'id': '17907011917029111'},
                                    {'id': '17906766215033990'},
                                    {'id': '17894813609104671'},
                                    {'id': '17890124843094721'}]}}

# Pull the list of id dicts out of dictionary_fb and build a dataframe
dict_id = dictionary_fb['media']['data']
df = pd.DataFrame(dict_id)
df.to_csv("filename.csv", index=False)  # index=False keeps just the id column
If you need to do this for many responses, you can wrap the same steps in a for loop; see the sketch below.
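A sketch, assuming responses is a hypothetical list of payloads shaped like dictionary_fb above:

import pandas as pd

# responses: hypothetical list of payloads shaped like dictionary_fb
frames = [pd.DataFrame(r['media']['data']) for r in responses]
pd.concat(frames, ignore_index=True).to_csv("all_ids.csv", index=False)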
Cheers!
