I'm using the Twitter API, and it returns me a JSON file.
There's a sample in their dev documentation if you scroll to the bottom; note that the example only includes one tweet, whereas I'm working with hundreds.
In the data object you have geo, and inside geo you have place_id, which correlates to another field in the includes object: the id field nested under places.
My problem then arises when I have hundreds of tweets in a JSON file in data and their respective geolocation data in the other object includes. How can I extract the geolocation data and relate it to the current tweet I have selected?
Currently, I have a for loop to go through all of the tweets and append the information into a CSV file, then nested in that for loop I have this:
# geo holds the current tweet's place_id, e.g. geo = tweet['geo']['place_id']
for place in json_response['includes']['places']:
    if geo == place['id']:
        full_name = place['full_name']
        country = place['country']
        country_code = place['country_code']
        new_geo = place['geo']
        place_name = place['name']
        place_type = place['place_type']
However, this only returns the geolocation data for one tweet per JSON response, because I had assumed that each tweet got its own includes object. Now I'm stuck, and any help would be appreciated.
To eliminate the need for the double for loop and the if statement in your snippet, a straightforward approach without any additional library is to build a dict comprehension over all tweets, with the place_id as the dict's key:
tweets = {tweet['geo']['place_id']: tweet for tweet in json_response['data']}
this results in the following dict:
{'01a9a39529b27f36': {'text': 'We’re sharing a live demo of the new Twitter Developer Labs program, led by a member of our DevRel team, #jessicagarson #TapIntoTwitter [url_intentionally_removed]',
'id': '1136048014974423040',
'geo': {'place_id': '01a9a39529b27f36'}}}
If the response contained more than one tweet, as in your use case, the dict would have one entry per tweet and its keys would look like:
['01a9a39529b27f36', 'some_other_id', ...]
In the next step we can build a second dict comprehension, keyed by the id of each place; this way we avoid any if statements:
places = {p['id']: p for p in json_response['includes']['places']}
this produces the following result:
{'01a9a39529b27f36': {'geo': {'type': 'Feature',
'bbox': [-74.026675, 40.683935, -73.910408, 40.877483],
'properties': {}},
'country_code': 'US',
'name': 'Manhattan',
'id': '01a9a39529b27f36',
'place_type': 'city',
'country': 'United States',
'full_name': 'Manhattan, NY'}}
Finally, to combine them based on the common key:
for pid, geodata in places.items():
    tweets[pid].update(geodata)
which yields:
{'01a9a39529b27f36': {'text': 'We’re sharing a live demo of the new Twitter Developer Labs program, led by a member of our DevRel team, #jessicagarson #TapIntoTwitter [url_removed_on_purpose]',
'id': '01a9a39529b27f36',
'geo': {'type': 'Feature',
'bbox': [-74.026675, 40.683935, -73.910408, 40.877483],
'properties': {}},
'country_code': 'US',
'name': 'Manhattan',
'place_type': 'city',
'country': 'United States',
'full_name': 'Manhattan, NY'}}
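Putting the pieces together, here is a minimal end-to-end sketch of how this could feed your CSV step. It assumes json_response is the parsed API response, uses .get so tweets without a geo field simply get empty place columns, and the filename tweets_with_places.csv is only a placeholder:
import csv

# index the expanded place objects by id for O(1) lookup
places = {p['id']: p for p in json_response['includes']['places']}

with open('tweets_with_places.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['tweet_id', 'text', 'full_name', 'country',
                     'country_code', 'name', 'place_type'])
    for tweet in json_response['data']:
        place_id = tweet.get('geo', {}).get('place_id')
        place = places.get(place_id, {})  # empty dict if the tweet has no place
        writer.writerow([tweet['id'], tweet['text'],
                         place.get('full_name'), place.get('country'),
                         place.get('country_code'), place.get('name'),
                         place.get('place_type')])
A side benefit of looking the place up per tweet instead of calling update is that the tweet's own id is not overwritten by the place's id, and several tweets sharing the same place all keep their geolocation data.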
Related
I am currently downloading tweets using the Twitter API. I am looking for tweets with a location (cities, POIs). This works so far.
In the response I get the name of the place, the type of the place and a unique place id (see below).
{'full_name': 'Mannheim, Deutschland', 'place_type': 'city', 'id': '4555d5a205d63969', 'name': 'Mannheim', 'geo': {'type': 'Feature', 'bbox': [8.414251, 49.410381, 8.590021, 49.590376], 'properties': {}}}
I would like to create a coordinate for each city/POI using the id, since I would like to plot the tweets. Is this possible?
Kind regards,
Daniel
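The geo field you already get back contains a GeoJSON-style bounding box in the order [west, south, east, north], so one simple way to get a single plotting coordinate per city/POI is to take the midpoint of that box. A minimal sketch, assuming place holds the dict from your example:
place = {'full_name': 'Mannheim, Deutschland', 'place_type': 'city',
         'id': '4555d5a205d63969', 'name': 'Mannheim',
         'geo': {'type': 'Feature',
                 'bbox': [8.414251, 49.410381, 8.590021, 49.590376],
                 'properties': {}}}

west, south, east, north = place['geo']['bbox']
lon = (west + east) / 2    # midpoint longitude
lat = (south + north) / 2  # midpoint latitude
print(place['name'], lat, lon)  # Mannheim 49.5003785 8.502136
The midpoint is only an approximation of the place's location, but for plotting tweets at city/POI granularity it is usually good enough.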
I am trying to create a data frame from a JSON file. Each album_details entry holds a nested dict like this:
{'api_path': '/albums/491200',
'artist': {'api_path': '/artists/1421',
'header_image_url': 'https://images.genius.com/f3a1149475f2406582e3531041680a3c.1000x800x1.jpg',
'id': 1421,
'image_url': 'https://images.genius.com/25d8a9c93ab97e9e6d5d1d9d36e64a53.1000x1000x1.jpg',
'iq': 46112,
'is_meme_verified': True,
'is_verified': True,
'name': 'Kendrick Lamar',
'url': 'https://genius.com/artists/Kendrick-lamar'},
'cover_art_url': 'https://images.genius.com/1efc5de2af228d2e49d91bd0dac4dc49.1000x1000x1.jpg',
'full_title': 'good kid, m.A.A.d city (Deluxe Version) by Kendrick Lamar',
'id': 491200,
'name': 'good kid, m.A.A.d city (Deluxe Version)',
'url': 'https://genius.com/albums/Kendrick-lamar/Good-kid-m-a-a-d-city-deluxe-version'}
I want to create another column in the data frame with just the album name, which is the name key in the above dict:
'name': 'good kid, m.A.A.d city (Deluxe Version)',
I have been looking for how to do this for a long time; can someone please help me? Thanks.
If that is the case, use the .str accessor to call the dict key:
df['name'] = df['album_details'].str['name']
If you have the dataframe stored in the df variable, you could do:
df['artist_name'] = [x['artist']['name'] for x in df['album_details'].values]
You can use apply with a lambda function:
df['album_name'] = df['album_details'].apply(lambda d: d['name'])
Basically, the lambda function is executed for each value of the 'album_details' column; the argument d in the function is the album dictionary. apply returns a Series of the function's return values, which you can assign to a new column.
See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
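For a quick self-contained check of the approaches above, here is a tiny sketch with a made-up one-row frame (the nested dict is shortened from the question and the new column names are just placeholders):
import pandas as pd

df = pd.DataFrame({
    'album_details': [{'id': 491200,
                       'name': 'good kid, m.A.A.d city (Deluxe Version)',
                       'artist': {'id': 1421, 'name': 'Kendrick Lamar'}}]
})

# apply: the lambda runs once per cell of the column
df['album_name'] = df['album_details'].apply(lambda d: d['name'])

# a list comprehension over the raw values handles nested keys just as well
df['artist_name'] = [x['artist']['name'] for x in df['album_details'].values]

# the .str accessor also indexes element-wise into dicts (Series.str.get)
df['album_name_alt'] = df['album_details'].str['name']

print(df[['album_name', 'artist_name', 'album_name_alt']])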
I'm working with an API and currently trying to pull data out of it. The challenge I'm having is that most of the columns are straightforward and not nested, with the exception of a CustomFields column, which holds all the various custom fields as a list per record.
Using json_normalize, is there a way to target a nested column and flatten it? I'm trying to fetch and use all the data available from the API, but this one nested column in particular is causing a headache.
The JSON data retrieved from the API looks like the following (this is just one customer profile):
[{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
Using json_normalize,
payload = json_normalize(payload_json['Results'])
When I run the above code, the CustomFields column still comes back as the raw list of key/value dicts. Ideally, the final result would have one column per custom-field key (Location, location_id, customer_id, status, last_visit.1) alongside EmailAddress, Name, Date and State.
I think I just need to work with the record_path and meta parameters but I'm not totally understanding how they work.
Any ideas? Or would using json_normalize not work in this situation?
Try this. The keys themselves contain square brackets in your JSON; that's why you see those [ ] in the output:
d = [{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
df = pd.json_normalize(d, record_path=['CustomFields'], meta=[['EmailAddress'], ['Name'], ['Date'], ['State']])
df = df.pivot_table(columns='Key', values='Value', index=['EmailAddress', 'Name'], aggfunc='sum')
print(df)
Output:
Key [Location] [customer_id] [last_visit.1] [location_id] [status]
EmailAddress Name
an_email#gmail.com Al Smith HJGO 9051 2020-02-19 34566 Active
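If you would rather have EmailAddress and Name back as regular columns instead of a MultiIndex, a small follow-up on the same df could be:
df = df.reset_index()      # turn the index levels back into columns
df.columns.name = None     # drop the leftover 'Key' label on the columns axis
print(df)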
I need help retrieving a value from a JSON response object in Python. Specifically, how do I access the prices → asks → price value? I'm having trouble:
JSON object:
{'prices': [{'asks': [{'liquidity': 10000000, 'price': '1.16049'}],
'bids': [{'liquidity': 10000000, 'price': '1.15989'}],
'closeoutAsk': '1.16064',
'closeoutBid': '1.15974',
'instrument': 'EUR_USD',
'quoteHomeConversionFactors': {'negativeUnits': '1.00000000',
'positiveUnits': '1.00000000'},
'status': 'non-tradeable',
'time': '2018-08-31T20:59:57.748335979Z',
'tradeable': False,
'type': 'PRICE',
'unitsAvailable': {'default': {'long': '4063619', 'short': '4063619'},
'openOnly': {'long': '4063619', 'short': '4063619'},
'reduceFirst': {'long': '4063619', 'short': '4063619'},
'reduceOnly': {'long': '0', 'short': '0'}}}],
'time': '2018-09-02T18:56:45.022341038Z'}
Code:
data = pd.io.json.json_normalize(response['prices'])
asks = data['asks']
asks[0]
Out: [{'liquidity': 10000000, 'price': '1.16049'}]
I want to get the value 1.16049, but I'm having trouble after trying different things.
Thanks
asks[0] returns a list, so you might do something like:
asks[0][0]['price']
or
data = pd.io.json.json_normalize(response['prices'])
price = data['asks'][0][0]['price']
The data you have contains JSON objects and lists nested inside one another, so you need to navigate through it accordingly. Lists are accessed by index (starting from 0) and JSON objects by key.
price_value=data['prices'][0]['asks'][0]['price']
liquidity_value=data['prices'][0]['asks'][0]['liquidity']
To explain the logic here: I assume your big JSON object is stored in a variable called data. First we access the prices key in this object, then index 0 because its value is a list. Inside that list element there is a key called asks, whose value is again a list, so we access it with index 0 as well. Finally, the price and liquidity keys are right there.
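If you prefer a flat table over chained indexing, json_normalize can also unnest the asks list directly. A sketch, assuming response is the parsed dict shown in the question (use pd.io.json.json_normalize on older pandas):
import pandas as pd

asks = pd.json_normalize(response['prices'],
                         record_path='asks',
                         meta=['instrument', 'time'])
# columns: liquidity, price, instrument, time -- one row per ask
price = float(asks.loc[0, 'price'])  # 1.16049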
I have a large list of ids from the Facebook API that I would like to put into a CSV file. Is there a way to parse each ID into its own row of a CSV file? I am using Python.
Current Format:
{'media': {'data': [{'id': '17898498243076831'},
                    {'id': '17907011917029111'},
                    {'id': '17906766215033990'},
                    {'id': '17894813609104671'},
                    {'id': '17890124843094721'}]}}
But I would like the format to be this:
id
17898498243076831
17907011917029111
17906766215033990
17894813609104671
17890124843094721
The Facebook API seems to be spitting out a dictionary inside a dictionary inside a dictionary, which actually makes it easy to manipulate in any language. So here is what I would suggest; you can do something like this:
import pandas as pd

# Create a dictionary (the structure returned by the Facebook API)
dictionary_fb = {'media': {'data': [{'id': '17898498243076831'},
                                    {'id': '17907011917029111'},
                                    {'id': '17906766215033990'},
                                    {'id': '17894813609104671'},
                                    {'id': '17890124843094721'}]}}

# Pull out the list of id dicts and load it into a DataFrame
dict_id = dictionary_fb['media']['data']
df = pd.DataFrame(dict_id)
df.to_csv("filename")
If you need to do this more often, you can wrap it in a for loop over your responses and get the same result, as sketched below.
Cheers!
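As a rough sketch of that "do it more often" idea (the responses list and the output filename are made up for illustration), you could collect the ids from every response before writing a single CSV:
import pandas as pd

# pretend we fetched several API responses shaped like dictionary_fb above
responses = [dictionary_fb]   # in practice: one dict per API call / page

frames = [pd.DataFrame(r['media']['data']) for r in responses]
all_ids = pd.concat(frames, ignore_index=True)
all_ids.to_csv("all_media_ids.csv", index=False)  # index=False keeps just the id column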