Parsing nested dictionary to dataframe - python

I am trying to create data frame from a JSON file.
and each album_details have a nested dict like this
{'api_path': '/albums/491200',
'artist': {'api_path': '/artists/1421',
'header_image_url': 'https://images.genius.com/f3a1149475f2406582e3531041680a3c.1000x800x1.jpg',
'id': 1421,
'image_url': 'https://images.genius.com/25d8a9c93ab97e9e6d5d1d9d36e64a53.1000x1000x1.jpg',
'iq': 46112,
'is_meme_verified': True,
'is_verified': True,
'name': 'Kendrick Lamar',
'url': 'https://genius.com/artists/Kendrick-lamar'},
'cover_art_url': 'https://images.genius.com/1efc5de2af228d2e49d91bd0dac4dc49.1000x1000x1.jpg',
'full_title': 'good kid, m.A.A.d city (Deluxe Version) by Kendrick Lamar',
'id': 491200,
'name': 'good kid, m.A.A.d city (Deluxe Version)',
'url': 'https://genius.com/albums/Kendrick-lamar/Good-kid-m-a-a-d-city-deluxe-version'}
I want to create another column in the data frame with just album name which is one the above dict
'name': 'good kid, m.A.A.d city (Deluxe Version)',
I have been looking how to do this from very long time , can some one please help me. thanks

Is that is the case use str to call the dict key
df['name'] = df['album_details'].str['name']

If you have the dataframe stored in the df variable you could do:
df['artist_name'] = [x['artist']['name'] for x in df['album_details'].values]

You can use apply with lambda function:
df['album_name'] = df['album_details'].apply(lambda d: d['name'])
Basically you execute the lambda function for each value of the column 'album_details'. Note that the argument 'd' in the function is the album dictionary. Apply returns a series of the function return values and this you can set to a new column.
See: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

Related

Basic pandas dataframe manipulation question

I have the following JSON snippet:
{'search_metadata': {'completed_in': 0.027,
'count': 2},
'statuses': [{'contributors': None,
'coordinates': None,
'created_at': 'Wed Mar 31 19:25:16 +0000 2021',
'text': 'The text',
'truncated': True,
'user': {'contributors_enabled': False,
'screen_name': 'abcde',
'verified': false
}
}
,{...}]
}
The info that interests me is all in the statuses array. With pandas I can turn this into a DataFrame like this
df = pd.DataFrame(Data['statuses'])
Then I extract a subset out of this dataframe with
dfsub = df[['created_at', 'text']]
display(dfsub) shows exactly what I expect.
But I also want to include [user][screen_name] to the subset.
dfs = df[[ 'user', 'created_at', 'text']]
is syntactically correct but user contains to much information.
How do I add only the screen_name to the subset?
I have tried things like the following but none of that works
[user][screen_name]
user.screen_name
user:screen_name
I would normalize data before contructing DataFrame.
Take a look here: https://stackoverflow.com/a/41801708/14596032
Working example as an answer for your question:
df = pd.json_normalize(Data['statuses'], sep='_')
dfs = df[[ 'user_screen_name', 'created_at', 'text']]
print(dfs)
You can try to access Dataframe, then Series, then Dict
df['user'] # user column = Series
df['user'][0] # 1st (only) item of the Series = dict
df['user'][0]['screen_name'] # screen_name in dict
You can use pd.Series.str. The docs don't do justice to all the wonderful things .str can do, such as accessing list and dict items. Case in point, you can access dict elements like this:
df['user'].str['screen_name']
That said, I agree with #VladimirGromes that a better way is to normalize your data into a flat table.

Splitting at specific string from a dataframe column in Python

I have a dataframe with a column called "Spl" with the values below: I am trying to extract the values next to 'name': strings (some rows have multiple values) but I see the new column generated with the specific location of the memory. I used the below code to extract. Any help how to extract the values after "name:" string is much appreciated.
Column values:
'name': 'Chirotherapie', 'name': 'Innen Medizin'
'name': 'Manuelle Medizin'
'name': 'Akupunktur', 'name': 'Chirotherapie', 'name': 'Innen Medizin'
Code:
df['Spl'] = lambda x: len(x['Spl'].str.split("'name':"))
Output:
<function <lambda> at 0x0000027BF8F68940>
Just simply do:-
df['Spl']=df['Spl'].str.split("'name':").str.len()
Just do count
df['Spl'] = df['Spl'].str.count("'name':")+1

Retrieve value in JSON from pandas series object

I need help retrieving a value from a JSON response object in python. Specifically, how do I access the prices-asks-price value? I'm having trouble:
JSON object:
{'prices': [{'asks': [{'liquidity': 10000000, 'price': '1.16049'}],
'bids': [{'liquidity': 10000000, 'price': '1.15989'}],
'closeoutAsk': '1.16064',
'closeoutBid': '1.15974',
'instrument': 'EUR_USD',
'quoteHomeConversionFactors': {'negativeUnits': '1.00000000',
'positiveUnits': '1.00000000'},
'status': 'non-tradeable',
'time': '2018-08-31T20:59:57.748335979Z',
'tradeable': False,
'type': 'PRICE',
'unitsAvailable': {'default': {'long': '4063619', 'short': '4063619'},
'openOnly': {'long': '4063619', 'short': '4063619'},
'reduceFirst': {'long': '4063619', 'short': '4063619'},
'reduceOnly': {'long': '0', 'short': '0'}}}],
'time': '2018-09-02T18:56:45.022341038Z'}
Code:
data = pd.io.json.json_normalize(response['prices'])
asks = data['asks']
asks[0]
Out: [{'liquidity': 10000000, 'price': '1.16049'}]
I want to get the value 1.16049 - but having trouble after trying different things.
Thanks
asks[0] returns a list so you might do something like
asks[0][0]['price']
or
data = pd.io.json.json_normalize(response['prices'])
price = data['asks'][0][0]['price']
The data that you have contains jsons and lists inside one another. Hence you need to navigate through this accordingly. Lists are accessed through indexes (starting from 0) and jsons are accessed through keys.
price_value=data['prices'][0]['asks'][0]['price']
liquidity_value=data['prices'][0]['asks'][0]['liquidity']
Explaining this logic in this case : I assume that your big json object is stored in a object called data. First accessing prices key in this object. Then I have index 0 because the next key is present inside a list. Then after you go inside the list, you have a key called asks. Now again you have a list here so you need to access it using index 0. Then finally the key liquidity and price is here.

django-tables2 sorting non-queryset data

Hi I have a table thus:
class MyTable(tables.Table):
brand = tables.Column(verbose_name='Brand')
market = tables.Column(verbose_name='Market')
date = tables.DateColumn(format='d/m/Y')
value = tables.Column(verbose_name='Value')
I'm populating the table with a list of dicts and then I'm configuring it and rendering it in my view in the usual django-tables2 way:
data = [
{'brand': 'Nike', 'market': 'UK', 'date': <datetime obj>, 'value': 100},
{'brand': 'Django', 'market': 'FR', 'date': <datetime obj>, 'value': 100},
]
table = MyTable(data)
RequestConfig(request, paginate=False).configure(table)
context = {'table': table}
return render(request, 'my_tmp.html', context)
This all renders the table nicely with the correct data in it. However, I can sort by the date column and the value column on the webpage, but not by the brand and market. So it seems non-string values can be sorted but strings can't. I've been trying to figure out how to do this sorting, is it possible with non-queryset data?
I can't populate the table with a queryset, as I'm using custom methods to generate the value column data. Any help appreciated! I guess I need to specify the order_by parameter in my tables.Column but haven't found the correct setting yet.
data.sort(key = lambda item: item["brand"].lower()) will sort the list in place (so it will return None; the original 'data' list is edited) based on brand (alphabetically). The same can be done for any key.
Alternatively, sorted(data, key = lambda item: item["brand"].lower()) returns a sorted copy of the list.

How to Optimally Save Data in Python as Data Structure

My model returns information about PC games in the following format. The format is game index and game value. This is my sim_sorted.
[(778, 0.99999994), (1238, 0.9999997), (1409, 0.99999905), (1212, 0.99999815)]
I retrieve the information about the game by indexing the database (df_indieGames):
sims_sorted = sorted(enumerate(sims), key=lambda item: -item[1])
results = {}
for val in sims_sorted[:4]:
index, value = val[0], val[1]
results[df_indieGames.game_name.loc[index]] =
{
"Genre":df_indieGames.genre.loc[index],
"Rating": df_indieGames.score.loc[index],
"Link": df_indieGames.game_link[index]
}
However, such a data structure is hard to sort (by Rating). Is there a better way to store the information so retrieval and sorting is easier? Thanks.
Here's the output of results:
{u'Diehard Dungeon': {'Genre': u'Roguelike',
'Link': u'http://www.indiedb.com/games/diehard-dungeon',
'Rating': 8.4000000000000004},
u'Fork Truck Challenge': {'Genre': u'Realistic Sim',
'Link': u'http://www.indiedb.com/games/fork-truck-challenge',
'Rating': 7.4000000000000004},
u'Miniconomy': {'Genre': u'Realistic Sim',
'Link': u'http://www.indiedb.com/games/miniconomy',
'Rating': 7.2999999999999998},
u'World of Padman': {'Genre': u'First Person Shooter',
'Link': u'http://www.indiedb.com/games/world-of-padman',
'Rating': 9.0}}
UPDATE
The solution to the problem as suggested by ziddarth is the following:
result = sorted(results.iteritems(), key=lambda x: x[1]['Rating'], reverse=True)
You can sort by rating using code below. The lambda function is called with a tuple whose first element is the dictionary key and the second element is the dictionary value for the corresponding key, so you can use the lambda function to get to any value in the nested dictionary
sorted(results.iteritems(), key=lambda x: x[1]['Rating'])

Categories