Converting data frame into a nested dictionary - python

Below is my subsetted data frame. I am having a hard time converting it into my desired output, as I am fairly new to Python. Essentially I want a list of dictionaries, one per column, where each dictionary holds the column name and a nested list of dictionaries for that column's values. Is this doable?
import pandas as pd
Sector Community Name
0 centre: 10901.0 park: 3238.0
1 northeast: 6958.0 heights: 1955.0
Desired output:
[{'column': 'Sector',
'value': [{'name': 'centre', 'value': 10901.0},
{'name': 'northeast', 'value': 6958.0}]},
{'column': 'Community Name',
'value': [{'name': 'park', 'value': 3238.0},
{'name': 'heights', 'value': 1955.0},
{'name': 'hill', 'value': 1454.0}]}]

Building on @sushanth's answer, I would add the following solution. Assume that your dataframe variable is defined as df.
result = []
for header in list(df):
    column_values = df[header].to_list()
    result.append({
        "column": header,
        "value": [dict(zip(['name', 'value'], str(value).split(":"))) for value in column_values]
    })

Using pandas in the above case might be overkill. Here is a solution using Python built-in functions which you can give a try:
input_ = {"Sector": ["centre: 10901.0", "northeast: 6958.0"],
"Community Name": ["park: 3238.0", "heights: 1955.0"]}
result = []
for k, v in input_.items():
result.append({
"column" : k,
"value" : [dict(zip(['name', 'value'], vv.split(":"))) for vv in v]
})
print(result)
[{'column': 'Sector',
'value': [{'name': 'centre', 'value': ' 10901.0'},
{'name': 'northeast', 'value': ' 6958.0'}]},
{'column': 'Community Name',
'value': [{'name': 'park', 'value': ' 3238.0'},
{'name': 'heights', 'value': ' 1955.0'}]}]
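Note that split(":") leaves the values as strings with a leading space (e.g. ' 10901.0'). If clean names and numeric values are wanted, a small variation of the loop (a sketch, not part of the original answers) strips the whitespace and casts to float:
result = []
for header in df.columns:
    result.append({
        "column": header,
        "value": [
            # strip stray whitespace from the name and cast the number to float
            {"name": name.strip(), "value": float(num)}
            for name, num in (str(cell).split(":") for cell in df[header])
        ]
    })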

Related

pandas create a one row dataframe from nested dict

I know the "create pandas dataframe from nested dict" has a lot of entries here but I'm not found the answer that applies to my problem:
I have a dict like this:
{'id': 1,
 'creator_user_id': {'id': 12170254,
                     'name': 'Nicolas',
                     'email': 'some_mail#some_email_provider.com',
                     'has_pic': 0,
                     'pic_hash': None,
                     'active_flag': True,
                     'value': 12170254},
 ...}
and after reading it with pandas it looks like this:
df = pd.DataFrame.from_dict(my_dict,orient='index')
print(df)
id 1
creator_user_id {'id': 12170254, 'name': 'Nicolas', 'email': '...
user_id {'id': 12264469, 'name': 'Daniela Giraldo G', ...
person_id {'active_flag': True, 'name': 'Cristina Cardoz...
org_id {'name': 'Cristina Cardozo', 'people_count': 1...
stage_id 2
title Cristina Cardozo
I would like to create a one-row dataframe where, for example, the nested creator_user_id column results in several columns that I can afterwards name: creator_user_id_id, creator_user_id_name, etc.
thank you for your time!
Given you want one row, just use json_normalize()
pd.json_normalize({'id': 1,
                   'creator_user_id': {'id': 12170254,
                                       'name': 'Nicolas',
                                       'email': 'some_mail#some_email_provider.com',
                                       'has_pic': 0,
                                       'pic_hash': None,
                                       'active_flag': True,
                                       'value': 12170254}})
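If you prefer underscore-separated names such as creator_user_id_id instead of the default dotted ones, json_normalize also takes a sep argument (a minimal sketch, assuming your dict is stored in my_dict):
# Flatten nested keys with '_' so columns come out as
# creator_user_id_id, creator_user_id_name, etc.
df = pd.json_normalize(my_dict, sep='_')
print(df.columns.tolist())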

Json_Normalize, targeting nested columns within a specific column?

I'm working with an API and trying to pull data out of it. The challenge I'm having is that the majority of the columns are straightforward and not nested, with the exception of a CustomFields column, which holds a list of the various custom fields for each record.
Using json_normalize, is there a way to target a nested column and flatten it? I'm trying to fetch and use all the data available from the API, but this one nested column in particular is causing a headache.
The JSON data, when retrieved from the API, looks like the following (this is just one customer profile):
[{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
Using json_normalize,
payload = json_normalize(payload_json['Results'])
Here are the results when I run the above code,
Ideally, here is what I would like the final result to look like,
I think I just need to work with the record_path and meta parameters but I'm not totally understanding how they work.
Any ideas? Or would using json_normalize not work in this situation?
Try this. Your JSON has square brackets inside the Key values, which is why you see those [ ] in the column names:
d = [{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
df = pd.json_normalize(d, record_path=['CustomFields'], meta=[['EmailAddress'], ['Name'], ['Date'], ['State']])
df = df.pivot_table(columns='Key', values='Value', index=['EmailAddress', 'Name'], aggfunc='sum')
print(df)
Output:
Key [Location] [customer_id] [last_visit.1] [location_id] [status]
EmailAddress Name
an_email#gmail.com Al Smith HJGO 9051 2020-02-19 34566 Active
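If the literal [ ] in the resulting column names are unwanted, they can be stripped afterwards (a small follow-up, not part of the original answer):
# Remove the square brackets that came from the Key values in the JSON.
df.columns = df.columns.str.strip('[]')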

Pandas - Extracting values from a Dataframe column

I have a Dataframe in the below format:
cust_id, cust_details
101, [{'self': 'https://website.com/rest/api/2/customFieldOption/1', 'value': 'Type-A', 'id': '1'},
{'self': 'https://website.com/rest/api/2/customFieldOption/2', 'value': 'Type-B', 'id': '2'},
{'self': 'https://website.com/rest/api/2/customFieldOption/3', 'value': 'Type-C', 'id': '3'},
{'self': 'https://website.com/rest/api/2/customFieldOption/4', 'value': 'Type-D', 'id': '4'}]
102, [{'self': 'https://website.com/rest/api/2/customFieldOption/5', 'value': 'Type-X', 'id': '5'},
{'self': 'https://website.com/rest/api/2/customFieldOption/6', 'value': 'Type-Y', 'id': '6'}]
I am trying to extract, for every cust_id, all of the cust_details values.
Expected output:
cust_id, new_value
101,Type-A, Type-B, Type-C, Type-D
102,Type-X, Type-Y
Easy answer:
df['new_value'] = df.cust_details.apply(lambda ds: [d['value'] for d in ds])
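If the expected output should literally be one comma-separated string per row rather than a list, the same apply can join the values (a minor variation on the line above, not from the original answer):
df['new_value'] = df['cust_details'].apply(lambda ds: ', '.join(d['value'] for d in ds))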
More complex, potentially better answer:
Rather than storing lists of dictionaries in the first place, I'd recommend making each dictionary a row in the original dataframe.
df = pd.concat([
    df['cust_id'],
    pd.DataFrame(
        df['cust_details'].explode().values.tolist(),
        index=df['cust_details'].explode().index
    )
], axis=1)
If you need to group values by id, you can do so via standard groupby methods:
df.groupby('cust_id')['value'].apply(list)
This may seem more complex, but depending on your use case might save you effort in the long-run.
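To get back to the expected comma-separated layout from the exploded frame, the same groupby can join the strings instead of collecting lists (a sketch building on the lines above):
out = df.groupby('cust_id')['value'].apply(', '.join).reset_index(name='new_value')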

Save DataFrame as Json with non unique index

my DF is:
df = pd.DataFrame({'city': ['POA', 'POA', 'SAN'], 'info' : [10,12,5]}, index = [4314902, 4314902, 4300803])
df.index.rename('ID_city', inplace=True)
output:
city info
ID_city
4314902 POA 10
4314902 POA 12
4300803 SAN 5
I need to save as json oriented by index. The following command works only when each index is unique.
df.to_json('df.json', orient='index')
Is it possible to save this DataFrame so that when it finds a duplicate index, it creates an array?
My desire output:
{4314902: [{'city': 'POA', 'info': 10}, {'city': 'POA', 'info': 12}],
 4300803: {'city': 'SAN', 'info': 5}}
I'm not aware of built-in pandas functionality that handles duplicate indexes when exporting JSON with orient='index'.
You could of course build this manually. Merge the columns into one that contains a dict:
cols_as_dict = df.apply(dict, axis=1)
ID_city
4314902 {'city': 'POA', 'info': 10}
4314902 {'city': 'POA', 'info': 12}
4300803 {'city': 'SAN', 'info': 5}
Put rows into lists, grouped by the index:
combined = cols_as_dict.groupby(cols_as_dict.index).apply(list)
ID_city
4300803 [{'city': 'SAN', 'info': 5}]
4314902 [{'city': 'POA', 'info': 10}, {'city': 'POA', ...
Then write the json:
combined.to_json()
'{"4300803":[{"city":"SAN","info":5}],"4314902":[{"city":"POA","info":10},{"city":"POA","info":12}]}'
It creates a list even if there's just a single entry per index, which should actually make processing easier than mixing the data types (either a list of elements or a single element).
If you are set on the mixed type (either dict or list of several dicts), then do combined.to_dict(), change the lists with single elements back into their first element, and then dump the json.
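For completeness, the mixed-type variant suggested above could be built like this (a sketch, not built-in pandas behaviour):
import json

# Collapse single-element lists back to a bare dict, then dump to JSON.
mixed = {key: items[0] if len(items) == 1 else items
         for key, items in combined.to_dict().items()}
json_str = json.dumps(mixed)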

API Call - Multi dimensional nested dictionary to pandas data frame

I need your help with converting a multidimensional dict to a pandas data frame. I get the dict from a JSON file which I retrieve from an API call (Shopify).
response = requests.get("URL", auth=("ID","KEY"))
data = json.loads(response.text)
The "data" dictionary looks as follows:
{'orders': [{'created_at': '2016-09-20T22:04:49+02:00',
             'email': 'test#aol.com',
             'id': 4314127108,
             'line_items': [{'destination_location': {'address1': 'Teststreet 12',
                                                      'address2': '',
                                                      'city': 'Berlin',
                                                      'country_code': 'DE',
                                                      'id': 2383331012,
                                                      'name': 'Test Test',
                                                      'zip': '10117'},
                             'gift_card': False,
                             'name': 'Blueberry Cup'}]}]}
In this case the dictionary has 4 dimensions, and I would like to convert it into a pandas data frame. I tried everything ranging from json_normalize() to pandas.DataFrame.from_dict(), yet I did not manage to get anywhere. When I try to convert the dict to a df, I get columns which contain lists of lists.
Does anyone know how to approach that?
Thanks
EDITED:
Thank you @piRSquared, your solution works fine! However, how would you solve it if there were another product in the order? Because then it does not work. The JSON response of an order with 2 products is as follows (the goal is to have a second row with the same "created_at", "email", etc. columns):
{'orders': [{'created_at': '2016-09-20T22:04:49+02:00',
             'email': 'test#aol.com',
             'id': 4314127108,
             'line_items': [{'destination_location': {'address1': 'Teststreet 12',
                                                      'address2': '',
                                                      'city': 'Berlin',
                                                      'country_code': 'DE',
                                                      'id': 2383331012,
                                                      'name': 'Test Test',
                                                      'zip': '10117'},
                             'gift_card': False,
                             'name': 'Blueberry Cup'},
                            {'destination_location': {'address1': 'Teststreet 12',
                                                      'address2': '',
                                                      'city': 'Berlin',
                                                      'country_code': 'DE',
                                                      'id': 2383331012,
                                                      'name': 'Test Test',
                                                      'zip': '10117'},
                             'gift_card': False,
                             'name': 'Strawberry Cup'}]}]}
So the df should in the end have one row per sold product. Thank you, I really appreciate your help!
There are a number of ways to do this. This is just a way I decided to do it. You need to explore how you want to see this represented, then figure out how to get there.
df = pd.DataFrame(data['orders'])
df1 = df.line_items.str[0].apply(pd.Series)
df2 = df1.destination_location.apply(pd.Series)
pd.concat([df.drop('line_items', axis=1), df1.drop('destination_location', axis=1), df2],
          axis=1, keys=['', 'line_items', 'destination_location'])
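For the edited two-product case, one possible follow-up (a sketch, assuming pandas >= 1.0 for pd.json_normalize and the response dict stored in data as above) is to let record_path expand line_items into one row per product, with the order-level fields repeated via meta:
import pandas as pd

# One row per line item; order-level fields are repeated on every row, and
# the nested destination_location keys are flattened with a dot separator.
orders = pd.json_normalize(
    data['orders'],
    record_path='line_items',
    meta=['created_at', 'email', 'id'],
    record_prefix='item.',
)
print(orders[['created_at', 'email', 'item.name', 'item.destination_location.city']])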
