pandas create new columns from dictionaries - python

A portion of the column 'relatedWorkOrder' in my dataframe looks like this:
{'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}...}
{'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}...}
{'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}...}
My desired output is to have columns 'name', 'labor_name', 'labor_code' with their respective values. I can do this using regex extract and replace:
df['name'] = df['relatedWorkOrder'].str.extract(r'{regex}',expand=False).str.replace('something','')
But I have several dictionaries in this column, so doing it this way is tedious. I'm also wondering whether it's possible to do this by accessing the keys and values of the dictionary.
Any help with that?

You can join the result from pd.json_normalize:
df.join(pd.json_normalize(df['relatedWorkOrder'], sep='_'))
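A minimal sketch of that approach, assuming every cell of the column holds a real dict (not a string). The sample rows, the 'id' column, and the index values are made up for illustration. Since join aligns on the index, the flattened frame is given the original index first:

```python
import pandas as pd

# Made-up sample rows shaped like the question's 'relatedWorkOrder' column
df = pd.DataFrame(
    {
        'id': [1, 2],
        'relatedWorkOrder': [
            {'number': 2552, 'labor': {'name': 'IA001', 'code': '70M0901003'}},
            {'number': 2553, 'labor': {'name': 'IA002', 'code': '70M0901004'}},
        ],
    },
    index=[10, 20],
)

# Flatten the dicts; nested keys are joined with '_' (labor_name, labor_code)
flat = pd.json_normalize(df['relatedWorkOrder'].tolist(), sep='_')

# json_normalize returns a fresh 0..n-1 index, so reuse the original one
flat.index = df.index

result = df.drop(columns='relatedWorkOrder').join(flat)
print(result)
```

If the cells are strings that merely look like dicts, they would need to be parsed first (for example with ast.literal_eval) before this works.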

Related

create a pandas dataframe from python list containing tuple with nested dictionary

I have wrestled with this for a few days now, but can't figure it out.
I'm trying to create a dataframe "account_activity" from the results of an API GET request.
I make an API call and print the result:
account_activities = api.get_activities()
print(account_activities)
returns:
[AccountActivity({ 'activity_type': 'FILL',
'cum_qty': '100',
'id': '20211111105648607::a0ef3f04-ff00-4b8e-834d-54737d89c332',
'leaves_qty': '0',
'order_id': '32c9a40e-e6d2-4c7c-8949-a39ad32b535f',
'order_status': 'filled',
'price': '187.09',
'qty': '56',
'side': 'sell',
'symbol': 'U',
'transaction_time': '2021-11-11T15:56:48.607222Z',
'type': 'fill'})]
How do I create a dataframe "account_activity" where the keys are the column headers, transaction_time is the row index, and the values fill the rows?
Assuming j is the JSON from your AccountActivity object:
df = pd.DataFrame(j, index=['']).set_index('transaction_time',drop=True)
How you get the JSON depends on the APIs you're using. Perhaps
j = account_activities[0].__dict__
will work?
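If the call returns several activities, one possible sketch is to build the frame from a list of dicts. This assumes each activity can be turned into a plain dict somehow; the `activities` list below is a hand-written stand-in, not the real API objects. The first record copies a subset of the fields from the question and the second record is invented:

```python
import pandas as pd

# Stand-in for the API result: plain dicts with a subset of the fields.
# The first record copies values from the question; the second is invented.
activities = [
    {'activity_type': 'FILL', 'price': '187.09', 'qty': '56', 'side': 'sell',
     'symbol': 'U', 'transaction_time': '2021-11-11T15:56:48.607222Z'},
    {'activity_type': 'FILL', 'price': '186.50', 'qty': '44', 'side': 'sell',
     'symbol': 'U', 'transaction_time': '2021-11-11T15:57:02.113000Z'},
]

# One row per activity, dict keys become the column headers
account_activity = pd.DataFrame(activities).set_index('transaction_time')

# Optional: turn the ISO timestamp strings into a real DatetimeIndex
account_activity.index = pd.to_datetime(account_activity.index)
print(account_activity)
```

The values stay strings here, as in the printed API output; pd.to_numeric can convert columns such as price and qty if needed.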

pandas create a one row dataframe from nested dict

I know that "create pandas dataframe from nested dict" has a lot of entries here, but I haven't found an answer that applies to my problem:
I have a dict like this:
{'id': 1,
'creator_user_id': {'id': 12170254,
'name': 'Nicolas',
'email': 'some_mail#some_email_provider.com',
'has_pic': 0,
'pic_hash': None,
'active_flag': True,
'value': 12170254}....,
and after reading it with pandas it looks like this:
df = pd.DataFrame.from_dict(my_dict,orient='index')
print(df)
id               1
creator_user_id  {'id': 12170254, 'name': 'Nicolas', 'email': '...
user_id          {'id': 12264469, 'name': 'Daniela Giraldo G', ...
person_id        {'active_flag': True, 'name': 'Cristina Cardoz...
org_id           {'name': 'Cristina Cardozo', 'people_count': 1...
stage_id         2
title            Cristina Cardozo
I would like to create a one-row dataframe where, for example, the nested creator_user_id column is expanded into several columns that I can afterwards name creator_user_id_id, creator_user_id_name, etc.
Thank you for your time!
Given you want one row, just use json_normalize():
pd.json_normalize({'id': 1,
'creator_user_id': {'id': 12170254,
'name': 'Nicolas',
'email': 'some_mail#some_email_provider.com',
'has_pic': 0,
'pic_hash': None,
'active_flag': True,
'value': 12170254}})
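By default json_normalize joins nested keys with a dot (creator_user_id.id). A small sketch, using a shortened version of the dict from the question, of passing sep='_' to get the underscore-style names asked for:

```python
import pandas as pd

# Shortened version of the dict from the question
my_dict = {
    'id': 1,
    'creator_user_id': {'id': 12170254, 'name': 'Nicolas', 'has_pic': 0,
                        'active_flag': True},
    'stage_id': 2,
    'title': 'Cristina Cardozo',
}

# sep='_' yields creator_user_id_id, creator_user_id_name, ...
df = pd.json_normalize(my_dict, sep='_')
print(df.columns.tolist())
```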

Json_Normalize, targeting nested columns within a specific column?

I'm working with an API and trying to pull data out of it. Most of the columns are straightforward and not nested; the exception is a CustomFields column, which holds all the custom fields for each record in a list.
Using json_normalize, is there a way to target a nested column and flatten it? I'm trying to fetch and use all the data available from the API, but one nested column in particular is causing a headache.
The JSON data retrieved from the API looks like the following. This is just one customer profile:
[{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
Using json_normalize,
payload = json_normalize(payload_json['Results'])
Here are the results when I run the above code,
Ideally, here is what I would like the final result to look like,
I think I just need to work with the record_path and meta parameters but I'm not totally understanding how they work.
Any ideas? Or would using json_normalize not work in this situation?
Try this. The keys in your JSON contain literal square brackets, which is why the [ ] show up in the column names:
d = [{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
df = pd.json_normalize(d, record_path=['CustomFields'], meta=[['EmailAddress'], ['Name'], ['Date'], ['State']])
df = df.pivot_table(columns='Key', values='Value', index=['EmailAddress', 'Name'], aggfunc='first')
print(df)
Output:
Key                         [Location] [customer_id] [last_visit.1] [location_id] [status]
EmailAddress       Name
an_email#gmail.com Al Smith       HJGO          9051     2020-02-19         34566   Active
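To make record_path and meta a bit more concrete, here is a sketch on a trimmed copy of the record above. record_path names the nested list that becomes the rows, and meta names the top-level fields repeated on each of those rows. Stripping the brackets from the keys and using aggfunc='first' are my own additions, not something the question requires:

```python
import pandas as pd

# Trimmed copy of the record from the question
d = [{
    'EmailAddress': 'an_email#gmail.com',
    'Name': 'Al Smith',
    'CustomFields': [
        {'Key': '[Location]', 'Value': 'HJGO'},
        {'Key': '[customer_id]', 'Value': '9051'},
        {'Key': '[status]', 'Value': 'Active'},
    ],
}]

# One row per custom field; the meta fields are repeated on every row
long_df = pd.json_normalize(d, record_path='CustomFields',
                            meta=['EmailAddress', 'Name'])

# Drop the literal square brackets so they do not end up in column names
long_df['Key'] = long_df['Key'].str.strip('[]')

# Back to one row per customer, one column per custom field
wide = long_df.pivot_table(index=['EmailAddress', 'Name'], columns='Key',
                           values='Value', aggfunc='first').reset_index()
print(wide)
```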

Pandas - Extracting values from a Dataframe column

I have a Dataframe in the below format:
cust_id, cust_details
101, [{'self': 'https://website.com/rest/api/2/customFieldOption/1', 'value': 'Type-A', 'id': '1'},
{'self': 'https://website.com/rest/api/2/customFieldOption/2', 'value': 'Type-B', 'id': '2'},
{'self': 'https://website.com/rest/api/2/customFieldOption/3', 'value': 'Type-C', 'id': '3'},
{'self': 'https://website.com/rest/api/2/customFieldOption/4', 'value': 'Type-D', 'id': '4'}]
102, [{'self': 'https://website.com/rest/api/2/customFieldOption/5', 'value': 'Type-X', 'id': '5'},
{'self': 'https://website.com/rest/api/2/customFieldOption/6', 'value': 'Type-Y', 'id': '6'}]
I am trying to extract all cust_details values for every cust_id.
Expected output:
cust_id, new_value
101,Type-A, Type-B, Type-C, Type-D
102,Type-X, Type-Y
Easy answer:
df['new_value'] = df.cust_details.apply(lambda ds: [d['value'] for d in ds])
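The line above gives a list per row. If the comma-separated string from the expected output is what you want, a small variation (sketched on a shortened, partly made-up copy of the data) is to join the values:

```python
import pandas as pd

# Shortened copy of the data from the question ('self' URLs omitted)
df = pd.DataFrame({
    'cust_id': [101, 102],
    'cust_details': [
        [{'value': 'Type-A', 'id': '1'}, {'value': 'Type-B', 'id': '2'}],
        [{'value': 'Type-X', 'id': '5'}],
    ],
})

# Collect each dict's 'value' and join them into one string per customer
df['new_value'] = df['cust_details'].apply(
    lambda ds: ', '.join(d['value'] for d in ds)
)
print(df[['cust_id', 'new_value']])
```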
More complex, potentially better answer:
Rather than storing lists of dictionaries in the first place, I'd recommend making each dictionary a row in the original dataframe.
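# Explode once so each dictionary gets its own row (the index repeats per customer).
details = df['cust_details'].explode()
# join() aligns on the index and tolerates the repeated labels,
# whereas pd.concat(..., axis=1) fails when one side has duplicate index labels.
df = df[['cust_id']].join(
    pd.DataFrame(details.tolist(), index=details.index)
)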
If you need to group values by id, you can do so via standard groupby methods:
df.groupby('cust_id')['value'].apply(list)
This may seem more complex, but depending on your use case it might save you effort in the long run.

Save DataFrame as Json with non unique index

my DF is:
df = pd.DataFrame({'city': ['POA', 'POA', 'SAN'], 'info' : [10,12,5]}, index = [4314902, 4314902, 4300803])
df.index.rename('ID_city', inplace=True)
output:
        city  info
ID_city
4314902  POA    10
4314902  POA    12
4300803  SAN     5
I need to save as json oriented by index. The following command works only when each index is unique.
df.to_json('df.json', orient='index')
Is it possible to save this DataFrame so that, when it finds a duplicate index, it creates an array?
My desired output:
{ 4314902 : [ {'city': 'POA', 'info': 10} , {'city': 'POA', 'info': 12} ]
,4300803 : {'city': 'SAN', 'info': 5} }
I'm not aware of any built-in pandas functionality that handles duplicate index values when exporting JSON with orient='index'.
You could of course build this manually. Merge the columns into one that contains a dict:
cols_as_dict = df.apply(dict, axis=1)
ID_city
4314902 {'city': 'POA', 'info': 10}
4314902 {'city': 'POA', 'info': 12}
4300803 {'city': 'SAN', 'info': 5}
Put rows into lists, grouped by the index:
combined = cols_as_dict.groupby(cols_as_dict.index).apply(list)
ID_city
4300803 [{'city': 'SAN', 'info': 5}]
4314902 [{'city': 'POA', 'info': 10}, {'city': 'POA', ...
Then write the json:
combined.to_json()
'{"4300803":[{"city":"SAN","info":5}],"4314902":[{"city":"POA","info":10},{"city":"POA","info":12}]}'
It creates a list even if there's just a single entry per index. That should make processing actually easier than if you mix the data types (either list of elements or single element).
If you are set on the mixed type (either dict or list of several dicts), then do combined.to_dict(), change the lists with single elements back into their first element, and then dump the json.
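A rough sketch of that last step. Instead of combined.to_dict() it round-trips through combined.to_json() and json.loads, which is my own substitution so that the keys and values are plain Python types before unwrapping:

```python
import json
import pandas as pd

df = pd.DataFrame({'city': ['POA', 'POA', 'SAN'], 'info': [10, 12, 5]},
                  index=[4314902, 4314902, 4300803])
df.index.rename('ID_city', inplace=True)

# Same steps as above: one dict per row, then one list of dicts per index value
cols_as_dict = df.apply(dict, axis=1)
combined = cols_as_dict.groupby(cols_as_dict.index).apply(list)

# Round-trip through JSON to get plain Python types with string keys
records = json.loads(combined.to_json())

# Unwrap single-element lists so unique IDs map directly to a dict
mixed = {key: rows[0] if len(rows) == 1 else rows
         for key, rows in records.items()}

json_text = json.dumps(mixed)
print(json_text)
```

Note that JSON object keys are always strings, so the IDs come out quoted.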
