I need your help converting a multidimensional dict to a pandas DataFrame. I get the dict from a JSON file which I retrieve from an API call (Shopify).
import json
import requests

response = requests.get("URL", auth=("ID", "KEY"))
data = json.loads(response.text)
The "data" dictionary looks as follows:
{'orders': [{'created_at': '2016-09-20T22:04:49+02:00',
'email': 'test#aol.com',
'id': 4314127108,
'line_items': [{'destination_location':
{'address1': 'Teststreet 12',
'address2': '',
'city': 'Berlin',
'country_code': 'DE',
'id': 2383331012,
'name': 'Test Test',
'zip': '10117'},
'gift_card': False,
'name': 'Blueberry Cup'}]
}]}
In this case the dictionary is nested four levels deep, and I would like to convert it into a pandas DataFrame. I tried everything from json_normalize() to pandas.DataFrame.from_dict(), yet I did not manage to get anywhere. When I try to convert the dict to a df, I get columns which contain lists of lists.
Does anyone know how to approach that?
Thanks
EDITED:
Thank you #piRSquared, your solution works fine! However, how would you solve it if there were another product in the order? Because then it does not work. The JSON response of an order with 2 products is as follows (the goal is to have a second row with the same "created_at", "email" etc. columns):
{'orders': [{'created_at': '2016-09-20T22:04:49+02:00',
'email': 'test#aol.com',
'id': 4314127108,
'line_items': [{'destination_location':
{'address1': 'Teststreet 12',
'address2': '',
'city': 'Berlin',
'country_code': 'DE',
'id': 2383331012,
'name': 'Test Test',
'zip': '10117'},
'gift_card': False,
'name': 'Blueberry Cup'},
{'destination_location':
{'address1': 'Teststreet 12',
'address2': '',
'city': 'Berlin',
'country_code': 'DE',
'id': 2383331012,
'name': 'Test Test',
'zip': '10117'},
'gift_card': False,
'name': 'Strawberry Cup'}]
}]}
So the final df should have one row per sold product. Thank you, I really appreciate your help!
There are a number of ways to do this. This is just a way I decided to do it. You need to explore how you want to see this represented, then figure out how to get there.
import pandas as pd

df = pd.DataFrame(data['orders'])
df1 = df.line_items.str[0].apply(pd.Series)
df2 = df1.destination_location.apply(pd.Series)
pd.concat([df.drop(columns='line_items'), df1.drop(columns='destination_location'), df2],
          axis=1, keys=['', 'line_items', 'destination_location'])
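For the edited case with two line items, one possible route (a sketch, assuming a reasonably recent pandas where json_normalize also flattens the nested dicts inside each record) is to explode line_items with record_path and carry the order-level fields along via meta:

import pandas as pd

# Sketch: one row per line item, order-level fields repeated on every row.
# "data" is the parsed Shopify response shown above.
rows = pd.json_normalize(
    data['orders'],
    record_path='line_items',            # explode each sold product into its own row
    meta=['created_at', 'email', 'id'],  # repeat these order-level columns per row
    record_prefix='line_items.',
)
print(rows)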
Related
I know "create pandas dataframe from nested dict" has a lot of entries here, but I have not found an answer that applies to my problem:
I have a dict like this:
{'id': 1,
'creator_user_id': {'id': 12170254,
'name': 'Nicolas',
'email': 'some_mail#some_email_provider.com',
'has_pic': 0,
'pic_hash': None,
'active_flag': True,
'value': 12170254}....,
and after reading it with pandas it looks like this:
df = pd.DataFrame.from_dict(my_dict,orient='index')
print(df)
id 1
creator_user_id {'id': 12170254, 'name': 'Nicolas', 'email': '...
user_id {'id': 12264469, 'name': 'Daniela Giraldo G', ...
person_id {'active_flag': True, 'name': 'Cristina Cardoz...
org_id {'name': 'Cristina Cardozo', 'people_count': 1...
stage_id 2
title Cristina Cardozo
I would like to create a one-row dataframe where, for example, the nested creator_user_id column results in several columns that I can afterwards name creator_user_id_id, creator_user_id_name, etc.
thank you for your time!
Given you want one row, just use json_normalize()
import pandas as pd

pd.json_normalize({'id': 1,
'creator_user_id': {'id': 12170254,
'name': 'Nicolas',
'email': 'some_mail#some_email_provider.com',
'has_pic': 0,
'pic_hash': None,
'active_flag': True,
'value': 12170254}})
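By default json_normalize joins nested keys with a dot (creator_user_id.id, creator_user_id.name, ...); if you want the underscore-style names from your question, pass sep (using the my_dict variable from your code):

# Produces columns named creator_user_id_id, creator_user_id_name, ...
pd.json_normalize(my_dict, sep='_')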
I am trying to loop through the JSON below but I am only getting one item. I understand that hard-coding the index [1] is the reason why. How can I overcome this?
for i in testing['Items']:
    MyFunc = testing['Items'][1]['Id']
    Containers = UrlFormater(MyFunc)
JSON:
{'Items': [{'Id': 'Test1', 'Type': 'Address', 'Text': '',
            'Highlight': '', 'Description': ''},
           {'Id': 'Test2', 'Type': 'Address', 'Text': '',
            'Highlight': '', 'Description': ''}]}
When using a for-loop to iterate over a list, you get a loop variable (which in your example you named i) that holds the current element of the list. For example, if you loop over the list ['Berlin', 'Paris', 'Bern'], i is equal to 'Berlin' in the first pass, 'Paris' in the second pass, and 'Bern' in the last. With this knowledge you can now refactor your code to look like this:
for i in testing['Items']:
    Containers = UrlFormater(i['Id'])
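If you want to keep the result for every item rather than overwrite the same variable on each pass, you could collect them in a list (a small sketch reusing the UrlFormater function from your code):

# One UrlFormater result per item in the list.
containers = [UrlFormater(item['Id']) for item in testing['Items']]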
I'm working with an API and am currently trying to pull data out of it. The challenge I'm having is that the majority of the columns are straightforward and not nested, with the exception of a CustomFields column, which holds a list of all the various custom fields used per record.
Using json_normalize, is there a way to target a nested column to flatten it? I'm trying to fetch and use all the data available from the API, but this one nested column in particular is causing a headache.
The JSON data when retrieved from the API looks like the following. This is just for one customer profile,
[{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
Using json_normalize,
payload = json_normalize(payload_json['Results'])
Here are the results when I run the above code,
Ideally, here is what I would like the final result to look like,
I think I just need to work with the record_path and meta parameters but I'm not totally understanding how they work.
Any ideas? Or would using json_normalize not work in this situation?
Try this. The Key values in your JSON already contain square brackets ('[Location]', '[status]', ...), which is why you see those [ ] in the column names:
import pandas as pd

d = [{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
df = pd.json_normalize(d, record_path=['CustomFields'], meta=[['EmailAddress'], ['Name'], ['Date'], ['State']])
df = df.pivot_table(columns='Key', values='Value', index=['EmailAddress', 'Name'], aggfunc='sum')
print(df)
Output:
Key [Location] [customer_id] [last_visit.1] [location_id] [status]
EmailAddress Name
an_email#gmail.com Al Smith HJGO 9051 2020-02-19 34566 Active
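If you would rather not keep the MultiIndex produced by pivot_table, you can flatten it afterwards (a small sketch continuing from the df above):

# Turn the index levels back into ordinary columns and drop the leftover "Key" axis label.
flat = df.reset_index()
flat.columns.name = None
print(flat)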
This is the JSON:
[{'can_occur_before': False,
'categories': [{'id': 8, 'name': 'Airdrop'}],
'coins': [{'id': 'cashaa', 'name': 'Cashaa', 'symbol': 'CAS'}],
'created_date': '2018-05-26T03:34:05+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'Unsold Token Distribution',
'twitter_account': None,
'vote_count': 125},
{'can_occur_before': False,
'categories': [{'id': 4, 'name': 'Exchange'}],
'coins': [{'id': 'tron', 'name': 'TRON', 'symbol': 'TRX'}],
'created_date': '2018-06-04T03:54:59+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'Indodax Listing',
'twitter_account': '#PutraDwiJuliyan',
'vote_count': 75},
{'can_occur_before': False,
'categories': [{'id': 5, 'name': 'Conference'}],
'coins': [{'id': 'modum', 'name': 'Modum', 'symbol': 'MOD'}],
'created_date': '2018-05-26T03:18:03+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'SAPPHIRE NOW',
'twitter_account': None,
'vote_count': 27},
{'can_occur_before': False,
'categories': [{'id': 4, 'name': 'Exchange'}],
'coins': [{'id': 'apr-coin', 'name': 'APR Coin', 'symbol': 'APR'}],
'created_date': '2018-05-29T17:45:16+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'TopBTC Listing',
'twitter_account': '#cryptoalarm',
'vote_count': 23}]
I want to take all the date_events and append them to a list in chronological order. I currently have this code and am not sure how to order them chronologically.
date = []
for i in getevents:
    date.append(i['date_event'][:10])
Thanks for any help !
A simple way is to build a list and then apply the sort() method:
import json

data = json.load(open('filename.json', 'r'))
dates = [item['date_event'] for item in data]
dates.sort()
Using your example data with the field 'created_date' (the 'date_event' values are all the same) we'll get:
['2018-05-26T03:18:03+01:00',
'2018-05-26T03:34:05+01:00',
'2018-05-29T17:45:16+01:00',
'2018-06-04T03:54:59+01:00']
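Tying this back to your original snippet: because these are ISO 8601 strings with the same UTC offset, plain string sorting already happens to be chronological here, so the truncated dates can be built and sorted in one step (a sketch using the getevents list from your question):

# Sorted list of the YYYY-MM-DD part of each event date.
dates = sorted(item['date_event'][:10] for item in getevents)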
First of all, the date_event values in your array of objects are all the same, so there is not much sense in sorting them. Also, your approach will not get you far; you need to convert the dates to native date/time objects so that you can sort them with a sorting function.
The easiest way to parse properly formatted date/times is to use dateutil.parser.parse, and sorting an existing list is done with list.sort(). I made a quick example of how to use these tools, and I also changed the date_event values to showcase it: https://repl.it/repls/BogusSpecificRate
After you have decoded the JSON string (json.loads) and have a Python list to work with, you can proceed with sorting the list:
from dateutil import parser

# Ascending
events.sort(key=lambda e: parser.parse(e['date_event']))
print([":".join([e['title'], e['date_event']]) for e in events])

# Descending
events.sort(key=lambda e: parser.parse(e['date_event']), reverse=True)
print([":".join([e['title'], e['date_event']]) for e in events])
I have a json file that I imported into pandas. The first column is filled with cells that are in json format. Below is the first cell of 10K cells or so...
df = pd.read_json("test_file.json") # import data
print (df['test_column'].iloc[0]) # print first cell
{'data': [{'time': '2016-03-25', 'id': '54', 'stop': {'length': 38, 'fun_time': False, 'before': '2015-03-24', 'id': '10xd9'}}], 'dataType': 'life', 'weird': '2013-06-15', '_id': 'dirt', '_type': 'what', 'trace': '32', 'timestamp': 1418193255, 'teller': 'jeff', 'work': '1', 'eventCategory': 'so_true', 'eventType': 'complete', 'city': 'CHI', 'type': 'some_type', 'value': '32', 'data': 'river'}
The code above is an approximation of the real data in each cell
Is there a quick way to extract all the key values in the json data, append them as a header to new columns in a pandas, and then add the value to the appropriate row?
Thanks
Try
import json

pd.json_normalize(df.test_column.apply(json.loads))
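Note that the print output in the question suggests the cells may already hold Python dicts rather than JSON strings; in that case the loads step can be skipped (a sketch, assuming the column name from the question):

import pandas as pd

# If the cells are already dicts, normalize the column directly.
pd.json_normalize(df['test_column'].tolist())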