Python - Extract JSON file into headers

I have a JSON file that I imported into pandas. The first column is filled with cells in JSON format. Below is the first of roughly 10K cells:
import pandas as pd
df = pd.read_json("test_file.json")  # import data
print(df['test_column'].iloc[0])  # print first cell
{'data': [{'time': '2016-03-25', 'id': '54', 'stop': {'length': 38, 'fun_time': False, 'before': '2015-03-24', 'id': '10xd9'}}], 'dataType': 'life', 'weird': '2013-06-15', '_id': 'dirt', '_type': 'what', 'trace': '32', 'timestamp': 1418193255, 'teller': 'jeff', 'work': '1', 'eventCategory': 'so_true', 'eventType': 'complete', 'city': 'CHI', 'type': 'some_type', 'value': '32', 'data': 'river'}
The output above is an approximation of the real data in each cell.
Is there a quick way to extract all the keys in the JSON data, add them as headers of new columns in a pandas DataFrame, and then put each value in the appropriate row?
Thanks

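Try the following (in pandas 1.0+ json_normalize is available directly as pd.json_normalize; pd.io.json.json_normalize is deprecated). If the cells already hold dicts rather than JSON strings, drop the .apply(json.loads) step:
import json
pd.json_normalize(df['test_column'].apply(json.loads).tolist())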
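A minimal sketch of the idea on made-up data, assuming cells shaped like a simplified version of the question's sample (the two records, their values and the meta column choices are all hypothetical). Without record_path, nested dicts become dotted columns but the list under 'data' stays as a single object column; with record_path='data' you get one row per element of that list, with the chosen top-level keys repeated on each row:

```python
import json
import pandas as pd

# Hypothetical stand-in for df['test_column']: each cell is a JSON string
# shaped like a trimmed-down version of the question's sample.
cells = pd.Series([
    json.dumps({"data": [{"time": "2016-03-25", "id": "54",
                          "stop": {"length": 38, "fun_time": False}}],
                "dataType": "life", "city": "CHI", "timestamp": 1418193255}),
    json.dumps({"data": [{"time": "2016-04-01", "id": "55",
                          "stop": {"length": 12, "fun_time": True}}],
                "dataType": "work", "city": "NYC", "timestamp": 1418193300}),
])

records = cells.apply(json.loads).tolist()

# Top level only: the list under "data" is kept as one object column.
flat = pd.json_normalize(records)

# One row per element of the inner "data" list; nested "stop" becomes
# "stop.length" / "stop.fun_time", and the meta keys are repeated per row.
rows = pd.json_normalize(records, record_path="data",
                         meta=["dataType", "city", "timestamp"])
print(rows)
```

Which form you want depends on whether the inner list can hold more than one element per cell.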

Related

create a pandas dataframe from python list containing tuple with nested dictionary

I have wrestled with this for a few days now, but can't figure it out.
I'm trying to create a dataframe "account_activity" from the results of an API GET request.
I make an API call and print the result:
account_activities = api.get_activities()
print(account_activities)
returns:
[AccountActivity({ 'activity_type': 'FILL',
'cum_qty': '100',
'id': '20211111105648607::a0ef3f04-ff00-4b8e-834d-54737d89c332',
'leaves_qty': '0',
'order_id': '32c9a40e-e6d2-4c7c-8949-a39ad32b535f',
'order_status': 'filled',
'price': '187.09',
'qty': '56',
'side': 'sell',
'symbol': 'U',
'transaction_time': '2021-11-11T15:56:48.607222Z',
'type': 'fill'})]
How do I create a dataframe "account_activity" where the keys are the column headers, transaction_time is the row index, and the values fill the rows?
Assuming j is the dict (the parsed JSON) behind your AccountActivity object (with several activities, pass the list of dicts instead of [j]):
df = pd.DataFrame([j]).set_index('transaction_time')
How you get the JSON depends on the APIs you're using. Perhaps
j = account_activities[0].__dict__
will work?
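A sketch of the multi-row case, assuming you have already turned each AccountActivity into a plain dict (the two dicts below are hypothetical, trimmed to a few fields, and the second one is invented for illustration). Parsing transaction_time with pd.to_datetime makes the index sort and slice as real timestamps, and since the API returns numbers as strings, the numeric columns are converted explicitly:

```python
import pandas as pd

# Hypothetical plain dicts standing in for the raw data behind each
# AccountActivity object; how you extract them depends on your client library.
activities = [
    {"activity_type": "FILL", "price": "187.09", "qty": "56", "side": "sell",
     "symbol": "U", "transaction_time": "2021-11-11T15:56:48.607222Z"},
    {"activity_type": "FILL", "price": "186.50", "qty": "44", "side": "sell",
     "symbol": "U", "transaction_time": "2021-11-11T15:57:02.000000Z"},
]

account_activity = pd.DataFrame(activities)

# Parse the ISO timestamps (timezone-aware UTC because of the trailing "Z").
account_activity["transaction_time"] = pd.to_datetime(account_activity["transaction_time"])
account_activity = account_activity.set_index("transaction_time").sort_index()

# Prices and quantities arrive as strings; convert them to numbers.
account_activity[["price", "qty"]] = account_activity[["price", "qty"]].astype(float)
print(account_activity)
```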

How to loop through a JSON?

I am trying to loop through the JSON below, but I only ever get one item. I understand that hard-coding the index [1] is the reason why. How can I overcome this?
for i in testing['Items']:
    MyFunc = testing['Items'][1]['Id']
    Containers = UrlFormater(MyFunc)
JSON:
{'Items': [{'Id': 'Test1', 'Type': 'Address', 'Text': '',
'Highlight': '', 'Description': ''}, {'Id': 'Test2', 'Type':
'Address', 'Text': '', 'Highlight': '', 'Description': ''}]}
When you use a for loop to iterate over a list, the loop variable (which you named i) holds the current element of the list. For example, if you loop over the list ['Berlin', 'Paris', 'Bern'], i is 'Berlin' in the first pass, 'Paris' in the second and 'Bern' in the last. With this knowledge you can refactor your code like this:
for i in testing['Items']:
    Containers = UrlFormater(i['Id'])
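Note that the loop above overwrites Containers on every pass. A small sketch that keeps every result, assuming a hypothetical url_formater function in place of the question's UrlFormater (whose body isn't shown); enumerate is included for the case where you also need each item's position:

```python
# Hypothetical stand-in for the question's UrlFormater helper.
def url_formater(item_id):
    return f"https://example.com/items/{item_id}"

testing = {'Items': [
    {'Id': 'Test1', 'Type': 'Address', 'Text': '', 'Highlight': '', 'Description': ''},
    {'Id': 'Test2', 'Type': 'Address', 'Text': '', 'Highlight': '', 'Description': ''},
]}

# Each pass binds `item` to the next dict, so item['Id'] changes every time.
containers = [url_formater(item['Id']) for item in testing['Items']]
print(containers)

# If you also need the position, enumerate yields (index, element) pairs.
for index, item in enumerate(testing['Items']):
    print(index, item['Id'])
```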

Pandas - Extracting values from a Dataframe column

I have a Dataframe in the below format:
cust_id, cust_details
101, [{'self': 'https://website.com/rest/api/2/customFieldOption/1', 'value': 'Type-A', 'id': '1'},
{'self': 'https://website.com/rest/api/2/customFieldOption/2', 'value': 'Type-B', 'id': '2'},
{'self': 'https://website.com/rest/api/2/customFieldOption/3', 'value': 'Type-C', 'id': '3'},
{'self': 'https://website.com/rest/api/2/customFieldOption/4', 'value': 'Type-D', 'id': '4'}]
102, [{'self': 'https://website.com/rest/api/2/customFieldOption/5', 'value': 'Type-X', 'id': '5'},
{'self': 'https://website.com/rest/api/2/customFieldOption/6', 'value': 'Type-Y', 'id': '6'}]
For every cust_id, I am trying to extract all of the 'value' entries in cust_details.
Expected output:
cust_id, new_value
101,Type-A, Type-B, Type-C, Type-D
102,Type-X, Type-Y
Easy answer:
df['new_value'] = df.cust_details.apply(lambda ds: [d['value'] for d in ds])
More complex, potentially better answer:
Rather than storing lists of dictionaries in the first place, I'd recommend making each dictionary a row in the original dataframe.
exploded = df.explode('cust_details').reset_index(drop=True)
df = pd.concat([
    exploded['cust_id'],
    pd.DataFrame(exploded['cust_details'].tolist())
], axis=1)
If you need to group values by id, you can do so via standard groupby methods:
df.groupby('cust_id')['value'].apply(list)
This may seem more complex, but depending on your use case might save you effort in the long-run.
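An end-to-end sketch of both answers on the question's sample data (the 'self' URLs are left out for brevity). The easy answer is joined with ', ' to produce the comma-separated string shown in the expected output; joining is an assumption about the wanted format, since the list form works too:

```python
import pandas as pd

# Sample data from the question, without the 'self' URLs.
df = pd.DataFrame({
    'cust_id': [101, 102],
    'cust_details': [
        [{'value': 'Type-A', 'id': '1'}, {'value': 'Type-B', 'id': '2'},
         {'value': 'Type-C', 'id': '3'}, {'value': 'Type-D', 'id': '4'}],
        [{'value': 'Type-X', 'id': '5'}, {'value': 'Type-Y', 'id': '6'}],
    ],
})

# Easy answer, joined into one string per customer.
df['new_value'] = df['cust_details'].apply(lambda ds: ', '.join(d['value'] for d in ds))

# Long format: one row per dict, each dict key becoming its own column.
long_df = df[['cust_id', 'cust_details']].explode('cust_details').reset_index(drop=True)
long_df = pd.concat(
    [long_df['cust_id'], pd.DataFrame(long_df['cust_details'].tolist())], axis=1
)

# Grouping the long format back gives the same per-customer strings.
grouped = long_df.groupby('cust_id')['value'].agg(', '.join)
print(df[['cust_id', 'new_value']])
```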

Append Dates in Chronological Order

This is the JSON:
[{'can_occur_before': False,
'categories': [{'id': 8, 'name': 'Airdrop'}],
'coins': [{'id': 'cashaa', 'name': 'Cashaa', 'symbol': 'CAS'}],
'created_date': '2018-05-26T03:34:05+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'Unsold Token Distribution',
'twitter_account': None,
'vote_count': 125},
{'can_occur_before': False,
'categories': [{'id': 4, 'name': 'Exchange'}],
'coins': [{'id': 'tron', 'name': 'TRON', 'symbol': 'TRX'}],
'created_date': '2018-06-04T03:54:59+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'Indodax Listing',
'twitter_account': '#PutraDwiJuliyan',
'vote_count': 75},
{'can_occur_before': False,
'categories': [{'id': 5, 'name': 'Conference'}],
'coins': [{'id': 'modum', 'name': 'Modum', 'symbol': 'MOD'}],
'created_date': '2018-05-26T03:18:03+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'SAPPHIRE NOW',
'twitter_account': None,
'vote_count': 27},
{'can_occur_before': False,
'categories': [{'id': 4, 'name': 'Exchange'}],
'coins': [{'id': 'apr-coin', 'name': 'APR Coin', 'symbol': 'APR'}],
'created_date': '2018-05-29T17:45:16+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'TopBTC Listing',
'twitter_account': '#cryptoalarm',
'vote_count': 23}]
I want to take all the date_events and append them to a list in chronological order. I currently have this code and am not sure how to order them chronologically.
date = []
for i in getevents:
    date.append(i['date_event'][:10])
Thanks for any help!
A simple way is to build the list and then call its sort() method:
import json
with open('filename.json') as f:
    data = json.load(f)
dates = [item['date_event'] for item in data]
dates.sort()
Using your example data with the field 'created_date' instead (the 'date_event' values are all the same), we get:
['2018-05-26T03:18:03+01:00',
'2018-05-26T03:34:05+01:00',
'2018-05-29T17:45:16+01:00',
'2018-06-04T03:54:59+01:00']
First of all, all the date_event values in your array of objects are the same, so there is not much sense in sorting them. Also, sorting the raw strings will not get you far in general: they only sort chronologically when every timestamp has the same format and UTC offset. Convert the dates to native date/time objects so that you can sort them with a key function.
The easiest way to parse properly formatted dates/times is dateutil.parser.parse, and an existing list is sorted in place with list.sort(). I made a quick example of how to use these tools (I also changed the date_event values to showcase it): https://repl.it/repls/BogusSpecificRate
After you have decoded the JSON string (json.loads) and have a Python list to work with, you can proceed with sorting the list:
from dateutil import parser
# Ascending
events.sort(key=lambda e: parser.parse(e['date_event']))
print([":".join([e['title'], e['date_event']]) for e in events])
# Descending
events.sort(key=lambda e: parser.parse(e['date_event']), reverse=True)
print([":".join([e['title'], e['date_event']]) for e in events])
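If you would rather avoid the dateutil dependency, a standard-library sketch is possible on Python 3.7+, where datetime.fromisoformat accepts the "+01:00" offsets used here. The events are a trimmed copy of the question's data; breaking ties on created_date (since every date_event is identical) is a design choice added for illustration:

```python
from datetime import datetime

# Trimmed copy of the question's events (only the fields used here).
events = [
    {'title': 'Unsold Token Distribution', 'created_date': '2018-05-26T03:34:05+01:00',
     'date_event': '2018-06-05T00:00:00+01:00'},
    {'title': 'Indodax Listing', 'created_date': '2018-06-04T03:54:59+01:00',
     'date_event': '2018-06-05T00:00:00+01:00'},
    {'title': 'SAPPHIRE NOW', 'created_date': '2018-05-26T03:18:03+01:00',
     'date_event': '2018-06-05T00:00:00+01:00'},
    {'title': 'TopBTC Listing', 'created_date': '2018-05-29T17:45:16+01:00',
     'date_event': '2018-06-05T00:00:00+01:00'},
]

def event_key(e):
    # Timezone-aware datetimes compare correctly even if offsets differ.
    return (datetime.fromisoformat(e['date_event']),
            datetime.fromisoformat(e['created_date']))

# sorted() returns a new list; ties on date_event fall back to created_date.
ordered = sorted(events, key=event_key)
dates = [e['date_event'][:10] for e in ordered]
print([e['title'] for e in ordered])
```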

API Call - Multi dimensional nested dictionary to pandas data frame

I need your help converting a multidimensional dict to a pandas data frame. I get the dict from a JSON response that I retrieve from an API call (Shopify).
response = requests.get("URL", auth=("ID","KEY"))
data = json.loads(response.text)
The "data" dictionary looks as follows:
{'orders': [{'created_at': '2016-09-20T22:04:49+02:00',
'email': 'test#aol.com',
'id': 4314127108,
'line_items': [{'destination_location':
{'address1': 'Teststreet 12',
'address2': '',
'city': 'Berlin',
'country_code': 'DE',
'id': 2383331012,
'name': 'Test Test',
'zip': '10117'},
'gift_card': False,
'name': 'Blueberry Cup'}]
}]}
In this case the dictionary has 4 levels of nesting, and I would like to convert it into a pandas data frame. I tried everything from json_normalize() to pandas.DataFrame.from_dict(), yet I did not manage to get anywhere. When I convert the dict to a df, I get columns that contain lists of dicts.
Does anyone know how to approach that?
Thanks
EDITED:
Thank you @piRSquared. Your solution works fine! However, how would you solve it if there were another product in the order? In that case it doesn't work. The JSON response of an order with 2 products is as follows (the goal is to have a second row with the same "created_at", "email", etc. columns):
{'orders': [{'created_at': '2016-09-20T22:04:49+02:00',
'email': 'test#aol.com',
'id': 4314127108,
'line_items': [{'destination_location':
{'address1': 'Teststreet 12',
'address2': '',
'city': 'Berlin',
'country_code': 'DE',
'id': 2383331012,
'name': 'Test Test',
'zip': '10117'},
'gift_card': False,
'name': 'Blueberry Cup'},
{'destination_location':
{'address1': 'Teststreet 12',
'address2': '',
'city': 'Berlin',
'country_code': 'DE',
'id': 2383331012,
'name': 'Test Test',
'zip': '10117'},
'gift_card': False,
'name': 'Strawberry Cup'}]
}]}
So the final df should have one row per sold product. Thank you, I really appreciate your help!
There are a number of ways to do this; this is just the one I chose. You need to decide how you want the data represented, then figure out how to get there.
import pandas as pd
df = pd.DataFrame(data['orders'])
# .str[0] keeps only the first line item of each order
df1 = df.line_items.str[0].apply(pd.Series)
df2 = df1.destination_location.apply(pd.Series)
pd.concat([df.drop(columns='line_items'), df1.drop(columns='destination_location'), df2],
          axis=1, keys=['', 'line_items', 'destination_location'])
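The answer above keeps only the first line item (.str[0]), which is why the second product disappears. For the edited question (one row per product), a sketch using pd.json_normalize with record_path instead of the concat approach; the "order_" prefix is an assumed naming choice that keeps order-level keys such as id from clashing with line-item keys:

```python
import pandas as pd

# The two-product order from the edited question.
address = {'address1': 'Teststreet 12', 'address2': '', 'city': 'Berlin',
           'country_code': 'DE', 'id': 2383331012, 'name': 'Test Test',
           'zip': '10117'}
data = {'orders': [{'created_at': '2016-09-20T22:04:49+02:00',
                    'email': 'test#aol.com',
                    'id': 4314127108,
                    'line_items': [
                        {'destination_location': address, 'gift_card': False,
                         'name': 'Blueberry Cup'},
                        {'destination_location': address, 'gift_card': False,
                         'name': 'Strawberry Cup'},
                    ]}]}

# One row per line item; the nested address becomes "destination_location.*"
# columns and the order-level fields are repeated on every row.
products = pd.json_normalize(
    data['orders'],
    record_path='line_items',
    meta=['created_at', 'email', 'id'],
    meta_prefix='order_',
)
print(products[['order_id', 'order_created_at', 'name', 'destination_location.city']])
```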
