How to loop through a JSON? - python

I am trying to loop through the JSON below, but I am only getting a single item. I understand that hard-coding the index [1] is the reason why. How can I overcome this?
for i in testing['Items']:
    MyFunc = testing['Items'][1]['Id']
    Containers = UrlFormater(MyFunc)
JSON:
{'Items': [{'Id': 'Test1', 'Type': 'Address', 'Text': '',
            'Highlight': '', 'Description': ''},
           {'Id': 'Test2', 'Type': 'Address', 'Text': '',
            'Highlight': '', 'Description': ''}]}

When you use a for-loop to iterate over a list, you get a loop variable (which you named i in your example) that holds the current element of the list. For example, if you loop over the list ['Berlin', 'Paris', 'Bern'], i is equal to 'Berlin' in the first pass, 'Paris' in the second pass, and 'Bern' in the last. With this knowledge you can refactor your code to look like this:
for i in testing['Items']:
    Containers = UrlFormater(i['Id'])
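For reference, here is a minimal, self-contained sketch of that loop. The testing dict is the JSON from the question, and UrlFormater is stood in by a placeholder since its real implementation isn't shown:

testing = {'Items': [{'Id': 'Test1', 'Type': 'Address', 'Text': '', 'Highlight': '', 'Description': ''},
                     {'Id': 'Test2', 'Type': 'Address', 'Text': '', 'Highlight': '', 'Description': ''}]}

def UrlFormater(item_id):
    # Placeholder for the question's real function.
    return 'https://example.com/lookup/' + item_id

# i is the current item dict on each pass, so no hard-coded index is needed.
for i in testing['Items']:
    Containers = UrlFormater(i['Id'])
    print(Containers)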

Related

pandas split list like object

Hi, I have this column of data named labels:
[{'id': 123456,
  'name': 'John',
  'age': 22,
  'pet': None,
  'gender': 'male',
  'result': [{'id': 'vEo0PIYPEE',
              'type': 'choices',
              'value': {'choices': ['Same Person']},
              'to_name': 'image',
              'from_name': 'person_evaluation'}]}]
[{'id': 123457,
  'name': 'May',
  'age': 21,
  'pet': None,
  'gender': 'female',
  'result': [{'id': 'zTHYuKIOQ',
              'type': 'choices',
              'value': {'choices': ['Different Person']},
              'to_name': 'image',
              'from_name': 'person_evaluation'}]}]
......
I'm not sure what type this is, and I would like to break it down to extract the value [Same Person]; the outcome should be something like this:
0 [Same Person]
1 [Different Person]
....
How should I achieve this?
Based on the limited data that you have provided, would this work?
df['labels_new'] = df['labels'].apply(lambda x: x[0].get('result')[0].get('value').get('choices'))
labels labels_new
0 [{'id': 123456, 'name': 'John', 'age': 22, 'pe... [Same Person]
1 [{'id': 123457, 'name': 'May', 'age': 21, 'pet... [Different Person]
You can use the following as well, but I find dict.get() to be more versatile (it can return a default value, for example) and more forgiving when a key is missing.
df['labels'].apply(lambda x: x[0]['result'][0]['value']['choices'])
You could consider using pd.json_normalize (read more here), but given the current state of your column it is going to be a bit more complex to extract the data that way than by simply using a lambda function.
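As a runnable sketch of the .apply approach, assuming the labels column already holds Python lists rather than raw JSON strings (if they are strings, parse them first with json.loads or ast.literal_eval):

import pandas as pd

# Two sample rows shaped like the question's data (trimmed to the relevant keys).
labels = [
    [{'id': 123456, 'name': 'John',
      'result': [{'id': 'vEo0PIYPEE', 'type': 'choices', 'value': {'choices': ['Same Person']}}]}],
    [{'id': 123457, 'name': 'May',
      'result': [{'id': 'zTHYuKIOQ', 'type': 'choices', 'value': {'choices': ['Different Person']}}]}],
]
df = pd.DataFrame({'labels': labels})

# Drill down: first list element -> 'result' -> first entry -> 'value' -> 'choices'
df['labels_new'] = df['labels'].apply(
    lambda x: x[0].get('result')[0].get('value').get('choices'))
print(df['labels_new'])
# 0         [Same Person]
# 1    [Different Person]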

Remove duplicate values from list of dictionaries

I'm trying to filter my list of dictionaries by two keys. I have a huge list of items and I need to find a way to filter out the items that have repeated 'id' and 'updated_at' values.
Here is the item list example:
items = [{
    'id': 1,
    'updated_at': '11/11/2020T00:00:00',
    'title': 'Some title',
    'value': 'Some value',
    'replies': 1
}, {
    'id': 1,
    'updated_at': '11/11/2020T00:00:00',
    'title': 'This is duplicate by id and updated',
    'value': 'This item should be removed',
    'replies': 1
}, {
    'id': 1,
    'updated_at': '11/11/2020T17:00:10',
    'title': 'This is only duplicate by id',
    'value': 'Some value',
    'replies': 1
}]
I want to remove those dictionaries that have the same 'id' and 'updated_at'. What would be the correct way of doing this?
Instead of a list of dictionaries, why not use a dictionary of dictionaries?
filtered_dict = {(d['id'], d['updated_at']): d for d in list_of_dicts}
Since you mention no preference in your question, this will keep the last occurrence of each duplicate.
You could create your own dict object with a special hash, but this seems easier. If you want a list back, just take list(filtered_dict.values()).
If you happen to want the first match instead, you will have to add a few lines of code:
existing_dicts = set()
filtered_list = []
for d in list_of_dicts:
    if (d['id'], d['updated_at']) not in existing_dicts:
        existing_dicts.add((d['id'], d['updated_at']))
        filtered_list.append(d)
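A short usage sketch, applying the dict-of-dicts approach to the items list from the question:

# Later duplicates overwrite earlier ones, so the exact (id, updated_at) duplicate disappears.
filtered_dict = {(d['id'], d['updated_at']): d for d in items}
deduped = list(filtered_dict.values())
print(len(deduped))   # 2 -- only the item duplicated by both keys was dropped
for d in deduped:
    print(d['id'], d['updated_at'], d['title'])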

Json_Normalize, targeting nested columns within a specific column?

I'm working with an API and am currently trying to pull data out of it. The challenge I'm having is that the majority of the columns are straightforward and not nested, with the exception of a CustomFields column, which holds all the various custom fields in a list per record.
Using json_normalize, is there a way to target a nested column to flatten it? I'm trying to fetch and use all the data available from the API, but one nested column in particular is causing a headache.
The JSON data, when retrieved from the API, looks like the following. This is just for one customer profile:
[{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
Using json_normalize,
payload = json_normalize(payload_json['Results'])
Here are the results when I run the above code: the CustomFields column still contains the raw list of Key/Value dictionaries.
Ideally, I would like each custom field Key to end up as its own column in the final result.
I think I just need to work with the record_path and meta parameters but I'm not totally understanding how they work.
Any ideas? Or would using json_normalize not work in this situation?
Try this. Your JSON is wrapped in square brackets (it is a list), which is why you see those [ ] in the output:
d = [{'EmailAddress': 'an_email#gmail.com', 'Name': 'Al Smith', 'Date': '2020-05-26 14:58:00', 'State': 'Active', 'CustomFields': [{'Key': '[Location]', 'Value': 'HJGO'}, {'Key': '[location_id]', 'Value': '34566'}, {'Key': '[customer_id]', 'Value': '9051'}, {'Key': '[status]', 'Value': 'Active'}, {'Key': '[last_visit.1]', 'Value': '2020-02-19'}]}]
df = pd.json_normalize(d, record_path=['CustomFields'], meta=[['EmailAddress'], ['Name'], ['Date'], ['State']])
df = df.pivot_table(columns='Key', values='Value', index=['EmailAddress', 'Name'], aggfunc='sum')
print(df)
Output:
Key [Location] [customer_id] [last_visit.1] [location_id] [status]
EmailAddress Name
an_email#gmail.com Al Smith HJGO 9051 2020-02-19 34566 Active
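If you would rather have a flat frame than the MultiIndex that pivot_table produces, a small follow-up (not part of the original answer) is to reset the index:

flat = df.reset_index()
flat.columns.name = None   # drop the leftover 'Key' axis name
print(flat)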

Append Dates in Chronological Order

This is the JSON:
[{'can_occur_before': False,
'categories': [{'id': 8, 'name': 'Airdrop'}],
'coins': [{'id': 'cashaa', 'name': 'Cashaa', 'symbol': 'CAS'}],
'created_date': '2018-05-26T03:34:05+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'Unsold Token Distribution',
'twitter_account': None,
'vote_count': 125},
{'can_occur_before': False,
'categories': [{'id': 4, 'name': 'Exchange'}],
'coins': [{'id': 'tron', 'name': 'TRON', 'symbol': 'TRX'}],
'created_date': '2018-06-04T03:54:59+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'Indodax Listing',
'twitter_account': '#PutraDwiJuliyan',
'vote_count': 75},
{'can_occur_before': False,
'categories': [{'id': 5, 'name': 'Conference'}],
'coins': [{'id': 'modum', 'name': 'Modum', 'symbol': 'MOD'}],
'created_date': '2018-05-26T03:18:03+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'SAPPHIRE NOW',
'twitter_account': None,
'vote_count': 27},
{'can_occur_before': False,
'categories': [{'id': 4, 'name': 'Exchange'}],
'coins': [{'id': 'apr-coin', 'name': 'APR Coin', 'symbol': 'APR'}],
'created_date': '2018-05-29T17:45:16+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'TopBTC Listing',
'twitter_account': '#cryptoalarm',
'vote_count': 23}]
I want to take all the date_event values and append them to a list in chronological order. I currently have this code, but I am not sure how to order the dates chronologically.
date = []
for i in getevents:
    date.append(i['date_event'][:10])
Thanks for any help!
A simple way is to build a list and then apply the sort() method:
import json

data = json.load(open('filename.json', 'r'))
dates = [item['date_event'] for item in data]
dates.sort()
Using your example data but sorting the 'created_date' field instead (the 'date_event' values are all identical), we'll get:
['2018-05-26T03:18:03+01:00',
'2018-05-26T03:34:05+01:00',
'2018-05-29T17:45:16+01:00',
'2018-06-04T03:54:59+01:00']
First of all, all the date_event values in your array of objects are the same, so there is not much sense in sorting them. Also, your approach will not get you far: you need to convert the dates to native date/time objects so that you can sort them with a sorting function.
The easiest way to parse properly formatted date/times is dateutil.parser.parse, and sorting an existing list is done with list.sort(). I made a quick example of how to use these tools, and also changed the date_event values to showcase it: https://repl.it/repls/BogusSpecificRate
After you have decoded the JSON string (json.loads) and have a Python list to work with, you can proceed with sorting the list:
from dateutil import parser

# Ascending
events.sort(key=lambda e: parser.parse(e['date_event']))
print([":".join([e['title'], e['date_event']]) for e in events])

# Descending
events.sort(key=lambda e: parser.parse(e['date_event']), reverse=True)
print([":".join([e['title'], e['date_event']]) for e in events])
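If you would rather avoid the dateutil dependency, Python 3.7+ can parse these timestamps (including the +01:00 offset) with the standard library's datetime.fromisoformat, so an equivalent sketch, assuming events is the decoded list from above, would be:

from datetime import datetime

# Ascending by actual date/time, not by string comparison
events.sort(key=lambda e: datetime.fromisoformat(e['date_event']))
dates = [e['date_event'] for e in events]   # now in chronological order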

API Call - Multi dimensional nested dictionary to pandas data frame

I need your help with converting a multidimensional dict to a pandas data frame. I get the dict from a JSON file which I retrieve from an API call (Shopify).
response = requests.get("URL", auth=("ID","KEY"))
data = json.loads(response.text)
The "data" dictionary looks as follows:
{'orders': [{'created_at': '2016-09-20T22:04:49+02:00',
'email': 'test#aol.com',
'id': 4314127108,
'line_items': [{'destination_location':
{'address1': 'Teststreet 12',
'address2': '',
'city': 'Berlin',
'country_code': 'DE',
'id': 2383331012,
'name': 'Test Test',
'zip': '10117'},
'gift_card': False,
'name': 'Blueberry Cup'}]
}]}
In this case the dictionary has four levels of nesting, and I would like to convert it into a pandas data frame. I tried everything from json_normalize() to pandas.DataFrame.from_dict(), yet I did not manage to get anywhere. When I try to convert the dict to a df, I get columns which contain lists of lists.
Does anyone know how to approach that?
Thanks
EDITED:
Thank you @piRSquared, your solution works fine! However, how would you solve it if there were another product in the order? Because then it does not work. The JSON response for an order with 2 products is as follows (the goal is to have a second row with the same "created_at", "email", etc. columns):
{'orders': [{'created_at': '2016-09-20T22:04:49+02:00',
'email': 'test#aol.com',
'id': 4314127108,
'line_items': [{'destination_location':
{'address1': 'Teststreet 12',
'address2': '',
'city': 'Berlin',
'country_code': 'DE',
'id': 2383331012,
'name': 'Test Test',
'zip': '10117'},
'gift_card': False,
'name': 'Blueberry Cup'},
{'destination_location':
{'address1': 'Teststreet 12',
'address2': '',
'city': 'Berlin',
'country_code': 'DE',
'id': 2383331012,
'name': 'Test Test',
'zip': '10117'},
'gift_card': False,
'name': 'Strawberry Cup'}]
}]}
So the final df should have one row per sold product. Thank you, I really appreciate your help!
There are a number of ways to do this. This is just a way I decided to do it. You need to explore how you want to see this represented, then figure out how to get there.
df = pd.DataFrame(data['orders'])
df1 = df.line_items.str[0].apply(pd.Series)
df2 = df1.destination_location.apply(pd.Series)
pd.concat([df.drop('line_items', axis=1), df1.drop('destination_location', axis=1), df2],
          axis=1, keys=['', 'line_items', 'destination_location'])
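For the edited case with two line items per order, one option (a sketch, not from the original answer, assuming data is the decoded response shown above) is pd.json_normalize with record_path and meta, which gives one row per product and flattens destination_location automatically:

import pandas as pd

flat = pd.json_normalize(
    data['orders'],
    record_path='line_items',              # one row per product
    meta=['created_at', 'email', 'id'],    # order-level fields repeated on each row
    meta_prefix='order.',
)
print(flat[['name', 'order.created_at', 'order.email']])
# Blueberry Cup and Strawberry Cup each get their own row,
# with the order's created_at/email/id repeated alongside.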
