How to get value from a dict that is in a dict - python

For example, I have this dictionary.
{'count': 1, 'items': [{'date': 1649523732, 'from_id': 269690832, 'id': 190, 'out': 0, 'attachments': [{'type': 'photo', 'photo': {'album_id': -3, 'date': 1649523732, 'id': 457249932, 'owner_id': 269690832, 'access_key': 'df14603asdd3d26e7a1f5'}}]}]}
I want to get the value of id ('id': 457249932). How do I do this?

The element is nested quite deep: it sits inside a list mapped to the items key, then a list mapped to the attachments key, then a dictionary mapped to the photo key. So we can do:
print(data['items'][0]['attachments'][0]['photo']['id'])
where data is the dictionary that you're indexing on.
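If any of the intermediate keys might be missing, a chained lookup with dict.get and empty defaults avoids a KeyError (a minimal sketch; data here is a trimmed copy of the dictionary above):
data = {'count': 1, 'items': [{'attachments': [{'type': 'photo',
        'photo': {'id': 457249932, 'owner_id': 269690832}}]}]}

items = data.get('items') or [{}]
attachments = items[0].get('attachments') or [{}]
photo = attachments[0].get('photo') or {}
print(photo.get('id'))  # 457249932, or None if any level is missing or empty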

Related

Updating dictionaries based on huge list of dicts

I have a huge (around 350k elements) list of dictionaries:
lst = [
    {'data': 'xxx', 'id': 1456},
    {'data': 'yyy', 'id': 24234},
    {'data': 'zzz', 'id': 3222},
    {'data': 'foo', 'id': 1789},
]
On the other hand, I receive dictionaries (around 550k) one by one with a missing value (not every dict is missing it), which I need to update from:
example_dict = {'key': 'x', 'key2': 'y', 'id': 1456, 'data': None}
To:
example_dict = {'key': 'x', 'key2': 'y', 'id': 1456, 'data': 'xxx'}
And I need to take each dict and search within the list for the matching 'id' and update the 'data'. Doing it this way takes ages to process:
if example_dict['data'] is None:
    for row in lst:
        if row['id'] == example_dict['id']:
            example_dict['data'] = row['data']
Is there a way to build structured, chunked data divided into e.g. 10k values and tell the incoming dict in which chunk to search for its 'id'? Or any other way to optimize this? Any help is appreciated, take care.
Use a dict instead of searching linearly through the list.
The first important optimization is to remove that linear search through lst by building a dict indexed on id that points to the rows.
For example, this will be a lot faster than your code if you have enough RAM to hold all the rows in memory:
row_dict = {row['id']: row for row in lst}

if example_dict['data'] is None:
    if example_dict['id'] in row_dict:
        example_dict['data'] = row_dict[example_dict['id']]['data']
This improvement will be relevant for you whether you process the rows by chunks of 10k or all at once, since dictionary lookup time is constant instead of linear in the size of lst.
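For the full stream of roughly 550k incoming dictionaries, the same constant-time lookup can be wrapped in a small generator (a sketch using the row_dict built above; fill_missing and incoming are illustrative names, not from the original code):
def fill_missing(incoming):
    # incoming is whatever iterable yields the ~550k dicts, one by one
    for example_dict in incoming:
        if example_dict['data'] is None and example_dict['id'] in row_dict:
            example_dict['data'] = row_dict[example_dict['id']]['data']
        yield example_dict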
Make your own chunking process
Next you ask "Is there a way to build a structured chunked data divided...". Yes, absolutely. If the data is too big to fit in memory, write a first-pass function that divides the input into several temporary files based on id. They could be based on the last two digits of the id if order is irrelevant, or on ranges of ids if you prefer. Do that for both the list of rows and the dictionaries you receive, then process each pair of list/dict files covering the same ids, one at a time, with code like the above.
If you have to preserve the order in which you receive the dictionaries, though, this approach will be more difficult to implement.
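A minimal sketch of that first pass, assuming order does not matter and partitioning on the last two digits of the id (the file names, the JSON-lines layout, and the incoming name are illustrative assumptions):
import json

NUM_BUCKETS = 100  # partition on the last two digits of the id

def write_buckets(records, prefix):
    # First pass: split records into bucket files by id % NUM_BUCKETS
    files = [open(f"{prefix}_{n:02d}.jsonl", "w") for n in range(NUM_BUCKETS)]
    try:
        for record in records:
            files[record['id'] % NUM_BUCKETS].write(json.dumps(record) + "\n")
    finally:
        for f in files:
            f.close()

def process_bucket(n):
    # Second pass: index one small rows bucket, then fill in the matching updates bucket
    with open(f"rows_{n:02d}.jsonl") as f:
        row_dict = {r['id']: r for r in map(json.loads, f)}
    with open(f"updates_{n:02d}.jsonl") as f:
        for d in map(json.loads, f):
            if d['data'] is None and d['id'] in row_dict:
                d['data'] = row_dict[d['id']]['data']
            yield d

# write_buckets(lst, "rows")          # the ~350k reference rows
# write_buckets(incoming, "updates")  # the ~550k dicts with missing 'data'
# for n in range(NUM_BUCKETS):
#     for fixed in process_bucket(n):
#         ...  # write out or otherwise consume the updated dicts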
Some preprocessing of the lst list might help a lot, e.g. transforming that list of dicts into a dictionary where id is the key.
To be precise, transform lst into a structure like this:
lst = {
    1456: 'xxx',
    24234: 'yyy',
    3222: 'zzz',
    ...
}
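Building that mapping is a one-line dict comprehension (a sketch; the ids stay integers so that lookups on example_dict['id'] match):
# id -> data, built once from the original list of dicts
lst = {row['id']: row['data'] for row in lst}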
Then, when checking the data attribute in example_dict, just look the id up in lst directly:
if example_dict['data'] is None:
    example_dict['data'] = lst.get(example_dict['id'])
This reduces each lookup from linear in the size of lst to constant time, so the overall work drops from roughly quadratic to linear.
Try creating a hash table (in Python, a dict) from lst to speed up the lookup based on 'id':
lst = [
    {'data': 'xxx', 'id': 1456},
    {'data': 'yyy', 'id': 24234},
    {'data': 'zzz', 'id': 3222},
    {'data': 'foo', 'id': 1789},
]
example_dict = {'key': 'x', 'key2': 'y', 'id': 1456, 'data': None}

dct = {row['id']: row for row in lst}

if example_dict['data'] is None:
    example_dict['data'] = dct[example_dict['id']]['data']

print(example_dict)
Sample output:
{'key': 'x', 'key2': 'y', 'id': 1456, 'data': 'xxx'}

Remove duplicate values from list of dictionaries

I'm trying to filter out my list of dictionaries by two keys. I have a huge list of items and I need to find a way to filter out those items that have repeated 'id' and 'updated_at' keys.
Here is the item list example:
items = [{
    'id': 1,
    'updated_at': '11/11/2020T00:00:00',
    'title': 'Some title',
    'value': 'Some value',
    'replies': 1
}, {
    'id': 1,
    'updated_at': '11/11/2020T00:00:00',
    'title': 'This is duplicate by id and updated',
    'value': 'This item should be removed',
    'replies': 1
}, {
    'id': 1,
    'updated_at': '11/11/2020T17:00:10',
    'title': 'This is only duplicate by id',
    'value': 'Some value',
    'replies': 1
}]
I want to remove those dictionaries that have the same 'id' and 'updated_at'. What would be the correct way of doing this?
Instead of a list of dictionaries, why not a dictionary of dictionaries?
filtered_dict = {(d['id'], d['updated_at']): d for d in list_of_dicts}
Since you mention no preference in your question, note that this keeps the last of each set of duplicates.
You could create your own dict object with a special hash, but this seems easier. If you want a list back, just take list(filtered_dict.values()).
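For example, applied to the items list above (a quick sketch), this keeps the last of the two entries that share the same ('id', 'updated_at') pair:
filtered_dict = {(d['id'], d['updated_at']): d for d in items}
deduped = list(filtered_dict.values())
# deduped has two dicts left: the later (1, '11/11/2020T00:00:00') entry
# and the (1, '11/11/2020T17:00:10') entry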
If you only want the first match instead, you'll have to add a few lines of code:
existing_dicts = set()
filtered_list = []

for d in list_of_dicts:
    if (d['id'], d['updated_at']) not in existing_dicts:
        existing_dicts.add((d['id'], d['updated_at']))
        filtered_list.append(d)

How to extract the value of a given key based on the value of another key, from a list of nested (not-always-existing) dictionaries [Python]

I have a list of dictionaries called api_data, where each dictionary has this structure:
{
    'location': {
        'indoor': 0,
        'exact_location': 0,
        'latitude': '45.502',
        'altitude': '133.9',
        'id': 12780,
        'country': 'IT',
        'longitude': '9.146'
    },
    'sampling_rate': None,
    'id': 91976363,
    'sensordatavalues': [
        {
            'value_type': 'P1',
            'value': '8.85',
            'id': 197572463
        },
        {
            'value_type': 'P2',
            'value': '3.95',
            'id': 197572466
        },
        {
            'value_type': 'temperature',
            'value': '20.80',
            'id': 197572625
        },
        {
            'value_type': 'humidity',
            'value': '97.70',
            'id': 197572626
        }
    ],
    'sensor': {
        'id': 24645,
        'sensor_type': {
            'name': 'DHT22',
            'id': 9,
            'manufacturer': 'various'
        },
        'pin': '7'
    },
    'timestamp': '2020-04-18 18:37:50'
},
This structure is not complete for each dictionary, meaning that sometimes a dictionary, a list element or a key is missing.
I want to extract the value of a key when another key in the same dictionary has a certain value.
For example, within sensordatavalues, I want the value of the key 'value' when 'value_type' is equal to 'P1'.
I have developed this code using for loops and if statements, but I bet it is heavily inefficient.
How can I do it in a quicker and more efficient way?
Please note that sensordatavalues always exists
for sensor in api_data:
    sensordatavalues = sensor['sensordatavalues']
    # L_sdv = len(sensordatavalues)
    for physical_quantity_recorded in sensordatavalues:
        if physical_quantity_recorded['value_type'] == 'P1':
            PM10_value = physical_quantity_recorded['value']
If you are confident that the value 'P1' is unique to the key you are searching, you can use the 'in' operator with dict.values().
It should be fine to omit this assignment: sensordatavalues = sensor['sensordatavalues']
for sensor in api_data:
    for physical_quantity_recorded in sensor['sensordatavalues']:
        if 'P1' in physical_quantity_recorded.values():
            PM10_value = physical_quantity_recorded['value']
Applied to a single record (api_data is a list, so index one element), you just need one for loop:
for x in api_data[0]["sensordatavalues"]:
    if x["value_type"] == "P1":
        print(x["value"])
Output:
8.85
Use the dictionary .get() method: if the key does not exist, it returns a default value instead of raising a KeyError.
for sensor in api_data:
    for physical_quantity_recorded in sensor['sensordatavalues']:
        if physical_quantity_recorded.get('value_type', 'default_value') == 'P1':
            PM10_value = physical_quantity_recorded.get('value', 'default_value')
This is an alternative: jmespath allows you to search and filter a nested dict/JSON.
A quick summary of jmespath: to access a key, use the . notation; if your values are in a list, you access them via the [] notation.
NB: the dict is wrapped in a data variable.
import jmespath

# sensordatavalues is a key, so we can access it directly.
# Its values are wrapped in a list, which we access with the [] notation.
# We want the dict where value_type is P1; in jmespath a filter is
# introduced with ?, and we finally access the key we care about: value.
expression = jmespath.compile('sensordatavalues[?value_type==`P1`].value')
expression.search(data)
['8.85']
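Without a third-party library, next() with a default gives the same per-record extraction across all of api_data (a sketch; it yields None for records that have no P1 entry):
p1_values = [
    next((v['value'] for v in sensor.get('sensordatavalues', [])
          if v.get('value_type') == 'P1'), None)
    for sensor in api_data
]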

How to identify a category and print from a dictionary

New to the world of Python, I am trying to get a list of categories from this dictionary: a list of 'type' and 'subtype'.
I have tried a few different things but no luck; any help would be appreciated.
{'accounts': [{'account_id': 'JqRQG4WVV7IMe3LDG7Ebc97Kjoel4asdrRjqX',
               'balances': {'available': 100,
                            'current': 110,
                            'iso_currency_code': 'USD',
                            'limit': None,
                            'unofficial_currency_code': None},
               'mask': '0000',
               'name': 'Plaid Checking',
               'official_name': 'Plaid Gold Standard 0% Interest Checking',
               'subtype': 'checking',
               'type': 'depository'},
Iterate through the accounts and collect the types and subtypes:
for subdict in original_dictionary['accounts']:
    print('{}:{}'.format(subdict['type'], subdict['subtype']))
If you have to look for types and subtypes in values corresponding to other keys besides the 'accounts' key, you'll have to iterate through the key value pairs of your original dictionary via something like:
for key, value in original_dictionary.items():
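    # A hypothetical continuation (the rest of the dictionary was not shown):
    # only inspect values that look like the 'accounts' list of sub-dicts.
    if isinstance(value, list):
        for subdict in value:
            if isinstance(subdict, dict) and 'type' in subdict and 'subtype' in subdict:
                print('{}:{}'.format(subdict['type'], subdict['subtype']))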

Append Dates in Chronological Order

This is the JSON:
[{'can_occur_before': False,
  'categories': [{'id': 8, 'name': 'Airdrop'}],
  'coins': [{'id': 'cashaa', 'name': 'Cashaa', 'symbol': 'CAS'}],
  'created_date': '2018-05-26T03:34:05+01:00',
  'date_event': '2018-06-05T00:00:00+01:00',
  'title': 'Unsold Token Distribution',
  'twitter_account': None,
  'vote_count': 125},
 {'can_occur_before': False,
  'categories': [{'id': 4, 'name': 'Exchange'}],
  'coins': [{'id': 'tron', 'name': 'TRON', 'symbol': 'TRX'}],
  'created_date': '2018-06-04T03:54:59+01:00',
  'date_event': '2018-06-05T00:00:00+01:00',
  'title': 'Indodax Listing',
  'twitter_account': '#PutraDwiJuliyan',
  'vote_count': 75},
 {'can_occur_before': False,
  'categories': [{'id': 5, 'name': 'Conference'}],
  'coins': [{'id': 'modum', 'name': 'Modum', 'symbol': 'MOD'}],
  'created_date': '2018-05-26T03:18:03+01:00',
  'date_event': '2018-06-05T00:00:00+01:00',
  'title': 'SAPPHIRE NOW',
  'twitter_account': None,
  'vote_count': 27},
 {'can_occur_before': False,
  'categories': [{'id': 4, 'name': 'Exchange'}],
  'coins': [{'id': 'apr-coin', 'name': 'APR Coin', 'symbol': 'APR'}],
  'created_date': '2018-05-29T17:45:16+01:00',
  'date_event': '2018-06-05T00:00:00+01:00',
  'title': 'TopBTC Listing',
  'twitter_account': '#cryptoalarm',
  'vote_count': 23}]
I want to take all the date_events and append them to a list in chronological order. I currently have this code and am not sure how to order them chronologically.
date = []
for i in getevents:
    date.append(i['date_event'][:10])
Thanks for any help !
A simple way is to compose a list and then apply the sort() method:
import json

data = json.load(open('filename.json', 'r'))
dates = [item['date_event'] for item in data]
dates.sort()
Using your example data with the 'created_date' field (the 'date_event' values are all the same), we'll get:
['2018-05-26T03:18:03+01:00',
'2018-05-26T03:34:05+01:00',
'2018-05-29T17:45:16+01:00',
'2018-06-04T03:54:59+01:00']
First of all, the date_event values in your array of objects are all the same, so there is not much sense in sorting by them. Also, your approach will not get you far: you need to convert the dates to native date/time objects so that you can sort them with a sorting function.
The easiest way to parse properly formatted date/times is dateutil.parser.parse, and sorting an existing list is done with list.sort(). I made a quick example of how to use these tools, and changed the date_event values to showcase it: https://repl.it/repls/BogusSpecificRate
After you have decoded the JSON string (json.loads) and have a Python list to work with, you can proceed with sorting the list:
from dateutil import parser

# Ascending
events.sort(key=lambda e: parser.parse(e['date_event']))
print([":".join([e['title'], e['date_event']]) for e in events])

# Descending
events.sort(key=lambda e: parser.parse(e['date_event']), reverse=True)
print([":".join([e['title'], e['date_event']]) for e in events])
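If you would rather avoid the extra dependency, the standard library can parse these timestamps too (a sketch, assuming Python 3.7+ where datetime.fromisoformat accepts the +01:00 offset):
from datetime import datetime

dates = sorted((e['date_event'] for e in events),
               key=datetime.fromisoformat)  # e.g. '2018-06-05T00:00:00+01:00'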
