Remove duplicate values from list of dictionaries - python

I'm trying to filter my list of dictionaries by two keys. I have a huge list of items and I need a way to filter out those items that have repeated 'id' and 'updated_at' values.
Here is the item list example:
items = [{
    'id': 1,
    'updated_at': '11/11/2020T00:00:00',
    'title': 'Some title',
    'value': 'Some value',
    'replies': 1
}, {
    'id': 1,
    'updated_at': '11/11/2020T00:00:00',
    'title': 'This is duplicate by id and updated',
    'value': 'This item should be removed',
    'replies': 1
}, {
    'id': 1,
    'updated_at': '11/11/2020T17:00:10',
    'title': 'This is only duplicate by id',
    'value': 'Some value',
    'replies': 1
}]
I want to remove those dictionaries that have the same 'id' and 'updated_at'. What would be the correct way of doing this?

Instead of a list of dictionaries, why not a dictionary of dictionaries?
filtered_dict = {(d['id'], d['updated_at']): d for d in list_of_dicts}
Since you mention no preference in your question: this keeps the last duplicate, because a later entry with the same key overwrites the earlier one.
You could create your own dict object with a special hash, but this seems easier. If you want a list back, just take list(filtered_dict.values()).
If you only want the first match, you have to add a few lines of code:
existing_dicts = set()
filtered_list = []
for d in list_of_dicts:
    if (d['id'], d['updated_at']) not in existing_dicts:
        existing_dicts.add((d['id'], d['updated_at']))
        filtered_list.append(d)
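The first-match approach above can be wrapped in a small reusable function. This is only a sketch: the function name dedupe_by_keys and its keys parameter are made up for illustration, and the sample data is a shortened version of the list from the question.

```python
def dedupe_by_keys(records, keys=("id", "updated_at")):
    """Return records without duplicates, keeping the first record per key tuple."""
    seen = set()
    unique = []
    for record in records:
        # Build a hashable marker from the values we compare on
        marker = tuple(record[k] for k in keys)
        if marker not in seen:
            seen.add(marker)
            unique.append(record)
    return unique


items = [
    {"id": 1, "updated_at": "11/11/2020T00:00:00", "title": "Some title"},
    {"id": 1, "updated_at": "11/11/2020T00:00:00", "title": "This is duplicate by id and updated"},
    {"id": 1, "updated_at": "11/11/2020T17:00:10", "title": "This is only duplicate by id"},
]

result = dedupe_by_keys(items)
print([item["title"] for item in result])
# ['Some title', 'This is only duplicate by id']
```

Using a set for the seen markers keeps each membership check at roughly constant time, which matters for a huge list.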

Related

How to loop through a JSON?

I am trying to loop through the JSON below, but I am only getting one item. I understand that hardcoding the index [1] is the reason why. How can I overcome this?
for i in testing['Items']:
    MyFunc = testing['Items'][1]['Id']
    Containers = UrlFormater(MyFunc)
JSON:
{'Items': [{'Id': 'Test1', 'Type': 'Address', 'Text': '',
'Highlight': '', 'Description': ''}, {'Id': 'Test2', 'Type':
'Address', 'Text': '', 'Highlight': '', 'Description': ''}
]}
When you use a for loop to iterate through a list, you get a variable (named i in your example) that holds the current element of the list. For example, if you loop over the list ['Berlin', 'Paris', 'Bern'], i is equal to 'Berlin' in the first pass, 'Paris' in the second pass and 'Bern' in the last pass. With this knowledge you can refactor your code to look like this:
for i in testing['Items']:
    Containers = UrlFormater(i['Id'])
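Note that a plain assignment inside the loop keeps only the result of the last pass. If every result is needed, one option is to collect them in a list. The sketch below assumes a stand-in url_formatter function and an example.com URL pattern, since the real UrlFormater is not shown in the question.

```python
testing = {
    "Items": [
        {"Id": "Test1", "Type": "Address", "Text": "", "Highlight": "", "Description": ""},
        {"Id": "Test2", "Type": "Address", "Text": "", "Highlight": "", "Description": ""},
    ]
}


def url_formatter(item_id):
    # Hypothetical stand-in for UrlFormater; the URL pattern is an assumption
    return f"https://example.com/items/{item_id}"


# One formatted result per element of testing["Items"]
containers = [url_formatter(item["Id"]) for item in testing["Items"]]
print(containers)
# ['https://example.com/items/Test1', 'https://example.com/items/Test2']
```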

How to extract the value of a given key based on the value of another key, from a list of nested (not-always-existing) dictionaries [Python]

I have a list of dictionaries called api_data, where each dictionary has this structure:
{
'location':
{
'indoor': 0,
'exact_location': 0,
'latitude': '45.502',
'altitude': '133.9',
'id': 12780,
'country': 'IT',
'longitude': '9.146'
},
'sampling_rate': None,
'id': 91976363,
'sensordatavalues':
[
{
'value_type': 'P1',
'value': '8.85',
'id': 197572463
},
{
'value_type': 'P2',
'value': '3.95',
'id': 197572466
},
{
'value_type': 'temperature',
'value': '20.80',
'id': 197572625
},
{
'value_type': 'humidity',
'value': '97.70',
'id': 197572626
}
],
'sensor':
{
'id': 24645,
'sensor_type':
{
'name': 'DHT22',
'id': 9,
'manufacturer':
'various'
},
'pin': '7'
},
'timestamp': '2020-04-18 18:37:50'
},
This structure is not complete for every dictionary: sometimes a dictionary, a list element or a key is missing.
I want to extract the value of one key when another key in the same dictionary is equal to a certain value.
For example, in the list sensordatavalues, I want the value of the key 'value' when 'value_type' is equal to 'P1'.
I have written this code using for loops and if statements, but I bet it is heavily inefficient.
How can I do it in a quicker and more efficient way?
Please note that sensordatavalues always exists.
for sensor in api_data:
    sensordatavalues = sensor['sensordatavalues']
    # L_sdv = len(sensordatavalues)
    for physical_quantity_recorded in sensordatavalues:
        if physical_quantity_recorded['value_type'] == 'P1':
            PM10_value = physical_quantity_recorded['value']
If you are confident that the value 'P1' can only appear under the key you are searching, you can use the in operator with dict.values().
It should also be fine to omit this assignment: sensordatavalues = sensor['sensordatavalues']
for sensor in api_data:
    for physical_quantity_recorded in sensor['sensordatavalues']:
        if 'P1' in physical_quantity_recorded.values():
            PM10_value = physical_quantity_recorded['value']
If api_data is a single dictionary rather than a list, you just need one for loop:
for x in api_data["sensordatavalues"]:
    if x["value_type"] == "P1":
        print(x["value"])
Output:
8.85
Use the dict.get() method: if the key does not exist, it returns a default value instead of raising a KeyError.
for sensor in api_data:
    for physical_quantity_recorded in sensor.get('sensordatavalues', []):
        if physical_quantity_recorded.get('value_type', 'default_value') == 'P1':
            PM10_value = physical_quantity_recorded.get('value', 'default_value')
This is an alternative: jmespath allows you to search and filter a nested dict/JSON.
Summary of jmespath: to access a key, use the . notation; if your values are in a list, you access them via the [] notation.
NB: the dict is stored in a variable named data.
import jmespath
# sensordatavalues is a key, so we can access it directly
# the values of sensordatavalues are wrapped in a list,
# so we follow the key with brackets []
# we are interested in the dict where value_type is P1
# in jmespath, a filter goes inside the brackets, preceded by ?
# finally we access the key we are interested in: value
expression = jmespath.compile("sensordatavalues[?value_type=='P1'].value")
expression.search(data)
Output:
['8.85']
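Since the question says that dictionaries, list elements or keys may be missing, a plain-Python variant that tolerates gaps may also help. This is a minimal sketch using only the standard library; the helper name find_value is made up, and the second and third sample records are invented to show the missing-data cases.

```python
def find_value(sensor, value_type, default=None):
    """Return the 'value' of the first reading whose 'value_type' matches."""
    # Treat a missing or empty 'sensordatavalues' entry as "no readings"
    readings = sensor.get("sensordatavalues") or []
    return next(
        (r.get("value", default) for r in readings if r.get("value_type") == value_type),
        default,
    )


api_data = [
    {"id": 91976363, "sensordatavalues": [
        {"value_type": "P1", "value": "8.85", "id": 197572463},
        {"value_type": "P2", "value": "3.95", "id": 197572466},
    ]},
    # Hypothetical records: one without a P1 reading, one without readings at all
    {"id": 2, "sensordatavalues": [{"value_type": "temperature", "value": "20.80"}]},
    {"id": 3},
]

pm10_values = [find_value(sensor, "P1") for sensor in api_data]
print(pm10_values)
# ['8.85', None, None]
```

next() stops at the first match, so the inner list is not scanned further once 'P1' has been found.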

How to identify a category and print from a dictionary

New to the world of Python, I am trying to get a list of the categories in this dictionary, i.e. a list of the 'type' and 'subtype' values.
I have tried a few different things but had no luck; any help would be appreciated.
{'accounts': [{'account_id': 'JqRQG4WVV7IMe3LDG7Ebc97Kjoel4asdrRjqX',
'balances': {'available': 100,
'current': 110,
'iso_currency_code': 'USD',
'limit': None,
'unofficial_currency_code': None},
'mask': '0000',
'name': 'Plaid Checking',
'official_name': 'Plaid Gold Standard 0% Interest Checking',
'subtype': 'checking',
'type': 'depository'},
Iterate through the accounts and collect the types and subtypes:
for subdict in original_dictionary['accounts']:
    print('{}:{}'.format(subdict['type'], subdict['subtype']))
If you have to look for types and subtypes in values corresponding to other keys besides the 'accounts' key, you'll have to iterate through the key value pairs of your original dictionary via something like:
for key, value in original_dictionary.items():
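If the goal is a list of distinct categories rather than one printed line per account, the pairs can be collected in a set. The sketch below assumes every account has both 'type' and 'subtype' keys; the second account is invented only to show that repeated pairs collapse into one.

```python
original_dictionary = {
    "accounts": [
        {"name": "Plaid Checking", "subtype": "checking", "type": "depository"},
        # Hypothetical second account with the same category
        {"name": "Second Checking", "subtype": "checking", "type": "depository"},
    ]
}

# A set comprehension drops repeated (type, subtype) pairs
categories = sorted(
    {(account["type"], account["subtype"]) for account in original_dictionary["accounts"]}
)
print(categories)
# [('depository', 'checking')]
```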

Append Dates in Chronological Order

This is the JSON:
[{'can_occur_before': False,
'categories': [{'id': 8, 'name': 'Airdrop'}],
'coins': [{'id': 'cashaa', 'name': 'Cashaa', 'symbol': 'CAS'}],
'created_date': '2018-05-26T03:34:05+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'Unsold Token Distribution',
'twitter_account': None,
'vote_count': 125},
{'can_occur_before': False,
'categories': [{'id': 4, 'name': 'Exchange'}],
'coins': [{'id': 'tron', 'name': 'TRON', 'symbol': 'TRX'}],
'created_date': '2018-06-04T03:54:59+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'Indodax Listing',
'twitter_account': '#PutraDwiJuliyan',
'vote_count': 75},
{'can_occur_before': False,
'categories': [{'id': 5, 'name': 'Conference'}],
'coins': [{'id': 'modum', 'name': 'Modum', 'symbol': 'MOD'}],
'created_date': '2018-05-26T03:18:03+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'SAPPHIRE NOW',
'twitter_account': None,
'vote_count': 27},
{'can_occur_before': False,
'categories': [{'id': 4, 'name': 'Exchange'}],
'coins': [{'id': 'apr-coin', 'name': 'APR Coin', 'symbol': 'APR'}],
'created_date': '2018-05-29T17:45:16+01:00',
'date_event': '2018-06-05T00:00:00+01:00',
'title': 'TopBTC Listing',
'twitter_account': '#cryptoalarm',
'vote_count': 23}]
I want to take all the date_events and append them to a list in chronological order. I currently have this code and am not sure how to order them chronologically.
date = []
for i in getevents:
    date.append(i['date_event'][:10])
Thanks for any help !
A simple way is to build a list and then apply the sort() method:
import json
with open('filename.json') as f:
    data = json.load(f)
dates = [item['date_event'] for item in data]
dates.sort()
Using your example data with the field 'created_date' (the 'date_event' values are all the same), we get:
['2018-05-26T03:18:03+01:00',
'2018-05-26T03:34:05+01:00',
'2018-05-29T17:45:16+01:00',
'2018-06-04T03:54:59+01:00']
First of all, the date_event values in your array of objects are all the same, so there is not much sense in sorting them. Also, your approach will not get you far: you need to convert the dates to native date/time objects so that you can sort them with a sorting function.
The easiest way to parse properly formatted date/times is to use dateutil.parser.parse, and sorting an existing list is done with list.sort(). I made a quick example of how to use these tools, and I also changed the date_event values to showcase it: https://repl.it/repls/BogusSpecificRate
After you have decoded the JSON string (json.loads) and have a Python list to work with, you can proceed with sorting the list:
from dateutil import parser

# Ascending
events.sort(key=lambda e: parser.parse(e['date_event']))
print([":".join([e['title'], e['date_event']]) for e in events])
# Descending
events.sort(key=lambda e: parser.parse(e['date_event']), reverse=True)
print([":".join([e['title'], e['date_event']]) for e in events])
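If installing dateutil is not an option, the standard library can parse this particular format as well. The sketch below assumes Python 3.7 or later, where datetime.fromisoformat accepts offsets such as +01:00. It sorts by created_date instead of date_event, because the date_event values in the question are all identical; the records are shortened copies of the question's data.

```python
from datetime import datetime

events = [
    {"title": "Indodax Listing", "created_date": "2018-06-04T03:54:59+01:00"},
    {"title": "SAPPHIRE NOW", "created_date": "2018-05-26T03:18:03+01:00"},
    {"title": "Unsold Token Distribution", "created_date": "2018-05-26T03:34:05+01:00"},
]

# sorted() returns a new list and leaves the original order untouched
events_sorted = sorted(events, key=lambda e: datetime.fromisoformat(e["created_date"]))
print([e["title"] for e in events_sorted])
# ['SAPPHIRE NOW', 'Unsold Token Distribution', 'Indodax Listing']
```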

Python: list indices must be integers or slices, not str

Hi, I am trying to print a list of strings in Python, but it keeps showing me this error:
"list indices must be integers or slices, not str"
code:
Features = ['entity_number',
'type',
'programs',
'name',
'title',
'addresses']
So here I just want to display the data under 'name'.
Can someone help me resolve this problem?
It looks like you are looking for a dictionary {} and not a list []. A dictionary has the added benefit of storing 'key: value' pairs. If you know your key, you can get your value!
Features = {
    'entity_number': 'some number',
    'type': 'some type',
    'programs': 'some program',
    'name': 'some name',
    'title': 'some title',
    'addresses': 'some address'
}
To find a specific value from a key, you can do the following:
for key, value in Features.items():
    if key == 'name':  # 'name' is the key we wish to get the value from
        print(value)  # print its value
this will give you the output:
some name
I hope this helped.
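When the key is known in advance, the loop is not strictly needed: a dictionary can be indexed by its key directly. The sketch below reuses the placeholder values from the dictionary above; 'missing_key' and 'not found' are made-up names for illustration.

```python
Features = {
    "entity_number": "some number",
    "type": "some type",
    "programs": "some program",
    "name": "some name",
    "title": "some title",
    "addresses": "some address",
}

# Direct lookup by key; raises KeyError if the key is absent
name = Features["name"]
print(name)
# some name

# .get() returns a fallback instead of raising when the key is absent
fallback = Features.get("missing_key", "not found")
print(fallback)
# not found
```

A list, by contrast, only accepts integer positions or slices, which is why indexing a list with a string produces the error from the question.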
Try:
import pandas as pd
features = pd.DataFrame({
    'entity_number': list1,
    'type': list2,
    'programs': list3,
    'name': list4,
    'title': list5,
    'addresses': list6
})
