Remove duplicates from list of dictionaries within list of dictionaries

Remove duplicates from list of dictionaries within list of dictionaries - python

I have list:
my_list = [{'date': '10.06.2016',
'account': [{'name': 'a'},
{'name': 'a'},
{'name': 'b'},
{'name': 'b'}]},
{'date': '22.06.2016',
'account': [{'name': 'a'},
{'name': 'a'}]}]
I want to remove duplicates from the list of dictionaries in 'account':
my_list = [{'date': '10.06.2016',
'account': [{'name': 'a'},
{'name': 'b'}]},
{'date': '22.06.2016',
'account': [{'name': 'a'}]}]
When using set, I get the following error:
TypeError: unhashable type: 'dict'
Can anybody help me with this problem?

This structure is probably over complicated, but it gets the job done.
my_list = [{'date': '10.06.2016',
'account': [{'name': 'a'},
{'name': 'a'},
{'name': 'b'},
{'name': 'b'}]},
{'date': '22.06.2016',
'account': [{'name': 'a'},
{'name': 'a'}]}]
>>> [{'date': date,
'account': [{'name': name} for name in group]
} for group, date in zip([set(account.get('name')
for account in item.get('account'))
for item in my_list],
[d.get('date') for d in my_list])]
[{'account': [{'name': 'a'}, {'name': 'b'}], 'date': '10.06.2016'},
{'account': [{'name': 'a'}], 'date': '22.06.2016'}]

def deduplicate_account_names(l):
for d in l:
names = set(map(lambda d: d.get('name'), d['account']))
d['account'] = [{'name': name} for name in names]
# even shorter:
# def deduplicate_account_names(l):
# for d in l:
# d['account'] = [{'name': name} for name in set(map(lambda d: d.get('name'), d['account']))]
my_list = [{'date': '10.06.2016',
'account': [{'name': 'a'},
{'name': 'a'},
{'name': 'b'},
{'name': 'b'}]},
{'date': '22.06.2016',
'account': [{'name': 'a'},
{'name': 'a'}]}]
deduplicate_account_names(my_list)
print(my_list)
# [ {'date': '10.06.2016',
# 'account': [ {'name': 'a'},
# {'name': 'b'} ] },
# {'date': '22.06.2016',
# 'account': [ {'name': 'a'} ] } ]

Sets can only have hashable members and neither lists nor dicts are - but they can be checked for equality.
you can do
def without_duplicates(inlist):
outlist=[]
for e in inlist:
if e not in outlist:
outlist.append(e)
return outlist
this can be slow for really big lists

Give this code a try:
for d in my_list:
for k in d:
if k == 'account':
v = []
for d2 in d[k]:
if d2 not in v:
v.append(d2)
d[k] = v
This is what you get after running the snippet above:
In [347]: my_list
Out[347]:
[{'account': [{'name': 'a'}, {'name': 'b'}], 'date': '10.06.2016'},
{'account': [{'name': 'a'}], 'date': '22.06.2016'}]

Related

maintain dictionary structure while reducing nested dictionary

I have a list of pairs of nested dict dd and would like to maintain the structure to a list of dictionaries:
dd = [
[{'id': 'bla',
'detail': [{'name': 'discard', 'amount': '123'},
{'name': 'KEEP_PAIR_1A', 'amount': '2'}]},
{'id': 'bla2',
'detail': [{'name': 'discard', 'amount': '123'},
{'name': 'KEEP_PAIR_1B', 'amount': '1'}]}
],
[{'id': 'bla3',
'detail': [{'name': 'discard', 'amount': '123'},
{'name': 'KEEP_PAIR_2A', 'amount': '3'}]},
{'id': 'bla4',
'detail': [{'name': 'discard', 'amount': '123'},
{'name': 'KEEP_PAIR_2B', 'amount': '4'}]}
]
]
I want to reduce this to a list of paired dictionaries while extracting only some detail. For example, an expected output may look like this:
[{'name': ['KEEP_PAIR_1A', 'KEEP_PAIR_1B'], 'amount': [2, 1]},
{'name': ['KEEP_PAIR_2A', 'KEEP_PAIR_2B'], 'amount': [3, 4]}]
I have run my code:
pair=[]
for all_pairs in dd:
for output_pairs in all_pairs:
for d in output_pairs.get('detail'):
if d['name'] != 'discard':
pair.append(d)
output_pair = {
k: [d.get(k) for d in pair]
for k in set().union(*pair)
}
But it didn't maintain that structure :
{'name': ['KEEP_PAIR_1A', 'KEEP_PAIR_1B', 'KEEP_PAIR_2A', 'KEEP_PAIR_2B'],
'amount': ['2', '1', '3', '4']}
I assume I would need to use some list comprehension to solve this but where in the for loop should I do that to maintain the structure.

Since you want to combine dictionaries in lists, one option is to use dict.setdefault:
pair = []
for all_pairs in dd:
dct = {}
for output_pairs in all_pairs:
for d in output_pairs.get('detail'):
if d['name'] != 'discard':
for k,v in d.items():
dct.setdefault(k, []).append(v)
pair.append(dct)
Output:
[{'name': ['KEEP_PAIR_1A', 'KEEP_PAIR_1B'], 'amount': [2, 1]},
{'name': ['KEEP_PAIR_2A', 'KEEP_PAIR_2B'], 'amount': [3, 4]}]

Remove duplicate dict based on field values

Given the following list of dicts, I want to remove duplicates where all fields are identical except for the id field.
old_data = [
{"id":"01","name":"harry","age":21},
{"id":"02","name":"barry","age":32},
{"id":"03","name":"harry","age":44},
{"id":"04","name":"harry","age":21},
{"id":"05","name":"larry","age":66}
]
To produce the following:
new_data = [
{"id":"01","name":"harry","age":21},
{"id":"02","name":"barry","age":32},
{"id":"03","name":"harry","age":44},
{"id":"05","name":"larry","age":66}
]
My current code only works for cases where all fields of the dictionary are identical:
#! /usr/bin/python
for x in old_data:
if x not in new_d:
new_data.append(x)

Build a dict with the significant part of the dict as the key, then turn the values back into a list:
>>> old_data = [
... {"id":"01","name":"harry","age":21},
... {"id":"02","name":"barry","age":32},
... {"id":"03","name":"harry","age":44},
... {"id":"04","name":"harry","age":21},
... {"id":"05","name":"larry","age":66}
...
>>> sorted({(d["name"], d["age"]): d for d in reversed(old_data)}.values(), key=lambda d: d["id"])
[{'id': '01', 'name': 'harry', 'age': 21}, {'id': '02', 'name': 'barry', 'age': 32}, {'id': '03', 'name': 'harry', 'age': 44}, {'id': '05', 'name': 'larry', 'age': 66}]
If you don't care about which specific ids you keep or how they're sorted, it's simpler:
>>> list({(d["name"], d["age"]): d for d in old_data}.values())
[{'id': '04', 'name': 'harry', 'age': 21}, {'id': '02', 'name': 'barry', 'age': 32}, {'id': '03', 'name': 'harry', 'age': 44}, {'id': '05', 'name': 'larry', 'age': 66}]

try this: I ignore id in my comparison
def remove_duplicate(old_data):
new_data = []
for i in old_data:
found=False
for j in new_data:
if (j['name']==i['name']) & (j['age']==i['age']):
found=True
break;
if found==False:
new_data.append(i)
return new_data
old_data = [
{"id":"01","name":"harry","age":21},
{"id":"02","name":"barry","age":32},
{"id":"03","name":"harry","age":44},
{"id":"04","name":"harry","age":21},
{"id":"05","name":"larry","age":66}
]
print(remove_duplicate(old_data))
output:
[{'id': '01', 'name': 'harry', 'age': 21}, {'id': '02', 'name': 'barry', 'age': 32}, {'id': '03', 'name': 'harry', 'age': 44}, {'id': '05', 'name': 'larry', 'age': 66}]

a straight forward solution could be just to keep track of dicts in list.
old_data = [
{"id":"01","name":"harry","age":21},
{"id":"02","name":"barry","age":32},
{"id":"03","name":"harry","age":44},
{"id":"04","name":"harry","age":21},
{"id":"05","name":"larry","age":66}
]
track_list = []
new_data = []
for obj in old_data:
if [obj['name'], obj['age']] in track_list:
continue
else:
track_list.append([obj['name'], obj['age']])
new_data.append(obj)
print(new_data)
output
[{'id': '01', 'name': 'harry', 'age': 21}, {'id': '02', 'name': 'barry', 'age': 32}, {'id': '03', 'name': 'harry', 'age': 44}, {'id': '05', 'name': 'larry', 'age': 66}]

Only hardcoding 'id', not the other keys:
tmp = {}
for d in old_data:
k = frozenset(d.items() - {('id', d['id'])})
tmp.setdefault(k, d)
new_data = list(tmp.values())

How to reorder a list in Python based on its content

I have a list of dictionaries in python like this;
l = [{'name': 'John', 'age': 23},
{'name': 'Steve', 'age': 35},
{'name': 'Helen'},
{'name': 'George'},
{'name': 'Jessica', 'age': 23}]
What I am trying to achieve here is reorder the elements of l in such a way that each entry containing the key age move to the end of the list like this;
End result:
l = [{'name': 'Helen'},
{'name': 'George'},
{'name': 'Jessica', 'age': 23},
{'name': 'John', 'age': 23},
{'name': 'Steve', 'age': 35}]
How can I do this?

You can sort the list:
l.sort(key=lambda d: 'age' in d)
The key returns either True or False, based on the presence of the 'age' key; True is sorted after False. Python's sort is stable, leaving the rest of the relative ordering intact.
Demo:
>>> from pprint import pprint
>>> l = [{'name': 'John', 'age': 23},
... {'name': 'Steve', 'age': 35},
... {'name': 'Helen'},
... {'name': 'George'},
... {'name': 'Jessica', 'age': 23}]
>>> l.sort(key=lambda d: 'age' in d)
>>> pprint(l)
[{'name': 'Helen'},
{'name': 'George'},
{'age': 23, 'name': 'John'},
{'age': 35, 'name': 'Steve'},
{'age': 23, 'name': 'Jessica'}]
If you also wanted to sort by age, then retrieve the age value and return a suitable stable sentinel for those entries that do not have an age, but which will be sorted first. float('-inf') will always be sorted before any other number, for example:
l.sort(key=lambda d: d.get('age', float('-inf')))
Again, entries without an age are left in their original relative order:
>>> l.sort(key=lambda d: d.get('age', float('-inf')))
>>> pprint(l)
[{'name': 'Helen'},
{'name': 'George'},
{'age': 23, 'name': 'John'},
{'age': 23, 'name': 'Jessica'},
{'age': 35, 'name': 'Steve'}]

How to combine values in python list of dictionaries

I have a list of dictionaries that look like this:
l = [{'name': 'john', 'amount': 50}, {'name': 'al', 'amount': 20}, {'name': 'john', 'amount': 80}]
is there any way to combine/merge the matching name values dictionaries and sum the amount also?

You can use a collections.Counter() object to map names to amounts, summing them as you go along:
from collections import Counter
summed = Counter()
for d in l:
summed[d['name']] += d['amount']
result = [{'name': name, 'amount': amount} for name, amount in summed.most_common()]
The result is then also sorted by amount (highest first):
>>> from collections import Counter
>>> l = [{'name': 'john', 'amount': 50}, {'name': 'al', 'amount': 20}, {'name': 'john', 'amount': 80}]
>>> summed = Counter()
>>> for d in l:
... summed[d['name']] += d['amount']
...
>>> summed
Counter({'john': 130, 'al': 20})
>>> [{'name': name, 'amount': amount} for name, amount in summed.most_common()]
[{'amount': 130, 'name': 'john'}, {'amount': 20, 'name': 'al'}]

Duplicate python dict for each value

In a list containing dictionaries, how do I split it based on unique values of dictionaries? So for instance, this:
t = [
{'name': 'xyz', 'value': ['K','L', 'M', 'N']},
{'name': 'abc', 'value': ['O', 'P', 'K']}
]
becomes this:
t = [
{'name': 'xyz', 'value': 'K'},
{'name': 'xyz', 'value': 'L'},
{'name': 'xyz', 'value': 'M'},
{'name': 'xyz', 'value': 'N'},
{'name': 'abc', 'value': 'O'},
{'name': 'xyz', 'value': 'P'},
{'name': 'xyz', 'value': 'K'}
]

You can do this with a list comprehension. Iterate through each dictionary d, and create a new dictionary for each value in d['values']:
>>> t = [ dict(name=d['name'], value=v) for d in t for v in d['value'] ]
>>> t
[{'name': 'xyz', 'value': 'K'},
{'name': 'xyz', 'value': 'L'},
{'name': 'xyz', 'value': 'M'},
{'name': 'xyz', 'value': 'N'},
{'name': 'abc', 'value': 'O'},
{'name': 'abc', 'value': 'P'},
{'name': 'abc', 'value': 'K'}]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Remove duplicates from list of dictionaries within list of dictionaries - python

Sets can only have hashable members and neither lists nor dicts are - but they can be checked for equality. you can do def without_duplicates(inlist): outlist=[] for e in inlist: if e not in outlist: outlist.append(e) return outlist this can be slow for really big lists

Related

maintain dictionary structure while reducing nested dictionary

Remove duplicate dict based on field values

How to reorder a list in Python based on its content

How to combine values in python list of dictionaries

Duplicate python dict for each value

Categories

Resources