How to combine values in a Python list of dictionaries

I have a list of dictionaries that look like this:
l = [{'name': 'john', 'amount': 50}, {'name': 'al', 'amount': 20}, {'name': 'john', 'amount': 80}]
Is there any way to combine/merge the dictionaries with matching name values and also sum the amount?

You can use a collections.Counter() object to map names to amounts, summing them as you go along:
from collections import Counter
summed = Counter()
for d in l:
    summed[d['name']] += d['amount']
result = [{'name': name, 'amount': amount} for name, amount in summed.most_common()]
The result is then also sorted by amount (highest first):
>>> from collections import Counter
>>> l = [{'name': 'john', 'amount': 50}, {'name': 'al', 'amount': 20}, {'name': 'john', 'amount': 80}]
>>> summed = Counter()
>>> for d in l:
...     summed[d['name']] += d['amount']
...
>>> summed
Counter({'john': 130, 'al': 20})
>>> [{'name': name, 'amount': amount} for name, amount in summed.most_common()]
[{'amount': 130, 'name': 'john'}, {'amount': 20, 'name': 'al'}]
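If the names should stay in first-seen order instead of being sorted by amount, a plain dict can do the summing. A minimal sketch, assuming Python 3.7+ (where dicts preserve insertion order); the names totals and result are only illustrative:

```python
l = [{'name': 'john', 'amount': 50},
     {'name': 'al', 'amount': 20},
     {'name': 'john', 'amount': 80}]

# Map each name to its running total; the dict keeps first-seen order.
totals = {}
for d in l:
    totals[d['name']] = totals.get(d['name'], 0) + d['amount']

# Rebuild the list of dictionaries from the totals.
result = [{'name': name, 'amount': amount} for name, amount in totals.items()]
print(result)
# [{'name': 'john', 'amount': 130}, {'name': 'al', 'amount': 20}]
```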

Related

maintain dictionary structure while reducing nested dictionary

I have a list dd of pairs of nested dicts and would like to keep that pairing when reducing it to a list of dictionaries:
dd = [
[{'id': 'bla',
'detail': [{'name': 'discard', 'amount': '123'},
{'name': 'KEEP_PAIR_1A', 'amount': '2'}]},
{'id': 'bla2',
'detail': [{'name': 'discard', 'amount': '123'},
{'name': 'KEEP_PAIR_1B', 'amount': '1'}]}
],
[{'id': 'bla3',
'detail': [{'name': 'discard', 'amount': '123'},
{'name': 'KEEP_PAIR_2A', 'amount': '3'}]},
{'id': 'bla4',
'detail': [{'name': 'discard', 'amount': '123'},
{'name': 'KEEP_PAIR_2B', 'amount': '4'}]}
]
]
I want to reduce this to a list of paired dictionaries while extracting only some detail. For example, an expected output may look like this:
[{'name': ['KEEP_PAIR_1A', 'KEEP_PAIR_1B'], 'amount': [2, 1]},
{'name': ['KEEP_PAIR_2A', 'KEEP_PAIR_2B'], 'amount': [3, 4]}]
I have run my code:
pair = []
for all_pairs in dd:
    for output_pairs in all_pairs:
        for d in output_pairs.get('detail'):
            if d['name'] != 'discard':
                pair.append(d)
output_pair = {
    k: [d.get(k) for d in pair]
    for k in set().union(*pair)
}
But it didn't maintain that structure:
{'name': ['KEEP_PAIR_1A', 'KEEP_PAIR_1B', 'KEEP_PAIR_2A', 'KEEP_PAIR_2B'],
'amount': ['2', '1', '3', '4']}
I assume I would need some list comprehension to solve this, but where in the for loop should I put it to maintain the structure?
Since you want to combine dictionaries in lists, one option is to use dict.setdefault:
pair = []
for all_pairs in dd:
    dct = {}  # one fresh dict per pair keeps the pairs separate
    for output_pairs in all_pairs:
        for d in output_pairs.get('detail'):
            if d['name'] != 'discard':
                for k, v in d.items():
                    dct.setdefault(k, []).append(v)
    pair.append(dct)
Output:
[{'name': ['KEEP_PAIR_1A', 'KEEP_PAIR_1B'], 'amount': ['2', '1']},
{'name': ['KEEP_PAIR_2A', 'KEEP_PAIR_2B'], 'amount': ['3', '4']}]
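Note that the amounts in dd are strings, so the loop above collects them as strings. If integers are wanted, as in the expected output of the question, the values can be converted while collecting. A sketch, assuming every 'amount' holds a valid integer string:

```python
dd = [
    [{'id': 'bla', 'detail': [{'name': 'discard', 'amount': '123'},
                              {'name': 'KEEP_PAIR_1A', 'amount': '2'}]},
     {'id': 'bla2', 'detail': [{'name': 'discard', 'amount': '123'},
                               {'name': 'KEEP_PAIR_1B', 'amount': '1'}]}],
    [{'id': 'bla3', 'detail': [{'name': 'discard', 'amount': '123'},
                               {'name': 'KEEP_PAIR_2A', 'amount': '3'}]},
     {'id': 'bla4', 'detail': [{'name': 'discard', 'amount': '123'},
                               {'name': 'KEEP_PAIR_2B', 'amount': '4'}]}],
]

pair = []
for all_pairs in dd:
    dct = {}
    for output_pairs in all_pairs:
        for d in output_pairs.get('detail'):
            if d['name'] != 'discard':
                dct.setdefault('name', []).append(d['name'])
                # Assumption: 'amount' is always an integer string.
                dct.setdefault('amount', []).append(int(d['amount']))
    pair.append(dct)

print(pair)
# [{'name': ['KEEP_PAIR_1A', 'KEEP_PAIR_1B'], 'amount': [2, 1]},
#  {'name': ['KEEP_PAIR_2A', 'KEEP_PAIR_2B'], 'amount': [3, 4]}]
```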

Python find duplicated dicts in list and separate them with counting

I have dicts in a list, and some of the dicts are identical. I want to find the duplicated ones and add them to a new list or dictionary along with how many duplicates each one has.
import itertools
myListCombined = list()
for a, b in itertools.combinations(myList, 2):
    is_equal = set(a.items()) - set(b.items())
    if len(is_equal) == 0:
        a.update(count=2)
        myListCombined.append(a)
    else:
        a.update(count=1)
        b.update(count=1)
        myListCombined.append(a)
        myListCombined.append(b)
myListCombined = [i for n, i in enumerate(myListCombined) if i not in myListCombined[n + 1:]]
This code kind of works, but only when a dict is duplicated exactly twice in the list; a.update(count=2) won't work when there are more duplicates.
I'm also deleting the duplicated dicts after separating them in the last line, but I'm not sure that is going to work well.
Input:
[{'name': 'Mary', 'age': 25, 'salary': 1000},
{'name': 'John', 'age': 25, 'salary': 2000},
{'name': 'George', 'age': 30, 'salary': 2500},
{'name': 'John', 'age': 25, 'salary': 2000},
{'name': 'John', 'age': 25, 'salary': 2000}]
Desired Output:
[{'name': 'Mary', 'age': 25, 'salary': 1000, 'count':1},
{'name': 'John', 'age': 25, 'salary': 2000, 'count': 3},
{'name': 'George', 'age': 30, 'salary': 2500, 'count': 1}]
You could try the following, which first converts each dictionary to a frozenset of key,value tuples (so that they are hashable as required by collections.Counter).
import collections
a = [{'a':1}, {'a':1},{'b':2}]
print(collections.Counter(map(lambda x: frozenset(x.items()),a)))
Edit to reflect your desired input/output:
from copy import deepcopy
def count_duplicate_dicts(list_of_dicts):
    # Count against an unmodified copy, since adding 'count' changes each dict.
    cpy = deepcopy(list_of_dicts)
    for d in list_of_dicts:
        d['count'] = cpy.count(d)
    return list_of_dicts
x = [{'a':1},{'a':1}, {'c':3}]
print(count_duplicate_dicts(x))
If your dict data is well structured, the contents of the dicts are simple data types (e.g. numbers and strings), and you have further data analysis to do afterwards, I would suggest you use pandas, which provides a rich set of functions. Here is some sample code for your case:
In [31]: import pandas as pd
In [32]: data = [{'name': 'Mary', 'age': 25, 'salary': 1000},
...: {'name': 'John', 'age': 25, 'salary': 2000},
...: {'name': 'George', 'age': 30, 'salary': 2500},
...: {'name': 'John', 'age': 25, 'salary': 2000},
...: {'name': 'John', 'age': 25, 'salary': 2000}]
...:
...: df = pd.DataFrame(data)
...: df['counts'] = 1
...: df = df.groupby(df.columns.tolist()[:-1]).sum().reset_index(drop=False)
...:
In [33]: df
Out[33]:
   age    name  salary  counts
0   25    John    2000       3
1   25    Mary    1000       1
2   30  George    2500       1
In [34]: df.to_dict(orient='records')
Out[34]:
[{'age': 25, 'counts': 3, 'name': 'John', 'salary': 2000},
{'age': 25, 'counts': 1, 'name': 'Mary', 'salary': 1000},
{'age': 30, 'counts': 1, 'name': 'George', 'salary': 2500}]
The logic is:
(1) First build the DataFrame from your data.
(2) The groupby function applies an aggregate function (here, sum) to each group.
(3) To convert the result back to a list of dicts, call df.to_dict(orient='records').
Pandas is a big package and takes some time to learn, but it is worth knowing. It is powerful enough to make your data analysis considerably faster and more elegant.
Thanks.
You can try this:
import collections
d = [{'name': 'Mary', 'age': 25, 'salary': 1000},
{'name': 'John', 'age': 25, 'salary': 2000},
{'name': 'George', 'age': 30, 'salary': 2500},
{'name': 'John', 'age': 25, 'salary': 2000},
{'name': 'John', 'age': 25, 'salary': 2000}]
count = dict(collections.Counter([i["name"] for i in d]))
a = list(set(map(tuple, [i.items() for i in d])))
final_dict = [dict(list(i)+[("count", count[dict(i)["name"]])]) for i in a]
Output:
[{'salary': 2000, 'count': 3, 'age': 25, 'name': 'John'}, {'salary': 2500, 'count': 1, 'age': 30, 'name': 'George'}, {'salary': 1000, 'count': 1, 'age': 25, 'name': 'Mary'}]
You can take the count values using collections.Counter and then rebuild the dicts after adding the count value from the Counter to each frozenset:
from collections import Counter
l = [dict(d | {('count', c)}) for d, c in Counter(frozenset(d.items())
for d in myList).items()]
print(l)
# [{'salary': 1000, 'name': 'Mary', 'age': 25, 'count': 1},
# {'name': 'John', 'salary': 2000, 'age': 25, 'count': 3},
# {'salary': 2500, 'name': 'George', 'age': 30, 'count': 1}]
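The frozenset-based answers above rebuild each dict from a frozenset, so the key order inside the rebuilt dicts is not guaranteed. A sketch that keeps both the original key order and the first-seen order of the records, assuming all values are hashable and Python 3.7+; count_duplicates is a hypothetical helper name:

```python
def count_duplicates(records):
    """Return one copy of each distinct dict with a 'count' key added."""
    counts = {}  # hashable key -> number of occurrences
    first = {}   # hashable key -> first dict seen with that content
    for rec in records:
        key = frozenset(rec.items())  # assumes all values are hashable
        counts[key] = counts.get(key, 0) + 1
        first.setdefault(key, rec)
    # Build new dicts so the input records are left unmodified.
    return [{**first[key], 'count': n} for key, n in counts.items()]


my_list = [{'name': 'Mary', 'age': 25, 'salary': 1000},
           {'name': 'John', 'age': 25, 'salary': 2000},
           {'name': 'George', 'age': 30, 'salary': 2500},
           {'name': 'John', 'age': 25, 'salary': 2000},
           {'name': 'John', 'age': 25, 'salary': 2000}]

counted = count_duplicates(my_list)
print(counted)
# [{'name': 'Mary', 'age': 25, 'salary': 1000, 'count': 1},
#  {'name': 'John', 'age': 25, 'salary': 2000, 'count': 3},
#  {'name': 'George', 'age': 30, 'salary': 2500, 'count': 1}]
```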

Remove duplicates from list of dictionaries within list of dictionaries

I have a list:
my_list = [{'date': '10.06.2016',
'account': [{'name': 'a'},
{'name': 'a'},
{'name': 'b'},
{'name': 'b'}]},
{'date': '22.06.2016',
'account': [{'name': 'a'},
{'name': 'a'}]}]
I want to remove duplicates from the list of dictionaries in 'account':
my_list = [{'date': '10.06.2016',
'account': [{'name': 'a'},
{'name': 'b'}]},
{'date': '22.06.2016',
'account': [{'name': 'a'}]}]
When using set, I get the following error:
TypeError: unhashable type: 'dict'
Can anybody help me with this problem?
This structure is probably overcomplicated, but it gets the job done.
my_list = [{'date': '10.06.2016',
'account': [{'name': 'a'},
{'name': 'a'},
{'name': 'b'},
{'name': 'b'}]},
{'date': '22.06.2016',
'account': [{'name': 'a'},
{'name': 'a'}]}]
>>> [{'date': date,
'account': [{'name': name} for name in group]
} for group, date in zip([set(account.get('name')
for account in item.get('account'))
for item in my_list],
[d.get('date') for d in my_list])]
[{'account': [{'name': 'a'}, {'name': 'b'}], 'date': '10.06.2016'},
{'account': [{'name': 'a'}], 'date': '22.06.2016'}]
def deduplicate_account_names(l):
    for d in l:
        names = set(map(lambda d: d.get('name'), d['account']))
        d['account'] = [{'name': name} for name in names]
# even shorter:
# def deduplicate_account_names(l):
#     for d in l:
#         d['account'] = [{'name': name} for name in set(map(lambda d: d.get('name'), d['account']))]
my_list = [{'date': '10.06.2016',
'account': [{'name': 'a'},
{'name': 'a'},
{'name': 'b'},
{'name': 'b'}]},
{'date': '22.06.2016',
'account': [{'name': 'a'},
{'name': 'a'}]}]
deduplicate_account_names(my_list)
print(my_list)
# [ {'date': '10.06.2016',
# 'account': [ {'name': 'a'},
# {'name': 'b'} ] },
# {'date': '22.06.2016',
# 'account': [ {'name': 'a'} ] } ]
Sets can only have hashable members, and neither lists nor dicts are hashable, but they can be checked for equality.
You can do:
def without_duplicates(inlist):
    outlist = []
    for e in inlist:
        if e not in outlist:
            outlist.append(e)
    return outlist
This can be slow for really big lists.
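The slowdown comes from the test e not in outlist, which scans the whole output list for every element. When the values inside each dict are themselves hashable, a set of hashable keys avoids that scan. A sketch under that assumption; without_duplicates_fast is a hypothetical name:

```python
def without_duplicates_fast(inlist):
    """Order-preserving de-duplication of a list of flat dicts."""
    seen = set()
    outlist = []
    for e in inlist:
        key = frozenset(e.items())  # hashable stand-in for the dict
        if key not in seen:
            seen.add(key)
            outlist.append(e)
    return outlist


my_list = [{'date': '10.06.2016',
            'account': [{'name': 'a'}, {'name': 'a'},
                        {'name': 'b'}, {'name': 'b'}]},
           {'date': '22.06.2016',
            'account': [{'name': 'a'}, {'name': 'a'}]}]

for d in my_list:
    d['account'] = without_duplicates_fast(d['account'])

print(my_list)
# [{'date': '10.06.2016', 'account': [{'name': 'a'}, {'name': 'b'}]},
#  {'date': '22.06.2016', 'account': [{'name': 'a'}]}]
```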
Give this code a try:
for d in my_list:
    for k in d:
        if k == 'account':
            v = []
            for d2 in d[k]:
                if d2 not in v:
                    v.append(d2)
            d[k] = v
This is what you get after running the snippet above:
In [347]: my_list
Out[347]:
[{'account': [{'name': 'a'}, {'name': 'b'}], 'date': '10.06.2016'},
{'account': [{'name': 'a'}], 'date': '22.06.2016'}]

How to reorder a list in Python based on its content

I have a list of dictionaries in Python like this:
l = [{'name': 'John', 'age': 23},
{'name': 'Steve', 'age': 35},
{'name': 'Helen'},
{'name': 'George'},
{'name': 'Jessica', 'age': 23}]
What I am trying to achieve here is to reorder the elements of l so that each entry containing the key age moves to the end of the list, like this:
End result:
l = [{'name': 'Helen'},
{'name': 'George'},
{'name': 'Jessica', 'age': 23},
{'name': 'John', 'age': 23},
{'name': 'Steve', 'age': 35}]
How can I do this?
You can sort the list:
l.sort(key=lambda d: 'age' in d)
The key returns either True or False, based on the presence of the 'age' key; True is sorted after False. Python's sort is stable, leaving the rest of the relative ordering intact.
Demo:
>>> from pprint import pprint
>>> l = [{'name': 'John', 'age': 23},
... {'name': 'Steve', 'age': 35},
... {'name': 'Helen'},
... {'name': 'George'},
... {'name': 'Jessica', 'age': 23}]
>>> l.sort(key=lambda d: 'age' in d)
>>> pprint(l)
[{'name': 'Helen'},
{'name': 'George'},
{'age': 23, 'name': 'John'},
{'age': 35, 'name': 'Steve'},
{'age': 23, 'name': 'Jessica'}]
If you also want to sort by age, retrieve the age value and return a suitable sentinel for those entries that do not have an age, so that they are sorted first. float('-inf') will always be sorted before any other number, for example:
l.sort(key=lambda d: d.get('age', float('-inf')))
Again, entries without an age are left in their original relative order:
>>> l.sort(key=lambda d: d.get('age', float('-inf')))
>>> pprint(l)
[{'name': 'Helen'},
{'name': 'George'},
{'age': 23, 'name': 'John'},
{'age': 23, 'name': 'Jessica'},
{'age': 35, 'name': 'Steve'}]
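The two ideas can also be combined into a single tuple key, which avoids relying on a numeric sentinel. A sketch: the 0 used for entries without an age is an arbitrary placeholder that is never compared against a real age, because the first tuple element already separates the two groups; ordered is an illustrative name:

```python
l = [{'name': 'John', 'age': 23},
     {'name': 'Steve', 'age': 35},
     {'name': 'Helen'},
     {'name': 'George'},
     {'name': 'Jessica', 'age': 23}]

# First element: False (no age) sorts before True (has age).
# Second element: the age itself, used only within the "has age" group.
ordered = sorted(l, key=lambda d: ('age' in d, d.get('age', 0)))

print([d['name'] for d in ordered])
# ['Helen', 'George', 'John', 'Jessica', 'Steve']
```

Because sorted() is stable, Helen and George keep their original relative order, and so do John and Jessica, who share the same age.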

sort a list of dicts by x then by y

I want to sort this info (name, points, and time):
list = [
{'name':'JOHN', 'points' : 30, 'time' : '0:02:2'},
{'name':'KARL','points':50,'time': '0:03:00'}
]
So what I want is the list sorted first by points made, then by time played (in my example, matt goes first because of his lower time). Any help?
I'm trying with this:
import operator
list.sort(key=operator.itemgetter('points', 'time'))
but got a TypeError: list indices must be integers, not str.
Your example works for me. I would advise you not to use list as a variable name, since it is a builtin type.
You could try doing something like this also:
list.sort(key=lambda item: (item['points'], item['time']))
edit:
example list:
>>> import pprint
>>> a = [
... {'name':'JOHN', 'points' : 30, 'time' : '0:02:20'},
... {'name':'LEO', 'points' : 30, 'time': '0:04:20'},
... {'name':'KARL','points':50,'time': '0:03:00'},
... {'name':'MARK','points':50,'time': '0:02:00'},
... ]
descending 'points':
using sort() for inplace sorting:
>>> a.sort(key=lambda x: (-x['points'],x['time']))
>>> pprint.pprint(a)
[{'name': 'MARK', 'points': 50, 'time': '0:02:00'},
{'name': 'KARL', 'points': 50, 'time': '0:03:00'},
{'name': 'JOHN', 'points': 30, 'time': '0:02:20'},
{'name': 'LEO', 'points': 30, 'time': '0:04:20'}]
>>>
using sorted to return a sorted list:
>>> pprint.pprint(sorted(a, key=lambda x: (-x['points'],x['time'])))
[{'name': 'MARK', 'points': 50, 'time': '0:02:00'},
{'name': 'KARL', 'points': 50, 'time': '0:03:00'},
{'name': 'JOHN', 'points': 30, 'time': '0:02:20'},
{'name': 'LEO', 'points': 30, 'time': '0:04:20'}]
>>>
ascending 'points':
>>> a.sort(key=lambda x: (x['points'],x['time']))
>>> import pprint
>>> pprint.pprint(a)
[{'name': 'JOHN', 'points': 30, 'time': '0:02:20'},
{'name': 'LEO', 'points': 30, 'time': '0:04:20'},
{'name': 'MARK', 'points': 50, 'time': '0:02:00'},
{'name': 'KARL', 'points': 50, 'time': '0:03:00'}]
>>>
itemgetter will throw this error up to Python 2.4, since it only accepts multiple keys from Python 2.5 onward.
If you are stuck on 2.4, you will need to use the lambda:
my_list.sort(key=lambda x: (x['points'], x['time']))
It would be preferable to upgrade to a newer Python if possible.
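One caveat with the keys above: the 'time' values are strings, so they are compared character by character. That works while every value uses the same zero-padded format, but a value such as '0:02:2' from the question sorts after '0:02:10' as a string. A sketch that converts the time to seconds first, assuming an 'H:M:S' layout; to_seconds and scores are hypothetical names:

```python
def to_seconds(text):
    """Convert an 'H:M:S' string such as '0:02:20' to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(':'))
    return hours * 3600 + minutes * 60 + seconds


scores = [
    {'name': 'JOHN', 'points': 30, 'time': '0:02:20'},
    {'name': 'LEO', 'points': 30, 'time': '0:04:20'},
    {'name': 'KARL', 'points': 50, 'time': '0:03:00'},
    {'name': 'MARK', 'points': 50, 'time': '0:02:00'},
]

# Highest points first; ties broken by the shortest time in seconds.
scores.sort(key=lambda x: (-x['points'], to_seconds(x['time'])))

print([s['name'] for s in scores])
# ['MARK', 'KARL', 'JOHN', 'LEO']
```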
