Python 2.7 - Sum value on duplicates in dictionary

I have a list of dictionaries like:
list1=[{'a':'apples', 'b':'snack','count':2},{'a':'apples','b':'lunch','count':3},{'a':'apples','b':'snack','count':3}]
I need to group duplicates in the list on 'a' and 'b' and sum their 'count' such that:
list2=[{'a':'apples','b':'snack','count':5},{'a':'apples','b':'lunch','count':3}]
I searched through the existing questions here and haven't found a solution. Thanks very much for any pointers.

You can use a defaultdict keyed on a 2-tuple to accumulate the counts, then convert it back to a list:
list1=[{'a':'apples', 'b':'snack','count':2},{'a':'apples','b':'lunch','count':3},{'a':'apples','b':'snack','count':3}]
from collections import defaultdict
dd = defaultdict(int)
for d in list1:
    dd[d['a'], d['b']] += d['count']
list2 = [{'a': k[0], 'b': k[1], 'count': v} for k, v in dd.iteritems()]
[{'a': 'apples', 'count': 3, 'b': 'lunch'}, {'a': 'apples', 'count': 5, 'b': 'snack'}]
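For readers on Python 3, where dict.iteritems() no longer exists, the same accumulation could be sketched roughly as follows. The helper name sum_counts and its keys/value parameters are illustrative assumptions, not part of the original answer.

```python
from collections import defaultdict

def sum_counts(rows, keys=('a', 'b'), value='count'):
    """Group rows on `keys` and sum `value` (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        # A tuple of the grouping values is hashable, so it can be a dict key.
        totals[tuple(row[k] for k in keys)] += row[value]
    # Rebuild one dict per group; on Python 3.7+ groups come out in first-seen order.
    return [dict(zip(keys, group), **{value: total})
            for group, total in totals.items()]

list1 = [{'a': 'apples', 'b': 'snack', 'count': 2},
         {'a': 'apples', 'b': 'lunch', 'count': 3},
         {'a': 'apples', 'b': 'snack', 'count': 3}]
list2 = sum_counts(list1)
```

Passing the grouping keys as a parameter keeps the loop unchanged if more columns need to take part in the grouping later.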

Another solution, using groupby together with list, dict, and generator comprehensions:
list1=[{'a':'apples', 'b':'snack','count':2},{'a':'apples','b':'lunch','count':3},{'a':'apples','b':'snack','count':3}]
from itertools import groupby
list1.sort()
group_func = lambda x: {key:val for key, val in x.iteritems() if key!='count'}
list2 = [dict(k, count = sum(item['count'] for item in items)) for k, items in groupby(list1, group_func)]
[{'a': 'apples', 'count': 3, 'b': 'lunch'}, {'a': 'apples', 'count': 5, 'b': 'snack'}]
Explanation:
The grouper function takes an item and returns a sub-dictionary without the 'count' item, using a dict comprehension.
Then groupby gathers all original list items with the same sub-dictionary.
Finally, the list comprehension iterates over those groups and sums the count items (this time using a generator expression).
Cons:
Less readable.
For groupby to work the list needs to be sorted first, so that could make things slower.
Pros:
If list1 is already sorted this is probably faster (since comprehensions are generally faster in Python).
Shorter (it can even be written in a single barely comprehensible line :)).
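On Python 3, dicts cannot be ordered with a bare list1.sort(), so a port of the groupby approach needs an explicit sort key. A minimal sketch, assuming the grouping columns are always 'a' and 'b' (the names group_key and rows are made up for illustration):

```python
from itertools import groupby
from operator import itemgetter

list1 = [{'a': 'apples', 'b': 'snack', 'count': 2},
         {'a': 'apples', 'b': 'lunch', 'count': 3},
         {'a': 'apples', 'b': 'snack', 'count': 3}]

group_key = itemgetter('a', 'b')  # returns the tuple (row['a'], row['b'])

# groupby only merges adjacent items, so sort on the same key first.
rows = sorted(list1, key=group_key)

list2 = [{'a': a, 'b': b, 'count': sum(row['count'] for row in group)}
         for (a, b), group in groupby(rows, key=group_key)]
```

Using the same key function for both sorted and groupby is what guarantees that equal groups end up next to each other.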

Related

Making list of dictionaries into a single dictionary python [duplicate]

How can I turn a list of dicts like [{'a':1}, {'b':2}, {'c':1}, {'d':2}], into a single dict like {'a':1, 'b':2, 'c':1, 'd':2}?
Answers here will overwrite keys that match between two of the input dicts, because a dict cannot have duplicate keys. If you want to collect multiple values from matching keys, see How to merge dicts, collecting values from matching keys?.
This works for dictionaries of any length:
>>> result = {}
>>> for d in L:
...     result.update(d)
...
>>> result
{'a':1,'c':1,'b':2,'d':2}
As a comprehension:
# Python >= 2.7
{k: v for d in L for k, v in d.items()}
# Python < 2.7
dict(pair for d in L for pair in d.items())
In case of Python 3.3+, there is a ChainMap collection:
>>> from collections import ChainMap
>>> a = [{'a':1},{'b':2},{'c':1},{'d':2}]
>>> dict(ChainMap(*a))
{'b': 2, 'c': 1, 'a': 1, 'd': 2}
Also see:
What is the purpose of collections.ChainMap?
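One difference worth keeping in mind: when the input dicts share a key, ChainMap returns the value from the first mapping that contains it, whereas the update loop keeps the value from the last one. A small sketch with made-up data (the key 'x' is added only to show the collision):

```python
from collections import ChainMap

dicts = [{'a': 1, 'x': 'first'}, {'b': 2, 'x': 'second'}]

# update-style merge: the last dict in the list wins for the shared key 'x'.
merged_update = {}
for d in dicts:
    merged_update.update(d)

# ChainMap looks each key up in the first mapping that has it, so the first dict wins.
merged_chain = dict(ChainMap(*dicts))

# Reversing the input makes ChainMap agree with the update-style merge.
merged_chain_reversed = dict(ChainMap(*reversed(dicts)))
```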
A small improvement on @dietbuddha's answer, using dictionary unpacking from PEP 448. For me it's more readable this way, and it is faster as well:
from functools import reduce
result_dict = reduce(lambda a, b: {**a, **b}, list_of_dicts)
But keep in mind, this works only with Python 3.5+ versions.
This is similar to @delnan's answer but offers the option to modify the k/v (key/value) items, and I believe it is more readable:
new_dict = {k:v for list_item in list_of_dicts for (k,v) in list_item.items()}
For instance, to replace spaces in the keys while copying:
new_dict = {str(k).replace(" ","_"):v for list_item in list_of_dicts for (k,v) in list_item.items()}
This unpacks each (k, v) tuple produced by the dictionary's .items() after pulling each dict object out of the list.
For flat dictionaries you can do this:
from functools import reduce
reduce(lambda a, b: dict(a, **b), list_of_dicts)
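Two details of the reduce-based versions may be worth spelling out. Without an initial value, reduce raises TypeError on an empty list, and on Python 3 dict(a, **b) only accepts string keys in b. The sketch below uses a hypothetical helper name, merge_dicts, and the {**a, **b} form, which has neither restriction:

```python
from functools import reduce

def merge_dicts(list_of_dicts):
    """Merge left to right; later dicts overwrite earlier ones (hypothetical helper)."""
    # The {} initial value makes an empty input return {} instead of raising TypeError.
    return reduce(lambda acc, d: {**acc, **d}, list_of_dicts, {})

empty = merge_dicts([])
# Non-string keys are fine with {**a, **b}; dict(a, **b) would reject them on Python 3.
mixed = merge_dicts([{1: 'a'}, {2: 'b'}, {1: 'c'}])
```

Each step builds a new dict, so for long lists a plain loop with update is likely the cheaper choice.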
You can use the join function from the funcy library:
from funcy import join
join(list_of_dicts)
>>> L=[{'a': 1}, {'b': 2}, {'c': 1}, {'d': 2}]
>>> dict(i.items()[0] for i in L)
{'a': 1, 'c': 1, 'b': 2, 'd': 2}
Note: the order of 'b' and 'c' doesn't match your output because dicts are unordered
If the dicts can have more than one key/value pair:
>>> dict(j for i in L for j in i.items())
If you don't need the singleton dicts anymore:
>>> L = [{'a':1}, {'b':2}, {'c':1}, {'d':2}]
>>> dict(map(dict.popitem, L))
{'a': 1, 'b': 2, 'c': 1, 'd': 2}
dict1.update( dict2 )
This is asymmetrical because you need to choose what to do with duplicate keys; in this case, dict2 will overwrite dict1. Exchange them for the other way.
EDIT: Ah, sorry, didn't see that.
It is possible to do this in a single expression:
>>> from itertools import chain
>>> dict( chain( *map( dict.items, theDicts ) ) )
{'a': 1, 'c': 1, 'b': 2, 'd': 2}
No credit to me for this last!
However, I'd argue that it might be more Pythonic (explicit > implicit, flat > nested ) to do this with a simple for loop. YMMV.
This way worked for me:
object = [{'a':1}, {'b':2}, {'c':1}, {'d':2}]
object = {k: v for dct in object for k, v in dct.items()}
Printing object gives:
{'a':1,'b':2,'c':1,'d':2}
thanks Axes
>>> dictlist = [{'a':1},{'b':2},{'c':1},{'d':2, 'e':3}]
>>> dict(kv for d in dictlist for kv in d.iteritems())
{'a': 1, 'c': 1, 'b': 2, 'e': 3, 'd': 2}
>>>
Note I added a second key/value pair to the last dictionary to show it works with multiple entries.
Also keys from dicts later in the list will overwrite the same key from an earlier dict.

combine two python lists of dict with index

I want to merge two lists into a nested construct.
List_A: [{'id':1,'val':'abc'},{'id':2,'val':'bcd'},{'id':3,'val':'efg'}]
List_B: [{'ref':1,'name':'xyz'},{'ref':2,'name':'opq'},{'ref':2,'name':'jkl'},{'ref':3,'name':'nmo'}]
Result should be:
List_C: [{'id':1,'list_b':[{'name':'xyz'}]},{'id':2,'list_b':[{'name':'opq'},{'name':'jkl'}]},{'id':3,'list_b':[{'name':'nmo'}]}]
I tried pandas join and merge but got no result. Sure, I could iterate through the list and do it key by key, but there might be a much better solution.
Based on your desired output, you don't need List_A at all.
from collections import defaultdict
b = [{'ref':1,'name':'xyz'}, {'ref':2,'name':'opq'}, {'ref':2,'name':'jkl'},{'ref':3,'name':'nmo'}]
# collect the names under each 'ref' value
dct = defaultdict(list)
for x in b:
    dct[x['ref']].append({'name': x['name']})
[{'id': k, 'list_b': v} for k, v in dct.items()]
Output:
[{'id': 1, 'list_b': [{'name': 'xyz'}]},
{'id': 2, 'list_b': [{'name': 'opq'}, {'name': 'jkl'}]},
{'id': 3, 'list_b': [{'name': 'nmo'}]}]
Overall this seems like an over-complicated way to store the data, but if you need that specific format (e.g., as input to another program), then so be it.
from copy import deepcopy
List_C = deepcopy(List_A)
for dct1 in List_C:
    dct1.pop('val')
    dct1['list_b'] = list()
    for dct2 in List_B:
        if dct1['id'] == dct2['ref']:
            dct3 = dct2.copy()
            dct3.pop('ref')
            dct1['list_b'].append(dct3)
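If both lists are large, the nested loop compares every pair of rows. One possible alternative, sketched here under the assumption that 'ref' in List_B always refers to an 'id' in List_A, is to index List_B once and then walk List_A (the name by_ref is made up for illustration):

```python
from collections import defaultdict

List_A = [{'id': 1, 'val': 'abc'}, {'id': 2, 'val': 'bcd'}, {'id': 3, 'val': 'efg'}]
List_B = [{'ref': 1, 'name': 'xyz'}, {'ref': 2, 'name': 'opq'},
          {'ref': 2, 'name': 'jkl'}, {'ref': 3, 'name': 'nmo'}]

# Index List_B by 'ref' once, dropping the 'ref' key from each stored copy.
by_ref = defaultdict(list)
for row in List_B:
    by_ref[row['ref']].append({k: v for k, v in row.items() if k != 'ref'})

# Walk List_A so that ids without any match still appear, with an empty list.
List_C = [{'id': row['id'], 'list_b': by_ref.get(row['id'], [])} for row in List_A]
```

Unlike the version that ignores List_A, this keeps ids that have no rows in List_B.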

How to create a dictionary out of a list of lists in python?

Let's suppose I have the following list made out of lists
list1 = [['a','b'],['a'],['b','c'],['c','d'],['b'], ['a','d']]
I am wondering if there is a way to convert every element of list1 into a dictionary, where all the new dictionaries use the same value for the same key. E.g., if ['a'] becomes {'a':1} and ['b'] becomes {'b':2}, I would like every key a to have the value 1 and every key b to have the value 2. Therefore ['a','b'] should turn into {'a':1, 'b':2}.
What I have found so far are ways to create a dictionary out of a list of lists that use the first element as the key and the rest of the list as the value.
Please note that's not what I am interested in.
The result I would want to obtain from list1 is something like:
dict_list1 = [{'a':1,'b':2}, {'a':1}, {'b':2,'c':3}, {'c':3,'d':4}, {'b':2}, {'a':1,'d':4}]
I don't care which particular numbers are used, only that each distinct key always maps to the same number.
You need to declare your mapping first:
mapping = dict(a=1, b=2, c=3, d=4)
Then, you can just use dict comprehension:
[{e: mapping[e] for e in li} for li in list1]
# [{'a': 1, 'b': 2}, {'a': 1}, {'b': 2, 'c': 3}, {'c': 3, 'd': 4}, {'b': 2}, {'a': 1, 'd': 4}]
Using chain and OrderedDict you can build the mapping automatically:
from itertools import chain
from collections import OrderedDict
list1 = [['a','b'],['a'],['b','c'],['c','d'],['b'], ['a','d']]
# flatten the list to build the automatic index
flat_list = list(chain(*list1))
# remove duplicates, keeping first-seen order
flat_list = list(OrderedDict.fromkeys(flat_list))
# number each distinct element from 1 in first-seen order
mapping = {x: i for i, x in enumerate(flat_list, 1)}
[{e: mapping[e] for e in li} for li in list1]
Here is an attempt with ord(); it works for single letters, both uppercase and lowercase:
[{e: ord(e)%32 for e in li} for li in list1]
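If the keys are not single letters and the mapping should not be declared by hand, one more option is to hand out numbers as new keys are encountered. This is only a sketch; the numbering simply follows first-seen order, which happens to reproduce the numbers in the question:

```python
list1 = [['a', 'b'], ['a'], ['b', 'c'], ['c', 'd'], ['b'], ['a', 'd']]

mapping = {}
dict_list1 = []
for sub in list1:
    # setdefault assigns the next unused number the first time a key is seen
    # and returns the already assigned number on every later occurrence.
    dict_list1.append({key: mapping.setdefault(key, len(mapping) + 1) for key in sub})
```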

Removing dictionaries from a list on the basis of duplicate value of key

I am new to Python. Suppose I have the following list of dictionaries:
mydictList= [{'a':1,'b':2,'c':3},{'a':2,'b':2,'c':4},{'a':2,'b':3,'c':4}]
From the above list, I want to remove dictionaries that share the same value for key b, keeping only the first one. So the resulting list should be:
mydictList = [{'a':1,'b':2,'c':3},{'a':2,'b':3,'c':4}]
You can create a new dictionary keyed on the value of b, iterating over mydictList backwards (since you want to retain the first dictionary for each value of b), and then take only the values of that dictionary, like this:
>>> {item['b'] : item for item in reversed(mydictList)}.values()
[{'a': 1, 'c': 3, 'b': 2}, {'a': 2, 'c': 4, 'b': 3}]
If you are using Python 3.x, you might want to wrap the dictionary values in list(), like this:
>>> list({item['b'] : item for item in reversed(mydictList)}.values())
Note: This solution may not maintain the order of the dictionaries.
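On Python 3.7+, where dicts keep insertion order, a forward pass with setdefault is one way to keep both the first dictionary per b value and the original order. A minimal sketch (first_by_b is an illustrative name):

```python
mydictList = [{'a': 1, 'b': 2, 'c': 3}, {'a': 2, 'b': 2, 'c': 4}, {'a': 2, 'b': 3, 'c': 4}]

first_by_b = {}
for item in mydictList:
    # setdefault stores the item only if this b value has not been seen yet.
    first_by_b.setdefault(item['b'], item)

result = list(first_by_b.values())
```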
First, sort the list by b-values (Python's sorting algorithm is stable, so dictionaries with identical b values will retain their relative order).
from operator import itemgetter
tmp1 = sorted(mydictList, key=itemgetter('b'))
Next, use itertools.groupby to create subiterators that iterate over dictionaries with the same b value.
import itertools
tmp2 = itertools.groupby(tmp1, key=itemgetter('b'))
Finally, create a new list that contains only the first element of each subiterator:
# Each x is a tuple (some-b-value, iterator-over-dicts-with-b-equal-some-b-value)
newdictList = [ next(x[1]) for x in tmp2 ]
Putting it all together:
from itertools import groupby
from operator import itemgetter
by_b = itemgetter('b')
newdictList = [ next(x[1]) for x in groupby(sorted(mydictList, key=by_b), key=by_b) ]
A very straightforward approach can go something like this:
mydictList= [{'a':1,'b':2,'c':3},{'a':2,'b':2,'c':4},{'a':2,'b':3,'c':4}]
b_set = set()
new_list = []
for d in mydictList:
    if d['b'] not in b_set:
        new_list.append(d)
        b_set.add(d['b'])
Result:
>>> new_list
[{'a': 1, 'c': 3, 'b': 2}, {'a': 2, 'c': 4, 'b': 3}]

Getting the item with the maximum value in list of dicts after grouping the dict by key

I have a list of dicts that looks like this:
[{'apples': 99}, {'bananas': '556685'}, {'apples': 88}, {'apples': '2345566'}]
I would like to group the items by key and return the key with the highest total, i.e. sum up the values of all the apples, sum up the values of all the bananas, and return whichever is higher (apples or bananas). I can't seem to figure out a good way of doing this, and I'm trying to avoid using a bunch of loops and counter variables.
(Just out of curiosity, is this possible as a one-liner? If so, how?)
After changing all your values to integers:
import itertools as it
a = [{'apples': 99}, {'bananas': 556685}, {'apples': 88}, {'apples': 2345566}]
max((sum(i.values()[0] for i in v), k) for k,v in it.groupby(sorted(a), key=lambda x: x.keys()[0]))[1]
# 'apples'
If you remove the trailing [1], it will also give you the sum:
# (2345753, 'apples')
Not exactly a one-liner, but I like to use collections.Counter for this kind of task because I find it quite readable:
from collections import Counter
from itertools import chain
from operator import itemgetter
a = [{'apples': 99}, {'bananas': 556685}, {'apples': 88}, {'apples': 2345566}]
c = Counter()
for k, v in chain.from_iterable([d.items() for d in a]):
    c[k] += v
print max(c.items(), key=itemgetter(1))[0]
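For the original input, where some values are strings, a Python 3 sketch of the same Counter idea could convert each value with int() while accumulating. This assumes every value is either an int or a string of digits:

```python
from collections import Counter

a = [{'apples': 99}, {'bananas': '556685'}, {'apples': 88}, {'apples': '2345566'}]

totals = Counter()
for d in a:
    for key, value in d.items():
        totals[key] += int(value)  # int() accepts both 99 and '556685'

# most_common(1) returns a one-element list holding the (key, total) pair with the largest total.
best_key, best_total = totals.most_common(1)[0]
```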
Try this:
reduce(lambda m,n:m if int(m[1])>=int(n[1]) else n,map(lambda p:p.items()[0],a))
It doesn't sort or sum anything; it just gives you the single item with the highest value, in one line.
