What is the most Pythonic way to take a list of dicts and sum up all the values for matching keys from every row in the list?
I did this but I suspect a comprehension is more Pythonic:
from collections import defaultdict
demandresult = defaultdict(int) # new blank dict to store results
for d in demandlist:
    for k, v in d.iteritems():
        demandresult[k] = demandresult[k] + v
In Python - sum values in dictionary, the question involved the same key every time, but in my case the key in each row might be one never encountered before.
I think that your method is quite pythonic. Comprehensions are nice but they shouldn't really be overdone, and they can lead to really messy one-liners, like the one below :).
If you insist on a dict comp:
demand_list = [{u'2018-04-29': 1, u'2018-04-30': 1, u'2018-05-01': 1},
{u'2018-04-21': 1},
{u'2018-04-18': 1, u'2018-04-19': 1, u'2018-04-17' : 1}]
d = {key: sum(i[key] for i in demand_list if key in i)
     for key in set(a for l in demand_list for a in l.keys())}
print(d)
>>>{'2018-04-21': 1, '2018-04-17': 1, '2018-04-29': 1, '2018-04-30': 1, '2018-04-19': 1, '2018-04-18': 1, '2018-05-01': 1}
Here is another one-liner (ab-)using collections.ChainMap to get the combined keys:
>>> from collections import ChainMap
>>> {k: sum(d.get(k, 0) for d in demand_list) for k in ChainMap(*demand_list)}
{'2018-04-17': 1, '2018-04-21': 1, '2018-05-01': 1, '2018-04-30': 1, '2018-04-19': 1, '2018-04-29': 1, '2018-04-18': 1}
This is easily the slowest of the methods proposed here.
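If you want to check that claim on your own data, a rough timeit sketch (not from the original answers, and assuming the small demand_list above and Python 3) might look like this:
import timeit
from collections import ChainMap, defaultdict

demand_list = [{u'2018-04-29': 1, u'2018-04-30': 1, u'2018-05-01': 1},
               {u'2018-04-21': 1},
               {u'2018-04-18': 1, u'2018-04-19': 1, u'2018-04-17': 1}]

def chainmap_sum():
    # ChainMap gives the union of keys; each key is then summed across all dicts
    return {k: sum(d.get(k, 0) for d in demand_list) for k in ChainMap(*demand_list)}

def loop_sum():
    # the original double loop from the question
    result = defaultdict(int)
    for d in demand_list:
        for k, v in d.items():
            result[k] += v
    return result

print(timeit.timeit(chainmap_sum, number=100000))
print(timeit.timeit(loop_sum, number=100000))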
The only thing that seemed unclear in your code was the double for-loop. It may be clearer to collapse demandlist into a flat iterable, so the loop then presents the logic as simply as possible. Consider:
demandlist = [{
u'2018-04-29': 1,
u'2018-04-30': 1,
u'2018-05-01': 1
}, {
u'2018-04-21': 1
}, {
u'2018-04-18': 1,
u'2018-04-19': 1,
u'2018-04-17': 1
}]
import itertools as it
from collections import defaultdict
demandresult = defaultdict(int)
for k, v in it.chain.from_iterable(map(lambda d: d.items(), demandlist)):
    demandresult[k] = demandresult[k] + v
(With this, print(demandresult) prints defaultdict(<class 'int'>, {'2018-04-29': 1, '2018-04-30': 1, '2018-05-01': 1, '2018-04-21': 1, '2018-04-18': 1, '2018-04-19': 1, '2018-04-17': 1}).)
Imagining myself reading this for the first time (or a few months later), I can see myself thinking, "Ok, I'm collapsing demandlist into a key-val iterable, I don't particularly care how, and then summing values of matching keys."
It's unfortunate that I need that map there to ensure the final iterable has key-val pairs… it.chain.from_iterable(demandlist) is a key-only iterable, so I need to call items on each dict.
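If the map bothers you, a generator expression spells the same thing without the lambda (just a stylistic variant of the loop above, not a change in behaviour):
# same single pass over the data, but the items() call lives in a genexp
for k, v in it.chain.from_iterable(d.items() for d in demandlist):
    demandresult[k] = demandresult[k] + v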
Note that unlike many of the answers proposed, this implementation (like yours!) minimizes the number of scans over the data to just one—performance win (and I try to pick up as many easy performance wins as I can).
I suppose you want to return a list of summed values of each dictionary.
list_of_dict = [
{'a':1, 'b':2, 'c':3},
{'d':4, 'e':5, 'f':6}
]
sum_of_each_row = [sum(d.values()) for d in list_of_dict]  # [6, 15]
If you want the total sum, just wrap sum() around sum_of_each_row.
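For instance, using the name above:
total_sum = sum(sum_of_each_row)  # 6 + 15 == 21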
EDIT:
The main problem is that you don't have a default value for each of the keys, so you can make use of dict.setdefault() to supply a default value whenever a new key appears.
list_of_dict = [
{'a':1, 'b':1},
{'b':1, 'c':1},
{'a':2}
]
d = {}
d = {k: d[k] + v if k in d else d.setdefault(k, v)
     for row in list_of_dict for k, v in row.items()}  # {'a': 3, 'b': 2, 'c': 1}
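If you would rather avoid the side effect inside the comprehension, the same setdefault idea works as a plain loop (a sketch, not part of the answer above), and it keeps summing correctly even when a key appears more than twice:
totals = {}
for row in list_of_dict:
    for k, v in row.items():
        # setdefault inserts 0 the first time a key is seen, then we add v
        totals[k] = totals.setdefault(k, 0) + v
# totals -> {'a': 3, 'b': 2, 'c': 1}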
Related
I have a question regarding merging two dictionaries. It is not a simple merge, but merging in a way that I shall take first key value pair from the first element, then the first element from the second dictionary and so on.
For example:
dict1 = {"zero":0,"two":2, "four":4, "six": 6, "eight":8,"ten":10}
dict2 = {"one":1,"three":3,"five":5,"seven":7, "nine":9}
I need to have the following:
dict3 = {"zero":0,"one":1,"two":2,"three":3,"four":4, ... "ten":10 }
Would appreciate any advice.
The answer from @Andrei Vintila is along the right lines, but dict_keys objects are not subscriptable and using the smallest size misses some dictionary items. A looping approach which does work (for any number of elements in either dict) is:
dict1_keys = list(dict1.keys())
dict2_keys = list(dict2.keys())
s1 = len(dict1_keys)
s2 = len(dict2_keys)
max_size = max(s1, s2)
dict3 = {}
for index in range(max_size):
    if index < s1:
        key1 = dict1_keys[index]
        dict3[key1] = dict1[key1]
    if index < s2:
        key2 = dict2_keys[index]
        dict3[key2] = dict2[key2]
print(dict3)
which produces:
{'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10}
You need to create one more dictionary that merges the other two; after that we only need to sort the new dictionary's items. Here is the code:
dict1 = {"zero":0,"two":2, "four":4, "six": 6, "eight":8,"ten":10}
dict2 = {"one":1,"three":3,"five":5,"seven":7, "nine":9}
dict3 = {**dict1, **dict2}
dict3 = dict(sorted(dict3.items(), key=lambda x:x[1]))
print(dict3)
If you're on Python 3.7+, dictionaries maintain insertion order.
dict1_keys = dict1.keys()
dict2_keys = dict2.keys()
The way you do it: get the keys of the first and second dictionary and loop through them using an index; once you exhaust the keys of one dictionary and still have keys left in the other, you just copy over the remaining keys and values.
smallest_size = min(len(dict1_keys), len(dict2_keys))
for index in range(smallest_size):
    key1 = dict1_keys[index]
    key2 = dict2_keys[index]
    dict3[key1] = dict1[key1]
    dict3[key2] = dict2[key2]
I’m aware this might not be the Python way to tackle this however if you’re brushing up this is one way to do it…
If you want a simple loop approach, the key is to use enumerate to get an index you can use to access the second dict (and to check you haven't run off the end of the second dict; I'm assuming here the second dict is the same length or one shorter):
dict1 = {"zero":0,"two":2, "four":4, "six": 6, "eight":8,"ten":10}
dict2 = {"one":1,"three":3,"five":5,"seven":7, "nine":9}
result = {}
keys2 = list(dict2.keys())
for i, k in enumerate(dict1):
    result[k] = dict1[k]
    if i < len(dict2):
        result[keys2[i]] = dict2[keys2[i]]
If you wanted a more functional approach you could do this:
from functools import reduce
result = reduce(dict.__or__, ({k1: v1, k2: v2} for (k1, v1), (k2, v2) in zip(dict1.items(), dict2.items())))
which works fine if both dicts are the same length. If the second is shorter, you'll need to manually append the remaining key-value pairs from the first dict afterwards.
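One way to cover the unequal-length case is itertools.zip_longest; a sketch (not part of the answer above, and using plain assignment rather than reduce):
from itertools import zip_longest

result = {}
for pair1, pair2 in zip_longest(dict1.items(), dict2.items()):
    # the shorter dict is padded with None once it runs out of items
    for pair in (pair1, pair2):
        if pair is not None:
            key, value = pair
            result[key] = value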
Simple iteration with zip(). However, zip() only iterates up to the length of the shortest iterable, so the remaining key:value pairs of the longer dict would not be included on their own. dict.update() afterwards ensures all keys and values are added:
dict1 = {"zero":0,"two":2, "four":4, "six": 6, "eight":8,"ten":10}
dict2 = {"one":1,"three":3,"five":5,"seven":7, "nine":9, "eleven":11, "twelve":12}
dict3 = {}
for k1, k2 in zip(dict1, dict2):
    dict3[k1], dict3[k2] = dict1[k1], dict2[k2]
dict3.update(dict1)
dict3.update(dict2)
print(dict3)
# {'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5, 'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10, 'eleven': 11, 'twelve': 12}
So dictionaries are not sequences like a list (despite being "ordered" since Python 3.7, in most cases the order doesn't matter). There is also such a thing as an ordered dictionary, OrderedDict, in the collections module.
There are several ways to combine dictionaries
The easiest way in your case is .update(), which merges all the key/value pairs into one dictionary:
dict1.update(dict2)
You can also use dictionary unpacking:
dict3 = {**dict1, **dict2}
If you still want them sorted, you can use sorted() with a key function:
sorted(dict3, key=...)  # note: can also pass reverse=True
Different key functions:
str: if the keys sort alphabetically, like a, b, c, etc.
lambda x: dict3[x]: to sort by increasing value, such as 0, 1, 2, 3, etc.
You could also make a key list and search from that (though it's a bit more hacky):
key_list = []
# Note: only works if len(dict1) == len(dict2)
for item1, item2 in zip(dict1, dict2):
    key_list.extend([item1, item2])
And then do key=lambda x: key_list.index(x)
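Since sorted() over a dict returns a list of keys, you would then rebuild the dictionary from it; a sketch (assuming key_list from above and equal-length dicts, as the comment in the snippet notes, with dict3_interleaved just an illustrative name):
dict3 = {**dict1, **dict2}
# order the merged keys by their position in key_list, then rebuild the dict
dict3_interleaved = {k: dict3[k] for k in sorted(dict3, key=lambda x: key_list.index(x))}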
Hope this helps!
Let's say I have the following dictionary:
full_dic = {
'aa': 1,
'ac': 1,
'ab': 1,
'ba': 2,
...
}
I normally use standard dictionary comprehension to remove dupes like:
t = {val : key for (key, val) in full_dic.items()}
cleaned_dic = {val : key for (key, val) in t.items()}
Calling print(cleaned_dic) outputs {'ab': 1,'ba': 2, ...}
With this code, the key that remains seems to always be the final one in the list, but I'm not sure that's even guaranteed as dictionaries are unordered. Instead, I'd like to find a way to ensure that the key I keep is the first alphabetically.
So, regardless of the 'order' the dictionary is in, I want the output to be:
>> {'aa': 1,'ba': 2, ...}
Where 'aa' comes first alphabetically.
I ran some timer tests on 3 answers below and got the following (dictionary was created with random key/value pairs):
dict length: 10
# of loops: 100000
HoliSimo (OrderedDict): 0.0000098405 seconds
Ricardo: 0.0000115448 seconds
Mark (itertools.groupby): 0.0000111745 seconds
dict length: 1000000
# of loops: 10
HoliSimo (OrderedDict): 6.1724137300 seconds
Ricardo: 3.3102091300 seconds
Mark (itertools.groupby): 6.1338266200 seconds
We can see that for smaller dictionary sizes using OrderedDict is fastest, but for large dictionary sizes Ricardo's answer below is noticeably faster.
t = {val : key for (key, val) in dict(sorted(full_dic.items(), key=lambda x: x[0].lower(), reverse=True)).items()}
cleaned_dic = {val : key for (key, val) in t.items()}
dict(sorted(cleaned_dic.items(), key=lambda x: x[0].lower()))
>>> {'aa': 1, 'ba': 2}
Seems like you can do this with a single sort and itertools.groupby. First sort the items by value, then key. Pass this to groupby and take the first item of each group to pass to the dict constructor:
from itertools import groupby
full_dic = {
'aa': 1,
'ac': 1,
'xx': 2,
'ab': 1,
'ba': 2,
}
groups = groupby(sorted(full_dic.items(), key=lambda p: (p[1], p[0])), key=lambda x: x[1])
dict(next(g) for k, g in groups)
# {'aa': 1, 'ba': 2}
You should use the OrderedDict class.
import collections
full_dic = {
'aa': 1,
'ac': 1,
'ab': 1
}
od = collections.OrderedDict(sorted(full_dic.items()))
In this way you are sure to have a dictionary sorted by key (original code: Stack Overflow).
And then:
result = {}
for k, v in od.items():
    if v not in result.values():
        result[k] = v
I'm not sure if it will speed up the computation but you can try:
inverted_dict = {}
for k, v in od.items():
    if inverted_dict.get(v) is None:
        inverted_dict[v] = k
res = {v: k for k, v in inverted_dict.items()}
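Putting the two ideas together (a sketch, not from either answer as written): iterate the keys in sorted order, keep the first key seen for each value, then invert back:
inverted = {}
for k in sorted(full_dic):               # alphabetical key order
    inverted.setdefault(full_dic[k], k)  # the alphabetically smallest key wins
cleaned_dic = {key: value for value, key in inverted.items()}
# cleaned_dic -> {'aa': 1, 'ba': 2, ...}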
I have this dictionary (key,list)
index={'chair':['one','two','two','two'],'table':['two','three','three']}
and i want this
#1. number of times each value occurs in each key. ordered descending
indexCalc={'chair':{'two':3,'one':1}, 'table':{'three':2,'two':1}}
#2. value for maximum amount for each key
indexMax={'chair':3,'table':2}
#3. we divide each value in #1 by value in #2
indexCalcMax={'chair':{'two':3/3,'one':1/3}, 'table':{'three':2/2,'two':1/2}}
I think I should use lambda expressions, but I can't come up with any idea how to do that. Any help?
First, define your values as lists correctly:
index = {'chair': ['one','two','two','two'], 'table': ['two','three','three']}
Then use collections.Counter with dictionary comprehensions:
from collections import Counter
# 1. number of times each value occurs in each key
res1 = {k: Counter(v) for k, v in index.items()}
# 2. value for maximum amount for each key
res2 = {k: v.most_common()[0][1] for k, v in res1.items()}
# 3. divide each value in #1 by the value in #2
res3 = {k: {m: n / res2[k] for m, n in v.items()} for k, v in res1.items()}
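For reference, with the index above the three results come out roughly as follows (the 1/3 value shown rounded):
print(res1)  # {'chair': Counter({'two': 3, 'one': 1}), 'table': Counter({'three': 2, 'two': 1})}
print(res2)  # {'chair': 3, 'table': 2}
print(res3)  # {'chair': {'one': 0.33, 'two': 1.0}, 'table': {'two': 0.5, 'three': 1.0}}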
index={'chair':{'one','two','two','two'},'table':{'two','three','three'}}
Problem: {} with bare values creates a set, so the duplicates are collapsed. You should convert the values into lists.
Now coming to your solution:
from collections import Counter
index={'chair': ['one','two','two','two'],'table':['two','three','three']}
updated_index = {'chair': dict(Counter(index['chair'])), 'table': dict(Counter(index['table']))}
updated_index_2 = {'chair': Counter(index['chair']).most_common()[0][1], 'table': Counter(index['table']).most_common()[0][1]}
print(updated_index)
print(updated_index_2)
You can use Counter from the Python collections library to find the counts without writing any lambda function.
{'chair': {'one': 1, 'two': 3}, 'table': {'two': 1, 'three': 2}}
{'chair': 3, 'table': 2}
Firstly, you have a mistake in how you created the index dict. You should have lists as the values for each key; you currently have sets. Sets are automatically deduplicated, so you will not be able to get a proper count from them.
You should correct index to be:
index={'chair':['one','two','two','two'],'table':['two','three','three']}
You can use the Counter class from the collections module in Python 3, which is a subclass of dict, to generate what you want for each entry in indexCalc. A Counter creates a dictionary mapping each key to the number of times that key occurs in a collection.
indexCalc = {k: Counter(v) for k, v in index.items()}
indexCalc looks like this:
{'chair': Counter({'two': 3, 'one': 1}), 'table': Counter({'three': 2, 'two': 1})}
We can easily find the index that corresponds to the maximum value in each sub-dictionary:
indexMax = {k: max(indexCalc[k].values()) for k in indexCalc}
indexMax looks like this:
{'chair': 3, 'table': 2}
You can create indexCalcMax with the following comprehension, which is a little ugly:
indexCalcMax = {k: {val: indexCalc[k][val] / indexMax[k] for val in indexCalc[k]} for k in indexCalc}
which is a dict-comprehension translation of this loop:
indexCalcMax = {}
for k in indexCalc:
    tmp = {}
    for val in indexCalc[k]:
        tmp[val] = indexCalc[k][val] / float(indexMax[k])
    indexCalcMax[k] = tmp
I know this is suboptimal, but I had to do it as a thought exercise:
indexCalc = {
k: {key: len([el for el in index[k] if el == key]) for key in set(index[k])}
for k in index
}
Not exactly lambda, as suggested, but comprehensions... Don't use this code in production :) This answer is only partial, you can use the analogy and come up with the other two structures that you require.
Let's say I have a dictionary with these keys and values :
{'foo': 1, 'bar': 5,'foo1' : 1,'bar1' : 1,'foo2': 5}
I can't zip them like this
dict(zip(my.values(),my.keys()))
because this happens :
{1: 'foo', 5: 'bar'}
What I would like to be my output is :
{1:{'bar1','foo','foo1'},5:{'bar','foo2'}}
You should use a collections.defaultdict().
from collections import defaultdict
result = defaultdict(list)
for k, v in my.items():
    result[v].append(k)
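If you want sets, as in the expected output in the question, the same pattern works with defaultdict(set) and .add (a small variation, not part of the original answer):
result = defaultdict(set)
for k, v in my.items():
    result[v].add(k)
# result -> defaultdict(<class 'set'>, {1: {'foo', 'foo1', 'bar1'}, 5: {'bar', 'foo2'}})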
You can't have multiple values for a given key, so the values in such a data structure need to be lists. You can't do this transformation easily with zip(); you'll need a for loop:
my = {'foo': 1, 'bar': 5, 'foo1': 1, 'bar1': 1,'foo2': 5}
rev = {}
for k, v in my.items():
    rev.setdefault(v, []).append(k)
From the edits in your question, it appears that you want to use a set for the values. This is also straightforward:
for k, v in my.items():
    rev.setdefault(v, set()).add(k)
You can also use a defaultdict as Daniel has suggested, but it seems overkill here to do an import just for that. Depending on the size of your dictionary, the defaultdict might be a little faster, since with setdefault() we continuously create and throw away empty containers.
For a one-liner with a functional twist (probably neither the most readable nor performant code, though):
import itertools, operator
my_dict = {'foo': 1, 'bar': 5, 'foo1' : 1,'bar1' : 1, 'foo2': 5}
inverse_dict = { k:map(operator.itemgetter(0), v) for k, v in itertools.groupby(sorted(my_dict.items(), key=operator.itemgetter(1)), operator.itemgetter(1)) }
To aggregate using a set, just wrap the mapped value in a set constructor.
inverse_dict = { k:set(map(operator.itemgetter(0), v)) for k, v in itertools.groupby(sorted(my_dict.items(), key=operator.itemgetter(1)), operator.itemgetter(1)) }
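For reference, the set-wrapped version produces the shape asked for in the question (set element order may vary):
print(inverse_dict)
# {1: {'bar1', 'foo', 'foo1'}, 5: {'bar', 'foo2'}}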
Sample:
d = {
"test": 1,
"sample": 2,
"example": 3,
"product": 4,
"software": 5,
"demo": 6,
}
filter_keys = ["test","sample","example","demo"]
I want to create a new dict that contains only those items from the first dict whose keys appear in the list. In other words, I want:
d2 = {
"test": 1,
"sample": 2,
"example": 3,
"demo": 6,
}
I could do it with a loop:
d2 = {}
for k in d.keys():
    if k in filter_keys:
        d2[k] = d[k]
But this seems awfully "un-Pythonic". I'm also guessing that if you had a huge dict, say 5,000 items or so, the constant adding of new items to the new dict would be slow compared to a more direct way.
Also, you'd want to be able to handle errors. If the list contains something that's not a key in the dict, it should just be ignored. Or maybe it gets added to the new dict but with a value of None.
Is there a better way to accomplish this?
A straight-forward way to do this is with the "dictionary comprehension":
filtered_dict = {key: value for key, value in d.items() if key in filter_keys}
Note that when the condition appears at the end of the comprehension, it filters which items the loop produces. Depending on whether the number of keys in the dictionary is greater than the number of keys you want to filter on, this variant could be more efficient:
filtered_dict = {key: d[key] for key in filter_keys if key in d}
Checking for membership in the dictionary (key in d) is significantly faster than checking for membership in the filter key list (key in filter_keys). But which ends up faster depends on the size of the filter key list (and, to a lesser extent, the size of the dictionary).
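A small related tweak, not from the answer itself: converting the filter list to a set makes the membership test in the first form O(1) as well (filter_set is just an illustrative name):
# set membership is a hash lookup, so large filter lists no longer hurt
filter_set = set(filter_keys)
filtered_dict = {key: value for key, value in d.items() if key in filter_set}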
A relatively Pythonic way to do it without a dictionary comprehension is with the dict constructor:
filtered_dict = dict([(key, value) for key, value in d.items() if key in filter_keys])
Note that this is essentially equivalent to the dictionary comprehension, but may be clearer if you aren't familiar with dictionary comprehension syntax.
Dictionary comprehension is one way to do it:
new_d = {k: v for k, v in d.items() if k in l}
Demo:
>>> d = {
... "test": 1,
... "sample": 2,
... "example": 3,
... "product": 4,
... "software": 5,
... "demo": 6,
... }
>>>
>>> l = ["test","sample","example","demo"]
>>> new_d = {k: v for k, v in d.items() if k in l}
>>> new_d
{'sample': 2, 'demo': 6, 'test': 1, 'example': 3}
For optimal performance, you should iterate over the keys in the list and check if they are in the dict rather than the other way around:
d2 = {}
for k in list_of_keys:
    if k in d:
        d2[k] = d[k]
The benefit here is that the dict.__contains__ (in) on a dict is O(1) whereas for the list it's O(N). For big lists, that's a HUGE benefit (O(N) algorithm vs. O(N^2)).
We can be a little more succinct by expressing the above loop with an equivalent dict-comprehension:
d2 = {k: d[k] for k in list_of_keys if k in d}
This will likely be marginally faster than the loop, but probably not enough to ever worry about. That said, most Python programmers would prefer this version as it is more succinct and very common.
As per your last part of the question:
Or maybe it gets added to the new dict but with a value of None.
l = ["test","sample","example","demo","badkey"]
d = {
"test": 1,
"sample": 2,
"example": 3,
"product": 4,
"software": 5,
"demo": 6,
}
print({k: d.get(k) for k in l})
{'test': 1, 'sample': 2, 'badkey': None, 'example': 3, 'demo': 6}
You can pass a default return value to dict.get; it is None by default, but you could set it to d.get(k, "No_match"), etc., or whatever value you want.
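For example (a small sketch; "No_match" is just an arbitrary sentinel, and the Python 3.7+ ordering is shown):
print({k: d.get(k, "No_match") for k in l})
# {'test': 1, 'sample': 2, 'example': 3, 'demo': 6, 'badkey': 'No_match'}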