For example I have two dicts:
Dict A: {'a': 1, 'b': 2, 'c': 3}
Dict B: {'b': 3, 'c': 4, 'd': 5}
I need a pythonic way of 'combining' two dicts such that the result is:
{'a': 1, 'b': 5, 'c': 7, 'd': 5}
That is to say: if a key appears in both dicts, add their values, if it appears in only one dict, keep its value.
Use collections.Counter:
>>> from collections import Counter
>>> A = Counter({'a':1, 'b':2, 'c':3})
>>> B = Counter({'b':3, 'c':4, 'd':5})
>>> A + B
Counter({'c': 7, 'b': 5, 'd': 5, 'a': 1})
Counters are basically a subclass of dict, so you can still do everything else with them you'd normally do with that type, such as iterate over their keys and values.
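As a small sketch of that point (the variable names total and plain are only illustrative), the summed Counter can be used like any dict and converted back to a plain dict when a Counter is not wanted:

```python
from collections import Counter

A = Counter({'a': 1, 'b': 2, 'c': 3})
B = Counter({'b': 3, 'c': 4, 'd': 5})

total = A + B

# A Counter is a dict subclass, so the usual dict operations work.
for key, value in sorted(total.items()):
    print(key, value)

# Convert back to a plain dict if the Counter type is not needed.
plain = dict(total)
print(plain == {'a': 1, 'b': 5, 'c': 7, 'd': 5})  # True
```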
A more generic solution, which works for non-numeric values as well:
a = {'a': 'foo', 'b':'bar', 'c': 'baz'}
b = {'a': 'spam', 'c':'ham', 'x': 'blah'}
# list(...) is needed on Python 3, where items() returns a view, not a list
r = dict(list(a.items()) + list(b.items()) +
         [(k, a[k] + b[k]) for k in set(b) & set(a)])
or even more generic:
import operator
def combine_dicts(a, b, op=operator.add):
    # list(...) is needed on Python 3, where items() returns a view
    return dict(list(a.items()) + list(b.items()) +
                [(k, op(a[k], b[k])) for k in set(b) & set(a)])
For example:
>>> a = {'a': 2, 'b':3, 'c':4}
>>> b = {'a': 5, 'c':6, 'x':7}
>>> import operator
>>> print(combine_dicts(a, b, operator.mul))
{'a': 10, 'b': 3, 'c': 24, 'x': 7}
>>> A = {'a':1, 'b':2, 'c':3}
>>> B = {'b':3, 'c':4, 'd':5}
>>> c = {x: A.get(x, 0) + B.get(x, 0) for x in set(A).union(B)}
>>> print(c)
{'a': 1, 'c': 7, 'b': 5, 'd': 5}
Intro:
There are the (probably) best solutions, but you have to know and remember them, and sometimes you have to hope that your Python version isn't too old.
Then there are the most 'hacky' solutions. They are short and clever, but can be hard to understand, read, and remember.
There is, though, an alternative: reinventing the wheel.
- Why reinvent the wheel?
- Generally because it's a really good way to learn (and sometimes because the existing tool doesn't do exactly what you want, or not the way you want it), and because it is the easiest route if you don't know or don't remember the perfect tool for your problem.
So, I propose to reinvent the wheel of the Counter class from the collections module (partially at least):
class MyDict(dict):
    def __add__(self, oth):
        r = self.copy()
        try:
            for key, val in oth.items():
                if key in r:
                    r[key] += val  # You can customize it here
                else:
                    r[key] = val
        except AttributeError:  # In case oth isn't a dict
            return NotImplemented  # The convention when a case isn't handled
        return r
a = MyDict({'a':1, 'b':2, 'c':3})
b = MyDict({'b':3, 'c':4, 'd':5})
print(a+b)  # Output {'a':1, 'b': 5, 'c': 7, 'd': 5}
There are probably other ways to implement this, and there are already tools that do it, but it's always nice to see how things basically work.
Summing the Counter()s is definitely the most Pythonic way to go in such cases, but only if every resulting value is positive. Here is an example; as you can see, there is no c in the result after negating c's value in the B dictionary.
In [1]: from collections import Counter
In [2]: A = Counter({'a':1, 'b':2, 'c':3})
In [3]: B = Counter({'b':3, 'c':-4, 'd':5})
In [4]: A + B
Out[4]: Counter({'d': 5, 'b': 5, 'a': 1})
That's because Counters were primarily designed to work with positive integers to represent running counts (a negative count is meaningless). But to help with other use cases, Python documents the minimum range and type restrictions as follows:
The Counter class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you could store anything in the value field.
The most_common() method requires only that the values be orderable.
For in-place operations such as c[key] += 1, the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for update() and subtract() which allow negative and zero values for both inputs and outputs.
The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison.
The elements() method requires integer counts. It ignores zero and negative counts.
To get around that problem, instead of summing the Counters with + you can use Counter.update to get the desired output. It works like dict.update(), but adds counts instead of replacing them, and it modifies the Counter in place.
In [24]: A.update(B)
In [25]: A
Out[25]: Counter({'d': 5, 'b': 5, 'a': 1, 'c': -1})
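Since update() changes A itself, a non-destructive variant may be handy. The following is only a sketch; the helper name add_keep_all is made up for illustration and is not part of collections:

```python
from collections import Counter

def add_keep_all(x, y):
    """Sum two Counters without dropping zero or negative totals."""
    result = Counter(x)   # copy, so x is left untouched
    result.update(y)      # update() adds counts and keeps every key
    return result

A = Counter({'a': 1, 'b': 2, 'c': 3})
B = Counter({'b': 3, 'c': -4, 'd': 5})

print(dict(add_keep_all(A, B)))  # {'a': 1, 'b': 5, 'c': -1, 'd': 5}
print(dict(A + B))               # {'a': 1, 'b': 5, 'd': 5}
```

A is unchanged afterwards, which is the only difference from calling A.update(B) directly.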
import itertools
myDict = {}
for k in itertools.chain(A.keys(), B.keys()):
    myDict[k] = A.get(k, 0)+B.get(k, 0)
The one with no extra imports!
There is a Pythonic principle called EAFP (Easier to Ask for Forgiveness than Permission). The code below is based on it.
# The A and B dictionaries
A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
# The final dictionary. Will contain the final outputs.
newdict = {}
# Make sure every key of A and B get into the final dictionary 'newdict'.
newdict.update(A)
newdict.update(B)
# Iterate through each key of A.
for i in A.keys():
    # If the same key exists in B, its values from A and B are added together
    # and stored in the final dictionary 'newdict'.
    try:
        addition = A[i] + B[i]
        newdict[i] = addition
    # If the current key does not exist in dictionary B, a KeyError is raised;
    # catch it and continue looping.
    except KeyError:
        continue
EDIT: thanks to jerzyk for his improvement suggestions.
import itertools
import collections
dictA = {'a':1, 'b':2, 'c':3}
dictB = {'b':3, 'c':4, 'd':5}
new_dict = collections.defaultdict(int)
# use dict.iteritems() instead of dict.items() on Python 2
for k, v in itertools.chain(dictA.items(), dictB.items()):
    new_dict[k] += v
print(dict(new_dict))
# OUTPUT
{'a': 1, 'b': 5, 'c': 7, 'd': 5}
OR
Alternatively, you can use Counter, as @Martijn has mentioned above.
For a more generic and extensible way, check mergedict. It uses singledispatch and can merge values based on their types.
Example:
from mergedict import MergeDict
class SumDict(MergeDict):
    @MergeDict.dispatch(int)
    def merge_int(this, other):
        return this + other
d2 = SumDict({'a': 1, 'b': 'one'})
d2.merge({'a':2, 'b': 'two'})
assert d2 == {'a': 3, 'b': 'two'}
From python 3.5: merging and summing
Thanks to @tokeinizer_fsj, who told me in a comment that I had not completely understood the question (I thought that add meant just adding the keys that differ between the two dictionaries, whereas the question meant that the values of common keys should be summed). So I added a loop before the merge, so that the second dictionary contains the sum for the common keys. On merging, the values of the last dictionary take precedence in the result, so I think the problem is solved. The solution is valid from Python 3.5 onwards.
a = {
"a": 1,
"b": 2,
"c": 3
}
b = {
"a": 2,
"b": 3,
"d": 5
}
# Python 3.5
for key in b:
    if key in a:
        b[key] = b[key] + a[key]
c = {**a, **b}
print(c)
>>> c
{'a': 3, 'b': 5, 'c': 3, 'd': 5}
Reusable code
a = {'a': 1, 'b': 2, 'c': 3}
b = {'b': 3, 'c': 4, 'd': 5}
def mergsum(a, b):
    # note: this updates b in place before merging
    for k in b:
        if k in a:
            b[k] = b[k] + a[k]
    c = {**a, **b}
    return c
print(mergsum(a, b))
Additionally, please note that a.update(b), which modifies a in place, is about 2x faster than a + b, which builds a new Counter:
from collections import Counter
a = Counter({'menu': 20, 'good': 15, 'happy': 10, 'bar': 5})
b = Counter({'menu': 1, 'good': 1, 'bar': 3})
%timeit a + b;
## 100000 loops, best of 3: 8.62 µs per loop
## The slowest run took 4.04 times longer than the fastest. This could mean that an intermediate result is being cached.
%timeit a.update(b)
## 100000 loops, best of 3: 4.51 µs per loop
A one-line solution is to use a dictionary comprehension.
C = { k: A.get(k,0) + B.get(k,0) for k in list(B.keys()) + list(A.keys()) }
def merge_with(f, xs, ys):
    xs = dict(xs)  # shallow copy, so the caller's dict is not modified
    for (k, v) in ys.items():
        xs[k] = v if k not in xs else f(xs[k], v)
    return xs
merge_with((lambda x, y: x + y), A, B)
You could easily generalize this:
def merge_dicts(f, *dicts):
    result = {}
    for d in dicts:
        for (k, v) in d.items():
            result[k] = v if k not in result else f(result[k], v)
    return result
Then it can take any number of dicts.
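A short usage sketch of the generalized function, written for Python 3 (items() and an explicit return are assumed here, and the third dictionary C is made up for the example):

```python
import operator

def merge_dicts(f, *dicts):
    """Merge any number of dicts, combining values of shared keys with f."""
    result = {}
    for d in dicts:
        for k, v in d.items():
            result[k] = v if k not in result else f(result[k], v)
    return result

A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
C = {'a': 5, 'e': 3}

print(merge_dicts(operator.add, A, B, C))
# {'a': 6, 'b': 5, 'c': 7, 'd': 5, 'e': 3}
print(merge_dicts(max, A, B, C))
# {'a': 5, 'b': 3, 'c': 4, 'd': 5, 'e': 3}
```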
This is a simple solution for merging two dictionaries whose values support +=. It has to iterate over each dictionary only once.
a = {'a':1, 'b':2, 'c':3}
dicts = [{'b':3, 'c':4, 'd':5},
{'c':9, 'a':9, 'd':9}]
def merge_dicts(merged,mergedfrom):
    for k,v in mergedfrom.items():
        if k in merged:
            merged[k] += v
        else:
            merged[k] = v
    return merged
for dct in dicts:
    a = merge_dicts(a,dct)
print (a)
#{'c': 16, 'b': 5, 'd': 14, 'a': 10}
Here's yet another option using dictionary comprehensions combined with the behavior of dict():
dict3 = dict(dict1, **{ k: v + dict1.get(k, 0) for k, v in dict2.items() })
# {'a': 4, 'b': 2, 'c': 7, 'g': 1}
From https://docs.python.org/3/library/stdtypes.html#dict:
If keyword arguments are given, the keyword arguments and their values are added to the dictionary created from the positional argument.
The dict comprehension
**{ k: v + dict1.get(k, 0) for k, v in dict2.items() }
handles adding dict1[k] to v. We don't need an explicit if here because the default value for dict1.get can be set to 0 instead.
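One caveat worth checking before relying on this form: because the second dictionary is passed as keyword arguments, its keys must be strings. The sketch below (the integer-keyed dictionaries are made up for illustration) shows the failure and a variant using unpacking inside a dict display, which needs Python 3.5 or later:

```python
dict1 = {1: 10, 2: 20}
dict2 = {2: 5, 3: 7}

summed = {k: v + dict1.get(k, 0) for k, v in dict2.items()}

try:
    dict(dict1, **summed)       # keyword names must be strings
except TypeError as exc:
    print("dict(dict1, **summed) failed:", type(exc).__name__)

# Unpacking inside a dict display has no such restriction.
dict3 = {**dict1, **summed}
print(dict3)  # {1: 10, 2: 25, 3: 7}
```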
This solution is easy to use: it behaves like a normal dictionary, but also supports addition with the + operator.
class SumDict(dict):
    def __add__(self, y):
        return {x: self.get(x, 0) + y.get(x, 0) for x in set(self).union(y)}
A = SumDict({'a': 1, 'c': 2})
B = SumDict({'b': 3, 'c': 4}) # Also works: B = {'b': 3, 'c': 4}
print(A + B) # OUTPUT {'a': 1, 'b': 3, 'c': 6}
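For the built-in sum() to work on a list of such dictionaries, two further things seem to be needed: __add__ has to return a SumDict again, and __radd__ has to accept the integer 0 that sum() starts from. The following is a sketch of one way to do that, not part of the answer above:

```python
class SumDict(dict):
    def __add__(self, other):
        # Return a SumDict so the result can be added again.
        keys = set(self).union(other)
        return SumDict({k: self.get(k, 0) + other.get(k, 0) for k in keys})

    def __radd__(self, other):
        # sum() starts from 0, so treat 0 as an empty dictionary.
        if other == 0:
            return SumDict(self)
        return NotImplemented

dicts = [SumDict({'a': 1, 'c': 2}), SumDict({'b': 3, 'c': 4}), SumDict({'a': 5})]
total = sum(dicts)
print(total == {'a': 6, 'b': 3, 'c': 6})  # True
```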
The above solutions are great for the scenario where you have a small number of Counters. If you have a big list of them though, something like this is much nicer:
from collections import Counter
A = Counter({'a':1, 'b':2, 'c':3})
B = Counter({'b':3, 'c':4, 'd':5})
C = Counter({'a': 5, 'e':3})
list_of_counts = [A, B, C]
total = sum(list_of_counts, Counter())
print(total)
# Counter({'c': 7, 'a': 6, 'b': 5, 'd': 5, 'e': 3})
The above solution is essentially summing the Counters by:
total = Counter()
for count in list_of_counts:
    total += count
print(total)
# Counter({'c': 7, 'a': 6, 'b': 5, 'd': 5, 'e': 3})
This does the same thing but I think it always helps to see what it is effectively doing underneath.
What about:
def dict_merge_and_sum( d1, d2 ):
    ret = dict(d1)  # copy, so that d1 itself is not modified
    ret.update({ k:v + d2[k] for k,v in d1.items() if k in d2 })
    ret.update({ k:v for k,v in d2.items() if k not in d1 })
    return ret
A = {'a': 1, 'b': 2, 'c': 3}
B = {'b': 3, 'c': 4, 'd': 5}
print( dict_merge_and_sum( A, B ) )
Output:
{'d': 5, 'a': 1, 'c': 7, 'b': 5}
A more conventional way to combine two dicts. Using modules and tools is good, but understanding the logic behind them will help in case you don't remember the tools.
A program to combine two dictionaries, adding the values for common keys:
def combine_dict(d1,d2):
    for key,value in d1.items():
        if key in d2:
            d2[key] += value
        else:
            d2[key] = value
    return d2
combine_dict({'a':1, 'b':2, 'c':3},{'b':3, 'c':4, 'd':5})
output == {'b': 5, 'c': 7, 'd': 5, 'a': 1}
Here's a very general solution. You can deal with any number of dicts, with keys that are only in some of the dicts, and easily use any aggregation function you want:
def aggregate_dicts(dicts, operation=sum):
    """Aggregate a sequence of dictionaries using `operation`."""
    all_keys = set().union(*[el.keys() for el in dicts])
    return {k: operation([dic.get(k, None) for dic in dicts]) for k in all_keys}
example:
dicts_same_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3}]
aggregate_dicts(dicts_same_keys, operation=sum)
#{'x': 3, 'y': 6}
example non-identical keys and generic aggregation:
dicts_diff_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3, 'c': 4}]
def mean_no_none(l):
    l_no_none = [el for el in l if el is not None]
    return sum(l_no_none) / len(l_no_none)
aggregate_dicts(dicts_diff_keys, operation=mean_no_none)
# {'x': 1.0, 'c': 4.0, 'y': 2.0}
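One caveat with the aggregate_dicts function above: with the default operation=sum, a key that is missing from some dictionary contributes None, which sum cannot add. A possible variant, sketched here under the hypothetical name aggregate_present, passes only the values that actually exist:

```python
def aggregate_present(dicts, operation=sum):
    """Aggregate per key, passing only the values that actually exist."""
    all_keys = set().union(*[d.keys() for d in dicts])
    return {k: operation([d[k] for d in dicts if k in d]) for k in all_keys}

dicts_diff_keys = [{'x': 0, 'y': 1}, {'x': 1, 'y': 2}, {'x': 2, 'y': 3, 'c': 4}]
result = aggregate_present(dicts_diff_keys)
print(result == {'x': 3, 'y': 6, 'c': 4})  # True
```

The trade-off is that the operation no longer sees how many dictionaries lacked the key, which may matter for some aggregations.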
dict1 = {'a':1, 'b':2, 'c':3}
dict2 = {'a':3, 'g':1, 'c':4}
dict3 = {} # will store new values
for x in dict1:
    if x in dict2:  # sum values with the same key
        dict3[x] = dict1[x] + dict2[x]
    else:  # copy the value that is only in dict1
        dict3[x] = dict1[x]
# add the keys that are only in dict2
for x in dict2:
    if x not in dict1:
        dict3[x] = dict2[x]
print(dict3) # {'a': 4, 'b': 2, 'c': 7, 'g': 1}
Merging three dicts a,b,c in a single line without any other modules or libs
If we have the three dicts
a = {"a":9}
b = {"b":7}
c = {'b': 2, 'd': 90}
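Merge all with a single line and return a dict object using (on Python 3, items() must be wrapped in list() before concatenating)
c = dict(list(a.items()) + list(b.items()) + list(c.items()))
Returning
{'a': 9, 'b': 2, 'd': 90}
Note that for a key present in several dicts the last value wins; the values are not summed.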
Complete the given method called solve which takes as parameter a dictionary A.
A has some keys, but the values of the keys 'x' and 'y' have been mistakenly swapped. You have to put them back in their original positions, And return the dictionary.
Example Input:
{'a': 1, 'b': 200, 'x': 2, 'y': 7}
Output:
{'a': 1, 'b': 200, 'x': 7, 'y': 2}
def solve(d):
d['x'], d['y'] = d['y'], d['x']
return d
I know, but I can't understand it!
This has to do with mutability in Python: whether a given object in memory can or can't be changed. You're providing the function with variables that point to integer objects, and these are immutable.
You can verify this yourself in the Python interpreter.
>>> A = {'a': 1, 'b': 200, 'x': 2, 'y': 7}
>>> x = A['x']
>>> x
2
>>> id(A['x'])
4402888976
>>> id(x)
4402888976
>>> x = 5
>>> id(A['x'])
4402888976
>>> id(x)
4402889072
>>> A
{'a': 1, 'b': 200, 'x': 2, 'y': 7}
>>> x
5
As you can see in this example, x initially has the same identity as A['x'] (they point to the same object in memory that holds the integer value 2). But when we assign x a new value, since integers are immutable, Python doesn't change the value where 2 is stored in memory; instead, it allocates another space in memory where it stores the integer value 5 and points x to it. So x points to another integer object in memory while A['x'] keeps pointing to the initial value it was assigned. The same thing is happening with your function.
Now, dictionaries are mutable objects in Python, so if you pass the dictionary instead of two integers, you can manipulate the dictionary within the function and your changes will remain after the function exits.
>>> A = {'a': 1, 'b': 200, 'x': 2, 'y': 7}
>>> def solve(d):
...     d['x'], d['y'] = d['y'], d['x']
...
>>> solve(A)
>>> A
{'a': 1, 'b': 200, 'x': 7, 'y': 2}
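The two cases can be put side by side. In this sketch the function names swap_values and swap_in_dict are made up for illustration; the first only rebinds its local names, while the second changes the dictionary the caller passed in:

```python
def swap_values(x, y):
    # Rebinds the local names only; the caller's objects are untouched.
    x, y = y, x

def swap_in_dict(d):
    # Mutates the dictionary object the caller passed in.
    d['x'], d['y'] = d['y'], d['x']

A = {'a': 1, 'b': 200, 'x': 2, 'y': 7}

swap_values(A['x'], A['y'])
print(A)  # {'a': 1, 'b': 200, 'x': 2, 'y': 7}

swap_in_dict(A)
print(A)  # {'a': 1, 'b': 200, 'x': 7, 'y': 2}
```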
For a dictionary you can use the update method
d = {'a': 1, 'b': 200, 'x': 2, 'y': 7}
d.update({'x':d['y'], 'y':d['x']})
print(d)
{'a': 1, 'b': 200, 'x': 7, 'y': 2}
The problem with your code is that you are manipulating a and b only in the function's scope. You are not returning anything, so the program crashes as you try to print something that is never returned.
Try this solution instead:
data = {  # named data rather than dict, to avoid shadowing the built-in
    "a": 1,
    "b": 200,
    "x": 2,
    "y": 7
}
def swap(a,b):
    a,b = b,a
    return a,b
def solve(A):
    A['x'], A['y'] = swap(A['x'], A['y'])
    return A
print(solve(data))
I have a dictionary with keys as single characters. I want to substitute the upper-cased characters with doubled versions of them.
For example, I have this structure:
x = 'AbCDEfGH'
a = dict(zip(list(x), range(len(x))))
print(a)
which creates this dictionary:
{'A': 0, 'b': 1, 'C': 2, 'D': 3, 'E': 4, 'f': 5, 'G': 6, 'H': 7}
The values don't matter, so I just use some integers. What I want is to substitute the upper-cased keys with double characters, so that I get this:
{'AA': 0, 'b': 1, 'CC': 2, 'DD': 3, 'EE': 4, 'f': 5, 'GG': 6, 'HH': 7}
So, I tried the following in-place substitution:
for k, v in a.items():
    if k.isupper():
        a[k+k] = a.pop(k)
print(a)
But this, strangely, results in:
{'b': 1, 'E': 4, 'f': 5, 'G': 6, 'CCCCCCCCCCCCCCCC': 2, 'DDDDDDDDDDDDDDDD': 3, 'HHHHHHHHHHHHHHHH': 7, 'AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA': 0}
Even stranger, if I set all keys to upper-case:
y = 'ABCDEFGH'
a = dict(zip(list(y), range(len(y))))
for k, v in a.items():
    if k.isupper():
        a[k+k] = a.pop(k)
print(a)
it yields:
{'D': 3, 'E': 4, 'F': 5, 'CCCCCCCC': 2, 'GGGGGGGG': 6, 'HHHHHHHH': 7, 'AAAAAAAAAAAAAAAA': 0, 'BBBBBBBBBBBBBBBB': 1}
What is happening? I see the keys are repeating in magnitudes of 2. But, why?
I don't really care about the order of the items, but I see some aren't even being changed.
Is there any other way to substitute the keys the way I intend to?
.items() returns a live view of the underlying dict contents. Mutating the dict while iterating it causes unpredictable effects, usually leading to some keys being processed more than once (thus some keys doubling multiple times), while others aren't processed at all. Python tries to defend you from this by raising a RuntimeError if the dict changes size during iteration, but your code is keeping a constant size at the time of the check (when the next item is requested from the iterator), so Python's cheap length check doesn't save you.
The minimal fix is to make your loop run over a snapshot of the items:
for k, v in tuple(a.items()):
A simpler solution is a dict comprehension though:
a = {k*2 if k.isupper() else k: v for k, v in a.items()}
That builds a whole new dict with the doubled keys before reassigning a, so no mutation issues occur. You could build a in one fell swoop for that matter, just by doing:
a = {let*2 if let.isupper() else let: i for i, let in enumerate(x)}
No need to listify x (strings already iterate by character) and enumerate can take care of numbering the values for you without needing zip, range or len at all.
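Both fixes can be wrapped in small helpers. This is only a sketch, and the function names are made up for illustration; one builds a new dict, the other modifies the dict in place while looping over a snapshot of its keys:

```python
def double_upper_keys(d):
    """Return a new dict whose upper-case keys are doubled."""
    return {k * 2 if k.isupper() else k: v for k, v in d.items()}

def double_upper_keys_in_place(d):
    """Same result, but modify d itself by looping over a snapshot."""
    for k in tuple(d):          # snapshot of the keys before any change
        if k.isupper():
            d[k * 2] = d.pop(k)

a = dict(zip('AbCDEfGH', range(8)))
expected = {'AA': 0, 'b': 1, 'CC': 2, 'DD': 3, 'EE': 4, 'f': 5, 'GG': 6, 'HH': 7}

print(double_upper_keys(a) == expected)  # True
double_upper_keys_in_place(a)
print(a == expected)                     # True
```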
How do I merge the views of the items of two dicts in python?
My use case is: I have two private dictionaries in a class. I want the API to treat them as one, and provide an items method as such. The only way I know of is to combine them and then provide a view on the combined dictionary, but for large dictionaries this seems expensive. I'm thinking of something like a.items() + b.items()
Note: I don't care about key clashes.
This is what I'd like to improve:
class A:
    _priv1 = {'a': 1, 'b': 2}
    _priv2 = {'c': 1, 'd': 2}
    def items(self):
        return {**self._priv1, **self._priv2}.items()
You can use ChainMap:
from collections import ChainMap
class A:
    _priv1 = {'a': 1, 'b': 2}
    _priv2 = {'c': 1, 'd': 2}
    def items(self):
        return ChainMap(self._priv1, self._priv2).items()
Merging views of dictionaries means nothing, because a view always reflects the content of the corresponding dictionary.
So to have a different view, you either edit one of your dictionaries or instantiate a new one (like you did); there is no way around it.
But maybe what you want is itertools.chain, to iterate across multiple iterables. This solution doesn't instantiate a new dictionary. Or, as others have said, collections.ChainMap. I would use chain to iterate and ChainMap to make lookups.
You can use ChainMap:
A ChainMap groups multiple dicts or other mappings together to create a single, updateable view. If no maps are specified, a single empty dictionary is provided so that a new chain always has at least one mapping.
from collections import ChainMap
context = ChainMap(_priv1, _priv2, ...)
Example:
In [3]: _priv1 = {1: 2, 3: 4}
In [4]: _priv2 = {5: 6, 7: 8}
In [5]: from collections import ChainMap
...: context = ChainMap(_priv1, _priv2)
In [6]: context
Out[6]: ChainMap({1: 2, 3: 4}, {5: 6, 7: 8})
In [7]: _priv1.update({9: 10})
In [8]: context
Out[8]: ChainMap({1: 2, 3: 4, 9: 10}, {5: 6, 7: 8})
In [9]: context.get(9)
Out[9]: 10
For your code example, I'd use:
from collections import ChainMap
class A:
    _priv1 = {'a': 1, 'b': 2}
    _priv2 = {'c': 1, 'd': 2}
    _union_dict = ChainMap(_priv1, _priv2)
    @classmethod
    def items(cls):
        return cls._union_dict.items()
You can use chain to combine two or more views. As stated:
Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted.
That way, the data is not copied.
from itertools import chain
class A:
    _priv1 = {'a': 1, 'b': 2}
    _priv2 = {'c': 1, 'd': 2}
    def items(self):
        return chain(self._priv1.items(), self._priv2.items())
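The question says key clashes don't matter, but for readers who do care, the two approaches differ there. A small sketch with made-up dictionaries first and second:

```python
from collections import ChainMap
from itertools import chain

first = {'a': 1, 'b': 2}
second = {'b': 20, 'c': 30}

# chain() simply concatenates the two views, so 'b' appears twice.
print(list(chain(first.items(), second.items())))
# [('a', 1), ('b', 2), ('b', 20), ('c', 30)]

# ChainMap yields each key once; the first mapping wins on a clash.
merged = ChainMap(first, second)
print(merged['b'])             # 2
print(sorted(merged.items()))  # [('a', 1), ('b', 2), ('c', 30)]
```

Neither approach copies the underlying data.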
Is it possible to iterate through a dictionary while updating a separate dictionary? I tried to create a copy of my original dictionary and edit that one but I'm still getting an error
d = {30:3, 54:5, 16:2}
r = d
for k,v in d.items():
    biggest = max(d,key = d.get)
    del(r[biggest])
I need to find the largest value of the dictionary each time it loops through. Then I need to remove that item from the dictionary so I can find the next largest item.
The error I get when I run this code is as follows.
Traceback (most recent call last):
File "<pyshell#7>", line 1, in <module>
for k,v in d.items():
RuntimeError: dictionary changed size during iteration
You are actually iterating over the same dictionary, since r = d does not create a new dictionary. r is just another reference to the same dictionary. You can check the object identities to confirm that:
>>> r = d
>>> r is d
True
Please, see this discussion for more details about object identity:
"is" operator behaves unexpectedly with integers
So, the right thing to do is first create a copy of the dictionary and then alter it:
>>> r = d.copy()
>>> r is d
False
And the iteration:
for k,v in d.items():
    biggest = max(r,key = r.get)  # search the copy, which shrinks each time
    del(r[biggest])
So, from your code we just need to create a copy and look up the maximum in that copy:
d = {30:3, 54:5, 16:2}
r = d.copy()  # changed: use copy here
for k,v in d.items():
    biggest = max(r,key = r.get)  # changed: look up the maximum in r
    del(r[biggest])
Iterate on a copy of the dict:
d = {30:3, 54:5, 16:2}
r = d
for k,v in dict(d).items():
    biggest = max(d,key = d.get)
    del(r[biggest])
As others have pointed out, you cannot change the size of the dictionary while iterating through it. It has also been noted by @user312016 that you can iterate over a copy and modify the original.
I am not sure what the intention is, but this code will sort the items from largest value to smallest so you don't have to find the max on each iteration:
d = {30:3, 54:5, 16:2}
d_ = sorted(d.items(), key=lambda x: x[1], reverse=True)
for k, v in d_:
    print(k, v)
54 5
30 3
16 2
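If the items really do have to be removed one at a time, a while loop over a copy avoids iterating a dictionary that is being resized. This is a minimal sketch; the names remaining and order are made up for illustration:

```python
d = {30: 3, 54: 5, 16: 2}

remaining = d.copy()          # work on a copy so d stays intact
order = []
while remaining:              # no for-loop over a dict being resized
    biggest = max(remaining, key=remaining.get)
    order.append((biggest, remaining.pop(biggest)))

print(order)  # [(54, 5), (30, 3), (16, 2)]
print(d)      # {30: 3, 54: 5, 16: 2}
```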
The problem is that when you use = in the line r = d, r is not a new object. It is the same object as d; both names refer to a single dictionary:
>>> x = {'a':1, 'b':2}
>>> y = x
>>> x
{'a': 1, 'b': 2}
>>> y
{'a': 1, 'b': 2}
>>> x is y
True
So if you change one of them, the other will change too:
>>> y['c']=3
>>> y
{'a': 1, 'c': 3, 'b': 2}
>>> x
{'a': 1, 'c': 3, 'b': 2}
Using the id() function you can check whether they refer to the same place in memory:
>>> id(y)
44703816L
>>> id(x)
44703816L
>>>
So, you need to use copy() instead of =:
>>> import copy
>>> z = copy.copy(x)
>>> z
{'a': 1, 'c': 3, 'b': 2}
>>> x
{'a': 1, 'c': 3, 'b': 2}
>>> z is x
False
>>>
As a result, changing one of them doesn't change the other:
>>> z
{'a': 1, 'c': 3, 'b': 2}
>>> x
{'a': 1, 'c': 3, 'b': 2}
>>> z['d']=4
>>> z
{'a': 1, 'c': 3, 'b': 2, 'd': 4}
>>> x
{'a': 1, 'c': 3, 'b': 2}
>>>
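One thing to keep in mind is that copy.copy (like dict.copy()) makes a shallow copy: the new dictionary is independent, but mutable values inside it are still shared. A short sketch with a made-up nested dictionary, using copy.deepcopy where full independence is wanted:

```python
import copy

x = {'nums': [1, 2], 'n': 1}

shallow = copy.copy(x)      # same as x.copy(): new dict, shared values
deep = copy.deepcopy(x)     # new dict and new nested objects

x['nums'].append(3)         # mutate the nested list in place
x['n'] = 99                 # rebind a top-level key

print(shallow)  # {'nums': [1, 2, 3], 'n': 1}
print(deep)     # {'nums': [1, 2], 'n': 1}
```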