The best way to merge multi-nested dictionaries in Python 2.7 - python

I have two nested dictionaries and I want to merge them into one (where second dict overrides first dict values). I saw a lot of beautiful solutions for merging "flat" (not nested) dictionaries, e.g.:
dict_result = dict1.copy()
dict_result.update(dict2)
or
dict_result = dict(dict1.items() + dict2.items())
or (my favorit one)
dict_result = dict(d1,**d2)
but couldn't find the most efficient way to merge multi-nested dicts.
I'm trying to avoid recursion. What is your proposition?

Unless the depth of the dictionaries to merge is strictly limited, there's no way to avoid recursion.1) Also, there's no bultin or library function to do this (that is, none that I know of), but it's actually not all that hard. Something like this should do:
def merge(d1, d2):
for k in d2:
if k in d1 and isinstance(d1[k], dict) and isinstance(d2[k], dict):
merge(d1[k], d2[k])
else:
d1[k] = d2[k]
What this does: It iterates the keys in d2 and if the key can also be found in d1 and both are dictionaries, merge those sub-dictionaries, otherwise overwrite the value in d1 with that from d2. Note that this changes d1 and its sub-dictionaries in place, so you might want to deep-copy it before.
Or use this version to create a merged copy:
def merge_copy(d1, d2):
return {k: merge_copy(d1[k], d2[k]) if k in d1 and isinstance(d1[k], dict) and isinstance(d2[k], dict) else d2[k] for k in d2}
Example:
>>> d1 = {"foo": {"bar": 23, "blub": 42}, "flub": 17}
>>> d2 = {"foo": {"bar": 100}, "flub": {"flub2": 10}, "more": {"stuff": 111}}
>>> merge(d1, d2)
>>> print d1
{'foo': {'bar': 100, 'blub': 42}, 'flub': {'flub2': 10}, 'more': {'stuff': 111}}
1) You can make it iterative, using a stack, but this will only make things more complicated and should only be done to avoid problems with maximum recursive depth.

Modified version of the above merge_copy function for dicts that can be thought of as merging parent and child where you want the parent to inherit all the values from child to create a new dict.
def merge_copy(child, parent):
'''returns parent updated with child values if exists'''
d = {}
for k in parent:
if k in child and isinstance(child[k], dict) and isinstance(parent[k], dict):
v = merge_copy(child[k], parent[k])
elif k in child:
v = child[k]
else:
v = parent[k]
d[k] = v
return d

Related

dict comprehension with nested loop based on condition

Say I have a nested dict that I want to flatten. The dict is only one layer deep and the nested elements are either dict or str. So given the example data below:
data = {
'foo': 'bar',
'sample': {
'a': 213,
'b': 634298,
'c': 1},
'doo': 'val',
'spaz': {
'x': 4,
'y': 32,
'z': 18}}
Desired output is a flattened dict as follows:
{'foo': 'bar',
'sample_a': 213,
'sample_b': 634298,
'sample_c': 1,
'doo': 'val',
'spaz_x': 4,
'spaz_y': 32,
'spaz_z': 18}
The traditional way I'd do this is a loop such as this:
data_out = {}
for k, v in data.items():
if isinstance(v, dict):
for ik, iv in v.items():
data_out[f'{k}_{ik}'] = iv
else:
data_out[k] = v
Although I'm looking to see how I can do this cleanly with a comprehension. My attempt uses a conditional nested loop along with itertools.chain.from_iterable
data_out = dict(chain.from_iterable(((f'{k}_{ik}', iv) for ik, iv in v.items()) if isinstance(v, dict) else ((k, v),) for k, v in data.items()))
Below is the same thing as above only broken down by line for readability:
data_out = dict(chain.from_iterable(
((f'{k}_{ik}', iv) for ik, iv in v.items())
if isinstance(v, dict)
else ((k, v),)
for k, v in data.items()))
While this works I'm convinced this more than needed, I just cannot think of a better way preferably as a traditional dict comprehension and not using dict on a generator comprehension. Only way I can really shorten the code that I can think of is removing .from_iterable and just star unpacking the generator although I don't think this is more optimal for this approach.
You could invert the order of the loops in your dict comprehension and thus create the "flat" dictionary directly, without chain.from_iterable, but that does not really make it any clearer.
>>> {k: v for (k1, v1) in data.items()
... for (k, v) in (((f'{k1}_{k2}', v2) for (k2, v2) in v1.items())
... if isinstance(v1, dict) else ((k1, v1),))}
...
{'doo': 'val',
'foo': 'bar',
'sample_a': 213,
'sample_b': 634298,
'sample_c': 1,
'spaz_x': 4,
'spaz_y': 32,
'spaz_z': 18}
Another way might be to create two dicts and then **-combine them to the final dict:
>>> {**{ k1: v1 for (k1, v1) in data.items() if isinstance(v1, str)},
... **{f'{k1}_{k2}': v2 for (k1, v1) in data.items() if isinstance(v1, dict)
... for (k2, v2) in v1.items()}}
However, IMHO your "traditional" nested loop is much cleaner than either of those three versions. Alternatively, just use any of the other recursive "flatten dictionary with arbitrary depth" functions found here.
I actually figured out a way I can do this using getattr as a shortcut to checking if its a dict or not. To me this is more efficient than other ways I’ve seen:
data_out = {'_'.join(filter((0).__ne__, [pk, sk])): v
for pk, v in data.items()
for sk, v in getattr(v, 'items', {0: v}.items)()}
Legend:
pk - parent_key
sk - suffix_key or sub_key
And if you may have 0's in your keys I just used a "trasher" object:
fill = object(); nfill = fill.__ne__
data_out = {'_'.join(filter(nfill, [pk, sk])): v
for pk, v in data.items()
for sk, v in getattr(v, 'items', {fill:v}.items)()}

Merging two dictionaries of tuples [duplicate]

I have multiple dicts (or sequences of key-value pairs) like this:
d1 = {key1: x1, key2: y1}
d2 = {key1: x2, key2: y2}
How can I efficiently get a result like this, as a new dict?
d = {key1: (x1, x2), key2: (y1, y2)}
See also: How can one make a dictionary with duplicate keys in Python?.
Here's a general solution that will handle an arbitrary amount of dictionaries, with cases when keys are in only some of the dictionaries:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(list)
for d in (d1, d2): # you can list as many input dicts as you want here
for key, value in d.items():
dd[key].append(value)
print(dd) # result: defaultdict(<type 'list'>, {1: [2, 6], 3: [4, 7]})
assuming all keys are always present in all dicts:
ds = [d1, d2]
d = {}
for k in d1.iterkeys():
d[k] = tuple(d[k] for d in ds)
Note: In Python 3.x use below code:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = tuple(d[k] for d in ds)
and if the dic contain numpy arrays:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = np.concatenate(list(d[k] for d in ds))
This function merges two dicts even if the keys in the two dictionaries are different:
def combine_dict(d1, d2):
return {
k: tuple(d[k] for d in (d1, d2) if k in d)
for k in set(d1.keys()) | set(d2.keys())
}
Example:
d1 = {
'a': 1,
'b': 2,
}
d2` = {
'b': 'boat',
'c': 'car',
}
combine_dict(d1, d2)
# Returns: {
# 'a': (1,),
# 'b': (2, 'boat'),
# 'c': ('car',)
# }
dict1 = {'m': 2, 'n': 4}
dict2 = {'n': 3, 'm': 1}
Making sure that the keys are in the same order:
dict2_sorted = {i:dict2[i] for i in dict1.keys()}
keys = dict1.keys()
values = zip(dict1.values(), dict2_sorted.values())
dictionary = dict(zip(keys, values))
gives:
{'m': (2, 1), 'n': (4, 3)}
If you only have d1 and d2,
from collections import defaultdict
d = defaultdict(list)
for a, b in d1.items() + d2.items():
d[a].append(b)
Here is one approach you can use which would work even if both dictonaries don't have same keys:
d1 = {'a':'test','b':'btest','d':'dreg'}
d2 = {'a':'cool','b':'main','c':'clear'}
d = {}
for key in set(d1.keys() + d2.keys()):
try:
d.setdefault(key,[]).append(d1[key])
except KeyError:
pass
try:
d.setdefault(key,[]).append(d2[key])
except KeyError:
pass
print d
This would generate below input:
{'a': ['test', 'cool'], 'c': ['clear'], 'b': ['btest', 'main'], 'd': ['dreg']}
Using precomputed keys
def merge(dicts):
# First, figure out which keys are present.
keys = set().union(*dicts)
# Build a dict with those keys, using a list comprehension to
# pull the values from the source dicts.
return {
k: [d[k] for d in dicts if k in d]
for k in keys
}
This is essentially Flux's answer, generalized for a list of input dicts.
The set().union trick works by making a set union of the keys in all the source dictionaries. The union method on a set (we start with an empty one) can accept an arbitrary number of arguments, and make a union of each input with the original set; and it can accept other iterables (it does not require other sets for the arguments) - it will iterate over them and look for all unique elements. Since iterating over a dict yields its keys, they can be passed directly to the union method.
In the case where the keys of all inputs are known to be the same, this can be simplified: the keys can be hard-coded (or inferred from one of the inputs), and the if check in the list comprehension becomes unnecessary:
def merge(dicts):
return {
k: [d[k] for d in dicts]
for k in dicts[0].keys()
}
This is analogous to blubb's answer, but using a dict comprehension rather than an explicit loop to build the final result.
We could also try something like Mahdi Ghelichi's answer:
def merge(dicts):
values = zip(*(d.values() for d in ds))
return dict(zip(dicts[0].keys(), values))
This should work in Python 3.5 and below: dicts with identical keys will store them in the same order, during the same run of the program (if you run the program again, you may get a different ordering, but still a consistent one).
In 3.6 and above, dictionaries preserve their insertion order (though they are only guaranteed to do so by the specification in 3.7 and above). Thus, input dicts could have the same keys in a different order, which would cause the first zip to combine the wrong values.
We can work around this by "sorting" the input dicts (re-creating them with keys in a consistent order, like [{k:d[k] for k in dicts[0].keys()} for d in dicts]. (In older versions, this would be extra work with no net effect.) However, this adds complexity, and this double-zip approach really doesn't offer any advantages over the previous one using a dict comprehension.
Building the result explicitly, discovering keys on the fly
As in Eli Bendersky's answer, but as a function:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].append(value)
return result
This will produce a defaultdict, a subclass of dict defined by the standard library. The equivalent code using only built-in dicts might look like:
def merge(dicts):
result = {}
for d in dicts:
for key, value in d.items():
result.setdefault(key, []).append(value)
return result
Using other container types besides lists
The precomputed-key approach will work fine to make tuples; replace the list comprehension [d[k] for d in dicts if k in d] with tuple(d[k] for d in dicts if k in d). This passes a generator expression to the tuple constructor. (There is no "tuple comprehension".)
Since tuples are immutable and don't have an append method, the explicit loop approach should be modified by replacing .append(value) with += (value,). However, this may perform poorly if there is a lot of key duplication, since it must create a new tuple each time. It might be better to produce lists first and then convert the final result with something like {k: tuple(v) for (k, v) in merged.items()}.
Similar modifications can be made to get sets (although there is a set comprehension, using {}), Numpy arrays etc. For example, we can generalize both approaches with a container type like so:
def merge(dicts, value_type=list):
# First, figure out which keys are present.
keys = set().union(*dicts)
# Build a dict with those keys, using a list comprehension to
# pull the values from the source dicts.
return {
k: value_type(d[k] for d in dicts if k in d)
for k in keys
}
and
from collections import defaultdict
def merge(dicts, value_type=list):
# We stick with hard-coded `list` for the first part,
# because even other mutable types will offer different interfaces.
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].append(value)
# This is redundant for the default case, of course.
return {k:value_type(v) for (k, v) in result}
If the input values are already sequences
Rather than wrapping the values from the source in a new list, often people want to take inputs where the values are all already lists, and concatenate those lists in the output (or concatenate tuples or 1-dimensional Numpy arrays, combine sets, etc.).
This is still a trivial modification. For precomputed keys, use a nested list comprehension, ordered to get a flat result:
def merge(dicts):
keys = set().union(*dicts)
return {
k: [v for d in dicts if k in d for v in d[k]]
# Alternately:
# k: [v for d in dicts for v in d.get(k, [])]
for k in keys
}
One might instead think of using sum to concatenate results from the original list comprehension. Don't do this - it will perform poorly when there are a lot of duplicate keys. The built-in sum isn't optimized for sequences (and will explicitly disallow "summing" strings) and will try to create a new list with each addition internally.
With the explicit loop approach, use .extend instead of .append:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].extend(value)
return result
The extend method of lists accepts any iterable, so this will work with inputs that have tuples for the values - of course, it still uses lists in the output; and of course, those can be converted back as shown previously.
If the inputs have one item each
A common version of this problem involves input dicts that each have a single key-value pair. Alternately, the input might be (key, value) tuples (or lists).
The above approaches will still work, of course. For tuple inputs, converting them to dicts first, like [{k:v} for (k, v) in tuples], allows for using the directly. Alternately, the explicit iteration approach can be modified to accept the tuples directly, like in Victoria Stuart's answer:
from collections import defaultdict
def merge(pairs):
result = defaultdict(list)
for key, value in pairs:
result[key].extend(value)
return result
(The code was simplified because there is no need to iterate over key-value pairs when there is only one of them and it has been provided directly.)
However, for these single-item cases it may work better to sort the values by key and then use itertools.groupby. In this case, it will be easier to work with the tuples. That looks like:
from itertools import groupby
def merge(tuples):
grouped = groupby(tuples, key=lambda t: t[0])
return {k: [kv[1] for kv in ts] for k, ts in grouped}
Here, t is used as a name for one of the tuples from the input. The grouped iterator will provide pairs of a "key" value k (the first element that was common to the tuples being grouped) and an iterator ts over the tuples in that group. Then we extract the values from the key-value pairs kv in the ts, make a list from those, and use that as the value for the k key in the resulting dict.
To merge one-item dicts this way, of course, convert them to tuples first. One simple way to do this, for a list of one-item dicts, is [next(iter(d.items())) for d in dicts].
Assuming there are two dictionaries with exact same keys, below is the most succinct way of doing it (python3 should be used for both the solution).
d1 = {'a': 1, 'b': 2, 'c':3}
d2 = {'a': 5, 'b': 6, 'c':7}
# get keys from one of the dictionary
ks = [k for k in d1.keys()]
print(ks)
['a', 'b', 'c']
# call values from each dictionary on available keys
d_merged = {k: (d1[k], d2[k]) for k in ks}
print(d_merged)
{'a': (1, 5), 'b': (2, 6), 'c': (3, 7)}
# to merge values as list
d_merged = {k: [d1[k], d2[k]] for k in ks}
print(d_merged)
{'a': [1, 5], 'b': [2, 6], 'c': [3, 7]}
If there are two dictionaries with some common keys, but a few different keys, a list of all the keys should be prepared.
d1 = {'a': 1, 'b': 2, 'c':3, 'd': 9}
d2 = {'a': 5, 'b': 6, 'c':7, 'e': 4}
# get keys from one of the dictionary
d1_ks = [k for k in d1.keys()]
d2_ks = [k for k in d2.keys()]
all_ks = set(d1_ks + d2_ks)
print(all_ks)
['a', 'b', 'c', 'd', 'e']
# call values from each dictionary on available keys
d_merged = {k: [d1.get(k), d2.get(k)] for k in all_ks}
print(d_merged)
{'d': [9, None], 'a': [1, 5], 'b': [2, 6], 'c': [3, 7], 'e': [None, 4]}
There is a great library funcy doing what you need in a just one, short line.
from funcy import join_with
from pprint import pprint
d1 = {"key1": "x1", "key2": "y1"}
d2 = {"key1": "x2", "key2": "y2"}
list_of_dicts = [d1, d2]
merged_dict = join_with(tuple, list_of_dicts)
pprint(merged_dict)
Output:
{'key1': ('x1', 'x2'), 'key2': ('y1', 'y2')}
More info here: funcy -> join_with.
def merge(d1, d2, merge):
result = dict(d1)
for k,v in d2.iteritems():
if k in result:
result[k] = merge(result[k], v)
else:
result[k] = v
return result
d1 = {'a': 1, 'b': 2}
d2 = {'a': 1, 'b': 3, 'c': 2}
print merge(d1, d2, lambda x, y:(x,y))
{'a': (1, 1), 'c': 2, 'b': (2, 3)}
If keys are nested:
d1 = { 'key1': { 'nkey1': 'x1' }, 'key2': { 'nkey2': 'y1' } }
d2 = { 'key1': { 'nkey1': 'x2' }, 'key2': { 'nkey2': 'y2' } }
ds = [d1, d2]
d = {}
for k in d1.keys():
for k2 in d1[k].keys():
d.setdefault(k, {})
d[k].setdefault(k2, [])
d[k][k2] = tuple(d[k][k2] for d in ds)
yields:
{'key1': {'nkey1': ('x1', 'x2')}, 'key2': {'nkey2': ('y1', 'y2')}}
Modifying this answer to create a dictionary of tuples (what the OP asked for), instead of a dictionary of lists:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(tuple)
for d in (d1, d2): # you can list as many input dicts as you want here
for key, value in d.items():
dd[key] += (value,)
print(dd)
The above prints the following:
defaultdict(<class 'tuple'>, {1: (2, 6), 3: (4, 7)})
d1 ={'B': 10, 'C ': 7, 'A': 20}
d2 ={'B': 101, 'Y ': 7, 'X': 8}
d3 ={'A': 201, 'Y ': 77, 'Z': 8}
def CreateNewDictionaryAssemblingAllValues1(d1,d2,d3):
aa = {
k :[d[k] for d in (d1,d2,d3) if k in d ] for k in set(d1.keys() | d2.keys() | d3.keys() )
}
aap = print(aa)
return aap
CreateNewDictionaryAssemblingAllValues1(d1, d2, d3)
"""
Output :
{'X': [8], 'C ': [7], 'Y ': [7, 77], 'Z': [8], 'B': [10, 101], 'A': [20, 201]}
"""
From blubb answer:
You can also directly form the tuple using values from each list
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = (d1[k], d2[k])
This might be useful if you had a specific ordering for your tuples
ds = [d1, d2, d3, d4]
d = {}
for k in d1.keys():
d[k] = (d3[k], d1[k], d4[k], d2[k]) #if you wanted tuple in order of d3, d1, d4, d2
Using below method we can merge two dictionaries having same keys.
def update_dict(dict1: dict, dict2: dict) -> dict:
output_dict = {}
for key in dict1.keys():
output_dict.update({key: []})
if type(dict1[key]) != str:
for value in dict1[key]:
output_dict[key].append(value)
else:
output_dict[key].append(dict1[key])
if type(dict2[key]) != str:
for value in dict2[key]:
output_dict[key].append(value)
else:
output_dict[key].append(dict2[key])
return output_dict
Input: d1 = {key1: x1, key2: y1} d2 = {key1: x2, key2: y2}
Output: {'key1': ['x1', 'x2'], 'key2': ['y1', 'y2']}
dicts = [dict1,dict2,dict3]
out = dict(zip(dicts[0].keys(),[[dic[list(dic.keys())[key]] for dic in dicts] for key in range(0,len(dicts[0]))]))
A compact possibility
d1={'a':1,'b':2}
d2={'c':3,'d':4}
context={**d1, **d2}
context
{'b': 2, 'c': 3, 'd': 4, 'a': 1}

We're trying to build a small dict that's a combination of two others but we're kinda failing

Write a function combine_dicts(d1, d2) that takes two dictionaries
as input and returns a dictionary that is a combination of them in the
following way:
If a key is only in one of the dictionaries it should be the same in the result.
If a key is in both dictionaries and have the same value (as compared with is in Python) it should be the same in the result.
But if a key is in both dictionaries with di erent values, the value in the result should be a tuple with two entires, the one from d1 and
the one from d2.
Note that this program doesn’t do anything partic- ular when run. It
only exists because it contains this function. The module can be
imported in programs that need that function.
So we came up with a pretty simple code, which we don't know if it's correct. We came up with the idea of including two small dicts at the beginning because the assignment provides no way of testing, so we thought if it worked on the dictionaries it would be fine. Things is, it throws error: TypeError: unhashable type: 'dict'. Can you tell us what we're doing wrong?
d1=dict()
d1={'1':'uno','2':'dos','3':'tres'}
d2=dict()
d2={'1':'ena', '2':'duo', '3':'tres'}
def combine_dicts(d1,d2):
d3={}
for key in (d1, d2):
# Key is in d1 or in d2
if key in d1 or d2:
d3[key]=value
return d3
# Key is in both d1 and d2 and values are the same. The below is incorrect right?
elif key in d1 and d2 and d1[key] is d2[key]:
d3[key]=value
return d3
else:
if key in d1 and d2 and d1[key] is not d2[key]:
t=tuple(d1[key], d2[key])
return t
There are a number of problems here.
Currently in your for loop you set key to be equal to d1 and then d2, when you want to set it equal to each of the keys.
You never set value anywhere
You return while processing the first key, and not after having combined the dictionaries, and sometimes you're not even returning a dictionary. You only want to return d3, and only at the end.
There's no need to create the dicts with dict() before you create them again with a {} expression.
key in d1 or d2 will always evaluate to True, because bool(d2) is True. What you mean to say is key in d1 or key in d2. However, you don't really want to have this particular if statement at all - all of the keys are from either d1 or d2!
These issues are all fixed below:
d1 = {'1': 'uno', '2': 'dos', '3': 'tres'}
d2 = {'1': 'ena', '2': 'duo', '3': 'tres'}
def combine_dicts(d1,d2):
d3={}
for key in set().union(d1, d2):
# Key is in d1 or in d2
if key in d1 and key in d2:
if d1[key] == d2[key]:
d3[key] = d1[key]
else:
d3[key] = (d1[key], d2[key])
elif key in d1:
d3[key] = d1[key]
else:
d3[key] = d2[key]
return d3
print(combine_dicts(d1, d2))
First of all, you seem to not understand how to construct a dict:
d1=dict() # Construct an empty dict
d1={'1':'uno','2':'dos','3':'tres'} # Construct a dict containing 3 keys/values
So you do not need to initialize your variables d1 and d2 with dict() since the construction {key: value} will create a new dict, and thus overwrite the previous value (in your case, an empty dict constructed by dict()).
Next:
for key in (d1, d2):
You are asking to iterate over (d1, d2), but (d1, d2) is actually a tuple containing your two dicts. So you will go through your loop only two times: one time with key = d1 and one time with key = d2. So the code in your loop cannot work since key is a dict and not a key of d1 or d2.
You should first construct a list, or better, a set containing the union of d1.keys() and d2.keys(). You can do it with:
all_keys = set(d1.keys()).union(d2.keys())
Then you must iterate over all_keys and construct your new dict.
for key in all_keys:
if key in d1 and key not in d2:
d3[key] = d1[key]
elif key in d2 and key not in d1:
d3[key] = d2[key]
else:
d3[key] = (d1[key], d2[key])
here ya go! hope this helps. just a couple for loops to process both dictionaries, the second for loop has the if/else logic to check for matching keys and/or matching values
def combine(d1,d2):
d = {}
for k,v in d1.items():
d[k] = v
for k,v in d2.items():
if k in d1.keys():
if d1[k] == v:
continue
d[k] = (d1[k], v)
else:
d[k] = v
return d
Ok, here goes. Since an explanation was requested.
Start with an empty dictionary, d3, that will be used as the merged
dictionary.
Copy everything from d1 into d3.
Then for each key in d2:
copy the key/value pair into d3 if the key didn't exist in d1.
Otherwise, do the special dupe/tuple logic.
Solution:
def combine_dicts(d1, d2):
d3 = d1.copy() # start by copying the first dict
for k in d2:
if (k in d1):
if d1[k] != d2[k]:
d3[k] = (d1[k], d2[k])
else:
d3[k] = d2[k]
return d3

Check that Python dicts have same shape and keys

For single layer dicts like x = {'a': 1, 'b': 2} the problem is easy and answered on SO (Pythonic way to check if two dictionaries have the identical set of keys?) but what about nested dicts?
For example, y = {'a': {'c': 3}, 'b': {'d': 4}} has keys 'a' and 'b' but I want to compare its shape to another nested dict structure like z = {'a': {'c': 5}, 'b': {'d': 6}} which has the same shape and keys (different values is fine) as y. w = {'a': {'c': 3}, 'b': {'e': 4}} would have keys 'a' and 'b' but on the next layer in it differs from y because w['b'] has key 'e' while y['b'] has key 'd'.
Want a short/simple function of two arguments dict_1 and dict_2 and return True if they have same shape and key as described above, and False otherwise.
This provides a copy of both dictionaries stripped of any non-dictionary values, then compares them:
def getshape(d):
if isinstance(d, dict):
return {k:getshape(d[k]) for k in d}
else:
# Replace all non-dict values with None.
return None
def shape_equal(d1, d2):
return getshape(d1) == getshape(d2)
I liked nneonneo's answer, and it should be relatively fast, but I want something that didn't create extra unnecessary data structures (I've been learning about memory fragmentation in Python). This may or may not be as fast or faster.
(EDIT: Spoiler!)
Faster by a decent enough margin to make it preferable in all cases, see the other analysis answer.
But if dealing with lots and lots of these and having memory problems, it is likely to be preferable to do it this way.
Implementation
This should work in Python 3, maybe 2.7 if you translate keys to viewkeys, definitely not 2.6. It relies on the set-like view of the keys that dicts have:
def sameshape(d1, d2):
if isinstance(d1, dict):
if isinstance(d2, dict):
# then we have shapes to check
return (d1.keys() == d2.keys() and
# so the keys are all the same
all(sameshape(d1[k], d2[k]) for k in d1.keys()))
# thus all values will be tested in the same way.
else:
return False # d1 is a dict, but d2 isn't
else:
return not isinstance(d2, dict) # if d2 is a dict, False, else True.
Edit updated to reduce redundant type check, now even more efficient.
Testing
To check:
print('expect false:')
print(sameshape({'foo':{'bar':{None:None}}}, {'foo':{'bar':{None: {} }}}))
print('expect true:')
print(sameshape({'foo':{'bar':{None:None}}}, {'foo':{'bar':{None:'foo'}}}))
print('expect false:')
print(sameshape({'foo':{'bar':{None:None}}}, {'foo':{'bar':{None:None, 'baz':'foo'}}}))
Prints:
expect false:
False
expect true:
True
expect false:
False
To profile the two currently existing answers, first lets import timeit:
import timeit
Now we need to setup the code:
setup = '''
import copy
def getshape(d):
if isinstance(d, dict):
return {k:getshape(d[k]) for k in d}
else:
# Replace all non-dict values with None.
return None
def nneo_shape_equal(d1, d2):
return getshape(d1) == getshape(d2)
def aaron_shape_equal(d1,d2):
if isinstance(d1, dict) and isinstance(d2, dict):
return (d1.keys() == d2.keys() and
all(aaron_shape_equal(d1[k], d2[k]) for k in d1.keys()))
else:
return not (isinstance(d1, dict) or isinstance(d2, dict))
class Vividict(dict):
def __missing__(self, key):
value = self[key] = type(self)()
return value
d = Vividict()
d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
d0 = copy.deepcopy(d)
d1 = copy.deepcopy(d)
d1['primary']['secondary']['tertiary']['extra']
# d == d0 is True
# d == d1 is now False!
'''
And now let's test the two options out, first with Python 3.3!
>>> timeit.repeat('nneo_shape_equal(d0, d); nneo_shape_equal(d1,d)', setup=setup)
[36.784881490981206, 36.212246977956966, 36.29759863798972]
And it looks like my solution takes 2/3rd to 3/4th the time, making it more than 1.25 times as fast.
>>> timeit.repeat('aaron_shape_equal(d0, d); aaron_shape_equal(d1,d)', setup=setup)
[26.838892214931548, 26.61037168605253, 27.170253590098582]
And on a version of Python 3.4 (an alpha) that I compiled myself:
>>> timeit.repeat('nneo_shape_equal(d0, d); nneo_shape_equal(d1,d)', setup=setup)
[272.5629618819803, 273.49581588001456, 270.13374400604516]
>>> timeit.repeat('aaron_shape_equal(d0, d); aaron_shape_equal(d1,d)', setup=setup)
[214.87033835891634, 215.69223327597138, 214.85333003790583]
Still about the same ratio. The time difference between the two is likely because I self-compiled 3.4 without optimizations.
Thanks to all readers!

How to merge dicts, collecting values from matching keys?

I have multiple dicts (or sequences of key-value pairs) like this:
d1 = {key1: x1, key2: y1}
d2 = {key1: x2, key2: y2}
How can I efficiently get a result like this, as a new dict?
d = {key1: (x1, x2), key2: (y1, y2)}
See also: How can one make a dictionary with duplicate keys in Python?.
Here's a general solution that will handle an arbitrary amount of dictionaries, with cases when keys are in only some of the dictionaries:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(list)
for d in (d1, d2): # you can list as many input dicts as you want here
for key, value in d.items():
dd[key].append(value)
print(dd) # result: defaultdict(<type 'list'>, {1: [2, 6], 3: [4, 7]})
assuming all keys are always present in all dicts:
ds = [d1, d2]
d = {}
for k in d1.iterkeys():
d[k] = tuple(d[k] for d in ds)
Note: In Python 3.x use below code:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = tuple(d[k] for d in ds)
and if the dic contain numpy arrays:
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = np.concatenate(list(d[k] for d in ds))
This function merges two dicts even if the keys in the two dictionaries are different:
def combine_dict(d1, d2):
return {
k: tuple(d[k] for d in (d1, d2) if k in d)
for k in set(d1.keys()) | set(d2.keys())
}
Example:
d1 = {
'a': 1,
'b': 2,
}
d2` = {
'b': 'boat',
'c': 'car',
}
combine_dict(d1, d2)
# Returns: {
# 'a': (1,),
# 'b': (2, 'boat'),
# 'c': ('car',)
# }
dict1 = {'m': 2, 'n': 4}
dict2 = {'n': 3, 'm': 1}
Making sure that the keys are in the same order:
dict2_sorted = {i:dict2[i] for i in dict1.keys()}
keys = dict1.keys()
values = zip(dict1.values(), dict2_sorted.values())
dictionary = dict(zip(keys, values))
gives:
{'m': (2, 1), 'n': (4, 3)}
If you only have d1 and d2,
from collections import defaultdict
d = defaultdict(list)
for a, b in d1.items() + d2.items():
d[a].append(b)
Here is one approach you can use which would work even if both dictonaries don't have same keys:
d1 = {'a':'test','b':'btest','d':'dreg'}
d2 = {'a':'cool','b':'main','c':'clear'}
d = {}
for key in set(d1.keys() + d2.keys()):
try:
d.setdefault(key,[]).append(d1[key])
except KeyError:
pass
try:
d.setdefault(key,[]).append(d2[key])
except KeyError:
pass
print d
This would generate below input:
{'a': ['test', 'cool'], 'c': ['clear'], 'b': ['btest', 'main'], 'd': ['dreg']}
Using precomputed keys
def merge(dicts):
# First, figure out which keys are present.
keys = set().union(*dicts)
# Build a dict with those keys, using a list comprehension to
# pull the values from the source dicts.
return {
k: [d[k] for d in dicts if k in d]
for k in keys
}
This is essentially Flux's answer, generalized for a list of input dicts.
The set().union trick works by making a set union of the keys in all the source dictionaries. The union method on a set (we start with an empty one) can accept an arbitrary number of arguments, and make a union of each input with the original set; and it can accept other iterables (it does not require other sets for the arguments) - it will iterate over them and look for all unique elements. Since iterating over a dict yields its keys, they can be passed directly to the union method.
In the case where the keys of all inputs are known to be the same, this can be simplified: the keys can be hard-coded (or inferred from one of the inputs), and the if check in the list comprehension becomes unnecessary:
def merge(dicts):
return {
k: [d[k] for d in dicts]
for k in dicts[0].keys()
}
This is analogous to blubb's answer, but using a dict comprehension rather than an explicit loop to build the final result.
We could also try something like Mahdi Ghelichi's answer:
def merge(dicts):
values = zip(*(d.values() for d in ds))
return dict(zip(dicts[0].keys(), values))
This should work in Python 3.5 and below: dicts with identical keys will store them in the same order, during the same run of the program (if you run the program again, you may get a different ordering, but still a consistent one).
In 3.6 and above, dictionaries preserve their insertion order (though they are only guaranteed to do so by the specification in 3.7 and above). Thus, input dicts could have the same keys in a different order, which would cause the first zip to combine the wrong values.
We can work around this by "sorting" the input dicts (re-creating them with keys in a consistent order, like [{k:d[k] for k in dicts[0].keys()} for d in dicts]. (In older versions, this would be extra work with no net effect.) However, this adds complexity, and this double-zip approach really doesn't offer any advantages over the previous one using a dict comprehension.
Building the result explicitly, discovering keys on the fly
As in Eli Bendersky's answer, but as a function:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].append(value)
return result
This will produce a defaultdict, a subclass of dict defined by the standard library. The equivalent code using only built-in dicts might look like:
def merge(dicts):
result = {}
for d in dicts:
for key, value in d.items():
result.setdefault(key, []).append(value)
return result
Using other container types besides lists
The precomputed-key approach will work fine to make tuples; replace the list comprehension [d[k] for d in dicts if k in d] with tuple(d[k] for d in dicts if k in d). This passes a generator expression to the tuple constructor. (There is no "tuple comprehension".)
Since tuples are immutable and don't have an append method, the explicit loop approach should be modified by replacing .append(value) with += (value,). However, this may perform poorly if there is a lot of key duplication, since it must create a new tuple each time. It might be better to produce lists first and then convert the final result with something like {k: tuple(v) for (k, v) in merged.items()}.
Similar modifications can be made to get sets (although there is a set comprehension, using {}), Numpy arrays etc. For example, we can generalize both approaches with a container type like so:
def merge(dicts, value_type=list):
# First, figure out which keys are present.
keys = set().union(*dicts)
# Build a dict with those keys, using a list comprehension to
# pull the values from the source dicts.
return {
k: value_type(d[k] for d in dicts if k in d)
for k in keys
}
and
from collections import defaultdict
def merge(dicts, value_type=list):
# We stick with hard-coded `list` for the first part,
# because even other mutable types will offer different interfaces.
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].append(value)
# This is redundant for the default case, of course.
return {k:value_type(v) for (k, v) in result}
If the input values are already sequences
Rather than wrapping the values from the source in a new list, often people want to take inputs where the values are all already lists, and concatenate those lists in the output (or concatenate tuples or 1-dimensional Numpy arrays, combine sets, etc.).
This is still a trivial modification. For precomputed keys, use a nested list comprehension, ordered to get a flat result:
def merge(dicts):
keys = set().union(*dicts)
return {
k: [v for d in dicts if k in d for v in d[k]]
# Alternately:
# k: [v for d in dicts for v in d.get(k, [])]
for k in keys
}
One might instead think of using sum to concatenate results from the original list comprehension. Don't do this - it will perform poorly when there are a lot of duplicate keys. The built-in sum isn't optimized for sequences (and will explicitly disallow "summing" strings) and will try to create a new list with each addition internally.
With the explicit loop approach, use .extend instead of .append:
from collections import defaultdict
def merge(dicts):
result = defaultdict(list)
for d in dicts:
for key, value in d.items():
result[key].extend(value)
return result
The extend method of lists accepts any iterable, so this will work with inputs that have tuples for the values - of course, it still uses lists in the output; and of course, those can be converted back as shown previously.
If the inputs have one item each
A common version of this problem involves input dicts that each have a single key-value pair. Alternately, the input might be (key, value) tuples (or lists).
The above approaches will still work, of course. For tuple inputs, converting them to dicts first, like [{k:v} for (k, v) in tuples], allows for using the directly. Alternately, the explicit iteration approach can be modified to accept the tuples directly, like in Victoria Stuart's answer:
from collections import defaultdict
def merge(pairs):
result = defaultdict(list)
for key, value in pairs:
result[key].extend(value)
return result
(The code was simplified because there is no need to iterate over key-value pairs when there is only one of them and it has been provided directly.)
However, for these single-item cases it may work better to sort the values by key and then use itertools.groupby. In this case, it will be easier to work with the tuples. That looks like:
from itertools import groupby
def merge(tuples):
grouped = groupby(tuples, key=lambda t: t[0])
return {k: [kv[1] for kv in ts] for k, ts in grouped}
Here, t is used as a name for one of the tuples from the input. The grouped iterator will provide pairs of a "key" value k (the first element that was common to the tuples being grouped) and an iterator ts over the tuples in that group. Then we extract the values from the key-value pairs kv in the ts, make a list from those, and use that as the value for the k key in the resulting dict.
To merge one-item dicts this way, of course, convert them to tuples first. One simple way to do this, for a list of one-item dicts, is [next(iter(d.items())) for d in dicts].
Assuming there are two dictionaries with exact same keys, below is the most succinct way of doing it (python3 should be used for both the solution).
d1 = {'a': 1, 'b': 2, 'c':3}
d2 = {'a': 5, 'b': 6, 'c':7}
# get keys from one of the dictionary
ks = [k for k in d1.keys()]
print(ks)
['a', 'b', 'c']
# call values from each dictionary on available keys
d_merged = {k: (d1[k], d2[k]) for k in ks}
print(d_merged)
{'a': (1, 5), 'b': (2, 6), 'c': (3, 7)}
# to merge values as list
d_merged = {k: [d1[k], d2[k]] for k in ks}
print(d_merged)
{'a': [1, 5], 'b': [2, 6], 'c': [3, 7]}
If there are two dictionaries with some common keys, but a few different keys, a list of all the keys should be prepared.
d1 = {'a': 1, 'b': 2, 'c':3, 'd': 9}
d2 = {'a': 5, 'b': 6, 'c':7, 'e': 4}
# get keys from one of the dictionary
d1_ks = [k for k in d1.keys()]
d2_ks = [k for k in d2.keys()]
all_ks = set(d1_ks + d2_ks)
print(all_ks)
['a', 'b', 'c', 'd', 'e']
# call values from each dictionary on available keys
d_merged = {k: [d1.get(k), d2.get(k)] for k in all_ks}
print(d_merged)
{'d': [9, None], 'a': [1, 5], 'b': [2, 6], 'c': [3, 7], 'e': [None, 4]}
There is a great library funcy doing what you need in a just one, short line.
from funcy import join_with
from pprint import pprint
d1 = {"key1": "x1", "key2": "y1"}
d2 = {"key1": "x2", "key2": "y2"}
list_of_dicts = [d1, d2]
merged_dict = join_with(tuple, list_of_dicts)
pprint(merged_dict)
Output:
{'key1': ('x1', 'x2'), 'key2': ('y1', 'y2')}
More info here: funcy -> join_with.
def merge(d1, d2, merge):
result = dict(d1)
for k,v in d2.iteritems():
if k in result:
result[k] = merge(result[k], v)
else:
result[k] = v
return result
d1 = {'a': 1, 'b': 2}
d2 = {'a': 1, 'b': 3, 'c': 2}
print merge(d1, d2, lambda x, y:(x,y))
{'a': (1, 1), 'c': 2, 'b': (2, 3)}
If keys are nested:
d1 = { 'key1': { 'nkey1': 'x1' }, 'key2': { 'nkey2': 'y1' } }
d2 = { 'key1': { 'nkey1': 'x2' }, 'key2': { 'nkey2': 'y2' } }
ds = [d1, d2]
d = {}
for k in d1.keys():
for k2 in d1[k].keys():
d.setdefault(k, {})
d[k].setdefault(k2, [])
d[k][k2] = tuple(d[k][k2] for d in ds)
yields:
{'key1': {'nkey1': ('x1', 'x2')}, 'key2': {'nkey2': ('y1', 'y2')}}
Modifying this answer to create a dictionary of tuples (what the OP asked for), instead of a dictionary of lists:
from collections import defaultdict
d1 = {1: 2, 3: 4}
d2 = {1: 6, 3: 7}
dd = defaultdict(tuple)
for d in (d1, d2): # you can list as many input dicts as you want here
for key, value in d.items():
dd[key] += (value,)
print(dd)
The above prints the following:
defaultdict(<class 'tuple'>, {1: (2, 6), 3: (4, 7)})
d1 ={'B': 10, 'C ': 7, 'A': 20}
d2 ={'B': 101, 'Y ': 7, 'X': 8}
d3 ={'A': 201, 'Y ': 77, 'Z': 8}
def CreateNewDictionaryAssemblingAllValues1(d1,d2,d3):
aa = {
k :[d[k] for d in (d1,d2,d3) if k in d ] for k in set(d1.keys() | d2.keys() | d3.keys() )
}
aap = print(aa)
return aap
CreateNewDictionaryAssemblingAllValues1(d1, d2, d3)
"""
Output :
{'X': [8], 'C ': [7], 'Y ': [7, 77], 'Z': [8], 'B': [10, 101], 'A': [20, 201]}
"""
From blubb answer:
You can also directly form the tuple using values from each list
ds = [d1, d2]
d = {}
for k in d1.keys():
d[k] = (d1[k], d2[k])
This might be useful if you had a specific ordering for your tuples
ds = [d1, d2, d3, d4]
d = {}
for k in d1.keys():
d[k] = (d3[k], d1[k], d4[k], d2[k]) #if you wanted tuple in order of d3, d1, d4, d2
Using below method we can merge two dictionaries having same keys.
def update_dict(dict1: dict, dict2: dict) -> dict:
output_dict = {}
for key in dict1.keys():
output_dict.update({key: []})
if type(dict1[key]) != str:
for value in dict1[key]:
output_dict[key].append(value)
else:
output_dict[key].append(dict1[key])
if type(dict2[key]) != str:
for value in dict2[key]:
output_dict[key].append(value)
else:
output_dict[key].append(dict2[key])
return output_dict
Input: d1 = {key1: x1, key2: y1} d2 = {key1: x2, key2: y2}
Output: {'key1': ['x1', 'x2'], 'key2': ['y1', 'y2']}
dicts = [dict1,dict2,dict3]
out = dict(zip(dicts[0].keys(),[[dic[list(dic.keys())[key]] for dic in dicts] for key in range(0,len(dicts[0]))]))
A compact possibility
d1={'a':1,'b':2}
d2={'c':3,'d':4}
context={**d1, **d2}
context
{'b': 2, 'c': 3, 'd': 4, 'a': 1}

Categories