How to merge two dicts and combine common keys? - python

I would like to know whether there is a Python function to merge two dictionaries and combine all values that share a common key.
I have found functions to append two dicts and to merge two dicts, but not to combine their values.
Example:
D1 = [{k1: v01}, {k3: v03}, {k4: v04}]
D2 = [{k1: v11}, {k2: v12}, {k4: v14}]
This should be the expected result:
D3 = [
    {k1: [v01, v11]},
    {k2: [v12]},
    {k3: [v03]},
    {k4: [v04, v14]},
]

There is no built-in function for this, but you can use a defaultdict (this assumes d1 and d2 are plain dicts):
from collections import defaultdict
d = defaultdict(list)
for other in [d1, d2]:
    for k, v in other.items():
        d[k].append(v)
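The snippet above works on plain dicts. Since the question's D1 and D2 are lists of single-key dicts, here is a minimal sketch that walks every record in both lists before grouping; rebuilding the result as a list of single-key dicts sorted by key is an assumption based on the expected output in the question.

```python
from collections import defaultdict

D1 = [{'k1': 'v01'}, {'k3': 'v03'}, {'k4': 'v04'}]
D2 = [{'k1': 'v11'}, {'k2': 'v12'}, {'k4': 'v14'}]

# Group every value under its key, in the order the records are visited.
grouped = defaultdict(list)
for record in D1 + D2:
    for k, v in record.items():
        grouped[k].append(v)

# Rebuild the list-of-records shape from the question, sorted by key.
D3 = [{k: grouped[k]} for k in sorted(grouped)]
print(D3)
# [{'k1': ['v01', 'v11']}, {'k2': ['v12']}, {'k3': ['v03']}, {'k4': ['v04', 'v14']}]
```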

A solution, without importing anything:
# First initialize data, done correctly here.
D1 = [{'k1': 'v01'}, {'k3': 'v03'}, {'k4': 'v04'}]
D2 = [{'k1': 'v11'}, {'k2': 'v12'}, {'k4': 'v14'}]
# Get all unique keys
keys = {k for d in [*D1, *D2] for k in d}
# Initialize an empty dict
D3 = {x:[] for x in keys}
# sort to maintain order
D3 = dict(sorted(D3.items()))
# Iterate and append
for x in [*D1, *D2]:
    for k, v in x.items():
        D3[k].append(v)
# NOTE: I do not recommend you convert a dictionary into a list of records.
# Nonetheless, here is how it would be done.
# To convert to a list
D3_list = [{k:v} for k,v in D3.items()]
print(D3_list)
# [{'k1': ['v01', 'v11']},
# {'k2': ['v12']},
# {'k3': ['v03']},
# {'k4': ['v04', 'v14']}]

If you meant to use actual dicts, instead of lists of dicts, this is easier.
D1 = dict(k1=1, k3=3, k4=4)
D2 = dict(k1=11, k2=12, k4=14)
There isn't a simple built-in function to do this, but the setdefault method is close.
It tries to get the given key, but creates it if it doesn't exist.
D3 = {}
for k, v in D1.items() | D2.items():
    D3.setdefault(k, set()).add(v)
And the result:
{'k4': {4, 14}, 'k1': {1, 11}, 'k3': {3}, 'k2': {12}}
This all assumes that order doesn't matter and that you are just combining sets.
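If order or repeated values do matter, one minimal variant is to chain the two item views instead of taking their union (the union is a set operation, so identical (key, value) pairs collapse into one) and to collect into lists. This sketch uses itertools.chain with the same D1 and D2 as above.

```python
from itertools import chain

D1 = dict(k1=1, k3=3, k4=4)
D2 = dict(k1=11, k2=12, k4=14)

D3 = {}
# chain() yields D1's items first, then D2's, so each list keeps that order.
for k, v in chain(D1.items(), D2.items()):
    D3.setdefault(k, []).append(v)

print(D3)
# {'k1': [1, 11], 'k3': [3], 'k4': [4, 14], 'k2': [12]}
```

Keys appear in the order they are first seen, and a value present in both dicts under the same key would be kept twice.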

A more generic way to merge dicts together may look like this.
(To answer a similar SO question)
def merge(combiner, dicts):
    new_dict = {}
    for d in dicts:
        for k, v in d.items():
            if k in new_dict:
                # Key already seen: combine the old and new values
                new_dict[k] = combiner(new_dict[k], v)
            else:
                new_dict[k] = v
    return new_dict
x = {'a': 'A', 'b': 'B'}
y = {'b': 'B', 'c': 'C'}
z = {'a': 'A', 'd': 'D'}
merge(combiner=lambda a, b: f'{a} AND {b}', dicts=(x, y, z))
# {'a': 'A AND A', 'b': 'B AND B', 'c': 'C', 'd': 'D'}
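Because the combiner is just a two-argument function, the same merge helper can, for example, sum numeric values with operator.add. A usage sketch, with the helper repeated so the block runs on its own; the stock_a and stock_b dicts are made-up sample data.

```python
import operator

def merge(combiner, dicts):
    """Merge dicts left to right, calling combiner(old, new) on key collisions."""
    new_dict = {}
    for d in dicts:
        for k, v in d.items():
            if k in new_dict:
                new_dict[k] = combiner(new_dict[k], v)
            else:
                new_dict[k] = v
    return new_dict

stock_a = {'apples': 3, 'pears': 2}
stock_b = {'apples': 5, 'plums': 7}

totals = merge(operator.add, (stock_a, stock_b))
print(totals)  # {'apples': 8, 'pears': 2, 'plums': 7}
```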

Related

How to extract values from list of dictionary that match the keys in another list

I have a list of keys:
l_keys = ['a', 'c', 'd']
And I have a list of dictionary:
l_dict = [{'a': 1, 'b': 2, 'c': 3}, {'a':4, 'd':5}]
The result I want to get is:
[{'a': 1, 'c': 3}, {'a': 4, 'd': 5}]
I can achieve this result in the following way:
[{k: d[key] for k in l_keys if k in l_dict} for d in l_dict]
Explanation:
I go through every dictionary in l_dict, then through every key in l_keys, and if that key is in the current dictionary I retrieve it along with its value.
My question is whether there is a better, more professional, and faster way to do this in terms of time complexity.
Firstly, your list comprehension should be: [{k: d[k] for k in l_keys if k in d} for d in l_dict]
If you know that l_keys will usually be shorter than the dicts in l_dict, your way is the more efficient one. Otherwise, it would be better to iterate over each dict's keys and check whether each one is in l_keys:
[{k: d[k] for k in d if k in l_keys} for d in l_dict]
Since a membership test on a list is O(n), convert l_keys to a set first so that each check is O(1) on average:
l_set = set(l_keys)
[{k: d[k] for k in d if k in l_set} for d in l_dict]
This page might be helpful when it comes to time complexity: https://wiki.python.org/moin/TimeComplexity#dict
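A minimal sketch comparing both approaches on the question's data; by_keys and by_set are illustrative names. Which one is faster depends on the relative sizes of l_keys and the dicts, so it is worth timing both on real data (for example with the timeit module).

```python
l_keys = ['a', 'c', 'd']
l_dict = [{'a': 1, 'b': 2, 'c': 3}, {'a': 4, 'd': 5}]

# Approach 1: loop over the wanted keys; work per dict grows with len(l_keys).
by_keys = [{k: d[k] for k in l_keys if k in d} for d in l_dict]

# Approach 2: loop over each dict's own keys; the set gives O(1) average lookups.
l_set = set(l_keys)
by_set = [{k: d[k] for k in d if k in l_set} for d in l_dict]

print(by_keys)  # [{'a': 1, 'c': 3}, {'a': 4, 'd': 5}]
print(by_set)   # [{'a': 1, 'c': 3}, {'a': 4, 'd': 5}]
```

The two results compare equal; the only difference is key order inside each dict (approach 1 follows l_keys, approach 2 follows the original dict).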

How to extract the values from the dictionary with condition

My dictionary is below. If a value ends with csv or json, I need to put it into another dictionary with the same keys:
d = {'a': ['1.json', '1.html', '1.csv'], 'B': ['2.json', '2.html', '2.csv']}
My code:
result = {}
for k, v in d.items():
    for i in v:
        if i.split('.')[1] == 'csv' or i.split('.')[1] == 'json':
            result[k] = v
My output:
{'a': ['1.json', '1.html', '1.csv'], 'B': ['2.json', '2.html', '2.csv']}
Expected output:
{'a': ['1.json', '1.csv'], 'B': ['2.json', '2.csv']}
You never change v, so the dictionary is unchanged. You should build a new list only keeping the wanted values:
for k, v in d.items():
    v = [i for i in v if i.split('.')[1] in ('csv', 'json')]
    d[k] = v
or even better:
d = {k: [i for i in v if i.split('.')[1] in ('csv', 'json')]
for k,v in d.items()}
You can use rfind to find the last . in the file name and add the name to the list if its extension is csv or json:
d = {'a': ['1.json', '1.html', '1.csv'], 'B': ['2.json', '2.html', '2.csv']}
for k, v in d.items():
    d[k] = []
    for i in v:
        if i[i.rfind(".") + 1:] in ('csv', 'json'):
            d[k].append(i)
Please test this:
from collections import defaultdict
x = defaultdict(list)
d = {'a': ['1.json', '1.html', '1.csv'], 'B': ['2.json', '2.html', '2.csv']}
for k, v in d.items():
    for i in v:
        if i.split('.')[-1] in ['csv', 'json']:
            x[k].append(i)
print(dict(x))
You can use a dict comprehension with a list comprehension on each of the values, filtering with str.endswith:
d = {'a': ['1.json', '1.html', '1.csv'], 'B': ['2.json', '2.html', '2.csv']}
new = {k:[i for i in v if i.endswith(('.csv','.json'))] for k,v in d.items()}
# {'a': ['1.json', '1.csv'], 'B': ['2.json', '2.csv']}
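A simple solution would be the following, assuming you want to update the existing dictionary in place. Iterate over a copy of each list (v[:]), because removing items from a list while iterating over it skips the element that follows each removed one:
for k, v in d.items():
    for i in v[:]:
        if i.rsplit('.', 1)[-1] not in {'csv', 'json'}:
            v.remove(i)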
You can split the file name and add a condition that checks the file type (comparing against a list makes it easy to add more file types later).
Combine this with a list comprehension, which provides a concise way to create lists:
d = {'a': ['1.json', '1.html', '1.csv'], 'B': ['2.json', '2.html', '2.csv']}
filtered_d = {key:[item for item in value if item.split('.')[-1] in ['json','csv']] for key, value in d.items()}
You can check the Python documentation on list comprehensions for more detail.
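All of the answers above compare the text after a dot. A slightly more robust sketch, assuming file names may contain several dots, upper-case extensions, or no extension at all, uses os.path.splitext, which returns the extension including its leading dot. The extra names '3.tar.CSV' and 'notes' are made-up sample data to show those cases (note that i.split('.')[1] would raise IndexError on 'notes').

```python
import os.path

d = {'a': ['1.json', '1.html', '1.csv'], 'B': ['2.json', '2.html', '2.csv', '3.tar.CSV', 'notes']}
wanted = {'.csv', '.json'}

# splitext('3.tar.CSV') -> ('3.tar', '.CSV'); lower() makes the check case-insensitive.
filtered = {
    k: [name for name in v if os.path.splitext(name)[1].lower() in wanted]
    for k, v in d.items()
}
print(filtered)
# {'a': ['1.json', '1.csv'], 'B': ['2.json', '2.csv', '3.tar.CSV']}
```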

Weave two dictionaries into one

I have two dictionaries:
dict1 = {'a': 1,
'b': 2,
'c': 3,
'd': 4,
'x': 5}
and
dict2 = {'a': 'start',
'b': 'start',
'c': 'end',
'd': 'end'}
I am trying to create a new dictionary that maps the values start and end as keys to a dictionary that would contain the info of dict1, while keeping those that are not present in dict2 as keys, e.g.:
dict3 = {'start': {'a': 1, 'b': 2},
'end': {'c': 3, 'd': 4},
'x': {'x': 5}
}
Use dict.setdefault() to create the nested dictionaries in dict3 if not yet there, and dict.get() to determine the key in the top-level output dictionary:
dict3 = {}
for k, v in dict1.items():
    nested = dict3.setdefault(dict2.get(k, k), {})
    nested[k] = v
So dict2.get(k, k) will produce the value from dict2 for a given key from dict1, using the key itself as a default. So for the 'x' key, that'll produce 'x' as there is no mapping in dict2 for that key.
Demo:
>>> dict3 = {}
>>> for k, v in dict1.items():
...     nested = dict3.setdefault(dict2.get(k, k), {})
...     nested[k] = v
...
>>> dict3
{'start': {'a': 1, 'b': 2}, 'end': {'c': 3, 'd': 4}, 'x': {'x': 5}}
I actually figured it out while abstracting the example and typing up my question here (should have maybe done this earlier...). Anyways: Yay!
So here is my solution, in case it may help someone. If someone knows a swifter or more elegant way to do it, I would be glad to learn!
dict3 = dict()
for k, v in dict1.items():
    # if the key of dict1 exists also in dict2
    if k in dict2.keys():
        # get its value (the keys-to-be for the new dict3)
        new_key = dict2[k]
        # if the new key is already in the new dict
        if new_key in dict3.keys():
            # appends new dict entry to dict3
            dict3[new_key].update({k: v})
        # otherwise create a new entry
        else:
            dict3[new_key] = {k: v}
    # if there is no corresponding mapping present
    else:
        # treat the original key as the new key and add to dict3
        no_map = k
        dict3[no_map] = {k: v}
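For comparison, a compact sketch of the same grouping with collections.defaultdict(dict), which creates each nested dict on first access, so the explicit existence checks in the loop above are not needed. It produces the same dict3 for the sample data.

```python
from collections import defaultdict

dict1 = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'x': 5}
dict2 = {'a': 'start', 'b': 'start', 'c': 'end', 'd': 'end'}

grouped = defaultdict(dict)
for k, v in dict1.items():
    # Fall back to the key itself when dict2 has no mapping for it.
    grouped[dict2.get(k, k)][k] = v

dict3 = dict(grouped)  # convert back to a plain dict
print(dict3)
# {'start': {'a': 1, 'b': 2}, 'end': {'c': 3, 'd': 4}, 'x': {'x': 5}}
```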

How to remove dictionary key with known value?

Assume python dict:
mydict = {'a': 100, 'b': 200, 'c': 300}
I know one of the values:
value = 200
How to remove the 'b': 200 pair from the dict? I need this:
mydict = {'a': 100, 'c': 300}
Use a dictionary comprehension. Note that (as jonrsharpe has stated) this will create a new dictionary which excludes the key:value pair that you want to remove. If you want to delete it from your original dictionary then please see his answer.
>>> d = {'a': 100, 'b': 200, 'c': 300}
>>> val = 200
# Use d.iteritems() for Python 2.x and d.items() for Python 3.x
>>> d2 = {k:v for k,v in d.items() if v != val}
>>> d2
{'a': 100, 'c': 300}
It sounds like you want:
for key, val in list(mydict.items()):
    if val == value:
        del mydict[key]
        break  # unless you want to remove multiple occurrences
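You'll need to loop over every item, either with a dict comprehension (predicate stands for your own test that returns True for the pairs you want to keep):
new_dict = {k: v for k, v in my_dict.items() if predicate(v)}
Or by modifying the existing dictionary, iterating over a copy of the items, because deleting keys while iterating over the dict itself raises a RuntimeError in Python 3:
for k, v in list(my_dict.items()):
    if not predicate(v):
        del my_dict[k]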
The simplest way I found:
for key in [k for k, v in mydict.items() if v == 200]:
    del mydict[key]
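Deleting from a dict while iterating over it directly raises RuntimeError in Python 3 ("dictionary changed size during iteration"), which is why iterating over list(mydict.items()) or a precomputed list of keys, as some of the answers above do, is necessary. A minimal in-place sketch that removes every key holding the given value; remove_by_value is a hypothetical helper name.

```python
def remove_by_value(d, value):
    """Delete every key whose value equals `value`; return the removed keys."""
    # Collect the keys first so the dict is not resized while being iterated.
    to_delete = [k for k, v in d.items() if v == value]
    for k in to_delete:
        del d[k]
    return to_delete

mydict = {'a': 100, 'b': 200, 'c': 300}
removed = remove_by_value(mydict, 200)
print(mydict)   # {'a': 100, 'c': 300}
print(removed)  # ['b']
```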

Python remove duplicate value in a combined dictionary's list

I need a little homework help. I have to write a function that combines several dictionaries into a new dictionary. If a key appears more than once, the values corresponding to that key in the new dictionary should be a list of unique values. As an example, this is what I have so far:
f = {'a': 'apple', 'c': 'cat', 'b': 'bat', 'd': 'dog'}
g = {'c': 'car', 'b': 'bat', 'e': 'elephant'}
h = {'b': 'boy', 'd': 'deer'}
r = {'a': 'adam'}
def merge(*d):
    newdicts = {}
    for dict in d:
        for k in dict.items():
            if k[0] in newdicts:
                newdicts[k[0]].append(k[1])
            else:
                newdicts[k[0]] = [k[1]]
    return newdicts
combined = merge(f, g, h, r)
print(combined)
The output looks like:
{'a': ['apple', 'adam'], 'c': ['cat', 'car'], 'b': ['bat', 'bat', 'boy'], 'e': ['elephant'], 'd': ['dog', 'deer']}
Under the 'b' key, 'bat' appears twice. How do I remove the duplicates?
I've looked at filter and lambda, but I couldn't figure out how to use them here (maybe because it's a list inside a dictionary?).
Any help would be appreciated. And thank you in advance for all your help!
Just test for the element inside the list before adding it:
for k in dict.items():
    if k[0] in newdicts:
        if k[1] not in newdicts[k[0]]:  # Do this test before adding.
            newdicts[k[0]].append(k[1])
    else:
        newdicts[k[0]] = [k[1]]
And since you want only unique elements in the value list, you can just use a set as the value instead. Also, you can use a defaultdict here, so that you don't have to test for key existence before adding.
Also, don't use built-in names such as dict as your variable names; pick some other name instead.
So, you can modify your merge method as:
from collections import defaultdict
def merge(*d):
    newdicts = defaultdict(set)  # Define a defaultdict
    for each_dict in d:
        # dict.items() yields (k, v) tuples,
        # so you can directly unpack each tuple into two loop variables.
        for k, v in each_dict.items():
            newdicts[k].add(v)
    # And if you want the exact representation that you have shown,
    # you can build a normal dict of lists from the defaultdict.
    unique = {key: list(value) for key, value in newdicts.items()}
    return unique
>>> import collections
>>> import itertools
>>> uniques = collections.defaultdict(set)
>>> for k, v in itertools.chain(f.items(), g.items(), h.items(), r.items()):
... uniques[k].add(v)
...
>>> uniques
defaultdict(<type 'set'>, {'a': set(['apple', 'adam']), 'c': set(['car', 'cat']), 'b': set(['boy', 'bat']), 'e': set(['elephant']), 'd': set(['deer', 'dog'])})
Note that the results are sets, not lists: checking whether a value is already present is O(1) on average for a set versus O(n) for a list. If you would like the final values to be lists, you can do the following:
>>> {x: list(y) for x, y in uniques.items()}
{'a': ['apple', 'adam'], 'c': ['car', 'cat'], 'b': ['boy', 'bat'], 'e': ['elephant'], 'd': ['deer', 'dog']}
In your for loop add this:
for dict in d:
    for k in dict.items():
        if k[0] in newdicts:
            # This line below
            if k[1] not in newdicts[k[0]]:
                newdicts[k[0]].append(k[1])
        else:
            newdicts[k[0]] = [k[1]]
This makes sure duplicates aren't added.
Use set when you want unique elements:
def merge_dicts(*d):
    result = {}
    for each_dict in d:
        for key, value in each_dict.items():
            result.setdefault(key, set()).add(value)
    return result
Try to avoid using indices; unpack tuples instead.
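Sets do not keep insertion order, so converting them back to lists can reorder values (the question lists 'cat' before 'car', while the set-based output above shows 'car' first). If the order in which values first appear matters, one option is to use a dict as an ordered set, since dicts preserve insertion order from Python 3.7 on. A minimal sketch; merge_unique is an illustrative name, and like the set approach it requires hashable values.

```python
def merge_unique(*dicts):
    """Merge dicts into {key: [unique values in first-seen order]}."""
    seen = {}
    for each_dict in dicts:
        for key, value in each_dict.items():
            # The inner dict acts as an ordered set: re-adding a value keeps its first position.
            seen.setdefault(key, {})[value] = None
    return {key: list(values) for key, values in seen.items()}

f = {'a': 'apple', 'c': 'cat', 'b': 'bat', 'd': 'dog'}
g = {'c': 'car', 'b': 'bat', 'e': 'elephant'}
h = {'b': 'boy', 'd': 'deer'}
r = {'a': 'adam'}

combined = merge_unique(f, g, h, r)
print(combined)
# {'a': ['apple', 'adam'], 'c': ['cat', 'car'], 'b': ['bat', 'boy'], 'd': ['dog', 'deer'], 'e': ['elephant']}
```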
