Quickest way to merge dictionaries based on key match


I have two dictionaries:
dic_1={'1234567890': 1, '1234567891': 2, '1234567880': 3, '1234567881': 4}
dic_2={'1234567890': 5, '1234567891': 6}
Now I want to merge them by key, such that the merged dictionary looks like the following:
merged_dic = {'1234567890': 1, '1234567891': 2, '1234567880': 3, '1234567881': 4}
We only want to keep unique keys, with exactly one value associated with each. What's the best way to do that?

This should be what you need. It iterates through all dictionaries adding key/values only if the key is not already in the merged dictionary.
from itertools import chain
merged_dic = {}
for k, v in chain(dic_1.items(), dic_2.items()):
    if k not in merged_dic:
        merged_dic[k] = v
print(merged_dic)
# {'1234567890': 1, '1234567891': 2, '1234567880': 3, '1234567881': 4}
If, for example, you were wanting to keep all values for a key you could use:
from collections import defaultdict
from itertools import chain
merged_dic = defaultdict(list)
for k, v in chain(dic_1.items(), dic_2.items()):
    merged_dic[k].append(v)
print(dict(merged_dic))
# {'1234567890': [1, 5], '1234567891': [2, 6], '1234567880': [3], '1234567881': [4]}
chain() lets you iterate over many dictionaries. The question showed two, but if you had four you could merge them all just as easily, e.g.
for k, v in chain(dic_1.items(), dic_2.items(), dic_3.items(), dic_4.items()):
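The "first value wins" loop above can be wrapped in a small helper that takes any number of dictionaries. This is only a sketch: the function name merge_first_wins and the third dictionary dic_3 are made up for illustration and do not come from the question.

```python
def merge_first_wins(*dicts):
    """Merge dicts left to right; the first value seen for a key is kept."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            # setdefault stores the value only if the key is still missing
            merged.setdefault(key, value)
    return merged


dic_1 = {'1234567890': 1, '1234567891': 2, '1234567880': 3, '1234567881': 4}
dic_2 = {'1234567890': 5, '1234567891': 6}
dic_3 = {'1234567891': 7, '9999999999': 8}  # hypothetical extra dictionary

result = merge_first_wins(dic_1, dic_2, dic_3)
print(result)
# {'1234567890': 1, '1234567891': 2, '1234567880': 3, '1234567881': 4, '9999999999': 8}
```

Because earlier arguments take priority, the order in which the dictionaries are passed decides which value survives.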

All you're really trying to do is update dic_2 with the values in dic_1, so you can just do
merged_dic = {**dic_2, **dic_1}
This builds a new dictionary: it starts with every item from dic_2, then overwrites the values of any keys that also exist in dic_1 with dic_1's values, and finally adds the keys that exist only in dic_1.
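A quick check of what the unpacking expression produces. The extra key '8234567890' in dic_2 is an assumption added here so that the "key only in dic_2" case is visible; it is not part of the question's data.

```python
dic_1 = {'1234567890': 1, '1234567891': 2, '1234567880': 3, '1234567881': 4}
dic_2 = {'1234567890': 5, '1234567891': 6, '8234567890': 9}  # last key is hypothetical

# Later unpackings win, so values from dic_1 replace those from dic_2.
merged_dic = {**dic_2, **dic_1}

print(merged_dic)
# {'1234567890': 1, '1234567891': 2, '8234567890': 9, '1234567880': 3, '1234567881': 4}

# Neither input is modified.
print(dic_2['1234567890'])
# 5
```

Note that the key order follows dic_2 first, then the keys that are new in dic_1. If the order of dic_1 matters, the loop-based approach may be a better fit.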

The sample data does not fully illustrate the question. If dic_2 has a key in common with dic_1, keep the item from dic_1; if dic_2 has a new key, add that item to the merged dictionary.
import copy
dic_1={'1234567890': 1, '1234567891': 2, '1234567880': 3, '1234567881': 4}
dic_2={'1234567890': 5, '8234567890': 6}
merged_d = copy.copy(dic_1)
diff = set(dic_2)-set(dic_1)
merged_d.update({k: dic_2[k] for k in diff})
print(merged_d)
Result:
{'1234567890': 1, '1234567891': 2, '1234567880': 3, '1234567881': 4, '8234567890': 6}

If you want the first dict to override the keys in the second dict (note that this modifies dic_2 in place), then:
dic_2.update(dic_1)

Related

Converting a list of "pairs" into a dictionary of dictionaries?

This question was previously asked here with an egregious typo: Counting "unique pairs" of numbers into a python dictionary?
This is an algorithmic problem, and I don't know of the most efficient solution. My idea would be to somehow cache values in a list and enumerate pairs...but that would be so slow. I'm guessing there's something useful from itertools.
Let's say I have a list of integers in which there are never repeats:
list1 = [2, 3]
In this case, there is a unique pair 2-3 and 3-2, so the dictionary should be:
{2:{3: 1}, 3:{2: 1}}
That is, there is 1 pair of 2-3 and 1 pair of 3-2.
For larger lists, the pairing is the same, e.g.
list2 = [2, 3, 4]
has the dictionary
{2:{3:1, 4:1}, 3:{2:1, 4:1}, 4:{3:1, 2:1}}
(1) Once the lists become far larger, how would one algorithmically find the "unique pairs" in this format using Python data structures?
(2) I mentioned that the lists cannot have repeat integers, e.g.
[2, 2, 3]
is impossible, as there are two 2s.
However, one may have a list of lists:
list3 = [[2, 3], [2, 3, 4]]
whereby the dictionary must be
{2:{3:2, 4:1}, 3:{2:2, 4:1}, 4:{2:1, 3:1}}
as there are two pairs of 2-3 and 3-2. How would one "update" the dictionary given multiple lists within a list?
EDIT: My ultimate use case is, I want to iterate through hundreds of lists of integers, and create a single dictionary with the "counts" of pairs. Does this make sense? There might be another data structure which is more useful.

For the nested list example, you can do the following, making use of itertools.permutations and dict.setdefault:
from itertools import permutations
list3 = [[2, 3], [2, 3, 4]]
d = {}
for l in list3:
    for a, b in permutations(l, 2):
        d[a][b] = d.setdefault(a, {}).setdefault(b, 0) + 1
# {2: {3: 2, 4: 1}, 3: {2: 2, 4: 1}, 4: {2: 1, 3: 1}}
For a flat list l, use only the inner loop and omit the outer one.
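The same counting can be sketched with defaultdict and Counter, which removes the nested setdefault calls. The function name count_pairs is an assumption for illustration; a flat list is handled by wrapping it in an outer list.

```python
from collections import Counter, defaultdict
from itertools import permutations


def count_pairs(lists):
    """Count ordered pairs (a, b) that occur together in any of the lists."""
    counts = defaultdict(Counter)
    for items in lists:
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    # Convert to plain dicts so the result matches the format in the question
    return {a: dict(inner) for a, inner in counts.items()}


nested = count_pairs([[2, 3], [2, 3, 4]])
print(nested)
# {2: {3: 2, 4: 1}, 3: {2: 2, 4: 1}, 4: {2: 1, 3: 1}}

flat = count_pairs([[2, 3]])  # a flat list wrapped in one outer list
print(flat)
# {2: {3: 1}, 3: {2: 1}}
```

Each list of length n contributes n * (n - 1) ordered pairs, so the work grows quadratically with the length of the individual lists, not with the number of lists.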

For this example I'll just use a flat list of numbers with no nested lists:
values = [3, 2, 4]
result = {}
for key in values:
    # each number gets its own nested dict
    result[key] = {}
    for num in values:
        if num != key:
            result[key][num] = 1
# {3: {2: 1, 4: 1}, 2: {3: 1, 4: 1}, 4: {3: 1, 2: 1}}
This creates a dict with each number as a key. For each key, the value is a nested dict containing num: 1 for every number in the original values list other than the key itself.

Use defaultdict with permutations:
from collections import defaultdict
from itertools import permutations
d = defaultdict(dict)
for i in permutations([4, 2, 3]):
    d[i[0]] = {k: 1 for k in i[1:]}
The output is:
In [22]: d
Out[22]: defaultdict(dict, {2: {3: 1, 4: 1}, 4: {2: 1, 3: 1}, 3: {2: 1, 4: 1}})
For a nested list of lists, see https://stackoverflow.com/a/52206554/8060120

how to get all the values in a dictionary with the same key? [duplicate]

I want to add multiple values to a specific key in a python dictionary. How can I do that?
a = {}
a["abc"] = 1
a["abc"] = 2
This replaces the value of a["abc"], changing it from 1 to 2.
What I want instead is for a["abc"] to have multiple values (both 1 and 2).

Make the value a list, e.g.
a["abc"] = [1, 2, "bob"]
UPDATE:
There are a couple of ways to add values to key, and to create a list if one isn't already there. I'll show one such method in little steps.
key = "somekey"
a.setdefault(key, [])
a[key].append(1)
Results:
>>> a
{'somekey': [1]}
Next, try:
key = "somekey"
a.setdefault(key, [])
a[key].append(2)
Results:
>>> a
{'somekey': [1, 2]}
The magic of setdefault is that it initializes the value for that key if that key is not defined. Now, noting that setdefault returns the value, you can combine these into a single line:
a.setdefault("somekey", []).append("bob")
Results:
>>> a
{'somekey': [1, 2, 'bob']}
You should look at the dict methods, in particular the get() method, and do some experiments to get comfortable with this.
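As an alternative to setdefault, collections.defaultdict(list) creates the empty list automatically on first access. This is a sketch of the same three appends, not a replacement for the approach above.

```python
from collections import defaultdict

a = defaultdict(list)

# No setdefault needed: a missing key starts out as an empty list.
a["somekey"].append(1)
a["somekey"].append(2)
a["somekey"].append("bob")

print(dict(a))
# {'somekey': [1, 2, 'bob']}

# Caveat: merely reading a missing key also creates it.
print("other" in a)
# False
a["other"]
print("other" in a)
# True
```

The caveat at the end is the main design trade-off: with a plain dict and setdefault, keys appear only where the code explicitly asks for them.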

How about
a["abc"] = [1, 2]
This will result in:
>>> a
{'abc': [1, 2]}
Is that what you were looking for?

Append list elements
If the dict values need to be extended by another list, extend() method of lists may be useful.
a = {}
a.setdefault('abc', []).append(1) # {'abc': [1]}
a.setdefault('abc', []).extend([2, 3]) # a is now {'abc': [1, 2, 3]}
This can be especially useful in a loop where values need to be appended or extended depending on datatype.
a = {}
some_key = 'abc'
for v in [1, 2, 3, [2, 4]]:
    if isinstance(v, list):
        a.setdefault(some_key, []).extend(v)
    else:
        a.setdefault(some_key, []).append(v)
a
# {'abc': [1, 2, 3, 2, 4]}
Append list elements without duplicates
If there's a dictionary such as a = {'abc': [1, 2, 3]} and it needs to be extended by [2, 4] without duplicates, checking for duplicates (via the in operator) should do the trick. The magic of the get() method is that a default value can be supplied (in this case an empty list, []) in case a key doesn't exist in a, so that the membership test doesn't raise an error.
a = {some_key: [1, 2, 3]}
for v in [2, 4]:
    if v not in a.get(some_key, []):
        a.setdefault(some_key, []).append(v)
a
# {'abc': [1, 2, 3, 4]}
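If the order of the values does not matter and every value is hashable, storing a set instead of a list avoids the explicit membership test. This is a sketch under those two assumptions.

```python
a = {}
some_key = 'abc'

for v in [1, 2, 3, 2, 4]:
    # set.add silently ignores values that are already present
    a.setdefault(some_key, set()).add(v)

print(sorted(a[some_key]))
# [1, 2, 3, 4]
```

Membership checks on a set take roughly constant time, whereas `v not in some_list` scans the list, so this tends to matter once the value lists grow large.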

Python: "Hash" a nested List

I have a dictionary master which contains around 50000 to 100000 unique lists which can be simple lists or also lists of lists. Every list is assigned to a specific ID (which is the key of the dictionary):
master = {12: [1, 2, 4], 21: [[1, 2, 3], [5, 6, 7, 9]], ...} # len(master) is several ten thousands
Now I have a few hundred dictionaries, each of which again contains around 10000 lists (same as above: they can be nested). Example of one of those dicts:
a = {'key1': [6, 9, 3, 1], 'key2': [[1, 2, 3], [5, 6, 7, 9]], 'key3': [7], ...}
I want to cross-reference this data for every single dictionary in reference to my master, i.e. instead of saving every list within a, I want to only store the ID of the master in case the list is present in the master.
=> a = {'key1': [6, 9, 3, 1], 'key2': 21, 'key3': [7], ...}
I can do that by looping over all values in a and all values of master and try to match the lists (by sorting them), but that'll take ages.
Now I'm wondering how would you solve this?
I thought of "hashing" every list in master to a unique string and store it as a key of a new master_inverse reference dict, e.g.:
master_inverse = {hash([1,2,4]): 12, hash([[1, 2, 3], [5, 6, 7, 9]]): 21}
Then it would be very simple to look it up later on:
for k, v in a.items():
    h = hash(v)
    if h in master_inverse:
        a[k] = master_inverse[h]
Do you have a better idea?
What could such a hash look like? Is there already a built-in method which is fast and unique?
EDIT:
Dunno why I didn't come up with this approach instantly:
What do you think of using an MD5 hash of either the pickle or the repr() of each list?
Something like this:
import hashlib
def myHash(obj):
    # md5() needs bytes in Python 3, so encode the repr() string first
    return hashlib.md5(repr(obj).encode()).hexdigest()
master_inverse = {myHash(v): k for k, v in master.items()}
for k, v in a.items():
    h = myHash(v)
    if h in master_inverse:
        a[k] = master_inverse[h]
EDIT2:
I benchmarked it: checking one of the hundred dicts (a, which holds around 20k values in my benchmark) against master_inverse is very fast, which I didn't expect: 0.08 sec. So I guess I can live with that well enough.

The MD5 approach will work, but you need to be cautious about the very small possibility of hash collisions (see How many random elements before MD5 produces collisions? for more details).
If you need to be absolutely sure that the program works correctly, you can convert the lists to tuples and build a dictionary whose keys are those tuples and whose values are the keys from your master dictionary (the same as master_inverse, but keyed by the full values instead of their MD5 hashes). Nested lists have to be converted recursively, since a tuple that contains a list is still unhashable.
More info on how to use tuples as dictionary keys: http://www.developer.com/lang/other/article.php/630941/Learn-to-Program-using-Python-Using-Tuples-as-Keys.htm.
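A minimal sketch of the tuple-key idea, using the sample data from the question with the "..." entries left out. The helper name to_hashable is made up for illustration. As with repr(), matching here is order-sensitive; if element order should not matter, the lists would need to be sorted first.

```python
def to_hashable(value):
    """Recursively convert (nested) lists to tuples so they can be dict keys."""
    if isinstance(value, list):
        return tuple(to_hashable(item) for item in value)
    return value


master = {12: [1, 2, 4], 21: [[1, 2, 3], [5, 6, 7, 9]]}
a = {'key1': [6, 9, 3, 1], 'key2': [[1, 2, 3], [5, 6, 7, 9]], 'key3': [7]}

# Map each list (as a tuple) back to its ID in master
master_inverse = {to_hashable(v): k for k, v in master.items()}

# Replace a value with the master ID when it is known, else keep the list
a = {k: master_inverse.get(to_hashable(v), v) for k, v in a.items()}

print(a)
# {'key1': [6, 9, 3, 1], 'key2': 21, 'key3': [7]}
```

Because the dictionary compares full tuples on lookup, two different lists can never be confused, at the cost of keeping the tuple copies in memory.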

Sum values in a dict of lists

If I have a dictionary such as:
my_dict = {"A": [1, 2, 3], "B": [9, -4, 2], "C": [3, 99, 1]}
How do I create a new dictionary with sums of the values for each key?
result = {"A": 6, "B": 7, "C": 103}

Use sum() function:
my_dict = {"A": [1, 2, 3], "B": [9, -4, 2], "C": [3, 99, 1]}
result = {}
for k, v in my_dict.items():
    result[k] = sum(v)
print(result)
Or just create a dict with a dictionary comprehension:
result = {k: sum(v) for k, v in my_dict.items()}
Output:
{'A': 6, 'B': 7, 'C': 103}

Try this:
def sumDictionaryValues(d):
    new_d = {}
    for i in d:
        new_d[i] = sum(d[i])
    return new_d

Just a for loop:
new = {}
for key in my_dict:
    new[key] = sum(my_dict[key])
new is the dictionary holding all the summed values.

One-liners for the same result, just demonstrating a few ways to get it in Python 3:
my_dict = {"A": [1, 2, 3], "B": [9, -4, 2], "C": [3, 99, 1]}
Using my_dict.keys(): this returns a view of my_dict's keys, used here in a dictionary comprehension.
res_dict = {key: sum(my_dict[key]) for key in my_dict.keys()}
Note that my_dict.keys() produces a view of the keys, not a list. A dictionary is also directly iterable over its keys, so this gives the same result without the extra call:
res_dict = {key: sum(my_dict[key]) for key in my_dict}
Using my_dict.items(): this returns a view of (key, value) tuples, one per entry; unpacking them gives the cleanest single-line solution:
res_dict = {key: sum(val) for key, val in my_dict.items()}
Using my_dict.values(): this returns a view of all the values in the dictionary (here, the inner lists).
Note: this one is not intended as a direct solution; it just rounds out the three methods used to traverse a dictionary.
res_dict = dict(zip(my_dict.keys(), [sum(val) for val in my_dict.values()]))
The zip function accepts iterables (lists, tuples, strings, ...), pairs up the items at the same position, and returns a zip object; dict() then converts it back to a dictionary.
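A short check of what keys() actually returns in Python 3: the result is a view that tracks the dictionary rather than a list snapshot. The key "D" is added here only for illustration.

```python
my_dict = {"A": [1, 2, 3], "B": [9, -4, 2], "C": [3, 99, 1]}

keys = my_dict.keys()
print(isinstance(keys, list))
# False

# The view reflects later changes to the dictionary.
my_dict["D"] = [10]  # hypothetical extra entry
print(list(keys))
# ['A', 'B', 'C', 'D']

# keys() and values() iterate in the same order, which is why zip() pairs them correctly.
res_dict = dict(zip(my_dict.keys(), [sum(val) for val in my_dict.values()]))
print(res_dict)
# {'A': 6, 'B': 7, 'C': 103, 'D': 10}
```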

Get a dict whose keys are combinations of a given dict without duplicate values

I have a dict like this:
dic = {'01':[1,2], '02':[1], '03':[2,3]}
What I want to achieve is a new dict whose keys are combinations of the original keys (in groups of 2 only) and whose values are the merged lists without duplicate values.
In this simple example, the output will be:
newDic = {'0102':[1,2], '0103':[1,2,3],'0203':[1,2,3]}
thanks a bunch!!

You can use itertools.combinations to get the different combinations of keys, and then use set to get the unique values of the lists. Put it all into a dictionary comprehension like this:
>>> dic = {'01':[1,2], '02':[1], '03':[2,3]}
>>> import itertools as IT
>>> {a+b: list(set(dic[a]+dic[b])) for a,b in IT.combinations(dic, 2)}
{'0203': [1, 2, 3], '0301': [1, 2, 3], '0201': [1, 2]}
You can also use join and sorted to have the keys the way you want them:
>>> {''.join(sorted([a,b])): list(set(dic[a]+dic[b])) for a,b in IT.combinations(dic, 2)}
{'0203': [1, 2, 3], '0103': [1, 2, 3], '0102': [1, 2]}

newDic = { a+b : list(set(dic[a] + dic[b])) for a in dic for b in dic if b>a }
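The approaches above can be combined into a sketch whose output is deterministic: sorting the keys before pairing them fixes the key names, and sorting the merged set fixes the order inside each value list. The name new_dic is used here only for illustration.

```python
from itertools import combinations

dic = {'01': [1, 2], '02': [1], '03': [2, 3]}

new_dic = {
    a + b: sorted(set(dic[a]) | set(dic[b]))  # union drops duplicate values
    for a, b in combinations(sorted(dic), 2)  # sorted keys give '0102', not '0201'
}

print(new_dic)
# {'0102': [1, 2], '0103': [1, 2, 3], '0203': [1, 2, 3]}
```

For n keys this produces n * (n - 1) / 2 entries, so the result grows quadratically with the size of the input dict.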
