python quickest way to merge dictionaries based on key match - python

I have two lists of dictionaries. List A is 34,000 long, list B is 650,000 long. I am essentially inserting all the List B dicts into the List A dicts based on a key match. Currently, I am doing the obvious, but it's taking forever (seriously, like a day). There must be a quicker way!
for a in listA:
    a['things'] = []
    for b in listB:
        if a['ID'] == b['ID']:
            a['things'].append(b)

from collections import defaultdict
dictB = defaultdict(list)
for b in listB:
    dictB[b['ID']].append(b)
for a in listA:
    a['things'] = []
    for b in dictB[a['ID']]:
        a['things'].append(b)
This turns your algorithm from O(n*m) into O(n)+O(m), where n=len(listA) and m=len(listB).
It avoids looping through every dict in listB for each dict in listA by precalculating which dicts from listB match each 'ID'.

Here's an approach that may help. I'll leave it to you to fill in the details.
Your code is slow because it is an O(n*m) algorithm (effectively quadratic), comparing every A against every B.
If you first sort both listA and listB by id (these are O(n log n) operations), you can then walk through the sorted versions of A and B together in a single pass (this will be in linear time).
This approach is common when you have to do external merges on very large data sets. Mihai's answer is better for internal merging, where you simply index everything by id (in memory). If you have the memory to hold these additional structures, and dictionary lookup is constant time, that approach will likely be faster, not to mention simpler. :)
By way of example let's say A had the following ids after sorting
acfgjp
and B had these ids, again after sorting
aaaabbbbcccddeeeefffggiikknnnnppppqqqrrr
The idea is, strangely enough, to keep indexes into A and B (I know that does not sound very Pythonic). At first you are looking at a in A and a in B. So you walk through B adding all the a's to your "things" array for a. Once you exhaust the a's in B, you move up one in A, to c. But the next item in B is b, which is less than c, so you have to skip the b's. Then you arrive at a c in B, so you can start adding into "things" for c. Continue in this fashion until both lists are exhausted. Just one pass. :)
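The walk described above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name `merge_sorted` and the sample data are made up, and it assumes the IDs in A are unique and that IDs can be compared with `<`.

```python
from operator import itemgetter

def merge_sorted(list_a, list_b, key="ID"):
    """Attach matching list_b dicts to each list_a dict under 'things'.

    Assumes the IDs in list_a are unique and mutually comparable.
    """
    get_id = itemgetter(key)
    sorted_a = sorted(list_a, key=get_id)
    sorted_b = sorted(list_b, key=get_id)
    j = 0  # index into sorted_b; it only ever moves forward
    for a in sorted_a:
        a["things"] = []
        # Skip B entries whose ID is smaller than the current A ID.
        while j < len(sorted_b) and sorted_b[j][key] < a[key]:
            j += 1
        # Collect B entries whose ID equals the current A ID.
        while j < len(sorted_b) and sorted_b[j][key] == a[key]:
            a["things"].append(sorted_b[j])
            j += 1
    return sorted_a

# Hypothetical sample data: A has ids a, c, f; B has ids a, a, b, b, c, d, e.
list_a = [{"ID": i} for i in "acf"]
list_b = [{"ID": i, "n": n} for n, i in enumerate("aabbcde")]
merged = merge_sorted(list_a, list_b)
```

After the two sorts, each list is traversed once, so the merge step itself is linear in len(list_a) + len(list_b).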

I'd convert ListA and ListB into dictionaries instead, dictionaries with ID as the key. Then it is a simple matter to append data using python's quick dictionary lookups:
from collections import defaultdict
class thingdict(dict):
    def __init__(self, *args, **kwargs):
        things = []
        super(thingdict, self).__init__(*args, things=things, **kwargs)
A = defaultdict(thingdict)
A[1] = defaultdict(list)
A[2] = defaultdict(list, things=[6])  # with some dummy data
A[3] = defaultdict(list, things=[7])
B = {1: 5, 2: 6, 3: 7, 4: 8, 5: 9}
for k, v in B.items():
    # print(k, v)
    A[k]['things'].append(v)
print(A)
print(B)
This returns:
defaultdict(<class '__main__.thingdict'>, {
1: defaultdict(<type 'list'>, {'things': [5]}),
2: defaultdict(<type 'list'>, {'things': [6, 6]}),
3: defaultdict(<type 'list'>, {'things': [7, 7]}),
4: {'things': [8]},
5: {'things': [9]}
})
{1: 5, 2: 6, 3: 7, 4: 8, 5: 9}

Related

python list of lists to dict when key appear many times

I know how to write something simple and slow with a loop, but I need it to run fast at large scale.
Input:
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
Desired output:
d = {1: ["txt1", "txt2"], 2: "txt3"}
Is there something built into Python that makes dict() extend the value for a repeated key instead of replacing it? My attempt below keeps only the last value for each key:
dict(list(zip(lst[0], lst[1])))
One option is to use dict.setdefault:
out = {}
for k, v in zip(*lst):
    out.setdefault(k, []).append(v)
Output:
{1: ['txt1', 'txt2'], 2: ['txt3']}
If you want the element itself for singleton lists, one way is adding a condition that checks for it while you build an output dictionary:
out = {}
for k, v in zip(*lst):
    if k in out:
        if isinstance(out[k], list):
            out[k].append(v)
        else:
            out[k] = [out[k], v]
    else:
        out[k] = v
or if lst[0] is sorted (like it is in your sample), you could use itertools.groupby:
from itertools import groupby
out = {}
pos = 0
for k, v in groupby(lst[0]):
    length = len([*v])
    if length > 1:
        out[k] = lst[1][pos:pos+length]
    else:
        out[k] = lst[1][pos]
    pos += length
Output:
{1: ['txt1', 'txt2'], 2: 'txt3'}
But as @timgeb notes, it's probably not something you want, because afterwards you'll have to check the data type each time you access this dictionary (whether the value is a list or not). That is an unnecessary problem you could avoid by storing all values as lists.
If you're dealing with large datasets it may be useful to add a pandas solution.
>>> import pandas as pd
>>> lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
>>> s = pd.Series(lst[1], index=lst[0])
>>> s
1 txt1
1 txt2
2 txt3
>>> s.groupby(level=0).apply(list).to_dict()
{1: ['txt1', 'txt2'], 2: ['txt3']}
Note that this also produces lists for single elements (e.g. ['txt3']) which I highly recommend. Having both lists and strings as possible values will result in bugs because both of those types are iterable. You'd need to remember to check the type each time you process a dict-value.
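To make that pitfall concrete, here is a small sketch. The helper `count_items` and the two sample dicts are made up for illustration; the point is only that iterating over a bare string walks its characters, so the same code silently gives a different answer.

```python
# Singleton stored as a bare string (the shape the question asked for).
mixed = {1: ["txt1", "txt2"], 2: "txt3"}
# Every value is a list, even when there is only one item.
uniform = {1: ["txt1", "txt2"], 2: ["txt3"]}

def count_items(d):
    """Count the stored items per key by iterating over each value."""
    return {k: sum(1 for _ in v) for k, v in d.items()}

mixed_counts = count_items(mixed)      # key 2 is counted character by character
uniform_counts = count_items(uniform)  # key 2 is counted as one item
```

With the mixed dict, key 2 reports four "items" (the characters of "txt3") and no error is raised, which is why this kind of bug is easy to miss.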
You can use a defaultdict to group the strings by their corresponding key, then make a second pass through the list to extract the strings from singleton lists. Regardless of what you do, you'll need to access every element in both lists at least once, so some iteration structure is necessary (and even if you don't explicitly use iteration, whatever you use will almost definitely use iteration under the hood):
from collections import defaultdict
lst = [[1, 1, 2], ["txt1", "txt2", "txt3"]]
result = defaultdict(list)
for key, value in zip(lst[0], lst[1]):
    result[key].append(value)
for key in result:
    if len(result[key]) == 1:
        result[key] = result[key][0]
print(dict(result))  # Prints {1: ['txt1', 'txt2'], 2: 'txt3'}

python: I want to make a dictionary using two, two dimensional lists

I want to make a single list containing two dictionaries from two two-dimensional lists. Each element of the first list should be paired with the corresponding element of the second list.
a = [[1,2,3],[4,5,6]]
b = [[7,8,9],[10,11,12]]
c = dict(zip(a,b))
This does not work because lists are not hashable.
I need the output to be:
c = [{1:7, 2:8, 3:9}, {4:10, 5:11, 6:12}]
You want something like the following:
c = [dict(zip(keys, vals)) for keys, vals in zip(a, b)]
Here we use a list comprehension that zips each pair of inner lists from a and b and converts the result to a dict.
Alternatively, we could spell out the inner dict as a dict comprehension, to get:
c = [{k: v for k, v in zip(keys, vals)} for keys, vals in zip(a, b)]
Both are equivalent; it's just a matter of style.
Output:
>>> print(c)
[{1: 7, 2: 8, 3: 9}, {4: 10, 5: 11, 6: 12}]

Python - Merge/update dictionary by smaller value

I would like to merge or update a dictionary in Python with new entries, but replace the values of entries whose key exists with the smaller of the values associated with the key in the existing entry and the new entry. For example:
Input:
dict_A = {1:14, 2:15, 3:16, 4:17}, dict_B= {2:19, 3:9, 4:11, 5:13}
Expected output:
{1:14, 2:15, 3:9, 4:11, 5:13}
I know it can be achieved with a loop that iterates through the dictionaries while performing comparisons, but is there a simpler and faster way, or a helpful library, to achieve this?
In this case you could use pandas to avoid writing the loop, though I don't know whether it would be any faster; I didn't test that.
import pandas as pd
df = pd.DataFrame([dict_A, dict_B])
out = df.min().to_dict()
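Output: {1: 14.0, 2: 15.0, 3: 9.0, 4: 11.0, 5: 13.0}
There are probably some edge cases you'd have to account for. For example, the values come back as floats, because keys missing from one of the dicts are filled with NaN.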
Quick One-liner
c = {**a, **b, **{key:min(a[key], b[key]) for key in set(a).intersection(set(b))} }
Explanation
This should be quick enough because it uses set operations to find the shared keys.
You can merge dictionaries by using the **dictionary syntax like so: {**a, **b}. The ** simply just "expands" out the dictionary into each individual item, with the last expanded dictionary overwriting any previous ones (so in {**a, **b}, any matching keys in b overwrite the value from a).
The first thing I do is load in all the values in a and b into the new dictionary:
c = {**a, **b, ...
Then I use dictionary comprehension to generate a new dictionary, which only has the smallest value for every set of keys which are in both a and b.
... {key:min(a[key], b[key]) for key in set(a).intersection(set(b))} ...
To get the set of keys which only exist in both a and b, I convert both dictionaries to sets (which converts them to sets of their keys) and use intersection to quickly find all keys which are in both sets.
... set(a).intersection(set(b)) ...
Then I loop through each of the keys in the matching-keys set, and use the dictionary comprehension to generate a new dictionary with the current key and the min of both dictionaries' values for that key.
... {key:min(a[key], b[key]) ...
Then I use the ** syntax to "expand" this new generated dictionary with the expanded a and b, putting it last to make sure it overwrites any values from the two.
Works on the example given (ctrl-cv'd straight from my terminal):
>>> a = {1:14, 2:15, 3:16, 4:17}
>>> b = {2:19, 3:9, 4:11, 5:13}
>>> c = {**a, **b, **{key:min(a[key], b[key]) for key in set(a).intersection(set(b))} }
>>> c
{1: 14, 2: 15, 3: 9, 4: 11, 5: 13}
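The same steps can also be split into named intermediate values, which may be easier to read than the one-liner. This is just a sketch of the same idea; the names `shared` and `smallest` are made up, and it swaps `set(a).intersection(set(b))` for `a.keys() & b.keys()`, since dict key views support set operations directly.

```python
a = {1: 14, 2: 15, 3: 16, 4: 17}
b = {2: 19, 3: 9, 4: 11, 5: 13}

# Keys present in both dictionaries.
shared = a.keys() & b.keys()

# For each shared key, keep the smaller of the two values.
smallest = {key: min(a[key], b[key]) for key in shared}

# Later unpackings overwrite earlier ones, so `smallest` goes last.
c = {**a, **b, **smallest}
```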
Here's something short to do it. Not inherently the fastest or best way to do it, but figured I'd share nonetheless.
max_val = max(max(dict_A.values()), max(dict_B.values())) + 1
keys = set(list(dict_A.keys()) + list(dict_B.keys()))
dict_C = { key : min(dict_A.get(key, max_val), dict_B.get(key, max_val)) for key in keys }
Here is the code I wrote; I hope it's what you needed:
dict_A = {1:14, 2:15, 3:16, 4:17}
dict_B = {2:19, 3:9, 4:11, 5:13}
dict_res = dict(dict_B)  # copy, so that dict_B itself is not modified
dict_A_keys = dict_A.keys()
dict_B_keys = dict_B.keys()
for e in dict_A_keys:
    if e in dict_B_keys:
        if dict_A[e] > dict_B[e]:
            dict_res[e] = dict_B[e]
        else:
            dict_res[e] = dict_A[e]
    else:
        dict_res[e] = dict_A[e]
I have tested it and the output is :
{1: 14, 2: 15, 3: 9, 4: 11, 5: 13}

Extract a list of keys by Sorting the dictionary in python

My program's output is a Python dictionary, and I want a list of keys from dictn:
s = "cool_ice_wifi"
r = ["water_is_cool", "cold_ice_drink", "cool_wifi_speed"]
good_list=s.split("_")
dictn={}
for i in range(len(r)):
    split_review=r[i].split("_")
    counter=0
    for good_word in good_list:
        if good_word in split_review:
            counter=counter+1
    d1={i:counter}
    dictn.update(d1)
print(dictn)
The conditions for ordering the keys:
Keys with the same value keep their original relative order in the output list.
Keys with the highest values come first, followed by keys with lower values.
dictn = {0: 1, 1: 1, 2: 2}
Expected output = [2, 0, 1]
You can use a list comp:
[key for key in sorted(dictn, key=dictn.get, reverse=True)]
In Python 3 you can use the built-in sorted function to order a dictionary's keys in any way you choose.
Check out the documentation, but in the simplest case you can pass the dictionary's .get method as the key, while for more complex orderings you'd define a key function yourself. Because sorted is stable, even with reverse=True, keys with equal values keep their original relative order.
Dictionaries preserve insertion order as of Python 3.7, so another option is to insert the items in sorted order when you create the dictionary, or you could use an OrderedDict.
Here's an example of the first option in action, which I think is the easiest:
>>> a = {}
>>> a[0] = 1
>>> a[1] = 1
>>> a[2] = 2
>>> print(a)
{0: 1, 1: 1, 2: 2}
>>>
>>> [(k) for k in sorted(a, key=a.get, reverse=True)]
[2, 0, 1]
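For the "define a key function yourself" case, one possible sketch is below. The function name `by_value_desc_then_key` is made up, and it assumes the values are numbers (so they can be negated). It breaks ties explicitly by key rather than relying on insertion order.

```python
dictn = {0: 1, 1: 1, 2: 2}

def by_value_desc_then_key(key):
    """Sort key: larger values first, ties broken by ascending key."""
    return (-dictn[key], key)

ordered = sorted(dictn, key=by_value_desc_then_key)
```

For the sample dictionary this gives the same ordering as the `.get` version; the two only differ when keys were not inserted in ascending order.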

Combining two list to make a dictionary [closed]

Closed 8 years ago.
I need to convert two lists into a dictionary. I have two lists named A and B:
A = [[1,2,3], [2,3,4], [1,4,5], [1,3,4]]
The values in A (a list of lists) will be unique.
B = [[10,13,23], [22,21,12], [5,34,23], [10,9,8]]
Both lists will have the same length.
I need a result like:
C = [['1' :10 , '2' :13, '3': 23] ['2':22, '3':21, '4':12] ['1':5, '4':34, '5':23] ['1':10, '3':9, '4':8]]
I tried the dict method, but it throws an error saying list is an unhashable type. Please let me know how to do this. I'm new to Python.
It's hard to be sure, but I'm guessing your problem is this:
If a and b were just flat lists of numbers, all you'd have to do is this:
dict(zip(a, b))
And I'm assuming that's similar to what you wrote. But that doesn't work here. Those aren't lists of numbers, they're lists of lists. And you don't want to get back a dict, you want a list of dicts.
So, you're asking Python to create a dict whose keys are the sub-lists of a, and whose values are the sub-lists of b. That's an error because lists can't be keys, but it wouldn't be useful even if that weren't an issue.
To actually do this, you need to not only zip up a and b, but also zip up their sublists, and pass those sub-zips, not the main zip, to dict. Like this:
[dict(zip(suba, subb)) for (suba, subb) in zip(a, b)]
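As a quick check of both halves of that explanation, here is a sketch using the first two rows of the question's data. The `c_str` variant is an assumption on my part: the question's desired output writes the keys in quotes, so it shows how string keys could be produced if that is really what is wanted.

```python
a = [[1, 2, 3], [2, 3, 4]]
b = [[10, 13, 23], [22, 21, 12]]

try:
    dict(zip(a, b))  # the keys would be the inner lists, which are unhashable
    error = None
except TypeError as exc:
    error = str(exc)

# Zip the outer lists first, then zip each pair of inner lists.
c = [dict(zip(suba, subb)) for suba, subb in zip(a, b)]

# Hypothetical variant: string keys, as written in the question's desired output.
c_str = [{str(k): v for k, v in zip(suba, subb)} for suba, subb in zip(a, b)]
```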
I think you want C to be a list of dictionaries. Here's a straightforward way to do that.
def combine(keys, values):
    "generate a dictionary from keys and values"
    out = {}
    for k,v in zip(keys,values):
        out[k] = v
    return out
def combine_each(keys, values):
    "for each pair of keys and values, make a dictionary"
    return [combine(klist, vlist) for (klist,vlist) in zip(keys,values)]
C = combine_each(A,B)
Alternative:
>>> A = [[1,2,3], [2,3,4], [1,4,5], [1,3,4]]
>>> B = [[10,13,23], [22,21,12], [5,34,23], [10,9,8]]
>>> [{k:v for k, v in zip(sa, sb)} for (sa, sb) in zip(A, B)]
[{1: 10, 2: 13, 3: 23}, {2: 22, 3: 21, 4: 12}, {1: 5, 4: 34, 5: 23}, {1: 10, 3: 9, 4: 8}]
