Find matching key-value pairs of two dictionaries - python

What will be the most efficient way to check if key-value pair of one dictionary is present in other dictionary as well. Suppose if I have two dictionaries as dict1 and dict2 and these two dictionaries have some of the key-value pairs in common. I want to find those and print them. What would be the most efficient way to do this? Please suggest.

one way would be:
d_inter = dict([k, v for k, v in dict1.iteritems() if k in dict2 and dict2[k] == v])
the other:
d_inter = dict(set(d1.iteritems()).intersection(d2.iteritems()))
I'm not sure which one would be more efficient, so let's compare both of them:
1. Solution with iteration through dicts:
we parse all keys of dict1: for k,v in dict1.iteritems() -> O(n)
then we check whether the key is in dict2, if k in dict2 and dict2[k] == v -> O(m)
which makes it a global worst case complexity of O(n+m) -> O(n)
2. Solution with sets:
if we assume that converting a dict into a set is O(n):
we parse all items of d1 to create the first set set(d1.iteritems()) -> O(n)
we parse all items of d2 to create the second set set(d2.iteritems()) -> O(m)
we get the intersetion of both which is O(min(len(s), len(t)) on average or O(n * m) in worst case
which makes it a global worst case complexity of O(2n*n*m) which can be considered as O(n^3) for same sized dicts: then solution 1. is best
if we assume that converting a dict into a set is O(1) (constant time)
the average is O(min(n,m)) and the worst case is O(n*m), then solution #1 is best on worst case scenario, but solution #2 is best on average case scenario because O(n+m) > O(min(n,m)).
In conclusion, the solution you choose will depend on your dataset and the measurements you'll make! ;-)
N.B.: I took there the complexity of the set().
N.B.2: for the solution #1 make always the smallest dict as dict2 and for solution #2 the smallest dict as dict1.
N.B.2016: This solution was written for python2. Here's the changes needed to make it python3 ready:
replace iteritems() with items() ;
you could also use the newer dict comprehension syntax: {[k, v for … == v]} ;
as d.items() returns dict_items which is not hashable anymore, you'd have to use a frozenset() instead {frozenset(d1.items()).intersection(d2.items())}.

What about...
matching_dict_values = {}
for key in dict1.keys():
if key in dict2.keys():
if dict1[key] == dict2[key]:
matching_dict_values[key]=dict1[key]

I don't see why you'd need anything fancier than this:
if all([testKey in dict1, testKey in dict2]) and dict1[testKey] == dict2[testKey]:
We don't have to worry about a KeyError because the boolean test will fail before the and (do a value which correlates to a key that isn't in one of them will never get tested)
So to get your full list common key-value pairs you could do this:
for testKey in set(dict1.keys() + dict2.keys()):
if all([testKey in dict1, testKey in dict2]) and dict1[testKey] == dict2[testKey]:
commonDict[testKey] = dict1[testKey]

Update to #zmo's answer
Solution 1:
d_inter = {k:v for k, v in dict1.items() if k in dict2 and dict2[k] == v}
Solution 2:
d_inter = dict(set(dict1.items()).intersection(dict2.items()))

Related

Splitting a dictionary by key suffixes

I have a dictionary like so
d = {"key_a":1, "anotherkey_a":2, "key_b":3, "anotherkey_b":4}
So the values and key names are not important here. The key (no pun intended) thing, is that related keys share the same suffix in my example above that is _a and _b.
These suffixes are not known before hand (they are not always _a and _b for example, and there are an unknown number of different suffixes.
What I would like to do, is to extract out related keys into their own dictionaries, and have all generated dictionaries in a list.
The output from above would be
output = [{"key_a":1, "anotherkey_a":2},{"key_b":3, "anotherkey_b":4}]
My current approach is to first get all the suffixes, and then generate the sub-dicts one at a time and append to the new list
output = list()
# Generate a set of suffixes
suffixes = set([k.split("_")[-1] for k in d.keys()])
# Create the subdict and append to output
for suffix in suffixes:
output.append({k:v for k,v in d.items() if k.endswith(suffix)})
This works (and is not prohibitively slow or anyhting) but I am simply wondering if there is a more elegant way to do it with a list or dict comprehension? Just out of interest...
Make your output a defaultdict rather than a list, with suffixes as keys:
from collections import defaultdict
output = defaultdict(lambda: {})
for k, v in d.items():
prefix, suffix = k.rsplit('_', 1)
output[suffix][k] = v
This will split your dict in a single pass and result in something like:
output = {"a" : {"key_a":1, "anotherkey_a":2}, "b": {"key_b":3, "anotherkey_b":4}}
and if you insist on converting it to a list, you can simply use:
output = list(output.values())
You could condense the lines
output = list()
for suffix in suffixes:
output.append({k:v for k,v in d.items() if k.endswith(suffix)})
to a list comprehension, like this
[{k:v for k,v in d.items() if k.endswith(suffix)} for suffix in suffixes]
Whether it is more elegant is probably in the eyes of the beholder.
The approach suggested by #Błotosmętek will probably be faster though, given a large dictionary, since it results in less looping.
def sub_dictionary_by_suffix(dictionary, suffix):
sub_dictionary = {k: v for k, v in dictionary.items() if k.endswith(suffix)}
return sub_dictionary
I hope it helps

Use A Single Line To Loop Two Dictionaries To Create List

I have looked at several threads similar but unable to get a workable solution. I am looping (2) dictionaries trying to create a single list from the values of one based on the keys of another. I have it done with for loops but looking to use a single line if possible. My code with for loops
for k, v in dict1.items():
for value in dict2[k]:
temp.append(value)
On the first loop thru the temp list would be and is from above code:
[16,18,20,22,24,26]
I then use min to get the min value of the list. Now I want to condense the for loop to a one liner. I have put together
temp=[dict2.values() for k in dict1.keys() if k in dict2.keys()]
When executed, instead of temp being a single list for the k that exist in the dict1, I get a list of list for all the values from all dict2.
[[16,18,20,22,24,26], [12,16,18,20,22,24], [16,18,22,26,30,32]]
It seems to be ignoring the if statement. I know my dict1 has only 1 key in this situation and I know the 1 key exist in the dict2. Is my one liner wrong?
Input Values for dictionaries:
dict1={'table1':[16,18,20,22,24,26]}
dict2={'table1':[16,18,20,22,24,26],'table2': [12,16,18,20,22,24], 'table3': [16,18,22,26,30,32]}
You can iterate through one dictionary checking for matching keys and create a list of lists. Use chain.from_iterable to flatten list and call min():
from itertools import chain
dict1 = {'table1': [16,18,20,22,24,26]}
dict2 = {'table1': [16,18,20,22,24,26], 'table2': [12,16,18,20,22,24], 'table3': [16,18,22,26,30,32]}
temp = [dict2[k] for k in dict1 if k in dict2]
print(min(chain.from_iterable(temp)))
# 16
The reason why your list comprehension does not work:
It looks like dict2 has 3 key-value pairs, and the values are [16,18,20,22,24,26], [12,16,18,20,22,24]and [16,18,22,26,30,32]. What you're doing in your list comprehension translates to
for k in dict1.keys():
if k in dict2.keys():
temp.append(dict2.values())
So if dict1has, let's say, 3 keys, this for loop will repeat 3 times. Because, as you said in a comment above, only one key is shared between dict1and dict2, the if statement only is True once, so all items of dict2.values() will be appended to temponce. What you want to do, if i got that right, is to append all items INSIDE one of the values of dict2, namely the one assigned to the one key that the two dicts share. Your idea was pretty close, you just have to add one little thing. As a one liner, it would look like this:
temp = [x for x in dict2[k] for k in dict1.keys() if k in dict2.keys()]
or, differently:
temp = [dict2[k] for k in set(dict1.keys()).intersection(set(dict2.keys()))]
You can use the operator itemgetter():
from operator import itemgetter
from itertools import chain
dict1 = {'table1': [16,18,20,22,24,26], 'table2': [12,16,18,20,22,24]}
dict2 = {'table1': [16,18,20,22,24,26], 'table2': [12,16,18,20,22,24], 'table3': [16,18,22,26,30,32]}
common_keys = set(dict1).intersection(dict2)
sublists = itemgetter(*common_keys)(dict2)
if len(common_keys) == 1:
max_val = max(sublists)
else:
max_val = max(chain.from_iterable(sublists))
print(max_val)
# 26

Comparing dictionaries on shared keys only

I have two dictionaries with some shared keys, and some different ones. (Each dictionary has some keys not present in the other). What's a nice way to compare the two dictionaries for equality, as if only the shared keys were present?
In other words I want a simplest way to calculate the following:
commonkeys = set(dict1).intersection(dict2)
simple1 = dict((k, v) for k,v in dict1.items() if k in commonkeys)
simple2 = dict((k, v) for k,v in dict2.items() if k in commonkeys)
return simple1 == simple2
I've managed to simplify it to this:
commonkeys = set(dict1).intersection(dict2)
return all(dict1[key] == dict2[key] for key in commonkeys)
But I'm hoping for an approach that doesn't require precalculation of the common keys. (In reality I have two lists of dictionaries that I'll be comparing pairwise. All dictionaries in each list have the same set of keys, so if a computation like commonkeys above is necessary, it would only need to be done once.)
What about the following?
return all(dict2[key] == val for key, val in dict1.iteritems() if key in dict2)
Or even shorter (although it possibly involves a few more comparisons):
return all(dict2.get(key, val) == val for key, val in dict1.iteritems())
Try this
dict((k, dict1[k]) for k in dict1.keys() + dict2.keys() if dict1.get(k) == dict2.get(k))
O(m + n) comparisons.
If you want a true/false result put a simple check on the above result. If not none return true

Returning unique elements from values in a dictionary

I have a dictionary like this :
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
How would you return a new dictionary with the elements that are not contained in the key of the highest value ?
In this case :
d2 = {'v02':['elem_D'],'v01':["elem_E"]}
Thank you,
I prefer to do differences with the builtin data type designed for it: sets.
It is also preferable to write loops rather than elaborate comprehensions. One-liners are clever, but understandable code that you can return to and understand is even better.
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
last = None
d2 = {}
for key in sorted(d.keys()):
if last:
if set(d[last]) - set(d[key]):
d2[last] = sorted(set(d[last]) - set(d[key]))
last = key
print d2
{'v01': ['elem_E'], 'v02': ['elem_D']}
from collections import defaultdict
myNewDict = defaultdict(list)
all_keys = d.keys()
all_keys.sort()
max_value = all_keys[-1]
for key in d:
if key != max_value:
for value in d[key]:
if value not in d[max_value]:
myNewDict[key].append(value)
You can get fancier with set operations by taking the set difference between the values in d[max_value] and each of the other keys but first I think you should get comfortable working with dictionaries and lists.
defaultdict(<type 'list'>, {'v01': ['elem_E'], 'v02': ['elem_D']})
one reason not to use sets is that the solution does not generalize enough because sets can only have hashable objects. If your values are lists of lists the members (sublists) are not hashable so you can't use a set operation
Depending on your python version, you may be able to get this done with only one line, using dict comprehension:
>>> d2 = {k:[v for v in values if not v in d.get(max(d.keys()))] for k, values in d.items()}
>>> d2
{'v01': ['elem_E'], 'v02': ['elem_D'], 'v03': []}
This puts together a copy of dict d with containing lists being stripped off all items stored at the max key. The resulting dict looks more or less like what you are going for.
If you don't want the empty list at key v03, wrap the result itself in another dict:
>>> {k:v for k,v in d2.items() if len(v) > 0}
{'v01': ['elem_E'], 'v02': ['elem_D']}
EDIT:
In case your original dict has a very large keyset [or said operation is required frequently], you might also want to substitute the expression d.get(max(d.keys())) by some previously assigned list variable for performance [but I ain't sure if it doesn't in fact get pre-computed anyway]. This speeds up the whole thing by almost 100%. The following runs 100,000 times in 1.5 secs on my machine, whereas the unsubstituted expression takes more than 3 seconds.
>>> bl = d.get(max(d.keys()))
>>> d2 = {k:v for k,v in {k:[v for v in values if not v in bl] for k, values in d.items()}.items() if len(v) > 0}

Inverting Dictionaries in Python

I want to know which would be an efficient method to invert dictionaries in python. I also want to get rid of duplicate values by comparing the keys and choosing the larger over the smaller assuming they can be compared. Here is inverting a dictionary:
inverted = dict([[v,k] for k,v in d.items()])
To remove duplicates by using the largest key, sort your dictionary iterator by value. The call to dict will use the last key inserted:
import operator
inverted = dict((v,k) for k,v in sorted(d.iteritems(), key=operator.itemgetter(1)))
Here is a simple and direct implementation of inverting a dictionary and keeping the larger of any duplicate values:
inverted = {}
for k, v in d.iteritems():
if v in inverted:
inverted[v] = max(inverted[v], k)
else:
inverted[v] = k
This can be tightened-up a bit with dict.get():
inverted = {}
for k, v in d.iteritems():
inverted[v] = max(inverted.get(v, k), k)
This code makes fewer comparisons and uses less memory than an approach using sorted().

Categories