I have two dictionaries with some shared keys, and some different ones. (Each dictionary has some keys not present in the other). What's a nice way to compare the two dictionaries for equality, as if only the shared keys were present?
In other words, I want the simplest way to calculate the following:
commonkeys = set(dict1).intersection(dict2)
simple1 = dict((k, v) for k,v in dict1.items() if k in commonkeys)
simple2 = dict((k, v) for k,v in dict2.items() if k in commonkeys)
return simple1 == simple2
I've managed to simplify it to this:
commonkeys = set(dict1).intersection(dict2)
return all(dict1[key] == dict2[key] for key in commonkeys)
But I'm hoping for an approach that doesn't require precalculation of the common keys. (In reality I have two lists of dictionaries that I'll be comparing pairwise. All dictionaries in each list have the same set of keys, so if a computation like commonkeys above is necessary, it would only need to be done once.)
What about the following?
return all(dict2[key] == val for key, val in dict1.iteritems() if key in dict2)
Or even shorter (although it possibly involves a few more comparisons):
return all(dict2.get(key, val) == val for key, val in dict1.iteritems())
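For readers on Python 3, where iteritems() no longer exists, the same idea might be sketched as follows. The function name equal_on_shared_keys and the sample dictionaries are made up for illustration; this is one possible spelling, not the only one.

```python
def equal_on_shared_keys(dict1, dict2):
    """Return True if dict1 and dict2 agree on every key they share."""
    # Only keys of dict1 that also occur in dict2 are compared;
    # keys unique to either dictionary are ignored.
    return all(dict2[key] == val for key, val in dict1.items() if key in dict2)


# Hypothetical sample data
a = {"x": 1, "y": 2, "only_a": 0}
b = {"x": 1, "y": 2, "only_b": 9}
c = {"x": 1, "y": 3}

print(equal_on_shared_keys(a, b))  # True
print(equal_on_shared_keys(a, c))  # False
```

Note that two dictionaries with no shared keys at all compare as equal here, because all() over an empty sequence is True.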
Try this
dict((k, dict1[k]) for k in dict1.keys() + dict2.keys() if dict1.get(k) == dict2.get(k))
O(m + n) comparisons.
If you want a true/false result, put a simple check on the result above: the dictionaries agree on their shared keys when it contains as many entries as there are shared keys.
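One possible way to turn the matching-pairs dictionary into a True/False answer, sketched for Python 3. The names matching_pairs, agree_on_shared_keys and _MISSING are hypothetical; the sentinel is an addition so that a stored None is not confused with an absent key.

```python
_MISSING = object()  # sentinel: distinguishes "key absent" from a stored None


def matching_pairs(dict1, dict2):
    """Return the key-value pairs that are identical in both dictionaries."""
    return {
        k: dict1[k]
        for k in dict1.keys() | dict2.keys()
        if dict1.get(k, _MISSING) == dict2.get(k, _MISSING)
    }


def agree_on_shared_keys(dict1, dict2):
    """True when every key present in both dictionaries has equal values."""
    shared = dict1.keys() & dict2.keys()
    return len(matching_pairs(dict1, dict2)) == len(shared)


print(agree_on_shared_keys({"a": 1, "b": 2}, {"b": 2, "c": 3}))  # True
print(agree_on_shared_keys({"a": 1, "b": 2}, {"b": 5, "c": 3}))  # False
```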
We have a dictionary d1 and a condition cond. We want d1 to contain only the values that satisfy the condition cond. One way to do it is:
d1 = {k:v for k,v in d1.items() if cond(v)}
But, this creates a new dictionary, which may be very memory-inefficient if d1 is large.
Another option is:
for k,v in d1.items():
    if not cond(v):
        d1.pop(k)
But, this modifies the dictionary while it is iterated upon, and generates an error: "RuntimeError: dictionary changed size during iteration".
What is the correct way in Python 3 to filter a dictionary in-place?
If only a few keys have values that fail the condition, you might first collect those keys and then prune the dictionary:
for k in [k for k,v in d1.items() if not cond(v)]:
    del d1[k]
In case the list [k for k,v in d1.items() if not cond(v)] would be too large, one might process the dictionary "in turns", i.e., collect keys until their count reaches a threshold, prune the dictionary, and repeat until no keys failing the condition remain:
from itertools import islice
def prune(d, cond, chunk_size = 1000):
    change = True
    while change:
        change = False
        # collect at most chunk_size keys whose values fail cond
        keys = list(islice((k for k,v in d.items() if not cond(v)), chunk_size))
        for k in keys:
            change = True
            del d[k]
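As a self-contained illustration of the chunked idea, here is a minimal sketch written so that entries whose values satisfy cond are kept, as the question asks. The name keep_only and the sample data are assumptions made for the example.

```python
from itertools import islice


def keep_only(d, cond, chunk_size=1000):
    """Delete, in place, every entry of d whose value fails cond."""
    while True:
        # Collect at most chunk_size keys to delete. list() finishes
        # before any deletion happens, so the dict is never modified
        # while it is being iterated over.
        doomed = list(islice((k for k, v in d.items() if not cond(v)), chunk_size))
        if not doomed:
            break
        for k in doomed:
            del d[k]


d1 = {i: i for i in range(10)}
same_object = d1
keep_only(d1, lambda v: v % 2 == 0, chunk_size=3)
print(d1)                 # {0: 0, 2: 2, 4: 4, 6: 6, 8: 8}
print(d1 is same_object)  # True
```

Each pass rescans the dictionary from the start, so a small chunk_size trades extra scanning time for a smaller temporary list.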
I am trying to find a way to return more than one result for my dictionary in Python:
def transitive_property(d1, d2):
    '''
    Return a new dictionary in which the keys are from d1 and the values are from d2.
    A key-value pair should be included only if the value associated with a key in d1
    is a key in d2.
    >>> transitive_property({'one':1, 'two':2}, {1:1.0})
    {'one':1.0}
    >>> transitive_property({'one':1, 'two':2}, {3:3.0})
    {}
    >>> transitive_property({'one':1, 'two':2, 'three':3}, {1:1.0, 3:3.0})
    {'one':1.0}
    {'three': 3.0}
    '''
    for key, val in d1.items():
        if val in d2:
            return {key:d2[val]}
        else:
            return {}
I've come up with a bunch of different things, but they never pass test cases such as the third one (with {'three':3}). This is what results when I test using the third case in the docstring:
{'one':1.0}
So since it doesn't return {'three':3.0}, I feel that it only returns a single occurrence within the dictionary, so maybe it's a matter of returning a new dictionary so it could iterate over all of the cases. What would you say on this approach? I'm quite new so I hope the code below makes some sense despite the syntax errors. I really did try.
empty = {}
for key, val in d1.items():
    if val in d2:
        return empty += key, d2[val]
return empty
Your idea almost works but (i) you are returning the value immediately, which exits the function at that point, and (ii) you can't add properties to a dictionary using +=. Instead you need to set its properties using dictionary[key] = value.
result = {}
for key, val in d1.items():
    if val in d2:
        result[key] = d2[val]
return result
This can also be written more succinctly as a dictionary comprehension:
def transitive_property(d1, d2):
    return {key: d2[val] for key, val in d1.items() if val in d2}
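To check the comprehension against the three cases from the question's docstring, a short run might look like this. Note that the third case yields a single dictionary holding both pairs, not two separate dictionaries.

```python
def transitive_property(d1, d2):
    # Keep a key of d1 only when its value is itself a key of d2.
    return {key: d2[val] for key, val in d1.items() if val in d2}


print(transitive_property({'one': 1, 'two': 2}, {1: 1.0}))
# {'one': 1.0}
print(transitive_property({'one': 1, 'two': 2}, {3: 3.0}))
# {}
print(transitive_property({'one': 1, 'two': 2, 'three': 3}, {1: 1.0, 3: 3.0}))
# {'one': 1.0, 'three': 3.0}
```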
You can also have the function return a list of dictionaries with a single key-value pair in each, though I'm not sure why you would want that:
def transitive_property(d1, d2):
    return [{key: d2[val]} for key, val in d1.items() if val in d2]
Once return executes, the function terminates for that particular call, so you cannot return a second time from the same call. Instead, you can store the values in a list (or a dictionary) and then return that.
What is the most efficient way to check whether the key-value pairs of one dictionary are also present in another dictionary? Suppose I have two dictionaries, dict1 and dict2, that have some key-value pairs in common. I want to find those pairs and print them. What would be the most efficient way to do this?
one way would be:
d_inter = dict((k, v) for k, v in dict1.iteritems() if k in dict2 and dict2[k] == v)
the other:
d_inter = dict(set(d1.iteritems()).intersection(d2.iteritems()))
I'm not sure which one would be more efficient, so let's compare both of them:
1. Solution with iteration through dicts:
we iterate over all items of dict1: for k,v in dict1.iteritems() -> O(n)
for each item we check whether the key is in dict2: if k in dict2 and dict2[k] == v -> O(1) per lookup on average, O(m) in the worst case (many hash collisions)
which makes the overall complexity O(n) on average and O(n*m) in the worst case
2. Solution with sets:
building a set from a dict's items takes time linear in the number of items on average:
we iterate over all items of d1 to create the first set set(d1.iteritems()) -> O(n)
we iterate over all items of d2 to create the second set set(d2.iteritems()) -> O(m)
we get the intersection of both, which is O(min(len(s), len(t))) on average or O(n*m) in the worst case
which makes the overall complexity O(n+m) on average and at least O(n*m) in the worst case
In big-O terms the two solutions are therefore close: solution 1 does no more than O(n) work on average and builds no intermediate sets, while solution 2 pays for set construction on top of the intersection. Constant factors are not captured by this analysis, so they can still tip the balance either way.
In conclusion, the solution you choose will depend on your dataset and the measurements you'll make! ;-)
N.B.: I used the complexity figures of set() operations here.
N.B.2: for solution #1, make the smaller dict dict1, since the loop runs once per item of dict1; for solution #2, also make the smaller dict dict1.
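Since the choice comes down to measurement, here is a rough timing sketch using the standard timeit module. It is written for Python 3 (items() instead of iteritems()); the function names, dictionary sizes and repetition count are arbitrary assumptions, and the numbers it prints will differ between machines and datasets.

```python
import timeit


def inter_loop(dict1, dict2):
    # Solution 1: iterate over dict1 and look each key up in dict2.
    return {k: v for k, v in dict1.items() if k in dict2 and dict2[k] == v}


def inter_sets(dict1, dict2):
    # Solution 2: intersect the sets of (key, value) pairs.
    # Requires all values to be hashable.
    return dict(set(dict1.items()).intersection(dict2.items()))


def compare_timings(size=1000, number=200):
    # Two hypothetical dicts that overlap on half of their keys.
    dict1 = {i: i for i in range(size)}
    dict2 = {i: i for i in range(size // 2, size + size // 2)}
    assert inter_loop(dict1, dict2) == inter_sets(dict1, dict2)
    t_loop = timeit.timeit(lambda: inter_loop(dict1, dict2), number=number)
    t_sets = timeit.timeit(lambda: inter_sets(dict1, dict2), number=number)
    return t_loop, t_sets


t_loop, t_sets = compare_timings()
print("loop: %.4fs  sets: %.4fs" % (t_loop, t_sets))
```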
N.B.2016: This solution was written for Python 2. Here are the changes needed to make it Python 3 ready:
replace iteritems() with items();
you could also use the newer dict comprehension syntax: {k: v for … == v};
in Python 3, d.items() returns a set-like view rather than a list, so the set-based version can be written as dict(set(d1.items()).intersection(d2.items())) or dict(d1.items() & d2.items()), provided all values are hashable.
What about...
matching_dict_values = {}
for key in dict1.keys():
    if key in dict2.keys():
        if dict1[key] == dict2[key]:
            matching_dict_values[key]=dict1[key]
I don't see why you'd need anything fancier than this:
if all([testKey in dict1, testKey in dict2]) and dict1[testKey] == dict2[testKey]:
We don't have to worry about a KeyError because the membership test fails first and short-circuits the and (so a value for a key that is missing from one of the dictionaries never gets looked up).
So to get your full list common key-value pairs you could do this:
commonDict = {}
for testKey in set(dict1) | set(dict2):
    if all([testKey in dict1, testKey in dict2]) and dict1[testKey] == dict2[testKey]:
        commonDict[testKey] = dict1[testKey]
Update to @zmo's answer for Python 3:
Solution 1:
d_inter = {k:v for k, v in dict1.items() if k in dict2 and dict2[k] == v}
Solution 2:
d_inter = dict(set(dict1.items()).intersection(dict2.items()))
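A small run of both solutions on made-up data, as a sketch for Python 3. The sample dictionaries are hypothetical. The third variant relies on the fact that items() views support set operators, and the final lines illustrate that the set-based variants need hashable values.

```python
dict1 = {"a": 1, "b": 2, "c": 3}
dict2 = {"b": 2, "c": 30, "d": 4}

# Solution 1: dict comprehension
by_loop = {k: v for k, v in dict1.items() if k in dict2 and dict2[k] == v}

# Solution 2: set intersection of (key, value) pairs
by_sets = dict(set(dict1.items()).intersection(dict2.items()))

# Same idea using the set-like items() views directly
by_views = dict(dict1.items() & dict2.items())

print(by_loop)   # {'b': 2}
print(by_sets)   # {'b': 2}
print(by_views)  # {'b': 2}

# The set-based variants hash every (key, value) pair, so an
# unhashable value such as a list makes them fail.
try:
    set({"a": [1, 2]}.items())
except TypeError as exc:
    print("set-based variant failed:", exc)
```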
The following works for a dict, but not for an OrderedDict: for an OrderedDict it seems to loop forever. Can you tell me why?
If the function's input is a dict it has to return a dict; if the input is an OrderedDict it has to return an OrderedDict.
def key_lower(d):
    """returns d for d or od for od with keys changed to lower case
    """
    for k in d.iterkeys():
        v = d.pop(k)
        if (type(k) == str) and (not k.islower()):
            k = k.lower()
        d[k] = v
    return d
It forms an infinite loop because of the way ordered dictionaries add new members (to the end)
Since you are using iterkeys, it is using a generator. When you assign d[k] = v you are adding the new key/value to the end of the dictionary. Because you are using a generator, that will continue to generate keys as you continue adding them.
You could fix this in a few ways. One would be to create a new ordered dict from the previous.
from collections import OrderedDict
def key_lower(d):
    newDict = OrderedDict()
    for k, v in d.iteritems():
        if (isinstance(k, (str, basestring))):
            k = k.lower()
        newDict[k] = v
    return newDict
The other way would be to avoid the generator by using keys instead of iterkeys.
As sberry mentioned, the infinite loop arises because you are modifying and reading the dict at the same time.
Probably the simplest solution is to use OrderedDict.keys() instead of OrderedDict.iterkeys():
for k in d.keys():
    v = d.pop(k)
    if (type(k) == str) and (not k.islower()):
        k = k.lower()
    d[k] = v
As the keys are captured in a list at the start, they won't be affected when items are changed in the dict.
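In Python 3, keys() returns a live view rather than a list, so the snapshot has to be taken explicitly. A minimal sketch under that assumption is shown below; it modifies the mapping in place, so a dict stays a dict and an OrderedDict stays an OrderedDict. The sample data is made up.

```python
from collections import OrderedDict


def key_lower(d):
    """Lower-case the string keys of d in place and return d."""
    # list(d) takes a snapshot of the keys, so the loop is not affected
    # by the pops and insertions below.
    for k in list(d):
        v = d.pop(k)
        if isinstance(k, str):
            k = k.lower()
        d[k] = v
    return d


od = OrderedDict([("Alpha", 1), ("BETA", 2), (3, "three")])
result = key_lower(od)
print(type(result).__name__, list(result.items()))
# OrderedDict [('alpha', 1), ('beta', 2), (3, 'three')]
```

Because every key is popped and re-inserted in its original order, the relative order of the entries is preserved. If two keys differ only by case, the later one overwrites the earlier one.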
I want to know which would be an efficient method to invert dictionaries in python. I also want to get rid of duplicate values by comparing the keys and choosing the larger over the smaller assuming they can be compared. Here is inverting a dictionary:
inverted = dict([[v,k] for k,v in d.items()])
To remove duplicates by keeping the largest key, sort your dictionary items by key. The call to dict will use the last key inserted for each value, which is then the largest:
import operator
inverted = dict((v,k) for k,v in sorted(d.iteritems(), key=operator.itemgetter(0)))
Here is a simple and direct implementation of inverting a dictionary and keeping the larger key for any duplicate value:
inverted = {}
for k, v in d.iteritems():
    if v in inverted:
        inverted[v] = max(inverted[v], k)
    else:
        inverted[v] = k
This can be tightened-up a bit with dict.get():
inverted = {}
for k, v in d.iteritems():
    inverted[v] = max(inverted.get(v, k), k)
This code makes fewer comparisons and uses less memory than an approach using sorted().
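For Python 3, the dict.get() version might be wrapped up as follows. The function name invert_keep_largest and the sample dictionary are assumptions for illustration; values must be hashable and keys mutually comparable.

```python
def invert_keep_largest(d):
    """Invert d; when several keys share a value, keep the largest key."""
    inverted = {}
    for k, v in d.items():
        # inverted.get(v, k) falls back to k itself the first time v is seen.
        inverted[v] = max(inverted.get(v, k), k)
    return inverted


d = {"a": 1, "b": 1, "c": 2, "A": 1}
print(invert_keep_largest(d))  # {1: 'b', 2: 'c'}
```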