I have two dictionaries, and I need to remove from dictionary 1 the keys that don't occur in dictionary 2. This is my attempt:
d1 = {'id1': 1,
      'id2': 1,
      'id3': 1,
      'id4': 1}
d2 = {'id1': 0,
      'id2': 0,
      'id3': 0,
      'idnew': 0}
for k in (d1.keys() - d2.keys()):
    del d1[k]
print(d1)
prints:
{'id1': 1, 'id2': 1, 'id3': 1}
My question is: is this the fastest / most memory-efficient way to do this? Or does it construct intermediate sets that take up more memory than necessary for a task like this?
My 2nd attempt:
d1 = {k:v for k,v in d1.items() if k in d2}
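If you want to measure rather than guess, timeit makes the comparison straightforward. A small sketch; the dictionary sizes and key names below are arbitrary, just for the benchmark:

```python
import timeit

# Two large dictionaries with partially overlapping keys
# (sizes are arbitrary, just for the benchmark).
d1 = {f"id{i}": 1 for i in range(100_000)}
d2 = {f"id{i}": 0 for i in range(50_000, 150_000)}

def via_set_difference(d1, d2):
    d = dict(d1)  # copy so every run sees the same input
    for k in d.keys() - d2.keys():
        del d[k]
    return d

def via_comprehension(d1, d2):
    return {k: v for k, v in d1.items() if k in d2}

print(timeit.timeit(lambda: via_set_difference(d1, d2), number=10))
print(timeit.timeit(lambda: via_comprehension(d1, d2), number=10))
```

Both approaches should produce the same filtered dictionary; the timings tell you which is faster on your data.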
filter and a dict comprehension might be a good way to go for such a task, although the problem is easy to solve without them as well.
filtered_d = {k:d1[k] for k in filter(lambda k: k in d2, d1)}
A dict comprehension can incur a performance hit when the dictionaries are large, since it builds a new dictionary. You can instead remove the keys in place by iterating with a for loop:
for key in set(d1) - set(d2):
    del d1[key]
or, if you know that your dicts will be small, you can use a dict comprehension.
d1 = {k: v for k, v in d1.items() if k in d2}
Your solution is fine: the actual speed and memory cost depend on the types and sizes of the values involved. You can inspect these with a Python debugger or profiler.
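For instance, sys.getsizeof shows how much memory the temporary key set costs (exact byte counts vary by Python version and platform):

```python
import sys

d1 = {'id1': 1, 'id2': 1, 'id3': 1, 'id4': 1}
d2 = {'id1': 0, 'id2': 0, 'id3': 0, 'idnew': 0}

# The key-view difference materializes a temporary set of keys to delete.
extra_keys = d1.keys() - d2.keys()
print(type(extra_keys))           # <class 'set'>
print(sys.getsizeof(extra_keys))  # size in bytes, platform-dependent
```

For dictionaries this small the overhead is negligible; it only becomes worth measuring for large key sets.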
Related
I was asked to write a function, reverse_dict_in_place(d),
which swaps the keys and values of the input dictionary
without changing the dictionary's location in memory (in place).
However, testing with the id() function shows that all my solutions do change the dictionary's memory location:
def reverse_dict_in_place(d):
    d = {y: x for x, y in d.items()}
    return d
An alternative to the current answers which allows values to be the same as keys. It works in mostly the same way, though once again no two values may be the same.
def reverse_dict_in_place(d):
    copy = d.copy().items()
    d.clear()
    for k, v in copy:
        d[v] = k
    return d
>>> x = {0: 1, 1: 2}
>>> y = reverse_dict_in_place(x)
>>> id(x) == id(y)
True
>>>
Some assumptions for this to work (thanks to all the users who pointed these out):
There are no duplicate values
There are no non-hashable values
There are no values that are also keys
If you're comfortable with those assumptions then I think this should work:
def reverse_dict_in_place(d):
    # Snapshot the items first: deleting while iterating d.items()
    # directly raises RuntimeError in Python 3.
    for k, v in list(d.items()):
        del d[k]
        d[v] = k
    return d
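A quick check, using sample data that satisfies all three assumptions (the function is repeated so the snippet runs on its own):

```python
def reverse_dict_in_place(d):
    # Snapshot the items so the dict can be mutated during the loop.
    for k, v in list(d.items()):
        del d[k]
        d[v] = k
    return d

x = {'a': 1, 'b': 2}
y = reverse_dict_in_place(x)
print(y)               # {1: 'a', 2: 'b'}
print(id(x) == id(y))  # True
```

Note that the "no values that are also keys" assumption matters here: reinserting a reversed pair could otherwise collide with a key that hasn't been deleted yet.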
Extending on Gad's suggestion, you could use a dict comprehension:
reversed_d = {v: k for k, v in d.items()}  # named reversed_d to avoid shadowing the built-in reversed()
Where d is a dict, and the same assumptions apply:
There are no duplicate values
There are no non-hashable values
There are no values that are also keys
This would not work, without modification, for nested dicts.
Note: #NightShade posted a similar answer to the one below, earlier than I posted mine.
You can try this:
def reverse_dict_in_place(d):
    d_copy = d.copy()
    d.clear()
    for k in d_copy:
        d[d_copy[k]] = k
This would work even if one of the dictionary's values happens to also be a key (as tested out below)
Testing it out:
my_dict = {1:1, 2:'two', 3:'three'}
reverse_dict_in_place(my_dict)
print(my_dict)
Output:
{1: 1, 'two': 2, 'three': 3}
We have a dictionary d1 and a condition cond. We want d1 to contain only the values that satisfy the condition cond. One way to do it is:
d1 = {k:v for k,v in d1.items() if cond(v)}
But, this creates a new dictionary, which may be very memory-inefficient if d1 is large.
Another option is:
for k, v in d1.items():
    if not cond(v):
        d1.pop(k)
But, this modifies the dictionary while it is iterated upon, and generates an error: "RuntimeError: dictionary changed size during iteration".
What is the correct way in Python 3 to filter a dictionary in-place?
If only a few keys have values satisfying the condition, you can first aggregate those keys into a list and then prune the dictionary (note: here cond selects the entries to delete, i.e. it is the negation of the keep-condition in the question):
for k in [k for k, v in d1.items() if cond(v)]:
    del d1[k]
In case the list [k for k,v in d1.items() if cond(v)] would be too large, you can process the dictionary in chunks: collect keys until their count reaches a threshold, prune the dictionary, and repeat until no keys satisfying the condition remain:
from itertools import islice
def prune(d, cond, chunk_size=1000):
    change = True
    while change:
        change = False
        keys = list(islice((k for k, v in d.items() if cond(v)), chunk_size))
        for k in keys:
            change = True
            del d[k]
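For example, with a removal condition that drops even values (prune is repeated so the snippet is self-contained; the data is made up):

```python
from itertools import islice

def prune(d, cond, chunk_size=1000):
    # Repeatedly collect up to chunk_size keys whose values satisfy cond
    # and delete them, until a pass finds nothing left to remove.
    change = True
    while change:
        change = False
        keys = list(islice((k for k, v in d.items() if cond(v)), chunk_size))
        for k in keys:
            change = True
            del d[k]

d = {i: i for i in range(10)}
prune(d, lambda v: v % 2 == 0, chunk_size=3)  # drop even values
print(d)  # {1: 1, 3: 3, 5: 5, 7: 7, 9: 9}
```

Because list() fully consumes the islice before any deletion happens, the dictionary is never mutated while it is being iterated.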
I have a dictionary like this:
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
How would you return a new dictionary containing, for each key, the elements that are not present under the highest key?
In this case :
d2 = {'v02':['elem_D'],'v01':["elem_E"]}
Thank you,
I prefer to do differences with the builtin data type designed for it: sets.
It is also preferable to write loops rather than elaborate comprehensions. One-liners are clever, but understandable code that you can return to and understand is even better.
d = {'v03':["elem_A","elem_B","elem_C"],'v02':["elem_A","elem_D","elem_C"],'v01':["elem_A","elem_E"]}
last = None
d2 = {}
for key in sorted(d.keys()):
    if last:
        diff = set(d[last]) - set(d[key])
        if diff:
            d2[last] = sorted(diff)
    last = key
print(d2)
{'v01': ['elem_E'], 'v02': ['elem_D']}
from collections import defaultdict

myNewDict = defaultdict(list)
all_keys = sorted(d)  # dict.keys() is a view in Python 3, so sort via sorted()
max_value = all_keys[-1]
for key in d:
    if key != max_value:
        for value in d[key]:
            if value not in d[max_value]:
                myNewDict[key].append(value)
You can get fancier with set operations by taking the set difference between the values in d[max_value] and each of the other keys, but first I think you should get comfortable working with dictionaries and lists.
defaultdict(<type 'list'>, {'v01': ['elem_E'], 'v02': ['elem_D']})
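The set-operation version hinted at above could look like this (a sketch; it assumes every value list holds hashable items):

```python
d = {'v03': ["elem_A", "elem_B", "elem_C"],
     'v02': ["elem_A", "elem_D", "elem_C"],
     'v01': ["elem_A", "elem_E"]}

max_key = max(d)        # 'v03' is the highest key lexicographically
top = set(d[max_key])   # values stored under the highest key

# Keep, per key, only the elements missing from the highest key's list,
# and skip keys whose difference comes out empty.
d2 = {k: sorted(set(v) - top) for k, v in d.items()
      if k != max_key and set(v) - top}
print(d2)  # {'v02': ['elem_D'], 'v01': ['elem_E']}
```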
One reason not to use sets is that the solution doesn't generalize: sets can only hold hashable objects. If your values are lists of lists, the members (sublists) are not hashable, so you can't use set operations.
Depending on your python version, you may be able to get this done with only one line, using dict comprehension:
>>> d2 = {k:[v for v in values if not v in d.get(max(d.keys()))] for k, values in d.items()}
>>> d2
{'v01': ['elem_E'], 'v02': ['elem_D'], 'v03': []}
This builds a copy of dict d in which each contained list is stripped of all items stored at the max key. The resulting dict looks more or less like what you are going for.
If you don't want the empty list at key v03, wrap the result itself in another dict:
>>> {k:v for k,v in d2.items() if len(v) > 0}
{'v01': ['elem_E'], 'v02': ['elem_D']}
EDIT:
In case your original dict has a very large keyset (or this operation is needed frequently), you might also want to replace the expression d.get(max(d.keys())) with a previously assigned variable for performance (I'm not sure whether it gets hoisted out of the comprehension anyway). This roughly halves the runtime: the following runs 100,000 times in 1.5 s on my machine, whereas the unsubstituted expression takes more than 3 s.
>>> bl = d.get(max(d.keys()))
>>> d2 = {k:v for k,v in {k:[v for v in values if not v in bl] for k, values in d.items()}.items() if len(v) > 0}
What is the most efficient way to check whether a key-value pair of one dictionary is also present in another dictionary? Suppose I have two dictionaries, dict1 and dict2, which have some key-value pairs in common. I want to find those pairs and print them. What would be the most efficient way to do this? Please suggest.
one way would be:
d_inter = dict((k, v) for k, v in dict1.iteritems() if k in dict2 and dict2[k] == v)
the other:
d_inter = dict(set(d1.iteritems()).intersection(d2.iteritems()))
I'm not sure which one would be more efficient, so let's compare both of them:
1. Solution with iteration through dicts:
we parse all items of dict1: for k, v in dict1.iteritems() -> O(n)
for each key we check membership and compare values: if k in dict2 and dict2[k] == v -> O(1) on average per key
which makes a global average complexity of O(n)
2. Solution with sets:
if we assume that converting a dict into a set is O(n):
we parse all items of d1 to create the first set set(d1.iteritems()) -> O(n)
we parse all items of d2 to create the second set set(d2.iteritems()) -> O(m)
we get the intersection of both, which is O(min(len(s), len(t))) on average, or O(n * m) in the worst case
which makes a global worst-case complexity of O(n + m + n*m), i.e. O(n*m), which is O(n^2) for same-sized dicts: then solution 1 is best
if we assume that converting a dict into a set is O(1) (constant time)
the average is O(min(n, m)) and the worst case is O(n*m); solution #1 is best in the worst-case scenario, but solution #2 is best in the average case because O(n) > O(min(n, m)).
In conclusion, the solution you choose will depend on your dataset and the measurements you'll make! ;-)
N.B.: I used the standard time-complexity figures for set operations.
N.B.2: for the solution #1 make always the smallest dict as dict2 and for solution #2 the smallest dict as dict1.
N.B.2016: This solution was written for python2. Here's the changes needed to make it python3 ready:
replace iteritems() with items() ;
you could also use the newer dict-comprehension syntax: {k: v for … == v} ;
as d.items() returns a dict_items view rather than a list, feed it to set() or frozenset() explicitly: dict(set(d1.items()).intersection(d2.items())).
What about...
matching_dict_values = {}
for key in dict1.keys():
    if key in dict2.keys():
        if dict1[key] == dict2[key]:
            matching_dict_values[key] = dict1[key]
I don't see why you'd need anything fancier than this:
if all([testKey in dict1, testKey in dict2]) and dict1[testKey] == dict2[testKey]:
We don't have to worry about a KeyError because the boolean test will fail before the and (so a value whose key isn't in one of the dicts will never get compared).
So to get your full list common key-value pairs you could do this:
for testKey in set(dict1) | set(dict2):  # in Python 3, keys() views can't be concatenated with +
    if all([testKey in dict1, testKey in dict2]) and dict1[testKey] == dict2[testKey]:
        commonDict[testKey] = dict1[testKey]
Update to #zmo's answer
Solution 1:
d_inter = {k:v for k, v in dict1.items() if k in dict2 and dict2[k] == v}
Solution 2:
d_inter = dict(set(dict1.items()).intersection(dict2.items()))
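A quick sanity check that the two solutions give the same result (the sample dicts are made up):

```python
dict1 = {'a': 1, 'b': 2, 'c': 3}
dict2 = {'a': 1, 'b': 9, 'd': 4}

# Solution 1: dict comprehension.
sol1 = {k: v for k, v in dict1.items() if k in dict2 and dict2[k] == v}

# Solution 2: intersection of item sets (values must be hashable).
sol2 = dict(set(dict1.items()).intersection(dict2.items()))

print(sol1)  # {'a': 1}
print(sol2)  # {'a': 1}
```

Only 'a' survives: 'b' exists in both dicts but with different values, and 'c'/'d' appear in only one dict each.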
If a and bb are two dictionaries:
a = {'UK':'http://www.uk.com', 'COM':['http://www.uk.com','http://www.michaeljackson.com']}
bb = {'Australia': 'http://www.australia.com', 'COM':['http://www.Australia.com', 'http://www.rafaelnadal.com','http://www.rogerfederer.com']}
I want to merge them to get
{'Australia': ['http://www.australia.com'], 'COM': ['http://www.uk.com', 'http://www.michaeljackson.com', 'http://www.Australia.com', 'http://www.rafaelnadal.com', 'http://www.rogerfederer.com'], 'UK': ['http://www.uk.com']}
I want to union them, i.e. merge the values under each key. How can I do this in Python without overwriting or replacing any value?
Use a defaultdict:
from collections import defaultdict

d = defaultdict(list)
for dd in (a, bb):
    for k, v in dd.items():
        # Wrap bare strings in a tuple so extend() doesn't split them
        # into characters. (basestring is just str in Python 3.)
        v = (v,) if isinstance(v, basestring) else v
        d[k].extend(v)
(but this is pretty much what I told you in my earlier answer)
This now works if your input dictionaries look like
{'Australia':['http://www.australia.com']}
or like:
{'Australia':'http://www.australia.com'}
However, I would advise against the latter form. In general, I think it's a good idea to keep all the keys/values of a dictionary looking the same (at least if you want to treat all the items the same as in this question). That means that if one value is a list, it's a good idea for all of them to be a list.
If you really insist on keeping things this way:
d = {}
for dd in (a, bb):
    for k, v in dd.items():
        if not isinstance(v, list):
            v = [v]
        try:
            d[k].extend(v)
        except KeyError:  # no key, no problem, just add it to the dict
            d[k] = list(v)  # copy, so the input lists aren't mutated later
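Running the try/except version on the example dictionaries produces the merged result from the question (a copy of each list is stored, so the inputs stay untouched):

```python
a = {'UK': 'http://www.uk.com',
     'COM': ['http://www.uk.com', 'http://www.michaeljackson.com']}
bb = {'Australia': 'http://www.australia.com',
      'COM': ['http://www.Australia.com', 'http://www.rafaelnadal.com',
              'http://www.rogerfederer.com']}

d = {}
for dd in (a, bb):
    for k, v in dd.items():
        if not isinstance(v, list):
            v = [v]  # wrap bare strings so extend() doesn't split them
        try:
            d[k].extend(v)
        except KeyError:
            d[k] = list(v)  # copy, so the input lists aren't mutated

print(d['COM'])
```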