Check that Python dicts have same shape and keys - python

For single layer dicts like x = {'a': 1, 'b': 2} the problem is easy and answered on SO (Pythonic way to check if two dictionaries have the identical set of keys?) but what about nested dicts?
For example, y = {'a': {'c': 3}, 'b': {'d': 4}} has keys 'a' and 'b' but I want to compare its shape to another nested dict structure like z = {'a': {'c': 5}, 'b': {'d': 6}} which has the same shape and keys (different values is fine) as y. w = {'a': {'c': 3}, 'b': {'e': 4}} would have keys 'a' and 'b' but on the next layer in it differs from y because w['b'] has key 'e' while y['b'] has key 'd'.
I want a short, simple function that takes two arguments, dict_1 and dict_2, and returns True if they have the same shape and keys as described above, and False otherwise.

This provides a copy of both dictionaries stripped of any non-dictionary values, then compares them:
def getshape(d):
    if isinstance(d, dict):
        return {k:getshape(d[k]) for k in d}
    else:
        # Replace all non-dict values with None.
        return None
def shape_equal(d1, d2):
    return getshape(d1) == getshape(d2)
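As a quick check against the dicts from the question, the sketch below re-declares the two functions so that it runs on its own. The expected results follow from how the question describes y, z and w.

```python
def getshape(d):
    # Keep nested dicts, replace every other value with None.
    if isinstance(d, dict):
        return {k: getshape(v) for k, v in d.items()}
    return None

def shape_equal(d1, d2):
    return getshape(d1) == getshape(d2)

y = {'a': {'c': 3}, 'b': {'d': 4}}
z = {'a': {'c': 5}, 'b': {'d': 6}}
w = {'a': {'c': 3}, 'b': {'e': 4}}

print(shape_equal(y, z))  # True: same keys at every level
print(shape_equal(y, w))  # False: w['b'] has 'e' where y['b'] has 'd'
```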

I liked nneonneo's answer, and it should be relatively fast, but I wanted something that doesn't create extra, unnecessary data structures (I've been learning about memory fragmentation in Python). I didn't know in advance whether it would be as fast or faster.
(EDIT: Spoiler!)
It is faster, by a decent enough margin to make it preferable in all cases; see the other analysis answer.
In particular, if you are dealing with lots and lots of these and running into memory problems, it is likely to be preferable to do it this way.
Implementation
This should work in Python 3, maybe 2.7 if you translate keys to viewkeys, definitely not 2.6. It relies on the set-like view of the keys that dicts have:
def sameshape(d1, d2):
    if isinstance(d1, dict):
        if isinstance(d2, dict):
            # then we have shapes to check
            return (d1.keys() == d2.keys() and
                    # so the keys are all the same
                    all(sameshape(d1[k], d2[k]) for k in d1.keys()))
                    # thus all values will be tested in the same way.
        else:
            return False # d1 is a dict, but d2 isn't
    else:
        return not isinstance(d2, dict) # if d2 is a dict, False, else True.
Edit: updated to remove a redundant type check, so it is now even more efficient.
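The set-like behaviour of key views that this function relies on can be seen in isolation. This is a small illustration for Python 3 with made-up dicts; on Python 2.7 the equivalent method would be viewkeys().

```python
d1 = {'a': 1, 'b': 2}
d2 = {'b': 20, 'a': 10}   # same keys, different order and values
d3 = {'a': 1, 'c': 3}

same_keys = d1.keys() == d2.keys()       # True: order and values are ignored
different_keys = d1.keys() == d3.keys()  # False: 'b' versus 'c'
only_in_d1 = d1.keys() - d3.keys()       # {'b'}
shared = d1.keys() & d3.keys()           # {'a'}

print(same_keys, different_keys, only_in_d1, shared)
```

Because the comparison works on the views directly, no intermediate list or set of keys has to be built just to test equality.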
Testing
To check:
print('expect false:')
print(sameshape({'foo':{'bar':{None:None}}}, {'foo':{'bar':{None: {} }}}))
print('expect true:')
print(sameshape({'foo':{'bar':{None:None}}}, {'foo':{'bar':{None:'foo'}}}))
print('expect false:')
print(sameshape({'foo':{'bar':{None:None}}}, {'foo':{'bar':{None:None, 'baz':'foo'}}}))
Prints:
expect false:
False
expect true:
True
expect false:
False

To profile the two currently existing answers, first let's import timeit:
import timeit
Now we need to set up the code:
setup = '''
import copy
def getshape(d):
    if isinstance(d, dict):
        return {k:getshape(d[k]) for k in d}
    else:
        # Replace all non-dict values with None.
        return None
def nneo_shape_equal(d1, d2):
    return getshape(d1) == getshape(d2)
def aaron_shape_equal(d1,d2):
    if isinstance(d1, dict) and isinstance(d2, dict):
        return (d1.keys() == d2.keys() and
                all(aaron_shape_equal(d1[k], d2[k]) for k in d1.keys()))
    else:
        return not (isinstance(d1, dict) or isinstance(d2, dict))
class Vividict(dict):
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value
d = Vividict()
d['foo']['bar']
d['foo']['baz']
d['fizz']['buzz']
d['primary']['secondary']['tertiary']['quaternary']
d0 = copy.deepcopy(d)
d1 = copy.deepcopy(d)
d1['primary']['secondary']['tertiary']['extra']
# d == d0 is True
# d == d1 is now False!
'''
And now let's test the two options out, first with Python 3.3!
>>> timeit.repeat('nneo_shape_equal(d0, d); nneo_shape_equal(d1,d)', setup=setup)
[36.784881490981206, 36.212246977956966, 36.29759863798972]
>>> timeit.repeat('aaron_shape_equal(d0, d); aaron_shape_equal(d1,d)', setup=setup)
[26.838892214931548, 26.61037168605253, 27.170253590098582]
So my solution takes roughly three quarters of the time (about 27 seconds versus about 36), which makes it about 1.35 times as fast.
And on a version of Python 3.4 (an alpha) that I compiled myself:
>>> timeit.repeat('nneo_shape_equal(d0, d); nneo_shape_equal(d1,d)', setup=setup)
[272.5629618819803, 273.49581588001456, 270.13374400604516]
>>> timeit.repeat('aaron_shape_equal(d0, d); aaron_shape_equal(d1,d)', setup=setup)
[214.87033835891634, 215.69223327597138, 214.85333003790583]
Still about the same ratio. The much larger absolute times on 3.4 are likely because I compiled it myself without optimizations.
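The runs above use timeit's default of 1,000,000 executions per timing, which is why each figure is tens of seconds. For quicker feedback the counts can be passed explicitly. This is a minimal sketch: the values repeat=3 and number=10000 and the small test dicts are arbitrary choices for illustration, not the ones used for the measurements above.

```python
import timeit

setup = '''
def sameshape(d1, d2):
    if isinstance(d1, dict):
        if isinstance(d2, dict):
            return (d1.keys() == d2.keys() and
                    all(sameshape(d1[k], d2[k]) for k in d1))
        return False
    return not isinstance(d2, dict)

d = {'foo': {'bar': {}, 'baz': {}}, 'fizz': {'buzz': {}}}
d0 = {'foo': {'bar': {}, 'baz': {}}, 'fizz': {'buzz': {}}}
'''

# number is the count of executions per timing; repeat is how many timings to take.
timings = timeit.repeat('sameshape(d0, d)', setup=setup, repeat=3, number=10000)
best = min(timings)  # the timeit docs suggest looking at the minimum
print('best of 3: %.4f s for 10000 calls' % best)
```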
Thanks to all readers!

Related

dictionary with lists custom equality check

Say, I have two lists:
a = [1,2,3]
and b = [2,3,1]
If I do a == b, it returns False.
If I check sorted(a) == sorted(b), it returns True.
Now, I have two objects:
obj1 = {'a': 1, 'b': 2, 'c': [1, 2]}
and obj2 = {'b': 2, 'a': 1, 'c': [1, 2]}
obj1 == obj2 is True, irrespective of the order of keys.
But if obj2 = {'b': 2, 'a': 1, 'c': [2, 1]}
how do I test the equality? Obviously, obj1 == obj2 returns False. sorted(obj1) gives ['a', 'b', 'c'], so sorted(obj1) == sorted(obj2) only compares the keys and is a wasted check.
I should probably have overridden the equality method for the object, or used some library. Is there any way to write idiomatic Python code for deep equality?
Sort each value in the dict if it is a list, and then compare:
>>> def sorted_element_dict(d):
...     return {k:sorted(v) if isinstance(v, list) else v for k,v in d.items()}
...
>>> sorted_element_dict(obj1) == sorted_element_dict(obj2)
True
If you are only interested in an equality check, consider using all. You could just apply a custom comparator to all the dictionary values:
def check_equal(d1, d2):
    if d1.keys() != d2.keys():
        return False
    return all(sorted(v) == sorted(d2[k]) if isinstance(v, list) else v == d2[k] for k, v in d1.items())
A more elegant way might be to factor out the comparator into a separate function and check against something more general, like collections.abc.Sequence:
from collections.abc import Sequence

def check_equal(d1, d2):
    if d1.keys() != d2.keys():
        return False
    def cmp(v1, v2):
        # Note: str is also a Sequence, so two strings are compared by their sorted characters.
        if isinstance(v1, Sequence) and isinstance(v2, Sequence):
            return sorted(v1) == sorted(v2)
        return v1 == v2
    return all(cmp(v, d2[k]) for k, v in d1.items())
This has the advantage of not storing all the intermediate sorted products. If, on the other hand, you need to do the comparison frequently, it may be better to transform your dictionaries before using regular ==:
def normalize(d):
    for k, v in d.items():
        if isinstance(v, Sequence):
            d[k] = sorted(v)
Notice that I used a loop instead of a comprehension here. That way, your dictionary is transformed in-place, rather than allocating a whole new hash table.
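The answers above only look at the top-level values. If lists may also sit inside nested dicts, the same idea can be applied recursively. This is a minimal sketch, assuming that the elements of each list can be ordered against each other (for example all numbers or all strings). The helper names normalized and deep_equal_unordered are made up for this example, and the nested key 'd' is added to the question's objects purely for illustration.

```python
def normalized(value):
    """Return a copy of value in which every list is sorted, at any depth."""
    if isinstance(value, dict):
        return {k: normalized(v) for k, v in value.items()}
    if isinstance(value, list):
        # Assumes the elements can be compared with each other.
        return sorted(normalized(v) for v in value)
    return value

def deep_equal_unordered(a, b):
    return normalized(a) == normalized(b)

obj1 = {'a': 1, 'b': 2, 'c': [1, 2], 'd': {'e': [3, 1, 2]}}
obj2 = {'b': 2, 'a': 1, 'c': [2, 1], 'd': {'e': [1, 2, 3]}}

print(deep_equal_unordered(obj1, obj2))      # True
print(deep_equal_unordered(obj1, {'a': 1}))  # False
```

Unlike the in-place normalize above, this version builds new objects and leaves its inputs untouched, at the cost of the extra allocations.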

Reaching into a nested dictionary several levels deep (that might not exist)

I have an API that I call that returns a dictionary. Part of that dictionary is itself another dictionary. In that inside dictionary, there are some keys that might not exist, or they might. Those keys could reference another dictionary.
To give an example, say I have the following dictionaries:
dict1 = {'a': {'b': {'c':{'d':3}}}}
dict2 = {'a': {'b': {'f': 2}}}
I would like to write a function that I can pass in the dictionary, and a list of keys that would lead me to the 3 in dict1, and the 2 in dict2. However, it is possible that b and c might not exist in dict1, and b and f might not exist in dict2.
I would like to have a function that I could call like this:
get_value(dict1, ['a', 'b', 'c', 'd'])
and that would return 3, or if the keys are not found, return a default value of 0.
I know that I can use something like this:
val = dict1.get('a', {}).get('b', {}).get('c', 0)
but that seems to be quite wordy to me.
I can also flatten the dict (see https://stackoverflow.com/a/6043835/1758023), but that can be a bit intensive since my dictionary is actually fairly large, and has about 5 levels of nesting in some keys. And, I only need to get two things from the dict.
Right now I am using the flattenDict function in the SO question, but that seems a bit of overkill for my situation.
You can use a recursive function:
def get_value(mydict, keys):
    if not keys:
        return mydict
    if keys[0] not in mydict:
        return 0
    return get_value(mydict[keys[0]], keys[1:])
If the values along the path can be not only missing but also of other, non-dict types, you can handle that like so:
def get_value(mydict, keys):
    if not keys:
        return mydict
    key = keys[0]
    try:
        newdict = mydict[key]
    except (TypeError, KeyError):
        return 0
    return get_value(newdict, keys[1:])
Without recursion, just iterate through the keys and go down one level at a time. Putting that inside a try/except allows you to handle the missing key case. KeyError will be raised when the key is not there, and TypeError will be raised if you hit the "bottom" of the dict too soon and try to apply the [] operator to an int or something.
def get_value(d, ks):
    for k in ks:
        try:
            d = d[k] # descend one level
        except (KeyError, TypeError):
            return 0 # when any lookup fails, return 0
    return d # return the final element
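The two failure modes described above (a missing key versus running off the bottom of the dict) can be exercised with the dicts from the question. In this sketch the default parameter is an addition for illustration and is not part of the answer's function.

```python
def get_value(d, ks, default=0):
    for k in ks:
        try:
            d = d[k]  # descend one level
        except (KeyError, TypeError):
            return default
    return d

dict1 = {'a': {'b': {'c': {'d': 3}}}}
dict2 = {'a': {'b': {'f': 2}}}

print(get_value(dict1, ['a', 'b', 'c', 'd']))       # 3
print(get_value(dict2, ['a', 'b', 'f']))            # 2
print(get_value(dict2, ['a', 'b', 'c']))            # 0: 'c' is missing (KeyError)
print(get_value(dict1, ['a', 'b', 'c', 'd', 'e']))  # 0: 3['e'] raises TypeError
print(get_value(dict2, ['a', 'x'], default=None))   # None
```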
Here is a recursive function that should work for general cases:
def recursive_get(d, k):
    if len(k) == 0:
        return 0
    elif len(k) == 1:
        return d.get(k[0], 0)
    else:
        value = d.get(k[0], 0)
        if isinstance(value, dict):
            return recursive_get(value, k[1:])
        else:
            return 0 # keys remain but value is not a dict, so the path does not exist
It takes the dict to search and a list of keys, which it checks one per level:
>>> dict1 = {'a': {'b': {'c':{'d':3}}}}
>>> recursive_get(dict1, ['a', 'b', 'c'])
{'d': 3}
>>> dict2 = {'a': {'b': {'f': 2}}}
>>> recursive_get(dict2, ['a', 'b', 'c'])
0

The best way to merge multi-nested dictionaries in Python 2.7

I have two nested dictionaries and I want to merge them into one (where second dict overrides first dict values). I saw a lot of beautiful solutions for merging "flat" (not nested) dictionaries, e.g.:
dict_result = dict1.copy()
dict_result.update(dict2)
or
dict_result = dict(dict1.items() + dict2.items())
or (my favorite one)
dict_result = dict(d1,**d2)
but couldn't find the most efficient way to merge multi-nested dicts.
I'm trying to avoid recursion. What do you suggest?
Unless the depth of the dictionaries to merge is strictly limited, there's no way to avoid recursion (1). Also, there's no builtin or library function to do this (that is, none that I know of), but it's actually not all that hard. Something like this should do:
def merge(d1, d2):
    for k in d2:
        if k in d1 and isinstance(d1[k], dict) and isinstance(d2[k], dict):
            merge(d1[k], d2[k])
        else:
            d1[k] = d2[k]
What this does: It iterates the keys in d2 and if the key can also be found in d1 and both are dictionaries, merge those sub-dictionaries, otherwise overwrite the value in d1 with that from d2. Note that this changes d1 and its sub-dictionaries in place, so you might want to deep-copy it before.
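Or use this version to create a merged copy:
def merge_copy(d1, d2):
    # Walk the union of the keys so that entries found only in d1 are kept too.
    return {k: merge_copy(d1[k], d2[k]) if k in d1 and k in d2 and isinstance(d1[k], dict) and isinstance(d2[k], dict)
            else d2[k] if k in d2 else d1[k]
            for k in set(d1) | set(d2)}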
Example:
>>> d1 = {"foo": {"bar": 23, "blub": 42}, "flub": 17}
>>> d2 = {"foo": {"bar": 100}, "flub": {"flub2": 10}, "more": {"stuff": 111}}
>>> merge(d1, d2)
>>> print d1
{'foo': {'bar': 100, 'blub': 42}, 'flub': {'flub2': 10}, 'more': {'stuff': 111}}
1) You can make it iterative, using a stack, but this will only make things more complicated and should only be done to avoid problems with maximum recursive depth.
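For reference, the stack-based variant mentioned in the footnote could look roughly like this. It is a sketch rather than a tuned implementation; the name merge_iterative is made up here, and like merge it changes d1 in place.

```python
def merge_iterative(d1, d2):
    # Each stack entry is a (destination, source) pair of dicts still to be merged.
    stack = [(d1, d2)]
    while stack:
        dst, src = stack.pop()
        for k, v in src.items():
            if isinstance(dst.get(k), dict) and isinstance(v, dict):
                stack.append((dst[k], v))  # merge these sub-dicts later
            else:
                dst[k] = v  # overwrite, or add a key that dst lacks
    return d1

d1 = {"foo": {"bar": 23, "blub": 42}, "flub": 17}
d2 = {"foo": {"bar": 100}, "flub": {"flub2": 10}, "more": {"stuff": 111}}
expected = {'foo': {'bar': 100, 'blub': 42}, 'flub': {'flub2': 10}, 'more': {'stuff': 111}}

merge_iterative(d1, d2)
print(d1 == expected)  # True
```

The explicit stack takes the place of the call stack, so nesting depth is limited by memory rather than by the interpreter's recursion limit.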
Here is a modified version of the merge_copy function above, for dicts that can be thought of as parent and child, where you want a new dict in which the parent takes on the child's values wherever the child defines them:
def merge_copy(child, parent):
    '''returns parent updated with child values if exists'''
    d = {}
    for k in parent:
        if k in child and isinstance(child[k], dict) and isinstance(parent[k], dict):
            v = merge_copy(child[k], parent[k])
        elif k in child:
            v = child[k]
        else:
            v = parent[k]
        d[k] = v
    return d
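A short usage example may make the parent/child behaviour clearer. The sample dicts are hypothetical, and the function is repeated so that the snippet runs on its own.

```python
def merge_copy(child, parent):
    '''returns parent updated with child values if exists'''
    d = {}
    for k in parent:
        if k in child and isinstance(child[k], dict) and isinstance(parent[k], dict):
            v = merge_copy(child[k], parent[k])
        elif k in child:
            v = child[k]
        else:
            v = parent[k]
        d[k] = v
    return d

# Hypothetical sample data.
parent = {'a': 1, 'b': {'x': 1, 'y': 2}}
child = {'b': {'x': 10}, 'c': 3}

merged = merge_copy(child, parent)
print(merged == {'a': 1, 'b': {'x': 10, 'y': 2}})  # True; 'c' exists only in child and is dropped
print(parent == {'a': 1, 'b': {'x': 1, 'y': 2}})   # True; the inputs are left unchanged
```

Since only the parent's keys are iterated, the parent defines the shape of the result and the child can only override values within it.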

python check multi-level dict key existence

Many SO posts show you how to efficiently check the existence of a key in a dictionary, e.g., Check if a given key already exists in a dictionary
How do I do this for a multi level key? For example, if d["a"]["b"] is a dict, how can I check if d["a"]["b"]["c"]["d"] exists without doing something horrendous like this:
if "a" in d and isinstance(d["a"], dict) and "b" in d["a"] and isinstance(d["a"]["b"], dict) and ...
Is there some syntax like
if "a"/"b"/"c"/"d" in d
What I am actually using this for: we have jsons, parsed into dicts using simplejson, that I need to extract values from. Some of these values are nested three and four levels deep; but sometimes the value doesn't exist at all. So I wanted something like:
val = None if not d["a"]["b"]["c"]["d"] else d["a"]["b"]["c"]["d"] #here d["a"]["b"] may not even exist
EDIT: prefer not to crash if some subkey exists but is not a dictionary, e.g, d["a"]["b"] = 5.
Sadly, there isn't any builtin syntax or a common library to query dictionaries like that.
However, I believe the simplest thing you can do (and I think it's efficient enough) is:
d.get("a", {}).get("b", {}).get("c")
Edit: It's not very common, but there is: https://github.com/akesterson/dpath-python
Edit 2: Examples:
>>> d = {"a": {"b": {}}}
>>> d.get("a", {}).get("b", {}).get("c")
>>> d = {"a": {}}
>>> d.get("a", {}).get("b", {}).get("c")
>>> d = {"a": {"b": {"c": 4}}}
>>> d.get("a", {}).get("b", {}).get("c")
4
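One caveat, given the question's EDIT: the chained .get() form fails when an intermediate value exists but is not a dict. The sketch below shows that failure and one possible guard; the helper names chained_get and safe_get are made up for illustration.

```python
def chained_get(d):
    return d.get("a", {}).get("b", {}).get("c")

print(chained_get({"a": {"b": {"c": 4}}}))  # 4
print(chained_get({"a": {}}))               # None

try:
    chained_get({"a": {"b": 5}})
except AttributeError as exc:
    print("failed:", exc)  # an int has no .get method

def safe_get(d, keys, default=None):
    # Stop as soon as the current value is not a dict or lacks the key.
    for key in keys:
        if not isinstance(d, dict) or key not in d:
            return default
        d = d[key]
    return d

print(safe_get({"a": {"b": 5}}, ["a", "b", "c"]))         # None
print(safe_get({"a": {"b": {"c": 4}}}, ["a", "b", "c"]))  # 4
```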
This probably isn't a good idea and I wouldn't recommend using it in prod. However, if you're just doing it for learning purposes then the below might work for you.
def rget(dct, keys, default=None):
    """
    >>> rget({'a': 1}, ['a'])
    1
    >>> rget({'a': {'b': 2}}, ['a', 'b'])
    2
    """
    key = keys.pop(0)  # note: this mutates the caller's list
    try:
        elem = dct[key]
    except KeyError:
        return default
    except TypeError:
        # you gotta handle non dict types here
        # beware of sequences when your keys are integers
        return default
    if not keys:
        return elem
    return rget(elem, keys, default)
UPDATE: I ended up writing my own open-source, pippable library that allows one to do this: https://pypi.python.org/pypi/dictsearch
A non-recursive version, quite similar to @Meitham's solution, which does not mutate the looked-for key. It returns True if the exact structure is present in the source dictionary, and False otherwise.
def subkey_in_dict(dct, subkey):
    """ Returns True if the given subkey is present within the structure of the source dictionary, False otherwise.
    The format of the subkey is parent_key:sub_key1:sub_sub_key2 (etc.) - description of the dict structure, where the
    character ":" is the delimiter.
    :param dct: the dictionary to be searched in.
    :param subkey: the target keys structure, which should be present.
    :returns Boolean: is the keys structure present in dct.
    :raises AttributeError: if subkey is not a string.
    """
    keys = subkey.split(':')
    work_dict = dct
    while keys:
        target = keys.pop(0)
        if isinstance(work_dict, dict):
            if target in work_dict:
                if not keys: # this is the last element in the input, and it is in the dict
                    return True
                else: # not the last element of subkey, change the temp var
                    work_dict = work_dict[target]
            else:
                return False
        else:
            return False
The structure that is checked is in the form parent_key:sub_key1:sub_sub_key2, where the : char is the delimiter. Obviously - it will match case-sensitively, and will stop (return False) if there's a list within the dictionary.
Sample usage:
dct = {'a': {'b': {'c': {'d': 123}}}}
print(subkey_in_dict(dct, 'a:b:c:d')) # prints True
print(subkey_in_dict(dct, 'a:b:c:d:e')) # False
print(subkey_in_dict(dct, 'a:b:d')) # False
print(subkey_in_dict(dct, 'a:b:c')) # True
This is what I usually use:
def key_in_dict(_dict: dict, key_lookup: str, separator='.'):
    """
    Searches for a nested key in a dictionary and returns its value, or None if nothing was found.
    key_lookup must be a string where each key is separated by a given "separator" character, which by default is a dot
    """
    keys = key_lookup.split(separator)
    subdict = _dict
    for k in keys:
        # Guard against non-dict intermediate values before looking up k.
        subdict = subdict[k] if isinstance(subdict, dict) and k in subdict else None
        if subdict is None:
            break
    return subdict
It returns the value if the key exists, or None if it doesn't:
key_in_dict({'test': {'test': 'found'}}, 'test.test') # 'found'
key_in_dict({'test': {'test': 'found'}}, 'test.not_a_key') # None

Recursive dictionary modification in python

What would be the easiest way to go about turning this dictionary:
{'item':{'w':{'c':1, 'd':2}, 'x':120, 'y':240, 'z':{'a':100, 'b':200}}}
into this one:
{'item':{'y':240, 'z':{'b':200}}}
given only that you need the vars y and b while maintaining the structure of the dictionary? The size or number of items or the depth of the dictionary should not matter, as the one I'm working with can be anywhere from 2 to 5 levels deep.
EDIT: I apologize for the typo earlier. To clarify, I am given an array of strings (e.g. ['y', 'b']) which I need to find in the dictionary, and I then need to keep ONLY 'y' and 'b', plus any other keys required to maintain the structure of the original dictionary; in this case, that would be 'z'.
A better example can be found here where I need Chipset Model, VRAM, and Resolution.
In regards to the comment, the input would be the above link as the starting dictionary along with an array of ['chipset model', 'vram', 'resolution'] as the keep list. It should return this:
{'Graphics/Displays':{'NVIDIA GeForce 7300 GT':{'Chipset Model':'NVIDIA GeForce 7300 GT', 'Displays':{'Resolution':'1440 x 900 @ 75 Hz'}, 'VRAM (Total)':'256 Mb'}}}
Assuming that the dictionary you want to assign to an element of a super-dictionary is foo, you could just do this:
my_dictionary['keys']['to']['subdict']=foo
Regarding your edit—where you need to eliminate all keys except those on a certain list—this function should do the trick:
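def drop_keys(recursive_dict,keep_list):
    # Copy the keys into a list first: deleting while iterating over a live keys() view raises RuntimeError on Python 3.
    key_list=list(recursive_dict.keys())
    for key in key_list:
        if(type(recursive_dict[key]) is dict):
            drop_keys(recursive_dict[key], keep_list)
        elif(key not in keep_list):
            del recursive_dict[key]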
Something like this?
d = {'item': {'w': {'c': 1, 'd': 2}, 'x': 120, 'y': 240, 'z': {'a': 100, 'b': 200}}}
l = ['y', 'z']
def do_dict(d, l):
    return {k: v for k, v in d['item'].items() if k in l}
Here's what I arrived at for a recursive solution, which ended up being similar to what @Dan posted:
def recursive_del(d,keep):
    for k in d.copy():
        if type(d[k]) == dict:
            recursive_del(d[k],keep)
            if len(d[k]) == 0: #all keys were deleted, clean up empty dict
                del d[k]
        elif k not in keep:
            del d[k]
demo:
>>> keepset = {'y','b'}
>>> a = {'item':{'w':{'c':1, 'd':2}, 'x':120, 'y':240, 'z':{'a':100, 'b':200}}}
>>> recursive_del(a,keepset)
>>> a
{'item': {'z': {'b': 200}, 'y': 240}}
The only thing I think he missed is that you will sometimes need to clean up dicts which had all their keys deleted; i.e. without that adjustment you would end up with a vestigial 'w':{} in your example output.
Using your second example I made something like this. It's not exactly pretty, but it should be easy to extend. If your tree starts to get big, you can define some sets of rules to parse the dict.
Each rule here is pretty much "what should I do when I'm in this state".
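If the original dictionary must stay intact, the same pruning can be written so that it returns a new dict instead of deleting in place. This is a sketch of that variant; the name pruned is made up for this example.

```python
def pruned(d, keep):
    """Return a copy of d holding only the leaves whose key is in keep."""
    result = {}
    for k, v in d.items():
        if isinstance(v, dict):
            sub = pruned(v, keep)
            if sub:  # skip branches in which nothing was kept
                result[k] = sub
        elif k in keep:
            result[k] = v
    return result

a = {'item': {'w': {'c': 1, 'd': 2}, 'x': 120, 'y': 240, 'z': {'a': 100, 'b': 200}}}
b = pruned(a, {'y', 'b'})

print(b == {'item': {'y': 240, 'z': {'b': 200}}})  # True
print('w' in a['item'])                            # True: the input is untouched
```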
def rule2(key, value):
    if key == 'VRAM (Total)':
        return (key, value)
    elif key == 'Chipset Model':
        return (key, value)
def rule1(key, value):
    if key == "Graphics/Displays":
        if isinstance(value, dict):
            return (key, recursive_checker(value, rule1))
        else:
            return (key, value)
    else:
        return (key, recursive_checker(value, rule2))
def recursive_checker(dat, rule):
    def inner(item):
        key = item[0]
        value = item[1]
        return rule(key, value)
    return dict(filter(lambda x: x is not None, map(inner, dat.items())))
# Important bits
print(recursive_checker(data, rule1))
In your case, since there are not many states, it isn't worth doing. But if you have multiple cards and don't necessarily know which keys should be traversed, only that you want certain keys from the tree, this method can be used to search the tree easily. It can be applied to many things.
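The keep list in the question (['chipset model', 'vram', 'resolution']) does not match the dictionary keys exactly, so a simpler alternative to hand-written rules is to match keys loosely. The sketch below assumes that a case-insensitive substring match is what is wanted and that all keys are strings. The name keep_matching is made up, and the sample input is invented to resemble the question's example (the 'Bus' and 'Mirror' entries are not from the original data).

```python
def keep_matching(d, wanted):
    """Return a pruned copy of d, keeping leaves whose key contains any wanted term."""
    wanted = [w.lower() for w in wanted]
    result = {}
    for key, value in d.items():
        if isinstance(value, dict):
            sub = keep_matching(value, wanted)
            if sub:  # drop branches that ended up empty
                result[key] = sub
        elif any(w in key.lower() for w in wanted):
            result[key] = value
    return result

# Made-up sample input shaped like the example in the question.
data = {'Graphics/Displays': {'NVIDIA GeForce 7300 GT': {
    'Chipset Model': 'NVIDIA GeForce 7300 GT',
    'Bus': 'PCIe',
    'VRAM (Total)': '256 Mb',
    'Displays': {'Resolution': '1440 x 900 @ 75 Hz', 'Mirror': 'Off'},
}}}

print(keep_matching(data, ['chipset model', 'vram', 'resolution']))
```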
