Compare two nested dictionary keys - python

I need to compare only the keys of two nested dictionaries. The primary use is in live tests of external API responses, to detect when the response structure changes.
For example, these two dictionaries match, even though their values differ:
EDIT: this is just a sample; the actual dictionaries have dynamic keys, are probably larger, and consist of integers, strings, and booleans.
dict1 = {"guid": {"id": {"addr": "foo", "creation_num": "4"}}}
dict2 = {"guid": {"id": {"addr": "bar", "creation_num": "2"}}}
I tried to do this by resetting the values of both dictionaries with this method:
def reset_values(dictionary, reset_value=0):
    for key, value in dictionary.items():
        if type(value) is dict:
            dictionary[key] = reset_values(dictionary[key], reset_value)
        else:
            dictionary[key] = reset_value
    return dictionary
This method works, but is there a more Pythonic and straightforward way?

EDIT
@Bheid is correct that by flattening the key list, my solution would get tripped up when two dictionaries have the same keys but at different nesting levels. An easy fix is to change these lines:
if isinstance(v, dict):
    klist.extend(get_keys(v))
to:
if isinstance(v, dict):
    klist.append(get_keys(v))
Same idea, but the edited version preserves nested key levels.
If I understand the problem you are trying to solve, it is to compare the keys of the two dictionaries (as well as the subkeys of nested dictionaries) irrespective of the associated values. If two dictionaries have the same keys (and subkeys), then they are the "same" for your purposes. If that problem statement is correct, then generating an in-order list of keys/sub-keys for one dictionary and comparing that list to the same for a second dictionary should be sufficient for your purposes:
dict1 = {"guid": {"id": {"addr": "foo", "creation_num": "4"}}}
dict2 = {"guid": {"id": {"addr": "bar", "creation_num": "2"}}}
def get_keys(d):
    klist = []
    for k, v in d.items():
        klist.append(k)
        if isinstance(v, dict):
            klist.extend(get_keys(v))
    return klist
print(get_keys(dict1) == get_keys(dict2))
Output:
True
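To illustrate the EDIT above: with append instead of extend, nested keys end up in sub-lists, so two dictionaries with the same keys at different nesting levels no longer compare equal (get_nested_keys is just an illustrative name for the edited variant):
def get_nested_keys(d):
    klist = []
    for k, v in d.items():
        klist.append(k)
        if isinstance(v, dict):
            klist.append(get_nested_keys(v))  # append, not extend
    return klist

flat = {"a": 1, "b": 2}
nested = {"a": {"b": 2}}
print(get_keys(flat) == get_keys(nested))                # True  -- flattened lists collide
print(get_nested_keys(flat) == get_nested_keys(nested))  # False -- nesting is preserved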

You can solve this by building a hash of each dictionary's keys. This solution also takes different key ordering into account.
import hashlib
def has_exact_keys(d: dict, z: dict) -> bool:
    return dict_keys_hash(d) == dict_keys_hash(z)

def dict_keys_hash(d: dict) -> str:
    key_hash = hashlib.md5()  # switch to a different hashing algorithm if you want
    for k in sorted(d.keys()):
        if isinstance(d[k], dict):
            key_hash.update(dict_keys_hash(d[k]).encode('utf-8'))  # recurse into nested dicts
        key_hash.update(k.encode('utf-8'))
    return key_hash.hexdigest()
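For example, with the question's sample data this reports a key match despite the differing values (a quick sanity check using the functions above):
dict1 = {"guid": {"id": {"addr": "foo", "creation_num": "4"}}}
dict2 = {"guid": {"id": {"addr": "bar", "creation_num": "2"}}}
print(has_exact_keys(dict1, dict2))  # True: same keys at every level
print(has_exact_keys(dict1, {"guid": {"id": {"addr": "baz"}}}))  # False: creation_num is missing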

Disclaimer: I am the author of the ndicts package recommended in this answer.
Answer
You can use my package ndicts to do that. To install it:
pip install ndicts
Then simply convert your dictionaries into NestedDict:
from ndicts import NestedDict
dict1 = {"guid": {"id": {"addr": "foo", "creation_num": "4"}}}
dict2 = {"guid": {"id": {"addr": "bar", "creation_num": "2"}}}
nd1 = NestedDict(dict1)
nd2 = NestedDict(dict2)
You can then check if the keys are equal:
>>> nd1.keys() == nd2.keys()
True
Here is what the keys of a NestedDict are:
>>> for k in nd1.keys():
...     print(k)
('guid', 'id', 'addr')
('guid', 'id', 'creation_num')
Comments on your solution
Your method works, but as a potentially unwanted side effect it modifies the input dictionary:
>>> dict1
{"guid": {"id": {"addr": "foo", "creation_num": "4"}}}
>>> reset_values(dict1)
{'guid': {'id': {'addr': 0, 'creation_num': 0}}}
>>> dict1
{'guid': {'id': {'addr': 0, 'creation_num': 0}}}
This can be easily fixed by deepcopying the input dictionary at the beginning of your function.
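A minimal sketch of that fix (reset_values_copy is a hypothetical wrapper name; it reuses your reset_values unchanged):
from copy import deepcopy

def reset_values_copy(dictionary, reset_value=0):
    # deep-copy first so the caller's dictionary is left untouched
    return reset_values(deepcopy(dictionary), reset_value)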
You are using recursion, which is fine. However, you can also use a for loop or reduce, which have some advantages; see the answers to "Best way to get nested dictionary items".
One last thing, I would aim at creating a function that compares keys directly or that returns all the keys in an iterable, rather than resetting the values of your dictionaries and then comparing them.
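For example, a minimal sketch of such a function (key_paths is a hypothetical name and only nested dicts are handled); it yields every key path as a tuple, much like the NestedDict keys shown above:
def key_paths(d, prefix=()):
    # yield each nested key as a tuple path, e.g. ('guid', 'id', 'addr')
    for k, v in d.items():
        if isinstance(v, dict):
            yield from key_paths(v, prefix + (k,))
        else:
            yield prefix + (k,)

dict1 = {"guid": {"id": {"addr": "foo", "creation_num": "4"}}}
dict2 = {"guid": {"id": {"addr": "bar", "creation_num": "2"}}}
print(set(key_paths(dict1)) == set(key_paths(dict2)))  # True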

Related

Remove JSON data pairs from nested structure [duplicate]

I had to remove some fields from a dictionary; the keys for those fields are in a list. So I wrote this function:
def delete_keys_from_dict(dict_del, lst_keys):
    """
    Delete the keys present in lst_keys from the dictionary.
    Loops recursively over nested dictionaries.
    """
    dict_foo = dict_del.copy()  # iterate over a copy to avoid "dictionary changed size during iteration" errors
    for field in dict_foo.keys():
        if field in lst_keys:
            del dict_del[field]
        if type(dict_foo[field]) == dict:
            delete_keys_from_dict(dict_del[field], lst_keys)
    return dict_del
This code works, but it's not very elegant and I'm sure that there is a better solution.
First off, I think your code is working and not inelegant. There's no immediate reason not to use the code you presented.
There are a few things that could be better though:
Comparing the type
Your code contains the line:
if type(dict_foo[field]) == dict:
That can be definitely improved. Generally (see also PEP8) you should use isinstance instead of comparing types:
if isinstance(dict_foo[field], dict)
However, that will also return True if dict_foo[field] is a subclass of dict. If you don't want that, you could also use is instead of ==. That will be marginally (and probably unnoticeably) faster.
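A quick illustration of the difference (MyDict is a hypothetical subclass used only for this example):
>>> class MyDict(dict):
...     pass
...
>>> isinstance(MyDict(), dict)
True
>>> type(MyDict()) is dict
False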
If you also want to allow arbitrary dict-like objects you could go a step further and test if it's a collections.abc.MutableMapping. That will be True for dict and dict subclasses and for all mutable mappings that explicitly implement that interface without subclassing dict, for example UserDict:
>>> from collections.abc import MutableMapping  # plain "collections" also worked before Python 3.10
>>> from collections import UserDict            # "from UserDict import UserDict" on Python 2
>>> isinstance(UserDict(), MutableMapping)
True
>>> isinstance(UserDict(), dict)
False
Inplace modification and return value
Typically functions either modify a data structure in place or return a new (modified) data structure. To mention a few examples: list.append, dict.clear and dict.update all modify the data structure in place and return None. That makes it easier to keep track of what a function does. This isn't a hard rule and there are always valid exceptions, but personally I don't think a function like this needs to be one, so I would simply remove the return dict_del line and let it implicitly return None. YMMV.
Removing the keys from the dictionary
You copied the dictionary to avoid problems when you remove key-value pairs during the iteration. However, as already mentioned by another answer, you could just iterate over the keys that should be removed and try to delete them:
for key in keys_to_remove:
    try:
        del dictionary[key]
    except KeyError:
        pass
That has the additional advantage that you don't need to nest two loops (which could be slower, especially if the list of keys that need to be removed is very long).
If you don't like empty except clauses you can also use contextlib.suppress (requires Python 3.4+):
from contextlib import suppress

for key in keys_to_remove:
    with suppress(KeyError):
        del dictionary[key]
Variable names
There are a few variables I would rename because they are just not descriptive or even misleading:
delete_keys_from_dict should probably mention the subdict-handling, maybe delete_keys_from_dict_recursive.
dict_del sounds like a deleted dict. I tend to prefer names like dictionary or dct because the function name already describes what is done to the dictionary.
lst_keys, same there. I'd probably use just keys there. If you want to be more specific something like keys_sequence would make more sense because it accepts any sequence (you just have to be able to iterate over it multiple times), not just lists.
dict_foo, just no...
field isn't really appropriate either, it's a key.
Putting it all together:
As I said before I personally would modify the dictionary in-place and not return the dictionary again. Because of that I present two solutions, one that modifies it in-place but doesn't return anything and one that creates a new dictionary with the keys removed.
The version that modifies in-place (very much like Ned Batchelder's solution):
from collections.abc import MutableMapping
from contextlib import suppress

def delete_keys_from_dict(dictionary, keys):
    for key in keys:
        with suppress(KeyError):
            del dictionary[key]
    for value in dictionary.values():
        if isinstance(value, MutableMapping):
            delete_keys_from_dict(value, keys)
And the solution that returns a new object:
from collections.abc import MutableMapping

def delete_keys_from_dict(dictionary, keys):
    keys_set = set(keys)  # just an optimization for the "if key in keys" lookup
    modified_dict = {}
    for key, value in dictionary.items():
        if key not in keys_set:
            if isinstance(value, MutableMapping):
                modified_dict[key] = delete_keys_from_dict(value, keys_set)
            else:
                modified_dict[key] = value  # or copy.deepcopy(value) if a copy is desired for non-dicts
    return modified_dict
However, it only makes copies of the dictionaries; the other values are not returned as copies. You could easily wrap those in copy.deepcopy (I put a comment in the appropriate place in the code) if you want that.
def delete_keys_from_dict(dict_del, lst_keys):
    for k in lst_keys:
        try:
            del dict_del[k]
        except KeyError:
            pass
    for v in dict_del.values():
        if isinstance(v, dict):
            delete_keys_from_dict(v, lst_keys)
    return dict_del
Since the question requested an elegant way, I'll submit my general-purpose solution to wrangling nested structures. First, install the boltons utility package with pip install boltons, then:
from boltons.iterutils import remap
data = {'one': 'remains', 'this': 'goes', 'of': 'course'}
bad_keys = set(['this', 'is', 'a', 'list', 'of', 'keys'])
drop_keys = lambda path, key, value: key not in bad_keys
clean = remap(data, visit=drop_keys)
print(clean)
# Output:
{'one': 'remains'}
In short, the remap utility is a full-featured, yet succinct approach to handling real-world data structures which are often nested, and can even contain cycles and special containers.
This page has many more examples, including ones working with much larger objects from Github's API.
It's pure Python, so it works everywhere, and it is fully tested on Python 2.7 and 3.3+. Best of all, I wrote it for exactly this kind of case, so if you find a case it doesn't handle, you can bug me to fix it right here.
def delete_keys_from_dict(d, to_delete):
    if isinstance(to_delete, str):
        to_delete = [to_delete]
    if isinstance(d, dict):
        for single_to_delete in set(to_delete):
            if single_to_delete in d:
                del d[single_to_delete]
        for k, v in d.items():
            delete_keys_from_dict(v, to_delete)
    elif isinstance(d, list):
        for i in d:
            delete_keys_from_dict(i, to_delete)

d = {'a': 10, 'b': [{'c': 10, 'd': 10, 'a': 10}, {'a': 10}], 'c': 1}
delete_keys_from_dict(d, ['a', 'c'])  # in-place deletion
print(d)
# {'b': [{'d': 10}, {}]}
This solution works for both dicts and lists inside a given nested dict. The to_delete input can be a list of keys (str) to delete, or a single str.
Please note that if you remove the only key in a dict, you will be left with an empty dict.
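If those leftover empty dicts are unwanted, a hedged follow-up sketch (prune_empty is a hypothetical helper, not part of the answer above) can strip them in a single extra pass:
def prune_empty(d):
    # return a copy of d with empty nested dicts removed (one pass only)
    if isinstance(d, dict):
        return {k: prune_empty(v) for k, v in d.items() if v != {}}
    if isinstance(d, list):
        return [prune_empty(i) for i in d if i != {}]
    return d

print(prune_empty({'b': [{'d': 10}, {}]}))  # {'b': [{'d': 10}]}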
I think the following is more elegant:
def delete_keys_from_dict(dict_del, lst_keys):
    if not isinstance(dict_del, dict):
        return dict_del
    return {
        key: value
        for key, value in (
            (key, delete_keys_from_dict(value, lst_keys))
            for key, value in dict_del.items()
        )
        if key not in lst_keys
    }
Example usage:
test_dict_in = {
    1: {1: {0: 2, 3: 4}},
    0: {2: 3},
    2: {5: {0: 4}, 6: {7: 8}},
}
test_dict_out = {
    1: {1: {3: 4}},
    2: {5: {}, 6: {7: 8}},
}
assert delete_keys_from_dict(test_dict_in, [0]) == test_dict_out
Since you already need to loop through every element in the dict, I'd stick with a single loop and just make sure to use a set for looking up the keys to delete:
def delete_keys_from_dict(dict_del, the_keys):
    """
    Delete the keys present in the_keys from the dictionary.
    Loops recursively over nested dictionaries.
    """
    # make sure the_keys is a set to get O(1) lookups
    if type(the_keys) is not set:
        the_keys = set(the_keys)
    # iterate over a snapshot of the items so keys can be deleted while looping
    for k, v in list(dict_del.items()):
        if k in the_keys:
            del dict_del[k]
        if isinstance(v, dict):
            delete_keys_from_dict(v, the_keys)
    return dict_del
This works with dicts containing iterables (lists, ...) that may in turn contain dicts. Python 3 only; for Python 2, unicode should also be excluded from the iteration. There may also be some iterables I'm not aware of that don't work (i.e. that would lead to infinite recursion).
from collections.abc import Iterable
def deep_omit(d, keys):
    if isinstance(d, dict):
        for k in keys:
            d.pop(k, None)
        for v in d.values():
            deep_omit(v, keys)
    elif isinstance(d, Iterable) and not isinstance(d, str):
        for e in d:
            deep_omit(e, keys)
    return d
Since nobody posted an iterative version, here is one that could be useful for someone:
def delete_key_from_dict(adict, key):
    stack = [adict]
    while stack:
        elem = stack.pop()
        if isinstance(elem, dict):
            if key in elem:
                del elem[key]
            for k in elem:
                stack.append(elem[k])
This version is probably what you would push to production. The recursive version is elegant and easy to write but it scales badly (by default Python uses a maximum recursion depth of 1000).
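A quick usage sketch with question-style data (illustrative only):
d = {'guid': {'id': {'addr': 'foo', 'creation_num': '4'}}}
delete_key_from_dict(d, 'creation_num')  # modifies d in place
print(d)
# {'guid': {'id': {'addr': 'foo'}}}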
If you have nested keys as well, here is an elegant solution based on @John La Rooy's answer:
from boltons.iterutils import remap

def sof_solution():
    data = {"user": {"name": "test", "pwd": "******"}, "accounts": ["1", "2"]}
    sensitive = {"user.pwd", "accounts"}
    clean = remap(
        data,
        visit=lambda path, key, value: drop_keys(path, key, value, sensitive)
    )
    print(clean)

def drop_keys(path, key, value, sensitive):
    if len(path) > 0:
        nested_key = f"{'.'.join(path)}.{key}"
        return nested_key not in sensitive
    return key not in sensitive

sof_solution()  # prints {'user': {'name': 'test'}}
Using the awesome code from this post and adding a small statement:
def remove_fields(self, d, list_of_keys_to_remove):
    if not isinstance(d, (dict, list)):
        return d
    if isinstance(d, list):
        return [v for v in (self.remove_fields(v, list_of_keys_to_remove) for v in d) if v]
    return {k: v for k, v in ((k, self.remove_fields(v, list_of_keys_to_remove)) for k, v in d.items()) if k not in list_of_keys_to_remove}
I came here searching for a solution to remove keys from deeply nested Python 3 dicts, and all the solutions seem somewhat complex.
Here's a one-liner for removing a key from nested or flat dicts:
>>> nested_dict = {
...     "foo": {
...         "bar": {
...             "foobar": {},
...             "shmoobar": {}
...         }
...     }
... }
>>> nested_dict
{'foo': {'bar': {'foobar': {}, 'shmoobar': {}}}}
>>> nested_dict.get("foo", {}).get("bar", {}).pop("shmoobar", None)
{}
>>> nested_dict
{'foo': {'bar': {'foobar': {}}}}
I used .get() so that I don't get a KeyError, providing an empty dict as the default value all the way down the chain. The final pop() gets None as its default, again to avoid a KeyError.

How to efficiently perform a dictionary merge?

For a problem I am solving, I have a list of dictionaries. The problem involves multiple queries of the form merge(a, b, c). Merging means that, in the result, the counts for common keys are added/subtracted and uncommon keys (and their values) are carried over as is.
I am currently using Python's collection.Counter to represent the dictionaries and perform the merging as follows:
def merge(a, b, c):
    counter_a, counter_b, counter_c = DICTLIST[a], DICTLIST[b], DICTLIST[c]
    total = counter_a + counter_b - counter_c  # type: collections.Counter
    return total
Although this is a convenient solution, the problem can involve up to 10**5 such queries, and at that scale this approach is too slow. Is there a better approach to solving this?
NOTE: Pre-computation of the merge queries is not practical as the number of possible inputs is very large.
Example:
DICTLIST[a] = Counter({1:5,2:10})
DICTLIST[b] = Counter({2:10,3:20})
DICTLIST[c] = Counter({1:2})
merge(a,b,c) # Expected Output: {1:3, 2:20, 3:20}
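For comparison, a minimal sketch of the same merge done with plain dict operations (names follow the question; this is an illustration, not a benchmark). Counter arithmetic builds intermediate Counter objects and drops non-positive counts, whereas a manual loop updates a single result dict; filter out non-positive counts at the end if you need Counter semantics:
def merge_plain(a, b, c):
    result = dict(DICTLIST[a])  # start from a copy of the first counter
    for k, v in DICTLIST[b].items():
        result[k] = result.get(k, 0) + v
    for k, v in DICTLIST[c].items():
        result[k] = result.get(k, 0) - v
    return result  # {1: 3, 2: 20, 3: 20} for the example above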
Try this -
def mergeDict(dict1, dict2):
    dict3 = {**dict1, **dict2}
    for key, value in dict3.items():
        if key in dict1 and key in dict2:
            dict3[key] = value + dict1[key]
    return dict3
Then you can call it like this -
# Create first dictionary
dict1 = {1:5,2:10}
# Create second dictionary
dict2 = {2:10,3:20}
# Create third dictionary
dict3 = {1:-2}
dict4 = mergeDict(dict3, mergeDict(dict1, dict2))
Please note I have "-2" in the third dict for the subtraction logic.
You can use dict unpacking (**) here:
x={1:5,2:10}
y={2:10,3:20}
z={**x, **y}
If you further want to optimize performance, since there are multiple queries you should consider caching the merged results in a dictionary, because a lookup is faster than recomputing the merge.
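A hedged sketch of that caching idea using functools.lru_cache, assuming the query arguments a, b, c are hashable indices into DICTLIST, DICTLIST does not change between queries, and the same (a, b, c) combinations actually repeat:
from functools import lru_cache

@lru_cache(maxsize=None)
def merge_cached(a, b, c):
    # repeated queries with the same (a, b, c) are answered from the cache
    return DICTLIST[a] + DICTLIST[b] - DICTLIST[c]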
My first instinct is to look for something like the JavaScript "spread" operator for Python:
https://mlpipes.com/object-spread-operator-python/
Example here:
old_dict = {'hello': 'world', 'foo': 'bar'}
new_dict = {**old_dict, 'foo': 'baz'}
For your code, you should try something like:
DICTLIST[d] = {**a,**b,**c}

How to reference a combination of dict() and list() in Python3?

This is an example of a complex data structure. The depth of the structure is not fixed. To reference a specific datum in the structure I need an unknown number of indices (for list()) and keys (for dict()).
>>> x = [{'child': [{'text': 'ass'}, {'group': 'wef'}]}]
>>> x[0]['child'][0]['text']
'ass'
Now I want to have single keys for the values, like this:
keys = {'ID01': [0]['child'][0]['text'],
        'ID02': [1]['group']}
But this is not possible. Is there another pythonic way?
I think you need a couple of things here. First is a custom lookup function:
def lookup(obj, keys):
    for k in keys:
        obj = obj[k]
    return obj
Then a dictionary of keys to key list tuples:
keys = {'ID01': (0, 'child', 0, 'text'),
        'ID02': (1, 'group')}
then you can do this:
lookup(x, keys['ID01']) # returns 'ass'
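Since lookup is essentially a left fold over the key path, an alternative sketch (same behaviour, same assumptions) is functools.reduce with operator.getitem:
from functools import reduce
from operator import getitem

def lookup(obj, keys):
    # successively index into obj with each key/index along the path
    return reduce(getitem, keys, obj)

x = [{'child': [{'text': 'ass'}, {'group': 'wef'}]}]
print(lookup(x, (0, 'child', 0, 'text')))  # 'ass'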

Get a list of a dictionary's key names

How do I make a function that takes a dictionary as input and outputs the names of its keys? So like:
input_dictionary = {"foo": 1, "bar": 2}
names = get_names(input_dictionary) # returns ["foo", "bar"]
You should read the Python documentation for dictionaries (see here for Python 3.5). There are multiple ways that this can be done.
You could use:
input_dictionary.keys(), obviously the easiest solution:
def get_names(input_dictionary):
    return input_dictionary.keys()
input_dictionary.iterkeys() (Python 2 only), to get an iterator over the dictionary's keys, which you can then iterate over to build a list of them:
def get_names(input_dictionary):
    list_of_keys = []
    for key in input_dictionary.iterkeys():
        list_of_keys.append(key)
    return list_of_keys
input_dictionary.iteritems() (Python 2 only), which returns an iterator over the dictionary's (key, value) pairs, from which you can extract the keys:
def get_names(input_dictionary):
    list_of_keys = []
    for item in input_dictionary.iteritems():
        list_of_keys.append(item[0])
    return list_of_keys
input_dictionary.popitem(), which pops (removes) and returns an arbitrary (key, value) pair from your dictionary, from which you can extract the key. You probably don't want this one, though, since it empties your dictionary as you go.
And finally, input_dictionary.viewitems() or input_dictionary.viewkeys() (Python 2 only; in Python 3, items() and keys() already return views) to get a view of the (key, value) pairs or keys, respectively, for your dictionary. Any time the dictionary changes, the view object reflects that change.
input_dictionary = {"foo": 1, "bar": 2}
input_dictionary.keys() # ["foo", "bar"]
Using keys():
>>> input_dictionary = {"foo": 1, "bar": 2}
>>> print input_dictionary.keys()
['foo', 'bar']
So a function would be:
def dictkeys(mydictionary):
    return mydictionary.keys()
Output:
>>> dictkeys(input_dictionary)
['foo', 'bar']
You don't really need a function for this, though, because it's the same as simply using dictionaryname.keys().
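Note that the print examples above use Python 2, where keys() returns a plain list. On Python 3, keys() returns a view object, so wrap it in list() if you need an actual list:
>>> input_dictionary = {"foo": 1, "bar": 2}
>>> input_dictionary.keys()
dict_keys(['foo', 'bar'])
>>> list(input_dictionary.keys())
['foo', 'bar']
>>> list(input_dictionary)  # iterating a dict yields its keys
['foo', 'bar']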

Using dict.fromkeys(), assign each value to an empty dictionary

I've hit a bit of a problem with creating empty dictionaries within dictionaries while using fromkeys(); they all link to the same one.
Here's a quick bit of code to demonstrate what I mean:
a = dict.fromkeys(range(3), {})
for key in a:
    a[key][0] = key
The output I'd want is a[0][0]=0, a[1][0]=1, a[2][0]=2, yet they all equal 2, since it's editing the same dictionary three times.
If I were to define the dictionary like a = {0: {}, 1: {}, 2: {}}, it works, but that's not very practical if you need to build it from a bigger list.
With fromkeys, I've tried {}, dict(), dict.copy() and b = {}; b.copy(). How would I go about doing this?
The problem is that {} is a single value to fromkeys, and not a factory. Therefore you get the single mutable dict, not individual copies of it.
defaultdict is one way to create a dict that has a builtin factory.
from collections import defaultdict as dd
from pprint import pprint as pp
a = dd(dict)
for key in range(3):
    a[key][0] = key
pp(a)
If you want something more strictly evaluated, you will need to use a dict comprehension or map.
a = {key: {} for key in range(3)}
But then, if you're going to do that, you may as well get it all done in one step:
a = {key: {0: key} for key in range(3)}
Just iterate over keys and insert a dict for each key:
{k: {0: k} for k in keys}
Here, keys is an iterable of hashable values such as range(3) in your example.
