Extract all keys in nested JSON data with arbitrary depth

I have a JSON document that is little more than an unordered dump: a mix of dicts, lists and unicode values nested at depths ranging from 1 to 10. Here is a rough, simplified example of what I'm dealing with:
{'name': 'TheDude',
 'age': '19',
 'hobbies': {
     'love': 'eating',
     'hate': 'reading',
     'like': [
         {'outdoor': {
             'teamsport': 'soccer',
             }
         }
     ]
 }
}
I want the following output (based on the above):
[name, age, hobbies_love, hobbies_hate, hobbies_like_outdoor_teamsport]
I tried the following code:
def printinoice(dictionary,arr):
    for k, v in dictionary.iteritems():
        arr.append(k)
        if isinstance(v, dict):
            for result in printinoice(v,arr):
                arr.append(result)
        elif isinstance(v, list):
            for d in v:
                for result in printinoice(d,arr):
                    arr.append(result)
    return arr
based on this but no luck so far. Anyone have a good idea on how to make it work?

The following recursive function will work:
def deep_keys(dct):
    if not isinstance(dct, (dict, list)):
        return ['']
    if isinstance(dct, list):
        return [dk for x in dct for dk in deep_keys(x)]
    return [k+('_'+dk if dk else '') for k, v in dct.items() for dk in deep_keys(v)]
>>> deep_keys(d)
['name', 'age', 'hobbies_love', 'hobbies_hate', 'hobbies_like_outdoor_teamsport']
It is easiest not to assume a given type for the function argument, so that you can pass any nested value (list elements and dict values alike) down the recursion.
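The same idea can also be written as a generator that carries the key prefix down the recursion instead of joining strings on the way back up. This is only a sketch: the name iter_key_paths and its prefix and sep parameters are hypothetical, and it assumes Python 3.

```python
def iter_key_paths(obj, prefix="", sep="_"):
    """Yield one joined key path per leaf value in a nested dict/list structure."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            # Extend the path with this key; the top level has no prefix yet
            path = f"{prefix}{sep}{key}" if prefix else str(key)
            yield from iter_key_paths(value, path, sep)
    elif isinstance(obj, list):
        # Lists contribute no key of their own; pass the prefix through
        for item in obj:
            yield from iter_key_paths(item, prefix, sep)
    else:
        # A leaf value: the accumulated prefix is the full key path
        yield prefix


d = {
    'name': 'TheDude',
    'age': '19',
    'hobbies': {
        'love': 'eating',
        'hate': 'reading',
        'like': [{'outdoor': {'teamsport': 'soccer'}}],
    },
}
print(list(iter_key_paths(d)))
```

Like deep_keys, this emits nothing for an empty dict or list, so a key whose value is an empty container is dropped.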

Related

Transform dictionary to map values to list of keys

For example I have a dictionary like this:
my_dict = {
'name_1': 'method_name_x',
'name_2': 'method_name_x',
'name_3': 'method_name_y',
}
(keys and values of the dictionary are simply strings)
I want to transform this dictionary so that each value is mapped to a list of the keys that have that value.
Example result:
my_transformed_dict = {
'method_name_x': ['name_1', 'name_2'],
'method_name_y': ['name_3'],
}
I could do this by the following code:
my_transformed_dict = dict.fromkeys(my_dict.values(), [])
for k, v in my_dict.items():
    my_transformed_dict[v].append(k)
But this ends up adding every key to every value somehow.
I also thought of using dict.setdefault(), like this:
my_transformed_dict = dict()
for k, v in my_dict.items():
    my_transformed_dict.setdefault(v, []).append(k)
This works as intended, but:
What would be best practice to solve this?
Is there a simpler way to solve this (maybe using a library)? Or just doing the code as a readable one-liner?

You can use itertools.groupby, for example:
from itertools import groupby
my_dict = {
'name_1': 'method_name_x',
'name_2': 'method_name_x',
'name_3': 'method_name_y',
}
print({k: list(v) for k, v in groupby(sorted(my_dict, key=lambda k: my_dict[k]), key=lambda k: my_dict[k])})
Output:
{'method_name_x': ['name_1', 'name_2'], 'method_name_y': ['name_3']}
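A plain loop over collections.defaultdict is a common alternative that needs no sorting. The sketch below also shows why the dict.fromkeys(..., []) attempt from the question misbehaves: every key ends up sharing one and the same list object. The variable names shared and grouped are illustrative only.

```python
from collections import defaultdict

my_dict = {
    'name_1': 'method_name_x',
    'name_2': 'method_name_x',
    'name_3': 'method_name_y',
}

# dict.fromkeys reuses the *same* list object as the value for every key,
# so an append through one key is visible through all of them.
shared = dict.fromkeys(my_dict.values(), [])
shared['method_name_x'].append('name_1')
print(shared['method_name_y'])

# defaultdict(list) creates a fresh list the first time each key is seen.
grouped = defaultdict(list)
for name, method in my_dict.items():
    grouped[method].append(name)

my_transformed_dict = dict(grouped)
print(my_transformed_dict)
```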

filter list of dictionaries based on a particular value of a key in that dictionary

I have a list of dictionaries as follows
dict = {2308:[{'name':'john'},{'age':'24'},{'employed':'yes'}],3452:[{'name':'sam'},{'age':'45'},{'employed':'yes'}],1234:[{'name':'victor'},{'age':'72'},{'employed':'no'}]}
I want to filter the above dictionary into a new dictionary named new_dict that contains only the entries whose age is > 30.
I tried the following. As I am new to programming, I could not work out the logic.
new_dict =[var for var in dict if dict['age']>30]
But I know it is a list of dictionaries, so is there any way I can get a new dictionary containing only the entries with age > 30?

You can use the following dict comprehension (assuming you store the dictionary in variable d rather than dict, which would shadow the built-in dict class):
{k: v for k, v in d.items() if any(int(s.get('age', 0)) > 30 for s in v)}
This returns:
{3452: [{'name': 'sam'}, {'age': '45'}, {'employed': 'yes'}], 1234: [{'name': 'victor'}, {'age': '72'}, {'employed': 'no'}]}

Solution
people = {
    2308:[
        {'name':'john'},{'age':'24'},{'employed':'yes'}
    ],
    3452:[
        {'name':'sam'},{'age':'45'},{'employed':'yes'}
    ],
    1234:[
        {'name':'victor'},{'age':'72'},{'employed':'no'}
    ]
}
people_over_30 = {}
for k, v in people.items():
    for i in range(len(v)):
        if int(v[i].get('age', 0)) > 30:
            people_over_30[k] = [v]
print(people_over_30)
Output
(xenial)vash@localhost:~/python$ python3.7 quote.py
{3452: [[{'name': 'sam'}, {'age': '45'}, {'employed': 'yes'}]], 1234: [[{'name': 'victor'}, {'age': '72'}, {'employed': 'no'}]]}
Comments
Unless this is your desired structure for this particular code, I would suggest reformatting your dictionary to look like this:
people = {
    2308:
        {'name':'john','age':'24','employed':'yes'}
    ,
    3452:
        {'name':'sam','age':'45','employed':'yes'}
    ,
    1234:
        {'name':'victor','age':'72','employed':'no'}
}
Then the filtering becomes much simpler, because each person is a single dictionary:
for k, v in people.items():
    if int(v.get('age', 0)) > 30:
        people_over_30[k] = [v]
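If the data arrives in the original list-of-single-key-dicts shape, it can be merged into the flatter shape first and filtered afterwards. This is a minimal sketch; flat_people is a hypothetical name, and it assumes no key is repeated within one person's list (a repeated key would be silently overwritten by the later one).

```python
people = {
    2308: [{'name': 'john'}, {'age': '24'}, {'employed': 'yes'}],
    3452: [{'name': 'sam'}, {'age': '45'}, {'employed': 'yes'}],
    1234: [{'name': 'victor'}, {'age': '72'}, {'employed': 'no'}],
}

# Merge each person's list of one-key dicts into a single dict
flat_people = {
    pid: {key: value for part in parts for key, value in part.items()}
    for pid, parts in people.items()
}

# Ages are stored as strings, so convert before comparing
people_over_30 = {
    pid: info for pid, info in flat_people.items()
    if int(info.get('age', 0)) > 30
}
print(people_over_30)
```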

Python3 sorting lists of nested dicts with lambda

I have a deeply nested object of various lists and dicts that I retrieve as JSON and need to compare to another version of itself. The issue is that all lists are basically unsorted, so I need to sort them before comparing. Every deep diff library I've tried failed unless the dicts inside the lists were put into a consistent order first, so here we go.
Sample object that requires sorting:
{
    "main":{
        "key1":"value1",
        "key2":"value2",
        "key3":[{
            "sub1":"value2",
            "sub2":{
                "subsub":[{
                    "subsubsub1":10,
                    "subsubsub2":11,
                    "subsubsub3":[10,11,12]
                },{
                    "subsubsub1":7,
                    "subsubsub2":8,
                    "subsubsub3":[9,7,8]
                }]
            }
        },{
            "sub1":"value1",
            "sub2":{
                "subsub":[{
                    "subsubsub1":1,
                    "subsubsub2":2,
                    "subsubsub3":[1,2,3]
                },
                {
                    "subsubsub1":4,
                    "subsubsub2":5,
                    "subsubsub3":[5,6,4]
                }]
            }
        }]
    }
}
Besides a few recursive loops I'm trying to sort the dicts by translating them with sorted lists into sorted tuples and hash them.
Edit:
The object is passed into unnest()
def unnest(d):
    for k, v in d.items():
        if isinstance(v, dict):
            d.update({k: unnest(v)})
        elif isinstance(v, list):
            d.update({k: unsort(v)})
    return d
def unsort(l):
    for i, e in enumerate(l):
        if isinstance(e, dict):
            l[i] = unnest(e)
        elif isinstance(e, list):
            l[i] = unsort(e)
    return sorted(l, key=lambda i: sort_hash(i))
def unnest_hash(d):
    for k, v in d.items():
        if isinstance(v, dict):
            d.update({k: unnest_hash(v)})
        elif isinstance(v, list):
            d.update({k: sort_hash(v)})
    return hash(tuple(sorted(d.items())))
def sort_hash(l):
    if isinstance(l, list):
        for i, e in enumerate(l):
            if isinstance(e, dict):
                l[i] = unnest_hash(e)
            elif isinstance(e, list):
                l[i] = sort_hash(e)
        return hash(tuple(sorted(l)))
    elif isinstance(l, dict):
        return unnest_hash(l)
    else:
        return hash(l)
However for some reason the hash value gets written into the "sorted" list:
{'main': {'key1': 'value1', 'key2': 'value2', 'key3': [{'sub1': 'value2', 'sub2': -4046234112924644199}, {'sub1': 'value1', 'sub2': 4015568797712784641}]}}
How can I prevent the sort value in the lambda function to be written into the returned sorted list?
Thanks!

Your sort_hash function is mutating the value passed into it. That's why you see the hashes in the original values after the sort:
l[i] = unnest_hash(e)
and
l[i] = sort_hash(e)
both modify the value you are trying to hash. unnest_hash also modifies the original values:
d.update({k: unnest_hash(v)})
A hash calculation for sorting must never modify the value it is hashing.
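One way to follow that rule is to build a new, sorted copy and compute the sort key from the copy without writing anything back. The sketch below is not the asker's code: sorted_deep is a hypothetical helper, and using json.dumps(..., sort_keys=True) as the sort key assumes the data is JSON-compatible (string keys, JSON value types).

```python
import json


def sorted_deep(obj):
    """Return a new structure with every list sorted; obj itself is left unchanged."""
    if isinstance(obj, dict):
        return {k: sorted_deep(v) for k, v in obj.items()}
    if isinstance(obj, list):
        # Normalise the children first, then order them by a canonical string form.
        # The key function only reads its argument, it never modifies it.
        return sorted(
            (sorted_deep(x) for x in obj),
            key=lambda x: json.dumps(x, sort_keys=True),
        )
    return obj


old = {"k": [{"b": [3, 1, 2]}, {"a": 1}]}
new = {"k": [{"a": 1}, {"b": [2, 3, 1]}]}

print(sorted_deep(old) == sorted_deep(new))
print(old)  # the input still has its original ordering
```

The ordering produced by the string key is arbitrary but consistent, which is all that is needed before handing both versions to a diff.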

Efficient way to remove keys with empty strings from a dict

I have a dict and would like to remove all the keys for which there are empty value strings.
metadata = {u'Composite:PreviewImage': u'(Binary data 101973 bytes)',
u'EXIF:CFAPattern2': u''}
What is the best way to do this?

Python 2.X
dict((k, v) for k, v in metadata.iteritems() if v)
Python 2.7 - 3.X
{k: v for k, v in metadata.items() if v}
Note that all of your keys have values. It's just that some of those values are the empty string. There's no such thing as a key in a dict without a value; if it didn't have a value, it wouldn't be in the dict.

It can get even shorter than BrenBarn's solution (and more readable I think)
{k: v for k, v in metadata.items() if v}
Tested with Python 2.7.3.

If you really need to modify the original dictionary:
empty_keys = [k for k,v in metadata.iteritems() if not v]
for k in empty_keys:
    del metadata[k]
Note that we have to make a list of the empty keys because we can't modify a dictionary while iterating through it (as you may have noticed). This is less expensive (memory-wise) than creating a brand-new dictionary, though, unless there are a lot of entries with empty values.

If you want a full-featured, yet succinct approach to handling real-world data structures which are often nested, and can even contain cycles, I recommend looking at the remap utility from the boltons utility package.
After pip install boltons or copying iterutils.py into your project, just do:
from boltons.iterutils import remap
drop_falsey = lambda path, key, value: bool(value)
clean = remap(metadata, visit=drop_falsey)
This page has many more examples, including ones working with much larger objects from Github's API.
It's pure-Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly cases like this, so if you find a case it doesn't handle, you can bug me to fix it right here.

Based on Ryan's solution, if you also have lists and nested dictionaries:
For Python 2:
def remove_empty_from_dict(d):
    if type(d) is dict:
        return dict((k, remove_empty_from_dict(v)) for k, v in d.iteritems() if v and remove_empty_from_dict(v))
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if v and remove_empty_from_dict(v)]
    else:
        return d
For Python 3:
def remove_empty_from_dict(d):
    if type(d) is dict:
        return dict((k, remove_empty_from_dict(v)) for k, v in d.items() if v and remove_empty_from_dict(v))
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if v and remove_empty_from_dict(v)]
    else:
        return d
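The versions above call the function twice for every value, once in the condition and once for the result. A variant that cleans each child only once might look like the sketch below (Python 3; remove_empty is a hypothetical name). Like the code above, it drops every falsy value, including 0 and False.

```python
def remove_empty(value):
    """Recursively drop falsy values from nested dicts and lists, cleaning each child once."""
    if isinstance(value, dict):
        # Clean the children first, then keep only the ones that are still truthy
        cleaned = {k: remove_empty(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v}
    if isinstance(value, list):
        return [v for v in (remove_empty(x) for x in value) if v]
    return value


data = {"a": "", "b": {"c": "", "d": [{}, {"e": ""}]}, "f": "x", "g": [1, ""]}
print(remove_empty(data))
```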

BrenBarn's solution is ideal (and pythonic, I might add). Here is another (fp) solution, however:
from operator import itemgetter
dict(filter(itemgetter(1), metadata.items()))

If you have a nested dictionary, and you want this to work even for empty sub-elements, you can use a recursive variant of BrenBarn's suggestion:
def scrub_dict(d):
    if type(d) is dict:
        return dict((k, scrub_dict(v)) for k, v in d.iteritems() if v and scrub_dict(v))
    else:
        return d

For python 3
dict((k, v) for k, v in metadata.items() if v)

Quick Answer (TL;DR)
Example01
### example01 -------------------
mydict = { "alpha":0,
"bravo":"0",
"charlie":"three",
"delta":[],
"echo":False,
"foxy":"False",
"golf":"",
"hotel":" ",
}
newdict = dict([(vkey, vdata) for vkey, vdata in mydict.iteritems() if(vdata) ])
print newdict
### result01 -------------------
result01 ='''
{'foxy': 'False', 'charlie': 'three', 'bravo': '0'}
'''
Detailed Answer
Problem
Context: Python 2.x
Scenario: Developer wishes to modify a dictionary to exclude blank values
aka remove empty values from a dictionary
aka delete keys with blank values
aka filter dictionary for non-blank values over each key-value pair
Solution
example01 use python list-comprehension syntax with simple conditional to remove "empty" values
Pitfalls
example01 only operates on a copy of the original dictionary (does not modify in place)
example01 may produce unexpected results depending on what developer means by "empty"
Does developer mean to keep values that are falsy?
If the values in the dictionary are not guaranteed to be strings, developer may have unexpected data loss.
result01 shows that only three key-value pairs were preserved from the original set
Alternate example
example02 helps deal with potential pitfalls
The approach is to use a more precise definition of "empty" by changing the conditional.
Here we only want to filter out values that evaluate to blank strings.
Here we also use .strip() to filter out values that consist of only whitespace.
Example02
### example02 -------------------
mydict = { "alpha":0,
"bravo":"0",
"charlie":"three",
"delta":[],
"echo":False,
"foxy":"False",
"golf":"",
"hotel":" ",
}
newdict = dict([(vkey, vdata) for vkey, vdata in mydict.iteritems() if(str(vdata).strip()) ])
print newdict
### result02 -------------------
result02 ='''
{'alpha': 0,
'bravo': '0',
'charlie': 'three',
'delta': [],
'echo': False,
'foxy': 'False'
}
'''
See also
list-comprehension
falsy
checking for empty string
modifying original dictionary in place
dictionary comprehensions
pitfalls of checking for empty string

Building on the answers from patriciasz and nneonneo, and accounting for the possibility that you might want to delete keys that have only certain falsy things (e.g. '') but not others (e.g. 0), or perhaps you even want to include some truthy things (e.g. 'SPAM'), then you could make a highly specific hitlist:
unwanted = ['', u'', None, False, [], 'SPAM']
Unfortunately, this doesn't quite work, because for example 0 in unwanted evaluates to True. We need to discriminate between 0 and other falsy things, so we have to use is:
any([0 is i for i in unwanted])
...evaluates to False.
Now use it to del the unwanted things:
unwanted_keys = [k for k, v in metadata.items() if any([v is i for i in unwanted])]
for k in unwanted_keys: del metadata[k]
If you want a new dictionary, instead of modifying metadata in place:
newdict = {k: v for k, v in metadata.items() if not any([v is i for i in unwanted])}
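One caveat with the identity test: v is [] is never true for a list created elsewhere, and whether two equal strings are the same object is an implementation detail, so entries such as [] or 'SPAM' in the hitlist may not match reliably. A sketch of an alternative that compares type and value instead is shown below; is_unwanted is a hypothetical helper, not part of the answer above.

```python
unwanted = ['', None, False, [], 'SPAM']


def is_unwanted(value, hitlist=unwanted):
    # Require the same type as well as equality, so 0 does not match False
    # and an empty list matches [] even though it is a different object.
    return any(type(value) is type(u) and value == u for u in hitlist)


metadata = {'a': 0, 'b': '', 'c': [], 'd': 'SPAM', 'e': False, 'f': None, 'g': 'keep'}
newdict = {k: v for k, v in metadata.items() if not is_unwanted(v)}
print(newdict)
```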

I read all replies in this thread and some referred also to this thread:
Remove empty dicts in nested dictionary with recursive function
I originally used solution here and it worked great:
Attempt 1: Too Hot (not performant or future-proof):
def scrub_dict(d):
    if type(d) is dict:
        return dict((k, scrub_dict(v)) for k, v in d.iteritems() if v and scrub_dict(v))
    else:
        return d
But some performance and compatibility concerns were raised in Python 2.7 world:
use isinstance instead of type
unroll the list comp into for loop for efficiency
use python3 safe items instead of iteritems
Attempt 2: Too Cold (Lacks Memoization):
def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v,dict):
            v = scrub_dict(v)
        if not v in (u'', None, {}):
            new_dict[k] = v
    return new_dict
DOH! This is not recursive and not at all memoizant.
Attempt 3: Just Right (so far):
def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v,dict):
            v = scrub_dict(v)
        if not v in (u'', None, {}):
            new_dict[k] = v
    return new_dict

To preserve 0 and False values but get rid of empty values you could use:
{k: v for k, v in metadata.items() if v or v == 0 or v is False}
For a nested dict with mixed types of values you could use:
def remove_empty_from_dict(d):
    if isinstance(d, dict):
        return dict((k, remove_empty_from_dict(v)) for k, v in d.items() \
                    if v or v == 0 or v is False and remove_empty_from_dict(v) is not None)
    elif isinstance(d, list):
        return [remove_empty_from_dict(v) for v in d
                if v or v == 0 or v is False and remove_empty_from_dict(v) is not None]
    else:
        if d or d == 0 or d is False:
            return d

"As I also currently write a desktop application for my work with Python, I found in data-entry application when there is lots of entry and which some are not mandatory thus user can left it blank, for validation purpose, it is easy to grab all entries and then discard empty key or value of a dictionary. So my code above a show how we can easy take them out, using dictionary comprehension and keep dictionary value element which is not blank. I use Python 3.8.3
data = {'':'', '20':'', '50':'', '100':'1.1', '200':'1.2'}
dic = {key:value for key,value in data.items() if value != ''}
print(dic)
{'100': '1.1', '200': '1.2'}

Dicts mixed with Arrays
The "Attempt 3: Just Right (so far)" code from BlissRage's answer does not properly handle array elements. I'm including a patch in case anyone needs it. The method handles lists in the if isinstance(v, list): block, which scrubs each list using the original scrub_dict(d) implementation.
@staticmethod
def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v, dict):
            v = scrub_dict(v)
        if isinstance(v, list):
            v = scrub_list(v)
        if not v in (u'', None, {}, []):
            new_dict[k] = v
    return new_dict
@staticmethod
def scrub_list(d):
    scrubbed_list = []
    for i in d:
        if isinstance(i, dict):
            i = scrub_dict(i)
        scrubbed_list.append(i)
    return scrubbed_list

An alternative way you can do this, is using dictionary comprehension. This should be compatible with 2.7+
result = {
    key: value for key, value in
    {"foo": "bar", "lorem": None}.items()
    if value
}

Here is an option if you are using pandas:
import pandas as pd
d = dict.fromkeys(['a', 'b', 'c', 'd'])
d['b'] = 'not null'
d['c'] = '' # empty string
print(d)
# convert `dict` to `Series` and replace any blank strings with `None`;
# use the `.dropna()` method and
# then convert back to a `dict`
d_ = pd.Series(d).replace('', None).dropna().to_dict()
print(d_)

Some of the methods mentioned above also drop integers and floats with the values 0 and 0.0.
If you want to avoid that, you can use the code below (it removes empty strings and None values from nested dictionaries and nested lists):
def remove_empty_from_dict(d):
    if type(d) is dict:
        _temp = {}
        for k,v in d.items():
            if v == None or v == "":
                pass
            elif type(v) is int or type(v) is float:
                _temp[k] = remove_empty_from_dict(v)
            elif (v or remove_empty_from_dict(v)):
                _temp[k] = remove_empty_from_dict(v)
        return _temp
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if( (str(v).strip() or str(remove_empty_from_dict(v)).strip()) and (v != None or remove_empty_from_dict(v) != None))]
    else:
        return d

metadata ={'src':'1921','dest':'1337','email':'','movile':''}
ot = {k: v for k, v in metadata.items() if v != ''}
print(f"Final {ot}")

You also have an option with filter method:
filtered_metadata = dict( filter(lambda val: val[1] != u'', metadata.items()) )

Some benchmarking:
1. List comprehension recreate dict
In [7]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
...: dic = {k: v for k, v in dic.items() if v is not None}
1000000 loops, best of 7: 375 ns per loop
2. List comprehension recreate dict using dict()
In [8]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
...: dic = dict((k, v) for k, v in dic.items() if v is not None)
1000000 loops, best of 7: 681 ns per loop
3. Loop and delete key if v is None
In [10]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
    ...: for k, v in dic.items():
    ...:     if v is None:
    ...:         del dic[k]
    ...:
10000000 loops, best of 7: 160 ns per loop
So loop-and-delete is the fastest at 160 ns, the dict comprehension takes a bit more than twice as long at ~375 ns, and the version with a call to dict() nearly doubles that again to ~680 ns.
Wrapping 3 into a function brings it to about 275 ns. Also, for me PyPy was about twice as fast as plain Python.
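The session above is Python 2 (xrange, and items() returning a list). A comparable measurement on Python 3 could be sketched with the standard timeit module as below. The function names are hypothetical, the timings include the cost of building the test dict each run, and no results are quoted here because they depend on the machine and interpreter. Note that on Python 3 the keys to delete have to be collected first, since deleting while iterating over dic.items() raises a RuntimeError.

```python
import timeit


def make():
    """Build a small test dict with two None values."""
    dic = {str(i): i for i in range(10)}
    dic['10'] = None
    dic['5'] = None
    return dic


def comprehension():
    dic = make()
    return {k: v for k, v in dic.items() if v is not None}


def dict_call():
    dic = make()
    return dict((k, v) for k, v in dic.items() if v is not None)


def loop_delete():
    dic = make()
    # Collect the keys first: a dict must not change size while being iterated
    for k in [k for k, v in dic.items() if v is None]:
        del dic[k]
    return dic


for fn in (comprehension, dict_call, loop_delete):
    seconds = timeit.timeit(fn, number=100_000)
    print(f"{fn.__name__}: {seconds:.3f} s for 100000 runs")
```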

Loop through all nested dictionary values?

for k, v in d.iteritems():
    if type(v) is dict:
        for t, c in v.iteritems():
            print "{0} : {1}".format(t, c)
I'm trying to loop through a dictionary and print out all key value pairs where the value is not a nested dictionary. If the value is a dictionary I want to go into it and print out its key value pairs...etc. Any help?
EDIT
How about this? It still only prints one thing.
def printDict(d):
    for k, v in d.iteritems():
        if type(v) is dict:
            printDict(v)
        else:
            print "{0} : {1}".format(k, v)
Full Test Case
Dictionary:
{u'xml': {u'config': {u'portstatus': {u'status': u'good'}, u'target': u'1'},
u'port': u'11'}}
Result:
xml : {u'config': {u'portstatus': {u'status': u'good'}, u'target': u'1'}, u'port': u'11'}

As said by Niklas, you need recursion, i.e. you want to define a function to print your dict, and if the value is a dict, you want to call your print function using this new dict.
Something like :
def myprint(d):
    for k, v in d.items():
        if isinstance(v, dict):
            myprint(v)
        else:
            print("{0} : {1}".format(k, v))

There are potential problems if you write your own recursive implementation or the iterative equivalent with stack. See this example:
dic = {}
dic["key1"] = {}
dic["key1"]["key1.1"] = "value1"
dic["key2"] = {}
dic["key2"]["key2.1"] = "value2"
dic["key2"]["key2.2"] = dic["key1"]
dic["key2"]["key2.3"] = dic
In the usual sense, a nested dictionary is an n-ary tree-like data structure. But the definition doesn't exclude the possibility of a cross edge or even a back edge (in which case it is no longer a tree). For instance, here key2.2 refers to the dictionary from key1, and key2.3 points to the entire dictionary (a back edge/cycle). When there is a back edge (cycle), the stack/recursion will run infinitely.
          root<-------back edge
        /      \           |
     _key1   __key2__      |
    /       /   \    \     |
|->key1.1 key2.1 key2.2 key2.3
|   /       |      |
| value1  value2   |
|                  |
cross edge----------|
If you print this dictionary with this implementation from Scharron
def myprint(d):
    for k, v in d.items():
        if isinstance(v, dict):
            myprint(v)
        else:
            print "{0} : {1}".format(k, v)
You would see this error:
> RuntimeError: maximum recursion depth exceeded while calling a Python object
The same goes with the implementation from senderle.
Similarly, you get an infinite loop with this implementation from Fred Foo:
def myprint(d):
    stack = list(d.items())
    while stack:
        k, v = stack.pop()
        if isinstance(v, dict):
            stack.extend(v.items())
        else:
            print("%s: %s" % (k, v))
However, Python actually detects cycles in nested dictionary:
print dic
{'key2': {'key2.1': 'value2', 'key2.3': {...},
'key2.2': {'key1.1': 'value1'}}, 'key1': {'key1.1': 'value1'}}
"{...}" is where a cycle is detected.
As requested by Moondra this is a way to avoid cycles (DFS):
def myprint(d):
    stack = list(d.items())
    visited = set()
    while stack:
        k, v = stack.pop()
        if isinstance(v, dict):
            if k not in visited:
                stack.extend(v.items())
        else:
            print("%s: %s" % (k, v))
        visited.add(k)
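The version above remembers key names, so two different sub-dictionaries that happen to share a key name could be confused with each other. A variant that remembers the identity of each dictionary object instead might look like the following sketch; iter_leaves is a hypothetical name, and it assumes the structure is not modified during traversal.

```python
def iter_leaves(d):
    """Yield (key, value) for every non-dict value, visiting each dict object only once."""
    stack = [d]
    seen = {id(d)}  # identities of the dicts already scheduled for a visit
    while stack:
        current = stack.pop()
        for k, v in current.items():
            if isinstance(v, dict):
                if id(v) not in seen:
                    seen.add(id(v))
                    stack.append(v)
            else:
                yield k, v


dic = {}
dic["key1"] = {}
dic["key1"]["key1.1"] = "value1"
dic["key2"] = {}
dic["key2"]["key2.1"] = "value2"
dic["key2"]["key2.2"] = dic["key1"]  # cross edge
dic["key2"]["key2.3"] = dic          # back edge (cycle)

for k, v in iter_leaves(dic):
    print("%s: %s" % (k, v))
```

Because the dictionary behind key1 is reached twice but visited once, key1.1 is printed a single time and the cycle through key2.3 terminates.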

Since a dict is iterable, you can apply the classic nested container iterable formula to this problem with only a couple of minor changes. Here's a Python 2 version (see below for 3):
import collections
def nested_dict_iter(nested):
    for key, value in nested.iteritems():
        if isinstance(value, collections.Mapping):
            for inner_key, inner_value in nested_dict_iter(value):
                yield inner_key, inner_value
        else:
            yield key, value
Test:
list(nested_dict_iter({'a':{'b':{'c':1, 'd':2},
'e':{'f':3, 'g':4}},
'h':{'i':5, 'j':6}}))
# output: [('g', 4), ('f', 3), ('c', 1), ('d', 2), ('i', 5), ('j', 6)]
In Python 2, It might be possible to create a custom Mapping that qualifies as a Mapping but doesn't contain iteritems, in which case this will fail. The docs don't indicate that iteritems is required for a Mapping; on the other hand, the source gives Mapping types an iteritems method. So for custom Mappings, inherit from collections.Mapping explicitly just in case.
In Python 3, there are a number of improvements to be made. As of Python 3.3, abstract base classes live in collections.abc. They remained available in collections for backwards compatibility until Python 3.10, which removed those aliases, and it's nicer having our abstract base classes together in one namespace anyway. So this imports abc from collections. Python 3.3 also adds yield from, which is designed for just these sorts of situations. This is not empty syntactic sugar; it may lead to faster code and more sensible interactions with coroutines.
from collections import abc
def nested_dict_iter(nested):
    for key, value in nested.items():
        if isinstance(value, abc.Mapping):
            yield from nested_dict_iter(value)
        else:
            yield key, value

Alternative iterative solution:
def myprint(d):
    stack = d.items()
    while stack:
        k, v = stack.pop()
        if isinstance(v, dict):
            stack.extend(v.iteritems())
        else:
            print("%s: %s" % (k, v))

Slightly different version I wrote that keeps track of the keys along the way to get there
def print_dict(v, prefix=''):
    if isinstance(v, dict):
        for k, v2 in v.items():
            p2 = "{}['{}']".format(prefix, k)
            print_dict(v2, p2)
    elif isinstance(v, list):
        for i, v2 in enumerate(v):
            p2 = "{}[{}]".format(prefix, i)
            print_dict(v2, p2)
    else:
        print('{} = {}'.format(prefix, repr(v)))
On your data, it'll print
data['xml']['config']['portstatus']['status'] = u'good'
data['xml']['config']['target'] = u'1'
data['xml']['port'] = u'11'
It's also easy to modify it to track the prefix as a tuple of keys rather than a string if you need it that way.
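A tuple-based variant could look like the sketch below. It is written as a generator rather than a printing function, iter_paths is a hypothetical name, and it assumes Python 3.

```python
def iter_paths(value, path=()):
    """Yield (path, leaf) pairs, where path is a tuple of dict keys and list indexes."""
    if isinstance(value, dict):
        for k, v in value.items():
            yield from iter_paths(v, path + (k,))
    elif isinstance(value, list):
        for i, v in enumerate(value):
            yield from iter_paths(v, path + (i,))
    else:
        yield path, value


data = {'xml': {'config': {'portstatus': {'status': 'good'}, 'target': '1'}, 'port': '11'}}
for path, leaf in iter_paths(data):
    print(path, repr(leaf))
```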

Here is a Pythonic way to do it. This function lets you loop through the key-value pairs at all levels. It does not hold the whole result in memory but walks through the dict as you loop over it.
def recursive_items(dictionary):
    for key, value in dictionary.items():
        if type(value) is dict:
            yield (key, value)
            yield from recursive_items(value)
        else:
            yield (key, value)
a = {'a': {1: {1: 2, 3: 4}, 2: {5: 6}}}
for key, value in recursive_items(a):
    print(key, value)
Prints
a {1: {1: 2, 3: 4}, 2: {5: 6}}
1 {1: 2, 3: 4}
1 2
3 4
2 {5: 6}
5 6

An alternative solution that also works with lists, based on Scharron's solution:
def myprint(d):
    my_list = d.iteritems() if isinstance(d, dict) else enumerate(d)
    for k, v in my_list:
        if isinstance(v, dict) or isinstance(v, list):
            myprint(v)
        else:
            print u"{0} : {1}".format(k, v)

I am using the following code to print all the values of a nested dictionary, taking into account where the value could be a list containing dictionaries. This was useful to me when parsing a JSON file into a dictionary and needing to quickly check whether any of its values are None.
d = {
    "user": 10,
    "time": "2017-03-15T14:02:49.301000",
    "metadata": [
        {"foo": "bar"},
        "some_string"
    ]
}
def print_nested(d):
    if isinstance(d, dict):
        for k, v in d.items():
            print_nested(v)
    elif hasattr(d, '__iter__') and not isinstance(d, str):
        for item in d:
            print_nested(item)
    elif isinstance(d, str):
        print(d)
    else:
        print(d)
print_nested(d)
Output:
10
2017-03-15T14:02:49.301000
bar
some_string

Your question has already been answered well, but I recommend using isinstance(d, collections.abc.Mapping) instead of isinstance(d, dict) (on Python 2 the class is collections.Mapping; that alias was removed in Python 3.10). It works for dict(), collections.OrderedDict(), and collections.UserDict().
The generally correct version is:
import collections.abc
def myprint(d):
    for k, v in d.items():
        if isinstance(v, collections.abc.Mapping):
            myprint(v)
        else:
            print("{0} : {1}".format(k, v))

Iterative solution as an alternative:
def traverse_nested_dict(d):
    iters = [d.iteritems()]
    while iters:
        it = iters.pop()
        try:
            k, v = it.next()
        except StopIteration:
            continue
        iters.append(it)
        if isinstance(v, dict):
            iters.append(v.iteritems())
        else:
            yield k, v
d = {"a": 1, "b": 2, "c": {"d": 3, "e": {"f": 4}}}
for k, v in traverse_nested_dict(d):
    print k, v

Here's a modified version of Fred Foo's answer for Python 2. In the original response, only the deepest level of nesting is output. If you output the keys as lists, you can keep the keys for all levels, although to reference them you need to reference a list of lists.
Here's the function:
import collections
def NestIter(nested):
    for key, value in nested.iteritems():
        if isinstance(value, collections.Mapping):
            for inner_key, inner_value in NestIter(value):
                yield [key, inner_key], inner_value
        else:
            yield [key],value
To reference the keys:
for keys, vals in NestIter(mynested):
    print(mynested[keys[0]][keys[1][0]][keys[1][1][0]])
for a three-level dictionary.
You need to know the number of levels before to access multiple keys and the number of levels should be constant (it may be possible to add a small bit of script to check the number of nesting levels when iterating through values, but I haven't yet looked at this).
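One way around the fixed-depth restriction is to flatten the nested key list first and then walk down the dictionary with functools.reduce. The two helpers below are hypothetical and only a sketch; they assume the key structure has the [k1, [k2, [k3]]] shape that NestIter yields.

```python
from functools import reduce
from operator import getitem


def flatten_keys(keys):
    """Turn a nested key list such as [k1, [k2, [k3]]] into [k1, k2, k3]."""
    flat = []
    while keys:
        flat.append(keys[0])
        # The second element, if present, is the nested list of remaining keys
        keys = keys[1] if len(keys) > 1 else []
    return flat


def get_by_path(nested, path):
    """Look up nested[path[0]][path[1]]... for a path of any length."""
    return reduce(getitem, path, nested)


mynested = {"a": {"b": {"c": 1}}, "d": 2}
print(get_by_path(mynested, flatten_keys(["a", ["b", ["c"]]])))
print(get_by_path(mynested, flatten_keys(["d"])))
```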

I find this approach a bit more flexible: you just provide a generator function that emits key-value pairs, and it can easily be extended to also iterate over lists.
def traverse(value, key=None):
    if isinstance(value, dict):
        for k, v in value.items():
            yield from traverse(v, k)
    else:
        yield key, value
Then you can write your own myprint function that prints those key-value pairs.
def myprint(d):
    for k, v in traverse(d):
        print(f"{k} : {v}")
A test:
myprint({
    'xml': {
        'config': {
            'portstatus': {
                'status': 'good',
            },
            'target': '1',
        },
        'port': '11',
    },
})
Output:
status : good
target : 1
port : 11
I tested this on Python 3.6.

Looping over nested dictionaries using isinstance() and yield.
isinstance() is a function that returns True or False depending on whether the given object is an instance of the given type; in the case below the check is True for dict values, so the function recurses into them.
yield returns a value from a function without destroying the state of its local variables; when the function is resumed, execution continues after the last yield statement. Any function that contains the yield keyword is a generator.
students= {'emp1': {'name': 'Bob', 'job': 'Mgr'},
           'emp2': {'name': 'Kim', 'job': 'Dev','emp3': {'namee': 'Saam', 'j0ob': 'Deev'}},
           'emp4': {'name': 'Sam', 'job': 'Dev'}}
def nested_dict_pairs_iterator(dict_obj):
    for key, value in dict_obj.items():
        # Check if value is of dict type
        if isinstance(value, dict):
            # If value is dict then iterate over all its values
            for pair in nested_dict_pairs_iterator(value):
                yield (key, *pair)
        else:
            # If value is not dict type then yield the value
            yield (key, value)
for pair in nested_dict_pairs_iterator(students):
    print(pair)

For a ready-made solution install ndicts
pip install ndicts
Import a NestedDict in your script
from ndicts.ndicts import NestedDict
Initialize
dictionary = {
    u'xml': {
        u'config': {
            u'portstatus': {u'status': u'good'},
            u'target': u'1'
        },
        u'port': u'11'
    }
}
nd = NestedDict(dictionary)
Iterate
for key, value in nd.items():
    print(key, value)

While the original solution from @Scharron is beautiful and simple, it cannot handle lists very well:
def myprint(d):
    for k, v in d.items():
        if isinstance(v, dict):
            myprint(v)
        else:
            print("{0} : {1}".format(k, v))
So this code can be slightly modified like this to handle lists among the values:
def myprint(d):
    for k, v in d.items():
        if isinstance(v, dict):
            myprint(v)
        elif isinstance(v, list):
            for i in v:
                myprint(i)
        else:
            print("{0} : {1}".format(k, v))

These answers work for only 2 levels of sub-dictionaries. For more try this:
nested_dict = {'dictA': {'key_1': 'value_1', 'key_1A': 'value_1A','key_1Asub1': {'Asub1': 'Asub1_val', 'sub_subA1': {'sub_subA1_key':'sub_subA1_val'}}},
'dictB': {'key_2': 'value_2'},
1: {'key_3': 'value_3', 'key_3A': 'value_3A'}}
def print_dict(dictionary):
    dictionary_array = [dictionary]
    for sub_dictionary in dictionary_array:
        if type(sub_dictionary) is dict:
            for key, value in sub_dictionary.items():
                print("key=", key)
                print("value", value)
                if type(value) is dict:
                    dictionary_array.append(value)
print_dict(nested_dict)

You can print recursively with a dictionary comprehension:
def print_key_pairs(d):
    {k: print_key_pairs(v) if isinstance(v, dict) else print(f'{k}: {v}') for k, v in d.items()}
For your test case this is the output:
>>> print_key_pairs({u'xml': {u'config': {u'portstatus': {u'status': u'good'}, u'target': u'1'}, u'port': u'11'}})
status: good
target: 1
port: 11

Returns a tuple of each key and value and the key contains the full path
from collections import abc
from typing import Mapping, Tuple, Iterator
def traverse_dict(nested: Mapping, parent_key="", keys_to_not_traverse_further=tuple()) -> Iterator[Tuple[str, str]]:
    """Each key is joined with its parent using dot as a separator.
    Once a `parent_key` matches `keys_to_not_traverse_further`
    it will no longer find its child dicts.
    """
    for key, value in nested.items():
        if isinstance(value, abc.Mapping) and key not in keys_to_not_traverse_further:
            yield from traverse_dict(value, f"{parent_key}.{key}", keys_to_not_traverse_further)
        else:
            yield f"{parent_key}.{key}", value
Let's test it
my_dict = {
    "isbn": "123-456-222",
    "author": {"lastname": "Doe", "firstname": "Jane"},
    "editor": {"lastname": "Smith", "firstname": "Jane"},
    "title": "The Ultimate Database Study Guide",
    "category": ["Non-Fiction", "Technology"],
    "first": {
        "second": {"third": {"fourth": {"blah": "yadda"}}},
        "fifth": {"sixth": "seventh"},
    },
}
for k, v in traverse_dict(my_dict):
    print(k, v)
Returns
.isbn 123-456-222
.author.lastname Doe
.author.firstname Jane
.editor.lastname Smith
.editor.firstname Jane
.title The Ultimate Database Study Guide
.category ['Non-Fiction', 'Technology']
.first.second.third.fourth.blah yadda
.first.fifth.sixth seventh
If you don't care about some child dicts e.g names in this case then
use the keys_to_not_traverse_further
for k, v in traverse_dict(my_dict, parent_key="", keys_to_not_traverse_further=("author","editor")):
    print(k, v)
Returns
.isbn 123-456-222
.author {'lastname': 'Doe', 'firstname': 'Jane'}
.editor {'lastname': 'Smith', 'firstname': 'Jane'}
.title The Ultimate Database Study Guide
.category ['Non-Fiction', 'Technology']
.first.second.third.fourth.blah yadda
.first.fifth.sixth seventh
