Convert a nested dictionary into list of tuples - python

I have a dictionary -
d={'revenues':
{
'201907':
{'aaa.csv':'fdwe34x2'},
'201906':{'ddd.csv':'e4c5q'}
},
'complaints':
{'2014':
{'sfdwa.csv','c2c2jh'}
}
}
I want to convert it into list of tuples -
[
('revenues','201907','aaa.csv','fdwe34x2'),
('revenues','201906','ddd.csv','e4c5q'),
('complaints','2014','sfdwa.csv','c2c2jh')
]
I tried using list comprehensions, but that did not help -
l = [(k,[(p,q) for p,q in v.items()]) for k,v in d.items()]
print(l)
[('revenues', [('201907', {'aaa.csv': 'fdwe34x2'}), ('201906', {'ddd.csv': 'e4c5q'})]),
('complaints', [('2014', {'c2c2jh', 'sfdwa.csv'})])]
Any suggestions?

If you're not sure how many levels this dictionary may have, it seems that what you need is recursion:
def unnest(d, keys=[]):
    result = []
    for k, v in d.items():
        if isinstance(v, dict):
            result.extend(unnest(v, keys + [k]))
        else:
            result.append(tuple(keys + [k, v]))
    return result
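Just a friendly reminder: dict insertion order is only guaranteed from Python 3.7 on (CPython 3.6 preserves it as an implementation detail), so on older versions the result may come back in a different order, for example: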
[('complaints', '2014', 'sfdwa.csv', 'c2c2jh'),
('revenues', '201906', 'ddd.csv', 'e4c5q'),
('revenues', '201907', 'aaa.csv', 'fdwe34x2')]
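The same recursion can also be written as a generator, which avoids building intermediate lists. This is only a sketch: the name iter_paths is made up for illustration, it needs Python 3.3+ for `yield from`, and it assumes the innermost 'complaints' entry is a dict rather than the set literal shown in the question.

```python
def iter_paths(d, prefix=()):
    """Yield one tuple per leaf: every key on the path, then the leaf value."""
    for key, value in d.items():
        if isinstance(value, dict):
            # Descend, remembering the keys seen so far.
            yield from iter_paths(value, prefix + (key,))
        else:
            yield prefix + (key, value)


# Assumption: the innermost 'complaints' entry is written as a dict,
# {'sfdwa.csv': 'c2c2jh'}, rather than the set literal from the question.
d = {
    'revenues': {
        '201907': {'aaa.csv': 'fdwe34x2'},
        '201906': {'ddd.csv': 'e4c5q'},
    },
    'complaints': {'2014': {'sfdwa.csv': 'c2c2jh'}},
}

rows = list(iter_paths(d))
print(rows)
```

On Python 3.7+ the rows come out in insertion order, matching the list the question asks for.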

You can loop through the levels of your dictionary (this assumes exactly three levels of nesting, with a dict at the innermost level):
[(x, y, z, d[x][y][z]) for x in d for y in d[x] for z in d[x][y]]

You can do it using a list comprehension, but it would be quite complex, and not easy to maintain if the structure changes.
Unless you especially need good performance, I would suggest using a generic recursive function:
def unnest(d, keys=[]):
    result = []
    if isinstance(d, dict):
        for k, v in d.items():
            result.extend(unnest(v, keys + [k]))
    elif isinstance(d, list):
        result.append(tuple(keys + d))
    elif isinstance(d, set) or isinstance(d, tuple):
        result.append(tuple(keys + list(d)))
    else:
        result.append(tuple(keys + [d]))
    return result
As a bonus, I've also supported lists and tuples during the recursion, in addition to the set on the provided example.

Related

Get averages from a dictionary in Python

How can I get averages from a Python dictionary? For example, if I have the following dictionary:
students={'Dan':(5,8,8), 'Tim':(7), 'Richard':(9,9)}
I would like to print the dictionary in the following form:
results={'Dan':(7), 'Tim':(7), 'Richard':(9)}
Is there any function that I can use? I'm new to coding in Python, so dictionaries are a bit confusing for me.
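If you want the average values to be tuple elements (I don't see any reason to do so, but maybe I don't have enough context), try the following. Note that 'Tim':(7) is a plain int, so it would have to be written as (7,) for this to work: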
results={k: (sum(v)/len(v),) for k,v in students.items()}
I'd do:
result = {}
for k, v in students.items():
    if type(v) in [float, int]:
        result[k] = v
    else:
        result[k] = sum(v) / len(v)
I was trying this but realized we had a problem summing a tuple of length 1. So you can do it this way.
results = {}
for k, v in students.items():
    print(v)
    if (isinstance(v, int)):
        results[k] = v
    else:
        results[k] = sum(v) / len(v)
A pythonic solution would be to use a dictionary comprehension to create the results dictionary.
def avg(l):
    # 'Tim': (7) is a plain int, not a tuple, so wrap bare numbers first
    if isinstance(l, (int, float)):
        l = (l,)
    return sum(l) / len(l)
results = {key: avg(val) for key, val in students.items()}
Note that sum() takes the tuple directly; wrapping it as sum([l]) raises a TypeError because a tuple cannot be added to an int. I would further change the data structure to use lists instead of tuples.
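Because 'Tim': (7) is a plain int rather than a one-element tuple, every approach has to treat bare numbers specially. One possible sketch uses statistics.mean from the standard library; the helper name as_scores is made up for illustration.

```python
from statistics import mean

students = {'Dan': (5, 8, 8), 'Tim': (7), 'Richard': (9, 9)}


def as_scores(value):
    """Wrap a bare number in a tuple so every entry can be averaged the same way."""
    return (value,) if isinstance(value, (int, float)) else tuple(value)


results = {name: mean(as_scores(scores)) for name, scores in students.items()}
print(results)
```

Writing the single score as (7,) in the input would make the helper unnecessary.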

How to count the number of occurrences of a nested dictionary key?

I have a nested dictionary:
d = {'key': 1, 'lock':{'key': 3, 'lock': {'key': 7, 'lock': None}}}
and I am hoping to simply get the number of times 'key' occurs. So in this example, the output would be:
3
because 'key' appears 3 times. I know this is pretty straightforward, but I'm a bit foggy on how to do this with dictionaries. I would prefer not to use any libraries if possible.
Thanks for the help!
You can use a function that recursively counts the given key:
def count(d, k):
    c = int(k in d)
    for v in d.values():
        if isinstance(v, dict):
            c += count(v, k)
    return c
or to write the above in a more concise way:
def count(d, k):
    return (k in d) + sum(count(v, k) for v in d.values() if isinstance(v, dict))
so that count(d, 'key') returns: 3
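For very deep structures, the recursion can be replaced by an explicit stack. The sketch below is one way to do that; the name count_key is hypothetical, and descending into lists is an extra assumption that goes beyond the question.

```python
def count_key(data, key):
    """Count how often `key` appears as a dict key anywhere inside `data`."""
    total = 0
    stack = [data]  # containers still waiting to be inspected
    while stack:
        current = stack.pop()
        if isinstance(current, dict):
            total += key in current  # True counts as 1, False as 0
            stack.extend(current.values())
        elif isinstance(current, list):
            stack.extend(current)
    return total


d = {'key': 1, 'lock': {'key': 3, 'lock': {'key': 7, 'lock': None}}}
print(count_key(d, 'key'))
```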

Python3 sorting lists of nested dicts with lambda

I have a deep nested object of various lists and dicts that I retrieve as json which I need to compare to another version of itself. The issue is that all lists are basically unsorted, therefore I need to sort before comparing them. Any deep diff library I've tried failed without proper sorting the dicts position in the lists, so here we go.
Sample object that requires sorting:
{
"main":{
"key1":"value1",
"key2":"value2",
"key3":[{
"sub1":"value2",
"sub2":{
"subsub":[{
"subsubsub1":10,
"subsubsub2":11,
"subsubsub3":[10,11,12]
},{
"subsubsub1":7,
"subsubsub2":8,
"subsubsub3":[9,7,8]
}]
}
},{
"sub1":"value1",
"sub2":{
"subsub":[{
"subsubsub1":1,
"subsubsub2":2,
"subsubsub3":[1,2,3]
},
{
"subsubsub1":4,
"subsubsub2":5,
"subsubsub3":[5,6,4]
}]
}
}]
}
}
Besides a few recursive loops I'm trying to sort the dicts by translating them with sorted lists into sorted tuples and hash them.
Edit:
The object is passed into unnest()
def unnest(d):
    for k, v in d.items():
        if isinstance(v, dict):
            d.update({k: unnest(v)})
        elif isinstance(v, list):
            d.update({k: unsort(v)})
    return d
def unsort(l):
    for i, e in enumerate(l):
        if isinstance(e, dict):
            l[i] = unnest(e)
        elif isinstance(e, list):
            l[i] = unsort(e)
    return sorted(l, key=lambda i: sort_hash(i))
def unnest_hash(d):
    for k, v in d.items():
        if isinstance(v, dict):
            d.update({k: unnest_hash(v)})
        elif isinstance(v, list):
            d.update({k: sort_hash(v)})
    return hash(tuple(sorted(d.items())))
def sort_hash(l):
    if isinstance(l, list):
        for i, e in enumerate(l):
            if isinstance(e, dict):
                l[i] = unnest_hash(e)
            elif isinstance(e, list):
                l[i] = sort_hash(e)
        return hash(tuple(sorted(l)))
    elif isinstance(l, dict):
        return unnest_hash(l)
    else:
        return hash(l)
However for some reason the hash value gets written into the "sorted" list:
{'main': {'key1': 'value1', 'key2': 'value2', 'key3': [{'sub1': 'value2', 'sub2': -4046234112924644199}, {'sub1': 'value1', 'sub2': 4015568797712784641}]}}
How can I prevent the sort value in the lambda function to be written into the returned sorted list?
Thanks!
Your sort_hash function is mutating the value passed into it. That's why you see the hashes in the original values after the sort:
l[i] = unnest_hash(e)
and
l[i] = sort_hash(e)
both modify the value you are trying to hash. unnest_hash also modifies the original values:
d.update({k: unnest_hash(v)})
A hash calculation for sorting must never modify the value it is hashing.
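One way to follow that rule is to compute the sort key from a read-only traversal and to build a sorted copy instead of sorting in place. The sketch below is not the asker's code: stable_key and sorted_copy are hypothetical names, and it swaps the hash for a repr-based string key, which assumes the leaf values have a stable repr.

```python
def stable_key(obj):
    """Build an order-independent string key for obj without modifying it."""
    if isinstance(obj, dict):
        parts = sorted(f"{k!r}:{stable_key(v)}" for k, v in obj.items())
        return "{" + ",".join(parts) + "}"
    if isinstance(obj, list):
        return "[" + ",".join(sorted(stable_key(e) for e in obj)) + "]"
    return repr(obj)


def sorted_copy(obj):
    """Return a new structure with every list sorted; the input is left untouched."""
    if isinstance(obj, dict):
        return {k: sorted_copy(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return sorted((sorted_copy(e) for e in obj), key=stable_key)
    return obj


a = {"x": [{"b": [3, 1, 2]}, {"a": 1}]}
b = {"x": [{"a": 1}, {"b": [2, 3, 1]}]}
print(sorted_copy(a) == sorted_copy(b))
```

Because the key is a string, lists of numbers are ordered lexicographically rather than numerically; that is still consistent between the two versions, which is all a comparison needs.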

Efficient way to remove keys with empty strings from a dict

I have a dict and would like to remove all the keys for which there are empty value strings.
metadata = {u'Composite:PreviewImage': u'(Binary data 101973 bytes)',
u'EXIF:CFAPattern2': u''}
What is the best way to do this?
Python 2.X
dict((k, v) for k, v in metadata.iteritems() if v)
Python 2.7 - 3.X
{k: v for k, v in metadata.items() if v}
Note that all of your keys have values. It's just that some of those values are the empty string. There's no such thing as a key in a dict without a value; if it didn't have a value, it wouldn't be in the dict.
It can get even shorter than BrenBarn's solution (and more readable I think)
{k: v for k, v in metadata.items() if v}
Tested with Python 2.7.3.
If you really need to modify the original dictionary:
empty_keys = [k for k,v in metadata.iteritems() if not v]
for k in empty_keys:
    del metadata[k]
Note that we have to make a list of the empty keys because we can't modify a dictionary while iterating through it (as you may have noticed). This is less expensive (memory-wise) than creating a brand-new dictionary, though, unless there are a lot of entries with empty values.
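For Python 3, where iteritems() no longer exists, the same in-place deletion might look like the sketch below. It uses the question's sample data without the u prefixes and removes only values that are exactly the empty string, which is a narrower test than `not v`.

```python
metadata = {
    'Composite:PreviewImage': '(Binary data 101973 bytes)',
    'EXIF:CFAPattern2': '',
}

# Collect the keys first; deleting while iterating over metadata.items()
# directly would raise a RuntimeError in Python 3.
for key in [k for k, v in metadata.items() if v == '']:
    del metadata[key]

print(metadata)
```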
If you want a full-featured, yet succinct approach to handling real-world data structures which are often nested, and can even contain cycles, I recommend looking at the remap utility from the boltons utility package.
After pip install boltons or copying iterutils.py into your project, just do:
from boltons.iterutils import remap
drop_falsey = lambda path, key, value: bool(value)
clean = remap(metadata, visit=drop_falsey)
This page has many more examples, including ones working with much larger objects from Github's API.
It's pure-Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly cases like this, so if you find a case it doesn't handle, you can bug me to fix it right here.
Based on Ryan's solution, if you also have lists and nested dictionaries:
For Python 2:
def remove_empty_from_dict(d):
    if type(d) is dict:
        return dict((k, remove_empty_from_dict(v)) for k, v in d.iteritems() if v and remove_empty_from_dict(v))
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if v and remove_empty_from_dict(v)]
    else:
        return d
For Python 3:
def remove_empty_from_dict(d):
    if type(d) is dict:
        return dict((k, remove_empty_from_dict(v)) for k, v in d.items() if v and remove_empty_from_dict(v))
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if v and remove_empty_from_dict(v)]
    else:
        return d
BrenBarn's solution is ideal (and pythonic, I might add). Here is another (fp) solution, however:
from operator import itemgetter
dict(filter(itemgetter(1), metadata.items()))
If you have a nested dictionary, and you want this to work even for empty sub-elements, you can use a recursive variant of BrenBarn's suggestion:
def scrub_dict(d):
    if type(d) is dict:
        return dict((k, scrub_dict(v)) for k, v in d.iteritems() if v and scrub_dict(v))
    else:
        return d
For python 3
dict((k, v) for k, v in metadata.items() if v)
Quick Answer (TL;DR)
Example01
### example01 -------------------
mydict = { "alpha":0,
"bravo":"0",
"charlie":"three",
"delta":[],
"echo":False,
"foxy":"False",
"golf":"",
"hotel":" ",
}
newdict = dict([(vkey, vdata) for vkey, vdata in mydict.iteritems() if(vdata) ])
print newdict
### result01 -------------------
result01 ='''
{'foxy': 'False', 'charlie': 'three', 'bravo': '0'}
'''
Detailed Answer
Problem
Context: Python 2.x
Scenario: Developer wishes to modify a dictionary to exclude blank values
aka remove empty values from a dictionary
aka delete keys with blank values
aka filter dictionary for non-blank values over each key-value pair
Solution
example01 use python list-comprehension syntax with simple conditional to remove "empty" values
Pitfalls
example01 only operates on a copy of the original dictionary (does not modify in place)
example01 may produce unexpected results depending on what developer means by "empty"
Does developer mean to keep values that are falsy?
If the values in the dictionary are not guaranteed to be strings, developer may have unexpected data loss.
result01 shows that only three key-value pairs were preserved from the original set
Alternate example
example02 helps deal with potential pitfalls
The approach is to use a more precise definition of "empty" by changing the conditional.
Here we only want to filter out values that evaluate to blank strings.
Here we also use .strip() to filter out values that consist of only whitespace.
Example02
### example02 -------------------
mydict = { "alpha":0,
"bravo":"0",
"charlie":"three",
"delta":[],
"echo":False,
"foxy":"False",
"golf":"",
"hotel":" ",
}
newdict = dict([(vkey, vdata) for vkey, vdata in mydict.iteritems() if(str(vdata).strip()) ])
print newdict
### result02 -------------------
result02 ='''
{'alpha': 0,
'bravo': '0',
'charlie': 'three',
'delta': [],
'echo': False,
'foxy': 'False'
}
'''
See also
list-comprehension
falsy
checking for empty string
modifying original dictionary in place
dictionary comprehensions
pitfalls of checking for empty string
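A Python 3 take on example02 could look like the sketch below. It is not the answer's code: is_blank_string is a hypothetical helper, and it checks the type explicitly instead of converting every value with str().

```python
mydict = {
    "alpha": 0,
    "bravo": "0",
    "charlie": "three",
    "delta": [],
    "echo": False,
    "foxy": "False",
    "golf": "",
    "hotel": " ",
}


def is_blank_string(value):
    """True only for strings that are empty or contain nothing but whitespace."""
    return isinstance(value, str) and value.strip() == ""


newdict = {k: v for k, v in mydict.items() if not is_blank_string(v)}
print(newdict)
```

As in result02, only "golf" and "hotel" are dropped; falsy non-string values such as 0, [] and False are kept.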
Building on the answers from patriciasz and nneonneo, and accounting for the possibility that you might want to delete keys that have only certain falsy things (e.g. '') but not others (e.g. 0), or perhaps you even want to include some truthy things (e.g. 'SPAM'), then you could make a highly specific hitlist:
unwanted = ['', u'', None, False, [], 'SPAM']
Unfortunately, this doesn't quite work, because for example 0 in unwanted evaluates to True. We need to discriminate between 0 and other falsy things, so we have to use is:
any([0 is i for i in unwanted])
...evaluates to False.
Now use it to del the unwanted things:
unwanted_keys = [k for k, v in metadata.items() if any([v is i for i in unwanted])]
for k in unwanted_keys: del metadata[k]
If you want a new dictionary, instead of modifying metadata in place:
newdict = {k: v for k, v in metadata.items() if not any([v is i for i in unwanted])}
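Identity checks with `is` depend on how the interpreter reuses objects: a freshly built list is never the same object as the [] in the hitlist, and whether two equal strings are the same object is an implementation detail. A possible alternative, sketched here with a hypothetical helper is_unwanted and made-up sample data, is to require both the same type and equality.

```python
unwanted = ['', None, False, [], 'SPAM']


def is_unwanted(value):
    # Require the same type as well as equality, so 0 does not match False.
    return any(type(value) is type(u) and value == u for u in unwanted)


metadata = {'a': 0, 'b': '', 'c': [], 'd': 'SPAM', 'e': False, 'f': None, 'g': 'eggs'}
newdict = {k: v for k, v in metadata.items() if not is_unwanted(v)}
print(newdict)
```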
I read all replies in this thread and some referred also to this thread:
Remove empty dicts in nested dictionary with recursive function
I originally used solution here and it worked great:
Attempt 1: Too Hot (not performant or future-proof):
def scrub_dict(d):
    if type(d) is dict:
        return dict((k, scrub_dict(v)) for k, v in d.iteritems() if v and scrub_dict(v))
    else:
        return d
But some performance and compatibility concerns were raised in Python 2.7 world:
use isinstance instead of type
unroll the list comp into for loop for efficiency
use python3 safe items instead of iteritems
Attempt 2: Too Cold (Lacks Memoization):
def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v,dict):
            v = scrub_dict(v)
        if not v in (u'', None, {}):
            new_dict[k] = v
    return new_dict
DOH! This is not recursive and not at all memoizant.
Attempt 3: Just Right (so far):
def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v,dict):
            v = scrub_dict(v)
        if not v in (u'', None, {}):
            new_dict[k] = v
    return new_dict
To preserve 0 and False values but get rid of empty values you could use:
{k: v for k, v in metadata.items() if v or v == 0 or v is False}
For a nested dict with mixed types of values you could use:
def remove_empty_from_dict(d):
    if isinstance(d, dict):
        return dict((k, remove_empty_from_dict(v)) for k, v in d.items()
                    if (v or v == 0 or v is False) and remove_empty_from_dict(v) is not None)
    elif isinstance(d, list):
        return [remove_empty_from_dict(v) for v in d
                if (v or v == 0 or v is False) and remove_empty_from_dict(v) is not None]
    else:
        if d or d == 0 or d is False:
            return d
I am currently writing a desktop application in Python for my work. In a data-entry application with many fields, some of which are optional and may be left blank by the user, it is easy to grab all entries for validation and then discard the dictionary items whose values are empty. The code below shows how to do that with a dictionary comprehension, keeping only the items whose value is not blank. I use Python 3.8.3.
data = {'':'', '20':'', '50':'', '100':'1.1', '200':'1.2'}
dic = {key:value for key,value in data.items() if value != ''}
print(dic)
{'100': '1.1', '200': '1.2'}
Dicts mixed with Arrays
The code under "Attempt 3: Just Right (so far)" in BlissRage's answer does not properly handle array elements. I'm including a patch in case anyone needs it. The method handles lists in the if isinstance(v, list): block, which scrubs each dict inside the list using the original scrub_dict(d) implementation.
#staticmethod
def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v, dict):
            v = scrub_dict(v)
        if isinstance(v, list):
            v = scrub_list(v)
        if not v in (u'', None, {}, []):
            new_dict[k] = v
    return new_dict
#staticmethod
def scrub_list(d):
    scrubbed_list = []
    for i in d:
        if isinstance(i, dict):
            i = scrub_dict(i)
        scrubbed_list.append(i)
    return scrubbed_list
An alternative way you can do this, is using dictionary comprehension. This should be compatible with 2.7+
result = {
key: value for key, value in
{"foo": "bar", "lorem": None}.items()
if value
}
Here is an option if you are using pandas:
import pandas as pd
d = dict.fromkeys(['a', 'b', 'c', 'd'])
d['b'] = 'not null'
d['c'] = '' # empty string
print(d)
# convert `dict` to `Series` and replace any blank strings with NaN
# (replace('', None) can forward-fill instead on some pandas versions);
# use the `.dropna()` method and
# then convert back to a `dict`
d_ = pd.Series(d).replace('', float('nan')).dropna().to_dict()
print(d_)
Some of the methods mentioned above also drop integers and floats with the values 0 and 0.0.
If you want to avoid that, you can use the code below (it removes empty strings and None values from nested dictionaries and nested lists):
def remove_empty_from_dict(d):
    if type(d) is dict:
        _temp = {}
        for k,v in d.items():
            if v == None or v == "":
                pass
            elif type(v) is int or type(v) is float:
                _temp[k] = remove_empty_from_dict(v)
            elif (v or remove_empty_from_dict(v)):
                _temp[k] = remove_empty_from_dict(v)
        return _temp
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if( (str(v).strip() or str(remove_empty_from_dict(v)).strip()) and (v != None or remove_empty_from_dict(v) != None))]
    else:
        return d
metadata ={'src':'1921','dest':'1337','email':'','movile':''}
ot = {k: v for k, v in metadata.items() if v != ''}
print(f"Final {ot}")
You also have an option with filter method:
filtered_metadata = dict( filter(lambda val: val[1] != u'', metadata.items()) )
Some benchmarking:
1. List comprehension recreate dict
In [7]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
...: dic = {k: v for k, v in dic.items() if v is not None}
1000000 loops, best of 7: 375 ns per loop
2. List comprehension recreate dict using dict()
In [8]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
...: dic = dict((k, v) for k, v in dic.items() if v is not None)
1000000 loops, best of 7: 681 ns per loop
3. Loop and delete key if v is None
In [10]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
...: for k, v in dic.items():
...:     if v is None:
...:         del dic[k]
...:
10000000 loops, best of 7: 160 ns per loop
So loop-and-delete is the fastest at 160 ns, the dict comprehension takes more than twice as long at ~375 ns, and the version with a call to dict() takes nearly twice as long again at ~680 ns.
Wrapping 3 into a function brings it back down again to about 275 ns. Also, for me PyPy was about twice as fast as plain Python.
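To repeat a comparison like this on Python 3, where xrange is gone and deleting during iteration over items() raises an error, something along these lines could be used. This is only a sketch with made-up function names; each timed call builds a fresh dict, so the cost of make_dict is included in both measurements and the numbers will differ from the ones above.

```python
import timeit


def make_dict():
    """Build the benchmark input: ten integer values plus two None entries."""
    dic = {str(i): i for i in range(10)}
    dic['10'] = None
    dic['5'] = None
    return dic


def rebuild(dic):
    """Create a new dict without the None values."""
    return {k: v for k, v in dic.items() if v is not None}


def delete_in_place(dic):
    """Delete the None values from the dict that was passed in."""
    for k in [k for k, v in dic.items() if v is None]:
        del dic[k]
    return dic


for func in (rebuild, delete_in_place):
    # Each call gets a fresh dict so every run has the same work to do.
    seconds = timeit.timeit(lambda: func(make_dict()), number=10000)
    print(func.__name__, seconds)
```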

Create a dictionary of non-contradicting items from a list of dictionaries

This question is inspired by this question. I'd like to get a dictionary from a list of dictionaries that should contain all key/value pairs from all dictionaries that are either only contained once, or where all dictionaries agree on the associated value. Example (taken from the aforementioned posting):
dicts = [dict(a=3, b=89, d=2), dict(a=3, b=89, c=99), dict(a=3, b=42, c=33)]
print dict_intersection(dicts)
should yield
{'a': 3, 'd': 2}
My current implementation looks like this:
import collections
def dict_intersection(dicts):
    c=collections.defaultdict(set)
    for d in dicts:
        for a, b in d.iteritems():
            c[a].add(b)
    return {a: next(iter(b)) for a, b in c.iteritems() if len(b) == 1}
So my question: Can this be done more elegantly?
Sidequestion: can next(iter(b)) be done better without modification of the underlying dictionary (i.e. not b.pop())?
dicts = [dict(a=3, b=89, d=2), dict(a=3, b=89, c=99), dict(a=3, b=42, c=33)]
data = {}
for d in dicts:
    for k, v in d.iteritems():
        data.setdefault(k, set()).add(v)
out = dict((k, v.pop()) for k, v in data.iteritems() if len(v) == 1)
# out == {'a': 3, 'd': 2}
… or a one-liner:
import itertools as it
dict((k, v.pop()[1]) for k,v in ((k, set(v)) for k, v in it.groupby(sorted(it.chain(*(d.iteritems() for d in dicts))), key=lambda x: x[0])) if len(v) == 1)
Yours is pretty close to as elegant as I can think of. The only change I would make is to replace the nested for loop with an itertools.chain()'ed iterator, like this:
import collections
import itertools
def dict_intersection(dicts):
    c=collections.defaultdict(set)
    for k,v in itertools.chain(*[d.iteritems() for d in dicts]):
        c[k].add(v)
    return {a: next(iter(b)) for a, b in c.iteritems() if len(b) == 1}
Edit(1): The below code answers a slightly different question - how to get any entry which appears with the same key and value in at least two of the input dictionaries.
My answer from the comments in the other question:
dict(
[k for k,count in
collections.Counter(itertools.chain(*[d.iteritems() for d in dicts])).iteritems()
if count > 1]
)
This is nominally a "one-liner" but I've spread it over multiple lines to (hopefully) make it a bit clearer.
The way it works is (starting from the inside and working out):
Use itertools.chain() to get an iterator over the elements of all the dictionaries.
Use collections.Counter() to count how many times each key, value pair appears in the dictionaries.
Use a list comprehension to filter the Counter for those key, value pairs occurring at least twice.
Convert the list back into a dict.
All solutions so far assume that all dictionary values are hashable. Since the code won't get slower and only little more complex without this assumption, I'd drop it. Here's a version that works for all values that support !=:
def dict_intersection(dicts):
    result = {}
    conflicting = set()
    for d in dicts:
        for k, v in d.iteritems():
            if k not in conflicting and result.setdefault(k, v) != v:
                del result[k]
                conflicting.add(k)
    return result
The set conflicting will only contain dictionary keys, which will always be hashable.
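Ported to Python 3, where iteritems() becomes items(), the same idea can be tried on values that are not hashable. The second input list below is made up purely for illustration.

```python
def dict_intersection(dicts):
    """Keep the pairs whose key never appears with two different values."""
    result = {}
    conflicting = set()
    for d in dicts:
        for k, v in d.items():
            # setdefault returns the value already stored for k, if any.
            if k not in conflicting and result.setdefault(k, v) != v:
                del result[k]
                conflicting.add(k)
    return result


dicts = [dict(a=3, b=89, d=2), dict(a=3, b=89, c=99), dict(a=3, b=42, c=33)]
print(dict_intersection(dicts))

# Lists are not hashable, so the set-based versions would fail on this input.
unhashable = [{'x': [1, 2]}, {'x': [1, 2], 'y': [3]}, {'y': [4]}]
print(dict_intersection(unhashable))
```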
To get the intersection:
dict(reduce(lambda x, y: x & y, map(set, map(lambda x: x.iteritems(), dicts))))
Of course, this drops unique values, so we need to get the complement:
dict(reduce(lambda x, y: x - y, map(set, map(lambda x: x.iteritems(), dicts))))
Combining the resulting dictionaries gives us the result set:
def dict_intersection(dicts):
    x = dict(reduce(lambda x, y: x & y, map(set, map(lambda x: x.iteritems(), dicts))))
    y = dict(reduce(lambda x, y: x - y, map(set, map(lambda x: x.iteritems(), dicts))))
    return dict(x.items() + y.items())
If my set fu was stronger I could get it down to a one liner, but not today it seems.
