Efficient way to remove keys with empty strings from a dict

I have a dict and would like to remove all the keys for which there are empty value strings.
metadata = {u'Composite:PreviewImage': u'(Binary data 101973 bytes)',
            u'EXIF:CFAPattern2': u''}
What is the best way to do this?

Python 2.X
dict((k, v) for k, v in metadata.iteritems() if v)
Python 2.7 - 3.X
{k: v for k, v in metadata.items() if v}
Note that all of your keys have values. It's just that some of those values are the empty string. There's no such thing as a key in a dict without a value; if it didn't have a value, it wouldn't be in the dict.
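As a rough illustration of what the truthiness test does, here is a minimal Python 3 sketch. The 'EXIF:Orientation' entry is a made-up key added only to show that "if v" drops every falsy value (0, None, empty lists), not just empty strings; comparing against '' explicitly keeps the others.

```python
# Sample data modeled on the question; 'EXIF:Orientation' is hypothetical.
metadata = {
    "Composite:PreviewImage": "(Binary data 101973 bytes)",
    "EXIF:CFAPattern2": "",
    "EXIF:Orientation": 0,
}

# Truthiness test: drops every falsy value ('' but also 0, None, [], ...).
by_truthiness = {k: v for k, v in metadata.items() if v}

# Explicit comparison: drops only the empty string.
by_equality = {k: v for k, v in metadata.items() if v != ""}

print(by_truthiness)
print(by_equality)
```

Which of the two is appropriate depends on whether values other than strings can appear in the dict.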

It can get even shorter than BrenBarn's solution (and more readable I think)
{k: v for k, v in metadata.items() if v}
Tested with Python 2.7.3.

If you really need to modify the original dictionary:
empty_keys = [k for k, v in metadata.iteritems() if not v]
for k in empty_keys:
    del metadata[k]
Note that we have to make a list of the empty keys because we can't modify a dictionary while iterating through it (as you may have noticed). This is less expensive (memory-wise) than creating a brand-new dictionary, though, unless there are a lot of entries with empty values.
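In Python 3 there is no iteritems, and deleting from a dict while iterating over its live items() view raises RuntimeError. A minimal sketch of the same in-place approach for Python 3, using made-up sample data and testing specifically for the empty string:

```python
metadata = {"a": "x", "b": "", "c": "y", "d": ""}

# Collect the keys first; the list is a snapshot, so the dict can be
# modified safely in the second loop.
empty_keys = [k for k, v in metadata.items() if v == ""]
for k in empty_keys:
    del metadata[k]

print(metadata)
```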

If you want a full-featured, yet succinct approach to handling real-world data structures which are often nested, and can even contain cycles, I recommend looking at the remap utility from the boltons utility package.
After pip install boltons or copying iterutils.py into your project, just do:
from boltons.iterutils import remap
drop_falsey = lambda path, key, value: bool(value)
clean = remap(metadata, visit=drop_falsey)
This page has many more examples, including ones working with much larger objects from Github's API.
It's pure-Python, so it works everywhere, and is fully tested in Python 2.7 and 3.3+. Best of all, I wrote it for exactly cases like this, so if you find a case it doesn't handle, you can bug me to fix it right here.

Based on Ryan's solution, if you also have lists and nested dictionaries:
For Python 2:
def remove_empty_from_dict(d):
    if type(d) is dict:
        return dict((k, remove_empty_from_dict(v)) for k, v in d.iteritems() if v and remove_empty_from_dict(v))
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if v and remove_empty_from_dict(v)]
    else:
        return d
For Python 3:
def remove_empty_from_dict(d):
    if type(d) is dict:
        return dict((k, remove_empty_from_dict(v)) for k, v in d.items() if v and remove_empty_from_dict(v))
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if v and remove_empty_from_dict(v)]
    else:
        return d
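The functions above call themselves twice per value (once in the condition, once to build the result). One possible variant that cleans each value only once is sketched below; prune_empty is a hypothetical name, and like the original it drops every falsy value, including 0 and False.

```python
def prune_empty(value):
    """Return a copy of value with falsy entries removed at every level."""
    if isinstance(value, dict):
        # Clean each child once, then keep only the non-empty results.
        cleaned = ((k, prune_empty(v)) for k, v in value.items())
        return {k: v for k, v in cleaned if v}
    if isinstance(value, list):
        return [v for v in map(prune_empty, value) if v]
    return value


nested = {"a": "", "b": {"c": "", "d": [None, {"e": ""}]}, "f": ["x", ""], "g": 1}
result = prune_empty(nested)
print(result)
```

Here "b" disappears entirely because everything inside it turns out to be empty once cleaned.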

BrenBarn's solution is ideal (and pythonic, I might add). Here is another (fp) solution, however:
from operator import itemgetter
dict(filter(itemgetter(1), metadata.items()))

If you have a nested dictionary, and you want this to work even for empty sub-elements, you can use a recursive variant of BrenBarn's suggestion:
def scrub_dict(d):
    if type(d) is dict:
        return dict((k, scrub_dict(v)) for k, v in d.iteritems() if v and scrub_dict(v))
    else:
        return d

For python 3
dict((k, v) for k, v in metadata.items() if v)

Quick Answer (TL;DR)
Example01
### example01 -------------------
mydict = { "alpha":0,
           "bravo":"0",
           "charlie":"three",
           "delta":[],
           "echo":False,
           "foxy":"False",
           "golf":"",
           "hotel":" ",
           }
newdict = dict([(vkey, vdata) for vkey, vdata in mydict.iteritems() if(vdata) ])
print newdict
### result01 -------------------
result01 ='''
{'foxy': 'False', 'charlie': 'three', 'bravo': '0'}
'''
Detailed Answer
Problem
Context: Python 2.x
Scenario: Developer wishes to modify a dictionary to exclude blank values
aka remove empty values from a dictionary
aka delete keys with blank values
aka filter dictionary for non-blank values over each key-value pair
Solution
example01 use python list-comprehension syntax with simple conditional to remove "empty" values
Pitfalls
example01 only operates on a copy of the original dictionary (does not modify in place)
example01 may produce unexpected results depending on what developer means by "empty"
Does developer mean to keep values that are falsy?
If the values in the dictionary are not guaranteed to be strings, developer may have unexpected data loss.
result01 shows that only three key-value pairs were preserved from the original set
Alternate example
example02 helps deal with potential pitfalls
The approach is to use a more precise definition of "empty" by changing the conditional.
Here we only want to filter out values that evaluate to blank strings.
Here we also use .strip() to filter out values that consist of only whitespace.
Example02
### example02 -------------------
mydict = { "alpha":0,
           "bravo":"0",
           "charlie":"three",
           "delta":[],
           "echo":False,
           "foxy":"False",
           "golf":"",
           "hotel":" ",
           }
newdict = dict([(vkey, vdata) for vkey, vdata in mydict.iteritems() if(str(vdata).strip()) ])
print newdict
### result02 -------------------
result02 ='''
{'alpha': 0,
'bravo': '0',
'charlie': 'three',
'delta': [],
'echo': False,
'foxy': 'False'
}
'''
See also
list-comprehension
falsy
checking for empty string
modifying original dictionary in place
dictionary comprehensions
pitfalls of checking for empty string
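For Python 3 readers, the idea behind example02 can be sketched with an explicit type check instead of str(...).strip(). The helper name is_blank_string is hypothetical; it treats only empty or whitespace-only strings as "empty" and leaves 0, False and [] alone.

```python
mydict = {"alpha": 0, "bravo": "0", "charlie": "three", "delta": [],
          "echo": False, "foxy": "False", "golf": "", "hotel": " "}


def is_blank_string(value):
    """True only for strings that are empty or consist of whitespace."""
    return isinstance(value, str) and not value.strip()


newdict = {k: v for k, v in mydict.items() if not is_blank_string(v)}
print(newdict)
```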

Building on the answers from patriciasz and nneonneo, and accounting for the possibility that you might want to delete keys that have only certain falsy things (e.g. '') but not others (e.g. 0), or perhaps you even want to include some truthy things (e.g. 'SPAM'), then you could make a highly specific hitlist:
unwanted = ['', u'', None, False, [], 'SPAM']
Unfortunately, this doesn't quite work, because for example 0 in unwanted evaluates to True. We need to discriminate between 0 and other falsy things, so we have to use is:
any([0 is i for i in unwanted])
...evaluates to False.
Now use it to del the unwanted things:
unwanted_keys = [k for k, v in metadata.items() if any([v is i for i in unwanted])]
for k in unwanted_keys: del metadata[k]
If you want a new dictionary, instead of modifying metadata in place:
newdict = {k: v for k, v in metadata.items() if not any([v is i for i in unwanted])}
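One caveat with the identity test: "is" compares object identity, so a freshly created empty list is never the same object as the [] stored in unwanted, and string matches depend on interning. A possible alternative, sketched here in Python 3 with a hypothetical helper is_unwanted, compares by value but only between objects of exactly the same type, which still keeps 0 apart from False.

```python
unwanted = ["", None, False, [], "SPAM"]


def is_unwanted(value, hitlist=unwanted):
    """True if value equals a hitlist entry of exactly the same type."""
    return any(type(value) is type(item) and value == item for item in hitlist)


# Made-up sample data covering each kind of value discussed above.
metadata = {"a": "", "b": 0, "c": [], "d": "SPAM", "e": False, "f": None, "g": "ok"}
kept = {k: v for k, v in metadata.items() if not is_unwanted(v)}
print(kept)
```

Only "b" (the integer 0) and "g" survive, because 0 has type int while False has type bool.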

I read all replies in this thread and some referred also to this thread:
Remove empty dicts in nested dictionary with recursive function
I originally used solution here and it worked great:
Attempt 1: Too Hot (not performant or future-proof):
def scrub_dict(d):
    if type(d) is dict:
        return dict((k, scrub_dict(v)) for k, v in d.iteritems() if v and scrub_dict(v))
    else:
        return d
But some performance and compatibility concerns were raised in Python 2.7 world:
use isinstance instead of type
unroll the list comp into for loop for efficiency
use python3 safe items instead of iteritems
Attempt 2: Too Cold (Lacks Memoization):
def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v,dict):
            v = scrub_dict(v)
        if not v in (u'', None, {}):
            new_dict[k] = v
    return new_dict
DOH! This is not recursive and not at all memoizant.
Attempt 3: Just Right (so far):
def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v,dict):
            v = scrub_dict(v)
        if not v in (u'', None, {}):
            new_dict[k] = v
    return new_dict

To preserve 0 and False values but get rid of empty values you could use:
{k: v for k, v in metadata.items() if v or v == 0 or v is False}
For a nested dict with mixed types of values you could use:
def remove_empty_from_dict(d):
    if isinstance(d, dict):
        return dict((k, remove_empty_from_dict(v)) for k, v in d.items() \
                    if v or v == 0 or v is False and remove_empty_from_dict(v) is not None)
    elif isinstance(d, list):
        return [remove_empty_from_dict(v) for v in d
                if v or v == 0 or v is False and remove_empty_from_dict(v) is not None]
    else:
        if d or d == 0 or d is False:
            return d

I am currently writing a desktop data-entry application in Python for my work. It has many entry fields, some of which are not mandatory, so the user can leave them blank. For validation purposes, it is easy to grab all the entries and then discard the dictionary items whose value is empty. The code below shows how to take them out with a dictionary comprehension that keeps only the values that are not blank. I use Python 3.8.3.
data = {'':'', '20':'', '50':'', '100':'1.1', '200':'1.2'}
dic = {key:value for key,value in data.items() if value != ''}
print(dic)
{'100': '1.1', '200': '1.2'}

Dicts mixed with Arrays
The answer at Attempt 3: Just Right (so far) from BlissRage's answer does not properly handle array elements. I'm including a patch in case anyone needs it. The method handles lists with the statement block if isinstance(v, list):, which scrubs the list using the original scrub_dict(d) implementation.
#staticmethod
def scrub_dict(d):
    new_dict = {}
    for k, v in d.items():
        if isinstance(v, dict):
            v = scrub_dict(v)
        if isinstance(v, list):
            v = scrub_list(v)
        if not v in (u'', None, {}, []):
            new_dict[k] = v
    return new_dict
#staticmethod
def scrub_list(d):
    scrubbed_list = []
    for i in d:
        if isinstance(i, dict):
            i = scrub_dict(i)
        scrubbed_list.append(i)
    return scrubbed_list

An alternative way you can do this, is using dictionary comprehension. This should be compatible with 2.7+
result = {
    key: value for key, value in
    {"foo": "bar", "lorem": None}.items()
    if value
}

Here is an option if you are using pandas:
import pandas as pd
d = dict.fromkeys(['a', 'b', 'c', 'd'])
d['b'] = 'not null'
d['c'] = '' # empty string
print(d)
# convert `dict` to `Series` and replace any blank strings with `None`;
# use the `.dropna()` method and
# then convert back to a `dict`
d_ = pd.Series(d).replace('', None).dropna().to_dict()
print(d_)

Some of the methods mentioned above also drop integers and floats whose value is 0 or 0.0.
If you want to avoid that, you can use the code below (it removes empty strings and None values from nested dictionaries and nested lists):
def remove_empty_from_dict(d):
    if type(d) is dict:
        _temp = {}
        for k,v in d.items():
            if v == None or v == "":
                pass
            elif type(v) is int or type(v) is float:
                _temp[k] = remove_empty_from_dict(v)
            elif (v or remove_empty_from_dict(v)):
                _temp[k] = remove_empty_from_dict(v)
        return _temp
    elif type(d) is list:
        return [remove_empty_from_dict(v) for v in d if( (str(v).strip() or str(remove_empty_from_dict(v)).strip()) and (v != None or remove_empty_from_dict(v) != None))]
    else:
        return d

metadata ={'src':'1921','dest':'1337','email':'','movile':''}
ot = {k: v for k, v in metadata.items() if v != ''}
print(f"Final {ot}")

You also have an option with filter method:
filtered_metadata = dict( filter(lambda val: val[1] != u'', metadata.items()) )

Some benchmarking:
1. Dict comprehension recreating the dict
In [7]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
...: dic = {k: v for k, v in dic.items() if v is not None}
1000000 loops, best of 7: 375 ns per loop
2. Generator expression recreating the dict using dict()
In [8]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
...: dic = dict((k, v) for k, v in dic.items() if v is not None)
1000000 loops, best of 7: 681 ns per loop
3. Loop and delete key if v is None
In [10]: %%timeit dic = {str(i):i for i in xrange(10)}; dic['10'] = None; dic['5'] = None
...: for k, v in dic.items():
...:     if v is None:
...:         del dic[k]
...:
10000000 loops, best of 7: 160 ns per loop
So loop-and-delete is the fastest at 160 ns, the dict comprehension takes roughly twice as long at ~375 ns, and going through a call to dict() takes roughly twice as long again at ~680 ns.
Wrapping 3 into a function brings it back down again to about 275ns. Also, for me PyPy was about twice as fast as plain Python.
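The session above is Python 2 (xrange, and deleting while looping over items() only works there because items() returns a list). A rough way to repeat the comparison on Python 3 with the standard timeit module is sketched below; the function names are made up for illustration, and no timings are quoted because they vary by machine and interpreter.

```python
import timeit


def make_dict():
    d = {str(i): i for i in range(10)}
    d["10"] = None
    d["5"] = None
    return d


def comprehension(d):
    return {k: v for k, v in d.items() if v is not None}


def dict_call(d):
    return dict((k, v) for k, v in d.items() if v is not None)


def loop_delete(d):
    # Snapshot the items so deleting inside the loop is legal in Python 3.
    for k, v in list(d.items()):
        if v is None:
            del d[k]
    return d


timings = {}
for func in (comprehension, dict_call, loop_delete):
    # Every call gets a fresh dict, so the None entries are present each time.
    # The cost of make_dict() is included in all three measurements.
    timings[func.__name__] = timeit.timeit(lambda: func(make_dict()), number=10000)

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 10000 calls")
```

Rebuilding the input on every call matters for the loop-and-delete case: once the None entries have been deleted, later runs on the same dict have nothing left to remove.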

Related

Check if value exists in a dictionary of dictionaries and get the key(s)?

I have a dictionary of dictionaries:
x = {'NIFTY': {11382018: 'NIFTY19SEPFUT', 13177346: 'NIFTY19OCTFUT', 12335874: 'NIFTY19NOVFUT'}}
The dictionary has a lot of other dictionaries inside.
I want to check whether example:
y = 11382018
exists in the dictionary, if yes, get the master key in this case NIFTY and the value of the above key i.e. 'NIFTY19SEPFUT'
I can do this in the following way I assume:
for key in x.keys():
    di = x[key]
    if y in di.keys():
        inst = key
        cont = di[y]
Just wondering if there is a better way.
I was thinking along the lines of not having to loop over the entire dictionary master keys

A more compact way to retrieve both values of interest would be using a nested list comprehension:
[(k, sv) for k,v in x.items() for sk,sv in v.items() if sk == y]
# [('NIFTY', 'NIFTY19SEPFUT')]

More compact version (generic):
[(k, v[y]) for k, v in d.items() if y in v]
Or:
*next(((k, v[y]) for k, v in d.items() if y in v), 'not found')
if you can guarantee the key is found only in one nested dictionary. (Note that I have used d as dictionary here, simply because that feels more meaningful)
Code:
d = {'NIFTY': {11382018: 'NIFTY19SEPFUT', 13177346: 'NIFTY19OCTFUT', 12335874: 'NIFTY19NOVFUT'}}
y = 11382018
print([(k, v[y]) for k, v in d.items() if y in v])
# or:
# print(*next(((k, v[y]) for k, v in d.items() if y in v), 'not found'))

Straightforwardly (for only 2 levels of nesting):
x = {'NIFTY': {11382018: 'NIFTY19SEPFUT', 13177346: 'NIFTY19OCTFUT', 12335874: 'NIFTY19NOVFUT'}}
search_key = 11382018
parent_key, value = None, None
for k, inner_d in x.items():
    if search_key in inner_d:
        parent_key, value = k, inner_d[search_key]
        break
print(parent_key, value)  # NIFTY NIFTY19SEPFUT
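If many lookups are needed and looping over the master keys each time is the concern, one option is to build a reverse index once and query that instead. This is only a sketch: the name index is made up, and it assumes each inner key appears under a single master key (a later duplicate would overwrite an earlier one).

```python
x = {'NIFTY': {11382018: 'NIFTY19SEPFUT', 13177346: 'NIFTY19OCTFUT', 12335874: 'NIFTY19NOVFUT'}}

# Built once: inner key -> (master key, value).
index = {
    inner_key: (master_key, value)
    for master_key, inner in x.items()
    for inner_key, value in inner.items()
}

# Each lookup is now a single dict access; missing keys give (None, None).
inst, cont = index.get(11382018, (None, None))
print(inst, cont)
```

The index has to be rebuilt or updated whenever x changes, so this only pays off when lookups outnumber modifications.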

map a function to a dict of lists [duplicate]

I want to apply a function to all values in dict and store that in a separate dict. I am just trying to see how I can play with python and want to see how I can rewrite something like this
for i in d:
    d2[i] = f(d[i])
to something like
d2[i] = f(d[i]) for i in d
The first way of writing it is of course fine, but I am trying to figure how python syntax can be changed

If you're using Python 2.7 or 3.x:
d2 = {k: f(v) for k, v in d1.items()}
Which is equivalent to:
d2 = {}
for k, v in d1.items():
    d2[k] = f(v)
Otherwise:
d2 = dict((k, f(v)) for k, v in d1.items())

You could use map:
d2 = dict(zip(d, map(f, d.values())))
If you don't mind using an extension, you can also use valmap from the toolz library, which is functionally equivalent to the map solution:
from toolz.dicttoolz import valmap
d2 = valmap(f, d)
Besides the cleaner presentation, valmap also gives you the option of supplying the return class, for people who need something other than a dict.

d2 = dict((k, f(v)) for k,v in d.items())

Dictionaries can be nested in Python and in this case the solution d2 = {k: f(v) for k, v in d1.items()} will not work.
For nested dictionaries one needs some function to traverse the whole data structure. For instance, if values are allowed to be dictionaries themselves, one can define a function like:
def myfun(d):
    for k, v in d.iteritems():
        if isinstance(v, dict):
            d[k] = myfun(v)
        else:
            d[k] = f(v)
    return d
And then
d2 = myfun(d)
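Note that myfun rewrites d in place, so d2 and d end up being the same object, and iteritems exists only in Python 2. A possible Python 3 variant that leaves the input untouched is sketched below; map_nested is a hypothetical name and the function to apply is passed in explicitly.

```python
def map_nested(func, d):
    """Return a new dict with func applied to every non-dict value."""
    return {
        k: map_nested(func, v) if isinstance(v, dict) else func(v)
        for k, v in d.items()
    }


d = {"a": 1, "b": {"c": 2, "d": {"e": 3}}}
d2 = map_nested(lambda n: n * 10, d)
print(d2)
```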

Python inverted dictionaries

I'm currently writing a function that takes a dictionary with immutable values and returns an inverted dictionary. So far, my code is getting extremely simple tests right, but it still has some kinks to work out
def dict_invert(d):
    inv_map = {v: k for k, v in d.items()}
    return inv_map
list1 = [1,2,3,4,5,6,7,8]
list2 = {[1]:3245,[2]:4356,[3]:6578}
d = {['a']:[],['b']:[]}
d['a'].append(list1)
d['b'].append(list2)
How do I fix my code so that it passes the test cases?
My only thoughts are to change list 2 to [1:32, 2:43, 3:54, 4:65]; however, I would still have a problem with having the "[]" in the right spot. I have no idea how to do that.

The trick is to realize that multiple keys can have the same values, so when inverting, you must make sure your values map to a list of keys.
from collections import defaultdict
def dict_invert(d):
    inv_map = defaultdict(list)
    for k, v in d.items():
        inv_map[v].append(k)
    return inv_map
EDIT:
Just adding a bit of more helpful info...
The defaultdict(list) makes the default value of the dict = list() when a missing key is accessed via [] (when normally it would raise KeyError). Note that get does not invoke the default factory, so it still returns None for a missing key.
With that defaultdict in place, you can use a bit of logic to group keys together... here's an example to illustrate (from my comment above)
Original dict: K0 -> V0, K1 -> V0, K2 -> V0
Should invert to: V0 -> [K0, K1, K2]
EDIT 2:
Your tests seem to be forcing you into using a normal dict, in which case...
def dict_invert(d):
    inv_map = {}
    for k, v in d.items():
        if v not in inv_map:
            inv_map[v] = []
        inv_map[v].append(k)
    return inv_map
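A short usage sketch of the plain-dict version, with made-up input, shows the grouping. It uses setdefault as a compact equivalent of the "if v not in inv_map" check, and it assumes the values of the input are hashable, since they become keys.

```python
def dict_invert(d):
    inv_map = {}
    for k, v in d.items():
        # setdefault creates the list on first sight of v, then returns it.
        inv_map.setdefault(v, []).append(k)
    return inv_map


inverted = dict_invert({"a": 1, "b": 2, "c": 1})
print(inverted)
```

Both "a" and "c" map to 1 in the input, so they end up together in the list stored under 1.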

change key to lower case for dict or OrderedDict

Following works for a dictionary, but not OrderedDict. For od it seems to form an infinite loop. Can you tell me why?
If the function input is dict it has to return dict, if input is OrderedDict it has to return od.
def key_lower(d):
    """returns d for d or od for od with keys changed to lower case
    """
    for k in d.iterkeys():
        v = d.pop(k)
        if (type(k) == str) and (not k.islower()):
            k = k.lower()
        d[k] = v
    return d

It forms an infinite loop because of the way ordered dictionaries add new members (to the end)
Since you are using iterkeys, it is using a generator. When you assign d[k] = v you are adding the new key/value to the end of the dictionary. Because you are using a generator, that will continue to generate keys as you continue adding them.
You could fix this in a few ways. One would be to create a new ordered dict from the previous.
from collections import OrderedDict

def key_lower(d):
    newDict = OrderedDict()
    for k, v in d.iteritems():
        if (isinstance(k, (str, basestring))):
            k = k.lower()
        newDict[k] = v
    return newDict
The other way would be to not use a generator and use keys instead of iterkeys

As sberry mentioned, the infinite loop essentially arises because you are modifying and reading the dict at the same time.
Probably the simplest solution is to use OrderedDict.keys() instead of OrderedDict.iterkeys():
for k in d.keys():
    v = d.pop(k)
    if (type(k) == str) and (not k.islower()):
        k = k.lower()
    d[k] = v
as the keys are captured directly at the start, they won't get updated as items are changed in the dict.
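In Python 3, keys() returns a live view rather than a list, so the snapshot trick above would need list(d.keys()). Another way to satisfy the "dict in, dict out; OrderedDict in, OrderedDict out" requirement is to build a new mapping of the same type as the input. A minimal Python 3 sketch, assuming the input type can be constructed from an iterable of key-value pairs (true for dict and OrderedDict):

```python
from collections import OrderedDict


def key_lower(d):
    """Return a new mapping of the same type as d with string keys lower-cased."""
    return type(d)(
        (k.lower() if isinstance(k, str) else k, v) for k, v in d.items()
    )


od = key_lower(OrderedDict([("Alpha", 1), ("BETA", 2), (3, "x")]))
plain = key_lower({"Alpha": 1})
print(od, plain)
```

If two keys differ only in case, the later one overwrites the earlier one in the result.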

Is there a more pythonic way to do this dictionary iteration?

I have a dictionary in the view layer, that I am passing to my templates. The dictionary values are (mostly) lists, although a few scalars also reside in the dictionary. The lists if present are initialized to None.
The None values are being printed as 'None' in the template, so I wrote this little function to clean out the Nones before passing the dictionary of lists to the template. Since I am new to Python, I am wondering if there could be a more pythonic way of doing this?
# Clean the table up and turn Nones into ''
for k, v in table.items():
    #debug_str = 'key: %s, value: %s' % (k,v)
    #logging.debug(debug_str)
    try:
        for i, val in enumerate(v):
            if val == None: v[i] = ''
    except TypeError:
        continue;

Have you looked at defaultdict within collections? You'd have a dictionary formed via
defaultdict(list)
which initializes an empty list when a key is queried and that key does not exist.

filtered_dict = dict((k, v) for k, v in table.items() if v is not None)
or in Python 2.7+, use the dictionary comprehension syntax:
filtered_dict = {k: v for k, v in table.items() if v is not None}
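The snippets above drop keys whose value is None, whereas the question asks to turn None entries inside the lists into ''. A possible sketch of that, using made-up sample data and assuming the non-scalar values are plain lists:

```python
table = {"names": ["a", None, "c"], "count": 3, "tags": [None]}

# Replace None with '' inside list values; leave scalars untouched.
cleaned = {
    k: ["" if item is None else item for item in v] if isinstance(v, list) else v
    for k, v in table.items()
}
print(cleaned)
```

This builds a new dict and new lists instead of mutating table, which also removes the need for the try/except around enumerate.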
