Is it possible to make a function that will return a value from a nested dict depending on the arguments?
def foo(key):
    d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}}
    return d[key]
foo(['c']['d'])
I'm expecting:
3
I'm getting:
TypeError: list indices must be integers or slices, not str
I understand that it's possible to return the whole dict, or to hard-code the function to return a particular part of the dict, like
if 'c' in kwargs and 'd' in kwargs:
    return d['c']['d']
elif 'c' in kwargs and 'e' in kwargs:
    return d['c']['e']
but that would be very inflexible.
When you write foo(['c']['d']), you index the list ['c'] with the string 'd', which isn't possible. So one fix is to correct the indexing:
foo('c')['d']
Or you could alter your function to slice it:
def foo(*args):
    d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}}
    d_old = dict(d)  # in case you need to keep the original dict for other operations in the function
    for i in args:
        d = d[i]
    return d
>>> foo('c','d')
3
d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}}

def funt(keys):
    val = d
    for key in keys:
        if isinstance(val, dict):
            val = val.get(key)
        else:
            return None  # the path goes deeper than the data
    return val
funt(['c', 'd'])
The isinstance check handles the case where a key is not present (val becomes None) or the path runs deeper than the nesting; the function then returns None instead of raising an exception.
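For example, with the d defined above:
>>> funt(['c', 'd'])
3
>>> funt(['c', 'x'])   # missing key: returns None
>>> funt(['a', 'x'])   # path goes deeper than the data: returns None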
One possible solution would be to iterate over multiple keys -
def foo(keys, d=None):
    if d is None:
        d = {'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}}
    if len(keys) == 1:
        return d[keys[0]]
    return foo(keys[1:], d[keys[0]])
foo(['c', 'd'])
# 3
This question is an extension of the questions here and here.
What is a good approach to mapping a function onto a specified key path in nested dicts, supporting these path specifications:
A list of keys at a given path position
Key slices (assuming sorted keys)
Wildcards (i.e. all keys at a path position)
Ragged hierarchies, handled by ignoring keys that don't appear at a given level
If it makes things simpler, you can assume that only dicts are nested, no lists of dicts, since a list can always be converted to a dict with dict(enumerate(...)).
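For example:
>>> dict(enumerate(['x', 'y']))
{0: 'x', 1: 'y'}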
However, the hierarchy can be ragged, e.g.:
data = {0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': 300}}}
I would like to be able to specify the key path like this:
map_at(f, ['*',['b','c'],'d'])
To return:
{0: {'a': 1, 'b': 2},
1: {'a': 10, 'c': 13},
2: {'a': 20, 'b': {'d': f(100), 'e': 101}, 'c': 23},
3: {'a': 30, 'b': 31, 'c': {'d': f(300)}}}
Here f is mapped to key paths [2,b,d] and [3,c,d].
Slicing would be specified as, e.g., [0:3, b].
I think the path spec is unambiguous, though it could be generalized to, for example, match a key path prefix (in which case f would also be mapped at [0, b] and other paths).
Can this be implemented via comprehensions and recursion, or does it require heavy lifting to catch KeyError etc.?
Please do not suggest Pandas as an alternative.
I'm not a big fan of pseudo-code, but in this kind of situation, you need to write down an algorithm. Here's my understanding of your requirements:
map_at(func, path_pattern, data):
    if path_pattern is not empty:
        if data is terminal, it's a failure: we did not match the full path_pattern, so there is no reason to apply the function. Just return data.
        else, we have to explore every path in data. We consume the head of path_pattern if possible. That is, return a dict data key -> map_at(func, new_path, data value), where new_path is the tail of path_pattern if the key matches the head, else path_pattern itself.
    else, it's a success, because all of path_pattern was consumed:
        if data is terminal, return func(data)
        else, find the leaves and apply func: return a dict data key -> map_at(func, [], data value)
Notes:
I assume that the pattern *-b-d matches the path 0-a-b-c-d-e;
it's an eager algorithm: the head of the path is always consumed when possible;
if the path is fully consumed, every terminal should be mapped;
it's a simple DFS, thus it should be possible to write an iterative version with an explicit stack (a sketch appears after the second version of the code below).
Here's the code:
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:  # EDIT: avoid "break" in the dict comprehension if pattern is not a list.
            return False
    if path_pattern:
        head, *tail = path_pattern
        try:  # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k, v in data.items()}
        except AttributeError:  # fail: terminal data but path_pattern was not consumed
            return data
    else:  # success: path_pattern is empty.
        try:  # not a leaf: map every leaf of every path
            return {k: map_at(func, [], v) for k, v in data.items()}
        except AttributeError:  # a leaf: map it
            return func(data)
Note that tail if matches(head, k) else path_pattern means: consume head if possible. To use a range in the pattern, just use range(...).
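For example, with the data from the question, a range pattern might be used like this (expected output derived from the matching rules above):
>>> map_at(str, [range(0, 3), 'b'], data)
{0: {'a': 1, 'b': '2'}, 1: {'a': 10, 'c': 13}, 2: {'a': 20, 'b': {'d': '100', 'e': '101'}, 'c': 23}, 3: {'a': 30, 'b': 31, 'c': {'d': 300}}}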
As you can see, you never escape from case 2: if the path_pattern is empty, you just have to map all the leaves, whatever happens. This is clearer in this version:
def map_all_leaves(func, data):
    """Apply func to all leaves."""
    try:
        return {k: map_all_leaves(func, v) for k, v in data.items()}
    except AttributeError:
        return func(data)
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:  # EDIT: avoid "break" in the dict comprehension if pattern is not a list.
            return False
    if path_pattern:
        head, *tail = path_pattern
        try:  # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k, v in data.items()}
        except AttributeError:  # fail: terminal data but path_pattern was not consumed
            return data
    else:
        return map_all_leaves(func, data)
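As a follow-up to the notes above, here is a minimal iterative sketch of the same DFS (my addition, not part of the original answer): it assumes dicts are detected with isinstance rather than duck typing, and it mutates a deep copy in place using an explicit stack.
import copy

def map_at_iterative(func, path_pattern, data):
    def matches(pattern, value):  # same matching rule as above
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:
            return False
    result = copy.deepcopy(data)
    stack = [(result, path_pattern)]
    while stack:
        node, pattern = stack.pop()
        for k, v in node.items():
            if isinstance(v, dict):
                if pattern:
                    head, *tail = pattern
                    stack.append((v, tail if matches(head, k) else pattern))
                else:
                    stack.append((v, []))  # pattern fully consumed: map every leaf below
            elif not pattern or (len(pattern) == 1 and matches(pattern[0], k)):
                node[k] = func(v)  # a leaf reached with the pattern consumed
    return result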
EDIT
If you want to handle lists, you can try this:
def map_at(func, path_pattern, data):
    def matches(pattern, value):
        try:
            return pattern == '*' or value == pattern or value in pattern
        except TypeError:  # EDIT: avoid "break" in the dict comprehension if pattern is not a list.
            return False
    def get_items(data):
        try:
            return data.items()
        except AttributeError:
            return enumerate(data)  # raises TypeError if data is not iterable
    if path_pattern:
        head, *tail = path_pattern
        try:  # try to consume head for each key of data
            return {k: map_at(func, tail if matches(head, k) else path_pattern, v) for k, v in get_items(data)}
        except TypeError:  # fail: terminal data but path_pattern was not consumed
            return data
    else:  # success: path_pattern is empty.
        try:  # not a leaf: map every leaf of every path
            return {k: map_at(func, [], v) for k, v in get_items(data)}
        except TypeError:  # a leaf: map it
            return func(data)
The idea is simple: enumerate is, for a list, the equivalent of dict.items:
>>> list(enumerate(['a', 'b']))
[(0, 'a'), (1, 'b')]
>>> list({0:'a', 1:'b'}.items())
[(0, 'a'), (1, 'b')]
Hence, get_items is just a wrapper to return the dict items, the list items (index, value) or raise an error.
The flaw is that lists are converted to dicts in the process:
>>> data2 = [{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': 300}}]
>>> map_at(type,['*',['b','c'],'d'],data2)
{0: {'a': 1, 'b': 2}, 1: {'a': 10, 'c': 13}, 2: {'a': 20, 'b': {'d': <class 'int'>, 'e': 101}, 'c': 23}, 3: {'a': 30, 'b': 31, 'c': {'d': <class 'int'>}}}
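If preserving lists matters, one possible workaround (a sketch of my own, not part of the original answer) is to post-process the result, walking the original and the mapped structure together and turning index-keyed dicts back into lists:
def restore_lists(original, mapped):
    # convert index-keyed dicts back to lists, guided by the original structure
    if isinstance(original, list):
        return [restore_lists(v, mapped[i]) for i, v in enumerate(original)]
    if isinstance(original, dict):
        return {k: restore_lists(original[k], mapped[k]) for k in original}
    return mapped  # leaf: keep the (possibly mapped) value

>>> restore_lists(data2, map_at(type, ['*', ['b', 'c'], 'd'], data2))
[{'a': 1, 'b': 2}, {'a': 10, 'c': 13}, {'a': 20, 'b': {'d': <class 'int'>, 'e': 101}, 'c': 23}, {'a': 30, 'b': 31, 'c': {'d': <class 'int'>}}]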
EDIT
Since you are looking for something like Xpath for JSON, you could try https://pypi.org/project/jsonpath/ or https://pypi.org/project/jsonpath-rw/. (I did not test those libs).
This is not very simple, and it is less efficient, but it should work:
def map_at(f, kp, d):
    return map_at0(f, kp, d, 0)

def slice_contains(s, i):  # no negative-index support
    a = s.start or 0
    return i >= a and (s.stop is None or i < s.stop) and \
        not (i - a) % (s.step or 1)

def map_at0(f, kp, d, i):
    if i == len(kp):
        return f(d)
    if not isinstance(d, dict):
        return d  # no such path here
    ret = {}
    p = kp[i]
    if isinstance(p, str) and p != '*':
        p = p,  # wrap a single key in a tuple
    for j, (k, v) in enumerate(sorted(d.items())):
        if p == '*' or (slice_contains(p, j) if isinstance(p, slice) else k in p):
            v = map_at0(f, kp, v, i + 1)
        ret[k] = v
    return ret
Note that this copies every dictionary it expands (because it matches the key path, even if no further keys match and f is never applied) but returns unmatched subdictionaries by reference. Note also that '*' can be “quoted” by putting it in a list.
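A usage sketch (my example, not the answerer's): since 0:3 is only legal syntax inside subscripts, a slice in the key path is written with slice(...):
>>> map_at(str, [slice(0, 2), 'b'], data)
{0: {'a': 1, 'b': '2'}, 1: {'a': 10, 'c': 13}, 2: {'a': 20, 'b': {'d': 100, 'e': 101}, 'c': 23}, 3: {'a': 30, 'b': 31, 'c': {'d': 300}}}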
I think you might appreciate this refreshing generator implementation -
def select(sel = [], d = {}, res = []):
    # (base case: no selector)
    if not sel:
        yield (res, d)
    # (inductive: a selector) non-dict
    elif not isinstance(d, dict):
        return
    # (inductive: a selector, a dict) wildcard selector
    elif sel[0] == '*':
        for (k, v) in d.items():
            yield from select \
                ( sel[1:]
                , v
                , [*res, k]
                )
    # (inductive: a selector, a dict) list selector
    elif isinstance(sel[0], list):
        for s in sel[0]:
            yield from select \
                ( [s, *sel[1:]]
                , d
                , res
                )
    # (inductive: a selector, a dict) single selector
    elif sel[0] in d:
        yield from select \
            ( sel[1:]
            , d[sel[0]]
            , [*res, sel[0]]
            )
    # (inductive: single selector not in dict) no match
    else:
        return
It works like this -
data = \
{ 0: { 'a': 1, 'b': 2 }
, 1: { 'a': 10, 'c': 13 }
, 2: { 'a': 20, 'b': { 'd': 100, 'e': 101 }, 'c': 23 }
, 3: { 'a': 30, 'b': 31, 'c': { 'd': 300 } }
}
for (path, v) in select(['*', ['b', 'c'], 'd'], data):
    print(path, v)
# [2, 'b', 'd'] 100
# [3, 'c', 'd'] 300
Because select returns an iterable, you can use the conventional map function on it -
s = select(['*',['b','c'],'d'], data)
work = lambda r: f"path: {r[0]}, value: {r[1]}"
for x in map(work, s):
    print(x)
# path: [2, 'b', 'd'], value: 100
# path: [3, 'c', 'd'], value: 300
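Since select yields (path, value) pairs, a map_at-style transform can be built on top of it. A minimal sketch (the map_at_via_select helper is my addition; it assumes every matched path is non-empty):
import copy

def map_at_via_select(f, sel, d):
    # apply f at every path matched by select, working on a deep copy
    out = copy.deepcopy(d)
    for path, v in select(sel, d):
        node = out
        for k in path[:-1]:
            node = node[k]
        node[path[-1]] = f(v)
    return out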
I'm writing a Python script that parses RSS feeds. I want to maintain a dictionary of entries from the feed that gets updated periodically. Entries that no longer exist in the feed should be removed, new entries should get a default value, and the values for previously seen entries should remain unchanged.
This is best explained by example, I think:
>>> old = {
... 'a': 1,
... 'b': 2,
... 'c': 3
... }
>>> new = {
... 'c': 'x',
... 'd': 'y',
... 'e': 'z'
... }
>>> out = some_function(old, new)
>>> out
{'c': 3, 'd': 'y', 'e': 'z'}
Here's my current attempt at this:
def merge_preserving_old_values_and_new_keys(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out
This works, but it seems to me there might be a better or more clever way.
EDIT: If you feel like testing your function:
def my_merge(old, new):
    pass
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = my_merge(old, new)
assert out == {'c': 3, 'd': 'y', 'e': 'z'}
EDIT 2:
Defining Martijn Pieters' answer as set_merge, bravosierra99's as loop_merge, and my first attempt as orig_merge, I get the following timing results:
>>> setup="""
... old = {'a': 1, 'b': 2, 'c': 3}
... new = {'c': 'x', 'd': 'y', 'e': 'z'}
... from __main__ import set_merge, loop_merge, orig_merge
... """
>>> timeit.timeit('set_merge(old, new)', setup=setup)
3.4415210600000137
>>> timeit.timeit('loop_merge(old, new)', setup=setup)
1.161155690000669
>>> timeit.timeit('orig_merge(old, new)', setup=setup)
1.1776735319999716
I find this surprising, since I didn't expect the dictionary view approach to be that much slower.
Dictionaries have dictionary view objects that act as sets. Use these to get the intersection between old and new:
def merge_preserving_old_values_and_new_keys(old, new):
    result = new.copy()
    result.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return result
The above uses the Python 2 syntax; use old.keys() & new.keys() if you are using Python 3, for the same results:
def merge_preserving_old_values_and_new_keys(old, new):
    # Python 3 version
    result = new.copy()
    result.update((k, old[k]) for k in old.keys() & new.keys())
    return result
The above takes all key-value pairs from new as a starting point, then adds the values for old for any key that appears in both.
Demo:
>>> merge_preserving_old_values_and_new_keys(old, new)
{'c': 3, 'e': 'z', 'd': 'y'}
Note that the function, like your version, produces a new dictionary (although the key and value objects are shared between input and output; it is a shallow copy).
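A quick illustration of that sharing, using the Python 3 version above:
>>> old = {'a': [1]}
>>> merged = merge_preserving_old_values_and_new_keys(old, {'a': None})
>>> merged['a'] is old['a']
True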
You could also just update the new dictionary in-place if you don't need that new dictionary for anything else:
def merge_preserving_old_values_and_new_keys(old, new):
    new.update((k, old[k]) for k in old.viewkeys() & new.viewkeys())
    return new
You could also use a one-liner dict comprehension to build a new dictionary:
def merge_preserving_old_values_and_new_keys(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}
This should be more efficient, since you are no longer iterating through the entire old.items(). Additionally, it's clearer what you are trying to do this way, since you aren't overwriting some values.
def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        if k in old:
            out[k] = old[k]
        else:
            out[k] = v
    return out
old = {
'a': 1,
'b': 2,
'c': 3
}
new = {
'c': 'x',
'd': 'y',
'e': 'z'
}
def merge_preserving_old_values_and_new_keys(o, n):
    out = {}
    for k in n:
        if k in o:
            out[k] = o[k]
        else:
            out[k] = n[k]
    return out

print(merge_preserving_old_values_and_new_keys(old, new))
I'm not 100% sure this is the best way to add this information to the discussion: feel free to edit/redistribute it if necessary.
Here are timing results for all of the methods discussed here.
from timeit import timeit

def loop_merge(old, new):
    out = {}
    for k, v in new.items():
        if k in old:
            out[k] = old[k]
        else:
            out[k] = v
    return out

def set_merge(old, new):
    out = new.copy()
    out.update((k, old[k]) for k in old.keys() & new.keys())
    return out

def comp_merge(old, new):
    return {k: old[k] if k in old else v for k, v in new.items()}

def orig_merge(old, new):
    out = {}
    for k, v in new.items():
        out[k] = v
    for k, v in old.items():
        if k in out:
            out[k] = v
    return out
old = {'a': 1, 'b': 2, 'c': 3}
new = {'c': 'x', 'd': 'y', 'e': 'z'}
out = {'c': 3, 'd': 'y', 'e': 'z'}
assert loop_merge(old, new) == out
assert set_merge(old, new) == out
assert comp_merge(old, new) == out
assert orig_merge(old, new) == out
setup = """
from __main__ import old, new, loop_merge, set_merge, comp_merge, orig_merge
"""
for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(old, new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))
size = 10**4
large_old = {i: 'old' for i in range(size)}
large_new = {i: 'new' for i in range(size//2, size)}
setup = """
from __main__ import large_old, large_new, loop_merge, set_merge, comp_merge, orig_merge
"""
for a in ['loop', 'set', 'comp', 'orig']:
    time = timeit('{}_merge(large_old, large_new)'.format(a), setup=setup)
    print('{}: {}'.format(a, time))
For small dictionaries, the winner is the improved looping method!
$ python3 merge.py
loop: 0.7791572390015062 # small dictionaries
set: 3.1920828100010112
comp: 1.1180207730030816
orig: 1.1681104259987478
loop: 927.2149353210007 # large dictionaries
set: 1696.8342713210004
comp: 902.039078668
orig: 1373.0389542560006
I'm disappointed, because the dictionary view/set operation method is much cooler.
With larger dictionaries (10^4 items), the dictionary comprehension method pulls ahead of the improved looping method and far ahead of the original method. The set operation method still performs the slowest.
In Python, if I have a dictionary with subdictionaries
d = {
'a' : {
'aa': {},
'ab': {},
},
'b' : {
'ba': {},
'bb': {},
}
}
how can I get the keys of every subdictionary?
d.keys()
d['a'].keys()
d['b'].keys()
This is the normal way, but if I have many subdictionaries, how can I get the keys of every one?
EDIT
I need the keys to access values in a dictionary with five or more levels,
d[k1][k2][k3][k4][k5]
in some cases I need the information "under" the k2 key, in other cases "under" the k3 key, etc.
If you need the keys of all levels, you can recurse:
def nested_keys(d):
    yield d.keys()
    for value in d.values():
        if isinstance(value, dict):
            for res in nested_keys(value):
                yield res
This is a generator function; you'd loop over the output or call list on it. It yields sequences of keys, not individual keys. In Python 3 these are dictionary views (plain lists in the Python 2 session below), and empty dictionaries are included:
>>> d = {
... 'a' : {
... 'aa': {},
... 'ab': {},
... },
... 'b' : {
... 'ba': {},
... 'bb': {},
... }
... }
>>> def nested_keys(d):
...     yield d.keys()
...     for value in d.values():
...         if isinstance(value, dict):
...             for res in nested_keys(value):
...                 yield res
...
>>> for keys in nested_keys(d):
...     print keys
...
['a', 'b']
['aa', 'ab']
[]
[]
['ba', 'bb']
[]
[]
This isn't all that useful, really, as you don't know what dictionary the keys belonged to.
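If you do need to know where the keys came from, a small Python 3 variant (a sketch; the nested_keys_with_path name is mine) can carry the path along:
def nested_keys_with_path(d, path=()):
    # yield (path, keys) pairs so each key view is tied to its parent dict
    yield path, d.keys()
    for k, v in d.items():
        if isinstance(v, dict):
            yield from nested_keys_with_path(v, path + (k,))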
This solution works for arbitrarily nested dictionaries:
d = {
'a' : {
'aa': {},
'ab': {},
},
'b' : {
'ba': {},
'bb': {},
}
}
from itertools import chain

def rec(current_dict):
    children = []
    for k in current_dict:
        yield k
        if isinstance(current_dict[k], dict):
            children.append(rec(current_dict[k]))
    for k in chain.from_iterable(children):
        yield k

print(list(rec(d)))
# ['a', 'b', 'aa', 'ab', 'ba', 'bb']
This depends on whether you have dictionaries in the subdictionaries or not. You can create a function that will check the types via recursion:
def checksubtype(d):
    # d is a dictionary; check the subtypes.
    for k in d:
        if type(d[k]) == type(d):
            print 'key', k, 'contains a dictionary'
            checksubtype(d[k])
        else:
            print 'key', k, 'has type', type(d[k])
>>> d = {'a': {'aa': [1, 2, 3], 'bb': [3, 4]}, 'b': {'ba': [5, 6], 'bb': [7, 8]}}
>>> checksubtype(d)
key a contains a dictionary
key aa has type <type 'list'>
key bb has type <type 'list'>
key b contains a dictionary
key ba has type <type 'list'>
key bb has type <type 'list'>
I used a direct check of type rather than isinstance in order to show more obviously what is meant.
Something like:
for (a, b) in kwargs.iteritems():
    if not b:
        del kwargs[a]
This code raises an exception because the dictionary changes size during iteration.
The only solution I've found is not pretty, using another dictionary:
res = {}
res.update((a, b) for a, b in kwargs.iteritems() if b is not None)
Thanks
Another way to write it is
res = dict((k,v) for k,v in kwargs.iteritems() if v is not None)
In Python 3, this becomes
res = {k:v for k,v in kwargs.items() if v is not None}
You can also use filter:
d = dict(a=1, b=None, c=3)
filtered = dict(filter(lambda item: item[1] is not None, d.items()))
print(filtered)
{'a': 1, 'c': 3}
d = {'a': None, 'b': 'myname', 'c': 122}
print(dict(filter(lambda x: x[1], d.items())))
{'b': 'myname', 'c': 122}
Note that filtering on truthiness like this drops every falsy value (0, '', empty containers), not just None.
I like the variation of your second method:
res = dict((a, b) for (a, b) in kwargs.iteritems() if b is not None)
it's Pythonic, and I don't think it's that ugly. A variation of your first is:
for (a, b) in list(kwargs.iteritems()):
    if b is None:
        del kwargs[a]
If you need to handle nested dicts, then you can leverage a simple recursive approach:
# Python 2
from collections import Mapping

def filter_none(d):
    if isinstance(d, Mapping):
        return dict((k, filter_none(v)) for k, v in d.iteritems() if v is not None)
    else:
        return d

# Python 3
from collections.abc import Mapping

def filter_none(d):
    if isinstance(d, Mapping):
        return {k: filter_none(v) for k, v in d.items() if v is not None}
    else:
        return d
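For example, with the Python 3 version:
>>> filter_none({'a': 1, 'b': None, 'c': {'d': None, 'e': 2}})
{'a': 1, 'c': {'e': 2}}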
To anybody who may be interested, here's another way to get rid of None values. Instead of deleting the key, I replace the None value with a placeholder under the same key.
One use case is applying it with Spark RDD.map to null-valued JSON.
def filter_null(data, placeholder="[spark]nonexists"):
    # Replace every `None` in the dict with the value of `placeholder`
    return dict((k, filter_null(v, placeholder) if isinstance(v, dict)
                 else v if v is not None else placeholder)
                for k, v in data.iteritems())
Sample output:
>>> filter_null({'a':None,'b':"nul", "c": {'a':None,'b':"nul"}})
{'a': '[spark]nonexists', 'c': {'a': '[spark]nonexists', 'b': 'nul'}, 'b': 'nul'}
For Python 3, change iteritems() to items().
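That is, a Python 3 version would presumably look like:
def filter_null(data, placeholder="[spark]nonexists"):
    # Replace every `None` in the dict with the value of `placeholder`
    return {k: filter_null(v, placeholder) if isinstance(v, dict)
            else v if v is not None else placeholder
            for k, v in data.items()}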
A recursive approach that also filters nested lists of dicts in the dictionary:
def filter_none(d):
    if isinstance(d, dict):
        return {k: filter_none(v) for k, v in d.items() if v is not None}
    elif isinstance(d, list):
        return [filter_none(v) for v in d]
    else:
        return d
Sample output:
data = {'a': 'b', 'c': None, 'd': {'e': 'f', 'h': None, 'i': [{'j': 'k', 'l': None}]}}
print(filter_none(data))
# {'a': 'b', 'd': {'e': 'f', 'i': [{'j': 'k'}]}}