Let's say we have a dict parsed from JSON, and we read values from it using a key path such as path-to.my.keys:
my_dict['path-to']['my']['keys']
On a file system we have mkdir -p to create such a path if it does not exist.
In Python, is there a similar syntax or function to create a key path in a dict, i.e. to default to an empty dict for missing keys? My Google searches were not very helpful.
TLDR
You can use dict.setdefault or collections.defaultdict.
def make_path(d: dict, *paths: str) -> None:
    for key in paths:
        d = d.setdefault(key, {})
make_path(my_dict, 'path-to', 'my', 'keys')
assert my_dict['path-to']['my']['keys'] is not None
Full details
Solution 1. dict.setdefault:
my_dict.setdefault('path-to', {}).setdefault('my', {}).setdefault('keys', {})
Pros:
my_dict stays a normal dict
Dicts are only created explicitly
No depth restriction
Cons:
You have to call setdefault at every use.
Solution 2. collections.defaultdict:
from collections import defaultdict
my_dict = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
my_dict['path-to']['my']['keys']
Pros:
You don't need to check for existence at all.
Cons:
Dicts are created implicitly, even on a plain read.
my_dict is not a plain dict.
The depth is limited by how my_dict was defined.
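A small sketch of the cons above, reusing the three-level my_dict from Solution 2 (the names created and depth_error are only for illustration): a plain read creates the whole path, and one level past the declared depth you are back to a plain dict that raises KeyError.

```python
from collections import defaultdict

my_dict = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

# A plain read is enough to create every intermediate level.
_ = my_dict['path-to']['my']['keys']
created = 'path-to' in my_dict

# The innermost values are plain dicts, so going past the declared
# depth raises KeyError again.
depth_error = None
try:
    my_dict['path-to']['my']['keys']['too']['deep'] = 1
except KeyError as exc:
    depth_error = exc
```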
Solution 3. advanced from solution 1: Make your own function
def make_path(my_dict: dict, *paths: str) -> dict:
    while paths:
        key, *paths = paths
        my_dict = my_dict.setdefault(key, {})
    return my_dict
test = {'path-to': {'test': 1}}
print(test)
make_path(test, 'path-to', 'my', 'keys')['test2'] = 4
print(test)
print(make_path(test))  # Works even when no path is passed
output:
{'path-to': {'test': 1}}
{'path-to': {'test': 1, 'my': {'keys': {'test2': 4}}}}
{'path-to': {'test': 1, 'my': {'keys': {'test2': 4}}}}
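One caveat that may be worth knowing, shown as a sketch with the same test data: if a key on the path already holds a non-dict value, setdefault returns that value and the next step fails. The variable name error is just for illustration.

```python
def make_path(my_dict: dict, *paths: str) -> dict:
    for key in paths:
        my_dict = my_dict.setdefault(key, {})
    return my_dict

test = {'path-to': {'test': 1}}

# 'test' already maps to the int 1, so the walk cannot continue below it.
error = None
try:
    make_path(test, 'path-to', 'test', 'deeper')
except AttributeError as exc:
    error = exc
```

The existing value is left untouched, so you can catch the exception and decide whether to overwrite it or report the conflict.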
Solution 4. advanced from solution 2: Make your own class
class MyDefaultDict(dict):
    def __missing__(self, key):
        self[key] = MyDefaultDict()
        return self[key]
my_dict = MyDefaultDict()
print(my_dict)
my_dict['path-to']['my']['keys'] = 'hello'
print(my_dict)
output:
{}
{'path-to': {'my': {'keys': 'hello'}}}
Conclusion
I think Solution 3 is closest to what you need, but you can use any of the other options if it fits your case better.
Addendum
What about Solution 4 when we already have a dict d parsed from JSON? Your solution starts from a MyDefaultDict() instance, not from what json.loads() returns.
If you can edit the json.loads call, then try:
import json
class MyDefaultDict(dict):
    def __missing__(self, key):
        self[key] = MyDefaultDict()
        return self[key]
data = '{"path-to": {"my": {"keys": "hello"}}}'
my_dict = json.loads(data, object_pairs_hook=MyDefaultDict)
print(type(my_dict))
output:
<class '__main__.MyDefaultDict'>
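If you cannot touch the json.loads call, one possible alternative is to convert the already parsed structure afterwards. This is only a sketch: to_default_dict is a hypothetical helper name, not something from the json module, and it copies the data instead of wrapping it.

```python
class MyDefaultDict(dict):
    def __missing__(self, key):
        self[key] = MyDefaultDict()
        return self[key]

def to_default_dict(obj):
    """Recursively copy plain dicts (also inside lists) into MyDefaultDict."""
    if isinstance(obj, dict):
        return MyDefaultDict((k, to_default_dict(v)) for k, v in obj.items())
    if isinstance(obj, list):
        return [to_default_dict(v) for v in obj]
    return obj

parsed = {"path-to": {"my": {"keys": "hello"}}}  # e.g. the result of json.loads
converted = to_default_dict(parsed)
converted["path-to"]["other"]["branch"] = 1  # missing levels are created
```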
There's the recursive defaultdict trick, which lets you set values at arbitrary paths down a nested structure without explicitly creating the path:
import json
from collections import defaultdict
nested = lambda: defaultdict(nested)
d = nested()
d['path']['to']['nested']['key'] = 'value'
print(json.dumps(d))
# {"path": {"to": {"nested": {"key": "value"}}}}
Non-existing keys will return empty defaultdicts.
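To make that last point concrete, here is a short sketch (using a def instead of the lambda, and illustrative variable names): reading a missing key inserts it, while a membership test does not.

```python
from collections import defaultdict

def nested():
    return defaultdict(nested)

d = nested()
d['path']['to'] = 'value'

before = sorted(d.keys())
_ = d['missing']['deeper']   # only a read, yet 'missing' now exists
after = sorted(d.keys())

has_other = 'other' in d     # `in` checks without creating the key
```

So if you need to test for a key without growing the structure, use `in` (or .get) rather than indexing.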
In python, do we have such similar syntax/function to create key path for dict? My google search results not very helpful.
Python doesn't have "keypath" syntax in the style of Clojure and friends, no. It can handle this specific case with the setdefault method, though, at some runtime cost for the convenience: dict.setdefault(key, default) returns the value for the key, after first setting it to default if it was missing. So my_dict.setdefault('path-to', {}).setdefault('my', {}).setdefault('keys', ???) would access the specified path, setting dicts where they are missing.
Answer to your question -- YES.
In Python you can use the subprocess module to execute any command you would normally run on the system.
You can execute the same mkdir -p command from Python to create a nested directory using subprocess.Popen.
Here is how you can do that:
import subprocess
# Create a string of nested directory path.
path_from_dict_keys = "dir1/dir2/dir3"
temp = subprocess.Popen(['mkdir', '-p', path_from_dict_keys], stdout = subprocess.PIPE)
# we use the communicate function to fetch the output
output = str(temp.communicate())
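If the goal really is just nested directories, the standard library can also do this without spawning a process. A minimal sketch, assuming Python 3.2+ for exist_ok and using a temporary base directory so it is safe to run anywhere:

```python
import os
import tempfile

# Work under a throwaway directory instead of the current one.
base = tempfile.mkdtemp()
path_from_dict_keys = os.path.join(base, "dir1", "dir2", "dir3")

# Creates all missing parents, like mkdir -p.
os.makedirs(path_from_dict_keys, exist_ok=True)
# Calling it again is harmless because of exist_ok=True.
os.makedirs(path_from_dict_keys, exist_ok=True)
```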
Related
If I have a dictionary that is nested, and I pass in a string like "key1.key2.key3" which would translate to:
myDict["key1"]["key2"]["key3"]
What would be an elegant way to be able to have a method where I could pass on that string and it would translate to that key assignment? Something like
myDict.set_nested('key1.key2.key3', someValue)
Using only builtin stuff:
def set_nested(my_dict, key_string, value):
    """Given `foo`, 'key1.key2.key3', 'something', set foo['key1']['key2']['key3'] = 'something'"""
    # Start off pointing at the original dictionary that was passed in.
    here = my_dict
    # Turn the string of key names into a list of strings.
    keys = key_string.split(".")
    # For every key *before* the last one, we concentrate on navigating through the dictionary.
    for key in keys[:-1]:
        # Try to find here[key]. If it doesn't exist, create it with an empty dictionary. Then,
        # update our `here` pointer to refer to the thing we just found (or created).
        here = here.setdefault(key, {})
    # Finally, set the final key to the given value
    here[keys[-1]] = value
myDict = {}
# Named set_nested rather than set, so the built-in set is not shadowed.
set_nested(myDict, "key1.key2.key3", "some_value")
assert myDict == {"key1": {"key2": {"key3": "some_value"}}}
This traverses myDict one key at a time, ensuring that each sub-key refers to a nested dictionary.
You could also solve this recursively, but then you risk RecursionError exceptions without any real benefit.
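For completeness, a read-side counterpart can be written with the same iterative walk. This is only a sketch; get_nested and its default parameter are names I made up for illustration.

```python
def get_nested(my_dict, key_string, default=None):
    """Follow 'key1.key2.key3' through nested dicts; return default if any step is missing."""
    here = my_dict
    for key in key_string.split("."):
        # Stop as soon as we hit a non-dict or a missing key.
        if not isinstance(here, dict) or key not in here:
            return default
        here = here[key]
    return here

myDict = {"key1": {"key2": {"key3": "some_value"}}}
found = get_nested(myDict, "key1.key2.key3")
missing = get_nested(myDict, "key1.nope.key3", default="n/a")
```

Unlike the setter, this never creates intermediate dictionaries, so lookups leave myDict unchanged.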
There are a number of existing modules that will already do this, or something very much like it. For example, the jmespath module will resolve jmespath expressions, so given:
>>> mydict={'key1': {'key2': {'key3': 'value'}}}
You can run:
>>> import jmespath
>>> jmespath.search('key1.key2.key3', mydict)
'value'
The jsonpointer module does something similar, although it uses / as the separator instead of a dot.
Given the number of pre-existing modules I would avoid trying to write your own code to do this.
EDIT: OP's clarification makes it clear that this answer isn't what he's looking for. I'm leaving it up here for people who find it by title.
I implemented a class that did this a while back... it should serve your purposes.
I achieved this by overriding the default __getattr__/__setattr__ methods for an object.
Check it out! AndroxxTraxxon/cfgutils
This lets you do some code like the following...
from cfgutils import obj
a = obj({
    "b": 123,
    "c": "apple",
    "d": {
        "e": "nested dictionary value"
    }
})
print(a.d.e)
>>> nested dictionary value
I need to parse a JSON file which, unfortunately for me, does not follow the prototype. I have two issues with the data, but I've already found a workaround for one of them, so I'll just mention it at the end; maybe someone can help there as well.
So I need to parse entries like this:
"Test":{
    "entry":{
        "Type":"Something"
    },
    "entry":{
        "Type":"Something_Else"
    }
}, ...
The default json parser updates the dictionary and therefore keeps only the last entry. I HAVE to somehow store the other one as well, and I have no idea how to do this. I also HAVE to store the keys of the various dictionaries in the same order they appear in the file, which is why I am using an OrderedDict. That works fine, so if there is any way to extend it to handle the duplicate entries I'd be grateful.
My second issue is that this very same JSON file contains entries like this:
"Test":{
    {
        "Type":"Something"
    }
}
The json.load() function raises an exception when it reaches that line in the JSON file. The only way I could work around this was to manually remove the inner brackets myself.
Thanks in advance
You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.
However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:
from json import JSONDecoder
def parse_object_pairs(pairs):
    return pairs
data = """
{"foo": {"baz": 42}, "foo": 7}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output (from Python 2; Python 3 prints the same structure without the u prefixes):
[(u'foo', [(u'baz', 42)]), (u'foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
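As one example of the first option, a lookup over the raw pairs could simply return every value stored under a key. This is just a sketch; values_for and keep_pairs are illustrative names, not part of the json module.

```python
import json

def keep_pairs(pairs):
    # Return the (key, value) pairs untouched instead of building a dict.
    return pairs

def values_for(pairs, key):
    """Return all values stored under `key`, in document order."""
    return [v for k, v in pairs if k == key]

obj = json.loads('{"foo": {"baz": 42}, "foo": 7}', object_pairs_hook=keep_pairs)
foo_values = values_for(obj, "foo")
```

The lookup is linear in the number of pairs, which is usually fine for small objects but worth keeping in mind for large ones.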
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder
def make_unique(key, dct):
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key
def parse_object_pairs(pairs):
    dct = OrderedDict()
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct
data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output (from Python 2; the repr differs slightly on Python 3):
OrderedDict([(u'foo', OrderedDict([(u'baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n, where n is an incrementing counter; adapt it to your needs.
Because the object_pairs_hook receives the pairs in exactly the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict, so I included that as well.
Thanks a lot @Lukas Graf, I got it working as well by implementing my own version of the hook function:
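import collections
def dict_raise_on_duplicates(ordered_pairs):
    # Despite the name, duplicates are kept under a suffixed key, not rejected.
    count = 0
    d = collections.OrderedDict()
    for k, v in ordered_pairs:
        if k in d:
            d[k + '_dupl_' + str(count)] = v
            count += 1
        else:
            d[k] = v
    return d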
The only thing remaining is to automatically get rid of the double brackets, and then I am done :D Thanks again
If you would prefer to convert those duplicated keys into an array, instead of having separate copies, this could do the work:
def dict_raise_on_duplicates(ordered_pairs):
    """Convert duplicate keys to JSON array."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if type(d[k]) is list:
                d[k].append(v)
            else:
                d[k] = [d[k],v]
        else:
            d[k] = v
    return d
And then you just use (the result is named data here so that the built-in dict is not shadowed):
data = json.loads(yourString, object_pairs_hook=dict_raise_on_duplicates)
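One thing to watch with the type-check approach: a key whose value is already a JSON array cannot be told apart from an array built out of duplicates. A possible variation, sketched here under the hypothetical name merge_duplicates, tracks which keys were actually duplicated.

```python
import json

def merge_duplicates(ordered_pairs):
    """Collect the values of duplicated keys into a list, in document order."""
    d = {}
    duplicated = set()  # keys whose value is a list we built ourselves
    for k, v in ordered_pairs:
        if k not in d:
            d[k] = v
        elif k in duplicated:
            d[k].append(v)
        else:
            d[k] = [d[k], v]
            duplicated.add(k)
    return d

text = ('{"Test": {"entry": {"Type": "Something"}, '
        '"entry": {"Type": "Something_Else"}}, '
        '"tags": [1, 2], "tags": [3]}')
result = json.loads(text, object_pairs_hook=merge_duplicates)
```

Here the two "tags" arrays stay separate items instead of being merged into one list.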
I see this pattern in some code that I'm writing:
e = {...} # a dictionary
e["table"] = "users"
e["timestamp"] = time.time()
queue.push(e)
del e["table"]
del e["timestamp"]
[...]
e["table"] = "events"
queue2.push(e)
del e["table"]
# etc..
I'm demultiplexing an event over several queues, but each queue has a slightly different format. I've started doing this:
queue.push( dict(e.items() + [("table", "users"), ("timestamp", time.time())]) )
but it looks ugly and it slows down the code somewhat. What else can I do?
Assuming queue.push only needs read access, you could try something like this:
class MergedDicts(dict):
    def __init__(self, *dicts, **kw):
        self.dicts = dicts + (kw,)
    def __getitem__(self, key):
        for d in self.dicts:
            if key in d:
                return d[key]
        raise KeyError(key)
This would give you a dictionary returning items from both sources, but avoid the overhead of building another actual copy from the originals (you may need to implement more than just __getitem__ though, depending on what push needs).
Usage:
other = {"table": "users", "timestamp": time.time()}
queue.push(MergedDicts(e, other))
or:
queue.push(MergedDicts(e, table="users", timestamp=time.time()))
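On Python 3 the standard library has collections.ChainMap, which provides the same kind of layered read-through view with the full mapping interface. A sketch, assuming queue.push accepts any mapping and not strictly a dict (the sample e and the fixed timestamp are made up):

```python
from collections import ChainMap

e = {"id": 1, "payload": "x"}

# Lookups search the maps left to right; nothing is copied and e is untouched.
view = ChainMap({"table": "users", "timestamp": 123.0}, e)

table = view["table"]     # found in the first (extra) map
event_id = view["id"]     # falls through to e
snapshot = dict(view)     # make a real dict only if the consumer needs one
```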
If the number of modifications to the dictionary is small compared to the size of the dictionary itself, you can avoid making a copy of it each time by creating a context manager function and using it as shown. This ensures that any changes made to the dictionary are temporary, even if an exception is thrown inside the block (that is what the try/finally is for).
from contextlib import contextmanager
@contextmanager
def contextdict(adict, **kwargs):
    # modify dictionary, remembering what has to be undone
    changed = {}
    added = []
    for key in kwargs:
        if key in adict:
            changed[key] = adict[key]
        else:
            added.append(key)
        adict[key] = kwargs[key]
    try:
        yield adict
    finally:
        # restore dictionary, even if the with block raised
        adict.update(changed)
        for key in added:
            del adict[key]
e = dict(...) # some dictionary
with contextdict(e, table="users", timestamp=time.time()) as context:
    queue.push(context)
with contextdict(e, table="events") as context:
    queue.push(context)
# e will be unchanged at this point
You can create a new dictionary with the new fields you want and use dict.update on it with the base fields
e = {...} # a dictionary
d={"table":"users", "timestamp":time.time()}
d.update(e)
queue.push(d)
You could also create a new dict from the fields given as a list of pairs:
e = {...} # a dictionary
queue.push( dict(list(e.items()) + [("table","users"), ("timestamp",time.time())]) )
If you do this a lot on large dictionaries, and don't want to create a copy, you can use a Context Manager that modifies the dictionary temporarily, automating what you're doing right now.
Another option, instead of the context manager, is performing the modification in a function, passing the operations you want to do as a function:
def modify_dict_and_call( d, newfields, f):
    for k,v in newfields.items():
        d[k]=v
    f(d)
    for k in newfields:
        del d[k]
e = {...} # a dictionary
modify_dict_and_call( e, {"table":"users", "timestamp":time.time()}, queue.push )
If you initially define e with only those keys that are common to each use case, you can use the mock library. mock.patch.dict allows you to temporarily add keys to a dictionary (for the duration of the with statement), although you cannot temporarily remove keys.
e = { ... }
with mock.patch.dict(e, table="users", timestamp=time.time()):
    queue.push(e)
with mock.patch.dict(e, table="events"):
    queue2.push(e)
mock is a third-party module for Python 2.x and for Python 3 versions before 3.3, where it was added to the standard library as unittest.mock.
I think this can be covered by the following code:
a = {'val': 2, 'val2': -5, "name": 'Vladimir'}
b = {"asdf": 1, "b2": 2}
queue.push( dict( **a, **b) )
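A caveat with dict(**a, **b) is that it raises TypeError when the two dicts share a key (and it needs string keys). The literal form {**a, **b} does not have that restriction and lets the later value win. A small sketch with made-up data, assuming Python 3.5+:

```python
e = {"id": 1, "table": "old"}
extra = {"table": "users", "timestamp": 123.0}

# Later entries win; e itself is not modified.
merged = {**e, **extra}

# The keyword-argument form rejects the duplicate "table" key.
duplicate_error = None
try:
    dict(**e, **extra)
except TypeError as exc:
    duplicate_error = exc
```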
I'm creating a dictionary structure that is several levels deep. I'm trying to do something like the following:
dict = {}
dict['a']['b'] = True
At the moment the above fails because key 'a' does not exist, so I have to check at every level of nesting and manually insert an empty dictionary. Is there some kind of syntactic sugar so that something like the above produces:
{'a': {'b': True}}
Without having to create an empty dictionary at each level of nesting?
As others have said, use defaultdict. This is the idiom I prefer for arbitrarily-deep nesting of dictionaries:
import collections
def nested_dict():
    return collections.defaultdict(nested_dict)
d = nested_dict()
d[1][2][3] = 'Hello, dictionary!'
print(d[1][2][3]) # Prints Hello, dictionary!
This also makes checking whether an element exists a little nicer, too, since you may no longer need to use get:
if not d[2][3][4][5]:
    print('That element is empty!')
This has been edited to use a def rather than a lambda for pep8 compliance. The original lambda form looked like this below, which has the drawback of being called <lambda> everywhere instead of getting a proper function name.
>>> nested_dict = lambda: collections.defaultdict(nested_dict)
>>> d = nested_dict()
>>> d[1][2][3]
defaultdict(<function <lambda> at 0x037E7540>, {})
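If that defaultdict repr gets in the way (for printing or for comparing against ordinary dicts in tests), the structure can be copied back into plain dicts. A minimal sketch; to_plain is a hypothetical helper name.

```python
import collections

def nested_dict():
    return collections.defaultdict(nested_dict)

def to_plain(obj):
    """Recursively copy nested defaultdicts into ordinary dicts."""
    if isinstance(obj, dict):
        return {k: to_plain(v) for k, v in obj.items()}
    return obj

d = nested_dict()
d['a']['b'] = True
plain = to_plain(d)
```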
Use defaultdict.
Python: defaultdict of defaultdict?
Or you can do this, since dict() function can handle **kwargs:
http://docs.python.org/2/library/functions.html#func-dict
print(dict(a=dict(b=True)))
# {'a': {'b': True}}
If the depth of your data structure is fixed (that is, you know in advance that you need mydict[a][b][c] but not mydict[a][b][c][d]), you can build a nested defaultdict structure using lambda expressions to create the inner structures:
two_level = defaultdict(dict)
three_level = defaultdict(lambda: defaultdict(dict))
four_level = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
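If you would rather not write out the lambdas for every depth, the same fixed-depth structures can be generated. This is only a sketch; fixed_depth is a made-up helper name, and functools.partial stands in for the nested lambdas.

```python
from collections import defaultdict
from functools import partial

def fixed_depth(levels):
    """Build a defaultdict that auto-creates exactly `levels` levels of keys."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    factory = dict
    # Wrap the factory once per extra level, innermost first.
    for _ in range(levels - 2):
        factory = partial(defaultdict, factory)
    return defaultdict(factory)

three_level = fixed_depth(3)   # same shape as defaultdict(lambda: defaultdict(dict))
three_level['a']['b']['c'] = 1

too_deep = None
try:
    three_level['x']['y']['z']['w'] = 1   # a fourth level is not auto-created
except KeyError as exc:
    too_deep = exc
```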