I have this nested dictionary that I get from an API.
response_body = {
    u'access_token': u'SIF_HMACSHA256lxWT0K',
    u'expires_in': 86000,
    u'name': u'Gandalf Grey',
    u'preferred_username': u'gandalf',
    u'ref_id': u'ab1d4237-edd7-4edd-934f-3486eac5c262',
    u'refresh_token': u'eyJhbGciOiJIUzI1N',
    u'roles': u'Instructor',
    u'sub': {
        u'cn': u'Gandalf Grey',
        u'dc': u'7477',
        u'uid': u'gandalf',
        u'uniqueIdentifier': u'ab1d4237-edd7-4edd-934f-3486eac5c262'
    }
}
I used the following to convert it into a Python object:
class sample_token:
    def __init__(self, **response):
        self.__dict__.update(response)
and used it like this:
s = sample_token(**response_body)
After this, I can access the values using s.access_token, s.name, etc. But the value of s.sub is also a dictionary. How can I get the values of the nested dictionary using this technique, i.e. so that s.sub.cn returns Gandalf Grey?
Maybe a recursive method like this:
>>> class sample_token:
...     def __init__(self, **response):
...         for k, v in response.items():
...             if isinstance(v, dict):
...                 self.__dict__[k] = sample_token(**v)
...             else:
...                 self.__dict__[k] = v
...
>>> s = sample_token(**response_body)
>>> s.sub
<__main__.sample_token object at 0x02CEA530>
>>> s.sub.cn
'Gandalf Grey'
We go over each key:value pair in the response, and if the value is a dictionary, we create a sample_token object for it and put that new object in __dict__.
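The recursive constructor above only converts values that are themselves dicts; a dict sitting inside a list (common in JSON payloads) would stay a plain dict. A minimal sketch, assuming you also want lists walked, using the standard library's types.SimpleNamespace in place of sample_token. The function name to_namespace and the groups field are hypothetical:

```python
from types import SimpleNamespace

def to_namespace(value):
    """Recursively convert dicts (including dicts inside lists) to SimpleNamespace."""
    if isinstance(value, dict):
        # Convert every value first, then expose the keys as attributes.
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        # Lists stay lists, but their elements are converted as well.
        return [to_namespace(item) for item in value]
    return value

response_body = {
    'name': 'Gandalf Grey',
    'sub': {'cn': 'Gandalf Grey', 'uid': 'gandalf'},
    'groups': [{'id': 1}, {'id': 2}],  # hypothetical extra field
}
s = to_namespace(response_body)
print(s.sub.cn)        # Gandalf Grey
print(s.groups[1].id)  # 2
```

SimpleNamespace(**d) requires string keys, which JSON objects always have; keys that are not valid identifiers are still reachable with getattr.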
You can iterate over all key/value pairs with response.items() and, for each value where isinstance(value, dict) is true, replace it with sample_token(**value).
Nothing will do the recursion automagically for you.
Once you've evaluated the expression in Python, it's not a JSON object anymore; it's a Python dict. The usual way to access entries is with the [] indexer notation, e.g.:
response_body['sub']['uid']
'gandalf'
If you must access it as an object rather than a dict, check out the answers to the question "Convert Python dict to object?"; the case of nested dicts is covered in one of the later answers.
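If the data arrives as a JSON string, the json module can build the objects while decoding. A sketch, assuming Python 3.3+ for types.SimpleNamespace; note that this swaps in the object_hook parameter of json.loads rather than the sample_token class:

```python
import json
from types import SimpleNamespace

raw = '{"name": "Gandalf Grey", "sub": {"cn": "Gandalf Grey", "uid": "gandalf"}}'

# object_hook is called with every decoded JSON object, innermost first,
# so the nested "sub" object is already a SimpleNamespace when the outer
# object is built.
s = json.loads(raw, object_hook=lambda d: SimpleNamespace(**d))
print(s.sub.uid)  # gandalf
```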
I need to parse a JSON file which, unfortunately for me, does not follow the prototype. I have two issues with the data, but I've already found a workaround for the second one, so I'll just mention it at the end; maybe someone can help there as well.
So I need to parse entries like this:
"Test": {
    "entry": {
        "Type": "Something"
    },
    "entry": {
        "Type": "Something_Else"
    }
}, ...
The default json parser updates the dictionary and therefore keeps only the last entry. I HAVE to somehow store the other one as well, and I have no idea how to do this. I also HAVE to store the keys of the various dictionaries in the same order they appear in the file, which is why I am using an OrderedDict. That works fine, so if there is any way to extend it to handle the duplicate entries, I'd be grateful.
My second issue is that this very same JSON file contains entries like this:
"Test": {
    {
        "Type": "Something"
    }
}
json.load() raises an exception when it reaches that line in the JSON file. The only workaround I found was to remove the inner braces manually.
Thanks in advance
You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.
However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:
from json import JSONDecoder
def parse_object_pairs(pairs):
    # Return the (key, value) pairs unchanged instead of building a dict.
    return pairs
data = """
{"foo": {"baz": 42}, "foo": 7}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output:
[(u'foo', [(u'baz', 42)]), (u'foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
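As one example of such lookup logic, here is a sketch of a helper (get_all is a hypothetical name) that returns every value stored under a key in the pair list produced by the pass-through hook. It assumes Python 3, where the decoded strings print without the u prefix:

```python
from json import JSONDecoder

def get_all(pairs, key):
    """Return every value stored under `key` in a list of (key, value) pairs."""
    return [value for k, value in pairs if k == key]

# Pass-through hook: each JSON object decodes to its raw list of pairs.
decoder = JSONDecoder(object_pairs_hook=lambda pairs: pairs)
obj = decoder.decode('{"foo": {"baz": 42}, "foo": 7}')

print(get_all(obj, "foo"))  # [[('baz', 42)], 7]
```

Returning a list for every key sidesteps the ambiguity of dct[key]: the caller always gets all matches, in document order, and decides what to do with them.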
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder
def make_unique(key, dct):
    # Append _1, _2, ... until the key no longer collides.
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key
def parse_object_pairs(pairs):
    dct = OrderedDict()
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct
data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output:
OrderedDict([(u'foo', OrderedDict([(u'baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n, where n is an incremental counter; adapt it to your needs.
Because the object_pairs_hook receives the pairs in exactly the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict; I included that as well.
Thanks a lot @Lukas Graf, I got it working as well by implementing my own version of the hook function:
import collections
def dict_raise_on_duplicates(ordered_pairs):
    # Despite the name, duplicates are renamed (key_dupl_N), not rejected.
    count = 0
    d = collections.OrderedDict()
    for k, v in ordered_pairs:
        if k in d:
            d[k + '_dupl_' + str(count)] = v
            count += 1
        else:
            d[k] = v
    return d
The only thing remaining is to automatically get rid of the double braces, and then I am done :D Thanks again
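If you would prefer to convert those duplicated keys into an array instead of having separate copies, this could do the work:
def dict_raise_on_duplicates(ordered_pairs):
    """Convert duplicate keys to JSON array."""
    d = {}
    collected = set()  # keys whose value is already the list of duplicates
    for k, v in ordered_pairs:
        if k in collected:
            d[k].append(v)
        elif k in d:
            # Second occurrence: wrap both values in a list. Tracking this in
            # `collected` avoids appending into a value that was a JSON array.
            d[k] = [d[k], v]
            collected.add(k)
        else:
            d[k] = v
    return d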
And then you just use:
import json
data = json.loads(your_string, object_pairs_hook=dict_raise_on_duplicates)
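The question also asks for key order to be preserved, which a plain {} only guarantees on Python 3.7+. A sketch combining both ideas, grouping duplicates into lists inside an OrderedDict (collect_duplicates is a hypothetical name). A separate set records which keys have already been turned into lists, so a value that is itself a JSON array is not mistaken for the list of duplicates:

```python
import json
from collections import OrderedDict

def collect_duplicates(ordered_pairs):
    """Group duplicate keys into lists while keeping first-seen key order."""
    d = OrderedDict()
    seen_twice = set()  # keys whose value has already become a list of duplicates
    for k, v in ordered_pairs:
        if k in seen_twice:
            d[k].append(v)
        elif k in d:
            d[k] = [d[k], v]
            seen_twice.add(k)
        else:
            d[k] = v
    return d

text = '{"Test": {"entry": {"Type": "Something"}, "entry": {"Type": "Something_Else"}}}'
data = json.loads(text, object_pairs_hook=collect_duplicates)
print(data["Test"]["entry"][1]["Type"])  # Something_Else
```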
I have a dictionary:
big_dict = {1: "1",
            2: "2",
            ...
            1000: "1000"}
(Note: My dictionary isn't actually numbers to strings)
I am passing this dictionary into a function that expects it, and I use the dictionary often in different functions. However, on occasion I want to pass in big_dict with an extra key:value pair, so that the dictionary I pass in would be equivalent to big_dict after running:
big_dict[1001]="1001"
But I don't want to actually add the value to the dictionary. I could make a copy of the dictionary and add it there, but I'd like to avoid the memory + CPU cycles this would consume.
The code I currently have is:
big_dict[1001]="1001"
function_that_uses_dict(big_dict)
del big_dict[1001]
While this works, it seems rather kludgy.
If this were a string I'd do:
function_that_uses_string(myString + 'what I want to add on')
Is there any equivalent way of doing this with a dictionary?
As Veedrac pointed out in his answer, this problem has already been solved in Python 3.3+ by the collections.ChainMap class:
from collections import ChainMap
function_that_uses_dict(ChainMap({1001: "1001"}, big_dict))
If you don't have Python 3.3, you should use a backport; if for some reason you don't want to, below you can see how to implement it yourself :)
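To make the ChainMap behaviour concrete, here is a small sketch with a two-entry stand-in for big_dict. It shows that lookups fall through to big_dict without copying it, and that writes land only in the first mapping:

```python
from collections import ChainMap

big_dict = {1: "1", 2: "2"}  # stand-in for the real 1000-entry dict

# Lookups search the maps left to right; big_dict is referenced, not copied.
view = ChainMap({1001: "1001"}, big_dict)
print(view[1001], view[2])  # 1001 2

# Writes and deletions only touch the first mapping.
view[3] = "3"
print(3 in big_dict)  # False
```

One caveat: ChainMap is not a dict subclass, so this works as long as function_that_uses_dict relies on mapping operations ([], get, in, iteration) rather than isinstance(x, dict) checks.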
You can create a wrapper, similarly to this:
class DictAdditionalValueWrapper:
    def __init__(self, baseDict, specialKey, specialValue):
        self.baseDict = baseDict
        self.specialKey = specialKey
        self.specialValue = specialValue
    def __getitem__(self, key):
        if key == self.specialKey:
            return self.specialValue
        return self.baseDict[key]
    # ...
Of course, you need to supply all the other dict methods, or use UserDict as a base class, which should simplify this.
and then use it like this:
function_that_uses_dict(DictAdditionalValueWrapper(big_dict, 1001, "1001"))
This can be easily extended to a whole additional dictionary of "special" keys and values, not just single additional element.
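A sketch of that extension, assuming Python 3 and basing the wrapper on collections.abc.Mapping (instead of UserDict) so that get, in, keys, items, values and == are derived from the three methods defined here. The class name DictWithExtras is hypothetical:

```python
from collections.abc import Mapping

class DictWithExtras(Mapping):
    """Read-only view of base_dict with extra keys layered on top (no copy)."""

    def __init__(self, base_dict, extras):
        self.base_dict = base_dict
        self.extras = extras

    def __getitem__(self, key):
        # Extra keys shadow keys of the same name in base_dict.
        if key in self.extras:
            return self.extras[key]
        return self.base_dict[key]

    def __iter__(self):
        # Yield extras first, then base keys not shadowed by an extra.
        yield from self.extras
        for key in self.base_dict:
            if key not in self.extras:
                yield key

    def __len__(self):
        return len(self.extras) + sum(1 for k in self.base_dict if k not in self.extras)

big_dict = {1: "1", 2: "2"}
view = DictWithExtras(big_dict, {1001: "1001", 2: "two"})
print(dict(view))  # {1001: '1001', 2: 'two', 1: '1'}
```

Note that __len__ walks big_dict, so it is O(n); if the consuming function calls len() often, caching the count would be worth it.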
You can also extend this approach to get something similar to your string example:
class AdditionalKeyValuePair:
    def __init__(self, specialKey, specialValue):
        self.specialKey = specialKey
        self.specialValue = specialValue
    def __add__(self, d):
        # Called for AdditionalKeyValuePair(...) + some_dict.
        if not isinstance(d, dict):
            raise TypeError("Not a dict in AdditionalKeyValuePair")
        return DictAdditionalValueWrapper(d, self.specialKey, self.specialValue)
and use it like this:
function_that_uses_dict(AdditionalKeyValuePair(1001, "1001") + big_dict)
If you're on 3.3+, just use ChainMap. Otherwise, use a backport.
from collections import ChainMap
new_dict = ChainMap({1001: "1001"}, old_dict)
You can pass a new dictionary that contains the extra key-value pair, leaving the original dictionary unchanged (note that this does copy the dictionary's items):
>>> def function_that_uses_bdict(big_dict):
...     print(big_dict[1001])
...
>>> dct = {1: '1', 2: '2'}
>>> function_that_uses_bdict(dict(list(dct.items()) + [(1001, '1001')]))
1001
>>> dct
{1: '1', 2: '2'}  # original unchanged
This is a bit annoying too, but you could have the function take two parameters: big_dict and a temporary dictionary created just for that call (something like fxn(big_dict, {1001: '1001'})). Then you can access both dictionaries without changing the first one and without copying big_dict.
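A minimal sketch of the two-parameter idea. The function uses_two_dicts and its inner lookup helper are hypothetical; the point is that the extra dictionary is consulted first, and big_dict is neither copied nor mutated:

```python
def uses_two_dicts(big_dict, extra=None):
    """Hypothetical consumer that checks `extra` before falling back to big_dict."""
    extra = extra if extra is not None else {}

    def lookup(key):
        # Keys in `extra` take precedence; nothing is copied or mutated.
        if key in extra:
            return extra[key]
        return big_dict[key]

    return lookup(1001), lookup(1)

big_dict = {1: "1", 2: "2"}
print(uses_two_dicts(big_dict, {1001: "1001"}))  # ('1001', '1')
```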
We are attempting to refactor and modify a Python program so that it can take a user-defined JSON file, parse it, and then execute a workflow based on the options the user defined in the JSON. Basically, the user specifies a dictionary in JSON; when the Python program parses this file, we obtain a Python dictionary, which we then pass as an argument to a class that we instantiate in a top-level module. In short, the JSON dictionary defined by the user ends up in the instance namespace when the Python program is running.
Implementing the context managers to parse the JSON inputs was not a problem for us. However, we also need to use that JSON dictionary (which is subsequently added to the instance namespace) to generate multiple lines from a Jinja2 template file, using a loop within the template. We attempted to use this key-value pair in the JSON:
"extra_scripts" : [["Altera/AlteraCommon.lua",
"Altera/StratixIV/EP4SGX70HF35C2.lua"]]
and this sits in a large dictionary object; let's call it option_space_dict. For simplicity, in this example it has only 4 key-value pairs (assume that "extra_scripts" is 'key4' here), although in our program it is much larger:
option_space_dict = {
'key1' : ['value1'],
'key2' : ['value2'],
'key3' : ['value3A', 'value3B', 'value3C'],
'key4' : [['value4A', 'value4B']]
}
which is then parsed by this line:
import itertools
option_space = [ dict(itertools.izip(option_space_dict, opt)) for opt in itertools.product(*option_space_dict.itervalues()) ]
to get option_space, which differs from option_space_dict in that it looks something like:
[
    {'key1': 'value1',
     'key2': 'value2',
     'key3': 'value3A',
     'key4': ['value4A', 'value4B']},
    {'key1': 'value1',
     'key2': 'value2',
     'key3': 'value3B',
     'key4': ['value4A', 'value4B']},
    {'key1': 'value1',
     'key2': 'value2',
     'key3': 'value3C',
     'key4': ['value4A', 'value4B']}
]
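The itertools line above uses izip and itervalues, which exist only in Python 2. A sketch of the same expansion in Python 3, assuming option_space_dict is the four-key example; it relies on a dict's keys and values() iterating in the same order as long as the dict is not modified in between:

```python
import itertools

option_space_dict = {
    'key1': ['value1'],
    'key2': ['value2'],
    'key3': ['value3A', 'value3B', 'value3C'],
    'key4': [['value4A', 'value4B']],
}

# Python 3 spelling of the izip/itervalues line: pair the keys with each
# combination drawn from the value lists.
option_space = [
    dict(zip(option_space_dict, combo))
    for combo in itertools.product(*option_space_dict.values())
]

print(len(option_space))        # 3
print(option_space[0]['key4'])  # ['value4A', 'value4B']
```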
So the option_space we generate serves us well for the Jinja2 templating. However, the key4 entry we added to option_space_dict to get this caused an issue elsewhere in the program, which did:
# ignore self.option as it is not relevant to the issue here
def getOptionCompack(self):
    return [(k, v) for k, v in self.option.iteritems() if set([v]) != set(self.option_space_dict[k])]
I get the error TypeError: unhashable type: 'list', because the value of key4 contains a nested list, and lists are unhashable.
So we've hit a barrier. Does anyone have a suggestion on how we could overcome this, i.e. keep specifying our JSON files this way so we can do what we want with Jinja2, while still being able to parse the data structures in the same format?
Thanks a million!
You can normalize your key data structures to use hashable types after they have been parsed from JSON.
Since key4 is a list, you have two options:
Convert it to a tuple where order is significant. E.g.,
key = tuple(key)
Convert it to a frozenset where order is insignificant. E.g.,
key = frozenset(key)
If a key can contain a dictionary, then you'll have two additional options:
Convert it to either a sorted tuple or frozenset of its item tuples. E.g.,
key = tuple(sorted(key.iteritems())) # Use key.items() for Python 3.
# OR
key = frozenset(key.iteritems()) # Use key.items() for Python 3.
Convert it to a third-party frozendict (Python 3 compatible version here). E.g.,
import frozendict
key = frozendict.frozendict(key)
Depending on how simple or complex your keys are, you may have to apply the transformation recursively.
Since your keys come directly from JSON, you can check for the native types directly:
if isinstance(key, list):
    pass  # Freeze list.
elif isinstance(key, dict):
    pass  # Freeze dict.
If you want to support generic sequence and mapping types, you can do something similar to:
import collections
if isinstance(key, collections.Sequence) and not isinstance(key, basestring):  # Use collections.abc.Sequence and str for Python 3.
    # NOTE: Make sure to exclude basestring because it meets the requirements for a Sequence (of characters).
    pass  # Freeze list.
elif isinstance(key, collections.Mapping):  # Use collections.abc.Mapping for Python 3.
    pass  # Freeze dict.
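Filling in those branches gives a recursive transformation. A minimal sketch, assuming order matters for lists (so they become tuples) and that dict keys are mutually comparable strings, as they are in JSON (so dicts become sorted tuples of items). The name freeze is hypothetical:

```python
def freeze(value):
    """Recursively convert lists to tuples and dicts to sorted item tuples."""
    if isinstance(value, list):
        return tuple(freeze(item) for item in value)
    if isinstance(value, dict):
        # Dict keys are unique, so sorting only ever compares the keys.
        return tuple(sorted((k, freeze(v)) for k, v in value.items()))
    return value

nested = [["Altera/AlteraCommon.lua", "Altera/StratixIV/EP4SGX70HF35C2.lua"]]
frozen = {freeze(item) for item in nested}  # set() now accepts the elements
print(frozen)
```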
Here is a full example:
def getOptionCompack(self):
    results = []
    for k, v in self.option.iteritems():
        # The values (not the string keys) are what set() has to hash.
        frozen_v = self.freeze_key(v)
        frozen_space = set(self.freeze_key(opt) for opt in self.option_space_dict[k])
        if set([frozen_v]) != frozen_space:
            results.append((k, v))
    return results
def freeze_key(self, key):
    if isinstance(key, list):
        return frozenset(self.freeze_key(subv) for subv in key)
    # If dictionaries need to be supported, uncomment this.
    #elif isinstance(key, dict):
    #    return frozendict((subk, self.freeze_key(subv)) for subk, subv in key.iteritems())
    return key
Here freeze_key() is applied to each option value and to each entry of self.option_space_dict[k], because those values, not the dictionary keys, are what set() has to hash.
We have managed to figure out a solution to this problem. The gist of it is a helper function that converts nested lists into tuples. Going back to my question, remember that we had this list: [["Altera/AlteraCommon.lua", "Altera/StratixIV/EP4SGX70HF35C2.lua"]]?
With our original getOptionCompack(self) method, and the way we were invoking it, we were directly converting the list to a set with the statement
return [ (k, v) for k, v in self.option.iteritems() if set([v]) != set(self.option_space_dict[k])]
where evaluating set(self.option_space_dict[k]) while iterating over k eventually reaches the key-value pair that amounts to set([["Altera/AlteraCommon.lua", "Altera/StratixIV/EP4SGX70HF35C2.lua"]]),
which was the cause of the error. A list object is not hashable, and set() hashes each element of the outer list it is given; here, that element is the inner list. Try set([[2]]) and you will see what I mean.
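That experiment can be wrapped so it runs without stopping at the exception. A small sketch (try_set is a hypothetical name); the exact wording of the TypeError message varies between Python versions, so only its "unhashable" part is relied on:

```python
def try_set(items):
    """Return set(items), or the TypeError message if an element is unhashable."""
    try:
        return set(items)
    except TypeError as exc:
        return str(exc)

print(try_set([[2]]))         # message mentions "unhashable type: 'list'"
print(try_set([tuple([2])]))  # {(2,)}
```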
So we figured the workaround would be to define a helper function that accepts a list, or any iterable for that matter, and checks whether each element is a list. If an element is not a list, its type is left unchanged; if it is (i.e. a nested list), the helper converts it to a tuple. Doing this for every element, it builds and returns a set. The definition of the function is:
# Helper function to build a set
def Set(iterable):
    return {tuple(v) if isinstance(v, list) else v for v in iterable}
and so a call to Set() in our example would be:
Set([["Altera/AlteraCommon.lua", "Altera/StratixIV/EP4SGX70HF35C2.lua"]])
and the object it returns would be:
{("Altera/AlteraCommon.lua", "Altera/StratixIV/EP4SGX70HF35C2.lua")}
The inner nested list gets converted to a tuple, which is hashable and can therefore be an element of a set, as shown by the {} enclosing the tuple. That's why the set can now be formed.
We then redefined our original method to use our own Set() function:
def getOptionCompack(self):
    return [(k, v) for k, v in self.option.iteritems() if Set([v]) != Set(self.option_space_dict[k])]
and now we no longer get the TypeError, so the problem is solved. It seems like a lot of trouble just for this, but we went through it to have an objective way of comparing two objects by "normalizing" them to the same type, a set, in order to perform some other action later on in our source code.
[Python 3.4.2]
I know this question sounds ridiculous, but I can't figure out where I'm messing up. I'm trying to add keys and values to a dictionary using variables that hold strings instead of string literals. So instead of this,
dict['key'] = value
this:
dict[key] = value
When I run the command above, I get this error:
TypeError: 'str' object does not support item assignment
I think Python thinks I'm trying to create a string, not add to a dictionary. I'm guessing I'm using the wrong syntax. This is what I'm trying to do:
dict[string_for_key][string_for_value] = string_for_deeper_value
I want this^ command to do this:
dict = {string_for_key: string_for_value: string_for_deeper_value}
I'm getting this error:
TypeError: 'str' object does not support item assignment
I should probably give some more context. I'm:
creating one dictionary
creating a copy of it (because I need to edit the dictionary while iterating through it)
iterating through the first dictionary while running some queries
trying to assign a query's result as a value for each "key: value" in the dictionary.
Here's a picture to show what I mean:
key: value: query_as_new_value
-----EDIT-----
Sorry, I should have clarified: the dictionary's name is not actually 'dict'; I called it 'dict' in my question to show that it was a dictionary.
-----EDIT-----
I'll just post the whole process I'm writing in my script. The error occurs during the last command of the function. Commented out at the very bottom are some other things I've tried.
from collections import defaultdict
global query_line, pericope_p, pericope_f, pericope_e, pericope_g
def _pre_query(self, typ):
    with open(self) as f:
        i = 1
        for line in f:
            if i == query_line:
                break
            i += 1
    target = repr(line.strip())
    ###skipping some code
    pericope_dict_post[self][typ] = line.strip()
    #^Outputs error TypeError: 'str' object does not support item assignment
    return
pericope_dict_pre = {'pericope-p.txt': 'pericope_p',
                     'pericope-f.txt': 'pericope_f',
                     'pericope-e.txt': 'pericope_e',
                     'pericope-g.txt': 'pericope_g'}
pericope_dict_post = defaultdict(dict)
#pericope_dict_post = defaultdict(list)
#pericope_dict_post = {}
for key, value in pericope_dict_pre.items():
    pericope_dict_post[key] = value
    #^Works
    #pericope_dict_post.update({key: value})
    #^Also works
    #pericope_dict_post.append(key)
    #^AttributeError: 'dict' object has no attribute 'append'
    #pericope_dict_post[key].append(value)
    #^AttributeError: 'dict' object has no attribute 'append'
    _pre_query(key, value)
-----FINAL EDIT-----
Matthias helped me figure it out, although acushner had the solution too. I was trying to make the dictionary three "levels" deep, but Python dictionaries cannot work this way. Instead, I needed to create a nested dictionary. To use an illustration, I was trying to do {key: value: value} when I needed to do {key: {key: value}}.
To apply this to my code, I need to create the [second] dictionary with all three strings at once. So instead of this:
my_dict[key] = value1
my_dict[key][value1] = value2
I need to do this (with my_dict = defaultdict(dict), so the inner dictionary is created on first access):
my_dict[key][value1] = value2
Thanks a ton for all your help guys!
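You could create a dictionary that expands by itself, using the __missing__ hook that dict subclasses can define.
class AutoTree(dict):
    """Dictionary with unlimited levels"""
    def __missing__(self, key):
        # Called by dict.__getitem__ for absent keys: create and store a new level.
        value = self[key] = type(self)()
        return value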
Use it like this.
data = AutoTree()
data['a']['b'] = 'foo'
print(data)
Result
{'a': {'b': 'foo'}}
Now I'm going to explain your problem with the message TypeError: 'str' object does not support item assignment.
This code will work
from collections import defaultdict
data = defaultdict(dict)
data['a']['b'] = 'c'
data['a'] doesn't exist, so the default value dict is used. Now data['a'] is a dict and this dictionary gets a new value with the key 'b' and the value 'c'.
This code won't work
from collections import defaultdict
data = defaultdict(dict)
data['a'] = 'c'
data['a']['b'] = 'c'
The value of data['a'] is defined as the string 'c'. Now you can only perform string operations with data['a']. You can't use it as a dictionary now and that's why data['a']['b'] = 'c' fails.
First, do not use dict as your variable name, as it shadows the built-in of the same name.
Second, all you want is a nested dictionary, no?
from collections import defaultdict
d = defaultdict(dict)
d[string_for_key][string_for_value] = 'snth'
Another way, as @Matthias suggested, is to create a bottomless dictionary:
dd = lambda: defaultdict(dd)
d = dd()
d[string_for_key][string_for_value] = 'snth'
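One side effect of the bottomless dictionary is that printing it shows nested defaultdict(...) wrappers. A sketch that defines the same factory with def and converts the tree back to plain dicts for display; to_plain_dict and the sample keys are hypothetical:

```python
from collections import defaultdict

def dd():
    # Each missing key creates another nested defaultdict level.
    return defaultdict(dd)

def to_plain_dict(tree):
    """Recursively turn nested defaultdicts into ordinary dicts for display."""
    if isinstance(tree, defaultdict):
        return {k: to_plain_dict(v) for k, v in tree.items()}
    return tree

d = dd()
d['pericope-p.txt']['pericope_p'] = 'some query result'  # hypothetical values
print(to_plain_dict(d))  # {'pericope-p.txt': {'pericope_p': 'some query result'}}
```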
You can do something like this:
>>> my_dict = {}
>>> key = 'a' # if key is not defined before it will raise NameError
>>> my_dict[key] = [1]
>>> my_dict[key].append(2)
>>> my_dict
{'a': [1, 2]}
Note: dict is a built-in; don't use it as a variable name.
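If the list for a key may not exist yet, dict.setdefault avoids the separate initialization step. A small sketch with the same key as above:

```python
my_dict = {}
key = 'a'

# setdefault stores an empty list the first time `key` is seen and
# returns the list stored under `key`, so it can be appended to directly.
my_dict.setdefault(key, []).append(1)
my_dict.setdefault(key, []).append(2)
print(my_dict)  # {'a': [1, 2]}
```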