I have a string '{"News":"news", "News":"politics", "News":"breaking", "News":"stories"}' that I am trying to convert to a dictionary. I have used both json.loads() and ast.literal_eval() to convert it, but both methods only keep the last key-value pair.
Is there a one line solution for this problem? Or would I need something more complex?
Assuming that the string is otherwise valid JSON, you could use the object_pairs_hook parameter to the JSON decoder:
import json

def multidict(pairs):
    result = {}
    for k, v in pairs:
        result.setdefault(k, []).append(v)
    return result

data = '{"News":"news", "News":"politics", ' \
       '"News":"breaking", "News":"stories"}'
data = json.loads(data, object_pairs_hook=multidict)
assert data == {'News': ['news', 'politics', 'breaking', 'stories']}
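Note that the hook wraps every value in a list, even for keys that appear only once (reusing the multidict function from above):

single = json.loads('{"a": 1, "News": "x", "News": "y"}',
                    object_pairs_hook=multidict)
assert single == {'a': [1], 'News': ['x', 'y']}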
Dictionaries cannot have more than one entry per key; that is why you cannot produce a dictionary with four identical keys, News in this case.
Consider changing the data format to something that meets your specifications, such as a dictionary with list values:
{"News": ["news", "politics", "breaking", "stories"]}
You could use defaultdict with a list:
>>> from collections import defaultdict
>>> dictionary = defaultdict(list)
>>> values = '{"News":"news", "News":"politics", "News":"breaking", "News":"stories"}'
>>> for pair in values.strip('{}').split(','):
...     key, value = pair.strip().split(':')
...     key = key.strip('"')
...     value = value.strip('"')
...     dictionary[key].append(value)
...
Result:
>>> dictionary
defaultdict(<class 'list'>, {'News': ['news', 'politics', 'breaking', 'stories']})
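Keep in mind that this manual split is fragile; a value that itself contains a comma (or a colon) breaks the parsing, so the object_pairs_hook approach above is safer for general JSON:

>>> '{"News":"breaking, politics"}'.strip('{}').split(',')
['"News":"breaking', ' politics"']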
I need to parse a JSON file which, unfortunately for me, does not follow the prototype. I have two issues with the data, but I've already found a workaround for the second one, so I'll just mention it at the end; maybe someone can help there as well.
So I need to parse entries like this:
"Test":{
"entry":{
"Type":"Something"
},
"entry":{
"Type":"Something_Else"
}
}, ...
The default JSON parser updates the dictionary and therefore keeps only the last entry. I HAVE to somehow store the other one as well, and I have no idea how to do this. I also HAVE to store the keys in the several dictionaries in the same order they appear in the file; that's why I am using an OrderedDict. It works fine, so if there is any way to extend this to handle the duplicate entries I'd be grateful.
My second issue is that this very same JSON file contains entries like this:
"Test":{
{
"Type":"Something"
}
}
The json.load() function raises an exception when it reaches that line in the JSON file. The only way I worked around this was to manually remove the inner brackets myself.
Thanks in advance
You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.
However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:
from json import JSONDecoder

def parse_object_pairs(pairs):
    return pairs

data = """
{"foo": {"baz": 42}, "foo": 7}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output:
[('foo', [('baz', 42)]), ('foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
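For instance, a minimal lookup helper over the raw pairs list (the name get_all is just for illustration) returns every value stored under a given key:

def get_all(pairs, key):
    """Return every value associated with `key` in a list of (key, value) pairs."""
    return [value for k, value in pairs if k == key]

get_all(obj, 'foo')   # -> [[('baz', 42)], 7]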
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder

def make_unique(key, dct):
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key

def parse_object_pairs(pairs):
    dct = OrderedDict()
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct

data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""

decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print(obj)
Output:
OrderedDict([('foo', OrderedDict([('baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n where n is an incremental counter - just adapt it to your needs.
Because the object_pairs_hook receives the pairs exactly in the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict; I included that as well.
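The same hook can also be passed straight to json.load / json.loads instead of building a JSONDecoder yourself, e.g. when reading from a file (data.json here is just a placeholder name):

import json

with open('data.json') as fp:   # placeholder path for your actual file
    obj = json.load(fp, object_pairs_hook=parse_object_pairs)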
Thanks a lot @Lukas Graf, I got it working as well by implementing my own version of the hook function:
import collections

def dict_raise_on_duplicates(ordered_pairs):
    count = 0
    d = collections.OrderedDict()
    for k, v in ordered_pairs:
        if k in d:
            d[k + '_dupl_' + str(count)] = v
            count += 1
        else:
            d[k] = v
    return d
The only thing remaining is to automatically get rid of the double brackets and I am done :D Thanks again
If you would prefer to collect the values of duplicated keys into an array, instead of keeping separate renamed copies, this should do the work:
def dict_raise_on_duplicates(ordered_pairs):
    """Convert duplicate keys to JSON array."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if type(d[k]) is list:
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d
And then you just use:
result = json.loads(yourString, object_pairs_hook=dict_raise_on_duplicates)
OK, so I have JSON source code from a webpage and, in this source code, the same word ("author") is used as the key for multiple values. How can I retrieve all the values for "author"?
E.g.
"author": "SampleMan", "author":"NonSampleMan", "author":"BoringMan"
How do I get Python to return ["SampleMan", "NonSampleMan", "BoringMan"]?
You could pass an object_pairs_hook to json.loads that collects the values of duplicate keys into lists:
from collections import defaultdict
import json

s = '{"author": "SampleMan", "author":"NonSampleMan", "author":"BoringMan", "foo":"bar", "bar": [1]}'

def hook(pairs):
    d = defaultdict(list)
    for k, v in pairs:
        d[k].append(v)
    return {k: v if len(v) > 1 else v[0] for k, v in d.items()}

print(json.loads(s, object_pairs_hook=hook))
Output:
{'author': ['SampleMan', 'NonSampleMan', 'BoringMan'], 'foo': 'bar', 'bar': [1]}
In the above, the hook receives a list of (key, value) tuples and stores them in a defaultdict whose values are lists. Once it has iterated over the tuples, it builds the result dict, where a value is a list only if there were multiple items with the given key.
The Python documentation has the following description of the hook:
object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict() will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority.
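To get just the list the question asked for, index the decoded result with the duplicated key (reusing the hook above):

result = json.loads(s, object_pairs_hook=hook)
print(result["author"])   # ['SampleMan', 'NonSampleMan', 'BoringMan']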
I am starting to learn Python and have run across a piece of code that I'm hoping one of you can help me understand.
from collections import defaultdict
dd_dict = defaultdict(dict)
dd_dict["Joel"]["City"] = "Seattle"
result:
{ "Joel" : { "City" : Seattle"}}
The part I am having a problem with is the third line. Could someone please explain to me what is happening here?
The third line inserts a dictionary inside a dictionary. By using dict as the default factory of the defaultdict, you are telling Python to initialize every new dd_dict value with an empty dict. The above code is equivalent to
dd_dict["Joel"] = {}
dd_dict["Joel"]["City"] = "Seattle"
If you didn't use a defaultdict, the lookup dd_dict["Joel"] in the third line would have raised a KeyError. So defaultdicts are a way of avoiding such errors by initializing the default value of your data structure.
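For comparison, the same access on a plain dict fails at that lookup:

>>> plain = {}
>>> plain["Joel"]["City"] = "Seattle"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'Joel'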
From the documentation of defaultdict:
If default_factory is not None, it is called without arguments to provide a default value for the given key, this value is inserted in the dictionary for the key, and returned.
Since "Joel" doesn't exist as key yet the dd_dict["Joel"] part creates an empty dictionary as value for the key "Joel". The following part ["City"] = "Seattle" is just like adding a normal key-value pair a dictionary - in this case the dd_dict["Joel"] dictionary.
The first argument provides the initial value for the default_factory
attribute; it defaults to None. If default_factory is not None, it is
called without arguments to provide a default value for the given key,
this value is inserted in the dictionary for the key, and returned.
dd_dict = defaultdict(dict)
dd_dict["Joel"]["City"] = "Seattle"
In your case, when you call dd_dict["Joel"], there is no such key in dd_dict, which would normally raise a KeyError. defaultdict has the __missing__(key) protocol to handle this: when it cannot find the key, it calls default_factory without arguments to provide a default value for the given key.
So when you call dd_dict["Joel"], this gives you an empty dict {}, and then you add the item ["City"] = "Seattle" to that empty dict, something like:
{}["City"] = "Seattle"
When a key is accessed and is missing, the __missing__ method is called.
For a regular dict, a KeyError is raised.
For a defaultdict, the factory you passed as a parameter is called, and the resulting object is inserted and returned.
If you made a defaultdict(list), and tried to access a missing key, you would get a list back.
Example:
>>> from collections import defaultdict
>>> d = defaultdict(list)
>>> d['missing']
[]
When you access a key of a defaultdict that does not exist, you will get whatever the function you supplied returns.
In your case you supplied dict, therefore you get a new empty dictionary:
>>> dict()
{}
>>> from collections import defaultdict
>>> dd_dict = defaultdict(dict)
>>> dd_dict['Joel']
{}
Now you add your key-value pair to this dictionary:
>>> dd_dict["Joel"]["City"] = "Seattle"
"Joel" : { "City" : Seattle"}}
defaultdict(dict) returns a dictionary object that will return an empty dictionary value if you index into it with a key that doesn't yet exist:
>>> from collections import defaultdict
>>> dd_dict = defaultdict(dict)
>>> dd_dict
defaultdict(<class 'dict'>, {})
>>> dd_dict["Joel"]
{}
>>> dd_dict["anything"]
{}
>>> dd_dict[99]
{}
So the third line creates a key-value pair ("Joel", {}) in dd_dict, then sets the ("City", "Seattle") key-value pair on the empty dictionary.
It's equivalent to:
>>> dd_dict = defaultdict(dict)
>>> dd_dict["Joel"] = {}
>>> dd_dict
defaultdict(<class 'dict'>, {'Joel': {}})
>>> dd_dict["Joel"]["City"] = "Seattle"
>>> dd_dict
defaultdict(<class 'dict'>, {'Joel': {'City': 'Seattle'}})
I have a 3rd-party Python method (ANSA) that returns strings based on user GUI input, with default values to start with.
The default values may look like this and are stored in a dictionary:
Label Value
Userdata1 123
Userdata2 'abc'
Userdata3 123.456
... ...
Now, these are all returned as a dictionary with string values by the 3rd-party "black_box" function.
Is it possible, based on the classes of the default values, to convert the new strings back to those classes?
Here is something that may make it a bit clearer:
data = {'Userdata1': 123,
        'Userdata2': 'abc',
        'Userdata3': 123.456}

new_data = black_box(data)  # outputs a dictionary with the values as strings

for i, var_name in enumerate(data):
    original_class = type(data[var_name])
    new_data[var_name] = new_data[var_name].StrToClass(original_class)
So what I am looking for is the method/function/magic trick called StrToClass in the above example.
I know it may be a risky way to do it, but any ideas about Pythonic ways to do it?
(python3 btw)
Just keep the types of the original values of the dictionary in a separate new dictionary, then use that dictionary to transform them back:
>>> my_dict = {'Userdata1': 123,
...            'Userdata2': 'abc',
...            'Userdata3': 123.456}
>>> type_dict = {k: type(v) for k, v in my_dict.items()}
>>> type_dict
{'Userdata1': <class 'int'>, 'Userdata2': <class 'str'>, 'Userdata3': <class 'float'>}
>>> str_dict = {k: str(v) for k, v in my_dict.items()}
>>> str_dict
{'Userdata1': '123', 'Userdata2': 'abc', 'Userdata3': '123.456'}
>>> new_dict = {k: type_dict[k](v) for k, v in str_dict.items()}
>>> new_dict
{'Userdata1': 123, 'Userdata2': 'abc', 'Userdata3': 123.456}
Or, using your pseudocode
for i, var_name in enumerate(data):
    original_class = type(data[var_name])
    new_data[var_name] = original_class(new_data[var_name])
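This works for the types in the example because int, float and str all accept their own string form; for container types the constructor call is not a parse, so something like the ast.literal_eval approach in the next answer is needed:

>>> int('123'), float('123.456'), str('abc')
(123, 123.456, 'abc')
>>> list('[1, 2]')   # calling the constructor on a string is not a parse
['[', '1', ',', ' ', '2', ']']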
I believe the magic function you're looking for is ast.literal_eval.
import ast

for i, var_name in enumerate(data):
    new_data[var_name] = ast.literal_eval(new_data[var_name])
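One caveat: ast.literal_eval only accepts Python literals, so a value that comes back as a bare word like abc (rather than the quoted form 'abc') raises a ValueError. A small fallback sketch (the helper name str_to_value is just for illustration):

import ast

def str_to_value(s):
    """Best-effort conversion of a string back to a Python literal."""
    try:
        return ast.literal_eval(s)   # ints, floats, quoted strings, lists, dicts, ...
    except (ValueError, SyntaxError):
        return s                     # not a literal: keep it as a plain string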
Never name a variable def
For complicated classes you'll need more work; however, this should do nicely for basic types, and for types whose representation is identical to another basic type's.
import ast

d = {'Userdata1': 123,
     'Userdata2': 'abc',
     'Userdata3': 123.456}

# create a list containing each type, in a fixed key order
var_type = [type(d[key]) for key in sorted(d.keys())]

d = black_box(d)

for vt, key in zip(var_type, sorted(d.keys())):
    # if the variable was a string you're already done
    if vt is not str:
        # cast from string to generic type
        d[key] = ast.literal_eval(d[key])
        # cast from generic to original type
        try:
            d[key] = vt(d[key])
        except Exception:
            print("Error converting dictionary value with key: " + key)
            print("Unable to cast {} as {}".format(d[key], vt))