Json in Python: Receive/Check duplicate key error - python

Python's json module deviates a little from the specification when a map contains duplicate keys:
>>> import json
>>> json.loads('{"a": "First", "a": "Second"}')
{u'a': u'Second'}
I know that this behaviour is specified in the documentation:
The RFC specifies that the names within a JSON object should be
unique, but does not specify how repeated names in JSON objects should
be handled. By default, this module does not raise an exception;
instead, it ignores all but the last name-value pair for a given name:
For my current project, I absolutely need to make sure that no duplicate keys are present in the file, and to receive an error/exception if there are any. How can this be accomplished?
I'm still stuck on Python 2.7, so a solution which also works with older versions would help me most.

Well, you could try using the JSONDecoder class and specifying a custom object_pairs_hook, which will receive the duplicates before they would get deduped.
import json
def dupe_checking_hook(pairs):
    result = dict()
    for key, val in pairs:
        if key in result:
            raise KeyError("Duplicate key specified: %s" % key)
        result[key] = val
    return result
decoder = json.JSONDecoder(object_pairs_hook=dupe_checking_hook)
# Raises a KeyError
some_json = decoder.decode('''{"a":"hi","a":"bye"}''')
# works
some_json = decoder.decode('''{"a":"hi","b":"bye"}''')
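The same kind of hook can also be handed straight to json.loads, which accepts an object_pairs_hook argument. The following is only a sketch under a few assumptions: the name reject_duplicates is made up for illustration, and raising ValueError instead of KeyError is a matter of taste. Because the hook runs for every JSON object, duplicates inside nested objects should be caught as well.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that refuses objects containing a repeated key."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError("Duplicate key: %r" % (key,))
        result[key] = value
    return result

# A document without duplicates decodes as usual
ok = json.loads('{"a": 1, "b": {"c": 2}}', object_pairs_hook=reject_duplicates)

# A duplicate inside a nested object triggers the hook's exception
try:
    json.loads('{"a": 1, "b": {"c": 2, "c": 3}}',
               object_pairs_hook=reject_duplicates)
    nested_duplicate_detected = False
except ValueError:
    nested_duplicate_detected = True
```

The object_pairs_hook parameter was added in Python 2.7, so this approach should fit the version constraint mentioned in the question.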

The code below takes any JSON that has repeated keys and collects their values into an array. For example, the JSON string '''{"a":"hi","a":"bye"}''' gives {"a": ['hi', 'bye']} as output:
import json
def dupe_checking_hook(pairs):
    result = dict()
    for key, val in pairs:
        if key in result:
            if isinstance(result[key], list):
                # Already collecting duplicates for this key
                result[key].append(val)
            else:
                # First duplicate: wrap the earlier value and the new one
                result[key] = [result[key], val]
        else:
            result[key] = val
    return result
decoder = json.JSONDecoder(object_pairs_hook=dupe_checking_hook)
# It will not raise an error
some_json = decoder.decode('''{"a":"hi","a":"bye"}''')


Is it possible to convert each part of a dictionary to JSON separately and put them together as valid JSON?

Is it possible to convert each key/value pair of a dictionary into JSON and then put them back together as one JSON document? I tried doing it as follows, but it failed and produced weird output:
def toj(dict):
    j = []
    for k, v in dict.items():
        try:
            j.append({json.dumps(k): json.dumps(v)})
        except:
            j.append({json.dumps(k): json.dumps(str(v))})
    return j
Why do I need it?
I need to pass around dictionary instance within my app which is currently using Flask as the web framework (It may change in the future).
from datetime import datetime
dic = {'1': 'One', 2: 'Two', 'list1': [1, 2, 3], 'time_stamp': datetime.now()}
This dictionary is not json convertible as you all would know:
raise TypeError(f'Object of type {o.__class__.__name__} '
f'is not JSON serializable')
TypeError: Object of type datetime is not JSON serializable
json.dumps acts on the entire object and not on each key, so the whole thing has to be sanitized.
Luckily, I could solve this by converting the offending object into str, but that is not possible here because:
I do not know which field will be the offending one, as the dictionary is created dynamically and can be of any length.
I cannot check for type datetime only, because the value could be one of many objects of any type, although all of them have str implemented.
I cannot convert the whole dict to str, as converting it back to a dict causes odd issues.
Just FYI: for very complicated reasons I cannot use any library other than json itself.
Passing a default does not help, because I cannot change the class of the object I am passing into dumps, and again json.dumps takes the whole object, not pieces of it.
Full-disclosure---> my brain does not digest json easily
You can easily do this with the default keyword of json.dumps:
def handle_conversion(v):
    if isinstance(v, datetime):
        return str(v)
    elif isinstance(v, some_other_thing_that_cannot_be_jsonified):
        return "???"  # a value that can be jsonified
    # Anything else is still unsupported, so signal that to json
    raise TypeError("Cannot serialize %r" % (v,))
data = json.dumps(a_dict_with_datetimes, default=handle_conversion)
If you only need to pass serialized data between Python processes, you can use the pickle module instead, which serializes almost anything (but sometimes breaks depending on where you decode it).
Here is a complete working example:
import json
from datetime import datetime
class some_other_thing_that_cannot_be_jsonified:
    pass
def handle_conversion(v):
    if isinstance(v, datetime):
        return str(v)
    elif isinstance(v, some_other_thing_that_cannot_be_jsonified):
        return "???"  # a value that can be jsonified
    # Anything else is still unsupported, so signal that to json
    raise TypeError("Cannot serialize %r" % (v,))
a_dict_with_datetimes = {
    "dt": datetime.now(), "a": "hello", "w": "world", "L": [1, 2, 3]
}
data = json.dumps(a_dict_with_datetimes, default=handle_conversion)
print(data)
# {"dt": "2022-10-09 16:21:19.643080", "a": "hello", "w": "world", "L": [1, 2, 3]}
You could also simplify it to always return str (json only calls default if it could not jsonify the object); this causes any object that cannot be jsonified to be converted to str:
def handle_conversion(v):
    # if isinstance(v, datetime):
    #     return str(v)
    # elif isinstance(v, some_other_thing_that_cannot_be_jsonified):
    #     return "???"  # a value that can be jsonified
    return str(v)
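Since the question worries that json.dumps "takes the whole object, not pieces of it", it may help to see that the default callback is consulted per value, at any nesting depth. The sketch below passes str directly as the default; the Opaque class and the payload contents are invented purely for illustration.

```python
import json
from datetime import datetime

class Opaque(object):
    """Hypothetical stand-in for an object json cannot serialize."""
    def __str__(self):
        return "opaque-thing"

payload = {
    "1": "One",
    "list1": [1, 2, 3],
    "time_stamp": datetime(2022, 10, 9, 16, 21, 19),
    "nested": {"items": [Opaque(), 2.5]},
}

# default is called only for values json cannot encode on its own,
# including values buried inside nested dicts and lists
encoded = json.dumps(payload, default=str)
decoded = json.loads(encoded)
```

Values that json already understands (strings, numbers, lists) pass through untouched, so only the datetime and the Opaque instance end up as strings.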
import json
from datetime import datetime
def toj(data):
    # Map each type that needs special handling to its converter
    conversions = {datetime: str}
    for k, v in data.items():
        for instance, conv in conversions.items():
            if isinstance(v, instance):
                data[k] = conv(v)
    return json.dumps(data)
You can update conversions according to your requirements.
Execution:
In [1]: d = {'1':'One', 2:'Two','list1':[1,2,3],'time_stamp':datetime.now()}
In [2]: toj(d)
Out[2]: '{"1": "One", "2": "Two", "list1": [1, 2, 3], "time_stamp": "2022-10-10 00:39:00.604636"}'
Edit:
Based on your comment, if all of the objects have string methods available, you can do this:
def toj(data):
    for k, v in data.items():
        try:
            json.dumps(v)
        except (TypeError, OverflowError):
            # Fall back to the string form for anything json rejects
            data[k] = str(v)
    return json.dumps(data)

Trying to get a value from a JSON in python

I am trying to access a value in a JSON file using Python. In the imgur link, the value I am trying to access is the "NUM" nested in "args". My main logic is reading in the JSON file, then using pandas to normalize the JSON. I have tried using .loc to find 'args', but I need help with another way or option.
[1]: https://i.stack.imgur.com/n6hOg.png
Here is my code snippet along with the terminal error I am getting:
def readInJSON(json):
    df = pandas.json_normalize(json)
    goto_rows = [i for i in df.loc[df['mnemonic'] == 'PLD_CCD_EXPOSE_CLOSED'].index]
    commandDates = list(df['utc_time'])
    numIDs = list(df['args'])  # tried using list typing
    print(type(df['args']))  # couldn't get a typing from it either
    args = df['args']  # tried just using it like a regular list
    args = [i for i in df.loc[df['args']]]  # tried using .loc from pandas as well
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/frame.py", line 3505, in getitem
indexer = self.columns.get_loc(key)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 'args'
This is how you can access value of NUM:
import json
file = open('data.json')
data = json.load(file)
NUM = data[1]["args"]["NUM"]
# NUM = 20.0
file.close()
The JSON structure appears to be a list of dictionaries. Those dictionaries may or may not have an 'args' key. The value associated with 'args' is expected to be a dictionary. That dictionary may contain a 'NUM' key. If 'NUM' exists, append its value to a list. Return the list.
def readInJSON(json):
    numvals = []
    for d in json:
        args = d.get('args')
        # Test for the key itself so that falsy values such as 0 are kept
        if isinstance(args, dict) and 'NUM' in args:
            numvals.append(args['NUM'])
    return numvals
A better approach might be to write the function so that it handles the input JSON file itself, like this:
import json
def readInJSON(filename):
    with open(filename) as jdata:
        numvals = []
        for d in json.load(jdata):
            args = d.get('args')
            # Test for the key itself so that falsy values such as 0 are kept
            if isinstance(args, dict) and 'NUM' in args:
                numvals.append(args['NUM'])
        return numvals
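To see how this kind of traversal behaves on the edge cases described above, here is a small check on made-up records. The function name collect_num and the sample data are assumptions for illustration only (the real file's contents are only visible in the screenshot). This variant tests for the presence of the 'NUM' key rather than its truthiness, so a value of 0 is kept.

```python
def collect_num(records):
    """Return every args['NUM'] value found in a list of dicts."""
    values = []
    for record in records:
        args = record.get("args")
        if isinstance(args, dict) and "NUM" in args:
            values.append(args["NUM"])
    return values

# Made-up records shaped like the description above
sample = [
    {"mnemonic": "A", "utc_time": "t0"},                           # no 'args' key
    {"mnemonic": "PLD_CCD_EXPOSE_CLOSED", "args": {"NUM": 20.0}},  # has 'NUM'
    {"mnemonic": "B", "args": {"OTHER": 1}},                       # 'args' without 'NUM'
    {"mnemonic": "C", "args": {"NUM": 0}},                         # falsy 'NUM'
]
nums = collect_num(sample)
```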

How to prevent ast.literal_eval from overwriting same key values? [duplicate]

I need to parse a JSON file which, unfortunately for me, does not follow the prototype. I have two issues with the data, but I've already found a workaround for the second one, so I'll just mention it at the end; maybe someone can help there as well.
So I need to parse entries like this:
"Test":{
    "entry":{
        "Type":"Something"
    },
    "entry":{
        "Type":"Something_Else"
    }
}, ...
The default json parser updates the dictionary and therefore keeps only the last entry. I HAVE to somehow store the other one as well, and I have no idea how to do this. I also HAVE to store the keys of the several dictionaries in the same order they appear in the file, which is why I am using an OrderedDict. It works fine, so if there is any way to extend this to the duplicate entries I'd be grateful.
My second issue is that this very same JSON file contains entries like this:
"Test":{
    {
        "Type":"Something"
    }
}
The json.load() function raises an exception when it reaches that line in the JSON file. The only way I worked around this was to manually remove the inner brackets myself.
Thanks in advance
You can use JSONDecoder.object_pairs_hook to customize how JSONDecoder decodes objects. This hook function will be passed a list of (key, value) pairs that you usually do some processing on, and then turn into a dict.
However, since Python dictionaries don't allow for duplicate keys (and you simply can't change that), you can return the pairs unchanged in the hook and get a nested list of (key, value) pairs when you decode your JSON:
from json import JSONDecoder
def parse_object_pairs(pairs):
    return pairs
data = """
{"foo": {"baz": 42}, "foo": 7}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print obj
Output:
[(u'foo', [(u'baz', 42)]), (u'foo', 7)]
How you use this data structure is up to you. As stated above, Python dictionaries won't allow for duplicate keys, and there's no way around that. How would you even do a lookup based on a key? dct[key] would be ambiguous.
So you can either implement your own logic to handle a lookup the way you expect it to work, or implement some sort of collision avoidance to make keys unique if they're not, and then create a dictionary from your nested list.
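As one possible shape for "your own lookup logic", the sketch below keeps the raw pair lists and adds two small helpers. The names keep_pairs, get_all and get_last are hypothetical; get_last simply imitates what the json module does by default (the last duplicate wins).

```python
import json

def keep_pairs(pairs):
    # Return the (key, value) pairs untouched so duplicates survive
    return pairs

def get_all(pairs, key):
    """Return every value stored under key, in document order."""
    return [value for k, value in pairs if k == key]

def get_last(pairs, key):
    """Mimic the json module's default: the last duplicate wins."""
    matches = get_all(pairs, key)
    if not matches:
        raise KeyError(key)
    return matches[-1]

obj = json.loads('{"foo": {"baz": 42}, "foo": 7}', object_pairs_hook=keep_pairs)
all_foo = get_all(obj, "foo")    # both values, in document order
last_foo = get_last(obj, "foo")  # only the final one
```

Each lookup scans the list, so this trades speed for keeping every duplicate; that is probably acceptable for small documents.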
Edit: Since you said you would like to modify the duplicate key to make it unique, here's how you'd do that:
from collections import OrderedDict
from json import JSONDecoder
def make_unique(key, dct):
    counter = 0
    unique_key = key
    while unique_key in dct:
        counter += 1
        unique_key = '{}_{}'.format(key, counter)
    return unique_key
def parse_object_pairs(pairs):
    dct = OrderedDict()
    for key, value in pairs:
        if key in dct:
            key = make_unique(key, dct)
        dct[key] = value
    return dct
data = """
{"foo": {"baz": 42, "baz": 77}, "foo": 7, "foo": 23}
"""
decoder = JSONDecoder(object_pairs_hook=parse_object_pairs)
obj = decoder.decode(data)
print obj
Output:
OrderedDict([(u'foo', OrderedDict([(u'baz', 42), ('baz_1', 77)])), ('foo_1', 7), ('foo_2', 23)])
The make_unique function is responsible for returning a collision-free key. In this example it just suffixes the key with _n where n is an incremental counter - just adapt it to your needs.
Because the object_pairs_hook receives the pairs exactly in the order they appear in the JSON document, it's also possible to preserve that order by using an OrderedDict, I included that as well.
Thanks a lot @Lukas Graf, I got it working as well by implementing my own version of the hook function:
import collections
def dict_raise_on_duplicates(ordered_pairs):
    count = 0
    d = collections.OrderedDict()
    for k, v in ordered_pairs:
        if k in d:
            d[k + '_dupl_' + str(count)] = v
            count += 1
        else:
            d[k] = v
    return d
Only thing remaining is to automatically get rid of the double brackets and i am done :D Thanks again
If you would prefer to convert those duplicated keys into an array, instead of having separate copies, this could do the work:
def dict_raise_on_duplicates(ordered_pairs):
    """Convert duplicate keys to JSON array."""
    d = {}
    for k, v in ordered_pairs:
        if k in d:
            if type(d[k]) is list:
                d[k].append(v)
            else:
                d[k] = [d[k], v]
        else:
            d[k] = v
    return d
And then you just use:
data = json.loads(yourString, object_pairs_hook=dict_raise_on_duplicates)
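Applied to the "Test"/"entry" data from the question, a list-collecting hook could look like the sketch below. It is a variation rather than the code above: the name merge_duplicates is made up, and it tracks which keys have already been merged so that a value which is itself a JSON array is not mistaken for a list of collected duplicates.

```python
import json

def merge_duplicates(ordered_pairs):
    """Collect the values of repeated keys into a list."""
    merged = {}
    seen_twice = set()  # keys whose value is already a list of duplicates
    for key, value in ordered_pairs:
        if key not in merged:
            merged[key] = value
        elif key in seen_twice:
            merged[key].append(value)
        else:
            merged[key] = [merged[key], value]
            seen_twice.add(key)
    return merged

text = '{"Test": {"entry": {"Type": "Something"}, "entry": {"Type": "Something_Else"}}}'
parsed = json.loads(text, object_pairs_hook=merge_duplicates)

# Values that are themselves arrays stay intact as list elements
arrays = json.loads('{"a": [1], "a": [2], "a": [3]}',
                    object_pairs_hook=merge_duplicates)
```

If key order matters on an older Python, the plain dict could presumably be swapped for collections.OrderedDict without other changes.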

Python - Add to a dictionary using a string

[Python 3.4.2]
I know this question sounds ridiculous, but I can't figure out where I'm messing up. I'm trying to add keys and values to a dictionary by using strings instead of quoted text. So instead of this,
dict['key'] = value
this:
dict[key] = value
When I run the command above, I get this error:
TypeError: 'str' object does not support item assignment
I think Python is thinking that I'm trying to create a string, not add to a dictionary. I'm guessing I'm using the wrong syntax. This is what I'm trying to do:
dict[string_for_key][string_for_value] = string_for_deeper_value
I want this^ command to do this:
dict = {string_for_key: string_for_value: string_for_deeper_value}
I'm getting this error:
TypeError: 'str' object does not support item assignment
I should probably give some more context. I'm:
1. creating one dictionary
2. creating a copy of it (because I need to edit the dictionary while iterating through it)
3. iterating through the first dictionary while running some queries
4. trying to assign a query's result as a value for each "key: value" in the dictionary.
Here's a picture to show what I mean:
key: value: query_as_new_value
-----EDIT-----
Sorry, I should have clarified: the dictionary's name is not actually 'dict'; I called it 'dict' in my question to show that it was a dictionary.
-----EDIT-----
I'll just post the whole process I'm writing in my script. The error occurs during the last command of the function. Commented out at the very bottom are some other things I've tried.
from collections import defaultdict
global query_line, pericope_p, pericope_f, pericope_e, pericope_g
def _pre_query(self, typ):
    with open(self) as f:
        i = 1
        for line in f:
            if i == query_line:
                break
            i += 1
    target = repr(line.strip())
    ###skipping some code
    pericope_dict_post[self][typ] = line.strip()
    #^Outputs error TypeError: 'str' object does not support item assignment
    return
pericope_dict_pre = {'pericope-p.txt': 'pericope_p',
                     'pericope-f.txt': 'pericope_f',
                     'pericope-e.txt': 'pericope_e',
                     'pericope-g.txt': 'pericope_g'}
pericope_dict_post = defaultdict(dict)
#pericope_dict_post = defaultdict(list)
#pericope_dict_post = {}
for key, value in pericope_dict_pre.items():
    pericope_dict_post[key] = value
    #^Works
    #pericope_dict_post.update({key: value})
    #^Also works
    #pericope_dict_post.append(key)
    #^AttributeError: 'dict' object has no attribute 'append'
    #pericope_dict_post[key].append(value)
    #^AttributeError: 'dict' object has no attribute 'append'
    _pre_query(key, value)
-----FINAL EDIT-----
Matthias helped me figure it out, although acushner had the solution too. I was trying to make the dictionary three "levels" deep, but Python dictionaries cannot work this way. Instead, I needed to create a nested dictionary. To use an illustration, I was trying to do {key: value: value} when I needed to do {key: {key: value}}.
To apply this to my code, I need to create the [second] dictionary with all three strings at once. So instead of this:
my_dict[key] = value1
my_dict[key][value1] = value2
I need to do this:
my_dict[key][value1] = value2
Thanks a ton for all your help guys!
You could create a dictionary that expands by itself (Python 3 required).
class AutoTree(dict):
    """Dictionary with unlimited levels"""
    def __missing__(self, key):
        value = self[key] = type(self)()
        return value
Use it like this.
data = AutoTree()
data['a']['b'] = 'foo'
print(data)
Result
{'a': {'b': 'foo'}}
Now I'm going to explain your problem with the message TypeError: 'str' object does not support item assignment.
This code will work
from collections import defaultdict
data = defaultdict(dict)
data['a']['b'] = 'c'
data['a'] doesn't exist, so the default value dict is used. Now data['a'] is a dict and this dictionary gets a new value with the key 'b' and the value 'c'.
This code won't work
from collections import defaultdict
data = defaultdict(dict)
data['a'] = 'c'
data['a']['b'] = 'c'
The value of data['a'] is defined as the string 'c'. Now you can only perform string operations with data['a']. You can't use it as a dictionary now and that's why data['a']['b'] = 'c' fails.
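The failure and the fix can be reproduced in a few lines. This is just an illustrative sketch; the variable names message and fixed are arbitrary.

```python
from collections import defaultdict

data = defaultdict(dict)
data['a'] = 'c'              # data['a'] is now a plain string
try:
    data['a']['b'] = 'c'     # item assignment on a str fails
    message = None
except TypeError as exc:
    message = str(exc)

# Skipping the intermediate assignment lets defaultdict create the inner dict
fixed = defaultdict(dict)
fixed['a']['b'] = 'c'
```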
First, do not use dict as your variable name, as it shadows the built-in of the same name.
Second, all you want is a nested dictionary, no?
from collections import defaultdict
d = defaultdict(dict)
d[string_for_key][string_for_value] = 'snth'
Another way, as @Matthias suggested, is to create a bottomless dictionary:
dd = lambda: defaultdict(dd)
d = dd()
d[string_for_key][string_for_value] = 'snth'
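A named function can stand in for the lambda, which some find easier to read. The sketch below also adds a hypothetical helper, to_plain, that converts the nested defaultdicts back into ordinary dicts once the structure is built, since a bottomless dictionary silently creates entries on every lookup of a missing key.

```python
from collections import defaultdict

def tree():
    """A defaultdict whose missing values are themselves trees."""
    return defaultdict(tree)

def to_plain(node):
    """Recursively convert nested defaultdicts into ordinary dicts."""
    if isinstance(node, defaultdict):
        return {key: to_plain(value) for key, value in node.items()}
    return node

d = tree()
d['pericope-p.txt']['pericope_p'] = 'snth'   # two levels
d['x']['y']['z'] = 1                         # three levels, created on demand
plain = to_plain(d)
```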
you can do something like this:
>>> my_dict = {}
>>> key = 'a' # if key is not defined before it will raise NameError
>>> my_dict[key] = [1]
>>> my_dict[key].append(2)
>>> my_dict
{'a': [1, 2]}
Note: dict is a built-in name; don't use it as a variable name.
