Python JSON parsing: ignore sub-objects - python

I only want to parse the root object of a JSON string. If this object contains any key where the value is also an object, the value should be kept as string, and should not be treated as Python dictionary.
input = '{ "a": 1, "b": { "c": 2 } }'
Needed outcome:
result = {
'a': 1,
'b': '{ "c": 2 }'
}
The reason is that the sub-objects are large and we won't process them here, so parsing and storing them as typed values is not useful. Some parsing surely has to be done, but at least no objects would be created, and the deep processing of those tokens could be skipped.
After using json.loads(input), I would be able to convert the value back via json.dumps(result['b']). Is there a better way to do this? Maybe a pre-built JSONDecoder that yields all sub-object tokens as strings?

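Definitely not the best solution, and maybe something you have thought of already, but here is one that converts every value that is a dict back to a string after the fact. It uses json.dumps rather than str so that the resulting string is valid JSON (str would produce a Python repr with single quotes).
input = '{ "a": 1, "b": { "c": 2 } }'
import json
data = json.loads(input)
for k, v in data.items():
    if isinstance(v, dict):
        data[k] = json.dumps(v)  # serialize the sub-object back to a JSON string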
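If you need the sub-object text exactly as it appeared in the input (whitespace included), here is a rough sketch built on json.JSONDecoder.raw_decode, which returns the decoded value together with the index where it ended. The helper name parse_root_only is made up for this example. Note that it still decodes each nested object internally and then discards the result, so it preserves the original text but does not save the parsing work.

```python
import json


def parse_root_only(text):
    """Parse the root JSON object, keeping nested objects as their source text."""
    decoder = json.JSONDecoder()

    def skip_ws(i):
        # raw_decode does not skip leading whitespace, so do it by hand.
        while i < len(text) and text[i] in " \t\r\n":
            i += 1
        return i

    i = skip_ws(0)
    if text[i:i + 1] != "{":
        raise ValueError("root value must be a JSON object")
    i = skip_ws(i + 1)
    result = {}
    if text[i:i + 1] == "}":
        return result
    while True:
        key, i = decoder.raw_decode(text, i)
        if not isinstance(key, str):
            raise ValueError("object keys must be strings")
        i = skip_ws(i)
        if text[i:i + 1] != ":":
            raise ValueError("expected ':' after key")
        start = skip_ws(i + 1)
        value, i = decoder.raw_decode(text, start)
        # Nested objects are kept as the exact slice of the input text.
        result[key] = text[start:i] if isinstance(value, dict) else value
        i = skip_ws(i)
        if text[i:i + 1] == ",":
            i = skip_ws(i + 1)
        elif text[i:i + 1] == "}":
            return result
        else:
            raise ValueError("expected ',' or '}'")


print(parse_root_only('{ "a": 1, "b": { "c": 2 } }'))
```

Arrays and scalars are returned as normal Python values; only values that are objects are kept as strings.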

Related

How to sum equal key values when inserting them into a new dictionary in Python?

I have the dictionary that I got from a .txt file.
dictOne = {
"AAA": 0,
"BBB": 1,
"AAA": 3,
"BBB": 1,
}
I would like to generate a new dictionary called dictTwo with the sum of values of equal keys. Result:
dictTwo = {
"AAA": 3,
"BBB": 2,
}
I prepared the following code, but it points to error syntax (SyntaxError: invalid syntax):
import json
dictOne = json.loads(text)
dictTwo = {}
for k, v in dictOne.items():
    dictTwo [k] = v += v
Can anyone help me find the error?
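Assuming you resolve the duplicate key issue in the dict (a Python dict literal keeps only the last value for a repeated key), start from an empty dictTwo and accumulate into it:
dictOne = {
    "AAA": 0,
    "BBB": 1,
    "AAA": 3,  # replaces "AAA": 0
    "BBB": 1   # replaces the earlier "BBB": 1
}
dictTwo = {}
for k, v in dictOne.items():
    if k in dictTwo:
        dictTwo[k] += v
    else:
        dictTwo[k] = v
print(dictTwo)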
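As for the SyntaxError itself: `+=` is a statement in Python, not an expression, so it cannot appear on the right-hand side of `=`. Here is a minimal sketch of the accumulation, assuming the file can be read into (key, value) pairs before any dict is built. The pairs list is made-up sample data standing in for your file contents.

```python
# Hypothetical sample data: duplicates survive because this is a list, not a dict.
pairs = [("AAA", 0), ("BBB", 1), ("AAA", 3), ("BBB", 1)]

dictTwo = {}
for k, v in pairs:
    # dict.get supplies 0 the first time a key is seen.
    dictTwo[k] = dictTwo.get(k, 0) + v

print(dictTwo)  # {'AAA': 3, 'BBB': 2}
```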
You can do this if you do it while reading the JSON input.
JSON permits duplicate keys in objects, although it discourages the practice, noting that different JSON processors produce different results for duplicate keys.
Python does not allow duplicate keys in dictionaries, and Python's json module handles duplicate keys in one of the ways noted by the JSON standard: it ignores all but the last value for any such key. However, it gives you a mechanism to do your own processing of objects, in case you want to do something else with duplicate keys (or produce something other than a dictionary).
You do this by providing the object_pairs_hook parameter to json.load or json.loads. That parameter should be a function whose argument is an iterable of (key, value) pairs, where the key is a string and the value is an already processed JSON object. Whatever the function returns will be the value used by json.load for an object literal; it does not need to return a dict.
That implies that the handling of duplicate keys will be the same for every object literal in the JSON input, which is a bit of a limitation, but it may be acceptable in your case.
Here's a simple example:
import json
def key_combiner(pairs):
    rv = {}
    for k, v in pairs:
        if k in rv: rv[k] += v
        else: rv[k] = v
    return rv
# Sample usage:
# (Note: JSON doesn't allow trailing commas in objects or lists.)
json_data = '''{
    "AAA": 0,
    "BBB": 1,
    "AAA": 3,
    "BBB": 1
}'''
consolidated = json.loads(json_data, object_pairs_hook=key_combiner)
print(consolidated)
This prints {'AAA': 3, 'BBB': 2}.
If I'd known that the values were numbers, I could have used a slightly simpler definition using defaultdict. Writing it the way I did permits combining certain other value types, such as strings or arrays, provided that all the values for the same key in an object are the same type. (Unfortunately, it doesn't allow combining objects, because Python uses | to combine two dicts, instead of +.)
This feature was mostly intended to be used for creating class instances from json objects, but it has many other possible uses.
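For reference, the defaultdict variant mentioned above could look roughly like this. It is a sketch that assumes every value is a number; the function name sum_duplicates is just for illustration.

```python
import json
from collections import defaultdict


def sum_duplicates(pairs):
    # Assumes all values are numeric; missing keys start at 0.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


json_data = '{"AAA": 0, "BBB": 1, "AAA": 3, "BBB": 1}'
consolidated = json.loads(json_data, object_pairs_hook=sum_duplicates)
print(consolidated)  # {'AAA': 3, 'BBB': 2}
```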

KeyError 0 when deleting JSON key with value

I am trying to write a script that deletes a fragment of JSON.
I am currently stuck on deleting a key and its value.
I get KeyError: 0:
File "<stdin>", line 4, in <module>
KeyError: 0
I use json module and Python 2.7.
My sample json file is this:
"1": {
"aaa": "234235",
"bbb": "sfd",
"date": "01.01.2022",
"ccc": "456",
"ddd": "dghgdehs"
},
"2": {
"aaa": "544634436",
"bbb": "rgdfhfdsh",
"date": "01.01.2022",
"ccc": "etw",
"ddd": "sgedsry"
}
And faulty code is this:
import json
obj = json.load(open("aaa.json"))
for i in xrange(len(obj)):
    if obj[i]["date"] == "01.01.2022":
        obj.pop(i)
        break
What am I doing wrong here?
i will take on the integer values 0, 1, but your object is a dictionary with string keys "1", "2". So iterate over the keys instead, which is simply done like this:
for i in obj:
    if obj[i]["date"] == "01.01.2022":
        obj.pop(i)
        break
In your loop, range yields integers, the first being 0. There is no integer key in your JSON, so this immediately raises a KeyError.
Instead, loop over obj.items(), which yields key-value pairs. Since some of your entries are not dicts themselves, you will need to be careful when accessing the "date" field:
for k, v in obj.items():
    if isinstance(v, dict) and v.get("date") == "01.01.2022":
        obj.pop(k)
        break
The way you're reading it in, obj is a dict. You're trying to access it as a list, with integer indices. This code:
for i in range(len(obj)):
    if obj[i]["date"] == "Your Date":
        ...
first evaluates obj[0]["date"], then obj[1]["date"], and so on. Since obj is not a list, 0 is interpreted as a key, and since obj doesn't have a key 0, you get a KeyError.
A better way to do this would be to iterate through the dict by keys and values:
for k, v in list(obj.items()):  # iterate over a copy so popping is safe
    if v["date"] == "your date":  # index using the value
        obj.pop(k)  # delete the key
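Putting the answers together, here is a small self-contained sketch. It assumes the file content is wrapped in an outer { } so that it is valid JSON, and it reads from a string instead of aaa.json to keep the example runnable. Iterating over list(obj) takes a snapshot of the keys, so deleting inside the loop is safe in both Python 2.7 and Python 3.

```python
import json

# Sample data modelled on the question, shortened and wrapped in braces.
raw = '''{
    "1": {"aaa": "234235", "date": "01.01.2022"},
    "2": {"aaa": "544634436", "date": "01.01.2022"}
}'''
obj = json.loads(raw)

for key in list(obj):  # snapshot of the string keys "1", "2"
    if obj[key].get("date") == "01.01.2022":
        del obj[key]
        break  # remove only the first match, as in the question

print(json.dumps(obj))
```

Drop the break if every matching entry should be removed.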

Remove entire JSON object if it contains a specified phrase (from a list in python)

Have a JSON file output similar to:
{
"object1": {
"json_data": "{json data information}",
"tables_data": "TABLES_DATA"
},
"object2": {
"json_data": "{json data information}",
"tables_data": ""
}
}
Essentially, if there is an empty string for tables_data as shown in object2 (eg. "tables_data": ""), I want the entire object to be removed so that the output would look like:
{
"object1": {
"json_data": "{json data information}",
"tables_data": "TABLES_DATA"
}
}
What is the best way to go about doing this? Each of these objects corresponds to a separate index in a list called summary[] that I've been appending to.
To achieve this, you could iterate through the JSON dictionary and test the tables_data values, adding the objectX elements to a new dictionary if their tables_data value passes the test:
new_dict = {k: v for k, v in json_dict.items()
            if v.get("tables_data", "") != ""}
If your JSON objectX is stored in a list as you say, these could be processed as follows using a list comprehension:
filtered_summary = [object_dict for object_dict in summary
                    if object_dict.get("tables_data", "") != ""]
Unless you have compelling reasons to do otherwise, the Pythonic way to filter a list or dict (or any iterable) is not to change it in place but to create a new, filtered one. For your case this would look like
raw_data = YOUR_DICT_HERE_WHEREVER_IT_COMES_FROM
# NB: an empty string is falsy in a boolean test, so only non-empty values are kept
cleaned_data = {k: v for k, v in raw_data.items() if v["tables_data"]}
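An end-to-end sketch of the same idea, assuming the data arrives as a JSON string shaped like the sample in the question (the literal below is that sample, with placeholder values):

```python
import json

raw = '''{
    "object1": {"json_data": "{json data information}", "tables_data": "TABLES_DATA"},
    "object2": {"json_data": "{json data information}", "tables_data": ""}
}'''
data = json.loads(raw)

# Keep only entries whose tables_data is present and non-empty.
cleaned = {name: entry for name, entry in data.items() if entry.get("tables_data")}

print(json.dumps(cleaned, indent=4))
```

Using .get means an entry with no tables_data key at all is dropped as well, rather than raising a KeyError.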

Using Python to flatten and unpack dictionaries

I am using python to flatten and unpack JSON structures. I have already figured out flattening and can flatten JSON files into dictionary structures like this:
# Given the JSON
{
"a": "thing",
"b": {
"c": "foo"
},
"array": [
"item1",
"item2"
]
}
Then flatten() it into:
{
"a": "thing",
"b.c": "foo",
"array.[0]": "item1",
"array.[1]": "item2"
}
But any ideas on how to unpack those flattened dicts back into the original json? I had an idea on how to do it using string.split() on the key names but the arrays complicated things and now I don't know how to go about doing it. The trouble is arrays can have items that themselves are another array or dict. I am guessing something recursive?
UPDATE: So I have been looking around for packages that unflatten (or flatten + unflatten, I don't care) and I found this one, which works well except it can't handle paths that include the separator character as part of the key name.
For example, I had a path that flattened down into REG_SRC.http://www.awebsite.com/ but when unflattened, it got a little mangled because the dots in the URL were interpreted as key separators. Does anyone know of a library that can handle key names with any text, even text containing the separator character? I am assuming it would require the flat paths to be quote-encapsulated or something, like "REG_SRC"."http://www.awebsite.com/"
You could try this:
import re
from bisect import insort
RE_SPLIT = re.compile(r'(?<!\\)\.')  # split on dots that are not escaped with a backslash
RE_INDEX = re.compile(r'\[(\d+)\]')  # matches a list index such as [0]
data = {
    "a": "thing",
    "b.c": "foo",
    "array.[1]": "item2",
    "array.[0]": "item1",
    "some\\.text": 'bar2',  # escape a literal . to distinguish it from the `.` path operator
}
result = {}
for key, value in data.items():
    paths = RE_SPLIT.split(key)
    if len(paths) > 1:
        subkey, subvalue = paths  # handles only one level of nesting
        m = RE_INDEX.search(subvalue)
        if m:
            # store (index, value) tuples so insort keeps the list sorted by index
            insort(result.setdefault(subkey, []), (int(m.group(1)), value))
        else:
            result.setdefault(subkey, {})[subvalue] = value
    else:
        result[key.replace('\\.', '.')] = value
# convert each (index, item) tuple back to just the item
for k in result:
    if isinstance(result[k], list):
        result[k] = [e[1] for e in result[k]]
print(result)
{'a': 'thing', 'b': {'c': 'foo'}, 'array': ['item1', 'item2'], 'some.text': 'bar2'}
I used the bisect module to insert the list items in index order. As you can see, item2 ends up in the second position even though it comes first in the input.
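For deeper nesting (arrays inside arrays, dicts inside arrays), one possible generalisation is sketched below. It keeps the same conventions as the code above: segments are separated by `.`, a literal dot is escaped as `\.`, and `[n]` denotes a list index. The function names are made up for this example. It assumes leaf values are never dicts, that no real key looks like `[0]`, and that list indices have no gaps (gaps would be silently closed up).

```python
import re

SPLIT = re.compile(r'(?<!\\)\.')  # dots not preceded by a backslash
INDEX = re.compile(r'\[(\d+)\]')  # a whole segment of the form [n]


def parse_path(flat_key):
    """Turn 'a.[0].b' into ['a', 0, 'b'], unescaping literal dots."""
    tokens = []
    for part in SPLIT.split(flat_key):
        m = INDEX.fullmatch(part)
        tokens.append(int(m.group(1)) if m else part.replace('\\.', '.'))
    return tokens


def listify(node):
    """Recursively turn dicts keyed only by ints into lists ordered by index."""
    if not isinstance(node, dict):
        return node
    converted = {k: listify(v) for k, v in node.items()}
    if converted and all(isinstance(k, int) for k in converted):
        return [converted[k] for k in sorted(converted)]
    return converted


def unflatten(flat):
    root = {}
    for flat_key, value in flat.items():
        tokens = parse_path(flat_key)
        node = root
        for token in tokens[:-1]:
            node = node.setdefault(token, {})  # build intermediate containers
        node[tokens[-1]] = value
    return listify(root)


flat = {
    "a": "thing",
    "b.c": "foo",
    "array.[1]": "item2",
    "array.[0]": "item1",
    "REG_SRC.http://www\\.awebsite\\.com/": "bar",
    "m.[0].[1]": "y",
    "m.[0].[0]": "x",
    "m.[1].k": "z",
}
print(unflatten(flat))
```

Everything is first built as nested dicts, with int keys standing in for list positions, and only converted to lists at the end. That way the order in which the flat keys arrive does not matter.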

Python 3: taking a json and splitting it into smaller jsons

So I have a file with a json that looks like:
{
"a":{
"ab":2,
"cd":3
},
"b":{
"ef":2,
"gh":3
},
"c":{
"ij":2,
"kl":3
}
}
So in python, I would like to import this json from the file, and then from that break it into separate jsons, each in a separate variable, such that each variable would look like:
json1 = {
"a":{
"ab":2,
"cd":3
}
}
##etc.
And these json variables should function as variables that can be converted to json objects, via methods like json.load, or json.dump.
How can this be done?
Once you've imported the file with json.load, what you have is just a plain old Python dict:
import json
with open('bigfile.json') as f:
    bigd = json.load(f)
And if you iterate over items() for a dict, what you get is key-value pairs.
for key, value in bigd.items():
And turning a key-value pair back into a single-item dict is trivial.
smalld = {key: value}
At which point you have a dict again, so you can json.dump it.
with open(f'smallfile-{key}.json', 'w') as f:
    json.dump(smalld, f)
Or whatever else you want to do with them. For example, append each smalld to a listodicts, or convert its repr to ASCII art and send it to /dev/lpr0, or whatever.
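Here are the pieces put together as one runnable sketch. It reads the JSON from a string and serializes each piece with json.dumps instead of touching the filesystem, purely so that the example is self-contained. Swap in json.load and json.dump with open files as needed.

```python
import json

big_json = '''{
    "a": {"ab": 2, "cd": 3},
    "b": {"ef": 2, "gh": 3},
    "c": {"ij": 2, "kl": 3}
}'''
bigd = json.loads(big_json)

# One single-item dict per top-level key.
smalls = [{key: value} for key, value in bigd.items()]

for smalld in smalls:
    print(json.dumps(smalld))
```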
