Converting nested JSON into Python dictionary

I'm receiving a string server side which I then convert to JSON:
127.0.0.1:8000/devices/f751/?json={ "DeviceId":"192-2993-2993", "Date":"1/4/2019 9:52:2", "Location":"-1.000000000,-1.000000000", "Key":"{XXXX-XXXX-XXXX}", "Data":" { \"Value0\":\"{ \"ReferenceValue\":\"Elevation\", \"Prediction\":\"22.216558464\"}\", \"Value1\":\"{ \"ReferenceValue\":\"Wind Speed\", \"Prediction\":\"42.216558464\"}\" } "}
After conversion using json.loads() I get the following output:
updatedRequest = json.loads(jsonRequest)
updatedRequest
{'DeviceId': '192-2993-2993',
'Date': '1/4/2019 9:52:2',
'Location': '-1.000000000,-1.000000000',
'Key': '{XXXX-XXXX-XXXX}',
'Data': '{ "Value0":"{ "ReferenceValue":"Elevation", "Prediction":"22.216558464"}", "Value1":"{ "ReferenceValue":"Wind Speed", "Prediction":"42.216558464"}" }'}
So far so good, I can access the Data value via updatedRequest['Data'].
updatedRequest['Data']
'{ "Value0":"{ "ReferenceValue":"Elevation", "Prediction":"22.216558464"}", "Value1":"{ "ReferenceValue":"Wind Speed", "Prediction":"42.216558464"}" }'
My issue is converting this into a usable Python dictionary (e.g. updatedRequest['Data']['Value0']['ReferenceValue']). Because there is an unknown number of 'Value' keys, I'm uncertain what the best procedure would be to turn this into workable data.

You have received a JSON document with a nested JSON document, itself containing further JSON documents, inside one another like a Matryoshka doll.
Unfortunately, you can only decode one level, because the next level is broken. There should be \ escapes in front of the " quote characters used for the 3rd level of JSON documents, just like the second level quotes were escaped when it was embedded in the top-level JSON document. Those are missing so no JSON parser can decode it anymore. The delimiters around JSON strings have been derailed by stray, unescaped " characters that were meant to be part of a JSON string value.
Either repair the client sending this data, or discard these malformed values as an invalid request.
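To see the breakage described above, this minimal sketch feeds the Data string from the question back into json.loads(). It assumes Python 3.5+ (for json.JSONDecodeError); the variable names are illustrative only.

```python
import json

# The 'Data' string exactly as it comes out of the first json.loads() call
data = '{ "Value0":"{ "ReferenceValue":"Elevation", "Prediction":"22.216558464"}", "Value1":"{ "ReferenceValue":"Wind Speed", "Prediction":"42.216558464"}" }'

decoded_ok = True
try:
    json.loads(data)
except json.JSONDecodeError as exc:
    decoded_ok = False
    # The string value for Value0 ends at the unescaped quote after "{ ",
    # so the parser then meets ReferenceValue where it expects , or }
    print("cannot decode:", exc.msg, "at position", exc.pos)
```

Because the second level was embedded without escaping, the error is raised on the outer Data document itself, before any inner Value document is reached.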
For completeness' sake, a valid document would look like this:
>>> import json
>>> v0 = '''{ "ReferenceValue":"Elevation", "Prediction":"22.216558464"}'''
>>> v1 = '''{ "ReferenceValue":"Wind Speed", "Prediction":"42.216558464"}'''
>>> data_value = json.dumps({'Value0': v0, 'Value1': v1})
>>> print(json.dumps({'Data': data_value, 'Date': '1/4/2019 9:52:2', 'DeviceId': '192-2993-2993', 'Key': '{XXXX-XXXX-XXXX}', 'Location': '-1.000000000,-1.000000000'}, indent=4))
{
    "Data": "{\"Value0\": \"{ \\\"ReferenceValue\\\":\\\"Elevation\\\", \\\"Prediction\\\":\\\"22.216558464\\\"}\", \"Value1\": \"{ \\\"ReferenceValue\\\":\\\"Wind Speed\\\", \\\"Prediction\\\":\\\"42.216558464\\\"}\"}",
    "Date": "1/4/2019 9:52:2",
    "DeviceId": "192-2993-2993",
    "Key": "{XXXX-XXXX-XXXX}",
    "Location": "-1.000000000,-1.000000000"
}
Note the \" and \\\" escapes in the Data value. On decoding, the string value for Data will have one level of escape sequences removed, forming " and \" sequences, where the " quotes are part of the JSON syntax and \" are part of the string values, which in turn can be decoded to " used in the innermost JSON document.
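Assuming the client is fixed so the payload is escaped like the valid document above, a minimal sketch of turning it into nested dictionaries follows. The helper decode_nested is hypothetical (not from the thread): it walks the decoded structure and calls json.loads() on any string that looks like a JSON object or array, so it never needs to know how many Value keys there are.

```python
import json

def decode_nested(value):
    """Recursively json.loads() string values that hold a JSON object or array."""
    if isinstance(value, dict):
        return {key: decode_nested(item) for key, item in value.items()}
    if isinstance(value, list):
        return [decode_nested(item) for item in value]
    if isinstance(value, str) and value.strip().startswith(("{", "[")):
        try:
            return decode_nested(json.loads(value))
        except ValueError:
            return value  # looks like JSON but is not, e.g. '{XXXX-XXXX-XXXX}'
    return value

# Build a correctly escaped request, nesting JSON inside JSON twice
v0 = '{ "ReferenceValue":"Elevation", "Prediction":"22.216558464"}'
v1 = '{ "ReferenceValue":"Wind Speed", "Prediction":"42.216558464"}'
json_request = json.dumps({
    "DeviceId": "192-2993-2993",
    "Key": "{XXXX-XXXX-XXXX}",
    "Data": json.dumps({"Value0": v0, "Value1": v1}),
})

updated_request = decode_nested(json.loads(json_request))
print(updated_request["Data"]["Value0"]["ReferenceValue"])
for name, value in updated_request["Data"].items():
    print(name, value["ReferenceValue"], value["Prediction"])
```

The trade-off of this design is that any string value that happens to be valid JSON gets decoded too; if the schema is known, calling json.loads() explicitly on just Data and its Value entries is stricter.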

It really depends what you want to do with the data. Once 'Data' has been decoded into a dictionary (as shown above it is still a string, which has no .items() method), you can loop through it with:
for k, v in updatedRequest['Data'].items():
    print(k, v)  # do some stuff
This lets you process every entry without knowing in advance how many items the dictionary holds. It's hard to say what is best without knowing exactly what you wish to do, though!

Related

Json.loads() string that represents a dictionary with unicode values

I have a command line tool that takes strings representing dictionaries and runs traffic:
warp_cli -j '{"warp_ip":"172.18...
Robot Framework generates parts of these dictionaries and injects unicode values such as u'TCP'. If a unicode value is passed into the dictionary, I get:
ValueError: No JSON object could be decoded
Example:
import json
streams = [
    {
        "burst_loop_count": "500",
        "protocol": u"TCP",
        "tcp_src_port": "10000",
    },
    {
        "burst_loop_count": "500",
        "protocol": u"TCP",
        "tcp_src_port": "10000"
    }
]
def _create_streams_arg(streams):
    return str(streams).replace("'", '"')
print(json.loads(_create_streams_arg((streams))))
Is there a way to map str(x) or x.encode("ascii", "ignore") in place over all values in streams, or to have json.loads() properly load something that holds unicode?
I got it working with:
def _encode(streams):
    encoded = []
    for stream in streams:
        encoded.append({str(k): str(v) for k, v in stream.iteritems()})
    return encoded
But I hoped there might be a cleaner solution.

Unable to iterate through JSON data

I'm trying to loop through JSON data to find values for specific keys. My data comes from an HTTP request and looks like this:
{'1': {'manufacturername': 'SVLJ',
'modelid': 'TCL014',
'name': 'Fling'},
'10': {'manufacturername': 'SONY',
'modelid': 'BLL4554',
'name': 'ACQ'}}
My current goal is to loop through each item number (1, 10, etc.) and get the name of each light ('Fling', 'ACQ', etc.). My latest attempt is:
import requests
RESOURCE_URL = 'xxx/xxx/'
def get_json(url):
    raw_response = requests.get(url)
    data = raw_response.json()
    return data
def get_SMR():
    url = "{}SMR/".format(RESOURCE_URL)
    return get_json(url)
smr_json = get_SMR()
for SMR in smr_json:
    print(SMR['name'])
When I try running this, I get the error:
TypeError: string indices must be integers
I've also tried importing the json library and using json.loads(raw_response.text); however, it's still being treated as a string rather than an iterable JSON object (that can be referenced by key). Any and all insight would be greatly appreciated.

When you do for SMR in smr_json:, you are iterating over the keys of the dictionary. In other words, SMR is a string, and a string cannot be indexed by another string:
In [1]: SMR = 'test'
In [2]: SMR['string']
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
...
TypeError: string indices must be integers
You meant to iterate over both the keys and values:
for key, SMR in smr_json.items():
    print(SMR['name'])
Or just the values:
for SMR in smr_json.values():
    print(SMR['name'])
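A self-contained sketch of the difference, using the sample data from the question as an already decoded dict (standing in for the requests response):

```python
# The sample response from the question, already decoded into a dict
smr_json = {'1': {'manufacturername': 'SVLJ', 'modelid': 'TCL014', 'name': 'Fling'},
            '10': {'manufacturername': 'SONY', 'modelid': 'BLL4554', 'name': 'ACQ'}}

for key in smr_json:                 # iterating a dict yields its keys (strings)
    print(type(key).__name__, key)

for key, smr in smr_json.items():    # key/value pairs
    print(key, smr['name'])

# Only the values, collected into a list
names = [smr['name'] for smr in smr_json.values()]
print(names)
```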

You are probably getting a string because that is not valid JSON. JSON requires " for strings, not '.
See json.org:
A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes.

I think the problem is in the JSON file: single quotes are not allowed.
I'd first replace the single quotes ' with double quotes ", to get something like this:
{
"1": {
"manufacturername": "SVLJ",
"modelid": "TCL014",
"name": "Fling"
},
"10": {
"manufacturername": "SONY",
"modelid": "BLL4554",
"name": "ACQ"
}
}
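A quick way to check the quoting rule, as a sketch assuming Python 3.5+: json.loads() rejects single-quoted strings and accepts double-quoted ones. Note that once parsed, Python prints the dict with single quotes again, so output like the one in the question may simply be the repr of an already decoded dict rather than the raw response text.

```python
import json

single_quoted = "{'1': {'name': 'Fling'}}"   # Python-style quoting
double_quoted = '{"1": {"name": "Fling"}}'   # JSON-style quoting

rejected = False
try:
    json.loads(single_quoted)
except json.JSONDecodeError as exc:
    rejected = True
    print("single quotes rejected:", exc.msg)

parsed = json.loads(double_quoted)
print(parsed["1"]["name"])
print(repr(parsed))   # Python's own repr uses single quotes again
```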

Importing wrongly concatenated JSONs in python

I have a text document containing several thousand JSON strings in the form "{...}{...}{...}". This is not valid JSON itself, but each {...} is.
I currently use the following regular expression to split them:
import re
fp = open('my_file.txt', 'r')
raw_dataset = (re.sub('}{', '}\n{', fp.read())).split('\n')
This inserts a line break wherever one curly bracket closes and another opens (}{ -> }\n{), so I can split them into separate lines.
The problem is that few of them have a tags attribute written as "{tagName1}{tagName2}" which breaks my regular expression.
An example would be:
'{"name":\"Bob Dylan\", "tags":"{Artist}{Singer}"}{"name": "Michael Jackson"}'
Is parsed into
'{"name":"Bob Dylan", "tags":"{Artist}'
'{Singer}"}'
'{"name": "Michael Jackson"}'
instead of
'{"name":"Bob Dylan", "tags":"{Artist}{Singer}"}'
'{"name": "Michael Jackson"}'
What is the proper way to achieve this for further JSON parsing?

Use the raw_decode method of json.JSONDecoder:
>>> import json
>>> d = json.JSONDecoder()
>>> x='{"name":\"Bob Dylan\", "tags":"{Artist}{Singer}"}{"name": "Michael Jackson"}'
>>> d.raw_decode(x)
({'name': 'Bob Dylan', 'tags': '{Artist}{Singer}'}, 47)
>>> x=x[47:]
>>> d.raw_decode(x)
({'name': 'Michael Jackson'}, 27)
raw_decode returns a 2-tuple: the first element is the decoded JSON value and the second is the index in the string just past the end of that JSON document.
To loop until the end, or until an invalid JSON element is encountered (starting again from the full string):
>>> x='{"name":\"Bob Dylan\", "tags":"{Artist}{Singer}"}{"name": "Michael Jackson"}'
>>> while True:
...     try:
...         j, n = d.raw_decode(x)
...     except ValueError:
...         break
...     print(j)
...     x = x[n:]
...
{'name': 'Bob Dylan', 'tags': '{Artist}{Singer}'}
{'name': 'Michael Jackson'}
When the loop breaks, inspecting x will reveal whether it processed the whole string or hit a JSON syntax error.
With a very long file of short elements, you might read a chunk into a buffer and apply the above loop, concatenating anything left over with the next chunk after the loop breaks.
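The same idea can be wrapped in a generator. This is a sketch, and iter_concatenated_json is a hypothetical name. Two assumptions: raw_decode does not skip leading whitespace, so the loop skips it explicitly (useful if documents are separated by newlines); and it passes a start index through raw_decode's optional idx argument, which CPython's implementation accepts but the documented signature does not mention. It avoids re-slicing the string on every step, and unlike the loop above it raises json.JSONDecodeError on a malformed element instead of stopping silently.

```python
import json

def iter_concatenated_json(text):
    """Yield each JSON value from a string like '{...}{...} {...}'."""
    decoder = json.JSONDecoder()
    idx = 0
    length = len(text)
    while True:
        # raw_decode does not skip whitespace itself, so do it here
        while idx < length and text[idx].isspace():
            idx += 1
        if idx == length:
            return
        # raw_decode returns (value, index just past the value)
        obj, idx = decoder.raw_decode(text, idx)
        yield obj

sample = '{"name":"Bob Dylan", "tags":"{Artist}{Singer}"}\n{"name": "Michael Jackson"}'
for record in iter_concatenated_json(sample):
    print(record)
```

For the file in the question, something like `with open('my_file.txt') as fp: records = list(iter_concatenated_json(fp.read()))` would replace the regex split.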

You can use the jq command line utility to convert your input to JSON. Let's say you have the following input:
input.txt:
{"name":"Bob Dylan", "tags":"{Artist}{Singer}"}{"name": "Michael Jackson"}
You can use jq -s, which reads multiple JSON documents from the input and collects them into a single output array:
jq -s . input.txt
Gives you:
[
{
"name": "Bob Dylan",
"tags": "{Artist}{Singer}"
},
{
"name": "Michael Jackson"
}
]
I've just realized that there are Python bindings for libjq, meaning you don't need to use the command line; you can use jq directly in Python:
https://github.com/mwilliamson/jq.py
However, I haven't tried it so far. Let me give it a try :) ...
Update: The above library is nice, but it does not support slurp mode so far.

You need to write a parser; I don't think a regex can help you here:
import json

def get_dicts(file_text):
    data = ""
    curlies = []
    for letter in file_text:
        data += letter
        if letter == "{":
            curlies.append(letter)
        elif letter == "}":
            curlies.pop()  # remove the matching opening brace
            if not curlies:  # braces balanced: one complete document
                yield json.loads(data)
                data = ""
Note that this does not solve the problem that {name:"bob"} is not valid JSON ({"name":"bob"} is).
This will also break if strings contain unbalanced braces; e.g. {"name":"{{}}}"} would break it.
Really, based on your example your JSON is so broken that your best bet is probably to edit it by hand and fix the code that is generating it. If that is not feasible, you may need to write a more complex parser using pylex or some other grammar library (effectively writing your own language parser).

decoding JSON data with backslash encoding

I have the following JSON data.
"[
\"msgType\": \"0\",
\"tid\": \"1\",
\"data\": \"[
{
\\\"EventName\\\": \\\"TExceeded\\\",
\\\"Severity\\\": \\\"warn\\\",
\\\"Subject\\\": \\\"Exceeded\\\",
\\\"Message\\\": \\\"tdetails: {
\\\\\\\"Message\\\\\\\": \\\\\\\"my page tooktoolong(2498ms: AT: 5ms,
BT: 1263ms,
CT: 1230ms),
andexceededthresholdof5ms\\\\\\\",
\\\\\\\"Referrer\\\\\\\": \\\\\\\"undefined\\\\\\\",
\\\\\\\"Session\\\\\\\": \\\\\\\"None\\\\\\\",
\\\\\\\"ResponseTime\\\\\\\": 0,
\\\\\\\"StatusCode\\\\\\\": 0,
\\\\\\\"Links\\\\\\\": 215,
\\\\\\\"Images\\\\\\\": 57,
\\\\\\\"Forms\\\\\\\": 2,
\\\\\\\"Platform\\\\\\\": \\\\\\\"Linuxx86_64\\\\\\\",
\\\\\\\"BrowserAppname\\\\\\\": \\\\\\\"Netscape\\\\\\\",
\\\\\\\"AppCodename\\\\\\\": \\\\\\\"Mozilla\\\\\\\",
\\\\\\\"CPUs\\\\\\\": 8,
\\\\\\\"Language\\\\\\\": \\\\\\\"en-GB\\\\\\\",
\\\\\\\"isEvent\\\\\\\": \\\\\\\"true\\\\\\\",
\\\\\\\"PageLatency\\\\\\\": 2498,
\\\\\\\"Threshold\\\\\\\": 5,
\\\\\\\"AT\\\\\\\": 5,
\\\\\\\"BT\\\\\\\": 1263,
\\\\\\\"CT\\\\\\\": 1230
}\\\",
\\\"EventTimestamp\\\": \\\"1432514783269\\\"
}
]\",
\"Timestamp\": \"1432514783269\",
\"AppName\": \"undefined\",
\"Group\": \"UndefinedGroup\"
]"
I want to flatten this JSON into a single level, i.e. I want to remove the nested structure inside and copy that data over to the top-level JSON structure. How can I do this?
If this structure is named json_data, I want to be able to access:
json_data['Platform']
json_data['BrowserAppname']
json_data['Severity']
json_data['msgType']
Basically some kind of rudimentary normalization. What is the easiest way to do this using Python?

A solution that is generally unsafe, but probably okay in this case, would be:
import json
d = json.loads(json_string.replace('\\', ''))

I'm not sure what happened, but this doesn't look like valid JSON:
1. You have some double quotes escaped once, some twice, some three times, etc.
2. You have key/value pairs inside a list-like object [].
3. tdetails is missing a trailing quote.
Even if you fix the above, you still have your data list quoted as a multi-line string, which is invalid.
It appears that this "JSON" was constructed by hand, by someone with no knowledge of JSON.
You can try "massaging" the data into JSON with the following:
import re
x = re.sub(r'\\+', '', js_str)  # strip every run of backslashes
x = re.sub(r'\n', '', x)  # join the multi-line string into one line
x = '{' + x.strip()[1:-1] + '}'  # swap the outer delimiters for braces
This makes the string almost JSON-like, but you still need to fix point #3.
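Once the payload is valid JSON (for example after the client is fixed to escape each level properly), the flattening the question asks for can be sketched as below. Everything here is illustrative: flatten and _maybe_json are hypothetical helpers, and the sample payload is a shortened, correctly escaped stand-in in which the invented key "Details" holds the innermost document (the original "tdetails: {...}" text is not pure JSON). On key collisions the value seen last wins, and scalars sitting directly inside a list are skipped; both are arbitrary choices.

```python
import json

def _maybe_json(item):
    """Decode item if it is a string holding a JSON object or array."""
    if isinstance(item, str) and item.strip().startswith(("{", "[")):
        try:
            return json.loads(item)
        except ValueError:
            pass
    return item

def flatten(value, out=None):
    """Copy every scalar leaf of nested dicts/lists into one flat dict.

    Key collisions: the value seen last wins. Scalars that sit directly
    in a list (with no key of their own) are skipped.
    """
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, item in value.items():
            item = _maybe_json(item)
            if isinstance(item, (dict, list)):
                flatten(item, out)
            else:
                out[key] = item
    elif isinstance(value, list):
        for item in value:
            flatten(_maybe_json(item), out)
    return out

# Hypothetical, correctly escaped version of the payload; "Details" is an invented key
inner = {"Platform": "Linuxx86_64", "BrowserAppname": "Netscape", "CPUs": 8}
event = {"EventName": "TExceeded", "Severity": "warn", "Details": json.dumps(inner)}
payload = json.dumps({"msgType": "0", "tid": "1", "data": json.dumps([event]),
                      "AppName": "undefined"})

json_data = flatten(json.loads(payload))
print(json_data["Platform"], json_data["BrowserAppname"],
      json_data["Severity"], json_data["msgType"])
```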

Parsing JSON failed

I am trying to parse this data (specifically, from the Viper malware analysis framework API). I am having a hard time figuring out the best way to do this. Ideally, I would just do:
jsonObject.get("SSdeep")
... and I would get the value.
Unfortunately, I don't think this is valid JSON. Without editing the source of the project, how can I make this proper JSON or easily get these values?
[{
'data': {
'header': ['Key', 'Value'],
'rows': [
['Name', u 'splwow64.exe'],
['Tags', ''],
['Path', '/home/ubuntu/viper-master/projects/../binaries/8/e/e/5/8ee5b228bd78781aa4e6b2e15e965e24d21f791d35b1eccebd160693ba781781'],
['Size', 125952],
['Type', 'PE32+ executable (GUI) x86-64, for MS Windows'],
['Mime', 'application/x-dosexec'],
['MD5', '4b1d2cba1367a7b99d51b1295b3a1d57'],
['SHA1', 'caf8382df0dcb6e9fb51a5e277685b540632bf18'],
['SHA256', '8ee5b228bd78781aa4e6b2e15e965e24d21f791d35b1eccebd160693ba781781'],
['SHA512', '709ca98bfc0379648bd686148853116cabc0b13d89492c8a0fa2596e50f7e4d384e5c359081a90f893d8d250cfa537193cbaa1c53186f29c0b6dedeb50d53d4d'],
['SSdeep', ''],
['CRC32', '7106095E']
]
},
'type': 'table'
}]
Edit 1
Thank you! So I have tried this:
jsonObject = r.content.replace("'", "\"")
jsonObject = jsonObject.replace(" u", "")
and the output I have now is:
"[{"data": {"header": ["Key", "Value"], "rows": [["Name","splwow64.exe"], ["Tags", ""], ["Path", "/home/ubuntu/viper-master/projects/../binaries/8/e/e/5/8ee5b228bd78781aa4e6b2e15e965e24d21f791d35b1eccebd160693ba781781"], ["Size", 125952], ["Type", "PE32+ executable (GUI) x86-64, for MS Windows"], ["Mime", "application/x-dosexec"], ["MD5", "4b1d2cba1367a7b99d51b1295b3a1d57"], ["SHA1", "caf8382df0dcb6e9fb51a5e277685b540632bf18"], ["SHA256", "8ee5b228bd78781aa4e6b2e15e965e24d21f791d35b1eccebd160693ba781781"], ["SHA512", "709ca98bfc0379648bd686148853116cabc0b13d89492c8a0fa2596e50f7e4d384e5c359081a90f893d8d250cfa537193cbaa1c53186f29c0b6dedeb50d53d4d"], ["SSdeep", ""], ["CRC32", "7106095E"]]}, "type": "table"}]"
and now I'm getting this error:
File "/usr/lib/python2.7/json/decoder.py", line 369, in decode
raise ValueError(errmsg("Extra data", s, end, len(s)))
ValueError: Extra data: line 1 column 5 - line 1 column 716 (char 4 - 715)
Note: I'd really rather not do find-and-replace like that, especially the " u" one, as it could have unintended consequences.
Edit 2:
Figured it out! Thank you everyone!
Here's what I ended up doing, since someone mentioned the original text from the server was a "list of dicts":
import ast
import requests
r = requests.post(url, data=data)  # make the server request
listObject = r.content  # grab the content (don't really need this line)
listObject = listObject[1:-1]  # get rid of the wrapping quotes
listObject = ast.literal_eval(listObject)  # build a list from the literal characters of the string
dictObject = listObject[0]  # my dict!

JSON requires double quotes (") for strings. From the JSON standard:
A value can be a string in double quotes, or a number, or true or false or null, or an object or an array.
So you would need to replace all the single quotes with double quotes:
data.replace("'", '"')
There is also a spurious u in the Name field that will need to be removed.
However, if the data is valid Python and you trust it, you could try evaluating it; this worked with your original data (without the space after the u):
result = eval(data)
Or, more safely:
import ast
result = ast.literal_eval(data)
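To get from there to the jsonObject.get("SSdeep") style lookup the question wants, the rows (a list of [key, value] pairs) can be turned into a dict. A sketch, assuming Python 3.3+ (where u'...' literals are valid again) and using a shortened copy of the response from the question with the stray space after the u removed:

```python
import ast

# Shortened copy of the response text from the question, with the stray
# space in "u 'splwow64.exe'" removed (u'...' is a valid Python literal)
raw = ("[{'data': {'header': ['Key', 'Value'], 'rows': ["
       "['Name', u'splwow64.exe'], ['Tags', ''], ['Size', 125952], "
       "['SSdeep', ''], ['CRC32', '7106095E']]}, 'type': 'table'}]")

tables = ast.literal_eval(raw)    # evaluates literals only, never calls code
rows = tables[0]['data']['rows']  # list of [key, value] pairs
info = dict(rows)                 # {'Name': 'splwow64.exe', 'Tags': '', ...}

print(info.get('SSdeep'))
print(info['Size'], info['CRC32'])
```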

Now you appear to have quotes "wrapping" the entire thing, which turns all the brackets into string content. Remove the quotes at the start and end of the JSON.
Also, in JSON, start the structure with either '[' or '{' (usually '{'), not both.

There's no need to use eval(); just replace the malformed characters (escaping the quotes with \) and parse the result with json:
import json
resp = r.content.replace(" u \'", " \'").replace("\'", "\"")
json.loads(resp)
