Convert a bytes array into JSON format - python

I want to parse a bytes string in JSON format to convert it into python objects. This is the source I have:
my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'
And this is the desired outcome I want to have:
[{
"Date": "2016-05-21T21:35:40Z",
"CreationDate": "2012-05-05",
"LogoType": "png",
"Ref": 164611595,
"Classes": [
"Email addresses",
"Passwords"
],
"Link": "http://some_link.com"}]
First, I converted the bytes to string:
my_new_string_value = my_bytes_value.decode("utf-8")
but when I try to invoke loads to parse it as JSON:
my_json = json.loads(my_new_string_value)
I get this error:
json.decoder.JSONDecodeError: Expecting value: line 1 column 174 (char 173)

Your bytes object is almost JSON, but it's using single quotes instead of double quotes, and it needs to be a string. So one way to fix it is to decode the bytes to str and replace the quotes. Another option is to use ast.literal_eval; see below for details. If you want to print the result or save it to a file as valid JSON you can load the JSON to a Python list and then dump it out. Eg,
import json
my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'
# Decode UTF-8 bytes to Unicode, and convert single quotes
# to double quotes to make it valid JSON
my_json = my_bytes_value.decode('utf8').replace("'", '"')
print(my_json)
print('- ' * 20)
# Load the JSON to a Python list & dump it back out as formatted JSON
data = json.loads(my_json)
s = json.dumps(data, indent=4, sort_keys=True)
print(s)
output
[{"Date": "2016-05-21T21:35:40Z", "CreationDate": "2012-05-05", "LogoType": "png", "Ref": 164611595, "Classe": ["Email addresses", "Passwords"],"Link":"http://some_link.com"}]
- - - - - - - - - - - - - - - - - - - -
[
{
"Classe": [
"Email addresses",
"Passwords"
],
"CreationDate": "2012-05-05",
"Date": "2016-05-21T21:35:40Z",
"Link": "http://some_link.com",
"LogoType": "png",
"Ref": 164611595
}
]
As Antti Haapala mentions in the comments, we can use ast.literal_eval to convert my_bytes_value to a Python list, once we've decoded it to a string.
from ast import literal_eval
import json
my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'
data = literal_eval(my_bytes_value.decode('utf8'))
print(data)
print('- ' * 20)
s = json.dumps(data, indent=4, sort_keys=True)
print(s)
Generally, this problem arises because someone has saved data by printing its Python repr instead of using the json module to create proper JSON data. If it's possible, it's better to fix that problem so that proper JSON data is created in the first place.

You can simply use,
import json
json.loads(my_bytes_value)

Python 3.5 + Use io module
import json
import io
my_bytes_value = b'[{\'Date\': \'2016-05-21T21:35:40Z\', \'CreationDate\': \'2012-05-05\', \'LogoType\': \'png\', \'Ref\': 164611595, \'Classe\': [\'Email addresses\', \'Passwords\'],\'Link\':\'http://some_link.com\'}]'
fix_bytes_value = my_bytes_value.replace(b"'", b'"')
my_json = json.load(io.BytesIO(fix_bytes_value))

d = json.dumps(byte_str.decode('utf-8'))

To convert this bytesarray directly to json, you could first convert the bytesarray to a string with decode(), utf-8 is standard. Change the quotation markers.. The last step is to remove the " from the dumped string, to change the json object from string to list.
dumps(s.decode()).replace("'", '"')[1:-1]

Better solution is:
import json
byte_array_example = b'{"text": "\u0627\u06CC\u0646 \u06CC\u06A9 \u0645\u062A\u0646 \u062A\u0633\u062A\u06CC \u0641\u0627\u0631\u0633\u06CC \u0627\u0633\u062A."}'
res = json.loads(byte_array_example.decode('unicode_escape'))
print(res)
result:
{'text': 'این یک متن تستی فارسی است.'}
decode by utf-8 cannot decode unicode characters. The right solution is uicode_escape
It is OK

if you have a bytes object and want to store it in a JSON file, then you should first decode the byte object because JSON only has a few data types and raw byte data isn't one of them. It has arrays, decimal numbers, strings, and objects.
To decode a byte object you first have to know its encoding. For this, you can use
import chardet
encoding = chardet.detect(your_byte_object)['encoding']
then you can save this object to your json file like this
data = {"data": your_byte_object.decode(encoding)}
with open('request.txt', 'w') as file:
json.dump(data, file)

The most simple solution is to use the json function that comes with http request.
For example:

Related

How to convert binary data to json

I want to convert the below data to json in python.
I have the data in the following format.
b'{"id": "1", "name": " value1"}\n{"id":"2", name": "value2"}\n{"id":"3", "name": "value3"}\n'
This has multiple json objects separated by \n. I was trying to load this as json .
converted the data into string first and loads as json but getting the exception.
my_json = content.decode('utf8')
json_data = json.loads(my_json)
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 2306)
You need to decode it then split by '\n' and load each json object separately. If you store your byte string in a variable called byte_string you could do something like:
json_str = byte_string.decode('utf-8')
json_objs = json_str.split('\n')
for obj in json_objs:
json.loads(obj)
For the particular string that you have posted here though, you will get an error on the second object because the second key in it is missing a double quote. It is name" in the string you linked.
First, this isn't valid json since it's not a single object. Second, there is a typo: the "id":"2" entry is missing a double-quote on the name property element.
An alternative to processing one dict at a time, you can replace the newlines with "," and turn it into an array. This is a fragile solution since it requires exactly one newline between each dict, but is compact:
s = b'{"id": "1", "name": " value1"}\n{"id":"2", "name": "value2"}\n{"id":"3", "name": "value3"}\n'
my_json = s.decode('utf8')
json_data = json.loads("[" + my_json.rstrip().replace("\n", ",") + "]")
What have to first decode your json to a string. So you can just say:
your_json_string = the_json.decode()
now you have a string.
Now what you want to do is:
your_json_string = your_json_string.replace("\\n", "")
so you are replacing the \n with nothing basically. Note that the two backslashes are required, this is not a typo.
Now you can just say:
your_json = json.loads(your_json_string)

Why do I need to do `json.loads` twice to parse a JSON string?

I'm trying to parse a JSON string with
json.loads(json_string)
but it returns a string instead of a dict. I can get the expected result by parsing it again
json.loads(json.loads(json_string))
but I don't understand why.
I receive a bytes object from a webhook:
bytes_object = b'"{\\"action\\":\\"connection_test\\",\\"data\\":{}}"'
The bytes object is then utf-8 decoded:
decoded_bytes = bytes_object.decode('utf-8')
"{\"action\":\"connection_test\",\"data\":{}}"
Then, the utf-8 decoded object is parsed using json.loads:
parsed_once = json.loads(decoded_bytes)
But this doesn't return a dict, but a string object looking like this:
{"action":"connection_test","data":{}}
of type <class 'str'>.
But if I parse it again I get the dict expected from the first try:
parsed_twice = json.loads(parsed_once)
{'action': 'connection_test', 'data': {}}
of type <class 'dict'>.
I suspect it's something about how Python 3.9 handles JSON escaping, but I'm not sure. Any help?
The JSON is double encoded, so it needs to be double-decoded. It went something like this:
>>> import json
>>> data = {'action': 'connection_test', 'data': {}}
>>> a = json.dumps(data)
>>> print(a)
{"action": "connection_test", "data": {}}
>>> b = json.dumps(a)
>>> print(b)
"{\"action\": \"connection_test\", \"data\": {}}"
That's a mistake that needs to be rectified on the producer side. As long as the producer gives you this double encoded JSON, you need to double decode it.

best ways to Decode and process Binary Bytes from files

I am trying to decode and process Binary data file, following is a data format
input:9,data:443,gps:3
and has more data in the same fashion, [key:value] format.
basically, I need to create a dictionary of the file to process it later.
input:b'input:9,data:443,gps:3'
Desired output:{'input': '9', 'data': '443', 'gps': '3'}
Your input data is bytes(sequence of bytes). To convert it to str object you can use bytes.decode(). Than you can work with data lie with sting and split it by , and :. Code:
inp = b"input:9,data:443,gps:3"
out = dict(s.split(":") for s in inp.decode().split(","))
The string input:9,data:443,gps:3 is text, not binary data, so I am going to guess that it is a format template, not a sample of the file contents. This format would mean that your file has an "input" field that is 9 bytes long, followed by 443 bytes of "data", followed by a 3-byte "gps" value. This description does not specify the types of the fields, so it is incomplete; but it's a start.
The Python tool for structured binary files is the module struct. Here's how to extract the three fields as bytes objects:
import struct
with open("some_file.bin", "rb") as binfile:
content = binfile.read()
input_, data, gps = struct.unpack("9s443s3s", content)
The function struct.unpack provides many other formats besides s; this is just an example. But there is no specifier for plain text strings, so if input_ is a text string, the next step is to convert it:
input_ = input_.decode("ascii") # or other encoding
Since you ask for a dictionary of the results, here is one way:
result = { "input":input_, "data": data, "gps": gps }
Solution using eval:
inp = b"input:9,data:443,gps:3"
out = eval(b'dict(%s)' % inp.replace(b':', b'='))

Parse the json output for specific value [duplicate]

I'll be receiving a JSON encoded string from Objective-C, and I am decoding a dummy string (for now) like the code below. My output comes out with character 'u' prefixing each item:
[{u'i': u'imap.gmail.com', u'p': u'aaaa'}, {u'i': u'333imap.com', u'p': u'bbbb'}...
How is JSON adding this Unicode character? What's the best way to remove it?
mail_accounts = []
da = {}
try:
s = '[{"i":"imap.gmail.com","p":"aaaa"},{"i":"imap.aol.com","p":"bbbb"},{"i":"333imap.com","p":"ccccc"},{"i":"444ap.gmail.com","p":"ddddd"},{"i":"555imap.gmail.com","p":"eee"}]'
jdata = json.loads(s)
for d in jdata:
for key, value in d.iteritems():
if key not in da:
da[key] = value
else:
da = {}
da[key] = value
mail_accounts.append(da)
except Exception, err:
sys.stderr.write('Exception Error: %s' % str(err))
print mail_accounts
The u- prefix just means that you have a Unicode string. When you really use the string, it won't appear in your data. Don't be thrown by the printed output.
For example, try this:
print mail_accounts[0]["i"]
You won't see a u.
Everything is cool, man. The 'u' is a good thing, it indicates that the string is of type Unicode in python 2.x.
http://docs.python.org/2/howto/unicode.html#the-unicode-type
The d3 print below is the one you are looking for (which is the combination of dumps and loads) :)
Having:
import json
d = """{"Aa": 1, "BB": "blabla", "cc": "False"}"""
d1 = json.loads(d) # Produces a dictionary out of the given string
d2 = json.dumps(d) # Produces a string out of a given dict or string
d3 = json.dumps(json.loads(d)) # 'dumps' gets the dict from 'loads' this time
print "d1: " + str(d1)
print "d2: " + d2
print "d3: " + d3
Prints:
d1: {u'Aa': 1, u'cc': u'False', u'BB': u'blabla'}
d2: "{\"Aa\": 1, \"BB\": \"blabla\", \"cc\": \"False\"}"
d3: {"Aa": 1, "cc": "False", "BB": "blabla"}
Those 'u' characters being appended to an object signifies that the object is encoded in Unicode.
If you want to remove those 'u' characters from your object, you can do this:
import json, ast
jdata = ast.literal_eval(json.dumps(jdata)) # Removing uni-code chars
Let's checkout from python shell
>>> import json, ast
>>> jdata = [{u'i': u'imap.gmail.com', u'p': u'aaaa'}, {u'i': u'333imap.com', u'p': u'bbbb'}]
>>> jdata = ast.literal_eval(json.dumps(jdata))
>>> jdata
[{'i': 'imap.gmail.com', 'p': 'aaaa'}, {'i': '333imap.com', 'p': 'bbbb'}]
Unicode is an appropriate type here. The JSONDecoder documentation describe the conversion table and state that JSON string objects are decoded into Unicode objects.
From 18.2.2. Encoders and Decoders:
JSON Python
==================================
object dict
array list
string unicode
number (int) int, long
number (real) float
true True
false False
null None
"encoding determines the encoding used to interpret any str objects decoded by this instance (UTF-8 by default)."
The u prefix means that those strings are unicode rather than 8-bit strings. The best way to not show the u prefix is to switch to Python 3, where strings are unicode by default. If that's not an option, the str constructor will convert from unicode to 8-bit, so simply loop recursively over the result and convert unicode to str. However, it is probably best just to leave the strings as unicode.
I kept running into this problem when trying to capture JSON data in the log with the Python logging library, for debugging and troubleshooting purposes. Getting the u character is a real nuisance when you want to copy the text and paste it into your code somewhere.
As everyone will tell you, this is because it is a Unicode representation, and it could come from the fact that you’ve used json.loads() to load in the data from a string in the first place.
If you want the JSON representation in the log, without the u prefix, the trick is to use json.dumps() before logging it out. For example:
import json
import logging
# Prepare the data
json_data = json.loads('{"key": "value"}')
# Log normally and get the Unicode indicator
logging.warning('data: {}'.format(json_data))
>>> WARNING:root:data: {u'key': u'value'}
# Dump to a string before logging and get clean output!
logging.warning('data: {}'.format(json.dumps(json_data)))
>>> WARNING:root:data: {'key': 'value'}
Try this:
mail_accounts[0].encode("ascii")
Just replace the u' with a single quote...
print (str.replace(mail_accounts,"u'","'"))

How do I decode unicode characters via python?

I am trying to import the following json file using python:
The file is called new_json.json:
{
"nextForwardToken": "f/3208873243596875673623625618474139659",
"events": [
{
"ingestionTime": 1045619,
"timestamp": 1909000,
"message": "2 32823453119 eni-889995t1 54.25.64.23 156.43.12.120 3389 23 6 342 24908 143234809 983246 ACCEPT OK"
}]
}
I have the following code to read the json file, and remove the unicode characters:
JSON_FILE = "new_json.json"
with open(JSON_FILE) as infile:
print infile
print '\n type of infile is \n', infile
data = json.load(infile)
str_data = str(data) # convert to string to remove unicode characters
wo_unicode = str_data.decode('unicode_escape').encode('ascii','ignore')
print 'unicode characters have been removed \n'
print wo_unicode
But print wo_unicode still prints with the unicode characters (i.e.u) in it.
The unicode characters cause a problem when trying to treat the json as a dictionary:
for item in data:
iden = item.get['nextForwardToken']
...results in an error:
AttributeError: 'unicode' object has no attribute 'get'
This has to work in Python2.7. Is there an easy way around this?
The error has nothing to do with unicode, you are trying to treat the keys as dicts, just use data to get 'nextForwardToken':
print data.get('nextForwardToken')
When you iterate over data, you are iterating over the keys so 'nextForwardToken'.get('nextForwardToken'), "events".get('nextForwardToken') etc.. are obviously not going to work even with the correct syntax.
Whether you access by data.get(u'nextForwardToken') or data.get('nextForwardToken'), both will return the value for the key:
In [9]: 'nextForwardToken' == u'nextForwardToken'
Out[9]: True
In [10]: data[u'nextForwardToken']
Out[10]: u'f/3208873243596875673623625618474139659'
In [11]: data['nextForwardToken']
Out[11]: u'f/3208873243596875673623625618474139659'
This code will give you the values as str without the unicode
import json
JSON_FILE = "/tmp/json.json"
with open(JSON_FILE) as infile:
print infile
print '\n type of infile is \n', infile
data = json.load(infile)
print data
str_data = json.dumps(data)
print str_data

Categories