Convert Python dictionary to JSON array - python

Currently I have this dictionary, printed using pprint:
{'AlarmExTempHum': '\x00\x00\x00\x00\x00\x00\x00\x00',
'AlarmIn': 0,
'AlarmOut': '\x00\x00',
'AlarmRain': 0,
'AlarmSoilLeaf': '\x00\x00\x00\x00',
'BarTrend': 60,
'BatteryStatus': 0,
'BatteryVolts': 4.751953125,
'CRC': 55003,
'EOL': '\n\r',
'ETDay': 0,
'ETMonth': 0,
'ETYear': 0,
'ExtraHum1': None,
'ExtraHum2': None,
'ExtraHum3': None,
'ExtraHum4': None,
'ExtraHum5': None,
'ExtraHum6': None,
'ExtraHum7': None,
'ExtraTemp1': None,
'ExtraTemp2': None,
'ExtraTemp3': None,
'ExtraTemp4': None,
'ExtraTemp5': None,
'ExtraTemp6': None,
'ExtraTemp7': None,
'ForecastIcon': 2,
'ForecastRuleNo': 122,
'HumIn': 31,
'HumOut': 94,
'LOO': 'LOO',
'LeafTemps': '\xff\xff\xff\xff',
'LeafWetness': '\xff\xff\xff\x00',
'NextRec': 37,
'PacketType': 0,
'Pressure': 995.9363359295631,
'RainDay': 0.0,
'RainMonth': 0.0,
'RainRate': 0.0,
'RainStorm': 0.0,
'RainYear': 2.8,
'SoilMoist': '\xff\xff\xff\xff',
'SoilTemps': '\xff\xff\xff\xff',
'SolarRad': None,
'StormStartDate': '2127-15-31',
'SunRise': 849,
'SunSet': 1611,
'TempIn': 21.38888888888889,
'TempOut': 0.8888888888888897,
'UV': None,
'WindDir': 219,
'WindSpeed': 3.6,
'WindSpeed10Min': 3.6}
When I do this:
import json
d = (my dictionary above)
jsonarray = json.dumps(d)
I get this error: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte

If you are fine with non-printable symbols in your JSON, add ensure_ascii=False to the dumps call.
>>> json.dumps(your_data, ensure_ascii=False)
If ensure_ascii is false, then the return value will be a
unicode instance subject to normal Python str to unicode
coercion rules instead of being escaped to an ASCII str.
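The quote above describes Python 2. For comparison, here is a minimal Python 3 sketch of what ensure_ascii changes for ordinary text. With the default True, non-ASCII characters are written as \uXXXX escapes. With False, they are left as-is. The sample value is made up for illustration and is not from the question.

```python
import json

data = {"city": "Zürich"}  # illustrative value, not from the question

escaped = json.dumps(data)                       # default ensure_ascii=True
verbatim = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)   # {"city": "Z\u00fcrich"}
print(verbatim)  # {"city": "Zürich"}

# Both forms decode back to the same Python object
assert json.loads(escaped) == json.loads(verbatim) == data
```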

ensure_ascii=False really only defers the issue to the decoding stage:
>>> dict2 = {'LeafTemps': '\xff\xff\xff\xff',}
>>> json1 = json.dumps(dict2, ensure_ascii=False)
>>> print(json1)
{"LeafTemps": "����"}
>>> json.loads(json1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/json/__init__.py", line 328, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
Ultimately you can't store raw bytes in a JSON document, so you'll want to use some means of unambiguously encoding a sequence of arbitrary bytes as an ASCII string - such as base64.
>>> import json
>>> from base64 import b64encode, b64decode
>>> my_dict = {'LeafTemps': '\xff\xff\xff\xff',}
>>> my_dict['LeafTemps'] = b64encode(my_dict['LeafTemps'])
>>> json.dumps(my_dict)
'{"LeafTemps": "/////w=="}'
>>> json.loads(json.dumps(my_dict))
{u'LeafTemps': u'/////w=='}
>>> new_dict = json.loads(json.dumps(my_dict))
>>> new_dict['LeafTemps'] = b64decode(new_dict['LeafTemps'])
>>> print new_dict
{u'LeafTemps': '\xff\xff\xff\xff'}
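The session above is Python 2, where '\xff\xff\xff\xff' is a byte string. Below is a rough Python 3 equivalent, assuming the raw fields arrive as bytes objects. In Python 3, b64encode itself returns bytes, so its output has to be turned into str before json.dumps will accept it. The helper names encode_bytes_fields and decode_fields are hypothetical and exist only for this sketch.

```python
import json
from base64 import b64decode, b64encode

def encode_bytes_fields(record: dict) -> dict:
    """Return a copy of record with every bytes value base64-encoded as ASCII text."""
    return {
        key: b64encode(value).decode("ascii") if isinstance(value, bytes) else value
        for key, value in record.items()
    }

def decode_fields(record: dict, keys: set) -> dict:
    """Base64-decode the given keys; the caller must know which fields were bytes."""
    return {
        key: b64decode(value) if key in keys else value
        for key, value in record.items()
    }

reading = {"LeafTemps": b"\xff\xff\xff\xff", "HumOut": 94}
as_json = json.dumps(encode_bytes_fields(reading))
print(as_json)  # {"LeafTemps": "/////w==", "HumOut": 94}

restored = decode_fields(json.loads(as_json), {"LeafTemps"})
print(restored == reading)  # True
```

Because JSON has no bytes type, the reader has to know out of band which fields were base64-encoded. That is why decode_fields takes the set of keys explicitly.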

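If you use Python 2, don't forget to add the UTF-8 source encoding declaration on the first (or second) line of your script:
# -*- coding: UTF-8 -*-
This tells Python 2 how to decode non-ASCII characters in the script's own source text (PEP 263), which avoids a class of Unicode problems with string literals and makes your life easier. It does not change how json.dumps decodes byte strings at runtime.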

One possible solution, which I use, is to switch to Python 3. It seems to solve many UTF-8 issues.
Sorry for the late answer, but it may help people in the future.
For example,
#!/usr/bin/env python3
import json
# your code follows
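Here is a short Python 3 sketch of why switching helps. In Python 3, str is always Unicode text and raw bytes are a separate bytes type. As a result, json.dumps raises a TypeError on bytes instead of guessing an encoding, and you choose the decoding yourself. Decoding with latin-1 below is only one assumption about what the bytes mean.

```python
import json

raw = b"\xff\xff\xff\xff"  # a byte string like the LeafTemps field

try:
    json.dumps({"LeafTemps": raw})
    refused = False
except TypeError as exc:
    # Python 3 will not serialize bytes; it does not guess an encoding
    refused = True
    print("json.dumps refused bytes:", exc)

# Decode explicitly instead. latin-1 maps each byte 0x00-0xff to one code point,
# so it never raises; whether that is the right interpretation depends on the data.
text = raw.decode("latin-1")
result = json.dumps({"LeafTemps": text})
print(result)  # {"LeafTemps": "\u00ff\u00ff\u00ff\u00ff"}
```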

Related

UnicodeDecodeError while using json.dumps() [duplicate]

This question already has answers here:
Unsupported operation :not writeable python
(2 answers)
Closed 5 years ago.
I have strings like the following in my Python list (taken from the interactive prompt):
>>> o['records'][5790]
(5790, 'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ', 60,
True, '40141613')
>>>
I have tried the suggestions mentioned here: Changing default encoding of Python?
I also changed the default encoding to UTF-16, but json.dumps() still threw an exception:
>>> write(o)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "okapi_create_master.py", line 49, in write
o = json.dumps(output)
File "C:\Python27\lib\json\__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 201, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 264, in iterencode
return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 25: invalid continuation byte
I can't figure out what transformation such strings need so that json.dumps() works.
\xe1 cannot be decoded as UTF-8 or UTF-16:
>>> '\xe1'.decode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: unexpected end of data
>>> '\xe1'.decode('utf-16')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0xe1 in position 0: truncated data
Try the latin-1 encoding (in Latin-1, byte 0xe1 is the character á):
>>> record = (5790, 'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ',
... 60, True, '40141613')
>>> json.dumps(record, encoding='latin1')
'[5790, "Vlv-Gate-Assy-Mdl-\\u00e1M1-2-\\u00e19/16-10K-BB Credit Memo ", 60, true, "40141613"]'
Or specify ensure_ascii=False so that json.dumps does not try to decode the string:
>>> json.dumps(record, ensure_ascii=False)
'[5790, "Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ", 60, true, "40141613"]'
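In Python 3, json.dumps no longer takes an encoding argument. The equivalent of the latin-1 suggestion is therefore to decode the bytes yourself before serializing. This is a minimal sketch, assuming the product name arrives as a bytes object in Latin-1.

```python
import json

raw_name = b"Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo "
record = (5790, raw_name.decode("latin-1"), 60, True, "40141613")

escaped = json.dumps(record)                       # 'á' is written as \u00e1
verbatim = json.dumps(record, ensure_ascii=False)  # 'á' is kept as-is
print(escaped)
print(verbatim)
```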
I had a similar problem and came up with the following approach to guarantee either unicode strings or byte strings, whichever kind of input I get. In short, include and use the following lambdas:
# guarantee unicode string
_u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
_uu = lambda *tt: tuple(_u(t) for t in tt)
# guarantee byte string in UTF8 encoding
_u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
_uu8 = lambda *tt: tuple(_u8(t) for t in tt)
Applied to your question:
import json
o = (5790, u"Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ", 60,
True, '40141613')
as_json = json.dumps(_uu8(*o))
as_obj = json.loads(as_json)
print "object\n ", o
print "json (type %s)\n %s " % (type(as_json), as_json)
print "object again\n ", as_obj
=>
object
(5790, u'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ', 60, True, '40141613')
json (type <type 'str'>)
[5790, "Vlv-Gate-Assy-Mdl-\u00e1M1-2-\u00e19/16-10K-BB Credit Memo ", 60, true, "40141613"]
object again
[5790, u'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ', 60, True, u'40141613']
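Under Python 3 the same idea maps onto str and bytes. The sketch below gives equivalent helpers, written as named functions rather than lambdas. The names as_text and as_utf8_bytes are hypothetical. In Python 3 you would normally normalize to str before calling json.dumps, since it rejects bytes.

```python
import json

def as_text(value):
    """Return str for bytes input, decoding as UTF-8; invalid bytes become U+FFFD.
    Any other value is returned unchanged."""
    return value.decode("utf-8", "replace") if isinstance(value, bytes) else value

def as_utf8_bytes(value):
    """Return UTF-8 bytes for str input; any other value is returned unchanged."""
    return value.encode("utf-8", "replace") if isinstance(value, str) else value

name = "Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo "  # str containing 'á'
record = (5790, as_utf8_bytes(name), 60, True, "40141613")        # name now as bytes

as_json = json.dumps([as_text(v) for v in record])  # normalize to str, then serialize
as_obj = json.loads(as_json)

print(as_json)
print(as_obj)
```

The "replace" error handler keeps the conversion from raising, but it is lossy. For example, a lone Latin-1 byte such as b"\xe1" decodes to the replacement character U+FFFD rather than to á.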
Here's some more reasoning about this.

Issue in reading JSON file in python

>>> import json
>>> d2 = json.loads(open("t.json").read())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
obj, end = self._scanner.iterscan(s, **kw).next()
File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = action(m, context)
File "/usr/lib64/python2.6/json/decoder.py", line 185, in JSONObject
raise ValueError(errmsg("Expecting object", s, end))
ValueError: Expecting object: line 1 column 11 (char 11)
[ RHEL - ~/testing ]$ cat t.json
{"us": u"OFF", "val": u"5"}
This is what I have in my JSON file, and when I try to read it using open() with json.load or json.loads, it fails.
Using json.load:
>>> import json
>>> d2 = json.load(open("t.json"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.6/json/__init__.py", line 267, in load
parse_constant=parse_constant, **kw)
File "/usr/lib64/python2.6/json/__init__.py", line 307, in loads
return _default_decoder.decode(s)
File "/usr/lib64/python2.6/json/decoder.py", line 319, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib64/python2.6/json/decoder.py", line 336, in raw_decode
obj, end = self._scanner.iterscan(s, **kw).next()
File "/usr/lib64/python2.6/json/scanner.py", line 55, in iterscan
rval, next_pos = action(m, context)
File "/usr/lib64/python2.6/json/decoder.py", line 185, in JSONObject
raise ValueError(errmsg("Expecting object", s, end))
ValueError: Expecting object: line 1 column 11 (char 11)
>>>
You are using the wrong function. Use json.load() (no s!) to load data from an open file object:
d2 = json.load(open("t.json"))
The json.loads() function expects you to pass in a string, not a file object. In that case you'd have to read the file first and pass in the resulting string:
d2 = json.loads(open("t.json").read())
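Here is a minimal Python 3 sketch of both forms, using with so the file is closed afterwards. It first writes a valid version of the sample to a temporary file. The temporary path is only for the demonstration.

```python
import json
import os
import tempfile

# Write a valid JSON document to a throwaway file for the demo
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write('{"us": "OFF", "val": "5"}')
    path = tmp.name

# json.load takes a file object
with open(path) as fileobj:
    from_file = json.load(fileobj)

# json.loads takes a string, so read the file first
with open(path) as fileobj:
    from_string = json.loads(fileobj.read())

os.remove(path)
print(from_file)  # {'us': 'OFF', 'val': '5'}
```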
Next, you have invalid JSON in that file:
{"us": u"OFF", "val": u"5"}
#      ^              ^
JSON is not Python; those u prefixes are neither supported nor needed. You'll need to remove them from the file before it will load.
If you have an API producing that format, it is not giving you JSON. It could be that it is producing a (strange form of) Python syntax instead; Python itself would produce {'us': u'OFF', 'val': u'5'} (single quotes). You can have Python interpret that as Python literals with ast.literal_eval():
import ast
with open('t.json') as fileobj:
    d2 = ast.literal_eval(fileobj.read())
but it could be that the format is broken in other ways we cannot determine from a single isolated sample. It could be using true and false for boolean values, like in JSON, for example.
Better to have the API fixed rather than try to work around this brokenness.
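If fixing the producer is not possible right away, one stopgap is to try strict JSON first and fall back to ast.literal_eval only when that fails. This is a sketch, not a general fix. load_json_or_python is a hypothetical helper, and the fallback still fails on JSON-only literals such as true, false and null.

```python
import ast
import json

def load_json_or_python(text):
    """Parse text as JSON; if that fails, try it as a Python literal (dict, list, str, ...)."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # e.g. {"us": u"OFF"}: u-prefixed strings are Python syntax, not JSON
        return ast.literal_eval(text)

print(load_json_or_python('{"us": "OFF", "val": "5"}'))    # valid JSON
print(load_json_or_python('{"us": u"OFF", "val": u"5"}'))  # Python-style literal
```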
You are using json.loads, which only accepts a string argument. Luckily, there is a similarly named json.load function that can be used directly on a file object.
d2 = json.load(open("t.json"))
Your issue is that the JSON is not valid.
It looks like it is a Python dictionary. u'string' is a Python 2 unicode string.
If you remove the u from your strings, it works fine.
>>> import json
>>> json.load(open('i.json'))
{u'val': u'5', u'us': u'OFF'}
Here is the json file:
$ cat i.json
{"us": "OFF", "val": "5"}

Utf-8 decode error in pyramid/WebOb request

I found an error in the logs of one of my websites. The log contains the body of the request, so I tried to reproduce the error.
This is what I got:
>>> from mondishop.models import *
>>> from pyramid.request import *
>>> req = Request.blank('/')
>>> b = DBSession.query(Log).filter(Log.id == 503).one().payload.encode('utf-8')
>>> req.method = 'POST'
>>> req.body = b
>>> req.params
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/request.py", line 856, in params
params = NestedMultiDict(self.GET, self.POST)
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/request.py", line 807, in POST
vars = MultiDict.from_fieldstorage(fs)
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 92, in from_fieldstorage
obj.add(field.name, decode(value))
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 78, in <lambda>
decode = lambda b: b.decode(charset)
File "/home/phas/virtualenv/mondishop/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 52: invalid start byte
>>> req.POST
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/request.py", line 807, in POST
vars = MultiDict.from_fieldstorage(fs)
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 92, in from_fieldstorage
obj.add(field.name, decode(value))
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 78, in <lambda>
decode = lambda b: b.decode(charset)
File "/home/phas/virtualenv/mondishop/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 52: invalid start byte
>>>
The error is the same as the one in my log, so apparently something goes wrong when decoding the original POST body.
What is weird is that I get an error trying to UTF-8-decode something that I just UTF-8-encoded.
I cannot provide the content of the original request body because it contains some sensitive data (it's a PayPal IPN), and I don't really have any idea how to start addressing this issue.
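One mechanism produces exactly this symptom, though it is offered only as an assumption since the original body is not shown. A form-encoded POST body is plain ASCII text, so encoding it as UTF-8 always succeeds. When the form is parsed, however, percent-escapes such as %B0 are turned back into raw bytes, and the traceback shows WebOb then decoding each value with b.decode(charset). If the sender percent-encoded text in a single-byte charset such as Latin-1 (byte 0xb0 is '°' there), that byte is not valid UTF-8. The standard-library sketch below reproduces this; the sample field values are made up.

```python
from urllib.parse import parse_qs, unquote_to_bytes

body = "item_name=Probe&range=0-100%B0C"  # hypothetical form body: pure ASCII text
body.encode("utf-8")                      # always succeeds

raw_value = unquote_to_bytes("0-100%B0C")  # the percent-escape becomes byte 0xb0
try:
    raw_value.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print("UTF-8 fails:", exc)  # invalid start byte, like in the log

# Decoding with the charset the sender actually used (assumed Latin-1 here) works
print(raw_value.decode("latin-1"))                  # 0-100°C
print(parse_qs(body, encoding="latin-1")["range"])  # ['0-100°C']
```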

Need a way to load embedded, escaped JSON strings in Python [duplicate]

This question already has answers here:
How can I parse (read) and use JSON?
(5 answers)
Closed last month.
I have to parse the following JSON string:
{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}
If I try to use json.loads, I get the following:
>>> import json
>>> print json.loads('{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 22 (char 21)
I don't have any control over the string I receive, as it's generated by another system.
You are not producing embedded backslashes; Python is interpreting the \" as an escaped quote and the final string just contains the quote:
>>> '{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}'
'{"JobDescription":"{"project": "1322", "vault": "qa-122"}"}'
Use a raw string or double the backslashes:
>>> r'{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}'
'{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}'
>>> '{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}'
'{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}'
This then loads fine:
>>> import json
>>> json.loads('{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}')
{'JobDescription': '{"project": "1322", "vault": "qa-122"}'}
and you can decode the nested JSON document from there:
>>> decoded = json.loads('{"JobDescription":"{\\"project\\": \\"1322\\", \\"vault\\": \\"qa-122\\"}"}')
>>> json.loads(decoded['JobDescription'])
{'project': '1322', 'vault': 'qa-122'}
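When the string comes from another system (a file, socket or HTTP response) rather than a Python literal, the backslashes are already real characters and no extra escaping is needed. The Python 3 sketch below decodes the outer document and then the embedded one. The helper name parse_job is hypothetical.

```python
import json

# Raw string so the backslashes stay in the text, as they would in data read from elsewhere
payload = r'{"JobDescription":"{\"project\": \"1322\", \"vault\": \"qa-122\"}"}'

def parse_job(text):
    """Decode the outer JSON object, then the JSON document stored in its JobDescription string."""
    outer = json.loads(text)
    return json.loads(outer["JobDescription"])

job = parse_job(payload)
print(job)  # {'project': '1322', 'vault': 'qa-122'}
```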

