Python 3 UnicodeEncodeError [duplicate] - python

I have strings as follows in my python list (taken from command prompt):
>>> o['records'][5790]
(5790, 'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ', 60,
True, '40141613')
>>>
I have tried the suggestions mentioned in "Changing default encoding of Python?".
I also changed the default encoding to utf-16, but json.dumps() still threw an exception:
>>> write(o)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "okapi_create_master.py", line 49, in write
o = json.dumps(output)
File "C:\Python27\lib\json\__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "C:\Python27\lib\json\encoder.py", line 201, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Python27\lib\json\encoder.py", line 264, in iterencode
return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 25: invalid continuation byte
I can't figure out what kind of transformation such strings need so that json.dumps() works.

The byte \xe1 on its own cannot be decoded as either UTF-8 or UTF-16:
>>> '\xe1'.decode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe1 in position 0: unexpected end of data
>>> '\xe1'.decode('utf-16')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\encodings\utf_16.py", line 16, in decode
return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0xe1 in position 0: truncated data
Try latin-1 encoding:
>>> record = (5790, 'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ',
... 60, True, '40141613')
>>> json.dumps(record, encoding='latin1')
'[5790, "Vlv-Gate-Assy-Mdl-\\u00e1M1-2-\\u00e19/16-10K-BB Credit Memo ", 60, true, "40141613"]'
Or specify ensure_ascii=False so that json.dumps() does not try to decode the string:
>>> json.dumps(record, ensure_ascii=False)
'[5790, "Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ", 60, true, "40141613"]'
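For readers on Python 3, the same idea can be sketched as follows. json.dumps() no longer takes an encoding argument there, so the bytes are decoded explicitly before serializing. That the data really is Latin-1 (rather than, say, cp1252) is an assumption carried over from the answer above, and raw_name is a made-up variable name.

```python
import json

# Bytes as they might arrive from a legacy source; Latin-1 is an assumption.
raw_name = b"Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo "

# Every byte value maps to a code point in Latin-1, so this decode cannot fail.
name = raw_name.decode("latin-1")

record = (5790, name, 60, True, "40141613")
encoded = json.dumps(record)   # non-ASCII characters are escaped as \uXXXX
print(encoded)
```

Because Latin-1 accepts any byte, a successful decode does not prove the guess was right; it only guarantees that json.dumps() receives text.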

I had a similar problem and came up with the following approach, which guarantees either unicode strings or byte strings regardless of which kind comes in. In short, include and use the following lambdas:
# guarantee unicode string
_u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
_uu = lambda *tt: tuple(_u(t) for t in tt)
# guarantee byte string in UTF8 encoding
_u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
_uu8 = lambda *tt: tuple(_u8(t) for t in tt)
Applied to your question:
import json
o = (5790, u"Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ", 60,
True, '40141613')
as_json = json.dumps(_uu8(*o))
as_obj = json.loads(as_json)
print "object\n ", o
print "json (type %s)\n %s " % (type(as_json), as_json)
print "object again\n ", as_obj
=>
object
(5790, u'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ', 60, True, '40141613')
json (type <type 'str'>)
[5790, "Vlv-Gate-Assy-Mdl-\u00e1M1-2-\u00e19/16-10K-BB Credit Memo ", 60, true, "40141613"]
object again
[5790, u'Vlv-Gate-Assy-Mdl-\xe1M1-2-\xe19/16-10K-BB Credit Memo ', 60, True, u'40141613']
Here's some more reasoning about this.
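A rough Python 3 counterpart of those helpers might look like this. The names to_text and all_text are hypothetical, and the choice of UTF-8 with the "replace" error handler mirrors the lambdas above rather than anything the data guarantees.

```python
import json

def to_text(value, encoding="utf-8"):
    """Decode bytes to str; pass every other value through unchanged."""
    if isinstance(value, bytes):
        # "replace" turns undecodable bytes into U+FFFD instead of raising.
        return value.decode(encoding, "replace")
    return value

def all_text(*values):
    """Apply to_text to each value and return a tuple."""
    return tuple(to_text(v) for v in values)

record = (5790, b"Vlv-Gate-Assy-Mdl-\xe1M1", 60, True, "40141613")
as_json = json.dumps(all_text(*record))
print(as_json)
```

Note that "replace" is lossy: the stray \xe1 byte becomes the replacement character, so the original byte cannot be recovered from the JSON.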

Related

Python errors in file encode

This is my code:
prettyPicture(clf, features_test, labels_test)
output_image("F:/test.png", "png", open("F:/test.png", "rb").read())
def output_image(name, format, bytes):
    # Wrap the image data in JSON between start/end markers.
    image_start = "BEGIN_IMAGE_f9825uweof8jw9fj4r8"
    image_end = "END_IMAGE_0238jfw08fjsiufhw8frs"
    data = {}
    data['name'] = name
    data['format'] = format
    data['bytes'] = base64.encodestring(bytes)
    print(image_start + json.dumps(data) + image_end)
The error is:
Traceback (most recent call last):
File "studentMain.py", line 41, in <module>
output_image("F:/test.png", "png", open("F:/test.png", "rb").read())
File "F:\Demo\class_vis.py", line 69, in output_image
print(image_start + json.dumps(data) + image_end)
File "C:\Users\Tony\AppData\Local\Programs\Python\Python36-32\lib\json\__init__.py", line 231, in dumps
return _default_encoder.encode(obj)
File "C:\Users\Tony\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "C:\Users\Tony\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "C:\Users\Tony\AppData\Local\Programs\Python\Python36-32\lib\json\encoder.py", line 180, in default
o.__class__.__name__)
TypeError: Object of type 'bytes' is not JSON serializable
The issue here is that base64.encodestring() returns a bytes object, not a string.
Try:
data['bytes'] = base64.encodestring(bytes).decode('ascii')
Check out this question and answer for a good explanation of why this is:
Why does base64.b64encode() return a bytes object?
Also see: How to encode bytes in JSON? json.dumps() throwing a TypeError
You're only missing one thing here: .encodestring() returns a bytes object, and bytes are not JSON serializable in Python 3.
You can solve it by decoding the encoded result before storing it in data["bytes"]:
data['bytes'] = base64.encodestring(bytes).decode("utf-8")
I'm assuming the "bytes" argument is always a bytes object; otherwise, add a type check and skip this step when the value is already a string.
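As a self-contained sketch of that fix: base64.encodestring() was deprecated and then removed in Python 3.9, so this version uses base64.b64encode() instead. The function name image_payload and the sample bytes are made up for illustration.

```python
import base64
import json

def image_payload(name, fmt, data):
    """Build a JSON document that carries binary data as base64 text."""
    return json.dumps({
        "name": name,
        "format": fmt,
        # b64encode returns bytes; decode to str so json can serialize it.
        "bytes": base64.b64encode(data).decode("ascii"),
    })

sample = b"\x89PNG\r\n\x1a\n"          # just the PNG signature, not a full image
payload = image_payload("test.png", "png", sample)

# Round trip: parse the JSON and turn the base64 text back into bytes.
restored = base64.b64decode(json.loads(payload)["bytes"])
print(restored == sample)
```

Decoding with "ascii" is safe here because base64 output only ever contains ASCII characters.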

Utf-8 decode error in pyramid/WebOb request

I found an error in the logs of one of my websites. The log contains the body of the request, so I tried to reproduce the error.
This is what I got:
>>> from mondishop.models import *
>>> from pyramid.request import *
>>> req = Request.blank('/')
>>> b = DBSession.query(Log).filter(Log.id == 503).one().payload.encode('utf-8')
>>> req.method = 'POST'
>>> req.body = b
>>> req.params
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/request.py", line 856, in params
params = NestedMultiDict(self.GET, self.POST)
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/request.py", line 807, in POST
vars = MultiDict.from_fieldstorage(fs)
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 92, in from_fieldstorage
obj.add(field.name, decode(value))
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 78, in <lambda>
decode = lambda b: b.decode(charset)
File "/home/phas/virtualenv/mondishop/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 52: invalid start byte
>>> req.POST
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/request.py", line 807, in POST
vars = MultiDict.from_fieldstorage(fs)
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 92, in from_fieldstorage
obj.add(field.name, decode(value))
File "/home/phas/virtualenv/mondishop/local/lib/python2.7/site-packages/webob/multidict.py", line 78, in <lambda>
decode = lambda b: b.decode(charset)
File "/home/phas/virtualenv/mondishop/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xb0 in position 52: invalid start byte
>>>
The error is the same as the one in my log, so apparently something goes wrong when decoding the original POST body.
What is weird is that I get an error when UTF-8 decoding something that I just UTF-8 encoded.
I cannot provide the content of the original request body because it contains sensitive data (it's a PayPal IPN), and I don't really have any idea how to start addressing this issue.
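One possible explanation, which the log excerpt cannot confirm, is that the form body contains percent-escapes such as %B0. Such a body is pure ASCII, so encoding it as UTF-8 changes nothing; the decode error only appears after the escapes are unquoted into raw bytes. The sketch below reproduces that effect with the standard library's urllib.parse on Python 3. The field name and value are invented, and cp1252 is only an assumed source encoding.

```python
from urllib.parse import parse_qs

# Hypothetical form body: plain ASCII text containing a percent-escaped 0xB0.
body = "item_name=20%B0C+sensor"

# Encoding the body as UTF-8 leaves it byte-for-byte identical to ASCII.
same_bytes = body.encode("utf-8") == body.encode("ascii")

# Unquoting yields the raw byte 0xB0, which is not valid UTF-8 on its own.
try:
    parse_qs(body, encoding="utf-8", errors="strict")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# With the assumed single-byte encoding, the same body parses cleanly.
fields = parse_qs(body, encoding="cp1252", errors="strict")
print(same_bytes, utf8_failed, fields)
```

If this is what happened, the fix would be to decode the form fields with the charset the sender actually used rather than UTF-8.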

UnicodeDecodeError when dumping to json input from console

I enter some Cyrillic text at the console, and when I try to dump it to JSON I get exceptions.UnicodeDecodeError: 'utf8' codec can't decode byte. I can't figure out why, because it doesn't always happen and the text is always Cyrillic.
Here's the part of the code where I input text:
item['title'] = raw_input('Title: ')
item['description'] = raw_input('Description: ')
And here's the line where I dump the dictionary to json:
line = json.dumps(dict(item), encoding='utf8') + "\n"
The item is not a dictionary but an object, so I need to convert it to a dictionary first.
Here's the full traceback:
Traceback (most recent call last):
File "/home/dmitry/.virtualenvs/test_scrapy/local/lib/python2.7/site-packages/scrapy/middleware.py", line 62, in _process_chain
return process_chain(self.methods[methodname], obj, *args)
File "/home/dmitry/.virtualenvs/test_scrapy/local/lib/python2.7/site-packages/scrapy/utils/defer.py", line 65, in process_chain
d.callback(input)
File "/home/dmitry/.virtualenvs/test_scrapy/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 382, in callback
self._startRunCallbacks(result)
File "/home/dmitry/.virtualenvs/test_scrapy/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 490, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "/home/dmitry/.virtualenvs/test_scrapy/local/lib/python2.7/site-packages/twisted/internet/defer.py", line 577, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/dmitry/Dropbox/coding/python/scrapy/videos_parser/videos_parser/pipelines.py", line 94, in process_item
line = json.dumps(dict(item), encoding='utf8') + "\n"
File "/usr/lib/python2.7/json/__init__.py", line 250, in dumps
sort_keys=sort_keys, **kw).encode(obj)
File "/usr/lib/python2.7/json/encoder.py", line 207, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/usr/lib/python2.7/json/encoder.py", line 270, in iterencode
return _iterencode(o, 0)
File "/usr/lib/python2.7/json/encoder.py", line 233, in _encoder
o = o.decode(_encoding)
File "/home/dmitry/.virtualenvs/test_scrapy/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
exceptions.UnicodeDecodeError: 'utf8' codec can't decode byte 0xd1 in position 15: invalid continuation byte
sys.getdefaultencoding() says I'm using ascii. I tried changing it to utf8 with sys.setdefaultencoding('utf8'), but it didn't work.
UPDATE
Here's the code I use to see how strings looked before decoding:
try:
    item['title'] = raw_input('Title: ')
    item['title'] = item['title'].decode(sys.stdin.encoding)
except UnicodeDecodeError:
    print repr(item['title'])
try:
    item['description'] = raw_input('Description: ')
    item['description'] = item['description'].decode(sys.stdin.encoding)
except UnicodeDecodeError:
    print repr(item['description'])
And here's the output from the console:
Title: На работе платят бабло, но работать надо на ней
'\xd0\x9d\xd0\xb0 \xd1\x80\xd0\xb0\xd0\xb1\xd0\xbe\xd1\x82\xd0\xd0\xb5 \xd0\xbf\xd0\xbb\xd0\xb0\xd1\x82\xd1\x8f\xd1\x82 \xd0\xb1\xd0\xb0\xd0\xb1\xd0\xbb\xd0\xbe, \xd0\xbd\xd0\xd0\xbe \xd1\x80\xd0\xb0\xd0\xb1\xd0\xbe\xd1\x82\xd0\xb0\xd1\x82\xd1\x8c \xd0\xbd\xd0\xb0\xd0\xb4\xd0\xbe \xd0\xbd\xd0\xb0 \xd0\xbd\xd0\xb5\xd0\xb9'
Description: Я не против первого, но без второго мне веселей
'\xd0\xaf \xd0\xbd\xd0\xb5 \xd0\xbf\xd1\x80\xd0\xbe\xd1\x82\xd0\xb8\xd0\xb2 \xd0\xbf\xd0\xb5\xd1\x80\xd0\xb2\xd0\xbe\xd0\xb3\xd0\xbe \xd0, \xd0\xbd\xd0\xbe \xd0\xb1\xd0\xb5\xd0\xb7 \xd0\xb2\xd1\x82\xd0\xbe\xd1\x80\xd0\xbe\xd0\xb3\xd0\xbe \xd0\xbc\xd0\xbd\xd0\xb5 \xd0\xb2\xd0\xb5\xd1\x81\xd0\xb5\xd0\xbb\xd0\xb5\xd0\xb9'
Your terminal appears to be botching UTF-8 input; extra \xd0 bytes have been inserted:
>>> import difflib
>>> given = '\xd0\x9d\xd0\xb0 \xd1\x80\xd0\xb0\xd0\xb1\xd0\xbe\xd1\x82\xd0\xd0\xb5 \xd0\xbf\xd0\xbb\xd0\xb0\xd1\x82\xd1\x8f\xd1\x82 \xd0\xb1\xd0\xb0\xd0\xb1\xd0\xbb\xd0\xbe, \xd0\xbd\xd0\xd0\xbe \xd1\x80\xd0\xb0\xd0\xb1\xd0\xbe\xd1\x82\xd0\xb0\xd1\x82\xd1\x8c \xd0\xbd\xd0\xb0\xd0\xb4\xd0\xbe \xd0\xbd\xd0\xb0 \xd0\xbd\xd0\xb5\xd0\xb9'
>>> expected = 'На работе платят бабло, но работать надо на ней' # requires UTF-8 terminal
>>> for opcode in difflib.SequenceMatcher(a=expected, b=given).get_opcodes():
...     print "%6s a[%d:%d] b[%d:%d]" % opcode
...     if opcode[0] == 'insert': print 'Inserted:', repr(given[opcode[3]:opcode[4]])
...
equal a[0:15] b[0:15]
insert a[15:15] b[15:16]
Inserted: '\xd0'
equal a[15:45] b[16:46]
insert a[45:45] b[46:47]
Inserted: '\xd0'
equal a[45:85] b[47:87]
>>> expected[14:17]
'\x82\xd0\xb5'
>>> given[14:18]
'\x82\xd0\xd0\xb5'
>>> expected[44:47]
'\xbd\xd0\xbe'
>>> given[45:49]
'\xbd\xd0\xd0\xbe'
>>> given = '\xd0\xaf \xd0\xbd\xd0\xb5 \xd0\xbf\xd1\x80\xd0\xbe\xd1\x82\xd0\xb8\xd0\xb2 \xd0\xbf\xd0\xb5\xd1\x80\xd0\xb2\xd0\xbe\xd0\xb3\xd0\xbe \xd0, \xd0\xbd\xd0\xbe \xd0\xb1\xd0\xb5\xd0\xb7 \xd0\xb2\xd1\x82\xd0\xbe\xd1\x80\xd0\xbe\xd0\xb3\xd0\xbe \xd0\xbc\xd0\xbd\xd0\xb5 \xd0\xb2\xd0\xb5\xd1\x81\xd0\xb5\xd0\xbb\xd0\xb5\xd0\xb9'
>>> expected = 'Я не против первого, но без второго мне веселей' # requires UTF-8 terminal
>>> for opcode in difflib.SequenceMatcher(a=expected, b=given).get_opcodes():
...     print "%6s a[%d:%d] b[%d:%d]" % opcode
...     if opcode[0] == 'insert': print 'Inserted:', repr(given[opcode[3]:opcode[4]])
...
equal a[0:35] b[0:35]
insert a[35:35] b[35:37]
Inserted: ' \xd0'
equal a[35:85] b[37:87]
>>> expected[34:38]
'\xbe, \xd0'
>>> given[34:40]
'\xbe \xd0, \xd0'
In the title, two extra \xd0 bytes were inserted, each right next to an existing \xd0 byte. In the description, a space and a \xd0 byte were inserted before a comma, space, \xd0 sequence.
This is a failure of your terminal, not of Python. Why it happens is not clear, however.
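The same comparison can be sketched on Python 3, where the input is handled as bytes objects. The example uses a single short word with one duplicated \xd0 byte rather than the full strings from the question; the variable names are illustrative.

```python
import difflib

# The UTF-8 bytes the terminal should have delivered for one word ...
expected = "но".encode("utf-8")        # b'\xd0\xbd\xd0\xbe'
# ... and a corrupted version with one extra \xd0 byte.
given = b"\xd0\xbd\xd0\xd0\xbe"

matcher = difflib.SequenceMatcher(a=expected, b=given, autojunk=False)

# Collect every run of bytes present in `given` but absent from `expected`.
inserted = [
    given[j1:j2]
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag == "insert"
]
print(inserted)
```

SequenceMatcher compares the two byte strings element by element, so the extra byte shows up as a single "insert" opcode.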
If you use plain raw_input(), you get only bytes:
>>> raw_input('Input: ')
Input: фыв
'\xd1\x84\xd1\x8b\xd0\xb2'
Use unicode() to convert the input string to unicode:
>>> unicode(raw_input('Input: '), encoding='utf-8')
Input: фыв
u'\u0444\u044b\u0432'
and then pass the result to json.
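On Python 3 the equivalent step is an explicit bytes-to-str decode, since input() already returns text. The sketch below starts from the raw bytes shown above and assumes they are valid UTF-8, which was not the case for the corrupted input in the question.

```python
import json

# The bytes from the raw_input() example above.
raw = b"\xd1\x84\xd1\x8b\xd0\xb2"

# Decode to text first; this raises UnicodeDecodeError if the bytes are corrupt.
text = raw.decode("utf-8")

escaped = json.dumps({"title": text})                       # ASCII-only output
readable = json.dumps({"title": text}, ensure_ascii=False)  # keeps the Cyrillic
print(escaped)
print(readable)
```

Both results describe the same JSON value; ensure_ascii only controls whether non-ASCII characters are written as \uXXXX escapes.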

Convert Python dictionary to JSON array

Currently I have this dictionary, printed using pprint:
{'AlarmExTempHum': '\x00\x00\x00\x00\x00\x00\x00\x00',
'AlarmIn': 0,
'AlarmOut': '\x00\x00',
'AlarmRain': 0,
'AlarmSoilLeaf': '\x00\x00\x00\x00',
'BarTrend': 60,
'BatteryStatus': 0,
'BatteryVolts': 4.751953125,
'CRC': 55003,
'EOL': '\n\r',
'ETDay': 0,
'ETMonth': 0,
'ETYear': 0,
'ExtraHum1': None,
'ExtraHum2': None,
'ExtraHum3': None,
'ExtraHum4': None,
'ExtraHum5': None,
'ExtraHum6': None,
'ExtraHum7': None,
'ExtraTemp1': None,
'ExtraTemp2': None,
'ExtraTemp3': None,
'ExtraTemp4': None,
'ExtraTemp5': None,
'ExtraTemp6': None,
'ExtraTemp7': None,
'ForecastIcon': 2,
'ForecastRuleNo': 122,
'HumIn': 31,
'HumOut': 94,
'LOO': 'LOO',
'LeafTemps': '\xff\xff\xff\xff',
'LeafWetness': '\xff\xff\xff\x00',
'NextRec': 37,
'PacketType': 0,
'Pressure': 995.9363359295631,
'RainDay': 0.0,
'RainMonth': 0.0,
'RainRate': 0.0,
'RainStorm': 0.0,
'RainYear': 2.8,
'SoilMoist': '\xff\xff\xff\xff',
'SoilTemps': '\xff\xff\xff\xff',
'SolarRad': None,
'StormStartDate': '2127-15-31',
'SunRise': 849,
'SunSet': 1611,
'TempIn': 21.38888888888889,
'TempOut': 0.8888888888888897,
'UV': None,
'WindDir': 219,
'WindSpeed': 3.6,
'WindSpeed10Min': 3.6}
When I do this:
import json
d = (my dictionary above)
jsonarray = json.dumps(d)
I get this error: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
If you are fine with non-printable characters in your JSON, add ensure_ascii=False to the dumps() call:
>>> json.dumps(your_data, ensure_ascii=False)
If ensure_ascii is false, then the return value will be a
unicode instance subject to normal Python str to unicode
coercion rules instead of being escaped to an ASCII str.
ensure_ascii=False really only defers the issue to the decoding stage:
>>> dict2 = {'LeafTemps': '\xff\xff\xff\xff',}
>>> json1 = json.dumps(dict2, ensure_ascii=False)
>>> print(json1)
{"LeafTemps": "����"}
>>> json.loads(json1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/json/__init__.py", line 328, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: invalid start byte
Ultimately you can't store raw bytes in a JSON document, so you'll want to use some means of unambiguously encoding a sequence of arbitrary bytes as an ASCII string - such as base64.
>>> import json
>>> from base64 import b64encode, b64decode
>>> my_dict = {'LeafTemps': '\xff\xff\xff\xff',}
>>> my_dict['LeafTemps'] = b64encode(my_dict['LeafTemps'])
>>> json.dumps(my_dict)
'{"LeafTemps": "/////w=="}'
>>> json.loads(json.dumps(my_dict))
{u'LeafTemps': u'/////w=='}
>>> new_dict = json.loads(json.dumps(my_dict))
>>> new_dict['LeafTemps'] = b64decode(new_dict['LeafTemps'])
>>> print new_dict
{u'LeafTemps': '\xff\xff\xff\xff'}
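On Python 3, where bytes values are rejected by json.dumps() outright, one way to apply the same base64 idea to a whole dictionary is a custom encoder. This is a minimal sketch; the class name BytesAsBase64Encoder is made up, and the reader still has to know which fields to decode again.

```python
import base64
import json

class BytesAsBase64Encoder(json.JSONEncoder):
    """Serialize bytes values as base64 text; defer everything else."""

    def default(self, obj):
        # default() is only called for objects json cannot serialize itself.
        if isinstance(obj, (bytes, bytearray)):
            return base64.b64encode(obj).decode("ascii")
        return super().default(obj)

reading = {"LeafTemps": b"\xff\xff\xff\xff", "HumIn": 31}
document = json.dumps(reading, cls=BytesAsBase64Encoder)
print(document)

# Decoding is explicit: the JSON itself does not mark which strings are base64.
decoded = json.loads(document)
leaf_temps = base64.b64decode(decoded["LeafTemps"])
```

The encoder keeps the calling code unchanged, at the cost of losing the distinction between ordinary strings and base64-encoded bytes in the output.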
If you use Python 2, don't forget to add the UTF-8 file encoding comment on the first line of your script.
# -*- coding: UTF-8 -*-
This declares the encoding of the source file, so non-ASCII string literals in the script are read correctly. It does not change how data is decoded at runtime.
One possible solution that I use is to switch to Python 3. It seems to solve many of these encoding issues.
Sorry for the late answer, but it may help people in the future.
For example,
#!/usr/bin/env python3
import json
# your code follows
