Python file chunk in JSON

I have a service API and need to upload a file in chunks, in JSON format.
Here is an example of the code:
for chunk in file_content.chunks():
    chunkSize = len(chunk)
    d = {'chunksize': chunkSize, 'chunk': chunk}
    dump = json.dumps(d)
But I'm getting:
'utf8' codec can't decode byte 0x93 in position 11: invalid start byte
How can I fix it, or maybe ignore the error?
Note: I cannot change the API.

json.dumps takes a number of keyword arguments; setting ensure_ascii to False should do the trick.
dump = json.dumps(d, ensure_ascii=False)
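If the chunks contain raw binary data rather than UTF-8 text, JSON cannot carry them directly no matter which flags are set; a common workaround (an assumption on my part, not part of the original answer, and only usable if the service accepts it) is to base64-encode each chunk before serializing. A minimal sketch:
import base64
import json

for chunk in file_content.chunks():
    # base64 turns arbitrary bytes into ASCII text that JSON can represent
    encoded = base64.b64encode(chunk).decode('ascii')
    d = {'chunksize': len(chunk), 'chunk': encoded}
    dump = json.dumps(d)
The receiving side would then have to base64-decode the 'chunk' field to recover the original bytes.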

Related

How to convert Binary into JSON

I'm having trouble converting a binary file, containing data from a flash memory, into JSON or some other readable file.
I'm trying to read the file this way in Python:
fileName = "TST477 DeviceMemory.bin"
with open(fileName, mode='rb') as file:  # b is important -> binary
    fileContent = file.read()
print(fileContent)
I attach a sample of the data I'm trying to convert:
Data Sample Added
And if I try to load it into JSON
data = fileContent.decode("utf-8")
s = json.dumps(data, indent=4, sort_keys=True)
it throws the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 8:
invalid continuation byte
Please can someone help me?
Thank you!
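No answer is attached here, but since the raw flash dump is not valid UTF-8 it cannot be decoded as text; one common approach (an assumption on my part, not something taken from the original thread) is to convert the bytes to a lossless text representation such as hex before putting them in JSON. A rough sketch:
import json

fileName = "TST477 DeviceMemory.bin"
with open(fileName, mode='rb') as file:
    fileContent = file.read()

# bytes.hex() gives a lossless, JSON-safe text form of the data (Python 3);
# the original bytes can be recovered later with bytes.fromhex(...)
s = json.dumps({"raw_hex": fileContent.hex()}, indent=4)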

Decode compressed bytes to UTF-8 in Python?

I have a variable in base64 format:
variable = b'gAN9cQAoGdS4='
To store it in a database I decode it:
st1 = variable.decode('utf-8')
st1
Out[35]: 'gAN9cQAoGdS4='
Now I have a very large variable, more than 4 GB, so I compress it using zlib:
import zlib
variable_comp = zlib.compress(variable)
Now, to store it in the DB, I can't decode it:
st1 = variable_comp.decode('utf-8')
I get
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x9c in position 1: invalid start byte
so I tried
st2 = variable_comp.decode('utf-8', errors="ignore")
But when I decompress it, I get an error:
variable_decomp = zlib.decompress(st2)
TypeError: a bytes-like object is required, not 'str'
May I know the fix for this? Will gzip fix it?
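No answer is included here, but the usual fix (my assumption, not taken from the original thread) is to stop decoding the compressed bytes as UTF-8 altogether: zlib output is arbitrary binary, so it either goes into a binary (BLOB) column as-is, or gets base64-encoded when a text column is required. gzip would have exactly the same problem, since its output is also binary. A rough sketch:
import base64
import zlib

variable_comp = zlib.compress(variable)

# base64 maps arbitrary bytes to ASCII, so the result is safe for a text column
st1 = base64.b64encode(variable_comp).decode('ascii')

# reverse both steps to get the original bytes back
variable_decomp = zlib.decompress(base64.b64decode(st1))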

How to get Python's json module to cope with right quotation marks?

I am trying to load a UTF-8-encoded JSON file using Python's json module. The file contains several right quotation marks, encoded as E2 80 9D. When I call
json.load(f, encoding='utf-8')
I receive the message:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 212068: character maps to <undefined>
How can I convince the json module to decode this properly?
EDIT: Here's a minimal example:
[
    {
        "aQuote": "“A quote”"
    }
]
There is no encoding parameter in the signature of json.load. The solution should simply be:
with open(filename, encoding='utf-8') as f:
    x = json.load(f)
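As a quick check against the minimal example above (assuming it is saved as quotes.json, a file name made up for this illustration):
import json

with open('quotes.json', encoding='utf-8') as f:
    x = json.load(f)

print(x[0]['aQuote'])  # prints “A quote” - the curly quotation marks survive intact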

UnicodeEncodeError: 'ascii' codec can't encode

I have the following data container, which is constantly being updated:
data = []
for val, track_id in zip(values, list(track_ids)):
    # below
    if val < threshold:
        # structure data as dictionary
        pre_data = {"artist": sp.track(track_id)['artists'][0]['name'], "track": sp.track(track_id)['name'], "feature": filter_name, "value": val}
        data.append(pre_data)

# write to file
with open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
but I am getting a lot of errors like this:
json.dump(data,f, ensure_ascii=False, indent=4, sort_keys=True)
File"/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 190, in dump
fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)
Is there a way I can get rid of this encoding problem once and for all?
I was told that this would do it:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
but many people do not recommend it.
I use Python 2.7.10.
Any clues?
When you write to a file that was opened in text mode, Python encodes the string for you. The default encoding is ascii, which generates the error you see; there are a lot of characters that can't be encoded to ASCII.
The solution is to open the file in a different encoding. In Python 2 you must use the codecs module, in Python 3 you can add the encoding= parameter directly to open. utf-8 is a popular choice since it can handle all of the Unicode characters, and for JSON specifically it's the standard; see https://en.wikipedia.org/wiki/JSON#Data_portability_issues.
import codecs
with codecs.open('db/json/' + user + '_' + product + '_' + filter_name + '.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4, sort_keys=True)
Your object has unicode strings, and Python 2.x's support for unicode can be a bit spotty. First, let's make a short example that demonstrates the problem:
>>> obj = {"artist":u"Björk"}
>>> import json
>>> with open('deleteme', 'w') as f:
... json.dump(obj, f, ensure_ascii=False)
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/usr/lib/python2.7/json/__init__.py", line 190, in dump
fp.write(chunk)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 3: ordinal not in range(128)
From the json.dump help text:
If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
instance consisting of ASCII characters only. If ``ensure_ascii`` is
``False``, some chunks written to ``fp`` may be ``unicode`` instances.
This usually happens because the input contains unicode strings or the
``encoding`` parameter is used. Unless ``fp.write()`` explicitly
understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
cause an error.
Ah! There is the solution. Either use the default ensure_ascii=True and get ascii escaped unicode characters or use the codecs module to open the file with the encoding you want. This works:
>>> import codecs
>>> with codecs.open('deleteme', 'w', encoding='utf-8') as f:
... json.dump(obj, f, ensure_ascii=False)
...
>>>
Why not encode the specific string instead? Try the .encode('utf-8') method on the string that is raising the exception.
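One way to read that suggestion (my interpretation, not spelled out in the answer) is to serialize with json.dumps first and encode the resulting unicode string yourself before writing, which works on Python 2:
>>> import json
>>> obj = {"artist": u"Björk"}
>>> with open('deleteme', 'w') as f:
...     f.write(json.dumps(obj, ensure_ascii=False).encode('utf-8'))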

Write unicode GIF to file in Python

I have a GIF file (or any image format) in unicode form:
>>> data
u'GIF89a,\x000\x00\ufffd\ufffd\x00\x00\x00\x00\ufffd\ufffd\ufff...
I want to write this to file:
>>> f = open('file.gif', 'wb')
>>> f.write(data)
But I get an error:
UnicodeEncodeError at /image
'ascii' codec can't encode characters in position 10-11: ordinal not in range(128)
How do I do this?
Try this:
utf8data = data.encode('UTF-8')
open('file.gif', 'w').write(utf8data)
You must explicitly encode the unicode string to bytes:
f.write(data.encode('utf-8'))
