UnicodeDecodeError when importing a JSON file in Python

I want to open a JSON file in Python, but I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 64864: ordinal not in range(128)
My code is quite simple:
# -*- coding: utf-8 -*-
import json
with open('birdw3l2.json') as data_file:
    data = json.load(data_file)
print(data)
Can someone help me? Thanks!

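Try the following code. It decodes the file as UTF-8 while reading it; json.load returns Python objects, so there is nothing to call .decode() on afterwards.
import io
import json
with io.open('birdw3l2.json', encoding='utf-8') as data_file:
    data = json.load(data_file)
print(data)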

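You should specify the encoding when you load your JSON file, like this:
data = json.load(data_file, encoding='utf-8')
The right value depends on how your file is encoded. Note that this argument only has an effect in Python 2; in Python 3 it is ignored, and it was removed in Python 3.9, so there the encoding has to be passed to open() instead.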
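For Python 3, a minimal sketch of the same idea passes the encoding to open() rather than to json.load. It assumes the file is UTF-8; the helper name load_json and the temporary sample file are made up for illustration.

```python
import json
import os
import tempfile


def load_json(path, encoding="utf-8"):
    """Read a JSON file, decoding it with an explicit encoding (assumed UTF-8)."""
    with open(path, encoding=encoding) as data_file:
        return json.load(data_file)


# Demonstration with a temporary file that contains non-ASCII quotation marks.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.json")
    with open(sample_path, "w", encoding="utf-8") as out:
        json.dump({"quote": "\u201cA quote\u201d"}, out, ensure_ascii=False)
    data = load_json(sample_path)

# json.dumps escapes non-ASCII by default, so this prints safely on any console.
print(json.dumps(data))
```

If the file is not actually UTF-8, the same call with a different encoding value (for example "cp1252") may be what is needed.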

Related

How to convert Binary into JSON

I'm having trouble converting a binary file, which holds data from a flash memory, into JSON or some other readable format.
I'm trying to read the file this way in Python:
fileName = "TST477 DeviceMemory.bin"
with open(fileName, mode='rb') as file:  # b is important -> binary
    fileContent = file.read()
print(fileContent)
I attach a sample of the data I'm trying to convert:
Data Sample Added
When I try to convert it to JSON:
data = fileContent.decode("utf-8")
s = json.dumps(data, indent=4, sort_keys=True)
it throws the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 8: invalid continuation byte
Please can someone help me?
Thank you!
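A raw memory dump is generally not text, which is why decoding it as UTF-8 fails, and JSON itself has no type for raw bytes. One possible approach, assuming the goal is only to carry the bytes inside a JSON document, is to store them as Base64 text. The layout of the dump is unknown here, so this sketch does not try to interpret its fields; the function names and JSON keys are hypothetical.

```python
import base64
import json


def bytes_to_json(raw, name="dump"):
    """Wrap raw bytes in a JSON document as Base64 text (keys are illustrative)."""
    payload = {
        "name": name,
        "size": len(raw),
        "data_base64": base64.b64encode(raw).decode("ascii"),
    }
    return json.dumps(payload, indent=4, sort_keys=True)


def json_to_bytes(text):
    """Recover the original bytes from a document made by bytes_to_json."""
    return base64.b64decode(json.loads(text)["data_base64"])


# Stand-in bytes: 0xE5 followed by 0x41 is not valid UTF-8, like the error above.
sample = bytes([0x00, 0x01, 0xE5, 0x41, 0xFF])
document = bytes_to_json(sample)
restored = json_to_bytes(document)
print(restored == sample)
```

Turning the dump into meaningful fields would require knowing its binary format, for example by unpacking it with the struct module according to the device documentation.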

Unicode decode error while importing medical data with pandas

I tried importing medical data and ran into this Unicode error. Here is my code:
import glob
import os
import pandas as pd
output_path = r"C:/Users/muham/Desktop/AI projects/cancer doc classification"
my_file = glob.glob(os.path.join(output_path, '*.csv'))
for files in my_file:
    data = pd.read_csv(files)
    print(data)
My error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x93 in position 3314: invalid start byte
Try other encodings; the default is utf-8. For example:
import pandas
pandas.read_csv(path, encoding="cp1252")
or ascii, latin1, etc.
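Trying encodings one at a time can be automated. The following is a rough sketch using only the standard library; the function name detect_encoding and the candidate list are assumptions for illustration. Note that latin-1 accepts every byte sequence, so it belongs last, and a successful decode does not prove the guess is right.

```python
def detect_encoding(raw, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes raw without error."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # This codec rejects the data; try the next one.
        return name
    return None


# Stand-in data: 0x93 and 0x94 are curly quotes in cp1252 but invalid in UTF-8.
sample = b"diagnosis,\x93benign\x94\n"
encoding = detect_encoding(sample)
print(encoding)
```

The resulting name could then be passed on, for example as pandas.read_csv(path, encoding=encoding), after reading a chunk of the file in binary mode.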

Encoding and decoding with utf-8 returns UnicodeError

I am both encoding and decoding with utf-8, but I still get a UnicodeError.
import pandas as pd
df.to_csv('myfile.csv', index=False, encoding='utf-8')
Then, in another .py file in the same project:
import pandas as pd
with open(file, 'r') as f:
    csv = pd.read_csv(f, encoding='utf-8')
The error is:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 51956: character maps to <undefined>
This is not the first time I have run into this issue.
OK, found it. It makes a lot of sense now: the encoding has to be passed to open(), because that is where the bytes are decoded.
with open(file, 'r', encoding='utf-8') as f:
    csv = pd.read_csv(f)

How to get python's json module to cope with right quotation marks?

I am trying to load a UTF-8 encoded JSON file using Python's json module. The file contains several right quotation marks, encoded as E2 80 9D. When I call
json.load(f, encoding='utf-8')
I receive the message:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 212068: character maps to <undefined>
How can I convince the json module to decode this properly?
EDIT: Here's a minimal example:
[
    {
        "aQuote": "“A quote”"
    }
]
There is no encoding parameter in the signature of json.load. The solution is simply to pass the encoding to open():
with open(filename, encoding='utf-8') as f:
    x = json.load(f)
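To see why the error appears, here is a small sketch that decodes the same bytes two ways. It assumes the platform default encoding was cp1252, which is common on Windows and consistent with the 'charmap' codec named in the message; the variable names are illustrative.

```python
import json

# The minimal example from the question, encoded as UTF-8 bytes.
# The right quotation mark U+201D becomes the bytes E2 80 9D.
raw = '[{"aQuote": "\u201cA quote\u201d"}]'.encode("utf-8")

# Decoding with cp1252 fails because byte 0x9D has no mapping in that code page.
try:
    raw.decode("cp1252")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("cp1252 failed on byte", hex(raw[exc.start]))

# Decoding with the encoding the file was written in works.
data = json.loads(raw.decode("utf-8"))
print(data[0]["aQuote"] == "\u201cA quote\u201d")
```

Passing encoding='utf-8' to open() has the same effect as the explicit decode step here.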

UnicodeDecodeError: 'utf8' codec can't decode bytes

I'm parsing an xml file which has "iso-8859-15" encoding.
Words like 'Zürich', 'Aktienrückk' get converted to "&#228 ;" etc.
I tried these suggestions:
>>> p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
>>> p.text
u'found "\u62c9\u67cf \u591a\u516c \u56ed"'
>>> print p.text
but I get errors like UnicodeDecodeError: 'ascii' codec can't decode byte
Even this doesn't help:
content = unicode(mystring.strip(codecs.BOM_UTF8), 'utf-8')
I tried a lot of suggestions on Stack Overflow, but I couldn't find a solution.
I need to write the parsed content back to an HTML file with the same characters, such as 'ü'.
Try this:
from xml.etree import ElementTree
p = ElementTree.fromstring(u'<p>found "\u62c9\u67cf \u591a\u516c \u56ed"</p>'.encode('utf8'))
print p.text.encode('utf8')
found "拉柏 多公 园"
For your example:
# -*- coding: utf-8 -*-
from xml.etree import ElementTree
text = 'Aktienrückk'.decode('utf8')
print text.encode('utf8')
Aktienrückk
Don't forget to put # -*- coding: utf-8 -*- at the beginning of the file.
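The code above is Python 2. In Python 3 the same task needs no manual decode calls. The following is a minimal sketch, assuming the XML declares its encoding as iso-8859-15 and that the HTML output should use the same encoding; the sample document and variable names are made up.

```python
from xml.etree import ElementTree

# Stand-in for the bytes of the XML file; \u00fc is the letter u with umlaut.
xml_text = (
    '<?xml version="1.0" encoding="iso-8859-15"?>'
    "<p>Z\u00fcrich Aktienr\u00fcckk</p>"
)
xml_bytes = xml_text.encode("iso-8859-15")

# The parser reads the declared encoding and returns ordinary str objects.
root = ElementTree.fromstring(xml_bytes)
text = root.text

# Build the HTML as str, then encode once when producing the output bytes.
html = "<html><body><p>{}</p></body></html>".format(text)
html_bytes = html.encode("iso-8859-15")
print(len(html_bytes))
```

When writing to a real file, open(path, "w", encoding="iso-8859-15") performs the same encoding step as the explicit encode call here.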
