JSON encoding/decoding issues in Python

JSON encoding/decoding issues in Python - python

I'm trying to read in a response from a REST API, parse it as JSON and write the properties to a CSV file.
It appears some of the characters are in an unknown encoding and can't be converted to strings when they're written out to the CSV file:
'ascii' codec can't encode character u'\xf6' in position 15: ordinal not in range(128)
So, what I've tried to do is follow the answer by "agf" on this question:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 20: ordinal not in range(128)
I added a call to unicode(content).encode("utf-8") when my script reads the contents of the response:
obj = json.loads(unicode(content).encode("utf-8"))
Now I see a exceptions.UnicodeDecodeError on this line.
Is Python attempting to decode "content" before encoding it as utf-8? I don't quite understand what's going on. There is no way to determine the encoding of the response since the API I'm calling doesn't set a Content-Type header.
Not sure how to handle this. Please advise.

Related

'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte

I am trying to read a csv file using the following lines of Python code:
crimes = pd.read_csv('C:/Users/usuario1/Desktop/python/csv/001 Boston crimes/crime.csv', encoding = 'utf8')
crimes.head(5)
But I am getting decode error as follws:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte
What is going wrong?

May be your file does not support utf-8 codec or has a character that does not support utf-8. You can try other encodings like ISO-8859-1. But it is best to check your file encoding first. To do so, something like the following should work:
1.
with open('Your/file/path') as f:
print(f)
This should print file details with encoding.
Or you can just open the csv and when you go to File -> Save As this should show your encoding.
If those don't help, you can ignore the rows that are causing problems by using `error_bad_lines=False'
crimes = pd.read_csv('Your/file/path', encoding='utf8', error_bad_lines=False)
Hope these will help

Python Encoding Error when writing to file

I want write some strings to file which is not in English, they are in Azeri language. Even if I do utf-8 encoding I get following error:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 10-12: ordinal not in range(128)
my code piece that wants to write to file is following:
t_w = text_list[y].encode('utf-8')
new_file.write(t_w.decode('utf-8'))
new_file.write('\n')
EDIT
Even if I make code as:
t_w = text_list[y].encode('ascii',errors='ignore')
new_file.write(t_w)
new_file.write('\n')
I get following error which is :
TypeError: write() argument must be str, not bytes

From what I can tell t_w.decode(...) attempts to convert your characters to ASCII, which doesn't encode some Azeri characters. There is no need to decode the string because you want to write it to the file as UTF-8, so omit the .decode(...) part:new_file.write(t_w)

decode following urls in python

I have a URL like this:
http://idebate.org/debatabase/debates/constitutional-governance/house-supports-dalai-lama%E2%80%99s-%E2%80%98third-way%E2%80%99-tibet
Then I used following script in python to decode this url:
full_href = urllib.unquote(full_href.encode('ascii')).decode('utf-8')
However, i got error like this:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 89: ordinal not in range(128)
when trying to write in file

Just as #KevinJ.Chase pointed out, you were most likely trying to write to a file with string in incompatible ascii format.
You can either change your write file encoding, or encode your full_href to ascii, something like this:
# don't decode again to utf-8
full_href = urllib.unquote(url.encode('ascii'))
... then write to your file stream
or,
...
# encode your your to compatible encoding on write, ie. utf-8
with open('yourfilenamehere', 'w') as f:
f.write(full_href.encode('utf-8'))

Want the code to read ' instead of ’

I am trying to convert a csv file to a json file. The whole code runs fine but when I encounter the statement:
json.dump(DictName, out_file)
I get the following error:
UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 15: invalid start byte
Would someone please be able to help?
TIA.

I found the solution. While parsing the string, I converted the string to unicode using the unicode() function:
unicode(stringname, errors='replace')
and it replaced all the erroneous symbols.

How do I print a list of strings, when I can't know the char encoding in advance?

I am retrieving a list of names from a webservice using a client I've written in Python. Upon retrieving the list, I encode each name to unicode and then print each of them to stdout. When I get to the name "Ólafur Jóhann Ólafsson", I get the following error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0:
ordinal not in range(128)
Since I cannot know what the encoding will be, how do I convert all of these strings to unicode? Or can you suggest a better way to handle this problem?

First of all, you decode data to Unicode (the absence of encoding) when reading from a file, pipe, socket, terminal, etc.; and encode Unicode to an appropriate byte encoding when sending/persisting data. I suspect this is the root of your problem.
The web service should declare the encoding in the headers or data received. print normally automatically encodes Unicode to the terminal's encoding (discovered through sys.stdout.encoding) or in absence of that just ascii. If the characters in your data are not supported by the target encoding, you'll get a UnicodeEncodeError.
Since that is not the error you received, you should post some code so we can see what your are doing. Most likely, you are encoding a byte string instead of decoding. Here's an example of this:
>>> data = '\xc2\xbd' # UTF-8 encoded 1/2 symbol.
>>> data.encode('cp437')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\dev\python\lib\encodings\cp437.py", line 12, in encode
return codecs.charmap_encode(input,errors,encoding_map)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
What I did here is call encode on a byte string. Since encode requires a Unicode string, Python used the default ascii encoding to decode the byte string to Unicode first, before encoding to cp437.
Fix this by decoding instead of encoding the data, then print will encode to stdout automatically. As long as your terminal supports the characters in the data, it will display properly:
>>> import sys
>>> sys.stdout.encoding
'cp437'
>>> print data.decode('utf8') # implicit encode to sys.stdout.encoding
½
>>> print data.decode('utf8').encode('cp437') # explicit encode.
½

The UnicodeDammit module from BeautifulSoup can automagically detect the encoding.
from BeautifulSoup import UnicodeDammit
u = UnicodeDammit("Ólafur Jóhann Ólafsson")
print u.unicode
print u.originalEncoding

This page may help you http://wiki.python.org/moin/PrintFails
The problem, I guess, is that you need to print those names to console. Do you really need it? or it's just a test environment? if you use console just for testing, you may switch to other tools like unit testing to check what values you exactly get.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

JSON encoding/decoding issues in Python - python

Related

'utf-8' codec can't decode byte 0xa0 in position 24: invalid start byte

Python Encoding Error when writing to file

decode following urls in python

Want the code to read ' instead of ’

How do I print a list of strings, when I can't know the char encoding in advance?

Categories

Resources