How can I solve UnicodeDecodeError in Django?


I am getting this error in Django:
UnicodeDecodeError at /category/list/
'utf8' codec can't decode byte 0xf5 in position 7: invalid start byte
Request Method: GET
Request URL: ...
Django Version: 1.3.1
Exception Type: UnicodeDecodeError
Exception Value:
'utf8' codec can't decode byte 0xf5 in position 7: invalid start byte
Exception Location: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py in iterencode, line 264
...
I need to save Turkish characters in the database. How can I fix this error?

A start byte of 0xf5 would indicate the start of a 4-byte UTF-8 sequence, but it is never valid in UTF-8 because it would encode a code point above U+10FFFF. One strong possibility is that the input isn't UTF-8 at all but ISO-8859-9, the Turkish ISO encoding. In that code page, 0xf5 is a lowercase o with tilde (õ).
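As a rough illustration of the point above, the sketch below decodes a made-up byte string (the name `raw` and its contents are assumptions, chosen only so that 0xf5 sits at position 7 as in the traceback) first as UTF-8 and then as ISO-8859-9. It also shows one byte where ISO-8859-9 and Latin-1 disagree, which is why the choice of codec matters for Turkish text.

```python
# Hypothetical sample data: 0xf5 sits at index 7, as in the error message.
raw = b"categor\xf5"

# Decoding as UTF-8 fails, because 0xf5 can never start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Decoding as ISO-8859-9 (the Turkish code page) succeeds; 0xf5 maps to U+00F5.
text = raw.decode("iso-8859-9")

# The two code pages differ for some bytes: 0xfd is a dotless i in
# ISO-8859-9 but a y with acute accent in Latin-1.
turkish = b"\xfd".decode("iso-8859-9")
latin1 = b"\xfd".decode("latin1")
```

If the data really is Turkish ISO text, decoding with the matching codec once at the input boundary and working with unicode values afterwards is usually simpler than patching individual call sites.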

The code below solved my problem. Thank you.
if isinstance(encObject, unicode):
    myStr = encObject.encode('utf-8')

http://www.fileformat.info/info/unicode/char/f5/index.htm
0xf5 is an o with a tilde (õ). Try decoding the byte string as Latin-1:
some_string.decode('latin1', 'replace')

Related

UnicodeEncodeError / python3

I can't solve a typical encoding issue. Cyrillic text is received via POST and this error is raised:
'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
The key itself looks like this, and it should be Cyrillic text in Russian: u'\u043f\u0440\u043e'
After that error I tried this, among other things:
key = key.decode('ascii').encode('utf8')
or:
key = key.decode('ascii')
Locally it works; the error is raised in production only. The Python system encoding in production is utf8.
EDIT: to clear things up, the error is raised in the form handler function (again, it works locally but not in production):
def search(request):
    if request.method == 'POST':
        key = request.POST.get("key")
        if key is not None:
            ..
So it's a str received from the input form, and the error is first raised at this point. I assumed it had to be decoded, but that didn't help.
More of the traceback:
UnicodeEncodeError at /search/
'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
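A minimal sketch of where a message like this comes from, assuming `key` is already a text (unicode) value as the `u'...'` repr in the question suggests. Encoding such a value with the ASCII codec fails for Cyrillic characters, while encoding it as UTF-8 works. In Python 2, calling `.decode('ascii')` on a unicode object first encodes it implicitly with ASCII, which triggers the same error.

```python
# The Cyrillic text from the question, written with escapes.
key = u"\u043f\u0440\u043e"

# Encoding to ASCII fails: none of the three characters is in range(128).
try:
    key.encode("ascii")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

# Encoding to UTF-8 works and yields two bytes per character.
utf8_bytes = key.encode("utf-8")
```

This does not explain why only production is affected; it only reproduces the message from the traceback.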

UnicodeEncodeError with nginx and django

I've tried this SO answer, and this doc is inapplicable as I'm running nginx. I've added charset utf-8; to my nginx config and I'm still getting this error.
Here is the summarised traceback:
UnicodeEncodeError at /
'ascii' codec can't encode character u'\xe1' in position 69: ordinal not in range(128)
Request Method: GET
Request URL: http://django/
Django Version: 1.4.20
Exception Type: UnicodeEncodeError
Exception Value:
'ascii' codec can't encode character u'\xe1' in position 69: ordinal not in range(128)
Exception Location: /opt/envs/venv/lib/python2.7/genericpath.py in getmtime, line 54
Unicode error hint
The string that could not be encoded/decoded was: choacán.jpg

I don't think this error is related to nginx. It occurs at the file creation step.
Python uses the system locale when saving files.
Check your system locale:
$ python manage.py shell
>>> import os
>>> print os.popen("locale").read()
If it's incorrect, you should set the system locale.
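As a complementary check, the encodings Python itself has picked up can be inspected from the same shell. This is only a sketch; the variable names are illustrative.

```python
import locale
import sys

# Encoding Python uses to convert unicode file names for the operating system.
fs_encoding = sys.getfilesystemencoding()

# Encoding the current locale suggests for text data.
preferred = locale.getpreferredencoding()

print(fs_encoding, preferred)
```

If either value reports an ASCII-only encoding, non-ASCII file names are likely to fail in the way shown in the traceback.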
However, filenames like this can cause all kinds of trouble for users. Consider defining a custom file storage for models.FileField and generating a random file name for every file; it's good practice.
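One way to sketch the random-name idea, without writing a full custom storage class, is a small helper that discards the original name and keeps only its extension. The function name `random_upload_name` and the `uploads/` prefix are assumptions made for this example, not part of the original answer.

```python
import os
import uuid


def random_upload_name(instance, filename):
    """Return a random, ASCII-only relative path that keeps the extension."""
    # Keep only the extension of the uploaded file, lower-cased.
    ext = os.path.splitext(filename)[1].lower()
    # uuid4().hex is 32 hexadecimal characters, so the name is always ASCII.
    return "uploads/{0}{1}".format(uuid.uuid4().hex, ext)


# Example with the file name fragment shown in the error hint.
name = random_upload_name(None, u"choac\u00e1n.jpg")
```

Django's FileField accepts a callable with this `(instance, filename)` signature as its `upload_to` argument, so a helper like this could be passed there instead of defining a custom storage.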

UnicodeDecodeError: 'ascii' codec can't decode byte

I have the following code:
# -*- coding: utf-8 -*-
import splinter
import urllib
browser = splinter.Browser('firefox')
miss = ("rúin",)
for i in miss:
    browser.visit(link)
    browser.fill('word', i)
Which gives me the error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
How can I resolve this issue?

Use an actual unicode value:
miss = (u"rúin",)
Note the u before the string literal.
Otherwise, Python will try to coerce the bytestring to unicode implicitly, using the default codec (ASCII).
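The implicit coercion described above can be reproduced explicitly. This is a small sketch; the variable names are illustrative. The bytes are the UTF-8 encoding of "rúin", which is what a plain string literal holds in a Python 2 source file declared as UTF-8.

```python
# UTF-8 bytes of the word; 0xc3 is at position 1, as in the error message.
word = b"r\xc3\xbain"

# Decoding with the ASCII codec is what the implicit coercion amounts to.
try:
    word.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding with the codec the bytes were written in gives the text value,
# equivalent to writing the literal with the u prefix.
text = word.decode("utf-8")
```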

Anyone know how to fix a unicode error?

I am using Google App Engine for Python, but I get a unicode error. Is there a way to work around it?
Here is my code:
def get(self):
    contents = db.GqlQuery("SELECT * FROM Content ORDER BY created DESC")
    output = StringIO.StringIO()
    with zipfile.ZipFile(output, 'w') as myzip:
        for content in contents:
            if content.code:
                code=content.code
            else:
                code=content.code2
            myzip.writestr("udacity_code", code)
    self.response.headers["Content-Type"] = "application/zip"
    self.response.headers['Content-Disposition'] = "attachment; filename=test.zip"
    self.response.out.write(output.getvalue())
I now get a unicode error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf7 in position 12: ordinal not in range(128)
I believe it is coming from output.getvalue()... Is there a way to fix this?

@Areke Ignacio's answer is the fix. For a brief walkthrough, here is a post I wrote recently, "Python and Unicode Punjabi": https://www.pippallabs.com/blog/python-and-unicode-panjabi

I had the exact same issue.
In the end I solved it by changing the call to writestr from
myzip.writestr("udacity_code", code)
to
myzip.writestr("udacity_code", code.encode('utf-8'))
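A self-contained sketch of that change, using an in-memory buffer in place of the App Engine response. The sample `code` value is an assumption made for the example; the datastore query from the question is left out.

```python
import io
import zipfile

# Hypothetical content containing a non-ASCII character (a division sign).
code = u"ratio = a \u00f7 b"

# Write the text into the archive as explicit UTF-8 bytes.
output = io.BytesIO()
with zipfile.ZipFile(output, "w") as myzip:
    myzip.writestr("udacity_code", code.encode("utf-8"))

# Read the member back and decode it to confirm the round trip.
with zipfile.ZipFile(io.BytesIO(output.getvalue())) as archive:
    restored = archive.read("udacity_code").decode("utf-8")
```

Encoding before the call keeps only byte strings inside the archive buffer, so no implicit ASCII conversion happens when the buffer contents are joined.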

From this link:
Python UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 ordinal not in range(128)
However in the meantime your problem is that your templates are ASCII
but your data is not (can't tell if it's utf-8 or unicode). Easy
solution is to prefix each template string with u to make it Unicode.

Python error: 'utf8' codec can't decode byte 0x92 in position 85: invalid start byte

I am using Python 2.7 and lxml. My code is below:
import urllib
from lxml import html
def get_value(el):
    return get_text(el, 'value') or el.text_content()
response = urllib.urlopen('http://www.edmunds.com/dealerships/Texas/Frisco/DavidMcDavidHondaofFrisco/fullsales-504210667.html').read()
dom = html.fromstring(response)
try:
    description = get_value(dom.xpath("//div[@class='description item vcard']")[0].xpath(".//p[@class='sales-review-paragraph loose-spacing']")[0])
except IndexError, e:
    description = ''
The code crashes inside the try, giving an error
UnicodeDecodeError at /
'utf8' codec can't decode byte 0x92 in position 85: invalid start byte
The string that could not be encoded/decoded was: ouldn�t be
I have tried a lot of techniques, including .encode('utf8'), but none of them solves the problem. I have two questions:
How can I solve this problem?
How can my app crash when the problem code is inside a try/except?

The page is being served up with charset=ISO-8859-1. Decode from that to unicode.
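A short sketch of decoding such bytes, using a made-up byte string that mirrors the fragment in the error hint. One caveat worth knowing: byte 0x92 is a control character in strict ISO-8859-1 but a right single quotation mark in Windows-1252, the encoding browsers generally apply to pages labelled ISO-8859-1. Which codec suits this particular page is an assumption to verify against the real response.

```python
# Hypothetical bytes mirroring the fragment from the error hint.
raw = b"ouldn\x92t be"

# Strict ISO-8859-1 never fails, but maps 0x92 to the control char U+0092.
latin1_text = raw.decode("iso-8859-1")

# Windows-1252 maps 0x92 to U+2019, the curly apostrophe.
cp1252_text = raw.decode("cp1252")
```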

Your except clause only handles exceptions of the IndexError type. The problem was a UnicodeDecodeError, which is not an IndexError - so the exception is not handled by that except clause.
It's also not clear what 'get_value' does, and that may well be where the actual problem is arising.
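The point about the except clause can be shown in isolation. In this sketch, `parse` is a hypothetical stand-in for the failing call in the question, and `raw` is made-up data that triggers the same exception type.

```python
# Hypothetical bytes that are not valid UTF-8.
raw = b"ouldn\x92t be"


def parse(data):
    """Stand-in for the failing call: decodes the bytes as UTF-8."""
    return data.decode("utf-8")


# A handler for IndexError alone lets the UnicodeDecodeError propagate.
escaped = False
try:
    try:
        parse(raw)
    except IndexError:
        pass
except UnicodeDecodeError:
    escaped = True

# Listing both exception types handles either failure.
try:
    description = parse(raw)
except (IndexError, UnicodeDecodeError):
    description = ""
```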

Skip characters on error, or decode the response correctly to unicode.
You only catch IndexError, not UnicodeDecodeError.

Decode the response to unicode, handling errors properly (for example, ignoring them), before parsing with html.fromstring.
Catch the UnicodeDecodeError, or all errors.
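The error-handling options mentioned in these answers correspond to the second argument of `decode`. A minimal sketch with made-up bytes:

```python
# Hypothetical bytes containing one byte that is invalid in UTF-8.
raw = b"ouldn\x92t be"

# "ignore" silently drops the undecodable byte.
ignored = raw.decode("utf-8", "ignore")

# "replace" substitutes U+FFFD, the replacement character, for it.
replaced = raw.decode("utf-8", "replace")
```

Both options lose the original character, so they are a fallback for when the real encoding of the page cannot be determined.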
