I can't solve a typical encoding issue. Cyrillic text is received via POST and an error is raised:
'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
The text key itself looks like this, and it should be Cyrillic text in Russian: u'\u043f\u0440\u043e'
After that error I tried this and some other approaches:
key = key.decode('ascii').encode('utf8')
or:
key = key.decode('ascii')
Locally it works; the error is raised only in production. The Python system encoding in production is utf8.
EDIT: To clear things up, the error is raised in the form handler function (again, it works locally but not in production):
def search(request):
    if request.method == 'POST':
        key = request.POST.get("key")
        if key is not None:
            ...
So it's a str received from the input form, and the error is first raised at this point, so I assumed it had to be decoded, but that didn't help.
More of the traceback:
UnicodeEncodeError at /search/
'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
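A minimal Python 3 sketch of what this traceback means, assuming the value is the Cyrillic string shown above: encoding text that contains non-ASCII characters with the ASCII codec raises exactly this UnicodeEncodeError, while encoding explicitly with UTF-8 does not. In Python 2, calling key.decode('ascii') on a value that is already unicode first encodes it implicitly with the default ASCII codec, which is why the decode attempts above fail with the same message. The sketch does not explain why only the production server is affected, and the helper name force_ascii is hypothetical.

```python
# Python 3 sketch; the value below is the key from the question ("про").
key = '\u043f\u0440\u043e'


def force_ascii(text):
    """Hypothetical helper: encode with ASCII, as an implicit conversion would."""
    return text.encode('ascii')


try:
    force_ascii(key)
except UnicodeEncodeError as exc:
    # 'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
    print(exc)

# Encoding explicitly with UTF-8 works: each Cyrillic letter becomes two bytes.
encoded = key.encode('utf-8')
print(encoded)                          # b'\xd0\xbf\xd1\x80\xd0\xbe'
print(encoded.decode('utf-8') == key)   # True
```

The general rule is to keep text as unicode inside the program and encode it explicitly, with a named codec, only at the point where bytes are required.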
Email clients decode messages correctly, so I assume there must also be a way to decode emails correctly with Python.
I use the built-in email Python library to process incoming emails.
import email
...
email_message = email.message_from_file(fp)
email_message.is_multipart() # => False
email_message.get_content_type() # 'text/plain'
to_decode = email_message.get_payload(decode=True)
charset = email_message.get_content_charset()
# charset is utf-8
to_decode.decode(charset)
Exception:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 135: invalid start byte
This is part of the string in the to_decode variable:
b'Dzie\\u0144 dobry,\n\nniestety w podany'
I figured out by trial and error that I can do the following:
test = b'Dzie\\u0144 dobry,\n\nniestety w podany'
test.decode('unicode-escape')
>> output: 'Dzień dobry,\n\nniestety w podany'
This is correct, but I think there must be a better way than guessing. How does my email client do this?
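One way to avoid guessing, sketched in Python 3 under the assumption that the declared charset is mostly right: decode the payload with the charset from the Content-Type header, but pass errors='replace' so that stray bytes become U+FFFD instead of raising. This sketch does not establish how any particular mail client behaves. The raw message and the decode_body helper are hypothetical. Note that 0xa0 is a no-break space in ISO-8859-1 and Windows-1252, which hints that the body may not be pure UTF-8 despite its header; the unicode-escape trick works on the fragment above only because that fragment contains literal \u0144 escape sequences.

```python
import email

# Hypothetical single-part message: declared UTF-8, but with one stray 0xA0 byte.
raw = (
    b"Content-Type: text/plain; charset=utf-8\n"
    b"Content-Transfer-Encoding: 8bit\n"
    b"\n"
    b"Dzie\xc5\x84 dobry,\xa0\n"
)


def decode_body(msg):
    """Decode a non-multipart text body with its declared charset, leniently."""
    payload = msg.get_payload(decode=True)          # bytes after transfer decoding
    charset = msg.get_content_charset() or 'ascii'  # RFC 2045 default is US-ASCII
    try:
        return payload.decode(charset, errors='replace')
    except LookupError:
        # Unknown charset name in the header: fall back to a decode that cannot fail.
        return payload.decode('latin-1')


msg = email.message_from_bytes(raw)
text = decode_body(msg)
print(ascii(text))  # the stray byte shows up as \ufffd, the replacement character
```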
The response to my request looks like this:
alle_R={'items': [{'x':..., 'y'..., ...}, {...}...]}
I am trying to perform some actions, for example, to retrieve 'x' with alle_R['items'][4]['x'], but I get this error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xfc in position 1022: invalid start byte
I tried out this solution:
alle_R=str(alle_R)
encode_alle_R = codecs.encode(alle_R, 'utf-8')
(with codecs imported, of course). But this did not help. Creating JSON files raises the same error.
Do you have any idea how I can access my elements?
Thanks a lot!
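A Python 3 sketch, assuming the server actually sends Latin-1 (byte 0xfc is ü in ISO-8859-1 and Windows-1252) while the client decodes it as UTF-8. Converting the already-parsed object with str() and codecs.encode() cannot help, because the failure happens when the raw bytes are decoded; the fix is to decode the body with the encoding it really uses before parsing it as JSON. The response body below is made up, and in real code the encoding should come from the response's Content-Type header rather than being hard-coded.

```python
import json

# Hypothetical response body: JSON text encoded as Latin-1 rather than UTF-8.
raw = '{"items": [{"x": "M\u00fcnchen", "y": 1}]}'.encode('latin-1')

try:
    json.loads(raw.decode('utf-8'))
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xfc ...: invalid start byte

# Decode with the encoding the bytes actually use, then parse.
alle_R = json.loads(raw.decode('latin-1'))
x = alle_R['items'][0]['x']  # 'M\u00fcnchen'
```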
I am using Google App Engine for Python, but I get a Unicode error. Is there a way to work around it?
Here is my code:
import StringIO
import zipfile

from google.appengine.ext import db

def get(self):
    contents = db.GqlQuery("SELECT * FROM Content ORDER BY created DESC")
    output = StringIO.StringIO()
    with zipfile.ZipFile(output, 'w') as myzip:
        for content in contents:
            if content.code:
                code = content.code
            else:
                code = content.code2
            myzip.writestr("udacity_code", code)
    self.response.headers["Content-Type"] = "application/zip"
    self.response.headers['Content-Disposition'] = "attachment; filename=test.zip"
    self.response.out.write(output.getvalue())
I now get a Unicode error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf7 in position 12: ordinal not in range(128)
I believe it is coming from output.getvalue()... Is there a way to fix this?
Areke Ignacio's answer is the fix. For a brief walkthrough, here is a post I wrote recently, "Python and Unicode Punjabi": https://www.pippallabs.com/blog/python-and-unicode-panjabi
I had the exact same issue.
In the end, I solved it by changing the call to writestr from
myzip.writestr("udacity_code", code)
to
myzip.writestr("udacity_code", code.encode('utf-8'))
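The likely reason the encode() call helps, going by the Python 2 StringIO documentation: StringIO.StringIO accepts both unicode and byte strings, but if both are written, getvalue() raises a UnicodeError for byte strings that are not 7-bit ASCII. Passing a unicode object to writestr() puts unicode into the buffer next to the zip file's binary headers, so the final join fails. The sketch below is a Python 3 version of the same pattern, using io.BytesIO and encoding each snippet explicitly. The snippets and the numbered file names are hypothetical stand-ins for content.code and content.code2; unique names also avoid zipfile's duplicate-name warning when the loop writes several entries.

```python
import io
import zipfile

# Hypothetical stand-ins for content.code / content.code2 values.
snippets = ['print("caf\u00e9")', 'x = "\u00f7"']

output = io.BytesIO()
with zipfile.ZipFile(output, 'w') as myzip:
    for i, code in enumerate(snippets):
        # Hand writestr() bytes, so the buffer only ever holds bytes.
        myzip.writestr('udacity_code_%d.py' % i, code.encode('utf-8'))

data = output.getvalue()  # complete archive as bytes, ready to send as the response body

# Read the archive back to confirm the text survived the round trip.
with zipfile.ZipFile(io.BytesIO(data)) as check:
    restored = check.read('udacity_code_1.py').decode('utf-8')
```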
From this link:
Python UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 ordinal not in range(128)
However, in the meantime, your problem is that your templates are ASCII but your data is not (I can't tell if it's UTF-8 or unicode). The easy solution is to prefix each template string with u to make it Unicode.
I am using Python 2.7 and lxml. My code is below:
import urllib
from lxml import html

def get_value(el):
    return get_text(el, 'value') or el.text_content()

response = urllib.urlopen('http://www.edmunds.com/dealerships/Texas/Frisco/DavidMcDavidHondaofFrisco/fullsales-504210667.html').read()
dom = html.fromstring(response)
try:
    description = get_value(dom.xpath("//div[@class='description item vcard']")[0].xpath(".//p[@class='sales-review-paragraph loose-spacing']")[0])
except IndexError, e:
    description = ''
The code crashes inside the try block with this error:
UnicodeDecodeError at /
'utf8' codec can't decode byte 0x92 in position 85: invalid start byte
The string that could not be encoded/decoded was: ouldn�t be
I have tried a lot of techniques, including .encode('utf8'), but none of them solves the problem. I have two questions:
How can I solve this problem?
How can my app crash when the problematic code is inside a try/except?
The page is being served up with charset=ISO-8859-1. Decode from that to unicode.
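A Python 3 sketch of that advice, using a made-up fragment of the page. Decoding with the declared charset cannot fail, because ISO-8859-1 maps every byte to a character. One caveat: byte 0x92 is a C1 control character in ISO-8859-1 but a right single quotation mark (’) in Windows-1252, and the WHATWG Encoding Standard, which browsers follow, treats the label iso-8859-1 as windows-1252, so decoding as cp1252 is usually what yields the apostrophe a browser would show. The decoded text can then be passed to the HTML parser instead of the raw bytes.

```python
# Hypothetical fragment of the page body, as raw bytes from urlopen().read().
raw = b"It couldn\x92t be easier."

try:
    raw.decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0x92 in position 9: invalid start byte

# ISO-8859-1 maps every byte, so this never raises, but 0x92 becomes U+0092 (a control char).
latin1_text = raw.decode('iso-8859-1')

# Windows-1252 maps 0x92 to U+2019 RIGHT SINGLE QUOTATION MARK.
cp1252_text = raw.decode('cp1252')
print(ascii(cp1252_text))  # 'It couldn\u2019t be easier.'
```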
Your except clause only handles exceptions of the IndexError type. The problem was a UnicodeDecodeError, which is not an IndexError, so the exception is not handled by that except clause.
It's also not clear what 'get_value' does, and that may well be where the actual problem is arising.
Skip characters on error, or decode the response correctly to unicode.
You only catch IndexError, not UnicodeDecodeError.
Decode the response to unicode, handling errors properly (ignoring them if necessary), before parsing it with fromstring.
Catch the UnicodeDecodeError, or all errors.
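The points about the except clause, sketched in Python 3: UnicodeDecodeError derives from UnicodeError and ValueError, not from IndexError, so `except IndexError` lets it through. Listing both exceptions in a tuple handles either failure. The describe function and its arguments are hypothetical stand-ins for the decode and xpath()[0] steps in the question.

```python
def describe(raw, index):
    """Hypothetical: decode page bytes and pick one line by index."""
    try:
        lines = raw.decode('utf-8').splitlines()  # may raise UnicodeDecodeError
        return lines[index]                       # may raise IndexError
    except (IndexError, UnicodeDecodeError):
        # Handle both failure modes; `except IndexError` alone would let the
        # UnicodeDecodeError propagate, which is what crashed the app.
        return ''


print(issubclass(UnicodeDecodeError, IndexError))  # False
print(issubclass(UnicodeDecodeError, ValueError))  # True
print(describe(b"first\nsecond", 1))               # second
print(repr(describe(b"couldn\x92t", 0)))           # ''
```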
I am getting this error in Django:
UnicodeDecodeError at /category/list/
'utf8' codec can't decode byte 0xf5 in position 7: invalid start byte
Request Method: GET
Request URL: ...
Django Version: 1.3.1
Exception Type: UnicodeDecodeError
Exception Value:
'utf8' codec can't decode byte 0xf5 in position 7: invalid start byte
Exception Location: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/encoder.py in iterencode, line 264
...
I need to save Turkish characters in the database. How can I fix this error?
A start byte of 0xf5 would look like the beginning of a 4-byte UTF-8 sequence, and since RFC 3629 restricts UTF-8 to U+10FFFF, bytes 0xF5 to 0xFF never appear in valid UTF-8 at all. One strong possibility is that the input isn't UTF-8 but ISO-8859-9, the Turkish ISO encoding. In that code page, 0xf5 is a lowercase o with tilde (õ).
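A quick Python 3 check of that diagnosis, assuming the bytes really are ISO-8859-9 (the sample bytes are made up): strict UTF-8 rejects 0xf5, while ISO-8859-9 decodes it as õ and also maps the Turkish-specific letters, for example 0xf0 to ğ and 0xfe to ş. Once decoded, the text can be re-encoded as UTF-8 for storage.

```python
try:
    b"\xf5".decode('utf-8')
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xf5 in position 0: invalid start byte

# Hypothetical input bytes in ISO-8859-9 (Latin-5): "yağmur şehir õ".
raw = b"ya\xf0mur \xfeehir \xf5"

text = raw.decode('iso-8859-9')    # str containing ğ, ş and õ
utf8_bytes = text.encode('utf-8')  # what a UTF-8 database column expects
print(ascii(text))  # 'ya\u011fmur \u015fehir \xf5'
```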
The code below solved my problem. Thank you.
if isinstance(encObject, unicode):
    myStr = encObject.encode('utf-8')
http://www.fileformat.info/info/unicode/char/f5/index.htm
It is an o with a tilde.
Try:
some_string.decode('latin1','replace')
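A Python 3 sketch of what that call does: Latin-1 assigns a character to each of the 256 byte values, so decoding with it never raises and the 'replace' error handler never actually kicks in. That also means it can silently produce the wrong characters if the bytes were in some other encoding; with UTF-8, by contrast, 'replace' substitutes U+FFFD for undecodable bytes. The sample bytes are made up.

```python
raw = b"o with tilde: \xf5"

# Latin-1 decodes every possible byte, so 'replace' has nothing to replace.
as_latin1 = raw.decode('latin1', 'replace')
print(ascii(as_latin1))  # 'o with tilde: \xf5'

# UTF-8 with 'replace' swaps the invalid byte for U+FFFD instead of raising.
as_utf8 = raw.decode('utf-8', 'replace')
print(ascii(as_utf8))    # 'o with tilde: \ufffd'

# Every one of the 256 byte values decodes to exactly one character.
all_bytes = bytes(range(256)).decode('latin1')
print(len(all_bytes))    # 256
```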