I have code in my views that returns information to be displayed in a textbox. My name has fadas (Irish accent) over the letters which is causing UnicodeDecodeErrors. The line in my logic is as follows:
return {
...
'wrap_up_form': WrapUpForm(data={u'message': settings.DEFAULT_WRAP_UP_MESSAGE.format(name=customer.given_name.encode('utf-8'))}),
}
and the traceback I get is this
ERROR 2014-07-24 14:48:26,540 exception_handlers.py:65] 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
Traceback (most recent call last):
File "/home/rony/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1529, in __call__
rv = self.router.dispatch(request, response)
File "/home/rony/Documents/clone-attempt/personal-shopping/vendor/nacelle/core/dispatcher.py", line 24, in nacelle_dispatcher
response = router.default_dispatcher(request, response)
File "/home/rony/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/home/rony/google_appengine/lib/webapp2-2.5.2/webapp2.py", line 1065, in __call__
return self.handler(request, *args, **kwargs)
File "/home/rony/Documents/clone-attempt/personal-shopping/app/utils/decorators.py", line 43, in _arguments_wrapper
return view_method(request, *args, **kwargs)
File "/home/rony/Documents/clone-attempt/personal-shopping/app/utils/decorators.py", line 89, in _arguments_wrapper
output = render_jinja2_template(template_name, context)
File "/home/rony/Documents/clone-attempt/personal-shopping/vendor/nacelle/core/template/renderers.py", line 19, in render_jinja2_template
return renderer.render_template(template_name, **context)
File "/home/rony/google_appengine/lib/webapp2-2.5.2/webapp2_extras/jinja2.py", line 158, in render_template
return self.environment.get_template(_filename).render(**context)
File "/home/rony/google_appengine/lib/jinja2-2.6/jinja2/environment.py", line 894, in render
return self.environment.handle_exception(exc_info, True)
File "templates/cms/appointments_form.html", line 2, in top-level template code
{% import 'cms/macros.html' as cms_macros %}
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)
Do I need to add some sort of encoding to my templates?
As Daniel Roseman answered, I also suspect that customer.given_name is byte string; trying to encode on it cause Python try to decode it.
Another issue is that DEFAULT_WRAP_UP_MESSAGE is byte string literal.
str.format(unicode) has same issue.
Solution:
remove .decode(..) part.
Make a DEFAULT_WRAP_UP_MESSAGE unicode object instead of byte string.
It seems likely that customer.given_name is a byte string, rather than Unicode - so in calling encode on it, Python first needs to decode it to Unicode before it can then re-encode to UTF-8.
You should drop the encode call altogether.
Related
I had built chatbot following this github:
https://github.com/llSourcell/tensorflow_chatbot
Also I get data on: https://github.com/suriyadeepan/easy_seq2seq/tree/master/data
I use tensorflow 0.12 with python 3.5. Can someone help me fix this problem: >> Mode : test
Traceback (most recent call last):
File "execute.py", line 324, in
decode()
File "execute.py", line 220, in decode
enc_vocab, _ = data_utils.initialize_vocabulary(enc_vocab_path)
File "D:\My_document\AI\Chatbot_Conversation\tensorflow_chatbot-master\data_utils.py", line 86, in initialize_vocabulary
rev_vocab.extend(f.readlines())
File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 131, in readlines
s = self.readline()
File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\lib\io\file_io.py", line 124, in readline
return compat.as_str_any(self._read_buf.ReadLineAsString())
File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\util\compat.py", line 106, in as_str_any
return as_str(value)
File "C:\Users\Hoang\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\util\compat.py", line 84, in as_text
return bytes_or_text.decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1: invalid start byte
This error mean that I need to revise my data to utf-8?
I appreciate any help. Thank you!
Please try to read the data using encoding='unicode_escape'. For example
df= pd.read_csv('file_name.csv',encoding ='unicode_escape')
It worked for me.
I'm running a Python script that writes HTML code found using BeautifulSoup into multiple rows of an Excel spreadsheet column.
[...]
Col_HTML = 19
w_sheet.write(row_index, Col_HTML, str(HTML_Code))
wb.save(output)
When trying to save the file, I get the following error message:
Traceback (most recent call last):
File "C:\Users\[..]\src\MYCODE.py", line 201, in <module>
wb.save(output)
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\Workbook.py", line 662, in save
doc.save(filename, self.get_biff_data())
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\Workbook.py", line 637, in get_biff_data
shared_str_table = self.__sst_rec()
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\Workbook.py", line 599, in __sst_rec
return self.__sst.get_biff_record()
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\BIFFRecords.py", line 76, in get_biff_record
self._add_to_sst(s)
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\BIFFRecords.py", line 91, in _add_to_sst
u_str = upack2(s, self.encoding)
File "C:\Python27\lib\site-packages\xlwt-0.7.5-py2.7.egg\xlwt\UnicodeUtils.py", line 50, in upack2
us = unicode(s, encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 5181: ordinal not in range(128)
I've successfully written Python script in the past to write into worksheets. It's the first time I try to write a string of HTML into cells and I'm wondering what is causing the error and how I could fix it.
Use this line before passing HTML_Code to w_sheet.write
HTML_Code = HTML_Code.decode('utf-8')
Because, in the error line UnicodeDecodeError: 'ascii' codec can't decode, Python is trying to decode unicode into ascii, so you need to decode unicode using the proper encoding format, that is, utf-8.
So, you have:
Col_HTML = 19
HTML_Code = HTML_Code.decode('utf-8')
w_sheet.write(row_index, Col_HTML, str(HTML_Code))
I am posting to github's api for markdown, and in the post request I am sending json data. I discovered that I can't write lists because the characters are not a part of ascii and looked it up to find that I should always encode. I encoded the text which needed to be marked down and the api is working, but I still get the same error when I try to make lists.
The code for the POST method is:
def markDown(to_mark):
headers = {
'content-type': 'application/json'
}
text = to_mark.decode('utf8')
payload = {
'text': text,
'mode':'gfm'
}
data = json.dumps(payload)
req = urllib2.Request('https://api.github.com/markdown', data, headers)
response = urllib2.urlopen(req)
marked_down = response.read()
return marked_down
And the error that I get when I try making lists is as follows:
'ascii' codec can't decode byte 0xe2 in position 55: ordinal not in range(128)
Add the full traceback:
Traceback (most recent call last):
File "/home/bigb/Programming/google_appengine/google/appengine/runtime/wsgi.py", line 266, in Handle
result = handler(dict(self._environ), self._StartResponse)
File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1519, in __call__
response = self._internal_error(e)
File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/home/bigb/Programming/google_appengine/lib/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/home/bigb/Programming/Blog/my-ramblings/blog.py", line 232, in post
mark_blog = markDown(blog)
File "/home/bigb/Programming/Blog/my-ramblings/blog.py", line 43, in markDown
text = to_mark.decode('utf8')
File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 45-46: ordinal not in range(128)
Am I understanding something wrong here ? Thanks!
Your to_mark value is not a Unicode value; you already have encoded byte string there. Trying to encode a byte string tells Python that it should first decode the value to Unicode before encoding again. This causes your exception:
>>> '\xc3\xa5'.encode('utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
For the json.dumps() function, you want to use Unicode values. If to_mark contains UTF-8 data, use str.decode():
text = to_mark.decode('utf8')
Your code snippets reads:
text = to_mark.encode('utf-8')
but in the traceback you have:
File "/home/bigb/Programming/Blog/my-ramblings/blog.py", line 43, in markDown
text = to_mark.decode('utf8')
Please first make sure you post the real code and traceback (that is: you post the code that actually raise the exception).
I can not remember accurately, but probably using decode/encode at response.read() worked for me when I have faced the exact same error.
response.read().decode("utf8")
I have a question that is a bit hard to explain. I'm forking devsniper's application 'customers' as a base to start a POS system for a local computer shop. The original application uses MySQL, however it is critical that this application uses my client's original data. So I am presented with two options:
1) I can migrate the SQLite Database to a MySQL DB
2) I can modify the program to use the SQLite DB (Preferred)
However, whenever I try to pull up the customers page, I get the following:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
I am not sure where to start with detailing my problem, as there isn't much detail in precisely what is causing this problem, however I will start with the traceback.
Traceback (most recent call last):
File "/home/tabras/posenv/local/lib/python2.7/site-packages/pyramid-1.4.2-py2.7.egg/pyramid/mako_templating.py", line 232, in __call__
result = template.render_unicode(**system)
File "/home/tabras/posenv/local/lib/python2.7/site-packages/Mako-0.8.1-py2.7.egg/mako/template.py", line 452, in render_unicode
as_unicode=True)
File "/home/tabras/posenv/local/lib/python2.7/site-packages/Mako-0.8.1-py2.7.egg/mako/runtime.py", line 783, in _render
**_kwargs_for_callable(callable_, data))
File "/home/tabras/posenv/local/lib/python2.7/site-packages/Mako-0.8.1-py2.7.egg/mako/runtime.py", line 815, in _render_context
_exec_template(inherit, lclcontext, args=args, kwargs=kwargs)
File "/home/tabras/posenv/local/lib/python2.7/site-packages/Mako-0.8.1-py2.7.egg/mako/runtime.py", line 841, in _exec_template
callable_(context, *args, **kwargs)
File "/home/tabras/posenv/customers/customers/templates/base/index.html", line 102, in render_body
${next.body()}
File "/home/tabras/posenv/customers/customers/templates/customer/list.html", line 19, in render_body
<%include file="listPartial.html"/>
File "/home/tabras/posenv/local/lib/python2.7/site-packages/Mako-0.8.1-py2.7.egg/mako/runtime.py", line 710, in _include_file
callable_(ctx, **_kwargs_for_include(callable_, context._data, **kwargs))
File "/home/tabras/posenv/customers/customers/templates/customer/listPartial.html", line 50, in render_body
${pager(customers)}
File "/home/tabras/posenv/customers/customers/templates/base/uiHelpers.html", line 10, in render_pager
${items.pager(format="$link_previous ~2~ $link_next",
File "/home/tabras/posenv/local/lib/python2.7/site-packages/WebHelpers-1.3-py2.7.egg/webhelpers/paginate.py", line 716, in pager
self._pagerlink(self.next_page, symbol_next) or ''
File "/home/tabras/posenv/local/lib/python2.7/site-packages/WebHelpers-1.3-py2.7.egg/webhelpers/paginate.py", line 855, in _pagerlink
return HTML.a(text, href=link_url, onclick=onclick_action, **self.link_attr)
File "/home/tabras/posenv/local/lib/python2.7/site-packages/WebHelpers-1.3-py2.7.egg/webhelpers/html/builder.py", line 213, in __call__
return make_tag(self._tag, *args, **kw)
File "/home/tabras/posenv/local/lib/python2.7/site-packages/WebHelpers-1.3-py2.7.egg/webhelpers/html/builder.py", line 308, in make_tag
chunks.extend(escape(x) for x in args)
File "/home/tabras/posenv/local/lib/python2.7/site-packages/WebHelpers-1.3-py2.7.egg/webhelpers/html/builder.py", line 308, in <genexpr>
chunks.extend(escape(x) for x in args)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
Post Solution Edit:
The problem was here:
${items.pager(format="$link_previous ~2~ $link_next",
symbol_previous="«",
symbol_next="»",
link_attr=link_attr,
curpage_attr=curpage_attr,
dotdot_attr=dotdot_attr,
onclick="$('.list-partial').load('%s'); return false;")}
For some reason the '»' character and its counterpart were giving throwing the error. I simply changed them to standard ascii characters and everything was golden.
Yeah, you were right about slowing down Michael -- It was a really simple error. In uiHelpers.html there was a unicode character '»' which was causing the problem for some reason.. Simply changed that to '>' and it was golden. This was a good lesson in reading the traceback more carefully, thanks for the feedback.
-Tabras
I have a GAE project in Python where I am setting a cookie in one of my RequestHandlers with this code:
self.response.headers['Set-Cookie'] = 'app=ABCD; expires=Fri, 31-Dec-2020 23:59:59 GMT'
I checked in Chrome and I can see the cookie listed, so it appears to be working.
Then later in another RequestHandler, I get the cookie to check it:
appCookie = self.request.cookies['app']
This line gives the following error when executed:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1962: ordinal not in range(128)
It seems that it is trying to decode the incoming cookie info using an ASCII codec rather than UTF-8.
How do I force Python to use UTF-8 to decode this?
Are there any other Unicode-related gotchas that I need to be aware of as a newbie to Python and Google App Engine (but an experienced programmer in other languages)?
Here is the full Traceback:
Traceback (most recent call last):
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 4144, in _HandleRequest
self._Dispatch(dispatcher, self.rfile, outfile, env_dict)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 4049, in _Dispatch
base_env_dict=env_dict)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 616, in Dispatch
base_env_dict=base_env_dict)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 3120, in Dispatch
self._module_dict)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 3024, in ExecuteCGI
reset_modules = exec_script(handler_path, cgi_path, hook)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/tools/dev_appserver.py", line 2887, in ExecuteOrImportScript
exec module_code in script_module.__dict__
File "/Users/ken/hgdev/juicekit/main.py", line 402, in <module>
main()
File "/Users/ken/hgdev/juicekit/main.py", line 399, in main
run_wsgi_app(application)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/util.py", line 98, in run_wsgi_app
run_bare_wsgi_app(add_wsgi_middleware(application))
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/util.py", line 116, in run_bare_wsgi_app
result = application(env, _start_response)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 721, in __call__
response.wsgi_write(start_response)
File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 296, in wsgi_write
body = self.out.getvalue()
File "/System/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/StringIO.py", line 270, in getvalue
self.buf += ''.join(self.buflist)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1962: ordinal not in range(128)
You're looking to use the decode function somewhat like this (cred #agf:):
self.request.cookies['app'].decode('utf-8')
From official python documentation (plus a couple added details):
Python’s 8-bit strings have a .decode([encoding], [errors]) method that interprets the string using the given encoding. The following example shows the string as it goes to unicode and then back to 8-bit string:
>>> u = unichr(40960) + u'abcd' + unichr(1972) # Assemble a string
>>> type(u), u # Examine
(<type 'unicode'>, u'\ua000abcd\u07b4')
>>> utf8_version = u.encode('utf-8') # Encode as UTF-8
>>> type(utf8_version), utf8_version # Examine
(<type 'str'>, '\xea\x80\x80abcd\xde\xb4')
>>> u2 = utf8_version.decode('utf-8') # Decode using UTF-8
>>> u == u2 # The two strings match
True
First, encode any unicode value you set in the cookies. You also need to quote them in case they can break the header:
import urllib
# This is the value we want to set.
initial_value = u'äëïöü'
# WebOb version that comes with SDK doesn't quote cookie values
# in the Response, neither webapp.Response. So we have to do it.
quoted_value = urllib.quote(initial_value.encode('utf-8'))
rsp = webapp.Response()
rsp.headers['Set-Cookie'] = 'app=%s; Path=/' % quoted_value
Now let's read the value. To test it, create a fake Request to test the cookie we have set. This code was extracted from a real unittest:
cookie = rsp.headers.get('Set-Cookie')
req = webapp.Request.blank('/', headers=[('Cookie', cookie)])
# The stored value is the same quoted value from before.
# Notice that here we use .str_cookies, not .cookies.
stored_value = req.str_cookies.get('app')
self.assertEqual(stored_value, quoted_value)
Our value is still encoded and quoted. We must do the reverse to get the initial one:
# And we can get the initial value unquoting and decoding.
final_value = urllib.unquote(stored_value).decode('utf-8')
self.assertEqual(final_value, initial_value)
If you can, consider using webapp2. webob.Response does all the hard work of quoting and setting cookies, and you can set unicode values directly. See a summary of these issues here.