Python 3.3 cgi: Cannot decode GET parameter with value %A3 - python

I am really struggling to find an answer to this.
I am writing a simple CGI script, and the input GET parameters will be URL-encoded (percent-encoded),
e.g. £ -> %A3
Here are the 2 test URLs I'm using in my browser:
?a=%7B&b=%A3
?a={&b=£
When I loop through the parameters from cgi.FieldStorage, I get an exception with the b parameter.
I know it's related to encoding of some form, but I just can't work out a solution.
key = a
value = {
key = b
ERROR: 'ascii' codec can't encode character '\ufffd' in position 12: ordinal not in range(128)
key = a
value = {
key = b
ERROR: 'ascii' codec can't encode character '\xa3' in position 12: ordinal not in range(128)
The following is the test CGI script.
#!/opt/python-3.3.4/bin/python3
import cgitb
import cgi
import sys
print("Content-Type: text/html; charset=utf-8")
print("")
print("<html>")
print("<body>")
print("<h1>Hello</h1>")
form = cgi.FieldStorage()
#form = cgi.FieldStorage(encoding="utf8")
for i in form.keys():
    print("<br>key = ", i)
    try:
        tmp = form[i].value
        print("<br>value = %s" % tmp)
    except Exception as err:
        print("<br>ERROR:", err)
print("</body>")
print("</html>")

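I believe that a URL itself can only carry ASCII characters, so non-ASCII values in a GET query string have to be percent-encoded (%A3 is the Latin-1 byte for £; UTF-8 would give %C2%A3).
Alternatively, use POST for non-ASCII characters.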
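To illustrate how the percent-escapes in the question's test URL decode, here is a minimal sketch using urllib.parse from the standard library. It assumes the browser sent £ as the single Latin-1 byte %A3; the helper name decode_query is hypothetical and is not part of the cgi module.

```python
from urllib.parse import parse_qs, unquote

def decode_query(query_string, encoding="utf-8"):
    """Parse a raw query string, decoding percent-escapes with `encoding`."""
    # errors="replace" is also parse_qs's default: undecodable bytes become U+FFFD.
    return parse_qs(query_string, encoding=encoding, errors="replace")

# %A3 is not valid UTF-8 on its own, so it decodes to the replacement character.
as_utf8 = decode_query("a=%7B&b=%A3")
# Decoded as Latin-1, the same byte is the pound sign.
as_latin1 = decode_query("a=%7B&b=%A3", encoding="latin-1")
# A client using UTF-8 would have sent %C2%A3 instead.
utf8_pound = unquote("%C2%A3")

# Encode explicitly before writing, so an ASCII-only stdout (common under CGI)
# is never asked to encode the text itself; sys.stdout.buffer.write(body)
# would send these bytes unchanged.
body = ("<br>value = %s" % as_latin1["b"][0]).encode("utf-8")
```

The '\ufffd' in the first error message is consistent with the UTF-8 case here. The "can't encode" part most likely comes from print() writing to a stdout whose encoding is ASCII, which is why the sketch builds bytes instead of printing text.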

Related

UnicodeEncodeError / python3

I can't solve a typical issue with encodings. Cyrillic text is received via POST and an error is raised:
'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)
The key itself looks like this; it should be Cyrillic text in Russian: u'\u043f\u0440\u043e'
After that error I tried this and some other approaches:
key = key.decode('ascii').encode('utf8')
or:
key = key.decode('ascii')
Locally it works; the error is raised in production only. The Python system encoding in production is utf8.
EDIT: to clear things up, the error is raised in the form handler function (again, it works locally but not in production):
def search(request):
    if request.method == 'POST':
        key = request.POST.get("key")
        if key is not None:
            ..
So it's a str received from the input form, and the error is first raised at this point, so I assumed it had to be decoded, but that didn't help.
More of the traceback:
UnicodeEncodeError at /search/
'ascii' codec can't encode characters in position 0-2: ordinal not in range(128)

Python 2.7: 'ascii' codec can't encode character u'\xe9' error while writing in file

I know this question has been asked various times, but somehow I am not getting results.
I am fetching data from the web which contains the string Elzéar. When I write it to a CSV file, I get the error mentioned in the question title.
While producing the data I did the following:
address = str(address).strip()
address = address.encode('utf8')
return name+','+address+','+city+','+state+','+phone+','+fax+','+pumps+','+parking+','+general+','+entertainment+','+fuel+','+resturants+','+services+','+technology+','+fuel_cards+','+credit_cards+','+permits+','+money_services+','+security+','+medical+','+longit+','+latit
and I am writing it as:
with open('records.csv', 'a') as csv_file:
    print(type(data)) #prints <unicode>
    data = data.encode('utf8')
    csv_file.write(id+','+data+'\n')
    status = 'OK'
    the_file.write(ts+'\t'+url+'\t'+status+'\n')
This generates the error:
'ascii' codec can't encode character u'\xe9' in position 55: ordinal not in range(128)
You could try something like this (Python 2.7):
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import codecs
...
with codecs.open('records.csv', 'a', encoding="utf8") as csv_file:
    print(type(data)) #prints <unicode>
    # because data is unicode
    csv_file.write(unicode(id)+u','+data+u'\n')
    status = u'OK'
    the_file.write(unicode(ts, encoding="utf8")+u'\t'+unicode(url, encoding="utf8")+u'\t'+status+u'\n')
The main idea is to work with unicode as much as possible and convert to str only when outputting (it is better not to operate on str).
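The same idea can be sketched with io.open, which exists in both Python 2.7 and Python 3: keep every value as text and let the file object do the encoding on write. The function name append_record and its parameters are hypothetical; the original script's variables are not reproduced here.

```python
import io

def append_record(path, record_id, fields):
    """Append one comma-separated line to `path`, encoded as UTF-8."""
    # Join text values only; no manual .encode() calls are needed.
    line = u",".join([record_id] + list(fields)) + u"\n"
    # The file object encodes the text to UTF-8 when writing.
    with io.open(path, "a", encoding="utf-8") as csv_file:
        csv_file.write(line)
    return line
```

A call such as append_record('records.csv', u'1', [u'Elzéar', u'Quebec']) would then write the accented name without going through the ASCII codec. The values passed in are made up for illustration.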

UnicodeDecodeError: 'utf-8' codec can't decode byte error

I'm trying to get a response from urllib and decode it to a readable format. The text is in Hebrew and also contains characters like { and /.
The coding declaration at the top of the file is:
# -*- coding: utf-8 -*-
The raw string is:
b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3\x05 \x00\xd4\x05\xea\x05\xe8\x05\xe2\x05\xd4\x05 \x00\xd1\x05\xde\x05\xe8\x05\xd7\x05\xd1\x05 \x00"\x00,\x00\r\x00\n\x00"\x00d\x00a\x00t\x00a\x00"\x00 \x00:\x00 \x00[\x00]\x00\r\x00\n\x00}\x00\r\x00\n\x00\r\x00\n\x00'
Now I'm trying to decode it using:
data = data.decode()
and I get the following error:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte
Your problem is that that is not UTF-8. You have UTF-16 encoded data, decode it as such:
>>> data = b'\xff\xfe{\x00 \x00\r\x00\n\x00"\x00i\x00d\x00"\x00 \x00:\x00 \x00"\x001\x004\x000\x004\x008\x003\x000\x000\x006\x004\x006\x009\x006\x00"\x00,\x00\r\x00\n\x00"\x00t\x00i\x00t\x00l\x00e\x00"\x00 \x00:\x00 \x00"\x00\xe4\x05\xd9\x05\xe7\x05\xd5\x05\xd3\x05 \x00\xd4\x05\xe2\x05\xd5\x05\xe8\x05\xe3\x05 \x00\xd4\x05\xea\x05\xe8\x05\xe2\x05\xd4\x05 \x00\xd1\x05\xde\x05\xe8\x05\xd7\x05\xd1\x05 \x00"\x00,\x00\r\x00\n\x00"\x00d\x00a\x00t\x00a\x00"\x00 \x00:\x00 \x00[\x00]\x00\r\x00\n\x00}\x00\r\x00\n\x00\r\x00\n\x00'
>>> data.decode('utf16')
'{ \r\n"id" : "1404830064696",\r\n"title" : "פיקוד העורף התרעה במרחב ",\r\n"data" : []\r\n}\r\n\r\n'
>>> import json
>>> json.loads(data.decode('utf16'))
{'title': 'פיקוד העורף התרעה במרחב ', 'id': '1404830064696', 'data': []}
If you loaded this from a website with urllib.request, the Content-Type header should contain a charset parameter telling you this; if response is the returned urllib.request response object, then use:
codec = response.info().get_content_charset('utf-8')
This defaults to UTF-8 when no charset parameter has been set, which is the appropriate default for JSON data.
Alternatively, use the requests library to load the JSON response; it handles decoding automatically (including UTF-codec autodetection specific to JSON responses).
One further note: the PEP 263 source code codec comment is used only to interpret your source code, including string literals. It has nothing to do with encodings of external sources (files, network data, etc.).
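A rough sketch of choosing the codec before decoding: use the declared charset if there is one, otherwise look for a UTF-16 byte order mark, and fall back to UTF-8. The helper name decode_body is hypothetical, and UTF-32 detection is deliberately left out.

```python
import codecs
import json

def decode_body(raw, declared_charset=None):
    """Decode response bytes to text, preferring the declared charset."""
    if declared_charset:
        return raw.decode(declared_charset)
    # A UTF-16 BOM (b'\xff\xfe' or b'\xfe\xff') is what makes the UTF-8 decoder fail.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # the codec consumes the BOM
    return raw.decode("utf-8-sig")   # also tolerates a UTF-8 BOM

# A small stand-in payload, encoded the same way as the response in the question.
sample = u'{"id": "1", "title": "\u05e4\u05d9\u05e7\u05d5\u05d3"}'.encode("utf-16")
parsed = json.loads(decode_body(sample))
```

With urllib.request, the declared_charset argument could come from response.info().get_content_charset(), which returns None when the header carries no charset parameter.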
I got this error in Django with Python 3.4. I was trying to get this to work with django-rest-framework.
This was my code that fixed the error UnicodeDecodeError: 'utf-8' codec can't decode byte error.
This is the passing test:
import os
from os.path import join, dirname
import uuid
from rest_framework.test import APITestCase
class AttachmentTests(APITestCase):
    def setUp(self):
        self.base_dir = dirname(dirname(dirname(__file__)))
        self.image = join(self.base_dir, "source/test_in/aaron.jpeg")
        self.image_filename = os.path.split(self.image)[1]
    def test_create_image(self):
        id = str(uuid.uuid4())
        with open(self.image, 'rb') as data:
            # data = data.read()
            post_data = {
                'id': id,
                'filename': self.image_filename,
                'file': data
            }
            response = self.client.post("/api/admin/attachments/", post_data)
            self.assertEqual(response.status_code, 201)
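The relevant detail in the test above seems to be the 'rb' mode: image bytes are not text, so reading them through a UTF-8 text decoder fails. A small standalone sketch, using a temporary file that holds the first bytes of a JPEG header in place of the real image (the file name is made up):

```python
import os
import tempfile

# The first four bytes of a typical JPEG file; 0xff is never valid in UTF-8.
jpeg_start = b"\xff\xd8\xff\xe0"
path = os.path.join(tempfile.mkdtemp(), "example.jpeg")
with open(path, "wb") as f:
    f.write(jpeg_start)

# Text mode tries to decode the bytes and fails.
try:
    with open(path, "r", encoding="utf-8") as f:
        f.read()
    text_mode_failed = False
except UnicodeDecodeError:
    text_mode_failed = True

# Binary mode returns the bytes untouched, which is what a file upload needs.
with open(path, "rb") as f:
    binary_data = f.read()
```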

UnicodeEncodeError in json

I am teaching myself how to parse Google results with JSON, but when I run this code (which should work), I get this error: UnicodeEncodeError: 'charmap' codec can't encode character u'\u2014' in position 5: character maps to <undefined>. Can someone help me?
import urllib
import simplejson
query = urllib.urlencode({'q' : 'site:example.com'})
url = 'http://ajax.googleapis.com/ajax/services/search/web?v=1.0&%s&start=50' \
% (query)
search_results = urllib.urlopen(url)
json = simplejson.loads(search_results.read())
results = json['responseData']['results']
for i in results:
    print i['title'] + ": " + i['url']
This error may be caused by the encoding that your console application uses when sending unicode data to stdout. There's an article that talks about it.
Check stdout's encoding:
>>> import sys
>>> sys.stdout.encoding # On my machine I get this result:
'UTF-8'
Use unicode literals.
print i[u'title'] + u": " + i[u'url']
Also:
jsondata = simplejson.load(search_results)
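As a sketch of the console-encoding explanation: a character that the console's code page cannot represent raises UnicodeEncodeError, and replacing such characters before printing avoids the crash. The helper name console_safe is hypothetical, and cp437 merely stands in for whatever sys.stdout.encoding reports on the affected machine.

```python
def console_safe(text, encoding):
    """Return `text` with characters the target encoding lacks replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)

title = u"News \u2014 Example"  # contains U+2014, the character from the error

# Strict encoding to a code page without U+2014 fails, as in the question.
try:
    title.encode("cp437")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Replacing unsupported characters gives a string that is safe to print.
printable = console_safe(title, "cp437")
```

In the loop from the question, the printed value could then be passed through console_safe with sys.stdout.encoding as the second argument, at the cost of showing '?' for unsupported characters.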
My guess is that the error is in the simplejson.loads(search_results.read()) line, possibly because the default encoding your Python is picking up is not UTF-8 and Google is returning UTF-8.
Try: simplejson.loads(unicode(search_results.read(), "utf8"))

Insert record of utf-8 character (Chinese, Arabic, Japanese.. etc) into GAE datastore programatically with python

I just want to build a simple UI translation feature in GAE (using the Python SDK).
def add_translation(self, pid=None):
    trans = Translation()
    trans.tlang = db.Key("agtwaW1kZXNpZ25lcnITCxILQXBwTGFuZ3VhZ2UY8aIEDA")
    trans.ttype = "UI"
    trans.transid = "ui-about"
    trans.content = "关于我们"
    trans.put()
This results in an encoding error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 0: ordinal not in range(128)
How do I correctly encode the inserted content with Unicode (UTF-8) characters?
Use the u notation:
>>> s=u"关于我们"
>>> print s
关于我们
Or explicitly, stating the encoding:
>>> s=unicode('אדם מתן', 'utf8')
>>> print s
אדם מתן
Read more on the Unicode HOWTO page of the Python documentation site.
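For comparison, a short sketch of the same distinction in Python 3, where text literals are already Unicode and bytes must be decoded explicitly. It only illustrates the error message above, assumes a UTF-8 source file, and does not touch the GAE datastore API.

```python
# The UTF-8 bytes that a plain Python 2 string literal would hold
# (assuming the source file is saved as UTF-8).
raw = u"关于我们".encode("utf-8")
first_byte = raw[:1]  # the byte named in the error message

# Decoding as ASCII is what the implicit conversion attempted; it fails.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Explicit decoding, the counterpart of unicode(s, 'utf8') in Python 2.
text = raw.decode("utf-8")
```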
