python3 default encoding UnicodeDecodeError ascii using apache WSGI - python

>>> import locale
>>> prefered_encoding = locale.getpreferredencoding()
>>> prefered_encoding
'ANSI_X3.4-1968'
I'm using a framework called INGInious, which uses web.py to render its templates.
web.template.render(os.path.join(root_path, dir_path),
                    globals=self._template_globals,
                    base=layout_path)
The rendering works on my localhost but not on my staging server.
They both run python3. I see that web.py enforces utf-8 on
the encoding in Python2 only (that's out of my hands)
def __str__(self):
    self._prepare_body()
    if PY2:
        return self["__body__"].encode('utf-8')
    else:
        return self["__body__"]
Here is the stack trace:
    t = self._template(name)
  File "/lib/python3.5/site-packages/web/template.py", line 1028, in _template
    self._cache[name] = self._load_template(name)
  File "/lib/python3.5/site-packages/web/template.py", line 1016, in _load_template
    return Template(open(path).read(), filename=path, **self._keywords)
  File "/lib64/python3.5/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 83: ordinal not in range(128)
My HTML does include Hebrew characters. A small example:
<div class="modal-content">
<div class="modal-header">
<button type="button" class="close" data-dismiss="modal">×</button>
<h4 class="modal-title feedback-modal-title">
חישוב האיברים הראשונים בסדרה של איבר ראשון חיובי ויחס שלילי:
<span class="red-text">אי הצלחה</span>
and I open it like so:
open('/path/to/feedback.html').read()
and the line where the encoding fails is where the Hebrew chars are.
I tried setting some environment variables in ~/.bashrc:
export PYTHONIOENCODING=utf8
export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
I set these under the user centos.
The INGInious framework is installed with pip under the Python 3.5 site-packages, and it is served by an Apache server running as the user apache.
I then tried setting the environment variables in the code (during the init of the app) so that the Apache WSGI process would be aware of them:
import os
os.environ['LC_ALL'] = 'en_US.UTF-8'
os.environ['LANG'] = 'en_US.UTF-8'
os.environ['LANGUAGE'] = 'en_US.UTF-8'
I have also edited /etc/httpd/conf/httpd.conf using the SetEnv directive:
SetEnv LC_ALL en_US.UTF-8
SetEnv LANG en_US.UTF-8
SetEnv LANGUAGE en_US.UTF-8
SetEnv PYTHONIOENCODING utf8
I restarted with sudo service httpd restart, and still no luck.
My question is: what is the best practice to solve this? I understand there are hacks for this, but I want to understand what the underlying cause is as well as how to solve it.
Thanks!

I finally found the answer: the fix is in how the file is read.
I changed from
open('/path/to/feedback.html').read()
to
import codecs
with codecs.open(file_path, 'r', encoding='utf8') as f:
    text = f.read()
If anyone has a more general approach that works, I'll accept their answer.

A Python 2+3 solution would be:
import io
with io.open(file_path, mode='r', encoding='utf8') as f:
    text = f.read()
See the documentation of io.open.
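To illustrate the underlying cause: in Python 3 versions like the 3.5 in the traceback, open() without an encoding argument decodes with locale.getpreferredencoding(False), which on the staging server is ASCII. The following is a minimal sketch, not the framework's code. The temporary file and its content are hypothetical stand-ins for feedback.html, and encoding="ascii" is passed explicitly only to simulate the server's locale.

```python
import locale
import os
import tempfile

# Hypothetical stand-in for feedback.html: a UTF-8 file with Hebrew text.
sample = '<span class="red-text">אי הצלחה</span>\n'
with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as tmp:
    tmp.write(sample.encode("utf-8"))
    path = tmp.name

# The locale-derived encoding that a bare open(path) relies on in Python 3.5.
print("locale default:", locale.getpreferredencoding(False))

# Simulate the staging server, where that default is ASCII.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError as exc:
    ascii_failed = True
    print("ascii read failed:", exc.reason)

# Naming the encoding makes the read independent of the process locale.
with open(path, encoding="utf-8") as f:
    text = f.read()

os.remove(path)
```

Passing the encoding explicitly, as both answers above do, avoids depending on whatever environment Apache starts the process with.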

Related

env: python\r: No such file or directory error : even though there is no /r/n in the script

I am working on Ubuntu 20.04 on a Raspberry Pi.
I read that this error is caused by a conflict between Windows and Unix line-ending formats.
But my script has no /r/n in it. How can I solve this?
#!/usr/bin/env python
import rospy
from laser_assembler.srv import *
from sensor_msgs.msg import PointCloud2

rospy.init_node("test_client")
rospy.wait_for_service("assemble_scans2")
assemble_scans = rospy.ServiceProxy('assemble_scans2', AssembleScans2)
pub = rospy.Publisher("/pointcloud", PointCloud2, queue_size=1)
r = rospy.Rate(1)
while not rospy.is_shutdown():
    try:
        resp = assemble_scans(rospy.Time(0, 0), rospy.get_rostime())
        print "Got cloud with %u points" % len(resp.cloud.data)
        pub.publish(resp.cloud)
    except rospy.ServiceException, e:
        print "Service call failed: %s" % e
    r.sleep()
Windows uses CRLF line endings by default, while Linux/Unix uses LF line endings.
If your code is used on both Windows and Linux, it's better to keep it in LF, since both platforms understand LF line endings.
A reliable way to convert your code to Unix-compatible line endings is to open it in Vi/Vim and explicitly set the line endings to LF.
Use the command :set ff=unix in Vi to set the line endings to LF, then save the file. Your code should then work.
To avoid further conflicts while developing on Windows, check that your text editor's line-ending setting is LF.
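The carriage returns are invisible in most editors, which is why the script can appear to contain no \r\n. If Vim is not at hand, the same check and conversion can be sketched in Python. The helper names has_crlf_shebang and to_lf and the sample bytes are hypothetical, chosen only for illustration.

```python
def has_crlf_shebang(data):
    """Return True if the first line is a shebang that ends in a carriage return."""
    first_line = data.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


def to_lf(data):
    """Convert CRLF line endings to LF, leaving all other bytes untouched."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical script content as saved by an editor using CRLF line endings.
windows_script = b"#!/usr/bin/env python\r\nimport rospy\r\n"
fixed_script = to_lf(windows_script)

print(has_crlf_shebang(windows_script))
print(has_crlf_shebang(fixed_script))
```

For a real file, read it with open(path, "rb"), pass the bytes through to_lf, and write them back with open(path, "wb"). With a CRLF shebang, env is asked to run an interpreter literally named "python\r", which is what the error message shows.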

Does Python CGIHTTPServer decode plus sign(+) in URL into blank space?

In my html, I have below form:
<form method=GET action="/cgi-bin/encry.sh">
<table nowrap>
<tr>
<td>Plain Text:</TD>
<TD><input type="text" name="PlainText"></td>
</tr>
</table>
<input type="submit" value="Encrypt">
</form>
After entering "aaa +=" and clicking the button,
the QUERY_STRING in my cgi-bin/encry.sh is "aaa++=" rather than "aaa +=" or "aaa+%2B%3D".
Is that the correct behavior, and if so, how can I get the blank space back correctly?
If not, is it fixed in a later CGIHTTPServer version?
Below provides some info about CGIHTTPServer.py in my CentOS 7.2:
HiAccount-4# md5sum /usr/lib64/python2.7/CGIHTTPServer.py
564afe4defc63001f236b0b2ef899b58 /usr/lib64/python2.7/CGIHTTPServer.py
HiAccount-4# grep __version /usr/lib64/python2.7/CGIHTTPServer.py -i
__version__ = "0.4"
HiAccount-4# grep unquote /usr/lib64/python2.7/CGIHTTPServer.py -i -C 3 -n
84- path begins with one of the strings in self.cgi_directories
85- (and the next character is a '/' or the end of the string).
86- """
87: collapsed_path = _url_collapse_path(urllib.unquote(self.path))
88- dir_sep = collapsed_path.find('/', 1)
89- head, tail = collapsed_path[:dir_sep], collapsed_path[dir_sep+1:]
90- if head in self.cgi_directories:
--
164- env['SERVER_PROTOCOL'] = self.protocol_version
165- env['SERVER_PORT'] = str(self.server.server_port)
166- env['REQUEST_METHOD'] = self.command
167: uqrest = urllib.unquote(rest)
168- env['PATH_INFO'] = uqrest
169- env['PATH_TRANSLATED'] = self.translate_path(uqrest)
170- env['SCRIPT_NAME'] = scriptname
Thanks in advance!
After trying the CGIHTTPServer from Python 2.7.18, I think the behavior in the question is a known issue in 2.7.5, which was my version. I am not sure which version fixed it.
The problem is in:
/usr/lib64/python2.7/CGIHTTPServer.py:
87: collapsed_path = _url_collapse_path(urllib.unquote(self.path))
In 2.7.18, CGIHTTPServer does not decode the QUERY_STRING, so I need to decode it in my CGI script, but that's fine because the QUERY_STRING arrives correctly encoded.
BTW, I didn't upgrade the OS Python from 2.7.5 to 2.7.18. I just extracted CGIHTTPServer.py from the Python 2.7.18 source code and run it as:
nohup python ./CGIHTTPServer.py 7070 &
rather than
nohup python -m CGIHTTPServer 7070 &
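A short sketch of why the early unquote loses the space, and of decoding an intact QUERY_STRING. It uses Python 3's urllib.parse rather than the Python 2 urllib from the question, and the query string is what a browser would be expected to send for the form above.

```python
from urllib.parse import parse_qs, unquote

# The still-encoded query string for the input "aaa +=".
query_string = "PlainText=aaa+%2B%3D"

# Decoding it as form data in the CGI script recovers the original text.
fields = parse_qs(query_string)
plain_text = fields["PlainText"][0]

# unquote() resolves %XX escapes but leaves "+" alone, so applying it to the
# whole path early merges the encoded space with the literal plus sign.
premature = unquote("aaa+%2B%3D")

print(repr(plain_text))
print(repr(premature))
```

Once the value has become "aaa++=", the script can no longer tell which "+" was a space, so the decoding has to happen exactly once, on the still-encoded string.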

Scrapy - FEED_EXPORT_ENCODING Doesn't Work in Ubuntu Server

Even though my local and server Scrapy versions are the same, setting FEED_EXPORT_ENCODING = 'utf-8' in settings.py makes no difference on the server when exporting the result to a JSON file.
What I've done in the settings.py file:
FEED_EXPORT_ENCODING = 'utf-8'
The command I run to get the result:
scrapy crawl spiderName -o file.json
What I get in return:
...
'content': u'\n \r\nTruffle
\u0628\u0627 \u0645\u0627\u0654\u0645\u0648'
u'\u0631\u06cc\u062a
\u0631\u0627\u062d\u062a\u200c\u062a\u0631 \u06a9\u0631\u062f\u0646
\u0632'
u'\u0646\u062f\u06af\u06cc
\u062f\u0648\u0644\u0648\u067e\u0631\u0647\u0627\u06cc \u06a9\u0631\
u06cc\u067e\u062a\u0648\u06a9\u
...
I follow exactly the same process on my local machine, and there every Unicode character is written as UTF-8.
What would you suggest?
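For reference, \uXXXX escapes and raw UTF-8 are two spellings of the same JSON data. The sketch below uses only the standard json module, not Scrapy, and assumes the escapes are what ends up in file.json; the u'...' fragments quoted above look like a Python 2 repr from the crawl log, so that is an assumption. The item content is a hypothetical two-letter sample.

```python
import json

# Hypothetical item holding two of the Persian letters from the output above.
item = {"content": "\u0628\u0627"}

escaped = json.dumps(item)                       # ASCII-only, uses \uXXXX escapes
readable = json.dumps(item, ensure_ascii=False)  # keeps the characters as-is

print(escaped)
print(readable)

# Both forms parse back to the same text.
same = json.loads(escaped) == json.loads(readable)
```

If the exported file parses to the expected text, only its on-disk spelling differs between the two machines.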

Why not monkey patch sys.getfilesystemencoding()?

In Python you can read the filesystem encoding with sys.getfilesystemencoding().
But there seems to be no official way to set the filesystem encoding.
See: How to change file system encoding via python?
I found this dirty hack:
import sys
sys.getfilesystemencoding = lambda: 'UTF-8'
Is there a better solution if changing the environment variable LANG before starting the interpreter is not an option?
Background, why I want this:
This works:
user@host:~$ python src/setfilesystemencoding.py
LANG: de_DE.UTF-8
sys.getdefaultencoding(): ascii
sys.getfilesystemencoding(): UTF-8
This does not work:
user@host:~$ LANG=C python src/setfilesystemencoding.py
LANG: C
sys.getdefaultencoding(): ascii
sys.getfilesystemencoding(): ANSI_X3.4-1968
Traceback (most recent call last):
File "src/setfilesystemencoding.py", line 10, in <module>
with open('/tmp/german-umlauts-üöä', 'wb') as fd:
UnicodeEncodeError: 'ascii' codec can't encode characters in position 20-22: ordinal not in range(128)
Here is the simple script:
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, unicode_literals, print_function
import os, sys
print('LANG: {}'.format(os.environ['LANG']))
print('sys.getdefaultencoding(): {}'.format(sys.getdefaultencoding()))
print('sys.getfilesystemencoding(): {}'.format(sys.getfilesystemencoding()))
with open('/tmp/german-umlauts-üöä', 'wb') as fd:
    fd.write('foo')
I hoped that the monkey patching above would solve this, but it doesn't. Sorry, this question does not make sense any more. I am closing it.
My solution: use LANG=C.UTF-8
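A sketch of why the monkey patch has no effect: replacing sys.getfilesystemencoding only changes what that Python-level name returns, while the code that actually encodes file names does not consult it again. This example targets CPython 3 rather than the Python 2 script above, uses os.fsencode as a stand-in for the interpreter's internal file name encoding, and picks "UTF-16" as a deliberately different, hypothetical codec.

```python
import os
import sys


def try_fsencode(name):
    """Return the encoded bytes, or the exception class name if encoding fails."""
    try:
        return os.fsencode(name)
    except UnicodeEncodeError as exc:
        return type(exc).__name__


name = "/tmp/german-umlauts-üöä"
original = sys.getfilesystemencoding
before = try_fsencode(name)

# The monkey patch from the question, pointing at a different codec.
sys.getfilesystemencoding = lambda: "UTF-16"
try:
    patched_report = sys.getfilesystemencoding()
    after = try_fsencode(name)
finally:
    # Restore the real function so the rest of the process is unaffected.
    sys.getfilesystemencoding = original

print(patched_report, before == after)
```

The patched function reports the new codec, yet the encoded result is unchanged, which matches the observation that only the environment the interpreter starts with (for example LANG=C.UTF-8) makes a difference.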

Locale on django and uwsgi UnicodeEncodeError

EDIT: I just realized that it works when I don't try to print that variable to the console. Why?
I ran into an issue displaying a string label with non-ASCII (UTF-8) characters. I set the locale environment in the uWSGI ini file like this:
env =LC_ALL=en_US.UTF-8
env =LANG=en_US.UTF-8
and in wsgi.py:
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
When I run app code:
print (locale.getlocale(), locale.getpreferredencoding())
print locale.getdefaultlocale()
print "option_value", option_value
label = force_text(option_label)
print 'label', label #THIS FAILS
the output is:
(('en_US', 'UTF-8'), 'UTF-8')
('en_US', 'UTF-8')
option_value d
ERROR <stack trace>
print 'label', label
UnicodeEncodeError: 'ascii' codec can't encode character u'\u015b' in position 5: ordinal not in range(128)
The problem is not present when I run app via runserver in production environment.
Django 1.6.5 Python 2.7.6 Ubuntu 14.04 uWSGI 2.0.5.1
I just found the answer here: http://chase-seibert.github.io/blog/2014/01/12/python-unicode-console-output.html
I realized that the console output is responsible for the error, so exporting an additional environment variable in the uWSGI config file solves the issue: env = PYTHONIOENCODING=UTF-8
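The locale calls succeed because they are unrelated to the encoding print uses; the failure happens when the text is encoded for the output stream. A minimal Python 3 sketch of that step, with an in-memory stream standing in for the console (the helper name write_with is hypothetical):

```python
import io

label = "\u015b"  # the character named in the traceback


def write_with(encoding):
    """Write label to an in-memory 'console' that uses the given encoding."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    try:
        stream.write(label)
        stream.flush()
    except UnicodeEncodeError:
        return None  # what print hits when stdout is ASCII
    return raw.getvalue()


print(write_with("ascii"))
print(write_with("utf-8"))
```

PYTHONIOENCODING sets the encoding of sys.stdout in the same way, which is why the label can be built without error but not printed.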
In Django (on Python 2), whenever you use Unicode text, such as in forms, you must put a leading u on the string literal you want to save. Do this everywhere your Unicode text is stored.
In this case I think the string in question is option_label.
