Template strings with UTF8 in Python 2.7

I need help with Python 2.7.
I use from string import Template and I get a Unicode error.
If I print the string directly, it works.
If I print it through a Template, this error appears:
AH01215: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)
My example has two files:
index.py
template.py
template.py contains this code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
########################################################
#
from string import Template
ABC = Template("""<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Hello ${NAME}""")
and index.py contains this code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
########################################################
import template
print "Content-Type: text/html\n"
ZXC = "m’a réveillé"
print template.ABC.substitute(dict(NAME=ZXC))
Running this code produces the error above,
but printing the string directly with print ZXC, without the template, works fine.
How can I fix the UTF-8 handling when going through the template?

You need to escape the non-ASCII characters before passing them to the template.
First mark the string as Unicode, then encode it to ASCII with XML character references. I believe your index.py should just become:
#!/usr/bin/python
# -*- coding: utf-8 -*-
########################################################
import template
print "Content-Type: text/html\n"
ZXC = u"m’a réveillé".encode('ascii', 'xmlcharrefreplace')
print template.ABC.substitute(dict(NAME=ZXC))
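
To see what the xmlcharrefreplace step actually produces, here is a minimal sketch. It is written for Python 3 so it can be run as-is (an assumption; the question targets Python 2.7, where the literal needs the u prefix shown above), and the names raw, escaped and result are only illustrative.

```python
from string import Template

# Same placeholder as in template.py, shortened to the "Hello" line.
ABC = Template("Hello ${NAME}")

# The text from the question: m’a réveillé
raw = "m\u2019a r\u00e9veill\u00e9"

# Encode to ASCII, replacing each non-ASCII character with a numeric
# character reference, then turn the ASCII bytes back into text.
escaped = raw.encode("ascii", "xmlcharrefreplace").decode("ascii")

result = ABC.substitute(NAME=escaped)
print(result)
```

The browser renders the numeric references as the original characters, so the page no longer depends on how the output stream is encoded.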

Related

python HTML rendering changes utf-8 characters to unknown characters

from util.lead_email import lead_template
lead_template.render()
I have special characters in the HTML, such as "ğ, ş, ı, ç".
The rendered output is:
# Rendered output
oluşturduğunuz teklif için aldınız
# HTML
oluşturduğunuz teklif için aldınız.
How can I fix this problem?
I have tried unescape, escape, decode and encode.

SimpleHTTPServer in Python 3.6.4 cannot handle non-ASCII strings (Chinese in my case)

I run SimpleHTTPServer in Python 3.6.4 (64-bit) with this command:
python -m http.server --cgi
Then I made a form in test.py that submits to test_form_action.py, which prints the input text.
cgi-bin/test.py
# coding=utf-8
from __future__ import unicode_literals, absolute_import
print("Content-Type: text/html") # HTML is following
print()
reshtml = '''<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html" charset="utf-8"/>
</head>
<body>
<div style="text-align: center;">
<form action="/cgi-bin/test_form_action.py" method="POST"
target="_blank">
输入:<input type="text" id= "id" name="name"/></td>
<button type="submit">Submit</button>
</form>
</div>
</body>
</html>'''
print(reshtml)
cgi-bin/test_form_action.py
# coding=utf-8
from __future__ import unicode_literals, absolute_import
# Import modules for CGI handling
import cgi, cgitb
cgitb.enable()
if __name__ == '__main__':
    print("Content-Type: text/html") # HTML is following
    print()
    form = cgi.FieldStorage()
    print(form)
    id = form.getvalue("id")
    name = form.getvalue("name")
    print(id)
When I visit http://127.0.0.1:8000/cgi-bin/test.py,
the Chinese characters "输入" are not displayed correctly; they look like "����".
I have to manually change the Text Encoding of this page from
"Unicode" to "Chinese Simplified" in Firefox to make the Chinese characters look normal.
That is odd, since I put charset="utf-8" in cgi-bin/test.py.
Furthermore, when I enter some Chinese text in the form and submit it, cgi-bin/test_form_action.py comes back blank.
Meanwhile, this error shows up in the Windows terminal where I run SimpleHTTPServer:
127.0.0.1 - - [23/Mar/2018 23:43:32] b'Error in sys.excepthook:\r\nTraceback (most recent call last):\r\n File
"E:\Python\Python36\Lib\cgitb.py", line 268, in __call__\r\n
self.handle((etype, evalue, etb))\r\n File
"E:\Python\Python36\Lib\cgitb.py", line 288, in handle\r\n
self.file.write(doc + \'\n\')\r\nUnicodeEncodeError: \'gbk\' codec
can\'t encode character \'\ufffd\' in position 1894: illegal
multibyte sequence\r\n\r\nOriginal exception was:\r\nTraceback (most
recent call last):\r\n File
"G:\Python\Project\VideoHelper\cgi-bin\test_form_action.py", line
13, in <module>\r\n print(form)\r\nUnicodeEncodeError: \'gbk\'
codec can\'t encode character \'\ufffd\' in position 52: illegal
multibyte sequence\r\n'
127.0.0.1 - - [23/Mar/2018 23:43:32] CGI script exit status 0x1
When you call the print() function, Python converts the strings to bytes, i.e. it encodes them using a default codec.
The choice of this default depends on the environment; in your case it seems to be GBK (judging from the error message).
In the HTML page your CGI script returns, you specify the codec ("charset") as UTF-8.
You can of course change this to GBK, but it will only solve your first problem (display of test.py), not the second one (encoding error in test_form_action.py).
Instead, it's probably better to get Python to send UTF-8-encoded data on STDOUT.
One approach is to replace all occurrences of
print(x)
with
sys.stdout.buffer.write(x.encode('utf8'))
Alternatively, you can replace sys.stdout with a re-encoded wrapper, without changing the print() occurrences:
sys.stdout = open(sys.stdout.buffer.fileno(), 'w', encoding='utf8')
Note: These two solutions don't work in Python 2.x (you'd have to omit the .buffer part there).
I'm writing this because your code has from __future__ import statements, which have no use in code that is run with Python 3 exclusively.
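
The wrapper idea can be tried without a web server. The following is a minimal sketch, assuming Python 3; it uses io.TextIOWrapper rather than the open() call above, and an in-memory io.BytesIO stands in for sys.stdout.buffer. The helper name utf8_text_stream is hypothetical.

```python
import io


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    # newline="\n" keeps line endings untranslated on every platform.
    return io.TextIOWrapper(binary_stream, encoding="utf-8", newline="\n")


# Stand-in for sys.stdout.buffer so the sketch runs anywhere.
fake_stdout_buffer = io.BytesIO()
out = utf8_text_stream(fake_stdout_buffer)

print("输入", file=out)
out.flush()  # push the encoded bytes through to the underlying buffer

encoded = fake_stdout_buffer.getvalue()
print(encoded)
```

In a real CGI script the same helper would be given sys.stdout.buffer and the result assigned to sys.stdout. On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') is another option.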

How to get utf-8 from forms in Bottle?

I am trying to use Bottle.py to get input from users in a web page.
Everything works fine except when the input contains Latin characters (mostly accents). I have tried declaring utf-8 and latin-1 coding on the first two lines of the code, but it doesn't help.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import bottle
@bottle.post('/newpost')
def post_newpost():
    subject = bottle.request.forms.get("subject")
    body = bottle.request.forms.get("body")
    tags = bottle.request.forms.get("tags")
and the html code from the page is:
<html>
<head>
<meta charset="utf-8" />
<title>New Posts</title>
</head>
<body>
<form action="/newpost" method="POST">
<h2>Post title</h2>
<input type="text" name="subject" size="120" value="{{subject}}" ><br>
<h2>Post<h2>
<textarea name="body" cols="120" rows="20">{{body}}</textarea><br>
<h2>Tags</h2>
<input type="text" name="tags" size="120" value="{{tags}}"><br>
<p>
<input type="submit" value="Submit">
</body>
</html>
I read in the Bottle documentation that:
In Python 3 all strings are unicode, but HTTP is a byte-based wire
protocol. The server has to decode the byte strings somehow before
they are passed to the application. To be on the safe side, WSGI
suggests ISO-8859-1 (aka latin1), a reversible single-byte codec that
can be re-encoded with a different encoding later. Bottle does that
for FormsDict.getunicode() and attribute access, but not for the
dict-access methods. These return the unchanged values as provided by
the server implementation, which is probably not what you want.
>>> request.query['city']
'GÃ¶ttingen' # An utf8 string provisionally decoded as ISO-8859-1 by the server
>>> request.query.city
'Göttingen' # The same string correctly re-encoded as utf8 by bottle
If you need the whole dictionary with correctly decoded values (e.g. for WTForms), you can call FormsDict.decode() to get a re-encoded copy.
After reading that I tried to use that function, but I could not work out how.
Right now the Bottle form returns strings, so I cannot use encode('utf-8') or decode('utf-8').
Please help me!
Thanks!
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import bottle
@bottle.post('/newpost')
def post_newpost():
    subject = bottle.request.forms.subject
    body = bottle.request.forms.body
    tags = bottle.request.forms.tags
Using attribute access instead of forms.get() does it. Thanks!
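
The round trip described in the quoted documentation can be reproduced with plain strings. This is a minimal sketch assuming Python 3; it is not Bottle's own code, only the latin1/UTF-8 re-encoding the documentation describes, and the variable names are illustrative.

```python
original = "G\u00f6ttingen"  # Göttingen

# What the server hands over: UTF-8 bytes provisionally decoded as latin1.
provisional = original.encode("utf-8").decode("latin-1")

# The repair: recover the raw bytes, then decode them with the real encoding.
repaired = provisional.encode("latin-1").decode("utf-8")

print(repr(provisional))
print(repr(repaired))
```

Because latin1 maps every byte to exactly one character, the provisional decoding loses nothing and can always be undone this way.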

Encoding on PostgreSQL, Python, Jinja2

I'm having an encoding problem in my application and couldn't find a solution anywhere on the web.
Here is the scenario:
PostgreSQL with UTF-8 encoding (CREATE DATABASE xxxx WITH ENCODING 'UTF8')
Python logic also with UTF-8 encoding (# -*- coding: utf-8 -*-)
Jinja2 to show my HTML pages. Python and Jinja2 are used on Flask, which is the microframework I'm using.
The header of my pages has: <meta http-equiv="content-type" content="text/html; charset=utf-8"/>
When I run a simple query with psycopg2 and render the result with Jinja2, this is what I get:
{% for company in list %}
<li>
{{ company }}
</li>
{% endfor %}
(1, 'Casa das M\xc3\xa1quinas', 'R. Tr\xc3\xaas, Mineiros - Goi\xc3\xa1s')
(2, 'Ar do Z\xc3\xa9', 'Av. S\xc3\xa9tima, Mineiros - Goi\xc3\xa1s')
If I try to go deeper into the fields:
{% for company in list %}
<li>
{% for field in company %}
<li>
{{ field }}
</li>
{% endfor %}
</li>
{% endfor %}
I get the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
However, if I print the list fields before sending them to Jinja2, I get the expected result (which is also how it appears in PostgreSQL):
1
Casa das Máquinas
R. Três, Mineiros - Goiás
2
Ar do Zé
Av. Sétima, Mineiros - Goiás
When I get the error, Flask offers an option to "debug". This is where the code breaks
File "/home/anonimou/Desktop/flask/lib/python2.7/site-packages/jinja2/_markupsafe/_native.py", line 21, in escape
return Markup(unicode(s)
And I can also do:
[console ready]
>>> print s
Casa das Máquinas
>>> s
'Casa das M\xc3\xa1quinas'
>>> unicode(s)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
>>> s.decode('utf-8')
u'Casa das M\xe1quinas'
>>> s.encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
>>> s.decode('utf-8').encode('utf-8')
'Casa das M\xc3\xa1quinas'
>>> print s.decode('utf-8').encode('utf-8')
Casa das Máquinas
>>> print s.decode('utf-8')
Casa das Máquinas
I've already tried splitting the list and decoding or encoding the values in Python before sending them to Jinja2. Same error.
So I'm not sure what I can do here. =(
Thanks in advance!
The issue is that psycopg2 returns byte strings by default in Python 2:
When reading data from the database, in Python 2 the strings returned are usually 8 bit str objects encoded in the database client encoding
So you can either:
Manually decode all of the data from UTF-8 into unicode objects:
# Decode the byte strings into unicode objects using
# the encoding you know that your database is using.
# Each row is a tuple, so decode its str fields one by one.
companies = [
    tuple(f.decode("utf-8") if isinstance(f, str) else f for f in company)
    for company in companies
]
return render_template("companies.html", companies=companies)
or
Register the Unicode typecasters right after you import psycopg2, as per the note in the same section of the manual:
Note: In Python 2, if you want to uniformly receive all your database input in Unicode, you can register the related typecasters globally as soon as Psycopg is imported:
import psycopg2
import psycopg2.extensions
psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)
and then forget about this story.
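
For the manual-decoding route, the per-row logic can be checked in isolation. This is a minimal sketch written in Python 3 syntax so it runs as-is; bytes plays the role that str plays in Python 2, the helper decode_row is hypothetical, and the rows are made up to match the output shown in the question.

```python
def decode_row(row, encoding="utf-8"):
    """Return a copy of row with every bytes field decoded to text."""
    return tuple(
        field.decode(encoding) if isinstance(field, bytes) else field
        for field in row
    )


# Hypothetical rows shaped like the query result in the question.
rows = [
    (1, b"Casa das M\xc3\xa1quinas", b"R. Tr\xc3\xaas, Mineiros - Goi\xc3\xa1s"),
    (2, b"Ar do Z\xc3\xa9", b"Av. S\xc3\xa9tima, Mineiros - Goi\xc3\xa1s"),
]

companies = [decode_row(row) for row in rows]
for company in companies:
    print(company)
```

Non-string fields such as the integer id pass through untouched, which is why the type check is needed before calling decode.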

How to include Non-AscII character in python appengine send html mail

I want to compose an email in the Python environment of Google App Engine.
When I add Greek characters to the body of my message I get:
SyntaxError: Non-ASCII character '\xce'
message.html = """
<html>
<body>
παραδειγμα
</body>
</html>"""
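Add this encoding declaration as the first or second line of the source file (it is an encoding declaration, not a shebang):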
# -*- coding: utf-8 -*-
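
To check how such a declaration is picked up, the standard library's tokenize.detect_encoding reads the first two lines of a source file the same way the interpreter does. This is a minimal sketch assuming Python 3; the source bytes are a made-up stand-in for the real module.

```python
import io
import tokenize

# A made-up two-line module that starts with the encoding declaration.
source = b"# -*- coding: utf-8 -*-\nmessage_html = 'ok'\n"

# detect_encoding takes a readline callable and returns the encoding name
# plus the lines it had to read to find it.
encoding, first_lines = tokenize.detect_encoding(io.BytesIO(source).readline)

print(encoding)
```

If the declaration sits on the third line or later, it is ignored. In Python 2 the source then falls back to ASCII, which is what triggers the SyntaxError in the question.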
