Python HTML rendering changes UTF-8 characters to unknown characters - python

from util.lead_email import lead_template
lead_template.render()
My HTML contains special characters such as "ğ, ş, ı, ç".
The rendered output is:
# Rendered output
oluşturduğunuz teklif için aldınız
# HTML
oluşturduğunuz teklif için aldınız.
How can I fix this problem?
I tried unescape, escape, decode, and encode.
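A symptom like this often comes from UTF-8 bytes being decoded with a single-byte codec somewhere between the template file and the output. The question does not show how lead_template loads its HTML, so the following is only a sketch under that assumption (Python 3 syntax); the sample string comes from the question and everything else is illustrative.

```python
# Assumption: the template bytes are UTF-8 but get decoded as Latin-1.
original = "oluşturduğunuz teklif için aldınız"

utf8_bytes = original.encode("utf-8")

# Decoding with the wrong codec does not fail; it silently garbles
# every non-ASCII character.
garbled = utf8_bytes.decode("latin-1")

# Decoding with the codec the bytes were written in round-trips cleanly.
restored = utf8_bytes.decode("utf-8")

# If only the garbled text is available, reversing the wrong step
# recovers the original in this particular case.
repaired = garbled.encode("latin-1").decode("utf-8")
```

If that turns out to be the cause, reading the template file with an explicit UTF-8 encoding would avoid the problem at the source.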

Related

UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1135-1137: ordinal not in range(256)

In Django, I'm trying to render a monthly report from HTML to PDF; the report takes its data from a database in Hebrew.
I have a problem that I have not been able to fix for several days.
When I display data in English, the PDF works properly, but when I want to display Hebrew data from the database, it fails with the following error: 'latin-1' codec can't encode characters in position 1135-1137: ordinal not in range(256)
I have already tried several solutions:
I changed the font several times in the base HTML of the PDF via the CSS, and it does not help at all.
I tried to change the rendering function I have at the moment, and that did not help either.
My rendering function:
def render_to_pdf(template_src, context_dict={}):
    template = get_template(template_src)
    html = template.render(context_dict)
    result = StringIO.StringIO()
    pdf = pisa.pisaDocument(StringIO.StringIO(html.encode("UTF-8")), result)
    if not pdf.err:
        return HttpResponse(result.getvalue(), content_type='application/pdf')
    return None
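The error message itself can be reproduced in isolation: Latin-1 only covers code points 0 to 255, and Hebrew letters lie outside that range. The snippet below is a minimal sketch of that fact only (Python 3 syntax); it does not use pisa and does not claim to show where in the PDF pipeline the Latin-1 encoding is applied. The sample word is an assumption, not data from the question.

```python
# Hypothetical sample text; Hebrew letters are U+05D0..U+05EA.
text = "שלום"

try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    reason = exc.reason  # the "ordinal not in range(256)" part of the message

# UTF-8 can represent every code point, so this succeeds.
utf8_bytes = text.encode("utf-8")
```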

Template strings with UTF8 in Python 2.7

Please, I need help with Python 2.7.
I use from string import Template
and I get a Unicode error.
If I print the string without Template, it works fine,
but if I print it through Template, this error appears:
AH01215: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 8: ordinal not in range(128)
My example has two files:
index.py
template.py
In template.py I use this code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
########################################################
#
from string import Template
ABC = Template("""<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
Hello ${NAME}""")
And in index.py I use this code:
#!/usr/bin/python
# -*- coding: utf-8 -*-
########################################################
import template
print "Content-Type: text/html\n"
ZXC = "m’a réveillé"
print template.ABC.substitute(dict(NAME=ZXC))
If I use this code, the error above appears,
but if I print ZXC directly, without the template, it works fine.
How can I fix this UTF-8 problem with the template?
You need to escape the special characters before passing them to the template.
But first, mark the string as Unicode. I believe your index.py should become:
#!/usr/bin/python
# -*- coding: utf-8 -*-
########################################################
import template
print "Content-Type: text/html\n"
ZXC = u"m’a réveillé".encode('ascii', 'xmlcharrefreplace')
print template.ABC.substitute(dict(NAME=ZXC))
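To see what the xmlcharrefreplace step does, here is a small sketch in Python 3 syntax (the answer above targets Python 2.7, where the result of encode is already a str and can be passed to substitute directly). The template text is shortened, and the variable names are illustrative.

```python
from string import Template

page = Template("Hello ${NAME}")

name = "m’a réveillé"

# Replace every non-ASCII character with a numeric character reference.
# encode() returns bytes on Python 3, so decode back to str for Template.
escaped = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

html = page.substitute(NAME=escaped)
print(html)  # Hello m&#8217;a r&#233;veill&#233;
```

The result is pure ASCII, so it no longer depends on which charset the browser or the web server assumes.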

What is wrong with my xpath expression?

I want to extract all the links in td whose class is u-ctitle.
import os
import urllib
import lxml.html
down='http://v.163.com/special/opencourse/bianchengdaolun.html'
file=urllib.urlopen(down).read()
root=lxml.html.document_fromstring(file)
namelist=root.xpath('//td[@class="u-ctitle"]/a')
len(namelist)
The output is []. There are many td elements whose class is "u-ctitle" (you can see them with Firebug), so why can't I extract them?
My Python version is 2.7.9.
Renaming the file variable does not help.
Your XPath is correct. The problem is unrelated.
If you examine the HTML, you will see the following meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=GBK" />
And in this code:
file=urllib.urlopen(down).read()
root=lxml.html.document_fromstring(file)
file is actually a byte sequence, so the decoding from GBK-encoded bytes to a Unicode string happens inside the document_fromstring method.
The problem is that the HTML is not actually encoded as GBK, so lxml decodes it incorrectly, which leads to loss of data.
>>> file.decode('gbk')
Traceback (most recent call last):
  File "down.py", line 9, in <module>
    file.decode('gbk')
UnicodeDecodeError: 'gbk' codec can't decode bytes in position 7247-7248: illegal multibyte sequence
After some trial and error, we find that the actual encoding is GB18030. To make the script work, you need to decode the bytes manually:
root=lxml.html.document_fromstring(file.decode('GB18030'))
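GB18030 extends GBK with four-byte sequences, which is why bytes that the GBK codec rejects can still decode under GB18030. A minimal sketch with Python's built-in codecs, in Python 3 syntax; the sample character is an assumption chosen because it has no GBK encoding, and is not taken from the page in the question.

```python
# Hypothetical sample: U+20000 is a CJK Extension B ideograph, which
# GB18030 encodes as a four-byte sequence that GBK cannot represent.
sample = "中文\U00020000"

raw = sample.encode("gb18030")

try:
    raw.decode("gbk")
    gbk_ok = True
except UnicodeDecodeError:
    # Same failure mode as the traceback above.
    gbk_ok = False

# The wider codec decodes the same bytes without loss.
decoded = raw.decode("gb18030")
```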

How to properly decode Quoted Printable encoding in Django HTML Template

I have a form in a Google App Engine (Python) app that POSTs text to a server, and the text gets encoded as quoted-printable.
My code for POSTing is this:
<form action={{ upload_url }} method="post" enctype="multipart/form-data">
<div class="sigle-form"><textarea name="body" rows="5"></textarea></div>
<div class="sigle-form"><input name="file" type="file" /></div>
</form>
Then the result of fetching self.request.get('body') is quoted-printable encoded. I store this in a db.TextProperty() and later send the text to an HTML template using Django. When I write out the variable using {{ body }}, the result is written in quoted-printable encoding, and there does not seem to be a way of decoding it in the Django HTML template.
Is there any way to have the body text sent with an encoding other than quoted-printable? If not, how can I decode it in the Django HTML template?
Submitting the text "ÅØÆ" yields " xdjG ", so the quoted-printable values seem to be somehow combined as well. This happens when more than one special character is present in the text. A single "ø" is encoded as =F8.
EDIT: I get this problem only in production, and this thread seems to talk about the same problem.
If anyone else here on Stack Overflow is doing form submits with blobs and åæøè characters, please respond to this thread with how you solved it!
OK, after two days of working on this issue I finally resolved it. It is seemingly a bug in Google App Engine that messes up the encoding in production. In production the text is sometimes quoted-printable encoded and at other times base64 encoded. Weird. Here is my solution:
import base64
import os

postBody = self.request.get('body')
postBody = postBody.encode('iso-8859-1')
DEBUG = os.environ['SERVER_SOFTWARE'].startswith('Dev')
if DEBUG:
    r.body = postBody
else:
    # Pad to a multiple of 4 characters before base64-decoding.
    postBody += "=" * ((4 - len(postBody) % 4) % 4)
    b64 = base64.urlsafe_b64decode(postBody)
However, the resulting b64 can't be stored in the datastore because it is not ASCII:
'ascii' codec can't decode byte 0xe5 in position 5: ordinal not in range(128)
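For what it is worth, the "xdjG" value quoted in the question is exactly what base64 produces for the ISO-8859-1 bytes of "ÅØÆ", which fits the base64 explanation. A minimal sketch in Python 3 syntax; the helper name pad_base64 is made up for illustration.

```python
import base64


def pad_base64(value):
    """Append '=' until the length is a multiple of 4 (hypothetical helper)."""
    return value + "=" * ((4 - len(value) % 4) % 4)


raw = base64.urlsafe_b64decode(pad_base64("xdjG"))

# Assumption: the decoded bytes are ISO-8859-1, as in the example above.
text = raw.decode("iso-8859-1")
print(text)  # ÅØÆ
```

Decoding the bytes to a Unicode string with a known codec, as in the last step, is one way to end up with a value that is text rather than raw bytes; whether the production data is always ISO-8859-1 is an assumption.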
I solved a similar problem by using the Python quopri module to decode the string before passing it to an HTML template.
import quopri
body = quopri.decodestring(body)
This seems to have something to do with the multipart/form-data enctype. Quoted-printable encoding is applied to the textarea input, which is then, in my case, submitted via a blobstore upload link. The blobstore returns the text to my upload handler still in encoded form.
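A short sketch of the quopri decoding step in Python 3 syntax, where quopri.decodestring takes and returns bytes. The ISO-8859-1 charset is inferred from the =F8 example in the question ("ø" is byte 0xF8 in that encoding) and is otherwise an assumption; the sample input is made up.

```python
import quopri

encoded = b"=C5=D8=C6 and =F8"

# decodestring turns each =XX escape back into the raw byte.
raw = quopri.decodestring(encoded)

# The raw bytes still need a charset to become text.
body = raw.decode("iso-8859-1")
print(body)  # ÅØÆ and ø
```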
Not sure what quoted-printable is, but have you tried the safe filter?
{{ body|safe }}
https://docs.djangoproject.com/en/dev/ref/templates/builtins/?from=olddocs#safe

How to include non-ASCII characters in HTML mail sent from Python on App Engine

My problem is that I want to compose an email in the Python environment of Google App Engine.
When I add Greek characters to the body of my message I get:
SyntaxError: Non-ASCII character '\xce'
message.html = """
<html>
<body>
παραδειγμα
</body>
</html>"""
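Add this encoding declaration (strictly speaking it is not a shebang) as the first or second line of the source file: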
# -*- coding: utf-8 -*-
