How to properly decode Quoted Printable encoding in Django HTML Template - python

I have a Google App Engine (Python) form that POSTs text to a server, and the text arrives encoded as quoted-printable.
My code for POSTing is this:
<form action={{ upload_url }} method="post" enctype="multipart/form-data">
<div class="sigle-form"><textarea name="body" rows="5"></textarea></div>
<div class="sigle-form"><input name="file" type="file" /></div>
</form>
Then the value returned by self.request.get('body') is quoted-printable encoded. I store it in a db.TextProperty() and later pass the text to an HTML template using Django. When I write out the variable with {{ body }}, it is output in its quoted-printable form, and there does not seem to be a way to decode it in the Django template.
Is there any way to have the body text sent in an encoding other than quoted-printable? If not, how can I decode it in the Django template?
Submitting the text "ÅØÆ" produces " xdjG ", so when more than one special character is present, the encoded characters somehow seem to be combined as well. A single "ø" is encoded as =F8.
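A quick check suggests the " xdjG " sample is base64 rather than quoted-printable: base64-decoding it gives three bytes that spell ÅØÆ in ISO-8859-1 (an assumed charset, consistent with "ø" becoming =F8). A minimal sketch comparing the two decodings:

```python
import base64
import quopri

# The multi-character sample from the question (surrounding spaces removed)
raw_bytes = base64.b64decode("xdjG")
print(raw_bytes)                       # b'\xc5\xd8\xc6'
print(raw_bytes.decode("iso-8859-1"))  # ÅØÆ

# The single-character sample is quoted-printable: "=F8" is the byte 0xF8
print(quopri.decodestring(b"=F8").decode("iso-8859-1"))  # ø
```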
EDIT: I get this problem only in production, and this thread seems to talk about the same problem.
If anyone else here on Stack Overflow is doing form submits with blobs and åæøè characters, please reply in this thread with how you solved it!

OK, after two days working on this issue I finally resolved it. It seems to be a bug in Google App Engine that garbles the encoding in production: there, the text is sometimes quoted-printable encoded and other times base64 encoded. Weird. Here is my solution:
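import base64
import os

postBody = self.request.get('body')
postBody = postBody.encode('iso-8859-1')
# The development server passes the text through unchanged
DEBUG = os.environ['SERVER_SOFTWARE'].startswith('Dev')
if DEBUG:
    r.body = postBody
else:
    # In production the text arrived base64 encoded; restore any stripped
    # "=" padding so the length is a multiple of 4 before decoding
    postBody += "=" * ((4 - len(postBody) % 4) % 4)
    b64 = base64.urlsafe_b64decode(postBody)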
However, the decoded b64 value can't be stored in the datastore as-is, because it is a byte string that isn't ASCII:
'ascii' codec can't decode byte 0xe5 in position 5: ordinal not in range(128)
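The error arises because the decoded value is still bytes. A minimal sketch, assuming the field is base64 and the original charset is ISO-8859-1 (0xe5 is "å" in that charset): decode the bytes into a Unicode string before storing it. decode_b64_field is a hypothetical helper name, and it uses the standard base64 alphabet ("+" and "/"), which is what MIME uses.

```python
import base64

def decode_b64_field(value, charset="iso-8859-1"):
    """Turn a base64-encoded form field into a Unicode string.

    Hypothetical helper; the charset is an assumption that matches
    the samples in the question.
    """
    if isinstance(value, str):
        value = value.encode("ascii")
    value = value.strip()
    # Restore any "=" padding so the length is a multiple of 4
    value += b"=" * (-len(value) % 4)
    return base64.b64decode(value).decode(charset)

text = decode_b64_field(" xdjG ")
print(text)  # ÅØÆ  (a Unicode string rather than bytes)
```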

I solved a similar problem by using the Python quopri module to decode the string before passing it to an HTML template.
import quopri
body = quopri.decodestring(body)
This seems to have something to do with the multipart/form-data enctype. Quoted-printable encoding is applied to the textarea input, which in my case is then submitted via a blobstore upload link. The blobstore passes the text to my upload handler still in encoded form.
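quopri.decodestring returns bytes, so the result still has to be decoded with the form's character set before it is an ordinary string for the template. A sketch, assuming ISO-8859-1 (consistent with "ø" arriving as =F8); decode_qp is a hypothetical helper name, and the same function could be registered as a Django template filter if you prefer to decode in the template.

```python
import quopri

def decode_qp(value, charset="iso-8859-1"):
    """Decode quoted-printable data into Unicode text (hypothetical helper)."""
    if isinstance(value, str):
        value = value.encode("ascii")  # quoted-printable data is pure ASCII
    return quopri.decodestring(value).decode(charset)

print(decode_qp("bl=E5b=E6r"))  # blåbær
```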

Not sure what quoted-printable is, but have you tried the safe filter?
{{ body|safe }}
https://docs.djangoproject.com/en/dev/ref/templates/builtins/?from=olddocs#safe


Gmail API encoding - how to get rid of 3D and &amp

I am trying to extract the body of Gmail emails via the Gmail API, using Python.
I am able to extract the messages with the code below. However, there seems to be an issue with the encoding of the email text (the original email contains HTML): for some reason, 3D appears before every quote.
Also, inside a href="my_url" I have stray equals signs (=) appearing, and at the end of the link there is an &amp, which is not in the original HTML of the email.
Any idea how to fix this?
Code I use to extract the email:
from __future__ import print_function
import base64
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
from apiclient import errors

msgs = service.users().messages().list(userId='me', q="no-reply#hello.com", maxResults=1).execute()
for msg in msgs['messages']:
    message = service.users().messages().get(userId='me', id=msg['id'], format='raw').execute()
    print(base64.urlsafe_b64decode(message['raw'].encode('ASCII')))
Per the API documentation, "raw" returns the full email message data with body content in the raw field as a base64url encoded string; the payload field is not used.
Part of the printed output:
td style=3D"padding:20px; color:#45555f; font-family:Tahoma,He=
lvetica; font-size:12px; line-height:18px; "
JPk79hd =
JFQZEhc6%2BpAiQKF8M85SFbILbNd6IG8%2FEAWwe3VTr2jPzba4BHf%2FEnjMxq66fr228I7OS =
You should check the Content-Transfer-Encoding header to see if it specifies quoted-printable because that looks like quoted-printable encoded text.
Per RFC 1521, Section 5.1:
The Quoted-Printable encoding is intended to represent data that largely consists of octets that correspond to printable characters in the US-ASCII character set. It encodes the data in such a way that the resulting octets are unlikely to be modified by mail transport. If the data being encoded are mostly US-ASCII text, the encoded form of the data remains largely recognizable by humans. A body which is entirely US-ASCII may also be encoded in Quoted-Printable to ensure the integrity of the data should the message pass through a character-translating, and/or line-wrapping gateway.
Python's quopri module can be used to decode emails with this encoding.
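Rather than calling quopri by hand, Python's standard email package can read the Content-Transfer-Encoding header and undo it. A sketch, using a small made-up message in place of the decoded raw field:

```python
import email

# Hypothetical stand-in for base64.urlsafe_b64decode(message['raw'])
raw = (
    b"Content-Type: text/html; charset=utf-8\n"
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b'<td style=3D"padding:20px; font-family:Tahoma,He=\n'
    b'lvetica">Hi</td>\n'
)

msg = email.message_from_bytes(raw)
print(msg["Content-Transfer-Encoding"])  # quoted-printable

# decode=True reverses the transfer encoding (quoted-printable or base64) and returns bytes
payload = msg.get_payload(decode=True)
html_body = payload.decode(msg.get_content_charset() or "utf-8")
print(html_body)
```

Real Gmail messages are often multipart; msg.walk() visits each part, and the same get_payload(decode=True) call applies to each text/html part.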
Sadly I wasn't able to figure out the proper way to decode the message.
I ended up using the following workaround, which:
1) Splits the message into a list, with each separate line as a list item
2) Finds the list index of a unique start string and of an end string
3) Slices out the lines between those two indexes, then rebuilds the list with the trailing character (the equals sign) cut off each line
4) Joins the new list into a string
5) Searches for the URL I want
x = mime_msg.splitlines()  # convert to a list of lines
a = [i for i, s in enumerate(x) if 'My unique start string' in s][0]  # index of the first line we want
b = [i for i, s in enumerate(x) if 'my end id' in s][0]  # index of the end line
y = x[a:b]  # keep only the lines we want
new_list = []
for item in y:
    # a trailing "=" is a quoted-printable soft line break, not part of the text
    new_list.append(item[:-1] if item.endswith("=") else item)
url = "".join(new_list)  # rejoin the soft-wrapped lines
url = url.replace("=3D", "=").replace("&amp;", "&")  # =3D is quoted-printable for "="; &amp; is the HTML entity for "&"
csv_url = re.search('Whatever message comes before the URL (.*)', url).group(1)
The above uses:
from __future__ import print_function  # __future__ imports must come first
import base64
import email
import re
from googleapiclient.discovery import build
from httplib2 import Http
from oauth2client import file, client, tools
from apiclient import errors
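String surgery on the encoded text is easy to get wrong (for example, a percent-encoded %3D inside a URL also contains "3D"). A sketch of an alternative, assuming the body is quoted-printable HTML: decode it with quopri, unescape entities with html.unescape (&amp; becomes &), then search. The URL and markup below are made up.

```python
import html
import quopri
import re

# Made-up quoted-printable HTML fragment: "=3D" encodes "=", a trailing "="
# is a soft line break, and "&amp;" is the HTML entity for "&"
qp_body = (
    b'<a href=3D"https://example.com/export?id=3D42&amp;token=3Dab%3Dc=\n'
    b'd">Download CSV</a>\n'
)

decoded = quopri.decodestring(qp_body).decode("utf-8")
unescaped = html.unescape(decoded)  # &amp; -> &
url = re.search(r'href="([^"]*)"', unescaped).group(1)
print(url)  # https://example.com/export?id=42&token=ab%3Dcd
```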
I sent a mail from my ASP.NET web service to Gmail.
The content is real HTML.
It displayed as intended despite the =3D.
Dim Bericht As MailMessage
Bericht = New MailMessage
the content of my styleText is
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-=1">
<meta content="text/html; charset=us-ascii">
<style>h1{color:blue;}
.EditText{
background:#ff0000;/*rood*/
height:100;
font-size:10px;
color:#0000ff;/*blauw*/
}
</style>
</head>
and the content of my body is
<div class='EditText'>this is just some text</div>
Finally, I combine them:
Bericht.Body = "<html>" & styleText & "<body>" & content & "</body></html>"
If I look at the source of the received message, the 3D is still there.
It shows:
<html><head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Diso-8859-=
=3D1">
<meta content=3D"text/html; charset=3Dus-ascii">
<style>h1{color:blue;}
.EditText{
background:#ff0000;/*rood*/
height:100;
font-size:10px;
color:#0000ff;/*blauw*/
}
</style>
</head><body><div class=3D'EditText'>MailadresAfzender</div></body></html>
The result was blue text on a red background. Great.
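The =3D in the received source is quoted-printable transfer encoding ("=" is byte 0x3D), which the mail client removes before rendering. The same behaviour can be reproduced with Python's standard email package; this is a sketch for illustration, not the ASP.NET code above, and ISO-8859-1 is an assumed charset.

```python
import email
from email.mime.text import MIMEText

html_text = '<div class="EditText">\u00f8</div>'  # "ø" written as an escape

# With the ISO-8859-1 charset, the email package uses quoted-printable for the body
msg = MIMEText(html_text, "html", "iso-8859-1")
source = msg.as_string()
print(source)  # the body line reads: <div class=3D"EditText">=F8</div>

# A mail client reverses the encoding before rendering, as this round trip shows
parsed = email.message_from_string(source)
decoded = parsed.get_payload(decode=True).decode(parsed.get_content_charset())
print(decoded == html_text)
```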

Error at form: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 1: ordinal not in range(128)

I am getting this error on one specific form, while the rest of the Python/Flask project works fine. I am using PyCharm and my code is set to UTF-8 in the IDE. I don't know how to handle this. The form gets its input value from a list, like below:
my_school = form.university.data
waiverlist = ['Alpha University', 'Beta College', 'Charlie University', 'Foxthroat International University']
if my_school in waiverlist:
    package = Package(
        student_id=profile_data.id,
        stripe_id='N/A For non-stripe users',
        student_email=profile_data.email,
        is_active=True,
        package_type='PartnerSubscription',
        subscription_id='N/A For non-stripe users'
    )
    dbase.session.add(package)
    dbase.session.commit()
In my template I have:
<div class="col-xs-6 col-md-6">
{{ form.university.label }}{{ form.university(class_='form-control reg-select') }}
</div>
Error is here
Try a smart IDE; I personally prefer PyCharm. Your IDE should have an encoding setting: set it to UTF-8 and the code will work fine. If the problem affected all the data in the template, it could have been a general UTF-8 issue, but in your case you are copy-pasting code from multiple different sources, so your code editor can't identify the exact character set. There is no other rational explanation for your issue.
Try encoding university.label as UTF-8 in the template:
{{ form.university.label.encode('utf-8') }}
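If the cause is a stray character pasted into the source (for example a no-break space, which UTF-8 encodes as the bytes C2 A0, matching the 0xc2 in the error), a small script can locate it. A sketch; find_non_ascii is a hypothetical helper, and it assumes the file is valid UTF-8.

```python
import sys

def find_non_ascii(lines):
    """Yield (line_number, column, character) for every non-ASCII character."""
    for lineno, line in enumerate(lines, start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield lineno, col, ch

if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]
    with open(path, encoding="utf-8") as f:
        for lineno, col, ch in find_non_ascii(f):
            print(f"{path}:{lineno}:{col}: {ch!r} (U+{ord(ch):04X})")
```

Run it as, for example, python find_non_ascii.py views.py (both file names are placeholders).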

pdf StringIO embedded rendered html flask template

In my Flask app, I want to see a preview of the PDF that will be generated before actually printing it and saving it in my application_base_folder.
I could save previews in a tmp dir, but that's not what I'm really looking for.
I'm creating the PDF with ReportLab:
import cStringIO
from reportlab.pdfgen import canvas

def gen_pdf(text):
    output = cStringIO.StringIO()
    c = canvas.Canvas(output)
    c.drawString(100, 100, text)
    c.showPage()
    c.save()
    pdf_output = output.getvalue()
    output.close()
    return pdf_output
The PDF is then sent to an HTML template together with my form, so that I can update it:
import wtforms
from flask import render_template, request
from wtforms import TextField

class Form(wtforms.Form):
    text = TextField('text')

@app.route('/finalize/pdf/')
def finalize_pdf():
    form = Form(request.form)
    pdf_output = gen_pdf(form.text.data)
    return render_template('preview_pdf.html', form=form, pdf_output=pdf_output)
In the HTML page, I have my form from which I can update the text, a button to POST the value (the POST handling isn't shown in the finalize_pdf() view above), and the PDF preview:
<form method='post' action="{{ url_for('finalize_pdf') }}">
{{ form.text }}
<input type='submit' name='update' value='update'>
</form>
<br />
<embed src="{{ pdf_output }}" type="application/pdf" width='30%'>
The problem, I guess, is that "src" expects a path on the filesystem. In any case, it dies with this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0x93 in position 11: ordinal not in range(128)
Am I missing something, or is it impossible to embed a StringIO in HTML with embed/object, so that I must write it to a tmp_path on my filesystem?
This is a bit late but you could use base64 encoding for this problem:
from base64 import b64encode
# in your view (decode to str so the template renders plain base64 text on Python 3):
return render_template(..., pdf_output=b64encode(pdf_output).decode('ascii'))
And your template:
<embed src="data:application/pdf;base64,{{ pdf_output }}" type="application/pdf" width='30%'>
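A sketch of a helper that builds the whole data: URI in the view, so the template only needs src="{{ pdf_uri }}". pdf_data_uri and pdf_uri are hypothetical names.

```python
from base64 import b64encode

def pdf_data_uri(pdf_bytes):
    """Build a data: URI for raw PDF bytes (hypothetical helper)."""
    # b64encode returns bytes on Python 3; decode so the template gets plain text
    return "data:application/pdf;base64," + b64encode(pdf_bytes).decode("ascii")

print(pdf_data_uri(b"%PDF-1.4"))  # data:application/pdf;base64,JVBERi0xLjQ=
```

In the view this would be used as render_template(..., pdf_uri=pdf_data_uri(gen_pdf(form.text.data))). The base64 alphabet contains no characters that Jinja's autoescaping changes.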

How to get utf-8 from forms in Bottle?

I am trying to use Bottle.py to get input information from users in a web page.
Everything works fine except when the input contains non-ASCII Latin characters (mostly accented letters). I have tried declaring both utf-8 and latin-1 coding in the first two lines of the code, but it doesn't work.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import bottle

@bottle.post('/newpost')
def post_newpost():
    subject = bottle.request.forms.get("subject")
    body = bottle.request.forms.get("body")
    tags = bottle.request.forms.get("tags")
and the html code from the page is:
<html>
<head>
<meta charset="utf-8" />
<title>New Posts</title>
</head>
<body>
<form action="/newpost" method="POST">
<h2>Post title</h2>
<input type="text" name="subject" size="120" value="{{subject}}" ><br>
<h2>Post</h2>
<textarea name="body" cols="120" rows="20">{{body}}</textarea><br>
<h2>Tags</h2>
<input type="text" name="tags" size="120" value="{{tags}}"><br>
<p>
<input type="submit" value="Submit">
</p>
</form>
</body>
</html>
I read this on the Bottle documentation page:
In Python 3 all strings are unicode, but HTTP is a byte-based wire
protocol. The server has to decode the byte strings somehow before
they are passed to the application. To be on the safe side, WSGI
suggests ISO-8859-1 (aka latin1), a reversible single-byte codec that
can be re-encoded with a different encoding later. Bottle does that
for FormsDict.getunicode() and attribute access, but not for the
dict-access methods. These return the unchanged values as provided by
the server implementation, which is probably not what you want.
request.query['city']
'GÃ¶ttingen' # An utf8 string provisionally decoded as ISO-8859-1 by the server
request.query.city
'Göttingen' # The same string correctly re-encoded as utf8 by bottle
If you need the whole dictionary with correctly decoded values (e.g. for WTForms), you can call FormsDict.decode() to get a re-encoded copy.
After reading that, I tried to use that function but couldn't work out how.
Right now the Bottle form returns strings, so I cannot use encode('utf-8') or decode('utf-8').
Please help me!
Thanks!
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import bottle

@bottle.post('/newpost')
def post_newpost():
    # Attribute access returns values re-encoded as UTF-8 by Bottle
    subject = bottle.request.forms.subject
    body = bottle.request.forms.body
    tags = bottle.request.forms.tags
Attribute access returns the correctly re-encoded value, unlike .get(). That will do it. Thanks!
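The re-encoding that attribute access performs can be illustrated in plain Python: the server decodes the raw UTF-8 bytes as ISO-8859-1, so encoding back to ISO-8859-1 and decoding as UTF-8 recovers the text. This is a sketch of the idea, not Bottle's own code; inside Bottle, the documented routes are attribute access, FormsDict.getunicode() and FormsDict.decode(), as quoted above. reencode_wsgi_str is a hypothetical name.

```python
def reencode_wsgi_str(value, encoding="utf-8"):
    """Undo a provisional ISO-8859-1 decoding of UTF-8 bytes (hypothetical helper)."""
    return value.encode("latin-1").decode(encoding)

print(reencode_wsgi_str("GÃ¶ttingen"))  # Göttingen
```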

Avoiding UnicodeDecodeError exceptions Python

In Python I use an HTML template to display a Steam player's information.
The template is:
'''<td>
<div>
Name: %s<br>
Hours: %s<br>
Steam Profile <br>
</div>
</td>'''
So I fill it in with TEMPLATE % (personaName, tf2Hours, id64).
Later on that template is saved into an html file.
Occasionally it raises a UnicodeDecodeError, because personaName can contain non-ASCII characters.
Is there a way to avoid this while still having the correct characters in the final html file?
EDIT:
The error was caused by non-ASCII bytes in personaName.
Doing unicode(personaName, errors='ignore') solved the issue.
Try:
u'UnicodeTextHereaあä'.encode('ascii', 'ignore')
This will ignore unicode characters that can't be converted to ascii.
Here are a few examples that I just tried.
>>> x = 'Hello world!'
>>> y = 'notあä ascii'
>>> x.encode('ascii', 'ignore')
b'Hello world!'
>>> y.encode('ascii', 'ignore')
b'not ascii'
As you can see, it removed every trace of non-ascii characters.
Alternatively, you could tell the interpreter that you are planning on reading unicode values. For example (from docs.python.org/3.3/howto/unicode.html),
with open('unicode.txt', encoding='utf-8') as f:
    for line in f:
        print(repr(line))
This decodes the file as UTF-8 while reading, so you get Unicode strings with the characters intact.
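A sketch of filling the template while keeping the characters instead of ignoring them, assuming Python 3 and names that arrive either as str or as UTF-8 bytes. render_player and save are hypothetical names, the template is simplified from the question (the profile line is left out), and the HTML escaping is an extra safeguard not mentioned above.

```python
import html

# Simplified version of the question's template (profile line omitted)
TEMPLATE = '''<td>
<div>
Name: %s<br>
Hours: %s<br>
</div>
</td>'''

def render_player(persona_name, tf2_hours):
    """Fill the template without dropping non-ASCII characters."""
    if isinstance(persona_name, bytes):
        # Assumes the name arrived as UTF-8 bytes; decode instead of ignoring errors
        persona_name = persona_name.decode("utf-8")
    # Escape <, >, & and quotes so unusual names cannot break the HTML
    return TEMPLATE % (html.escape(persona_name), tf2_hours)

def save(cells, path):
    # Write with an explicit encoding so the characters survive in the file
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(cells))

print(render_player("Zoë あ".encode("utf-8"), 120))
```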
