How to get utf-8 from forms in Bottle? - python

I am trying to use Bottle.py to get input information from users in a web page.
Everything works fine except when I have Latin characters (mostly accents). I have tried using utf-8 and latin-1 coding declarations on the first two lines of the code, but it doesn't work.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import bottle
@bottle.post('/newpost')
def post_newpost():
    subject = bottle.request.forms.get("subject")
    body = bottle.request.forms.get("body")
    tags = bottle.request.forms.get("tags")
The HTML code of the page is:
<html>
<head>
<meta charset="utf-8" />
<title>New Posts</title>
</head>
<body>
<form action="/newpost" method="POST">
<h2>Post title</h2>
<input type="text" name="subject" size="120" value="{{subject}}" ><br>
<h2>Post</h2>
<textarea name="body" cols="120" rows="20">{{body}}</textarea><br>
<h2>Tags</h2>
<input type="text" name="tags" size="120" value="{{tags}}"><br>
<p>
<input type="submit" value="Submit">
</p>
</form>
</body>
</html>
I read in the Bottle documentation that:
In Python 3 all strings are unicode, but HTTP is a byte-based wire
protocol. The server has to decode the byte strings somehow before
they are passed to the application. To be on the safe side, WSGI
suggests ISO-8859-1 (aka latin1), a reversible single-byte codec that
can be re-encoded with a different encoding later. Bottle does that
for FormsDict.getunicode() and attribute access, but not for the
dict-access methods. These return the unchanged values as provided by
the server implementation, which is probably not what you want.
>>> request.query['city']
'GÃ¶ttingen' # An utf8 string provisionally decoded as ISO-8859-1 by the server
>>> request.query.city
'Göttingen' # The same string correctly re-encoded as utf8 by bottle
If you need the whole dictionary with correctly decoded values (e.g. for WTForms), you can call FormsDict.decode() to get a re-encoded copy.
After reading that I tried to use that function, but I don't know how.
Right now the Bottle form returns strings, so I cannot use encode('utf-8') or decode('utf-8').
Please help me!
Thanks!

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import bottle
@bottle.post('/newpost')
def post_newpost():
    # Attribute access returns the value re-encoded as UTF-8
    subject = bottle.request.forms.subject
    body = bottle.request.forms.body
    tags = bottle.request.forms.tags
That will do it.... Thanks!
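Attribute access works because, as the documentation quoted in the question says, Bottle re-encodes the provisionally latin-1-decoded value as UTF-8. A minimal sketch of that round trip in plain Python 3, without Bottle; `recode_latin1_as_utf8` is a hypothetical helper name used only for illustration:

```python
def recode_latin1_as_utf8(value):
    """Undo a provisional ISO-8859-1 decode and decode the bytes as UTF-8."""
    # latin-1 maps code points 0-255 one-to-one onto bytes, so this
    # encode() recovers exactly the bytes the browser sent.
    raw = value.encode("latin1")
    return raw.decode("utf8")

# What a WSGI server hands over for the UTF-8 bytes of "Göttingen".
provisional = "Göttingen".encode("utf8").decode("latin1")
fixed = recode_latin1_as_utf8(provisional)
print(fixed)
```

In a handler, bottle.request.forms.getunicode("subject") or bottle.request.forms.decode() should perform this conversion for you, as described in the quoted documentation.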

Related

SimpleHTTPServer in Python3.6.4 can not handle non-ASCII string(Chinese in my case)

I run SimpleHTTPServer in Python3.6.4 64bit by this command:
python -m http.server --cgi
Then I make a form in test.py and submit it to test_form_action.py, which prints the input text.
cgi-bin/test.py
# coding=utf-8
from __future__ import unicode_literals, absolute_import
print("Content-Type: text/html") # HTML is following
print()
reshtml = '''<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html" charset="utf-8"/>
</head>
<body>
<div style="text-align: center;">
<form action="/cgi-bin/test_form_action.py" method="POST"
target="_blank">
输入:<input type="text" id= "id" name="name"/></td>
<button type="submit">Submit</button>
</form>
</div>
</body>
</html>'''
print(reshtml)
cgi-bin/test_form_action.py
# coding=utf-8
from __future__ import unicode_literals, absolute_import
# Import modules for CGI handling
import cgi, cgitb
cgitb.enable()
if __name__ == '__main__':
    print("Content-Type: text/html") # HTML is following
    print()
    form = cgi.FieldStorage()
    print(form)
    id = form.getvalue("id")
    name = form.getvalue("name")
    print(id)
When I visit http://127.0.0.1:8000/cgi-bin/test.py, the Chinese characters "输入" are not displayed correctly; they look like "����".
I have to manually change the text encoding of this page from "Unicode" to "Chinese Simplified" in Firefox to make the Chinese characters look normal.
This is odd, since I put charset="utf-8" in cgi-bin/test.py.
Furthermore, when I type some Chinese into the input field and submit the form, the page from cgi-bin/test_form_action.py is blank.
Meanwhile, an error shows up in the Windows terminal where I run SimpleHTTPServer:
127.0.0.1 - - [23/Mar/2018 23:43:32] b'Error in sys.excepthook:\r\nTraceback (most recent call last):\r\n File
"E:\Python\Python36\Lib\cgitb.py", line 268, in __call__\r\n
self.handle((etype, evalue, etb))\r\n File
"E:\Python\Python36\Lib\cgitb.py", line 288, in handle\r\n
self.file.write(doc + \'\\n\')\r\nUnicodeEncodeError: \'gbk\' codec
can\'t encode character \'\ufffd\' in position 1894: illegal
multibyte sequence\r\n\r\nOriginal exception was:\r\nTraceback (most
recent call last):\r\n File
"G:\Python\Project\VideoHelper\cgi-bin\test_form_action.py", line
13, in <module>\r\n print(form)\r\nUnicodeEncodeError: \'gbk\'
codec can\'t encode character \'\ufffd\' in position 52: illegal
multibyte sequence\r\n'
127.0.0.1 - - [23/Mar/2018 23:43:32] CGI script exit status 0x1
When you use the print() function, Python converts the strings to bytes, i.e. it encodes them using a default codec.
The choice of this default value depends on the environment – in your case it seems to be GBK (judging from the error message).
In the HTML page your CGI script returns, you specify the codec ("charset") as UTF-8.
You can of course change this to GBK, but it will only solve your first problem (display of test.py), not the second one (encoding error in test_form_action.py).
Instead, it's probably better to get Python to send UTF-8-encoded data on STDOUT.
One approach is to replace all occurrences of
print(x)
with
sys.stdout.buffer.write(x.encode('utf8'))
Alternatively, you can replace sys.stdout with a re-encoded wrapper, without changing the print() occurrences:
sys.stdout = open(sys.stdout.buffer.fileno(), 'w', encoding='utf8')
Note: These two solutions don't work in Python 2.x (you'd have to omit the .buffer part there).
I'm writing this because your code has from __future__ import statements, which have no use in code that is run with Python 3 exclusively.
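The second approach can be tried out without touching the real standard output. The following is a minimal sketch for Python 3 that wraps a binary stream in a UTF-8 text writer; the helper name `utf8_writer` and the in-memory `io.BytesIO` stand-in for `sys.stdout.buffer` are assumptions made for illustration:

```python
import io

def utf8_writer(binary_stream):
    """Return a text stream that encodes everything written to it as UTF-8."""
    # newline="\n" keeps line endings untranslated on every platform.
    return io.TextIOWrapper(binary_stream, encoding="utf8", newline="\n")

# Stand-in for sys.stdout.buffer so the encoded bytes can be inspected.
fake_stdout_buffer = io.BytesIO()
out = utf8_writer(fake_stdout_buffer)
print("输入", file=out)
out.flush()
sent = fake_stdout_buffer.getvalue()
```

In a CGI script the same wrapper would be applied as sys.stdout = utf8_writer(sys.stdout.buffer) before the first print(). On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is another option.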

how to send response to html using cgi python script

I am trying to design and implement a basic calculator in HTML and Python (using CGI). Below is a static HTML page that submits to a Python script (calci.py). There I am able to calculate the sum, but I am unable to put the result into the 'output' textbox.
calculator.html
<html>
<head>
<title>Calculator</title>
</head>
<body>
<form action="python_scripts/calci.py" method="post">
Input 1 : <input type="text" name="input1"/><br>
Input 2 : <input type="text" name="input2"/><br>
<input type="submit" value="+" title="add" /><br>
output : <input type="text" name="output"/><br>
</form>
</body>
</html>
calci.py
import cgi
form = cgi.FieldStorage()
input1 = form.getvalue('input1')
input2 = form.getvalue('input2')
output = str(int(input1)+int(input2))
#how to return the response to the html
#and append it to the textbox
Thanks
This is not the way Web applications work - not that simply, at least.
If you want to rely only on the browser, and plain HTML for your application, each request has to send the whole html page as a string. You have to use Python's string formatting capabilities to put the resulting number in the correct place in the HTML form.
This way of working is typical of "Web 1.0" applications (as opposed to the "Web 2.0" term used about ten years ago).
Modern web applications use logic that runs on the client side, in JavaScript code, to make an HTTP request that retrieves only the needed data. This client-side logic then places the result in the proper place in the page, without reloading it. This is what is generally known as "Ajax". It is not that complex, but the HTML + JavaScript side of the application becomes much more complex.
I think one should really understand the "Web 1.0" way before doing it the "Ajax way". In your case, let's suppose the HTML containing the calculator form is in a file called "calc.html". Inside it, where the result should go, put a placeholder that Python's built-in string formatting methods understand, like {result}:
<html>
<body>
...
calculator body
...
Answer: <input type="text" readonly="true" value="{result}" />
</body>
</html>
And rewrite your code like:
import cgi
form = cgi.FieldStorage()
input1 = form.getvalue('input1')
input2 = form.getvalue('input2')
result = int(input1)+int(input2)
html = open("calc.html").read()
header = "Content-Type: text/html; charset=UTF-8\n\n"
output = header + html.format(result=result)
print (output)
The CGI way is outdated, but it is nice for learning: your whole program is run for each request, and whatever it prints to standard output is sent back as the HTTP response. That includes the HTTP headers, which are included, in a minimal form, above.
(I will leave it as an exercise to keep the raw '{result}' string from showing up in the initial calculator form. The path is to serve the initial calculator HTML template through a CGI script as well (maybe the same one) instead of statically, and just populate "result" with "0" or an empty string.)
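One way the exercise could go is sketched below. It is only an illustration under stated assumptions: the template is inlined as `PAGE_TEMPLATE` instead of being read from "calc.html", `render_page` is a hypothetical helper name, and the form values are passed in as plain arguments so the function can be tried without a web server.

```python
import html

# Inlined stand-in for the contents of calc.html (assumption for this sketch).
PAGE_TEMPLATE = """<html>
<body>
Answer: <input type="text" readonly="true" value="{result}" />
</body>
</html>"""

def render_page(input1=None, input2=None):
    """Build the full CGI response: header, blank line, then the page."""
    if input1 is None or input2 is None:
        result = ""  # first visit: nothing to show yet
    else:
        result = str(int(input1) + int(input2))
    header = "Content-Type: text/html; charset=UTF-8\n\n"
    # Escape the value so it cannot break out of the attribute.
    return header + PAGE_TEMPLATE.format(result=html.escape(result))

response = render_page("2", "3")
```

In the CGI script, the two arguments would come from form.getvalue('input1') and form.getvalue('input2'), which return None when the form has not been submitted yet, and the returned string would be printed.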
You can transfer the response with the help of JavaScript.
Use the following: print("window.location=url")

Reading a file uploaded via CGI in Python

I am having trouble analyzing a file that is uploaded to my server. Ideally, what would happen is that the user would upload a .csv and I would run some numeric integration (hence the import numpy as np) from data in the file and return the result. To test the cgi part before I did any analysis, I made the python script below. My problem is that message always appears blank, so I get a blank html page displayed in browser at the end.
Clearly I am not reading the file correctly (which means I cannot analyze it), but I have no idea what I am doing wrong. My own searches indicate that what I have below should work.
#!/usr/bin/python
#---------------------------------------------
#=============================================
#---------------------------------------------
#imports
#import csv
#import time as tm
#import numpy as np
#import os
import cgi, cgitb
cgitb.enable()
form = cgi.FieldStorage()
#The variables
#httpopen=""
#httpclose=""
message="meow"
#get the fileitem
fileitem=form['userfile']
if fileitem.file:
    #yay...we got a file
    message=fileitem.file.readline()
print """\
Content-Type: text/html\n
<html><body>
<p>%s</p>
</body></html>
""" % (message,)
And here is the html form that gives the upload:
<html>
<body>
<form enctype="multipart/form-data"
action="capacity_rewrite.py" method="post">
<p>File: <input type="file" name="userfile" /></p>
<p><input type="submit" value="Upload" /></p>
</form>
</body>
</html>
Thanks,
You're only reading one line from the uploaded file. Perhaps that first line is a blank line? Try fileitem.file.read() instead, which will read the entire file into a string. Don't do that for large files as you may run out of memory, but it should help in understanding your test.
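The difference between the two calls can be seen with an in-memory stand-in for the upload. This is a minimal Python 3 sketch; the `io.BytesIO` object and its sample CSV bytes are made up for illustration and take the place of fileitem.file:

```python
import io

# Stand-in for fileitem.file: an upload whose first line is blank.
uploaded = io.BytesIO(b"\n1.0,2.0\n3.0,4.0\n")

first_line = uploaded.readline()  # only the (blank) first line
uploaded.seek(0)                  # rewind before reading again
whole_file = uploaded.read()      # every byte of the upload

# Split the decoded text into rows, skipping empty lines.
rows = [line.split(",") for line in whole_file.decode("utf8").splitlines() if line]
```

Here readline() yields only a newline, which renders as an empty paragraph in the browser, while read() returns the data that follows it.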

How to properly decode Quoted Printable encoding in Django HTML Template

I have a form submit on Google App Engine (Python) that POSTs text to a server, and the text gets encoded with quoted-printable encoding.
My code for POSTing is this:
<form action={{ upload_url }} method="post" enctype="multipart/form-data">
<div class="sigle-form"><textarea name="body" rows="5"></textarea></div>
<div class="sigle-form"><input name="file" type="file" /></div>
</form>
Then the result of fetching self.request.get('body') is quoted-printable encoded. I store this in a db.TextProperty() and later send the text to an HTML template using Django. When I write out the variable using {{ body }}, the result is written with quoted-printable encoding, and there does not seem to be a way of decoding it in the Django HTML template.
Is there any way of encoding the text in the body other than with quoted-printable? If not, how can I decode this encoding in the Django HTML template?
Submitting the text "ÅØÆ" gives " xdjG ", so the quoted-printable values seem to be somehow added together as well. This happens when more than one special character is present in the encoded text. A single "ø" is encoded as =F8.
EDIT: I get this problem only in production, and this thread seems to talk about the same problem.
If anyone else here on Stack Overflow is doing form submits with blobs and åæøè characters, please respond to this thread on how you have solved it!
Ok, after two days working with this issue I finally resolved it. It is seemingly a bug in Google App Engine that messes up the encoding in production. In production the text is sometimes quoted-printable encoded and at other times base64 encoded. Weird. Here is my solution:
import base64, os  # needed at module level
postBody = self.request.get('body')
postBody = postBody.encode('iso-8859-1')
DEBUG = os.environ['SERVER_SOFTWARE'].startswith('Dev')
if DEBUG:
    r.body = postBody
else:
    # Restore the '=' padding that base64 requires before decoding
    postBody += "=" * ((4 - len(postBody) % 4) % 4)
    b64 = base64.urlsafe_b64decode(postBody)
However, the resulting b64 can't be stored in the datastore as-is, because it is not ASCII-encoded:
'ascii' codec can't decode byte 0xe5 in position 5: ordinal not in range(128)
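The 'ascii' codec error suggests that the decoded value is a byte string containing non-ASCII bytes. One possible way around it is to decode those bytes into a unicode string before storing them. The sketch below is written for Python 3 and assumes the bytes are ISO-8859-1, matching the encode() call above; `decode_form_text` is a hypothetical helper name. The sample input "xdjG" is the value reported in the question for "ÅØÆ".

```python
import base64

def decode_form_text(encoded):
    """Decode URL-safe base64 form text into a unicode string."""
    # Restore the '=' padding that base64 requires.
    padded = encoded + "=" * ((4 - len(encoded) % 4) % 4)
    raw = base64.urlsafe_b64decode(padded)
    # Assumption: the original bytes are ISO-8859-1.
    return raw.decode("iso-8859-1")

text = decode_form_text("xdjG")
print(text)
```

That "xdjG" decodes back to the submitted text indicates the value in the question was base64 rather than a sum of quoted-printable codes.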
I solved a similar problem by using the Python quopri module to decode the string before passing it to an HTML template.
import quopri
body = quopri.decodestring(body)
This seems to have something to do with the multipart/form-data enctype. Quoted-printable encoding is applied to the textarea input, which is then, in my case, submitted via a blobstore upload link. The blobstore returns the text to my upload handler still in encoded form.
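A slightly fuller sketch of the quopri approach, written for Python 3, where quopri.decodestring works on bytes. The helper name `decode_quoted_printable` and the default ISO-8859-1 charset are assumptions; the charset is chosen because the question reports "ø" arriving as =F8.

```python
import quopri

def decode_quoted_printable(encoded, charset="iso-8859-1"):
    """Decode quoted-printable text and return it as a unicode string."""
    # Quoted-printable data is pure ASCII, so this encode() is safe.
    raw = quopri.decodestring(encoded.encode("ascii"))
    return raw.decode(charset)

single = decode_quoted_printable("=F8")
several = decode_quoted_printable("=C5=D8=C6")
```

The result is an ordinary string that can be passed to the template as {{ body }}.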
Not sure what Quoted Printables are but have you tried safe?
{{ body|safe }}
https://docs.djangoproject.com/en/dev/ref/templates/builtins/?from=olddocs#safe

How to include Non-AscII character in python appengine send html mail

My problem is that I want to compose an email in python environment of google appengine.
When I add Greek characters to the body of my message I get:
SyntaxError: Non-ASCII character '\xce'
message.html = """
<html>
<body>
παραδειγμα
</body>
</html>"""
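Add this encoding declaration as the first or second line of the source file (it is not a shebang, but a comment that tells Python the file's encoding):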
# -*- coding: utf-8 -*-
