For HTML5 and Python CGI:
If I include the UTF-8 meta tag, my code doesn't work.
If I leave it out, it works.
The page encoding is UTF-8.
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
şöğıçü
</body>
</html>
""")
This code doesn't work.
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head></head>
<body>
şöğıçü
</body>
</html>
""")
But this code works.
For CGI, using print() requires that the correct codec has been set up for output. print() writes to sys.stdout, which has been opened with a specific encoding. How that encoding is determined is platform dependent and can differ based on how the script is run. Running your script as a CGI script means you pretty much do not know what encoding will be used.
In your case, the web server has set the locale for text output to a fixed encoding other than UTF-8. Python uses that locale setting to produce output in that encoding. Without the <meta> tag, your browser correctly guesses that encoding (or the server has communicated it in the Content-Type header); with the <meta> tag, you are telling it to use a different encoding, one that is incorrect for the data produced.
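To make the mismatch concrete, here is a small sketch. It assumes the server locale uses ISO-8859-9 (Turkish); that is only a guess for illustration, and the actual encoding on your server may differ.

```python
text = "şöğıçü"

# Assumption: the server locale encodes output as ISO-8859-9 (Turkish Latin-5).
legacy_bytes = text.encode("iso-8859-9")
utf8_bytes = text.encode("utf-8")

# What a browser shows when it is told "UTF-8" but receives the legacy bytes:
# every byte is invalid as UTF-8 and becomes a replacement character.
seen_with_meta = legacy_bytes.decode("utf-8", errors="replace")

# What it shows when it guesses (or is told) the encoding actually used.
seen_without_meta = legacy_bytes.decode("iso-8859-9")
```

The same text produces different byte sequences in the two encodings, so the declared charset has to match the bytes that are really sent.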
You can write directly to sys.stdout.buffer, after explicitly encoding to UTF-8. Make a helper function to make this easier:
import sys
def enc_print(string='', encoding='utf8'):
    # Encode explicitly and write the bytes, bypassing the locale codec.
    sys.stdout.buffer.write(string.encode(encoding) + b'\n')
enc_print("Content-type:text/html")
enc_print()
enc_print("""
<!doctype html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
şöğıçü
</body>
</html>
""")
Another approach is to replace sys.stdout with a new io.TextIOWrapper() object that uses the codec you need:
import sys
import io
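def set_output_encoding(codec, errors='strict'):
    # Read line_buffering before detach(); the old wrapper is unusable afterwards.
    line_buffering = sys.stdout.line_buffering
    sys.stdout = io.TextIOWrapper(
        sys.stdout.detach(), encoding=codec, errors=errors,
        line_buffering=line_buffering)
set_output_encoding('utf8')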
print("Content-type:text/html")
print()
print("""
<!doctype html>
<html>
<head></head>
<body>
şöğıçü
</body>
</html>
""")
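On Python 3.7 and later, io.TextIOWrapper also has a reconfigure() method, which changes the encoding of an existing stream in place. A minimal sketch, where use_utf8 is a hypothetical helper name and the fallback for older versions is simply to report failure:

```python
import io
import sys


def use_utf8(stream):
    """Switch a text stream to UTF-8 in place; return False if unsupported."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Python < 3.7, or a stream type without reconfigure().
        return False
    reconfigure(encoding="utf-8")
    return True


# In a CGI script you would call this once, before the first print():
#     use_utf8(sys.stdout)
```

If it returns False, fall back to one of the two approaches shown above.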
From https://ru.stackoverflow.com/a/352838/11350
First, don't forget to declare the source encoding at the top of the file:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
Then try
import sys
import codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
Or, if you use Apache 2, add this to your configuration:
AddDefaultCharset UTF-8
SetEnv PYTHONIOENCODING utf8
Related
I run SimpleHTTPServer in Python3.6.4 64bit by this command:
python -m http.server --cgi
Then I made a form in test.py that submits to test_form_action.py, which prints the input text.
cgi-bin/test.py
# coding=utf-8
from __future__ import unicode_literals, absolute_import
print("Content-Type: text/html") # HTML is following
print()
reshtml = '''<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html" charset="utf-8"/>
</head>
<body>
<div style="text-align: center;">
<form action="/cgi-bin/test_form_action.py" method="POST"
target="_blank">
输入:<input type="text" id= "id" name="name"/></td>
<button type="submit">Submit</button>
</form>
</div>
</body>
</html>'''
print(reshtml)
cgi-bin/test_form_action.py
# coding=utf-8
from __future__ import unicode_literals, absolute_import
# Import modules for CGI handling
import cgi, cgitb
cgitb.enable()
if __name__ == '__main__':
    print("Content-Type: text/html") # HTML is following
    print()
    form = cgi.FieldStorage()
    print(form)
    id = form.getvalue("id")
    name = form.getvalue("name")
    print(id)
When I visit http://127.0.0.1:8000/cgi-bin/test.py,
the Chinese characters "输入" are not displayed correctly; they look like "����".
I have to manually change the Text Encoding of this page from
"Unicode" to "Chinese Simplified" in Firefox to make the Chinese characters look normal.
That's odd, since I put charset="utf-8" in cgi-bin/test.py.
Furthermore, when I enter some Chinese text in the form and submit it, the page from cgi-bin/test_form_action.py is blank.
Meanwhile, an error shows up in the Windows terminal where I run the server:
127.0.0.1 - - [23/Mar/2018 23:43:32] b'Error in sys.excepthook:\r\nTraceback (most recent call last):\r\n File
"E:\Python\Python36\Lib\cgitb.py", line 268, in __call__\r\n
self.handle((etype, evalue, etb))\r\n File
"E:\Python\Python36\Lib\cgitb.py", line 288, in handle\r\n
self.file.write(doc + \'\n\')\r\nUnicodeEncodeError: \'gbk\' codec
can\'t encode character \'\ufffd\' in position 1894: illegal
multibyte sequence\r\n\r\nOriginal exception was:\r\nTraceback (most
recent call last):\r\n File
"G:\Python\Project\VideoHelper\cgi-bin\test_form_action.py", line
13, in <module>\r\n print(form)\r\nUnicodeEncodeError: \'gbk\'
codec can\'t encode character \'\ufffd\' in position 52: illegal
multibyte sequence\r\n'
127.0.0.1 - - [23/Mar/2018 23:43:32] CGI script exit status 0x1
When you call the print() function, Python converts the strings to bytes, i.e. it encodes them using a default codec.
The choice of this default value depends on the environment – in your case it seems to be GBK (judging from the error message).
In the HTML page your CGI script returns, you specify the codec ("charset") as UTF-8.
You can of course change this to GBK, but it will only solve your first problem (display of test.py), not the second one (encoding error in test_form_action.py).
Instead, it's probably better to get Python to send UTF-8-encoded data on STDOUT.
One approach is to replace all occurrences of
print(x)
with
sys.stdout.buffer.write(x.encode('utf8'))
Alternatively, you can replace sys.stdout with a re-encoded wrapper, without changing the print() occurrences:
sys.stdout = open(sys.stdout.buffer.fileno(), 'w', encoding='utf8')
Note: These two solutions don't work in Python 2.x (you'd have to omit the .buffer part there).
I'm writing this because your code has from __future__ import statements, which have no use in code that is run with Python 3 exclusively.
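It can also help to declare the charset in the HTTP header itself, so the browser does not have to rely on the meta tag alone. A minimal sketch, where build_response and send are hypothetical helper names, not part of any library:

```python
import sys


def build_response(body_html, encoding="utf-8"):
    """Return the complete CGI response (header block plus body) as bytes."""
    header = "Content-Type: text/html; charset={}\r\n\r\n".format(encoding)
    return header.encode("ascii") + body_html.encode(encoding)


def send(body_html):
    # Write bytes directly, so the locale codec (GBK here) is never involved.
    sys.stdout.buffer.write(build_response(body_html))
    sys.stdout.buffer.flush()
```

Calling send('<p>输入</p>') then emits UTF-8 bytes regardless of the console or locale encoding.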
I am trying to use Bottle.py to get input from users in a web page.
Everything works fine except when the input contains Latin characters (mostly accented ones). I have tried declaring utf-8 and latin-1 as the source encoding on the first two lines of the code, but it doesn't help.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import bottle
@bottle.post('/newpost')
def post_newpost():
    subject = bottle.request.forms.get("subject")
    body = bottle.request.forms.get("body")
    tags = bottle.request.forms.get("tags")
and the html code from the page is:
<html>
<head>
<meta charset="utf-8" />
<title>New Posts</title>
</head>
<body>
<form action="/newpost" method="POST">
<h2>Post title</h2>
<input type="text" name="subject" size="120" value="{{subject}}" ><br>
<h2>Post</h2>
<textarea name="body" cols="120" rows="20">{{body}}</textarea><br>
<h2>Tags</h2>
<input type="text" name="tags" size="120" value="{{tags}}"><br>
<p>
<input type="submit" value="Submit">
</body>
</html>
I read in the Bottle documentation that:
In Python 3 all strings are unicode, but HTTP is a byte-based wire
protocol. The server has to decode the byte strings somehow before
they are passed to the application. To be on the safe side, WSGI
suggests ISO-8859-1 (aka latin1), a reversible single-byte codec that
can be re-encoded with a different encoding later. Bottle does that
for FormsDict.getunicode() and attribute access, but not for the
dict-access methods. These return the unchanged values as provided by
the server implementation, which is probably not what you want.
request.query['city']
'GÃ¶ttingen' # An utf8 string provisionally decoded as ISO-8859-1 by the server
request.query.city
'Göttingen' # The same string correctly re-encoded as utf8 by bottle
If you need the whole dictionary with correctly decoded values (e.g. for WTForms), you can call FormsDict.decode() to get a re-encoded copy.
After reading that, I tried to use that function, but I don't know how.
Right now the Bottle form returns strings, so I cannot use encode('utf-8') or decode('utf-8').
Please help me!
Thanks!
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import bottle
@bottle.post('/newpost')
def post_newpost():
    # Attribute access returns values re-encoded as UTF-8 by Bottle.
    subject = bottle.request.forms.subject
    body = bottle.request.forms.body
    tags = bottle.request.forms.tags
That will do it. Thanks!
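For anyone wondering what the re-encoding amounts to, it can be reproduced with plain Python strings. This is only an illustration of the latin1 round trip described in the Bottle documentation, not Bottle's own code; wsgi_decode and recode are hypothetical names.

```python
def wsgi_decode(raw_bytes):
    """What the server does provisionally: decode the bytes as ISO-8859-1."""
    return raw_bytes.decode("latin-1")


def recode(value, encoding="utf-8"):
    """Undo the provisional decoding and decode with the real encoding."""
    return value.encode("latin-1").decode(encoding)


sent_by_browser = "Göttingen".encode("utf-8")
provisional = wsgi_decode(sent_by_browser)   # mojibake: 'GÃ¶ttingen'
fixed = recode(provisional)                  # 'Göttingen'
```

Because ISO-8859-1 maps every byte to exactly one character, the round trip loses nothing, which is why a str can still be repaired after the fact.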
I found the cgi module; it lets me use HTML tags inside a Python script.
I've seen some topics here that show how to use it, but when I use it, it doesn't work.
import cgi
print ("""
<html>
<body>
Hello
</body>
</html>
""")
And this is the output when I run the script:
<html>
<body>
Hello
</body>
</html>
How can I use this properly?
Thanks.
If you have your CGI script already hooked up to a web server, you will need to emit the HTTP headers too, e.g.
print("Content-Type: text/html") # HTML is following
print() # blank line, end of headers
print ("""
<html>
<body>
Hello
</body>
</html>
""")
Note that the cgi module is not being used in any way to achieve this; just simple calls to print(). The module is useful when you want to process form data submitted by a client through an HTML form.
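As a rough sketch of that form-processing side: the cgi module was deprecated in Python 3.11 and removed in 3.13, so this example uses urllib.parse instead. It only handles GET parameters taken from the QUERY_STRING environment variable; read_query is a hypothetical helper name.

```python
import os
from urllib.parse import parse_qs


def read_query(environ=os.environ):
    """Parse the CGI query string into a dict of name -> list of values."""
    query = environ.get("QUERY_STRING", "")
    # parse_qs percent-decodes the values as UTF-8 by default.
    return parse_qs(query, keep_blank_values=True)
```

For example, read_query({"QUERY_STRING": "name=Hello"}) returns {"name": ["Hello"]}. POST data arrives on standard input instead and would need to be read separately.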
I'm getting an internal server error (500), "End of script output before headers", when trying to run a CGI script using:
XAMPP Apache on Windows
Python 3.3
Notepad++ with UNIX Style (\n) newline chars
My script reads as follows
#!"C:\Python33\python.exe"
import cgi
def htmlTop():
    print("Content-type: text/html")
    print()
    print("""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>My Server Side Test</title>
</head>
<body>""")
def htmlTail():
    print("""</body>
</html>""")
if ___name___ == "__main__":
    try:
        htmlTop()
        print("Hello World")
        htmlTail()
    except:
        cgi.print_exception()
Please note I have tried using print("Content-type: text/html\n\n") as opposed to the extra print statement. Thanks!
I know this is an old post, but I found the errors in your script.
The first mistake is the quotes around the interpreter path in the shebang line, so
#!"C:\Python33\python.exe" should be changed to #!C:\Python33\python.exe
The second mistake is the extra underscores: there are three underscores on each side of ___name___, so the script fails with a NameError outside the try block, before any header is printed.
___name___ should be changed to __name__
The final code should be:
#!C:\Python33\python.exe
import cgi
def htmlTop():
    print("Content-type: text/html")
    print()
    print("""<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8"/>
<title>My Server Side Test</title>
</head>
<body>""")
def htmlTail():
    print("""</body>
</html>""")
if __name__ == "__main__":
    try:
        htmlTop()
        print("Hello World")
        htmlTail()
    except:
        cgi.print_exception()
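When chasing this error it can help to run the script from a command prompt, capture what it prints, and check that the output really starts with a header block followed by a blank line. A minimal sketch of such a check; split_cgi_output is a hypothetical helper, not an Apache or Python API:

```python
def split_cgi_output(raw):
    """Split raw CGI output (bytes) into (headers dict, body bytes).

    Raises ValueError if the output does not begin with a valid header block.
    """
    head, sep, body = raw.replace(b"\r\n", b"\n").partition(b"\n\n")
    if not sep:
        raise ValueError("no blank line terminating the headers")
    headers = {}
    for line in head.split(b"\n"):
        name, colon, value = line.partition(b":")
        if not colon:
            raise ValueError("not a header line: %r" % line)
        headers[name.strip().lower().decode("ascii")] = value.strip().decode("ascii")
    if "content-type" not in headers:
        raise ValueError("missing Content-Type header")
    return headers, body
```

If the script crashes before printing anything, or prints a traceback first, this raises ValueError, which corresponds to the situation Apache reports as "End of script output before headers".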
How can I compress (minify) HTML from Python? I know I can use some regexes to strip whitespace and other things, but I want a real minifier written in pure Python (so it can be used on Google App Engine).
I tested an online HTML compressor and it reduced the HTML size by 65%. I want that, but from Python.
You can use htmlmin to minify your html:
import htmlmin
html = """
<!DOCTYPE html>
<html lang="en">
<head>
<title>Bootstrap Case</title>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/css/bootstrap.min.css">
<script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"></script>
<script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.7/js/bootstrap.min.js"></script>
</head>
<body>
<div class="container">
<h2>Well</h2>
<div class="well">Basic Well</div>
</div>
</body>
</html>
"""
minified = htmlmin.minify(html, remove_empty_space=True)
print(minified)
htmlmin and html_slimmer are simple HTML minifying tools for Python. I have millions of HTML pages stored in my database, and by running htmlmin I am able to reduce the page size by between 5 and 50%. Neither of them does an optimal job of complete HTML minification (e.g. the font color #000000 could be reduced to #000), but it's a good start. I use a try/except block that runs htmlmin first and falls back to html_slimmer if that fails: htmlmin seems to provide better compression, but it does not support non-ASCII characters.
Example Code:
import htmlmin
from slimmer import html_slimmer # or xhtml_slimmer, css_slimmer
try:
    html=htmlmin.minify(html, remove_comments=True, remove_empty_space=True)
except Exception:
    # Fall back to html_slimmer when htmlmin fails (e.g. on non-ASCII input).
    html=html_slimmer( html.strip().replace('\n',' ').replace('\t',' ').replace('\r',' ') )
Good Luck!
I suppose there is no real need to minify your HTML on GAE, since GAE already gzips it; see Caching & GZip on GAE (Community Wiki).
I did not test it, but once both versions are gzipped, the minified version will probably save only about 1% of the size, because minification mostly just removes whitespace.
If you want to save storage, for example when putting pages in memcached, gzipping (even at a low compression level) is likely to help more than removing whitespace: the result will probably be smaller, and it will be faster because gzip runs in C rather than in pure Python.
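Rather than guessing, the sizes can be measured on your own pages. A minimal sketch using only the standard library; the whitespace-collapsing regex is a crude stand-in for a real minifier, and compare_sizes is a hypothetical helper name:

```python
import gzip
import re


def compare_sizes(html):
    """Return byte sizes of a page: raw, whitespace-collapsed, and gzipped."""
    raw = html.encode("utf-8")
    # Crude stand-in for a minifier: collapse runs of whitespace.
    collapsed = re.sub(r"\s{2,}", " ", html).encode("utf-8")
    return {
        "raw": len(raw),
        "minified": len(collapsed),
        "raw_gzip": len(gzip.compress(raw)),
        "minified_gzip": len(gzip.compress(collapsed)),
    }
```

Comparing "raw_gzip" with "minified_gzip" for a few representative pages shows how much minification still gains once gzip is applied.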
import htmlmin
code='''<body>
Hello World
<div style='color:red;'>Hi</div>
</body>
'''
htmlmin.minify(code)
Output of the last line:
<body> Hello World <div style=color:red;>Hi</div> </body>
You can use this call to remove the remaining whitespace as well:
htmlmin.minify(code,remove_empty_space=True)
I wrote a build script that duplicates my templates into another directory and then I use this trick to tell my application to select the correct template in development mode, or in production:
DEV = os.environ['SERVER_SOFTWARE'].startswith('Development') and not PRODUCTION_MODE
TEMPLATE_DIR = 'templates/2012/head/' if DEV else 'templates/2012/output/'
Whether it is gzipped by your webserver is not really the point, you should save every byte that you can for performance reasons.
If you look at some of the biggest sites out there, they often bend the markup rules to save bytes. For example, it is common to omit the double quotes around id attribute values (this is not valid XHTML, although HTML5 permits unquoted values that contain no spaces or special characters):
<!-- Unquoted attribute value (not valid XHTML) -->
<div id=mydiv> ... </div>
<!-- Quoted attribute value -->
<div id="mydiv"> ... </div>
There are several more examples like this one, but I guess that's beyond the scope of this thread.
Back to the question, I put together a little build script that minifies your HTML, CSS and JS. Caveat: It doesn't cover the case of the PRE tag.
import os
import re
import sys
from subprocess import call
HEAD_DIR = 'templates/2012/head/'
OUT_DIR = 'templates/2012/output/'
REMOVE_WS = re.compile(r"\s{2,}").sub
YUI_COMPRESSOR = 'java -jar tools/yuicompressor-2.4.7.jar '
CLOSURE_COMPILER = 'java -jar tools/compiler.jar --compilation_level ADVANCED_OPTIMIZATIONS '
def ensure_dir(f):
    # Create the parent directory of f if it does not exist yet.
    d = os.path.dirname(f)
    if not os.path.exists(d):
        os.makedirs(d)
def getTarget(fn):
    return fn.replace(HEAD_DIR, OUT_DIR)
def processHtml(fn, tg):
    # Collapse runs of whitespace; note that this also affects <pre> content.
    with open(fn, 'r') as f:
        content = f.read()
    content = REMOVE_WS(" ", content)
    ensure_dir(tg)
    with open(tg, 'w+') as d:
        d.write(content)
    return content
def processCSS(fn, tg):
    cmd = YUI_COMPRESSOR + fn + ' -o ' + tg
    call(cmd, shell=True)
def processJS(fn, tg):
    cmd = CLOSURE_COMPILER + fn + ' --js_output_file ' + tg
    call(cmd, shell=True)
# Script starts here.
ensure_dir(OUT_DIR)
for root, dirs, files in os.walk(os.getcwd()):
    for dir in dirs:
        print("Processing", os.path.join(root, dir))
    for file in files:
        fn = root + '/' + file
        # Skip files that are already in the output directory.
        if fn.find(OUT_DIR) > 0:
            continue
        tg = getTarget(fn)
        if file.endswith('.html'):
            processHtml(fn, tg)
        if file.endswith('.css'):
            processCSS(fn, tg)
        if file.endswith('.js'):
            processJS(fn, tg)
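Regarding the PRE caveat, one possible way around it is to split the document on <pre> blocks and collapse whitespace only in the parts between them. This is a regex-based sketch, not a real HTML parser, so it assumes <pre> elements are not nested; collapse_outside_pre is a hypothetical name.

```python
import re

# Capturing group, so re.split keeps the <pre> blocks in the result list.
PRE_BLOCK = re.compile(r"(<pre\b.*?</pre>)", re.IGNORECASE | re.DOTALL)
COLLAPSE_WS = re.compile(r"\s{2,}")


def collapse_outside_pre(html):
    """Collapse whitespace runs everywhere except inside <pre>...</pre>."""
    parts = PRE_BLOCK.split(html)
    # Odd indices hold the captured <pre> blocks; leave those untouched.
    return "".join(
        part if index % 2 else COLLAPSE_WS.sub(" ", part)
        for index, part in enumerate(parts)
    )
```

The same idea could be extended to <textarea> and <script> blocks, where whitespace can also be significant.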