Avoiding UnicodeDecodeError exceptions in Python

In Python I use an HTML template to display a Steam player's information.
The template is:
'''<td>
<div>
Name: %s<br>
Hours: %s<br>
Steam Profile <br>
</div>
</td>'''
So I have TEMPLATE % (personaName, tf2Hours, id64)
Later on, the filled-in template is saved to an HTML file.
Occasionally this raises a UnicodeDecodeError, because personaName can contain non-ASCII characters.
Is there a way to avoid this while still having the correct characters in the final HTML file?
EDIT:
The error was caused by bytes in personaName that could not be decoded as ASCII.
Doing unicode(personaName, errors='ignore') solved the issue.

Try:
u'UnicodeTextHereaあä'.encode('ascii', 'ignore')
This drops any characters that can't be encoded as ASCII.
Here are a few examples that I just tried.
>>> x = 'Hello world!'
>>> y = 'notあä ascii'
>>> x.encode('ascii', 'ignore')
b'Hello world!'
>>> y.encode('ascii', 'ignore')
b'not ascii'
As you can see, it removed every trace of the non-ASCII characters, so they will also be missing from the final HTML file.
Alternatively, you can tell Python which encoding to use when it reads the file. For example (from docs.python.org/3.3/howto/unicode.html),
with open('unicode.txt', encoding='utf-8') as f:
    for line in f:
        print(repr(line))
This decodes the file as UTF-8, so you get the Unicode text as-is.
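If the goal is to keep the original characters in the saved HTML rather than drop them, one option is to decode the name to text once and write the file as UTF-8. The following is only a minimal Python 3 sketch: the two-field TEMPLATE, the helper names to_text and write_player_cell, and the assumption that the name arrives as UTF-8 bytes are all illustrative and not taken from the question's code.

```python
import html
import os
import tempfile

# Hypothetical two-field version of the template from the question.
TEMPLATE = '''<td>
<div>
Name: %s<br>
Hours: %s<br>
</div>
</td>'''


def to_text(value, encoding='utf-8'):
    """Return value as text, decoding bytes instead of dropping characters."""
    if isinstance(value, bytes):
        # errors='replace' marks undecodable bytes instead of raising.
        return value.decode(encoding, errors='replace')
    return str(value)


def write_player_cell(path, persona_name, tf2_hours):
    """Fill the template and save it as a UTF-8 encoded HTML fragment."""
    # Escape the name so characters like < and & cannot break the markup.
    name = html.escape(to_text(persona_name))
    page = TEMPLATE % (name, to_text(tf2_hours))
    with open(path, 'w', encoding='utf-8') as f:
        f.write(page)
    return page


# Usage: the name is assumed to arrive as UTF-8 bytes, e.g. from a web API.
out_path = os.path.join(tempfile.mkdtemp(), 'player.html')
write_player_cell(out_path, 'notあä ascii'.encode('utf-8'), 42)
```

The page that embeds this fragment would also need to declare UTF-8 (for example with a meta charset tag) for the browser to show the characters correctly.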

Related

Display contents of a text file in Django template

I am attempting to create a file in a simple website, read its contents into a variable inside a Django view function, and pass that variable to the template to be displayed on the web page.
However, when I print the variable, it appears on cmd exactly as it is in the original text file, but the output on the web page has no formatting and appears as a single string.
I've been stuck on this for two days.
I'm also relatively new to Django and am teaching it to myself.
file1 = open(r'status.txt','w',encoding='UTF-8')
file1.seek(0)
for i in range(0,len(data.split())):
    file1.write(data.split()[i] + " ")
    if i%5==0 and i!=0 and i!=5:
        file1.write("\n")
file1.close()
file1 = open(r'status.txt',"r+",encoding='UTF-8')
d = file1.read()
print(d) #prints on cmd in the same formatting as in text file
return render(request,'status.html',{'dat':d}) #the html displays it only as a single text string
<body>
{% block content %}
{{dat}}
{% endblock %}
</body>
Use the linebreaks filter in your template. It converts the \n characters in the text into <br> and <p> tags.
Use it like this:
{{ dat|linebreaks }}
From the docs:
Replaces line breaks in plain text with appropriate HTML; a single
newline becomes an HTML line break (<br>) and a new line followed by a
blank line becomes a paragraph break (</p>).
You can use linebreaksbr if you don't want the <p> tags.
It's because a line break in HTML is <br>, while in Python it is \n. You should convert it before rendering:
mytext = "<br />".join(mytext.split("\n"))
Depending on your needs and the file format you want to print, you may also want to check the <pre> HTML tag.
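For the convert-before-rendering approach, here is a small sketch using only the standard library. The function name newlines_to_br and the sample text are made up for illustration. Note that Django autoescapes template variables by default, so a string built this way would have its <br> tags escaped unless it is marked safe (for example with the safe filter); that is one reason the linebreaks filters are usually the simpler route.

```python
import html


def newlines_to_br(text):
    """Escape text for HTML, then turn each line break into a <br> tag."""
    # Escape first so characters like < and & from the file stay literal.
    escaped = html.escape(text)
    # splitlines() handles \n, \r\n and \r alike.
    return "<br>\n".join(escaped.splitlines())


# Usage with a small sample standing in for the contents of status.txt.
sample = "all systems <ok>\nlast check: 5 & counting\n"
converted = newlines_to_br(sample)
print(converted)
```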

Python shell - new line

I'm following a tutorial in a book and I'm having some issues with the "\n".
Here is the code that I am asked to type in Python shell:
from django.template import Template, Context
template = Template(
'{{ ml.exclaim }}!\n'
'she said {{ ml.adverb }}\n'
'as she jumped into her convertible {{ ml.noun1 }}\n'
'and drove off with her {{ ml.noun2 }}.\n'
)
mad_lib = {
'exclaim':'Ouch',
'adverb':'dutifully',
'noun1':'boat',
'noun2':'pineapple',
}
context = Context({'ml': mad_lib})
template.render(context)
So whenever I enter this into the Python shell, it returns the whole string at once:
u'Ouch!\nshe said dutifully\nas she jumped into her convertible boat\nand drove off with her pineapple.\n'
I'd like to have it come out like this all on separate lines:
Ouch!
she said dutifully
as she jumped into her convertible boat
and drove off with her pineapple.
All help is appreciated.
String literals that appear next to each other are automatically concatenated in Python, so
>>> s1 = "hello " "world"
>>> s2 = "hello world"
>>> s3 = "hello " \
... "world"
>>> s1 == s2 == s3
True
Your rendered string is exactly the result you should expect.
But I think you are just asking how to make \n actually break the line. The shell echoes the repr of the returned value, so just print it instead:
print template.render(context)
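The difference between what the shell echoes and what print writes can be seen without Django at all. This is a small Python 3 sketch in which the rendered string is typed in by hand to stand in for the result of template.render(context).

```python
# Stand-in for the value returned by template.render(context).
rendered = (
    'Ouch!\n'
    'she said dutifully\n'
    'as she jumped into her convertible boat\n'
    'and drove off with her pineapple.\n'
)

# The interactive shell echoes the repr of a bare expression, in which
# every newline is shown as the two characters \ and n.
shell_echo = repr(rendered)
print(shell_echo)

# print() writes the text itself, so every \n ends a line.
print(rendered, end='')

# The same text split into its individual lines.
lines = rendered.splitlines()
```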
Use the autoescape template tag to control the escaping of HTML line break (<br />) tags.
In addition, since your text does not contain HTML line breaks, you have to convert its newlines to HTML line breaks using the linebreaksbr Django template filter, as suggested in this answer.
Note: this only works when the output is rendered as HTML.

python script to convert " into a html tag

I would like to add two html tags to this file. Each line ends with <br> which is done via something like
>>> f = open("/tmp/x","r")
>>> con = f.readlines()
>>> for line in con:
...     print line + "<br>"
...
Now I would like to replace each pair of " characters with the HTML tags <h3> and </h3>. For example, I have a file with this content:
This is test file named "file.txt" and below is the way to
understand its data and values etc
This is line "one" is also "tricky" to change
The expected output is:
This is test file named <h3>file.txt</h3> and below is the way to <br>
understand its data and values etc<br>
This is line <h3>one</h3> is also <h3>tricky</h3> to change <br>
I'm thinking about keeping a flag that counts the occurrences of ": if the count is odd, use <h3>, otherwise use </h3>, or something like that. If you have any other solution, please suggest it.
With import re at the top of your code,
line = re.sub(r'"([^"]*)"', r'<h3>\1</h3>', line)
should do the substitutions you show in your examples.
This doesn't catch an "extra/odd/spare" occurrence of ", only pairs of "s -- if you need to do some other substitution to the "odd" double-quote if any, that's easily arranged as a next step of processing for line.
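Put together with the <br> step from the question, the substitution might look like the following Python 3 sketch. The function name quotes_to_h3 is made up here, and the sample lines are copied from the question rather than read from /tmp/x.

```python
import re

# Matches a pair of double quotes and captures the text between them.
QUOTED = re.compile(r'"([^"]*)"')


def quotes_to_h3(line):
    """Wrap each "quoted" span in <h3> tags and append <br>."""
    # Drop the trailing newline so that <br> directly follows the text.
    return QUOTED.sub(r'<h3>\1</h3>', line.rstrip('\n')) + '<br>'


sample = [
    'This is test file named "file.txt" and below is the way to\n',
    'understand its data and values etc\n',
    'This is line "one" is also "tricky" to change\n',
]
converted = [quotes_to_h3(line) for line in sample]
for line in converted:
    print(line)
```

A lone, unpaired " is left untouched by this pattern, as the answer points out.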

How to properly decode Quoted Printable encoding in Django HTML Template

I have a form on Google App Engine (Python) that POSTs text to a server, and the text arrives quoted-printable encoded.
My code for POSTing is this:
<form action={{ upload_url }} method="post" enctype="multipart/form-data">
<div class="sigle-form"><textarea name="body" rows="5"></textarea></div>
<div class="sigle-form"><input name="file" type="file" /></div>
</form>
The result of fetching self.request.get('body') is then quoted-printable encoded. I store it in a db.TextProperty() and later send the text to an HTML template using Django. When I write out the variable using {{ body }}, the result is still quoted-printable encoded, and there does not seem to be a way of decoding it in the Django HTML template.
Is there any way of sending the text in the body with an encoding other than quoted-printable? If not, how do I decode it in the Django HTML template?
Submitting the text "ÅØÆ" gives "xdjG", so the quoted-printable codes seem to be combined somehow as well. This happens when more than one special character is present in the encoded text. A single "ø" is encoded as =F8.
EDIT: I get this problem only in production, and this thread seems to talk about the same problem.
If anyone else here on Stack Overflow is doing form submits with blobs and åæøè characters, please respond to this thread with how you solved it!
OK, after two days of working on this issue I finally resolved it. It seems to be a bug in Google App Engine that messes up the encoding in production. In production the text is sometimes quoted-printable encoded and at other times base64 encoded. Weird. Here is my solution:
postBody = self.request.get('body')
postBody = postBody.encode('iso-8859-1')
DEBUG = os.environ['SERVER_SOFTWARE'].startswith('Dev')
if DEBUG:
    r.body = postBody
else:
    postBody += "=" * ((4 - len(postBody) % 4) % 4)
    b64 = base64.urlsafe_b64decode(postBody)
However, the resulting b64 can't be stored in the datastore as-is, because it contains non-ASCII bytes:
'ascii' codec can't decode byte 0xe5 in position 5: ordinal not in range(128)
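The ascii codec error suggests the decoded bytes still need to be turned into text before they are stored. A possible Python 3 sketch of the whole step follows. The function name decode_b64_body is hypothetical, and the iso-8859-1 charset is an assumption based on the encode call above and on ø being sent as =F8.

```python
import base64


def decode_b64_body(body, charset='iso-8859-1'):
    """Restore stripped padding, base64-decode, and return text."""
    # Base64 input must be a multiple of 4 characters long.
    padded = body + '=' * (-len(body) % 4)
    raw = base64.urlsafe_b64decode(padded)
    # Decoding the bytes explicitly avoids the implicit ascii codec.
    return raw.decode(charset)


# "xdjG" is the value the question reports for the submitted text.
text = decode_b64_body('xdjG')
print(text)
```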
I solved a similar problem by using the Python quopri module to decode the string before passing it to an HTML template.
import quopri
body = quopri.decodestring(body)
This seems to have something to do with the multipart/form-data enctype. Quoted-printable encoding is applied to the textarea input, which is then, in my case, submitted via a blobstore upload link. The blobstore returns the text to my upload handler still in encoded form.
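In Python 3, quopri.decodestring works on bytes, so the result still has to be decoded to text before it reaches a template. A small sketch follows, where the helper name decode_qp_body and the iso-8859-1 charset are assumptions (the charset matches the =F8 reported for ø in the question).

```python
import quopri


def decode_qp_body(body, charset='iso-8859-1'):
    """Decode a quoted-printable string and return it as text."""
    if isinstance(body, str):
        # Quoted-printable data is plain ASCII by design.
        body = body.encode('ascii')
    return quopri.decodestring(body).decode(charset)


single = decode_qp_body('=F8')
several = decode_qp_body('=C5=D8=C6')
print(single, several)
```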
Not sure what Quoted Printables are but have you tried safe?
{{ body|safe }}
https://docs.djangoproject.com/en/dev/ref/templates/builtins/?from=olddocs#safe

Printing HTML in Python CGI

I've been teaching myself python and cgi scripting, and I know that your basic script looks like
#!/usr/local/bin/python
import cgi
print "Content-type: text/html"
print
print "<HTML>"
print "<BODY>"
print "HELLO WORLD!"
print "</BODY>"
print "</HTML>"
My question is: if I have a big HTML file I want to display from Python (it has lines and lines of code and some JS in it), do I have to manually add 'print' in front of each line and turn each " into \", etc.? Or is there a method or script that could convert it for me?
Thanks!
Python supports multiline strings, so you can print out your text in one big blurb.
print '''<html>
<head><title>My first Python CGI app</title></head>
<body>
<p>Hello, 'world'!</p>
</body>
</html>'''
They support all string operations, including methods (.upper(), .translate(), etc.) and formatting (%), as well as raw mode (r prefix) and the u unicode prefix.
If that big html file is called (for example) 'foo.html' and lives in the current directory for your CGI script, then all you need as your script's body is:
print "Content-type: text/html"
print
with open('foo.html') as f:
    print f.read()
If you're stuck with Python 2.5, add from __future__ import with_statement as the start of your module's body. If you're stuck with an even older Python, change the last two lines into
print open('foo.html').read()
Note that you don't need to import cgi when you're using none of the functionality of the cgi module, which is the case both in your example and in this answer.
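For reference, the same idea in Python 3 could look roughly like the sketch below. The function name build_cgi_response is made up, the UTF-8 charset is an assumption about how foo.html is saved, and the throwaway file only exists to make the example self-contained.

```python
import os
import sys
import tempfile


def build_cgi_response(html_path):
    """Return the header, the blank separator line, and the file's HTML."""
    with open(html_path, encoding='utf-8') as f:
        body = f.read()
    # A blank line must separate the CGI headers from the body.
    return 'Content-Type: text/html; charset=utf-8\n\n' + body


# Usage with a throwaway file standing in for foo.html.
demo_path = os.path.join(tempfile.mkdtemp(), 'foo.html')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write('<html><body><p>Hello, "world"!</p></body></html>\n')

response = build_cgi_response(demo_path)
sys.stdout.write(response)
```

Because the HTML is read from a file, quotes and JavaScript inside it need no escaping at all.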
When I was first experimenting with decorators, I wrote this little CGI decorator to handle the HTML head and body tag boilerplate stuff. So that you can just write:
@CGImethod(title="Hello with Decorator")
def say_hello():
    print '<h1>Hello from CGI-Land</h1>'
which when called returns:
Content-Type: text/html
<HTML>
<HEAD><TITLE>Hello with Decorator</TITLE></HEAD>
<BODY>
<h1>Hello from CGI-Land</h1>
</BODY></HTML>
Then say_hello could be called from your HTTP server's do_GET or do_POST methods.
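The decorator itself is not shown in the answer. Purely as an illustration of how such a decorator might be written in Python 3, here is a hypothetical reconstruction; the name cgi_method and everything inside it are assumptions, not the answerer's code.

```python
import functools
import io
from contextlib import redirect_stdout


def cgi_method(title):
    """Hypothetical decorator: wrap a function's output in an HTML page."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            print('Content-Type: text/html')
            print()  # blank line ends the headers
            print('<HTML>')
            print('<HEAD><TITLE>%s</TITLE></HEAD>' % title)
            print('<BODY>')
            func(*args, **kwargs)  # the wrapped function prints the body
            print('</BODY></HTML>')
        return wrapper
    return decorator


@cgi_method(title='Hello with Decorator')
def say_hello():
    print('<h1>Hello from CGI-Land</h1>')


# Capture the output so it can be inspected; a real CGI script would
# simply call say_hello() and let it write to stdout.
captured = io.StringIO()
with redirect_stdout(captured):
    say_hello()
page = captured.getvalue()
```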
Python supports multiline strings, so you can just copy your HTML code and paste it between triple quotes.
print ('''<html>
<head>
<title>
</title>
</head>
</html>''')
and so on!
