Jinja2 weird encoding error

I have been fighting this for a whole night...
I'm trying to use Python markdown to generate HTML files from .md files and embed them into some other HTML files.
Here is the problematic snippet:
md = markdown.Markdown(encoding="utf-8")
input_file = codecs.open(f, mode="r", encoding="utf-8") # f is the name of the markdown file
text = input_file.read()
html = md.convert(text) # html generated from the markdown file
context = {
    'css_url': url_for('static', filename='markdown.css'),
    'contents': html
}
rendered_file = render_template('blog.html', **context)
output = open(splitext(f)[0] + '.html', 'w') # write the html to disk
output.write(rendered_file)
output.close()
Here is my "blog.html" template, which is really simple:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>blog</title>
<link rel="stylesheet" href="{{ css_url }}" type="text/css" />
</head>
<body>
{{ contents }}
</body>
</html>
And yet this is what I get:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>blog</title>
<link rel="stylesheet" href="/static/markdown.css" type="text/css" />
</head>
<body>
&lt;li&gt;People who love what they are doing&lt;/li&gt;
&lt;li&gt;&lt;/li&gt;
&lt;/ol&gt;
</body>
</html>
So I'm getting those weird "&gt;" and "&lt;" sequences, even though I've already specified the encoding as 'utf-8'. What could be going wrong?
Thank you!

&lt; and &gt; have nothing to do with encoding. They are HTML entities that represent the < and > characters in your input. Jinja escapes the value automatically, so you should mark it as safe to stop that:
{{ contents|safe }}
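To see the effect in isolation, here is a minimal sketch that uses Jinja2 directly rather than Flask's render_template. It assumes an Environment created with autoescape=True, which mirrors what Flask enables by default for .html templates. The template strings and variable names are made up for illustration.

```python
from jinja2 import Environment

# Assumption: autoescape=True stands in for Flask's default behaviour
# with .html templates.
env = Environment(autoescape=True)

# Stand-in for the HTML produced by md.convert(text).
html = "<li>People who love what they are doing</li>"

# Without the filter, Jinja2 escapes the markup into HTML entities.
escaped = env.from_string("<body>{{ contents }}</body>").render(contents=html)

# With |safe, the value is inserted as-is.
safe = env.from_string("<body>{{ contents|safe }}</body>").render(contents=html)

print(escaped)
print(safe)
```

Only apply |safe to HTML you trust, because it switches off escaping for that value.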

Related

Why wont py script write file?

Shouldn't this create a file and write "asd" to it? All it does is print "fgh".
<!DOCTYPE html>
<html>
<head>
<link rel="stylesheet" href="https://pyscript.net/latest/pyscript.css" />
<script defer src="https://pyscript.net/latest/pyscript.js"></script>
</head>
<body>
<py-script>
f = open("file.txt", "a", encoding="UTF-8")
f.write("asd")
print("fgh")
</py-script>
</body>
</html>

Shibboleth (Request missing SAMLRequest or SAMLResponse form parameter)

Good morning people!
I'm trying to make a POST request to Shibboleth with the following code:
import requests
link = 'https://[IP]/Shibboleth.sso/SAML2/POST'
requisicao = requests.Session().post(url=link,verify=False)
print(requisicao)
print(requisicao.text)
However, an HTML page is returned with the following information:
<Response [500]>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<link rel="stylesheet" type="text/css" href="/shibboleth-sp/main.css" />
<title>opensaml::BindingException</title>
</head>
<body>
<h1>opensaml::BindingException</h1>
<p>The system encountered an error at Mon May 23 11:48:22 2022</p>
<p>To report this problem, please contact the site administrator at
root#localhost.
</p>
<p>Please include the following message in any email:</p>
<p class="error">opensaml::BindingException at (https://shibboleth.unifeob.edu.br/Shibboleth.sso/SAML2/POST)</p>
<p>Request missing SAMLRequest or SAMLResponse form parameter.</p>
</body>
</html>
Does anyone know which parameter is missing? Could it be SSL?

Download PDF with chrome plugin in python selenium

I'm trying to extract a PDF from a site that opens it in Chrome's native PDF viewer; its content type is application/pdf. The issue is that the URLs I get are not direct links to the PDF but to a .zul page whose JavaScript loads or fetches the PDF.
Here's my download code below:
def download_pdf(url, idx, save_dir):
    options = webdriver.ChromeOptions()
    profile = {"plugins.plugins_list": [{"enabled": False, "name": "Chrome PDF Viewer"}],
               "download.default_directory": save_dir}
    options.add_experimental_option("prefs", profile)
    driver = webdriver.Chrome("/usr/lib/chromium-browser/chromedriver", chrome_options=options)
    driver.get(url)
The problem I'm encountering with the above code is that I get the following readout from driver.page_source:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Pragma" content="no-cache" />
<meta http-equiv="Expires" content="-1" />
<title>Document Viewer</title>
<link rel="stylesheet" type="text/css" href="/eSMARTContracts/zkau/web/9776a7f0/zul/css/zk.wcs;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1"/>
<script type="text/javascript" src="/eSMARTContracts/zkau/web/9776a7f0/js/zk.wpd;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1" charset="UTF-8">
</script>
<script type="text/javascript" src="/eSMARTContracts/zkau/web/9776a7f0/js/zul.lang.wpd;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1" charset="UTF-8">
</script>
<!-- ZK 6.0.2 EE 2012072410 -->
</head>
<body>
<div id="j4AP_" class="z-temp"></div>
<script class="z-runonce" type="text/javascript">zk.pi=1;zkmx(
[0,'j4AP_',{dt:'z_2m1',cu:'/eSMARTContracts;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1',uu:'/eSMARTContracts/zkau;jsessionid=088DC94ECA6804AF717A0E997E4F1444.node1',ru:'/service/dpsweb/ViewDPSWeb.zul'},[
['zul.wnd.Window','j4AP0',{$$onSize:false,$$onMaximize:false,$$onOpen:false,$$onMinimize:false,$$onZIndex:false,$onClose:true,$$onMove:false,width:'100%',height:'100%',prolog:'\
'},[]]]]);
</script>
<noscript>
<div class="noscript"><p>Sorry, JavaScript must be enabled.<br/>Change your browser options, then try again.</p></div>
</noscript>
</body>
</html>
EDIT: Included the link

Why Pycharm print less than writing to a file?

I am testing the following code and found that the output of print is inconsistent with the text file. I have set the encoding to UTF-8. Is this a bug? How can I fix it?
import requests
url = "http://www.aastocks.com/tc/stocks/analysis/company-fundamental/financial-ratios?symbol=0001&period=4"
r = requests.get(url)
print r.content
f = open("test.txt","w")
f.write(r.content)

There is an internal limit to how many lines the run console buffer can hold. It is limited to about 15K lines.
To increase this limit, you'll have to change the idea.properties file and add a key idea.cycle.buffer.size and adjust it accordingly.
See this bug report where the solution was detailed.

I don't know the exact version of Python you are using, but I would guess it's not 3.x because you use print statements.
The problem is not with your print statement per se. Displaying very long lines (this one is 175,765 characters, roughly 176 KB) is frequently an issue, and Python, particularly on Windows, can become unreliable when printing lines that long. Instead of displaying the entire string in one statement, try breaking it into multiple parts and printing each one. You will see that there is no difference between the r.content shown on screen and what f.write stores.
Just for your confirmation you can do this after your code:
fh = open("test.txt","r")
print fh.read()
fh.close()
You will notice that there will not be a difference between this and whatever is shown by the previous print statement.
I tried this on Python 3.4.x on Linux, and the behaviour you describe did not occur with that combination of Python and platform.
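The suggestion above to print the content in parts could be sketched as follows. This is only an illustration: iter_chunks and the 4096-character chunk size are hypothetical choices, and the content variable is a stand-in for r.content rather than the real response. The bytes are decoded once before slicing so that no multi-byte UTF-8 character is split between chunks.

```python
def iter_chunks(text, size=4096):
    """Yield successive slices of text, each at most size characters long."""
    for start in range(0, len(text), size):
        yield text[start:start + size]


# Stand-in for r.content; the real response body is not reproduced here.
content = ("公司資料, " * 3000).encode("utf-8")

# Decode the whole body first, then slice the resulting string.
text = content.decode("utf-8")

for chunk in iter_chunks(text):
    print(chunk)
```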
EDIT 1
This is what I have tried:
import requests
url = "http://www.aastocks.com/tc/stocks/analysis/company-fundamental/financial-ratios?symbol=0001&period=4"
r = requests.get(url)
print(str(r.content))
f = open("test.txt","w")
f.write(str(r.content))
f.close()
f = open("test.txt","r")
print(f.read())
f.close()
and here is the output:
http://pastebin.com/R0j0mYe5
EDIT 2
I didn't notice that the header was getting cut off. I tried it in 2.x and saw the behaviour, so there does seem to be a problem. Apparently some issue arises when the HTML is scanned and decoded for printing.
This is what I saw:
print r.content[0:500]
print "*****"
print r.content[0:1000]
This gives output like this:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="公司資料, 主要財經比率, 流動比率, 股東權益回報率, 總資產回報率, 邊際利潤率, 派息比率" /><meta name="description" content="公司資料, 財務比率, 變現能力, 償債能力
*****
</script> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <link rel="stylesheet" type="teonal.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="公司資料, 主要財經比率, 流動比率, 股東權益回報率, 總資產回報率, 邊際利潤率, 派息比率" /><meta name="description" content="公司資料, 財務比率, 變現能力, 償債能力, 投資回報, 盈利能力, 營運能力, 投資收益, 綜合全年, 綜合中期" /><meta http-equiv="X-UA-Compatible" content="IE=Edge" /> <script type="text/javascript">
As we can see, printing only the first 500 bytes gives the expected output, but errors appear when we print more.
Something strange is going on when it tries to decode the entire document.
However, in python 3.4.x I see this:
print(con[0:500]) #con = r.content
print(con[0:1000])
output:
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe4\xb8\xbb\xe8\xa6\x81\xe8\xb2\xa1\xe7\xb6\x93\xe6\xaf\x94\xe7\x8e\x87, \xe6\xb5\x81\xe5\x8b\x95\xe6\xaf\x94\xe7\x8e\x87, \xe8\x82\xa1\xe6\x9d\xb1\xe6\xac\x8a\xe7\x9b\x8a\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe7\xb8\xbd\xe8\xb3\x87\xe7\x94\xa2\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe9\x82\x8a\xe9\x9a\x9b\xe5\x88\xa9\xe6\xbd\xa4\xe7\x8e\x87, \xe6\xb4\xbe\xe6\x81\xaf\xe6\xaf\x94\xe7\x8e\x87" /><meta name="description" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe8\xb2\xa1\xe5\x8b\x99\xe6\xaf\x94\xe7\x8e\x87, \xe8\xae\x8a\xe7\x8f\xbe\xe8\x83\xbd\xe5\x8a\x9b, \xe5\x84\x9f\xe5\x82\xb5\xe8\x83\xbd\xe5\x8a\x9b'
b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe4\xb8\xbb\xe8\xa6\x81\xe8\xb2\xa1\xe7\xb6\x93\xe6\xaf\x94\xe7\x8e\x87, \xe6\xb5\x81\xe5\x8b\x95\xe6\xaf\x94\xe7\x8e\x87, \xe8\x82\xa1\xe6\x9d\xb1\xe6\xac\x8a\xe7\x9b\x8a\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe7\xb8\xbd\xe8\xb3\x87\xe7\x94\xa2\xe5\x9b\x9e\xe5\xa0\xb1\xe7\x8e\x87, \xe9\x82\x8a\xe9\x9a\x9b\xe5\x88\xa9\xe6\xbd\xa4\xe7\x8e\x87, \xe6\xb4\xbe\xe6\x81\xaf\xe6\xaf\x94\xe7\x8e\x87" /><meta name="description" content="\xe5\x85\xac\xe5\x8f\xb8\xe8\xb3\x87\xe6\x96\x99, \xe8\xb2\xa1\xe5\x8b\x99\xe6\xaf\x94\xe7\x8e\x87, \xe8\xae\x8a\xe7\x8f\xbe\xe8\x83\xbd\xe5\x8a\x9b, \xe5\x84\x9f\xe5\x82\xb5\xe8\x83\xbd\xe5\x8a\x9b, \xe6\x8a\x95\xe8\xb3\x87\xe5\x9b\x9e\xe5\xa0\xb1, \xe7\x9b\x88\xe5\x88\xa9\xe8\x83\xbd\xe5\x8a\x9b, \xe7\x87\x9f\xe9\x81\x8b\xe8\x83\xbd\xe5\x8a\x9b, \xe6\x8a\x95\xe8\xb3\x87\xe6\x94\xb6\xe7\x9b\x8a, \xe7\xb6\x9c\xe5\x90\x88\xe5\x85\xa8\xe5\xb9\xb4, \xe7\xb6\x9c\xe5\x90\x88\xe4\xb8\xad\xe6\x9c\x9f" /><meta http-equiv="X-UA-Compatible" content="IE=Edge" /> <script type="text/javascript">\rvar _gaq = _gaq || [];\r_gaq.push([\'_setAccount\', \'UA-20790503-3\']);\r_gaq.push([\'_setDomainName\', \'www.aastocks.com\']);\r_gaq.push([\'_trackPageview\']);\r_gaq.push([\'_trackPageLoadTime\']);\rfunction OA_show(name) {\r} \r</script> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <link rel="stylesheet" type="te'
But the output in 3.x is similar to 2.x if I decode the bytes as UTF-8:
print(con[0:500].decode('utf-8'))
print(con[0:1000].decode('utf-8'))
Output:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:fb="http://www.facebook.com/2008/fbml" xmlns:og="http://ogp.me/ns#"> <head id="Head1"><meta name="keywords" content="公司資料, 主要財經比率, 流動比率, 股東權益回報率, 總資產回報率, 邊際利潤率, 派息比率" /><meta name="description" content="公司資料, 財務比率, 變現能力, 償債能力
</script> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <link rel="stylesheet" type="te
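One possible explanation for the overwritten header, offered as a guess rather than something the answer confirms: the bytes output above shows \r (carriage return) characters inside the script block, and many consoles treat a carriage return as "move the cursor back to the start of the line", so later text is drawn over earlier text. The sketch below simulates that on a single line. The function name and the sample string are invented for illustration.

```python
def simulate_console_line(text):
    """Return what a console would show for one line of text, treating each
    carriage return as 'move the cursor back to column 0'."""
    shown = []
    col = 0
    for ch in text:
        if ch == "\r":
            col = 0  # later characters overwrite from the start of the line
        elif col < len(shown):
            shown[col] = ch
            col += 1
        else:
            shown.append(ch)
            col += 1
    return "".join(shown)


# Hypothetical sample shaped like the response: a long header, then a
# carriage return, then a shorter tail.
sample = "<!DOCTYPE html PUBLIC ...transitional.dtd\">\r</script> <meta"

overwritten = simulate_console_line(sample)
print(overwritten)

# Replacing carriage returns with newlines before printing keeps everything
# visible; repr(sample) would also reveal the hidden characters.
cleaned = sample.replace("\r", "\n")
print(cleaned)
```

If this is what is happening, the file on disk is complete and only the console rendering is misleading.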

webpy: how to override content of basic template?

I'm using webpy with a basic template
render = web.template.render(basedir + 'templates/', base='layout', globals=globals_vars_custom)
in layout.html I have something like:
$def with (content)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>PAGE TITLE</title>
<!-- Some CSS and some javascript -->
</head>
<body>
$:content
</body>
</html>
This basic template works fine for 90% of my site, but I have a page in which I need to insert some other data inside the <head> (some meta tags).
How can I do this? How can I put the <head> inside a structure that I can easily override inside a template?
Thanks!

It was easier than I thought:
You can define a variable in a page template, for example in page.html:
$var title = 'specific title'
In the base template layout.html you can then read the variable:
$def with (content)
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>$content.get("title","DEFAULT TITLE")</title>
<!-- Some CSS and some javascript -->
</head>
<body>
$:content
</body>
</html>
I hope this answer can be useful also for other people.
