Generating HTML documents in Python

In Python, what is the most elegant way to generate HTML documents? I currently append all of the tags manually to a giant string and write that to a file. Is there a more elegant way of doing this?

You can use yattag to do this in an elegant way. FYI I'm the author of the library.
from yattag import Doc

doc, tag, text = Doc().tagtext()

with tag('html'):
    with tag('body'):
        with tag('p', id='main'):
            text('some text')
        with tag('a', href='/my-url'):
            text('some link')

result = doc.getvalue()
It reads like html, with the added benefit that you don't have to close tags.
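The "no closing tags" benefit comes from Python's `with` statement: the opening tag is written when the block is entered and the closing tag when it exits. The sketch below is a hypothetical, stripped-down builder (`MiniDoc` is an invented name, not yattag's actual implementation) that shows the idea using only the standard library, including escaping of text content:

```python
from contextlib import contextmanager
from html import escape


class MiniDoc:
    """Hypothetical minimal builder illustrating context-manager tags (not yattag itself)."""

    def __init__(self):
        self._parts = []

    @contextmanager
    def tag(self, name, **attrs):
        # Write the opening tag now; the closing tag is written when the with-block exits.
        attr_text = ''.join(f' {key}="{escape(str(value))}"' for key, value in attrs.items())
        self._parts.append(f'<{name}{attr_text}>')
        try:
            yield
        finally:
            self._parts.append(f'</{name}>')

    def text(self, content):
        # Escape &, < and > so text can never be mistaken for markup.
        self._parts.append(escape(content, quote=False))

    def getvalue(self):
        return ''.join(self._parts)


doc = MiniDoc()
with doc.tag('html'):
    with doc.tag('body'):
        with doc.tag('p', id='main'):
            doc.text('some text & more')
        with doc.tag('a', href='/my-url'):
            doc.text('some link')

print(doc.getvalue())
```

Because the closing tag is emitted in a `finally` clause, nesting in the output always mirrors the indentation of the Python code.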

I would suggest using one of the many template languages available for Python, for example the one built into Django (you don't have to use the rest of Django to use its templating engine); a Google search will turn up plenty of alternative template implementations.
I find that learning a template library helps in so many ways - whenever you need to generate an e-mail, HTML page, text file or similar, you just write a template, load it with your template library, then let the template code create the finished product.
Here's some simple code to get you started:
#!/usr/bin/env python
import django
from django.conf import settings
from django.template import Template, Context

# We have to do this to use django templates standalone (Django 1.8+ needs a
# TEMPLATES setting and django.setup()) - see
# http://stackoverflow.com/questions/98135/how-do-i-use-django-templates-without-the-rest-of-django
settings.configure(TEMPLATES=[{'BACKEND': 'django.template.backends.django.DjangoTemplates'}])
django.setup()

# Our template. Could just as easily be stored in a separate file
template = """
<html>
<head>
<title>Template {{ title }}</title>
</head>
<body>
Body with {{ mystring }}.
</body>
</html>
"""

t = Template(template)
c = Context({"title": "title from code",
             "mystring": "string from code"})
print(t.render(c))
It's even simpler if you have templates on disk: check out the django.template.loader.render_to_string function, which loads a template from the configured template search paths, fills it with data from a dictionary and renders it to a string, all in one function call. For a template held in a string rather than a file, Engine.from_string is a comparable option.
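For comparison only: if Django is too heavy a dependency for a one-off script, the standard library's `string.Template` shows the same "write a template, then fill it with data" idea in its most basic form. It is a swapped-in technique, far more limited than Django's engine (no loops, conditionals or automatic escaping), so this sketch escapes each value itself with `html.escape`:

```python
from html import escape
from string import Template

page = Template("""<html>
<head><title>Template $title</title></head>
<body>Body with $mystring.</body>
</html>""")

values = {"title": "title from code", "mystring": "string <b>from</b> code"}

# string.Template does no HTML escaping, so escape every value before substituting.
rendered = page.substitute({key: escape(value) for key, value in values.items()})
print(rendered)
```

Once a page needs loops or conditionals, a real template engine such as Django's or Jinja2 is the better fit.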

If you're building HTML documents, then I highly suggest using a template system (like jinja2) as others have suggested. If you need some low-level generation of HTML bits (perhaps as input to one of your templates), then the xml.etree package is a standard Python package and might fit the bill nicely.
import sys
from xml.etree import ElementTree as ET

html = ET.Element('html')
body = ET.Element('body')
html.append(body)
div = ET.Element('div', attrib={'class': 'foo'})
body.append(div)
span = ET.Element('span', attrib={'class': 'bar'})
div.append(span)
span.text = "Hello World"

if sys.version_info < (3, 0, 0):
    # python 2
    ET.ElementTree(html).write(sys.stdout, encoding='utf-8',
                               method='html')
else:
    # python 3
    ET.ElementTree(html).write(sys.stdout, encoding='unicode',
                               method='html')
Prints the following:
<html><body><div class="foo"><span class="bar">Hello World</span></div></body></html>
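On Python 3 the same tree can be built a little more compactly with `ET.SubElement`, which creates a child and appends it in one step, and serialized to a string with `ET.tostring` instead of writing to `sys.stdout`. A short sketch (the text content is a placeholder chosen to show that `&` and `<` in text are escaped automatically):

```python
import xml.etree.ElementTree as ET

html = ET.Element('html')
body = ET.SubElement(html, 'body')
div = ET.SubElement(body, 'div', attrib={'class': 'foo'})
span = ET.SubElement(div, 'span', attrib={'class': 'bar'})
span.text = 'Tom & Jerry <3'  # special characters in text are escaped on output

# encoding='unicode' returns a str rather than bytes; method='html' uses HTML serialization rules.
markup = ET.tostring(html, encoding='unicode', method='html')
print(markup)
```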

There is also a nice, modern alternative: airium: https://pypi.org/project/airium/
from airium import Airium

a = Airium()

a('<!DOCTYPE html>')
with a.html(lang="pl"):
    with a.head():
        a.meta(charset="utf-8")
        a.title(_t="Airium example")

    with a.body():
        with a.h3(id="id23409231", klass='main_header'):
            a("Hello World.")

html = str(a)  # casting to string extracts the value

print(html)
Prints such a string:
<!DOCTYPE html>
<html lang="pl">
<head>
<meta charset="utf-8" />
<title>Airium example</title>
</head>
<body>
<h3 id="id23409231" class="main_header">
Hello World.
</h3>
</body>
</html>
The greatest advantage of airium is that it also has a reverse translator, which builds Python code out of an HTML string. If you wonder how to implement a given HTML snippet, the translator gives you the answer right away.
Its repository contains tests with example pages translated automatically with airium in tests/documents. A good starting point (like any existing tutorial) is this one: tests/documents/w3_architects_example_original.html.py

I would recommend using xml.dom to do this.
http://docs.python.org/library/xml.dom.html
Read this manual page; it describes methods for building up XML (and therefore XHTML). It makes all XML tasks far easier, including adding child nodes, document types and attributes, and creating text nodes. This should be able to assist you in the vast majority of things you will do to create HTML.
It is also very useful for analysing and processing existing xml documents.
Here is a tutorial that should help you with applying the syntax:
http://www.postneo.com/projects/pyxml/
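As a minimal sketch of those DOM methods, here is a small document built with `xml.dom.minidom`, the standard library's lightweight implementation of the `xml.dom` interfaces. The element names, attribute and text are placeholders. Note that minidom serializes as XML, so an element with no children comes out self-closed (`<br/>` style):

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
doc = impl.createDocument(None, 'html', None)  # no namespace, root element <html>, no doctype
html = doc.documentElement

body = doc.createElement('body')
html.appendChild(body)

p = doc.createElement('p')
p.setAttribute('class', 'intro')
p.appendChild(doc.createTextNode('Hello & welcome'))  # text nodes are escaped on output
body.appendChild(p)

# toxml() on the root element skips the <?xml ...?> declaration a Document would add.
print(html.toxml())
```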

I am using the code snippet known as throw_out_your_templates for some of my own projects:
https://github.com/tavisrudd/throw_out_your_templates
https://bitbucket.org/tavisrudd/throw-out-your-templates/src
Unfortunately, there is no pypi package for it and it's not part of any distribution as this is only meant as a proof-of-concept. I was also not able to find somebody who took the code and started maintaining it as an actual project. Nevertheless, I think it is worth a try even if it means that you have to ship your own copy of throw_out_your_templates.py with your code.
Similar to the suggestion to use yattag by John Smith Optional, this module does not require you to learn any templating language and also makes sure that you never forget to close tags or quote special characters. Everything stays written in Python. Here is an example of how to use it:
html(lang='en')[
    head[title['An example'], meta(charset='UTF-8')],
    body(onload='func_with_esc_args(1, "bar")')[
        div['Escaped chars: ', '< ', u'>', '&'],
        script(type='text/javascript')[
            'var lt_not_escaped = (1 < 2);',
            '\nvar escaped_cdata_close = "]]>";',
            '\nvar unescaped_ampersand = "&";'
        ],
        Comment('''
          not escaped "< & >"
          escaped: "-->"
        '''),
        div['some encoded bytes and the equivalent unicode:',
            '你好', unicode('你好', 'utf-8')],
        safe_unicode('<b>My surrounding b tags are not escaped</b>'),
    ]
]
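The `tag(attrs)[children]` syntax is ordinary Python: calling an object runs its `__call__` method and indexing runs `__getitem__`. The following is a hypothetical minimal sketch of that mechanism with automatic escaping (the `Tag` class is invented for illustration and is not the code from throw_out_your_templates, which handles many more cases such as comments, scripts and safe strings):

```python
from html import escape


class Tag:
    """Hypothetical minimal builder illustrating the tag(attrs)[children] syntax."""

    def __init__(self, name, attrs=None, children=()):
        self.name = name
        self.attrs = dict(attrs or {})
        self.children = list(children)

    def __call__(self, **attrs):
        # tag(key=value) returns a new tag carrying those attributes.
        return Tag(self.name, {**self.attrs, **attrs}, self.children)

    def __getitem__(self, children):
        # tag[a] passes a single object, tag[a, b] passes a tuple.
        if not isinstance(children, tuple):
            children = (children,)
        return Tag(self.name, self.attrs, self.children + list(children))

    def __str__(self):
        attrs = ''.join(f' {k}="{escape(str(v))}"' for k, v in self.attrs.items())
        # Nested tags render themselves; plain strings are escaped.
        inner = ''.join(str(c) if isinstance(c, Tag) else escape(str(c), quote=False)
                        for c in self.children)
        return f'<{self.name}{attrs}>{inner}</{self.name}>'


html, head, title, body, div = (Tag(n) for n in ('html', 'head', 'title', 'body', 'div'))

page = html(lang='en')[
    head[title['An example']],
    body[div['Escaped chars: ', '< ', '>', '&']],
]
print(page)
```

Returning a new `Tag` from every call and index keeps the module-level `html`, `div` and so on reusable as templates.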

I am attempting to make an easier solution called PyperText, with which you can do things like this:
from PyperText.html import Script
from PyperText.htmlButton import Button
#from PyperText.html{WIDGET} import WIDGET; ex from PyperText.htmlEntry import Entry; variations shared in file
myScript=Script("myfile.html")
myButton=Button()
myButton.setText("This is a button")
myScript.addWidget(myButton)
myScript.createAndWrite()

I wrote a simple wrapper for the lxml module (it should work fine with xml as well) that makes tags for HTML/XML-esque documents.
Really, I liked the format of the answer by John Smith, but I didn't want to install yet another module to accomplish something that seemed so simple.
Example first, then the wrapper.
Example
from Tag import Tag

with Tag('html') as html:
    with Tag('body'):
        with Tag('div'):
            with Tag('span', attrib={'id': 'foo'}) as span:
                span.text = 'Hello, world!'
            with Tag('span', attrib={'id': 'bar'}) as span:
                span.text = 'This was an example!'

html.write('test_html.html')
Output:
<html><body><div><span id="foo">Hello, world!</span><span id="bar">This was an example!</span></div></body></html>
Output after some manual formatting:
<html>
<body>
<div>
<span id="foo">Hello, world!</span>
<span id="bar">This was an example!</span>
</div>
</body>
</html>
Wrapper
from dataclasses import dataclass, field
from lxml import etree

# The Tag currently open in a "with" block; new tags entered inside it become its children.
PARENT_TAG = None


@dataclass
class Tag:
    tag: str
    attrib: dict = field(default_factory=dict)
    parent: object = None
    _text: str = None

    @property
    def text(self):
        return self._text

    @text.setter
    def text(self, value):
        # Keep the wrapper and the underlying lxml element in sync.
        self._text = value
        self.element.text = value

    def __post_init__(self):
        self._make_element()
        self._append_to_parent()

    def write(self, filename):
        etree.ElementTree(self.element).write(filename)

    def _make_element(self):
        self.element = etree.Element(self.tag, attrib=self.attrib)

    def _append_to_parent(self):
        if self.parent is not None:
            self.parent.element.append(self.element)

    def __enter__(self):
        global PARENT_TAG
        if PARENT_TAG is not None:
            self.parent = PARENT_TAG
            self._append_to_parent()
        PARENT_TAG = self
        return self

    def __exit__(self, typ, value, traceback):
        # Restore the enclosing tag as the current parent.
        global PARENT_TAG
        if PARENT_TAG is self:
            PARENT_TAG = self.parent


Error in converting from markdown to editor in flask

I am using the markdown library in Python to display some Markdown in my Flask app.
I am getting an error when the output is displayed: the page shows the Markdown content without converting it to HTML.
This is my python code.
import markdown
from flask import Flask, render_template
from markupsafe import Markup

app = Flask(__name__)

@app.route('/md')
def md():
    content = """
    <h1>Hello</h1>
    Chapter
    =======
    Section
    -------
    * Item 1
    * Item 2
    **Ishaan**
    """
    content = Markup(markdown.markdown(content))
    return render_template('md.html', **locals())
This is my html code.
<html>
<head>
<title>Markdown Snippet</title>
</head>
<body>
{{ content }}
</body>
</html>
I am following the code from here
I know I am making some blunder, but I'd be thankful if anybody could help me.
Thanks in advance.
Dedent your lines of Markdown.
Anything inside Python triple-quotes is interpreted by Python literally. That includes the indentation. Therefore, the text passed to Markdown is indented by one level causing Markdown to interpret the entire document as a code block. Remove the indentation and Markdown will recognize the text properly:
@app.route('/md')
def md():
    content = """
<h1>Hello</h1>
Chapter
=======
Section
-------
* Item 1
* Item 2
**Ishaan**
"""
Note that the example you are copying from also does not indent the triple-quoted text. Of course, that makes your Python code less readable. Therefore, the Python Standard Library includes the textwrap.dedent() function, which will remove the indentation programmatically:
from textwrap import dedent

@app.route('/md')
def md():
    content = """
    <h1>Hello</h1>
    Chapter
    =======
    Section
    -------
    * Item 1
    * Item 2
    **Ishaan**
    """
    content = Markup(markdown.markdown(dedent(content)))  # <= dedent here
Note that content is passed through dedent before being passed to Markdown.
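The indentation matters because Markdown treats lines indented by four or more spaces as an indented code block. The effect of `dedent()` can be checked without Flask or Markdown installed; `md_source` below is a hypothetical helper whose string literal is indented the same way as the one in the view:

```python
from textwrap import dedent


def md_source():
    # The literal's lines pick up the function body's four-space indentation.
    content = """
    <h1>Hello</h1>
    Chapter
    =======
    * Item 1
    """
    return content


raw = md_source()
cleaned = dedent(raw)  # removes the whitespace common to all non-blank lines

# Before: every non-blank line starts with four spaces, i.e. a Markdown code block.
print([line for line in raw.splitlines() if line.strip()])
# After: the lines start at column 0, so headings and lists are recognized.
print([line for line in cleaned.splitlines() if line.strip()])
```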

Realtime(ish) updates on Web Page using AppEngine

I'm using AppEngine to create a page that I would like to update from the program. Specifically, I am getting some market data and would like to have a table (or something else appropriate) that shows current prices. Let me be clear: I am new to this and think my problem is that I'm not asking the question well enough to find a good (best) answer. I'm not even sure AppEngine is necessarily the way to go. I'll also caveat that I've been learning via Udacity so if code looks familiar -- kudos to Steve Huffman.
I've created the page via jinja2 and I've managed to wrangle the appropriate libraries and sandbox parameters to get market updates. I've created an html table and passed in a dictionary with values for exchanges and bid/ask pairs. The table creates fine -- but when I render again, I get tables repeating down the page rather than one table with updating market prices.
Here is the html/jinja2 (I ditched all the styling to make it shorter):
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Table template</title>
</head>
<body>
<h1>Table template</h1>
<table>
{% for exch in mkt_data %}
<tr>
<td> <div>{{exch}}</div></td>
<td> <div>{{mkt_data[exch][0]}}</div></td>
<td><div>{{mkt_data[exch][1]}}</div></td>
</tr>
{% endfor %}
</table>
</body>
</html>
Here is the code:
import os
import jinja2
import webapp2
import ccxt

template_dir = os.path.join(os.path.dirname(__file__), 'templates')
jinja_env = jinja2.Environment(loader = jinja2.FileSystemLoader(template_dir),
                               autoescape=True)

class Handler(webapp2.RequestHandler):
    def write(self, *a, **kw):
        self.response.out.write(*a, **kw)

    def render_str(self, template, **params):
        t = jinja_env.get_template(template)
        return t.render(params)

    def render(self, template, **kw):
        self.write(self.render_str(template, **kw))

class MainPage(Handler):
    def get(self):
        self.render("table.html", mkt_data=btc)
        for x in range(3):
            for exch in exchanges:
                orderbook=exch.fetch_order_book('BTC/USD')
                bid = orderbook['bids'][0][0] if len(orderbook['bids'])>0 else None
                ask = orderbook['asks'][0][0] if len(orderbook['asks'])>0 else None
                btc[exch.id]=[bid,ask]
            self.render("table.html", mkt_data=btc)

gdax = ccxt.gdax()
gemini = ccxt.gemini()
exchanges = [gdax, gemini]
btc = {"gemini":[0,1], "gdax":[1,2]}
for exch in exchanges:
    exch.load_markets()

app = webapp2.WSGIApplication([('/', MainPage)], debug=True)
I have 2 questions:
First, why am I getting the table repeating? I think I know why, but I want to hear a formal reason.
Second, what should I be doing? I originally started learning javascript/node but then it seemed very hard to wrap all the appropriate libraries (was looking into browserify but then thought appengine may be better so I could more easily host something for others to see). I tried integrating some javascript but that did not get me anywhere. Now I've run into Firebase but before I go learn yet another "thing" I wanted to ask how other people do this. I'm certain there are multiple ways but I'm new to web programming; I'm viewing a web page as a nice UI & delivery mechanism.
Some add'l notes: using Ubuntu, virtualenv, ccxt library (for cryptocurrency).
edit: I checked Dan's answer because it offered a solution. I'd love to hear about whether Firebase is "a" more correct solution rather than auto-refreshing.
The repeated table is the result of the multiple self.render() calls inside your MainPage.get(): both the one above the loops and the repeated ones inside the for loop(s). Each call appends another complete copy of the rendered template to the same response.
Update your code to make a single such call, after the for loops that build the template values (at the end of MainPage.get()).
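A sketch of that restructuring, assuming a hypothetical helper `collect_top_of_book` that pulls the data-gathering loop out of `get()`. `StubExchange` is also hypothetical, a stand-in with the same `id` attribute and `fetch_order_book()` result shape as the ccxt exchanges in the question, so the helper can run without network access. The handler needs webapp2, so it is shown only as a comment; the outer `range(3)` loop is dropped because refetching the same books within one request adds nothing to a single rendered page:

```python
def collect_top_of_book(exchanges, symbol='BTC/USD'):
    """Fetch the best bid/ask from every exchange before anything is rendered."""
    mkt_data = {}
    for exch in exchanges:
        orderbook = exch.fetch_order_book(symbol)
        # Each order-book level is a [price, amount] pair, best price first.
        bid = orderbook['bids'][0][0] if orderbook['bids'] else None
        ask = orderbook['asks'][0][0] if orderbook['asks'] else None
        mkt_data[exch.id] = [bid, ask]
    return mkt_data

# In the handler, render exactly once, after the data is complete:
#
# class MainPage(Handler):
#     def get(self):
#         self.render("table.html", mkt_data=collect_top_of_book(exchanges))


class StubExchange:
    """Hypothetical stand-in for a ccxt exchange, used only to exercise the helper."""

    def __init__(self, exchange_id, bids, asks):
        self.id = exchange_id
        self._book = {'bids': bids, 'asks': asks}

    def fetch_order_book(self, symbol):
        return self._book


print(collect_top_of_book([StubExchange('gdax', [[100.0, 1]], [[101.0, 2]]),
                           StubExchange('gemini', [], [[102.5, 1]])]))
```

Seeing prices change without reloading the page is a separate problem (auto-refresh, polling from JavaScript, or a push service); this only fixes the duplicated table.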

How to display html from the controller in the view using Web2Py?

I'm utilizing web2py and I'd like to display html code that is returned from a python function in the controller.
I have the following controller (default.py):
def index():
    return {"html_code":"<img src='https://static1.squarespace.com/static/54e8ba93e4b07c3f655b452e/t/56c2a04520c64707756f4267/1493764650017'>"}
This is my view (index.html):
{{=html_code}}
When I visit the site (http://127.0.0.1:8000/test/default/index), I see the following (instead of the image)
<img src='https://static1.squarespace.com/static/54e8ba93e4b07c3f655b452e/t/56c2a04520c64707756f4267/1493764650017'>
How can I render the variable called html_code as html instead of as plain text?
By default, any content written to the view via {{=...}} is escaped. To suppress the escaping, you can use the XML() helper:
{{=XML(html_code)}}
Alternatively, you can construct the HTML via the server-side HTML helpers rather than generating raw HTML:
def index():
    return {"html_code": IMG(_src='https://static1.squarespace.com/static/54e8ba93e4b07c3f655b452e/t/56c2a04520c64707756f4267/1493764650017')}
And then you can leave the view as you have it:
{{=html_code}}
The above assumes that you are generating the HTML via your own code. If the HTML in question comes from an untrusted source (e.g., user input), writing it to the view without escaping presents a security risk. In that case, you can have the XML() helper do some sanitizing (i.e., it will limit the allowed HTML tags and attributes to a safe whitelist) (see here for more details):
{{=XML(html_code, sanitize=True)}}
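To see why the unescaped tag shows up as text, it helps to look at what escaping does to the string. web2py's own escaping routine is not shown here; as a rough standard-library analogy, `html.escape` turns the markup characters into entities, which the browser then displays literally instead of interpreting (the image URL below is a placeholder):

```python
from html import escape

raw = "<img src='https://example.com/picture.png'>"

# With quote=True (the default), single and double quotes are escaped as well.
escaped = escape(raw)
print(escaped)
```

Wrapping the value in XML() tells web2py to skip this step, which is exactly why it must not be used on untrusted input without `sanitize=True`.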
Try using the XML() helper:
def index():
    return {"html_code":XML("<img src='https://static1.squarespace.com/static/54e8ba93e4b07c3f655b452e/t/56c2a04520c64707756f4267/1493764650017'>")}

Python, search for html tags inside a file using regex

So I am doing some data analysis in which I am required to extract the page title, breadcrumb, h1 tags from hundreds of HTML and SHTML files.
Those tags are in the following format (meaning the contents of <title>, <h1>, and the breadcrumb):
<title>Mapping a Drive: Macintosh OSX < Mapping a Drive < eHelp < Cal Poly Pomona</title>
<p><!-- InstanceBeginEditable name="breadcrumb" -->eHelp » Mapping a Drive » Mac OS X<!-- InstanceEndEditable --></p>
<h1><a name="contentstart" id="contentstart"></a><!-- InstanceBeginEditable name="page_heading" --><a name="top" id="top"></a>Mapping a Drive:<span class="goldletter"> Macintosh </span>OS X <!-- InstanceEndEditable --></h1>
After getting those tags, I want to further extract the first part of the title Mapping a Drive: Macintosh OSX, last part of the breadcrumb Mac OS X and the whole h1 Mapping a Drive: Macintosh OSX
Any idea how that can be accomplished?
Use a real HTML parser, not a regex. You will be happier. lxml.html is highly regarded, as is BeautifulSoup.
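As a stdlib-only sketch of the parser approach (substituting Python's built-in `html.parser` for the lxml.html or BeautifulSoup recommended above), the extraction the question asks for could look like this. It assumes the breadcrumb is always wrapped in the `InstanceBeginEditable name="breadcrumb"` / `InstanceEndEditable` comments shown in the question and that ` < ` and ` » ` are the separators; `PageInfoParser` and `extract_page_info` are hypothetical names:

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Collects the text of <title>, <h1>, and the breadcrumb template region."""

    def __init__(self):
        super().__init__()
        self._inside = None          # 'title' or 'h1' while inside that element
        self._in_breadcrumb = False  # True between the breadcrumb template comments
        self.chunks = {'title': [], 'h1': [], 'breadcrumb': []}

    def handle_starttag(self, tag, attrs):
        if tag in ('title', 'h1'):
            self._inside = tag

    def handle_endtag(self, tag):
        if tag == self._inside:
            self._inside = None

    def handle_comment(self, data):
        marker = data.strip()
        if marker == 'InstanceBeginEditable name="breadcrumb"':
            self._in_breadcrumb = True
        elif marker == 'InstanceEndEditable':
            self._in_breadcrumb = False

    def handle_data(self, data):
        if self._inside:
            self.chunks[self._inside].append(data)
        if self._in_breadcrumb:
            self.chunks['breadcrumb'].append(data)


def extract_page_info(html_text):
    parser = PageInfoParser()
    parser.feed(html_text)
    parser.close()
    # Join the text pieces and collapse runs of whitespace.
    text = {key: ' '.join(''.join(parts).split()) for key, parts in parser.chunks.items()}
    return {
        'title': text['title'].split('<')[0].strip(),             # first part of the title
        'breadcrumb': text['breadcrumb'].split('»')[-1].strip(),  # last breadcrumb entry
        'h1': text['h1'],                                         # whole heading text
    }


sample = '''<title>Mapping a Drive: Macintosh OSX < Mapping a Drive < eHelp < Cal Poly Pomona</title>
<p><!-- InstanceBeginEditable name="breadcrumb" -->eHelp » Mapping a Drive » Mac OS X<!-- InstanceEndEditable --></p>
<h1><a name="contentstart" id="contentstart"></a><!-- InstanceBeginEditable name="page_heading" --><a name="top" id="top"></a>Mapping a Drive:<span class="goldletter"> Macintosh </span>OS X <!-- InstanceEndEditable --></h1>'''

print(extract_page_info(sample))
```

For hundreds of files, read each one and pass its text to `extract_page_info`; the parser tolerates the stray `<` characters in the title that would trip up a strict XML parser.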
Since most HTML is basically xml (or can easily be trimmed to be compatible with most xml parsers) I would suggest using an xml parser. Most python HTML-specific parsers are just subclasses of an xml parser anyway.
Check out: Python and XML.
Here is a good tutorial: Python XML Parser Tutorial.
Also, the xml.dom.minidom module has been super useful for me personally.
Another similar method is explained here: xml.etree.ElementTree.
This is a good example from the xml.dom.minidom reference page:
import xml.dom.minidom

document = """\
<slideshow>
<title>Demo slideshow</title>
<slide><title>Slide title</title>
<point>This is a demo</point>
<point>Of a program for processing slides</point>
</slide>
<slide><title>Another demo slide</title>
<point>It is important</point>
<point>To have more than</point>
<point>one slide</point>
</slide>
</slideshow>
"""

dom = xml.dom.minidom.parseString(document)

def getText(nodelist):
    # Concatenate the text nodes directly under an element.
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            rc.append(node.data)
    return ''.join(rc)

def handleSlideshow(slideshow):
    print("<html>")
    handleSlideshowTitle(slideshow.getElementsByTagName("title")[0])
    slides = slideshow.getElementsByTagName("slide")
    handleToc(slides)
    handleSlides(slides)
    print("</html>")

def handleSlides(slides):
    for slide in slides:
        handleSlide(slide)

def handleSlide(slide):
    handleSlideTitle(slide.getElementsByTagName("title")[0])
    handlePoints(slide.getElementsByTagName("point"))

def handleSlideshowTitle(title):
    print("<title>%s</title>" % getText(title.childNodes))

def handleSlideTitle(title):
    print("<h2>%s</h2>" % getText(title.childNodes))

def handlePoints(points):
    print("<ul>")
    for point in points:
        handlePoint(point)
    print("</ul>")

def handlePoint(point):
    print("<li>%s</li>" % getText(point.childNodes))

def handleToc(slides):
    for slide in slides:
        title = slide.getElementsByTagName("title")[0]
        print("<p>%s</p>" % getText(title.childNodes))

handleSlideshow(dom)
If you absolutely must use regex instead of a parser, check out the re module:
In [1]: import re
In [2]: grps = re.search(r"<([^>]+)>([^<]+)</\1>", "<abc>123</abc>")
In [3]: if grps:
   ...:     print(grps.groups())
   ...:
('abc', '123')
html5lib is a very reliable html parser. Since your xhtml is somewhat broken, an xml parser will reject it. Fortunately, html5lib has lxml integration, so you can still use the full power of lxml and xpath to extract your data.

Un/bound methods in Cheetah

Is there a way to declare static methods in Cheetah? For example:
snippets.tmpl
#def address($address, $title)
<div class="address">
<b>$title</b>
#if $address.title
$address.title <br/>
#end if
$address.line1 <br/>
#if $address.line2
$address.line2 <br/>
#end if
$address.town, $address.state $address.zipcode
</div>
#end def
....
other snippets
other.tmpl
#from snippets import *
$snippets.address($home_address, "home address")
This code reports this error: NotFound: cannot find 'address'. Cheetah is compiling it as a bound method, natch:
snippets.py
class snippets(Template):
    ...
    def address(self, address, title, **KWS):
Is there a way to declare static methods? If not, what are some alternative ways to implement something like this (a snippets library)?
This page seems to have some relevant information, but I'm not in a position to try it out myself right now, sorry.
Specifically, you should just be able to do:
#@staticmethod
#def address($address, $title)
...and have it work.
(If you didn't know, staticmethod is a built-in function that creates a... static method :) It's most commonly used as a decorator. So I found that page by Googling "cheetah staticmethod".)
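In plain Python, the difference looks like this. `Snippets` is a hypothetical stand-in for the class Cheetah generates from snippets.tmpl (no Cheetah involved): the ordinary method expects an instance as its first argument, while the static method can be called straight off the class, which is what `$snippets.address(...)` needs:

```python
class Snippets:
    """Hypothetical plain-Python stand-in for the class compiled from snippets.tmpl."""

    def bound_address(self, address, title):
        # Ordinary method: Python expects an instance as the first argument.
        return f'<div class="address"><b>{title}</b>{address}</div>'

    @staticmethod
    def address(address, title):
        # Static method: no instance needed, callable directly on the class.
        return f'<div class="address"><b>{title}</b>{address}</div>'


print(Snippets.address('1 Main St', 'home address'))

try:
    # Called on the class, '1 Main St' is taken as self and 'title' ends up missing.
    Snippets.bound_address('1 Main St', 'home address')
except TypeError as exc:
    print('TypeError:', exc)
```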
