Python templates for huge HTML/XML

Python templates for huge HTML/XML - python

Recently I needed to generate a huge HTML page containing a report with several thousand row table. And, obviously, I did not want to build the whole HTML (or the underlying tree) in memory. As result, I built the page with the old good string interpolation, but I do not like the solution.
Thus, I wonder whether there are Python templating engines that can yield resulting page content by parts.
UPD 1: I am not interested in listing all available frameworks and templating engines.
I am interested in templating solutions that I can use separately from any framework and which can yield content by portions instead of building the whole result in memory.
I understand the usability enhancements from partial content loading with client scripting, but that is out of the scope of my current question. Say, I want to generate a huge HTML/XML and stream it into a local file.

Most popular template engines have a way to generate or write rendered result to file objects with chunks. For example:
Template.generate() in Jinja2
Template.render_context() in Mako
Stream.serialize() in Genshi

It'd be more user-friendly (assuming they have javascript enabled) to build the table via javascript by using e.g. a jQuery plugin which allows automatical loading of contents as soon as you scroll down. Then only few rows are loaded initially and when the user scrolls down more rows are loaded on demand.
If that's not a solution, you could use three templates: one for everything before the rows, one for everything after the rows and a third one for the rows.
Then you first send the before-rows template, then generate the rows and send them immediately, then the after-rows template. Then you will have only one block/row in memory instead of the whole table.

There is no problem with building something like this in memory. Several thousand rows is by no means big.
For your templating needs you can use any of the:
rst http://docutils.sourceforge.net/rst.html (just needs core docutils)
markdown http://daringfireball.net/projects/markdown/ (python lib)
sphinx https://www.sphinx-doc.org (python based)
json http://www.json.org/ (python lib)
There are some tools that allow generation of HTML from these markup languages.

You don't need a streaming templating engine - I do this all the time, and long before you run into anything vaguely heavy server-side, the browser will start to choke. Rendering a 10000 row table will peg the CPU for several seconds in pretty much any browser; scrolling it will be bothersomely choppy in chrome, and the browser mem usage will rise regardless of browser.
What you can do (and I've previously implemented, even though in retrospect it turns out not to be necessary) is use client-side xslt. Printing the xslt processing instruction and the opening and closing tag using strings is easy and fairly safe; then you can stream each individual row as a standalone xml element using whatever xml writer technique you prefer.
However - you really don't need this, and likely never will - if ever your html generator gets too slow, the browser will be an order of magnitude more problematic.
So, unless you benchmarked this and have determined you really have a problem, don't waste your time. If you do have a problem, you can solve it without fundamentally changing the method - in memory generation can work just fine.

Are you using a web framework for this?
http://www.pylonshq.com includes compatibility with several templating engines.
http://www.djangoproject.com/ Django has its own templating language.
I think an answer that included lazy loading of the rows with javascript would work for web view, but I presume the report is going to need to be printed, in which case you'll have to build the whole thing at some point, right?

Related

Python - Documenting Test Results via HTML

This is definitely the most vague question I have asked on SO but I hope it will not be too heavily frowned upon.
I recently started a project for the company I am interning at this summer. The task is to test traffic between two servers using the program iperf.
Long story short: I am successfully able to run the tests, get the data (saved as a txt file), and generate graphs all in python (company specification of language).
My question for all is how can I create a webpage to store this information via python? The webpage would not need to be complicated by any stretch of the imagination. What I am imagining is that every time a test case is run it adds a link to a "homepage" once the user clicks on that link they will have the test results. Test results would be something along the lines of the graphs (perhaps in iframes?) and the txt file (also in an iframe?). Nothing exciting.
I am extremely lost on how to even start something like. Any and all guidance would be helpful.

I think what you're looking for is a templating engine for Python. Jinja2 is a good example of one.
Jinja2 is a full featured template engine for Python. It has full
unicode support, an optional integrated sandboxed execution
environment, widely used and BSD licensed. source
You could pull the test data into Python and generate a dashboard-style webpage with it. If you're looking for something more powerful, take a look at Flask, which is a microframework and uses Jinja2, and combine it with Highcharts

django templates performance

Which is more efficient in Django: to render an object using the template language? or to push the code to render the object via a templatetag function using the python language?

The question is sort-of nonsensical. Templates are always going to be slower than a hand-tuned rendering solution, but template tags still need to run through the Template machinery, so you lose out any advantage there. If you want ultra-high efficiency, consider writing all of the HTML as an array of Python strings, join() them once and then pass your HTML back as the body of an HTTPResponse object.
Or, you could try all three and profile them. Since we don't know your code, we can't do that work for you. After a couple of experiments, you should be satisfied with which approach is correct for you.
The templating engine is almost never your bottleneck. You database is your most likely bottleneck. You do have the Django Toolbar installed, right? If the templating engine is not where your performance is at issue, always use the solution that cheapest for you to implement.

This is like the argument about the most efficient way to append strings (virtually every programming language has that argument). This is 2011, not 1970. Even consumer level computers have processing power and memory reserves to match the supercomputers of the last decade. Network server-class machines have much more. Saving a cycle or two on the processor these days is pretty meaningless. It's one thing when you're developing OS system code or low-level system processes, but for parsing a template, you're wasting your time.

I'd say it's about the same; take the most clean way, and if you need speed, cache it.

Consuming RSS in Django ( / Python)

For a site I'm working on I would like to import a lot of RSS feeds using Django. Since I need the content of them fast I will need to cache them locally (either in the database or in some other way)
Is there a standard app to do RSS consumption in Django, or is there a standard way to do this in Python?
Of course I could implement it myself, but I'd rather reuse a good piece of code (since there's a lot of stuff to consider, like what to do when an item updates, how long to wait before checking for updates, etc, and I'd rather reuse someone elses thinking about this).
(I did google django and rss, but everything that seems to popup is feed generation; surely there must be other sites out there using Django and consuming RSS?)

Check outhttp://feedparser.org/docs/ http://code.google.com/p/feedparser/
One of the best Python libraries for parsing RSS and Atom Feeds; although it seems like you want to do a bit more (caching, auto-refresh etc.)

I have a web-app consisting of some html forms for maintaining some tables (SQlite, with CherryPy for web-server stuff). First I did it entirely 'the Python way', and generated html strings via. code, with common headers, footers, etc. defined as functions in a separate module.
I also like the idea of templates, so I tried Jinja2, which I find quite developer-friendly. In the beginning I thought templates were the way to go, but that was when pages were simple. Once .css and .js files were introduced (not necessarily in the same folder as the .html files), and an ever-increasing number of {{...}} variables and {%...%} commands were introduced, things started getting messy at design-time, even though they looked great at run-time. Things got even more difficult when I needed additional javascript in the or sections.
As far as I can see, the main advantages of using templates are:
Non-dynamic elements of page can easily be viewed in browser during design.
Except for {} placeholders, html is kept separate from python code.
If your company has a web-page designer, they can still design without knowing Python.
while some disadvantages are:
{{}} delimiters visible when viewed at design-time in browser
Associated .css and .js files have to be in same folder to see effects in browser at design-time.
Data, variables, lists, etc., must be prepared in advanced and either declared globally or passed as parameters to render() function.
So - when to use 'hard-coded' HTML, and when to use templates? I am not sure of the best way to go, so I would be interested to hear other developers' views.
TIA, Alan

Although I'm not a Python developer, I'll answer here - I believe the idea of using templates is common for PHP and Python.
Using templates has many advantages, like:
keeping the code clean. Separating the "logic" (controller) code from the presentation (view) is very important. Working on projects that mix HTML / CSS / JS / Python is really hard.
keeping HTML in separate files doesn't require you to modify the code itself. For example, placing in the controller code (Python code) might require you to put a slash before each " character.
you can ask your web designer to learn the basics of templates syntax so he's able to help you much without destroying your work on the controller code (which is quite common when a person with no experience in a given language modifies something)
in fact, there are many more advantages, but those are most important for me.
The only disadvantage is that you have to pass parameters to the rendering function ... which doesn't require much of work. Anyway it's much easier than maintaining any project that mixes controller code with view code.
Generally, you should have a look at >MVC pros and cons question< :
What is MVC and what are the advantages of it?

The simplest way to solve your static file problem is to use relative paths when referring to them in your html. For example: <img src="static/image.jpg" />
If you're willing to put in a little more work, you can solve all the design-time problems you mentioned by writing a mini-server to display your templates.
Maintain a file full of simple data structures containing example values for all your templates.
Use a microframework like Werkzeug to serve http on your local machine.
Write a root request handler that scans your data structure list or templates directory to produce an index page with links to all your templates.
Write a secondary request handler for non-root requests, which renders the requested template with the data structure of the same name.
You can write this tool in a few hours, and it makes template design very convenient. One nice feature of Werkzeug's built-in wsgi server is that it can automatically reload itself when it detects that a file has changed. You can leave your mini-server running while you edit templates and click links on your index page all day.

I would highly recommend using templates. Templates help to encourage a good MVC structure to your application. Python code that emits HTML, IMHO, is wrong. The reason I say that is because Python code should be responsible for doing logic and not have to worry about presentation. Template syntax is usually restrictive enough that you can't really do much logic within the template, but you can do any presentation specific type logic that you may need.
ymmv.

I think that templates are still the best way to separate presentation from business logic. The key is a good templating engine, in particular, one that can make decisions in the template itself. For Python, I've found the Genshi templating engine to be quite good. It's used by the Trac Wiki/Issue tracking system and is quite powerful, while still leaving the templates easy to work with.
For other tasks, I tend to use the BeautifulSoup module. I'll create a simple HTML page, parse it with BeautifulSoup, use the resulting object to add the necessary data, and then write the output to its destination (typically a file for me).

As a Seaside developer, I see no use for templates, if your designer can do css. In practice, I find it impossible to keep templates DRY. Take a look at this question to see the advantages of using code (a DSL) to generate your page. You might be bound by legacy, of course.
Separation of presentation (html) and business logic in web apps does not lead to good modularisation, i.e. low coupling and high cohesion. That is why separate template systems don't work well.
I just read Jeff Atwoods "What's Wrong With CSS". This again is a problem long solved in the smalltalk world with the Phantasia DSL

Is it reasonable to save data as python modules?

This is what I've done for a project. I have a few data structures that are bascially dictionaries with some methods that operate on the data. When I save them to disk, I write them out to .py files as code that when imported as a module will load the same data into such a data structure.
Is this reasonable? Are there any big disadvantages? The advantage I see is that when I want to operate with the saved data, I can quickly import the modules I need. Also, the modules can be used seperate from the rest of the application because you don't need a separate parser or loader functionality.

By operating this way, you may gain some modicum of convenience, but you pay many kinds of price for that. The space it takes to save your data, and the time it takes to both save and reload it, go up substantially; and your security exposure is unbounded -- you must ferociously guard the paths from which you reload modules, as it would provide an easy avenue for any attacker to inject code of their choice to be executed under your userid (pickle itself is not rock-solid, security-wise, but, compared to this arrangement, it shines;-).
All in all, I prefer a simpler and more traditional arrangement: executable code lives in one module (on a typical code-loading path, that does not need to be R/W once the module's compiled) -- it gets loaded just once and from an already-compiled form. Data live in their own files (or portions of DB, etc) in any of the many suitable formats, mostly standard ones (possibly including multi-language ones such as JSON, CSV, XML, ... &c, if I want to keep the option open to easily load those data from other languages in the future).

It's reasonable, and I do it all the time. Obviously it's not a format you use to exchange data, so it's not a good format for anything like a save file.
But for example, when I do migrations of websites to Plone, I often get data about the site (such as a list of which pages should be migrated, or a list of how old urls should be mapped to new ones, aor lists of tags). These you typically get in Word och Excel format. Also the data often needs massaging a bit, and I end up with what for all intents and purposes are a dictionaries mapping one URL to some other information.
Sure, I could save that as CVS, and parse it into a dictionary. But instead I typically save it as a Python file with a dictionary. Saves code.
So, yes, it's reasonable, no it's not a format you should use for any sort of save file. It however often used for data that straddles the border to configuration, like above.

The biggest drawback is that it's a potential security problem since it's hard to guarantee that the files won't contains arbitrary code, which could be really bad. So don't use this approach if anyone else than you have write-access to the files.

A reasonable option might be to use the Pickle module, which is specifically designed to save and restore python structures to disk.

Alex Martelli's answer is absolutely insightful and I agree with him. However, I'll go one step further and make a specific recommendation: use JSON.
JSON is simple, and Python's data structures map well into it; and there are several standard libraries and tools for working with JSON. The json module in Python 3.0 and newer is based on simplejson, so I would use simplejson in Python 2.x and json in Python 3.0 and newer.
Second choice is XML. XML is more complicated, and harder to just look at (or just edit with a text editor) but there is a vast wealth of tools to validate it, filter it, edit it, etc.
Also, if your data storage and retrieval needs become at all nontrivial, consider using an actual database. SQLite is terrific: it's small, and for small databases runs very fast, but it is a real actual SQL database. I would definitely use a Python ORM instead of learning SQL to interact with the database; my favorite ORM for SQLite would be Autumn (small and simple), or the ORM from Django (you don't even need to learn how to create tables in SQL!) Then if you ever outgrow SQLite, you can move up to a real database such as PostgreSQL. If you find yourself writing lots of loops that search through your saved data, and especially if you need to enforce dependencies (such as if foo is deleted, bar must be deleted too) consider going to a database.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.