I'd like to be able to add objects (filled rectangles, outlines, etc.) to a cairo SVG context in such a way that when I open the SVG file with Inkscape, each would be recognized as an individual object so that I can move/edit it. Ideally, I'd also like to group objects that would appear as "layers" in Inkscape.
Is this possible to do through the cairo API (to some extent), or would I have to generate each element individually and then stitch them all together into an Inkscape SVG file? Maybe cairo is not the right tool here? What's the best approach, then?
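Cairo is a drawing API rather than an SVG authoring API: its SVG surface writes whatever you draw into an SVG file, but it gives you little control over the structure of the output (element IDs, named groups, Inkscape layers).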
SVG is just XML, so you could use any off-the-shelf XML library to generate your SVG content. If the content is fairly simple and you would like a Python-based solution, I would recommend lxml. In this case, you would be using the ElementTree API, which lxml implements, to generate XML content.
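A minimal sketch of the ElementTree approach, assuming you want each shape as its own element and each Inkscape layer as a <g> element carrying inkscape:groupmode="layer" and an inkscape:label. It uses the standard library's xml.etree.ElementTree so it runs without dependencies; lxml.etree offers a largely compatible API, although with lxml you declare namespaces through nsmap rather than register_namespace. The helper names (make_layer, build_document) and the output file name are illustrative.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
INKSCAPE_NS = "http://www.inkscape.org/namespaces/inkscape"

# Serialize SVG as the default namespace and Inkscape's with its usual prefix.
ET.register_namespace("", SVG_NS)
ET.register_namespace("inkscape", INKSCAPE_NS)


def svg_tag(name):
    """Return a namespace-qualified SVG tag name."""
    return f"{{{SVG_NS}}}{name}"


def make_layer(parent, layer_id, label):
    """Create a <g> element that Inkscape treats as a layer."""
    return ET.SubElement(parent, svg_tag("g"), {
        "id": layer_id,
        f"{{{INKSCAPE_NS}}}groupmode": "layer",
        f"{{{INKSCAPE_NS}}}label": label,
    })


def build_document():
    root = ET.Element(svg_tag("svg"), {"width": "200", "height": "100"})

    shapes = make_layer(root, "layer1", "Shapes")
    # Each shape is its own element, so Inkscape can select and move it.
    ET.SubElement(shapes, svg_tag("rect"), {
        "id": "filled-rect", "x": "10", "y": "10",
        "width": "80", "height": "50", "fill": "steelblue",
    })

    outlines = make_layer(root, "layer2", "Outlines")
    ET.SubElement(outlines, svg_tag("rect"), {
        "id": "outline-rect", "x": "110", "y": "10",
        "width": "80", "height": "50",
        "fill": "none", "stroke": "black", "stroke-width": "2",
    })
    return ET.ElementTree(root)


if __name__ == "__main__":
    build_document().write("layers.svg", xml_declaration=True, encoding="utf-8")
```

Opening the resulting file in Inkscape should show two layers, "Shapes" and "Outlines", each holding an independently editable rectangle.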
On the other hand, if the content you're generating is complex, such that you need to position elements dynamically, compute bounding boxes of groups and/or text, and perform other complex tasks, then I would recommend Batik, which implements the SVG DOM and provides such methods (for example, getBBox() for bounding boxes). In this case, you would be using the DOM API to generate content. Here are some resources on the DOM:
W3C DOM technical reports: http://www.w3.org/DOM/DOMTR
Java DOM bindings: http://docs.oracle.com/javase/1.4.2/docs/api/org/w3c/dom/package-summary.html
SVG extensions to DOM: http://www.w3.org/TR/SVG/svgdom.html
Batik is written in Java, so the most straightforward approach would be to use Java to develop against it, but you could also use Jython, an implementation of Python that runs on the JVM, if you prefer to stick with the Python language.
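Batik's Java API follows the W3C DOM interfaces listed above. As a rough illustration of what DOM-style generation looks like, here is a similar document built with Python's xml.dom.minidom, which implements the core DOM but not the SVG DOM extensions (so there is no getBBox() here; for that you would still need Batik). Batik code typically starts from the same createDocument(namespace, "svg", null) call; the helper names below are hypothetical.

```python
from xml.dom import minidom

SVG_NS = "http://www.w3.org/2000/svg"


def set_attributes(element, attributes):
    """Copy a dict of plain (un-namespaced) attributes onto a DOM element."""
    for name, value in attributes.items():
        element.setAttribute(name, value)


def build_svg_dom():
    impl = minidom.getDOMImplementation()
    # Standard DOM Level 2 call: createDocument(namespaceURI, qualifiedName, doctype).
    doc = impl.createDocument(SVG_NS, "svg", None)
    svg = doc.documentElement
    # minidom does not emit namespace declarations by itself, so add one.
    set_attributes(svg, {"xmlns": SVG_NS, "width": "200", "height": "100"})

    group = doc.createElementNS(SVG_NS, "g")
    group.setAttribute("id", "group1")
    svg.appendChild(group)

    rect = doc.createElementNS(SVG_NS, "rect")
    set_attributes(rect, {"x": "10", "y": "10", "width": "80",
                          "height": "50", "fill": "steelblue"})
    group.appendChild(rect)
    return doc


if __name__ == "__main__":
    print(build_svg_dom().toprettyxml(indent="  "))
```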
I use Pisa/xhtml2pdf in my Django apps to generate PDFs from an HTML source. That is:
I generate the HTML file with all the print-specific formatting (e.g. page breaks, header, footer, etc.)
I convert this HTML into PDF using Pisa
This process works, but it is slow (especially when dealing with long tables), and I am limited to the HTML/CSS features that Pisa supports.
The question is: is this the right way to generate PDFs from a web application (i.e. create HTML and then convert it to PDF), or is there a more direct way, that is, "writing" the PDF with a more suitable language?
WeasyPrint author here. The point of using HTML/CSS to generate PDF (vs. using a lower-level PDF library directly) is to get automatic layout. It lets you specify high-level constraints like h1 { page-break-after: avoid } and let the layout engine figure out the rest, rather than specifying the absolute position of everything. The former is much more maintainable when you make changes to your documents.
Some tools like rst2pdf have their own stylesheet syntax, but that’s just a bad way of re-inventing CSS.
But yes, dumping complex stylesheets made for screen into the converter might not give great results. It’s better to build the stylesheets with print in mind, or even use completely different stylesheets with @media print in CSS or <link media="print"> in HTML.
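To illustrate the advice about print-specific stylesheets, here is a small sketch that builds an HTML document whose print rules live in an @media print block, plus an @page rule for the paper size and margins. The function and constant names are hypothetical; the resulting string is what you would hand to the converter, for example WeasyPrint's HTML(string=html).write_pdf("report.pdf").

```python
from html import escape

# Print-only rules: they affect paged output (and HTML-to-PDF converters
# that apply print media) but leave the on-screen page untouched.
PRINT_CSS = """
@page { size: A4; margin: 2cm; }
@media print {
  h1 { page-break-after: avoid; }
  nav, .no-print { display: none; }
}
"""


def render_report(title, paragraphs):
    """Build a minimal HTML document with a print-specific stylesheet."""
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title>"
        f"<style>{PRINT_CSS}</style></head>"
        f"<body><h1>{escape(title)}</h1>\n{body}</body></html>"
    )
```

Keeping the high-level constraints (like avoiding a break after headings) in CSS means the layout engine, not your code, decides where pages end.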
I think generating a PDF from HTML with libraries like Pisa or WeasyPrint (http://weasyprint.org/) is the simplest approach, because it takes care of inserting images, CSS, barcodes (in Pisa), etc.
If you want to write the PDF yourself, take a look at ReportLab, but it will take much longer to implement. In both cases I suggest always generating the PDF in the background with Celery or python-rq, so that slow PDF generation doesn't block your web requests.
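The background-generation pattern can be sketched without any extra dependencies. This uses concurrent.futures.ThreadPoolExecutor purely as a stand-in for a task queue; it is not Celery or python-rq, which run jobs in separate worker processes fed by a broker such as Redis. generate_pdf and handle_request are hypothetical names, and the PDF bytes are a placeholder.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# Stand-in for a real task queue; Celery or python-rq would instead hand
# the job to a separate worker process through a broker.
executor = ThreadPoolExecutor(max_workers=2)


def generate_pdf(order_id):
    """Hypothetical slow PDF build; returns placeholder PDF bytes."""
    time.sleep(0.1)  # simulate the expensive HTML-to-PDF conversion
    return f"%PDF-placeholder for order {order_id}".encode()


def handle_request(order_id) -> Future:
    """Queue the job and return immediately instead of blocking the request."""
    return executor.submit(generate_pdf, order_id)
```

With python-rq, the equivalent call would be roughly Queue(connection=Redis()).enqueue(generate_pdf, order_id); with Celery, you would declare generate_pdf as a task and call generate_pdf.delay(order_id). Either way, the web view returns quickly and the client fetches the finished PDF later.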
Pisa is known to have various issues, especially with long tables. In general, one should avoid using Pisa. Other options are:
using ReportLab directly
z3c.rml (an open-source clone of ReportLab's RML template language)
commercial alternatives:
PrinceXML
PDFreactor
The general rule when it comes to PDF production: you get what you pay for.
Converters like Pisa or Apache FOP are half-baked solutions that work for simple cases but suck in general.
You can also use the Qt WebKit rendering engine to create PDFs from HTML with wkhtmltopdf (http://code.google.com/p/wkhtmltopdf/) and django-wkhtmltopdf.
The advantage is that you can write the HTML and CSS as you would normally for WebKit. This works well if you are outputting an existing web page but may be less appropriate if generating PDFs from scratch.
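A hedged sketch of driving wkhtmltopdf from Python via subprocess, assuming the wkhtmltopdf binary is installed and on PATH. The --print-media-type option makes it apply your @media print rules instead of the screen styles; wkhtmltopdf_command and render are hypothetical helper names. (django-wkhtmltopdf wraps the same binary for Django views.)

```python
import shutil
import subprocess


def wkhtmltopdf_command(source, output, page_size="A4", print_media=True):
    """Build a wkhtmltopdf command line; source may be a URL or an HTML file."""
    cmd = ["wkhtmltopdf", "--page-size", page_size]
    if print_media:
        # Use the page's @media print rules instead of the screen styles.
        cmd.append("--print-media-type")
    cmd += [source, output]
    return cmd


def render(source, output):
    """Run the conversion, failing loudly if the binary is missing or errors."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf is not on PATH")
    subprocess.run(wkhtmltopdf_command(source, output), check=True)
```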
I generate PDFs with the xhtml2pdf Python package, but the output is not optimal. I use floating divs to place images and text side by side on the page. In HTML this works, but after PDF rendering the images and text are placed underneath each other, which is not what I want. From searching the web I learned that the ReportLab package used by xhtml2pdf cannot handle floating divs. Does a workaround exist? I have tried WebKit rendering via Qt, but the resulting PDFs are of low quality, i.e. the character spacing is completely wrong.
If you cannot achieve the results you need with xhtml2pdf, I suggest you use ReportLab directly. ReportLab supports RML, its own markup language that lets you easily create formatted text (RML rendering is part of ReportLab's commercial offering), and it has a library called Platypus that makes layout fairly simple by using Python objects to represent document parts and page layouts.
The reason you are having problems, by the way, is that xhtml2pdf essentially has to act like an HTML rendering engine that outputs to PDF rather than to the screen. Just as it took a long time and a lot of effort to build good rendering engines for browsers, it seems xhtml2pdf will also need a lot of effort to reach similar quality. This isn't to say that xhtml2pdf is bad, just that it's going to take time for it to be as good as rendering in a browser, and if PDF output for its own sake is what you are really interested in, I think using ReportLab directly is a better choice.
Could someone point me to a library, code examples, or any other resources on how to take XML and create a PDF using XSL-FO in Python? If I have to use an external XSL-FO renderer, which one is recommended?
If you want to run XSLT programmatically with Python, you want lxml.
However, if you just need to create a .fo file from a defined XSL/XML pair, you might as well use xsltproc, which is available on the command line on most Unix-y systems, including OS X.
Once you have the .fo file, use Apache FOP to transform it to PDF.
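The command-line route can be scripted too. This sketch only assembles and runs the two commands, assuming both xsltproc and Apache FOP's fop launcher are on PATH; fo_pipeline and run_pipeline are hypothetical names. If you prefer to do the XSLT step in-process, lxml's etree.XSLT class applies a stylesheet to a parsed document.

```python
import subprocess


def fo_pipeline(xml_path, xsl_path, fo_path, pdf_path):
    """Return two commands: XML + XSL -> XSL-FO, then XSL-FO -> PDF."""
    return [
        # xsltproc -o OUTPUT STYLESHEET INPUT
        ["xsltproc", "-o", fo_path, xsl_path, xml_path],
        # Apache FOP: read the .fo file and render it to PDF
        ["fop", "-fo", fo_path, "-pdf", pdf_path],
    ]


def run_pipeline(xml_path, xsl_path, fo_path, pdf_path):
    """Run each step in order, stopping at the first failure."""
    for cmd in fo_pipeline(xml_path, xsl_path, fo_path, pdf_path):
        subprocess.run(cmd, check=True)
```

FOP can also run the transformation itself (fop -xml data.xml -xsl style.xsl -pdf out.pdf), which skips the intermediate .fo file.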
You may want to try xhtml2pdf. It's very easy to use: create your XHTML template, use Jinja2 (or similar) to fill in the template, then convert it to PDF.
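A minimal sketch of the fill-in step, using the standard library's string.Template as a stand-in for Jinja2 so it runs without extra packages. Dynamic values are HTML-escaped before substitution, which Jinja2's autoescaping would otherwise handle. The template and fill_template are hypothetical; the resulting XHTML string is what you would pass to xhtml2pdf (for example via pisa.CreatePDF(xhtml, dest=output_file)).

```python
from html import escape
from string import Template

# Hypothetical XHTML template; in practice it would live in its own file.
INVOICE_TEMPLATE = Template("""<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Invoice $number</title></head>
<body>
  <h1>Invoice $number</h1>
  <p>Customer: $customer</p>
  <p>Total: $total</p>
</body>
</html>
""")


def fill_template(number, customer, total):
    """Escape each dynamic value, then substitute it into the template."""
    values = {"number": number, "customer": customer, "total": f"{total:.2f}"}
    return INVOICE_TEMPLATE.substitute(
        {key: escape(str(value)) for key, value in values.items()}
    )
```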
I want to automatically generate booking confirmation PDF files in Python. Most of the content will be static (i.e. logos, booking terms, phone numbers), with a few dynamic bits (dates, costs, etc).
From the user side, the simplest way to do this would be to start with a PDF file containing the static content, and then use Python to add just the dynamic parts. Is this a simple process?
From a bit of searching, it seems that I can use ReportLab for creating content and pyPdf for merging PDFs together. Is this the best approach? Or is there a really funky way that I haven't come across yet?
Thanks!
From the user side, the simplest way to do this would be to start with a PDF file containing the static content, and then use Python to add just the dynamic parts. Is this a simple process?
Unfortunately no. There are several tools that are good at producing PDFs from scratch (most commonly for Python, ReportLab), but they don't generally load existing PDFs. You would have to include generating code for any boilerplate text, lines, blocks, shapes and images, rather than this being freely editable by the user.
On the other hand, there's pyPdf, which can load PDFs, collate the pages, and extract some of the information, but can't really add new content. You can ‘merge’ pages into one, but you'd still have to create the extra information overlay as a page in ReportLab first.
Look into docutils and reStructuredText. You could quickly write out your PDF document in reST and then compile the PDF using rst2pdf.py.
I've used this, and it creates very beautiful documents, and the markup is extensible! Later you could take the same source and run it through rst2html to create a website out of it!
Take a look here:
http://docutils.sourceforge.net/docs/user/rst/quickref.html
http://code.google.com/p/rst2pdf/
Good luck
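Generating the reST source from Python is mostly string building. In this sketch, booking_rst and rst2pdf_command are hypothetical helpers; the one reST rule it takes care of is that a section title's underline must be at least as long as the title text. It assumes rst2pdf is installed and that its -o option names the output file.

```python
def rst_title(text, char="="):
    """reST section title: the underline must be at least as long as the text."""
    return f"{text}\n{char * len(text)}"


def booking_rst(name, items):
    """Hypothetical helper producing a small reST document."""
    lines = [rst_title(f"Booking for {name}"), ""]
    lines += [f"* {item}" for item in items]
    return "\n".join(lines) + "\n"


def rst2pdf_command(rst_path, pdf_path):
    """Command line for rst2pdf: read the .rst file, write the PDF given by -o."""
    return ["rst2pdf", rst_path, "-o", pdf_path]
```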
You could generate a document through, for example, TeX or OpenOffice, or whatever gives you the most comfortable bindings, and then print the document with a PDF printer.
This allows you not to have to figure out where to put fields precisely or figure out what to do if your content overflows the space allocated for it.
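If you go the TeX route, the main pitfall is escaping dynamic values, since characters like & or % have special meaning in LaTeX. This sketch escapes them in a single pass and builds a pdflatex command line; note that pdflatex writes the PDF directly, so no PDF printer is needed in that case. The helper names are hypothetical, and the escape table covers only the standard special characters.

```python
# Characters with special meaning in LaTeX and their escaped forms.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def latex_escape(text):
    """Escape in one pass so braces added by replacements are not re-escaped."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def booking_tex(customer, cost):
    """Hypothetical minimal document with the dynamic values escaped."""
    return (
        "\\documentclass{article}\n"
        "\\begin{document}\n"
        f"Booking confirmed for {latex_escape(customer)}: {latex_escape(cost)}\n"
        "\\end{document}\n"
    )


def pdflatex_command(tex_path, out_dir="."):
    # nonstopmode: don't stop and wait for terminal input when an error occurs
    return ["pdflatex", "-interaction=nonstopmode",
            f"-output-directory={out_dir}", tex_path]
```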
I am using the ReportLab toolkit in Python to generate some reports in PDF format. I want to include some predefined parts of documents already published in PDF format in the generated PDF file. Is it possible (and how) to accomplish this in ReportLab or another Python library?
I know I can use some other tools like PDF Toolkit (pdftk) but I am looking for Python-based solution.
I'm currently using PyPDF to read, write, and combine existing PDFs and ReportLab to generate new content. Using the two packages together seemed to work better than any single package I was able to find.
If you want to place existing PDF pages in your ReportLab documents, I recommend pdfrw. Unlike PageCatcher, it is free.
I've used it for several projects where I needed to add barcodes etc. to existing documents, and it works very well. There are a couple of examples on the project page showing how to use it with ReportLab.
A couple of things to note though:
If the source PDF contains errors (due to the originating program following the PDF spec imperfectly for example), pdfrw may fail even though something like Adobe Reader has no apparent problems reading the PDF. pdfrw is currently not very fault tolerant.
Also, pdfrw works by being completely agnostic to the actual content of the PDF page you are placing. So, for example, you wouldn't be able to use pdfrw to inspect a page to see if it contains a certain string of text in the lower right-hand corner. However, if you don't need to do anything like that, you should be fine.
There is an add-on for ReportLab — PageCatcher.