Creating a PDF from XML using XSL FO w/ Python - python

Could someone point me toward a library, code examples, or any other resources on how to take XML and create a PDF using XSL-FO in Python? If I have to use an XML renderer, which one is recommended?

If you want to run XSLT programmatically with Python, you want lxml.
However, if you just need to create a .fo file from a defined XSL/XML pair, you might as well use xsltproc, which is available on the command line of any Unix-y system, including OS X.
Once you have the .fo file, use Apache FOP to transform it to PDF.
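The two-step pipeline described above (XML plus XSL to .fo, then .fo to PDF) can be sketched as follows. This is a minimal sketch, not a definitive implementation: it assumes `xsltproc` and `fop` are installed and on the PATH, the function names and file names are hypothetical, and the lxml branch only runs if lxml is installed.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(xml_path, xsl_path, pdf_path):
    """Return the xsltproc and fop command lines for XML -> FO -> PDF."""
    # The intermediate .fo file sits next to the PDF (an arbitrary choice).
    fo_path = str(Path(pdf_path).with_suffix(".fo"))
    xslt_cmd = ["xsltproc", "-o", fo_path, str(xsl_path), str(xml_path)]
    fop_cmd = ["fop", "-fo", fo_path, "-pdf", str(pdf_path)]
    return xslt_cmd, fop_cmd


def xml_to_pdf(xml_path, xsl_path, pdf_path, use_lxml=False):
    """Run the XSLT step (xsltproc or lxml), then hand the .fo file to FOP."""
    xslt_cmd, fop_cmd = build_commands(xml_path, xsl_path, pdf_path)
    if shutil.which("fop") is None:
        raise RuntimeError("fop was not found on the PATH")
    if use_lxml:
        # Third-party dependency, imported lazily so the module loads without it.
        from lxml import etree
        transform = etree.XSLT(etree.parse(str(xsl_path)))
        result = transform(etree.parse(str(xml_path)))
        result.write_output(xslt_cmd[2])  # xslt_cmd[2] is the .fo path
    else:
        subprocess.run(xslt_cmd, check=True)
    subprocess.run(fop_cmd, check=True)
```

A call such as `xml_to_pdf("booking.xml", "booking.xsl", "booking.pdf")` would leave `booking.fo` and `booking.pdf` side by side.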

You may want to try XHTML2PDF. It's very easy to use. Create your XHTML template. Use Jinja2 (or similar) to fill in the template. Convert to PDF.
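A rough sketch of that template-then-convert flow is below. To keep it dependency-light, the fill step uses the standard library's `string.Template` as a stand-in for Jinja2 (a swap, not what the answer names), and the template text and function names are made up for illustration. The conversion step assumes the xhtml2pdf package is installed and uses its `pisa.CreatePDF` entry point.

```python
from html import escape
from string import Template

# Hypothetical XHTML template; a real one would carry the full page layout.
PAGE = Template("""<html><body>
<h1>$title</h1>
<p>Prepared for $customer</p>
</body></html>""")


def render_xhtml(title, customer):
    """Fill the template, escaping values so they stay valid XHTML."""
    return PAGE.substitute(title=escape(title), customer=escape(customer))


def xhtml_to_pdf(xhtml, pdf_path):
    """Convert an XHTML string to a PDF file; returns True on success."""
    # Third-party dependency, imported lazily.
    from xhtml2pdf import pisa
    with open(pdf_path, "wb") as out:
        status = pisa.CreatePDF(xhtml, dest=out)
    return not status.err
```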

Related

Graphic representation of the elements in the XML file

I've got an interesting task and I need to know whether I am on the right path.
Task is:
Create Front end visualization using the attached XML file. Visualization should be a graphic representation of the elements in the XML file.
Requirements:
JSON backend service
Open source javascript library for the frontend
Suggested Tools
Python in combination with Tornado backend language
Twitter Bootstrap
I have set up Tornado and created a Python file that opens index.html, which will display a graphic representation of the elements from the XML file.
My idea is to parse the XML to JSON, and then show the output graphically.
I'm uncertain about these things:
1. Am I on the right path, or is there a better solution?
2. Does anyone have experience with graphic representation? Does "graphic representation" mean something like this? Or does it mean I can show the output as a simple element tree?
All ideas and suggestions are welcome!
EDIT:
This is my XML file: http://pastebin.com/AJeNctFY
There are various ways to visualize an XML file. You must first decide what you want. For example, showing XML elements as nodes of a tree sounds good. But what about attributes? What about the node's value?
For example, you might want to show a notification box to show the attributes and their values. Another option is to show each attribute as a child node of the XML element. As an instance of latter, take a look at this XML Viewer.
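As one possible take on the XML-to-JSON step, the sketch below turns each element into a node that carries its attributes and text alongside its children, so the frontend can decide later whether attributes appear in a pop-up or as child nodes. The `name`/`attributes`/`text`/`children` keys are an assumed shape, not something a particular JavaScript library requires, and the sample XML is invented.

```python
import json
import xml.etree.ElementTree as ET


def element_to_node(elem):
    """Convert one XML element (and its subtree) into a plain dict."""
    return {
        "name": elem.tag,
        "attributes": dict(elem.attrib),
        "text": (elem.text or "").strip(),
        "children": [element_to_node(child) for child in elem],
    }


def xml_to_json(xml_text):
    """Parse an XML string and return the element tree as a JSON string."""
    return json.dumps(element_to_node(ET.fromstring(xml_text)))
```

A backend handler (Tornado or otherwise) could return the string from `xml_to_json` as the body of a JSON response.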

create office files from python

We have a project in python with django.
We need to generate complex word, excel and pdf files.
For the rest of our projects which were done in PHP we used PHPexcel ,
PHPWord and tcpdf for PDF.
What Python libraries would you recommend for creating these kinds of files? (For Excel and Word it's important to use the Open XML file formats, xlsx and docx.)
Python-docx may help ( https://github.com/mikemaccana/python-docx ).
Python doesn't have highly developed tools for manipulating Word documents. I've found the Java library xdocreport ( https://code.google.com/p/xdocreport/ ) to be the best by far for Word reporting. Because I need to generate PCL, which is efficiently done via FOP, I also use docx4j.
To integrate this with my python, I use the spark framework to wrap it up with a simple web service, and use requests on the python side to talk to the service.
For excel, there's openpyxl, which actually is a python port of PHPexcel, afaik. I haven't used it yet, but it sounds ok to me.
I would recommend using Docutils. It takes reStructuredText files and converts them to a range of output files. Included in the package are HTML, LaTeX and .odf file writers but in the sandbox there are a whole load of other writers for writing to other formats, see for example, the WordML writer (disclaimer: I haven't used it).
The advantage of this solution is that you can write plain text (reStructuredText) master files, which are human readable as is, and then convert to a range of other file formats as required.
Whilst not a Python solution, you should also look at Pandoc, a Haskell library which supports a much wider range of input and output formats than Docutils. One major advantage of Pandoc over Docutils is that you can do the reverse translation, i.e. WordML to reStructuredText. You can try Pandoc here.
I have never used any libraries for this, but you can change the extension of any docx, xlsx file to zip, and see the magic!
Generating Open XML files is as simple as generating a couple of XML files (you can use templates) and zipping them.
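To make the "XML files in a zip" idea concrete, here is a minimal sketch that writes the three parts a bare-bones .docx package needs, using only the standard library. The part names and namespaces follow the Open XML packaging conventions as I understand them; the function name is hypothetical, and the result is a bare skeleton (no styles, no metadata) that I would expect Word to open but that has not been checked against every consumer.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)
DOCUMENT_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w='
    '"http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body>{body}</w:body></w:document>'
)


def build_docx(paragraphs):
    """Return the bytes of a minimal .docx with one run per paragraph."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("[Content_Types].xml", CONTENT_TYPES)
        archive.writestr("_rels/.rels", RELS)
        archive.writestr("word/document.xml", DOCUMENT_TEMPLATE.format(body=body))
    return buffer.getvalue()
```

Writing the returned bytes to a file ending in .docx and renaming any existing .docx to .zip shows the same three-part layout from both directions.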
Simplest way to generate PDF is to generate HTML (with CSS+images) and convert it using wkhtmltopdf tool.

Inkscape groups/layers generation with Cairo/Pycairo SVG?

I'd like to be able to add objects (filled rectangles, outlines, etc.) to a cairo SVG context in such a way that when I open the SVG file with Inkscape, each would be recognized as an individual object so that I can move/edit it. Ideally, I'd also like to group objects that would appear as "layers" in Inkscape.
Is this possible to do through the cairo API (to some extent), or would I have to generate each element individually, and then stitch them all together to an Inkscape SVG format? Maybe cairo is not the right tool here? What's the best approach, then?
Cairo provides an API for rendering SVG, but not for generating it.
SVG is just XML, so you could use any off-the-shelf XML library to generate your SVG content. If the content is fairly simple, and you would like a Python-based solution, I would recommend lxml. In this case, you would be using Python's ElementTree API to generate XML content.
On the other hand, if the content you're generating is complex, such that you need to position elements dynamically, compute bounding boxes of groups and/or text, and perform other complex tasks, then I would recommend Batik, which implements the SVG DOM and provides such methods. In this case, you would be using the DOM API to generate content. Here are some resources on DOM:
http://www.w3.org/DOM/DOMTR
Java DOM bindings: http://docs.oracle.com/javase/1.4.2/docs/api/org/w3c/dom/package-summary.html
SVG extensions to DOM: http://www.w3.org/TR/SVG/svgdom.html
Batik is written in Java, and so the most straightforward approach would be to use Java to develop against it, but you could also use Jython, which is Python for Java, if you prefer to stick with the python language.
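For the simpler ElementTree route mentioned above, a sketch along these lines produces an SVG whose groups carry Inkscape's layer attributes, with each rectangle as its own element that can be selected and moved. It uses the standard library's `xml.etree.ElementTree` (lxml offers a compatible API); the helper names, ids, labels, and sizes are all made up for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
INKSCAPE_NS = "http://www.inkscape.org/namespaces/inkscape"

# Register prefixes so the output uses xmlns="..." and inkscape:... names.
ET.register_namespace("", SVG_NS)
ET.register_namespace("inkscape", INKSCAPE_NS)


def add_layer(svg, layer_id, label):
    """Add a <g> element marked as an Inkscape layer."""
    return ET.SubElement(svg, f"{{{SVG_NS}}}g", {
        "id": layer_id,
        f"{{{INKSCAPE_NS}}}groupmode": "layer",
        f"{{{INKSCAPE_NS}}}label": label,
    })


def add_rect(parent, rect_id, x, y, width, height, fill):
    """Add one filled rectangle as an individually editable object."""
    return ET.SubElement(parent, f"{{{SVG_NS}}}rect", {
        "id": rect_id, "x": str(x), "y": str(y),
        "width": str(width), "height": str(height), "fill": fill,
    })


def build_svg():
    """Build a two-layer SVG document and return it as a string."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "200", "height": "100"})
    background = add_layer(svg, "layer1", "Background")
    add_rect(background, "bg", 0, 0, 200, 100, "#eeeeee")
    shapes = add_layer(svg, "layer2", "Shapes")
    add_rect(shapes, "box", 20, 20, 60, 40, "#3366cc")
    return ET.tostring(svg, encoding="unicode")
```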

Generating & Merging PDF Files in Python

I want to automatically generate booking confirmation PDF files in Python. Most of the content will be static (i.e. logos, booking terms, phone numbers), with a few dynamic bits (dates, costs, etc).
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then using python to just add the dynamic parts. Is this a simple process?
From doing a bit of search, it seems that I can use reportlab for creating content and pyPdf for merging PDF's together. Is this the best approach? Or is there a really funky way that I haven't come across yet?
Thanks!
From the user side, the simplest way to do this would be to start with a PDF file with the static content, and then using python to just add the dynamic parts. Is this a simple process?
Unfortunately no. There are several tools that are good at producing PDFs from scratch (most commonly for Python, ReportLab), but they don't generally load existing PDFs. You would have to include generating code for any boilerplate text, lines, blocks, shapes and images, rather than this being freely editable by the user.
On the other side there's pyPdf which can load PDFs, collate the pages, and extract some of the information, but can't really add new content. You can ‘merge’ pages into one, but you'd still have to create the extra information overlay as a page in ReportLab first.
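The overlay approach described above might look roughly like this. It is a sketch under several assumptions: it uses pypdf (the maintained successor to the pyPdf package the answer mentions) together with ReportLab, both imported lazily; it stamps only the first page of the template; and the coordinates, field labels, and function names are placeholders.

```python
import io


def field_positions(fields, left=72, top=700, line_height=18):
    """Assign an (x, y, text) position to each field, one line per field.

    Coordinates are in points, measured from the bottom-left corner.
    """
    return [
        (left, top - index * line_height, f"{label}: {value}")
        for index, (label, value) in enumerate(fields)
    ]


def make_overlay(fields):
    """Draw the dynamic fields on an otherwise empty one-page PDF."""
    from reportlab.lib.pagesizes import A4
    from reportlab.pdfgen import canvas
    buffer = io.BytesIO()
    page = canvas.Canvas(buffer, pagesize=A4)
    for x, y, text in field_positions(fields):
        page.drawString(x, y, text)
    page.save()
    buffer.seek(0)
    return buffer


def stamp(template_path, fields, output_path):
    """Merge the overlay onto the first page of the static template."""
    from pypdf import PdfReader, PdfWriter
    base_page = PdfReader(template_path).pages[0]
    overlay_page = PdfReader(make_overlay(fields)).pages[0]
    base_page.merge_page(overlay_page)
    writer = PdfWriter()
    writer.add_page(base_page)
    with open(output_path, "wb") as out:
        writer.write(out)
```

The static design stays in the template PDF, so only `field_positions` needs adjusting when the layout changes.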
Look into docutils and reStructuredText. You could quickly write out your PDF document in reST and then compile the PDF using rst2pdf.py.
I've used this; it creates very beautiful documents and the markup is extensible! Later you could take the same source and run it through rst2html to create a website out of it!
Take a look here:
http://docutils.sourceforge.net/docs/user/rst/quickref.html
http://code.google.com/p/rst2pdf/
Good luck
You could generate a document through, for example, TeX, or OpenOffice, or whatever gives you the most comfortable bindings and then print the document with a pdf printer.
This allows you not to have to figure out where to put fields precisely or figure out what to do if your content overflows the space allocated for it.

How to include page in PDF in PDF document in Python

I am using reportlab toolkit in Python to generate some reports in PDF format. I want to use some predefined parts of documents already published in PDF format to be included in generated PDF file. Is it possible (and how) to accomplish this in reportlab or in python library?
I know I can use some other tools like PDF Toolkit (pdftk) but I am looking for Python-based solution.
I'm currently using PyPDF to read, write, and combine existing PDFs and ReportLab to generate new content. Using the two packages seemed to work better than any single package I was able to find.
If you want to place existing PDF pages in your Reportlab documents I recommend pdfrw. Unlike PageCatcher it is free.
I've used it for several projects where I need to add barcodes etc to existing documents and it works very well. There are a couple of examples on the project page of how to use it with Reportlab.
A couple of things to note though:
If the source PDF contains errors (due to the originating program following the PDF spec imperfectly for example), pdfrw may fail even though something like Adobe Reader has no apparent problems reading the PDF. pdfrw is currently not very fault tolerant.
Also, pdfrw works by being completely agnostic to the actual content of the PDF page you are placing. So, for example, you wouldn't be able to use pdfrw to inspect a page to see if it contains a certain string of text in the lower right-hand corner. However, if you don't need to do anything like that, you should be fine.
There is an add-on for ReportLab — PageCatcher.
