Is it possible to output individual figures from Bokeh as pdf or svg images? I feel like I'm missing something obvious, but I've checked the online help pages and gone through the bokeh.objects api and haven't found anything...
There is no way to save PDF currently, but as of Bokeh 0.12.6, it is now possible to export PNG and SVG directly from Python code.
Exporting PNGs looks like this:
from bokeh.io import export_png
export_png(plot, filename="plot.png")
And exporting SVGs looks like this:
from bokeh.io import export_svgs
plot.output_backend = "svg"
export_svgs(plot, filename="plot.svg")
Some optional dependencies need to be installed for this to work (Selenium plus a headless browser that it can drive to render the plot).
You can find more information in the Exporting Plots section of the User Guide.
In the meantime, as a workaround until there is native support, you can use PhantomJS to convert the HTML output into a PDF file. We use it in our example testing directory to convert HTML-generated plots into PNG images, but you could also produce PDF images:
https://github.com/ContinuumIO/bokeh/blob/master/examples/test#L217
And more info here:
http://phantomjs.org/screen-capture.html
It seems that since Bokeh uses an HTML5 canvas as its backend, it writes to static HTML pages. You could always export the HTML to PDF later.
So, it's 2022 now and there's still no direct support for exporting images as PDF.
However, there is an alternative way that works just as well.
First, as the other answers have mentioned, set the backend to 'svg'.
plot.output_backend = "svg"
Then save the image as an SVG file (for example with export_svgs). Do not save it as an HTML file, as this will likely lead to extra space around the actual image.
Then use any online SVG-to-PDF converter to get the PDF equivalent of the SVG.
As an accountant, I produce A4 PDF financial reports for clients. The report contains a PDF cover page design, table of contents, blocks of text and many tables of financial data.
To date I have used a mixture of Microsoft Excel and Word to produce these reports, then saved them as PDF and added the PDF cover. The major disadvantage is that I have to edit the tables manually; I would much rather create automated reports based on existing data exported from my accounting software.
I would like to move away from Excel and Word towards (semi-)automating this through Python (potentially with the pandas and markdown packages), using Markdown or HTML.
Previously I used LaTeX to produce these reports; however, I found LaTeX challenging when something went wrong: the errors are difficult to understand, and even basic table production can be challenging.
I am trying to plan out how I could bring together Python, Markdown, and HTML/CSS. I was wondering whether anyone else has experience producing A4 reports this way and any advice they could offer. Initially I was drawn to keeping text in .md files and data in MongoDB, pandas DataFrames, or simply CSV. I would then combine the .md files and the data to produce a complete report in HTML. However, could HTML be converted into A4 PDF easily? I understand that there is now page CSS functionality for printing, but is this applicable? How would you suggest I can automate the creation of A4 PDF reports?
To answer your questions plainly:
However, could HTML be converted into A4 PDF easily?
Yes, this is possible using pandoc.
I understand that there is now page CSS functionality for printing, but is this applicable?
Not needed if you use a pandoc template, but possible if desired.
How would you suggest I can automate the creation of A4 PDF reports?
I suggest using pandoc and pandoc templates. This will allow you to convert a file containing a mixture of Markdown, LaTeX, HTML, and whatever else you would like directly into a PDF.
More details on how:
Pandoc is a document conversion tool that can do this job very well. It lets you convert from HTML, Markdown, LaTeX, or a mix of all three into PDF or a number of other formats. For additional control over how the output looks, you can use a pandoc template. You can find information on how to create a custom template here. Here is an example of how that command works:
pandoc /filepath/doc_name.md -o doc_name.pdf --template /file_path/pandoc-templates/article.latex
This process can be automated with some further effort. For example, you could write some Python code to generate your graphs or tables from source CSV files, then have that code call your pandoc command and build a document.
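As a minimal sketch of that automation, assuming pandoc (plus a LaTeX engine, which pandoc uses for PDF output by default) is installed and on the PATH: the code below turns CSV text into a Markdown pipe table, writes it after some introductory text into a .md file, and calls pandoc. The helper names (csv_to_markdown_table, pandoc_command, build_report) are hypothetical, and `-V papersize=a4` only has an effect with templates that use the papersize variable, such as pandoc's default LaTeX template.

```python
import csv
import io
import subprocess
from pathlib import Path


def csv_to_markdown_table(csv_text):
    """Turn CSV text (first row = header) into a Markdown pipe table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",  # header separator row
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def pandoc_command(md_path, pdf_path, template=None):
    """Build the pandoc argument list; papersize=a4 asks the LaTeX template for A4."""
    cmd = ["pandoc", str(md_path), "-o", str(pdf_path), "-V", "papersize=a4"]
    if template is not None:
        cmd += ["--template", str(template)]
    return cmd


def build_report(intro_md, csv_text, md_path, pdf_path, template=None):
    """Write the combined Markdown file, then let pandoc render it to PDF."""
    md = intro_md.rstrip() + "\n\n" + csv_to_markdown_table(csv_text) + "\n"
    Path(md_path).write_text(md, encoding="utf-8")
    # Raises CalledProcessError if pandoc (or its PDF engine) fails.
    subprocess.run(pandoc_command(md_path, pdf_path, template), check=True)
```

A call such as `build_report("# Annual report\n\nSummary text.", Path("trial_balance.csv").read_text(), "report.md", "report.pdf")` (file names hypothetical) would then regenerate the whole report from the latest export.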
Here is how I convert my IPython notebooks with graphics outputs and tables into nice-looking PDF files, hiding the code cells:
First install jupyter_contrib_nbextensions with:
pip install jupyter_contrib_nbextensions
and the wkhtmltopdf tool from:
https://wkhtmltopdf.org/downloads.html
For example, I use macOS, so I had to install wkhtmltox-0.12.6-2.macos-cocoa.pkg from this site.
Now convert your notebook's outputs to HTML, hiding the code:
jupyter nbconvert --no-input --to html A4_REPORT.ipynb
(A4_REPORT.ipynb is a notebook you have already prepared: it runs in Jupyter Notebook and produces some kind of table or graph, or contains inline Markdown sections.)
Now convert this HTML to PDF:
wkhtmltopdf A4_REPORT.html A4_REPORT.pdf
Done!
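The two commands can be chained from Python so the report is one call. This sketch only reproduces the steps above and assumes `jupyter` and `wkhtmltopdf` are on the PATH; notebook_to_pdf_commands and notebook_to_pdf are hypothetical helper names. nbconvert exports the outputs already saved in the notebook (its `--execute` flag would re-run the notebook first).

```python
import subprocess
from pathlib import Path


def notebook_to_pdf_commands(notebook):
    """Return the nbconvert and wkhtmltopdf steps as argument lists."""
    nb = Path(notebook)
    html = nb.with_suffix(".html")  # nbconvert writes next to the notebook by default
    pdf = nb.with_suffix(".pdf")
    return [
        # Export stored outputs to HTML, hiding all input (code) cells.
        ["jupyter", "nbconvert", "--no-input", "--to", "html", str(nb)],
        # Render the HTML as an A4 PDF; --page-size is a global wkhtmltopdf option.
        ["wkhtmltopdf", "--page-size", "A4", str(html), str(pdf)],
    ]


def notebook_to_pdf(notebook):
    """Run both steps, stopping at the first one that fails."""
    for cmd in notebook_to_pdf_commands(notebook):
        subprocess.run(cmd, check=True)
```

Running `notebook_to_pdf("A4_REPORT.ipynb")` should leave A4_REPORT.html and A4_REPORT.pdf next to the notebook.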
I am trying different Python libraries such as pdftotree, pdfminer, and tabula, but I could not get exact results. I can get text, images, and tabular data from a PDF into HTML, but not laid out and organized like the original PDF file. Can someone help me with this? I would be thankful.
Mostly yes. Translate the PDF to SVG, and embed the SVG in your web page.
SVG's image model (what it can represent and how) is a near-superset of the PDF image model (which is itself a superset of PostScript), though SVG lacks some of the print-specific features of PDF. There are probably quite a few PDF-to-SVG converters out there already; Googling "PDF to SVG" turned up quite a few promising hits.
There will be some complications:
Many PDF files are longer than 1 page. You might need to generate 10 SVG files for a single 10 page PDF file, and then build a web page around those 10 SVGs. Throw in some dynamic HTML to "turn pages" and you've got a good web-based PDF viewer.
There are parts of PDF that aren't within its image model at all... bookmarks, annotations (form fields, digital signatures), document metadata (author, creation date, etc), and so forth. Some of the non-image-model stuff is common enough that a PDF to SVG utility might handle it directly (links), while other stuff doesn't have an HTML equivalent and would be lost.
You could preserve the appearance of a digital signature, but the actual security represented by those visuals would be gone. Preserving that signature's appearance could be considered lying about the security.
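For the multi-page complication, here is a rough sketch of the "turn pages" viewer. It assumes a converter has already written one SVG file per page (the page-1.svg, page-2.svg names are hypothetical) and uses only the standard library to emit an HTML page that shows one SVG at a time with Prev/Next buttons. Embedding through `<img>` keeps it simple, but links inside the SVGs are not clickable that way; `<object>` or inline SVG would keep them.

```python
import html


def svg_pages_viewer(svg_files, title="Document"):
    """Return an HTML page showing one SVG per page with Prev/Next buttons."""
    imgs = "\n".join(
        f'<img class="page" src="{html.escape(str(path), quote=True)}" '
        f'alt="Page {i}"{"" if i == 1 else " hidden"}>'  # only page 1 starts visible
        for i, path in enumerate(svg_files, start=1)
    )
    return f"""<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>{html.escape(title)}</title></head>
<body>
<button id="prev">Prev</button> <button id="next">Next</button>
<div id="pages">
{imgs}
</div>
<script>
  const pages = document.querySelectorAll(".page");
  let current = 0;
  function show(n) {{
    pages[current].hidden = true;
    current = Math.max(0, Math.min(pages.length - 1, n));  // clamp to valid pages
    pages[current].hidden = false;
  }}
  document.getElementById("prev").onclick = () => show(current - 1);
  document.getElementById("next").onclick = () => show(current + 1);
</script>
</body>
</html>
"""
```

Writing `svg_pages_viewer([f"page-{n}.svg" for n in range(1, 11)])` to an .html file next to the SVGs would give a basic viewer for a 10-page document.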
http://pillow.readthedocs.org/en/3.0.x/handbook/image-file-formats.html#pdf
The Pillow docs mention being able to save multiple pages, but I can't find any other docs or code samples that tell you how to add pages to a new PDF.
You could try the following:
im.save('test.pdf', save_all=True)
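By default, the output format is determined by the file extension, and save_all=True writes every frame of a multi-frame image (for example a multi-page TIFF) as its own PDF page. This is documented in the release notes.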
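In Pillow versions whose PDF writer supports `append_images`, separate images can also become pages of one PDF: save the first image with `save_all=True` and pass the others as `append_images`. A minimal sketch (images_to_pdf is a hypothetical helper name):

```python
from PIL import Image


def images_to_pdf(images, path, resolution=72.0):
    """Save a list of PIL images as one multi-page PDF, one image per page."""
    # Normalise to RGB so every page uses a mode the PDF writer handles.
    pages = [im.convert("RGB") for im in images]
    first, rest = pages[0], pages[1:]
    # save_all=True writes all frames; append_images adds the rest as extra pages.
    first.save(path, save_all=True, append_images=rest, resolution=resolution)
```

For example, `images_to_pdf([Image.open(p) for p in ("scan1.png", "scan2.png")], "test.pdf")` (file names hypothetical) produces a two-page PDF.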
It doesn't seem like this is currently possible with Pillow; I used ReportLab to build a multi-page PDF from scratch:
https://pypi.python.org/pypi/reportlab
I want to print or save a Gantt chart (in PDF format). The charts are generated on the web after a particular input. Our chart is a plug-in for Trac, and I have used the Genshi library to generate the charts.
There's an open source Python library for generating PDF files by ReportLab. I've not used it myself, but other questions and answers on SO have revolved around this library, the ReportLab Toolkit.
Can you give more information about your plugin? There is a gantt chart plugin on trac-hacks.org; is that the one you are using, or a custom one? If custom, is it available as Open Source somewhere so we can see what you are doing?
If you implemented this as a wiki macro, you can use the WikiToPdf plugin to do this.
You could use WeasyPrint to convert HTML to PDF. From the example on their website:
weasyprint http://www.w3.org/TR/CSS21/intro.html CSS21-intro.pdf -s http://weasyprint.org/samples/CSS21-print.css
This creates a PDF file from the given HTML page and CSS stylesheet. WeasyPrint is implemented in Python, so it can also be used as a library.
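A sketch of calling WeasyPrint from Python, which fits a plugin that already produces the chart as HTML. It assumes the Gantt chart markup (an inline SVG, or an `<img>` tag pointing at a rendered image) is available as a string; chart_page_html and write_pdf are hypothetical names. The `@page` rule uses CSS paged-media features (page size, margin boxes, page counters) that WeasyPrint supports.

```python
from html import escape

# Print layout: A4 pages, 20 mm margins, "Page X of Y" in the bottom-right margin box.
A4_CSS = """
@page {
  size: A4;
  margin: 20mm;
  @bottom-right { content: "Page " counter(page) " of " counter(pages); }
}
"""


def chart_page_html(title, chart_html):
    """Wrap already-rendered chart markup in a print-ready HTML page."""
    return (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{escape(title)}</title><style>{A4_CSS}</style></head>"
        f"<body><h1>{escape(title)}</h1>{chart_html}</body></html>"
    )


def write_pdf(html_text, pdf_path, base_url=None):
    # Imported here so the HTML helper works where WeasyPrint isn't installed.
    from weasyprint import HTML
    # base_url lets relative paths (e.g. <img src="gantt.png">) resolve.
    HTML(string=html_text, base_url=base_url).write_pdf(pdf_path)
```

For example, `write_pdf(chart_page_html("Project schedule", chart_markup), "gantt.pdf", base_url=".")`, where chart_markup is whatever the plugin renders.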
I am using the ReportLab toolkit in Python to generate some reports in PDF format. I want to include some predefined parts of documents already published in PDF format in the generated PDF file. Is it possible (and how) to accomplish this in ReportLab or with another Python library?
I know I can use some other tools like PDF Toolkit (pdftk) but I am looking for Python-based solution.
I'm currently using PyPDF to read, write, and combine existing PDFs and ReportLab to generate new content. Using the two packages seemed to work better than any single package I was able to find.
If you want to place existing PDF pages in your Reportlab documents I recommend pdfrw. Unlike PageCatcher it is free.
I've used it for several projects where I need to add barcodes etc to existing documents and it works very well. There are a couple of examples on the project page of how to use it with Reportlab.
A couple of things to note though:
If the source PDF contains errors (due to the originating program following the PDF spec imperfectly for example), pdfrw may fail even though something like Adobe Reader has no apparent problems reading the PDF. pdfrw is currently not very fault tolerant.
Also, pdfrw works by being completely agnostic to the actual content of the PDF page you are placing. So, for example, you wouldn't be able to use pdfrw to inspect a page to see whether it contains a certain string of text in the lower right-hand corner. However, if you don't need to do anything like that, you should be fine.
There is an add-on for ReportLab called PageCatcher.