I'm working on a project that takes some images from the user and then creates a PDF file containing all of them.
Is there any way or any tool to do this in Python? E.g. to create a PDF file (or eps, ps) from image1 + image 2 + image 3 -> PDF file?
Here is my experience after following the hints on this page.
pyPDF can't embed images into files. It can only split and merge. (Source: Ctrl+F through its documentation page)
Which is great, but not if you have images that are not already embedded in a PDF.
pyPDF2 doesn't seem to have any extra documentation on top of pyPDF.
ReportLab is very extensive. (Userguide) However, with a bit of Ctrl+F and grepping through its source, I got this:
First, download the Windows installer and source
Then try this on the Python command line:
from reportlab.pdfgen import canvas
from reportlab.lib.units import inch, cm
c = canvas.Canvas('ex.pdf')
c.drawImage('ar.jpg', 0, 0, 10*cm, 10*cm)
c.showPage()
c.save()
All I needed is to get a bunch of images into a PDF, so that I can check how they look and print them. The above is sufficient to achieve that goal.
ReportLab is great, but would benefit from including helloworlds like the above prominently in its documentation.
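Building on the snippet above, here is a minimal sketch that puts each image on its own page, scaled to fit inside the page margins. The helper names (fit_within, images_to_pdf), the A4 page size, and the 36-point margin are illustrative assumptions, not something ReportLab provides.

```python
def fit_within(img_w, img_h, box_w, box_h):
    """Scale (img_w, img_h) to fit inside (box_w, box_h), keeping the aspect ratio."""
    scale = min(box_w / img_w, box_h / img_h)
    return img_w * scale, img_h * scale


def images_to_pdf(image_paths, pdf_path, margin=36):
    """Write one image per A4 page (hypothetical helper; margin is in points)."""
    # Imported here so fit_within stays usable without ReportLab installed.
    from reportlab.lib.pagesizes import A4
    from reportlab.lib.utils import ImageReader
    from reportlab.pdfgen import canvas

    page_w, page_h = A4
    c = canvas.Canvas(pdf_path, pagesize=A4)
    for path in image_paths:
        image = ImageReader(path)
        img_w, img_h = image.getSize()  # size in pixels
        w, h = fit_within(img_w, img_h, page_w - 2 * margin, page_h - 2 * margin)
        # Center the scaled image on the page.
        c.drawImage(image, (page_w - w) / 2, (page_h - h) / 2, width=w, height=h)
        c.showPage()  # finish this page; later drawing goes on a new one
    c.save()
```

Calling images_to_pdf with a list of image paths and an output name such as 'ex.pdf' should then give one page per image.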
I suggest Pdfkit. (installation guide)
It creates pdf from html files. I chose it to create pdf in 2 steps from my Python Pyramid stack:
Rendering server-side with mako templates with the style and markup you want for your pdf document
Executing pdfkit.from_string(...) method by passing the rendered html as parameter
This way you get a pdf document with styling and images supported.
You can install it with pip:
pip install pdfkit
You will also need to install wkhtmltopdf (on Ubuntu).
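For the images-only case from the question, the two steps can be sketched roughly as follows. This swaps the mako template for plain string building, and the helper names (build_image_html, images_to_pdf) are made up for illustration. Only pdfkit.from_string comes from the library.

```python
from html import escape
from pathlib import Path


def build_image_html(image_paths):
    """Return an HTML page with one <img> per path (plain strings, no template engine)."""
    tags = []
    for path in image_paths:
        uri = Path(path).resolve().as_uri()  # file:// URI for a local image
        tags.append('<p><img src="{}" style="max-width: 100%"></p>'.format(escape(uri)))
    return "<html><body>\n{}\n</body></html>".format("\n".join(tags))


def images_to_pdf(image_paths, pdf_path):
    """Render the generated HTML to a PDF file with pdfkit."""
    import pdfkit  # needs the wkhtmltopdf binary to be installed

    # Assumption: some wkhtmltopdf builds block file:// images unless local
    # file access is enabled through pdfkit's options argument.
    pdfkit.from_string(build_image_html(image_paths), pdf_path)
```

If the images come out blank, checking the wkhtmltopdf local file access setting would be the first thing to try.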
I suggest pyPdf. It works really nice. I also wrote a blog post some while ago, you can find it here.
You can try this (Python-for-PDF-Generation), or you can try PyQt, which supports printing to PDF.
Python for PDF Generation
The Portable Document Format (PDF) lets you create documents that look exactly the same on every platform. Sometimes a PDF document needs to be generated dynamically, however, and that can be quite a challenge. Fortunately, there are libraries that can help. This article examines one of those for Python.
Read more at http://www.devshed.com/c/a/Python/Python-for-PDF-Generation/#whoCFCPh3TAks368.99
fpdf works well for me. Much simpler than ReportLab and really free. Works with UTF-8.
Here is a solution that needs only commonly installed scientific packages. matplotlib has a PDF backend that saves figures to PDF. You can create a figure with subplots, where each subplot is one of your images. You have full freedom to adjust the figure: add titles, change positions, and so on. Once the figure is done, save it to PDF. Each call to savefig adds another page to the PDF.
Example below plots 2 images side-by-side, on page 1 and page 2.
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
import os
import numpy as np

files = ["Column0_Line16.jpg", "Column0_Line47.jpg"]

def plotImage(f):
    folder = "C:/temp/"
    # plt.imread stands in for scipy.misc.imread, which newer SciPy releases no longer ship
    im = plt.imread(os.path.join(folder, f)).astype(np.float32) / 255
    plt.imshow(im)
    a = plt.gca()
    a.get_xaxis().set_visible(False)  # We don't need axis ticks
    a.get_yaxis().set_visible(False)

pp = PdfPages("c:/temp/page1.pdf")
plt.subplot(121)
plotImage(files[0])
plt.subplot(122)
plotImage(files[1])
pp.savefig(plt.gcf())  # This generates page 1
pp.savefig(plt.gcf())  # This generates page 2 (the same figure again)
pp.close()
rinohtype supports embedding PDF, PNG and JPEG images (natively) and other bitmap formats (when Pillow is installed).
(Full disclosure: I am the author of rinohtype)
fpdf is python (too). And often used. See PyPI / pip search. But maybe it was renamed from pyfpdf to fpdf. From features:
PNG, GIF and JPG support (including transparency and alpha channel)
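A minimal sketch of the fpdf route, one image per A4 page. The helper names and the 10 mm margin are assumptions for illustration. The FPDF calls used here (add_page, image, output) exist in both pyfpdf and fpdf2 as far as I know.

```python
def usable_width(page_width_mm=210, margin_mm=10):
    """Width left for the image after subtracting both side margins (A4 by default)."""
    return page_width_mm - 2 * margin_mm


def images_to_pdf(image_paths, pdf_path, margin_mm=10):
    """Place one image per A4 page, scaled to the usable width."""
    from fpdf import FPDF  # provided by the fpdf / fpdf2 package

    pdf = FPDF(orientation="P", unit="mm", format="A4")
    for path in image_paths:
        pdf.add_page()
        # Giving only w lets fpdf derive the height from the aspect ratio.
        pdf.image(path, x=margin_mm, y=margin_mm, w=usable_width(margin_mm=margin_mm))
    pdf.output(pdf_path)
```

Very tall images may run past the bottom of the page with this approach, since only the width is constrained.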
If you are familiar with LaTeX, you might want to consider pylatex.
One of the advantages of pylatex is that it is easy to control the image quality. The images in your pdf will be of the same quality as the original images. When using reportlab, I experienced that the images were automatically compressed, and the image quality reduced.
The disadvantage of pylatex is that, since it is based on LaTeX, it can be hard to place images exactly where you want on the page. However, I have found that using the position argument in the Figure class, and sometimes Subfigure, gives good enough results.
Example code for creating a pdf with a single image:
from pylatex import Document, Figure

doc = Document(documentclass="article")
with doc.create(Figure(position='p')) as fig:
    fig.add_image('Lenna.png')
doc.generate_pdf('test', compiler='latexmk', compiler_args=["-pdf", "-pdflatex=pdflatex"], clean_tex=True)
In addition to installing pylatex (pip install pylatex), you need to install LaTeX. For Ubuntu and other Debian systems you can run sudo apt-get install texlive-full. If you are using Windows I would recommend MiKTeX.
I have done this quite a bit in PyQt and it works very well. Qt has extensive support for images, fonts, styles, etc and all of those can be written out to pdf documents.
I believe that matplotlib has the ability to serialize graphics, text and other objects to a pdf document.
I use rst2pdf to create a pdf file, since I am more familiar with RST than with HTML. It supports embedding almost any kind of raster or vector images.
It requires reportlab, but I found reportlab is not so straightforward to use (at least for me).
You can actually try xhtml2pdf http://flask.pocoo.org/snippets/68/
It depends on what format your image files are in, but for a project here at work I used the tiff2pdf tool in LibTIFF from RemoteSensing.org.
Basically just used subprocess to call tiff2pdf.exe with the appropriate argument to read the kind of tiff I had and output the kind of pdf I wanted. If they aren't tiffs you could probably convert them to tiffs using PIL, or maybe find a tool more specific to your image type (or more generic if the images will be diverse) like ReportLab mentioned above.
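The subprocess call described above might look roughly like this. The helper names are hypothetical, and only the basic -o (output file) option of tiff2pdf is used. Any extra arguments for a particular kind of TIFF would have to be added by hand.

```python
import subprocess


def build_tiff2pdf_command(tiff_path, pdf_path, executable="tiff2pdf"):
    """Build the argument list for tiff2pdf; -o names the output file."""
    return [executable, "-o", pdf_path, tiff_path]


def tiff_to_pdf(tiff_path, pdf_path):
    """Run tiff2pdf and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_tiff2pdf_command(tiff_path, pdf_path), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.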
Related
I need to generate a large pptx file with lots of dynamic charts, images, and tables. The pptx has default styles. Is there a library or method for building a template into which I can insert the dynamic parts, the way the docxtpl library lets us pass a dict and generate a docx file?
Thanks.
You can try the following python modules:
https://pypi.org/project/template-pptx-jinja/
https://pypi.org/project/pptx-template-simple/
https://pypi.org/project/python-pptx-templater/
The example shown for the first one works fine, but I'm having trouble applying it to another custom ppt: I keep getting "Unexpected end of template".
The third one didn't work for me, though maybe you will have more luck. I will try the second one next.
I'm not sure I understand your problem...
python-pptx (a tag you've used) is the fundamental programmable way to build a presentation, whether from a "template" presentation or not.
I hope nobody will mind me advertising my md2pptx open source project for taking Markdown and images and making a presentation.
However, I think md2pptx doesn't help you unless you have a way of turning graphs into eg PNG files - and I suspect that's not what you want.
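As a rough illustration of the python-pptx route for text, the sketch below fills placeholder tokens in a template deck. The {{ name }} token syntax and both helper names are my own assumptions, not a python-pptx feature. Charts, images, and tables are not handled here.

```python
import re

TOKEN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_tokens(text, values):
    """Replace {{ name }} tokens with values[name]; unknown tokens are left untouched."""
    return TOKEN.sub(lambda m: str(values.get(m.group(1), m.group(0))), text)


def fill_presentation(template_path, output_path, values):
    """Fill text tokens in every text run of a template deck."""
    from pptx import Presentation  # python-pptx

    prs = Presentation(template_path)
    for slide in prs.slides:
        for shape in slide.shapes:
            if not shape.has_text_frame:
                continue
            for paragraph in shape.text_frame.paragraphs:
                for run in paragraph.runs:
                    # Editing run.text keeps the run's existing formatting.
                    run.text = fill_tokens(run.text, values)
    prs.save(output_path)
```

One caveat: PowerPoint sometimes splits a piece of text across several runs, in which case a token would not be matched by this run-by-run approach.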
http://pillow.readthedocs.org/en/3.0.x/handbook/image-file-formats.html#pdf
The Pillow docs mention being able to save multiple pages, but I can't find any other docs or code samples that tell you how to add pages to a new PDF.
You could try the following:
im.save('test.pdf', save_all=True)
By default, the output format is determined by the file extension. This is documented in the release notes.
It doesn't seem like this is currently possible with Pillow, so I used reportlab to build a multi-page PDF from scratch:
https://pypi.python.org/pypi/reportlab
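For separate image files, more recent Pillow releases accept an append_images argument alongside save_all when writing PDF, so a multi-page file can be built without another library. This is a sketch under that assumption. The function name is hypothetical, and older Pillow versions may not support it.

```python
def images_to_pdf(image_paths, pdf_path):
    """Save every image in image_paths as one page of a single PDF."""
    if not image_paths:
        raise ValueError("need at least one image")
    from PIL import Image  # Pillow

    # Converting to RGB avoids errors that some Pillow versions raise
    # when asked to write modes such as RGBA to PDF.
    pages = [Image.open(path).convert("RGB") for path in image_paths]
    # The first image starts the file; the rest are appended as extra pages.
    pages[0].save(pdf_path, save_all=True, append_images=pages[1:])
```

The output format is still picked from the .pdf extension, as described above.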
Is it possible to output individual figures from Bokeh as pdf or svg images? I feel like I'm missing something obvious, but I've checked the online help pages and gone through the bokeh.objects api and haven't found anything...
There is no way to save PDF currently, but as of Bokeh 0.12.6, it is now possible to export PNG and SVG directly from Python code.
Exporting PNGs looks like this
from bokeh.io import export_png
export_png(plot, filename="plot.png")
And exporting SVGs looks like this
from bokeh.io import export_svgs
plot.output_backend = "svg"
export_svgs(plot, filename="plot.svg")
There are some optional dependencies that need to be installed.
You can find more information in the Exporting Plots section of the User Guide.
In the meantime, as a workaround until we get native support, you can use phantom.js to convert the HTML output into a pdf file. We use it in our example testing directory to convert HTML generated plots into png images, but you could also get pdf images:
https://github.com/ContinuumIO/bokeh/blob/master/examples/test#L217
And more info here:
http://phantomjs.org/screen-capture.html
It seems that since bokeh uses html5 canvas as a backend, it will be writing things to static html pages. You could always export the html to pdf later.
So, it's 2022 now and there's still no direct support for exporting images as pdf.
However, there is an alternative way which works just as well.
First, as the other answers have mentioned, set the backend to 'svg'.
plot.output_backend = "svg"
Then save the image as an svg file. Do not save it as an html file, as this will likely lead to extra space around the actual image.
Finally, use any online svg-to-pdf converter to get the pdf equivalent of the svg.
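If you would rather not upload the file to an online converter, the last step can probably be done locally as well. The sketch below uses svglib together with ReportLab, which is my own substitution and not something Bokeh provides. The helper names are hypothetical, and svglib does not cover every SVG feature, so the result is worth checking by eye.

```python
from pathlib import Path


def default_pdf_path(svg_path):
    """Return svg_path with its extension swapped for .pdf."""
    return Path(svg_path).with_suffix(".pdf")


def svg_to_pdf(svg_path, pdf_path=None):
    """Convert an SVG file to PDF locally using svglib and ReportLab."""
    from reportlab.graphics import renderPDF
    from svglib.svglib import svg2rlg

    pdf_path = default_pdf_path(svg_path) if pdf_path is None else Path(pdf_path)
    drawing = svg2rlg(str(svg_path))  # parse the SVG into a ReportLab drawing
    renderPDF.drawToFile(drawing, str(pdf_path))
    return pdf_path
```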
I'm trying to automatically generate some pdf format reports in Python. I have figures that I want in the reports, but the figures are currently saved as pdfs. Saving the figures as something else is an option, but not ideal for what I'm trying to do. I've found examples (http://code.google.com/p/pdfrw/wiki/ExampleTools) using pdfrw and reportlab to turn pages from one pdf into pages of a new pdf, but I don't want them to be entire pages of my new pdf, just a figure occupying a section of the page. I haven't used pdfrw before, so I don't know a lot about the Canvas method and what it is fully capable of.
The trick to using a part of a page with the pagemerge canvas is the ViewInfo object. With it, you can describe, among other things, rectangle coordinates (e.g. when you are adding a page to a PageMerge object).
ViewInfo objects are defined and described in more detail in the buildxobj.py file.
I should make another example and some documentation for that, but here is a similar stackoverflow question that I answered awhile back. HTH
(Disclaimer: I am the pdfrw author.)
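For the ReportLab side of the question, a different route from PageMerge and ViewInfo is to turn the figure's page into a form XObject and draw it at a chosen position and scale. The sketch below follows the pattern of the pdfrw ReportLab examples (pagexobj plus makerl). The helper names, coordinates, and target width are illustrative assumptions.

```python
def scale_for_width(source_width, target_width):
    """Uniform scale factor that makes a figure target_width points wide."""
    return target_width / source_width


def place_pdf_figure(figure_pdf, output_pdf, x=72, y=400, target_width=300):
    """Draw page 1 of figure_pdf on a new ReportLab page at (x, y)."""
    from pdfrw import PdfReader
    from pdfrw.buildxobj import pagexobj
    from pdfrw.toreportlab import makerl
    from reportlab.pdfgen.canvas import Canvas

    figure = pagexobj(PdfReader(figure_pdf).pages[0])  # page as a form XObject
    source_width = float(figure.BBox[2]) - float(figure.BBox[0])
    scale = scale_for_width(source_width, target_width)

    c = Canvas(output_pdf)
    c.drawString(72, 760, "Report text can go here")
    c.saveState()
    c.translate(x, y)      # lower-left corner of the figure
    c.scale(scale, scale)  # shrink the whole source page to figure size
    c.doForm(makerl(c, figure))
    c.restoreState()
    c.showPage()
    c.save()
```

Because the figure stays vector data inside the new PDF, it should not lose quality the way a rasterized copy would.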
I am using reportlab toolkit in Python to generate some reports in PDF format. I want to use some predefined parts of documents already published in PDF format to be included in generated PDF file. Is it possible (and how) to accomplish this in reportlab or in python library?
I know I can use some other tools like PDF Toolkit (pdftk) but I am looking for Python-based solution.
I'm currently using PyPDF to read, write, and combine existing PDFs and ReportLab to generate new content. Using the two packages seemed to work better than any single package I was able to find.
If you want to place existing PDF pages in your Reportlab documents I recommend pdfrw. Unlike PageCatcher it is free.
I've used it for several projects where I need to add barcodes etc to existing documents and it works very well. There are a couple of examples on the project page of how to use it with Reportlab.
A couple of things to note though:
If the source PDF contains errors (due to the originating program following the PDF spec imperfectly for example), pdfrw may fail even though something like Adobe Reader has no apparent problems reading the PDF. pdfrw is currently not very fault tolerant.
Also, pdfrw works by being completely agnostic to the actual content of the PDF page you are placing. So for example, you wouldn't be able to use pdfrw to inspect a page to see if it contains a certain string of text in the lower right-hand corner. However, if you don't need to do anything like that you should be fine.
There is an add-on for ReportLab — PageCatcher.