Embed .SVG files into PDF using reportlab - python

I have written a script in python that produces matplotlib graphs and puts them into a pdf report using reportlab.
I am having difficulty embedding SVG image files into my PDF file. I've had no trouble using PNG images but I want to use SVG format as this produces better quality images in the PDF report.
This is the error message I am getting:
IOError: cannot identify image file
Does anyone have suggestions or have you overcome this issue before?

Yesterday I succeeded in using svglib to add an SVG image as a reportlab Flowable.
svglib's svg2rlg() returns an instance of reportlab's Drawing:
from reportlab.graphics.shapes import Drawing
and a reportlab Drawing inherits from Flowable:
from reportlab.platypus import Flowable
Here is a minimal example that also shows how to scale it correctly (you only need to specify path and factor):
from svglib.svglib import svg2rlg
drawing = svg2rlg(path)
sx = sy = factor
drawing.width, drawing.height = drawing.minWidth() * sx, drawing.height * sy
drawing.scale(sx, sy)
#if you want to see the box around the image
drawing._showBoundary = True
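If you would rather fit the drawing to a given width than pick a factor by hand, the same scaling steps can be wrapped in a small helper. This is only a sketch: fit_to_width and max_width are hypothetical names, not part of svglib or reportlab, and it assumes the object passed in has width, height and scale(sx, sy) like the Drawing returned by svg2rlg().

```python
def fit_to_width(drawing, max_width):
    """Scale a Drawing-like object in place so it is at most max_width wide.

    Hypothetical helper: `drawing` is assumed to expose `width`, `height`
    and `scale(sx, sy)`, as a reportlab Drawing does.
    """
    if drawing.width <= 0:
        raise ValueError("drawing has no width to scale from")
    # Never enlarge; only shrink drawings that are too wide.
    factor = min(1.0, max_width / drawing.width)
    # Scale the contents, then update the reported size so that
    # platypus reserves the right amount of space for the flowable.
    drawing.scale(factor, factor)
    drawing.width *= factor
    drawing.height *= factor
    return factor
```

For example, drawing = svg2rlg(path) followed by fit_to_width(drawing, 400) should keep the drawing within 400 points.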

As mentioned by skidzo, you can totally do this with the svglib package, which you can find here: https://pypi.python.org/pypi/svglib/
According to the website, Svglib is a pure-Python library for reading SVG files and converting them (to a reasonable degree) to other formats using the ReportLab Open Source toolkit.
You can use pip to install svglib.
Here is a complete example script:
# svg_demo.py
from reportlab.graphics import renderPDF, renderPM
from reportlab.platypus import SimpleDocTemplate
from svglib.svglib import svg2rlg
def svg_demo(image_path, output_path):
    drawing = svg2rlg(image_path)
    renderPDF.drawToFile(drawing, output_path)
if __name__ == '__main__':
    svg_demo('/path/to/image.svg', 'svg_demo.pdf')

skidzo's answer is very helpful, but isn't a complete example of how to use an SVG file as a flowable in a reportlab PDF. Hopefully this is helpful for others trying to figure out the last few steps:
from io import BytesIO
import matplotlib.pyplot as plt
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import SimpleDocTemplate, Paragraph
from svglib.svglib import svg2rlg
def plot_data(data):
    # Plot the data using matplotlib.
    plt.plot(data)
    # Save the figure to SVG format in memory.
    svg_file = BytesIO()
    plt.savefig(svg_file, format='SVG')
    # Rewind the file for reading, and convert to a Drawing.
    svg_file.seek(0)
    drawing = svg2rlg(svg_file)
    # Scale the Drawing.
    scale = 0.75
    drawing.scale(scale, scale)
    drawing.width *= scale
    drawing.height *= scale
    return drawing
def main():
    styles = getSampleStyleSheet()
    pdf_path = 'sketch.pdf'
    doc = SimpleDocTemplate(pdf_path)
    data = [1, 3, 2]
    story = [Paragraph('Lorem ipsum!', styles['Normal']),
             plot_data(data),
             Paragraph('Dolores sit amet.', styles['Normal'])]
    doc.build(story)
main()

You need to make sure you are importing PIL (Python Imaging Library) in your code so that ReportLab can use it to handle image types like SVG. Otherwise it can only support a few basic image formats.
That said, I recall having some trouble, even when using PIL, with vector graphics. I don't know if I tried SVG but I remember having a lot of trouble with EPS.

Related

Convert .svg to .png with svg2rlg and renderPM

The goal is to composite several .svg files, and then convert the resulting .svg to .png for display in a tkinter frame. The resulting .svg renders nicely, but when converted to .png the image has horizontal black bars, and some enclosed spaces are filled in.
I'm not committed to using the svglib and reportlab libraries - I just want to successfully convert file formats. Finally, I'm new to Python, and verbose examples would be hugely appreciated. Thanks in advance for your time and effort.
from tkinter import *
import tkinter as tk
import svgutils
import svgutils.transform as sg
from svgutils.compose import Unit
from svglib.svglib import svg2rlg
from reportlab.graphics import renderPM
root = Tk()
# Generate a blank SVG image
input_width = "800px"
input_length = "800px"
width = Unit(input_width)
length = Unit(input_length)
fig = sg.SVGFigure(width, length)
# Use the following images downloaded to script local directory
# https://upload.wikimedia.org/wikipedia/commons/9/9a/Mandala_52.svg
# https://upload.wikimedia.org/wikipedia/commons/7/77/Mandala_21.svg
# Create the composite svg image
image1 = "Mandala_52.svg"
image2 = "Mandala_21.svg"
figure1 = sg.fromfile(image1)
figure2 = sg.fromfile(image2)
fig1 = figure1.getroot()
fig2 = figure2.getroot()
fig.append([fig1])
fig.append([fig2])
fig.save("MandalaComposite.svg")
# Convert svg to png <--- This is where the errors happen
drawing = svg2rlg("MandalaComposite.svg")
renderPM.drawToFile(drawing, "MandalaComposite.png")
root.mainloop()

How to extract images and image BBox coordinates using python?

I am trying to extract images in PDF with BBox coordinates of the image.
I tried the pdfrw library. It identifies image objects, and each has an attribute called media box which holds some coordinates, but I am not sure these are the correct bbox coordinates, since for some PDFs it shows something like this:
['0', '0', '684', '864']
but the image doesn't start at the origin of the page, so I don't think it is the bbox.
I tried the following code using pdfrw:
import pdfrw, os
from pdfrw import PdfReader, PdfWriter
from pdfrw.findobjs import page_per_xobj
outfn = 'extract.' + os.path.basename(path)
pages = list(page_per_xobj(PdfReader(path).pages, margin=0.5*72))
writer = PdfWriter(outfn)
writer.addpages(pages)
writer.write()
How do I get the image along with its bbox coordinates?
sample pdf : https://drive.google.com/open?id=1IVbj1b3JfmSv_BJvGUqYvAPVl3FwC2A-
I found a way to do it with a library called pdfplumber. It's built on top of pdfminer and works consistently in my use case. Moreover, it's MIT licensed, so I can use it for my office work.
import pdfplumber
pdf_obj = pdfplumber.open(doc_path)
page = pdf_obj.pages[page_no]
images_in_page = page.images
page_height = page.height
image = images_in_page[0] # assuming images_in_page has at least one element, only for understanding purpose.
image_bbox = (image['x0'], page_height - image['y1'], image['x1'], page_height - image['y0'])
cropped_page = page.crop(image_bbox)
image_obj = cropped_page.to_image(resolution=400)
image_obj.save(path_to_save_image)
Worked well for tables and images in my case.
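The page_height subtraction above is there because, as I understand it, the x0/y0/x1/y1 values are measured from the bottom-left corner of the page, while page.crop() expects a box measured from the top-left. Here is a small sketch of that conversion on its own, using a plain dict in place of a pdfplumber image entry. to_top_left_bbox is a hypothetical helper name and the numbers are made up for illustration.

```python
def to_top_left_bbox(image, page_height):
    """Convert bottom-left-origin coordinates to an (x0, top, x1, bottom) box.

    `image` is any mapping with 'x0', 'y0', 'x1', 'y1' keys measured from
    the bottom-left corner of the page, like the entries in page.images.
    """
    top = page_height - image['y1']     # upper edge, measured from the page top
    bottom = page_height - image['y0']  # lower edge, measured from the page top
    return (image['x0'], top, image['x1'], bottom)


# Made-up example: a 200 x 100 pt image on a 792 pt tall page.
example = {'x0': 72, 'y0': 500, 'x1': 272, 'y1': 600}
print(to_top_left_bbox(example, 792))  # (72, 192, 272, 292)
```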

Which is the best way to make a report in PDF with more than 100 plots with Python?

I need to have a report in PDF with a lot of plots. Most of them will be created with matplotlib within a loop, but I would need also to include pandas plots and dataframes (the whole view) and seaborn plots. Right now I have explored the following solutions:
PythonTex. I have already used it for other projects, but it would consume a lot of time because you have to write \pythontexprint for each plot you want to display.
Use the savefig command in every iteration of the loop and save all the plots as images to insert into LaTeX later. That would be a very time-consuming choice too. Another option is to use that command to save the plots as PDFs and then merge all the PDFs. That would create an ugly report, since the plots are not going to fit the whole page.
Use RStudio with reticulate for creating a Markdown report. The problem here is that I would need to learn reticulate functionality, thus spending time.
As far as I know, PyPDF does not fit my needs.
Create a jupyter notebook and then try to export it to a PDF. Once again, I do not know how to use jupyter notebook and I read that I would have to convert first to html and then to pdf.
Solutions from here: Generating Reports with Python: PDF or HTML to PDF. However, the question is from three years ago and there might be better options nowadays.
So my question is the following: is there any easy and quick way of getting all those plots (if it is along the code which generates them even better) in a PDF with a decent aspect?
My recommendation would be to use matplotlib's savefig to write each plot to a BytesIO buffer (or save the buffers to a list or similar data structure for all 100). Then you can use those image buffers to insert the images into a PDF using a library like reportlab. I regularly use this approach to create PowerPoint documents with the python-pptx library, and I have also verified it for PDF with reportlab. The reportlab library is very powerful and a bit "low level", so there might be a little learning curve getting started, but it surely meets your needs. reportlab is BSD licensed and available via pip and conda.
Anyway, my code snippet looks like this.
Sorry it's a bit long, but it has some helper functions to print text and dummy images. You should be able to copy/paste it directly.
import io
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch
import numpy as np
import matplotlib.pyplot as plt
def plot_hist():
    """ Create a sample histogram plot and return a bytesio buffer with plot
    Returns
    -------
    BytesIO : in memory buffer with plot image, can be passed to reportlab or elsewhere
    """
    # from https://matplotlib.org/gallery/lines_bars_and_markers/scatter_masked.html#sphx-glr-gallery-lines-bars-and-markers-scatter-masked-py
    plt.figure(figsize=(7, 2.25))
    N = 100
    r0 = 0.6
    x = 0.9 * np.random.rand(N)
    y = 0.9 * np.random.rand(N)
    area = (20 * np.random.rand(N))**2 # 0 to 10 point radii
    c = np.sqrt(area)
    r = np.sqrt(x * x + y * y)
    area1 = np.ma.masked_where(r < r0, area)
    area2 = np.ma.masked_where(r >= r0, area)
    plt.scatter(x, y, s=area1, marker='^', c=c)
    plt.scatter(x, y, s=area2, marker='o', c=c)
    # Show the boundary between the regions:
    theta = np.arange(0, np.pi / 2, 0.01)
    plt.plot(r0 * np.cos(theta), r0 * np.sin(theta))
    # create buffer and save image to buffer
    # dpi should match the dpi of your PDF (I think 300 is typical), otherwise it won't look good
    buf = io.BytesIO()
    plt.savefig(buf, format='png', dpi=300)
    buf.seek(0)
    # you'll want to close the figure once it's saved to the buffer
    plt.close()
    return buf
def add_text(text, style="Normal", fontsize=12):
    """ Adds text with some spacing around it to PDF report
    Parameters
    ----------
    text : str
        The string to print to PDF
    style : str
        The reportlab style
    fontsize : int
        The fontsize for the text
    """
    Story.append(Spacer(1, 12))
    ptext = "<font size={}>{}</font>".format(fontsize, text)
    Story.append(Paragraph(ptext, styles[style]))
    Story.append(Spacer(1, 12))
# Use basic styles and the SimpleDocTemplate to get started with reportlab
styles=getSampleStyleSheet()
doc = SimpleDocTemplate("form_letter.pdf",pagesize=letter,
rightMargin=inch/2,leftMargin=inch/2,
topMargin=72,bottomMargin=18)
# The "story" just holds "instructions" on how to build the PDF
Story=[]
add_text("My Report", style="Heading1", fontsize=24)
# See plot_hist for information on how to get BytesIO object of matplotlib plot
# This code uses the reportlab Image flowable to add any valid PIL input to the report
image_buffer1 = plot_hist()
im = Image(image_buffer1, 7*inch, 2.25*inch)
Story.append(im)
add_text("This text explains something about the chart.")
image_buffer2 = plot_hist()
im = Image(image_buffer2, 7*inch, 2.25*inch)
Story.append(im)
add_text("This text explains something else about another chart.")
# This command will actually build the PDF
doc.build(Story)
# should close open buffers, can use a "with" statement in python to do this for you
# if that works better
image_buffer1.close()
image_buffer2.close()
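For the 100-plot case, the two hand-written Image blocks above can be replaced with a loop. The following is a minimal sketch rather than a finished report generator: build_report and image_size_points are made-up names, the captions are placeholders, and it assumes matplotlib and reportlab are installed. It also assumes the default 6 pt frame padding of SimpleDocTemplate when working out how wide an image may be.

```python
import io


def image_size_points(figsize_inches, max_width_pt):
    """Return (width, height) in points for a figure, keeping its aspect ratio.

    The figure is shown at its natural size (72 pt per inch) unless that is
    wider than max_width_pt, in which case it is shrunk to fit.
    """
    width_in, height_in = figsize_inches
    width_pt = min(width_in * 72.0, max_width_pt)
    height_pt = width_pt * height_in / width_in
    return width_pt, height_pt


def build_report(pdf_path, datasets, figsize=(7, 2.25), dpi=300):
    """Write one plot plus a caption per dataset into a single PDF."""
    # Imported here so image_size_points() can be used on its own.
    import matplotlib
    matplotlib.use("Agg")  # render off-screen, no GUI needed
    import matplotlib.pyplot as plt
    from reportlab.lib.pagesizes import letter
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.lib.units import inch
    from reportlab.platypus import Image, Paragraph, SimpleDocTemplate, Spacer

    styles = getSampleStyleSheet()
    doc = SimpleDocTemplate(pdf_path, pagesize=letter,
                            leftMargin=inch / 2, rightMargin=inch / 2)
    # Leave room for the frame padding (assumed 6 pt on each side).
    width, height = image_size_points(figsize, doc.width - 12)

    story, buffers = [], []
    for number, data in enumerate(datasets, start=1):
        fig, ax = plt.subplots(figsize=figsize)
        ax.plot(data)
        buf = io.BytesIO()
        fig.savefig(buf, format="png", dpi=dpi)
        plt.close(fig)          # free the figure; important with 100+ plots
        buf.seek(0)
        buffers.append(buf)     # must stay open until doc.build() has run
        story.append(Image(buf, width, height))
        story.append(Paragraph("Plot %d" % number, styles["Normal"]))
        story.append(Spacer(1, 12))

    doc.build(story)
    for buf in buffers:
        buf.close()
```

Something like build_report('report.pdf', [[1, 3, 2]] * 100) would then produce 100 plots in one document. With that many figures, a lower dpi is worth trying to keep the file size down.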

Create a pdf in python and keep the coordinates of elements

I need to generate an examination template for an online school website,
and I need to know the coordinates of each answer box in order to crop them later.
Is it possible to generate a PDF and get the coordinates of each element inside it? (For example, inserting a black square as an image in the PDF and getting its coordinates?)
I found many libraries to create PDFs, like pyPdf, pyPdf2, ..., but I didn't find a way to get coordinates.
Thank you for your suggestions and advice.
You could use reportlab. It would allow you to access coordinates by specifying them yourself:
from reportlab.platypus import SimpleDocTemplate, Paragraph
from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle
from reportlab.lib.units import inch
from reportlab.lib.pagesizes import letter
import io
buf = io.BytesIO()
doc = SimpleDocTemplate(buf, rightMargin=inch/2, leftMargin=inch/2, topMargin=inch/2, bottomMargin=inch/2, pagesize=letter)
styles = getSampleStyleSheet()
answers = []
answers.append(Paragraph('Data for Answer box', styles['Normal']))
doc.build(answers)
# Write the in-memory PDF to disk; the file must be opened in binary mode.
with open('answers.pdf', 'wb') as school_pdf:
    school_pdf.write(buf.getvalue())
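The snippet above lets platypus decide where the paragraph goes, so its position is not known in advance. One way to actually keep the coordinates, sketched here under assumptions, is to compute the answer-box rectangles yourself and draw them with reportlab's lower-level canvas. The names layout_answer_boxes and draw_answer_sheet, the box sizes and the margins are all made up for illustration. Coordinates are in points, measured from the bottom-left corner of the page.

```python
def layout_answer_boxes(count, page_height=792.0, left=72.0, top_margin=72.0,
                        box_width=200.0, box_height=40.0, gap=20.0):
    """Return one (x, y, width, height) tuple per answer box.

    Boxes are stacked downwards from the top margin. (x, y) is the
    lower-left corner of each box in PDF points. All defaults are
    illustrative values for a US letter page (612 x 792 pt).
    """
    boxes = []
    for i in range(count):
        box_top = page_height - top_margin - i * (box_height + gap)
        boxes.append((left, box_top - box_height, box_width, box_height))
    return boxes


def draw_answer_sheet(target, boxes):
    """Draw the boxes on a single letter-size page written to `target`."""
    # Imported here so the layout helper above works without reportlab.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    c = canvas.Canvas(target, pagesize=letter)
    for number, (x, y, w, h) in enumerate(boxes, start=1):
        c.rect(x, y, w, h)  # outline of the answer box
        c.drawString(x, y + h + 4, "Answer %d" % number)
    c.showPage()
    c.save()


boxes = layout_answer_boxes(3)
print(boxes)  # keep this list; it is what you need for cropping later
# draw_answer_sheet('answers.pdf', boxes)  # requires reportlab
```

The sketch does not check that the boxes fit on one page. If the crop is done on a scanned image rather than on the PDF itself, the points would still need converting to pixels and the y axis flipped.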

Python ipyleaflet export map as PNG or JPG or SVG

I have tried to export a visualisation of data with ipyleaflet as PNG or any other file format, but I could not find a method that works. For example, in folium there is map.save(path). Is there a library or method in ipyleaflet that I have missed in my research which would help me accomplish my goal?
Here is some example code to generate a map:
from ipyleaflet import *
center = [34.6252978589571, -77.34580993652344]
zoom = 10
m = Map(default_tiles=TileLayer(opacity=1.0), center=center, zoom=zoom)
m
I'd like to export this map as an image file without taking a screenshot manually.
I found two sources that allow exporting JavaScript Leaflet maps:
https://github.com/aratcliffe/Leaflet.print and https://github.com/mapbox/leaflet-image
Unfortunately I was not able to make use of them in Python.
My colleague and I found a decent workaround for ipyleaflet (Python) image export. Here is how it works. The folium library is required for the export. The GeoJson data in this example is already prepared with style properties:
import folium
map = folium.Map([51., 12.], zoom_start=6,control_scale=True)
folium.GeoJson(data).add_to(map)
map.save('map.html')
The HTML file can be further processed in Python (Windows) with subprocess calls to make a PDF or PNG out of it. I hope this helps, as the ipyleaflet documentation for Python is almost nonexistent.
For generating html, you can use ipywidgets
from ipywidgets.embed import embed_minimal_html
embed_minimal_html('map.html', views=[m])
If you want to make a PNG, you can use ipywebrtc, more specifically:
https://ipywebrtc.readthedocs.io/en/latest/ImageRecorder.html
https://ipywebrtc.readthedocs.io/en/latest/WidgetStream.html
Or in code:
from ipywebrtc import WidgetStream, ImageRecorder
widget_stream = WidgetStream(widget=m, max_fps=1)
image_recorder = ImageRecorder(stream=widget_stream)
display(image_recorder)
Saving the PNG:
with open('map.png', 'wb') as f:
    f.write(image_recorder.image.value)
Or converting to pillow image for preprocessing:
import PIL.Image
import io
im = PIL.Image.open(io.BytesIO(image_recorder.image.value))
ipyleaflet supports saving as html. Export of svg and png does not seem to be supported.
https://ipyleaflet.readthedocs.io/en/latest/map_and_basemaps/map.html#save-to-html
m.save('output.html')
I created an issue ticket for ipyleaflet:
https://github.com/jupyter-widgets/ipyleaflet/issues/1083
