How to save a Bokeh plot as PDF? - python

I'm working with Bokeh a lot and I'm looking for a way to create a PDF from the figure I created.
Is there an option to achieve this goal?

This is possible with a combination of the three Python packages bokeh, svglib and reportlab, which works perfectly for me.
This involves three steps:
1. create a Bokeh SVG output
2. read in this SVG
3. save this SVG as PDF
Minimal Example
To show how this could work please see the following example.
from bokeh.plotting import figure
from bokeh.io import export_svgs
import svglib.svglib as svglib
from reportlab.graphics import renderPDF
test_name = 'bokeh_to_pdf_test'
# Example plot p
p = figure(width=400, height=400, tools="")  # Bokeh 3 removed plot_width/plot_height
p.scatter(list(range(1, 6)), [2, 5, 8, 2, 7], size=10)  # circle markers by default
# See comment 1
p.xaxis.axis_label_standoff = 12
p.xaxis.major_label_standoff = 12
# step 1: bokeh save as svg
# (export_svgs needs selenium plus a browser driver such as geckodriver or chromedriver)
p.output_backend = "svg"
export_svgs(p, filename=test_name + '.svg')
# see comment 2 (adjust the path to a Helvetica .ttf file on your system)
svglib.register_font('helvetica', '/home/fonts/Helvetica.ttf')
# step 2: read in svg
svg = svglib.svg2rlg(test_name+".svg")
# step 3: save as pdf
renderPDF.drawToFile(svg, test_name+".pdf")
Comment 1
The extra settings for axis_label_standoff and major_label_standoff are needed because, without them, the x-axis tick labels shift up a bit in the exported figure, which does not look good.
Comment 2
If you get a long list of warnings like
Unable to find a suitable font for 'font-family:helvetica'
Unable to find a suitable font for 'font-family:helvetica'
....
Unable to find a suitable font for 'font-family:helvetica'
the PDF is still created. The warning appears because the default font in Bokeh is named helvetica, which svglib does not know. svglib looks for this font in a defined location; if the font is not there, the message appears and svglib falls back to its own default font instead.
To get rid of this message, register a font with svglib right before calling svglib.svg2rlg():
# name in svglib, path to font (PATH_TO_FONT is a placeholder for your font directory)
svglib.register_font('helvetica', f'/{PATH_TO_FONT}/Helvetica.ttf')
Output
This code creates the same figure twice: once as an .svg file and once as a .pdf file.
The figure looks like this:

Related

Save plotly figure interactively (html) whilst preserving LaTeX font

I created a plotly figure using python and I am aware that one can save the interactive figure in html format by using:
fig.write_html("name_of_figure.html")
For the axis labels, as well as the title of the figure, I used LaTeX fonts like this:
fig.update_layout(title=r'$\text{Some title}_2$')
When I render it in my browser directly the LaTeX fonts are displayed correctly. However, when I save the figure in .html format, the title, as well as the axis labels, are not rendered using LaTeX fonts. I rather see the plain text like $\text{Some title}_2$.
What can I do to circumvent that problem?
Add an include_mathjax='cdn' parameter to .write_html.
And read the documentation of the write_html function carefully :)
include_mathjax: bool or string (default False)
Specifies how the MathJax.js library is included in the output html div string. MathJax is required in order to display labels with LaTeX typesetting.
If False, no script tag referencing MathJax.js will be included in the output.
If 'cdn', a script tag that references a MathJax CDN location will be included in the output. HTML div strings generated with this option will be able to display LaTeX typesetting as long as internet access is available.
If a string that ends in '.js', a script tag is included that references the specified path. This approach can be used to point the resulting HTML div string to an alternative CDN.
import plotly.express as px
fig = px.line(x = [0,1,2], y = [0,1,4])
fig.update_layout(title=r'$\text{Some title}_2$')
fig.write_html("example_figure.html", include_mathjax = 'cdn')

Python plotly sankey export broken

I have a Python Sankey chart that works well when exported to HTML but looks completely broken when exported to other file formats:
import plotly.graph_objects as go
fig = go.Figure(data=[go.Sankey(
node = dict(label = data["label"]),
link = dict(source = data["source"],target = data["target"],value = data["value"])
)])
fig.write_image("sankey.svg")
fig.write_image("sankey.eps")
fig.write_image("sankey.png")
fig.write_html("sankey.html")
HTML Screenshot
PNG Export (SVG, EPS differ a bit but also look broken)
I'm using python 3.8.5 with the kaleido 0.0.3 engine.
Additionally, I've tried Orca 1.2.1 but got the same results.
The answer is actually very easy. Though most charts can figure out the required size on their own, the Sankey chart obviously can't. So you just have to set dimensions for all exports of Sankey charts (yes, even for vector graphics like EPS and SVG).
Also worth mentioning is that a minimum size is required: while my example now looks satisfying at 1920x1080, a size of 1280x720 looks broken even with vector graphics.
fig = go.Figure(...)
fig.update_layout(width=1920, height=1080)
fig.write_image(...)

Customize font when using style sheet

I am using the 'ggplot' style sheet. Now, the style is great, except that I would like to specifically change the font style. Is that possible?
I found the documentation about customizing styles. However, I just want to change the font while keeping the rest of the style.
Also, does anyone know where to see the setting-details of each style (like font, figsize, etc.)?
plt.imshow(ori, vmin = 0, vmax = 300)
plt.style.use('ggplot')
plt.show()
Combining styles
I think that the most elegant is to combine styles.
For example, you could define your own font settings in mystyle.mplstyle (see below where to save it, and what it could look like). To get the ggplot style with your own font settings you would then only have to specify:
plt.style.use(['ggplot', 'mystyle'])
This solution is elegant, because it allows consistent application in all your plots and allows you to mix-and-match.
Where to save your custom style?
Taken from one of my own styles, mystyle.mplstyle could have the following entries (you should obviously customise them to your needs):
font.family : serif
font.serif : CMU Serif
font.weight : bold
font.size : 18
text.usetex : true
Save it in matplotlib's configuration directory, in a stylelib subfolder. For me this is ~/.matplotlib/stylelib/, but use
import matplotlib
matplotlib.get_configdir()
to find out what to use on your operating system (see the documentation). You could also write a Python function to install the style in the right location.
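To try the combination without installing anything, plt.style.use and plt.style.context also accept a path to a .mplstyle file. A minimal sketch, assuming a temporary file stands in for mystyle.mplstyle and that only font.family and font.size are overridden (the values are illustrative):

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Illustrative font overrides; adjust to your needs
MYSTYLE = """
font.family : serif
font.size   : 18
"""

style_dir = tempfile.mkdtemp()
style_path = os.path.join(style_dir, "mystyle.mplstyle")
with open(style_path, "w") as f:
    f.write(MYSTYLE)

# Later entries in the list win, so the font settings from mystyle
# override ggplot's while everything else comes from ggplot
with plt.style.context(["ggplot", style_path]):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [1, 4, 9])
    ax.set_title("ggplot with a custom font")
    font_size = plt.rcParams["font.size"]
    font_family = plt.rcParams["font.family"]
    fig.savefig(os.path.join(style_dir, "example.png"))
    plt.close(fig)
```

Once the file is installed in the stylelib folder, the path can be replaced by the plain name, as in `plt.style.use(['ggplot', 'mystyle'])`.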
Where to find existing styles?
Then the final part of your question. First, it is useful to know that you can obtain a list of the available styles using
import matplotlib.pyplot as plt
plt.style.available
See the documentation for a graphical representation.
How do you inspect, for example, ggplot.mplstyle? I think the best reference is matplotlib's source. You can also find the *.mplstyle files on your system. Where, however, depends on your operating system and installation. For me
find / -iname 'ggplot.mplstyle' 2>/dev/null
gives
/usr/local/lib/python3.7/site-packages/matplotlib/mpl-data/stylelib/ggplot.mplstyle
Or more generally you could search for all styles:
find / -iname '*.mplstyle' 2>/dev/null
For Windows I am not really an expert, but the file paths listed above may give you a clue where to look.
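A platform-independent alternative to `find` is to ask matplotlib itself: plt.style.library maps each style name to its settings, and the built-in .mplstyle files live in the stylelib folder of matplotlib.get_data_path(). A minimal sketch:

```python
import os

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Names of all styles matplotlib can find (built-in plus user styles)
available = plt.style.available

# The settings of one style, as a dict-like object
ggplot_settings = plt.style.library["ggplot"]
for key, value in sorted(ggplot_settings.items()):
    print(f"{key}: {value}")

# Location of the built-in style files for this installation
builtin_file = os.path.join(matplotlib.get_data_path(), "stylelib", "ggplot.mplstyle")
print(builtin_file)
```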
Python script to install style in the right location
To install your custom styles in the right location, you could build a script that looks something like:
def copy_style():
    import os
    import matplotlib
    import matplotlib.pyplot as plt

    # style definition(s): file name -> file contents
    styles = {}
    styles['mystyle.mplstyle'] = '''
font.family : serif
'''

    # directory where matplotlib looks for user styles
    dirname = os.path.abspath(os.path.join(matplotlib.get_configdir(), 'stylelib'))

    # make the directory if it does not exist yet
    os.makedirs(dirname, exist_ok=True)

    # write all styles, closing each file properly
    for fname, style in styles.items():
        with open(os.path.join(dirname, fname), 'w') as f:
            f.write(style)

    # make the new styles available in the current session
    plt.style.reload_library()
Yes, it is possible. You can do it either locally, by passing the font to individual labels
hfont = {'fontname': 'your font'}
plt.xlabel('xlabel', **hfont)
plt.ylabel('ylabel', **hfont)
or globally
import matplotlib.pyplot as plt
plt.rcParams['font.family'] = 'your font'
Try rcParams
import matplotlib
matplotlib.rcParams.update({'font.size': 12})
Read more about rcParams in the matplotlib documentation.
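A runnable version of the local approach, plus a scoped variant using plt.rc_context (a context manager that applies rcParams only inside the with-block and then restores the previous values). 'DejaVu Sans' ships with matplotlib, so it stands in for "your font" here as an assumption:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# 'DejaVu Sans' is bundled with matplotlib; replace it with your font
hfont = {"fontname": "DejaVu Sans"}

# Local: only these two labels use the font
fig, ax = plt.subplots()
ax.set_xlabel("xlabel", **hfont)
ax.set_ylabel("ylabel", **hfont)
xlabel_family = ax.xaxis.label.get_fontfamily()
plt.close(fig)

# Scoped: the setting applies only inside the with-block
with plt.rc_context({"font.size": 12}):
    size_inside = plt.rcParams["font.size"]
```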

Which is the best way to make a report in PDF with more than 100 plots with Python?

I need to have a report in PDF with a lot of plots. Most of them will be created with matplotlib within a loop, but I would need also to include pandas plots and dataframes (the whole view) and seaborn plots. Right now I have explored the following solutions:
PythonTex. I have already used it for other projects, but it would consume a lot of time because you have to write \pythontexprint for each plot you want to display.
Use the savefig command in every iteration of the loop and save all the plots as images to insert them all in LaTeX later. That would be a very time-consuming choice too. Another option is to save the plots as PDFs with that command and then merge all the PDFs, but that would create an ugly report since the plots are not going to fill the whole page.
Use RStudio with reticulate for creating a Markdown report. The problem here is that I would need to learn reticulate functionality, thus spending time.
As far as I know, PyPDF does not fit my needs.
Create a jupyter notebook and then try to export it to a PDF. Once again, I do not know how to use jupyter notebook and I read that I would have to convert first to html and then to pdf.
Solutions from here: Generating Reports with Python: PDF or HTML to PDF. However, that question is from three years ago and there might be better options nowadays.
So my question is the following: is there any easy and quick way of getting all those plots (if it is along the code which generates them even better) in a PDF with a decent aspect?
My recommendation would be to use matplotlib's savefig to write each plot to a BytesIO buffer (and keep the buffers in a list or similar data structure for 100 plots). Then you can use those image buffers to insert the images into a PDF with a library like reportlab. I regularly use this approach to create PowerPoint documents with the python-pptx library, but I have also verified it for PDFs with reportlab. The reportlab library is very powerful and a bit "low level", so there might be a small learning curve getting started, but it surely meets your needs. The reportlab documentation includes a simple getting-started tutorial. reportlab is BSD-licensed and available on pip and conda.
Anyway, my code snippet looks like this.
Sorry it's a bit long, but it has some helper functions to print text and dummy images. You should be able to copy/paste it directly.
The code will yield a PDF that looks like this
import io
from reportlab.lib.pagesizes import letter
from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Image
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.lib.units import inch
import numpy as np
import matplotlib.pyplot as plt
def plot_hist():
    """ Create a sample plot (a masked scatter plot, despite the name) and return a BytesIO buffer with the plot

    Returns
    -------
    BytesIO : in-memory buffer with the plot image, can be passed to reportlab or elsewhere
    """
    # from https://matplotlib.org/gallery/lines_bars_and_markers/scatter_masked.html#sphx-glr-gallery-lines-bars-and-markers-scatter-masked-py
    plt.figure(figsize=(7, 2.25))
    N = 100
    r0 = 0.6
    x = 0.9 * np.random.rand(N)
    y = 0.9 * np.random.rand(N)
    area = (20 * np.random.rand(N))**2  # 0 to 10 point radii
    c = np.sqrt(area)
    r = np.sqrt(x * x + y * y)
    area1 = np.ma.masked_where(r < r0, area)
    area2 = np.ma.masked_where(r >= r0, area)
    plt.scatter(x, y, s=area1, marker='^', c=c)
    plt.scatter(x, y, s=area2, marker='o', c=c)
    # Show the boundary between the regions:
    theta = np.arange(0, np.pi / 2, 0.01)
    plt.plot(r0 * np.cos(theta), r0 * np.sin(theta))
    # create buffer and save image to buffer
    # dpi should match the dpi of your PDF; 300 is typical, otherwise it won't look good
    buf = io.BytesIO()
    plt.savefig(buf, format='png', dpi=300)
    buf.seek(0)
    # you'll want to close the figure once it's saved to the buffer
    plt.close()
    return buf


def add_text(text, style="Normal", fontsize=12):
    """ Adds text with some spacing around it to the PDF report

    Parameters
    ----------
    text : str
        The string to print to PDF
    style : str
        The reportlab style
    fontsize : int
        The font size for the text
    """
    Story.append(Spacer(1, 12))
    ptext = "<font size={}>{}</font>".format(fontsize, text)
    Story.append(Paragraph(ptext, styles[style]))
    Story.append(Spacer(1, 12))
# Use basic styles and the SimpleDocTemplate to get started with reportlab
styles=getSampleStyleSheet()
doc = SimpleDocTemplate("form_letter.pdf",pagesize=letter,
rightMargin=inch/2,leftMargin=inch/2,
topMargin=72,bottomMargin=18)
# The "story" just holds "instructions" on how to build the PDF
Story=[]
add_text("My Report", style="Heading1", fontsize=24)
# See plot_hist for how to get a BytesIO object of a matplotlib plot
# reportlab's Image flowable adds the image to the report; it accepts valid PIL input such as this buffer
image_buffer1 = plot_hist()
im = Image(image_buffer1, 7*inch, 2.25*inch)
Story.append(im)
add_text("This text explains something about the chart.")
image_buffer2 = plot_hist()
im = Image(image_buffer2, 7*inch, 2.25*inch)
Story.append(im)
add_text("This text explains something else about another chart.")
# This command will actually build the PDF
doc.build(Story)
# close the open buffers; a "with" statement in Python can do this for you
# if that works better
image_buffer1.close()
image_buffer2.close()

How to prevent plotly from plotting automatically

I just discovered plotly and like it so far. I have this code provided by the main website
import plotly.plotly as py
from plotly.graph_objs import *
trace0 = Scatter(
x=[1,2,3,4],
y=[10,15,13,17]
)
trace1 = Scatter(
x=[1,2,3,4],
y=[16,5,11,9]
)
data = Data([trace0, trace1])
unique_url = py.plot(data, filename='basic-line')
I am curious about two things:
1) When I run this code, my browser automatically pops up and shows me the graph. All I want is the url so that I can later embed it in an html file. Is there a way to turn off the feature that opens my browser and shows me the graph?
2) Is there a way to get rid of the 'Play with this data' link?
I have combed through the documentation provided, but have come up empty-handed on these two issues.
To disable the pop-up, pass auto_open=False:
py.plot(data, filename='basic_line', auto_open=False)
py.plot(data, show_link=False) will take that link off (if you are referring to the link that says Export to plot.ly). At least it does when using import plotly.offline as py. As for the link at the top (shown when you hover your mouse over the graph), I'm trying to get rid of the "Save and edit plot in cloud" button, but I only find options for that in the JavaScript version, and that hides the whole bar, which has other useful items on it (the JavaScript option is {displayModeBar: false}). Admittedly, I find the reference to "play with this data" ambiguous. You can see the workaround I wrote here: Adding config modes to Plotly.Py offline - modebar
You can easily remove the Export to plot.ly link in the offline graph.
Open your saved HTML file in a text editor and search for {"showLink": true, "linkText": "Export to plot.ly"}
Then change the true value to false.
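The same edit can be scripted when there are many saved files. A minimal sketch with hypothetical helper names; the regular expression tolerates JSON written with or without a space after the colon.

```python
import re


def disable_show_link(html):
    """Flip "showLink": true to false in a saved plotly HTML page."""
    return re.sub(r'"showLink":\s*true', '"showLink": false', html)


def disable_show_link_in_file(path):
    """Rewrite a saved HTML file in place with the link disabled."""
    with open(path, encoding="utf-8") as f:
        html = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(disable_show_link(html))


sample = '{"showLink": true, "linkText": "Export to plot.ly"}'
fixed = disable_show_link(sample)
```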
