Save multiple seaborn plots into one pdf file - python

At the moment I'm learning how to work with matplotlib and seaborn, and the concept behind them seems quite strange to me. One would expect the sns.countplot function to return an object that has .plot() and .save() functions, so one could work with the plot in a different function.
Instead, it seems that every call to sns.countplot overwrites the previous object (see MWE).
So on the one hand, it would be great if someone could provide an explanation of the matplotlib and seaborn interface (or link some good documentation), since none of the documentation I read was of much help.
On the other hand, I have a function that returns some plots, which I want to save as a .pdf file with one plot per page. I found this similar question but can't copy the code over in a way that makes my MWE work.
from matplotlib.backends.backend_pdf import PdfPages
import seaborn as sns

def generate_plots():
    penguins = sns.load_dataset("penguins")
    countplot_sex = sns.countplot(y='sex', data=penguins)
    countplot_species = sns.countplot(y='species', data=penguins)
    countplot_island = sns.countplot(y='island', data=penguins)
    # As shown here:
    # print(countplot_sex) -> AxesSubplot(0.125,0.11;0.775x0.77)
    # print(countplot_species) -> AxesSubplot(0.125,0.11;0.775x0.77)
    # print(countplot_island) -> AxesSubplot(0.125,0.11;0.775x0.77)
    # All three variables contain the same object
    return(countplot_sex, countplot_species, countplot_island)

def plots2pdf(plots, fname): # from: https://stackoverflow.com/a/21489936
    pp = PdfPages('multipage.pdf')
    for plot in plots:
        pass
        # TODO save plot
        # Does not work: plot.savefig(pp, format='pdf')
    pp.savefig()
    pp.close()

def main():
    plots2pdf(generate_plots(), 'multipage.pdf')

if __name__ == '__main__':
    main()
My Idea is to have a somewhat decent software architecture with one function generating plots and another function saving them.

The problem is that by default, sns.countplot will do its plotting on the current matplotlib Axes instance. From the docs:
ax matplotlib Axes, optional
Axes object to draw the plot onto, otherwise uses the current Axes.
One solution would be to define a small function that creates a new figure and Axes instance, then passes that to sns.countplot, to ensure it is plotted on a new figure and does not overwrite the previous one. This is what I have shown in the example below. An alternative would be to just create 3 figures and axes, and then pass each one to the sns.countplot function yourself.
Then in your plots2pdf function, you can iterate over the Axes, and pass their figure instances to the PdfPages instance when you save. (Note: Since you create the figures in the generate_plots function, an alternative would be to return the figure instances from that function, then you have them ready to pass into the pp.savefig function, but I did it this way so the output from your function remained the same).
from matplotlib.backends.backend_pdf import PdfPages
import seaborn as sns
import matplotlib.pyplot as plt

def generate_plots():
    penguins = sns.load_dataset("penguins")

    def my_countplot(y, data):
        # Create a fresh figure and Axes for every plot
        fig, ax = plt.subplots()
        # Pass the new Axes explicitly so nothing is drawn on an earlier one
        sns.countplot(y=y, data=data, ax=ax)
        return ax

    countplot_sex = my_countplot(y='sex', data=penguins)
    countplot_species = my_countplot(y='species', data=penguins)
    countplot_island = my_countplot(y='island', data=penguins)
    return(countplot_sex, countplot_species, countplot_island)

def plots2pdf(plots, fname):
    with PdfPages(fname) as pp:
        for plot in plots:
            # Each Axes knows the figure it belongs to
            pp.savefig(plot.figure)

def main():
    plots2pdf(generate_plots(), 'multipage.pdf')

if __name__ == '__main__':
    main()
A screenshot of the multipage pdf produced:
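The alternative mentioned in the note above, returning the Figure instances instead of the Axes, could look roughly like the sketch below. To avoid the dataset download it uses plain matplotlib bar charts on made-up sample lists; count_figure and figures_to_pdf are hypothetical helper names, and a call such as sns.countplot(..., ax=ax) should slot into the same place as ax.barh.

```python
import os
import tempfile
from collections import Counter

import matplotlib
matplotlib.use("Agg")  # file output only, so no GUI backend is needed
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages


def count_figure(values, title):
    """Draw a horizontal count plot on its own new Figure and return the Figure."""
    counts = Counter(values)
    fig, ax = plt.subplots()
    ax.barh(list(counts.keys()), list(counts.values()))
    ax.set_title(title)
    return fig


def figures_to_pdf(figures, fname):
    """Write one page per Figure and return the number of pages written."""
    with PdfPages(fname) as pdf:
        for fig in figures:
            pdf.savefig(fig)
            plt.close(fig)  # release the figure once it is on a page
        return pdf.get_pagecount()


# Made-up sample data standing in for two of the penguins columns
sex = ["Male", "Female", "Female", "Male", "Female"]
island = ["Biscoe", "Dream", "Dream", "Torgersen", "Biscoe"]

figures = [count_figure(sex, "sex"), count_figure(island, "island")]
out_path = os.path.join(tempfile.mkdtemp(), "multipage.pdf")
n_pages = figures_to_pdf(figures, out_path)
```

Returning figures keeps the saving function independent of how each plot was drawn; it only needs something PdfPages.savefig accepts.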

Related

How can I add multiple pre-generated subplots into a figure?

I'm using this library called FMSkill.
One of the methods in this library is called .plot_timeseries.
This method returns an AxesSubplot object from matplotlib.
I'm trying to use that method to build a multi-plot figure. I have a list called comparison that contains items on which I can call the .plot_timeseries() method.
I've tried something like:
import math
import matplotlib.pyplot as plt
import numpy as np
import fmskill as fms

#Code to determine how many subplots in the figure
fig = plt.figure()
if len(comparison) % 2 == 0:
    col, row = (int(math.ceil(np.sqrt(len(comparison)))),int(math.ceil(np.sqrt(len(comparison)))))
if len(comparison) % 2 == 1:
    col, row = (int(math.ceil(np.sqrt(len(comparison)+1))),int(math.ceil(np.sqrt(len(comparison)+1))))

#Code where I try to iterate on the axes in my figures and set them using the .plot_timeseries() method
for graphs in range(len(comparison)):
    ax = comparison[graphs].plot_timeseries()
    fig.add_subplot(col,row,graphs+1)
This particular code outputs a figure with the appropriate number of subplots. However, the subplots are all empty. Also, it outputs every graph generated by the .plot_timeseries() method separately.
I would like them to be put inside the subplots into one Figure.
Any ideas?
Thanks
The last portion of code is backwards.
# original
for graphs in range(len(comparison)):
    ax = comparison[graphs].plot_timeseries()
    fig.add_subplot(col,row,graphs+1)
Generate the axes object first, then pass it to the plot_timeseries function:
for graphs in range(len(comparison)):
    ax = fig.add_subplot(col, row, graphs+1)
    comparison[graphs].plot_timeseries(ax=ax)
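For anyone who wants to try the pattern without the library installed, here is a minimal sketch. FakeComparison is a hypothetical stand-in that only mimics a plot_timeseries(ax=...) method, and plot_grid is an assumed helper name; the grid is created up front with plt.subplots and unused cells are removed.

```python
import math

import matplotlib.pyplot as plt


class FakeComparison:
    """Hypothetical stand-in for an object exposing plot_timeseries(ax=...)."""

    def __init__(self, values):
        self.values = values

    def plot_timeseries(self, ax=None):
        # Like many plotting helpers: draw on the given Axes, else make one
        if ax is None:
            _, ax = plt.subplots()
        ax.plot(self.values)
        return ax


def plot_grid(comparison):
    """Draw every item of `comparison` into one cell of a near-square grid."""
    ncols = math.ceil(math.sqrt(len(comparison)))
    nrows = math.ceil(len(comparison) / ncols)
    # squeeze=False keeps axs two-dimensional even for a 1x1 grid
    fig, axs = plt.subplots(nrows, ncols, squeeze=False)
    for ax, item in zip(axs.flat, comparison):
        item.plot_timeseries(ax=ax)
    # Drop the cells that were left over
    for ax in axs.flat[len(comparison):]:
        ax.remove()
    return fig, (nrows, ncols)
```

With five items, plot_grid([FakeComparison([i, i + 1, i + 3]) for i in range(5)]) gives a 2 x 3 grid with one cell removed, and no extra figures are opened because every call receives an existing Axes.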

Matplotlib: Plotting multiple histograms in plt.subplots

Background of the problem:
I'm working on a class that takes an Axes object as constructor parameter and produces a (m,n) dimension figure with a histogram in each cell, kind of like the figure below:
There are two important things to note here that I'm not allowed to modify in any way:
The Figure object is not passed as a constructor parameter; only the Axes object is. So the subplots object cannot be modified in any way.
The Axes parameter is set to that of a (1,1) figure by default (as below). All the modifications required to make it an (m,n) figure are performed within the class (inside its methods).
_, ax = plt.subplots() # By default takes (1,1) dimension
cm = ClassName(model, ax=ax, histogram=True) # calling my class
What I'm stuck on:
Since I want to plot histograms within each cell, I decided to approach it by looping over each cell and creating a histogram for each.
results[col].hist(ax=self.ax[y,x], bins=bins)
However, I'm not able to specify the axes of the histogram in any way. This is because the Axes parameter passed is of default dimension (1,1) and hence not indexable. When I try this, I get the following TypeError:
TypeError: 'AxesSubplot' object is not subscriptable
With all this considered, I would like to know of any possible ways I can add my histogram to the parent Axes object. Thanks for taking a look.
The requirement is pretty strict and maybe not the best design choice. Because you later want to plot several subplots at the position of a single subplot, this single subplot is only created for the sole purpose of dying and being replaced a few moments later.
So what you can do is obtain the position of the axes you pass in and create a new gridspec at that position. Then remove the original axes and create a new set of axes within that newly created gridspec.
The following would be an example. Note that it currently requires that the axes passed in is a Subplot (as opposed to any axes).
It also hardcodes the number of plots to be 2*2. In the real use case you would probably derive that number from the model you pass in.
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import gridspec

class ClassName():
    def __init__(self, model, ax=None, **kwargs):
        ax = ax or plt.gca()
        # Only subplots carry a SubplotSpec; manually placed Axes do not
        subplotspec = ax.get_subplotspec() if hasattr(ax, "get_subplotspec") else None
        if subplotspec is None:
            raise ValueError("Axes needs to be a subplot")
        # Geometry of subplots
        m, n = 2, 2
        # Nested grid occupying exactly the slot of the Axes passed in
        gs = gridspec.GridSpecFromSubplotSpec(m, n, subplot_spec=subplotspec)
        fig = ax.figure
        ax.remove()
        self.axes = np.empty((m,n), dtype=object)
        for i in range(m):
            for j in range(n):
                self.axes[i,j] = fig.add_subplot(gs[i,j], label=f"{i}{j}")

    def plot(self, data):
        for ax,d in zip(self.axes.flat, data):
            ax.plot(d)

_, (ax,ax2) = plt.subplots(ncols=2)
cm = ClassName("mymodel", ax=ax2) # calling my class
cm.plot(np.random.rand(4,10))
plt.show()

Retrieve all colorbars in figure

I'm trying to manipulate all colorbar instances contained in a figure. There is fig.get_axes() to obtain a list of axes, but I cannot find anything similar for colorbars.
This answer, https://stackoverflow.com/a/19817573/7042795, only applies to special situations, but not the general case.
Consider this MWE:
import matplotlib.pyplot as plt
import numpy as np
data = np.random.random((10,10)) # Generate some random data to plot
fig, axs = plt.subplots(1,2)
im1 = axs[0].imshow(data)
cbar1 = fig.colorbar(im1)
im2 = axs[1].imshow(2*data)
cbar2 = fig.colorbar(im2)
fig.show()
How can I get cbar1 and cbar2 from fig?
What I need is a function like:
def get_colorbars(fig):
    cbars = fig.get_colorbars()
    return cbars
cbars = get_colorbars(fig)
You have no choice but to check, for each object present in the figure, whether it has a colorbar or not. This could look as follows:
def get_colorbars(fig):
    cbs = []
    for ax in fig.axes:
        cbs.extend(ax.findobj(lambda obj: hasattr(obj, "colorbar") and obj.colorbar))
    return [a.colorbar for a in cbs]
This will give you all the colorbars that are tied to an artist. There may be more colorbars in the figure though, e.g. created directly from a ScalarMappable or multiple colorbars for the same object; those cannot be found.
Since the only place I'm reasonably sure that colorbar references are retained is as an attribute of the artist they are tied to, the best solution I could think of is to search all artists in a figure. This is best done recursively:
from matplotlib.colorbar import Colorbar

def get_colorbars(fig):
    def check_kids(obj, bars):
        # Walk the artist tree depth-first
        for child in obj.get_children():
            if isinstance(getattr(child, 'colorbar', None), Colorbar):
                bars.append(child.colorbar)
            check_kids(child, bars)
        return bars
    return check_kids(fig, [])
I have not had a chance to test this code, but it should at least point you in the right direction.
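A quick way to try the recursive search on the MWE from the question is sketched below. make_demo_figure is a hypothetical helper, and the identity check that skips duplicates is an addition of mine, in case the same colorbar is reachable from more than one artist.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colorbar import Colorbar


def get_colorbars(fig):
    """Recursively collect colorbars referenced by any artist in the figure."""
    def check_kids(obj, bars):
        for child in obj.get_children():
            cbar = getattr(child, "colorbar", None)
            # Keep each colorbar once, compared by identity
            if isinstance(cbar, Colorbar) and not any(cbar is b for b in bars):
                bars.append(cbar)
            check_kids(child, bars)
        return bars
    return check_kids(fig, [])


def make_demo_figure():
    """Rebuild the two-image figure from the question and return its colorbars."""
    data = np.random.random((10, 10))
    fig, axs = plt.subplots(1, 2)
    im1 = axs[0].imshow(data)
    cbar1 = fig.colorbar(im1, ax=axs[0])
    im2 = axs[1].imshow(2 * data)
    cbar2 = fig.colorbar(im2, ax=axs[1])
    return fig, cbar1, cbar2
```

Calling get_colorbars(fig) on the figure returned by make_demo_figure() should hand back the same two objects as cbar1 and cbar2, since fig.colorbar stores each colorbar on the image it was created for.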

How to update Matplotlib graph when new data arrives?

I'm building a bot which fetches data from the internet every 30 seconds.
I want to plot this data using matplotlib and be able to update the graphs whenever I fetch new data.
At the beginning of my script I initialise my plots.
Then I run a function to update these plots every 30 seconds, but my plot window freezes at this moment.
I've done some research but can't seem to find any working solution:
plt.show(block=False)
plt.pause(0.001)
What am I doing wrong?
General structure of the code:
import matplotlib.pyplot as plt
import time
from datetime import datetime

def init_plots():
    global fig
    global close_line
    plt.ion()
    fig = plt.figure()
    main_fig = fig.add_subplot(111)
    x = [datetime.fromtimestamp(x) for x in time_series]
    close_line, = main_fig.plot(x, close_history, 'k-')
    plt.draw()

def update_all_plots():
    global close_line
    x = [datetime.fromtimestamp(x) for x in time_series]
    close_line.set_xdata(x)
    close_line.set_ydata(close_history)
    plt.draw()

# SCRIPT :
init_plots()
while(True):
    # Fetch new data...
    # Update time series...
    # Update close_history...
    update_all_plots()
    time.sleep(30)
There is a module in matplotlib specifically for plots that change over time: https://matplotlib.org/api/animation_api.html
Basically you define an update function, that updates the data for your line objects. That update function can then be used to create a FuncAnimation object which automatically calls the update function every x milliseconds:
ani = FuncAnimation(figure, update_function, repeat=True, interval=x)
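A fuller sketch of that idea is below. fetch_latest_close is a hypothetical stand-in for the real network request, make_live_plot is an assumed helper name, and the x values are simple sample indices rather than timestamps; the GUI timer replaces the while/time.sleep loop, so the window stays responsive between updates.

```python
import random

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation


def fetch_latest_close():
    """Hypothetical stand-in for the real network request."""
    return 100 + random.uniform(-1, 1)


def make_live_plot(fetch, interval_ms=30_000):
    """Create a figure whose line grows by one point per timer tick."""
    close_history = []
    fig, ax = plt.subplots()
    (close_line,) = ax.plot([], [], "k-")

    def update(frame):
        # Called by the GUI timer; fetch one value and redraw the line
        close_history.append(fetch())
        close_line.set_data(list(range(len(close_history))), close_history)
        # Recompute the axis limits so the new point stays visible
        ax.relim()
        ax.autoscale_view()
        return (close_line,)

    # The caller must keep a reference to the animation, otherwise it
    # can be garbage-collected and the updates stop
    ani = FuncAnimation(fig, update, interval=interval_ms, cache_frame_data=False)
    return fig, close_line, update, ani
```

To run it, call fig, line, update, ani = make_live_plot(fetch_latest_close) and then plt.show(); the blocking show call runs the event loop, and the timer fires update every interval_ms milliseconds.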
There is a simple way to do it, given a pandas DataFrame. You would usually do something like this to draw (df is the DataFrame):
ax = df.plot.line()
Using
df.plot.line(reuse_plot=True,ax=ax)
one can reuse the same figure to redraw it elegantly and probably fast enough.
Possible duplicate of Matplotlib updating live plot

matplotlib subplots from plot objects

I've got a series of functions that return three plot objects (figure, axis and plot) and I would like to combine them into a single figure as subplots. I've put together example code:
import matplotlib.pyplot as plt
import numpy as np

def main():
    line_fig,line_axes,line_plot=line_grapher()
    cont_fig,cont_axes,cont_plot=cont_grapher()
    compound_fig=plot_compounder(line_fig,cont_fig)#which arguments?
    plt.show()

def line_grapher():
    x=np.linspace(0,2*np.pi)
    y=np.sin(x)/(x+1)
    line_fig=plt.figure()
    line_axes=line_fig.add_axes([0.1,0.1,0.8,0.8])
    line_plot=line_axes.plot(x,y)
    return line_fig,line_axes,line_plot

def cont_grapher():
    z=np.random.rand(10,10)
    cont_fig=plt.figure()
    cont_axes=cont_fig.add_axes([0.1,0.1,0.8,0.8])
    cont_plot=cont_axes.contourf(z)
    return cont_fig,cont_axes,cont_plot

def plot_compounder(fig1,fig2):
    #... lines that will compound the two figures that
    #... were passed to the function and return a single
    #... figure
    fig3=None#provisional, so that the code runs
    return fig3

if __name__=='__main__':
    main()
It would be really useful to combine a set of graphs into one with a function. Has anybody done this before?
If you're going to be plotting your graphs on the same figure anyway, there's no need to create a figure for each plot. Change your plotting functions to accept an Axes to draw on; you can then instantiate a figure with two subplots and pass one Axes to each function:
def line_grapher(ax):
    x=np.linspace(0,2*np.pi)
    y=np.sin(x)/(x+1)
    ax.plot(x,y)

def cont_grapher(ax):
    z=np.random.rand(10,10)
    cont_plot = ax.contourf(z)

def main():
    fig3, axarr = plt.subplots(2)
    line_grapher(axarr[0])
    cont_grapher(axarr[1])
    plt.show()

if __name__=='__main__':
    main()
Look into the plt.subplots function and the add_subplot figure method for plotting multiple plots on one figure.
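For comparison, the same layout built with the add_subplot figure method might look like the sketch below. build_figure is a hypothetical helper name, and the colorbar is an extra that the original functions did not draw; the plotting functions here return what they drew so the caller can attach it.

```python
import matplotlib.pyplot as plt
import numpy as np


def line_grapher(ax):
    """Draw the damped sine curve on the given Axes."""
    x = np.linspace(0, 2 * np.pi)
    return ax.plot(x, np.sin(x) / (x + 1))


def cont_grapher(ax):
    """Draw filled contours of random data on the given Axes."""
    z = np.random.rand(10, 10)
    return ax.contourf(z)


def build_figure():
    """Place both plots side by side on one figure via add_subplot."""
    fig = plt.figure(figsize=(8, 4))
    ax_line = fig.add_subplot(1, 2, 1)  # 1 row, 2 columns, first cell
    ax_cont = fig.add_subplot(1, 2, 2)  # second cell
    line_grapher(ax_line)
    contours = cont_grapher(ax_cont)
    # The returned contour set is what a colorbar needs
    fig.colorbar(contours, ax=ax_cont)
    return fig, ax_line, ax_cont
```

plt.subplots is usually shorter when the grid is known up front, while add_subplot lets you add cells one at a time; either way, call plt.show() on the result to display it.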
