Problems with plt.subplots, which should be the best option? - python

I'm new in both python and stackoverflow... I come from the ggplot2 R background and I am still getting stacked with python. I don't understand why I have a null plot before my figure using matplotlib... I just have a basic pandas series and I want to plot some of the rows in a subplot, and some on the others (however my display is terrible and I don't know why/how to fix it). Thank you in advance!
df = organism_df.T
fig, (ax1,ax2) = plt.subplots(nrows=1,ncols=2,figsize=(5,5))
ax1 = df.iloc[[0,2,3,-1]].plot(kind='bar')
ax1.get_legend().remove()
ax1.set_title('Number of phages/bacteria interacting vs actual DB')
ax2 = df.iloc[[1,4,5,6,7]].plot(kind='bar')
ax2.get_legend().remove()
ax2.set_title('Number of different taxonomies with interactions')
plt.tight_layout()

The method plot from pandas would need the axes given as an argument, e.g., df.plot(ax=ax1, kind='bar'). In your example, first the figure (consisting of ax1 and ax2) is created, then another figure is created by the plot function (at the same time overwriting the original ax1 object) etc.

Related

How can I return a matplotlib figure from a function?

I need to plot changing molecule numbers against time. But I'm also trying to investigate the effects of parallel processing so I'm trying to avoid writing to global variables. At the moment I have the following two numpy arrays tao_all, contains all the time points to be plotted on the x-axis and popul_num_all which contains the changing molecule numbers to be plotted on the y-axis.
The current code I've got for plotting is as follows:
for i, label in enumerate(['Enzyme', 'Substrate', 'Enzyme-Substrate complex', 'Product']):
figure1 = plt.plot(tao_all, popul_num_all[:, i], label=label)
plt.legend()
plt.tight_layout()
plt.show()
I need to encapsulate this in a function that takes the above arrays as the input and returns the graph. I've read a couple of other posts on here that say I should write my results to an axis and return the axis? But I can't quite get my head around applying that to my problem?
Cheers
def plot_func(x, y):
fig,ax = plt.subplots()
ax.plot(x, y)
return fig
Usage:
fig = plot_func([1,2], [3,4])
Alternatively you may want to return ax. For details about Figure and Axes see the docs. You can get the axes array from the figure by fig.axes and the figure from the axes by ax.get_figure().
In addition to above answer, I can suggest you to use matplotlib animation.FuncAnimation method if you are working with the time series and want to make your visualization better.
You can find the details here https://matplotlib.org/api/_as_gen/matplotlib.animation.FuncAnimation.html

How to add two data sets on one bar graph using matplotlib

How do you put two data sets on the same bar graph? I tried this code. This should be simple enough to help anyone with the same problem?
x = groups1_table.plot.bar(color='blue')
x = groups2_table.plot.bar(color='red')
plt.show()
Any suggestions?
Use ax in matplotlib.
Since you have not posted MRE, I am assuming the data points.
You can proceed with something like this:
import matplotlib.pyplot as plt
x1=[1,2,3,4,5]
y1=[6,7,8,9,15]
x2=[16,17,18,16,19]
y2=[20,22,23,26,21]
fig, ax=plt.subplots()
ax=plt.bar(x1,y1,label='x_list')
ax=plt.bar(x2,y2,label='y_list')
plt.legend(loc='upper left')
plt.show()
Again you have to change the code to meet your preferences. Just know that you can place as many graphs as you want on a same plot. Just use the same axes object ax to plot them.
OUTPUT:

How to remove the extra figures created when running a for loop to create seaborn plots

I am trying to do EDA along with exploring the Matplotlib and Seaborn libraries.
The data_cat DataFrame has 4 columns and I want to create plots in a single row with 4 columns.
For that, I created a figure object with 4 axes objects.
fig, ax = plt.subplots(1,4, figsize = (16,4))
for i in range(len(data_cat.columns)):
sns.catplot(x = data_cat.columns[i], kind = 'count', data = data_cat, ax= ax[i])
The output for it is a figure with the 4 plots (as required) but it is followed by 4 blank plots that I think are the extra figure objects generated by the sns.catplot function.
Your code does not work as intended because sns.catplot() is a figure level function, that is designed to create its own grid of subplots if desired. So if you want to set up the subplot grid directly in matplotlib, as you do with your first line, you should use the appropriate axes level function instead, in this case sns.countplot():
fig, ax = plt.subplots(1, 4, figsize = (16,4))
for i in range(4):
sns.countplot(x = data_cat.columns[i], data = data_cat, ax= ax[i])
Alternatively, you could use pandas' df.melt() method to tidy up your dataset so that all the values from your four columns are in one column (say 'col_all'), and you have another column (say 'subplot') that identifies from which original column each value is. Then you can get all the subplots with one call:
sns.catplot(x='col_all', kind='count', data=data_cat, col='subplot')
I answered a related question here.

Saving matplotlib subplot figure to image file

I'm fairly new to matplotlib and am limping along. That said, I haven't found an obvious answer to this question.
I have a scatter plot I wanted colored by groups, and it looked like plotting via a loop was the way to roll.
Here is my reproducible example, based on the first link above:
import matplotlib.pyplot as plt
import pandas as pd
from pydataset import data
df = data('mtcars').iloc[0:10]
df['car'] = df.index
fig, ax = plt.subplots(1)
plt.figure(figsize=(12, 9))
for ind in df.index:
ax.scatter(df.loc[ind, 'wt'], df.loc[ind, 'mpg'], label=ind)
ax.legend(bbox_to_anchor=(1.05, 1), loc=2)
# plt.show()
# plt.savefig('file.png')
Uncommenting plt.show() yields what I want:
Searching around, it looked like plt.savefig() is the way to save a file; if I re-comment out plt.show() and run plt.savefig() instead, I get a blank white picture. This question, suggests this is cause by calling show() before savefig(), but I have it entirely commented out. Another question has a comment suggesting I can save the ax object directly, but that cuts off my legend:
The same question has an alternative that uses fig.savefig() instead. I get the same chopped legend.
There's this question which seems related, but I'm not plotting a DataFrame directly so I'm not sure how to apply the answer (where dtf is the pd.DataFrame they're plotting):
plot = dtf.plot()
fig = plot.get_figure()
fig.savefig("output.png")
Thanks for any suggestions.
Edit: to test the suggestion below to try tight_layout(), I ran this and still get a blank white image file:
fig, ax = plt.subplots(1)
plt.figure(figsize=(12, 9))
for ind in df.index:
ax.scatter(df.loc[ind, 'wt'], df.loc[ind, 'mpg'], label=ind)
ax.legend(bbox_to_anchor=(1.05, 1), loc=2)
fig.tight_layout()
plt.savefig('test.png')
Remove the line plt.figure(figsize=(12, 9)) and it will work as expected. I.e. call savefig before show.
The problem is that the figure being saved is the one created by plt.figure(), while all the data is plotted to ax which is created before that (and in a different figure, which is not the one being saved).
For saving the figure including the legend use the bbox_inches="tight" option
plt.savefig('test.png', bbox_inches="tight")
Of course saving the figure object directly is equally possible,
fig.savefig('test.png', bbox_inches="tight")
For a deeper understanding on how to move the legend out of the plot, see this answer.
Additional add-up on #ImportanceOfBeingErnest's answer, when bbox_inches='tight', 'pad_inches=0.1' may need to set to larger values.

Why do many examples use `fig, ax = plt.subplots()` in Matplotlib/pyplot/python

I'm learning to use matplotlib by studying examples, and a lot of examples seem to include a line like the following before creating a single plot...
fig, ax = plt.subplots()
Here are some examples...
Modify tick label text
http://matplotlib.org/examples/pylab_examples/boxplot_demo2.html
I see this function used a lot, even though the example is only attempting to create a single chart. Is there some other advantage? The official demo for subplots() also uses f, ax = subplots when creating a single chart, and it only ever references ax after that. This is the code they use.
# Just a figure and one subplot
f, ax = plt.subplots()
ax.plot(x, y)
ax.set_title('Simple plot')
plt.subplots() is a function that returns a tuple containing a figure and axes object(s). Thus when using fig, ax = plt.subplots() you unpack this tuple into the variables fig and ax. Having fig is useful if you want to change figure-level attributes or save the figure as an image file later (e.g. with fig.savefig('yourfilename.png')). You certainly don't have to use the returned figure object but many people do use it later so it's common to see. Also, all axes objects (the objects that have plotting methods), have a parent figure object anyway, thus:
fig, ax = plt.subplots()
is more concise than this:
fig = plt.figure()
ax = fig.add_subplot(111)
Just a supplement here.
The following question is that what if I want more subplots in the figure?
As mentioned in the Doc, we can use fig = plt.subplots(nrows=2, ncols=2) to set a group of subplots with grid(2,2) in one figure object.
Then as we know, the fig, ax = plt.subplots() returns a tuple, let's try fig, ax1, ax2, ax3, ax4 = plt.subplots(nrows=2, ncols=2) firstly.
ValueError: not enough values to unpack (expected 4, got 2)
It raises a error, but no worry, because we now see that plt.subplots() actually returns a tuple with two elements. The 1st one must be a figure object, and the other one should be a group of subplots objects.
So let's try this again:
fig, [[ax1, ax2], [ax3, ax4]] = plt.subplots(nrows=2, ncols=2)
and check the type:
type(fig) #<class 'matplotlib.figure.Figure'>
type(ax1) #<class 'matplotlib.axes._subplots.AxesSubplot'>
Of course, if you use parameters as (nrows=1, ncols=4), then the format should be:
fig, [ax1, ax2, ax3, ax4] = plt.subplots(nrows=1, ncols=4)
So just remember to keep the construction of the list as the same as the subplots grid we set in the figure.
Hope this would be helpful for you.
As a supplement to the question and above answers there is also an important difference between plt.subplots() and plt.subplot(), notice the missing 's' at the end.
One can use plt.subplots() to make all their subplots at once and it returns the figure and axes (plural of axis) of the subplots as a tuple. A figure can be understood as a canvas where you paint your sketch.
# create a subplot with 2 rows and 1 columns
fig, ax = plt.subplots(2,1)
Whereas, you can use plt.subplot() if you want to add the subplots separately. It returns only the axis of one subplot.
fig = plt.figure() # create the canvas for plotting
ax1 = plt.subplot(2,1,1)
# (2,1,1) indicates total number of rows, columns, and figure number respectively
ax2 = plt.subplot(2,1,2)
However, plt.subplots() is preferred because it gives you easier options to directly customize your whole figure
# for example, sharing x-axis, y-axis for all subplots can be specified at once
fig, ax = plt.subplots(2,2, sharex=True, sharey=True)
whereas, with plt.subplot(), one will have to specify individually for each axis which can become cumbersome.
In addition to the answers above, you can check the type of object using type(plt.subplots()) which returns a tuple, on the other hand, type(plt.subplot()) returns matplotlib.axes._subplots.AxesSubplot which you can't unpack.
Using plt.subplots() is popular because it gives you an Axes object and allows you to use the Axes interface to define plots.
The alternative would be to use the global state interface, the plt.plot etc functionality:
import matplotlib.pyplot as plt
# global state version - modifies "current" figure
plt.plot(...)
plt.xlabel(...)
# axes version - modifies explicit axes
ax.plot(...)
ax.set_xlabel(...)
So why do we prefer Axes?
It is refactorable - you can put away some of the code into a function that takes an Axes object, and does not rely on global state
It is easier to transition to a situation with multiple subplots
One consistent/familiar interface instead of switching between two
The only way to access the depth of all features of matplotlib
The global state version was created that way to be easy to use interactively, and to be a familiar interface for Matlab users, but in larger programs and scripts the points outlined here favour using the Axes interface.
There is a matplotlib blog post exploring this topic in more depth: Pyplot vs Object Oriented Interface
It is relatively easy to deal with both worlds. We can for example always ask for the current axes: ax = plt.gca() ("get current axes").
fig.tight_layout()
such a feature is very convenient, if xticks_labels goes out of plot-window, such a line helps to fit xticks_labels & the whole chart to the window, if automatic positioning of chart in plt-window works not correctly. And this code-line works only if you use fig-object in the plt-window
fig, ax = plt.subplots(figsize=(10,8))
myData.plot(ax=ax)
plt.xticks(fontsize=10, rotation=45)
fig.tight_layout()
plt.show()

Categories