Subplot of Subplots Matplotlib / Seaborn - python

I am trying to create a grid of subplots. Each subplot will look like the one on this site:
https://python-graph-gallery.com/24-histogram-with-a-boxplot-on-top-seaborn/
If I have 10 different sets of this style of plot, I want to arrange them in a grid, for example 5x2.
I have read through the Matplotlib documentation and cannot figure out how to do it. I can loop over the subplots and output each one, but I cannot arrange them into rows and columns.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=list('ABCDEFGHIJ'))
for c in df:
    # Cut the window in 2 parts
    f, (ax_box, ax_hist) = plt.subplots(2, sharex=True,
                                        gridspec_kw={"height_ratios": (.15, .85)},
                                        figsize=(10, 10))
    # Add a graph in each part
    sns.boxplot(x=df[c], ax=ax_box)
    ax_hist.hist(df[c])
    # Remove x axis name for the boxplot
    ax_box.set(xlabel='')
    plt.show()
The desired result would take the plots from this loop and arrange them in a grid of rows and columns, in this case 5x2.

You have 10 columns, each of which creates 2 subplots: a box plot and a histogram. So you need a total of 20 axes in a single figure. You can get them by creating a grid of 2 rows and 10 columns.
Complete answer: (Adjust the figsize and height_ratios as per taste)
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

f, axes = plt.subplots(2, 10, sharex=True, gridspec_kw={"height_ratios": (.35, .35)},
                       figsize=(12, 5))
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=list('ABCDEFGHIJ'))
for i, c in enumerate(df):
    sns.boxplot(x=df[c], ax=axes[0, i])  # top row: box plot of column c
    axes[1, i].hist(df[c])               # bottom row: histogram of column c
plt.tight_layout()
plt.show()
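If the goal is the 5x2 arrangement from the question, with each cell holding its own box-plot-over-histogram pair, one possible approach is a nested GridSpec. The following is a minimal sketch and not part of the original answer: the helper name paired_grid and its parameters are hypothetical, and it assumes Matplotlib 3.1 or later for subgridspec.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def paired_grid(df, nrows=5, ncols=2, figsize=(10, 14)):
    """Draw one box plot + histogram pair per column of df on an nrows x ncols grid.

    Hypothetical helper for illustration. Returns the figure and a list of
    (ax_box, ax_hist) tuples, one per plotted column.
    """
    fig = plt.figure(figsize=figsize)
    # Outer grid: one cell per DataFrame column
    outer = fig.add_gridspec(nrows, ncols, hspace=0.35, wspace=0.2)
    pairs = []
    for i, col in enumerate(df.columns[:nrows * ncols]):
        row, column = divmod(i, ncols)
        # Inner grid: a short box plot on top of a taller histogram
        inner = outer[row, column].subgridspec(2, 1, height_ratios=(.15, .85), hspace=0.05)
        ax_box = fig.add_subplot(inner[0])
        ax_hist = fig.add_subplot(inner[1], sharex=ax_box)
        sns.boxplot(x=df[col], ax=ax_box)
        ax_hist.hist(df[col])
        # Tidy up the box plot: no axis label, no ticks, column name as title
        ax_box.set(xlabel='', yticks=[], title=col)
        ax_box.tick_params(labelbottom=False)
        pairs.append((ax_box, ax_hist))
    return fig, pairs


df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=list('ABCDEFGHIJ'))
fig, pairs = paired_grid(df)
```

Call plt.show() or fig.savefig(...) afterwards to display or store the figure. The x axis is shared only within each pair, so every column keeps its own range.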

Related

How to sync color between Seaborn and pandas pie plot

I am struggling to sync colors between seaborn.countplot and a pandas.DataFrame.plot pie plot.
I found a similar question on SO, but its solution does not work with a pie chart; it throws an error:
TypeError: pie() got an unexpected keyword argument 'color'
I searched the documentation, but all I could find is that I can set a colormap and a palette, and even then the colors were not in sync:
[Image: result of using the same colormap and palette]
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1])
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
plt.show()
[Image: illustration of the problem]
As you can see, colors are not in sync with labels.
I added the order argument to sns.countplot(). This changes the order in which seaborn draws the categories, and as a consequence the colours in both plots will match.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    cplot = sns.countplot(data=df, x=var, ax=ax[1],
                          order=df[var].value_counts().index)
    for patch in cplot.patches:
        cplot.annotate(
            format(patch.get_height()),
            (
                patch.get_x() + patch.get_width() / 2,
                patch.get_height()
            )
        )
plt.show()
Explanation: Colors are assigned by position. So, if the categories in sns.countplot are in a different order than in the other plot, the two plots will show different colors for the same label.
Using default colors
Using the same dataframe for the pie plot and for the seaborn plot might help. As the values are already counted for the pie plot, that same dataframe could be plotted directly as a bar plot. That way, the order of the values stays the same.
Note that seaborn by default makes the colors a bit less saturated. To get the same colors as in the pie plot, you can use saturation=1 (default is .75). To add text above the bars, matplotlib 3.4 and later provide the bar_label function.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
    fig, ax = plt.subplots(1, 2, figsize=(10, 5))
    # Count once and reuse the result, so both plots see the same order
    counts_df = df[var].value_counts()
    counts_df.plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
    sns.barplot(x=counts_df.index, y=counts_df.values, saturation=1, ax=ax[1])
    # Write each bar's height above it
    ax[1].bar_label(ax[1].containers[0])
plt.show()
Customized colors
If you want to use a customized list of colors, you can use the colors= keyword in pie() and palette= in seaborn.
To make things fit better, you can replace spaces by newlines (so "Staten Island" will use two lines). plt.tight_layout() will rearrange spacings to make titles and texts fit nicely into the figure.
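A minimal sketch of the customized-color variant described above, not taken from the original answer. It uses a small made-up Series instead of the SAT dataset, and the color list and the helper name pie_and_bars are hypothetical. One label-to-color mapping is built once and handed to both plots. Assigning hue and removing any legend afterwards is an assumption about how to keep the palette dictionary working across seaborn versions.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def pie_and_bars(values, colors):
    """Plot a pie and a bar chart of value counts with one shared color per label.

    Hypothetical helper; `colors` needs at least one entry per distinct label.
    """
    counts = values.value_counts()
    # Replace spaces by newlines so long labels such as "Staten Island" use two lines
    counts.index = [str(label).replace(' ', '\n') for label in counts.index]
    color_map = dict(zip(counts.index, colors))

    fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 5))
    # Pie: colors= takes a list in the same order as the wedges
    counts.plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax_pie,
                colors=[color_map[label] for label in counts.index])
    ax_pie.set_ylabel('')

    # Bars: palette= takes the label-to-color dictionary
    bar_df = pd.DataFrame({'label': list(counts.index), 'count': counts.values})
    sns.barplot(data=bar_df, x='label', y='count', hue='label', dodge=False,
                palette=color_map, saturation=1, ax=ax_bar)
    legend = ax_bar.get_legend()
    if legend is not None:  # some seaborn versions add a redundant legend
        legend.remove()
    fig.tight_layout()
    return fig, ax_pie, ax_bar, color_map


# Made-up example data
values = pd.Series(['Staten Island'] * 3 + ['Bronx'] * 2 + ['Queens'])
fig, ax_pie, ax_bar, color_map = pie_and_bars(values, ['tab:red', 'tab:green', 'tab:blue'])
```

Because both plots look colors up by label rather than by position, they stay in sync even if one of them is later drawn in a different order.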

Plotting two subplots in one figure

I have two PCA plots: one for the training data and one for the test data. Using seaborn, I'd like to combine those two and plot them as subplots.
sns.FacetGrid(finalDf_test, hue="L", height=6).map(plt.scatter, 'PC1_test', 'PC2_test').add_legend()
sns.FacetGrid(finalDf_train, hue="L", height=6).map(plt.scatter, 'PC1_train', 'PC2_train').add_legend()
Can someone help on that?
FacetGrid works at the figure level: it creates one or more subplots, depending on its col= and row= parameters. In this case, only one subplot is created.
As FacetGrid works on only one dataframe, you could concatenate your dataframes, introducing a new column to differentiate test and train. Also, the "PC1" and "PC2" columns of both dataframes should get the same name.
An easier approach is to use matplotlib to create the figure and then call sns.scatterplot(...., ax=...) for each of the subplots.
It would look like:
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
# create some dummy data
l = np.random.randint(0,2,500)
p1 = np.random.rand(500)*10
p2 = p1 + np.random.randn(500) + l
finalDf_test = pd.DataFrame({'PC1_test': p1[:100], 'PC2_test': p2[:100], 'L':l[:100] })
finalDf_train = pd.DataFrame({'PC1_train': p1[100:], 'PC2_train': p2[100:], 'L':l[100:] })
sns.set()
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(12, 6), sharex=True, sharey=True)
sns.scatterplot(data=finalDf_test, x='PC1_test', y='PC2_test', hue='L', ax=ax1)
sns.scatterplot(data=finalDf_train, x='PC1_train', y='PC2_train', hue='L', ax=ax2)
plt.show()
Concatenating the dataframes could look as follows:
sns.set()
finalDf_total = pd.concat({'test': finalDf_test.rename(columns={'PC1_test': 'PC1', 'PC2_test': 'PC2' }),
'train':finalDf_train.rename(columns={'PC1_train': 'PC1', 'PC2_train': 'PC2' })})
finalDf_total.index.rename(['origin', None], inplace=True) # rename the first index column to "origin"
finalDf_total.reset_index(level=0, inplace=True) # convert the first index to a regular column
sns.FacetGrid(finalDf_total, hue='L', height=6, col='origin').map(plt.scatter, 'PC1', 'PC2').add_legend()
plt.show()
The same combined dataframe could also be used for example in lmplot:
sns.lmplot(data=finalDf_total, x='PC1', y='PC2', hue='L', height=6, col='origin')

Plot all pandas dataframe columns separately

I have a pandas dataframe that has only numeric columns, and I am trying to create a separate histogram for each of the features
ind group people value value_50
1 1 5 100 1
1 2 2 90 1
2 1 10 80 1
2 2 20 40 0
3 1 7 10 0
3 2 23 30 0
In my real data, however, there are 50+ columns. How can I create a separate plot for each of them?
I have tried
df.plot.hist( subplots = True, grid = True)
It gave me an unclear plot with overlapping axes.
How can I arrange them using pandas subplots=True? The example below gives me a (2,2) grid for four columns, but writing it out for all 50 columns would be tedious:
fig, [(ax1,ax2),(ax3,ax4)] = plt.subplots(2,2, figsize = (20,10))
Pandas subplots=True will arrange the axes in a single column.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20))
df.plot(subplots=True)
plt.tight_layout()
plt.show()
Here, tight_layout isn't applied, because the figure is too small to arrange the axes nicely. One can use a bigger figure (figsize=(...)) though.
In order to have the axes on a grid, one can use the layout parameter, e.g.
df.plot(subplots=True, layout=(4,5))
The same can be achieved if creating the axes via plt.subplots()
fig, axes = plt.subplots(nrows=4, ncols=5)
df.plot(subplots=True, ax=axes)
If you want to plot them separately (which is why I ended up here), you can use
for i in df.columns:
    plt.figure()
    plt.hist(df[i])
An alternative for this task is the "hist" method with its "layout" parameter. Example using part of the code provided by @ImportanceOfBeingErnest:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20))
df.hist(layout=(5,4), figsize=(15,10))
plt.show()
With a pandas.DataFrame, I would suggest using pandas.DataFrame.apply. With a custom function, in this example plot(), you can display or save each figure separately.
def plot(col):
    fig, ax = plt.subplots()
    ax.plot(col)
    plt.show()

df.apply(plot)
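To save each figure rather than only display it, the same idea can be written as a plain loop. This is a minimal sketch, not from the original answer: the helper name save_column_plots, the output directory, and the file naming scheme are assumptions.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def save_column_plots(df, out_dir):
    """Save one histogram per column of df as a PNG file in out_dir (hypothetical helper)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name in df.columns:
        fig, ax = plt.subplots()
        ax.hist(df[name])
        ax.set_title(str(name))
        path = out_dir / f"column_{name}.png"
        fig.savefig(path)
        plt.close(fig)  # release the figure; this matters with 50+ columns
        paths.append(path)
    return paths


# Demo on random data, written to a temporary directory that is removed afterwards
df = pd.DataFrame(np.random.rand(7, 3), columns=list('abc'))
with tempfile.TemporaryDirectory() as tmp:
    paths = save_column_plots(df, tmp)
    saved_names = [p.name for p in paths]
    all_exist = all(p.exists() for p in paths)
```

Closing each figure after saving keeps memory use flat and avoids Matplotlib's warning about too many open figures.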
While not asked for in the question, I thought I'd add that passing the x parameter to plot lets you specify a column for the x-axis data.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
df = pd.DataFrame(np.random.rand(7,20),columns=list('abcdefghijklmnopqrst'))
df.plot(x='a',subplots=True, layout=(4,5))
plt.tight_layout()
plt.show()
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.plot.html

sharey='all' argument in plt.subplots() not passed to df.plot()?

I have a pandas dataframe which I would like to slice, and plot each slice in a separate subplot. I would like to use the sharey='all' and have matplotlib decide on some reasonable y-axis limits, rather than having to search the dataframe for the min and max and add offsets.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=0,ncols=0, sharey='all', tight_layout=True)
for i in range(1, len(df.columns) + 1):
    ax = fig.add_subplot(2,3,i)
    iC = df.iloc[:, i-1]
    iC.plot(ax=ax)
Which gives the following plot:
In fact, it gives that irrespective of what I set sharey to ('all', 'col', 'row', True, or False). What I was hoping for with sharey='all' would be something like:
Can somebody perhaps explain to me what I'm doing wrong here?
The following version would only add those axes you need for your df-columns and share their y-scales:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig = plt.figure(tight_layout=True)
ref_ax = None
for i in range(len(df.columns)):
    ax = fig.add_subplot(2, 3, i+1, sharey=ref_ax)
    ref_ax=ax
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
plt.show()
The grid-layout parameters, which are given explicitly as ...add_subplot(2, 3, ... here, can of course be calculated from len(df.columns).
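As a rough illustration of that last remark, the grid shape could be derived from the number of columns like this. The function name grid_shape and the choice of a fixed maximum number of subplots per row are assumptions, not part of the original answer.

```python
import math


def grid_shape(n_plots, max_cols=3):
    """Return (nrows, ncols) for a grid large enough to hold n_plots subplots."""
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    ncols = min(n_plots, max_cols)
    nrows = math.ceil(n_plots / ncols)
    return nrows, ncols


print(grid_shape(5))               # (2, 3), the layout used above for 5 columns
print(grid_shape(50, max_cols=5))  # (10, 5)
```

With nrows, ncols = grid_shape(len(df.columns)), the call in the loop becomes fig.add_subplot(nrows, ncols, i+1, sharey=ref_ax).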
Your plots are not shared. You create a subplot grid with 0 rows and 0 columns, i.e. no subplots at all, but those nonexistent subplots have their y axes shared. Then you create some other (existing) subplots, which are not shared. Those are the ones that are plotted to.
Instead, you need to set nrows and ncols to some useful values and plot to the axes created that way.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(np.arange(50).reshape((5,10))).transpose()
fig, axes = plt.subplots(nrows=2,ncols=3, sharey='all', tight_layout=True)
for i, ax in zip(range(len(df.columns)), axes.flat):
    iC = df.iloc[:, i]
    iC.plot(ax=ax)
for j in range(len(df.columns),len(axes.flat)):
    axes.flatten()[j].axis("off")
plt.show()

Plotting through a subset of data frame in Pandas using Matplotlib

I have a DataFrame that I slice into three subsets. Each subset has 3 to 4 rows of data. After slicing, I plot the subsets using Matplotlib.
The problem is that I am not able to create a figure in which each subplot shows one row of the sliced DataFrame. For example, in a group of three subplots, only the last subplot is drawn, and the other two subplots in the group stay empty. It looks like the 'r' value is not passed to 'r.plot' for all three subplots.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['key1'] = 0
df.key1.iloc[0:3] = 1
df.key1.iloc[3:7] = 2
df.key1.iloc[7:] = 3
df_grouped = df.groupby('key1')
for group_name, group_value in df_grouped:
    rows, columns = group_value.shape
    fig, axes = plt.subplots(rows, 1, sharex=True, sharey=True, figsize=(15,20))
    for i,r in group_value.iterrows():
        r = r[0:columns-1]
        r.plot(kind='bar', fill=False, log=False)
I think you might want what I call df_subset to be summarized in some way, but here's a way to plot each group in its own panel.
# Your Code Setting Up the Dataset
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['key1'] = 0
# .loc avoids chained assignment, which newer pandas versions may silently ignore
df.loc[df.index[0:3], 'key1'] = 1
df.loc[df.index[3:7], 'key1'] = 2
df.loc[df.index[7:], 'key1'] = 3
# My Code to Plot in Three Panels
distinct_keys = df['key1'].unique()
fig, axes = plt.subplots(len(distinct_keys), 1, sharex=True, figsize=(3,5))
for i, key in enumerate(distinct_keys):
    df_subset = df[df.key1==key]
    # {maybe insert a line here to summarize df_subset somehow interesting?}
    # plot into the i-th panel; passing ax= keeps pandas from opening a new figure
    df_subset.plot(kind='bar', fill=False, log=False, ax=axes[i])
plt.show()
