I am trying to plot a facet_grid with stacked bar charts inside.
I would like to use Seaborn. Its barplot function does not include a stacked argument.
I tried to use FacetGrid.map with a custom callable function.
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
def custom_stacked_barplot(col_day, col_time, col_total_bill, **kwargs):
dict_df={}
dict_df['day']=col_day
dict_df['time']=col_time
dict_df['total_bill']=col_total_bill
df_data_graph=pd.DataFrame(dict_df)
df = pd.crosstab(index=df_data_graph['time'], columns=tips['day'], values=tips['total_bill'], aggfunc=sum)
df.plot.bar(stacked=True)
tips=sns.load_dataset("tips")
g = sns.FacetGrid(tips, col='size', row='smoker')
g = g.map(custom_stacked_barplot, "day", 'time', 'total_bill')
However I get an empty canvas and stacked bar charts separately.
Empty canvas:
Graph1 apart:
Graph2:.
How can I fix this issue? Thanks for the help!
The simplest code to achive that result is this:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
tips=sns.load_dataset("tips")
g = sns.FacetGrid(tips, col = 'size', row = 'smoker', hue = 'day')
g = (g.map(sns.barplot, 'time', 'total_bill', ci = None).add_legend())
plt.show()
which gives this result:
Your different mixes of APIs (pandas.DataFrame.plot) appears not to integrate with (seaborn.FacetGrid). Since stacked bar plots are not supported in seaborn plotting, consider developing your own version with matplotlib subplots by iterating across groupby levels:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def custom_stacked_barplot(t, sub_df, ax):
plot_df = pd.crosstab(index=sub_df["time"], columns=sub_df['day'],
values=sub_df['total_bill'], aggfunc=sum)
p = plot_df.plot(kind="bar", stacked=True, ax = ax,
title = " | ".join([str(i) for i in t]))
return p
tips = sns.load_dataset("tips")
g_dfs = tips.groupby(["smoker", "size"])
# INITIALIZE PLOT
# sns.set()
fig, axes = plt.subplots(nrows=2, ncols=int(len(g_dfs)/2)+1, figsize=(15,6))
# BUILD PLOTS ACROSS LEVELS
for ax, (i,g) in zip(axes.ravel(), sorted(g_dfs)):
custom_stacked_barplot(i, g, ax)
plt.tight_layout()
plt.show()
plt.clf()
plt.close()
And use seaborn.set to adjust theme and pallette:
Related
I am struggling with syncing colors between [seaborn.countplot] and [pandas.DataFrame.plot] pie plot.
I found a similar question on SO, but it does not work with pie chart as it throws an error:
TypeError: pie() got an unexpected keyword argument 'color'
I searched on the documentation sites, but all I could find is that I can set a colormap and palette, which was also not in sync in the end:
Result of using the same colormap and palette
My code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
cplot = sns.countplot(data=df, x=var, ax=ax[1])
for patch in cplot.patches:
cplot.annotate(
format(patch.get_height()),
(
patch.get_x() + patch.get_width() / 2,
patch.get_height()
)
)
plt.show()
Illustration of the problem
As you can see, colors are not in sync with labels.
I added the argument order to the sns.countplot(). This would change how seaborn selects the values and as a consequence the colours between both plots will mach.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
df[var].value_counts().plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
cplot = sns.countplot(data=df, x=var, ax=ax[1],
order=df[var].value_counts().index)
for patch in cplot.patches:
cplot.annotate(
format(patch.get_height()),
(
patch.get_x() + patch.get_width() / 2,
patch.get_height()
)
)
plt.show()
Explanation: Colors are selected by order. So, if the columns in the sns.countplot have a different order than the other plot, both plots will have different columns for the same label.
Using default colors
Using the same dataframe for the pie plot and for the seaborn plot might help. As the values are already counted for the pie plot, that same dataframe could be plotted directly as a bar plot. That way, the order of the values stays the same.
Note that seaborn by default makes the colors a bit less saturated. To get the same colors as in the pie plot, you can use saturation=1 (default is .75). To add text above the bars, the latest matplotlib versions have a new function bar_label.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
df = pd.read_csv('https://andybek.com/pandas-sat')
cat_vars = ['Borough', 'SAT Section']
for var in list(cat_vars):
fig, ax = plt.subplots(1, 2, figsize=(10, 5))
counts_df = df[var].value_counts()
counts_df.plot(kind='pie', autopct=lambda v: f'{v:.2f}%', ax=ax[0])
sns.barplot(x=counts_df.index, y=counts_df.values, saturation=1, ax=ax[1])
ax[1].bar_label(ax[1].containers[0])
#Customized colors
If you want to use a customized list of colors, you can use the colors= keyword in pie() and palette= in seaborn.
To make things fit better, you can replace spaces by newlines (so "Staten Island" will use two lines). plt.tight_layout() will rearrange spacings to make titles and texts fit nicely into the figure.
I wonder is it possible to plot stacked bar plot with seaborn catplot.
For example:
import seaborn as sns
exercise = sns.load_dataset("exercise")
plot = exercise.groupby(['diet'])['kind'].value_counts(normalize=True).mul(100).reset_index(name='percentage%')
g = sns.catplot(x="diet", y="percentage%", hue="kind", data=plot, kind='bar')
I'd like to stack kind, but it seems catplot doesn't take 'stacked' parameter.
You cannot do it using sns.barplot, i think the closest you can get is using sns.histplot:
import seaborn as sns
exercise = sns.load_dataset("exercise")
plot = exercise.groupby(['diet'])['kind'].value_counts(normalize=True).mul(100).reset_index(name='percentage')
g = sns.histplot(x = 'diet' , hue = 'kind',weights= 'percentage',
multiple = 'stack',data=plot,shrink = 0.7)
In the example below, how do I use seaborn.PairGrid() to reproduce the plots created by seaborn.pairplot()? Specifically, I'd like the diagonal distributions to span the vertical axis. Markers with white borders etc... would be great too. Thanks!
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
# pairplot() example
g = sns.pairplot(iris, kind='scatter', diag_kind='kde')
plt.show()
# PairGrid() example
g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot)
g.map_offdiag(plt.scatter)
plt.show()
This is quite simple to achieve. The main differences between your plot and what pairplot does are:
the use of the diag_sharey parameter of PairGrid
using sns.scatterplot instead of plt.scatter
With that, we have:
iris = sns.load_dataset('iris')
g = sns.PairGrid(iris, diag_sharey=False)
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.scatterplot)
To change the visual style:
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot, shade=True)
g.map_offdiag(plt.scatter, edgecolor="w")
plt.show()
I'm trying to make a function that produces two pairplots side by side. I've got it to produce 1 at time.
import seaborn as sns
data = sns.load_dataset("iris")
def pairplot(col,hue_var):
sns.set(style="ticks")
cols_to_look_at = col + [hue_var]
sns.pairplot(data[cols_to_look_at], hue=hue_var)
col_names1 = ['sepal_length', 'sepal_width']
col_names2 = ['petal_length', 'petal_width']
pairplot(col_names1,'species')
pairplot(col_names2,'species')
grateful for any help
In fact it is not a trivial matter.
pairplot has not ax parameter to make it simple due to its nature of "figure-level" at seaborn.
There are some hacks:
How to plot multiple Seaborn Jointplot in Subplot
A "clean" one consists in saving the plots as images to bring them back to place them: https://stackoverflow.com/a/61330067/10372616
Using it in your plots:
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
import seaborn as sns
data = sns.load_dataset("iris")
def pairplot(col,hue_var):
sns.set(style="ticks")
cols_to_look_at = col + [hue_var]
return sns.pairplot(data[cols_to_look_at], hue=hue_var) # !!! I've placed a return
col_names1 = ['sepal_length', 'sepal_width']
col_names2 = ['petal_length', 'petal_width']
g0 = pairplot(col_names1,'species')
g1 = pairplot(col_names2,'species')
############### 1. SAVE PLOTS IN MEMORY TEMPORALLY
g0.savefig('g0.png', dpi=300)
plt.close(g0.fig)
g1.savefig('g1.png', dpi=300)
plt.close(g1.fig)
############### 2. CREATE YOUR SUBPLOTS FROM TEMPORAL IMAGES
f, axarr = plt.subplots(1, 2, figsize=(20, 20))
axarr[0].imshow(mpimg.imread('g0.png'))
axarr[1].imshow(mpimg.imread('g1.png'))
# turn off x and y axis
[ax.set_axis_off() for ax in axarr.ravel()]
plt.tight_layout()
plt.show()
How can I create a seaborn boxplot with 2 y-axes? I need this because of different scales. My current code will overwrite the first box in the boxplot, eg. it is populated by 2 first data item from first ax and first item from second ax.
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('ggplot')
import seaborn as sns
df = pd.DataFrame({'A': pd.Series(np.random.uniform(0,1,size=10)),
'B': pd.Series(np.random.uniform(10,20,size=10)),
'C': pd.Series(np.random.uniform(10,20,size=10))})
fig = plt.figure()
# 2/3 of A4
fig.set_size_inches(7.8, 5.51)
plt.ylim(0.0, 1.1)
ax1 = fig.add_subplot(111)
ax1 = sns.boxplot(ax=ax1, data=df[['A']])
ax2 = ax1.twinx()
boxplot = sns.boxplot(ax=ax2, data=df[['B','C']])
fig = boxplot.get_figure()
fig
How do I prevent the first item getting overwritten?
EDIT:
If I add positions argument
boxplot = sns.boxplot(ax=ax2, data=df[['B','C']], positions=[2,3])
I get an exception:
TypeError: boxplot() got multiple values for keyword argument 'positions'
Probably because seaborn already sets that argument internally.
It may not make too much sense to use seaborn here. Using usual matplotlib boxplots allows you to use the positions argument as expected.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
df = pd.DataFrame({'A': pd.Series(np.random.uniform(0,1,size=10)),
'B': pd.Series(np.random.uniform(10,20,size=10)),
'C': pd.Series(np.random.uniform(10,20,size=10))})
fig, ax1 = plt.subplots(figsize=(7.8, 5.51))
props = dict(widths=0.7,patch_artist=True, medianprops=dict(color="gold"))
box1=ax1.boxplot(df['A'].values, positions=[0], **props)
ax2 = ax1.twinx()
box2=ax2.boxplot(df[['B','C']].values,positions=[1,2], **props)
ax1.set_xlim(-0.5,2.5)
ax1.set_xticks(range(len(df.columns)))
ax1.set_xticklabels(df.columns)
for b in box1["boxes"]+box2["boxes"]:
b.set_facecolor(next(ax1._get_lines.prop_cycler)["color"])
plt.show()