In the example below, how do I use seaborn.PairGrid() to reproduce the plots created by seaborn.pairplot()? Specifically, I'd like the diagonal distributions to span the vertical axis. Markers with white borders etc... would be great too. Thanks!
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
# pairplot() example
g = sns.pairplot(iris, kind='scatter', diag_kind='kde')
plt.show()
# PairGrid() example
g = sns.PairGrid(iris)
g.map_diag(sns.kdeplot)
g.map_offdiag(plt.scatter)
plt.show()
This is quite simple to achieve. The main differences between your plot and what pairplot does are:
pairplot passes diag_sharey=False to PairGrid, so the diagonal plots do not share the y-axis of their row
pairplot uses sns.scatterplot instead of plt.scatter
With that, we have:
iris = sns.load_dataset('iris')
g = sns.PairGrid(iris, diag_sharey=False)
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.scatterplot)
To change the visual style:
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
g = sns.PairGrid(iris, diag_sharey=False)
g.map_diag(sns.kdeplot, fill=True)  # fill replaces the deprecated shade parameter (seaborn >= 0.11)
g.map_offdiag(plt.scatter, edgecolor="w")
plt.show()
I am generating a gain plot based on the following example data in Matplotlib.
M_GRP_1 F_GRP_1 GRP_1 GAIN_GRP_1
0.036796 0.067024 0.058878 0.624948
0.000093 0.000087 0.000089 1.043674
0.000316 0.0002 0.000231 1.366149
0.011152 0.008329 0.00909 1.226813
0.001227 0.000747 0.000876 1.400792
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
fig.set_size_inches([18, 9])
ax.plot(np.linspace(0,1),np.linspace(0,1), color = 'black', linewidth = 2)
# d is the DataFrame holding the example data above
D = d.sort_values('GRP_1', ascending = False).cumsum()
ax.plot(D.iloc[:,2], D.iloc[:,0], color = 'orange', linewidth = 2)
plt.xlabel('Percentage of total data')
plt.ylabel('Gain')
plt.title('Target groups :: GRP_1')
plt.legend(['Baseline','Male'])
plt.grid(True)
plt.show()
However, I want to generate the same plot using seaborn. How can I do that? I'm not familiar with it.
Can anybody suggest how, or help with this?
Thanks in advance.
Seaborn is based on matplotlib, so most of your code is the same.
Just import seaborn as sns and replace ax.plot by sns.lineplot.
You may also want to add sns.set_theme() (or sns.set() prior to version 0.11.0) to apply seaborn default styles.
I am trying to plot a facet_grid with stacked bar charts inside.
I would like to use Seaborn. Its barplot function does not include a stacked argument.
I tried to use FacetGrid.map with a custom callable function.
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
def custom_stacked_barplot(col_day, col_time, col_total_bill, **kwargs):
    dict_df={}
    dict_df['day']=col_day
    dict_df['time']=col_time
    dict_df['total_bill']=col_total_bill
    df_data_graph=pd.DataFrame(dict_df)
    df = pd.crosstab(index=df_data_graph['time'], columns=tips['day'], values=tips['total_bill'], aggfunc=sum)
    df.plot.bar(stacked=True)
tips=sns.load_dataset("tips")
g = sns.FacetGrid(tips, col='size', row='smoker')
g = g.map(custom_stacked_barplot, "day", 'time', 'total_bill')
However, I get an empty canvas, and the stacked bar charts are drawn as separate figures.
[Images: empty FacetGrid canvas; Graph 1; Graph 2]
How can I fix this issue? Thanks for the help!
The simplest code to achieve that result is this:
import seaborn as sns
import matplotlib.pyplot as plt
sns.set()
tips=sns.load_dataset("tips")
g = sns.FacetGrid(tips, col = 'size', row = 'smoker', hue = 'day')
g = (g.map(sns.barplot, 'time', 'total_bill', ci = None).add_legend())
plt.show()
which gives this result:
Mixing the two APIs does not work here: pandas.DataFrame.plot does not appear to integrate with seaborn.FacetGrid. Since seaborn does not support stacked bar plots directly, consider building your own version with matplotlib subplots by iterating over groupby levels:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def custom_stacked_barplot(t, sub_df, ax):
    plot_df = pd.crosstab(index=sub_df["time"], columns=sub_df['day'],
                          values=sub_df['total_bill'], aggfunc=sum)
    p = plot_df.plot(kind="bar", stacked=True, ax = ax,
                     title = " | ".join([str(i) for i in t]))
    return p
tips = sns.load_dataset("tips")
g_dfs = tips.groupby(["smoker", "size"])
# INITIALIZE PLOT
# sns.set()
fig, axes = plt.subplots(nrows=2, ncols=int(len(g_dfs)/2)+1, figsize=(15,6))
# BUILD PLOTS ACROSS LEVELS
for ax, (i,g) in zip(axes.ravel(), sorted(g_dfs)):
    custom_stacked_barplot(i, g, ax)
plt.tight_layout()
plt.show()
plt.clf()
plt.close()
You can also call seaborn.set (uncomment sns.set() above) to adjust the theme and palette.
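If you would rather keep FacetGrid, one possible workaround (a sketch, not something the answers above confirm) is to have the custom function draw onto the axes that FacetGrid has made current, by passing ax=plt.gca() to the pandas plot call. Without an ax argument, pandas opens a new figure for every facet, which would explain the empty canvas. The data below is synthetic, with column names borrowed from tips, and the names stacked_bars, times and days are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when working interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the tips dataset (an assumption, to avoid a download)
rng = np.random.default_rng(0)
n = 200
times = ["Lunch", "Dinner"]
days = ["Thur", "Fri", "Sat", "Sun"]
df = pd.DataFrame({
    "smoker": rng.choice(["Yes", "No"], size=n),
    "size": rng.integers(1, 4, size=n),
    "time": rng.choice(times, size=n),
    "day": rng.choice(days, size=n),
    "total_bill": rng.uniform(5, 50, size=n),
})

def stacked_bars(data, **kwargs):
    """Draw one stacked bar chart for the facet's subset onto the current axes."""
    table = data.pivot_table(index="time", columns="day",
                             values="total_bill", aggfunc="sum")
    # Same rows and columns in every facet, so positions and colors line up
    table = table.reindex(index=times, columns=days).fillna(0)
    table.plot(kind="bar", stacked=True, ax=plt.gca(), legend=False)

g = sns.FacetGrid(df, col="size", row="smoker")
g.map_dataframe(stacked_bars)  # passes each facet's subset as data=
```

A legend is not drawn here; it would have to be added separately, for example on one of the axes.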
When plotting correlations, this code
>>> import seaborn as sns
>>> iris = sns.load_dataset("iris")
>>> g = sns.pairplot(iris)
results in the following pairplot:
http://seaborn.pydata.org/_images/seaborn-pairplot-1.png
What if I just want to show the first row out of those four (i.e. correlations of 'sepal_length' vs all other features)? How can I plot that? Could pairplot be used but with some modifications?
Thanks
Using the x_vars and y_vars arguments of pairplot you can select which columns to correlate.
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset("iris")
g = sns.pairplot(iris,
                 x_vars=["sepal_width","petal_length","petal_width"],
                 y_vars=["sepal_length"])
plt.show()
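If you do not want to type the column names by hand, the x_vars list can be built from the DataFrame. The sketch below assumes a synthetic frame with iris-like column names (not the real dataset), and the variable names target and others are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when working interactively
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in with iris-like columns (an assumption, to avoid a download)
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.normal(size=(60, 4)),
    columns=["sepal_length", "sepal_width", "petal_length", "petal_width"],
)
df["species"] = np.repeat(["setosa", "versicolor", "virginica"], 20)

target = "sepal_length"
# Every numeric column except the target goes on the x-axis
others = [c for c in df.select_dtypes("number").columns if c != target]

g = sns.pairplot(df, x_vars=others, y_vars=[target])
```

The result is a single row of scatter plots, one per remaining numeric column.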
I'm starting to learn a bit of python (been using R) for data analysis. I'm trying to create two plots using seaborn, but it keeps saving the second on top of the first. How do I stop this behavior?
import seaborn as sns
iris = sns.load_dataset('iris')
length_plot = sns.barplot(x='sepal_length', y='species', data=iris).get_figure()
length_plot.savefig('ex1.pdf')
width_plot = sns.barplot(x='sepal_width', y='species', data=iris).get_figure()
width_plot.savefig('ex2.pdf')
You have to start a new figure in order to do that. There are multiple ways to do that, assuming you have matplotlib. Also get rid of get_figure() and you can use plt.savefig() from there.
Method 1
Use plt.clf()
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
length_plot = sns.barplot(x='sepal_length', y='species', data=iris)
plt.savefig('ex1.pdf')
plt.clf()
width_plot = sns.barplot(x='sepal_width', y='species', data=iris)
plt.savefig('ex2.pdf')
Method 2
Call plt.figure() before each one
plt.figure()
length_plot = sns.barplot(x='sepal_length', y='species', data=iris)
plt.savefig('ex1.pdf')
plt.figure()
width_plot = sns.barplot(x='sepal_width', y='species', data=iris)
plt.savefig('ex2.pdf')
I agree with a previous comment that importing matplotlib.pyplot is not the best software engineering practice, as it exposes the underlying library. I was creating and saving plots in a loop and needed to clear the figure each time; I found that this can now be done by importing seaborn only:
since version 0.11:
import seaborn as sns
import numpy as np
data = np.random.normal(size=100)
path = "/path/to/img/plot.png"
plot = sns.displot(data) # returns a FacetGrid; histplot() etc. return an Axes, so use plot.get_figure() there instead of plot.fig
plot.fig.savefig(path)
plot.fig.clf() # this clears the figure
# ... continue with next figure
alternative example with a loop:
import seaborn as sns
import numpy as np
for i in range(3):
    data = np.random.normal(size=100)
    path = "/path/to/img/plot2_{0:01d}.png".format(i)
    plot = sns.displot(data)
    plot.fig.savefig(path)
    plot.fig.clf() # this clears the figure
before version 0.11 (original post):
import seaborn as sns
import numpy as np
data = np.random.normal(size=100)
path = "/path/to/img/plot.png"
plot = sns.distplot(data)
plot.get_figure().savefig(path)
plot.get_figure().clf() # this clears the figure
# ... continue with next figure
Create specific figures and plot onto them:
import matplotlib.pyplot as plt
import seaborn as sns
iris = sns.load_dataset('iris')
length_fig, length_ax = plt.subplots()
sns.barplot(x='sepal_length', y='species', data=iris, ax=length_ax)
length_fig.savefig('ex1.pdf')
width_fig, width_ax = plt.subplots()
sns.barplot(x='sepal_width', y='species', data=iris, ax=width_ax)
width_fig.savefig('ex2.pdf')
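The same idea extends to a loop: create one figure per plot, save it, and close it so figures do not pile up in memory. This is only a sketch; it uses a synthetic DataFrame with iris-like column names, and the output directory is a temporary folder chosen for the example.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit when working interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for iris (an assumption, to avoid a download)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "species": np.repeat(["setosa", "versicolor", "virginica"], 20),
    "sepal_length": rng.normal(5.8, 0.8, size=60),
    "sepal_width": rng.normal(3.0, 0.4, size=60),
})

out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = []
for column in ["sepal_length", "sepal_width"]:
    fig, ax = plt.subplots()  # a fresh figure for every plot
    sns.barplot(x=column, y="species", data=df, ax=ax)
    path = os.path.join(out_dir, f"{column}.pdf")
    fig.savefig(path)
    plt.close(fig)  # release the figure once it is written
    saved.append(path)
```

Because each plot is drawn on its own axes, nothing from the first figure can leak into the second.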
I've found that if interaction is turned off, seaborn plots the heatmap normally.
I have a scatter plot matrix generated using the seaborn package, and I'd like to remove all the tick mark labels because they are just cluttering the graph (or, failing that, remove only those on the x-axis). I'm not sure how to do it and have had no success with Google searches. Any suggestions?
import matplotlib.pyplot as plt
import seaborn as sns
sns.pairplot(wheat[['area_planted',
                    'area_harvested',
                    'production',
                    'yield']])
plt.show()
import seaborn as sns
iris = sns.load_dataset("iris")
g = sns.pairplot(iris)
g.set(xticklabels=[])
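You can loop over the columns of the bottom row of axes and turn off visibility of the x-axis.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.randn(1000, 2)) * 1e6
plot = sns.pairplot(df)
# hide the x-axis (ticks, tick labels and axis label) of every plot in the bottom row
for col in range(len(df.columns)):
    plot.axes[len(df.columns) - 1][col].xaxis.set_visible(False)
plt.show()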
You could also rescale your data to something more readable:
df /= 1e6
sns.pairplot(df)
To remove the tick marks as well as their labels, the following is probably more appropriate:
import seaborn as sns
iris = sns.load_dataset("iris")
g = sns.pairplot(iris)
g.set(xticks=[])