Correlation values in pairplot() - python

Is there a way to show pair-correlation values with seaborn.pairplot(), as in the example below (created with ggpairs() in R)? I can make the plots using the attached code, but cannot add the correlations. Thanks
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, kind='scatter', diag_kind='kde')
# remove upper triangle plots
for i, j in zip(*np.triu_indices_from(g.axes, 1)):
    g.axes[i, j].set_visible(False)
plt.show()

If you use PairGrid instead of pairplot, then you can pass a custom function that would calculate the correlation coefficient and display it on the graph:
from scipy.stats import pearsonr
def reg_coef(x,y,label=None,color=None,**kwargs):
    ax = plt.gca()
    r,p = pearsonr(x,y)
    ax.annotate('r = {:.2f}'.format(r), xy=(0.5,0.5), xycoords='axes fraction', ha='center')
    ax.set_axis_off()
iris = sns.load_dataset("iris")
g = sns.PairGrid(iris)
g.map_diag(sns.histplot)  # sns.distplot is deprecated since seaborn 0.11
g.map_lower(sns.regplot)
g.map_upper(reg_coef)

Related

Plotting seaborn histplot bar_label with condition

I want to plot a seaborn histogram with labels to show the values of each bar. I only want to show the non-zero values, but I'm not sure how to do it. My MWE is
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
xlist = 900+200*np.random.randn(50,1)
fig, ax = plt.subplots()
y = sns.histplot(data=xlist, element="bars", bins=20, stat='count', legend=False)
y.set(xlabel='total time (ms)')
y.bar_label(y.containers[0])
## y.bar_label(y.containers[0][y.containers[0]!=0])
plt.show()
The graph looks like this (image not shown), and I want to remove all the 0 labels.
Update
A better version suggested by @BigBen:
labels = [str(v) if v else '' for v in y.containers[0].datavalues]
y.bar_label(y.containers[0], labels=labels)
Try:
labels = []
for p in y.patches:
    h = p.get_height()
    labels.append(str(h) if h else '')
y.bar_label(y.containers[0], labels=labels)

How can I add an R^2 value to the legend of a seaborn barplot?

I have a seaborn barplot and a regression line graphed on top of it that looks like this. As you can see, I have a legend that is automatically created with seaborn.barplot() and I am attempting to add the R^2 score with this:
g = sns.barplot(x='City/Town', y="Value", hue="Metric", data=df, ax=ax1)
h, l = g.get_legend_handles_labels()
g.legend(h + [lin_reg.score(X, Y)], l + ['R^2 score'], title="Legend")
It doesn't throw an error, in fact I know it's working because it changes the title to "Legend", but it also doesn't add the R^2.
The legend() function expects a list of handles (artists) as its first argument, and a number such as the R^2 score is not a valid handle, so it is dropped. You can read more on the help page for matplotlib legend.
One quick solution I can think of is to make a blank rectangle to act as the handle for the R^2 entry. Below is an example using the iris dataset:
import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from sklearn.linear_model import LinearRegression
df = sns.load_dataset("iris")
lin_reg = LinearRegression().fit(df[['petal_length']], df['sepal_length'])
r2 = lin_reg.score(df[['petal_length']], df['sepal_length'])
blank = Rectangle((0, 0), 1, 1, fc="w", fill=False, edgecolor='none', linewidth=0)
fig, ax = plt.subplots(figsize=(10,5))
sns.scatterplot(x='sepal_width', y="sepal_length", hue="species", data=df, ax=ax)
h, l = ax.get_legend_handles_labels()
ax.legend(h + [blank], l + [f'R^2 score = {r2:.3f}'], title="Legend")
I noticed you have a barplot with a few categories, so I am not sure how you can calculate R^2 from that. In any case, with the code above you should be able to add the R^2
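An alternative sketch, not taken from the answer above: instead of building a Rectangle, plot an empty line with the format string " " and give it the label. The data is synthetic, and R^2 is computed by hand with numpy rather than with sklearn so the example has fewer dependencies.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic data (assumption): a noisy straight line.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

# Least-squares line and its coefficient of determination.
slope, intercept = np.polyfit(x, y, 1)
pred = slope * x + intercept
r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)

fig, ax = plt.subplots()
ax.scatter(x, y, label="data")
ax.plot(x, pred, color="k", label="fit")
# Invisible artist: draws nothing, exists only to carry a legend label.
ax.plot([], [], " ", label=f"R^2 score = {r2:.3f}")
legend = ax.legend(title="Legend")
# Call plt.show() to display the figure.
```

Because the label is attached to a real (if invisible) artist, a plain ax.legend() call picks it up and there is no need to concatenate handle and label lists by hand.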

Add a normal distribution to seaborn 2D histogram

Is it possible to take a histogram from seaborn and add a normal distribution?
Say I had something like this scatter plot and histogram from the documentation.
import matplotlib.pyplot as plt
import seaborn as sns
penguins = sns.load_dataset("penguins")
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm");
plt.savefig('deletethis.png', bbox_inches='tight')
Can I superimpose a distribution on the sides like the image below?
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
x = np.random.normal(size=100000)
# Plot histogram in one-dimension
plt.hist(x,bins=80,density=True)
xvals = np.arange(-4,4,0.01)
plt.plot(xvals, norm.pdf(xvals),label='$N(0,1)$')
plt.legend();
The following adds a kernel density estimate to each marginal histogram, which shows the shape of the distribution (and lets you judge by eye whether it looks normal):
g = sns.JointGrid(data=penguins, x="bill_length_mm", y="bill_depth_mm")
g.plot_joint(sns.scatterplot, s=100, alpha=.5)
g.plot_marginals(sns.histplot, kde=True)
The following superimposes a normal distribution on the histograms in the axes.
import seaborn as sns
import numpy as np
import pandas as pd
from scipy.stats import norm
df1 = penguins.loc[:,["bill_length_mm", "bill_depth_mm"]]
axs = sns.jointplot(x="bill_length_mm", y="bill_depth_mm", data=df1)  # x and y must be keywords in recent seaborn
axs.ax_joint.scatter("bill_length_mm", "bill_depth_mm", data=df1, c='r', marker='x')
axs.ax_marg_x.cla()
axs.ax_marg_y.cla()
sns.distplot(df1.bill_length_mm, ax=axs.ax_marg_x, fit=norm)
sns.distplot(df1.bill_depth_mm, ax=axs.ax_marg_y, vertical=True, fit=norm)

How do I get the diagonal of sns.pairplot?

OK I am probably being thick, but how do I get just the graphs in the diagonal (top left to bottom right) in a nice row or 2x2 grid of:
import seaborn as sns; sns.set(style="ticks", color_codes=True)
iris = sns.load_dataset("iris")
g = sns.pairplot(iris, hue="species", palette="husl")
TO CLARIFY: I just want these graphs I do not care whether pairplot or something else is used.
Doing this the seaborn-way would make use of a FacetGrid. For this we would need to convert the wide-form input to a long-form dataframe, such that every observation is a single row. This is done via pandas.melt.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset("iris")
df = pd.melt(iris, iris.columns[-1], iris.columns[:-1])
g = sns.FacetGrid(df, col="variable", hue="species", col_wrap=2)
g.map(sns.kdeplot, "value", fill=True)  # fill= replaces the deprecated shade=
plt.show()
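To make the wide-to-long step concrete, here is a tiny sketch of what pandas.melt produces. The two-row frame is made up for illustration; the real iris frame behaves the same way with more rows and columns.

```python
import pandas as pd

# Hypothetical two-row excerpt in wide form: one column per measurement.
wide = pd.DataFrame({
    "sepal_length": [5.1, 7.0],
    "sepal_width": [3.5, 3.2],
    "species": ["setosa", "versicolor"],
})

# Long form: one row per (species, measurement name, measurement value).
long = pd.melt(wide, id_vars="species",
               value_vars=["sepal_length", "sepal_width"])
print(long)
```

The resulting "variable" column is what FacetGrid uses for col=, and "value" is what gets plotted in each facet.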
Why do you want to do that? The diagonal of the pairplot gives you the distribution plot of each feature. It will be more effective to plot the individual distplots as subplots, or to overlay them. For example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
import seaborn as sns
iris = load_iris()
iris = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                    columns=iris['feature_names'] + ['target'])
# Split the dataframe by target class
target_0 = iris.loc[iris['target'] == 0]
target_1 = iris.loc[iris['target'] == 1]
target_2 = iris.loc[iris['target'] == 2]
# Note: distplot is deprecated since seaborn 0.11
sns.distplot(target_0[['sepal length (cm)']], hist=False, rug=True)
sns.distplot(target_1[['sepal length (cm)']], hist=False, rug=True)
sns.distplot(target_2[['sepal length (cm)']], hist=False, rug=True)
plt.show()  # sns.plt was removed from seaborn; use matplotlib.pyplot directly
The output will look somewhat like this (image not shown).
Read more here: "python: distplot with multiple distributions"
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks", color_codes=True)
iris = sns.load_dataset("iris")
def hide_current_axis(*args, **kwds):
    plt.gca().set_visible(False)
g = sns.pairplot(iris, hue="species", palette="husl")
g.map_upper(hide_current_axis)
g.map_lower(hide_current_axis)
Output:
plt.subplots(2, 2)
for i, col in enumerate(iris.columns[:4]):
    plt.subplot(2, 2, i+1)
    # fill= replaces the deprecated shade= argument
    sns.kdeplot(iris.loc[iris['species'] == 'setosa', col], fill=True, label='setosa')
    sns.kdeplot(iris.loc[iris['species'] == 'versicolor', col], fill=True, label='versicolor')
    sns.kdeplot(iris.loc[iris['species'] == 'virginica', col], fill=True, label='virginica')
    plt.xlabel('cm')
    plt.title(col)
    if i == 1:
        plt.legend(loc='upper right')
    else:
        plt.legend().remove()
plt.subplot_tool() # Opens a widget which allows adjusting plot aesthetics
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species", corner=True)

How can I plot identity lines on a seaborn pairplot?

I'm using Seaborn's pairplot:
g = sns.pairplot(df)
Is it possible to draw identity lines on each of the scatter plots?
Define a function that plots the identity line on the current axes, and apply it to the off-diagonal axes of the grid using the PairGrid.map_offdiag() method.
For example:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
def plot_unity(xdata, ydata, **kwargs):
    mn = min(xdata.min(), ydata.min())
    mx = max(xdata.max(), ydata.max())
    points = np.linspace(mn, mx, 100)
    plt.gca().plot(points, points, color='k', marker=None,
                   linestyle='--', linewidth=1.0)
ds = sns.load_dataset('iris')
grid = sns.pairplot(ds)
grid.map_offdiag(plot_unity)
This makes the following plot on my setup. You can tweak the kwargs of the plot_unity function to style the plot however you want.
