How can I plot identity lines on a seaborn pairplot?

I'm using Seaborn's pairplot:
g = sns.pairplot(df)
Is it possible to draw identity lines on each of the scatter plots?

Define a function which will plot the identity line on the current axes, and apply it to the off-diagonal axes of the grid using PairGrid.map_offdiag() method.
For example:
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
def plot_unity(xdata, ydata, **kwargs):
    # kwargs supplied by seaborn are accepted but not used here
    mn = min(xdata.min(), ydata.min())
    mx = max(xdata.max(), ydata.max())
    points = np.linspace(mn, mx, 100)
    plt.gca().plot(points, points, color='k', marker=None,
                   linestyle='--', linewidth=1.0)
ds = sns.load_dataset('iris')
grid = sns.pairplot(ds)
grid.map_offdiag(plot_unity)
This produces the following plot on my setup. To restyle the line, change the arguments passed to plot() inside plot_unity (color, linestyle, linewidth, and so on).
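As a rough sketch of the same idea outside seaborn, the helper below draws y = x on any Matplotlib axes and lets the caller override the line style. The name add_identity_line and the toy data are assumptions made for illustration, not part of the answer above; the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def add_identity_line(ax, **line_kwargs):
    """Draw y = x across the combined x/y range of the given axes."""
    lo = min(ax.get_xlim()[0], ax.get_ylim()[0])
    hi = max(ax.get_xlim()[1], ax.get_ylim()[1])
    # Defaults mirror plot_unity; caller-supplied kwargs take precedence
    style = {"color": "k", "linestyle": "--", "linewidth": 1.0}
    style.update(line_kwargs)
    (line,) = ax.plot([lo, hi], [lo, hi], **style)
    # Use the same limits on both axes so the line runs corner to corner
    ax.set_xlim(lo, hi)
    ax.set_ylim(lo, hi)
    return line


rng = np.random.default_rng(0)
x = rng.normal(size=50)
y = x + rng.normal(scale=0.3, size=50)

fig, ax = plt.subplots()
ax.scatter(x, y)
line = add_identity_line(ax, color="red")
```

Because the limits are read before the line is drawn, the helper should be called after the scatter points have been added.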

Related

How to plot a density bar next to my density scatter plot? [duplicate]

I'm working with data that has 3 plotting parameters: x, y, c. How do you create a custom color value for a scatter plot?
Extending this example I'm trying to do:
import matplotlib
import matplotlib.pyplot as plt
cm = matplotlib.cm.get_cmap('RdYlBu')
colors=[cm(1.*i/20) for i in range(20)]
xy = range(20)
plt.subplot(111)
colorlist=[colors[x/2] for x in xy] #actually some other non-linear relationship
plt.scatter(xy, xy, c=colorlist, s=35, vmin=0, vmax=20)
plt.colorbar()
plt.show()
but the result is TypeError: You must first set_array for mappable
From the matplotlib docs on scatter:
cmap is only used if c is an array of floats
So colorlist needs to be a list of floats rather than a list of tuples as you have it now.
plt.colorbar() wants a mappable object, like the PathCollection that plt.scatter() returns in current Matplotlib versions.
vmin and vmax can then control the limits of your colorbar. Things outside vmin/vmax get the colors of the endpoints.
How does this work for you?
import matplotlib.pyplot as plt
cm = plt.get_cmap('RdYlBu')  # plt.cm.get_cmap was removed in newer Matplotlib releases
xy = range(20)
z = xy
sc = plt.scatter(xy, xy, c=z, vmin=0, vmax=20, s=35, cmap=cm)
plt.colorbar(sc)
plt.show()
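To illustrate the point about vmin and vmax, here is a small sketch of what happens to values outside that range. It assumes Matplotlib's default behavior, where a colormap's "under" and "over" colors fall back to its endpoint colors; the sample values -5, 10 and 50 are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

cmap = plt.get_cmap("RdYlBu")
norm = Normalize(vmin=0, vmax=20)  # same limits as the scatter call above

low_end = cmap(0.0)   # RGBA color at the bottom of the colormap
high_end = cmap(1.0)  # RGBA color at the top of the colormap

below = cmap(norm(-5))   # value under vmin
above = cmap(norm(50))   # value over vmax
middle = cmap(norm(10))  # value inside the range

print(below == low_end, above == high_end, middle == low_end)
```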
Here is the OOP way of adding a colorbar:
fig, ax = plt.subplots()
im = ax.scatter(x, y, c=c)
fig.colorbar(im, ax=ax)
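The snippet above leaves x, y and c undefined. A self-contained version might look like the following; the random data, the colormap choice and the colorbar label are placeholders chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)
y = rng.uniform(0, 1, 30)
c = x + y  # third variable, mapped to color

fig, ax = plt.subplots()
im = ax.scatter(x, y, c=c, cmap="RdYlBu", s=35)
cbar = fig.colorbar(im, ax=ax)  # colorbar is attached to this axes
cbar.set_label("x + y")
```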
If you're looking to scatter by two variables and color by the third, Altair can be a great choice.
Creating the dataset
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
df = pd.DataFrame(40*np.random.randn(10, 3), columns=['A', 'B','C'])
Altair plot
import altair as alt
alt.Chart(df).mark_circle().encode(x='A', y='B', color='C').properties(width=200, height=150)

Correlation values in pairplot()

Is there a way to show pair-correlation values with seaborn.pairplot(), as in the example below (created with ggpairs() in R)? I can make the plots using the attached code, but cannot add the correlations. Thanks
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
g = sns.pairplot(iris, kind='scatter', diag_kind='kde')
# remove upper triangle plots
for i, j in zip(*np.triu_indices_from(g.axes, 1)):
    g.axes[i, j].set_visible(False)
plt.show()
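For reference, the loop works because np.triu_indices_from(g.axes, 1) returns the row and column indices strictly above the diagonal. A quick check on a stand-in 4x4 array (iris has four numeric columns, so g.axes has that shape) lists the panels that get hidden:

```python
import numpy as np

grid_shape = np.empty((4, 4))  # stand-in for g.axes; only the shape matters
rows, cols = np.triu_indices_from(grid_shape, 1)  # k=1 skips the diagonal
hidden = [(int(i), int(j)) for i, j in zip(rows, cols)]
print(hidden)
```

Recent seaborn versions also accept pairplot(..., corner=True), which gives a similar lower-triangle layout without the loop.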
If you use PairGrid instead of pairplot, then you can pass a custom function that would calculate the correlation coefficient and display it on the graph:
from scipy.stats import pearsonr
def reg_coef(x, y, label=None, color=None, **kwargs):
    ax = plt.gca()
    r, p = pearsonr(x, y)
    ax.annotate('r = {:.2f}'.format(r), xy=(0.5, 0.5), xycoords='axes fraction', ha='center')
    ax.set_axis_off()
iris = sns.load_dataset("iris")
g = sns.PairGrid(iris)
g.map_diag(sns.histplot)  # distplot is deprecated in current seaborn
g.map_lower(sns.regplot)
g.map_upper(reg_coef)
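The number that reg_coef writes into each upper panel is the Pearson correlation coefficient. As a sketch of the same annotation on a single Matplotlib axes, the example below computes r with np.corrcoef instead of scipy.stats.pearsonr (a substitution made here to keep the example light on dependencies; corrcoef gives the same r but no p-value). The helper name annotate_r and the toy data are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def annotate_r(ax, x, y):
    """Write the Pearson r of x and y in the middle of ax and hide its frame."""
    r = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
    text = "r = {:.2f}".format(r)
    ax.annotate(text, xy=(0.5, 0.5), xycoords="axes fraction", ha="center")
    ax.set_axis_off()
    return text


x = np.arange(10.0)
fig, (ax_pos, ax_neg) = plt.subplots(ncols=2)
label_pos = annotate_r(ax_pos, x, 2 * x + 1)  # perfectly increasing relation
label_neg = annotate_r(ax_neg, x, -x)         # perfectly decreasing relation
```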

How to add a standard normal pdf over a seaborn histogram

I would like to add a standard normal pdf curve over a histogram built with seaborn.
import numpy as np
import seaborn as sns
x = np.random.standard_normal(1000)
sns.distplot(x, kde = False)
Any help would be appreciated!
scipy.stats.norm gives easy access to the pdf of a normal distribution with
known parameters; by default it corresponds to the standard normal, mu=0, sigma=1.
This answer works regardless of where the data mean is located (e.g. mu=0 or mu=10)
Tested in python 3.8.11, matplotlib 3.4.2, seaborn 0.11.2
This question and answer are for axes-level plots; for figure-level plots, see How to draw a normal curve on seaborn displot
Imports and Data
import numpy as np
import seaborn as sns
from scipy import stats
import matplotlib.pyplot as plt
# data
np.random.seed(365)
x = np.random.standard_normal(1000)
seaborn.histplot
ax = sns.histplot(x, kde=False, stat='density', label='samples')
# calculate the pdf
x0, x1 = ax.get_xlim() # extract the endpoints for the x-axis
x_pdf = np.linspace(x0, x1, 100)
y_pdf = stats.norm.pdf(x_pdf)  # stats comes from "from scipy import stats"
ax.plot(x_pdf, y_pdf, 'r', lw=2, label='pdf')
ax.legend()
seaborn.distplot - deprecated
For this to correspond correctly to your sampled data, the histogram should
display densities and not counts, so use norm_hist=True in the seaborn.distplot call.
ax = sns.distplot(x, kde = False, norm_hist=True, hist_kws={'ec': 'k'}, label='samples')
# calculate the pdf
x0, x1 = ax.get_xlim() # extract the endpoints for the x-axis
x_pdf = np.linspace(x0, x1, 100)
y_pdf = stats.norm.pdf(x_pdf)  # stats comes from "from scipy import stats"
ax.plot(x_pdf, y_pdf, 'r', lw=2, label='pdf')
ax.legend()
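The reason the histogram has to show densities is that a pdf integrates to 1, so only a histogram whose bar areas sum to 1 is on the same scale. The sketch below checks this with NumPy alone. normal_pdf is a hypothetical helper that writes out the Gaussian density formula rather than calling scipy.stats.norm.pdf, and its mu and sigma parameters are an assumption added here for data that is not standard normal.

```python
import numpy as np


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density; the defaults give the standard normal."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    return np.exp(-0.5 * z**2) / (sigma * np.sqrt(2.0 * np.pi))


rng = np.random.default_rng(365)
samples = rng.standard_normal(1000)

# density=True rescales the bars so that their total area is 1
density, edges = np.histogram(samples, bins=30, density=True)
area = np.sum(density * np.diff(edges))

# Raw counts sum to the sample size instead, so they dwarf the pdf
counts, _ = np.histogram(samples, bins=30)

peak = normal_pdf(0.0)  # height of the standard normal pdf at its mode
print(area, counts.sum(), peak)
```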

Change color of seaborn distribution line

I want to specify the color of a line of fit within the seaborn package for an array of x and y data. Instead all I can figure out is how to change the color and shading for the kernel density function. How can I change the color for a gaussian fit? I.e. the lines below should be red and blue. It would also be great to shade in the function like the "shade":True argument.
import seaborn as sns
sns.distplot(x,kde_kws={"shade":True}, kde=False, fit=stats.gamma, hist=None, color="red", label="label 1");
sns.distplot(y,kde_kws={"shade":True}, kde=False, fit=stats.gamma, hist=None, color="blue", label="label 2");
To change the color of the fitted curve, set the fit_kws argument. Note that fit_kws does not support shading. You can still shade the area under the fitted curve with a few extra lines of code, but I think that is the answer to another question you have posted.
import numpy as np
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
sns.set()
np.random.seed(0)
x = np.random.randn(100)
y = np.random.normal(loc=6.0, scale=1, size=(50,))
ax = sns.distplot(x, fit_kws={"color":"red"}, kde=False,
                  fit=stats.gamma, hist=None, label="label 1");
ax = sns.distplot(y, fit_kws={"color":"blue"}, kde=False,
                  fit=stats.gamma, hist=None, label="label 2");
plt.show(block=False)
The result of the code is shown below:
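For the shading part of the question, one possible approach (a sketch, not something distplot does for you) is to fit the distribution yourself with scipy.stats.gamma.fit and fill under the resulting pdf with Matplotlib's fill_between. The gamma-distributed sample data, the grid size and the alpha value are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.gamma(shape=2.0, scale=1.5, size=200)  # made-up sample data

# Fit a gamma distribution; fit() returns (shape, loc, scale)
params = stats.gamma.fit(x)

# Evaluate the fitted pdf over the range of the data
grid = np.linspace(x.min(), x.max(), 200)
pdf = stats.gamma.pdf(grid, *params)

fig, ax = plt.subplots()
ax.plot(grid, pdf, color="red", label="label 1")    # fitted curve
ax.fill_between(grid, pdf, color="red", alpha=0.3)  # shaded area down to y = 0
ax.legend()
```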

Histogram at specific coordinates inside axes

What I want to achieve with Python 3.6 is something like this:
Obviously made in paint and missing some ticks on the xAxis. Is something like this possible? Essentially, can I control exactly where to plot a histogram (and with what orientation)?
I specifically want them to be on the same axes just like the figure above and not on separate axes or subplots.
fig = plt.figure()
ax2Handler = fig.gca()
ax2Handler.scatter(np.array(np.arange(0,len(xData),1)), xData)
ax2Handler.hist(xData,bins=60,orientation='horizontal',normed=True)
This and other approaches (such as inverting the axes) did not give the result I wanted. xData is loaded from a pandas DataFrame.
# This also doesn't work as intended
fig = plt.figure()
axHistHandler = fig.gca()
axScatterHandler = fig.gca()
axHistHandler.invert_xaxis()
axHistHandler.hist(xData,orientation='horizontal')
axScatterHandler.scatter(np.array(np.arange(0,len(xData),1)), xData)
A. using two axes
There is simply no reason not to use two different axes. The plot from the question can easily be reproduced with two different axes:
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
xData = np.random.rand(1000)
fig,(ax,ax2)= plt.subplots(ncols=2, sharey=True)
fig.subplots_adjust(wspace=0)
ax2.scatter(np.linspace(0,1,len(xData)), xData, s=9)
ax.hist(xData, bins=60, orientation='horizontal', density=True)  # density replaces the removed normed argument
ax.invert_xaxis()
ax.spines['right'].set_visible(False)
ax2.spines['left'].set_visible(False)
ax2.tick_params(axis="y", left=0)
plt.show()
B. using a single axes
Just for the sake of answering the question: In order to plot both in the same axes, one can shift the bars by their length towards the left, effectively giving a mirrored histogram.
import numpy as np
import matplotlib.pyplot as plt
plt.style.use("ggplot")
xData = np.random.rand(1000)
fig,ax= plt.subplots(ncols=1)
fig.subplots_adjust(wspace=0)
ax.scatter(np.linspace(0,1,len(xData)), xData, s=9)
xlim1 = ax.get_xlim()
_, __, bars = ax.hist(xData, bins=60, orientation='horizontal', density=True)  # density replaces the removed normed argument
for bar in bars:
    bar.set_x(-bar.get_width())  # shift each bar left by its own length
xlim2 = ax.get_xlim()
ax.set_xlim(-xlim2[1],xlim1[1])
plt.show()
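An alternative sketch for the single-axes case, instead of moving the patches after the fact: compute the histogram with np.histogram and draw it with ax.barh, passing left=-density so that each bar ends at x = 0 and extends to the left. This is a different technique from the code above, shown under the assumption that the scatter's x-values start at 0; the variable names are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
xData = rng.random(1000)

fig, ax = plt.subplots()
ax.scatter(np.linspace(0, 1, len(xData)), xData, s=9)

# Bin the data ourselves so the bar positions are fully under our control
density, edges = np.histogram(xData, bins=60, density=True)
centers = 0.5 * (edges[:-1] + edges[1:])  # vertical position of each bar
heights = np.diff(edges)                  # bar thickness equals the bin width

# left=-density puts each bar's right edge at x = 0
bars = ax.barh(centers, density, height=heights, left=-density)
```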
You might be interested in seaborn jointplots:
# Import and fake data
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(2,1000)
# actual plot
jg = sns.jointplot(x=data[0], y=data[1], marginal_kws={"bins":100})  # newer seaborn requires x= and y= keywords
jg.ax_marg_x.set_visible(False) # remove the top axis
plt.subplots_adjust(top=1.15) # fill the empty space
produces this:
See more examples of bivariate distribution representations, available in Seaborn.
