I'm trying to use Seaborn's PairGrid to make a correlogram with scatterplots in one half, histograms on the diagonal and the Pearson coefficient in the other half. I've managed to put together the following code, which does what I need, but I'm really struggling with further customization:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import pearsonr
df = sns.load_dataset('iris')
def reg_coef(x, y, label=None, color=None, **kwargs):
    ax = plt.gca()
    r, p = pearsonr(x, y)
    ax.annotate('{:.2f}'.format(r), xy=(0.5, 0.5), xycoords='axes fraction', ha='center', fontsize=30,
                bbox={'facecolor': 'red', 'alpha': 0.5, 'pad': 20})
    ax.set_axis_off()
sns.set(font_scale=1.5)
sns.set_style("white")
g = sns.PairGrid(df)
g.map_diag(plt.hist)
g.map_lower(plt.scatter)
g.map_upper(reg_coef)
g.fig.subplots_adjust(top=0.9)
g.fig.suptitle('Iris Correlogram', fontsize=30)
plt.show()
This is the result
What I'd like to do:
Change the font used for the whole plot and apply my own RGB colour to both the font and the axes (the same colour for both)
Remove the X & Y tick labels
Change the colour of the scatter dots and histogram bars to my own RGB colour (again, one colour for both)
Set a diverging colour map, built from my own RGB colours, for the background of the Pearson number to highlight the strength and sign of the correlation.
I know I'm asking a lot, but I've spent hours going round in circles trying to figure this out!
The color can be set as an extra parameter: g.map_diag(plt.hist, color=...) and g.map_lower(plt.scatter, color=...). The function reg_coef can be modified to take a colormap into account.
The font color and family can be set via the rcParams. The ticks can be removed via plt.setp(g.axes, xticks=[], yticks=[]). Instead of subplots_adjust, g.fig.tight_layout() usually fits all elements nicely into the figure. Here is an example:
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
def reg_coef(x, y, label=None, color=None, cmap=None, **kwargs):
    ax = plt.gca()
    r, p = pearsonr(x, y)
    norm = plt.Normalize(-1, 1)  # Pearson r always lies in [-1, 1]
    cmap = cmap if cmap is not None else plt.cm.coolwarm
    ax.annotate(f'{r:.2f}', xy=(0.5, 0.5), xycoords='axes fraction', ha='center', fontsize=30,
                bbox={'facecolor': cmap(norm(r)), 'alpha': 0.5, 'pad': 20})
    ax.set_axis_off()
df = sns.load_dataset('iris')
sns.set(font_scale=1.5)
sns.set_style("white")
# set these after sns.set()/sns.set_style(), which would otherwise overwrite them
for param in ['text.color', 'axes.labelcolor', 'xtick.color', 'ytick.color']:
    plt.rcParams[param] = 'cornflowerblue'
plt.rcParams['font.family'] = 'cursive'
g = sns.PairGrid(df, height=2)
g.map_diag(plt.hist, color='turquoise')
g.map_lower(plt.scatter, color='fuchsia')
g.map_upper(reg_coef, cmap=plt.get_cmap('PiYG'))
plt.setp(g.axes, xticks=[], yticks=[])
g.fig.suptitle('Iris Correlogram', fontsize=30)
g.fig.tight_layout()
plt.show()
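The answer above uses named colours, while the question asked for self-defined RGB colours, including for the diverging background. A minimal sketch of that part, assuming the colours are given as 0-255 RGB triples; every colour value and every name below (TEXT_AND_AXES_RGB, MARKER_RGB, to_unit_rgb, corr_cmap, ...) is a placeholder of mine, not something from the original post. It also sets axes.edgecolor, which the answer's loop leaves out, so the axes frame gets the same colour as the text:

```python
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap, Normalize

# Placeholder colours as 0-255 RGB triples; substitute your own values.
TEXT_AND_AXES_RGB = (40, 55, 100)
MARKER_RGB = (0, 128, 128)
NEGATIVE_RGB = (178, 24, 43)
NEUTRAL_RGB = (247, 247, 247)
POSITIVE_RGB = (33, 102, 172)


def to_unit_rgb(rgb255):
    """Matplotlib expects RGB components in [0, 1], so divide 0-255 values by 255."""
    return tuple(c / 255 for c in rgb255)


# One colour for text, axis labels, tick labels and the axes frame.
# Run this after sns.set()/sns.set_style(), which would otherwise overwrite these rcParams.
for param in ['text.color', 'axes.labelcolor', 'axes.edgecolor', 'xtick.color', 'ytick.color']:
    plt.rcParams[param] = to_unit_rgb(TEXT_AND_AXES_RGB)

# Single colour for the scatter dots and the histogram bars.
marker_color = to_unit_rgb(MARKER_RGB)

# Diverging colormap: NEGATIVE_RGB at r = -1, NEUTRAL_RGB at r = 0, POSITIVE_RGB at r = +1.
corr_cmap = LinearSegmentedColormap.from_list(
    'corr_cmap',
    [to_unit_rgb(NEGATIVE_RGB), to_unit_rgb(NEUTRAL_RGB), to_unit_rgb(POSITIVE_RGB)],
)
corr_norm = Normalize(vmin=-1, vmax=1)  # Pearson r always lies in [-1, 1]
```

With these defined, the calls in the answer would become g.map_diag(plt.hist, color=marker_color), g.map_lower(plt.scatter, color=marker_color) and g.map_upper(reg_coef, cmap=corr_cmap). LinearSegmentedColormap.from_list interpolates linearly between the three stops, so r = 0 lands on (approximately) the neutral colour.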
I want to make boxplots with hues, but I want to color-code them so that each x category has its own color, with the hue levels shown as lighter shades of that color. I am able to do a boxplot without a hue. When I incorporate the hue, I get the second boxplot, which loses the colors. Can someone help me customize the colors for the figure that contains the hue?
Essentially, it's what the answer to this question does, but with boxplots.
This is my code:
First boxplot (no hue):
import matplotlib.pyplot as plt
import seaborn as sns
# df_idrs is my DataFrame (not shown here)
order = ['Ash1', 'E1A', 'FUS', 'p53']
colors = ['gold', 'teal', 'darkorange', 'royalblue']
color_dict = dict(zip(order, colors))
fig, ax = plt.subplots(figsize=(25, 15))
bp = sns.boxplot(data=df_idrs, x=df_idrs["construct"], y=df_idrs['Norm_Ef_IDR/Ef_GS'], ax=ax, palette=color_dict)
sns.stripplot(ax=ax, y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs, palette=color_dict,
              jitter=1, marker='o', alpha=0.4, edgecolor='black', linewidth=1, dodge=True)
ax.axhline(y=1, linestyle="--", color='black', linewidth=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
Second boxplot (with hue):
order = ['Ash1', 'E1A', 'FUS', 'p53']
colors = ['gold', 'teal', 'darkorange', 'royalblue']
color_dict = dict(zip(order, colors))
fig, ax = plt.subplots(figsize=(25, 15))
bp = sns.boxplot(data=df_idrs, x=df_idrs["construct"], y=df_idrs['Norm_Ef_IDR/Ef_GS'], ax=ax, hue=df_idrs["location"])
sns.stripplot(y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs, hue=df_idrs["location"],
              jitter=1, marker='o', alpha=0.4, edgecolor='black', linewidth=1, dodge=True)
ax.axhline(y=1, linestyle="--", color='black', linewidth=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
The only change is that palette was replaced by hue. I have seen many examples on here, but I am unable to get them to work. Using the second code, I have tried the following:
Nothing happens with this one:
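import numpy as np
from matplotlib.collections import PolyCollection
from matplotlib.colors import to_rgb
for ind, bp in enumerate(ax.findobj(PolyCollection)):
    rgb = to_rgb(colors[ind // 2])
    if ind % 2 != 0:
        rgb = 0.5 + 0.5 * np.array(rgb)  # make whiter
    bp.set_facecolor(rgb)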
I get an "index out of range" error with this one:
for i in range(0, 4):
    mybox = bp.artists[i]
    mybox.set_facecolor(color_dict[order[i]])
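Matplotlib stores the boxes in ax.patches, but there are also 2 dummy patches (used to construct the legend) that need to be filtered out. In Matplotlib 3.5 and later, the boxes are PathPatch objects: they don't show up in ax.artists, and they are not PolyCollections, which is why neither attempt in the question works. The dots of the stripplot are stored in ax.collections. There are also 2 dummy collections for the legend, but as those come at the end, they don't cause a problem.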
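A small Matplotlib-only check of the points above (no seaborn involved, synthetic data): with patch_artist=True the boxes are PathPatch objects in ax.patches, an invisible zero-size Rectangle stands in for a legend dummy patch, and neither findobj(PolyCollection) nor ax.artists sees the boxes. The ax.artists result assumes Matplotlib 3.5 or later; older versions put such boxes in ax.artists.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection
from matplotlib.patches import PathPatch, Rectangle

rng = np.random.default_rng(0)
data = [rng.normal(size=50) for _ in range(3)]

fig, ax = plt.subplots()
# patch_artist=True makes each box a filled PathPatch instead of plain lines.
ax.boxplot(data, patch_artist=True)
# Stand-in for a legend proxy: an invisible, zero-size Rectangle added to the Axes.
ax.add_patch(Rectangle((0, 0), 0, 0, visible=False))

# Keep only the real boxes: the dummy Rectangle is a Patch but not a PathPatch.
boxes = [p for p in ax.patches if isinstance(p, PathPatch)]
n_patches = len(ax.patches)               # 3 boxes + 1 dummy
n_boxes = len(boxes)                      # only the 3 boxes
n_poly = len(ax.findobj(PolyCollection))  # boxplots contain no PolyCollection
n_artists = len(ax.artists)               # Patches are excluded from ax.artists in Matplotlib >= 3.5
```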
Some remarks:
sns.boxplot returns the subplot (ax) on which it was drawn; as it is called with ax=ax, it returns that same ax.
Setting jitter=1 in the stripplot smears the dots over a width of 1. 1 is the distance between the x positions, and with two hue levels each box is only 0.4 wide. To avoid clutter, the code below uses jitter=0.4.
Here is some example code starting from dummy test data:
from matplotlib import pyplot as plt
from matplotlib.legend_handler import HandlerTuple
from matplotlib.patches import PathPatch
from matplotlib.colors import to_rgb
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(20230215)
order = ['Ash1', 'E1A', 'FUS', 'p53']
colors = ['gold', 'teal', 'darkorange', 'royalblue']
hue_order = ['A', 'B']
df_idrs = pd.DataFrame({'construct': np.repeat(order, 200),
                        'Norm_Ef_IDR/Ef_GS': (np.random.normal(0.03, 1, 800).cumsum() + 10) / 15,
                        'location': np.tile(np.repeat(hue_order, 100), 4)})
fig, ax = plt.subplots(figsize=(12, 5))
sns.boxplot(data=df_idrs, x=df_idrs['construct'], y=df_idrs['Norm_Ef_IDR/Ef_GS'], hue='location',
            order=order, hue_order=hue_order, ax=ax)
box_colors = [f + (1 - f) * np.array(to_rgb(c))  # whiten colors depending on hue
              for c in colors for f in np.linspace(0, 0.5, len(hue_order))]
box_patches = [p for p in ax.patches if isinstance(p, PathPatch)]
for patch, color in zip(box_patches, box_colors):
    patch.set_facecolor(color)
sns.stripplot(y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs, hue=df_idrs['location'],
              jitter=0.4, marker='o', alpha=0.4, edgecolor='black', linewidth=1, dodge=True, ax=ax)
for collection, color in zip(ax.collections, box_colors):
    collection.set_facecolor(color)
ax.axhline(y=1, linestyle='--', color='black', linewidth=2)
handles = [tuple(box_patches[i::len(hue_order)]) for i in range(len(hue_order))]
ax.legend(handles=handles, labels=hue_order, title='hue category',
          handlelength=4, handler_map={tuple: HandlerTuple(ndivide=None, pad=0)},
          loc='upper left', bbox_to_anchor=(1.01, 1))
plt.tight_layout()
plt.show()
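The whitening in box_colors is a linear blend towards white: f + (1 - f) * rgb equals (1 - f) * rgb + f * (1, 1, 1). Pulled out into two small helpers (the names whiten and shades_per_box are mine, purely for illustration), it looks like this. Like the answer's code, it assumes the boxes come out of ax.patches in x-category-major order (all hue levels of 'Ash1', then of 'E1A', ...), which is worth checking on your seaborn version:

```python
import numpy as np
from matplotlib.colors import to_rgb


def whiten(color, f):
    """Blend `color` a fraction f of the way towards white: f=0 keeps it, f=1 gives white."""
    rgb = np.array(to_rgb(color))
    return f + (1 - f) * rgb  # same as (1 - f) * rgb + f * (1, 1, 1)


def shades_per_box(colors, n_hues, max_whitening=0.5):
    """One RGB array per box, ordered x category first, then hue level (like box_colors)."""
    return [whiten(c, f) for c in colors for f in np.linspace(0, max_whitening, n_hues)]


shades = shades_per_box(['gold', 'teal', 'darkorange', 'royalblue'], n_hues=2)
```

Raising max_whitening makes the lighter hue levels paler; with more than two hue levels the blend factors are spread evenly between 0 and max_whitening.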
This is my plot:
I would like the coloring to be centered at 0 within the plot. While I managed to have the legend centered at 0, this does not apply to the dots in the plot (i.e. I would expect them to be gray at the zero value).
This is my code which generates the plots:
import matplotlib.colors as mcolors
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import seaborn as sns
def plot_jitter(df):
    plot = sns.stripplot(x='category', y='overall_margin', hue='overall_margin', data=df,
                         palette='coolwarm_r',
                         jitter=True, edgecolor='none', alpha=.60)
    plot.get_legend().set_visible(False)
    sns.despine()
    plt.axhline(0, 0, 1, color='grey').set_linestyle("--")
    # Draw the side color bar
    normalize = mcolors.TwoSlopeNorm(vcenter=0, vmin=df['overall_margin'].min(), vmax=df['overall_margin'].max())
    colormap = cm.coolwarm_r
    [plt.plot(color=colormap(normalize(x))) for x in df['overall_margin']]
    scalarmappaple = cm.ScalarMappable(norm=normalize, cmap=colormap)
    scalarmappaple.set_array(df['overall_margin'])
    plt.colorbar(scalarmappaple, ax=plot)
By using sns.scatterplot instead of sns.stripplot, you can pass the c, norm and cmap parameters (extra keyword arguments are forwarded to matplotlib's scatter), like so:
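import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import matplotlib.cm as cm
import seaborn as sns
# Load demo data, scale `total_bill` to be in the range [0, 1]
tips = sns.load_dataset("tips")
tips["total_bill"] = tips["total_bill"].div(100)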
Building the plot:
fig, ax = plt.subplots()
# Get/set params for the colour mapping
vcenter = 0.15
vmin, vmax = tips["total_bill"].min(), tips["total_bill"].max()
normalize = mcolors.TwoSlopeNorm(vcenter=vcenter, vmin=vmin, vmax=vmax)
colormap = cm.coolwarm_r
# plot with:
# - `c`: array of floats for colour mapping
# - `cmap`: the colourmap you want
# - `norm`: to scale the data from `c`
sns.scatterplot(
    x="day",
    y="total_bill",
    data=tips,
    c=tips["total_bill"],
    norm=normalize,
    cmap=colormap,
    ax=ax,
)
ax.axhline(vcenter, color="grey", ls="--")
# Tweak the points to mimic `sns.stripplot`
pts = ax.collections[0]
pts.set_offsets(pts.get_offsets() + np.c_[np.random.uniform(-.1, .1, len(tips)), np.zeros(len(tips))])
ax.margins(x=0.15)
scalarmappaple = cm.ScalarMappable(norm=normalize, cmap=colormap)
scalarmappaple.set_array(tips["total_bill"])
fig.colorbar(scalarmappaple, ax=ax)
Which produces:
The code to mimic stripplot is from seaborn's GitHub issues.
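Why TwoSlopeNorm puts zero in the middle of the colormap can be checked directly. In the question, stripplot's hue is (as far as I know, in the seaborn versions these answers target) treated as categorical, so the palette is spread over the sorted unique values rather than centred on 0. A small sketch with made-up, lopsided values, comparing a plain linear Normalize with TwoSlopeNorm over the same range:

```python
import numpy as np
import matplotlib.colors as mcolors
import matplotlib.pyplot as plt

# Made-up, lopsided data: the positive side reaches much further than the negative side.
values = np.array([-2.0, -1.0, 0.0, 4.0, 8.0])

linear = mcolors.Normalize(vmin=values.min(), vmax=values.max())
centered = mcolors.TwoSlopeNorm(vmin=values.min(), vcenter=0.0, vmax=values.max())

# Position of each value along the colormap, in [0, 1].
linear_pos = linear(values)      # 0 lands at 0.2, already on the red side of coolwarm_r
centered_pos = centered(values)  # 0 lands exactly at 0.5, the colormap's neutral middle

cmap = plt.get_cmap('coolwarm_r')
zero_color = cmap(centered(0.0))  # the light grey middle colour of coolwarm_r
```

TwoSlopeNorm stretches each side independently (vmin to vcenter onto [0, 0.5], vcenter to vmax onto [0.5, 1]), so -1 maps to 0.25 and 4 to 0.75 here even though the two sides have different lengths.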
A spike map (as shown in the image below, implemented with D3.js) is a method for displaying differences in the magnitude of a certain discrete, abruptly changing phenomenon such as counts of people.
Is there a package I could use (or example code I could follow) to create a static spike map, similar to the map shown above, in Python (e.g. with Matplotlib)?
You could try a ridge plot. It's not exactly the same, but it may work for you. An implementation in seaborn looks like this:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})
# Create the data
rs = np.random.RandomState(1979)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m
# Initialize the FacetGrid object
pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)
# Draw the densities in a few steps
g.map(sns.kdeplot, "x",
bw_adjust=.5, clip_on=False,
fill=True, alpha=1, linewidth=1.5)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw_adjust=.5)
g.map(plt.axhline, y=0, lw=2, clip_on=False)
# Define and use a simple function to label the plot in axes coordinates
def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")
# Set the subplots to overlap
g.fig.subplots_adjust(hspace=-.25)
# Remove axes details that don't play well with overlap
g.set_titles("")
g.set(yticks=[])
g.despine(bottom=True, left=True)
plt.show()
This creates the following graph:
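If an actual spike map is wanted rather than a ridge plot, one option is to draw the spikes yourself with Matplotlib: one thin triangle per location, with height proportional to the value. The following is a minimal sketch under that assumption; the coordinates and counts are random placeholders, spike_polygons is a hypothetical helper, and no basemap or projection is drawn (in practice you would first plot the boundaries on the same ax, e.g. with GeoPandas, and use projected x/y coordinates):

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import PolyCollection


def spike_polygons(x, y, values, max_height=1.0, half_width=0.05):
    """One triangle per point: base centred on (x, y), apex straight above it.

    Heights are proportional to `values`, scaled so the largest value reaches max_height.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    values = np.asarray(values, dtype=float)
    heights = max_height * values / values.max()
    return [
        [(xi - half_width, yi), (xi, yi + hi), (xi + half_width, yi)]
        for xi, yi, hi in zip(x, y, heights)
    ]


# Placeholder projected coordinates and counts (stand-ins for real locations and data).
rng = np.random.default_rng(0)
xs = rng.uniform(0, 10, 40)
ys = rng.uniform(0, 6, 40)
counts = rng.integers(1, 1000, 40)

# Draw spikes further up first, so that lower spikes overlap the ones behind them.
order = np.argsort(-ys)

fig, ax = plt.subplots(figsize=(8, 5))
spikes = PolyCollection(
    spike_polygons(xs[order], ys[order], counts[order], max_height=2.0, half_width=0.08),
    facecolor='red', edgecolor='darkred', alpha=0.5, linewidth=0.6,
)
ax.add_collection(spikes)
ax.autoscale_view()
ax.set_aspect('equal')
ax.set_axis_off()
```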
I am trying to make an upper-triangle correlation matrix, which I would ideally like to superimpose on another picture of a lower-triangle matrix. Therefore, I would like the mask color to be set to none or transparent (if it's white, I will not be able to superimpose them). Any idea how to do this in seaborn?
EDIT
Here is what I would like to do: using a set of columns from a DataFrame, I would like to plot the pairplot (lower triangle) and the correlation map (upper triangle) of these columns.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
rs = np.random.RandomState(112358)
d1 = pd.DataFrame(data=rs.normal(size=(100, 10)), columns=[*'abcdefghij' ])
corr1 = d1.corr()
mask1 = np.tril(np.ones_like(corr1, dtype=bool))
fig, ax = plt.subplots(figsize=(11, 9))
sns.heatmap(corr1, mask=mask1, cmap='PRGn', vmax=.3, vmin=-.3,
square=True, linewidths=.5, cbar_kws={"shrink": .85, "pad":-.01}, ax=ax)
def hide_current_axis(*args, **kwds):
    plt.gca().set_visible(False)
e = sns.pairplot(d1)
e.map_upper(hide_current_axis)
plt.show()
This code of course works, but it plots the two figures separately.
The usual way to create a triangular heatmap is to mask away the part that isn't needed. Nothing is drawn there, so the original background color stays visible. A second heatmap drawn on the same ax likewise only draws where it isn't masked.
Here is some code to demonstrate the idea.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_theme(style="white")
rs = np.random.RandomState(112358)
d1 = pd.DataFrame(data=rs.normal(size=(100, 10)), columns=[*'abcdefghij' ])
d2 = pd.DataFrame(data=rs.normal(size=(100, 10)), columns=[*'abcdefghij' ])
corr1 = d1.corr()
corr2 = d2.corr()
mask1 = np.tril(np.ones_like(corr1, dtype=bool))
mask2 = np.triu(np.ones_like(corr2, dtype=bool))
fig, ax = plt.subplots(figsize=(11, 9))
sns.heatmap(corr1, mask=mask1, cmap='PRGn', vmax=.3, vmin=-.3,
square=True, linewidths=.5, cbar_kws={"shrink": .85, "pad":-.01}, ax=ax)
sns.heatmap(corr2, mask=mask2, cmap='RdYlBu', vmax=.3, vmin=-.3,
square=True, linewidths=.5, cbar_kws={"shrink": .85}, ax=ax)
# the following lines color and hatch the axes background, only the diagonals are visible
ax.patch.set_facecolor('grey')
ax.patch.set_edgecolor('yellow')
ax.patch.set_hatch('xx')
plt.show()
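The mask logic can be checked without plotting. In sns.heatmap, True in mask means "hide this cell". np.tril and np.triu both include the diagonal, so with the two masks above the diagonal is hidden by both heatmaps, which is why only the hatched background shows there; passing k=1 to np.triu leaves the diagonal to the second heatmap instead. A small NumPy-only sketch on a 4x4 grid:

```python
import numpy as np

n = 4
ones = np.ones((n, n), dtype=bool)
mask_lower_and_diag = np.tril(ones)     # True = hidden: lower triangle + diagonal
mask_upper_and_diag = np.triu(ones)     # hidden: upper triangle + diagonal
mask_strict_upper = np.triu(ones, k=1)  # hidden: only the cells above the diagonal

# Each heatmap draws exactly where its mask is False.
shown_by_first = ~mask_lower_and_diag                # strict upper triangle
shown_by_second = ~mask_upper_and_diag               # strict lower triangle
never_drawn = ~(shown_by_first | shown_by_second)    # exactly the diagonal
shown_by_second_k1 = ~mask_strict_upper              # lower triangle + diagonal
```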
As for the new question, combining a pairplot with a triangular heatmap: as pairplot is a figure-level function, it creates its own figure with subplots, so it should be created first.
As a second step, a dedicated ax for the heatmap can be created, using the positions of the pairplot's subplots. Setting its facecolor to 'none' makes it fully transparent (the default would be white, hiding everything behind it).
Adding a colorbar can be more cumbersome, as the pairplot doesn't leave a good spot to position it.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
def hide_current_axis(*args, **kwds):
    plt.gca().set_visible(False)
rs = np.random.RandomState(112358)
d1 = pd.DataFrame(data=rs.normal(size=(20, 5)), columns=[*'abcde'])
e = sns.pairplot(d1)
e.map_upper(hide_current_axis)
(xmin, _), (_, ymax) = e.axes[0, 0].get_position().get_points()
(_, ymin), (xmax, _) = e.axes[-1, -1].get_position().get_points()
ax = e.fig.add_axes([xmin, ymin, xmax - xmin, ymax - ymin], facecolor='none')
corr1 = d1.corr()
mask1 = np.tril(np.ones_like(corr1, dtype=bool))
sns.heatmap(corr1, mask=mask1, cmap='seismic', vmax=.5, vmin=-.5,
linewidths=.5, cbar=False, annot=True, annot_kws={'size': 22}, ax=ax)
ax.set_xticks([])
ax.set_yticks([])
# ax.xaxis.tick_top()
# ax.yaxis.tick_right()
plt.show()
As mentioned in the comments, an approach more faithful to seaborn's philosophy would be to color the background of the upper-right subplots according to the correlation, together with a numeric display. I couldn't find example code, so here is my attempt:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import pearsonr
def corrfunc(x, y, **kwds):
    cmap = kwds['cmap']
    norm = kwds['norm']
    ax = plt.gca()
    ax.tick_params(bottom=False, top=False, left=False, right=False)
    sns.despine(ax=ax, bottom=True, top=True, left=True, right=True)
    r, _ = pearsonr(x, y)
    facecolor = cmap(norm(r))
    ax.set_facecolor(facecolor)
    lightness = (max(facecolor[:3]) + min(facecolor[:3])) / 2
    ax.annotate(f"r={r:.2f}", xy=(.5, .5), xycoords=ax.transAxes,
                color='white' if lightness < 0.7 else 'black', size=26, ha='center', va='center')
rs = np.random.RandomState(112358)
d1 = pd.DataFrame(data=rs.normal(size=(20, 5)), columns=[*'abcde'])
g = sns.PairGrid(d1)
g.map_lower(plt.scatter, s=10)
g.map_diag(sns.histplot, kde=False)
g.map_upper(corrfunc, cmap=plt.get_cmap('seismic'), norm=plt.Normalize(vmin=-.5, vmax=.5))
g.fig.subplots_adjust(wspace=0.06, hspace=0.06) # equal spacing in both directions
plt.show()
Sometimes it is useful to scale the font size of the correlation value with abs(r), so that stronger correlations are shown with larger numbers:
def corrfunc(x, y, **kwds):
    cmap = kwds['cmap']
    norm = kwds['norm']
    ax = plt.gca()
    ax.tick_params(bottom=False, top=False, left=False, right=False)
    sns.despine(ax=ax, bottom=True, top=True, left=True, right=True)
    r, _ = pearsonr(x, y)
    facecolor = cmap(norm(r))
    ax.set_facecolor(facecolor)
    lightness = (max(facecolor[:3]) + min(facecolor[:3])) / 2
    tam = int(70 * abs(r))  # font size grows with |r| ...
    if tam < 10:
        tam = 10  # ... but never drops below 10 pt
    ax.annotate(f"{r:.2f}", xy=(.5, .5), xycoords=ax.transAxes,
                color='white' if lightness < 0.7 else 'black', size=tam, ha='center', va='center')
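The two decisions inside corrfunc, the text colour (from the HSL lightness (max + min) / 2 of the background) and the font size (the tam rule), can be factored into small helpers that are easy to check on their own. A sketch; the helper names text_color_for and font_size_for are illustrative, not from the answer:

```python
import matplotlib.pyplot as plt


def text_color_for(facecolor, threshold=0.7):
    """'white' on dark backgrounds, 'black' on light ones, using HSL lightness (max + min) / 2."""
    r, g, b = facecolor[:3]
    lightness = (max(r, g, b) + min(r, g, b)) / 2
    return 'white' if lightness < threshold else 'black'


def font_size_for(r, scale=70, minimum=10):
    """Font size proportional to |r|, never below `minimum` (same rule as `tam` above)."""
    return max(minimum, int(scale * abs(r)))


cmap = plt.get_cmap('seismic')
norm = plt.Normalize(vmin=-.5, vmax=.5)
for r in (-0.45, -0.05, 0.3):
    print(f"r={r:+.2f}: size={font_size_for(r)}, text={text_color_for(cmap(norm(r)))}")
```

Inside corrfunc, the annotate call would then use color=text_color_for(facecolor) and size=font_size_for(r).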
I am trying to generate a multi-panel figure using seaborn in Python, and I want the color of the points in each panel to be specified by a continuous variable. Here's an example of what I am trying to do with the "iris" dataset:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
g = sns.FacetGrid(iris, col = 'species', hue = 'petal_length', palette = 'seismic')
g = g.map(plt.scatter, 'sepal_length', 'sepal_width', s = 100, alpha = 0.5)
g.add_legend()
This makes the following figure:
That's nice, but the legend is far too long. Ideally I'd like to show only about a quarter of these values or, failing that, display a colorbar instead.
For instance, something like this might be acceptable, but I'd still want to split it over the three species.
plt.scatter(iris.sepal_length, iris.sepal_width, alpha = .8, c = iris.petal_length, cmap = 'seismic')
cbar = plt.colorbar()
Any idea about how I can get the best of both of these plots?
Edit:
This topic seems like a good start.
https://github.com/mwaskom/seaborn/issues/582
For that user, simply calling plt.colorbar after everything else had run seemed to work. It doesn't seem to help in this case, though.
The FacetGrid hue is categorical, not continuous. Getting a continuous colormap for a scatterplot in a FacetGrid takes a little work: unlike with imshow in the linked GitHub issue, matplotlib does not keep a reference to the "currently active scatterplot mapper", so a magic call to plt.colorbar doesn't pick up the mapping applied to the point colors.
g = sns.FacetGrid(iris, col='species', palette = 'seismic')
def facet_scatter(x, y, c, **kwargs):
    """Draw scatterplot with point colors from a faceted DataFrame column."""
    kwargs.pop("color", None)  # drop the categorical color FacetGrid passes in, so `c` is used
    plt.scatter(x, y, c=c, **kwargs)
vmin, vmax = 0, 7
cmap = sns.diverging_palette(240, 10, l=65, center="dark", as_cmap=True)
g = g.map(facet_scatter, 'sepal_length', 'sepal_width', "petal_length",
s=100, alpha=0.5, vmin=vmin, vmax=vmax, cmap=cmap)
# Make space for the colorbar
g.fig.subplots_adjust(right=.92)
# Define a new Axes where the colorbar will go
cax = g.fig.add_axes([.94, .25, .02, .6])
# Get a mappable object with the same colormap as the data
points = plt.scatter([], [], c=[], vmin=vmin, vmax=vmax, cmap=cmap)
# Draw the colorbar
g.fig.colorbar(points, cax=cax)
Since you were asking about a legend for the scatter, one may adapt @mwaskom's solution to produce a legend with scatter points like so:
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
iris = sns.load_dataset('iris')
g = sns.FacetGrid(iris, col='species', palette = 'seismic')
def facet_scatter(x, y, c, **kwargs):
    kwargs.pop("color", None)  # drop the categorical color FacetGrid passes in
    plt.scatter(x, y, c=c, **kwargs)
vmin, vmax = 0, 7
cmap = plt.cm.viridis
norm=plt.Normalize(vmin=vmin, vmax=vmax)
g = g.map(facet_scatter, 'sepal_length', 'sepal_width', "petal_length",
s=100, alpha=0.5, norm=norm, cmap=cmap)
# Make space for the legend
g.fig.subplots_adjust(right=.9)
lp = lambda i: plt.plot([], color=cmap(norm(i)), marker="o", ls="", ms=10, alpha=0.5)[0]
labels = np.arange(0,7.5,0.5)
h = [lp(i) for i in labels]
g.fig.legend(handles=h, labels=labels, fontsize=9)
plt.show()