How to plot a paired histogram using seaborn


I would like to make a paired histogram like the one shown here using the seaborn distplot.
This kind of plot can also be referred to as the back-to-back histogram shown here, or a bihistogram inverted/mirrored along the x-axis as discussed here.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
green = np.random.normal(20,10,1000)
blue = np.random.poisson(60,1000)
fig, ax = plt.subplots(figsize=(8,6))
sns.distplot(blue, hist=True, kde=True, hist_kws={'edgecolor':'black'}, kde_kws={'linewidth':2}, bins=10, color='blue')
sns.distplot(green, hist=True, kde=True, hist_kws={'edgecolor':'black'}, kde_kws={'linewidth':2}, bins=10, color='green')
ax.set_xticks(np.arange(-20,121,20))
ax.set_yticks(np.arange(0.0,0.07,0.01))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()
Here is the output:
When I use the method discussed here (plt.barh), I get the bar plot shown just below, which is not what I am looking for.
Or maybe I haven't understood the workaround well enough...
A simple, short implementation using seaborn's distplot that produces this kind of plot would be perfect. I edited the figure of my first plot above to show the kind of plot I hope to achieve (though the y-axis should not be upside down):
Any leads would be greatly appreciated.

You could use two subplots, plot both with the same bins, and invert the y-axis of the lower one.
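import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'a': np.random.normal(0, 5, 1000), 'b': np.random.normal(20, 5, 1000)})
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
bins = np.arange(-20, 40)  # same bin edges for both histograms
ax.hist(df['a'], bins=bins)
ax2.hist(df['b'], color='orange', bins=bins)
ax2.invert_yaxis()  # flip the lower histogram so its bars hang downward
plt.show()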
Edit: improvements suggested by @mwaskom:
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(5, 5))
bins = np.arange(-20, 40)
for ax, column, color, invert in zip(axes.ravel(), df.columns, ['teal', 'orange'], [False, True]):
    ax.hist(df[column], bins=bins, color=color)
    if invert:
        ax.invert_yaxis()
plt.subplots_adjust(hspace=0)
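As a hedged alternative that needs only NumPy and Matplotlib (not taken from the answers here), you can compute both histograms yourself with one shared set of bin edges and draw the second series with negative bar heights on a single axes. The helper name mirrored_histogram is hypothetical, the data mimic the question's green/blue samples, and densities are used so both series sit on a comparable scale.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter


def mirrored_histogram(ax, top, bottom, bins, colors=("green", "blue")):
    """Hypothetical helper: draw `top` above y=0 and `bottom` mirrored below it.

    Both series use the same bin edges; returns the two BarContainer objects.
    """
    edges = np.histogram_bin_edges(np.concatenate([top, bottom]), bins=bins)
    top_h, _ = np.histogram(top, bins=edges, density=True)
    bottom_h, _ = np.histogram(bottom, bins=edges, density=True)
    widths = np.diff(edges)
    # align="edge" puts each bar's left side on its bin edge
    bars_top = ax.bar(edges[:-1], top_h, width=widths, align="edge",
                      color=colors[0], edgecolor="black")
    # negative heights make the second histogram hang below the x-axis
    bars_bottom = ax.bar(edges[:-1], -bottom_h, width=widths, align="edge",
                         color=colors[1], edgecolor="black")
    ax.axhline(0, color="black", linewidth=1)
    return bars_top, bars_bottom


rng = np.random.default_rng(0)
green = rng.normal(20, 10, 1000)
blue = rng.poisson(60, 1000)

fig, ax = plt.subplots(figsize=(8, 6))
bars_top, bars_bottom = mirrored_histogram(ax, green, blue, bins=20)
# label ticks by absolute value so the lower half doesn't read as negative
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f"{abs(y):.3f}"))
```

Because the bin edges are computed once from both samples, the bars of the two series line up column by column.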

Here is a possible approach using seaborn's distplot.
Seaborn doesn't return the graphical elements it creates, but the ax can be inspected. To make sure the ax contains only the elements you want upside down, draw those elements first. Then negate the heights of all the patches (the rectangular bars) and the y-data of the lines (the kde curve). Optionally, the x-axis can be placed at y == 0 using ax.spines['bottom'].set_position('zero').
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
green = np.random.normal(20, 10, 1000)
blue = np.random.poisson(60, 1000)
fig, ax = plt.subplots(figsize=(8, 6))
sns.distplot(green, hist=True, kde=True, hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 2}, bins=10,
             color='green')
for p in ax.patches:  # turn the histogram upside down
    p.set_height(-p.get_height())
for l in ax.lines:  # turn the kde curve upside down
    l.set_ydata(-l.get_ydata())
sns.distplot(blue, hist=True, kde=True, hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 2}, bins=10,
             color='blue')
ax.set_xticks(np.arange(-20, 121, 20))
ax.set_yticks(np.arange(0.0, 0.07, 0.01))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
pos_ticks = np.array([t for t in ax.get_yticks() if t > 0])
ticks = np.concatenate([-pos_ticks[::-1], [0], pos_ticks])
ax.set_yticks(ticks)
ax.set_yticklabels([f'{abs(t):.2f}' for t in ticks])
ax.spines['bottom'].set_position('zero')
plt.show()
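The tick handling at the end of that code (keep the positive ticks, mirror them below zero, and label every tick with its absolute value) can be factored into a small pure-NumPy helper. This is a sketch; the name symmetric_ticks is hypothetical.

```python
import numpy as np


def symmetric_ticks(ticks, fmt="{:.2f}"):
    """Hypothetical helper: mirror the positive ticks around 0 and label them by absolute value."""
    pos = np.array([t for t in ticks if t > 0])
    mirrored = np.concatenate([-pos[::-1], [0.0], pos])
    labels = [fmt.format(abs(t)) for t in mirrored]
    return mirrored, labels


ticks, labels = symmetric_ticks([0.0, 0.01, 0.02, 0.03])
# then, on a real axes: ax.set_yticks(ticks); ax.set_yticklabels(labels)
```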

Related

How to customize seaborn boxplot with specific color sequence when boxplots have hue

I want to make boxplots with hues, but I want to color-code them so that each x category has its own color, with the hue levels shown as lighter shades of that color. I can make a boxplot without hue. When I add the hue, I get the second boxplot, which loses the colors. Can someone help me customize the colors for the figure that uses hue?
Essentially, it's what the answer to this question does, but with boxplots.
This is my code:
First boxplot:
order=['Ash1','E1A','FUS','p53']
colors=['gold','teal','darkorange','royalblue']
color_dict=dict(zip(order,colors))
fig,ax=plt.subplots(figsize=(25,15))
bp=sns.boxplot(data=df_idrs, x=df_idrs["construct"], y=df_idrs['Norm_Ef_IDR/Ef_GS'],ax=ax,palette=color_dict)
sns.stripplot(ax=ax,y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs,palette=color_dict,
              jitter=1, marker='o', alpha=0.4,edgecolor='black',linewidth=1, dodge=True)
ax.axhline(y=1,linestyle="--",color='black',linewidth=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
Second boxplot:
order=['Ash1','E1A','FUS','p53']
colors=['gold','teal','darkorange','royalblue']
color_dict=dict(zip(order,colors))
fig,ax=plt.subplots(figsize=(25,15))
bp=sns.boxplot(data=df_idrs, x=df_idrs["construct"], y=df_idrs['Norm_Ef_IDR/Ef_GS'],ax=ax, hue=df_idrs["location"])
sns.stripplot(y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs, hue=df_idrs["location"],
              jitter=1, marker='o', alpha=0.4,edgecolor='black',linewidth=1, dodge=True)
ax.axhline(y=1,linestyle="--",color='black',linewidth=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
The only change is that palette was replaced by hue. I have seen many examples here, but I can't get them to work. Starting from the second snippet, I have tried the following.
Nothing happens with this one:
for ind, bp in enumerate(ax.findobj(PolyCollection)):
    rgb = to_rgb(colors[ind // 2])
    if ind % 2 != 0:
        rgb = 0.5 + 0.5 * np.array(rgb)  # make whiter
    bp.set_facecolor(rgb)
I get an "index out of range" error with this one:
for i in range(0, 4):
    mybox = bp.artists[i]
    mybox.set_facecolor(color_dict[order[i]])

Matplotlib stores the boxes in ax.patches, but there are also 2 dummy patches (used to construct the legend) that need to be filtered out. The dots of the stripplot are stored in ax.collections. There are also 2 dummy collections for the legend, but as those come at the end, they don't cause a problem.
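The filtering relies on the box bodies being PathPatch objects while the legend dummies are other patch types (presumably plain Rectangles, which is what the isinstance check in the code below implies). The same idea can be shown with Matplotlib alone, assuming Matplotlib 3.5 or newer, where ax.patches lists every patch child regardless of how it was added: ax.boxplot(..., patch_artist=True) draws PathPatch boxes, and a zero-size Rectangle stands in for a legend dummy.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import PathPatch, Rectangle

rng = np.random.default_rng(1)
fig, ax = plt.subplots()
# patch_artist=True makes Matplotlib draw each box as a PathPatch
ax.boxplot([rng.normal(size=50) for _ in range(4)], patch_artist=True)
# a zero-size Rectangle, standing in for a legend-only dummy patch
ax.add_patch(Rectangle((0, 0), 0, 0, label="dummy"))

# keep only the real boxes
box_patches = [p for p in ax.patches if isinstance(p, PathPatch)]
print(len(ax.patches), len(box_patches))
```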
Some remarks:
sns.boxplot returns the subplot on which it was drawn; as it is called with ax=ax, it returns that same ax.
Setting jitter=1 in the stripplot smears the dots over a width of 1. That is the full distance between neighbouring x positions, while each box is only 0.4 wide. To avoid clutter, the code below uses jitter=0.4.
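The 0.4 box width in the remark above comes from how seaborn dodges hue levels: by default the boxes of one category share a slot of width 0.8, split evenly between the hue levels. The helper below is a hypothetical sketch of that arithmetic; it assumes the default width and ignores the gap option found in newer seaborn releases.

```python
def dodged_boxes(x, n_hue, width=0.8):
    """Hypothetical helper: (box_width, centers) for n_hue boxes dodged around position x."""
    box_width = width / n_hue
    left = x - width / 2          # left edge of the shared slot
    centers = [left + box_width * (i + 0.5) for i in range(n_hue)]
    return box_width, centers


box_width, centers = dodged_boxes(x=0, n_hue=2)
```

With two hue levels this gives boxes 0.4 wide centered near x - 0.2 and x + 0.2, so a jitter that keeps the dots within about 0.4 stays over its own box.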
Here is some example code starting from dummy test data:
from matplotlib import pyplot as plt
from matplotlib.legend_handler import HandlerTuple
from matplotlib.patches import PathPatch
from matplotlib.colors import to_rgb
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(20230215)
order = ['Ash1', 'E1A', 'FUS', 'p53']
colors = ['gold', 'teal', 'darkorange', 'royalblue']
hue_order = ['A', 'B']
df_idrs = pd.DataFrame({'construct': np.repeat(order, 200),
                        'Norm_Ef_IDR/Ef_GS': (np.random.normal(0.03, 1, 800).cumsum() + 10) / 15,
                        'location': np.tile(np.repeat(hue_order, 100), 4)})
fig, ax = plt.subplots(figsize=(12, 5))
sns.boxplot(data=df_idrs, x=df_idrs['construct'], y=df_idrs['Norm_Ef_IDR/Ef_GS'], hue='location',
            order=order, hue_order=hue_order, ax=ax)
box_colors = [f + (1 - f) * np.array(to_rgb(c))  # whiten colors depending on hue
              for c in colors for f in np.linspace(0, 0.5, len(hue_order))]
box_patches = [p for p in ax.patches if isinstance(p, PathPatch)]
for patch, color in zip(box_patches, box_colors):
    patch.set_facecolor(color)
sns.stripplot(y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs, hue=df_idrs['location'],
              jitter=0.4, marker='o', alpha=0.4, edgecolor='black', linewidth=1, dodge=True, ax=ax)
for collection, color in zip(ax.collections, box_colors):
    collection.set_facecolor(color)
ax.axhline(y=1, linestyle='--', color='black', linewidth=2)
handles = [tuple(box_patches[i::len(hue_order)]) for i in range(len(hue_order))]
ax.legend(handles=handles, labels=hue_order, title='hue category',
          handlelength=4, handler_map={tuple: HandlerTuple(ndivide=None, pad=0)},
          loc='upper left', bbox_to_anchor=(1.01, 1))
plt.tight_layout()
plt.show()
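The box_colors comprehension blends each base color with white: f + (1 - f) * rgb moves every channel a fraction f of the way toward 1, so f = 0 keeps the color and larger f gives a paler shade. A standalone sketch of that blend, with a hypothetical helper name:

```python
import numpy as np
from matplotlib.colors import to_rgb


def whiten(color, f):
    """Hypothetical helper: blend `color` with white (f=0 keeps it, f=1 gives white)."""
    return tuple(f + (1 - f) * np.array(to_rgb(color)))


colors = ["gold", "teal"]
n_hue = 2
# one shade per (color, hue level), in the same order as the answer's comprehension
shades = [whiten(c, f) for c in colors for f in np.linspace(0, 0.5, n_hue)]
```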

Python Geopandas: Single Legend for multiple plots

How do I use a single legend for multiple geopandas plots?
Right now I have a Figure like this:
This post explains how to make the legend values the same for each plot. However, I would like to have a single legend for all plots. Ideally, it should also be possible to have separate legends for the different DataFrames I plot; e.g., the lines you see in the pictures also have a description.
Here is my current code:
years = [2005, 2009, 2013]
# initialize figure
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(10, 10), dpi=300, constrained_layout=True)
for i, year in enumerate(years):
    # subset lines
    lines_plot = lines[lines['year'] == year]
    # subset controls plot
    controls_plot = controls[controls['year'] == year]
    # draw subfig
    controls_plot.plot(column='pop_dens', ax=ax[i], legend=True, legend_kwds={'orientation': "horizontal"})
    lines_plot.plot(ax=ax[i], color='red', lw=2, zorder=2)

Regarding your first question ('How do I use a single legend for multiple geopandas plots?'), you can make sure all your plots use the same color scale (via the vmin and vmax arguments of .plot()) and then add a single colorbar to the figure, as shown below. For the red lines, you can simply add a separate legend (the first element is technically a colorbar, not a legend).
import geopandas as gpd
from matplotlib import pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as mcolors
from matplotlib.lines import Line2D
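# gpd.datasets was removed in GeoPandas 1.0; with newer versions, read the Natural Earth data from a downloaded file instead
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))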
f, ax = plt.subplots(nrows=1, ncols=3, figsize=(9, 4))
# define min and max values and colormap for the plots
value_min = 0
value_max = 1e7
cmap = 'viridis'
world.plot(ax=ax[0], column='pop_est', vmin=value_min, vmax=value_max, cmap=cmap)
world.plot(ax=ax[1], column='pop_est', vmin=value_min, vmax=value_max, cmap=cmap)
world.plot(ax=ax[2], column='pop_est', vmin=value_min, vmax=value_max, cmap=cmap)
# define a mappable based on which the colorbar will be drawn
mappable = cm.ScalarMappable(
    norm=mcolors.Normalize(value_min, value_max),
    cmap=cmap
)
# define position and extent of colorbar
cb_ax = f.add_axes([0.1, 0.1, 0.8, 0.05])
# draw colorbar
cbar = f.colorbar(mappable, cax=cb_ax, orientation='horizontal')
# add handles for the legend
custom_lines = [
    Line2D([0], [0], color='r'),
    Line2D([0], [0], color='b'),
]
# define labels for the legend
custom_labels = ['red line', 'blue line']
# plot legend on the current axes (cb_ax, the last axes added); loc is in that axes' coordinates
plt.legend(
    handles=custom_lines,
    labels=custom_labels,
    loc=(.4, 1.5),
    title='2nd legend',
    ncol=2
)
plt.tight_layout()
plt.show()
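Instead of positioning the colorbar axes by hand with f.add_axes, Matplotlib can also take the space from all subplots at once when you pass the whole array of axes as ax= to fig.colorbar. The sketch below uses imshow on random arrays instead of GeoDataFrames (an assumption so it runs without geopandas); the key point, one shared Normalize for every panel plus one colorbar, carries over to .plot(vmin=..., vmax=...). The red diagonal lines are stand-ins for the line data, described by a proxy-artist legend.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np
from matplotlib.lines import Line2D

rng = np.random.default_rng(2)
# one shared normalization, so the same color means the same value in every panel
norm = mcolors.Normalize(vmin=0, vmax=1e7)
cmap = "viridis"

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(9, 4))
images = []
for ax, year in zip(axes, [2005, 2009, 2013]):
    data = rng.uniform(0, 1e7, size=(10, 10))  # stand-in for one year's data
    images.append(ax.imshow(data, norm=norm, cmap=cmap))
    ax.plot([0, 9], [0, 9], color="red", lw=2)  # stand-in for the red lines
    ax.set_title(str(year))

# ax=axes lets the colorbar take its space from all three panels
cbar = fig.colorbar(images[0], ax=axes, orientation="horizontal", fraction=0.05)
# a separate figure-level legend for the lines, built from a proxy artist
fig.legend(handles=[Line2D([0], [0], color="red", lw=2)], labels=["lines"],
           loc="upper center")
```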

Aligning subplots with a pyplot barplot and seaborn heatmap

I am attempting to place a Seaborn time-based heatmap on top of a bar chart, indicating the number of patients in each bin/timeframe. I can successfully make an individual heatmap and bar plot, but combining the two does not work as intended.
import pandas as pd
import numpy as np
import seaborn as sb
from matplotlib import pyplot as plt
# Mock data
patient_counts = [650, 28, 8]
missings_df = pd.DataFrame(np.array([[-15.8, 600/650, 580/650, 590/650],
                                     [488.2, 20/23, 21/23, 21/23],
                                     [992.2, 7/8, 8/8, 8/8]]),
                           columns=['time', 'Resp. (/min)', 'SpO2', 'Blood Pressure'])
missings_df.set_index('time', inplace=True)
# Plot heatmap
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(26, 16), sharex=True, gridspec_kw={'height_ratios': [5, 1]})
sb.heatmap(missings_df.T, cmap="Blues", cbar_kws={"shrink": .8}, ax=ax1, xticklabels=False)
plt.xlabel('Time (hours)')
# Plot line graph under heatmap to show nr. of patients in each bin
x_ticks = [time for time in missings_df.index]
ax2.bar([i for i, _ in enumerate(x_ticks)], patient_counts, align='center')
plt.xticks([i for i, _ in enumerate(x_ticks)], x_ticks)
plt.show()
This code gives me the graph below. As you can see, there are two issues:
The bar plot extends too far.
The first and second bars are not aligned with the top graph, and the tick of the first plot does not line up with the centre of its bar either.
I've tried looking online but could not find a good resource to fix these issues. Any ideas?

One problem is that the colorbar takes space away from the heatmap, making it narrower than the bar plot. You can create a 2x2 grid to make room for the colorbar and hide the empty subplot. Change sharex=True to sharex='col' to prevent the colorbar from sharing the heatmap's x-axis.
Another problem is that the heatmap has its cell borders at positions 0, 1, 2, ..., so their centers are at 0.5, 1.5, 2.5, .... You can put the bars at these centers instead of at their default positions:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
fig, ((ax1, cbar_ax), (ax2, dummy_ax)) = plt.subplots(nrows=2, ncols=2, figsize=(26, 16), sharex='col',
                                                      gridspec_kw={'height_ratios': [5, 1], 'width_ratios': [20, 1]})
missings_df = np.random.rand(3, 3)
sns.heatmap(missings_df.T, cmap="Blues", cbar_ax=cbar_ax, xticklabels=False, linewidths=2, ax=ax1)
ax2.set_xlabel('Time (hours)')
patient_counts = np.random.randint(10, 50, 3)
x_ticks = ['Time1', 'Time2', 'Time3']
x_tick_pos = [i + 0.5 for i in range(len(x_ticks))]
ax2.bar(x_tick_pos, patient_counts, align='center')
ax2.set_xticks(x_tick_pos)
ax2.set_xticklabels(x_ticks)
dummy_ax.axis('off')
plt.tight_layout()
plt.show()
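The half-unit offset can be checked without seaborn: the heatmap's cells span [i, i + 1] along x, so a bar at i + 0.5 with width 1 covers exactly one column. This sketch assumes ax.pcolormesh as a stand-in for sns.heatmap (seaborn draws its heatmap with pcolormesh, so the cell edges match); the data and the Time labels are made up.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
values = rng.random((3, 4))              # 3 rows x 4 time bins
counts = rng.integers(10, 50, size=4)    # patients per time bin

fig, (ax1, ax2) = plt.subplots(nrows=2, sharex=True,
                               gridspec_kw={"height_ratios": [5, 1]})
# cell edges at 0, 1, ..., 4 along x, like seaborn's heatmap
ax1.pcolormesh(np.arange(values.shape[1] + 1), np.arange(values.shape[0] + 1),
               values, cmap="Blues")
centers = np.arange(values.shape[1]) + 0.5
bars = ax2.bar(centers, counts, width=1.0, align="center", edgecolor="white")
ax2.set_xticks(centers)
ax2.set_xticklabels([f"Time{i + 1}" for i in range(len(centers))])
ax2.set_xlim(0, values.shape[1])  # shared, so it also applies to ax1
```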
PS: Be careful not to mix the pyplot ("functional") interface with the object-oriented interface of matplotlib. In particular, avoid plt.xlabel(), as it is not obvious which "current" ax it applies to (ax2 in the code of the question).
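A minimal demonstration of that pitfall: after plt.subplots, the current axes is the last one created, so plt.xlabel silently labels the bottom subplot, while the object-oriented set_xlabel names its target explicitly.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(nrows=2)
# pyplot functions act on the "current" axes, which is the last one created here
plt.xlabel("Time (hours)")
label_top, label_bottom = ax1.get_xlabel(), ax2.get_xlabel()

# the object-oriented call states its target explicitly
ax1.set_xlabel("explicit")
```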

Superimposition of histogram and density in Pandas/Matplotlib in Python

I've got a Pandas dataframe named clean which contains a column v for which I would like to draw a histogram and superimpose a density plot. I know I can plot one under the other this way:
import pandas as pd
import matplotlib.pyplot as plt
Maxv=200
plt.subplot(211)
plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")
plt.subplot(212)
ax=clean['v'].plot(kind='density')
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)
But when I try to superimpose them, the y scales don't match (and I lose the y-axis ticks and labels):
yhist, xhist, _hist = plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")
ax=clean['v'].plot(kind='density') #I would like to insert here a normalization to max(yhist)/max(ax)
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)
Any hints? (Additional question: how can I change the width of the density smoothing?)

Based on your code, this should work:
ax = clean.v.plot(kind='hist', bins=40, density=True)  # density=True replaces the removed normed=True
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
ax.set(xlim=[0, Maxv])
You might not even need secondary_y anymore.
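With density=True, the histogram and the KDE are both probability densities, so they can share one y-axis. For the smoothing-width question: pandas passes bw_method on to scipy's gaussian_kde (e.g. clean.v.plot(kind='kde', bw_method=0.1)), where a scalar acts as a multiplier on the data's standard deviation. The sketch below avoids scipy by writing a small Gaussian KDE by hand with an explicit bandwidth in data units; the function name gaussian_kde_numpy and the gamma-distributed stand-in data are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def gaussian_kde_numpy(data, grid, bandwidth):
    """Hypothetical helper: Gaussian KDE of `data` evaluated on `grid`.

    `bandwidth` is the kernel's standard deviation in data units;
    larger values give a smoother curve.
    """
    data = np.asarray(data, dtype=float)
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)


rng = np.random.default_rng(4)
v = rng.gamma(shape=4, scale=20, size=2000)   # stand-in for clean['v']
max_v = 200
grid = np.linspace(0, max_v, 400)

fig, ax = plt.subplots()
ax.hist(v, bins=40, range=(0, max_v), density=True, color="g", alpha=0.5)
for bw in (2, 10):
    ax.plot(grid, gaussian_kde_numpy(v, grid, bw), label=f"bandwidth={bw}")
ax.set(xlim=(0, max_v), xlabel="Orbital velocity (km/s)", ylabel="Density")
ax.legend()
```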

Now I tried this:
ax = clean.v.plot(kind='hist', bins=40, range=(0, Maxv))
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
But the range part doesn't work, and there's still the left y-axis problem.
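If you would rather keep counts on the y-axis (as in this attempt) than normalize the histogram, you can scale the density the other way: a count histogram of n samples with bin width w has area n·w, so multiplying the KDE by n·w puts it on the count scale, and no secondary_y is needed. This numpy/matplotlib sketch reuses the same hypothetical hand-written KDE and stand-in data; the bandwidth of 5 is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np


def gaussian_kde_numpy(data, grid, bandwidth):
    """Hypothetical helper: Gaussian KDE on `grid`; `bandwidth` is the kernel std in data units."""
    data = np.asarray(data, dtype=float)
    z = (grid[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))


rng = np.random.default_rng(5)
v = rng.gamma(shape=4, scale=20, size=2000)   # stand-in for clean['v']
max_v, n_bins = 200, 40
bin_width = max_v / n_bins
grid = np.linspace(0, max_v, 400)

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(v, bins=n_bins, range=(0, max_v), color="g")
# density * (number of samples * bin width) = expected count per bin
scaled = gaussian_kde_numpy(v, grid, bandwidth=5) * len(v) * bin_width
ax.plot(grid, scaled, color="k")
ax.set(xlim=(0, max_v), xlabel="Orbital velocity (km/s)", ylabel="Number")
```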

Seaborn makes this easy:
import seaborn as sns
sns.distplot(df['numeric_column'],bins=25)

Matplotlib: how to adjust zorder of second legend?

Here is an example that reproduces my problem:
import matplotlib.pyplot as plt
import numpy as np
data1,data2,data3,data4 = np.random.random(100),np.random.random(100),np.random.random(100),np.random.random(100)
fig,ax = plt.subplots()
ax.plot(data1)
ax.plot(data2)
ax.plot(data3)
ax2 = ax.twinx()
ax2.plot(data4)
plt.grid('on')
ax.legend(['1','2','3'], loc='center')
ax2.legend(['4'], loc=1)
How can I get the legend in the center to plot on top of the lines?

To get exactly what you asked for, try the following. Note that I have modified your code to set the labels and colors when plotting, so you don't get a repeated blue line.
import matplotlib.pyplot as plt
import numpy as np
data1,data2,data3,data4 = (np.random.random(100),
                           np.random.random(100),
                           np.random.random(100),
                           np.random.random(100))
fig,ax = plt.subplots()
ax.plot(data1, label="1", color="k")
ax.plot(data2, label="2", color="r")
ax.plot(data3, label="3", color="g")
ax2 = ax.twinx()
ax2.plot(data4, label="4", color="b")
# First get the handles and labels from the axes
handles1, labels1 = ax.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
# Add the first legend to the second axis so it displays 'on top'
first_legend = plt.legend(handles1, labels1, loc='center')
ax2.add_artist(first_legend)
# Add the second legend as usual
ax2.legend(handles2, labels2)
plt.show()
That said, it would probably be clearer to use a single legend containing all the lines. This is described in this SO post, and with the code above it can easily be achieved with:
ax2.legend(handles1+handles2, labels1+labels2)
But obviously you may have your own reasons for wanting two legends.
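The centre legend is hidden because ax2 (created by twinx) is drawn after ax, so ax2's lines are painted over everything on ax, including ax's legend. An alternative to moving the legend onto ax2 (not taken from the answer above) is to raise ax above ax2 and hide ax's background so ax2 stays visible underneath. A sketch; note the trade-off that ax's lines then also sit on top of ax2's line and legend.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(6)
fig, ax = plt.subplots()
for label, color in zip(["1", "2", "3"], ["k", "r", "g"]):
    ax.plot(rng.random(100), label=label, color=color)
ax2 = ax.twinx()
ax2.plot(rng.random(100), label="4", color="b")

# draw ax after ax2, and hide ax's background so ax2 remains visible beneath it
ax.set_zorder(ax2.get_zorder() + 1)
ax.patch.set_visible(False)

center_legend = ax.legend(loc="center")
corner_legend = ax2.legend(loc="upper right")
```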

