I am attempting to place a Seaborn time-based heatmap on top of a bar chart, indicating the number of patients in each bin/timeframe. I can successfully make an individual heatmap and bar plot, but combining the two does not work as intended.
import pandas as pd
import numpy as np
import seaborn as sb
from matplotlib import pyplot as plt
# Mock data
patient_counts = [650, 28, 8]
missings_df = pd.DataFrame(np.array([[-15.8, 600/650, 580/650, 590/650],
[488.2, 20/23, 21/23, 21/23],
[992.2, 7/8, 8/8, 8/8]]),
columns=['time', 'Resp. (/min)', 'SpO2', 'Blood Pressure'])
missings_df.set_index('time', inplace=True)
# Plot heatmap
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(26, 16), sharex=True, gridspec_kw={'height_ratios': [5, 1]})
sb.heatmap(missings_df.T, cmap="Blues", cbar_kws={"shrink": .8}, ax=ax1, xticklabels=False)
plt.xlabel('Time (hours)')
# Plot line graph under heatmap to show nr. of patients in each bin
x_ticks = [time for time in missings_df.index]
ax2.bar([i for i, _ in enumerate(x_ticks)], patient_counts, align='center')
plt.xticks([i for i, _ in enumerate(x_ticks)], x_ticks)
plt.show()
This code gives me the graph below. As you can see, there are two issues:
The bar plot extends too far
The first and second bar are not aligned with the top graph, where the tick of the first plot does not line up with the centre of the bar either.
I've tried looking online but could not find a good resource to fix these issues. Any ideas?
A problem is that the colorbar takes away space from the heatmap, making its plot narrower than the bar plot. You can create a 2x2 grid to make room for the colorbar, and remove the empty subplot. Change sharex=True to sharex='col' to prevent the colorbar getting the same x-axis as the heatmap.
Another problem is that the heatmap has its cell borders at positions 0, 1, 2, ..., so their centers are at 0.5, 1.5, 2.5, .... You can put the bars at these centers instead of at their default positions:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
fig, ((ax1, cbar_ax), (ax2, dummy_ax)) = plt.subplots(nrows=2, ncols=2, figsize=(26, 16), sharex='col',
gridspec_kw={'height_ratios': [5, 1], 'width_ratios': [20, 1]})
missings_df = np.random.rand(3, 3)
sns.heatmap(missings_df.T, cmap="Blues", cbar_ax=cbar_ax, xticklabels=False, linewidths=2, ax=ax1)
ax2.set_xlabel('Time (hours)')
patient_counts = np.random.randint(10, 50, 3)
x_ticks = ['Time1', 'Time2', 'Time3']
x_tick_pos = [i + 0.5 for i in range(len(x_ticks))]
ax2.bar(x_tick_pos, patient_counts, align='center')
ax2.set_xticks(x_tick_pos)
ax2.set_xticklabels(x_ticks)
dummy_ax.axis('off')
plt.tight_layout()
plt.show()
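For reference, here is a sketch of the same fix applied to the mock data from the question instead of random numbers. The smaller figsize and the explicit clearing of the top x-label are my own choices, not part of the answer above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Mock data from the question
patient_counts = [650, 28, 8]
missings_df = pd.DataFrame(
    [[-15.8, 600/650, 580/650, 590/650],
     [488.2, 20/23, 21/23, 21/23],
     [992.2, 7/8, 8/8, 8/8]],
    columns=['time', 'Resp. (/min)', 'SpO2', 'Blood Pressure'],
).set_index('time')

# 2x2 grid: the colorbar gets its own narrow column, so the heatmap
# and the bar plot keep the same width
fig, ((ax1, cbar_ax), (ax2, dummy_ax)) = plt.subplots(
    nrows=2, ncols=2, figsize=(13, 8), sharex='col',
    gridspec_kw={'height_ratios': [5, 1], 'width_ratios': [20, 1]})
sns.heatmap(missings_df.T, cmap="Blues", cbar_ax=cbar_ax, xticklabels=False, ax=ax1)
ax1.set_xlabel('')  # clear any x-label seaborn puts on the top plot

# Heatmap cell i spans [i, i + 1], so its centre is at i + 0.5
x_tick_pos = np.arange(len(missings_df.index)) + 0.5
ax2.bar(x_tick_pos, patient_counts, align='center')
ax2.set_xticks(x_tick_pos)
ax2.set_xticklabels([str(t) for t in missings_df.index])
ax2.set_xlabel('Time (hours)')
dummy_ax.axis('off')
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards to view the result.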
PS: Be careful not to mix the "functional" interface with the "object-oriented" interface to matplotlib. So, try not to use plt.xlabel() as it is not obvious that it will be applied to the "current" ax (ax2 in the code of the question).
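A small sketch of what the PS means, using only matplotlib. The axis label text is just an example.

```python
import matplotlib.pyplot as plt

fig, (ax1, ax2) = plt.subplots(nrows=2)

# pyplot functions act on the "current" axes. Right after plt.subplots
# that is the last axes created (ax2), which is easy to overlook.
plt.xlabel('Time (hours)')

# The object-oriented call names its target explicitly.
ax1.set_ylabel('Vital sign')

print(repr(ax1.get_xlabel()), repr(ax2.get_xlabel()))
```

Writing ax2.set_xlabel('Time (hours)') instead of plt.xlabel(...) makes the target obvious when reading the code.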
I'm working with Matplotlib and have a large number of 1D heatmaps, each with its own label. However, the labels are misaligned with the plots and I cannot figure out how to get this to work automatically.
Here's an MWE
import numpy as np
import matplotlib.pyplot as plt
data = np.random.rand(10, 1000)
dogs = ["woof", "bark", "bowwow"]
fig, axs = plt.subplots(10)
for i in range(10):
    axs[i].scatter(np.linspace(0, 1, 1000), np.linspace(0, 1, 1000) * 0, 2000,
                   c=data[i, :], marker="|", cmap='inferno')
    axs[i].set_frame_on(False)
    axs[i].set_yticklabels([])
    axs[i].set_xticklabels([])
    axs[i].set_xticks([])
    axs[i].set_yticks([])
    axs[i].set_ylabel(dogs[i % 3], rotation='horizontal')
plt.show()
I experimented with
axs[i].yaxis.set_label_coords(x, y)
for various values of x and y, and nothing seems to work. I would prefer to have it align automatically, with the bottom of the text corresponding to the bottom of the individual plot.
Attached is an image showcasing the alignment issue.
You could create your heatmaps via seaborn, and use yticklabels=[label_name] to set the labels. Rotating the labels to 0 degrees should have them nicely aligned. Note that the data is expected to have a shape of 1xN.
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
labels = ['Alkaid', 'Mizar', 'Alioth', 'Megrez', 'Phecda', 'Merak', 'Dubhe']
nrows = len(labels)
fig, axs = plt.subplots(nrows=nrows, figsize=(10, 5))
for ax_i, data_i, label_i in zip(axs, np.random.randn(nrows, 1, 100).cumsum(axis=2), labels):
    sns.heatmap(data=data_i, xticklabels=[], yticklabels=[label_i], cmap='inferno', cbar=False, ax=ax_i)
    ax_i.tick_params(axis='y', rotation=0, labelsize=22, length=0)  # length means length of the tick mark
plt.tight_layout()
plt.show()
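If your data comes as plain 1-D arrays, as in the question, it has to be reshaped to 1xN before it is passed to sns.heatmap. A minimal sketch, where the array signal is a made-up stand-in for one of your series:

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

signal = np.random.randn(100).cumsum()  # hypothetical 1-D series, shape (100,)
row = signal.reshape(1, -1)             # shape (1, 100): one row, N columns

fig, ax = plt.subplots(figsize=(10, 1))
sns.heatmap(row, xticklabels=[], yticklabels=['Alkaid'], cmap='inferno', cbar=False, ax=ax)
ax.tick_params(axis='y', rotation=0, length=0)
```

signal[np.newaxis, :] gives the same 1xN shape as reshape(1, -1).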
After a bit of playing around, I found that
axs[0].set_ylabel("Pseudotime", fontsize=12, rotation='horizontal', ha='right', va='center')
is sufficient for aligning the y-labels.
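Applied to the MWE from the question, that looks roughly like the sketch below. It is pure matplotlib; I only swapped the repeated axs[i] for a loop variable.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.random.rand(10, 1000)
dogs = ["woof", "bark", "bowwow"]
x = np.linspace(0, 1, 1000)

fig, axs = plt.subplots(10)
for i, ax in enumerate(axs):
    ax.scatter(x, np.zeros_like(x), 2000, c=data[i, :], marker="|", cmap='inferno')
    ax.set_frame_on(False)
    ax.set_xticks([])
    ax.set_yticks([])
    # ha='right' anchors the right edge of the text at the label position,
    # va='center' centres the text vertically on the strip
    ax.set_ylabel(dogs[i % 3], rotation='horizontal', ha='right', va='center')
```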
I would like to make a paired histogram like the one shown here using the seaborn distplot.
This kind of plot can also be referred to as the back-to-back histogram shown here, or a bihistogram inverted/mirrored along the x-axis as discussed here.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
green = np.random.normal(20,10,1000)
blue = np.random.poisson(60,1000)
fig, ax = plt.subplots(figsize=(8,6))
sns.distplot(blue, hist=True, kde=True, hist_kws={'edgecolor':'black'}, kde_kws={'linewidth':2}, bins=10, color='blue')
sns.distplot(green, hist=True, kde=True, hist_kws={'edgecolor':'black'}, kde_kws={'linewidth':2}, bins=10, color='green')
ax.set_xticks(np.arange(-20,121,20))
ax.set_yticks(np.arange(0.0,0.07,0.01))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()
Here is the output:
When I use the method discussed here (plt.barh), I get the bar plot shown just below, which is not what I am looking for.
Or maybe I haven't understood the workaround well enough...
A simple/short implementation of python-seaborn-distplot similar to these kinds of plots would be perfect. I edited the figure of my first plot above to show the kind of plot I hope to achieve (though y-axis not upside down):
Any leads would be greatly appreciated.
You could use two subplots and invert the y-axis of the lower one and plot with the same bins.
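import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'a': np.random.normal(0, 5, 1000), 'b': np.random.normal(20, 5, 1000)})
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
bins = np.arange(-20, 40)  # identical bins for both histograms
ax.hist(df['a'], bins=bins)
ax2.hist(df['b'], color='orange', bins=bins)
ax2.invert_yaxis()  # mirror the lower histogram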
Edit: improvements suggested by @mwaskom:
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(5, 5))
bins = np.arange(-20, 40)
for ax, column, color, invert in zip(axes.ravel(), df.columns, ['teal', 'orange'], [False, True]):
    ax.hist(df[column], bins=bins, color=color)
    if invert:
        ax.invert_yaxis()
plt.subplots_adjust(hspace=0)
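The same-bins idea can also be drawn on a single Axes by negating the counts of the second sample. This is a sketch of that variant, not something from the answer above. It assumes plain counts (no density curve) are enough, and the bin count of 30 is arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
a = rng.normal(0, 5, 1000)
b = rng.normal(20, 5, 1000)

# Shared bin edges computed from both samples, so the bars line up
edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=30)
counts_a, _ = np.histogram(a, bins=edges)
counts_b, _ = np.histogram(b, bins=edges)
widths = np.diff(edges)

fig, ax = plt.subplots(figsize=(5, 5))
ax.bar(edges[:-1], counts_a, width=widths, align='edge', color='teal')
ax.bar(edges[:-1], -counts_b, width=widths, align='edge', color='orange')  # mirrored
ax.axhline(0, color='black', linewidth=0.8)

# Label both halves with absolute counts
ticks = ax.get_yticks()
ax.set_yticks(ticks)
ax.set_yticklabels([f'{abs(t):g}' for t in ticks])
```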
Here is a possible approach using seaborn's distplot.
Seaborn doesn't return the created graphical elements, but the ax can be interrogated. To make sure the ax only contains the elements you want upside down, those elements can be drawn first. Then, all the patches (the rectangular bars) and the lines (the curve for the kde) can be given their height in negative. Optionally the x-axis can be set at y == 0 using ax.spines['bottom'].set_position('zero').
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
green = np.random.normal(20, 10, 1000)
blue = np.random.poisson(60, 1000)
fig, ax = plt.subplots(figsize=(8, 6))
sns.distplot(green, hist=True, kde=True, hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 2}, bins=10,
color='green')
for p in ax.patches:  # turn the histogram upside down
    p.set_height(-p.get_height())
for line in ax.lines:  # turn the kde curve upside down
    line.set_ydata(-line.get_ydata())
sns.distplot(blue, hist=True, kde=True, hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 2}, bins=10,
color='blue')
ax.set_xticks(np.arange(-20, 121, 20))
ax.set_yticks(np.arange(0.0, 0.07, 0.01))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
pos_ticks = np.array([t for t in ax.get_yticks() if t > 0])
ticks = np.concatenate([-pos_ticks[::-1], [0], pos_ticks])
ax.set_yticks(ticks)
ax.set_yticklabels([f'{abs(t):.2f}' for t in ticks])
ax.spines['bottom'].set_position('zero')
plt.show()
Using lineplot or stripplot on its own works well. But when I combine the two, the median line is shifted, and I don't understand why. Thank you for your help.
sns.lineplot(x='quality', y='alcohol', data=df, estimator=np.median, err_style=None)
sns.stripplot(x='quality', y='alcohol', data=df, jitter=True, color='red', alpha=0.2, edgecolor='none')
[Figures: stripplot; lineplot+stripplot; lineplot]
What is happening here is that stripplot is a categorical plot: it places the categories at positions 0 to n-1 on the x-axis and relabels those ticks with the category values (here the integers from 3 upward). lineplot, in contrast, draws at the actual numeric x values. On the combined axes, the line's point for x = 3 therefore lands on position 3, which carries the tick label 6. Hence the offset.
One way to correct this is to create an x-axis with a predefined range (0 to n-1) and then plot both charts on this predefined scale; see the example below:
import seaborn as sns
import matplotlib.pyplot as plt
x = [3,4,5,6,7,8]
y = [10, 12, 15, 18, 19, 26]
# First axes reproduces the misalignment
fig, ax = plt.subplots(1, 2)
sns.lineplot(x=x, y=y, ax=ax[0])
sns.stripplot(x=x, y=y, ax=ax[0])
# Second axes shows the correction: the line uses positions 0..n-1
xplot = list(range(len(x)))
sns.lineplot(x=xplot, y=y, ax=ax[1])
sns.stripplot(x=x, y=y, ax=ax[1])
plt.show()
Output:
I currently have 2 subplots using seaborn:
import matplotlib.pyplot as plt
import seaborn.apionly as sns
f, (ax1, ax2) = plt.subplots(2, sharex=True)
sns.distplot(df['Difference'].values, ax=ax1) #array, top subplot
sns.boxplot(df['Difference'].values, ax=ax2, width=.4) #bottom subplot
sns.stripplot([cimin, cimax], color='r', marker='d') #overlay confidence intervals over boxplot
ax1.set_ylabel('Relative Frequency') #label only the top subplot
plt.xlabel('Difference')
plt.show()
Here is the output:
I am rather stumped on how to make ax2 (the bottom figure) shorter relative to ax1 (the top figure). I was looking over the GridSpec documentation (http://matplotlib.org/users/gridspec.html), but I can't figure out how to apply it to seaborn objects.
Question:
How do I make the bottom subplot shorter compared to the top subplot?
Incidentally, how do I move the plot's title "Distribution of Difference" to go above the top subplot?
Thank you for your time.
As @dnalow mentioned, seaborn has no impact on GridSpec, since you pass a reference to the Axes object to the plotting function. Like so:
import matplotlib.pyplot as plt
import seaborn as sns  # seaborn.apionly no longer exists in current seaborn releases
tips = sns.load_dataset("tips")
gridkw = dict(height_ratios=[5, 1])
fig, (ax1, ax2) = plt.subplots(2, 1, gridspec_kw=gridkw)
sns.distplot(tips.loc[:, 'total_bill'], ax=ax1)  # top subplot
sns.boxplot(x=tips.loc[:, 'total_bill'], ax=ax2, width=.4)  # bottom subplot
plt.show()
If you're using a FacetGrid (either directly or through something like catplot, which uses it indirectly), then you can pass gridspec_kws.
Here is an example using a catplot, where "var3" has two values, i.e. there are two subplots, which I am displaying at a ratio of 3:8, with un-shared x-axes.
g = sns.catplot(data=data, x="bin", y="y", col="var3", hue="var4", kind="bar",
sharex=False,
facet_kws={
'gridspec_kws': {'width_ratios': [3, 8]}
})
# Make the first subplot have a custom `xlim`:
g.axes[0][0].set_xlim(right=2.5)
Result, with labels hidden because I just copied my actual data's output, so the labels wouldn't make sense.
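Since the snippet above depends on my own data, here is a self-contained sketch of the same call on a made-up DataFrame (the column names follow the snippet, the values are invented, and the hue is left out to keep it short):

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: "var3" has two values, so catplot draws two subplots
data = pd.DataFrame({
    "var3": ["A"] * 3 + ["B"] * 8,
    "bin": list("abc") + list("abcdefgh"),
    "y": list(range(11)),
})

g = sns.catplot(data=data, x="bin", y="y", col="var3", kind="bar",
                sharex=False,
                facet_kws={"gridspec_kws": {"width_ratios": [3, 8]}})

# Compare the widths of the two facets
left, right = g.axes[0]
width_ratio = left.get_position().width / right.get_position().width
print(width_ratio)
```

Note that FacetGrid ignores gridspec_kws when col_wrap is used.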
Hi, I want to draw a histogram with a boxplot at the top of it, showing Q1, Q2 and Q3 as well as the outliers. An example image is below. (I am using Python and Pandas.)
I have checked several examples using matplotlib.pyplot but could hardly find a good one. I would also like the histogram to have a density curve, as in the image below.
I also tried seaborn, which gave me the density curve along with the histogram, but I didn't find a way to combine it with a boxplot above it.
Can anyone help me do this using matplotlib.pyplot?
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set(style="ticks")
x = np.random.randn(100)
f, (ax_box, ax_hist) = plt.subplots(2, sharex=True,
gridspec_kw={"height_ratios": (.15, .85)})
sns.boxplot(x, ax=ax_box)
sns.distplot(x, ax=ax_hist)
ax_box.set(yticks=[])
sns.despine(ax=ax_hist)
sns.despine(ax=ax_box, left=True)
sns.distplot has been deprecated since seaborn v0.11. Use sns.histplot for axes-level plots instead.
np.random.seed(2022)
x = np.random.randn(100)
f, (ax_box, ax_hist) = plt.subplots(2, sharex=True, gridspec_kw={"height_ratios": (.15, .85)})
sns.boxplot(x=x, ax=ax_box)
sns.histplot(x=x, bins=12, kde=True, stat='density', ax=ax_hist)
ax_box.set(yticks=[])
sns.despine(ax=ax_hist)
sns.despine(ax=ax_box, left=True)
Solution using only matplotlib, just because:
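import numpy as np
import matplotlib.pyplot as plt
data = np.random.randn(100)  # example data; replace with your own 1-d array
# start the plot: 2 rows, because we want the boxplot on the first row
# and the hist on the second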
fig, ax = plt.subplots(
2, figsize=(7, 5), sharex=True,
gridspec_kw={"height_ratios": (.3, .7)} # the boxplot gets 30% of the vertical space
)
# the boxplot
ax[0].boxplot(data, vert=False)
# removing borders
ax[0].spines['top'].set_visible(False)
ax[0].spines['right'].set_visible(False)
ax[0].spines['left'].set_visible(False)
# the histogram
ax[1].hist(data)
# and we are good to go
plt.show()
Expanding on the answer from @mwaskom, I made a little adaptable function.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

def histogram_boxplot(data, xlabel=None, title=None, font_scale=2, figsize=(9, 8), bins=None):
    """ Boxplot and histogram combined
    data: 1-d data array
    xlabel: xlabel
    title: title
    font_scale: the scale of the font (default 2)
    figsize: size of fig (default (9,8))
    bins: number of bins (default None / auto)
    example use: histogram_boxplot(np.random.rand(100), bins = 20, title="Fancy plot")
    """
    sns.set(font_scale=font_scale)
    f2, (ax_box2, ax_hist2) = plt.subplots(2, sharex=True, gridspec_kw={"height_ratios": (.15, .85)}, figsize=figsize)
    sns.boxplot(x=data, ax=ax_box2)
    sns.distplot(data, ax=ax_hist2, bins=bins)  # bins=None lets seaborn pick the bins
    if xlabel:
        ax_hist2.set(xlabel=xlabel)
    if title:
        ax_box2.set(title=title)
    plt.show()

histogram_boxplot(np.random.randn(100), bins=20, title="Fancy plot", xlabel="Some values")
def histogram_boxplot(feature, figsize=(15, 10), bins=None):
    f, (ax_box, ax_hist) = plt.subplots(nrows=2, sharex=True, gridspec_kw={'height_ratios': (.25, .75)}, figsize=figsize)
    sns.distplot(feature, kde=False, ax=ax_hist, bins=bins)
    sns.boxplot(x=feature, ax=ax_box, color='Red')
    ax_hist.axvline(np.mean(feature), color='g', linestyle='-')  # mean
    ax_hist.axvline(np.median(feature), color='y', linestyle='--')  # median