Stripplot and lineplot weird result - python

When I use lineplot or stripplot on its own, it works well. But when I use both together, the median line is shifted, and I don't understand why. Thank you for your help.
sns.lineplot(x='quality', y='alcohol', data=df, estimator=np.median, err_style=None)
sns.stripplot(x='quality', y='alcohol', data=df, jitter=True, color='red', alpha=0.2, edgecolor='none')
[Figures: stripplot alone; lineplot + stripplot; lineplot alone]

What is happening here is that the two functions treat the x axis differently. stripplot is a categorical plot: it places the categories at positions 0 to n-1 and relabels those ticks with the category values (here the integers starting at 3). lineplot is a numeric plot and draws at the actual x values. When both share one axes, the line's x = 3 lands on the fourth categorical position, which is labelled 6. Hence the offset.
One way to correct this is to create an x axis with a predefined range and plot both charts on that scale; see the example below:
import seaborn as sns
import matplotlib.pyplot as plt
x = [3,4,5,6,7,8]
y = [10, 12, 15, 18, 19, 26]
#First axes creates the error in graphing
fig, ax = plt.subplots(1,2)
sns.lineplot(x=x,y=y, ax=ax[0])
sns.stripplot(x=x, y=y, ax=ax[0])
#Second axes shows correction
xplot = range(len(x))
sns.lineplot(x=xplot,y=y, ax=ax[1])
sns.stripplot(x=x, y=y, ax=ax[1])
Output: [figure: left panel shows the offset, right panel shows the corrected plot]
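As a complementary sketch of the same idea, the median line can be drawn directly at the categorical positions 0 to n-1 that stripplot uses. The DataFrame here is made-up stand-in data (the question's df is not shown), and drawing the line with plain ax.plot is a substitute for lineplot, not the original poster's method.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for the question's df (assumption: integer 'quality', float 'alcohol')
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "quality": rng.integers(3, 9, size=300),
    "alcohol": rng.normal(10.5, 1.0, size=300),
})

# One median per category, in sorted category order
medians = df.groupby("quality")["alcohol"].median().sort_index()

# stripplot draws the sorted categories at positions 0, 1, ..., n-1
positions = np.arange(len(medians))

fig, ax = plt.subplots()
sns.stripplot(x="quality", y="alcohol", data=df, color="red", alpha=0.2, ax=ax)

# Plot the medians at the categorical positions, not at the raw quality values
(median_line,) = ax.plot(positions, medians.to_numpy(), marker="o")
```

Because the line is plotted against positions rather than against the quality values, it lines up with the strips whatever the category labels are.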

Related

Aligning subplots with a pyplot barplot and seaborn heatmap

I am attempting to place a Seaborn time-based heatmap on top of a bar chart, indicating the number of patients in each bin/timeframe. I can successfully make an individual heatmap and bar plot, but combining the two does not work as intended.
import pandas as pd
import numpy as np
import seaborn as sb
from matplotlib import pyplot as plt
# Mock data
patient_counts = [650, 28, 8]
missings_df = pd.DataFrame(np.array([[-15.8, 600/650, 580/650, 590/650],
[488.2, 20/23, 21/23, 21/23],
[992.2, 7/8, 8/8, 8/8]]),
columns=['time', 'Resp. (/min)', 'SpO2', 'Blood Pressure'])
missings_df.set_index('time', inplace=True)
# Plot heatmap
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(26, 16), sharex=True, gridspec_kw={'height_ratios': [5, 1]})
sb.heatmap(missings_df.T, cmap="Blues", cbar_kws={"shrink": .8}, ax=ax1, xticklabels=False)
plt.xlabel('Time (hours)')
# Plot line graph under heatmap to show nr. of patients in each bin
x_ticks = [time for time in missings_df.index]
ax2.bar([i for i, _ in enumerate(x_ticks)], patient_counts, align='center')
plt.xticks([i for i, _ in enumerate(x_ticks)], x_ticks)
plt.show()
This code gives me the graph below. As you can see, there are two issues:
The bar plot extends too far
The first and second bar are not aligned with the top graph, where the tick of the first plot does not line up with the centre of the bar either.
I've tried looking online but could not find a good resource to fix the issues. Any ideas?
A problem is that the colorbar takes away space from the heatmap, making its plot narrower than the bar plot. You can create a 2x2 grid to make room for the colorbar, and remove the empty subplot. Change sharex=True to sharex='col' to prevent the colorbar getting the same x-axis as the heatmap.
Another problem is that the heatmap has its cell borders at positions 0, 1, 2, ..., so their centers are at 0.5, 1.5, 2.5, .... You can put the bars at these centers instead of at their default positions:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
fig, ((ax1, cbar_ax), (ax2, dummy_ax)) = plt.subplots(nrows=2, ncols=2, figsize=(26, 16), sharex='col',
gridspec_kw={'height_ratios': [5, 1], 'width_ratios': [20, 1]})
missings_df = np.random.rand(3, 3)
sns.heatmap(missings_df.T, cmap="Blues", cbar_ax=cbar_ax, xticklabels=False, linewidths=2, ax=ax1)
ax2.set_xlabel('Time (hours)')
patient_counts = np.random.randint(10, 50, 3)
x_ticks = ['Time1', 'Time2', 'Time3']
x_tick_pos = [i + 0.5 for i in range(len(x_ticks))]
ax2.bar(x_tick_pos, patient_counts, align='center')
ax2.set_xticks(x_tick_pos)
ax2.set_xticklabels(x_ticks)
dummy_ax.axis('off')
plt.tight_layout()
plt.show()
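The statement about cell borders can be checked on a small heatmap. This is only an illustrative sketch on random data; the variable names are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

data = np.random.rand(3, 4)  # 3 rows, 4 columns of made-up values
fig, ax = plt.subplots()
sns.heatmap(data, cbar=False, ax=ax)

# The x axis spans 0..ncols, so cell i covers [i, i + 1] and is centred at i + 0.5
x_min, x_max = ax.get_xlim()
centers = [i + 0.5 for i in range(data.shape[1])]
print(x_min, x_max, centers)
```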
PS: Be careful not to mix the "functional" interface with the "object-oriented" interface to matplotlib. So, try not to use plt.xlabel() as it is not obvious that it will be applied to the "current" ax (ax2 in the code of the question).
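A minimal sketch of that difference, with made-up axes names: the pyplot call goes to whichever axes is current, while the object-oriented call names its target.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

fig, (ax_top, ax_bottom) = plt.subplots(nrows=2)

# pyplot functions act on whichever axes is "current"
plt.sca(ax_top)                # make the top axes current
plt.xlabel("set via pyplot")   # lands on ax_top, because it is current

# The object-oriented call names its target explicitly
ax_bottom.set_xlabel("set via ax_bottom")
```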

How to plot a paired histogram using seaborn

I would like to make a paired histogram like the one shown here using the seaborn distplot.
This kind of plot can also be referred to as the back-to-back histogram shown here, or a bihistogram inverted/mirrored along the x-axis as discussed here.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
green = np.random.normal(20,10,1000)
blue = np.random.poisson(60,1000)
fig, ax = plt.subplots(figsize=(8,6))
sns.distplot(blue, hist=True, kde=True, hist_kws={'edgecolor':'black'}, kde_kws={'linewidth':2}, bins=10, color='blue')
sns.distplot(green, hist=True, kde=True, hist_kws={'edgecolor':'black'}, kde_kws={'linewidth':2}, bins=10, color='green')
ax.set_xticks(np.arange(-20,121,20))
ax.set_yticks(np.arange(0.0,0.07,0.01))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()
Here is the output:
When I use the method discussed here (plt.barh), I get the bar plot shown just below, which is not what I am looking for.
Or maybe I haven't understood the workaround well enough...
A simple/short implementation of python-seaborn-distplot similar to these kinds of plots would be perfect. I edited the figure of my first plot above to show the kind of plot I hope to achieve (though y-axis not upside down):
Any leads would be greatly appreciated.
You could use two subplots and invert the y-axis of the lower one and plot with the same bins.
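import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'a': np.random.normal(0,5,1000), 'b': np.random.normal(20,5,1000)})
fig = plt.figure(figsize=(5,5))
ax = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
# Use the same bin edges for both histograms
bins = np.arange(-20,40)
ax.hist(df['a'], bins=bins)
ax2.hist(df['b'],color='orange', bins=bins)
# Flip the lower histogram so the two mirror each other
ax2.invert_yaxis()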
edit:
improvements suggested by @mwaskom
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(5,5))
bins = np.arange(-20,40)
for ax, column, color, invert in zip(axes.ravel(), df.columns, ['teal', 'orange'], [False,True]):
    ax.hist(df[column], bins=bins, color=color)
    if invert:
        ax.invert_yaxis()
plt.subplots_adjust(hspace=0)
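The same mirrored layout can also be sketched on a single axes by computing the counts with shared bins and drawing one series with negative heights. This is an alternative, not the answer's method; the sample data and names below are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

rng = np.random.default_rng(0)
a = rng.normal(0, 5, 1000)    # made-up sample data
b = rng.normal(20, 5, 1000)

bins = np.arange(-20, 41)     # shared bin edges for both samples
counts_a, _ = np.histogram(a, bins=bins)
counts_b, _ = np.histogram(b, bins=bins)
widths = np.diff(bins)

fig, ax = plt.subplots(figsize=(5, 5))
ax.bar(bins[:-1], counts_a, width=widths, align="edge", color="teal")
ax.bar(bins[:-1], -counts_b, width=widths, align="edge", color="orange")  # mirrored
ax.axhline(0, color="black", linewidth=0.8)

# Show the mirrored counts as positive numbers on the y axis
ax.yaxis.set_major_formatter(FuncFormatter(lambda value, pos: f"{abs(value):g}"))
```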
Here is a possible approach using seaborn's distplot.
Seaborn doesn't return the created graphical elements, but the ax can be interrogated. To make sure the ax only contains the elements you want upside down, those elements can be drawn first. Then, all the patches (the rectangular bars) and the lines (the curve for the kde) can be given their height in negative. Optionally the x-axis can be set at y == 0 using ax.spines['bottom'].set_position('zero').
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
green = np.random.normal(20, 10, 1000)
blue = np.random.poisson(60, 1000)
fig, ax = plt.subplots(figsize=(8, 6))
sns.distplot(green, hist=True, kde=True, hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 2}, bins=10,
             color='green')
for p in ax.patches:  # turn the histogram upside down
    p.set_height(-p.get_height())
for l in ax.lines:  # turn the kde curve upside down
    l.set_ydata(-l.get_ydata())
sns.distplot(blue, hist=True, kde=True, hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 2}, bins=10,
             color='blue')
ax.set_xticks(np.arange(-20, 121, 20))
ax.set_yticks(np.arange(0.0, 0.07, 0.01))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
pos_ticks = np.array([t for t in ax.get_yticks() if t > 0])
ticks = np.concatenate([-pos_ticks[::-1], [0], pos_ticks])
ax.set_yticks(ticks)
ax.set_yticklabels([f'{abs(t):.2f}' for t in ticks])
ax.spines['bottom'].set_position('zero')
plt.show()

Superimposition of histogram and density in Pandas/Matplotlib in Python

I've got a Pandas dataframe named clean which contains a column v for which I would like to draw a histogram and superimpose a density plot. I know I can plot one under the other this way:
import pandas as pd
import matplotlib.pyplot as plt
Maxv=200
plt.subplot(211)
plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")
plt.subplot(212)
ax=clean['v'].plot(kind='density')
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)
But when I try to superimpose them, the y scales don't match (and I lose the y-axis ticks and labels):
yhist, xhist, _hist = plt.hist(clean['v'],bins=40, range=(0, Maxv), color='g')
plt.ylabel("Number")
ax=clean['v'].plot(kind='density') #I would like to insert here a normalization to max(yhist)/max(ax)
ax.set_xlim(0, Maxv)
plt.xlabel("Orbital velocity (km/s)")
ax.get_yaxis().set_visible(False)
Any hints? (Additional question: how can I change the width of the density smoothing?)
Based on your code, this should work:
ax = clean.v.plot(kind='hist', bins=40, density=True)  # density=True replaces the removed normed=True
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
ax.set(xlim=[0, Maxv])
You might not even need the secondary_y anymore.
Now I tried this:
ax = clean.v.plot(kind='hist', bins=40, range=(0, Maxv))
clean.v.plot(kind='kde', ax=ax, secondary_y=True)
But the range part doesn't work, and there's still the left y-axis problem.
Seaborn makes this easy
import seaborn as sns
sns.distplot(df['numeric_column'],bins=25)
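To keep the histogram in counts and still overlay a density, the curve can be rescaled by the number of samples times the bin width. The sketch below uses scipy's gaussian_kde, which is a substitute for the pandas kde plot in the question, and made-up data in place of clean['v']. A scalar bw_method is used as the bandwidth factor, which addresses the smoothing-width question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde

# Made-up stand-in for the question's clean['v'] column
rng = np.random.default_rng(0)
clean = pd.DataFrame({"v": rng.normal(100, 25, 1000)})
Maxv = 200

fig, ax = plt.subplots()
counts, edges, _ = ax.hist(clean["v"], bins=40, range=(0, Maxv), color="g")
bin_width = edges[1] - edges[0]

# A scalar bw_method is the bandwidth factor (multiplied by the data's
# standard deviation): smaller values give a less smooth curve
kde = gaussian_kde(clean["v"].to_numpy(), bw_method=0.3)
grid = np.linspace(0, Maxv, 400)

# density * (number of samples in range) * (bin width) puts the curve in count units
kde_counts = kde(grid) * counts.sum() * bin_width
ax.plot(grid, kde_counts, color="k")

ax.set_xlim(0, Maxv)
ax.set_xlabel("Orbital velocity (km/s)")
ax.set_ylabel("Number")
```

Both artists share one y axis, so the left-hand ticks and labels stay in place.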

Plotting multiple series on a line/bar graph with pandas

I'm trying to make a plot of a line and bar on the same graph. I'm close, but I can't solve a few items. Here's what I have so far...
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
data = pd.DataFrame({'Value1': np.arange(80, 180, 1),
'Value2': np.arange(1.5, .5, -0.01)},
index=np.arange(10, 110, 1))
fig, ax = plt.subplots(figsize=(10, 10))
data['Value1'].plot(ax=ax)
ax2 = ax.twinx()
data['Value2'].plot(kind='bar', ax=ax2, color='y', ylim=(0, 3))
So the problems I have with this graph are...
The x-ticks look awful. If I only do a line graph, the x-ticks look fine. As soon as I add the twinx axis, however, the major/minor tick logic gets dropped. How can I keep that?
My x-axis is numeric. Note that the line intercepts the x-axis at the value "10" (it's hard to see, but that's what's going on). I presume this is because the line's x-axis is supposed to begin at "10" and the bar's x-axis begins at 10 as well, but there's confusion between value and label, so the line's x-axis gets pushed over to the label "20".
What's the best way to do this?
The bar plot and the line plot use different x-coordinate ranges, so consider using two x axes.
You can try saving the xticks and xticklabels after data['Value1'].plot(ax=ax) and setting them back after data['Value2'].plot(kind='bar', ax=ax2, color='y', ylim=(0, 3)):
data['Value1'].plot(ax=ax)
xticks = ax.get_xticks()
xlabels = [x.get_text() for x in ax.get_xticklabels()]
ax2 = ax.twinx()
data['Value2'].plot(kind='bar', ax=ax2, color='y', ylim=(0, 3))
ax.set_xticks(xticks)
ax.set_xticklabels(xlabels)
plt.show()
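Another option, sketched here as an alternative to the answer above, is to draw the bars with matplotlib's own bar, which uses the real index values as x positions. The pandas kind='bar' plot instead puts the bars at 0, 1, 2, ... and relabels the ticks. np.linspace stands in for the question's float np.arange so that the column length is exactly 100.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.DataFrame({"Value1": np.arange(80, 180, 1),
                     "Value2": np.linspace(1.5, 0.51, 100)},
                    index=np.arange(10, 110, 1))

fig, ax = plt.subplots(figsize=(10, 10))
ax2 = ax.twinx()

# Bars and line both use the numeric index as x, so they line up
bars = ax2.bar(data.index, data["Value2"], width=0.8, color="y")
ax2.set_ylim(0, 3)
(line,) = ax.plot(data.index, data["Value1"])

# Draw the line axes on top of the bar axes
ax.set_zorder(ax2.get_zorder() + 1)
ax.patch.set_visible(False)
```

Since no categorical relabelling happens, the default numeric tick locator stays in charge of the x axis.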

Use matplotlib: plot error bars on two y axes

I'd like to plot a series with x and y error bars, then plot a second series with x and y error bars on a second y axis all on the same subplot. Can this be done with matplotlib?
import matplotlib.pyplot as plt
plt.figure()
ax1 = plt.errorbar(voltage, dP, xerr=voltageU, yerr=dPU)
ax2 = plt.errorbar(voltage, current, xerr=voltageU, yerr=currentU)
plt.show()
Basically, I'd like to put ax2 on a second axis and have the scale on the right side.
Thanks!
twinx() is your friend for adding a secondary y-axis, e.g.:
import matplotlib.pyplot as pl
import numpy as np
pl.figure()
ax1 = pl.gca()
ax1.errorbar(np.arange(10), np.arange(10), xerr=np.random.random(10), yerr=np.random.random(10), color='g')
ax2 = ax1.twinx()
ax2.errorbar(np.arange(10), np.arange(10)+5, xerr=np.random.random(10), yerr=np.random.random(10), color='r')
There is not a lot of documentation except for:
matplotlib.pyplot.twinx(ax=None)
Make a second axes that shares the x-axis. The new axes will overlay ax (or the current axes if ax is None). The ticks for ax2 will be placed on the right, and the ax2 instance is returned.
I was struggling to share the x-axis, but thank you @Bart, you saved me!
The simple solution is to use twiny instead of twinx:
ax1.errorbar(layers, scores_means[str(epoch)][h,:],np.array(scores_stds[str(epoch)][h,:]))
# Make the x-axis label, ticks and tick labels match the line color.
ax1.set_xlabel('depth', color='b')
ax1.tick_params('x', colors='b')
ax2 = ax1.twiny()
ax2.errorbar(hidden_dim, scores_means[str(epoch)][:,l], np.array(scores_stds[str(epoch)][:,l]))
ax2.set_xlabel('width', color='r')
ax2.tick_params('x', colors='r')
fig.tight_layout()
plt.show()
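Since the snippet above depends on variables that are not shown, here is a self-contained sketch contrasting the two calls on made-up data: twinx shares the x axis and adds a y axis on the right, while twiny shares the y axis and adds an x axis on top.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
rng = np.random.default_rng(0)

fig, (ax_a, ax_b) = plt.subplots(ncols=2, figsize=(10, 4))

# twinx: shared x axis, second y axis on the right
ax_right = ax_a.twinx()
ax_a.errorbar(x, x, xerr=rng.random(10), yerr=rng.random(10), color="g")
ax_right.errorbar(x, 10 * x + 5, xerr=rng.random(10), yerr=rng.random(10), color="r")
ax_a.set_xlim(-2, 12)   # also applies to ax_right, because x is shared

# twiny: shared y axis, second x axis on top
ax_top = ax_b.twiny()
ax_b.errorbar(x, x, yerr=rng.random(10), color="b")
ax_top.errorbar(100 * x, x, yerr=rng.random(10), color="r")
ax_b.set_ylim(-2, 12)   # also applies to ax_top, because y is shared

fig.tight_layout()
```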
