Shift bar locations on multi-bar bar plot - python

Much searching has not yielded a working solution to a Python matplotlib problem. I'm sure I'm missing something simple...
MWE:
import pandas as pd
import matplotlib.pyplot as plt
#MWE plot
T = [1, 2, 3, 4, 5, 6]
n = len(T)
d1 = list(zip([500]*n, [250]*n))
d2 = list(zip([250]*n, [125]*n))
df1 = pd.DataFrame(data=d1, index=T)
df2 = pd.DataFrame(data=d2, index=T)
fig = plt.figure()
ax = fig.add_subplot(111)
df1.plot(kind='bar', stacked=True, align='edge', width=-0.4, ax=ax)
df2.plot(kind='bar', stacked=True, align='edge', width=0.4, ax=ax)
plt.show()
Generates:
[Figure: Shifted Plot]
No matter what parameters I play around with, that first bar is cut off on the left. If I only plot a single bar (i.e. not clusters of bars), the bars are not cut off and in fact there is nice even white space on both sides.
I hard-coded the data for this MWE; however, I am trying to find a generic way to ensure the correct alignment since I will likely produce a LOT of these plots with varying numbers of items on the x axis and potentially a varying number of bars in each cluster.
How do I shift the bars so that they are spaced correctly on the x axis with even white space?

The cut-off depends on the bar widths you pass to your plots. Set the x-limits explicitly with xlim so that they cover the full width of the first and last clusters:
import pandas as pd
import matplotlib.pyplot as plt
#MWE plot
T = [1, 2, 3, 4, 5, 6]
n = len(T)
d1 = list(zip([500]*n, [250]*n))
d2 = list(zip([250]*n, [125]*n))
df1 = pd.DataFrame(data=d1, index=T)
df2 = pd.DataFrame(data=d2, index=T)
fig = plt.figure()
ax = fig.add_subplot(111)
df1.plot(kind='bar', stacked=True, align='edge', width=-0.4, ax=ax)
df2.plot(kind='bar', stacked=True, align='edge', width=0.4, ax=ax)
plt.xlim(-.4,5.4)
plt.show()
Hope it works!
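For a generic version, the limits can be computed from the number of categories instead of being hard-coded. The sketch below assumes the clusters sit at integer positions 0, 1, ..., n - 1 (consistent with the limits used above) and that each cluster extends half_width to either side. The names cluster_xlim, half_width and margin are hypothetical helpers for illustration, not part of pandas or matplotlib.

```python
def cluster_xlim(n_categories, half_width=0.4, margin=0.1):
    """Return (left, right) x-limits that fully contain every bar cluster.

    Assumes clusters are centred on 0, 1, ..., n_categories - 1 and that
    each cluster reaches half_width to the left and right of its centre.
    """
    if n_categories < 1:
        raise ValueError("need at least one category")
    pad = half_width + margin
    return (-pad, (n_categories - 1) + pad)


# Six categories, bars of width 0.4 on each side, 0.1 of extra white space.
left, right = cluster_xlim(6)
print(left, right)
```

With the plot above this could be applied as ax.set_xlim(*cluster_xlim(len(T))). Passing margin=0 reproduces the hard-coded limits (-0.4, 5.4), and a larger margin adds equal white space on both sides.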

Related

Matplotlib stacked histogram numpy.ndarray error

I am trying to make a stacked histogram using matplotlib by looping through the categories in the dataframe and assigning the bar color based on a dictionary.
I get this error on the ax1.hist() call. How should I fix it?
AttributeError: 'numpy.ndarray' object has no attribute 'hist'
Reproducible Example
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
%matplotlib inline
plt.style.use('seaborn-whitegrid')
y = [1,5,9,2,4,2,5,6,1]
cat = ['A','B','B','B','A','B','B','B','B']
df = pd.DataFrame(list(zip(y,cat)), columns =['y', 'cat'])
fig, axes = plt.subplots(3,3, figsize=(5,5), constrained_layout=True)
fig.suptitle('Histograms')
ax1 = axes[0]
mycolorsdict = {'A':'magenta', 'B':'blue'}
for key, batch in df.groupby(['cat']):
    ax1.hist(batch.y, label=key, color=mycolorsdict[key],
             density=False, cumulative=False, edgecolor='black',
             orientation='horizontal', stacked=True)
Updated effort, still not working
This is close, but it is not stacking (should see stacks at y=5); I think maybe because of the loop?
mycolorsdict = {'A':'magenta', 'B':'blue'}
for ii, ax in enumerate(axes.flat):
    for key, batch in df.groupby(['cat']):
        ax.hist(batch.y,
                label=key, color=mycolorsdict[key], density=False, edgecolor='black',
                cumulative=False, orientation='horizontal', stacked=True)
To draw on a specific subplot, two indices are needed (row, column), so axes[0,0] for the first subplot. The error message comes from using ax1 = axes[0] instead of ax1 = axes[0,0].
Now, to create a stacked histogram via ax.hist(), all the y-data need to be provided in a single call. The code below shows how this can be done starting from the result of groupby. Also note that when your values are discrete, it is important to set the bin boundaries explicitly so that the values fall strictly between these boundaries. Setting the boundaries at the halves (0.5, 1.5, ...) is one way.
Things can be simplified a lot using seaborn's histplot(). Here is a breakdown of the parameters used:
data=df the dataframe
y='y' gives the dataframe column for histogram. Use x= (instead of y=) for a vertical histogram.
hue='cat' gives the dataframe column to create multiple groups
palette=mycolorsdict; the palette defines the coloring; there are many ways to assign a palette, one of which is a dictionary on the hue values
discrete=True: when working with discrete data, seaborn sets the appropriate bin boundaries
multiple='stack' creates a stacked histogram, depending on the hue categories
alpha=1: by default seaborn sets an alpha of 0.75; optionally this can be changed
ax=axes[0, 1]: draw on the 2nd subplot of the 1st row
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-whitegrid')  # named 'seaborn-v0_8-whitegrid' in matplotlib >= 3.6
y = [1, 5, 9, 2, 4, 2, 5, 6, 1]
cat = ['A', 'B', 'B', 'B', 'A', 'B', 'B', 'B', 'B']
df = pd.DataFrame({'y':y, 'cat':cat})
fig, axes = plt.subplots(3, 3, figsize=(20, 10), constrained_layout=True)
fig.suptitle('Histograms')
mycolorsdict = {'A': 'magenta', 'B': 'blue'}
# group by a single column name (not a list) so each key is 'A' or 'B', not a tuple
groups = df.groupby('cat')
axes[0, 0].hist([batch.y for _, batch in groups],
                label=[key for key, _ in groups], color=[mycolorsdict[key] for key, _ in groups],
                density=False, edgecolor='black',
                cumulative=False, orientation='horizontal', stacked=True, bins=np.arange(0.5, 10))
axes[0, 0].legend()
sns.histplot(data=df, y='y', hue='cat', palette=mycolorsdict, discrete=True, multiple='stack', alpha=1, ax=axes[0, 1])
plt.show()
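The half-integer bin boundaries used above (np.arange(0.5, 10)) can also be derived from the data rather than typed by hand. This is a minimal sketch, assuming the values are integers; discrete_bin_edges is a hypothetical helper name, not a matplotlib or seaborn function.

```python
def discrete_bin_edges(values):
    """Return unit-width bin edges at the halves around integer data.

    Every integer between min(values) and max(values) ends up in the
    middle of its own bin, so no value sits on a bin boundary.
    """
    lo, hi = min(values), max(values)
    # hi - lo + 1 bins need hi - lo + 2 edges
    return [lo - 0.5 + i for i in range(hi - lo + 2)]


y = [1, 5, 9, 2, 4, 2, 5, 6, 1]
edges = discrete_bin_edges(y)
print(edges[0], edges[-1], len(edges))
```

The resulting list can be passed as bins=edges to ax.hist(); for the y values of this example it covers the same range as np.arange(0.5, 10).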

Aligning subplots with a pyplot barplot and seaborn heatmap

I am attempting to place a Seaborn time-based heatmap on top of a bar chart, indicating the number of patients in each bin/timeframe. I can successfully make an individual heatmap and bar plot, but combining the two does not work as intended.
import pandas as pd
import numpy as np
import seaborn as sb
from matplotlib import pyplot as plt
# Mock data
patient_counts = [650, 28, 8]
missings_df = pd.DataFrame(np.array([[-15.8, 600/650, 580/650, 590/650],
[488.2, 20/23, 21/23, 21/23],
[992.2, 7/8, 8/8, 8/8]]),
columns=['time', 'Resp. (/min)', 'SpO2', 'Blood Pressure'])
missings_df.set_index('time', inplace=True)
# Plot heatmap
fig, (ax1, ax2) = plt.subplots(nrows=2, figsize=(26, 16), sharex=True, gridspec_kw={'height_ratios': [5, 1]})
sb.heatmap(missings_df.T, cmap="Blues", cbar_kws={"shrink": .8}, ax=ax1, xticklabels=False)
plt.xlabel('Time (hours)')
# Plot line graph under heatmap to show nr. of patients in each bin
x_ticks = [time for time in missings_df.index]
ax2.bar([i for i, _ in enumerate(x_ticks)], patient_counts, align='center')
plt.xticks([i for i, _ in enumerate(x_ticks)], x_ticks)
plt.show()
This code gives me the graph below. As you can see, there are two issues:
The bar plot extends too far
The first and second bar are not aligned with the top graph, where the tick of the first plot does not line up with the centre of the bar either.
I've tried looking online but could not find a good resource to fix the issues. Any ideas?
A problem is that the colorbar takes away space from the heatmap, making its plot narrower than the bar plot. You can create a 2x2 grid to make room for the colorbar, and remove the empty subplot. Change sharex=True to sharex='col' to prevent the colorbar getting the same x-axis as the heatmap.
Another problem is that the heatmap has its cell borders at positions 0, 1, 2, ..., so their centers are at 0.5, 1.5, 2.5, .... You can put the bars at these centers instead of at their default positions:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
fig, ((ax1, cbar_ax), (ax2, dummy_ax)) = plt.subplots(nrows=2, ncols=2, figsize=(26, 16), sharex='col',
gridspec_kw={'height_ratios': [5, 1], 'width_ratios': [20, 1]})
missings_df = np.random.rand(3, 3)
sns.heatmap(missings_df.T, cmap="Blues", cbar_ax=cbar_ax, xticklabels=False, linewidths=2, ax=ax1)
ax2.set_xlabel('Time (hours)')
patient_counts = np.random.randint(10, 50, 3)
x_ticks = ['Time1', 'Time2', 'Time3']
x_tick_pos = [i + 0.5 for i in range(len(x_ticks))]
ax2.bar(x_tick_pos, patient_counts, align='center')
ax2.set_xticks(x_tick_pos)
ax2.set_xticklabels(x_ticks)
dummy_ax.axis('off')
plt.tight_layout()
plt.show()
PS: Be careful not to mix the "functional" interface with the "object-oriented" interface to matplotlib. So, try not to use plt.xlabel() as it is not obvious that it will be applied to the "current" ax (ax2 in the code of the question).

Overlayed seaborn distplots sharing x axis

Sorry if this is too basic; this is my first question on the forum.
I'm using the Titanic dataset for practice, and I'm trying to plot two distributions of the variable 'Age': one with only the passengers who survived and another with the passengers who perished. For some reason, they don't share the same x-axis when plotted together.
Here's my code so far:
dfage = df[df['Age'].notnull()]
dfage_survived = dfage[dfage.Survived == 1]
dfage_perished = dfage[dfage.Survived == 0]
sns.set(style="white", palette="muted", color_codes=True)
fig = plt.figure(constrained_layout=True, figsize=(8, 8))
spec = fig.add_gridspec(3, 2)
ax1 = fig.add_subplot(spec[0, 0])
ax1 = sns.barplot(x='Sex', y = 'Survived', data =df)
ax2 = fig.add_subplot(spec[0, 1])
ax2 = sns.barplot(x='Embarked', y = 'Survived', data =df)
ax3 = fig.add_subplot(spec[1, 0])
ax3 = sns.barplot(x='Pclass', y ='Survived', data =df)
ax4 = fig.add_subplot(spec[1, 1])
ax4 = sns.barplot(x='SibSp', y ='Survived', data=df)
ax5 = fig.add_subplot(spec[2, :])
ax5_1 = sns.distplot(dfage_survived['Age'], kde = False, label = 'Survived')
ax5_2 = sns.distplot(dfage_perished['Age'], kde = False, label = 'Perished')
plt.legend(prop={'size': 12})
OUTPUT:
You must set bins for each sns.distplot call. Otherwise seaborn chooses the bins for you from the minimum and maximum of each dataset, and since these differ between the perished and survived groups, the bars won't line up. Use the bins parameter to set appropriate bins (see https://seaborn.pydata.org/generated/seaborn.distplot.html).
The bins of the histogram are dividing the range between the smallest and largest x into equal parts. Both sets have different minimal and maximal values. Moreover, your data is discrete, so the bin boundaries should best be placed in-between the integer values. The bins can be set explicitly: sns.distplot(..., bins=np.arange(-0.5, 86, 5)) for both.
A simpler approach, however, is to make use of Seaborn's hue= parameter to make seaborn take care of dividing the groups and creating both histograms in one go.
Note that sns.distplot has been deprecated in favor of sns.histplot since seaborn 0.11. If you want both histograms stacked, you can add the parameter multiple='stack'.
To obtain a stand-alone example, the code below uses the standard Seaborn Titanic dataset, which uses the column names in lowercase.
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
df = sns.load_dataset('titanic')
sns.set(style="white", palette="muted", color_codes=True)
fig = plt.figure(constrained_layout=True, figsize=(8, 3))
spec = fig.add_gridspec(1, 2)
ax5 = fig.add_subplot(spec[0, :])
sns.histplot(df, x='age', bins=np.arange(-0.5, 86, 5), kde=False, hue='survived', legend=True, ax=ax5)
ax5.legend(['Yes', 'No'], title='Survived?', prop={'size': 12})
plt.show()
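If the two groups are plotted with separate calls, the key point from the answers above is that both calls must receive the same bin edges. The sketch below computes one set of edges from the combined range of all groups; shared_bin_edges, bin_width and offset are hypothetical names, and the sample ages are made up for illustration.

```python
import math


def shared_bin_edges(groups, bin_width=5, offset=0.5):
    """Return one list of equal-width bin edges covering every group.

    The first edge is placed `offset` below the smallest value rounded
    down, so integer values never fall exactly on a boundary.
    """
    lo = math.floor(min(min(g) for g in groups)) - offset
    hi = max(max(g) for g in groups)
    n_bins = max(1, math.ceil((hi - lo) / bin_width))
    return [lo + i * bin_width for i in range(n_bins + 1)]


# Made-up ages for two groups with different minima and maxima.
survived_ages = [1, 22, 38, 54]
perished_ages = [2, 19, 35, 80]
edges = shared_bin_edges([survived_ages, perished_ages])
print(edges[0], edges[-1])
```

Passing bins=edges to both plotting calls makes the bars of the two histograms line up, because neither call falls back to its own minimum and maximum.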

Is there a way to plot 3 (or more) charts on the same line

I have the dataframe df = pd.DataFrame(x, columns = ['col1', 'col2', 'col3']). I know I can use sns.distplot(df.col1) to plot a histogram for each column. Can I plot separate histograms for all 3 columns on the same line?
I suggest that you should use matplotlib here:
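import matplotlib.pyplot as plt
# plt.subplots returns a (figure, axes) pair; draw on each Axes, not on the figure
fig, axes = plt.subplots(nrows=1, ncols=3)
axes[0].hist(df['col1'])
axes[1].hist(df['col2'])
axes[2].hist(df['col3'])
plt.show()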
I think this will work well. For external reference you can go to https://www.educative.io/edpresso/what-is-a-subplots-in-matplotlib
An alternative:
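from matplotlib import pyplot as plt
import seaborn as sns
fig = plt.figure()
# one row, three columns: one Axes per column of df
ax1 = fig.add_subplot(1, 3, 1)
ax2 = fig.add_subplot(1, 3, 2)
ax3 = fig.add_subplot(1, 3, 3)
sns.distplot(df.col1, ax=ax1)
sns.distplot(df.col2, ax=ax2)
sns.distplot(df.col3, ax=ax3)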
Edited:
To adjust the size of the subplots, use plt.subplots (see its documentation):
fig, axs = plt.subplots(1, 3, gridspec_kw={'width_ratios': [3, 1, 1]})
sns.distplot(df.col1, ax=axs[0])
sns.distplot(df.col2, ax=axs[1])
sns.distplot(df.col3, ax=axs[2])
The subplots will have 3:1:1 width ratios.

seaborn barplot add xticks for hue

Seaborn barplots have an xtick for every unique value along the x coordinate. But for the hue values, there are no ticks:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(columns=["model", "time", "value"])
df["model"] = ["on"]*2 + ["off"]*2
df["time"] = ["short", "long"] * 2
df["value"] = [1, 10, 2, 4]
fig, ax = plt.subplots()
bar = sns.barplot(data=df, x="model", hue="time", y="value", edgecolor="white")
Is it possible to add ticks for the hue, too?
Some of the hue colors are quite similar and I would like to add a text description, too.
You have to be careful about the number of hues that you might have in your dataset, and the number of categories and so forth.
If you have N categories, they are plotted at axis coordinates 0, 1, ..., N-1, and the bars for the various hues are centered around each of these coordinates. For 2 hues, as in your example, the bars are at x ± 0.2:
fig, ax = plt.subplots()
bar = sns.barplot(data=df, x="model", hue="time", y="value", edgecolor="white")
ax.set_xticks([-0.2,0.2, 0.8,1.2])
ax.set_xticklabels(["on/short","on/long",'off/short','off/long'])
Note that I would strongly recommend that you use order= and hue_order= in your call to barplot() to be sure that your labels match the bars.
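For a different number of hues, the tick positions can be computed instead of hard-coded. This is a sketch under two assumptions: the group of bars at each category has seaborn's default total width of 0.8, and the bars are dodged side by side with no gap. hue_tick_positions is a hypothetical helper name, not a seaborn function.

```python
def hue_tick_positions(categories, hues, group_width=0.8):
    """Return (positions, labels) with one tick per category/hue bar.

    Category i is centred on x = i; its len(hues) bars share group_width
    equally, so bar j is centred at
    i - group_width / 2 + (j + 0.5) * bar_width.
    """
    bar_width = group_width / len(hues)
    positions, labels = [], []
    for i, cat in enumerate(categories):
        for j, hue in enumerate(hues):
            positions.append(i - group_width / 2 + (j + 0.5) * bar_width)
            labels.append(f"{cat}/{hue}")
    return positions, labels


# The same order must be passed to barplot() via order= and hue_order=.
positions, labels = hue_tick_positions(["on", "off"], ["short", "long"])
print(labels)
```

The two lists would then go to ax.set_xticks(positions) and ax.set_xticklabels(labels). For the two-hue example above the positions come out at approximately -0.2, 0.2, 0.8 and 1.2, matching the hard-coded values.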
