In version 3.4, matplotlib added automatic bar labels:
https://matplotlib.org/stable/users/whats_new.html#new-automatic-labeling-for-bar-charts
I'm trying to use this on a bar plot generated by Seaborn.
fig, axs = plt.subplots(
    nrows=2,
)
for i, col in enumerate(['col_1', 'col_2']):
    ax = axs[i]
    sns.barplot(
        x="class",
        y=col,
        hue="hue_col",
        data=data_df,
        edgecolor=".3",
        linewidth=0.5,
        ax=ax
    )
    ax.bar_label(ax.containers[i])  # Doesn't work
What do I need to do to make this work?
You can loop through the containers and call ax.bar_label(...) for each of them. Note that seaborn creates one set of bars for each hue value.
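The following example uses the titanic dataset and sets ci=None to avoid the error bars overlapping with the text (if error bars are needed, one could set a lighter color, e.g. errcolor='gold'). In seaborn 0.12 and later, ci is deprecated in favor of errorbar, so errorbar=None is the equivalent setting there.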
import seaborn as sns
import matplotlib.pyplot as plt
titanic = sns.load_dataset('titanic')
fig, axs = plt.subplots(ncols=2, figsize=(12, 4))
for ax, col in zip(axs, ['age', 'fare']):
    sns.barplot(
        x='sex',
        y=col,
        hue="class",
        data=titanic,
        edgecolor=".3",
        linewidth=0.5,
        ci=None,
        ax=ax
    )
    ax.set_title('mean ' + col)
    ax.margins(y=0.1)  # make room for the labels
    for bars in ax.containers:
        ax.bar_label(bars, fmt='%.1f')
plt.tight_layout()
plt.show()
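To see why looping over ax.containers is needed, here is a minimal sketch in plain matplotlib (no seaborn, made-up data). It assumes that each ax.bar call stands in for one hue level, since each call creates its own container of bars. The names groups, values and label_texts are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

groups = ["a", "b", "c"]                      # hypothetical x categories
x = np.arange(len(groups))
values = {"first": [1.0, 2.5, 3.0],           # hypothetical "hue" levels
          "second": [2.0, 1.5, 4.0]}

fig, ax = plt.subplots()
width = 0.4
for k, (name, heights) in enumerate(values.items()):
    # each call adds one BarContainer to ax.containers
    ax.bar(x + k * width, heights, width=width, label=name)

label_texts = []
for bars in ax.containers:
    texts = ax.bar_label(bars, fmt="%.1f")    # returns the created Text objects
    label_texts.append([t.get_text() for t in texts])

print(len(ax.containers), label_texts)
```

Indexing a single container, as in ax.containers[i], labels only one of these groups, which is why the loop is used.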
Related
I'd like to represent two datasets on the same plot, one as a line and one as a binned bar plot. I can do each individually:
tobar = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
tobar["bins"] = pd.qcut(tobar.index, 20)
bp = sns.barplot(data=tobar, x="bins", y="value")
toline = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
lp = sns.lineplot(data=toline, x=toline.index, y="value")
But when I try to combine them, of course the x axis gets messed up:
fig, ax = plt.subplots()
ax2 = ax.twinx()
bp = sns.barplot(data=tobar, x="bins", y="value", ax=ax)
lp = sns.lineplot(data=toline, x=toline.index, y="value", ax=ax2)
bp.set(xlabel=None)
I also can't seem to get rid of the bin labels.
How can I get both pieces of information on one plot?
This answer explains why it's better to plot the bars with matplotlib.axes.Axes.bar instead of sns.barplot or pandas.DataFrame.bar.
In short, the xtick locations correspond to the actual numeric value of the label, whereas the xticks for seaborn and pandas are 0 indexed, and don't correspond to the numeric value.
This answer shows how to add bar labels.
ax2 = ax.twinx() can be used for the line plot if needed
Works the same if the line plot is different data.
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
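The tick-location point can be checked with a small sketch. It assumes that matplotlib's own handling of string categories is a fair stand-in for the 0-indexed positions used by seaborn and pandas bar plots; the values in mids and heights are made up.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

mids = [25, 75, 125]         # hypothetical interval mid-points
heights = [1.0, 2.0, 3.0]    # hypothetical bar heights

fig, (ax_num, ax_cat) = plt.subplots(ncols=2)
ax_num.bar(mids, heights, width=40)              # numeric x: bars sit at 25, 75, 125
ax_cat.bar([str(m) for m in mids], heights)      # categorical x: bars sit at 0, 1, 2

def bar_centers(ax):
    """Return the x coordinate of the center of every bar on the axes."""
    return [round(p.get_x() + p.get_width() / 2, 6) for p in ax.patches]

numeric_centers = bar_centers(ax_num)
categorical_centers = bar_centers(ax_cat)
print(numeric_centers, categorical_centers)
```

A line plotted against the original index lines up with the numeric bars, but not with the categorical ones.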
Imports and DataFrame
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# test data
np.random.seed(2022)
df = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
# create the bins
df["bins"] = pd.qcut(df.index, 20)
# add a column for the mid point of the interval
df['mid'] = df.bins.apply(lambda row: row.mid.round().astype(int))
# pivot the dataframe to calculate the mean of each interval
pt = df.pivot_table(index='mid', values='value', aggfunc='mean').reset_index()
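As a smaller illustration of the binning step, the sketch below runs pd.qcut on 100 consecutive integers with 4 bins instead of 20. The Series s is a hypothetical stand-in for df.index, and reading the mid-points from the interval categories is just one possible way to get them.

```python
import pandas as pd

s = pd.Series(range(100), name="position")   # hypothetical stand-in for df.index
bins = pd.qcut(s, 4)                         # 4 equal-sized quantile bins

intervals = bins.cat.categories              # IntervalIndex with one entry per bin
mids = intervals.mid                         # mid-point of each interval
counts = bins.value_counts(sort=False).tolist()

print(list(intervals))
print(list(mids))
print(counts)
```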
Plot 1
# create the figure
fig, ax = plt.subplots(figsize=(30, 7))
# add a horizontal line at y=0
ax.axhline(0, color='black')
# add the bar plot
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
# set the labels on the xticks - if desired
ax.set_xticks(ticks=pt.mid, labels=pt.mid)
# add the intervals as labels on the bars - if desired
ax.bar_label(ax.containers[0], labels=df.bins.unique(), weight='bold')
# add the line plot
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 2
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 3
The bar width is the width of the interval
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=50, alpha=0.5, ec='k')
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Running the code below produces seaborn FacetGrid graphs.
merged1 = merged[merged['TEST'].isin(['VL'])]
merged2 = merged[merged['TEST'].isin(['CD4'])]
g = sns.relplot(data=merged1, x='Days Post-ART', y='Log of VL and CD4', col='PATIENT ID', col_wrap=4, kind="line", height=4, aspect=1.5,
                color='b', facet_kws={'sharey': True, 'sharex': True})
for patid, ax in g.axes_dict.items():  # axes_dict is new in seaborn 0.11.2
    ax1 = ax.twinx()
    sns.lineplot(data=merged2[merged2['PATIENT ID'] == patid], x='Days Post-ART', y='Log of VL and CD4', color='r')
I've used the facet_kws={'sharey':True, 'sharex':True} to share the x-axis and y-axis but it's not working properly. Can someone assist?
As stated in the comments, the FacetGrid axes are shared by default. However, the twinx axes are not. Also, the call to twinx seems to reset the default hiding of the y tick labels.
You can manually share the twinx axes, and remove the unwanted tick labels.
Here is some example code using the iris dataset:
from matplotlib import pyplot as plt
import seaborn as sns
import numpy as np
iris = sns.load_dataset('iris')
g = sns.relplot(data=iris, x='petal_length', y='petal_width', col='species', col_wrap=2, kind="line",
                height=4, aspect=1.5, color='b')
# the last axes of each row, plus the very last axes; these keep their right-hand tick labels
last_axes = np.append(g.axes.flat[g._col_wrap - 1::g._col_wrap], g.axes.flat[-1])
shared_right_y = None
for species, ax in g.axes_dict.items():
    ax1 = ax.twinx()
    if shared_right_y is None:
        shared_right_y = ax1
    else:
        # get_shared_y_axes().join(...) is deprecated since matplotlib 3.6; sharey() links the axes instead
        ax1.sharey(shared_right_y)
    sns.lineplot(data=iris[iris['species'] == species], x='petal_length', y='sepal_length', color='r', ax=ax1)
    if ax not in last_axes:  # remove tick labels from secondary axis
        ax1.yaxis.set_tick_params(labelleft=False, labelright=False)
        ax1.set_ylabel('')
    if ax not in g._left_axes:  # remove tick labels from primary axis
        ax.yaxis.set_tick_params(labelleft=False, labelright=False)
plt.tight_layout()
plt.show()
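The sharing of the twin axes can be checked in isolation. The following is a minimal sketch in plain matplotlib with made-up data; it uses Axes.sharey to link the two secondary axes, which is one possible way to do the manual sharing described above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, axs = plt.subplots(ncols=2, sharey=True)   # primary y-axes are shared
twins = [ax.twinx() for ax in axs]              # secondary y-axes start out independent
twins[1].sharey(twins[0])                       # link the secondary y-axes manually

# deliberately different data ranges on the two secondary axes
twins[0].plot([0, 1], [0, 10], color="r")
twins[1].plot([0, 1], [100, 200], color="r")

ylim_first = twins[0].get_ylim()
ylim_second = twins[1].get_ylim()
print(ylim_first, ylim_second)
```

Without the sharey call, each twin axes would autoscale to its own data only.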
I made a plot that looks like this
I want to turn off the ticklabels along the y axis. And to do that I am using
plt.tick_params(labelleft=False, left=False)
And now the plot looks like this. Even though the labels are turned off the scale 1e67 still remains.
Turning off the scale 1e67 would make the plot look better. How do I do that?
seaborn is used to draw the plot, but it's just a high-level API for matplotlib.
The functions called to remove the y-axis labels and ticks are matplotlib methods.
After creating the plot, use .set().
.set(yticklabels=[]) should remove tick labels.
This doesn't work if you use .set_title(), but you can use .set(title='')
.set(ylabel=None) should remove the axis label.
.tick_params(left=False) will remove the ticks.
Similarly, for the x-axis: How to remove or hide x-axis labels from a seaborn / matplotlib plot?
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
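As a quick check that the calls listed above also remove the scale text, here is a minimal sketch with made-up data of the order 1e67. It renders the figure off-screen and then reads back the tick label and offset strings; the variable names are only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
y = (np.cos(x) + 3) * 1e67       # values large enough to produce a scale text on the axis

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set(yticklabels=[])           # remove the tick labels
ax.tick_params(left=False)       # remove the tick marks

fig.canvas.draw()                # render so that the label texts are up to date
tick_texts = [t.get_text() for t in ax.get_yticklabels()]
offset_text = ax.yaxis.get_offset_text().get_text()
print(tick_texts, repr(offset_text))
```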
Example 1
import seaborn as sns
import matplotlib.pyplot as plt
# load data
exercise = sns.load_dataset('exercise')
pen = sns.load_dataset('penguins')
# create figures
fig, ax = plt.subplots(2, 1, figsize=(8, 8))
# plot data
g1 = sns.boxplot(x='time', y='pulse', hue='kind', data=exercise, ax=ax[0])
g2 = sns.boxplot(x='species', y='body_mass_g', hue='sex', data=pen, ax=ax[1])
plt.show()
Remove Labels
fig, ax = plt.subplots(2, 1, figsize=(8, 8))
g1 = sns.boxplot(x='time', y='pulse', hue='kind', data=exercise, ax=ax[0])
g1.set(yticklabels=[]) # remove the tick labels
g1.set(title='Exercise: Pulse by Time for Exercise Type') # add a title
g1.set(ylabel=None) # remove the axis label
g2 = sns.boxplot(x='species', y='body_mass_g', hue='sex', data=pen, ax=ax[1])
g2.set(yticklabels=[])
g2.set(title='Penguins: Body Mass by Species for Gender')
g2.set(ylabel=None) # remove the y-axis label
g2.tick_params(left=False) # remove the ticks
plt.tight_layout()
plt.show()
Example 2
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# sinusoidal sample data
sample_length = range(1, 1+1) # number of columns of frequencies
rads = np.arange(0, 2*np.pi, 0.01)
data = np.array([(np.cos(t*rads)*10**67) + 3*10**67 for t in sample_length])
df = pd.DataFrame(data.T, index=pd.Series(rads.tolist(), name='radians'), columns=[f'freq: {i}x' for i in sample_length])
df.reset_index(inplace=True)
# plot
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot('radians', 'freq: 1x', data=df)
# or skip the previous two lines and plot df directly
# ax = df.plot(x='radians', y='freq: 1x', figsize=(8, 8), legend=False)
Remove Labels
# plot
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot('radians', 'freq: 1x', data=df)
# or skip the previous two lines and plot df directly
# ax = df.plot(x='radians', y='freq: 1x', figsize=(8, 8), legend=False)
ax.set(yticklabels=[]) # remove the tick labels
ax.tick_params(left=False) # remove the ticks
I have created a simple violin plot from a pandas DataFrame (df10 below) using seaborn:
fig, ax = plt.subplots(figsize=(10,4))
ax = sns.violinplot(x='z', y='z_fit', hue='new_col', data=df10, cut=0, palette='Blues', linewidth=1)
ax.set_xlabel('z_sim')
ax.legend()
The legend is plotted automatically with the values of the hue parameter. Using ax.legend() I can only hide the name of the used column ('new_col').
However, I was wondering if there is some way to manually modify the legend (texts, colors and shapes) plotted below:
Example:
import seaborn as sns
tips = sns.load_dataset("tips")
g = sns.FacetGrid(tips, col="time", height=4, aspect=.75)  # height replaces the older size parameter
g = g.map(sns.violinplot, "sex", "total_bill", "smoker", palette={"No": "b", "Yes": "w"}, inner=None, linewidth=1, scale="area", split=True, width=0.75).despine(left=True)
g.fig.get_axes()[0].legend(title='smoker', loc='upper left', labels=["YES", "NO"], edgecolor='red', facecolor='blue', ncol=2)  # 'top left' is not a valid loc
g.set_axis_labels('lunch','total bill')
For more info run:
help(g.fig.get_axes()[0].legend)
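If the texts, colors and shapes all need to be controlled, another option is to build the legend from hand-made handles. The sketch below uses plain matplotlib with placeholder data; the labels, colors and the "median" marker entry are made up for illustration and are not taken from the violin plot above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Patch

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])  # placeholder data; the legend below does not depend on it

# hypothetical legend entries: two colored boxes and one marker
custom_handles = [
    Patch(facecolor="lightblue", edgecolor="black", label="smoker: yes"),
    Patch(facecolor="white", edgecolor="black", label="smoker: no"),
    Line2D([0], [0], color="red", marker="o", linestyle="", label="median"),
]
legend = ax.legend(handles=custom_handles, title="custom legend", loc="upper left", ncol=2)
legend_labels = [t.get_text() for t in legend.get_texts()]
print(legend_labels)
```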
Want labels for Bollinger Bands (R) ('upper band', 'rolling mean', 'lower band') to show up in legend. But legend just applies the same label to each line with the pandas label for the first (only) column, 'IBM'.
# Plot price values, rolling mean and Bollinger Bands (R)
ax = prices['IBM'].plot(title="Bollinger Bands")
rm_sym.plot(label='Rolling mean', ax=ax)
upper_band.plot(label='upper band', c='r', ax=ax)
lower_band.plot(label='lower band', c='r', ax=ax)
#
# Add axis labels and legend
ax.set_xlabel("Date")
ax.set_ylabel("Adjusted Closing Price")
ax.legend(loc='upper left')
plt.show()
I know this code may represent a fundamental lack of understanding of how matplotlib works, so explanations are particularly welcome.
The problem is most probably that whatever upper_band and lower_band are, they are not labeled.
One option is to label them by putting them into a DataFrame as columns. This allows plotting the DataFrame columns directly.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
y = np.random.rand(4)
yupper = y+0.2
ylower = y-0.2
df = pd.DataFrame({"price" : y, "upper": yupper, "lower": ylower})
fig, ax = plt.subplots()
df["price"].plot(label='Rolling mean', ax=ax)
df["upper"].plot(label='upper band', c='r', ax=ax)
df["lower"].plot(label='lower band', c='r', ax=ax)
ax.legend(loc='upper left')
plt.show()
Otherwise you can also plot the data directly.
import matplotlib.pyplot as plt
import numpy as np
y = np.random.rand(4)
yupper = y+0.2
ylower = y-0.2
fig, ax = plt.subplots()
ax.plot(y, label='Rolling mean')
ax.plot(yupper, label='upper band', c='r')
ax.plot(ylower, label='lower band', c='r')
ax.legend(loc='upper left')
plt.show()
In both cases, you'll get a legend with labels. If that isn't enough, I recommend reading the Matplotlib Legend Guide which also tells you how to manually add labels to legends.
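As one possible way of adding the labels manually, the sketch below plots three unlabeled lines and passes the existing line objects together with explicit label strings to ax.legend. The data is random and the label texts are simply the ones from the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
y = rng.random(4)                 # made-up stand-in for the rolling mean

fig, ax = plt.subplots()
ax.plot(y)                        # no label given when plotting
ax.plot(y + 0.2, c="r")
ax.plot(y - 0.2, c="r")

lines = ax.get_lines()            # the three Line2D objects, in plotting order
legend = ax.legend(handles=lines,
                   labels=["rolling mean", "upper band", "lower band"],
                   loc="upper left")
legend_labels = [t.get_text() for t in legend.get_texts()]
print(legend_labels)
```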