Plotting three categories with two axes in matplotlib - python

I am using a combination of pandas and matplotlib to plot three values for several categories. I would like one plot to have its own axis, and the other two to share an axis.
This is close, but it illustrates why I need dual axes:
pd.DataFrame([[1,2,3], [500,600,700], [500, 700, 650]], columns=['foo', 'bar','baz'],
index=['a','b','c']).T.plot(kind='bar')
Instead, I would like a second axis for the a bars. My attempt:
smol = pd.DataFrame([[1,2,3], [500,600,700], [500, 700, 650]], columns=['foo', 'bar','baz'],
index=['a','b','c']).T
fig = plt.figure(figsize=(10,5)) # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
ax2 = ax.twinx() # Create another axes that shares the same x-axis as ax.
smol['a'].plot(kind='bar', color='red', ax=ax, width=0.3,
position=1, edgecolor='black')
smol['b'].plot(kind='bar', color='blue', ax=ax2, width=0.3,
position=0, edgecolor='black')
ax.set_ylabel('Small scale')
ax2.set_ylabel('Big scale')
plt.show()
Unfortunately, adding
smol['c'].plot(kind='bar', color='green', ax=ax2, width=0.3,
position=0, edgecolor='black')
produces a plot in which the c bars are drawn directly on top of the b bars (both use position=0 on ax2):
How can I have b and c share an axis, but appear next to each other, as in the first attempt?

I've used the secondary_y keyword. The code is also considerably shorter:
smol = pd.DataFrame([[1,2,3], [500,600,700], [500, 700, 650]], columns=['foo', 'bar','baz'],
index=['a','b','c']).T
ax = smol.plot(kind="bar", secondary_y=['b', 'c'])
ax.set_ylabel('Small scale')
ax.right_ax.set_ylabel('Big scale')
plt.show()
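If you would rather not rely on pandas' secondary_y, the same layout can be sketched with plain matplotlib by placing the bars yourself: a on ax, b and c on a twinx() axis, and each series shifted by one bar width so the three bars sit side by side. This is a minimal sketch; the bar width of 0.25 and the Agg backend (so it runs without a display) are my assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line and call plt.show() to see the window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

smol = pd.DataFrame([[1, 2, 3], [500, 600, 700], [500, 700, 650]],
                    columns=['foo', 'bar', 'baz'], index=['a', 'b', 'c']).T

x = np.arange(len(smol.index))  # one slot per category: foo, bar, baz
width = 0.25                    # three bars per slot

fig, ax = plt.subplots(figsize=(10, 5))
ax2 = ax.twinx()  # second y-axis sharing the x-axis of ax

# 'a' goes on the left axis; 'b' and 'c' share the right axis.
# Shifting each series by one bar width keeps all three side by side.
bars_a = ax.bar(x - width, smol['a'], width, color='red', edgecolor='black')
bars_b = ax2.bar(x, smol['b'], width, color='blue', edgecolor='black')
bars_c = ax2.bar(x + width, smol['c'], width, color='green', edgecolor='black')

ax.set_xticks(x)
ax.set_xticklabels(smol.index)
ax.set_ylabel('Small scale')
ax2.set_ylabel('Big scale')

# The bars live on two different axes, so the legend entries are collected manually.
ax.legend([bars_a, bars_b, bars_c], ['a', 'b', 'c'], loc='upper left')
```

The offsets (-width, 0, +width) do the same job as pandas' position argument, but because they are explicit, b and c cannot end up on top of each other.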


Combine Binned barplot with lineplot

I'd like to represent two datasets on the same plot, one as a line and one as a binned barplot. I can do each individually:
tobar = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
tobar["bins"] = pd.qcut(tobar.index, 20)
bp = sns.barplot(data=tobar, x="bins", y="value")
toline = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
lp = sns.lineplot(data=toline, x=toline.index, y="value")
But when I try to combine them, of course the x axis gets messed up:
fig, ax = plt.subplots()
ax2 = ax.twinx()
bp = sns.barplot(data=tobar, x="bins", y="value", ax=ax)
lp = sns.lineplot(data=toline, x=toline.index, y="value", ax=ax2)
bp.set(xlabel=None)
I also can't seem to get rid of the bin labels.
How can I show both of these on one plot?
This answer explains why it's better to plot the bars with matplotlib.axes.Axes.bar instead of sns.barplot or pandas.DataFrame.plot.bar.
In short, with Axes.bar the xtick locations correspond to the actual numeric value of the label, whereas the xticks for seaborn and pandas are 0-indexed positions that don't correspond to the numeric value.
This answer shows how to add bar labels.
ax2 = ax.twinx() can be used for the line plot if needed.
This works the same way if the line plot uses different data.
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
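A small sketch of the tick-position difference described above, using a hypothetical three-value Series whose index labels are the numbers 10, 20 and 30: pandas puts the bars at positions 0, 1, 2 and only labels them 10, 20, 30, while Axes.bar draws them at x = 10, 20, 30. Only the second can share an x-axis with a line plotted against numeric x values. The Agg backend is used so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

s = pd.Series([3, 5, 7], index=[10, 20, 30])  # hypothetical values keyed by numeric labels

fig, (ax_pd, ax_mpl) = plt.subplots(ncols=2, figsize=(8, 3))

# pandas places categorical bars at 0, 1, 2, ... and only *labels* them 10, 20, 30
s.plot.bar(ax=ax_pd)
pandas_ticks = list(ax_pd.get_xticks())

# Axes.bar places each bar at its actual numeric x value
ax_mpl.bar(s.index, s.values, width=4)
mpl_centers = [p.get_x() + p.get_width() / 2 for p in ax_mpl.patches]
```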
Imports and DataFrame
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# test data
np.random.seed(2022)
df = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
# create the bins
df["bins"] = pd.qcut(df.index, 20)
# add a column for the mid point of the interval
df['mid'] = df.bins.apply(lambda row: row.mid.round().astype(int))
# pivot the dataframe to calculate the mean of each interval
pt = df.pivot_table(index='mid', values='value', aggfunc='mean').reset_index()
Plot 1
# create the figure
fig, ax = plt.subplots(figsize=(30, 7))
# add a horizontal line at y=0
ax.axhline(0, color='black')
# add the bar plot
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
# set the labels on the xticks - if desired
ax.set_xticks(ticks=pt.mid, labels=pt.mid)
# add the intervals as labels on the bars - if desired
ax.bar_label(ax.containers[0], labels=df.bins.unique(), weight='bold')
# add the line plot
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 2
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 3
The bar width is the width of the interval (1000 points / 20 bins = 50).
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=50, alpha=0.5, ec='k')
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
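As a variation on Plot 3 (a sketch, not tested against the versions listed above), the bin midpoints and widths can be taken directly from the IntervalIndex that pd.qcut produces, instead of rounding the midpoints and hard-coding width=50. It uses plain ax.plot for the line instead of sns.lineplot, and the Agg backend so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(2022)
df = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
df['bins'] = pd.qcut(df.index, 20)

# The categories of the binned column form an IntervalIndex, which
# exposes the midpoint and length of every bin directly.
intervals = df['bins'].cat.categories
means = df.groupby('bins', observed=False)['value'].mean()  # ordered like the categories

fig, ax = plt.subplots(figsize=(15, 5))
ax.axhline(0, color='black')
bars = ax.bar(intervals.mid.to_numpy(), means.to_numpy(),
              width=intervals.length.to_numpy(), alpha=0.5, ec='k')
ax.plot(df.index, df['value'], color='tab:orange')  # line on the same numeric x-axis
```

Because each bar is exactly as wide as its interval, the bars tile the x range with no gaps, and the line drawn against df.index lines up with them.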

Subplotting subplots

I am creating two plots using matplotlib, each of which is a subplot showing two metrics on the same axes.
I'm trying to run them so they show as two charts in one graphic, so that when I save the graphic I see both plots. At the moment, running the second plot overwrites the first in memory, so I can only ever save the second.
How can I plot them together?
My code is below.
plot1 = plt.figure()
fig,ax1 = plt.subplots()
ax1.plot(dfSat['time'],dfSat['wind_at_altitude'], 'b-', label = "speed", linewidth = 5.0)
plt.title('Wind Speeds - Saturday - {}'.format(windloc))
plt.xlabel('Time of day')
plt.ylabel('Wind speed (mph)')
ax1.plot(dfSat['time'],dfSat['gust_at_altitude'], 'r-', label = "gust", linewidth = 5.0)
plt.legend(loc="upper right")
ax1.text(0.05, 0.95, calcmeassat, transform=ax1.transAxes, fontsize=30,
verticalalignment='top')
plt.ylim((0,100))
plot2 = plt.figure()
fig,ax2 = plt.subplots()
ax2.plot(dfSun['time'],dfSun['wind_at_altitude'], 'b-', label = "speed", linewidth = 5.0)
plt.title('Wind Speeds - Sunday - {}'.format(windloc))
plt.xlabel('Time of day')
plt.ylabel('Wind speed (mph)')
ax2.plot(dfSun['time'],dfSun['gust_at_altitude'], 'r-', label = "gust", linewidth = 5.0)
plt.legend(loc="upper right")
ax2.text(0.05, 0.95, calcmeassun, transform=ax2.transAxes, fontsize=30,
verticalalignment='top')
plt.ylim((0,100))
As mentioned, in your case you only need one level of subplots, e.g., nrows=1, ncols=2.
However, in matplotlib 3.4+ there is such a thing as "subplotting subplots" called subfigures, which makes it easier to implement nested layouts, e.g.:
How to create row titles for subplots
How to share colorbars within some subplots
How to share xlabels within some subplots
Subplots
For your simpler use case, create 1x2 subplots with ax1 on the left and ax2 on the right:
# create 1x2 subplots
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(16, 4))
# plot saturdays on the left
dfSat.plot(ax=ax1, x='date', y='temp_min')
dfSat.plot(ax=ax1, x='date', y='temp_max')
ax1.set_ylim(-20, 50)
ax1.set_title('Saturdays')
# plot sundays on the right
dfSun.plot(ax=ax2, x='date', y='temp_min')
dfSun.plot(ax=ax2, x='date', y='temp_max')
ax2.set_ylim(-20, 50)
ax2.set_title('Sundays')
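Applied to the wind-speed code from the question, a sketch might look like this. dfSat, dfSun and windloc are replaced by hypothetical stand-ins with the same column names; the ax.text(...) calls for calcmeassat/calcmeassun could go inside the loop unchanged. Because both axes belong to a single figure, one savefig call writes both plots to the same file (here an in-memory buffer, with the Agg backend so it runs headless).

```python
import io
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the asker's data; column names follow the question
hours = np.arange(24)
rng = np.random.default_rng(0)
dfSat = pd.DataFrame({'time': hours,
                      'wind_at_altitude': rng.uniform(5, 30, 24),
                      'gust_at_altitude': rng.uniform(30, 60, 24)})
dfSun = pd.DataFrame({'time': hours,
                      'wind_at_altitude': rng.uniform(5, 30, 24),
                      'gust_at_altitude': rng.uniform(30, 60, 24)})
windloc = 'Somewhere'  # hypothetical location name

# One figure, two axes; everything is drawn through Axes methods,
# so nothing depends on which figure pyplot considers "current".
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(16, 5), sharey=True)
for ax, day, data in zip(axes, ['Saturday', 'Sunday'], [dfSat, dfSun]):
    ax.plot(data['time'], data['wind_at_altitude'], 'b-', label='speed', linewidth=5.0)
    ax.plot(data['time'], data['gust_at_altitude'], 'r-', label='gust', linewidth=5.0)
    ax.set_title(f'Wind Speeds - {day} - {windloc}')
    ax.set_xlabel('Time of day')
    ax.set_ylabel('Wind speed (mph)')
    ax.set_ylim(0, 100)
    ax.legend(loc='upper right')

# Saving the single figure keeps both plots in one file
buf = io.BytesIO()
fig.savefig(buf, format='png')
```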
Subfigures
Say you want something more complicated, like having the left side show 2012 with its own suptitle and the right side show 2015 with its own suptitle.
Create 1x2 subfigures (left subfig_l and right subfig_r) with 2x1 subplots on the left (top ax_lt and bottom ax_lb) and 2x1 subplots on the right (top ax_rt and bottom ax_rb):
# create 1x2 subfigures
fig = plt.figure(constrained_layout=True, figsize=(12, 5))
(subfig_l, subfig_r) = fig.subfigures(nrows=1, ncols=2, wspace=0.07)
# create top/bottom axes in left subfig
(ax_lt, ax_lb) = subfig_l.subplots(nrows=2, ncols=1)
# plot 2012 saturdays on left-top axes
dfSat12 = dfSat.loc[dfSat['date'].dt.year.eq(2012)]
dfSat12.plot(ax=ax_lt, x='date', y='temp_min')
dfSat12.plot(ax=ax_lt, x='date', y='temp_max')
ax_lt.set_ylim(-20, 50)
ax_lt.set_ylabel('Saturdays')
# plot 2012 sundays on left-bottom axes
dfSun12 = dfSun.loc[dfSun['date'].dt.year.eq(2012)]
dfSun12.plot(ax=ax_lb, x='date', y='temp_min')
dfSun12.plot(ax=ax_lb, x='date', y='temp_max')
ax_lb.set_ylim(-20, 50)
ax_lb.set_ylabel('Sundays')
# set suptitle for left subfig
subfig_l.suptitle('2012', size='x-large', weight='bold')
# create top/bottom axes in right subfig
(ax_rt, ax_rb) = subfig_r.subplots(nrows=2, ncols=1)
# plot 2015 saturdays on right-top axes
dfSat15 = dfSat.loc[dfSat['date'].dt.year.eq(2015)]
dfSat15.plot(ax=ax_rt, x='date', y='temp_min')
dfSat15.plot(ax=ax_rt, x='date', y='temp_max')
ax_rt.set_ylim(-20, 50)
ax_rt.set_ylabel('Saturdays')
# plot 2015 sundays on right-bottom axes
dfSun15 = dfSun.loc[dfSun['date'].dt.year.eq(2015)]
dfSun15.plot(ax=ax_rb, x='date', y='temp_min')
dfSun15.plot(ax=ax_rb, x='date', y='temp_max')
ax_rb.set_ylim(-20, 50)
ax_rb.set_ylabel('Sundays')
# set suptitle for right subfig
subfig_r.suptitle('2015', size='x-large', weight='bold')
Sample data for reference:
import pandas as pd
from vega_datasets import data
df = data.seattle_weather()
df['date'] = pd.to_datetime(df['date'])
dfSat = df.loc[df['date'].dt.weekday.eq(5)]
dfSun = df.loc[df['date'].dt.weekday.eq(6)]
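One of the use cases listed above, row titles for subplots, can be sketched with subfigures stacked vertically: each subfigure gets its own suptitle, which then acts as the title of its row. A minimal sketch with placeholder sine data, assuming matplotlib 3.4+ and the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

fig = plt.figure(constrained_layout=True, figsize=(9, 6))
# One subfigure per row; each subfigure's suptitle acts as a row title
subfigs = fig.subfigures(nrows=2, ncols=1)
row_titles = []
row_axes = []
for subfig, year in zip(subfigs, ['2012', '2015']):
    row_titles.append(subfig.suptitle(year, weight='bold'))
    axs = subfig.subplots(nrows=1, ncols=3, sharey=True)
    row_axes.append(axs)
    for k, ax in enumerate(axs):
        ax.plot(x, np.sin(x + k))  # placeholder data
```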
It doesn't work like that. Subplots are what the name says: plots inside one main figure.
That means if you need two subplots, you need one figure containing both of them.
# plt.figure() creates an empty Figure object, not a plot;
# on its own it is useful only when you want one plot, NO subplots
fig = plt.figure()
# alternatively (instead of the line above): 2 subplots inside 1 figure
# 1 row, 2 columns
fig, [ax1, ax2] = plt.subplots(1, 2)
# then call the plotting method on each axes object to
# create a plot on that subplot
sns.histplot(..., ax=ax1)
sns.violinplot(..., ax=ax2)
# or using matplotlib like this
ax1.plot()
ax2.plot()
Learn more about subplots
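The reason the question's code ends up with separate images is that plt.figure() and plt.subplots() each create a new figure, so plot1 = plt.figure() followed by plt.subplots() already leaves two figures, one of them empty. A quick check with plt.get_fignums(), run headless with the Agg backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

plt.close('all')

plot1 = plt.figure()          # creates figure 1 (left empty)
fig, ax1 = plt.subplots()     # creates a *second* figure holding ax1
n_after_question_pattern = len(plt.get_fignums())

plt.close('all')
fig, (ax1, ax2) = plt.subplots(1, 2)   # a single figure holding both axes
n_after_subplots = len(plt.get_fignums())
```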

How to plot a paired histogram using seaborn

I would like to make a paired histogram like the one shown here using the seaborn distplot.
This kind of plot can also be referred to as the back-to-back histogram shown here, or a bihistogram inverted/mirrored along the x-axis as discussed here.
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
green = np.random.normal(20,10,1000)
blue = np.random.poisson(60,1000)
fig, ax = plt.subplots(figsize=(8,6))
sns.distplot(blue, hist=True, kde=True, hist_kws={'edgecolor':'black'}, kde_kws={'linewidth':2}, bins=10, color='blue')
sns.distplot(green, hist=True, kde=True, hist_kws={'edgecolor':'black'}, kde_kws={'linewidth':2}, bins=10, color='green')
ax.set_xticks(np.arange(-20,121,20))
ax.set_yticks(np.arange(0.0,0.07,0.01))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()
Here is the output:
When I use the method discussed here (plt.barh), I get the bar plot shown just below, which is not what I am looking for.
Or maybe I haven't understood the workaround well enough...
A simple, short implementation with seaborn's distplot that produces this kind of plot would be perfect. I edited the figure of my first plot above to show the kind of plot I hope to achieve (though with the y-axis not upside down):
Any leads would be greatly appreciated.
You could use two subplots, plot both with the same bins, and invert the y-axis of the lower one.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'a': np.random.normal(0,5,1000), 'b': np.random.normal(20,5,1000)})
fig = plt.figure(figsize=(5,5))
ax = fig.add_subplot(211)
ax2 = fig.add_subplot(212)
bins = np.arange(-20,40)  # same bins for both histograms
ax.hist(df['a'], bins=bins)
ax2.hist(df['b'],color='orange', bins=bins)
ax2.invert_yaxis()  # mirror the lower histogram
Edit: improvements suggested by @mwaskom:
fig, axes = plt.subplots(nrows=2, ncols=1, sharex=True, figsize=(5,5))
bins = np.arange(-20,40)
for ax, column, color, invert in zip(axes.ravel(), df.columns, ['teal', 'orange'], [False,True]):
    ax.hist(df[column], bins=bins, color=color)
    if invert:
        ax.invert_yaxis()
plt.subplots_adjust(hspace=0)
Here is a possible approach using seaborn's distplot.
distplot returns the Axes rather than the graphical elements it creates, but the Axes can be inspected. To make sure the Axes only contains the elements you want upside down, draw those elements first. Then all the patches (the rectangular bars) and the lines (the KDE curve) can have their heights negated. Optionally, the x-axis can be placed at y == 0 using ax.spines['bottom'].set_position('zero').
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
green = np.random.normal(20, 10, 1000)
blue = np.random.poisson(60, 1000)
fig, ax = plt.subplots(figsize=(8, 6))
sns.distplot(green, hist=True, kde=True, hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 2}, bins=10,
color='green')
for p in ax.patches:  # turn the histogram upside down
    p.set_height(-p.get_height())
for l in ax.lines:  # turn the kde curve upside down
    l.set_ydata(-l.get_ydata())
sns.distplot(blue, hist=True, kde=True, hist_kws={'edgecolor': 'black'}, kde_kws={'linewidth': 2}, bins=10,
color='blue')
ax.set_xticks(np.arange(-20, 121, 20))
ax.set_yticks(np.arange(0.0, 0.07, 0.01))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
pos_ticks = np.array([t for t in ax.get_yticks() if t > 0])
ticks = np.concatenate([-pos_ticks[::-1], [0], pos_ticks])
ax.set_yticks(ticks)
ax.set_yticklabels([f'{abs(t):.2f}' for t in ticks])
ax.spines['bottom'].set_position('zero')
plt.show()
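Since distplot has been deprecated since seaborn 0.11, here is an alternative sketch that avoids it: compute both histograms with np.histogram on shared bin edges and draw them with ax.bar, giving the lower dataset negative heights, then format the y tick labels as absolute values. It drops the KDE curves; the bin width of 10, the random seed and the Agg backend are arbitrary choices of mine.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import FuncFormatter

rng = np.random.default_rng(0)
green = rng.normal(20, 10, 1000)
blue = rng.poisson(60, 1000)

# Shared bin edges so the two histograms line up bar for bar
bins = np.arange(-20, 121, 10)
green_density, _ = np.histogram(green, bins=bins, density=True)
blue_density, _ = np.histogram(blue, bins=bins, density=True)

widths = np.diff(bins)
fig, ax = plt.subplots(figsize=(8, 6))
# upper half: blue; lower half: green drawn with negative heights
ax.bar(bins[:-1], blue_density, width=widths, align='edge', color='blue', edgecolor='black')
ax.bar(bins[:-1], -green_density, width=widths, align='edge', color='green', edgecolor='black')

ax.axhline(0, color='black')
# Show absolute values on the y-axis so the lower half reads as positive densities
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: f'{abs(y):.2f}'))
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
```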

Modifying y-axis in histogram in Pandas matplotlib

I have 33960 zeros and 144 ones in data_train['fk_action_code_id'].
When I plot a histogram, the bar for 1 is so short that it is not visible. Is there any way I can make the bar for 1 visible by modifying the y-axis?
I tried this, but it doesn't work:
b=[0,145, 35000]
plt.yticks(b)
plt.hist(data_train['fk_action_code_id'], histtype='bar', rwidth=0.8)
A few suggestions: you could
1.) create two y axes, one for the zeros and the other for the ones
2.) multiply one of the bars by a numerical factor, so that they are of the same order of magnitude (you should explain this in the plot legend then)
3.) draw a logarithmic histogram with the option log=True in the plt.hist() command.
The following will produce plots for these three options:
import numpy as np
import matplotlib.pyplot as plt
zeros = np.zeros([35000])
modifier = 100
ones = np.ones([145*modifier])
arr = np.hstack((zeros, ones))
bins = np.asarray([-0.5, 0.5, 1.5])
plt.hist(arr, bins=bins, facecolor='green', alpha=0.75, log=False)
plt.xticks([0,1])
plt.title('Multiplied with a factor')
plt.savefig('multiplied.png')
plt.show()
plt.clf()
modifier = 1
ones = np.ones([145*modifier])
arr = np.hstack((zeros, ones))
plt.hist(arr, bins=bins, facecolor='green', alpha=0.75, log=True)
plt.xticks([0,1])
plt.title('Logarithmic')
plt.savefig('log.png')
plt.show()
plt.clf()
ax1 = plt.gca()
ax2 = ax1.twinx()
ax1.set_yticks([0, 35000, 40000])
ax1.set_ylim(0, 40000)
ax2.set_yticks([0, 145, 200])
ax2.set_ylim(0, 200)
ax1.hist(arr, bins=[bins[0], bins[1]], facecolor='green', alpha=0.75, log=False)  # zeros, left axis
ax2.hist(arr, bins=[bins[1], bins[2]], facecolor='green', alpha=0.75, log=False)  # ones, right axis
plt.xticks([0,1])
plt.title('Two y axes')
plt.savefig('two_axes.png')
plt.show()
plt.clf()
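A variant of option 3: plot the counts themselves as bars on a log scale and label each bar with its exact count, so the log axis cannot hide how different the two counts are. A sketch with a stand-in array built from the counts in the question (it needs matplotlib 3.4+ for bar_label; the Agg backend is only there so it runs headless):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Stand-in for data_train['fk_action_code_id'] with the counts from the question
codes = np.concatenate([np.zeros(33960, dtype=int), np.ones(144, dtype=int)])

counts = np.bincount(codes)          # counts[0] = number of 0s, counts[1] = number of 1s
fig, ax = plt.subplots()
bars = ax.bar([0, 1], counts, width=0.8, color='green', alpha=0.75)
ax.set_yscale('log')                 # both bars are visible on a log scale
ax.set_xticks([0, 1])
labels = ax.bar_label(bars)          # print the exact count above each bar
```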

Plot alignment and formatting help in Matplotlib and Seaborn

I have a dataframe with 15 rows, which I plot using a seaborn heatmap. I have three plots, each with different scale for the heatmap. The first two plots are the first two rows, which are not aligned on the plot.
I have created a grid with 15 rows and given each of the first two rows 1/15th of the grid, so I don't know why they are not aligned.
Another problem with the first two rows of the heatmap is that the text formatting doesn't work either.
So I want to do two things:
Stretch the top two rows of the table to align it with the bottom one and;
To make the formatting work for the top two rows as well.
Maybe also add titles to my white horizontal lines (l1 and l2) that separate the subgroups in the bottom plot (standard methods like ax.set_title do not work).
My code:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.gridspec as gs
gs = gs.GridSpec(15, 1) # nrows, ncols
f = plt.figure(figsize=(10, 15))
cmap = sns.diverging_palette(220, 10, as_cmap=True)
ax1 = f.add_subplot(gs[0:1, :])
ax2 = f.add_subplot(gs[1:2, :])
ax3 = f.add_subplot(gs[2:15, :])
ticksx = plt.xticks(fontsize = 18, fontweight='bold')
ticksy = plt.yticks(fontsize = 18, fontweight='bold')
wageplot = sns.heatmap(df[0:1], vmin=3000, vmax=10000, annot=False, square=True, cmap=cmap, ax=ax1, yticklabels=True, cbar=False, xticklabels=False)
tenureplot = sns.heatmap(df[1:2], vmin=45, vmax=100, annot=True, square=True, cmap=cmap, ax=ax2, yticklabels=True, cbar=False, xticklabels=False)
heatmap = sns.heatmap(df[2:15], vmin=0, vmax=1, annot=False, square=True, cmap=cmap, ax=ax3, yticklabels=True, cbar=True, xticklabels=True)
heatmap.set_xticklabels(cols, rotation=45, ha='right')
l1 = plt.axhline(y=1, linewidth=14, color='w', label='Female')
l2 = plt.axhline(y=5, linewidth=14, color='w', label='Education')
f.tight_layout()
I would appreciate being pointed to where I can find information about how to program the needed grid. An example image:
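Two likely causes of the misalignment can be sketched. With cbar=True and no cbar_ax, the colorbar takes its space from ax3, so the bottom heatmap becomes narrower than the two above it; and square=True forces an equal aspect ratio, which can shrink and re-center the one-row axes inside their slots. The sketch below uses plain matplotlib (pcolormesh instead of sns.heatmap, so it runs without seaborn) with hypothetical data: the three heatmaps share one gridspec column with height_ratios [1, 1, 13], and every colorbar gets its own axes in a narrow second column, so the left and right edges of the heatmaps line up. With seaborn, the analogous parameters would presumably be cbar_ax=..., square=False, and fmt for the annotation format.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data shaped like the question: 1 wage row, 1 tenure row, 13 share rows
rng = np.random.default_rng(0)
ncols = 8
wage = rng.uniform(3000, 10000, size=(1, ncols))
tenure = rng.uniform(45, 100, size=(1, ncols))
shares = rng.uniform(0, 1, size=(13, ncols))

# Left column: the three heatmaps, with heights in proportion to their row counts.
# Right column: one narrow axes per heatmap, reserved for its colorbar.
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(10, 15),
                         gridspec_kw={'height_ratios': [1, 1, 13],
                                      'width_ratios': [30, 1],
                                      'hspace': 0.05})
for row, (data, vmin, vmax) in enumerate([(wage, 3000, 10000),
                                          (tenure, 45, 100),
                                          (shares, 0, 1)]):
    ax, cax = axes[row]
    mesh = ax.pcolormesh(data, cmap='coolwarm', vmin=vmin, vmax=vmax)
    ax.invert_yaxis()                # first row at the top, like a heatmap
    fig.colorbar(mesh, cax=cax)      # cax is given, so ax keeps its full width
    if row < 2:
        ax.set_xticks([])            # only the bottom heatmap gets x tick labels

# Annotate the tenure row; the format string controls the text formatting
for j, value in enumerate(tenure[0]):
    axes[1, 0].text(j + 0.5, 0.5, f'{value:.0f}', ha='center', va='center')
```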
