Combine Binned barplot with lineplot - python

I'd like to represent two datasets on the same plot, one as a line and one as a binned barplot. I can do each individually:
tobar = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
tobar["bins"] = pd.qcut(tobar.index, 20)
bp = sns.barplot(data=tobar, x="bins", y="value")
toline = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
lp = sns.lineplot(data=toline, x=toline.index, y="value")
But when I try to combine them, of course the x axis gets messed up:
fig, ax = plt.subplots()
ax2 = ax.twinx()
bp = sns.barplot(data=tobar, x="bins", y="value", ax=ax)
lp = sns.lineplot(data=toline, x=toline.index, y="value", ax=ax2)
bp.set(xlabel=None)
I also can't seem to get rid of the bin labels.
How can I show both datasets on one plot?

This answer explains why it's better to plot the bars with matplotlib.axes.Axes.bar instead of sns.barplot or pandas.DataFrame.bar.
In short, with matplotlib the xtick locations correspond to the actual numeric values of the labels, whereas the xticks for seaborn and pandas bar plots are 0-indexed and don't correspond to the numeric values.
This answer shows how to add bar labels.
ax2 = ax.twinx() can be used for the line plot if needed.
It works the same if the line plot uses different data.
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
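To illustrate the tick-location point, here is a minimal sketch (not part of the original answer; the three-row pt_demo frame and its values are made up for illustration) that compares where pandas and matplotlib place the same bars:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# hypothetical data: three bin midpoints and their mean values
pt_demo = pd.DataFrame({"mid": [25, 75, 125], "value": [1.0, -2.0, 3.0]})

fig, (ax_pd, ax_mpl) = plt.subplots(1, 2)

# pandas treats x as categorical labels, so the bars sit at 0, 1, 2
pt_demo.plot.bar(x="mid", y="value", ax=ax_pd, legend=False)
pandas_ticks = list(ax_pd.get_xticks())

# matplotlib places each bar at its numeric x value
ax_mpl.bar(pt_demo["mid"], pt_demo["value"], width=40)
mpl_centers = [p.get_x() + p.get_width() / 2 for p in ax_mpl.patches]

print(pandas_ticks)  # positions used by pandas
print(mpl_centers)   # positions used by Axes.bar
```

Because a line plot against df.index uses the numeric positions, only the Axes.bar version shares a meaningful x scale with it.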
Imports and DataFrame
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# test data
np.random.seed(2022)
df = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
# create the bins
df["bins"] = pd.qcut(df.index, 20)
# add a column for the mid point of the interval
df['mid'] = df.bins.apply(lambda row: row.mid.round().astype(int))
# pivot the dataframe to calculate the mean of each interval
pt = df.pivot_table(index='mid', values='value', aggfunc='mean').reset_index()
Plot 1
# create the figure
fig, ax = plt.subplots(figsize=(30, 7))
# add a horizontal line at y=0
ax.axhline(0, color='black')
# add the bar plot
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
# set the labels on the xticks - if desired
ax.set_xticks(ticks=pt.mid, labels=pt.mid)
# add the intervals as labels on the bars - if desired
ax.bar_label(ax.containers[0], labels=df.bins.unique(), weight='bold')
# add the line plot
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 2
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 3
The bar width is the width of the interval
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=50, alpha=0.5, ec='k')
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
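Plot 3 hard-codes width=50. As a hedged alternative (my own sketch, not part of the tested answer), the midpoints and widths can be read from the intervals that pd.qcut produces, assuming the same 1000 positions and 20 bins as above:

```python
import numpy as np
import pandas as pd

# same binning as in the answer: 20 quantile bins over positions 0..999
bins = pd.qcut(np.arange(1000), 20)

# the categories of the result form an IntervalIndex with one entry per bin
intervals = bins.categories
mids = intervals.mid       # center of each bin
widths = intervals.length  # right edge minus left edge of each bin

print(len(intervals), float(widths.min()), float(widths.max()))
```

Axes.bar accepts an array-like width, so mids and widths could be passed as x= and width= instead of the fixed value.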

Related

Why do I have two legends? How do I merge the legends? Python

I'm plotting 2 dataframes with this method:
df.plot(ax=ax, x='x', y='y', label = "first_df")
df2.plot(ax=ax, x='x', y='y', label = "second_df")
And I add some axvspan calls:
plt.axvspan(x, y, label = value)
Since I have multiple axvspan calls, some with duplicated labels, I am using this code to display each label only once.
handles, labels = plt.gca().get_legend_handles_labels()
by_label = dict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys(),loc='upper center', bbox_to_anchor=(1.1, 0.8))
But when I display the legend, I get one legend for the dfs and another for the axvspan calls. I think it is because I use df.plot for the dfs and plt for axvspan, so I don't know how to merge the legends.
EDIT:
I tried this with ax1 and ax2 for my dfs:
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend((h1+h2), l1+l2, loc='upper center', bbox_to_anchor=(1.1, 0.8))
It works, but I have duplicates in the legend. How can I remove them?
Instead of using df.plot, which creates a legend whenever it's called, you can use ax.plot:
ax.plot(df['x'], df['y'], label='first df')
ax.plot(df2['x'], df2['y'], label='second df')
ax.legend()
In the following example, the legend is combined by plotting everything onto the same Axes, including the axvspan with ax.axvspan, and then running ax.legend to add the axvspan entry to the existing legend:
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
# Create sample dataset
rng = np.random.default_rng(seed=1)
size = 30
df = pd.DataFrame(dict(x=range(size), y=rng.integers(0, 100, size=size)))
df2 = pd.DataFrame(dict(x=range(size), y=rng.integers(10, 50, size=size)))
# Plot data onto a single Axes with a combined legend using multiple plotting functions
ax = df.plot(x='x', y='y', label='first_df', figsize=(8,4))
df2.plot(ax=ax, x='x', y='y', label = 'second_df')
ax.axvspan(10, 15, label='span', facecolor='black', edgecolor=None, alpha=0.2)
ax.legend(loc='upper right');
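For the twin-axes case from the EDIT, where duplicates remained, a minimal sketch (my own, with made-up data and span positions) is to merge the handles from both axes and then de-duplicate by label with the same dict trick used in the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

# hypothetical data standing in for the two DataFrames
ax1.plot([0, 1, 2], [0, 1, 4], label="first_df")
ax2.plot([0, 1, 2], [3, 2, 1], color="tab:orange", label="second_df")

# two spans that share one label, which would otherwise appear twice
ax1.axvspan(0.2, 0.6, alpha=0.2, label="span")
ax1.axvspan(1.2, 1.6, alpha=0.2, label="span")

# collect the entries from both axes, then keep one handle per label
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
by_label = dict(zip(l1 + l2, h1 + h2))

legend = ax1.legend(list(by_label.values()), list(by_label.keys()), loc="upper right")
legend_labels = [t.get_text() for t in legend.get_texts()]
print(legend_labels)
```

A dict keeps one value per key, so each label ends up with a single handle in the single legend drawn on ax1.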

Plotting difficulty combining 3 variables and repositioning the legends in matplotlib

I have data with names, proportions, and totals, and I want to show all 3 variables in one plot. Ideally I want everything to look like plot 1, but inside it I want to show the total split as in plot 2.
In the first plot I can't get the line right, and it is not my plot of choice anyway.
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
df = pd.DataFrame({"name": list("ABCDEFGHIJ"), "proportion": [0.747223, 0.785883, 0.735542, 0.817368, 0.565193, 0.723029, 0.723004, 0.722595, 0.783929, 0.55152],
"total": [694327, 309681, 239384, 201646, 192267, 189399, 181974, 163483, 157902, 153610]})
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
sns.barplot(data=df, x="name", y="total", color="lightblue", ax=ax1)
sns.lineplot(data=df, x="name", y= "proportion", color="black", lw=3, ls="--", ax=ax2)
# Plot the figure.
df["male"] = df.proportion * df.total
ax = sns.barplot(data = df, x= "name", y = 'total', color = "lightblue")
sns.barplot(data = df, x="name", y = "male", color = "blue", ax = ax)
ax.set_ylabel("male/no_of_streams")
Is there a way to get an effective plot where:
I can show the total split, and
I can also add the proportion values to the plot?
Any help would be appreciated.
Thanks in advance.
If I understand correctly, for the first plot you want to know why the line is dashed. Just remove the argument ls="--" and you will get a solid line.
For the second plot, the following code works if you want the percentage "male" / "total". If the percentage is computed from other numbers, adjust the expression in the for statement:
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns
if __name__ == '__main__':
    df = pd.DataFrame({"name": list("ABCDEFGHIJ"), "proportion": [0.747223, 0.785883, 0.735542, 0.817368, 0.565193, 0.723029, 0.723004, 0.722595, 0.783929, 0.55152], "total": [694327, 309681, 239384, 201646, 192267, 189399, 181974, 163483, 157902, 153610]})
    # fig, ax1 = plt.subplots()
    # ax2 = ax1.twinx()
    # sns.barplot(data=df, x="name", y="total", color="lightblue", ax=ax1)
    # # remove ls='--'
    # sns.lineplot(data=df, x="name", y="proportion", color="black", lw=3, ax=ax2)
    # Plot the figure.
    df["male"] = df.proportion * df.total
    ax = sns.barplot(data = df, x= "name", y = 'total', color = "lightblue")
    sns.barplot(data = df, x="name", y = "male", color = "blue", ax = ax)
    ax.set_ylabel("proportion(male/no_of_streams)")
    # add the percentage as text above each "total" bar
    for i, v in enumerate(df['proportion']):
        p = ax.patches[i]
        height = p.get_height()
        ax.text(p.get_x()+p.get_width()/2.,
                height + 3,
                '{:1.0f}%'.format(v * 100),
                ha="center")
    plt.show()
BTW, I learned this from this page, FYI.
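As a hedged alternative to positioning the text by hand, Axes.bar_label (matplotlib 3.4 or newer) can place the percentages. This sketch swaps sns.barplot for plain Axes.bar so the bar container is returned directly, and it uses only the first three rows of the data to stay short:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# first three rows of the question's data
df = pd.DataFrame({"name": list("ABC"),
                   "proportion": [0.747223, 0.785883, 0.735542],
                   "total": [694327, 309681, 239384]})
df["male"] = df["proportion"] * df["total"]

fig, ax = plt.subplots()
# draw the totals first, then overlay the male share on the same positions
total_bars = ax.bar(df["name"], df["total"], color="lightblue", label="total")
ax.bar(df["name"], df["male"], color="blue", label="male")

# format each proportion as a whole-number percentage and attach it to the total bars
pct_labels = [f"{p:.0%}" for p in df["proportion"]]
texts = ax.bar_label(total_bars, labels=pct_labels)
ax.legend()
print(pct_labels)
```

bar_label handles the vertical offset itself, so no manual "height + 3" adjustment is needed.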

Problems rotating xtick labels when using twinx

I'm having trouble rotating the tick labels on my x-axis. The code below runs without errors, but the labels are not rotated.
# Import Data
#df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")
x = total_test["Dia"].values[:]; y1 = total_test["Confirmados"].values[:]; y2 = total_test["Fallecidos"].values[:]
# Plot Line1 (Left Y Axis)
fig, ax1 = plt.subplots(1,1,figsize=(10,8), dpi= 200)
ax1.plot(x, y1,'g^', color='tab:red')
# Plot Line2 (Right Y Axis)
ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
ax2.plot(x, y2,'bs', color='tab:blue')
# Just Decorations!! -------------------
# ax1 (left y axis)
ax1.set_xlabel('Dias', fontsize=10)
ax1.set_ylabel('Personas Confirmadas', color='tab:red', fontsize=20)
ax1.tick_params(axis='y', rotation=0, labelcolor='tab:red' )
# ax2 (right Y axis)
ax2.set_ylabel("Personas Fallecidas", color='tab:blue', fontsize=20)
ax2.tick_params(axis='y', rotation=0, labelcolor='tab:blue')
ax2.set_title("Personas Confirmadas y Fallecidas por Covid-19 Peru", fontsize=15)
#ax2.set_xticks(x)
ax2.set_xticklabels(x[::],fontsize=10,rotation=90)
plt.show()
Any commands for the x-axis need to run before ax2 is created.
Verify that the date column is in a datetime format and set it as the index.
import pandas as pd
import matplotlib.pyplot as plt
# read data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv")
# verify the date column is a datetime format and set as index
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
#plot
# create figure
fig, ax1 = plt.subplots(1, 1, figsize=(10,8))
# 1st plot
ax1.plot(df['pop'], color='tab:red')
# set xticks rotation before creating ax2
plt.xticks(rotation=90)
# 2nd plot (Right Y Axis)
ax2 = ax1.twinx() # create the 'twin' axis on the right
ax2.plot(df['unemploy'], color='tab:blue')
plt.show()
Plot directly with pandas.DataFrame.plot
# load data
df = pd.read_csv("https://github.com/selva86/datasets/raw/master/economics.csv", parse_dates=True, index_col=[0])
# plot and rotate the tick labels with rot= in the first plot call
ax = df.plot(y='pop', color='tab:red', figsize=(10,8), rot=90)
ax2 = ax.twinx()
df.plot(y='unemploy', color='tab:blue', ax=ax2)
ax2.legend(loc='upper right')
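Another option, sketched here with synthetic data instead of the CSV (the demo frame and its values are made up), is to set the rotation on ax1 explicitly with tick_params, so it does not depend on which axes is current. This assumes ax1 is the axes that draws the visible x-axis, which is the case after twinx:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# hypothetical monthly data standing in for the economics dataset
dates = pd.date_range("2020-01-01", periods=24, freq="MS")
demo = pd.DataFrame({"pop": np.linspace(100, 200, 24),
                     "unemploy": np.linspace(5, 15, 24)}, index=dates)

fig, ax1 = plt.subplots(figsize=(10, 8))
ax1.plot(demo.index, demo["pop"], color="tab:red")

ax2 = ax1.twinx()  # second y-axis on the right, sharing the x-axis
ax2.plot(demo.index, demo["unemploy"], color="tab:blue")

# rotate the x tick labels on ax1, even though ax2 already exists
ax1.tick_params(axis="x", labelrotation=90)

fig.canvas.draw()  # render once so the tick labels are created
rotations = {label.get_rotation() for label in ax1.get_xticklabels()}
print(rotations)
```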

Python - dual y axis chart, align zero

I'm trying to create a horizontal bar chart, with dual x axes. The 2 axes are very different in scale, 1 set goes from something like -5 to 15 (positive and negative value), the other set is more like 100 to 500 (all positive values).
When I plot this, I'd like to align the 2 axes so zero shows at the same position, and only the negative values are to the left of this. Currently the set with all positive values starts at the far left, and the set with positive and negative starts in the middle of the overall plot.
I found the align_yaxis example, but I'm struggling to align the x axes.
Matplotlib bar charts: Aligning two different y axes to zero
Here is an example of what I'm working on with simple test data. Any ideas/suggestions? thanks
import pandas as pd
import matplotlib.pyplot as plt
d = {'col1':['Test 1','Test 2','Test 3','Test 4'],'col 2':[1.4,-3,1.3,5],'Col3':[900,750,878,920]}
df = pd.DataFrame(data=d)
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
ax2 = ax.twiny() # Create another axes that shares the same y-axis as ax.
width = 0.4
df['col 2'].plot(kind='barh', color='darkblue', ax=ax, width=width, position=1,fontsize =4, figsize=(3.0, 5.0))
df['Col3'].plot(kind='barh', color='orange', ax=ax2, width=width, position=0, fontsize =4, figsize=(3.0, 5.0))
ax.set_yticklabels(df.col1)
ax.set_xlabel('Positive and Neg',color='darkblue')
ax2.set_xlabel('Positive Only',color='orange')
ax.invert_yaxis()
plt.show()
I followed the link from a question and eventually ended up at this answer : https://stackoverflow.com/a/10482477/5907969
The answer has a function to align the y-axes and I have modified the same to align x-axes as follows:
def align_xaxis(ax1, v1, ax2, v2):
    """adjust ax2 xlimit so that v2 in ax2 is aligned to v1 in ax1"""
    x1, _ = ax1.transData.transform((v1, 0))
    x2, _ = ax2.transData.transform((v2, 0))
    inv = ax2.transData.inverted()
    dx, _ = inv.transform((0, 0)) - inv.transform((x1-x2, 0))
    minx, maxx = ax2.get_xlim()
    ax2.set_xlim(minx+dx, maxx+dx)
And then use it within the code as follows:
import pandas as pd
import matplotlib.pyplot as plt
d = {'col1':['Test 1','Test 2','Test 3','Test 4'],'col 2':[1.4,-3,1.3,5],'Col3':[900,750,878,920]}
df = pd.DataFrame(data=d)
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
ax2 = ax.twiny() # Create another axes that shares the same y-axis as ax.
width = 0.4
df['col 2'].plot(kind='barh', color='darkblue', ax=ax, width=width, position=1,fontsize =4, figsize=(3.0, 5.0))
df['Col3'].plot(kind='barh', color='orange', ax=ax2, width=width, position=0, fontsize =4, figsize=(3.0, 5.0))
ax.set_yticklabels(df.col1)
ax.set_xlabel('Positive and Neg',color='darkblue')
ax2.set_xlabel('Positive Only',color='orange')
align_xaxis(ax,0,ax2,0)
ax.invert_yaxis()
plt.show()
This will give you what you're looking for
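To check that the zeros really line up, here is a small self-contained sketch (my own addition). It repeats align_xaxis, draws the bars with Axes.barh instead of the pandas wrapper, and compares the pixel position of x=0 on both axes. The fig.canvas.draw() call before aligning is an assumption on my part, to make sure the autoscaled limits are applied before the transforms are read:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def align_xaxis(ax1, v1, ax2, v2):
    """adjust ax2 xlimit so that v2 in ax2 is aligned to v1 in ax1"""
    x1, _ = ax1.transData.transform((v1, 0))
    x2, _ = ax2.transData.transform((v2, 0))
    inv = ax2.transData.inverted()
    dx, _ = inv.transform((0, 0)) - inv.transform((x1 - x2, 0))
    minx, maxx = ax2.get_xlim()
    ax2.set_xlim(minx + dx, maxx + dx)


fig, ax = plt.subplots()
ax2 = ax.twiny()  # second x-axis on top, sharing the y-axis

# same values as the test data, drawn side by side
ax.barh([0, 1, 2, 3], [1.4, -3, 1.3, 5], height=0.4, color="darkblue")
ax2.barh([0.4, 1.4, 2.4, 3.4], [900, 750, 878, 920], height=0.4, color="orange")

fig.canvas.draw()  # apply autoscaling before reading the transforms
align_xaxis(ax, 0, ax2, 0)

# pixel x-coordinate of data value 0 on each axes
px_bottom = ax.transData.transform((0, 0))[0]
px_top = ax2.transData.transform((0, 0))[0]
print(abs(px_bottom - px_top))
```

Note that the function only shifts the limits of ax2. It keeps the span of ax2 unchanged, so some bars may move out of view if the shift is large.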

Plot alignment and formatting help in Matplotlib and Seaborn

I have a dataframe with 15 rows, which I plot using a seaborn heatmap. I have three plots, each with different scale for the heatmap. The first two plots are the first two rows, which are not aligned on the plot.
I have created a grid with 15 rows, I give each of the first two rows 1/15th of the grid so I don't know why it is not aligned.
Another problem with the first two rows of the heatmap is that the text formatting doesn't work either.
So I want to do two things:
Stretch the top two rows of the table to align it with the bottom one and;
To make the formatting work for the top two rows as well.
Maybe also add titles to my white horizontal lines (l1 and l2) that separate the subgroups in the bottom plot (standard methods like ax.set_title do not work).
My code:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.gridspec as gs
gs = gs.GridSpec(15, 1) # nrows, ncols
f = plt.figure(figsize=(10, 15))
cmap = sns.diverging_palette(220, 10, as_cmap=True)
ax1 = f.add_subplot(gs[0:1, :])
ax2 = f.add_subplot(gs[1:2, :])
ax3 = f.add_subplot(gs[2:15, :])
ticksx = plt.xticks(fontsize = 18, fontweight='bold')
ticksy = plt.yticks(fontsize = 18, fontweight='bold')
wageplot = sns.heatmap(df[0:1], vmin=3000, vmax=10000, annot=False, square=True, cmap=cmap, ax=ax1, yticklabels=True, cbar=False, xticklabels=False)
tenureplot = sns.heatmap(df[1:2], vmin=45, vmax=100, annot=True, square=True, cmap=cmap, ax=ax2, yticklabels=True, cbar=False, xticklabels=False)
heatmap = sns.heatmap(df[2:15], vmin=0, vmax=1, annot=False, square=True, cmap=cmap, ax=ax3, yticklabels=True, cbar=True, xticklabels=True)
heatmap.set_xticklabels(cols, rotation=45, ha='right')
l1 = plt.axhline(y=1, linewidth=14, color='w', label='Female')
l2 = plt.axhline(y=5, linewidth=14, color='w', label='Education')
f.tight_layout()
I would appreciate being pointed to information on how to set up the grid I need. An example image:
