How to plot and annotate grouped bars in seaborn / matplotlib - python

I have a dataframe that looks like this:
I have used a barplot to represent the subscribers for each row. This is what I did:
data = channels.sort_values('subscribers', ascending=False).head(5)
chart = sns.barplot(x = 'name', y='subscribers',data=data)
chart.set_xticklabels(chart.get_xticklabels(), rotation=90)
for p in chart.patches:
    chart.annotate("{:,.2f}".format(p.get_height()), (p.get_x() + p.get_width() / 2., p.get_height()), ha = 'center', va = 'center', xytext = (0, 10), textcoords = 'offset points')
Now I want to show the 'video_count' for each user on this same plot. The goal is to compare how the number of subscribers relates to the number of videos. How can I depict this on the chart?

Data
The data needs to be converted to a long format using .melt
Because of the scale of values, 'log' is used for the yscale
All of the categories in 'cats' are included for the example.
Select only the desired columns before melting, or use dfl = dfl[dfl.cats.isin(['sub', 'vc'])] to filter for the desired 'cats'.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# setup dataframe
data = {'vc': [76, 47, 140, 106, 246],
'tv': [29645400, 28770702, 50234486, 30704017, 272551386],
'sub': [66100, 15900, 44500, 37000, 76700],
'name': ['a', 'b', 'c', 'd', 'e']}
df = pd.DataFrame(data)
vc tv sub name
0 76 29645400 66100 a
1 47 28770702 15900 b
2 140 50234486 44500 c
# convert to long form
dfl = (df.melt(id_vars='name', var_name='cats', value_name='values')
.sort_values('values', ascending=False).reset_index(drop=True))
name cats values
0 e tv 272551386
1 c tv 50234486
2 d tv 30704017
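The filtering option mentioned in the notes above can be sketched as follows. This is a minimal example that assumes the same column names ('cats', 'values') as the melt call; the name dfl_sel is only illustrative and does not appear in the original answer.

```python
import pandas as pd

# same sample data as above
df = pd.DataFrame({'vc': [76, 47, 140, 106, 246],
                   'tv': [29645400, 28770702, 50234486, 30704017, 272551386],
                   'sub': [66100, 15900, 44500, 37000, 76700],
                   'name': ['a', 'b', 'c', 'd', 'e']})

# convert to long form
dfl = df.melt(id_vars='name', var_name='cats', value_name='values')

# keep only the subscriber and video-count rows (dfl_sel is an illustrative name)
dfl_sel = dfl[dfl.cats.isin(['sub', 'vc'])].reset_index(drop=True)
```

Passing dfl_sel instead of dfl to the plotting call should then draw two bars per name rather than three.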
Updated as of matplotlib v3.4.2
Use matplotlib.pyplot.bar_label
.bar_label works for matplotlib, seaborn, and pandas plots.
See How to add value labels on a bar chart for additional details and examples with .bar_label.
Tested with seaborn v0.11.1, which is using matplotlib as the plot engine.
# plot
fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(x='name', y='values', data=dfl, hue='cats', ax=ax)
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)
ax.set_yscale('log')
for c in ax.containers:
    # set the bar label
    ax.bar_label(c, fmt='%.0f', label_type='edge', padding=1)
# pad the spacing between the number and the edge of the figure
ax.margins(y=0.1)
Plot with seaborn v0.11.1
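Since .bar_label is said to work for pandas plots as well, here is a hedged sketch of the same chart drawn directly from the wide-form data with pandas, so no melt is needed. It assumes matplotlib 3.4 or newer and plots only the 'sub' and 'vc' columns; it is an illustration, not part of the original answer.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'vc': [76, 47, 140, 106, 246],
                   'sub': [66100, 15900, 44500, 37000, 76700],
                   'name': ['a', 'b', 'c', 'd', 'e']})

# wide-form data: pandas draws one group of bars per column listed in y
ax = df.plot(kind='bar', x='name', y=['sub', 'vc'], logy=True, rot=0, figsize=(12, 6))

# there is one container per plotted column; label every bar with its height
for c in ax.containers:
    ax.bar_label(c, fmt='%.0f', label_type='edge', padding=1)

# leave some room between the tallest label and the top of the axes
ax.margins(y=0.1)
```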
Using matplotlib before version 3.4.2
Note that using .annotate and .patches is much more verbose than with .bar_label.
# plot
fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(x='name', y='values', data=dfl, hue='cats', ax=ax)
ax.set_xticklabels(ax.get_xticklabels(), rotation=0)
ax.set_yscale('log')
for p in ax.patches:
    ax.annotate(f"{p.get_height():.0f}", (p.get_x() + p.get_width() / 2., p.get_height()),
                ha='center', va='center', xytext =(0, 7), textcoords='offset points')

Related

How to customize seaborn boxplot with specific color sequence when boxplots have hue

I want to make boxplots with hues but I want to color code it so that each specific X string is a certain color with the hue just being a lighter color. I am able to do a boxplot without a hue. When I incorporate the hue, I get the second boxplot which loses the colors. Can someone help me customize the colors for the figure that contains the hue?
Essentially, it's what the answer to this question does, but with boxplots.
This is my code:
first boxplot
order=['Ash1','E1A','FUS','p53']
colors=['gold','teal','darkorange','royalblue']
color_dict=dict(zip(order,colors))
fig,ax=plt.subplots(figsize=(25,15))
bp=sns.boxplot(data=df_idrs, x=df_idrs["construct"], y=df_idrs['Norm_Ef_IDR/Ef_GS'],ax=ax,palette=color_dict)
sns.stripplot(ax=ax,y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs,palette=color_dict,
jitter=1, marker='o', alpha=0.4,edgecolor='black',linewidth=1, dodge=True)
ax.axhline(y=1,linestyle="--",color='black',linewidth=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
second boxplot
order=['Ash1','E1A','FUS','p53']
colors=['gold','teal','darkorange','royalblue']
color_dict=dict(zip(order,colors))
fig,ax=plt.subplots(figsize=(25,15))
bp=sns.boxplot(data=df_idrs, x=df_idrs["construct"], y=df_idrs['Norm_Ef_IDR/Ef_GS'],ax=ax, hue=df_idrs["location"])
sns.stripplot(y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs, hue=df_idrs["location"],
jitter=1, marker='o', alpha=0.4,edgecolor='black',linewidth=1, dodge=True)
ax.axhline(y=1,linestyle="--",color='black',linewidth=2)
plt.legend(loc='upper left', bbox_to_anchor=(1.03, 1))
The only thing that changed was the palette to hue. I have seen many examples on here but I am unable to get them to work. Using the second code, I have tried the following:
Nothing happens for this one.
for ind, bp in enumerate(ax.findobj(PolyCollection)):
    rgb = to_rgb(colors[ind // 2])
    if ind % 2 != 0:
        rgb = 0.5 + 0.5 * np.array(rgb) # make whiter
    bp.set_facecolor(rgb)
I get index out of range for the following one.
for i in range(0,4):
    mybox = bp.artists[i]
    mybox.set_facecolor(color_dict[order[i]])
Matplotlib stores the boxes in ax.patches, but there are also 2 dummy patches (used to construct the legend) that need to be filtered away. The dots of the stripplot are stored in ax.collections. There are also 2 dummy collections for the legend, but as those come at the end, they don't form a problem.
Some remarks:
sns.boxplot returns the subplot on which it was drawn; as it is called with ax=ax it will return that same ax
Setting jitter=1 in the stripplot will smear the dots over a width of 1. 1 is the distance between the x positions, and the boxes are only 0.4 wide. To avoid clutter, the code below uses jitter=0.4.
Here is some example code starting from dummy test data:
from matplotlib import pyplot as plt
from matplotlib.legend_handler import HandlerTuple
from matplotlib.patches import PathPatch
from matplotlib.colors import to_rgb
import seaborn as sns
import pandas as pd
import numpy as np
np.random.seed(20230215)
order = ['Ash1', 'E1A', 'FUS', 'p53']
colors = ['gold', 'teal', 'darkorange', 'royalblue']
hue_order = ['A', 'B']
df_idrs = pd.DataFrame({'construct': np.repeat(order, 200),
'Norm_Ef_IDR/Ef_GS': (np.random.normal(0.03, 1, 800).cumsum() + 10) / 15,
'location': np.tile(np.repeat(hue_order, 100), 4)})
fig, ax = plt.subplots(figsize=(12, 5))
sns.boxplot(data=df_idrs, x=df_idrs['construct'], y=df_idrs['Norm_Ef_IDR/Ef_GS'], hue='location',
order=order, hue_order=hue_order, ax=ax)
box_colors = [f + (1 - f) * np.array(to_rgb(c)) # whiten colors depending on hue
for c in colors for f in np.linspace(0, 0.5, len(hue_order))]
box_patches = [p for p in ax.patches if isinstance(p, PathPatch)]
for patch, color in zip(box_patches, box_colors):
    patch.set_facecolor(color)
sns.stripplot(y='Norm_Ef_IDR/Ef_GS', x='construct', data=df_idrs, hue=df_idrs['location'],
jitter=0.4, marker='o', alpha=0.4, edgecolor='black', linewidth=1, dodge=True, ax=ax)
for collection, color in zip(ax.collections, box_colors):
    collection.set_facecolor(color)
ax.axhline(y=1, linestyle='--', color='black', linewidth=2)
handles = [tuple(box_patches[i::len(hue_order)]) for i in range(len(hue_order))]
ax.legend(handles=handles, labels=hue_order, title='hue category',
handlelength=4, handler_map={tuple: HandlerTuple(ndivide=None, pad=0)},
loc='upper left', bbox_to_anchor=(1.01, 1))
plt.tight_layout()
plt.show()
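The whitening step in the list comprehension may be easier to follow in isolation. The sketch below reuses the same expression, f + (1 - f) * rgb, which blends each base color toward white by a fraction f; it uses only two of the base colors to keep the result short.

```python
import numpy as np
from matplotlib.colors import to_rgb

colors = ['gold', 'teal']
hue_order = ['A', 'B']

# f = 0 keeps the base color, f = 0.5 moves it halfway toward white (1, 1, 1)
box_colors = [f + (1 - f) * np.array(to_rgb(c))
              for c in colors for f in np.linspace(0, 0.5, len(hue_order))]

# order of the result: gold, light gold, teal, light teal
for rgb in box_colors:
    print(np.round(rgb, 3))
```

The colors come out grouped by x category first and hue second, which is the order the answer relies on when it zips them with the box patches.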

Combine Binned barplot with lineplot

I'd like to represent two datasets on the same plot, one as a line and one as a binned barplot. I can do each individually:
tobar = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
tobar["bins"] = pd.qcut(tobar.index, 20)
bp = sns.barplot(data=tobar, x="bins", y="value")
toline = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
lp = sns.lineplot(data=toline, x=toline.index, y="value")
But when I try to combine them, of course the x axis gets messed up:
fig, ax = plt.subplots()
ax2 = ax.twinx()
bp = sns.barplot(data=tobar, x="bins", y="value", ax=ax)
lp = sns.lineplot(data=toline, x=toline.index, y="value", ax=ax2)
bp.set(xlabel=None)
I also can't seem to get rid of the bin labels.
How can I show both datasets on the same plot?
This answer explains why it's better to plot the bars with matplotlib.axes.Axes.bar instead of sns.barplot or pandas.DataFrame.bar.
In short, the xtick locations correspond to the actual numeric value of the label, whereas the xticks for seaborn and pandas are 0 indexed, and don't correspond to the numeric value.
This answer shows how to add bar labels.
ax2 = ax.twinx() can be used for the line plot if needed
Works the same if the line plot is different data.
Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
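The point about tick locations can be checked with a small sketch. The three-row frame below is made-up illustration data, not the data from the question; it compares where pandas and matplotlib.axes.Axes.bar actually place the bars.

```python
import pandas as pd
import matplotlib.pyplot as plt

# made-up data: three bins with midpoints 25, 75 and 125
pt = pd.DataFrame({'mid': [25, 75, 125], 'value': [1.0, -2.0, 3.0]})

fig, (ax1, ax2) = plt.subplots(1, 2)

# pandas treats x as categorical and places the bars at 0, 1, 2
pt.plot(kind='bar', x='mid', y='value', ax=ax1, legend=False)

# matplotlib places the bars at the numeric x values
ax2.bar(pt['mid'], pt['value'], width=40)

# centre of each bar in data coordinates
pandas_x = [p.get_x() + p.get_width() / 2 for p in ax1.patches]
mpl_x = [p.get_x() + p.get_width() / 2 for p in ax2.patches]
print(pandas_x, mpl_x)
```

A line plotted against the real index (0 to 999 in the question) therefore lines up with the ax.bar version but not with the pandas or seaborn bars.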
Imports and DataFrame
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# test data
np.random.seed(2022)
df = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
# create the bins
df["bins"] = pd.qcut(df.index, 20)
# add a column for the mid point of the interval
df['mid'] = df.bins.apply(lambda row: row.mid.round().astype(int))
# pivot the dataframe to calculate the mean of each interval
pt = df.pivot_table(index='mid', values='value', aggfunc='mean').reset_index()
Plot 1
# create the figure
fig, ax = plt.subplots(figsize=(30, 7))
# add a horizontal line at y=0
ax.axhline(0, color='black')
# add the bar plot
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
# set the labels on the xticks - if desired
ax.set_xticks(ticks=pt.mid, labels=pt.mid)
# add the intervals as labels on the bars - if desired
ax.bar_label(ax.containers[0], labels=df.bins.unique(), weight='bold')
# add the line plot
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 2
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 3
The bar width is the width of the interval
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=50, alpha=0.5, ec='k')
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
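Plot 3 hard-codes width=50. As a hedged alternative, the width and midpoint of each bin can be read from the intervals that pd.qcut produces. The sketch below assumes the same binning as above (1000 positions, 20 quantile bins); the names intervals, mids and widths are illustrative.

```python
import numpy as np
import pandas as pd

# same binning as above: 1000 positions split into 20 quantile bins
bins = pd.qcut(np.arange(1000), 20)

intervals = bins.categories   # IntervalIndex with one interval per bin
mids = intervals.mid          # candidate x positions for ax.bar
widths = intervals.length     # candidate bar widths, close to 50 for this data

print(len(intervals), float(widths.min()), float(widths.max()))
```

These could then be passed to ax.bar as the x and width arguments together with the per-bin means, provided the means are in the same bin order.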

labeling data points when X axis is a string [duplicate]

I have created a bar chart and a line chart using two different y-axes for the following dataframe.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'DXC':['T1', 'H1', 'HP', 'T1_or_H1_or_HP'],
'Count': [2485, 5595, 3091, 9933],
'percent':[1.06, 2.39, 1.31, 4.23]})
DXC Count percent
0 T1 2485 1.06
1 H1 5595 2.39
2 HP 3091 1.31
3 T1_or_H1_or_HP 9933 4.23
Using the following code, I can also display values next to each bar in the bar chart. However, I have not been successful thus far in my attempts to also display the label (percent) values for the line plot.
fig=plt.figure()
#AX: bar chart
ax=df["Count"].plot(kind="bar", color="orange")
ax.set_ylabel("Counts")
ax.set_xlabel("")
ax.set_ylim(0,20000)
for tick in ax.get_xticklabels():
    tick.set_rotation(0)
#AX2: Create secondary y-axis with same x-axis as above for plotting percent values
ax2=ax.twinx()
ax2.plot(ax.get_xticks(),df["percent"], color="red", linewidth=4, marker = "o")
ax2.grid(False)
ax2.set_ylabel("Percent", color = "red")
ax2.set_ylim(0,4.5)
ax2.tick_params(labelcolor="red", axis='y')
def add_value_labels(ax, spacing=5):
    for i in ax.patches:
        y_value = i.get_height()
        x_value = i.get_x() + i.get_width() / 2
        space = spacing
        va = 'bottom'
        # Use Y value as label and format number with no decimal place
        label = "{:.0f}".format(y_value)
        # Create annotation
        ax.annotate(label,(x_value, y_value), xytext=(0, space), textcoords="offset points", ha='center', va=va)
add_value_labels(ax)
plt.show()
Can somebody suggest how to display labels for both bar plot and line plot in the same figure?
Here is a modified function that will achieve the required task. The trick is to extract the x and y values based on the type of the chart you have. For a line chart, you can use ax.lines[0] and then get_xdata and get_ydata
def add_value_labels(ax, typ, spacing=5):
    space = spacing
    va = 'bottom'
    if typ == 'bar':
        for i in ax.patches:
            y_value = i.get_height()
            x_value = i.get_x() + i.get_width() / 2
            label = "{:.0f}".format(y_value)
            ax.annotate(label,(x_value, y_value), xytext=(0, space),
                        textcoords="offset points", ha='center', va=va)
    if typ == 'line':
        line = ax.lines[0]
        for x_value, y_value in zip(line.get_xdata(), line.get_ydata()):
            label = "{:.2f}".format(y_value)
            ax.annotate(label,(x_value, y_value), xytext=(0, space),
                        textcoords="offset points", ha='center', va=va)
add_value_labels(ax, typ='bar')
add_value_labels(ax2, typ='line')
From matplotlib v3.4.0 it's easier to use matplotlib.pyplot.bar_label, as explained in this answer.
The OP has many extraneous steps, which can be removed by using the yticks, secondary_y, and ylabel parameters for pandas.DataFrame.plot
pandas.DataFrame.itertuples can be used to annotate the line with matplotlib.axes.Axes.annotate because .Index corresponds to the x-axis locations and .percent is the correct y value for ax2.
See How to add hovering annotations to a plot for additional options to annotate the line.
See How to change the color of the axis, ticks and labels for a plot to easily change colors of various aspects of the figure.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# create the bar plot for the Count column and specify the yticks
ax = df.Count.plot(kind='bar', color='tab:orange', rot=0, yticks=range(0, 20001, 2500), figsize=(9, 5), ylabel='Counts')
# add bar labels
ax.bar_label(ax.containers[0])
# add the line plot for the percent column and specify the yticks and secondary_y
ax2 = df.percent.plot(marker='.', yticks=np.arange(0, 5, 0.5), secondary_y=True, ax=ax, ylabel='Percent')
# annotate the line by iterating through each row with itertuples
for row in df.itertuples():
    ax2.annotate(text=row.percent, xy=(row.Index, row.percent))
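A slightly extended sketch of the itertuples approach, assuming the same dataframe as in the question: it formats the percent values to two decimals, lifts each label a few points above its marker, and uses the DXC strings as tick labels. The offset of 5 points is an arbitrary choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'DXC': ['T1', 'H1', 'HP', 'T1_or_H1_or_HP'],
                   'Count': [2485, 5595, 3091, 9933],
                   'percent': [1.06, 2.39, 1.31, 4.23]})

# bars on the primary y-axis, labelled with bar_label (matplotlib 3.4+)
ax = df.Count.plot(kind='bar', color='tab:orange', rot=0, figsize=(9, 5))
ax.set_ylabel('Counts')
ax.bar_label(ax.containers[0])

# line on the secondary y-axis; pandas returns the right-hand axes here
ax2 = df.percent.plot(marker='.', color='red', secondary_y=True, ax=ax)
ax2.set_ylabel('Percent')

# the bars sit at positions 0..n-1, so the strings are only tick labels
ax.set_xticks(range(len(df)))
ax.set_xticklabels(df.DXC)

# row.Index is the x position, row.percent the y value on ax2
for row in df.itertuples():
    ax2.annotate(f'{row.percent:.2f}', xy=(row.Index, row.percent),
                 xytext=(0, 5), textcoords='offset points',
                 ha='center', va='bottom')
```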

Why do I have two legends? How do I merge the legends? Python

I'm plotting 2 dataframes with this method:
df.plot(ax=ax, x='x', y='y', label = "first_df")
df2.plot(ax=ax, x='x', y='y', label = "second_df")
And I add some axvspan calls:
plt.axvspan(x, y, label = value)
Since I have multiple axvspan calls and some of their labels are duplicated, I am using this code to display each label only once.
handles, labels = plt.gca().get_legend_handles_labels()
by_label = dict(zip(labels, handles))
plt.legend(by_label.values(), by_label.keys(),loc='upper center', bbox_to_anchor=(1.1, 0.8))
But when I display the legend, I have one legend for the dfs and another for the axvspan calls. I think it is because I use plot for the dfs and plt for axvspan, so I don't know how to merge the legends.
EDIT:
I tried this with ax1 and ax2 for my dfs:
h1, l1 = ax1.get_legend_handles_labels()
h2, l2 = ax2.get_legend_handles_labels()
ax1.legend((h1+h2), l1+l2, loc='upper center', bbox_to_anchor=(1.1, 0.8))
It's working, but I have duplicates in the legend. How can I remove them?
Instead of using df.plot, which creates a legend whenever it's called, you can use ax.plot:
ax.plot(df['x'], df['y'], label='first df')
ax.plot(df2['x'], df2['y'], label='second df')
ax.legend()
In the following example, the legend is combined by plotting everything onto the same Axes, including the axvspan with ax.axvspan, and then running ax.legend to add the axvspan entry to the existing legend:
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
# Create sample dataset
rng = np.random.default_rng(seed=1)
size = 30
df = pd.DataFrame(dict(x=range(size), y=rng.integers(0, 100, size=size)))
df2 = pd.DataFrame(dict(x=range(size), y=rng.integers(10, 50, size=size)))
# Plot data onto a single Axes with combined legend using multiple plotting functions
ax = df.plot(x='x', y='y', label='first_df', figsize=(8,4))
df2.plot(ax=ax, x='x', y='y', label = 'second_df')
ax.axvspan(10, 15, label='span', facecolor='black', edgecolor=None, alpha=0.2)
ax.legend(loc='upper right');
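When several spans share a label, the de-duplication idiom from the question can be applied to the single Axes before building the legend. The sketch below uses small made-up data and two spans with the same label; it is an illustration under those assumptions.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 0], label='first_df')
ax.plot([0, 1, 2], [1, 0, 1], label='second_df')

# two spans that share one label
ax.axvspan(0.2, 0.4, alpha=0.2, label='span')
ax.axvspan(1.2, 1.4, alpha=0.2, label='span')

# collect every handle on this Axes, then keep one handle per label
handles, labels = ax.get_legend_handles_labels()
by_label = dict(zip(labels, handles))
legend = ax.legend(list(by_label.values()), list(by_label.keys()), loc='upper right')
```

Because the dict keeps insertion order, the legend entries appear in the order the labels were first seen.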

Bar chart with bars from two different dataframes

I have the following dataframes:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df_One = pd.DataFrame({'Category': ['1024Sen', '1024Act', '2089Eng', '2089Sen'],
'Qtd_Instrumentation': [18, 5, 25, 10]})
df_Two = pd.DataFrame({'Category': ['1024Sen', '1024Act', '2089Eng', '2089Sen'],
'Qtd_Instrumentation': [14, 1, 22, 10]})
I would like to build a bar graph that contains the information from both dataframes, that is, the blue bars represent df_One and the red bars represent df_Two.
I tried to implement it as follows:
fig, ax = plt.subplots()
n_group = len(df_One['Category'])
index = np.arange(n_group)
bar_width = 0.35
opacity = 0.8
rects1 = df_One.plot.bar(x='Category', y='Qtd_Instrumentation', color='r', label = 'Station One')
rects2 = df_Two.plot.bar(x='Category', y='Qtd_Instrumentation', color='b', label = 'Station Two')
plt.xlabel('Category Instrumentation')
plt.ylabel('Qtd Instrumentation')
plt.show()
However, this code is wrong, as it draws two separate bar graphs instead of one graph containing both sets of bars.
Does anyone know how I can build this described chart? Tks
One way to do it is to pass align='edge' and use a positive width for one set of bars and a negative width for the other, which places them side by side. You also have to call plt.legend() to display the legend.
fig, ax = plt.subplots()
index = np.arange(len(df_One['Category']))
bar_width = 0.35
opacity = 0.8
ax.bar(index, df_One['Qtd_Instrumentation'], color='r', align='edge', width=bar_width, label = 'Station One')
ax.bar(index, df_Two['Qtd_Instrumentation'], color='b', align='edge', width=-bar_width,label = 'Station Two')
# Assign the tick labels
ax.set_xticks(index)
ax.set_xticklabels(df_One['Category'], rotation=90)
plt.xlabel('Category Instrumentation')
plt.ylabel('Qtd Instrumentation')
plt.legend()
plt.show()
An alternative is to use the position keyword to place the bars next to each other, as shown here:
df_One.Qtd_Instrumentation.plot(kind='bar', color='red', ax=ax, width=bar_width, position=1)
df_Two.Qtd_Instrumentation.plot(kind='bar', color='blue', ax=ax, width=bar_width, position=0)
ax.set_xlim(-0.5, 3.5)
ax.set_xticks(index)
ax.set_xticklabels(df_One['Category'])
I'd suggest merging the two dataframes first:
df_c = pd.merge(df_One, df_Two, on='Category')
df_c.plot.bar(x='Category')
gives:
Note that you might want to pass how='outer' to merge if you have missing categories.
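A minimal sketch of the how='outer' case, assuming a hypothetical variant of df_Two in which one category is missing. The suffixes and the fillna(0) step are illustrative choices, not part of the original answer.

```python
import pandas as pd

df_One = pd.DataFrame({'Category': ['1024Sen', '1024Act', '2089Eng', '2089Sen'],
                       'Qtd_Instrumentation': [18, 5, 25, 10]})
# hypothetical variant of df_Two without the '2089Sen' row
df_Two = pd.DataFrame({'Category': ['1024Sen', '1024Act', '2089Eng'],
                       'Qtd_Instrumentation': [14, 1, 22]})

# an outer merge keeps categories that appear in only one of the frames
df_c = pd.merge(df_One, df_Two, on='Category', how='outer',
                suffixes=('_One', '_Two'))

# a missing category shows up as NaN; fill with 0 so its bar has zero height
df_c = df_c.fillna(0)
print(df_c)
```

Calling df_c.plot.bar(x='Category') on the result should then draw one pair of bars per category, with a zero-height bar where a value was missing.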
