I have a bar chart of data from 8 separate buildings, separated by year. I'm trying to place the growth each building went through in the last year on top of the bar chart.
I have this written currently:
import matplotlib.pyplot as plt
import numpy as np
from textwrap import wrap

n_groups = 8
numbers_2017 = (122, 96, 42, 23, 23, 22, 0, 0)
numbers_2018 = (284, 224, 122, 52, 41, 24, 3, 1)

fig, ax = plt.subplots(figsize=(15, 10))
index = np.arange(n_groups)
bar_width = 0.35

events2017 = plt.bar(index, numbers_2017, bar_width,
                     alpha=0.7,
                     color='#fec615',
                     label='2017')
events2018 = plt.bar(index + bar_width, numbers_2018, bar_width,
                     alpha=0.7,
                     color='#044a05',
                     label='2018')

labels = ("8 specific buildings passed as strings")
labels = ['\n'.join(wrap(l, 15)) for l in labels]

plt.ylabel('Total Number of Events', fontsize=18, fontweight='bold', color='white')
plt.title('Number of Events per Building By Year\n', fontsize=20, fontweight='bold', color='white')
plt.xticks(index + bar_width / 2)
plt.yticks(color='white', fontsize=12)
ax.set_xticklabels(labels, fontsize=12, fontweight='bold', color='white')
plt.legend(loc='best', fontsize='xx-large')
plt.tight_layout()
plt.show()
Looking through similar questions on here, many of them split the total count across all the bars, whereas I'm just trying to get a positive (or negative) growth percentage placed on top of the most recent year, 2018 in this case.
I found this excellent example online; however, it does exactly what I described above and splits the percentages across the whole chart:
totals = []
# find the values and append to list
for i in ax.patches:
    totals.append(i.get_height())
total = sum(totals)
# set individual bar labels using above list
for i in ax.patches:
    # get_x pulls left or right; get_height pushes up or down
    ax.text(i.get_x() - .03, i.get_height() + .5,
            str(round((i.get_height() / total) * 100, 1)) + '%', fontsize=15,
            color='dimgrey')
Please let me know if I can list any examples or images that would help. If this is a dupe, please don't hesitate to send me to a (RELEVANT) original and I can shut this question down. Thanks!
I think you answered this yourself with the second piece of code.
The only change needed is to loop over the patches of the object you want the text above, which in this case is events2018, instead of ax.patches, and to compute the growth against the matching 2017 bar.
totals = []
for start, end in zip(events2017.patches, events2018.patches):
    if start.get_height() != 0:
        totals.append((end.get_height() - start.get_height()) / start.get_height() * 100)
    else:
        # growth from zero is undefined, so mark it instead of dividing by zero
        totals.append("NaN")
# set individual bar labels using above list
for ind, i in enumerate(events2018.patches):
    # get_x pulls left or right; get_height pushes up or down
    if totals[ind] != "NaN":
        plt.text(i.get_x(), i.get_height() + .5,
                 str(round(totals[ind], 1)) + '%', fontsize=15,
                 color='dimgrey')
    else:
        plt.text(i.get_x(), i.get_height() + .5,
                 totals[ind], fontsize=15, color='dimgrey')
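On matplotlib 3.4 or later, the same labels could probably also be produced with Axes.bar_label, which takes care of the placement. This is only a minimal sketch using the numbers from the question; the helper name growth_labels is made up for illustration and is not part of matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

numbers_2017 = (122, 96, 42, 23, 23, 22, 0, 0)
numbers_2018 = (284, 224, 122, 52, 41, 24, 3, 1)


def growth_labels(old, new):
    """Hypothetical helper: one percentage string per pair, 'NaN' if old is 0."""
    labels = []
    for start, end in zip(old, new):
        if start == 0:
            labels.append("NaN")  # growth from zero is undefined
        else:
            labels.append(f"{(end - start) / start * 100:.1f}%")
    return labels


index = np.arange(len(numbers_2017))
bar_width = 0.35

fig, ax = plt.subplots(figsize=(15, 10))
ax.bar(index, numbers_2017, bar_width, alpha=0.7, color='#fec615', label='2017')
events2018 = ax.bar(index + bar_width, numbers_2018, bar_width,
                    alpha=0.7, color='#044a05', label='2018')

labels = growth_labels(numbers_2017, numbers_2018)
# bar_label needs matplotlib >= 3.4; padding is given in points
ax.bar_label(events2018, labels=labels, padding=3, fontsize=15, color='dimgrey')
ax.legend()
```

Only the 2018 container is passed to bar_label, so the 2017 bars stay unlabelled. Call plt.show() with an interactive backend to see the figure.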
I want to create a stacked horizontal bar plot with values of each stack displayed inside it and the total value of the stacks just after the bar. Using python matplotlib, I could create a simple barh. My dataframe looks like below:
import pandas as pd
df = pd.DataFrame({"single": [168, 345, 345, 352],
                   "comp": [481, 44, 23, 58]})
item = ["white_rice",
        "pork_and_salted_vegetables",
        "sausage_and_potato_in_tomato_sauce",
        "curry_vegetable"]
df.index = item
I expect to get a bar plot like the one below, except that the one below is not horizontal:
The code I tried is below, and I get AttributeError: 'DataFrame' object has no attribute 'rows'. Please help me with the horizontal bar plot. Thanks.
fig, ax = plt.subplots(figsize=(10,4))
colors = ['c', 'y']
ypos = np.zeros(len(df))
for i, row in enumerate(df.index):
    ax.barh(df.index, df[row], x=ypos, label=row, color=colors[i])
    bottom += np.array(df[row])
totals = df.sum(axis=0)
x_offset = 4
for i, total in enumerate(totals):
    ax.text(totals.index[i], total + x_offset, round(total), ha='center',) # weight='bold')
x_offset = -15
for bar in ax.patches:
    ax.text(
        # Put the text in the middle of each bar. get_x returns the start so we add half the width to get to the middle.
        bar.get_y() + bar.get_height() / 2,
        bar.get_width() + bar.get_x() + x_offset,
        # This is actual value we'll show.
        round(bar.get_width()),
        # Center the labels and style them a bit.
        ha='center',
        color='w',
        weight='bold',
        size=10)
labels = df.index
ax.set_title('Label Distribution Overview')
ax.set_yticklabels(labels, rotation=90)
ax.legend(fancybox=True)
Consider the following approach to get something similar with matplotlib only (I use matplotlib 3.5.0; bar_label requires 3.4 or later). The job is done by combining bar/barh with bar_label. You can change label_type and add padding to tweak the plot's appearance, and use fmt to format the values. The code has been edited to add the total values.
import matplotlib.pyplot as plt
import pandas as pd
import random

def main(data):
    data['total'] = data['male'] + data['female']
    fig, (ax1, ax2) = plt.subplots(1, 2)
    fig.suptitle('Plot title')
    ax1.bar(x=data['year'].astype(str), height=data['female'], label='female')
    ax1.bar_label(ax1.containers[0], label_type='center')
    ax1.bar(x=data['year'].astype(str), height=data['male'], bottom=data['female'], label='male')
    ax1.bar_label(ax1.containers[1], label_type='center')
    ax1.bar_label(ax1.containers[1], labels=data['total'], label_type='edge')
    ax1.legend()
    ax2.barh(y=data['year'].astype(str), width=data['female'], label='female')
    ax2.bar_label(ax2.containers[0], label_type='center')
    ax2.barh(y=data['year'].astype(str), width=data['male'], left=data['female'], label='male')
    ax2.bar_label(ax2.containers[1], label_type='center')
    ax2.bar_label(ax2.containers[1], labels=data['total'], label_type='edge')
    ax2.legend()
    plt.show()

if __name__ == '__main__':
    N = 4
    main(pd.DataFrame({
        'year': [2010 + val for val in range(N)],
        'female': [int(10 + 100 * random.random()) for dummy in range(N)],
        'male': [int(10 + 100 * random.random()) for dummy in range(N)]}))
Result (with total values added):
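Applied to the DataFrame from the question, the same barh plus bar_label combination might look like the sketch below. It assumes matplotlib 3.4 or later; the running left offset and the loop over columns are one possible arrangement, not the only one.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"single": [168, 345, 345, 352],
                   "comp": [481, 44, 23, 58]},
                  index=["white_rice",
                         "pork_and_salted_vegetables",
                         "sausage_and_potato_in_tomato_sauce",
                         "curry_vegetable"])

fig, ax = plt.subplots(figsize=(10, 4))
left = pd.Series(0, index=df.index)  # where the next segment starts
for column, color in zip(df.columns, ['c', 'y']):
    bars = ax.barh(df.index, df[column], left=left, label=column, color=color)
    ax.bar_label(bars, label_type='center')  # value inside each segment
    left = left + df[column]

# after the loop, `left` holds the row totals and `bars` is the outermost stack
totals = left.tolist()
ax.bar_label(bars, labels=totals, label_type='edge', padding=3)
ax.set_title('Label Distribution Overview')
ax.legend(fancybox=True)
fig.tight_layout()
```

The key difference from bar is that barh stacks with left instead of bottom, and the bar length is passed as width.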
I'm working on a personal experimentation project. I have the following dataframes:
treat_repr = pd.DataFrame({
    'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet'],
    'diff_pct': [0.655280, 0.127299, 0.229958, 0.613308, -0.718421],
    'me_pct': [1.206313, 0.182875, 0.170821, 1.336590, 2.229763],
    'p': [0.287025, 0.172464, 0.008328, 0.368466, 0.527718],
    'significance': ['insignificant', 'insignificant', 'significant', 'insignificant', 'insignificant']})
pre_treat_repr = pd.DataFrame({
    'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet'],
    'diff_pct': [0.137174, 0.111005, 0.169490, -0.152929, -0.450667],
    'me_pct': [1.419080, 0.207081, 0.202014, 1.494588, 1.901672],
    'p': [0.849734, 0.293427, 0.100091, 0.841053, 0.642303],
    'significance': ['insignificant', 'insignificant', 'insignificant', 'insignificant', 'insignificant']})
I used the code below to construct an errorbar plot, which works fine:
def confint_plot(df):
    plt.style.use('fivethirtyeight')
    fig, ax = plt.subplots(figsize=(18, 10))
    plt.errorbar(df[df['significance'] == 'significant']["diff_pct"],
                 df[df['significance'] == 'significant']["kpi"],
                 xerr=df[df['significance'] == 'significant']["me_pct"],
                 color='#d62828', fmt='o', capsize=10)
    plt.errorbar(df[df['significance'] == 'insignificant']["diff_pct"],
                 df[df['significance'] == 'insignificant']["kpi"],
                 xerr=df[df['significance'] == 'insignificant']["me_pct"],
                 color='#2a9d8f', fmt='o', capsize=10)
    plt.legend(['significant', 'insignificant'], loc='best')
    ax.axvline(0, c='red', alpha=0.5, linewidth=3.0,
               linestyle='--', ymin=0.0, ymax=1)
    plt.title("Confidence Intervals of Continuous Metrics", size=14, weight='bold')
    plt.xlabel("% Difference of Control over Treatment", size=12)
    plt.show()
for which the output of confint_plot(treat_repr) looks like this:
Now if I run the same plot function on a pre-treatment dataframe confint_plot(pre_treat_repr), the plot looks like this:
Comparing the two plots, the order of the KPIs on the y axis changes from the first plot to the second depending on whether a KPI is significant (that is what I worked out after many attempts).
Questions:
How do I make a change to the code to dynamically allocate color maps without changing the order of the kpis on y axis?
Currently I have manually typed in the legends. Is there a way to dynamically populate legends?
Appreciate the help!
Because you plot the significant KPIs first, they always end up at the bottom of the chart. How you solve this while keeping the desired colors depends on the kind of chart you are making with matplotlib. With scatter charts, you can pass an array of colors through the c parameter. Error bar charts do not offer that functionality.
One way to work around that is to sort your KPIs, give them numeric positions (0, 1, 2, 3, ...), plot them in two passes (once for the significant ones, once for the insignificant ones) and re-tick the axis:
def confint_plot(df):
    plt.style.use('fivethirtyeight')
    fig, ax = plt.subplots(figsize=(18, 10))

    # Sort the KPIs alphabetically. You can change the order to anything
    # that fits your purpose
    df_plot = df.sort_values('kpi').assign(y=range(len(df)))

    for significance in ['significant', 'insignificant']:
        cond = df_plot['significance'] == significance
        color = '#d62828' if significance == 'significant' else '#2a9d8f'

        # Plot them in their numeric positions first
        plt.errorbar(
            df_plot.loc[cond, 'diff_pct'], df_plot.loc[cond, 'y'],
            xerr=df_plot.loc[cond, 'me_pct'], label=significance,
            fmt='o', capsize=10, c=color
        )

    plt.legend(loc='best')
    ax.axvline(0, c='red', alpha=0.5, linewidth=3.0,
               linestyle='--', ymin=0.0, ymax=1)

    # Re-tick to show the KPIs
    plt.yticks(df_plot['y'], df_plot['kpi'])

    plt.title("Confidence Intervals of Continuous Metrics", size=14, weight='bold')
    plt.xlabel("% Difference of Control over Treatment", size=12)
    plt.show()
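For comparison, the scatter case mentioned above might look like the following sketch: Axes.scatter accepts one color per point through c, so the rows keep their original order on the y axis. The palette dictionary and the trimmed-down DataFrame are assumptions made for illustration, and no error bars are drawn here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'kpi': ['cpsink', 'hpu', 'mpu', 'revenue', 'wallet'],
    'diff_pct': [0.655280, 0.127299, 0.229958, 0.613308, -0.718421],
    'significance': ['insignificant', 'insignificant', 'significant',
                     'insignificant', 'insignificant'],
})

# hypothetical mapping from significance level to color
palette = {'significant': '#d62828', 'insignificant': '#2a9d8f'}
colors = df['significance'].map(palette).tolist()

fig, ax = plt.subplots()
# one call, one color per row, so the KPI order is never split by group
ax.scatter(df['diff_pct'], df['kpi'], c=colors)
ax.axvline(0, c='red', alpha=0.5, linestyle='--')
```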
I was trying to reproduce this plot with Matplotlib:
So, by looking at the documentation, I found that the closest thing is a grouped bar chart. The problem is that I have a different number of "bars" for each category (subject, illumination, ...), whereas the example provided by matplotlib has exactly 2 classes (M, F) for each category (G1, G2, G3, ...). I don't know exactly where to start. Does anyone here have a clue? I think that in this case the trick they use to specify the bar locations:
x = np.arange(len(labels)) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, men_means, width, label='Men')
rects2 = ax.bar(x + width/2, women_means, width, label='Women')
does not work at all, because the second category (for example) has a different number of bars. It would be awesome if anyone could give me an idea. Thank you in advance!
Supposing the data resides in a dataframe, the bars can be generated by looping through the categories:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# first create some test data, similar in structure to the question's
categories = ['Subject', 'Illumination', 'Location', 'Daytime']
rows = []
for cat in categories:
    for _ in range(np.random.randint(2, 7)):
        # DataFrame.append was removed in pandas 2.0, so collect the rows in a list
        rows.append({'Category': cat,
                     'Class': "".join(np.random.choice([*'tuvwxyz'], 10)),
                     'Value': np.random.uniform(10, 17)})
df = pd.DataFrame(rows, columns=['Category', 'Class', 'Value'])

fig, ax = plt.subplots()
start = 0  # position for first label
gap = 1  # gap between labels
labels = []  # list for all the labels
label_pos = np.array([])  # list for all the label positions

# loop through the categories of the dataframe
# provide a list of colors (at least as long as the expected number of categories)
for (cat, df_cat), color in zip(df.groupby('Category', sort=False), ['navy', 'orange'] * len(df)):
    num_in_cat = len(df_cat)
    # add a text for the category, using "axes coordinates" for the y-axis
    ax.text(start + num_in_cat / 2, 0.95, cat, ha='center', va='top', transform=ax.get_xaxis_transform())
    # positions for the labels of the current category
    this_label_pos = np.arange(start, start + num_in_cat)
    # create bars at the desired positions
    ax.bar(this_label_pos, df_cat['Value'], color=color)
    # store labels and their positions
    labels += df_cat['Class'].to_list()
    label_pos = np.append(label_pos, this_label_pos)
    start += num_in_cat + gap

# set the positions for the labels
ax.set_xticks(label_pos)
# set the labels
ax.set_xticklabels(labels, rotation=30)
# optionally set a new lower position for the y-axis
ax.set_ylim(ymin=9)
# optionally reduce the margin left and right
ax.margins(x=0.01)
plt.tight_layout()
plt.show()
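The position bookkeeping in that loop can be looked at in isolation. The sketch below pulls it into a small function; the name group_positions is invented for illustration, and the group sizes are arbitrary.

```python
import numpy as np


def group_positions(group_sizes, gap=1):
    """Hypothetical helper: x positions for groups of unequal size.

    Each group occupies consecutive integer positions, and `gap` empty
    slots are left between one group and the next.
    """
    positions = []
    start = 0
    for size in group_sizes:
        positions.append(np.arange(start, start + size))
        start += size + gap
    return positions


# three categories with 3, 2 and 4 bars respectively
for pos in group_positions([3, 2, 4]):
    print(pos.tolist())
# prints:
# [0, 1, 2]
# [4, 5]
# [7, 8, 9, 10]
```

Each array can be passed directly as the x argument of ax.bar for its category, which is what this_label_pos does in the code above.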
I'm trying to replicate this boxplot with seaborn. I would like to have a division between the groups like in the image. I thought about creating a separate boxplot per group and joining them into a single image, but creating many images, merging them and then deleting them all is not a great idea computationally.
I used seaborn to put the value on each box in this way.
This is my function:
def boxplot(df, name,prot,min,max):
    fig = plt.figure(figsize=(100, 20))
    plt.title(name+ " RMSE from "+ str(min) +"h PSW to " + str(max) +"h PWS")
    plt.ylabel("RMSE")
    plt.xlabel("")
    box_plot = sns.boxplot(x="Interval" ,y="RMSE", data=df, palette="Set1", showfliers = False)
    ax = box_plot.axes
    lines = ax.get_lines()
    categories = ax.get_xticks()
    for cat in categories:
        # every 4th line at the interval of 6 is median line
        # 0 -> p25 1 -> p75 2 -> lower whisker 3 -> upper whisker 4 -> p50 5 -> upper extreme value
        y = round(lines[4+cat*5].get_ydata()[0],3)
        ax.text(
            cat,
            y,
            f'{y}',
            ha='center',
            va='center',
            fontweight='bold',
            size=70,
            color='white',
            bbox=dict(facecolor='#445A64'))
    box_plot.figure.tight_layout()
    plt.savefig("output/"+str(prot)+ str(name)+".jpg")
    plt.close(fig)
I also added this code for each colour (foolish, I know) so that the same element gets the same colour in every group. For example, I set red for the value "15" on the x-axis, and so on...
for i in range(0,len(box_plot.artists),12):
    mybox = ax.artists[i]
    mybox.set_facecolor('red')
for i in range(1,len(box_plot.artists),12):
    mybox = ax.artists[i]
    mybox.set_facecolor('orange')
I tried to use "hue" for the category in my dataset (adding 15, 30, ... next to the various values), but with hue the boxes end up far apart from each other, like this, and I really don't like it.
I also tried to use "order", but it didn't work.
This kind of plot, where the same plot is repeated for different levels of a categorical variable, is called "faceting". In seaborn, you can create a FacetGrid, or use catplot, to do this kind of thing. With a bit of tweaking, you get a result that's very similar to your desired output:
import numpy as np
import pandas as pd
import seaborn as sns

# dummy data
N = 100
psws = [3, 6, 12, 24, 36]
times = [15, 30, 45, 60]
df = pd.DataFrame(columns=pd.MultiIndex.from_product([psws, times], names=['PSW', 'Time']))
for psw in psws:
    for time in times:
        df[(psw, time)] = np.random.normal(loc=time, size=(N,))

# data need to be in "long-form"
df = df.melt()

g = sns.catplot(kind='box', data=df, x='Time', y='value', col='PSW', height=4, aspect=0.5, palette='Greys')
g.fig.subplots_adjust(wspace=0)

# remove the spines of the axes (except the leftmost one)
# and replace with dashed line
for ax in g.axes.flatten()[1:]:
    ax.spines['left'].set_visible(False)
    for tick in ax.yaxis.get_major_ticks():
        tick.set_visible(False)
    xmin, xmax = ax.get_xlim()
    ax.axvline(xmin, ls='--', color='k')
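To see what the "long-form" step does, here is a small sketch with a toy wide table (the values are arbitrary and only there to make the reshaping visible). With named MultiIndex column levels, melt turns each level into its own column and puts the data in a column called value, which is why catplot can then use x='Time', col='PSW' and y='value'.

```python
import numpy as np
import pandas as pd

# toy wide-form table: one column per (PSW, Time) combination
columns = pd.MultiIndex.from_product([[3, 6], [15, 30]], names=['PSW', 'Time'])
wide_df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)

# long-form: one row per observation, one column per variable
long_df = wide_df.melt()

print(long_df.columns.tolist())  # ['PSW', 'Time', 'value']
print(long_df.shape)             # (12, 3)
```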
I analyzed a guessing game for lift usage on a mountain and plotted the results per day. In the plot window it looks the way I want it to, but when saving as a PNG, the first column gets squeezed.
I have no idea why this happens. Does anyone have any idea? When saving from the plot window it doesn't do this.
correct depiction in plot window
squeezed first column
Code for the plot looks like this:
plt.figure(figsize=(15,8), dpi=80, facecolor = 'white')
# Histogram
ax1 = plt.subplot2grid( (1,3),(0,0), colspan = 2)
plt.hist(estDay.visitors[estDay.date == est_date], color='#E7E7E7', bins=15)
plt.axvline(estDay.visitors[estDay.date == est_date].mean(), linestyle='dashed', linewidth=3, color='#353535')
plt.axvline(erst.eintritte[erst.date == est_date].mean(), linestyle='dashed', linewidth=3, color='#AF272F')
plt.title(est_date)
ax1.spines['right'].set_visible(False)
ax1.spines['top'].set_visible(False)
ax1.yaxis.set_ticks_position('left')
ax1.xaxis.set_ticks_position('bottom')
summ = statSumm(est_date)
# Info Table
plt.subplot2grid( (1,3),(0,2))
plt.axis('off')
plt.table(cellText=summ.values,
          rowLabels=summ.index,
          colLabels=summ.columns,
          cellLoc='center',
          rowLoc='center',
          bbox=[0.6, 0.1, 0.5, 0.8])
plt.savefig('lottoDays/' + est_date + '.png')
The idea is to draw the canvas once before saving, so that the row has a chance to adapt its size to the row headers. Call this before plt.savefig:
plt.gcf().canvas.draw()
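In context, the call would sit between building the table and saving. The following is only a sketch with made-up table contents, and it writes to an in-memory buffer instead of the lottoDays/ path from the question so that it can run anywhere.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(15, 8), dpi=80, facecolor='white')

# Info table in the right third of the figure (placeholder contents)
ax = plt.subplot2grid((1, 3), (0, 2))
ax.axis('off')
ax.table(cellText=[['1', '2'], ['3', '4']],
         rowLabels=['a fairly long row label', 'b'],
         colLabels=['x', 'y'],
         cellLoc='center',
         rowLoc='center',
         bbox=[0.6, 0.1, 0.5, 0.8])

# draw once so the table lays itself out before the file is written
fig.canvas.draw()

buffer = io.BytesIO()
fig.savefig(buffer, format='png')
png_bytes = buffer.getvalue()
```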