Matplotlib: Automatic labeling in side by side bar chart - python

Based on the following example from matplotlib, I have made a function that plots two weekly time series as a side-by-side bar chart.
https://matplotlib.org/3.1.1/gallery/lines_bars_and_markers/barchart.html#sphx-glr-gallery-lines-bars-and-markers-barchart-py
My problem is that I set the xticks explicitly, and that creates messy xtick labels. Is there a way to get matplotlib to choose the xticks (positions and labels) automatically in such a plot?
I must say that I find specifying the position of each bar with (x - width/2) a rather inelegant way to get side-by-side bars. Are there other options (packages other than matplotlib, or other settings in matplotlib) that avoid writing such explicit code?
Below is code and result. I'm seeking a solution that selects the number and placements of xticks and xticklabels that leaves it readable:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
labels = ['W1-2020', 'W2-2020', 'W3-2020', 'W4-2020', 'W5-2020','W6-2020','W7-2020','W8-2020','W9-2020','W10-2020','W11-2020','W12-2020','W13-2020','W14-2020','W15-2020']
men_means = [20, 34, 30, 35, 27,18,23,29,27,29,38,28,17,28,23]
women_means = [25, 32, 34, 20, 25,27,18,23,29,27,29,38,19,20, 34]
x = np.arange(len(labels)) # the label locations
width = 0.35 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, men_means, width, label='Men')
rects2 = ax.bar(x + width/2, women_means, width, label='Women')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Scores')
ax.set_title('Scores by group and gender')
ax.set_xticks(x)
ax.set_xticklabels(labels)
ax.legend()
def autolabel(rects):
    """Attach a text label above each bar in *rects*, displaying its height."""
    for rect in rects:
        height = rect.get_height()
        ax.annotate('{}'.format(height),
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3), # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom')
autolabel(rects1)
autolabel(rects2)
fig.tight_layout()
plt.show()

Solution 1 : Using Pandas
You can first create a pandas DataFrame and then plot a multiple bar chart directly. The formatting of the labels on the x-axis is much neater.
import pandas as pd
df = pd.DataFrame(
    {'labels': labels,
     'men_means': men_means,
     'women_means': women_means
    })
df.plot(x="labels", y=["men_means", "women_means"], kind="bar")
Solution 2: Using Seaborn (adapted from this answer)
import seaborn as sns
fig, ax = plt.subplots(figsize=(6, 4))
tidy = df.melt(id_vars='labels').rename(columns=str.title)
sns.barplot(x='Labels', y='Value', hue='Variable', data=tidy, ax=ax)
sns.despine(fig)
ax.tick_params(axis='x', labelrotation=90)
To hide every n-th tick label, you can do the following (adapted from this answer):
n = 2
for label in ax.xaxis.get_ticklabels()[::n]:
    label.set_visible(False)
To show every n-th label, you can use the following trick
fig.canvas.draw()
n = 4
labels = [item.get_text() if i%n == 0 else "" for i, item in enumerate(ax.get_xticklabels())]
ax.set_xticklabels(labels);
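The original question asks whether matplotlib can pick the tick positions itself. One possible approach, sketched here as an assumption rather than as part of the answers above, is to leave tick placement to `MaxNLocator` and translate the chosen integer positions back to week labels with a `FuncFormatter`. The helper name `week_label` and the choice of `nbins=6` are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MaxNLocator

labels = [f"W{i}-2020" for i in range(1, 16)]
men_means = [20, 34, 30, 35, 27, 18, 23, 29, 27, 29, 38, 28, 17, 28, 23]
women_means = [25, 32, 34, 20, 25, 27, 18, 23, 29, 27, 29, 38, 19, 20, 34]

x = np.arange(len(labels))  # one integer position per week
width = 0.35

fig, ax = plt.subplots()
ax.bar(x - width / 2, men_means, width, label="Men")
ax.bar(x + width / 2, women_means, width, label="Women")


def week_label(value, _pos):
    """Hypothetical helper: map an integer tick position to its week label."""
    index = int(round(value))
    if abs(value - index) < 1e-9 and 0 <= index < len(labels):
        return labels[index]
    return ""  # ticks outside the data get no label


# Let matplotlib choose a handful of integer tick positions on its own.
ax.xaxis.set_major_locator(MaxNLocator(nbins=6, integer=True))
ax.xaxis.set_major_formatter(FuncFormatter(week_label))
ax.legend()
fig.tight_layout()
```

Because the locator decides the positions, the number of labels adapts when the data grows, which the slicing tricks above do not do without changing `n` by hand.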

Related

Why is my grouped bar graph not showing all 3 bars and how to make it neater?

This is in connection to my earlier question with the link here: How do I add labels and trace lines into my grouped bar graph?
The dataframe used is the same.
I have decided to use this code in order to label my group bar graph. It produces a graph with all bars annotated with the correct data. However, it ended up like this:
The graph shows only two bars per group, and the third one is not aligned with the others.
I have even tried plt.figure(figsize()) to resize my graph, to no avail.
Is there an explanation for why only two bars are displayed at each x-axis tick, and is there a way to resolve this?
Thanks!
This is my code to refer to:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv(r'C:\Users\admin\Desktop\Customer_list_practice.csv')
months = df.columns.values.tolist()
months = months[1:]
# Increase size of plot in jupyter
plt.figure(figsize=[15,14])
labels = months
cus_1 = list(df.loc[0])
cus_1 = cus_1[1:]
cus_2 = list(df.loc[1])
cus_2 = cus_2[1:]
cus_3 = list(df.loc[2])
cus_3 = cus_3[1:]
# Increase size of plot in jupyter
plt.rcParams["figure.figsize"] = (10,10)
x = np.arange(len(labels)) # the label locations
width = 0.20 # the width of the bars
fig, ax = plt.subplots()
rects1 = ax.bar(x - width/2, cus_1, width, label='1')
rects2 = ax.bar(x + width/2, cus_2, width, label='2')
rects3 = ax.bar(x - width/2, cus_3, width, label='3')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Sales (S$)')
ax.set_title('Sales from each customers')
ax.set_xticks(x, labels)
ax.legend()
ax.bar_label(rects1, padding=3)
ax.bar_label(rects2, padding=3)
ax.bar_label(rects3, padding=3)
fig.tight_layout()
plt.legend(bbox_to_anchor=(-0.05,1), loc=1, prop={'size': 19})
plt.show()
The bars overlap because of the x positions passed in rects1/2/3, which determine where each rectangle is drawn. In your code they are x - width/2, x + width/2 and x - width/2. As you can see, the x-coordinates of the blue bars (customer 1) are the same as those of the green ones (customer 3), so the two are plotted on top of each other. With three bars of equal width per group, the positions should be x - width, x and x + width.
Once that is fixed, the bar width becomes an issue: three bars per group span 3 * width and the groups are 1 unit apart, so bars from neighbouring groups overlap if the width is too large. I set the width to 0.3. I think you can go as high as 0.33 without overlapping, but will leave it up to you.
The updated code and output are below. Note that my version of matplotlib doesn't have bar_label (it was added in Matplotlib 3.4), so I have commented those lines out. You can restore them if you need to.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv(r'C:\Users\admin\Desktop\Customer_list_practice.csv')
months = df.columns.values.tolist()
months = months[1:]
# Increase size of plot in jupyter
plt.figure(figsize=[15,14])
labels = months
cus_1 = list(df.loc[0])
cus_1 = cus_1[1:]
cus_2 = list(df.loc[1])
cus_2 = cus_2[1:]
cus_3 = list(df.loc[2])
cus_3 = cus_3[1:]
# Increase size of plot in jupyter
plt.rcParams["figure.figsize"] = (10,10)
x = np.arange(len(labels)) # the label locations
width = 0.30 # the width of the bars
print("x = ", x)
fig, ax = plt.subplots()
rects1 = ax.bar(x - width, cus_1, width, label='1')
rects2 = ax.bar(x, cus_2, width, label='2')
rects3 = ax.bar(x + width, cus_3, width, label='3')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Sales (S$)')
ax.set_title('Sales from each customers')
ax.set_xticks(x, labels)
ax.legend()
#ax.bar_label(rects1, padding=3)
#ax.bar_label(rects2, padding=3)
#ax.bar_label(rects3, padding=3)
fig.tight_layout()
plt.show()
Output plot
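The offsets in the answer above (x - width, x, x + width) are one case of a general pattern. As a minimal sketch, assuming equally wide bars centred on each tick, the offsets for any number of series can be computed as follows. The function name `grouped_bar_layout` and its `group_width` parameter are hypothetical and not part of matplotlib.

```python
def grouped_bar_layout(n_series, group_width=0.8):
    """Return (offsets, bar_width) for n_series bars centred on each tick.

    group_width is the share of the 1-unit tick spacing occupied by one
    whole group; it is an assumption of this sketch, not a matplotlib option.
    """
    bar_width = group_width / n_series
    # Series i is shifted so that the group as a whole is centred on the tick.
    offsets = [(i - (n_series - 1) / 2) * bar_width for i in range(n_series)]
    return offsets, bar_width


# Three series filling 90% of the tick spacing: bar_width is 0.3 and the
# offsets are -bar_width, 0 and +bar_width, as in the corrected code above.
offsets, bar_width = grouped_bar_layout(3, group_width=0.9)
```

Each series can then be drawn with something like `ax.bar(x + offset, values, bar_width)`, where `x` is the NumPy array of tick positions. For two series the same helper yields the familiar -width/2 and +width/2.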

How to draw two stacked histograms side-by-side with Matplotlib/Seaborn

I am drawing a couple of stacked histograms using the code below.
I am using the same bin edges for both so they are aligned nicely.
How can I have these displayed on the same chart? I.e. a green/red and a blue/orange bar per each bin -- side-by-side.
I saw many questions and answers similar to this suggesting using a bar chart and calculating the width of the bars, but this seems like something that should be supported out-of-the-box, at least in matplotlib.
Also, can I draw stacked histograms directly with seaborn? I wasn't able to find a way.
plt.hist( [correct_a, incorrect_a], bins=edges, stacked=True, color=['green', 'red'], rwidth=0.95, alpha=0.5)
plt.hist( [correct_b, incorrect_b], bins=edges, stacked=True, color=['green', 'red'], rwidth=0.95, alpha=0.5)
Well, I think plt.bar is your best bet here. To create stacked histograms, you can use its bottom argument. To display two bar charts side-by-side you can shift the x values by some width, just like in this original matplotlib example:
import numpy as np
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(16, 8))
correct_a = np.random.randint(0, 20, 20)
incorrect_a = np.random.randint(0, 20, 20)
correct_b = np.random.randint(0, 20, 20)
incorrect_b = np.random.randint(0, 20, 20)
edges = len(correct_a)
width=0.35
rects1 = ax.bar(np.arange(edges), incorrect_a, width, color="red", label="incorrect_a")
rects2 = ax.bar(np.arange(edges), correct_a, width, bottom=incorrect_a, color='seagreen', label="correct_a")
rects3 = ax.bar(np.arange(edges) + width, incorrect_b, width, color="blue", label="incorrect_b")
rects4 = ax.bar(np.arange(edges) + width, correct_b, width, bottom=incorrect_b, color='orange', label="correct_b")
# place each tick midway between the two stacks of its group
tick_positions = np.arange(edges) + width / 2
ax.set_xticks(tick_positions)
ax.set_xticklabels([str(i) for i in range(edges)])
ax.legend()
This returns:
Here is a simple example (histograms are not stacked) for 2 histograms displayed together with each bin having a dedicated place for each of them side by side:
# generating some data for this example:
a = [1,2,3,4,3,4,2,3,4,5,4,3,4,5,4,1,2,3,2,1,3,4,5,6,7,6,5,4,3,4,6,5,4,3,4]
b = [1,2,3,4,5,6,7,6,5,6,7,6,5,4,3,4,5,6,7,6,7,6,7,5,4,3,2,1,3,4,5,6,5,6,5,6,7,6,7]
# plotting 2 histograms with bars centered differently within each bin:
plt.hist(a, bins=5, align='left', rwidth=0.5)
plt.hist(b, bins=5, align='mid', rwidth=0.5, color='r')
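To combine both ideas (stacked and side by side) while keeping real bin edges, one option is to compute the counts with `np.histogram` and draw them with `ax.bar`. The following is only a sketch under assumptions: the data are random stand-ins for the question's arrays, and the bar width of 40% of a bin is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# Random stand-in data; the question's real arrays are not shown.
rng = np.random.default_rng(0)
correct_a, incorrect_a, correct_b, incorrect_b = (
    rng.normal(50, 15, 200) for _ in range(4)
)
edges = np.linspace(0, 100, 11)  # shared bin edges


def counts(data):
    """Histogram counts of data over the shared bin edges."""
    return np.histogram(data, bins=edges)[0]


centers = (edges[:-1] + edges[1:]) / 2
bar_width = 0.4 * np.diff(edges)  # each stack takes 40% of its bin

fig, ax = plt.subplots()
# Stack "a" sits in the left half of each bin.
ax.bar(centers - bar_width / 2, counts(correct_a), bar_width,
       color="green", label="correct_a")
ax.bar(centers - bar_width / 2, counts(incorrect_a), bar_width,
       bottom=counts(correct_a), color="red", label="incorrect_a")
# Stack "b" sits in the right half of each bin.
ax.bar(centers + bar_width / 2, counts(correct_b), bar_width,
       color="blue", label="correct_b")
ax.bar(centers + bar_width / 2, counts(incorrect_b), bar_width,
       bottom=counts(correct_b), color="orange", label="incorrect_b")
ax.set_xticks(edges)
ax.legend()
```

If stacking is not needed, passing a list of datasets to a single call such as `plt.hist([a, b], bins=edges)` already draws the bars of each bin side by side.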

Editing densely packed x labels in timeseries

I have the following code which produces a plot.
labels = sorted(set(df.index))
a = df.loc[df.user_id ==1234, 'count']
b = df.loc[df.user_id ==5678, 'count']
width = 0.5
x = np.arange(len(labels)) # the label locations
fig, ax = plt.subplots(figsize=(10,5))
rects1 = plt.plot_date(x, a, width, label='a')
rects2 = plt.plot_date(x, b, width, label='b', color = 'orange')
ax.set_xticks(x)
ax.set_xticklabels(labels, rotation=90)
ax.legend(['a','b'])
plt.ylabel('counts')
fig.tight_layout()
plt.gcf().autofmt_xdate()
plt.show()
The obvious problem is not being able to read the densely packed x labels. Without changing the plot, how can I label by week rather than day?
plt.xticks(range(0, len(labels), 7), labels[::7])
Given daily data, this shows a tick every 7 days.
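A small sketch of what the slicing selects, using made-up daily dates (the start date and the 30-day length are assumptions for illustration):

```python
from datetime import date, timedelta

# Hypothetical daily labels standing in for sorted(set(df.index)).
start = date(2020, 1, 1)
labels = [(start + timedelta(days=i)).isoformat() for i in range(30)]

step = 7  # one tick per week
tick_positions = list(range(0, len(labels), step))
tick_labels = labels[::step]

# Positions and labels stay paired because both use the same step.
for position, label in zip(tick_positions, tick_labels):
    print(position, label)
```

The two lists can then be passed to `plt.xticks(tick_positions, tick_labels, rotation=90)` in place of the per-day ticks.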

Python - dual y axis chart, align zero

I'm trying to create a horizontal bar chart, with dual x axes. The 2 axes are very different in scale, 1 set goes from something like -5 to 15 (positive and negative value), the other set is more like 100 to 500 (all positive values).
When I plot this, I'd like to align the 2 axes so zero shows at the same position, and only the negative values are to the left of this. Currently the set with all positive values starts at the far left, and the set with positive and negative starts in the middle of the overall plot.
I found the align_yaxis example, but I'm struggling to align the x axes.
Matplotlib bar charts: Aligning two different y axes to zero
Here is an example of what I'm working on with simple test data. Any ideas/suggestions? thanks
import pandas as pd
import matplotlib.pyplot as plt
d = {'col1':['Test 1','Test 2','Test 3','Test 4'],'col 2':[1.4,-3,1.3,5],'Col3':[900,750,878,920]}
df = pd.DataFrame(data=d)
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
ax2 = ax.twiny() # Create another axes that shares the same y-axis as ax.
width = 0.4
df['col 2'].plot(kind='barh', color='darkblue', ax=ax, width=width, position=1,fontsize =4, figsize=(3.0, 5.0))
df['Col3'].plot(kind='barh', color='orange', ax=ax2, width=width, position=0, fontsize =4, figsize=(3.0, 5.0))
ax.set_yticklabels(df.col1)
ax.set_xlabel('Positive and Neg',color='darkblue')
ax2.set_xlabel('Positive Only',color='orange')
ax.invert_yaxis()
plt.show()
I followed the link from a question and eventually ended up at this answer : https://stackoverflow.com/a/10482477/5907969
The answer has a function to align the y-axes and I have modified the same to align x-axes as follows:
def align_xaxis(ax1, v1, ax2, v2):
    """adjust ax2 xlimit so that v2 in ax2 is aligned to v1 in ax1"""
    x1, _ = ax1.transData.transform((v1, 0))
    x2, _ = ax2.transData.transform((v2, 0))
    inv = ax2.transData.inverted()
    dx, _ = inv.transform((0, 0)) - inv.transform((x1-x2, 0))
    minx, maxx = ax2.get_xlim()
    ax2.set_xlim(minx+dx, maxx+dx)
And then use it within the code as follows:
import pandas as pd
import matplotlib.pyplot as plt
d = {'col1':['Test 1','Test 2','Test 3','Test 4'],'col 2':[1.4,-3,1.3,5],'Col3':[900,750,878,920]}
df = pd.DataFrame(data=d)
fig = plt.figure() # Create matplotlib figure
ax = fig.add_subplot(111) # Create matplotlib axes
ax2 = ax.twiny() # Create another axes that shares the same y-axis as ax.
width = 0.4
df['col 2'].plot(kind='barh', color='darkblue', ax=ax, width=width, position=1,fontsize =4, figsize=(3.0, 5.0))
df['Col3'].plot(kind='barh', color='orange', ax=ax2, width=width, position=0, fontsize =4, figsize=(3.0, 5.0))
ax.set_yticklabels(df.col1)
ax.set_xlabel('Positive and Neg',color='darkblue')
ax2.set_xlabel('Positive Only',color='orange')
align_xaxis(ax,0,ax2,0)
ax.invert_yaxis()
plt.show()
This will give you what you're looking for
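A different way to reach the same alignment, shown here as an alternative sketch and not as the method of the linked answer, is to compute the second axis limits arithmetically so that zero falls at the same fraction of the axis width. The function name `limits_with_aligned_zero` is hypothetical.

```python
def limits_with_aligned_zero(ref_limits, other_max):
    """Limits for an all-positive axis whose zero lines up with ref_limits.

    ref_limits is (low, high) of the axis holding negative and positive
    values; other_max is the desired upper limit of the other axis.
    """
    low, high = ref_limits
    if not low < 0 < high:
        raise ValueError("reference limits must straddle zero")
    # Zero sits at the fraction -low / (high - low) of the reference axis;
    # the same fraction on the other axis requires other_min / other_max
    # to equal low / high.
    other_min = other_max * low / high
    return other_min, other_max


# Reference axis from -5 to 15 puts zero a quarter of the way along,
# so an axis topping out at 1000 has to start near -333.3.
new_limits = limits_with_aligned_zero((-5, 15), 1000)
```

With the axes from the code above this could be applied as `ax2.set_xlim(*limits_with_aligned_zero(ax.get_xlim(), 1000))`, where 1000 is an arbitrary upper limit above the largest value in Col3.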

How to plot a superimposed bar chart using matplotlib in python?

I want to plot a bar chart or a histogram using matplotlib. I don't want a stacked bar plot, but a superimposed barplot of two lists of data, for instance I have the following two lists of data with me:
Some code to begin with :
import matplotlib.pyplot as plt
from numpy.random import normal, uniform
highPower = [1184.53,1523.48,1521.05,1517.88,1519.88,1414.98,1419.34,
1415.13,1182.70,1165.17]
lowPower = [1000.95,1233.37, 1198.97,1198.01,1214.29,1130.86,1138.70,
1104.12,1012.95,1000.36]
# density replaces the older normed argument
plt.hist(highPower, bins=10, histtype='stepfilled', density=True,
         color='b', label='Max Power in mW')
plt.hist(lowPower, bins=10, histtype='stepfilled', density=True,
         color='r', alpha=0.5, label='Min Power in mW')
I want to plot these two lists against the number of values in the two lists such that I am able to see the variation per reading.
You can produce a superimposed bar chart using plt.bar() with the alpha keyword as shown below.
The alpha controls the transparency of the bar.
N.B. when you have two overlapping bars, one with an alpha < 1, you will get a mixture of colours. As such the bar will appear purple even though the legend shows it as a light red. To alleviate this I have modified the width of one of the bars, this way even if your powers should change you will still be able to see both bars.
plt.xticks can be used to set the location and format of the x-ticks in your graph.
import matplotlib.pyplot as plt
import numpy as np
width = 0.8
highPower = [1184.53,1523.48,1521.05,1517.88,1519.88,1414.98,
1419.34,1415.13,1182.70,1165.17]
lowPower = [1000.95,1233.37, 1198.97,1198.01,1214.29,1130.86,
1138.70,1104.12,1012.95,1000.36]
indices = np.arange(len(highPower))
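# align='edge' keeps the left-edge positioning this code assumes
# (Matplotlib 2.0 and later centre bars on x by default)
plt.bar(indices, highPower, width=width, align='edge',
        color='b', label='Max Power in mW')
plt.bar([i+0.25*width for i in indices], lowPower, align='edge',
        width=0.5*width, color='r', alpha=0.5, label='Min Power in mW')
plt.xticks(indices+width/2.,
           ['T{}'.format(i) for i in range(len(highPower))] )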
plt.legend()
plt.show()
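The colour mixing described in the N.B. of this answer can be checked with a little arithmetic. The sketch below assumes the usual "over" compositing of a semi-transparent colour on an opaque background; the helper name `blend` is illustrative.

```python
def blend(foreground, background, alpha):
    """Composite a semi-transparent RGB colour over an opaque background."""
    return tuple(alpha * f + (1 - alpha) * b
                 for f, b in zip(foreground, background))


red = (1.0, 0.0, 0.0)
blue = (0.0, 0.0, 1.0)
white = (1.0, 1.0, 1.0)

over_blue = blend(red, blue, 0.5)    # (0.5, 0.0, 0.5): purple on the blue bar
over_white = blend(red, white, 0.5)  # (1.0, 0.5, 0.5): light red in the legend
```

This is why the overlapping region looks purple while the legend swatch, drawn over a white background, looks light red.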
Building on @Ffisegydd's answer, if your data is in a Pandas DataFrame, this should work nicely:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def overlapped_bar(df, show=False, width=0.9, alpha=.5,
                   title='', xlabel='', ylabel='', **plot_kwargs):
    """Like a stacked bar chart except bars on top of each other with transparency"""
    xlabel = xlabel or df.index.name
    N = len(df)
    M = len(df.columns)
    indices = np.arange(N)
    colors = ['steelblue', 'firebrick', 'darksage', 'goldenrod', 'gray'] * int(M / 5. + 1)
    for i, label, color in zip(range(M), df.columns, colors):
        kwargs = plot_kwargs
        kwargs.update({'color': color, 'label': label})
        # the first column is drawn opaque, the following ones with transparency
        plt.bar(indices, df[label], width=width, alpha=alpha if i else 1, **kwargs)
    plt.xticks(indices + .5 * width,
               ['{}'.format(idx) for idx in df.index.values])
    plt.legend()
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    if show:
        plt.show()
    return plt.gcf()
And then in a python command line:
low = [1000.95, 1233.37, 1198.97, 1198.01, 1214.29, 1130.86, 1138.70, 1104.12, 1012.95, 1000.36]
high = [1184.53, 1523.48, 1521.05, 1517.88, 1519.88, 1414.98, 1419.34, 1415.13, 1182.70, 1165.17]
df = pd.DataFrame(np.matrix([high, low]).T, columns=['High', 'Low'],
index=pd.Index(['T%s' %i for i in range(len(high))],
name='Index'))
overlapped_bar(df, show=False)
It is actually simpler than the answers all over the internet make it appear.
a = range(1,10)
b = range(4,13)
ind = np.arange(len(a))
fig = plt.figure()
ax = fig.add_subplot(111)
ax.bar(x=ind, height=a, width=0.35,align='center')
ax.bar(x=ind, height=b, width=0.35/3, align='center')
plt.xticks(ind, a)
plt.tight_layout()
plt.show()
