matplotlib uneven group size bar charts side-by-side

I am trying to plot groups of data which have different bar sizes and may have different group sizes. How can I group the bars that belong to the same groups (shown as the same color) so that they are side by side? (Similar to this, except the same colors should be side-by-side)
import matplotlib.pyplot as plt
import numpy as np
width = 0.50
groupgap=2
y1=[20,80]
y2=[60,30,10]
x1 = np.arange(len(y1))
x2 = np.arange(len(y2))+groupgap
ind = np.concatenate((x1,x2))
fig, ax = plt.subplots()
rects1 = ax.bar(x1, y1, width, color='r', ecolor= "black",label="Gender")
rects2 = ax.bar(x2, y2, width, color='b', ecolor= "black",label="Type")
ax.set_ylabel('Population',fontsize=14)
ax.set_xticks(ind)
ax.set_xticklabels(('Male', 'Female','Student', 'Faculty','Others'),fontsize=14)
ax.legend()

Using a gap between the categories (groupgap) is the right idea. You just have to offset the second group by the length of the first group as well:
x2 = np.arange(len(y2))+groupgap+len(y1)
Here is the complete example where I used groupgap=1:
import matplotlib.pyplot as plt
import numpy as np
width = 1
groupgap=1
y1=[20,80]
y2=[60,30,10]
x1 = np.arange(len(y1))
x2 = np.arange(len(y2))+groupgap+len(y1)
ind = np.concatenate((x1,x2))
fig, ax = plt.subplots()
rects1 = ax.bar(x1, y1, width, color='r', edgecolor= "black",label="Gender")
rects2 = ax.bar(x2, y2, width, color='b', edgecolor= "black",label="Type")
ax.set_ylabel('Population',fontsize=14)
ax.set_xticks(ind)
ax.set_xticklabels(('Male', 'Female','Student', 'Faculty','Others'),fontsize=14)
plt.show()
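The same offset rule extends to any number of groups. A minimal sketch, assuming a hypothetical helper group_positions (not part of matplotlib) that starts each group where the previous one ended, plus a gap:

```python
def group_positions(group_sizes, gap=1):
    """Return one list of x positions per group.

    Hypothetical helper for illustration: each group starts where the
    previous one ended, plus `gap` empty slots.
    """
    positions = []
    start = 0
    for size in group_sizes:
        positions.append([start + i for i in range(size)])
        start += size + gap
    return positions


# Two groups of 2 and 3 bars, as in the example above
x1, x2 = group_positions([2, 3], gap=1)
print(x1, x2)
```

Each list can be passed as the x argument of its own ax.bar call, and the concatenation of all lists can be passed to ax.set_xticks.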

Related

Setting color of area in Matplotlib

I'm creating a chart with matplotlib, here is my code:
fig = plt.figure(facecolor='#131722',dpi=155, figsize=(8, 4))
ax1 = plt.subplot2grid((1,2), (0,0), facecolor='#131722')
Colors = [['#0400ff', '#FF0000'], ['#09ff00', '#ff8c00']]
for x in List:
    Index = List.index(x)
    rate_buy = []
    total_buy = []
    for y in x['data']['bids']:
        rate_buy.append(y[0])
        total_buy.append(y[1])
    rBuys = pd.DataFrame({'buy': rate_buy})
    tBuys = pd.DataFrame({'total': total_buy})
    ax1.plot(rBuys.buy, tBuys.total, color=Colors[Index][0], linewidth=0.5, alpha=0.8)
    ax1.fill_between(rBuys.buy, 0, tBuys.total, facecolor=Colors[Index][0], alpha=1)
And here is the output:
The problem with the current output is that the colors of the two areas are "merging": the area BELOW the blue line should be blue, but instead it's green. How can I set it to be blue, for example, like in my example?
Example List data:
[[9665, 0.07062500000000001], [9666, 0.943708], [9667, 5.683787000000001], [9668, 9.802289], [9669, 11.763305], [9670, 14.286004], [9671, 16.180122], [9672, 23.316723000000003], [9673, 30.915156000000003], [9674, 33.44226200000001], [9675, 36.14526200000001], [9676, 45.76024100000001], [9677, 51.85294700000001], [9678, 58.79529300000001], [9679, 59.05322900000001], [9680, 60.27704500000001], [9681, 60.743885000000006], [9682, 66.75103700000001], [9683, 71.86412600000001], [9684, 73.659636], [9685, 78.08502800000001], [9686, 78.19614200000001], [9687, 79.98396400000001], [9688, 90.55855800000002]]

I guess the hint from @JohanC is correct: you are plotting in the wrong order, so the later plots cover the earlier ones.
I recreated a small example where total_buy1 > total_buy0, so to get the desired result you have to plot total_buy1 first
and then total_buy0:
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
Colors = [['#0400ff', '#FF0000'],
          ['#09ff00', '#ff8c00']]
n = 100
rate_buy = np.linspace(0, 1000, 100)
total_buy0 = np.linspace(0, 300, n)[::-1] + np.random.normal(scale=10, size=n)
total_buy1 = np.linspace(0, 600, n)[::-1] + np.random.normal(scale=10, size=n)
ax.plot(rate_buy, total_buy1, color=Colors[1][1], linewidth=0.5, alpha=0.8)
ax.fill_between(rate_buy, 0, total_buy1, facecolor=Colors[1][0], alpha=1)
ax.plot(rate_buy, total_buy0, color=Colors[0][1], linewidth=0.5, alpha=0.8)
ax.fill_between(rate_buy, 0, total_buy0, facecolor=Colors[0][0], alpha=1)
I noticed that you use Colors[Index][0] for both plotting calls, so the line and the area will not have different colors.
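When there are more than two datasets, the drawing order can be derived instead of hard-coded. A rough sketch, assuming each dataset is a list of [rate, total] pairs like the example data in the question, and using a hypothetical helper draw_order that puts the tallest area first so that smaller areas are drawn on top of it. The values below are made up for illustration:

```python
def draw_order(datasets):
    """Return dataset indices sorted so the tallest area is drawn first.

    Hypothetical helper: `datasets` is a list of series, each a list of
    [rate, total] pairs. Areas drawn later cover areas drawn earlier,
    so the series with the largest peak has to come first.
    """
    peaks = [max(total for _, total in series) for series in datasets]
    return sorted(range(len(datasets)), key=lambda i: peaks[i], reverse=True)


# Made-up values for illustration only
small = [[9665, 0.07], [9666, 0.94], [9667, 5.68]]
large = [[9665, 1.50], [9666, 7.20], [9667, 12.40]]
order = draw_order([small, large])
print(order)
```

Looping over order and calling ax.fill_between for each index fills the largest area first. This only helps when one curve lies above the other everywhere; for curves that cross, a partially transparent alpha may be the simpler option.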

Drawing lines between two points with twinaxes

I have followed this example (Drawing lines between two plots in Matplotlib) but am running into problems. I believe it has something to do with the fact that I essentially have two different y points, but am not sure how to amend the code to fix it. I would like the line to start at one point and end at the other point directly below it, as well as plotting for all lines.
fig=plt.figure(figsize=(22,10), dpi=150)
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twinx()
n = 10
y1 = np.random.random(n)
y2 = np.random.random(n) + 1
x1 = np.arange(n)
ax1.scatter(x1, y1)
ax2.scatter(x1, y2)
i = 1
xy = (x1[i],y1[i])
con = ConnectionPatch(xyA=xy, xyB=xy, coordsA="data", coordsB="data",
                      axesA=ax1, axesB=ax2, color="red")
ax2.add_artist(con)
ax1.plot(x1[i],y1[i],'g+',markersize=12)
ax2.plot(x1[i],y1[i],'g+',markersize=12)

Just iterate over zipped (x, y1, y2):
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import ConnectionPatch
fig = plt.figure(figsize=(10, 5), dpi=100)
ax1 = fig.add_subplot(1, 1, 1)
ax2 = ax1.twinx()
n = 10
y1 = np.random.random(n)
y2 = np.random.random(n) + 1
x1 = np.arange(n)
# I add some colors blue for left y-axis, red for right y-axis
ax1.scatter(x1, y1, c='b')
ax2.scatter(x1, y2, c='r')
# Now iterate over paired x, and 2 y values:
for xi, y1i, y2i in zip(x1, y1, y2):
    con = ConnectionPatch(
        xyA=(xi, y1i),
        xyB=(xi, y2i),
        coordsA="data",
        coordsB="data",
        axesA=ax1,
        axesB=ax2,
        color='g',
    )
    ax1.add_artist(con)
plt.show()

How to add error values next to error bars?

I'm using matplotlib for my plots. I already have the plot with error bars. I want to show the error value as text next to each error bar. I'm looking for something like this (edited in pinta):
Is it possible to do this with the following code?
import numpy as np
import matplotlib.pyplot as plt
import math
N = 8
y1 = [0.1532, 0.1861, 0.2618, 0.0584, 0.1839, 0.2049, 0.009, 0.2077]
y1err = []
for item in y1:
    err = 1.96*(math.sqrt(item*(1-item)/10000))
    y1err.append(err)
ind = np.arange(N)
width = 0.35
fig, ax = plt.subplots()
ax.bar(ind, y1, width, yerr=y1err, capsize=7)
ax.grid()
plt.show()

You can use the annotate function to add text labels in the plot. Here is how you could do it:
import numpy as np
import matplotlib.pyplot as plt
import math
N = 8
y1 = [0.1532, 0.1861, 0.2618, 0.0584, 0.1839, 0.2049, 0.009, 0.2077]
y1err = []
for item in y1:
    err = 1.96*(math.sqrt(item*(1-item)/10000))
    y1err.append(err)
ind = np.arange(N)
width = 0.35
fig, ax = plt.subplots()
ax.bar(ind, y1, width, yerr=y1err, capsize=7)
# add error values
for k, x in enumerate(ind):
    y = y1[k] + y1err[k]
    r = y1err[k] / y1[k] * 100
    ax.annotate(f'{y1[k]:.2f} +/- {r:.2f}%', (x, y), textcoords='offset points',
                xytext=(0, 3), ha='center', va='bottom', fontsize='x-small')
ax.grid()
plt.show()
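If the absolute error is wanted in the label rather than the relative error in percent, the label strings can be built separately. A small sketch, assuming a sample size of 10000 as in the code above; error_labels is a hypothetical helper name:

```python
import math


def error_labels(values, n=10000, z=1.96):
    """Return (errors, labels) for a list of proportions.

    Hypothetical helper: the error is the half-width of the normal
    approximation interval, z * sqrt(p * (1 - p) / n), the same formula
    as in the loop above.
    """
    errors = [z * math.sqrt(p * (1 - p) / n) for p in values]
    labels = [f'+/- {err:.4f}' for err in errors]
    return errors, labels


errors, labels = error_labels([0.1532, 0.1861])
print(labels[0])
```

Each label can then be passed as the text argument of ax.annotate in place of the f-string used above.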

Multiple plots on same figure with DataFrame.Plot

While I can get multiple lines on a chart and multiple bars on a chart - I cannot get a line and bar on the same chart using the same PeriodIndex.
Faux code follows ...
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import Series
# play data
n = 100
x = pd.period_range('2001-01-01', periods=n, freq='M')
y1 = (Series(np.random.randn(n)).diff() + 5).tolist()
y2 = (Series(np.random.randn(n)).diff()).tolist()
df = pd.DataFrame({'bar':y2, 'line':y1}, index=x)
# let's plot
plt.figure()
ax = df['bar'].plot(kind='bar', label='bar')
df['line'].plot(kind='line', ax=ax, label='line')
plt.savefig('fred.png', dpi=200)
plt.close()
Any help will be greatly appreciated ...

The problem is that bar plots don't use the index values as x coordinates; they use range(0, n). You can use twiny() to create a second axes that shares the y-axis with the bar axes, and draw the line in this second axes.
The most difficult part is aligning the x-axis ticks. Here we define an align_xaxis function, which aligns ax2.get_xlim()[0] with x1 in ax1 and ax2.get_xlim()[1] with x2 in ax1:
def align_xaxis(ax2, ax1, x1, x2):
    "maps xlim of ax2 to x1 and x2 in ax1"
    (x1, _), (x2, _) = ax2.transData.inverted().transform(ax1.transData.transform([[x1, 0], [x2, 0]]))
    xs, xe = ax2.get_xlim()
    k, b = np.polyfit([x1, x2], [xs, xe], 1)
    ax2.set_xlim(xs*k+b, xe*k+b)
Here is the full code:
from matplotlib import pyplot as plt
import pandas as pd
from pandas import Series
import numpy as np
n = 50
x = pd.period_range('2001-01-01', periods=n, freq='M')
y1 = (Series(np.random.randn(n)) + 5).tolist()
y2 = (Series(np.random.randn(n))).tolist()
df = pd.DataFrame({'bar':y2, 'line':y1}, index=x)
# let's plot
plt.figure(figsize=(20, 4))
ax1 = df['bar'].plot(kind='bar', label='bar')
ax2 = ax1.twiny()
df['line'].plot(kind='line', label='line', ax=ax2)
ax2.grid(color="red", axis="x")
def align_xaxis(ax2, ax1, x1, x2):
    "maps xlim of ax2 to x1 and x2 in ax1"
    (x1, _), (x2, _) = ax2.transData.inverted().transform(ax1.transData.transform([[x1, 0], [x2, 0]]))
    xs, xe = ax2.get_xlim()
    k, b = np.polyfit([x1, x2], [xs, xe], 1)
    ax2.set_xlim(xs*k+b, xe*k+b)
align_xaxis(ax2, ax1, 0, n-1)
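The arithmetic inside align_xaxis can be checked without a figure. A sketch under the assumption that x1p and x2p (hypothetical names) are the two reference points already converted to ax2 data coordinates, with the two-point line solved directly instead of with np.polyfit:

```python
def aligned_limits(x1p, x2p, xs, xe):
    """Return new (left, right) limits for the second axes.

    Hypothetical stand-in for the last three lines of align_xaxis:
    fit the line that maps x1p -> xs and x2p -> xe, then apply it to
    the current limits (xs, xe).
    """
    k = (xe - xs) / (x2p - x1p)
    b = xs - k * x1p
    return xs * k + b, xe * k + b


# Current limits 0..10; the reference points sit at 1 and 9
left, right = aligned_limits(1.0, 9.0, 0.0, 10.0)
print(left, right)
```

With the new limits, the old lower limit 0 sits at the same fraction of the axis width (0.1) where the reference point 1 was before, which is what lines up the two axes.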

pandas - plotting integration with matplotlib

Given this data frame:
xlabel = list('xxxxxxyyyyyyzzzzzz')
fill= list('abc'*6)
val = np.random.rand(18)
df = pd.DataFrame({ 'xlabel':xlabel, 'fill':fill, 'val':val})
This is what I'm aiming at: http://matplotlib.org/mpl_examples/pylab_examples/barchart_demo.png
Applied to my example, Group would be x, y and z, Gender would be a, b and c, and Scores would be val.
I'm aware that in pandas plotting integration with matplotlib is still work in progress, so is it possible to do it directly in matplotlib?

Is this what you want?
df.groupby(['fill', 'xlabel']).mean().unstack().plot(kind='bar')
or
df.pivot_table(index='fill', columns='xlabel', values='val').plot(kind='bar')
You can break it apart and fiddle with the labels, columns and title, but I think this basically gives you the plot you wanted.
For the error bars you currently have to go to matplotlib directly.
mean_df = df.pivot_table(index='fill', columns='xlabel',
                         values='val', aggfunc='mean')
err_df = df.pivot_table(index='fill', columns='xlabel',
                        values='val', aggfunc='std')
rows = len(mean_df)
cols = len(mean_df.columns)
ind = np.arange(rows)
width = 0.8 / cols
colors = 'grb'
fig, ax = plt.subplots()
for i, col in enumerate(mean_df.columns):
    ax.bar(ind + i * width, mean_df[col], width=width,
           color=colors[i], yerr=err_df[col], label=col)
# bars are centred on their x positions, so the middle of each group
# lies (cols - 1) / 2 bar widths to the right of its first bar
ax.set_xticks(ind + (cols - 1) / 2.0 * width)
ax.set_xticklabels(mean_df.index)
ax.legend()
But there will be an enhancement, probably in the 0.13: issue 3796

This was the only solution I found for displaying the error bars:
from scipy import stats
means = df.groupby(['fill', 'xlabel']).mean().unstack()
x_mean,y_mean,z_mean = means.val.x, means.val.y,means.val.z
sems = df.groupby(['fill','xlabel']).aggregate(stats.sem).unstack()
x_sem,y_sem,z_sem = sems.val.x, sems.val.y,sems.val.z
ind = np.array([0,1.5,3])
fig, ax = plt.subplots()
width = 0.35
bar_x = ax.bar(ind, x_mean, width, color='r', yerr=x_sem, ecolor='r')
bar_y = ax.bar(ind+width, y_mean, width, color='g', yerr=y_sem, ecolor='g')
bar_z = ax.bar(ind+width*2, z_mean, width, color='b', yerr=z_sem, ecolor='b')
ax.legend((bar_x[0], bar_y[0], bar_z[0]), ('X','Y','Z'))
I'd be happy to see a neater approach to the problem though, possibly as an extension of Viktor Kerkez's answer.
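The two answers use different error measures: the standard deviation in the first, the standard error of the mean in the second. For reference, a standard-library sketch of the latter, which matches what scipy.stats.sem computes with its default ddof=1:

```python
import math
import statistics


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


print(sem([1.0, 2.0, 3.0, 4.0]))
```

Because it divides by the square root of the group size, the standard error is always smaller than the standard deviation for groups of more than one value, so the two approaches produce error bars of different lengths for the same data.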
