Piechart with Matplotlib/Python [duplicate] - python

Alright matplotlib aficionados, we know how to plot a donut chart, but what is better than a donut chart? A double-donut chart. Specifically: we have a set of elements that fall into disjoint categories, and each category is split into sub-categories. The donut chart should have slices for the categories in the outer ring and slices for the sub-categories in the inner ring, obviously aligned with the outer slices.
Is there any library that provides this or do we need to work this out here?

To obtain a double donut chart, you can plot as many pie charts in the same plot as you want. The outer pie gets a width set on its wedges, and the inner pie gets a radius that is less than or equal to 1-width.
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
ax.axis('equal')
width = 0.3
cm = plt.get_cmap("tab20c")
cout = cm(np.arange(3)*4)
pie, _ = ax.pie([120,77,39], radius=1, labels=list("ABC"), colors=cout)
plt.setp( pie, width=width, edgecolor='white')
cin = cm(np.array([1,2,5,6,9,10]))
labels = list(map("".join, zip(list("aabbcc"),map(str, [1,2]*3))))
pie2, _ = ax.pie([60,60,37,40,29,10], radius=1-width, labels=labels,
labeldistance=0.7, colors=cin)
plt.setp( pie2, width=width, edgecolor='white')
plt.show()
Note: I made this code also available in the matplotlib gallery as nested pie example.
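The inner slices only line up with the outer ones when each category's size equals the sum of its sub-category sizes. As a minimal sketch of that idea, the outer sizes can be derived from the nested data instead of being typed in twice. The `data` dictionary and its names are made up for illustration, and `wedgeprops` is used here as an alternative to the `plt.setp` call above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical nested data: category -> {sub-category: size}
data = {
    "A": {"a1": 60, "a2": 60},
    "B": {"b1": 37, "b2": 40},
    "C": {"c1": 29, "c2": 10},
}

# Outer sizes are the per-category sums, so both rings cover the same angles
outer_sizes = [sum(sub.values()) for sub in data.values()]
inner_sizes = [size for sub in data.values() for size in sub.values()]
inner_labels = [name for sub in data.values() for name in sub]

width = 0.3
fig, ax = plt.subplots()
outer_wedges, _ = ax.pie(outer_sizes, radius=1, labels=list(data),
                         wedgeprops=dict(width=width, edgecolor="white"))
inner_wedges, _ = ax.pie(inner_sizes, radius=1 - width, labels=inner_labels,
                         labeldistance=0.7,
                         wedgeprops=dict(width=width, edgecolor="white"))
ax.set_aspect("equal")
fig.savefig("nested_donut.png")
```

Because both rings are normalized by the same total, the end angle of outer slice "A" coincides with the end angle of inner slice "a2", and so on for the other categories.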

I adapted the example you provided; you can tackle your problem by plotting two donuts on the same figure, with a smaller outer radius for one of them.
import matplotlib.pyplot as plt
import numpy as np
def make_pie(sizes, text, colors, labels, radius=1):
    col = [[i/255 for i in c] for c in colors]
    plt.axis('equal')
    width = 0.35
    kwargs = dict(colors=col, startangle=180)
    outside, _ = plt.pie(sizes, radius=radius, pctdistance=1-width/2, labels=labels, **kwargs)
    plt.setp(outside, width=width, edgecolor='white')
    kwargs = dict(size=20, fontweight='bold', va='center')
    plt.text(0, 0, text, ha='center', **kwargs)
# Group colors
c1 = (226, 33, 7)
c2 = (60, 121, 189)
# Subgroup colors
d1 = (226, 33, 7)
d2 = (60, 121, 189)
d3 = (25, 25, 25)
make_pie([100, 80, 90], "", [d1, d3, d2], ['M', 'N', 'F'], radius=1.2)
make_pie([180, 90], "", [c1, c2], ['M', 'F'], radius=1)
plt.show()

Related

Matplotlib stacked histogram label

Here is my picture. I need to make labels for those bars; however, every upper layer contains the lower layers, so each label should contain grouped colors, i.e. blue for dataset 1, blue/orange for dataset 2, blue/orange/green for dataset 3, and finally blue/orange/green/purple for dataset 4. Is it possible to do this? Thank you.
binwidth = 1
n, bins, patches = ax1.hist(C, bins=range(81, 105, binwidth),
                            density=False, histtype='barstacked',
                            edgecolor='gray',
                            color=barvy_histogram, linewidth=0.3)
hatches = ['//','x','..','oo']
for patch_set, hatch in zip(patches, hatches):
    for patch in patch_set.patches:
        patch.set_hatch(hatch)
        patch.set_linewidth=0.1
        patch.set_color='gray'
mpl.rcParams['hatch.linewidth'] = 0.5
The following approach uses the tuple legend handler (HandlerTuple) to combine the legend handles. It produces a horizontal layout, while maybe a vertical stacking would be more interesting.
The code starts with creating some test data, supposing C is an Nx4 array of integers. The bin edges are set at halves to make sure that floating point accuracy wouldn't place values in the wrong bin.
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.legend_handler import HandlerTuple
import numpy as np
# first, create some test data
C = (np.random.normal(0.001, 1, (100, 20)).cumsum(axis=0) * 1.2 + 90).astype(int).reshape(-1, 4)
c_min = C.min()
c_max = C.max()
mpl.rcParams['hatch.linewidth'] = 0.5
fig, ax1 = plt.subplots(figsize=(12, 5))
binwidth = 1
colors = plt.cm.Set2.colors[:C.shape[1]]
_, _, patches = ax1.hist(C, bins=np.arange(c_min - 0.5, c_max + binwidth, binwidth),
                         density=False, histtype='barstacked',
                         edgecolor='gray', color=colors, linewidth=0.3,
                         label=[f'N={p}' for p in range(25, 101, 25)])
hatches = ['//', 'x', '..', 'oo']
for patch_set, hatch in zip(patches, hatches):
    for patch in patch_set.patches:
        patch.set_hatch(hatch)
        patch.set_linewidth(0.1)  # call the setter; assigning to it has no effect
handles, labels = ax1.get_legend_handles_labels()
ax1.legend(handles=[tuple(handles[:i + 1]) for i in range(C.shape[1])], labels=labels,
           handlelength=6, handler_map={tuple: HandlerTuple(ndivide=None, pad=0)})
plt.show()
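To see why the bin edges are placed at halves, here is a small sketch with made-up integer values (the `values` array is only an illustration, not the asker's data). With edges at k - 0.5 and k + 0.5, every integer k sits in the middle of its own bin, so rounding errors cannot push it across an edge.

```python
import numpy as np

# Made-up integer data, standing in for one column of C
values = np.array([81, 81, 82, 84, 84, 84])
binwidth = 1

# Edges at halves: 80.5, 81.5, ..., 84.5, so each integer is centered in its bin
edges = np.arange(values.min() - 0.5, values.max() + binwidth, binwidth)
counts, _ = np.histogram(values, bins=edges)

print(edges)   # bin edges at the half-integers
print(counts)  # one count per integer value from 81 to 84
```

The same `edges` array can be passed as `bins=` to `ax.hist`, as in the answer's code.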

Add value labels to stacked bar chart [duplicate]

I'm trying to "robustly" center the data labels in a stacked bar chart. A simple code example and the result are given below. As you can see, the data labels aren't really centered in all rectangles. What am I missing?
import numpy as np
import matplotlib.pyplot as plt
A = [45, 17, 47]
B = [91, 70, 72]
fig = plt.figure(facecolor="white")
ax = fig.add_subplot(1, 1, 1)
bar_width = 0.5
bar_l = np.arange(1, 4)
tick_pos = [i + (bar_width / 2) for i in bar_l]
ax1 = ax.bar(bar_l, A, width=bar_width, label="A", color="green")
ax2 = ax.bar(bar_l, B, bottom=A, width=bar_width, label="B", color="blue")
ax.set_ylabel("Count", fontsize=18)
ax.set_xlabel("Class", fontsize=18)
ax.legend(loc="best")
plt.xticks(tick_pos, ["C1", "C2", "C3"], fontsize=16)
plt.yticks(fontsize=16)
for r1, r2 in zip(ax1, ax2):
    h1 = r1.get_height()
    h2 = r2.get_height()
    plt.text(r1.get_x() + r1.get_width() / 2., h1 / 2., "%d" % h1, ha="center", va="bottom", color="white", fontsize=16, fontweight="bold")
    plt.text(r2.get_x() + r2.get_width() / 2., h1 + h2 / 2., "%d" % h2, ha="center", va="bottom", color="white", fontsize=16, fontweight="bold")
plt.show()
The following method is more succinct, and easily scales.
Putting the data into a pandas.DataFrame is the easiest way to plot a stacked bar plot.
Plot it with pandas.DataFrame.plot.bar(stacked=True), or equivalently pandas.DataFrame.plot(kind='bar', stacked=True).
This method returns a matplotlib.axes.Axes or a numpy.ndarray of them.
Since seaborn is just a high-level API for matplotlib, these solutions also work with seaborn plots, as shown in How to annotate a seaborn barplot with the aggregated value.
For horizontal stacked bars, see Horizontal stacked bar plot and add labels to each section
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
Imports & Test DataFrame
import pandas as pd
import matplotlib.pyplot as plt
A = [45, 17, 47]
B = [91, 70, 72]
C = [68, 43, 13]
# pandas dataframe
df = pd.DataFrame(data={'A': A, 'B': B, 'C': C})
df.index = ['C1', 'C2', 'C3']
     A   B   C
C1  45  91  68
C2  17  70  43
C3  47  72  13
Updated for matplotlib v3.4.2
Use matplotlib.pyplot.bar_label (or Axes.bar_label) with label_type='center', which will automatically center the values in each bar segment.
See How to add value labels on a bar chart for additional details and examples with .bar_label.
Tested with pandas v1.2.4, which is using matplotlib as the plot engine.
If some sections of the bar plot will be zero, see my answer, which shows how to customize the labels for .bar_label().
ax.bar_label(c, fmt='%0.0f', label_type='center') will change the number format to show no decimal places, if needed.
ax = df.plot(kind='bar', stacked=True, figsize=(8, 6), rot=0, xlabel='Class', ylabel='Count')
for c in ax.containers:
    # Optional: if the segment is small or 0, customize the labels
    labels = [v.get_height() if v.get_height() > 0 else '' for v in c]
    # remove the labels parameter if it's not needed for customized labels
    ax.bar_label(c, labels=labels, label_type='center')
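The same call should also work without pandas, on the bar containers returned by `Axes.bar` as in the question's code. A minimal sketch, assuming matplotlib 3.4 or newer (where `bar_label` is available) and reusing the question's A and B lists:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

A = [45, 17, 47]
B = [91, 70, 72]
classes = ["C1", "C2", "C3"]

fig, ax = plt.subplots()
bars_a = ax.bar(classes, A, label="A", color="green")
bars_b = ax.bar(classes, B, bottom=A, label="B", color="blue")

# label_type='center' puts each label in the middle of its own segment
# and shows the segment's height, not the cumulative height
texts_a = ax.bar_label(bars_a, label_type="center", color="white")
texts_b = ax.bar_label(bars_b, label_type="center", color="white")

ax.set_xlabel("Class")
ax.set_ylabel("Count")
ax.legend(loc="best")
fig.savefig("stacked_bar_labels.png")
```

This replaces the manual `plt.text` calls from the question, so there is no `va` setting to get wrong.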
Seaborn Options
seaborn is a high-level api for matplotlib
The seaborn.barplot api doesn't have an option for stacking, but it "can" be implemented with sns.histplot, or sns.displot.
Seaborn DataFrame Format
# create the data frame
df = pd.DataFrame(data={'A': A, 'B': B, 'C': C, 'cat': ['C1', 'C2', 'C3']})
    A   B   C cat
0  45  91  68  C1
1  17  70  43  C2
2  47  72  13  C3
# convert the dataframe to a long form
df = df.melt(id_vars='cat')
  cat variable  value
0  C1        A     45
1  C2        A     17
2  C3        A     47
3  C1        B     91
4  C2        B     70
5  C3        B     72
6  C1        C     68
7  C2        C     43
8  C3        C     13
axes-level plot
import seaborn as sns
# plot
ax = sns.histplot(data=df, x='cat', hue='variable', weights='value', discrete=True, multiple='stack')
# iterate through each container
for c in ax.containers:
    # Optional: if the segment is small or 0, customize the labels
    labels = [v.get_height() if v.get_height() > 0 else '' for v in c]
    # remove the labels parameter if it's not needed for customized labels
    ax.bar_label(c, labels=labels, label_type='center')
figure-level plot
# plot
g = sns.displot(data=df, x='cat', hue='variable', weights='value', discrete=True, multiple='stack')
# iterate through each axes
for ax in g.axes.flat:
    # iterate through each container
    for c in ax.containers:
        # Optional: if the segment is small or 0, customize the labels
        labels = [v.get_height() if v.get_height() > 0 else '' for v in c]
        # remove the labels parameter if it's not needed for customized labels
        ax.bar_label(c, labels=labels, label_type='center')
Original Answer
The .patches attribute of the Axes holds a list of matplotlib.patches.Rectangle objects, one for each of the sections of the stacked bar.
Each .Rectangle has methods for extracting the various values that define the rectangle.
Each .Rectangle is in order from left to right, and bottom to top, so all the .Rectangle objects, for each level, appear in order, when iterating through .patches.
The labels are made using an f-string, label_text = f'{height}', so any additional text can be added as needed, such as label_text = f'{height}%'
label_text = f'{height:0.0f}' will display numbers with no decimal places.
Plot
plt.style.use('ggplot')
ax = df.plot(stacked=True, kind='bar', figsize=(12, 8), rot='horizontal')
# .patches is everything inside of the chart
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()
    # The height of the bar is the data value and can be used as the label
    label_text = f'{height}'  # f'{height:.2f}' to format decimal values
    # ax.text(x, y, text)
    label_x = x + width / 2
    label_y = y + height / 2
    # plot only when height is greater than specified value
    if height > 0:
        ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
ax.set_ylabel("Count", fontsize=18)
ax.set_xlabel("Class", fontsize=18)
plt.show()
To plot a horizontal bar:
kind='barh'
label_text = f'{width}'
if width > 0:
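Put together, the horizontal variant might look like the sketch below. It uses plain `Axes.barh` instead of the pandas plot, which is an assumption made to keep the example self-contained; the loop over `.patches` is otherwise the same, with width taking the role of height.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

A = [45, 17, 47]
B = [91, 70, 72]
classes = ["C1", "C2", "C3"]

fig, ax = plt.subplots()
ax.barh(classes, A, label="A")
ax.barh(classes, B, left=A, label="B")  # left= stacks B to the right of A

label_positions = []
for rect in ax.patches:
    # For horizontal bars the data value is the width of the rectangle
    width = rect.get_width()
    if width > 0:
        label_x = rect.get_x() + width / 2
        label_y = rect.get_y() + rect.get_height() / 2
        ax.text(label_x, label_y, f"{width:0.0f}", ha="center", va="center")
        label_positions.append((label_x, label_y))

ax.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0.)
fig.savefig("stacked_barh_labels.png")
```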
Attribution: jsoma/chart.py
Why did you write va="bottom"? You have to use va="center".

Color by category in matplotlib using np.where

I'm trying to create a scatter plot with 100 data points and three variables: x value, y value, and category. This information is stored in an ndarray.
I can create the scatter plot, but I don't know how to use a different color for each category. I used the following code for the plot, which seems to work fine (although it's not finished):
def my_plot(data, color_map):
    f, ax = plt.subplots()
    ax.scatter(data.x, data.y, s = 150, edgecolors = "r")
    return f
In my function, color_map is a parameter which refers to a dictionary I created to color the different categories (there are four in total). This is the dictionary:
color_map = {"winter":(15, 28, 75), "spring":(92, 57, 32), "summer":(255, 253, 211), "fall":(174, 12, 12)}
What I would like to do is to somehow integrate this color_map in my function so that each dot in my plot receives a different color.
I think this could be done using np.where to create a mask, but I'm not sure how to proceed...
The color values need to be divided by 255 because matplotlib likes them between 0 and 1.
With this dict you can create an array of colors for the categories:
from matplotlib import pyplot as plt
from matplotlib.lines import Line2D
import pandas as pd
import numpy as np
color_map = {"winter": (15, 28, 75), "spring": (92, 57, 32), "summer": (255, 253, 211), "fall": (174, 12, 12)}
color_map = {key: (r / 255, g / 255, b / 255,) for key, (r, g, b) in color_map.items()}
N = 200
data = pd.DataFrame({'x': np.random.uniform(1, 9, N), 'y': np.random.uniform(1, 5, N),
'cat': np.random.choice([*color_map.keys()], N)})
fig, ax = plt.subplots()
ax.scatter(data.x, data.y, s=150, color=[color_map[c] for c in data.cat], ec='r')
handles = [Line2D([], [], marker='o', ls='', color=col, markeredgecolor='r', label=label)
for label, col in color_map.items()]
plt.legend(handles=handles, bbox_to_anchor=[1.02, 1.02], loc='upper left')
plt.tight_layout()
plt.show()
PS: A similar plot can be generated with seaborn, which also automatically adds the corresponding legend. Note that the current version of matplotlib (3.3.1) has a problem with the hue parameter. Normally you would add it as hue='cat' but in this version a workaround via .to_list is needed.
import seaborn as sns
ax = sns.scatterplot(x='x', y='y', hue=data['cat'].to_list(), s=150, palette=color_map, edgecolor='r', data=data)
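Since the question mentions masks, here is a sketch of that route as well: one `scatter` call per category, selecting the points with a boolean mask. The random data is only a stand-in for the asker's array, and the variable names are made up. A side effect of plotting per category is that `label=` gives the legend entries without building handles by hand.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

color_map = {"winter": (15, 28, 75), "spring": (92, 57, 32),
             "summer": (255, 253, 211), "fall": (174, 12, 12)}

# Stand-in data: 100 points with a random category each
rng = np.random.default_rng(0)
x = rng.uniform(1, 9, 100)
y = rng.uniform(1, 5, 100)
cat = rng.choice(list(color_map), 100)

fig, ax = plt.subplots()
for name, rgb in color_map.items():
    mask = cat == name  # boolean mask selecting the points of this category
    color = tuple(v / 255 for v in rgb)  # matplotlib expects values in 0..1
    ax.scatter(x[mask], y[mask], s=150, color=color, edgecolors="r", label=name)

ax.legend(bbox_to_anchor=(1.02, 1.02), loc="upper left")
fig.tight_layout()
fig.savefig("scatter_by_category.png")
```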

Matching the colour of a legend to the bars in a bargraph python?

I'm trying to match the colours of my legend to the bars in a graph. I've specifically highlighted these bars as points of interest, since they are outside of my ylim. Problem is, my legend is displaying the colours as black as opposed to the colours that I want it to.
Below is the function I'm using to graph, as well as the image of the graph.
def seaborn_plot(dataset,times):
    sns.set_style('darkgrid')
    sns.set_color_codes("muted")
    data_ = dataset
    time_list = []
    data_list = []
    for i, v in enumerate(data_):
        if data_[i] > 80000:
            data_list.append(('ED={:.2f}'.format(data_[i])))
            time_list.append(("Hour {}:".format(times[i])))
    df = pd.DataFrame(data = {'times_new':time_list,
                              'data_list':data_list})
    red = 'r'
    blue = 'b'
    colors = []
    for i in range(len(data_)):
        if data_[i] > 80000:
            color = red
            colors.append(color)
        else:
            color2 = blue
            colors.append(color2)
    graph = sns.barplot(x=times, y=data_ , palette = colors, label = time_list)
    graph.set_xlabel("Time (Hours)", fontsize = 10, fontweight = 'bold');
    graph.set_ylabel("Euclidean Distance", fontsize = 10, fontweight = 'bold');
    graph.set_ylim([0, 80000])
    leg = mp.gca().legend(labels = df["times_new"] + df["data_list"])
    return graph
The resulting image:
You can loop through the generated bars and use the bars that satisfy the condition as handles for the legend. As seaborn doesn't return a list of bars (in contrast to plt.bar()), the bars can be obtained from the returned ax (supposing no other bars are drawn yet in the same plot):
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
sns.set_style('darkgrid')
sns.set_color_codes("muted")
data_ = np.random.randint(20000, 100000, 24)
times = np.arange(0, 24)
y_limit = 80000
colors = ['r' if d > y_limit else 'b' for d in data_]
ax = sns.barplot(x=times, y=data_, palette=colors)
ax.set_xlabel("Time (Hours)", fontsize=10, fontweight='bold')
ax.set_ylabel("Euclidean Distance", fontsize=10, fontweight='bold')
ax.set_ylim([0, y_limit])
handles = [bar for bar in ax.containers[0] if bar.get_height() > y_limit]
labels = [f'Hour {" " if h < 10 else ""}{h}: ED={ed:,.0f}' for ed, h in zip(data_, times) if ed > y_limit]
ax.legend(handles, labels, bbox_to_anchor=[1.02, 1], loc='upper left')
plt.tight_layout()
plt.show()
Note that by using the bars as legend handles, this approach would also work when each bar would have an individual color.
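For comparison, the same idea with plain matplotlib, where `ax.bar` returns the bars directly. This is only a sketch with a few made-up values, not the asker's data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up values; two of them exceed the y limit
data = [30000, 95000, 60000, 82000, 20000]
times = list(range(len(data)))
y_limit = 80000

colors = ["r" if d > y_limit else "b" for d in data]
fig, ax = plt.subplots()
bars = ax.bar(times, data, color=colors)
ax.set_ylim(0, y_limit)

# Only the bars that exceed the limit become legend handles,
# so each legend patch inherits the color of its bar
handles = [bar for bar in bars if bar.get_height() > y_limit]
labels = [f"Hour {h}: ED={d:,.0f}" for h, d in zip(times, data) if d > y_limit]
legend = ax.legend(handles, labels, bbox_to_anchor=(1.02, 1), loc="upper left")
fig.tight_layout()
fig.savefig("bar_legend.png")
```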

Legend only for one of two pies in pandas

I made two pies: one is inside the other. I also want to make a legend but only for the inner circle.
One more significant thing: the inner circle has only two labels, which are repeated 5 times, so when I make a legend for both pies, I get something like "paid, free, paid, free, etc"
...
titles = ['Free', 'Paid']
subgroup_names= 5*titles
subgroup_size = final.num.tolist()
a, b, c = [plt.cm.Blues, plt.cm.Reds, plt.cm.Greens]
#Outer ring
fig, ax = plt.subplots()
ax.axis('equal')
mypie, _ = ax.pie(group_size, radius = 2.5, labels = group_names,
colors = [a(0.7), a(0.6), a(0.5), a(0.4), a(0.3)])
plt.setp(mypie, width = 1, edgecolor = 'white')
#Inner ring
mypie2, _ = ax.pie(subgroup_size, radius = 1.6, labels = subgroup_names,
labeldistance = 0.7, colors = [b(0.5), c(0.5)])
plt.setp(mypie2, width = 0.8, edgecolor = 'white')
plt.legend()
plt.show()
plt.legend accepts a list of handles and labels as parameters. get_legend_handles_labels() conveniently gets a list of handles and of labels that would normally be used. Via list indexing you can grab the interesting part.
To center the labels inside the plot, the textprops= parameter of plt.pie accepts a horizontal and vertical alignment.
import matplotlib.pyplot as plt
import numpy as np
titles = ['Free', 'Paid']
subgroup_names = 5 * titles
subgroup_size = np.random.uniform(10, 30, len(subgroup_names))
group_size = subgroup_size.reshape(5, 2).sum(axis=1)
group_names = [f'Group {l}' for l in 'abcde']
a, b, c = [plt.cm.Blues, plt.cm.Reds, plt.cm.Greens]
# Outer ring
fig, ax = plt.subplots()
ax.axis('equal')
mypie, _ = ax.pie(group_size, radius=2.5, labels=group_names,
colors=[a(0.7), a(0.6), a(0.5), a(0.4), a(0.3)])
plt.setp(mypie, width=1, edgecolor='white')
# Inner ring
mypie2, _ = ax.pie(subgroup_size, radius=1.6, labels=subgroup_names,
labeldistance=0.7, colors=[b(0.5), c(0.5)],
textprops={'va': 'center', 'ha': 'center'})
plt.setp(mypie2, width=0.8, edgecolor='white')
handles, labels = plt.gca().get_legend_handles_labels()
labels_to_skip = len(group_names)
plt.legend(handles[labels_to_skip:labels_to_skip + 2], labels[labels_to_skip:labels_to_skip + 2])
plt.show()
PS: To leave out the labels from the pie chart and only have them in the legend, call plt.pie() without the labels= parameter. And create the legend from the patches returned by plt.pie() (limited to the first two in this case):
# Inner ring
mypie2, _ = ax.pie(subgroup_size, radius=1.6,
labeldistance=0.7, colors=[b(0.5), c(0.5)])
plt.setp(mypie2, width=0.8, edgecolor='white')
plt.legend(mypie2[:len(titles)], titles)
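A self-contained version of this last variant could look as follows. The sizes, group names and colors are made up for illustration, and `wedgeprops` is used in place of `plt.setp` to set the ring width:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

titles = ["Free", "Paid"]
group_names = ["Group a", "Group b", "Group c"]
# Made-up sizes, one (Free, Paid) pair per group
subgroup_size = [10, 20, 15, 5, 12, 18]
group_size = [sum(subgroup_size[i:i + 2]) for i in range(0, len(subgroup_size), 2)]

fig, ax = plt.subplots()
# Outer ring keeps its labels on the chart
ax.pie(group_size, radius=1, labels=group_names,
       wedgeprops=dict(width=0.3, edgecolor="white"))
# Inner ring has no labels; its two colors repeat for every group
inner_wedges, _ = ax.pie(subgroup_size, radius=0.7,
                         colors=["tab:red", "tab:green"],
                         wedgeprops=dict(width=0.3, edgecolor="white"))

# Passing handles explicitly keeps the outer ring out of the legend
legend = ax.legend(inner_wedges[:len(titles)], titles,
                   loc="upper left", bbox_to_anchor=(1, 1))
legend_texts = [t.get_text() for t in legend.get_texts()]
fig.savefig("inner_ring_legend.png")
```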
