Multiple subplots in triangular form using matplotlib

I have 6 lists and I want to create scatterplots for all possible combinations. This means that I want to create n(n-1)/2 combinations, so 15 plots. I have done this correctly based on the following
script.
for i in d:
    for j in d:
        if(j>i):
            plt.cla() # Clear axis
            plt.clf() # Clear figure
            correlation_coefficient = str(np.corrcoef(d[i], d[j])[0][1])
            plt.scatter(d[i],d[j])
            plt.xlabel(names[i])
            plt.ylabel(names[j])
            plt.title('Correlation Coefficient: '+correlation_coefficient)
            plt.grid()
            plt.savefig(names[i]+"_"+names[j]+".png")
I want to save all these plots in one figure using subplot, where the first row will have the combinations (0,1) (0,2) (0,3) (0,4) (0,5) the second row (1,2) (1,3) (1,4) (1,5) the third row (2,3) (2,4) (2,5) etc.
So the final outcome will be a figure containing subplots in triangular form.
Update:
If I use subplots (code below), I can get roughly the result I want, but it is not optimal: I create a 6x6 grid, whereas a 5x5 grid would be enough.
fig = plt.figure()
cnt = 0
# Create scatterplots for all pairs
for i in d:
    for j in d:
        if(i>=j):
            cnt=cnt+1
        if(j>i):
            cnt += 1
            fig.add_subplot(6,6,cnt) #top left
            correlation_coefficient = str(np.corrcoef(d[i], d[j])[0][1])
            plt.scatter(np.log(d[i]),np.log(d[j]))
fig.savefig('test.png')

With gridspec:
from matplotlib import pyplot as plt
fig = plt.figure()
data = [(1,2,3),(8,2,3),(0,5,2),(4,7,1),(9,5,2),(8,8,8)]
plotz = len(data)
for i in range(plotz-1):
    for j in range(plotz):
        if(j>i) :
            print(i,j)
            ax = plt.subplot2grid((plotz-1, plotz-1), (i,j-1))
            ax.xaxis.set_ticklabels([])
            ax.yaxis.set_ticklabels([])
            plt.scatter(data[i],data[j]) # might be nice with shared axis limits
fig.show()
With add_subplot, you've hit an oddity inherited from MATLAB, which 1-indexes the subplot count. (Also you have some counting errors.) Here's an example that just keeps track of the various indices:
from matplotlib import pyplot as plt
fig = plt.figure()
count = 0
data = [(1,2,3),(8,2,3),(0,5,2),(4,7,1),(9,5,2),(8,8,8)]
plotz = len(data)
for i in range(plotz-1):
    for j in range(plotz):
        if(j>i):
            print(count, i,j, count -i)
            ax = fig.add_subplot(plotz-1, plotz-1, count-i)
            ax.xaxis.set_ticklabels([])
            ax.yaxis.set_ticklabels([])
            plt.text(.15, .5,'i %d, j %d, c %d'%(i,j,count))
        # count every (i, j) cell, plotted or not
        count += 1
fig.show()
N.b.: the error from doing the obvious (your original code with add_subplot(5,5,cnt)) was a good hint:
...User/lib/python2.7/site-packages/matplotlib/axes.pyc in __init__(self, fig, *args, **kwargs)
   9249                 self._subplotspec = GridSpec(rows, cols)[num[0] - 1:num[1]]
   9250             else:
-> 9251                 self._subplotspec = GridSpec(rows, cols)[int(num) - 1]
   9252                 # num - 1 for converting from MATLAB to python indexing
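The index bookkeeping above can also be written as a direct formula. The following is only a sketch: the helper name triangular_positions is made up for illustration, and it assumes the layout asked for in the question, where pair (i, j) with i < j goes to row i, column j-1 of an (n-1) x (n-1) grid, numbered from 1 as add_subplot expects.

```python
from itertools import combinations

def triangular_positions(n):
    """Map each pair (i, j), i < j, to a 1-based subplot number
    in an (n-1) x (n-1) grid (hypothetical helper for illustration)."""
    side = n - 1
    positions = {}
    for i, j in combinations(range(n), 2):
        row, col = i, j - 1                        # 0-based grid cell
        positions[(i, j)] = row * side + col + 1   # add_subplot counts from 1
    return positions

positions = triangular_positions(6)
print(len(positions))  # number of pairs, n(n-1)/2
print(positions[(0, 1)], positions[(1, 2)], positions[(4, 5)])
```

Each value could then be passed as the third argument of fig.add_subplot(n-1, n-1, ...), which avoids keeping a running counter.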

Related

Problems plotting multiple data sets on same graph in python

I have a dataset which consists of data gathered from experiments with various participants, done over 3 days.
I managed to plot the data for each participant on a separate plot for each experiment successfully using the following code:
by_part = p1.groupby('participant_id')
for name, group in by_part:
    byexp_num = p1.groupby('exp_num')
    # As there are 2 rows, the column count is the number of experiments divided by 2,
    # plus 1 if that number is odd
    fig, axs = plt.subplots(figsize=(len(byexp_num), 5), nrows=2, ncols=(len(byexp_num)//2)+ (len(byexp_num) % 2 > 0))
    plt.subplots_adjust(left=0.1, bottom=0.1, right=0.9, top=0.8, wspace=0.4, hspace=0.4)
    fig.suptitle('Participant {}'.format(name), fontsize=20)
    subplot_targets = zip(byexp_num.groups.keys(), axs.flatten())
    for key, ax in subplot_targets:
        ax.plot(byexp_num.get_group(key).rn_norm, byexp_num.get_group(key).scale_data)
        ax.set_ylabel('Scale Data')
        ax.set_title('Experiment {}'.format(key+1))
But when I try to group the data by day and plot multiple experiments on the same graph for each day, there is a problem with how the graphs are displayed. The data is grouped successfully, but all the experiments are joined together: the line continues from the last data point of one experiment to the first point of the next, so instead of showing n separate lines in each plot, it shows one continuous line. I am not sure what I'm doing wrong.
by_part = p1.groupby('participant_id')
p1 = df_all[(df_all['participant_id']== 1)]
by_part = p1.groupby('participant_id')
for name, group in by_part:
    by_day = p1.groupby('day')
    fig, axs = plt.subplots(figsize=(30, 5), nrows=1, ncols=(len(by_day)))
    fig.suptitle('Participant {}'.format(name), fontsize=20)
    subplot_targets = zip(by_day.groups.keys(), axs.flatten())
    for key, ax in subplot_targets:
        ax.plot(by_day.get_group(key).rn, by_day.get_group(key).scale_data)
        ax.set_ylabel('Scale Data')
        ax.set_title('Day {}'.format(key))
        ax.legend(p1['exp_num'])
Here is the graph it displays.
Plots
EDIT
Adding dataframe as requested by GalacticPonderer
DataFrame for Participant 1

I suspect you need to sort your data, per invocation of plot, by X value (rn). To do this, we'll convert it into a numpy array, find out what indexes will sort the X axis, then apply these indices to both the X and Y axis, creating a new numpy array we can plot.
First let's look at a simple example:
import matplotlib.pyplot as plt
import numpy as np
y=[list(range(0,10)), list(range(0,20,2))]
# Randomly chosen out-of-order values
x=[[7,3,8,5,2,1,9,6,4,0],[8,2,4,0,1,6,3,7,9,5]]
fig,axs=plt.subplots()
for i in range(0,2):
    axs.plot(x[i], y[i])
plt.show()
Yields:
Now let's sort by X axis:
for i in range(0,2):
    xy=np.array([x[i],y[i]])
    ind=np.argsort(xy,axis=-1)[0]
    axs.plot(np.take_along_axis(xy[0,:], ind, axis=0), np.take_along_axis(xy[1,:], ind, axis=0))
The xy=np.array([x[i],y[i]]) line converts the data to a numpy array, because numpy has the requisite sorting operations. ind=np.argsort(xy,axis=-1) returns a numpy array of indices that will sort the X-axis, which we will apply to both the X and Y axes. Since we are only interested in the X-axis sort results, we discard the second row ([0]). take_along_axis is a companion to argsort and allows us to apply the indices to the array.
Result:
Applying to your code:
import numpy as np
# [...]
for key, ax in subplot_targets:
    xy=np.array([by_day.get_group(key).rn, by_day.get_group(key).scale_data])
    ind=np.argsort(xy,axis=-1)[0]
    ax.plot(np.take_along_axis(xy[0,:], ind, axis=0), np.take_along_axis(xy[1,:], ind, axis=0))
    ax.set_ylabel('Scale Data')
    ax.set_title('Day {}'.format(key))
    ax.legend(p1['exp_num'])
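The sorting step can also be written a little more compactly by sorting on x alone and indexing both arrays with the result. This is a sketch of the same idea, not a different method; sort_by_x is a hypothetical helper name, and the sample values are the first x/y pair from the simple example above.

```python
import numpy as np

def sort_by_x(x, y):
    """Return x and y reordered so that x is ascending (illustrative helper)."""
    x = np.asarray(x)
    y = np.asarray(y)
    order = np.argsort(x)       # indices that sort x
    return x[order], y[order]   # apply the same order to both arrays

x = [7, 3, 8, 5, 2, 1, 9, 6, 4, 0]
y = list(range(10))
xs, ys = sort_by_x(x, y)
print(xs)
print(ys)
```

Inside the plotting loop, the two returned arrays would be passed straight to ax.plot.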

Adding a colorbar whose color corresponds to the different lines in an existing plot

My dataset is in the form of :
Data[0] = [headValue,x0,x1,..xN]
Data[1] = [headValue_ya,ya0,ya1,..yaN]
Data[2] = [headValue_yb,yb0,yb1,..ybN]
...
Data[n] = [headvalue_yz,yz0,yz1,..yzN]
I want to plot f(y*) = x, so I can visualize all Lineplots in the same figure with different colors, each color determined by the headervalue_y*.
I also want to add a colorbar whose colors match the lines, and therefore the header values, so that we can visually link each header value to the behaviour it leads to.
Here is what I am aiming for :(Plot from Lacroix B, Letort G, Pitayu L, et al. Microtubule Dynamics Scale with Cell Size to Set Spindle Length and Assembly Timing. Dev Cell. 2018;45(4):496–511.e6. doi:10.1016/j.devcel.2018.04.022)
I have trouble adding the colorbar. I have tried to extract N colors from a colormap (N is the number of different headValues, i.e. the number of columns minus 1) and then to give each line plot the corresponding color. Here is my code to clarify:
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
Data = [['Time',0,0.33,..200],[0.269,4,4.005,...11],[0.362,4,3.999,...16.21],...[0.347,4,3.84,...15.8]]
headValues = [0.269,0.362,0.335,0.323,0.161,0.338,0.341,0.428,0.245,0.305,0.305,0.314,0.299,0.395,0.32,0.437,0.203,0.41,0.392,0.347]
# the differents headValues_y* of each column here in a list but also in Data
# with headValue[0] = Data[1][0], headValue[1] = Data[2][0] ...
cmap = mpl.cm.get_cmap('rainbow') # I choose my colormap
rgba = [] # the color container
for value in headValues:
rgba.append(cmap(value)) # so rgba will contain a different color for each headValue
fig, (ax,ax1) = plt.subplots(2,1) # creating my figure and two axes to put the Lines and the colorbar
c = 0 # index for my colors
for i in range(1, len(Data)):
ax.plot( Data[0][1:], Data[i][1:] , color = rgba[c])
# Data[0][1:] is x, Data[i][1:] is y, and the color associated with Data[i][0]
c += 1
fig.colorbar(mpl.cm.ScalarMappable(cmap= mpl.colors.ListedColormap(rgba)), cax=ax1, orientation='horizontal')
# here I create my scalarMappable for my lineplot and with the previously selected colors 'rgba'
plt.show()
The current result:
How can I add the colorbar at the side or the bottom of the first axis?
How can I properly add a scale to this colorbar corresponding to the different headValues?
How can I make the colorbar scale and colors match the different lines on the plot, with one color = one headValue?
I have tried working with scatterplots, which are more convenient to use with ScalarMappable, but no solution allows me to do all of these things at once.

Here is a possible approach. As the 'headValues' aren't sorted, nor equally spaced and one is even used twice, it is not fully clear what the most-desired result would be.
Some remarks:
The standard way of creating a colorbar in matplotlib doesn't need a separate subplot. Matplotlib will shrink the existing plot a bit and put the colorbar next to it (or below it for a horizontal bar).
Converting the 'headValues' to a numpy array allows for compact code, e.g. writing rgba = cmap(headValues) directly calculates the complete array.
Calling cmap on unchanged values will map 0 to the lowest color and 1 to the highest color, so for values only between 0.16 and 0.44 they all will be mapped to quite similar colors. One approach is to create a norm to map 0.16 to the lowest color and 0.44 to the highest. In code: norm = plt.Normalize(headValues.min(), headValues.max()) and then calculate rgba = cmap(norm(headValues)).
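The effect of the norm can be checked in isolation before drawing anything. This is a small sketch using only four of the values from the question; the variable names head_values and scaled are chosen here for illustration, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, an assumption for running without a display
import matplotlib.pyplot as plt
import numpy as np

head_values = np.array([0.269, 0.362, 0.161, 0.437])  # a few values from the question

cmap = plt.get_cmap("rainbow")
norm = plt.Normalize(head_values.min(), head_values.max())

scaled = norm(head_values)  # smallest value maps to 0.0, largest to 1.0
rgba = cmap(scaled)         # one RGBA row per value

print(scaled)
print(rgba.shape)
```

Without the norm, all four values would fall in a narrow band of the colormap; with it, they span the full range.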
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
headValues = np.array([0.269, 0.362, 0.335, 0.323, 0.161, 0.338, 0.341, 0.428, 0.245, 0.305, 0.305, 0.314, 0.299, 0.395, 0.32, 0.437, 0.203, 0.41, 0.392, 0.347])
x = np.linspace(0, 200, 500)
# create Data similar to the data in the question
Data = [['Time'] + list(x)] + [[val] + list(np.sqrt(4 * x) * val + 4) for val in headValues]
headValues = np.array([d[0] for d in Data[1:]])
order = np.argsort(headValues)
inverse_order = np.argsort(order)
cmap = plt.get_cmap('rainbow')  # plt.get_cmap is available across matplotlib versions
rgba = cmap(np.linspace(0, 1, len(headValues))) # evenly spaced colors
fig, ax = plt.subplots(1, 1)
for i in range(1, len(Data)):
    ax.plot(Data[0][1:], Data[i][1:], color=rgba[inverse_order[i-1]])
    # Data[0][1:] is x, Data[i][1:] is y, and the color is associated with Data[i][0]
# pass ax explicitly so matplotlib knows which axes to take space from
cbar = fig.colorbar(mpl.cm.ScalarMappable(cmap=mpl.colors.ListedColormap(rgba)), ax=ax, orientation='vertical',
                    ticks=np.linspace(0, 1, len(rgba) * 2 + 1)[1::2])
cbar.set_ticklabels(headValues[order])
plt.show()
Alternatively, each line's color can be taken from the position of its value in the colormap, using a norm:
cmap = plt.get_cmap('rainbow')
norm = plt.Normalize(headValues.min(), headValues.max())
fig, ax = plt.subplots(1, 1)
for i in range(1, len(Data)):
    ax.plot(Data[0][1:], Data[i][1:], color=cmap(norm(Data[i][0])))
cbar = fig.colorbar(mpl.cm.ScalarMappable(cmap=cmap, norm=norm), ax=ax)
To get ticks for each of the 'headValues', these ticks can be set explicitly. As putting a label for each tick will result in overlapping text, labels that are too close to other labels can be replaced by an empty string:
headValues.sort()
cbar2 = fig.colorbar(mpl.cm.ScalarMappable(cmap=cmap, norm=norm), ax=ax, ticks=headValues)
# blank out any label that is closer than 0.007 to the next one
cbar2.set_ticklabels([val if val < nxt - 0.007 else '' for val, nxt in zip(headValues[:-1], headValues[1:])]
                     + [headValues[-1]])
At the left the result of the first approach (colors in segments), at the right the alternative colorbars (color depending on value):

Plot subplots using seaborn pairplot

If I draw the plot using the following code, it works and I can see all the subplots in a single row. I can specifically break the number of cols into three or two and show them. But I have 30 columns and I wanted to use a loop mechanism so that they are plotted in a grid of say 4x4 sub-plots
regressionCols = ['col_a', 'col_b', 'col_c', 'col_d', 'col_e']
sns.pairplot(numerical_df, x_vars=regressionCols, y_vars='price',height=4, aspect=1, kind='scatter')
plt.show()
The code using loop is below. However, I don't see anything rendered.
nr_rows = 4
nr_cols = 4
li_cat_cols = list(regressionCols)
fig, axs = plt.subplots(nr_rows, nr_cols, figsize=(nr_cols*4,nr_rows*4), squeeze=False)
for r in range(0, nr_rows):
    for c in range(0,nr_cols):
        i = r*nr_cols+c
        if i < len(li_cat_cols):
            sns.set(style="darkgrid")
            bp=sns.pairplot(numerical_df, x_vars=li_cat_cols[i], y_vars='price',height=4, aspect=1, kind='scatter')
            bp.set(xlabel=li_cat_cols[i], ylabel='Price')
plt.tight_layout()
plt.show()
Not sure what I am missing.

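I think you didn't connect the subplots in your grid to the scatter plots generated in the loop: sns.pairplot is a figure-level function that creates its own figure on every call, so nothing is drawn on the axes in axs.
Maybe this solution, which uses pandas plotting instead, will work for you.
For example:
1. Let's simply define an empty pandas dataframe.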
numerical_df = pd.DataFrame([])
2. Create some random features and price depending on them:
numerical_df['A'] = np.random.randn(100)
numerical_df['B'] = np.random.randn(100)*10
numerical_df['C'] = np.random.randn(100)*-10
numerical_df['D'] = np.random.randn(100)*2
numerical_df['E'] = 20*(np.random.randn(100)**2)
numerical_df['F'] = np.random.randn(100)
numerical_df['price'] = 2*numerical_df['A'] +0.5*numerical_df['B'] - 9*numerical_df['C'] + numerical_df['E'] + numerical_df['D']
3. Define the number of rows and columns, and create a grid of subplots with nr_rows and nr_cols.
nr_rows = 2
nr_cols = 4
fig, axes = plt.subplots(nrows=nr_rows, ncols=nr_cols, figsize=(15, 8))
4. Enumerate each feature in the dataframe and draw a scatterplot against price:
for idx, feature in enumerate(numerical_df.columns[:-1]):
    numerical_df.plot(feature, "price", subplots=True,kind="scatter",ax=axes[idx // 4,idx % 4])
where axes[idx // 4, idx % 4] selects the location of each scatterplot in the grid created in step 3.
So, we got a matrix plot:
Scatterplot matrix
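The index arithmetic in step 4 hardcodes 4 as the number of columns. A sketch of the same mapping with the column count as a parameter is shown below; grid_position is a hypothetical helper name, and the feature list simply mirrors the column names used in the example.

```python
def grid_position(idx, nr_cols):
    """Return the (row, col) cell for the idx-th plot, filling row by row."""
    return divmod(idx, nr_cols)

features = ["A", "B", "C", "D", "E", "F"]
nr_cols = 4
for idx, feature in enumerate(features):
    row, col = grid_position(idx, nr_cols)
    print(feature, row, col)
```

With this, ax=axes[grid_position(idx, nr_cols)] keeps working if nr_cols is changed later.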

python saving multiple subplot figures to pdf

I am new to Python and I am trying to save a large amount of data to a PDF with figures, using matplotlib's PdfPages and subplots. The problem is that I have hit a bottleneck I don't know how to solve. The code goes something like this:
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages
with PdfPages('myfigures.pdf') as pdf:
    for i in range(1000):
        f,axarr = plt.subplots(2, 3)
        plt.subplots(2, 3)
        axarr[0, 0].plot(x1, y1)
        axarr[1, 0].plot(x2, y2)
        pdf.savefig(f)
        plt.close('all')
Creating a figure on each iteration is very time-consuming, but if I move that outside the loop, the plots are not cleared between iterations. Other options I tried, like clear() or clf(), didn't work either, or ended up creating multiple different figures. Does anyone have an idea how to restructure this so that it runs faster?

Multipage PDF appending w/ matplotlib
Create an 𝑚-rows × 𝑛-cols array of subplot axes per PDF page, and save (append) each page once its array of subplots is completely full; then create a new page and repeat.
To hold a large number of subplots as multipage output inside a single PDF, start filling the first page with your plots right away. Whenever the latest subplot added in your plotting loop fills the last free slot in the current page's 𝑚 × 𝑛 layout, save that page and create a new one.
Here's a way to do it where the dimensions (𝑚 × 𝑛) controlling the number of subplots per page can easily be changed:
import sys
import matplotlib
from matplotlib.backends.backend_pdf import PdfPages
import matplotlib.pyplot as plt
import numpy as np
matplotlib.rcParams.update({"font.size": 6})
# Dimensions for any m-rows × n-cols array of subplots / pg.
m, n = 4, 5
# Don't forget to indent after the with statement
with PdfPages("auto_subplotting.pdf") as pdf:
    """Before beginning the iteration through all the data,
    initialize the layout for the plots and create a
    representation of the subplots that can be easily
    iterated over for knowing when to create the next page
    (and also for custom settings like partial axes labels)"""
    f, axarr = plt.subplots(m, n, sharex="col", sharey="row")
    arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
    subplots = [axarr[index] for index in arr_ij]
    # To conserve needed plotting real estate,
    # only label the bottom row and leftmost subplots
    # as determined automatically using m and n
    splot_index = 0
    for s, splot in enumerate(subplots):
        splot.set_ylim(0, 0.15)
        splot.set_xlim(0, 50)
        last_row = m * n - s < n + 1
        first_in_row = s % n == 0
        if last_row:
            splot.set_xlabel("X-axis label")
        if first_in_row:
            splot.set_ylabel("Y-axis label")
    # Iterate through each sample in the data
    for sample in range(33):
        # As a stand-in for real data, let's just make numpy take 100 random draws
        # from a poisson distribution centered around say ~25 and then display
        # the outcome as a histogram
        scaled_y = np.random.randint(20, 30)
        random_data = np.random.poisson(scaled_y, 100)
        subplots[splot_index].hist(
            random_data,
            bins=12,
            density=True,  # replaces the old normed=True, which newer matplotlib no longer accepts
            fc=(0, 0, 0, 0),
            lw=0.75,
            ec="b",
        )
        # Keep collecting subplots (into the mpl-created array;
        # see: [1]) through the samples in the data and increment
        # a counter each time. The page will be full once the count is equal
        # to the product of the user-set dimensions (i.e. m * n)
        splot_index += 1
        """Once an mxn number of subplots have been collected
        you now have a full page's worth, and it's time to
        close and save to pdf that page and re-initialize for a
        new page possibly. We can basically repeat the same
        exact code block used for the first layout
        initialization, but with the addition of 3 new lines:
        +2 for creating & saving the just-finished pdf page,
        +1 more to reset the subplot index (back to zero)"""
        if splot_index == m * n:
            pdf.savefig()
            plt.close(f)
            f, axarr = plt.subplots(m, n, sharex="col", sharey="row")
            arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
            subplots = [axarr[index] for index in arr_ij]
            splot_index = 0
            for s, splot in enumerate(subplots):
                splot.set_ylim(0, 0.15)
                splot.set_xlim(0, 50)
                last_row = (m * n) - s < n + 1
                first_in_row = s % n == 0
                if last_row:
                    splot.set_xlabel("X-axis label")
                if first_in_row:
                    splot.set_ylabel("Y-axis label")
    # Done!
    # But don't forget to save to pdf after the last page
    pdf.savefig()
    plt.close(f)
For any m×n layout, just change the declarations for the values of m and n, respectively. From the code above (where "m, n = 4, 5"), a 4x5 matrix of subplots with a total 33 samples is produced as a two-page pdf output file:
References
Link to matplotlib subplots official docs.
Note:
On the final page of the multipage PDF there will be a number of blank subplots equal to the total number of subplot slots (pages × 𝑚 × 𝑛) minus your total number of samples. E.g., say m=3 and n=4, so you get 3 rows of 4 subplots each, i.e. 12 per page. If you had, say, 20 samples, a two-page PDF would be created with a total of 24 subplots, and the last 4 subplots on the second page (the full bottom row in this hypothetical example) would be empty.
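The arithmetic in this note can be sketched as a small function. The name page_layout is made up for illustration; the three calls use the sample counts and layouts that appear in this answer.

```python
import math

def page_layout(n_samples, m, n):
    """Return (pages, blanks) for n_samples plots on pages of m x n subplots."""
    per_page = m * n
    pages = math.ceil(n_samples / per_page)
    blanks = pages * per_page - n_samples  # unused axes on the final page
    return pages, blanks

print(page_layout(20, 3, 4))
print(page_layout(33, 4, 5))
print(page_layout(37, 4, 6))
```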
Using seaborn
For a more advanced (& more "pythonic"*) extension of the implementation above, see below:
The multipage handling should probably be simplified by creating a new_page function; it's better to not repeat code verbatim*, especially if you start customizing the plots in which case you won't want to have to mirror every change and type the same thing twice. A more customized aesthetic based off of seaborn and utilizing the available matplotlib parameters like shown below might be preferable too.
Add a new_page function & some customizations for the subplot style:
import matplotlib.pyplot as plt
import numpy as np
import random
import seaborn as sns
from matplotlib.backends.backend_pdf import PdfPages
# this erases labels for any blank plots on the last page
sns.set(font_scale=0.0)
m, n = 4, 6
datasize = 37
# 37 % (m*n) = 13, (m*n) - 13 = 24 - 13 = 11. Thus 11 blank subplots on final page
# custom colors scheme / palette
ctheme = [
"k", "gray", "magenta", "fuchsia", "#be03fd", "#1e488f",
(0.44313725490196076, 0.44313725490196076, 0.88627450980392153), "#75bbfd",
"teal", "lime", "g", (0.6666674, 0.6666663, 0.29078014184397138), "y",
"#f1da7a", "tan", "orange", "maroon", "r", ] # pick whatever colors you wish
colors = sns.blend_palette(ctheme, datasize)
fz = 7 # labels fontsize
def new_page(m, n):
    global splot_index
    splot_index = 0
    fig, axarr = plt.subplots(m, n, sharey="row")
    plt.subplots_adjust(hspace=0.5, wspace=0.15)
    arr_ij = [(x, y) for x, y in np.ndindex(axarr.shape)]
    subplots = [axarr[index] for index in arr_ij]
    for s, splot in enumerate(subplots):
        splot.grid(
            True,  # passed positionally; the old b= keyword is not accepted by newer matplotlib
            which="major",
            color="gray",
            linestyle="-",
            alpha=0.25,
            zorder=1,
            lw=0.5,
        )
        splot.set_ylim(0, 0.15)
        splot.set_xlim(0, 50)
        last_row = m * n - s < n + 1
        first_in_row = s % n == 0
        if last_row:
            splot.set_xlabel("X-axis label", labelpad=8, fontsize=fz)
        if first_in_row:
            splot.set_ylabel("Y-axis label", labelpad=8, fontsize=fz)
    return (fig, subplots)
with PdfPages("auto_subplotting_colors.pdf") as pdf:
    fig, subplots = new_page(m, n)
    for sample in range(datasize):  # xrange in Python 2
        splot = subplots[splot_index]
        splot_index += 1
        scaled_y = np.random.randint(20, 30)
        random_data = np.random.poisson(scaled_y, 100)
        splot.hist(
            random_data,
            bins=12,
            density=True,  # replaces the old normed=True
            zorder=2,
            alpha=0.99,
            fc="white",
            lw=0.75,
            ec=colors.pop(),
        )
        splot.set_title("Sample {}".format(sample + 1), fontsize=fz)
        # tick fontsize & spacing
        splot.xaxis.set_tick_params(pad=4, labelsize=6)
        splot.yaxis.set_tick_params(pad=4, labelsize=6)
        # make new page:
        if splot_index == m * n:
            pdf.savefig()
            plt.close(fig)
            fig, subplots = new_page(m, n)
    if splot_index > 0:
        pdf.savefig()
        plt.close(fig)

How to label the bars of a stacked bar plot from a pandas DataFrame? [duplicate]

I am trying to replicate the following image in matplotlib and it seems barh is my only option. Though it appears that you can't stack barh graphs so I don't know what to do
If you know of a better python library to draw this kind of thing, please let me know.
This is all I could come up with as a start:
import matplotlib.pyplot as plt; plt.rcdefaults()
import numpy as np
import matplotlib.pyplot as plt
people = ('A','B','C','D','E','F','G','H')
y_pos = np.arange(len(people))
bottomdata = 3 + 10 * np.random.rand(len(people))
topdata = 3 + 10 * np.random.rand(len(people))
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
ax.barh(y_pos, bottomdata,color='r',align='center')
ax.barh(y_pos, topdata,color='g',align='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
I would then have to add labels individually using ax.text which would be tedious. Ideally I would like to just specify the width of the part to be inserted then it updates the center of that section with a string of my choosing. The labels on the outside (e.g. 3800) I can add myself later, it is mainly the labeling over the bar section itself and creating this stacked method in a nice way I'm having problems with. Can you even specify a 'distance' i.e. span of color in any way?

Edit 2: for more heterogeneous data. (I've left the above method since I find it more usual to work with the same number of records per series)
Answering the two parts of the question:
a) barh returns a container of handles to all the patches that it drew. You can use the coordinates of the patches to aid the text positions.
b) Following these two answers to the question that I noted before (see Horizontal stacked bar chart in Matplotlib), you can stack bar graphs horizontally by setting the 'left' input.
and additionally c) handling data that is less uniform in shape.
One way to handle data that is less uniform in shape is simply to process each segment independently.
import numpy as np
import matplotlib.pyplot as plt
# some labels for each row
people = ('A','B','C','D','E','F','G','H')
r = len(people)
# how many data points overall (average of 3 per person)
n = r * 3
# which person does each segment belong to?
rows = np.random.randint(0, r, (n,))
# how wide is the segment?
widths = np.random.randint(3,12, n,)
# what label to put on the segment (xrange in py2.7, range for py3)
labels = range(n)
colors ='rgbwmc'
patch_handles = []
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
left = np.zeros(r,)
row_counts = np.zeros(r,)
for (r, w, l) in zip(rows, widths, labels):
    print(r, w, l)
    patch_handles.append(ax.barh(r, w, align='center', left=left[r],
        color=colors[int(row_counts[r]) % len(colors)]))
    left[r] += w
    row_counts[r] += 1
    # we know there is only one patch but could enumerate if expanded
    patch = patch_handles[-1][0]
    bl = patch.get_xy()
    x = 0.5*patch.get_width() + bl[0]
    y = 0.5*patch.get_height() + bl[1]
    ax.text(x, y, "%d%%" % (l), ha='center',va='center')
y_pos = np.arange(8)
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
This produces a graph like this, with a different number of segments present in each series.
Note that this is not particularly efficient, since each segment uses an individual call to ax.barh. There may be more efficient methods (e.g. padding a matrix with zero-width segments or nan values), but this is likely to be problem-specific and is a distinct question.
Edit: updated to answer both parts of the question.
import numpy as np
import matplotlib.pyplot as plt
people = ('A','B','C','D','E','F','G','H')
segments = 4
# generate some multi-dimensional data & arbitrary labels
data = 3 + 10* np.random.rand(segments, len(people))
percentages = (np.random.randint(5,20, (len(people), segments)))
y_pos = np.arange(len(people))
fig = plt.figure(figsize=(10,8))
ax = fig.add_subplot(111)
colors ='rgbwmc'
patch_handles = []
left = np.zeros(len(people)) # left alignment of data starts at zero
for i, d in enumerate(data):
    patch_handles.append(ax.barh(y_pos, d,
        color=colors[i%len(colors)], align='center',
        left=left))
    # accumulate the left-hand offsets
    left += d
# go through all of the bar segments and annotate
for j in range(len(patch_handles)):
    for i, patch in enumerate(patch_handles[j].get_children()):
        bl = patch.get_xy()
        x = 0.5*patch.get_width() + bl[0]
        y = 0.5*patch.get_height() + bl[1]
        ax.text(x,y, "%d%%" % (percentages[i,j]), ha='center')
ax.set_yticks(y_pos)
ax.set_yticklabels(people)
ax.set_xlabel('Distance')
plt.show()
You can achieve a result along these lines (note: the percentages I used have nothing to do with the bar widths, as the relationship in the example seems unclear):
See Horizontal stacked bar chart in Matplotlib for some ideas on stacking horizontal bar plots.
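The left offsets and label positions used in this answer can also be computed up front, without reading them back from the patches. The following sketch assumes same-shaped data with made-up widths; the array names widths, lefts and centers are chosen here for illustration.

```python
import numpy as np

# widths[k, p] is the width of segment k for row p (made-up numbers)
widths = np.array([[3.0, 5.0],
                   [2.0, 4.0],
                   [6.0, 1.0]])

# Each segment starts where the previous segments of the same row end
lefts = np.cumsum(widths, axis=0) - widths
# The label sits halfway along each segment
centers = lefts + widths / 2

print(lefts)
print(centers)
```

Row k of lefts would be passed as left= to the k-th ax.barh call, and centers gives the x coordinates for ax.text.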

Imports and Test DataFrame
Tested in python 3.10, pandas 1.4.2, matplotlib 3.5.1, seaborn 0.11.2
For vertical stacked bars see Stacked Bar Chart with Centered Labels
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# create sample data as shown in the OP
np.random.seed(365)
people = ('A','B','C','D','E','F','G','H')
bottomdata = 3 + 10 * np.random.rand(len(people))
topdata = 3 + 10 * np.random.rand(len(people))
# create the dataframe
df = pd.DataFrame({'Female': bottomdata, 'Male': topdata}, index=people)
# display(df)
Female Male
A 12.41 7.42
B 9.42 4.10
C 9.85 7.38
D 8.89 10.53
E 8.44 5.92
F 6.68 11.86
G 10.67 12.97
H 6.05 7.87
Updated with matplotlib v3.4.2
Use matplotlib.pyplot.bar_label
See How to add value labels on a bar chart for additional details and examples with .bar_label.
For python < 3.8, which does not have the assignment expression (:=), use labels = [f'{v.get_width():.2f}%' if v.get_width() > 0 else '' for v in c ] instead of the labels line in the code below.
Plotted using pandas.DataFrame.plot with kind='barh'
ax = df.plot(kind='barh', stacked=True, figsize=(8, 6))
for c in ax.containers:
    # customize the label to account for cases when there might not be a bar section
    labels = [f'{w:.2f}%' if (w := v.get_width()) > 0 else '' for v in c ]
    # set the bar label
    ax.bar_label(c, labels=labels, label_type='center')
    # uncomment and use the next line if there are no nan or 0 length sections; just use fmt to add a % (the previous two lines of code are not needed, in this case)
    # ax.bar_label(c, fmt='%.2f%%', label_type='center')
# move the legend
ax.legend(bbox_to_anchor=(1.025, 1), loc='upper left', borderaxespad=0.)
# add labels
ax.set_ylabel("People", fontsize=18)
ax.set_xlabel("Percent", fontsize=18)
plt.show()
Using seaborn
sns.barplot does not have an option for stacked bar plots, however, sns.histplot and sns.displot can be used to create horizontal stacked bars.
seaborn typically requires the dataframe to be in a long, instead of wide, format, so use pandas.DataFrame.melt to reshape the dataframe.
Reshape dataframe
# convert the dataframe to a long form
df = df.reset_index()
df = df.rename(columns={'index': 'People'})
dfm = df.melt(id_vars='People', var_name='Gender', value_name='Percent')
# display(dfm)
People Gender Percent
0 A Female 12.414557
1 B Female 9.416027
2 C Female 9.846105
3 D Female 8.885621
4 E Female 8.438872
5 F Female 6.680709
6 G Female 10.666258
7 H Female 6.050124
8 A Male 7.420860
9 B Male 4.104433
10 C Male 7.383738
11 D Male 10.526158
12 E Male 5.916262
13 F Male 11.857227
14 G Male 12.966913
15 H Male 7.865684
sns.histplot: axes-level plot
fig, axe = plt.subplots(figsize=(8, 6))
sns.histplot(data=dfm, y='People', hue='Gender', discrete=True, weights='Percent', multiple='stack', ax=axe)
# iterate through each set of containers
for c in axe.containers:
    # add bar annotations
    axe.bar_label(c, fmt='%.2f%%', label_type='center')
axe.set_xlabel('Percent')
plt.show()
sns.displot: figure-level plot
g = sns.displot(data=dfm, y='People', hue='Gender', discrete=True, weights='Percent', multiple='stack', height=6)
# iterate through each facet / subplot
for axe in g.axes.flat:
    # iterate through each set of containers
    for c in axe.containers:
        # add the bar annotations
        axe.bar_label(c, fmt='%.2f%%', label_type='center')
    axe.set_xlabel('Percent')
plt.show()
Original Answer - before matplotlib v3.4.2
The easiest way to plot a horizontal or vertical stacked bar, is to load the data into a pandas.DataFrame
This will plot, and annotate correctly, even when all categories ('People'), don't have all segments (e.g. some value is 0 or NaN)
Once the data is in the dataframe:
It's easier to manipulate and analyze
It can be plotted with the matplotlib engine, using:
pandas.DataFrame.plot.barh
label_text = f'{width}' for annotations
pandas.DataFrame.plot.bar
label_text = f'{height}' for annotations
SO: Vertical Stacked Bar Chart with Centered Labels
These methods return a matplotlib.axes.Axes or a numpy.ndarray of them.
The .patches attribute holds a list of matplotlib.patches.Rectangle objects, one for each of the sections of the stacked bar.
Each .Rectangle has methods for extracting the various values that define the rectangle.
The .Rectangle objects are ordered from left to right, and bottom to top, so all the .Rectangle objects for each level appear in order when iterating through .patches.
The labels are made using an f-string, label_text = f'{width:.2f}%', so any additional text can be added as needed.
Plot and Annotate
Plotting the bar, is 1 line, the remainder is annotating the rectangles
# plot the dataframe with 1 line
ax = df.plot.barh(stacked=True, figsize=(8, 6))
# .patches is everything inside of the chart
for rect in ax.patches:
    # Find where everything is located
    height = rect.get_height()
    width = rect.get_width()
    x = rect.get_x()
    y = rect.get_y()
    # For a horizontal bar, the width of the rectangle is the data value and can be used as the label
    label_text = f'{width:.2f}%'  # f'{width:.2f}' to format decimal values
    # ax.text(x, y, text)
    label_x = x + width / 2
    label_y = y + height / 2
    # only plot labels for segments with a width greater than 0
    if width > 0:
        ax.text(label_x, label_y, label_text, ha='center', va='center', fontsize=8)
# move the legend
ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)
# add labels
ax.set_ylabel("People", fontsize=18)
ax.set_xlabel("Percent", fontsize=18)
plt.show()
Example with Missing Segment
# set one of the dataframe values to 0
df.iloc[4, 1] = 0
Note the annotations are all in the correct location from df.
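As a self-contained check of the missing-segment case, here is a sketch with a hypothetical dataframe (df_demo is made up for illustration) in which one segment is 0. The zero-width rectangle still exists in .patches, but the width > 0 test skips its label:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# hypothetical data; person "B" has no 'Male' segment
df_demo = pd.DataFrame({"Male": [60.0, 0.0, 30.0], "Female": [40.0, 100.0, 70.0]},
                       index=["A", "B", "C"])

ax_demo = df_demo.plot.barh(stacked=True)

for rect in ax_demo.patches:
    width = rect.get_width()
    if width > 0:  # skip empty segments so no stray "0.00%" label is drawn
        ax_demo.text(rect.get_x() + width / 2,
                     rect.get_y() + rect.get_height() / 2,
                     f'{width:.2f}%', ha='center', va='center', fontsize=8)

# collect the text of every annotation that was actually added
labels = [t.get_text() for t in ax_demo.texts]
print(labels)
```

Six rectangles are created but only five labels are drawn, and each label sits at the center of its own segment because its position comes from the rectangle itself rather than from a running counter.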

For this case, the above answers work perfectly. The issue I had, and for which I didn't find a plug-and-play solution online, was that I often have to plot stacked bars in multi-subplot figures with many values, which tend to have very non-homogeneous amplitudes.
(Note: I usually work with pandas dataframes and matplotlib. I couldn't get matplotlib's bar_label() method to work every time.)
So, I just give a kind of ad-hoc, but easily generalizable, solution. In this example, I was working with single-row dataframes (for monitoring power exchange per hour), so my dataframe (df) had just one row.
(I provide an example figure to show how this can be useful in very densely packed plots.)
[enter image description here][1]
[1]: https://i.stack.imgur.com/9akd8.png
import matplotlib.pyplot as plt
import numpy as np

'''
This implementation produces a stacked, horizontal bar plot.
df --> pandas dataframe. Columns are used as the iterator, and only the first value of each column is used.
waterfall --> bool: if True, apart from the stack direction, a perpendicular offset is also added.
cyclic_offset_x --> list (of any length) or None: loop through these values to use as x-offset pixels.
cyclic_offset_y --> list (of any length) or None: loop through these values to use as y-offset pixels.
ax --> matplotlib Axes, or None: if None, creates a new axis and figure.
'''
def magic_stacked_bar(df, waterfall=False, cyclic_offset_x=None, cyclic_offset_y=None, ax=None):
    if cyclic_offset_x is None:
        cyclic_offset_x = [0, 0]
    if cyclic_offset_y is None:
        cyclic_offset_y = [0, 0]
    ax0 = ax
    if ax is None:
        fig, ax = plt.subplots()
        fig.set_size_inches(19, 10)
    cycler = 0
    prev = 0  # summation variable to make it stacked
    for c in df.columns:
        if waterfall:
            y = c  # bidirectional stack
            label = ""
        else:
            y = 0  # unidirectional stack
            label = c
        ax.barh(y=y, width=df[c].values[0], height=1, left=prev, label=label)
        prev += df[c].values[0]  # add to sum-stack
        # cycle through the offset lists
        offset_x = cyclic_offset_x[cycler % len(cyclic_offset_x)]
        offset_y = cyclic_offset_y[cycler % len(cyclic_offset_y)]
        # annotate at the horizontal center of the segment that was just drawn
        ax.annotate(text="{}".format(int(df[c].values[0])), xy=(prev - df[c].values[0] / 2, y),
                    xytext=(offset_x, offset_y), textcoords='offset pixels',
                    ha='center', va='top', fontsize=8,
                    arrowprops=dict(facecolor='black', shrink=0.01, width=0.3, headwidth=0.3),
                    bbox=dict(boxstyle='round', facecolor='grey', alpha=0.5))
        cycler += 1
    if not waterfall:
        ax.legend()  # if waterfall is True, the index annotates the columns;
                     # if waterfall is False, the legend annotates the columns
    if ax0 is None:
        ax.set_title("Voi la")
        ax.set_xlabel("UltraWatts")
        plt.show()
    else:
        return ax
'''
Sometimes it is more tedious and requires some custom functions to make the labels look alright.
'''
A, B = 80, 80
n_units = df.shape[1]
cyclic_offset_x = -A * np.cos(2 * np.pi / (2 * n_units) * np.arange(n_units))
cyclic_offset_y = B * np.sin(2 * np.pi / (2 * n_units) * np.arange(n_units)) + B / 2
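As a rough sketch of what these offsets look like, assume a hypothetical dataframe with 4 columns (n_units = 4 below is made up). The angles run from 0 up to, but not including, pi, so the (x, y) pixel offsets trace the upper half of an ellipse centered at (0, B/2). Inside magic_stacked_bar the offset lists are indexed modulo their length, so they simply repeat when there are more columns than offsets:

```python
import numpy as np

A, B = 80, 80  # semi-axes of the ellipse, in pixels (values from the answer)
n_units = 4    # hypothetical number of dataframe columns

angles = 2 * np.pi / (2 * n_units) * np.arange(n_units)  # 0, pi/4, pi/2, 3*pi/4
cyclic_offset_x = -A * np.cos(angles)
cyclic_offset_y = B * np.sin(angles) + B / 2

# more labels than offsets, to show the wrap-around
for k in range(6):
    i = k % n_units
    print(k, round(float(cyclic_offset_x[i]), 1), round(float(cyclic_offset_y[i]), 1))
```

The resulting arrays would presumably then be passed along as magic_stacked_bar(df, cyclic_offset_x=cyclic_offset_x, cyclic_offset_y=cyclic_offset_y), so that neighbouring labels fan out instead of overlapping.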
