I'm trying to make a grouped bar plot in matplotlib, following the example in the gallery. I use the following:
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(7,7), dpi=300)
xticks = [0.1, 1.1]
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
num_items = len(group_labels)
ind = np.arange(num_items)
width = 0.1
s = plt.subplot(1,1,1)
for num, vals in enumerate(groups):
    print("plotting: ", vals)
    group_len = len(vals)
    gene_rects = plt.bar(ind, vals, width,
                         align="center")
    ind = ind + width
num_groups = len(group_labels)
# Make label centered with respect to group of bars
# Is there a less complicated way?
offset = (num_groups / 2.) * width
xticks = np.arange(num_groups) + offset
s.set_xticks(xticks)
print("xticks: ", xticks)
plt.xlim([0 - width, max(xticks) + (num_groups * width)])
s.set_xticklabels(group_labels)
My questions are:
How can I control the space between the groups of bars? Right now the spacing is huge and it looks silly. Note that I do not want to make the bars wider - I want them to have the same width, but be closer together.
How can I get the labels to be centered below the groups of bars? I tried to come up with some arithmetic calculations to position the xlabels in the right place (see code above) but it's still slightly off... it feels a bit like writing a plotting library rather than using one. How can this be fixed? (Is there a wrapper or built in utility for matplotlib where this is default behavior?)
EDIT: Reply to @mlgill: thank you for your answer. Your code is certainly much more elegant, but it still has the same issue: the width of the bars and the spacing between the groups are not controlled separately. Your graph looks correct, but the bars are far too wide (it looks like an Excel graph), and I wanted to make the bars thinner.
Width and margin are now linked, so if I try:
margin = 0.60
width = (1.-2.*margin)/num_items
It makes the bars skinnier, but pushes the groups far apart, so the plot again does not look right.
How can I make a grouped bar plot function that takes two parameters: the width of each bar, and the spacing between the bar groups, and plots it correctly like your code did, i.e. with the x-axis labels centered below the groups?
I think that since the user has to compute specific low-level layout quantities like margin and width, we are still basically writing a plotting library :)
Actually I think this problem is best solved by adjusting figsize and width; here is my output with figsize=(2,7) and width=0.3:
By the way, this type of thing becomes a lot simpler if you use the pandas plotting wrappers (I've also imported seaborn; it's not necessary for the solution, but in my opinion it makes the plot a lot prettier and more modern-looking):
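import matplotlib.pyplot as plt
import pandas as pd
import seaborn
seaborn.set()
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
df = pd.DataFrame(groups, index=group_labels)
# width is the combined width of all bars in one group (pandas default: 0.5)
df.plot(kind='bar', legend=False, width=0.8, figsize=(2,5))
plt.show()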
The trick to both of your questions is understanding that, when the groups are placed at integer positions (ind = np.arange(...)), each group (G1, G2) gets a total width of "1.0" on the x-axis, counting the margins on either side. Thus, it's probably easiest to set the margins first and then calculate the width of each bar from how many bars there are per group. In your case, there are two bars per group.
Assuming you left-align each bar instead of centering it as you had done (since Matplotlib 2.0, bar() centers bars by default, so pass align='edge'), this setup results in groups that span 0.0 to 1.0, 1.0 to 2.0, and so forth on the x-axis. Thus, the exact center of each group, which is where you want your labels to appear, is at 0.5, 1.5, etc.
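A quick numeric check of that arithmetic, using the answer's margin of 0.05 and two bars per group (plain NumPy, no plotting; the variable names are only illustrative):

```python
import numpy as np

margin = 0.05
n_bars = 2                                 # bars per group (two series here)
width = (1.0 - 2.0 * margin) / n_bars      # 0.45 for margin = 0.05
ind = np.arange(2)                         # group i occupies [i, i + 1]

# Left edges of the bars of series `num`, assuming align='edge'
lefts = [ind + margin + num * width for num in range(n_bars)]
group_left = lefts[0]                      # left edge of the first bar in each group
group_right = lefts[-1] + width            # right edge of the last bar in each group
centres = (group_left + group_right) / 2   # where the tick labels belong
print(centres)
```

The bars fill [0.05, 0.95] and [1.05, 1.95], so the group centres land on 0.5 and 1.5 regardless of the margin you pick.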
I've cleaned up your code as there were a lot of extraneous variables. See comments within.
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(7,7), dpi=300)
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
num_items = len(group_labels)
# This needs to be a numpy range for xdata calculations
# to work.
ind = np.arange(num_items)
# Bars at integer positions leave a total width of "1.0" per group.
# Thus, you should make the sum of the two margins
# plus the sum of the width for each entry equal 1.0.
# One way of doing that is shown below. You can make
# the margins smaller if they're still too big.
margin = 0.05
width = (1.-2.*margin)/num_items
s = plt.subplot(1,1,1)
for num, vals in enumerate(groups):
    print("plotting: ", vals)
    # The position of the xdata must be calculated for each of the two data series
    xdata = ind+margin+(num*width)
    # align='edge' left-aligns the bars, which is what this method of calculating
    # positions assumes (Matplotlib >= 2.0 centers bars by default)
    gene_rects = plt.bar(xdata, vals, width, align='edge')
# You should no longer need to manually set the plot limit since everything
# is scaled to one.
# Also the ticks should be much simpler now that each group of bars extends from
# 0.0 to 1.0, 1.0 to 2.0, and so forth and, thus, are centered at 0.5, 1.5, etc.
s.set_xticks(ind+0.5)
s.set_xticklabels(group_labels)
I read an answer that Paul Ivanov posted on Nabble that might solve this problem with less complexity. Just set the index as below. This will increase the spacing between grouped columns.
ind = np.arange(0,12,2)
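Generalizing that idea, a small helper can take the bar width and the gap between groups as two independent parameters, which is what the question's edit asks for. This is a sketch: the function name grouped_bar and its parameters are hypothetical, and it assumes the question's data layout, where groups[i] holds one value per group label for series i.

```python
import numpy as np
import matplotlib.pyplot as plt


def grouped_bar(ax, groups, group_labels, bar_width=0.1, group_gap=0.1):
    """Draw grouped bars with independent bar width and gap between groups.

    groups[i][j] is the value of series i in group j (the question's layout).
    Returns the x positions of the group centres, where the labels go.
    """
    data = np.asarray(groups, dtype=float)      # shape (n_series, n_groups)
    n_series, n_groups = data.shape
    group_span = n_series * bar_width           # width of one group of bars
    starts = np.arange(n_groups) * (group_span + group_gap)  # left edge of each group
    for i, vals in enumerate(data):
        # Series i sits in slot i of every group; align='edge' means x is the left edge
        ax.bar(starts + i * bar_width, vals, bar_width, align='edge')
    centres = starts + group_span / 2
    ax.set_xticks(centres)
    ax.set_xticklabels(group_labels)
    return centres


fig, ax = plt.subplots(figsize=(3, 4))
centres = grouped_bar(ax, [[1.04, 0.96], [1.69, 4.02]], ["G1", "G2"],
                      bar_width=0.1, group_gap=0.1)
```

Because the x-axis autoscales to the bars, what you control visually is the ratio of group_gap to bar_width; the figure size then sets how thick the bars end up on screen.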
I want to create a sort of stacked bar chart (I don't know the proper name). I hand-drew the graph for the years 2016 and 2017 and attached it here.
The code to create the df is below:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = [[2016.0, 0.4862, 0.4115, 0.3905, 0.3483, 0.1196],
[2017.0, 0.4471, 0.4096, 0.3725, 0.2866, 0.1387],
[2018.0, 0.4748, 0.4016, 0.3381, 0.2905, 0.2012],
[2019.0, 0.4705, 0.4247, 0.3857, 0.3333, 0.2457],
[2020.0, 0.4755, 0.4196, 0.3971, 0.3825, 0.2965]]
cols = ['attribute_time', '100-81 percentile', '80-61 percentile', '60-41 percentile', '40-21 percentile', '20-0 percentile']
df = pd.DataFrame(data, columns=cols)
#set seaborn plotting aesthetics
sns.set(style='white')
#create stacked bar chart
df.set_index('attribute_time').plot(kind='bar', stacked=True)
The data doesn't need to be stacked. The code above creates a stacked bar chart, but that's not exactly what needs to be displayed. Each percentile should instead appear as a labeled horizontal line at its value, above the corresponding year on the x axis. Does anyone have recommendations on how to achieve this? Is it some kind of modified stacked bar chart?
My approach to this is to represent the data as a categorical scatter plot (stripplot in Seaborn) using horizontal lines rather than points as markers. You'll have to make some choices about exactly how and where you want to plot things, but this should get you started!
I first modified the data a little bit:
df['attribute_time'] = df['attribute_time'].astype('int') # Just to get rid of the decimals.
df = df.melt(id_vars = ['attribute_time'],
value_name = 'pct_value',
var_name = 'pct_range')
Melting the DataFrame turns the wide data into long data, so the columns are now attribute_time (the year), pct_range, and pct_value, with one row per data point.
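To see the reshaping concretely, here is the same melt applied to the first two years only (a trimmed copy of the question's data, chosen just to keep the output short):

```python
import pandas as pd

data = [[2016.0, 0.4862, 0.4115, 0.3905, 0.3483, 0.1196],
        [2017.0, 0.4471, 0.4096, 0.3725, 0.2866, 0.1387]]
cols = ['attribute_time', '100-81 percentile', '80-61 percentile',
        '60-41 percentile', '40-21 percentile', '20-0 percentile']
wide_df = pd.DataFrame(data, columns=cols)
wide_df['attribute_time'] = wide_df['attribute_time'].astype('int')

# One row per (year, percentile range) pair: 2 years x 5 ranges = 10 rows
long_df = wide_df.melt(id_vars=['attribute_time'],
                       value_name='pct_value',
                       var_name='pct_range')
print(long_df.head())
```

melt keeps the id column first, then the new variable column, then the value column, and it lists all years for the first range before moving on to the next range.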
Next is the plotting:
fig, ax = plt.subplots()
sns.stripplot(data = df,
x = 'attribute_time',
y = 'pct_value',
hue = 'pct_range',
jitter = False,
marker = '_',
s = 40,
linewidth = 3,
ax = ax)
Instead of labeling each point with the range it belongs to, I thought it would be a lot cleaner to distinguish the ranges by color.
Jitter is used when a category has many points that might overlap; it spreads them out horizontally so they don't touch. We don't need that here, so I turned jitter off. The marker '_' is Matplotlib's horizontal-line (hline) marker.
The s parameter is the horizontal width of each line, and the linewidth is the thickness, so you can play around with those a bit to see what works best for you.
Text is added to the figure using the ax.text method as follows:
for year, value in zip(df['attribute_time'], df['pct_value']):
    ax.text(year - 2016,
            value,
            str(value),
            ha = 'center',
            va = 'bottom',
            fontsize = 'small')
The categorical x-axis places the years at positions 0, 1, 2, ... even though the tick labels display the years, so the x position of the text is the year shifted left by the minimum year (2016). The y position equals the value, and the text itself is a string representation of the value. The text is horizontally centered on the line and sits just above it because its vertical anchor is at the bottom.
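A variant of the same idea that avoids hardcoding 2016: look each year's position up from the sorted list of years. This assumes the categories are drawn in sorted order, which is seaborn's default for numeric categories. To keep the sketch dependency-light it draws the horizontal-line markers with plain Matplotlib scatter instead of seaborn's stripplot; that swap is an assumption of this sketch, not part of the answer above.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = [[2016, 0.4862, 0.4115, 0.3905, 0.3483, 0.1196],
        [2017, 0.4471, 0.4096, 0.3725, 0.2866, 0.1387],
        [2018, 0.4748, 0.4016, 0.3381, 0.2905, 0.2012]]
cols = ['attribute_time', '100-81 percentile', '80-61 percentile',
        '60-41 percentile', '40-21 percentile', '20-0 percentile']
df = pd.DataFrame(data, columns=cols).melt(id_vars=['attribute_time'],
                                           value_name='pct_value',
                                           var_name='pct_range')

# Categorical slot of each year: 0, 1, 2, ... in sorted order
years = sorted(df['attribute_time'].unique())
slot = {year: i for i, year in enumerate(years)}

fig, ax = plt.subplots()
for pct_range, sub in df.groupby('pct_range', sort=False):
    # marker '_' is a horizontal line; s is the marker area in points**2
    ax.scatter(sub['attribute_time'].map(slot), sub['pct_value'],
               marker='_', s=40 ** 2, linewidths=3, label=pct_range)

for year, value in zip(df['attribute_time'], df['pct_value']):
    # Look the x position up instead of computing "year - 2016"
    ax.text(slot[year], value, f'{value:.4f}',
            ha='center', va='bottom', fontsize='small')

ax.set_xticks(range(len(years)))
ax.set_xticklabels([str(year) for year in years])
ax.legend(fontsize='small')
```

Call plt.show() to display it. The lookup keeps working if the first year changes or if years are missing from the data.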
There's obviously a lot you can tweak to make it look how you want with sizing and labeling and stuff, but hopefully this is at least a good start!
I'm trying to set up an inverted-axis bar chart such that smaller numbers have bigger bars, and those bars start from the top of the bar chart. Ideally, my y-axis would run from 10^-10 at the bottom to 10^-2 at the top, and would look similar to this Excel plot:
In presenting this data, getting to a lower number is better, so I was hoping to represent this with bigger bars, rather than the absence of bars.
Inverting the y-axis limits makes the bars start from the top, but it does not solve my problem, since the smaller bars are still associated with the smaller numbers. Is there some way to move the origin, and specify that bars should be drawn from the origin to the appropriate tick on the axis?
The data and code are really not so important here, but here is an excerpt:
import matplotlib.pyplot as plt
plt.rcParams['xtick.bottom'] = plt.rcParams['xtick.labelbottom'] = False
plt.rcParams['xtick.top'] = plt.rcParams['xtick.labeltop'] = True
barVals = [ 10**(-x) for x in range(10) ]
ticks = [x for x in range(10)]
plt.bar(ticks, barVals)
plt.yscale('log')
plt.ylim([1e-2, 1e-10])
#plt.axes().spines['bottom'].set_position(('data', 0))
plt.show()
The resultant plot has bigger bars for bigger numbers and smaller bars for smaller numbers. I could instead plot the difference between each value and the maximum, but I was hoping there was some built-in way to do this in matplotlib/pyplot.
In MATLAB, the functionality I am looking for is setting the BaseValue of the bar series:
b = bar(ticks, barValues);
b(1).BaseValue = 1e0;
Given the way normal log-scaling of an axis works, I think your best bet is to scale the data manually (plot the base-10 logarithm of each value on a linear axis) and adjust the labels to match. The following is a simple example to get you started, using the OO API:
import numpy as np
import matplotlib.pyplot as plt
data = 10.0**np.arange(-2, -7, -1)
plot_data = np.log10(data)
fig, ax = plt.subplots()
ax.bar(np.arange(data.size) + 1, plot_data)
You can set the ticks manually, but I would recommend using a Formatter:
from matplotlib.ticker import StrMethodFormatter
...
ax.yaxis.set_major_formatter(StrMethodFormatter('$10^{{{x}}}$'))
This particular Formatter accepts a template string suitable for str.format and interpolates it with the tick value bound to the name x. If you only wanted to display the integer portion of the exponent, you could initialize it as
StrMethodFormatter('$10^{{{x:.0f}}}$')
The $...$ delimiters tell Matplotlib to render the string as mathtext (its LaTeX-like math syntax), and {{ and }} are str.format escapes that produce literal curly braces, which group the entire exponent as a single superscript.
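Formatters are plain callables, so you can check the generated label strings directly. A small sketch (a positive sample value is used only to keep the printed strings unambiguous):

```python
from matplotlib.ticker import StrMethodFormatter

full = StrMethodFormatter('$10^{{{x}}}$')
integer_only = StrMethodFormatter('$10^{{{x:.0f}}}$')

# A formatter is called with the tick value x (and optionally its position)
print(full(2.0))          # $10^{2.0}$
print(integer_only(2.0))  # $10^{2}$
```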
To adjust the limits of your chart:
ax.set_ylim([plot_data.min() - 0.5, -1])
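Putting the answer's pieces together, a complete sketch: it uses the integer-exponent formatter, and the final ylim puts the top of the axes (-1) below the bars' base at 0, so every bar appears to hang from the top edge and smaller values get longer bars.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import StrMethodFormatter

data = 10.0 ** np.arange(-2, -7, -1)   # 1e-2, 1e-3, ..., 1e-6
plot_data = np.log10(data)              # exponents: -2, -3, ..., -6

fig, ax = plt.subplots()
# Bars grow downward from 0 to each (negative) exponent,
# so smaller original values get longer bars
bars = ax.bar(np.arange(data.size) + 1, plot_data)

# Show the linear exponent axis as powers of ten
ax.yaxis.set_major_formatter(StrMethodFormatter('$10^{{{x:.0f}}}$'))

# The top limit (-1) lies below the bars' base (0), so bars start at the top edge
ax.set_ylim([plot_data.min() - 0.5, -1])
```

Call plt.show() to display the figure.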
I am currently producing a figure for a paper, which looks like this:
The above is pretty close to how I want it to look, but I have a strong feeling that I'm not doing this the "right way", since it was really fiddly to produce, and my code is full of all sorts of magic numbers where I fine-tuned the positioning by hand. Thus my question is, what is the right way to produce a plot like this?
Here are the important features of this plot that made it hard to produce:
The aspect ratios of the three subplots are fixed by the data, but the images are not all at the same resolution.
I wanted all three plots to take up the full height of the figure
I wanted (a) and (b) to be close together since they share their y axis, while (c) is further away
Ideally, I would like the top of the top colour bar to exactly match the top of the three images, and similarly with the bottom of the lower colour bar. (In fact they aren't quite aligned, because I did this by guessing numbers and re-compiling the image.)
In producing this figure, I first tried using GridSpec, but I wasn't able to control the relative spacing between the three main subplots. I then tried ImageGrid, which is part of the AxesGrid toolkit, but the differing resolutions between the three images caused that to behave strangely. Delving deeper into AxesGrid, I was able to position the three main subplots using the append_axes function, but I still had to position the three colourbars by hand. (I created the colourbars manually.)
I'd rather not post my existing code, because it's a horrible collection of hacks and magic numbers. Rather, my question is: is there any way in Matplotlib to just specify the logical layout of the figure (i.e. the content of the bullet points above) and have the layout calculated for me automatically?
Here is a possible solution. You'd start with the figure width (which makes sense when preparing a paper) and calculate your way through, using the aspects of the figures, some arbitrary spacings between the subplots and the margins. The formulas are similar to the ones I used in this answer. And the unequal aspects are taken care of by GridSpec's width_ratios argument.
You then end up with a figure height such that the subplots are equal in height.
So you cannot avoid typing in some numbers, but they are not "magic": all of them relate to accessible quantities like a fraction of the figure size or a fraction of the mean subplot size. Since the system is closed, changing any number will simply produce a different figure height without destroying the layout.
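The arithmetic in the code below can be written out explicitly. This is my reading of it rather than something stated in the answer: it relies on Matplotlib defining wspace as a fraction of the average axes width, and the symbols mirror the code's variables (s, masp as \bar m, figh as H, figw as W).

```latex
% n = len(r) columns; r_i = width_ratios (for an image column, r_i = inva_i = width/height)
% W = figw; L, R, B, T = left, right, bottom, top; w = wspace
\begin{aligned}
W\,(R - L) &= n\,s + (n-1)\,w\,s
  &&\Rightarrow\; s = \frac{W\,(R - L)}{n + (n-1)\,w} \\
\text{width}_i &= n\,s\,\frac{r_i}{\sum_j r_j}
  &&\Rightarrow\; \text{height}_i = \frac{\text{width}_i}{r_i} = s\,\frac{n}{\sum_j r_j} = s\,\bar m \\
H &= \frac{s\,\bar m}{T - B}
\end{aligned}
```

Since height_i does not depend on i, every image column gets the same height, and choosing H this way makes that height fill exactly the fraction T - B of the figure.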
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import numpy as np; np.random.seed(42)
imgs = []
shapes = [(550,200), ( 550,205), (1100,274) ]
for shape in shapes:
    imgs.append(np.random.random(shape))
# calculate inverse aspect(width/height) for all images
inva = np.array([ img.shape[1]/float(img.shape[0]) for img in imgs])
# set width of empty column used to stretch layout
emptycol = 0.02
r = np.array([inva[0],inva[1], emptycol, inva[2], 3*emptycol, emptycol])
# set a figure width in inch
figw = 8
# border, can be set independently of all other quantities
left = 0.1; right=1-left
bottom=0.1; top=1-bottom
# wspace (=average relative space between subplots)
wspace = 0.1
#calculate scale
s = figw*(right-left)/(len(r)+(len(r)-1)*wspace)
# mean aspect
masp = len(r)/np.sum(r)
#calculate figheight
figh = s*masp/float(top-bottom)
gs = gridspec.GridSpec(3,len(r), width_ratios=r)
fig = plt.figure(figsize=(figw,figh))
plt.subplots_adjust(left=left, bottom=bottom, right=right, top=top, wspace=wspace)
ax1 = plt.subplot(gs[:,0])
ax2 = plt.subplot(gs[:,1])
ax2.set_yticks([])
ax3 = plt.subplot(gs[:,3])
ax3.yaxis.tick_right()
ax3.yaxis.set_label_position("right")
cax1 = plt.subplot(gs[0,5])
cax2 = plt.subplot(gs[1,5])
cax3 = plt.subplot(gs[2,5])
im1 = ax1.imshow(imgs[0], cmap="viridis")
im2 = ax2.imshow(imgs[1], cmap="plasma")
im3 = ax3.imshow(imgs[2], cmap="RdBu")
fig.colorbar(im1, ax=ax1, cax=cax1)
fig.colorbar(im2, ax=ax2, cax=cax2)
fig.colorbar(im3, ax=ax3, cax=cax3)
ax1.set_title("image title")
ax1.set_xlabel("xlabel")
ax1.set_ylabel("ylabel")
plt.show()
I have a multi-figure Bokeh plot of vertically stacked & aligned figures. Because I want to align the plots vertically, the y-axis labels are rotated to be vertical rather than horizontal.
In certain scenarios, Bokeh produces too many ticks, so that the tick labels overlap completely, making them illegible. Here is an example:
import bokeh.plotting as bp
import numpy as np
y = np.random.uniform(0, 300, 50)
x = np.arange(len(y))
bp.output_file("/tmp/test.html", "test")
# Current Bokeh uses width/height (older releases called these plot_width/plot_height)
plot = bp.figure(width=800, height=200)
plot.yaxis.axis_label_text_font_size = "12pt"
plot.yaxis.major_label_orientation = 'vertical'
plot.line(x, y)
bp.show(plot)
Short of making the renderer clever enough to produce fewer labels automatically, is there a way to specify the number of labels to be placed on an axis?
It seems that the number of labels generated has to do with the range of the data, in terms of its affinity to a power of 10.
You can now control the number of ticks with the desired_num_ticks property. See the example in the Bokeh docs (and this issue).
For example, in your case, something like this: plot.yaxis[0].ticker.desired_num_ticks = 10.
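A minimal sketch applying that to the question's plot. It assumes a recent Bokeh, where figure() takes width/height, and that the y axis uses its default BasicTicker, which is where desired_num_ticks lives. The value is a target, so the actual number of ticks can differ slightly.

```python
import numpy as np
from bokeh.plotting import figure

y = np.random.uniform(0, 300, 50)
x = np.arange(len(y))

plot = figure(width=800, height=200)
plot.yaxis.major_label_orientation = 'vertical'
plot.line(x, y)

# Ask the default (BasicTicker) y ticker for roughly four ticks
plot.yaxis[0].ticker.desired_num_ticks = 4
```

Render it as before with output_file(...) and show(plot).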
Looks like there is still no direct way to specify this. Please follow the related issue. This is a workaround:
from bokeh.models import SingleIntervalTicker, LinearAxis
# Current Bokeh uses width/height (older releases called these plot_width/plot_height)
plot = bp.figure(width=800, height=200, x_axis_type=None)
ticker = SingleIntervalTicker(interval=5, num_minor_ticks=10)
xaxis = LinearAxis(ticker=ticker)
plot.add_layout(xaxis, 'below')
You can control the number of ticks indirectly via the interval parameter of SingleIntervalTicker, which sets the spacing between ticks.
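The same workaround can be aimed at the y axis the question is about. This sketch assumes a recent Bokeh (width/height arguments) and, as a simplification of the answer above, assigns a new ticker to the existing axis instead of building a separate LinearAxis; the grid is given the same ticker so its lines stay aligned with the labels.

```python
import numpy as np
from bokeh.models import SingleIntervalTicker
from bokeh.plotting import figure

y = np.random.uniform(0, 300, 50)
x = np.arange(len(y))

plot = figure(width=800, height=200)
plot.line(x, y)

# One major tick every 100 data units, so a 0-300 range gets about four labels
ticker = SingleIntervalTicker(interval=100, num_minor_ticks=5)
plot.yaxis[0].ticker = ticker
plot.ygrid[0].ticker = ticker
plot.yaxis[0].major_label_orientation = 'vertical'
```

A fixed interval trades adaptivity for predictability: if the data range changes by orders of magnitude, the interval has to change with it.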
Changing the vertical distance between two subplots using tight_layout(h_pad=-1) changes the total figure size. How can I keep a defined figure size when using tight_layout?
Here is the code:
import numpy as np
import matplotlib.pyplot as pl
#define figure
pl.figure(figsize=(10, 6.25))
ax1=pl.subplot(211)
img=pl.imshow(np.random.random((10,50)), interpolation='none')
ax1.set_xticklabels(()) #hides the tick labels of the first plot
pl.subplot(212)
x=np.linspace(0,50)
pl.plot(x,x,'k-')
pl.xlim( ax1.get_xlim() ) #same x-axis for both plots
And here are the results:
If I write
pl.tight_layout(h_pad=-2)
in the last line, then I get this:
As you can see, the figure is bigger...
You can use a GridSpec object to precisely control width and height ratios, as answered on this thread and documented here.
Experimenting with your code, I could produce something like what you want by using height_ratios that assign twice the space to the upper subplot, and by adjusting the h_pad parameter of the tight_layout call. This does not feel completely right, but maybe you can adjust it further ...
import numpy as np
from matplotlib.pyplot import *
import matplotlib.pyplot as pl
import matplotlib.gridspec as gridspec
#define figure
fig = pl.figure(figsize=(10, 6.25))
gs = gridspec.GridSpec(2, 1, height_ratios=[2,1])
ax1=subplot(gs[0])
img=pl.imshow(np.random.random((10,50)), interpolation='none')
ax1.set_xticklabels(()) #hides the tickslabels of the first plot
ax2=subplot(gs[1])
x=np.linspace(0,50)
ax2.plot(x,x,'k-')
xlim( ax1.get_xlim() ) #same x-axis for both plots
fig.tight_layout(h_pad=-5)
show()
There were other issues, like correcting the imports, adding numpy, and plotting to ax2 instead of directly with pl. The output I see is this:
This case is peculiar because the default aspect ratios of images and plots are not the same. So, for people looking to remove the spaces in a grid of subplots consisting only of images or only of plots, it is worth noting that you may find an appropriate solution among the answers to this question (and those linked to it): How to remove the space between subplots in matplotlib.pyplot?.
The aspect ratios of the subplots in this particular example are as follows:
# Default aspect ratio of images:
ax1.get_aspect()
# 1.0
# Which is as it is expected based on the default settings in rcParams file:
matplotlib.rcParams['image.aspect']
# 'equal'
# Default aspect ratio of plots:
ax2.get_aspect()
# 'auto'
The size of ax1 and the space beneath it are adjusted automatically based on the number of pixels along the x-axis (i.e. width) so as to preserve the 'equal' aspect ratio while fitting both subplots within the figure. As you mentioned, using fig.tight_layout(h_pad=xxx) or the similar fig.set_constrained_layout_pads(hspace=xxx) is not a good option as this makes the figure larger.
To remove the gap while preserving the original figure size, you can use fig.subplots_adjust(hspace=xxx) or the equivalent plt.subplots(gridspec_kw=dict(hspace=xxx)), as shown in the following example:
import numpy as np # v 1.19.2
import matplotlib.pyplot as plt # v 3.3.2
np.random.seed(1)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6.25),
gridspec_kw=dict(hspace=-0.206))
# For those not using plt.subplots, you can use this instead:
# fig.subplots_adjust(hspace=-0.206)
size = 50
ax1.imshow(np.random.random((10, size)))
ax1.xaxis.set_visible(False)
# Create plot of a line that is aligned with the image above
x = np.arange(0, size)
ax2.plot(x, x, 'k-')
ax2.set_xlim(ax1.get_xlim())
plt.show()
I am not aware of any way to define the appropriate hspace automatically so that the gap can be removed for any image width. As stated in the docstring for fig.subplots_adjust(), it corresponds to the height of the padding between subplots, as a fraction of the average axes height. So I attempted to compute hspace by dividing the gap between the subplots by the average height of both subplots like this:
# Extract axes positions in figure coordinates
ax1_x0, ax1_y0, ax1_x1, ax1_y1 = np.ravel(ax1.get_position())
ax2_x0, ax2_y0, ax2_x1, ax2_y1 = np.ravel(ax2.get_position())
# Compute negative hspace to close the vertical gap between subplots
ax1_h = ax1_y1-ax1_y0
ax2_h = ax2_y1-ax2_y0
avg_h = (ax1_h+ax2_h)/2
gap = ax1_y0-ax2_y1
hspace=-(gap/avg_h) # this divided by 2 also does not work
fig.subplots_adjust(hspace=hspace)
Unfortunately, this does not work. Maybe someone else has a solution for this.
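For the specific case here (a 2x1 grid, equal row heights, an image on top that is limited by the available width, and the default centred anchor), the needed hspace can be derived from how GridSpec divides the space. Each cell is T / (2 + h) tall, where T = top - bottom and h = hspace; the cells are h times one cell height apart; and the image axes of height a sits centred in its cell. Setting the remaining gap h * cell + (cell - a) / 2 to zero gives h = (2a - T) / (2T - a). This is a derivation under those assumptions, not a general solution, and the helper name hspace_closing_gap is hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt


def hspace_closing_gap(fig, img_ax):
    """hspace that makes an aspect-locked image axes in row 0 of a 2x1 grid
    touch a normal (aspect='auto') axes in row 1.

    Assumes equal row heights, the default 'C' anchor, a numeric aspect on
    img_ax, and that the image shrinks vertically (not horizontally) in its cell.
    """
    p = fig.subplotpars
    fig_w, fig_h = fig.get_size_inches()
    grid_h = p.top - p.bottom                    # T, as a figure fraction
    box_aspect = float(img_ax.get_aspect()) * img_ax.get_data_ratio()
    # a: height of the image axes once its aspect is applied (figure fraction)
    a = (p.right - p.left) * fig_w * box_aspect / fig_h
    return (2 * a - grid_h) / (2 * grid_h - a)


np.random.seed(1)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6.25))
size = 50
ax1.imshow(np.random.random((10, size)))
ax1.xaxis.set_visible(False)
x = np.arange(0, size)
ax2.plot(x, x, 'k-')
ax2.set_xlim(ax1.get_xlim())

hspace = hspace_closing_gap(fig, ax1)
fig.subplots_adjust(hspace=hspace)
```

With the default subplot parameters this evaluates to about -0.21 for this figure, close to the -0.206 found by trial above. The figure size itself is never changed.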
It is also worth mentioning that I tried removing the gap between subplots by editing the y positions like in this example:
# Extract axes positions in figure coordinates
ax1_x0, ax1_y0, ax1_x1, ax1_y1 = np.ravel(ax1.get_position())
ax2_x0, ax2_y0, ax2_x1, ax2_y1 = np.ravel(ax2.get_position())
# Set new y positions: shift ax1 down over gap
gap = ax1_y0-ax2_y1
ax1.set_position([ax1_x0, ax1_y0-gap, ax1_x1, ax1_y1-gap])
ax2.set_position([ax2_x0, ax2_y0, ax2_x1, ax2_y1])
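Unfortunately, this (and variations of this) produces seemingly unpredictable results, including a figure resizing similar to when using fig.tight_layout(). Part of the explanation is probably that Axes.set_position expects [left, bottom, width, height] rather than corner coordinates, so the x1 and y1 values passed above are interpreted as a width and a height. Maybe someone else can explain whatever else is happening behind the scenes.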
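For comparison, a sketch of the same shift written with [left, bottom, width, height]. It assumes drawing once first so that get_position() already reflects the image's 'equal' aspect; the freed space ends up above ax1 rather than being removed from the figure.

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(1)
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 6.25))
size = 50
ax1.imshow(np.random.random((10, size)))
ax1.xaxis.set_visible(False)
x = np.arange(0, size)
ax2.plot(x, x, 'k-')
ax2.set_xlim(ax1.get_xlim())

# Draw once so ax1's active position already reflects its 'equal' aspect
fig.canvas.draw()
pos1 = ax1.get_position()   # Bbox in figure coordinates
pos2 = ax2.get_position()
gap = pos1.y0 - pos2.y1

# set_position takes [left, bottom, width, height], not [x0, y0, x1, y1]
ax1.set_position([pos1.x0, pos1.y0 - gap, pos1.width, pos1.height])
```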