How to sensibly set bar width in matplotlib.axes.bar - python

When adding a bar plot, e.g.
an_axis.bar(xvalues, yvalues)
The default bar width is 0.8, but my plots have a variable number of bars & risk getting messed up with the width set manually.
Is there a good way to set the bar width programmatically?

OK - this seems to work:
minx, maxx = plt.getp(ax2, 'xbound')
ax2.bar(xvalues, yvalues,
width=(maxx-minx)/len(xvalues))
Which I was able to figure out after discovering:
plt.getp(object) # In this case ax2
Without additional parameters this gives a list of properties & their values; very useful for exploring matplotlib objects.
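An alternative to dividing the axis bounds by the number of bars is to derive the width from the spacing of the x positions themselves. The following is a minimal sketch and not part of the original answer; the helper name bar_width and its fill parameter are made up for illustration.

```python
def bar_width(xvalues, fill=0.8):
    """Return `fill` times the smallest gap between distinct x positions.

    `fill` is the fraction of the gap a bar should occupy
    (1.0 means adjacent bars touch).
    """
    xs = sorted(set(xvalues))
    if len(xs) < 2:
        # A single bar has no neighbour to measure against,
        # so fall back to a width equal to `fill`.
        return fill
    smallest_gap = min(b - a for a, b in zip(xs, xs[1:]))
    return fill * smallest_gap


xvalues = [0, 2, 4, 6]
width = bar_width(xvalues)
print(width)
```

The result can then be passed on as ax2.bar(xvalues, yvalues, width=width), so the bars keep the same relative size however many there are.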


How to center the histogram bars around tick marks using seaborn displot? Stacking bars is essential

I have searched for ways of centering histogram bars around tick marks but have not been able to find a solution that works with seaborn displot. The function displot lets me stack the histogram according to a column in the dataframe, so I would prefer a solution using displot, or something else that allows stacking based on a column in a data frame with color-coding via a palette.
Even after setting the tick values, I am not able to get the bars to center around the tick marks.
Example code
# Center the histogram on the tick marks
tips = sns.load_dataset('tips')
sns.displot(x="total_bill",
hue="day", multiple = 'stack', data=tips)
plt.xticks(np.arange(0, 50, 5))
I would also like to plot a histogram of a variable that takes a single value and choose the bin width of the resulting histogram in such a way that it is centered around the value. (0.5 in this example.)
I can get the center point by choosing the number of bins equal to the number of tick marks, but the resulting bar is very thin. How can I increase the bin size in this case, where there is only one bar but I still want to display all the other possible points? With all the tick marks displayed, the bar is very narrow.
I want the same centering of the bar at the 0.5 tick mark but make it wider as it is the only value for which counts are displayed.
Any solutions?
tips['single'] = 0.5
sns.displot(x='single',
hue="day", multiple = 'stack', data=tips, bins = 10)
plt.xticks(np.arange(0, 1, 0.1))
Edit:
Would it be possible to have more control over the tick marks in the second case? I do not want the tick labels rounded to 1 decimal place; I would rather choose which of the tick marks to display. Is it possible to display just one value as a tick mark and have the bar centered around it?
Do min_val and max_val in this case refer to the value of the variable, which would be 0 here? If so, the x-axis would extend to negative values even though there are none, and I don't want to display them.
For your first problem, you may want to figure out a few properties of the data that you're plotting, for example its range. You may also want to choose beforehand the number of bins that you want displayed.
tips = sns.load_dataset('tips')
min_val = tips.total_bill.min()
max_val = tips.total_bill.max()
val_width = max_val - min_val
n_bins = 10
bin_width = val_width/n_bins
sns.histplot(x="total_bill",
hue="day", multiple = 'stack', data=tips,
bins=n_bins, binrange=(min_val, max_val),
palette='Paired')
plt.xlim(0, 55) # Define x-axis limits
Another thing to remember is that the width of a bar in a histogram identifies the bounds of its range. So a bar spanning [2,5] on the x-axis implies that the values represented by that bar belong to that range.
Considering this, it is easy to formulate a solution. Suppose first that we want ticks at the bounds of each bar, as in the original plot. One solution may look like
plt.xticks(np.arange(min_val-bin_width, max_val+bin_width, bin_width))
Now, if we offset the ticks by half a bin-width, we will get to the centers of the bars.
plt.xticks(np.arange(min_val-bin_width/2, max_val+bin_width/2, bin_width))
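The edges and centers can also be computed directly, which sidesteps the endpoint surprises np.arange can have with non-integer steps. This is a small sketch, assuming equal-width bins between min_val and max_val; the helper name bin_edges_and_centers and the sample numbers are hypothetical.

```python
import numpy as np


def bin_edges_and_centers(min_val, max_val, n_bins):
    """Return the n_bins + 1 edges and n_bins centers of equal-width bins."""
    edges = np.linspace(min_val, max_val, n_bins + 1)
    centers = (edges[:-1] + edges[1:]) / 2  # midpoint of each pair of edges
    return edges, centers


# Illustrative values, not taken from the tips dataset
edges, centers = bin_edges_and_centers(0.0, 50.0, 10)
print(edges)
print(centers)
```

Passing centers to plt.xticks should then place one tick at the middle of each bar, while passing edges puts the ticks on the bar boundaries.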
For your single value plot, the idea remains the same: control the bin width, the x-axis range and the ticks. The bin width has to be controlled explicitly, since the automatically inferred bin width will probably be 1 unit, which may show up on the plot as a bar with almost no thickness. Histogram bars always indicate a range, even when we have just a single value. This is illustrated in the following example and figure.
from matplotlib.ticker import FormatStrFormatter # needed for the tick formatter used below
single_val = 23.5
tips['single'] = single_val
bin_width = 4
fig, axs = plt.subplots(1, 2, sharey=True, figsize=(12,4)) # Get 2 subplots
# Case 1 - With the single value as x-tick label on subplot 0
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[0])
ticks = [single_val, single_val+bin_width] # 2 ticks - given value and given_value + width
axs[0].set(
title='Given value as tick-label starts the bin on x-axis',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width)) # x-range such that bar is at middle of x-axis
axs[0].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
# Case 2 - With centering on the bin starting at single-value on subplot 1
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[1])
ticks = [single_val+bin_width/2] # Just the bin center
axs[1].set(
title='Bin centre is offset from single_value by bin_width/2',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width) ) # x-range such that bar is at middle of x-axis
axs[1].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
Output:
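As a variation on the single-value example above, the bin itself can be centered on the value by choosing a range that extends half a bin width to either side of it. The sketch below checks the idea with np.histogram only; the helper name centered_binrange is made up, and 244 is just an arbitrary number of repeated observations.

```python
import numpy as np


def centered_binrange(value, bin_width):
    """Return (low, high) for a single bin of width bin_width centered on value."""
    half = bin_width / 2
    return (value - half, value + half)


low, high = centered_binrange(0.5, 0.2)

# One bin over that range holds every observation of the single value
data = np.full(244, 0.5)
counts, edges = np.histogram(data, bins=1, range=(low, high))
center = (edges[0] + edges[1]) / 2
print(counts, edges, center)
```

The same pair could presumably be passed to sns.histplot as binrange=(low, high) together with bins=1, with xticks=[0.5] so that the only tick sits under the middle of the bar.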
I feel from your description that what you are really implying by a bar graph is a categorical bar graph. The centering is then automatic, because the bar is no longer a range but a discrete category. For the numeric and continuous nature of the variable in the example data, I would not recommend such an approach. Pandas provides for plotting categorical bar plots. See here. For our example, one way to do this is as follows:
n_colors = len(tips['day'].unique()) # Get number of unique categories
agg_df = tips[['single', 'day']].groupby(['day']).agg(
val_count=('single', 'count'),
val=('single','max')
).reset_index() # Get aggregated information along the categories
agg_df.pivot(columns='day', values='val_count', index='val').plot.bar(
stacked=True,
color=sns.color_palette("Paired", n_colors), # Choose "number of days" colors from palette
width=0.05 # Set bar width
)
plt.show()
This yields:

How do I invert the bar size in matplotlib pyplot bar charts?

I'm trying to set up an inverted axis bar chart such that smaller numbers have bigger bars, and those bars start from the top of the bar chart. Ideally, my y-axis would vary from 10e-10 on the bottom to 10e-2 on the top, and would look similar to this excel plot:
In presenting this data, getting to a lower number is better, so I was hoping to represent this with bigger bars, rather than the absence of bars.
Inverting the y-axis limits makes the bars start from the top, but it does not solve my problem, since the smaller bars are still associated with the smaller numbers. Is there some way to move the origin, and specify that bars should be drawn from the origin to the appropriate tick on the axis?
The data and code are really not so important here, but here is an excerpt:
plt.rcParams['xtick.bottom'] = plt.rcParams['xtick.labelbottom'] = False
plt.rcParams['xtick.top'] = plt.rcParams['xtick.labeltop'] = True
barVals = [ 10**(-x) for x in range(10) ]
ticks = [x for x in range(10)]
plt.bar(ticks, barVals)
plt.yscale('log')
plt.ylim([1e-2, 1e-10])
#plt.axes().spines['bottom'].set_position(('data', 0))
plt.show()
The resultant plot has bigger bars for bigger numbers and smaller bars for smaller numbers. I could instead plot the difference between each value and the maximum, but I was hoping there was some built-in way to do this in matplotlib/pyplot.
Using matlab, the functionality I am looking for is setting the axis base value:
b = bar(ticks, barValues);
b(1).BaseValue = 1e0;
Given the way a normal log-scaling of the axis works, I think your best bet is to scale the data manually, and adjust the labels to match. The following is a simple example to get you started, using the OO API:
data = 10.0**np.arange(-2, -7, -1)
plot_data = np.log10(data)
fig, ax = plt.subplots()
ax.bar(np.arange(data.size) + 1, plot_data)
You can set the ticks manually, but I would recommend using a Formatter:
from matplotlib.ticker import StrMethodFormatter
...
ax.yaxis.set_major_formatter(StrMethodFormatter('$10^{{{x}}}$'))
This particular Formatter accepts a template string suitable for str.format and interpolates it with the tick value bound to the name x. If you only wanted to display the integer portion of the exponent, you could initialize it as
StrMethodFormatter('$10^{{{x:.0f}}}$')
The symbols $...$ tell matplotlib that the string is LaTeX, and {{...}} are escaped curly braces to tell LaTeX to group the entire exponent as a superscript.
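To see what the two template strings produce, they can be run through plain str.format with a sample tick value. This only illustrates the brace escaping; the value -3.0 is an arbitrary example.

```python
# The same templates as above, applied with str.format
template_raw = '$10^{{{x}}}$'
template_int = '$10^{{{x:.0f}}}$'

label_raw = template_raw.format(x=-3.0)
label_int = template_int.format(x=-3.0)

print(label_raw)  # $10^{-3.0}$
print(label_int)  # $10^{-3}$
```

The doubled braces collapse to single literal braces, and the inner {x} or {x:.0f} is replaced by the tick value.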
To adjust the limits of your chart:
ax.set_ylim([plot_data.min() - 0.5, -1])
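Another option, closer in spirit to MATLAB's BaseValue, is to keep the log axis and draw each bar from its value up to a chosen base using the bottom argument of bar(). This is a sketch of a different technique from the one described above; base and the sample values are assumptions chosen for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

values = 10.0 ** -np.arange(2, 11)  # sample data: 1e-2 down to 1e-10
base = 1e-1                          # assumed base value at the top of the axis
x = np.arange(values.size)

fig, ax = plt.subplots()
# Each bar starts at its value (bottom) and extends up to `base`,
# so smaller values give longer bars hanging from the top.
bars = ax.bar(x, base - values, bottom=values)
ax.set_yscale("log")
ax.set_ylim(1e-11, base)
```

With this layout the y-axis labels show the real values, so no custom formatter is needed; call plt.show() or fig.savefig(...) to view the result.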

Highlight a label in a legend, matplotlib

As of now I am using Matplotlib to generate plots.
The legend on the plot can be tweaked using some parameters (as mentioned in this guide). But I would like to have something specific in the legend, as attached in this image below.
I would like to highlight one of the labels in the legend like shown (as of now done using MS paint).
If there are other ways of highlighting a specific label, that would also suffice.
The answer by FLab is actually quite reasonable given how painful it can be to backtrace the coordinates of the plotted items. However, the demands of publication-grade figures are quite often unreasonable, and seeing matplotlib challenged by MS Paint is good enough motivation for answering this.
Let's consider this example from the matplotlib gallery as a starting point:
N = 100
x = np.arange(N)
fig = plt.figure()
ax = fig.add_subplot(111)
xx = x - (N/2.0)
plt.plot(xx, (xx*xx)-1225, label='$y=x^2$')
plt.plot(xx, 25*xx, label='$y=25x$')
plt.plot(xx, -25*xx, label='$y=-25x$')
legend = plt.legend()
plt.show()
Once an image has been drawn, we can backtrace the elements in the legend instance to find out their coordinates. There are two difficulties associated with this:
The coordinates we'll get through the get_window_extent method are in pixels, not "data" coordinates, so we'll need to use a transform function. A great overview of the transforms is given here.
Finding a proper boundary is tricky. The legend instance above has two useful attributes, legend.legendHandles and legend.texts - two lists with a list of line artists and text labels respectively. One would need to get a bounding box for both elements, while keeping in mind that the implementation might not be perfect and is backend-specific (c.f. this SO question). This is a proper way to do this, but it's not the one in this answer, because...
.. because luckily in your case the legend items seem to be uniformly separated, so we could just get the legend box, split it into a number of rectangles equal to the number of rows in your legend, and draw one of the rectangles on-screen. Below we'll define two functions, one to get the data coordinates of the legend box, and another one to split them into chunks and draw a rectangle according to an index:
from matplotlib.patches import Rectangle
def get_legend_box_coord(ax, legend):
    """ Returns coordinates of the legend box """
    disp2data = ax.transData.inverted().transform
    box = legend.legendPatch
    # taken from here:
    # https://stackoverflow.com/a/28728709/4118756
    box_pixcoords = box.get_window_extent(ax)
    box_xycoords = [disp2data(box_pixcoords.p0), disp2data(box_pixcoords.p1)]
    box_xx, box_yy = np.array(box_xycoords).T
    return box_xx, box_yy
def draw_sublegend_box(ax, legend, idx):
    # note: legend.legendHandles was renamed legend.legend_handles in Matplotlib 3.7
    nitems = len(legend.legendHandles)
    xx, yy = get_legend_box_coord(ax, legend)
    # assuming equal spacing between legend items:
    y_divisors = np.linspace(*yy, num=nitems+1)
    height = y_divisors[idx]-y_divisors[idx+1]
    width = xx[1] - xx[0] # scalar width of the legend box
    lower_left_xy = [xx[0], y_divisors[idx+1]]
    legend_box = Rectangle(
        xy = lower_left_xy,
        width = width,
        height = height,
        fill = False,
        zorder = 10)
    ax.add_patch(legend_box)
Now, calling draw_sublegend_box(ax, legend, 1) produces the following plot:
Note that annotating the legend in such a way is only possible once the figure has been drawn.
In order to highlight a specific label, you could have it in bold.
Here's the link to another SO answer that suggests how to use LaTeX to format entries of a legend:
Styling part of label in legend in matplotlib
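A simple way to make one label stand out is to restyle its text object after the legend has been created. The sketch below is an assumption-laden illustration rather than either answer's method: it uses plain-text labels (the label names and the choice of index 1 are made up), since labels written in $...$ math mode may not pick up the font weight.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-50, 50, 100)

fig, ax = plt.subplots()
ax.plot(x, x * x - 1225, label='parabola')
ax.plot(x, 25 * x, label='rising line')
ax.plot(x, -25 * x, label='falling line')
legend = ax.legend()

# Legend.get_texts() returns the label Text objects in legend order
highlight = legend.get_texts()[1]
highlight.set_fontweight('bold')
highlight.set_color('red')
```

Because this only changes text properties, it does not depend on pixel coordinates and works before the figure is drawn.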

How do I plot more than one set of bars per axis on a bar plot in python?

I currently use the align='edge' parameter and positive/negative widths in pyplot.bar() to plot the bar data of one metric to each axis. However, if I try to plot a second set of data to one axis, it covers the first set. Is there a way for pyplot to automatically space this data correctly?
lns3 = ax[1].bar(bucket_df.index,bucket_df.original_revenue,color='c',width=-0.4,align='edge')
lns4 = ax[1].bar(bucket_df.index,bucket_df.revenue_lift,color='m',bottom=bucket_df.original_revenue,width=-0.4,align='edge')
lns5 = ax3.bar(bucket_df.index,bucket_df.perc_first_priced,color='grey',width=0.4,align='edge')
lns6 = ax3.bar(bucket_df.index,bucket_df.perc_revenue_lift,color='y',width=0.4,align='edge')
This is what it looks like when I show the plot:
The data shown in yellow completely covers the data in grey. I'd like it to be shown next to the grey data.
Is there any easy way to do this? Thanks!
The first argument to the bar() plotting method is an array of the x-coordinates for your bars. Since you pass the same x-coordinates they will all overlap. You can get what you want by staggering the bars by doing something like this:
x = np.arange(10) # define your x-coordinates
width = 0.1 # set a width for your bars
offset = 0.15 # define an offset to separate each set of bars
fig, ax = plt.subplots() # define your figure and axes objects
ax.bar(x, y1, width=width) # plot the first set of bars (y1 is your first data series)
ax.bar(x + offset, y2, width=width) # plot the second set of bars (y2 is your second data series)
Since you have a few sets of data to plot, it makes more sense to make the code a bit more concise (assume y_vals is a list containing the y-coordinates you'd like to plot, bucket_df.original_revenue, bucket_df.revenue_lift, etc.). Then your plotting code could look like this:
for i, y in enumerate(y_vals):
    ax.bar(x + i * offset, y, width=width) # shift each set by one more offset
If you want to plot more sets of bars you can decrease the width and offset accordingly.
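The width and offsets can also be derived from the number of data sets, so that each group of bars stays centered on its x-coordinate. This is a minimal sketch; the helper name group_offsets and the group_width parameter are hypothetical.

```python
def group_offsets(n_sets, group_width=0.8):
    """Return (bar_width, offsets) for n_sets side-by-side bars per group.

    The whole group spans group_width and is centered on the x-coordinate.
    """
    bar_width = group_width / n_sets
    offsets = [(i - (n_sets - 1) / 2) * bar_width for i in range(n_sets)]
    return bar_width, offsets


bar_width, offsets = group_offsets(2)
print(bar_width, offsets)
```

Each set would then be drawn with something like ax.bar(x + offsets[i], y, width=bar_width) inside the loop over y_vals.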

How to remove gaps between bars in Matplotlib bar chart

I'm making a bar chart in Matplotlib with a call like this:
xs.bar(bar_lefts, bar_heights, facecolor='black', edgecolor='black')
I get a barchart that looks like this:
What I'd like is one with no white gap between consecutive bars, e.g. more like this:
Is there a way to achieve this in Matplotlib using the bar() function?
Add width=1.0 as a keyword argument to bar(). E.g.
xs.bar(bar_lefts, bar_heights, width=1.0, facecolor='black', edgecolor='black')
This closes the gaps between neighbouring bars, provided the bars are spaced one unit apart.
It has been 8 years since this question was asked, and the matplotlib API now has built-in ways to produce gapless, step-shaped plots: pyplot.step() and pyplot.stairs(), the latter accepting the argument fill=True to fill the area under the steps.
See the docs for a fuller comparison, but the primary difference is that step() defines the step positions with N x and N y values just like plot() would, while stairs() defines the step positions with N heights and N+1 edges, like what hist() returns. It is a subtle difference, and I think both tools can create the same outputs.
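A short sketch of the stairs() variant, assuming Matplotlib 3.4 or newer (where stairs() was introduced) and made-up sample heights:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

heights = np.array([3, 7, 5, 2])      # N bar heights (sample data)
edges = np.arange(heights.size + 1)   # N + 1 edges, one unit apart

fig, ax = plt.subplots()
# fill=True fills the area under the steps, giving gapless "bars"
patch = ax.stairs(heights, edges, fill=True, color='black')
```

Since neighbouring steps share an edge, there is no width to tune and no gap can appear between them.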
Just set the width to 1 over the number of bars (this closes the gaps when the bar positions are spaced 1/N apart), so:
width = 1 / len(bar_lefts)
xs.bar(bar_lefts, bar_heights, width=width, color='black')
You can set the width equal to the distance between two bars:
width = bar_lefts[-1] - bar_lefts[-2]
xs.bar(bar_lefts, bar_heights, width=width)
