I am generating horizontal bar graphs with matplotlib, following the
SO question "How to plot multiple horizontal bars in one chart with matplotlib". The problem is that when I have more than two horizontal bars per group, the bars overlap. I'm not sure what I am doing wrong.
Here is the code for the graph:
import pandas
import matplotlib.pyplot as plt
import numpy as np
df = pandas.DataFrame(dict(graph=['q1', 'q2', 'q3', 'q4', 'q5', 'q6'],
                           n=[3, 5, 2, 3, 5, 2], m=[6, 1, 3, 6, 1, 3]))
ind = np.arange(len(df))
width = 0.4
opacity = 0.4
fig, ax = plt.subplots()
ax.barh(ind, df.n, width, alpha=opacity, color='r', label='Existing')
ax.barh(ind + width, df.m, width, alpha=opacity,color='b', label='Community')
ax.barh(ind + 2* width, df.m, width, alpha=opacity,color='g', label='Robust')
ax.set(yticks=ind + width , yticklabels=df.graph, ylim=[2*width - 1, len(df)])
ax.legend()
#plt.xlabel('Queries')
plt.xlabel('Precision')
plt.title('Precision for these queries')
plt.show()
Currently, the graph looks like this
You set the width of the bars to 0.4, but you have three bars in each group. That means the width of each group is 1.2. But you set the ticks only 1 unit apart, so your bars don't fit into the spaces.
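The arithmetic above can be checked with a small helper. This is a minimal sketch and not part of the original answer: the name bar_offsets and the group_width parameter are made up for illustration. It splits a chosen group width (which has to stay at or below the tick spacing of 1) evenly among the bars and returns each bar's offset from its tick.

```python
def bar_offsets(n_bars, group_width=0.8):
    """Split group_width evenly among n_bars bars centred on a tick.

    Hypothetical helper: returns (width, offsets), where offsets are the
    bar centres relative to the tick position of the group.
    """
    if not 0 < group_width <= 1:
        raise ValueError("group_width must be in (0, 1] when ticks are 1 unit apart")
    width = group_width / n_bars
    # Offsets are symmetric around 0, so the tick sits under the group's centre.
    offsets = [(k - (n_bars - 1) / 2) * width for k in range(n_bars)]
    return width, offsets


width, offsets = bar_offsets(3)
# Three bars now take up 3 * width = 0.8 units, which fits between ticks.
print(width, offsets)
```

With these values, series k can be drawn with ax.barh(ind + offsets[k], values, width) and the tick labels placed at ind, since barh centres each bar on the given position by default.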
Since you are using pandas, you don't really need to do all that. Just do df.plot(kind='barh') and you will get a horizontal bar chart of the dataframe data. You can tweak the display colors, etc., by using various parameters to plot that you can find in the documentation. (If you want the "graph" column to be used as y-axis labels, set it as the index: df.set_index('graph').plot(kind='barh'))
(Using df.plot will give a barplot with only two bars per group, since your DataFrame has only two numeric columns. In your example, you plotted column m twice, which doesn't seem very useful. If you really want to do that, you could add a duplicate column into the DataFrame.)
ax=df.plot.barh(width=0.85,figsize=(15,15))
Adjust width (it should be less than 1, otherwise the bars overlap) and figsize to get a clear view of the bars. A bigger figure gives the bars more room, which makes them easier to read.
Related
I'm trying to make a grouped bar plot in matplotlib, following the example in the gallery. I use the following:
import matplotlib.pyplot as plt
from numpy import arange
plt.figure(figsize=(7,7), dpi=300)
xticks = [0.1, 1.1]
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
num_items = len(group_labels)
ind = arange(num_items)
width = 0.1
s = plt.subplot(1,1,1)
for num, vals in enumerate(groups):
    print("plotting: ", vals)
    group_len = len(vals)
    gene_rects = plt.bar(ind, vals, width,
                         align="center")
    ind = ind + width
num_groups = len(group_labels)
# Make label centered with respect to group of bars
# Is there a less complicated way?
offset = (num_groups / 2.) * width
xticks = arange(num_groups) + offset
s.set_xticks(xticks)
print("xticks: ", xticks)
plt.xlim([0 - width, max(xticks) + (num_groups * width)])
s.set_xticklabels(group_labels)
My questions are:
How can I control the space between the groups of bars? Right now the spacing is huge and it looks silly. Note that I do not want to make the bars wider - I want them to have the same width, but be closer together.
How can I get the labels to be centered below the groups of bars? I tried to come up with some arithmetic calculations to position the xlabels in the right place (see code above) but it's still slightly off... it feels a bit like writing a plotting library rather than using one. How can this be fixed? (Is there a wrapper or built in utility for matplotlib where this is default behavior?)
EDIT: Reply to @mlgill: thank you for your answer. Your code is certainly much more elegant but still has the same issue, namely that the width of the bars and the spacing between the groups are not controlled separately. Your graph looks correct but the bars are far too wide (it looks like an Excel graph), and I wanted to make the bars thinner.
Width and margin are now linked, so if I try:
margin = 0.60
width = (1.-2.*margin)/num_items
It makes the bar skinnier, but brings the group far apart, so the plot again does not look right.
How can I make a grouped bar plot function that takes two parameters: the width of each bar, and the spacing between the bar groups, and plots it correctly like your code did, i.e. with the x-axis labels centered below the groups?
I think that since the user has to compute specific low-level layout quantities like margin and width, we are still basically writing a plotting library :)
Actually I think this problem is best solved by adjusting figsize and width; here is my output with figsize=(2,7) and width=0.3:
By the way, this type of thing becomes a lot simpler if you use the pandas wrappers (I've also imported seaborn, which is not necessary for the solution but makes the plot a lot prettier and more modern looking in my opinion):
import pandas as pd
import seaborn
seaborn.set()
df = pd.DataFrame(groups, index=group_labels)
df.plot(kind='bar', legend=False, width=0.8, figsize=(2,5))
plt.show()
The trick to both of your questions is understanding that bar graphs in Matplotlib expect each series (G1, G2) to have a total width of "1.0", counting margins on either side. Thus, it's probably easiest to set margins up and then calculate the width of each bar depending on how many of them there are per series. In your case, there are two bars per series.
Assuming you left align each bar, instead of center aligning them as you had done, this setup will result in series which span from 0.0 to 1.0, 1.0 to 2.0, and so forth on the x-axis. Thus, the exact center of each series, which is where you want your labels to appear, will be at 0.5, 1.5, etc.
I've cleaned up your code as there were a lot of extraneous variables. See comments within.
import matplotlib.pyplot as plt
import numpy as np
plt.figure(figsize=(7,7), dpi=300)
groups = [[1.04, 0.96],
          [1.69, 4.02]]
group_labels = ["G1", "G2"]
num_items = len(group_labels)
# This needs to be a numpy range for xdata calculations
# to work.
ind = np.arange(num_items)
# Bar graphs expect a total width of "1.0" per group
# Thus, you should make the sum of the two margins
# plus the sum of the width for each entry equal 1.0.
# One way of doing that is shown below. You can make
# the margins smaller if they're still too big.
margin = 0.05
width = (1.-2.*margin)/num_items
s = plt.subplot(1,1,1)
for num, vals in enumerate(groups):
    print("plotting: ", vals)
    # The position of the xdata must be calculated for each of the two data series
    xdata = ind+margin+(num*width)
    # align="edge" left-aligns the bars (current Matplotlib centers them by
    # default), which is what this method of calculating positions assumes
    gene_rects = plt.bar(xdata, vals, width, align="edge")
# You should no longer need to manually set the plot limit since everything
# is scaled to one.
# Also the ticks should be much simpler now that each group of bars extends from
# 0.0 to 1.0, 1.0 to 2.0, and so forth and, thus, are centered at 0.5, 1.5, etc.
s.set_xticks(ind+0.5)
s.set_xticklabels(group_labels)
I read an answer that Paul Ivanov posted on Nabble that might solve this problem with less complexity. Just set the index as below. This will increase the spacing between grouped columns.
ind = np.arange(0,12,2)
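The question asks for a function that takes the bar width and the gap between groups as two independent parameters. The following is one possible sketch of that idea and is not taken from any of the answers: the names grouped_bar_positions, grouped_bar, bar_width and group_gap are all hypothetical. The positions are computed separately from the drawing so that the arithmetic can be inspected on its own, and the drawing function only relies on the standard Axes methods bar, set_xticks and set_xticklabels.

```python
def grouped_bar_positions(n_groups, n_bars, bar_width, group_gap):
    """Return (ticks, centres) for a grouped bar chart.

    ticks[g] is the centre of group g; centres[k][g] is the x centre of
    bar k inside group g. Bars within a group touch each other, and
    consecutive groups are separated by group_gap.
    """
    pitch = n_bars * bar_width + group_gap  # distance between group centres
    ticks = [g * pitch for g in range(n_groups)]
    centres = [
        [t + (k - (n_bars - 1) / 2) * bar_width for t in ticks]
        for k in range(n_bars)
    ]
    return ticks, centres


def grouped_bar(ax, series, group_labels, bar_width=0.1, group_gap=0.15):
    """Draw each entry of series as one bar per group on an existing Axes."""
    ticks, centres = grouped_bar_positions(
        len(group_labels), len(series), bar_width, group_gap)
    for xs, vals in zip(centres, series):
        ax.bar(xs, vals, bar_width, align="center")
    ax.set_xticks(ticks)              # labels sit under each group's centre
    ax.set_xticklabels(group_labels)
    return ticks, centres


ticks, centres = grouped_bar_positions(2, 2, bar_width=0.1, group_gap=0.15)
print(ticks)
print(centres)
```

With fig, ax = plt.subplots(), calling grouped_bar(ax, groups, group_labels, bar_width=0.1, group_gap=0.15) on the data from the question should keep the bars thin while pulling the groups together; changing group_gap moves the groups without affecting bar_width.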
I am trying to create a stacked bar chart in Python (using PyCharm).
I am using matplotlib to explore its full potential for simple data visualization.
My original code is for a grouped bar chart that displays cycle time for different teams. The data comes from a dataframe. The chart also includes an autolabeling function (i.e. it prints the height of each bar, a continuous variable).
I am trying to convert this grouped bar chart into a stacked bar chart. The code below needs to be improved because of two issues:
1. The labels for the values have too many decimals. This issue did not occur for the grouped bar chart, and the csv file and the derived dataframe weren't altered. I am struggling to understand if and where to use the round command. I guess the issue is related to the autolabeling function, because the datatype used is float (I need to see at least 1 decimal).
2. The data labels are displaced. As the autolabeling function was created for separate bars, the labels always sat at the distance I wanted (based on the vertical offset). Unfortunately I have not figured out how to center each label within its segment (in my example, the value for funnel time is at the height of squad time, and vice versa). Logically, the issue should be that the starting height of each segment is defined beforehand (see rects3 in the code, value of bottom), but I don't know how to reflect this in my autolabeling function.
The question is: what exactly in the code must be altered in order to have the values of cycle time centered?
The code (my notes are marked with "NOTE"):
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
'''PART 1 - Preprocess data -----------------------------------------------'''
#Directory or link of my CSV. This can be used also if you want to use API.
csv1 = r"C:\Users\AndreaPaviglianiti\Downloads\CT_Plot_DF.csv"
#Create and read dataframe. This is just to check the DF before plotting
df = pd.read_csv(csv1, sep=',', engine= 'python')
print(df, '\n')
#Extract columns as lists
squads = df['Squad_Name'].astype('str') #for our horizontal axis
funnel = df['Funnel_Time'].astype('float')
squadt = df['Squad_Time'].astype('float')
wait = df['Waiting_Time'].astype('float')
# NOTE: here I tried to set the rounding, but without success
'''PART 2 - Create the Bar Plot / Chart ----------------------------------'''
x = np.arange(len(squads)) #our labels on x will be the squads' names
width = 0.2 # the width of the bars. The bigger value, the larger bars
distance = 0.2
#Create objects that will be used as subplots (fig and ax).
#Each "rects" is the visualization of a yn value. first width is distance between X values,
# the second is the real width of bars.
fig, ax = plt.subplots()
rects1 = ax.bar(x, funnel, width, color='red', label='Funnel Time')
rects2 = ax.bar(x, squadt, width, color='green', bottom=funnel, label='Squad Time')
rects3 = ax.bar(x, wait, width, bottom=funnel+squadt, color='purple', label='Waiting Time')
# Add some text for labels, title and custom x-axis tick labels, etc.
ax.set_ylabel('Mean Cycle Time (h)')
ax.set_xlabel('\n Squads')
ax.set_title("Squad's Cycle Time Comparison in Dec-2020 \n (in mean Hours)")
ax.set_xticks(x)
ax.set_xticklabels(squads)
ax.legend()
fig.align_xlabels() #align labels to columns
# The function to display values above the bars
def autolabel(rects):
    """Attach a text label above each bar in *rects*, displaying its height."""
    for rect in rects:
        height = rect.get_height()
        ax.annotate('{}'.format(height),
                    xy=(rect.get_x() + rect.get_width()/2, height),
                    xytext=(0, 3), # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom')
# NOTE: here I tried to change xytext="center" but I get an error. Am I supposed
# to use coordinates only, or is there an alternative to change the position
# from the height to the center?
#We will label only the most recent information. To label both add to the code "autolabel(rects1)"
autolabel(rects1)
autolabel(rects2)
autolabel(rects3)
fig.tight_layout()
'''PART 3 - Execute -------------------------------------------------------'''
plt.show()
Thank you for the help!
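One possible way to address both points is sketched below; it is not taken from the original thread, and the function name label_segments is made up for illustration. The idea is that each segment of a stacked bar is a rectangle whose bottom is given by get_y(), so the vertical centre is get_y() + get_height() / 2 rather than get_height() alone. The format string limits the label to one decimal place.

```python
def label_segments(ax, rects, fmt="{:.1f}"):
    """Write each segment's height at the centre of that segment.

    Hypothetical replacement for autolabel(); ax is the Axes the bars
    were drawn on and rects is the container returned by ax.bar().
    """
    for rect in rects:
        height = rect.get_height()
        x_mid = rect.get_x() + rect.get_width() / 2
        # get_y() is the bottom of the segment, so this is its midpoint
        # even when the segment is stacked on top of other segments.
        y_mid = rect.get_y() + height / 2
        ax.annotate(fmt.format(height),
                    xy=(x_mid, y_mid),
                    ha="center", va="center")
```

In the code above this would be called as label_segments(ax, rects1), label_segments(ax, rects2) and label_segments(ax, rects3) in place of the autolabel calls. In Matplotlib 3.4 and later, ax.bar_label(rects, label_type='center', fmt='%.1f') should give a similar result without a custom function.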
I have searched for ways of centering histogram bars on tick marks, but I have not been able to find a solution that works with seaborn's displot. The displot function lets me stack the histogram according to a column in the dataframe, so I would prefer a solution using displot, or something else that allows stacking based on a dataframe column with color-coding via palette.
Even after setting the tick values, I am not able to get the bars to center around the tick marks.
Example code
# Center the histogram on the tick marks
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
sns.displot(x="total_bill",
            hue="day", multiple='stack', data=tips)
plt.xticks(np.arange(0, 50, 5))
I would also like to plot a histogram of a variable that takes a single value and choose the bin width of the resulting histogram in such a way that it is centered around the value. (0.5 in this example.)
I can get the center point by choosing the number of bins equal to the number of tick marks, but the resulting bar is very thin. How can I increase the bin size in this case, where there is only one bar but I want to display all the other possible points? With all the tick marks displayed, the bar is very narrow.
I want the same centering of the bar at the 0.5 tick mark but make it wider as it is the only value for which counts are displayed.
Any solutions?
tips['single'] = 0.5
sns.displot(x='single',
hue="day", multiple = 'stack', data=tips, bins = 10)
plt.xticks(np.arange(0, 1, 0.1))
Edit:
Would it be possible to have more control over the tick marks in the second case? I would rather not display ticks rounded to one decimal place, but choose which tick marks to display. Is it possible to display just one value as a tick mark and have the bar centered around it?
Do min_val and max_val in this case refer to the value of the variable, which would be 0 here? Would the x-axis then extend to negative values even though there are none? I don't want to display them.
For your first problem, you may want to figure out a few properties of the data that you're plotting, for example the range of the data. Additionally, you may want to choose beforehand the number of bins that you want displayed.
tips = sns.load_dataset('tips')
min_val = tips.total_bill.min()
max_val = tips.total_bill.max()
val_width = max_val - min_val
n_bins = 10
bin_width = val_width/n_bins
sns.histplot(x="total_bill",
hue="day", multiple = 'stack', data=tips,
bins=n_bins, binrange=(min_val, max_val),
palette='Paired')
plt.xlim(0, 55) # Define x-axis limits
Another thing to remember is that the width of a bar in a histogram identifies the bounds of its range. So a bar spanning [2,5] on the x-axis implies that the values represented by that bar belong to that range.
Considering this, it is easy to formulate a solution. If we want ticks that mark the bounds of each bar, as in the original plot, one solution may look like this:
plt.xticks(np.arange(min_val-bin_width, max_val+bin_width, bin_width))
Now, if we offset the ticks by half a bin-width, we will get to the centers of the bars.
plt.xticks(np.arange(min_val-bin_width/2, max_val+bin_width/2, bin_width))
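The relation between the two tick choices can be made explicit with a small calculation. This is only an illustrative sketch: the helper name bin_edges_and_centres is hypothetical, and the range 0 to 50 is a made-up example and not the actual range of total_bill.

```python
def bin_edges_and_centres(min_val, max_val, n_bins):
    """Return the bin edges and bin centres for n_bins equal-width bins."""
    bin_width = (max_val - min_val) / n_bins
    edges = [min_val + i * bin_width for i in range(n_bins + 1)]
    # Each centre is half a bin width to the right of the bin's left edge.
    centres = [edge + bin_width / 2 for edge in edges[:-1]]
    return edges, centres


edges, centres = bin_edges_and_centres(0.0, 50.0, 10)
print(edges[:3], centres[:3])
```

Passing edges to plt.xticks puts the ticks on the bar boundaries, while passing centres puts one tick under the middle of each bar, provided the histogram was drawn with the same n_bins and binrange=(min_val, max_val).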
For your single-value plot, the idea remains the same: control the bin width, the x-axis range, and the ticks. The bin width has to be controlled explicitly, since the automatically inferred bin width will probably not give a bar of useful thickness. Histogram bars always indicate a range, even when we have just a single value. This is illustrated in the following example and figure.
from matplotlib.ticker import FormatStrFormatter
single_val = 23.5
tips['single'] = single_val
bin_width = 4
fig, axs = plt.subplots(1, 2, sharey=True, figsize=(12,4)) # Get 2 subplots
# Case 1 - With the single value as x-tick label on subplot 0
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[0])
ticks = [single_val, single_val+bin_width] # 2 ticks - given value and given_value + width
axs[0].set(
title='Given value as tick-label starts the bin on x-axis',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width)) # x-range such that bar is at middle of x-axis
axs[0].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
# Case 2 - With centering on the bin starting at single-value on subplot 1
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[1])
ticks = [single_val+bin_width/2] # Just the bin center
axs[1].set(
title='Bin centre is offset from single_value by bin_width/2',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width) ) # x-range such that bar is at middle of x-axis
axs[1].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
Output:
I feel from your description that what you really mean by a bar graph is a categorical bar graph. The centering is then automatic, because each bar is no longer a range but a discrete category. Given the numeric and continuous nature of the variable in the example data, I would not recommend such an approach. Pandas provides for plotting categorical bar plots. For our example, one way to do this is as follows:
n_colors = len(tips['day'].unique()) # Get number of unique categories
agg_df = tips[['single', 'day']].groupby(['day']).agg(
val_count=('single', 'count'),
val=('single','max')
).reset_index() # Get aggregated information along the categories
agg_df.pivot(columns='day', values='val_count', index='val').plot.bar(
stacked=True,
color=sns.color_palette("Paired", n_colors), # Choose "number of days" colors from palette
width=0.05 # Set bar width
)
plt.show()
This yields:
There is a big space between the border (x-axis) of the graph and the first and last bars in a bar plot in pyplot (red arrows in the first picture). In the image below, it looks fine in the graph on the left, but it's wasting a lot of space in the graph on the right. The larger the graph, the larger the space.
See how much space is wasted in the second picture:
Any idea how to fix that?
The code:
plt.figure(figsize=figsize)
grid = plt.GridSpec(figsize[1], figsize[0], wspace=120/figsize[0])
plt.suptitle(feature.upper())
# NaN plot
plt.subplot(grid[:, :2])
plt.title('Présence')
plt.ylabel('occurences')
plt.bar([0, 1], [df.shape[0] - nan, nan], color=['#4caf50', '#f44336'])
plt.xticks([0, 1], ['Renseigné', 'Absent'], rotation='vertical')
# Distrib plot
plt.subplot(grid[:, 2:])
plt.title('Distribution')
x_pos = [i for i, _ in enumerate(sizes)]
plt.bar(x_pos, sizes)
plt.xticks(x_pos, labels, rotation='vertical')
plt.show()
df is my pandas DataFrame, nan the amount of null value for the feature I'm plotting, sizes is the number of occurrences for each value, and labels are the corresponding label's value.
You can use plt.margins(x=0, tight=True).
The default margins are 0.05, which means that 5% of the distance between the first and last x-values is used as a margin. Choosing a small value such as plt.margins(x=0.01, tight=True) would leave a bit of margin, so the bars don't look glued to the axes.
In some situations wider margins are desired. For example when drawing only two very narrow bars. So, the exact value for a plot to look nice depends on multiple factors as well as on personal preferences.
Note that plt.margins is best called after the last function that adds elements to the plot.
Also note that plt.xlim has a similar function of changing the drawing limits. Here is an example that sets xlim to have the same distance between the axes and the bars as between the bars themselves.
import matplotlib.pyplot as plt
fig, axes = plt.subplots(ncols=3, figsize=(12, 3))
for ax in axes:
    width = 0.4
    ax.bar([0, 1, 2], [2, 3, 5], width=width, color='turquoise')
    if ax == axes[0]:
        ax.margins(x=0.01)
        ax.set_title('1% margins')
    elif ax == axes[1]:
        ax.margins(x=0.1)
        ax.set_title('10% margins')
    else:
        ax.set_xlim(-1+width/2, 3-width/2)
        ax.set_title('setting xlim')
plt.show()
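The xlim values in the last branch follow from the bar geometry: centered bars at 0, 1, 2 with width w leave a gap of 1 - w between neighbours, and the limits are placed one such gap beyond the outer bar edges. A short sketch of that calculation follows; the helper name xlim_matching_bar_gap and its spacing parameter are hypothetical and not part of the answer.

```python
def xlim_matching_bar_gap(n_bars, width, spacing=1.0):
    """x-limits that leave the same gap at the borders as between bars.

    Assumes centred bars at 0, spacing, 2 * spacing, ...
    """
    gap = spacing - width                       # empty space between two bars
    left = -width / 2 - gap                     # outer edge of first bar, minus one gap
    right = (n_bars - 1) * spacing + width / 2 + gap
    return left, right


left, right = xlim_matching_bar_gap(3, 0.4)
print(left, right)
```

For three bars of width 0.4 this gives the same limits as ax.set_xlim(-1+width/2, 3-width/2) in the example, up to floating-point rounding.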
I currently use the align='edge' parameter and positive/negative widths in pyplot.bar() to plot the bar data of one metric to each axis. However, if I try to plot a second set of data to one axis, it covers the first set. Is there a way for pyplot to automatically space this data correctly?
lns3 = ax[1].bar(bucket_df.index,bucket_df.original_revenue,color='c',width=-0.4,align='edge')
lns4 = ax[1].bar(bucket_df.index,bucket_df.revenue_lift,color='m',bottom=bucket_df.original_revenue,width=-0.4,align='edge')
lns5 = ax3.bar(bucket_df.index,bucket_df.perc_first_priced,color='grey',width=0.4,align='edge')
lns6 = ax3.bar(bucket_df.index,bucket_df.perc_revenue_lift,color='y',width=0.4,align='edge')
This is what it looks like when I show the plot:
The data shown in yellow completely covers the data in grey. I'd like it to be shown next to the grey data.
Is there any easy way to do this? Thanks!
The first argument to the bar() plotting method is an array of the x-coordinates for your bars. Since you pass the same x-coordinates they will all overlap. You can get what you want by staggering the bars by doing something like this:
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(10) # define your x-coordinates
width = 0.1 # set a width for your plots
offset = 0.15 # define an offset to separate each set of bars
fig, ax = plt.subplots() # define your figure and axes objects
ax.bar(x, y1, width) # plot the first set of bars (y1 is your first data array)
ax.bar(x + offset, y2, width) # plot the second set of bars
Since you have a few sets of data to plot, it makes more sense to make the code a bit more concise (assume y_vals is a list containing the y-coordinates you'd like to plot, bucket_df.original_revenue, bucket_df.revenue_lift, etc.). Then your plotting code could look like this:
for i, y in enumerate(y_vals):
    ax.bar(x + i * offset, y, width)
If you want to plot more sets of bars you can decrease the width and offset accordingly.
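For the edge-aligned layout used in the question, the same idea can be expressed by dividing the 0.4 units to the right of each tick among the series. This is a sketch under stated assumptions and not part of the answer: the helper name side_by_side_edges is made up, and it assumes the x positions are numeric so that an offset can be added to them.

```python
def side_by_side_edges(n_series, total_width=0.4):
    """Width and left edges (relative to the tick) for n_series bars
    that share total_width to the right of each tick."""
    width = total_width / n_series
    left_edges = [i * width for i in range(n_series)]
    return width, left_edges


width, left_edges = side_by_side_edges(2)
print(width, left_edges)
```

Each series i could then be drawn with ax3.bar(x + left_edges[i], y, width=width, align='edge'), so the grey and yellow bars sit next to each other and no longer cover one another.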