I am plotting a pandas dataframe. I have a bar chart, with stacked=True. The problem is, it puts the results on top of each other, instead of overlapping. If value a is 20, and value b is 40, the height of the bar will be 60. What I really want is a from 0-20 from the axis, and b from 20-40 (so that values are overlapped). Is this possible?
I am plotting a stacked bar chart for a few fixed quantities using the usual:
bar = df.plot.barh(x='Intervention', y={'Remuneration','Training','Supervision'}, stacked=True )
However, I noticed that, depending on the input dataset, matplotlib changes the order of the plotted columns. For instance, sometimes it plots Remuneration as the first component of the stacked bar chart, whereas on other occasions the first component is Training or Supervision. I haven't been able to figure out which order is being used. Ideally I'd like to always keep the same order, as I have a list of colors to be used. Is there any way to force the ordering, e.g. so that the stacked bar chart always appears as Remuneration-Training-Supervision?
To set a custom order for the stacked bars, make the columns a CategoricalIndex with the categories in the order you want, then sort the columns by those categories. This puts the three categories in the order you need. A small example is below...
Data 'df'
Intervention  Remuneration  Training  Supervision
A             21            4         12
B             41            5         21
C             33            6         7
Code
import pandas as pd
# Make Intervention the index, so the remaining columns are the categories
df = df.set_index('Intervention')
# Set the categories... Order will be Remuneration-Training-Supervision
df.columns = pd.CategoricalIndex(df.columns.values, ordered=True, categories=['Remuneration','Training','Supervision'])
# Sort the columns by category order
df = df.sort_index(axis=1)
# ...and plot
bar = df.plot.barh(stacked=True)
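As an alternative sketch (not part of the answer above): the braces in the question's y={...} build a Python set, which has no guaranteed order, and that may be why the stacking order varies. Selecting the columns with a list keeps the order fixed. The data repeats the small example above with the columns deliberately shuffled, and the colors list is a hypothetical placeholder for the asker's own colors.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# Same example data as above, with the columns deliberately out of order
df = pd.DataFrame({
    "Intervention": ["A", "B", "C"],
    "Training": [4, 5, 6],
    "Supervision": [12, 21, 7],
    "Remuneration": [21, 41, 33],
})

order = ["Remuneration", "Training", "Supervision"]   # desired stacking order
colors = ["tab:blue", "tab:orange", "tab:green"]      # hypothetical colors, one per column

# Indexing with a list (not a set) returns the columns in exactly this order
ordered = df.set_index("Intervention")[order]
ax = ordered.plot.barh(stacked=True, color=colors)

# The legend labels follow the column order that was plotted
handles, labels = ax.get_legend_handles_labels()
```

Because the color list is matched to the columns by position, fixing the column order also fixes which color each quantity gets.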
Output
I have looked at many ways of centering histogram bars on tick marks, but I have not found a solution that works with seaborn's displot. displot lets me stack the histogram according to a column in the dataframe, so I would prefer a solution that uses displot, or something else that allows stacking based on a dataframe column with color-coding via a palette.
Even after setting the tick values, I cannot get the bars to center on the tick marks.
Example code
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Center the histogram on the tick marks
tips = sns.load_dataset('tips')
sns.displot(x="total_bill",
hue="day", multiple = 'stack', data=tips)
plt.xticks(np.arange(0, 50, 5))
I would also like to plot a histogram of a variable that takes a single value, and choose the bin width so that the resulting bar is centered on that value (0.5 in this example).
I can get the bar centered by making the number of bins equal to the number of tick marks, but the resulting bar is very thin. How can I increase the bin size in this case, where there is only one bar but I still want to display all the other possible points? With all the tick marks displayed, the bar width is tiny.
I want the bar to stay centered on the 0.5 tick mark but be wider, as it is the only value for which counts are displayed.
Any solutions?
tips['single'] = 0.5
sns.displot(x='single',
hue="day", multiple = 'stack', data=tips, bins = 10)
plt.xticks(np.arange(0, 1, 0.1))
Edit:
Would it be possible to have more control over the tick marks in the second case? I do not want the ticks rounded to one decimal place; I would rather choose which tick marks to display. Is it possible to display just one tick value and have the bar centered on it?
Do min_val and max_val in this case refer to the value of the variable, which will be 0 here? If so, the x-axis would extend to negative values even though there are none, and I don't want to display them.
For your first problem, you may want to work out a few properties of the data that you're plotting, for example its range. You may also want to choose beforehand the number of bins that you want displayed.
tips = sns.load_dataset('tips')
min_val = tips.total_bill.min()
max_val = tips.total_bill.max()
val_width = max_val - min_val
n_bins = 10
bin_width = val_width/n_bins
sns.histplot(x="total_bill",
hue="day", multiple = 'stack', data=tips,
bins=n_bins, binrange=(min_val, max_val),
palette='Paired')
plt.xlim(0, 55) # Define x-axis limits
Another thing to remember is that the width of a bar in a histogram identifies the bounds of its range. So a bar spanning [2,5] on the x-axis implies that the values represented by that bar belong to that range.
Considering this, it is easy to formulate a solution. If we want ticks that mark the bounds of each bar, as in the original plot, one solution may look like
plt.xticks(np.arange(min_val-bin_width, max_val+bin_width, bin_width))
Now, if we offset the ticks by half a bin-width, we will get to the centers of the bars.
plt.xticks(np.arange(min_val-bin_width/2, max_val+bin_width/2, bin_width))
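To see why the half-bin offset lands on the bar centers, a small NumPy-only sketch may help. The values array is made-up stand-in data rather than the tips dataset, and np.linspace is my substitution for np.arange (it avoids floating-point surprises at the end point); neither comes from the answer.

```python
import numpy as np

# Made-up stand-in for a numeric column such as total_bill
values = np.array([3.0, 7.5, 12.0, 18.5, 24.0, 31.0, 43.0])

n_bins = 10
min_val, max_val = values.min(), values.max()
bin_width = (max_val - min_val) / n_bins

# Bin edges: n_bins + 1 equally spaced points from min_val to max_val
edges = np.linspace(min_val, max_val, n_bins + 1)

# Bin centers: each left edge shifted right by half a bin width
centers = edges[:-1] + bin_width / 2

# np.histogram with the same bins and range produces the same edges
counts, hist_edges = np.histogram(values, bins=n_bins, range=(min_val, max_val))
```

Passing edges to plt.xticks labels the bar boundaries, while passing centers labels each bar at its midpoint.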
For your single-value plot, the idea remains the same: control the bin width, the x-axis range, and the ticks. The bin width has to be set explicitly, since an automatically inferred bin width will probably be 1 unit wide, which will show almost no thickness on the plot. Histogram bars always indicate a range, even when we have just one single value. This is illustrated in the following example and figure.
from matplotlib.ticker import FormatStrFormatter
single_val = 23.5
tips['single'] = single_val
bin_width = 4
fig, axs = plt.subplots(1, 2, sharey=True, figsize=(12,4)) # Get 2 subplots
# Case 1 - With the single value as x-tick label on subplot 0
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[0])
ticks = [single_val, single_val+bin_width] # 2 ticks - given value and given_value + width
axs[0].set(
title='Given value as tick-label starts the bin on x-axis',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width)) # x-range such that bar is at middle of x-axis
axs[0].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
# Case 2 - With centering on the bin starting at single-value on subplot 1
sns.histplot(x='single',
hue="day", multiple = 'stack', data=tips,
binwidth=bin_width, binrange=(single_val-bin_width, single_val+bin_width),
palette='rocket',
ax=axs[1])
ticks = [single_val+bin_width/2] # Just the bin center
axs[1].set(
title='Bin centre is offset from single_value by bin_width/2',
xticks=ticks,
xlim=(0, int(single_val*2)+bin_width) ) # x-range such that bar is at middle of x-axis
axs[1].xaxis.set_major_formatter(FormatStrFormatter('%.1f'))
Output:
From your description, I suspect that what you really want is a categorical bar graph. The centering is then automatic, because each bar represents a discrete category rather than a range. Given the numeric, continuous nature of the variable in the example data, I would not recommend this approach. Pandas supports categorical bar plots. See here. For our example, one way to do this is as follows:
n_colors = len(tips['day'].unique()) # Get number of uniques categories
agg_df = tips[['single', 'day']].groupby(['day']).agg(
val_count=('single', 'count'),
val=('single','max')
).reset_index() # Get aggregated information along the categories
agg_df.pivot(columns='day', values='val_count', index='val').plot.bar(
stacked=True,
color=sns.color_palette("Paired", n_colors), # Choose "number of days" colors from palette
width=0.05 # Set bar width
)
plt.show()
This yields:
I'm trying to plot each column of df_new on a separate bar plot, but I keep getting all of the columns on every chart. As I have 33 columns, the result should be 33 bar plots, each showing row 1 and row 2 for the corresponding column.
Update: to make it clearer, the image (made in Excel) shows how each plot should look, so the next one would be 'r(0,0)' and so on.
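One possible approach, sketched under assumptions since df_new itself is not shown: pandas can draw each column on its own axes when subplots=True is passed to the plot call. The frame below is a made-up stand-in with two rows and four columns; 'r(0,0)' is taken from the question and the other column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import pandas as pd

# Made-up stand-in for df_new: two rows, one column per desired plot
df_new = pd.DataFrame(
    {
        "r(0,0)": [0.2, 0.5],
        "r(0,1)": [0.1, 0.4],
        "r(1,0)": [0.3, 0.6],
        "r(1,1)": [0.7, 0.2],
    },
    index=["row 1", "row 2"],
)

# subplots=True gives every column its own axes; layout sets the grid shape
axes = df_new.plot.bar(subplots=True, layout=(2, 2), legend=False, figsize=(8, 6))

# Each subplot should contain one bar per row of the frame
bars_per_plot = [len(ax.patches) for ax in axes.ravel()]
```

For 33 columns, a grid such as layout=(3, 11) would give one cell per column.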
I want to create a pie chart from a single column of my dataframe, say a column named 'Score'. The scores are stored in this column as below:
Score
.92
.81
.21
.46
.72
.11
.89
Now I want to create a pie chart showing the percentage of scores in each range.
For example, 0-0.4 is 30%, 0.4-0.7 is 35%, and 0.7+ is 35%.
I am using the code below:
df1['bins'] = pd.cut(df1['Score'],bins=[0,0.5,1], labels=["0-50%","50-100%"])
df1 = df.groupby(['Score', 'bins']).size().unstack(fill_value=0)
df1.plot.pie(subplots=True,figsize=(8, 3))
With the above code I get a pie chart, but I don't know how to show percentages.
My pie chart looks like this for now:
Cutting the dataframe up into bins is the right first step. After that, you can use value_counts with normalize=True to get the relative frequency of each value in the bins column. This gives you the percentage of the data that falls in each of the ranges defined by the bins.
As for plotting the pie chart, I'm not sure I understood correctly, but it seems you would like to display the correct legend values and the percentage value in each slice of the pie.
pandas.DataFrame.plot is a good place to see all the parameters that can be passed into the plot method. You can specify which columns to use as x and y, and by default the dataframe index is used as the legend in the pie plot.
To show the percentage value per slice, you can use the autopct parameter. As mentioned in this answer, you can use all the normal matplotlib plt.pie() flags in the plot method as well.
Bringing everything together, this is the resultant code and the resultant chart:
df = pd.DataFrame({'Score': [0.92,0.81,0.21,0.46,0.72,0.11,0.89]})
df['bins'] = pd.cut(df['Score'], bins=[0,0.4,0.7,1], labels=['0-0.4','0.4-0.7','0.7-1'], right=True)
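# Keep the result as a Series: newer pandas versions name the value_counts result 'proportion', so y='bins' would not be found
bin_percent = df['bins'].value_counts(normalize=True) * 100
plot = bin_percent.plot.pie(figsize=(5, 5), autopct='%1.1f%%')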
Plot of Pie Chart
I have a Dataframe with 14 rows and 7 columns where the columns represent groups and the rows represent months. I am trying to create a grouped bar plot such that at each month (on the x-axis) I will have the values for each of the groups as bars. The code is simply
ax = df.plot.bar(width=1,color=['b','g','r','c','orange','purple','y']);
ax.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
ax.set_xticklabels(months2,rotation=45)
Which produces the following result:
I would like to make the individual bars in each group wider but without them overlapping and I would also like to increase the distance between each group of bars so that there is enough space in the plot.
It might be worth mentioning that the index of the dataframe is 0,...,13.
Help would be greatly appreciated!
TH
If you want to pack 10 apples in a box and want the apples to have more space between them you have two options: (1) take a larger box, or (2) use smaller apples.
(1) See "How do you change the size of figures drawn with matplotlib?"
(2) Change the width argument.
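The two options can be sketched together as follows. The data is random stand-in data with the shape described in the question (14 rows, 7 columns); the column names and the figsize and width values are illustrative guesses, not recommendations from the answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import numpy as np
import pandas as pd

# Random stand-in data: 14 months (rows) by 7 groups (columns), index 0..13
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((14, 7)), columns=[f"group {i}" for i in range(1, 8)])

# (1) "larger box": a wider figure gives every bar more room on screen
# (2) "smaller apples": width < 1 leaves a gap between neighbouring month groups
ax = df.plot.bar(width=0.8, figsize=(14, 5))
ax.legend(loc="center left", bbox_to_anchor=(1.0, 0.5))

fig_width, fig_height = ax.figure.get_size_inches()
single_bar_width = ax.patches[0].get_width()  # in x-axis data units
```

With width=1 the groups touch each other; with width=0.8 each month keeps 20% of its slot empty, and the wider figure compensates so the individual bars still appear wider on screen.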