This question already has answers here:
Sort a pandas dataframe series by month name
(6 answers)
Closed 8 months ago.
I wrote a function to create a bar plot based on the column provided:
def bar_plot(dataset, col, figsize=(16,8)):
    fig, ax = plt.subplots(figsize=figsize)
    for loc in ['bottom', 'left']:
        ax.spines[loc].set_visible(True)
        ax.spines[loc].set_linewidth(2)
        ax.spines[loc].set_color('black')
    data = dataset[col].value_counts().reset_index()
    ax = sns.barplot(data=data,x=col,y='index',orient='h', linewidth=1, edgecolor='k',color='#005EB8')
    plt.title(f'Change counts by: {col.capitalize()}', size=16, fontweight='bold', color='#425563')
    ax.set_ylabel('')
    for p in ax.patches:
        width = p.get_width()
        plt.text(p.get_width(),
                 p.get_y()+.55*p.get_height(),
                 round(width),
                 va='center',
                 color='#425563')
When I provide the month as a number, the plot looks fine, as shown below:
However, if I provide the full month name, the last two values (Nov and Dec) are mingled in the plot:
I have been researching this for some time (I adjusted the yticks, ylim, etc.) without any luck so far. I can make do with the month as a number, but how can I fix this?
The issue was in the code where I assigned the month names a categorical order: I missed a "," between November and December, as in the lines below. The plot is correct now that I have inserted the ",".
month_order = ['January', 'February', 'March', 'April', 'May','June','July','August','September', 'October','November''December']
raw_df['month_name'] = pd.Categorical(raw_df.month_name,categories=month_order,ordered=True)
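To see why the missing comma matters: Python joins adjacent string literals into one string, so the list silently ends up with 11 entries instead of 12. The sketch below reproduces this with the list from the question; the `calendar.month_name` alternative is a suggestion that was not part of the original post and assumes an English (C) locale.

```python
import calendar

# The list as written in the question: no comma between 'November' and 'December'
buggy_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November' 'December']

print(len(buggy_order))   # 11
print(buggy_order[-1])    # NovemberDecember

# One way to avoid typing the names by hand (assumes an English/C locale);
# index 0 of calendar.month_name is an empty string, so skip it
month_order = list(calendar.month_name)[1:]
print(len(month_order))   # 12
```

With the buggy list, neither 'November' nor 'December' is a valid category, so pd.Categorical turns both into missing values.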
This question already has answers here:
X-axis not properly aligned with bars in barplot (seaborn)
(2 answers)
Bar labels in matplotlib/Seaborn
(1 answer)
How to get the label values on a bar chat with seaborn on a categorical data
(2 answers)
Closed 2 months ago.
Below is the code I am using for a heart failure analysis project. It has two problems:
The value labels are not centered under their bars in the graph, pictured below.
The percentage value is not shown above each bar in the graph.
def plot_percentage(df, col, target):
    x,y = col, target
    temp_df = df.groupby(x)[y].value_counts(normalize=True)
    temp_df = temp_df.mul(100).rename('percent').reset_index()
    temp_df = temp_df[temp_df.HeartDisease != 0]
    order_list = list(df[col].unique())
    order_list.sort()
    sns.set(font_scale=1.5)
    g = sns.catplot(x=x, y='percent', hue=x,kind='bar', data=temp_df, height=8, aspect=2, order=order_list, legend_out=False)
    g.ax.set_ylim(0,100)
    plt.title(f'{col.title()} By Percent {target.title()}',
              fontdict={'fontsize': 30})
    plt.xlabel(f'{col.title()}', fontdict={'fontsize': 20})
    plt.ylabel(f'{target.title()} Percentage', fontdict={'fontsize': 20})
    return g
This question already has answers here:
How to rank plot in seaborn boxplot
(2 answers)
How can I sort a boxplot in pandas by the median values?
(4 answers)
Closed 10 months ago.
Mtcars is a public dataset in R. I'm not sure it's a public dataset in python.
mtcars <- mtcars
I created this boxplot in R and part of what I'm doing is reordering the y-axis with the reorder() function.
ggplot(mtcars, aes(x = mpg, y = reorder(origin, mpg), color = origin)) +
geom_boxplot() +
theme(legend.position = "none") +
labs(title = "Mtcars", subtitle = "Box Plot") +
theme(plot.title = element_text(face = "bold")) +
ylab("country")
Now in python I have this boxplot that I created with seaborn:
plt.close()
seaborn.boxplot(x="mpg", y="origin", data=mtcars)
plt.suptitle("Mtcars", x=0.125, y=0.97, ha='left', fontweight = 'bold')
plt.title("boxplot", loc = 'left')
plt.show()
I'm now trying to render it with the same reordering, but the R-style treatment doesn't work:
plt.close()
seaborn.boxplot(x="mpg", y=reorder("origin", 'mpg'), data=mtcars)
plt.suptitle("Mtcars", x=0.125, y=0.97, ha='left', fontweight = 'bold')
plt.title("boxplot", loc = 'left')
plt.show()
It's not surprising it doesn't work because it's a different language; I do know that! But how would I do this reordering in python using Seaborn? I'm having trouble understanding if this is even part of the plotting process.
You can compute a custom order and feed it to seaborn's boxplot order parameter:
import matplotlib.pyplot as plt
import seaborn as sns
mtcars = sns.load_dataset('mpg')
order = mtcars.groupby('origin')['mpg'].median().sort_values(ascending=False)
sns.boxplot(x="mpg", y="origin", data=mtcars, order=order.index)
plt.suptitle("Mtcars", x=0.125, y=0.97, ha='left', fontweight = 'bold')
plt.title("boxplot", loc = 'left')
plt.show()
NB. order also acts like a filter, so values that are missing or non-existent will be omitted from the graph.
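The ordering step can be checked on its own, without plotting. The sketch below uses a small made-up DataFrame (hypothetical values standing in for the 'mpg' dataset) to show what the computed order looks like and how leaving a category out of the list would leave it out of the plot.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'mpg' dataset
df = pd.DataFrame({
    "origin": ["usa", "usa", "japan", "japan", "europe", "europe"],
    "mpg": [18.0, 20.0, 32.0, 30.0, 26.0, 24.0],
})

# Categories sorted by their median mpg, highest first
order = (df.groupby("origin")["mpg"]
           .median()
           .sort_values(ascending=False)
           .index.tolist())
print(order)  # ['japan', 'europe', 'usa']

# Passing this shorter list as `order` would leave 'usa' out of the boxplot
order_without_usa = [name for name in order if name != "usa"]
print(order_without_usa)  # ['japan', 'europe']
```

Swapping median() for mean() should be closer to R's reorder(), which uses the mean by default.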
This question already has answers here:
Matplotlib pie chart: Show both value and percentage
(2 answers)
Closed 12 months ago.
We are using the code below to draw pie charts, but each category shows only its percentage; we are unable to print the exact value along with the percentage.
dfinfoofbuss = pd.read_csv("file.csv", header=None)
dfinfoofbuss.columns = ['ID', 'INFO']
dfinfoofbuss['VALUE_INFO'] = dfinfoofbuss['ID'].str.split('_').str[1]
dfinfoofbusscnt = dfinfoofbuss.groupby(['VALUE_INFO']).size().reset_index(name='COUNT_INFO')
print("dfinfoofbusscnt:",dfinfoofbusscnt)
plotvar3 = dfinfoofbusscnt.groupby(['VALUE_INFO']).sum().plot(kind='pie' ,title='pie chart', figsize=(6,6), autopct='%.2f', legend = False, use_index=False, subplots=True, colormap="Pastel1")
fig3 = plotvar3[0].get_figure()
fig3.savefig("Info.jpg")
Sample Data
VALUE_INFO CountInfo
abc 1
defair 2
cdf 109
aggr 1
sum 1
normal 2
dev 1
Is there a way to print the original values along with the percentages in the pie chart? Please suggest.
You will likely need to write your own custom function to get both value and percentage as labels.
Try:
def formatter(x):
    return f"{total*x/100:.0f} ({x:.2f})%"
total = sum(dfinfoofbusscnt["CountInfo"])
plotdata = dfinfoofbusscnt.groupby("VALUE_INFO").sum()
plotdata.plot(kind='pie',
              title='pie chart',
              figsize=(6,6),
              autopct=formatter,
              colormap="Pastel1",
              legend=False,
              subplots=True
              )
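As a variation on the formatter above, here is a minimal sketch that avoids the module-level total by building the callback with a closure. The name make_autopct is hypothetical, and the label format (percent sign inside the parentheses) is my own choice; the counts are the CountInfo values from the sample data.

```python
def make_autopct(total):
    """Build an autopct callback that shows the absolute value and the percentage."""
    def formatter(pct):
        # pct is the wedge's share in percent; convert it back to a count
        # and round, since the percentage arrives as a float
        value = int(round(total * pct / 100))
        return f"{value} ({pct:.2f}%)"
    return formatter

counts = [1, 2, 109, 1, 1, 2, 1]  # CountInfo column from the sample data
autopct = make_autopct(sum(counts))
print(autopct(100 * 109 / sum(counts)))  # 109 (93.16%)
```

It would then be passed as autopct=make_autopct(total) in the plot call.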
I have data in this format in a dataframe that I would like to represent as a graph showing the total counts for each month. I have resampled the data so that there is one row per month, and then I wrote the following code to chart it:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
#Read in data & create total column
stacked_bar_data = new_df
stacked_bar_data["total"] = stacked_bar_data.var1 + stacked_bar_data.var2
#Set general plot properties
sns.set_style("whitegrid")
sns.set_context({"figure.figsize": (24, 10)})
sns.set_context("poster")
#Plot 1 - background - "total" (top) series
sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.total, color = "red")
#Plot 2 - overlay - "bottom" series
bottom_plot = sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.attended, color = "#0000A3")
topbar = plt.Rectangle((0,0),1,1,fc="red", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='#0000A3', edgecolor = 'none')
l = plt.legend([bottombar, topbar], ['var1', 'var2'], loc=1, ncol = 2, prop={'size':18})
l.draw_frame(False)
#Optional code - Make plot look nicer
sns.despine(left=True)
bottom_plot.set_ylabel("Count")
# bottom_plot.set_xlabel("date")
#Set fonts to consistent 16pt size
for item in ([bottom_plot.xaxis.label, bottom_plot.yaxis.label] +
             bottom_plot.get_xticklabels() + bottom_plot.get_yticklabels()):
    item.set_fontsize(16)
# making sure our xticks is formatted correctly
plt.xticks(fontsize=20)
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
years_fmt = mdates.DateFormatter('%Y')
bottom_plot.xaxis.set_major_locator(years)
bottom_plot.xaxis.set_major_formatter(years_fmt)
bottom_plot.xaxis.set_minor_locator(months)
plt.show()
# bottom_plot.axes.xaxis.set_visible(False)
The thing is, my chart doesn't show the years at the bottom. I believe I have all the pieces necessary to solve this problem, but for some reason I can't figure out what I'm doing wrong.
I think I'm doing something wrong in how I set up the subplots for sns.barplot. Maybe I should be assigning them to fig and ax or something like that? That's how I saw it done on the matplotlib site. I just can't manage to transfer that logic over to my example.
Any help would be most appreciated. Thanks!
There are a few things to consider. First of all, please convert your date column (new_df.date) to datetime.
new_df.date = pd.to_datetime(new_df.date)
Second, do not use this part:
bottom_plot.xaxis.set_major_locator(years)
bottom_plot.xaxis.set_major_formatter(years_fmt)
bottom_plot.xaxis.set_minor_locator(months)
Instead use:
x_dates = stacked_bar_data['date'].dt.strftime('%Y').sort_values().unique()
bottom_plot.set_xticklabels(labels=x_dates, rotation=0, ha='center')
This is because seaborn relocates the bars to integer positions, even if the x values are dates (note that you passed the index explicitly). Below is a fully working example. Note that this gives you major ticks only; you will have to work out the minor ticks yourself. My comments, and the lines I have commented out, are marked with a double #.
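import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
stacked_bar_data.date = pd.to_datetime(stacked_bar_data.date)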
stacked_bar_data["total"] = stacked_bar_data.var1 + stacked_bar_data.var2
#Set general plot properties
sns.set_style("whitegrid")
sns.set_context({"figure.figsize": (14, 7)}) ## modified size :)
sns.set_context("poster")
years = mdates.YearLocator() # every year
months = mdates.MonthLocator() # every month
years_fmt = mdates.DateFormatter('%Y')
sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.total, color = "red")
bottom_plot = sns.barplot(x = stacked_bar_data.index, y = stacked_bar_data.attended, color = "#0000A3")
topbar = plt.Rectangle((0,0),1,1,fc="red", edgecolor = 'none')
bottombar = plt.Rectangle((0,0),1,1,fc='#0000A3', edgecolor = 'none')
l = plt.legend([bottombar, topbar], ['var1', 'var2'], loc=1, ncol = 2, prop={'size':18})
l.draw_frame(False)
#Optional code - Make plot look nicer
sns.despine(left=True)
bottom_plot.set_ylabel("Count")
# bottom_plot.set_xlabel("date")
# making sure our xticks is formatted correctly
## plt.xticks(fontsize=20) # not needed as you change font below in the loop
## Do not use at all
## bottom_plot.xaxis.set_major_locator(years)
## bottom_plot.xaxis.set_major_formatter(years_fmt)
## bottom_plot.xaxis.set_minor_locator(months)
#Set fonts to consistent 16pt size
for item in ([bottom_plot.xaxis.label, bottom_plot.yaxis.label] +
             bottom_plot.get_xticklabels() + bottom_plot.get_yticklabels()):
    item.set_fontsize(16)
## This part is required if you want to stick to seaborn
## This is because the moment you start using seaborn it will "re-position" the bars
## at integer position rather than dates. W/o seaborn there is no such need
x_dates = stacked_bar_data['date'].dt.strftime('%Y').sort_values().unique()
bottom_plot.set_xticklabels(labels=x_dates, rotation=0, ha='center')
plt.show()
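Because the bars sit at integer positions 0, 1, 2, ..., there is one tick per bar, so a list of unique years only lines up when there is one bar per year. A minimal sketch of one alternative, assuming monthly rows: build one label per bar and show the year only on the first bar of each year. The helper year_tick_labels and the sample dates are hypothetical, not part of the answer above.

```python
from datetime import date

def year_tick_labels(dates):
    """Return one label per bar: the year on the first bar of each year, '' elsewhere."""
    labels = []
    previous_year = None
    for d in dates:
        labels.append(str(d.year) if d.year != previous_year else "")
        previous_year = d.year
    return labels

# Hypothetical monthly index: Nov 2019 to Feb 2020
monthly = [date(2019, 11, 1), date(2019, 12, 1), date(2020, 1, 1), date(2020, 2, 1)]
labels = year_tick_labels(monthly)
print(labels)  # ['2019', '', '2020', '']
```

The result could then be applied with bottom_plot.set_xticks(range(len(labels))) followed by bottom_plot.set_xticklabels(labels).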
I am trying to plot stacked yearly line graphs by months.
I have a dataframe df_year as below:
Day Number of Bicycle Hires
2010-07-30 6897
2010-07-31 5564
2010-08-01 4303
2010-08-02 6642
2010-08-03 7966
The index is set to the date, which runs from July 2010 to July 2017.
I want to plot a line graph for each year, with the x-axis showing the months from Jan to Dec and only the total sum per month plotted.
I have achieved this by converting the dataframe to a pivot table as below:
pt = pd.pivot_table(df_year, index=df_year.index.month, columns=df_year.index.year, aggfunc='sum')
This creates the pivot table below, which I can plot as shown in the attached figure:
Number of Bicycle Hires 2010 2011 2012 2013 2014
1 NaN 403178.0 494325.0 565589.0 493870.0
2 NaN 398292.0 481826.0 516588.0 522940.0
3 NaN 556155.0 818209.0 504611.0 757864.0
4 NaN 673639.0 649473.0 658230.0 805571.0
5 NaN 722072.0 926952.0 749934.0 890709.0
[Figure: plot showing yearly data with months on the x-axis]
The only problem is that the months show up as integers, and I would like them to be shown as Jan, Feb, ..., Dec, with each line representing one year. I am also unable to add a legend for each year.
I have tried the following code to achieve this:
dims = (15,5)
fig, ax = plt.subplots(figsize=dims)
ax.plot(pt)
months = MonthLocator(range(1, 13), bymonthday=1, interval=1)
monthsFmt = DateFormatter("%b '%y")
ax.xaxis.set_major_locator(months) #adding this makes the month ints disappear
ax.xaxis.set_major_formatter(monthsFmt)
handles, labels = ax.get_legend_handles_labels() #legend is nowhere on the plot
ax.legend(handles, labels)
Can anyone please help me out with this? What am I doing incorrectly here?
Thanks!
There is nothing in your legend handles and labels. Furthermore, the DateFormatter is not returning the right values because the values you are formatting are not datetime objects.
You could set the index specifically for the dates, then drop the multiindex column level which is created by the pivot (the '0') and then use explicit ticklabels for the months whilst setting where they need to occur on your x-axis. As follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import datetime
# dummy data (Days)
dates_d = pd.date_range('2010-01-01', '2017-12-31', freq='D')
df_year = pd.DataFrame(np.random.randint(100, 200, (dates_d.shape[0], 1)), columns=['Data'])
df_year.index = dates_d #set index
pt = pd.pivot_table(df_year, index=df_year.index.month, columns=df_year.index.year, aggfunc='sum')
pt.columns = pt.columns.droplevel() # remove the double header (0) as pivot creates a multiindex.
ax = plt.figure().add_subplot(111)
ax.plot(pt)
ticklabels = [datetime.date(1900, item, 1).strftime('%b') for item in pt.index]
ax.set_xticks(np.arange(1,13))
ax.set_xticklabels(ticklabels) #add monthlabels to the xaxis
ax.legend(pt.columns.tolist(), loc='center left', bbox_to_anchor=(1, .5)) #add the column names as legend.
plt.tight_layout(rect=[0, 0, 0.85, 1])
plt.show()
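To illustrate why a date formatter gives the wrong labels for plain month numbers: Matplotlib treats numeric date values as days since an epoch, so the integers 1 to 12 all fall in the same month. The sketch below imitates that with the standard library only. It assumes the default epoch of 1970-01-01 (Matplotlib 3.3 and later) and an English (C) locale; as_date_label is a hypothetical helper, not a Matplotlib function.

```python
import calendar
from datetime import datetime, timedelta

# Assumption: Matplotlib's default date epoch (version 3.3 and later)
EPOCH = datetime(1970, 1, 1)

def as_date_label(value, fmt="%b '%y"):
    """Format a plain number as a date formatter would: as days since the epoch."""
    return (EPOCH + timedelta(days=value)).strftime(fmt)

month_numbers = range(1, 13)

# Month numbers read as "days since the epoch" all land in January 1970
misread = [as_date_label(m) for m in month_numbers]
print(sorted(set(misread)))  # ["Jan '70"]

# Explicit labels, one per month number, as in the answer above
month_labels = [calendar.month_abbr[m] for m in month_numbers]
print(month_labels[:3])  # ['Jan', 'Feb', 'Mar']
```

calendar.month_abbr is an alternative to the datetime.date(1900, item, 1).strftime('%b') expression and should give the same labels.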