Python Pandas Stacked Bar Chart x-axis labels - python

I've got the below dataframe:
Months Region Open Case ID Closed Case ID
April APAC 648888 648888
April US 157790
April UK 221456 221456
April APAC 425700
April US 634156 634156
April UK 109445
April APAC 442459 442459
May US 218526
May UK 317079 317079
May APAC 458098
May US 726342 726342
May UK 354155
May APAC 463582 463582
May US 511059
June UK 97186 97186
June APAC 681548
June US 799169 799169
June UK 210129
June APAC 935887 935887
June US 518106
June UK 69279 69279
and I am getting the counts of the Open Case ID and Closed Case ID with:
df = df.groupby(['Months','Region']).count()
I am trying to replicate the below chart generated by Excel, which looks like this:
and I am getting the below with:
df[['Months','Region']].plot.bar(stacked=True, rot=0, alpha=0.5, legend=False)
Is there a way to get the chart generated by python closer to the chart generated by Excel in terms of how the x-axis and its labels are broken down?

There is a great solution to a similar question about designing multi-index labels here. You can use the same plot parameters with ax=fig.gca() in that solution, i.e.:
import matplotlib.pyplot as plt
# add_line,label_len,label_group_bar_table from https://stackoverflow.com/a/39502106/4800652
fig = plt.figure()
ax = fig.add_subplot(111)
#Your df.plot code with ax parameter here
df.plot.bar(stacked=True, rot=0, alpha=0.5, legend=False, ax=fig.gca())
labels = ['' for item in ax.get_xticklabels()]
ax.set_xticklabels(labels)
ax.set_xlabel('')
label_group_bar_table(ax, df)
fig.subplots_adjust(bottom=.1*df.index.nlevels)
plt.show()
Output based on sample data:
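The helpers named in the code comment above (add_line, label_len, label_group_bar_table) come from the linked answer and are not reproduced in this post. As a rough sketch of how such helpers can work, assuming a DataFrame with a two-level index like the grouped df in the question, the version below writes one row of text per index level under the axes and draws a short separator at each group boundary. The counts in demo are made up for illustration, and the exact code in the linked answer may differ.

```python
from itertools import groupby

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.lines import Line2D


def add_line(ax, xpos, ypos):
    # Short vertical separator below the axes, in axes coordinates.
    line = Line2D([xpos, xpos], [ypos + 0.1, ypos],
                  transform=ax.transAxes, color="gray")
    line.set_clip_on(False)
    ax.add_line(line)


def label_len(index, level):
    # Consecutive runs of equal labels at one level: [(label, run_length), ...]
    labels = index.get_level_values(level)
    return [(key, sum(1 for _ in run)) for key, run in groupby(labels)]


def label_group_bar_table(ax, df):
    # One row of labels per index level, innermost level closest to the bars.
    ypos = -0.1
    scale = 1.0 / df.index.size
    for level in reversed(range(df.index.nlevels)):
        pos = 0
        for label, run in label_len(df.index, level):
            ax.text((pos + 0.5 * run) * scale, ypos, label,
                    ha="center", transform=ax.transAxes)
            add_line(ax, pos * scale, ypos)
            pos += run
        add_line(ax, pos * scale, ypos)
        ypos -= 0.1


# Made-up counts shaped like df.groupby(['Months', 'Region']).count()
index = pd.MultiIndex.from_product(
    [["April", "May"], ["APAC", "UK", "US"]], names=["Months", "Region"])
demo = pd.DataFrame({"Open Case ID": [3, 2, 2, 2, 2, 3],
                     "Closed Case ID": [2, 1, 1, 1, 1, 1]}, index=index)

fig, ax = plt.subplots()
demo.plot.bar(stacked=True, rot=0, alpha=0.5, legend=False, ax=ax)
ax.set_xticklabels(["" for _ in ax.get_xticklabels()])
ax.set_xlabel("")
label_group_bar_table(ax, demo)
fig.subplots_adjust(bottom=0.1 * demo.index.nlevels)
group_labels = [t.get_text() for t in ax.texts]
```

The bottom margin is scaled by the number of index levels so that each row of labels has room below the bars.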

Related

How to plot two bar diagrams for two columns of a given table using seaborn or matplotlib

This might be a simple task, but I am new to plotting in Python and am struggling to convert logic into code. I have 3 columns, shown below, containing Countries, Quantities and Revenues:
Country         Quantities  Revenues
United Kingdom  2915836     8125479.97
EIRE            87390       253026.10
Netherlands     127083      245279.99
Germany         72068       202050.01
France          68439       184024.28
Australia       52611       122974.01
Spain           18947       56444.29
Switzerland     18769       50671.57
Belgium         12068       34926.92
Norway          10965       32184.10
Japan           14207       31914.79
Portugal        10430       30247.57
Sweden          10720       24456.55
All I want to do is create side-by-side bars for each country, representing the revenue and quantity for each region.
So far, I have tried this:
sns.catplot(kind = 'bar', data = dj, y = 'Quantities,Revenues', x = 'Country', hue = 'Details')
plt.show()
But this cannot interpret the input "Country".
I hope I am making sense.
With pandas, you can simply use pandas.DataFrame.plot.bar :
dj.plot.bar(x="Country", figsize=(10, 5))
#dj[dj["Country"].ne("United Kingdom")].plot.bar(x="Country", figsize=(10, 5)) #to exclude UK
With seaborn, you can use seaborn.barplot after reshaping the original df with pandas.DataFrame.melt:
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots(figsize=(10, 5))
ax.tick_params(axis='x', rotation=90)
dj_m = dj.melt(id_vars="Country", value_name="Values", var_name="Variables")
sns.barplot(data=dj_m, x='Country', y="Values", hue="Variables", ax=ax)
# Output :
pandas has a built-in plotting interface, .plot. You choose the chart type either with an accessor method such as .plot.bar() or .plot.scatter(), or with the kind= argument, e.g. kind='bar' or kind='scatter'. Here you want a bar chart:
import matplotlib.pyplot as plt # import this to show the plot
dj.plot.bar(x="Country") # plot the bars; extra keyword arguments (e.g. figsize) are optional
plt.show() # show it
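To see why the melt step matters for seaborn, it can help to look at the long-format table it produces. The sketch below rebuilds the first three rows of the question's table by hand (the name dj follows the question) and reshapes them so that each row holds one country, one variable name and one value, which is the layout that hue= works with.

```python
import pandas as pd

# First three rows of the question's table, typed in by hand
dj = pd.DataFrame({
    "Country": ["United Kingdom", "EIRE", "Netherlands"],
    "Quantities": [2915836, 87390, 127083],
    "Revenues": [8125479.97, 253026.10, 245279.99],
})

# Wide to long: the Quantities and Revenues columns become rows
dj_m = dj.melt(id_vars="Country", value_name="Values", var_name="Variables")
print(dj_m)
```

Each country now appears twice, once per variable, so a single y="Values" column can feed both sets of bars.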

Seaborn lineplot, show months when index of your data is Date

I have a Kaggle dataset (link).
I read the dataset, and I set the Date to be index column:
museum_data = pd.read_csv("museum_visitors.csv", index_col = "Date", parse_dates = True)
Then museum_data looks like this:
Date        Avila Adobe  Firehouse Museum  Chinese American Museum  America Tropical Interpretive Center
2014-01-01  24778        4486              1581                     6602
2014-02-01  18976        4172              1785                     5029
...         ...          ...               ...                      ...
2018-10-01  19280        4622              2364                     3775
2018-11-01  17163        4082              2385                     4562
Here is the code I use to plot the lineplot in seaborn:
plt.figure(figsize = (20,8))
sns.lineplot(data = museum_data)
plt.show()
And, this is what the result looks like:
What I want to know is how I can show several months per year (not all of them; for example, the first month of each season) on the x-axis.
Thank you all for your time, in advance.
You can use MonthLocator and perhaps ConciseDateFormatter to add minor ticks with a few months showing, something like the following:
import matplotlib.dates as mdates
...
fig, ax = plt.subplots(figsize = (20,8))
sns.lineplot(data = museum_data, ax=ax)
locator = mdates.MonthLocator(bymonth=[4,7,10])
ax.xaxis.set_minor_locator(locator)
ax.xaxis.set_minor_formatter(mdates.ConciseDateFormatter(locator))
Output:
Edit (closer): you can add the following to show January as well:
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
Output:
Edit 2 (there's probably a better way but I'm rusty): to give the major ticks the same length and padding as the minor ticks:
length = plt.rcParams["xtick.minor.size"]
pad = plt.rcParams['xtick.minor.pad']
ax.tick_params('x', length=length, pad=pad)
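For anyone without the Kaggle file, the three snippets above can be combined into one runnable sketch. It assumes a synthetic stand-in called museum_demo (random numbers, only two of the museum columns) and draws the lines with plain ax.plot instead of sns.lineplot so that it depends only on matplotlib and pandas; the tick setup is the same as in the answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for museum_data: month-start dates as the index
dates = pd.date_range("2014-01-01", "2018-11-01", freq="MS")
rng = np.random.default_rng(0)
museum_demo = pd.DataFrame(
    {"Avila Adobe": rng.integers(15000, 30000, len(dates)),
     "Firehouse Museum": rng.integers(3000, 6000, len(dates))},
    index=dates)

fig, ax = plt.subplots(figsize=(20, 8))
for column in museum_demo.columns:
    ax.plot(museum_demo.index, museum_demo[column], label=column)
ax.legend()

# Minor ticks: April, July, October; major ticks: 1 January of each year
minor = mdates.MonthLocator(bymonth=[4, 7, 10])
ax.xaxis.set_minor_locator(minor)
ax.xaxis.set_minor_formatter(mdates.ConciseDateFormatter(minor))
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b\n%Y"))

# Give the major ticks the same length and padding as the minor ticks
ax.tick_params("x", length=plt.rcParams["xtick.minor.size"],
               pad=plt.rcParams["xtick.minor.pad"])
```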

How to plot daily data as monthly averages (for separate years)

I am trying to plot a graph to represent a monthly river discharge dataset from 1980-01-01 to 2013-12-31.
Please check out this graph
The plan is to plot "Jan Feb Mar Apr May...Dec" as the x-axis and the discharge (m3/s) as the y-axis. The actual lines on the graphs would represent the years. Alternatively, the lines on the graph would showcase monthly average (from jan to dec) of every year from 1980 to 2013.
DAT = pd.read_excel('Modelled Discharge_UIB_1980-2013_Daily.xlsx',
sheet_name='Karhmong', header=None, skiprows=1,
names=['year', 'month', 'day', 'flow'],
parse_dates={ 'date': ['year', 'month', 'day'] },
index_col='date')
The code above shows how the data is loaded; it looks like this:
date flow
1980-01-01 104.06
1980-01-02 103.81
1980-01-03 103.57
1980-01-04 103.34
1980-01-05 103.13
... ...
2013-12-27 105.65
2013-12-28 105.32
2013-12-29 105.00
2013-12-30 104.71
2013-12-31 104.42
Because I want to compare all the years to each other, I tried the commands below:
DAT1980 = DAT[DAT.index.year==1980]
DAT1980
DAT1981 = DAT[DAT.index.year==1981]
DAT1981
...etc
To group the months for the x-axis, I used:
datmonth = np.unique(DAT.index.month)
So far, none of these commands raised an error. However, the plot command below did:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
ax.plot(datmonth, DAT1980, color='purple', linestyle='--', label='1980')
ax.grid()
plt.legend()
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
axs.set_xlim(3, 5)
axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
This raised "ValueError: x and y must have same first dimension, but have shapes (12,) and (366, 1)".
I then tried:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
ax.plot(DAT.index.month, DAT.index.year==1980, color='purple', linestyle='--', label='1980')
ax.grid()
ax.plot(DAT.index.month, DAT.index.year==1981, color='black', marker='o', linestyle='-', label='C1981')
ax.grid()
plt.legend()
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
#axs.set_xlim(1, 12)
axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
This worked better than the previous attempt, but it is still not what I wanted (please check out the graph here).
My intention is to create a graph similar to this.
I wholeheartedly appreciate any suggestion you may have! Thank you so so much and if you need any further information please do not hesitate to ask, I will reply as soon as possible.
Welcome to SO! Nice job creating a clear description of your issue and showing lots of code : )
There are a few syntax issues here and there, but the main issue I see is that you need to add a groupby/aggregation operation at some point. That is, you have daily data, but your desired plot has monthly resolution (for each year). It sounds like you want an average of the daily values for each month for each year (correct me if that is wrong).
Here is some fake data:
import numpy as np
import pandas as pd
dr = pd.date_range('01-01-1980', '12-31-2013', freq='1D')
flow = np.random.rand(len(dr))
df = pd.DataFrame(flow, columns=['flow'], index=dr)
Looks like your example:
flow
1980-01-01 0.751287
1980-01-02 0.411040
1980-01-03 0.134878
1980-01-04 0.692086
1980-01-05 0.671108
...
2013-12-27 0.683654
2013-12-28 0.772894
2013-12-29 0.380631
2013-12-30 0.957220
2013-12-31 0.864612
[12419 rows x 1 columns]
You can use groupby to get a mean for each month, using the same datetime attributes you use above (with some additional methods to help make the data easier to work with)
monthly = (df.groupby([df.index.year, df.index.month])
             .mean()
             .rename_axis(index=['year', 'month'])
             .reset_index())
monthly has flow data for each month for each year, i.e. what you want to plot:
year month flow
0 1980 1 0.514496
1 1980 2 0.633738
2 1980 3 0.566166
3 1980 4 0.553763
4 1980 5 0.537686
.. ... ... ...
403 2013 8 0.402805
404 2013 9 0.479226
405 2013 10 0.446874
406 2013 11 0.526942
407 2013 12 0.599161
[408 rows x 3 columns]
Now to plot an individual year, you index it from monthly and plot the flow data. I use most of your axes formatting:
# make figure
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
# plotting for one year
sub = monthly[monthly['year'] == 1980]
ax.plot(sub['month'], sub['flow'], color='purple', linestyle='--', label='1980')
# some formatting
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
ax.set_xticks(range(1, 13))
ax.set_xticklabels(['J','F','M','A','M','J','J','A','S','O','N','D'])
ax.legend()
ax.grid()
Producing the following:
You could instead plot several years using a loop of some sort:
years = [1980, 1981, 1982, ...]
for year in years:
    sub = monthly[monthly['year'] == year]
    ax.plot(sub['month'], sub['flow'], ...)
You may run into some other challenges here (like finding a way to set nice styling for 30+ lines, and doing so in a loop). You can open a new post (building off of this one) if you can't find out how to accomplish something through other posts here. Best of luck!
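One possible way to handle the styling of 30+ lines in a loop is to take each year's colour from a colormap, so the lines shade gradually from the earliest to the latest year. This is only a sketch on random stand-in data like the fake data above; the choice of the "viridis" colormap and the two-column legend are assumptions made for illustration, not something the question asks for.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Random stand-in for the daily discharge data
dr = pd.date_range("1980-01-01", "2013-12-31", freq="D")
df = pd.DataFrame({"flow": np.random.default_rng(0).random(len(dr))}, index=dr)

# Mean flow for each (year, month) pair
monthly = (df.groupby([df.index.year, df.index.month])
             .mean()
             .rename_axis(index=["year", "month"])
             .reset_index())

years = sorted(monthly["year"].unique())
cmap = plt.get_cmap("viridis")

fig, ax = plt.subplots(figsize=(12, 6))
for i, year in enumerate(years):
    sub = monthly[monthly["year"] == year]
    # Colour runs from dark (early years) to light (late years)
    ax.plot(sub["month"], sub["flow"],
            color=cmap(i / (len(years) - 1)), label=str(year))

ax.set_xticks(range(1, 13))
ax.set_xticklabels(["J", "F", "M", "A", "M", "J", "J", "A", "S", "O", "N", "D"])
ax.set_xlabel("Month")
ax.set_ylabel("Discharge (m3/s)")
ax.legend(ncol=2, fontsize="small", loc="upper left", bbox_to_anchor=(1, 1))
```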

How can I plot a pandas dataframe and have groupings in the x-axis?

I have a pandas dataframe that is grouped by Month, and with each month is a Region with their respective Sales and Profit amounts.
I'm able to create a chart (link to the graphic is at the end of the post) but the X-axis displays the (Date, Region) for each vertical bar. What I'm looking for is to have each vertical bar labeled with only its Region, and then to group each set of 4 regions by Month.
Is this possible?
Python code:
fig, ax = plt.subplots(figsize=(17, 6))
result.plot(kind='bar', ax=ax)
plt.show()
Pandas Dataframe contents
Profit Sales
Order Date Region
2015-01-31 Central -602.8452 2510.5116
East -1633.3880 4670.9940
South -1645.0817 4965.8340
West 600.3079 6026.7360
2015-02-28 Central 330.9740 2527.5860
East 1806.5875 6463.1330
South 222.5703 1156.5140
West 453.7190 1804.1780
2015-03-31 Central 51.4141 6730.2680
East 1474.0029 6011.7410
South 3982.8631 10322.0950
West 4223.8177 15662.1480
2015-04-30 Central 992.0608 11642.0550
East 1095.2726 7778.7960
South 767.3671 5718.3335
West 1332.7957 9056.0240
2015-05-31 Central 963.9297 8623.9030
East 1633.5375 7481.4240
South 429.2514 1983.7040
West 1641.1504 12042.6555
...
Chart
If you want to simply change the ordering of the bars, you can change the order of your index and then sort the index:
result = result.reorder_levels(['Region', 'Order Date']).sort_index()
fig, ax = plt.subplots(figsize=(17, 6))
result.plot(kind='bar', ax=ax)  # draw on the axes created above
plt.show()
If you want 4 separate plots, you can try creating 4 subplots and mapping a filtered dataset for each region to one of the subplot axes:
# reorder levels for easier index slicing
result = result.reorder_levels(['Region', 'Order Date']).sort_index()
fig, axes = plt.subplots(1, 4, figsize=(17, 6))
result.loc['Central'].plot.bar(ax = axes[0])
result.loc['East'].plot.bar(ax = axes[1])
result.loc['South'].plot.bar(ax = axes[2])
result.loc['West'].plot.bar(ax = axes[3])
plt.show()
From there you can tweak your subplot titles, axis labels, add annotation, etc. to get it to look the way you want.
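A different approach, not part of the answer above, is to move Region out of the row index with unstack, so that each month becomes one tick on the x-axis and each region becomes one coloured bar within that month. The sketch below assumes a result frame shaped like the one in the question and types in the first two months of the Sales column by hand; it plots one measure at a time.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# First two months of the Sales column from the question, typed in by hand
index = pd.MultiIndex.from_product(
    [pd.to_datetime(["2015-01-31", "2015-02-28"]),
     ["Central", "East", "South", "West"]],
    names=["Order Date", "Region"])
result = pd.DataFrame(
    {"Sales": [2510.5116, 4670.9940, 4965.8340, 6026.7360,
               2527.5860, 6463.1330, 1156.5140, 1804.1780]},
    index=index)

# Move Region from the row index into the columns: one row per month
sales_by_region = result["Sales"].unstack("Region")
sales_by_region.index = sales_by_region.index.strftime("%Y-%m")

fig, ax = plt.subplots(figsize=(17, 6))
sales_by_region.plot(kind="bar", ax=ax, rot=0)
ax.set_ylabel("Sales")
```

With this layout the regions are identified by the legend rather than by a label under each bar, which may or may not be acceptable for the chart the question describes.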

Add months to xaxis and legend on a matplotlib line plot

I am trying to plot stacked yearly line graphs by months.
I have a dataframe df_year as below:
Day Number of Bicycle Hires
2010-07-30 6897
2010-07-31 5564
2010-08-01 4303
2010-08-02 6642
2010-08-03 7966
with the index set to the date going from 2010 July to 2017 July
I want to plot a line graph for each year with the xaxis being months from Jan to Dec and only the total sum per month is plotted
I have achieved this by converting the dataframe to a pivot table as below:
pt = pd.pivot_table(df_year, index=df_year.index.month, columns=df_year.index.year, aggfunc='sum')
This creates the pivot table below, which I can plot as shown in the attached figure:
Number of Bicycle Hires 2010 2011 2012 2013 2014
1 NaN 403178.0 494325.0 565589.0 493870.0
2 NaN 398292.0 481826.0 516588.0 522940.0
3 NaN 556155.0 818209.0 504611.0 757864.0
4 NaN 673639.0 649473.0 658230.0 805571.0
5 NaN 722072.0 926952.0 749934.0 890709.0
plot showing yearly data with months on xaxis
The only problem is that the months show up as integers and I would like them to be shown as Jan, Feb .... Dec with each line representing one year. And I am unable to add a legend for each year.
I have tried the following code to achieve this:
dims = (15,5)
fig, ax = plt.subplots(figsize=dims)
ax.plot(pt)
months = MonthLocator(range(1, 13), bymonthday=1, interval=1)
monthsFmt = DateFormatter("%b '%y")
ax.xaxis.set_major_locator(months) #adding this makes the month ints disappear
ax.xaxis.set_major_formatter(monthsFmt)
handles, labels = ax.get_legend_handles_labels() #legend is nowhere on the plot
ax.legend(handles, labels)
Please can anyone help me out with this, what am I doing incorrectly here?
Thanks!
There is nothing in your legend handles and labels. Furthermore, the DateFormatter does not return the right values, because the values you are formatting are not datetime objects.
You could set the index specifically for the dates, then drop the multiindex column level which is created by the pivot (the '0') and then use explicit ticklabels for the months whilst setting where they need to occur on your x-axis. As follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import datetime
# dummy data (Days)
dates_d = pd.date_range('2010-01-01', '2017-12-31', freq='D')
df_year = pd.DataFrame(np.random.randint(100, 200, (dates_d.shape[0], 1)), columns=['Data'])
df_year.index = dates_d #set index
pt = pd.pivot_table(df_year, index=df_year.index.month, columns=df_year.index.year, aggfunc='sum')
pt.columns = pt.columns.droplevel() # remove the double header (0) as pivot creates a multiindex.
ax = plt.figure().add_subplot(111)
ax.plot(pt)
ticklabels = [datetime.date(1900, item, 1).strftime('%b') for item in pt.index]
ax.set_xticks(np.arange(1,13))
ax.set_xticklabels(ticklabels) #add monthlabels to the xaxis
ax.legend(pt.columns.tolist(), loc='center left', bbox_to_anchor=(1, .5)) #add the column names as legend.
plt.tight_layout(rect=[0, 0, 0.85, 1])
plt.show()
