How to plot daily data as monthly averages (for separate years) - python

I am trying to plot a graph of a daily river discharge dataset (1980-01-01 to 2013-12-31) as monthly averages.
Please check out this graph
The plan is to plot "Jan Feb Mar Apr May...Dec" as the x-axis and the discharge (m3/s) as the y-axis. The actual lines on the graph would represent the years. In other words, each line would show the monthly averages (from Jan to Dec) of one year, for every year from 1980 to 2013.
DAT = pd.read_excel('Modelled Discharge_UIB_1980-2013_Daily.xlsx',
                    sheet_name='Karhmong', header=None, skiprows=1,
                    names=['year', 'month', 'day', 'flow'],
                    parse_dates={ 'date': ['year', 'month', 'day'] },
                    index_col='date')
The above shows how the data is loaded; it looks like this:
date flow
1980-01-01 104.06
1980-01-02 103.81
1980-01-03 103.57
1980-01-04 103.34
1980-01-05 103.13
... ...
2013-12-27 105.65
2013-12-28 105.32
2013-12-29 105.00
2013-12-30 104.71
2013-12-31 104.42
Because I want to compare all the years to each other, I tried the commands below:
DAT1980 = DAT[DAT.index.year==1980]
DAT1980
DAT1981 = DAT[DAT.index.year==1981]
DAT1981
...etc
To group the months for the x-axis, I tried the command
datmonth = np.unique(DAT.index.month)
So far none of these commands caused an error.
However, when I plotted the graph I got an error.
Graph plot command:
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
ax.plot(datmonth, DAT1980, color='purple', linestyle='--', label='1980')
ax.grid()
plt.legend()
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
axs.set_xlim(3, 5)
axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
This gave me the error "ValueError: x and y must have same first dimension, but have shapes (12,) and (366, 1)".
I then tried
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
ax.plot(DAT.index.month, DAT.index.year==1980, color='purple', linestyle='--', label='1980')
ax.grid()
ax.plot(DAT.index.month, DAT.index.year==1981, color='black', marker='o', linestyle='-', label='C1981')
ax.grid()
plt.legend()
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
#axs.set_xlim(1, 12)
axs.xaxis.set_major_formatter
fig.autofmt_xdate()
ax.legend(loc='upper left', bbox_to_anchor=(1, 1))
and it worked better than the previous graph but still not what I wanted
(please check out the graph here)
as my intention is to create a graph similar to this
I wholeheartedly appreciate any suggestion you may have! Thank you so so much and if you need any further information please do not hesitate to ask, I will reply as soon as possible.

Welcome to SO! Nice job creating a clear description of your issue and showing lots of code : )
There are a few syntax issues here and there, but the main issue I see is that you need to add a groupby/aggregation operation at some point. That is, you have daily data, but your desired plot has monthly resolution (for each year). It sounds like you want an average of the daily values for each month for each year (correct me if that is wrong).
Here is some fake data:
dr = pd.date_range('01-01-1980', '12-31-2013', freq='1D')
flow = np.random.rand(len(dr))
df = pd.DataFrame(flow, columns=['flow'], index=dr)
Looks like your example:
flow
1980-01-01 0.751287
1980-01-02 0.411040
1980-01-03 0.134878
1980-01-04 0.692086
1980-01-05 0.671108
...
2013-12-27 0.683654
2013-12-28 0.772894
2013-12-29 0.380631
2013-12-30 0.957220
2013-12-31 0.864612
[12419 rows x 1 columns]
You can use groupby to get a mean for each month, using the same datetime attributes you use above (with some additional methods to help make the data easier to work with)
monthly = (df.groupby([df.index.year, df.index.month])
             .mean()
             .rename_axis(index=['year', 'month'])
             .reset_index())
monthly has flow data for each month for each year, i.e. what you want to plot:
year month flow
0 1980 1 0.514496
1 1980 2 0.633738
2 1980 3 0.566166
3 1980 4 0.553763
4 1980 5 0.537686
.. ... ... ...
403 2013 8 0.402805
404 2013 9 0.479226
405 2013 10 0.446874
406 2013 11 0.526942
407 2013 12 0.599161
[408 rows x 3 columns]
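If a wide table is more convenient (one row per month, one column per year), the long monthly frame can be pivoted. This is only a minimal sketch that rebuilds the same kind of fake data; the name wide is illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

# same kind of fake daily data as above
dr = pd.date_range('1980-01-01', '2013-12-31', freq='D')
df = pd.DataFrame({'flow': np.random.rand(len(dr))}, index=dr)

# monthly mean for each (year, month) pair, as in the groupby above
monthly = (df.groupby([df.index.year, df.index.month])
             .mean()
             .rename_axis(index=['year', 'month'])
             .reset_index())

# reshape: months 1..12 as rows, years as columns
wide = monthly.pivot(index='month', columns='year', values='flow')
print(wide.shape)  # (12, 34)
```

Calling wide.plot() would then draw one line per year, with legend labels taken from the column names.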
Now to plot an individual year, you index it from monthly and plot the flow data. I use most of your axes formatting:
# make figure
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(12,6))
# plotting for one year
sub = monthly[monthly['year'] == 1980]
ax.plot(sub['month'], sub['flow'], color='purple', linestyle='--', label='1980')
# some formatting
ax.set_title('Monthly River Indus Discharge Comparison 1980-2013')
ax.set_ylabel('Discharge (m3/s)')
ax.set_xlabel('Month')
ax.set_xticks(range(1, 13))
ax.set_xticklabels(['J','F','M','A','M','J','J','A','S','O','N','D'])
ax.legend()
ax.grid()
Producing the following:
You could instead plot several years using a loop of some sort:
years = [1980, 1981, 1982, ...]
for year in years:
    sub = monthly[monthly['year'] == year]
    ax.plot(sub['month'], sub['flow'], ...)
You may run into some other challenges here (like finding a way to set nice styling for 30+ lines, and doing so in a loop). You can open a new post (building off of this one) if you can't find out how to accomplish something through other posts here. Best of luck!
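One possible way to handle the styling of 30+ lines is to sample one color per year from a sequential colormap inside the loop. The sketch below is only an illustration on fake data; the choice of the viridis colormap and the two-column legend are assumptions, not something the question asked for.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs in a script; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# fake daily data and monthly means, built the same way as above
dr = pd.date_range('1980-01-01', '2013-12-31', freq='D')
df = pd.DataFrame({'flow': np.random.rand(len(dr))}, index=dr)
monthly = (df.groupby([df.index.year, df.index.month])
             .mean()
             .rename_axis(index=['year', 'month'])
             .reset_index())

years = sorted(monthly['year'].unique())
# one color per year, from dark (early years) to light (late years)
colors = plt.cm.viridis(np.linspace(0, 1, len(years)))

fig, ax = plt.subplots(figsize=(12, 6))
for year, color in zip(years, colors):
    sub = monthly[monthly['year'] == year]
    ax.plot(sub['month'], sub['flow'], color=color, label=str(year))

ax.set_xticks(range(1, 13))
ax.set_xticklabels(['J', 'F', 'M', 'A', 'M', 'J', 'J', 'A', 'S', 'O', 'N', 'D'])
ax.set_xlabel('Month')
ax.set_ylabel('Discharge (m3/s)')
# place the long legend outside the axes, split over two columns
ax.legend(ncol=2, fontsize='small', loc='upper left', bbox_to_anchor=(1, 1))
fig.tight_layout()
```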

Related

how to plot line graphs with an iterate method and assign proper labels at each of them

This is the dataset I am working on
Update Pb95 Pb98 diesel heating oil
0 6/2/2022 7519 8311 7172 5582
1 6/1/2022 7406 8194 6912 5433
2 5/31/2022 7213 7950 6754 5394
3 5/28/2022 7129 7864 6711 5360
4 5/27/2022 7076 7798 6704 5366
5 5/26/2022 6895 7504 6502 5182
6 5/25/2022 6714 7306 6421 5130
7 5/24/2022 6770 7358 6405 5153
8 5/21/2022 6822 7421 6457 5216
9 5/20/2022 6826 7430 6523 5281
I am attempting to create some elegant graphs to represent the relationship between time and price change. I have used the following code for a single graph
import matplotlib.pyplot as plt
plt.plot(df['Update'], df['Pb95'], label='sales', linewidth=3)
plt.rcParams["figure.figsize"] = (8,8)
#add title and axis labels
plt.title('Fuels price by Date')
plt.xlabel('Date')
plt.ylabel('Price (PLN)')
#add legend
plt.legend()
#display plot
plt.show()
Now I would like to plot the line for each fuel in a separate plot by using a for loop. I have used the following code:
df_columns = df.iloc[:,1:6]
for i in df_columns:
    plt.plot(df['Update'], df[i], label='sales', linewidth=3)
    plt.title('Fuels price by Date')
    plt.xlabel('Date')
    plt.ylabel('Price (PLN)')
plt.legend()
plt.show()
I have obtained a unique plot
but now I would like to assign the proper fuel name to each of the lines. Could anyone help with either of the two ways (separate plots or the same plot) in which I would like to represent this relationship?
I would like above all to learn how to assign labels (for one or the other way to plot a graph) by an iterated method.
Thanks
You can do that by modifying your code slightly. I'm using the dataframe excerpt you provided.
for column in df.columns[1:]:
    plt.plot(df['Update'], df[column], label=column, linewidth=3)
plt.title('Fuels price by Date')
plt.xlabel('Date')
plt.ylabel('Price (PLN)')
plt.legend()
plt.show()
Gives this:
Alternatively, you can do this by pandas' own plot method.
df.plot.line(x='Update', y=['Pb95', 'Pb98', 'diesel', 'heating oil'])
Gives this:
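For the separate-plots variant mentioned in the question, one option is a column of subplots that share the x-axis, with the label taken from the column name in each iteration. The sketch below uses the first five rows of the excerpt and assumes the Update strings are month/day/year; parsing and sorting them is an addition, so that the x-axis runs chronologically.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs in a script; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# first five rows of the excerpt from the question
df = pd.DataFrame({
    'Update': ['6/2/2022', '6/1/2022', '5/31/2022', '5/28/2022', '5/27/2022'],
    'Pb95': [7519, 7406, 7213, 7129, 7076],
    'Pb98': [8311, 8194, 7950, 7864, 7798],
    'diesel': [7172, 6912, 6754, 6711, 6704],
    'heating oil': [5582, 5433, 5394, 5360, 5366],
})

# assumption: dates are month/day/year; parse and sort them
df['Update'] = pd.to_datetime(df['Update'], format='%m/%d/%Y')
df = df.sort_values('Update')

fuels = df.columns[1:]
fig, axes = plt.subplots(len(fuels), 1, sharex=True, figsize=(8, 8))
for ax, fuel in zip(axes, fuels):
    # the column name doubles as the legend label
    ax.plot(df['Update'], df[fuel], label=fuel, linewidth=3)
    ax.set_ylabel('Price (PLN)')
    ax.legend(loc='upper left')

axes[0].set_title('Fuels price by Date')
axes[-1].set_xlabel('Date')
fig.autofmt_xdate()
```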

Seaborn lineplot, show months when index of your data is Date

I have a Kaggle dataset (link).
I read the dataset, and I set the Date to be index column:
museum_data = pd.read_csv("museum_visitors.csv", index_col = "Date", parse_dates = True)
Then museum_data looks like this:
Date        Avila Adobe  Firehouse Museum  Chinese American Museum  America Tropical Interpretive Center
2014-01-01  24778        4486              1581                     6602
2014-02-01  18976        4172              1785                     5029
...         ...          ...               ...                      ...
2018-10-01  19280        4622              2364                     3775
2018-11-01  17163        4082              2385                     4562
Here is the code I use to plot the lineplot in seaborn:
plt.figure(figsize = (20,8))
sns.lineplot(data = museum_data)
plt.show()
And, this is what the result looks like:
What I want to know is how I can show several months per year on the x-axis (not all of them; for example, the first month of each season).
Thank you all for your time, in advance.
You can use MonthLocator and perhaps ConciseDateFormatter to add minor ticks with a few months showing, something like the following:
import matplotlib.dates as mdates
...
fig, ax = plt.subplots(figsize = (20,8))
sns.lineplot(data = museum_data, ax=ax)
locator = mdates.MonthLocator(bymonth=[4,7,10])
ax.xaxis.set_minor_locator(locator)
ax.xaxis.set_minor_formatter(mdates.ConciseDateFormatter(locator))
Output:
Edit (closer): you can add the following to show January as well:
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
Output:
Edit 2 (there's probably a better way but I'm rusty):
length = plt.rcParams["xtick.minor.size"]
pad = plt.rcParams['xtick.minor.pad']
ax.tick_params('x', length=length, pad=pad)
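A simpler variant, if uniform ticks are acceptable, is to put all four months on the major locator so that no minor/major size matching is needed. This is a sketch on made-up visitor counts using plain matplotlib; with seaborn the same locator and formatter calls should apply to the ax passed to sns.lineplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs in a script; remove in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic monthly counts (made-up numbers, same date range as the question)
idx = pd.date_range('2014-01-01', '2018-11-01', freq='MS')
rng = np.random.default_rng(0)
museum_data = pd.DataFrame(
    {'Avila Adobe': rng.integers(15000, 30000, len(idx))}, index=idx)

fig, ax = plt.subplots(figsize=(20, 8))
ax.plot(museum_data.index, museum_data['Avila Adobe'], label='Avila Adobe')

# one major tick in January, April, July and October of every year
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonth=[1, 4, 7, 10]))
# month abbreviation on the first line, year on the second
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b\n%Y'))
ax.legend()

# months at which ticks are actually placed
tick_months = sorted({d.month for d in mdates.num2date(ax.get_xticks())})
print(tick_months)
```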

Display Multiple Year's Data Using Custom Start/End Dates - datetime, matplotlib

I feel like this question has an obvious answer and I'm just being a bit of a fool. Say you have a couple of dataframes with datetime indices, where each dataframe is for a different year. In my case the index is every day going from June 25th to June 24th the next year:
date var
2019-06-25 107.230294
2019-06-26 104.110004
2019-06-27 104.291506
2019-06-28 111.162552
2019-06-29 112.515364
...
2020-06-20 132.840242
2020-06-21 127.641148
2020-06-22 132.797584
2020-06-23 129.094451
2020-06-24 110.408866
What I want is a single plot with multiple lines, where each line represents a year. The y-axis is my variable, var, and the x-axis should be day of the year. The x-axis should start from June 25th and end at June 24th.
This is what I've tried so far but it messes up the x-axis. Anyone know a more elegant way to do this?
fig, ax = plt.subplots()
plt.plot(average_prices19.index.strftime("%d/%m"), average_prices19['var'], label = "2019-20")
plt.plot(average_prices20.index.strftime("%d/%m"), average_prices20['var'], label = "2020-21")
plt.legend()
plt.show()
Well, there is a twist in this question: the list of dates in a year is not constant: on leap years there is a 'Feb-29' that is otherwise absent.
If you are comfortable glossing over this (and always representing a potential 'Feb-29' date on your plot, with missing data for non-leap years), then the following will achieve what you are seeking (assuming the data is in df with the date as DatetimeIndex):
import matplotlib.dates as mdates
fig, ax = plt.subplots()
for label, dfy in df.assign(
    # note: 2000 is a leap year; the choice is deliberate
    date=pd.to_datetime(df.index.strftime('2000-%m-%d')),
    label=df.index.strftime('%Y')
).groupby('label'):
    dfy.set_index('date')['var'].plot(ax=ax, label=str(label))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
ax.legend()
Update
For larger amounts of data, however, the above does not produce very legible xlabels. So instead, we can use ConciseDateFormatter to customize the display of xlabels (and remove the fake year 2000):
import matplotlib.dates as mdates
fig, ax = plt.subplots()
for label, dfy in df.assign(
    # note: 2000 is a leap year; the choice is deliberate
    date=pd.to_datetime(df.index.strftime('2000-%m-%d')),
    label=df.index.strftime('%Y')
).groupby('label'):
    dfy.set_index('date')['var'].plot(ax=ax, label=str(label))
ax.legend()
locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
formatter = mdates.ConciseDateFormatter(
    locator,
    formats=['', '%b', '%d', '%H:%M', '%H:%M', '%S.%f'],
    offset_formats=['', '', '%b', '%b-%d', '%b-%d', '%b-%d %H:%M']
)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
For the data in your example:
For more data:
# setup
idx = pd.date_range('2016-01-01', 'now', freq='QS')
df = pd.DataFrame(
{'var': np.random.uniform(size=len(idx))},
index=idx).resample('D').interpolate(method='polynomial', order=5)
Corresponding plot (with ConciseDateFormatter):
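If the x-axis should literally run from June 25th to June 24th, one possible variation is to group by season rather than calendar year and map every date onto a single reference season. The sketch below is an assumption-laden illustration on made-up data: the column names season and ref_date are hypothetical, and the reference season 2019-06-25 to 2020-06-24 is chosen because 2020 is a leap year.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs in a script; remove in a notebook
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic data covering two June-25-to-June-24 seasons (values are made up)
idx = pd.date_range('2019-06-25', '2021-06-24', freq='D')
df = pd.DataFrame({'var': np.random.uniform(100, 130, len(idx))}, index=idx)

year = df.index.year.to_numpy()
month = df.index.month.to_numpy()
day = df.index.day.to_numpy()

# a date before June 25 belongs to the season that started the previous year
before_start = (month < 6) | ((month == 6) & (day < 25))
season = np.where(before_start, year - 1, year)

# map every date onto the reference season 2019-06-25 .. 2020-06-24;
# February always lands in 2020 (a leap year), so a real Feb 29 stays valid
ref_date = pd.to_datetime(pd.DataFrame(
    {'year': 2019 + (year - season), 'month': month, 'day': day}))
df = df.assign(season=season, ref_date=ref_date.to_numpy())

fig, ax = plt.subplots()
for s, dfs in df.groupby('season'):
    # label such as "2019-20"
    ax.plot(dfs['ref_date'], dfs['var'], label=f'{s}-{str(s + 1)[-2:]}')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%d %b'))
ax.legend()
```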

Add months to xaxis and legend on a matplotlib line plot

I am trying to plot stacked yearly line graphs by months.
I have a dataframe df_year as below:
Day Number of Bicycle Hires
2010-07-30 6897
2010-07-31 5564
2010-08-01 4303
2010-08-02 6642
2010-08-03 7966
with the index set to the date going from 2010 July to 2017 July
I want to plot a line graph for each year, with the x-axis showing the months from Jan to Dec and only the total sum per month plotted.
I have achieved this by converting the dataframe to a pivot table as below:
pt = pd.pivot_table(df_year, index=df_year.index.month, columns=df_year.index.year, aggfunc='sum')
This creates the pivot table below, which I can plot as shown in the attached figure:
Number of Bicycle Hires 2010 2011 2012 2013 2014
1 NaN 403178.0 494325.0 565589.0 493870.0
2 NaN 398292.0 481826.0 516588.0 522940.0
3 NaN 556155.0 818209.0 504611.0 757864.0
4 NaN 673639.0 649473.0 658230.0 805571.0
5 NaN 722072.0 926952.0 749934.0 890709.0
plot showing yearly data with months on xaxis
The only problem is that the months show up as integers and I would like them to be shown as Jan, Feb .... Dec with each line representing one year. And I am unable to add a legend for each year.
I have tried the following code to achieve this:
dims = (15,5)
fig, ax = plt.subplots(figsize=dims)
ax.plot(pt)
months = MonthLocator(range(1, 13), bymonthday=1, interval=1)
monthsFmt = DateFormatter("%b '%y")
ax.xaxis.set_major_locator(months) #adding this makes the month ints disappear
ax.xaxis.set_major_formatter(monthsFmt)
handles, labels = ax.get_legend_handles_labels() #legend is nowhere on the plot
ax.legend(handles, labels)
Please can anyone help me out with this, what am I doing incorrectly here?
Thanks!
There is nothing in your legend handles and labels; furthermore, the DateFormatter is not returning the right values because the values you are formatting are not datetime objects.
You could set the index specifically for the dates, then drop the multiindex column level which is created by the pivot (the '0') and then use explicit ticklabels for the months whilst setting where they need to occur on your x-axis. As follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import datetime
# dummy data (Days)
dates_d = pd.date_range('2010-01-01', '2017-12-31', freq='D')
df_year = pd.DataFrame(np.random.randint(100, 200, (dates_d.shape[0], 1)), columns=['Data'])
df_year.index = dates_d #set index
pt = pd.pivot_table(df_year, index=df_year.index.month, columns=df_year.index.year, aggfunc='sum')
pt.columns = pt.columns.droplevel() # remove the double header (0) as pivot creates a multiindex.
ax = plt.figure().add_subplot(111)
ax.plot(pt)
ticklabels = [datetime.date(1900, item, 1).strftime('%b') for item in pt.index]
ax.set_xticks(np.arange(1,13))
ax.set_xticklabels(ticklabels) #add monthlabels to the xaxis
ax.legend(pt.columns.tolist(), loc='center left', bbox_to_anchor=(1, .5)) #add the column names as legend.
plt.tight_layout(rect=[0, 0, 0.85, 1])
plt.show()
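A possible shortcut, sketched here on the same kind of dummy data: if the value column is named in the pivot, the columns stay single-level, and the DataFrame's own plot method labels each line with its year. The helper columns year and month are assumptions added for illustration.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs in a script; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# dummy daily data, as in the answer above
dates_d = pd.date_range('2010-01-01', '2017-12-31', freq='D')
df_year = pd.DataFrame({'Data': np.random.randint(100, 200, len(dates_d))},
                       index=dates_d)

# naming the value column keeps the pivot's columns single-level (just the years)
pt = (df_year.assign(year=df_year.index.year, month=df_year.index.month)
             .pivot_table(index='month', columns='year', values='Data', aggfunc='sum'))

# one line per year; the legend labels come from the column names
ax = pt.plot(figsize=(15, 5))
ax.set_xticks(range(1, 13))
ax.set_xticklabels(calendar.month_abbr[1:])  # month abbreviations in the current locale
ax.legend(title='Year', loc='center left', bbox_to_anchor=(1, 0.5))
plt.tight_layout()
```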

In Pandas, generate DateTime index from Multi-Index with years and weeks

I have a DataFrame df with columns saledate (datetime, dtype <M8[ns]) and price (dtype int64), such that if I plot them like
fig, ax = plt.subplots()
ax.plot_date(dfp['saledate'],dfp['price']/1000.0,'.')
ax.set_xlabel('Date of sale')
ax.set_ylabel('Price (1,000 euros)')
I get a scatter plot which looks like below.
Since there are so many points that it is difficult to discern an average trend, I'd like to compute the average sale price per week, and plot that in the same plot. I've tried the following:
dfp_week = dfp.groupby([dfp['saledate'].dt.year, dfp['saledate'].dt.week]).mean()
If I plot the resulting 'price' column like this
plt.figure()
plt.plot(dfp_week['price'].values/1000.0)
plt.ylabel('Price (1,000 euros)')
I can more clearly discern an increasing trend (see below).
The problem is that I no longer have a time axis to plot this DataSeries in the same plot as the previous figure. The time axis starts like this:
longitude_4pp postal_code_4pp price rooms \
saledate saledate
2014 1 4.873140 1067.5 206250.0 2.5
6 4.954779 1102.0 129000.0 3.0
26 4.938828 1019.0 327500.0 3.0
40 4.896904 1073.0 249000.0 2.0
43 4.938828 1019.0 549000.0 5.0
How could I convert this Multi-Index with years and weeks back to a single DateTime index that I can plot my per-week-averaged data against?
If you group using pd.TimeGrouper (replaced by pd.Grouper in current pandas), you'll keep datetimes in your index.
dfp.groupby(pd.TimeGrouper('W')).mean()
Alternatively, create a new index from the (year, week) pairs:
i = pd.Index(pd.datetime(year, 1, 1) + pd.Timedelta(7 * weeks, unit='d') for year, weeks in df.index)
Then set this new index on the DataFrame:
df.index = i
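In recent pandas versions pd.datetime and the .dt.week accessor are no longer available, so the same idea can be sketched with .dt.isocalendar() and the standard library's date.fromisocalendar (Python 3.8+). This is a hedged variant on made-up sales data, not the original answer's code; it labels each week by its ISO Monday.

```python
import datetime

import numpy as np
import pandas as pd

# made-up sales spread over 2014
rng = np.random.default_rng(0)
dfp = pd.DataFrame({
    'saledate': pd.Timestamp('2014-01-01')
                + pd.to_timedelta(rng.integers(0, 365, 200), unit='D'),
    'price': rng.integers(100_000, 600_000, 200),
})

# ISO year and ISO week number of every sale
iso = dfp['saledate'].dt.isocalendar()
weekly = dfp.groupby([iso['year'], iso['week']])['price'].mean()

# turn each (ISO year, ISO week) pair into the Monday of that week
weekly.index = pd.to_datetime(
    [datetime.date.fromisocalendar(int(y), int(w), 1) for y, w in weekly.index]
)
print(weekly.head())
```

The resulting series has a DatetimeIndex and can be drawn on the same axes as the daily scatter plot.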
For the sake of completeness, here are the details of how I implemented the solution suggested by piRSquared:
fig, ax = plt.subplots()
ax.plot_date(dfp['saledate'],dfp['price']/1000.0,'.')
ax.set_xlabel('Date of sale')
ax.set_ylabel('Price (1,000 euros)')
dfp_week = dfp.groupby(pd.TimeGrouper(key='saledate', freq='W')).mean()
plt.plot_date(dfp_week.index, dfp_week['price']/1000.0)
which yields the plot below.
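With a current pandas, where pd.TimeGrouper is no longer available, the grouping step can be written with pd.Grouper or, equivalently, with resample. A minimal sketch on made-up data, assuming the same saledate and price columns as in the question:

```python
import numpy as np
import pandas as pd

# made-up sales spread over 2014
rng = np.random.default_rng(0)
dfp = pd.DataFrame({
    'saledate': pd.Timestamp('2014-01-01')
                + pd.to_timedelta(rng.integers(0, 365, 200), unit='D'),
    'price': rng.integers(100_000, 600_000, 200),
})

# weekly mean price, keeping a DatetimeIndex (weeks ending on Sunday)
dfp_week = dfp.groupby(pd.Grouper(key='saledate', freq='W'))['price'].mean()

# the same aggregation expressed with resample
dfp_week_alt = dfp.resample('W', on='saledate')['price'].mean()

print(dfp_week.head())
```

Weeks without any sale show up as NaN, which matplotlib leaves as gaps in the line.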
