Display Multiple Year's Data Using Custom Start/End Dates - datetime, matplotlib

I feel like this question has an obvious answer and I'm just being a bit of a fool. Say you have a couple of dataframes with datetime indices, where each dataframe is for a different year. In my case the index is every day going from June 25th to June 24th the next year:
date var
2019-06-25 107.230294
2019-06-26 104.110004
2019-06-27 104.291506
2019-06-28 111.162552
2019-06-29 112.515364
...
2020-06-20 132.840242
2020-06-21 127.641148
2020-06-22 132.797584
2020-06-23 129.094451
2020-06-24 110.408866
What I want is a single plot with multiple lines, where each line represents a year. The y-axis is my variable, var, and the x-axis should be day of the year. The x-axis should start from June 25th and end at June 24th.
This is what I've tried so far but it messes up the x-axis. Anyone know a more elegant way to do this?
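import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# ["var"] selects the column; attribute access (.var) would return the DataFrame.var method
ax.plot(average_prices19.index.strftime("%d/%m"), average_prices19["var"], label="2019-20")
ax.plot(average_prices20.index.strftime("%d/%m"), average_prices20["var"], label="2020-21")
ax.legend()
plt.show()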

Well, there is a twist in this question: the list of dates in a year is not constant: on leap years there is a 'Feb-29' that is otherwise absent.
If you are comfortable glossing over this (and always reserving a slot for a potential 'Feb-29' on your plot, with missing data for non-leap years), then the following will achieve what you are seeking (assuming the data is in df, with the dates as a DatetimeIndex):
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd
fig, ax = plt.subplots()
for label, dfy in df.assign(
    # note: 2000 is a leap year; the choice is deliberate
    date=pd.to_datetime(df.index.strftime('2000-%m-%d')),
    label=df.index.strftime('%Y')
).groupby('label'):
    dfy.set_index('date')['var'].plot(ax=ax, label=str(label))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m-%d"))
ax.legend()
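To see why the reference year has to be a leap year, here is a small check. It is only an illustration: the three sample dates and the non-leap year 2001 are arbitrary choices, not part of the original answer.

```python
import pandas as pd

# Three real dates around a leap day (arbitrary sample).
idx = pd.to_datetime(["2020-02-28", "2020-02-29", "2020-03-01"])

# Re-dating onto a leap reference year keeps Feb 29 intact.
on_leap_year = pd.to_datetime(idx.strftime("2000-%m-%d"))

# Re-dating onto a non-leap year builds the invalid string "2001-02-29";
# errors="coerce" turns it into NaT instead of raising an error.
on_common_year = pd.to_datetime(idx.strftime("2001-%m-%d"), errors="coerce")

print(on_leap_year.notna().all())      # whether every date survived
print(on_common_year.isna().tolist())  # which dates were lost
```

Without errors="coerce", the second conversion would raise on the invalid date, which is the failure the choice of year 2000 avoids.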
Update
For larger amounts of data, however, the above does not produce very legible x-axis tick labels. Instead, we can use ConciseDateFormatter to customize the tick labels (and hide the fake year 2000):
import matplotlib.dates as mdates
fig, ax = plt.subplots()
for label, dfy in df.assign(
    # note: 2000 is a leap year; the choice is deliberate
    date=pd.to_datetime(df.index.strftime('2000-%m-%d')),
    label=df.index.strftime('%Y')
).groupby('label'):
    dfy.set_index('date')['var'].plot(ax=ax, label=str(label))
ax.legend()
locator = mdates.AutoDateLocator(minticks=3, maxticks=7)
formatter = mdates.ConciseDateFormatter(
    locator,
    formats=['', '%b', '%d', '%H:%M', '%H:%M', '%S.%f'],
    offset_formats=['', '', '%b', '%b-%d', '%b-%d', '%b-%d %H:%M']
)
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(formatter)
For the data in your example: [Figure: plot for the example data]
For more data:
# setup (the polynomial interpolation requires SciPy)
import numpy as np
import pandas as pd
idx = pd.date_range('2016-01-01', 'now', freq='QS')
df = pd.DataFrame(
    {'var': np.random.uniform(size=len(idx))},
    index=idx).resample('D').interpolate(method='polynomial', order=5)
Corresponding plot (with ConciseDateFormatter):
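The code above groups by calendar year. For the June 25th to June 24th seasons described in the question, one possible adaptation is to map every date onto a single reference season running from 1999-06-25 to 2000-06-24, so that a Feb 29 still has a slot. This is a sketch, not the answerer's method: the helper add_season_axis, the column names ref_date and season, and the random sample data are all made up for illustration, and the approach assumes the season starts after February.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates


def add_season_axis(df, start_month=6, start_day=25):
    """Hypothetical helper: add 'ref_date' and 'season' columns to a copy of df."""
    idx = df.index
    # True for dates on or after the season start within their calendar year.
    first_part = (idx.month > start_month) | (
        (idx.month == start_month) & (idx.day >= start_day)
    )
    # First part of a season goes to 1999, the rest to leap year 2000.
    ref = pd.DataFrame({
        "year": np.where(first_part, 1999, 2000),
        "month": idx.month,
        "day": idx.day,
    })
    start_year = np.where(first_part, idx.year, idx.year - 1)
    season = [f"{y}-{(y + 1) % 100:02d}" for y in start_year]
    return df.assign(ref_date=pd.to_datetime(ref).to_numpy(), season=season)


# Made-up sample data covering two full seasons.
idx = pd.date_range("2019-06-25", "2021-06-24", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame({"var": rng.normal(100, 5, len(idx))}, index=idx)
seasons = add_season_axis(df)

fig, ax = plt.subplots()
for season, grp in seasons.groupby("season"):
    ax.plot(grp["ref_date"], grp["var"], label=season)
ax.set_xlim(pd.Timestamp("1999-06-25"), pd.Timestamp("2000-06-24"))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m"))
ax.legend()
```

Each line then starts at 25/06 on the left and ends at 24/06 on the right; call plt.show() to display the figure.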

Related

Clustered x-axis with the dates not showing clearly

I'm trying to plot a graph of a monthly time series that runs from 1959 to 2019, and when I try plotting it I get a cluttered x-axis where the dates do not show properly. How can I remove the months and show only the years on the x-axis, so that it is less cluttered and the years are readable?
fig,ax = plt.subplots(2,1)
ax[0].hist(pca_function(sd_Data))
ax[0].set_ylabel ('Frequency')
ax[1].plot(pca_function(sd_Data))
ax[1].set_xlabel ('Years')
fig.suptitle('Histogram and Time series of Plot Factor')
plt.tight_layout()
# fig.savefig('factor1959.pdf')
pca_function(sd_Data)
comp_0
sasdate
1959-01 -0.418150
1959-02 1.341654
1959-03 1.684372
1959-04 1.981473
1959-05 1.242232
...
2019-08 -0.075270
2019-09 -0.402110
2019-10 -0.609002
2019-11 0.320586
2019-12 -0.303515
[732 rows x 1 columns]

From what I see, you do have years on your second subplot; they just overlap because there are too many of them placed horizontally. Try increasing figsize and rotating the ticks:
# Builds an example dataframe.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(columns=['Years', 'Frequency'])
# 'M' means month-end frequency; newer pandas releases (2.2+) prefer the alias 'ME'
df['Years'] = pd.date_range(start='1/1/1959', end='1/1/2023', freq='M')
df['Frequency'] = np.random.normal(0, 1, size=(df.shape[0]))
fig, ax = plt.subplots(2, 1, figsize=(20, 5))
ax[0].hist(df.Frequency)
ax[0].set_ylabel('Frequency')
ax[1].plot(df.Years, df.Frequency)
ax[1].set_xlabel('Years')
# Rotate and right-align the tick labels of both subplots
for tick in ax[0].get_xticklabels():
    tick.set_rotation(45)
    tick.set_ha('right')
for tick in ax[1].get_xticklabels():
    tick.set_rotation(45)
    tick.set_ha('right')
fig.suptitle('Histogram and Time series of Plot Factor')
plt.tight_layout()
P.S. If the x-labels still overlap, try increasing the spacing between ticks.
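One way to space the ticks out and show years only is matplotlib's YearLocator together with a "%Y" DateFormatter. This is a minimal sketch on made-up data: the 10-year spacing and the random walk are arbitrary choices, and it assumes the x values are real datetimes rather than strings.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up monthly series spanning 1959-01 to 2019-12.
idx = pd.date_range("1959-01-01", "2019-12-01", freq="MS")
values = np.random.default_rng(0).normal(size=len(idx)).cumsum()

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(idx, values)

# One major tick every 10 years, labelled with the year only.
ax.xaxis.set_major_locator(mdates.YearLocator(base=10))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.set_xlabel("Years")

fig.canvas.draw()
tick_years = [mdates.num2date(t).year for t in ax.get_xticks()]
```

A smaller base (for example 5) gives denser ticks if the figure is wide enough.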

First off, you need to store the result of the call to pca_function in a variable, e.g. one called result_pca_func. That way, the calculations (and possibly side effects or different randomization) are only done once.
Second, the dates should be converted to a datetime format, for example using pd.to_datetime(). That way, matplotlib can automatically place year ticks as appropriate.
Here is an example, starting from a dummy test dataframe:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date': [f'{y}-{m:02d}' for y in range(1959, 2019) for m in range(1, 13)]})
df['Values'] = np.random.randn(len(df)).cumsum()
df = df.set_index('Date')
result_pca_func = df
result_pca_func.index = pd.to_datetime(result_pca_func.index)
fig, ax2 = plt.subplots(figsize=(10, 3))
ax2.plot(result_pca_func)
plt.tight_layout()
plt.show()

Ensuring first and last date ticks in x-axis - Matplotlib

Currently I am charting data from some historical point to a point in current time. For example, January 2019 to TODAY (February 2021). However, my matplotlib chart only shows dates from January 2019 to January 2021 on the x-axis (with the last February tick missing) even though the data is charted to today's date on the actual plot.
Is there any way to ensure that the first and last month is always reflected on the x-axis chart? In other words, I would like the x-axis to have the range displayed (inclusive).
Picture of x axis (missing February 2021)
The data charted here is from January 2019 to TODAY (February 12th).
Here is my code for the date format:
fig.autofmt_xdate()
date_format = mdates.DateFormatter("%b-%y")
ax.xaxis.set_major_formatter(date_format)
EDIT: The numbers after each month represent years.

I am not aware of any way to do this other than by creating the ticks from scratch.
In the following example, a list of monthly ticks is built from the DatetimeIndex of a pandas dataframe by taking the first timestamp of each month, from the month of the first date (January 25th) up to the start of the last, ongoing month. The step variable is increased until no more than 10 ticks remain; the last month is then appended, and np.unique removes it again when it duplicates an existing tick. The labels are formatted from the tick timestamps.
This solution works for any frequency smaller than yearly:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset
start_date = '2019-01-25'
end_date = '2021-02-12'
rng = np.random.default_rng(seed=123) # random number generator
dti = pd.date_range(start_date, end_date, freq='D')
variable = 100 + rng.normal(size=dti.size).cumsum()
df = pd.DataFrame(dict(variable=variable), index=dti)
# Create matplotlib plot
fig, ax = plt.subplots(figsize=(10, 2))
ax.plot(df.index, df.variable)
# Create list of monthly ticks made of timestamp objects
monthly_ticks = [timestamp for idx, timestamp in enumerate(df.index)
                 if (timestamp.month != df.index[idx-1].month) | (idx == 0)]
# Select appropriate number of ticks and include last month
step = 1
while len(monthly_ticks[::step]) > 10:
    step += 1
ticks = np.unique(np.append(monthly_ticks[::step], monthly_ticks[-1]))
# Create tick labels from tick timestamps
labels = [timestamp.strftime('%b\n%Y') if timestamp.year != ticks[idx-1].year
          else timestamp.strftime('%b') for idx, timestamp in enumerate(ticks)]
plt.xticks(ticks, labels, rotation=0, ha='center');
As you can see, the first and last months are located at an irregular distance from the neighboring tick.
If you are plotting a time series with a discontinuous date range (e.g. weekends and holidays not included) and you are not using the DatetimeIndex for the x-axis (for example ax.plot(range(df.index.size), df.variable)), so as to avoid straight lines bridging the gaps on short time series and/or very wide plots, then replace the last line of code with this:
plt.xticks([df.index.get_loc(tick) for tick in ticks], labels, rotation=0, ha='center');

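Matplotlib uses a limited number of ticks, and it just happens that none of them lands on February 2021. There are two things you could try. First, try setting the axis limits explicitly so that they run up to (or past) today's date:
ax.set_xlim(start_date, end_date)
You could also set the tick positions yourself, evenly spaced and including both ends:
ax.set_xticks(np.linspace(np.min(x), np.max(x), n_ticks))
Here n_ticks is the number of ticks and x holds the x-axis values as numbers (dates can be converted with matplotlib.dates.date2num). Note that np.arange(np.min(x), np.max(x), step) would take a step size rather than a tick count, and it leaves out the last value.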
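As a rough illustration of placing ticks explicitly so that both the first and the last date get one, the sketch below converts the end dates to matplotlib date numbers and spaces the ticks evenly between them. The sample data and the tick count of 8 are arbitrary assumptions; the "%b-%y" format is the one from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up daily data from 25 Jan 2019 to 12 Feb 2021.
dti = pd.date_range("2019-01-25", "2021-02-12", freq="D")
values = 100 + np.random.default_rng(123).normal(size=dti.size).cumsum()

fig, ax = plt.subplots(figsize=(10, 2))
ax.plot(dti, values)

n_ticks = 8  # arbitrary choice for this sketch
first = mdates.date2num(dti[0])
last = mdates.date2num(dti[-1])

# linspace includes both endpoints, so the first and last dates get a tick.
ticks = np.linspace(first, last, n_ticks)
ax.set_xticks(ticks)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b-%y"))
fig.autofmt_xdate()
```

The trade-off is that the intermediate ticks no longer fall on the first day of a month, so a format that includes the day may be clearer.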

Add months to xaxis and legend on a matplotlib line plot

I am trying to plot stacked yearly line graphs by months.
I have a dataframe df_year as below:
Day Number of Bicycle Hires
2010-07-30 6897
2010-07-31 5564
2010-08-01 4303
2010-08-02 6642
2010-08-03 7966
with the index set to the date, going from July 2010 to July 2017.
I want to plot one line per year, with the x-axis showing the months from Jan to Dec, and with only the total sum per month plotted.
I have achieved this by converting the dataframe to a pivot table as below:
pt = pd.pivot_table(df_year, index=df_year.index.month, columns=df_year.index.year, aggfunc='sum')
This creates the pivot table below, which I can plot as shown in the attached figure:
       Number of Bicycle Hires
         2010      2011      2012      2013      2014
1         NaN  403178.0  494325.0  565589.0  493870.0
2         NaN  398292.0  481826.0  516588.0  522940.0
3         NaN  556155.0  818209.0  504611.0  757864.0
4         NaN  673639.0  649473.0  658230.0  805571.0
5         NaN  722072.0  926952.0  749934.0  890709.0
plot showing yearly data with months on xaxis
The only problem is that the months show up as integers and I would like them to be shown as Jan, Feb .... Dec with each line representing one year. And I am unable to add a legend for each year.
I have tried the following code to achieve this:
import matplotlib.pyplot as plt
from matplotlib.dates import MonthLocator, DateFormatter
dims = (15, 5)
fig, ax = plt.subplots(figsize=dims)
ax.plot(pt)
months = MonthLocator(range(1, 13), bymonthday=1, interval=1)
monthsFmt = DateFormatter("%b '%y")
ax.xaxis.set_major_locator(months) #adding this makes the month ints disappear
ax.xaxis.set_major_formatter(monthsFmt)
handles, labels = ax.get_legend_handles_labels() #legend is nowhere on the plot
ax.legend(handles, labels)
Please can anyone help me out with this, what am I doing incorrectly here?
Thanks!

There is nothing in your legend handles and labels. Furthermore, the DateFormatter is not returning the right values, because the values you are translating are not datetime objects.
You could set the index explicitly to the dates, drop the extra column level that the pivot creates (the MultiIndex header), and then use explicit tick labels for the months while setting where they need to occur on your x-axis, as follows:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import datetime
# dummy data (Days)
dates_d = pd.date_range('2010-01-01', '2017-12-31', freq='D')
df_year = pd.DataFrame(np.random.randint(100, 200, (dates_d.shape[0], 1)), columns=['Data'])
df_year.index = dates_d #set index
pt = pd.pivot_table(df_year, index=df_year.index.month, columns=df_year.index.year, aggfunc='sum')
pt.columns = pt.columns.droplevel() # remove the double header (0) as pivot creates a multiindex.
ax = plt.figure().add_subplot(111)
ax.plot(pt)
ticklabels = [datetime.date(1900, item, 1).strftime('%b') for item in pt.index]
ax.set_xticks(np.arange(1,13))
ax.set_xticklabels(ticklabels) #add monthlabels to the xaxis
ax.legend(pt.columns.tolist(), loc='center left', bbox_to_anchor=(1, .5)) #add the column names as legend.
plt.tight_layout(rect=[0, 0, 0.85, 1])
plt.show()

Remove Saturdays (but not Sundays or other dataless periods) from Timeserie plot

I am plotting a financial time series (see below, here one month's worth of data).
I would like to remove the periods I marked with a red cross, which are Saturdays. Note that those periods are not all of the periods without data, only the Saturdays.
I know there are some example of how to remove the gaps , for instance: http://matplotlib.org/examples/api/date_index_formatter.html.
This is not what I am after since they remove all the gaps. (NOT MY INTENT!).
I was thinking that the way to go might be to create a custom sequence of values for the x-axis. Since the days are ordinals (i.e. 1 day = a value of 1), it might be possible to create a sequence such as 1,2,3,4,5,6,8,9,10,11,12,13,15,16, etc., skipping one value every seven days, with the skipped day being a Saturday of course.
I can imagine how to skip every Saturday using rrule from dateutil. It is already used below to mark every Monday with a stronger vertical line. But how would I go about passing it to the tick locator? There is in fact an RRuleLocator class in the matplotlib API, but the docs give no indication of how to use it: http://matplotlib.org/api/dates_api.html#matplotlib.dates.RRuleLocator.
Every suggestion welcome.
Here is the code that I use for the current chart:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from dateutil.rrule import rrule, WEEKLY, MO
# mpf is the finance plotting module providing candlestick_ohlc (its import is not shown in the question)
fig, axes = plt.subplots(2, figsize=(20, 6))
quotes = price_data.values  # plain array without the column headers (as_matrix() was removed in pandas 1.0)
mpf.candlestick_ohlc(axes[0], quotes, width=0.01)
plt.bar(quotes[:,0], quotes[:,5], width=0.01)
for i in range(len(axes)):
    axes[i].xaxis.set_major_locator(mdates.DayLocator(interval=1))
    axes[i].xaxis.set_major_formatter(mdates.DateFormatter('%a, %b %d'))
    axes[i].grid(True)
    # show night times with a grey shade
    majors = axes[i].xaxis.get_majorticklocs()
    chart_start, chart_end = (axes[i].xaxis.get_view_interval()[0],
                              axes[i].xaxis.get_view_interval()[1])
    for major in majors:
        axes[i].axvspan(max(chart_start, major-(0.3333)),
                        min(chart_end, major+(0.3333)), color="0.95", zorder=-1)  # 0.33 corresponds to 1/3 of a day i.e. 8h
    # show mondays with a line
    mondays = list(rrule(WEEKLY, byweekday=MO, dtstart=mdates.num2date(chart_start),
                         until=mdates.num2date(chart_end)))
    for j, monday in enumerate(mondays):
        axes[i].axvline(mdates.date2num(mondays[j]), linewidth=0.75, color='k', zorder=1)

If your dates are datetime objects, or a DatetimeIndex in a pandas DataFrame, you can check which weekday a given date falls on using .weekday. Then you just set the data on Saturdays to NaN. See the example below.
Code
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import random
import datetime
import numpy as np
# generate some data with a datetime index
x = 400
data = pd.DataFrame(
    [random.random() for i in range(x)],
    index=[datetime.datetime(2018, 1, 1, 0) + datetime.timedelta(hours=i)
           for i in range(x)])
# Set all data on a Saturday (5) to nan, so it doesn't show in the graph
data[data.index.weekday == 5] = np.nan
# Plot the data
fig, ax = plt.subplots(figsize=(12, 2.5))
ax.plot(data)
# Set a major tick on each weekday
days = mdates.DayLocator()
daysFmt = mdates.DateFormatter('%a')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(daysFmt)
Result
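Setting Saturdays to NaN leaves an empty slot for each Saturday on the axis. If the slot itself should disappear, the idea from the question of a custom x sequence can be sketched as follows: drop the Saturday rows, then subtract one day from the x position for every Saturday skipped so far. This is an illustrative sketch rather than the answerer's method; the variable names and the random hourly data are made up, and the tick labels are set by hand because the x values are no longer real dates.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hourly dummy data starting on Monday 2018-01-01.
start = pd.Timestamp("2018-01-01")
idx = start + pd.to_timedelta(np.arange(400), unit="h")
data = pd.Series(np.random.default_rng(0).random(len(idx)), index=idx)

kept = data[data.index.weekday != 5]   # drop Saturday rows entirely
day = kept.index.normalize()           # midnight of each row's day

# Running count of Saturdays from the first day up to each calendar day.
all_days = pd.date_range(day[0], day[-1], freq="D")
sats_so_far = pd.Series((all_days.weekday == 5).cumsum(), index=all_days)

# x = days elapsed since the start, minus one day per Saturday skipped.
elapsed = (kept.index - day[0]) / pd.Timedelta(days=1)
x = np.asarray(elapsed) - sats_so_far.reindex(day).to_numpy()

# One tick per remaining day, placed at midnight and labelled by weekday.
at_midnight = kept.index == day
labels = kept.index[at_midnight].strftime("%a").tolist()

fig, ax = plt.subplots(figsize=(12, 2.5))
ax.plot(x, kept.to_numpy())
ax.set_xticks(x[at_midnight])
ax.set_xticklabels(labels)
```

Sundays and other dataless periods keep their place on the axis, since only Saturdays are subtracted from the x positions.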

Formatting X axis labels Pandas time series plot

I am trying to plot a DataFrame of multiple time series in pandas. Each series covers one year of daily points (length 365). The figure comes out fine, but I want to suppress the year tick shown on the x-axis.
I want to suppress the 1950 label shown in the left corner of the x-axis. Can anybody suggest something? My code:
dates = pandas.date_range('1950-01-01', '1950-12-31', freq='D')
data_to_plot12 = pandas.DataFrame(data=data_array, # values
index=homo_regions) # 1st column as index
dataframe1 = pandas.DataFrame.transpose(data_to_plot12)
dataframe1.index = dates
ax = dataframe1.plot(lw=1.5, marker='.', markersize=2, title='PRECT time series PI Slb Ocn CNTRL 60 years')
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")
fig_name = 'dataframe1.pdf'
plt.savefig(fig_name)

You should be able to specify the x-axis major formatter like so:
import matplotlib.dates as mdates
...
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
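A fuller sketch of this suggestion on made-up data is shown below. One assumption worth stating: pandas' own time series plotting can use its own tick handling for regularly spaced data, in which case a matplotlib DateFormatter may not label the ticks as expected, so the sketch passes x_compat=True to DataFrame.plot to keep the x-axis in matplotlib's date units. The column names and random values are hypothetical.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up daily data for 1950 with hypothetical region names.
dates = pd.date_range("1950-01-01", "1950-12-31", freq="D")
rng = np.random.default_rng(0)
frame = pd.DataFrame(rng.random((len(dates), 3)), index=dates,
                     columns=["region_a", "region_b", "region_c"])

fig, ax = plt.subplots()
# x_compat=True keeps the x-axis in matplotlib's date units.
frame.plot(ax=ax, lw=1.5, x_compat=True)

# One tick per month, labelled with the month abbreviation only.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")

fig.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
```

With the "%b" format no year appears in any tick label; call plt.show() or plt.savefig(...) afterwards as in the question.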
