How can I highlight weekends in small multiples?
I've read different threads (e.g. (1) and (2)) but couldn't figure out how to apply them to my case, since I work with small multiples where I iterate through the DatetimeIndex month by month (see code below figure). My data Profiles is in this case a time series of 2 years with an interval of 15 min (i.e. 70080 data points).
However, weekend days that occur at the end of a month generate an error; in this case: IndexError: index 2972 is out of bounds for axis 0 with size 2972
My attempt: [Edited - with suggestions by #Patrick FitzGerald]
class highlightWeekend:
    '''Class object to highlight weekends'''
    def __init__(self, period):
        self.ranges= period.index.dayofweek >= 5
        self.res = [x for x, (i , j) in enumerate(zip( [2] + list(self.ranges), list(self.ranges) + [2])) if i != j]
        if self.res[0] == 0 and self.ranges[0] == False:
            del self.res[0]
        if self.res[-1] == len(self.ranges) and self.ranges[-1] == False:
            del self.res[-1]

months= Profiles.loc['2018'].groupby(lambda x: x.month)
fig, axs= plt.subplots(4,3, figsize= (16, 12), sharey=True)
axs= axs.flatten()
for i, j in months:
    axs[i-1].plot(j.index, j)
    if i < len(months):
        k= 0
        while k < len(highlightWeekend(j).res):
            axs[i-1].axvspan(j.index[highlightWeekend(j).res[k]], j.index[highlightWeekend(j).res[k+1]], alpha=.2)
            k+=2
    i+=1
plt.show()
Question
How can I solve the issue of a weekend day occurring at the end of the month?
TL;DR Skip to Solution for method 2 to see the optimal solution, or skip to the last example for a solution with a single pandas line plot. In all three examples, weekends are highlighted using just 4-6 lines of code, the rest is for formatting and reproducibility.
Methods and tools
I am aware of two methods to highlight weekends on plots of time series, which can be applied both to single plots and to small multiples by looping over the array of subplots. This answer presents solutions for highlighting weekends but they can be easily adjusted to work for any recurring period of time.
Method 1: highlight based on the dataframe index
This method follows the logic of the code in the question and in the answers in the linked threads. Unfortunately, a problem arises when a weekend day occurs at the end of the month: the index number that is needed to draw the full span of the weekend exceeds the index range, which produces an error. This issue is solved in the solution shown further below by computing the time difference between two timestamps and adding it to each timestamp of the DatetimeIndex when looping over them to highlight the weekends.
But two issues remain: i) this method does not work for time series whose interval is longer than a day (for example weekly or monthly data), and ii) time series with sub-hourly intervals (like 15 minutes) require drawing many polygons, which hurts performance. For these reasons, this method is presented here for the purpose of documentation and I suggest using method 2 instead.
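To make the index problem concrete, here is a minimal sketch (not the code from the question) using a hypothetical 6-hourly index for March 2018, a month that ends on a Saturday. The names `step`, `idx`, `last` and `span_end` are assumptions made for this illustration only.

```python
import pandas as pd

# Hypothetical 6-hourly index for March 2018, which ends on a Saturday
step = pd.Timedelta(hours=6)
idx = pd.date_range("2018-03-01", "2018-03-31 23:59", freq=step)

# Positions of all weekend timestamps (Saturday=5, Sunday=6)
weekend_positions = [p for p, ts in enumerate(idx) if ts.weekday() >= 5]
last = weekend_positions[-1]

# Looking up the *next* position, idx[last + 1], would raise an IndexError
# here because the last weekend timestamp is also the last row of the month.
overflows = last + 1 >= len(idx)

# Adding the time step to the timestamp itself never touches the index
span_end = idx[last] + step

print(len(idx), last, overflows, span_end)
```

The end of the highlighted span is computed from the timestamp rather than looked up, so it can fall outside the monthly data without causing an error.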
Method 2: highlight based on the x-axis units
This method uses the x-axis units, that is the number of days since the time origin (1970-01-01), to identify the weekends independently from the time series data being plotted which makes it much more flexible than method 1. The highlights are drawn for each full weekend day only, making this two times faster than method 1 for the examples presented below (according to a %%timeit test in Jupyter Notebook). This is the method I recommend using.
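As a small illustration of what "x-axis units" means here, the sketch below converts a few dates to matplotlib date numbers and back, then builds the kind of weekend mask used further down. It assumes matplotlib's default epoch (1970-01-01 since v3.3; older versions counted from a different origin), and the variable names are chosen for this example only.

```python
from datetime import datetime

import matplotlib.dates as mdates
import numpy as np

# Matplotlib stores dates as floats: days elapsed since its epoch
start = mdates.date2num(datetime(2018, 1, 1))  # a Monday
days = np.arange(start, start + 8)             # Monday to the next Monday

# Convert the day numbers back to datetimes to read off the weekday
weekday_numbers = [dt.weekday() for dt in mdates.num2date(days)]

# Saturday (5), Sunday (6) and Monday (0) are flagged so that a
# right-exclusive fill reaches Monday 00:00
weekend_mask = [wd >= 5 or wd == 0 for wd in weekday_numbers]

print(weekday_numbers)
print(weekend_mask)
```

Because the mask is derived from the axis range and not from the data, it does not matter what frequency the plotted time series has.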
Tools in matplotlib that can be used to implement both methods
axvspan link demo, link API (used in Solution for method 1)
broken_barh link demo, link API
fill_between link demo, link API (used in Solution for method 2)
BrokenBarHCollection.span_where link demo, link API
To me, it seems that fill_between and BrokenBarHCollection.span_where are essentially the same. Both provide the handy where argument which is used in the solution for method 2 presented further below.
Solutions
Here is a reproducible sample dataset used to illustrate both methods, using a frequency of 6 hours. Note that the dataframe contains data only for one year which makes it possible to select the monthly data simply with df[df.index.month == month] to draw each subplot. You will need to adjust this if you are dealing with a multi-year DatetimeIndex.
Import packages used for all 3 examples and create the dataset for the first 2 examples
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.dates as mdates # used only for method 2
# Create sample dataset
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.date_range('2018-01-01 00:00', '2018-12-31 23:59', freq='6H')
consumption = rng.integers(1000, 2000, size=dti.size)
df = pd.DataFrame(dict(consumption=consumption), index=dti)
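For a multi-year DatetimeIndex such as the two-year series in the question, selecting on the month alone would mix the same month of different years. One possible adjustment, sketched here with hypothetical names (`df_multi`, `monthly`) and a daily frequency to keep it short, is to group on both year and month:

```python
import numpy as np
import pandas as pd

# Hypothetical two-year daily dataset
idx = pd.date_range("2018-01-01", "2019-12-31", freq="D")
df_multi = pd.DataFrame({"consumption": np.arange(idx.size)}, index=idx)

# Group on (year, month) so that each group holds exactly one calendar month
groups = df_multi.groupby([df_multi.index.year, df_multi.index.month])
monthly = {key: frame for key, frame in groups}

print(len(monthly))            # number of (year, month) groups
print(monthly[(2018, 2)].shape)
```

Each value in `monthly` can then play the role of `df_month` in the plotting loops below, for example by zipping `monthly.items()` with a flattened array of axes.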
Solution for method 1: highlight based on the dataframe index
In this solution, the weekends are highlighted using axvspan and the DatetimeIndex of the monthly dataframes df_month. The weekend timestamps are selected with df_month.index[df_month.index.weekday>=5].to_series() and the problem of exceeding the index range is solved by computing the timedelta from the frequency of the DatetimeIndex and adding it to each timestamp.
Of course, axvspan could also be used in a smarter way than shown here so that each weekend highlight is drawn in a single go, but I believe this would result in a less flexible solution and more code than what is presented in Solution for method 2.
# Draw and format subplots by looping through months and flattened array of axes
fig, axs = plt.subplots(4, 3, figsize=(10, 9), sharey=True)
for month, ax in zip(df.index.month.unique(), axs.flat):
    # Select monthly data and plot it
    df_month = df[df.index.month == month]
    ax.plot(df_month.index, df_month['consumption'])
    ax.set_ylim(0, 2500) # set limit similar to plot shown in question

    # Draw vertical spans for weekends: computing the timedelta and adding it
    # to the date solves the problem of exceeding the df_month.index
    timedelta = pd.to_timedelta(df_month.index.freq)
    weekends = df_month.index[df_month.index.weekday>=5].to_series()
    for date in weekends:
        ax.axvspan(date, date+timedelta, facecolor='k', edgecolor=None, alpha=.1)

    # Format tick labels
    ax.set_xticks(ax.get_xticks())
    tk_labels = [pd.to_datetime(tk, unit='D').strftime('%d') for tk in ax.get_xticks()]
    ax.set_xticklabels(tk_labels, rotation=0, ha='center')

    # Add x labels for months
    ax.set_xlabel(df_month.index[0].month_name().upper(), labelpad=5)
    ax.xaxis.set_label_position('top')

# Add title and edit spaces between subplots
year = df.index[0].year
freq = df_month.index.freqstr
title = f'{year} consumption displayed for each month with a {freq} frequency'
fig.suptitle(title.upper(), y=0.95, fontsize=12)
fig.subplots_adjust(wspace=0.1, hspace=0.5)
fig.text(0.5, 0.99, 'Weekends are highlighted by using the DatetimeIndex',
         ha='center', fontsize=14, weight='semibold');
As you can see, the weekend highlights end where the data ends as illustrated with the month of March. This is of course not noticeable if the DatetimeIndex is used to set the x-axis limits.
Solution for method 2: highlight based on the x-axis units
This solution uses the x-axis limits to compute the range of time covered by the plot in terms of days, which is the unit used for matplotlib dates. A weekends mask is computed and then passed to the where argument of the fill_between plotting function. The True values of the mask are processed as right-exclusive so in this case, Mondays must be included for the highlights to be drawn up to Mondays 00:00. Because plotting these highlights can alter the x-axis limits when weekends occur near the limits, the x-axis limits are set back to the original values after plotting.
Note that with fill_between the y1 and y2 arguments must be given. For some reason using the default y-axis limits leaves a small gap between the plot frame and the tops and bottoms of the weekend highlights. Here, the y limits are set to 0 and 2500 just to create an example similar to the one in the question but the following should be used instead for general cases: ax.set_ylim(*ax.get_ylim()).
# Draw and format subplots by looping through months and flattened array of axes
fig, axs = plt.subplots(4, 3, figsize=(10, 9), sharey=True)
for month, ax in zip(df.index.month.unique(), axs.flat):
    # Select monthly data and plot it
    df_month = df[df.index.month == month]
    ax.plot(df_month.index, df_month['consumption'])
    ax.set_ylim(0, 2500) # set limit like plot shown in question, or use next line
    # ax.set_ylim(*ax.get_ylim())

    # Highlight weekends based on the x-axis units, regardless of the DatetimeIndex
    xmin, xmax = ax.get_xlim()
    days = np.arange(np.floor(xmin), np.ceil(xmax)+2)
    weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
    ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
    ax.set_xlim(xmin, xmax) # set limits back to default values

    # Create appropriate ticks with matplotlib date tick locator and formatter
    tick_loc = mdates.MonthLocator(bymonthday=np.arange(1, 31, step=5))
    ax.xaxis.set_major_locator(tick_loc)
    tick_fmt = mdates.DateFormatter('%d')
    ax.xaxis.set_major_formatter(tick_fmt)

    # Add x labels for months
    ax.set_xlabel(df_month.index[0].month_name().upper(), labelpad=5)
    ax.xaxis.set_label_position('top')

# Add title and edit spaces between subplots
year = df.index[0].year
freq = df_month.index.freqstr
title = f'{year} consumption displayed for each month with a {freq} frequency'
fig.suptitle(title.upper(), y=0.95, fontsize=12)
fig.subplots_adjust(wspace=0.1, hspace=0.5)
fig.text(0.5, 0.99, 'Weekends are highlighted by using the x-axis units',
         ha='center', fontsize=14, weight='semibold');
As you can see, the weekends are always highlighted to the full extent, regardless of where the data starts and ends.
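If you need the same highlighting on several figures, the steps of method 2 can be wrapped in a small helper. The following is only a sketch under the same assumptions as above (matplotlib date units on the x-axis); the function name `highlight_weekends` and the sample data are hypothetical and not part of matplotlib or pandas.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def highlight_weekends(ax, **kwargs):
    """Shade full weekend days on an Axes whose x-axis uses matplotlib dates."""
    xmin, xmax = ax.get_xlim()
    ymin, ymax = ax.get_ylim()
    # One point per day, padded so that weekends at the edges are covered
    days = np.arange(np.floor(xmin), np.ceil(xmax) + 2)
    # Include Mondays because the mask is treated as right-exclusive
    mask = [dt.weekday() in (5, 6, 0) for dt in mdates.num2date(days)]
    style = dict(facecolor="k", alpha=0.1)
    style.update(kwargs)
    patch = ax.fill_between(days, ymin, ymax, where=mask, **style)
    # Restore the limits that the fill may have changed
    ax.set_xlim(xmin, xmax)
    ax.set_ylim(ymin, ymax)
    return patch


# Usage with a hypothetical 6-hourly series for March 2018
idx = pd.date_range("2018-03-01", periods=124, freq=pd.Timedelta(hours=6))
values = np.random.default_rng(1).integers(1000, 2000, size=idx.size)

fig, ax = plt.subplots()
ax.plot(idx, values)
xlim_before = ax.get_xlim()
highlight_weekends(ax)
xlim_after = ax.get_xlim()
```

In a small-multiples loop, the helper would be called once per subplot right after plotting the monthly data.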
Additional example of a solution for method 2 with a monthly time series and a pandas plot
This plot may not make much sense but it serves to illustrate the flexibility of method 2 and how to make it compatible with a pandas line plot. Note that the sample dataset uses a month start frequency so that the default ticks are aligned with the data points.
# Create sample dataset with a month start frequency
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.date_range('2018-01-01 00:00', '2018-06-30 23:59', freq='MS')
consumption = rng.integers(1000, 2000, size=dti.size)
df = pd.DataFrame(dict(consumption=consumption), index=dti)
# Draw pandas plot: x_compat=True converts the pandas x-axis units to matplotlib
# date units
ax = df.plot(x_compat=True, figsize=(10, 4), legend=None)
ax.set_ylim(0, 2500) # set limit similar to plot shown in question, or use next line
# ax.set_ylim(*ax.get_ylim())
# Highlight weekends based on the x-axis units, regardless of the DatetimeIndex
xmin, xmax = ax.get_xlim()
days = np.arange(np.floor(xmin), np.ceil(xmax)+2)
weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
ax.set_xlim(xmin, xmax) # set limits back to default values
# Additional formatting
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_title('2018 consumption by month'.upper(), pad=15, fontsize=12)
ax.figure.text(0.5, 1.05, 'Weekends are highlighted by using the x-axis units',
ha='center', fontsize=14, weight='semibold');
You can find more examples of this solution in the answers I have posted here and here.
References: this answer by Nipun Batra, this answer by BenB, matplotlib.dates
Related
I'm trying to plot minimum and maximum daily temperature values for last 20 years. Since there are too many days in between, my plot graph looks too complicated.
How can I change the frequency of the plotted days to reduce the density of my graph?
In other words, I want the plot to show the weather of one day and then skip the following 2 days, without changing the dataframe.
fig, ax = plt.subplots()
colors = ["Orange", "Blue"]
for i,col in enumerate(weather_data.columns):
    if col == "Date": continue  # compare strings with ==, not "is"
    ax.plot('Date', col, data=weather_data)
ax.set_xlabel("Date")
ax.set_ylabel("Temperature (Celsius)")
# set 15 xticks to prevent overlapping
ax.set_xticks(np.arange(0, weather_data.shape[0],weather_data.shape[0] / 15))
ax.legend()
fig.autofmt_xdate()
ax.set_title('Time Plot of Weather');
Dataset:
https://drive.google.com/uc?id=1O-7DuL6-bkPBpz7mAUZ7M62P6EOyngG2
Hard to say without sample data, but one option is to show only one data point out of every k data points in the original DataFrame, and interpolate the missing days with straight line segments. (This is basically downsampling.)
For example, to show every 5th data point, change this line:
ax.plot('Date', col, data=weather_data)
to this:
ax.plot('Date', col, data=weather_data.iloc[::5])
There are other approaches such as nonlinear interpolation or showing a rolling average, but this should serve as a starting point.
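As a rough sketch of two of these options, the snippet below uses a made-up one-year dataset (the column names `Date` and `Max` are assumptions, not taken from the linked file). It keeps one day out of every three, which matches "show one day, skip the following 2", and also computes a centered 7-day rolling average as a smoother alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperature data for one year
dates = pd.date_range("2000-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
weather = pd.DataFrame({"Date": dates,
                        "Max": 20 + rng.normal(0, 3, dates.size)})

# Option 1: keep one row out of every three without modifying `weather`
every_third = weather.iloc[::3]

# Option 2: centered 7-day rolling mean (NaN at both ends of the series)
smoothed = weather.set_index("Date")["Max"].rolling(window=7, center=True).mean()

print(len(weather), len(every_third), int(smoothed.isna().sum()))
```

Either result can be passed to `ax.plot` in place of the full dataframe; the original `weather` object is left unchanged in both cases.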
I'm trying to plot two separate things from two pandas dataframes but the x-axis is giving some issues. When using matplotlib.ticker to skip x-ticks, the date doesn't get skipped. The result is that the x-axis values don't match up with what is plotted.
For example, when the x-ticks are set to a base of 2, you'll see that the dates are going up by 1.
But the graph has the same spacing when the base is set to 4, which you can see here:
For the second image, the goal is for the days to increase by 4 each tick, so it should read 22, 26, 30, etc.
Here is the code that I'm working with:
ax = plot2[['Date','change value']].plot(x='Date',color='red',alpha=1,linewidth=1.5)
plt.ylabel('Total Change')
plot_df[['Date','share change daily']].plot(x='Date',secondary_y=True,kind='bar',ax=ax,alpha=0.4,color='black',figsize=(6,2),label='Daily Change')
plt.ylabel('Daily Change')
ax.legend(['Total Change (L)','Daily Change'])
plt.xticks(plot_df.index,plot_df['Date'].values)
myLocator = mticker.MultipleLocator(base=4)
ax.xaxis.set_major_locator(myLocator)
Any help is appreciated! Thanks :)
First off, I suggest you set the date as the index of your dataframe. This lets pandas automatically format the date labels nicely when you create line plots and it lets you conveniently create a custom format with the strftime method.
This second point is relevant to this example, seeing as plotting a bar plot over a line plot prevents you from getting the pandas line plot date labels because the x-axis units switch to integer units starting at 0 (note that this is also the case when you use the dates as strings instead of datetime objects, aka timestamp objects in pandas). You can check this for yourself by running ax.get_xticks() after creating the line plot (with a DatetimeIndex) and again after creating the bar plot.
There are too many peculiarities regarding the tick locators and formatters, the pandas plotting defaults, and the various ways in which you could define your custom ticks and tick labels for me to go into more detail here. So let me suggest you refer to the documentation for more information (though for your case you don't really need any of this): Major and minor ticks, Date tick labels, Custom tick formatter for time series, more examples using ticks, and the ticker module which contains the list of tick locators and formatters and their parameters.
Furthermore, you can identify the default tick locators and formatters used by the plotting functions with ax.get_xaxis().get_major_locator() or ax.get_xaxis().get_major_formatter() (you can do the same for the y-axis, and for minor ticks) to get an idea of what is happening under the hood.
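As a quick way to see this for yourself, the following sketch draws a pandas bar plot from a small made-up dataframe with a DatetimeIndex and inspects the resulting ticks and locator. The dataframe `demo` is hypothetical, and the behaviour shown is what I would expect from recent pandas and matplotlib versions rather than a guarantee for every release.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
import pandas as pd

# Hypothetical data indexed by 10 business days
dti = pd.bdate_range("2020-07-22", periods=10)
demo = pd.DataFrame({"value": np.arange(10.0)}, index=dti)

# A pandas bar plot places the bars at integer positions 0, 1, 2, ...
ax = demo.plot(kind="bar", legend=False)
bar_ticks = ax.get_xticks()
locator = ax.xaxis.get_major_locator()
formatter = ax.xaxis.get_major_formatter()

print(bar_ticks)
print(type(locator).__name__, type(formatter).__name__)
```

The integer tick positions explain why a MultipleLocator with a base of 4 moves the ticks but the labels do not follow the dates: the labels are a fixed list of strings that is no longer tied to the underlying timestamps.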
On to solving your problem. Seeing as you want a fixed frequency of ticks for a predefined range of dates, I suggest that you avoid explicitly selecting a ticker locator and formatter and that instead you simply create the list of ticks and tick labels you want. First, here is some sample data similar to yours:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.bdate_range(start='2020-07-22', end='2020-09-03')
daily = rng.normal(loc=0, scale=250, size=dti.size)
total = -1900 + np.cumsum(daily)
df = pd.DataFrame({'Daily Change': daily,
'Total Change': total},
index=dti)
df.head()
Daily Change Total Change
2020-07-22 86.396048 -1813.603952
2020-07-23 205.404536 -1608.199416
2020-07-24 82.609269 -1525.590147
2020-07-27 -325.789308 -1851.379455
2020-07-28 226.338967 -1625.040488
The date is set as the index, which will simplify the code for creating the plots (no need to specify x). I use the same formatting arguments as in the example you gave, except for the figure size. Note that for setting the ticks and tick labels I do not use plt.xticks because this refers to the secondary Axes containing the bar plot and for some reason, the rotation and ha arguments get ignored.
label_daily, label_total = df.columns
# Create pandas line plot: note the 'use_index' parameter
ax = df.plot(y=label_total, color='red', alpha=1, linewidth=1.5,
use_index=False, ylabel=label_total)
# Create pandas bar plot: note that the second ylabel must be created
# after, else it overwrites the previous label on the left
df.plot(kind='bar', y=label_daily, color='black', alpha=0.4,
ax=ax, secondary_y=True, mark_right=False, figsize=(9, 4))
plt.ylabel(label_daily, labelpad=10)
# Place legend in a better location: note that because there are two
# Axes, the combined legend can only be edited with the fig.legend
# method, and the ax legend must be removed
ax.legend().remove()
plt.gcf().legend(loc=(0.11, 0.15))
# Create custom x ticks and tick labels
freq = 4 # business days
xticks = ax.get_xticks()
xticklabels = df.index[::freq].strftime('%b-%d')
ax.set_xticks(xticks[::freq])
ax.set_xticks(xticks, minor=True)
ax.set_xticklabels(xticklabels, rotation=0, ha='center')
plt.show()
The codes for formatting the dates can be found here.
For the sake of completeness, here are two alternative ways of creating exactly the same ticks but this time by making explicit use of matplotlib tick locators and formatters.
This first alternative uses lists of ticks and tick labels like before, but this time passing them to FixedLocator and FixedFormatter:
import matplotlib.ticker as mticker
# Create custom x ticks and tick labels
freq = 4 # business days
maj_locator = mticker.FixedLocator(ax.get_xticks()[::freq])
min_locator = mticker.FixedLocator(ax.get_xticks())
ax.xaxis.set_major_locator(maj_locator)
ax.xaxis.set_minor_locator(min_locator)
maj_formatter = mticker.FixedFormatter(df.index[maj_locator.locs].strftime('%b-%d'))
ax.xaxis.set_major_formatter(maj_formatter)
plt.setp(ax.get_xticklabels(), rotation=0, ha='center')
This second alternative makes use of the option to create a tick at every nth position of the index when using IndexLocator, combining it with FuncFormatter (instead of IndexFormatter which is deprecated):
import matplotlib.ticker as mticker
# Create custom x ticks and tick labels
maj_freq = 4 # business days
min_freq = 1 # business days
maj_locator = mticker.IndexLocator(maj_freq, 0)
min_locator = mticker.IndexLocator(min_freq, 0)
ax.xaxis.set_major_locator(maj_locator)
ax.xaxis.set_minor_locator(min_locator)
maj_formatter = mticker.FuncFormatter(lambda x, pos=None:
df.index[int(x)].strftime('%b-%d'))
ax.xaxis.set_major_formatter(maj_formatter)
plt.setp(ax.get_xticklabels(), rotation=0, ha='center')
As you can see, both of these alternatives are more verbose than the initial example.
Question :
Is there a way I can convert day to String rather than decimal value? Similarly for Month.
Note: I already visited this (3D Scatterplot with strings in Python) answer which does not solve my question.
I am working on a self project where I am trying to create 3D chart for my commute from data I retrieved from my google activity.
For reference I am following this guide : https://nvbn.github.io/2018/05/01/commute/
I am able to create informative 2D charts based on the Month + Time and Day + Time attributes, but I wish to combine these 2 charts.
The 3D chart I want to create requires 3 attributes: Day (Mon/Tue), Month (Jan/Feb), and Time taken.
Given that matplotlib does not support string values in charts out of the box, I have used numbers for Day (0-6, as returned by weekday()) and Month (1-12). However, the graph is hard to read with decimal values for days. It looks like the following:
My current code looks like this, retrieving weekday() to get day number, and month for month.
# How commute is calculated and grouped
import pandas as pd
#{...}
def get_commute_to_work():
    #{...}
    yield Commute_to_work(pd.to_datetime(start.datetime), start.datetime, end.datetime, end.datetime - start.datetime)

#Now creating graph here
fig, ax = pyplot.subplots(subplot_kw={'projection': '3d'})
ax.grid()
ax.scatter([commute.day.weekday() for commute in normalised],
           [commute.day.month for commute in normalised],
           [commute.took.total_seconds() / 60 for commute in normalised])
ax.set(xlabel='Day',ylabel='Month' ,zlabel='commute (minutes)',
       title='Daily commute')
ax.legend()
pyplot.show()
nb. if you wish to gaze into detail of this code it's available on github here
You can try this (I have not verified for the 3d plot though):
x_tick_labels = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# Set one tick per weekday: weekday() returns 0 (Monday) to 6 (Sunday)
ax.set_xticks(list(range(7)))
# Set tick labels for x-axis
ax.set_xticklabels(x_tick_labels, rotation='vertical', fontsize=18)
You can repeat a similar procedure for months.
The source for this answer is here.
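Since the snippet above was not checked on a 3D plot, here is a self-contained sketch that applies the same idea to both the day and month axes of a 3D scatter plot. The three data points are made up for illustration, and the label lists are hard-coded assumptions (weekday() counts from Monday = 0, months from 1).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

day_labels = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
month_labels = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

fig = plt.figure()
ax = fig.add_subplot(projection="3d")

# Made-up commutes: (weekday number, month number, minutes)
ax.scatter([0, 2, 4], [1, 6, 12], [35, 42, 28])

# One tick per weekday (0-6) and per month (1-12), labelled with names
ax.set_xticks(list(range(7)))
ax.set_xticklabels(day_labels)
ax.set_yticks(list(range(1, 13)))
ax.set_yticklabels(month_labels)

ax.set(xlabel="Day", ylabel="Month", zlabel="commute (minutes)")
fig.canvas.draw()
```

Fixing the tick positions to whole numbers is what removes the decimal day values; the labels are then just a readable name for each integer.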
It is driving me crazy that I can't accomplish something that should be simple enough. I have a time series that I grouped by year so that I can plot each year and compare them. When I plot I have 21 lines, so instead of a legend box I'd like to add the year to the end of each line, like this graph here (example):
I've created a function to take any time series and return this plot, and I'm struggling to add this custom label/annotation.
My code is this:
def plot_by_year(ts):
    # first I group the time series (ts) by year:
    year_group = ts.groupby(pd.Grouper(freq ='A'))
    yearly = pd.DataFrame()
    for year, group in year_group:
        yearly[year.year] = group.values.ravel()
    # now to plot it, the easy mode is:
    yearly.plot(figsize = (12,14), legend=True)
    plt.gca().legend(loc='center left', bbox_to_anchor=(1, .5));
However, this only gives me a legend box outside the plot (see plot below)
The alternative I'm trying, following these instructions, is this:
for rank, column in enumerate(years):
    plt.plot(np.arange(0,13), yearly[column].values, lw=2.5)
    y_pos = yearly[column].values[-1] - 0.5
    plt.text(12, y_pos, rank)
This gives me KeyError: 1996, which is the first year of my data. I've tried so many things that I don't even know anymore what I'm doing. Help!
It sounds like your years isn't the same as yearly.columns. Maybe you just got the datatypes wrong (ints vs strings?). Try this instead:
# probably unnecessary tbh but I prefer working with the ax obj rather than plt
fig, ax = plt.subplots()
# generalize out that 12. This is the number of points in your series.
# If it's months then I guess it's always 12...
n_points = yearly.shape[0]
for year in yearly: # get the column names straight off the dataframe
    ax.plot(np.arange(n_points), yearly[year].values, lw=2.5)
    y_pos = yearly[year].values[-1] - 0.5
    # You wanted to label it with the column name, not the column index
    # which is what rank would have given you
    ax.text(n_points, y_pos, year)
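Here is a self-contained variant of the same idea that can be run without the original data. The dataframe `yearly` is filled with made-up values (one column per year, integer column names), and `ax.annotate` with an offset in points is used instead of `ax.text`, which is simply one possible way to keep the label clear of the line end.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: 12 monthly values for each of 5 years, columns are ints
rng = np.random.default_rng(0)
years = range(1996, 2001)
yearly = pd.DataFrame({year: rng.normal(0, 1, 12).cumsum() for year in years},
                      index=range(1, 13))

fig, ax = plt.subplots()
for year in yearly.columns:
    ax.plot(yearly.index, yearly[year], lw=1.5)
    # Put the year just to the right of the last point of its line
    ax.annotate(str(year),
                xy=(yearly.index[-1], yearly[year].iloc[-1]),
                xytext=(5, 0), textcoords="offset points", va="center")

# A lookup with the wrong type fails, which is the usual cause of a KeyError
print(1996 in yearly.columns, "1996" in yearly.columns)
```

Iterating over `yearly.columns` guarantees that every key used for the lookup actually exists in the dataframe, whatever its type.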
You are plotting through the pandas interface. The plain matplotlib interface described below is a little different, so treat this as a rough tutorial:
# The following creates a new figure of desired size and 2 subplots
f, ax = plt.subplots(1,2, figsize=(9,3))
# the following is a Pandas DataFrame. It doesn't matter how you got it
df = pd.read_csv('my.csv', sep=r'\s+')
# here is a plot:
ax[0].plot(df['x'], np.abs(df['y'])/1000,
label="whatever I want")
ax[1].plot(df['x'], np.abs(df['g'])/1000,
label="another whatever I want")
# set whatever I want for the labels
ax[0].set_ylabel("My left y-label")
ax[1].set_ylabel("My right y-label")
# the bbox argument controls where the legend is located
# bbox_to_anchor=(0.4, .5) for example is inside
f.legend(loc='center left', bbox_to_anchor=(1, .5))
rcParams['date.autoformatter.month'] = "%b\n%Y"
I am using matplotlib to plot a time series, and if I set rcParams as above, the resulting plot has the month name and year labeled at each tick. How can I set it up so that the year is only shown at January of each year? I tried this, but it does not work:
rcParams['date.autoformatter.month'] = "%b"
rcParams['date.autoformatter.year'] = "%Y"
The formatters do not let you specify conditions. Depending on the span of the series, the AutoDateFormatter will fall into either the date.autoformatter.month range or the date.autoformatter.year range.
Also, the AutoDateLocator may not necessarily decide to actually tick the first of January at all.
I would hence suggest to specify the tickers directly to the desired format and locations. You may use the major ticks to show the years and the minor ticks to show the months. The format for the major ticks can then get a line break, in order not to overlap with the minor ticklabels.
import matplotlib.pyplot as plt
import matplotlib.dates
from datetime import datetime
t = [datetime(2016,1,1), datetime(2017,12,31)]
x = [0,1]
fig, ax = plt.subplots()
ax.plot(t,x)
ax.xaxis.set_major_locator(matplotlib.dates.YearLocator())
ax.xaxis.set_minor_locator(matplotlib.dates.MonthLocator((1,4,7,10)))
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter("\n%Y"))
ax.xaxis.set_minor_formatter(matplotlib.dates.DateFormatter("%b"))
plt.setp(ax.get_xticklabels(), rotation=0, ha="center")
plt.show()
You could then also adapt the minor ticks' lengths to match those of the major ones in case that is desired,
ax.tick_params(axis="x", which="both", length=4)
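If you would rather keep a single set of ticks, a different technique from the major/minor split above is to put the condition into a FuncFormatter. The sketch below is an assumption-laden alternative, not a built-in option: the function name `month_with_year_in_january` is made up, and the month names depend on the active locale.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker


def month_with_year_in_january(x, pos=None):
    """Label a tick with the month, adding the year only for January."""
    date = mdates.num2date(x)
    return date.strftime("%b\n%Y") if date.month == 1 else date.strftime("%b")


t = [datetime(2016, 1, 1), datetime(2017, 12, 31)]
x = [0, 1]

fig, ax = plt.subplots()
ax.plot(t, x)
# Tick every third month and let the formatter decide what to print
ax.xaxis.set_major_locator(mdates.MonthLocator((1, 4, 7, 10)))
ax.xaxis.set_major_formatter(mticker.FuncFormatter(month_with_year_in_january))
fig.canvas.draw()
```

The locator is set explicitly because, as noted above, the automatic locator is not guaranteed to place a tick on the first of January.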