How to highlight weekends in matplotlib plots? - python

For a simple time series:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'dt': ['2020-01-01', '2020-01-02', '2020-01-04', '2020-01-05', '2020-01-06'], 'foo': [1, 2, 4, 5, 6]})
df['dt'] = pd.to_datetime(df.dt)
df['dt_label'] = df['dt'].dt.strftime('%Y-%m-%d %a')
df = df.set_index('dt')
#display(df)
df['foo'].plot()
x = plt.xticks(ticks=df.reset_index().dt.values, labels=df.dt_label, rotation=90, horizontalalignment='right')
How can I highlight the x-axis labels for weekends?
edit
Pandas Plots: Separate color for weekends, pretty printing times on x axis
suggests:
def highlight_weekends(ax, timeseries):
    d = timeseries.dt
    ranges = timeseries[d.dayofweek >= 5].groupby(d.year * 100 + d.weekofyear).agg(['min', 'max'])
    for i, tmin, tmax in ranges.itertuples():
        ax.axvspan(tmin, tmax, facecolor='orange', edgecolor='none', alpha=0.1)
but applying it with
highlight_weekends(ax, df.reset_index().dt)
will not change the plot

I've extended your sample data a little so we can make sure that more than a single weekend gets highlighted.
In this solution I create a column 'weekend', a column of bools indicating whether the corresponding date falls on a weekend.
We then loop over these values and call ax.axvspan for each Saturday/Sunday pair. Note that the pairing assumes both days of every weekend are present in the data.
import pandas as pd
import matplotlib.pyplot as plt
# Add a couple of extra dates to sample data
df = pd.DataFrame({'dt': ['2020-01-01',
'2020-01-02',
'2020-01-04',
'2020-01-05',
'2020-01-06',
'2020-01-07',
'2020-01-09',
'2020-01-10',
'2020-01-11',
'2020-01-12']})
# Fill in corresponding observations
df['foo'] = range(df.shape[0])
df['dt'] = pd.to_datetime(df.dt)
df['dt_label']= df['dt'].dt.strftime('%Y-%m-%d %a')
df = df.set_index('dt')
ax = df['foo'].plot()
plt.xticks(ticks=df.reset_index().dt.values,
labels=df.dt_label,
rotation=90,
horizontalalignment='right')
# Create an extra column which highlights whether or not a date occurs at the weekend
df['weekend'] = df['dt_label'].apply(lambda x: x.endswith(('Sat', 'Sun')))
# Loop over weekend pairs (Saturdays and Sundays), and highlight
for i in range(df['weekend'].sum() // 2):
    ax.axvspan(df[df['weekend']].index[2*i],
               df[df['weekend']].index[2*i+1],
               alpha=0.5)
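Since the question asks about the x-axis labels themselves, here is a minimal sketch that colors the weekend tick labels instead of shading the plot area. It is not part of the answer above. It assumes one tick per observation and plain matplotlib plotting; the red/bold styling is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd

dates = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-04',
                        '2020-01-05', '2020-01-06'])
df = pd.DataFrame({'foo': [1, 2, 4, 5, 6]}, index=dates)

# Work in matplotlib date units so ticks and data share the same x scale
positions = mdates.date2num(df.index.to_pydatetime())

fig, ax = plt.subplots()
ax.plot(positions, df['foo'])

# One tick per observation, labelled with the date and weekday abbreviation
ax.set_xticks(positions)
ax.set_xticklabels(list(df.index.strftime('%Y-%m-%d %a')), rotation=90, ha='right')

# dayofweek: Monday=0 ... Sunday=6, so >= 5 selects Saturday and Sunday
is_weekend = df.index.dayofweek >= 5
for label, weekend in zip(ax.get_xticklabels(), is_weekend):
    if weekend:
        label.set_color('red')
        label.set_fontweight('bold')

fig.tight_layout()
```

The same loop can be combined with axvspan shading if both the labels and the plot area should stand out.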

Here is a solution that uses the fill_between plotting function and the x-axis units so that weekends can be highlighted independently from the DatetimeIndex and the frequency of the data.
The x-axis limits are used to compute the range of time covered by the plot in terms of days, which is the unit used for matplotlib dates. Then a weekends mask is computed and passed to the where argument of the fill_between function. The masks are processed as right-exclusive so in this case, they must contain Mondays for the highlights to be drawn up to Mondays 00:00. Because plotting these highlights can alter the x-axis limits when weekends occur near the limits, the x-axis limits are set back to the original values after plotting.
Note that contrary to axvspan, the fill_between function needs the y1 and y2 arguments. For some reason, using the default y-axis limits leaves a small gap between the plot frame and the tops and bottoms of the weekend highlights. This issue is solved by running ax.set_ylim(*ax.get_ylim()) just after creating the plot.
Here is a complete example based on the provided sample code and using an extended dataset similar to the answer provided by jwalton:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.dates as mdates
# Create sample dataset
dt = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-04', '2020-01-05',
'2020-01-06', '2020-01-07', '2020-01-09', '2020-01-10',
'2020-01-11', '2020-01-14'])
df = pd.DataFrame(dict(foo=range(len(dt))), index=dt)
# Draw pandas plot: setting x_compat=True converts the pandas x-axis units to
# matplotlib date units. This is not necessary for this particular example but
# it is necessary for all cases where the dataframe contains a continuous
# DatetimeIndex (for example ones created with pd.date_range) that uses a
# frequency other than daily
ax = df['foo'].plot(x_compat=True, figsize=(6,4), ylabel='foo')
ax.set_ylim(*ax.get_ylim()) # reset y limits to display highlights without gaps
# Highlight weekends based on the x-axis units
xmin, xmax = ax.get_xlim()
days = np.arange(np.floor(xmin), np.ceil(xmax)+2) # range of days in date units
weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
ax.set_xlim(xmin, xmax) # set limits back to default values
# Create and format x tick for each data point
plt.xticks(df.index.values, df.index.strftime('%d\n%a'), rotation=0, ha='center')
plt.title('Weekends are highlighted from SAT 00:00 to MON 00:00', pad=15, size=12);
You can find more examples of this solution in the answers I have posted here and here.
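To see why Mondays have to be included in the mask, here is a small sketch that builds the mask for one week on its own, without any plotting. The dates are chosen only for illustration. fill_between with a where mask fills between consecutive True samples, so the stretch from Sunday 00:00 to Monday 00:00 is only filled if Monday's sample is True as well.

```python
import datetime
import numpy as np
import matplotlib.dates as mdates

# Seven consecutive days in matplotlib date units: Wed 1 Jan to Tue 7 Jan 2020
start = mdates.date2num(datetime.datetime(2020, 1, 1))
days = np.arange(start, start + 7)

# weekday(): Monday=0 ... Sunday=6
weekdays = np.array([d.weekday() for d in mdates.num2date(days)])
mask = (weekdays >= 5) | (weekdays == 0)  # Saturday, Sunday and Monday

for day, flag in zip(mdates.num2date(days), mask):
    print(day.strftime('%Y-%m-%d'), flag)
```

Only the 4th, 5th and 6th of January come out as True, which corresponds to a highlight from Saturday 00:00 to Monday 00:00.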

Related

Clustered x-axis with the dates not showing clearly

I'm trying to plot a time series with monthly dates from 1959 to 2019, and when I plot it I get a cluttered x-axis where the dates are not displayed properly. How can I remove the months and show only the years on the x-axis, so that it is less cluttered and the years are readable?
fig,ax = plt.subplots(2,1)
ax[0].hist(pca_function(sd_Data))
ax[0].set_ylabel ('Frequency')
ax[1].plot(pca_function(sd_Data))
ax[1].set_xlabel ('Years')
fig.suptitle('Histogram and Time series of Plot Factor')
plt.tight_layout()
# fig.savefig('factor1959.pdf')
pca_function(sd_Data)
comp_0
sasdate
1959-01 -0.418150
1959-02 1.341654
1959-03 1.684372
1959-04 1.981473
1959-05 1.242232
...
2019-08 -0.075270
2019-09 -0.402110
2019-10 -0.609002
2019-11 0.320586
2019-12 -0.303515
[732 rows x 1 columns]
From what I see, you do have years on your second subplot; they just overlap because there are too many of them placed horizontally. Try increasing figsize and rotating the ticks:
# Builds an example dataframe.
df = pd.DataFrame(columns=['Years', 'Frequency'])
df['Years'] = pd.date_range(start='1/1/1959', end='1/1/2023', freq='M')
df['Frequency'] = np.random.normal(0, 1, size=(df.shape[0]))
fig, ax = plt.subplots(2,1, figsize=(20, 5))
ax[0].hist(df.Frequency)
ax[0].set_ylabel ('Frequency')
ax[1].plot(df.Years, df.Frequency)
ax[1].set_xlabel('Years')
for tick in ax[0].get_xticklabels():
    tick.set_rotation(45)
    tick.set_ha('right')
for tick in ax[1].get_xticklabels():
    tick.set_rotation(45)
    tick.set_ha('right')
fig.suptitle('Histogram and Time series of Plot Factor')
plt.tight_layout()
p.s. if the x-labels still overlap, try to increase your step size.
First off, you need to store the result of the call to pca_function in a variable, for example one called result_pca_func. That way, the calculations (and possibly side effects or different randomization) are only done once.
Second, the dates should be converted to a datetime format. For example using pd.to_datetime(). That way, matplotlib can automatically put year ticks as appropriate.
Here is an example, starting from a dummy test dataframe:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date': [f'{y}-{m:02d}' for y in range(1959, 2019) for m in range(1, 13)]})
df['Values'] = np.random.randn(len(df)).cumsum()
df = df.set_index('Date')
result_pca_func = df
result_pca_func.index = pd.to_datetime(result_pca_func.index)
fig, ax2 = plt.subplots(figsize=(10, 3))
ax2.plot(result_pca_func)
plt.tight_layout()
plt.show()
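If you want explicit control over which years are shown, a date locator and formatter can be set once the index holds real datetimes. This is a sketch on dummy data standing in for the PCA output; the 10-year spacing is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Dummy monthly data with a string index such as '1959-01' (stand-in for the PCA output)
labels = [f'{y}-{m:02d}' for y in range(1959, 2020) for m in range(1, 13)]
rng = np.random.default_rng(0)
series = pd.Series(rng.standard_normal(len(labels)).cumsum(), index=labels)

# Convert the strings to real datetimes so matplotlib can use date locators
series.index = pd.to_datetime(series.index, format='%Y-%m')

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(series.index.to_pydatetime(), series.values)
ax.xaxis.set_major_locator(mdates.YearLocator(10))        # one tick every 10 years
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))  # label with the year only
ax.set_xlabel('Years')
fig.tight_layout()
```

A smaller base such as YearLocator(5) gives denser ticks if the figure is wide enough.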

How can I reduce the frequency of x-axis ticks in Python when plotting multiple groups of values on one axis?

This produces a graph of all these stock prices plotted against the date. If you zoom in, all the tiny ticks have labels for the dates. I wanted to reduce the frequency of ticks so that it only displayed tick marks at the month and year. I have tried using locators and formatters, but whenever I add them all of the ticks and tick labels completely disappear. All that's left at the x-axis is the x-axis label.
Does any of the issue lie within the fact that I extract the date and use that for the x-axis plot for every new batch of stock prices I want to plot? Any advice would be appreciated. I am a beginner programmer.
from iexfinance import get_historical_data
import pandas as pd
import matplotlib.pyplot as plt
def tester():
    start_date = '20170828'
    end_date = '20180828'
    symbols = ['GOOG', 'IBM', 'CRON']
    for symbol in symbols:
        f_temp = get_historical_data(symbol, start_date, end_date, output_format='pandas')
        df_close = pd.DataFrame(f_temp['close'])
        df_open = pd.DataFrame(f_temp['open'])
        df_date_string = pd.to_datetime(f_temp.index).strftime("%Y%m%d").astype(str)
        df = pd.merge(df_open, df_close, on=df_date_string)
        df.columns = ['date', 'open', 'close']
        plt.legend(symbols)
        plot_data(df)
    plt.show()
    return df

def normalize_data(df):
    # .ix was removed in pandas 1.0; .iloc[0] selects the first row by position
    return df / df.iloc[0]

def plot_data(df):
    normalized = normalize_data(df['close'])
    plt.plot(df['date'], normalized)
    plt.title("Normalized close stock prices")
    plt.xlabel("Dates")
    plt.ylabel("Close prices")
    plt.tight_layout()

if __name__ == "__main__":
    df = tester()
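A likely reason the locators wipe out the ticks is that the 'date' column holds strings produced by strftime, so matplotlib treats the x-axis as categorical and the date locators place ticks far outside the plotted range. The sketch below keeps the x values as datetimes and then applies a month locator. The prices are synthetic stand-ins (not downloaded with iexfinance), and the tick format is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Synthetic stand-in for the downloaded prices (hypothetical data)
dates = pd.bdate_range('2017-08-28', '2018-08-28')
rng = np.random.default_rng(1)

fig, ax = plt.subplots(figsize=(10, 4))
for symbol in ['GOOG', 'IBM', 'CRON']:
    close = pd.Series(100 + rng.standard_normal(len(dates)).cumsum(), index=dates)
    normalized = close / close.iloc[0]
    # x values are datetimes, not '%Y%m%d' strings
    ax.plot(close.index.to_pydatetime(), normalized.values, label=symbol)

ax.xaxis.set_major_locator(mdates.MonthLocator())            # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))  # e.g. month name and year
fig.autofmt_xdate()
ax.set_title('Normalized close stock prices')
ax.set_xlabel('Dates')
ax.set_ylabel('Close prices')
ax.legend()
```

With real data, the equivalent change would be to plot against pd.to_datetime(f_temp.index) rather than the formatted strings.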

Matplotlib: How to skip a range of hours when plotting with a datetime axis?

I have tick-by-tick data of a financial instrument, which I am trying to plot using matplotlib. I am working with pandas and the data is indexed with DatetimeIndex.
The problem is, when I try to plot multiple trading days I can't skip the range of time between the market closing time and next day's opening (see the example), which of course I am not interested in.
Is there a way to make matplotlib ignore this and just "stick" together the closing quote with the following day's opening? I tried to pass a custom range of time:
plt.xticks(time_range)
But the result is the same. Any ideas how to do this?
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
# Example data
instrument = pd.DataFrame(data={
'Datetime': [
dt.datetime.strptime('2018-01-11 11:00:11', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-11 13:02:17', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-11 16:59:14', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-12 11:00:11', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-12 13:15:24', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-12 16:58:43', '%Y-%m-%d %H:%M:%S')
],
'Price': [127.6, 128.1, 127.95, 129.85, 129.7, 131.2],
'Volume': [725, 146, 48, 650, 75, 160]
}).set_index('Datetime')
plt.figure(figsize=(10,5))
top = plt.subplot2grid((4,4), (0, 0), rowspan=3, colspan=4)
bottom = plt.subplot2grid((4,4), (3,0), rowspan=1, colspan=4)
top.plot(instrument.index, instrument['Price'])
bottom.bar(instrument.index, instrument['Volume'], 0.005)
top.xaxis.get_major_ticks()
top.axes.get_xaxis().set_visible(False)
top.set_title('Example')
top.set_ylabel('Price')
bottom.set_ylabel('Volume')
TL;DR
Replace the matplotlib plotting functions:
top.plot(instrument.index, instrument['Price'])
bottom.bar(instrument.index, instrument['Volume'], 0.005)
With these ones:
top.plot(range(instrument.index.size), instrument['Price'])
bottom.bar(range(instrument.index.size), instrument['Volume'], width=1)
Or with these pandas plotting functions (only the x-axis limits will look different):
instrument['Price'].plot(use_index=False, ax=top)
instrument['Volume'].plot.bar(width=1, ax=bottom)
Align both plots by sharing the x-axis with sharex=True and set up the ticks as you would like them using the dataframe index, as shown in the example further below.
Let me first create a sample dataset and show what it looks like if I plot it using matplotlib plotting functions like in your example where the DatetimeIndex is used as the x variable.
Create sample dataset
The sample data is created using the pandas_market_calendars package to create a realistic DatetimeIndex with a minute-by-minute frequency that spans several weekdays and a weekend.
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.ticker as ticker
import pandas_market_calendars as mcal # v 1.6.1
# Create datetime index with a 'minute start' frequency based on the New
# York Stock Exchange trading hours (end date is inclusive)
nyse = mcal.get_calendar('NYSE')
nyse_schedule = nyse.schedule(start_date='2021-01-07', end_date='2021-01-11')
nyse_dti = mcal.date_range(nyse_schedule, frequency='1min', closed='left')\
.tz_convert(nyse.tz.zone)
# Remove timestamps of closing times to create a 'period start' datetime index
nyse_dti = nyse_dti.delete(nyse_dti.indexer_at_time('16:00'))
# Create sample of random data consisting of opening price and
# volume of financial instrument traded for each period
rng = np.random.default_rng(seed=1234) # random number generator
price_change = rng.normal(scale=0.1, size=nyse_dti.size)
price_open = 127.5 + np.cumsum(price_change)
volume = rng.integers(100, 10000, size=nyse_dti.size)
df = pd.DataFrame(data=dict(Price=price_open, Volume=volume), index=nyse_dti)
df.head()
# Price Volume
# 2021-01-07 09:30:00-05:00 127.339616 7476
# 2021-01-07 09:31:00-05:00 127.346026 3633
# 2021-01-07 09:32:00-05:00 127.420115 1339
# 2021-01-07 09:33:00-05:00 127.435377 3750
# 2021-01-07 09:34:00-05:00 127.521752 7354
Plot data with matplotlib using the DatetimeIndex
This sample data can now be plotted using matplotlib plotting functions like in your example, but note that the subplots are created by using plt.subplots with the sharex=True argument. This aligns the line with the bars correctly and makes it possible to use the interactive interface of matplotlib with both subplots.
# Create figure and plots using matplotlib functions
fig, (top, bot) = plt.subplots(2, 1, sharex=True, figsize=(10,5),
gridspec_kw=dict(height_ratios=[0.75,0.25]))
top.plot(df.index, df['Price'])
bot.bar(df.index, df['Volume'], 0.0008)
# Set title and labels
top.set_title('Matplotlib plots with unwanted gaps', pad=20, size=14, weight='semibold')
top.set_ylabel('Price', labelpad=10)
bot.set_ylabel('Volume', labelpad=10);
Plot data with matplotlib without any gaps by using a range of integers
The problem of these gaps can be solved by simply ignoring the DatetimeIndex and using a range of integers instead. Most of the work then lies in creating appropriate tick labels. Here is an example:
# Create figure and matplotlib plots with some additional formatting
fig, (top, bot) = plt.subplots(2, 1, sharex=True, figsize=(10,5),
gridspec_kw=dict(height_ratios=[0.75,0.25]))
top.plot(range(df.index.size), df['Price'])
top.set_title('Matplotlib plots without any gaps', pad=20, size=14, weight='semibold')
top.set_ylabel('Price', labelpad=10)
top.grid(axis='x', alpha=0.3)
bot.bar(range(df.index.size), df['Volume'], width=1)
bot.set_ylabel('Volume', labelpad=10)
# Set fixed major and minor tick locations
ticks_date = df.index.indexer_at_time('09:30')
ticks_time = np.arange(df.index.size)[df.index.minute == 0][::2] # step in hours
bot.set_xticks(ticks_date)
bot.set_xticks(ticks_time, minor=True)
# Format major and minor tick labels
labels_date = [maj_tick.strftime('\n%d-%b').replace('\n0', '\n')
for maj_tick in df.index[ticks_date]]
labels_time = [min_tick.strftime('%I %p').lstrip('0').lower()
for min_tick in df.index[ticks_time]]
bot.set_xticklabels(labels_date)
bot.set_xticklabels(labels_time, minor=True)
bot.figure.autofmt_xdate(rotation=0, ha='center', which='both')
Create dynamic ticks for interactive plots
If you like to use the interactive interface of matplotlib (with pan/zoom), you will need to use locators and formatters from the matplotlib ticker module. Here is an example of how to set the ticks, where the major ticks are fixed and formatted like above but the minor ticks are generated automatically as you zoom in/out of the plot:
# Set fixed major tick locations and automatic minor tick locations
ticks_date = df.index.indexer_at_time('09:30')
bot.set_xticks(ticks_date)
bot.xaxis.set_minor_locator(ticker.AutoLocator())
# Format major tick labels
labels_date = [maj_tick.strftime('\n%d-%b').replace('\n0', '\n')
for maj_tick in df.index[ticks_date]]
bot.set_xticklabels(labels_date)
# Format minor tick labels
def min_label(x, pos):
    if 0 <= x < df.index.size:
        return df.index[int(x)].strftime('%H:%M')
min_fmtr = ticker.FuncFormatter(min_label)
bot.xaxis.set_minor_formatter(min_fmtr)
bot.figure.autofmt_xdate(rotation=0, ha='center', which='both')
Documentation: example of an alternative solution; datetime string format codes
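For readers without pandas_market_calendars, the same idea can be tried on the six data points from the question. This is a minimal sketch rather than a full replacement for the example above: the helper name position_to_label and the label format are illustrative choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import pandas as pd

index = pd.to_datetime(['2018-01-11 11:00:11', '2018-01-11 13:02:17',
                        '2018-01-11 16:59:14', '2018-01-12 11:00:11',
                        '2018-01-12 13:15:24', '2018-01-12 16:58:43'])
price = [127.6, 128.1, 127.95, 129.85, 129.7, 131.2]

def position_to_label(x, pos=None):
    """Translate an integer x position back to its timestamp ('' if out of range)."""
    i = int(round(x))
    if 0 <= i < len(index):
        return index[i].strftime('%m-%d %H:%M')
    return ''

fig, ax = plt.subplots(figsize=(8, 3))
# Plot against 0..n-1 so the overnight gap takes up no horizontal space
ax.plot(range(len(index)), price)
ax.xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
ax.xaxis.set_major_formatter(ticker.FuncFormatter(position_to_label))
ax.set_ylabel('Price')
fig.tight_layout()
```

Because the x positions are row numbers, the spacing between points no longer reflects elapsed time, which is the trade-off of this approach.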
Maybe use https://pypi.org/project/mplfinance/
Allows mimicking the usual financial plots you see in most services.
When you call the mplfinance mpf.plot() function, there is a kwarg show_nontrading, which by default is set to False so that these unwanted gaps are automatically not plotted. (To plot them, set show_nontrading=True).

Remove Saturdays (but not Sundays or other dataless periods) from time series plot

I am plotting a financial time series (see below, here one month's worth of data).
I would like to remove the periods marked with red crosses, which are Saturdays. Note that these are not all the periods without data, only the Saturdays.
I know there are some examples of how to remove gaps, for instance: http://matplotlib.org/examples/api/date_index_formatter.html.
This is not what I am after, since they remove all the gaps. (NOT MY INTENT!)
I was thinking that the way to go might be to create a custom sequence of values for the x-axis. Since the days are ordinals (i.e. 1 day = a value of 1), it might be possible to create a sequence such as 1,2,3,4,5,6,8,9,10,11,12,13,15,16,etc., skipping one value every seven days, with the skipped day being a Saturday of course.
I can imagine how to skip every Saturday using rrule from dateutil. It is used here (see below), where every Monday is marked with a stronger vertical line. But how would I go about passing it to the tick locator? There is in fact an RRuleLocator class in the matplotlib API, but the docs give no indication of how to use it: http://matplotlib.org/api/dates_api.html#matplotlib.dates.RRuleLocator.
Every suggestion is welcome.
Here is the code that I use for the current chart:
fig, axes = plt.subplots(2, figsize=(20, 6))
# plain array without the df column headers (as_matrix() was removed in pandas 1.0)
quotes = price_data.to_numpy()
mpf.candlestick_ohlc(axes[0], quotes, width=0.01)
plt.bar(quotes[:,0] , quotes[:,5], width = 0.01)
for i , axes[i] in enumerate(axes):
    axes[i].xaxis.set_major_locator(mdates.DayLocator(interval=1) )
    axes[i].xaxis.set_major_formatter(mdates.DateFormatter('%a, %b %d'))
    axes[i].grid(True)
    # show night times with a grey shade
    majors=axes[i].xaxis.get_majorticklocs()
    chart_start, chart_end = (axes[i].xaxis.get_view_interval()[0],
                              axes[i].xaxis.get_view_interval()[1])
    for major in majors:
        axes[i].axvspan(max (chart_start, major-(0.3333)),
                        min(chart_end, major+(0.3333)),color="0.95", zorder=-1 ) #0.33 corresponds to 1/3 of a day i.e. 8h
    # show mondays with a line
    mondays = list(rrule(WEEKLY, byweekday=MO, dtstart= mdates.num2date(chart_start),
                         until=mdates.num2date(chart_end)))
    for j, monday in enumerate(mondays):
        axes[i].axvline(mdates.date2num(mondays[j]), linewidth=0.75, color='k', zorder=1)
If your dates are datetime objects, or a DatetimeIndex in a pandas DataFrame, you can check which weekday a given date falls on using .weekday. Then you just set the data on Saturdays to nan. See the example below.
Code
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import random
import datetime
import numpy as np
# generate some data with a datetime index
x = 400
data = pd.DataFrame([
random.random() for i in range(x)],
index=[datetime.datetime(2018, 1, 1, 0)
+ datetime.timedelta(hours=i) for i in range(x)])
# Set all data on a Saturday (5) to nan, so it doesn't show in the graph
data[data.index.weekday == 5] = np.nan
# Plot the data
fig, ax = plt.subplots(figsize=(12, 2.5))
ax.plot(data)
# Set a major tick on each weekday
days = mdates.DayLocator()
daysFmt = mdates.DateFormatter('%a')
ax.xaxis.set_major_locator(days)
ax.xaxis.set_major_formatter(daysFmt)
Result
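Setting Saturdays to nan hides the data but still leaves the Saturday slot on the axis. The custom x sequence described in the question (1,2,3,4,5,6,8,...) can be sketched as below. This is not part of the answer above: compress_saturdays is a hypothetical helper, and it assumes a timezone-naive, sorted index from which Saturday rows have already been dropped. It subtracts the number of Saturdays elapsed since the first timestamp, counted with np.busday_count.

```python
import numpy as np
import pandas as pd

def compress_saturdays(index):
    """Return x positions in days since the first timestamp, with Saturdays cut out.

    Assumes a timezone-naive, sorted index that contains no Saturday timestamps.
    """
    index = pd.DatetimeIndex(index)
    if (index.dayofweek == 5).any():
        raise ValueError('drop Saturday rows before computing positions')
    origin = index[0].normalize()
    # Elapsed time since midnight of the first day, in fractional days
    elapsed = np.asarray((index - origin) / pd.Timedelta(days=1), dtype=float)
    # Saturdays between the origin (inclusive) and the day of each timestamp (exclusive)
    begin = np.datetime64(origin.date())
    ends = index.normalize().values.astype('datetime64[D]')
    saturdays = np.busday_count(begin, ends, weekmask='Sat')
    return elapsed - saturdays

# Friday noon, Sunday midnight, Monday 06:00 (5, 7 and 8 January 2018)
stamps = pd.to_datetime(['2018-01-05 12:00', '2018-01-07 00:00', '2018-01-08 06:00'])
x = compress_saturdays(stamps)
print(x)  # Sunday 00:00 lands at 1.0, directly after the end of Friday
```

The data can then be plotted against x, with tick labels produced by mapping the positions back to dates, for example through a matplotlib.ticker.FuncFormatter.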

Formatting X axis labels Pandas time series plot

I am trying to plot a dataframe of multiple time series in pandas. Each series has one year of daily points (length 365). The figure comes out fine, but I want to suppress the year tick shown on the x-axis.
I want to suppress the 1950 label shown in the left corner of the x-axis. Can anybody suggest something? My code:
dates = pandas.date_range('1950-01-01', '1950-12-31', freq='D')
data_to_plot12 = pandas.DataFrame(data=data_array, # values
index=homo_regions) # 1st column as index
dataframe1 = pandas.DataFrame.transpose(data_to_plot12)
dataframe1.index = dates
ax = dataframe1.plot(lw=1.5, marker='.', markersize=2, title='PRECT time series PI Slb Ocn CNTRL 60 years')
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")
fig_name = 'dataframe1.pdf'
plt.savefig(fig_name)
You should be able to specify the x-axis major formatter like so:
import matplotlib.dates as mdates
...
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
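Here is a self-contained sketch of that suggestion on random data. The column names are hypothetical. It also passes x_compat=True to the pandas plot call so that the axis uses matplotlib date units, which is what the mdates formatters expect; that argument is an addition to the snippet above, not part of it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

dates = pd.date_range('1950-01-01', '1950-12-31', freq='D')
rng = np.random.default_rng(2)
# Hypothetical regions standing in for the real data
df = pd.DataFrame(rng.random((len(dates), 3)), index=dates,
                  columns=['region_a', 'region_b', 'region_c'])

# x_compat=True makes pandas use matplotlib date units instead of its own period axis
ax = df.plot(x_compat=True, lw=1.5, marker='.', markersize=2)
ax.xaxis.set_major_locator(mdates.MonthLocator())         # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))  # month abbreviation, no year
ax.set(xlabel='Months', ylabel='PRECT (mm/day)')
ax.figure.tight_layout()
```

Each tick then shows only the month abbreviation, so the 1950 label no longer appears.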
