Formatting datetime for plotting time-series with pandas - python

I am plotting some time series from a pandas DataFrame and ran into the problem of gaps on weekends. What can I do to remove the gaps in the time-series plot?
date_concat = pd.to_datetime(pd.Series(df.index),infer_datetime_format=True)
pca_factors.index = date_concat
pca_colnames = ['Outright', 'Curve', 'Convexity']
pca_factors.columns = pca_colnames
fig,axes = plt.subplots(2)
pca_factors.Curve.plot(ax=axes[0]); axes[0].set_title('Curve')
pca_factors.Convexity.plot(ax=axes[1]); axes[1].set_title('Convexity'); plt.axhline(linewidth=2, color = 'g')
fig.tight_layout()
fig.savefig('convexity.png')
[Figure: partial plot of the output, not shown]
Ideally, I would like the time-series to only show the weekdays and ignore weekends.

To make MaxU's suggestion more explicit:
1. Convert to datetime as you have done, but drop the weekends.
2. Reset the index and plot the data against the default integer index.
3. Change the x tick labels back to the dates.
Code:
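pca_factors = pca_factors[pca_factors.index.weekday < 5]  # drop weekends (Mon=0 ... Fri=4)
date_labels = pca_factors.index.strftime('%Y-%m-%d')      # keep the dates for the tick labels
pca_factors = pca_factors.reset_index(drop=True)          # from MaxU's comment
pca_factors['Convexity'].plot()                           # plotted against 0..n-1, so no gaps
plt.xticks(pca_factors.index, date_labels, rotation=90)   # change the x tick labels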
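As a fuller illustration of the same idea, here is a self-contained sketch. The function name plot_without_weekends, its n_labels parameter and the sample data are hypothetical, not part of the original answer. It keeps only weekday rows, plots them against positions 0..n-1, and labels only every few positions so the dates stay readable:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_without_weekends(series, ax=None, n_labels=8):
    """Plot a Series with a DatetimeIndex against positions 0..n-1 so weekends leave no gaps."""
    weekdays = series[series.index.weekday < 5]  # Monday=0 ... Friday=4
    if ax is None:
        _, ax = plt.subplots()
    positions = np.arange(len(weekdays))
    ax.plot(positions, weekdays.to_numpy())
    # Label only every `step`-th position so the dates stay readable
    step = max(1, len(weekdays) // n_labels)
    ticks = positions[::step]
    ax.set_xticks(ticks)
    ax.set_xticklabels(list(weekdays.index[ticks].strftime('%Y-%m-%d')),
                       rotation=45, ha='right')
    return ax, weekdays


# Hypothetical data: three weeks of daily values starting on Monday 2021-01-04
idx = pd.date_range('2021-01-04', periods=21, freq='D')
s = pd.Series(np.arange(21, dtype=float), index=idx, name='Convexity')
ax, kept = plot_without_weekends(s, n_labels=5)
```

With the question's data this would be something like plot_without_weekends(pca_factors['Convexity'], ax=axes[1]), assuming pca_factors still has its DatetimeIndex at that point.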

Related

Clustered x-axis with the dates not showing clearly

I'm trying to plot a time series with monthly dates from 1959 to 2019, and when I plot it the x-axis is cluttered and the dates don't show properly. How can I drop the months and show only the years on the x-axis, so that it isn't as cluttered and the years are readable?
fig,ax = plt.subplots(2,1)
ax[0].hist(pca_function(sd_Data))
ax[0].set_ylabel ('Frequency')
ax[1].plot(pca_function(sd_Data))
ax[1].set_xlabel ('Years')
fig.suptitle('Histogram and Time series of Plot Factor')
plt.tight_layout()
# fig.savefig('factor1959.pdf')
pca_function(sd_Data)
           comp_0
sasdate
1959-01 -0.418150
1959-02 1.341654
1959-03 1.684372
1959-04 1.981473
1959-05 1.242232
...
2019-08 -0.075270
2019-09 -0.402110
2019-10 -0.609002
2019-11 0.320586
2019-12 -0.303515
[732 rows x 1 columns]
From what I see, you do have years on your second subplot; they just overlap because there are too many of them side by side. Try increasing figsize and rotating the tick labels:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Builds an example dataframe with one value per month.
df = pd.DataFrame({'Years': pd.date_range(start='1959-01-01', end='2023-01-01', freq='MS')})
df['Frequency'] = np.random.normal(0, 1, size=df.shape[0])
fig, ax = plt.subplots(2, 1, figsize=(20, 5))
ax[0].hist(df.Frequency)
ax[0].set_ylabel('Frequency')
ax[1].plot(df.Years, df.Frequency)
ax[1].set_xlabel('Years')
# Rotate the tick labels of both subplots so they don't overlap
for axis in ax:
    for tick in axis.get_xticklabels():
        tick.set_rotation(45)
        tick.set_ha('right')
fig.suptitle('Histogram and Time series of Plot Factor')
plt.tight_layout()
P.S. If the x-axis labels still overlap, increase the spacing between ticks, for example with matplotlib.dates.YearLocator(5) to place a tick only every five years.
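To show only the years, as the question asks, a year locator and a year-only formatter from matplotlib.dates can be set on the axis. This is a sketch with hypothetical random data standing in for pca_function(sd_Data); the five-year spacing is an arbitrary choice:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Hypothetical monthly series covering 1959-2019 (732 points), like the question's output
idx = pd.date_range('1959-01-01', '2019-12-01', freq='MS')
series = pd.Series(np.random.default_rng(0).normal(size=len(idx)), index=idx)

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(series.index, series.to_numpy())
# One major tick every 5 years, labelled with the year only (no months)
ax.xaxis.set_major_locator(mdates.YearLocator(5))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
ax.set_xlabel('Years')
fig.tight_layout()
```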
First, store the result of the call to pca_function in a variable, e.g. result_pca_func. That way the calculation (and any side effects or random draws) happens only once.
Second, convert the dates to a datetime type, for example with pd.to_datetime(). Then matplotlib can place year ticks automatically where appropriate.
Here is an example, starting from a dummy test dataframe:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date': [f'{y}-{m:02d}' for y in range(1959, 2020) for m in range(1, 13)]})  # 1959-01 to 2019-12
df['Values'] = np.random.randn(len(df)).cumsum()
df = df.set_index('Date')
result_pca_func = df
result_pca_func.index = pd.to_datetime(result_pca_func.index)
fig, ax2 = plt.subplots(figsize=(10, 3))
ax2.plot(result_pca_func)
plt.tight_layout()
plt.show()

how to highlight weekends in matplotlib plots?

For a simple time series:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'dt':['2020-01-01', '2020-01-02', '2020-01-04', '2020-01-05', '2020-01-06'], 'foo':[1,2, 4,5,6]})
df['dt'] = pd.to_datetime(df.dt)
df['dt_label']= df['dt'].dt.strftime('%Y-%m-%d %a')
df = df.set_index('dt')
#display(df)
df['foo'].plot()
x = plt.xticks(ticks=df.reset_index().dt.values, labels=df.dt_label, rotation=90, horizontalalignment='right')
How can I highlight the x-axis labels for weekends?
edit
Pandas Plots: Separate color for weekends, pretty printing times on x axis
suggests:
def highlight_weekends(ax, timeseries):
    d = timeseries.dt
    # dt.weekofyear was removed in pandas 2.0; isocalendar().week replaces it
    ranges = timeseries[d.dayofweek >= 5].groupby(d.year * 100 + d.isocalendar().week).agg(['min', 'max'])
    for i, tmin, tmax in ranges.itertuples():
        ax.axvspan(tmin, tmax, facecolor='orange', edgecolor='none', alpha=0.1)
but applying it with
highlight_weekends(ax, df.reset_index().dt)
will not change the plot
I've extended your sample data a little so that we can make sure more than a single weekend gets highlighted.
In this solution I create a column 'weekend', a column of bools indicating whether the corresponding date falls on a weekend.
We then loop over these values in Saturday/Sunday pairs and call ax.axvspan for each pair:
import pandas as pd
import matplotlib.pyplot as plt
# Add a couple of extra dates to sample data
df = pd.DataFrame({'dt': ['2020-01-01',
'2020-01-02',
'2020-01-04',
'2020-01-05',
'2020-01-06',
'2020-01-07',
'2020-01-09',
'2020-01-10',
'2020-01-11',
'2020-01-12']})
# Fill in corresponding observations
df['foo'] = range(df.shape[0])
df['dt'] = pd.to_datetime(df.dt)
df['dt_label']= df['dt'].dt.strftime('%Y-%m-%d %a')
df = df.set_index('dt')
ax = df['foo'].plot()
plt.xticks(ticks=df.reset_index().dt.values,
labels=df.dt_label,
rotation=90,
horizontalalignment='right')
# Create an extra column which highlights whether or not a date occurs at the weekend
df['weekend'] = df['dt_label'].apply(lambda x: x.endswith(('Sat', 'Sun')))
# Loop over weekend pairs (Saturdays and Sundays), and highlight
for i in range(df['weekend'].sum() // 2):
    ax.axvspan(df[df['weekend']].index[2*i],
               df[df['weekend']].index[2*i+1],
               alpha=0.5)
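The pairing above relies on every weekend in the data having both its Saturday and its Sunday present. A hedged alternative sketch (the helper name weekend_spans and the sample dates are hypothetical) maps each weekend day to the Saturday that starts its weekend and highlights Saturday 00:00 to Monday 00:00, so a weekend with only one of its days in the data is still highlighted:

```python
import pandas as pd
import matplotlib.pyplot as plt


def weekend_spans(index):
    """Return (start, end) pairs, Saturday 00:00 to Monday 00:00, for each weekend in the index."""
    days = pd.DatetimeIndex(index).normalize()
    weekend_days = days[days.weekday >= 5]  # Saturday=5, Sunday=6
    # Shift Sundays back by one day so both days map to their weekend's Saturday
    saturdays = (weekend_days - pd.to_timedelta(weekend_days.weekday - 5, unit='D')).unique()
    return [(sat, sat + pd.Timedelta(days=2)) for sat in saturdays]


# Hypothetical data: the first weekend has only a Saturday, the second only a Sunday
dt = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-04', '2020-01-06',
                     '2020-01-07', '2020-01-12'])
df = pd.DataFrame({'foo': range(len(dt))}, index=dt)
ax = df['foo'].plot(x_compat=True)
spans = weekend_spans(df.index)
for start, end in spans:
    ax.axvspan(start, end, alpha=0.2)
```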
Here is a solution that uses the fill_between plotting function and the x-axis units, so that weekends can be highlighted independently of the DatetimeIndex and of the data frequency.
The x-axis limits are used to compute the range of time covered by the plot in days, which is the unit used for matplotlib dates. Then a weekend mask is computed and passed to the where argument of fill_between. The mask is processed as right-exclusive, so here it must also include Mondays for the highlights to extend up to Monday 00:00. Because plotting these highlights can alter the x-axis limits when weekends occur near the limits, the x-axis limits are set back to their original values afterwards.
Note that, unlike axvspan, fill_between needs the y1 and y2 arguments. For some reason, using the default y-axis limits leaves a small gap between the plot frame and the tops and bottoms of the weekend highlights. This issue is solved by running ax.set_ylim(*ax.get_ylim()) right after creating the plot.
Here is a complete example based on the provided sample code and using an extended dataset similar to the answer provided by jwalton:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.dates as mdates
# Create sample dataset
dt = pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-04', '2020-01-05',
'2020-01-06', '2020-01-07', '2020-01-09', '2020-01-10',
'2020-01-11', '2020-01-14'])
df = pd.DataFrame(dict(foo=range(len(dt))), index=dt)
# Draw pandas plot: setting x_compat=True converts the pandas x-axis units to
# matplotlib date units. This is not necessary for this particular example but
# it is necessary for all cases where the dataframe contains a continuous
# DatetimeIndex (for example ones created with pd.date_range) that uses a
# frequency other than daily
ax = df['foo'].plot(x_compat=True, figsize=(6,4), ylabel='foo')
ax.set_ylim(*ax.get_ylim()) # reset y limits to display highlights without gaps
# Highlight weekends based on the x-axis units
xmin, xmax = ax.get_xlim()
days = np.arange(np.floor(xmin), np.ceil(xmax)+2) # range of days in date units
weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
ax.set_xlim(xmin, xmax) # set limits back to default values
# Create and format x tick for each data point
plt.xticks(df.index.values, df.index.strftime('%d\n%a'), rotation=0, ha='center')
plt.title('Weekends are highlighted from SAT 00:00 to MON 00:00', pad=15, size=12);

Matplotlib: How to skip a range of hours when plotting with a datetime axis?

I have tick-by-tick data of a financial instrument, which I am trying to plot using matplotlib. I am working with pandas and the data is indexed with DatetimeIndex.
The problem is, when I try to plot multiple trading days I can't skip the range of time between the market closing time and next day's opening (see the example), which of course I am not interested in.
Is there a way to make matplotlib ignore this and just "stick" together the closing quote with the following day's opening? I tried to pass a custom range of time:
plt.xticks(time_range)
But the result is the same. Any ideas how to do this?
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
# Example data
instrument = pd.DataFrame(data={
'Datetime': [
dt.datetime.strptime('2018-01-11 11:00:11', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-11 13:02:17', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-11 16:59:14', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-12 11:00:11', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-12 13:15:24', '%Y-%m-%d %H:%M:%S'),
dt.datetime.strptime('2018-01-12 16:58:43', '%Y-%m-%d %H:%M:%S')
],
'Price': [127.6, 128.1, 127.95, 129.85, 129.7, 131.2],
'Volume': [725, 146, 48, 650, 75, 160]
}).set_index('Datetime')
plt.figure(figsize=(10,5))
top = plt.subplot2grid((4,4), (0, 0), rowspan=3, colspan=4)
bottom = plt.subplot2grid((4,4), (3,0), rowspan=1, colspan=4)
top.plot(instrument.index, instrument['Price'])
bottom.bar(instrument.index, instrument['Volume'], 0.005)
top.xaxis.get_major_ticks()
top.axes.get_xaxis().set_visible(False)
top.set_title('Example')
top.set_ylabel('Price')
bottom.set_ylabel('Volume')
TL;DR
Replace the matplotlib plotting functions:
top.plot(instrument.index, instrument['Price'])
bottom.bar(instrument.index, instrument['Volume'], 0.005)
With these ones:
top.plot(range(instrument.index.size), instrument['Price'])
bottom.bar(range(instrument.index.size), instrument['Volume'], width=1)
Or with these pandas plotting functions (only the x-axis limits will look different):
instrument['Price'].plot(use_index=False, ax=top)
instrument['Volume'].plot.bar(width=1, ax=bottom)
Align both plots by sharing the x-axis with sharex=True and set up the ticks as you would like them using the dataframe index, as shown in the example further below.
Let me first create a sample dataset and show what it looks like if I plot it using matplotlib plotting functions like in your example where the DatetimeIndex is used as the x variable.
Create sample dataset
The sample data is created using the pandas_market_calendars package to create a realistic DatetimeIndex with a minute-by-minute frequency that spans several weekdays and a weekend.
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.ticker as ticker
import pandas_market_calendars as mcal # v 1.6.1
# Create datetime index with a 'minute start' frequency based on the New
# York Stock Exchange trading hours (end date is inclusive)
nyse = mcal.get_calendar('NYSE')
nyse_schedule = nyse.schedule(start_date='2021-01-07', end_date='2021-01-11')
nyse_dti = mcal.date_range(nyse_schedule, frequency='1min', closed='left')\
.tz_convert(nyse.tz.zone)
# Remove timestamps of closing times to create a 'period start' datetime index
nyse_dti = nyse_dti.delete(nyse_dti.indexer_at_time('16:00'))
# Create sample of random data consisting of opening price and
# volume of financial instrument traded for each period
rng = np.random.default_rng(seed=1234) # random number generator
price_change = rng.normal(scale=0.1, size=nyse_dti.size)
price_open = 127.5 + np.cumsum(price_change)
volume = rng.integers(100, 10000, size=nyse_dti.size)
df = pd.DataFrame(data=dict(Price=price_open, Volume=volume), index=nyse_dti)
df.head()
# Price Volume
# 2021-01-07 09:30:00-05:00 127.339616 7476
# 2021-01-07 09:31:00-05:00 127.346026 3633
# 2021-01-07 09:32:00-05:00 127.420115 1339
# 2021-01-07 09:33:00-05:00 127.435377 3750
# 2021-01-07 09:34:00-05:00 127.521752 7354
Plot data with matplotlib using the DatetimeIndex
This sample data can now be plotted using matplotlib plotting functions like in your example, but note that the subplots are created by using plt.subplots with the sharex=True argument. This aligns the line with the bars correctly and makes it possible to use the interactive interface of matplotlib with both subplots.
# Create figure and plots using matplotlib functions
fig, (top, bot) = plt.subplots(2, 1, sharex=True, figsize=(10,5),
gridspec_kw=dict(height_ratios=[0.75,0.25]))
top.plot(df.index, df['Price'])
bot.bar(df.index, df['Volume'], 0.0008)
# Set title and labels
top.set_title('Matplotlib plots with unwanted gaps', pad=20, size=14, weight='semibold')
top.set_ylabel('Price', labelpad=10)
bot.set_ylabel('Volume', labelpad=10);
Plot data with matplotlib without any gaps by using a range of integers
The problem of these gaps can be solved by simply ignoring the DatetimeIndex and using a range of integers instead. Most of the work then lies in creating appropriate tick labels. Here is an example:
# Create figure and matplotlib plots with some additional formatting
fig, (top, bot) = plt.subplots(2, 1, sharex=True, figsize=(10,5),
gridspec_kw=dict(height_ratios=[0.75,0.25]))
top.plot(range(df.index.size), df['Price'])
top.set_title('Matplotlib plots without any gaps', pad=20, size=14, weight='semibold')
top.set_ylabel('Price', labelpad=10)
top.grid(axis='x', alpha=0.3)
bot.bar(range(df.index.size), df['Volume'], width=1)
bot.set_ylabel('Volume', labelpad=10)
# Set fixed major and minor tick locations
ticks_date = df.index.indexer_at_time('09:30')
ticks_time = np.arange(df.index.size)[df.index.minute == 0][::2] # step in hours
bot.set_xticks(ticks_date)
bot.set_xticks(ticks_time, minor=True)
# Format major and minor tick labels
labels_date = [maj_tick.strftime('\n%d-%b').replace('\n0', '\n')
for maj_tick in df.index[ticks_date]]
labels_time = [min_tick.strftime('%I %p').lstrip('0').lower()
for min_tick in df.index[ticks_time]]
bot.set_xticklabels(labels_date)
bot.set_xticklabels(labels_time, minor=True)
bot.figure.autofmt_xdate(rotation=0, ha='center', which='both')
Create dynamic ticks for interactive plots
If you would like to use the interactive interface of matplotlib (with pan/zoom), you will need locators and formatters from the matplotlib ticker module. Here is an example where the major ticks are fixed and formatted as above, but the minor ticks are generated automatically as you zoom in and out of the plot:
# Set fixed major tick locations and automatic minor tick locations
ticks_date = df.index.indexer_at_time('09:30')
bot.set_xticks(ticks_date)
bot.xaxis.set_minor_locator(ticker.AutoLocator())
# Format major tick labels
labels_date = [maj_tick.strftime('\n%d-%b').replace('\n0', '\n')
for maj_tick in df.index[ticks_date]]
bot.set_xticklabels(labels_date)
# Format minor tick labels
def min_label(x, pos):
    if 0 <= x < df.index.size:
        return df.index[int(x)].strftime('%H:%M')
    return ''  # no label for ticks outside the data range
min_fmtr = ticker.FuncFormatter(min_label)
bot.xaxis.set_minor_formatter(min_fmtr)
bot.figure.autofmt_xdate(rotation=0, ha='center', which='both')
Documentation: example of an alternative solution; datetime string format codes
Maybe use https://pypi.org/project/mplfinance/
It lets you mimic the usual financial charts you see in most services.
When you call the mplfinance mpf.plot() function, there is a kwarg show_nontrading, which by default is set to False so that these unwanted gaps are automatically not plotted. (To plot them, set show_nontrading=True).

Formatting X axis labels Pandas time series plot

I am trying to plot a DataFrame of multiple time series in pandas. Each time series consists of 365 daily points covering one year. The figure looks fine, but I want to suppress the year tick label on the x-axis.
I want to suppress the 1950 label shown in the left corner of the x-axis. Can anybody suggest how to do this? My code:
dates = pandas.date_range('1950-01-01', '1950-12-31', freq='D')
data_to_plot12 = pandas.DataFrame(data=data_array, # values
index=homo_regions) # 1st column as index
dataframe1 = pandas.DataFrame.transpose(data_to_plot12)
dataframe1.index = dates
ax = dataframe1.plot(lw=1.5, marker='.', markersize=2, title='PRECT time series PI Slb Ocn CNTRL 60 years')
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")
fig_name = 'dataframe1.pdf'
plt.savefig(fig_name)
You should be able to set the x-axis major formatter like so:
import matplotlib.dates as mdates
...
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
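A slightly fuller sketch, assuming a stand-in for dataframe1 (the column names and random data are hypothetical). Passing x_compat=True to DataFrame.plot makes pandas use ordinary matplotlib date units, so the matplotlib.dates locator and formatter apply directly; a MonthLocator puts one tick per month and '%b' prints only the month abbreviation, so no year label appears. Clearing the minor formatter is a precaution in case minor tick labels are present:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import matplotlib.ticker as ticker

dates = pd.date_range('1950-01-01', '1950-12-31', freq='D')
# Hypothetical stand-in for dataframe1: two regions with 365 daily values each
df = pd.DataFrame(np.random.default_rng(1).random((len(dates), 2)),
                  index=dates, columns=['region_a', 'region_b'])

# x_compat=True: plain matplotlib date units instead of pandas' period-based axis
ax = df.plot(lw=1.5, x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())          # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))   # 'Jan', 'Feb', ...
ax.xaxis.set_minor_formatter(ticker.NullFormatter())       # no minor labels
ax.set(xlabel='Months', ylabel='PRECT (mm/day)')
```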

Plot tuple and string using pyplot

I've got a list of tuples with timestamps and usernames, and I'm trying to plot the frequency of the timestamps.
That means the x-axis will run from the earliest timestamp to the latest, and the y-axis will show the frequency within each time period. I'm also looking for a way to mark which username each timestamp belongs to (with different colors etc.).
I've been googling for hours but can't find any examples that do what I'm looking for. Could anyone here point me in the right direction?
The timestamp is epoch time. Example:
lst= [('john', '1446302675'), ('elvis', '1446300605'),('peter','1446300622'), ...]
Thanks
You could create the histogram by simply counting the users per date, and then show it with a bar chart:
import numpy as np
import matplotlib.pyplot as plt
# some data
data=[('2013-05-15','test'),('2013-05-16','test'),('2013-05-14','user2'),('2013-05-14', 'user')]
# to create the histogram I use a dict()
hist = dict()
for x in data:
    if x[0] in hist:
        hist[x[0]] += 1
    else:
        hist[x[0]] = 1
# extract the labels and sort them
labels = [x for x in hist]
labels = sorted(labels)
# extract the values
values = [hist[x] for x in labels]
num = len(labels)
# now plot the values as bar chart
fig, ax = plt.subplots()
barwidth = 0.3
ax.bar(np.arange(num), values, barwidth)
ax.set_xticks(np.arange(num))  # bars are centred on their x positions by default
ax.set_xticklabels(labels)
plt.show()
This results in: [bar chart not shown]
For more details on creating a bar chart, see the example at http://matplotlib.org/examples/api/barchart_demo.html.
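Edit 1: With the data format you added, you can use my example after converting each timestamp like this. Note that fromtimestamp returns local time, so the result depends on your time zone (in UTC, 1446302675 is 2015-10-31 14:44:35):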
>>> from datetime import datetime
>>> datetime.fromtimestamp(float('1446302675')).strftime('%Y-%m-%d %H:%M:%S')
'2015-10-31 15:44:35'
If you only want to use year-month-date, then you can use:
>>> datetime.fromtimestamp(float('1446302675')).strftime('%Y-%m-%d')
'2015-10-31'
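A minimal sketch that goes one step further and colours the bars by username, which the question also asked for. It assumes the timestamps are epoch seconds to be read in UTC and counts them per calendar day; the helper counts_per_day and the extra ('john', '1446389075') record are hypothetical, added only so there is more than one day to plot:

```python
from collections import Counter
from datetime import datetime, timezone

import numpy as np
import matplotlib.pyplot as plt

# Last record is hypothetical extra data (one day after john's first timestamp)
lst = [('john', '1446302675'), ('elvis', '1446300605'), ('peter', '1446300622'),
       ('john', '1446389075')]


def counts_per_day(records):
    """Count records per (day, user); timestamps are read as UTC epoch seconds."""
    counts = Counter()
    for user, ts in records:
        day = datetime.fromtimestamp(float(ts), tz=timezone.utc).strftime('%Y-%m-%d')
        counts[(day, user)] += 1
    return counts


counts = counts_per_day(lst)
days = sorted({day for day, _ in counts})
users = sorted({user for _, user in counts})

fig, ax = plt.subplots()
bottom = np.zeros(len(days))
for user in users:
    # Missing (day, user) keys count as 0 in a Counter
    heights = np.array([counts[(day, user)] for day in days])
    ax.bar(np.arange(len(days)), heights, bottom=bottom, label=user)  # stacked, one colour per user
    bottom += heights
ax.set_xticks(np.arange(len(days)))
ax.set_xticklabels(days)
ax.set_ylabel('Count')
ax.legend()
```

Stacking the bars keeps the total height equal to the overall frequency per day while the colours show which users contributed.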
