3D scatter plot with strings in Python extracted from dates - python

Question :
Is there a way to show the day as a string rather than a decimal value? The same goes for the month.
Note: I have already read this answer (3D Scatterplot with strings in Python), which does not solve my problem.
I am working on a personal project in which I am trying to create a 3D chart of my commute from data I retrieved from my Google activity.
For reference I am following this guide: https://nvbn.github.io/2018/05/01/commute/
I am able to create informative 2D charts based on the Month + Time and Day + Time attributes, but I would like to combine these two charts.
The 3D chart I want to create requires three attributes: Day (Mon/Tue), Month (Jan/Feb), and time taken.
Given that matplotlib does not support string values in charts out of the box, I have used numbers for Day (0-6) and Month (1-12). However, the graph is hard to read because the day axis shows decimal values.
My current code looks like this; it uses weekday() to get the day number and month to get the month number.
# How commute is calculated and grouped
import pandas as pd
from matplotlib import pyplot
#{...}
def get_commute_to_work():
    #{...}
    yield Commute_to_work(pd.to_datetime(start.datetime), start.datetime, end.datetime, end.datetime - start.datetime)
# Now creating graph here
fig, ax = pyplot.subplots(subplot_kw={'projection': '3d'})
ax.grid()
ax.scatter([commute.day.weekday() for commute in normalised],
           [commute.day.month for commute in normalised],
           [commute.took.total_seconds() / 60 for commute in normalised])
ax.set(xlabel='Day', ylabel='Month', zlabel='commute (minutes)',
       title='Daily commute')
ax.legend()
pyplot.show()
N.B. If you wish to look at this code in detail, it is available on GitHub here.

You can try this (I have not verified it for the 3D plot, though):
import numpy as np
# weekday() returns 0 for Monday through 6 for Sunday
x_tick_labels = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# Set one tick per weekday number on the x-axis
x = np.arange(7)
ax.set_xticks(x)
# Set tick labels for the x-axis
ax.set_xticklabels(x_tick_labels, rotation='vertical', fontsize=18)
You can repeat a similar procedure for months.
The source for this answer is here.
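As a rough illustration of the tick-label approach applied to a 3D axes, here is a minimal sketch. The commute records, the label lists, and the variable names are made up for the example; they are not taken from the asker's repository. It assumes that weekday() numbers (0 = Monday) go on the x-axis and month numbers (1-12) on the y-axis, as in the question.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical commute records: (date, minutes taken)
commutes = [
    (dt.date(2018, 1, 8), 42.0),
    (dt.date(2018, 2, 14), 55.5),
    (dt.date(2018, 3, 23), 38.0),
    (dt.date(2018, 4, 29), 47.5),
]

DAY_LABELS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTH_LABELS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
ax.scatter([d.weekday() for d, _ in commutes],   # 0 = Monday ... 6 = Sunday
           [d.month for d, _ in commutes],       # 1 = January ... 12 = December
           [minutes for _, minutes in commutes])

# Put one tick on every integer value and replace the numbers with names
ax.set_xticks(range(7))
ax.set_xticklabels(DAY_LABELS)
ax.set_yticks(range(1, 13))
ax.set_yticklabels(MONTH_LABELS)
ax.set(xlabel="Day", ylabel="Month", zlabel="commute (minutes)",
       title="Daily commute")
```

Because the ticks are pinned to integer positions, the decimal values disappear from both axes. For interactive use, drop the matplotlib.use("Agg") line and call plt.show().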

Related

How to set a starting day in heatmap (python)?

I have drawn a heatmap that represents the anomaly score for each day of specific weeks. The heatmap I got is shown below.
Now, I want the y-axis to start with Monday, and on the x-axis the gap between two dates should be 7 days, i.e. one week. Is there another way to draw a heatmap to get the result I want? Or are there other ways to set the parameters of the existing heatmap function (sns.heatmap())?
There may be a more sophisticated way to do this, but I have taken the day of the week values from the sample data dates and pivot transformed them to be the source data for the graph.
Next, we will create a list of days of the week to make the day of the week data into day names. Then, create a label for the x-axis with a date interval of 7 days.
weekday = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
ax = sns.heatmap(df_pivot, cmap='afmhot_r')
freq = 7
ax.set_xticks(df.index[::freq])
ax.set_xticklabels(df.iloc[::freq]["date"].dt.strftime("%Y-%m-%d"))
ax.set_yticklabels(weekday, rotation=0)
ax.invert_yaxis()
plt.show()
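The pivot step mentioned above is not shown in the answer, so here is one possible way to build such a table with pandas. The column names (date, score, weekday, week_start) and the random sample data are assumptions made for this sketch, and it produces one column per week, which may differ from the layout the answer had in mind.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one anomaly score per day for four weeks,
# starting on Monday 2021-01-04
rng = np.random.default_rng(seed=0)
df = pd.DataFrame({"date": pd.date_range("2021-01-04", periods=28, freq="D")})
df["score"] = rng.random(len(df))

# Day of week (0 = Monday ... 6 = Sunday) and the Monday that starts each week
df["weekday"] = df["date"].dt.dayofweek
df["week_start"] = df["date"] - pd.to_timedelta(df["weekday"], unit="D")

# Rows are weekdays (Monday first), columns are weeks spaced 7 days apart
df_pivot = df.pivot(index="weekday", columns="week_start", values="score")
df_pivot = df_pivot.reindex(range(7))  # keep all 7 rows even if a day is missing
```

A table shaped like this could then be passed to sns.heatmap(df_pivot, yticklabels=weekday), with df_pivot.columns.strftime('%Y-%m-%d') as the x tick labels.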

Highlighting weekends in small multiples

How can I highlight weekends in small multiples?
I've read different threads (e.g. (1) and (2)) but couldn't figure out how to implement it in my case, since I work with small multiples where I iterate through the DatetimeIndex for every month (see code below figure). My data Profiles is, in this case, a time series of 2 years with an interval of 15 min (i.e. 70080 data points).
However, weekend days occurring at the end of the month generate an error; in this case: IndexError: index 2972 is out of bounds for axis 0 with size 2972
My attempt: [Edited - with suggestions by @Patrick FitzGerald]
In [10]:
class highlightWeekend:
    '''Class object to highlight weekends'''
    def __init__(self, period):
        self.ranges= period.index.dayofweek >= 5
        self.res = [x for x, (i , j) in enumerate(zip( [2] + list(self.ranges), list(self.ranges) + [2])) if i != j]
        if self.res[0] == 0 and self.ranges[0] == False:
            del self.res[0]
        if self.res[-1] == len(self.ranges) and self.ranges[-1] == False:
            del self.res[-1]
months= Profiles.loc['2018'].groupby(lambda x: x.month)
fig, axs= plt.subplots(4,3, figsize= (16, 12), sharey=True)
axs= axs.flatten()
for i, j in months:
    axs[i-1].plot(j.index, j)
    if i < len(months):
        k= 0
        while k < len(highlightWeekend(j).res):
            axs[i-1].axvspan(j.index[highlightWeekend(j).res[k]], j.index[highlightWeekend(j).res[k+1]], alpha=.2)
            k+=2
    i+=1
plt.show()
[Out 10]:
Question
How can I solve the issue of a weekend day occurring at the end of the month?
TL;DR Skip to Solution for method 2 to see the optimal solution, or skip to the last example for a solution with a single pandas line plot. In all three examples, weekends are highlighted using just 4-6 lines of code, the rest is for formatting and reproducibility.
Methods and tools
I am aware of two methods to highlight weekends on plots of time series, which can be applied both to single plots and to small multiples by looping over the array of subplots. This answer presents solutions for highlighting weekends but they can be easily adjusted to work for any recurring period of time.
Method 1: highlight based on the dataframe index
This method follows the logic of the code in the question and in the answers in the linked threads. Unfortunately, a problem arises when a weekend day occurs at the end of the month: the index number that is needed to draw the full span of the weekend exceeds the index range, which produces an error. This issue is solved in the solution shown further below by computing the time difference between two timestamps and adding it to each timestamp of the DatetimeIndex when looping over them to highlight the weekends.
But two issues remain, i) this method does not work for time series with a frequency of more than a day, and ii) time series based on frequencies less than hourly (like 15 minutes) will require the drawing of many polygons which hurts performance. For these reasons, this method is presented here for the purpose of documentation and I suggest using instead method 2.
Method 2: highlight based on the x-axis units
This method uses the x-axis units, that is the number of days since the time origin (1970-01-01), to identify the weekends independently from the time series data being plotted which makes it much more flexible than method 1. The highlights are drawn for each full weekend day only, making this two times faster than method 1 for the examples presented below (according to a %%timeit test in Jupyter Notebook). This is the method I recommend using.
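To make the idea of x-axis units concrete, the short sketch below converts a few dates to matplotlib date numbers and back, then checks which of them fall on a weekend. The dates are chosen only for illustration. The absolute numbers depend on the matplotlib epoch (1970-01-01 by default since version 3.3, as far as I know), but the round trip does not.

```python
import datetime as dt

import matplotlib.dates as mdates
import numpy as np

# One x-axis value per day, from Friday 2018-01-05 to Tuesday 2018-01-09
first = mdates.date2num(dt.datetime(2018, 1, 5))
days = np.arange(first, first + 5)

# Convert the numbers back to datetimes to find each day of the week
weekdays = [d.weekday() for d in mdates.num2date(days)]  # 0 = Monday
is_weekend = [wd >= 5 for wd in weekdays]
```

Since the mask is computed from the axis values alone, it does not matter how the plotted time series is sampled or where it starts and ends.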
Tools in matplotlib that can be used to implement both methods
axvspan (used in Solution for method 1)
broken_barh
fill_between (used in Solution for method 2)
BrokenBarHCollection.span_where
To me, it seems that fill_between and BrokenBarHCollection.span_where are essentially the same. Both provide the handy where argument which is used in the solution for method 2 presented further below.
Solutions
Here is a reproducible sample dataset used to illustrate both methods, using a frequency of 6 hours. Note that the dataframe contains data only for one year which makes it possible to select the monthly data simply with df[df.index.month == month] to draw each subplot. You will need to adjust this if you are dealing with a multi-year DatetimeIndex.
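For a multi-year DatetimeIndex, one possible adjustment is to group by year and month together instead of filtering on the month alone. This is only a sketch: df_multi and monthly are hypothetical names, and the daily two-year dataset is made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical two-year dataset with a daily frequency
rng = np.random.default_rng(seed=1)
dti = pd.date_range("2018-01-01", "2019-12-31", freq="D")
df_multi = pd.DataFrame({"consumption": rng.integers(1000, 2000, size=dti.size)},
                        index=dti)

# One sub-dataframe per (year, month) pair, in chronological order
monthly = {
    (year, month): group
    for (year, month), group in df_multi.groupby(
        [df_multi.index.year, df_multi.index.month])
}
```

Each value of monthly can then play the role of df_month in the loops below, for instance by iterating over zip(monthly.items(), axs.flat) with a grid that has enough subplots.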
Import packages used for all 3 examples and create the dataset for the first 2 examples
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.dates as mdates # used only for method 2
# Create sample dataset
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.date_range('2018-01-01 00:00', '2018-12-31 23:59', freq='6H')
consumption = rng.integers(1000, 2000, size=dti.size)
df = pd.DataFrame(dict(consumption=consumption), index=dti)
Solution for method 1: highlight based on the dataframe index
In this solution, the weekends are highlighted using axvspan and the DatetimeIndex of the monthly dataframes df_month. The weekend timestamps are selected with df_month.index[df_month.index.weekday>=5].to_series() and the problem of exceeding the index range is solved by computing the timedelta from the frequency of the DatetimeIndex and adding it to each timestamp.
Of course, axvspan could also be used in a smarter way than shown here so that each weekend highlight is drawn in a single go, but I believe this would result in a less flexible solution and more code than what is presented in Solution for method 2.
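For reference, drawing each weekend in a single go could look roughly like the sketch below. The helper weekend_spans is a hypothetical function written for this illustration, not part of pandas or matplotlib. It returns one (begin, end) pair per weekend, clipped to the requested range.

```python
import pandas as pd

def weekend_spans(start, end):
    """Return (begin, end) timestamp pairs, one per weekend, clipped to [start, end]."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Midnight of the Saturday on or before `start`
    saturday = start.normalize() - pd.Timedelta(days=(start.weekday() - 5) % 7)
    spans = []
    while saturday < end:
        lo = max(saturday, start)
        hi = min(saturday + pd.Timedelta(days=2), end)  # Monday 00:00 or `end`
        if lo < hi:
            spans.append((lo, hi))
        saturday += pd.Timedelta(days=7)
    return spans

# Weekends of March 2018 (the month starts on a Thursday)
spans = weekend_spans("2018-03-01", "2018-04-01")
```

Each pair can then be passed to ax.axvspan(lo, hi, facecolor='k', alpha=.1), so that one polygon is drawn per weekend rather than one per timestamp.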
# Draw and format subplots by looping through months and flattened array of axes
fig, axs = plt.subplots(4, 3, figsize=(10, 9), sharey=True)
for month, ax in zip(df.index.month.unique(), axs.flat):
    # Select monthly data and plot it
    df_month = df[df.index.month == month]
    ax.plot(df_month.index, df_month['consumption'])
    ax.set_ylim(0, 2500) # set limit similar to plot shown in question
    # Draw vertical spans for weekends: computing the timedelta and adding it
    # to the date solves the problem of exceeding the df_month.index
    timedelta = pd.to_timedelta(df_month.index.freq)
    weekends = df_month.index[df_month.index.weekday>=5].to_series()
    for date in weekends:
        ax.axvspan(date, date+timedelta, facecolor='k', edgecolor=None, alpha=.1)
    # Format tick labels
    ax.set_xticks(ax.get_xticks())
    tk_labels = [pd.to_datetime(tk, unit='D').strftime('%d') for tk in ax.get_xticks()]
    ax.set_xticklabels(tk_labels, rotation=0, ha='center')
    # Add x labels for months
    ax.set_xlabel(df_month.index[0].month_name().upper(), labelpad=5)
    ax.xaxis.set_label_position('top')
# Add title and edit spaces between subplots
year = df.index[0].year
freq = df_month.index.freqstr
title = f'{year} consumption displayed for each month with a {freq} frequency'
fig.suptitle(title.upper(), y=0.95, fontsize=12)
fig.subplots_adjust(wspace=0.1, hspace=0.5)
fig.text(0.5, 0.99, 'Weekends are highlighted by using the DatetimeIndex',
         ha='center', fontsize=14, weight='semibold');
As you can see, the weekend highlights end where the data ends as illustrated with the month of March. This is of course not noticeable if the DatetimeIndex is used to set the x-axis limits.
Solution for method 2: highlight based on the x-axis units
This solution uses the x-axis limits to compute the range of time covered by the plot in terms of days, which is the unit used for matplotlib dates. A weekends mask is computed and then passed to the where argument of the fill_between plotting function. The True values of the mask are processed as right-exclusive so in this case, Mondays must be included for the highlights to be drawn up to Mondays 00:00. Because plotting these highlights can alter the x-axis limits when weekends occur near the limits, the x-axis limits are set back to the original values after plotting.
Note that with fill_between the y1 and y2 arguments must be given. For some reason using the default y-axis limits leaves a small gap between the plot frame and the tops and bottoms of the weekend highlights. Here, the y limits are set to 0 and 2500 just to create an example similar to the one in the question but the following should be used instead for general cases: ax.set_ylim(*ax.get_ylim()).
# Draw and format subplots by looping through months and flattened array of axes
fig, axs = plt.subplots(4, 3, figsize=(10, 9), sharey=True)
for month, ax in zip(df.index.month.unique(), axs.flat):
    # Select monthly data and plot it
    df_month = df[df.index.month == month]
    ax.plot(df_month.index, df_month['consumption'])
    ax.set_ylim(0, 2500) # set limit like plot shown in question, or use next line
    # ax.set_ylim(*ax.get_ylim())
    # Highlight weekends based on the x-axis units, regardless of the DatetimeIndex
    xmin, xmax = ax.get_xlim()
    days = np.arange(np.floor(xmin), np.ceil(xmax)+2)
    weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
    ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
    ax.set_xlim(xmin, xmax) # set limits back to default values
    # Create appropriate ticks with matplotlib date tick locator and formatter
    tick_loc = mdates.MonthLocator(bymonthday=np.arange(1, 31, step=5))
    ax.xaxis.set_major_locator(tick_loc)
    tick_fmt = mdates.DateFormatter('%d')
    ax.xaxis.set_major_formatter(tick_fmt)
    # Add x labels for months
    ax.set_xlabel(df_month.index[0].month_name().upper(), labelpad=5)
    ax.xaxis.set_label_position('top')
# Add title and edit spaces between subplots
year = df.index[0].year
freq = df_month.index.freqstr
title = f'{year} consumption displayed for each month with a {freq} frequency'
fig.suptitle(title.upper(), y=0.95, fontsize=12)
fig.subplots_adjust(wspace=0.1, hspace=0.5)
fig.text(0.5, 0.99, 'Weekends are highlighted by using the x-axis units',
         ha='center', fontsize=14, weight='semibold');
As you can see, the weekends are always highlighted to the full extent, regardless of where the data starts and ends.
Additional example of a solution for method 2 with a monthly time series and a pandas plot
This plot may not make much sense but it serves to illustrate the flexibility of method 2 and how to make it compatible with a pandas line plot. Note that the sample dataset uses a month start frequency so that the default ticks are aligned with the data points.
# Create sample dataset with a month start frequency
rng = np.random.default_rng(seed=1) # random number generator
dti = pd.date_range('2018-01-01 00:00', '2018-06-30 23:59', freq='MS')
consumption = rng.integers(1000, 2000, size=dti.size)
df = pd.DataFrame(dict(consumption=consumption), index=dti)
# Draw pandas plot: x_compat=True converts the pandas x-axis units to matplotlib
# date units
ax = df.plot(x_compat=True, figsize=(10, 4), legend=None)
ax.set_ylim(0, 2500) # set limit similar to plot shown in question, or use next line
# ax.set_ylim(*ax.get_ylim())
# Highlight weekends based on the x-axis units, regardless of the DatetimeIndex
xmin, xmax = ax.get_xlim()
days = np.arange(np.floor(xmin), np.ceil(xmax)+2)
weekends = [(dt.weekday()>=5)|(dt.weekday()==0) for dt in mdates.num2date(days)]
ax.fill_between(days, *ax.get_ylim(), where=weekends, facecolor='k', alpha=.1)
ax.set_xlim(xmin, xmax) # set limits back to default values
# Additional formatting
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_title('2018 consumption by month'.upper(), pad=15, fontsize=12)
ax.figure.text(0.5, 1.05, 'Weekends are highlighted by using the x-axis units',
               ha='center', fontsize=14, weight='semibold');
You can find more examples of this solution in the answers I have posted here and here.
References: this answer by Nipun Batra, this answer by BenB, matplotlib.dates

python - how to set ticks on x-axis at specific location

On my x-axis I have days (for example 0-5000 days). Now I would like to divide this into years by dividing by 365 days so that from 0-365 it would say 2016, 366 - 2*365: 2017 etc.
What's the best way to do this?
Is there something like ax.tickValues(xrange(0,5000,365))?
The following is one example of how you could do it. The idea is this: first, generate the tick labels (years) you want to show, starting from 2016. Here int(np.ceil(3000/365)) gives the number of years to display on the axis for the 3000 days plotted. Next, generate the positions at which to put these ticks. The code below places each tick at the start of its year (0, 365, 730, etc.); to centre each label within its interval instead, so that 0-365 is labeled as 2016 and so on, add 365/2 to each location.
You can adapt the solution below to your problem.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
fig, axes = plt.subplots(figsize=(10, 6))
x = range(3000)
plt.plot(x,x)
tcks = [2016+i for i in range(int(np.ceil(3000/365)))]
locs = [365*i for i in range(int(np.ceil(3000/365)))]
plt.xticks(locs, tcks)
axes.xaxis.set_minor_locator(AutoMinorLocator(12))
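Dividing by a fixed 365 days ignores leap years, so the labels slowly drift away from the real year boundaries. If day 0 corresponds to a known calendar date, the tick positions can be computed from the calendar instead. The sketch below assumes that day 0 is 2016-01-01, which the question does not state.

```python
import datetime as dt

start = dt.date(2016, 1, 1)  # assumed calendar date of day 0
n_days = 3000                # length of the x-axis in days

years, locs = [], []
year = start.year
# Record the day offset of every 1 January that falls inside the axis range
while (dt.date(year, 1, 1) - start).days < n_days:
    years.append(year)
    locs.append((dt.date(year, 1, 1) - start).days)
    year += 1
```

The two lists can be passed to plt.xticks(locs, years) in place of locs and tcks above. Note that 2017 now starts at day 366 because 2016 is a leap year.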

Ignoring Time gaps larger than x mins Matplotlib in Python

I get data every 5 mins between 9:30am and 4pm. Most days I just plot live intraday data. However, sometimes I want a historical view of, let's say, 2+ days. The only problem is that between 4pm and 9:30am I just get a line connecting the two data points. I would like that gap to disappear. My code and an example of what is happening are below:
fig = plt.figure()
plt.ylabel('Bps')
plt.xlabel('Date/Time')
plt.title(ticker)
ax = fig.add_subplot(111)
myFmt = mdates.DateFormatter('%m/%d %I:%M')
ax.xaxis.set_major_formatter(myFmt)
line, = ax.plot(data['Date_Time'],data['Y'],'b-')
I want to keep the data as a time series so that when I scroll over it I can see the exact date and time.
So it looks like you're using a pandas object, which is helpful. Assuming you have filtered out any time between 4pm and 9am in data['Date_Time'], I would make sure your index is reset via data.reset_index(). You'll want to use that integer index as the under-the-hood index that matplotlib actually uses to plot the timeseries. Then you can manually alter the tick labels themselves with plt.xticks() as seen in this demo case. Put together, I would expect it to look something like this:
data = data.reset_index(drop=True) # just to remove that pesky column
fig, ax = plt.subplots(1,1)
ax.plot(data.index, data['Y'])
plt.ylabel('Bps')
plt.xlabel('Date/Time')
plt.title(ticker)
plt.xticks(data.index, data['Date_Time'])
I noticed the last statement in your question just after posting this. Unfortunately, this "solution" doesn't track the "x" variable in an interactive figure. That is, while the time axis labels adjust to your zoom, you can't know the time by cursor location, so you'd have to eyeball it up from the bottom of the figure. :/
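One possible way around that limitation, which I have not tested on the asker's data, is to keep the integer x-axis but format it with a function that looks up the timestamp for each position, and to use the same function for the cursor readout through the axes' fmt_xdata attribute. The sample data and the helper name format_index are made up for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical intraday data: 5-minute samples from 09:30 to 16:00 on two days
day1 = pd.date_range("2021-03-01 09:30", "2021-03-01 16:00", freq="5min")
day2 = pd.date_range("2021-03-02 09:30", "2021-03-02 16:00", freq="5min")
data = pd.DataFrame({"Date_Time": day1.append(day2)})
data["Y"] = np.arange(len(data), dtype=float)

def format_index(x, pos=None):
    """Map a position on the integer x-axis back to its timestamp."""
    i = int(round(x))
    if 0 <= i < len(data):
        return data["Date_Time"].iloc[i].strftime("%m/%d %H:%M")
    return ""  # outside the data: show nothing

fig, ax = plt.subplots()
ax.plot(data.index, data["Y"], "b-")
ax.xaxis.set_major_formatter(FuncFormatter(format_index))  # tick labels
ax.fmt_xdata = format_index                                # cursor readout
```

With this setup the overnight gap takes up no space on the axis, the tick labels follow the zoom level, and the status bar of an interactive window should show the date and time nearest to the cursor.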

Compare multiple year data on a single plot python

I have two time series from different years stored in pandas dataframes. For example:
data15 = pd.DataFrame(
    [1,2,3,4,5,6,7,8,9,10,11,12],
    index=pd.date_range(start='2015-01',end='2016-01',freq='M'),
    columns=['2015']
)
data16 = pd.DataFrame(
    [5,4,3,2,1],
    index=pd.date_range(start='2016-01',end='2016-06',freq='M'),
    columns=['2016']
)
I'm actually working with daily data but if this question is answered sufficiently I can figure out the rest.
What I'm trying to do is overlay the plots of these different data sets onto a single plot from January through December to compare the differences between the years. I can do this by creating a "false" index for one of the datasets so they have a common year:
data16.index = data15.index[:len(data16)]
ax = data15.plot()
data16.plot(ax=ax)
But I would like to avoid messing with the index if possible. Another problem with this method is that the year (2015) will appear in the x axis tick label which I don't want. Does anyone know of a better way to do this?
One way to do this would be to overlay a transparent axes over the first, and plot the 2nd dataframe in that one, but then you'd need to update the x-limits of both axes at the same time (similar to twinx). However, I think that's far more work and has a few more downsides: you can't easily zoom interactively into a specific region anymore for example, unless you make sure both axes are linked via their x-limits. Really, the easiest is to take into account that offset, by "messing with the index".
As for the tick labels, you can easily change the format so that they don't show the year by specifying the x-tick format:
import matplotlib.dates as mdates
month_day_fmt = mdates.DateFormatter('%b %d') # "Locale's abbreviated month name. + day of the month"
ax.xaxis.set_major_formatter(month_day_fmt)
Have a look at the matplotlib API example for specifying the date format.
I see two options.
Option 1: add a month column to your dataframes
data15['month'] = data15.index.to_series().dt.strftime('%b')
data16['month'] = data16.index.to_series().dt.strftime('%b')
ax = data16.plot(x='month', y='2016')
ax = data15.plot(x='month', y='2015', ax=ax)
Option 2: if you don't want to do that, you can use matplotlib directly
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(data15['2015'].values)
ax.plot(data16['2016'].values)
plt.xticks(range(len(data15)), data15.index.to_series().dt.strftime('%b'), size='small')
Needless to say, I would recommend the first option.
You might be able to use pandas.DatetimeIndex.dayofyear to get the day number, which will allow you to plot two different years' data on top of one another.
in: date = pd.Timestamp('2008-10-31')
in: date.dayofyear
out: 305
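For daily data, the dayofyear idea could be applied along these lines. The two series and the helper by_day_of_year are hypothetical and exist only for this sketch. Keep in mind that in a leap year every day after 28 February is shifted by one relative to a normal year.

```python
import pandas as pd

# Hypothetical daily series for a full year and for part of the next one
s15 = pd.Series(range(365),
                index=pd.date_range("2015-01-01", "2015-12-31", freq="D"))
s16 = pd.Series(range(152),
                index=pd.date_range("2016-01-01", "2016-05-31", freq="D"))

def by_day_of_year(series, name):
    """Re-index a daily series by day of the year (1-366)."""
    return pd.Series(series.to_numpy(), index=series.index.dayofyear, name=name)

# One column per year, aligned on the day of the year
combined = pd.concat([by_day_of_year(s15, "2015"),
                      by_day_of_year(s16, "2016")], axis=1)
```

Calling combined.plot() would then draw both years on a common January to December axis without changing the original indexes.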
