Ignoring Time gaps larger than x mins Matplotlib in Python - python

I get data every 5 mins between 9:30am and 4pm. Most days I just plot live intraday data. However, sometimes I want a historical view of, let's say, 2+ days. The only problem is that between 4pm and 9:30am the next day I just get a line connecting the two data points. I would like that gap to disappear. My code and an example of what is happening are below:
fig = plt.figure()
plt.ylabel('Bps')
plt.xlabel('Date/Time')
plt.title(ticker)
ax = fig.add_subplot(111)
myFmt = mdates.DateFormatter('%m/%d %I:%M')
ax.xaxis.set_major_formatter(myFmt)
line, = ax.plot(data['Date_Time'],data['Y'],'b-')
I want to keep the data as a time series so that when I scroll over it I can see the exact date and time.

So it looks like you're using a pandas object, which is helpful. Assuming you have filtered out any time between 4pm and 9:30am in data['Date_Time'], I would make sure your index is reset via data.reset_index(). You'll want to use that integer index as the under-the-hood x value that matplotlib actually uses to plot the time series. Then you can manually set the tick labels themselves with plt.xticks(). Put together, I would expect it to look something like this:
data = data.reset_index(drop=True) # just to remove that pesky column
fig, ax = plt.subplots(1,1)
ax.plot(data.index, data['Y'])
plt.ylabel('Bps')
plt.xlabel('Date/Time')
plt.title(ticker)
plt.xticks(data.index, data['Date_Time'])
I noticed the last statement in your question just after posting this. Unfortunately, this "solution" doesn't track the "x" variable in an interactive figure. That is, while the time axis labels adjust to your zoom, you can't know the time by cursor location, so you'd have to eyeball it up from the bottom of the figure. :/
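One possible way around the cursor limitation, offered as a sketch rather than a tested fix for the original data: keep plotting against the integer index, but translate index positions back to timestamps with a formatter function and also assign it to ax.fmt_xdata, which matplotlib uses for the x readout in the toolbar. The sample data, the 2024 dates, and the helper name index_to_label are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

# Hypothetical sample: two trading days of 5-minute data, 09:30 to 16:00 only
full = pd.date_range("2024-01-02 09:30", "2024-01-03 16:00", freq="5min")
data = pd.DataFrame({"Y": np.arange(len(full), dtype=float)}, index=full)
data = data.between_time("09:30", "16:00")           # drop the overnight rows
data = data.rename_axis("Date_Time").reset_index()   # integer index 0..n-1


def index_to_label(x, pos=None):
    """Map an x position (row number) back to its timestamp string."""
    i = int(round(x))
    if 0 <= i < len(data):
        return data["Date_Time"].iloc[i].strftime("%m/%d %H:%M")
    return ""


fig, ax = plt.subplots()
ax.plot(data.index, data["Y"], "b-")                 # evenly spaced, so no overnight gap
ax.xaxis.set_major_formatter(FuncFormatter(index_to_label))
ax.fmt_xdata = index_to_label                        # cursor readout shows the real time
```

Because the tick labels come from a function instead of a fixed list, they stay correct when zooming or panning.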

Related

How to show the X-axis date ticks neatly or in a form of certain interval in a plotly chart

I am making an OHLC graph using plotly and have stumbled across one issue: the labels on the x-axis look really messy. Is there a way to make them neater? Or can we show only the extreme date values, for example only the first and last dates? The date range is dynamic. I am using the code below to make the graph. Thanks for the help.
import plotly.graph_objects as go
import streamlit as st

fig = go.Figure(data=go.Candlestick(x=tickerDf.index.date,
                                    open=tickerDf.Open,
                                    high=tickerDf.High,
                                    low=tickerDf.Low,
                                    close=tickerDf.Close))
fig.update_xaxes(showticklabels=True)  # keep the x tick labels visible
fig.update_layout(width=800, height=600, xaxis=dict(type="category"))  # category axis hides dates with no values
st.plotly_chart(fig)
Here tickerDf is the dataframe which contains the stock related data.
One way is to change the number of ticks by calling fig.update_xaxes() and passing nticks as the parameter. For example, here's a plot with the default number of ticks, with no change.
And here is what it looks like after specifying the number of ticks:
The code for the second plot:
import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('./finance-charts-apple.csv')
fig = go.Figure([go.Scatter(x=df['Date'], y=df['AAPL.High'])])
fig.update_xaxes(nticks=5)
fig.show()
The important line, again, is:
fig.update_xaxes(nticks=5)

Matplotlib best practices for automatically spacing out/omitting overlapping tick labels and annotations

I've decided to use Matplotlib in one of my projects which involves having to automatically generate graphs and slapping them onto reports.
Trying to make matplotlib graphs attractive to the eye is something that's been a lot of fun - however there's still just 1 little bit I'm somewhat stuck on!
Right now, I have an issue in cases where there are tonnes of data points. The problem occurs when the x-axis ticks and the annotations overlap!
With few datapoints, the graph is very pretty:
However, in edge cases with a very large number of datapoints, it gets completely messed up:
What I'd basically like Matplotlib to do is to use some kind of determination to make sure that no other annotation element is within range when it applies an annotation. Same concept for the x-axis ticks!
The solutions I've ruled out are things like the x-axis showing every other tick, since in some cases it's possible that even just the first 3 ticks are very close to each other!
You can control your xticks using the set_xticks and set_xticklabels methods. These only change the ticks on the x-axis, so your data won't be affected.
Here is an example; I've generated a list called days which contains all days in 2019 and the output graph:
from datetime import date, timedelta
import matplotlib.pyplot as plt
sdate = date(2019, 1, 1)
edate = date(2019, 12, 31)
delta = edate - sdate
days = [sdate + timedelta(days=i) for i in range(delta.days+1)]
fig, ax = plt.subplots()
ax.set_xticks(range(365))
ax.set_xticklabels(days)
plt.xticks(rotation=45)
plt.show()
And it generated this graph which looks close enough to yours:
Now, let's see how to use set_xticks and set_xticklabels to handle this issue. All you need to do is to limit the vectors getting passed to these two methods like so:
# keep every 30th item and skip the ones in between
ax.set_xticks(range(0, 365, 30))
ax.set_xticklabels(days[::30])
And this produces this graph:
That's how you can control the ticks of your x-axis. I believe you can find a similar way to control the labels of your data points.
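If the x values are real dates rather than positions, matplotlib's date locators can pick the spacing on their own, which may suit the "automatic" part of the question better than a fixed step of 30. The sketch below assumes daily data for 2019 with made-up y values; it only addresses the tick labels, not the annotations.

```python
from datetime import date, timedelta

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Same list of days as in the answer above, with hypothetical y values
sdate = date(2019, 1, 1)
days = [sdate + timedelta(days=i) for i in range(365)]
values = [i % 7 for i in range(365)]

fig, ax = plt.subplots()
ax.plot(days, values)

# AutoDateLocator chooses a tick interval that fits the visible range
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
fig.autofmt_xdate()  # rotate and right-align the date labels
```

Since the locator recomputes ticks from the current view limits, the labels stay readable when the date range changes.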

3D scatter plot with in Python extracted from Dates

Question :
Is there a way I can convert day to String rather than decimal value? Similarly for Month.
Note: I already visited this (3D Scatterplot with strings in Python) answer which does not solve my question.
I am working on a self project where I am trying to create 3D chart for my commute from data I retrieved from my google activity.
For reference I am following this guide : https://nvbn.github.io/2018/05/01/commute/
I am able to create informative 2D chart based on Month + Time and Day +Time attributes however I wish to combine these 2 chart.
3D chart I want to create requires 3 attribute Day (Mon/Tue) , Month (Jan/Feb), Time taken.
Given that matplotlib does not support string values in charts out of the box, I have used numbers for Day (0-6, as returned by weekday()) and Month (1-12). However, the graph is hard to read with decimal values for days. It looks like the following:
My current code looks like this, retrieving weekday() to get day number, and month for month.
# How commute is calculated and grouped
import pandas as pd
from matplotlib import pyplot
#{...}
def get_commute_to_work():
    #{...}
    yield Commute_to_work(pd.to_datetime(start.datetime), start.datetime, end.datetime, end.datetime - start.datetime)
# Now creating graph here
fig, ax = pyplot.subplots(subplot_kw={'projection': '3d'})
ax.grid()
ax.scatter([commute.day.weekday() for commute in normalised],
           [commute.day.month for commute in normalised],
           [commute.took.total_seconds() / 60 for commute in normalised])
ax.set(xlabel='Day', ylabel='Month', zlabel='commute (minutes)',
       title='Daily commute')
ax.legend()
pyplot.show()
nb. if you wish to gaze into detail of this code it's available on github here
You can try this (I have not verified for the 3d plot though):
import numpy as np
# weekday() returns 0 for Monday through 6 for Sunday
x_tick_labels = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# Place one tick per weekday value on the x-axis
ax.set_xticks(np.arange(7))
# Set tick labels for the x-axis
ax.set_xticklabels(x_tick_labels, rotation='vertical', fontsize=18)
You can repeat a similar procedure for months.
The source for this answer is here.
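Putting the day and month labels together on a 3D axes might look like the sketch below. The commute tuples are invented sample data, and the labels come from the standard calendar module, so they follow the current locale.

```python
import calendar

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Hypothetical commute records: (weekday 0-6, month 1-12, minutes taken)
commutes = [(0, 1, 42.0), (2, 1, 38.5), (4, 2, 51.0), (1, 3, 40.0), (3, 12, 47.5)]
weekdays, months, minutes = zip(*commutes)

day_labels = list(calendar.day_abbr)            # index 0 is Monday, matching weekday()
month_labels = list(calendar.month_abbr)[1:]    # index 0 of month_abbr is an empty string

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
ax.scatter(weekdays, months, minutes)

# One tick per integer value, labelled with the name instead of the number
ax.set_xticks(range(7))
ax.set_xticklabels(day_labels)
ax.set_yticks(range(1, 13))
ax.set_yticklabels(month_labels)
ax.set(xlabel="Day", ylabel="Month", zlabel="commute (minutes)", title="Daily commute")
```

Fixing the tick positions to integers is what removes the decimal day values; the names are only a relabelling on top of that.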

Compare multiple year data on a single plot python

I have two time series from different years stored in pandas dataframes. For example:
data15 = pd.DataFrame(
[1,2,3,4,5,6,7,8,9,10,11,12],
index=pd.date_range(start='2015-01',end='2016-01',freq='M'),
columns=['2015']
)
data16 = pd.DataFrame(
[5,4,3,2,1],
index=pd.date_range(start='2016-01',end='2016-06',freq='M'),
columns=['2016']
)
I'm actually working with daily data but if this question is answered sufficiently I can figure out the rest.
What I'm trying to do is overlay the plots of these different data sets onto a single plot from January through December to compare the differences between the years. I can do this by creating a "false" index for one of the datasets so they have a common year:
data16.index = data15.index[:len(data16)]
ax = data15.plot()
data16.plot(ax=ax)
But I would like to avoid messing with the index if possible. Another problem with this method is that the year (2015) will appear in the x axis tick label which I don't want. Does anyone know of a better way to do this?
One way to do this would be to overlay a transparent axes over the first, and plot the 2nd dataframe in that one, but then you'd need to update the x-limits of both axes at the same time (similar to twinx). However, I think that's far more work and has a few more downsides: you can't easily zoom interactively into a specific region anymore for example, unless you make sure both axes are linked via their x-limits. Really, the easiest is to take into account that offset, by "messing with the index".
As for the tick labels, you can easily change the format so that they don't show the year by specifying the x-tick format:
import matplotlib.dates as mdates
month_day_fmt = mdates.DateFormatter('%b %d') # "Locale's abbreviated month name. + day of the month"
ax.xaxis.set_major_formatter(month_day_fmt)
Have a look at the matplotlib API example for specifying the date format.
I see two options.
Option 1: add a month column to your dataframes
data15['month'] = data15.index.to_series().dt.strftime('%b')
data16['month'] = data16.index.to_series().dt.strftime('%b')
ax = data16.plot(x='month', y='2016')
ax = data15.plot(x='month', y='2015', ax=ax)
Option 2: if you don't want to do that, you can use matplotlib directly
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(data15['2015'].values)
ax.plot(data16['2016'].values)
plt.xticks(range(len(data15)), data15.index.to_series().dt.strftime('%b'), size='small')
Needless to say, I would recommend the first option.
You might be able to use pandas.DatetimeIndex.dayofyear to get the day number which will allow you to plot two different year's data on top of one another.
in: date = pd.Timestamp('2008-10-31')
in: date.dayofyear
out: 305
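A rough sketch of the dayofyear idea applied to two series, assuming daily data with invented values. Each year is plotted against its own day number, so the original indexes are left untouched.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily series: a full 2015 and the first 150 days of 2016
data15 = pd.Series(
    np.arange(365.0),
    index=pd.date_range("2015-01-01", "2015-12-31", freq="D"),
    name="2015",
)
data16 = pd.Series(
    np.arange(150.0)[::-1],
    index=pd.date_range("2016-01-01", periods=150, freq="D"),
    name="2016",
)

fig, ax = plt.subplots()
for series in (data15, data16):
    # dayofyear runs from 1 to 365 (or 366), giving both years a common x axis
    ax.plot(series.index.dayofyear, series.values, label=series.name)
ax.set_xlabel("Day of year")
ax.legend()
```

One caveat: in a leap year such as 2016, every date after February 29 has a day number one higher than the same calendar date in 2015, so the curves are offset by a day from March onward.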

Plotting multiple timeseries power data using matplotlib and pandas

I have a csv file of power levels at several stations (4 in this case, though "HUT4" is not in this short excerpt):
2014-06-21T20:03:21,HUT3,74
2014-06-21T21:03:16,HUT1,70
2014-06-21T21:04:31,HUT3,73
2014-06-21T21:04:33,HUT2,30
2014-06-21T22:03:50,HUT3,64
2014-06-21T23:03:29,HUT1,60
(etc . .)
The times are not synchronised across stations. The power level is (in this case) integer percent. Some machines report in volts (~13.0), which would be an additional issue when plotting.
The data is easy to read into a dataframe, to index the dataframe, to put into a dictionary. But I can't get the right syntax to make a meaningful plot. Either all stations on a single plot sharing a timeline that's big enough for all stations, or as separate plots, maybe a subplot for each station. If I do:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('Power_Log.csv', names=['DT','Station','Power'])
df2 = df.groupby(['Station']) # group the rows by station
d = dict(iter(df2)) # make a dictionary including each station's data
for stn in d.keys():
    d[stn].plot(x='DT', y='Power')
plt.legend(loc='lower right')
plt.savefig('Station_Power.png')
I do get a plot but the X axis is not right for each station.
I have not figured out yet how to do four independent subplots, which would free me from making a wide-enough timescale.
I would greatly appreciate comments on getting a single plot right and/or getting good looking subplots. The subplots do not need to have synchronised X axes.
I'd rather plot the typical way, something like:
import matplotlib.pyplot as plt
plt.plot([1,2,3,4], [1,4,9,16], 'ro')
plt.axis([0, 6, 0, 20])
plt.savefig('plot.png')  # savefig needs a file name; 'plot.png' is a placeholder
( http://matplotlib.org/users/pyplot_tutorial.html )
Re more series: simply call plt.plot() multiple times, once for each data series; they are all drawn on the same axes.
P.S. you can set xticks this way: Changing the "tick frequency" on x or y axis in matplotlib?
Sorry for the comment above where I needed to add code. Still learning . .
From the 5th code line:
import matplotlib.dates as mdates
for stn in d.keys():
    plt.figure()
    d[stn].interpolate().plot(x='DT',y='Power',title=stn,rot=45)
    plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%D/%M/%Y'))
    plt.savefig('Station_Power_'+stn+'.png')
Does more or less what I want to do except the DateFormatter line does not work. I would like to shorten my datetime data to show just date. If it places ticks at midnight that would be brilliant but not strictly necessary.
The key to getting a continuous plot is to use the interpolate() method in the plot.
With this data having different x scales from station to station a plot of all stations on the same graph does not work. HUT4 in my data has far fewer records and only plots to about 25% of the scale even though the datetime values cover more or less the same range as the other HUTs.
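A likely reason the DateFormatter line misbehaves is the format string: %M means minutes and %D is not the day-of-month directive, whereas '%d/%m/%Y' gives day/month/year. The formatter also needs real datetimes on the x axis, so the DT column should be parsed as dates. The sketch below, under those assumptions, draws one subplot per station with independent x axes and ticks at midnight; the generated readings are invented stand-ins for Power_Log.csv.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: three stations reporting hourly for three days, not synchronised
frames = []
for stn, offset in [("HUT1", "3min"), ("HUT2", "4min"), ("HUT3", "5min")]:
    times = pd.date_range("2014-06-21 20:00", periods=72, freq="60min") + pd.Timedelta(offset)
    frames.append(pd.DataFrame({"DT": times, "Station": stn, "Power": range(100, 28, -1)}))
df = pd.concat(frames, ignore_index=True)
# With a real file: pd.read_csv('Power_Log.csv', names=['DT','Station','Power'], parse_dates=['DT'])

groups = dict(list(df.groupby("Station")))  # station name -> its rows
fig, axes = plt.subplots(len(groups), 1, sharex=False, squeeze=False,
                         figsize=(8, 3 * len(groups)))
for ax, (stn, grp) in zip(axes[:, 0], groups.items()):
    ax.plot(grp["DT"], grp["Power"])
    ax.set_title(stn)
    ax.xaxis.set_major_locator(mdates.DayLocator())                  # one tick per midnight
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%Y"))   # date only
fig.tight_layout()
```

Passing sharex=True instead would put all stations on a common timeline, which makes a station with fewer records easier to compare against the others.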
