python - how to set ticks on x-axis at specific location - python

On my x-axis I have days (for example 0-5000 days). I would like to divide this into years of 365 days, so that 0-365 is labelled 2016, 366 to 2*365 is labelled 2017, and so on.
What's the best way to do this?
Is there something like ax.tickValues(xrange(0,5000,365))?

Here is one way to do it. First, generate the tick labels (years) you want to show, starting from 2016; int(np.ceil(5000/365)) gives the number of years to display on the axis (the code below uses a 3000-day range instead). Next, generate the positions for these ticks. The code below puts each tick at the start of its 365-day interval (0, 365, 730, ...), so the label 2016 sits at day 0; add 365/2 to each location if you would rather centre the label within its interval.
You can adapt the solution below to your problem.
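import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import AutoMinorLocator
fig, axes = plt.subplots(figsize=(10, 6))
x = range(3000)
plt.plot(x, x)
# One label per (possibly partial) 365-day year, starting at 2016
n_years = int(np.ceil(3000/365))
tcks = [2016+i for i in range(n_years)]
locs = [365*i for i in range(n_years)]
plt.xticks(locs, tcks)
# 12 minor intervals per year, roughly one per month
axes.xaxis.set_minor_locator(AutoMinorLocator(12))
plt.show()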
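If you want each year label centred within its 365-day interval, one option is to put the labels at the interval midpoints and mark the year boundaries with minor ticks. The following is only a sketch: the helper name year_ticks and its parameters are made up for illustration, and it ignores leap years just as the question does.

```python
import math
import matplotlib.pyplot as plt

def year_ticks(n_days, start_year=2016, days_per_year=365, centred=True):
    """Return (locations, labels) with one tick per (possibly partial) year."""
    n_years = math.ceil(n_days / days_per_year)
    offset = days_per_year / 2 if centred else 0
    locs = [days_per_year * i + offset for i in range(n_years)]
    labels = [str(start_year + i) for i in range(n_years)]
    return locs, labels

x = range(3000)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, x)

# Major ticks carry the year labels at the middle of each interval
locs, labels = year_ticks(len(x))
ax.set_xticks(locs)
ax.set_xticklabels(labels)

# Minor ticks mark the year boundaries (0, 365, 730, ...)
ax.set_xticks([365 * i for i in range(len(labels) + 1)], minor=True)

# Hide the major tick marks so only the boundary marks are drawn
ax.tick_params(axis="x", which="major", length=0)
```

Call plt.show() or fig.savefig(...) afterwards to view the result.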

Related

Matplotlib best practices for automatically spacing out/omitting overlapping tick labels and annotations

I've decided to use Matplotlib in one of my projects which involves having to automatically generate graphs and slapping them onto reports.
Trying to make matplotlib graphs attractive to the eye is something that's been a lot of fun - however there's still just 1 little bit I'm somewhat stuck on!
Right now, I have an issue in cases where there are tonnes of data points. The problem occurs when the x-axis ticks and the annotations overlap!
With few datapoints, the graph is very pretty:
However, in edge cases with very large amount of datapoints, it gets completely messed up:
What I'd basically like Matplotlib to do is check that no other annotation is within range before it places an annotation. Same concept for the x-axis ticks!
The solutions I've ruled out are things like the x-axis showing every other tick, since in some cases it's possible that even just the first 3 ticks are very close to each other!
You can control your xticks using the set_xticks and set_xticklabels methods. These only affect the x-axis ticks, so your data won't be changed.
Here is an example; I've generated a list called days which contains all days in 2019 and the output graph:
from datetime import date, timedelta
import matplotlib.pyplot as plt
sdate = date(2019, 1, 1)
edate = date(2019, 12, 31)
delta = edate - sdate
days = [sdate + timedelta(days=i) for i in range(delta.days+1)]
fig, ax = plt.subplots()
ax.set_xticks(range(365))
ax.set_xticklabels(days)
plt.xticks(rotation=45)
plt.show()
And it generated this graph which looks close enough to yours:
Now, let's see how to use set_xticks and set_xticklabels to handle this issue. All you need to do is to limit the vectors getting passed to these two methods like so:
#skips 30 items in-between
ax.set_xticks(range(0, 365, 30))
ax.set_xticklabels(days[::30])
And this produces this graph:
That's how you can control the ticks of your x-axis. You can most likely control the labels of your data points in a similar way.
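Instead of hard-coding the stride of 30, you could derive it from the maximum number of labels you are willing to show. This is a rough sketch; thinned_ticks and max_labels are hypothetical names, and it assumes the points are evenly spaced, so it does not solve the case where only a few ticks crowd together.

```python
import math
from datetime import date, timedelta
import matplotlib.pyplot as plt

def thinned_ticks(labels, max_labels=12):
    """Pick evenly strided positions so that at most max_labels are shown."""
    step = max(1, math.ceil(len(labels) / max_labels))
    positions = list(range(0, len(labels), step))
    return positions, [labels[i] for i in positions]

# All days of 2019, as in the example above
days = [date(2019, 1, 1) + timedelta(days=i) for i in range(365)]
positions, shown = thinned_ticks(days, max_labels=12)

fig, ax = plt.subplots()
ax.plot(range(len(days)), range(len(days)))
ax.set_xticks(positions)
ax.set_xticklabels([d.strftime("%b %d") for d in shown], rotation=45)
```

With 365 days and max_labels=12 the stride comes out as 31, so 12 labels are drawn.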

3D scatter plot in Python with axis values extracted from dates

Question :
Is there a way I can show the day as a string rather than a decimal value? Similarly for the month.
Note: I already looked at this answer (3D Scatterplot with strings in Python), which does not solve my question.
I am working on a personal project where I am trying to create a 3D chart of my commute from data I retrieved from my Google activity.
For reference I am following this guide: https://nvbn.github.io/2018/05/01/commute/
I am able to create informative 2D charts based on the Month + Time and Day + Time attributes, but I wish to combine these two charts.
The 3D chart I want to create requires three attributes: Day (Mon/Tue), Month (Jan/Feb), and time taken.
Given that matplotlib does not support string values in charts right away, I have used numbers for Day (0-6) and Month (1-12). However, the graph is a bit obscure with decimal values for days. It looks like the following:
My current code looks like this, using weekday() to get the day number and month to get the month.
# How commute is calculated and grouped
import pandas as pd
#{...}
def get_commute_to_work():
    #{...}
    yield Commute_to_work(pd.to_datetime(start.datetime), start.datetime, end.datetime, end.datetime - start.datetime)
# Now creating graph here
fig, ax = pyplot.subplots(subplot_kw={'projection': '3d'})
ax.grid()
ax.scatter([commute.day.weekday() for commute in normalised],
           [commute.day.month for commute in normalised],
           [commute.took.total_seconds() / 60 for commute in normalised])
ax.set(xlabel='Day', ylabel='Month', zlabel='commute (minutes)',
       title='Daily commute')
ax.legend()
pyplot.show()
N.B. If you wish to look at this code in detail, it's available on GitHub here.
You can try this (I have not verified it for the 3D plot, though):
# weekday() returns 0 for Monday through 6 for Sunday
x_tick_labels = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']
# Put one tick at each weekday number on the x-axis
ax.set_xticks(range(7))
# Set tick labels for the x-axis
ax.set_xticklabels(x_tick_labels, rotation='vertical', fontsize=18)
You can repeat a similar procedure for months.
The source for this answer is here.
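Putting both axes together on a 3D plot might look like the sketch below. The samples list is made-up data standing in for the commute records, and the labels come from the standard-library calendar module, which follows the current locale (English abbreviations in the default C locale).

```python
import calendar
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401, only needed on older Matplotlib versions

# Hypothetical data: (weekday 0-6 with Monday=0, month 1-12, commute minutes)
samples = [(0, 1, 42.0), (2, 3, 38.5), (4, 7, 51.0), (6, 12, 30.0)]

day_labels = list(calendar.day_abbr)          # Monday first, matching weekday()
month_labels = list(calendar.month_abbr)[1:]  # index 0 is an empty string

fig, ax = plt.subplots(subplot_kw={"projection": "3d"})
ax.scatter([s[0] for s in samples],
           [s[1] for s in samples],
           [s[2] for s in samples])

# One tick per weekday number and per month number, labelled with names
ax.set_xticks(range(7))
ax.set_xticklabels(day_labels)
ax.set_yticks(range(1, 13))
ax.set_yticklabels(month_labels)
ax.set(xlabel="Day", ylabel="Month", zlabel="commute (minutes)")
```

Fixing the tick positions to whole numbers is what removes the decimal day values from the axis.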

Plotting time-series data using matplotlib and showing year only at start of year

rcParams['date.autoformatter.month'] = "%b\n%Y"
I am using matplotlib to plot a time series, and if I set rcParams as above, the resulting plot has the month name and year labeled at each tick. How can I set it up so that the year is only shown at January of each year? I tried this, but it does not work:
rcParams['date.autoformatter.month'] = "%b"
rcParams['date.autoformatter.year'] = "%Y"
The rcParams format strings do not let you specify conditions. Depending on the span of the series, the AutoDateFormatter will fall into either the date.autoformatter.month range or the date.autoformatter.year range.
Also, the AutoDateLocator will not necessarily place a tick on the first of January at all.
I would hence suggest setting the locators and formatters directly to the desired locations and formats. You can use the major ticks to show the years and the minor ticks to show the months. The format for the major ticks can then include a line break so that the year labels do not overlap the minor tick labels.
import matplotlib.pyplot as plt
import matplotlib.dates
from datetime import datetime
t = [datetime(2016,1,1), datetime(2017,12,31)]
x = [0,1]
fig, ax = plt.subplots()
ax.plot(t,x)
ax.xaxis.set_major_locator(matplotlib.dates.YearLocator())
ax.xaxis.set_minor_locator(matplotlib.dates.MonthLocator((1,4,7,10)))
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter("\n%Y"))
ax.xaxis.set_minor_formatter(matplotlib.dates.DateFormatter("%b"))
plt.setp(ax.get_xticklabels(), rotation=0, ha="center")
plt.show()
You could then also adapt the minor ticks' lengths to match those of the major ones, if desired:
ax.tick_params(axis="x", which="both", length=4)
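As an alternative to the two-level ticks, a custom formatter function can apply the "year only in January" condition itself on a single tick level. This is a sketch rather than the answer's own method; the function name month_with_year_in_january is made up, while FuncFormatter and matplotlib.dates.num2date are existing Matplotlib APIs.

```python
from datetime import datetime
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def month_with_year_in_january(value, pos=None):
    """Format a Matplotlib date number as 'Jan\\n2017' in January, else 'Apr'."""
    d = mdates.num2date(value)
    return d.strftime("%b\n%Y") if d.month == 1 else d.strftime("%b")

t = [datetime(2016, 1, 1), datetime(2017, 12, 31)]
x = [0, 1]

fig, ax = plt.subplots()
ax.plot(t, x)

# Tick every third month; the formatter decides whether to add the year
ax.xaxis.set_major_locator(mdates.MonthLocator((1, 4, 7, 10)))
ax.xaxis.set_major_formatter(FuncFormatter(month_with_year_in_january))
```

The trade-off is that the year label only appears if the locator actually places a tick in January, which the explicit MonthLocator guarantees here.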

Ignoring Time gaps larger than x mins Matplotlib in Python

I get data every 5 minutes between 9:30am and 4pm. Most days I just plot live intraday data. However, sometimes I want a historical view of, let's say, 2+ days. The only problem is that between 4pm and 9:30am the next day I just get a line connecting the two data points. I would like that gap to disappear. My code and an example of what is happening are below:
fig = plt.figure()
plt.ylabel('Bps')
plt.xlabel('Date/Time')
plt.title(ticker)
ax = fig.add_subplot(111)
myFmt = mdates.DateFormatter('%m/%d %I:%M')
ax.xaxis.set_major_formatter(myFmt)
line, = ax.plot(data['Date_Time'],data['Y'],'b-')
I want to keep the data as a time series so that when I scroll over it I can see the exact date and time.
So it looks like you're using a pandas object, which is helpful. Assuming you have filtered out any time between 4pm and 9:30am in data['Date_Time'], I would make sure your index is reset via data.reset_index(). You'll want to use that integer index as the under-the-hood x-value that matplotlib actually uses to plot the time series. Then you can manually set the tick labels themselves with plt.xticks(), as seen in this demo case. Put together, I would expect it to look something like this:
data = data.reset_index(drop=True) # just to remove that pesky column
fig, ax = plt.subplots(1,1)
ax.plot(data.index, data['Y'])
plt.ylabel('Bps')
plt.xlabel('Date/Time')
plt.title(ticker)
plt.xticks(data.index, data['Date_Time'])
I noticed the last statement in your question just after posting this. Unfortunately, this "solution" doesn't track the "x" variable in an interactive figure. That is, while the time axis labels adjust to your zoom, you can't know the time by cursor location, so you'd have to eyeball it up from the bottom of the figure. :/
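One possible way around that limitation is to keep plotting against the integer index but translate it back to a timestamp with a function, used both as the tick formatter and as the axes' fmt_xdata hook that drives the cursor readout. The sketch below uses made-up data (two sessions of 5-minute bars) and a hypothetical helper index_to_label; I have not tried it on the asker's data.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter, MaxNLocator

# Hypothetical data: two trading sessions, 5-minute bars from 09:30 to 16:00
sessions = [pd.date_range(f"2024-01-0{d} 09:30", f"2024-01-0{d} 16:00", freq="5min")
            for d in (2, 3)]
stamps = sessions[0].append(sessions[1])
data = pd.DataFrame({"Date_Time": stamps, "Y": list(range(len(stamps)))})

def index_to_label(x, pos=None):
    """Map an integer x position back to the timestamp stored at that row."""
    i = int(round(x))
    if 0 <= i < len(data):
        return data["Date_Time"].iloc[i].strftime("%m/%d %H:%M")
    return ""

fig, ax = plt.subplots()
ax.plot(data.index, data["Y"], "b-")

# Ticks sit on whole row numbers and are labelled with the matching time
ax.xaxis.set_major_locator(MaxNLocator(nbins=8, integer=True))
ax.xaxis.set_major_formatter(FuncFormatter(index_to_label))

# The status-bar x readout in an interactive window uses the same mapping
ax.fmt_xdata = index_to_label
```

Because the x-values are row numbers, the overnight gap takes no horizontal space, and the last bar of one day sits directly next to the first bar of the next.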

Compare multiple year data on a single plot python

I have two time series from different years stored in pandas dataframes. For example:
data15 = pd.DataFrame(
    [1,2,3,4,5,6,7,8,9,10,11,12],
    index=pd.date_range(start='2015-01',end='2016-01',freq='M'),
    columns=['2015']
)
data16 = pd.DataFrame(
    [5,4,3,2,1],
    index=pd.date_range(start='2016-01',end='2016-06',freq='M'),
    columns=['2016']
)
I'm actually working with daily data but if this question is answered sufficiently I can figure out the rest.
What I'm trying to do is overlay the plots of these different data sets onto a single plot from January through December to compare the differences between the years. I can do this by creating a "false" index for one of the datasets so they have a common year:
data16.index = data15.index[:len(data16)]
ax = data15.plot()
data16.plot(ax=ax)
But I would like to avoid messing with the index if possible. Another problem with this method is that the year (2015) will appear in the x axis tick label which I don't want. Does anyone know of a better way to do this?
One way to do this would be to overlay a transparent axes over the first, and plot the 2nd dataframe in that one, but then you'd need to update the x-limits of both axes at the same time (similar to twinx). However, I think that's far more work and has a few more downsides: you can't easily zoom interactively into a specific region anymore for example, unless you make sure both axes are linked via their x-limits. Really, the easiest is to take into account that offset, by "messing with the index".
As for the tick labels, you can easily change the format so that they don't show the year by specifying the x-tick format:
import matplotlib.dates as mdates
month_day_fmt = mdates.DateFormatter('%b %d') # "Locale's abbreviated month name. + day of the month"
ax.xaxis.set_major_formatter(month_day_fmt)
Have a look at the matplotlib API example for specifying the date format.
I see two options.
Option 1: add a month column to your dataframes
data15['month'] = data15.index.to_series().dt.strftime('%b')
data16['month'] = data16.index.to_series().dt.strftime('%b')
ax = data16.plot(x='month', y='2016')
ax = data15.plot(x='month', y='2015', ax=ax)
Option 2: if you don't want to do that, you can use matplotlib directly
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.plot(data15['2015'].values)
ax.plot(data16['2016'].values)
plt.xticks(range(len(data15)), data15.index.to_series().dt.strftime('%b'), size='small')
Needless to say, I would recommend the first option.
You might be able to use pandas.DatetimeIndex.dayofyear to get the day number, which will allow you to plot two different years' data on top of one another.
in: date = pd.Timestamp('2008-10-31')
in: date.dayofyear
out: 305
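For daily data, the day-of-year idea could be sketched as follows. The two frames are invented stand-ins for the asker's daily series; note that in a leap year such as 2016 every date after 29 February has a day number one higher than the same calendar date in 2015.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical daily data: all of 2015 and the first five months of 2016
data15 = pd.DataFrame({"2015": list(range(365))},
                      index=pd.date_range("2015-01-01", "2015-12-31", freq="D"))
data16 = pd.DataFrame({"2016": list(range(100, 252))},
                      index=pd.date_range("2016-01-01", "2016-05-31", freq="D"))

fig, ax = plt.subplots()
for frame in (data15, data16):
    column = frame.columns[0]
    # Use the day number (1-366) as x, so both years share one axis
    ax.plot(frame.index.dayofyear, frame[column], label=column)

ax.set_xlabel("Day of year")
ax.legend()
```

This leaves the original indexes untouched, and no year appears in the tick labels.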
