matplotlib xticks outputs wrong array - python

I am trying to plot a time series, which looks like this
ts
2020-01-01 00:00:00 1300.0
2020-01-01 01:00:00 1300.0
2020-01-01 02:00:00 1300.0
2020-01-01 03:00:00 1300.0
2020-01-01 04:00:00 1300.0
...
2020-12-31 19:00:00 1300.0
2020-12-31 20:00:00 1300.0
2020-12-31 21:00:00 1300.0
2020-12-31 22:00:00 1300.0
2020-12-31 23:00:00 1300.0
Freq: H, Name: 1, Length: 8784, dtype: float64
And I plot it via: ts.plot(label=label, linestyle='--', color='k', alpha=0.75, zorder=2)
If the time series ts runs from 2020-01-01 to 2020-12-31, I get the following when I call plt.xticks()[0]:
array([438288, 439032, 439728, 440472, 441192, 441936, 442656, 443400,
444144, 444864, 445608, 446328, 447071], dtype=int64)
which is fine, since the first element of that array gives the right position of the first xtick. However, when I extend the time series to run from 2019-01-01 to 2020-12-31, so over 2 years, and call plt.xticks()[0], I get the following:
array([429528, 431688, 433872, 436080, 438288, 440472, 442656, 444864,
447071], dtype=int64)
I don't understand why I am now getting fewer xtick values. For 12 months I get 13 xtick locations, so for 24 months I expected 25, but I got only 9. How can I get all 25 locations?
This is the whole script:
fig, ax = plt.subplots(figsize=(8,4))
ts.plot(label=label, linestyle='--', color='k', alpha=0.75, zorder=2)
locs, labels = plt.xticks()

Matplotlib automatically selects an appropriate number of ticks and tick labels so that the x-axis does not become unreadable. You can override the default behavior by using tick locators and formatters from the matplotlib.dates module.
But note that you are plotting the time series with the pandas plot method which is a wrapper around plt.plot. Pandas uses custom tick formatters for time series plots that produce nicely-formatted tick labels. By doing so, it uses x-axis units for dates that are different from the matplotlib date units, which explains why you get what looks like a random number of ticks when you try using the MonthLocator.
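The tick values from the question can be decoded to see which unit pandas is using. This is a minimal sketch, assuming (as the numbers suggest) that pandas positions an hourly series at the number of hours elapsed since 1970-01-01; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Tick locations reported by plt.xticks()[0] in the question (one-year series)
one_year_ticks = np.array([438288, 439032, 439728, 440472, 441192, 441936,
                           442656, 443400, 444144, 444864, 445608, 446328,
                           447071])
# Tick locations reported for the two-year series
two_year_ticks = np.array([429528, 431688, 433872, 436080, 438288, 440472,
                           442656, 444864, 447071])

epoch = pd.Timestamp("1970-01-01")

# Assumption: each location counts hours since the epoch
one_year_dates = epoch + pd.to_timedelta(one_year_ticks, unit="h")
two_year_dates = epoch + pd.to_timedelta(two_year_ticks, unit="h")

print(one_year_dates[:3])
print(two_year_dates[:3])
```

Under that assumption, the one-year ticks fall on the first day of every month plus the last sample, while the two-year ticks fall on the first day of every quarter plus the last sample. In other words, pandas switched from monthly to quarterly ticks to keep the axis readable.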
To make the pandas plot compatible with matplotlib.dates tick locators, you need to add the undocumented x_compat=True argument. Unfortunately, this also removes the pandas custom tick label formatters. So here is an example of how to use a matplotlib date tick locator with a pandas plot and get a similar tick format (minor ticks not included):
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.dates as mdates
# Create sample time series stored in a dataframe
ts = pd.DataFrame(data=dict(constant=1),
index=pd.date_range('2019-01-01', '2020-12-31', freq='H'))
# Create pandas plot
ax = ts.plot(figsize=(10,4), x_compat=True)
ax.set_xlim(min(ts.index), max(ts.index))
# Select and format x ticks
ax.xaxis.set_major_locator(mdates.MonthLocator())
ticks = pd.to_datetime(ax.get_xticks(), unit='d') # timestamps of x ticks
labels = [timestamp.strftime('%b\n%Y') if timestamp.year != ticks[idx-1].year
else timestamp.strftime('%b') for idx, timestamp in enumerate(ticks)]
plt.xticks(ticks, labels, rotation=0, ha='center');
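If the pandas tick formatting is not needed at all, another option is to skip the pandas plot wrapper and draw the series with matplotlib directly, so that the x-axis is in matplotlib date units from the start. The sketch below is an alternative to x_compat=True rather than the method described above; the constant sample values and the Agg backend line are assumptions made so that it runs anywhere.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hourly sample series spanning two years, as in the question
index = pd.date_range("2019-01-01", "2020-12-31 23:00",
                      freq=pd.Timedelta(hours=1))
values = [1300.0] * len(index)

fig, ax = plt.subplots(figsize=(10, 4))
# Plot with matplotlib directly so the x-axis uses matplotlib date units
ax.plot(index.to_pydatetime(), values, linestyle="--", color="k")
ax.set_xlim(index[0].to_pydatetime(), index[-1].to_pydatetime())

# One major tick on the first day of every month
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b\n%Y"))

# Convert the tick positions back to datetimes to inspect them
tick_dates = mdates.num2date(ax.get_xticks())
print(len(tick_dates))
```

Every tick returned here lies on the first day of a month, so the tick count follows the number of months in view instead of being reduced automatically.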

Related

x-Axis ticks as dates

I have some data I would like to plot consisting of two columns, one being a count and the other being the actual date recorded. Since I have over 2000 dates, it makes more sense not to show every single date as a tick on the x-axis, otherwise it won't be readable. However, I am having a hard time making the dates show up on the x-axis with some kind of logic. I have tried using the built-in tick locators for matplotlib, but somehow it's not working. Here is a preview of the data:
PatientTraffic = pd.DataFrame({'count' : CleanData.groupby("TimeStamp").size()}).reset_index()
display(PatientTraffic.head(3000))
TimeStamp count
0 2016-03-13 12:20:00 1
1 2016-03-13 13:39:00 1
2 2016-03-13 13:43:00 1
3 2016-03-13 16:00:00 1
4 2016-03-14 13:27:00 1
... ... ...
2088 2020-02-18 16:00:00 8
2089 2020-02-19 16:00:00 8
2090 2020-02-20 16:00:00 8
2091 2020-02-21 16:00:00 8
2092 2020-02-22 16:00:00 8
2093 rows × 2 columns
and when I go to plot it with these settings:
PatientTrafficPerTimeStamp = PatientTraffic.plot.bar(
x='TimeStamp',
y='count',
figsize=(20,3),
title = "Patient Traffic over Time"
)
PatientTrafficPerTimeStamp.xaxis.set_major_locator(plt.MaxNLocator(3))
I expect to get a bar chart where the x-axis has three ticks, one in the beginning middle and end...maybe I'm using this wrong. Also, it seems like the ticks that appear are simply the first 3 in the column which is not what I want. Any help would be appreciated!
You probably think that you have one problem, but you actually have two - and both are based on the fact that you use convenience functions. The problem that you are most likely not aware of is that pandas plots bars as categorical data. This makes sense under most conditions but obviously not, if you have TimeStamp data as your x-axis. Let's see if I didn't make that up:
import matplotlib.pyplot as plt
import pandas as pd
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
df = pd.read_csv("test.txt", sep=r"\s{2,}", engine="python")
#convert TS from string into datetime objects
df.TS = pd.to_datetime(df.TS, format="%Y-%m-%d %H:%M:%S")
#and plot it as you do directly from pandas that provides the data to matplotlib
df.plot.bar(
x="TS",
y="Val",
ax=ax1,
title="pandas version"
)
#now plot the same data using matplotlib
ax2.bar(df.TS, df.Val, width=22)
ax2.tick_params(axis="x", labelrotation=90)
ax2.set_title("matplotlib version")
plt.tight_layout()
plt.show()
Sample output:
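The categorical placement can also be checked numerically, without looking at a figure. This is a small sketch using three rows from the sample data; the Agg backend line is only an assumption to make it run headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Three unevenly spaced timestamps taken from the sample data
df = pd.DataFrame({
    "TS": pd.to_datetime(["2016-03-13 12:20:00",
                          "2016-06-17 16:00:00",
                          "2020-02-22 16:00:00"]),
    "Val": [1, 1, 8],
})

fig, ax = plt.subplots()
df.plot.bar(x="TS", y="Val", ax=ax)

# pandas places the bars at 0, 1, 2, ... regardless of the time gaps
positions = [int(t) for t in ax.get_xticks()]
print(positions)
```

The bars sit at consecutive integer positions even though the gaps between the timestamps range from months to years, which is why a date-based tick locator has nothing meaningful to work with on this axis.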
So, we should plot them directly from matplotlib to prevent losing the TimeStamp information. Obviously, we lose some comfort provided by pandas, e.g., we have to adjust the width of the bars and label the axes. Now, you could use the other convenience function, MaxNLocator, but as you noticed, it has been written to work well for most conditions, and you give up control over the exact positioning of the ticks. Why not write our own locator using FixedLocator?
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FixedLocator
import pandas as pd
def myownMaxNLocator(datacol, n):
    #return n evenly spaced tick positions between the first and last date
    datemin = mdates.date2num(datacol.min())
    datemax = mdates.date2num(datacol.max())
    xticks = np.linspace(datemin, datemax, n)
    return xticks
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
df = pd.read_csv("test.txt", sep=r"\s{2,}", engine="python")
df.TS = pd.to_datetime(df.TS, format="%Y-%m-%d %H:%M:%S")
df.plot.bar(
x="TS",
y="Val",
ax=ax1,
title="pandas version"
)
ax2.bar(df.TS, df.Val, width=22)
ax2.set_title("matplotlib version")
dateticks = myownMaxNLocator(df.TS, 5)
ax2.xaxis.set_major_locator(FixedLocator(dateticks))
ax2.tick_params(axis="x", labelrotation=90)
plt.tight_layout()
plt.show()
Sample output:
Here, the ticks start with the lowest value and end with the highest value. Alternatively, you could use the LinearLocator that distributes the ticks evenly over the entire view:
from matplotlib.ticker import LinearLocator
...
ax2.bar(df.TS, df.Val, width=22)
ax2.set_title("matplotlib version")
ax2.xaxis.set_major_locator(LinearLocator(numticks=5))
ax2.tick_params(axis="x", labelrotation=90)
...
Sample output:
The sample data were stored in a file with the following structure:
TS Val
0 2016-03-13 12:20:00 1
1 2016-04-13 13:39:00 3
2 2016-04-03 13:43:00 5
3 2016-06-17 16:00:00 1
4 2016-09-14 13:27:00 2
2088 2017-02-08 16:00:00 7
2089 2017-02-25 16:00:00 2
2090 2018-02-20 16:00:00 8
2091 2019-02-21 16:00:00 9
2092 2020-02-22 16:00:00 8
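The evenly spaced tick positions computed by the FixedLocator helper above can also be obtained as timestamps with pandas alone. This is a sketch of an alternative, not the code from the answer; it uses four timestamps from the sample data.

```python
import pandas as pd

# Timestamps taken from the sample data above
ts = pd.to_datetime(pd.Series([
    "2016-03-13 12:20:00",
    "2016-09-14 13:27:00",
    "2018-02-20 16:00:00",
    "2020-02-22 16:00:00",
]))

# Five evenly spaced timestamps from the first to the last observation
tick_times = pd.date_range(ts.min(), ts.max(), periods=5)
print(tick_times)
```

Converting the result with mdates.date2num(tick_times.to_pydatetime()) gives positions that FixedLocator accepts, and the timestamps themselves are convenient for building custom labels with strftime.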
Have you considered grouping by date if you don't need that many xticks?
To answer your question, you can set custom ticks with:
plt.xticks(ticks=[ any list ], labels=[ list of labels ])

Modifying x ticks labels in seaborn

I am trying to modify the format of the x-tick label to date format (%m-%d).
My data consists of hourly values over a certain period of dates. I am trying to plot the data for 14 days. However, when I run my code, the x labels come out completely jumbled.
Is there any way to modify the x ticks so that I skip the labels for hours and show labels only for dates? I am using seaborn.
After a suggestion in the comments, I edited my code to plot as below:
fig, ax = plt.pyplot.subplots()
g = sns.barplot(data=data_n,x='datetime',y='hourly_return')
g.xaxis.set_major_formatter(plt.dates.DateFormatter("%d-%b"))
But I got the following error:
ValueError: DateFormatter found a value of x=0, which is an illegal
date; this usually occurs because you have not informed the axis that
it is plotting dates, e.g., with ax.xaxis_date()
Upon checking the datetime column, I get the following output, including the data type of the column:
0 2020-01-01 00:00:00
1 2020-01-01 01:00:00
2 2020-01-01 02:00:00
3 2020-01-01 03:00:00
4 2020-01-01 04:00:00
...
307 2020-01-13 19:00:00
308 2020-01-13 20:00:00
309 2020-01-13 21:00:00
310 2020-01-13 22:00:00
311 2020-01-13 23:00:00
Name: datetime, Length: 312, dtype: datetime64[ns]
I suspected the x ticks, so I ran g.get_xticks() [which gets the ticks on the x-axis] and got ordinal numbers as output. Can anyone tell me why this is happening?
1. Approach for Drawing Line Plot with x-axis datetime
Can you try changing x axis format as below
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import dates
## create a list of hourly timestamps
datelist = pd.date_range(start='2020-01-01 00:00:00', periods=312,freq='1H').tolist()
#create dummy dataframe
df = pd.DataFrame(datelist, columns=["datetime"])
df["val"] = [i for i in range(1,312+1)]
df.head()
Below is the dataframe info
Draw plot
fig, ax = plt.subplots()
chart = sns.lineplot(data=df, ax=ax, x="datetime",y="val")
ax.xaxis.set_major_formatter(dates.DateFormatter("%d-%b"))
Output:
2. Approach for Drawing Bar plot using seaborn with x-axis datetime
The approach above does not work for a bar plot, because seaborn treats the x values of a bar plot as categories. Use the code below instead:
fig, ax = plt.subplots()
## barplot
chart = sns.barplot(data=df, ax=ax,x="datetime",y="val")
## freq of showing dates, since frequency of datetime in our data is 1H.
## so, will have every day 24data points
## Trying to calculate the frequency by each day
## (assumed points are collected every hours in each day, 24)
## set the frequency for labelling the xaxis
freq = int(24)
# set the xticks at the chosen labelling frequency (one tick per day)
xtix = ax.get_xticks()
ax.set_xticks(xtix[::freq])
# set the xlabels as the datetime data at the same frequency,
# also use only the date for the label
ax.set_xticklabels(df.iloc[::freq]["datetime"].dt.strftime("%d-%b-%y"))
# nicer label format for dates
fig.autofmt_xdate()
plt.show()
output:
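The tick positions and labels used in the bar plot approach can be checked without drawing anything. The sketch below rebuilds the dummy data and uses the %m-%d format asked for in the question; points_per_day is an illustrative name, and the code assumes exactly one reading per hour with no gaps.

```python
import pandas as pd

# Hourly timestamps covering 13 days, mirroring the dummy dataframe above
df = pd.DataFrame({
    "datetime": pd.date_range("2020-01-01", periods=312,
                              freq=pd.Timedelta(hours=1)),
})

points_per_day = 24  # assumption: exactly one reading per hour, no gaps

# Bar positions are 0..311, so every 24th position starts a new day
tick_positions = list(range(0, len(df), points_per_day))
tick_labels = (df["datetime"].iloc[::points_per_day]
               .dt.strftime("%m-%d").tolist())

print(tick_positions[:3], tick_labels[:3])
```

Passing these lists to ax.set_xticks(tick_positions) followed by ax.set_xticklabels(tick_labels) labels one bar per day. If the data has missing hours, the fixed step of 24 no longer lines up with day boundaries and the positions would need to be derived from the dates instead.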

how to plot 23h-25h time-series in python

I have a table like this.
I want to plot x = day and y = hour (an x-y graph) on one graph.
So I set every timestamp to one fixed day (2017-03-01), keeping only the hour and minute, for plotting.
After plotting, I replace the x labels with the original dates.
Following these steps, I get the graph below.
But a problem occurs with data between 23:00 and 00:00.
To make the graph clearer, if the gap between the minimum and maximum is over 23 hours, I want to find all 00:00 time slots in the dataframe and add 24 hours to them.
Actually, there is a one-hour difference from 23:00 to 24:00,
but in the graph I attached, the difference from 23:00 to 00:00 is 23 hours.
Could you please let me know how to plot the data as I requested?
I have also attached my code.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
plt.clf()
#####read files###########################
df = pd.read_excel('files',parse_dates=[0])
#####to make xlabel###########################
x = range(len(df))
xla =df['UTC'].dt.strftime('%Y-%m-%d')
#####set the days same date ###########################
y = df['UTC'].apply(lambda x: x.replace(year=2020, month=3, day=1))
ax = plt.subplot()
ax.plot(x, y ,marker='s', color='k')
ax.yaxis.set_major_locator(mdates.MinuteLocator(interval=5))
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
plt.xticks(x,xla)
plt.xticks(rotation=90)
ax.xaxis.grid(True)
ax.yaxis.grid(True)
plt.title('time_of_waypoint', fontsize=10)
plt.xlabel('day')
plt.ylabel('time')
Try this:
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO
data = StringIO("""datetime
2018-04-03 00:00:00
2018-04-04 23:56:00
2018-04-05 23:57:00
2018-04-06 23:58:00
2018-04-07 00:02:00
2018-04-08 23:59:00
2018-04-09 23:57:00
2018-04-10 23:52:00
""")
df = pd.read_csv(data, engine='python')
df['datetime']= pd.to_datetime(df['datetime'])
x = df['datetime'].apply(lambda x:x.strftime('%m-%d'))
y = df['datetime'].apply(lambda x:x.strftime('%H:%M'))
plt.plot(x,y, 'o')
plt.plot(x,y, '-')
plt.xlabel('month-day')
plt.ylabel('hour:minute')
plt.show()
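The shift requested in the question (adding 24 hours to the values just after midnight when the spread exceeds 23 hours) can be sketched numerically as follows. The helper name unwrap_midnight, the threshold parameter, and the rule that values before noon are the ones to shift are all assumptions made for illustration.

```python
import pandas as pd

def unwrap_midnight(times, threshold_hours=23):
    """Return hours since midnight, adding 24 h to early values when the
    raw spread exceeds threshold_hours (hypothetical helper)."""
    hours = times.dt.hour + times.dt.minute / 60 + times.dt.second / 3600
    if hours.max() - hours.min() > threshold_hours:
        # Assumption: values before noon belong to the "after midnight" side
        hours = hours.where(hours >= 12, hours + 24)
    return hours

# Timestamps taken from the sample data in the answer above
stamps = pd.to_datetime(pd.Series([
    "2018-04-03 00:00:00",
    "2018-04-04 23:56:00",
    "2018-04-05 23:57:00",
    "2018-04-06 23:58:00",
    "2018-04-07 00:02:00",
    "2018-04-08 23:59:00",
    "2018-04-09 23:57:00",
    "2018-04-10 23:52:00",
]))

hours = unwrap_midnight(stamps)
print(hours.round(2).tolist())
```

After the shift, 00:00 becomes 24.0 and 00:02 becomes roughly 24.03, so all points lie within a few minutes of each other and can be plotted as numbers on the y-axis. The tick labels would then need a formatter that shows the hour modulo 24.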

Pandas plot bar graph with datetime64

I have the following dataframe:
gr_data = data.groupby([pd.Grouper(key='date', freq='W-SUN')])['name'].count()
print(gr_data)
date
2018-08-19 582
2018-09-02 1997
2018-09-16 3224
2018-10-07 4282
2018-10-28 5618
2018-11-04 5870
Freq: W-SUN, Name: name, dtype: int64
I am plotting this data using plot.bar()
'date' is a datetime64[ns] type.
When I plot the data, hours/minutes/seconds are visible. How do I drop the hours/minutes/seconds?
You should convert them with strftime. You might want to ensure the dates are sorted before plotting so it always plots in time order.
import matplotlib.pyplot as plt
gr_data.sort_index(inplace=True) #Ensure plotting in time order
fig, ax = plt.subplots()
#gr_data is a Series, so convert it to a DataFrame before adding the label column
gr_data.to_frame().assign(dates = gr_data.index.strftime('%Y-%m-%d')).plot(kind='bar', x='dates', ax=ax)
fig.autofmt_xdate() #Rotate the dates so they aren't squished
plt.show()
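Because a bar plot prints the index values as they are, another option is to format the index itself before plotting. The following sketch rebuilds the weekly counts from the question by hand; the name labelled is illustrative only.

```python
import pandas as pd

# Weekly counts copied from the question
gr_data = pd.Series(
    [582, 1997, 3224, 4282, 5618, 5870],
    index=pd.to_datetime(["2018-08-19", "2018-09-02", "2018-09-16",
                          "2018-10-07", "2018-10-28", "2018-11-04"]),
    name="name",
)

# Work on a sorted copy and replace its DatetimeIndex with plain date strings
labelled = gr_data.sort_index()
labelled.index = labelled.index.strftime("%Y-%m-%d")

print(labelled.index.tolist())
```

Calling labelled.plot.bar() on the result should then show only the dates on the x-axis, while gr_data keeps its original datetime index for any later time-based operations.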

Pandas Dataframe Multicolor Line plot

I have a Pandas DataFrame with a DateTime index and two columns representing wind speed and ambient temperature. Here is the data for half a day:
temp winds
2014-06-01 00:00:00 8.754545 0.263636
2014-06-01 01:00:00 8.025000 0.291667
2014-06-01 02:00:00 7.375000 0.391667
2014-06-01 03:00:00 6.850000 0.308333
2014-06-01 04:00:00 7.150000 0.258333
2014-06-01 05:00:00 7.708333 0.375000
2014-06-01 06:00:00 9.008333 0.391667
2014-06-01 07:00:00 10.858333 0.300000
2014-06-01 08:00:00 12.616667 0.341667
2014-06-01 09:00:00 15.008333 0.308333
2014-06-01 10:00:00 17.991667 0.491667
2014-06-01 11:00:00 21.108333 0.491667
2014-06-01 12:00:00 21.866667 0.395238
I would like to plot this data as one line where the color changes according to temperature. So from light red to dark red the higher the temperature for example.
I found this example of multicolored lines with matplotlib but I have no idea how to use this with a pandas DataFrame. Has anyone an idea what I could do?
If it is possible to do this, would it also be possible as additional feature to change the width of the line according to wind speed? So the faster the wind the wider the line.
Thanks for any help!
The built-in plot method in pandas probably won't be able to do it. You need to extract the data and plot them using matplotlib.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
import matplotlib.dates as mpd
x=mpd.date2num(df.index.to_pydatetime())
y=df.winds.values
c=df['temp'].values
points = np.array([x, y]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
lc = LineCollection(segments, cmap=plt.get_cmap('copper'), norm=plt.Normalize(0, 10))
lc.set_array(c)
lc.set_linewidth(3)
ax=plt.gca()
ax.add_collection(lc)
plt.xlim(min(x), max(x))
ax.xaxis.set_major_locator(mpd.HourLocator())
ax.xaxis.set_major_formatter(mpd.DateFormatter('%Y-%m-%d:%H:%M:%S'))
_=plt.setp(ax.xaxis.get_majorticklabels(), rotation=70 )
plt.savefig('temp.png')
There are two issues worth mentioning:
1. The range of the color gradient is controlled by norm=plt.Normalize(0, 10).
2. pandas and matplotlib plot time series differently, which requires the df.index to be converted to float before plotting. By modifying the major locator and formatter, we get the x-axis major tick labels back into date-time format.
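The effect of the color range, and the line-width variation asked about in the question, can be sketched with the sample data. The integer x positions stand in for the date numbers, and the width scaling 1 + 10 * winds is an arbitrary assumption chosen only to make the differences visible.

```python
import numpy as np
from matplotlib.collections import LineCollection
from matplotlib.colors import Normalize

# Temperature and wind speed copied from the question's sample data
temp = np.array([8.754545, 8.025, 7.375, 6.85, 7.15, 7.708333, 9.008333,
                 10.858333, 12.616667, 15.008333, 17.991667, 21.108333,
                 21.866667])
winds = np.array([0.263636, 0.291667, 0.391667, 0.308333, 0.258333, 0.375,
                  0.391667, 0.3, 0.341667, 0.308333, 0.491667, 0.491667,
                  0.395238])
x = np.arange(len(temp), dtype=float)  # stand-in for the date numbers

# With a fixed 0..10 range, readings above 10 map beyond the top of the scale
fixed_norm = Normalize(0, 10)
n_saturated = int((np.asarray(fixed_norm(temp)) > 1).sum())

# A data-driven range spreads the gradient over all readings
data_norm = Normalize(temp.min(), temp.max())

# One segment per pair of neighbouring points
points = np.array([x, winds]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# Assumption: line width grows linearly with wind speed (arbitrary scale)
widths = 1 + 10 * winds[:-1]

lc = LineCollection(segments, cmap="copper", norm=data_norm,
                    linewidths=widths)
lc.set_array(temp[:-1])  # one colour value per segment
print(n_saturated, segments.shape)
```

With Normalize(0, 10), every reading above 10 is drawn in the same end color, so a range taken from the data keeps the gradient informative. The collection is added to an axes with ax.add_collection(lc) exactly as in the answer above.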
The second issue may cause problems when we want to plot more than one line (the data will be plotted in two separate x ranges):
#follow what is already plotted:
df['another']=np.random.random(13)
print(ax.get_xticks())
df.another.plot(ax=ax, secondary_y=True)
print(ax.get_xticks(minor=True))
[ 735385. 735385.04166667 735385.08333333 735385.125
735385.16666667 735385.20833333 735385.25 735385.29166667
735385.33333333 735385.375 735385.41666667 735385.45833333
735385.5 ]
[389328 389330 389332 389334 389336 389338 389340]
Therefore we need to do it without .plot() method of pandas:
ax.twinx().plot(x, df.another)
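The two sets of tick values printed above can be decoded with the standard library to confirm that they describe the same instant in different units. This sketch assumes the first array uses the date numbers of older matplotlib versions (before 3.3), which coincide with date ordinals, and that the second counts hours since 1970-01-01 for the hourly series.

```python
from datetime import date, datetime, timedelta

# First tick of the LineCollection axis (matplotlib date number, pre-3.3 epoch)
mpl_tick = 735385
# First tick of the axis drawn by the pandas .plot() method
pandas_tick = 389328

# Older matplotlib counted days so that the number equals the date ordinal
mpl_date = date.fromordinal(mpl_tick)

# Assumption: pandas counts hours since 1970-01-01 for an hourly series
pandas_date = datetime(1970, 1, 1) + timedelta(hours=pandas_tick)

print(mpl_date, pandas_date)
```

Both values decode to 1 June 2014, the start of the sample data, but numerically they are far apart, which is why the two lines end up in separate x ranges when they share an axis.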
