Modifying x ticks labels in seaborn - python

I am trying to change the format of the x-tick labels to a date format (%m-%d).
My data consists of hourly values over a range of dates, and I am trying to plot 14 days of it. However, when I run the code, the x labels come out completely jumbled.
Is there any way to show only the dates on the x-axis and skip the labels for the hourly values? I am using seaborn.
Following a suggestion in the comments, I edited my code to plot as below:
fig, ax = plt.pyplot.subplots()
g = sns.barplot(data=data_n,x='datetime',y='hourly_return')
g.xaxis.set_major_formatter(plt.dates.DateFormatter("%d-%b"))
But I got the following error:
ValueError: DateFormatter found a value of x=0, which is an illegal
date; this usually occurs because you have not informed the axis that
it is plotting dates, e.g., with ax.xaxis_date()
Checking the datetime column gives the following output, including the column's data type:
0 2020-01-01 00:00:00
1 2020-01-01 01:00:00
2 2020-01-01 02:00:00
3 2020-01-01 03:00:00
4 2020-01-01 04:00:00
...
307 2020-01-13 19:00:00
308 2020-01-13 20:00:00
309 2020-01-13 21:00:00
310 2020-01-13 22:00:00
311 2020-01-13 23:00:00
Name: datetime, Length: 312, dtype: datetime64[ns]
I suspected the x ticks, so I ran g.get_xticks() (which returns the tick positions on the x-axis) and got ordinal numbers as output. Can anyone explain why this is happening?

1. Approach for Drawing Line Plot with x-axis datetime
Try changing the x-axis format as below:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import dates
## create a list of hourly timestamps
datelist = pd.date_range(start='2020-01-01 00:00:00', periods=312,freq='1H').tolist()
## create dummy dataframe from the list
df = pd.DataFrame(datelist, columns=["datetime"])
df["val"] = [i for i in range(1,312+1)]
df.head()
Draw plot
fig, ax = plt.subplots()
chart = sns.lineplot(data=df, ax=ax, x="datetime",y="val")
ax.xaxis.set_major_formatter(dates.DateFormatter("%d-%b"))
2. Approach for Drawing Bar plot using seaborn with x-axis datetime
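The approach above does not work for a bar plot, because seaborn's barplot treats the x values as categories and places the bars at integer positions 0, 1, 2, ..., so a date formatter has no real dates to format. Use the code below instead: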
fig, ax = plt.subplots()
## barplot
chart = sns.barplot(data=df, ax=ax,x="datetime",y="val")
## The data is hourly, so each day has 24 data points.
## Label only every 24th bar, i.e. one label per day.
freq = 24
# keep only every freq-th tick; barplot places the bars at positions 0, 1, 2, ...
xtix = ax.get_xticks()
ax.set_xticks(xtix[::freq])
# label the remaining ticks with the matching dates (date only, no time);
# the ticks are set first so that the number of labels matches the number of ticks
ax.set_xticklabels(df.iloc[::freq]["datetime"].dt.strftime("%d-%b-%y"))
# nicer label format for dates
fig.autofmt_xdate()
plt.show()
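As an alternative sketch (not part of the answer above), the bars can be drawn with matplotlib's own ax.bar, which keeps a real date axis so that the date locators and formatters work directly. The data is the same kind of dummy frame as above; the one-hour bar width, the figure size, and the Agg backend are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Dummy hourly data: 312 points starting on 2020-01-01
df = pd.DataFrame({"datetime": pd.date_range("2020-01-01", periods=312, freq="h")})
df["val"] = range(1, 313)

fig, ax = plt.subplots(figsize=(10, 4))
# On a date axis the bar width is measured in days, so 1/24 is one hour
ax.bar(df["datetime"].to_numpy(), df["val"].to_numpy(), width=1 / 24)

# One major tick per day, labelled with day and month only
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%b"))
fig.autofmt_xdate()

# Draw once so that the tick labels are populated
fig.canvas.draw()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

Because the x values stay real dates here, there is no need to thin out the tick labels by hand.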

Related

matplotlib xticks outputs wrong array

I am trying to plot a time series, which looks like this
ts
2020-01-01 00:00:00 1300.0
2020-01-01 01:00:00 1300.0
2020-01-01 02:00:00 1300.0
2020-01-01 03:00:00 1300.0
2020-01-01 04:00:00 1300.0
...
2020-12-31 19:00:00 1300.0
2020-12-31 20:00:00 1300.0
2020-12-31 21:00:00 1300.0
2020-12-31 22:00:00 1300.0
2020-12-31 23:00:00 1300.0
Freq: H, Name: 1, Length: 8784, dtype: float64
And I plot it via: ts.plot(label=label, linestyle='--', color='k', alpha=0.75, zorder=2)
If the time series ts starts from 2020-01-01 to 2020-12-31, I get following when I call plt.xticks()[0]:
array([438288, 439032, 439728, 440472, 441192, 441936, 442656, 443400,
444144, 444864, 445608, 446328, 447071], dtype=int64)
which is fine since the first element of that array actually shows the right position of the first xtick. However when I expand the time series object from 2019-01-01 to 2020-12-31, so over 2 years, when I call the plt.xticks()[0], I get following:
array([429528, 431688, 433872, 436080, 438288, 440472, 442656, 444864,
447071], dtype=int64)
I don't understand why I am now getting fewer xtick values. For 12 months I get 13 locations, so for 24 months I expected 25, but I got only 9. How can I get all 25 locations?
This is the whole script:
fig, ax = plt.subplots(figsize=(8,4))
ts.plot(label=label, linestyle='--', color='k', alpha=0.75, zorder=2)
locs, labels = plt.xticks()
Matplotlib automatically selects an appropriate number of ticks and tick labels so that the x-axis does not become unreadable. You can override the default behavior by using tick locators and formatters from the matplotlib.dates module.
But note that you are plotting the time series with the pandas plot method which is a wrapper around plt.plot. Pandas uses custom tick formatters for time series plots that produce nicely-formatted tick labels. By doing so, it uses x-axis units for dates that are different from the matplotlib date units, which explains why you get what looks like a random number of ticks when you try using the MonthLocator.
To make the pandas plot compatible with matplotlib.dates tick locators, you need to add the undocumented x_compat=True argument. Unfortunately, this also removes the pandas custom tick label formatters. So here is an example of how to use a matplotlib date tick locator with a pandas plot and get a similar tick format (minor ticks not included):
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
import matplotlib.dates as mdates
# Create sample time series stored in a dataframe
ts = pd.DataFrame(data=dict(constant=1),
                  index=pd.date_range('2019-01-01', '2020-12-31', freq='H'))
# Create pandas plot
ax = ts.plot(figsize=(10,4), x_compat=True)
ax.set_xlim(min(ts.index), max(ts.index))
# Select and format x ticks
ax.xaxis.set_major_locator(mdates.MonthLocator())
ticks = pd.to_datetime(ax.get_xticks(), unit='d') # timestamps of x ticks
labels = [timestamp.strftime('%b\n%Y') if timestamp.year != ticks[idx-1].year
          else timestamp.strftime('%b') for idx, timestamp in enumerate(ticks)]
plt.xticks(ticks, labels, rotation=0, ha='center');
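If the goal is simply to get exactly 25 month-start locations, a different route (a sketch, not the method used in the answer above) is to plot with matplotlib directly and compute the tick positions explicitly with pd.date_range. The constant series, the line style, and the Agg backend below are assumptions chosen to mirror the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hourly series covering 2019 and 2020, constant value as in the question
ts = pd.Series(1300.0, index=pd.date_range("2019-01-01", "2020-12-31 23:00", freq="h"))

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(ts.index.to_pydatetime(), ts.to_numpy(), linestyle="--", color="k")

# First day of every month from Jan 2019 up to and including Jan 2021
month_starts = pd.date_range("2019-01-01", "2021-01-01", freq="MS")
# Show the year only under each January tick
labels = [d.strftime("%b\n%Y") if d.month == 1 else d.strftime("%b") for d in month_starts]

# Convert to matplotlib date numbers explicitly, then set ticks and labels
ax.set_xticks(mdates.date2num(month_starts.to_pydatetime()))
ax.set_xticklabels(labels)
tick_locations = ax.get_xticks()
```

Fixing the ticks this way bypasses the automatic locator entirely, so the number of ticks no longer changes with the length of the series.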

x-Axis ticks as dates

I have some data I would like to plot, consisting of two columns: one is a count and the other is the actual date recorded. Since I have over 2000 dates, it makes more sense not to show every single date as a tick on the x-axis; otherwise it won't be readable. However, I am having a hard time making the dates show up on the x-axis with some kind of logic. I have tried matplotlib's built-in tick locators, but they are not working as I expect. Here is a preview of the data:
PatientTraffic = pd.DataFrame({'count' : CleanData.groupby("TimeStamp").size()}).reset_index()
display(PatientTraffic.head(3000))
TimeStamp count
0 2016-03-13 12:20:00 1
1 2016-03-13 13:39:00 1
2 2016-03-13 13:43:00 1
3 2016-03-13 16:00:00 1
4 2016-03-14 13:27:00 1
... ... ...
2088 2020-02-18 16:00:00 8
2089 2020-02-19 16:00:00 8
2090 2020-02-20 16:00:00 8
2091 2020-02-21 16:00:00 8
2092 2020-02-22 16:00:00 8
2093 rows × 2 columns
and when I go to plot it with these settings:
PatientTrafficPerTimeStamp = PatientTraffic.plot.bar(
x='TimeStamp',
y='count',
figsize=(20,3),
title = "Patient Traffic over Time"
)
PatientTrafficPerTimeStamp.xaxis.set_major_locator(plt.MaxNLocator(3))
I expect to get a bar chart where the x-axis has three ticks: one at the beginning, one in the middle, and one at the end. Maybe I'm using this wrong. Also, the ticks that appear seem to be simply the first three in the column, which is not what I want. Any help would be appreciated!
You probably think that you have one problem, but you actually have two - and both are based on the fact that you use convenience functions. The problem that you are most likely not aware of is that pandas plots bars as categorical data. This makes sense under most conditions but obviously not, if you have TimeStamp data as your x-axis. Let's see if I didn't make that up:
import matplotlib.pyplot as plt
import pandas as pd
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
df = pd.read_csv("test.txt", sep = r"\s{2,}", engine="python")
#convert TS from string into datetime objects
df.TS = pd.to_datetime(df.TS, format="%Y-%m-%d %H:%M:%S")
#and plot it as you do directly from pandas that provides the data to matplotlib
df.plot.bar(
    x="TS",
    y="Val",
    ax=ax1,
    title="pandas version"
)
#now plot the same data using matplotlib
ax2.bar(df.TS, df.Val, width=22)
ax2.tick_params(axis="x", labelrotation=90)
ax2.set_title("matplotlib version")
plt.tight_layout()
plt.show()
Sample output:
So, we should plot the data directly with matplotlib to avoid losing the TimeStamp information. Obviously, we lose some of the comfort provided by pandas, e.g., we have to adjust the width of the bars and label the axes. Now, you could use the other convenience function, MaxNLocator, but as you noticed, it has been written to work well under most conditions, and you give up control over the exact positioning of the ticks. Why not write our own locator using FixedLocator?
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FixedLocator
import pandas as pd
def myownMaxNLocator(datacol, n):
    datemin = mdates.date2num(datacol.min())
    datemax = mdates.date2num(datacol.max())
    xticks = np.linspace(datemin, datemax, n)
    return xticks
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))
df = pd.read_csv("test.txt", sep = r"\s{2,}", engine="python")
df.TS = pd.to_datetime(df.TS, format="%Y-%m-%d %H:%M:%S")
df.plot.bar(
    x="TS",
    y="Val",
    ax=ax1,
    title="pandas version"
)
ax2.bar(df.TS, df.Val, width=22)
ax2.set_title("matplotlib version")
dateticks = myownMaxNLocator(df.TS, 5)
ax2.xaxis.set_major_locator(FixedLocator(dateticks))
ax2.tick_params(axis="x", labelrotation=90)
plt.tight_layout()
plt.show()
Sample output:
Here, the ticks start with the lowest value and end with the highest value. Alternatively, you could use the LinearLocator that distributes the ticks evenly over the entire view:
from matplotlib.ticker import LinearLocator
...
ax2.bar(df.TS, df.Val, width=22)
ax2.set_title("matplotlib version")
ax2.xaxis.set_major_locator(LinearLocator(numticks=5))
ax2.tick_params(axis="x", labelrotation=90)
...
Sample output:
The sample data were stored in a file with the following structure:
      TS                   Val
0     2016-03-13 12:20:00  1
1     2016-04-13 13:39:00  3
2     2016-04-03 13:43:00  5
3     2016-06-17 16:00:00  1
4     2016-09-14 13:27:00  2
2088  2017-02-08 16:00:00  7
2089  2017-02-25 16:00:00  2
2090  2018-02-20 16:00:00  8
2091  2019-02-21 16:00:00  9
2092  2020-02-22 16:00:00  8
Have you considered grouping by date if you don't need that many xticks?
To answer your question, you can set custom ticks with:
plt.xticks(ticks=[ any list ], labels=[ list of labels ])
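Both suggestions can be combined in a short sketch: group the counts by calendar date, draw the bars on a real date axis, and then place three custom ticks with plt.xticks. The small traffic frame below is a hypothetical stand-in for PatientTraffic, and the bar width of 5 days and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the PatientTraffic frame from the question
traffic = pd.DataFrame({
    "TimeStamp": pd.to_datetime([
        "2016-03-13 12:20", "2016-03-13 13:39", "2016-03-14 13:27",
        "2018-02-20 16:00", "2020-02-22 16:00",
    ]),
    "count": [1, 1, 1, 8, 8],
})

# Group by calendar date: normalize() drops the time of day
daily = traffic.groupby(traffic["TimeStamp"].dt.normalize())["count"].sum()

fig, ax = plt.subplots(figsize=(10, 3))
# Plot with matplotlib directly so the x-axis holds real dates; width is in days
ax.bar(daily.index.to_pydatetime(), daily.to_numpy(), width=5)

# Three custom ticks: first date, midpoint, last date
first, last = daily.index.min(), daily.index.max()
tick_dates = [first, first + (last - first) / 2, last]
tick_labels = [d.strftime("%Y-%m-%d") for d in tick_dates]
plt.xticks(
    ticks=mdates.date2num([d.to_pydatetime() for d in tick_dates]),
    labels=tick_labels,
)
```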

how to plot 23h-25h time-series in python

I have a table of timestamps like this.
I want to plot the day on the x-axis and the time of day on the y-axis in a single graph.
To do that, I set every timestamp to the same fixed day (2017-03-01), keeping only the hour and minute, and plot that.
After plotting, I replace the x labels with the dates from the original data.
Following these steps I get the graph below.
However, a problem occurs with the data between 23:00 and 00:00.
To make the graph readable, when the gap between the minimum and maximum is more than 23 hours,
I want to find every time just after 00:00 in the dataframe and add 24 hours to it.
In reality, 23:00 and 24:00 are only one hour apart.
But in the graph I attached, the distance from 23:00 to 00:00 is 23 hours.
Could you please let me know how to plot the data this way?
My code is attached below.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
plt.clf()
#####read files###########################
df = pd.read_excel('files',parse_dates=[0])
#####to make xlabel###########################
x = range(len(df))
xla =df['UTC'].dt.strftime('%Y-%m-%d')
#####set the days same date ###########################
y = df['UTC'].apply(lambda x: x.replace(year=2020, month=3, day=1))
ax = plt.subplot()
ax.plot(x, y ,marker='s', color='k')
ax.yaxis.set_major_locator(mdates.MinuteLocator(interval=5))
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
plt.xticks(x,xla)
plt.xticks(rotation=90)
ax.xaxis.grid(True)
ax.yaxis.grid(True)
plt.title('time_of_waypoint', fontsize=10)
plt.xlabel('day')
plt.ylabel('time')
Try this:
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO
data = StringIO("""datetime
2018-04-03 00:00:00
2018-04-04 23:56:00
2018-04-05 23:57:00
2018-04-06 23:58:00
2018-04-07 00:02:00
2018-04-08 23:59:00
2018-04-09 23:57:00
2018-04-10 23:52:00
""")
df = pd.read_csv(data, engine='python')
df['datetime']= pd.to_datetime(df['datetime'])
x = df['datetime'].apply(lambda x:x.strftime('%m-%d'))
# %M is minutes; %m would insert the month instead
y = df['datetime'].apply(lambda x:x.strftime('%H:%M'))
plt.plot(x,y, 'o')
plt.plot(x,y, '-')
plt.xlabel('month-day')
plt.ylabel('hour:minute')
plt.show()
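The wrap-around that the question describes (adding 24 hours to times just after midnight when the data spans more than 23 hours) can also be done numerically. The following is a minimal sketch under stated assumptions: the time of day is converted to fractional hours, values before noon are shifted by 24 when the spread exceeds 23 hours, and a hypothetical helper format_hours renders the y ticks as HH:MM. The noon cut-off and the helper name are illustrative choices, not something taken from the original posts.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter

df = pd.DataFrame({"datetime": pd.to_datetime([
    "2018-04-03 00:00", "2018-04-04 23:56", "2018-04-05 23:57", "2018-04-06 23:58",
    "2018-04-07 00:02", "2018-04-08 23:59", "2018-04-09 23:57", "2018-04-10 23:52",
])})

# Time of day as fractional hours (e.g. 23:30 becomes 23.5)
hours = df["datetime"].dt.hour + df["datetime"].dt.minute / 60

# If the values straddle midnight, push the early-morning times past 24
# (assumption: anything before noon belongs to the "next" day)
if hours.max() - hours.min() > 23:
    hours = hours.where(hours >= 12, hours + 24)

def format_hours(value, pos=None):
    """Render fractional hours as HH:MM, allowing values of 24 and above."""
    total_minutes = int(round(value * 60))
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"

fig, ax = plt.subplots()
ax.plot(df["datetime"].dt.strftime("%m-%d"), hours, "o-")
ax.yaxis.set_major_formatter(FuncFormatter(format_hours))
ax.set_xlabel("month-day")
ax.set_ylabel("hour:minute")
```

With this data, 00:02 is plotted at 24:02, so it sits a few minutes above 23:59 instead of almost 24 hours below it.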

Pandas plot bar graph with datetime64

I have the following dataframe:
gr_data = data.groupby([pd.Grouper(key='date', freq='W-SUN')])['name'].count()
print(gr_data)
date
2018-08-19 582
2018-09-02 1997
2018-09-16 3224
2018-10-07 4282
2018-10-28 5618
2018-11-04 5870
Freq: W-SUN, Name: name, dtype: int64
I am plotting this data using plot.bar()
'date' is a datetime64[ns] type.
When I plot the data, hours/minutes/seconds are visible. How do I drop the hours/minutes/seconds?
You should convert them with strftime. You might want to ensure the dates are sorted before plotting so it always plots in time order.
import matplotlib.pyplot as plt
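gr_data.sort_index(inplace=True) #Ensure plotting in time order
fig, ax = plt.subplots()
# gr_data is a Series, so convert it to a DataFrame before adding the formatted dates column
gr_data.to_frame().assign(dates = gr_data.index.strftime('%Y-%m-%d')).plot(kind='bar', x='dates', ax=ax)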
fig.autofmt_xdate() #Rotate the dates so they aren't squished
plt.show()
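A variant of the same idea, sketched here with the values from the question typed in by hand, is to replace the index of a copy of the Series with the formatted strings before plotting. The variable name labelled and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Weekly counts as shown in the question, rebuilt by hand
gr_data = pd.Series(
    [582, 1997, 3224, 4282, 5618, 5870],
    index=pd.to_datetime([
        "2018-08-19", "2018-09-02", "2018-09-16",
        "2018-10-07", "2018-10-28", "2018-11-04",
    ]),
    name="name",
).sort_index()

# Copy the series and swap the datetime index for date-only strings
labelled = gr_data.copy()
labelled.index = gr_data.index.strftime("%Y-%m-%d")

fig, ax = plt.subplots()
labelled.plot(kind="bar", ax=ax)
fig.autofmt_xdate()  # rotate the labels so they do not overlap
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

The original gr_data keeps its datetime index, so it can still be used for resampling or date arithmetic afterwards.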

How to add x-axis on plot?

I am trying to plot some data, but I don't know how I can add the date values on the x-axis on my graph. Here is my code:
import pandas as pd
import numpy as np
%matplotlib inline
%pylab inline
import matplotlib.pyplot as plt
pylab.rcParams['figure.figsize'] = (15, 9)
df["msft"].plot(grid = True)
The result is a plot, but the x-axis only shows numbers, and I want dates to appear on the x-axis instead. The dates are in the date column of the dataframe.
Here is what the dataframe looks like:
date msft nok aapl ibm amzn
1 2018-01-01 09:00:00 112 1 143 130 1298
2 2018-01-01 10:00:00 109 10 185 137 1647
3 2018-01-01 11:00:00 98 11 146 105 1331
4 2018-01-01 12:00:00 83 3 214 131 1355
Can you offer some help on what I am missing?
For pandas, your date column is just another column, so you have to tell it explicitly that you want to plot against it. One way is to plot against this column:
from matplotlib import pyplot as plt
import pandas as pd
#load dataframe
df = pd.read_csv("test.txt", delim_whitespace=True)
#convert date column to datetime object, if it is not already one
df["date"] = pd.to_datetime(df["date"])
#plot the specified columns vs dates
df.plot(x = "date", y = ["msft", "ibm"], kind = "line", grid = True)
plt.show()
For more pandas plot options, please have a look at the documentation.
Another way would be to set date as the index of the dataframe. Then you can use your approach:
df.set_index("date", inplace = True)
df[["msft", "ibm"]].plot(grid = True)
plt.show()
The automatic date labels might not be what you want to display, but there are ways to format the output, and you can find examples on SO.
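As one possible way to control the label format, the following sketch plots the msft column with matplotlib directly and applies a date locator and formatter. The four rows are taken from the question; the "%d-%b %H:%M" format, the hourly locator, and the Agg backend are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# The four rows shown in the question
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2018-01-01 09:00", "2018-01-01 10:00",
        "2018-01-01 11:00", "2018-01-01 12:00",
    ]),
    "msft": [112, 109, 98, 83],
})

fig, ax = plt.subplots()
ax.plot(df["date"].to_numpy(), df["msft"].to_numpy(), label="msft")
ax.grid(True)

# One tick per hour, labelled as day-month hour:minute
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%b %H:%M"))
fig.autofmt_xdate()

# Draw once so that the tick labels are populated
fig.canvas.draw()
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```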
One way to do it is the set_xticklabels function, though Mr. T's answer is the proper way to go:
ax = plt.subplot(111)
df["msft"].plot(grid = True)
ax.set_xticklabels(df['date'])
plt.xticks(np.arange(4))