Pandas plot function ignores timezone of timeseries - python

When plotting a timeseries with the built-in plot function of pandas, it seems to ignore the timezone of my index: it always uses the UTC time for the x-axis. An example:
import numpy as np
import matplotlib.pyplot as plt
from pandas import rolling_mean, DataFrame, date_range
rng = date_range('1/1/2011', periods=200, freq='S', tz="UTC")
data = DataFrame(np.random.randn(len(rng), 3), index=rng, columns=['A', 'B', 'C'])
data_cet = data.tz_convert("CET")
# plot with data in UTC timezone
fig, ax = plt.subplots()
data[["A", "B"]].plot(ax=ax, grid=True)
plt.show()
# plot with data in CET timezone, but the x-axis remains the same as above
fig, ax = plt.subplots()
data_cet[["A", "B"]].plot(ax=ax, grid=True)
plt.show()
The plot does not change, although the index has:
In [11]: data.index[0]
Out[11]: <Timestamp: 2011-01-01 00:00:00+0000 UTC, tz=UTC>
In [12]: data_cet.index[0]
Out[12]: <Timestamp: 2011-01-01 01:00:00+0100 CET, tz=CET>
Should I file a bug, or do I miss something?

This is definitely a bug. I've created a report on github. The reason is because internally, pandas converts a regular frequency DatetimeIndex to PeriodIndex to hook into formatters/locators in pandas, and currently PeriodIndex does NOT retain timezone information.
Please stay tuned for a fix.

from pytz import timezone as ptz
import matplotlib as mpl
...
data.index = pd.to_datetime(data.index, utc=True).tz_localize(tz=ptz('<your timezone>'))
...
mpl.rcParams['timezone'] = data.index.tz.zone
... after which matplotlib prints as that zone rather than UTC.
However! Note if you need to annotate, the x locations of the annotations will still need to be in UTC, even whilst strings passed to data.loc[] or data.at[] will be assumed to be in the set timezone!
For instance I needed to show a series of vertical lines labelled with timestamps on them:
(this is after most of the plot calls, and note the timestamp strings in sels were UTC)
sels = ['2019-03-21 3:56:28',
'2019-03-21 4:00:30',
'2019-03-21 4:05:55',
'2019-03-21 4:13:40']
ax.vlines(sels,125,145,lw=1,color='grey') # 125 was bottom, 145 was top in data units
for s in sels:
tstr = pd.to_datetime(s, utc=True)\
.astimezone(tz=ptz(data.index.tz.zone))\
.isoformat().split('T')[1].split('+')[0]
ax.annotate(tstr,xy=(s,125),xycoords='data',
xytext=(0,5), textcoords='offset points', rotation=90,
horizontalalignment='right', verticalalignment='bottom')
This puts grey vertical lines at the times chosen manually in sels, and labels them in local timezone hours, minutes and seconds. (the .split()[] business discards the date and timezone info from the .isoformat() string).
But when I need to actually get corresponding values from data using the same s in sels, I then have to use the somewhat awkward:
data.tz_convert('UTC').at[s]
Whereas just
data.at[s]
Fails with a KeyError because pandas interprets s is in the data.index.tz timezone, and so interpreted, the timestamps fall outside of range of the contents of data

How to deal with UTC to local time conversion
import time
import matplotlib.dates
…
tz = pytz.timezone(time.tzname[0])
…
ax.xaxis.set_major_locator(matplotlib.dates.HourLocator(interval=1, tz=tz))
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%H', tz=tz))

Related

Matplotlib not reading time axis correctly

I want to visualise the daily data using Matplotlib. The data is temperature against time and has this format:
Time Temperature
1 8:23:04 18.5
2 8:23:04 19.0
3 9:12:57 19.0
4 9:12:57 20.0
... ... ...
But when plotting the graph, the Time values on x-axis is distorted, which looks like this:
Realising Matplotlib may not be interpreting time data correctly, I converted the time format using pd.to_datetime:
df['Time'] = pd.to_datetime(df['Time'], format="%H:%M:%S")
df.plot( 'Time', 'Temperature',figsize=(20, 10))
df.describe()
but this again returned:
How to make the time on x-axis look normal? Thanks
As #Michael O. was saying, you need to take care of the datetime.
You miss the day, year and month. Here I implemented a possible solution adding these missing data with some default values, you may want to change them.
The code is very simple and the comments illustrate what I am doing.
import pandas as pd
from datetime import datetime, date, time, timezone
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
vals=[["8:23:04", 18.5],
["8:23:04", 19.0],
["9:12:57", 19.0],
["9:12:57", 20.0]]
apd=pd.DataFrame(vals, columns=["Time", "Temp"])
# a simple function to convert a string to datetime
def conv_time(cell):
dt = datetime.strptime(cell, "%d/%m/%Y %H:%M:%S")
return(dt)
# the dataframe misses the day, month and year, we need to add some
apd["Time"]=["{}/{}/{} {}".format(1,1,2020, cell) for cell in apd["Time"]]
# we use the function to convert the column to a datetime
apd["Time"]=[conv_time(cell) for cell in apd["Time"]]
## plotting the results taking care of the axis
fig, ax = plt.subplots()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
ax.set_xlim([pd.to_datetime('2020-01-1 6:00:00'), pd.to_datetime('2020-01-1 12:00:00')])
ax.scatter(apd["Time"], apd["Temp"])

Matplotlib - show x axis values - dates [duplicate]

Compare the following code:
test = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
test['date'] = pd.to_datetime(test['date'])
test = test.set_index('date')
ax = test.plot()
I added DateFormatter in the end:
test = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
test['date'] = pd.to_datetime(test['date'])
test = test.set_index('date')
ax = test.plot()
ax.xaxis.set_minor_formatter(dates.DateFormatter('%d\n\n%a')) ## Added this line
The issue with the second graph is that it starts on 5-24 instead 5-25. Also, 5-25 of 2017 is Thursday not Monday. What is causing the issue? Is this timezone related? (I don't understand why the date numbers are stacked on top of each other either)
In general the datetime utilities of pandas and matplotlib are incompatible. So trying to use a matplotlib.dates object on a date axis created with pandas will in most cases fail.
One reason is e.g. seen from the documentation
datetime objects are converted to floating point numbers which represent time in days since 0001-01-01 UTC, plus 1. For example, 0001-01-01, 06:00 is 1.25, not 0.25.
However, this is not the only difference and it is thus advisable not to mix pandas and matplotlib when it comes to datetime objects.
There is however the option to tell pandas not to use its own datetime format. In that case using the matplotlib.dates tickers is possible. This can be steered via.
df.plot(x_compat=True)
Since pandas does not provide sophisticated formatting capabilities for dates, one can use matplotlib for plotting and formatting.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
df = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
df['date'] = pd.to_datetime(df['date'])
usePandas=True
#Either use pandas
if usePandas:
df = df.set_index('date')
df.plot(x_compat=True)
plt.gca().xaxis.set_major_locator(dates.DayLocator())
plt.gca().xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
plt.gca().invert_xaxis()
plt.gcf().autofmt_xdate(rotation=0, ha="center")
# or use matplotlib
else:
plt.plot(df["date"], df["ratio1"])
plt.gca().xaxis.set_major_locator(dates.DayLocator())
plt.gca().xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
plt.gca().invert_xaxis()
plt.show()
Updated using the matplotlib object oriented API
usePandas=True
#Either use pandas
if usePandas:
df = df.set_index('date')
ax = df.plot(x_compat=True, figsize=(6, 4))
ax.xaxis.set_major_locator(dates.DayLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
ax.invert_xaxis()
ax.get_figure().autofmt_xdate(rotation=0, ha="center")
# or use matplotlib
else:
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot('date', 'ratio1', data=df)
ax.xaxis.set_major_locator(dates.DayLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
fig.invert_xaxis()
plt.show()

Changing the xlim by date in Matplotlib

I am trying to make a visually appealing graph in Python. I was using Randal Olson's example http://www.randalolson.com/2014/06/28/how-to-make-beautiful-data-visualizations-in-python-with-matplotlib/ here and trying to make some adjustments.
My data is simple,
dispute_percentage
Out[34]:
2015-08-11 0.017647
2015-08-12 0.004525
2015-08-13 0.006024
2015-08-14 0.000000
2015-08-15 0.000000
2015-08-17 0.000000
The problem is that the data starts loading at Feb 2015, and I want to start displaying at April 2015.
Here is my code
from __future__ import division
from collections import OrderedDict
import pandas as pd
from collections import Counter
from pylab import *
import datetime as datetime
dispute_percentage.plot(kind = 'line')
plt.xlabel('Date')
plt.ylabel('Percent')
plt.title('Percent Disputes In FY2015')
# Remove the plot frame lines. They are unnecessary chartjunk.
ax = plt.subplot(111)
ax.spines["top"].set_visible(False)
ax.spines["bottom"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["left"].set_visible(False)
# Ensure that the axis ticks only show up on the bottom and left of the plot.
# Ticks on the right and top of the plot are generally unnecessary chartjunk.
ax.get_xaxis().tick_bottom()
#ax.get_yaxis().tick_left()
# Limit the range of the plot to only where the data is.
# Avoid unnecessary whitespace.
datenow = datetime.datetime.now
dstart = datetime(2015,4,1)
print datenow
plt.ylim(0, .14)
plt.xlim(dstart, datenow)
The xlim is what I am struggling with. I'm getting the error
File "C:/Mypath/name.py", line 52, in <module>
dstart = datetime(2015,4,1)
TypeError: 'module' object is not callable
If anyone could help with this that would be great. Also any input on trying to make it prettier would also be appreciated.
You need to call datetime.datetime.now() with the parentheses on the end, and for dstart, you need to use the datetime method of the datetime module: datetime.datetime(2015,4,1)
import datetime
datenow = datetime.datetime.now()
dstart = datetime.datetime(2015,4,1)
EDIT:
To set the xticks to the first of the month (thanks to #AndyKubiak):
firsts=[]
for i in range(dstart.month, datenow.month+1):
firsts.append(datetime.datetime(2015,i,1))
plt.xticks(firsts)

How to Customise Pandas Date Time Stamp # x-axis

When I plots the complete data works fine and displays the date on the x-axis:
.
When I zoom into particular portion to view:
the plot shows only the time rather than date, I do understand with less points can't display different set of date but how to show date or set date format even if the graph is zoomed?
dataToPlot = pd.read_csv(fileName, names=['time','1','2','3','4','plotValue','6','7','8','9','10','11','12','13','14','15','16'],
sep=',', index_col=0, parse_dates=True, dayfirst=True)
dataToPlot.drop(dataToPlot.index[0])
startTime = dataToPlot.head(1).index[0]
endTime = dataToPlot.tail(1).index[0]
ax = pd.rolling_mean(dataToPlot_plot[startTime:endTime][['plotValue']],mar).plot(linestyle='-', linewidth=3, markersize=9, color='#FECB00')
Thanks in advance!
I have a solution to make the labels look consistent, though bear in mind that it will also include the time on the "larger scale" time plot.
The code below uses the matplotlib.dates functionality to choose a date format for the x-axis. Note that as we're using the matplotlib formatting you can't simple use df.plot but must instead use plt.plot_date and convert your index to the correct format.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
# Generate some random data and plot it
time = pd.date_range('07/11/2014', periods=1000, freq='5min')
ts = pd.Series(pd.np.random.randn(len(time)), index=time)
fig, ax = plt.subplots()
ax.plot_date(ts.index.to_pydatetime(), ts.data)
# Create your formatter object and change the xaxis formatting.
date_fmt = '%d/%m/%y %H:%M:%S'
formatter = dates.DateFormatter(date_fmt)
ax.xaxis.set_major_formatter(formatter)
plt.gcf().autofmt_xdate()
plt.show()
An example showing the fully zoomed out plot
An example showing the plot zoomed in.

Plotting time in Python with Matplotlib

I have an array of timestamps in the format (HH:MM:SS.mmmmmm) and another array of floating point numbers, each corresponding to a value in the timestamp array.
Can I plot time on the x axis and the numbers on the y-axis using Matplotlib?
I was trying to, but somehow it was only accepting arrays of floats. How can I get it to plot the time? Do I have to modify the format in any way?
Update:
This answer is outdated since matplotlib version 3.5. The plot function now handles datetime data directly. See https://matplotlib.org/3.5.1/api/_as_gen/matplotlib.pyplot.plot_date.html
The use of plot_date is discouraged. This method exists for historic
reasons and may be deprecated in the future.
datetime-like data should directly be plotted using plot.
If you need to plot plain numeric data as Matplotlib date format or
need to set a timezone, call ax.xaxis.axis_date / ax.yaxis.axis_date
before plot. See Axis.axis_date.
Old, outdated answer:
You must first convert your timestamps to Python datetime objects (use datetime.strptime). Then use date2num to convert the dates to matplotlib format.
Plot the dates and values using plot_date:
import matplotlib.pyplot
import matplotlib.dates
from datetime import datetime
x_values = [datetime(2021, 11, 18, 12), datetime(2021, 11, 18, 14), datetime(2021, 11, 18, 16)]
y_values = [1.0, 3.0, 2.0]
dates = matplotlib.dates.date2num(x_values)
matplotlib.pyplot.plot_date(dates, y_values)
You can also plot the timestamp, value pairs using pyplot.plot (after parsing them from their string representation). (Tested with matplotlib versions 1.2.0 and 1.3.1.)
Example:
import datetime
import random
import matplotlib.pyplot as plt
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.plot(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
Resulting image:
Here's the same as a scatter plot:
import datetime
import random
import matplotlib.pyplot as plt
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.scatter(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
Produces an image similar to this:
7 years later and this code has helped me.
However, my times still were not showing up correctly.
Using Matplotlib 2.0.0 and I had to add the following bit of code from Editing the date formatting of x-axis tick labels in matplotlib by Paul H.
import matplotlib.dates as mdates
myFmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(myFmt)
I changed the format to (%H:%M) and the time displayed correctly.
All thanks to the community.
I had trouble with this using matplotlib version: 2.0.2. Running the example from above I got a centered stacked set of bubbles.
I "fixed" the problem by adding another line:
plt.plot([],[])
The entire code snippet becomes:
import datetime
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(minutes=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.plot([],[])
plt.scatter(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
myFmt = mdates.DateFormatter('%H:%M')
plt.gca().xaxis.set_major_formatter(myFmt)
plt.show()
plt.close()
This produces an image with the bubbles distributed as desired.
Pandas dataframes haven't been mentioned yet. I wanted to show how these solved my datetime problem. I have datetime to the milisecond 2021-04-01 16:05:37. I am pulling linux/haproxy throughput from /proc so I can really format it however I like. This is nice for feeding data into a live graph animation.
Here's a look at the csv. (Ignore the packets per second column I'm using that in another graph)
head -2 ~/data
date,mbps,pps
2021-04-01 16:05:37,113,9342.00
...
By using print(dataframe.dtype) I can see how the data was read in:
(base) ➜ graphs ./throughput.py
date object
mbps int64
pps float64
dtype: object
Pandas pulls the date string in as "object", which is just type char. Using this as-is in a script:
import matplotlib.pyplot as plt
import pandas as pd
dataframe = pd.read_csv("~/data")
dates = dataframe["date"]
mbps = dataframe["mbps"]
plt.plot(dates, mbps, label="mbps")
plt.title("throughput")
plt.xlabel("time")
plt.ylabel("mbps")
plt.legend()
plt.xticks(rotation=45)
plt.show()
Matplotlib renders all the milisecond time data. I've added plt.xticks(rotation=45) to tilt the dates but it's not what I want. I can convert the date "object" to a datetime64[ns]. Which matplotlib does know how to render.
dataframe["date"] = pd.to_datetime(dataframe["date"])
This time my date is type datetime64[ns]
(base) ➜ graphs ./throughput.py
date datetime64[ns]
mbps int64
pps float64
dtype: object
Same script with 1 line difference.
#!/usr/bin/env python
import matplotlib.pyplot as plt
import pandas as pd
dataframe = pd.read_csv("~/data")
# convert object to datetime64[ns]
dataframe["date"] = pd.to_datetime(dataframe["date"])
dates = dataframe["date"]
mbps = dataframe["mbps"]
plt.plot(dates, mbps, label="mbps")
plt.title("throughput")
plt.xlabel("time")
plt.ylabel("mbps")
plt.legend()
plt.xticks(rotation=45)
plt.show()
This might not have been ideal for your usecase but it might help someone else.

Categories