Plotting from a pandas dataframe with a timedelta index - python

I'm trying to plot some data from a pandas dataframe with a timedelta index and I want to customize the time ticks and labels in my x-axis. This should be simple but it's proving to be a tough job.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
## I have a df similar to this
timestamps = pd.date_range(start="2017-05-08", freq="10T", periods=6*6)
timedeltas = timestamps - pd.to_datetime("2017-05-08")
yy = np.random.random((len(timedeltas),10))
df = pd.DataFrame(data=yy, index=timedeltas) # Ok, this is what I have
## Now I want to plot this but I want detailed control of the plot so I use matplotlib instead of df.plot
fig,axes=plt.subplots()
axes.plot(df.index.values, df.values)
#axes.plot_date(df.index, df.values, '-')
axes.xaxis.set_major_locator(dates.HourLocator(byhour=range(0,24,2)))
axes.xaxis.set_minor_locator(dates.MinuteLocator(byminute=range(0,24*60,10)))
axes.xaxis.set_major_formatter(dates.DateFormatter('%H:%M'))
plt.show()
As you can see, the ticks are not even showing up. How can I add major ticks and labels every two hours and minor ticks every 10 minutes, for example?

Although I don't know exactly what the root issue is, it seems that it is related to the used package versions. When I run your example with an older python distribution (matplotlib 1.5.1, numpy 1.11.1, pandas 0.18.1 and python 2.7.12), then I get a plot without ticks just as you described.
However I can get a plot with the correct ticks
by running the code below with a recent python distribution (matplot 2.0.1, numpy 1.12.1, pandas 0.19.1 and python 3.6.1).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as dates
timestamps = pd.date_range(start="2017-05-08", freq="10T", periods=6*6)
timedeltas = timestamps - pd.to_datetime("2017-05-08")
yy = np.random.random((len(timedeltas),10))
df = pd.DataFrame(data=yy, index=timedeltas)
fig,axes=plt.subplots()
axes.plot_date(df.index, df.values, '-')
axes.xaxis.set_major_locator(dates.HourLocator(byhour=range(0,24,2)))
axes.xaxis.set_minor_locator(dates.MinuteLocator(byminute=range(0,24*60,10)))
axes.xaxis.set_major_formatter(dates.DateFormatter('%H:%M'))
plt.show()

Related

How to combine bar and line plots with x-axis as datetime in matplotlib

I have a dataFrame with datetimeIndex and two columns with int values. I would like to plot on the same graph Col1 as a bar plot, and Col2 as a line plot.
Important feature is to have correctly labeled x-axis as datetime, also when zooming in-out. I think solutions with DateFormatter would not work, since I want a dynamic xtick labeling.
import matplotlib.pyplot as plt
import pandas as pd
import datetime as dt
import numpy as np
startDate = dt.datetime(2018,1,1,0,0)
nrHours = 144
datetimeIndex = [startDate + dt.timedelta(hours=x) for x in range(0,nrHours)]
dF = pd.DataFrame(index=datetimeIndex)
dF['Col1'] = np.random.randint(1,3,nrHours)
dF['Col2'] = np.random.randint(3,6,nrHours)
axes = dF[['Col1']].plot(kind='bar')
dF[['Col2']].plot(ax=axes)
What seemed to be a simple task turns out being very challenging. Actually, after extensive search on the net, I still haven't found any clean solutions.
I have tried to use both pandas plot and matplotlib.
The main issue arises from the bar plot that seems to have difficulties handling datetime index (prefers integers, in some cases it plot dates but in Epoch 1970-1-1 style which is equivalent to 0).
I finally found a way using mdates and date2num. The solution is not very clean but provides an efficient solution to:
Combine bar and line plot on same graph
Using datetime on x-axis
Correctly and dynamically displaying x-ticks time labels (also when zooming in and out)
Working example :
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
import datetime as dt
import numpy as np
startDate = dt.datetime(2018,1,1,0,0)
nrHours = 144
datetimeIndex = [startDate + dt.timedelta(hours=x) for x in range(0, nrHours)]
dF = pd.DataFrame(index=datetimeIndex)
dF['Col1'] = np.random.randint(1,3,nrHours)
dF['Col2'] = np.random.randint(3,6,nrHours)
fig,axes = plt.subplots()
axes.xaxis_date()
axes.plot(mdates.date2num(list(dF.index)),dF['Col2'])
axes.bar(mdates.date2num(list(dF.index)),dF['Col1'],align='center',width=0.02)
fig.autofmt_xdate()
Sample output:

Matplotlib - show x axis values - dates [duplicate]

Compare the following code:
test = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
test['date'] = pd.to_datetime(test['date'])
test = test.set_index('date')
ax = test.plot()
I added DateFormatter in the end:
test = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
test['date'] = pd.to_datetime(test['date'])
test = test.set_index('date')
ax = test.plot()
ax.xaxis.set_minor_formatter(dates.DateFormatter('%d\n\n%a')) ## Added this line
The issue with the second graph is that it starts on 5-24 instead 5-25. Also, 5-25 of 2017 is Thursday not Monday. What is causing the issue? Is this timezone related? (I don't understand why the date numbers are stacked on top of each other either)
In general the datetime utilities of pandas and matplotlib are incompatible. So trying to use a matplotlib.dates object on a date axis created with pandas will in most cases fail.
One reason is e.g. seen from the documentation
datetime objects are converted to floating point numbers which represent time in days since 0001-01-01 UTC, plus 1. For example, 0001-01-01, 06:00 is 1.25, not 0.25.
However, this is not the only difference and it is thus advisable not to mix pandas and matplotlib when it comes to datetime objects.
There is however the option to tell pandas not to use its own datetime format. In that case using the matplotlib.dates tickers is possible. This can be steered via.
df.plot(x_compat=True)
Since pandas does not provide sophisticated formatting capabilities for dates, one can use matplotlib for plotting and formatting.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
df = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
df['date'] = pd.to_datetime(df['date'])
usePandas=True
#Either use pandas
if usePandas:
df = df.set_index('date')
df.plot(x_compat=True)
plt.gca().xaxis.set_major_locator(dates.DayLocator())
plt.gca().xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
plt.gca().invert_xaxis()
plt.gcf().autofmt_xdate(rotation=0, ha="center")
# or use matplotlib
else:
plt.plot(df["date"], df["ratio1"])
plt.gca().xaxis.set_major_locator(dates.DayLocator())
plt.gca().xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
plt.gca().invert_xaxis()
plt.show()
Updated using the matplotlib object oriented API
usePandas=True
#Either use pandas
if usePandas:
df = df.set_index('date')
ax = df.plot(x_compat=True, figsize=(6, 4))
ax.xaxis.set_major_locator(dates.DayLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
ax.invert_xaxis()
ax.get_figure().autofmt_xdate(rotation=0, ha="center")
# or use matplotlib
else:
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot('date', 'ratio1', data=df)
ax.xaxis.set_major_locator(dates.DayLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
fig.invert_xaxis()
plt.show()

TimeSeries in Bokeh using a dataframe with index

I'm trying to use Bokeh to plot a Pandas dataframe with a DateTime column containing years and a numeric one. If the DateTime is specified as x, the behaviour is the expected (years in the x-axis). However, if I use set_index to turn the DateTime column into the index of the dataframe and then only specify the y in the TimeSeries I get time in milliseconds in the x-axis. A minimal example
import pandas as pd
import numpy as np
from bokeh.charts import TimeSeries, output_file, show
output_file('fig.html')
test = pd.DataFrame({'datetime':pd.date_range('1/1/1880', periods=2000),'foo':np.arange(2000)})
fig = TimeSeries(test,x='datetime',y='foo')
show(fig)
output_file('fig2.html')
test = test.set_index('datetime')
fig2 = TimeSeries(test,y='foo')
show(fig2)
Is this the expected behaviour or a bug? I would expect the same picture with both approaches.
Cheers!!
Bokeh used to add an index for internal reasons but as of not-so-recent versions (>= 0.12.x) it no longer does this. Also it's worth noting that the bokeh.charts API has been deprecated and removed. The equivalent code using the stable bokeh.plotting API yields the expected result:
import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_file, show
from bokeh.layouts import row
output_file('fig.html')
test = pd.DataFrame({'datetime':pd.date_range('1/1/1880', periods=2000),'foo':np.arange(2000)})
fig = figure(x_axis_type="datetime")
fig.line(x='datetime',y='foo', source=test)
test = test.set_index('datetime')
fig2 = figure(x_axis_type="datetime")
fig2.line(x='datetime', y='foo', source=test)
show(row(fig, fig2))

Time-series boxplot in pandas

How can I create a boxplot for a pandas time-series where I have a box for each day?
Sample dataset of hourly data where one box should consist of 24 values:
import pandas as pd
n = 480
ts = pd.Series(randn(n),
index=pd.date_range(start="2014-02-01",
periods=n,
freq="H"))
ts.plot()
I am aware that I could make an extra column for the day, but I would like to have proper x-axis labeling and x-limit functionality (like in ts.plot()), so being able to work with the datetime index would be great.
There is a similar question for R/ggplot2 here, if it helps to clarify what I want.
If its an option for you, i would recommend using Seaborn, which is a wrapper for Matplotlib. You could do it yourself by looping over the groups from your timeseries, but that's much more work.
import pandas as pd
import numpy as np
import seaborn
import matplotlib.pyplot as plt
n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
fig, ax = plt.subplots(figsize=(12,5))
seaborn.boxplot(ts.index.dayofyear, ts, ax=ax)
Which gives:
Note that i'm passing the day of year as the grouper to seaborn, if your data spans multiple years this wouldn't work. You could then consider something like:
ts.index.to_series().apply(lambda x: x.strftime('%Y%m%d'))
Edit, for 3-hourly you could use this as a grouper, but it only works if there are no minutes or lower defined. :
[(dt - datetime.timedelta(hours=int(dt.hour % 3))).strftime('%Y%m%d%H') for dt in ts.index]
(Not enough rep to comment on accepted solution, so adding an answer instead.)
The accepted code has two small errors: (1) need to add numpy import and (2) nned to swap the x and y parameters in the boxplot statement. The following produces the plot shown.
import numpy as np
import pandas as pd
import seaborn
import matplotlib.pyplot as plt
n = 480
ts = pd.Series(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
fig, ax = plt.subplots(figsize=(12,5))
seaborn.boxplot(ts.index.dayofyear, ts, ax=ax)
I have a solution that may be helpful-- It only uses native pandas and allows for hierarchical date-time grouping (i.e spanning years). The key is that if you pass a function to groupby(), it will be called on each element of the dataframe's index. If your index is a DatetimeIndex (or similar), you can access all of the dt's convenience functions for resampling!
Try this:
n = 480
ts = pd.DataFrame(np.random.randn(n), index=pd.date_range(start="2014-02-01", periods=n, freq="H"))
ts.groupby(lambda x: x.strftime("%Y-%m-%d")).boxplot(subplots=False, figsize=(12,9), rot=90)

Plotting time in Python with Matplotlib

I have an array of timestamps in the format (HH:MM:SS.mmmmmm) and another array of floating point numbers, each corresponding to a value in the timestamp array.
Can I plot time on the x axis and the numbers on the y-axis using Matplotlib?
I was trying to, but somehow it was only accepting arrays of floats. How can I get it to plot the time? Do I have to modify the format in any way?
Update:
This answer is outdated since matplotlib version 3.5. The plot function now handles datetime data directly. See https://matplotlib.org/3.5.1/api/_as_gen/matplotlib.pyplot.plot_date.html
The use of plot_date is discouraged. This method exists for historic
reasons and may be deprecated in the future.
datetime-like data should directly be plotted using plot.
If you need to plot plain numeric data as Matplotlib date format or
need to set a timezone, call ax.xaxis.axis_date / ax.yaxis.axis_date
before plot. See Axis.axis_date.
Old, outdated answer:
You must first convert your timestamps to Python datetime objects (use datetime.strptime). Then use date2num to convert the dates to matplotlib format.
Plot the dates and values using plot_date:
import matplotlib.pyplot
import matplotlib.dates
from datetime import datetime
x_values = [datetime(2021, 11, 18, 12), datetime(2021, 11, 18, 14), datetime(2021, 11, 18, 16)]
y_values = [1.0, 3.0, 2.0]
dates = matplotlib.dates.date2num(x_values)
matplotlib.pyplot.plot_date(dates, y_values)
You can also plot the timestamp, value pairs using pyplot.plot (after parsing them from their string representation). (Tested with matplotlib versions 1.2.0 and 1.3.1.)
Example:
import datetime
import random
import matplotlib.pyplot as plt
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.plot(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
Resulting image:
Here's the same as a scatter plot:
import datetime
import random
import matplotlib.pyplot as plt
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.scatter(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
Produces an image similar to this:
7 years later and this code has helped me.
However, my times still were not showing up correctly.
Using Matplotlib 2.0.0 and I had to add the following bit of code from Editing the date formatting of x-axis tick labels in matplotlib by Paul H.
import matplotlib.dates as mdates
myFmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(myFmt)
I changed the format to (%H:%M) and the time displayed correctly.
All thanks to the community.
I had trouble with this using matplotlib version: 2.0.2. Running the example from above I got a centered stacked set of bubbles.
I "fixed" the problem by adding another line:
plt.plot([],[])
The entire code snippet becomes:
import datetime
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(minutes=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.plot([],[])
plt.scatter(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
myFmt = mdates.DateFormatter('%H:%M')
plt.gca().xaxis.set_major_formatter(myFmt)
plt.show()
plt.close()
This produces an image with the bubbles distributed as desired.
Pandas dataframes haven't been mentioned yet. I wanted to show how these solved my datetime problem. I have datetime to the milisecond 2021-04-01 16:05:37. I am pulling linux/haproxy throughput from /proc so I can really format it however I like. This is nice for feeding data into a live graph animation.
Here's a look at the csv. (Ignore the packets per second column I'm using that in another graph)
head -2 ~/data
date,mbps,pps
2021-04-01 16:05:37,113,9342.00
...
By using print(dataframe.dtype) I can see how the data was read in:
(base) ➜ graphs ./throughput.py
date object
mbps int64
pps float64
dtype: object
Pandas pulls the date string in as "object", which is just type char. Using this as-is in a script:
import matplotlib.pyplot as plt
import pandas as pd
dataframe = pd.read_csv("~/data")
dates = dataframe["date"]
mbps = dataframe["mbps"]
plt.plot(dates, mbps, label="mbps")
plt.title("throughput")
plt.xlabel("time")
plt.ylabel("mbps")
plt.legend()
plt.xticks(rotation=45)
plt.show()
Matplotlib renders all the milisecond time data. I've added plt.xticks(rotation=45) to tilt the dates but it's not what I want. I can convert the date "object" to a datetime64[ns]. Which matplotlib does know how to render.
dataframe["date"] = pd.to_datetime(dataframe["date"])
This time my date is type datetime64[ns]
(base) ➜ graphs ./throughput.py
date datetime64[ns]
mbps int64
pps float64
dtype: object
Same script with 1 line difference.
#!/usr/bin/env python
import matplotlib.pyplot as plt
import pandas as pd
dataframe = pd.read_csv("~/data")
# convert object to datetime64[ns]
dataframe["date"] = pd.to_datetime(dataframe["date"])
dates = dataframe["date"]
mbps = dataframe["mbps"]
plt.plot(dates, mbps, label="mbps")
plt.title("throughput")
plt.xlabel("time")
plt.ylabel("mbps")
plt.legend()
plt.xticks(rotation=45)
plt.show()
This might not have been ideal for your usecase but it might help someone else.

Categories