How to plot time series in Python

I have been trying to plot a time series graph from a CSV file. I have managed to read the file, convert the data from strings to dates using strptime, and store them in a list. When I tried a test plot in matplotlib with the list containing the date information, it plotted the date as a series of dots; that is, for the date 2012-may-31 19:00, I got a plot with dots at 2012, 05, 19, 31, 00 on the y axis for x=1, and so on. I understand that this is not the correct way of passing date information for plotting. Can someone tell me how to pass this information correctly?

Convert your x-axis data from text to datetime.datetime using datetime.strptime:
>>> from datetime import datetime
>>> datetime.strptime("2012-may-31 19:00", "%Y-%b-%d %H:%M")
datetime.datetime(2012, 5, 31, 19, 0)
This is an example of how to plot data once you have an array of datetimes:
import matplotlib.pyplot as plt
import datetime
import numpy as np
x = np.array([datetime.datetime(2013, 9, 28, i, 0) for i in range(24)])
y = np.random.randint(100, size=x.shape)
plt.plot(x,y)
plt.show()
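If the timestamps start out as text in a CSV file, as in the question, the parsing step can be sketched with the standard library alone. The sample rows, the two-column layout, and the names SAMPLE_CSV and read_series are assumptions made for illustration; the format string matches the 2012-may-31 19:00 style used above, and the month names assume an English (or C) locale.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample data; a real script would open the CSV file instead.
SAMPLE_CSV = """2012-may-31 19:00,4.2
2012-may-31 20:00,5.1
2012-jun-01 07:30,3.8
"""

def read_series(handle, fmt="%Y-%b-%d %H:%M"):
    """Return (datetimes, values) parsed from two-column CSV rows."""
    times, values = [], []
    for stamp, value in csv.reader(handle):
        times.append(datetime.strptime(stamp, fmt))  # text -> datetime
        values.append(float(value))
    return times, values

times, values = read_series(io.StringIO(SAMPLE_CSV))
print(times[0], values[0])
```

The two lists can then be passed to plt.plot(times, values) in the same way as the arrays x and y above.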

Related

Nested dictionary to diagram [duplicate]

I have an index array (x) of dates (datetime objects) and an array of actual values (y: bond prices). Doing (in iPython):
plot(x,y)
Produces a perfectly fine time series graph with the x axis labeled with the dates. No problem so far. But I want to add text on certain dates. For example, at 2009-10-31 I wish to display the text "Event 1" with an arrow pointing to the y value at that date.
I have read through the Matplotlib documentation on text() and annotate() to no avail. It only covers standard numbered x-axes, and I can't infer how to apply those examples to my problem.
Thank you
Matplotlib uses an internal floating point format for dates.
You just need to convert your date to that format (using matplotlib.dates.date2num or matplotlib.dates.datestr2num) and then use annotate as usual.
As a somewhat excessively fancy example:
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
x = [dt.datetime(2009, 5, 1), dt.datetime(2010, 6, 1),
     dt.datetime(2011, 4, 1), dt.datetime(2012, 6, 1)]
y = [1, 3, 2, 5]
fig, ax = plt.subplots()
ax.plot(x, y, marker='o', linestyle='--')  # plot accepts datetimes directly; plot_date is discouraged
ax.annotate('Test', (mdates.date2num(x[1]), y[1]), xytext=(15, 15),
            textcoords='offset points', arrowprops=dict(arrowstyle='-|>'))
fig.autofmt_xdate()
plt.show()
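Applied to the situation in the question (a label "Event 1" with an arrow at 2009-10-31), the same idea could look like the sketch below. The prices are made up, and annotate_event is a hypothetical helper name, not part of Matplotlib; only date2num and annotate are library calls.

```python
import datetime as dt

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Made-up bond prices for illustration only.
x = [dt.datetime(2009, 10, day) for day in (29, 30, 31)]
y = [101.2, 100.7, 99.9]

def annotate_event(ax, when, value, label):
    """Draw `label` with an arrow pointing at (when, value) on a date axis."""
    return ax.annotate(
        label,
        xy=(mdates.date2num(when), value),  # date converted to Matplotlib's float format
        xytext=(-60, 30),                   # label offset from the point, in points
        textcoords="offset points",
        arrowprops=dict(arrowstyle="->"),
    )

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
note = annotate_event(ax, x[2], y[2], "Event 1")
fig.autofmt_xdate()
# Call plt.show() to display the figure interactively.
```

The offset in xytext is an arbitrary choice; it only controls where the text sits relative to the annotated point.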

Pandas Plotting Display all date values on x-axis (matplotlib only displays few values) formatted as MMM-YYYY [duplicate]

import os
import pandas as pd
import matplotlib.pyplot as plt
import datetime
df = pd.read_excel(DATA_DIR+"/"+file_list[0], index_col="Date")
df.head(5)
smooth = df['Pur. Rate'].rolling(window=20).mean()
smooth.plot()
I get the following graph and need to plot all the date values for every MONTH-YEAR on the x-axis.
I want to display all the months and years formatted diagonally on the x-axis in the format (Feb-19). I can make the size of the plot larger to fit all as I will save it as jpg.
I want the x-axis to have the following values:
Jan 16, Feb 16, Mar 16, Apr 16, May 16, Jun 16, Jul 16, Aug 16, Sep 16, Oct 16, Nov 16, Dec 16, Jan 17, Feb 17 …
(I want to display all these values, matplotlib automatically truncates this, I want to avoid that)
As mentioned in the comments, you have to set both the Locator and the Formatter. This is explained well in the matplotlib documentation for graphs in general and, separately, for datetime axes. See also the explanation of the tick locators. The formatting codes are derived from Python's strftime() and strptime() format codes.
from matplotlib import pyplot as plt
import pandas as pd
from matplotlib.dates import MonthLocator, DateFormatter
#fake data
import numpy as np
np.random.seed(123)
n = 100
df = pd.DataFrame({
    "Dates": pd.date_range("20180101", periods=n, freq="10D"),
    "A": np.random.randint(0, 100, size=n),
    "B": np.random.randint(0, 100, size=n),
})
df.set_index("Dates", inplace=True)
print(df)
ax = df.plot()
#defines the tick location
ax.xaxis.set_major_locator(MonthLocator())
#defines the label format
ax.xaxis.set_major_formatter(DateFormatter("%b-%y"))
ax.tick_params(axis="x", labelrotation= 90)
plt.tight_layout()
plt.show()
Sample output: (figure not reproduced here)
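The format codes passed to DateFormatter are the same ones strftime() uses, so they can be tried out without any plotting. A small sketch, assuming an English (or C) locale for the month names; the sample timestamp is made up:

```python
from datetime import datetime

stamp = datetime(2019, 2, 14, 9, 5)

# %b = abbreviated month name, %y = two-digit year, %Y = four-digit year
print(stamp.strftime("%b-%y"))   # month and short year
print(stamp.strftime("%b %Y"))   # month and full year
print(stamp.strftime("%H:%M"))   # hours and minutes only

# strptime reads the same codes in the other direction
parsed = datetime.strptime("Feb-19", "%b-%y")
print(parsed.year, parsed.month)
```

Checking a format string this way first makes it easier to tell a formatting mistake apart from a tick-placement problem.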
With just pandas functions, you can use strftime() to convert your date index from the '%Y-%m-%d' form to '%b-%Y', together with some parameters of plot.
smoothdf.plot(xticks=smoothdf.index.strftime('%m-%Y').unique()).set_xticklabels(smoothdf.index.strftime('%b-%Y').unique())
xticks specifies which labels you absolutely want to see.
set_xticklabels modifies the list of labels.
I suggest you use matplotlib and not pandas plot, and do something like this to plot the dates in the format you specified:
import matplotlib.dates as mdates
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
myFmt = mdates.DateFormatter('%b-%Y') # date formatter for matplotlib
# %b is Month abbreviated name, %Y is the Year
# ... after some code
fig, ax = plt.subplots(figsize=(15,8))
ax.xaxis.set_major_formatter(myFmt)
fig.autofmt_xdate()
# Plot data ...
ax.set_xticks("""... define how often to show the date""")
You can get the data out of the data frame with something like .to_numpy() or the .values attribute.
Refer to this documentation for the set_xticks function.
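One possible way to fill in the set_xticks call is to place a tick at the first day of every month. This is only a sketch under assumptions: the data is fake, the names series, smooth and month_starts are invented for the example, and pd.date_range with freq="MS" (month start) is just one way to generate monthly positions.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fake daily data standing in for the 'Pur. Rate' column.
rng = np.random.default_rng(0)
index = pd.date_range("2016-01-01", "2017-02-28", freq="D")
series = pd.Series(rng.normal(size=len(index)).cumsum(), index=index)
smooth = series.rolling(window=20).mean()

# One tick at the first day of every month covered by the data.
month_starts = pd.date_range(index[0], index[-1], freq="MS")
tick_positions = mdates.date2num(month_starts.to_pydatetime())

fig, ax = plt.subplots(figsize=(15, 8))
ax.plot(smooth.index.to_pydatetime(), smooth.to_numpy())
ax.set_xticks(tick_positions)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b-%y"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
# Call plt.show() or fig.savefig(...) to view the result.
```

Because the tick positions are fixed explicitly, every month gets a label regardless of how wide the figure is, so the figure size may need to grow with the date range.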

How to format Pandas / Matplotlib graph so the x-axis ticks are ONLY hours and minutes?

I am trying to plot temperature with respect to time data from a csv file.
My goal is to have a graph which shows the temperature data per day.
My problem is the x-axis: I would like the ticks to be uniformly spaced and to show only hours and minutes at 15-minute intervals, for example: 00:00, 00:15, 00:30.
The csv is loaded into a pandas dataframe, where I filter the data to be shown based on what day it is, in the code I want only temperature data for 18th day of the month.
Here is the csv data that I am loading in:
date,temp,humid
2020-10-17 23:50:02,20.57,87.5
2020-10-17 23:55:02,20.57,87.5
2020-10-18 00:00:02,20.55,87.31
2020-10-18 00:05:02,20.54,87.17
2020-10-18 00:10:02,20.54,87.16
2020-10-18 00:15:02,20.52,87.22
2020-10-18 00:20:02,20.5,87.24
2020-10-18 00:25:02,20.5,87.24
here is the python code to make the graph:
import pandas as pd
import datetime
import matplotlib.pyplot as plt
df = pd.read_csv("saveData2020.csv")
#make new columns in dataframe so data can be filtered
df["New_Date"] = pd.to_datetime(df["date"]).dt.date
df["New_Time"] = pd.to_datetime(df["date"]).dt.time
df["New_hrs"] = pd.to_datetime(df["date"]).dt.hour
df["New_mins"] = pd.to_datetime(df["date"]).dt.minute
df["day"] = pd.DatetimeIndex(df['New_Date']).day
#filter the data to be only day 18
ndf = df[df["day"]==18]
#display dataframe in console
pd.set_option('display.max_rows', ndf.shape[0]+1)
print(ndf.head(10))
#plot a graph
ndf.plot(kind='line',x='New_Time',y='temp',color='red')
#edit graph to be sexy
plt.setp(plt.gca().xaxis.get_majorticklabels(),'rotation', 30)
plt.xlabel("time")
plt.ylabel("temp in C")
#show graph with the sexiness edits
plt.show()
here is the graph I get:
Answer
First of all, you have to convert "New_Time" (your x axis) from datetime.time objects to datetime type; going through their string form is a reliable way to do that:
ndf["New_Time"] = pd.to_datetime(ndf["New_Time"].astype(str), format = "%H:%M:%S")
Then you can simply add this line of code before showing the plot (and import the proper matplotlib library, matplotlib.dates as md) to tell matplotlib you want only hours and minutes:
plt.gca().xaxis.set_major_formatter(md.DateFormatter('%H:%M'))
And this line of code to fix the 15 minutes span for the ticks:
plt.gca().xaxis.set_major_locator(md.MinuteLocator(byminute = [0, 15, 30, 45]))
For more info on x axis time formatting you can check this answer.
Code
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as md
df = pd.read_csv("saveData2020.csv")
#make new columns in dataframe so data can be filtered
df["New_Date"] = pd.to_datetime(df["date"]).dt.date
df["New_Time"] = pd.to_datetime(df["date"]).dt.time
df["New_hrs"] = pd.to_datetime(df["date"]).dt.hour
df["New_mins"] = pd.to_datetime(df["date"]).dt.minute
df["day"] = pd.DatetimeIndex(df['New_Date']).day
#filter the data to be only day 18
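#copy the filtered rows so the column assignment below does not write to a view of df
ndf = df[df["day"]==18].copy()
ndf["New_Time"] = pd.to_datetime(ndf["New_Time"].astype(str), format = "%H:%M:%S")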
#display dataframe in console
pd.set_option('display.max_rows', ndf.shape[0]+1)
print(ndf.head(10))
#plot a graph
ndf.plot(kind='line',x='New_Time',y='temp',color='red')
#edit graph to be sexy
plt.setp(plt.gca().xaxis.get_majorticklabels(),'rotation', 30)
plt.xlabel("time")
plt.ylabel("temp in C")
plt.gca().xaxis.set_major_locator(md.MinuteLocator(byminute = [0, 15, 30, 45]))
plt.gca().xaxis.set_major_formatter(md.DateFormatter('%H:%M'))
#show graph with the sexiness edits
plt.show()
Plot
Notes
If you do not need the "New_Date", "New_Time", "New_hrs", "New_mins" and "day" columns for anything other than plotting, you can use a shorter version of the above code that drops those columns and applies the day filter directly to the "date" column:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md
df = pd.read_csv("saveData2020.csv")
# convert date from string to datetime
df["date"] = pd.to_datetime(df["date"], format = "%Y-%m-%d %H:%M:%S")
#filter the data to be only day 18
ndf = df[df["date"].dt.day == 18]
#display dataframe in console
pd.set_option('display.max_rows', ndf.shape[0]+1)
print(ndf.head(10))
#plot a graph
ndf.plot(kind='line',x='date',y='temp',color='red')
#edit graph to be sexy
plt.setp(plt.gca().xaxis.get_majorticklabels(),'rotation', 30)
plt.xlabel("time")
plt.ylabel("temp in C")
plt.gca().xaxis.set_major_locator(md.MinuteLocator(byminute = [0, 15, 30, 45]))
plt.gca().xaxis.set_major_formatter(md.DateFormatter('%H:%M'))
#show graph with the sexiness edits
plt.show()
This code will reproduce exactly the same plot as before.
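For trying this out without the CSV file, the sample rows from the question can be inlined. The sketch below is an assumption-laden variant, not the code above: it reads the text through io.StringIO, lets read_csv parse the dates, and calls ax.plot instead of DataFrame.plot so that the Matplotlib locator and formatter act on a plain Matplotlib date axis. The names CSV_TEXT and day18 are made up for the example.

```python
import io

import matplotlib.dates as md
import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question, inlined so no file is needed.
CSV_TEXT = """date,temp,humid
2020-10-17 23:50:02,20.57,87.5
2020-10-17 23:55:02,20.57,87.5
2020-10-18 00:00:02,20.55,87.31
2020-10-18 00:05:02,20.54,87.17
2020-10-18 00:10:02,20.54,87.16
2020-10-18 00:15:02,20.52,87.22
2020-10-18 00:20:02,20.5,87.24
2020-10-18 00:25:02,20.5,87.24
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])
day18 = df[df["date"].dt.day == 18]  # keep only readings from the 18th

fig, ax = plt.subplots()
ax.plot(day18["date"].to_numpy(), day18["temp"].to_numpy(), color="red")
ax.xaxis.set_major_locator(md.MinuteLocator(byminute=[0, 15, 30, 45]))
ax.xaxis.set_major_formatter(md.DateFormatter("%H:%M"))
ax.set_xlabel("time")
ax.set_ylabel("temp in C")
# Call plt.show() to display the figure interactively.
```

With only half an hour of data the axis shows few ticks; with a full day of readings the 15-minute locator produces one label per quarter hour.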

Plotting dates with matplotlib

I've been attempting to plot data from a comma delimited csv file which contains a date and a float:
Date,Price (€)
01062017,20.90
02062017,30.90
03062017,40.90
04062017,60.90
05062017,50.90
I then attempt to plot this with the following code:
import matplotlib.pyplot as plt
import numpy as np
import datetime
dates,cost = np.loadtxt('price_check.csv',delimiter=',',skiprows=1,unpack=True)
xdates = [datetime.datetime.strptime(str(int(date)),'%d%m%Y') for date in dates]
fig = plt.figure()
ax = plt.subplot(111)
plt.plot(xdates, cost,'o-',label='Cost')
plt.legend(loc=4)
plt.ylabel('Price (Euro)')
plt.xlabel('date')
plt.gcf().autofmt_xdate()
plt.grid()
plt.savefig('sunglasses_cost.png')
plt.show()
However, when the data is plotted, it looks like the leading zero in the date string is being dropped:
Is there an easy way for the full date to be used in the plot?
The problem is the dates, which are converted to integers and lose their leading zero. "01062017" then becomes 1062017 and is interpreted as (2017, 6, 10, 0, 0), that is, two digits for the day and one digit for the month. For 5062017, because there is no 50th of June, the string is split differently and correctly interpreted as (2017, 6, 5, 0, 0).
The least invasive way to overcome this is to format the string so that it always has 8 digits before the datetime conversion:
xdates = [datetime.datetime.strptime('{:08}'.format(int(date)),'%d%m%Y') for date in dates]
This will then result in the correct plot. However, the x tick labels may be displayed inconveniently. This can be adjusted by choosing a locator and a formatter:
import matplotlib.dates as mdates
plt.gca().xaxis.set_major_locator(mdates.DayLocator())
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%d-%m-%Y'))
As a last comment: if you can choose the format of your input file, it might be worth specifying the date in an unambiguous way, e.g. 20170601.
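The misreading described above can be reproduced with the standard library alone. A minimal sketch, using the first date from the question; the variable names are made up:

```python
from datetime import datetime

raw = 1062017  # what "01062017" turns into after int()

# Without padding, the first two digits "10" are taken as the day.
misread = datetime.strptime(str(raw), "%d%m%Y")

# Zero-padding to eight digits restores the intended day and month.
fixed = datetime.strptime("{:08}".format(raw), "%d%m%Y")

print(misread.date(), fixed.date())
```

An alternative that avoids the problem entirely is to read the date column as text rather than as numbers, so the leading zero is never lost.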

Plotting time in Python with Matplotlib

I have an array of timestamps in the format (HH:MM:SS.mmmmmm) and another array of floating point numbers, each corresponding to a value in the timestamp array.
Can I plot time on the x axis and the numbers on the y-axis using Matplotlib?
I was trying to, but somehow it was only accepting arrays of floats. How can I get it to plot the time? Do I have to modify the format in any way?
Update:
This answer is outdated since matplotlib version 3.5. The plot function now handles datetime data directly. See https://matplotlib.org/3.5.1/api/_as_gen/matplotlib.pyplot.plot_date.html
The use of plot_date is discouraged. This method exists for historic
reasons and may be deprecated in the future.
datetime-like data should directly be plotted using plot.
If you need to plot plain numeric data as Matplotlib date format or
need to set a timezone, call ax.xaxis.axis_date / ax.yaxis.axis_date
before plot. See Axis.axis_date.
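A minimal sketch of the direct approach for the HH:MM:SS.mmmmmm format from the question, assuming a Matplotlib version in which plot accepts datetime objects. The sample timestamps and values are made up.

```python
from datetime import datetime

import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Made-up sample data in the HH:MM:SS.mmmmmm format from the question.
stamps = ["12:00:00.500000", "12:00:01.250000", "12:00:02.750000"]
values = [1.0, 3.0, 2.0]

# strptime fills in a default date (1900-01-01) when only a time is given.
times = [datetime.strptime(s, "%H:%M:%S.%f") for s in stamps]

fig, ax = plt.subplots()
ax.plot(times, values)  # datetime objects are passed to plot directly
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
fig.autofmt_xdate()
# Call plt.show() to display the figure interactively.
```

Since the labels show only the time of day, the placeholder date that strptime adds does not appear on the axis.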
Old, outdated answer:
You must first convert your timestamps to Python datetime objects (use datetime.strptime). Then use date2num to convert the dates to matplotlib format.
Plot the dates and values using plot_date:
import matplotlib.pyplot
import matplotlib.dates
from datetime import datetime
x_values = [datetime(2021, 11, 18, 12), datetime(2021, 11, 18, 14), datetime(2021, 11, 18, 16)]
y_values = [1.0, 3.0, 2.0]
dates = matplotlib.dates.date2num(x_values)
matplotlib.pyplot.plot_date(dates, y_values)
You can also plot the timestamp, value pairs using pyplot.plot (after parsing them from their string representation). (Tested with matplotlib versions 1.2.0 and 1.3.1.)
Example:
import datetime
import random
import matplotlib.pyplot as plt
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.plot(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
Resulting image:
Here's the same as a scatter plot:
import datetime
import random
import matplotlib.pyplot as plt
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.scatter(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
Produces an image similar to this:
7 years later and this code has helped me.
However, my times still were not showing up correctly.
Using Matplotlib 2.0.0 and I had to add the following bit of code from Editing the date formatting of x-axis tick labels in matplotlib by Paul H.
import matplotlib.dates as mdates
myFmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(myFmt)
I changed the format to (%H:%M) and the time displayed correctly.
All thanks to the community.
I had trouble with this using matplotlib version: 2.0.2. Running the example from above I got a centered stacked set of bubbles.
I "fixed" the problem by adding another line:
plt.plot([],[])
The entire code snippet becomes:
import datetime
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(minutes=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.plot([],[])
plt.scatter(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
myFmt = mdates.DateFormatter('%H:%M')
plt.gca().xaxis.set_major_formatter(myFmt)
plt.show()
plt.close()
This produces an image with the bubbles distributed as desired.
Pandas dataframes haven't been mentioned yet. I wanted to show how these solved my datetime problem. I have datetimes to the millisecond, e.g. 2021-04-01 16:05:37. I am pulling Linux/haproxy throughput from /proc, so I can really format it however I like. This is nice for feeding data into a live graph animation.
Here's a look at the csv. (Ignore the packets per second column I'm using that in another graph)
head -2 ~/data
date,mbps,pps
2021-04-01 16:05:37,113,9342.00
...
By using print(dataframe.dtypes) I can see how the data was read in:
(base) ➜ graphs ./throughput.py
date object
mbps int64
pps float64
dtype: object
Pandas pulls the date string in as "object", which here means plain Python strings rather than dates. Using this as-is in a script:
import matplotlib.pyplot as plt
import pandas as pd
dataframe = pd.read_csv("~/data")
dates = dataframe["date"]
mbps = dataframe["mbps"]
plt.plot(dates, mbps, label="mbps")
plt.title("throughput")
plt.xlabel("time")
plt.ylabel("mbps")
plt.legend()
plt.xticks(rotation=45)
plt.show()
Matplotlib renders every timestamp string as its own tick label. I've added plt.xticks(rotation=45) to tilt the dates, but it's not what I want. I can convert the date "object" to datetime64[ns], which matplotlib does know how to render.
dataframe["date"] = pd.to_datetime(dataframe["date"])
This time my date is type datetime64[ns]
(base) ➜ graphs ./throughput.py
date datetime64[ns]
mbps int64
pps float64
dtype: object
Same script with 1 line difference.
#!/usr/bin/env python
import matplotlib.pyplot as plt
import pandas as pd
dataframe = pd.read_csv("~/data")
# convert object to datetime64[ns]
dataframe["date"] = pd.to_datetime(dataframe["date"])
dates = dataframe["date"]
mbps = dataframe["mbps"]
plt.plot(dates, mbps, label="mbps")
plt.title("throughput")
plt.xlabel("time")
plt.ylabel("mbps")
plt.legend()
plt.xticks(rotation=45)
plt.show()
This might not have been ideal for your use case, but it might help someone else.
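The conversion can also be requested while the file is being read, through the parse_dates parameter of read_csv. A short sketch with two made-up rows in the same layout as the file above; CSV_TEXT and elapsed are names invented for the example, and the exact datetime resolution shown in the dtype may differ between pandas versions.

```python
import io

import pandas as pd

# Two made-up rows in the same layout as the file above.
CSV_TEXT = """date,mbps,pps
2021-04-01 16:05:37,113,9342.00
2021-04-01 16:05:38,118,9511.00
"""

# parse_dates converts the named column while the file is being read.
dataframe = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])
print(dataframe.dtypes)

# Datetime columns expose the .dt accessor, e.g. for elapsed seconds.
elapsed = (dataframe["date"] - dataframe["date"].iloc[0]).dt.total_seconds()
print(elapsed.tolist())
```

Either route ends with a datetime column, so the plotting part of the script stays the same.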
