Plotting time in Python with Matplotlib - python

I have an array of timestamps in the format (HH:MM:SS.mmmmmm) and another array of floating point numbers, each corresponding to a value in the timestamp array.
Can I plot time on the x axis and the numbers on the y-axis using Matplotlib?
I was trying to, but somehow it was only accepting arrays of floats. How can I get it to plot the time? Do I have to modify the format in any way?

Update:
This answer is outdated since matplotlib version 3.5. The plot function now handles datetime data directly. See https://matplotlib.org/3.5.1/api/_as_gen/matplotlib.pyplot.plot_date.html
The use of plot_date is discouraged. This method exists for historic
reasons and may be deprecated in the future.
datetime-like data should directly be plotted using plot.
If you need to plot plain numeric data as Matplotlib date format or
need to set a timezone, call ax.xaxis.axis_date / ax.yaxis.axis_date
before plot. See Axis.axis_date.
Old, outdated answer:
You must first convert your timestamps to Python datetime objects (use datetime.strptime). Then use date2num to convert the dates to matplotlib format.
Plot the dates and values using plot_date:
import matplotlib.pyplot
import matplotlib.dates
from datetime import datetime
x_values = [datetime(2021, 11, 18, 12), datetime(2021, 11, 18, 14), datetime(2021, 11, 18, 16)]
y_values = [1.0, 3.0, 2.0]
dates = matplotlib.dates.date2num(x_values)
matplotlib.pyplot.plot_date(dates, y_values)

You can also plot the timestamp, value pairs using pyplot.plot (after parsing them from their string representation). (Tested with matplotlib versions 1.2.0 and 1.3.1.)
Example:
import datetime
import random
import matplotlib.pyplot as plt
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.plot(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
Resulting image:
Here's the same as a scatter plot:
import datetime
import random
import matplotlib.pyplot as plt
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(hours=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.scatter(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
plt.show()
Produces an image similar to this:

7 years later and this code has helped me.
However, my times still were not showing up correctly.
Using Matplotlib 2.0.0 and I had to add the following bit of code from Editing the date formatting of x-axis tick labels in matplotlib by Paul H.
import matplotlib.dates as mdates
myFmt = mdates.DateFormatter('%d')
ax.xaxis.set_major_formatter(myFmt)
I changed the format to (%H:%M) and the time displayed correctly.
All thanks to the community.

I had trouble with this using matplotlib version: 2.0.2. Running the example from above I got a centered stacked set of bubbles.
I "fixed" the problem by adding another line:
plt.plot([],[])
The entire code snippet becomes:
import datetime
import random
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# make up some data
x = [datetime.datetime.now() + datetime.timedelta(minutes=i) for i in range(12)]
y = [i+random.gauss(0,1) for i,_ in enumerate(x)]
# plot
plt.plot([],[])
plt.scatter(x,y)
# beautify the x-labels
plt.gcf().autofmt_xdate()
myFmt = mdates.DateFormatter('%H:%M')
plt.gca().xaxis.set_major_formatter(myFmt)
plt.show()
plt.close()
This produces an image with the bubbles distributed as desired.

Pandas dataframes haven't been mentioned yet. I wanted to show how these solved my datetime problem. I have datetime to the milisecond 2021-04-01 16:05:37. I am pulling linux/haproxy throughput from /proc so I can really format it however I like. This is nice for feeding data into a live graph animation.
Here's a look at the csv. (Ignore the packets per second column I'm using that in another graph)
head -2 ~/data
date,mbps,pps
2021-04-01 16:05:37,113,9342.00
...
By using print(dataframe.dtype) I can see how the data was read in:
(base) ➜ graphs ./throughput.py
date object
mbps int64
pps float64
dtype: object
Pandas pulls the date string in as "object", which is just type char. Using this as-is in a script:
import matplotlib.pyplot as plt
import pandas as pd
dataframe = pd.read_csv("~/data")
dates = dataframe["date"]
mbps = dataframe["mbps"]
plt.plot(dates, mbps, label="mbps")
plt.title("throughput")
plt.xlabel("time")
plt.ylabel("mbps")
plt.legend()
plt.xticks(rotation=45)
plt.show()
Matplotlib renders all the milisecond time data. I've added plt.xticks(rotation=45) to tilt the dates but it's not what I want. I can convert the date "object" to a datetime64[ns]. Which matplotlib does know how to render.
dataframe["date"] = pd.to_datetime(dataframe["date"])
This time my date is type datetime64[ns]
(base) ➜ graphs ./throughput.py
date datetime64[ns]
mbps int64
pps float64
dtype: object
Same script with 1 line difference.
#!/usr/bin/env python
import matplotlib.pyplot as plt
import pandas as pd
dataframe = pd.read_csv("~/data")
# convert object to datetime64[ns]
dataframe["date"] = pd.to_datetime(dataframe["date"])
dates = dataframe["date"]
mbps = dataframe["mbps"]
plt.plot(dates, mbps, label="mbps")
plt.title("throughput")
plt.xlabel("time")
plt.ylabel("mbps")
plt.legend()
plt.xticks(rotation=45)
plt.show()
This might not have been ideal for your usecase but it might help someone else.

Related

Plot data frame fast and with correct date format

I have the data as in the screenshot, it is in a dataframe format, I would like to plot the dataframe fast and with correct date format.
The code as follow is much fast than using e.g plt.plot(df["Date"], df["D30"])
df.plot(marker='.', linestyle='none')
So that I would like to keep using the dataframe.plot() functionality directly because it is much faster than plot each column against the "Date" column separately. However, as shown in the graph, the date is not correct. My actually starting Date is 2006-01-10, but in the figure, it is shown from 70-01 (1970-01-01).
For me, the official documentation of matplotlib DateFormatter is quite confusing and not so helpful. I tried to google a easy and clear solution, but most answers are related to plt.plot(x, y) where x is Date and y is the actual value. After that it is easy to adjust the format of the "Date" in the figure. But it will make my plot super slow since I am plotting 11 columns in total.
Any idea how I can plot data frame fast and with correct date format
import os
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
date_format = mdates.DateFormatter('%y%m')
df_file = r"C:\Codes\df_file.csv"
df = pd.read_csv(path_file)
print(len(df), df.info(), df["Date"][0], type(df["Date"][0]))
df.head(2)
fig = plt.figure(figsize=(12.0, 8.0))
df.plot(marker='.', linestyle='none')
plt.title("data_frame_show date", fontsize=16)
plt.gca().xaxis.set_major_formatter(dtFmt)
plt.legend(loc=(1.04, 0))
plt.show()
partial input:
Date,D10,D30,D60,D91,D122,D152,D182,D273,D365,D547,D730
2006-01-10,,0.1373444,0.1544265,0.1541397,0.1429375,0.1421464,0.1426055,0.1460771,0.1486266,0.1551848,0.1593932
2006-01-11,,0.135426,0.1411246,0.141093,0.1384091,0.1383636,0.1395791,0.1438944,0.1469191,0.1553112,0.1598582
2006-01-12,,0.1311339,0.1292621,0.1304292,0.1363482,0.1362213,0.1367843,0.1404174,0.1439877,0.152306,0.1568677
2006-01-13,,0.1594458,0.1355387,0.1367246,0.1434708,0.143745,0.1441349,0.1453056,0.1481918,0.157193,0.1607564
2006-01-16,,0.1374846,0.1182223,0.1272385,0.1415359,0.1418881,0.1430098,0.1468544,0.1496407,0.1547714,0.158936
2006-01-17,,0.1453834,0.1418838,0.143198,0.1437924,0.143473,0.1440987,0.1473208,0.1501543,0.1590842,0.1629096
2006-01-18,,0.1385479,0.141472,0.1481763,0.1515037,0.1511353,0.1511544,0.1535245,0.1554254,0.1626349,0.1663554
2006-01-19,,0.1639788,0.1462084,0.1483903,0.1486906,0.1483109,0.1492335,0.1539002,0.1563708,0.1611751,0.1644693
2006-01-20,,0.189771,0.178394,0.1638331,0.1565402,0.1559029,0.1553547,0.1526479,0.1516396,0.1614136,0.1646431
2006-01-23,,0.1420271,0.1570005,0.1614942,0.1607205,0.1605297,0.1630065,0.1653838,0.1642349,0.166809,0.1701779
2006-01-24,,0.1814291,0.1633585,0.1563364,0.1548823,0.15382,0.1545099,0.1590869,0.1609158,0.1653819,0.1681759
2006-01-25,,0.1272998,0.1445222,0.1487031,0.1522032,0.152714,0.1524364,0.1532192,0.1550062,0.1635665,0.1658293
2006-01-26,,0.1392162,0.1413034,0.1443807,0.1476261,0.1482458,0.1473548,0.1471019,0.1493254,0.1578586,0.160699
2006-01-27,,0.1360269,0.1374056,0.1387952,0.1426731,0.1441445,0.144917,0.1462428,0.1478979,0.1519537,0.1550311
2006-01-30,,0.1439245,0.1430108,0.1434628,0.1448731,0.1450397,0.1454756,0.1467621,0.1487521,0.1538424,0.1561802
2006-01-31,,0.1483135,0.1468713,0.1473837,0.1519043,0.1519379,0.1502139,0.1504632,0.1529254,0.1571567,0.1589795
2006-02-01,,0.1464208,0.1447363,0.1443483,0.1459808,0.1477726,0.1505124,0.1520256,0.1535773,0.1589145,0.1607383
2006-02-02,,0.1484249,0.1414394,0.1412338,0.1497531,0.1500731,0.1475751,0.147502,0.1512457,0.1571017,0.1606797
2006-02-03,,0.1496503,0.1485318,0.1502473,0.1565336,0.156727,0.1556335,0.1560396,0.1579241,0.1619183,0.1634751
2006-02-06,,0.149966,0.1457216,0.1475524,0.1539103,0.1546401,0.154973,0.1553681,0.1570598,0.161173,0.1630743
2006-02-08,,0.1463649,0.1436135,0.1454147,0.1498372,0.1507231,0.1520234,0.1538407,0.1563603,0.1617697,0.1639547
2006-02-09,,0.1401312,0.1432856,0.1437166,0.1443243,0.1463163,0.148681,0.1496198,0.1516376,0.1584639,0.1615756
2006-02-10,,0.1339916,0.1405194,0.1432779,0.1464605,0.1470921,0.1484831,0.1514307,0.1550715,0.1599564,0.1623171
2006-02-13,,0.1470304,0.1423007,0.1446087,0.1470668,0.1485171,0.1503383,0.1508497,0.1532987,0.1591155,0.1615874
2006-02-14,,0.1454322,0.1449017,0.1455735,0.1462286,0.1478059,0.1501469,0.1522522,0.1541999,0.157668,0.1601427
2006-02-15,,0.1429312,0.1455881,0.1464055,0.1471812,0.1489883,0.1514654,0.153837,0.1559375,0.16082,0.1631557
2006-02-16,,0.134637,0.1373471,0.140634,0.1432172,0.145788,0.14875,0.1507805,0.15325,0.1581015,0.1613797
2006-02-20,,0.1303785,0.1334454,0.139216,0.1423217,0.1454704,0.1477552,0.1487534,0.1509405,0.1554398,0.1588761
2006-02-21,,0.1359587,0.1370814,0.1416117,0.1418016,0.1441761,0.1468109,0.1476679,0.1496546,0.1561362,0.1607204
2006-02-22,,0.1302253,0.1337104,0.1415016,0.141451,0.1438881,0.1467031,0.1502449,0.1514018,0.1531452,0.1582335
2006-02-23,,0.1282022,0.1333902,0.1342376,0.1385976,0.1453201,0.1481733,0.1490296,0.1512885,0.1554035,0.1593463
2006-02-24,,0.1269229,0.1304391,0.1348061,0.1378378,0.1419301,0.1442134,0.1472283,0.1507224,0.1555662,0.1595938
2006-02-27,,0.1254707,0.128201,0.1334554,0.1374389,0.1427246,0.1446071,0.1465459,0.1496113,0.1541296,0.1578174
2006-02-28,,0.1346332,0.1361773,0.139586,0.1421924,0.1468084,0.1489651,0.1505661,0.1541479,0.1606205,0.1675438
2006-03-01,,0.1301198,0.1318495,0.1343342,0.1376886,0.1434328,0.1459977,0.1490832,0.1525961,0.1557153,0.1593923
2006-03-02,,0.1304425,0.1347556,0.1398592,0.1420431,0.1457691,0.1479747,0.1510143,0.1544964,0.1589201,0.1616325
2006-03-03,,0.1311674,0.1339681,0.138887,0.1418598,0.1451706,0.1472144,0.1495689,0.1536886,0.1599843,0.162247
2006-03-06,,0.1308081,0.1367775,0.1412145,0.1436582,0.1480171,0.1495588,0.1511633,0.1545973,0.1588486,0.1616268
2006-03-07,,0.1344355,0.1387528,0.143365,0.1459607,0.1482421,0.1491656,0.1512236,0.1550063,0.1593201,0.1615385
When plotting time series, pandas takes the index for the x-axis when calling the plot function.
I would suggest to:
df = df.assign(
Date=lambda x: pd.to_datetime(x["Date"], format="%Y-%m%d")
).set_index("Date")

Python Pyplot - Format Plotted Graph's Y Axis as a Percent to 2 Decimal Places

I am trying to represent CDC Delay of Care data as a line graph but am having some trouble formatting the y axis so that it is a percentage to the hundredths place. I would also like for the x axis to show every year in the range selected.
Here is my code:
import pandas as pd
from isolation import isolate_total_stub, isolate_age_stub
import matplotlib.pyplot as plt
# very simple extraction, drop some columns and check some data
cdc_data = pd.read_csv('CDC_Delay_of_Care_Data.csv')
# separate the categories of delayed care
delay_of_medical_care = cdc_data[cdc_data.PANEL == 'Delay or nonreceipt of needed medical care due to cost']
# isolate the totals stub
total_delay_of_medical_care = isolate_total_stub(delay_of_medical_care)
x_axis = total_delay_of_medical_care.YEAR
y_axis = total_delay_of_medical_care.ESTIMATE
plt.plot(x_axis, y_axis)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.show()
The graph that displays looks like this:
line graph
Excuse me for being a novice, I have been googling for an hour now and instead of continue to search for an answer I thought it would be more productive to ask StackOverflow.
Thank you for your time.
To change the format of Y-axis, you can use set_major_formatter
To change X-axis to date in year format, you will need to use set_major_locator, assuming that your date is in datetime format
To change format of X-axis, you can again use the set_major_formatter
I am showing a small example below with dummy data. Hope this works.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FormatStrFormatter
import matplotlib.dates as mdate
estimate = [8, 7.1, 11, 10.6, 8, 8.3]
year = ['2000-01-01', '2004-01-01', '2008-01-01', '2012-01-01', '2016-01-01', '2020-01-01']
year=pd.to_datetime(year) ## Convert string to datetime
plt.figure(figsize=(12,5)) ## Added so the Years don't overlap on each other
plt.plot(year, estimate)
plt.xlabel('Year')
plt.ylabel('Percentage')
plt.gca().yaxis.set_major_formatter(FormatStrFormatter('%.2f')) ## Makes X-axis label with two decimal points
locator = mdate.YearLocator()
plt.gca().xaxis.set_major_locator(locator) ## Changes datetime to years - 1 label per year
plt.gca().xaxis.set_major_formatter(mdate.DateFormatter('%Y')) ## Shows X-axis in Years
plt.gcf().autofmt_xdate() ## Rotates X-labels, if you want to use it
plt.show()
Output plot

How to mark the beginning of a new year while plotting pandas Series?

I am plotting such data:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import pandas as pd
a = pd.DatetimeIndex(start='2010-01-01',end='2011-06-01' , freq='M')
b = pd.Series(np.random.randn(len(a)), index=a)
I would like the plot to be in the format of bars, so I use this:
b.plot(kind='bar')
This is what I get:
As you can see, the dates are formatted in full, which is very ugly and unreadable. I happened to test this command which creates a very nice Date format:
b.plot()
As you can see:
I like this format very much, it includes the months, marks the beginning of the year and is easily readable.
After doing some search, the closest I could get to that format is using this:
fig, ax = plt.subplots()
ax.plot(b.index, b)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
However the output looks like this:
I am able to have month names on x axis this way, but I like the first formatting much more. That is much more elegant. Does anyone know how I can get the same exact xticks for my bar plot?
Here's a solution that will get you the format you're looking for. You can edit the tick labels directly, and use set_major_formatter() method:
fig, ax = plt.subplots()
ax.bar(b.index, b)
ticklabels = [item.strftime('%b') for item in b.index] #['']*len(b.index)
ticklabels[::12] = [item.strftime('%b\n%Y') for item in b.index[::12]]
ax.xaxis.set_major_formatter(matplotlib.ticker.FixedFormatter(ticklabels))
ax.set_xticks(b.index)
plt.gcf().autofmt_xdate()
Output:

Matplotlib - show x axis values - dates [duplicate]

Compare the following code:
test = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
test['date'] = pd.to_datetime(test['date'])
test = test.set_index('date')
ax = test.plot()
I added DateFormatter in the end:
test = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
test['date'] = pd.to_datetime(test['date'])
test = test.set_index('date')
ax = test.plot()
ax.xaxis.set_minor_formatter(dates.DateFormatter('%d\n\n%a')) ## Added this line
The issue with the second graph is that it starts on 5-24 instead 5-25. Also, 5-25 of 2017 is Thursday not Monday. What is causing the issue? Is this timezone related? (I don't understand why the date numbers are stacked on top of each other either)
In general the datetime utilities of pandas and matplotlib are incompatible. So trying to use a matplotlib.dates object on a date axis created with pandas will in most cases fail.
One reason is e.g. seen from the documentation
datetime objects are converted to floating point numbers which represent time in days since 0001-01-01 UTC, plus 1. For example, 0001-01-01, 06:00 is 1.25, not 0.25.
However, this is not the only difference and it is thus advisable not to mix pandas and matplotlib when it comes to datetime objects.
There is however the option to tell pandas not to use its own datetime format. In that case using the matplotlib.dates tickers is possible. This can be steered via.
df.plot(x_compat=True)
Since pandas does not provide sophisticated formatting capabilities for dates, one can use matplotlib for plotting and formatting.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
df = pd.DataFrame({'date':['20170527','20170526','20170525'],'ratio1':[1,0.98,0.97]})
df['date'] = pd.to_datetime(df['date'])
usePandas=True
#Either use pandas
if usePandas:
df = df.set_index('date')
df.plot(x_compat=True)
plt.gca().xaxis.set_major_locator(dates.DayLocator())
plt.gca().xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
plt.gca().invert_xaxis()
plt.gcf().autofmt_xdate(rotation=0, ha="center")
# or use matplotlib
else:
plt.plot(df["date"], df["ratio1"])
plt.gca().xaxis.set_major_locator(dates.DayLocator())
plt.gca().xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
plt.gca().invert_xaxis()
plt.show()
Updated using the matplotlib object oriented API
usePandas=True
#Either use pandas
if usePandas:
df = df.set_index('date')
ax = df.plot(x_compat=True, figsize=(6, 4))
ax.xaxis.set_major_locator(dates.DayLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
ax.invert_xaxis()
ax.get_figure().autofmt_xdate(rotation=0, ha="center")
# or use matplotlib
else:
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot('date', 'ratio1', data=df)
ax.xaxis.set_major_locator(dates.DayLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('%d\n\n%a'))
fig.invert_xaxis()
plt.show()

Plotting chart with epoch time x axis using matplotlib

I have the following code to plot a chart with matplotlib
#!/usr/bin/env python
import matplotlib.pyplot as plt
import urllib2
import json
req = urllib2.urlopen("http://localhost:17668/retrieval/data/getData.json? pv=LNLS:ANEL:corrente&donotchunk")
data = json.load(req)
secs = [x['secs'] for x in data[0]['data']]
vals = [x['val'] for x in data[0]['data']]
plt.plot(secs, vals)
plt.show()
The secs arrays is epoch time.
What I want is to plot the data in the x axis (secs) as a date (DD-MM-YYYY HH:MM:SS).
How can I do that?
To plot date-based data in matplotlib you must convert the data to the correct format.
One way is to first convert your data to datetime objects, for an epoch timestamp you should use datetime.datetime.fromtimestamp().
You must then convert the datetime objects to the right format for matplotlib, this can be handled using matplotlib.date.date2num.
Alternatively you can use matplotlib.dates.epoch2num and skip converting your date to datetime objects in the first place (while this will suit your use-case better initially, I would recommend trying to keep date based date in datetime objects as much as you can when working, it will save you a headache in the long run).
Once you have your data in the correct format you can plot it using plot_date.
Finally to format your x-axis as you wish you can use a matplotlib.dates.DateFormatter object to choose how your ticks will look.
import matplotlib.pyplot as plt
import matplotlib.dates as mdate
import numpy as np
# Generate some random data.
N = 40
now = 1398432160
raw = np.array([now + i*1000 for i in range(N)])
vals = np.sin(np.linspace(0,10,N))
# Convert to the correct format for matplotlib.
# mdate.epoch2num converts epoch timestamps to the right format for matplotlib
secs = mdate.epoch2num(raw)
fig, ax = plt.subplots()
# Plot the date using plot_date rather than plot
ax.plot_date(secs, vals)
# Choose your xtick format string
date_fmt = '%d-%m-%y %H:%M:%S'
# Use a DateFormatter to set the data to the correct format.
date_formatter = mdate.DateFormatter(date_fmt)
ax.xaxis.set_major_formatter(date_formatter)
# Sets the tick labels diagonal so they fit easier.
fig.autofmt_xdate()
plt.show()
You can change the ticks locations and formats on your plot:
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
import time
secs = [10928389,102928123,383827312,1238248395]
vals = [12,8,4,12]
plt.plot(secs,vals)
plt.gcf().autofmt_xdate()
plt.gca().xaxis.set_major_locator(mtick.FixedLocator(secs))
plt.gca().xaxis.set_major_formatter(
mtick.FuncFormatter(lambda pos,_: time.strftime("%d-%m-%Y %H:%M:%S",time.localtime(pos)))
)
plt.tight_layout()
plt.show()

Categories