Matplotlib not reading time axis correctly - python

I want to visualise the daily data using Matplotlib. The data is temperature against time and has this format:
Time Temperature
1 8:23:04 18.5
2 8:23:04 19.0
3 9:12:57 19.0
4 9:12:57 20.0
... ... ...
But when I plot the graph, the Time values on the x-axis are distorted.
Realising Matplotlib may not be interpreting time data correctly, I converted the time format using pd.to_datetime:
df['Time'] = pd.to_datetime(df['Time'], format="%H:%M:%S")
df.plot( 'Time', 'Temperature',figsize=(20, 10))
df.describe()
but this produced the same distorted axis.
How can I make the time on the x-axis look normal? Thanks

As @Michael O. was saying, you need to take care of the datetime.
Your times are missing the day, month and year. Here is a possible solution that fills in the missing parts with default values; you may want to change them.
The code is simple, and the comments explain each step.
import pandas as pd
from datetime import datetime, date, time, timezone
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
vals=[["8:23:04", 18.5],
["8:23:04", 19.0],
["9:12:57", 19.0],
["9:12:57", 20.0]]
apd=pd.DataFrame(vals, columns=["Time", "Temp"])
# a simple function to convert a string to datetime
def conv_time(cell):
    dt = datetime.strptime(cell, "%d/%m/%Y %H:%M:%S")
    return dt
# the dataframe misses the day, month and year, we need to add some
apd["Time"]=["{}/{}/{} {}".format(1,1,2020, cell) for cell in apd["Time"]]
# we use the function to convert the column to a datetime
apd["Time"]=[conv_time(cell) for cell in apd["Time"]]
## plotting the results taking care of the axis
fig, ax = plt.subplots()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
ax.set_xlim([pd.to_datetime('2020-01-1 6:00:00'), pd.to_datetime('2020-01-1 12:00:00')])
ax.scatter(apd["Time"], apd["Temp"])
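A shorter variant of the same idea, as a rough sketch: pd.to_datetime with format="%H:%M:%S" fills in a default date (1900-01-01) by itself, so it may be enough to plot through matplotlib directly and hide the date with a DateFormatter. The column names follow the question; the "Agg" backend is only an assumption so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, drop this for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Time": ["8:23:04", "8:23:04", "9:12:57", "9:12:57"],
    "Temperature": [18.5, 19.0, 19.0, 20.0],
})

# No date in the strings, so pandas uses 1900-01-01 as the date part.
df["Time"] = pd.to_datetime(df["Time"], format="%H:%M:%S")

fig, ax = plt.subplots()
ax.plot(df["Time"], df["Temperature"], marker="o")
# Show only hours and minutes on the x-axis, hiding the placeholder date.
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

fig.canvas.draw()  # forces the tick labels to be computed
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
```

Calling ax.plot instead of df.plot keeps matplotlib's own date handling in charge of the axis, which is what makes DateFormatter apply cleanly here.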

Related

How do you get dates on the start on the specified month? (matplotlib)

# FEB
# configuring the figure and plot space
fig, lx = plt.subplots(figsize=(30,10))
# converting the Series into float so the data can be plotted
wd = df2['Unnamed: 1']
wd = wd.astype(float)
# adding the x and y axes' values
lx.plot(list(df2.index.values), wd)
# defining what the labels will be
lx.set(xlabel='Day', ylabel='Weight', title='Daily Weight February 2022')
# defining the date format
date_format = DateFormatter('%m-%d')
lx.xaxis.set_major_formatter(date_format)
lx.xaxis.set_minor_locator(mdates.WeekdayLocator(interval=1))
Values I would like the x-axis to have:
['2/4', '2/5', '2/6', '2/7', '2/8', '2/9', '2/10', '2/11', '2/12', '2/13', '2/14', '2/15', '2/16', '2/17', '2/18', '2/19', '2/20', '2/21', '2/22', '2/23', '2/24', '2/25', '2/26', '2/27']
Values on the x-axis:
It is giving me the right number of values, just not the right labels. I have tried to specify the start and end with xlim=['2/4', '2/27'], but that did not seem to work.
It would be great to see how your df2 actually looks, but from your code snippet, it looks like it has weights recorded but not the corresponding dates.
How about preparing a data frame that has dates in it?
(Also, since this question is tagged with seaborn too, I'm going to use Seaborn, but the same idea should work.)
import pandas as pd
import seaborn as sns
import seaborn.objects as so
from matplotlib.dates import DateFormatter
sns.set_theme()
Create an index with the dates starting from 4 Feb with the number of days we have weight recorded.
index = pd.date_range(start="2/4/2022", periods=df2.count().Weight, name="Date")
Then with Seaborn's object interface (v0.12+), we can do:
(
so.Plot(df2.set_index(index), x="Date", y="Weight")
.add(so.Line())
.scale(x=so.Temporal().label(formatter=DateFormatter('%m-%d')))
.label(title="Daily Weight February 2022")
)
I have solved this problem. It was very simple: I just set mdates.WeekdayLocator() as the major locator (set_major_locator) alongside the formatter. I overlooked this when I was going through the matplotlib docs, but I am happy to have found the solution.
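For the plain matplotlib route, here is a minimal sketch of one way to get a label such as 2/4 on every day: give the x values real dates, place a tick per day with DayLocator, and build the label with a FuncFormatter. The weights are made-up placeholder data, month_day is a hypothetical helper name, and the "Agg" backend is assumed only so the snippet runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import FuncFormatter

# One date per recorded weight, 4 Feb to 27 Feb 2022.
index = pd.date_range(start="2022-02-04", end="2022-02-27", freq="D", name="Date")
weights = np.linspace(80.0, 78.0, len(index))  # placeholder values, not real data


def month_day(x, pos=None):
    """Turn a matplotlib date number into a label like '2/4' (no leading zeros)."""
    d = mdates.num2date(x)
    return f"{d.month}/{d.day}"


fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(index.to_pydatetime(), weights)
ax.xaxis.set_major_locator(mdates.DayLocator(interval=1))  # one tick per day
ax.xaxis.set_major_formatter(FuncFormatter(month_day))
ax.set_xlim(index[0].to_pydatetime(), index[-1].to_pydatetime())
ax.set(xlabel="Day", ylabel="Weight", title="Daily Weight February 2022")

# The labels the formatter produces for each plotted day.
labels = [month_day(x) for x in mdates.date2num(index.to_pydatetime())]
```

A FuncFormatter is used instead of DateFormatter('%m-%d') because strftime codes without zero padding differ between platforms.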

Plot data frame fast and with correct date format

I have the data shown in the screenshot in a dataframe, and I would like to plot it quickly and with the correct date format.
The following code is much faster than using e.g. plt.plot(df["Date"], df["D30"]):
df.plot(marker='.', linestyle='none')
So I would like to keep using dataframe.plot() directly, because it is much faster than plotting each column against the "Date" column separately. However, as shown in the graph, the date is not correct. My actual starting date is 2006-01-10, but in the figure the axis starts at 70-01 (1970-01-01).
I find the official matplotlib documentation for DateFormatter quite confusing and not very helpful. I tried to google an easy and clear solution, but most answers relate to plt.plot(x, y), where x is the date and y is the actual value. With that approach it is easy to adjust the format of the "Date" in the figure, but it would make my plot very slow, since I am plotting 11 columns in total.
Any idea how I can plot the data frame quickly and with the correct date format?
import os
import datetime as dt
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
date_format = mdates.DateFormatter('%y%m')
df_file = r"C:\Codes\df_file.csv"
df = pd.read_csv(df_file)
print(len(df), df.info(), df["Date"][0], type(df["Date"][0]))
df.head(2)
fig = plt.figure(figsize=(12.0, 8.0))
df.plot(marker='.', linestyle='none')
plt.title("data_frame_show date", fontsize=16)
plt.gca().xaxis.set_major_formatter(date_format)
plt.legend(loc=(1.04, 0))
plt.show()
partial input:
Date,D10,D30,D60,D91,D122,D152,D182,D273,D365,D547,D730
2006-01-10,,0.1373444,0.1544265,0.1541397,0.1429375,0.1421464,0.1426055,0.1460771,0.1486266,0.1551848,0.1593932
2006-01-11,,0.135426,0.1411246,0.141093,0.1384091,0.1383636,0.1395791,0.1438944,0.1469191,0.1553112,0.1598582
2006-01-12,,0.1311339,0.1292621,0.1304292,0.1363482,0.1362213,0.1367843,0.1404174,0.1439877,0.152306,0.1568677
2006-01-13,,0.1594458,0.1355387,0.1367246,0.1434708,0.143745,0.1441349,0.1453056,0.1481918,0.157193,0.1607564
2006-01-16,,0.1374846,0.1182223,0.1272385,0.1415359,0.1418881,0.1430098,0.1468544,0.1496407,0.1547714,0.158936
2006-01-17,,0.1453834,0.1418838,0.143198,0.1437924,0.143473,0.1440987,0.1473208,0.1501543,0.1590842,0.1629096
2006-01-18,,0.1385479,0.141472,0.1481763,0.1515037,0.1511353,0.1511544,0.1535245,0.1554254,0.1626349,0.1663554
2006-01-19,,0.1639788,0.1462084,0.1483903,0.1486906,0.1483109,0.1492335,0.1539002,0.1563708,0.1611751,0.1644693
2006-01-20,,0.189771,0.178394,0.1638331,0.1565402,0.1559029,0.1553547,0.1526479,0.1516396,0.1614136,0.1646431
2006-01-23,,0.1420271,0.1570005,0.1614942,0.1607205,0.1605297,0.1630065,0.1653838,0.1642349,0.166809,0.1701779
2006-01-24,,0.1814291,0.1633585,0.1563364,0.1548823,0.15382,0.1545099,0.1590869,0.1609158,0.1653819,0.1681759
2006-01-25,,0.1272998,0.1445222,0.1487031,0.1522032,0.152714,0.1524364,0.1532192,0.1550062,0.1635665,0.1658293
2006-01-26,,0.1392162,0.1413034,0.1443807,0.1476261,0.1482458,0.1473548,0.1471019,0.1493254,0.1578586,0.160699
2006-01-27,,0.1360269,0.1374056,0.1387952,0.1426731,0.1441445,0.144917,0.1462428,0.1478979,0.1519537,0.1550311
2006-01-30,,0.1439245,0.1430108,0.1434628,0.1448731,0.1450397,0.1454756,0.1467621,0.1487521,0.1538424,0.1561802
2006-01-31,,0.1483135,0.1468713,0.1473837,0.1519043,0.1519379,0.1502139,0.1504632,0.1529254,0.1571567,0.1589795
2006-02-01,,0.1464208,0.1447363,0.1443483,0.1459808,0.1477726,0.1505124,0.1520256,0.1535773,0.1589145,0.1607383
2006-02-02,,0.1484249,0.1414394,0.1412338,0.1497531,0.1500731,0.1475751,0.147502,0.1512457,0.1571017,0.1606797
2006-02-03,,0.1496503,0.1485318,0.1502473,0.1565336,0.156727,0.1556335,0.1560396,0.1579241,0.1619183,0.1634751
2006-02-06,,0.149966,0.1457216,0.1475524,0.1539103,0.1546401,0.154973,0.1553681,0.1570598,0.161173,0.1630743
2006-02-08,,0.1463649,0.1436135,0.1454147,0.1498372,0.1507231,0.1520234,0.1538407,0.1563603,0.1617697,0.1639547
2006-02-09,,0.1401312,0.1432856,0.1437166,0.1443243,0.1463163,0.148681,0.1496198,0.1516376,0.1584639,0.1615756
2006-02-10,,0.1339916,0.1405194,0.1432779,0.1464605,0.1470921,0.1484831,0.1514307,0.1550715,0.1599564,0.1623171
2006-02-13,,0.1470304,0.1423007,0.1446087,0.1470668,0.1485171,0.1503383,0.1508497,0.1532987,0.1591155,0.1615874
2006-02-14,,0.1454322,0.1449017,0.1455735,0.1462286,0.1478059,0.1501469,0.1522522,0.1541999,0.157668,0.1601427
2006-02-15,,0.1429312,0.1455881,0.1464055,0.1471812,0.1489883,0.1514654,0.153837,0.1559375,0.16082,0.1631557
2006-02-16,,0.134637,0.1373471,0.140634,0.1432172,0.145788,0.14875,0.1507805,0.15325,0.1581015,0.1613797
2006-02-20,,0.1303785,0.1334454,0.139216,0.1423217,0.1454704,0.1477552,0.1487534,0.1509405,0.1554398,0.1588761
2006-02-21,,0.1359587,0.1370814,0.1416117,0.1418016,0.1441761,0.1468109,0.1476679,0.1496546,0.1561362,0.1607204
2006-02-22,,0.1302253,0.1337104,0.1415016,0.141451,0.1438881,0.1467031,0.1502449,0.1514018,0.1531452,0.1582335
2006-02-23,,0.1282022,0.1333902,0.1342376,0.1385976,0.1453201,0.1481733,0.1490296,0.1512885,0.1554035,0.1593463
2006-02-24,,0.1269229,0.1304391,0.1348061,0.1378378,0.1419301,0.1442134,0.1472283,0.1507224,0.1555662,0.1595938
2006-02-27,,0.1254707,0.128201,0.1334554,0.1374389,0.1427246,0.1446071,0.1465459,0.1496113,0.1541296,0.1578174
2006-02-28,,0.1346332,0.1361773,0.139586,0.1421924,0.1468084,0.1489651,0.1505661,0.1541479,0.1606205,0.1675438
2006-03-01,,0.1301198,0.1318495,0.1343342,0.1376886,0.1434328,0.1459977,0.1490832,0.1525961,0.1557153,0.1593923
2006-03-02,,0.1304425,0.1347556,0.1398592,0.1420431,0.1457691,0.1479747,0.1510143,0.1544964,0.1589201,0.1616325
2006-03-03,,0.1311674,0.1339681,0.138887,0.1418598,0.1451706,0.1472144,0.1495689,0.1536886,0.1599843,0.162247
2006-03-06,,0.1308081,0.1367775,0.1412145,0.1436582,0.1480171,0.1495588,0.1511633,0.1545973,0.1588486,0.1616268
2006-03-07,,0.1344355,0.1387528,0.143365,0.1459607,0.1482421,0.1491656,0.1512236,0.1550063,0.1593201,0.1615385
When plotting time series, pandas takes the index for the x-axis when calling the plot function.
So I would suggest converting the Date column to datetime and setting it as the index:
df = df.assign(
    Date=lambda x: pd.to_datetime(x["Date"], format="%Y-%m-%d")
).set_index("Date")
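The same result can probably be had at read time. As a small sketch (using three rows and two columns of the sample input in place of the real file, and assuming a headless "Agg" backend), read_csv can parse the Date column and use it as the index, after which df.plot() takes the dates for the x-axis:

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# A few rows of the sample input stand in for the real CSV file.
csv_text = """Date,D30,D60
2006-01-10,0.1373444,0.1544265
2006-01-11,0.135426,0.1411246
2006-01-12,0.1311339,0.1292621
"""

# parse_dates converts the strings to datetimes, index_col makes them the index.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")

# With a DatetimeIndex, pandas uses the dates for the x-axis in a single call.
ax = df.plot(marker=".", linestyle="none")
ax.set_title("data_frame_show date")
```

With the real file, io.StringIO(csv_text) would be replaced by the path to the CSV.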

Dataframe changing question with time series data with pandas

I have this dataframe:
The event-time is a fixed time for each security, and the date-time column holds a price every 10 minutes, starting 2 hours before the event time and continuing until 4 hours after it. I have thousands of securities. I want to create a plot whose x-axis runs from -12 to 24 (in 10-minute steps), i.e. from 2 hours before the event to 4 hours after it, with the price change on the y-axis. Is there any way to align the date-time values to the event time for each security in Python?
If you're looking to simply plot the data, pandas should handle your datetimes for you, assuming they are stored as datetimes rather than strings.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
event_time = pd.to_datetime('2024-04-28T07:52:00')
date_time = pd.date_range(event_time, periods=24, freq=pd.to_timedelta(10, 'minute'))
df = pd.DataFrame({'date_time': date_time, 'change': np.random.normal(size=len(date_time))})
ax = df.plot(x='date_time', y='change')
plt.show()
However if you're wanting to remove the specific times from the x axis and just count up from zero you could use the index as the x-axis:
df['_index'] = df.index
ax = df.plot(x='_index', y='change')
plt.show()
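To get the -12 to 24 axis described in the question, one option is to express every timestamp as a signed number of 10-minute steps from the event time. This is only a sketch for a single security: the price column is made-up data, and rel_step is a hypothetical column name.

```python
import numpy as np
import pandas as pd

event_time = pd.Timestamp("2024-04-28 07:52:00")
step = pd.Timedelta(minutes=10)

# 2 hours before to 4 hours after the event, one row every 10 minutes.
date_time = pd.date_range(event_time - 12 * step, event_time + 24 * step, freq=step)
df = pd.DataFrame({
    "date_time": date_time,
    "price": np.linspace(100.0, 103.6, len(date_time)),  # placeholder prices
})

# Signed number of 10-minute steps from the event: -12 ... 0 ... 24.
df["rel_step"] = ((df["date_time"] - event_time) / step).round().astype(int)

# Price change relative to the price at the event time.
event_price = df.loc[df["rel_step"] == 0, "price"].iloc[0]
df["change"] = df["price"] - event_price
```

Plotting is then df.plot(x="rel_step", y="change"). With many securities, the same subtraction could be applied per group, since each security has its own event time.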

How to convert excel time without date to pandas dataframe and then plot it?

I want to import a dataset from an Excel file into a pandas dataframe and then plot it. The dataset includes a Date column, which should be converted to pd.datetime, and Stopwatch columns, which should be converted to the format HH:MM:SS, HH:MM or H:MM depending on the data (the hour value can exceed 24, and the format should not include a date). Here are some rows from the data:
Date Stopwatch1 Stopwatch2 Stopwatch3 Timesum
01.08.2019 00:10:05 19:05 0:45 25:01:00
02.08.2019 00:08:00 23:50 0:30 30:30:00
03.08.2019 00:05:00 00:10 0:40 124:00:00
Then I want to plot a Stopwatch column on the y-axis with labels in time format (HH:MM) and the Date column on the x-axis. It would be nice if I could specify that, for example, if time < 06:00 then time = time + 24:00. What I mean is that 00:10 should count as greater than 23:50 in the Stopwatch2 column, and the chart should reflect that.
I tried:
df = pd.read_excel(path, dtype={'Stopwatch1':str, 'Stopwatch2':str, 'Stopwatch3':str})
tmd = pd.to_timedelta(df["Stopwatch1"])
tmd.plot()
Plot is working, but the labels on the y axis are numbers. I want to change them to time format(HH:MM).
Not sure if I understand the output correctly, but would this be something you are looking for?
I added the Stopwatch1 as a datetime so matplotlib dates module would understand and then help format it correctly:
import matplotlib.pyplot as plt
from matplotlib import dates as mdates
df['Stopwatch1'] = pd.to_datetime(df["Stopwatch1"])
fig, ax = plt.subplots()
ax.plot(df['Stopwatch1'])
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
In case you also want the dates as the x-axis, you can set it as the index, so the distance between observations will be based on the date distance.
df['Date'] = pd.to_datetime(df["Date"], format="%d.%m.%Y")
df = df.set_index('Date')
fig, ax = plt.subplots()
ax.plot(df.index, df['Stopwatch1'])
ax.set_xticks(df.index)
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
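Since the question mentions durations longer than 24 hours and the "add 24:00 if before 06:00" rule, an alternative sketch is to keep the values as timedeltas, plot them as seconds, and label the y-axis with a FuncFormatter. The hhmm helper is a hypothetical name, the three Stopwatch2 values are taken from the sample rows, and the "Agg" backend is assumed only so the snippet runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import FuncFormatter


def hhmm(seconds, pos=None):
    """Format a number of seconds as H:MM, letting the hours run past 24."""
    total_minutes = int(round(seconds / 60))
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours}:{minutes:02d}"


dates = pd.to_datetime(["01.08.2019", "02.08.2019", "03.08.2019"], format="%d.%m.%Y")
stopwatch2 = pd.Series(["19:05", "23:50", "00:10"], index=dates)

# Append seconds so the HH:MM strings parse as HH:MM:SS timedeltas.
td = pd.to_timedelta(stopwatch2 + ":00")
# Times before 06:00 are treated as belonging to the next day (+24 h).
td = td.where(td >= pd.Timedelta(hours=6), td + pd.Timedelta(hours=24))
seconds = td.dt.total_seconds()

fig, ax = plt.subplots()
ax.plot(seconds.index, seconds.to_numpy(), marker="o")
ax.yaxis.set_major_formatter(FuncFormatter(hhmm))

labels = [hhmm(s) for s in seconds]
```

Unlike a DateFormatter, this formatter does not wrap at midnight, so a total of 124 hours is labelled 124:00 instead of a clock time.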

Pandas plot function ignores timezone of timeseries

When plotting a timeseries with the built-in plot function of pandas, it seems to ignore the timezone of my index: it always uses the UTC time for the x-axis. An example:
import numpy as np
import matplotlib.pyplot as plt
from pandas import DataFrame, date_range
rng = date_range('1/1/2011', periods=200, freq='S', tz="UTC")
data = DataFrame(np.random.randn(len(rng), 3), index=rng, columns=['A', 'B', 'C'])
data_cet = data.tz_convert("CET")
# plot with data in UTC timezone
fig, ax = plt.subplots()
data[["A", "B"]].plot(ax=ax, grid=True)
plt.show()
# plot with data in CET timezone, but the x-axis remains the same as above
fig, ax = plt.subplots()
data_cet[["A", "B"]].plot(ax=ax, grid=True)
plt.show()
The plot does not change, although the index has:
In [11]: data.index[0]
Out[11]: <Timestamp: 2011-01-01 00:00:00+0000 UTC, tz=UTC>
In [12]: data_cet.index[0]
Out[12]: <Timestamp: 2011-01-01 01:00:00+0100 CET, tz=CET>
Should I file a bug, or am I missing something?
This is definitely a bug. I've created a report on GitHub. The reason is that, internally, pandas converts a regular-frequency DatetimeIndex to a PeriodIndex to hook into its own formatters/locators, and PeriodIndex currently does NOT retain timezone information.
Please stay tuned for a fix.
import pandas as pd
from pytz import timezone as ptz
import matplotlib as mpl
...
data.index = pd.to_datetime(data.index, utc=True).tz_convert(tz=ptz('<your timezone>'))
...
mpl.rcParams['timezone'] = data.index.tz.zone
... after which matplotlib prints as that zone rather than UTC.
However! Note if you need to annotate, the x locations of the annotations will still need to be in UTC, even whilst strings passed to data.loc[] or data.at[] will be assumed to be in the set timezone!
For instance I needed to show a series of vertical lines labelled with timestamps on them:
(this is after most of the plot calls, and note the timestamp strings in sels were UTC)
sels = ['2019-03-21 3:56:28',
'2019-03-21 4:00:30',
'2019-03-21 4:05:55',
'2019-03-21 4:13:40']
ax.vlines(sels,125,145,lw=1,color='grey') # 125 was bottom, 145 was top in data units
for s in sels:
    tstr = pd.to_datetime(s, utc=True)\
        .astimezone(tz=ptz(data.index.tz.zone))\
        .isoformat().split('T')[1].split('+')[0]
    ax.annotate(tstr, xy=(s,125), xycoords='data',
                xytext=(0,5), textcoords='offset points', rotation=90,
                horizontalalignment='right', verticalalignment='bottom')
This puts grey vertical lines at the times chosen manually in sels, and labels them in local timezone hours, minutes and seconds. (the .split()[] business discards the date and timezone info from the .isoformat() string).
But when I need to actually get corresponding values from data using the same s in sels, I then have to use the somewhat awkward:
data.tz_convert('UTC').at[s]
Whereas just
data.at[s]
fails with a KeyError, because pandas interprets s as being in the data.index.tz timezone, and interpreted that way the timestamps fall outside the range of the contents of data.
How to deal with UTC to local time conversion
import time
import pytz
import matplotlib.dates
…
tz = pytz.timezone(time.tzname[0])
…
ax.xaxis.set_major_locator(matplotlib.dates.HourLocator(interval=1, tz=tz))
ax.xaxis.set_major_formatter(matplotlib.dates.DateFormatter('%H', tz=tz))
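The effect of the tz argument can be checked without drawing anything. In this rough sketch the zone name "Europe/Berlin" is an assumption chosen for illustration, and the standard-library zoneinfo module stands in for pytz; the formatter is called directly on matplotlib date numbers built from a UTC index.

```python
from zoneinfo import ZoneInfo

import matplotlib.dates as mdates
import pandas as pd

tz = ZoneInfo("Europe/Berlin")  # assumed zone, UTC+1 in January

# Four hourly timestamps stored in UTC, starting at midnight.
idx = pd.date_range("2011-01-01", periods=4, freq=pd.Timedelta(hours=1), tz="UTC")

# matplotlib stores dates as UTC-based numbers; tz only affects the labels.
fmt_utc = mdates.DateFormatter("%H:%M", tz=ZoneInfo("UTC"))
fmt_local = mdates.DateFormatter("%H:%M", tz=tz)

nums = mdates.date2num(idx.to_pydatetime())
labels_utc = [fmt_utc(x) for x in nums]
labels_local = [fmt_local(x) for x in nums]
```

The underlying numbers are identical in both cases; only the formatter's tz decides whether the first label reads 00:00 or 01:00.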
