Adjust timestamps on x-axis - Matplotlib - python

I am trying to create a line plot ordered by time. For the df below, the first value appears at 07:00:00 and the last at 00:40:00.
But the timestamps aren't assigned to the x-axis and the row after midnight is plotted first, instead of last.
import pandas as pd
import matplotlib.pyplot as plt
d = ({
'Time' : ['7:00:00','10:30:00','12:40:00','16:25:00','18:30:00','22:40:00','00:40:00'],
'Value' : [1,2,3,4,5,4,10],
})
df = pd.DataFrame(d)
df['Time'] = pd.to_timedelta(df['Time'])
plt.plot(df['Time'], df['Value'])
plt.show()
print(df)

Matplotlib converts your timedelta values to a plain numerical representation, which is why you aren't getting times on your x-axis. The plot is also in order: '00:40:00' is simply smaller than all the other times, so it's plotted as the leftmost point.
What you can do instead is use a datetime format that includes the day, which indicates that 00:40:00 should come last since it falls on the next day. You can also use the pandas plotting method for easier formatting:
d = ({
'Time' : ['2019/1/1 7:00:00','2019/1/1 10:30:00','2019/1/1 12:40:00',
'2019/1/1 16:25:00','2019/1/1 18:30:00','2019/1/1 22:40:00',
'2019/1/2 00:40:00'],
'Value' : [1,2,3,4,5,4,10],
})
df = pd.DataFrame(d)
df['Time'] = pd.to_datetime(df['Time'])
df.plot(x='Time', y='Value')
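If the dates are not available and typing them by hand is impractical, the day rollover can be inferred from the times themselves. The following is a minimal sketch, assuming the rows are already in chronological order and that consecutive rows are less than 24 hours apart; the start date and the names `offsets`, `wrapped` and `days_elapsed` are illustrative, not part of the original question.

```python
import pandas as pd

times = ['7:00:00', '10:30:00', '12:40:00', '16:25:00',
         '18:30:00', '22:40:00', '00:40:00']
values = [1, 2, 3, 4, 5, 4, 10]

offsets = pd.Series(pd.to_timedelta(times))

# A negative step between consecutive rows means the clock wrapped past midnight.
wrapped = offsets.diff() < pd.Timedelta(0)
# Running count of wraps = number of whole days to add to each row.
days_elapsed = wrapped.cumsum()

start_date = pd.Timestamp('2019-01-01')  # assumed; the source data carries no date
df = pd.DataFrame({
    'Time': start_date + offsets + pd.to_timedelta(days_elapsed, unit='D'),
    'Value': values,
})
```

The resulting `Time` column is strictly increasing, so `df.plot(x='Time', y='Value')` draws the 00:40:00 row last.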
Update
Setting the ticks and tick labels at your time points is a bit tricky. This post will give you an idea of how the positioning works. Basically, you'll need to use something like matplotlib.dates.date2num to get the numerical representation of each datetime:
import matplotlib.dates as mdates
# x_compat=True keeps the x-axis in matplotlib date units, matching date2num
ax = df.plot(x='Time', y='Value', x_compat=True)
xticks = [mdates.date2num(x) for x in df['Time']]
xticklabels = [x.strftime('%H:%M') for x in df['Time']]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
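If the ticks do not have to coincide with the data points, a locator and formatter from matplotlib.dates may be simpler than setting positions by hand. This is a sketch under the assumption that the data is plotted with matplotlib directly (not through pandas); the three-hour interval and the shortened sample data are arbitrary choices for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

df = pd.DataFrame({
    'Time': pd.to_datetime(['2019-01-01 07:00', '2019-01-01 16:25',
                            '2019-01-02 00:40']),
    'Value': [1, 4, 10],
})

fig, ax = plt.subplots()
ax.plot(df['Time'], df['Value'])

# Place a tick every 3 hours and label it as hours:minutes only.
ax.xaxis.set_major_locator(mdates.HourLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
fig.autofmt_xdate()  # rotate the labels so they don't overlap
# Call plt.show() to display the figure.
```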

Related

Plot each single day on one plot by extracting time of DatetimeIndex without for loop

I have a dataframe including random data over 7 days and each data point is indexed by DatetimeIndex. I want to plot data of each day on a single plot. Currently my try is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 10000
i = pd.date_range('2018-04-09', periods=n, freq='1min')
ts = pd.DataFrame({'A': [np.random.randn() for i in range(n)]}, index=i)
# t.date() must be called; t.date alone is just the bound method
dates = list(ts.index.map(lambda t: t.date()).unique())
for date in dates:
    ts['A'].loc[date.strftime('%Y-%m-%d')].plot()
The result is the following:
As you can see, when a DatetimeIndex is used the date is kept, which is why each day is drawn after the previous one instead of on top of it.
Questions:
1- How can I fix the current code to have an x-axis that starts at midnight and ends at the next midnight?
2- Is there a pandas way to group days better and plot them on a single day without using for loop?
You can split the index into dates and times and unstack the ts into a dataframe:
df = ts.set_index([ts.index.date, ts.index.time]).unstack(level=0)
df.columns = df.columns.get_level_values(1)
then plot all in one chart:
df.plot()
or in separate charts:
axs = df.plot(subplots=True, title=df.columns.tolist(), legend=False, figsize=(6,8))
axs[0].figure.tight_layout()  # adjust spacing so the subplot titles don't overlap
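To see what the unstacking step produces, here is a small sketch on two days of six-hourly data instead of the 10000 random points. The variable name `wide` and the deterministic values are assumptions made for readability.

```python
import numpy as np
import pandas as pd

# Two days, one sample every 6 hours: values 0..3 on day one, 4..7 on day two.
idx = pd.date_range('2018-04-09', periods=8, freq='360min')
ts = pd.DataFrame({'A': np.arange(8.0)}, index=idx)

# Rows become times of day, columns become dates.
wide = ts.set_index([ts.index.date, ts.index.time]).unstack(level=0)
wide.columns = wide.columns.get_level_values(1)

print(wide)
```

Each column now holds one day, and all columns share the same time-of-day index, so `wide.plot()` overlays the days on a midnight-to-midnight axis.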

How to plot a dataframe using date as the x axis

I have a simple dataframe with two columns, 'date' and 'amount'. I want to plot the amount using date as the x-axis. The first lines of the data are:
22/05/2018,52068.67
21/05/2018,52159.19
15/05/2018,52744.03
08/05/2018,54666.21
08/05/2018,54677.51
01/05/2018,53890.59
30/04/2018,54812.25
27/04/2018,52258.23
26/04/2018,52351.47
23/04/2018,49777.04
23/04/2018,49952.44
23/04/2018,49992.44
05/04/2018,53238.59
03/04/2018,53631.09
03/04/2018,53839.64
28/03/2018,50836.78
26/03/2018,51206.67
26/03/2018,51372.02
14/03/2018,51110.17
12/03/2018,51411.31
06/03/2018,51169.91
05/03/2018,51374.57
27/02/2018,48728.85
27/02/2018,48730.5
16/02/2018,44988.25
14/02/2018,41948.03
12/02/2018,43776.31
12/02/2018,43800.31
12/02/2018,43840.11
05/02/2018,29358.96
26/01/2018,39491.0
24/01/2018,36470.03
23/01/2018,36562.76
23/01/2018,36616.61
22/01/2018,36582.46
22/01/2018,36665.71
22/01/2018,36743.31
17/01/2018,36965.3
16/01/2018,37044.6
09/01/2018,42083.65
08/01/2018,42183.39
05/01/2018,42285.41
03/01/2018,41537.51
03/01/2018,41579.51
02/01/2018,41945.32
27/12/2017,43003.33
27/12/2017,43217.29
18/12/2017,38208.63
15/12/2017,38315.53
However, the plot gives me points that don't appear in the data. For example, in May 2018 there is no value near 30000.
My code is:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("test.csv", header=None, names =['date', 'amount'])
df['time'] = pd.to_datetime(df['date'])
df.set_index(['time'],inplace=True)
df['amount'].plot()
plt.show()
What am I doing wrong?
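You need to convert the dates to datetime using the correct format, then use the pandas plot. Without an explicit format, ambiguous day-first strings are parsed month-first, so 05/02/2018 (5 February) is read as 2 May 2018, which is where the stray value near 30000 in May comes from: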
df['date'] = pd.to_datetime(df['date'], format = '%d/%m/%Y')
df.plot('date', 'amount')
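The effect of the format argument can be checked on a single ambiguous string from the question's data. This is a minimal sketch; the variable names are illustrative, and `dayfirst=True` is shown only as an alternative to the explicit format.

```python
import pandas as pd

ambiguous = '05/02/2018'  # 5 February in the question's data

month_first = pd.to_datetime(ambiguous)                    # default guess: 2 May
day_first = pd.to_datetime(ambiguous, format='%d/%m/%Y')   # explicit format: 5 February
also_day_first = pd.to_datetime(ambiguous, dayfirst=True)  # alternative flag

print(month_first.date(), day_first.date(), also_day_first.date())
```

An explicit format is usually the stricter choice, since a string that does not match it raises an error rather than being silently reinterpreted.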

Inconsistent time series given equal spacing in matplotlib

I'm plotting a stacked line chart from a pandas dataframe. The data was collected irregularly over the course of two days. In the image below, you can see that the elapsed time between equally spaced points varies (from ~7 hours to ~36 hours). I don't want this to happen; I want points on the graph to be stretched and squeezed appropriately so that time scales linearly along the x-axis. How can I do this?
The data was read in as follows:
df = pd.read_csv("filepath", index_col=0)
df = df.T
Above, I had to transpose the dataframe for the pandas stacked line plot to work as I wanted it to. The plot was produced as follows:
plot = df.plot.area(rot=90)
plot.axhline(y=2450, color="black")
In response to ImportanceOfBeingErnest, here is a minimal, complete, and verifiable example:
import numpy as np
import pandas as pd
import matplotlib.pyplot as mpl
dateTimeIndex = ["04.12.17 23:03", "05.12.17 00:09", "05.12.17 21:44", "05.12.17 22:34", "08.12.17 16:23"]
d = {'one' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex),
'two' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex),
'three' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex)}
df = pd.DataFrame(d)
plot = df.plot.area(rot=90)
Here is what the dataframe looks like (random values will vary):
                     one     three       two
04.12.17 23:03  0.472832  0.283329  0.739657
05.12.17 00:09  3.166099  1.065015  0.561079
05.12.17 21:44  0.209190  0.674236  0.143453
05.12.17 22:34  1.275056  0.764328  0.650507
08.12.17 16:23  0.764038  0.265599  0.342435
and the plot produced:
As you can tell, the dateTimeIndex entries are rather random but they are given equal spacing on the x-axis. I don't mind if the tick marks coincide with the data points. I only want time to scale linearly. How can this be achieved?
What's happening above is that pandas is just using the strings as the x-ticks. You need to make dateTimeIndex a DatetimeIndex. Pass an explicit format, because the strings are day-first and would otherwise be read month-first:
dateTimeIndex = pd.to_datetime(["04.12.17 23:03", "05.12.17 00:09",
                                "05.12.17 21:44", "05.12.17 22:34", "08.12.17 16:23"],
                               format="%d.%m.%y %H:%M")
d = {'one' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex),
'two' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex),
'three' : pd.Series(abs(np.random.randn(5)), index=dateTimeIndex)}
df = pd.DataFrame(d)
plot = df.plot.area(rot=90)
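A quick way to confirm that the index now carries real time information is to look at the gaps between consecutive samples. The sketch below assumes the strings are day.month.year, as the consecutive 23:03 and 00:09 entries suggest; `gaps` is an illustrative name.

```python
import pandas as pd

labels = ["04.12.17 23:03", "05.12.17 00:09", "05.12.17 21:44",
          "05.12.17 22:34", "08.12.17 16:23"]
index = pd.to_datetime(labels, format="%d.%m.%y %H:%M")

# Time elapsed between each sample and the previous one.
gaps = index.to_series().diff().dropna()
print(gaps)
```

The gaps range from under an hour to several days, and with a DatetimeIndex the plot spaces the points by these amounts rather than evenly.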

Annotate timeseries plot by merging two timeseries

Given I have two time series (or two columns in a data frame) like this:
rng = pd.date_range('1/1/2017', periods=3, freq='H')
ts1 = pd.Series(np.random.randn(len(rng)), index=rng)
ts2 = pd.Series(['HE','NOT','SHE'], index=rng)
I want to plot ts1 with ts1.plot() and use ts2 to annotate it. However, I only want to annotate the timestamps whose label is not 'NOT'.
From what I have found so far, markers seem to be what I'm looking for: for example, one marker for 'HE', another for 'SHE', and no marker for 'NOT'. However, I can't figure out how to use another time series as input and annotate only the timestamps whose label differs from some value.
You can use the pandas dataframe groupby method to split the dataset by the labels you're using and just ignore the values you don't want to plot.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
rng = pd.date_range('1/1/2017', periods=3, freq='H')
ts1 = pd.Series(np.random.randn(len(rng)), index=rng)
ts2 = pd.Series(['HE','NOT','SHE'], index=rng)
df = pd.concat([ts1, ts2], keys=['foo', 'bar'], axis=1)
ax = None # trick to keep everything plotted on a single axis
labels = [] # keep track of the labels you actually use
for key, dat in df.groupby('bar'):
    if key == 'NOT':
        continue
    labels.append(key)
    ax = dat.plot(ax=ax, marker='s', ls='none', legend=False)
# handle the legend through matplotlib directly, rather than pandas' interface
ax.legend(ax.get_lines(), labels)
plt.show()
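If text labels are wanted instead of markers, a boolean mask plus matplotlib's `annotate` is one possible alternative to the groupby loop. This is a sketch rather than the answer's method; the fixed values in `ts1` and the label offset are assumptions chosen to keep the example deterministic.

```python
import pandas as pd
import matplotlib.pyplot as plt

rng = pd.date_range('2017-01-01', periods=3, freq='60min')
ts1 = pd.Series([0.5, -1.0, 1.5], index=rng)
ts2 = pd.Series(['HE', 'NOT', 'SHE'], index=rng)

fig, ax = plt.subplots()
ax.plot(ts1.index, ts1.values)

# Keep only the timestamps whose label is not 'NOT'.
keep = ts2 != 'NOT'
for stamp, label in ts2[keep].items():
    # Put the label 8 points above the corresponding data point.
    ax.annotate(label, xy=(stamp, ts1.loc[stamp]), xytext=(0, 8),
                textcoords='offset points', ha='center')
# Call plt.show() to display the figure.
```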

Modifying number of ticks on Pandas hourly time axis

If I have the following example Python code using a Pandas dataframe:
import pandas as pd
from numpy.random import randn
ts = pd.DataFrame(randn(1000), index=pd.date_range('1/1/2000 00:00:00', freq='H', periods=1000), columns=['Data'])
ts['Time'] = ts.index.map(lambda t: t.time())
ts = ts.groupby('Time').mean()
ts.plot(x_compat=True, figsize=(20,10))
The output plot is:
What is the most elegant way to get the x-axis ticks to space themselves automatically at hourly or bi-hourly intervals? x_compat=True has no impact.
You can pass the xticks argument to ts.plot(). With the right interval you can get hourly or bi-hourly ticks:
import numpy as np
max_sec = 90000
ts.plot(x_compat=True, figsize=(20,10), xticks=np.arange(0, max_sec, 3600))
ts.plot(x_compat=True, figsize=(20,10), xticks=np.arange(0, max_sec, 7200))
Here max_sec is the maximum value of the xaxis, in seconds.
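The tick positions above are seconds since midnight, so a full day spans 0 to 86400. The sketch below spells out that arithmetic; `time_to_seconds` is a hypothetical helper written for this illustration, not a pandas or matplotlib function.

```python
import datetime as dt
import numpy as np

SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR  # 86400


def time_to_seconds(t):
    """Convert a datetime.time to seconds elapsed since midnight."""
    return t.hour * SECONDS_PER_HOUR + t.minute * 60 + t.second


# Tick positions every hour and every two hours, including the closing midnight.
hourly_ticks = np.arange(0, SECONDS_PER_DAY + 1, SECONDS_PER_HOUR)
bihourly_ticks = np.arange(0, SECONDS_PER_DAY + 1, 2 * SECONDS_PER_HOUR)

print(time_to_seconds(dt.time(13, 30)))
```

Either array can be passed as the `xticks` argument in place of the `np.arange(0, max_sec, ...)` calls.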
