Error rounding time to previous 15 min - Python

I've developed a crude method to round timestamps to the previous 15 mins. For instance, if the timestamp is 8:10:00, it gets rounded to 8:00:00.
However, when the minutes go past 15, it rounds to the previous hour. For instance, if the timestamp is 8:20:00, it gets rounded to 7:00:00 for some reason. I'll list the two examples below.
Correct Rounding:
import pandas as pd
from datetime import datetime, timedelta
d = {'Time': ['8:00:00']}
df = pd.DataFrame(data=d)
df['Time'] = pd.to_datetime(df['Time'])
FirstTime = df['Time'].iloc[0]
def hour_rounder(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            - timedelta(hours=t.minute // 15))
StartTime = hour_rounder(FirstTime)
StartTime = datetime.time(StartTime)
print(StartTime)
Out:
08:00:00
Incorrect Rounding:
import pandas as pd
from datetime import datetime, timedelta
d = {'Time': ['8:20:00']}
df = pd.DataFrame(data=d)
df['Time'] = pd.to_datetime(df['Time'])
FirstTime = df['Time'].iloc[0]
def hour_rounder(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            - timedelta(hours=t.minute // 15))
StartTime = hour_rounder(FirstTime)
StartTime = datetime.time(StartTime)
print(StartTime)
Out:
07:00:00
I don't understand what I'm doing wrong.

- timedelta(hours=t.minute//15)
If the minute is 20, then t.minute // 15 equals 1, so you're subtracting one whole hour; the quotient counts 15-minute blocks, not hours. To floor to the previous 15 minutes, keep the hour and floor the minute instead:
return t.replace(second=0, microsecond=0, minute=(t.minute // 15 * 15), hour=t.hour)
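As a quick sanity check, here is the corrected function applied to the question's failing example, using the same variable names as the question's snippet:

```python
import pandas as pd
from datetime import datetime

def hour_rounder(t):
    # Floor to the previous 15-minute boundary: 8:20:00 -> 8:15:00
    return t.replace(second=0, microsecond=0, minute=(t.minute // 15 * 15), hour=t.hour)

df = pd.DataFrame({'Time': ['8:20:00']})
df['Time'] = pd.to_datetime(df['Time'])
StartTime = datetime.time(hour_rounder(df['Time'].iloc[0]))
print(StartTime)  # 08:15:00
```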

Use .dt.floor('15min') to round down to 15-minute intervals.
import pandas as pd
df = pd.DataFrame({'Time': pd.date_range('2018-01-01', freq='13.141min', periods=13)})
df['prev_15'] = df.Time.dt.floor('15min')
Output:
Time prev_15
0 2018-01-01 00:00:00.000 2018-01-01 00:00:00
1 2018-01-01 00:13:08.460 2018-01-01 00:00:00
2 2018-01-01 00:26:16.920 2018-01-01 00:15:00
3 2018-01-01 00:39:25.380 2018-01-01 00:30:00
4 2018-01-01 00:52:33.840 2018-01-01 00:45:00
5 2018-01-01 01:05:42.300 2018-01-01 01:00:00
6 2018-01-01 01:18:50.760 2018-01-01 01:15:00
7 2018-01-01 01:31:59.220 2018-01-01 01:30:00
8 2018-01-01 01:45:07.680 2018-01-01 01:45:00
9 2018-01-01 01:58:16.140 2018-01-01 01:45:00
10 2018-01-01 02:11:24.600 2018-01-01 02:00:00
11 2018-01-01 02:24:33.060 2018-01-01 02:15:00
12 2018-01-01 02:37:41.520 2018-01-01 02:30:00
There are also .dt.round() and .dt.ceil() if you need the nearest 15-minute interval or the following one, respectively.
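On a single Timestamp the three methods compare as follows (a minimal illustration):

```python
import pandas as pd

ts = pd.Timestamp('2018-01-01 00:26:16')
print(ts.floor('15min'))  # 2018-01-01 00:15:00 (previous interval)
print(ts.round('15min'))  # 2018-01-01 00:30:00 (nearest interval)
print(ts.ceil('15min'))   # 2018-01-01 00:30:00 (following interval)
```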

Related

How to subtract time when there is a date change in pandas?

I have following dataframe in pandas
start_date start_time end_time
2018-01-01 23:55:00 00:05:00
2018-01-02 00:05:00 00:10:00
2018-01-03 23:59:00 00:05:00
I want to calculate the time difference. But, for 1st and 3rd observation, there is a date change in end_time.
How can I do it in pandas?
Currently, I am using the logic where end_time is less than start_time I am creating one more column called end_date where it increments the start_date by 1 and then subtracts the time.
Is there any other way to do it?
A solution working with timedeltas: if the difference's days component equals -1 (the interval crossed midnight), add one day:
df['start_time'] = pd.to_timedelta(df['start_time'])
df['end_time'] = pd.to_timedelta(df['end_time'])
d = df['end_time'] - df['start_time']
df['diff'] = d.mask(d.dt.days == -1, d + pd.Timedelta(1, unit='d'))
print (df)
start_date start_time end_time diff
0 2018-01-01 23:55:00 00:05:00 00:10:00
1 2018-01-02 00:05:00 00:10:00 00:05:00
2 2018-01-03 23:59:00 00:05:00 00:06:00
Another solution with numpy.where:
import numpy as np
s = df['end_time'] - df['start_time']
df['diff'] = np.where(df['end_time'] < df['start_time'],
                      s + pd.Timedelta(1, unit='d'),
                      s)
print (df)
start_date start_time end_time diff
0 2018-01-01 23:55:00 00:05:00 00:10:00
1 2018-01-02 00:05:00 00:10:00 00:05:00
2 2018-01-03 23:59:00 00:05:00 00:06:00
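If you also need real end datetimes rather than just the difference, you can combine start_date with the timedeltas and roll the midnight-crossing rows forward a day. A sketch using the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({'start_date': ['2018-01-01', '2018-01-02', '2018-01-03'],
                   'start_time': ['23:55:00', '00:05:00', '23:59:00'],
                   'end_time':   ['00:05:00', '00:10:00', '00:05:00']})

start = pd.to_datetime(df['start_date']) + pd.to_timedelta(df['start_time'])
end = pd.to_datetime(df['start_date']) + pd.to_timedelta(df['end_time'])
# Wherever the end landed before the start, the interval crossed midnight
end = end.mask(end < start, end + pd.Timedelta(days=1))
df['diff'] = end - start
print(df)
```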

How to round date time index in a pandas data frame?

There is a pandas dataframe like this:
index
2018-06-01 02:50:00 R 45.48 -2.8
2018-06-01 07:13:00 R 45.85 -2.0
...
2018-06-01 08:37:00 R 45.87 -2.7
I would like to round the index to the hour like this:
index
2018-06-01 02:00:00 R 45.48 -2.8
2018-06-01 07:00:00 R 45.85 -2.0
...
2018-06-01 08:00:00 R 45.87 -2.7
I am trying the following code:
df = df.date_time.apply(lambda x: x.round('H'))
but it returns a Series instead of a dataframe with the modified index column.
Try using floor:
df.index.floor('H')
Setup:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(25),
                  index=pd.date_range('2018-01-01 01:12:50', '2018-01-02 01:12:50', freq='H'),
                  columns=['Value'])
df.head()
Value
2018-01-01 01:12:50 0
2018-01-01 02:12:50 1
2018-01-01 03:12:50 2
2018-01-01 04:12:50 3
2018-01-01 05:12:50 4
df.index = df.index.floor('H')
df.head()
Value
2018-01-01 01:00:00 0
2018-01-01 02:00:00 1
2018-01-01 03:00:00 2
2018-01-01 04:00:00 3
2018-01-01 05:00:00 4
Try my method:
Add a new column holding the index rounded to the hour:
df['E'] = df.index.round('H')
Set it as the index:
df1 = df.set_index('E')
Delete the index name you set ('E' here):
df1.index.name = None
And now, df1 is a new DataFrame with index hour rounded from df.
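The steps above can also be collapsed into a single expression, since set_index accepts an Index directly. A sketch on a toy frame modeled on the question (note round('H') snaps to the nearest hour; use floor('H') to always round down as in the question's desired output):

```python
import pandas as pd

df = pd.DataFrame({'Value': [45.48, 45.85, 45.87]},
                  index=pd.to_datetime(['2018-06-01 02:50:00',
                                        '2018-06-01 07:13:00',
                                        '2018-06-01 08:37:00']))

# df.index.round returns a new Index; set_index uses it in one step
df1 = df.set_index(df.index.round('H'))
print(df1.index.tolist())
```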
Try this (it needs import datetime; note that 60 * (dt.minute // 60) is always 0 for a valid minute, so omitting the minute argument zeroes it the same way):
df['index'].apply(lambda dt: datetime.datetime(dt.year, dt.month, dt.day, dt.hour))

Convert/use timedelta's time to datetime

I have a dataframe with two columns: a timedelta 'Time' and a datetime 'DateTime'.
My timedelta column simply displays a normal clock time; it never exceeds 24 hours. It's not really being used as a 'timedelta', just a 'time'.
It's just the way the data comes when pandas reads it from my database.
I want a new column 'NewDateTime', with the date from the datetime, and time from the deltatime.
So I have this:
Time DateTime
1 09:01:00 2018-01-01 10:10:10
2 21:43:00 2018-01-01 11:11:11
3 03:20:00 2018-01-01 12:12:12
And I want this:
Time DateTime NewDateTime
1 09:01:00 2018-01-01 10:10:10 2018-01-01 09:01:00
2 21:43:00 2018-01-01 11:11:11 2018-01-01 21:43:00
3 03:20:00 2018-01-01 12:12:12 2018-01-01 03:20:00
At first I tried to set the DateTime column's hours, minutes and seconds to 0.
Then I planned to add the timedelta to the datetime.
But when I tried to do:
df['NewDateTime'] = df['DateTime'].dt.replace(hour=0, minute=0, second=0)
I get AttributeError: 'DatetimeProperties' object has no attribute 'replace'
Use Series.dt.floor to remove the time component:
df['NewDateTime'] = df['DateTime'].dt.floor('D') + pd.to_timedelta(df['Time'])
# if necessary, convert the times to strings first
# df['NewDateTime'] = df['DateTime'].dt.floor('D') + pd.to_timedelta(df['Time'].astype(str))
print (df)
Time DateTime NewDateTime
1 09:01:00 2018-01-01 10:10:10 2018-01-01 09:01:00
2 21:43:00 2018-01-01 11:11:11 2018-01-01 21:43:00
3 03:20:00 2018-01-01 12:12:12 2018-01-01 03:20:00

pandas: join dataframes based on time interval

I have a data frame with a datetime column every 10 minutes and a numerical value:
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'time': pd.date_range('1/1/2018', periods=20, freq='10min'),
                    'value': np.random.randint(2, 20, size=20)})
And another with a schedule of events, with a start time and end time. There can be multiple events happening at the same time:
df2 = pd.DataFrame({
    'start_time': ['2018-01-01 00:00:00', '2018-01-01 00:00:00', '2018-01-01 01:00:00',
                   '2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00'],
    'end_time':   ['2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00',
                   '2018-01-01 02:00:00', '2018-01-01 02:00:00', '2018-01-01 03:00:00'],
    'event': ['A', 'B', 'C', 'D', 'E', 'F']})
df2[['start_time', 'end_time']] = df2.iloc[:,0:2].apply(pd.to_datetime)
I want to do a left join on df1, with all events that fall inside the start and end times. My output table should be:
time value event
0 2018-01-01 00:00:00 5 A
1 2018-01-01 00:00:00 5 B
2 2018-01-01 00:10:00 15 A
3 2018-01-01 00:10:00 15 B
4 2018-01-01 00:20:00 16 A
5 2018-01-01 00:20:00 16 B
.....
17 2018-01-01 02:50:00 7 F
I attempted these SO solutions, but they fail because of duplicate time intervals.
Setup (Only using a few entries from df1 for brevity):
df1 = pd.DataFrame({'time': pd.date_range('1/1/2018', periods=20, freq='10min'),
                    'value': np.random.randint(2, 20, size=20)})
df2 = pd.DataFrame({
    'start_time': ['2018-01-01 00:00:00', '2018-01-01 00:00:00', '2018-01-01 01:00:00',
                   '2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00'],
    'end_time':   ['2018-01-01 01:00:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00',
                   '2018-01-01 02:00:00', '2018-01-01 02:00:00', '2018-01-01 03:00:00'],
    'event': ['A', 'B', 'C', 'D', 'E', 'F']})
df1 = df1.sample(5)
df2[['start_time', 'end_time']] = df2.iloc[:,0:2].apply(pd.to_datetime)
You can use a couple of straightforward list comprehensions to achieve your result. This answer assumes that all date columns are, in fact, of type datetime in your DataFrame:
Step 1
Find all events that occur within a particular time range using a list comprehension and simple interval checking:
packed = list(zip(df2.start_time, df2.end_time, df2.event))
df1['event'] = [[ev for strt, end, ev in packed if strt <= el <= end] for el in df1.time]
time value event
2 2018-01-01 00:20:00 8 [A, B]
14 2018-01-01 02:20:00 14 [F]
8 2018-01-01 01:20:00 6 [C, D, E]
19 2018-01-01 03:10:00 16 []
4 2018-01-01 00:40:00 7 [A, B]
Step 2:
Finally, explode each list from the last result to a new row using another list comprehension:
pd.DataFrame(
    [[t, val, e] for t, val, event in zip(df1.time, df1.value, df1.event)
                 for e in event],
    columns=df1.columns
)
Output:
time value event
0 2018-01-01 00:20:00 8 A
1 2018-01-01 00:20:00 8 B
2 2018-01-01 02:20:00 14 F
3 2018-01-01 01:20:00 6 C
4 2018-01-01 01:20:00 6 D
5 2018-01-01 01:20:00 6 E
6 2018-01-01 00:40:00 7 A
7 2018-01-01 00:40:00 7 B
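On newer pandas (0.25+), Step 2 can also be done with DataFrame.explode; empty lists become NaN rows, which can then be dropped. A sketch on a small hand-made frame shaped like Step 1's output:

```python
import pandas as pd

df1 = pd.DataFrame({'time': pd.to_datetime(['2018-01-01 00:20:00',
                                            '2018-01-01 03:10:00']),
                    'value': [8, 16],
                    'event': [['A', 'B'], []]})

# explode() gives one row per list element; an empty list yields a NaN row
out = df1.explode('event').dropna(subset=['event']).reset_index(drop=True)
print(out)
```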
I'm not entirely sure of your question, but if you are trying to join on "events that fall inside the start and end times," it sounds like you need something akin to a BETWEEN operator from SQL. Your data doesn't make it particularly clear.
Pandas doesn't have this natively, but pandasql does. It allows you to run SQLite queries against your dataframe. I think something like this is what you need:
import pandasql as ps
sqlcode = '''
select df1.time, df1.value, df2.event
from df1
inner join df2
on df1.time between df2.start_time and df2.end_time
'''
newdf = ps.sqldf(sqlcode,locals())
Relevant Question:
Merge pandas dataframes where one value is between two others
One option is with the conditional_join from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
out = df1.conditional_join(
df2,
('time', 'start_time', '>='),
('time', 'end_time', '<=')
)
out.head()
time value start_time end_time event
0 2018-01-01 00:00:00 14 2018-01-01 2018-01-01 01:00:00 A
1 2018-01-01 00:00:00 14 2018-01-01 2018-01-01 01:00:00 B
2 2018-01-01 00:10:00 10 2018-01-01 2018-01-01 01:00:00 A
3 2018-01-01 00:10:00 10 2018-01-01 2018-01-01 01:00:00 B
4 2018-01-01 00:20:00 15 2018-01-01 2018-01-01 01:00:00 A
You can work on df2 to create a column with all the times at a 10-minute resampling (like in df1) for each event, and then use merge. It's a lot of manipulation, so probably not the most efficient approach.
df2_manip = (df2.set_index('event').stack().reset_index().set_index(0)
                .groupby('event').resample('10T').ffill().reset_index(1))
and df2_manip looks like:
0 event level_1
event
A 2018-01-01 00:00:00 A start_time
A 2018-01-01 00:10:00 A start_time
A 2018-01-01 00:20:00 A start_time
A 2018-01-01 00:30:00 A start_time
A 2018-01-01 00:40:00 A start_time
A 2018-01-01 00:50:00 A start_time
A 2018-01-01 01:00:00 A end_time
B 2018-01-01 00:00:00 B start_time
B 2018-01-01 00:10:00 B start_time
B 2018-01-01 00:20:00 B start_time
B 2018-01-01 00:30:00 B start_time
...
Now you can merge:
df1 = df1.merge(df2_manip[[0, 'event']].rename(columns={0:'time'}))
and you get df1:
time value event
0 2018-01-01 00:00:00 9 A
1 2018-01-01 00:00:00 9 B
2 2018-01-01 00:10:00 16 A
3 2018-01-01 00:10:00 16 B
...
33 2018-01-01 02:00:00 6 D
34 2018-01-01 02:00:00 6 E
35 2018-01-01 02:00:00 6 F
36 2018-01-01 02:10:00 2 F
37 2018-01-01 02:20:00 18 F
38 2018-01-01 02:30:00 14 F
39 2018-01-01 02:40:00 5 F
40 2018-01-01 02:50:00 3 F
41 2018-01-01 03:00:00 9 F

Pandas .resample() or .asfreq() fill forward times

I'm trying to resample a dataframe with a time series from 1-hour increments to 15-minute. Both .resample() and .asfreq() do almost exactly what I want, but I'm having a hard time filling the last three intervals.
I could add an extra hour at the end, resample, and then drop that last hour, but it feels hacky.
Current code:
df = pd.DataFrame({'date':pd.date_range('2018-01-01 00:00', '2018-01-01 01:00', freq = '1H'), 'num':5})
df = df.set_index('date').asfreq('15T', method = 'ffill', how = 'end').reset_index()
Current output:
date num
0 2018-01-01 00:00:00 5
1 2018-01-01 00:15:00 5
2 2018-01-01 00:30:00 5
3 2018-01-01 00:45:00 5
4 2018-01-01 01:00:00 5
Desired output:
date num
0 2018-01-01 00:00:00 5
1 2018-01-01 00:15:00 5
2 2018-01-01 00:30:00 5
3 2018-01-01 00:45:00 5
4 2018-01-01 01:00:00 5
5 2018-01-01 01:15:00 5
6 2018-01-01 01:30:00 5
7 2018-01-01 01:45:00 5
Thoughts?
Not sure about asfreq but reindex works wonderfully:
df.set_index('date').reindex(
    pd.date_range(
        df.date.min(),
        df.date.max() + pd.Timedelta('1H'),
        freq='15T', closed='left'
    ),
    method='ffill'
)
num
2018-01-01 00:00:00 5
2018-01-01 00:15:00 5
2018-01-01 00:30:00 5
2018-01-01 00:45:00 5
2018-01-01 01:00:00 5
2018-01-01 01:15:00 5
2018-01-01 01:30:00 5
2018-01-01 01:45:00 5
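For comparison, the "add an extra hour, then drop it" route the question mentions can be written compactly with asfreq; reusing the last value as the sentinel row avoids introducing NaN. A sketch on the question's toy frame:

```python
import pandas as pd

df = pd.DataFrame({'date': pd.date_range('2018-01-01 00:00', '2018-01-01 01:00', freq='1H'),
                   'num': 5}).set_index('date')

# Append a sentinel row one hour past the end, upsample, then drop it
sentinel = pd.DataFrame({'num': [df['num'].iloc[-1]]},
                        index=[df.index.max() + pd.Timedelta('1H')])
out = pd.concat([df, sentinel]).asfreq('15T', method='ffill').iloc[:-1]
print(out)
```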
