Here I have an extract from my pandas dataframe which is survey data with two datetime fields. It appears that some of the start times and end times were filled in the wrong position in the survey. Here is an example from my dataframe. The start and end time in the 8th row, I suspect were entered the wrong way round.
Just to give context, I generated the third column like this:
df_time['trip_duration'] = df_time['tripEnd_time'] - df_time['tripStart_time']
The three columns are in timedelta64 format.
Here is the top of my dataframe:
tripStart_time tripEnd_time trip_duration
1 22:30:00 23:15:00 00:45:00
2 11:00:00 11:30:00 00:30:00
3 09:00:00 09:15:00 00:15:00
4 13:30:00 14:25:00 00:55:00
5 09:00:00 10:15:00 01:15:00
6 12:00:00 12:15:00 00:15:00
7 08:00:00 08:30:00 00:30:00
8 11:00:00 09:15:00 -1 days +22:15:00
9 14:00:00 14:30:00 00:30:00
10 14:55:00 15:20:00 00:25:00
What I am trying to do is, loop through these two columns, and for each time 'tripEnd_time' is less than 'tripStart_time' swap the positions of these two entries. So in the case of row 8 above, I would make tripStart_time = tripEnd_time and tripEnd_time = tripStart_time.
I am not quite sure the best way to approach this. Should I use nested for loop where i compare each entry in the two columns?
Thanks
Use Series.abs:
df_time['trip_duration'] = (df_time['tripEnd_time'] - df_time['tripStart_time']).abs()
print (df_time)
1 22:30:00 23:15:00 00:45:00
2 11:00:00 11:30:00 00:30:00
3 09:00:00 09:15:00 00:15:00
4 13:30:00 14:25:00 00:55:00
5 09:00:00 10:15:00 01:15:00
6 12:00:00 12:15:00 00:15:00
7 08:00:00 08:30:00 00:30:00
8 11:00:00 09:15:00 01:45:00
9 14:00:00 14:30:00 00:30:00
10 14:55:00 15:20:00 00:25:00
What is same like:
a = df_time['tripEnd_time'] - df_time['tripStart_time']
b = df_time['tripStart_time'] - df_time['tripEnd_time']
mask = df_time['tripEnd_time'] > df_time['tripStart_time']
df_time['trip_duration'] = np.where(mask, a, b)
print (df_time)
tripStart_time tripEnd_time trip_duration
1 22:30:00 23:15:00 00:45:00
2 11:00:00 11:30:00 00:30:00
3 09:00:00 09:15:00 00:15:00
4 13:30:00 14:25:00 00:55:00
5 09:00:00 10:15:00 01:15:00
6 12:00:00 12:15:00 00:15:00
7 08:00:00 08:30:00 00:30:00
8 11:00:00 09:15:00 01:45:00
9 14:00:00 14:30:00 00:30:00
10 14:55:00 15:20:00 00:25:00
You can switch column values on selected rows:
df_time.loc[df_time['tripEnd_time'] < df_time['tripStart_time'],
['tripStart_time', 'tripEnd_time']] = df_time.loc[
df_time['tripEnd_time'] < df_time['tripStart_time'],
['tripEnd_time', 'tripStart_time']].values
Related
I'm working on a timeseries dataframe which looks like this and has data from January to August 2020.
Timestamp Value
2020-01-01 00:00:00 -68.95370
2020-01-01 00:05:00 -67.90175
2020-01-01 00:10:00 -67.45966
2020-01-01 00:15:00 -67.07624
2020-01-01 00:20:00 -67.30549
.....
2020-07-01 00:00:00 -65.34212
I'm trying to apply a filter on the previous dataframe using the columns start_time and end_time in the dataframe below:
start_time end_time
2020-01-12 16:15:00 2020-01-13 16:00:00
2020-01-26 16:00:00 2020-01-26 16:10:00
2020-04-12 16:00:00 2020-04-13 16:00:00
2020-04-20 16:00:00 2020-04-21 16:00:00
2020-05-02 16:00:00 2020-05-03 16:00:00
The output should assign all values which are not within the start and end time as zero and retain values for the start and end times specified in the filter. I tried applying two simultaneous filters for start and end time but didn't work.
Any help would be appreciated.
Idea is create all masks by Series.between in list comprehension, then join with logical_or by np.logical_or.reduce and last pass to Series.where:
print (df1)
Timestamp Value
0 2020-01-13 00:00:00 -68.95370 <- changed data for match
1 2020-01-01 00:05:00 -67.90175
2 2020-01-01 00:10:00 -67.45966
3 2020-01-01 00:15:00 -67.07624
4 2020-01-01 00:20:00 -67.30549
5 2020-07-01 00:00:00 -65.34212
L = [df1['Timestamp'].between(s, e) for s, e in df2[['start_time','end_time']].values]
m = np.logical_or.reduce(L)
df1['Value'] = df1['Value'].where(m, 0)
print (df1)
Timestamp Value
0 2020-01-13 00:00:00 -68.9537
1 2020-01-01 00:05:00 0.0000
2 2020-01-01 00:10:00 0.0000
3 2020-01-01 00:15:00 0.0000
4 2020-01-01 00:20:00 0.0000
5 2020-07-01 00:00:00 0.0000
Solution using outer join of merge method and query:
print(df1)
timestamp Value <- changed Timestamp to timestamp to avoid name conflict in query
0 2020-01-13 00:00:00 -68.95370 <- also changed data for match
1 2020-01-01 00:05:00 -67.90175
2 2020-01-01 00:10:00 -67.45966
3 2020-01-01 00:15:00 -67.07624
4 2020-01-01 00:20:00 -67.30549
5 2020-07-01 00:00:00 -65.34212
df1.loc[df1.index.difference(df1.assign(key=0).merge(df2.assign(key=0), how = 'outer')\
.query("timestamp >= start_time and timestamp < end_time").index),"Value"] = 0
result:
timestamp Value
0 2020-01-13 00:00:00 -68.9537
1 2020-01-01 00:05:00 0.0000
2 2020-01-01 00:10:00 0.0000
3 2020-01-01 00:15:00 0.0000
4 2020-01-01 00:20:00 0.0000
5 2020-07-01 00:00:00 0.0000
Key assign(key=0) is added to both dataframes to produce cartesian product.
I am looking for the following functionality in python:
I have a Pandas DataFrame with 4 columns: ID, StartDate, EndDate, Moment.
I want to group by ID and evaluate per row in the group whether the Moment variable falls between the interval between StartDate and EndDate. The problem is that I want to evaluate this for each row in the group. For example in the following DataFrame there are two groups (ID=1 and ID=2) and both groups contains of 5 rows. For each row, I want a boolean for each row in both groups whether the moment variable in that row falls in ANY of the time windows in the group, the window being [date1, date2].
import pandas as pd
i = pd.date_range('2018-04-11', periods=10, freq='2D20min')
i2 = pd.date_range('2018-04-12', periods=10, freq='2D20min')
i3 = pd.date_range('2018-04-9', periods=10, freq='1D6H')
id = ['1', '1', '1', '1', '1', '2', '2', '2', '2', '2']
ts = pd.DataFrame({'date1': i, 'date2': i2, 'moment': i3}, index=id)
ID date1 date2 moment
1 2018-04-11 00:00:00 2018-04-12 00:00:00 2018-04-09 00:00:00
1 2018-04-13 00:20:00 2018-04-14 00:20:00 2018-04-10 06:00:00
1 2018-04-15 00:40:00 2018-04-16 00:40:00 2018-04-11 12:00:00
1 2018-04-17 01:00:00 2018-04-18 01:00:00 2018-04-12 18:00:00
1 2018-04-19 01:20:00 2018-04-20 01:20:00 2018-04-14 00:00:00
2 2018-04-21 01:40:00 2018-04-22 01:40:00 2018-04-15 06:00:00
2 2018-04-23 02:00:00 2018-04-24 02:00:00 2018-04-16 12:00:00
2 2018-04-25 02:20:00 2018-04-26 02:20:00 2018-04-17 18:00:00
2 2018-04-27 02:40:00 2018-04-28 02:40:00 2018-04-19 00:00:00
2 2018-04-29 03:00:00 2018-04-30 03:00:00 2018-04-20 06:00:00
In this case, the value for moment in the first row of the first group does not fall in any of the five time intervals. Neither does the second. The third value, 2018-04-11 12:00:00 does fall in the interval in the first row and I would thus want to have True returned.
The desired result would look as follows:
ID date1 date2 moment result
1 2018-04-11 00:00:00 2018-04-12 00:00:00 2018-04-09 00:00:00 False
1 2018-04-13 00:20:00 2018-04-14 00:20:00 2018-04-10 06:00:00 False
1 2018-04-15 00:40:00 2018-04-16 00:40:00 2018-04-11 12:00:00 True
1 2018-04-17 01:00:00 2018-04-18 01:00:00 2018-04-12 18:00:00 False
1 2018-04-19 01:20:00 2018-04-20 01:20:00 2018-04-14 00:00:00 True
2 2018-04-21 01:40:00 2018-04-22 01:40:00 2018-04-15 06:00:00 False
2 2018-04-23 02:00:00 2018-04-24 02:00:00 2018-04-16 12:00:00 False
2 2018-04-25 02:20:00 2018-04-26 02:20:00 2018-04-17 18:00:00 False
2 2018-04-27 02:40:00 2018-04-28 02:40:00 2018-04-19 00:00:00 False
2 2018-04-29 03:00:00 2018-04-30 03:00:00 2018-04-20 06:00:00 False
EDIT
I already 'solved' this problem with the following approach but am looking for a more pythonic and perhaps faster way...
boolean_result = []
for c in ts.index.unique():
temp = ts.loc[ts.index == c]
for row in temp.index:
current_date = temp['moment'][row]
boolean_result.append(max((temp['date1'] <= current_date)
& (current_date <= temp['date2'])))
ts['Result'] = boolean_result
This may actually be very slow if your dataframe is too big, and there might be an optimal solution other than this one:
def time_in_range(start, end, x):
"""Return true if x is in the range [start, end]"""
if start <= x and x <= end:
return True
else:
return False
# empty list to be appended
result = []
test_list = []
for i in ts.index.unique():
temp_df = ts[ts.index == i]
for j in range(0, len(temp_df)):
for k in range(0, len(temp_df)):
test_list.append(time_in_range(temp_df.date1.iloc[k], temp_df.date2.iloc[k], temp_df.moment.iloc[j]))
result.append(any(test_list))
# reset the list
test_list = []
ts['result'] = result
I have a dataframe df that contains datetimes for every hour of a day between 2003-02-12 to 2017-06-30 and I want to delete all datetimes between 24th Dec and 1st Jan of EVERY year.
An extract of my data frame is:
...
7505,2003-12-23 17:00:00
7506,2003-12-23 18:00:00
7507,2003-12-23 19:00:00
7508,2003-12-23 20:00:00
7509,2003-12-23 21:00:00
7510,2003-12-23 22:00:00
7511,2003-12-23 23:00:00
7512,2003-12-24 00:00:00
7513,2003-12-24 01:00:00
7514,2003-12-24 02:00:00
7515,2003-12-24 03:00:00
7516,2003-12-24 04:00:00
7517,2003-12-24 05:00:00
7518,2003-12-24 06:00:00
...
7723,2004-01-01 19:00:00
7724,2004-01-01 20:00:00
7725,2004-01-01 21:00:00
7726,2004-01-01 22:00:00
7727,2004-01-01 23:00:00
7728,2004-01-02 00:00:00
7729,2004-01-02 01:00:00
7730,2004-01-02 02:00:00
7731,2004-01-02 03:00:00
7732,2004-01-02 04:00:00
7733,2004-01-02 05:00:00
7734,2004-01-02 06:00:00
7735,2004-01-02 07:00:00
...
and my expected output is:
...
7505,2003-12-23 17:00:00
7506,2003-12-23 18:00:00
7507,2003-12-23 19:00:00
7508,2003-12-23 20:00:00
7509,2003-12-23 21:00:00
7510,2003-12-23 22:00:00
7511,2003-12-23 23:00:00
...
7728,2004-01-02 00:00:00
7729,2004-01-02 01:00:00
7730,2004-01-02 02:00:00
7731,2004-01-02 03:00:00
7732,2004-01-02 04:00:00
7733,2004-01-02 05:00:00
7734,2004-01-02 06:00:00
7735,2004-01-02 07:00:00
...
Sample dataframe:
dates
0 2003-12-23 23:00:00
1 2003-12-24 05:00:00
2 2004-12-27 05:00:00
3 2003-12-13 23:00:00
4 2002-12-23 23:00:00
5 2004-01-01 05:00:00
6 2014-12-24 05:00:00
Solution:
If you want it for every year between the following dates excluded, then extract the month and dates first:
df['month'] = df['dates'].dt.month
df['day'] = df['dates'].dt.day
And now put the condition check:
dec_days = [24, 25, 26, 27, 28, 29, 30, 31]
## if the month is dec, then check for these dates
## if the month is jan, then just check for the day to be 1 like below
df = df[~(((df.month == 12) & (df.day.isin(dec_days))) | ((df.month == 1) & (df.day == 1)))]
Sample output:
dates month day
0 2003-12-23 23:00:00 12 23
3 2003-12-13 23:00:00 12 13
4 2002-12-23 23:00:00 12 23
This takes advantage of the fact that datetime-strings in the form mm-dd are sortable. Read everything in from the CSV file then filter for the dates you want:
df = pd.read_csv('...', parse_dates=['DateTime'])
s = df['DateTime'].dt.strftime('%m-%d')
excluded = (s == '01-01') | (s >= '12-24') # Jan 1 or >= Dec 24
df[~excluded]
You can try dropping on conditionals. Maybe with a pattern match to the date string or parsing the date as a number (like in Java) and conditionally removing.
datesIdontLike = df[df['colname'] == <stringPattern>].index
newDF = df.drop(datesIdontLike, inplace=True)
Check this out: https://thispointer.com/python-pandas-how-to-drop-rows-in-dataframe-by-conditions-on-column-values/
(If you have issues, let me know.)
You can use pandas and boolean filtering with strftime
# version 0.23.4
import pandas as pd
# make df
df = pd.DataFrame(pd.date_range('20181223', '20190103', freq='H'), columns=['date'])
# string format the date to only include the month and day
# then set it strictly less than '12-24' AND greater than or equal to `01-02`
df = df.loc[
(df.date.dt.strftime('%m-%d') < '12-24') &
(df.date.dt.strftime('%m-%d') >= '01-02')
].copy()
print(df)
date
0 2018-12-23 00:00:00
1 2018-12-23 01:00:00
2 2018-12-23 02:00:00
3 2018-12-23 03:00:00
4 2018-12-23 04:00:00
5 2018-12-23 05:00:00
6 2018-12-23 06:00:00
7 2018-12-23 07:00:00
8 2018-12-23 08:00:00
9 2018-12-23 09:00:00
10 2018-12-23 10:00:00
11 2018-12-23 11:00:00
12 2018-12-23 12:00:00
13 2018-12-23 13:00:00
14 2018-12-23 14:00:00
15 2018-12-23 15:00:00
16 2018-12-23 16:00:00
17 2018-12-23 17:00:00
18 2018-12-23 18:00:00
19 2018-12-23 19:00:00
20 2018-12-23 20:00:00
21 2018-12-23 21:00:00
22 2018-12-23 22:00:00
23 2018-12-23 23:00:00
240 2019-01-02 00:00:00
241 2019-01-02 01:00:00
242 2019-01-02 02:00:00
243 2019-01-02 03:00:00
244 2019-01-02 04:00:00
245 2019-01-02 05:00:00
246 2019-01-02 06:00:00
247 2019-01-02 07:00:00
248 2019-01-02 08:00:00
249 2019-01-02 09:00:00
250 2019-01-02 10:00:00
251 2019-01-02 11:00:00
252 2019-01-02 12:00:00
253 2019-01-02 13:00:00
254 2019-01-02 14:00:00
255 2019-01-02 15:00:00
256 2019-01-02 16:00:00
257 2019-01-02 17:00:00
258 2019-01-02 18:00:00
259 2019-01-02 19:00:00
260 2019-01-02 20:00:00
261 2019-01-02 21:00:00
262 2019-01-02 22:00:00
263 2019-01-02 23:00:00
264 2019-01-03 00:00:00
This will work with multiple years because we are only filtering on the month and day.
# change range to include 2017
df = pd.DataFrame(pd.date_range('20171223', '20190103', freq='H'), columns=['date'])
df = df.loc[
(df.date.dt.strftime('%m-%d') < '12-24') &
(df.date.dt.strftime('%m-%d') >= '01-02')
].copy()
print(df)
date
0 2017-12-23 00:00:00
1 2017-12-23 01:00:00
2 2017-12-23 02:00:00
3 2017-12-23 03:00:00
4 2017-12-23 04:00:00
5 2017-12-23 05:00:00
6 2017-12-23 06:00:00
7 2017-12-23 07:00:00
8 2017-12-23 08:00:00
9 2017-12-23 09:00:00
10 2017-12-23 10:00:00
11 2017-12-23 11:00:00
12 2017-12-23 12:00:00
13 2017-12-23 13:00:00
14 2017-12-23 14:00:00
15 2017-12-23 15:00:00
16 2017-12-23 16:00:00
17 2017-12-23 17:00:00
18 2017-12-23 18:00:00
19 2017-12-23 19:00:00
20 2017-12-23 20:00:00
21 2017-12-23 21:00:00
22 2017-12-23 22:00:00
23 2017-12-23 23:00:00
240 2018-01-02 00:00:00
241 2018-01-02 01:00:00
242 2018-01-02 02:00:00
243 2018-01-02 03:00:00
244 2018-01-02 04:00:00
245 2018-01-02 05:00:00
... ...
8779 2018-12-23 19:00:00
8780 2018-12-23 20:00:00
8781 2018-12-23 21:00:00
8782 2018-12-23 22:00:00
8783 2018-12-23 23:00:00
9000 2019-01-02 00:00:00
9001 2019-01-02 01:00:00
9002 2019-01-02 02:00:00
9003 2019-01-02 03:00:00
9004 2019-01-02 04:00:00
9005 2019-01-02 05:00:00
9006 2019-01-02 06:00:00
9007 2019-01-02 07:00:00
9008 2019-01-02 08:00:00
9009 2019-01-02 09:00:00
9010 2019-01-02 10:00:00
9011 2019-01-02 11:00:00
9012 2019-01-02 12:00:00
9013 2019-01-02 13:00:00
9014 2019-01-02 14:00:00
9015 2019-01-02 15:00:00
9016 2019-01-02 16:00:00
9017 2019-01-02 17:00:00
9018 2019-01-02 18:00:00
9019 2019-01-02 19:00:00
9020 2019-01-02 20:00:00
9021 2019-01-02 21:00:00
9022 2019-01-02 22:00:00
9023 2019-01-02 23:00:00
9024 2019-01-03 00:00:00
Since you want this to happen for every year, we can first define a series that where we replace the year by a static value (2000 for example). Let date be the column that stores the date, we can generate such column as:
dt = pd.to_datetime({'year': 2000, 'month': df['date'].dt.month, 'day': df['date'].dt.day})
For the given sample data, we get:
>>> dt
0 2000-12-23
1 2000-12-23
2 2000-12-23
3 2000-12-23
4 2000-12-23
5 2000-12-23
6 2000-12-23
7 2000-12-24
8 2000-12-24
9 2000-12-24
10 2000-12-24
11 2000-12-24
12 2000-12-24
13 2000-12-24
14 2000-01-01
15 2000-01-01
16 2000-01-01
17 2000-01-01
18 2000-01-01
19 2000-01-02
20 2000-01-02
21 2000-01-02
22 2000-01-02
23 2000-01-02
24 2000-01-02
25 2000-01-02
26 2000-01-02
dtype: datetime64[ns]
Next we can filter the rows, like:
from datetime import date
df[(dt >= date(2000,1,2)) & (dt < date(2000,12,24))]
This gives us the following data for your sample data:
>>> df[(dt >= date(2000,1,2)) & (dt < date(2000,12,24))]
id dt
0 7505 2003-12-23 17:00:00
1 7506 2003-12-23 18:00:00
2 7507 2003-12-23 19:00:00
3 7508 2003-12-23 20:00:00
4 7509 2003-12-23 21:00:00
5 7510 2003-12-23 22:00:00
6 7511 2003-12-23 23:00:00
19 7728 2004-01-02 00:00:00
20 7729 2004-01-02 01:00:00
21 7730 2004-01-02 02:00:00
22 7731 2004-01-02 03:00:00
23 7732 2004-01-02 04:00:00
24 7733 2004-01-02 05:00:00
25 7734 2004-01-02 06:00:00
26 7735 2004-01-02 07:00:00
So regardless what the year is, we will only consider dates between the 2nd of January and the 23rd of December (both inclusive).
I have a dataframe like shown below
df = pd.DataFrame({'time':['2166-01-09 14:00:00','2166-01-09 14:08:00','2166-01-09 16:00:00','2166-01-09 20:00:00',
'2166-01-09 04:00:00','2166-01-10 05:00:00','2166-01-10 06:00:00','2166-01-10 07:00:00','2166-01-10 11:00:00',
'2166-01-10 11:30:00','2166-01-10 12:00:00','2166-01-10 13:00:00','2166-01-10 13:30:00']})
I am trying to find a time difference between rows. For which I did the below
df['time2'] = df['time'].shift(-1)
df['tdiff'] = (df['time2'] - df['time'])
So, my result looks like as shown below
I found out that there exists a function like dt.days and I tried
df['tdiff'].dt.days
but it only gives the day component but am looking for something like 'hours` component
However, I would like to have my output like as shown below
I am sorry that I am not sure how to calculate the hour equivalent of negative time in row no 3. Might be that's an data issue.
In pandas is possible convert timedeltas to seconds by Series.dt.total_seconds and then divide by 3600:
df['tdiff'] = (df['time2'] - df['time']).dt.total_seconds() / 3600
print (df)
time time2 tdiff
0 2166-01-09 14:00:00 2166-01-09 14:08:00 0.133333
1 2166-01-09 14:08:00 2166-01-09 16:00:00 1.866667
2 2166-01-09 16:00:00 2166-01-09 20:00:00 4.000000
3 2166-01-09 20:00:00 2166-01-09 04:00:00 -16.000000
4 2166-01-09 04:00:00 2166-01-10 05:00:00 25.000000
5 2166-01-10 05:00:00 2166-01-10 06:00:00 1.000000
6 2166-01-10 06:00:00 2166-01-10 07:00:00 1.000000
7 2166-01-10 07:00:00 2166-01-10 11:00:00 4.000000
8 2166-01-10 11:00:00 2166-01-10 11:30:00 0.500000
9 2166-01-10 11:30:00 2166-01-10 12:00:00 0.500000
10 2166-01-10 12:00:00 2166-01-10 13:00:00 1.000000
11 2166-01-10 13:00:00 2166-01-10 13:30:00 0.500000
12 2166-01-10 13:30:00 NaT NaN
I want to calculate time difference between two columns on specific time range.
I try df.between_time but it only works on index.
Ex. Time range: between 18.00 - 8.00
Data :
start stop
0 2018-07-16 16:00:00 2018-07-16 20:00:00
1 2018-07-11 08:03:00 2018-07-11 12:03:00
2 2018-07-13 17:54:00 2018-07-13 21:54:00
3 2018-07-14 13:09:00 2018-07-14 17:09:00
4 2018-07-20 00:21:00 2018-07-20 04:21:00
5 2018-07-20 17:00:00 2018-07-21 09:00:00
Expect Result :
start stop time_diff
0 2018-07-16 16:00:00 2018-07-16 20:00:00 02:00:00
1 2018-07-11 08:03:00 2018-07-11 12:03:00 0
2 2018-07-13 17:54:00 2018-07-13 21:54:00 03:54:00
3 2018-07-14 13:09:00 2018-07-14 17:09:00 0
4 2018-07-20 00:21:00 2018-07-20 04:21:00 04:00:00
5 2018-07-20 17:00:00 2018-07-21 09:00:00 14:00:00
Note: If time_diff > 1 days, I already deal with that case.
Question: Should I build a function to do this or there are pandas build-in function to do this? Any help or guide would be appreciated.
I think this can be a solution
tmp = pd.DataFrame({'time1': pd.to_datetime(['2018-07-16 16:00:00', '2018-07-11 08:03:00',
'2018-07-13 17:54:00', '2018-07-14 13:09:00',
'2018-07-20 00:21:00', '2018-07-20 17:00:00']),
'time2': pd.to_datetime(['2018-07-16 20:00:00', '2018-07-11 12:03:00',
'2018-07-13 21:54:00', '2018-07-14 17:09:00',
'2018-07-20 04:21:00', '2018-07-21 09:00:00'])})
time1_date = tmp.time1.dt.date.astype(str)
tmp['rule18'], tmp['rule08'] = pd.to_datetime(time1_date + ' 18:00:00'), pd.to_datetime(time1_date + ' 08:00:00')
# if stop exceeds 18:00:00, compute time difference from this hour
tmp['time_diff_rule1'] = np.where(tmp.time2 > tmp.rule18, (tmp.time2 - tmp.rule18), (tmp.time2 - tmp.time1))
# rearrange the dataframe with your second rule
tmp['time_diff_rule2'] = np.where((tmp.time2 < tmp.rule18) & (tmp.time1 > tmp.rule08), 0, tmp['time_diff_rule1'])
time_diff_rule1 time_diff_rule2
0 02:00:00 02:00:00
1 04:00:00 00:00:00
2 03:54:00 03:54:00
3 04:00:00 00:00:00
4 04:00:00 04:00:00
5 15:00:00 15:00:00