I have following dataframe in pandas
start_date start_time end_time
2018-01-01 23:55:00 00:05:00
2018-01-02 00:05:00 00:10:00
2018-01-03 23:59:00 00:05:00
I want to calculate the time difference. But, for 1st and 3rd observation, there is a date change in end_time.
How can I do it in pandas?
Currently, I am using the logic where end_time is less than start_time I am creating one more column called end_date where it increments the start_date by 1 and then subtracts the time.
Is there any other way to do it?
Solution working with timedeltas - if difference are days equal -1 then add one day:
df['start_time'] = pd.to_timedelta(df['start_time'])
df['end_time'] = pd.to_timedelta(df['end_time'])
d = df['end_time'] - df['start_time']
df['diff'] = d.mask(d.dt.days == -1, d + pd.Timedelta(1, unit='d'))
print (df)
start_date start_time end_time diff
0 2018-01-01 23:55:00 00:05:00 00:10:00
1 2018-01-02 00:05:00 00:10:00 00:05:00
2 2018-01-03 23:59:00 00:05:00 00:06:00
Another solution:
s = df['end_time'] - df['start_time']
df['diff'] = np.where(df['end_time'] < df['start_time'],
s + pd.Timedelta(1, unit='d'),
s)
print (df)
start_date start_time end_time diff
0 2018-01-01 23:55:00 00:05:00 00:10:00
1 2018-01-02 00:05:00 00:10:00 00:05:00
2 2018-01-03 23:59:00 00:05:00 00:06:00
Related
I have the following dataframe df:
Datetime1 Datetime2 Value
2018-01-01 00:00 2018-01-01 01:00 5
2018-01-01 01:00 2018-01-01 02:00 1
2018-01-01 02:00 2018-01-01 03:00 2
2018-01-01 03:00 2018-01-01 04:00 3
2018-01-01 04:00 2018-01-01 05:00 6
I want to set a multi index composed of Datetime1 and Datetime2 to further proceed with the data resampling and interpolation (from 1 hour to 30 minutes frequency).
If I do df.set_index(["Datetime1","Datetime2"]).resample("30T").ffill(), then it fails.
Desired output:
Datetime1 Datetime2 Value
2018-01-01 00:00 2018-01-01 01:00 5
2018-01-01 00:30 2018-01-01 01:30 5
2018-01-01 01:00 2018-01-01 02:00 1
2018-01-01 01:30 2018-01-01 02:30 1
...
If there is one hour difference is possible create MultiIndex after resample with add 1H to new DatetimeIndex:
df = df.set_index(["Datetime1"])[['Value']].resample("30T").ffill()
df = df.set_index([df.index.rename('Datetime2') + pd.Timedelta('1H')], append=True)
print (df)
Value
Datetime1 Datetime2
2018-01-01 00:00:00 2018-01-01 01:00:00 5
2018-01-01 00:30:00 2018-01-01 01:30:00 5
2018-01-01 01:00:00 2018-01-01 02:00:00 1
2018-01-01 01:30:00 2018-01-01 02:30:00 1
2018-01-01 02:00:00 2018-01-01 03:00:00 2
2018-01-01 02:30:00 2018-01-01 03:30:00 2
2018-01-01 03:00:00 2018-01-01 04:00:00 3
2018-01-01 03:30:00 2018-01-01 04:30:00 3
2018-01-01 04:00:00 2018-01-01 05:00:00 6
Or:
s = df.set_index(["Datetime1"])['Value'].resample("30T").ffill()
s.index = [s.index,s.index.rename('Datetime2') + pd.Timedelta('1H')]
print (s)
Datetime1 Datetime2
2018-01-01 00:00:00 2018-01-01 01:00:00 5
2018-01-01 00:30:00 2018-01-01 01:30:00 5
2018-01-01 01:00:00 2018-01-01 02:00:00 1
2018-01-01 01:30:00 2018-01-01 02:30:00 1
2018-01-01 02:00:00 2018-01-01 03:00:00 2
2018-01-01 02:30:00 2018-01-01 03:30:00 2
2018-01-01 03:00:00 2018-01-01 04:00:00 3
2018-01-01 03:30:00 2018-01-01 04:30:00 3
2018-01-01 04:00:00 2018-01-01 05:00:00 6
Name: Value, dtype: int64
The multi-index is not meant for a double-index but for a hierarchical (grouped) index. See the docs. You said in the comments, that Datetime2 is always offset by 1 hour. That means it's probably fastest to recalculate it:
df.set_index("Datetime1","Datetime2").resample("30T").ffill()
df["Datetime2" = df.index + pd.Timedelta(1, "hour")
I've developed a crude method to round timestamps to the previous 15 mins. For instance, if the timestamp is 8:10:00, it gets rounded to 8:00:00.
However, when it goes over 15 mins it rounds to the previous hour. For instance, if the timestamp was 8:20:00, it gets rounded to 7:00:00 for some reason? I'll list the two examples below.
Correct Rounding:
import pandas as pd
from datetime import datetime, timedelta
d = ({
'Time' : ['8:00:00'],
})
df = pd.DataFrame(data=d)
df['Time'] = pd.to_datetime(df['Time'])
FirstTime = df['Time'].iloc[0]
def hour_rounder(t):
return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
-timedelta(hours=t.minute//15))
StartTime = hour_rounder(FirstTime)
StartTime = datetime.time(StartTime)
print(StartTime)
Out:
08:00:00
Incorrect Rounding:
import pandas as pd
from datetime import datetime, timedelta
d = ({
'Time' : ['8:20:00'],
})
df = pd.DataFrame(data=d)
df['Time'] = pd.to_datetime(df['Time'])
FirstTime = df['Time'].iloc[0]
def hour_rounder(t):
return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
-timedelta(hours=t.minute//15))
StartTime = hour_rounder(FirstTime)
StartTime = datetime.time(StartTime)
print(StartTime)
Out:
07:00:00
I don't understand what I'm doing wrong?
- timedelta(hours=t.minute//15)
If minute is 20, then minute // 15 equals 1, so you're subtracting one hour.
Try this instead:
return t.replace(second=0, microsecond=0, minute=(t.minute // 15 * 15), hour=t.hour)
Use .dt.floor('15min') to round down to 15 minute invervals.
import pandas as pd
df = pd.DataFrame({'Time': pd.date_range('2018-01-01', freq='13.141min', periods=13)})
df['prev_15'] = df.Time.dt.floor('15min')
Output:
Time prev_15
0 2018-01-01 00:00:00.000 2018-01-01 00:00:00
1 2018-01-01 00:13:08.460 2018-01-01 00:00:00
2 2018-01-01 00:26:16.920 2018-01-01 00:15:00
3 2018-01-01 00:39:25.380 2018-01-01 00:30:00
4 2018-01-01 00:52:33.840 2018-01-01 00:45:00
5 2018-01-01 01:05:42.300 2018-01-01 01:00:00
6 2018-01-01 01:18:50.760 2018-01-01 01:15:00
7 2018-01-01 01:31:59.220 2018-01-01 01:30:00
8 2018-01-01 01:45:07.680 2018-01-01 01:45:00
9 2018-01-01 01:58:16.140 2018-01-01 01:45:00
10 2018-01-01 02:11:24.600 2018-01-01 02:00:00
11 2018-01-01 02:24:33.060 2018-01-01 02:15:00
12 2018-01-01 02:37:41.520 2018-01-01 02:30:00
There is also .dt.round() and .dt.ceil() if you need to get the nearest 15 minute, or the following 15 minute invterval respectively.
I have a dataframe where I need to group the TX/RX column into pairs, and then put these into a new dataframe with a new index and the timedelta between them as values.
df = pd.DataFrame()
df['time1'] = pd.date_range('2018-01-01', periods=6, freq='H')
df['time2'] = pd.date_range('2018-01-01', periods=6, freq='1H1min')
df['id'] = ids
df['val'] = vals
time1 time2 id val
0 2018-01-01 00:00:00 2018-01-01 00:00:00 1 A
1 2018-01-01 01:00:00 2018-01-01 01:01:00 2 B
2 2018-01-01 02:00:00 2018-01-01 02:02:00 3 A
3 2018-01-01 03:00:00 2018-01-01 03:03:00 4 B
4 2018-01-01 04:00:00 2018-01-01 04:04:00 5 A
5 2018-01-01 05:00:00 2018-01-01 05:05:00 6 B
needs to be...
index timedelta A B
0 1 1 2
1 1 3 4
2 1 5 6
I think that pivot_tables or stack/unstack is probably the best way to go about this, but I'm not entirely sure how...
I believe you need:
df = pd.DataFrame()
df['time1'] = pd.date_range('2018-01-01', periods=6, freq='H')
df['time2'] = df['time1'] + pd.to_timedelta([60,60,120,120,180,180], 's')
df['id'] = range(1,7)
df['val'] = ['A','B'] * 3
df['t'] = df['time2'] - df['time1']
print (df)
time1 time2 id val t
0 2018-01-01 00:00:00 2018-01-01 00:01:00 1 A 00:01:00
1 2018-01-01 01:00:00 2018-01-01 01:01:00 2 B 00:01:00
2 2018-01-01 02:00:00 2018-01-01 02:02:00 3 A 00:02:00
3 2018-01-01 03:00:00 2018-01-01 03:02:00 4 B 00:02:00
4 2018-01-01 04:00:00 2018-01-01 04:03:00 5 A 00:03:00
5 2018-01-01 05:00:00 2018-01-01 05:03:00 6 B 00:03:00
#if necessary convert to seconds
#df['t'] = (df['time2'] - df['time1']).dt.total_seconds()
df = df.pivot('t','val','id').reset_index().rename_axis(None, axis=1)
#if necessary aggregate values
#df = (df.pivot_table(index='t',columns='val',values='id', aggfunc='mean')
# .reset_index().rename_axis(None, axis=1))
print (df)
t A B
0 00:01:00 1 2
1 00:02:00 3 4
2 00:03:00 5 6
I am trying to iterate over every quarter of the year, this is what I have so far.
now = datetime.now()
first_day = datetime(year=now.year, month=1, day=1)
print("--",first_day)
hundredDaysLater = first_day - timedelta(days=100)
for dt in rrule.rrule(rrule.MONTHLY, dtstart=first_day, bymonthday=(31, -1), count=6, interval=3):
print(dt.replace(day=1))
print(dt)
output
-- 2018-01-01 00:00:00
2018-01-01 00:00:00
2018-01-31 00:00:00
2018-04-01 00:00:00
2018-04-30 00:00:00
2018-07-01 00:00:00
2018-07-31 00:00:00
2018-10-01 00:00:00
2018-10-31 00:00:00
2019-01-01 00:00:00
2019-01-31 00:00:00
2019-04-01 00:00:00
Now I basicly need to get this output
2018-01-01, 2018-03-31
2018-04-01, 2018-06-30
2018-07-01, 2018-09-30
2018-10-01, 2018-12-31
But instead of counting forwards I need to count backwards.
Using relativedelta (from the dateutil package):
>>> from dateutil.relativedelta import relativedelta
>>> from datetime import date
>>> d = date(2019, 1, 1)
>>> day = relativedelta(days=1)
>>> quarter = relativedelta(months=3)
>>> while True:
... print(d - day)
... print(d - quarter)
... d -= quarter
...
2018-12-31
2018-10-01
2018-09-30
2018-07-01
2018-06-30
2018-04-01
2018-03-31
2018-01-01
2017-12-31
2017-10-01
2017-09-30
2017-07-01
2017-06-30
2017-04-01
2017-03-31
...
Short solution with pandas.date_range function:
In [708]: start_date = datetime.datetime(2018, 1, 1)
In [709]: data = pd.date_range(start=start_date, periods=4, freq='QS-JAN').union(
pd.date_range(start=start_date, periods=4, freq='Q-DEC')).strftime('%Y-%m-%d')
In [710]: for i in range(0, data.size, 2):
...: print(data[i], data[i+1], sep=', ')
...:
2018-01-01, 2018-03-31
2018-04-01, 2018-06-30
2018-07-01, 2018-09-30
2018-10-01, 2018-12-31
I'm trying to resample a dataframe with a time series from 1-hour increments to 15-minute. Both .resample() and .asfreq() do almost exactly what I want, but I'm having a hard time filling the last three intervals.
I could add an extra hour at the end, resample, and then drop that last hour, but it feels hacky.
Current code:
df = pd.DataFrame({'date':pd.date_range('2018-01-01 00:00', '2018-01-01 01:00', freq = '1H'), 'num':5})
df = df.set_index('date').asfreq('15T', method = 'ffill', how = 'end').reset_index()
Current output:
date num
0 2018-01-01 00:00:00 5
1 2018-01-01 00:15:00 5
2 2018-01-01 00:30:00 5
3 2018-01-01 00:45:00 5
4 2018-01-01 01:00:00 5
Desired output:
date num
0 2018-01-01 00:00:00 5
1 2018-01-01 00:15:00 5
2 2018-01-01 00:30:00 5
3 2018-01-01 00:45:00 5
4 2018-01-01 01:00:00 5
5 2018-01-01 01:15:00 5
6 2018-01-01 01:30:00 5
7 2018-01-01 01:45:00 5
Thoughts?
Not sure about asfreq but reindex works wonderfully:
df.set_index('date').reindex(
pd.date_range(
df.date.min(),
df.date.max() + pd.Timedelta('1H'), freq='15T', closed='left'
),
method='ffill'
)
num
2018-01-01 00:00:00 5
2018-01-01 00:15:00 5
2018-01-01 00:30:00 5
2018-01-01 00:45:00 5
2018-01-01 01:00:00 5
2018-01-01 01:15:00 5
2018-01-01 01:30:00 5
2018-01-01 01:45:00 5