Measuring elapsed time in Pandas


I'm trying to make a simple analysis of my sport activities where I have elapsed time in the string format like this:
00:22:05
00:30:34
00:30:31
00:37:19
00:28:43
00:22:08
I've tried converting it to the pandas datetime type, but I'm only interested in the duration of my activities, so that I can calculate, for instance, the mean or how long I paused during a whole run.
I tried the code below, but it doesn't resolve my issue.
df_test['Elapsed time'] = pd.to_datetime(df_test['Elapsed time'], format = '%H:%M:%S')
Any ideas how I can make that work? I've been trying to find answers but nothing helps. And I'm still new to Pandas. Thanks in advance.

Welcome to StackOverflow. I think the question you are looking to answer is how to convert the time string to a datetime format without the date portion. Doing so requires only a minor modification to your code.
pd.to_datetime(df['Elapsed Time'], format = '%H:%M:%S').dt.time
Complete code:
import pandas as pd
data_dict = { 'Elapsed Time': ['00:22:05', '00:30:34', '00:30:31', '00:37:19', '00:28:43', '00:22:08'] }
df = pd.DataFrame.from_dict(data_dict)
df['Formatted Time'] = pd.to_datetime(df['Elapsed Time'], format = '%H:%M:%S').dt.time
type(df['Elapsed Time'][0]) # 'str'
type(df['Formatted Time'][0]) # 'datetime.time'
Computing with Time
In order to perform analysis of the data you'll need to convert the time value to something useful, such as seconds. Here I'll present two methods of doing that.
The first method performs manual calculations using the original time string.
def total_seconds_in_time_string(time_string):
    segments = time_string.strip().split(':')
    # segments: [ 'HH', 'MM', 'SS' ]
    # total seconds = (((HH * 60) + MM) * 60) + SS
    return (((int(segments[0]) * 60) + int(segments[1])) * 60) + int(segments[2])
df['Total Seconds'] = df['Elapsed Time'].apply(total_seconds_in_time_string)
type(df['Total Seconds'][0]) # 'numpy.int64'
df['Total Seconds'].mean() # 1713.3333333333333
def seconds_to_timestring(secs):
    import time
    time_secs = time.gmtime(round(secs))
    return time.strftime('%H:%M:%S', time_secs)
avg_time_str = seconds_to_timestring(df['Total Seconds'].mean())
print(avg_time_str) # '00:28:33'
The second method would be the more Pythonic solution using the datetime library.
def total_seconds_in_time(t):
    from datetime import timedelta
    return timedelta(hours=t.hour, minutes=t.minute, seconds=t.second) / timedelta(seconds=1)
df['TimeDelta Seconds'] = df['Formatted Time'].apply(total_seconds_in_time)
type(df['TimeDelta Seconds'][0]) # 'numpy.float64'
df['TimeDelta Seconds'].mean() # 1713.3333333333333
def seconds_to_timedelta(secs):
    from datetime import timedelta
    return timedelta(seconds=round(secs))
mean_avg = seconds_to_timedelta(df['TimeDelta Seconds'].mean())
print(mean_avg) # '0:28:33'
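A third option, not part of the original answer, is to let pandas parse the strings as durations directly. The sketch below assumes every value is a well-formed 'HH:MM:SS' string; the column name 'Duration' and the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Elapsed Time": ["00:22:05", "00:30:34", "00:30:31",
                     "00:37:19", "00:28:43", "00:22:08"]
})

# Parse the strings as durations (timedelta64), not as points in time
df["Duration"] = pd.to_timedelta(df["Elapsed Time"])

# Timedelta columns support mean(), sum(), min(), max() directly
mean_duration = df["Duration"].mean()

# The same mean expressed as a float number of seconds
mean_seconds = df["Duration"].dt.total_seconds().mean()

# Round to whole seconds for display
rounded = pd.Timedelta(seconds=round(mean_duration.total_seconds()))
print(rounded)
print(mean_seconds)
```

Keeping the column as a timedelta avoids the round trip through plain seconds when all you need is an average or a sum.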

Related

How to calculate relative volume using pandas with faster way?

I am trying to implement the RVOL by the time of day technical indicator, which can be used as the indication of market strength.
The logic behind this is as follows:
If the current time is 2022/3/19 13:00, we look through the same moment (13:00) at the previous N days and average all the previous volumes at that moment to calculate Average_volume_previous.
Then, RVOL(t) is volume(t)/Average_volume_previous(t).
It is hard to express this logic with methods like rolling and apply, so I wrote the code below with a for loop.
However, the loop takes an extremely long time to run.
from datetime import datetime
import pandas as pd
import numpy as np
datetime_array = pd.date_range(datetime.strptime('2015-03-19 13:00:00', '%Y-%m-%d %H:%M:%S'), datetime.strptime("2022-03-19 13:00:00", '%Y-%m-%d %H:%M:%S'), freq='30min')
volume_array = pd.Series(np.random.uniform(1000, 10000, len(datetime_array)))
df = pd.DataFrame({'Date':datetime_array, 'Volume':volume_array})
df.set_index(['Date'], inplace=True)
output = []
# Note: day_len (the lookback N, in days) is not defined in the original snippet
for idx in range(len(df)):
    date = str(df.index[idx].hour)+':'+str(df.index[idx].minute)
    temp_date = df.iloc[:idx].between_time(date, date)
    output.append(temp_date.tail(day_len).mean().iloc[0])
output = np.array(output)
Practically, there might be missing data in the datetime array. So, it would be hard to use fixed length lookback period to solve this. Is there any way to make this code work faster?

I'm not sure I fully understand the question, but this is my solution as far as I understand it.
I didn't use the date as the index, so this line is commented out:
# df.set_index(['Date'], inplace=True)
# Filter data to find the moment (13:00)
rolling_day = 10
hour = df['Date'].dt.hour == 13
minute = df['Date'].dt.minute == 0
df_moment = df[hour & minute].copy()
Calculation of moving averages
df_moment['rolling'] = df_moment['Volume'].rolling(rolling_day).mean()
Calculation of volume(t)/Average_volume_previous(t)
for idx_s, idx_e in zip(df_moment['Volume'][::rolling_day], df_moment['rolling'][rolling_day::rolling_day]):
    print(f'{idx_s/idx_e}')
Output:
0.566379345408499
0.7229214799940626
0.6753586759429548
2.0588617812341354
0.7494803741982076
1.2132554086225438
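For every time of day at once, one possible vectorized sketch groups the rows by their time of day and averages the previous N observations inside each group. This is not taken from either post: the function name rvol_by_time_of_day, the parameter n_days, and the deterministic sample data are assumptions made for illustration, and min_periods=1 is a choice that allows shorter histories at the start of the series.

```python
import numpy as np
import pandas as pd

def rvol_by_time_of_day(frame, n_days):
    """Volume divided by the mean volume of the previous n_days rows
    that share the same time of day (hypothetical helper)."""
    same_time = frame.groupby(frame.index.time)["Volume"]
    # shift(1) excludes the current row, so only earlier days are averaged
    avg_prev = same_time.transform(
        lambda s: s.shift(1).rolling(n_days, min_periods=1).mean()
    )
    return frame["Volume"] / avg_prev

# Deterministic sample: 30 days of 30-minute bars, volume 1, 2, 3, ...
idx = pd.date_range("2022-01-01 00:00", periods=48 * 30, freq="30min")
df = pd.DataFrame({"Volume": np.arange(len(idx), dtype=float) + 1.0}, index=idx)

df["RVOL"] = rvol_by_time_of_day(df, n_days=5)
print(df["RVOL"].iloc[300])
```

Because the lookback counts previous observations within each time-of-day group rather than a fixed row offset, a missing bar only shortens that group's history instead of misaligning the window.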

How to add a certain time to a datetime?

I want to add hours to a datetime and use:
date = date_object + datetime.timedelta(hours=6)
Now I want to add a time:
time='-7:00' (string) plus 4 hours.
I tried hours=time+4 but this doesn't work. I think I have to int the string like int(time) but this doesn't work either.

It is better to parse your time string as shown below and then read the time components from the attributes of the parsed datetime object.
from datetime import datetime, timedelta
input_time = datetime.strptime(yourtimestring,'yourtimeformat')
input_seconds = input_time.second # for seconds
input_minutes = input_time.minute # for minutes
input_hours = input_time.hour # for hours
# Usage: input_time = datetime.strptime("07:00","%H:%M")
After that, use timedelta to compose the duration.
new_time = initial_datetime + timedelta(hours=input_hours,minutes=input_minutes,seconds=input_seconds)
See docs strptime
and datetime format

You need to convert to a datetime object in order to add timedelta to your current time, then return it back to just the time portion.
Using date.today() just uses the arbitrary current date and sets the time to the time you supply. This allows you to add over days and reset the clock to 00:00.
dt.time() prints out the result you were looking for.
from datetime import date, datetime, time, timedelta
dt = datetime.combine(date.today(), time(7, 00)) + timedelta(hours=4)
print(dt.time())
Edit:
To get from a string such as time='7:00' to the hour and minute, you could split on the colon and then reference each part.
this_time = this_time.split(':') # make it a list split at :
this_hour = this_time[0]
this_min = this_time[1]
Edit 2:
To put it all back together then:
from datetime import date, datetime, time, timedelta
this_time = '7:00'
this_time = this_time.split(':') # make it a list split at :
this_hour = int(this_time[0])
this_min = int(this_time[1])
dt = datetime.combine(date.today(), time(this_hour, this_min)) + timedelta(hours=4)
print(dt.time())
If you already have a full date to use, as mentioned in the comments, you should convert it to a datetime using strptime. I think another answer walks through how to use it so I'm not going to put an example.
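The question's string has a leading minus sign ('-7:00'), which the snippets above do not handle. A minimal sketch, assuming the string is always a signed 'H:MM' offset, is to turn it into a timedelta first; the helper name parse_signed_hhmm and the base datetime are made up for this example.

```python
from datetime import datetime, timedelta

def parse_signed_hhmm(text):
    """Convert a string such as '-7:00' or '7:30' into a timedelta
    (hypothetical helper; assumes an optional sign followed by H:MM)."""
    text = text.strip()
    sign = -1 if text.startswith("-") else 1
    hours, minutes = text.lstrip("+-").split(":")
    return sign * timedelta(hours=int(hours), minutes=int(minutes))

# '-7:00' plus 4 hours is a net offset of -3 hours
offset = parse_signed_hhmm("-7:00") + timedelta(hours=4)

base = datetime(2020, 1, 1, 12, 0)  # arbitrary example datetime
result = base + offset
print(result)
```

Working with a timedelta keeps the sign intact, whereas strptime can only parse a time of day.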

Have a list of hours between two dates in python

I have two times and I want to make a list of all the hours between them using the same format in Python
from= '2016-12-02T11:00:00.000Z'
to= '2017-06-06T07:00:00.000Z'
hours=to-from
so the result will be something like this
2016-12-02T11:00:00.000Z
2016-12-02T12:00:00.000Z
2016-12-02T13:00:00.000Z
..... and so on
How can I do this and what kind of plugin should I use?

If possible I would recommend using pandas.
import pandas
# Note: recent pandas releases (2.2+) deprecate 'H' in favor of the lowercase alias 'h'
time_range = pandas.date_range('2016-12-02T11:00:00.000Z', '2017-06-06T07:00:00.000Z', freq='H')
If you need strings then use the following:
timestamps = [str(x) + 'Z' for x in time_range]
# Output
# ['2016-12-02 11:00:00+00:00Z',
# '2016-12-02 12:00:00+00:00Z',
# '2016-12-02 13:00:00+00:00Z',
# '2016-12-02 14:00:00+00:00Z',
# '2016-12-02 15:00:00+00:00Z',
# '2016-12-02 16:00:00+00:00Z',
# ...]
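The strings above carry both a '+00:00' offset and a trailing 'Z', which differs from the input format in the question. A possible way to reproduce that format is sketched below. It uses a shortened end time to keep the output small, and it passes the frequency as a Timedelta only so that the example does not depend on which frequency alias a given pandas version accepts.

```python
import pandas as pd

time_range = pd.date_range(
    "2016-12-02T11:00:00.000Z",
    "2016-12-02T14:00:00.000Z",   # shortened end time for the example
    freq=pd.Timedelta(hours=1),
)

# Build 'YYYY-MM-DDTHH:MM:SS.mmmZ' with exactly three fractional digits
timestamps = [
    ts.strftime("%Y-%m-%dT%H:%M:%S.") + f"{ts.microsecond // 1000:03d}Z"
    for ts in time_range
]
print(timestamps)
```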

A simpler solution using the standard library's datetime package:
from datetime import datetime, timedelta
DATE_TIME_STRING_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'
from_date_time = datetime.strptime('2016-12-02T11:00:00.000Z',
                                   DATE_TIME_STRING_FORMAT)
to_date_time = datetime.strptime('2017-06-06T07:00:00.000Z',
                                 DATE_TIME_STRING_FORMAT)
date_times = [from_date_time.strftime(DATE_TIME_STRING_FORMAT)]
date_time = from_date_time
while date_time < to_date_time:
    date_time += timedelta(hours=1)
    date_times.append(date_time.strftime(DATE_TIME_STRING_FORMAT))
will give us
>>>date_times
['2016-12-02T11:00:00.000000Z',
'2016-12-02T12:00:00.000000Z',
'2016-12-02T13:00:00.000000Z',
'2016-12-02T14:00:00.000000Z',
'2016-12-02T15:00:00.000000Z',
'2016-12-02T16:00:00.000000Z',
'2016-12-02T17:00:00.000000Z',
'2016-12-02T18:00:00.000000Z',
'2016-12-02T19:00:00.000000Z',
'2016-12-02T20:00:00.000000Z',
...]

Pandas; transform column with MM:SS,decimals into number of seconds

Hey: I spent several hours trying to do a quite simple thing, but couldn't figure it out.
I have a dataframe with a column, df['Time'], which contains times starting from 0 up to 20 minutes, like this:
1:10,10
1:16,32
3:03,04
The first part is minutes, the second is seconds, and the third is milliseconds (only two digits).
Is there a way to automatically transform that column into seconds with Pandas, and without making that column the time index of the series?
I already tried the following but it won't work:
pd.to_datetime(df['Time']).convert('s') # AttributeError: 'Series' object has no attribute 'convert'
If the only way is to parse the time, just point that out and I will prepare a proper, detailed answer to this question; don't waste your time =)
Thank you!

Code:
import pandas as pd
import numpy as np
import datetime
df = pd.DataFrame({'Time':['1:10,10', '1:16,32', '3:03,04']})
df['time'] = df.Time.apply(lambda x: datetime.datetime.strptime(x,'%M:%S,%f'))
df['timedelta'] = df.time - datetime.datetime.strptime('00:00,0','%M:%S,%f')
df['secs'] = df['timedelta'].apply(lambda x: x / np.timedelta64(1, 's'))
print(df)
Output:
      Time                       time       timedelta    secs
0  1:10,10 1900-01-01 00:01:10.100000 00:01:10.100000   70.10
1  1:16,32 1900-01-01 00:01:16.320000 00:01:16.320000   76.32
2  3:03,04 1900-01-01 00:03:03.040000 00:03:03.040000  183.04
If you have also negative time deltas:
import pandas as pd
import numpy as np
import datetime
import re
regex = re.compile(r"(?P<minus>-)?((?P<minutes>\d+):)?(?P<seconds>\d+)(,(?P<centiseconds>\d{2}))?")
def parse_time(time_str):
    parts = regex.match(time_str)
    if not parts:
        return
    parts = parts.groupdict()
    time_params = {}
    for (name, param) in parts.items():  # items() replaces Python 2's iteritems()
        if param and (name != 'minus'):
            time_params[name] = int(param)
    # timedelta has no centiseconds argument, so convert to milliseconds
    time_params['milliseconds'] = time_params['centiseconds']*10
    del time_params['centiseconds']
    return (-1 if parts['minus'] else 1) * datetime.timedelta(**time_params)
df = pd.DataFrame({'Time':['-1:10,10', '1:16,32', '3:03,04']})
df['timedelta'] = df.Time.apply(lambda x: parse_time(x))
df['secs'] = df['timedelta'].apply(lambda x: x / np.timedelta64(1, 's'))
print(df)
Output:
       Time         timedelta    secs
0  -1:10,10  -00:01:10.100000  -70.10
1   1:16,32   00:01:16.320000   76.32
2   3:03,04   00:03:03.040000  183.04
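If the column only contains non-negative values in the 'M:SS,ff' layout shown in the question, a vectorized alternative is to split the string with a regular expression and do the arithmetic on whole columns. This is only a sketch under that assumption; the group names minutes, seconds and frac are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({"Time": ["1:10,10", "1:16,32", "3:03,04"]})

# One named group per component; rows that do not match become NaN
parts = df["Time"].str.extract(
    r"^(?P<minutes>\d+):(?P<seconds>\d+),(?P<frac>\d+)$"
)

df["secs"] = (
    parts["minutes"].astype(int) * 60
    + parts["seconds"].astype(int)
    + ("0." + parts["frac"]).astype(float)  # '10' -> 0.10 seconds
)
print(df)
```

This avoids calling a Python function per row, and the Time column never has to become the index.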

How to calculate the time interval between two time strings

I have two times, a start and a stop time, in the format of 10:33:26 (HH:MM:SS). I need the difference between the two times. I've been looking through documentation for Python and searching online and I would imagine it would have something to do with the datetime and/or time modules. I can't get it to work properly and keep finding only how to do this when a date is involved.
Ultimately, I need to calculate the averages of multiple time durations. I got the time differences to work and I'm storing them in a list. I now need to calculate the average. I'm using regular expressions to parse out the original times and then doing the differences.
For the averaging, should I convert to seconds and then average?

Yes, definitely datetime is what you need here. Specifically, the datetime.strptime() method, which parses a string into a datetime object.
from datetime import datetime
s1 = '10:33:26'
s2 = '11:15:49' # for example
FMT = '%H:%M:%S'
tdelta = datetime.strptime(s2, FMT) - datetime.strptime(s1, FMT)
That gets you a timedelta object that contains the difference between the two times. You can do whatever you want with that, e.g. converting it to seconds or adding it to another datetime.
This will return a negative result if the end time is earlier than the start time, for example s1 = 12:00:00 and s2 = 05:00:00. If you want the code to assume the interval crosses midnight in this case (i.e. it should assume the end time is never earlier than the start time), you can add the following lines to the above code:
if tdelta.days < 0:
    tdelta = timedelta(
        days=0,
        seconds=tdelta.seconds,
        microseconds=tdelta.microseconds
    )
(of course you need to include from datetime import timedelta somewhere). Thanks to J.F. Sebastian for pointing out this use case.
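To illustrate what can be done with the resulting timedelta, here is a short sketch that converts the difference from the example above to seconds and minutes and adds it to another datetime. The date used for the variable later is arbitrary.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"
tdelta = datetime.strptime("11:15:49", FMT) - datetime.strptime("10:33:26", FMT)

# Total length of the interval as a float number of seconds
seconds = tdelta.total_seconds()

# Dividing two timedeltas gives a plain float, here the length in minutes
minutes = tdelta / timedelta(minutes=1)

# A timedelta can be added to any datetime
later = datetime(2020, 1, 1, 8, 0, 0) + tdelta

print(seconds, minutes, later)
```

Averaging several durations then reduces to averaging their total_seconds() values, or to summing the timedelta objects and dividing by their count.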

Try this -- it's efficient for timing short-term events. If something takes more than an hour, then the final display probably will want some friendly formatting.
import time
start = time.time()
time.sleep(10) # or do something more productive
done = time.time()
elapsed = done - start
print(elapsed)
The time difference is returned as the number of elapsed seconds.
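For timing short pieces of code, the standard library also offers time.perf_counter(), a monotonic clock intended for measuring short durations that is not affected by system clock adjustments. A minimal sketch of the same pattern, with a very short sleep standing in for real work:

```python
import time

start = time.perf_counter()
time.sleep(0.01)  # stand-in for the work being timed
elapsed = time.perf_counter() - start

# elapsed is a float number of seconds
print(f"{elapsed:.4f} s")
```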

Here's a solution that supports finding the difference even if the end time is earlier than the start time (an interval that crosses midnight), such as 23:55:00-00:25:00 (a half-hour duration):
#!/usr/bin/env python
from datetime import datetime, time as datetime_time, timedelta
def time_diff(start, end):
    if isinstance(start, datetime_time): # convert to datetime
        assert isinstance(end, datetime_time)
        start, end = [datetime.combine(datetime.min, t) for t in [start, end]]
    if start <= end: # e.g., 10:33:26-11:15:49
        return end - start
    else: # end < start e.g., 23:55:00-00:25:00
        end += timedelta(1) # +day
        assert end > start
        return end - start
for time_range in ['10:33:26-11:15:49', '23:55:00-00:25:00']:
    s, e = [datetime.strptime(t, '%H:%M:%S') for t in time_range.split('-')]
    print(time_diff(s, e))
    assert time_diff(s, e) == time_diff(s.time(), e.time())
Output
0:42:23
0:30:00
time_diff() returns a timedelta object that you can pass (as a part of the sequence) to a mean() function directly e.g.:
#!/usr/bin/env python
from datetime import timedelta
def mean(data, start=timedelta(0)):
    """Find arithmetic average."""
    return sum(data, start) / len(data)
data = [timedelta(minutes=42, seconds=23), # 0:42:23
        timedelta(minutes=30)] # 0:30:00
print(repr(mean(data)))
# -> datetime.timedelta(seconds=2171, microseconds=500000) # repr on Python 3.7+
The mean() result is also a timedelta object that you can convert to seconds (td.total_seconds() method, since Python 2.7), hours (td / timedelta(hours=1), Python 3), etc.

This site says to try:
import datetime as dt
start="09:35:23"
end="10:23:00"
start_dt = dt.datetime.strptime(start, '%H:%M:%S')
end_dt = dt.datetime.strptime(end, '%H:%M:%S')
diff = (end_dt - start_dt)
diff.seconds/60
This forum uses time.mktime()

The structure that represents a time difference in Python is called timedelta. If you have start_time and end_time as datetime objects, you can calculate the difference using the - operator:
diff = end_time - start_time
Do this before converting to a particular string format (e.g. before start_time.strftime(...)). If you already have a string representation, convert it back to a time/datetime using the strptime method.

I like how this guy does it: https://amalgjose.com/2015/02/19/python-code-for-calculating-the-difference-between-two-time-stamps.
Not sure if it has any cons.
But it looks neat to me :)
from datetime import datetime
from dateutil.relativedelta import relativedelta
t_a = datetime.now()
t_b = datetime.now()
def diff(t_a, t_b):
    t_diff = relativedelta(t_b, t_a) # later/end time comes first!
    return '{h}h {m}m {s}s'.format(h=t_diff.hours, m=t_diff.minutes, s=t_diff.seconds)
As for the question itself, you still need to use datetime.strptime(), as others said earlier.

Try this
import datetime
import time
start_time = datetime.datetime.now().time().strftime('%H:%M:%S')
time.sleep(5)
end_time = datetime.datetime.now().time().strftime('%H:%M:%S')
total_time=(datetime.datetime.strptime(end_time,'%H:%M:%S') - datetime.datetime.strptime(start_time,'%H:%M:%S'))
print(total_time)
OUTPUT :
0:00:05

import datetime as dt
from dateutil.relativedelta import relativedelta
start = "09:35:23"
end = "10:23:00"
start_dt = dt.datetime.strptime(start, "%H:%M:%S")
end_dt = dt.datetime.strptime(end, "%H:%M:%S")
timedelta_obj = relativedelta(start_dt, end_dt)
print(
    timedelta_obj.years,
    timedelta_obj.months,
    timedelta_obj.days,
    timedelta_obj.hours,
    timedelta_obj.minutes,
    timedelta_obj.seconds,
)
result:
0 0 0 0 -47 -37

Both time and datetime have a date component.
Normally if you are just dealing with the time part you'd supply a default date. If you are just interested in the difference and know that both times are on the same day then construct a datetime for each with the day set to today and subtract the start from the stop time to get the interval (timedelta).

Take a look at the datetime module and the timedelta objects. You should end up constructing a datetime object for the start and stop times, and when you subtract them, you get a timedelta.

you can use pendulum:
import pendulum
t1 = pendulum.parse("10:33:26")
t2 = pendulum.parse("10:43:36")
period = t2 - t1
print(period.seconds)
would output:
610

import datetime
day = int(input("day[1,2,3,..31]: "))
month = int(input("Month[1,2,3,...12]: "))
year = int(input("year[0~2020]: "))
start_date = datetime.date(year, month, day)
day = int(input("day[1,2,3,..31]: "))
month = int(input("Month[1,2,3,...12]: "))
year = int(input("year[0~2020]: "))
end_date = datetime.date(year, month, day)
time_difference = end_date - start_date
age = time_difference.days
print("Total days: " + str(age))

A concise option if you are only interested in elapsed time under 24 hours. You can format the output as needed in the return statement:
import datetime
def elapsed_interval(start, end):
    elapsed = end - start
    # mins instead of min, to avoid shadowing the built-in min()
    mins, secs = divmod(elapsed.days * 86400 + elapsed.seconds, 60)
    hours, mins = divmod(mins, 60)
    return '%.2d:%.2d:%.2d' % (hours, mins, secs)
if __name__ == '__main__':
    time_start = datetime.datetime.now()
    # do your process
    time_end = datetime.datetime.now()
    total_time = elapsed_interval(time_start, time_end)

Usually, you have more than one case to deal with and perhaps have it in a pd.DataFrame(data) format. Then:
import pandas as pd
df['duration'] = pd.to_datetime(df['stop time']) - pd.to_datetime(df['start time'])
gives you the time difference without any manual conversion.
Taken from Convert DataFrame column type from string to datetime.
If you are lazy and do not mind the overhead of pandas, then you could do this even for just one entry.
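As a concrete illustration of the DataFrame approach, the sketch below builds a small frame with made-up start and stop times, computes the durations, and averages them. Passing an explicit format is an addition here, intended to keep the parsing unambiguous.

```python
import pandas as pd

# Made-up sample data; the column names follow the snippet above
df = pd.DataFrame({
    "start time": ["10:33:26", "09:35:23"],
    "stop time": ["11:15:49", "10:23:00"],
})

fmt = "%H:%M:%S"
df["duration"] = (
    pd.to_datetime(df["stop time"], format=fmt)
    - pd.to_datetime(df["start time"], format=fmt)
)

# The duration column is a timedelta, so it can be averaged directly
mean_duration = df["duration"].mean()
mean_seconds = df["duration"].dt.total_seconds().mean()
print(mean_duration, mean_seconds)
```

Note that this gives negative durations for intervals that cross midnight, just like the plain datetime subtraction discussed earlier.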

Here is the code if the string also contains days, e.g. '-1 day 32:43:02'. It prints the total number of minutes, ignoring the sign and the seconds:
time = '-1 day 32:43:02'
print(
    (int(time.replace('-', '').split(' ')[0]) * 24) * 60
    + (int(time.split(' ')[-1].split(':')[0]) * 60)
    + int(time.split(' ')[-1].split(':')[1])
)
