I am trying to implement the RVOL (relative volume) by time of day technical indicator, which can be used as an indication of market strength.
The logic behind this is as follows:
If the current time is 2022/3/19 13:00, we look at the same moment (13:00) on the previous N days and average the volumes at that moment to calculate Average_volume_previous.
Then, RVOL(t) is volume(t)/Average_volume_previous(t).
It is hard to express this logic with methods like rolling and apply, so the code I wrote uses a for loop.
However, the running time of the for loop is catastrophically long.
from datetime import datetime
import pandas as pd
import numpy as np
datetime_array = pd.date_range(datetime.strptime('2015-03-19 13:00:00', '%Y-%m-%d %H:%M:%S'), datetime.strptime("2022-03-19 13:00:00", '%Y-%m-%d %H:%M:%S'), freq='30min')
volume_array = pd.Series(np.random.uniform(1000, 10000, len(datetime_array)))
df = pd.DataFrame({'Date':datetime_array, 'Volume':volume_array})
df.set_index(['Date'], inplace=True)
output = []
day_len = 10  # N previous days to average; example value, not defined in the original snippet
for idx in range(len(df)):
    date = str(df.index[idx].hour)+':'+str(df.index[idx].minute)
    temp_date = df.iloc[:idx].between_time(date, date)
    output.append(temp_date.tail(day_len).mean().iloc[0])
output = np.array(output)
In practice, there might be missing data in the datetime array, so it would be hard to solve this with a fixed-length lookback period. Is there any way to make this code run faster?
I'm not sure I fully understand the question, but here is a solution based on my reading of it.
I didn't use the date as the index, so this line from the question is skipped:
# df.set_index(['Date'], inplace=True)
# Filter data to find instant
rolling_day = 10
hour = df['Date'].dt.hour == 13
minute = df['Date'].dt.minute == 0
df_moment = df[hour & minute].copy()
# Calculation of moving averages
df_moment['rolling'] = df_moment['Volume'].rolling(rolling_day).mean()
# Calculation of volume(t)/Average_volume_previous(t)
for idx_s, idx_e in zip(df_moment['Volume'][::rolling_day], df_moment['rolling'][rolling_day::rolling_day]):
    print(f'{idx_s/idx_e}')
Output:
0.566379345408499
0.7229214799940626
0.6753586759429548
2.0588617812341354
0.7494803741982076
1.2132554086225438
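A vectorized alternative to the loop in the question could look roughly like the sketch below. It is only one possible reading of the RVOL definition given above: rows are grouped by clock time, and each volume is divided by the mean of up to n_days earlier rows at that same clock time. The function name rvol_by_time_of_day and the parameter n_days are made up for this illustration. Because the lookback counts rows within each group rather than calendar days, gaps in the data do not shift the window.

```python
import numpy as np
import pandas as pd


def rvol_by_time_of_day(volume: pd.Series, n_days: int = 10) -> pd.Series:
    """Volume divided by the mean of the previous n_days volumes at the same clock time.

    `volume` is assumed to be indexed by a DatetimeIndex sorted in ascending order.
    """
    time_of_day = volume.index.time  # one datetime.time per row, used as the group key
    prev_avg = volume.groupby(time_of_day).transform(
        # shift(1) excludes the current row, so only earlier days are averaged
        lambda s: s.shift(1).rolling(n_days, min_periods=1).mean()
    )
    return volume / prev_avg


# Tiny illustrative input: two bars per day, with the last 13:30 bar missing
idx = pd.to_datetime([
    "2022-01-01 13:00", "2022-01-01 13:30",
    "2022-01-02 13:00", "2022-01-02 13:30",
    "2022-01-03 13:00",
])
vol = pd.Series([10.0, 100.0, 20.0, 200.0, 30.0], index=idx)
rvol = rvol_by_time_of_day(vol, n_days=2)
print(rvol)
```

The first row of each clock time has no history, so its RVOL is NaN.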
I am working with time series data and I would like to know if there is an efficient and Pythonic way to verify that the sequence of timestamps associated with the series is valid. In other words, I would like to know whether the sequence of timestamps is in the correct ascending order, without missing or duplicated values.
I suppose that verifying the correct order and the presence of duplicated values should be fairly straightforward but I am not so sure about the detection of missing timestamps.
numpy.diff can be used to find the difference between subsequent time stamps. These diffs can then be evaluated to determine if the timestamps look as expected:
import numpy as np
import datetime as dt
def errant_timestamps(ts, expected_time_step=None, tolerance=0.02):
    # get the time delta between subsequent time stamps
    ts_diffs = np.array([tsd.total_seconds() for tsd in np.diff(ts)])
    # get the expected delta
    if expected_time_step is None:
        expected_time_step = np.median(ts_diffs)
    # find the index of timestamps that don't match the spacing of the rest
    ts_slow_idx = np.where(ts_diffs < expected_time_step * (1-tolerance))[0] + 1
    ts_fast_idx = np.where(ts_diffs > expected_time_step * (1+tolerance))[0] + 1
    # find the errant timestamps
    ts_slow = ts[ts_slow_idx]
    ts_fast = ts[ts_fast_idx]
    # if the timestamps appear valid, return None
    if len(ts_slow) == 0 and len(ts_fast) == 0:
        return None
    # return any errant timestamps
    return ts_slow, ts_fast

sample_timestamps = np.array(
    [dt.datetime.strptime(sts, "%d%b%Y %H:%M:%S") for sts in (
        "05Jan2017 12:45:00",
        "05Jan2017 12:50:00",
        "05Jan2017 12:55:00",
        "05Jan2017 13:05:00",
        "05Jan2017 13:10:00",
        "05Jan2017 13:00:00",
    )]
)
print(errant_timestamps(sample_timestamps))
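If the timestamps are expected to fall on a regular grid, a pandas-based check is another option. The sketch below is an illustration, not part of the answer above: the helper name check_timestamps and its freq argument are made up here, and it assumes the expected spacing is known in advance. It reports ordering, duplicates, and missing grid points separately.

```python
import pandas as pd


def check_timestamps(index: pd.DatetimeIndex, freq: str) -> dict:
    """Report ordering, duplicated values, and grid points missing from `index`."""
    # The grid the timestamps are assumed to follow, from first to last value
    expected = pd.date_range(index.min(), index.max(), freq=freq)
    return {
        "sorted": bool(index.is_monotonic_increasing),  # non-strict: ties still count as sorted
        "duplicates": index[index.duplicated()],
        "missing": expected.difference(index),
    }


index = pd.to_datetime([
    "2017-01-05 12:45",
    "2017-01-05 12:50",
    "2017-01-05 12:50",  # duplicated value
    "2017-01-05 13:00",  # 12:55 is missing
])
report = check_timestamps(index, freq="5min")
print(report)
```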
I have a date range - say between 1925-01-01 and 1992-01-01. I'd like to generate a list of x dates between that range, and have those x dates generated follow a 'normal' (bell curve - see image) distribution.
There are many many answers on stackoverflow about doing this with integers (using numpy, scipy, etc), but I can't find a solid example with dates
As per @sascha's comment, a conversion from the dates to a time value does the job:
#!/usr/bin/env python3
import time
import numpy
_DATE_RANGE = ('1925-01-01', '1992-01-01')
_DATE_FORMAT = '%Y-%m-%d'
_EMPIRICAL_SCALE_RATIO = 0.15
_DISTRIBUTION_SIZE = 1000
def main():
    time_range = tuple(time.mktime(time.strptime(d, _DATE_FORMAT))
                       for d in _DATE_RANGE)
    distribution = numpy.random.normal(
        loc=(time_range[0] + time_range[1]) * 0.5,
        scale=(time_range[1] - time_range[0]) * _EMPIRICAL_SCALE_RATIO,
        size=_DISTRIBUTION_SIZE
    )
    date_range = tuple(time.strftime(_DATE_FORMAT, time.localtime(t))
                       for t in numpy.sort(distribution))
    print(date_range)

if __name__ == '__main__':
    main()
Note that instead of the _EMPIRICAL_SCALE_RATIO, you could (should?) use scipy.stats.truncnorm to generate a truncated normal distribution.
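The truncated-normal variant mentioned in that note could be sketched as follows. This assumes SciPy and pandas are installed; the function name truncated_normal_dates and its parameters are made up for this illustration. Note that scipy.stats.truncnorm takes its bounds a and b in units of standard deviations from loc, not as raw values.

```python
import numpy as np
import pandas as pd
from scipy.stats import truncnorm


def truncated_normal_dates(start, end, size, scale_ratio=0.15, seed=None):
    """Draw `size` sorted dates from a normal distribution truncated to [start, end]."""
    lo = pd.Timestamp(start).value  # nanoseconds since the Unix epoch (negative before 1970)
    hi = pd.Timestamp(end).value
    loc = (lo + hi) / 2
    scale = (hi - lo) * scale_ratio
    # truncnorm expects the clipping bounds in standard-deviation units
    a, b = (lo - loc) / scale, (hi - loc) / scale
    samples = truncnorm.rvs(a, b, loc=loc, scale=scale, size=size, random_state=seed)
    return pd.to_datetime(np.sort(samples).astype("int64"))


dates = truncated_normal_dates("1925-01-01", "1992-01-01", size=1000, seed=0)
print(dates.min(), dates.max())
```

Unlike the plain normal draw, no sample can land outside the requested range, so the scale ratio no longer has to be tuned to keep the tails inside it.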
Here is an implementation using the datetime module that also allows generating hours, minutes, and seconds, and uses a NumPy/pandas-friendly date format.
from datetime import datetime
import numpy
def main(start, end, date_format, distribution_size, scale_ratio):
    # Converting to timestamp
    start = datetime.strptime(start, date_format).timestamp()
    end = datetime.strptime(end, date_format).timestamp()
    # Generate Normal Distribution
    mu = datetime.strptime('1958-01-01T00:00:00', date_format).timestamp()
    sigma = (end - start) * scale_ratio
    # the module is imported as numpy, not np
    total_distribution = numpy.random.normal(loc=mu, scale=sigma, size=distribution_size)
    # Sort and Convert back to datetime
    sorted_distribution = numpy.sort(total_distribution)
    date_range = tuple(datetime.fromtimestamp(t) for t in sorted_distribution)
    print(date_range)

start = '1925-01-01T00:00:00'
end = '1992-01-01T00:00:00'
date_format = '%Y-%m-%dT%H:%M:%S'
main(start=start, end=end, date_format=date_format, distribution_size=1000, scale_ratio=0.05)
You can also blend multiple distributions like this:
dist_1 = np.random.normal(loc=mu_1, scale=sigma_1, size=size_1)
dist_2 = np.random.normal(loc=mu_2, scale=sigma_2, size=size_2)
all_distributions = np.concatenate([dist_1, dist_2])
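A self-contained version of that blending idea might look like the sketch below. The centers, spreads, and sizes are arbitrary values chosen for illustration, and pandas is assumed to be available for converting the samples back to dates.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)
year = 365.25 * 24 * 3600  # approximate length of a year in seconds

# Hypothetical centers, expressed in seconds since the Unix epoch
mu_1 = pd.Timestamp("1940-01-01").timestamp()
mu_2 = pd.Timestamp("1975-01-01").timestamp()

dist_1 = rng.normal(loc=mu_1, scale=3 * year, size=600)
dist_2 = rng.normal(loc=mu_2, scale=5 * year, size=400)
all_distributions = np.concatenate([dist_1, dist_2])

# Sort the blended samples and convert the seconds back to dates
dates = pd.to_datetime(np.sort(all_distributions), unit="s")
print(dates[:3])
```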
I want to linearly interpolate some points between two time strings.
So I tried converting the strings to datetime, inserting some points, and then converting the datetimes back to strings, but the timezone does not seem to be correct.
In the example below, I wish to insert one point between "9-28 11:07:57.435" and "9-28 12:00:00.773".
#!/usr/bin/env python
import numpy as np
from time import mktime
from datetime import datetime
#-----------------------------------------#
def main():
    dtstr = [
        "9-28 11:07:57.435",
        "9-28 12:00:00.773"
    ]
    print("input", dtstr)
    dtlst = str2dt(dtstr)
    floatlst = dt2float(dtlst)
    bins = 3
    x1 = list(np.arange(floatlst[0],floatlst[-1],(floatlst[-1]-floatlst[0])/bins))
    dtlst = float2dt(x1)
    dtstr = dt2str(dtlst)
    print("output", dtstr)
    return

def str2dt(strlst):
    dtlst = [datetime.strptime("2014-"+i, "%Y-%m-%d %H:%M:%S.%f") for i in strlst]
    return dtlst

def dt2float(dtlst):
    floatlst = [mktime(dt.timetuple()) for dt in dtlst]
    return floatlst

def dt2str(dtlst):
    dtstr = [dt.strftime("%Y-%m-%d %H:%M:%S %Z%z") for dt in dtlst]
    return dtstr

def float2dt(floatlst):
    dtlst = [datetime.utcfromtimestamp(seconds) for seconds in floatlst]
    return dtlst
#-----------------------------------------#
if __name__ == "__main__":
    main()
The output looks like:
input ['9-28 11:07:57.435', '9-28 12:00:00.773']
output ['2014-09-28 16:07:57 ', '2014-09-28 16:25:18 ', '2014-09-28 16:42:39 ']
Two questions here:
1. The input and output differ by 4 hours (9-28 16:07:57 vs. 9-28 11:07:57). I guess this is caused by the timezone, but I am not sure how to fix it.
2. I want the first and last points to be the same as the input, but the last output point now falls before the last input point (16:42:39 vs. 12:00:00).
Q1. You're right about the timezones. You're using time.mktime, which converts a struct_time to seconds assuming the input is local time, but then using datetime.utcfromtimestamp, which (naturally) converts to UTC. Use datetime.fromtimestamp instead to keep everything in local time.
Q2. As with the native Python range/xrange, when you do numpy.arange(x, y, z), the result starts with x and goes up to, but not including, y (except in odd floating-point roundoff cases; don't rely on that behaviour). If you want consistent behaviour at the end points with floating-point values, use numpy.linspace.
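As a rough sketch of the numpy.linspace suggestion: generating the fractions 0 to 1 with linspace and scaling the total timedelta by each fraction keeps both end points in the result. The year 2014 is prepended as in the question's code, and the variable names are only illustrative.

```python
from datetime import datetime

import numpy as np

start = datetime(2014, 9, 28, 11, 7, 57, 435000)
end = datetime(2014, 9, 28, 12, 0, 0, 773000)
bins = 3

# linspace includes both end points, so bins + 1 values give `bins` equal intervals
fractions = np.linspace(0.0, 1.0, bins + 1)
# timedelta * float is supported in Python 3
points = [start + (end - start) * float(f) for f in fractions]

for p in points:
    print(p.strftime("%Y-%m-%d %H:%M:%S.%f"))
```

No conversion to seconds is involved, so the timezone question does not arise here.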
On the other hand, why convert datetime to seconds, then go back again? datetime objects support addition and subtraction. Below would be my suggestion.
from datetime import datetime

def main():
    input_timestrings = ["9-28 11:07:57.435", "9-28 12:00:00.773"]
    input_datetimes = timestrings_to_datetimes(input_timestrings)
    start_datetime = input_datetimes[0]
    end_datetime = input_datetimes[1]
    # subtraction between datetime objects returns a timedelta object
    period_length = end_datetime - start_datetime
    bins = 3
    # operations with timedelta objects and datetime objects work pretty much as you'd expect
    delta = period_length / bins
    datetimes = list(custom_range(start_datetime, end_datetime + delta, delta))
    output_timestrings = datetimes_to_timestrings(datetimes)
    print(output_timestrings)
    return

def timestrings_to_datetimes(timestrings):
    datetimes = [datetime.strptime("2014-"+timestring, "%Y-%m-%d %H:%M:%S.%f") for timestring in timestrings]
    return datetimes

def datetimes_to_timestrings(datetimes):
    timestrings = [datetime_.strftime("%Y-%m-%d %H:%M:%S %Z%z") for datetime_ in datetimes]
    return timestrings

def custom_range(start, end, jump):
    x = start
    while x < end:
        yield x
        x = x + jump

if __name__ == "__main__":
    main()
I have two times, a start and a stop time, in the format of 10:33:26 (HH:MM:SS). I need the difference between the two times. I've been looking through documentation for Python and searching online and I would imagine it would have something to do with the datetime and/or time modules. I can't get it to work properly and keep finding only how to do this when a date is involved.
Ultimately, I need to calculate the averages of multiple time durations. I got the time differences to work and I'm storing them in a list. I now need to calculate the average. I'm using regular expressions to parse out the original times and then doing the differences.
For the averaging, should I convert to seconds and then average?
Yes, definitely datetime is what you need here. Specifically, the datetime.strptime() method, which parses a string into a datetime object.
from datetime import datetime
s1 = '10:33:26'
s2 = '11:15:49' # for example
FMT = '%H:%M:%S'
tdelta = datetime.strptime(s2, FMT) - datetime.strptime(s1, FMT)
That gets you a timedelta object that contains the difference between the two times. You can do whatever you want with that, e.g. converting it to seconds or adding it to another datetime.
This will return a negative result if the end time is earlier than the start time, for example s1 = 12:00:00 and s2 = 05:00:00. If you want the code to assume the interval crosses midnight in this case (i.e. it should assume the end time is never earlier than the start time), you can add the following lines to the above code:
if tdelta.days < 0:
    tdelta = timedelta(
        days=0,
        seconds=tdelta.seconds,
        microseconds=tdelta.microseconds
    )
(of course you need to include from datetime import timedelta somewhere). Thanks to J.F. Sebastian for pointing out this use case.
Try this -- it's efficient for timing short-term events. If something takes more than an hour, then the final display probably will want some friendly formatting.
import time
start = time.time()
time.sleep(10) # or do something more productive
done = time.time()
elapsed = done - start
print(elapsed)
The time difference is returned as the number of elapsed seconds.
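For the friendlier formatting mentioned above, one simple option is to hand the elapsed seconds to timedelta and use its string form. The helper name format_elapsed is made up for this sketch, and rounding to whole seconds is an assumption about what the display should show.

```python
import time
from datetime import timedelta


def format_elapsed(seconds: float) -> str:
    """Render a number of seconds as H:MM:SS, rounded to whole seconds."""
    return str(timedelta(seconds=round(seconds)))


start = time.time()
time.sleep(0.01)  # stand-in for the work being timed
elapsed = time.time() - start

print(format_elapsed(elapsed))
print(format_elapsed(3725.4))  # 1:02:05
```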
Here's a solution that supports finding the difference even if the end time is less than the start time (over midnight interval) such as 23:55:00-00:25:00 (a half an hour duration):
#!/usr/bin/env python
from datetime import datetime, time as datetime_time, timedelta
def time_diff(start, end):
    if isinstance(start, datetime_time): # convert to datetime
        assert isinstance(end, datetime_time)
        start, end = [datetime.combine(datetime.min, t) for t in [start, end]]
    if start <= end: # e.g., 10:33:26-11:15:49
        return end - start
    else: # end < start e.g., 23:55:00-00:25:00
        end += timedelta(1) # +day
        assert end > start
        return end - start

for time_range in ['10:33:26-11:15:49', '23:55:00-00:25:00']:
    s, e = [datetime.strptime(t, '%H:%M:%S') for t in time_range.split('-')]
    print(time_diff(s, e))
    assert time_diff(s, e) == time_diff(s.time(), e.time())
Output
0:42:23
0:30:00
time_diff() returns a timedelta object that you can pass (as a part of the sequence) to a mean() function directly e.g.:
#!/usr/bin/env python
from datetime import timedelta
def mean(data, start=timedelta(0)):
    """Find arithmetic average."""
    return sum(data, start) / len(data)

data = [timedelta(minutes=42, seconds=23), # 0:42:23
        timedelta(minutes=30)] # 0:30:00
print(repr(mean(data)))
# -> datetime.timedelta(seconds=2171, microseconds=500000) on Python 3.7+
# (older versions print datetime.timedelta(0, 2171, 500000): days, seconds, microseconds)
The mean() result is also a timedelta object that you can convert to seconds (the td.total_seconds() method, available since Python 2.7), hours (td / timedelta(hours=1) in Python 3), etc.
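As a small illustration of those conversions, here are the same two durations averaged and converted, assuming Python 3:

```python
from datetime import timedelta

durations = [timedelta(minutes=42, seconds=23), timedelta(minutes=30)]
average = sum(durations, timedelta(0)) / len(durations)

average_seconds = average.total_seconds()     # float number of seconds
average_hours = average / timedelta(hours=1)  # dividing two timedeltas gives a float

print(average, average_seconds, average_hours)
```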
This site says to try:
import datetime as dt
start="09:35:23"
end="10:23:00"
start_dt = dt.datetime.strptime(start, '%H:%M:%S')
end_dt = dt.datetime.strptime(end, '%H:%M:%S')
diff = (end_dt - start_dt)
diff.seconds/60
This forum uses time.mktime()
The structure that represents a time difference in Python is called timedelta. If you have start_time and end_time as datetime types, you can calculate the difference using the - operator:
diff = end_time - start_time
You should do this before converting to a particular string format (e.g. before start_time.strftime(...)). If you already have a string representation, you need to convert it back to time/datetime using the strptime method.
I like how this guy does it: https://amalgjose.com/2015/02/19/python-code-for-calculating-the-difference-between-two-time-stamps.
I'm not sure whether it has any downsides,
but it looks neat to me :)
from datetime import datetime
from dateutil.relativedelta import relativedelta
t_a = datetime.now()
t_b = datetime.now()
def diff(t_a, t_b):
    t_diff = relativedelta(t_b, t_a) # later/end time comes first!
    return '{h}h {m}m {s}s'.format(h=t_diff.hours, m=t_diff.minutes, s=t_diff.seconds)
Regarding the question, you still need to use datetime.strptime(), as others said earlier.
Try this
import datetime
import time
start_time = datetime.datetime.now().time().strftime('%H:%M:%S')
time.sleep(5)
end_time = datetime.datetime.now().time().strftime('%H:%M:%S')
total_time=(datetime.datetime.strptime(end_time,'%H:%M:%S') - datetime.datetime.strptime(start_time,'%H:%M:%S'))
print(total_time)
OUTPUT :
0:00:05
import datetime as dt
from dateutil.relativedelta import relativedelta
start = "09:35:23"
end = "10:23:00"
start_dt = dt.datetime.strptime(start, "%H:%M:%S")
end_dt = dt.datetime.strptime(end, "%H:%M:%S")
timedelta_obj = relativedelta(start_dt, end_dt)
print(
    timedelta_obj.years,
    timedelta_obj.months,
    timedelta_obj.days,
    timedelta_obj.hours,
    timedelta_obj.minutes,
    timedelta_obj.seconds,
)
result:
0 0 0 0 -47 -37
Both time and datetime have a date component.
Normally if you are just dealing with the time part you'd supply a default date. If you are just interested in the difference and know that both times are on the same day then construct a datetime for each with the day set to today and subtract the start from the stop time to get the interval (timedelta).
Take a look at the datetime module and the timedelta objects. You should end up constructing a datetime object for the start and stop times, and when you subtract them, you get a timedelta.
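A minimal sketch of that approach, attaching today's date to both times and subtracting, is shown below. The function name same_day_interval is made up for this illustration, and it assumes both times fall on the same day.

```python
from datetime import date, datetime, time


def same_day_interval(start: time, stop: time):
    """Return stop - start as a timedelta, assuming both times are on the same day."""
    today = date.today()
    return datetime.combine(today, stop) - datetime.combine(today, start)


interval = same_day_interval(time(10, 33, 26), time(11, 15, 49))
print(interval)  # 0:42:23
```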
You can use pendulum:
import pendulum
t1 = pendulum.parse("10:33:26")
t2 = pendulum.parse("10:43:36")
period = t2 - t1
print(period.seconds)
would output:
610
import datetime
day = int(input("day[1,2,3,..31]: "))
month = int(input("Month[1,2,3,...12]: "))
year = int(input("year[0~2020]: "))
start_date = datetime.date(year, month, day)
day = int(input("day[1,2,3,..31]: "))
month = int(input("Month[1,2,3,...12]: "))
year = int(input("year[0~2020]: "))
end_date = datetime.date(year, month, day)
time_difference = end_date - start_date
age = time_difference.days
print("Total days: " + str(age))
This is concise if you are only interested in elapsed times under 24 hours. You can format the output as needed in the return statement:
import datetime
def elapsed_interval(start, end):
    elapsed = end - start
    # renamed from min to avoid shadowing the built-in min()
    total_minutes, secs = divmod(elapsed.days * 86400 + elapsed.seconds, 60)
    hour, minutes = divmod(total_minutes, 60)
    return '%.2d:%.2d:%.2d' % (hour, minutes, secs)

if __name__ == '__main__':
    time_start = datetime.datetime.now()
    """ do your process """
    time_end = datetime.datetime.now()
    total_time = elapsed_interval(time_start, time_end)
Usually, you have more than one case to deal with and perhaps have it in a pd.DataFrame(data) format. Then:
import pandas as pd
df['duration'] = pd.to_datetime(df['stop time']) - pd.to_datetime(df['start time'])
gives you the time difference without any manual conversion.
Taken from Convert DataFrame column type from string to datetime.
If you are lazy and do not mind the overhead of pandas, then you could do this even for just one entry.
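A small end-to-end sketch of the pandas approach, including the averaging asked about in the question, might look like this. The column names follow the snippet above; treating a negative duration as an interval that crossed midnight is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    "start time": ["10:33:26", "23:55:00"],
    "stop time": ["11:15:49", "00:25:00"],
})

start = pd.to_datetime(df["start time"], format="%H:%M:%S")
stop = pd.to_datetime(df["stop time"], format="%H:%M:%S")
duration = stop - start

# Assumption: a negative duration means the interval crossed midnight
one_day = pd.Timedelta(days=1)
df["duration"] = duration.where(duration >= pd.Timedelta(0), duration + one_day)

mean_duration = df["duration"].mean()
print(df)
print(mean_duration)
```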
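Here is code that converts the string to total minutes if it also contains days, e.g. [-1 day 32:43:02]:
time = "-1 day 32:43:02"  # example input from the line above; this name shadows the time module
print(
    (int(time.replace('-', '').split(' ')[0]) * 24) * 60
    + (int(time.split(' ')[-1].split(':')[0]) * 60)
    + int(time.split(' ')[-1].split(':')[1])
)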