python time interval algorithm sum - python

Assume I have two time intervals, such as 16:30 - 20:00 and 15:00 - 19:00. I need to find the total time covered by these two intervals, so the result here is 5 hours (I add both intervals and subtract the intersecting interval). How can I write a generic function that also deals with the other cases, such as one interval inside the other (so the result is the length of the bigger one) and no intersection (so the result is the sum of both intervals)?
My incoming data structure is primitive, simply a string like "15:30", so a conversion may be needed.
Thanks

from datetime import datetime, timedelta
START, END = range(2)
def tparse(timestring):
    return datetime.strptime(timestring, '%H:%M')
def sum_intervals(intervals):
    # turn every interval into two tagged events: (time, START) and (time, END)
    times = []
    for interval in intervals:
        times.append((tparse(interval[START]), START))
        times.append((tparse(interval[END]), END))
    times.sort()
    started = 0  # number of intervals currently open
    result = timedelta()
    for t, kind in times:
        if kind == START:
            if not started:
                start_time = t
            started += 1
        elif kind == END:
            started -= 1
            if not started:
                result += (t - start_time)
    return result
Testing with your times from the question:
intervals = [
    ('16:30', '20:00'),
    ('15:00', '19:00'),
]
print(sum_intervals(intervals))
That prints:
5:00:00
Testing it together with data that doesn't overlap:
intervals = [
    ('16:30', '20:00'),
    ('15:00', '19:00'),
    ('03:00', '04:00'),
    ('06:00', '08:00'),
    ('07:30', '11:00'),
]
print(sum_intervals(intervals))
Result:
11:00:00

I'll assume you can do the conversion to something like datetime on your own.
Sum the two intervals, then subtract any overlap. You can get the overlap by comparing the min and max of each of the two ranges.
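A minimal sketch of that idea, assuming the intervals have already been parsed into (start, end) pairs of datetime objects. The names `parse` and `covered_time` are illustrative and do not come from the answer. The overlap runs from the later start to the earlier end, clamped at zero when the intervals do not meet.

```python
from datetime import datetime, timedelta

def parse(hhmm):
    # Hypothetical helper: turn "15:30" into a datetime on a dummy date.
    return datetime.strptime(hhmm, "%H:%M")

def covered_time(int1, int2):
    (start1, end1), (start2, end2) = int1, int2
    # Overlap runs from the later start to the earlier end;
    # it comes out negative for disjoint intervals, so clamp it to zero.
    overlap = max(min(end1, end2) - max(start1, start2), timedelta(0))
    return (end1 - start1) + (end2 - start2) - overlap

a = (parse("16:30"), parse("20:00"))
b = (parse("15:00"), parse("19:00"))
print(covered_time(a, b))  # 5:00:00
```

The same expression covers the nested case (the overlap equals the inner interval) and the disjoint case (the overlap is zero) without separate branches.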

Code for when there is an overlap, please add it to one of your solutions:
def interval(i1, i2):
    minstart, minend = [min(*e) for e in zip(i1, i2)]
    maxstart, maxend = [max(*e) for e in zip(i1, i2)]
    if minend < maxstart:  # no overlap
        return (minend - minstart) + (maxend - maxstart)
    else:  # overlap
        return maxend - minstart

You'll want to convert your strings into datetimes. You can do this with datetime.datetime.strptime.
Given intervals of datetime.datetime objects, if the intervals are:
int1 = (start1, end1)
int2 = (start2, end2)
Then isn't it just:
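def total_time(int1, int2):
    (start1, end1), (start2, end2) = int1, int2
    if end1 < start2 or end2 < start1:
        # The intervals are disjoint.
        return (end1 - start1) + (end2 - start2)
    else:
        # The intervals overlap: measure from the earliest start to the latest end.
        return max(end1, end2) - min(start1, start2)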

Related

Python: Selecting longest consecutive series of dates in list

I have a series of lists (np.arrays, actually), of which the elements are dates.
id
0a0fe3ed-d788-4427-8820-8b7b696a6033 [2019-01-30, 2019-01-31, 2019-02-01, 2019-02-0...
0a48d1e8-ead2-404a-a5a2-6b05371200b1 [2019-01-30, 2019-01-31, 2019-02-01, 2019-02-0...
0a9edba1-14e3-466a-8d0c-f8a8170cefc8 [2019-01-29, 2019-01-30, 2019-01-31, 2019-02-0...
Name: startDate, dtype: object
For each element in the series (i.e. for each list of dates), I want to retain the longest sublist in which all dates are consecutive. I'm struggling to approach this in a pythonic (simple/efficient) way. The only approach that I can think of is to use multiple loops: loop over the series values (the lists), and loop over each element in the list. I would then store the first date and the number of consecutive days, and use temporary values to overwrite the results if a longer sequence of consecutive days is encountered. This seems highly inefficient though. Is there a better way of doing this?
Since you mention you are using numpy arrays of dates it makes sense to stick to numpy types instead of converting to the built-in type. I'm assuming here that your arrays have dtype 'datetime64[D]'. In that case you could do something like
import numpy as np
date_list = np.array(['2005-02-01', '2005-02-02', '2005-02-03',
    '2005-02-05', '2005-02-06', '2005-02-07', '2005-02-08', '2005-02-09',
    '2005-02-11', '2005-02-12',
    '2005-02-14', '2005-02-15', '2005-02-16', '2005-02-17',
    '2005-02-19', '2005-02-20',
    '2005-02-22', '2005-02-23', '2005-02-24',
    '2005-02-25', '2005-02-26', '2005-02-27', '2005-02-28'],
    dtype='datetime64[D]')
i0max, i1max = 0, 0
i0 = 0
for i1, date in enumerate(date_list):
    if date - date_list[i0] != np.timedelta64(i1 - i0, 'D'):
        if i1 - i0 > i1max - i0max:
            i0max, i1max = i0, i1
        i0 = i1
# the run that reaches the end of the array is never closed by a gap,
# so compare it explicitly
if len(date_list) - i0 > i1max - i0max:
    i0max, i1max = i0, len(date_list)
print(date_list[i0max:i1max])
# output: the seven dates from 2005-02-22 through 2005-02-28
Here, i0 and i1 indicate the start and stop indices of the current sub-array of consecutive dates, and i0max and i1max the start and stop indices of the longest sub-array found so far. The solution uses the fact that the difference between the i-th and zeroth entry in a list of consecutive dates is exactly i days.
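The same observation can be expressed with itertools.groupby, shown here as a sketch on plain datetime.date objects rather than NumPy arrays. Within a run of consecutive days, the date's ordinal minus its index is constant, so that difference can serve as a grouping key. The function name `longest_run` is made up for this illustration, and the input is assumed to be sorted with no duplicates.

```python
from datetime import date
from itertools import groupby

def longest_run(days):
    # Assumes `days` is sorted ascending with no duplicates.
    best = []
    # ordinal - index stays constant inside a run of consecutive days
    keyed = groupby(enumerate(days), key=lambda p: p[1].toordinal() - p[0])
    for _, group in keyed:
        run = [d for _, d in group]
        if len(run) > len(best):
            best = run  # ties keep the earliest run
    return best

days = [date(2005, 2, 1), date(2005, 2, 2), date(2005, 2, 3),
        date(2005, 2, 5), date(2005, 2, 6), date(2005, 2, 7),
        date(2005, 2, 8), date(2005, 2, 10)]
print(longest_run(days))
```

With this sample the longest run is the four days from 2005-02-05 to 2005-02-08.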
You can convert the list to ordinals, which increase by one for consecutive dates, i.e. next_date = previous_date + 1.
Then find the longest consecutive sub-array.
This takes O(n) time (a single loop), which is as good as it gets asymptotically, since every date has to be looked at once.
CODE
from datetime import datetime
def get_consecutive(date_list):
    # convert to ordinals
    v = [datetime.strptime(d, "%Y-%m-%d").toordinal() for d in date_list]
    consecutive = []
    run = []
    dates = []
    # get consecutive ordinal sequence
    for i in range(1, len(v) + 1):
        run.append(v[i-1])
        dates.append(date_list[i-1])
        if i == len(v) or v[i-1] + 1 != v[i]:
            if len(consecutive) < len(run):
                consecutive = dates
            dates = []
            run = []
    return consecutive
USAGE:
date_list = ['2019-01-29', '2019-01-30', '2019-01-31', '2019-02-05']
get_consecutive(date_list)
# the ordinals will be -> v = [737088, 737089, 737090, 737095]
OUTPUT:
['2019-01-29', '2019-01-30', '2019-01-31']
Now use get_consecutive with df.column.apply(get_consecutive); it will give you the longest run of consecutive dates for each row. Or you can call the function on each list if you are using some other data structure.
I'm going to reduce this problem to finding consecutive days in a single list. There are a few tricks that make it more Pythonic as you ask. The following script should run as-is. I've documented how it works inline:
from datetime import timedelta, date
# example input
days = [
    date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 4),
    date(2020, 1, 5), date(2020, 1, 6), date(2020, 1, 8),
]
# store the longest interval and the current consecutive interval
# as we iterate through a list
longest_interval_index = current_interval_index = 0
longest_interval_length = current_interval_length = 1
# using zip here to reduce the number of indexing operations
# this will turn the days list into [(2020-01-01, 2020-01-02), (2020-01-02, 2020-01-04), ...]
# use enumerate to get the index of the current day
for i, (previous_day, current_day) in enumerate(zip(days, days[1:]), start=1):
    if current_day - previous_day == timedelta(days=+1):
        # we've found a consecutive day! increase the interval length
        current_interval_length += 1
    else:
        # nope, not a consecutive day! start from this day and start
        # counting from 1
        current_interval_index = i
        current_interval_length = 1
    if current_interval_length > longest_interval_length:
        # we broke the record! record it as the longest interval
        longest_interval_index = current_interval_index
        longest_interval_length = current_interval_length
print("Longest interval index:", longest_interval_index)
print("Longest interval: ", days[longest_interval_index:longest_interval_index + longest_interval_length])
It should be easy enough to turn this into a reusable function.

how to calculate the time difference between the values in column over midnight?

I have a column of timestamp values for each hour for 25 days. I want to take the time difference between them. I have tried the code in the past but it gives me a negative value. I have this new code, but it is for two values, but how can I tweak it to loop over the values of column?
from datetime import datetime, timedelta, time as datetime_time
def time_diff(start, end):
    if isinstance(start, datetime_time):  # convert to datetime
        assert isinstance(end, datetime_time)
        start, end = [datetime.combine(datetime.min, t) for t in [start, end]]
    if start <= end:  # e.g., 10:33:26-11:15:49
        return end - start
    else:  # end < start e.g., 23:55:00-00:25:00
        end += timedelta(1)  # +day
        assert end > start
        return end - start
Use:
s = dataFrame.groupby(['DeviceName'])['Time_ist_td'].diff()
dataFrame['Time_diff'] = s.where(s > pd.Timedelta(0), -s).dt.total_seconds() // 60
Basic idea:
s = pd.Series([-1, 1])
print(s)
0   -1
1    1
dtype: int64
s = s.where(s > 0, -s)
print(s)
0    1
1    1
dtype: int64
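Note that flipping the sign turns a step from 23:55 to 00:25 into 23 hours 30 minutes rather than 30 minutes. A sketch of an alternative, using only the standard library and assuming each reading is less than 24 hours after the previous one: add a day whenever the difference comes out negative. The function name `diffs_over_midnight` and the sample times are illustrative.

```python
from datetime import datetime, timedelta

def diffs_over_midnight(times):
    # `times` is a list of "HH:MM:SS" strings in chronological order;
    # each reading is assumed to be < 24 h after the previous one.
    parsed = [datetime.strptime(t, "%H:%M:%S") for t in times]
    result = []
    for prev, cur in zip(parsed, parsed[1:]):
        delta = cur - prev
        if delta < timedelta(0):
            # the clock wrapped past midnight: add one day
            delta += timedelta(days=1)
        result.append(delta)
    return result

for d in diffs_over_midnight(["22:55:00", "23:55:00", "00:25:00"]):
    print(d)
```

For the sample readings this yields one hour and then thirty minutes.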

Is there a way to sum a list based on another list?

Python 3 sum a list of time based on dates from another list.
I used the code below to arrive at the total time, but I am trying to aggregate the time for each date, e.g. '02-01-2019' should sum up to '08:00:00'.
Date = ['01-01-2019', '02-01-2019', '02-01-2019']
Time = ['07:00:00', '06:00:00','02:00:00']
total = 0
for t in Time:
    h, m, s = map(int, t.split(":"))
    total += 3600*h + 60*m + s
d = "%02d:%02d:%02d" % (total / 3600, total / 60 % 60, total % 60)
I need an if statement to check whether the sum of time for each date is >= '08:00:00', e.g.
if time_for_each_date >= '08:00:00':
    do something
else:
    do something else
This might help get you started on accomplishing your ultimate goal:
Date = ['01-01-2019', '02-01-2019', '02-01-2019']
Time = ['07:00:00', '06:00:00', '02:00:00']
import datetime
data = zip(Date, Time)
dates = []
for d in data:
    dt = datetime.datetime.strptime("{}, {}".format(*d), "%m-%d-%Y, %H:%M:%S")
    dates.append(dt)
totals = {}
for d in dates:
    if d.date() not in totals: totals[d.date()] = d.hour
    else: totals[d.date()] += d.hour
for date, time in totals.items():
    if time >= 8:
        # do something
        print('do something:', date)
    else:
        print('do something else.')
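The loop above adds only the hour field, so minutes and seconds are dropped. A sketch that keeps them, assuming the same `Date` and `Time` lists and "HH:MM:SS" strings: accumulate a timedelta per date and compare it against timedelta(hours=8). The helper name `total_per_date` is illustrative.

```python
from collections import defaultdict
from datetime import timedelta

def total_per_date(dates, times):
    totals = defaultdict(timedelta)  # missing keys start at 0:00:00
    for d, t in zip(dates, times):
        h, m, s = map(int, t.split(":"))
        totals[d] += timedelta(hours=h, minutes=m, seconds=s)
    return dict(totals)

Date = ['01-01-2019', '02-01-2019', '02-01-2019']
Time = ['07:00:00', '06:00:00', '02:00:00']

for day, total in total_per_date(Date, Time).items():
    if total >= timedelta(hours=8):
        print(day, total, 'do something')
    else:
        print(day, total, 'do something else')
```

Comparing timedelta objects avoids comparing time strings, which only works by accident when every field is zero-padded.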

defining a function that gives me the distance between 2 dates

[screenshot not available]
I need to write a piece of code that calculates the distance between 2 dates, as shown in the screenshot above.
This is what I put:
def dist2dates(date1,date2):
    result= date2[0:2]-date1[0:2],"/",date2[3:]-date1[3:]
    return result
res2=dist2dates(1116,1129)
print(res2)
This produces an error that says:
TypeError: 'int' object is not subscriptable
I am not too sure what I am doing wrong. Also, I am not too clear on how to put a date such as "08/16" in the argument? Can someone help me define such a function?
This is the best I could come up with:
import re
def dist2dates(date1, date2):
    if re.match(r'^\d{2}\/\d{2}$', date1) and re.match(r'^\d{2}\/\d{2}$', date2):
        diff1 = abs(int(date2[0:2]) - int(date1[0:2]))
        diff2 = abs(int(date2[3:]) - int(date1[3:]))
        result = "%02d/%02d" % (diff1, diff2)
        return result
    else:
        return "Incorrect format"
res2 = dist2dates("11/16", "11/29")
print(res2)
Output:
00/13
It always returns the absolute value (via abs()), so you won't get negative numbers. It also checks that each date string is formatted correctly, using a regular expression.
=================== Before edit =================
Try this:
def dist2dates(date1, date2):
    result = "%02d/%02d" % (int(date2[0:2]) - int(date1[0:2]), int(date2[3:]) - int(date1[3:]))
    return result
res2 = dist2dates("10/16", "11/29")
print(res2)
Output:
01/13
The inputs to the dist2dates function should be strings not integers.
Cast the strings into integers so you can subtract the numbers.
The nature of this problem is a bit strange, since the year now has 360 days, but this structure handles both cases, whether the first date falls later or earlier in the year than the second. It computes the difference in days assuming 30-day months, converts that back into months and days, and uses zfill to pad the output.
def dist2dates(a, b):
    a = list(map(int, a.split('/')))
    b = list(map(int, b.split('/')))
    if b[0] < a[0]:
        days = 360 - 30*(a[0] - b[0]) - (a[1] - b[1])
    else:
        days = 30*((b[0] - 1) - a[0]) + (30 - a[1]) + b[1]
    print(days)
    tup = divmod(days, 30)
    mm, dd = tup[0], tup[1]
    return f'{str(mm).zfill(2)}/{str(dd).zfill(2)}'
print(dist2dates('03/20', '01/10')) # => 09/20
print(dist2dates('01/12', '05/20')) # => 04/08
Why not use the datetime module to calculate the timedelta, with strptime for parsing?
from datetime import datetime
def dist2dates(strdate1, strdate2):
    '''format mm/dd; the real day count is split into 30-day months'''
    dtfmt = '%m/%d'
    date1 = datetime.strptime(strdate1, dtfmt)
    date2 = datetime.strptime(strdate2, dtfmt)
    delta = date2 - date1
    month, day = divmod(delta.days, 30)
    # format directly: building a datetime fails when month or day is 0
    return '%02d/%02d' % (month, day)
res = dist2dates('01/16', '11/29')
print(res)
# 10/17 (317 calendar days = 10 * 30 + 17)

Find highest possible number under certain time span?

For Python 3, is there a possibility to find the highest possible calculated number in a function under a specific time span?
For example if something would take almost 'forever', is there a way to find out the highest possible number to be calculated under 1 minute?
Here is the code:
def fibonacci5(n):
    f1, f2 = 1, 0
    while n > 0:
        f1, f2 = f1 + f2, f1
        n -= 1
    return f2
I am trying to use the possible solution for finding the number that takes 1 second via timeit.
import timeit
repeats = 10
t = timeit.Timer("fibonacci5(500000)", globals=globals())
time = t.timeit(repeats)
print("average execution time:", time/repeats)
But 500,000 takes 2.6 s on average, while 250,000 takes 0.6 s on average, so the running time does not scale linearly and that solution can't work.
You could add a timer to your function to make it stop after a given time:
from datetime import datetime, timedelta
max_runtime = timedelta(seconds=1)
def fibonacci5(n):
    stop_time = datetime.now() + max_runtime
    f1, f2 = 1, 0
    while n > 0:
        f1, f2 = f1 + f2, f1
        n -= 1
        if datetime.now() > stop_time:
            return f2, 'timelimit reached'
    return f2
Note that if it returns because the time has run out, it will not return just a number but a tuple containing the number and the string 'timelimit reached'. That way you can differentiate between normal termination and a timeout (there may be better ways to handle that...).
The caveat here is that the if line (at least as long as your ints are still very small) is probably the line of the function that takes the most time, so the results will not represent the actual run times very exactly.
Also note that there are far more efficient ways to calculate Fibonacci numbers.
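One such method is fast doubling, sketched below. It relies on the identities `F(2k) = F(k) * (2*F(k+1) - F(k))` and `F(2k+1) = F(k)**2 + F(k+1)**2`, so it needs on the order of log n big-integer steps instead of n additions. The function name `fib_fast` is illustrative; indexing follows `fibonacci5` above, with F(0) = 0 and F(1) = 1.

```python
def fib_fast(n):
    # Returns F(n) with F(0) = 0, F(1) = 1, using fast doubling.
    def pair(k):
        # returns the tuple (F(k), F(k + 1))
        if k == 0:
            return 0, 1
        a, b = pair(k // 2)      # a = F(m), b = F(m + 1), with m = k // 2
        c = a * (2 * b - a)      # F(2m)
        d = a * a + b * b        # F(2m + 1)
        if k % 2 == 0:
            return c, d
        return d, c + d
    return pair(n)[0]

print(fib_fast(10))  # 55
```

The recursion depth grows only with the number of bits in n, so even very large indices stay far below Python's default recursion limit.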
If we write a Fibonacci sequence generator like
def fibonacci():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b
it looks naive but works fast enough, e.g. if you need the Fibonacci number at zero-based position 500000 of this sequence, you can use itertools.islice
from itertools import islice
fibonacci_500000 = next(islice(fibonacci(), 500000, 500001))
print(fibonacci_500000)
which took about 5 seconds on my old machine. The output is too big to insert, but it looks like
47821988144175...more digits here...2756008390626
But if you really need to find out which value has been calculated after some time, you can use timedelta and datetime objects like
from datetime import datetime, timedelta
def fibonacci():
    a, b = 0, 1
    while True:
        yield b
        a, b = b, a + b
if __name__ == '__main__':
    duration = timedelta(seconds=5)
    fibonacci_numbers = fibonacci()
    stop = datetime.now() + duration
    for index, number in enumerate(fibonacci_numbers, start=1):
        if datetime.now() >= stop:
            break
    print(index)
which gives us the 505352nd Fibonacci number, calculated after approximately 5 seconds (we can also print number, but it is too long)
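A variation on the same idea, offered as a sketch: read a monotonic clock instead of datetime.now(), and only every few hundred steps, so that the timer itself costs less. The function name `fib_within`, the `check_every` parameter and its default value are assumptions made for this illustration, and the index reached will differ from machine to machine.

```python
import time

def fib_within(seconds, check_every=256):
    # Returns (n, F(n)) for the last Fibonacci number computed
    # before the time budget ran out; F(0) = 0, F(1) = 1.
    deadline = time.monotonic() + seconds
    a, b = 0, 1   # a = F(n), b = F(n + 1)
    n = 0
    while True:
        a, b = b, a + b
        n += 1
        # reading the clock is relatively expensive, so do it rarely
        if n % check_every == 0 and time.monotonic() >= deadline:
            return n, a

n, value = fib_within(0.05)
print(n)
```

Because the clock is checked only every `check_every` steps, the function can overshoot the budget by the time those steps take, which becomes noticeable once the numbers are very large.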
