Divide two timespans by each other in Python/dateutil

please consider this problem:
I have a date in the past from where I start adding periods. When adding these periods results in a date greater than today, I want to stop, and check what the last date is.
This is a functionality for calculating debit dates in a membership. A member joins, say, 2007-01-31. He is debited every month. Let's say today is 2013-03-29 (it actually is atm). So I need to start counting months since 2007-01-31 and when I get past today's date, I need to stop. I can then see that the next debit date is 2013-03-31.
I am using the dateutil library to implement this, adding relativedelta objects in a while loop until I surpass the current date. (I know it's probably not the best way, but I'm still quite new at Python and this is a proof-of-concept.) The problem is that when I add a month to 2007-01-31, the next date is 2007-02-28, which is correct. But on the next iteration the date is 2007-03-28, because dateutil doesn't recognize the 28th as the last day of the month, so it doesn't carry on to the last day of March. Of course, that's a perfectly valid implementation. I then experimented with dateutil's rrule object, but it follows the same principles. It outputs a list of dates, but it simply skips the months that don't have enough days.
import datetime
from dateutil.rrule import rrule, MONTHLY
period = rrule(MONTHLY, interval=1, dtstart=datetime.date(2012, 5, 31), until=datetime.date(2013, 3, 29))
print(list(period))
Then I thought of a different approach:
If I could count the number of periods in the timespan between 2007-01-31 and 2013-03-29, I could add that number of periods to the start date, and dateutil would return the right date.
The problem there is that the period isn't always one month. It can also be four weeks, a quarter or a year, for example.
I couldn't find a way to take a relativedelta and divide it by another relativedelta, to get the number of times the latter fits into the former.
If anyone can point me in the right direction I would appreciate it. Is there, for example, a library that can do this (divide timespans by each other, and output the result in a given timeblock, like months or weeks)? Is there perhaps a datediff function that accepts a period as an input (I know for example in vbscript you can get the difference between two dates in whatever period you want, be it weeks, months, days, whatever). Is there perhaps a totally different solution?
For completeness, I will include the code, but I think the text explains it all already:
def calculate(self, testdate=datetime.date.today()):
    self._testdate = testdate
    while self.next < self._testdate:
        self.next += self._ddinterval
    self.previous = self.next - self._ddinterval
    return self.next
Thanks,
Erik
edit: I now have a solution that does what it's supposed to, but it's hardly Pythonic, elegant or speedy. So the question remains the same, if anyone can come up with a better solution, please do. Here's what I came up with:
def calculate(self, testdate=datetime.date.today()):
    self._testdate = testdate
    start = self.next
    count = 0
    while self.next < self._testdate:
        count += 1
        self.next = start + (count * self._ddinterval)
    self.previous = self.next - self._ddinterval
    return self.next

Instead of using the new value after adding your period as the starting point for the next loop, create an ever-increasing delta instead:
today = datetime.date.today()
start = datetime.date(2007, 1, 31)
period = relativedelta(months=1)
delta = period
while start + delta < today:
    delta += period
next = start + delta
This results in:
>>> import datetime
>>> from dateutil.relativedelta import relativedelta
>>> today = datetime.date.today()
>>> start = datetime.date(2007, 1, 31)
>>> period = relativedelta(months=1)
>>> delta = period
>>> while start + delta < today:
...     delta += period
...
>>> delta
relativedelta(years=+6, months=+3)
>>> start + delta
datetime.date(2013, 4, 30)
This works for any variable period lengths including quarters and years. For periods measured in exact weeks or days you can use a timedelta() object too, but this is a generic solution.
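To see why offsetting from the original start date matters, here is a minimal standard-library sketch. `add_months` and `next_period_date` are hypothetical helpers written only for this illustration (they are not part of dateutil). `add_months` clamps the day to the length of the target month, which is the same behaviour the question observed when adding one month to 2007-01-31.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Shift `d` by `months`, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def next_period_date(start, months_per_period, today):
    """First date start + n periods (n >= 1) that is not before `today`."""
    count = 1
    # Always offset from `start`, never from the previous (clamped) result.
    while add_months(start, count * months_per_period) < today:
        count += 1
    return add_months(start, count * months_per_period)


start = date(2007, 1, 31)

# Chained: each step starts from the previous, already clamped, result.
chained = add_months(add_months(start, 1), 1)

# Cumulative: a growing offset applied to the original start date.
cumulative = add_months(start, 2)

print(chained)     # 2007-03-28
print(cumulative)  # 2007-03-31
print(next_period_date(start, 1, date(2013, 3, 29)))  # 2013-03-31
```

The last line reproduces the debit date the question expects for 2013-03-29. Passing 3 or 12 as `months_per_period` would cover quarters and years under the same assumptions.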
You cannot 'divide' time periods when using variable-width periods such as months and quarters. For time periods measured in whole days you can simply convert the difference between two dates to days (from the timedelta), take the modulus to see how far into the current period you are, and add the days still needed to complete that period:
period = datetime.timedelta(days=28) # 4 weeks
delta = today - start
remainder = delta.days % period.days # days already elapsed in the current period
end = today + datetime.timedelta(days=(period.days - remainder) % period.days)
which gives:
>>> period = datetime.timedelta(days=28) # 4 weeks
>>> delta = today - start
>>> remainder = delta.days % period.days
>>> today + datetime.timedelta(days=(period.days - remainder) % period.days)
datetime.date(2013, 4, 17)

If your delta is variable with respect to base time (i.e., 1 month can mean any of 28-31 days depending), then you're stuck with a loop.
If delta is a constant day count, however, you can sidestep iteration by converting to integers and doing a modulo operation.
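For the constant-day-count case, the "division" the question asks about can be sketched with plain integer arithmetic. `periods_until` is a hypothetical helper name, and the dates below are arbitrary example values; the function returns both the number of whole periods and the resulting date, with no loop.

```python
from datetime import date, timedelta


def periods_until(start, period_days, today):
    """Return (count, next_date) for a fixed-length period.

    `count` is the smallest whole number of periods such that
    start + count * period_days is on or after `today`.
    """
    elapsed_days = (today - start).days
    # Ceiling division on integers: -(-a // b) rounds up instead of down.
    count = max(-(-elapsed_days // period_days), 0)
    return count, start + timedelta(days=count * period_days)


count, next_date = periods_until(date(2007, 1, 31), 28, date(2013, 4, 3))
print(count, next_date)  # 81 2013-04-17
```

This only applies when every period has the same number of days (weeks, four-week blocks, and so on); months, quarters and years still need the loop described above.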

Related

How can I find the elapsed business hours between two dates using pandas' CustomBusinessHour objects?

If I want to find the number of hours between two datetime objects, I can do something like this:
from datetime import datetime
today = datetime.today()
day_after_tomorrow = datetime(2022, 9, 24)
diff = (day_after_tomorrow - today).total_seconds() / 3600
print(diff)
which returns: 37.58784580333333 hours.
But this is the number of real hours between two dates. I want to know the number of specific business hours between two dates.
I can define two CustomBusinessHour objects with pandas to specify those business hours (which are 8AM to 4:30PM M-F, and 8AM to 12PM on Saturday, excluding US Federal holidays):
from pandas.tseries.offsets import CustomBusinessHour
from pandas.tseries.holiday import USFederalHolidayCalendar
business_hours_mtf = CustomBusinessHour(calendar=USFederalHolidayCalendar(), start='08:00', end='16:30')
business_hours_sat = CustomBusinessHour(calendar=USFederalHolidayCalendar(), start='08:00', end='12:00')
My understanding is that CustomBusinessHour is a type of pandas DateOffset object, so it should behave just like a relativedelta object. So I should be able to use it in the datetime arithmetic somehow, to get the number I want.
And that's as far as I was able to get.
What I think I'm struggling to understand is how relativedeltas work, and how to actually use them in datetime arithmetic.
Is this the right approach? If so, how can I use these CustomBusinessHour objects to get an accurate amount of elapsed business hours between the two dates?
I figured out a solution. It feels ugly and hacky, but it seems to work. Hopefully someone else has a simpler or more elegant solution.
Edit: I cleaned up the documentation a little bit to make it easier to read. Also added a missing kwarg in business_hours_sat. Figuring this out was a headache, so if anyone else has to deal with this problem, hopefully this solution helps.
from datetime import datetime, timedelta
from pandas.tseries.offsets import CustomBusinessHour
from pandas.tseries.holiday import USFederalHolidayCalendar

business_hours_mtf = CustomBusinessHour(calendar=USFederalHolidayCalendar(), start='08:00', end='16:30')
business_hours_sat = CustomBusinessHour(calendar=USFederalHolidayCalendar(), weekmask='Sat', start='08:00', end='12:00')

def get_business_hours_range(earlier_date: datetime, later_date: datetime) -> float:
    """Return the number of business hours between `earlier_date` and `later_date` as a float with two decimal places.

    Algorithm:
    1. Increment `earlier_date` by 1 "business hour" until it's further in the future than `later_date`.
    2. Also increment an `elapsed_business_hours` variable by 1.
    3. Once `earlier_date` is larger (further in the future) than `later_date`...
        a. Roll back `earlier_date` by one business hour.
        b. Get the close of business hour for `earlier_date` ([3a]).
        c. Get the number of minutes between [3b] and [3a] (`minutes_remaining`).
        d. Create a timedelta with `elapsed_business_hours` and `minutes_remaining`
        e. Represent this timedelta as a float with two decimal places.
        f. Return this float.
    """
    # Count how many "business hours" have elapsed between the `earlier_date` and `later_date`.
    elapsed_business_hours = 0.0
    current_day_of_week = 0
    while earlier_date < later_date:
        day_of_week = earlier_date.isoweekday()
        # 6 = Saturday
        if day_of_week == 6:
            # Increment `earlier_date` by one "business hour", as specified by the `business_hours_sat` CBH object.
            earlier_date += business_hours_sat
            # Increment the counter of how many "business hours" have elapsed between these two dates.
            elapsed_business_hours += 1
            # Save the current day of the week in `earlier_date`, in case this is the last iteration of this while loop.
            current_day_of_week = day_of_week
        # 1 = Monday, 2 = Tuesday, ...
        elif day_of_week in (1, 2, 3, 4, 5):
            # Increment `earlier_date` by one "business hour", as specified by the `business_hours_mtf` CBH object.
            earlier_date += business_hours_mtf
            # Increment the counter of how many "business hours" have elapsed between these two dates.
            elapsed_business_hours += 1
            # Save the current day of the week in `earlier_date`, in case this is the last iteration of this while loop.
            current_day_of_week = day_of_week
    # Once we've incremented `earlier_date` to a date further in the future than `later_date`, we know that we've counted
    # all the full (60min) "business hours" between `earlier_date` and `later_date`. (We can only increment by one hour when using
    # CBH, so when we make this final increment, we may be skipping over a few minutes in that last day.)
    #
    # So now we roll `earlier_date` back by 1 business hour, to the last full business hour before `later_date`. Then we get the
    # close of business hour for that day, and subtract `earlier_date` from it. This will give us whatever minutes may be remaining
    # in that day, that weren't accounted for when tallying the number of "business hours".
    #
    # But before we do these things, we need to check what day of the week the last business hour is, so we know which closing time
    # to use.
    if current_day_of_week == 6:
        ed_rolled_back = earlier_date - business_hours_sat
        ed_closing_time = datetime.combine(ed_rolled_back, business_hours_sat.end[0])
    elif current_day_of_week in (1, 2, 3, 4, 5):
        ed_rolled_back = earlier_date - business_hours_mtf
        ed_closing_time = datetime.combine(ed_rolled_back, business_hours_mtf.end[0])
    minutes_remaining = (ed_closing_time - ed_rolled_back).total_seconds() / 60
    if 0 < minutes_remaining < 60:
        delta = timedelta(hours=elapsed_business_hours, minutes=minutes_remaining)
    else:
        delta = timedelta(hours=elapsed_business_hours)
    delta_hours = round(float(delta.total_seconds() / 3600), 2)
    return delta_hours
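For comparison, here is a minimal sketch of a different technique that needs only the standard library: walk the calendar days in the range and add up the overlap between the requested interval and each day's opening window. `HOURS` and `business_hours_between` are hypothetical names, the schedule mirrors the hours stated in the question, datetimes are assumed to be naive, and holidays are assumed to be passed in by the caller as a collection of `date` objects rather than taken from `USFederalHolidayCalendar`.

```python
from datetime import date, datetime, time, timedelta

# Assumed schedule: weekday index (Monday = 0) -> (opening, closing).
HOURS = {day: (time(8, 0), time(16, 30)) for day in range(5)}
HOURS[5] = (time(8, 0), time(12, 0))  # Saturday; Sunday has no entry


def business_hours_between(start, end, holidays=()):
    """Sum the overlap of [start, end] with each day's opening window."""
    total = timedelta()
    day = start.date()
    while day <= end.date():
        window = HOURS.get(day.weekday())
        if window is not None and day not in holidays:
            opens = datetime.combine(day, window[0])
            closes = datetime.combine(day, window[1])
            overlap = min(end, closes) - max(start, opens)
            if overlap > timedelta():
                total += overlap
        day += timedelta(days=1)
    return round(total.total_seconds() / 3600, 2)


# Friday 15:00 to Saturday 09:30: 1.5 h on Friday plus 1.5 h on Saturday.
print(business_hours_between(datetime(2022, 9, 23, 15, 0),
                             datetime(2022, 9, 24, 9, 30)))  # 3.0
```

Because it measures overlap directly, this sketch counts partial hours at both ends of the range and iterates once per calendar day rather than once per business hour.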

How to calculate working days between two dates using Python?

I did find this solution (see the linked article), but it is still not exactly what I was looking for. It calculates all the days between two dates, including weekends. So, is there a solution that excludes the weekends from the calculation?
So, what I did is take that solution and expand it this way:
crtD = datetime.datetime.strptime(pd.loc[x,'createDate'], '%m/%d/%Y') # start date
tdyD = datetime.datetime.today() # end date
dayx = tdyD - crtD # number of days between start and end date. Includes weekends
wkds = dayx.days + 1 # eliminates time stamp and just leaves the number of days and adds 1 day
wkns = round(wkds/7,0) # divides the number of days by seven and rounds the result to the nearest integer
networkdays = int(wkds - wkns) - 1
print(networkdays)
I embedded these lines of code in a for loop. Hope this helps. If you have a solution to include Holidays, please post it here.
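As a point of comparison for the estimate above, here is a minimal sketch that counts working days exactly by checking each day's weekday, with an optional set of holidays. `working_days` is a hypothetical helper, both end points are treated as inclusive, and the holiday dates are assumed to be supplied by the caller.

```python
from datetime import date, timedelta


def working_days(start, end, holidays=()):
    """Count Monday-Friday dates in [start, end], skipping `holidays`."""
    skipped = set(holidays)
    count = 0
    current = start
    while current <= end:
        # weekday(): Monday = 0 ... Sunday = 6, so 5 and 6 are the weekend.
        if current.weekday() < 5 and current not in skipped:
            count += 1
        current += timedelta(days=1)
    return count


# Monday 2022-09-19 through Sunday 2022-09-25.
print(working_days(date(2022, 9, 19), date(2022, 9, 25)))  # 5
print(working_days(date(2022, 9, 19), date(2022, 9, 25),
                   holidays=[date(2022, 9, 21)]))          # 4
```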

Python get date from weekday

Let's say I have 11 sessions to complete. I haven't set dates for these sessions, only the weekdays on which a session takes place. Let's say that when scheduling these sessions, I chose MON, TUE and WED. This means that, starting after today, I want the dates of my 11 sessions, which would be 4 Mondays, 4 Tuesdays and 3 Wednesdays from now, after which my sessions will be completed.
I want to automatically get the dates for these days until there are 11 dates in total.
I really hope this makes sense... Please help me. I've been scratching my head over this for 3 hours straight.
Thanks,
You can use pd.date_range and the CustomBusinessDay object to do this very easily.
You can use the CustomBusinessDay to specify your "business days" and create your date range from it:
import pandas as pd
from datetime import date
session_days = pd.offsets.CustomBusinessDay(weekmask="Mon Tue Wed")
dates = pd.date_range(date.today(), freq=session_days, periods=11)
I figured it out a while ago but my internet died. All it took was Dunhill and some rest.
import datetime

def get_dates():
    #This is the max number of dates you want. In my case, sessions.
    required_sessions = 11
    #These are the weekdays you want these sessions to be
    #(datetime.weekday() counts Monday as 0 and Sunday as 6, so this is Mon, Tue, Wed)
    days = [0, 1, 2]
    #An empty list to store the dates you get
    dates = []
    #Initialize a variable for the while loop
    current_sessions = 0
    #I will start counting from today but you can choose any date
    now = datetime.datetime.now()
    #For my use case, I don't want a session on the same day I run this function.
    #I will start counting from the next day
    if now.weekday() in days:
        now = now + datetime.timedelta(days=1)
    while current_sessions != required_sessions:
        #Iterate over every day in your desired days
        for day in days:
            #Just a precautionary measure so the for loop breaks as soon as you have the max number of dates
            #Or the while loop will run for ever
            if current_sessions == required_sessions:
                break
            #If it's Sunday (weekday() == 6), you wanna hop onto the next week
            if now.weekday() == 6:
                #Check if Monday (0) is in the days, add it
                if 0 in days:
                    date = now + datetime.timedelta(days=1)
                    dates.append(date)
                    current_sessions += 1
                    now = date
            else:
                #Explains itself.
                if now.weekday() == day:
                    dates.append(now)
                    now = now + datetime.timedelta(days=1)
                    current_sessions += 1
                #If the weekday today is greater than the day you're iterating over, this means you've iterated over all the days in a NUMERIC ORDER
                #NOTE: This only works if the days in your "days" list are in a correct numeric order meaning 0 - 6. If it's random, you'll have trouble
                elif not now.weekday() > day:
                    difference = day - now.weekday()
                    date = now + datetime.timedelta(days=difference)
                    dates.append(date)
                    now = date
                    current_sessions += 1
        #Reset the cycle after the for loop is done so you can hop on to the next week.
        reset_cycle_days = 6 - now.weekday()
        if reset_cycle_days == 0:
            original_now = now + datetime.timedelta(days=1)
            now = original_now
        else:
            original_now = now + datetime.timedelta(days=reset_cycle_days)
            now = original_now
    for date in dates:
        print(date.strftime("%d/%m/%y"), date.weekday())
Btw, I know this answer is pointless compared to @Daniel Geffen's answer. If I were you, I would definitely choose his answer as it is very simple. This was just my contribution to my own question in case anyone wants to jump into the "technicalities" of how it's done using just datetime. For me, this works best as I'm having issues with _bz2 in Python 3.7.
Thank you all for your help.
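The datetime-only approach can also be sketched more compactly by stepping one day at a time and keeping the dates whose weekday is wanted. `session_dates` is a hypothetical helper; it assumes `datetime.weekday()` numbering (Monday = 0) and, like the answer above, starts counting from the day after the given date.

```python
from datetime import date, timedelta


def session_dates(start, weekdays, count):
    """Return the first `count` dates after `start` falling on `weekdays`."""
    wanted = set(weekdays)
    if not wanted:
        raise ValueError("at least one weekday is required")
    found = []
    current = start
    while len(found) < count:
        current += timedelta(days=1)
        if current.weekday() in wanted:
            found.append(current)
    return found


# Monday 2022-09-19 as the reference day; sessions on Mon, Tue, Wed.
sessions = session_dates(date(2022, 9, 19), [0, 1, 2], 11)
print(sessions[0], sessions[-1])  # 2022-09-20 2022-10-12
```

The order of the `weekdays` list does not matter here, since membership is checked day by day.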

In python, how can I find the start & end dates for a random quarter in the past?

Get start and end date of quarter from date and fiscal year end provides great helper functions to get the current/prior quarter. I'm unable to generalize the prev_quarter_range function to include a quarters_ago param that returns the start & end dates for a random quarter n quarters ago.
Ideally, I want a function named get_quarter_start_end_dates that takes in (dt, quarters_ago) and outputs (start_dt, end_dt). Here are some sample inputs --> outputs:
('2017-01-01', 0) --> ('2017-01-01', '2017-04-01')
('2017-01-01', 1) --> ('2016-10-01', '2017-01-01')
('2017-01-01', 2) --> ('2016-07-01', '2016-10-01')
('2017-02-01', 12) --> ('2014-01-01', '2014-04-01')
How about:
from dateutil.relativedelta import relativedelta

def get_quarter_start_end_dates(dt, quarters_ago):
    months = relativedelta(months=3*quarters_ago)
    start_dt = dt - months
    end_dt = start_dt + relativedelta(months=3)
    return (start_dt, end_dt)
The arrow library (for date processing) is very good for this purpose.
Get today's date in tod.
Find the first day of the month by replacing the day in today's date with one.
Suppose we want to go back n=3 quarters.
Then shift the first of the month back by 3*n months.
To get the end of the quarter shift the first of the month back by 3*(n-1) months, then by 1 day.
>>> import arrow
>>> tod = arrow.now()
>>> first_of_month = tod.replace(day=1)
>>> n = 3
>>> n_quarters_back = first_of_month.shift(months=-3*n)
>>> n_quarters_back
<Arrow [2017-04-01T16:56:46.377079-04:00]>
>>> end_of_n_quarters_back = first_of_month.shift(months=-3*(n-1)).shift(days=-1)
>>> end_of_n_quarters_back
<Arrow [2017-06-30T16:56:46.377079-04:00]>
I had this same question and after some trial and error I found the following to work:
def get_n_quarters_back(n_quarters):
    date_in_quarter = arrow.now().shift(months=-3*n_quarters)
    quarter_start, quarter_end = list(arrow.Arrow.interval('quarter',
                                      date_in_quarter, date_in_quarter))[0]
    return quarter_start, quarter_end
If you want to specify the datetime to shift back from just add the datetime parameter instead of going from now. In this case ref_date would be an Arrow object.
def get_n_quarters_back(ref_date, n_quarters):
    date_in_quarter = ref_date.shift(months=-3*n_quarters)
    quarter_start, quarter_end = list(arrow.Arrow.interval('quarter',
                                      date_in_quarter, date_in_quarter))[0]
    return quarter_start, quarter_end
NOTE: The arrow documentation states that the end date is an optional parameter on the interval classmethod. However, in practice this does not appear to be the case because without it TypeError: interval() missing 1 required positional argument: 'end' is returned.
With Arrow it's quite easy actually
import arrow
def get_quarter_start_end_dates(dt, quarters_ago):
    date_in_past = arrow.get(dt).shift(quarters=-quarters_ago)
    quarter_start_date = date_in_past.floor('quarter').datetime
    quarter_end_date = date_in_past.ceil('quarter').datetime
    return quarter_start_date, quarter_end_date
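If no third-party library is available, the sample inputs and outputs from the question can also be reproduced with integer arithmetic on a "quarter index". The following is a minimal standard-library sketch under two assumptions: `dt` is a `datetime.date` rather than a string, and the end date is the first day of the following quarter, as in the question's samples.

```python
from datetime import date


def get_quarter_start_end_dates(dt, quarters_ago):
    """Return (start, end) of the quarter `quarters_ago` quarters before dt's."""
    # Number the quarters consecutively: four per year, 0-based within a year.
    index = dt.year * 4 + (dt.month - 1) // 3 - quarters_ago

    def first_day(quarter_index):
        year, quarter = divmod(quarter_index, 4)
        return date(year, quarter * 3 + 1, 1)

    # The end is exclusive: it is the first day of the next quarter.
    return first_day(index), first_day(index + 1)


print(get_quarter_start_end_dates(date(2017, 1, 1), 1))
# (datetime.date(2016, 10, 1), datetime.date(2017, 1, 1))
print(get_quarter_start_end_dates(date(2017, 2, 1), 12))
# (datetime.date(2014, 1, 1), datetime.date(2014, 4, 1))
```

Because the index is derived from the month, any day inside a quarter maps to that quarter's first day before shifting back.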

How to count days belonging to a given month in the range of two Python datetimes?

I have two Python datetime and I want to count the days between those dates, counting ONLY the days belonging to the month I choose. The range might overlap multiple months/years.
Example:
If I have 2017-10-29 & 2017-11-04 and I chose to count the days in October, I get 3 (29, 30 & 31 Oct.).
I can't find a straightforward way to do this so I think I'm going to iterate over the days using datetime.timedelta(days=1), and increment a count each time the day belongs to the month I chose.
Do you know a more performant method?
I'm using Python 2.7.10 with the Django framework.
Iterating over the days would be the most straightforward way to do it. Otherwise, you would need to know how many days are in a given month and you would need different code for different scenarios:
The given month is the month of the first date
The given month is the month of the second date
The given month is between the first and the second date (if dates span more than two months)
If you want to support dates spanning more than one year then you would need the input to include month and year.
Your example fits scenario #1, which I guess you could do like this:
>>> from datetime import datetime, timedelta
>>>
>>> first_date = datetime(2017, 10, 29)
>>>
>>> first_day_of_next_month = first_date.replace(month=first_date.month + 1, day=1)
>>> last_day_of_this_month = first_day_of_next_month - timedelta(1)
>>> number_of_days_in_this_month = last_day_of_this_month.day
>>> number_of_days_in_this_month - first_date.day + 1
3
This is why I would suggest implementing it the way you originally intended and only turning to this if there's a performance concern.
You can get difference between two datetime objects by simply subtracting them.
So, we start by getting the difference between the two dates.
And then we generate all the dates between the two using
gen = (start_date + datetime.timedelta(days = e) for e in range(diff + 1))
And since we only want the dates that fall in the chosen month (October here), we apply a filter.
filter(lambda x : x.month == 10 , gen)
Then we will sum them over.
And the final code is this:
diff = (end_date - start_date).days
gen = (start_date + datetime.timedelta(days = e) for e in range(diff + 1))
filtered_dates = filter(
    lambda x : x.month == 10 ,
    gen
)
count = sum(1 for e in filtered_dates)
You can also use reduce but sum() is a lot more readable.
A potential method of achieving this is to first compare whether your start or end dates you are comparing have the same month that you want to choose.
For example:
start = datetime(2017, 10, 29)
end = datetime(2017, 11, 4)
We create a function to compare the dates like so:
def daysofmonth(start, end, monthsel):
    if start.month == monthsel:
        days = (datetime(start.year, monthsel+1, 1) - start).days
    elif end.month == monthsel:
        days = (end - datetime(end.year, monthsel, 1)).days
    elif not (monthsel > start.month) & (end.month > monthsel):
        return 0
    else:
        days = (datetime(start.year, monthsel+1, 1) - datetime(start.year, monthsel, 1)).days
    return days
So, in our example, setting monthsel to 10 gives:
>>> daysofmonth(start, end, 10)
3
Using pandas with your dates:
import pandas as pd
from datetime import datetime
first_date = datetime(2017, 10, 29)
second_date = datetime(2017, 11, 4)
days_count = (second_date - first_date).days
month_date = first_date.strftime("%Y-%m")
values = pd.date_range(start=first_date,periods=days_count,freq='D').to_period('M').value_counts()
print(values)
print(values[month_date])
outputs
2017-10 3
2017-11 3
3
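The separate scenarios discussed in this thread can also be collapsed into a single interval-overlap calculation: clamp the date range to the chosen month and measure what is left. This is a minimal sketch under a few assumptions: `days_in_month_within` is a hypothetical helper, it works on `date` objects (call `.date()` on datetimes first), both end points are inclusive, and the month is identified by year and month so that ranges spanning several years stay unambiguous.

```python
from datetime import date, timedelta


def days_in_month_within(start, end, year, month):
    """Count the days of (year, month) that lie in [start, end], inclusive."""
    month_start = date(year, month, 1)
    # First day of the following month; December rolls over to January.
    next_month_start = date(year + month // 12, month % 12 + 1, 1)
    lower = max(start, month_start)
    upper = min(end + timedelta(days=1), next_month_start)  # exclusive bound
    return max((upper - lower).days, 0)


start = date(2017, 10, 29)
end = date(2017, 11, 4)
print(days_in_month_within(start, end, 2017, 10))  # 3
print(days_in_month_within(start, end, 2017, 11))  # 4
print(days_in_month_within(start, end, 2017, 9))   # 0
```

The work done does not depend on the length of the range, which addresses the performance concern without iterating over every day.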
