I did find a solution in an article, but it's still not exactly what I was looking for. That solution calculates all the days between two dates, including weekends. Is there a solution that excludes weekends from the calculation?
So, what I did is take that solution and expand it this way:
import datetime

crtD = datetime.datetime.strptime(pd.loc[x, 'createDate'], '%m/%d/%Y')  # start date
tdyD = datetime.datetime.today()  # end date
dayx = tdyD - crtD  # time between start and end date, including weekends
wkds = dayx.days + 1  # drops the time component, keeps the number of days, and adds 1 day
wkns = round(wkds / 7, 0)  # divides the number of days by seven and rounds the result to the nearest integer
networkdays = int(wkds - wkns) - 1
print(networkdays)
I embedded these lines of code in a for loop. Hope this helps. If you have a solution that also excludes holidays, please post it here.
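For the holiday question, NumPy's `busday_count` counts Monday to Friday dates between two dates and accepts a list of holidays to skip. A minimal sketch, assuming plain `datetime.date` inputs (call `.date()` on a `datetime` first) and that both endpoints should count, as Excel's NETWORKDAYS does; `network_days` is a hypothetical helper name:

```python
import datetime

import numpy as np


def network_days(start: datetime.date, end: datetime.date, holidays=()) -> int:
    """Count Monday-Friday dates from `start` to `end` inclusive, skipping `holidays`."""
    holiday_array = np.array(list(holidays), dtype="datetime64[D]")
    # busday_count excludes the end date, so add one day to make the count inclusive.
    return int(np.busday_count(start, end + datetime.timedelta(days=1), holidays=holiday_array))


# Monday 2022-09-19 to Sunday 2022-09-25: five weekdays, four if the Wednesday is a holiday.
print(network_days(datetime.date(2022, 9, 19), datetime.date(2022, 9, 25)))
print(network_days(datetime.date(2022, 9, 19), datetime.date(2022, 9, 25),
                   holidays=[datetime.date(2022, 9, 21)]))
```

The holiday list could also come from a pandas calendar, e.g. `USFederalHolidayCalendar().holidays(start, end)`, converted to dates.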
If I want to find the number of hours between two datetime objects, I can do something like this:
from datetime import datetime
today = datetime.today()
day_after_tomorrow = datetime(2022, 9, 24)
diff = (day_after_tomorrow - today).total_seconds() / 3600
print(diff)
which returns: 37.58784580333333 hours.
But this is the number of real hours between two dates. I want to know the number of specific business hours between two dates.
I can define two CustomBusinessHour objects with pandas to specify those business hours (which are 8AM to 4:30PM M-F, and 8AM to 12PM on Saturday, excluding US Federal holidays):
from pandas.tseries.offsets import CustomBusinessHour
from pandas.tseries.holiday import USFederalHolidayCalendar
business_hours_mtf = CustomBusinessHour(calendar=USFederalHolidayCalendar(), start='08:00', end='16:30')
business_hours_sat = CustomBusinessHour(calendar=USFederalHolidayCalendar(), weekmask='Sat', start='08:00', end='12:00')
My understanding is that CustomBusinessHour is a type of pandas DateOffset object, so it should behave just like a relativedelta object. So I should be able to use it in the datetime arithmetic somehow, to get the number I want.
And that's as far as I was able to get.
What I think I'm struggling to understand is how relativedeltas work, and how to actually use them in datetime arithmetic.
Is this the right approach? If so, how can I use these CustomBusinessHour objects to get an accurate amount of elapsed business hours between the two dates?
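To see how `CustomBusinessHour` behaves in datetime arithmetic: adding it to a timestamp moves the timestamp forward by one hour of open time, carrying any remainder over to the next open day. A minimal sketch with dates chosen for illustration and no holiday calendar. The Saturday offset passes `weekmask="Sat"`; without it, `CustomBusinessHour` defaults to Monday through Friday. Note that an offset moves a timestamp; it does not measure the distance between two timestamps.

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessHour

# Same opening hours as in the question, without the holiday calendar.
weekday_hours = CustomBusinessHour(start="08:00", end="16:30")
saturday_hours = CustomBusinessHour(weekmask="Sat", start="08:00", end="12:00")

friday_late = pd.Timestamp("2022-09-23 16:00")    # a Friday
saturday_late = pd.Timestamp("2022-09-24 11:30")  # a Saturday

# 30 minutes remain on Friday; the other 30 are taken from Monday morning.
print(friday_late + weekday_hours)     # 2022-09-26 08:30:00
# With weekmask="Sat", the next open day after Saturday noon is the following Saturday.
print(saturday_late + saturday_hours)  # 2022-10-01 08:30:00
```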
I figured out a solution. It feels ugly and hacky, but it seems to work. Hopefully someone else has a simpler or more elegant solution.
Edit: I cleaned up the documentation a little bit to make it easier to read. Also added a missing kwarg in business_hours_sat. Figuring this out was a headache, so if anyone else has to deal with this problem, hopefully this solution helps.
from datetime import datetime, timedelta
from pandas.tseries.offsets import CustomBusinessHour
from pandas.tseries.holiday import USFederalHolidayCalendar
business_hours_mtf = CustomBusinessHour(calendar=USFederalHolidayCalendar(), start='08:00', end='16:30')
business_hours_sat = CustomBusinessHour(calendar=USFederalHolidayCalendar(), weekmask='Sat', start='08:00', end='12:00')
def get_business_hours_range(earlier_date: datetime, later_date: datetime) -> float:
    """Return the number of business hours between `earlier_date` and `later_date` as a float with two decimal places.

    Algorithm:
    1. Increment `earlier_date` by 1 "business hour" until it's further in the future than `later_date`.
    2. Also increment an `elapsed_business_hours` variable by 1.
    3. Once `earlier_date` is larger (further in the future) than `later_date`...
        a. Roll back `earlier_date` by one business hour.
        b. Get the close of business hour for `earlier_date` ([3a]).
        c. Get the number of minutes between [3b] and [3a] (`minutes_remaining`).
        d. Create a timedelta with `elapsed_business_hours` and `minutes_remaining`.
        e. Represent this timedelta as a float with two decimal places.
        f. Return this float.
    """
    # Count how many "business hours" have elapsed between the `earlier_date` and `later_date`.
    elapsed_business_hours = 0.0
    current_day_of_week = 0
    while earlier_date < later_date:
        day_of_week = earlier_date.isoweekday()
        # 6 = Saturday
        if day_of_week == 6:
            # Increment `earlier_date` by one "business hour", as specified by the `business_hours_sat` CBH object.
            earlier_date += business_hours_sat
            # Increment the counter of how many "business hours" have elapsed between these two dates.
            elapsed_business_hours += 1
            # Save the current day of the week in `earlier_date`, in case this is the last iteration of this while loop.
            current_day_of_week = day_of_week
        # 1 = Monday, 2 = Tuesday, ...
        elif day_of_week in (1, 2, 3, 4, 5):
            # Increment `earlier_date` by one "business hour", as specified by the `business_hours_mtf` CBH object.
            earlier_date += business_hours_mtf
            # Increment the counter of how many "business hours" have elapsed between these two dates.
            elapsed_business_hours += 1
            # Save the current day of the week in `earlier_date`, in case this is the last iteration of this while loop.
            current_day_of_week = day_of_week

    # Once we've incremented `earlier_date` to a date further in the future than `later_date`, we know that we've counted
    # all the full (60min) "business hours" between `earlier_date` and `later_date`. (We can only increment by one hour when using
    # CBH, so when we make this final increment, we may be skipping over a few minutes in that last day.)
    #
    # So now we roll `earlier_date` back by 1 business hour, to the last full business hour before `later_date`. Then we get the
    # close of business hour for that day, and subtract `earlier_date` from it. This will give us whatever minutes may be remaining
    # in that day, that weren't accounted for when tallying the number of "business hours".
    #
    # But before we do these things, we need to check what day of the week the last business hour is, so we know which closing time
    # to use.
    if current_day_of_week == 6:
        ed_rolled_back = earlier_date - business_hours_sat
        ed_closing_time = datetime.combine(ed_rolled_back, business_hours_sat.end[0])
    elif current_day_of_week in (1, 2, 3, 4, 5):
        ed_rolled_back = earlier_date - business_hours_mtf
        ed_closing_time = datetime.combine(ed_rolled_back, business_hours_mtf.end[0])

    minutes_remaining = (ed_closing_time - ed_rolled_back).total_seconds() / 60
    if 0 < minutes_remaining < 60:
        delta = timedelta(hours=elapsed_business_hours, minutes=minutes_remaining)
    else:
        delta = timedelta(hours=elapsed_business_hours)

    delta_hours = round(float(delta.total_seconds() / 3600), 2)
    return delta_hours
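A different approach avoids stepping hour by hour: walk the calendar days between the two datetimes, clip each open day's window to the requested interval, and add up the overlaps. Because it looks up each day's own window, Saturday hours are counted even when the range starts on a weekday (stepping with a Monday to Friday offset from a Friday jumps straight to Monday). A minimal sketch, assuming naive datetimes; `OPENING_HOURS` and `business_hours_between` are hypothetical names, and the table mirrors the question's schedule:

```python
from datetime import datetime, time, timedelta

from pandas.tseries.holiday import USFederalHolidayCalendar

# Hypothetical opening-hours table keyed by datetime.weekday() (Monday = 0).
# Sunday (6) is missing, so it counts as closed.
OPENING_HOURS = {
    0: (time(8, 0), time(16, 30)),
    1: (time(8, 0), time(16, 30)),
    2: (time(8, 0), time(16, 30)),
    3: (time(8, 0), time(16, 30)),
    4: (time(8, 0), time(16, 30)),
    5: (time(8, 0), time(12, 0)),
}


def business_hours_between(start: datetime, end: datetime) -> float:
    """Open hours between two naive datetimes, skipping US federal holidays."""
    if end <= start:
        return 0.0
    holidays = {ts.date() for ts in USFederalHolidayCalendar().holidays(start.date(), end.date())}
    total = timedelta()
    day = start.date()
    while day <= end.date():
        window = OPENING_HOURS.get(day.weekday())
        if window is not None and day not in holidays:
            # Clip this day's opening window to the [start, end] interval.
            window_start = max(datetime.combine(day, window[0]), start)
            window_end = min(datetime.combine(day, window[1]), end)
            if window_end > window_start:
                total += window_end - window_start
        day += timedelta(days=1)
    return total.total_seconds() / 3600


# Friday 15:00 to Monday 09:00: 1.5 h Friday + 4 h Saturday + 1 h Monday.
print(business_hours_between(datetime(2022, 9, 23, 15, 0), datetime(2022, 9, 26, 9, 0)))
```

Partial hours at either end fall out of the clipping, so no separate "minutes remaining" step is needed.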
Let's say I have 11 sessions to complete. I haven't set dates for them, only the weekdays on which a session takes place. Say that when scheduling these sessions, I chose MON, TUE and WED. Starting after today, I want the dates of my 11 sessions, which would be 4 Mondays, 4 Tuesdays and 3 Wednesdays from now, after which my sessions will be complete.
I want to automatically get the dates for these days until there are 11 dates in total.
I really hope this makes sense... Please help me. I've been scratching my head over this for 3 hours straight.
Thanks,
You can use pd.date_range with a CustomBusinessDay object to do this very easily. Use CustomBusinessDay to specify your "business days" and create your date range from it:
import pandas as pd
from datetime import date
session_days = pd.offsets.CustomBusinessDay(weekmask="Mon Tue Wed")
dates = pd.date_range(date.today(), freq=session_days, periods=11)
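Starting the range at today means today itself becomes the first session whenever it is a Monday, Tuesday or Wednesday. A minimal sketch that starts from the following day instead, matching "after today" in the question; `session_dates` is a hypothetical name. `date_range` rolls a start that is not a session day forward to the next one.

```python
from datetime import date, timedelta

import pandas as pd


def session_dates(after: date, weekmask: str = "Mon Tue Wed", count: int = 11) -> list:
    """Return `count` dates on the `weekmask` days, strictly after `after`."""
    sessions = pd.offsets.CustomBusinessDay(weekmask=weekmask)
    # Start the day after `after`; a non-session start day is rolled forward.
    rng = pd.date_range(start=after + timedelta(days=1), freq=sessions, periods=count)
    return [ts.date() for ts in rng]


for d in session_dates(date.today()):
    print(d.strftime("%a %d/%m/%y"))
```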
I figured it out a while ago but my internet died. All it took was Dunhill and some rest.
import datetime

def get_dates():
    # This is the max number of dates you want. In my case, sessions.
    required_sessions = 11
    # These are the weekdays you want the sessions on, using datetime's weekday() numbering
    # (Monday = 0 ... Sunday = 6), so Monday, Tuesday and Wednesday are 0, 1 and 2
    days = [0, 1, 2]
    # An empty list to store the dates you get
    dates = []
    # Initialize a counter for the while loop
    current_sessions = 0
    # I will start counting from today, but you can choose any date
    now = datetime.datetime.now()
    # For my use case, I don't want a session on the same day I run this function,
    # so I start counting from the next day
    if now.weekday() in days:
        now = now + datetime.timedelta(days=1)
    while current_sessions != required_sessions:
        # Iterate over every day in your desired days
        for day in days:
            # Break out as soon as you have the required number of dates,
            # otherwise the while loop would run forever
            if current_sessions == required_sessions:
                break
            # If it's Sunday, hop onto the next week
            if now.weekday() == 6:
                # If Monday is one of the days, add it
                if 0 in days:
                    date = now + datetime.timedelta(days=1)
                    dates.append(date)
                    current_sessions += 1
                    now = date
            else:
                # Today is one of the session days: take it and move on to tomorrow
                if now.weekday() == day:
                    dates.append(now)
                    now = now + datetime.timedelta(days=1)
                    current_sessions += 1
                # If today's weekday is greater than the day being iterated over, that day has already passed this week.
                # NOTE: This only works if the days in your "days" list are in ascending numeric order (0 - 6). If not, you'll have trouble.
                elif not now.weekday() > day:
                    difference = day - now.weekday()
                    date = now + datetime.timedelta(days=difference)
                    dates.append(date)
                    now = date
                    current_sessions += 1
        # Reset the cycle after the for loop is done so you can hop onto the next week.
        reset_cycle_days = 6 - now.weekday()
        if reset_cycle_days == 0:
            now = now + datetime.timedelta(days=1)
        else:
            now = now + datetime.timedelta(days=reset_cycle_days)
    for date in dates:
        print(date.strftime("%d/%m/%y"), date.weekday())
    return dates

get_dates()
By the way, I know this answer is pointless compared to @Daniel Geffen's answer. If I were you, I would definitely choose his answer as it is very simple. This was just my contribution to my own question, in case anyone wants to dig into the "technicalities" of doing it with datetime alone. For me, this works best as I'm having issues with _bz2 in Python 3.7.
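For anyone who wants to stay with the standard library, the same result can be had by stepping one day at a time and keeping the days whose `weekday()` is in the wanted set; the order of the weekdays then doesn't matter. A minimal sketch; `next_session_dates` is a hypothetical name, and weekdays use datetime's Monday = 0 numbering.

```python
import datetime


def next_session_dates(after: datetime.date, weekdays, count: int) -> list:
    """Return the first `count` dates strictly after `after` whose weekday() is in `weekdays`."""
    if not weekdays:
        raise ValueError("weekdays must not be empty")  # would otherwise loop forever
    dates = []
    day = after
    while len(dates) < count:
        day += datetime.timedelta(days=1)  # never includes `after` itself
        if day.weekday() in weekdays:      # Monday = 0 ... Sunday = 6
            dates.append(day)
    return dates


for d in next_session_dates(datetime.date.today(), {0, 1, 2}, 11):
    print(d.strftime("%d/%m/%y"), d.weekday())
```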
Thank you all for your help.
I am using pandas' date_range to create a list of hourly timestamps for a calendar year. My code looks like this:
year = 2018
times = list(pd.date_range('{}-01-01'.format(year), '{}-12-31'.format(year), freq='H'))
I expect the length of times to be 8760 (the number of hours in a year). But when I check the length of the times list, it is only 8737. Why?
pd.date_range includes both endpoints by default, and a bare date such as '2018-12-31' means midnight at the start of that day. So your range stops at 2018-12-31 00:00: 364 full days of 24 hours plus that final midnight stamp, which is 8737 timestamps.
So, you need to include the last day of the year, but omit the "celebratory" New Year Hour:
>>> year = 2018
>>> times = list(pd.date_range('{}-01-01'.format(year), '{}-01-01'.format(year+1), freq='H'))
>>> times = times[:-1]
>>> len(times)
8760
End the range at New Year's Day of the following year ({}-01-01 with year+1) so that every hour of New Year's Eve ({}-12-31) is covered. Because both endpoints are included, the range then also contains midnight on January 1, which is why the last entry is dropped with times = times[:-1], so that the list ends at 11:00 PM on 12-31.
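Instead of slicing off the last stamp, the right endpoint can be excluded directly. A minimal sketch, assuming pandas 1.4 or later, where `date_range` takes `inclusive="left"` (older versions spelled this `closed="left"`); `hourly_stamps` is a hypothetical name. It also handles leap years without a hard-coded count.

```python
import pandas as pd


def hourly_stamps(year: int) -> pd.DatetimeIndex:
    # From midnight Jan 1 up to, but not including, midnight Jan 1 of the next year.
    return pd.date_range(f"{year}-01-01", f"{year + 1}-01-01",
                         freq=pd.offsets.Hour(), inclusive="left")


print(len(hourly_stamps(2018)))  # 8760
print(len(hourly_stamps(2020)))  # 8784 (leap year)
```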
Please consider this problem:
I have a date in the past from where I start adding periods. When adding these periods results in a date greater than today, I want to stop, and check what the last date is.
This is a functionality for calculating debit dates in a membership. A member joins, say, 2007-01-31. He is debited every month. Let's say today is 2013-03-29 (it actually is atm). So I need to start counting months since 2007-01-31 and when I get past today's date, I need to stop. I can then see that the next debit date is 2013-03-31.
I am using the dateutil library to implement this, adding relativedeltas in a while loop until I pass the current date. (I know it's probably not the best way, but I'm still quite new to Python and this is a proof of concept.) The problem is that when I add a month to 2007-01-31, the next date is 2007-02-28, which is correct. But on the next iteration the date is 2007-03-28, because dateutil doesn't know that the 28th stood in for the last day of the month, so it can't move on to the last day of March. Of course, that's a perfectly valid implementation. I then experimented with dateutil's rrule object, but it works on the same principle. It outputs a list of dates, but it simply skips the months that don't have enough days:
import datetime
from dateutil.rrule import rrule, MONTHLY
period = rrule(MONTHLY, interval=1, dtstart=datetime.date(2012, 5, 31), until=datetime.date(2013, 3, 29))
print(list(period))
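When the start date is itself a month end, rrule can express "last day of each month" with `bymonthday=-1` instead of skipping short months. A minimal sketch; it only covers month-end anchors (a start on the 30th would need a different rule), so it does not solve the general problem described here.

```python
import datetime

from dateutil.rrule import MONTHLY, rrule

# bymonthday=-1 selects the last day of each month, so short months are not skipped.
month_ends = rrule(MONTHLY, dtstart=datetime.date(2012, 5, 31),
                   until=datetime.date(2012, 9, 30), bymonthday=-1)
dates = [d.date() for d in month_ends]  # rrule yields datetime objects
print(dates)
```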
Then I thought of a different approach:
If I could count the number of periods in the timespan between 2007-01-31 and 2013-03-29, I could add that number of periods to the start date, and dateutil would return the right date.
The problem there is that the period isn't always one month. It can also be four weeks, a quarter or a year, for example.
I couldn't find a way to take a relativedelta and divide it by another relativedelta to get the number of times the latter fits into the former.
If anyone can point me in the right direction I would appreciate it. Is there, for example, a library that can do this (divide timespans by each other and output the result in a given unit, like months or weeks)? Is there perhaps a datediff function that accepts a period as an input? (In VBScript, for example, you can get the difference between two dates in whatever period you want, be it weeks, months, days, whatever.) Or is there a totally different solution?
For completeness, I will include the code, but I think the text explains it all already:
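def calculate(self, testdate=None):
    # Look up today's date at call time; a default of datetime.date.today() would be frozen when the method is defined
    if testdate is None:
        testdate = datetime.date.today()
    self._testdate = testdate
    while self.next < self._testdate:
        self.next += self._ddinterval
    self.previous = self.next - self._ddinterval
    return self.next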
Thanks,
Erik
Edit: I now have a solution that does what it's supposed to, but it's hardly Pythonic, elegant or speedy. So the question remains the same: if anyone can come up with a better solution, please do. Here's what I came up with:
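def calculate(self, testdate=None):
    if testdate is None:
        testdate = datetime.date.today()
    self._testdate = testdate
    start = self.next
    count = 0
    while self.next < self._testdate:
        count += 1
        # Always add a multiple of the interval to the original start, so month-end clamping doesn't accumulate
        self.next = start + (count * self._ddinterval)
    self.previous = self.next - self._ddinterval
    return self.next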
Instead of using the new value after adding your period as the starting point for the next loop, create an ever-increasing delta instead:
import datetime
from dateutil.relativedelta import relativedelta

today = datetime.date.today()
start = datetime.date(2007, 1, 31)
period = relativedelta(months=1)
delta = period
while start + delta < today:
    delta += period
next_date = start + delta
This results in:
>>> import datetime
>>> from dateutil.relativedelta import relativedelta
>>> today = datetime.date.today()
>>> start = datetime.date(2007, 1, 31)
>>> period = relativedelta(months=1)
>>> delta = period
>>> while start + delta < today:
...     delta += period
...
>>> delta
relativedelta(years=+6, months=+3)
>>> start + delta
datetime.date(2013, 4, 30)
This works for any variable period lengths including quarters and years. For periods measured in exact weeks or days you can use a timedelta() object too, but this is a generic solution.
You cannot 'divide' time periods when using variable-width periods such as months and quarters. For time periods measured in whole days you can simply convert the difference between two dates to days (from the timedelta) and then get the modulus:
period = datetime.timedelta(days=28) # 4 weeks
delta = today - start
days_until_next = -delta.days % period.days  # days from today to the next 28-day boundary counted from start
end = today + datetime.timedelta(days=days_until_next)
which gives:
>>> period = datetime.timedelta(days=28) # 4 weeks
>>> delta = today - start
>>> days_until_next = -delta.days % period.days
>>> today + datetime.timedelta(days=days_until_next)
datetime.date(2013, 4, 17)
If your delta is variable with respect to base time (i.e., 1 month can mean any of 28-31 days depending), then you're stuck with a loop.
If delta is a constant day count, however, you can sidestep iteration by converting to integers and doing a modulo operation.
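The two techniques above can be combined into one helper: loop with an ever-growing relativedelta for variable-length periods, and use the modulo shortcut when the period is a fixed number of days. A minimal sketch; `next_debit_date` is a hypothetical name, and in the modulo branch the result is today itself when today falls exactly on a debit date.

```python
import datetime

from dateutil.relativedelta import relativedelta


def next_debit_date(start: datetime.date, period, today: datetime.date) -> datetime.date:
    """First debit date on or after `today`, counting whole periods from `start`."""
    if isinstance(period, datetime.timedelta):
        # Constant-length period: no loop, just a modulo on the day count.
        days_since_start = (today - start).days
        return today + datetime.timedelta(days=-days_since_start % period.days)
    # Variable-length period (months, quarters, years): grow one delta from `start`
    # so month-end clamping never accumulates (Jan 31 -> Feb 28 -> Mar 31 ...).
    delta = period
    while start + delta < today:
        delta += period
    return start + delta


start = datetime.date(2007, 1, 31)
print(next_debit_date(start, relativedelta(months=1), datetime.date(2013, 3, 29)))
print(next_debit_date(start, datetime.timedelta(days=28), datetime.date(2013, 3, 29)))
```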
So I have a list of events that are sort of like alarms. They're defined by their start and end time (in hours and minutes), a range of days (e.g. 1-3, which is Sunday through Wednesday), and a range of months (e.g. 1-3, January through March). The format of that data is largely unchangeable. I don't necessarily need to sort the list, but I need to find the next upcoming event based on the current time. There are so many different ways to do this and so many corner cases. This is my pseudocode:
now = time()
diff = []
# Start: differences between now and the start times
for s in schedule:  # assuming each result is appended to diff
    diff.minutes = s.minutes - now.minutes
    diff.hours = s.hours - now.hours
    diff.days = s.days - now.days
    diff.months = s.months - now.months
for d in diff:
    if d < 0:
        d = period + d
        # period is the maximum period of the attribute, i.e. minutes is 60, hours is 24
# repeat for event end times
So now I have a list of tuples of differences in minutes, hours, days, and months. These already take into account whether the current time is past the start time but before the end time. So let's say it's August, the start month of the event is July and the end month is September; then diff.month == 0.
Now this specific corner case is giving me trouble:
Let's say a schedule runs from 0:00 to 23:59 on Thursdays in August, and it's Friday the 27th. Running my algorithm, the difference in months would be 0, when in reality the event won't run again until next August, so it should be 12. And I'm stuck. I think the month is the only problem, because it's the only attribute that depends on the specific date within the month (rather than just the day of the week). Is my algorithm OK, and can I just deal with this special case? Or is there something better out there for this?
This is the data I'm working with:
map['start_time']=''
map['end_time']=''
map['start_moy']=''
map['end_moy']=''
map['start_dow']=''
map['end_dow']=''
The schedule's getAllSchedules method just returns a list of all the schedules. I can change the schedule class, but I'm not sure what difference that would make. I can't add to or change the format of the schedules I'm given.
Convert the items from the schedule into datetime objects. Then you can simply sort them:
from datetime import datetime
events = sorted(datetime(s.year, s.month, s.day, s.hour, s.minute) for s in schedule)
Since your resolution is in minutes, and assuming that you don't have many events, then I'd simply scan all the events every minute.
Filter your events so that you have a new list where the event range match the current month and day.
Then for each of those events declare that they are active or inactive according to whether the current time matches the event's range.
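The filtering step can be sketched as a per-minute check. A minimal sketch, assuming a hypothetical `Event` container that mirrors the `start_*`/`end_*` keys above, months numbered 1 = January, and days of the week numbered 1 = Sunday ... 7 = Saturday (an assumption about the data). Ranges that wrap around, such as November to February or 22:00 to 02:00, are handled as well.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Event:  # hypothetical container mirroring the start_*/end_* keys
    start_time: time
    end_time: time
    start_moy: int  # month of year, 1 = January
    end_moy: int
    start_dow: int  # assumed: 1 = Sunday ... 7 = Saturday
    end_dow: int


def in_range(value, lo, hi):
    """Inclusive range check that also handles wrap-around ranges."""
    if lo <= hi:
        return lo <= value <= hi
    return value >= lo or value <= hi


def is_active(event: Event, now: datetime) -> bool:
    dow = now.isoweekday() % 7 + 1  # Monday=1..Sunday=7 -> Sunday=1..Saturday=7
    minute = now.time().replace(second=0, microsecond=0)  # compare at minute resolution
    return (in_range(now.month, event.start_moy, event.end_moy)
            and in_range(dow, event.start_dow, event.end_dow)
            and in_range(minute, event.start_time, event.end_time))


def active_events(events, now=None):
    now = now or datetime.now()
    return [e for e in events if is_active(e, now)]
```

Running `active_events` once a minute gives the list of currently active events.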
The primary issue seems to be with the fact that you're using the day of the week, instead of explicit days of the month.
While your cited edge case is one example, does this issue not crop up with all events scheduled in any month outside of the current one?
I think the most robust approach here would be to do the work of getting your scheduled events into datetime format, then use @gnibbler's suggestion of sorting the datetime objects.
Once you have determined that the last event for the current month has already passed, calculate the distance to the next month the event occurs in (be it + 1 year, or just + 1 month), then construct a datetime object with that information:
first_of_month = datetime.date(calculated_year, calculated_month, 1)
By using the first day of the month, you can then use:
day_of_week = first_of_month.strftime('%w')
To give you what day of the week the first of that month falls on, which you can then use to calculate how many days to add to get to the first, second, third, etc. instance of a given day of the week, for that month. Once you have that day, you can construct a valid datetime object and do whatever comparisons you wish with now().
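The steps above can be sketched with the standard library. This version uses `date.weekday()` (Monday = 0 ... Sunday = 6) rather than `strftime('%w')`, which returns a string with Sunday = '0'; `nth_weekday` is a hypothetical helper name.

```python
import datetime


def nth_weekday(year: int, month: int, weekday: int, n: int) -> datetime.date:
    """Date of the n-th `weekday` (Monday = 0 ... Sunday = 6) in the given month."""
    first_of_month = datetime.date(year, month, 1)
    # Days from the 1st to the first occurrence of `weekday`.
    offset = (weekday - first_of_month.weekday()) % 7
    result = first_of_month + datetime.timedelta(days=offset + 7 * (n - 1))
    if result.month != month:
        raise ValueError(f"month {month}/{year} has no occurrence number {n} of that weekday")
    return result


print(nth_weekday(2021, 8, 3, 1))  # first Thursday of August 2021
```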
I couldn't figure out how to do it using only datetime, but I found the python-dateutil module and used that. It's perfect:
http://labix.org/python-dateutil