I've managed to find two (fairly comprehensive) holiday packages which are:
Python | Holidays library
workalendar (7.1.0)
End Goal:
I would like to get an int, using either of these two packages, for the number of public holidays that fall on a weekday. The country is Australia and the state is Western Australia (WA); both packages can accommodate the states and territories of Australia.
MWE:
import numpy as np
start_date = "2019-01-01"
end_date = "2020-01-01"
total_working_days = np.busday_count(start_date, end_date)
# total_public_holidays_that_fall_on_working_days =
# use the holidays package to get the number of public holidays
# that fall on working days (Mon-Fri) between *start_date* and
# *end_date*.
actual_working_days = total_working_days - total_public_holidays_that_fall_on_working_days
Question:
I want to get an int for the actual number of working days (working days - public holidays that fall on weekdays). How can I do this with either of the above libraries?
From the documentation of workalendar, adapted to your requirements:
from datetime import datetime
from workalendar.oceania.australia import WesternAustralia
start_date = "2019-01-01"
end_date = "2020-01-01"
start_datetime = datetime.strptime(start_date, '%Y-%m-%d')
end_datetime = datetime.strptime(end_date, '%Y-%m-%d')
cal = WesternAustralia()
print(cal.get_working_days_delta(start_datetime, end_datetime))
Output:
252
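For comparison, the subtraction described in the question can be sketched with the standard library alone. This is a minimal sketch, not the API of either package: `count_working_days` is a hypothetical helper, and `sample_holidays` is a hand-written, deliberately incomplete set of dates. In practice the set would presumably be filled from the holidays package or from workalendar.

```python
from datetime import date, timedelta

def count_working_days(start, end, holidays):
    """Count Mon-Fri days in the half-open range [start, end) not in `holidays`."""
    count = 0
    day = start
    while day < end:
        # weekday() is 0-4 for Monday-Friday
        if day.weekday() < 5 and day not in holidays:
            count += 1
        day += timedelta(days=1)
    return count

# Illustrative subset only (assumption): NOT a complete list of WA public holidays.
sample_holidays = {
    date(2019, 1, 1),    # falls on a Tuesday
    date(2019, 1, 26),   # falls on a Saturday, so it changes nothing
    date(2019, 12, 25),  # falls on a Wednesday
}

start, end = date(2019, 1, 1), date(2020, 1, 1)
total_working_days = count_working_days(start, end, set())
actual_working_days = count_working_days(start, end, sample_holidays)
print(total_working_days, actual_working_days)
```

Because the weekday check and the holiday check happen in the same pass, a holiday that lands on a weekend is never subtracted twice.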
I've been diving into list comprehensions, and I'm determined to put them into practice.
The below code takes a month and year input to determine the number of business days in the month, minus the public holidays (available at https://www.gov.uk/bank-holidays.json).
Additionally, I want to list all public holidays in that month/year, but I'm stuck with a date format conflict.
TypeError: '<' not supported between instances of 'str' and 'datetime.date'
edate and sdate are datetime.date, whereas title["date"] is a string.
I've tried things like datetime.strptime and datetime.date to no avail.
How can I resolve the date conflict within the list comprehension?
Any help or general feedback on code appreciated.
from datetime import date, timedelta, datetime
import inspect
from turtle import title
from typing import Iterator
import numpy as np
import json
import requests
from calendar import month, monthrange
import pprint
# Ask for a month and year input (set to September for quick testing)
monthInput = "09"
yearInput = "2022"
# Request from UK GOV and filter to England and Wales
holidaysJSON = requests.get("https://www.gov.uk/bank-holidays.json")
ukHolidaysJSON = json.loads(holidaysJSON.text)['england-and-wales']['events']
# List for all England and Wales holidays
ukHolidayList = []
eventIterator = 0
for events in ukHolidaysJSON:
    ukHolidayDate = list(ukHolidaysJSON[eventIterator].values())[1]
    ukHolidayList.append(ukHolidayDate)
    eventIterator += 1
# Calculate days in the month
daysInMonth = monthrange(int(yearInput), int(monthInput))[1] # Extract the number of days in the month
# Define start and end dates
sdate = date(int(yearInput), int(monthInput), 1) # start date
edate = date(int(yearInput), int(monthInput), int(daysInMonth)) # end date
# Calculate delta
delta = edate - sdate
# Find all of the business days in the month
numberOfWorkingDays = 0
for i in range(delta.days + 1): # Look through all days in the month
    day = sdate + timedelta(days=i)
    if np.is_busday([day]) and str(day) not in ukHolidayList: # Determine if it's a business day
        print("- " + str(day))
        numberOfWorkingDays += 1
# Count all of the UK holidays
numberOfHolidays = 0
for i in range(delta.days + 1): # Look through all days in the month
    day = sdate + timedelta(days=i)
    if str(day) in ukHolidayList: # Determine if it's a uk holiday
        numberOfHolidays += 1
# Strip the 0 from the month input
month = months[monthInput.lstrip('0')]
# for x in ukHolidaysJSON:
# pprint.pprint(x["title"])
# This is where I've gotten to
hols = [ title["title"] for title in ukHolidaysJSON if title["date"] < sdate and title["date"] > edate ]
print(hols)
I got this to work. You can use the datetime module to parse the string, but you then need to convert the result to a date so it can be compared with the date objects you already have. The comparison also has to select dates between sdate and edate rather than outside them.
hols = [ title["title"] for title in ukHolidaysJSON if sdate <= datetime.strptime(title["date"], '%Y-%m-%d').date() <= edate ]
First use strptime and then convert the datetime object to a date (note %m for the month; %M means minutes). The comparison is also flipped so that it keeps dates inside the month. I'm not sure if there's a more straightforward way but this seems to work:
hols = [title["title"] for title in ukHolidaysJSON
        if sdate <= datetime.strptime(title["date"], "%Y-%m-%d").date() <= edate]
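Since the feed's dates are plain ISO strings (YYYY-MM-DD), `date.fromisoformat` is another way to get a `date` directly, skipping the intermediate `datetime`. The sketch below assumes records shaped like the feed's "events" entries; `sample_events` and its titles are hand-written placeholders, not data from the real endpoint.

```python
from datetime import date

# Hand-written sample records (assumption), shaped like the feed's "events" entries.
sample_events = [
    {"title": "Sample holiday in August", "date": "2022-08-29"},
    {"title": "Sample holiday in September", "date": "2022-09-19"},
    {"title": "Sample holiday in December", "date": "2022-12-26"},
]

sdate = date(2022, 9, 1)   # first day of the month
edate = date(2022, 9, 30)  # last day of the month

# Parse each string once, then keep the titles whose date lies inside the month.
hols = [
    event["title"]
    for event in sample_events
    if sdate <= date.fromisoformat(event["date"]) <= edate
]
print(hols)
```

The chained comparison `sdate <= d <= edate` keeps the parse to a single call per record and makes the intended range explicit.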
I have a dataframe (df) with a start_date column and an add_days column (=10). I want to create a target_date column (= start_date + add_days) that excludes weekends and holidays (the holidays are in a separate dataframe).
I did some research and tried this.
from datetime import date, timedelta
import datetime as dt
import pandas as pd
df["start_date"] = pd.to_datetime(df["start_date"])
Holidays['Date_holi'] = pd.to_datetime(Holidays['Date_holi'])
def date_by_adding_business_days(from_date, add_days, holidays):
    business_days_to_add = add_days
    current_date = from_date
    while business_days_to_add > 0:
        current_date += timedelta(days=1)
        weekday = current_date.weekday()
        if weekday >= 5: # Saturday = 5, Sunday = 6
            continue
        if current_date in holidays:
            continue
        business_days_to_add -= 1
    return current_date
#demo:
base["Target_date"]=date_by_adding_business_days(df["start_date"], 10, Holidays['Date_holi'])
But I get this error:
AttributeError: 'Series' object has no attribute 'weekday'
Thank you for your help.
The comments by ALollz are very valid; customizing your date during creation to only keep what is defined as business day for your problem would be optimal.
However, I assume that you cannot define the business day beforehand and that you need to solve the problem with the data frame constructed as is.
Here is one possible solution:
import pandas as pd
import numpy as np
from datetime import timedelta
# Goal is to offset a start date by N business days (weekday + not a holiday)
# Here we fake the dataset as it was not provided
num_row = 1000
df = pd.DataFrame()
df['start_date'] = pd.date_range(start='1/1/1979', periods=num_row, freq='D')
df['add_days'] = pd.Series([10]*num_row)
# Define what is a week day
week_day = [0,1,2,3,4] # Monday to Friday
# Define what is a holiday with month and day without year (you can add more)
holidays = ['10-30','12-24']
def add_days_to_business_day(df, week_day, holidays, increment=10):
    '''
    modify the dataframe to increment only the days that are part of a weekday
    and not part of a pre-defined holiday
    >>> add_days_to_business_day(df, [0,1,2,3,4], ['10-31','12-31'])
    this will increment by 10 the days from Monday to Friday excluding Halloween and new year-eve
    '''
    # Increment everything that is in a business day
    df.loc[df['start_date'].dt.dayofweek.isin(week_day),'target_date'] = df['start_date'] + timedelta(days=increment)
    # Remove every increment done on a holiday
    df.loc[df['start_date'].dt.strftime('%m-%d').isin(holidays), 'target_date'] = np.datetime64('NaT')
add_days_to_business_day(df, week_day, holidays)
df
Note: I'm not using the 'add_days' column since it's just a repeated value. Instead, the function takes an increment parameter, which adds N days (with a default of N = 10).
Hope it helps!
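The AttributeError in the question comes from handing the whole column to a function written for a single date. A minimal sketch of the per-date version follows, using only the standard library. `add_business_days`, `holiday_set`, and `start_dates` are hypothetical names and sample values; with pandas, the same function could presumably be applied per element, for example through `Series.apply`. The holidays are converted to a set of dates on purpose, because `in` on a pandas Series tests the index labels rather than the values.

```python
from datetime import date, timedelta

def add_business_days(start, n, holidays):
    """Return the date n business days after `start`, skipping weekends and holidays."""
    current = start
    remaining = n
    while remaining > 0:
        current += timedelta(days=1)
        # Saturday = 5, Sunday = 6; holidays is a set of date objects
        if current.weekday() >= 5 or current in holidays:
            continue
        remaining -= 1
    return current

# Sample values (assumption), standing in for the two dataframes in the question.
holiday_set = {date(2023, 12, 25), date(2024, 1, 1)}
start_dates = [date(2023, 12, 18), date(2023, 12, 22)]

# One call per start date, instead of passing the whole column at once.
target_dates = [add_business_days(d, 10, holiday_set) for d in start_dates]
print(target_dates)
```

Unlike adding a fixed `timedelta`, this walks forward one day at a time, so weekends and holidays that fall inside the interval are skipped as well.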
I made my own definition of the MLK Day holiday that follows not when the holiday was first observed, but when it was first observed by the NYSE. The NYSE first observed MLK Day in January of 1998.
When asking the Holiday for the days in which the holiday occurred between dates, it works fine for the most part, returning an empty set when the MLK date is not in the range requested, and returning the appropriate date when it is. For date ranges that precede the start_date of the holiday, it appropriately returns the empty set, until we hit around 1995, and then it fails. I cannot figure out why it fails then and not in other situations when the empty set is the correct answer.
Note: Still stuck on Pandas 0.22.0. Python3
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import MO
from pandas.tseries.holiday import Holiday
__author__ = 'eb'
mlk_rule = Holiday('MLK Day (NYSE Observed)',
start_date=datetime(1998, 1, 1), month=1, day=1,
offset=pd.DateOffset(weekday=MO(3)))
start = pd.to_datetime('1999-01-17')
end = pd.to_datetime('1999-05-01')
finish = pd.to_datetime('1980-01-01')
while start > finish:
    print(f"{start} - {end}:")
    try:
        dates = mlk_rule.dates(start, end, return_name=True)
    except Exception as e:
        print("\t****** Fail *******")
        print(f"\t{e}")
        break
    print(f"\t{dates}")
    start = start - pd.DateOffset(years=1)
    end = end - pd.DateOffset(years=1)
When run, this results in:
1999-01-17 00:00:00 - 1999-05-01 00:00:00:
1999-01-18 MLK Day (NYSE Observed)
Freq: 52W-MON, dtype: object
1998-01-17 00:00:00 - 1998-05-01 00:00:00:
1998-01-19 MLK Day (NYSE Observed)
Freq: 52W-MON, dtype: object
1997-01-17 00:00:00 - 1997-05-01 00:00:00:
Series([], dtype: object)
1996-01-17 00:00:00 - 1996-05-01 00:00:00:
Series([], dtype: object)
1995-01-17 00:00:00 - 1995-05-01 00:00:00:
****** Fail *******
Must provide freq argument if no data is supplied
What happens in 1995 that causes it to fail, that does not happen in the same periods in the years before?
ANSWER: Inside of the Holiday class, the dates() method is used to
gather the list of valid holidays within a requested date range. In
order to ensure that this occurs properly, the implementation gathers
all holidays from one year before to one year after the requested date
range via the internal _reference_dates() method. In this method,
if the receiving Holiday instance has an internal start or end date,
it uses that date as the begin or end of the range to be examined
rather than the passed in requested range, even if the dates in the requested
range precede or exceed the start or end date of the rule.
The existing implementation mistakenly assumes it is ok to limit the effective range over which it must accurately identify what holidays are in existence to the range over which holidays exist. As part of a set of rules in a calendar, it is as important for a Holiday to identify where holidays do not exist as where they do. The NULL set response is an important function of the Holiday class.
For example, in a Trading Day Calendar that needs to identify when financial markets are open or closed, the calendar may need to accurately identify which days the market is closed over a 100 year history. The market only closed for MLK day for a small part of that history. A calendar that includes the MLK holiday as constructed above throws an error when asked for the open days or holidays for periods preceding the MLK start_date[1].
To fix this, I re-implemented the _reference_dates() method in a
custom sub-class of Holiday to ensure that when the requested date
range extends before the start_date or after the end_date of the
holiday rule, it uses the actual requested range to build the
reference dates from, rather than bound the request by the internal
start and end dates.
Here is the implementation I am using.
class MLKHoliday(Holiday):
    def __init__(self):
        super().__init__('MLK Day (NYSE Observed)',
                         start_date=datetime(1998, 1, 1), month=1, day=1,
                         offset=pd.DateOffset(weekday=MO(3)))
    def _reference_dates(self, start_date, end_date):
        """
        Get reference dates for the holiday.
        Return reference dates for the holiday also returning the year
        prior to the start_date and year following the end_date. This ensures
        that any offsets to be applied will yield the holidays within
        the passed in dates.
        """
        if self.start_date and start_date and start_date >= self.start_date:
            start_date = self.start_date.tz_localize(start_date.tz)
        if self.end_date and end_date and end_date <= self.end_date:
            end_date = self.end_date.tz_localize(end_date.tz)
        year_offset = pd.DateOffset(years=1)
        reference_start_date = pd.Timestamp(
            datetime(start_date.year - 1, self.month, self.day))
        reference_end_date = pd.Timestamp(
            datetime(end_date.year + 1, self.month, self.day))
        # Don't process unnecessary holidays
        dates = pd.DatetimeIndex(start=reference_start_date,
                                 end=reference_end_date,
                                 freq=year_offset, tz=start_date.tz)
        return dates
Does anyone know if this has been fixed in a more up-to-date version of pandas?
[1] Note: As constructed in the original question, the mlk_rule will not actually fail to provide the NULL set to the dates() call over a range just preceding the start_date but will actually start throwing exceptions a year or so before that. This is because the mistaken assumption about the lack of need for a proper NULL set response is mitigated by the extension of the date range by a year in each direction by _reference_dates().
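To make the expected behaviour concrete, here is a pandas-free sketch of the rule itself: the third Monday of January, observed from 1998 onward, with an empty list for any range that contains no observed date. It is only a reference for what `dates()` should return, not a replacement for the Holiday class; `third_monday_of_january` and `mlk_days_between` are hypothetical helper names.

```python
from datetime import date, timedelta

# From the text above: the NYSE first observed MLK Day in January 1998.
FIRST_OBSERVED_YEAR = 1998

def third_monday_of_january(year):
    """Return the third Monday of January for the given year."""
    first = date(year, 1, 1)
    days_to_first_monday = (0 - first.weekday()) % 7  # Monday is weekday 0
    return first + timedelta(days=days_to_first_monday + 14)

def mlk_days_between(start, end):
    """List observed MLK Days in [start, end]; empty before the first observance."""
    found = []
    for year in range(start.year, end.year + 1):
        if year < FIRST_OBSERVED_YEAR:
            continue  # the rule does not exist yet, so contribute nothing
        candidate = third_monday_of_january(year)
        if start <= candidate <= end:
            found.append(candidate)
    return found

print(mlk_days_between(date(1999, 1, 17), date(1999, 5, 1)))
print(mlk_days_between(date(1995, 1, 17), date(1995, 5, 1)))
```

A range that lies entirely before 1998 simply yields an empty list, which is the null-set response the answer argues the Holiday class should give.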
Is there a better / more direct way to calculate this than the following?
# 1. Set up the start and end date for which you want to calculate the
# number of business days excluding holidays.
start_date = '01JAN1986'
end_date = '31DEC1987'
start_date = datetime.datetime.strptime(start_date, '%d%b%Y')
end_date = datetime.datetime.strptime(end_date, '%d%b%Y')
# 2. Generate a list of holidays over this period
from pandas.tseries.holiday import USFederalHolidayCalendar
calendar = USFederalHolidayCalendar()
holidays = calendar.holidays(start_date, end_date)
holidays
Which gives a pandas.tseries.index.DatetimeIndex
DatetimeIndex(['1986-01-01', '1986-01-20', '1986-02-17', '1986-05-26',
'1986-07-04', '1986-09-01', '1986-10-13', '1986-11-11',
'1986-11-27', '1986-12-25', '1987-01-01', '1987-01-19',
'1987-02-16', '1987-05-25', '1987-07-03', '1987-09-07',
'1987-10-12', '1987-11-11', '1987-11-26', '1987-12-25'],
dtype='datetime64[ns]', freq=None, tz=None)
But you need a list for numpy busday_count
holiday_date_list = holidays.date.tolist()
Then with and without the holidays you get:
np.busday_count(start_date.date(), end_date.date())
>>> 521
np.busday_count(start_date.date(), end_date.date(), holidays = holiday_date_list)
>>> 501
There are some other questions slightly similar but generally working with pandas Series or Dataframes (Get business days between start and end date using pandas, Counting the business days between two series)
If you put the index you created in a dataframe, you can use resample to fill in the gaps. The offset passed to .resample() can include things like business days and even (custom) calendars:
from pandas.tseries.holiday import USFederalHolidayCalendar
C = pd.offsets.CustomBusinessDay(calendar=USFederalHolidayCalendar())
start_date = '01JAN1986'
end_date = '31DEC1987'
(
pd.DataFrame(index=pd.to_datetime([start_date, end_date]))
.resample(C, closed='right')
.asfreq()
.index
.size
) - 1
The size of the index - 1 then gives us the number of days.
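If neither pandas nor numpy is required, the weekday count also has a near closed form: every full week contributes five weekdays, and only the leftover days need checking. The sketch below is a standard-library alternative under that assumption, not a replacement for `np.busday_count`. `weekdays_between` is a hypothetical helper, the range is half-open like `busday_count`'s, and the holiday strings are copied from the DatetimeIndex shown in the question.

```python
from datetime import date

def weekdays_between(start, end):
    """Count Mon-Fri days in the half-open range [start, end), assuming start <= end."""
    full_weeks, extra = divmod((end - start).days, 7)
    count = full_weeks * 5
    # Only the leftover days (fewer than 7) need an explicit weekday check.
    for i in range(extra):
        if (start.weekday() + i) % 7 < 5:
            count += 1
    return count

# Holiday dates copied from the DatetimeIndex output in the question.
holiday_strings = [
    "1986-01-01", "1986-01-20", "1986-02-17", "1986-05-26", "1986-07-04",
    "1986-09-01", "1986-10-13", "1986-11-11", "1986-11-27", "1986-12-25",
    "1987-01-01", "1987-01-19", "1987-02-16", "1987-05-25", "1987-07-03",
    "1987-09-07", "1987-10-12", "1987-11-11", "1987-11-26", "1987-12-25",
]
holidays = {date.fromisoformat(s) for s in holiday_strings}

start, end = date(1986, 1, 1), date(1987, 12, 31)
weekday_total = weekdays_between(start, end)
# Subtract only the holidays that are inside the range and fall on a weekday.
weekday_holidays = sum(1 for h in holidays if start <= h < end and h.weekday() < 5)
print(weekday_total, weekday_total - weekday_holidays)
```

The two printed numbers agree with the 521 and 501 reported in the question.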
In Python, can you select a random date from a year? For example, if the year was 2010, a date returned could be 15/06/2010.
It's much simpler to use ordinal dates (according to which today's date is 734158):
from datetime import date
import random
start_date = date.today().replace(day=1, month=1).toordinal()
end_date = date.today().toordinal()
random_day = date.fromordinal(random.randint(start_date, end_date))
This will fail for dates before 1AD.
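The snippet above picks a date between 1 January of the current year and today. For an arbitrary year, as in the question's 2010 example, the same ordinal idea can be sketched as follows; `random_date_in_year` is a hypothetical helper name.

```python
from datetime import date
import random

def random_date_in_year(year):
    """Return a uniformly random date within the given year."""
    first = date(year, 1, 1).toordinal()
    last = date(year, 12, 31).toordinal()
    # randint includes both endpoints, so 31 December can be drawn too.
    return date.fromordinal(random.randint(first, last))

sample = random_date_in_year(2010)
print(sample.strftime("%d/%m/%Y"))
```

Working from the ordinals of 1 January and 31 December means leap years need no special handling.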
Not directly, but you could add a random number of days to January 1st. I guess the following should work for the Gregorian calendar:
from datetime import date, timedelta
import random
import calendar
# Assuming you want a random day of the current year
firstJan = date.today().replace(day=1, month=1)
randomDay = firstJan + timedelta(days = random.randint(0, 365 if calendar.isleap(firstJan.year) else 364))
import datetime, time
import random
def year_start(year):
    # mktime returns a float; randrange needs integers
    return int(time.mktime(datetime.date(year, 1, 1).timetuple()))
def rand_day(year):
    stamp = random.randrange(year_start(year), year_start(year + 1))
    return datetime.date.fromtimestamp(stamp)
Edit: Ordinal dates, as used in Michael Dunn's answer, are far better than timestamps! One might want to combine the use of ordinals with this, though.
import calendar
import datetime
import random
def generate_random_date(future=True, years=1):
    today = datetime.date.today()
    # Set the default dates
    day = today.day
    year = today.year
    month = today.month
    if future:
        year = random.randint(year, year + years)
        month = random.randint(month, 12)
        days_in_month = calendar.monthrange(year, month)[1]  # days in the chosen month
        # Later in the month than today where possible, clamped to the month length
        day = random.randint(min(day + 1, days_in_month), days_in_month)
    else:
        year = random.randint(year - years, year)  # randint needs low <= high
        month = random.randint(1, month)
        days_in_month = calendar.monthrange(year, month)[1]
        # Earlier in the month than today where possible, clamped to a valid day
        day = random.randint(1, max(1, min(day - 1, days_in_month)))
    return datetime.date(year, month, day)
This is an old question, but you can use my new library ^_^ chancepy:
from chancepy import Chance
randomDate = Chance.date(year=2020)
To get a random date you can use faker
pip install faker
from faker import Faker
fake = Faker()
fake.date_between(start_date='today', end_date='+1y')
If you want to start from the beginning of the year, then:
start_date = datetime.date(year=2023, month=1, day=1)
fake.date_between(start_date, end_date='+1y')