pandas Calendar issue - python

This code for CustomBusinessDay() works fine:
from datetime import datetime
from pandas.tseries.offsets import CustomBusinessDay
runday = datetime(2021,12,30).date()
nextday = (runday + CustomBusinessDay()).date()
output 1:
In [26]: nextday
Out[26]: datetime.date(2021, 12, 31)
However, when I pass the optional calendar argument, it returns the following business day (Jan 3, 2022), even though today's date (Dec 31, 2021) is not a holiday according to the calendar specified below:
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, \
    USMemorialDay, USMartinLutherKingJr, USPresidentsDay, GoodFriday, \
    USLaborDay, USThanksgivingDay, nearest_workday
class NYSECalendar(AbstractHolidayCalendar):
    ''' NYSE holiday calendar via pandas '''
    rules = [
        Holiday('New Years Day', month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        GoodFriday,
        USMemorialDay,
        Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USThanksgivingDay,
        Holiday('Christmas', month=12, day=25, observance=nearest_workday),
    ]
nextday = (runday + CustomBusinessDay(calendar=NYSECalendar())).date()
output2:
In [27]: nextday
Out[27]: datetime.date(2022, 1, 3)

This line is the location of your problem:
Holiday('New Years Day', month=1, day=1, observance=nearest_workday)
If you take a look at the source code, nearest_workday means that the holiday is observed on a Friday if it falls on a Saturday, and on a Monday if the holiday falls on a Sunday. Since New Year's Day 2022 falls on a Saturday, it is observed today (12/31/2021) according to your calendar.
Removing the observance parameter will lead to an output of 2021-12-31.
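The rule can be illustrated with a small standard-library sketch. The helper nearest_workday_sketch below is a hypothetical stand-in that only restates the Saturday/Sunday rule described above; it is not the pandas implementation itself.

```python
from datetime import date, timedelta

def nearest_workday_sketch(day: date) -> date:
    """Hypothetical restatement of the rule: Saturday -> Friday, Sunday -> Monday."""
    if day.weekday() == 5:      # Saturday: observe on the preceding Friday
        return day - timedelta(days=1)
    if day.weekday() == 6:      # Sunday: observe on the following Monday
        return day + timedelta(days=1)
    return day                  # weekdays are observed on the day itself

new_year = date(2022, 1, 1)                 # falls on a Saturday
observed = nearest_workday_sketch(new_year)
print(observed)  # 2021-12-31
```

Under this rule, Dec 31, 2021 is treated as the observed holiday, which is why the offset skips to Jan 3, 2022.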

Related

How to remove a specific holiday from Pandas USFederalHolidayCalendar?

I'm trying to remove Columbus Day from pandas.tseries.holiday.USFederalHolidayCalendar.
This seems to be possible, as a one-time operation, with
from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()
cal = cal.rules.pop(6)
However, if this code is within a function that gets called repeatedly (in a loop) to generate several independent outputs, I get the following error:
IndexError: pop index out of range
It gives me the impression that the object remains in its initial loaded state and as the loop progresses it pops holidays at index 6 until they're gone and then throws an error.
I tried reloading via importlib.reload to no avail.
Any idea what I'm doing wrong?
# Import your library
from pandas.tseries.holiday import USFederalHolidayCalendar
# Get an id of 'columbus' in 'rules' list
columbus_index = USFederalHolidayCalendar().rules.index([i for i in USFederalHolidayCalendar().rules if 'Columbus' in str(i)][0])
# Create your own class, inherit 'USFederalHolidayCalendar'
class USFederalHolidayCalendar(USFederalHolidayCalendar):
    # Exclude 'columbus' entry
    rules = USFederalHolidayCalendar().rules[:columbus_index] + USFederalHolidayCalendar().rules[columbus_index+1:]
# Create an object from your class
cal = USFederalHolidayCalendar()
print(cal.rules)
[Holiday: New Years Day (month=1, day=1, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Martin Luther King Jr. Day (month=1, day=1, offset=<DateOffset: weekday=MO(+3)>),
Holiday: Presidents Day (month=2, day=1, offset=<DateOffset: weekday=MO(+3)>),
Holiday: Memorial Day (month=5, day=31, offset=<DateOffset: weekday=MO(-1)>),
Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Labor Day (month=9, day=1, offset=<DateOffset: weekday=MO(+1)>),
Holiday: Veterans Day (month=11, day=11, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Thanksgiving (month=11, day=1, offset=<DateOffset: weekday=TH(+4)>),
Holiday: Christmas (month=12, day=25, observance=<function nearest_workday at 0x7f6afad571f0>)]
The problem here is that rules is a class attribute (a list of objects). See the class definition in the pandas source:
class USFederalHolidayCalendar(AbstractHolidayCalendar):
    """
    US Federal Government Holiday Calendar based on rules specified by:
    https://www.opm.gov/policy-data-oversight/
       snow-dismissal-procedures/federal-holidays/
    """
    rules = [
        Holiday("New Years Day", month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        USMemorialDay,
        Holiday("July 4th", month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USColumbusDay,
        Holiday("Veterans Day", month=11, day=11, observance=nearest_workday),
        USThanksgivingDay,
        Holiday("Christmas", month=12, day=25, observance=nearest_workday),
    ]
Since the attribute is defined on the class, there is only one underlying list referred to, so if operations on different instances of that class both attempt to edit the list, then you'll have some unwanted behavior. Here is an example that shows what's going on:
>>> class A:
...     rules = [0,1,2]
...
>>> a1 = A()
>>> a2 = A()
>>> a1.rules.pop()
2
>>> a1.rules.pop()
1
>>> a2.rules.pop()
0
>>> a2.rules.pop()
IndexError: pop from empty list
>>> a3 = A()
>>> a3.rules
[]
Also, Python imports each module only once and caches it in sys.modules, so repeating the import statement does not reset the list.
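One way to avoid mutating the shared list is to build a new list on a subclass instead of calling pop(). The sketch below uses plain strings and the hypothetical names BaseCalendar, NoColumbusCalendar and make_calendar to keep it self-contained; the same idea should carry over to a calendar class whose rules are Holiday objects.

```python
class BaseCalendar:
    # Stand-in for a calendar whose rules list lives on the class itself
    rules = ["New Years Day", "Labor Day", "Columbus Day", "Christmas"]

class NoColumbusCalendar(BaseCalendar):
    # Build a fresh list; the inherited BaseCalendar.rules is left untouched
    rules = [rule for rule in BaseCalendar.rules if rule != "Columbus Day"]

def make_calendar():
    # Safe to call repeatedly: nothing is popped from a shared list
    return NoColumbusCalendar()

for _ in range(3):
    cal = make_calendar()

print(cal.rules)           # ['New Years Day', 'Labor Day', 'Christmas']
print(BaseCalendar.rules)  # still contains 'Columbus Day'
```

Because the filtering happens once, when the subclass is defined, calling the function in a loop no longer shrinks the list.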

python last working day of month (with CustomBusinessDay)?

I'd like to calculate the last working day before or after a specific date, taking holidays into account and not just weekends.
import datetime as dt
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
    USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
    USLaborDay, USThanksgivingDay
class USTradingCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        GoodFriday,
        USMemorialDay,
        Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USThanksgivingDay,
        Holiday('Christmas', month=12, day=25, observance=nearest_workday)
    ]
def get_trading_close_holidays(fromyear, toyear):
    inst = USTradingCalendar()
    return inst.holidays(dt.datetime(fromyear-1, 12, 31), dt.datetime(toyear, 12, 31))
print(get_trading_close_holidays(2018,2018))
>> DatetimeIndex(['2018-01-01', '2018-01-15', '2018-02-19', '2018-03-30', '2018-05-28', '2018-07-04', '2018-09-03', '2018-11-22', '2018-12-25'], dtype='datetime64[ns]', freq=None)
import datetime as dt
from pandas.tseries.offsets import CustomBusinessDay
bday_us = CustomBusinessDay(calendar=get_trading_close_holidays(2000,2050))
d = dt.datetime(2018, 3, 31)
d - bday_us
>> Timestamp('2018-03-30 00:00:00')
But 2018-03-30 is Good Friday, which is in the holiday list shown above, so I expected the result to be one day earlier: 2018-03-29.
What's the issue?
I was able to reproduce the problem and after some testing I've narrowed it down to using a DatetimeIndex as the input of the calendar parameter in CustomBusinessDay.
You can skip that and use the calendar instance directly:
import datetime as dt
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
    USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
    USLaborDay, USThanksgivingDay
from pandas.tseries.offsets import CustomBusinessDay, BDay
class USTradingCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        GoodFriday,
        USMemorialDay,
        Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USThanksgivingDay,
        Holiday('Christmas', month=12, day=25, observance=nearest_workday)
    ]
bday_us = CustomBusinessDay(calendar=USTradingCalendar())
d = dt.datetime(2018, 3, 31)
c = d - bday_us
print(c)
The output:
2018-03-29 00:00:00
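If you already have the holiday dates as an explicit list or index, another option is the holidays parameter of CustomBusinessDay, which takes the dates to exclude. A minimal sketch, assuming a single hard-coded Good Friday date in place of the full calendar output:

```python
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Assumed input: explicit holiday dates (here only Good Friday 2018)
holiday_dates = pd.DatetimeIndex(["2018-03-30"])

# Pass the dates through `holidays`, not `calendar`
bday = CustomBusinessDay(holidays=holiday_dates.tolist())

previous = pd.Timestamp("2018-03-31") - bday
print(previous)  # 2018-03-29 00:00:00
```

This gives the same result as passing the calendar instance, while keeping control over the exact list of dates.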

How to filter two datetime indices?

I have two datetime indices - one being a date_range of business days and the other being a list of holidays.
I filter the holiday list by a start and end date. But now I need to join them and drop any duplicates (holidays and trading days both exist).
Finally, I need to convert the date range into a list of formatted strings (i.e. yyyy_mm_dd) that I can iterate through later.
Here is my code so far:
import datetime
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
    USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
    USLaborDay, USThanksgivingDay
class USTradingCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        GoodFriday,
        USMemorialDay,
        Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USThanksgivingDay,
        Holiday('Christmas', month=12, day=25, observance=nearest_workday)
    ]
def get_trading_close_holidays(year):
    inst = USTradingCalendar()
    return inst.holidays(datetime.datetime(year-1, 12, 31),
                         datetime.datetime(year, 12, 31))
start_date = "2017_07_01"
end_date = "2017_08_31"
start_date = datetime.datetime.strptime(start_date,"%Y_%m_%d").date()
end_date = datetime.datetime.strptime(end_date,"%Y_%m_%d").date()
date_range = pd.bdate_range(start = start_date, end = end_date, name = "trading_days")
holidays = get_trading_close_holidays(start_date.year)
holidays = holidays.where((holidays.date > start_date) &
                          (holidays.date < end_date))
holidays = holidays.dropna(how = 'any')
date_range = date_range.where(~(date_range.trading_days.isin(holidays)))
Consider filtering with a boolean mask; isin handles any number of holidays:
date_range = date_range[~date_range.isin(holidays)]
print(date_range) # ONE HOLIDAY 2017-07-04 DOES NOT APPEAR
# DatetimeIndex(['2017-07-03', '2017-07-05', '2017-07-06', '2017-07-07',
# '2017-07-10', '2017-07-11', '2017-07-12', '2017-07-13',
# '2017-07-14', '2017-07-17', '2017-07-18', '2017-07-19',
# '2017-07-20', '2017-07-21', '2017-07-24', '2017-07-25',
# '2017-07-26', '2017-07-27', '2017-07-28', '2017-07-31',
# '2017-08-01', '2017-08-02', '2017-08-03', '2017-08-04',
# '2017-08-07', '2017-08-08', '2017-08-09', '2017-08-10',
# '2017-08-11', '2017-08-14', '2017-08-15', '2017-08-16',
# '2017-08-17', '2017-08-18', '2017-08-21', '2017-08-22',
# '2017-08-23', '2017-08-24', '2017-08-25', '2017-08-28',
# '2017-08-29', '2017-08-30', '2017-08-31'],
# dtype='datetime64[ns]', name='trading_days', freq=None)
Then use astype() to convert the dates to a string array, and tolist() to turn it into a list:
strdates = date_range.date.astype('str').tolist()
print(strdates)
# ['2017-07-03', '2017-07-05', '2017-07-06', '2017-07-07', '2017-07-10',
# '2017-07-11', '2017-07-12', '2017-07-13', '2017-07-14', '2017-07-17',
# '2017-07-18', '2017-07-19', '2017-07-20', '2017-07-21', '2017-07-24',
# '2017-07-25', '2017-07-26', '2017-07-27', '2017-07-28', '2017-07-31',
# '2017-08-01', '2017-08-02', '2017-08-03', '2017-08-04', '2017-08-07',
# '2017-08-08', '2017-08-09', '2017-08-10', '2017-08-11', '2017-08-14',
# '2017-08-15', '2017-08-16', '2017-08-17', '2017-08-18', '2017-08-21',
# '2017-08-22', '2017-08-23', '2017-08-24', '2017-08-25', '2017-08-28',
# '2017-08-29', '2017-08-30', '2017-08-31']
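The strings above use the default yyyy-mm-dd layout. To get the yyyy_mm_dd format from the question, strftime on the index is one option. A short sketch over a four-day range, with the variable names days, trading and labels chosen only for illustration:

```python
import pandas as pd

# Small assumed range: Mon 2017-07-03 through Thu 2017-07-06
days = pd.bdate_range("2017-07-03", "2017-07-06", name="trading_days")
holidays = pd.DatetimeIndex(["2017-07-04"])

# Drop the holiday, then format each remaining date as yyyy_mm_dd
trading = days[~days.isin(holidays)]
labels = trading.strftime("%Y_%m_%d").tolist()
print(labels)  # ['2017_07_03', '2017_07_05', '2017_07_06']
```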

Changing pandas class variables in python

I'm using the pandas.tseries.holiday module and have run into an issue with its AbstractHolidayCalendar class. The goal is to obtain a calendar that accounts for all UK bank holidays and weekends when rolling business days forward.
My existing code (taken from another user) is:
class EnglandAndWalesHolidayCalendar(AbstractHolidayCalendar):
    rules = [Holiday('New Years Day', month=1, day=1, observance=next_monday),
             GoodFriday,
             EasterMonday,
             Holiday('Early May bank holiday',
                     month=5, day=1, offset=DateOffset(weekday=MO(1))),
             Holiday('Spring bank holiday',
                     month=5, day=31, offset=DateOffset(weekday=MO(-1))),
             Holiday('Summer bank holiday',
                     month=8, day=31, offset=DateOffset(weekday=MO(-1))),
             Holiday('Christmas Day', month=12, day=25, observance=next_monday),
             Holiday('Boxing Day',
                     month=12, day=26, observance=next_monday_or_tuesday)]
In the pandas module mentioned, I see the following:
class AbstractHolidayCalendar(object):
    """
    Abstract interface to create holidays following certain rules.
    """
    __metaclass__ = HolidayCalendarMetaClass
    rules = []
    start_date = Timestamp(datetime(1970, 1, 1))
    end_date = Timestamp(datetime(2030, 12, 31))
    _cache = None
However, when I add
end_date = Timestamp(datetime(2080, 12, 31))
to my own class, it doesn't seem to work. Does anyone know how to adjust the end date without directly changing the pandas module?
Thanks

Finding the previous month

I've seen some methods that use the dateutil module to do this, but is there a way to do it using just the built-in libraries?
For example, the current month right now is July, which I can get using the datetime.now() function.
What would be the easiest way for Python to return the previous month?
It's very easy:
>>> from datetime import datetime
>>> previous_month = datetime.now().month - 1
>>> if previous_month == 0:
...     previous_month = 12
You can use the calendar module:
>>> from calendar import month_name, month_abbr
>>> from datetime import datetime
>>> d = datetime.now()
>>> month_name[d.month - 1] or month_name[-1]
'June'
>>> month_abbr[d.month - 1] or month_abbr[-1]
'Jun'
If you just want the name as a string, index into a tuple that is shifted by one month:
import datetime
months = ("Blank", "December", "January", "February", "March", "April",
          "May", "June", "July", "August", "September", "October", "November")
d = datetime.date.today()
print(months[d.month])
Generalized function finding the year and month, based on a month delta:
# %% function
from datetime import date
def get_year_month(ref_date, month_delta):
    # divmod on a zero-based month index handles year rollover in both directions
    year_delta, month_index = divmod(ref_date.month - 1 + month_delta, 12)
    year = ref_date.year + year_delta
    month = month_index + 1
    return year, month
# %% test
some_date = date(2022, 5, 31)
for delta in range(-12, 12):
    year, month = get_year_month(some_date, delta)
    print(f"{delta=}, {year=}, {month=}")
delta=-12, year=2021, month=5
delta=-11, year=2021, month=6
delta=-10, year=2021, month=7
delta=-9, year=2021, month=8
delta=-8, year=2021, month=9
delta=-7, year=2021, month=10
delta=-6, year=2021, month=11
delta=-5, year=2021, month=12
delta=-4, year=2022, month=1
delta=-3, year=2022, month=2
delta=-2, year=2022, month=3
delta=-1, year=2022, month=4
delta=0, year=2022, month=5
delta=1, year=2022, month=6
delta=2, year=2022, month=7
delta=3, year=2022, month=8
delta=4, year=2022, month=9
delta=5, year=2022, month=10
delta=6, year=2022, month=11
delta=7, year=2022, month=12
delta=8, year=2023, month=1
delta=9, year=2023, month=2
delta=10, year=2023, month=3
delta=11, year=2023, month=4
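If you want a date object, step back one day from the first of the current month (subtracting a fixed 30 days can stay in the same month, e.g. on March 31):
import datetime
first_of_month = datetime.date.today().replace(day=1)
d = first_of_month - datetime.timedelta(days=1)  # last day of the previous month
# e.g. run on 2015-07-29: datetime.date(2015, 6, 30)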
