How to extract holiday names from Pandas USFederalHolidayCalendar? - python

I would like to extract US holiday names (e.g. "Memorial Day") using USFederalHolidayCalendar in the Pandas library. The code below only prints the US holiday dates in the given range. I don't necessarily need to use Pandas for this if there is an easier way.
import datetime
from pandas.tseries.holiday import USFederalHolidayCalendar

cal = USFederalHolidayCalendar()
y_str = datetime.datetime.now().strftime("%Y")
holidays = cal.holidays(start=y_str + '-01-01', end=y_str + '-12-31')
for h in holidays:
    print(h)
I know that "cal.rules" returns a list like the one below. Should I extract the names from this? If so, it doesn't look like a list of strings. What type are the list's elements?
[Holiday: New Years Day (month=1, day=1, observance=<function nearest_workday at 0x0000012EF1B3B280>), Holiday: Martin Luther King Jr. Day (month=1, day=1, offset=<DateOffset: weekday=MO(+3)>), Holiday: Presidents Day (month=2, da

cal.rules gives a list of pandas.tseries.holiday.Holiday objects. These objects have .name attributes (see source). So, you can do the following:
from pandas.tseries.holiday import USFederalHolidayCalendar
import datetime
cal = USFederalHolidayCalendar()
holidays = cal.rules
print([holiday.name for holiday in holidays])
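If the names are needed together with the actual dates in a range, a minimal sketch is to pass return_name=True to cal.holidays(), which returns a Series indexed by date with the holiday names as values. The year 2023 is only an example range, and the exact name strings can differ between pandas versions.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar

cal = USFederalHolidayCalendar()

# return_name=True yields a Series: index = holiday dates, values = holiday names.
names_by_date = cal.holidays(start="2023-01-01", end="2023-12-31", return_name=True)

for day, name in names_by_date.items():
    print(day.date(), name)
```

Looking up a single date then works like any Series lookup, e.g. names_by_date[pd.Timestamp("2023-05-29")].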

Related

Adding a range of dates as one holiday rule, instead of just a single date, in Pandas.tseries AbstractHolidayCalendar?

I'm working on a Python script to offset a given start date with X number of business days according to a custom holiday calendar. Pandas.tseries seems to be a good choice.
When building my generic holiday calendar, I have come across examples on adding a single date to the holiday rules.
Example:
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, Easter
from pandas.tseries.offsets import Day
class myCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday('Off-day during Easter', month=1, day=1, offset=[Easter(), Day(-2)]),
        Holiday('Christmas Day', month=12, day=25)
    ]
When using a function like this:
def offset_date(start, offset):
    return start + pd.offsets.CustomBusinessDay(n=offset, calendar=myCalendar())
The dates within the rules will be skipped as expected.
But I now want to add 3 full weeks (21 days) to the rule set, with a given start offset, without writing 21 rule lines to achieve the same thing.
I wonder if you guys know whether it's possible to create a one-liner that adds 21 days to the rule set?
Here is one way to do it with a list comprehension, which keeps it short and readable:
class myCalendar(AbstractHolidayCalendar):
    rules = [
        Holiday("Off-day during Easter", month=1, day=1, offset=[Easter(), Day(-2)]),
        Holiday("Christmas Day", month=12, day=25),
    ] + [Holiday("Ski days", month=2, day=x) for x in range(1, 22)]
Here, a 21-day period off, starting February 1st, is added to the set of rules.
So that:
print(offset_date(pd.to_datetime("2023-01-31"), 1))
# 2023-02-22 00:00:00 as expected
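The day=x comprehension only works while the whole block stays inside one month. A possible variation, sketched here under that assumption, anchors every rule at the first day of the block and shifts it with a DateOffset, so the block can run across a month boundary. The helper consecutive_days_off and the class SkiCalendar are hypothetical names made up for this example.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday


def consecutive_days_off(name, month, day, length):
    """Build one Holiday rule per day of a block starting at month/day (hypothetical helper)."""
    return [
        Holiday(f"{name} {i + 1}", month=month, day=day, offset=pd.DateOffset(days=i))
        for i in range(length)
    ]


class SkiCalendar(AbstractHolidayCalendar):  # hypothetical calendar name
    # 21 days off starting January 20th, running into February.
    rules = consecutive_days_off("Ski day", month=1, day=20, length=21)


cal = SkiCalendar()
next_open_day = pd.Timestamp("2023-01-19") + pd.offsets.CustomBusinessDay(n=1, calendar=cal)
print(next_open_day)
```

With the block covering January 20th through February 9th, the next business day after 2023-01-19 should land on 2023-02-10.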

How can I get the year, month, and day from a Deephaven DateTime in Python?

I have a Deephaven DateTime in the New York (US-East) timezone and I'd like to get the year, month, and day (of the month) numbers from it as integers in Python.
Deephaven's time module has these utilities. You may have used it to create a Deephaven DateTime in the first place.
from deephaven import time as dhtu
timestamp = dhtu.to_datetime("2022-04-01T12:00:00 NY")
The following three methods will give you what you're looking for:
year - Gets the year
month_of_year - Gets the month
day_of_month - Gets the day of the month
All three methods will give you what you want based on the DateTime itself and your preferred time zone.
tz_ny = dhtu.TimeZone.NY
year = dhtu.year(timestamp, tz_ny)
month = dhtu.month_of_year(timestamp, tz_ny)
day = dhtu.day_of_month(timestamp, tz_ny)

Create Pandas Holiday for SIFMA Good Friday

For the SIFMA (bond) market, Good Friday is NOT a holiday if it is the first Friday of the month, because the NFP (Non-Farm Payroll) numbers are released that day.
We can use a hack to take the existing holidays and not include them if day of month <= 7, but this is ugly. Is there a way to properly define rules where the NFP Fridays are taken into account?
This works, but is ugly:
from pandas.tseries.holiday import (
    AbstractHolidayCalendar,
    Holiday,
    nearest_workday,
    sunday_to_monday,
    GoodFriday,
)
# NFP occurs on the first Friday of every month, so skip Good Fridays with day <= 7
sifma_good_friday_holidays = [
    Holiday('Good Friday', year=d.year, month=d.month, day=d.day)
    for d in GoodFriday.dates('1900-01-01', '2200-12-31')
    if d.day > 7
]
Note: You cannot currently create a holiday where you pass in both an offset and an observance, so you can't simply add an observance that filters out the holidays where the day of the month is <= 7.
The following does seem to work, but also seems like quite a kludge.
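from datetime import date, datetime

def not_NFP_GoodFriday(dt: datetime):
    # Good Friday of the same year as dt
    possible = GoodFriday.dates(date(dt.year, 1, 1), date(dt.year, 12, 31))[0]
    if possible.day <= 7:
        # first Friday of the month (NFP release), so not a holiday
        return None
    # to_pydatetime() avoids the local-time-zone shift that fromtimestamp() can introduce
    return possible.to_pydatetime()

sifma_good_friday_holidays = Holiday("NFP Good Friday", month=1, day=1,
                                     observance=not_NFP_GoodFriday)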
Is there no better way than one of these two functions?
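For reference, here is a minimal sketch of how the first (list-based) variant could be plugged into a calendar and checked against two known Good Fridays. The class name SifmaGoodFridayCalendar is hypothetical, and the date range is shortened to 2010-2030 only to keep the example small.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, GoodFriday, Holiday

# One fixed-date rule per Good Friday that is not the first Friday of its month.
sifma_good_friday_holidays = [
    Holiday("Good Friday", year=d.year, month=d.month, day=d.day)
    for d in GoodFriday.dates("2010-01-01", "2030-12-31")
    if d.day > 7
]


class SifmaGoodFridayCalendar(AbstractHolidayCalendar):  # hypothetical name
    rules = sifma_good_friday_holidays


closed = SifmaGoodFridayCalendar().holidays("2022-01-01", "2023-12-31")
print(pd.Timestamp("2022-04-15") in closed)  # Good Friday 2022, mid-month
print(pd.Timestamp("2023-04-07") in closed)  # Good Friday 2023, first Friday of April
```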

Use of the individual parts with Tkcalendar

My problem is that I want to use the individual parts from the calendar. That means, I would like to be able to use dd, mm and y individually. I decided to use the German spelling: dd.mm.y.
datepicker = Calendar(root, selectmode="day", year=2021, month=1, day=1, date_pattern='ddmmy')
In order to only filter out the day, I have so far proceeded as follows:
birthday = str(datepicker.get_date())
bday = int(str(birthday)[:-6])
But that's just a stopgap. How can I properly separate the day, month and year so that I can use the values individually?
Thank you in advance for your answers, Best regards.
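One possible approach, sketched here with the standard library only, is to parse the string returned by get_date() instead of slicing it. The helper name split_date is hypothetical, and the format "%d%m%Y" assumes the 'ddmmy' pattern yields an eight-digit string such as "01012021".

```python
from datetime import datetime


def split_date(text, fmt="%d%m%Y"):
    """Parse a date string and return (day, month, year) as integers (hypothetical helper)."""
    parsed = datetime.strptime(text, fmt)
    return parsed.day, parsed.month, parsed.year


# Stand-in for str(datepicker.get_date()) with date_pattern='ddmmy'
day, month, year = split_date("01012021")
print(day, month, year)

# The same helper with a dotted pattern such as dd.mm.y
print(split_date("24.12.2021", "%d.%m.%Y"))
```

As far as I know, tkcalendar's Calendar also provides selection_get(), which returns a datetime.date whose .day, .month and .year attributes can be used directly, so parsing may not be needed at all.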

Python3 Pandas Holiday fails to NOT find dates in arbitrary periods in the past

I made my own definition of the MLK Day Holiday that is based not on when the holiday was first observed, but on when it was first observed by the NYSE. The NYSE first observed MLK Day in January of 1998.
When asking the Holiday for the days in which the holiday occurred between dates, it works fine for the most part, returning an empty set when the MLK date is not in the range requested, and returning the appropriate date when it is. For date ranges that precede the start_date of the holiday, it appropriately returns the empty set, until we hit around 1995, and then it fails. I cannot figure out why it fails then and not in other situations when the empty set is the correct answer.
Note: Still stuck on Pandas 0.22.0. Python3
import pandas as pd
from datetime import datetime
from dateutil.relativedelta import MO
from pandas.tseries.holiday import Holiday
__author__ = 'eb'
mlk_rule = Holiday('MLK Day (NYSE Observed)',
                   start_date=datetime(1998, 1, 1), month=1, day=1,
                   offset=pd.DateOffset(weekday=MO(3)))
start = pd.to_datetime('1999-01-17')
end = pd.to_datetime('1999-05-01')
finish = pd.to_datetime('1980-01-01')
while start > finish:
    print(f"{start} - {end}:")
    try:
        dates = mlk_rule.dates(start, end, return_name=True)
    except Exception as e:
        print("\t****** Fail *******")
        print(f"\t{e}")
        break
    print(f"\t{dates}")
    start = start - pd.DateOffset(years=1)
    end = end - pd.DateOffset(years=1)
When run, this results in:
1999-01-17 00:00:00 - 1999-05-01 00:00:00:
1999-01-18 MLK Day (NYSE Observed)
Freq: 52W-MON, dtype: object
1998-01-17 00:00:00 - 1998-05-01 00:00:00:
1998-01-19 MLK Day (NYSE Observed)
Freq: 52W-MON, dtype: object
1997-01-17 00:00:00 - 1997-05-01 00:00:00:
Series([], dtype: object)
1996-01-17 00:00:00 - 1996-05-01 00:00:00:
Series([], dtype: object)
1995-01-17 00:00:00 - 1995-05-01 00:00:00:
****** Fail *******
Must provide freq argument if no data is supplied
What happens in 1995 that causes it to fail, that does not happen in the same periods in the years before?
ANSWER: Inside the Holiday class, the dates() method gathers the list of valid holidays within a requested date range. To ensure that this works properly, the implementation gathers all holidays from one year before to one year after the requested date range via the internal _reference_dates() method. In this method, if the receiving Holiday instance has an internal start or end date, it uses that date as the beginning or end of the range to be examined rather than the requested range, even if the dates in the requested range precede or exceed the start or end date of the rule.
The existing implementation mistakenly assumes it is ok to limit the effective range over which it must accurately identify what holidays are in existence to the range over which holidays exist. As part of a set of rules in a calendar, it is as important for a Holiday to identify where holidays do not exist as where they do. The NULL set response is an important function of the Holiday class.
For example, in a Trading Day Calendar that needs to identify when financial markets are open or closed, the calendar may need to accurately identify which days the market is closed over a 100 year history. The market only closed for MLK day for a small part of that history. A calendar that includes the MLK holiday as constructed above throws an error when asked for the open days or holidays for periods preceding the MLK start_date[1].
To fix this, I re-implemented the _reference_dates() method in a custom subclass of Holiday to ensure that when the requested date range extends before the start_date or after the end_date of the holiday rule, it uses the actual requested range to build the reference dates, rather than bounding the request by the internal start and end dates.
Here is the implementation I am using.
class MLKHoliday(Holiday):
    def __init__(self):
        super().__init__('MLK Day (NYSE Observed)',
                         start_date=datetime(1998, 1, 1), month=1, day=1,
                         offset=pd.DateOffset(weekday=MO(3)))

    def _reference_dates(self, start_date, end_date):
        """
        Get reference dates for the holiday.

        Return reference dates for the holiday also returning the year
        prior to the start_date and year following the end_date. This ensures
        that any offsets to be applied will yield the holidays within
        the passed in dates.
        """
        if self.start_date and start_date and start_date >= self.start_date:
            start_date = self.start_date.tz_localize(start_date.tz)
        if self.end_date and end_date and end_date <= self.end_date:
            end_date = self.end_date.tz_localize(end_date.tz)
        year_offset = pd.DateOffset(years=1)
        reference_start_date = pd.Timestamp(
            datetime(start_date.year - 1, self.month, self.day))
        reference_end_date = pd.Timestamp(
            datetime(end_date.year + 1, self.month, self.day))
        # Don't process unnecessary holidays.
        # pd.date_range is used because pandas 1.0+ no longer accepts
        # DatetimeIndex(start=..., end=..., freq=...).
        dates = pd.date_range(start=reference_start_date,
                              end=reference_end_date,
                              freq=year_offset, tz=start_date.tz)
        return dates
Does anyone know if this has been fixed in a more up-to-date version of pandas?
[1] Note: As constructed in the original question, the mlk_rule will not actually fail to provide the NULL set to the dates() call over a range just preceding the start_date but will actually start throwing exceptions a year or so before that. This is because the mistaken assumption about the lack of need for a proper NULL set response is mitigated by the extension of the date range by a year in each direction by _reference_dates().
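As an alternative to overriding the private _reference_dates() method, a minimal sketch of a guard around Holiday.dates() is shown below. The function safe_dates is a hypothetical helper, not part of pandas. It returns an empty index whenever the requested range lies entirely outside the rule's start_date/end_date, so the result for such ranges does not depend on how a given pandas version handles them.

```python
from datetime import datetime

import pandas as pd
from dateutil.relativedelta import MO
from pandas.tseries.holiday import Holiday

mlk_rule = Holiday('MLK Day (NYSE Observed)',
                   start_date=datetime(1998, 1, 1), month=1, day=1,
                   offset=pd.DateOffset(weekday=MO(3)))


def safe_dates(rule, start, end):
    """Return rule.dates(start, end), or an empty index if the range misses the rule's validity window."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    if rule.start_date is not None and end < rule.start_date:
        return pd.DatetimeIndex([])
    if rule.end_date is not None and start > rule.end_date:
        return pd.DatetimeIndex([])
    return rule.dates(start, end)


print(safe_dates(mlk_rule, '1995-01-17', '1995-05-01'))
print(safe_dates(mlk_rule, '1999-01-17', '1999-05-01'))
```

This only helps when the rule is queried directly. A calendar built from the rule calls rule.dates() itself, so in that case a subclass like the one above is still needed.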
