Changing pandas class variables in python - python

I'm making use of the pandas.tseries.holiday module and have had an issue with the AbstractHolidayCalendar class within this module. The use is to obtain a calendar that accounts for all bank holidays/weekends in the UK when rolling forward business days.
My existing code (taken from another user) is:
class EnglandAndWalesHolidayCalendar(AbstractHolidayCalendar):
rules = [Holiday('New Years Day', month=1, day=1, observance=next_monday),
GoodFriday,
EasterMonday,
Holiday('Early May bank holiday',
month=5, day=1, offset=DateOffset(weekday=MO(1))),
Holiday('Spring bank holiday',
month=5, day=31, offset=DateOffset(weekday=MO(-1))),
Holiday('Summer bank holiday',
month=8, day=31, offset=DateOffset(weekday=MO(-1))),
Holiday('Christmas Day', month=12, day=25, observance=next_monday),
Holiday('Boxing Day',
month=12, day=26, observance=next_monday_or_tuesday)]
In the pandas module mentioned, I see the following:
class AbstractHolidayCalendar(object):
"""
Abstract interface to create holidays following certain rules.
"""
__metaclass__ = HolidayCalendarMetaClass
rules = []
start_date = Timestamp(datetime(1970, 1, 1))
end_date = Timestamp(datetime(2030, 12, 31))
_cache = None
However when adding
end_date = Timestamp(datetime(2080, 12, 31))
To my defined class it doesn't seem to work. Does anyone know how to adjust the end date without directly changing the pandas module?
Thanks

Related

pandas Calendar issue

This code for CustomBusinessDay() works fine:
from datetime import datetime
from pandas.tseries.offsets import CustomBusinessDay
runday = datetime(2021,12,30).date()
nextday = (runday + CustomBusinessDay()).date()
output 1:
In [26]: nextday
Out[26]: datetime.date(2021, 12, 31)
However, when adding an optional calendar as in date functionality , it produces the next business day even though today's date (Dec 31, 2021) is not a holiday according to a specified calendar below:
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, \
USMemorialDay, USMartinLutherKingJr, USPresidentsDay, GoodFriday, \
USLaborDay, USThanksgivingDay, nearest_workday
class NYSECalendar(AbstractHolidayCalendar):
''' NYSE holiday calendar via pandas '''
rules = [
Holiday('New Years Day', month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
GoodFriday,
USMemorialDay,
Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
USLaborDay,
USThanksgivingDay,
Holiday('Christmas', month=12, day=25, observance=nearest_workday),
]
nextday = (runday + CustomBusinessDay(calendar=NYSECalendar())).date()
output2:
In [27]: nextday
Out[27]: datetime.date(2022, 1, 3)
This line is the location of your problem:
Holiday('New Years Day', month=1, day=1, observance=nearest_workday)
If you take a look at the source code, nearest_workday means that the holiday is observed on a Friday if it falls on a Saturday, and on a Monday if the holiday falls on a Sunday. Since New Year's Day 2022 falls on a Saturday, it is observed today (12/31/2021) according to your calendar.
Removing the observance parameter will lead to an output of 2021-12-31.

How to remove a specific holiday from Pandas USFederalHolidayCalendar?

I'm trying to remove Columbus Day from pandas.tseries.holiday.USFederalHolidayCalendar.
This seems to be possible, as a one-time operation, with
from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()
cal = cal.rules.pop(6)
However, if this code is within a function that gets called repeatedly (in a loop) to generate several independent outputs, I get the following error:
IndexError: pop index out of range
It gives me the impression that the object remains in its initial loaded state and as the loop progresses it pops holidays at index 6 until they're gone and then throws an error.
I tried reloading via importlib.reload to no avail.
Any idea what I'm doing wrong?
# Import your library
from pandas.tseries.holiday import USFederalHolidayCalendar
# Get an id of 'columbus' in 'rules' list
columbus_index = USFederalHolidayCalendar().rules.index([i for i in USFederalHolidayCalendar().rules if 'Columbus' in str(i)][0])
# Create your own class, inherit 'USFederalHolidayCalendar'
class USFederalHolidayCalendar(USFederalHolidayCalendar):
# Exclude 'columbus' entry
rules = USFederalHolidayCalendar().rules[:columbus_index] + USFederalHolidayCalendar().rules[columbus_index+1:]
# Create an object from your class
cal = USFederalHolidayCalendar()
print(cal.rules)
[Holiday: New Years Day (month=1, day=1, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Martin Luther King Jr. Day (month=1, day=1, offset=<DateOffset: weekday=MO(+3)>),
Holiday: Presidents Day (month=2, day=1, offset=<DateOffset: weekday=MO(+3)>),
Holiday: Memorial Day (month=5, day=31, offset=<DateOffset: weekday=MO(-1)>),
Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Labor Day (month=9, day=1, offset=<DateOffset: weekday=MO(+1)>),
Holiday: Veterans Day (month=11, day=11, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Thanksgiving (month=11, day=1, offset=<DateOffset: weekday=TH(+4)>),
Holiday: Christmas (month=12, day=25, observance=<function nearest_workday at 0x7f6afad571f0>)]
The problem here is that rules is a class attribute (a list of objects). See the code taken from here:
class USFederalHolidayCalendar(AbstractHolidayCalendar):
"""
US Federal Government Holiday Calendar based on rules specified by:
https://www.opm.gov/policy-data-oversight/
snow-dismissal-procedures/federal-holidays/
"""
rules = [
Holiday("New Years Day", month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
USMemorialDay,
Holiday("July 4th", month=7, day=4, observance=nearest_workday),
USLaborDay,
USColumbusDay,
Holiday("Veterans Day", month=11, day=11, observance=nearest_workday),
USThanksgivingDay,
Holiday("Christmas", month=12, day=25, observance=nearest_workday),
]
Since the attribute is defined on the class, there is only one underlying list referred to, so if operations on different instances of that class both attempt to edit the list, then you'll have some unwanted behavior. Here is an example that shows what's going on:
>>> class A:
... rules = [0,1,2]
...
>>> a1 = A()
>>> a2 = A()
>>> a1.rules.pop()
2
>>> a1.rules.pop()
1
>>> a2.rules.pop()
0
>>> a2.rules.pop()
IndexError: pop from empty list
>>> a3 = A()
>>> a3.rules
[]
Also, each module in python is imported only one time

Python for dummies using the bakeshare data

I want to run this code but I have some errors and I can't see the problem. The code is below. And do I have to set global list for cities, month and the day?
import time
import pandas as pd
import numpy as np
CITY_DATA = { 'chicago': 'chicago.csv', 'new york city': 'new_york_city.csv',
'washington': 'washington.csv' }
def get_filters():
"""
Asks user to specify a city, month, and day to analyze.
Returns:
(str) city - name of the city to analyze
(str) month - name of the month to filter by, or "all" to apply no month filter
(str) day - name of the day of week to filter by, or "all" to apply no day filter
"""
print('Hello! Let\'s explore some US bikeshare data!')
# get user input for city (chicago, new york city, washington). HINT: Use a while loop to handle invalid inputs
cities = ('Chicago', 'New York', 'Washington')
while True:
city = input('Which of these cities do you want to explore : Chicago, New York or Washington? \n> ').lower()
if city in cities:
break
# get user input for month (all, january, february, ... , june)
months = ['january', 'february', 'march', 'april', 'may', 'june']
month = get_user_input('Now you have to enter a month to get some months result) \n> ', months)
# get user input for day of week (all, monday, tuesday, ... sunday)
days = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday' ]
day = get_user_input('Now you have to enter a month to get some months result) \n> ', days)
print('-'*40)
return city, month, day
If I understood you correctly I think this is what you have to do:
CITY_DATA = { 'chicago': 'chicago.csv', 'new york city': 'new_york_city.csv',
'washington': 'washington.csv' }
def get_filters():
"""
Asks user to specify a city, month, and day to analyze.
Returns:
(str) city - name of the city to analyze
(str) month - name of the month to filter by, or "all" to apply no month filter
(str) day - name of the day of week to filter by, or "all" to apply no day filter
"""
print('Hello! Let\'s explore some US bikeshare data!')
# get user input for city (chicago, new york city, washington). HINT: Use a while loop to handle invalid inputs
cities = ('Chicago', 'New York', 'Washington')
while True:
city = input('Which of these cities do you want to explore : Chicago, New York or Washington? \n> ').capitalize() #was lower()
if city in cities:
break
# get user input for month (all, january, february, ... , june)
months = ['january', 'february', 'march', 'april', 'may', 'june']
month = input('Now you have to enter a month to get some months result \n> {} \n> '.format(months))
# get user input for day of week (all, monday, tuesday, ... sunday)
days = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday']
day = input('Now you have to enter a dau to get some days result \n> {} \n> '.format(days))
print('-'*40)
if month == '' and day == '':
return city, months, days
elif month == '' and day != '':
return city, months, day
elif month != '' and day == '':
return city, month, days
else:
return city, month, day
city, month, day = get_filters()
print(city, month, day)
A couple of things to note:
The input of the city was set to '.lower()' while the list of cities contained capitilized letters. So I changed it to capitilize().
Also you wanted it to return every day and month if the user didn't specify any input. Therefore I added a simple if-test at the end to check for that. Furthermore I used the '-format()' to display the different options the user has on the input. Hope everything is clear!
Using .lower() is fine, but you should change your cities list and your `CITY_DATA`` dict to match the names (so New York City --: new york).
CITY_DATA = { 'chicago': 'chicago.csv', 'new york': 'new_york_city.csv',
'washington': 'washington.csv' }
def get_filters():
"""
Asks user to specify a city, month, and day to analyze.
Returns:
(str) city - name of the city to analyze
(str) month - name of the month to filter by, or "all" to apply no month filter
(str) day - name of the day of week to filter by, or "all" to apply no day filter
"""
print('Hello! Let\'s explore some US bikeshare data!')
# get user input for city (chicago, new york city, washington). HINT: Use a while loop to handle invalid inputs
cities = ('chicago', 'new york', 'washington')
while True:
city = raw_input('Which of these cities do you want to explore : Chicago, New York or Washington? \n> ').lower()
if city in cities:
break
# get user input for month (all, january, february, ... , june)
months = ['january', 'february', 'march', 'april', 'may', 'june']
month = raw_input('Now you have to enter a month to get some months result \n> {} \n>'.format(months)).lower()
# get user input for day of week (all, monday, tuesday, ... sunday)
days = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday']
day = raw_input('Now you have to enter a day to get some days result \n> {} \n>'.format(days)).lower()
print('-'*40)
if month == '' and day == '':
return city, months, days
elif month == '' and day != '':
return city, months, day
elif month != '' and day == '':
return city, month, days
else:
return city, month, day
city, month, day = get_filters()
print(city, month, day)
If you are using Python 2.7, you should use raw_input() instead of input().

python last working day of month (with CustomBusinessDay)?

I like to calculate last working day before or after a specific date(includes holidays, not just weekends)?
import datetime as dt
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
USLaborDay, USThanksgivingDay
class USTradingCalendar(AbstractHolidayCalendar):
rules = [
Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
GoodFriday,
USMemorialDay,
Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
USLaborDay,
USThanksgivingDay,
Holiday('Christmas', month=12, day=25, observance=nearest_workday)
]
def get_trading_close_holidays(fromyear, toyear):
inst = USTradingCalendar()
return inst.holidays(dt.datetime(fromyear-1, 12, 31), dt.datetime(toyear, 12, 31))
print(get_trading_close_holidays(2018,2018))
>> DatetimeIndex(['2018-01-01', '2018-01-15', '2018-02-19', '2018-03-30', '2018-05-28', '2018-07-04', '2018-09-03', '2018-11-22', '2018-12-25'], dtype='datetime64[ns]', freq=None)
import datetime as dt
from pandas.tseries.holiday import USFederalHolidayCalendar
bday_us = CustomBusinessDay(calendar=get_trading_close_holidays(2000,2050))
d = dt.datetime(2018, 3, 31)
d - bday_us
>> Timestamp('2018-03-30 00:00:00')
This falls on Good Friday, that holiday(as shown)... should show 1 day before = 2018-03-29...
What's the issue?
I was able to reproduce the problem and after some testing I've narrowed it down to using a DatetimeIndex as the input of the calendar parameter in CustomBusinessDay.
You can skip that and use the calendar instance directly:
import datetime as dt
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
USLaborDay, USThanksgivingDay
from pandas.tseries.offsets import CustomBusinessDay, BDay
class USTradingCalendar(AbstractHolidayCalendar):
rules = [
Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
GoodFriday,
USMemorialDay,
Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
USLaborDay,
USThanksgivingDay,
Holiday('Christmas', month=12, day=25, observance=nearest_workday)
]
bday_us = CustomBusinessDay(calendar=USTradingCalendar())
d = dt.datetime(2018, 3, 31)
c = d - bday_us
print(c)
The output:
2018-03-29 00:00:00

How to filter two datetime indices?

I have two datetime indices - one being a date_range of business days and the other being a list of holidays.
I filter the holiday list by a start and end date. But now I need to join them and drop any duplicates (holidays and trading days both exist).
Finally I need to convert the daterange into a list of formatted strings ie: yyyy_mm_dd that I can iterate through later.
Here is my code so far:
import datetime
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
USLaborDay, USThanksgivingDay
class USTradingCalendar(AbstractHolidayCalendar):
rules = [
Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
GoodFriday,
USMemorialDay,
Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
USLaborDay,
USThanksgivingDay,
Holiday('Christmas', month=12, day=25, observance=nearest_workday)
]
def get_trading_close_holidays(year):
inst = USTradingCalendar()
return inst.holidays(datetime.datetime(year-1, 12, 31),
datetime.datetime(year, 12, 31))
start_date = "2017_07_01"
end_date = "2017_08_31"
start_date = datetime.datetime.strptime(start_date,"%Y_%m_%d").date()
end_date = datetime.datetime.strptime(end_date,"%Y_%m_%d").date()
date_range = pd.bdate_range(start = start_date, end = end_date, name =
"trading_days")
holidays = get_trading_close_holidays(start_date.year)
holidays = holidays.where((holidays.date > start_date) &
(holidays.date < end_date))
holidays = holidays.dropna(how = 'any')
date_range = date_range.where(~(date_range.trading_days.isin(holidays)))
Consider filtering by boolean condition:
date_range = date_range[date_range.date != holidays.date]
print(date_range) # ONE HOLIDAY 2017-07-04 DOES NOT APPEAR
# DatetimeIndex(['2017-07-03', '2017-07-05', '2017-07-06', '2017-07-07',
# '2017-07-10', '2017-07-11', '2017-07-12', '2017-07-13',
# '2017-07-14', '2017-07-17', '2017-07-18', '2017-07-19',
# '2017-07-20', '2017-07-21', '2017-07-24', '2017-07-25',
# '2017-07-26', '2017-07-27', '2017-07-28', '2017-07-31',
# '2017-08-01', '2017-08-02', '2017-08-03', '2017-08-04',
# '2017-08-07', '2017-08-08', '2017-08-09', '2017-08-10',
# '2017-08-11', '2017-08-14', '2017-08-15', '2017-08-16',
# '2017-08-17', '2017-08-18', '2017-08-21', '2017-08-22',
# '2017-08-23', '2017-08-24', '2017-08-25', '2017-08-28',
# '2017-08-29', '2017-08-30', '2017-08-31'],
# dtype='datetime64[ns]', name='trading_days', freq=None)
And using astype() to convert the datetime index to string type array, even tostring() for list conversion:
strdates = date_range.date.astype('str').tolist()
print(strdates)
# ['2017-07-03', '2017-07-05', '2017-07-06', '2017-07-07', '2017-07-10',
# '2017-07-11', '2017-07-12', '2017-07-13', '2017-07-14', '2017-07-17',
# '2017-07-18', '2017-07-19', '2017-07-20', '2017-07-21', '2017-07-24',
# '2017-07-25', '2017-07-26', '2017-07-27', '2017-07-28', '2017-07-31',
# '2017-08-01', '2017-08-02', '2017-08-03', '2017-08-04', '2017-08-07',
# '2017-08-08', '2017-08-09', '2017-08-10', '2017-08-11', '2017-08-14',
# '2017-08-15', '2017-08-16', '2017-08-17', '2017-08-18', '2017-08-21',
# '2017-08-22', '2017-08-23', '2017-08-24', '2017-08-25', '2017-08-28',
# '2017-08-29', '2017-08-30', '2017-08-31']

Categories