How to remove a specific holiday from Pandas USFederalHolidayCalendar?

How to remove a specific holiday from Pandas USFederalHolidayCalendar? - python

I'm trying to remove Columbus Day from pandas.tseries.holiday.USFederalHolidayCalendar.
This seems to be possible, as a one-time operation, with
from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()
cal = cal.rules.pop(6)
However, if this code is within a function that gets called repeatedly (in a loop) to generate several independent outputs, I get the following error:
IndexError: pop index out of range
It gives me the impression that the object remains in its initial loaded state and as the loop progresses it pops holidays at index 6 until they're gone and then throws an error.
I tried reloading via importlib.reload to no avail.
Any idea what I'm doing wrong?

# Import your library
from pandas.tseries.holiday import USFederalHolidayCalendar
# Get an id of 'columbus' in 'rules' list
columbus_index = USFederalHolidayCalendar().rules.index([i for i in USFederalHolidayCalendar().rules if 'Columbus' in str(i)][0])
# Create your own class, inherit 'USFederalHolidayCalendar'
class USFederalHolidayCalendar(USFederalHolidayCalendar):
# Exclude 'columbus' entry
rules = USFederalHolidayCalendar().rules[:columbus_index] + USFederalHolidayCalendar().rules[columbus_index+1:]
# Create an object from your class
cal = USFederalHolidayCalendar()
print(cal.rules)
[Holiday: New Years Day (month=1, day=1, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Martin Luther King Jr. Day (month=1, day=1, offset=<DateOffset: weekday=MO(+3)>),
Holiday: Presidents Day (month=2, day=1, offset=<DateOffset: weekday=MO(+3)>),
Holiday: Memorial Day (month=5, day=31, offset=<DateOffset: weekday=MO(-1)>),
Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Labor Day (month=9, day=1, offset=<DateOffset: weekday=MO(+1)>),
Holiday: Veterans Day (month=11, day=11, observance=<function nearest_workday at 0x7f6afad571f0>),
Holiday: Thanksgiving (month=11, day=1, offset=<DateOffset: weekday=TH(+4)>),
Holiday: Christmas (month=12, day=25, observance=<function nearest_workday at 0x7f6afad571f0>)]

The problem here is that rules is a class attribute (a list of objects). See the code taken from here:
class USFederalHolidayCalendar(AbstractHolidayCalendar):
"""
US Federal Government Holiday Calendar based on rules specified by:
https://www.opm.gov/policy-data-oversight/
snow-dismissal-procedures/federal-holidays/
"""
rules = [
Holiday("New Years Day", month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
USMemorialDay,
Holiday("July 4th", month=7, day=4, observance=nearest_workday),
USLaborDay,
USColumbusDay,
Holiday("Veterans Day", month=11, day=11, observance=nearest_workday),
USThanksgivingDay,
Holiday("Christmas", month=12, day=25, observance=nearest_workday),
]
Since the attribute is defined on the class, there is only one underlying list referred to, so if operations on different instances of that class both attempt to edit the list, then you'll have some unwanted behavior. Here is an example that shows what's going on:
>>> class A:
... rules = [0,1,2]
...
>>> a1 = A()
>>> a2 = A()
>>> a1.rules.pop()
2
>>> a1.rules.pop()
1
>>> a2.rules.pop()
0
>>> a2.rules.pop()
IndexError: pop from empty list
>>> a3 = A()
>>> a3.rules
[]
Also, each module in python is imported only one time

Related

KeyError when trying to access the mode of DataFrame columns

I am trying to run the following code:
import time
import pandas as pd
import numpy as np
CITY_DATA = {'chicago': 'chicago.csv',
'new york city': 'new_york_city.csv',
'washington': 'washington.csv'}
def get_filters():
"""
Asks user to specify a city, month, and day to analyze.
Returns:
(str) city - name of the city to analyze
(str) month - name of the month to filter by, or "all" to apply no month filter
(str) day - name of the day of week to filter by, or "all" to apply no day filter
"""
print('Hello! Let\'s explore some US bikeshare data!')
# get user input for city (chicago, new york city, washington). HINT: Use a while loop to handle invalid inputs
while True:
city = input('Which city you would like to explore : "chicago" , "new york city" , or "washington" :' )
if city not in ('chicago', 'new york city', 'washington'):
print(" You entered wrong choice , please try again")
continue
else:
break
# get user input for month (all, january, february, ... , june)
while True:
month = input('Enter "all" for all data or chose a month : "january" , "february" , "march", "april" , "may" or "june " :')
if month not in ("all", "january", "february", "march", "april", "may", "june"):
print(" You entered wrong choice , please try again")
continue
else:
break
# get user input for day of week (all, monday, tuesday, ... sunday)
while True:
day = input('Enter "all" for all days or chose a day : "saturday", "sunday", "monday", "tuesday", "wednesday", "thursday", "friday": ')
if day not in ("all","saturday", "sunday", "monday", "tuesday", "wednesday", "thursday", "friday"):
print(" You entered wrong choice , please try again")
continue
else:
break
print('-'*60)
return city, month, day
def load_data(city, month, day):
"""
Loads data for the specified city and filters by month and day if applicable.
Args:
(str) city - name of the city to analyze
(str) month - name of the month to filter by, or "all" to apply no month filter
(str) day - name of the day of week to filter by, or "all" to apply no day filter
Returns:
df - Pandas DataFrame containing city data filtered by month and day
"""
df = pd.read_csv(CITY_DATA[city])
# convert the Start Time column to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])
# extract month , day of week , and hour from Start Time to new columns
df['month'] = df['Start Time'].dt.month
df['day_of_week'] = df['Start Time'].dt.day_name
df['hour'] = df['Start Time'].dt.hour
# filter by month if applicable
if month != 'all':
# use the index of the month_list to get the corresponding int
months = ['january', 'february', 'march', 'april', 'may', 'june']
month = months.index(month) + 1
# filter by month to create the new dataframe
df = df[df['month'] == month]
# filter by day of week if applicable
if day != 'all':
# filter by day of week to create the new dataframe
df = df[df['day_of_week'] == day.title()]
return df
def time_stats(df):
"""Displays statistics on the most frequent times of travel."""
print('\nCalculating The Most Frequent Times of Travel...\n')
start_time = time.time()
# display the most common month
popular_month = df['month'].mode()[0]
print('\n The most popular month is : \n', popular_month)
# display the most common day of week
popular_day = df['day_of_week'].mode()[0]
print('\n The most popular day of the week is : \n', str(popular_day))
# display the most common start hour
popular_hour = df['hour'].mode()[0]
print('\n The most popular hour of the day is :\n ', popular_hour)
print("\nThis took %s seconds.\n" % (time.time() - start_time))
print('-'*60)
def station_stats(df):
"""Displays statistics on the most popular stations and trip."""
print('\nCalculating The Most Popular Stations and Trip...\n')
start_time = time.time()
# display most commonly used start station
start_station = df['Start Station'].value_counts().idxmax()
print('\n The most commonly used start station is : \n', start_station)
# display most commonly used end station
end_station = df['End Station'].value_counts().idxmax()
print('\nThe most commonly used end station is: \n', end_station)
# display most frequent combination of start station and end station trip
combination = df.groupby(['Start Station','End Station']).value_counts().idxmax()
print('\nThe most frequent combination of start station and end station are: \n', combination)
print("\nThis took %s seconds." % (time.time() - start_time))
print('-'*40)
def trip_duration_stats(df):
"""Displays statistics on the total and average trip duration."""
start_time = time.time()
travel_time = sum(df['Trip Duration'])
print('Total travel time:', travel_time / 86400, " Days")
# display total travel time
total_time = sum(df['Trip Duration'])
print('\nThe total travel time is {} seconds: \n', total_time)
# display mean travel time
mean_time = df['Trip Duration'].mean()
print('\n The average travel time is \n', mean_time)
print("\nThis took %s seconds." % (time.time() - start_time))
print('-'*40)
def user_stats(df):
"""Displays statistics on bikeshare users."""
print('\nCalculating User Stats...\n')
start_time = time.time()
# TO DO: Display counts of user types
user_types = df['User Type'].value_counts()
#print(user_types)
print('User Types:\n', user_types)
# TO DO: Display counts of gender
print("\nThis took %s seconds." % (time.time() - start_time))
print('-'*40)
def main():
while True:
city, month, day = get_filters()
df = load_data(city, month, day)
time_stats(df)
station_stats(df)
trip_duration_stats(df)
user_stats(df)
restart = input('\nWould you like to restart? Enter yes or no.\n')
if restart.lower() != 'yes':
break
if __name__ == "__main__":
main()
and I am receiving the following errors , can someone assist please
the errors:
> Traceback (most recent call last):
File "C:\Users\DELL\PycharmProjects\Professional\venv\Lib\site-packages\pandas\core\indexes\range.py", line 391, in get_loc
return self._range.index(new_key)
^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 0 is not in range
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\DELL\PycharmProjects\Professional\Bikeshare.py", line 203, in <module>
main()
File "C:\Users\DELL\PycharmProjects\Professional\Bikeshare.py", line 192, in main
time_stats(df)
File "C:\Users\DELL\PycharmProjects\Professional\Bikeshare.py", line 100, in time_stats
popular_month = df['month'].mode()[0]
~~~~~~~~~~~~~~~~~~^^^
File "C:\Users\DELL\PycharmProjects\Professional\venv\Lib\site-packages\pandas\core\series.py", line 981, in __getitem__
Calculating The Most Frequent Times of Travel...
return self._get_value(key)
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DELL\PycharmProjects\Professional\venv\Lib\site-packages\pandas\core\series.py", line 1089, in _get_value
loc = self.index.get_loc(label)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\DELL\PycharmProjects\Professional\venv\Lib\site-packages\pandas\core\indexes\range.py", line 393, in get_loc
raise KeyError(key) from err
KeyError: 0
I am expecting to filter pandas DataFrame to return month, day of week, and hour to perform some statistics.

KeyError means that the key isn't valid, because it doesn't exist. In this case, one reason to get KeyError when trying to get first mode is when column 'month' in dataframe is empty, and therefore mode() returns an empty collection, so you get KeyError: 0 when trying to get its first element.
To avoid this, you could replace:
popular_month = df['month'].mode()[0]
With:
try:
# try to get first mode of column 'month'
popular_month = df['month'].mode()[0]
except KeyError:
# if there's no data on column 'month'
popular_month = "unknown"
Because if there's no data on 'month' column, there's no point in trying to get its mode.
More about handling exceptions: https://docs.python.org/3/tutorial/errors.html#handling-exceptions

Also when I tried to ( not use the filters) by choosing " all " in the second and 3rd input, I get the following result:
Calculating The Most Frequent Times of Travel...
The most popular month is :
6
The most popular day of the week is :
<bound method PandasDelegate._add_delegate_accessors.._create_delegator_method..f of <pandas.core.indexes.accessors.DatetimeProperties object at 0x0000022B7CD5E890>>
The most popular hour of the day is :
17
This took 0.0260775089263916 seconds.
Calculating The Most Popular Stations and Trip...
The most commonly used start station is :
Streeter Dr & Grand Ave
The most commonly used end station is:
Streeter Dr & Grand Ave
The most frequent combination of start station and end station are:
('2112 W Peterson Ave', '2112 W Peterson Ave', 1064651, Timestamp('2017-06-02 07:59:13'), '2017-06-02 08:25:42', 1589, 'Subscriber', 'Female', 1963.0, 6, <bound method PandasDelegate._add_delegate_accessors.._create_delegator_method..f of <pandas.core.indexes.accessors.DatetimeProperties object at 0x0000022B7CD5E890>>, 7)
This took 2.1254045963287354 seconds.
Total travel time: 3250.8308680555556 Days
The total travel time is {} seconds:
280871787
The average travel time is
936.23929
This took 0.06502270698547363 seconds.
Calculating User Stats...
User Types:
Subscriber 238889
Customer 61110
Dependent 1
Name: User Type, dtype: int64
This took 0.022009611129760742 seconds.
Would you like to restart? Enter yes or no.

how to convert windows system timezone to uk timezone

i'm trying to convert "(UTC+05:30) Chennai, Kolkata, Mumbai, New Delhi India Standard Time" to uk timzone. I found the solution using this "Asia/Kolkata" but I should either pass "(UTC+05:30)" or "Chennai, Kolkata, Mumbai, New Delhi" or "India Standard Time" to my function to convert it into uk time.
Here is my code:
from datetime import datetime
from pytz import timezone
def get_uk_time(time_zone):
try:
country_tz = timezone(time_zone)
format = "%Y, %m, %d, %H, %M, %S"
time_now = datetime.now(timezone(time_zone))
format_time = time_now.strftime(format)
london_tz = timezone('Europe/London')
local_time = country_tz.localize(datetime.strptime(format_time, format))
convert_to_london_time = local_time.astimezone(london_tz)
uk_time = convert_to_london_time.strftime('%H:%M')
return uk_time
except:
print('unable to find the timezone')
#my_json = {'timezone': 'Asia/Kolkata'}
my_json = {'timezone': '(UTC+05:30) Chennai, Kolkata, Mumbai, New Delhi India Standard Time'}
time_zone = my_json['timezone']
time_in_london = get_uk_time(time_zone)
print(time_in_london)

If we remove the try/except, the error is :
File "/home/stack_overflow/so71012475.py", line 6, in get_uk_time
country_tz = timezone(time_zone)
pytz.exceptions.UnknownTimeZoneError: '(UTC+05:30) Chennai, Kolkata, Mumbai, New Delhi India Standard Time'
Because (UTC+05:30) Chennai, Kolkata, Mumbai, New Delhi India Standard Time is not a correct identifier. I could not find which identifiers it corresponds to, but I can suggest another approach.
Because you know the UTC offset, you can create a timezone object with it, and use it to compute the UK time :
import datetime
import re
import pytz
# the data you provided
my_json = {'timezone': '(UTC+05:30) Chennai, Kolkata, Mumbai, New Delhi India Standard Time'}
# a regex to extract relevant info from your data
pattern = re.compile(r"\(UTC(?P<sign>[+-])(?P<hours>\d\d):(?P<minutes>\d\d)\)(?P<name>.*)")
# extract the info
match = pattern.match(my_json["timezone"])
assert match is not None
tz_name = match.group("name").strip()
hours, minutes = int(match.group("hours")), int(match.group("minutes"))
is_negative = match.group("sign") == "-"
# build the timezone object
my_tz = datetime.timezone(offset=datetime.timedelta(hours=hours, minutes=minutes), name=tz_name)
if is_negative:
my_tz = -my_tz
# use the timezone object to localize the current time
my_time = datetime.datetime.now().replace(tzinfo=my_tz)
print(my_time)
# and convert it to UK time
uk_tz = pytz.timezone('Europe/London')
uk_time = my_time.astimezone(uk_tz)
print(uk_time)
2022-02-07 14:33:12.463372+05:30
2022-02-07 09:03:12.463372+00:00
(it is actually 14:30 at my local time when I am writing this answer, and if I was in India, it would mean that it is 09:00 in London.

pandas Calendar issue

This code for CustomBusinessDay() works fine:
from datetime import datetime
from pandas.tseries.offsets import CustomBusinessDay
runday = datetime(2021,12,30).date()
nextday = (runday + CustomBusinessDay()).date()
output 1:
In [26]: nextday
Out[26]: datetime.date(2021, 12, 31)
However, when adding an optional calendar as in date functionality , it produces the next business day even though today's date (Dec 31, 2021) is not a holiday according to a specified calendar below:
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, \
USMemorialDay, USMartinLutherKingJr, USPresidentsDay, GoodFriday, \
USLaborDay, USThanksgivingDay, nearest_workday
class NYSECalendar(AbstractHolidayCalendar):
''' NYSE holiday calendar via pandas '''
rules = [
Holiday('New Years Day', month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
GoodFriday,
USMemorialDay,
Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
USLaborDay,
USThanksgivingDay,
Holiday('Christmas', month=12, day=25, observance=nearest_workday),
]
nextday = (runday + CustomBusinessDay(calendar=NYSECalendar())).date()
output2:
In [27]: nextday
Out[27]: datetime.date(2022, 1, 3)

This line is the location of your problem:
Holiday('New Years Day', month=1, day=1, observance=nearest_workday)
If you take a look at the source code, nearest_workday means that the holiday is observed on a Friday if it falls on a Saturday, and on a Monday if the holiday falls on a Sunday. Since New Year's Day 2022 falls on a Saturday, it is observed today (12/31/2021) according to your calendar.
Removing the observance parameter will lead to an output of 2021-12-31.

python last working day of month (with CustomBusinessDay)?

I like to calculate last working day before or after a specific date(includes holidays, not just weekends)?
import datetime as dt
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
USLaborDay, USThanksgivingDay
class USTradingCalendar(AbstractHolidayCalendar):
rules = [
Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
GoodFriday,
USMemorialDay,
Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
USLaborDay,
USThanksgivingDay,
Holiday('Christmas', month=12, day=25, observance=nearest_workday)
]
def get_trading_close_holidays(fromyear, toyear):
inst = USTradingCalendar()
return inst.holidays(dt.datetime(fromyear-1, 12, 31), dt.datetime(toyear, 12, 31))
print(get_trading_close_holidays(2018,2018))
>> DatetimeIndex(['2018-01-01', '2018-01-15', '2018-02-19', '2018-03-30', '2018-05-28', '2018-07-04', '2018-09-03', '2018-11-22', '2018-12-25'], dtype='datetime64[ns]', freq=None)
import datetime as dt
from pandas.tseries.holiday import USFederalHolidayCalendar
bday_us = CustomBusinessDay(calendar=get_trading_close_holidays(2000,2050))
d = dt.datetime(2018, 3, 31)
d - bday_us
>> Timestamp('2018-03-30 00:00:00')
This falls on Good Friday, that holiday(as shown)... should show 1 day before = 2018-03-29...
What's the issue?

I was able to reproduce the problem and after some testing I've narrowed it down to using a DatetimeIndex as the input of the calendar parameter in CustomBusinessDay.
You can skip that and use the calendar instance directly:
import datetime as dt
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday, nearest_workday, \
USMartinLutherKingJr, USPresidentsDay, GoodFriday, USMemorialDay, \
USLaborDay, USThanksgivingDay
from pandas.tseries.offsets import CustomBusinessDay, BDay
class USTradingCalendar(AbstractHolidayCalendar):
rules = [
Holiday('NewYearsDay', month=1, day=1, observance=nearest_workday),
USMartinLutherKingJr,
USPresidentsDay,
GoodFriday,
USMemorialDay,
Holiday('USIndependenceDay', month=7, day=4, observance=nearest_workday),
USLaborDay,
USThanksgivingDay,
Holiday('Christmas', month=12, day=25, observance=nearest_workday)
]
bday_us = CustomBusinessDay(calendar=USTradingCalendar())
d = dt.datetime(2018, 3, 31)
c = d - bday_us
print(c)
The output:
2018-03-29 00:00:00

Changing pandas class variables in python

I'm making use of the pandas.tseries.holiday module and have had an issue with the AbstractHolidayCalendar class within this module. The use is to obtain a calendar that accounts for all bank holidays/weekends in the UK when rolling forward business days.
My existing code (taken from another user) is:
class EnglandAndWalesHolidayCalendar(AbstractHolidayCalendar):
rules = [Holiday('New Years Day', month=1, day=1, observance=next_monday),
GoodFriday,
EasterMonday,
Holiday('Early May bank holiday',
month=5, day=1, offset=DateOffset(weekday=MO(1))),
Holiday('Spring bank holiday',
month=5, day=31, offset=DateOffset(weekday=MO(-1))),
Holiday('Summer bank holiday',
month=8, day=31, offset=DateOffset(weekday=MO(-1))),
Holiday('Christmas Day', month=12, day=25, observance=next_monday),
Holiday('Boxing Day',
month=12, day=26, observance=next_monday_or_tuesday)]
In the pandas module mentioned, I see the following:
class AbstractHolidayCalendar(object):
"""
Abstract interface to create holidays following certain rules.
"""
__metaclass__ = HolidayCalendarMetaClass
rules = []
start_date = Timestamp(datetime(1970, 1, 1))
end_date = Timestamp(datetime(2030, 12, 31))
_cache = None
However when adding
end_date = Timestamp(datetime(2080, 12, 31))
To my defined class it doesn't seem to work. Does anyone know how to adjust the end date without directly changing the pandas module?
Thanks

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to remove a specific holiday from Pandas USFederalHolidayCalendar? - python

Related

KeyError when trying to access the mode of DataFrame columns

how to convert windows system timezone to uk timezone

pandas Calendar issue

python last working day of month (with CustomBusinessDay)?

Changing pandas class variables in python

Categories

Resources