How to change Pandas.DataFrame.resample default holiday calendar? - python

I'm reading Wes McKinney's Python for Data Analysis book. On the topic of using DataFrame.resample() or Series.resample(), if I want to resample for Business days, I would use:
df.resample('B')
However, I noticed that the notation of 'B' depends on your computer's region... I'm failing to run the examples on page 344 because my calendar isn't US...
How can I explicitly choose to resample based on a particular country's holidays, say US holidays or European holidays? I'm struggling to find documentation on this. The doc for resample() that I found here (http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.resample.html) is rather short and doesn't go into the details of the first parameter, rule.
Many thanks.

Custom Business Days (Experimental)
The CDay or CustomBusinessDay class provides a parametric BusinessDay
class which can be used to create customized business day calendars
which account for local holidays and local weekend conventions.
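As a minimal sketch of how this applies to resample(): the rule argument accepts an offset object as well as a string alias, so a CustomBusinessDay built on a holiday calendar can be passed in place of 'B'. Note that 'B' on its own only skips weekends and does not consult any holiday calendar. The sample series and variable names below are made up for illustration. USFederalHolidayCalendar ships with pandas; a calendar for another country would presumably have to be defined by subclassing AbstractHolidayCalendar.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Business-day offset that skips weekends and US federal holidays.
us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Illustrative daily data spanning the 4th of July 2014 (a Friday).
idx = pd.date_range("2014-07-01", "2014-07-08", freq="D")
s = pd.Series(range(len(idx)), index=idx)

# Business days according to the US calendar: July 4th and the weekend are absent.
us_days = pd.date_range("2014-07-01", "2014-07-08", freq=us_bd)

# Pass the offset object to resample() instead of the 'B' alias.
resampled = s.resample(us_bd).sum()
print(resampled)
```

Values recorded on the holiday or the weekend are folded into the bin of the preceding business day, the same way 'B' treats weekend data.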

Related

Prophet Parameters

I am currently using Prophet to forecast usage in a year period. This is my first time using this algo and I have some questions in mind.
I am utilising the code attached below. I am wondering if anyone has included holidays as a parameter before, and how to do so while including holidays from other calendars (lunar, Islamic, etc.). Also since February may have 1 more day in a leap year, would be great as well to know if the algorithm take this into consideration?
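from prophet import Prophet  # releases before 1.0 used: from fbprophet import Prophet

m = Prophet(
    growth='logistic',  # logistic growth needs a 'cap' column in the training data
    seasonality_mode='multiplicative',
    seasonality_prior_scale=1.5,
    mcmc_samples=5,  # > 0 runs MCMC sampling; 0 uses MAP estimation
    n_changepoints=25,
    changepoint_range=0.8,
    yearly_seasonality='auto',
    weekly_seasonality='auto',
    daily_seasonality='auto',
    holidays=None,  # DataFrame of holidays, or None for no holiday effects
    holidays_prior_scale=10.0,
    changepoint_prior_scale=0.05,
    interval_width=0.8,
    stan_backend=None,
)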
The holidays parameter takes a dataframe. At minimum, that dataframe needs two columns: ds (the date) and holiday (the holiday name).
The important thing to note is that the dataframe must contain both the historical and the future occurrences of each holiday, so that it covers the forecast period as well.
Apart from the two columns mentioned above, the following columns are optional:
lower_window, upper_window (int) - to extend the holiday effect to days around the date of the holiday.
prior_scale (float) - to set a different prior scale for each holiday.
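A minimal sketch of such a dataframe for a holiday from a non-Gregorian calendar, here Lunar New Year, might look like the following. The dates have to be looked up and listed by hand for every year in the history and the forecast horizon; the window values are arbitrary choices for illustration, not recommendations.

```python
import pandas as pd

# One row per occurrence of the holiday, past and future.
holidays = pd.DataFrame({
    "holiday": "lunar_new_year",
    "ds": pd.to_datetime(["2023-01-22", "2024-02-10", "2025-01-29"]),
    "lower_window": -1,  # assumed: effect starts one day before the holiday
    "upper_window": 2,   # assumed: effect lasts two days after the holiday
})
print(holidays)

# The frame would then be passed to the model, e.g.:
# m = Prophet(holidays=holidays)
```

Several holidays can be combined into one frame with pd.concat before passing it to Prophet.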
As for your second question, i.e.
Also since February may have 1 more day in a leap year, would be great
as well to know if the algorithm take this into consideration?
It depends on the modelling data. Prophet works on the actual calendar dates you supply, so if the data you provide already includes leap years, February 29 is simply another date in the series and Prophet will take it into consideration.

How to modify the observed values in python's holidays library

I am not sure how to modify the 'observed' values in holidays. I need a holiday to be observed on a Monday only if it falls on a Sunday, i.e. 12-25-2016 (Sunday), then 12-26-2016 is observed. However, I do not want a holiday to be observed on a Monday if it falls on a Saturday, or to be observed on a Saturday if it falls on a Friday, i.e. 12-25-2015 (Friday), then 12-26-2015 is observed. I got those two examples from testing the holidays library. I checked the documentation and could only find how to turn the observed holidays off. I just want to modify the observed values, not remove them.
Thank you for all the help. I am new to the holidays library, so please pardon my noobness.
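The rule described above can be sketched with the standard library alone. This is only an illustration of the rule itself, not of the holidays library's API; observed_date is a hypothetical helper name. One option may be to switch the library's own observed handling off, as the documentation describes, and apply a function like this to the actual holiday dates.

```python
from datetime import date, timedelta


def observed_date(holiday: date) -> date:
    """Move a Sunday holiday to the following Monday; leave other days alone."""
    if holiday.weekday() == 6:  # Monday is 0, Sunday is 6
        return holiday + timedelta(days=1)
    return holiday


print(observed_date(date(2016, 12, 25)))  # Christmas 2016 fell on a Sunday
print(observed_date(date(2021, 12, 25)))  # Christmas 2021 fell on a Saturday
```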

How to extract useful features from time-series data (e.g., users' daily activities in a forum)

I have data regarding users' visits and postings in a discussion forum for a 1-week period, and this data contains the timestamp of each activity. Based on this forum data, I tried to predict another behavior of the users (let's say X behavior). Initial results of the regression model show that users' forum activity seems to be associated with their X behavior. Besides these cumulative features: avg_visits_per_day, total_posts_whole_week, I also have features for each day (0<a<8): {a}_visits and {a}_posts.
Thus, I have 16 features in total, and the regression model built with these 16 features gives promising results, so it would make sense to generate more features. However, I do not know if there is any useful feature-extraction strategy for such time-series data. I am using sklearn but did not see a method for this purpose. Any ideas or recommendations?
There are lots of options, and it's difficult to suggest which ones are more useful for predicting the unknown "X behaviour". However, you could:
Manually create features representing information that's clearly available in the raw data, but not present in your current feature set at all. For example, if you have not only dates but also times of activity logged, you can construct additional features for first/last/average time of visiting within each day (maybe converted to categorical morning/day/evening/night), average time between visits and so on. Day of week information could probably be useful as well.
Manually create relative features from the existing set: say, visits/posts ratio for each day, number of days since last post, longest period without visits, etc.
Use additional information if it's available: user's browser, OS, screen resolution, post length, keywords present in his/her post, subforum it belongs to, new post or follow-up, ... Once again, it's hard to tell beforehand what will be relevant.
Do automated feature extraction with a package like tsfresh or (less automated) hctsa.
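As a rough sketch of the "relative features" idea, assuming the daily counts are available in long format with one row per user and day: the column names, the toy data and the relative_features helper are all invented for illustration, and posts per visit is used here as one variant of the visits/posts ratio.

```python
import pandas as pd

# Toy data: two users, seven days each (hypothetical values).
daily = pd.DataFrame({
    "user": ["a"] * 7 + ["b"] * 7,
    "day": list(range(1, 8)) * 2,
    "visits": [3, 0, 2, 5, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0],
    "posts": [1, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0],
})


def relative_features(g):
    """Derive a few relative features from one user's daily counts."""
    g = g.sort_values("day")
    total_visits = int(g["visits"].sum())
    total_posts = int(g["posts"].sum())
    post_days = g.loc[g["posts"] > 0, "day"]

    # Longest run of consecutive days with zero visits.
    longest_gap = run = 0
    for v in g["visits"]:
        run = run + 1 if v == 0 else 0
        longest_gap = max(longest_gap, run)

    return {
        "posts_per_visit": total_posts / total_visits if total_visits else float("nan"),
        "days_since_last_post": (
            int(g["day"].max() - post_days.max()) if len(post_days) else float("nan")
        ),
        "longest_gap_without_visits": longest_gap,
    }


features = pd.DataFrame.from_dict(
    {user: relative_features(g) for user, g in daily.groupby("user")},
    orient="index",
)
print(features)
```

The resulting frame has one row per user and can be joined onto the existing 16 features before fitting the regression model.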

JPYLibor fixing during Japanese holiday: negative time error

I am using QuantLib 1.7 with the Python interface.
I have constructed the JPY Fixed-Float swap curve following the standard convention. For the swap schedules I have a JointCalendar with Japan and UnitedKingdom. My JPYLibor index has the UK calendar only.
When I set the market date to 2009-May-1, I do a bootstrap using PiecewiseFlatForward with settlement date 2009-May-8, because in the Japan calendar there was a long holiday from 2009-May-4 (Monday) to 2009-May-6.
Now, with this bootstrapped curve, I try to value a swap that has a floating payment on 2009-May-7. When I try to value it (or compute the amount() function of the next floatingLeg cashflow, which has a reset date on 2009-May-5) I get the error message "2nd leg: negative time (-0.00277778) given".
I guess that this is related to the fact that 2009-May-5, which is the London fixing date for value date 2009-May-7, falls on a Japanese holiday?
My swap payment schedules and reset schedule match Bloomberg, so I am confident that, in theory, this is the correct convention. I have read some old posts regarding an apparently similar issue for a US swap, but as far as I understood, that was a bug which was corrected around the time of QuantLib 0.9.
Could my problem be related to the same bug, or am I not using QuantLib correctly?
The problem is that the value date for the payment, May 7th, is between today's date and the reference date of the curve. The fixing needs to be forecast, since it's in the future (the fixing date is on May 5th); but because the curve effectively starts on May 8th, it can't return the May 7th discount which is required to forecast the fixing.
The reason why this doesn't usually happen is that, when the value date is between today and the reference date, the fixing date is usually before today's date and thus the fixing can be loaded from past ones.
In this particular case, the way to make it work would be to create a curve with no settlement days so that its reference date is the same as today's date. If you then wanted the price as-of May 8th, you'd have to manually adjust the swap NPV for the discount between May 1st and 8th.
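The manual adjustment in the last step amounts to dividing today's NPV by the discount factor between the two dates. A minimal sketch of that arithmetic, deliberately without QuantLib and assuming a flat, continuously compounded rate with an Actual/365 day count (both assumptions, as are the function names and the numbers): in practice the discount factor would be read from the bootstrapped curve instead.

```python
from datetime import date
from math import exp


def discount_factor(rate, start, end, days_in_year=365.0):
    """Discount factor from start to end for a flat, continuously compounded rate."""
    return exp(-rate * (end - start).days / days_in_year)


def npv_as_of(npv_today, rate, today, target):
    """Roll an NPV computed as of `today` forward to `target`."""
    return npv_today / discount_factor(rate, today, target)


today = date(2009, 5, 1)
settlement = date(2009, 5, 8)

# Hypothetical inputs: NPV of 100 as of May 1st, flat 1% rate.
print(npv_as_of(100.0, 0.01, today, settlement))
```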

Detecting overlapping date recurrence rules

I'm working on an application that looks like Google Calendar, but with one main difference: events shouldn't intersect with other events. This means that no two events may share common time, even at minute granularity. This is especially useful for a calendar that only stores meetings, since it is impossible to be in two meetings at the same time.
Just like in Google Calendar, events may be created by using recurrence rules (every Friday and Sunday from 10 AM to 1 PM, for example). So I would like to detect overlapping events by only using rrules (of the python-dateutil module), without needing to create N datetime objects and checking for intersection against each one.
Is it possible to detect overlapping dates by only using rrules? Is there anything similar already implemented in another library?
No, I don't believe it's possible to analyse an rrule to see if it can intersect another one without creating the datetime objects.
Essentially you're asking for the output of an algorithm without running the algorithm, and I think that's non-computable.
However, for certain types of rrule it is possible, e.g. an rrule for every Thursday can't intersect an rrule for every Tuesday. The problematic ones are days of the month and days of the year intersecting with days of the week, and frequencies that never intersect.
The best bet would be to handle the rules that are analytically checkable analytically, then for the others generate the next year or so's data and compare the generated occurrences.
The algorithm can run fast, since you can cache the existing occupied times as you add each rule.
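The "generate and compare" fallback could be sketched as follows with python-dateutil: expand each rule over a fixed horizon, attach a duration, and sweep the sorted intervals. The helper names, the one-year horizon and the sample rules are assumptions for illustration; events that merely touch (one ends exactly when the other starts) are not counted as overlapping.

```python
from datetime import datetime, timedelta
from dateutil.rrule import rrule, WEEKLY, FR, SU, TH


def occurrences(rule, duration, start, end):
    """Expand a rule into (start, end) intervals within the horizon."""
    return [(dt, dt + duration) for dt in rule.between(start, end, inc=True)]


def rules_overlap(occ_a, occ_b):
    """Return True if any interval of one rule overlaps an interval of the other."""
    tagged = sorted(
        [(s, e, 0) for s, e in occ_a] + [(s, e, 1) for s, e in occ_b]
    )
    latest_end = [None, None]  # latest end time seen so far, per rule
    for start, end, src in tagged:
        other_end = latest_end[1 - src]
        if other_end is not None and start < other_end:
            return True
        if latest_end[src] is None or end > latest_end[src]:
            latest_end[src] = end
    return False


horizon = (datetime(2024, 1, 1), datetime(2025, 1, 1))

# Every Friday and Sunday, 10:00 to 13:00.
meetings = occurrences(
    rrule(WEEKLY, byweekday=(FR, SU), dtstart=datetime(2024, 1, 1, 10, 0)),
    timedelta(hours=3), *horizon)
# Every Thursday, 10:00 to 13:00.
thursdays = occurrences(
    rrule(WEEKLY, byweekday=TH, dtstart=datetime(2024, 1, 1, 10, 0)),
    timedelta(hours=3), *horizon)
# Every Friday, 12:00 to 13:00.
friday_noon = occurrences(
    rrule(WEEKLY, byweekday=FR, dtstart=datetime(2024, 1, 1, 12, 0)),
    timedelta(hours=1), *horizon)

print(rules_overlap(meetings, thursdays))
print(rules_overlap(meetings, friday_noon))
```

Keeping the already expanded intervals of accepted events around is one way to realise the caching mentioned above, since only the new rule then needs to be expanded.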
