I am currently using Prophet to forecast usage over a one-year period. This is my first time using this algorithm and I have a few questions.
I am using the code below. Has anyone passed holidays as a parameter before, and how can I do so while including holidays from other calendars (lunar, Islamic, etc.)? Also, since February has one extra day in a leap year, it would be great to know whether the algorithm takes this into account.
m = Prophet(
    growth='logistic',
    seasonality_mode='multiplicative',
    seasonality_prior_scale=1.5,
    mcmc_samples=5,
    n_changepoints=25,
    changepoint_range=0.8,
    yearly_seasonality='auto',
    weekly_seasonality='auto',
    daily_seasonality='auto',
    holidays=None,
    holidays_prior_scale=10.0,
    changepoint_prior_scale=0.05,
    interval_width=0.8,
    stan_backend=None,
)
The holidays parameter takes a dataframe. At minimum, that dataframe needs two columns: ds (the date) and holiday (the holiday name).
The important thing to note here is that you must provide both historical and future holidays in this dataframe.
Apart from the two columns mentioned above, the following columns are optional:
lower_window, upper_window (int) - extend the holiday effect to days around the holiday date.
prior_scale (float) - set a different prior scale for each holiday.
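As a rough sketch of what such a dataframe could look like, the snippet below builds one with pandas for a lunar-calendar holiday and a fixed-date holiday. The holiday names, the window sizes and the choice of years are illustrative assumptions; dates for lunar or Islamic holidays have to be supplied explicitly for every year, past and future, that the model should cover.

```python
import pandas as pd

# Lunar New Year moves every year, so each date is listed explicitly.
lunar_new_year = pd.DataFrame({
    "holiday": "lunar_new_year",   # illustrative name, any label works
    "ds": pd.to_datetime(["2022-02-01", "2023-01-22", "2024-02-10", "2025-01-29"]),
    "lower_window": -1,            # assumed: effect starts one day before
    "upper_window": 2,             # assumed: effect lasts two days after
})

# A fixed-date holiday for comparison.
christmas = pd.DataFrame({
    "holiday": "christmas",
    "ds": pd.to_datetime(["2022-12-25", "2023-12-25", "2024-12-25", "2025-12-25"]),
    "lower_window": 0,
    "upper_window": 1,
})

# One dataframe holding historical and future holidays together.
holidays = pd.concat([lunar_new_year, christmas], ignore_index=True)

# This dataframe would then be passed as Prophet(holidays=holidays).
print(holidays)
```

Keeping one small dataframe per holiday and concatenating them makes it easy to give each holiday its own window.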
Also to answer your second question i.e.
Also since February may have 1 more day in a leap year, would be great
as well to know if the algorithm take this into consideration?
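Yes. Prophet works on the calendar dates in the ds column, so 29 February is treated like any other date. Since the data you provide already includes leap years, Prophet takes that day into consideration, and the future dataframe will contain it as well. The yearly seasonality uses a period of 365.25 days, which averages out the leap day.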
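To see that a daily date range simply contains the leap day, here is a small pandas-only sketch. It assumes a history ending on 2023-12-31 and a 366-day horizon (both made up for illustration) and mimics the kind of daily range a future dataframe is built from.

```python
import pandas as pd

# Assumed last date of the training history.
last_history_date = pd.Timestamp("2023-12-31")

# 366 future days, starting the day after the history ends.
future = pd.date_range(start=last_history_date, periods=366 + 1, freq="D")[1:]

# February 2024 within the forecast horizon.
feb = future[(future.year == 2024) & (future.month == 2)]
print(len(feb), feb[-1].date())
```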
Related
I am working on a panel dataset that includes daily stock returns of 450 firms for 5 years and a daily (momentum-based) ESG score for the same 5 years. I want to regress stock returns on the daily ESG scores with firm and year fixed effects. I have used the linearmodels.panel module in Python and set the index to ("Stock ticker", "Date") before running the regressions with entity and time effects. In the regression result, the number of entities is 450, which is correct, but the number of time periods is 1800. How is Python capturing the time effects? Is it based on the year or on something else? What I want is year fixed effects, where all firms share the same indicator variable for a particular year. Can someone please help me do this the right way?
The image shows the format of the data, where the panel is based on daily returns.
Sounds like your model is capturing daily fixed effects instead of yearly fixed effects. This is happening because you set Date as an index, so you're telling Python that you want one fixed effect per date.
You have to create a new column that only contains the year. That is, convert the date column to datetime format (see pandas.to_datetime) and then:
# Extract year from Date
df['Year'] = pd.DatetimeIndex(df['Date']).year
# Set indices
df = df.set_index(['Ticker','Year'])
Then run your model.
I recommend using linearmodels.PanelOLS because that class is specifically made for fitting fixed effects models.
For future reference, post your code and a replicable example so we can help you out more easily.
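To illustrate what firm and year fixed effects mean on daily data, here is a minimal sketch that does not use linearmodels at all: it builds firm and year dummy variables with pandas and fits ordinary least squares with NumPy. The data are simulated, and the column names Ticker, Date, ESG and Return are assumptions. Note that with daily rows a (Ticker, Year) index is no longer unique per observation, so another option worth trying in linearmodels may be to keep the (Ticker, Date) index and supply the year through the other_effects argument of PanelOLS.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Simulated daily panel: 3 firms, business days over 3 years.
dates = pd.bdate_range("2018-01-01", "2020-12-31")
tickers = ["AAA", "BBB", "CCC"]
df = pd.DataFrame([(t, d) for t in tickers for d in dates],
                  columns=["Ticker", "Date"])

# One indicator per calendar year, shared by all firms.
df["Year"] = df["Date"].dt.year

# Simulated regressor and outcome with a true ESG slope of 2.0.
df["ESG"] = rng.normal(size=len(df))
firm_fx = df["Ticker"].map({"AAA": 0.1, "BBB": -0.2, "CCC": 0.3})
year_fx = df["Year"].map({2018: 0.0, 2019: 0.5, 2020: -0.5})
df["Return"] = (2.0 * df["ESG"] + firm_fx + year_fx
                + rng.normal(scale=0.1, size=len(df)))

# Firm and year dummies; drop_first avoids collinearity with the intercept.
dummies = pd.get_dummies(df[["Ticker", "Year"]].astype(str),
                         drop_first=True, dtype=float)
X = np.column_stack([np.ones(len(df)), df["ESG"].to_numpy(), dummies.to_numpy()])

beta, *_ = np.linalg.lstsq(X, df["Return"].to_numpy(), rcond=None)
esg_slope = beta[1]
n_year_effects = df["Year"].nunique()
print(esg_slope, n_year_effects)
```

The number of distinct years (3 here, 5 in the question) is what the time dimension should report if the effects are yearly rather than daily.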
I am not sure how to modify the 'observed' values in the holidays library. I need a holiday to be observed on a Monday only if it falls on a Sunday, e.g. 12-25-2016 (Sunday) is observed on 12-26-2016. However, I do not want a holiday to be observed on a Monday if it falls on a Saturday, or to be observed on a Saturday if it falls on a Friday, e.g. 12-25-2015 (Friday) being observed on 12-26-2015. I got those two examples from testing the holidays library. I checked the documentation and could only find how to turn observed holidays off. I just want to modify the observed values, not remove them.
Thank you for all the help. I am new to the holidays library, so please pardon my noobness.
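For reference, the rule described above can be written down in plain Python. This is only a sketch of the desired behaviour with the standard library; it does not use the holidays library's own options, and the function name is hypothetical.

```python
from datetime import date, timedelta

def observed_sunday_only(holiday: date) -> date:
    """Shift a holiday to Monday only when it falls on a Sunday."""
    if holiday.weekday() == 6:          # Monday is 0, Sunday is 6
        return holiday + timedelta(days=1)
    return holiday                      # Friday and Saturday stay put

print(observed_sunday_only(date(2016, 12, 25)))  # Sunday, moves to Monday
print(observed_sunday_only(date(2015, 12, 25)))  # Friday, unchanged
```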
I have searched extensively on Google and here but cannot seem to find the answer I am looking for, or at least something I understand. Is it possible to use EWMA in Pandas for forecasting? For example, if I had daily data of website clicks for 2 months, 1st Feb to 31st Mar, and don't see any trend or seasonality in the data, it seems like I should be able to use EWMA to "predict" the number of clicks at a later date, say on 10th April. In Excel, I can imagine just filling approximately 10 dates or rows after 31st March and computing a moving average where the 5-day EWMA for 10th April will be based on weighted forecasts of prior days. Is there a way I can do this in Python?
Thanks!
It's a one-liner to implement, but you're going to be a little bored by EWMA's predictions of the future (the forecast for every future date is simply the most recent smoothed value). If you'd like a Python package that lets you experiment with EWMA level, trend and seasonality, try my Holt-Winters implementation:
https://github.com/welch/seasonal
https://pypi.python.org/pypi/seasonal
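As a minimal sketch of the one-liner in pandas, assuming simulated daily click counts for 1 Feb to 31 Mar 2023 and a 5-day span (both choices are illustrative), the flat EWMA forecast through 10 April could look like this:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Simulated daily clicks with no trend or seasonality.
idx = pd.date_range("2023-02-01", "2023-03-31", freq="D")
clicks = pd.Series(rng.poisson(200, size=len(idx)).astype(float), index=idx)

# The one-liner: 5-day exponentially weighted moving average.
smoothed = clicks.ewm(span=5, adjust=False).mean()

# With no trend or seasonal term, every future date gets the last smoothed value.
future_idx = pd.date_range(idx[-1] + pd.Timedelta(days=1), "2023-04-10", freq="D")
forecast = pd.Series(smoothed.iloc[-1], index=future_idx)
print(forecast)
```

Filling future rows and re-applying the EWMA, as in the Excel idea, gives the same flat line, because each new "observation" is the forecast itself.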
I am using QuantLib 1.7 with the Python interface.
I have constructed the JPY Fixed-Float swap curve following the standard convention. For the swap schedules I have a JointCalendar with Japan and UnitedKingdom. My JPYLibor index has the UK calendar only.
When I set the market date to 2009-May-1, I bootstrap using PiecewiseFlatForward with settlement date 2009-May-8, because in the Japan calendar there was a long holiday from 2009-May-4 (Monday) to 2009-May-6.
Now, with this bootstrapped curve, I try to value a swap that has a floating payment on 2009-May-7. When I try to value it (or call the amount() function of the next floating-leg cashflow, which has a reset date of 2009-May-5) I get the error message "2nd leg: negative time (-0.00277778) given".
I guess that this is related to the fact that 2009-May-5, which is the London fixing date for value date 2009-May-7, falls on a Japanese holiday?
My swap payment schedules and reset schedule match Bloomberg, so I am confident that in theory this is the correct convention. I have read some old posts about an apparently similar issue for a US swap, but as far as I understood that was a bug which was corrected around the time of QuantLib 0.9.
Could my problem be related to the same bug, or am I not using QuantLib correctly?
The problem is that the value date for the payment, May 7th, is between today's date and the reference date of the curve. The fixing needs to be forecast, since it's in the future (the fixing date is on May 5th); but because the curve effectively starts on May 8th, it can't return the May 7th discount which is required to forecast the fixing.
The reason why this doesn't usually happen is that, when the value date is between today and the reference date, the fixing date is usually before today's date and thus the fixing can be loaded from past ones.
In this particular case, the way to make it work would be to create a curve with no settlement days so that its reference date is the same as today's date. If you then wanted the price as-of May 8th, you'd have to manually adjust the swap NPV for the discount between May 1st and 8th.
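The number in the error message is consistent with this explanation. As a quick check in plain Python, assuming the curve measures time with an Actual/360 day counter (an assumption, since the question does not state it), the time from the May 8th reference date back to the May 7th value date is minus one day:

```python
from datetime import date

reference_date = date(2009, 5, 8)   # curve reference (settlement) date
value_date = date(2009, 5, 7)       # value date of the floating payment

# Actual/360 year fraction from the reference date to the value date.
year_fraction = (value_date - reference_date).days / 360
print(round(year_fraction, 8))  # -0.00277778
```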
I'm reading Wes McKinney's Python for Data Analysis book. On the topic of using DataFrame.resample() or Series.resample(), if I want to resample for Business days, I would use:
df.resample('B')
However, I noticed that the notation of 'B' depends on your computer's region... I'm failing to run the examples on page 344 because my calendar isn't US...
How can I explicitly choose to resample based on a particular country's holidays? Say, US holidays, or European holidays? I am struggling to find documentation on this. The doc for resample() I found here {http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.resample.html} is rather short and doesn't really go into the details of the first parameter, rule.
Many thanks.
Custom Business Days (Experimental)
The CDay or CustomBusinessDay class provides a parametric BusinessDay
class which can be used to create customized business day calendars
which account for local holidays and local weekend conventions.
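A short sketch of how this might be used with resample(), assuming the US federal holiday calendar that ships with pandas and a made-up week of daily data around 4 July 2023. For other countries, a list of dates can be passed through the holidays argument of CustomBusinessDay instead of a calendar.

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Business-day offset that skips weekends and US federal holidays.
us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Made-up daily series for Monday 3 July to Friday 7 July 2023.
idx = pd.date_range("2023-07-03", "2023-07-07", freq="D")
s = pd.Series(range(1, 6), index=idx)

plain = s.resample("B").sum()    # plain business days, keeps 4 July
us = s.resample(us_bday).sum()   # custom business days, no bin for 4 July

print(plain.index.strftime("%Y-%m-%d").tolist())
print(us.index.strftime("%Y-%m-%d").tolist())
```

Passing the offset object instead of the string 'B' makes the holiday convention explicit in the code.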