SARIMA with daily data - determine seasonal period

I have 30 years of daily temperature data and I have a problem determining the seasonal period. I can't use lag = 365 days; the Python implementation can't handle it.
So can I use 90 days (one season) instead? I don't know whether that makes sense or not.

A seasonality of 365 does not really make sense, for three reasons:
1. Such a long seasonality often leads to long runtimes or, as in your case, the process breaks.
2. When you use exactly 365 days you are ignoring leap years, so your model drifts over longer periods (over 30 years, for example, this already adds up to a shift of about 7 days).
3. The most important reason is the correlation itself. A seasonality of 365 would mean that the weather today is somehow correlated with the weather on the same day last year, which is not really the case. Of course, both days will be somewhat close to one another due to the meteorological seasons, but I would not rely on this correlation. Imagine last year was relatively cold and this year is relatively warm. Do you use the days immediately before today, or would you base your forecast on some day a year ago? That is not very reliable.
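The leap-year drift mentioned above is easy to check: count the days between the same calendar date 30 years apart and compare that with 30 × 365. The two dates below are arbitrary examples.

```python
import pandas as pd

start = pd.Timestamp("1990-01-01")
end = pd.Timestamp("2020-01-01")  # same calendar date, 30 years later

actual_days = (end - start).days  # real elapsed days, leap days included
naive_days = 30 * 365             # what a fixed 365-day seasonal lag assumes
drift = actual_days - naive_days  # how far the 365-day lag has slipped
print(f"Drift after 30 years: {drift} days")
```

The leap years in that span are 1992, 1996, 2000, 2004, 2008, 2012 and 2016, so a lag of exactly 365 days ends up a week away from the calendar date it was meant to match.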
Your forecast has to be based on the most recent lags before the actual day, not on the days of last year. However, to improve your model you have to factor in the meteorological seasons (spring, summer, fall, winter). There are multiple ways to do so. One, for example, would be to use SARIMAX and pass the model an exogenous feature, e.g. the average temperature of the calendar month or a moving average over previous years. That way you can teach the model that days in summer are generally hotter than in winter, and for the precise prediction you use the days immediately before.
See here for the documentation of statsmodels SARIMAX:
https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html
There are plenty of examples on how to create these models with exogenous variables on the web.

Related

How to include year fixed effects (in daily panel data)

I am working on a panel dataset that includes daily stock returns of 450 firms for 5 years and daily ESG scores (momentum based) for 5 years. I want to regress stock returns on daily ESG scores, with firm and year fixed effects. I have used the linearmodels panel functions in Python and set the index ("Stock ticker", "Date") before running the regressions with entity and time effects. In the regression result, the number of entities shows 450, which is perfect, but the time period shows 1800. I am wondering how Python is capturing the time effects. Is it based on year or some other way? What I want is year fixed effects, where for a particular year all firms have the same indicator variable. Can someone please help me to do it the right way?
The image shows the format of the data; the panel is based on daily returns.

Sounds like your model is capturing daily fixed effects instead of yearly fixed effects. This is happening because you set Date as an index, so you're telling Python that you want one fixed effect per date.
You have to create a new column that only contains the year. That is, convert the date column to datetime format (see pandas.to_datetime) and then:
import pandas as pd
# Convert Date to datetime, then extract the year
df['Date'] = pd.to_datetime(df['Date'])
df['Year'] = pd.DatetimeIndex(df['Date']).year
# Set indices
df = df.set_index(['Ticker', 'Year'])
Then run your model.
I recommend using linearmodels.PanelOLS because that module is specifically made for fitting fixed effects models.
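To make the difference between date effects and year effects concrete, here is a minimal least-squares-dummy-variable (LSDV) sketch on hypothetical synthetic data: 5 firms, 3 years of business days, and a true ESG coefficient of 0.3. Each year gets one dummy shared by all firms, instead of one dummy per date. This swaps in plain NumPy least squares rather than linearmodels, so it only illustrates what a year fixed effect is; for real inference (clustered standard errors and so on) stick with PanelOLS. As far as I know, PanelOLS also accepts an other_effects argument, which may let you keep the (ticker, date) index and pass the year column as the additional effect; check the linearmodels documentation before relying on that.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic panel: 5 firms x 3 years of business days (names are illustrative)
rng = np.random.default_rng(42)
dates = pd.bdate_range("2019-01-01", "2021-12-31")
tickers = ["AAA", "BBB", "CCC", "DDD", "EEE"]
df = pd.DataFrame([(t, d) for t in tickers for d in dates], columns=["Ticker", "Date"])
df["Date"] = pd.to_datetime(df["Date"])
df["Year"] = df["Date"].dt.year
df["ESG"] = rng.normal(size=len(df))

# True model: return = 0.3 * ESG + firm effect + year effect + noise
firm_effect = df["Ticker"].map({t: i * 0.5 for i, t in enumerate(tickers)})
year_effect = df["Year"].map({2019: 0.0, 2020: -1.0, 2021: 0.7})
df["Return"] = 0.3 * df["ESG"] + firm_effect + year_effect + rng.normal(scale=0.1, size=len(df))

# One dummy per firm and one per YEAR (shared by every firm in that year);
# drop_first avoids perfect collinearity with the intercept
dummies = pd.get_dummies(df[["Ticker", "Year"]].astype(str), drop_first=True, dtype=float)
X = pd.concat([pd.Series(1.0, index=df.index, name="const"), df[["ESG"]], dummies], axis=1)

coef, *_ = np.linalg.lstsq(X.to_numpy(), df["Return"].to_numpy(), rcond=None)
beta_esg = coef[X.columns.get_loc("ESG")]
print(f"Estimated ESG coefficient: {beta_esg:.3f}")
```

Here the design matrix has only 2 year columns (3 years minus the dropped baseline), whereas indexing by date would create one effect per trading day, which is the "1800 time periods" you saw.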
For future reference, post your code and a replicable example so we can help you out more easily.

Prophet Parameters

I am currently using Prophet to forecast usage over a one-year period. This is my first time using this algorithm and I have some questions.
I am using the code below. Has anyone included holidays as a parameter before, and how do you do so when the holidays come from other calendars (lunar, Islamic, etc.)? Also, since February has one more day in a leap year, I would like to know whether the algorithm takes this into consideration.
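from prophet import Prophet  # older releases: from fbprophet import Prophet

# growth='logistic' needs a 'cap' column (the carrying capacity) in the
# training dataframe and in the future dataframe passed to predict()
m = Prophet(
    growth='logistic',
    seasonality_mode='multiplicative',
    seasonality_prior_scale=1.5,
    mcmc_samples=5,  # > 0 runs full Bayesian sampling with this many MCMC samples; 0 uses MAP fitting
    n_changepoints=25,
    changepoint_range=0.8,
    yearly_seasonality='auto',
    weekly_seasonality='auto',
    daily_seasonality='auto',
    holidays=None,  # a holidays dataframe goes here to model holiday effects
    holidays_prior_scale=10.0,
    changepoint_prior_scale=0.05,
    interval_width=0.8,
    stan_backend=None,
)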

The holidays parameter takes a dataframe. The minimal required columns are ds (the date) and holiday (the holiday name).
The important thing to note here is that you provide both historical and future holidays in this dataframe.
Apart from the 2 columns mentioned above, the following columns are optional:
lower_window, upper_window (int) - to extend the holiday effect around the date of the holiday.
prior_scale (float) - to set a different prior scale for each holiday.
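A minimal sketch of such a dataframe for a holiday from another calendar. Lunar and Islamic holidays fall on a different Gregorian date every year, so each occurrence is simply listed as its own row, covering both the training history and the forecast horizon. Lunar New Year is used as the example (22 January 2023, 10 February 2024, 29 January 2025); the holiday label and the window sizes are illustrative choices.

```python
import pandas as pd

# One row per occurrence, because a lunar-calendar holiday moves in the Gregorian calendar.
# Include past years (for fitting) AND future years (for the forecast period).
lunar_new_year = pd.DataFrame({
    "holiday": "lunar_new_year",
    "ds": pd.to_datetime(["2023-01-22", "2024-02-10", "2025-01-29"]),
    "lower_window": -1,  # effect starts the day before
    "upper_window": 2,   # and lasts two days after
})

# Islamic or other holidays would be built the same way and concatenated here
holidays = pd.concat([lunar_new_year], ignore_index=True)
print(holidays)
```

The result is passed as Prophet(holidays=holidays, ...). For national public holidays, Prophet also provides m.add_country_holidays(country_name=...), which can be combined with a custom dataframe like this one.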
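Also, to answer your second question, i.e.
Also since February may have 1 more day in a leap year, would be great as well to know if the algorithm take this into consideration?
Yes. Since the data you provide already includes leap years, Prophet takes them into consideration: it works with the actual dates, so 29 February is just another day in the series. Its built-in yearly seasonality is also a Fourier series with a period of 365.25 days, so the extra leap day does not shift the seasonal pattern.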
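To illustrate why the leap day causes no trouble, here is a sketch of yearly Fourier features in the spirit of Prophet's yearly seasonality (time measured in days since 1970-01-01, period 365.25 days). This is not Prophet's internal code, and the function name yearly_fourier_features is hypothetical. Because each feature is a smooth function of elapsed time, 29 February simply gets a value between those of 28 February and 1 March.

```python
import numpy as np
import pandas as pd


def yearly_fourier_features(dates, order=3, period=365.25):
    """Sine/cosine terms of a yearly cycle, evaluated at each date."""
    # Elapsed time in days since the Unix epoch, as floats
    t = np.asarray((dates - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1), dtype=float)
    cols = {}
    for k in range(1, order + 1):
        cols[f"sin_{k}"] = np.sin(2 * np.pi * k * t / period)
        cols[f"cos_{k}"] = np.cos(2 * np.pi * k * t / period)
    return pd.DataFrame(cols, index=dates)


# Days around a leap day: 29 February gets an ordinary, in-between value
dates = pd.date_range("2024-02-27", "2024-03-02", freq="D")
feats = yearly_fourier_features(dates)
print(feats.round(4))
```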

Predict Sales as Counterfactual for Experiment

Which modelling strategy (time frame, features, technique) would you recommend to forecast 3-month sales for total customer base?
At my company, we often analyse the effect of e.g. marketing campaigns that run at the same time for the total customer base. In order to get an idea of the true incremental impact of the campaign, we - among other things - want to use predicted sales as a counterfactual for the campaign, i.e. what sales were expected to be assuming no marketing campaign.
Time frame used to train the model - I'm currently considering two options (static time frame and rolling window); let me know what you think.
1. Static: use the same period last year as the dependent variable to build a specific model for this particular 3-month time frame. Data from the 12 months before are used to generate features.
2. Rolling window: use a rolling window of 3 months, dynamically creating dependent time frames and features. I'm not yet sure what the benefit of that would be. It uses more recent data for model creation but feels less specific, because any 3-month period in the year can serve as the dependent.
I'm not sure what the theory says for this particular example. Experiences, thoughts?
Features - currently building features per customer from one year of pre-period data, e.g. sales in individual months, sales in the 90, 180 and 365 days prior, max/min/avg per customer, number of months with sales, tenure, etc. This takes quite a lot of time; any libraries/packages you would recommend for this?
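One plain-pandas way to build these per-customer features with a few vectorized groupby calls, instead of looping over customers, is sketched below. The column names customer_id, date and amount, the function name and the tiny example data are all hypothetical, and the max/min/mean are computed over individual transactions rather than months.

```python
import pandas as pd


def customer_features(transactions: pd.DataFrame, cutoff: pd.Timestamp) -> pd.DataFrame:
    """Per-customer features using only transactions before `cutoff`.

    Assumes hypothetical columns: customer_id, date (datetime64), amount (float).
    """
    tx = transactions[transactions["date"] < cutoff]
    by_customer = tx.groupby("customer_id")
    feats = pd.DataFrame(index=pd.Index(sorted(tx["customer_id"].unique()), name="customer_id"))

    # Sales in the 90, 180 and 365 days before the cutoff (0 if no purchases in the window)
    for days in (90, 180, 365):
        recent = tx[tx["date"] >= cutoff - pd.Timedelta(days=days)]
        feats[f"sales_{days}d"] = recent.groupby("customer_id")["amount"].sum()
        feats[f"sales_{days}d"] = feats[f"sales_{days}d"].fillna(0.0)

    # Max / min / mean transaction amount per customer
    feats = feats.join(by_customer["amount"].agg(["max", "min", "mean"]))

    # Number of distinct calendar months with at least one purchase
    months = tx.assign(month=tx["date"].dt.to_period("M"))
    feats["months_with_sales"] = months.groupby("customer_id")["month"].nunique()

    # Tenure: days from the first purchase to the cutoff
    feats["tenure_days"] = (cutoff - by_customer["date"].min()).dt.days
    return feats


# Tiny hypothetical example
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "date": pd.to_datetime(["2023-01-15", "2023-06-10", "2023-12-01", "2023-11-20"]),
    "amount": [100.0, 50.0, 25.0, 80.0],
})
feats = customer_features(transactions, pd.Timestamp("2024-01-01"))
print(feats)
```

Calling the same function with different cutoff dates produces the feature sets for either the static or the rolling-window design.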
Modelling technique - currently considering GLM, XGBoost, SARIMA/ARIMA and LSTM networks. Any experience here?
To clarify, even though I'm considering e.g. ARIMA, I do not need to predict any seasonal patterns of the 3 month window. As a first stab, a single number, total predicted sales of customer base for these 3 months, would be sufficient.
Any experience or comment would be highly appreciated.
Thanks,
F

How to make RNN time-forecast multiple days using Keras?

I am currently working on a program that would take the previous 4000 days of stock data about a particular stock and predict the next 90 days of performance.
The way I've elected to do this is with an RNN that makes use of LSTM layers to use the previous 90 days to predict the next day's performance (when training, the previous 90 days are the x-values and the next day is used as the y-value). What I would like to do however, is use the previous 90-180 days to predict all the values for the next 90 days. However, I am unsure of how to implement this in Keras as all the examples I have seen only predict the next day and then they may loop that prediction into the next day's 90 day x-values.
Is there any way to just use the previous 180 days to predict the next 90? Or is the LSTM restricted to predicting only the next day?

I don't have the rep to comment, but I'll say here that I've toyed with a similar task. One could use a sliding window approach for 90 days (I used 30, since 90 is pushing LSTM limits), then predict the price appreciation for the next month (so your prediction is a single value). @Digital-Thinking is generally right though: you shouldn't expect great performance.
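The direct approach from the question (180 days in, 90 days out) is also possible: instead of looping one-step predictions, the training targets become 90-value vectors and the network's output layer has 90 units. A minimal NumPy sketch of the data preparation, assuming a univariate series; the function name make_multistep_windows is hypothetical.

```python
import numpy as np


def make_multistep_windows(series, lookback=180, horizon=90):
    """Build (X, y) pairs: each X row holds `lookback` past values, each y row the next `horizon` values."""
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - lookback - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested lookback and horizon")
    X = np.stack([series[i : i + lookback] for i in range(n_samples)])
    y = np.stack([series[i + lookback : i + lookback + horizon] for i in range(n_samples)])
    # Keras recurrent layers expect inputs shaped (samples, timesteps, features)
    return X[..., np.newaxis], y


X, y = make_multistep_windows(np.arange(300), lookback=180, horizon=90)
print(X.shape, y.shape)
```

With windows shaped this way, a Keras model such as keras.Sequential([keras.Input(shape=(180, 1)), keras.layers.LSTM(64), keras.layers.Dense(90)]) can be trained with model.fit(X, y), and model.predict on the latest 180 days returns all 90 future values at once. The LSTM size of 64 is an arbitrary choice, and the performance caveat above still applies.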

Time series: EWMA pandas forecast

I have searched extensively on Google and here but cannot seem to find the answer I am looking for, or at least something I understand. Is it possible to use EWMA in pandas for forecasting? For example, if I had daily data of website clicks for 2 months, 1 Feb to 31 Mar, and didn't see any trend or seasonality in the data, it seems like I should be able to use EWMA to "predict" the number of clicks at a later date, say on 10 April. In Excel, I can imagine just filling approximately 10 dates or rows after 31 March and computing a moving average, where the 5-day EWMA for 10 April would be based on weighted forecasts of prior days. Is there a way I can do this in Python?
Thanks!

It's a one-liner to implement, but you're going to be a little bored by EWMA's predictions of the future (the forecast is simply a flat line at the last smoothed value). If you'd like a Python package that lets you experiment with EWMA level, trend and seasonality, try my Holt-Winters implementation:
https://github.com/welch/seasonal
https://pypi.python.org/pypi/seasonal
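A minimal pandas sketch of that one-liner, assuming hypothetical daily click counts for 1 February to 31 March (the year 2023 is arbitrary): compute the 5-day EWMA with Series.ewm(span=5, adjust=False) and carry its last value forward to 10 April. Filling the April rows Excel-style and re-running the EWMA gives the same number, because feeding the forecast back in leaves the smoothed level unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical daily click counts from 1 Feb to 31 Mar (no trend, no seasonality)
rng = np.random.default_rng(1)
idx = pd.date_range("2023-02-01", "2023-03-31", freq="D")
clicks = pd.Series(rng.poisson(120, len(idx)), index=idx, dtype=float)

# 5-day EWMA: span=5 corresponds to alpha = 2 / (5 + 1)
level = clicks.ewm(span=5, adjust=False).mean()

# Simple exponential smoothing forecast: every future day gets the last smoothed level
future_idx = pd.date_range("2023-04-01", "2023-04-10", freq="D")
forecast = pd.Series(level.iloc[-1], index=future_idx)
print(f"Forecast for 10 April: {forecast.iloc[-1]:.1f}")
```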
