I am a bit confused about how to identify the seasonal component of a SARIMA model. I am currently forecasting rates (ocean carrier rates, to be specific). The first thing I did was convert my original rates to differences in log rates, i.e. log(P2) - log(P1), as I wanted to forecast the change in rate itself. Then I checked the series for stationarity, and it was stationary. This is what the seasonally decomposed series looks like:
After that, I ran a basic ARIMA model and chose p, d and q based on the ACF and PACF plots here, but the predictions were pretty bad, which was expected.
I am now trying to run a SARIMA model instead. I obtained the seasonal component via seasonal_decompose and plotted its ACF and PACF over 52 lags (I have weekly data), and this is what I see. How do I choose the right p, d, q components for SARIMA?
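For reference, the log-difference transform described above can be sketched as follows. The function name and the sample rates are made up for illustration; only NumPy is assumed.

```python
import numpy as np

def log_returns(prices):
    """Return log(P_t) - log(P_{t-1}) for a 1-D sequence of positive prices."""
    prices = np.asarray(prices, dtype=float)
    if np.any(prices <= 0):
        raise ValueError("log returns need strictly positive prices")
    return np.diff(np.log(prices))

# Hypothetical weekly rates, for illustration only (not real carrier data).
rates = np.array([1000.0, 1100.0, 1045.0, 1045.0])
r = log_returns(rates)  # one element shorter than the input
```

The log differences add up to log(last) - log(first), so a forecast of the changes can be turned back into a rate level by cumulative summation and exponentiation.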
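One way to read the plots, sketched here under assumptions: in the usual Box-Jenkins reading, the first few lags of the ACF/PACF inform p and q, while spikes at multiples of the seasonal period (52, 104 for weekly data with a yearly cycle) point to the seasonal P and Q. Note that the seasonal component returned by seasonal_decompose repeats exactly by construction, so it is probably more informative to look at the ACF of the stationary series itself. The series below is synthetic and the helper name is hypothetical.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Biased sample autocorrelation for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Synthetic weekly series with a yearly cycle (illustration only).
rng = np.random.default_rng(0)
t = np.arange(6 * 52)
weekly_series = np.sin(2 * np.pi * t / 52) + rng.normal(scale=0.3, size=t.size)

acf = sample_acf(weekly_series, max_lag=104)
band = 1.96 / np.sqrt(len(weekly_series))  # approximate 95% band under white noise
seasonal_spikes = {lag: abs(acf[lag]) > band for lag in (52, 104)}
```

If the seasonal lags stand out like this on the real series, that would suggest trying a seasonal term with period 52 rather than adding more non-seasonal lags.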
Related
I am working on the following timeseries multi-class classification problem:
42 possible classes that are dependent on each other, I want to know the probability of each class for up to 56 days ahead
1 year of daily data so 365 observations
the class probabilities have a strong weekly seasonality
I have exogenous regressors that are strongly correlated with the output classes
I realise that I am trying to predict a lot of output classes with little data, but I am looking for a model (preferably with Python implementation) that is most suited for this use case.
Any recommendations on what model could work for this problem?
So far I have tried:
a tree-based model, but it struggles with the large number of classes and does not capture the time series component well
a VAR model, but the number of parameters to estimate becomes too high compared to the length of the series
predicting each class probability independently, but that assumes the series are independent, which is not the case
Struggling to build an ARIMA model in Python that is even close to useful for predicting household electricity usage. Would appreciate any thoughts and suggestions. (Might just be a silly error in my implementation!)
Some design thoughts:
Data is very messy in general, but there is clearly a daily seasonality (usage drops overnight and while the household is at work/school) and a weekly seasonality (weekday usage differs from weekend)
Have tried statsmodels, sktime, fbprophet and pmdarima 'auto_arima' functions with no luck. Don't think these take seasonality into account particularly well
Currently trying to get a more manual approach to work: statsmodels' SARIMAX with only daily seasonality incorporated (see code and results below), and maybe adding Fourier terms as exogenous variables to handle weekly seasonality.
Will consider adding exogenous variables (like temperature) to account for annual seasonality but first just trying to get something reasonable on a smaller time scale (3-6 months).
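The Fourier-term idea from the design thoughts above could look roughly like this. It is a minimal sketch: the helper name, the number of harmonics (3) and the four-week training length are assumptions for illustration, and only the weekly period of 48 * 7 = 336 half-hour steps comes from the data description.

```python
import numpy as np

def fourier_terms(n_obs, period, n_harmonics, start=0):
    """Sine/cosine pairs for the given period; shape (n_obs, 2 * n_harmonics)."""
    t = np.arange(start, start + n_obs)
    cols = []
    for k in range(1, n_harmonics + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

WEEK = 48 * 7  # 336 half-hour steps per week

# Regressors for a hypothetical 4-week training window ...
exog_train = fourier_terms(n_obs=4 * WEEK, period=WEEK, n_harmonics=3)
# ... and for the next day, continuing the same time index.
exog_future = fourier_terms(n_obs=48, period=WEEK, n_harmonics=3, start=4 * WEEK)
```

These arrays would be passed as exog= when constructing SARIMAX and again as exog= when forecasting, so the time index of the future block has to continue from the end of the training block.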
Approach I am trying to get working: Use the Box-Jenkins method to specify a SARIMA model for just daily seasonality (images below).
(1) Looking at Dickey-Fuller and KPSS for the time series, there appears to be minimal trend to correct for (expected), but ACF and PACF charts show significant seasonality (daily, weekly).
(2) Taking differences to account for week and day seasonality, then taking a further first-order difference, we can quickly get to a dataset that has minimal remaining seasonality and is stationary. This should be a really good sign and suggests there is a model we can build to predict this behaviour!
One more plot to show the difference between the original and differenced data when we zoom in on a typical week.
(3) Finally, I trained a SARIMA model in the following way, with results below. I set D=d=0 since there is no identifiable trend (expected), p=2 to give the model the opportunity to learn from the most recent behaviour, m=48 for seasonality (daily, since the data is in 30-minute intervals), and P=Q=1 to capture seasonal effects at lag 48 (t-48).
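from statsmodels.tsa.statespace.sarimax import SARIMAX

# Daily seasonality only: 48 half-hour steps per day
model = SARIMAX(
    train_data,
    trend='n',                     # no deterministic trend
    order=(2, 0, 0),               # p=2, d=0, q=0
    seasonal_order=(1, 0, 1, 48),  # P=1, D=0, Q=1, m=48
)
results = model.fit()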
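The differencing in step (2) can be sketched like this. The series is synthetic (a small trend plus exactly repeating daily and weekly patterns) and the helper name is hypothetical; it only illustrates the mechanics, not the real usage data.

```python
import numpy as np

def difference(x, lag=1):
    """Return x[t] - x[t - lag]; the result is `lag` elements shorter."""
    x = np.asarray(x, dtype=float)
    return x[lag:] - x[:-lag]

DAY, WEEK = 48, 336  # half-hour steps per day and per week

# Synthetic usage: linear trend + daily cycle + weekly cycle (illustration only).
t = np.arange(6 * WEEK)
usage = 0.001 * t + np.sin(2 * np.pi * t / DAY) + 0.5 * np.sin(2 * np.pi * t / WEEK)

# Weekly difference, then daily difference, then a first-order difference.
stationary = difference(difference(difference(usage, WEEK), DAY), 1)
```

Because 336 is a multiple of 48, the weekly difference on its own already cancels an exactly repeating daily pattern in this toy series, so stacking all three differences may be more than real data needs.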
I am able to get an exponential smoothing model working, but I had expected double seasonal ARIMA to blow it out of the water. Any thoughts and suggestions most welcome. Thank you in advance!
Model Fits but the Predictions Fail
Using a (4,0,13) ARIMA model on the data shown in the picture below yields flat predictions (also shown in the second picture below). I am not sure why the model can fit the data in the training set, but then predict nothing afterwards. I found another question here which said I needed to add a seasonal component. I detail my experience with that below.
The Time Series (zoomed in)
The Predictions*
* The predictions plot shows all the training data as well as the validation data after the orange vertical line. The training fit is rounded to be integers (it's not possible to have real numbers in this dataset). Note the prediction is just flat and then dies.
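One possible explanation for the flat line, offered as an assumption since it cannot be checked from the plot alone: the in-sample fit is a sequence of one-step-ahead predictions that each use the actual previous observations, whereas the out-of-sample forecast has to feed on its own predictions. For a stationary ARMA model, those multi-step forecasts decay towards the unconditional mean. The sketch below uses an AR(1) with made-up parameters as the simplest case, not the (4,0,13) model from the question.

```python
import numpy as np

def ar1_forecast(last_value, phi, mean, steps):
    """h-step-ahead forecasts of a stationary AR(1): mean + phi**h * (x_T - mean)."""
    h = np.arange(1, steps + 1)
    return mean + (phi ** h) * (last_value - mean)

# Hypothetical values: last observation 10, long-run mean 2, AR coefficient 0.8.
path = ar1_forecast(last_value=10.0, phi=0.8, mean=2.0, steps=50)
# path starts at 8.4 and shrinks geometrically towards the mean of 2.0
```

Higher-order ARMA models behave in the same way once the forecast horizon exceeds the MA order, which is why a repeating pattern usually needs a seasonal term or seasonal regressors to survive in the forecast.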
Problem Definition
I have 15-minute interval data and want to apply a SARIMA model to it. It has a daily seasonality, which is defined from 7am-9pm (therefore a seasonal period of 4 * 15 = 60: four 15-minute periods per hour * 15 hours). I first tested for stationarity with the Augmented Dickey-Fuller test. This passed, and so I started to analyze the ACF and PACF to determine the SARIMA parameters.
Parameter Determination
(p,d,q)
ACF & PACF on Original Data
From this, I see there is no unit root (sum of ACF and PACF do not equal 1), and that we need to difference the series since there is no big cut off in the ACF.
ACF & PACF on Differenced Data
From this, I see it is slightly overdifferenced, so I may want to try no integrated term and add an AR term at 15 (the point where the ACF in the original plot enters the bands). I also add an MA term here.
(P,D,Q)s
I now look for the seasonal component. I do a seasonal difference of period 60 since that's where the spike is in the plots.
Seasonal difference
Seeing this, I should add 2 MA terms to the seasonal component (Rules 13 and 7 from here). But the site also says not to use more than 1 seasonal MA term in most cases, so I leave it at 1.
Model
This leaves me with a SARIMA(0,1,1)(0,1,1,60) model. However, I run out of memory trying to fit this model (Python, using the statsmodels SARIMAX class).
Question
Did I choose the parameters correctly? Is this data even fittable by ARIMA/SARIMA? And lastly, would the 60 period SARIMA actually work and I just need to find a way to run it on a different machine?
I guess the tl;dr question is: what am I doing wrong?
Feel free to go into detail. I want to become well informed with time series and so more information is better!
To select the best-fitting model, use an information criterion such as AIC or BIC: fit different combinations of P and Q and keep the model with the lowest value.
Further, as a rule of thumb, the orders normally satisfy q+d+p+Q+D+P < 6.
I am building a churn forecast model using features such as one year's worth of lags, holidays, moving averages, day/day ratios, a seasonality factor extracted from statsmodels, etc. It is clearly not an additive series: the magnitude of holiday churn each year is greater than in previous years.
My XGB model predicts daily churn quite accurately, but it fails miserably on holidays (the troughs are predicted slightly better than the peaks):
In my opinion, the model is unable to capture the exponential nature of the series. Here is what it looks like at present. Is there a way I can capture the exponential nature of the series, by using additional features or something else?
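One thing that might be worth trying, sketched under assumptions: if the holiday effect scales with the level of the series, training on log1p(churn) and transforming predictions back with expm1 turns the multiplicative effect into an additive one. The wrapper below is hypothetical and takes any fit/predict pair; a plain least-squares fit stands in for XGBoost, and the data is synthetic.

```python
import numpy as np

def fit_predict_log_target(X_train, y_train, X_test, fit, predict):
    """Train on log1p(y) and map predictions back with expm1."""
    model = fit(X_train, np.log1p(y_train))
    return np.expm1(predict(model, X_test))

# Stand-in learner (least squares); any regressor's fit/predict could be swapped in.
def fit_lstsq(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def predict_linear(coef, X):
    return X @ coef

# Synthetic churn with exponential growth and a multiplicative holiday bump.
t = np.arange(800)
holiday = (t % 100 == 0).astype(float)
churn = np.exp(1.0 + 0.002 * t + 0.7 * holiday) - 1.0
X = np.column_stack([np.ones(t.size), t, holiday])

train, test = slice(0, 600), slice(600, 800)
pred = fit_predict_log_target(X[train], churn[train], X[test],
                              fit_lstsq, predict_linear)
```

With a tree ensemble the back-transformed predictions generally still cannot exceed the range of targets seen in training, so this helps with the shape of the holiday effect more than with year-over-year growth.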
Data:
I have time series data for different countries and factors, e.g. birth rate for "Afghanistan" for years from 1972 up until 2007 (source).
Goal:
Predict e.g. birth rate for 2008 and 2012
Question:
I am familiar with linear regressions, but need some help on how to work with time series data and predict future values.
Can you point me to examples or share code snippets?
Take a look at the statsmodels Time Series Analysis module. Time series models are often based around autocorrelation, and the module has the standard univariate (for individual time series) AR(p) and MA(q) models, as well as the combined ARMA model and its ARIMA extension, which allows for unit roots. You'll also find multivariate (for several interrelated time series) VAR models.
And here's a time series tutorial for statistical analysis and forecasting using pandas and statsmodels.
You can use the ARIMA and VAR models in R.
ARIMA: Auto Regressive Integrated Moving Average model
VAR: Vector Auto Regressive model
For ARIMA model: click here
For VAR model: click here
For a single time series, use an ARIMA model; however, if multiple time series are related to each other, use a VAR model.