How do I forecast data using ARIMA? - python

I want to forecast stock prices using the ARIMA model (AutoRegressive Integrated Moving Average) and plot the forecast over the actual and training data. I'm following this tutorial and have browsed others too, but they all use the same code. Here is the link to the tutorial for reference: https://www.analyticsvidhya.com/blog/2021/07/stock-market-forecasting-using-time-series-analysis-with-arima-model/
# Forecast
fc, se, conf= fitted.forecast(216, alpha=0.05) # 95% conf
I was expecting a graph that looks like this
Instead, an error message shows up: ValueError: too many values to unpack (expected 3)
please help :')
Edit: I tried that before, and it produces an error in the code that follows. My next lines of code are:
result = fitted.forecast(216, alpha=0.05)
# Make as pandas series
fc_series = pd.Series(result, index=test_data.index)
lower_series = pd.Series(result[:, 0], index=test_data.index)
upper_series = pd.Series(result[:, 1], index=test_data.index)
The error message: KeyError: 'key of type tuple not found and not a MultiIndex'

It seems that the forecast function no longer returns three values. This can happen if you are not using the same statsmodels version as the tutorial.
Please try something like:
result = fitted.forecast(216, alpha=0.05)
Then inspect the result to check whether it contains all the data you need.
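The unpacking error itself can be reproduced without any model. The sketch below assumes, as appears to be the case in recent statsmodels releases, that forecast() hands back a single series of point forecasts; the `result` here is only a stand-in built by hand, not real model output.

```python
import pandas as pd

# Stand-in for what forecast(216) returns: one value per forecast step.
# (Hypothetical data; a real call would come from the fitted model.)
result = pd.Series(range(216), dtype=float)

# Unpacking iterates over the 216 values and tries to fit them into 3 names.
try:
    fc, se, conf = result
    message = ""
except ValueError as exc:
    message = str(exc)

# Inspecting the object first shows what is actually available.
print(type(result).__name__, result.shape)
print(message)
```

Printing the type and shape before unpacking is usually enough to tell whether the standard errors and confidence bounds are part of the return value or have to be requested separately.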

Import the libraries:
import pandas as pd
import statsmodels.api as sm
Build the model through sm.tsa:
model = sm.tsa.ARIMA(train_data, order=(1, 1, 1))
fitted = model.fit()
print(fitted.summary())
Call get_forecast() and then summary_frame() to get the forecast together with the lower and upper bounds of the confidence interval:
result = fitted.get_forecast(216).summary_frame(alpha=0.05)
print(result)
Build the pandas Series. Don't forget .values: without it, pandas aligns on the result's own index and the Series can come out as all NaN.
fc_series = pd.Series(result['mean'].values, index=test_data.index)
lower_series = pd.Series(result['mean_ci_lower'].values, index=test_data.index)
upper_series = pd.Series(result['mean_ci_upper'].values, index=test_data.index)
I hope this helps.
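To see why the .values step matters, here is a small pandas-only sketch. The numbers and both indexes are made up for illustration; the point is only how pd.Series treats an existing Series versus a plain array.

```python
import pandas as pd

# Stand-in for a forecast whose index does not match the test set's index.
# (Hypothetical values; in the question this comes from the fitted model.)
forecast = pd.Series([10.0, 11.0, 12.0], index=[100, 101, 102])
test_index = pd.date_range("2021-01-01", periods=3, freq="D")

# Passing a Series plus a new index reindexes by label.
# No label matches here, so every value becomes NaN.
aligned_by_label = pd.Series(forecast, index=test_index)

# Passing the raw values assigns them by position instead.
aligned_by_position = pd.Series(forecast.values, index=test_index)

print(aligned_by_label.isna().all())   # True
print(aligned_by_position.tolist())    # [10.0, 11.0, 12.0]
```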

Related

Time Series AR model shows NaNs for prediction

I'm running the code below for an AR model and it returns blanks (NaNs).
Can someone help me debug this?
# With Headers
df = pd.read_sql(sql_query, cnxn,index_col='date',parse_dates=True)
# An index column is required; statsmodels also needs the index frequency to be set on this dataset
df.index.freq = 'MS'
df.to_csv("Billings.csv")
# write back to an excel for audits and testing
#train test split
train_data = df.iloc[:len(df)-12]
test_data = df.iloc[len(df)-12:]
from statsmodels.tsa.ar_model import AR,ARResults
# Ignore harmless warnings
import warnings
warnings.filterwarnings("ignore")
model = AR(train_data['tcv'])
AR1fit = model.fit(maxlag=1,method='mle') # maxlag sets how many lag coefficients to estimate, e.g. 1 for AR(1)
print(f'Lag: {AR1fit.k_ar}')
print(f'Coefficients:\n{AR1fit.params}')
# general format for obtaining predictions
start=len(train_data)
end=len(train_data)+len(test_data)-1
predictions1 = AR1fit.predict(start=start, end=end, dynamic=False).rename('AR(1) Predictions')
predictions1
Output:
Results of print statements
Thank you for uploading the results of the print statements!
As you can see, the value of the L1.tcv parameter is NaN. To get a better picture of the model fit, you can also run:
print(AR1fit.summary())
In any case, this explains why you get NaNs in your predictions - because any computation with NaN will result in NaN.
However, fixing this is another kettle of fish. If you look at the vignette here, you can see they use dropna in block [3].
I suspect that if you did something similar on your train set, train_data['tcv'].dropna(), this could fix your predictions.
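The propagation can be seen in a few lines. This is a hand-rolled least-squares AR(1) without intercept, written only for illustration; it is not the estimator statsmodels uses, and the `tcv` values are invented.

```python
import numpy as np
import pandas as pd


def fit_ar1(values):
    """Least-squares slope for x[t] ~ phi * x[t-1] (no intercept)."""
    x = np.asarray(values, dtype=float)
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x[:-1] ** 2))


def forecast_ar1(last_value, phi, steps):
    """Iterate x[t+1] = phi * x[t] for the requested number of steps."""
    out = []
    current = last_value
    for _ in range(steps):
        current = phi * current
        out.append(current)
    return out


# Made-up series with a single missing observation.
tcv = pd.Series([5.0, 4.0, np.nan, 3.5, 3.0, 2.8, 2.5])

phi_with_nan = fit_ar1(tcv)           # one NaN poisons both sums
phi_clean = fit_ar1(tcv.dropna())     # finite once the NaN is removed

preds_with_nan = forecast_ar1(tcv.iloc[-1], phi_with_nan, steps=3)
preds_clean = forecast_ar1(tcv.iloc[-1], phi_clean, steps=3)

print(phi_with_nan, preds_with_nan)
print(phi_clean, preds_clean)
```

Note that dropna() also removes the time step the missing value belonged to, so for a series with a fixed frequency it may be preferable to fill the gap (for example by interpolation) rather than drop it.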

python statsmodels ARIMA plot_predict: How to get the data predicted?

I used ARIMAResults' plot_predict function to predict 5 years in advance what the data would look like and it's fairly reasonable. The only thing is, I need that data that was predicted for Power Bi!
How can I actually see those values (not on the plot)?
Note: I am using python!
Thanks!
You need to call the predict() method instead of plot_predict(). It takes more or less the same parameters, but predict() returns the predicted values as an array, while plot_predict() returns a figure.
https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARIMAResults.plot_predict.html#statsmodels.tsa.arima_model.ARIMAResults.plot_predict
https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARIMAResults.predict.html#statsmodels.tsa.arima_model.ARIMAResults.predict
Use predict() instead of plot_predict():
import matplotlib.pyplot as plt
import statsmodels.api as sm
print("Predicted Price pct change")
def plotARMA(df_accumulative,ax,label):
    # Smooth the series with a 45-period rolling mean before fitting
    result=df_accumulative
    result=result.rolling(window=45).mean().dropna()
    mod = sm.tsa.arima.ARIMA(result, order=(2,0,0))
    res = mod.fit()
    # Plot the original series and the predicted series
    #res.plot_predict(start=0, end=400)
    df_accumulative.plot(ax=ax,label=label)
    res.predict().plot(ax=ax,label=label)
fig,ax = plt.subplots(figsize=(20,20))
plotARMA(duke_accumulative,ax,"Duke")
plotARMA(nee_accumulative,ax,"Next Era")
plotARMA(xel_accumulative,ax,"Xel")
plt.legend(fontsize=8)
plt.title("ARMA")
plt.show()

Equivalent R's arima function in Python

I tried a time series forecast with Python using statsmodel's arima function and it gave me a different result from the r's arima function.
I used the same hyper-parameters.
R's version :
fit <- arima(data[1:9000,3], order = c(3,0,3), seasonal = list(order = c(0,0,0)))
predd = forecast(fit,h=1000)
pred = cbind(data[9001:10000,3], predd$mean)
Python's version :
series = df[0:9000].copy()
model = ARIMA(series, order=(3, 0, 3))
model_fitted = model.fit()
predictions = model_fitted.predict(start=len(series), end=len(df)-1)
Attached are the resulting plots for R's and Python's arima.
What am I doing wrong?
Is there any other Python package/function arima that I can use other than statsmodel for a univariate time series?
Any insight or guidance would be greatly appreciated. Thank you so much in advance.
Summary: I do not know how you created the first image you showed as "R's version", but when I run the R code you gave and plot the results, they look identical to the Python results to me and do not look like the "R's version" graph you included. My best guess is that somehow you were plotting in-sample predictions when you created that image showing R's results.
See below for details.
Details:
I started by downloading the dataset "dataset.txt" from the link you gave, https://gist.github.com/DouddaS/5043a340ff7d7b35b255b4f8f74fc534
Now, if I run the following R code:
library(forecast)
y <- read.csv('dataset.txt')
fit <- arima(y[1:9000, 1], order = c(3,0,3), seasonal = list(order = c(0,0,0)))
predd = forecast(fit,h=1000)
pred = cbind(y[9001:10000,1], predd$mean)
autoplot(pred)
This gives the following plot:
And when I run the following Python code:
import pandas as pd
import statsmodels.api as sm
y = pd.read_csv('dataset.txt')
train = y.iloc[:9000, 0]
model = sm.tsa.arima.ARIMA(train, order=(3, 0, 3))
model_fitted = model.fit()
pred = model_fitted.predict(start=len(train), end=len(y)-1)
predd = pd.concat([y.iloc[9000:, 0], pred], axis=1)
predd.plot()
Then I get the following plot:
These look basically identical to me, and R's version looks nothing like the image that was posted in the question.
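The difference between in-sample predictions and a genuine out-of-sample forecast is easy to underestimate, so here is a rough illustration. It uses a simulated AR(1) process and a hand-rolled least-squares estimate rather than R's or statsmodels' ARIMA, and every number in it is invented.

```python
import numpy as np

# Simulate an AR(1) process: x[t] = 0.9 * x[t-1] + noise.
rng = np.random.default_rng(42)
n, n_train, phi_true = 500, 400, 0.9
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

train, test = x[:n_train], x[n_train:]

# Least-squares estimate of the AR coefficient (no intercept).
phi_hat = np.sum(train[1:] * train[:-1]) / np.sum(train[:-1] ** 2)

# In-sample: every prediction uses the actual previous observation,
# so the predicted curve follows the data closely.
in_sample = phi_hat * train[:-1]

# Out-of-sample: every step builds on the previous forecast,
# so the curve decays smoothly toward the process mean (zero here).
horizons = np.arange(1, len(test) + 1)
out_of_sample = train[-1] * phi_hat ** horizons

print("phi_hat:", round(float(phi_hat), 3))
print("first / last out-of-sample forecast:", out_of_sample[0], out_of_sample[-1])
```

A plot of `in_sample` against `train` looks like a good fit, while `out_of_sample` against `test` is a flattening curve. Mixing up the two would explain two very different pictures from the same model.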

Get better fit on test data using Auto_Arima

I am using the AirPassengers dataset to forecast a time series. I chose auto_arima to select the model and produce the forecast. However, the order chosen by auto_arima does not seem to fit the data well, as the resulting chart shows.
What can I do to get a better fit?
My code for those that want to try:
import pandas as pd
import numpy as np
import matplotlib.pylab as plt
%matplotlib inline
from pmdarima import auto_arima
df = pd.read_csv("https://raw.githubusercontent.com/AileenNielsen/TimeSeriesAnalysisWithPython/master/data/AirPassengers.csv")
df = df.rename(columns={"#Passengers":"Passengers"})
df.Month = pd.to_datetime(df.Month)
df.set_index('Month',inplace=True)
train,test=df[:-24],df[-24:]
model = auto_arima(train,trace=True,error_action='ignore', suppress_warnings=True)
model.fit(train)
forecast = model.predict(n_periods=24)
forecast = pd.DataFrame(forecast,index = test.index,columns=['Prediction'])
plt.plot(train, label='Train')
plt.plot(test, label='Valid')
plt.plot(forecast, label='Prediction')
plt.legend()
plt.show()
from sklearn.metrics import mean_squared_error
print(mean_squared_error(test['Passengers'],forecast['Prediction']))
Thank you for reading. Any advice is appreciated.
This series is not stationary, and no amount of differencing will make it so (notice that the amplitude of the variations keeps increasing). Transforming the data first by taking logs should do better; experiment shows that it does, but not what I would call well. Setting the seasonality (m=12, as I suggest in the comment) and taking logs produces a fit that is essentially perfect.
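Why logs and a seasonal lag of 12 help can be checked on a toy series. The sketch below builds a synthetic series with exponential trend and multiplicative yearly seasonality; it is not the AirPassengers data and does not call auto_arima.

```python
import numpy as np

# Synthetic monthly series: exponential trend times a yearly seasonal factor.
months = np.arange(144)
trend = 100.0 * np.exp(0.01 * months)
season = 1.0 + 0.2 * np.sin(2 * np.pi * months / 12)
y = trend * season


def yearly_swing(values, year):
    """Max minus min within one block of 12 consecutive months."""
    window = values[12 * year: 12 * (year + 1)]
    return window.max() - window.min()


# On the raw scale the seasonal swing grows with the level of the series.
raw_first, raw_last = yearly_swing(y, 0), yearly_swing(y, 11)

# After taking logs the swing is the same in every year.
log_y = np.log(y)
log_first, log_last = yearly_swing(log_y, 0), yearly_swing(log_y, 11)

# Differencing the logged series at lag 12 removes the repeating pattern.
seasonal_diff = log_y[12:] - log_y[:-12]

print(raw_first, raw_last)
print(log_first, log_last)
print(seasonal_diff.min(), seasonal_diff.max())
```

In this idealized case the seasonally differenced log series is constant, which is the kind of structure a seasonal model with period 12 can capture once the variance has been stabilized.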
The problem was that I did not specify m. Here I set m to 12 to indicate a yearly cycle in monthly data: each row is one month, and the pattern repeats every 12 rows. That's how I understand it. source
Feel free to comment, I'm not entirely sure as I am new to using ARIMA.
Code:
model = auto_arima(train,m=12,trace=True,error_action='ignore', suppress_warnings=True)
Just add m=12 to tell auto_arima that the seasonal period is 12 observations (monthly data with a yearly cycle).
Result:

ARIMA PREDICT doesnt forecast (But works for Hindcasting)

When using ARIMA I can hind-cast past data as shown below but the moment I try to forecast future values, it doesn't work.
And yes I have added new rows to my table using concat:
df['forecast'] = results.predict(start = 50, end = 251)
df[['close', 'forecast']].plot(figsize = (12,8))
But the moment I change end = 251 to end= 252, it doesn't produce any forecast values and all my hind-cast values disappear?
Any solutions?
You probably want to use forecast instead of predict:
df['forecast'] = results.forecast(steps=7)
There's a good tutorial on this here: https://machinelearningmastery.com/make-sample-forecasts-arima-python/
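One thing worth checking when forecast values seem to vanish is index alignment: assigning a Series to a DataFrame column keeps only the labels that already exist in the DataFrame. The sketch below uses made-up dates and values and a hand-built `forecast` Series in place of real model output.

```python
import pandas as pd

# Ten days of (made-up) history.
history_index = pd.date_range("2023-01-01", periods=10, freq="D")
df = pd.DataFrame({"close": [float(i) for i in range(10)]}, index=history_index)

# Stand-in for a 3-step forecast, indexed by the three following days.
future_index = pd.date_range("2023-01-11", periods=3, freq="D")
forecast = pd.Series([10.5, 11.0, 11.5], index=future_index, name="forecast")

# Column assignment aligns on df's index; no future date exists there,
# so the whole column ends up NaN.
df["forecast"] = forecast

# Concatenating along columns keeps the union of both indexes instead.
combined = pd.concat([df["close"], forecast], axis=1)

print(df["forecast"].isna().all())   # True
print(combined.tail(4))
```

Plotting `combined` then shows the history followed by the forecast, without having to add empty rows to the original table first.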
