python statsmodels ARIMA plot_predict: How to get the data predicted? - python

I used ARIMAResults' plot_predict function to predict what the data would look like 5 years in advance, and the result is fairly reasonable. The only thing is, I need the predicted data for Power BI!
How can I actually see those values (not on the plot)?
Note: I am using python!
Thanks!

You need to call the predict() method instead of plot_predict(). It is more or less the same method with same parameters, but predict() returns the predicted values as an array while plot_predict() returns a figure.
https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARIMAResults.plot_predict.html#statsmodels.tsa.arima_model.ARIMAResults.plot_predict
https://www.statsmodels.org/stable/generated/statsmodels.tsa.arima_model.ARIMAResults.predict.html#statsmodels.tsa.arima_model.ARIMAResults.predict

Use predict() instead of plot_predict():
import matplotlib.pyplot as plt
import statsmodels.api as sm

print("Predicted Price pct change")

def plotARMA(df_accumulative, ax, label):
    # Smooth the series with a 45-period rolling mean before fitting
    result = df_accumulative
    result = result.rolling(window=45).mean().dropna()
    mod = sm.tsa.arima.ARIMA(result, order=(2, 0, 0))
    res = mod.fit()
    # Plot the original series and the forecasted series
    # res.plot_predict(start=0, end=400)
    df_accumulative.plot(ax=ax, label=label)
    # predict() returns the values themselves, here as a pandas Series
    res.predict().plot(ax=ax, label=label)

fig, ax = plt.subplots(figsize=(20, 20))
plotARMA(duke_accumulative, ax, "Duke")
plotARMA(nee_accumulative, ax, "Next Era")
plotARMA(xel_accumulative, ax, "Xel")
plt.legend(fontsize=8)
plt.title("ARMA")
plt.show()

Related

How to take confidence interval of statsmodels.tsa.holtwinters-ExponentialSmoothing Models in python?

I did time series forecasting analysis with ExponentialSmoothing in python. I used statsmodels.tsa.holtwinters.
model = ExponentialSmoothing(df, seasonal='mul', seasonal_periods=12).fit()
pred = model.predict(start=df.index[0], end=122)
plt.plot(df_fc.index, df_fc, label='Train')
plt.plot(pred.index, pred, label='Holt-Winters')
plt.legend(loc='best')
I want to get the confidence interval of the model result, but I couldn't find any function for this in "statsmodels.tsa.holtwinters - ExponentialSmoothing". How do I do that?
According to an answer in a GitHub issue, you should be using the new ETSModel class, and not the old (but still present for compatibility) ExponentialSmoothing.
ETSModel includes more parameters and more functionality than ExponentialSmoothing.
To calculate confidence intervals, I suggest using the simulate method of ETSResults:
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
import pandas as pd
# Build model.
ets_model = ETSModel(
    endog=y,  # y should be a pd.Series
    seasonal='mul',
    seasonal_periods=12,
)
ets_result = ets_model.fit()
# Simulate predictions.
n_steps_prediction = y.shape[0]
n_repetitions = 500
df_simul = ets_result.simulate(
    nsimulations=n_steps_prediction,
    repetitions=n_repetitions,
    anchor='start',
)
# Calculate confidence intervals.
upper_ci = df_simul.quantile(q=0.9, axis='columns')
lower_ci = df_simul.quantile(q=0.1, axis='columns')
Basically, calling the simulate method gives you a DataFrame with n_repetitions columns and n_steps_prediction rows (in this case, the same number of items as in your training data set y).
Then you calculate the confidence intervals with the DataFrame quantile method (remember the axis='columns' option).
You could also calculate other statistics from df_simul.
I also checked the source code: simulate is internally called by the forecast method to predict steps in the future. So, you could also predict steps in the future and their confidence intervals with the same approach: just use anchor='end', so that the simulations will start from the last step in y.
To be fair, there is also a more direct approach to calculating the confidence intervals: the get_prediction method (which uses simulate internally). But I do not really like its interface. It is not flexible enough for me, and I did not find a way to specify the desired confidence intervals. The approach with the simulate method is pretty easy to understand, and very flexible, in my opinion.
If you want further details on how this kind of simulation is performed, read the relevant chapter of the excellent online book Forecasting: Principles and Practice.
Complementing the answer from @Enrico, we can use get_prediction in the following way:
ci = model.get_prediction(start = forecast_data.index[0], end = forecast_data.index[-1])
preds = ci.pred_int(alpha = .05) #confidence interval
limits = ci.predicted_mean
preds = pd.concat([limits, preds], axis = 1)
preds.columns = ['yhat', 'yhat_lower', 'yhat_upper']
preds
My own implementation, complementing the answer from @Enrico, uses get_prediction in the following way:
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
# sales: pd.Series of time series data (the index should be a DatetimeIndex)
# Fit the ETS (Holt-Winters style) model with multiplicative trend and seasonality
HWTES_Model = ETSModel(endog=sales, trend='mul', seasonal='mul', seasonal_periods=4).fit()
point_forecast = HWTES_Model.forecast(16)
# ------- Confidence interval forecast calculation start -------
# alpha_1 is undefined in the original; 0.05 (a 95% interval) is an assumed value
alpha_1 = 0.05
ci = HWTES_Model.get_prediction(start=point_forecast.index[0],
                                end=point_forecast.index[-1])
lower_conf_forecast = ci.pred_int(alpha=alpha_1).iloc[:, 0]
upper_conf_forecast = ci.pred_int(alpha=alpha_1).iloc[:, 1]
# ------- Confidence interval forecast calculation end -------
To complement the previous answers, I provide the function to plot the CI on top of the forecast.
import matplotlib.pyplot as plt

def ets_forecast(model, h=8):
    # model is a fitted ETSResults object; h is the forecast horizon.
    # Simulate predictions.
    n_steps_prediction = h
    n_repetitions = 1000
    yhat = model.forecast(h)
    df_simul = model.simulate(
        nsimulations=n_steps_prediction,
        repetitions=n_repetitions,
        anchor='end',
    )
    # Calculate confidence intervals (95%).
    upper_ci = df_simul.quantile(q=0.975, axis='columns')
    lower_ci = df_simul.quantile(q=0.025, axis='columns')
    plt.plot(yhat.index, yhat.values)
    plt.fill_between(yhat.index, lower_ci, upper_ci, color='blue', alpha=0.1)
    return yhat

plt.plot(y)
ets_forecast(model2, h=8)
plt.show()

How to use exponential smoothing to smooth the timeseries in python?

I am trying to use exponential smoothing to smooth a timeseries.
Suppose my timeseries looks like this:
import pandas as pd
data = [446.6565, 454.4733, 455.663 , 423.6322, 456.2713, 440.5881, 425.3325, 485.1494, 506.0482, 526.792 , 514.2689, 494.211 ]
index= pd.date_range(start='1996', end='2008', freq='A')
oildata = pd.Series(data, index)
I want to get the smoothed version of that timeseries.
If I do something like this:
from statsmodels.tsa.api import SimpleExpSmoothing
fit1 = SimpleExpSmoothing(oildata).fit(smoothing_level=0.2,optimized=False)
fcast1 = fit1.forecast(3).rename(r'$\alpha=0.2$')
it only outputs the three forecasted values, but not the smoothed version of my original timeseries. Is there a way to get the smoothed version of my original timeseries?
I am happy to provide more details if needed.
You can get the smoothed values in the fittedvalues attribute of the model, apparently.
import pandas as pd
data = [446.6565, 454.4733, 455.663 , 423.6322, 456.2713, 440.5881, 425.3325, 485.1494, 506.0482, 526.792 , 514.2689, 494.211 ]
index= pd.date_range(start='1996', end='2008', freq='A')
oildata = pd.Series(data, index)
from statsmodels.tsa.api import SimpleExpSmoothing
fit1 = SimpleExpSmoothing(oildata).fit(smoothing_level=0.2,optimized=False)
# fcast1 = fit1.forecast(3).rename(r'$\alpha=0.2$')
import matplotlib.pyplot as plt
plt.plot(oildata)
plt.plot(fit1.fittedvalues)
plt.show()
It yields a plot of the original series with the smoothed (fitted) values overlaid.
The documentation states:
fittedvalues: ndarray
An array of the fitted values. Fitted by the Exponential Smoothing model.
Note that you can also use the fittedfcast attribute which contains all values + the first forecast, or the fcastvalues attribute which contains the forecast only.
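To make the smoothing itself concrete, here is a minimal sketch of simple exponential smoothing written with plain pandas, using the same oil data and alpha = 0.2. It is not the statsmodels implementation: as far as I understand, fittedvalues holds one-step-ahead predictions and depends on how the initial level is chosen, so the numbers may be shifted relative to this. The RangeIndex of years is a simplification of the question's date index.

```python
import numpy as np
import pandas as pd

data = [446.6565, 454.4733, 455.663, 423.6322, 456.2713, 440.5881,
        425.3325, 485.1494, 506.0482, 526.792, 514.2689, 494.211]
oildata = pd.Series(data, index=pd.RangeIndex(1996, 2008, name="year"))

alpha = 0.2

# Exponentially weighted mean with the recursive (adjust=False) definition:
# s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t-1]
smoothed = oildata.ewm(alpha=alpha, adjust=False).mean()

# The same recursion written out by hand, to show what is being computed.
level = [data[0]]
for x in data[1:]:
    level.append(alpha * x + (1 - alpha) * level[-1])
manual = pd.Series(level, index=oildata.index)
```

Both series have one value per original observation, so either can be plotted directly on top of oildata.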
ExponentialSmoothing is not a tool for smoothing time series data; it is a time series forecasting method.
The fit() function returns an instance of the HoltWintersResults class that contains the learned coefficients. The forecast() or predict() function on the result object can be called to make a forecast.
So when you call predict, the class provides a forecast using the learned coefficients.
To smooth the time series, however, you can use the fittedvalues attribute, as @smarie points out.
However, I'd go with a more appropriate tool, such as a savgol_filter:
from scipy.signal import savgol_filter
savgol_filter(oildata, 5, 3)
array([444.87816 , 461.58666 , 444.99296 , 441.70785143,
442.40769143, 438.36852857, 441.50125714, 472.05622571,
512.20891429, 521.74822857, 517.63141429, 493.37037143])
As mentioned in the comments, the Savitzky-Golay filter fits a local polynomial of a given order (polyorder) by least squares over a sliding window of a given size (window_length), which results in a smoothing of the time series.
Here's what it would look like with the above set up:
plt.plot(oildata)
plt.plot(pd.Series(savgol_filter(oildata, 5, 3), index=oildata.index))
plt.show()

How to make the confidence interval (error bands) show on seaborn lineplot

I'm trying to create a plot of classification accuracy for three ML models, depending on the number of features used from the data (the number of features used is from 1 to 75, ranked according to a feature selection method). I did 100 iterations of calculating the accuracy output for each model and for each "# of features used". Below is what my data looks like (clsf from 0 to 2, timepoint from 1 to 75):
[image: sample of the data]
I am then calling the seaborn function as shown in documentation files.
sns.lineplot(x= "timepoint", y="acc", hue="clsf", data=ttest_df, ci= "sd", err_style = "band")
The plot comes out like this:
[image: resulting plot]
I wanted there to be confidence intervals for each point on the x-axis, and don't know why it is not working. I have 100 y values for each x value, so I don't see why it cannot calculate/show it.
You could try your data set using Seaborn's pointplot function instead. It's specifically for showing an indication of uncertainty around a scatter plot of points. By default pointplot will connect values by a line. This is fine if the categorical variable is ordinal in nature, but it can be a good idea to remove the line via linestyles = "" for nominal data. (I used join = False in my example)
I tried to recreate your notebook to give a visual, but wasn't able to get the confidence interval in my plot exactly as you describe. I hope this is helpful for you.
import seaborn as sb
sb.set(style="darkgrid")
sb.pointplot(x='timepoint', y='acc', hue='clsf',
             data=ttest_df, ci='sd', palette='magma',
             join=False)
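If the band still does not appear, one way to check what should be drawn is to compute the mean and standard deviation per group yourself and plot them with matplotlib. This is only a sketch: the synthetic ttest_df below (3 classifiers, 75 feature counts, 100 accuracy values each) is an assumption that mimics the layout described in the question, and it bypasses seaborn rather than explaining the original behaviour.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for ttest_df (assumption): long format, one row per run.
rng = np.random.default_rng(2)
rows = []
for clsf in range(3):
    for timepoint in range(1, 76):
        acc = 0.6 + 0.05 * clsf + 0.002 * timepoint + rng.normal(scale=0.03, size=100)
        rows.append(pd.DataFrame({"clsf": clsf, "timepoint": timepoint, "acc": acc}))
ttest_df = pd.concat(rows, ignore_index=True)

# Mean and standard deviation of the 100 accuracy values at each x position.
summary = (
    ttest_df.groupby(["clsf", "timepoint"])["acc"]
    .agg(["mean", "std"])
    .reset_index()
)

fig, ax = plt.subplots()
for clsf, grp in summary.groupby("clsf"):
    ax.plot(grp["timepoint"], grp["mean"], label=f"clsf {clsf}")
    # Shaded band of one standard deviation around the mean.
    ax.fill_between(grp["timepoint"], grp["mean"] - grp["std"],
                    grp["mean"] + grp["std"], alpha=0.2)
ax.set_xlabel("timepoint")
ax.set_ylabel("acc")
ax.legend()
```

If the std column in summary is zero or NaN for your real data, that would explain a missing band regardless of the plotting library.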

Detect a given pattern in time series

How can I detect this type of change in a time series in Python? (The original post linked to an image of the series.)
Thanks for your help
There are many ways to do this.
I will show one of the fastest and simplest ways, which is based on correlation.
First of all we need some data (a time series) and a template (in our case the template looks like a signum function):
import numpy as np
import matplotlib.pyplot as plt
data = np.concatenate([np.random.rand(70),np.random.rand(30)+2])
template = np.concatenate([[-1]*5,[1]*5])
Before detection I strongly recommend normalizing the data, for example like this:
data = (data - data.mean())/data.std()
Now all we need is the correlation function:
corr_res = np.correlate(data, template,mode='same')
You need to choose a threshold for the results (define that value based on your template):
th = 9
You can see the results:
plt.figure(figsize=(10,5))
plt.subplot(211)
plt.plot(data)
plt.subplot(212)
plt.plot(corr_res)
plt.plot(np.arange(len(corr_res))[corr_res > th],corr_res[corr_res > th],'ro')
plt.show()
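The steps above can be put together into one self-contained sketch without plotting. The seeded random generator and the relative threshold (90% of the peak instead of the fixed value 9) are my own assumptions, added so that the result is reproducible and does not depend on the template length.

```python
import numpy as np

# Same kind of data as above: 70 low samples followed by 30 high samples.
rng = np.random.default_rng(3)
data = np.concatenate([rng.random(70), rng.random(30) + 2])

# Normalize to zero mean and unit variance.
data = (data - data.mean()) / data.std()

# Step-shaped template: 5 samples below, 5 samples above.
template = np.concatenate([[-1] * 5, [1] * 5])

# Cross-correlate; mode="same" keeps the output aligned with the input length.
corr_res = np.correlate(data, template, mode="same")

# Location of the strongest match and all positions close to the peak.
step_index = int(np.argmax(corr_res))
threshold = 0.9 * corr_res.max()  # assumed relative threshold
candidates = np.flatnonzero(corr_res > threshold)
```

With a 10-sample template in "same" mode, the peak is expected near the first sample after the jump, so step_index should land close to 70 here.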

How to fix .predict() function in statsmodels?

I'm trying to predict the temperature at 12 UTC tomorrow at one location. To forecast, I use a basic linear regression model with the statsmodels module. My code is below:
x = ds_main
X = sm.add_constant(x)
y = ds_target_t
model = sm.OLS(y,X,missing='drop')
results = model.fit()
The summary shows that the fit is "good":
But the problem appears when I try to predict values with a new dataset that I consider to be my test set. It has the same number of columns and the same variable names, but the .predict() function returns an array of NaN, although my test set has values ...
xnew = ts_main
Xnew = sm.add_constant(xnew)
ynewpred = results.predict(Xnew)
I really don't understand where the problem is ...
UPDATE: I think I have an explanation: my Xnew dataframe contains NaN values. The statsmodels .fit() function can drop missing values (NaN), but the .predict() function cannot. Thus, it returns an array of NaN values ...
That explains the "why", but I still don't know how to fix it...
statsmodels.api.OLS by default will not accept data with NA values. So if you use it, you need to drop your NA values first.
However, if you use statsmodels.formula.api.ols, it will automatically drop the NA values to run the regression and make predictions for you.
So you can try this:
import pandas as pd
import statsmodels.formula.api as smf
# "y" and "X" in the formula must be column names in the data frame passed as data
lm = smf.ols(formula="y ~ X", data=pd.concat([y, X], axis=1)).fit()
lm.predict(Xnew)
