fbprophet yearly seasonality volatility - python

I am new to using fbprophet and have a question about using the predict function.
As an example, I am using fbprophet to extrapolate Apple's revenue for the next 5 years. Below is the code using the default settings.
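from fbprophet import Prophet
import matplotlib.pyplot as plt

# `data` is a DataFrame with the columns `ds` (dates) and `y` (revenue)
m = Prophet()
m.fit(data)
future = m.make_future_dataframe(periods=5*365)  # daily frequency by default
forecast = m.predict(future)
m.plot(forecast)
m.plot_components(forecast)
plt.show()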
The results:
If I remove the "yearly seasonality", I get a linear regression that fits much better.
My question is: why do the predicted yhat values blow up so much when yearly seasonality is included? As shown, turning the option off produces a linear regression model, but I'm unsure whether this model is the most suitable for the data. Any suggestions would be much appreciated.

It looks like you are using monthly data, not daily data.
So instead of using "periods=5*365", which asks for a prediction on every day, you can set freq to monthly so that predictions are made only at month starts.
Example:
future_pd = m.make_future_dataframe(periods=12 * 5,
                                    freq='MS',
                                    include_history=True)
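To illustrate the difference, here is a minimal sketch in plain pandas (it does not call Prophet itself) of the dates each setting asks the model to predict. The last training date is a made-up assumption.

```python
import pandas as pd

# Hypothetical last observed month in the training data (an assumption).
last_date = pd.Timestamp("2019-12-01")

# periods=5*365 with the default daily frequency: one future row per day.
daily = pd.date_range(start=last_date, periods=5 * 365 + 1, freq="D")[1:]

# periods=12*5 with freq='MS': one future row per month start,
# which matches the spacing of monthly training data.
monthly = pd.date_range(start=last_date, periods=12 * 5 + 1, freq="MS")[1:]

print(len(daily), daily[0].date(), daily[-1].date())
print(len(monthly), monthly[0].date(), monthly[-1].date())
```

With the daily grid the yearly seasonality is evaluated on days between the monthly observations, where nothing in the data constrains it, so large swings there are not surprising.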

Related

How to get performance_metrics() on weekly frequency in facebook-prophet?

I am working with prophet library for educational purpose on a classic dataset:
the air passenger dataset available on Kaggle.
The data are monthly, and according to that discussion Prophet's cross-validation does not support monthly as a standard frequency.
For the time series cross-validation I used Prophet's cross_validation() function, passing the arguments in weeks.
But when I call performance_metrics(), it returns the horizon column at a daily frequency.
How can I get it at a weekly frequency?
I also tried to read the documentation and the function description:
Metrics are calculated over a rolling window of cross validation
predictions, after sorting by horizon. Averaging is first done within each
value of horizon, and then across horizons as needed to reach the window
size. The size of that window (number of simulated forecast points) is
determined by the rolling_window argument, which specifies a proportion of
simulated forecast points to include in each window. rolling_window=0 will
compute it separately for each horizon. The default of rolling_window=0.1
will use 10% of the rows in df in each window. rolling_window=1 will
compute the metric across all simulated forecast points. The results are
set to the right edge of the window.
Here is how I modelled the dataset:
model = Prophet()
model.fit(df)
future_dates = model.make_future_dataframe(periods=36, freq='MS')
df_cv = cross_validation(model,
                         initial='300 W',
                         period='5 W',
                         horizon='52 W')
df_cv.head()
And then I call performance_metrics:
df_p = performance_metrics(df_cv)
df_p.head()
This is the output that I get, with a daily frequency.
I am probably missing something or made a mistake in the code.
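One possible approach, assuming the horizon column is a pandas timedelta column (which is how it appears in the output): divide it by a one-week Timedelta. The DataFrame below is a hypothetical stand-in for performance_metrics(df_cv) with made-up values, not real output.

```python
import pandas as pd

# Hypothetical stand-in for the first rows of performance_metrics(df_cv);
# the numbers are invented for illustration only.
df_p = pd.DataFrame({
    "horizon": pd.to_timedelta([35, 42, 49], unit="D"),
    "mae": [10.0, 12.0, 15.0],
})

# Dividing a timedelta column by a one-week Timedelta yields a float
# number of weeks.
df_p["horizon_weeks"] = df_p["horizon"] / pd.Timedelta(weeks=1)
print(df_p)
```

With monthly data the horizons will generally not be whole numbers of weeks, so the result is fractional.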

Is there any way to predict survival probability for censored objects after historical dates (prediction in future)?

I am trying to understand the possibilities and limitations of survival analysis, in particular the lifelines Python package.
I fitted the Cox proportional hazards model to the rossi data and got a survival function over the historical period, which is clear.
Here is my code:
import pandas as pd
from lifelines.datasets import load_rossi
from lifelines import CoxPHFitter
rossi = load_rossi()
cph1 = CoxPHFitter()
cph1.fit(rossi, duration_col='week', event_col='arrest')
cph1.plot_covariate_groups('race', [0,1])
My questions are:
1. Can we somehow predict future survival probabilities of censored objects using the lifelines package or any other Python library for survival analysis? I mean making the survival function go beyond the historical period (e.g. the probability of survival after 60 weeks).
2. Can we use the fitted model to compute the survival function for new samples of data, given their feature values?
Regarding my first question, I tried this (from the lifelines docs):
censored_subjects = rossi.loc[~rossi['arrest'].astype(bool)]
censored_subjects_last_obs = censored_subjects['week']
# predict new survival function
cph1.predict_survival_function(censored_subjects,
                               conditional_after=censored_subjects_last_obs)
But it returns the following 49x318 DataFrame:
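One way to read that output: as far as I understand the lifelines docs, conditional_after conditions each subject's curve on having survived up to its last observation s, i.e. S(s + t) / S(s), with the time axis reset to start at 0. Each of the 318 columns would then be one censored subject. Below is a minimal NumPy sketch of that rescaling. It uses a made-up survival curve rather than the fitted Cox model, and conditional_survival is a hypothetical helper name.

```python
import numpy as np

# Hypothetical survival curve S(t) on a weekly grid (made-up values,
# not the fitted Cox model from the question).
times = np.arange(0, 11)
surv = np.exp(-0.05 * times)

def conditional_survival(surv, s):
    """Return S(s + t) / S(s) for t = 0, 1, ...: survival given survival up to index s."""
    return surv[s:] / surv[s]

# Subject already observed for 4 weeks without the event.
cond = conditional_survival(surv, s=4)
print(cond)
```

The conditional curve starts at 1 and can only be evaluated where the underlying S(t) is known, which is why going past the last observed time needs extra assumptions.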

Why does seasonal_decompose() show clear seasonality in my time series, while adfuller() reports that the series is stationary?

To the naked eye my time series looks seasonal, but when I run adfuller() on it, the p-value indicates that the series is stationary.
I have also applied seasonal_decompose() to it, and the results were pretty much what I expected.
tb3['percent'].plot(figsize=(18,8))
What the series looks like
One thing to note is that my data is collected every minute.
tb3.index.freq = 'T'
from statsmodels.tsa.seasonal import seasonal_decompose
result = seasonal_decompose(tb3['percent'].values,freq=24*60, model='additive')
result.plot();
The result of the ETS decomposition is shown in the figure below:
ETS decompose
We can see a clear seasonality, which is what I expected.
But when I use adfuller():
from statsmodels.tsa.stattools import adfuller
result = adfuller(tb3['percent'], autolag='AIC')
the p-value is less than 0.05, which means the series is stationary.
Can anyone tell me why that happens? How can I fix it?
I want to use a SARIMA model to predict future values, because the ARIMA model always predicts a constant future value.
An Augmented Dickey-Fuller test examines whether the coefficient c in the regression
y_t - y_{t-1} = <deterministic terms> + c y_{t-1} + <lagged differences>
is equal to 0, which corresponds to a unit root (a coefficient of 1 on y_{t-1} in the levels form). It does not usually have power against seasonal deterministic terms, so it is not surprising that adfuller does not flag your seasonal series as non-stationary.
You can use a stationary SARIMA model, for example
SARIMAX(y, order=(p,0,q), seasonal_order=(ps, 0, qs, 24*60))
where you set the AR, MA, seasonal AR, and seasonal MA orders as needed.
This model will be quite slow and memory intensive, since with minutely data a daily cycle needs a seasonal lag of 1440.
The next version of statsmodels, which has been released as statsmodels 0.12.0rc0, adds initial support for deterministic processes in time series models which may simplify modeling this type of series. In particular, it would be tempting to use a low order Fourier deterministic sequence. Below is an example notebook.
https://www.statsmodels.org/devel/examples/notebooks/generated/deterministics.html
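To make the Fourier idea concrete, here is a rough sketch in plain NumPy rather than the statsmodels deterministic API from the notebook. The series is synthetic, and fourier_terms is a hypothetical helper name. It regresses the series on a constant plus two sine/cosine pairs with a 1440-minute period.

```python
import numpy as np

def fourier_terms(n_obs, period, order):
    """Return an (n_obs, 2 * order) matrix of sin/cos pairs for the given period."""
    t = np.arange(n_obs)
    cols = []
    for k in range(1, order + 1):
        angle = 2.0 * np.pi * k * t / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

# Synthetic minutely series with a daily cycle (made-up data, 3 days long).
rng = np.random.default_rng(0)
period = 24 * 60
n_obs = 3 * period
t = np.arange(n_obs)
y = 50 + 10 * np.sin(2 * np.pi * t / period) + rng.normal(scale=1.0, size=n_obs)

# Regress y on a constant plus two Fourier pairs via least squares.
X = np.column_stack([np.ones(n_obs), fourier_terms(n_obs, period, order=2)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
seasonal_fit = X @ coef
print(coef.round(2))
```

Only five coefficients describe the daily cycle here, compared with a 1440-lag seasonal ARMA term, and the residual y - seasonal_fit could then be modelled with a low-order ARMA.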

Ecommerce item sales forecasting with pandas and statsmodels

I want to forecast item sales (number of sales for each product) with pandas and statsmodels for an ecommerce business. Because item sales is a count dependent variable, I'm assuming a Poisson model would work best.
In an ideal world the model will be used to decide which products to use in ads (increasing product views) and to decide on price points (changing prices) for the best performance/profitability.
So far so good, however when I try:
...
import statsmodels.formula.api as smf
...
result = smf.poisson(formula="Item_Sales ~ Product_Detail_Views + Variant_Price + C(Product_Type)", data=df).fit()
I get:
RuntimeWarning: invalid value encountered in multiply
return -np.dot(L*X.T, X)
RuntimeWarning: invalid value encountered in greater_equal
return mu >= 0
RuntimeWarning: invalid value encountered in greater
oldparams) > tol))
And a table full of NaNs
If I use OLS with the same dataset:
result = smf.ols(formula="Item_Sales ~ Product_Detail_Views + Variant_Price + C(Product_Type)", data=df).fit()
I get an R-squared of 0.809, so the data seems good. The model isn't as usable, though, as I get negative predictions, which are obviously not possible (you cannot have negative sales of items).
How can I make the Poisson model work?
Looks like a data problem. Since no sample data is shown, I cannot be sure. You can try using GLM with family Poisson or GEE with family Poisson.
Example:
import statsmodels.api as sm
smf.glm('sedimentation ~ C(control_grid)', data=df, family=sm.families.Poisson()).fit()

Prediction intervals for ARMA.predict

The Summary of an ARMA prediction for time series (print arma_mod.summary()) shows some numbers about the confidence interval. Is it possible to use these numbers as prediction intervals in the plot which shows predicted values?
ax = indexed_df.ix[:].plot(figsize=(12,8))
ax = predict_price.plot(ax=ax, style='rx', label='Dynamic Prediction');
ax.legend();
I guess the code:
from statsmodels.sandbox.regression.predstd import wls_prediction_std
prstd, iv_l, iv_u = wls_prediction_std(results)
found here: Confidence intervals for model prediction
...does not apply here, as it is made for OLS rather than for ARMA forecasting. I also checked GitHub but did not find anything new that relates to time series prediction.
(Making forecasts requires forecast intervals, I guess, especially for an out-of-sample forecast.)
Help appreciated.
I suppose, for out-of-sample ARMA prediction, you can use ARMA.forecast from statsmodels.tsa
It returns three arrays: predicted values, standard error and confidence interval for the prediction.
Example with ARMA(1,1), time series y and prediction 1 step ahead:
import statsmodels.api as sm
arma_res = sm.tsa.ARMA(y, order=(1,1)).fit()
preds, stderr, ci = arma_res.forecast(1)
