Facebook Prophet Forecasting using Daily or Monthly data? - python

This is a question concerning forecasting using Facebook Prophet.
I have a 10-year dataset with daily data that are integers ranging from 0 to 100+.
On certain days, y = 0.
I am looking to produce monthly forecasts for the next 3 years. Should I:
a) Aggregate the data by months before running Prophet? (120 data points); or
b) Run Prophet using daily data (3652 data points)
Thanks.
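For reference, option (a) can be sketched with pandas alone. This is only an illustration on synthetic data: the frame name `daily`, the Poisson-distributed values, and the choice of summing (rather than averaging) within each month are all assumptions, and which aggregation is appropriate depends on what y measures.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 10-year daily series (hypothetical data).
rng = np.random.default_rng(0)
dates = pd.date_range("2010-01-01", "2019-12-31", freq="D")
daily = pd.DataFrame({"ds": dates, "y": rng.poisson(20, size=len(dates))})

# Option (a): aggregate the daily values within each calendar month.
# "MS" labels every month by its first day; sum() is an assumption,
# mean() may fit better if y is a level rather than a count.
monthly = (
    daily.set_index("ds")["y"]
    .resample("MS")
    .sum()
    .reset_index()
)
```

The resulting `monthly` frame keeps the `ds`/`y` column names that Prophet expects, so either `daily` or `monthly` could be passed to `fit` for a comparison.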

Related

High frequency time series forecasting

I have a high-frequency time series (observations separated by 3 seconds), which I'd like to analyse and eventually use to forecast short-term periods (10/20/30 min ahead) with different models. My whole dataset contains 20K observations. My goal is to draw conclusions about how well the different models can forecast the data.
I first tried to plot the whole dataset, but I couldn't identify anything:
Whole dataset
Then I plotted only the first 500 observations, and this is the result:
First 500 observations
I don't know why it looks just like white noise!
After running the ADF test on the whole dataset, I get a p-value of 0.0. This means that my dataset is stationary, right?
I decided to try the ARIMA model first, but from the ACF and PACF plots I can't identify p and q:
ACF
PACF
1- Is the dataset white noise? Is it possible to make predictions for this time series?
2- I tried to downsample the dataset (taking the mean over each 4 minutes), but it was the same thing: I couldn't identify anything, and I think this results in a loss of information, no?
3- How much data should I use to fit the ARIMA model on the training set? Does it make sense to use a short training set for a short-term forecasting period?
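One rough way to approach question 1 is to compare the sample autocorrelations with the approximate 95% band of ±1.96/√n that would be expected under white noise. The sketch below uses NumPy only and runs on synthetic series (a pure-noise series and an AR(1) series with coefficient 0.8, both made up for illustration); the helper name `sample_acf` is hypothetical, and this is a heuristic check, not a formal test.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(42)
noise = rng.normal(size=20_000)        # synthetic white noise
ar1 = np.zeros(20_000)                 # synthetic AR(1) series, coefficient 0.8
for t in range(1, len(ar1)):
    ar1[t] = 0.8 * ar1[t - 1] + noise[t]

# Approximate 95% band for the autocorrelations of white noise.
bound = 1.96 / np.sqrt(len(noise))

# Share of the first 40 lags that fall outside the band.
noise_share = np.mean(np.abs(sample_acf(noise, 40)) > bound)
ar1_share = np.mean(np.abs(sample_acf(ar1, 40)) > bound)
```

If the real series behaves like `noise` here (only a small share of lags outside the band), there is little linear structure for ARIMA to exploit; a share closer to `ar1_share` suggests there is something to model.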

How can I find the cross-correlation of time series data using the pandas library in Python?

I have annual daily timeseries data of rainfall and groundwater level and I want cross-correlation between these two variables. How can I do this in Pandas with lags of months? I want to check in which month GWL shows a high correlation with rainfall.
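A minimal sketch of one possible approach: resample to monthly values, then correlate the groundwater series with the rainfall series shifted by 0 to 12 months. The data below are synthetic, and the names `rain_m` and `gwl_m` as well as the built-in two-month delay are assumptions made purely so the example has a known answer.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
days = pd.date_range("2015-01-01", "2020-12-31", freq="D")
rain = pd.Series(rng.gamma(2.0, 5.0, size=len(days)), index=days)

# Monthly rainfall totals from the daily series.
rain_m = rain.resample("MS").sum()

# Hypothetical monthly groundwater level that follows rainfall two months later.
gwl_m = 0.1 * rain_m.shift(2) + rng.normal(0, 0.5, size=len(rain_m))

# Correlation of GWL with rainfall from `lag` months earlier.
# Series.corr drops the NaN pairs introduced by shift().
xcorr = pd.Series({lag: gwl_m.corr(rain_m.shift(lag)) for lag in range(13)})
best_lag = xcorr.idxmax()
```

With real data, `gwl_m` would come from resampling the daily groundwater series (for example with a monthly mean), and `xcorr` can be plotted to see at which lag the correlation peaks.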

How to choose initial, period, horizon and cutoffs with Facebook Prophet?

I have around 23300 hourly datapoints in my dataset and I try to forecast using Facebook Prophet.
To fine-tune the hyperparameters one can use cross validation:
from fbprophet.diagnostics import cross_validation
The whole procedure is shown here:
https://facebook.github.io/prophet/docs/diagnostics.html
Using cross_validation one needs to specify initial, period and horizon:
df_cv = cross_validation(m, initial='xxx', period='xxx', horizon = 'xxx')
I am now wondering how to configure these three values in my case. As stated, I have about 23,300 hourly datapoints. Should I take a fraction of that as the horizon, or is it not that important for the horizon to be a particular fraction of the data, so that I can take whatever value seems appropriate?
Furthermore, cutoffs can also be defined as below:
cutoffs = pd.to_datetime(['2013-02-15', '2013-08-15', '2014-02-15'])
df_cv2 = cross_validation(m, cutoffs=cutoffs, horizon='365 days')
Should these cutoffs be equally distributed as above or can we set the cutoffs individually as someone likes to set them?
initial is the first training period. It is the minimum amount of data needed to begin your training on.
horizon is the length of time you want to evaluate your forecast over. Let's say that a retail outlet is building their model so that they can predict sales over the next month. A horizon set to 30 days would make sense here, so that they are evaluating their model on the same parameter setting that they wish to use it on.
period is the amount of time between each fold. It can be either greater than the horizon or less than it, or even equal to it.
cutoffs are the dates where each horizon will begin.
You can understand these terms by looking at this image (credits: Forecasting Time Series Data with Facebook Prophet by Greg Rafferty).
Let's imagine that a retail outlet wants a model that is able to predict the next month of daily sales, and they plan on running the model at the beginning of each quarter. They have 3 years of data.
They would then set their initial training data to be 2 years. They want to predict the next month of sales, and so would set horizon to 30 days. They plan to run the model each business quarter, and so would set the period to be 90 days.
This is also shown in the image above.
Let's apply these parameters into our model:
df_cv = cross_validation(model,
                         horizon='30 days',
                         period='90 days',
                         initial='730 days')
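To see how initial, period and horizon interact, the cutoff dates for the retail example can be worked out by hand. The function below is a simplified illustration written with pandas only; the name `sketch_cutoffs` and the date range are hypothetical, and it is not Prophet's own implementation, which may differ in details.

```python
import pandas as pd

def sketch_cutoffs(ds, initial, period, horizon):
    """Walk back from the end of the data in steps of `period`,
    keeping cutoffs that leave at least `initial` of training data."""
    initial, period, horizon = (pd.Timedelta(v) for v in (initial, period, horizon))
    earliest = ds.min() + initial   # first date with enough training history
    cutoff = ds.max() - horizon     # last cutoff that still has a full horizon
    cutoffs = []
    while cutoff >= earliest:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)

# Three years of daily dates, as in the retail example (dates are made up).
ds = pd.date_range("2017-01-01", "2019-12-31", freq="D")
cutoffs = sketch_cutoffs(ds, initial="730 days", period="90 days", horizon="30 days")
```

Each entry in `cutoffs` marks the end of one training window; the 30 days after it form the evaluation window for that fold. Passing explicit `cutoffs` to cross_validation simply replaces this regular schedule with dates of your own choosing.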

fbprophet yearly seasonality volatility

I am new to using fbprophet and have a question about using the predict function.
As an example, I am using fbprophet to extrapolate Apple's revenue for the next 5 years. Below is the code using the default settings.
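import matplotlib.pyplot as plt
from fbprophet import Prophet

# data: DataFrame with the columns 'ds' (dates) and 'y' (revenue)
m = Prophet()
m.fit(data)
future = m.make_future_dataframe(periods=5*365)
forecast = m.predict(future)
m.plot(forecast)
m.plot_components(forecast)
plt.show()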
The results:
If I choose to remove the "yearly seasonality", I get a linear regression that fits much better.
My question is why do the predicted yhat results blow up so much when yearly seasonality is included. As shown, turning the option off produces a linear regression model but I'm unsure whether this model is most suitable for the data. Any suggestions would be much appreciated.
It looks like you are using monthly data and not daily data.
So instead of using "periods=5*365" you can change the freq to monthly.
Example:
future_pd = m.make_future_dataframe(periods=12 * 5,
                                    freq='MS',
                                    include_history=True)
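The difference between the two settings can be illustrated with plain pandas date ranges. This is only a sketch of the kind of future dates each setting asks for, on a made-up monthly history; it does not call Prophet and is not meant to reproduce its internals.

```python
import pandas as pd

# Hypothetical monthly history: 48 month-start dates.
history = pd.date_range("2015-01-01", periods=48, freq="MS")
last = history.max()

# 5 years ahead at month-start frequency: 60 future dates.
future_monthly = pd.date_range(last, periods=12 * 5 + 1, freq="MS")[1:]

# 5 * 365 periods at the default daily frequency: 1825 future dates.
future_daily = pd.date_range(last, periods=5 * 365 + 1, freq="D")[1:]
```

With the daily setting the model is asked to predict on days of the month it never saw during fitting, so a yearly seasonality estimated from month-start points only can swing freely in between; the monthly setting keeps the forecast dates on the same grid as the history.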

Can macd be calculated for values other than 12 and 26?

I am working on time-series classification problem using CNN. The dataset used is financial stock market data (like Yahoo Finance). I am using some technical indicators calculated using raw values high,low,volume,open,close.
One of the technical indicators is MACD (Moving Average Convergence Divergence), computed using the TA library. However, most sources state that it is calculated for n_fast = 12 and n_slow = 26 periods, with RSI (Relative Strength Index) being calculated for 14 days and n_sign = 9 (a parameter of macd_diff() in the ta library).
So, if I am calculating RSI for a 5-day period, how do we set n_fast and n_slow accordingly? Should these be n_fast = 3 and n_slow = 8? Also, what should the value of n_sign be then? I am new to the finance domain.
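For context, the usual MACD definition is the difference between a fast and a slow exponential moving average, with a signal line that is an EMA of that difference, so nothing in the formula ties it to 12 and 26. The sketch below computes it directly with pandas for arbitrary window lengths. The function name `macd_sketch`, the price series, the value n_sign = 5 and the choice of `adjust=False` are assumptions for illustration; the ta library's output may differ in details.

```python
import pandas as pd

def macd_sketch(close, n_fast=12, n_slow=26, n_sign=9):
    """MACD line, signal line and their difference for arbitrary spans."""
    ema_fast = close.ewm(span=n_fast, adjust=False).mean()
    ema_slow = close.ewm(span=n_slow, adjust=False).mean()
    macd = ema_fast - ema_slow
    signal = macd.ewm(span=n_sign, adjust=False).mean()
    return pd.DataFrame({"macd": macd, "signal": signal, "diff": macd - signal})

# Hypothetical, steadily rising closing prices.
close = pd.Series([float(p) for p in range(100, 160)])

default = macd_sketch(close)                                # 12 / 26 / 9
short = macd_sketch(close, n_fast=3, n_slow=8, n_sign=5)    # shorter, arbitrary spans
```

Shorter spans make the indicator react faster and more noisily; whether 3 and 8 suit a 5-day RSI is a modelling choice that is best checked on validation data.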
