I am currently working on a program that takes the previous 4,000 days of data for a particular stock and predicts the next 90 days of performance.
The way I've elected to do this is with an RNN that uses LSTM layers to predict the next day's performance from the previous 90 days (when training, the previous 90 days are the x-values and the next day is the y-value). What I would like to do, however, is use the previous 90-180 days to predict all of the values for the next 90 days. I am unsure how to implement this in Keras, as all the examples I have seen predict only the next day and then, at most, loop that prediction back into the next day's 90-day x-values.
Is there any way to use just the previous 180 days to predict the next 90? Or is the LSTM restricted to predicting only the next day?
I don't have the rep to comment, but I'll say here that I've toyed with a similar task. One could use a sliding-window approach over 90 days (I used 30, since 90 is pushing LSTM limits) and then predict the price appreciation for the next month, so the prediction is a single value. #Digital-Thinking is generally right though: you shouldn't expect great performance.
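To make the direct multi-step approach from the question concrete: in Keras, the network can output all 90 future values at once by giving the final Dense layer 90 units, so no day-by-day feedback loop is needed at inference time. A minimal sketch (the layer sizes and toy data below are illustrative assumptions, not from the thread):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

LOOKBACK, HORIZON, N_FEATURES = 180, 90, 1

# Direct multi-step model: the last layer emits all 90 future values at once.
model = keras.Sequential([
    keras.Input(shape=(LOOKBACK, N_FEATURES)),
    layers.LSTM(64),        # return_sequences=False -> one summary vector
    layers.Dense(HORIZON),  # 90 outputs, one per future day
])
model.compile(optimizer="adam", loss="mse")

# Toy data: X has shape (samples, 180, 1), y has shape (samples, 90).
X = np.random.rand(32, LOOKBACK, N_FEATURES)
y = np.random.rand(32, HORIZON)
model.fit(X, y, epochs=1, verbose=0)

pred = model.predict(X[:1], verbose=0)
print(pred.shape)  # (1, 90)
```

An alternative seq2seq-style formulation uses a `RepeatVector(90)` followed by a second LSTM with `return_sequences=True` and a `TimeDistributed(Dense(1))` head to emit the 90-step sequence; either way the model predicts the whole horizon in one shot.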
Related
I have 30 years of daily temperature data and a problem determining the seasonal period: I can't do it with lag = 365 days, as my Python implementation breaks.
So can I use 90 days (1 season)? I don't know whether that makes sense or not.
A seasonality of 365 does not really make sense for three reasons:
Such a long seasonality often leads to long runtimes, or, as in your case, the process breaks.
When you use exactly 365 days you are ignoring leap years, so your model distorts over longer periods (30 years, for example, already results in a distortion of about 7 days).
The most important reason is the correlation itself. A seasonality of 365 would mean that the weather today is somehow correlated with the weather a year ago, which is not really the case. Of course, both days will be somewhat close to one another due to meteorological seasons, but I would not rely on this correlation. Imagine last year was relatively cold and this year is relatively warm: would you base your forecast on the days immediately before today, or on some day a year ago? The latter is not very reliable.
Your forecast has to be based on the last immediate lags before the actual day, not on the days of last year. However, to improve your model you have to factor in the meteorological seasons (spring, summer, fall, winter). There are multiple ways to do so. One, for example, is to use SARIMAX and pass the model an exogenous feature, e.g. the monthly average temperature or a moving average over the last years. That way you can teach the model that days in summer are generally hotter than in winter, while the precise prediction still uses the days immediately before.
See here for the documentation of statsmodels SARIMAX:
https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html
There are plenty of examples on how to create these models with exogenous variables on the web.
With regard to time series features in a regression ML model.
Suppose we are living in a space colony. The temperature there is accurately controlled, so we know next week's temperature in advance.
Now I want to predict next week's ice cream sales. The feature values are past sales records and temperature values.
In this case, I believe that the known temperature for next week should help raise the accuracy of the sales prediction, but I cannot work out how to use it. I would split training/validation datasets from past data with train_test_split() as always, but I do not know how to handle the known future values.
Does anybody know how?
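One common way to use a feature whose future value is known is to align each training row so that the target period's temperature sits alongside lagged sales; at prediction time you plug in the known future temperature. A hedged sketch (the toy data and the linear model are assumptions, not from the thread); note that for time series the train/validation split should be chronological rather than shuffled:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n_weeks = 120
temp = 20 + 8 * np.sin(np.arange(n_weeks) / 8) + rng.normal(0, 1, n_weeks)
sales = 50 + 3 * temp + rng.normal(0, 5, n_weeks)

# Row t predicts sales[t] from the *previous* week's sales (a lag feature)
# and the temperature of week t itself -- the "known future" value.
X = np.column_stack([sales[:-1], temp[1:]])
y = sales[1:]

# Chronological split instead of a shuffled train_test_split.
split = int(len(y) * 0.8)
model = LinearRegression().fit(X[:split], y[:split])

# Next week: last observed sales + the controlled (known) temperature.
next_week_temp = 25.0
pred = model.predict([[sales[-1], next_week_temp]])
print(pred.shape)  # (1,)
```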
I am trying to set up a neural network that, based on the returns of the last 260 days for a series of assets, predicts the likelihood of each single asset falling within the first quartile by Sharpe ratio over the following period.
The network setup is quite simple: two layers (the first with 30 nodes, the second with 20).
At first sight the results look encouraging too, but there is a problem. If we take, for example, one single asset, it might happen that running the model on it today predicts, say, 0.8, whereas running it tomorrow (so shifting the returns in X by one day) gives a completely different result.
I wouldn't want this behaviour and would prefer more stable results, considering that the only difference is that one return has changed (the oldest has dropped out and yesterday's return has entered) and the other 259 have been shifted by one day.
Can you please help me?
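One common way to stabilise window-based predictions is to average the model's output over several overlapping windows rather than trusting a single one. A minimal sketch (the `smoothed_score` helper and the toy sigmoid model are hypothetical, for illustration only):

```python
import numpy as np

def smoothed_score(model_fn, returns, window=260, k=5):
    """Average model_fn over the last k overlapping length-`window` windows.

    model_fn maps a length-`window` array of returns to a probability.
    """
    scores = [
        model_fn(returns[len(returns) - window - i : len(returns) - i])
        for i in range(k)
    ]
    return float(np.mean(scores))

# Toy model: sigmoid of the (scaled) mean return, purely illustrative.
rng = np.random.default_rng(4)
returns = rng.normal(0.0005, 0.01, 300)
toy_model = lambda w: 1.0 / (1.0 + np.exp(-w.mean() * 1000))
print(0.0 <= smoothed_score(toy_model, returns) <= 1.0)  # True
```

Averaging changes the prediction by only 1/k of a single window's contribution each day, so consecutive scores move gradually rather than jumping.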
I'm working on hierarchical time series forecasting (Python), and when I fit the model on the entire dataset I can see that the forecasts for some features are constant over time. I can't figure out where exactly the problem is or what approaches might fix this issue. Any sort of help would be great.
Thanks in Advance!!
I have faced a similar issue recently with hierarchical time series forecasting with no seasonality, where one of the 90 forecasts had no trend and no change in level over time, an exception in my set of predictions.
I have implemented Statsmodels Exponential Smoothing and I have tuned the hyperparameters roughly following Jason Brownlee's super helpful guide on How to grid search exponential smoothing.
No matter the grid search, this particular forecast remained flat.
I've added some extra parameters to the grid search, for instance to identify the optimum number of data points to train each of the 90 models in my hierarchical forecast. I've also calculated the Z-scores of the year-on-year percentage change of the predictions, to skip predictions that were less than 5% likely given a reference period, even when they had the lowest error scores (this was needed because 2020 is probably an outlier in my dataset; time will tell).
And again, it was flat.
Well, maybe I should indeed do as Rob Hyndman suggests in his article and add random noise to this prediction so that the users are happier with what they see, even though this would increase the error.
Which modelling strategy (time frame, features, technique) would you recommend to forecast 3-month sales for total customer base?
At my company, we often analyse the effect of e.g. marketing campaigns that run at the same time for the total customer base. In order to get an idea of the true incremental impact of the campaign, we - among other things - want to use predicted sales as a counterfactual for the campaign, i.e. what sales were expected to be assuming no marketing campaign.
Time frame used to train the model: I'm currently considering 2 options (static time frame and rolling window) - let me know what you think.
1. Static: use the same period last year as the dependent variable to build a specific model for this particular 3-month time frame. Data from the 12 months before are used to generate features.
2. Rolling window: use a rolling-window logic of 3 months, dynamically creating dependent time frames and features. I'm not yet sure what the benefit would be: it uses more recent data for model creation, but feels less specific because it uses any 3-month period in the year as the dependent window. Not sure what the theory says for this particular example. Experiences, thoughts?
Features - currently building features per customer from one year of pre-period data, e.g. sales in individual months; sales 90, 180, and 365 days prior; max/min/avg per customer; number of months with sales; tenure; etc. This takes quite a lot of time - any libraries/packages you would recommend for this?
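For the windowed per-customer aggregates described above, plain pandas groupby over each window is often fast enough (libraries such as tsfresh can automate richer feature extraction). A minimal sketch with a hypothetical transaction log; the column names, windows, and aggregates below are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Toy transaction log: one row per customer purchase with a date and amount.
df = pd.DataFrame({
    "customer": rng.integers(0, 50, 5000),
    "date": pd.to_datetime("2020-01-01")
            + pd.to_timedelta(rng.integers(0, 365, 5000), unit="D"),
    "sales": rng.gamma(2.0, 10.0, 5000),
})

cutoff = pd.Timestamp("2020-12-31")
feats = []
# One groupby pass per lookback window (90/180/365 days before the cutoff).
for days in (90, 180, 365):
    window = df[df["date"] > cutoff - pd.Timedelta(days=days)]
    agg = window.groupby("customer")["sales"].agg(["sum", "mean", "max"])
    agg.columns = [f"{c}_{days}d" for c in agg.columns]
    feats.append(agg)

# Outer-join on customer; customers absent from a window get 0.
features = pd.concat(feats, axis=1).fillna(0.0)
print(features.shape[1])  # 9 = 3 windows x 3 aggregates
```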
Modelling technique - currently considering GLM, XGBoost, S/ARIMA, and LSTM networks. Any experience here?
To clarify, even though I'm considering e.g. ARIMA, I do not need to predict any seasonal patterns within the 3-month window. As a first stab, a single number (the total predicted sales of the customer base for these 3 months) would be sufficient.
Any experience or comment would be highly appreciated.
Thanks,
F