Best practice for timeseries prediction with help of indicators


I would like to predict values (e.g. transport volumes). As input data I have the volumes from the last two years. I have already done some time series prediction on those values, basically following the instructions in Basics of Time Series Prediction and Techniques for Time Series Prediction.
I would now like to go a step further and include some indicators (e.g. economic indicators) in the prediction to see whether this increases the accuracy of the predictions.
What is the right approach to do so? Looking around, I found this post, which describes basically the same use case. Unfortunately it got no responses.
One approach might be to do a "simple" prediction based on a model with the current volume and indicators as features and the future volume as label. But I then would lose the timeseries, the connection between the single data points so to say.
Do you have experience with such predictions? What did work in your case? Please point me in the right direction!

One approach might be to do a "simple" prediction based on a model
with the current volume and indicators as features and the future
volume as label. But I then would lose the timeseries, the connection
between the single data points so to say.
In this case a common solution is to include N 'lagging' values (i.e. volumes for N previous periods) as features for every observation, in addition to some indicator value features. This allows using pretty much any regression model for time series forecasting. Just make sure there's no data leakage of the 'future' values when calculating your indicators.
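The lag-feature idea above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper name `make_lagged_frame`, the synthetic `volume` and `indicator` series, and the plain least-squares fit are all assumptions made for the example. The chronological split and the use of the indicator value known at time t are there to avoid the leakage mentioned above.

```python
import numpy as np
import pandas as pd


def make_lagged_frame(volume, indicator, n_lags=3, horizon=1):
    """Build a supervised-learning table from a series and one indicator.

    Row t holds volume[t], volume[t-1], ..., volume[t-n_lags], the indicator
    value known at time t, and the target volume[t+horizon].
    """
    df = pd.DataFrame({"volume": volume, "indicator": indicator})
    features = pd.DataFrame(index=df.index)
    for lag in range(n_lags + 1):
        # shift(lag) moves past values forward, so row t only sees t-lag
        features[f"volume_lag_{lag}"] = df["volume"].shift(lag)
    features["indicator"] = df["indicator"]
    # shift(-horizon) pulls the future value back to row t as the label
    features["target"] = df["volume"].shift(-horizon)
    # edge rows lack a full history or a future value; drop them
    return features.dropna()


# Synthetic stand-in data (hypothetical): 30 periods of volume and indicator
t = np.arange(30)
volume = pd.Series(100 + 2.0 * t + 5.0 * np.sin(t / 3.0))
indicator = pd.Series(0.5 * t)

table = make_lagged_frame(volume, indicator, n_lags=2, horizon=1)
X = table.drop(columns="target").to_numpy()
y = table["target"].to_numpy()

# Chronological split: train on the past, evaluate on the most recent rows
split = len(table) - 5
X_train, y_train = X[:split], y[:split]
X_test, y_test = X[split:], y[split:]

# Any regressor could be used here; least squares keeps the sketch small
A_train = np.column_stack([X_train, np.ones(len(X_train))])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
pred = np.column_stack([X_test, np.ones(len(X_test))]) @ coef
```

Because each row carries its own recent history, the temporal connection between data points is kept even though the model itself is an ordinary regressor.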

Related

Forecasts are constant all the time in time series data

I'm working on hierarchical time series forecasting (Python). When I fit the model on the entire data set, the forecasts for some features are constant over the whole horizon. I cannot work out where exactly the problem is or what the possible approaches to fix it are. Any sort of help would be great.
Thanks in Advance!!

I have faced a similar issue recently with hierarchical time series forecasting with no seasonality, where one of the 90 forecasts had no trend and no change in level over time, an exception in my set of predictions.
I have implemented Statsmodels Exponential Smoothing and I have tuned the hyperparameters roughly following Jason Brownlee's super helpful guide on How to grid search exponential smoothing.
No matter the grid search, this particular forecast remained flat.
I then added some extra parameters to the grid search, for instance to identify the optimal number of data points for training each of the 90 models in my hierarchical forecast. I also calculated the Z-scores of the year-on-year percentage change of the predictions, in order to skip predictions that were less than 5% likely given a reference period, even when they had the lowest error scores (this was needed because 2020 is probably an outlier in my dataset; time will tell).
And again, it was flat.
Well, maybe I should indeed do as Rob Hyndman suggests in his article and add random noise to this prediction so that the users are happier with what they see, even though this would increase the error.
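One possible reason for a flat forecast, which the post does not confirm for this particular series, is that the selected configuration has neither trend nor seasonality. In that case exponential smoothing reduces to a level-only model, and every future step is forecast as the last smoothed level. The sketch below is a hand-written version of simple exponential smoothing with made-up data, not the Statsmodels implementation.

```python
def simple_exp_smoothing_forecast(series, alpha, horizon):
    """Simple exponential smoothing: level-only model, no trend or season."""
    level = series[0]
    for value in series[1:]:
        # new level is a weighted average of the new value and the old level
        level = alpha * value + (1 - alpha) * level
    # every future step is forecast as the last smoothed level
    return [level] * horizon


# Hypothetical history; the values are illustrative only
history = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0]
forecast = simple_exp_smoothing_forecast(history, alpha=0.3, horizon=4)
```

Whatever value of `alpha` a grid search picks, the forecast of this model is a horizontal line, so a flat result can be the correct output of the chosen model rather than a bug.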

Pattern prediction in time series

Has anyone tried to predict a specific pattern in time series data?
Example: at a specific time, there is a huge upward spike in certain variables in a time series.
How would I build a model to predict that spike the next time it occurs?
Please respond if you are working in this area.
I tried converting that particular series of data into a NumPy array and feeding it into the model, but it is not accepted.
Here is what the data looks like:
This data was generated in a controlled manner so that the spikes occur close to one another. In the actual case they could be random, and our main objective is to catch this pattern and count its occurrences.

Das, you could try implementing LSTM based Neural Network Models.
See:
https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
It is still preferable that the data contains a trend. If the upward spike happens at around the same point in a recurring time interval, you are more likely to get a good prediction.
In the image you shared, there seems to be a trend in the data. Hence LSTM models should be able to extract the pattern fairly efficiently and output a prediction.
Statistical modelling of the data can also give good results.
See: https://orangematter.solarwinds.com/2019/12/15/holt-winters-forecasting-simplified/
Das, if outputting the total number of peaks is the sole requirement, then I think heavy neural network models are a bit of an overkill. Neural network models can do the job well, but they require a lot of training data and fine-tuning of the weights and biases to give a really good result.
How about implementing a threshold-based technique, where you increment a counter every time the data value crosses a preset threshold? With such an approach you should make sure to group very close peaks together so that they are counted only once. To do that, you could set a threshold on the x axis too.
For instance, with respect to the given plot, let the y-threshold be 4. You will get a count of 5 if you consider the y-axis threshold (y value 4) alone, because around x = 15:48.2 there are two peaks that cross y value 4. If you set a threshold on the x axis too, these nearby peaks are grouped together within the preset limit and the final count is 4 (which is the requirement).
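The two-threshold counting described above could look roughly like this. The function name `count_spikes`, the parameter `min_gap` (the x-axis threshold) and the toy data are assumptions made for illustration; the real data from the plot is not available here.

```python
def count_spikes(times, values, y_threshold, min_gap):
    """Count samples above y_threshold, merging those within min_gap."""
    count = 0
    last_above = None  # time of the most recent sample above the threshold
    for t, v in zip(times, values):
        if v > y_threshold:
            # start a new spike only if the previous one is far enough away
            if last_above is None or t - last_above > min_gap:
                count += 1
            last_above = t
    return count


# Toy series: spikes at t = 2, 8, 10, 15 and 21; those at 8 and 10 are close
times = list(range(24))
values = [0.0] * 24
for spike_time in (2, 8, 10, 15, 21):
    values[spike_time] = 5.0

count_y_only = count_spikes(times, values, y_threshold=4, min_gap=0)
count_grouped = count_spikes(times, values, y_threshold=4, min_gap=3)
```

With the y threshold alone the toy series gives 5, and with the additional x threshold the two close spikes are merged and the count drops to 4. `min_gap` should be at least the sampling interval, otherwise consecutive samples of one wide spike are counted separately.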

Deal with excessive number of zeros

ipdb> np.count_nonzero(test==0) / len(ytrue) * 100
76.44815766923736
I have a data file containing 24,000 prices that I use for a time series forecasting problem. Instead of predicting the price directly, I tried to predict the log-return, i.e. log(P_t/P_{t-1}). I applied the log-return to the prices as well as to all the features. The predictions are not bad, but they tend towards zero. As you can see above, ~76% of the data are zeros.
Now the idea is probably to "look for a zero-inflated estimator: first predict whether it's gonna be a zero; if not, predict the value".
In detail, what is the best way to deal with an excessive number of zeros? How can a zero-inflated estimator help me with that? Be aware that I am not a probabilist by background.
P.S. I am trying to predict the log-return at a resolution of seconds for a high-frequency trading study. Be aware that it is a regression problem (not a classification problem).
Update
That picture is probably the best prediction I have of the log-return, i.e. log(P_t/P_{t-1}). Although it is not bad, the remaining predictions tend towards zero. As you can see in the question above, there are too many zeros. I probably have the same problem in the features, since I take the log-return of the features as well, i.e. if F is a particular feature, then I apply log(F_t/F_{t-1}).
Here is one day of data, log_return_with_features.pkl, with shape (23369, 30, 161). Sorry, but I cannot say what the features are. Since I apply log(F_t/F_{t-1}) to all the features and to the target (i.e. the price), note that I added 1e-8 to all the features before applying the log-return operation to avoid division by 0.

Ok, so judging from your plot: it's the nature of the data, the price doesn't really change that often.
Try subsampling your original data a bit (perhaps by a factor of 5, just look at the data), so that you generally see a price movement with every time-tick. This should make any modeling much MUCH easier.
For the subsampling: I suggest you do simple regular downsampling in time domain. So if you have price data with a second resolution (i.e. one price tag every second), then simply take every fifth datapoint. Then proceed as you usually do, specifically, compute the log-increase in the price from this subsampled data. Remember that whatever you do, it must be reproducible during the test time.
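A minimal sketch of the regular downsampling described above, using a made-up price series in which the price changes only every fifth tick. The helper names and the factor of 5 are assumptions; the right factor has to be read off the real data.

```python
import numpy as np


def log_returns(prices, step=1):
    """Keep every step-th price, then take log-returns of what is left."""
    sampled = np.asarray(prices, dtype=float)[::step]
    return np.diff(np.log(sampled))


def zero_fraction(returns):
    """Share of returns that are exactly zero."""
    return float(np.mean(returns == 0.0))


# Hypothetical one-second prices: each price is repeated for five ticks
prices = np.repeat([100.0, 101.0, 100.5, 102.0, 101.5, 103.0], 5)

raw = log_returns(prices, step=1)   # mostly zeros
sub = log_returns(prices, step=5)   # one return per actual price move
```

The same `step` has to be applied to the features and at test time, so that training and prediction see data at the same resolution.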
If that is not an option for you for whatever reasons, have a look at something that can handle multiple time scales, e.g. WaveNet or Clockwork RNN.

Recurrent Neural Network for anomaly detection

I am implementing an anomaly detection system that will be used on different time series (one observation every 15 min for a total of 5 months). All these time series have a common pattern: high levels during working hours and low levels otherwise.
The idea presented in many papers is the following: build a model to predict future values and calculate an anomaly score based on the residuals.
What I have so far
I use an LSTM to predict the next time step given the previous 96 (1 day of observations) and then I calculate the anomaly score as the likelihood that the residuals come from one of the two normal distributions fitted on the residuals obtained with the validation test. I am using two different distributions, one for working hours and one for non working hours.
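For readers who want to see the scoring step spelled out, here is a rough sketch under stated assumptions: the 08:00 to 18:00 working-hour rule, the synthetic validation residuals and the use of the negative log-likelihood as the score are all illustrative choices, not details taken from the actual system.

```python
import numpy as np

SLOTS_PER_DAY = 96  # one observation every 15 minutes


def is_working_slot(step_index):
    """Hypothetical rule: 08:00-18:00 counts as working hours."""
    slot = np.asarray(step_index) % SLOTS_PER_DAY
    return (slot >= 32) & (slot < 72)


def fit_gaussian(residuals):
    """Return the mean and standard deviation of a residual sample."""
    return float(np.mean(residuals)), float(np.std(residuals))


def anomaly_score(residual, working, params):
    """Negative log-likelihood of a residual under its regime's Gaussian."""
    mean, std = params["work"] if working else params["off"]
    return 0.5 * np.log(2 * np.pi * std**2) + (residual - mean) ** 2 / (2 * std**2)


# Synthetic stand-in for validation residuals (observed minus predicted):
# noisier during working hours than outside them
rng = np.random.default_rng(0)
steps = np.arange(SLOTS_PER_DAY * 14)
working = is_working_slot(steps)
val_residuals = rng.normal(0.0, np.where(working, 2.0, 0.5))

params = {
    "work": fit_gaussian(val_residuals[working]),
    "off": fit_gaussian(val_residuals[~working]),
}

# The same residual is far more surprising outside working hours
score_work = anomaly_score(3.0, True, params)
score_off = anomaly_score(3.0, False, params)
```

A higher score means a less likely residual, so a threshold on the score plays the role of the likelihood cut-off.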
The model detects point anomalies, such as sudden falls and peaks, very well, but it fails during holidays, for example.
If a holiday falls during the week, I expect my model to detect more anomalies, because it is an unusual daily pattern with respect to a normal working day.
But the predictions simply follow the previous observations.
My solution
Use a second and more lightweight model (based on time series decomposition) which is fed with daily aggregations instead of 15min aggregations to detect daily anomalies.
The question
This combination of two models lets me detect both kinds of anomaly and it works very well, but my idea was to use only one model, because I expected the LSTM to be able to "learn" the weekly pattern as well. Instead it strictly follows the previous time steps without taking into account that it is a working hour and the level should be much higher.
I tried adding exogenous variables to the input (hour of day, day of week) and increasing the number of layers and cells, but the situation is not much better.
Any consideration is appreciated.
Thank you

A note on your current approach
Training with MSE is equivalent to optimizing the likelihood of your data under a Gaussian with fixed variance and mean given by your model. So you are already training an autoencoder, though you do not formulate it so.
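The equivalence can be written out in one step. The symbols are assumptions introduced here: y_t is the observed value, x_t the input window, \mu_\theta(x_t) the model output, \sigma^2 the fixed variance and T the number of training samples.

```latex
\[
-\log p(y_t \mid x_t)
  = \frac{\bigl(y_t - \mu_\theta(x_t)\bigr)^2}{2\sigma^2}
  + \frac{1}{2}\log\bigl(2\pi\sigma^2\bigr)
\]
\[
\sum_{t=1}^{T} -\log p(y_t \mid x_t)
  = \frac{T}{2\sigma^2}\,\mathrm{MSE}(\theta)
  + \frac{T}{2}\log\bigl(2\pi\sigma^2\bigr),
\qquad
\mathrm{MSE}(\theta) = \frac{1}{T}\sum_{t=1}^{T}\bigl(y_t - \mu_\theta(x_t)\bigr)^2
\]
```

Since \sigma^2 is fixed, the second term does not depend on \theta, so minimizing the MSE and maximizing the Gaussian likelihood select the same parameters.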
About the things you do
You don't give the LSTM a chance
Since you provide data from last 24 hours only, the LSTM cannot possibly learn a weekly pattern.
It could at best learn that the value should be similar as it was 24 hours before (though it is very unlikely, see next point) -- and then you break it with Fri-Sat and Sun-Mon data. From the LSTM's point of view, your holiday 'anomaly' looks pretty much the same as the weekend data you were providing during the training.
So you would first need to provide longer contexts during learning (I assume that you carry the hidden state on during test time).
Even if you gave it a chance, it wouldn't care
Assuming that your data really follows a simple pattern -- high value during and only during working hours, plus some variations of smaller scale -- the LSTM doesn't need any long-term knowledge for most of the datapoints. Putting in all my human imagination, I can only envision the LSTM benefiting from long-term dependencies at the beginning of the working hours, so just for one or two samples out of the 96.
So even if the loss value at the points would like to backpropagate through > 7 * 96 timesteps to learn about your weekly pattern, there are 7*95 other loss terms that are likely to prevent the LSTM from deviating from the current local optimum.
Thus it may help to weight the samples at the beginning of working hours more, so that the respective loss can actually influence representations from far history.
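The weighting idea might be sketched as a weighted mean squared error. The slot indices, the weight of 20 and the function name are arbitrary illustrative choices; in practice the weights would be passed to whatever per-sample weighting mechanism the training framework offers.

```python
import numpy as np


def weighted_mse(y_true, y_pred, weights):
    """Mean squared error in which each time step has its own weight."""
    y_true, y_pred, weights = map(np.asarray, (y_true, y_pred, weights))
    return float(np.sum(weights * (y_true - y_pred) ** 2) / np.sum(weights))


# One day of 15-minute slots; hypothetical working day starting at slot 32
weights = np.ones(96)
weights[32:34] = 20.0  # emphasise the first two samples of the working day

# A prediction that is wrong only at the start of the working day
y_true = np.zeros(96)
y_pred = np.zeros(96)
y_pred[32] = 1.0

uniform_loss = weighted_mse(y_true, y_pred, np.ones(96))
emphasised_loss = weighted_mse(y_true, y_pred, weights)
```

With uniform weights the single error is diluted by the 95 easy samples, while the emphasised loss makes it count for much more of the gradient signal.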
Your solution is a good thing
It is difficult to model sequences at multiple scales in a single model. Even you, as a human, need to "zoom out" to judge longer trends -- that's why all the Wall Street people have Month/Week/Day/Hour/... charts to watch their shares' prices on. Such multiscale modeling is especially difficult for an RNN, because it needs to process all the information, always, with the same weights.
If you really want one model to learn it all, you may have more success with deep feedforward architectures employing some sort of time-convolution, e.g. TDNNs, Residual Memory Networks (Disclaimer: I'm one of the authors.), or the recent one-architecture-to-rule-them-all, WaveNet. As these have skip connections over longer temporal context and apply different transformations at different levels, they have better chances of discovering and exploiting such an unexpected long-term dependency.
There are implementations of WaveNet in Keras lying around on GitHub, e.g. 1 or 2. I did not play with them (I actually moved away from Keras some time ago), but especially the second one seems really easy, with the AtrousConvolution1D.
If you want to stay with RNNs, Clockwork RNN is probably the model to fit your needs.
About things you may want to consider for your problem
So are there two data distributions?
This one is a bit philosophical.
Your current approach shows that you have a very strong belief that there are two different setups: workhours and the rest. You're even OK with changing part of your model (the Gaussian) according to it.
So perhaps your data actually comes from two distributions and you should therefore train two models and switch between them as appropriate?
Given what you have told us, I would actually go for this one (to have a theoretically sound system). You cannot expect your LSTM to learn that there will be low values on Dec 25. Or that there is a deadline and this weekend consists purely of working hours.
Or are there two definitions of anomaly?
One philosophical point more. Perhaps you personally consider two different types of anomaly:
A weird temporal trajectory, unexpected peaks, oscillations, whatever is unusual in your domain. Your LSTM supposedly handles these already.
And then there is a different notion of anomaly: a value outside certain bounds in certain time intervals. Perhaps a simple linear regression / small MLP from time to value would do here?
Let the NN do all the work
Currently, you effectively model the distribution of your quantity in two steps: First, the LSTM provides the mean. Second, you supply the variance.
You might instead let your NN (together with additional 2 affine transformations) directly provide you with a complete Gaussian by producing its mean and variance; much like in Variational AutoEncoders (https://arxiv.org/pdf/1312.6114.pdf, appendix C.2). Then, you need to optimize directly the likelihood of your following sample under the NN-distribution, rather than just MSE between the sample and the NN output.
This will allow your model to tell you when it is very strict about the following value and when "any" sample will be OK.
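As a sketch of the loss this implies, assume the network has two output heads, one for the mean and one for the log-variance (predicting the log keeps the variance positive). The function below is plain NumPy for illustration, with the numbers in the example made up; in a real model the same expression would be written in the training framework.

```python
import numpy as np


def gaussian_nll(y, mean, log_var):
    """Per-sample negative log-likelihood of y under N(mean, exp(log_var))."""
    y, mean, log_var = map(np.asarray, (y, mean, log_var))
    return 0.5 * (np.log(2 * np.pi) + log_var + (y - mean) ** 2 / np.exp(log_var))


# The same prediction error of 1.0 under two levels of claimed certainty
confident_miss = gaussian_nll(1.0, 0.0, np.log(0.01))  # model said "strict"
unsure_miss = gaussian_nll(1.0, 0.0, np.log(4.0))      # model said "anything goes"
```

The error is penalised heavily when the model claimed a small variance and only mildly when it claimed a large one, which is what lets the per-step likelihood serve directly as an anomaly score.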
Note that you can take this approach further and have your NN produce "any" suitable distribution. E.g. if your data live in, or can be sensibly transformed to, a limited domain, you may try to produce a Categorical distribution over that space by putting a Softmax on the output, much like WaveNet does (https://arxiv.org/pdf/1609.03499.pdf, Section 2.2).

Regression with Date variable (python)

I have a time series (daily) dataset consisting of 1 label (integer) and 15 features over 5 years. I have no idea about the meaning of features, but I have to predict the labels based on those features.
To do so, first, I used the autocorrelation_plot from pandas.tools.plotting to figure out if I have any seasonality in my label (y) or not. Please see the figure below:
Then I used seasonal_decompose to find seasonal, trend and residual of my label (y) by sweeping the Freq parameter:
1) Could you please let me know which Freq is OK, and why?
2) What would be the next step? Do I need to remove both the trend and seasonal terms from the data and then try to model and predict the residual by regression (e.g. SVR, linear, etc.)? Or do I need to predict the whole data (without removing seasonal and trend) by regression? I tried to predict the whole data (without removing seasonal and trend) with several regression techniques, but the results are very bad. Finally, how can I predict the seasonal component at the end? Is ARIMA OK? What about the trend?
3) Am I on the right track (extracting seasonal, etc.), or should I consider the "date" as a feature besides the other 15 features, such as:
hour of the day (24 boolean features)
day of the week (7 boolean features)
day of the month (up to 31 boolean features)
month (12 boolean features)
year
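The date features listed above could be built roughly as follows. This is a sketch with a hypothetical daily index; since the data set is daily, the hour-of-day features are left out, and the helper name `calendar_features` is an assumption.

```python
import pandas as pd


def calendar_features(index):
    """One-hot encode calendar fields of a DatetimeIndex, plus the year."""
    parts = pd.DataFrame(
        {
            "dow": index.dayofweek.to_numpy(),  # 0 = Monday ... 6 = Sunday
            "dom": index.day.to_numpy(),        # 1 ... 31
            "month": index.month.to_numpy(),    # 1 ... 12
        },
        index=index,
    )
    # one boolean column per distinct value of each calendar field
    features = pd.get_dummies(parts, columns=["dow", "dom", "month"])
    # the year is kept as a single numeric column
    features["year"] = index.year.to_numpy()
    return features


# Hypothetical daily index covering one full (leap) year
dates = pd.date_range("2020-01-01", periods=366, freq="D")
features = calendar_features(dates)
```

These columns can be concatenated with the 15 existing features before fitting a regressor. Note that `get_dummies` only creates columns for values present in the data, so the training period should cover every weekday, day of month and month.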

Let me explain to you how seasonality is usually treated.
Most of the time, people try to extract a seasonal component and deal with the corrected series for analysis. In North America, statistical agencies apply a sequence of symmetric moving average filters to estimate seasonal, trend-cycle and irregular components, and seasonally adjusted data corresponds to data minus the estimated seasonal component. Usually, they also provide raw data in other tables and, sometimes, they also provide trend-cycle in yet other tables. In Australia, they prefer to present trend-cycles.
In Europe, decomposition is usually based upon a model: they specify an ARIMA model with seasonal components -- it allows for integrated seasonal components, moving average components in seasonal dynamics, etc. -- and proceed to a decomposition by imposing hypotheses on the model to extract specific frequencies.
Now, the first thing you need to know is what exactly your function does. If it uses moving average filters, you have to be aware that those filters are symmetric and that this forces the use of backcasts and forecasts (you need points before the beginning and after the end to apply symmetric filters -- it's the same end point problem faced by filters like the Hodrick-Prescott, for instance). So, it needs to specify a good ARIMA with seasonality as a proxy to not make end points behave too poorly (or specify asymmetric filters for end points), and the symmetry implies a small data-snooping bias if you use the corrected dataset to compare forecasting models (because all new points contain future information). If you use an ARIMA model, the filter is asymmetric and corrected data points are not built using future points.
Now, to forecast, you have two options. (1) You can try to forecast the corrected values (you can then forecast seasonality separately, if you absolutely need raw values); (2) you can forecast the raw series.
It's not obvious what is the best way to proceed. In theory, you want (2), but it can be very complicated -- like, frontier research models --, unless you use an ARIMA with seasonal component or impose constant seasonality and use seasonal dummies.
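The simplest of these options, constant seasonality with seasonal dummies, can be sketched as a linear trend plus one dummy per position in the cycle, fitted by least squares. The weekly period, the noiseless toy series and the function names are assumptions for illustration only.

```python
import numpy as np


def fit_trend_with_seasonal_dummies(y, period):
    """Least-squares fit of y_t = a * t + b_k, where k = t mod period."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    dummies = np.eye(period)[t % period]  # one column per season
    # no separate intercept: the dummy columns already sum to one
    X = np.column_stack([t, dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def forecast_raw_series(coef, n_obs, period, horizon):
    """Extend the fitted trend and repeat the constant seasonal pattern."""
    t = np.arange(n_obs, n_obs + horizon)
    X = np.column_stack([t, np.eye(period)[t % period]])
    return X @ coef


# Toy daily series: linear trend plus a fixed weekly pattern, no noise
pattern = np.array([5.0, 3.0, 0.0, -2.0, -1.0, 4.0, 6.0])
t_obs = np.arange(70)
y = 50.0 + 0.1 * t_obs + pattern[t_obs % 7]

coef = fit_trend_with_seasonal_dummies(y, period=7)
pred = forecast_raw_series(coef, n_obs=70, period=7, horizon=7)

t_future = np.arange(70, 77)
truth = 50.0 + 0.1 * t_future + pattern[t_future % 7]
```

This forecasts the raw series directly, at the cost of assuming that the seasonal pattern never changes; a seasonal ARIMA relaxes that assumption.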
As for the 'frequency' choice, I tend to use informal tests to determine what is appropriate. In the moving average literature, we pick how long or short we want our filters -- and the goal is to produce estimated seasonals that capture entirely seasonal regularities. You can use nonparametric tests on corrected data, like the Kruskal-Wallis test, but it is rather forgiving.
My advice, which I believe is preferable for forecasting, would be to find a package that allows you to work with parametric models with seasonality. Then, you'd have clear tests and information criteria to use to make decisions on sound statistical ground.
