TensorFlow continuous training from input data - python

I'm new to TensorFlow and have a general question:
I have a certain amount of training data and want to do a time series prediction.
The interval of my training data is one minute, and I want to predict the following minutes based on new input data, which is provided via a REST API.
What I don't understand is this:
Let's say I train the model with all the data up to yesterday; this means I can predict the first values of today to a certain extent. But the new values of today have not been seen by the model that was built yesterday.
How would you solve this problem?
Thanks

Djboblo,
I assume that you need to predict the whole next day's values on a per-minute basis.
In that case your options are:
- recursive prediction, i.e. using predicted data as input for the next prediction
- structuring the model to provide you with a prediction for the whole next day
If it is just a matter of predicting a single minute forward and your model is trained on a reasonably large amount of data, don't worry: just feed it the values up to the prediction minute. Periodically you may re-train the model using new data.
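As a minimal sketch of the recursive option above, with a hypothetical one-step `predict_next` function standing in for the trained model (window size and values are made up):

```python
import numpy as np

WINDOW = 5

def predict_next(window):
    # stand-in for a trained one-step model, e.g. model.predict(window)
    return float(np.mean(window))

history = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]  # observed per-minute values
window = list(history[-WINDOW:])
forecast = []
for _ in range(3):  # forecast three minutes ahead
    nxt = predict_next(window[-WINDOW:])
    forecast.append(nxt)
    window.append(nxt)  # the prediction becomes input for the next step
```

Each predicted value is appended to the window, so errors compound the further ahead you go; that is the usual trade-off of recursive prediction.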

What I was looking for is this:
How to use a Keras RNN model to forecast for future dates or events?
to predict stateful events
and after a while use the .fit method to update the network with new data.
See: https://machinelearningmastery.com/update-lstm-networks-training-time-series-forecasting/
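A minimal sketch of that update step, assuming a hypothetical already-compiled Keras model over 10-minute windows (the architecture and the data here are made up):

```python
import numpy as np
from tensorflow import keras

# hypothetical one-step model over windows of 10 past minutes
model = keras.Sequential([
    keras.layers.Input(shape=(10,)),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# ...the initial training on data up to yesterday would go here...

# when today's new minutes arrive, update the weights with a short .fit call
# on just the new windows instead of retraining from scratch
new_X = np.random.rand(32, 10)   # stand-in for the newest input windows
new_y = np.random.rand(32, 1)    # stand-in for the newest targets
model.fit(new_X, new_y, epochs=1, batch_size=8, verbose=0)
```

Calling .fit again continues training from the current weights, which is what makes this incremental update possible.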

Related

Python - How to use fitted ARIMA model on unseen data

I am using statsmodels.tsa.arima.model.ARIMA to fit an ARIMA model on a time series.
How can I use this model to make predictions on unseen data? It seems that the predict and forecast functions can only make predictions from the last seen data in the training set that the model was fitted to.
So, for instance, I want to use a static model to keep making predictions into the future. This is for the purpose of real-time multi-step forecasting where re-fitting the model isn't an option.
E.g.,
Say we have a dataset size of 10,000 split into train and test (70/30).
The last reading we train on is 7,000
Is it possible to, say, use the trained model and pass in observations 6997 to 7000 to predict 7001 to 7004,
and then in the following iteration pass it 6998 to 7001 to predict 7002 to 7005, using the same model?
This type of prediction is common in ML workflow, but not apparent to me how to perform this in ARIMA.
The predict and forecast functions only take index parameters; there is no parameter for fresh data.
You can easily do this with the predict method, which was created for this purpose. You first train your ARIMA model on all of your data (without splits). When generating forecasts, you use the predict method and set the start and end parameters, e.g. when you want to predict 7001 to 7004:
model.predict(start=7000, end=7004)
The predict method will use all the data available up to the start point (including it) and then make a prediction. That way you do not have to train your model again and again with new data.
The start/end parameters also work with datetimes or strings (like "2021-06-30" to "2021-07-31").
https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.predict.html

How to predict future candlestick based on trained model

Let's assume we have trained a model using Keras with more than 90% accuracy.
We have used past data (open price, high, low, close, volume, etc.) with 80-20 train/test split ratio.
The problem here is that we used data which already exists to predict data which also already exists.
How can we use this model to predict the future? For example, using a trained model to predict red or green candlesticks for the next 4 hours?
I know we can use model.save, then load the model, and finally call model.predict(), but the problem here is that model.predict() needs some input data for making predictions. Can we use a timestamp here as input (since we obviously don't have the future OHLCV data)?
Your training data and input data should be in the same format.
For instance, if you trained your model with the previous day's open, high, etc. to predict today's data, just input today's data to predict tomorrow's.
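As a sketch, assuming a hypothetical Keras model that takes one day's OHLCV row (5 features) and outputs the probability of a green candle; the layer sizes and input values are made up, and the model here is untrained:

```python
import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(5,)),               # open, high, low, close, volume
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # P(next candle is green)
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# to predict tomorrow, feed today's row in exactly the training format;
# a bare timestamp would not match the features the model was trained on
todays_ohlcv = np.array([[1.10, 1.15, 1.08, 1.12, 5000.0]])  # made-up values
prob_green = model.predict(todays_ohlcv, verbose=0)
```

The point is only the shape: whatever feature layout went into training is the layout the live input must have.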

Do Machine Learning Algorithms read data top-down or bottom-up?

I'm new to Machine Learning and I'm a bit confused about how data is read during the training/testing process. My data has dates and I want the model to read the later dates before the earlier ones; the data is saved with the earliest date on line 1 and the latest date on line n. I'm assuming that data is naturally read from line 1 down to line n, but I just need to be sure about it. And is there any way to make the model (e.g. logistic regression) read data in whichever direction I want?
In supervised learning, a machine learning model learns from all samples in no particular order; in fact, it is encouraged to shuffle the samples during training.
Most of the time the model isn't fed all samples at once: the training set is split into batches, either batches of random samples or batches in whatever order the training set happens to be in.
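A small sketch with pandas showing both points, reversing the row order explicitly and shuffling, which is what training pipelines usually do anyway (the frame here is made up):

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=5),  # earliest date on row 0
    "y": [0, 1, 0, 1, 1],
})

# read later dates first: reverse the row order before training
df_desc = df.iloc[::-1].reset_index(drop=True)

# what most pipelines do instead: shuffle the rows
df_shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
```

For a row-wise model like logistic regression the order only matters if you choose an order-sensitive training procedure; reordering the DataFrame before fitting is all the control you need.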

XGBoost model with real time data

I need clarification about my FOREX machine learning strategy, because I'm a little bit confused.
I train/test and save an XGBoost classification model in a pickle file.
Then, in my broker's algo-trading loop, I import that model and try to predict on real-time data (historical data up to today, in a loop that updates the last candle), opening a buy/sell order depending on the forecast class.
Unfortunately I get this error:
ValueError: Number of features of the model must match the input. Model n_features is 41 and input n_features is 1
Why? Do I really have to pre-process the live data like my train/test dataframe? I thought that saving and importing pickled models would avoid this step inside a live brokerage stream.
I know, logically, that a boosted tree is a set of rules built on data features, so it is illogical to think those rules could be applied to a single row of returns. But...
Last question: on every real-time data update, my algo takes the last two rows, calculates the return, and assigns a classification label (1, -1) based on the sign of the return (1 if return > 0, else -1). What should my prediction target be, the return or the binary label?
For example:
my_model.predict(?)[0]
What do I put in the parentheses here?

Out of sample predictions with LSTM

This is a general question about making real future predictions with an LSTM model using Keras & TensorFlow in Python (optionally R).
Take stock prices, for example. I know there is a train/test split to measure the accuracy/performance of the model by comparing my results with the test prices. But I want to make real future, out-of-sample predictions. Does anyone have an idea and would like to share some thoughts on it?
The only thing that came to mind was a rolling window, but that didn't work at all. So I'm glad about every tip you guys have.
There are two main ways to create a train/validation set in a time series situation:
- splitting your samples (taking, for example, 80% of the time series for training and 20% for validation)
- splitting your time series (training your model on the first n-k values of the time series and validating on the k remaining values)
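The second option, splitting the time series itself, can be sketched like this, with a toy array standing in for the real series:

```python
import numpy as np

series = np.arange(100.0)  # stand-in for the real time series
k = 20                     # hold out the last k values

# train on the first n-k values, validate on the last k;
# no shuffling, so the validation set is strictly in the "future"
train, val = series[:-k], series[-k:]
```

Keeping the validation block at the end mimics the real deployment situation, where the model only ever sees data from before the period it must predict.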
