RNN with more weights on recent data - python

I am working on an LSTM RNN with stock-prediction sample data. It seems the RNN does not give more weight to recent data; the weights are shared equally across the different time steps. Is there an option to increase the weight of recent data (with any parameter of the LSTM or RNN)?
Please correct me or give some more input on this.
Thanks in advance.

This is why most time-series models now have an attention mechanism, as attention is better at learning which time steps are relevant. It is also why some people are now using the Transformer: RNNs/LSTMs do not learn long-range dependencies well. For instance, the DA-RNN paper states:
In the first stage, we introduce an input attention mechanism to adaptively extract relevant driving series (a.k.a., input features) at each time step by referring to the previous encoder hidden state. In the second stage, we use a temporal attention mechanism to select relevant encoder hidden states across all time steps.
The key phrase being across all time steps. You can find implementations of several attention/Transformer-based models here (disclaimer: I'm the maintainer of this framework).
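To make the temporal-attention idea concrete (this is a minimal sketch, not the DA-RNN itself), the snippet below scores each LSTM hidden state, softmaxes the scores over the time axis and takes a weighted sum, so the model can learn which time steps to emphasise. The window length, feature count and layer sizes are assumptions chosen for the example.

```python
# Minimal sketch of temporal attention over LSTM hidden states in tf.keras.
# Window size, feature count and layer sizes are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

window, n_features = 20, 5  # assumed input shape

inputs = layers.Input(shape=(window, n_features))
# return_sequences=True keeps the hidden state at every time step
hidden = layers.LSTM(64, return_sequences=True)(inputs)

# Score each time step, softmax over the time axis, then take the weighted sum.
scores = layers.Dense(1)(hidden)               # (batch, window, 1)
weights = layers.Softmax(axis=1)(scores)       # attention weights over time steps
context = layers.Lambda(
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1)   # (batch, 64)
)([hidden, weights])

output = layers.Dense(1)(context)              # next-step prediction
model = Model(inputs, output)
model.compile(optimizer="adam", loss="mse")
```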

Related

Neural Network Regression - Considering a dynamic state

I am using Tensorflow to solve a regression problem with known dynamic components, that is, we are aware that the (singular) label at time t depends on some dynamic state of the environment, but this feature is unknown. The initial attempt to solve the problem via simple regression has, understandably, failed, confirming our assumption that there is some kind of dynamic influence by a feature we have no access to.
However, the state of the environment at time t should be reflected somewhere in the features and labels (and in particular their interplay) known at times t0-n, where n > 0. Unfortunately, because of the nature of the problem, the output at time t heavily depends on the input at time t, about as much as it depends on the dynamic state of the environment. I am worried that this renders the approach I wanted to try ineffective - time series forecasting, in my understanding, would consider features from previous timesteps, but no inputs on the current timestep. Additionally, I know labels from previous timesteps, but not at the time at which I want to make my prediction.
Here is a table to illustrate the problem:
| t    | input     | output      |
| ---- | --------- | ----------- |
| 0    | x(t=0)    | y(t=0)      |
| ...  | ...       | ...         |
| t0-1 | x(t=t0-1) | y(t=t0-1)   |
| t0   | x(t=t0)   | y(t=t0) = ? |
How can I use all the information at my disposal to predict the value of y(t=t0), using x(t=t0) (where x is the array of input features) and a defined window of features and labels at previous timesteps?
Is there an established method for solving a problem like this, either using a neural net or perhaps even a different model?
Does this problem require a combination of methods, and if so, which ones might be suitable for tackling it?
The final model is meant to be deployed and continue working for future time windows as well. We know the size of the relevant time window to be roughly 100 time steps into the past.
The kind of problem I have described is, as I have since learned, linked to so-called exogenous variables. In my case, I require something called NNARX, which is similar to the ARMAX model at its core, but (as a neural net) can take non-linearity into account.
The general idea is to introduce an LSTM layer which acts as an encoder for the historical input, coupled with another input layer that carries the exogenous variables. Both are joined at the so-called decoder - the rest of the NN architecture.
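A minimal tf.keras sketch of that encoder/decoder coupling might look like the following; the window of 100 past steps matches the stated relevant history, while the feature counts and layer sizes are purely illustrative assumptions.

```python
# Hedged sketch of the NNARX-style encoder/decoder idea described above.
# Feature counts and layer sizes are assumptions for illustration.
from tensorflow.keras import layers, Model

window, n_hist, n_exo = 100, 8, 4  # assumed dimensions

# Encoder: LSTM over the historical window of features and past labels.
hist_in = layers.Input(shape=(window, n_hist), name="history")
encoded = layers.LSTM(64)(hist_in)

# Exogenous input: features that are already known at the prediction step t0.
exo_in = layers.Input(shape=(n_exo,), name="exogenous_t0")

# Decoder: concatenate the encoded history with the current known features.
merged = layers.Concatenate()([encoded, exo_in])
hidden = layers.Dense(64, activation="relu")(merged)
y_hat = layers.Dense(1, name="y_t0")(hidden)

model = Model([hist_in, exo_in], y_hat)
model.compile(optimizer="adam", loss="mse")
```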

How can you do time series forecasting in Tensorflow (or with other tools) where features of the label timestep are known?

This is a question about a general approach rather than a specific coding problem. I'm trying to do time series forecasting with Tensorflow where features of the label timestep are known to the model. E.g. a human trying to predict a variable a week from now would know things that are going to happen in the next week that will affect that variable. So a window of 20 timesteps where the label is the 20th timestep would look something like this:
Timesteps 1-19 would each have a set of features plus the timeseries data
Timestep 20 would have a set of features which are known, plus the timeseries label which is unknown
Is there a model that could handle this sort of data? I've gone through the Tensorflow time series forecasting tutorial, done a Coursera course on Tensorflow time series forecasting and searched elsewhere but I can't find anything. I'm fairly new to this so apologies for any imprecise language.
I once tried to do this kind of TS problem by stacking a multivariate model and another machine learning model. The idea was to take the normal TS model's output and add it as another feature to a second model that only takes the last time step's info as input: use the info from step 1 to window_size - 1 to predict a rough output at step window_size, then use the info at step window_size to reduce the residual between the TS model's output and the actual label. But it is complicated and might overfit a lot even if the second model is carefully regularised, and I don't think this approach is theoretically correct; the result might be worse than using a TS model without feeding in the target step's info.
I don't think tensorflow has any API for your problem, because this type of problem is not a normal TS problem. Usually people would just treat this kind of problem as a regression or classification problem.
I am not an expert on this problem either, but I just happened to attempt the exact same problem, so this is just my personal experience...
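For what it's worth, here is a rough sketch of that two-stage stacking idea under assumed shapes and with synthetic placeholder data; the LSTM forecaster never sees the target step, and a regularised linear model then combines its prediction with the features known at the target step.

```python
# Hedged sketch of the two-stage stacking idea described above.
# Stage 1: an LSTM forecasts from steps 1..window_size-1.
# Stage 2: a regularised regressor uses the stage-1 prediction plus the
# features known at the target step. All shapes and data are placeholders.
import numpy as np
from tensorflow.keras import layers, Sequential
from sklearn.linear_model import Ridge

window, n_features = 20, 6                              # assumed window and feature count
X_hist = np.random.rand(1000, window - 1, n_features)   # steps 1..19 (synthetic)
X_last = np.random.rand(1000, n_features)               # known features at step 20
y = np.random.rand(1000)                                # label at step 20 (synthetic)

# Stage 1: plain LSTM forecaster that never sees the target step.
stage1 = Sequential([
    layers.Input(shape=(window - 1, n_features)),
    layers.LSTM(32),
    layers.Dense(1),
])
stage1.compile(optimizer="adam", loss="mse")
stage1.fit(X_hist, y, epochs=5, verbose=0)

# Stage 2: regularised linear model on [stage-1 prediction, target-step features].
rough = stage1.predict(X_hist, verbose=0).ravel()
stage2_inputs = np.column_stack([rough, X_last])
stage2 = Ridge(alpha=1.0)            # regularisation against overfitting
stage2.fit(stage2_inputs, y)
```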

Drop inactive features in Keras

I'm building a Sequential NN model in Keras for binary classification. The training data has about 600,000 rows and 2,000 features, so every epoch and every layer is very time-consuming. I believe many of the features are not relevant to the model and can be dropped altogether to make the model thinner, so it would be faster to work with.
I run a simple model with one hidden layer of 200 neurons. How can I tell which of the features (which are actually the nodes in the input layer) are meaningless, so I could drop them from the data set and re-run the model without them?
There is a very big topic in machine learning called feature selection. That said, neural networks are, to an extent, considered to choose the best features for the problem automatically through their weights, giving some features more consideration and others less. Neural networks also need a lot of experience to be tuned correctly. I would definitely suggest increasing the number of layers in the network, because you have a lot of data and features, and using L1 regularisation in order to get sparse weights and exclude most of the features. These suggestions are only indicative, since I do not know anything about your dataset and your network architecture. Finally, I would suggest studying more of the basics of machine learning and then continuing with neural networks before practising on real data.
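As a hedged sketch of that L1 suggestion (layer names, sizes and the regularisation strength are assumptions): penalise the first layer's kernel so that weights for uninformative inputs shrink towards zero, then rank the input features by the magnitude of their outgoing weights and drop the weakest ones.

```python
# Sketch: L1-regularise the first layer's weights, then rank input features
# by the summed magnitude of their outgoing weights. Sizes are assumptions.
import numpy as np
from tensorflow.keras import layers, regularizers, Sequential

n_features = 2000  # as described in the question

model = Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(200, activation="relu",
                 kernel_regularizer=regularizers.l1(1e-4),  # encourages sparse input weights
                 name="first_hidden"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=..., batch_size=...)  # train as usual

# After training: sum of absolute outgoing weights per input feature.
first_layer_weights = model.get_layer("first_hidden").get_weights()[0]  # (n_features, 200)
importance = np.abs(first_layer_weights).sum(axis=1)
candidate_drops = np.argsort(importance)[:500]   # indices of the weakest features
```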

Forecasting using LSTM

How can I use Long Short-Term Memory (LSTM) to predict a future value x(t+1) (an out-of-sample prediction) based on a historical dataset? I have read and tried many web tutorials on forecasting and prediction using LSTM, but I am still far from the point. What is the exact procedure for this prediction? Is it as simple as shifting the target array by n steps, where n is the number of future predictions, and running the prediction operation, or are there other techniques?
Please help or leave a suggestion.
Can you provide the framework you are using? tensorflow? pytorch? which web tutorials specifically?
Assuming you are going with tensorflow, you can copy and paste code from one of these, test that it works on the provided dataset, then modify the input encoding functions to fit your dataset, and then run it on your dataset.
https://github.com/llSourcell/How-to-Predict-Stock-Prices-Easily-Demo (best)
https://github.com/sebastianheinz/stockprediction
https://github.com/talolard/MarketVectors/blob/master/preparedata.ipynb (you will have to replace fc layers with lstm, and fiddle with inputs)
In general, the procedure is something like the following (assuming tensorflow):
Download Dataset
Create a function to load batches of data
Create a function to encode batch of data (normalization, other transforms)
Create an LSTM layer to receive the series of inputs.
Create an output layer (usually fully connected) to take the last LSTM state and predict an output of your desired size.
Create a tf session to wire everything together, and hit run.
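A minimal sketch of that procedure using the modern tf.keras API (rather than an explicit tf session) might look like this; the synthetic sine-wave series, window length and layer sizes are placeholder assumptions.

```python
# Hedged sketch of the procedure above with tf.keras. The synthetic series,
# window length and layer sizes are illustrative assumptions only.
import numpy as np
from tensorflow.keras import layers, Sequential

# Steps 1-3: load the data and encode it as (window, 1) sequences -> next value.
prices = np.sin(np.linspace(0, 100, 2000)).astype("float32")  # placeholder series
window = 50
X = np.stack([prices[i:i + window] for i in range(len(prices) - window)])[..., None]
y = prices[window:]

# Steps 4-5: LSTM layer over the input series, fully connected output layer.
model = Sequential([
    layers.Input(shape=(window, 1)),
    layers.LSTM(64),
    layers.Dense(1),
])

# Step 6: wire everything together and run.
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

# Out-of-sample prediction of x(t+1) from the most recent window.
next_value = model.predict(prices[-window:].reshape(1, window, 1), verbose=0)
```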
Some questions to ask conceptually about which network to use:
How many inputs map to how many outputs - see this excellent lecture by Karpathy: http://cs231n.stanford.edu/slides/2016/winter1516_lecture10.pdf
How far back do you consider the stock prices, e.g. {t-100 ... t} or {t-10 ... t}? This may dictate the size of the hidden layers.
What other information do you think is relevant to the model? Does stock A influence stock B? In that case you may have two LSTMs outputting a state to your fully connected layer...

How to train a hierarchical model in two parts

This is a follow-up to the following question: Confused about how to implement time-distributed LSTM + LSTM
The current draft structure that is working well:
The basic idea is that there is a TimeDistributed deep LSTM input layer that works on each epoch of raw time series data and outputs a vector of features for each epoch. Then the "outer" deep LSTM layer takes 7 of those sequential outputs and tries to classify the center epoch (the assumption being that one epoch does not have enough information to be classified by itself and needs the surrounding epochs). I say this is a draft because I haven't yet explored the feature space required for this to work well on many subjects.
There are several issues that still need to be resolved, but the one for which I haven't found any clear-cut examples online is training this model in two parts: 1) the TimeDistributed layer and 2) the "outer" layer. The reason is that as I increase the number of epochs needed to classify (currently 7, but I expect it may get up to 21 or higher), more duplicated data is loaded and the training speed decreases quickly.
One may propose an autoencoder for the first layer. However, I don't think this is the best solution. The reason I think so is that the features necessary to reproduce the input might very well be different from the features needed, together with the other epochs, to classify the center epoch. To expand: this is probable because the time series is semi-periodic, with most of the epoch providing little information other than the current period from one important feature to the next (and the number and location of these important features vary in each epoch).
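For reference, here is a hedged sketch of the architecture as described (all shapes and sizes are assumptions): an inner LSTM encoder applied per epoch via TimeDistributed, and an outer LSTM classifying the centre epoch from 7 encoded epochs. Splitting the training into two parts could then amount to pre-training the inner encoder separately and freezing it inside the wrapper.

```python
# Hedged sketch of the described hierarchy: inner per-epoch LSTM encoder,
# outer LSTM over 7 encoded epochs. All shapes and sizes are assumptions.
from tensorflow.keras import layers, Model

samples_per_epoch, n_channels = 3000, 2   # assumed raw epoch shape
n_epochs, n_classes = 7, 5                # epochs per window, output classes

# Inner model: encodes one epoch of raw time series into a feature vector.
epoch_in = layers.Input(shape=(samples_per_epoch, n_channels))
feat = layers.LSTM(64, return_sequences=True)(epoch_in)
feat = layers.LSTM(32)(feat)
epoch_encoder = Model(epoch_in, feat, name="epoch_encoder")
# epoch_encoder.trainable = False  # freeze here after pre-training it separately

# Outer model: encode each of the 7 epochs, then classify the centre one.
window_in = layers.Input(shape=(n_epochs, samples_per_epoch, n_channels))
encoded = layers.TimeDistributed(epoch_encoder)(window_in)
outer = layers.LSTM(32)(encoded)
out = layers.Dense(n_classes, activation="softmax")(outer)

classifier = Model(window_in, out)
classifier.compile(optimizer="adam", loss="categorical_crossentropy")
```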
