Time series forecast with recurrent Elman network in neurolab - python

I use the Elman recurrent network from neurolab to predict a time series of continuous values. The network is trained from a sequence such that the input is the value at index i and the target is the value at index i+1.
To make predictions beyond the immediate next time step, the output of the net is fed back as input. If, for example, I intend to predict the value at i+5, I proceed as follows.
1. Input the value at index i.
2. Take the output and feed it back to the net as the next input value (i.e. for index i+1).
3. Repeat steps 1 and 2 four more times.
The output is then a prediction of the value at i+5.
So for predictions beyond the immediate next time step, recurrent networks must be activated with the output from a previous activation.
In most examples, however, the network is fed with an already complete sequence. See, for example, the functions train and sim in the neurolab Elman example: the first trains the network with an already complete list of examples, and the second activates the network with a complete list of input values.
After some digging in neurolab, I found the function step to return a single output for a single input. Results from using step suggest, however, that the function does not retain the activation of the recurrent layer, which is crucial to recurrent networks.
How can I activate a recurrent Elman network in neurolab with a single input such that it maintains its internal state for the next single input activation?
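For illustration, here is a minimal sketch of the intended feedback loop, assuming an Elman net built with nl.net.newelm and using its step method; whether step actually preserves the recurrent layer's activation between calls is exactly the open question:

import numpy as np
import neurolab as nl

# Toy series scaled to [-1, 1]; the real data would go here instead
series = np.sin(np.linspace(0, 10, 200))
inp = series[:-1].reshape(-1, 1)    # value at index i
tar = series[1:].reshape(-1, 1)     # value at index i+1

# 1-input Elman net with 10 recurrent tanh units and a linear output
net = nl.net.newelm([[-1.0, 1.0]], [10, 1],
                    [nl.trans.TanSig(), nl.trans.PureLin()])
net.train(inp, tar, epochs=500, show=100, goal=0.01)

# Predict i+5 by feeding each single-step output back in as the next input
x = series[-1]
for _ in range(5):
    x = net.step([x])[0]    # single activation; relies on step keeping its state
print("prediction for i+5:", x)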

It turns out it is quite normal for output that is repeatedly generated from previous output to converge, sooner or later, towards a constant value. In effect, the output of a network cannot depend only on its own previous output.

I obtain the same result: a constant. However, I noticed something:
-> If you use 0 and 1 data (0 for a decrease, 1 for an increase), the results improve and the output is no longer a constant.
-> Try using another variable to explain the targeted one, as one of our colleagues already mentioned.

Related

Neural Network doesn't match the expected output

I am trying to build a neural network from scratch, using only numpy. I have the following code and functions. However, the output after training does not match the expected output that I have (using XOR as an example). I think one of my functions is not correct but cannot figure out the mistake. The output I get is, for example: [[0.73105858], [0.53336314], [0.79343002], [0.5786911]], which is not close to the expected output [0, 0, 0, 1].
I don't see any issues with your code, but here are some things you should keep in mind:
Your neural network is trained for 2 iterations with a learning rate of 0.01. This means that your network is only updated 2 times, each by a small amount, resulting in an undertrained network. Also, you're always using a tensor of size 4*4 as input, meaning that the network is only updated with the average over all samples, hence a result that just looks like an average.
For improvement, my suggestion would be to increase the number of iterations and the number of samples per iteration, while making sure that each iteration performs more than one update. Still, I believe you won't get 100% accurate results, since you are only using one linear layer, and XOR can't be solved with a single linear model. You could consider adding another layer for better results.
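Since the original code isn't reproduced here, the following is only a minimal numpy sketch of those suggestions (a hidden layer plus many more updates); the variable names are illustrative, not taken from the question:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [0], [0], [1]], dtype=float)   # expected output from the question

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
lr = 0.5

for _ in range(5000):                        # far more than 2 iterations
    h = sigmoid(X @ W1 + b1)                 # hidden layer
    out = sigmoid(h @ W2 + b2)               # output layer
    d_out = (out - y) * out * (1 - out)      # squared-error gradient at the output
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * (h.T @ d_out); b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * (X.T @ d_h);   b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(out.round(3))   # approaches [0, 0, 0, 1]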

Do LSTMs remember previous windows or is the hidden state reset?

I am training an LSTM to forecast the next value of a timeseries. Let's say I have training data of shape (2345, 95) in each of 15 files; this means I have 2345 windows with 50% overlap between them (the timeseries was divided into windows). Each window has 95 timesteps. If I use the following model:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

input1 = Input(shape=(95, 1))
lstm1 = LSTM(units=100, return_sequences=False,
             activation="tanh")(input1)
outputs = Dense(1, activation="sigmoid")(lstm1)
model = Model(inputs=input1, outputs=outputs)
model.compile(loss=keras.losses.BinaryCrossentropy(),
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))
I am feeding this data using a generator where it passes a whole file each time, therefore one epoch will have 15 steps. Now my question is, in a given epoch, for a given step, does the LSTM remember the previous window that it saw or is the memory of the LSTM reset after seeing each window? If it remembers the previous windows, then is the memory reset only at the end of an epoch?
I have seen similar questions, like this one: TensorFlow: Remember LSTM state for next batch (stateful LSTM), or https://datascience.stackexchange.com/questions/27628/sliding-window-leads-to-overfitting-in-lstm, but I either did not quite understand the explanation or was unsure whether it covered what I wanted. I'm looking for more of a technical explanation as to where in the LSTM architecture the whole memory/hidden state is reset.
EDIT:
So from my understanding there are two concepts we can call "memory" here: the weights that are updated through BPTT, and the hidden state of the LSTM cell. For a given window of timesteps the LSTM can remember what the previous timestep was; this is what the hidden state is for, I think. The weight update does not directly reflect memory, if I'm understanding this correctly.
The size of the hidden state, in other words how much the LSTM remembers, is determined by the batch size, which in this case is one whole file. But other questions/answers (https://datascience.stackexchange.com/questions/27628/sliding-window-leads-to-overfitting-in-lstm and https://stackoverflow.com/a/50235563/13469674) state that if we have two windows, for instance [1,2,3] and [4,5,6], the LSTM does not know that 4 comes after 3 because they are in different windows, even though they belong to the same batch. So I'm still unsure how exactly memory is maintained in the LSTM.
It makes some sense that the hidden state is reset between windows when we look at the LSTM cell diagram. But then the weights are only updated after each step, so where does the hidden state come into play?
What you are describing is called "Back Propagation Through Time"; you can google that term for tutorials that describe the process.
Your concern is justified in one respect and unjustified in another respect.
The LSTM is capable of learning across multiple training iterations (e.g. multiple 15-step intervals), because the LSTM state is being passed forward from one iteration to the next. This feeds information forward across multiple training iterations.
Your concern is justified in that the model's weights are only updated with respect to the 15 steps (plus any batch size you have). As long as 15 steps is long enough for the model to catch valuable patterns, it will generally learn a good set of weights that generalize well beyond 15 steps. A good example of this is the character-level Shakespeare model described in Karpathy's "The Unreasonable Effectiveness of Recurrent Neural Networks".
In summary, the model is learning to create a good hidden state for the next step averaged over sets of 15 steps as you have defined. It is common that an LSTM will produce a good generalized solution by looking at data in these limited segments. Akin to batch training, but sequentially over time.
I might note that 100 is a more typical upper limit for the number of steps in an LSTM. At ~100 steps you start to see a vanishing gradient problem in which the earlier steps contribute nearly nothing to the gradient.
Note that it is important to ensure you are passing the LSTM state forward from training step to training step over the course of an episode (any contiguous sequence). If this step was missed the model would certainly suffer.
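In Keras specifically, carrying the state from one window/batch to the next is not the default; one way to get it (a sketch under the assumption of a fixed batch size, not the poster's exact setup) is a stateful LSTM whose state you reset manually at episode boundaries, e.g. at the end of each file:

import tensorflow as tf
from tensorflow.keras.layers import Input, LSTM, Dense
from tensorflow.keras.models import Model

batch_size = 1                                          # stateful LSTMs need a fixed batch size
inp = Input(batch_shape=(batch_size, 95, 1))
x = LSTM(100, stateful=True, activation="tanh")(inp)   # state persists across batches
out = Dense(1, activation="sigmoid")(x)
model = Model(inp, out)
model.compile(loss="binary_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.01))

# Illustrative loop: `files` is an assumed list of (windows, targets) pairs with
# shapes (n_windows, 95, 1) and (n_windows, 1); windows are fed in order and the
# carried-over state is dropped at the end of each file (the "episode")
def train_one_epoch(model, files):
    for windows, targets in files:
        for w, t in zip(windows, targets):
            model.train_on_batch(w[None, ...], t[None, ...])
        model.reset_states()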

Is it possible to use LSTM predictions as inputs for next time steps?

I am working with LSTMs (in PyTorch) for multivariate time series prediction. Let's imagine the situation: I have 2 time series, A and B, and I want to predict the value of B at time t using the previous values of A and B (before t). Such prediction works fine; my model gets good results.
But what if (during testing, after training) I want to use the predicted values of B as inputs for the next time steps instead of the real values? For example: I predict the first value of B, make a step, put the predicted value in place of the real one, and make a prediction again. Then I use the two predicted values instead of the real two, and so on. After some steps, time series B will contain only predicted values.
Are there any possibilities to do that?
This is exactly what people do for machine translation and text generation in general. In this case, the LSTM predicts a distribution over a vocabulary; you select one word and use it as the input to the network in the next step. See the PyTorch tutorial on machine translation for more details.
The important point is that the LSTM is executed in two regimes:
For training: as standard sequence labeling. It is provided with the real input and should predict one step into the future.
For inference: It gradually generates new samples and uses them as the next input. In PyTorch, this needs to be implemented with an explicit for loop.
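A minimal PyTorch sketch of such an inference loop (the model, feature layout, and names are illustrative assumptions, not the poster's code): the prediction for B is written into the next input step while the known values of A are kept.

import torch
import torch.nn as nn

class Forecaster(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=2, hidden_size=hidden, batch_first=True)  # inputs: [A, B]
        self.head = nn.Linear(hidden, 1)                                         # predicts next B

    def forward(self, x, state=None):
        out, state = self.lstm(x, state)
        return self.head(out[:, -1]), state

@torch.no_grad()
def rollout(model, a_future, last_ab, steps):
    # a_future: (steps,) tensor with the known future values of A
    # last_ab:  (1, 1, 2) tensor with the last observed [A, B] pair
    model.eval()
    preds, state, x = [], None, last_ab
    for t in range(steps):
        b_hat, state = model(x, state)                            # predict B one step ahead
        preds.append(b_hat.item())
        # next input: real A, predicted B in place of the real B
        x = torch.tensor([[[float(a_future[t]), b_hat.item()]]])
    return preds

# Usage sketch:
# preds = rollout(Forecaster(), a_future=torch.randn(5), last_ab=torch.randn(1, 1, 2), steps=5)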

How do I feed data into my neural network?

I've coded a simple neural network for XOR in python. While there is loads of information online about how to program this, there isn't much on how to feed the data through it. I've tested the change in weights after one cycle for inputs [1,1] to compare my results with my lecture slides and it's 100% the same, so I believe the code works. I can train the network for that same input, but when I change the input (and corresponding target) every cycle the error doesn't go down.
Should I change the weights after every cycle (with a new input each time), or should I run through all the possible inputs first, get an average error, and then change the weights? (But the weight changes depend on the output, so which output would I use then?)
I can share my code, if needed, but I'm pretty certain it's correct.
Please give me some advice? Thank you in advance.
So, you're saying you implemented a neural network on your own?
Well, in this case, each neuron in the input layer must be assigned a feature of a certain row; then just iterate through each layer and each neuron in that layer and calculate as instructed.
I'm sure you are familiar with the back-propagation algorithm, so you'll know when to stop.
Once you're done with that row, do the same with the next row: assign each feature to the input neurons and start the iterations again.
Once you're done with all records, that's an epoch.
I hope that answers your question.
Also, I would recommend trying out Keras; it's easy to use and a good tool to be experienced in.
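For reference, a minimal Keras sketch of the XOR setup with per-sample updates (batch size 1, so one epoch is one pass over all four rows); treat it as a starting point rather than a drop-in replacement for your code:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

model = keras.Sequential([
    keras.Input(shape=(2,)),
    layers.Dense(8, activation="tanh"),                # hidden layer; XOR needs one
    layers.Dense(1, activation="sigmoid"),
])
model.compile(loss="binary_crossentropy", optimizer=keras.optimizers.Adam(0.1))

# batch_size=1: the weights are updated after every single input/target pair
model.fit(X, y, epochs=500, batch_size=1, verbose=0)
print(model.predict(X, verbose=0).round(3))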

Neural Network, python

I am trying to write a simple neural network that can come up with weights for, say, the y=x function. Here's my code:
http://codepad.org/rPdZ7fOz
As you can see, the error level never really goes down much. I tried changing the momentum and learning rate but it did not help much. Is my number of input, hidden and output correct for what I want to do? If not, what should it be? If so, what else could be wrong?
You're attempting to train the network to give output values 1,2,3,4 as far as I understood. Yet, at the output you use a sigmoid (math.tanh(..)) whose values are always between -1 and 1.
So the output of your Neural network is always between -1 and 1 and thus you always get a large error when trying to fit output values outside that range.
(I just checked: when scaling your input and output values by 0.1, there is nice training progress, and at the end I get an error of 0.00025.)
The Neural Network you're using is useful if you want to do classification (e.g. assign the data point to class A if the NN output is < 0 or B if it is > 0). It looks like what you want to do is regression (fit a real-valued function).
You can remove the sigmoid at the output node but you will have to slightly modify your backpropagation procedure to take this into account.
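Since the linked code isn't reproduced here, this is only a generic numpy sketch of that change: with a linear output unit, the output delta is simply (prediction - target), without multiplying by the tanh derivative, which only remains in the hidden layer.

import numpy as np

# One tanh hidden layer, linear output: minimal backprop for regression
def train_step(x, t, W1, b1, W2, b2, lr=0.01):
    h = np.tanh(W1 @ x + b1)                      # hidden activation
    y = W2 @ h + b2                               # linear output, no squashing
    delta_out = y - t                             # no tanh derivative here
    delta_h = (W2.T @ delta_out) * (1 - h**2)     # tanh derivative stays in the hidden layer
    W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
    W1 -= lr * np.outer(delta_h, x);   b1 -= lr * delta_h
    return W1, b1, W2, b2

# Usage sketch for fitting y = x on the values 1..4 (scaling still helps training):
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(5, 1)), np.zeros(5)
W2, b2 = rng.normal(scale=0.5, size=(1, 5)), np.zeros(1)
for _ in range(2000):
    for v in (1.0, 2.0, 3.0, 4.0):
        W1, b1, W2, b2 = train_step(np.array([v]), np.array([v]), W1, b1, W2, b2)
print(W2 @ np.tanh(W1 @ np.array([3.0]) + b1) + b2)   # prediction for x=3, approaches 3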
