LSTM forecasted a straight line - python

I built an LSTM in Keras. It reads observations of 9 time-lags, and predicts the next label. For some reason, the model I trained is predicting something that is nearly a straight line. What issue might there be in the model architecture that is creating such a bad regression result?
Input data: hourly financial time series with a clear upward trend, 1200+ records
Input Data Dimensions:
- originally:
X_train.shape (1212, 9)
- reshaped for LSTM:
Z_train.shape (1212, 1, 9)
array([[[0.45073171, 0.46783444, 0.46226164, ..., 0.47164819,
0.47649667, 0.46017738]],
[[0.46783444, 0.46226164, 0.4553289 , ..., 0.47649667,
0.46017738, 0.47167775]],
Target data: y_train
69200 0.471678
69140 0.476364
69080 0.467761
...
7055 0.924937
7017 0.923651
7003 0.906253
Name: Close, Length: 1212, dtype: float64
type(y_train)
<class 'pandas.core.series.Series'>
LSTM design:
my = Sequential()
my.add(LSTM((20),batch_input_shape=(None,1,9), return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(20, return_sequences=True))
my.add(LSTM(1))
Input layer of 9 nodes, 3 hidden layers of 20 units each, and 1 output layer of 1 unit.
The Keras default is return_sequences=False
The model is compiled with MSE loss and the Adam or SGD optimizer.
curr_model.compile(optimizer=optmfunc, loss="mse")
The model is fit in this manner. Batch size is 32, and shuffle can be True or False:
curr_model.fit(Z_train, y_train,
               validation_data=(Z_validation, y_validation),
               epochs=noepoch, verbose=0,
               batch_size=btchsize,
               shuffle=shufBOOL)
Config and Weights are saved to disk. Since I'm training several models, I load them afterward to test certain performance metrics.
spec_model.model.save_weights(mname_trn)
mkerascfg = spec_model.model.to_json()
with open(mname_cfg, "w") as json_file:
json_file.write(mkerascfg)
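For reference, loading them back looks roughly like this (a sketch, not my exact code; same file names as above, and the model has to be re-compiled before evaluating):
from keras.models import model_from_json

with open(mname_cfg, "r") as json_file:
    loaded_model = model_from_json(json_file.read())
loaded_model.load_weights(mname_trn)
loaded_model.compile(optimizer="adam", loss="mse")  # compile settings are not stored in the JSON config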
When I trained an MLP, I got this result against the validation set:
I've trained several of the LSTMs, but the result against the validation set looks like this:
The second plot (the LSTM plot) shows the validation data: y_validation versus predictions on Z_validation. These are the last 135 records of the respective arrays; they were split out of the full data as the validation set and have the same type/properties as Z_train and y_train. The x-axis is just the index 0 to 134, and the y-axis is the value of y_validation or the prediction. Both arrays are normalized, so all the units are the same. The "straight" line is the prediction.
What ideas could you suggest as to why this is happening?
- I've changed batch sizes. Similar result.
- I've tried changing the return_sequences, but it leads to various errors around shape for subsequent layers, etc.
Information about LSTM progression of MSE loss
There are 4 trained models, all with the same issue of course. We'll just focus on the 3-hidden-layer, 20-units-per-layer LSTM defined above. (Mini-batch size was 32 and shuffling was disabled, but enabling it changed nothing.)
This is a slightly zoomed-in image of the loss progression for the first model (Adam optimizer).
From what I can tell by inspecting the index, the bounce in the loss values (which creates the thick band) starts somewhere in the 500s of epochs.

Your code has a single critical problem: dimensionality shuffling. An LSTM expects inputs shaped as (batch_size, timesteps, channels) (i.e. (num_samples, timesteps, features)), whereas you're feeding it one timestep with nine channels. Backpropagation through time never even takes place.
Fix: reshape inputs as (1212, 9, 1).
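A minimal sketch of that fix, with the shapes from the question (assuming X_train is already a NumPy array):
import numpy as np

Z_train = X_train[..., np.newaxis]   # (1212, 9) -> (1212, 9, 1): 9 timesteps, 1 feature per step
# the first LSTM layer then takes batch_input_shape=(None, 9, 1) instead of (None, 1, 9)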
Suggestion: read this answer. It's long, but could save you hours of debugging; this information isn't available elsewhere in such a compact form, and I wish I'd had it when starting out with LSTMs.
The answer to a related question may also prove useful, but the previous link is more important.

OverLordGoldDragon is right: the problem is with the dimensionality of the input.
As you can see in the Keras documentation all recurrent layers expect the input to be a 3D tensor with shape: (batch_size, timesteps, input_dim).
In your case:
- the input has 9 time lags that need to be fed to the LSTM in sequence, so they are the timesteps
- the time series contains only one financial instrument, so the input_dim is 1
Hence, the correct way to reshape it is: (1212, 9, 1)
Also, make sure to respect the order in which data is fed to the LSTM. For forecasting problems it is better to feed the lags from the most ancient to the most recent, since we are going to predict the next value after the most recent.
Since the LSTM reads the input from left to right, the 9 values should be ordered as: x_t-9, x_t-8, ...., x_t-1 from left to right, i.e. the input and output tensors should look like this:
Z = [[[0], [1], [2], [3], [4], [5], [6], [7], [8]],
[[1], [2], [3], [4], [5], [6], [7], [8], [9]],
...
]
y = [9, 10, ...]
If they are not oriented as such you can always set the LSTM flag go_backwards=True to have the LSTM read from right to left.
Also, make sure to pass numpy arrays and not pandas series as X and y as Keras sometimes gets confused by Pandas.
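For example, a small sketch using the variable names from your question:
import numpy as np

Z_train = np.asarray(Z_train, dtype="float32")   # ensure a plain NumPy array
y_train = y_train.to_numpy(dtype="float32")      # y_train is a pandas Series in your post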
For a full example of doing time series forecasting with Keras take a look at this notebook

Related

How to work with samples of variable length using keras for an RNN?

Background
Hello everyone,
I'm working on (what I thought would be) a simple RNN using Google Colab [Tensorflow 2.9.2 and Keras 2.9.0]. I've been working through this for a while now, but I can't quite seem to get everything to play nice. The inputs to my RNN are sequences of the numbers 0 ~ 6 inclusive expressed as one-hot-encoded column vectors. The targets are just a single 0 ~ 6 value expressed as a one-hot-encoded row vector.
This link to a screenshot of my Colab describes...
Input of [0] -> Target of 6
Input of [0, 6] -> Target of 0
Input of [0, 6, 0, 3] -> Target of 0
Input of [0, 6, 0, 3, 0] -> Target of 5
From what I've been able to gather from other stackoverflow questions, blog posts, keras documentation, etc., the code below should be close to all I need for my use case as far as my model is concerned.
# Building RNN Model
model = None
model = keras.Sequential()
model.add(keras.Input((None, 7)))
model.add(layers.Bidirectional(layers.LSTM(32)))
model.add(layers.Dense(7, activation='softmax'))
model.summary()
# Compiling RNN Model
model.compile(
    loss=keras.losses.CategoricalCrossentropy(),
    optimizer="sgd",
    metrics=["accuracy"],
)
The Problem
I'm very sure that my issue is related to every sample input being a vector or matrix of a different size. For example, a sequence of [0] would become a (7, 1) vector input for that particular timestep while [0, 5, 4, 1, 2, 3] would become a (7, 6) matrix input for its respective timestep. Based on the error messages I've received for the last several hours, I know keras isn't too pleased with that, but for what I'm trying to do, I'm not entirely sure of the best way forward.
I've manually split up my training and test sets.
( Image of code with output )
For clarity...
x_train and x_test -> A list of numpy arrays each with a variable column count (e.g., np.shape=(7, ???))
y_train and y_test -> A list of numpy arrays each with a constant size (e.g., np.shape=(1,7))
I'm quite sure my types are correct.
I'm fitting my model without anything extravagant.
# Fitting the RNN Model
model.fit(
    x_train, y_train, validation_data=(x_test, y_test), epochs=50
)
That said, I continue to receive a Value Error saying that "Layer Sequential_??? expects 1 input(s), but it received ??? input tensors."
( Image of Value Error )
Any help at all in this matter would be greatly appreciated!
Thank you all in advance!
As it turns out, I needed a masking layer for what I was trying to do.
Rather, I believe a masking layer solved two underlying problems...
My dimensions not being precisely what Keras wanted, [samples, timesteps, features]
Not knowing how to make my input structure work when every sample had a variable timestep value. I can't shake the feeling that there's still a way to do it without padding, but my model can now 'fit' for the first time in hours. So, I'm going to call this success on some level.
Link to Embeddings and Masking
*With a special shoutout to keras.preprocessing.sequence.pad_sequences()
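Roughly, the combination that got things fitting looks like this (an illustrative sketch, not my exact code; it assumes each sample is first transposed from (7, timesteps) to (timesteps, 7)):
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Pad the variable-length one-hot sequences to a common length along the time axis
x_train_padded = keras.preprocessing.sequence.pad_sequences(
    [x.T for x in x_train], padding="post", value=0.0, dtype="float32"
)  # -> (num_samples, max_timesteps, 7)
y_train_arr = np.stack(y_train).reshape(-1, 7)

model = keras.Sequential()
model.add(keras.Input((None, 7)))
model.add(layers.Masking(mask_value=0.0))           # skip the all-zero padded timesteps
model.add(layers.Bidirectional(layers.LSTM(32)))
model.add(layers.Dense(7, activation='softmax'))
model.compile(loss=keras.losses.CategoricalCrossentropy(),
              optimizer="sgd", metrics=["accuracy"])
model.fit(x_train_padded, y_train_arr, epochs=50)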

Keras LSTM trained with masking and custom loss function breaks after first iteration

I am attempting to train an LSTM that reads a variable-length input sequence and has a custom loss function applied to it. In order to be able to train on batches, I pad my inputs to all be the maximum length.
My input data is a float tensor of shape (7789, 491, 11) where the form is (num_samples, max_sequence_length, dimension).
Any sample that is shorter than the maximum length I pad with -float('inf'), so a sequence with 10 values would start with 481 sets of 11 '-inf' values followed by the real values at the end.
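Concretely, one padded sample is built roughly like this (illustrative):
import numpy as np

max_len, dim = 491, 11
seq = np.random.rand(10, dim).astype("float32")                   # a real sequence of length 10
pad = np.full((max_len - len(seq), dim), -float('inf'), dtype="float32")
padded = np.concatenate([pad, seq], axis=0)                       # shape (491, 11), -inf rows first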
The way I am attempting to evaluate this model doesn't fit into any standard loss functions, so I had to make my own. I've tested it and it performs as expected on sample tensors. I don't believe this is the source of the issue so I won't go into details, but I could be wrong.
The problem I'm having comes from the model itself. Here is how I define and train it:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Masking(mask_value=-float('inf'),
input_shape=(train_X.shape[1], train_X.shape[2])))
model.add(tf.keras.layers.LSTM(32))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dense(30,
          kernel_initializer=tf.keras.initializers.zeros()))
model.add(tf.keras.layers.Reshape((3, 10)))
model.compile(loss=batched_custom_loss, optimizer='rmsprop', run_eagerly=True)
model.fit(x=train_X, y=train_y, validation_data=val, epochs=5, batch_size=32)
No errors are thrown when I try to fit the model, but it only works on the first batch of training. As soon as the second batch starts, the loss becomes 'nan'. Upon closer inspection, it seems like the LSTM layer is outputting 'nan' after the first epoch of training.
My two guesses for what is going on are:
I set up the masking layer wrong, and it for some reason fails to mask out all of the -inf values after the first training iteration. Thus, -inf gets passed through the LSTM and it goes haywire.
I did something wrong with the format of my loss function, and the when the optimizer applies my calculated loss to the model it ruins the weights of the LSTM. For reference, my loss function outputs a 1D tensor with length equal to the number of samples in the batch. Each item in the output is a float with the loss of that sample.
I know that the math in my loss function is good since I've tested it on sample data, but maybe the output format is wrong even though it seems to match what I've found online.
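For reference, the loss follows the usual Keras custom-loss signature, something like this (a simplified placeholder, not my actual loss math):
import tensorflow as tf

def batched_custom_loss(y_true, y_pred):
    # y_true, y_pred have shape (batch_size, 3, 10); placeholder math only
    per_sample = tf.reduce_mean(tf.square(y_true - y_pred), axis=[1, 2])
    return per_sample  # shape (batch_size,): one loss value per sample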
Let me know if the problem is obvious from what I've shown or if you need more information.

How to deal with variable length output units in output dense layer?

I am using Keras to build my architecture.
The regression problem I am trying to solve has different outputs for different training samples.
For instance, for the 1st training sample I have the output [16, 3], while for the 2nd training sample it is [6]. I am unable to find a solution for how to set the number of units in the output dense layer based on this type of output. You can interpret the output as y_train having shape [no. of samples, columns (depending on how many outputs a specific training sample has)].
I have tried to fetch each individual target from y_train so that I could use its length as the number of units in the output dense layer; e.g., for the 1st training sample, whose target is [16, 3], I am planning to set the output dense layer's units to 2, and so on. But I don't even know how to fetch this value and assign it to the output dense layer's units.
Can anybody help me with this variable-length output problem?

LSTM architecture in Keras implementation?

I am new to Keras and going through the LSTM and its implementation details in the Keras documentation. It was going smoothly, but then I came across this SO post and its comment. It has confused me about what the actual LSTM architecture is:
Here is the code:
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
model.add(Dense(2))
As per my understanding, 10 denotes the number of time-steps, and each one of them is fed to its respective LSTM cell; 64 denotes the number of features for each time-step.
But, the comment in the above post and the actual answer has confused me about the meaning of 32.
Also, how is the output from the LSTM connected to the Dense layer?
A hand-drawn diagrammatic explanation would be quite helpful in visualizing the architecture.
EDIT:
As far as this another SO post is concerned, then it means 32 represents the length of the output vector that is produced by each of the LSTM cells if return_sequences=True.
If that's true, then how do we connect each of the 32-dimensional outputs produced by each of the 10 LSTM cells to the next dense layer?
Also, kindly tell me whether the first SO post's answer is ambiguous or not.
how do we connect each of the 32-dimensional outputs produced by each of the 10 LSTM cells to the next dense layer?
It depends on how you want to do it. Suppose you have:
model.add(LSTM(32, input_shape=(10, 64), return_sequences=True))
Then, the output of that layer has shape (10, 32). At this point, you can either use a Flatten layer to get a single vector with 320 components, or use a TimeDistributed to work on each of the 10 vectors independently:
model.add(TimeDistributed(Dense(15)))
The output shape of this layer is (10, 15), and the same weights are applied to the output of every LSTM unit.
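Put together, a minimal sketch of both options (illustrative layer sizes taken from above):
from keras.models import Sequential
from keras.layers import LSTM, Dense, Flatten, TimeDistributed

# Option 1: flatten the (10, 32) sequence output into a single 320-dim vector
m1 = Sequential()
m1.add(LSTM(32, input_shape=(10, 64), return_sequences=True))
m1.add(Flatten())
m1.add(Dense(2))                      # output shape (None, 2)

# Option 2: apply the same Dense(15) to each of the 10 timesteps
m2 = Sequential()
m2.add(LSTM(32, input_shape=(10, 64), return_sequences=True))
m2.add(TimeDistributed(Dense(15)))    # output shape (None, 10, 15)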
it's easy to figure out the no. of LSTM cells required for the input (specified in the timespan)
How to figure out the no. of LSTM units required in the output?
You either get the output of the last LSTM cell (last timestep) or the output of every LSTM cell, depending on the value of return_sequences. As for the dimensionality of the output vector, that's just a choice you have to make, just like the size of a dense layer, or number of filters in a conv layer.
how each of the 32-dim vector from the 10 LSTM cells get connected to TimeDistributed layer?
Following the previous example, you would have a (10, 32) tensor, i.e. a size-32 vector for each of the 10 LSTM cells. What TimeDistributed(Dense(15)) does is create a (15, 32) weight matrix and a bias vector of size 15, and do:
for h_t in lstm_outputs:
    dense_outputs.append(
        activation(dense_weights.dot(h_t) + dense_bias)
    )
Hence, dense_outputs has size (10, 15), and the same weights were applied to every LSTM output, independently.
Note that everything still works when you don't know how many timesteps you need, e.g. for machine translation. In this case, you use None for the timestep dimension; everything that I wrote still applies, with the only difference that the number of timesteps is no longer fixed. Keras will repeat LSTM, TimeDistributed, etc. for as many times as necessary (which depends on the input).
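For example, a sketch with a variable number of timesteps:
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(None, 64), return_sequences=True))  # None = any number of timesteps
model.add(TimeDistributed(Dense(15)))                               # output shape (None, None, 15)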

How to train a LSTM model with different N-dimensions labels?

I am using keras (ver. 2.0.6 with TensorFlow backend) for a simple neural network:
model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(100, 5)))
model.add(LSTM(32, return_sequences=True))
model.add(TimeDistributed(Dense(5)))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='rmsprop',
              metrics=['accuracy'])
This is only a test for me; I am "training" the model with the following dummy data.
x_train = np.array([
[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
[[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
[[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
[[0,0,1,0,0], [1,0,0,0,0], [1,0,0,0,0]],
[[0,0,0,1,0], [0,0,0,0,1], [0,1,0,0,0]],
[[0,0,0,0,1], [0,0,0,0,1], [0,0,0,0,1]]
])
y_train = np.array([
[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]],
[[1,0,0,0,0], [0,1,0,0,0], [0,0,1,0,0]],
[[0,1,0,0,0], [0,0,1,0,0], [0,0,0,1,0]],
[[1,0,0,0,0], [1,0,0,0,0], [1,0,0,0,0]],
[[1,0,0,0,0], [0,0,0,0,1], [0,1,0,0,0]],
[[1,0,0,0,0], [0,0,0,0,1], [0,0,0,0,1]]
])
Then I do:
model.fit(x_train, y_train, batch_size=2, epochs=50, shuffle=False)
print(model.predict(x_train))
The result is:
[[[ 0.11855114 0.13603994 0.21069065 0.28492314 0.24979511]
[ 0.03013871 0.04114409 0.16499813 0.41659597 0.34712321]
[ 0.00194826 0.00351031 0.06993906 0.52274817 0.40185428]]
[[ 0.17915446 0.19629011 0.21316603 0.22450975 0.18687972]
[ 0.17935558 0.1994358 0.22070852 0.2309722 0.16952793]
[ 0.18571526 0.20774922 0.22724937 0.23079531 0.14849086]]
[[ 0.11163659 0.13263632 0.20109797 0.28029731 0.27433187]
[ 0.02216373 0.03424517 0.13683401 0.38068131 0.42607573]
[ 0.00105937 0.0023865 0.0521594 0.43946937 0.50492537]]
[[ 0.13276921 0.15531689 0.21852671 0.25823513 0.23515201]
[ 0.05750636 0.08210614 0.22636817 0.3303588 0.30366054]
[ 0.01128351 0.02332032 0.210263 0.3951444 0.35998878]]
[[ 0.15303896 0.18197381 0.21823004 0.23647803 0.21027911]
[ 0.10842207 0.15755147 0.23791778 0.26479205 0.23131666]
[ 0.06472684 0.12843341 0.26680911 0.28923658 0.25079405]]
[[ 0.19560908 0.20663913 0.21954383 0.21920268 0.15900527]
[ 0.22829761 0.22907974 0.22933882 0.20822221 0.10506159]
[ 0.27179539 0.25587022 0.22594844 0.18308094 0.063305 ]]]
OK, it works, but it is just a test; I really do not care about accuracy, etc. I would like to understand how I can work with outputs of different sizes.
For example: passing a sequence (numpy.array) like:
[[0,0,0,0,1], [0,0,0,1,0], [0,0,1,0,0]]
I would like to get an output of 4 elements as the prediction:
[[..first..], [..second..], [..third..], [..four..]]
Is that possible somehow? The size could vary; I would train the model with different labels that can have different N dimensions.
Thanks
This answer is for non-varying dimensions, but for varying dimensions, the padding idea in Giuseppe's answer seems the way to go, maybe with the help of the "Masking" proposed in the Keras documentation.
The output shape in Keras is totally dependent on the number of "units/neurons/cells" you put in the last layer, and of course, on the type of layer.
I can see that your data does not match the code in your question (the shapes are incompatible), but let's suppose your code is right and forget the data for a while.
An input shape of (100, 5) in an LSTM layer means a tensor of shape (None, 100, 5), where:
- None is the batch size. The first dimension of your data is reserved for the number of examples you have (X and Y must have the same number of examples).
- Each example is a sequence with 100 time steps.
- Each time step is a 5-dimension vector.
And the 32 cells in this same LSTM layer mean that the resulting vectors will change from 5-dimension to 32-dimension vectors. With return_sequences=True, all 100 timesteps will appear in the result. So the result shape of the first layer is (None, 100, 32):
- Same number of examples (this will never change along the model)
- Still 100 timesteps per example (because return_sequences=True)
- Each time step is a 32-dimension vector (because of 32 cells)
Now the second LSTM layer does exactly the same thing: it keeps the 100 timesteps, and since it also has 32 cells, it keeps the 32-dimension vectors, so the output is also (None, 100, 32).
Finally, the time-distributed Dense layer will also keep the 100 timesteps (because of TimeDistributed) and change your vectors to 5-dimension vectors again (because of the 5 units), resulting in (None, 100, 5).
As you can see, you cannot change the number of timesteps directly with recurrent layers; you need to use other layers to change these dimensions. And the way to do this is completely up to you; there are infinite ways of doing it.
But in all of them, you need to break free of the timesteps and rebuild the data with another shape.
Suggestion
A suggestion from me (which is just one possibility) is to reshape your result and apply another dense layer just to achieve the final shape expected.
Suppose you want a result like (None, 4, 5) (never forget, the first dimension of your data is the number of examples, it can be any number, but you must take it into account when you organize your data). We can achieve this by reshaping the data to a shape containing 4 in the second dimension:
#after the Dense layer:
model.add(Reshape((4, 125))) #the batch size doesn't appear here,
#just make sure you have 500 elements, which is 100*5 = 4*125
model.add(TimeDistributed(Dense(5)))
#this layer could also be model.add(LSTM(5, return_sequences=True)), for instance
#continue to the "Activation" layer
This will give you 4 timesteps (because the shape after Reshape is (None, 4, 125)), each step being a 5-dimension vector (because of Dense(5)).
Use the model.summary() command to see the shapes outputted by each layer.
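Putting the whole suggestion together, so the shapes can be checked, would look roughly like this (a sketch following the layer sizes above):
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense, Reshape, Activation

model = Sequential()
model.add(LSTM(32, return_sequences=True, input_shape=(100, 5)))  # (None, 100, 32)
model.add(LSTM(32, return_sequences=True))                        # (None, 100, 32)
model.add(TimeDistributed(Dense(5)))                              # (None, 100, 5) = 500 values per example
model.add(Reshape((4, 125)))                                      # (None, 4, 125)
model.add(TimeDistributed(Dense(5)))                              # (None, 4, 5)
model.add(Activation('softmax'))
model.summary()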
I don't know Keras but from a practical and theoretical point of view this is absolutely possible.
The idea is that you have an input sequence and an output sequence. Commonly, the beginning and the end of each sequence are delimited by some special symbol (e.g. the character sequence "cat" is translated into "^cat#" with a start symbol "^" and an end symbol "#"). Then the sequence is padded with another special symbol, up to a maximum sequence length (e.g. "^cat#$$$$$$" with a padding symbol "$").
If the padding symbol corresponds to a zero-vector, it will have no impact on your training.
Your output sequence could now assume any length up to the maximum one, because the real length is the one from the start to the end symbol positions.
In other words, you will always have the same input and output sequence length (i.e. the maximum one), but the real length is that between the start and the end symbols.
(Obviously, in the output sequence, anything after the end symbol should not be considered in the loss function)
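As a toy illustration of the delimit-and-pad idea (plain Python, not Keras-specific):
# Delimit a character sequence with start/end symbols, then pad to a fixed length
def delimit_and_pad(seq, max_len, start="^", end="#", pad="$"):
    s = start + seq + end
    return s + pad * (max_len - len(s))

print(delimit_and_pad("cat", 11))   # -> "^cat#$$$$$$"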
There seem to be two methods for the sequence-to-sequence approach you're describing. The first uses Keras directly, as in this example (code below):
from keras.layers import Input, LSTM, RepeatVector
from keras.models import Model
inputs = Input(shape=(timesteps, input_dim))
encoded = LSTM(latent_dim)(inputs)
decoded = RepeatVector(timesteps)(encoded)
decoded = LSTM(input_dim, return_sequences=True)(decoded)
sequence_autoencoder = Model(inputs, decoded)
encoder = Model(inputs, encoded)
Here the RepeatVector repeats the encoded representation n times to match the output's number of timesteps. This still means you need a fixed number of timesteps in your output vector; however, there may be a way to pad vectors that have fewer timesteps than your maximum number of timesteps.
Or you can use the seq2seq module, which is built on top of Keras.
