Simple questions about LSTM networks from Keras - python

Considering this LSTM based RNN:
# Instantiating the model
model = Sequential()
# Input layer
model.add(LSTM(30, activation="softsign", return_sequences=True, input_shape=(30, 1)))
# Hidden layers
model.add(LSTM(12, activation="softsign", return_sequences=True))
model.add(LSTM(12, activation="softsign", return_sequences=True))
# Final Hidden layer
model.add(LSTM(10, activation="softsign"))
# Output layer
model.add(Dense(10))
Is each output unit of the final hidden layer connected to each of the 12 output units of the preceding hidden layer? (10 * 12 = 120 connections)
Is each of the 10 outputs of the Dense layer connected to each of the 10 units of the final hidden layer? (10 * 10 = 100 connections)
Would there be a difference in terms of connections between the input layer and the 1st hidden layer if return_sequences was set to False (for both layers or for one)?
Thanks a lot for your help
Aymeric
Here is how I picture the RNN, please tell me if it's wrong:
Note about the picture:
X = one training example, i.e. a vector of 30 bitcoin (BTC) values (each value represents one day, 30 days total)
Output vector = 10 values that are supposed to be the next 10 values of bitcoin (the next 10 days)

Let's take a look at the model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm (LSTM) (None, 30, 30) 3840
_________________________________________________________________
lstm_1 (LSTM) (None, 30, 12) 2064
_________________________________________________________________
lstm_2 (LSTM) (None, 30, 12) 1200
_________________________________________________________________
lstm_3 (LSTM) (None, 10) 920
_________________________________________________________________
dense (Dense) (None, 10) 110
=================================================================
Total params: 8,134
Trainable params: 8,134
Non-trainable params: 0
_________________________________________________________________
Since you don't use return_sequences=True on the final LSTM layer, the default return_sequences=False applies, which means only the last time step's output of that layer is fed to the Dense layer.
Yes, but it is actually 110 connections, because each Dense unit also has a bias: (10 + 1) * 10 = 110.
There would not. The difference between return_sequences=True and return_sequences=False is that when it is set to False, only the output of the final time step is sent to the next layer. So with a time series of 30 steps, shaped (1, 30, 30), only the output from the 30th step is passed along. The computations are the same, so there is no difference in weights. Do note that you may run into shape mismatches if you simply set some of these flags to False out of the box.
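As a quick sanity check of the return_sequences behaviour, here is a minimal sketch (the 30-step input and the 12-unit layer sizes are taken from the model above; the random data is just a stand-in):
import numpy as np
from tensorflow import keras

x = np.random.rand(1, 30, 1).astype("float32")  # one sample, 30 time steps, 1 feature
seq = keras.layers.LSTM(12, return_sequences=True)
last = keras.layers.LSTM(12, return_sequences=False)
print(seq(x).shape)   # (1, 30, 12): one 12-dim output per time step
print(last(x).shape)  # (1, 12): only the last time step's output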

Related

Understand the summary of an LSTM model

I have the following LSTM model. Can somebody help me understand the summary of the model?
a) How are the param # values calculated?
b) We have no value?
c) Why is the param # next to the dropout layer 0?
model = Sequential()
model.add(LSTM(64, return_sequences=True, recurrent_regularizer=l2(0.0015),
               input_shape=(timesteps, input_dim)))
model.add(Dropout(0.5))
model.add(LSTM(64, recurrent_regularizer=l2(0.0015), input_shape=(timesteps, input_dim)))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(n_classes, activation='softmax'))
model.summary()
The values of timesteps, input_dim, and X_train are:
timesteps=100
input_dim= 6
X_train=1120
The summary is:
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 100, 64) 18176
_________________________________________________________________
dropout_1 (Dropout) (None, 100, 64) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 64) 33024
_________________________________________________________________
dense_1 (Dense) (None, 64) 4160
_________________________________________________________________
dense_2 (Dense) (None, 64) 4160
_________________________________________________________________
dense_3 (Dense) (None, 6) 390
=================================================================
Total params: 59,910
Trainable params: 59,910
Non-trainable params: 0
Part of your question is answered here.
https://datascience.stackexchange.com/questions/10615/number-of-parameters-in-an-lstm-model
Simply put, the reason there are so many parameters in an LSTM model is that each LSTM layer holds four sets of weights (one per gate plus the cell candidate), and each set connects both the input and the recurrent hidden state to every hidden unit, so the counts grow quickly with layer size.
Dropout layers don't have parameters because there are no weights in a dropout layer. All a dropout layer does is give each neuron a chance of being dropped during training. In this case, you've chosen 50%. Beyond that rate, there is nothing to configure in a dropout layer.
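To see the training/inference distinction (and the zero parameter count) directly, here is a quick demonstration using the standard tf.keras API, with the 0.5 rate from the question:
import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 8), dtype="float32")
print(drop(x, training=False).numpy())  # inference: input passes through unchanged
print(drop(x, training=True).numpy())   # training: ~half the values zeroed, the rest scaled by 1 / (1 - 0.5)
print(drop.count_params())              # 0: a dropout layer has no weights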
How are the parameters calculated?
The input dimension is 6 and there are 64 hidden neurons in the first LSTM layer.
At each time step t, the first LSTM layer effectively operates on the concatenation of the previous hidden state and the current input, so its effective input dimension is 70 [64 (hidden state at t-1) + 6 (input at t)].
Now the calculation part:
no. of params = (hidden state dimension + input dimension) * hidden units + bias
= [64 (hidden state dimension) + 6 (input dimension)] * 64 (hidden neurons) + 64 (one bias per hidden neuron)
= (64 + 6) * 64 + 64
= 4544 for one feed-forward network.
But an LSTM has 4 such feed-forward networks (the three gates plus the cell candidate), so simply multiply by 4:
Total trainable params = 4 * 4544 = 18176
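A small helper to verify this formula against the summary above (a sketch; the function name is mine):
def lstm_params(input_dim, units):
    # 4 gate networks, each with an input kernel, a recurrent kernel and a bias
    return 4 * (units * (input_dim + units + 1))

print(lstm_params(6, 64))   # 18176, matches lstm_1
print(lstm_params(64, 64))  # 33024, matches lstm_2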
Dropout layer does not have any parameters.
I am not sure which value you are talking about.

How do I know the correct format for my input data into my keras RNN?

I am trying to build an Elman simple RNN as described here.
I've built my model using Keras as follows:
model = keras.Sequential()
model.add(keras.layers.SimpleRNN(7, activation=None, use_bias=True,
                                 input_shape=[x_train.shape[0], x_train.shape[1]]))
model.add(keras.layers.Dense(7,activation = tf.nn.sigmoid))
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn_2 (SimpleRNN) (None, 7) 105
_________________________________________________________________
dense_2 (Dense) (None, 7) 56
=================================================================
Total params: 161
Trainable params: 161
Non-trainable params: 0
_________________________________________________________________
My training data is currently of shape (15000, 7, 7): 15000 instances of sequences of 7 one-hot encodings, each encoding one of seven letters, e.g. [0,0,0,1,0,0,0], [0,0,0,0,1,0,0] and so forth.
The data's labels have the same format, since each letter predicts the next letter in the sequence, i.e. [0,1,0,0,0,0,0] has the label [0,0,1,0,0,0,0].
So the training data (x_train) and training labels (y_train) are both shaped (15000, 7, 7).
My validation data x_val and y_val are shaped (10000, 7, 7), i.e. the same shape, just with fewer instances.
So when I run my model:
history = model.fit(x_train,
                    y_train,
                    epochs=40,
                    batch_size=512,
                    validation_data=(x_val, y_val))
I get the error:
ValueError: Error when checking input: expected simple_rnn_7_input to have shape (15000, 7) but got array with shape (7, 7)
Clearly my input data is formatted incorrectly for input into the Keras RNN, but I can't think how to feed it the correct input.
Could anyone advise me as to the solution?
The SimpleRNN layer expects inputs of shape (seq_length, input_dim), which is (7, 7) in your case.
Also, if you want an output at each time step, you need to set return_sequences=True, which is False by default. This way you can compare the output to the label at every time step.
So the model architecture will be something like this:
model = keras.Sequential()
model.add(keras.layers.SimpleRNN(7, activation='tanh',
                                 return_sequences=True,
                                 input_shape=[7, 7]))
model.add(keras.layers.Dense(7))
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
simple_rnn_12 (SimpleRNN) (None, 7, 7) 105
_________________________________________________________________
dense_2 (Dense) (None, 7, 7) 56
=================================================================
Total params: 161
Trainable params: 161
Non-trainable params: 0
_________________________________________________________________
Now at training time, it expects input and output data of shape (num_samples, seq_length, input_dim), i.e. (15000, 7, 7) for both.
model.compile(loss='categorical_crossentropy', optimizer='adam')  # use whichever loss you want
model.fit(x_train, y_train, epochs=2)
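For reference, here is a self-contained sketch of the whole pipeline with randomly generated one-hot data standing in for the real letters (the shapes come from the question; the random data, the rolled labels, and the softmax output are my additions):
import numpy as np
from tensorflow import keras

num_samples, seq_len, num_letters = 15000, 7, 7
idx = np.random.randint(num_letters, size=(num_samples, seq_len))
x_train = np.eye(num_letters, dtype="float32")[idx]   # (15000, 7, 7) one-hot sequences
y_train = np.roll(x_train, -1, axis=1)                # toy "next letter" labels

model = keras.Sequential([
    keras.layers.SimpleRNN(7, activation="tanh", return_sequences=True,
                           input_shape=(seq_len, num_letters)),
    keras.layers.Dense(num_letters, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
model.fit(x_train, y_train, epochs=2, batch_size=512)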

Multi Step Forecast LSTM model

I am trying to implement a multi step forecasting LSTM model in Keras. The shapes of data is like this:
X : (5831, 48, 1)
y : (5831, 1, 12)
The model that I am trying to use is:
power_in = Input(shape=(X.shape[1], X.shape[2]))
power_lstm = LSTM(50, recurrent_dropout=0.4128, dropout=0.412563,
                  kernel_initializer=power_lstm_init,
                  return_sequences=True)(power_in)
main_out = TimeDistributed(Dense(12, kernel_initializer=power_lstm_init))(power_lstm)
While trying to train the model like this:
hist = forecaster.fit([X], y, epochs=325, batch_size=16, validation_data=([X_valid], y_valid), verbose=1, shuffle=False)
I am getting the following error:
ValueError: Error when checking target: expected time_distributed_16 to have shape (48, 12) but got array with shape (1, 12)
How to fix this?
According to your comment:
[The] data i have is like t-48, t-47, t-46, ..... , t-1 as the past data and
t+1, t+2, ......, t+12 as the values that I want to forecast
you may not need to use a TimeDistributed layer at all:
First, just remove the return_sequences=True argument of the LSTM layer. After doing that, the LSTM layer will encode the input timeseries of the past into a vector of shape (50,). Now you can feed it directly to a Dense layer with 12 units:
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, LSTM, RepeatVector, TimeDistributed

# make sure the labels are in shape (num_samples, 12)
y = np.reshape(y, (-1, 12))

power_in = Input(shape=X.shape[1:])
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)
main_out = Dense(12, kernel_initializer=power_lstm_init)(power_lstm)
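To actually train this first variant, something along these lines should work (the loss and optimizer here are just example choices; X, y, X_valid, y_valid are as in the question):
model = Model(power_in, main_out)
model.compile(loss='mse', optimizer='adam')
hist = model.fit(X, y, epochs=325, batch_size=16,
                 validation_data=(X_valid, y_valid), verbose=1, shuffle=False)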
Alternatively, if you would like to use a TimeDistributed layer, and considering that the output is a sequence itself, we can explicitly enforce this temporal dependency in the model by using another LSTM layer before the Dense layer (with a RepeatVector layer added after the first LSTM layer to turn its output into a timeseries of length 12, i.e. the same length as the output timeseries):
# make sure the labels are in shape (num_samples, 12, 1)
y = np.reshape(y, (-1, 12, 1))

power_in = Input(shape=(48, 1))
power_lstm = LSTM(50, recurrent_dropout=0.4128,
                  dropout=0.412563,
                  kernel_initializer=power_lstm_init)(power_in)
rep = RepeatVector(12)(power_lstm)
out_lstm = LSTM(32, return_sequences=True)(rep)
main_out = TimeDistributed(Dense(1))(out_lstm)
model = Model(power_in, main_out)
model.summary()
Model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) (None, 48, 1) 0
_________________________________________________________________
lstm_3 (LSTM) (None, 50) 10400
_________________________________________________________________
repeat_vector_2 (RepeatVecto (None, 12, 50) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 12, 32) 10624
_________________________________________________________________
time_distributed_1 (TimeDist (None, 12, 1) 33
=================================================================
Total params: 21,057
Trainable params: 21,057
Non-trainable params: 0
_________________________________________________________________
Of course, in both models you may need to tune the hyper-parameters (e.g. number of LSTM layers, the dimension of LSTM layers, etc.) to be able to accurately compare them and achieve good results.
Side note: actually, in your scenario, you don't need to use TimeDistributed layer at all because (currently) Dense layer is applied on the last axis. Therefore, TimeDistributed(Dense(...)) and Dense(...) are equivalent.

Keras Dense layer's input is not flattened

This is my test code:
from keras import layers
input1 = layers.Input((2,3))
output = layers.Dense(4)(input1)
print(output)
The output is:
<tf.Tensor 'dense_2/add:0' shape=(?, 2, 4) dtype=float32>
But what happened here?
The documentation says:
Note: if the input to the layer has a rank greater than 2, then it is
flattened prior to the initial dot product with kernel.
So why does the output shape suggest the input was not flattened?
Currently, contrary to what is stated in the documentation, the Dense layer is applied on the last axis of the input tensor:
Contrary to the documentation, we don't actually flatten it. It's
applied on the last axis independently.
In other words, if a Dense layer with m units is applied on an input tensor of shape (n_dim1, n_dim2, ..., n_dimk) it would have an output shape of (n_dim1, n_dim2, ..., m).
As a side note: this makes TimeDistributed(Dense(...)) and Dense(...) equivalent to each other.
Another side note: be aware that this has the effect of shared weights. For example, consider this toy network:
model = Sequential()
model.add(Dense(10, input_shape=(20, 5)))
model.summary()
The model summary:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_1 (Dense) (None, 20, 10) 60
=================================================================
Total params: 60
Trainable params: 60
Non-trainable params: 0
_________________________________________________________________
As you can see, the Dense layer has only 60 parameters. How? Each unit in the Dense layer is connected to the 5 elements of each row in the input with the same weights, therefore 10 * 5 + 10 (one bias param per unit) = 60.
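A quick check of the shared-weights claim, and of the TimeDistributed equivalence mentioned above (standard Keras API; a sketch):
from keras.models import Sequential
from keras.layers import Dense, TimeDistributed

a = Sequential([Dense(10, input_shape=(20, 5))])
b = Sequential([TimeDistributed(Dense(10), input_shape=(20, 5))])
print(a.count_params(), b.count_params())  # 60 60: identical parameter counts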
Update: [a visual illustration of the example above was attached here]

keras access layer parameter of pre-trained model to freeze

I saved an LSTM with multiple layers. Now, I want to load it and just fine-tune the last LSTM layer. How can I target this layer and change its parameters?
Example of a simple model trained and saved:
model = Sequential()
# first layer #neurons
model.add(LSTM(100, return_sequences=True,
               input_shape=(X.shape[1], X.shape[2])))
model.add(LSTM(50, return_sequences=True))
model.add(LSTM(25))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
I can load and retrain it, but I can't find a way to target a specific layer and freeze all the other layers.
An easy solution would be to name each layer, i.e.
model.add(LSTM(50, return_sequences=True, name='2nd_lstm'))
Then, upon loading the model you can iterate over the layers and freeze the ones matching a name condition:
for layer in model.layers:
    if layer.name == '2nd_lstm':
        layer.trainable = False
Then you need to recompile your model for the changes to take effect, and afterwards you may resume training as usual.
If you have previously built and saved the model and now want to load it and fine-tune only the last LSTM layer, then you need to set the other layers' trainable property to False. First, find the name of the layer (or index of the layer by counting from zero starting from the top) by using model.summary() method. For example this is the output produced for one of my models:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) (None, 400, 16) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 400, 32) 4128
_________________________________________________________________
lstm_2 (LSTM) (None, 32) 8320
_________________________________________________________________
dense_2 (Dense) (None, 1) 33
=================================================================
Total params: 12,481
Trainable params: 12,481
Non-trainable params: 0
_________________________________________________________________
Then set the trainable property of all the layers except the LSTM layer to False.
Approach 1:
for layer in model.layers:
    if layer.name != 'lstm_2':
        layer.trainable = False
Approach 2:
for layer in model.layers:
    layer.trainable = False
model.layers[2].trainable = True  # set the LSTM to be trainable

# to make sure 2 is the index of the layer
print(model.layers[2].name)  # prints 'lstm_2'
Don't forget to compile the model again to apply these changes.
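Putting it together, a sketch of the full fine-tuning workflow (the file name is hypothetical, and 'lstm_2' stands for whatever name model.summary() reports for the layer you want to keep trainable):
from keras.models import load_model

model = load_model('my_model.h5')  # hypothetical path
for layer in model.layers:
    # keep only the targeted LSTM trainable, freeze everything else
    layer.trainable = (layer.name == 'lstm_2')
model.compile(loss='mae', optimizer='adam')  # recompile so the change takes effect
# ...then resume training as usual, e.g. model.fit(X, y, epochs=5)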
