Weights and Biases of an LSTM Layer in Python

I have developed an LSTM model with 1 LSTM layer and 3 dense layers, as shown below:
model = Sequential()
model.add(LSTM(units=120, activation='relu', return_sequences=False, input_shape=(train_in.shape[1], 5)))
model.add(Dense(100, activation='relu'))
model.add(Dense(50, activation='relu'))
model.add(Dense(1))
I have trained the model and obtained its weights and biases. The details are shown below.
w = model.get_weights()
w[0].shape, w[1].shape,w[2].shape,w[3].shape,w[4].shape,w[5].shape,w[6].shape,w[7].shape,w[8].shape
The output I got is,
((5, 480),(120, 480),(480,),(120, 100),(100,),(100, 50),(50,),(50, 1),(1,))
It has given two weight matrices of dimensions (5, 480) and (120, 480) and one bias vector of dimension (480,) corresponding to the LSTM layer; the others belong to the dense layers.
What I want to know is this: an LSTM has 4 layers. So how can I get the weights and biases of these 4 layers separately? Can I divide the total weights (5, 480) into 4 equal parts and consider the first 120 columns to correspond to the 1st layer of the LSTM, the next 120 to the 2nd layer, and so on?
Please share your thoughts on this, along with any good references.

An LSTM doesn't have 4 layers but 4 weight matrices due to its internal gate-cell structure. If this is confusing, it is helpful to read some resources on how an LSTM works. To summarize, the internals consist of 3 gates and 1 cell state which are used to calculate the final hidden state.
If you check the underlying implementation, you can see in which order they are concatenated:
[i, f, c, o]
# i: input gate weights (W_i)
# f: forget gate weights (W_f)
# c: cell (candidate) weights (W_c)
# o: output gate weights (W_o)
So, taking your bias tensor of shape (480,) as an example, you can divide it into 4 sub-tensors of size 120, where b[:120] holds the input gate bias, b[120:240] the forget gate bias, and so on. The kernel (5, 480) and the recurrent kernel (120, 480) split the same way along their last axis.
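As a minimal sketch (assuming w comes from model.get_weights() as in the question, with units = 120), the three LSTM tensors can be split along their last axis in that gate order:
import numpy as np

# w = model.get_weights() as in the question; units = 120
kernel, recurrent_kernel, bias = w[0], w[1], w[2]   # shapes (5, 480), (120, 480), (480,)

# split along the last axis in the order [i, f, c, o]
W_i, W_f, W_c, W_o = np.split(kernel, 4, axis=-1)            # each (5, 120)
U_i, U_f, U_c, U_o = np.split(recurrent_kernel, 4, axis=-1)  # each (120, 120)
b_i, b_f, b_c, b_o = np.split(bias, 4)                       # each (120,)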

Related

How to create multi-head convolutional layers merged as a temporal dimension in Keras

I want to implement a time-series prediction model which takes a window of non-image matrices as input. Each matrix should be processed by a Conv2D layer in the first layer, and the outputs of these conv layers should then be merged along the time dimension and passed to a recurrent layer such as an LSTM.
One way is the time-distribution technique, but a TimeDistributed layer applies the same layer to several inputs and produces one output per input. To get the result in time, the time-distribution technique shares the same weights among all convolution heads, which is not what I want: if you inject 5 matrices, the weights are not tweaked 5 times but only once, and are distributed to every block defined in the TimeDistributed layer. How can I avoid this and have independent convolutional heads whose outputs are merged as a time dimension for the next layer?
I have tried to implement it as follows:
Matrix_Dimention = 20
Input_Window = 4
Input_Matrixes = []
ConvLayers = []
for i in range(0, Input_Window):
    Inp_Matrix = layers.Input(shape=(Matrix_Dimention, Matrix_Dimention, 1))
    Input_Matrixes.append(Inp_Matrix)
    conv = layers.Conv2D(64, 5, activation='relu', input_shape=(Matrix_Dimention, Matrix_Dimention, 1))(Inp_Matrix)
    ConvLayers.append(conv)

# Temporal concatenation
Spatial_Layers_Concate = layers.Concatenate(ConvLayers)  # this causes error: Inputs to a layer should be tensors

# Temporal component
LSTM_Layer = layers.LSTM(activation='relu', return_sequences=False)(Spatial_Layers_Concate)
Model = keras.Model(Input_Matrixes, LSTM_Layer)
Model.compile(optimizer='adam', loss=keras.losses.MeanSquaredError)
It would be great if you could provide your answer by correcting my implementation, or provide your own if there is a better way to form this idea. Thanks.
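As a rough sketch of one possible approach (not a tested answer; it reuses the names from the code above and assumes each head's output can be flattened and given a time axis of length 1 before concatenation):
from tensorflow import keras
from tensorflow.keras import layers

Matrix_Dimention = 20
Input_Window = 4

Input_Matrixes = []
Head_Outputs = []
for i in range(Input_Window):
    Inp_Matrix = layers.Input(shape=(Matrix_Dimention, Matrix_Dimention, 1))
    Input_Matrixes.append(Inp_Matrix)
    # each head is a separate Conv2D layer, so the weights are not shared
    conv = layers.Conv2D(64, 5, activation='relu')(Inp_Matrix)
    flat = layers.Flatten()(conv)
    # add a time axis of length 1 so the heads can be stacked along time
    Head_Outputs.append(layers.Reshape((1, -1))(flat))

# Concatenate must be instantiated, then called on the list of tensors
Temporal_Sequence = layers.Concatenate(axis=1)(Head_Outputs)  # (batch, Input_Window, features)

LSTM_Layer = layers.LSTM(32, return_sequences=False)(Temporal_Sequence)
Output = layers.Dense(1)(LSTM_Layer)

Model = keras.Model(Input_Matrixes, Output)
Model.compile(optimizer='adam', loss='mse')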

Kernel and Recurrent Kernel in Keras LSTMs

I'm trying to picture the structure of LSTMs in my mind, and I don't understand what the kernel and recurrent kernel are. According to this post, in the LSTM section, the kernel is the four matrices that are multiplied by the inputs and the recurrent kernel is the four matrices that are multiplied by the hidden state, but what are those 4 matrices in this diagram?
Are they the gates?
I was testing with this app how the units variable of the code below affects the kernel, recurrent kernel, and bias:
model = Sequential()
model.add(LSTM(units=1, input_shape=(1, look_back)))
With look_back = 1 and units = 1, 2, and 3, the app shows different kernel, recurrent kernel, and bias shapes. Testing with these values I could deduce these expressions, but I don't know how this works inside. What do <1x(4u)> and <ux(4u)> mean? (u = units)
The kernels are basically the weights handled by the LSTM cell
units = neurons, as in a classic multilayer perceptron.
It is not shown in your diagram, but the input is a vector X with 1 or more values, and each value is sent to a neuron with its own weight w (which we are going to learn with backpropagation).
The four matrices are these (expressed as Wf, Wi, Wc, Wo):
When you add a neuron, you are adding another 4 weights/kernels.
So for your input vector X you have four matrices, and therefore:
1 * (4 * units) = kernel
Regarding the recurrent_kernel, you can find the answer here.
Basically, in Keras the input and hidden state are not concatenated as in the example diagrams (W[h_{t-1}, x_t]); they are split and handled with another four matrices called U:
Because there is one hidden state value per neuron, the weights U (all four of them) have:
units * (4 * units) = recurrent kernel
h_{t-1} comes, in a recurrent way, from all your neurons. As in a multilayer perceptron, each neuron's output goes to all the neurons of the next recurrent step.
source: http://colah.github.io/posts/2015-08-Understanding-LSTMs/
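As a quick check (a minimal sketch; the specific units and look_back values are just examples), the shapes reported by Keras match these formulas:
import tensorflow as tf

units, look_back = 3, 1
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(units, input_shape=(1, look_back)),
])
kernel, recurrent_kernel, bias = model.layers[0].get_weights()
print(kernel.shape)            # (look_back, 4 * units)  -> (1, 12)
print(recurrent_kernel.shape)  # (units, 4 * units)      -> (3, 12)
print(bias.shape)              # (4 * units,)             -> (12,)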

Tensorflow - building LSTM model - need for tf.keras.layers.Dense()

Python 3.7 tensorflow
I am experimenting with time series forecasting in TensorFlow.
I understand the second line creates an LSTM RNN, i.e. a recurrent neural network of type Long Short-Term Memory.
Why do we need to add a Dense(1) layer in the end?
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
Tutorial for Dense() says
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Could you rephrase or elaborate on the need for Dense() here?
The following line
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
creates an LSTM layer which transforms each input step of size #features into a latent representation of size 32. You want to predict a single value so you need to convert this latent representation of size 32 into a single value. Hence, you add the following line
single_step_model.add(tf.keras.layers.Dense(1))
which adds a Dense Layer (Fully-Connected Neural Network) with one neuron in the output which, obviously, produces a single value. Look at it as a way to transform an intermediate result of higher dimensionality into the final result.
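As an illustrative sketch (the data shape here is made up; only the two model lines come from the question), you can see the shape change that Dense(1) performs:
import numpy as np
import tensorflow as tf

# hypothetical data: 100 windows of 72 time steps with 5 features each
x_train_single = np.random.rand(100, 72, 5).astype("float32")

single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
print(single_step_model.output_shape)  # (None, 32) -- the latent representation

single_step_model.add(tf.keras.layers.Dense(1))
print(single_step_model.output_shape)  # (None, 1) -- a single predicted value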
In the time series forecasting tutorial you are following, they are trying to forecast temperature (6 hours ahead), for which they use an LSTM followed by a Dense layer.
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
A Dense layer is nothing but a regular fully-connected NN layer. In this case you are bringing the output dimensionality down to 1, which should have some (not necessarily linear) relationship to the temperature you are trying to predict. There are other layers you could use as well; check out Keras Layers.
If you are confused about the input and output shape of LSTM, check out
I/O Shape.

LSTM architecture in Keras implementation?

I am new to Keras and going through the LSTM and its implementation details in the Keras documentation. It was going smoothly, but then I came across this SO post and the comment on it. It has confused me about what the actual LSTM architecture is:
Here is the code:
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
model.add(Dense(2))
As per my understanding, 10 denotes the number of time steps, each of which is fed to its respective LSTM cell; 64 denotes the number of features per time step.
But the comment in the above post and the answer itself have confused me about the meaning of 32.
Also, how is the output from the LSTM connected to the Dense layer?
A hand-drawn diagrammatic explanation would be quite helpful in visualizing the architecture.
EDIT:
As far as this other SO post is concerned, it means 32 represents the length of the output vector produced by each of the LSTM cells if return_sequences=True.
If that's true, then how do we connect each of the 32-dimensional outputs produced by the 10 LSTM cells to the next dense layer?
Also, kindly tell me whether the first SO post's answer is ambiguous or not.
how do we connect each of 32-dimensional output produced by each of
the 10 LSTM cells to the next dense layer?
It depends on how you want to do it. Suppose you have:
model.add(LSTM(32, input_shape=(10, 64), return_sequences=True))
Then, the output of that layer has shape (10, 32). At this point, you can either use a Flatten layer to get a single vector with 320 components, or use a TimeDistributed to work on each of the 10 vectors independently:
model.add(TimeDistributed(Dense(15)))
The output shape of this layer is (10, 15), and the same weights are applied to the output of every LSTM unit.
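As a small sketch of the two options described above (the Dense sizes 15 and 2 are just the examples used here):
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense, Flatten, TimeDistributed

# option 1: flatten the (10, 32) sequence into one 320-dimensional vector
model_a = keras.Sequential([
    LSTM(32, input_shape=(10, 64), return_sequences=True),
    Flatten(),
    Dense(2),
])
print(model_a.output_shape)  # (None, 2)

# option 2: apply the same Dense(15) to each of the 10 time steps independently
model_b = keras.Sequential([
    LSTM(32, input_shape=(10, 64), return_sequences=True),
    TimeDistributed(Dense(15)),
])
print(model_b.output_shape)  # (None, 10, 15)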
It's easy to figure out the number of LSTM cells required for the input (specified by the number of timesteps).
How do you figure out the number of LSTM units required in the output?
You either get the output of the last LSTM cell (last timestep) or the output of every LSTM cell, depending on the value of return_sequences. As for the dimensionality of the output vector, that's just a choice you have to make, just like the size of a dense layer, or number of filters in a conv layer.
How does each 32-dimensional vector from the 10 LSTM cells get connected to the TimeDistributed layer?
Following the previous example, you would have a (10, 32) tensor, i.e. a size-32 vector for each of the 10 LSTM cells. What TimeDistributed(Dense(15)) does, is to create a (15, 32) weight matrix and a bias vector of size 15, and do:
for h_t in lstm_outputs:
    dense_outputs.append(
        activation(dense_weights.dot(h_t) + dense_bias)
    )
Hence, dense_outputs has size (10, 15), and the same weights were applied to every LSTM output, independently.
Note that everything still works when you don't know how many timesteps you need, e.g. for machine translation. In this case, you use None for the timestep; everything I wrote still applies, with the only difference being that the number of timesteps is no longer fixed. Keras will repeat the LSTM, TimeDistributed, etc. for as many timesteps as necessary (which depends on the input).
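As a minimal sketch of the variable-timestep case (sizes taken from the example above):
from tensorflow import keras
from tensorflow.keras.layers import LSTM, Dense, TimeDistributed

# None in the timestep position means the sequence length is not fixed
model = keras.Sequential([
    LSTM(32, input_shape=(None, 64), return_sequences=True),
    TimeDistributed(Dense(15)),
])
print(model.output_shape)  # (None, None, 15)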

Create a single perceptron with scikit-learn's MLPRegressor

I use sklearn.neural_network.MLPRegressor.
Do I understand it right that by choosing hidden_layer_sizes=(1,) I create a single perceptron, because the first "hidden layer" is nothing other than the neurons that learn from the input layer?
When you set hidden_layer_sizes=(1,) you create a network with 1 hidden layer of size 1 neuron. That means instead of a single-layer perceptron, which has no hidden layer, you create a multi-layer perceptron with 1 hidden layer containing 1 neuron.
you can read it from here: http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html
hidden_layer_sizes : tuple, length = n_layers - 2, default (100,)
n_layers - 2 means the value in hidden_layer_sizes does not include the first layer (the input layer) or the last layer (the output layer).
To create a single-layer perceptron, set it to an empty tuple: hidden_layer_sizes=()
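As a small sketch of the first case (the data here is made up just to show the weight-matrix shapes):
import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.random.rand(100, 3)   # hypothetical data: 100 samples, 3 features
y = X.sum(axis=1)

# one hidden layer with a single neuron (a multi-layer perceptron, not a single-layer one)
reg = MLPRegressor(hidden_layer_sizes=(1,), max_iter=2000)
reg.fit(X, y)

# coefs_ holds one weight matrix per layer-to-layer connection:
# (3, 1) from the 3 inputs to the hidden neuron, then (1, 1) from it to the output
print([c.shape for c in reg.coefs_])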
