Keras Bidirectional LSTM - Layer grouping - python

While working to implement a paper (Dialogue Act Sequence Labeling using Hierarchical encoder with CRF) using Keras, I need to implement a specific Bidirectional LSTM architecture.
I have to train the network on the concept of a Conversation. Conversations are composed of Utterances, and Utterances are composed of Words. Words are N-dimensional vectors. The model represented in the paper first reduces each Utterance to a single M-dimensional vector. To achieve this, it uses a Bidirectional LSTM layer. Let's call this layer A.
(For simplicity, let's assume that each Utterance has a length of |U| and each Conversation has a length of |C|)
Each Utterance is input to a Bi-LSTM layer with |U| timesteps, and the output of the last timestep is taken. The input size is (|U|, N), and the output size is (1, M).
This Bi-LSTM layer should be applied separately/simultaneously to each Utterance in the Conversation. Note that, since the network takes as input the entire Conversation, the dimensions for a single input to the network would be (|C|, |U|, N).
As the paper describes, I intend to take each Utterance of that input (i.e. each (|U|, N) slice) and feed it to a Bi-LSTM layer with |U| units. Since there are |C| Utterances in a Conversation, this implies a total of |C| x |U| Bi-LSTM units, grouped into |C| partitions, one per Utterance. There should be no connection between the |C| groups of units. Once processed, the output of each of those |C| groups of Bidirectional LSTM units will then be fed into another Bi-LSTM layer, say B.
How is it possible to feed specific portions of the input only to specific portions of the layer A, and make sure that they are not interconnected? (i.e. the portion of Bi-LSTM units used for an Utterance should not be connected to the Bi-LSTM units used for another Utterance)
Is it possible to achieve this through keras.models.Sequential, or is there a specific way to achieve this using Functional API?
Here is what I have tried so far:
# ...
model = Sequential()
model.add(Bidirectional(LSTM(C * U), input_shape=(C, U, N),
                        merge_mode='concat'))
model.add(GlobalMaxPooling1D())
model.add(Bidirectional(LSTM(n, return_sequences=True), merge_mode='concat'))
# ...
model.compile(loss=loss_function,
              optimizer=optimizer,
              metrics=['accuracy'])
However, this code currently raises the following error:
ValueError: Input 0 is incompatible with layer bidirectional_1: expected ndim=3, found ndim=4
More importantly, the code above obviously does not do the grouping I mentioned. I am looking for a way to enhance the model as I described above.
Finally, below is the figure of the model I described above; it may help clarify the description. The layer tagged as "Utterance layer" is what I called layer A. As you can see in the figure, each utterance u_i is composed of words w_j, which are N-dimensional vectors. (You may omit the embedding layer for the purposes of this question.) Assuming, for simplicity, that each u_i has an equal number of Words, each group of Bidirectional LSTM nodes in the Utterance Layer will have an input size of (|U|, N). Yet, since there are |C| such utterances u_i in a Conversation, the dimensions of the entire input will be (|C|, |U|, N).

I'll create a net for what I see in the picture. For now I'm ignoring the "units" part I mentioned in my comment to your question.
This model does exactly what is shown in the picture. All utterances are completely separate from start to end.
from keras.models import Sequential
from keras.layers import Embedding, Bidirectional, LSTM, TimeDistributed, Dense

model = Sequential()

#You have an extra time dimension that should be kept as is
#So we add a TimeDistributed wrapper to the first layers
model.add(TimeDistributed(Embedding(dictionaryLength, N), input_shape=(C, U)))

#This is the utterance layer. It works in "word steps", keeping "utterance steps" untouched
model.add(TimeDistributed(Bidirectional(LSTM(M//2, return_sequences=False))))

#Is the pooling really demanded by the article?
#Or was it an attempt to remove one of the time dimensions?
#Not adding it here because I used `return_sequences=False`
model.add(Bidirectional(LSTM(someSize//2, return_sequences=True)))
model.add(Dense(anotherSize)) #is this a CRF layer???

model.summary()
Notice that in every Bidirectional layer, I divided the output size by two, so it's important that M and someSize are even numbers.
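If you prefer the Functional API over Sequential, the same grouping can be expressed as below. This is only a minimal sketch: the concrete sizes are placeholder numbers I picked so the snippet runs, and you would substitute your own.
import keras
from keras.models import Model
from keras.layers import Input, Embedding, Bidirectional, LSTM, TimeDistributed, Dense

# hypothetical sizes, just to make the sketch concrete
C, U, N, M = 25, 20, 50, 64
dictionaryLength, someSize, anotherSize = 10000, 128, 10

conversation = Input(shape=(C, U))                                 # |C| utterances of |U| word indices
x = Embedding(dictionaryLength, N)(conversation)                   # -> (C, U, N)
x = TimeDistributed(Bidirectional(LSTM(M // 2)))(x)                # layer A: one size-M vector per utterance -> (C, M)
x = Bidirectional(LSTM(someSize // 2, return_sequences=True))(x)   # layer B over the conversation -> (C, someSize)
out = Dense(anotherSize)(x)                                        # per-utterance output (CRF omitted)

model = Model(conversation, out)
model.summary()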

Related

How to create multi-head convolutional layers merging as a temporal dimension in Keras

I want to implement a time-series prediction model that takes a window of non-image matrices as input. Each matrix should be processed by its own Conv2D layer in the first layer, and the outputs of these conv layers should then be merged along a time dimension and passed on to a recurrent layer such as an LSTM.
One way is the time-distribution technique, but a TimeDistributed layer applies the same layer to several inputs and produces one output per input to obtain the result in time. That means the weights are shared among all convolution heads, which is not what I want: for example, if you inject 5 matrices, the weights are not tweaked 5 times but only once, and then distributed to every block defined in the TimeDistributed layer. How can I avoid this and have independent convolutional heads whose outputs are merged as a time dimension for the next layer?
I have tried to implement it as follows:
Matrix_Dimention = 20
Input_Window = 4

Input_Matrixes = []
ConvLayers = []
for i in range(0, Input_Window):
    Inp_Matrix = layers.Input(shape=(Matrix_Dimention, Matrix_Dimention, 1))
    Input_Matrixes.append(Inp_Matrix)
    conv = layers.Conv2D(64, 5, activation='relu',
                         input_shape=(Matrix_Dimention, Matrix_Dimention, 1))(Inp_Matrix)
    ConvLayers.append(conv)

# Temporal Concatenation
Spatial_Layers_Concate = layers.Concatenate(ConvLayers)  # this causes error: Inputs to a layer should be tensors

# Temporal Component
LSTM_Layer = layers.LSTM(activation='relu', return_sequences=False)(Spatial_Layers_Concate)

Model = keras.Model(Input_Matrixes, LSTM_Layer)
Model.compile(optimizer='adam', loss=keras.losses.MeanSquaredError)
It would be great if you could provide your answer by correcting my implementation, or provide your own if there is a better way to express this idea. Thanks.
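One possible way to keep the convolutional heads independent is to give each head its own Conv2D instance, flatten each head's output, add a time axis of length 1, and concatenate the heads along that axis before the LSTM. Below is a minimal sketch under those assumptions; the 64 LSTM units and the Dense(1) output head are arbitrary choices, not something from the question.
from tensorflow import keras
from tensorflow.keras import layers

Matrix_Dimention = 20
Input_Window = 4

inputs, conv_steps = [], []
for i in range(Input_Window):
    inp = layers.Input(shape=(Matrix_Dimention, Matrix_Dimention, 1))
    inputs.append(inp)
    conv = layers.Conv2D(64, 5, activation='relu')(inp)   # a separate Conv2D per head -> independent weights
    feat = layers.Flatten()(conv)
    conv_steps.append(layers.Reshape((1, -1))(feat))      # add a time axis of length 1

sequence = layers.Concatenate(axis=1)(conv_steps)          # -> (batch, Input_Window, features)

lstm_out = layers.LSTM(64, activation='relu')(sequence)    # 64 units is an arbitrary choice
output = layers.Dense(1)(lstm_out)                         # single-value forecast; adjust as needed

model = keras.Model(inputs, output)
model.compile(optimizer='adam', loss=keras.losses.MeanSquaredError())
model.summary()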

Tensorflow - building LSTM model - need for tf.keras.layers.Dense()

Python 3.7 tensorflow
I am experimenting with time-series forecasting in TensorFlow.
I understand the second line creates an LSTM RNN, i.e. a Recurrent Neural Network of type Long Short-Term Memory.
Why do we need to add a Dense(1) layer at the end?
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
The tutorial for Dense() says:
Dense implements the operation: output = activation(dot(input, kernel) + bias) where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).
Could you rephrase or elaborate on the need for Dense() here?
The following line
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
creates an LSTM layer which transforms each input step of size #features into a latent representation of size 32. You want to predict a single value so you need to convert this latent representation of size 32 into a single value. Hence, you add the following line
single_step_model.add(tf.keras.layers.Dense(1))
which adds a Dense Layer (Fully-Connected Neural Network) with one neuron in the output which, obviously, produces a single value. Look at it as a way to transform an intermediate result of higher dimensionality into the final result.
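To see the shape bookkeeping concretely, here is a minimal sketch; the 120 timesteps and 14 features are made-up stand-ins for x_train_single.shape[-2:].
import tensorflow as tf

timesteps, features = 120, 14   # stand-in for x_train_single.shape[-2:]

single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=(timesteps, features)))   # -> (None, 32)
single_step_model.add(tf.keras.layers.Dense(1))                                      # -> (None, 1)
single_step_model.summary()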
Well, in the tutorial you are following (Time series forecasting), they are trying to forecast temperature (6 hrs ahead), for which they use an LSTM followed by a Dense layer.
single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32, input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))
A Dense layer is nothing but a regular fully-connected NN layer. In this case you are bringing the output dimensionality down to 1, which should bear some relationship (not necessarily linear) to the temperature you are trying to predict. There are other layers you could use as well; check out Keras Layers.
If you are confused about the input and output shape of LSTM, check out
I/O Shape.

LSTM architecture in Keras implementation?

I am new to Keras and am going through the LSTM layer and its implementation details in the Keras documentation. It was going smoothly, but then I came across this SO post and its comment, which confused me about what the actual LSTM architecture is:
Here is the code:
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
model.add(Dense(2))
As per my understanding, 10 denotes the number of time-steps, each of which is fed to its respective LSTM cell; 64 denotes the number of features for each time-step.
But the comment in the above post and the actual answer have confused me about the meaning of 32.
Also, how does the output from the LSTM get connected to the Dense layer?
A hand-drawn diagrammatic explanation would be quite helpful in visualizing the architecture.
EDIT:
As far as this other SO post is concerned, it means 32 represents the length of the output vector produced by each of the LSTM cells if return_sequences=True.
If that's true, then how do we connect each of the 32-dimensional outputs produced by the 10 LSTM cells to the next dense layer?
Also, kindly tell me whether the first SO post's answer is ambiguous or not.
how do we connect each of the 32-dimensional outputs produced by the 10 LSTM cells to the next dense layer?
It depends on how you want to do it. Suppose you have:
model.add(LSTM(32, input_shape=(10, 64), return_sequences=True))
Then, the output of that layer has shape (10, 32). At this point, you can either use a Flatten layer to get a single vector with 320 components, or use a TimeDistributed to work on each of the 10 vectors independently:
model.add(TimeDistributed(Dense(15)))
The output shape of this layer is (10, 15), and the same weights are applied to the output of every LSTM unit.
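A small sketch of both options, with the resulting shapes as comments (the Dense sizes simply mirror the numbers used above):
from keras.models import Sequential
from keras.layers import LSTM, Dense, Flatten, TimeDistributed

# Option 1: flatten the (10, 32) sequence into one 320-dim vector
m1 = Sequential()
m1.add(LSTM(32, input_shape=(10, 64), return_sequences=True))   # -> (10, 32)
m1.add(Flatten())                                               # -> (320,)
m1.add(Dense(2))                                                # -> (2,)

# Option 2: apply the same Dense(15) to each of the 10 timesteps
m2 = Sequential()
m2.add(LSTM(32, input_shape=(10, 64), return_sequences=True))   # -> (10, 32)
m2.add(TimeDistributed(Dense(15)))                              # -> (10, 15)

m1.summary()
m2.summary()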
it's easy to figure out the no. of LSTM cells required for the input (specified in timespan)
How do we figure out the no. of LSTM units required in the output?
You either get the output of the last LSTM cell (last timestep) or the output of every LSTM cell, depending on the value of return_sequences. As for the dimensionality of the output vector, that's just a choice you have to make, just like the size of a dense layer, or number of filters in a conv layer.
how does each of the 32-dim vectors from the 10 LSTM cells get connected to the TimeDistributed layer?
Following the previous example, you would have a (10, 32) tensor, i.e. a size-32 vector for each of the 10 LSTM cells. What TimeDistributed(Dense(15)) does is create a (15, 32) weight matrix and a bias vector of size 15, and do:
dense_outputs = []
for h_t in lstm_outputs:
    dense_outputs.append(
        activation(dense_weights.dot(h_t) + dense_bias)
    )
Hence, dense_outputs has size (10, 15), and the same weights were applied to every LSTM output, independently.
Note that everything still works when you don't know how many timesteps you need, e.g. for machine translation. In this case, you use None for the timesteps; everything that I wrote still applies, with the only difference that the number of timesteps is not fixed anymore. Keras will repeat the LSTM, TimeDistributed, etc. for as many times as necessary (which depends on the input).
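For the variable-length case, the only change is the None in input_shape (a minimal sketch; the shapes in the comments exclude the batch dimension):
from keras.models import Sequential
from keras.layers import LSTM, TimeDistributed, Dense

model = Sequential()
model.add(LSTM(32, input_shape=(None, 64), return_sequences=True))   # -> (timesteps, 32)
model.add(TimeDistributed(Dense(15)))                                # -> (timesteps, 15)
model.summary()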

How to interpret clearly the meaning of the units parameter in Keras?

I am wondering how LSTMs work in Keras. In this tutorial, for example, as in many others, you can find something like this:
model.add(LSTM(4, input_shape=(1, look_back)))
What does the "4" mean. Is it the number of neuron in the layer. By neuron, I mean something that for each instance gives a single output ?
Actually, I found this brillant discussion but wasn't really convinced by the explanation mentioned in the reference given.
On the scheme, one can see the num_unitsillustrated and I think I am not wrong in saying that each of this unit is a very atomic LSTM unit (i.e. the 4 gates). However, how these units are connected ? If I am right (but not sure), x_(t-1)is of size nb_features, so each feature would be an input of a unit and num_unit must be equal to nb_features right ?
Now, let's talk about keras. I have read this post and the accepted answer and get trouble. Indeed, the answer says :
Basically, the shape is like (batch_size, timespan, input_dim), where input_dim can be different from the unit
In which case ? I am in trouble with the previous reference...
Moreover, it says,
LSTM in Keras only define exactly one LSTM block, whose cells is of unit-length.
Okay, but how do I define a full LSTM layer ? Is it the input_shape that implicitely create as many blocks as the number of time_steps (which, according to me is the first parameter of input_shape parameter in my piece of code ?
Thanks for lighting me
EDIT : would it also be possible to detail clearly how to reshape data of, say, size (n_samples, n_features) for a stateful LSTM model ? How to deal with time_steps and batch_size ?
First, units in LSTM is NOT the number of time_steps.
Each LSTM cell (present at a given time_step) takes in the input x and forms a hidden state vector a; the length of this hidden state vector is what is called units in LSTM (Keras).
You should keep in mind that there is only one RNN cell created by the code
keras.layers.LSTM(units, activation='tanh', …… )
and the RNN operations are repeated Tx times by the class itself.
I've linked this to help you understand it better with a very simple code example.
You can (sort of) think of it exactly as you think of fully connected layers. Units are neurons.
The dimension of the output is the number of neurons, as with most of the well known layer types.
The difference is that in LSTMs, these neurons will not be completely independent of each other, they will intercommunicate due to the mathematical operations lying under the cover.
Before going further, it might be interesting to take a look at this very complete explanation about LSTMs, their inputs/outputs and the usage of stateful=True/False: Understanding Keras LSTMs. Notice that your input shape should be input_shape=(look_back, 1). The input shape is (time_steps, features).
While this is a series of fully connected layers:
hidden layer 1: 4 units
hidden layer 2: 4 units
output layer: 1 unit
This is a series of LSTM layers:
Where the input has shape (batch_size, arbitrary_steps, 3), i.e. input_shape=(arbitrary_steps, 3)
Each LSTM layer will keep reusing the same units/neurons over and over until all the arbitrary timesteps in the input are processed.
The output will have shape:
(batch, arbitrary_steps, units) if return_sequences=True.
(batch, units) if return_sequences=False.
The memory states will have a size of units.
The inputs processed from the last step will have size of units.
To be really precise, there will be two groups of units, one working on the raw inputs, the other working on already processed inputs coming from the last step. Due to the internal structure, each group will have a number of parameters 4 times bigger than the number of units (this 4 is not related to the image, it's fixed).
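One way to check that factor of 4 is to count parameters (a minimal sketch, using the 3 input features from the example above):
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(4, input_shape=(None, 3)))
model.summary()
# 4 gates * (input weights + recurrent weights + biases)
#   = 4 * (3*4 + 4*4 + 4) = 128 parameters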
Flow (see the sketch after this list):

Takes an input with n steps and 3 features

Layer 1:
    For each time step in the inputs:
        Uses 4 units on the inputs to get a size 4 result
        Uses 4 recurrent units on the outputs of the previous step
    Outputs the last (return_sequences=False) or all (return_sequences=True) steps
    Output features = 4

Layer 2:
    Same as layer 1

Layer 3:
    For each time step in the inputs:
        Uses 1 unit on the inputs to get a size 1 result
        Uses 1 unit on the outputs of the previous step
    Outputs the last (return_sequences=False) or all (return_sequences=True) steps
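Written out as code, that flow becomes the following minimal sketch; the intermediate layers use return_sequences=True so every step is passed on:
from keras.models import Sequential
from keras.layers import LSTM

model = Sequential()
model.add(LSTM(4, return_sequences=True, input_shape=(None, 3)))   # layer 1 -> (n steps, 4)
model.add(LSTM(4, return_sequences=True))                          # layer 2 -> (n steps, 4)
model.add(LSTM(1, return_sequences=True))                          # layer 3 -> (n steps, 1)
model.summary()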
The number of units is the size (length) of the internal vector states h and c of the LSTM. That is, no matter the shape of the input, it is upscaled (by a dense transformation) by the various kernels for the i, f, and o gates. The details of how the resulting latent features are transformed into h and c are described in the linked post. In your example, the input shape of data
(batch_size, timesteps, input_dim)
will be transformed to
(batch_size, timesteps, 4)
if return_sequences is True; otherwise only the last h will be emitted, making it (batch_size, 4). I would recommend using a much higher latent dimension, perhaps 128 or 256, for most problems.
I would put it this way: there are 4 LSTM "neurons" or "units", each with 1 Cell State and 1 Hidden State for each timestep they process. So for an input of 1 timestep, you will have 4 Cell States, 4 Hidden States and 4 Outputs.
Actually, the correct way to say this is: for one timestep-sized input you get 1 Cell State (a vector of size 4), 1 Hidden State (a vector of size 4) and 1 Output (a vector of size 4).
So if you feed in a timeseries with 20 steps, you will have 20 (intermediate) Cell States, each of size 4. That is because the inputs in an LSTM are processed sequentially, one after the other. Similarly, you will have 20 Hidden States, each of size 4.
Usually, your output will be the output of the LAST step (a vector of size 4). However, in case you want the outputs of each intermediate step (remember you have 20 timesteps to process), you can set return_sequences=True. In that case you will have 20 vectors of size 4, each telling you what the output was when the corresponding step was processed, as those 20 inputs came one after the other.
If you set return_state=True, you also get the last Hidden State of size 4 and the last Cell State of size 4.
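To see these tensors directly, here is a minimal sketch (the 7 input features are an arbitrary choice):
from keras.models import Model
from keras.layers import Input, LSTM

inp = Input(shape=(20, 7))   # 20 timesteps, 7 features

# all intermediate outputs plus the final hidden and cell states
outputs, last_h, last_c = LSTM(4, return_sequences=True, return_state=True)(inp)

model = Model(inp, [outputs, last_h, last_c])
model.summary()
# outputs: (None, 20, 4) -- one size-4 output per timestep
# last_h:  (None, 4)     -- final hidden state
# last_c:  (None, 4)     -- final cell state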

How to verify the structure of a neural network in a Keras model?

I'm new to Keras and Neural Networks. I'm writing a thesis and trying to create a SimpleRNN in Keras as illustrated below:
As it is shown in the picture, I need to create a model with 4 inputs + 2 outputs and with any number of neurons in the hidden layer.
This is my code:
model = Sequential()
model.add(SimpleRNN(4, input_shape=(1, 4), activation='sigmoid', return_sequences=True))
model.add(Dense(2))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, epochs=5000, batch_size=1, verbose=2)
predict = model.predict(data)
1) Does my model implement the graph?
2) Is it possible to specify connections between neurons of the Input and Hidden layers, or of the Output and Input layers?
Explanation:
I am going to use backpropagation to train my network.
I have input and target values
Input is a 10*4 array and target is a 10*2 array which I then reshape:
input = input.reshape((10, 1, 4))
target = target.reshape((10, 1, 2))
It is crucial for me to be able to specify connections between neurons, as they can be different. For instance, here you can see an example:
1) Not really. But I'm not sure about what exactly you want in that graph. (Let's see how Keras recurrent layers work below)
2) Yes, it's possible to connect every layer to every layer, but you can't use Sequential for that, you must use Model.
This answer may not be what you're looking for. What exactly do you want to achieve? What kind of data you have, what output you expect, what is the model supposed to do? etc...
1 - How does a recurrent layer work?
Documentation
Recurrent layers in Keras work with an "input sequence" and may output a single result or a sequence result. Its recurrency is totally contained in it and doesn't interact with other layers.
You should have inputs with shape (NumberOfExamples, TimeStepsInSequence, DimensionOfEachStep). This means input_shape=(TimeSteps, Dimension).
The recurrent layer will work internally with each time step. The cycles happen from step to step and this behavior is totally invisible. The layer seems to work just like any other layer.
This doesn't seem to be what you want, unless you have a "sequence" to input. The only way I know of using recurrent layers in Keras that is similar to your graph is when you have a segment of a sequence and want to predict the next step. If that's the case, see some examples by searching for "predicting the next element" in Google.
2 - How to connect layers using Model:
Instead of adding layers to a sequential model (which will always follow a straight line), start using the layers independently, starting from an input tensor:
from keras.layers import *
from keras.models import Model

inputTensor = Input(shapeOfYourInput) #it seems the shape is "(2,)", but we must see your data.

#A dense layer with 2 outputs:
myDense = Dense(2, activation=ItsAGoodIdeaToUseAnActivation)

#The output tensor of that layer when you give it the input:
denseOut1 = myDense(inputTensor)

#You can do as many cycles as you want here:
denseOut2 = myDense(denseOut1)

#you can even make a loop:
#you may create a layer and call it with the input tensor in just one line if you're not going to reuse the layer
#I'm applying this activation layer here because, since we defined an activation for the dense layer
#and we're going to cycle it, it's not going to behave very well receiving huge values in the first pass
#and small values the next passes....
denseOut = Activation(ItsAGoodIdeaToUseAnActivation)(inputTensor)
for i in range(n):
    denseOut = myDense(denseOut)
This kind of usage allows you to create any kind of model, with branches, alternative ways, connections from anywhere to anywhere, provided you respect the shape rules. For a cycle like that, inputs and outputs must have the same shape.
At the end, you must define a model from one or many inputs to one or many outputs (you must have training data to match all inputs and outputs you choose):
model = Model(inputTensor,denseOut)
But notice that this model is static. If you want to change the number of cycles, you will have to create a new model.
In this case, it would be as simple as repeating the loop step denseOut = myDense(denseOut) and creating another model2=Model(inputTensor,denseOut).
3 - Trying to create something like the image below:
I am supposing C and F will participate in all iterations; if not, the structure below would need small adjustments.
Since there are four actual inputs, and we are going to treat them all separately, let's create 4 inputs instead, all shaped (1,).
Your input array should be divided into 4 arrays, each of shape (10, 1).
from keras.models import Model
from keras.layers import *
inputA = Input((1,))
inputB = Input((1,))
inputC = Input((1,))
inputF = Input((1,))
Now the layers N2 and N3, that will be used only once, since C and F are constant:
outN2 = Dense(1)(inputC)
outN3 = Dense(1)(inputF)
Now the recurrent layer N1, without giving it the tensors yet:
layN1 = Dense(1)
For the loop, let's create outA and outB. They start as the actual inputs and will be given to the layer N1, but in the loop they will be replaced.
outA = inputA
outB = inputB
Now in the loop, let's do the "passes":
for i in range(n):
    #unite A and B in one
    inputAB = Concatenate()([outA, outB])

    #pass through N1
    outN1 = layN1(inputAB)

    #sum results of N1 and N2 into A
    outA = Add()([outN1, outN2])

    #this is constant for all the passes except the first
    outB = outN3 #looks like B is never changing in your image....
Now the model:
finalOut = Concatenate()([outA,outB])
model = Model([inputA,inputB,inputC,inputF], finalOut)
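To train it, the four (10, 1) arrays mentioned above are passed as a list. A minimal sketch with random stand-in data, assuming the number of passes n was set before the loop (e.g. n = 3); the loss mirrors the one used in the question:
import numpy as np

A = np.random.rand(10, 1)
B = np.random.rand(10, 1)
C = np.random.rand(10, 1)
F = np.random.rand(10, 1)
target = np.random.rand(10, 2)   # finalOut concatenates outA and outB -> size 2

model.compile(optimizer='adam', loss='mean_absolute_error')
model.fit([A, B, C, F], target, epochs=10, batch_size=1, verbose=2)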
