Unable to understand the output shapes in LSTM network below - python

I have been trying to train a bidirectional LSTM using TensorFlow v2 keras for text classification. Below is the architecture:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, GlobalAveragePooling1D, Dense

model1 = Sequential()
model1.add(Embedding(vocab, 128, input_length=maxlength))
model1.add(Bidirectional(LSTM(32, dropout=0.2, recurrent_dropout=0.2, return_sequences=True)))
model1.add(Bidirectional(LSTM(16, dropout=0.2, recurrent_dropout=0.2, return_sequences=True)))
model1.add(GlobalAveragePooling1D())
model1.add(Dense(5, activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model1.summary()
It is the summary output that confuses me.
My doubt concerns the output shapes of the BiLSTM layers: why are they (283, 64) and (283, 32) when the numbers of units are 32 and 16 respectively for the two layers? Here, maxlength=283 and vocab=19479.

I believe the explanation for this result is the bidirectional nature of the LSTM layers you added to your network: the output size of each layer is doubled so that the layer can also learn the sequence backwards. If anything is unclear, feel free to ask in the comments.

This is because of Bidirectional. If you remove it, you'll see that the output shapes are (283, 32) and (283, 16). The Bidirectional wrapper runs a second copy of the wrapped layer over the reversed sequence and, by default, concatenates the forward and backward outputs, which doubles the last dimension.
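For example, a minimal sketch (assuming TensorFlow 2.x; the batch and feature sizes are made up) showing the doubling, and that other merge modes such as 'sum' keep the original width:
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((4, 283, 128))  # (batch, timesteps, features)
out = layers.Bidirectional(layers.LSTM(32, return_sequences=True))(x)
print(out.shape)      # (4, 283, 64): forward 32 + backward 32, concatenated
out_sum = layers.Bidirectional(layers.LSTM(32, return_sequences=True), merge_mode='sum')(x)
print(out_sum.shape)  # (4, 283, 32): forward and backward outputs summed instead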

Related

What does the number mean in LSTM

In a recurrent neural network, what does the number 128 passed to LSTM mean?
# Recurrent Neural Network architecture
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, BatchNormalization

model = Sequential()
model.add(LSTM(128, input_shape=X_train.shape[1:], return_sequences=True))  # X_train defined elsewhere
# model.add(Dropout(0.2))
model.add(BatchNormalization())
I think the following link provides a clear answer to this question:
https://www.tensorflow.org/api_docs/python/tf/compat/v1/keras/layers/LSTM
According to the docs, it is the dimensionality of the output space of the LSTM, i.e. the number of LSTM units you want to use (perpendicular to the flow of information).
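As a quick illustration (a toy sketch with made-up batch, timestep, and feature sizes), the first argument sets the size of the last output dimension:
import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal((8, 50, 10))  # (batch, timesteps, features)
print(layers.LSTM(128, return_sequences=True)(x).shape)  # (8, 50, 128): one 128-dim vector per timestep
print(layers.LSTM(128)(x).shape)                         # (8, 128): only the last timestep's output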

Fine-Tuning Pretrained BERT with CNN. How to Disable Masking

Has anyone used CNN in Keras to fine-tune a pre-trained BERT?
I have been trying to design this, but the pre-trained model comes with masks (I think in the embedding layer), and when the fine-tuning architecture is built on the output of one of the encoder layers it gives this error: Layer conv1d_1 does not support masking, but was passed an input_mask
So far I have tried some suggested workarounds such as using keras_trans_mask to remove the mask before the CNN and add it back later. But that leads to other errors too.
Is it possible to disable Masking in the pre-trained model or is there a workaround for it?
EDIT: This is the code I'm working with
inputs = model.inputs[:2]
layer_output = model.get_layer('Encoder-12-FeedForward-Norm').output
conv_layer = keras.layers.Conv1D(100, kernel_size=3, activation='relu',
                                 data_format='channels_first')(layer_output)
maxpool_layer = keras.layers.MaxPooling1D(pool_size=4)(conv_layer)
flat_layer = keras.layers.Flatten()(maxpool_layer)
outputs = keras.layers.Dense(units=3, activation='softmax')(flat_layer)
model = keras.models.Model(inputs, outputs)
model.compile(RAdam(learning_rate=LR), loss='sparse_categorical_crossentropy', metrics=['sparse_categorical_accuracy'])
So layer_output contains a mask and it cannot be applied to conv1d
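One common workaround (not specific to this model; a hedged sketch assuming tf.keras-style custom layers) is to insert an identity layer that stops mask propagation right before the Conv1D:
from tensorflow import keras

class RemoveMask(keras.layers.Layer):
    """Identity layer that accepts a mask but does not pass it on."""
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.supports_masking = True  # accept a masked input without raising

    def compute_mask(self, inputs, mask=None):
        return None  # swallow the mask so downstream layers never see it

    def call(self, inputs, mask=None):
        return inputs

# hypothetical usage: layer_output = RemoveMask()(layer_output) before building conv_layer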

how to fix 'batch_size' in LSTM

I am running an LSTM model, and I want to make it a bidirectional LSTM. How do I go about this?
I am using the code from https://github.com/brunnergino/JamBot.git. The script polyphonic_lstm_training.py has the code.
model = Sequential()
model.add(LSTM(lstm_size, batch_size=batch_size, input_shape=(step_size, new_num_notes+chord_dim+counter_size), stateful=True))
model.add(LSTM(lstm_size, batch_input_shape=(batch_size,step_size, new_num_notes+chord_dim+counter_size), stateful=True))
I expect it to train using Bidirectional LSTM
A possible approach to implementing a bidirectional LSTM is to reverse the input before feeding it to a second LSTM layer, then reverse the output of that second LSTM again and concatenate it with the output of the first LSTM layer, as sketched below.
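Here is a rough sketch of that idea with the functional API (toy dimensions rather than the JamBot values; Keras also provides keras.layers.Bidirectional, which does the same job in one line):
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

timesteps, features, lstm_size = 16, 100, 128  # toy sizes for illustration

inputs = keras.Input(shape=(timesteps, features))
forward = layers.LSTM(lstm_size, return_sequences=True)(inputs)
reversed_in = layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(inputs)    # flip the time axis
backward = layers.LSTM(lstm_size, return_sequences=True)(reversed_in)
backward = layers.Lambda(lambda t: tf.reverse(t, axis=[1]))(backward)     # flip back to realign timesteps
merged = layers.Concatenate()([forward, backward])                        # (batch, timesteps, 2 * lstm_size)
model = keras.Model(inputs, merged)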

Does applying a Dropout Layer after the Embedding Layer have the same effect as applying the dropout through the LSTM dropout parameter?

I am slightly confused on the different ways to apply dropout to my Sequential model in Keras.
My model is the following:
model = Sequential()
model.add(Embedding(input_dim=64,output_dim=64, input_length=498))
model.add(LSTM(units=100,dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
Assume that I added an extra Dropout layer after the Embedding layer in the below manner:
model = Sequential()
model.add(Embedding(input_dim=64,output_dim=64, input_length=498))
model.add(Dropout(0.25))
model.add(LSTM(units=100,dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
Will this make any difference since I then specified that the dropout should be 0.5 in the LSTM parameter specifically, or am I getting this all wrong?
When you add a Dropout layer, you are applying dropout to the output of the previous layer only; in your case, to the output of your Embedding layer.
An LSTM cell is more complex than a single-layer neural network: when you specify dropout in the LSTM layer, you are actually applying dropout to the inputs of four different sub-network operations inside the cell.
Colah's blog on LSTMs (the best visualization of LSTMs/RNNs out there, http://colah.github.io/posts/2015-08-Understanding-LSTMs/) shows a diagram of an LSTM cell in which the yellow boxes represent four fully connected network operations (each with their own weights) that occur under the hood of the LSTM. They are neatly wrapped up in the LSTM cell, though they are not really so hard to code by hand.
When you specify dropout=0.5 in the LSTM layer, what you are doing under the hood is applying dropout to each of these four operations. It is roughly like inserting a separate Dropout(0.5) on the input of each of the four yellow blocks in the diagram, inside the internals of the LSTM cell.
I hope that short discussion makes it clearer how the dropout specified in the LSTM layer, which effectively acts on four sub-networks within the LSTM, differs from the dropout you applied once in the sequence after your Embedding layer. To answer your question directly: yes, these two dropout definitions are very much different.
As a further example to help elucidate the point: if you were to define a simple five-layer fully connected neural network, you would need to add dropout after each layer, not just once. model.add(Dropout(0.25)) is not some kind of global setting; it adds a dropout operation to a pipeline of operations. If you have five layers, you need five dropout operations, as in the sketch below.
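For instance, a toy sketch (arbitrary layer sizes) of such a fully connected stack, with its own Dropout after each of the five hidden layers:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(128, activation='relu', input_shape=(64,)),
    Dropout(0.25),  # affects only the outputs of the layer directly above
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(128, activation='relu'),
    Dropout(0.25),
    Dense(1, activation='sigmoid'),
])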

Keras Sequential model input layer

When creating a Sequential model in Keras, I understand you provide the input shape in the first layer. Does this input shape then make an implicit input layer?
For example, the model below explicitly specifies 2 Dense layers, but is this actually a model with 3 layers consisting of one input layer implied by the input shape, one hidden dense layer with 32 neurons, and then one output layer with 10 possible outputs?
model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])
Well, it actually is an implicit input layer indeed, i.e. your model is an example of a "good old" neural net with three layers - input, hidden, and output. This is more explicitly visible in the Keras Functional API (check the example in the docs), in which your model would be written as:
inputs = Input(shape=(784,)) # input layer
x = Dense(32, activation='relu')(inputs) # hidden layer
outputs = Dense(10, activation='softmax')(x) # output layer
model = Model(inputs, outputs)
Actually, this implicit input layer is the reason why you have to include an input_shape argument only in the first (explicit) layer of the model in the Sequential API - in subsequent layers, the input shape is inferred from the output of the previous ones (see the comments in the source code of core.py).
You may also find the documentation on tf.contrib.keras.layers.Input enlightening.
It depends on your perspective :-)
Rewriting your code in line with more recent Keras tutorial examples, you would probably use:
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=784))
model.add(Dense(10, activation='softmax'))
...which makes it much more explicit that you only have 2 Keras layers. And this is exactly what you do have (in Keras, at least) because the "input layer" is not really a (Keras) layer at all: it's only a place to store a tensor, so it may as well be a tensor itself.
Each Keras layer is a transformation that outputs a tensor, possibly of a different size/shape to the input. So while there are 3 identifiable tensors here (input, outputs of the two layers), there are only 2 transformations involved corresponding to the 2 Keras layers.
On the other hand, graphically, you might represent this network with 3 (graphical) layers of nodes and two sets of lines connecting them. Graphically, it's a 3-layer network. But "layers" in this graphical notation are bunches of circles that sit on a page doing nothing, whereas layers in Keras transform tensors and do actual work for you. Personally, I would get used to the Keras perspective :-)
Note finally that for fun and/or simplicity, I substituted input_dim=784 for input_shape=(784,) to avoid the syntax that Python uses to both confuse newcomers and create a 1-D tuple: (<value>,).
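You can check this directly (a small sketch assuming tf.keras) by counting the layers Keras actually registers:
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation='relu', input_dim=784))
model.add(Dense(10, activation='softmax'))
print(len(model.layers))  # 2: the "input layer" is just a placeholder tensor, not a Keras layer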
