how to fix 'batch_size' in LSTM


I am running LSTM code, and I want to make it a bidirectional LSTM. How do I go about this?
I am using the code from https://github.com/brunnergino/JamBot.git. The notebook named polyphonic_lstm_training.py has the code.
model = Sequential()
model.add(LSTM(lstm_size, batch_size=batch_size, input_shape=(step_size, new_num_notes+chord_dim+counter_size), stateful=True))
model.add(LSTM(lstm_size, batch_input_shape=(batch_size,step_size, new_num_notes+chord_dim+counter_size), stateful=True))
I expect it to train using Bidirectional LSTM

A possible approach to implementing a bidirectional LSTM is to reverse the input before processing it with a second LSTM layer, then reverse the output of that second LSTM again and concatenate it with the output of the first LSTM layer.
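The reverse, process, reverse, concatenate recipe above can be sketched in plain Python. This is only an illustration under loud assumptions: `run_rnn` is a hypothetical toy recurrent layer (one scalar per time step, made-up weights) standing in for an LSTM, not the JamBot model or any Keras API.

```python
import math

def run_rnn(xs, w=0.5, u=0.3):
    """Toy stand-in for an LSTM layer: returns one hidden value per time step."""
    h, outputs = 0.0, []
    for x in xs:
        h = math.tanh(w * x + u * h)  # new state depends on input and previous state
        outputs.append(h)
    return outputs

def bidirectional(xs):
    """Combine a forward pass with a pass over the reversed sequence."""
    forward = run_rnn(xs)
    # Reverse the input, process it with a second layer (its own weights),
    # then reverse the output so it lines up with the forward time steps.
    backward = run_rnn(xs[::-1], w=-0.4, u=0.2)[::-1]
    # Concatenate the two outputs at every time step.
    return [[f, b] for f, b in zip(forward, backward)]

sequence = [1.0, 2.0, 3.0, 4.0]
outputs = bidirectional(sequence)  # 4 time steps, 2 values per step
```

In Keras, the tf.keras.layers.Bidirectional wrapper around a recurrent layer performs this kind of combination for you, so writing it by hand is usually unnecessary.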

Related

Add layers to a pretrained model without creating a sequential model

I am using a pretrained ResNet50 (from the tensorflow.keras.applications package) and fine-tuning it for multilabel classification (with 2 classes), and I'd like to extract the saliency maps from the fine-tuned model.
To make a classifier, I add 2 dense layers to the ResNet model, creating a new sequential model as follows:
self.model = tf.keras.Sequential([
    resnet50,
    layers.Dense(1024, activation='relu', name='hidden_layer'),
    layers.Dense(2, activation='sigmoid', name='output')
])
But my problem is that resnet50 becomes a "single layer", so its individual layers are no longer accessible: the model summary only contains 3 layers. I'd like to know if there is a way to add layers to a functional model without creating a sequential model, so that I can still access each layer of the ResNet model.
Thank you in advance,

To access the layers of the resnet50 model, first get the resnet50 layer from your model, and then from it get the convolution layer for which you want to create the saliency map:
self.model.get_layer("resnet50").get_layer("conv5_block2_3_conv")

What does the number mean in LSTM

In a recurrent neural network (RNN), what does the number 128 after LSTM mean?
# RNN Recurrent Neural Network architecture
model = Sequential()
model.add(LSTM(128, input_shape=(X_train.shape[1:]), return_sequences=True))
#model.add(Dropout(0.2))
model.add(BatchNormalization())

I think the following link provides a clear answer to this question.
https://www.tensorflow.org/api_docs/python/tf/compat/v1/keras/layers/LSTM
According to the link, it is the dimensionality of the output space of the LSTM.

It is the number of LSTM units (nodes) you want to use, i.e. the size of the hidden state vector the layer outputs at each time step.
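To make "dimensionality of the output space" concrete, here is a small sketch that computes the output shape and the trainable parameter count from the number of units. The helper names are hypothetical, the input sizes in the example (20 time steps, 10 features, batch of 32) are invented, and the parameter formula assumes a standard LSTM with bias terms.

```python
def lstm_output_shape(batch, timesteps, units, return_sequences=False):
    """Shape of an LSTM layer's output for a (batch, timesteps, features) input."""
    if return_sequences:
        return (batch, timesteps, units)  # one vector of size `units` per time step
    return (batch, units)                 # only the last time step's vector

def lstm_param_count(input_dim, units):
    """Parameters of a standard LSTM with bias.

    Four internal transformations (three gates plus the candidate state),
    each with an input kernel, a recurrent kernel, and a bias vector.
    """
    return 4 * (input_dim * units + units * units + units)

shape = lstm_output_shape(batch=32, timesteps=20, units=128, return_sequences=True)
params = lstm_param_count(input_dim=10, units=128)
```

Note that the number of units does not depend on the sequence length or the number of input features; it only sets the width of the layer's output.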

Unable to understand the output shapes in LSTM network below

I have been trying to train a bidirectional LSTM using TensorFlow v2 keras for text classification. Below is the architecture:
model1 = Sequential()
model1.add(Embedding(vocab, 128,input_length=maxlength))
model1.add(Bidirectional(LSTM(32,dropout=0.2,recurrent_dropout=0.2,return_sequences=True)))
model1.add(Bidirectional(LSTM(16,dropout=0.2,recurrent_dropout=0.2,return_sequences=True)))
model1.add(GlobalAveragePooling1D())
model1.add(Dense(5, activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model1.summary()
Now, it is the summary details that confuse me.
My doubts are about the output shapes of the BiLSTM layers: why are they (283,64) and (283,32) when the number of units is 32 and 16 respectively for the 2 layers? Here, maxlength=283 and vocab=19479.

I believe the explanation is the bidirectional nature of the LSTM layers you have added to your network: the output size of each layer is doubled because the layer also learns the sequence backwards. If you have any questions, you can ask me in the comments.

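This is because of Bidirectional. If you remove it, you'll see that the output shapes are (283,32) and (283,16). Bidirectional runs a second copy of the LSTM over the reversed sequence and, by default (merge_mode='concat'), concatenates the two outputs, which doubles the last dimension.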
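The shape arithmetic can be checked with a tiny helper. This is a sketch, not Keras code: `bilstm_output_shape` is a hypothetical function that mirrors how the merge_mode options of the Bidirectional wrapper affect the last dimension when return_sequences=True.

```python
def bilstm_output_shape(timesteps, units, merge_mode="concat"):
    """Per-sample output shape of Bidirectional(LSTM(units, return_sequences=True))."""
    if merge_mode == "concat":
        # Forward and backward outputs are placed side by side.
        return (timesteps, 2 * units)
    if merge_mode in ("sum", "mul", "ave"):
        # Element-wise merges keep the width of a single direction.
        return (timesteps, units)
    raise ValueError(f"unsupported merge_mode: {merge_mode!r}")

first_layer = bilstm_output_shape(283, 32)   # (283, 64)
second_layer = bilstm_output_shape(283, 16)  # (283, 32)
```

These match the two shapes reported in the question for maxlength=283.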

Does applying a Dropout Layer after the Embedding Layer have the same effect as applying the dropout through the LSTM dropout parameter?

I am slightly confused on the different ways to apply dropout to my Sequential model in Keras.
My model is the following:
model = Sequential()
model.add(Embedding(input_dim=64,output_dim=64, input_length=498))
model.add(LSTM(units=100,dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
Assume that I added an extra Dropout layer after the Embedding layer in the below manner:
model = Sequential()
model.add(Embedding(input_dim=64,output_dim=64, input_length=498))
model.add(Dropout(0.25))
model.add(LSTM(units=100,dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
Will this make any difference since I then specified that the dropout should be 0.5 in the LSTM parameter specifically, or am I getting this all wrong?

When you add a Dropout layer, you are adding dropout to the output of the previous layer only; in your case, you are adding dropout to the output of your embedding layer.
An LSTM cell is more complex than a single-layer neural network. When you specify dropout in the LSTM cell, you are actually applying dropout to 4 different sub-network operations inside the cell.
See the visualization of an LSTM cell in Colah's blog on LSTMs (the best visualization of LSTMs/RNNs out there, http://colah.github.io/posts/2015-08-Understanding-LSTMs/). The yellow boxes there represent 4 fully connected network operations (each with its own weights) which occur under the hood of the LSTM - this is neatly wrapped up in the LSTM cell wrapper, though it's not really so hard to code by hand.
When you specify dropout=0.5 in the LSTM cell, what you are doing under the hood is applying dropout to the inputs of each of these 4 neural network operations (recurrent_dropout=0.5 does the same for the recurrent state). This is roughly like adding model.add(Dropout(0.5)) 4 times, once for each of the 4 yellow blocks you see in the diagram, within the internals of the LSTM cell.
I hope that short discussion makes it clearer how the dropout specified in the LSTM wrapper, which is effectively applied to 4 sub-networks within the LSTM, differs from the dropout you applied once in the sequence after your embedding layer. And to answer your question directly: yes, these two dropout definitions are very much different.
As a further example to illustrate the point: if you were to define a simple 5-layer fully connected neural network, you would need to define dropout after each layer, not once. model.add(Dropout(0.25)) is not some kind of global setting; it adds a dropout operation to a pipeline of operations. If you have 5 layers, you need to add 5 dropout operations.
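To show that dropout is just one operation placed at one point in the pipeline, here is a minimal pure-Python sketch of inverted dropout, the training-time behavior where dropped values become zero and kept values are scaled by 1 / (1 - rate). The `dropout` function and the `embedding_output` values are illustrative assumptions, not Keras internals.

```python
import random

def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(42)

# Stand-in for the output of the embedding layer at one time step.
embedding_output = [1.0] * 8

# Dropout(0.25) after the embedding affects only this tensor, at this point.
dropped = dropout(embedding_output, rate=0.25, rng=rng)
```

Any dropout inside the LSTM (via its dropout and recurrent_dropout arguments) would be additional, separate applications of the same kind of operation at other points in the computation.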

Keras Sequential model input layer

When creating a Sequential model in Keras, I understand you provide the input shape in the first layer. Does this input shape then make an implicit input layer?
For example, the model below explicitly specifies 2 Dense layers, but is this actually a model with 3 layers consisting of one input layer implied by the input shape, one hidden dense layer with 32 neurons, and then one output layer with 10 possible outputs?
model = Sequential([
    Dense(32, input_shape=(784,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

Well, it actually is an implicit input layer indeed, i.e. your model is an example of a "good old" neural net with three layers - input, hidden, and output. This is more explicitly visible in the Keras Functional API (check the example in the docs), in which your model would be written as:
inputs = Input(shape=(784,)) # input layer
x = Dense(32, activation='relu')(inputs) # hidden layer
outputs = Dense(10, activation='softmax')(x) # output layer
model = Model(inputs, outputs)
Actually, this implicit input layer is the reason why you have to include an input_shape argument only in the first (explicit) layer of the model in the Sequential API - in subsequent layers, the input shape is inferred from the output of the previous ones (see the comments in the source code of core.py).
You may also find the documentation on tf.contrib.keras.layers.Input enlightening.

It depends on your perspective :-)
Rewriting your code in line with more recent Keras tutorial examples, you would probably use:
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=784))
model.add(Dense(10, activation='softmax'))
...which makes it much more explicit that you only have 2 Keras layers. And this is exactly what you do have (in Keras, at least) because the "input layer" is not really a (Keras) layer at all: it's only a place to store a tensor, so it may as well be a tensor itself.
Each Keras layer is a transformation that outputs a tensor, possibly of a different size/shape to the input. So while there are 3 identifiable tensors here (input, outputs of the two layers), there are only 2 transformations involved corresponding to the 2 Keras layers.
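The "3 tensors, 2 transformations" count can be reproduced without Keras. In this sketch, `make_dense` is a hypothetical helper that builds a plain matrix-vector transformation with random weights; activations are left out because only the shapes matter here.

```python
import random

rng = random.Random(0)

def make_dense(n_in, n_out):
    """Return a function mapping a vector of length n_in to one of length n_out."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out

    def layer(x):
        return [sum(w * v for w, v in zip(row, x)) + b
                for row, b in zip(weights, biases)]

    return layer

# Two transformations, matching Dense(32) and Dense(10).
layers = [make_dense(784, 32), make_dense(32, 10)]

# Start from the input tensor and record every tensor produced along the way.
tensors = [[rng.random() for _ in range(784)]]
for layer in layers:
    tensors.append(layer(tensors[-1]))

sizes = [len(t) for t in tensors]  # input, hidden output, final output
```

The list `layers` has 2 entries while `tensors` has 3, which is the distinction drawn above between Keras layers and the tensors flowing between them.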
On the other hand, graphically, you might represent this network with 3 (graphical) layers of nodes, and two sets of lines connecting the layers of nodes. Graphically, it's a 3-layer network. But "layers" in this graphical notation are bunches of circles that sit on a page doing nothing, whereas layers in Keras transform tensors and do actual work for you. Personally, I would get used to the Keras perspective :-)
Note finally that for fun and/or simplicity, I substituted input_dim=784 for input_shape=(784,) to avoid the syntax that Python uses to both confuse newcomers and create a 1-D tuple: (<value>,).
