Bidirectional LSTM with Batch Normalization in Keras

I was wondering how to implement a biLSTM with Batch Normalization (BN) in Keras. I know that the BN layer should go between the linearity and the nonlinearity, i.e., the activation. This is easy to implement with CNN or Dense layers, but how do I do it with a biLSTM?
Thanks in advance.

If you want to apply BatchNormalization to the linear outputs of an LSTM, you can do it like this:
from keras.models import Sequential
# Importing from keras.layers directly works across Keras versions
from keras.layers import LSTM, Bidirectional, BatchNormalization
model = Sequential()
model.add(Bidirectional(LSTM(128, activation=None), input_shape=(256,10)))
model.add(BatchNormalization())
Essentially, you are removing the non-linear activations of the LSTM (but not the gate activations) and then applying BatchNormalization to the outputs.
If what you want is to apply BatchNormalization to one of the internal flows of the LSTM, such as the recurrent flow, I'm afraid that feature has not been implemented in Keras.
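The answer stops at the BatchNormalization layer. If the goal is the "linear, then BN, then nonlinearity" ordering from the question, one possible continuation is to add the activation back as its own layer. This is a minimal sketch, not the answerer's code: the tensorflow.keras import path and the choice of tanh as the activation are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # (timesteps, features), same shape as in the answer
    keras.Input(shape=(256, 10)),
    # activation=None keeps the LSTM outputs linear; gate activations are unchanged
    layers.Bidirectional(layers.LSTM(128, activation=None)),
    # Normalize the concatenated forward/backward outputs
    layers.BatchNormalization(),
    # Assumed nonlinearity, applied after BN
    layers.Activation("tanh"),
])

output_shape = model.output_shape
```

With the default merge mode of Bidirectional, the forward and backward outputs are concatenated, so the final layer has 2 x 128 features.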

Related

Is this behavior normal or is it a bug? Training with incompatible shapes in tensorflow

I have found a weird behavior in tensorflow.keras that doesn't occur in classic keras.
I have these shapes in my dataset.
x_train = np.random.rand(60,3,1)
y_train = np.random.rand(60,1)
And this LSTM network
from tensorflow.keras.layers import Dense, LSTM
from tensorflow.keras import Sequential
model = Sequential()
model.add(LSTM(120,input_shape=(3,1)))
model.add(Dense(2,activation="relu"))
model.compile(loss="MSE",optimizer="adam")
model.fit(x_train,y_train,epochs=1)
model.summary()
This shouldn't work, because the output of the network is (,2) and y_train is (,1). But it starts training anyway.
With classic keras it fails, as I expected.
from keras.layers import Dense, LSTM
from keras import Sequential
model = Sequential()
model.add(LSTM(120,input_shape=(3,1)))
model.add(Dense(2,activation="relu"))
model.compile(loss="MSE",optimizer="adam")
model.fit(x_train,y_train,epochs=1)
model.summary()
The versions are as follows (I'm using Google Colab):
Tensorflow: 2.2.0
Keras: 2.3.1
What could be causing this? Is this a bug or a new feature?
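One plausible explanation, offered as an assumption rather than a confirmed diagnosis: an elementwise loss can broadcast a (60, 1) target against a (60, 2) prediction instead of raising an error. The NumPy sketch below only shows that the arithmetic is well defined under broadcasting. It does not reproduce what either Keras version does internally, and the array names are illustrative.

```python
import numpy as np

y_pred = np.random.rand(60, 2)   # stands in for the output of Dense(2)
y_true = np.random.rand(60, 1)   # same shape as y_train in the question

# Broadcasting stretches the size-1 axis of y_true across both output columns
squared_error = (y_pred - y_true) ** 2
mse = squared_error.mean()

error_shape = squared_error.shape
```

If this is what happens, each of the two outputs is trained against the same single target value, which is rarely what was intended.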

How can I sort a neural network layer in Keras?

I am working on a multi-target regression problem in Keras and I would like the predicted values in the last layer to be sorted. I am currently implementing something like this:
# Lambda layer
import tensorflow as tf
def sort_layer(tensor):
    return tf.sort(tensor)
# Training model on train set
from keras.models import Sequential
from keras.layers import Dense,Lambda
model = Sequential()
model.add(Dense(100,input_dim=X_train.shape[1],activation="relu"))
model.add(Dense(150,activation="relu"))
model.add(Dense(50,activation="relu"))
model.add(Dense(y_train.shape[1],activation="linear"))
model.add(Lambda(sort_layer))
model.compile(loss="mse", optimizer="adam")
model.fit(X_train,y_train, epochs=100,batch_size=10, verbose=0)
This doesn't seem to be working every time as some predictions don't come out sorted. Can anyone explain what I am doing wrong and suggest a good fix?
Thank you!
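For reference, tf.sort sorts along the last axis in ascending order by default, which is the same convention as np.sort with axis=-1. A small NumPy sketch of that convention, plus one way to check whether each row of a prediction array is sorted, may help narrow down where the unsorted values come from. The helper name rows_are_sorted and the sample values are hypothetical.

```python
import numpy as np

def rows_are_sorted(a):
    """Return one boolean per row: True if the row is in ascending order."""
    return np.all(np.diff(a, axis=-1) >= 0, axis=-1)

# Two made-up prediction rows, neither of them sorted
preds = np.array([[3.0, 1.0, 2.0],
                  [0.5, 0.7, 0.1]])

# Sort each row independently along the last axis
sorted_preds = np.sort(preds, axis=-1)

before = rows_are_sorted(preds)
after = rows_are_sorted(sorted_preds)
```

Running the same check on the array returned by model.predict would show which rows, if any, are not in ascending order.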

Does applying a Dropout Layer after the Embedding Layer have the same effect as applying the dropout through the LSTM dropout parameter?

I am slightly confused about the different ways to apply dropout to my Sequential model in Keras.
My model is the following:
model = Sequential()
model.add(Embedding(input_dim=64,output_dim=64, input_length=498))
model.add(LSTM(units=100,dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
Assume that I added an extra Dropout layer after the Embedding layer, as shown below:
model = Sequential()
model.add(Embedding(input_dim=64,output_dim=64, input_length=498))
model.add(Dropout(0.25))
model.add(LSTM(units=100,dropout=0.5, recurrent_dropout=0.5))
model.add(Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
Will this make any difference since I then specified that the dropout should be 0.5 in the LSTM parameter specifically, or am I getting this all wrong?

When you add a Dropout layer, you are adding dropout to the output of the previous layer only; in your case, that is the output of your Embedding layer.
An LSTM cell is more complex than a single-layer neural network. When you specify dropout in the LSTM cell, you are actually applying dropout to 4 different sub-network operations inside the cell.
Colah's blog on LSTMs has a visualization of an LSTM cell (the best visualization of LSTMs/RNNs out there, http://colah.github.io/posts/2015-08-Understanding-LSTMs/). The yellow boxes in that diagram represent 4 fully connected network operations (each with its own weights) that occur under the hood of the LSTM. This is neatly wrapped up in the LSTM cell wrapper, though it's not really so hard to code by hand.
When you specify dropout=0.5 in the LSTM cell, what you are doing under the hood is applying dropout to each of these 4 neural network operations. This is effectively like adding model.add(Dropout(0.5)) 4 times, once for each of the 4 yellow blocks in the diagram, within the internals of the LSTM cell.
I hope that short discussion makes it clearer how the dropout applied in the LSTM wrapper, which effectively acts on 4 sub-networks within the LSTM, differs from the dropout you applied once in the sequence after your Embedding layer. And to answer your question directly: yes, these two dropout definitions are very much different.
As a further example to illustrate the point: if you were to define a simple 5-layer fully connected neural network, you would need to define dropout after each layer, not once. model.add(Dropout(0.25)) is not some kind of global setting; it adds a dropout operation to a pipeline of operations. If you have 5 layers, you need to add 5 dropout operations.
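The 5-layer case described above could look roughly like the following sketch. It assumes the tensorflow.keras import path; the input size (20), the layer width (64), and the sigmoid output layer are hypothetical choices made only to keep the example self-contained.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(keras.Input(shape=(20,)))          # hypothetical input size

# Five hidden layers, each followed by its own Dropout operation
for _ in range(5):
    model.add(layers.Dense(64, activation="relu"))
    model.add(layers.Dropout(0.25))

model.add(layers.Dense(1, activation="sigmoid"))  # hypothetical output layer

# Count the Dropout layers actually present in the pipeline
n_dropout = sum(isinstance(layer, layers.Dropout) for layer in model.layers)
```

Each Dropout layer here acts only on the output of the Dense layer directly before it, which is why five of them are needed.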

concatenate layers to a fully connected layer Tensorflow

I am trying to implement the siamese network from Sergey Zagoruyko's paper using TensorFlow:
http://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Zagoruyko_Learning_to_Compare_2015_CVPR_paper.pdf
I don't know how to concatenate the 2 input layers to a top network (fully connected layer + relu + fully connected layer).

This may not be what you are looking for, but I recommend trying Keras. It is a flexible, high-level framework built on tensorflow that makes it extremely easy to accomplish what you are attempting. This would be how you could do it in Keras (with 32 inputs and 32 neurons in your FC layers).
from keras.models import Sequential
from keras.layers import Dense, Activation, Input
model = Sequential()
model.add(Input(shape=(32,)))
model.add(Dense(32))
model.add(Activation("relu"))
model.add(Dense(32))
Alternatively, using just tensorflow you could use this strategy.
import tensorflow as tf
# TensorFlow 1.x API: dtype comes first, shape is a keyword argument
x = tf.placeholder(tf.float32, shape=(None, 32))
y = tf.layers.dense(x, 32)
y = tf.nn.relu(y)
y = tf.layers.dense(y, 32)
But I personally think keras is more elegant, plus it adds a lot of useful features, such as model.output, model.input, and much more. In fact, keras has recently been built into tensorflow's contrib module as tf.contrib.keras. Hope that helps!
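Neither snippet above shows the concatenation step that the question asks about. One way to express it is with the Keras functional API, which accepts more than one input. This is a minimal sketch under stated assumptions: the two inputs stand in for the outputs of the two siamese branches, and the sizes (32 features per branch, 32 hidden units, 1 output) are hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Stand-ins for the feature vectors produced by the two branches
branch_a = keras.Input(shape=(32,))
branch_b = keras.Input(shape=(32,))

# Join the two feature vectors along the feature axis
merged = layers.Concatenate()([branch_a, branch_b])

# Top network: fully connected + relu + fully connected
hidden = layers.Dense(32, activation="relu")(merged)
output = layers.Dense(1)(hidden)

model = keras.Model(inputs=[branch_a, branch_b], outputs=output)
merged_shape = tuple(merged.shape)
```

In a real siamese setup the two inputs would first pass through a shared convolutional branch; that part is left out here.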

train Keras model with BatchNorm layer with tensorflow

I'm using keras to build a model, and writing the optimization code and everything else in tensorflow. When I was using quite simple layers like Dense or Conv2D, everything was straightforward. But adding a BatchNormalization layer to my keras model complicates things.
Since the BatchNormalization layer behaves differently in the training and testing phases, I figured out that I need K.learning_phase():True in my feed_dict. But the following code is not working well. It runs without error, but the model's performance doesn't improve.
import keras.backend as K
...
x_train, y_train = get_data()
sess.run(train_op, feed_dict={x:x_train, y:y_train, K.learning_phase():True})
When I tried training the keras model with the keras fit function, it worked well.
What should I do to train a keras model with a BatchNormalization layer in tensorflow?

Actually, this duplicates a question that I hadn't seen.
I found the answer here; it just consists of passing a special argument to the BatchNormalization layer call.
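The answer does not name the argument. One call argument that Keras layers do accept, and that switches BatchNormalization between its two modes, is training. Whether that is the argument meant here is an assumption. The sketch below only illustrates the difference between the two modes on a random batch, using the tensorflow.keras import path in eager mode.

```python
import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()

# Random batch whose per-feature mean is far from zero
x = (np.random.rand(64, 4) * 10 + 5).astype("float32")

# training=True: normalize with the statistics of this batch
y_train_mode = bn(x, training=True).numpy()

# training=False: normalize with the layer's moving averages instead
y_infer_mode = bn(x, training=False).numpy()

train_mean = np.abs(y_train_mode.mean(axis=0)).max()
infer_mean = y_infer_mode.mean()
```

In training mode the output is centered per feature. In inference mode the moving averages have barely moved from their initial values after a single call, so the output stays close to the raw input.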
