TensorFlow 2.0 Keras: Is dropout disabled during testing by default?

I am wondering if in the following model, dropout will be disabled when I call model.evaluate(...).
import tensorflow as tf

layers = [tf.keras.layers.Dense(size, activation='relu')
          for size in (20, 40, 20)]
layers.insert(1, tf.keras.layers.Dropout(0.2))
layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
model = tf.keras.models.Sequential(layers)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.BinaryCrossentropy())
model.fit(...)
model.evaluate(...)  # ==> will dropout be deactivated here?

Yes. Dropout is disabled by default at inference time, so both model.evaluate(...) and model.predict(...) run with dropout inactive unless you explicitly call the model or layer with training=True.

As per the documentation:

Call arguments:
inputs: Input tensor (of any rank).
training: Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (doing nothing).

So yes, dropout is disabled when testing, which is the behavior you almost always want. The same holds for the SpatialDropout variants as well.
Documentation: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/layers/Dropout
The TensorFlow Keras API uses a learning-phase flag to identify whether we are training or testing. The learning-phase flag is a boolean tensor (0 = test, 1 = train) that is passed to any Keras function whose behavior differs between train time and test time.
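If you ever do want dropout active at inference time (e.g. for Monte Carlo dropout uncertainty estimates), you can override the default by passing the training flag explicitly. A minimal sketch; the input width of 10 is arbitrary:

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(20, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])

x = np.random.random((8, 10)).astype("float32")
preds_eval = model(x, training=False)  # dropout inactive (what evaluate/predict do)
preds_mc = model(x, training=True)     # dropout active: stochastic forward pass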

Related

Missed setting `training=True` in the call() method in TensorFlow: any problem?

I trained a model in TensorFlow for 4 days, achieved good train and test loss, and it converged well. But I later realised that I had not forwarded the training=True argument in the call() method of my custom-written Keras model during training. My neural network contains batch_normalization layers.
So, with training=None, which mean and variance were my batch_normalization layers trained with?
Will the mean and variance these batch_normalization layers use at prediction time now be random/dynamic?
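For reference, a minimal sketch (layer names illustrative) of a custom model whose call() does forward the flag, so BatchNormalization sees the correct mode:

import tensorflow as tf

class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.bn = tf.keras.layers.BatchNormalization()
        self.out = tf.keras.layers.Dense(1)

    def call(self, x, training=None):
        # Forward `training` so BN uses batch statistics while fitting
        # and the stored moving averages at inference time.
        x = self.bn(x, training=training)
        return self.out(x)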

Unclear purpose of a class derived from Keras' BatchNormalization

I'm reading the code of a Keras implementation of the YOLOv4 object detector.
It uses a custom Batch Norm layer, like this:
class BatchNormalization(tf.keras.layers.BatchNormalization):
    """
    "Frozen state" and "inference mode" are two separate concepts.
    `layer.trainable = False` is to freeze the layer, so the layer will use
    stored moving `var` and `mean` in the "inference mode", and both `gamma`
    and `beta` will not be updated!
    """
    def call(self, x, training=False):
        if not training:
            training = tf.constant(False)
        # Run in training mode only if the caller asked for it AND
        # the layer has not been frozen via `trainable = False`.
        training = tf.logical_and(training, self.trainable)
        return super().call(x, training)
Even though I understand how the usual Batch Norm layer works during training and inference, I don't understand the comment or the need for this modification. What do you think?
Usually (in other types of layers), freezing a layer does not necessarily mean that the layer runs in inference mode; inference mode is normally controlled by the training argument.
In the case of the batchnorm layer, when the layer is frozen we want not only that its parameters stay fixed during training, but additionally that the model uses the moving mean and moving variance to normalize the current batch, rather than the mean and variance of the current batch itself.
So although a frozen layer and inference mode are normally distinct concepts, for batchnorm the two modes overlap much more: in both, we want the layer's parameters frozen and the general (moving) mean and standard deviation used, so the layer is not affected by the statistics of the current batch.
(This is important for the stability of the model.)
From the BatchNormalization layer guide - Keras, about setting layer.trainable = False on a BatchNormalization layer:
Freezing, for all types of layers:
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
The additional behavior when freezing a batchnorm layer:
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
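A quick way to convince yourself of this TF2 behavior (a minimal sketch; the shapes are arbitrary):

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = np.random.randn(32, 4).astype("float32")

_ = bn(x, training=True)                  # batch stats used, moving stats updated
frozen_mean = bn.moving_mean.numpy().copy()

bn.trainable = False                      # freeze: in TF2 this forces inference mode
_ = bn(x, training=True)                  # moving stats used and NOT updated

print(np.allclose(frozen_mean, bn.moving_mean.numpy()))  # True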

Passing `training=True` when doing TensorFlow training

TensorFlow's official tutorial says that we should call base_model(training=False) during training so that the BN layers do not update their mean and variance. My question is: why? Why don't we need to update the mean and variance? I mean, BN has ImageNet's mean and variance, so why is it useful to keep ImageNet's mean and variance rather than updating them on the new data? Even during fine-tuning, when the whole model updates its weights, the BN layers will still have ImageNet's mean and variance.
Edit: I am using this tutorial: https://www.tensorflow.org/tutorials/images/transfer_learning
When a model is trained from initialization, batchnorm should be enabled so that it tunes its mean and variance, as you mentioned. Fine-tuning or transfer learning is a slightly different thing: you already have a model that can do more than you need, and you want to specialize this pre-trained model to your task and your data set. In this case part of the weights are frozen and only some layers closest to the output are changed. Since BN layers are used throughout the model, you should freeze them as well. Check this explanation again:
Important note about BatchNormalization layers: many models contain tf.keras.layers.BatchNormalization layers. This layer is a special case and precautions should be taken in the context of fine-tuning, as shown later in this tutorial.
When you set layer.trainable = False, the BatchNormalization layer will run in inference mode, and will not update its mean and variance statistics.
When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing training = False when calling the base model. Otherwise, the updates applied to the non-trainable weights will destroy what the model has learned.
Source: the transfer learning tutorial, details regarding freezing
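The pattern from that tutorial looks roughly like this (a sketch; the base architecture, input shape, and head are illustrative):

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(input_shape=(160, 160, 3),
                                               include_top=False,
                                               weights='imagenet')
base_model.trainable = False              # freeze the pre-trained weights

inputs = tf.keras.Input(shape=(160, 160, 3))
# training=False pins the BN layers to inference mode, so they keep
# their ImageNet moving statistics even after the base is unfrozen.
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)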

What's the difference between the attributes 'trainable' and 'training' in the BatchNormalization layer in Keras / TensorFlow?

According to the official documentation from TensorFlow:
About setting layer.trainable = False on a BatchNormalization layer:
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
Usually, this does not necessarily mean that the layer is run in inference mode (which is normally controlled by the training argument that can be passed when calling a layer). "Frozen state" and "inference mode" are two separate concepts.
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
This behavior has been introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.
I don't quite understand the terms 'frozen state' and 'inference mode' here. I tried fine-tuning with trainable set to False, and I found that the moving mean and moving variance were not being updated.
So I have the following questions:
What's the difference between 2 attributes training and trainable?
Are gamma and beta updated during training if trainable is set to False?
Why is it necessary to set trainable to false when fine-tuning?
What's the difference between 2 attributes training and trainable?
trainable: if True, the trainable weights of the layer will be updated during backpropagation.
training: some layers behave differently at training and inference (testing) time; examples include the Dropout and Batch-Normalization layers. This argument tells the layer which of the two modes it should run in.
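A tiny illustration of the training argument (a sketch; the dropout rate and shapes are arbitrary):

import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 4), dtype="float32")

print(drop(x, training=False).numpy())  # [[1. 1. 1. 1.]] -- identity at inference
print(drop(x, training=True).numpy())   # some entries zeroed, the rest scaled by 1/(1-0.5)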
Is gamma and beta getting updated in the training process if set trainable to false?
Since gamma and beta are the "trainable" parameters of the BN layer, they will NOT be updated during training if trainable is set to False.
Why is it necessary to set trainable to false when fine-tuning?
When fine-tuning, we first add our own classification FC layer on top; it is randomly initialized, whereas our pre-trained model is already calibrated (somewhat) for the task.
As an analogy, think of it like this.
You have a number line from 0 to 10, where '0' represents a completely random model and '10' represents a near-perfect model. Our pre-trained model sits somewhere around 5, 6 or 7, i.e. most probably better than a random model. The FC layer we added on top starts at '0', since it is randomly initialized.
We set trainable = False for the pre-trained model so that the FC layer can catch up to the level of the pre-trained model quickly, i.e. with a higher learning rate. If we did not set trainable = False for the pre-trained model and used a higher learning rate, it would wreak havoc on the pre-trained weights.
So initially, we set a higher learning rate and trainable = False for the pre-trained model and train only the FC layer. After that, we unfreeze the pre-trained model and use a very low learning rate to fine-tune it.
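In code, that two-phase schedule looks roughly like this (a sketch; the base network, head, learning rates, and dataset are illustrative):

import tensorflow as tf

base = tf.keras.applications.ResNet50(include_top=False, pooling="avg",
                                      weights="imagenet")
head = tf.keras.layers.Dense(10, activation="softmax")
model = tf.keras.Sequential([base, head])

# Phase 1: freeze the pre-trained base, train the new head with a higher LR.
base.trainable = False
model.compile(tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=5)

# Phase 2: unfreeze and fine-tune everything with a much lower LR.
base.trainable = True
model.compile(tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")
# model.fit(train_ds, epochs=5)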

Implementation of Adversarial Loss in Keras

I'm trying to implement an adversarial loss in Keras.
The model consists of two networks: an auto-encoder (the target model) and a discriminator. The two models share the encoder.
I created the adversarial loss of the auto-encoder via a Keras variable:
def get_adv_loss(d_loss):
    def loss(y_true, y_pred):
        return some_loss(y_true, y_pred) - d_loss
    return loss

discriminator_loss = K.variable()
L = get_adv_loss(discriminator_loss)
autoencoder.compile(..., loss=L)
and during training I interleave train_on_batch calls on the discriminator and the autoencoder, updating discriminator_loss in between:
d_loss = discriminator.train_on_batch(x, y_domain)
discriminator_loss.assign(d_loss)
a_loss, ... = self.segmenter.train_on_batch(x, y_target)
However, I found that the value of this variable is frozen when the model is compiled. I tried to recompile the model during training, but that raises the error

Node 'IsVariableInitialized_13644': Unknown input node
'training_12/Adam/Variable'

which I guess means I can't recompile during training? Any suggestion on how I can inject the discriminator loss into the autoencoder?
Keras models support multiple outputs. So just include your discriminator in your Keras model and freeze the discriminator layers if the discriminator should not be trained.
The next question is how to combine the autoencoder loss and the discriminator loss. Luckily, keras model.compile supports loss weights. If the autoencoder is your first output and the discriminator your second, you can pass something like loss_weights=[1, -1], so a better discriminator makes the combined loss worse for the autoencoder.
Edit: Here is an example of how to implement an adversarial network:
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model

# Build your architecture
auto_encoder_input = Input((5,))
auto_encoder_net = Dense(10)(auto_encoder_input)
auto_encoder_output = Dense(5)(auto_encoder_net)
discriminator_net = Dense(20)(auto_encoder_output)
discriminator_output = Dense(5)(discriminator_net)

# Define the outputs of your models
train_autoencoder_model = Model(auto_encoder_input,
                                [auto_encoder_output, discriminator_output])
train_discriminator_model = Model(auto_encoder_input, discriminator_output)

# Compile the models (set the trainable flags, compile the first model,
# then flip the flags and compile the second)
for layer_index, layer in enumerate(train_autoencoder_model.layers):
    layer.trainable = layer_index < 3
train_autoencoder_model.compile('Adam', loss=['mse', 'mse'], loss_weights=[1, -1])

for layer_index, layer in enumerate(train_discriminator_model.layers):
    layer.trainable = layer_index >= 3
train_discriminator_model.compile('Adam', loss='mse')

# A simple example of what the training can look like
for i in range(10):
    auto_input = np.random.sample((10, 5))
    discrimi_output = np.random.sample((10, 5))
    train_discriminator_model.fit(auto_input, discrimi_output, steps_per_epoch=5, epochs=1)
    train_autoencoder_model.fit(auto_input, [auto_input, discrimi_output], steps_per_epoch=1, epochs=1)
As you can see, there is not much magic behind building an adversarial model with Keras.
Unless you decide to dig deep into the Keras source code, I don't think you can do this easily. Before writing your own adversarial module, you should check existing work carefully. As far as I know, keras-adversarial is still used by many people; of course, it only supports old Keras versions, e.g. 2.0.8.
Several other things:
Be careful when you freeze your model weights. If you compile a model first and then freeze some weights, those weights remain trainable, because the train function is generated at compile time. So you should freeze the weights first and then compile.
keras-adversarial does this job in a more elegant way: instead of making two models that share weights but freeze different subsets, it creates two train functions, one for each player.
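A minimal sketch of that freeze-then-compile rule (the toy model is illustrative):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, input_shape=(4,)),
    tf.keras.layers.Dense(1),
])

model.layers[0].trainable = False    # freeze BEFORE compiling
model.compile('adam', loss='mse')    # the train function built here honors the freeze

# Flipping `trainable` later has no effect until you compile again:
model.layers[0].trainable = True
model.compile('adam', loss='mse')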
