I'm reading the code of a Keras implementation of YOLOv4 object detector.
It uses a custom Batch Norm layer, like this:
class BatchNormalization(tf.keras.layers.BatchNormalization):
    """
    "Frozen state" and "inference mode" are two separate concepts.
    `layer.trainable = False` is to freeze the layer, so the layer will use
    stored moving `var` and `mean` in "inference mode", and neither `gamma`
    nor `beta` will be updated!
    """
    def call(self, x, training=False):
        if not training:
            training = tf.constant(False)
        training = tf.logical_and(training, self.trainable)
        return super().call(x, training)
Even though I understand how the usual batch norm layer works during training and inference, I don't understand the comment or the need for this modification. What do you think?
Usually (in other types of layers), freezing a layer's training does not necessarily mean that the layer runs in inference mode; inference mode is normally controlled by the training argument.
In the case of a batch norm layer, when the layer is frozen we want more than just its parameters staying fixed during training: we additionally want the model to use the moving mean and moving variance to normalize the current batch, rather than the mean and variance of the current batch itself.
So although "freezing a layer" and "inference mode" are generally distinct concepts, for batch norm the two modes largely coincide: in both, besides freezing the layer's parameters, we want the model to use the accumulated (population) mean and standard deviation and not be affected by the current batch statistics.
(This is important for the stability of the model.)
From BatchNormalization layer guide - Keras
About setting layer.trainable = False on a BatchNormalization layer:
Freezing a layer (this applies to all layer types):
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
Additional behavior when freezing a BatchNormalization layer:
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
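To see this behavior concretely, here is a minimal toy check of my own (not from the docs; it assumes TF 2.x with eager execution, and the numbers are arbitrary):

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = np.random.randn(32, 4).astype("float32") * 5.0 + 10.0  # stats far from 0/1

bn(x, training=True)   # build the layer and update the moving statistics once
bn.trainable = False   # freeze: in TF 2.x this forces inference mode for BN

before = bn.moving_mean.numpy()
bn(x, training=True)   # training=True is overridden by trainable=False
after = bn.moving_mean.numpy()

print(np.allclose(before, after))  # True: the moving statistics stayed fixed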
Related
TensorFlow's official tutorial says that we should pass base_model(training=False) during training so that the BN layers do not update their mean and variance. My question is: why? Why don't we need to update the mean and variance? I mean, BN has ImageNet's mean and variance, so why is it useful to keep ImageNet's statistics rather than updating them on the new data? Even during fine-tuning, when the whole model updates its weights, the BN layers will still keep ImageNet's mean and variance.
Edit: I am using this tutorial: https://www.tensorflow.org/tutorials/images/transfer_learning
When a model is trained from initialization, batch norm should be enabled so that its mean and variance are tuned, as you mentioned. Fine-tuning or transfer learning is a slightly different matter: you already have a model that can do more than you need, and you want to specialize that pre-trained model to your task / your data set. In this case part of the weights are frozen and only some layers closest to the output are changed. Since BN layers are used throughout the model, you should freeze them as well. Check this explanation again:
Important note about BatchNormalization layers

Many models contain tf.keras.layers.BatchNormalization layers. This layer is a special case and precautions should be taken in the context of fine-tuning, as shown later in this tutorial.
When you set layer.trainable = False, the BatchNormalization layer will run in inference mode, and will not update its mean and variance statistics.

When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing training = False when calling the base model. Otherwise, the updates applied to the non-trainable weights will destroy what the model has learned.
Source: the transfer learning tutorial, details regarding freezing.
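For reference, here is a condensed sketch of the pattern the tutorial describes (MobileNetV2 and the 160x160 input shape follow the tutorial; the classification head is simplified):

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights="imagenet")
base_model.trainable = False  # freeze the ImageNet weights

inputs = tf.keras.Input(shape=(160, 160, 3))
# training=False keeps the BN layers in inference mode, even after layers
# are unfrozen later for fine-tuning.
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)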
I am following along with DataCamp's tutorial on using convolutional autoencoders for classification here. I understand from the tutorial that we only need the autoencoder's head (i.e. the encoder part) stacked to a fully-connected layer to do the classification.
After stacking, the resulting network (convolutional autoencoder) is trained twice. The first time with the encoder's layers frozen:
for layer in full_model.layers[0:19]:
    layer.trainable = False
and the second time with them set back to trainable, re-training the network:
for layer in full_model.layers[0:19]:
    layer.trainable = True
I cannot understand why we do this twice. Anyone with experience working with conv-nets or autoencoders?
It's because the first 19 layers are already trained in this line:
autoencoder_train = autoencoder.fit(train_X, train_ground, batch_size=batch_size, epochs=epochs, verbose=1, validation_data=(valid_X, valid_ground))
The point of an autoencoder is dimensionality reduction. Say you have 1000 features and want to reduce them to 100. You train an autoencoder with an encoder followed by a decoder; the goal is that the encoded features (the encoder's output) can be decoded back into the original features.
After the autoencoder is trained, the decoder part is thrown away and a fully-connected classification layer is added on top of the encoder, to train a classification network from the reduced set of features. This is why the encoder layers' trainable is set to false: so that only the fully-connected classification layer is trained (which speeds up training).
The reason the encoder layers' trainable is set back to true is to then train the entire network, which can be slow because changes are back-propagated through the whole network.
They freeze the autoencoder's layers first because the newly stacked layers need to be initialized, i.e. trained until they "catch up" with the pre-trained weights.
If you skip this step, the uninitialized layers would untrain your autoencoder's layers during the backward pass, because the gradients flowing between the two networks can be large. You would eventually converge to the same point, but you would lose the time savings of starting from a pre-trained network.
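To make the schedule concrete, here is a self-contained toy sketch of the two-phase training (the architecture and the random data are placeholders, not the tutorial's exact network):

import numpy as np
import tensorflow as tf

encoder = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation='relu', input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
])
full_model = tf.keras.Sequential([
    encoder,
    tf.keras.layers.Dense(10, activation='softmax'),  # new classifier head
])

x = np.random.rand(64, 28, 28, 1).astype('float32')
y = tf.keras.utils.to_categorical(np.random.randint(0, 10, 64), 10)

# Phase 1: treat the encoder as pre-trained and freeze it; only the head learns.
encoder.trainable = False
full_model.compile(optimizer='adam', loss='categorical_crossentropy')
full_model.fit(x, y, epochs=1, verbose=0)

# Phase 2: unfreeze and fine-tune end to end, typically with a lower learning
# rate. Recompiling after changing trainable is required for it to take effect.
encoder.trainable = True
full_model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                   loss='categorical_crossentropy')
full_model.fit(x, y, epochs=1, verbose=0)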
According to the official documentation from TensorFlow:
About setting layer.trainable = False on a BatchNormalization layer:
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
Usually, this does not necessarily mean that the layer is run in inference mode (which is normally controlled by the training argument that can be passed when calling a layer). "Frozen state" and "inference mode" are two separate concepts.
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
This behavior has been introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.
I don't quite understand the terms 'frozen state' and 'inference mode' here. I tried fine-tuning with trainable set to False, and I found that the moving mean and moving variance were not updated.
So I have the following questions:
What's the difference between the two attributes, training and trainable?
Are gamma and beta updated during training if trainable is set to False?
Why is it necessary to set trainable to false when fine-tuning?
What's the difference between the two attributes, training and trainable?
trainable: (if True) means that the "trainable" weights of the layer will be updated during backpropagation.
training: some layers behave differently during the training and inference (testing) steps; examples include the Dropout and Batch Normalization layers. This argument tells the layer which mode it should run in.
Are gamma and beta updated during training if trainable is set to False?
Since gamma and beta are "trainable" parameters of the BN layer, they will NOT be updated during training if trainable is set to False.
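A quick way to verify which parameters the two flags govern (a toy check of my own, not from the docs):

import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
bn.build((None, 4))

print([w.name for w in bn.trainable_weights])      # gamma, beta
print([w.name for w in bn.non_trainable_weights])  # moving_mean, moving_variance

bn.trainable = False
print(len(bn.trainable_weights))  # 0 -> gamma and beta are no longer updated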
Why is it necessary to set trainable to false when fine-tuning?
When doing fine-tuning, we first add our own classification FC layer on top, which is randomly initialized, whereas our "pre-trained" model is already calibrated (somewhat) for the task.
As an analogy, think of it like this.
You have a number line from 0 to 10, where '0' represents a completely random model and '10' represents a near-perfect model. Our pre-trained model sits somewhere around 5, 6, or 7, i.e. most probably better than a random model. The FC layer we added on top starts at '0', since it is randomly initialized.
We set trainable = False for the pre-trained model so that the FC layer can catch up to the pre-trained model's level rapidly, i.e. with a higher learning rate. If we didn't set trainable = False for the pre-trained model and used a higher learning rate, it would wreak havoc.
So initially we set a higher learning rate and trainable = False for the pre-trained model, and train the FC layer. After that, we unfreeze the pre-trained model and use a very low learning rate for fine-tuning.
I am wondering if in the following model, dropout will be disabled when I call model.evaluate(...).
layers = [tf.keras.layers.Dense(size, activation='relu')
          for size in (20, 40, 20)]
layers.insert(1, tf.keras.layers.Dropout(0.2))
layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
model = tf.keras.models.Sequential(layers)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
    loss=tf.keras.losses.BinaryCrossentropy())
model.fit(...)
model.evaluate(...)  # ==> will dropout be deactivated here?
Yes, dropout is always disabled at inference (evaluate/predict).
As per the documentation:
Call arguments:
inputs: Input tensor (of any rank).
training: Python boolean indicating whether the layer should behave in training mode (adding dropout) or in inference mode (doing nothing).
So yes, dropout is disabled when testing, which is logically correct.
The same holds for SpatialDropout as well.
Please find the link to the documentation below.
https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/layers/Dropout
The TensorFlow Keras API uses a learning phase flag to identify whether we are training or testing. The learning phase flag is a bool tensor (0 = test, 1 = train) to be passed as input to any Keras function that behaves differently at train time and test time.
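A minimal demonstration of that flag on a Dropout layer (my own toy example, assuming TF 2.x eager execution):

import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 10), dtype="float32")

print(drop(x, training=False).numpy())  # all ones: dropout is a no-op
print(drop(x, training=True).numpy())   # ~half zeroed, survivors scaled by 1/(1-0.5)

evaluate() and predict() call the layers with training=False, which is why dropout is inactive there.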
I'm implementing a convolutional neural network using nolearn's lasagne wrapper.
I'd like to fix some parameters that were pre-learned.
How can I set some layers to be untrainable?
Actually, even though I removed the 'trainable' attribute from some layers, the number shown in the layer information before fitting, i.e. "Neural Network with *** learnable parameters", never changes.
Besides, I suspect that the greeting function in 'handlers.py',

def _get_greeting(nn):
    shapes = [param.get_value().shape for param in
              nn.get_all_params() if param]

should instead use

              nn.get_all_params(trainable=True) if param]

but I'm not sure how that would affect training.
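For what it's worth, here is a hedged sketch of how plain Lasagne (independent of nolearn) freezes a layer, by removing the 'trainable' tag from its parameters; the network here is made up for illustration:

import lasagne

l_in = lasagne.layers.InputLayer((None, 784))
l_hid = lasagne.layers.DenseLayer(l_in, num_units=100)  # layer to freeze
l_out = lasagne.layers.DenseLayer(l_hid, num_units=10)

# Each parameter carries a set of tags; the update rules only collect
# parameters tagged 'trainable', so discarding the tag freezes them.
for param in l_hid.params:
    l_hid.params[param].discard('trainable')

# Only l_out's parameters remain in the trainable set now.
print(lasagne.layers.get_all_params(l_out, trainable=True))

If the printed parameter count in nolearn still includes the frozen layer after this, that would be consistent with the suspicion above that _get_greeting calls get_all_params() without the trainable=True filter.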