Passing `training=True` when doing TensorFlow training - python

TensorFlow's official tutorial says that we should pass `base_model(training=False)` during training so that the BN layer does not update its mean and variance. My question is: why? Why don't we need to update the mean and variance? I mean, BN has ImageNet's mean and variance, and why is it useful to keep ImageNet's mean and variance instead of updating them on the new data? Even during fine-tuning, the whole model updates its weights, but the BN layer will still have ImageNet's mean and variance.
Edit: I am using this tutorial: https://www.tensorflow.org/tutorials/images/transfer_learning

When a model is trained from initialization, batch norm should be enabled so that it tunes its mean and variance, as you mentioned. Fine-tuning or transfer learning is a bit different: you already have a model that can do more than you need, and you want to perform a particular specialization of the pre-trained model to do your task / work on your data set. In this case part of the weights are frozen and only some layers closest to the output are changed. Since BN layers are used throughout the model, you should freeze them as well. Check this explanation again:
Important note about BatchNormalization layers: Many models contain tf.keras.layers.BatchNormalization layers. This layer is a special case and precautions should be taken in the context of fine-tuning, as shown later in this tutorial.
When you set layer.trainable = False, the BatchNormalization layer will run in inference mode, and will not update its mean and variance statistics.
When you unfreeze a model that contains BatchNormalization layers in order to do fine-tuning, you should keep the BatchNormalization layers in inference mode by passing training = False when calling the base model. Otherwise, the updates applied to the non-trainable weights will destroy what the model has learned.
Source: the transfer learning tutorial, in the section on freezing layers.
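
For concreteness, here is a minimal sketch of the pattern the tutorial uses (MobileNetV2 is the tutorial's base model; the input shape and head here are illustrative):

import tensorflow as tf

base_model = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights='imagenet')
base_model.trainable = False  # freeze all pre-trained weights

inputs = tf.keras.Input(shape=(160, 160, 3))
# training=False keeps the BatchNormalization layers in inference mode,
# so they keep normalizing with their stored ImageNet moving statistics
# even after the base model is later unfrozen for fine-tuning.
x = base_model(inputs, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
outputs = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs, outputs)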

Related

If we expand or reduce the layers of the same model, can we still train from a pretrained model in PyTorch?

If a pretrained model such as ResNet101 was trained on the ImageNet dataset, and I then change some layers inside it, can I still use the pretrained model on a different ABC dataset?
Let's say this is a ResNet34 model. It is pretrained on ImageNet and saved as a ResNet.pt file.
If I changed some layers inside it, let's say I made it deeper by introducing some layers in conv4_x (check image):
model = Resnet34()  # I have changed some layers inside this Resnet34()
optimizer = optim.Adam(model.parameters(), lr=0.00005)
# 'Resnet.pt' holds the pretrained model from before the changes
model.load_state_dict(torch.load('Resnet.pt')['state_dict'])
optimizer.load_state_dict(torch.load('Resnet.pt')['optimizer'])
Can I do this, or is there any other method?
You can do anything you like - the question is: would it be better than training from scratch?
Here are a few issues you might encounter:
1. A mismatch between the weights saved in ResNet.pt (the trained weights of the original ResNet34) and the state_dict of your modified model.
You would probably need to manually make sure that the old weights are correctly assigned to the original layers, with only the new layers left to be initialized from scratch.
2. Initializing the weights of the new layers.
Since you are training a ResNet, you can take advantage of the residual connections and initialize the weights of the new layers such that they initially make no contribution to the predicted value and only pass the input directly to the output via the residual link. A sketch of both points follows below.
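
Here is a minimal sketch of both points, assuming the modified model keeps the original parameter names (strict=False and the zero-init trick are illustrative, not from the original answer; Resnet34 is the question's own class and new_blocks is a hypothetical list of the added blocks):

import torch
import torch.nn as nn

model = Resnet34()  # modified architecture, as defined in the question
checkpoint = torch.load('Resnet.pt')

# strict=False loads every weight whose name and shape still match,
# and reports the new layers as missing keys.
missing, unexpected = model.load_state_dict(
    checkpoint['state_dict'], strict=False)
print('newly initialized:', missing)

# Zero-init the last BN of each new residual block so the block
# initially behaves like an identity via its skip connection.
for block in model.new_blocks:  # hypothetical attribute, for illustration
    nn.init.zeros_(block.bn2.weight)

Note that the saved optimizer state will not match the new parameter list, so it is usually simpler to create a fresh optimizer rather than load the one from the checkpoint.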

Unclear purpose of a class derived from Keras' BatchNormalization

I'm reading the code of a Keras implementation of YOLOv4 object detector.
It uses a custom Batch Norm layer, like this:
class BatchNormalization(tf.keras.layers.BatchNormalization):
    """
    "Frozen state" and "inference mode" are two separate concepts.
    `layer.trainable = False` is to freeze the layer, so the layer will use
    stored moving `var` and `mean` in the "inference mode", and both `gamma`
    and `beta` will not be updated!
    """
    def call(self, x, training=False):
        if not training:
            training = tf.constant(False)
        training = tf.logical_and(training, self.trainable)
        return super().call(x, training)
Even though I understand how the usual Batch Norm layer works during training and inference, I understand neither the comment nor the need for this modification. What do you think?
Usually (in other types of layers), freezing a layer does not necessarily mean that the layer runs in inference mode. Inference mode is normally controlled by the training argument.
In the case of a batch norm layer, when the layer is frozen we want not only that its parameters stay fixed during training; we additionally want the model to use the moving mean and the moving variance to normalize the current batch, rather than the mean and variance of the current batch.
So although "training freeze" and "inference mode" are in general two different things, for batch norm the two modes are much closer: in both, besides freezing the layer's parameters, the model should use the accumulated mean and standard deviation and not be affected by the current batch statistics. (This is important for the stability of the model.)
From the BatchNormalization layer guide - Keras,
About setting layer.trainable = False on a BatchNormalization layer:
Freezing a layer, for all layer types:
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
Additional behavior when freezing a batch norm layer:
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
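
To make the overridden call concrete, here is a small sketch of the effective flag it computes for all four combinations (the tf.logical_and line mirrors the custom layer above):

import tensorflow as tf

for training in (True, False):
    for trainable in (True, False):
        # Mirrors `training = tf.logical_and(training, self.trainable)`
        effective = tf.logical_and(training, trainable)
        print(f'training={training}, trainable={trainable} '
              f'-> batch stats used: {bool(effective)}')

Batch statistics are used (and the moving averages updated) only when both flags are True; freezing the layer therefore forces inference mode no matter how it is called.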

Will freezing layers in tensorflow save model update time?

I'm trying to train a model using transfer learning, which involves loading and freezing a very large embedding layer. Since training speed is also a concern in this task, I expected freezing layers to boost training speed as well. However, I haven't observed any speed improvement at all.
I freeze the embedding layer using
self.embedding_layer.trainable = False
and I did observe that the non-trainable parameter count increased in the model summary.
As frozen layers may occur in the middle of a model, and gradients need to be passed down to the first trainable layers, I reckon TensorFlow still calculates the gradients of frozen layers and just skips them during the update stage.
If my guess is correct, is there any way to remove the frozen layers from the gradient calculation?
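
One way to sidestep the backward pass entirely, when the frozen embedding is the first layer (as is typical), is to precompute its outputs once and train only the head on those features. This is a sketch under that assumption; the shapes and the head architecture are illustrative:

import numpy as np
import tensorflow as tf

vocab_size, embed_dim, seq_len = 10000, 128, 20

embedding = tf.keras.layers.Embedding(vocab_size, embed_dim)
embedding.trainable = False

# Run the frozen layer once, outside the training loop.
x_ids = np.random.randint(0, vocab_size, size=(1024, seq_len))
x_embedded = embedding(x_ids)
y = np.random.randint(0, 2, size=(1024,)).astype('float32')

# The trained model never contains the embedding, so no gradient
# work is done for it at all.
head = tf.keras.Sequential([
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
head.compile(optimizer='adam', loss='binary_crossentropy')
head.fit(x_embedded, y, epochs=2, batch_size=32)

If the frozen layer sits in the middle of the model, gradients still have to flow through it to reach earlier trainable layers, so this trick does not apply there.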

What's the difference between the attributes 'trainable' and 'training' in the BatchNormalization layer in Keras TensorFlow?

According to the official documentation from TensorFlow:
About setting layer.trainable = False on a BatchNormalization layer:
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
Usually, this does not necessarily mean that the layer is run in inference mode (which is normally controlled by the training argument that can be passed when calling a layer). "Frozen state" and "inference mode" are two separate concepts.
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
This behavior has been introduced in TensorFlow 2.0, in order to enable layer.trainable = False to produce the most commonly expected behavior in the convnet fine-tuning use case.
I don't quite understand the terms 'frozen state' and 'inference mode' here. I tried fine-tuning by setting trainable to False, and I found that the moving mean and moving variance are not being updated.
So I have the following questions:
What's the difference between the two attributes training and trainable?
Are gamma and beta updated during training if trainable is set to False?
Why is it necessary to set trainable to False when fine-tuning?
What's the difference between the two attributes training and trainable?
trainable: (if True) the trainable weights of the layer will be updated during backpropagation.
training: some layers behave differently at training and inference (testing) time; examples include the Dropout and Batch-Normalization layers. This argument tells the layer in which of the two modes it should run.
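
A short sketch of how the two attributes interact for batch norm in TF 2.x (shapes and values are illustrative):

import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = tf.random.normal((32, 4)) + 5.0    # batch with mean around 5

_ = bn(x, training=True)               # batch stats used, moving stats updated
print(bn.moving_mean.numpy())          # has started drifting from 0 towards 5

bn.trainable = False                   # freeze the layer (TF >= 2.0 behavior)
before = bn.moving_mean.numpy().copy()
_ = bn(x, training=True)               # runs in inference mode regardless
print((bn.moving_mean.numpy() == before).all())  # True: stats unchanged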
Are gamma and beta updated during training if trainable is set to False?
Since gamma and beta are the trainable parameters of the BN layer, they will NOT be updated during training if trainable is set to False.
Why is it necessary to set trainable to False when fine-tuning?
When fine-tuning, we first add our own classification FC layer at the top, which is randomly initialized, while our pre-trained model is already somewhat calibrated for the task.
As an analogy, think of it like this: you have a number line from 0 to 10. On this number line, '0' represents a completely random model, whereas '10' represents a near-perfect model. Our pre-trained model is somewhere around 5, 6 or 7, i.e. most probably better than a random model, while the FC layer we added at the top starts at '0' since it is randomly initialized.
We set trainable = False for the pre-trained model so that we can make the FC layer reach the level of the pre-trained model rapidly, i.e. with a higher learning rate. If we don't set trainable = False for the pre-trained model and use a higher learning rate, it will wreak havoc.
So initially we set a higher learning rate and trainable = False for the pre-trained model, and train the FC layer. After that, we unfreeze the pre-trained model and use a very low learning rate to serve our purpose; a sketch of this two-phase schedule follows.
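
A compact sketch of that two-phase schedule (the base model, head, and learning rates are illustrative):

import tensorflow as tf

base = tf.keras.applications.MobileNetV2(include_top=False, pooling='avg',
                                         weights='imagenet')
base.trainable = False                 # phase 1: train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss='sparse_categorical_crossentropy')
# model.fit(train_ds, epochs=5)        # higher learning rate is safe here

base.trainable = True                  # phase 2: unfreeze the backbone
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),  # much lower LR
              loss='sparse_categorical_crossentropy')
# model.fit(train_ds, epochs=5)

Note that the model must be recompiled after changing trainable for the change to take effect.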
Feel free to ask for more clarification if required, and upvote if you find this helpful.

How to train two dense layers at different learning rates in Tensorflow?

I am trying to build a multi-task CNN in TensorFlow which has two dense layers in parallel, one for age prediction and the other for gender prediction. How can I train each dense layer for a different number of epochs, given that one can converge before the other and training both for the same number of epochs would overfit one of them?
Also, if I propagate the gradients of both age and gender into the shared CNN, would it overfit, since its weights are being updated at twice the rate of the dense layers?
I have asked a similar question and I've finally found the answer: LINK
SOLUTION: You can define two different train_steps, each with its own learning rate. Each train_step can be called a chosen number of times. In addition, you can define some dependencies if you want some variables to be trainable only for a selected train_step. (See the documentation.)
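
In TF 2.x the same idea can be written with two optimizers, each applied to its own variable group inside a custom train step; a sketch with illustrative names:

import tensorflow as tf

# Hypothetical shared trunk and two task heads.
trunk = tf.keras.Sequential([tf.keras.layers.Dense(32, activation='relu')])
age_head = tf.keras.layers.Dense(1)
gender_head = tf.keras.layers.Dense(1, activation='sigmoid')

opt_age = tf.keras.optimizers.Adam(1e-3)
opt_gender = tf.keras.optimizers.Adam(1e-4)   # different learning rate

def train_step(x, y_age, y_gender):
    with tf.GradientTape(persistent=True) as tape:
        features = trunk(x)
        age_loss = tf.reduce_mean(tf.square(age_head(features) - y_age))
        gender_loss = tf.reduce_mean(
            tf.keras.losses.binary_crossentropy(y_gender,
                                                gender_head(features)))
    # Each optimizer updates only its own variable group; the shared
    # trunk is assigned to the age step here, but it could be split
    # differently or given to both.
    age_vars = trunk.trainable_variables + age_head.trainable_variables
    gender_vars = gender_head.trainable_variables
    opt_age.apply_gradients(zip(tape.gradient(age_loss, age_vars), age_vars))
    opt_gender.apply_gradients(
        zip(tape.gradient(gender_loss, gender_vars), gender_vars))
    del tape  # persistent tape must be released explicitly

Each step can then be called a different number of times per epoch, which addresses the uneven-convergence concern in the question.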
