Unset trainable attributes for parameters in lasagne / nolearn neural networks - python

I'm implementing a convolutional neural network using lasagne / nolearn.
I'd like to fix (freeze) some parameters that were pre-learned.
How can I make some layers untrainable?
In fact, even though I removed the 'trainable' tag from the parameters of some layers, the number shown in the layer information before fitting, i.e. "Neural Network with *** learnable parameters", never changes.
Besides, I suspect that the greeting function in 'handlers.py',

def _get_greeting(nn):
    shapes = [param.get_value().shape for param in
              nn.get_all_params() if param]

should instead use

              nn.get_all_params(trainable=True) if param]

but I'm not sure how that affects training.
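For what it's worth, below is a minimal sketch of removing the 'trainable' tag in plain lasagne; the toy layers are placeholders, and with nolearn the same layer objects are reachable through net.layers_['layer_name'] after net.initialize():

import lasagne
from lasagne.layers import InputLayer, DenseLayer

# Toy stack of layers, just for illustration.
l_in = InputLayer(shape=(None, 100), name='input')
l_hidden = DenseLayer(l_in, num_units=50, name='hidden')
l_out = DenseLayer(l_hidden, num_units=10, name='output')

# Layer.params maps each shared variable to its set of tags.
# Removing the 'trainable' tag excludes the parameter from
# lasagne.layers.get_all_params(..., trainable=True).
for param in l_hidden.params:
    l_hidden.params[param].discard('trainable')

print(lasagne.layers.get_all_params(l_out, trainable=True))  # 'hidden' params are gone

Note that the greeting quoted above calls nn.get_all_params() without the trainable=True filter, so the displayed count includes frozen parameters either way; whether training actually skips them depends on nolearn building its updates from get_all_params(trainable=True), which is worth verifying in the installed version.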

Related

Unclear purpose of a class derived from Keras' BatchNormalization

I'm reading the code of a Keras implementation of the YOLOv4 object detector.
It uses a custom Batch Norm layer, like this:
class BatchNormalization(tf.keras.layers.BatchNormalization):
    """
    "Frozen state" and "inference mode" are two separate concepts.
    `layer.trainable = False` is to freeze the layer, so the layer will use
    stored moving `var` and `mean` in the "inference mode", and both `gamma`
    and `beta` will not be updated!
    """
    def call(self, x, training=False):
        if not training:
            training = tf.constant(False)
        training = tf.logical_and(training, self.trainable)
        return super().call(x, training)
Even though I understand how the usual Batch Norm layer works during training and inference, I don't understand the comment or the need for this modification. What do you think?
Usually (for other types of layers), freezing a layer does not necessarily mean that the layer runs in inference mode; inference mode is normally controlled by the training argument.
In the case of a batchnorm layer, when the layer is frozen we want not only that the layer's parameters are not modified during training, but also that the model uses the moving mean and the moving variance to normalize the current batch, rather than the mean and variance of the current batch itself.
So although freezing a layer and running it in inference mode are usually distinct concepts, for batchnorm the two modes are much closer: in both, we want that in addition to freezing the layer's parameters, the model uses the accumulated mean and standard deviation and is not affected by the current batch statistics.
(This is important for the stability of the model.)
From the BatchNormalization layer guide in the Keras documentation, about setting layer.trainable = False on a BatchNormalization layer:
Freezing a layer (this applies to all types of layers):
The meaning of setting layer.trainable = False is to freeze the layer, i.e. its internal state will not change during training: its trainable weights will not be updated during fit() or train_on_batch(), and its state updates will not be run.
Additional behavior when freezing a batchnorm layer:
However, in the case of the BatchNormalization layer, setting trainable = False on the layer means that the layer will be subsequently run in inference mode (meaning that it will use the moving mean and the moving variance to normalize the current batch, rather than using the mean and variance of the current batch).
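As a quick check, here is a sketch assuming TF 2.x, where this behavior is built into keras.layers.BatchNormalization: once a standard batchnorm layer is frozen, it produces the same output whether it is called with training=True or training=False.

import numpy as np
import tensorflow as tf

bn = tf.keras.layers.BatchNormalization()
x = np.random.randn(4, 8).astype("float32")
bn(x, training=True)        # build the layer and update the moving statistics once

bn.trainable = False        # freeze: gamma/beta fixed AND the layer runs in inference mode
out_training_arg = bn(x, training=True)
out_inference = bn(x, training=False)
print(np.allclose(out_training_arg.numpy(), out_inference.numpy()))  # expected: True

The custom class quoted in the question appears to make this coupling explicit (training = tf.logical_and(training, self.trainable)), presumably for versions or execution paths where the base layer does not guarantee it.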

How to make a neural network generalize better?

I designed a neural network model with a large number of outputs predicted by a softmax function. However, I want to group all the outputs into 5 classes without modifying the architecture of the other layers. The model performs well in the first case, but when I decrease the number of outputs it loses accuracy and generalizes poorly. My question is: is there a method to make my model perform well even with just 5 outputs? For example: adding a dropout layer before the output layer, using another activation function, etc.
If it is a plain neural network, then yes: definitely use the ReLU activation function in the hidden layers and add a dropout layer after each hidden layer (see the sketch below). You can also normalize your data before feeding it to the network.
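A minimal sketch of these suggestions in Keras; the layer sizes, dropout rate, and input dimension are placeholders rather than values from the question:

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(64,)),  # assumed input dim
    tf.keras.layers.Dropout(0.5),                  # dropout after each hidden layer
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(5, activation="softmax"),  # the 5-way output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Inputs can be standardized beforehand, e.g. x = (x - x.mean(axis=0)) / x.std(axis=0).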

How to control differential chain rule in Keras

I have a convolutional neural network with some layers in Keras. The last layer of this network is a custom layer that is responsible for sorting some numbers it receives from the previous layer; the output of this custom layer is then used to compute the loss function.
For this purpose (sorting), I use operators such as K.argmax and K.gather in this layer.
In the back-propagation phase I get this error from Keras:

An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval

which is understandable, given that this layer takes part in the differentiation.
Given that my custom layer does not need to participate in the differentiation chain rule, how can I control the chain rule in Keras? Can I disable this process for a custom layer?
The Reorder layer I use in my code is simply the following:
def Reorder(args):
    z = args[0]
    l = args[1]
    index = K.tf.argmax(l, axis=1)
    return K.tf.gather(z, index)

Reorder_Layer = Lambda(Reorder, name='out_x')
pred_x = Reorder_Layer([z, op])
A few things:
It's impossible to train without a derivative, so there is no solution if you want to train through this layer.
It's not necessary to compile the model if you are only going to predict, so in that case you don't need custom derivation rules.
If the problem really is in that layer, I suppose that l is computed by the model using trainable layers before it.
If you really want to try this, which doesn't seem like a good idea, you can try l = keras.backend.stop_gradient(args[1]), as sketched below. But this means that absolutely nothing will be trained from l back to the beginning of the model. If this doesn't work, then you have to set trainable=False on all the layers that produce l before compiling the model.
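For illustration, a hedged sketch of the stop_gradient suggestion applied to the question's Reorder function; z and op stand in for the question's model tensors, and their shapes here are placeholders:

from keras.layers import Input, Lambda
import keras.backend as K

z = Input(shape=(10, 3))   # placeholder for the question's `z`
op = Input(shape=(10,))    # placeholder for the question's `op`

def Reorder(args):
    z, l = args
    l = K.stop_gradient(l)        # gradients will not flow back through `l`
    index = K.argmax(l, axis=1)
    return K.gather(z, index)

pred_x = Lambda(Reorder, name='out_x')([z, op])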

LSTM without parameter sharing in Tensorflow/Keras

We know that one of the defining features of an LSTM is parameter sharing across timesteps. I want an LSTM without parameter (weight and bias) sharing. How can I implement such a structure using TensorFlow and Keras?

mxnet: multiple dropout layers with shared mask

I'd like to reproduce a recurrent neural network where each time layer is followed by a dropout layer, and these dropout layers share their masks. This structure was described in, among others, A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.
As far as I understand the code, the recurrent network models implemented in MXNet do not apply any dropout layers between time layers; the dropout parameter of functions such as lstm (R API, Python API) actually defines dropout on the input. I would therefore need to reimplement these functions from scratch.
However, the Dropout layer does not seem to take a variable that defines mask as a parameter.
Is it possible to make multiple dropout layers in different places of the computation graph, yet sharing their masks?
According to the discussion here, it is not possible to specify the mask, and setting a random seed does not affect dropout's random number generator.
