I need a kind of custom backpropagation so that, in an arbitrary layer of the network, I can decide whether to actually modify the weights leaving that layer or leave them unchanged.
For example: I would like to study what happens if, during training, I force some of the weights connecting the input layer to the first hidden layer not to be updated.
Is there a simple way to just "correct" the normal backpropagation by intercepting it between the layers?
Thanks
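The question does not name a framework, so here is only a minimal PyTorch sketch of one possible approach, assuming a toy two-layer model: a gradient hook multiplies the incoming gradient by a 0/1 mask, so the masked weights never receive an update (with plain SGD and no weight decay they stay exactly fixed).

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Hypothetical mask: 1 = update this weight, 0 = keep it frozen.
mask = torch.ones_like(model[0].weight)
mask[0, :] = 0.0  # e.g. freeze every input weight feeding the first hidden neuron

# The hook runs during backward() and zeroes the gradient of the frozen entries.
model[0].weight.register_hook(lambda grad: grad * mask)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 4), torch.randn(16, 2)

loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()  # the masked weights are unchanged after this step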
A PyTorch RNN has RNN.weight_ih, which is the weight between the input and the hidden layer, and RNN.weight_hh, which is the weight between the hidden layer and itself. Why is there no weight between the hidden layer and the output?
When I was learning about RNNs, I learned that there are 3 weights.
There’s no weight there because the PyTorch RNN doesn’t prescribe how to create the output from the hidden state. When you apply the RNN to a sequence, it returns the sequence of hidden states.
You can decide what to do with these: maybe a linear transformation is the right way to get the output (as you learned it). Maybe you don’t need the outputs, except for the final one. In that case, you can save O(T) computations by only computing the final output, manually.
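A minimal PyTorch sketch of this idea (the Linear head below is something you define yourself; it is not part of nn.RNN, and the sizes are arbitrary):

import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
head = nn.Linear(20, 5)               # your own "hidden-to-output" weight

x = torch.randn(3, 7, 10)             # (batch, time, features)
outputs, h_n = rnn(x)                 # hidden state at every time step
y_all = head(outputs)                 # an output per time step, or ...
y_last = head(outputs[:, -1, :])      # ... only the final output, skipping the rest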
I am creating a CNN to predict the distributed strain applied to an optical fiber from the measured light spectrum (2D), which is ideally a Lorentzian curve. The label is a 1D array where only the strained section is non-zero (the label looks like a square wave).
My CNN has 10 alternating convolution and pooling layers, all activated by ReLU. This is followed by 3 fully-connected hidden layers with softmax activation, then an output layer activated by ReLU. Usually, CNNs and other neural networks use ReLU for the hidden layers and softmax for the output layer (in the case of classification problems). But in this case, I use softmax first to determine the positions along the optical fiber where strain is applied (i.e. non-zero), and then ReLU in the output layer for regression. My CNN is able to predict the labels rather accurately, but I cannot find any publication where softmax is used in hidden layers followed by ReLU in the output layer, nor any explanation of why this approach is not recommended (or is mathematically unsound), other than answers on Quora/Stack Overflow. I would really appreciate it if anyone could enlighten me on this matter, as I am pretty new to deep learning and wish to learn from this. Thank you in advance!
If you look at the way a layer l sees the input from the previous layer l-1, it assumes that the dimensions of the feature vector are linearly independent.
If the model is building some kind of confidence using a set of neurons, then the neurons had better be linearly independent; otherwise it is simply exaggerating the value of a single neuron.
If you apply softmax in the hidden layers, you are essentially combining multiple neurons and tampering with their independence. Also, one of the reasons ReLU is preferred is that it gives you better gradients than saturating activations such as the sigmoid. And if your goal is to add normalization to your layers, you'd be better off using an explicit batch normalization layer.
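If what the hidden softmax is really buying you is normalization, a hedged Keras sketch of the alternative suggested here would look like the following; the shapes and layer sizes are hypothetical, not the asker's actual architecture.

from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Conv2D(32, 3, activation='relu', input_shape=(64, 64, 1)),
    layers.BatchNormalization(),          # explicit normalization instead of a hidden softmax
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.BatchNormalization(),
    layers.Dense(50, activation='relu'),  # ReLU output for the non-negative strain profile
])
model.compile(optimizer='adam', loss='mse')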
I was working with a transfer learning task.
Inadvertently, to make things easier, I put all the last layers in a torch.nn.Sequential wrapper like this:
self.fc = nn.Sequential(
    nn.Linear(24*24*64, 80),
    nn.ReLU(True),
    nn.Linear(80, 964),
)
Now what I want to do is replace the last 80-unit linear layer with an identity mapping. I had trained this network and saved the weights, and I don't want to train it again (time-consuming :( ).
Is there any way I can replace the Linear layer inside?
I know the usual outer-layer replacement with model.fc1 = nn.Identity(), but I don't think that works here, since the individual fc layers are not standalone attributes of the model; they are wrapped inside the Sequential object.
Perhaps there is another way to hack around it? I have all the time in the world amidst the coronavirus crisis :), so any solution would do.
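One possible way, since nn.Sequential supports indexing and item assignment, is to load the saved weights first and then swap the wrapped layer in place. A minimal sketch with a stand-in model class (the checkpoint path is hypothetical, and which index you replace depends on which Linear you mean):

import torch
import torch.nn as nn

class Net(nn.Module):                     # stand-in for the model in the question
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(24*24*64, 80),
            nn.ReLU(True),
            nn.Linear(80, 964),
        )

    def forward(self, x):
        return self.fc(x)

model = Net()
# model.load_state_dict(torch.load('weights.pth'))  # load the saved weights first (hypothetical path)

# The Sequential's children are indexable: 0 = Linear(24*24*64, 80), 1 = ReLU, 2 = Linear(80, 964)
model.fc[2] = nn.Identity()               # the network now outputs the 80-dimensional features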
I have a convolutional neural network with some layers in Keras. The last layer in this network is a custom layer that is responsible for sorting the numbers it gets from the previous layer; the output of this custom layer is then sent to the loss function.
For this purpose (sorting), I use operators such as K.argmax and K.gather in this layer.
In the back-propagation phase I get an error from Keras that says:
An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval
That is understandable, because this layer is involved in the differentiation process.
Given that my custom layer does not need to participate in the chain rule, how can I control differentiation in Keras? Can I disable this process for the custom layer?
The Reorder layer that I use in my code is simply the following:
def Reorder(args):
    z = args[0]
    l = args[1]
    index = K.tf.argmax(l, axis=1)
    return K.tf.gather(z, index)

Reorder_Layer = Lambda(Reorder, name='out_x')
pred_x = Reorder_Layer([z, op])
A few things:
It's impossible to train without a derivative, so there is no solution if you want to train this model.
It's not necessary to compile the model if you are only going to predict, so in that case you don't need custom derivation rules.
If the problem is really in that layer, I suppose that l is computed by the model using trainable layers before it.
If you really want to try this, which doesn't seem like a good idea, you can try l = keras.backend.stop_gradient(args[1]). But this means that absolutely nothing will be trained from l back to the beginning of the model. If this doesn't work, then you have to make all layers that produce l have trainable=False before compiling the model.
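A sketch of that suggestion applied to the Reorder function from the question, keeping the question's K.tf usage (z and op are the tensors from the rest of the model, which is not shown here):

from keras import backend as K
from keras.layers import Lambda

def Reorder(args):
    z = args[0]
    l = K.stop_gradient(args[1])     # detach l: no gradient will flow back through it
    index = K.tf.argmax(l, axis=1)   # argmax/gather now only see a constant l
    return K.tf.gather(z, index)

Reorder_Layer = Lambda(Reorder, name='out_x')
# pred_x = Reorder_Layer([z, op])    # z and op come from the rest of the question's model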
I need to implement neuron freezing in a CNN for deep learning research.
I tried to find a suitable function in the TensorFlow docs, but I didn't find anything.
How can I freeze specific neurons when the layers are implemented with tf.nn.conv2d?
A neuron in a dense neural network layer simply corresponds to a column in the weight matrix. You could therefore redefine your weight matrix as a concatenation of two parts/variables, one trainable and one not. Then you could either:
selectively pass only the trainable part in the var_list argument of the minimize function of your optimizer, or
use tf.stop_gradient on the vector/column corresponding to the neuron you want to freeze.
The same concept can be used for convolutional layers, although in this case the definition of a "neuron" becomes less clear; still, you could freeze any column(s) of a convolutional kernel. A sketch of the first option follows below.
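A minimal TF 1.x-style sketch of splitting a dense layer's weight matrix into a frozen block of columns and a trainable block (TF 1.x is assumed to match the tf.nn.conv2d usage in the question; all shapes and names are illustrative):

import tensorflow as tf

n_in, n_frozen, n_trainable = 128, 10, 54       # hypothetical sizes

w_frozen = tf.get_variable('w_frozen', [n_in, n_frozen], trainable=False)
w_train = tf.get_variable('w_train', [n_in, n_trainable], trainable=True)
w = tf.concat([w_frozen, w_train], axis=1)      # full weight matrix; each column is one neuron
b = tf.get_variable('b', [n_frozen + n_trainable])

x = tf.placeholder(tf.float32, [None, n_in])
dense_out = tf.nn.relu(tf.matmul(x, w) + b)     # the frozen columns never receive updates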
As clarified in the comments, you want to freeze neurons in a tf.nn.conv2d convolution. While there is no direct way of doing this in TensorFlow (as far as my search goes), you could try slicing the tensor and applying tf.stop_gradient() to it. Here is a Stack Overflow answer to give you an intuition of how to use tf.stop_gradient().
I haven't tested it, but according to the docs I think it should work.
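An untested sketch of that slicing idea, again in TF 1.x style (the kernel shape and the number of frozen filters are hypothetical; the frozen filters stay fixed under plain gradient updates without weight decay):

import tensorflow as tf

kernel = tf.get_variable('kernel', [3, 3, 16, 32])    # (h, w, in_channels, out_channels)
k = 8                                                  # hypothetical: freeze the first 8 filters

frozen = tf.stop_gradient(kernel[:, :, :, :k])         # no gradient flows into these filters
active = kernel[:, :, :, k:]
effective_kernel = tf.concat([frozen, active], axis=3)

x = tf.placeholder(tf.float32, [None, 28, 28, 16])
y = tf.nn.conv2d(x, effective_kernel, strides=[1, 1, 1, 1], padding='SAME')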