I need to implement neurons freezing in CNN for a deep learning research,
I tried to find any function in the Tensorflow docs, but I didn't find anything.
How can I freeze specific neuron when I implemented the layers with tf.nn.conv2d?
A neuron in a dense neural network layer simply corresponds to a column in a weight matrix. You could therefore redefine your weight matrix as a concatenation of 2 parts/variables, one trainable and one not. Then you could either:
selectively pass only the trainable part in the var_list argument of the minimize function of your optimizer, or
Use tf.stop_gradient on the vector/column corresponding to the neuron you want to freeze.
The same concept could be used for convolutional layers, although in this case the definition of a "neuron" becomes unclear; still, you could freeze any column(s) of a convolutional kernel.
As clarified in the comments, you want to freeze Neurons in a tf.nn.conv2d convolution. While there is direct way of doing this in Tensorflow (as per my search), you could try slicing the Tensor and applying tf.stop_gradient() to it. Here is a stackoverflow answer to give you an intuition on how to use tf.stop_gradient()
I haven't tested it, but according to the docs I think it should work.
Related
I would need to do a kind of custom backpropagation so that, in an arbitrary layer of the network I can decide if actually modify the weights going outside that layer, or make them unchanged.
For example: I would like to study what happens if, during training, I force to not update some weight connecting input layer to 1st layer.
Is there a simple way to just "correct" the normal backpropagation intersecting between the layers ?
Thanks
I'm currently stuyind TensorFlow 2.0 and Keras. I know that the activation functions are used to calculate the output of each layer of a neural network, based on mathematical functions. However, when searching about layers, I can't find synthetic and easy-to-read information for a beginner in deep learning.
There's a keras documentation, but I would like to know synthetically:
what are the most common layers used to create a model (Dense, Flatten, MaxPooling2D, Dropout, ...).
In which case to use each of them? (Classification, regression, other)
what is the appropriate way to use each layer depending on each case?
Depending on the problem you want to solve, there are different activation functions and loss functions that you can use.
Regression problem: You want to predict the price of a building. You have N features. Of course, the price of the building is a real number, therefore you need to have mean_squared_error as a loss function and a linear activation for your last node. In this case, you can have a couple of Dense() layers with relu activation, while your last layer is a Dense(1,activation='linear').
In between the Dense() layers, you can add Dropout() so as to mitigate the overfitting effect(if present).
Classification problem: You want to detect whether or not someone is diabetic while taking into account several factors/features. In this case, you can use again stacked Dense() layers but your last layer will be a Dense(1,activation='sigmoid'), since you want to detect whether a patient is or not diabetic. The loss function in this case is 'binary_crossentropy'. In between the Dense() layers, you can add Dropout() so as to mitigate the overfitting effect(if present).
Image processing problems: Here you surely have stacks of [Conv2D(),MaxPool2D(),Dropout()]. MaxPooling2D is an operation which is typical for image processing and also some natural language processing(not going to expand upon here). Sometimes, in convolutional neural network architectures, the Flatten() layer is used. Its purpose is to reduce the dimensionality of the feature maps into 1D vector whose dimension is equal to the total number of elements within the entire feature map depth. For example, if you had a matrix of [28,28], flattening it would reduce it to (1,784), where 784=28*28.
Although the question is quite broad and maybe some people will vote to close it, I tried to provide you a short overview over what you asked. I recommend that your start learning the basics behind neural networks and then delve deeper into using a framework, such as TensorFlow or PyTorch.
Let's assume i want to make the following layer in a neural network: Instead of having a square convolutional filter that moves over some image, I want the shape of the filter to be some other shape, say a rectangle, circle, triangle, etc (this is of course a silly example; the real case I have in mind is something different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited without many examples. A python implementation of a convolutional layer by for example extending the tf.keras.layer.Layer would probably help as well, but it seems that the convolutional layers are implemented in C. Does this mean that I have to implement my custom layer in C to get any reasonable speed or would Python TensorFlow operations be enough?
Edit: Perhaps it is enough if I can just define a tensor of weights, but where I can customize entries in the tensor that are identically zero and some weights showing up in multiple places in this tensor, then I should be able to by hand build a convolutional layer and other layers. How would I do this, and also include these variables in training?
Edit2: Let me add some more clarifications. We can take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is say 10x10 (plus padding so output is also 10x10)), I would imagine doing this by creating a matrix of size 100x100. Then I would fill in the 25 weights in the correct locations in this matrix (so some entries are zero, and some entries are equal, ie all 25 weights will show up in many locations in this matrix). I then multiply the input with this matrix to get an output. So my question would be twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient and is some other approach recommended (assuming that I want to later customize what this filter looks like and thus the standard conv2d is not good enough).
Edit3: It seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However I don't know if this approach will suffer from performance issues.
Just use regular conv. layers with square filters, and zero out some values after each weight update:
g = tf.get_default_graph()
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
conv1_filter = g.get_tensor_by_name('conv1:0')
sess.run(tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask)))
where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
EDIT: if you're not familiar with tensorflow, you might get confused about using the code above. I recommend looking at this example, and specifically at the way the model is constructed (if you do it like this you can access first layer filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)
So I have a custom layer, that does not have any weights.
In a fist step, I tried to implement the functions manipulating the input tensors in Kers. But I did not succeed because of many reasons. My second approach was to implement the functions with numpy operations, since the custom layer I am implementing does not have any weights, from my understanding, I would say, that I could use numpy operarations, as I don't need backpropagation, since there are no weights, right? And then, I would just convert the output of my layer to a tensor with:
Keras.backend.variable(value = output)
So the main idea is to implement a custom layer, that takes tensors, convert them to numpy arrays, operate on them with numpy operations, then convert the output to a tensor.
The problem is that I seem not to be able to use .eval() in order to convert the input tensors of my layer into numpy arrays, so that they could be manipulated with numpy operations.
Can anybody tell, how I can get around this problem ?
As mentioned by Daniel Möller in the comments, Keras needs to be able to backpropagate through your layer, in order the calculate the gradients for the previous layers. Your layer needs to be differentiable for this reason.
For the same reason you can only use Keras operations, as those can be automatically differentiated with autograd. If your layer is something simple, have a look at the Lambda layer where you can implement custom layers quickly.
As an aside, Keras backend functions should cover a lot of use cases, so if you're stuck with writing your layer through those, you might want to post a another question here.
Hope this helps.
Tensorflow's DropoutWrapper allows to apply dropout to either the cell's inputs, outputs or states. However, I haven't seen an option to do the same thing for the recurrent weights of the cell (4 out of the 8 different matrices used in the original LSTM formulation). I just wanted to check that this is the case before implementing a Wrapper of my own.
EDIT:
Apparently this functionality has been added in newer versions (my original comment referred to v1.4): https://github.com/tensorflow/tensorflow/issues/13103
It's because original LSTM model only applies dropout on the input and output layers (only to the non-recurrent layers.) This paper is considered as a "textbook" that describes the LSTM with dropout: https://arxiv.org/pdf/1409.2329.pdf
Recently some people tried applying dropout in recurrent layers as well. If you want to look at the implementation and the math behind it, search for "A Theoretically Grounded Application of Dropout in Recurrent Neural Networks" by Yarin Gal. I'm not sure Tensorflow or Keras already implemented this approach though.