Let's assume I want to make the following layer in a neural network: instead of having a square convolutional filter that moves over some image, I want the filter to have some other shape, say a rectangle, circle, or triangle (this is of course a silly example; the real case I have in mind is something different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited and has few examples. A Python implementation of a convolutional layer, for example one that extends tf.keras.layers.Layer, would probably help as well, but it seems that the convolutional layers are implemented in C. Does this mean that I have to implement my custom layer in C to get any reasonable speed, or would Python TensorFlow operations be enough?
Edit: Perhaps it is enough if I can define a tensor of weights in which I choose which entries are identically zero and which weights appear in multiple places. With that I should be able to build a convolutional layer, and other layers, by hand. How would I do this, and how do I include these variables in training?
Edit2: Let me add some more clarifications. Take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is, say, 10x10 (plus padding so that the output is also 10x10), I would imagine doing this by creating a matrix of size 100x100. Then I would fill in the 25 weights at the correct locations in this matrix (so some entries are zero and some entries are equal, i.e. each of the 25 weights shows up in many locations in this matrix). I then multiply the flattened input by this matrix to get the output. So my question is twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient, and is some other approach recommended (assuming that I later want to customize what this filter looks like, so the standard conv2d is not good enough)?
Edit3: It seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However, I don't know whether this approach will suffer from performance issues.
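For illustration, the construction described in Edit2 can be sketched in TF 2.x eager mode roughly as follows. This is only a sketch under stated assumptions: the names select, weights and dense_conv are made up for this example, and a constant one-hot tensor (rather than sparse tensors) places the 25 shared weights into the 100x100 matrix so that gradients flow back to them. The result is compared with tf.nn.conv2d as a sanity check.

```python
import numpy as np
import tensorflow as tf

H = W = 10      # input height and width
K = 5           # filter size
P = K // 2      # zero padding that keeps the output at 10x10

# select[out_pixel, in_pixel, k] = 1 if weight k sits at that matrix entry.
# Entries that no weight touches stay 0, so they are identically zero.
S = np.zeros((H * W, H * W, K * K), dtype=np.float32)
for i in range(H):
    for j in range(W):
        for di in range(K):
            for dj in range(K):
                r, c = i + di - P, j + dj - P
                if 0 <= r < H and 0 <= c < W:
                    S[i * W + j, r * W + c, di * K + dj] = 1.0
select = tf.constant(S)

weights = tf.Variable(tf.random.normal([K * K]))  # the only trainable values

def dense_conv(x_flat):
    """x_flat: [batch, H*W] -> [batch, H*W], computed as a matrix product."""
    matrix = tf.tensordot(select, weights, axes=1)   # [100, 100], weights shared
    return tf.matmul(x_flat, matrix, transpose_b=True)

x = tf.random.normal([3, H, W, 1])
with tf.GradientTape() as tape:
    y = dense_conv(tf.reshape(x, [3, H * W]))
    loss = tf.reduce_sum(y ** 2)
grad = tape.gradient(loss, weights)   # one gradient entry per shared weight

# Reference result from the built-in op (it also computes cross-correlation).
y_ref = tf.nn.conv2d(x, tf.reshape(weights, [K, K, 1, 1]),
                     strides=1, padding="SAME")
```

Note that the dense matrix has 10,000 entries for only 25 free parameters, and the selection tensor has 100 * 100 * 25 = 250,000 entries, so this approach scales poorly with image size compared with tf.nn.conv2d.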
Just use regular conv. layers with square filters, and zero out some values after each weight update:
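g = tf.get_default_graph()
conv1_filter = g.get_tensor_by_name('conv1:0')
# Build the masking op once, outside the training loop, so the graph does not grow.
apply_mask = tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask))
# Inside the training loop: one optimizer step, then re-zero the masked entries.
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
sess.run(apply_mask)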
where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
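The snippet above is TF 1.x graph code. In TF 2.x eager mode the same idea might look like the sketch below. The circular 5x5 mask, the random data and the hand-written SGD step are assumptions made purely for illustration; they are not part of the original answer.

```python
import numpy as np
import tensorflow as tf

# Hypothetical circular 5x5 pattern; 1 = weight is kept, 0 = weight is forced to zero.
yy, xx = np.mgrid[-2:3, -2:3]
my_mask = tf.constant((xx**2 + yy**2 <= 4).astype(np.float32)[:, :, None, None])

# Filter layout expected by tf.nn.conv2d: [height, width, in_channels, out_channels].
conv1_filter = tf.Variable(tf.random.normal([5, 5, 1, 1]))
conv1_filter.assign(conv1_filter * my_mask)       # start from a masked filter

x = tf.random.normal([4, 10, 10, 1])              # stand-in for a training batch
target = tf.random.normal([4, 10, 10, 1])

for _ in range(3):
    with tf.GradientTape() as tape:
        y = tf.nn.conv2d(x, conv1_filter, strides=1, padding="SAME")
        loss = tf.reduce_mean((y - target) ** 2)
    grad = tape.gradient(loss, conv1_filter)
    conv1_filter.assign_sub(0.1 * grad)           # plain SGD weight update
    conv1_filter.assign(conv1_filter * my_mask)   # zero the masked-out entries again
```

An alternative with the same effect is to multiply the filter by the mask inside the forward pass, so that the masked entries never receive a gradient in the first place.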
EDIT: If you're not familiar with TensorFlow, the code above might be confusing. I recommend looking at this example, and specifically at the way the model is constructed (if you build it that way, you can access the first-layer filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)
This is my first post, so I'm sorry if I'm doing something wrong. I'm a beginner when it comes to machine learning. I want to create an accurate model with the CIFAR-10 dataset. I've learned that a Conv2D layer looks for a specific thing in an image classification model, but how do I know what the layer is looking for? I'm not sure whether I'm making sense here, or whether everything I've learned so far is true. Could someone help me understand how layers work in TensorFlow?
Convolutional layers extract features from images. They apply a filter to images and output a feature map. Then, one may add dense layers afterwards to obtain labels from these features.
Conv2D is just a 2D convolution used with 2D images, which means that the kernel is also 2D.
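As a small, hedged illustration of "apply a filter and output a feature map": the sketch below runs one Keras Conv2D layer on a random tensor shaped like a CIFAR-10 image. The filter count and kernel size are arbitrary choices for the example. What the layer "looks for" is encoded in its kernel weights, which start out random and are learned during training.

```python
import tensorflow as tf

# One random "image" shaped like a CIFAR-10 sample: 32x32 pixels, 3 channels.
images = tf.random.uniform([1, 32, 32, 3])

# 8 filters of size 3x3; each filter produces one feature map.
conv = tf.keras.layers.Conv2D(filters=8, kernel_size=3, activation="relu")
feature_maps = conv(images)

print(feature_maps.shape)   # (1, 30, 30, 8) with the default "valid" padding
print(conv.kernel.shape)    # (3, 3, 3, 8): height, width, input channels, filters
```

After training, plotting slices of conv.kernel, or the feature maps for a given image, is a common way to get a feeling for what each filter responds to.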
Here are a few helpful links: https://machinelearningmastery.com/convolutional-layers-for-deep-learning-neural-networks/
https://towardsdatascience.com/simple-introduction-to-convolutional-neural-networks-cdf8d3077bac
Sorry if this question is incredibly basic. I feel like there is a wealth of resources online, but most of them are half-complete or skip over the details that I want to know.
I am trying to implement LeNet with PyTorch for practice.
https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html
How come in this example, and in many examples online, they define the convolutional layers and the fc layers in __init__, but the subsampling and activation functions in forward?
What is the purpose of using torch.nn.functional for some functions, and torch.nn for others? For example, you have convolution with torch.nn (https://pytorch.org/docs/stable/nn.html#conv1d) and convolution with torch.nn.functional (https://pytorch.org/docs/stable/nn.functional.html#conv1d). Why choose one or the other?
Let's say I want to try different image sizes, like 28x28 (MNIST). The tutorial recommends I resize MNIST. Is there a way to instead change the values of LeNet? What happens if I don't change them?
What is the purpose of num_flat_features? If you wanted to flatten the features, couldn't you just do x = x.view(-1, 16*5*5)?
How come in this example, and in many examples online, they define the
convolutional layers and the fc layers in __init__, but the subsampling
and activation functions in forward?
Any layer with trainable parameters should be defined in __init__. Subsampling, certain activations, dropout, etc. don't have any trainable parameters, so they can either be defined in __init__ or be used directly via the torch.nn.functional interface during forward.
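A minimal sketch of that convention (TinyNet and its layer sizes are made up for illustration and are not the tutorial's LeNet):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers with trainable parameters are created (and registered) here.
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)
        self.fc1 = nn.Linear(6 * 14 * 14, 10)

    def forward(self, x):
        # Parameter-free operations can simply be called functionally.
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)   # 32x32 -> 28x28 -> 14x14
        x = torch.flatten(x, start_dim=1)
        return self.fc1(x)

net = TinyNet()
names = [name for name, _ in net.named_parameters()]
print(names)   # only conv1 and fc1 contribute parameters
out = net(torch.randn(4, 1, 32, 32))
```

One caveat: the functional form of dropout, F.dropout, needs training=self.training passed explicitly, whereas an nn.Dropout module follows model.train() / model.eval() automatically.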
What is the purpose of using torch.nn.functional for some functions, and torch.nn for others?
The torch.nn.functional functions are the actual functions at the heart of the majority of torch.nn layers; they call into compiled C++ code. For example, nn.Conv2d subclasses nn.Module, as should any custom layer or model that contains trainable parameters. The class handles registering parameters and encapsulates some other functionality required for training and testing. During forward it actually uses nn.functional.conv2d to apply the convolution operation. As mentioned in the first question, when performing a parameterless operation like ReLU there is effectively no difference between using the nn.ReLU class and the nn.functional.relu function.
The reason they are provided is they give some freedom to do unconventional things. For example in this answer which I wrote the other day, providing a solution without nn.functional.conv2d would have been difficult.
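To make the relationship concrete, here is a small sketch (the layer sizes are arbitrary) showing that calling an nn.Conv2d module gives the same result as calling nn.functional.conv2d with that module's own weight and bias:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(2, 3, 16, 16)

out_module = conv(x)
# Same computation, but the parameters are passed in explicitly.
out_functional = F.conv2d(x, conv.weight, conv.bias, stride=1, padding=1)

same = torch.allclose(out_module, out_functional)
print(same)
```

The functional form is what makes unconventional uses possible, for example modifying or generating the weight tensor on the fly before applying the convolution.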
Let's say I want to try different image sizes, like 28x28 (MNIST). The
tutorial recommends I resize MNIST. Is there a way to instead change
the values of LeNet? What happens if I don't change them?
There's no obvious way to change an existing, trained model to support different image sizes. The size of the input to the linear layer is necessarily fixed, and the number of features at that point in the model is generally determined by the size of the input to the network. If the size of the input differs from the size that the model was designed for, then by the time the data reaches the linear layers it will have the wrong number of elements and the program will crash. Some models can handle a range of input sizes, usually by using something like an nn.AdaptiveAvgPool2d layer before the linear layer to ensure the input shape to the linear layer is always the same. Even so, if the input image is too small, the downsampling and/or pooling operations in the network will cause the feature maps to vanish at some point, again causing the program to crash.
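A rough sketch of the nn.AdaptiveAvgPool2d idea (FlexibleNet is a made-up toy model, not LeNet): the adaptive pooling layer always outputs 5x5 feature maps, so the linear layer sees 16*5*5 features for both 28x28 and 32x32 inputs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FlexibleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 16, kernel_size=5)
        # Output is always 5x5 per channel, whatever the incoming spatial size.
        self.pool = nn.AdaptiveAvgPool2d((5, 5))
        self.fc = nn.Linear(16 * 5 * 5, 10)

    def forward(self, x):
        x = F.relu(self.conv(x))
        x = self.pool(x)
        return self.fc(torch.flatten(x, start_dim=1))

net = FlexibleNet()
out_mnist = net(torch.randn(2, 1, 28, 28))   # MNIST-sized input
out_large = net(torch.randn(2, 1, 32, 32))   # LeNet-sized input
print(out_mnist.shape, out_large.shape)
```

That the shapes work out says nothing about accuracy: a model trained at one resolution may still perform worse at another.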
What is the purpose of num_flat_features? If you wanted to flatten the
features, couldn't you just do x = x.view(-1, 16*5*5)?
When you define the linear layer you need to tell it how large the weight matrix is. A linear layer's weights are simply an unconstrained matrix (and bias vector). The shape of the weight matrix therefore is determined by the input shape, but you don't know the input shape before you run forward so it needs to be provided as an additional parameter (or hard coded) when you initialize the model.
To get to the actual question: yes, during forward you could simply use
x = x.view(-1, 16*5*5)
Better yet, use
x = torch.flatten(x, start_dim=1)
This tutorial was written before the .flatten function was added to the library. The authors effectively just wrote their own flatten functionality which could be used regardless of the shape of x. This was probably so you had some portable code that could be used in your model without hard coding sizes. From a programming perspective it's nice to generalize such things since it means you wouldn't have to worry about changing those magic numbers if you decide to change part of the model (though this concern didn't appear to extend to the initialization).
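A short sketch of the difference, using random tensors of assumed shapes. When the feature maps really are 16x5x5, both forms agree; when they are not, view(-1, 16*5*5) can silently change the batch dimension instead of failing.

```python
import torch

x = torch.randn(4, 16, 5, 5)
a = x.view(-1, 16 * 5 * 5)
b = torch.flatten(x, start_dim=1)
print(torch.equal(a, b))          # identical when the hard-coded size is right

# Feature maps of a different size (10x10 instead of 5x5):
y = torch.randn(4, 16, 10, 10)
wrong = y.view(-1, 16 * 5 * 5)    # no error, but the batch of 4 became 16 rows
right = torch.flatten(y, start_dim=1)
print(wrong.shape, right.shape)
```

With flatten, a mismatch shows up as a shape error at the linear layer, which is usually easier to debug than a silently reshaped batch.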
I am self-studying PyTorch with Udacity.
Regarding the last paragraph:
Learning
In the code you've been working with, you've been setting the values of filter weights explicitly, but neural networks will actually learn the best filter weights as they train on a set of image data. You'll learn all about this type of neural network later in this section, but know that high-pass and low-pass filters are what define the behavior of a network like this, and you know how to code those from scratch!
In practice, you'll also find that many neural networks learn to detect the edges of images because the edges of an object contain valuable information about the shape of the object.
I have studied everything up through section 44, but I still cannot answer the following questions:
What are the initial weights when I create a torch.nn.Conv2d? And how can I define them myself?
How does PyTorch update weights in the convolutional layer?
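When you declare nn.Conv2d, the weights are initialized via this code.
In particular, the weights use the uniform initialization proposed by Kaiming He et al.: they are drawn from a uniform distribution on (-bound, bound), where bound = sqrt(6 / ((1 + a^2) * fan_in)) (see here). In current PyTorch releases nn.Conv2d passes a = sqrt(5), which gives bound = 1 / sqrt(fan_in); if the layer has a bias, it is drawn uniformly from that same range.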
You can initialize weight manually too. This has been answered elsewhere (See here) and I won't repeat it.
When you call optimizer.step(), every parameter registered with the optimizer, including the convolutional filter weights, is updated using the gradients that loss.backward() computed.
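The sketch below puts the three points together. It assumes a recent PyTorch release, where the default weight bound for nn.Conv2d works out to 1 / sqrt(fan_in); the layer sizes, the choice of xavier_uniform_ for the manual initialization, and the dummy loss are arbitrary choices for illustration.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=5)

# Default initialization: weights lie within +/- 1/sqrt(fan_in).
fan_in = 3 * 5 * 5                       # in_channels * kernel_h * kernel_w
bound = 1.0 / math.sqrt(fan_in)
within = bool(conv.weight.abs().max() <= bound + 1e-6)

# Manual initialization with functions from torch.nn.init.
nn.init.xavier_uniform_(conv.weight)
nn.init.zeros_(conv.bias)

# One optimizer step changes the registered parameters.
optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)
before = conv.weight.detach().clone()
loss = conv(torch.randn(2, 3, 12, 12)).pow(2).mean()   # dummy loss
optimizer.zero_grad()
loss.backward()      # fills conv.weight.grad and conv.bias.grad
optimizer.step()     # applies the update
changed = not torch.equal(before, conv.weight.detach())
print(within, changed)
```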
1. In PyTorch, Conv2d is designed to accept a 4D tensor of shape (N, C, H, W) as input to the forward pass, where N is the number of samples in the mini-batch, C is the number of input channels (for example, the 3 color channels of an image), and H and W are the height and width of the image.
Your weights should reflect that and be a 4D tensor of shape (F, C, K_H, K_W), where F is the number of different kernels you would like to have in this layer, C is the number of input channels, and K_H and K_W are the height and width of the kernels. The exact initialization values can be computed using the formula in the PyTorch docs for nn.Conv2d.
Here is a figure that helps to visualize the computation:
[Figure: Cross-correlation computation with 2 input channels. Ref. http://www.d2l.ai/chapter_convolutional-neural-networks/channels.html, Fig. 6.4.1]
2. Weights are updated by gradient descent, with the gradients computed by the backpropagation algorithm. PyTorch does this under the hood. If you are initializing the weights yourself as a plain tensor, create it with requires_grad=True so that autograd computes gradients for it and it can be updated.
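Both points can be sketched with a hand-made weight tensor. The sizes, the dummy loss and the hand-written update step are assumptions for illustration; in practice an optimizer would perform the update.

```python
import torch
import torch.nn.functional as F

N, C, H, W = 4, 3, 8, 8          # batch, input channels, height, width
F_out, K_H, K_W = 2, 3, 3        # number of kernels, kernel height and width

x = torch.randn(N, C, H, W)
# Hand-made weights of shape (F, C, K_H, K_W); requires_grad=True lets autograd track them.
weight = torch.randn(F_out, C, K_H, K_W, requires_grad=True)

out = F.conv2d(x, weight)        # shape (N, F, H - K_H + 1, W - K_W + 1)
loss = out.pow(2).mean()         # dummy loss
loss.backward()                  # backpropagation fills weight.grad

with torch.no_grad():            # plain gradient-descent step, done by hand
    weight -= 0.01 * weight.grad
print(out.shape, weight.grad.shape)
```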
I need to implement neuron freezing in a CNN for deep learning research.
I tried to find a function for this in the TensorFlow docs, but I didn't find anything.
How can I freeze specific neurons when the layers are implemented with tf.nn.conv2d?
A neuron in a dense neural network layer simply corresponds to a column in a weight matrix. You could therefore redefine your weight matrix as a concatenation of 2 parts/variables, one trainable and one not. Then you could either:
selectively pass only the trainable part in the var_list argument of your optimizer's minimize function, or
use tf.stop_gradient on the vector/column corresponding to the neuron you want to freeze.
The same concept could be used for convolutional layers, although in this case the definition of a "neuron" becomes unclear; still, you could freeze any column(s) of a convolutional kernel.
As clarified in the comments, you want to freeze neurons in a tf.nn.conv2d convolution. While there is no direct way of doing this in TensorFlow (as far as my search shows), you could try slicing the tensor and applying tf.stop_gradient() to the slice. Here is a Stack Overflow answer to give you an intuition for how to use tf.stop_gradient().
I haven't tested it, but according to the docs I think it should work.
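A minimal sketch of the slicing idea, assuming TF 2.x eager execution and treating each output channel of the kernel as one "neuron". The kernel size, the frozen channel indices and the helper name conv_with_frozen_channels are hypothetical.

```python
import tensorflow as tf

# Kernel layout for tf.nn.conv2d: [height, width, in_channels, out_channels].
kernel = tf.Variable(tf.random.normal([3, 3, 1, 4]))
frozen = [1, 3]   # hypothetical output channels to freeze

def conv_with_frozen_channels(x):
    parts = []
    for c in range(kernel.shape[-1]):
        k = kernel[..., c:c + 1]                      # filter for output channel c
        # stop_gradient keeps the values but blocks the gradient to this slice.
        parts.append(tf.stop_gradient(k) if c in frozen else k)
    return tf.nn.conv2d(x, tf.concat(parts, axis=-1), strides=1, padding="SAME")

x = tf.random.normal([2, 8, 8, 1])
with tf.GradientTape() as tape:
    loss = tf.reduce_sum(conv_with_frozen_channels(x) ** 2)
grad = tape.gradient(loss, kernel)   # zero for the frozen channels
```

Because the frozen slices receive a zero gradient, a plain gradient-descent update leaves them unchanged. Whether that also holds for optimizers with weight decay or similar extra update terms would need to be checked separately.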
I'm implementing a UNet for binary segmentation using Sigmoid and BCELoss. The problem is that after several iterations the network predicts very small values for every pixel, even in regions where it should predict values close to one (the ground-truth mask region). Does this suggest what is going wrong?
Besides, there exists NLLLoss2d, which is used for pixel-wise loss. Currently, I'm simply ignoring this and I'm using MSELoss() directly. Should I use NLLLoss2d with a Sigmoid activation layer?
Thanks
It seems to me that your sigmoids are saturating. The images may not be properly normalised, or some batch normalisation layers may be missing. If you have an implementation that works with other images, check the image loader and make sure it does not saturate the pixel values; this usually happens with 16-bit channels. Can you share some of the input images?
PS Sorry for commenting in the answer. This is a new account and I am not allowed to comment yet.
You might want to use torch.nn.BCEWithLogitsLoss(), replacing the Sigmoid and the BCELoss function.
An excerpt from the docs explains why this implementation of the loss is preferable:
This loss combines a Sigmoid layer and the BCELoss in one single class. This version is more numerically stable than using a plain Sigmoid followed by a BCELoss as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability.
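A small sketch of the difference, with made-up logits and targets. For moderate logits the two formulations agree; for a large logit the sigmoid output rounds to exactly 1.0 in float32, so the Sigmoid + BCELoss path can no longer recover the true loss, while BCEWithLogitsLoss works on the raw logit.

```python
import torch
import torch.nn as nn

# Moderate logits: both formulations give the same loss.
logits = torch.tensor([[-2.0, 0.5, 3.0]])
target = torch.tensor([[0.0, 1.0, 1.0]])
stable = nn.BCEWithLogitsLoss()(logits, target)
naive = nn.BCELoss()(torch.sigmoid(logits), target)
print(stable.item(), naive.item())

# A confident but wrong prediction: logit 50 for a pixel whose target is 0.
big = torch.tensor([[50.0]])
zero = torch.tensor([[0.0]])
stable_big = nn.BCEWithLogitsLoss()(big, zero)         # computed from the logit itself
naive_big = nn.BCELoss()(torch.sigmoid(big), zero)     # sigmoid(50) is 1.0 in float32
print(stable_big.item(), naive_big.item())
```

In a model, this means dropping the final Sigmoid layer during training and passing the raw outputs to BCEWithLogitsLoss; torch.sigmoid is then applied only when probabilities are needed, for example at inference time.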