What is the first initialized weight in a PyTorch convolutional layer? - python

I am self-studying with the Udacity PyTorch course.
Regarding the last paragraph:
Learning
In the code you've been working with, you've been setting the values of filter weights explicitly, but neural networks will actually learn the best filter weights as they train on a set of image data. You'll learn all about this type of neural network later in this section, but know that high-pass and low-pass filters are what define the behavior of a network like this, and you know how to code those from scratch!
In practice, you'll also find that many neural networks learn to detect the edges of images because the edges of an object contain valuable information about the shape of an object.
I have studied all 44 sections, but I couldn't answer the following questions:
What is the initial weight when I do torch.nn.Conv2d, and how can I define it myself?
How does PyTorch update the weights in the convolutional layer?

When you declare nn.Conv2d, the weights are initialized via this code.
In particular, the weights use the initialization proposed by Kaiming et al.: a uniform distribution on (-bound, bound), where bound = sqrt(6 / ((1 + a^2) * fan_in)) (see here). The bias, if enabled, is initialized uniformly on (-1/sqrt(fan_in), 1/sqrt(fan_in)).
You can initialize the weights manually too. This has been answered elsewhere (see here), so I won't repeat it.
When you call optimizer.step and the optimizer has the convolutional filter's parameters registered, they are updated.
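A minimal sketch of all three points (layer sizes and the optimizer choice are illustrative, not from the original answer):
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, kernel_size=3)    # weights get the default Kaiming-uniform init

# overriding the default initialization manually
nn.init.xavier_uniform_(conv.weight)
nn.init.zeros_(conv.bias)

# optimizer.step updates every registered parameter, including the filter weights
opt = torch.optim.SGD(conv.parameters(), lr=0.01)
out = conv(torch.randn(1, 3, 8, 8))
out.sum().backward()
opt.step()                                # conv.weight and conv.bias change here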

1. In PyTorch, Conv2d is designed to accept a 4D tensor of shape (N, C, H, W) as input for the forward pass, where N is the number of samples in the mini-batch, C is the number of input channels (for example, the 3 color channels of an image), and H and W are the height and width of the image.
Your weights should reflect that and be a 4D tensor of shape (F, C, K_H, K_W), where F is the number of different kernels you would like to have in this layer, C is the number of input channels, and K_H and K_W are the height and width of the kernels. The exact initialization values can be computed using the formula in the PyTorch docs, in the nn.Conv2d definition.
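For instance (a small illustrative check):
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(5, 3))
print(conv.weight.shape)   # torch.Size([16, 3, 5, 3]), i.e. (F, C, K_H, K_W)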
Here is a great figure which helps to visualize the computation:
[Figure: cross-correlation computation with 2 input channels. Ref. http://www.d2l.ai/chapter_convolutional-neural-networks/channels.html, Fig. 6.4.1]
2. Weights are updated using the backpropagation algorithm by calculating gradients; this is executed under the hood in PyTorch. If you are initializing weights yourself, you should add requires_grad=True to the weight tensor to specifically say that this tensor should be updated by backpropagation.
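A minimal sketch of such a hand-initialized filter (shapes are illustrative; F.conv2d is the functional form of the convolution):
import torch
import torch.nn.functional as F

# hand-built filter bank: 8 kernels over 3 input channels, each 3x3
weight = torch.randn(8, 3, 3, 3, requires_grad=True)
x = torch.randn(1, 3, 32, 32)
out = F.conv2d(x, weight, padding=1)
out.mean().backward()
print(weight.grad.shape)   # torch.Size([8, 3, 3, 3]): gradients reach our tensor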

Related

How can giving only the number of channels, and no height and width, to my convolutional neural network work?

Hello, I am a bit new to the deep learning community and I have been really struggling with how to feed data through a neural network. I was doing the sentdex PyTorch series and learning about convnets. He was using Microsoft's cats and dogs dataset from Kaggle. He resized the images to 50 by 50 and converted them to grayscale. If you want to see the video behind my question, here it is -
https://pythonprogramming.net/convnet-model-deep-learning-neural-network-pytorch/
A few thoughts came to mind while watching the video. The input he passed is only the colour channel of the image.
As soon as I saw the input he entered, I wondered why he was only passing the number of channels (that of a grayscale image) when a Conv2d takes 3 arguments.
And it literally works. I tried researching a bit, but nowhere did I find a good explanation for the input shape being fed in here.
So I have 2 thoughts and questions about this:
1. Does that line mean that the convolutional neural network will only take in an image that is grayscale and of any height and width? If so, please tell me how to limit the dimensions so that our CNN only accepts an input shape of (50, 50, 1).
2. And if not, then please explain what it means and how we can make it accept any input.
Convolutional layers use the convolution operation, i.e. sliding a kernel (matrix) over the input and taking the sum of elementwise products at each position while sliding. Thus, the input dimensions will affect the output dimensions; however, it is not necessary to fix the input dimensions.
Thus, the layer can be defined as nn.Conv2d(1, 32, 5), where 1 indicates the number of channels of the input, 32 is the number of channels of the output, and 5 is the size of the kernel (5x5 in this case, since it is 2D).
The 32 output channels mean that there will be 32 such 5x5 kernels applied to the input, and each output will be stacked to get an output of h x w x 32. Note that this h and w will differ from h_in and w_in if you don't use padding, but will be the same if you use suitable ('same') padding.
The 1 input channel specified in the layer means that the layer will accept only single-channeled inputs (which are effectively grayscale images).
If you want to limit your CNN to (50, 50, 1) inputs only, then you can resize the image before feeding it in (you can do that using OpenCV).
Check this site for some animations of convolutions.
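To make the shape behavior concrete, here is an illustrative check (assuming the 50x50 grayscale setup from the question):
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 32, 5)            # 1 input channel, 32 output channels, 5x5 kernel
x = torch.randn(8, 1, 50, 50)         # batch of 8 grayscale 50x50 images
print(conv(x).shape)                  # torch.Size([8, 32, 46, 46]): no padding shrinks h and w
print(nn.Conv2d(1, 32, 5, padding=2)(x).shape)   # torch.Size([8, 32, 50, 50])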
Update: Adding more things asked in the comments by the OP.
Yes, you can input images of any shape (I suppose they still have to be at least the size of the kernel). So, theoretically, you can input any image to a convolutional layer, but not necessarily to your CNN. That is because the CNN may have flattening operations followed by fully connected layers (nn.Linear). These flattening + fully connected layers expect certain dimensions (which you fix in the code), so you cannot give any input image to your CNN: you have to ensure that the flattened output of the last convolutional layer has as many elements as the first fully connected layer expects.
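A minimal sketch of that constraint (layer sizes are illustrative):
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 32, 5),
    nn.Flatten(),
    nn.Linear(32 * 46 * 46, 10),      # fixed by the 50x50 input: 50 - 5 + 1 = 46
)
net(torch.randn(1, 1, 50, 50))        # works
# net(torch.randn(1, 1, 60, 60))      # would fail: the flattened size no longer matches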
Edit: You can actually give input of any size even to a CNN containing fully-connected layers, by using a Global Average Pooling (GAP) layer to reduce the feature map to a fixed size irrespective of the input size. It is called Adaptive Average Pooling in PyTorch.
For example, consider this network (image attached):
In it, the convolutional kernel sizes are mentioned below the arrows, and the blue cuboids represent the output after each convolutional layer. At the end, there are fully connected layers (boxes with circles) which have fixed dimensions. So, the last convolutional layer's output has dimension 6 x 6 x 256 = 9216, which is also the input dimension of the first fully connected layer.
So, basically, you design your network such that the flattened output of the last convolutional layer has the same dimension as the first fully connected layer. Note that there are some networks, called Fully Convolutional Networks (FCNs), which don't use these fully connected layers and are thus independent of the input size. The network design and choice of layers depend on your application.
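A minimal sketch of the GAP idea (sizes are illustrative):
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 32, 5), nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),     # always outputs 4x4, whatever the input h and w
    nn.Flatten(),
    nn.Linear(32 * 4 * 4, 10),
)
print(net(torch.randn(1, 1, 50, 50)).shape)    # torch.Size([1, 10])
print(net(torch.randn(1, 1, 120, 87)).shape)   # torch.Size([1, 10]): size independent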

Customized convolutional layer in TensorFlow

Let's assume I want to make the following layer in a neural network: instead of having a square convolutional filter that moves over an image, I want the shape of the filter to be some other shape, say a rectangle, circle, triangle, etc. (this is of course a silly example; the real case I have in mind is something different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited, without many examples. A Python implementation of a convolutional layer, for example by extending tf.keras.layers.Layer, would probably help as well, but it seems that the convolutional layers are implemented in C++. Does this mean that I have to implement my custom layer in C++ to get any reasonable speed, or would Python TensorFlow operations be enough?
Edit: Perhaps it is enough if I can just define a tensor of weights in which I can constrain some entries to be identically zero and have some weights show up in multiple places; then I should be able to build a convolutional layer (and other layers) by hand. How would I do this, and how would I include these variables in training?
Edit2: Let me add some more clarifications. Take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is, say, 10x10 (plus padding, so the output is also 10x10), I would imagine doing this by creating a matrix of size 100x100. Then I would fill in the 25 weights at the correct locations in this matrix (so some entries are zero, and some entries are equal, i.e. all 25 weights show up in many locations in this matrix). I then multiply the flattened input with this matrix to get an output. So my question is twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient, and is some other approach recommended (assuming that I want to later customize what this filter looks like, so the standard conv2d is not good enough)?
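A minimal numpy sketch of the construction described above (cross-correlation with 'same' padding; all names are illustrative, and this is just the naive dense version):
import numpy as np

K = np.arange(1, 26, dtype=np.float32).reshape(5, 5)   # the 25 distinct weights
H = W = 10
M = np.zeros((H * W, H * W), dtype=np.float32)
for r in range(H):                        # output row
    for c in range(W):                    # output column
        for i in range(5):                # kernel row
            for j in range(5):            # kernel column
                rr, cc = r + i - 2, c + j - 2      # input position under 'same' padding
                if 0 <= rr < H and 0 <= cc < W:
                    M[r * W + c, rr * W + cc] = K[i, j]
# output = (M @ image.reshape(-1)).reshape(10, 10)
As the question suspects, this dense version is very inefficient; frameworks implement the same computation with specialized convolution kernels.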
Edit3: It seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However, I don't know if this approach will suffer from performance issues.
Just use regular conv. layers with square filters, and zero out some values after each weight update:
g = tf.get_default_graph()
# run one training step as usual
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
# fetch the first conv layer's filter variable (created under the name 'conv1/weights')
conv1_filter = g.get_tensor_by_name('conv1/weights:0')
# zero out the masked entries so the filter keeps the desired non-square support
sess.run(tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask)))
where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
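For instance, a hypothetical my_mask that keeps only a plus-shaped 5x5 filter (the pattern and channel counts are just an illustration):
import numpy as np
import tensorflow as tf

cross = np.zeros((5, 5), dtype=np.float32)
cross[2, :] = 1.0    # middle row
cross[:, 2] = 1.0    # middle column
# broadcast to the filter shape [height, width, in_channels, out_channels]
my_mask = tf.constant(np.tile(cross.reshape(5, 5, 1, 1), (1, 1, 1, 32)))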
EDIT: if you're not familiar with tensorflow, you might get confused about using the code above. I recommend looking at this example, and specifically at the way the model is constructed (if you do it like this you can access first layer filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)

How to implement Region of Interest Pooling layer in tensorflow?

I am trying to create a Faster R-CNN-like model. I get stuck when it comes to the ROI pooling from the feature map. I know bilinear sampling can be used here, but it may not help for end-to-end training. How do I implement this ROI pooling layer in TensorFlow?
Bilinear sampling - as the name suggests - can actually be used even with end-to-end training, as it's basically a linear operation. However, the disadvantage would be that your local maxima (i.e. strong excitations of certain units) could vanish because your sampling points might just happen to land near the minima. To remedy this, you can instead apply a max_pool(features, kernel, stride) operation where kernel and stride are adjusted such that the final output of the max pool operation always has the same dimensions.
An example: your features have size 12x12 and you would like to pool them to 4x4; then setting kernel=(3,3) and stride=(3,3) achieves that, and for each 3x3 patch, the strongest excitation in the respective feature map will be contained in the output.
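A minimal sketch of that example (TF1-style API; the channel count is illustrative):
import tensorflow as tf

features = tf.placeholder(tf.float32, [None, 12, 12, 256])
pooled = tf.nn.max_pool(features,
                        ksize=[1, 3, 3, 1],
                        strides=[1, 3, 3, 1],
                        padding='VALID')
# pooled has static shape [None, 4, 4, 256]
In a real ROI pooling layer, the kernel and stride would be computed per region from each ROI's height and width so that the output always has the same fixed size.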

Does convolution kernel need to be designed in CNN (Convolutional Neural Networks)?

I am new to Convolutional Neural Networks. I am reading some tutorials and testing some sample code using Keras. To add a convolution layer, I basically just need to specify the number of kernels and the size of the kernel.
My question is, what does each kernel look like? Are they generic to all computer vision applications?
What does each kernel look like?
This depends on the parameters you choose for your convolutional layer:
It will indeed depend on the kernel_size parameter you mentioned, as it determines the shape and size of your kernel. Say you pass this parameter as (3,3) (on a Conv2D layer, naturally); you will then obtain a 3x3 kernel matrix.
It will also depend on your kernel_initializer parameter, which determines the way that MxN kernel matrix is filled. Its default value is "glorot_uniform", which is explained on its doc page:
Glorot uniform initializer, also called Xavier uniform initializer. It draws samples from a uniform distribution within [-limit, limit] where limit is sqrt(6 / (fan_in + fan_out)) where fan_in is the number of input units in the weight tensor and fan_out is the number of output units in the weight tensor.
This tells us the specific way it fills that kernel matrix. You may well select any other kernel initializer you desire to fit your needs. You may even build custom initializers, as exemplified on that doc page:
from keras import backend as K

def my_init(shape, dtype=None):
    # or whatever you want to customize
    return K.random_normal(shape, dtype=dtype)

model.add(Dense(64, kernel_initializer=my_init))
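The same argument works for convolutional layers too; for example (illustrative):
model.add(Conv2D(32, (3, 3), kernel_initializer=my_init))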
Furthermore, it will depend on your kernel_regularizer parameter, which defines regularization functions applied to the weights of your kernel. Its default value is None, but you can select others from the ones available. You can again define your own custom regularizers in a similar fashion:
def l1_reg(weight_matrix):
    # same here, fit your own needs
    return 0.01 * K.sum(K.abs(weight_matrix))

model.add(Dense(64, input_dim=64,
                kernel_regularizer=l1_reg))
Are they generic to all computer vision applications?
This may be a bit broad; however, I would venture to say yes. Keras makes available many kernel initializers designed specifically for deep learning applications, including the ones most commonly used throughout the literature and in well-known applications.
The good thing is that, as illustrated before, if none of those initializers fits your needs, you could well define your own custom initializer, or enhance the behavior using regularizers. This enables you to tackle those really specific CV problems you may have.
The actual kernel values are learned during the training process; that's why you only need to set the number of kernels and their size.
What might be confusing is that the learned kernel values actually mimic things like Gabor and edge-detection filters. These are generic to many computer vision applications, but instead of being engineered manually, they are learned from a big classification dataset (like ImageNet).
Also, the kernel values are part of a feature hierarchy that can be used directly as features for a variety of computer vision problems. In those terms they are also generic.

tensorflow - Apply superpixel filter to network output

Good afternoon,
I have a convolutional neural network performing pixelwise classification with 6 classes on a batch of images. I would like to apply a superpixel algorithm (the one in OpenCV) to the output of the network. The superpixels would be calculated from the input images, and then, for each of these superpixel locations in the network output, I would compute the mode of the output classes, in order to have the same output class for every pixel of a superpixel of the input image.
Since the output of the network during a feedforward pass is a tensor of size [batch, w, h, 6], I was thinking of reshaping the tensor to [batch*w, h, 6], then iterating over every class (for i in range(6)), computing the mode of that class for every superpixel, and then reshaping back to the original size.
What I would code in a numpy-based script would be something like:
# superpixel_location: list of boolean masks, one per superpixel
# net_output, net_new_output: [w, h, 6] arrays for a single image
for i in range(number_of_superpixels):
    mask = superpixel_location[i]
    for j in range(number_of_classes):   # 6 classes
        net_new_output[:, :, j][mask] = mode(net_output[:, :, j][mask])
Although this is definitely easy to code in numpy, I am having issues trying to execute it in TensorFlow, since I do not know how to implement for loops there or how to manage them.
Can you help me out?
Thank you,
MC
