I recently came across a method in PyTorch while trying to implement AlexNet.
I don't understand how it works. Please explain the idea behind it with some examples, and how it differs from max pooling or average pooling in terms of neural network functionality:
nn.AdaptiveAvgPool2d((6, 6))
In average pooling or max pooling, you essentially set the stride and kernel size yourself as hyper-parameters, and you have to re-configure them whenever your input size changes.
In adaptive pooling, on the other hand, you specify the output size instead, and the stride and kernel size are selected automatically to adapt to the needs. The following formulas are used in the source code to compute them:
stride = input_size // output_size
kernel_size = input_size - (output_size - 1) * stride
padding = 0
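As a quick sanity check of those formulas, here is a minimal sketch in PyTorch (the 13x13 input mirrors AlexNet's last conv feature map; for this size the adaptive and manually configured pools coincide):

import torch
import torch.nn as nn

x = torch.randn(1, 256, 13, 13)  # e.g. AlexNet's final conv feature map

# Adaptive pooling: only the desired output size is specified.
adaptive = nn.AdaptiveAvgPool2d((6, 6))

# Regular pooling configured via the formulas above:
# stride = 13 // 6 = 2, kernel_size = 13 - (6 - 1) * 2 = 3
manual = nn.AvgPool2d(kernel_size=3, stride=2, padding=0)

print(adaptive(x).shape)                       # torch.Size([1, 256, 6, 6])
print(torch.allclose(adaptive(x), manual(x)))  # True for this input size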
How can I set a dynamic kernel size in PyTorch?
I am passing images to my network, and I would like the kernels to change size and stride as a function of the eccentricity of the input. Also, I would like to use circular rather than square kernels.
Does this sound possible at all? I tried to google it but had no luck. Any help would be much appreciated!
EDIT: just to be clearer: the input images all have the same size, and I don't need to change the kernel size once the net is initialized. I just need a conv layer that behaves more or less as follows: a kernel changing size as a function of eccentricity, in order to have high resolution at the center and increasingly lower resolution in the periphery (by pooling bigger and overlapping areas).
Yes, you can do it easily in the forward function.
For example, in this code I call a boolean function; if the result is True I use conv1, otherwise conv2:
def forward(self, x):
    """
    In the forward function we accept a Tensor of input data and we must return
    a Tensor of output data. We can use Modules defined in the constructor as
    well as arbitrary operators on Tensors.
    """
    if boolean_func(x):
        x = self.conv1(x)
    else:
        x = self.conv2(x)
    out = self.dense1(x)
    return out
In your case, you can define two conv layers and pick which one to use by conditioning on the input (see the self-contained sketch below).
During backpropagation, the weights of the layer you actually used will be updated without any intervention on your part, thanks to the computation graph.
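For completeness, here is a minimal self-contained sketch of that idea; boolean_func, the layer sizes, and the flattening step are placeholders to adapt to your own architecture:

import torch
import torch.nn as nn

class ConditionalConvNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # used when the condition holds
        self.conv2 = nn.Conv2d(3, 8, kernel_size=5, padding=2)  # used otherwise
        self.dense1 = nn.Linear(8 * 32 * 32, 10)

    def forward(self, x):
        if x.mean() > 0:  # stand-in for boolean_func(x)
            x = self.conv1(x)
        else:
            x = self.conv2(x)
        return self.dense1(x.flatten(1))

net = ConditionalConvNet()
out = net(torch.randn(4, 3, 32, 32))  # only the branch actually taken receives gradients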
Let's assume I want to make the following layer in a neural network: instead of a square convolutional filter that moves over an image, I want the filter to have some other shape, say a rectangle, circle, triangle, etc. (this is of course a silly example; the real case I have in mind is different). How would I implement such a layer in TensorFlow?
I found that one can define custom layers in Keras by extending tf.keras.layers.Layer, but the documentation is quite limited and has few examples. A Python implementation of a convolutional layer, for example one extending tf.keras.layers.Layer, would probably help as well, but it seems that the convolutional layers are implemented in C. Does this mean that I have to implement my custom layer in C to get any reasonable speed, or would Python TensorFlow operations be enough?
Edit: perhaps it is enough if I can just define a tensor of weights in which I can constrain some entries to be identically zero and have some weights appear in multiple places; then I should be able to build a convolutional layer (and other layers) by hand. How would I do this, and how do I include these variables in training?
Edit 2: let me add some more clarification. Take the example of building a 5x5 convolutional layer with one output channel from scratch. If the input is, say, 10x10 (plus padding so the output is also 10x10), I imagine doing this by creating a 100x100 matrix and filling in the 25 weights at the correct locations (so some entries are zero, and some entries are equal, i.e. all 25 weights show up in many locations in this matrix). I then multiply the input by this matrix to get the output. So my question is twofold: 1. How do I do this in TensorFlow? 2. Would this be very inefficient, and is some other approach recommended (assuming I want to customize what the filter looks like later, so the standard conv2d is not good enough)?
Edit 3: it seems doable by using sparse tensors and assigning values via a previously defined tf.Variable. However, I don't know whether this approach will suffer from performance issues.
Just use regular conv. layers with square filters, and zero out some values after each weight update:
g = tf.get_default_graph()
# run one training step, then re-apply the mask so the unwanted
# filter entries are zeroed out again
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
conv1_filter = g.get_tensor_by_name('conv1:0')
sess.run(tf.assign(conv1_filter, tf.multiply(conv1_filter, my_mask)))
where my_mask is a binary tensor (of the same shape and type as your filters) that matches the desired pattern.
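For illustration, here is one way such a mask could be built, assuming a circular 5x5 footprint with 3 input and 8 output channels (all of these numbers are placeholders):

import numpy as np
import tensorflow as tf

# Binary disk of radius 2 inside a 5x5 grid; entries outside the circle are zero.
k = 5
yy, xx = np.mgrid[:k, :k] - k // 2
circle = (xx**2 + yy**2 <= (k // 2)**2).astype(np.float32)

# Broadcast to the conv filter layout (height, width, in_channels, out_channels).
my_mask = tf.constant(circle[:, :, None, None] * np.ones((1, 1, 3, 8), np.float32))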
EDIT: if you're not familiar with TensorFlow, the code above might be confusing. I recommend looking at this example, and specifically at the way the model is constructed (if you do it like this, you can access the first layer's filters as 'conv1/weights'). Also, I recommend switching to PyTorch :)
I am trying to create a Faster R-CNN-like model. I get stuck when it comes to ROI pooling from the feature map. I know bilinear sampling can be used here, but it may not help for end-to-end training. How do I implement this ROI pooling layer in TensorFlow?
Bilinear sampling, as the name suggests, can actually be used even with end-to-end training, since it's basically a linear operation. However, the disadvantage is that local maxima (i.e. strong excitations of certain units) can vanish when your sampling points happen to fall near the minima. To remedy this, you can instead apply a max_pool(features, kernel, stride) operation where kernel and stride are adjusted so that the final output of the max pool operation always has the same dimensions.
An example: your features have size 12x12 and you would like to pool to 4x4. Setting kernel=(3,3) and stride=(3,3) achieves that, and for each 3x3 patch the strongest excitation in the respective feature map is kept in the output.
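A minimal sketch of that computation in TensorFlow (the shapes are illustrative):

import tensorflow as tf

features = tf.random.normal([1, 12, 12, 256])  # a cropped 12x12 ROI feature map
# kernel and stride both equal region_size // output_size = 12 // 4 = 3
pooled = tf.nn.max_pool2d(features, ksize=3, strides=3, padding='VALID')
print(pooled.shape)  # (1, 4, 4, 256)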
I am working on a project where I need deconvolution. I read that gen_nn_ops.max_pool_grad_v2() can do that, and I load the function from tensorflow.python.ops.
As far as I understand, the function takes an input and an output tensor, where the input is a convolutional layer before max pooling and the output is the result of the max pooling operation. But what is grad? And what exactly does the output of the function represent?
from tensorflow.python.ops import gen_nn_ops

ksize = [1, 2, 2, 1]
strides = [1, 2, 2, 1]
padding = 'SAME'
u = gen_nn_ops.max_pool_grad_v2(input, output, grad, ksize, strides, padding)
Unfortunately I did not find anything useful on the Internet.
Regarding deconvolution, max_pool_grad_v2 is probably not the op you're looking for. For deconvolution, you probably want to use the Keras layer Conv2DTranspose instead.
max_pool_grad_v2 is a gradient function for computing the gradient of the max pooling op (you'll see that it's used for that very purpose internally within TensorFlow). A gradient function such as _MaxPoolGradGrad computes gradients with respect to an op's inputs given gradients with respect to that op's outputs. You don't really need to understand how gradients are implemented in TensorFlow in order to use it, unless you want to implement some of your own, in which case there is a guide on the main TensorFlow site.
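A minimal sketch of the Conv2DTranspose route suggested above (channel counts and sizes are illustrative):

import tensorflow as tf

x = tf.random.normal([1, 8, 8, 64])  # a low-resolution feature map
# With strides=2 and padding='same', the spatial dimensions are doubled.
deconv = tf.keras.layers.Conv2DTranspose(32, kernel_size=2, strides=2, padding='same')
print(deconv(x).shape)  # (1, 16, 16, 32)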
I am new to convolutional neural networks. I am reading some tutorials and testing sample code using Keras. To add a convolution layer, basically I just need to specify the number of kernels and the kernel size.
My question is: what does each kernel look like? Are they generic to all computer vision applications?
What does each kernel look like?
This depends on the parameters you choose for your convolutional layer:
It will indeed depend on the kernel_size parameter you mentioned, as it determines the shape and size of your kernel. Say you pass (3,3) (to a Conv2D layer, naturally); you will then obtain a 3x3 kernel matrix.
It will also depend on your kernel_initializer parameter, which determines the way that MxN kernel matrix is filled. Its default value is "glorot_uniform", which is explained on its doc page:
Glorot uniform initializer, also called Xavier uniform initializer. It draws samples from a uniform distribution within [-limit, limit] where limit is sqrt(6 / (fan_in + fan_out)) where fan_in is the number of input units in the weight tensor and fan_out is the number of output units in the weight tensor.
This tells us the specific way that kernel matrix gets filled. You may select any other kernel initializer that fits your needs, and you may even build custom initializers, as exemplified on that doc page:
from keras import backend as K

def my_init(shape, dtype=None):
    # or whatever you want to customize
    return K.random_normal(shape, dtype=dtype)

model.add(Dense(64, kernel_initializer=my_init))
Furthermore, it will depend on your kernel_regularizer parameter, which defines regularization functions applied to the weights of your kernel. Its default value is None, but you can select from the available ones. You can also define your own custom regularizers in a similar fashion:
def l1_reg(weight_matrix):
    # same here, fit your own needs
    return 0.01 * K.sum(K.abs(weight_matrix))

model.add(Dense(64, input_dim=64,
                kernel_regularizer=l1_reg))
Are they generic to all computer vision applications?
This may be a bit broad, but I would venture to say yes. Keras ships with many kernel initializers designed specifically for deep learning applications, including the ones most commonly used throughout the literature and in well-known applications.
The good thing is that, as illustrated above, if none of those fits your needs you can define your own custom initializer, or enhance it with regularizers. This lets you tackle the really specific CV problems you may have.
The actual kernel values are learned during training; that's why you only need to set the number of kernels and their size.
What might be confusing is that the learned kernel values often end up mimicking things like Gabor and edge-detection filters. These are generic to many computer vision applications, but instead of being engineered manually, they are learned from a big classification dataset (like ImageNet).
The kernel values are also part of a feature hierarchy that can be used directly as features for a variety of computer vision problems. In that sense, they are also generic.
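If you want to see this for yourself, one way is to inspect the first-layer filters of a pretrained network, for example an ImageNet-trained AlexNet (the weights argument assumes a recent torchvision; older versions use pretrained=True instead):

import torchvision

model = torchvision.models.alexnet(weights="IMAGENET1K_V1")
filters = model.features[0].weight.data  # shape: (64, 3, 11, 11)
print(filters.shape)
# Plotted as small images, many of these filters resemble Gabor-like edge
# and color-blob detectors.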