I am looking at an implementation of AlexNet in PyTorch. According to the formula, output_height = (input_height + padding_top + padding_bottom - kernel_height) / stride_height + 1. So with an input of size 224, stride = 4, padding = 1, and kernel size = 11, the output should be of size 54.75. But if you run a summary of the model, you see that the output of this first layer is 54. Does PyTorch clip down the output size? If so, does it do this consistently (it seems like it)? I would like to understand what is going on behind the scenes.
Here is the code that I refer to:
net = nn.Sequential(
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2), nn.Flatten(),
    nn.Linear(6400, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 10))
The output size is a whole number, of course! Your formula is just missing the floor: output_height = floor((input_height + padding_top + padding_bottom - kernel_height) / stride_height + 1). In your case that is floor((224 + 1 + 1 - 11) / 4 + 1) = floor(54.75) = 54. It wouldn't make sense otherwise.
Yes, it gets rounded down whenever the formula gives a fractional output length.
That's because the output is a tensor: each element along the output's height corresponds to one complete placement of the kernel, and there is no element corresponding to 0.75 of a placement, so the size has to be either clipped down to 54 or rounded up to 55. Because the kernel has to stop striding at the last position where it still fully fits inside the (padded) input (draw it out and you'll see how odd it would be if the kernel kept going past the edge), the size is clipped down, i.e. floored.
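If you want to see the floor in action, you can run a dummy input through just that first layer (a minimal sketch matching the sizes in the question):
import torch
import torch.nn as nn

# First layer from the question: 11x11 kernel, stride 4, padding 1
conv1 = nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1)

x = torch.randn(1, 1, 224, 224)   # dummy batch of one 224x224 grayscale image
print(conv1(x).shape)             # torch.Size([1, 96, 54, 54])
# matches floor((224 + 1 + 1 - 11) / 4 + 1) = floor(54.75) = 54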
Related
I am reading this research paper (https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf) and trying to follow along with the code on GitHub. I don't understand how the parameters for nn.Conv2d() were determined. For the first Conv2d: does 64#96*96 mean 64 channels with a 96 x 96 kernel size? And if so, why is the kernel size 10 in the function? I have googled the parameters and their meanings, and from what I read I understand that it's (input_channels, output_channels, kernel_size).
Here is the code on GitHub: https://github.com/fangpin/siamese-pytorch/blob/master/train.py
For reference, page 4 of the research paper has the model schematic.
self.conv = nn.Sequential(
    nn.Conv2d(1, 64, 10),    # 64#96*96
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),         # 64#48*48
    nn.Conv2d(64, 128, 7),
    nn.ReLU(),               # 128#42*42
    nn.MaxPool2d(2),         # 128#21*21
    nn.Conv2d(128, 128, 4),
    nn.ReLU(),               # 128#18*18
    nn.MaxPool2d(2),         # 128#9*9
    nn.Conv2d(128, 256, 4),
    nn.ReLU(),               # 256#6*6
)
self.liner = nn.Sequential(nn.Linear(9216, 4096), nn.Sigmoid())
self.out = nn.Linear(4096, 1)
If you look at the model schematic, it shows two things:
the parameters of the convolution kernel, and
the shape of the feature maps (the output of the nn.Conv2d op).
For example, the first Conv2d layer is 64#10x10, meaning 64 output channels and a 10x10 kernel.
The feature map, on the other hand, is 64#96x96, which comes from applying that 64#10x10 convolution to the 105x105x1 input: you get 64 output channels and a width and height of 105 - 10 + 1 = 96.
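If you want to verify the feature-map sizes in those comments yourself, you can push a dummy 105x105 grayscale tensor through the same convolutional stack (a minimal sketch reproducing the layers above; only the shapes matter here):
import torch
import torch.nn as nn

conv = nn.Sequential(
    nn.Conv2d(1, 64, 10),    # 105 - 10 + 1 = 96  -> [_, 64, 96, 96]
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),         # 96 / 2 = 48        -> [_, 64, 48, 48]
    nn.Conv2d(64, 128, 7),   # 48 - 7 + 1 = 42    -> [_, 128, 42, 42]
    nn.ReLU(),
    nn.MaxPool2d(2),         # 42 / 2 = 21        -> [_, 128, 21, 21]
    nn.Conv2d(128, 128, 4),  # 21 - 4 + 1 = 18    -> [_, 128, 18, 18]
    nn.ReLU(),
    nn.MaxPool2d(2),         # 18 / 2 = 9         -> [_, 128, 9, 9]
    nn.Conv2d(128, 256, 4),  # 9 - 4 + 1 = 6      -> [_, 256, 6, 6]
    nn.ReLU(),
)

x = torch.randn(1, 1, 105, 105)
print(conv(x).shape)         # torch.Size([1, 256, 6, 6])
The flattened size is 256*6*6 = 9216, which is exactly the in_features of the nn.Linear(9216, 4096) above.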
I am struggling to work out how to calculate the dimensions for the fully connected layer. I am inputting images of size 448x448 with a batch size of 16. Below is the code for my convolutional layers:
class ConvolutionalNet(nn.Module):
    def __init__(self, num_classes=182):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer5 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
I want to add a fully connected layer:
self.fc = nn.Linear(?, num_classes)
Would anyone be able to explain the best way to calculate this? Also, if I have multiple fully connected layers, e.g. self.fc2 and self.fc3, would the second parameter always equal the number of classes? I am new to coding and finding it hard to wrap my head around this.
The conv layers don't change the width/height of the features since you've set padding equal to (kernel_size - 1) / 2. Max pooling with kernel_size = stride = 2 will decrease the width/height by a factor of 2 (rounded down if input shape is not even).
Using 448 as input width/height, the output width/height will be 448 // 2 // 2 // 2 // 2 // 2 = 448/32 = 14 (where // is floor-divide operator).
The number of channels is fully determined by the last conv layer, which outputs 64 channels.
Therefore you will have a [B,64,14,14] shaped tensor, so the Linear layer should have in_features = 64*14*14 = 12544.
Note you'll need to flatten the input beforehand, something like:
self.layer6 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(12544, num_classes)
)
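If you would rather not do the arithmetic by hand, a common trick is to run a dummy input through the conv stack once and read off the flattened size. A minimal sketch, assuming it sits at the end of __init__ after layer1 through layer5 are defined and torch is already imported:
# Hypothetical shape probe, placed at the end of __init__ after layer1..layer5:
with torch.no_grad():
    dummy = torch.randn(1, 3, 448, 448)   # one fake 448x448 RGB image
    out = self.layer5(self.layer4(self.layer3(self.layer2(self.layer1(dummy)))))
    n_features = out.flatten(1).size(1)   # 64 * 14 * 14 = 12544

self.fc = nn.Linear(n_features, num_classes)
As for the second question: only the final Linear layer needs out_features = num_classes; intermediate fully connected layers such as self.fc2 or self.fc3 can have any width, as long as each layer's in_features matches the previous layer's out_features.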
I am new to AI and Python. I'm trying to build an architecture to train on a set of images (and later aim to overfit), but so far I can't understand how to get the inputs and outputs right. I keep seeing this error whenever I try to train the network:
mat1 and mat2 shapes cannot be multiplied (48x13456 and 16x64)
my network:
net2 = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=5, padding=0),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(8, 16, kernel_size=5, padding=0),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Linear(64, 10)
)
This is part of a task I'm working on and I really don't get why it's not running. Any hints?
It's because you have flattened your 2D CNN output into 1D for the FC layers, and you have to manually calculate how your 128-sized input has changed by the time it reaches the MaxPool layer just before the Flatten layer. In your case it's 29*29*16.
So your code should be rewritten as:
net2 = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=5, padding=0),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(8, 16, kernel_size=5, padding=0),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(13456, 64),
    nn.ReLU(),
    nn.Linear(64, 10)
)
This should work
EDIT: Here is a simple formula to calculate the output size:
output = floor((W - K + 2P) / S) + 1
where W = input size, K = kernel (filter) size, S = stride, and P = padding.
So the 1st conv block makes your output of size 124.
Then the MaxPool halves it, i.e. 62.
The 2nd conv block makes it 58.
Then your last MaxPool brings it to 29.
So the final flattened size is 29*29*16 = 13456, where 16 is the number of output channels.
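The same arithmetic can be wrapped in a tiny helper so you don't have to track it by hand (a minimal sketch; conv_out_size is just a name used here, and 128 is the assumed input width/height):
def conv_out_size(w, k, p=0, s=1):
    # floor((W - K + 2P) / S) + 1, the same formula as above
    return (w - k + 2 * p) // s + 1

size = 128
size = conv_out_size(size, k=5)        # conv1:   124
size = conv_out_size(size, k=2, s=2)   # maxpool:  62
size = conv_out_size(size, k=5)        # conv2:    58
size = conv_out_size(size, k=2, s=2)   # maxpool:  29
print(size * size * 16)                # 29*29*16 = 13456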
I have data with depth = 3 and I want to pass it through 3 convolution layers, each with a 3x3x3 kernel.
My current code is below. The first input is
[batch_size=10, in_channels=1, depth=3, height=128, width=256]
and I notice that after the first Conv3d layer the output is [10, 8, 1, 126, 254]. It now has depth 1 and obviously won't be accepted by another 3x3x3 layer. How can I achieve this?
class CNet(nn.Module):
    def __init__(self, **kwargs):
        super().__init__()
        self.conv1 = nn.Conv3d(1, 8, kernel_size=3, stride=1, padding=0)
        self.conv2 = nn.Conv3d(8, 16, kernel_size=3, stride=1, padding=0)
        self.conv3 = nn.Conv3d(16, 32, kernel_size=3, stride=1, padding=0)
        self.fc1 = nn.Linear(value, 2)

    def forward(self, X):
        X = F.relu(self.conv1(X))
        X = F.relu(self.conv2(X))
        X = F.max_pool2d(X, 2)
        X = self.conv3(X)
        X = F.max_pool2d(X, 2)
        X = self.fc1(X)
        return F.softmax(X, dim=1)
You need to use padding. If you only want to pad the inputs of the convolutions after the first one, and only in the depth dimension (to keep its minimum size of 3), you would use padding=(1, 0, 0). It's 1 because the same padding is applied to both sides, i.e. (padding, input, padding) along that dimension.
self.conv2 = nn.Conv3d(8, 16, kernel_size=3, stride=1, padding=(1, 0, 0))
self.conv3 = nn.Conv3d(16, 32, kernel_size=3, stride=1, padding=(1, 0, 0))
However, it is common to use padding=1 for all dimensions when using kernel_size=3, because that keeps the dimensions unchanged, which makes it much easier to build deeper networks: you don't need to worry about the sizes suddenly getting too small, as already happened with your depth dimension. Also, when no padding is used, the corners only take part in a single calculation, whereas all other elements contribute to multiple calculations. I would recommend using kernel_size=3 and padding=1 for all your convolutions.
self.conv1 = nn.Conv3d(1, 8, kernel_size=3, stride=1, padding=1)
self.conv2 = nn.Conv3d(8, 16, kernel_size=3, stride=1, padding=1)
self.conv3 = nn.Conv3d(16, 32, kernel_size=3, stride=1, padding=1)
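As a quick check, you can push a dummy tensor with the shapes from the question through the padded layers (a minimal sketch; only the shapes matter here):
import torch
import torch.nn as nn

conv1 = nn.Conv3d(1, 8, kernel_size=3, stride=1, padding=1)
conv2 = nn.Conv3d(8, 16, kernel_size=3, stride=1, padding=1)
conv3 = nn.Conv3d(16, 32, kernel_size=3, stride=1, padding=1)

x = torch.randn(10, 1, 3, 128, 256)   # [batch, channels, depth, height, width]
x = conv3(conv2(conv1(x)))
print(x.shape)                        # torch.Size([10, 32, 3, 128, 256])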
Can we pass images for which height != width through a CNN in PyTorch?
In my CNN I have convolution, batch-norm, max-pool, ReLU, and fully connected layers.
My network:
self.conv_seqn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4, stride=4),
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4, stride=4),
)
self.fc_seqn = nn.Sequential(
    nn.Linear(1843200, 256),
    nn.ReLU(inplace=True),
    nn.Linear(256, total_configs)
)
My forward function:
def forward(self, x):
    x = self.conv_seqn(x)
    x = x.view(x.size(0), -1)
    x = self.fc_seqn(x)
    return x
For an input image of size 3840*1920*3, after applying conv_seqn() the output should be of size [1, 128, 120, 60], but I am getting a size of [1, 128, 120, 120] (batch size = 1 here).
Any suggestion would be highly helpful.
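Conv2d and MaxPool2d have no problem with height != width, so the expected shape can be checked with a dummy non-square tensor. A minimal sketch (assuming the layout is (N, C, H, W) with H=3840 and W=1920; note that a square 3840x3840 input would instead give the observed [1, 128, 120, 120]):
import torch
import torch.nn as nn

# conv_seqn rebuilt standalone, exactly as defined above, just for the shape check
conv_seqn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),   # 3840x1920 -> 1920x960
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4, stride=4),   # -> 480x240
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4, stride=4),   # -> 120x60
)

x = torch.randn(1, 3, 3840, 1920)            # non-square dummy input (N, C, H, W)
print(conv_seqn(x).shape)                    # torch.Size([1, 128, 120, 60])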