Noob here; it's hard to explain my question without an example, so I'll use a model on the MNIST data that classifies digits from images.
import torch
from torch import nn
from torchvision import datasets, transforms

transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# Load data
trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))
Why does the model end up with a 64 (rows) x 10 (columns) matrix?
I thought nn.Linear(64, 10) means a layer that maps 64 input neurons to 10 neurons. Shouldn't the output be an array of 10 probabilities?
And why does the output activation function use dim=1 and not dim=0?
Isn't each row of 10 columns for one epoch? Shouldn't LogSoftmax be used to calculate the probability of each digit?
I'm ... lost.
I've spent two hours on this and still can't find the answer, sorry for the noob question!
We usually have our data in the form (batch_size, input_size), which in your case is (64, 784).
What this means is that each batch contains 64 images, and each image has 784 features.
Regarding your model, this is what printing it shows:
model = nn.Sequential(nn.Linear(784, 128),
                      nn.ReLU(),
                      nn.Linear(128, 64),
                      nn.ReLU(),
                      nn.Linear(64, 10),
                      nn.LogSoftmax(dim=1))
print(model)
# Sequential(
# (0): Linear(in_features=784, out_features=128, bias=True)
# (1): ReLU()
# (2): Linear(in_features=128, out_features=64, bias=True)
# (3): ReLU()
# (4): Linear(in_features=64, out_features=10, bias=True)
# (5): LogSoftmax(dim=1)
# )
Let's go through how the data will flow through this model.
You have an input of shape (64, 784).
It passes through the first Linear layer, where each image of 784 features is converted to 128 features, so the output has shape (64, 128).
ReLU does not change the shape, just the values, so the shape is again (64, 128).
The next Linear layer converts 128 features to 64, so the output shape is now (64, 64).
Again, the ReLU layer just changes the values, so the shape is still (64, 64).
The last Linear layer maps 64 input features to 10 output features, so the shape is now (64, 10).
Lastly we have the LogSoftmax layer. We pass dim=1 because we want the (log-)probabilities over the 10 possible digits for each of the 64 images in our batch: dim=0 is the batch dimension and dim=1 is the digit dimension, which is why dim=1 is used. After this, your output still has shape (64, 10).
Therefore, at the end, each image in the batch has a probability (a log-probability, to be precise) for each of the 10 digits.
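For instance, here is a minimal shape check with a random stand-in batch (instead of real images from the DataLoader), using the model defined above:
import torch

images = torch.randn(64, 784)  # dummy batch standing in for 64 flattened MNIST images
output = model(images)
print(output.shape)            # torch.Size([64, 10]): one row of 10 log-probabilities per image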
"I thought nn.Linear(64, 10) means a layer that maps 64 input neurons to 10 neurons."
That is correct. Another point to remember is that the batch dimension is not specified in the layers of the model; we define layers to operate on a single image. Your second-to-last Linear layer outputs 64 values per image, the last Linear layer converts them to 10 values, and then LogSoftmax is applied.
This operation is simply repeated for all 64 images in the batch, efficiently, using matrix operations.
You might be confusing your batch_size=64 with the in_features=64 of the last Linear layer; the two are entirely unrelated.
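A small sketch of that point, using a made-up activation tensor (not the real hidden output): nn.Linear only acts on the last dimension, so the batch dimension just passes through.
hidden = torch.randn(64, 64)   # (batch_size=64, in_features=64): same number, different meanings
fc = nn.Linear(64, 10)
print(fc(hidden).shape)        # torch.Size([64, 10])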
I have a ResNet9 model, implemented in PyTorch, which I am using for multi-class image classification. My total number of classes is 6. Using the following code from the torchsummary library, I am able to show the summary of the model, seen in the attached image:
INPUT_SHAPE = (3, 256, 256) #input shape of my image
print(summary(model.cuda(), (INPUT_SHAPE)))
However, I am quite confused about the -1 values in all layers of the ResNet9 model. Also, for the Conv2d-1 layer, I am confused about the 64 value in the output shape [-1, 64, 256, 256], as I believe the n_channels value of the input image is 3. Can anyone please help me with an explanation of the output shape values? Thanks!
Yes.
Your INPUT_SHAPE is torch.Size([3, 256, 256]) in channel-first format, and (256, 256, 3) in channel-last format.
Since PyTorch models accept input in channel-first format, it is shown to you as torch.Size([3, 256, 256]).
As for the output shape [-1, 64, 256, 256]: this is the output shape of your first conv layer, which has 64 filters, each producing a 256x256 feature map; it is not your input shape.
The -1 represents the variable batch size, which you can fix in the DataLoader.
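As a rough sketch of that first layer (the 3x3 kernel and padding=1 here are assumptions, since the actual ResNet9 definition isn't shown; the point is just the channel counts and shapes):
import torch
import torch.nn as nn

# Hypothetical first conv of a ResNet9-style model: 3 input channels -> 64 filters.
conv1 = nn.Conv2d(3, 64, kernel_size=3, padding=1)  # padding=1 keeps the 256x256 spatial size
x = torch.randn(8, 3, 256, 256)                     # a batch of 8 channel-first images
print(conv1(x).shape)                               # torch.Size([8, 64, 256, 256]); torchsummary shows the batch dim as -1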
I am reading this research paper (https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf) and trying to follow along with the code on GitHub. I don't understand how the parameters for nn.Conv2d() were determined. For the first Conv2d: does 64#96*96 mean 64 channels with a 96 x 96 kernel size? And if so, why is the kernel size 10 in the function? I have googled the parameters and their meanings, and from what I read I understand it's (input_channels, output_channels, kernel_size).
Here is the GitHub code: https://github.com/fangpin/siamese-pytorch/blob/master/train.py
For reference page 4 of the research paper has the model schematic.
self.conv = nn.Sequential(
    nn.Conv2d(1, 64, 10),   # 64#96*96
    nn.ReLU(inplace=True),
    nn.MaxPool2d(2),        # 64#48*48
    nn.Conv2d(64, 128, 7),
    nn.ReLU(),              # 128#42*42
    nn.MaxPool2d(2),        # 128#21*21
    nn.Conv2d(128, 128, 4),
    nn.ReLU(),              # 128#18*18
    nn.MaxPool2d(2),        # 128#9*9
    nn.Conv2d(128, 256, 4),
    nn.ReLU(),              # 256#6*6
)
self.liner = nn.Sequential(nn.Linear(9216, 4096), nn.Sigmoid())
self.out = nn.Linear(4096, 1)
If you look at the model schematic, it shows two things:
the parameters of the convolution kernel, and
the parameters of the feature maps (the output of the nn.Conv2d op).
For example, the first Conv2d layer is 64#10x10, meaning 64 output channels and a 10x10 kernel.
Whereas the feature map is 64#96x96, which comes from applying that 64#10x10 convolution to a 105x105x1 input. This way you get 64 output channels and a width and height of 105 - 10 + 1 = 96.
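You can verify the arithmetic with a quick forward pass (the 105x105 single-channel input size comes from the paper):
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 64, 10)        # in_channels=1, out_channels=64, kernel_size=10
x = torch.randn(1, 1, 105, 105)    # one 105x105 grayscale image
print(conv(x).shape)               # torch.Size([1, 64, 96, 96]), since 105 - 10 + 1 = 96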
I was trying to learn PyTorch and came across a tutorial where a CNN is defined like below:
from torch.nn import (Module, Sequential, Conv2d, BatchNorm2d,
                      ReLU, MaxPool2d, Linear)

class Net(Module):
    def __init__(self):
        super(Net, self).__init__()
        self.cnn_layers = Sequential(
            # Defining a 2D convolution layer
            Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
            # Defining another 2D convolution layer
            Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
        )
        self.linear_layers = Sequential(
            Linear(4 * 7 * 7, 10)
        )

    # Defining the forward pass
    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
I understood how the cnn_layers are made. After the cnn_layers, the data should be flattened and given to the linear_layers.
I don't understand how the number of input features to Linear is 4*7*7. I understand that 4 is the output dimension of the last Conv2d layer.
How does the 7*7 come into the picture? Do stride or padding play any role in that?
The input image shape is [1, 28, 28].
The Conv2d layers have a kernel size of 3 with stride and padding of 1, which means they don't change the spatial size of the image. There are two MaxPool2d layers, each of which reduces the spatial dimensions from (H, W) to (H/2, W/2). So, for each batch, the output of the last convolution, which has 4 output channels, has shape (batch_size, 4, H/4, W/4). In the forward pass the feature tensor is flattened by x = x.view(x.size(0), -1), which turns it into shape (batch_size, 4 * (H/4) * (W/4)). Assuming H and W are 28, the linear layer takes inputs of shape (batch_size, 196), and 196 = 4 * 7 * 7.
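A quick shape check (assuming a single 28x28 grayscale image, as in MNIST):
import torch

net = Net()
x = torch.randn(1, 1, 28, 28)                        # (batch, channels, H, W)
features = net.cnn_layers(x)
print(features.shape)                                # torch.Size([1, 4, 7, 7])
print(features.view(features.size(0), -1).shape)     # torch.Size([1, 196]) == 1 x (4 * 7 * 7)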
Actually, in the 2D convolution layers the features (values) live in a matrix (a 2D tensor per channel).
As usual, the neural network ends with a fully connected layer followed by the logits layer, so the features going into the fully connected layer have to be in a vector (a 1D tensor).
Therefore we have to map every feature (value) of the last feature maps into the fully connected layer that follows.
In PyTorch, the fully connected layer is implemented by the Linear class, and its first parameter is the number of input features.
In this case:
input_image : (28, 28, 1)
after_Conv2d_1 : (28, 28, 4) <- because of the padding; with padding = 0 it would be (26, 26, 4)
after_maxPool_1 : (14, 14, 4) <- due to the stride of 2
after_Conv2d_2 : (14, 14, 4) <- because this is "same" padding
after_maxPool_2 : (7, 7, 4)
In the end, the total number of features before the fully connected layer is 4*7*7.
This also shows why we usually use an odd kernel size and start from images with an even number of pixels.
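For a layer-by-layer view of the same shapes (channel-first order, as PyTorch stores them, with a batch of one):
import torch

net = Net()
x = torch.randn(1, 1, 28, 28)
for layer in net.cnn_layers:
    x = layer(x)
    print(layer.__class__.__name__, tuple(x.shape))
# The MaxPool2d line after the first block prints (1, 4, 14, 14); after the second, (1, 4, 7, 7).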
I understand how Conv1d strides in one dimension. But my input is of shape [64, 20, 161], where 64 is the batch size, 20 is the sequence length, and 161 is the dimension of my vectors.
I'm not sure how to set up my Conv1d to stride over the vector.
I'm trying:
self.conv1 = torch.nn.Conv1d(batch_size, 20, 161, stride=1)
but getting:
RuntimeError: Given groups=1, weight of size 20 64 161, expected input[64, 20, 161] to have 64 channels, but got 20 channels instead
According to the documentation:
torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
in_channels is the number of channels in your input; "number of channels" is usually a computer-vision term, and in your case this number is 20.
out_channels is the number of channels of your output; it depends on how many output channels you want.
For 1D convolution, you can think of the number of channels as the "number of input vectors" and "number of output feature vectors". The size (not the number) of the output feature vectors is decided by the other parameters: kernel_size, stride, padding, dilation.
An example usage (note that Conv1d also requires a kernel_size, which the original snippet was missing; 5 here is just an arbitrary choice):
t = torch.randn(64, 20, 161)                    # (batch, in_channels, length)
conv = torch.nn.Conv1d(20, 100, kernel_size=5)  # 20 input channels -> 100 output channels
out = conv(t)
print(out.shape)                                # torch.Size([64, 100, 157]), since 161 - 5 + 1 = 157
Note: you never specify the batch size in torch.nn modules; the first dimension is always assumed to be the batch dimension.
Below is a piece of example code from the Keras documentation. It looks like the first convolution accepts a 256x256 image with 3 color channels. It has 64 output filters (I think these are the same as the feature maps I have read about elsewhere; can someone confirm this for me?). What confuses me is that the output size is (None, 64, 256, 256). I would expect it to be (None, 64 * 3, 256, 256), since it would need to do convolutions for each of the color channels. What I am wondering is how Keras handles the color channels. Do the values get averaged together (converted to greyscale) before passing through the convolution?
# apply a 3x3 convolution with 64 output filters on a 256x256 image:
model = Sequential()
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 256, 256)))
# now model.output_shape == (None, 64, 256, 256)
# add a 3x3 convolution on top, with 32 output filters:
model.add(Convolution2D(32, 3, 3, border_mode='same'))
# now model.output_shape == (None, 32, 256, 256)
A filter of size 3*3 with 3 input channels consists of 3*3*3 parameters, so the weights of the convolution kernel are different for each channel.
The layer sums up the convolution results of each channel (together with a bias term) to get each output value, so the output shape is independent of the number of input channels: for example, (None, 64, 256, 256) rather than (None, 64 * 3, 256, 256).
I'm not 100% sure, but I think a feature map refers to the output of applying one such filter to the input (for example, a 256*256 matrix).
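The same behaviour can be illustrated in PyTorch (a sketch, not the Keras code above): each of the 64 filters has weights of shape (3, 3, 3), spanning all three input channels, and the per-channel results are summed into a single output channel.
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # 3 input channels, 64 filters
print(conv.weight.shape)                            # torch.Size([64, 3, 3, 3]): each filter covers all 3 channels
x = torch.randn(1, 3, 256, 256)
print(conv(x).shape)                                # torch.Size([1, 64, 256, 256]), not 64 * 3 output channels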