I am new to AI and Python. I'm trying to build an architecture to train on a set of images, and later aim to overfit it. So far I haven't been able to work out how to get the input and output sizes right. I keep seeing this error whenever I try to train the network:
mat1 and mat2 shapes cannot be multiplied (48x13456 and 16x64)
My network:
net2 = nn.Sequential(
nn.Conv2d(3,8, kernel_size=5, padding=0),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Conv2d(8,16, kernel_size=5, padding=0),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Flatten(),
nn.Linear(16,64),
nn.ReLU(),
nn.Linear(64,10)
)
This is part of a task I'm working on and I really don't get why it's not running. Any hints?
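It's because you flatten the 2D CNN output into 1D before the fully connected layers, and the in_features of the first Linear layer must match the flattened size.
You have to work out manually how the input of size 128 shrinks by the time it leaves the MaxPool layer just before the Flatten layer. In your case it is 29*29*16 = 13456, which is the 13456 in the error message (48 is the batch size).
So your code should be rewritten as: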
net2 = nn.Sequential(
nn.Conv2d(3,8, kernel_size=5, padding=0),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Conv2d(8,16, kernel_size=5, padding=0),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Flatten(),
nn.Linear(13456,64),
nn.ReLU(),
nn.Linear(64,10)
)
This should work.
EDIT: This is a simple formula to calculate the output size:
floor((W - K + 2P) / S) + 1
where
W = input size
K = filter (kernel) size
S = stride
P = padding
So the 1st conv block gives an output of size 124, i.e. (128 - 5 + 0)/1 + 1.
Then the MaxPool halves it, i.e. 62.
The 2nd conv block gives an output of size 58.
Then the last MaxPool makes it 29.
So the final flattened output is 29*29*16 = 13456, where 16 is the number of output channels.
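The arithmetic above can be traced in a few lines of plain Python. This is only a sketch: the helper name conv_out is made up for illustration, and the 128x128 input size is taken from the answer rather than from the question's code.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a conv or pooling layer along one dimension."""
    return (size - kernel + 2 * padding) // stride + 1

# (kernel, stride, padding) for each layer that changes the spatial size
layers = [
    (5, 1, 0),  # Conv2d(3, 8, kernel_size=5, padding=0)
    (2, 2, 0),  # MaxPool2d(kernel_size=2, stride=2)
    (5, 1, 0),  # Conv2d(8, 16, kernel_size=5, padding=0)
    (2, 2, 0),  # MaxPool2d(kernel_size=2, stride=2)
]

size = 128  # assumed input height and width
sizes = []
for kernel, stride, padding in layers:
    size = conv_out(size, kernel, stride, padding)
    sizes.append(size)

out_channels = 16  # channels produced by the last conv layer
flat_features = out_channels * size * size

print(sizes)          # [124, 62, 58, 29]
print(flat_features)  # 13456
```

The value of flat_features is what the first nn.Linear after nn.Flatten() expects as in_features.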
I'm working on a CNN for a project using PyTorch Lightning. I don't know why I am getting this error. I've checked the size of the output from the last maxpool layer and it is (-1,10,128,128). The error comes from the linear layer. Any help would be appreciated.
def __init__(self):
    super().__init__()
    self.model = nn.Sequential(
        nn.Conv2d(3,6,4,padding=2),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(6,10,4,padding=2),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Linear(10*128*128,240),
        nn.ReLU(),
        nn.Linear(in_features = 240,out_features=101),
        nn.ReLU()
    )
My error looks like this:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2560x128 and 163840x240)
You have to match the dimensions by putting a view call between the feature extractor and the classifier.
It would also be better not to use ReLU after the last layer.
Code:
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3,6,4,padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(6,10,4,padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(10*128*128,240),
            nn.ReLU(),
            nn.Linear(in_features = 240,out_features=101)
        )

    def forward(self, X):
        X = self.feature_extractor(X)
        # flatten (B, 10, 128, 128) to (B, 10*128*128) before the classifier
        X = X.view(X.size(0), -1)
        X = self.classifier(X)
        return X

model = M()
# batch size, channel size, height, width
X = torch.randn(128, 3, 512, 512)
print(model(X))
You do not use an nn.Flatten() layer. The CNN output should go through this layer before it reaches the linear layer.
Also drop the ReLU after the last layer. For classification the final activation would normally be softmax, but nn.CrossEntropyLoss in PyTorch already applies (log-)softmax internally, so the network should output raw logits.
self.model = nn.Sequential(
nn.Conv2d(3,6,4,padding=2),
nn.ReLU(),
nn.MaxPool2d(2),
nn.Conv2d(6,10,4,padding=2),
nn.ReLU(),
nn.MaxPool2d(2),
nn.Flatten(),
nn.Linear(10*128*128,240),
nn.ReLU(),
nn.Linear(in_features = 240,out_features=101)
)
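If you would rather not work out the size by hand, one option is to push a dummy batch through the convolutional part and read off the flattened size. A minimal sketch, assuming 512x512 RGB inputs (the size used in the other answer's example; the question itself does not state it). The names features and n_flat are only for illustration.

```python
import torch
import torch.nn as nn

# Convolutional part of the network from the question, followed by Flatten
features = nn.Sequential(
    nn.Conv2d(3, 6, 4, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(6, 10, 4, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
)

# One dummy image is enough to discover the shape; no gradients needed
with torch.no_grad():
    dummy = torch.zeros(1, 3, 512, 512)  # assumed input size
    n_flat = features(dummy).shape[1]

print(n_flat)  # 163840, i.e. 10 * 128 * 128
```

n_flat can then be passed as in_features of the first nn.Linear. Without the Flatten step, nn.Linear is applied to the last dimension only, which is why the error reports a 2560x128 matrix: 2560 = 2 * 10 * 128, consistent with a batch of 2 (an inference from the numbers, not something the question states).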
This trained variational autoencoder (VAE) network produces good reconstructions apart from persistent artefacts (see below). They appear at some point during training and then persist however long you continue to train.
They started appearing after adding BatchNorm2d and Sigmoid layers to the decoder.
Sigmoid is used to squash the output values to [0, 1], and BatchNorm2d was then necessary to produce any meaningful reconstructions.
See previous issue on Cross Validated
My theory is the weight values for those pixels in the Conv2D layers are getting set way beyond the normal range and are unable to recover.
How can I stop these artefacts appearing?
self.kl_coefficient = 100
self.latent_dim = 8
self.input_image_height = 64
self.input_image_channels = 3
self.leaky_relu_negative_slope = 0.01
# Encoder / Decoder architecture
self.encoder = nn.Sequential(
nn.Conv2d(in_channels=self.input_image_channels, out_channels=8, kernel_size=4, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.Conv2d(in_channels=8, out_channels=16, kernel_size=4, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.Conv2d(in_channels=16, out_channels=32, kernel_size=4, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.Conv2d(in_channels=32, out_channels=64, kernel_size=4, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
Flatten()
)
self.decoder = nn.Sequential(
nn.Linear(in_features=self.latent_dim, out_features=64),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
Unflatten(image_height=self.input_image_height),
nn.ConvTranspose2d(in_channels=64, out_channels=64, kernel_size=5, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.ConvTranspose2d(in_channels=64, out_channels=64, kernel_size=5, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=6, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=6, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.Conv2d(in_channels=16, out_channels=3, kernel_size=1, stride=1),
nn.BatchNorm2d(3,),
nn.Sigmoid()
)
# Fully connected layers
self.fc_pose_mu = nn.Linear(256, self.latent_dim)
self.fc_pose_log_var = nn.Linear(256, self.latent_dim)
def step(self, batch, batch_idx):
    reconstructed_images, q, p = self._run_step(batch)
    # 1. Reconstruction Loss (Mean Squared Error)
    reconstruction_loss = F.mse_loss(batch, reconstructed_images, reduction="sum")
    # 2. Training Stability Loss - latent distribution VS standard Gaussian (KL divergence)
    q_vs_standard_gaussian = torch.distributions.kl_divergence(q, p)
    training_stability_loss = q_vs_standard_gaussian.mean()
    training_stability_loss *= self.kl_coefficient
    # TOTAL loss
    return reconstruction_loss + training_stability_loss
Results
I am struggling to work out how to calculate the dimensions for the fully connected layer. I am inputting images which are 448x448, using a batch size of 16. Below is the code for my convolutional layers:
class ConvolutionalNet(nn.Module):
    def __init__(self, num_classes=182):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer5 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
I want to add a fully connected layer:
self.fc = nn.Linear(?, num_classes)
Would anyone be able to explain the best way to go about calculating this? Also, if I have multiple fully connected layers, e.g. self.fc2 and self.fc3, would the second parameter always equal the number of classes? I am new to coding and finding it hard to wrap my head around this.
The conv layers don't change the width/height of the features since you've set padding equal to (kernel_size - 1) / 2. Max pooling with kernel_size = stride = 2 will decrease the width/height by a factor of 2 (rounded down if input shape is not even).
Using 448 as input width/height, the output width/height will be 448 // 2 // 2 // 2 // 2 // 2 = 448/32 = 14 (where // is floor-divide operator).
The number of channels is fully determined by the last conv layer, which outputs 64 channels.
Therefore you will have a [B,64,14,14] shaped tensor, so the Linear layer should have in_features = 64*14*14 = 12544.
Note you'll need to flatten the input beforehand, something like:
self.layer6 = nn.Sequential(
nn.Flatten(),
nn.Linear(12544, num_classes)
)
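The same count can be checked with a few lines of arithmetic. A small sketch; the variable names are illustrative, and the five halvings correspond to the five MaxPool2d layers in the question (the conv layers are skipped because their padding keeps the size unchanged).

```python
size = 448          # input height and width
num_pools = 5       # one MaxPool2d(kernel_size=2, stride=2) per layer block
out_channels = 64   # channels produced by the last conv layer

for _ in range(num_pools):
    size //= 2      # each pool halves the size, rounding down

in_features = out_channels * size * size

print(size)         # 14
print(in_features)  # 12544
```

On the second part of the question: with several fully connected layers, only the last one needs out_features equal to num_classes. Each earlier layer can use any hidden width, as long as the next layer's in_features matches it.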
I am looking at an implementation of AlexNet in PyTorch. According to the formula, output height = (input_height + padding_top + padding_bottom - kernel_height) / stride_height + 1. So with an input of size 224, stride = 4, padding = 1, and kernel size = 11, the output should be of size 54.75. But if you run a summary of the model, you see that the output of this first layer is 54. Does PyTorch round the output size down? If so, does it do so consistently (it seems like it)? I would like to understand what is going on behind the scenes.
Here is the code that I refer to:
net = nn.Sequential(
nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2), nn.Flatten(),
nn.Linear(6400, 4096), nn.ReLU(), nn.Dropout(p=0.5),
nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
nn.Linear(4096, 10))
The output size is a whole number, of course! Your formula is just missing the floor: the expression is height = floor((input_height + padding_top + padding_bottom - kernel_height) / stride_height + 1). It wouldn't make any sense otherwise.
Yes, it is rounded down when the formula gives a fractional output length.
The output is a matrix, and every element of a matrix counts as one unit of length. There is no such thing as an element of length 0.75, so the size has to be either rounded down to 54 or rounded up to 55. Because the kernel has to stop striding where the next stride would no longer fit the whole kernel inside the padded input (draw it to see how odd it would be if the kernel didn't stop there), it is rounded down.
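The rounding can be seen by counting kernel positions directly. A short sketch in plain Python using the numbers from the question; the variable names are illustrative only.

```python
import math

input_size, kernel, stride, padding = 224, 11, 4, 1

# The unrounded formula gives a fractional value...
exact = (input_size + 2 * padding - kernel) / stride + 1
# ...and the actual output size is its floor
output_size = math.floor(exact)

# Equivalent view: count the start positions where the whole kernel still fits
padded = input_size + 2 * padding
starts = list(range(0, padded - kernel + 1, stride))

# Columns at the far edge that no kernel position reaches
unused = padded - (starts[-1] + kernel)

print(exact)        # 54.75
print(output_size)  # 54
print(len(starts))  # 54
print(unused)       # 3
```

The three unused columns are the "0.75 of a stride" that the floor discards: one more step of 4 would need a fourth column that does not exist.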
Can we pass images for which height != width through a CNN in PyTorch?
In CNN, I have convolution, batch-norm, max-pool, relu, and fully connected layers.
My network
self.conv_seqn = nn.Sequential(
nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
nn.BatchNorm2d(32),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=4, stride=4),
nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=4, stride=4),
)
self.fc_seqn = nn.Sequential(
nn.Linear(1843200, 256),
nn.ReLU(inplace=True),
nn.Linear(256, total_configs)
)
My forward function:
def forward(self, x):
    x = self.conv_seqn(x)
    x = x.view(x.size(0), -1)
    x = self.fc_seqn(x)
    return x
If the input image is of size 3840*1920*3, after applying conv_seqn() it should be of size [1, 128, 120, 60], but I am getting a size of [1, 128, 120, 120] (batch size = 1 here).
Any suggestion would be very helpful.
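Conv2d and MaxPool2d treat height and width independently, so a non-square input keeps its aspect ratio through conv_seqn. A minimal sketch to check this, using the layers from the question but a much smaller 384x192 input (an assumption, chosen only to keep the example light). For the full 3840x1920 image the same arithmetic gives 120x60.

```python
import torch
import torch.nn as nn

conv_seqn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4, stride=4),
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4, stride=4),
)
conv_seqn.eval()  # inference mode; this is only a shape check

with torch.no_grad():
    x = torch.zeros(1, 3, 384, 192)  # (batch, channels, height, width)
    out_shape = tuple(conv_seqn(x).shape)

print(out_shape)  # (1, 128, 12, 6)

# Same arithmetic for the full image: total downsampling is 2 * 4 * 4 = 32
full_h, full_w = 3840 // 32, 1920 // 32
print(full_h, full_w, 128 * full_h * full_w)  # 120 60 921600
```

Note that the 1843200 in nn.Linear(1843200, 256) equals 128 * 120 * 120, whereas a 120x60 feature map would need in_features = 921600. A square [1, 128, 120, 120] output suggests the tensor reaching the network may already be 3840x3840, for example after a resize in the data pipeline, but that is a guess the question does not confirm. Printing x.shape at the top of forward would show what actually arrives.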