I'm working on a CNN for a project using PyTorch Lightning, and I don't know why I am getting this error. I've checked the size of the output from the last max-pool layer, and it is (-1, 10, 128, 128). The error is raised by the first linear layer. Any help would be appreciated.
def __init__(self):
    super().__init__()
    self.model = nn.Sequential(
        nn.Conv2d(3,6,4,padding=2),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(6,10,4,padding=2),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Linear(10*128*128,240),
        nn.ReLU(),
        nn.Linear(in_features = 240,out_features=101),
        nn.ReLU()
    )
My error looks like this:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2560x128 and 163840x240)
You have to flatten the output of the feature extractor before passing it to the classifier, for example by calling view between the two.
It is also better not to apply ReLU to the final output, since it clamps every negative logit to zero.
Code:
import torch
import torch.nn as nn
class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3,6,4,padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(6,10,4,padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(10*128*128,240),
            nn.ReLU(),
            nn.Linear(in_features = 240,out_features=101)
        )
    def forward(self, X):
        X = self.feature_extractor(X)
        # flatten (N, 10, 128, 128) to (N, 10*128*128) for the classifier
        X = X.view(X.size(0), -1)
        X = self.classifier(X)
        return X
model = M()
# batch size, channel size, height, width
X = torch.randn(128, 3, 512, 512)
print(model(X))
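The numbers in the error message follow from how nn.Linear treats its input: it uses only the last dimension as features and everything before it as batch dimensions, so a (2, 10, 128, 128) tensor looks like 2*10*128 = 2560 rows of 128 features. A minimal sketch of the same failure at a smaller, made-up size (the 8x8 feature map and the 5 output units are assumptions chosen to keep it fast):

```python
import torch
import torch.nn as nn

# Hypothetical small stand-in for the feature map: (batch, channels, height, width).
features = torch.randn(2, 10, 8, 8)
linear = nn.Linear(10 * 8 * 8, 5)

try:
    # The last dimension is 8, but the layer expects 640 features per row.
    linear(features)
    failed = False
except RuntimeError as err:
    failed = True
    print(err)

# Flatten everything except the batch dimension; same effect as
# features.view(features.size(0), -1) or an nn.Flatten() layer.
flat = torch.flatten(features, start_dim=1)
out = linear(flat)
print(tuple(flat.shape), tuple(out.shape))
```

After flattening, each sample is a single row of 640 features, which is what the linear layer was built for.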
Your model does not use an nn.Flatten() layer. The CNN output should pass through this layer before it reaches the linear layer.
The last layer should also not be followed by ReLU. If you train with nn.CrossEntropyLoss, leave the final outputs as raw logits, because that loss already applies log-softmax internally.
self.model = nn.Sequential(
    nn.Conv2d(3,6,4,padding=2),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(6,10,4,padding=2),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(10*128*128,240),
    nn.ReLU(),
    nn.Linear(in_features = 240,out_features=101)
)
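To illustrate why no softmax is needed before the cross-entropy loss, here is a small check, assuming the standard torch.nn.functional API. It compares F.cross_entropy on raw logits with an explicit log_softmax followed by F.nll_loss. The batch size and the random logits are made up for the example:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical batch: 4 samples, 101 classes (the size of the last Linear layer).
logits = torch.randn(4, 101)
targets = torch.randint(0, 101, (4,))

# cross_entropy applies log_softmax internally, so it expects raw logits.
loss_from_logits = F.cross_entropy(logits, targets)

# Equivalent two-step computation: explicit log_softmax, then NLL loss.
loss_two_step = F.nll_loss(F.log_softmax(logits, dim=1), targets)

same = torch.allclose(loss_from_logits, loss_two_step)
print(same)
```

If a softmax or ReLU is applied inside the model as well, the loss ends up normalising already-transformed values, which is usually not what is intended.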
I am using the NNI framework in Python to do Neural Architecture Search. I have defined the model as:
import torch
import torch.nn as nn
import torch.nn.functional as F
from nni.nas.pytorch import mutables
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = mutables.LayerChoice([
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1)
        ])  # try 3x3 kernel and 5x5 kernel
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(14400, 128)
        self.fc2 = nn.Linear(128, 10)
    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)  # the error is raised here
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
Besides building the model, the code above lets the search algorithm below choose between two candidates for the first convolution layer: one with a 3x3 kernel and one with a 5x5 kernel.
I am also new to PyTorch, so let me know if you can already see a mistake in the code above.
It is used together with the code below:
dataset_train = torchvision.datasets.CIFAR10(root='./data', train=True,
                                             download=True, transform=transform)
dataset_valid = torchvision.datasets.CIFAR10(root='./data', train=False,
                                             download=True, transform=transform)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)
# use NAS here
def top1_accuracy(output, target):
    # this is the function that computes the reward, as required by ENAS algorithm
    batch_size = target.size(0)
    _, predicted = torch.max(output.data, 1)
    return (predicted == target).sum().item() / batch_size
def metrics_fn(output, target):
    # metrics function receives output and target and computes a dict of metrics
    return {"acc1": top1_accuracy(output, target)}
from nni.algorithms.nas.pytorch import enas
trainer = enas.EnasTrainer(model,
                           loss=criterion,
                           metrics=metrics_fn,
                           reward_function=top1_accuracy,
                           optimizer=optimizer,
                           batch_size=128,
                           num_epochs=10,  # 10 epochs
                           dataset_train=dataset_train,
                           dataset_valid=dataset_valid,
                           log_frequency=10)  # print log every 10 steps
trainer.train()  # training
trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to file
The code above downloads the CIFAR-10 dataset, trains the model defined earlier on it, and searches for the best-performing architecture (here between two layer choices, though you can have more). But it raises an error:
22 x = self.dropout1(x)
23 x = torch.flatten(x, 1)
---> 24 x = self.fc1(x)
25 x = F.relu(x)
26 x = self.dropout2(x)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
112
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
115
116 def extra_repr(self) -> str:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x12544 and 14400x128)
I know this is because the flatten layer produces a size that the first fully connected layer does not expect. When I change the layer's input size to the one in the error message, I get the error below instead:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x14400 and 12544x128)
I believe it happens because of the choice in the first convolution layer. My question is: how do I fix this? If NNI is unfamiliar to you, note that in Keras you can define a fully connected layer by giving only its number of hidden units, without specifying the input size. PyTorch seems to require the correct input dimension. Is there a way to say that, after flatten, the data should go to a hidden fully connected layer with a given number of units, without also specifying the input shape, which I believe is causing the problem?
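For the convolution with kernel_size=5 you need padding=2, not 1. With stride 1, padding = (kernel_size - 1) / 2 keeps the spatial size unchanged, so both choices then produce the same output shape.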
Fix:
self.conv1 = mutables.LayerChoice([
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1)
])
to
self.conv1 = mutables.LayerChoice([
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2)  # match padding size to kernel size
])
Update:
Recent versions of PyTorch allow you to specify padding='same' and avoid the need to come up with the correct value for padding.
However, I strongly urge you to use the formula for the output shape of a convolution layer (given in the nn.Conv2d documentation) and manually compute the correct value for padding. This is a good sanity check to ensure you understand what you are doing.
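As a sanity check of that kind, the output-size formula from the nn.Conv2d documentation can be applied by hand to the network in the question. The sketch below is plain Python; the helper names conv_out and flattened_features are made up for this example, and the 32x32 input size is the CIFAR-10 image size:

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a conv or max-pool layer."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def flattened_features(first_kernel, first_padding, image_size=32):
    """Number of inputs that fc1 receives for a given first-layer choice."""
    s = conv_out(image_size, first_kernel, padding=first_padding)  # conv1
    s = conv_out(s, 3)                # conv2: 3x3 kernel, no padding
    s = conv_out(s, 2, stride=2)      # F.max_pool2d(x, 2)
    return 64 * s * s                 # conv2 has 64 output channels


print(flattened_features(3, 1))  # 3x3 kernel, padding=1 -> 14400
print(flattened_features(5, 1))  # 5x5 kernel, padding=1 -> 12544
print(flattened_features(5, 2))  # 5x5 kernel, padding=2 -> 14400
```

The first two values are exactly the two sizes that appear in the error messages, and the third shows that padding=2 brings the 5x5 branch back in line with the 3x3 branch.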
I'm trying to find road lanes using PyTorch. I created my dataset and my model, but when I try to train the model I get a "mat1 and mat2 shapes cannot be multiplied (4x460800 and 80000x16)" error. I've tried the solutions from other topics, but they didn't help much.
My dataset is a set of road images with their validation images. I have a .csv file that contains the names of the images (such as 'image1.jpg, image2.jpg'). The original size of the images and validation images is 1280x720. I resize them to 200x200 in my dataset code.
Road image:
Validation image:
Here's my dataset:
import os
import pandas as pd
import random
import torch
import torchvision.transforms.functional as TF
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image
class Dataset(Dataset):
    def __init__(self, csv_file, root_dir, val_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.val_dir = val_dir
        self.transform = transform
    def __len__(self):
        return len(self.annotations)
    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        image = Image.open(img_path).convert('RGB')
        mask_path = os.path.join(self.val_dir, self.annotations.iloc[index, 0])
        mask = Image.open(mask_path).convert('RGB')
        transform = transforms.Compose([
            transforms.Resize((200, 200)),
            transforms.ToTensor()
        ])
        if self.transform:
            image = self.transform(image)
            mask = self.transform(mask)
        return image, mask
My model:
import torch
import torch.nn as nn
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn_layers = nn.Sequential(
            # Conv2d, 3 inputs, 128 outputs
            # 200x200 image size
            nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Conv2d, 128 inputs, 64 outputs
            # 100x100 image size
            nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Conv2d, 64 inputs, 32 outputs
            # 50x50 image size
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.linear_layers = nn.Sequential(
            # Linear, 32*50*50 inputs, 16 outputs
            nn.Linear(32 * 50 * 50, 16),
            # Linear, 16 inputs, 3 outputs
            nn.Linear(16, 3)
        )
    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
How can I avoid this error and train my model on these images and their validation images?
The answer: In your case, the network input has a shape of (3, 720, 1280), i.e. the original 1280x720 image, not (3, 200, 200) as you intended. You have probably forgotten to set the transform argument of RNetDataset. It stays None, so no transforms are applied and the image is not resized. Another possibility is that it happens because of these lines:
transform = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor()
])
if self.transform:
    image = self.transform(image)
    mask = self.transform(mask)
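You have two variables named transform, a local one and self.transform, and you may have mixed them up: the local pipeline that contains Resize is built but never applied, while only self.transform is used. Verify this and the problem should go away.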
How I came up with it: 460800 is clearly the size of the tensor after it is flattened for the linear layers. According to the architecture, a tensor processed by self.cnn_layers has 32 channels, so its height multiplied by its width should give 460800 / 32 = 14400. Suppose its height is H and its width is W, so H x W = 14400. What was the original input size in this case? Each nn.MaxPool2d(kernel_size=2, stride=2) layer halves the height and width, and this happens three times. So the original input had 8H x 8W = 64 x 14400 = 921600 pixels. Finally, notice that 921600 = 1280 * 720. This can't be a magical coincidence. Case closed!
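The back-calculation can be checked with a few lines of plain Python. The variable names are made up for this sketch; the numbers come from the error message and the model definition:

```python
flattened = 460800        # first dimension size reported in the error message
channels = 32             # output channels of the last Conv2d
num_poolings = 3          # each MaxPool2d(2, 2) halves height and width

# Spatial area (H * W) of the feature map just before flattening.
pooled_area = flattened // channels

# Each pooling shrinks the area by a factor of 4, so undo it three times.
original_area = pooled_area * 4 ** num_poolings

print(pooled_area)                  # 14400
print(original_area)                # 921600
print(original_area == 1280 * 720)  # True
```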
Another suggestion: even if you apply transforms correctly, your code might not work. Suppose that you have an input of size (4, 3, 200, 200), where 4 is a batch size. Layers in your architecture will process this input as follows:
nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1) # -> (4, 128, 200, 200)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 128, 100, 100)
nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1) # -> (4, 64, 100, 100)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 64, 50, 50)
nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1) # -> (4, 32, 50, 50)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 32, 25, 25)
So, your first layer in self.linear_layers should be not nn.Linear(32 * 50 * 50, 16), but nn.Linear(32 * 25 * 25, 16). With this change, everything should be fine.
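Rather than tracing shapes by hand, one option is to push a dummy batch through the convolutional part and print the shape after every layer. This is only a sketch: it rebuilds cnn_layers from the question outside the Net class and uses a random (4, 3, 200, 200) tensor as a stand-in for real data:

```python
import torch
import torch.nn as nn

# Same convolutional stack as in the question, rebuilt standalone.
cnn_layers = nn.Sequential(
    nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(4, 3, 200, 200)  # dummy batch, not real images
with torch.no_grad():
    for layer in cnn_layers:
        x = layer(x)
        print(type(layer).__name__, tuple(x.shape))

final_shape = tuple(x.shape)
# Number of inputs the first Linear layer must accept.
flat_features = x.flatten(start_dim=1).size(1)
print(final_shape, flat_features)
```

The last line gives the value to use as in_features of the first nn.Linear, here 32 * 25 * 25 = 20000.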
I am trying to learn to build a U-Net architecture from scratch. I have written the code below, but I run into a problem when I try to check the output of the encoder part. When you run the following snippet, you'll get the error shown after it:
import torch
import torch.nn as nn
batch = 1
channels = 3
width = 512 # same as height
image = torch.randn(batch, channels, width, width)
enc = Encoder(channels)
enc(image)
RuntimeError: Given groups=1, weight of size [128, 64, 3, 3], expected input[1, 3, 512, 512] to have 64 channels, but got 3 channels instead
Below is the code:
class ConvolutionBlock(nn.Module):
    '''
    The basic Convolution Block Which Will have Convolution -> RelU -> Convolution -> RelU
    '''
    def __init__(self, in_channels, out_channels, upsample:bool = False,):
        '''
        args:
            upsample: If True, then use TransposedConv2D (Means it being used in the decoder part) instead MaxPooling
            batch_norm was introduced after UNET so they did not know if it existed. Might be useful
        '''
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size = 3, padding= 1), # padding is 0 by default, 1 means the input width, height == out width, height
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size = 3, padding = 1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 2, stride = 2) if not upsample else nn.ConvTranspose2d(out_channels, out_channels//2, kernel_size = 2, ) # As it is said in the paper that it TransPose2D halves the features
        )
    def forward(self, feature_map_x):
        '''
        feature_map_x could be the image itself or the
        '''
        return self.network(feature_map_x)
class Encoder(nn.Module):
    '''
    '''
    def __init__(self, image_channels:int = 1, repeat:int = 4):
        '''
        In UNET, the features start at 64 and keeps getting twice the size of the previous one till it reached BottleNeck
        '''
        super().__init__()
        in_channels = [image_channels,64, 128, 256, 512]
        out_channels = [64, 128, 256, 512, 1024]
        self.layers = nn.ModuleList(
            [ConvolutionBlock(in_channels = in_channels[i], out_channels = out_channels[i]) for i in range(repeat+1)]
        )
    def forward(self, feature_map_x):
        for layer in self.layers:
            out = layer(feature_map_x)
        return out
EDIT: Running the code below gives me expected info too:
in_ = [3,64, 128, 256, 512]
ou_ = [64, 128, 256, 512, 1024]
width = 512
from torchsummary import summary
for i in range(5):
    cb = ConvolutionBlock(in_[i], ou_[i])
    summary(cb, (in_[i],width,width))
    print('#'*50)
There was a logic mistake in the forward method of Encoder.
I did:
for layer in self.layers:
    out = layer(feature_map_x)
return out
Here every layer receives the original feature map, whereas each layer is supposed to receive the output of the previous one. The loop has to feed its result back in as the next input:
for layer in self.layers:
    feature_map_x = layer(feature_map_x)
return feature_map_x
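The difference between the two loops can be reproduced with a much smaller stand-in for the encoder. In this sketch the two plain Conv2d layers, their channel counts, and the function names forward_buggy and forward_fixed are all made up for illustration:

```python
import torch
import torch.nn as nn

# Hypothetical, much smaller stand-in for the encoder blocks.
layers = nn.ModuleList([
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
])
image = torch.randn(1, 3, 32, 32)


def forward_buggy(x):
    # Every layer is fed the original input, so the second layer
    # (expecting 8 channels) receives 3 channels and raises.
    for layer in layers:
        out = layer(x)
    return out


def forward_fixed(x):
    # Each layer is fed the output of the previous one.
    for layer in layers:
        x = layer(x)
    return x


try:
    forward_buggy(image)
    buggy_failed = False
except RuntimeError as err:
    buggy_failed = True
    print(err)

out = forward_fixed(image)
print(buggy_failed, tuple(out.shape))
```

The error printed for the buggy loop has the same form as the one in the question: the second layer's weight expects its own input channel count but is given the image's 3 channels.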
I am trying to use LayerNorm inside nn.Sequential in torch. This is what I am looking for-
import torch.nn as nn
class LayerNormCnn(nn.Module):
    def __init__(self):
        super(LayerNormCnn, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.LayerNorm(),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.LayerNorm(),
            nn.ReLU(),
        )
    def forward(self, x):
        x = self.net(x)
        return x
Unfortunately, it doesn't work because LayerNorm requires normalized_shape as an argument. The code above throws the following exception:
nn.LayerNorm(),
TypeError: __init__() missing 1 required positional argument: 'normalized_shape'
Right now, this is how I have implemented it-
import torch
import torch.nn as nn
import torch.nn.functional as F
class LayerNormCnn(nn.Module):
    def __init__(self, state_shape):
        super(LayerNormCnn, self).__init__()
        self.conv1 = nn.Conv2d(state_shape[0], 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        # compute shape by doing a forward pass
        with torch.no_grad():
            fake_input = torch.randn(1, *state_shape)
            out = self.conv1(fake_input)
            bn1_size = out.size()[1:]
            out = self.conv2(out)
            bn2_size = out.size()[1:]
        self.bn1 = nn.LayerNorm(bn1_size)
        self.bn2 = nn.LayerNorm(bn2_size)
    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        return x
if __name__ == '__main__':
    in_shape = (3, 128, 128)
    batch_size = 32
    model = LayerNormCnn(in_shape)
    x = torch.randn((batch_size,) + in_shape)
    out = model(x)
    print(out.shape)
Is it possible to use LayerNorm inside nn.Sequential?
The original layer normalisation paper advised against using layer normalisation in CNNs, because hidden units whose receptive fields lie near the image boundary have very different statistics from those that cover the actual image content. This issue does not arise with RNNs, which is what layer norm was originally tested on. Are you sure you want to be using LayerNorm? If you're looking to compare a different normalisation technique against BatchNorm, consider GroupNorm. This gets rid of the LayerNorm assumption that all channels in a layer contribute equally to a prediction, which is problematic particularly if the layer is convolutional. Instead, the channels are divided into groups and statistics are computed within each group, which still allows a GN layer to learn different statistics across channels.
Please refer here for related discussion.
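As for the nn.Sequential part of the question: nn.GroupNorm only needs the number of groups and the number of channels, not the spatial size, so it can be placed directly inside nn.Sequential. A minimal sketch, where num_groups=8 is an arbitrary choice for this example (it only has to divide the channel count):

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.GroupNorm(num_groups=8, num_channels=32),  # 8 groups of 4 channels
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
    nn.GroupNorm(num_groups=8, num_channels=64),  # 8 groups of 8 channels
    nn.ReLU(),
)

x = torch.randn(4, 3, 128, 128)  # dummy batch
out = net(x)
print(tuple(out.shape))

# The same network also accepts a different input resolution,
# because GroupNorm does not depend on height and width.
out_small = net(torch.randn(4, 3, 64, 64))
print(tuple(out_small.shape))
```

Setting num_groups=1 normalises each sample over all of its channels and spatial positions, which is the closest GroupNorm equivalent of layer normalisation over the whole feature map, again without having to know the feature-map size in advance.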