I am trying to learn how to build a U-NET architecture from scratch. I have written the code below, but when I try to check the output of the encoder part I run into an error. When you run the code below, you get:
import torch
import torch.nn as nn
batch = 1
channels = 3
width = 512 # same as height
image = torch.randn(batch, channels, width, width)
enc = Encoder(channels)
enc(image)
RuntimeError: Given groups=1, weight of size [128, 64, 3, 3], expected input[1, 3, 512, 512] to have 64 channels, but got 3 channels instead
Below is the code:
class ConvolutionBlock(nn.Module):
    '''
    The basic convolution block: Convolution -> ReLU -> Convolution -> ReLU
    '''
    def __init__(self, in_channels, out_channels, upsample: bool = False):
        '''
        args:
            upsample: If True, use ConvTranspose2d instead of MaxPooling (i.e. the block is being used in the decoder part).
            Note: batch norm was introduced after U-NET, so the original authors could not have used it. It might still be useful.
        '''
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),  # padding is 0 by default; padding=1 keeps the output width/height equal to the input
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2) if not upsample else nn.ConvTranspose2d(out_channels, out_channels // 2, kernel_size=2)  # the paper says the 2x2 up-convolution halves the number of feature channels
        )

    def forward(self, feature_map_x):
        '''
        feature_map_x can be the input image itself or the output of the previous block
        '''
        return self.network(feature_map_x)
class Encoder(nn.Module):
    '''
    '''
    def __init__(self, image_channels: int = 1, repeat: int = 4):
        '''
        In U-NET, the number of feature channels starts at 64 and doubles with every block until the bottleneck is reached.
        '''
        super().__init__()
        in_channels = [image_channels, 64, 128, 256, 512]
        out_channels = [64, 128, 256, 512, 1024]
        self.layers = nn.ModuleList(
            [ConvolutionBlock(in_channels=in_channels[i], out_channels=out_channels[i]) for i in range(repeat + 1)]
        )

    def forward(self, feature_map_x):
        for layer in self.layers:
            out = layer(feature_map_x)
        return out
EDIT: Running the code below also gives me the expected info:
in_ = [3, 64, 128, 256, 512]
ou_ = [64, 128, 256, 512, 1024]
width = 512

from torchsummary import summary

for i in range(5):
    cb = ConvolutionBlock(in_[i], ou_[i])
    summary(cb, (in_[i], width, width))
    print('#' * 50)
There was a logic mistake in the forward of Encoder.
I did:

for layer in self.layers:
    out = layer(feature_map_x)
return out

but each layer was supposed to receive the output of the previous layer, while my loop kept feeding the original feature map into every layer. The fix is to reassign feature_map_x on every iteration:

for layer in self.layers:
    feature_map_x = layer(feature_map_x)
return feature_map_x
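As a quick sanity check, here is a minimal sketch (assuming the ConvolutionBlock and Encoder classes above, with the corrected forward) that runs the encoder on a dummy 3-channel image:

import torch

enc = Encoder(image_channels=3, repeat=4)
image = torch.randn(1, 3, 512, 512)
out = enc(image)
# Five blocks, each halving H and W via MaxPool2d: 512 -> 256 -> 128 -> 64 -> 32 -> 16,
# while the channels go 3 -> 64 -> 128 -> 256 -> 512 -> 1024.
print(out.shape)  # torch.Size([1, 1024, 16, 16])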
I'm trying to find road lanes using PyTorch. I created my dataset and my model, but when I try to train the model I get a "mat1 and mat2 shapes cannot be multiplied (4x460800 and 80000x16)" error. I've tried solutions from other topics, but they didn't help me much.
My dataset is a bunch of road images with their validation (mask) images. I have a .csv file that contains the image names (such as 'image1.jpg, image2.jpg'). The original size of both the images and the validation images is 1280x720; I convert them to 200x200 in my dataset code.
Road image:
Validation image:
Here's my dataset:
import os
import pandas as pd
import random
import torch
import torchvision.transforms.functional as TF
from torch.utils.data import Dataset
from torchvision import transforms
from PIL import Image

class Dataset(Dataset):
    def __init__(self, csv_file, root_dir, val_dir, transform=None):
        self.annotations = pd.read_csv(csv_file)
        self.root_dir = root_dir
        self.val_dir = val_dir
        self.transform = transform

    def __len__(self):
        return len(self.annotations)

    def __getitem__(self, index):
        img_path = os.path.join(self.root_dir, self.annotations.iloc[index, 0])
        image = Image.open(img_path).convert('RGB')
        mask_path = os.path.join(self.val_dir, self.annotations.iloc[index, 0])
        mask = Image.open(mask_path).convert('RGB')
        transform = transforms.Compose([
            transforms.Resize((200, 200)),
            transforms.ToTensor()
        ])
        if self.transform:
            image = self.transform(image)
            mask = self.transform(mask)
        return image, mask
My model:
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn_layers = nn.Sequential(
            # Conv2d, 3 inputs, 128 outputs
            # 200x200 image size
            nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Conv2d, 128 inputs, 64 outputs
            # 100x100 image size
            nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Conv2d, 64 inputs, 32 outputs
            # 50x50 image size
            nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.linear_layers = nn.Sequential(
            # Linear, 32*50*50 inputs, 16 outputs
            nn.Linear(32 * 50 * 50, 16),
            # Linear, 16 inputs, 3 outputs
            nn.Linear(16, 3)
        )

    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
How can I avoid this error and train my model on these images and their validation images?
The answer: In your case, the NN input has shape (3, 1280, 720), not (3, 200, 200) as you want. Probably you have forgotten to pass the transform argument when constructing your Dataset: it stays None, so the transforms are not applied and the image is not resized. Another possibility is that it happens because of these lines:
transform = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor()
])
if self.transform:
    image = self.transform(image)
    mask = self.transform(mask)
You have two variables named transform, but only one of them has the self. prefix, so maybe you mixed them up. Verify this and the problem should go away.
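For illustration, a minimal sketch of one way to fix it (an assumption about your intent: actually pass the resize/to-tensor pipeline into the dataset so self.transform is not None; the file paths here are placeholders):

from torch.utils.data import DataLoader
from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((200, 200)),
    transforms.ToTensor()
])

# Pass the transform in, so self.transform is set and __getitem__ actually resizes.
train_set = Dataset(csv_file='train.csv', root_dir='images/', val_dir='masks/', transform=transform)
train_loader = DataLoader(train_set, batch_size=4, shuffle=True)

images, masks = next(iter(train_loader))
print(images.shape)  # torch.Size([4, 3, 200, 200])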
How I came up with it: 460800 is clearly the tensor size after reshaping, just before the linear layers. According to the architecture, the tensor produced by self.cnn_layers should have 32 channels, so its height multiplied by its width should give 460800 / 32 = 14400. Suppose its height is H and its width is W, so H x W = 14400. What was the original input size in this case? An nn.MaxPool2d(kernel_size=2, stride=2) layer divides height and width by 2, and this happens three times. So the original input size was 8H x 8W = 64 x 14400 = 921600 pixels. Finally, notice that 921600 = 1280 * 720. This can't be a magical coincidence. Case closed!
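Just to restate that arithmetic in code (nothing new, only the numbers above):

flattened = 460800            # from the error message
channels = 32                 # channel count after self.cnn_layers
hw = flattened // channels    # H * W after three 2x2 poolings -> 14400
original_pixels = 64 * hw     # 8H * 8W = 64 * H * W
print(hw, original_pixels, 1280 * 720)  # 14400 921600 921600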
Another suggestion: even if you apply the transforms correctly, your code might still not work. Suppose you have an input of size (4, 3, 200, 200), where 4 is the batch size. The layers in your architecture will process this input as follows:
nn.Conv2d(3, 128, kernel_size=3, stride=1, padding=1) # -> (4, 128, 200, 200)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 128, 100, 100)
nn.Conv2d(128, 64, kernel_size=3, stride=1, padding=1) # -> (4, 64, 100, 100)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 64, 50, 50)
nn.Conv2d(64, 32, kernel_size=3, stride=1, padding=1) # -> (4, 32, 50, 50)
nn.MaxPool2d(kernel_size=2, stride=2) # -> (4, 32, 25, 25)
So, your first layer in self.linear_layers should not be nn.Linear(32 * 50 * 50, 16), but nn.Linear(32 * 25 * 25, 16). With this change, everything should be fine.
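As a quick check, a minimal sketch assuming the Net class above with that single change applied:

import torch

x = torch.randn(4, 3, 200, 200)   # batch of correctly resized images
model = Net()                      # assumes nn.Linear(32 * 25 * 25, 16) as the first linear layer
features = model.cnn_layers(x)
print(features.shape)              # torch.Size([4, 32, 25, 25]) -> flattens to 32 * 25 * 25 = 20000
print(model(x).shape)              # torch.Size([4, 3])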
I am implementing a layer in torch fusing multiple atrous convolutions much like the Atrous Spatial Pyramid Pooling in https://arxiv.org/pdf/1706.05587.pdf.
The problem is that adding this layer slows down my training drastically.
The use of multiple dilated (atrous) convolutions drops my gpu utilization from 90% to 30% for some reason.
When I removed the dilated (atrous) convolutions, or used just one of them and appended its output repeatedly, there wasn't an issue. Are there any suggestions about a possible bottleneck in my code below?
The shape of x in the code below is (batch_size, 1024, 16, 16)
class MultiAtrous(nn.Module):
    def __init__(self):
        super().__init__()
        self.dilated_conv1 = nn.Conv2d(
            1024, 512, kernel_size=3, dilation=3, padding=3)
        self.dilated_conv2 = nn.Conv2d(
            1024, 512, kernel_size=3, dilation=6, padding=6)
        self.dilated_conv3 = nn.Conv2d(
            1024, 512, kernel_size=3, dilation=9, padding=9)
        self.conv1x1 = nn.Conv2d(1024, 512, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.relu = nn.ReLU()
        self.upsample = nn.Upsample(size=(16, 16), mode='bilinear')

    def forward(self, x):
        local_feat = []
        local_feat.append(self.dilated_conv1(x))
        local_feat.append(self.dilated_conv2(x))
        local_feat.append(self.dilated_conv3(x))
        local_feat.append(self.upsample(self.relu(self.conv1x1(self.gap(x)))))
        local_feat = torch.cat(local_feat, dim=1)
        return local_feat
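For anyone debugging something similar, here is a minimal timing sketch of the kind one might use to narrow down which branch is slow. It assumes the MultiAtrous module above and a CUDA device, and uses torch.cuda.synchronize() so GPU kernels are actually counted; the batch size and iteration counts are arbitrary:

import time
import torch
import torch.nn as nn

device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(8, 1024, 16, 16, device=device)
module = MultiAtrous().to(device)

def timed(fn, name, iters=50):
    # Warm up, then time with synchronization so GPU work is included in the measurement.
    for _ in range(5):
        fn(x)
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        fn(x)
    if device == 'cuda':
        torch.cuda.synchronize()
    print(f'{name}: {(time.time() - start) / iters * 1000:.3f} ms')

timed(module.dilated_conv1, 'dilation=3')
timed(module.dilated_conv2, 'dilation=6')
timed(module.dilated_conv3, 'dilation=9')
timed(module, 'full MultiAtrous')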
I am trying to use LayerNorm inside nn.Sequential in torch. This is what I am looking for-
import torch.nn as nn

class LayerNormCnn(nn.Module):
    def __init__(self):
        super(LayerNormCnn, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.LayerNorm(),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.LayerNorm(),
            nn.ReLU(),
        )

    def forward(self, x):
        x = self.net(x)
        return x
Unfortunately, it doesn't work, because LayerNorm requires normalized_shape as an argument. The code above throws the following exception:
nn.LayerNorm(),
TypeError: __init__() missing 1 required positional argument: 'normalized_shape'
Right now, this is how I have implemented it-
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNormCnn(nn.Module):
    def __init__(self, state_shape):
        super(LayerNormCnn, self).__init__()
        self.conv1 = nn.Conv2d(state_shape[0], 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        # compute shape by doing a forward pass
        with torch.no_grad():
            fake_input = torch.randn(1, *state_shape)
            out = self.conv1(fake_input)
            bn1_size = out.size()[1:]
            out = self.conv2(out)
            bn2_size = out.size()[1:]
        self.bn1 = nn.LayerNorm(bn1_size)
        self.bn2 = nn.LayerNorm(bn2_size)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        return x


if __name__ == '__main__':
    in_shape = (3, 128, 128)
    batch_size = 32
    model = LayerNormCnn(in_shape)
    x = torch.randn((batch_size,) + in_shape)
    out = model(x)
    print(out.shape)
Is it possible to use LayerNorm inside nn.Sequential?
The original layer normalisation paper advised against using layer normalisation in CNNs, as receptive fields around the boundary of images will have different values than receptive fields in the actual image content. This issue does not arise with RNNs, which is what layer norm was originally tested on. Are you sure you want to be using LayerNorm? If you're looking to compare a different normalisation technique against BatchNorm, consider GroupNorm. This gets rid of the LayerNorm assumption that all channels in a layer contribute equally to a prediction, which is problematic particularly if the layer is convolutional. Instead, the channels are divided into groups, which still allows a GN layer to learn different statistics across channels.
Please refer here for related discussion.
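If you do switch to GroupNorm, it only needs the channel count, so it drops straight into nn.Sequential; a minimal sketch (the group counts here are arbitrary illustrative choices):

import torch
import torch.nn as nn

# GroupNorm only needs (num_groups, num_channels), so it works inside nn.Sequential
# without knowing the spatial size in advance.
net = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.GroupNorm(num_groups=8, num_channels=32),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
    nn.GroupNorm(num_groups=8, num_channels=64),
    nn.ReLU(),
)

x = torch.randn(4, 3, 128, 128)
print(net(x).shape)  # torch.Size([4, 64, 32, 32])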
I'm trying to classify cats and dogs with a CNN in PyTorch.
While making a few layers and processing images, I found that the final feature map size doesn't match the size I calculated.
So I tried to check the feature map size step by step in the CNN by printing the shape, but it doesn't work.
I heard TensorFlow lets you check tensor sizes at each step, but how can I do that here?
What I want is:
def __init__(self):
    super(CNN, self).__init__()
    conv1 = nn.Conv2d(1, 16, 3, 1, 1)
    conv1_1 = nn.Conv2d(16, 16, 3, 1, 1)
    pool1 = nn.MaxPool2d(2)
    conv2 = nn.Conv2d(16, 32, 3, 1, 1)
    conv2_1 = nn.Conv2d(32, 32, 3, 1, 1)
    pool2 = nn.MaxPool2d(2)
    conv3 = nn.Conv2d(32, 64, 3, 1, 1)
    conv3_1 = nn.Conv2d(64, 64, 3, 1, 1)
    conv3_2 = nn.Conv2d(64, 64, 3, 1, 1)
    pool3 = nn.MaxPool2d(2)
    self.conv_module = nn.Sequential(
        conv1,
        nn.ReLU(),
        conv1_1,
        nn.ReLU(),
        pool1,
        # check first result size
        conv2,
        nn.ReLU(),
        conv2_1,
        nn.ReLU(),
        pool2,
        # check second result size
        conv3,
        nn.ReLU(),
        conv3_1,
        nn.ReLU(),
        conv3_2,
        nn.ReLU(),
        pool3,
        # check third result size
        pool4,
        # check fourth result size
        pool5
        # check fifth result size
    )
If there's any other way to check the feature map size at every step, please give me some advice.
Thanks in advance.
To do that you shouldn't use nn.Sequential. Just initialize your layers in __init__() and call them in the forward function. In the forward function you can print the shapes out. For example like this:
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(...)
        self.maxpool1 = nn.MaxPool2d()
        self.conv2 = nn.Conv2d(...)
        self.maxpool2 = nn.MaxPool2d()

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.maxpool1(x)
        print(x.size())
        x = self.conv2(x)
        x = F.relu(x)
        x = self.maxpool2(x)
        print(x.size())
Hope that's what you're looking for!
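If you would rather keep nn.Sequential, one possible workaround (a sketch, not part of the original answer) is a tiny pass-through module that just prints the shape:

import torch
import torch.nn as nn

class PrintShape(nn.Module):
    # Identity module that prints the shape of whatever passes through it.
    def forward(self, x):
        print(x.size())
        return x

# Example: drop PrintShape wherever you want to inspect the feature map size.
conv_module = nn.Sequential(
    nn.Conv2d(1, 16, 3, 1, 1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    PrintShape(),  # check first result size
    nn.Conv2d(16, 32, 3, 1, 1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    PrintShape(),  # check second result size
)

out = conv_module(torch.randn(1, 1, 64, 64))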
I am trying to pass both an image and a vector as inputs to the model. The image has the correct 4D shape, but the vector I input doesn't. The image size is 424x512 while the vector has shape (18,). After using the DataLoader, I get batches of shape (50x1x424x512) and (50x18). The model gives an error because it needs the vector to be 4D too. How do I do that?
Here is my code:
def loadTrainingData_B(args):
    fdm = []
    tdm = []
    parameters = []
    for i in image_files[:4]:
        try:
            false_dm = np.fromfile(join(ref, i), dtype=np.int32)
            false_dm = Image.fromarray(false_dm.reshape((424, 512, 9)).astype(np.uint8)[:,:,1])
            fdm.append(false_dm)
            true_dm = np.fromfile(join(ref, i), dtype=np.int32)
            true_dm = Image.fromarray(true_dm.reshape((424, 512, 9)).astype(np.uint8)[:,:,1])
            tdm.append(true_dm)
            pos = param_filenames.index(i)
            param = np.array(params[pos, 1:])
            param = np.where(param == '-point-light-source', 1, param).astype(np.float64)
            parameters.append(param)
        except:
            print('[!] File {} not found'.format(i))
    return (fdm, parameters, tdm)

class Flat_ModelB(Dataset):
    def __init__(self, args, train=True, transform=None):
        self.args = args
        if train == True:
            self.fdm, self.parameters, self.tdm = loadTrainingData_B(self.args)
        else:
            self.fdm, self.parameters, self.tdm = loadTestData_B(self.args)
        self.data_size = len(self.parameters)
        self.transform = transforms.Compose([transforms.ToTensor()])

    def __getitem__(self, index):
        return (self.transform(self.fdm[index]).double(), torch.from_numpy(self.parameters[index]).double(), self.transform(self.tdm[index]).double())

    def __len__(self):
        return self.data_size
The error I get is:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 2-dimensional input of size [50, 18] instead
Here is the model:
class Model_B(nn.Module):
    def __init__(self, config):
        super(Model_B, self).__init__()
        self.config = config
        # CNN layers for fdm
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(16))
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer4 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer5 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16))
        self.layer6 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=16, out_channels=1, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(1))
        # CNN layer for parameters
        self.param_layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))

    def forward(self, x, y):
        out = self.layer1(x)
        out_param = self.param_layer1(y)
        print("LayerParam 1 Output Shape : {}".format(out_param.shape))
        print("Layer 1 Output Shape : {}".format(out.shape))
        out = self.layer2(out)
        print("Layer 2 Output Shape : {}".format(out.shape))
        out = self.layer3(out)
        # out = torch.cat((out, out_param), dim=2)
        print("Layer 3 Output Shape : {}".format(out.shape))
        out = self.layer4(out)
        print("Layer 4 Output Shape : {}".format(out.shape))
        out = self.layer5(out)
        print("Layer 5 Output Shape : {}".format(out.shape))
        out = self.layer6(out)
        print("Layer 6 Output Shape : {}".format(out.shape))
        return out
and the method by which I access the data:
for batch_idx, (fdm, parameters) in enumerate(self.data):
    if self.config.gpu:
        fdm = fdm.to(device)
        parameters = parameters.to(device)
        print('shape of parameters for model a : {}'.format(parameters.shape))
    output = self.model(fdm)
    loss = self.criterion(output, parameters)
Edit:
I think my code is incorrect, as I am trying to apply convolutions over a vector of shape (18,). I tried to copy the vector to make it (18x64) and then input it. It still doesn't work and gives this output:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 3-dimensional input of size [4, 18, 64] instead
I am not sure how to concatenate an 18-length vector to the output of layer 3 if I can't do any of these things.
Looks like you are training an autoencoder model and want to parameterize it with some additional vector input in the bottleneck layer. If you want to perform some transformations on it, then you have to decide whether you need any spatial dependencies. Given the constant input size (N, 1, 424, 512), the output of layer3 will have shape (N, 32, 53, 64). You have a lot of options, depending on your desired model performance:
Use a nn.Linear with activations to transform the parameter vector. Then you might add extra spatial dimensions and repeat this vector in all spatial locations:
img = torch.rand((1, 1, 424, 512))
vec = torch.rand(1, 19)
layer3_out = model(img)
N, C, H, W = layer3_out.shape
param_encoder = nn.Sequential(nn.Linear(19, 30), nn.ReLU(), nn.Linear(30, 10))
param = param_encoder(vec)
param = param.unsqueeze(-1).unsqueeze(-1).expand(N, -1, H, W)
encoding = torch.cat([param, layer3_out], dim=1)
Use transposed convolutions to upsample your parameter vector to the size of the layer3 output. But that would be harder to implement, as you have to calculate the exact output shape to fit with (N, 32, 53, 64).
Transform the input vector with an MLP (nn.Linear) to twice the number of channels in the layer3 output. Then use so-called feature-wise transformations to scale and shift the feature maps from layer3; see the sketch below.
I would recommend starting with the first option, since it is the simplest one to implement, and then trying the others.
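For the third option, here is a minimal sketch of that feature-wise scale-and-shift, under the same assumed shapes (an 18-dimensional parameter vector and a (N, 32, 53, 64) layer3 output); all names and layer sizes here are illustrative, not from the original code:

import torch
import torch.nn as nn

# The MLP outputs 2 * C values per sample, split into a per-channel
# scale (gamma) and shift (beta) for the layer3 feature maps.
C = 32
film_mlp = nn.Sequential(nn.Linear(18, 64), nn.ReLU(), nn.Linear(64, 2 * C))

layer3_out = torch.randn(4, C, 53, 64)   # stand-in for the real layer3 output
vec = torch.randn(4, 18)                 # the 18-dimensional parameter vector

gamma, beta = film_mlp(vec).chunk(2, dim=1)   # each (4, C)
gamma = gamma.unsqueeze(-1).unsqueeze(-1)     # (4, C, 1, 1), broadcasts over H, W
beta = beta.unsqueeze(-1).unsqueeze(-1)
conditioned = gamma * layer3_out + beta       # same shape as layer3_out
print(conditioned.shape)                      # torch.Size([4, 32, 53, 64])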