I am trying to use LayerNorm inside nn.Sequential in PyTorch. This is what I am looking for:
import torch.nn as nn

class LayerNormCnn(nn.Module):
    def __init__(self):
        super(LayerNormCnn, self).__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.LayerNorm(),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.LayerNorm(),
            nn.ReLU(),
        )

    def forward(self, x):
        x = self.net(x)
        return x
Unfortunately, it doesn't work because LayerNorm requires normalized_shape as input. The code above throws the following exception:
nn.LayerNorm(),
TypeError: __init__() missing 1 required positional argument: 'normalized_shape'
Right now, this is how I have implemented it:
import torch
import torch.nn as nn
import torch.nn.functional as F

class LayerNormCnn(nn.Module):
    def __init__(self, state_shape):
        super(LayerNormCnn, self).__init__()
        self.conv1 = nn.Conv2d(state_shape[0], 32, kernel_size=3, stride=2, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1)
        # compute shape by doing a forward pass
        with torch.no_grad():
            fake_input = torch.randn(1, *state_shape)
            out = self.conv1(fake_input)
            bn1_size = out.size()[1:]
            out = self.conv2(out)
            bn2_size = out.size()[1:]
        self.bn1 = nn.LayerNorm(bn1_size)
        self.bn2 = nn.LayerNorm(bn2_size)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        return x

if __name__ == '__main__':
    in_shape = (3, 128, 128)
    batch_size = 32
    model = LayerNormCnn(in_shape)
    x = torch.randn((batch_size,) + in_shape)
    out = model(x)
    print(out.shape)
Is it possible to use LayerNorm inside nn.Sequential?
The original layer normalisation paper noted that layer normalisation works less well in CNNs. Hidden units whose receptive fields lie near the image boundary are rarely active, so their statistics differ sharply from those of units whose receptive fields cover the actual image content. This issue does not arise with RNNs, which is what layer norm was originally tested on. Are you sure you want to be using LayerNorm? If you're looking to compare a different normalisation technique against BatchNorm, consider GroupNorm. It drops LayerNorm's assumption that all channels in a layer contribute equally to a prediction, an assumption that is particularly problematic in convolutional layers. Instead, the channels are divided into groups and statistics are computed separately within each group. This still lets a GN layer normalise different groups of channels with different statistics.
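To answer the literal question: nn.LayerNorm does work inside nn.Sequential, provided normalized_shape is passed explicitly. For a 3×128×128 input, each stride-2 convolution above halves the spatial size. That gives (32, 64, 64) after the first conv and (64, 32, 32) after the second. The sketch below hard-codes those shapes, so it assumes in_shape = (3, 128, 128). For comparison, it also shows the GroupNorm variant suggested above, which only needs the channel count. The group count of 8 is an arbitrary illustrative choice.

```python
import torch
import torch.nn as nn

# LayerNorm needs the full (C, H, W) shape of each conv output, so this
# version only works for 3x128x128 inputs (assumed here).
layernorm_cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),   # -> (32, 64, 64)
    nn.LayerNorm([32, 64, 64]),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),  # -> (64, 32, 32)
    nn.LayerNorm([64, 32, 32]),
    nn.ReLU(),
)

# GroupNorm only needs the channel count, so it works for any image size.
# 8 groups is an arbitrary choice; it must divide the number of channels.
groupnorm_cnn = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
    nn.GroupNorm(num_groups=8, num_channels=32),
    nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
    nn.GroupNorm(num_groups=8, num_channels=64),
    nn.ReLU(),
)

x = torch.randn(32, 3, 128, 128)
print(layernorm_cnn(x).shape)  # torch.Size([32, 64, 32, 32])
print(groupnorm_cnn(x).shape)  # torch.Size([32, 64, 32, 32])
```

With num_groups=1, GroupNorm normalises over all of (C, H, W) per sample, as the LayerNorm above does. The difference is that GroupNorm's learnable scale and shift are per channel rather than per element.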
Please refer here for related discussion.
I'm working on a CNN for a project using PyTorch Lightning. I don't know why I am getting this error. I've checked the size of the output from the last max-pool layer and it is (-1, 10, 128, 128). The error comes from the linear layer. Any help would be appreciated.
def __init__(self):
    super().__init__()
    self.model = nn.Sequential(
        nn.Conv2d(3,6,4,padding=2),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(6,10,4,padding=2),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Linear(10*128*128,240),
        nn.ReLU(),
        nn.Linear(in_features = 240,out_features=101),
        nn.ReLU()
    )
My error looks like this:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (2560x128 and 163840x240)
You have to match the dimensions by flattening the output with the view method between the feature extractor and the classifier, so that the linear layer receives a (batch_size, 10*128*128) tensor.
It is also better not to apply ReLU after the last layer.
Code:
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super(M, self).__init__()
        self.feature_extractor = nn.Sequential(
            nn.Conv2d(3,6,4,padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(6,10,4,padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Linear(10*128*128,240),
            nn.ReLU(),
            nn.Linear(in_features = 240,out_features=101)
        )

    def forward(self, X):
        X = self.feature_extractor(X)
        X = X.view(X.size(0), -1)
        X = self.classifier(X)
        return X

model = M()
# batch size, channel size, height, width
X = torch.randn(128, 3, 512, 512)
print(model(X))
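You don't have to work out 10*128*128 by hand: the size can be read off a dummy forward pass. This is a minimal sketch that assumes the 3×512×512 input size from the example above. Here feature_extractor is a standalone copy of the answer's self.feature_extractor.

```python
import torch
import torch.nn as nn

feature_extractor = nn.Sequential(
    nn.Conv2d(3, 6, 4, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(6, 10, 4, padding=2),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

# Push one dummy image through the convolutional part and count the
# features per sample after flattening (the batch dimension is excluded).
with torch.no_grad():
    dummy = torch.zeros(1, 3, 512, 512)
    n_features = feature_extractor(dummy).flatten(1).shape[1]

print(n_features)  # 163840, i.e. 10 * 128 * 128

classifier = nn.Sequential(
    nn.Linear(n_features, 240),
    nn.ReLU(),
    nn.Linear(240, 101),
)
```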
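You are not using an nn.Flatten() layer. The CNN output should pass through this layer before it reaches the linear layer.
Also drop the final ReLU so that the model outputs raw logits. Softmax turns logits into class probabilities, but nn.CrossEntropyLoss in PyTorch already applies log-softmax internally. So do not add softmax to the model when training with that loss.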
self.model = nn.Sequential(
    nn.Conv2d(3,6,4,padding=2),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(6,10,4,padding=2),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(10*128*128,240),
    nn.ReLU(),
    nn.Linear(in_features = 240,out_features=101)
)
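To see why the last layer should output raw scores, the sketch below checks that nn.CrossEntropyLoss on logits gives the same value as log-softmax followed by nll_loss. The batch size of 4 and the random logits are arbitrary. The 101 classes match the question.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 101)            # raw scores from the last nn.Linear (no ReLU, no softmax)
targets = torch.randint(0, 101, (4,))   # class indices

ce = nn.CrossEntropyLoss()(logits, targets)
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(torch.allclose(ce, manual))  # True

# Softmax is only needed when you want probabilities, e.g. at inference time.
probs = F.softmax(logits, dim=1)
```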
I am using the NNI framework in Python to do Neural Architecture Search. In it, I have defined the model as:
import torch
import torch.nn as nn
import torch.nn.functional as F
from nni.nas.pytorch import mutables

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = mutables.LayerChoice([
            nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
            nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1)
        ])  # try 3x3 kernel and 5x5 kernel
        self.conv2 = nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = nn.Dropout2d(0.25)
        self.dropout2 = nn.Dropout2d(0.5)
        self.fc1 = nn.Linear(14400, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, 2)
        x = self.dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)  # the error occurs here
        x = F.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        output = F.log_softmax(x, dim=1)
        return output
Apart from building the model, the code above lets the search algorithm below choose between two layers for the first convolution: one with a 3x3 kernel or one with a 5x5 kernel.
Also, I am new to PyTorch, so let me know if you can already see a mistake in the code above.
Moving on, it is combined with the code below:
dataset_train = torchvision.datasets.CIFAR10(root='./data', train=True,
                                             download=True, transform=transform)
dataset_valid = torchvision.datasets.CIFAR10(root='./data', train=False,
                                             download=True, transform=transform)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), 0.05, momentum=0.9, weight_decay=1.0E-4)

# use NAS here
def top1_accuracy(output, target):
    # this is the function that computes the reward, as required by ENAS algorithm
    batch_size = target.size(0)
    _, predicted = torch.max(output.data, 1)
    return (predicted == target).sum().item() / batch_size

def metrics_fn(output, target):
    # metrics function receives output and target and computes a dict of metrics
    return {"acc1": top1_accuracy(output, target)}

from nni.algorithms.nas.pytorch import enas
trainer = enas.EnasTrainer(model,
                           loss=criterion,
                           metrics=metrics_fn,
                           reward_function=top1_accuracy,
                           optimizer=optimizer,
                           batch_size=128,
                           num_epochs=10,  # 10 epochs
                           dataset_train=dataset_train,
                           dataset_valid=dataset_valid,
                           log_frequency=10)  # print log every 10 steps
trainer.train()  # training
trainer.export(file="model_dir/final_architecture.json")  # export the final architecture to file
The code above downloads the CIFAR-10 dataset, trains the generated model on it and finds which model performs best (based on the two layer choices; you can have more choices as well). But it raises an error:
22 x = self.dropout1(x)
23 x = torch.flatten(x, 1)
---> 24 x = self.fc1(x)
25 x = F.relu(x)
26 x = self.dropout2(x)
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1129 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130 return forward_call(*input, **kwargs)
1131 # Do not call functions when jit is used
1132 full_backward_hooks, non_full_backward_hooks = [], []
/usr/local/lib/python3.7/dist-packages/torch/nn/modules/linear.py in forward(self, input)
112
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
115
116 def extra_repr(self) -> str:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x12544 and 14400x128)
I know this is because flattening produces a size that the first fully connected layer does not expect. When I change fc1 to the size the error reports, I get this error instead:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (128x14400 and 12544x128)
I believe it happens because of the choice in the first convolution layer. My question is: how do I fix this? Also, in case NNI is unfamiliar to you: in Keras you can declare a fully connected layer with just the number of hidden units, without giving the input dimension. I suppose PyTorch requires the input dimension to be set correctly. Is there a way to say that, after flatten, there should be a hidden fully connected layer with just a number of units and no input shape? I believe the input shape is what is causing the problem.
For the conv with kernel_size=5 you need padding=2, not 1, so that it keeps the 32x32 spatial size just as the 3x3 conv with padding=1 does.
Fix:
self.conv1 = mutables.LayerChoice([
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1)
])
to
self.conv1 = mutables.LayerChoice([
    nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2)  # match padding size to kernel size
])
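One way to confirm the fix is to push a CIFAR-10-sized dummy batch through each candidate, together with the rest of the feature path (conv2 and the 2×2 max-pool), and compare the flattened sizes. The sketch leaves NNI out and simply loops over plain nn.Conv2d candidates. It also includes the padding='same' variant, which PyTorch only supports for stride=1.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

candidates = {
    "3x3, padding=1": nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1),
    "5x5, padding=1 (original)": nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=1),
    "5x5, padding=2 (fixed)": nn.Conv2d(3, 32, kernel_size=5, stride=1, padding=2),
    "5x5, padding='same'": nn.Conv2d(3, 32, kernel_size=5, stride=1, padding="same"),
}
conv2 = nn.Conv2d(32, 64, 3, 1)  # same as Net.conv2

x = torch.randn(2, 3, 32, 32)  # CIFAR-10 images are 3x32x32
sizes = {}
with torch.no_grad():
    for name, conv1 in candidates.items():
        # mirror Net.forward up to the flatten (dropout does not change shapes)
        out = F.max_pool2d(F.relu(conv2(F.relu(conv1(x)))), 2)
        sizes[name] = torch.flatten(out, 1).shape[1]
        print(f"{name}: {sizes[name]}")
# 3x3, padding=1: 14400
# 5x5, padding=1 (original): 12544
# 5x5, padding=2 (fixed): 14400
# 5x5, padding='same': 14400
```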
Update:
Recent versions of PyTorch let you specify padding='same' (for stride=1), which saves you from working out the correct padding value yourself.
However, I strongly urge you to use the formula for computing the output shape of a convolution layer (found here) and manually compute the correct value for padding. This is a good sanity check to ensure you understand what you are doing.
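For reference, the nn.Conv2d documentation gives the output size along each spatial dimension as out = floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1. nn.MaxPool2d follows the same formula with its default ceil_mode=False. Both helpers below are hypothetical, written only for this example. Applied to 32×32 CIFAR-10 inputs, they reproduce both sizes from the error messages.

```python
def conv_out(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension for nn.Conv2d / nn.MaxPool2d (ceil_mode=False)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def fc1_in_features(conv1_kernel, conv1_padding, image_size=32):
    # Hypothetical helper mirroring Net.forward: conv1 -> conv2 (3x3, no padding) -> max_pool2d(2)
    s = conv_out(image_size, conv1_kernel, padding=conv1_padding)
    s = conv_out(s, 3)
    s = conv_out(s, 2, stride=2)
    return 64 * s * s  # conv2 has 64 output channels


print(fc1_in_features(3, 1))  # 14400
print(fc1_in_features(5, 1))  # 12544
print(fc1_in_features(5, 2))  # 14400
```

If you would rather give only the number of output units, as in Keras, recent PyTorch versions also provide nn.LazyLinear(out_features), which infers in_features from the first batch it sees. That does not remove the need for the candidates to agree, though: once initialised, the layer's input size is fixed.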
I am trying to learn how to build a U-Net architecture from scratch. I have written the code below, but when I run it to check the output of the encoder part, I run into problems. Running the following code gives the error below:
import torch
import torch.nn as nn
batch = 1
channels = 3
width = 512 # same as height
image = torch.randn(batch, channels, width, width)
enc = Encoder(channels)
enc(image)
RuntimeError: Given groups=1, weight of size [128, 64, 3, 3], expected input[1, 3, 512, 512] to have 64 channels, but got 3 channels instead
Below is the code:
class ConvolutionBlock(nn.Module):
    '''
    The basic convolution block: Convolution -> ReLU -> Convolution -> ReLU
    '''
    def __init__(self, in_channels, out_channels, upsample: bool = False):
        '''
        args:
            upsample: If True, use ConvTranspose2d instead of MaxPool2d (meaning the block is used in the decoder part)
        batch_norm was introduced after UNET, so the authors did not have it available. Might be useful
        '''
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),  # padding is 0 by default; 1 keeps output width, height == input width, height
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.ReLU(),
            # the paper's 2x2 up-convolution doubles the spatial size and halves the number of feature channels
            nn.MaxPool2d(kernel_size=2, stride=2) if not upsample else nn.ConvTranspose2d(out_channels, out_channels // 2, kernel_size=2, stride=2),
        )

    def forward(self, feature_map_x):
        '''
        feature_map_x could be the image itself or the output of the previous block
        '''
        return self.network(feature_map_x)


class Encoder(nn.Module):
    '''
    '''
    def __init__(self, image_channels: int = 1, repeat: int = 4):
        '''
        In UNET, the features start at 64 and keep doubling until they reach the bottleneck
        '''
        super().__init__()
        in_channels = [image_channels, 64, 128, 256, 512]
        out_channels = [64, 128, 256, 512, 1024]
        self.layers = nn.ModuleList(
            [ConvolutionBlock(in_channels=in_channels[i], out_channels=out_channels[i]) for i in range(repeat + 1)]
        )

    def forward(self, feature_map_x):
        for layer in self.layers:
            out = layer(feature_map_x)
        return out
EDIT: Running each block on its own, as below, gives the expected output:
in_ = [3,64, 128, 256, 512]
ou_ = [64, 128, 256, 512, 1024]
width = 512
from torchsummary import summary
for i in range(5):
    cb = ConvolutionBlock(in_[i], ou_[i])
    summary(cb, (in_[i],width,width))
    print('#'*50)
There was a logic error in the forward method of Encoder.
I did:
for layer in self.layers:
    out = layer(feature_map_x)
return out
On every iteration, this fed the original input to the layer instead of the previous layer's output. So the second block, which expects 64 channels, received the 3-channel image. The fix is to reassign feature_map_x on each iteration so that it carries the previous block's output:
for layer in self.layers:
    feature_map_x = layer(feature_map_x)
return feature_map_x
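When the blocks are applied strictly in sequence, nn.Sequential does this chaining for you. The sketch below is a hypothetical, condensed variant of the Encoder above, not the poster's code. It uses a 64×64 input only to keep the check cheap. Each of the five blocks halves the spatial size, so the result should be 1024 × 2 × 2.

```python
import torch
import torch.nn as nn


def conv_block(in_channels, out_channels):
    # Conv -> ReLU -> Conv -> ReLU -> MaxPool, like ConvolutionBlock(upsample=False)
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )


class SequentialEncoder(nn.Module):
    """Hypothetical Encoder variant: nn.Sequential feeds each block the previous block's output."""

    def __init__(self, image_channels=3, widths=(64, 128, 256, 512, 1024)):
        super().__init__()
        in_widths = (image_channels,) + tuple(widths[:-1])
        self.layers = nn.Sequential(*[conv_block(i, o) for i, o in zip(in_widths, widths)])

    def forward(self, x):
        return self.layers(x)


enc = SequentialEncoder(image_channels=3)
image = torch.randn(1, 3, 64, 64)  # smaller than 512x512 to keep the check cheap
print(enc(image).shape)  # torch.Size([1, 1024, 2, 2])
```

In a full U-Net, the decoder also needs intermediate encoder outputs for the skip connections. There, the explicit loop (with each block's output appended to a list) remains the natural choice. nn.Sequential fits only when the final output is all you need.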
I am implementing a layer in torch fusing multiple atrous convolutions much like the Atrous Spatial Pyramid Pooling in https://arxiv.org/pdf/1706.05587.pdf.
The problem is that adding this layer slows down my training drastically.
Using multiple dilated (atrous) convolutions drops my GPU utilization from 90% to 30% for some reason.
When I removed the dilated convolutions, or used just one of them and appended it repeatedly, there was no such issue. Are there any suggestions about a possible bottleneck in my code below?
The shape of x in the code below is (batch_size, 1024, 16, 16)
class MultiAtrous(nn.Module):
    def __init__(self):
        super().__init__()
        self.dilated_conv1 = nn.Conv2d(
            1024, 512, kernel_size=3, dilation=3, padding=3)
        self.dilated_conv2 = nn.Conv2d(
            1024, 512, kernel_size=3, dilation=6, padding=6)
        self.dilated_conv3 = nn.Conv2d(
            1024, 512, kernel_size=3, dilation=9, padding=9)
        self.conv1x1 = nn.Conv2d(1024, 512, kernel_size=1)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.relu = nn.ReLU()
        self.upsample = nn.Upsample(size=(16,16), mode='bilinear')

    def forward(self, x):
        local_feat = []
        local_feat.append(self.dilated_conv1(x))
        local_feat.append(self.dilated_conv2(x))
        local_feat.append(self.dilated_conv3(x))
        local_feat.append(self.upsample(self.relu(self.conv1x1(self.gap(x)))))
        local_feat = torch.cat(local_feat, dim=1)
        return local_feat
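Here are two cheap things to check, offered as a sketch rather than a diagnosis of the slowdown. First, bilinearly upsampling the 1×1 global-pooling branch only copies that single value to every position. So expand() produces the same tensor as a broadcast view, without running an interpolation kernel, and torch.cat accepts that view directly. Second, torch.backends.cudnn.benchmark = True lets cuDNN time several convolution algorithms for a fixed input shape and keep the fastest one. That may be worth trying here because the input is always 16×16, but whether it fixes this particular slowdown is untested. The batch size of 2 is arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Let cuDNN benchmark convolution algorithms for the fixed input shape
# (only has an effect on CUDA; harmless on CPU).
torch.backends.cudnn.benchmark = True

gap = nn.AdaptiveAvgPool2d(1)
conv1x1 = nn.Conv2d(1024, 512, kernel_size=1)
x = torch.randn(2, 1024, 16, 16)  # (batch_size, 1024, 16, 16) as in the question

with torch.no_grad():
    pooled = F.relu(conv1x1(gap(x)))  # (2, 512, 1, 1)
    via_upsample = F.interpolate(pooled, size=(16, 16), mode="bilinear", align_corners=False)
    via_expand = pooled.expand(-1, -1, 16, 16)  # broadcast view, no copy

print(torch.allclose(via_upsample, via_expand))  # True
```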
I am trying to feed both an image and a vector into the model. The image has the expected 4D shape, but the vector does not. The image size is 424x512, while the vector has shape (18,). After the DataLoader, I get batches of shape (50x1x424x512) and (50x18). The model raises an error because it expects the vector to be 4D as well. How do I do that?
Here is my code :
def loadTrainingData_B(args):
    fdm = []
    tdm = []
    parameters = []
    for i in image_files[:4]:
        try:
            false_dm = np.fromfile(join(ref, i), dtype=np.int32)
            false_dm = Image.fromarray(false_dm.reshape((424, 512, 9)).astype(np.uint8)[:,:,1])
            fdm.append(false_dm)
            true_dm = np.fromfile(join(ref, i), dtype=np.int32)
            true_dm = Image.fromarray(true_dm.reshape((424, 512, 9)).astype(np.uint8)[:,:,1])
            tdm.append(true_dm)
            pos = param_filenames.index(i)
            param = np.array(params[pos, 1:])
            param = np.where(param == '-point-light-source', 1, param).astype(np.float64)
            parameters.append(param)
        except:
            print('[!] File {} not found'.format(i))
    return (fdm, parameters, tdm)
class Flat_ModelB(Dataset):
    def __init__(self, args, train=True, transform=None):
        self.args = args
        if train == True:
            self.fdm, self.parameters, self.tdm = loadTrainingData_B(self.args)
        else:
            self.fdm, self.parameters, self.tdm = loadTestData_B(self.args)
        self.data_size = len(self.parameters)
        self.transform = transforms.Compose([transforms.ToTensor()])

    def __getitem__(self, index):
        return (self.transform(self.fdm[index]).double(), torch.from_numpy(self.parameters[index]).double(), self.transform(self.tdm[index]).double())

    def __len__(self):
        return self.data_size
The error I get is :
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 2-dimensional input of size [50, 18] instead
Here is the model:
class Model_B(nn.Module):
    def __init__(self, config):
        super(Model_B, self).__init__()
        self.config = config
        # CNN layers for fdm
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(16))
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer4 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32))
        self.layer5 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(16))
        self.layer6 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=16, out_channels=1, kernel_size=5, stride=2, padding=2, output_padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(1))
        # CNN layer for parameters
        self.param_layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.BatchNorm2d(32))

    def forward(self, x, y):
        out = self.layer1(x)
        out_param = self.param_layer1(y)
        print("LayerParam 1 Output Shape : {}".format(out_param.shape))
        print("Layer 1 Output Shape : {}".format(out.shape))
        out = self.layer2(out)
        print("Layer 2 Output Shape : {}".format(out.shape))
        out = self.layer3(out)
        # out = torch.cat((out, out_param), dim=2)
        print("Layer 3 Output Shape : {}".format(out.shape))
        out = self.layer4(out)
        print("Layer 4 Output Shape : {}".format(out.shape))
        out = self.layer5(out)
        print("Layer 5 Output Shape : {}".format(out.shape))
        out = self.layer6(out)
        print("Layer 6 Output Shape : {}".format(out.shape))
        return out
And this is how I access the data:
for batch_idx, (fdm, parameters) in enumerate(self.data):
    if self.config.gpu:
        fdm = fdm.to(device)
        parameters = parameters.to(device)
    print('shape of parameters for model a : {}'.format(parameters.shape))
    output = self.model(fdm)
    loss = self.criterion(output, parameters)
Edit:
I think my code is incorrect, because I am trying to apply convolutions to a vector of length 18. I tried copying the vector to make it 18x64 and feeding that in. It still doesn't work and gives this output:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 3-dimensional input of size [4, 18, 64] instead
I am not sure how to concatenate an 18-element vector to the output of layer 3 if I can't do any of these things.
It looks like you are training an autoencoder and want to condition it on an additional vector input at the bottleneck layer. If you want to transform that vector, you first have to decide whether you need any spatial dependencies. Given the constant input size (N, 1, 424, 512), the output of layer3 will have shape (N, 32, 53, 64). You have several options, depending on your desired model performance:
Use an nn.Linear with activations to transform the parameter vector. Then add extra spatial dimensions and repeat this vector at every spatial location:
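import torch
import torch.nn as nn

model = Model_B(config=None)  # Model_B from the question; config is only stored
encoder = nn.Sequential(model.layer1, model.layer2, model.layer3)  # the layers up to the bottleneck

img = torch.rand((1, 1, 424, 512))
vec = torch.rand(1, 18)  # the parameter vector has 18 elements
layer3_out = encoder(img)  # (N, 32, 53, 64)
N, C, H, W = layer3_out.shape
param_encoder = nn.Sequential(nn.Linear(18, 30), nn.ReLU(), nn.Linear(30, 10))
param = param_encoder(vec)  # (N, 10)
param = param.unsqueeze(-1).unsqueeze(-1).expand(N, -1, H, W)  # (N, 10, H, W)
encoding = torch.cat([param, layer3_out], dim=1)  # (N, 42, H, W); layer4 then needs in_channels=42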
Use transposed convolutions to upsample your parameter vector to the size of the layer3 output. That is harder to implement, since you have to work out the exact output shape to match (N, 32, 53, 64).
Transform the input vector with an MLP built from nn.Linear layers so that its size is twice the number of channels in the layer3 output. Then use so-called feature-wise transformations to scale and shift the feature maps from layer3.
I would recommend starting with the first option, since it is the simplest to implement, and then trying the others.
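Here is a minimal sketch of the third option, in the style usually called FiLM (feature-wise linear modulation). An MLP maps the 18-element vector to one scale (gamma) and one shift (beta) per channel of the layer3 output. The class name FiLM and the hidden size of 64 are illustrative choices. The random tensors stand in for real layer3 outputs and parameter vectors.

```python
import torch
import torch.nn as nn


class FiLM(nn.Module):
    """Scale and shift each feature-map channel using gamma, beta predicted from a conditioning vector."""

    def __init__(self, cond_dim, num_channels, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(cond_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2 * num_channels),  # one (gamma, beta) pair per channel
        )

    def forward(self, features, cond):
        gamma, beta = self.mlp(cond).chunk(2, dim=1)  # each (N, C)
        gamma = gamma[:, :, None, None]  # (N, C, 1, 1), broadcasts over H and W
        beta = beta[:, :, None, None]
        return gamma * features + beta


film = FiLM(cond_dim=18, num_channels=32)
layer3_out = torch.rand(4, 32, 53, 64)  # stand-in for Model_B.layer3 output
vec = torch.rand(4, 18)                 # stand-in for a batch of parameter vectors
out = film(layer3_out, vec)
print(out.shape)  # torch.Size([4, 32, 53, 64])
```

Because the modulation keeps the shape (N, 32, 53, 64), layer4 can stay unchanged. The concatenation approach, by contrast, changes the channel count.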