PyTorch: Lower the parameters in a U-Net model - python

Can anyone give me some tips on how I could lower the number of parameters in the following U-Net implementation? I'm having trouble with overfitting on my training data, and I would like to reduce the parameter count to see if it improves the accuracy on the validation data.
Layers:
First2D
layers = [
    nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(middle_channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(middle_channels, out_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(out_channels),
    nn.ReLU(inplace=True)
]
Encoder2D
layers = [
    nn.MaxPool2d(kernel_size=downsample_kernel),
    nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(middle_channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(middle_channels, out_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(out_channels),
    nn.ReLU(inplace=True)
]
Center2D
layers = [
    nn.MaxPool2d(kernel_size=2),
    nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(middle_channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(middle_channels, out_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(out_channels),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(out_channels, deconv_channels, kernel_size=2, stride=2)
]
Decoder2D
layers = [
    nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(middle_channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(middle_channels, out_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(out_channels),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(out_channels, deconv_channels, kernel_size=2, stride=2)
]
Last2D
layers = [
    nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(middle_channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(middle_channels, middle_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(middle_channels),
    nn.ReLU(inplace=True),
    nn.Conv2d(middle_channels, out_channels, kernel_size=1),
    nn.Softmax(dim=1)
]

One way to decrease the number of parameters is to reduce the number of channels in the convolutions. You can't change the number of model input and output channels, because those are fixed by the data, but you can change the number of intermediate channels.
Remember that the output of one layer is the input to the next, so for every pair of adjacent convolutions the number of output channels of the first must match the number of input channels of the second. For example:
layers = [
    nn.Conv2d(in_channels, middle_channels//2, kernel_size=3, padding=1),
    nn.BatchNorm2d(middle_channels//2),
    nn.ReLU(inplace=True),
    nn.Conv2d(middle_channels//2, out_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(out_channels),
    nn.ReLU(inplace=True)
]
Coming back to the original question of overfitting: before shrinking the model, you might want to try other remedies first, such as data augmentation and dropout.
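For instance, here is a minimal sketch of both ideas. The dropout probability and the particular augmentations are arbitrary starting points, not values from the original model, and the channel variables are the same ones used in the blocks above:
import torch.nn as nn
from torchvision import transforms

# Dropout2d after each conv/BN/ReLU group; p=0.25 is an arbitrary starting point.
layers = [
    nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(middle_channels),
    nn.ReLU(inplace=True),
    nn.Dropout2d(p=0.25),
    nn.Conv2d(middle_channels, out_channels, kernel_size=3, padding=1),
    nn.BatchNorm2d(out_channels),
    nn.ReLU(inplace=True),
    nn.Dropout2d(p=0.25),
]

# Simple augmentations for the training set only. For segmentation, remember to
# apply the same geometric transforms to the masks as to the images.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.ToTensor(),
])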

Related

BatchNorm and Activation in Conv2DTranspose

I am trying to stack 3 ConvTranspose2d layers as below:
self.up1 = nn.ConvTranspose2d(in_channels=out3, out_channels=out2, kernel_size=3, padding=1,
                              stride=2, output_padding=1)
self.up2 = nn.ConvTranspose2d(in_channels=out2, out_channels=out1, kernel_size=3, padding=1,
                              stride=2, output_padding=1)
self.up3 = nn.ConvTranspose2d(in_channels=out1, out_channels=n_out_channels, kernel_size=3, padding=1,
                              stride=2, output_padding=1)

# forward
x = self.up1(x)
x = self.up2(x)
x = self.up3(x)
Do I need to add batchnorm and activation like with a regular Conv2d, or are batchnorm and activation not needed for ConvTranspose2d?
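One common pattern (a sketch, not a hard rule; the variable names are reused from the question) is to treat ConvTranspose2d like Conv2d and follow every upsampling layer except the last with BatchNorm2d and an activation:
self.up1 = nn.Sequential(
    nn.ConvTranspose2d(out3, out2, kernel_size=3, padding=1, stride=2, output_padding=1),
    nn.BatchNorm2d(out2),
    nn.ReLU(inplace=True),
)
self.up2 = nn.Sequential(
    nn.ConvTranspose2d(out2, out1, kernel_size=3, padding=1, stride=2, output_padding=1),
    nn.BatchNorm2d(out1),
    nn.ReLU(inplace=True),
)
# The last layer is often left bare so its output range is not constrained
# before the task-specific head or loss.
self.up3 = nn.ConvTranspose2d(out1, n_out_channels, kernel_size=3, padding=1,
                              stride=2, output_padding=1)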

Variational autoencoder reconstructions - artefacts appear that persist during training

This trained variational autoencoder (VAE) produces good reconstructions apart from persistent artefacts (see below). They appear at some point during training and then persist however long you continue to train.
The artefacts started appearing after adding BatchNorm2d and Sigmoid layers to the decoder: Sigmoid is used to squash the output values to [0, 1], and BatchNorm2d was then necessary to produce any meaningful reconstructions (see my previous issue on Cross Validated).
My theory is that the weight values for those pixels in the Conv2d layers are being pushed far beyond the normal range and are unable to recover.
How can I stop these artefacts from appearing?
self.kl_coefficient = 100
self.latent_dim = 8
self.input_image_height = 64
self.input_image_channels = 3
self.leaky_relu_negative_slope = 0.01

# Encoder / Decoder architecture
self.encoder = nn.Sequential(
    nn.Conv2d(in_channels=self.input_image_channels, out_channels=8, kernel_size=4, stride=2),
    nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
    nn.Conv2d(in_channels=8, out_channels=16, kernel_size=4, stride=2),
    nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
    nn.Conv2d(in_channels=16, out_channels=32, kernel_size=4, stride=2),
    nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=4, stride=2),
    nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
    Flatten()
)
self.decoder = nn.Sequential(
    nn.Linear(in_features=self.latent_dim, out_features=64),
    nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
    Unflatten(image_height=self.input_image_height),
    nn.ConvTranspose2d(in_channels=64, out_channels=64, kernel_size=5, stride=2),
    nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
    nn.ConvTranspose2d(in_channels=64, out_channels=64, kernel_size=5, stride=2),
    nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
    nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=6, stride=2),
    nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
    nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=6, stride=2),
    nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
    nn.Conv2d(in_channels=16, out_channels=3, kernel_size=1, stride=1),
    nn.BatchNorm2d(3),
    nn.Sigmoid()
)

# Fully connected layers
self.fc_pose_mu = nn.Linear(256, self.latent_dim)
self.fc_pose_log_var = nn.Linear(256, self.latent_dim)

def step(self, batch, batch_idx):
    reconstructed_images, q, p = self._run_step(batch)
    # 1. Reconstruction loss (mean squared error)
    reconstruction_loss = F.mse_loss(batch, reconstructed_images, reduction="sum")
    # 2. Training stability loss - latent distribution vs standard Gaussian (KL divergence)
    q_vs_standard_gaussian = torch.distributions.kl_divergence(q, p)
    training_stability_loss = q_vs_standard_gaussian.mean()
    training_stability_loss *= self.kl_coefficient
    # Total loss
    return reconstruction_loss + training_stability_loss
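If the theory about runaway weights is right, one thing worth trying (purely a sketch, not part of the original code) is gradient clipping during optimisation, which limits how far a single update can push the weights:
# In the training loop, after loss.backward() and before optimizer.step();
# max_norm=1.0 is an arbitrary choice.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)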
Results

Calculating dimensions of fully connected layer?

I am struggling to work out how to calculate the dimensions for the fully connected layer. I am inputting images of size 448x448 with a batch size of 16. Below is the code for my convolutional layers:
class ConvolutionalNet(nn.Module):
    def __init__(self, num_classes=182):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer5 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
I want to add a fully connected layer:
self.fc = nn.Linear(?, num_classes)
Would anyone be able to explain the best way to go about calculating this? Also, if I have multiple fully connected layers (e.g. self.fc2, self.fc3), would the second parameter always equal the number of classes? I am new to coding and finding it hard to wrap my head around this.
The conv layers don't change the width/height of the features since you've set padding equal to (kernel_size - 1) / 2. Max pooling with kernel_size = stride = 2 will decrease the width/height by a factor of 2 (rounded down if input shape is not even).
Using 448 as input width/height, the output width/height will be 448 // 2 // 2 // 2 // 2 // 2 = 448/32 = 14 (where // is floor-divide operator).
The number of channels is fully determined by the last conv layer, which outputs 64 channels.
Therefore you will have a [B,64,14,14] shaped tensor, so the Linear layer should have in_features = 64*14*14 = 12544.
Note you'll need to flatten the input beforehand, something like:
self.layer6 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(12544, num_classes)
)
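If you would rather not do the arithmetic by hand, a common alternative (a sketch, assuming the layers defined above) is to push a dummy tensor through the conv layers once inside __init__ and read off the flattened size:
# Inside __init__, after defining layer1 .. layer5:
with torch.no_grad():
    dummy = torch.zeros(1, 3, 448, 448)
    dummy = self.layer5(self.layer4(self.layer3(self.layer2(self.layer1(dummy)))))
    flat_features = dummy.flatten(1).shape[1]   # 64 * 14 * 14 = 12544
self.fc = nn.Linear(flat_features, num_classes)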

output of conv layer in AlexNet

I am looking at an implementation of AlexNet in PyTorch. According to the formula, output_height = (input_height + padding_top + padding_bottom - kernel_height) / stride_height + 1. So using the formula with an input of size 224, stride = 4, padding = 1, and kernel size = 11, the output should be of size 54.75. But if you run a summary of the model, you see that the output of this first layer is 54. Does PyTorch clip down the output size? If so, does it do so consistently (it seems like it)? I would like to understand what is going on behind the scenes.
Here is the code that I refer to:
net = nn.Sequential(
    nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2), nn.Flatten(),
    nn.Linear(6400, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(4096, 10))
The output size is a whole number, of course! Your formula just isn't quite right: the correct expression is height = floor((input_height + padding_top + padding_bottom - kernel_height) / stride_height + 1). It wouldn't make sense otherwise.
Yes, the size is rounded down whenever the formula gives a fractional value. The output is a feature map whose elements each correspond to a full kernel position, and there is no such thing as 0.75 of an element. Because the kernel has to stop striding where the next stride would no longer fit inside the (padded) input, the fractional part is discarded, so 54.75 becomes 54.
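You can confirm this quickly by pushing a dummy input through just the first layer (a small sketch):
import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1)
x = torch.randn(1, 1, 224, 224)
print(conv1(x).shape)  # torch.Size([1, 96, 54, 54]); floor((224 + 1 + 1 - 11) / 4 + 1) = 54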

can we pass images for which height!=width through our CNN for training in pytorch?

Can we pass images for which height != width through our CNN in PyTorch for training?
My CNN has convolution, batch-norm, max-pool, ReLU, and fully connected layers.
My network
self.conv_seqn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4, stride=4),
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4, stride=4),
)
self.fc_seqn = nn.Sequential(
    nn.Linear(1843200, 256),
    nn.ReLU(inplace=True),
    nn.Linear(256, total_configs)
)
My forward function:
def forward(self, x):
    x = self.conv_seqn(x)
    x = x.view(x.size(0), -1)
    x = self.fc_seqn(x)
    return x
For an input image of size 3840*1920*3, after applying conv_seqn() the output should be of size [1, 128, 120, 60], but I am getting a size of [1, 128, 120, 120] (batch size = 1 here).
Any suggestion would be highly appreciated.
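For what it's worth, convolution, batch-norm, max-pool and ReLU layers all accept non-square inputs; only the flattened size feeding the first Linear layer depends on the exact input height and width. A quick sanity check (a sketch, assuming the conv_seqn defined above lives in a model instance called model):
with torch.no_grad():
    x = torch.randn(1, 3, 3840, 1920)   # (N, C, H, W) with H != W
    out = model.conv_seqn(x)
    print(out.shape)                    # expected: torch.Size([1, 128, 120, 60])
    print(out.flatten(1).shape[1])      # this value must match the in_features of the first Linear layer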
