This trained variational autoencoder (VAE) network produces good reconstructions apart from persistent artefacts (see below). They appear some time during training and then persist however long you continue to train.
They started appearing after I added BatchNorm2d and Sigmoid layers to the decoder.
Sigmoid is used to squash the output values to [0, 1], and BatchNorm2d was then necessary to produce any meaningful reconstructions.
See previous issue on Cross Validated
My theory is that the weight values for those pixels in the Conv2d layers are getting pushed far beyond the normal range and are unable to recover.
How can I stop these artefacts appearing?
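One way to test the theory about out-of-range weights would be to log the largest absolute weight of each convolutional layer during training. The following is only a minimal sketch: `max_abs_weights` and `toy_decoder` are hypothetical names, and the toy model is a much smaller stand-in for the real decoder.

```python
import torch
from torch import nn


def max_abs_weights(model: nn.Module) -> dict:
    """Return the largest absolute weight of every (transposed) conv layer."""
    stats = {}
    for name, module in model.named_modules():
        if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
            stats[name] = module.weight.detach().abs().max().item()
    return stats


# Hypothetical toy stand-in for the decoder, only to show the helper in use.
toy_decoder = nn.Sequential(
    nn.ConvTranspose2d(4, 4, kernel_size=5, stride=2),
    nn.LeakyReLU(negative_slope=0.01),
    nn.Conv2d(4, 3, kernel_size=1),
)

weight_stats = max_abs_weights(toy_decoder)
for layer_name, value in weight_stats.items():
    print(f"{layer_name}: max |w| = {value:.4f}")
```

Calling the helper every few hundred steps and watching whether one layer's value keeps growing would show whether the weights really do run away at the point the artefacts appear.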
self.kl_coefficient = 100
self.latent_dim = 8
self.input_image_height = 64
self.input_image_channels = 3
self.leaky_relu_negative_slope = 0.01
# Encoder / Decoder architecture
self.encoder = nn.Sequential(
nn.Conv2d(in_channels=self.input_image_channels, out_channels=8, kernel_size=4, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.Conv2d(in_channels=8, out_channels=16, kernel_size=4, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.Conv2d(in_channels=16, out_channels=32, kernel_size=4, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.Conv2d(in_channels=32, out_channels=64, kernel_size=4, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
Flatten()
)
self.decoder = nn.Sequential(
nn.Linear(in_features=self.latent_dim, out_features=64),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
Unflatten(image_height=self.input_image_height),
nn.ConvTranspose2d(in_channels=64, out_channels=64, kernel_size=5, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.ConvTranspose2d(in_channels=64, out_channels=64, kernel_size=5, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=6, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=6, stride=2),
nn.LeakyReLU(negative_slope=self.leaky_relu_negative_slope),
nn.Conv2d(in_channels=16, out_channels=3, kernel_size=1, stride=1),
nn.BatchNorm2d(3,),
nn.Sigmoid()
)
# Fully connected layers
self.fc_pose_mu = nn.Linear(256, self.latent_dim)
self.fc_pose_log_var = nn.Linear(256, self.latent_dim)
def step(self, batch, batch_idx):
reconstructed_images, q, p = self._run_step(batch)
# 1. Reconstruction Loss (Mean Squared Error)
reconstruction_loss = F.mse_loss(batch, reconstructed_images, reduction="sum")
# 2. Training Stability Loss - latent distribution VS standard Gaussian (KL divergence)
q_vs_standard_gaussian = torch.distributions.kl_divergence(q, p)
training_stability_loss = q_vs_standard_gaussian.mean()
training_stability_loss *= self.kl_coefficient
# TOTAL loss
return reconstruction_loss + training_stability_loss
Results
I am struggling to work out how to calculate the dimensions for the fully connected layer. I am inputting images of size 448x448 with a batch size of 16. Below is the code for my convolutional layers:
class ConvolutionalNet(nn.Module):
    def __init__(self, num_classes=182):
        super().__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(32, 32, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer4 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
        self.layer5 = nn.Sequential(
            nn.Conv2d(64, 64, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
I want to add a fully connected layer:
self.fc = nn.Linear(?, num_classes)
Would anyone be able to explain the best way to go about calculating this? Also, if I have multiple fully connected layers, e.g. (self.fc2, self.fc3), would the second parameter always equal the number of classes? I am new to coding and finding it hard to wrap my head around this.
The conv layers don't change the width/height of the features since you've set padding equal to (kernel_size - 1) / 2. Max pooling with kernel_size = stride = 2 will decrease the width/height by a factor of 2 (rounded down if input shape is not even).
Using 448 as input width/height, the output width/height will be 448 // 2 // 2 // 2 // 2 // 2 = 448/32 = 14 (where // is floor-divide operator).
The number of channels is fully determined by the last conv layer, which outputs 64 channels.
Therefore you will have a [B,64,14,14] shaped tensor, so the Linear layer should have in_features = 64*14*14 = 12544.
Note you'll need to flatten the input beforehand, something like:
self.layer6 = nn.Sequential(
nn.Flatten(),
nn.Linear(12544, num_classes)
)
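The arithmetic above can also be checked empirically by pushing a dummy tensor through the convolutional layers and reading off the shape. This is a minimal sketch: `conv_block` and `features` are hypothetical helper names that rebuild the five layers from the question, and a batch size of 1 is used only to keep the check cheap.

```python
import torch
from torch import nn


def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """One conv/batch-norm/ReLU/max-pool block, as in the question."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=1, padding=2),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )


features = nn.Sequential(
    conv_block(3, 16),
    conv_block(16, 32),
    conv_block(32, 32),
    conv_block(32, 64),
    conv_block(64, 64),
)
features.eval()  # use running statistics so a single dummy sample is fine

with torch.no_grad():
    out = features(torch.zeros(1, 3, 448, 448))

feature_shape = tuple(out.shape)
flat_features = out.flatten(start_dim=1).shape[1]
print(feature_shape, flat_features)  # (1, 64, 14, 14) 12544
```

The printed `flat_features` value is what goes into `in_features` of the first Linear layer.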
I am looking at an implementation of AlexNet in PyTorch. According to the formula, output height = (input_height + padding_top + padding_bottom - kernel_height) / stride_height + 1. So with an input of size 224, stride = 4, padding = 1, and kernel size = 11, the output should be of size 54.75. But if you run a summary of the model, you see that the output of this first layer is 54. Does PyTorch round the output size down? If so, does it do so consistently (it seems like it)? I would like to understand what is going on behind the scenes, please.
Here is the code that I refer to:
net = nn.Sequential(
nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2),
nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2), nn.Flatten(),
nn.Linear(6400, 4096), nn.ReLU(), nn.Dropout(p=0.5),
nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(p=0.5),
nn.Linear(4096, 10))
The output size is a whole number, of course! Your formula is just missing a floor: the expression is height = floor((input_height + padding_top + padding_bottom - kernel_height) / stride_height + 1). A fractional output size wouldn't make any sense.
Yes, the output size is rounded down whenever the formula gives a fractional value.
The output is a matrix, and each element corresponds to one full kernel position, so there is no such thing as 0.75 of an element: the size has to be either rounded down to 54 or rounded up to 55. Because the kernel has to stop striding once the next step would run past the edge of the padded input (drawing it out shows how odd it would be if the kernel didn't stop there), the size is rounded down.
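The floor behaviour can be confirmed by comparing the formula with what the layer actually returns. This is a minimal sketch, assuming square inputs and symmetric padding; `conv_output_size` is a hypothetical helper, not a PyTorch function.

```python
import torch
from torch import nn


def conv_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """floor((size + 2 * padding - kernel_size) / stride) + 1"""
    return (size + 2 * padding - kernel_size) // stride + 1


# First AlexNet layer from the question: 224 input, kernel 11, stride 4, padding 1.
predicted = conv_output_size(224, kernel_size=11, stride=4, padding=1)

conv = nn.Conv2d(1, 96, kernel_size=11, stride=4, padding=1)
with torch.no_grad():
    actual = conv(torch.zeros(1, 1, 224, 224)).shape[-1]

print(predicted, actual)  # 54 54
```

Integer floor division (`//`) is used instead of `/` so the helper returns the rounded-down size directly.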
I am new to AI and Python, and I'm trying to build an architecture to train on a set of images, with the later aim of overfitting. So far I haven't been able to work out how to get the inputs and outputs right. I keep seeing this error whenever I try to train the network:
mat1 and mat2 shapes cannot be multiplied (48x13456 and 16x64)
my network:
net2 = nn.Sequential(
nn.Conv2d(3,8, kernel_size=5, padding=0),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Conv2d(8,16, kernel_size=5, padding=0),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Flatten(),
nn.Linear(16,64),
nn.ReLU(),
nn.Linear(64,10)
)
This is part of a task I'm working on and I really don't get why it's not running. Any hints?
It's because you flatten your 2D CNN feature maps into a 1D vector for the FC layers, and the first Linear layer's in_features does not match that vector's length.
You have to manually work out how the input shape changes from the 128-pixel input to the MaxPool layer just before the flattening layer. In your case it's 29*29*16.
So your code should be rewritten as
net2 = nn.Sequential(
nn.Conv2d(3,8, kernel_size=5, padding=0),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Conv2d(8,16, kernel_size=5, padding=0),
nn.ReLU(),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Flatten(),
nn.Linear(13456,64),
nn.ReLU(),
nn.Linear(64,10)
)
This should work
EDIT: This is a simple formula to calculate the output size:
floor((W - K + 2P) / S) + 1
Here W = input size
K = filter size
S = stride
P = padding
So the 1st conv block gives an output of size 124
Then the Maxpool halves it, i.e. 62
The 2nd conv block gives an output of size 58
Then the last Maxpool makes it 29
So the final flattened output is 29*29*16 = 13456, where 16 is the number of output channels
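The chain of sizes above can be reproduced in a few lines. This is a minimal sketch, assuming a square 128-pixel input as stated in the answer; `out_size` is a hypothetical helper that applies the formula with floor division.

```python
def out_size(w: int, k: int, s: int = 1, p: int = 0) -> int:
    """floor((W - K + 2P) / S) + 1 for one conv or pooling layer."""
    return (w - k + 2 * p) // s + 1


sizes = [128]
sizes.append(out_size(sizes[-1], k=5))       # Conv2d(3, 8, kernel_size=5)
sizes.append(out_size(sizes[-1], k=2, s=2))  # MaxPool2d(2, 2)
sizes.append(out_size(sizes[-1], k=5))       # Conv2d(8, 16, kernel_size=5)
sizes.append(out_size(sizes[-1], k=2, s=2))  # MaxPool2d(2, 2)

out_channels = 16
flat = sizes[-1] * sizes[-1] * out_channels
print(sizes, flat)  # [128, 124, 62, 58, 29] 13456
```

The result matches the 13456 in the error message, which is the number of features each sample has after flattening.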
Can anyone give me some tips on how to lower the number of parameters in the following U-net implementation? I'm having trouble with overfitting on my training data and I would like to lower the parameter count to see if it improves the validation accuracy.
Layers:
First2D
layers = [
nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(middle_channels),
nn.ReLU(inplace=True),
nn.Conv2d(middle_channels, out_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True)
]
Encoder2D
layers = [
nn.MaxPool2d(kernel_size=downsample_kernel),
nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(middle_channels),
nn.ReLU(inplace=True),
nn.Conv2d(middle_channels, out_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True)
]
Center2D
layers = [
nn.MaxPool2d(kernel_size=2),
nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(middle_channels),
nn.ReLU(inplace=True),
nn.Conv2d(middle_channels, out_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True),
nn.ConvTranspose2d(out_channels, deconv_channels, kernel_size=2, stride=2)
]
Decoder2D
layers = [
nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(middle_channels),
nn.ReLU(inplace=True),
nn.Conv2d(middle_channels, out_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True),
nn.ConvTranspose2d(out_channels, deconv_channels, kernel_size=2, stride=2)
]
Last2D
layers = [
nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(middle_channels),
nn.ReLU(inplace=True),
nn.Conv2d(middle_channels, middle_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(middle_channels),
nn.ReLU(inplace=True),
nn.Conv2d(middle_channels, out_channels, kernel_size=1),
nn.Softmax(dim=1)
]
One way to decrease the number of parameters is to decrease the number of channels in the convolution. You wouldn't be able to change the number of model input and output channels, because they depend on the data, but you can change the number of intermediate channels.
Remember that the output of one layer is the input to the next layer, so keep the number of output channels in the first layer the same as the number of input channels in the second layer, for every pair of layers. For example:
layers = [
nn.Conv2d(in_channels, middle_channels//2, kernel_size=3, padding=1),
nn.BatchNorm2d(middle_channels//2),
nn.ReLU(inplace=True),
nn.Conv2d(middle_channels//2, out_channels, kernel_size=3, padding=1),
nn.BatchNorm2d(out_channels),
nn.ReLU(inplace=True)
]
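To see how much halving the intermediate channels saves, the trainable parameters of both variants can be counted. This is a minimal sketch: `double_conv` and `count_parameters` are hypothetical helpers, and the channel sizes (64, 128, 128) are assumed purely for illustration, since the question does not state them.

```python
from torch import nn


def double_conv(in_channels: int, middle_channels: int, out_channels: int) -> nn.Sequential:
    """Two conv/batch-norm/ReLU stages, as in the First2D block."""
    return nn.Sequential(
        nn.Conv2d(in_channels, middle_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(middle_channels),
        nn.ReLU(inplace=True),
        nn.Conv2d(middle_channels, out_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_channels),
        nn.ReLU(inplace=True),
    )


def count_parameters(module: nn.Module) -> int:
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


# Assumed channel sizes, for illustration only.
full = count_parameters(double_conv(64, 128, 128))
halved = count_parameters(double_conv(64, 128 // 2, 128))
print(full, halved)
```

Halving only the middle channels roughly halves this block's parameters, because both convolutions have the middle channels as one of their two dimensions.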
Now, coming to the original question of overfitting, you might want to try other things before reducing the model size, such as data augmentation and dropout.
Can we pass images for which height != width through a CNN in PyTorch?
My CNN has convolution, batch-norm, max-pool, ReLU, and fully connected layers.
My network
self.conv_seqn = nn.Sequential(
nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
nn.BatchNorm2d(32),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=2, stride=2),
nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=4, stride=4),
nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
nn.ReLU(inplace=True),
nn.MaxPool2d(kernel_size=4, stride=4),
)
self.fc_seqn = nn.Sequential(
nn.Linear(1843200, 256),
nn.ReLU(inplace=True),
nn.Linear(256, total_configs)
)
My forward function
def forward(self, x):
    x = self.conv_seqn(x)
    x = x.view(x.size(0), -1)
    x = self.fc_seqn(x)
    return x
If the input image is of size 3840*1920*3, then after applying conv_seqn() the output should be of size [1, 128, 120, 60], but I am getting a size of [1, 128, 120, 120] (batch size = 1 here).
Any suggestions would be very helpful.
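For reference, the convolutional part can be checked on a non-square dummy tensor. This is a minimal sketch, assuming a scaled-down 384x192 input (one tenth of the real size, to keep memory low) in PyTorch's [N, C, H, W] layout; the layers are copied from conv_seqn above.

```python
import torch
from torch import nn

conv_seqn = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4, stride=4),
    nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=4, stride=4),
)
conv_seqn.eval()  # use running statistics so a single dummy sample is fine

with torch.no_grad():
    # Height 384, width 192: each side is divided by 2 * 4 * 4 = 32.
    out = conv_seqn(torch.zeros(1, 3, 384, 192))

out_shape = tuple(out.shape)
print(out_shape)  # (1, 128, 12, 6)
```

The layers handle height and width independently, so a 120x120 output suggests the tensor reaching the network is already square (an assumption, since the data-loading code is not shown). Note also that 1843200 = 128*120*120, whereas a [1, 128, 120, 60] output would need in_features = 128*120*60 = 921600.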