How can I check the feature map sizes at each step of a CNN? - python

I'm trying to classify cats and dogs with a CNN in PyTorch.
After building a few layers and processing images, I found that the size of the final feature map doesn't match the size I calculated.
So I tried to check the feature map size step by step by printing the shape inside the CNN, but it doesn't work.
I heard that TensorFlow lets you check the tensor size at each step. How can I do that in PyTorch?
What I want is:
def __init__(self):
    super(CNN, self).__init__()
    conv1 = nn.Conv2d(1, 16, 3, 1, 1)
    conv1_1 = nn.Conv2d(16, 16, 3, 1, 1)
    pool1 = nn.MaxPool2d(2)
    conv2 = nn.Conv2d(16, 32, 3, 1, 1)
    conv2_1 = nn.Conv2d(32, 32, 3, 1, 1)
    pool2 = nn.MaxPool2d(2)
    conv3 = nn.Conv2d(32, 64, 3, 1, 1)
    conv3_1 = nn.Conv2d(64, 64, 3, 1, 1)
    conv3_2 = nn.Conv2d(64, 64, 3, 1, 1)
    pool3 = nn.MaxPool2d(2)
    self.conv_module = nn.Sequential(
        # check first result size
        # check second result size
        # check third result size
        # check fourth result size
        # check fifth result size
    )
If there's any other way to check feature size at every step, please give some advice.
Thanks in advance.

To do that, don't use nn.Sequential. Just initialize your layers in __init__() and call them one by one in the forward function, where you can print the shape after each step. For example:
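import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(...)      # fill in your own arguments
        self.maxpool1 = nn.MaxPool2d(2)  # kernel size 2, as in the question
        self.conv2 = nn.Conv2d(...)
        self.maxpool2 = nn.MaxPool2d(2)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        print(x.shape)  # size after the first convolution
        x = self.maxpool1(x)
        print(x.shape)  # size after the first pooling
        x = self.conv2(x)
        x = F.relu(x)
        print(x.shape)  # size after the second convolution
        x = self.maxpool2(x)
        print(x.shape)  # size after the second pooling
        return x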
Hope that's what you're looking for!
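To cross-check the printed shapes against a hand calculation, the output-size formula from the Conv2d documentation can be applied layer by layer. The sketch below is plain Python (no PyTorch needed). The layer list mirrors the layers in the question; the 64 x 64 input size and the helper names conv_out and trace_sizes are assumptions for illustration only, since the question doesn't state an input size.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding) for the layers defined in the question.
# nn.MaxPool2d(2) uses stride 2 by default (stride defaults to kernel_size).
LAYERS = [
    ("conv1", 3, 1, 1), ("conv1_1", 3, 1, 1), ("pool1", 2, 2, 0),
    ("conv2", 3, 1, 1), ("conv2_1", 3, 1, 1), ("pool2", 2, 2, 0),
    ("conv3", 3, 1, 1), ("conv3_1", 3, 1, 1), ("conv3_2", 3, 1, 1),
    ("pool3", 2, 2, 0),
]

def trace_sizes(size, layers):
    """Return a list of (layer name, spatial size after that layer)."""
    sizes = []
    for name, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes

# Assumed 64 x 64 input: the convolutions keep the size, each pooling halves it.
for name, size in trace_sizes(64, LAYERS):
    print(name, size)
```

If a shape printed in forward differs from the traced value, the layer arguments (or the assumed input size) are the first thing to check.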


Is it possible to auto-size the subsequent input of a layer following torch.nn.Flatten within torch.nn.Sequential in PyTorch?

If I have the following model class for example:
class MyTestModel(nn.Module):
    def __init__(self):
        super(MyTestModel, self).__init__()
        self.seq1 = nn.Sequential(
            nn.Conv2d(3, 6, 3),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(6, 16, 3),
            nn.MaxPool2d(2, 2),
            nn.Flatten(),
            nn.Linear(myflattendinput(), 120),  # how to automate this?
            nn.Linear(120, 84),
            nn.Linear(84, 2),
        )
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.seq1(x)
        x = self.softmax(x)
        return x
I know that normally the data loader gives the model a fixed-size input, so the input size of the layer after nn.Flatten() is fixed too. However, I was wondering whether this size could somehow be computed automatically.
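PyTorch (>= 1.8) has nn.LazyLinear, which infers the input dimension (in_features) from the first batch passed through it, so you can write nn.LazyLinear(120) in place of the first nn.Linear.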
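When a lazy layer is not an option, the flattened size can also be computed ahead of time from the layer arguments. This is a minimal plain-Python sketch for the layers of MyTestModel; the helper names and the 32 x 32 input size are assumptions for illustration, not part of the original post.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

def flattened_features(height, width):
    """in_features of the first Linear layer in MyTestModel for a given input size."""
    # (kernel, stride) for Conv2d(3, 6, 3), MaxPool2d(2, 2), Conv2d(6, 16, 3), MaxPool2d(2, 2)
    for kernel, stride in [(3, 1), (2, 2), (3, 1), (2, 2)]:
        height = conv_out(height, kernel, stride)
        width = conv_out(width, kernel, stride)
    return 16 * height * width  # 16 output channels from the last Conv2d

# Assumed 32 x 32 RGB input: 32 -> 30 -> 15 -> 13 -> 6, so 16 * 6 * 6 features.
print(flattened_features(32, 32))  # 576
```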

Why am I getting calculated padding input size per channel smaller than kernel size?

I have the following model, but it's returning an error and I'm not sure why. I have tried googling but have not found anything so far. My input is a 6 by 6 NumPy array.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()  # required before assigning submodules
        self.conv1 = nn.Conv2d(1, 16, kernel_size=(3,3), stride=1, padding=0)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=(3,3), stride=1, padding=0)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=(3,3), stride=1, padding=0)
        self.fc1 = nn.Linear(64*4*4, 320)
        self.fc2 = nn.Linear(320, 160)
        self.out = nn.Linear(160, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = self.conv3(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = x.reshape(-1, 64*4*4)
        #x = torch.flatten(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return F.softmax(x, dim=1)
My input is a 6x6 numpy array and I get the following error, any idea why?
RuntimeError: Calculated padded input size per channel: (2 x 2). Kernel size: (3 x 3). Kernel size can't be greater than actual input size
Every time you apply a convolution with a kernel size of 3 and no padding, the image shrinks by 2 in each dimension (one pixel on each side).
So after the first convolution you get a 4 x 4 image, which the following 2 x 2 max pooling reduces to 2 x 2. A 3 x 3 kernel cannot slide over a 2 x 2 input, hence the error you receive at the second convolution.
Add padding=1 if you don't want the representation to shrink.
You can read more about how the shape changes with respect to the Conv2d parameters in PyTorch's documentation (here); see the Shape section specifically.
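The shape arithmetic above can be replayed in a few lines of plain Python. This is only a sketch of the documented output-size formula (dilation 1); the helper names are made up for illustration, and the layer list mirrors the question's forward pass.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

def trace(size, layers):
    """Size after each layer; None marks the first layer whose kernel does not fit."""
    history = []
    for name, kernel, stride, padding in layers:
        if size + 2 * padding < kernel:
            # Kernel larger than the padded input: this is where PyTorch raises.
            history.append((name, None))
            break
        size = conv_out(size, kernel, stride, padding)
        history.append((name, size))
    return history

# (name, kernel, stride, padding) as in the question's network
QUESTION_LAYERS = [
    ("conv1", 3, 1, 0), ("pool1", 2, 2, 0),
    ("conv2", 3, 1, 0), ("pool2", 2, 2, 0),
    ("conv3", 3, 1, 0), ("pool3", 2, 2, 0),
]

print(trace(6, QUESTION_LAYERS))
# [('conv1', 4), ('pool1', 2), ('conv2', None)]
```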
Here is what you can do. I used padding=1, as proposed by Szymon Maszke. The padding is added to the convolutions and to the second and third max-pooling layers.
import numpy
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()  # required before assigning submodules
        self.conv1 = nn.Conv2d(1, 16, kernel_size=(3,3), stride=1, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=(3,3), stride=1, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=(3,3), stride=1, padding=1)
        self.fc1 = nn.Linear(64*4*4, 320)
        self.fc2 = nn.Linear(320, 160)
        self.out = nn.Linear(160, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=2, stride=2)
        x = self.conv2(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)
        x = self.conv3(x)
        x = F.relu(x)
        x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1)
        x = x.reshape(-1, 64*4*4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.out(x)
        return F.softmax(x, dim=1)

n = Net()
a = numpy.random.rand(6,6)
print(a)
data = torch.tensor(a).float()
print(data.shape)
# data.unsqueeze_(0).unsqueeze_(0)
data = data.expand(16, 1, -1, -1)
print(data.shape)
o = n(data)
print(o)
[[0.89695967 0.09447725 0.0905144 0.52694105 0.66000333 0.10537102]
[0.32854697 0.86046884 0.29804184 0.62988374 0.5965067 0.54139821]
[0.41561266 0.95484358 0.82919364 0.75556819 0.77373267 0.52209278]
[0.46406436 0.6553954 0.60010151 0.86314529 0.70020608 0.16471554]
[0.72863547 0.83846636 0.95122373 0.84322402 0.32264676 0.1233866 ]
[0.75767067 0.56546123 0.7765021 0.35303595 0.3254407 0.84033049]]
torch.Size([6, 6])
torch.Size([16, 1, 6, 6])
tensor([[0.5134, 0.4866]], grad_fn=<SoftmaxBackward>)
By default in PyTorch padding=0, so you need to explicitly set padding=1 when needed.
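It can help to trace the padded network by hand as well. The plain-Python sketch below applies the same output-size formula to the answer's layers (helper name out_size is made up for illustration). Under that formula the final feature map is 1 x 1, so each sample has 64 features, and reshape(-1, 64*4*4) folds the 16 samples of the batch into a single row, which would explain why the printed output has only one row.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer (dilation 1, floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 6
size = out_size(size, 3, 1, 1)  # conv1, padding=1        -> 6
size = out_size(size, 2, 2, 0)  # max_pool2d(2, 2)        -> 3
size = out_size(size, 3, 1, 1)  # conv2, padding=1        -> 3
size = out_size(size, 3, 2, 1)  # max_pool2d(3, 2, pad 1) -> 2
size = out_size(size, 3, 1, 1)  # conv3, padding=1        -> 2
size = out_size(size, 3, 2, 1)  # max_pool2d(3, 2, pad 1) -> 1

features_per_sample = 64 * size * size            # 64 channels * 1 * 1
batch = 16
total_elements = batch * features_per_sample      # 1024
rows_after_reshape = total_elements // (64 * 4 * 4)

print(features_per_sample, rows_after_reshape)  # 64 1
```

To keep one row per sample for a 6 x 6 input, one option would be x.reshape(-1, 64) together with nn.Linear(64, 320); that is a suggestion derived from the arithmetic above, not something the original answer states.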

Error in Calculating neural network Test Accuracy

I tried to train my neural network and then evaluate its testing accuracy. I am using the code at the bottom of this post to train. For other neural networks, I can evaluate the testing accuracy with my code without issue. However, for this neural network (which I constructed according to the description in the neural network paper), I can't evaluate the testing accuracy properly, and it's giving me the traceback below. So maybe something's wrong in my forward pass?
Here is the training and testing code:
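import numpy as np
import torch
import torch.nn.functional as F
import torch.backends.cudnn as cudnn
from copy import deepcopy
from torch.autograd import Variable
from keras.datasets import cifar10  # assumed source of cifar10.load_data()
import deepnet  # the model file shown further below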
cudnn.benchmark = True
(X_train, y_train), (X_test, y_test) = cifar10.load_data()
X_train = X_train.astype('float32')
X_train = np.transpose(X_train, axes=(0, 3, 1, 2))
X_test = X_test.astype('float32')
X_test = np.transpose(X_test, axes=(0, 3, 1, 2))
X_train /= 255
X_test /= 255
device = torch.device('cuda:0')
# This is where you can load any model of your choice.
# I stole PyTorch Vision's VGG network and modified it to work on CIFAR-10.
# You can take this line out and add any other network and the code
# should run just fine.
model = deepnet.cifar10_deep()
# Forward pass
opfun = lambda X: model.forward(Variable(torch.from_numpy(X)))
# Forward pass through the network given the input
predsfun = lambda op: np.argmax(op.data.numpy(), 1)
# Do the forward pass, then compute the accuracy
accfun = lambda op, y: np.mean(np.equal(predsfun(op), y.squeeze()))*100
# Initial point
x0 = deepcopy(model.state_dict())
# Number of epochs to train for
# Choose a large value since LB training needs higher values
# Changed from 150 to 30
nb_epochs = 30
batch_range = [25, 40, 50, 64, 80, 128, 256, 512, 625, 1024, 1250, 1750, 2048, 2500, 3125, 4096, 4500, 5000]
# parametric plot (i.e., don't train the network if set to True)
hotstart = False
if not hotstart:
    for batch_size in batch_range:
        optimizer = torch.optim.Adam(model.parameters())
        average_loss_over_epoch = '-'
        print('Optimizing the network with batch size %d' % batch_size)
        np.random.seed(1337) #So that both networks see same sequence of batches
        for e in range(nb_epochs):
            print('Epoch:', e, ' of ', nb_epochs, 'Average loss:', average_loss_over_epoch)
            average_loss_over_epoch = 0
            # Checkpoint the model every epoch
            torch.save(model.state_dict(), "./models/DeepNetC2BatchSize" + str(batch_size) + ".pth")
            array = np.random.permutation(range(X_train.shape[0]))
            slices = X_train.shape[0] // batch_size
            beginning = 0
            end = 1
            # Training loop!
            for _ in range(slices):
                start_index = batch_size * beginning
                end_index = batch_size * end
                smpl = array[start_index:end_index]
                optimizer.zero_grad()
                ops = opfun(X_train[smpl])
                tgts = Variable(torch.from_numpy(y_train[smpl]).long().squeeze())
                loss_fn = F.nll_loss(ops, tgts)
                average_loss_over_epoch += loss_fn.item() / (X_train.shape[0] // batch_size)
                loss_fn.backward()
                optimizer.step()
                beginning += 1
                end += 1
grid_size = 18 #How many points of interpolation between [0, 5000]
data_for_plotting = np.zeros((grid_size, 3)) #Uncomment this line if running entire code from scratch
sharpnesses1eNeg3 = []
sharpnesses5eNeg4 = []
#data_for_plotting = np.load("DeepNetCIFAR10-intermediate-values.npy") #Uncomment this line to use an existing NumPy array
i = 0
# Fill in test accuracy values for `grid_size' points in the interpolation
for batch_size in batch_range:
    mydict = {}
    batchmodel = torch.load("./models/DeepNetC2BatchSize" + str(batch_size) + ".pth")
    for key, value in batchmodel.items():
        mydict[key] = value
    model.load_state_dict(mydict)
    j = 0
    for datatype in [(X_train, y_train), (X_test, y_test)]:
        dataX = datatype[0]
        datay = datatype[1]
        for smpl in np.split(np.random.permutation(range(dataX.shape[0])), 10):
            ops = opfun(dataX[smpl])
            tgts = Variable(torch.from_numpy(datay[smpl]).long().squeeze())
            var = F.nll_loss(ops, tgts).data.numpy() / 10
            if j == 1:
                data_for_plotting[i, j-1] += accfun(ops, datay[smpl]) / 10.
        j += 1
    print(data_for_plotting[i])
    np.save('DeepNetCIFAR10-intermediate-values', data_for_plotting)
    i += 1
And the model code is here and includes the forward pass
import torch
import torch.nn as nn
F = nn.functional
__all__ = ['cifar10_deep', 'cifar100_deep']

class VGG(nn.Module):
    def __init__(self, num_classes=10):
        super(VGG, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, bias=False),
            nn.Conv2d(64, 64, kernel_size=3, padding = 1, bias=False),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 128, kernel_size=3, padding = 1, bias=False),
            nn.Conv2d(128, 128, kernel_size=3, padding = 1, bias=False),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding = 1, bias=False),
            nn.Conv2d(256, 256, kernel_size=3, padding = 1, bias=False),
            nn.Conv2d(256, 256, kernel_size=3, padding = 1, bias=False),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(256, 512, kernel_size=3, padding = 1, bias=False),
            nn.Conv2d(512, 512, kernel_size=3, padding = 1, bias=False),
            nn.Conv2d(512, 512, kernel_size=3, padding = 1, bias=False),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(512, 512, kernel_size=3, padding = 1, bias=False),
            nn.Conv2d(512, 512, kernel_size=3, padding = 1, bias=False),
            nn.Conv2d(512, 512, kernel_size=3, padding = 1, bias=False),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(512, 512, bias=False),
            nn.Linear(512, num_classes)
        )

    def forward(self, x):
        x = self.features(x)
        x = x.view(-1, 512)
        x = self.classifier(x)
        return F.log_softmax(x, dim=1)  # normalize over the class dimension

def cifar10_deep(**kwargs):
    # kwargs is a dict, so use .get(); getattr() would always return the default
    num_classes = kwargs.get('num_classes', 10)
    return VGG(num_classes)

def cifar100_deep(**kwargs):
    num_classes = kwargs.get('num_classes', 100)
    return VGG(num_classes)
You are trying to load a state dict that belongs to another model.
The error shows that your model is an instance of the class AlexNet:
RuntimeError: Error(s) in loading state_dict for AlexNet:
But the state dict you are trying to load comes from the VGG you posted, which doesn't have the same modules as AlexNet.
You need to load the state dict into the same model architecture it was saved from.
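A quick way to confirm such a mismatch before loading is to compare the key sets of the model and the checkpoint. The sketch below uses plain lists of strings as stand-ins; the key names are hypothetical and only illustrate the idea. In real use you would pass model.state_dict().keys() and the keys of the dict returned by torch.load.

```python
def compare_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring what a strict load reports."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # in the model, absent from the file
    unexpected = sorted(checkpoint_keys - model_keys)   # in the file, unknown to the model
    return missing, unexpected

# Hypothetical key names, for illustration only
alexnet_like = ["features.0.weight", "features.0.bias", "classifier.1.weight"]
vgg_like = ["features.0.weight", "features.1.weight", "classifier.0.weight"]

missing, unexpected = compare_state_dict_keys(alexnet_like, vgg_like)
print("missing:", missing)
print("unexpected:", unexpected)
```

If both lists are empty, the architectures at least agree on parameter names; tensor shapes can still differ and would need a separate check.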

Kernel size can't be greater than actual input size

I have data with depth = 3 and I want to pass it through 3 convolution layers with 3x3x3 kernels each.
My current code is below. The first input is
[batch_size=10, in_channels=1, depth=3, height=128, width=256]
and I notice that after the first conv3d layer the output is [10,8,1,126,254]. It now has depth 1, so the next 3x3x3 layer doesn't accept it. How can I make this work?
class CNet(nn.Module):
    def __init__(self, **kwargs):
        super(CNet, self).__init__()  # required before assigning submodules
        self.conv1 = nn.Conv3d(1, 8, kernel_size=3, stride=1, padding=0)
        self.conv2 = nn.Conv3d(8, 16, kernel_size=3, stride=1, padding=0)
        self.conv3 = nn.Conv3d(16, 32, kernel_size=3, stride=1, padding=0)
        self.fc1 = nn.Linear(value, 2)

    def forward(self, X):
        X = F.relu(self.conv1(X))
        X = F.relu(self.conv2(X))
        X = F.max_pool2d(X,2)
        X = self.conv3(X)
        X = F.max_pool2d(X,2)
        X = self.fc1(X)
        return F.softmax(X,dim =1)
You need to use padding. If you only want to pad the input for the convolutions after the first one and only in the depth dimensions to get the minimum dimension of 3, you would use padding=(1, 0, 0) (it's 1 because the same padding is applied to both sides, i.e. (padding, input, padding) along that dimension).
self.conv2 = nn.Conv3d(8, 16, kernel_size=3, stride=1, padding=(1, 0, 0))
self.conv3 = nn.Conv3d(16, 32, kernel_size=3, stride=1, padding=(1, 0, 0))
However, it is common to use padding=1 for all dimensions when using kernel_size=3, because that keeps the dimensions unchanged. This makes it much easier to build deeper networks, as you don't need to worry about the sizes suddenly getting too small, which already happened for your depth dimension. Also, when no padding is used, the corners are only included in a single calculation, whereas all other elements contribute to multiple calculations. It is recommended to use kernel_size=3 and padding=1 for all your convolutions.
self.conv1 = nn.Conv3d(1, 8, kernel_size=3, stride=1, padding=1)
self.conv2 = nn.Conv3d(8, 16, kernel_size=3, stride=1, padding=1)
self.conv3 = nn.Conv3d(16, 32, kernel_size=3, stride=1, padding=1)
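The effect of the two padding choices on the (depth, height, width) shape can be checked with a small plain-Python sketch of the Conv3d output-size formula (dilation 1). The helper name conv3d_out is made up for illustration, and the pooling layers are left out to keep the focus on the convolutions.

```python
def conv3d_out(shape, kernel=3, stride=1, padding=(0, 0, 0)):
    """(depth, height, width) after a Conv3d with a cubic kernel.

    Raises ValueError when the kernel does not fit the padded input.
    """
    out = []
    for size, pad in zip(shape, padding):
        if size + 2 * pad < kernel:
            raise ValueError(f"kernel {kernel} larger than padded input {size + 2 * pad}")
        out.append((size + 2 * pad - kernel) // stride + 1)
    return tuple(out)

shape = (3, 128, 256)  # depth, height, width from the question

# Depth-only padding on conv2 and conv3, as in the first suggestion
after_conv1 = conv3d_out(shape)                           # (1, 126, 254)
after_conv2 = conv3d_out(after_conv1, padding=(1, 0, 0))  # (1, 124, 252)
after_conv3 = conv3d_out(after_conv2, padding=(1, 0, 0))  # (1, 122, 250)

# padding=1 everywhere keeps the shape unchanged
same = conv3d_out(shape, padding=(1, 1, 1))               # (3, 128, 256)

print(after_conv1, after_conv2, after_conv3, same)
```

Calling conv3d_out(after_conv1) without padding raises, which corresponds to the error in the title.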

How to create properly shaped input using Dataloader?

I am trying to feed an image and a vector as inputs to the model. The image has the correct 4D shape, but the vector doesn't. The image size is 424x512, while the vector has shape (18,). After using the DataLoader, I get batches of shape (50x1x424x512) and (50x18). The model raises an error because it needs the vector to be 4D too. How do I do that?
Here is my code :
def loadTrainingData_B(args):
    fdm = []
    tdm = []
    parameters = []
    for i in image_files[:4]:
        try:
            false_dm = np.fromfile(join(ref, i), dtype=np.int32)
            false_dm = Image.fromarray(false_dm.reshape((424, 512, 9)).astype(np.uint8)[:,:,1])
            true_dm = np.fromfile(join(ref, i), dtype=np.int32)
            true_dm = Image.fromarray(true_dm.reshape((424, 512, 9)).astype(np.uint8)[:,:,1])
            pos = param_filenames.index(i)
            param = np.array(params[pos, 1:])
            param = np.where(param == '-point-light-source', 1, param).astype(np.float64)
            fdm.append(false_dm)
            tdm.append(true_dm)
            parameters.append(param)
        except (FileNotFoundError, ValueError):
            print('[!] File {} not found'.format(i))
    return (fdm, parameters, tdm)
class Flat_ModelB(Dataset):
    def __init__(self, args, train=True, transform=None):
        self.args = args
        if train == True:
            self.fdm, self.parameters, self.tdm = loadTrainingData_B(self.args)
        else:
            self.fdm, self.parameters, self.tdm = loadTestData_B(self.args)
        self.data_size = len(self.parameters)
        self.transform = transforms.Compose([transforms.ToTensor()])

    def __getitem__(self, index):
        return (self.transform(self.fdm[index]).double(), torch.from_numpy(self.parameters[index]).double(), self.transform(self.tdm[index]).double())

    def __len__(self):
        return self.data_size
The error I get is :
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 2-dimensional input of size [50, 18] instead
Here is the model :
class Model_B(nn.Module):
    def __init__(self, config):
        super(Model_B, self).__init__()
        self.config = config
        # CNN layers for fdm
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=2, padding=2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=2, padding=2),
        )
        self.layer3 = nn.Sequential(
            nn.Conv2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2),
        )
        self.layer4 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=32, kernel_size=5, stride=2, padding=2, output_padding=1),
        )
        self.layer5 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=16, kernel_size=5, stride=2, padding=2, output_padding=1),
        )
        self.layer6 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=16, out_channels=1, kernel_size=5, stride=2, padding=2, output_padding=1),
        )
        # CNN layer for parameters
        self.param_layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5, stride=2, padding=2),
        )

    def forward(self, x, y):
        out = self.layer1(x)
        out_param = self.param_layer1(y)
        print("LayerParam 1 Output Shape : {}".format(out_param.shape))
        print("Layer 1 Output Shape : {}".format(out.shape))
        out = self.layer2(out)
        print("Layer 2 Output Shape : {}".format(out.shape))
        out = self.layer3(out)
        # out = torch.cat((out, out_param), dim=2)
        print("Layer 3 Output Shape : {}".format(out.shape))
        out = self.layer4(out)
        print("Layer 4 Output Shape : {}".format(out.shape))
        out = self.layer5(out)
        print("Layer 5 Output Shape : {}".format(out.shape))
        out = self.layer6(out)
        print("Layer 6 Output Shape : {}".format(out.shape))
        return out
and the method by which I access the data :
for batch_idx, (fdm, parameters) in enumerate(...):
    if self.config.gpu:
        fdm = fdm.cuda()
        parameters = parameters.cuda()
    print('shape of parameters for model a : {}'.format(parameters.shape))
    output = self.model(fdm)
    loss = self.criterion(output, parameters)
Edit :
I think my code is incorrect, as I am trying to apply convolutions over a vector of shape (18,). I tried to copy the vector to make it (18x64) and then input it. It still doesn't work and gives this error:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight 32 1 5 5, but got 3-dimensional input of size [4, 18, 64] instead
I am not sure how to concatenate an 18 length vector to the output of layer 3, if I can't do any of these things.
It looks like you are training an autoencoder model and want to parameterize it with an additional vector input in the bottleneck layer. If you want to transform that vector first, you have to decide whether you need any spatial dependencies. Given the constant input size (N, 1, 424, 512), the output of layer3 will have the shape (N, 32, 53, 64). You have several options, depending on your desired model performance:
1. Use nn.Linear layers with activations to transform the parameter vector. Then add extra spatial dimensions and repeat this vector at all spatial locations:
import torch
import torch.nn as nn
img = torch.rand((1, 1, 424, 512))
vec = torch.rand(1, 18)  # parameter vector of length 18, as in the question
layer3_out = model(img)  # model is assumed to return the output of layer3
N, C, H, W = layer3_out.shape
param_encoder = nn.Sequential(nn.Linear(18, 30), nn.ReLU(), nn.Linear(30, 10))
param = param_encoder(vec)
param = param.unsqueeze(-1).unsqueeze(-1).expand(N, -1, H, W)
encoding = torch.cat([param, layer3_out], dim=1)
2. Use transposed convolutions to upsample your parameter vector to the size of the layer3 output. That is harder to implement, as you have to calculate the exact output shape to match (N, 32, 53, 64).
3. Transform the input vector with an MLP (nn.Linear layers) to twice the number of channels in the layer3 output. Then use so-called feature-wise transformations to scale and shift the feature maps from layer3.
I would recommend starting with the first option, since it is the simplest one to implement, and then trying the others.
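The (N, 32, 53, 64) shape quoted above, and the channel count after concatenation in the first option, can be verified with the Conv2d output-size formula. This is a plain-Python sketch with a made-up helper name; the layer arguments (kernel 5, stride 2, padding 2) are taken from the question's model.

```python
def conv_out(size, kernel=5, stride=2, padding=2):
    """Spatial output size of one of the question's Conv2d layers (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

h, w = 424, 512
for _ in range(3):  # layer1, layer2, layer3
    h, w = conv_out(h), conv_out(w)
print(h, w)  # 53 64

# In the first option the encoded parameter vector (10 channels, the output size
# of the last nn.Linear in param_encoder) is concatenated along dim=1.
param_channels = 10
layer3_channels = 32
print(param_channels + layer3_channels)  # 42
```

So layer4 would need in_channels=42 if it consumes the concatenated encoding directly.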
