I am very new to Deep learning. I am working on the CIFAR10 dataset and created a CNN model which is as below.
class Net2(nn.Module):
    def __init__(self):
        super(Net2, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 5, 1)
        self.fc1 = nn.Linear(32 * 5 * 5, 512)
        self.fc2 = nn.Linear(512,10)

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = F.max_pool2d(F.relu(self.conv1(x)),(2,2))
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

net2 = Net2().to(device)
My assignment requirements are to create a model with:
Convolutional layer with 32 filters, kernel size of 5x5 and stride of 1.
Max Pooling layer with kernel size of 2x2 and default stride.
ReLU Activation Layers.
Linear layer with output of 512.
ReLU Activation Layers.
A linear layer with output of 10.
I think this is what I wrote, but I suspect I am on the wrong path. Please help me write the correct model and explain the reasoning behind the arguments to the Conv2d and Linear layers.
The error which I am getting from my code is as below:
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 3, 5, 5], but got 2-dimensional input of size [1024, 3072] instead
Please help me!
There are two problems with the code:
1. Flattening of the input
x = x.view(x.size(0), -1)
The convolutional layer expects a four-dimensional input of shape (N, C, H, W), where N is the batch size, C = 3 is the number of channels, and (H, W) is the size of the image. By running the above statement before the convolution, you flatten your (1024, 3, 32, 32) input to (1024, 3072). Move this line so that it runs after the pooling step, just before fc1.
2. Number of input features in the first linear layer
self.fc1 = nn.Linear(32 * 5 * 5, 512)
The output of the convolutional layer for a (1024, 3, 32, 32) input has shape (1024, 32, 28, 28), and after the 2 x 2 max pooling it is (1024, 32, 14, 14). So the number of input features for the linear layer should be 32 x 14 x 14 = 6272.
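The arithmetic behind 6272 can be checked with a small plain-Python sketch. The helper name conv2d_out is made up for illustration; it applies the standard output-size formula for a convolution or pooling window (no dilation assumed).

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output height/width of a conv or pooling window (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


h = 32                                  # CIFAR10 images are 32 x 32
h = conv2d_out(h, kernel=5, stride=1)   # Conv2d(3, 32, 5, 1)      -> 28
h = conv2d_out(h, kernel=2, stride=2)   # 2 x 2 max pooling        -> 14
fc1_in_features = 32 * h * h            # 32 channels * 14 * 14
print(fc1_in_features)  # 6272
```

With that number, the first linear layer would be declared as nn.Linear(32 * 14 * 14, 512), and the x.view(x.size(0), -1) call would come after the pooling step rather than before the convolution.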
I am working on sentiment analysis and want to classify the output into 4 classes. For the loss I am using cross-entropy.
The problem is that PyTorch's cross-entropy needs an input of shape (batch_size, output), which I am having trouble with.
I am using a batch size of 12 and a sequence length of 32.
import torch.nn as nn

class RNN(nn.Module):
    def __init__(self, hidden_dim = 256, input_size = 32 , num_layers = 1, num_classes=4, vocab_size = len(vocab_to_int)+1, embedding_dim=100):
        super().__init__()
        self.input_size = input_size
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.num_classes = num_classes
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers)
        self.fc1 = nn.Linear(hidden_dim, 50)
        self.fc2 = nn.Linear(50, 4)

    def forward(self, x, hidden):
        x = self.embedding(x)
        x = x.view(32, 12, 100)
        x, hidden = self.lstm(x, hidden)
        x = x.contiguous().view(-1, 256)
        x = self.fc1(x) # output shape ([384, 50])
        x = self.fc2(x) # output shape [384, 4]
        return x, hidden

    def init_hidden(self, batch_size=12):
        weight = next(self.parameters()).data
        hidden = (weight.new(self.num_layers, 12, self.hidden_dim).zero_().cuda(),
                  weight.new(self.num_layers, 12, self.hidden_dim).zero_().cuda())
        return hidden
According to the CrossEntropyLoss docs:
input has to be a Tensor of size (C) for unbatched input, (minibatch,C) [for batched input] [...]
The code you provided contains only the RNN class, not the data processing or the actual call to CrossEntropyLoss, but the error you stated in the comments makes me think that you did not reshape the labels tensor to match the output of the neural network. You would therefore be computing the loss of a tensor of size (384, 4) against another tensor which I infer is of size (12, 32). Your labels tensor should be of size (384) to match the first dimension of the neural network output.
Also, you don't have to reshape your tensors manually: you can reshape them after the forward() call through the torch.nn.utils.rnn.pack_padded_sequence() function. If you apply this function to both the output of the neural network and the labels, you will have a tensor of size (384, 4) that PyTorch can handle in the call to CrossEntropyLoss. See the note in the pack_padded_sequence() function docs for more details.
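The manual route for flattening the labels to size (384) can be sketched in plain Python. This assumes, as inferred above, one label per time step stored as (batch, seq_len) = (12, 32); the label values are invented for illustration. It also assumes the LSTM output is laid out as (seq_len, batch, hidden), so that .view(-1, 256) yields rows ordered by time step first and batch index second.

```python
batch_size, seq_len = 12, 32

# Hypothetical labels: labels[b][t] is the class (0..3) at step t of sequence b.
labels = [[(b + t) % 4 for t in range(seq_len)] for b in range(batch_size)]

# A (seq_len, batch, hidden) output flattened with .view(-1, hidden) puts the
# entry for (t, b) at row t * batch_size + b, so the labels must be flattened
# in the same time-major order, not simply row by row.
flat_labels = [labels[b][t] for t in range(seq_len) for b in range(batch_size)]

print(len(flat_labels))  # 384
```

Under the same assumptions, the tensor equivalent would be along the lines of labels.t().reshape(-1), which transposes to (seq_len, batch) before flattening.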
I am a newbie in neural networks. I have trained my model and now I want to test it. I wrote some code with Google's help, but it does not work.
The problem is that I do not understand where the 4th dimension comes from.
The code is the following:
import matplotlib.pyplot as plt
import numpy as np
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
transform = transforms.Compose([
    transforms.Resize(32),
    transforms.CenterCrop(32),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
main_path = 'D:/RTU/dataset/ready_dataset_2classes'
train_data_path = main_path + '/train'
#test_data_path = main_path + '/test'
weigths_path = 'D:/RTU/dataset/weights_done/weights_noise_original037-97%.pt'
train_data = torchvision.datasets.ImageFolder(root=train_data_path, transform=transform)
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # convolutional layer (sees 32x32x3 image tensor)
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        # convolutional layer (sees 16x16x16 tensor)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        # convolutional layer (sees 8x8x32 tensor)
        self.conv3 = nn.Conv2d(32, 64, 3, padding=1)
        # max pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        # linear layer (64 * 4 * 4 -> 500)
        self.fc1 = nn.Linear(64 * 4 * 4, 500)
        # linear layers (500 -> 250 -> 2)
        self.fc2 = nn.Linear(500, 250)
        self.fc3 = nn.Linear(250, 2)
        # dropout layer (p=0.25)
        self.dropout = nn.Dropout(0.25) #0.25

    def forward(self, x):
        # add sequence of convolutional and max pooling layers
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        # flatten image input
        x = x.view(-1, 64 * 4 * 4)
        # add dropout layer
        x = self.dropout(x)
        # add 1st hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # add 2nd hidden layer, with relu activation function
        x = F.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)
        return x
# Disable grad
with torch.no_grad():
    # Retrieve item
    index = 1
    item = train_data[index]
    image = item[0]
    true_target = item[1]
    # Loading the saved model
    mlp = Net()
    optimizer = optim.SGD(mlp.parameters(), lr=0.01)
    epoch=5
    valid_loss_min = np.Inf
    checkpoint = torch.load(weigths_path , map_location=torch.device('cpu'))
    mlp.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    epoch = checkpoint['epoch']
    valid_loss_min = checkpoint['valid_loss_min']
    mlp.eval()
    # Generate prediction
    prediction = mlp(image)
    # Predicted class value using argmax
    predicted_class = np.argmax(prediction)
    # Reshape image
    #image = image.reshape(28, 28, 1)
    # Show result
    plt.imshow(image, cmap='gray')
    plt.title(f'Prediction: {predicted_class} - Actual target: {true_target}')
    plt.show()
The code seems to work up to "mlp.eval()", and then I get the error: Expected 4-dimensional input for 4-dimensional weight [16, 3, 3, 3], but got 3-dimensional input of size [3, 32, 32] instead.
What am I doing wrong?
Error
When you train a neural net, you feed small batches of input data to the model. Even though it is not stated explicitly when you write layers in PyTorch, the documentation shows that convolutional layers receive 4D arrays of shape (N, C, H, W),
with N corresponding to the batch size and C to the number of channels, here 3 because you are using RGB images.
So when you test the trained model, the test data must be built the same way before it is fed into the network.
Thus, if you want to feed a single image to your network, you must reshape it properly:
myimage = myimage.reshape(-1, 3, 32, 32)
print(myimage.shape)
# torch.Size([1, 3, 32, 32])
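The shape bookkeeping can be illustrated without PyTorch. The two helpers below are hypothetical and only mimic the kind of check a Conv2d layer performs on its input. In PyTorch itself, a batch dimension can be added with image.unsqueeze(0) or with a reshape as in the answer.

```python
def add_batch_dim(shape):
    """Return the shape after prepending a batch dimension of size 1."""
    return (1, *shape)


def check_conv2d_input(shape, in_channels):
    """Mimic the layer's check: input must be (N, C, H, W) with C == in_channels."""
    if len(shape) != 4:
        raise ValueError(
            f"expected 4-dimensional input, got {len(shape)}-dimensional "
            f"input of size {list(shape)}"
        )
    if shape[1] != in_channels:
        raise ValueError(f"expected {in_channels} channels, got {shape[1]}")
    return True


image_shape = (3, 32, 32)              # a single item from the dataset: (C, H, W)
batched = add_batch_dim(image_shape)   # what the network expects: (N, C, H, W)
print(batched)  # (1, 3, 32, 32)
```

Passing image_shape directly to check_conv2d_input raises an error analogous to the one in the question, while the batched shape passes.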
I was trying to learn PyTorch and came across a tutorial where a CNN is defined like below,
class Net(Module):
    def __init__(self):
        super(Net, self).__init__()
        self.cnn_layers = Sequential(
            # Defining a 2D convolution layer
            Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
            # Defining another 2D convolution layer
            Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
            BatchNorm2d(4),
            ReLU(inplace=True),
            MaxPool2d(kernel_size=2, stride=2),
        )
        self.linear_layers = Sequential(
            Linear(4 * 7 * 7, 10)
        )

    # Defining the forward pass
    def forward(self, x):
        x = self.cnn_layers(x)
        x = x.view(x.size(0), -1)
        x = self.linear_layers(x)
        return x
I understood how the cnn_layers are made. After the cnn_layers, the data should be flattened and given to linear_layers.
I don't understand how the number of features to Linear is 4*7*7. I understand that 4 is the output dimension from the last Conv2d layer.
How does 7*7 come into the picture? Do stride or padding play any role in that?
Input image shape is [1, 28, 28]
The Conv2d layers have a kernel size of 3 with stride 1 and padding 1, which means they do not change the spatial size of the image. The two MaxPool2d layers each reduce the spatial dimensions from (H, W) to (H/2, W/2). So, for each batch, the output of the last pooling step after the second convolution, with 4 output channels, has a shape of (batch_size, 4, H/4, W/4). In the forward pass the feature tensor is flattened by x = x.view(x.size(0), -1), which gives it the shape (batch_size, 4 * H/4 * W/4) = (batch_size, H*W/4). I assume H and W are 28, for which the linear layer takes inputs of shape (batch_size, 196), and 196 = 4*7*7.
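As a sanity check, the layer-by-layer sizes can be traced in plain Python. The helper out_size is a made-up name that applies the standard output-size formula (no dilation assumed) to the kernel, stride and padding values from the tutorial model.

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling layer (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for each layer that affects spatial size
layers = [
    ("conv1", 3, 1, 1),
    ("pool1", 2, 2, 0),
    ("conv2", 3, 1, 1),
    ("pool2", 2, 2, 0),
]

size = 28   # input image is 28 x 28
trace = []
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    trace.append((name, size))

flat_features = 4 * size * size   # 4 channels after the last convolution
print(trace)          # [('conv1', 28), ('pool1', 14), ('conv2', 14), ('pool2', 7)]
print(flat_features)  # 196
```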
Actually,
in the 2D convolution layers, the features [values] are arranged in a matrix [2D tensor].
A typical neural network ends with a fully connected layer followed by the logits layer,
and the features in the fully connected layer form a vector [1D tensor].
Therefore we have to map each feature [value] in the last matrix into the fully connected layer that follows.
In PyTorch, the fully connected layer is implemented by the Linear class.
Its first parameter is the number of input features.
In this case:
input_image : (28,28,1)
after_Conv2d_1 : (28,28,4) <- because of the padding; with padding = 0 it would be (26,26,4)
after_maxPool_1 : (14,14,4) <- due to the stride of 2
after_Conv2d_2 : (14,14,4) <- because this is "same" padding
after_maxPool_2 : (7,7,4)
In the end, the total number of features before the fully connected layer is 4*7*7.
This also shows why we usually use an odd kernel size and start from images with an even number of pixels.
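The point about odd kernel sizes can be illustrated with a short plain-Python sketch. It assumes stride 1 and symmetric padding of kernel // 2 on each side, and the function name is invented for illustration.

```python
def output_with_symmetric_padding(size, kernel):
    """Output size at stride 1 with kernel // 2 pixels of padding per side."""
    padding = kernel // 2
    return size + 2 * padding - kernel + 1


results = {k: output_with_symmetric_padding(28, k) for k in (2, 3, 4, 5)}
print(results)  # {2: 29, 3: 28, 4: 29, 5: 28}

# An even input size halves cleanly under 2 x 2 pooling with stride 2.
print(28 // 2, 14 // 2)  # 14 7
```

Only the odd kernels keep the 28 x 28 size with the same padding on both sides. An even kernel would need asymmetric padding to do so.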
I have a simple convolution network:
import torch.nn as nn
import torch.nn.functional as F

class model(nn.Module):
    def __init__(self, ks=1):
        super(model, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=4, out_channels=32, kernel_size=ks, stride=1)
        self.fc1 = nn.Linear(8*8*32*ks, 64)
        self.fc2 = nn.Linear(64, 64)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = x.view(x.size(0), -1)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

cnn = model(1)
Since the kernel size is 1 and the number of output channels is 32, I assumed that there should be 32*1*1 weights in this layer. But when I ask PyTorch for the shape of the weight matrix with cnn.conv1.weight.shape, it returns torch.Size([32, 4, 1, 1]). Why should the number of input channels matter for the weights of a conv2d layer?
Am I missing something?
It matters because you are doing a 2D convolution over images, which means the depth of each filter (kernel) must equal in_channels (PyTorch sets this for you), so the true kernel size is [in_channels, 1, 1]. The out_channels value is the number of kernels, so the number of weights = number of kernels * size of each kernel = out_channels * (in_channels * kernel_height * kernel_width). Here that is 32 * (4 * 1 * 1) = 128, which matches torch.Size([32, 4, 1, 1]). In effect, this is a 2D convolution over a 3D input.
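The weight count can be reproduced with a few lines of plain Python. The helper below is hypothetical and assumes the default groups=1, so every kernel spans all input channels. The bias, which adds one value per output channel, is not counted.

```python
import math


def conv2d_weight_shape(in_channels, out_channels, kernel_size):
    """Shape of a Conv2d weight tensor, assuming groups=1."""
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size, kernel_size)
    kh, kw = kernel_size
    return (out_channels, in_channels, kh, kw)


shape = conv2d_weight_shape(in_channels=4, out_channels=32, kernel_size=1)
n_weights = math.prod(shape)
print(shape, n_weights)  # (32, 4, 1, 1) 128
```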
I'm trying to replicate the CNN described in
https://pdfs.semanticscholar.org/3b57/85ca3c29c963ae396c2f94ba1a805c787cc8.pdf
and I'm stuck at the last layer. I've modeled the cnn like this
# Model function for CNN
def cnn_model_fn(features, labels, mode):
    # Input Layer
    # Reshape X to 4-D tensor: [batch_size, width, height, channels]
    # Taxes images are 150x150 pixels, and have one color channel
    input_layer = tf.reshape(features, [-1, 150, 150, 1])

    # Convolutional Layer #1
    # Input Tensor Shape: [batch_size, 150, 150, 1]
    # Output Tensor Shape: [batch_size, 144, 144, 20]
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=20,
        kernel_size=[7, 7],
        padding="valid",
        activation=tf.nn.relu)

    # Pooling Layer #1
    # Input Tensor Shape: [batch_size, 144, 144, 20]
    # Output Tensor Shape: [batch_size, 36, 36, 20]
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[4, 4], strides=4)

    # Convolutional Layer #2
    # Input Tensor Shape: [batch_size, 36, 36, 20]
    # Output Tensor Shape: [batch_size, 32, 32, 50]
    conv2 = tf.layers.conv2d(
        inputs=pool1,
        filters=50,
        kernel_size=[5, 5],
        padding="valid",
        activation=tf.nn.relu)

    # Pooling Layer #2
    # Input Tensor Shape: [batch_size, 32, 32, 50]
    # Output Tensor Shape: [batch_size, 8, 8, 50]
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[4, 4], strides=4)

    # Flatten tensor into a batch of vectors
    # Input Tensor Shape: [batch_size, 8, 8, 50]
    # Output Tensor Shape: [batch_size, 8 * 8 * 50]
    pool2_flat = tf.reshape(pool2, [-1, 8 * 8 * 50])

    # Dense Layer #1
    # Densely connected layer with 1000 neurons
    # Input Tensor Shape: [batch_size, 8 * 8 * 50]
    # Output Tensor Shape: [batch_size, 1000]
    dense1 = tf.layers.dense(inputs=pool2_flat, units=1000, activation=tf.nn.relu)

    # Dense Layer #2
    # Densely connected layer with 1000 neurons
    # Input Tensor Shape: [batch_size, 1000]
    # Output Tensor Shape: [batch_size, 1000]
    dense2 = tf.layers.dense(inputs=dense1, units=1000, activation=tf.nn.relu)

    # Add dropout operation; 0.5 probability that element will be kept
    dropout = tf.layers.dropout(
        inputs=dense2, rate=0.5, training=mode == learn.ModeKeys.TRAIN)

    # Logits layer
    # Input Tensor Shape: [batch_size, 1000]
    # Output Tensor Shape: [batch_size, 4]
    logits = tf.layers.dense(inputs=dropout, units=nClass)

    loss = None
    train_op = None

    # Calculate Loss (for both TRAIN and EVAL modes)
    if mode != learn.ModeKeys.INFER:
        onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=nClass)
        loss = tf.losses.softmax_cross_entropy(
            onehot_labels=onehot_labels, logits=logits)

    # Configure the Training Op (for TRAIN mode)
    if mode == learn.ModeKeys.TRAIN:
        train_op = tf.contrib.layers.optimize_loss(
            loss=loss,
            global_step=tf.contrib.framework.get_global_step(),
            learning_rate=0.001,
            optimizer="SGD")

    # Generate Predictions
    predictions = {
        "classes": tf.argmax(
            input=logits, axis=1)
    }

    # Return a ModelFnOps object
    return model_fn_lib.ModelFnOps(
        mode=mode, predictions=predictions, loss=loss, train_op=train_op)
but the final accuracy is really poor (0.25). Then I realized that the paper actually states that the last layer is a softmax layer, so I tried changing my logits layer to
logits = tf.layers.softmax(dropout)
but when I run it, it says
ValueError: Shapes (?, 1000) and (?, 4) are incompatible
So, what am I missing here?
The original one was correct. The softmax activation is applied while calculating the loss with tf.losses.softmax_cross_entropy. If you want to calculate it separately you should add it after the logits calculation, but without replacing it as you did.
logits = tf.layers.dense(inputs=dropout, units=nClass)
softmax = tf.layers.softmax(logits)
Or you can combine both in one, but I wouldn't recommend it. It is better to calculate the softmax with the loss.
logits = tf.layers.dense(inputs=dropout, units=nClass, activation=tf.layers.softmax)
Your classifier is not doing better than random (an accuracy of 0.25 is chance level for 4 classes), so I would say that the problem lies somewhere else, maybe in the data loading and preprocessing.
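To see why the softmax belongs with the loss, here is a minimal plain-Python sketch of softmax cross-entropy for a single example. The logits and label are made-up values, and the functions are illustrative stand-ins, not the TensorFlow implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_cross_entropy(logits, label):
    """Cross-entropy of the true class, with softmax applied internally."""
    return -math.log(softmax(logits)[label])


logits = [2.0, 0.5, -1.0, 0.1]   # hypothetical raw outputs for 4 classes
label = 0

loss = softmax_cross_entropy(logits, label)
# Feeding probabilities instead of logits applies softmax twice.
double_softmax_loss = softmax_cross_entropy(softmax(logits), label)
# A model that has learned nothing (all logits equal) gives ln(4).
chance_loss = softmax_cross_entropy([0.0, 0.0, 0.0, 0.0], label)

print(loss < double_softmax_loss)  # True
```

Applying softmax before a loss that already applies it squashes the inputs into [0, 1], so the loss can no longer approach zero even for a confident, correct prediction. A loss that stays near ln(4), about 1.386, during training would be consistent with the chance-level accuracy described in the question.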