CNN for variable sized images in pytorch - python

I want to make a CNN model in PyTorch that can be fed images of different sizes. I am trying to use a 2D convolution layer, which takes a 4D input (PyTorch's Conv2d expects its 2D inputs to actually have 4 dimensions).
However, I'm not sure how to set up an input layer that can map all the variable-sized images to a fixed number of feature maps to pass on to the remaining layers.
For example, the input shape for color images is [4, 3, 32, 32], which corresponds to batch size, number of channels (RGB), height, and width. Grayscale images have shape [4, 1, 32, 32], which produces an error because the layer expects 3 channels.
Error message is "RuntimeError: Given groups=1, weight of size [6, 3, 5, 5], expected input[4, 1, 32, 32] to have 3 channels, but got 1 channels instead"
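The architecture of my current CNN is shown below.
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):  # class name is a placeholder; the original snippet omitted it
    def __init__(self, num_out, kernel_size, num_input_filters):
        super().__init__()
        self.num_input_filters = num_input_filters
        self.num_out = num_out
        self.kernel_size = kernel_size
        # in_channels is hard-coded to 3, so 1-channel (grayscale) input raises the error above
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)

    def forward(self, inp):
        inp = self.pool(F.relu(self.conv1(inp)))
        return inp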
I have looked at similar questions, and fully convolutional networks (FCNs), which place no restriction on the input size, could be the solution. PyTorch provides ConvTranspose2d() for FCNs, but its parameters still seem to require a fixed input size.
Are there any methods that can solve this problem?

You can convert the grayscale images to RGB by duplicating the single channel three times.
In your example, an image of shape [1, 32, 32] can be converted to [3, 32, 32] with the following code:
import numpy as np
np.concatenate((images,)*3)
If the shape is [32, 32, 1], try
np.concatenate((images,)*3, axis=-1)
If the shape is [32, 32], you can use the code below to convert it to [32, 32, 3]:
img_shape = tuple(np.ones(len(images.shape), dtype=int))  # (1, 1)
img_shape += (3,)  # (1, 1, 3): repeat only along the new last axis
images = np.tile(np.expand_dims(images, axis=-1), img_shape)
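For a whole batch laid out as [batch, channels, height, width], as in the question, the same duplication can be applied along the channel axis. The following is only a minimal sketch, assuming NumPy arrays; the helper name gray_to_rgb_batch is made up for illustration and is not part of the original answer.

```python
import numpy as np

def gray_to_rgb_batch(batch):
    """Repeat the single channel of an [N, 1, H, W] batch three times.

    Hypothetical helper for illustration. Batches that already have
    3 channels are returned unchanged.
    """
    if batch.ndim != 4:
        raise ValueError("expected a 4-D array [N, C, H, W]")
    if batch.shape[1] == 3:
        return batch
    if batch.shape[1] != 1:
        raise ValueError("expected 1 or 3 channels")
    # Copy the grayscale channel into three identical channels.
    return np.repeat(batch, 3, axis=1)

gray = np.random.rand(4, 1, 32, 32)
rgb = gray_to_rgb_batch(gray)
print(rgb.shape)  # (4, 3, 32, 32)
```

If the batch is already a torch tensor, tensor.repeat(1, 3, 1, 1) should give the same layout without going through NumPy.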

Related

Input fixed length sequence of frames to CNN

I want my pytorch CNN to take as input a sequence of length SEQ_LEN of 32x32 RGB images concatenated along channels dimension. Therefore, a single input of the network has shape (32, 32, 3, SEQ_LEN). How should I define my CNN input layer?
The common way
SEQ_LEN = 10
input_conv = nn.Conv2d(in_channels=SEQ_LEN, out_channels=32, kernel_size=3)
BATCH_SIZE = 64
frames = np.random.randint(0, 255, size=(BATCH_SIZE, SEQ_LEN, 3, 32, 32))
frames_tensor = torch.tensor(frames)
input_conv(frames_tensor)
gives the error
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 10, 3, 3], but got 5-dimensional input of size [64, 10, 3, 32, 32] instead
Given your comments, it sounds like your data is not fit for a 2D convolutional neural network at all, and that a 3D one (Conv3d) would be more appropriate. As you can see from its documentation, its input shape is what you would expect.
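A minimal sketch of what the Conv3d suggestion might look like for the shapes in the question. It assumes the frames are float tensors and that the sequence axis should become the depth axis of Conv3d; the batch size is reduced and the variable names are illustrative.

```python
import torch
import torch.nn as nn

SEQ_LEN = 10
BATCH_SIZE = 4  # smaller than in the question, just to keep the demo light

# Float frames laid out as in the question: (batch, SEQ_LEN, channels, H, W).
frames = torch.rand(BATCH_SIZE, SEQ_LEN, 3, 32, 32)

# Conv3d expects (batch, channels, depth, H, W), so swap the sequence and channel axes.
clips = frames.permute(0, 2, 1, 3, 4)

input_conv = nn.Conv3d(in_channels=3, out_channels=32, kernel_size=3)
out = input_conv(clips)
print(out.shape)  # torch.Size([4, 32, 8, 30, 30])
```

Note that integer arrays such as those produced by np.random.randint need to be converted to a floating-point tensor (for example with .float()) before they are passed to a convolution layer.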

Conv3d error on dimensions of a 5d tensor

I have a tensor shaped ([5, 1, 3, 126, 126]), which represents a video (5 frames each 126x126 rgb).
I need to forward it into a
self.resnet = nn.Sequential(
nn.Conv3d(5,5,1),
nn.UpsamplingBilinear2d(size=None, scale_factor=0.5)
)
but I get
RuntimeError: Given groups=1, weight of size [5, 5, 1, 1, 1], expected input[5, 1, 3, 126, 126] to have 5 channels, but got 1 channels instead
I think I have misunderstood how Conv3d works, but I can't see why the expected dimensions are so different from the ones my 5D tensor has at that point.
This happens because the dimensions of your tensor are in the wrong order. Conv3d expects its input laid out as [batch_size, channels, frames, height, width], so for your tensor of shape [5, 1, 3, 126, 126] it reads the second dimension (1) as the number of channels, which is why you get the error. You should change the shape of your input tensor to [5, 3, 1, 126, 126] (or to [1, 3, 5, 126, 126] if the 5 frames should be convolved together as one video). Your Conv3d parameters are also wrong. The first argument is the number of input channels, which in your case is 3 because the frames are RGB. The second argument is the number of output channels, which you do not need to change.
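As a rough sketch of the fix, assuming the 5 frames belong to a single video and should therefore sit on the depth axis (layout [1, 3, 5, 126, 126]); the variable names are illustrative and random data stands in for the real video.

```python
import torch
import torch.nn as nn

# Tensor from the question: (frames, 1, channels, height, width).
video = torch.randn(5, 1, 3, 126, 126)

# Reorder to (batch=1, channels=3, frames=5, height, width) for Conv3d.
clip = video.permute(1, 2, 0, 3, 4)

# in_channels is 3 (RGB); out_channels stays at 5 as in the question.
conv = nn.Conv3d(in_channels=3, out_channels=5, kernel_size=1)
out = conv(clip)
print(out.shape)  # torch.Size([1, 5, 5, 126, 126])
```

If each frame should instead be treated as a separate batch item, video.permute(0, 2, 1, 3, 4) gives the [5, 3, 1, 126, 126] layout.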

Multi-dimension input to a neural network

I have a neural network with many layers. I have the input to the neural network of dimension [batch_size, 7, 4]. When this input is passed through the network, I observed that only the third dimension of the input keeps changing, that is if my first layer has 20 outputs, then the output of the second layer is [batch_size, 7, 20]. I need the end result after many layers to be of the shape [batchsize, 16].
I have the following questions:
Are the other two dimensions being used at all?
If not, how can I modify my network so that all three dimensions are used?
How do I drop one dimension meaningfully to get the 2-d output that I desire?
Following is my current implementation in Tensorflow v1.14 and Python 3:
out1 = tf.layers.dense(inputs=noisy_data, units=150, activation=tf.nn.tanh) # Outputs [batch, 7, 150]
out2 = tf.layers.dense(inputs=out1, units=75, activation=tf.nn.tanh) # Outputs [batch, 7, 75]
out3 = tf.layers.dense(inputs=out2, units=32, activation=tf.nn.tanh) # Outputs [batch, 7, 32]
out4 = tf.layers.dense(inputs=out3, units=16, activation=tf.nn.tanh) # Outputs [batch, 7, 16]
Any help is appreciated. Thanks.
Answer to Question 1: The values along the 2nd dimension (axis=1) are not combined with each other by the dense layers. The output of the code snippet below (assuming batch_size=2) shows why:
>>> input1 = tf.placeholder(float, shape=[2,7,4])
>>> tf.layers.dense(inputs=input1, units=150, activation=tf.nn.tanh)
>>> graph = tf.get_default_graph()
>>> graph.get_collection('variables')
[<tf.Variable 'dense/kernel:0' shape=(4, 150) dtype=float32_ref>, <tf.Variable 'dense/bias:0' shape=(150,) dtype=float32_ref>]
The kernel has shape (4, 150), so the dense layer acts only on the last dimension: the same weights are applied independently to each of the 7 positions along the 2nd dimension, just as they are to each example along the 1st (batch) dimension. The official TensorFlow docs do not say anything about the required input shape.
Answer to Question 2: Reshape the input from [batch_size, 7, 4] to [batch_size, 28] with the line below before passing it to the first dense layer:
input1 = tf.reshape(input1, [-1, 7*4])
Answer to Question 3: If you reshape the inputs as above, there is no need to drop a dimension.
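The behaviour described in these answers can be reproduced without TensorFlow. The sketch below uses NumPy matrix products as a stand-in for tf.layers.dense (an assumption made for illustration; the weights are random and nothing is trained).

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 2
x = rng.standard_normal((batch_size, 7, 4))

# Dense layer on the 3-D input: the (4, 150) kernel acts on the last axis only.
kernel = rng.standard_normal((4, 150))
bias = np.zeros(150)
out_3d = np.tanh(x @ kernel + bias)
print(out_3d.shape)  # (2, 7, 150)

# Each of the 7 rows is transformed independently with the same weights.
row0 = np.tanh(x[:, 0, :] @ kernel + bias)
print(np.allclose(out_3d[:, 0, :], row0))  # True

# After flattening to [batch, 28], a (28, 150) kernel mixes all 28 values.
x_flat = x.reshape(-1, 7 * 4)
kernel_flat = rng.standard_normal((28, 150))
out_2d = np.tanh(x_flat @ kernel_flat + bias)
print(out_2d.shape)  # (2, 150)
```

With the flattened input, a final layer with 16 units would produce the desired [batch_size, 16] output directly.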

Tensorflow convolution

I'm trying to perform a convolution (conv2d) on images of variable dimensions. I have those images in the form of a 1-D array and I want to perform a convolution on them, but I'm having a lot of trouble with the shapes.
This is my code of the conv2d:
tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
where x is the input image.
The error is:
ValueError: Shape must be rank 4 but is rank 1 for 'Conv2D' (op: 'Conv2D') with input shapes: [1], [5,5,1,32].
I think I might reshape x, but I don't know the right dimensions. When I try this code:
x = tf.reshape(self.x, shape=[-1, 5, 5, 1]) # example
I get this:
ValueError: Dimension size must be evenly divisible by 25 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [1], [4] and with input tensors computed as partial shapes: input[1] = [?,5,5,1].
You can't use conv2d with a tensor of rank 1. Here's the description from the doc:
Computes a 2-D convolution given 4-D input and filter tensors.
These four dimensions are [batch, height, width, channels] (as Engineero already wrote).
If you don't know the dimensions of the image in advance, TensorFlow lets you declare a dynamic shape:
x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
with tf.Session() as session:
    # data: any array of shape [batch, height, width, 3]
    print(session.run(x, feed_dict={x: data}))
In this example, a 4-D tensor x is created, but only the number of channels is known statically (3); everything else is determined at runtime. So you can pass this x into conv2d even though its size is dynamic.
But there's another problem. You didn't describe your task, but if you're building a convolutional neural network, I'm afraid you'll need to know the size of the input to determine the size of the FC layer after all the pooling operations, and this size must be static. If this is the case, I think the best solution is to scale your inputs to a common size before passing them into the convolutional network.
UPD:
Since it wasn't clear, here's how you can reshape any image into a 4-D array.
import numpy as np
a = np.zeros([50, 178, 3])
shape = a.shape
print(shape)  # prints (50, 178, 3)
a = a.reshape([1] + list(shape))  # prepend a batch dimension of size 1
print(a.shape)  # prints (1, 50, 178, 3)
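Since the images in the question arrive as 1-D arrays, the reshape needs the height, width and channel count of each image. A small sketch, assuming those three values are known and that the flat array is stored in row-major [height, width, channels] order; it also shows two other common ways of adding the batch axis.

```python
import numpy as np

height, width, channels = 50, 178, 3  # assumed to be known for this image

# A flat image, standing in for the 1-D arrays mentioned in the question.
flat = np.arange(height * width * channels, dtype=np.float32)
image = flat.reshape(height, width, channels)

# Three equivalent ways to get a [1, height, width, channels] batch.
batch_a = image[np.newaxis, ...]
batch_b = np.expand_dims(image, axis=0)
batch_c = flat.reshape(1, height, width, channels)

print(batch_a.shape, batch_b.shape, batch_c.shape)
```

All three results have shape (1, 50, 178, 3) and hold the same values, so the choice is mostly a matter of readability.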

Implementing CNN with tensorflow

I'm new to convolutional neural networks and to TensorFlow, and I need to implement a conv layer with the following parameters:
Conv. layer1: filter=11, channel=64, stride=4, Relu.
The API is as follows:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None)
I understand what the stride is and that it should be [1, 4, 4, 1] in my case. But I do not understand how I should pass the filter and padding parameters.
Could someone help with it?
First, you need to create a filter variable:
W = tf.Variable(tf.truncated_normal(shape=[11, 11, 3, 64], stddev=0.1), dtype=tf.float32)
The first two entries of shape are the filter height and width, the third is the number of input channels (I assume your images have 3 channels), and the fourth is the number of output channels.
The output of the convolutional layer can then be computed as follows:
conv1 = tf.nn.conv2d(input, W, strides=[1, 4, 4, 1], padding='SAME')
Here padding='SAME' zero-pads the input so that the output height and width equal ceil(size / stride); the image size stays the same only when the stride is 1, so with stride 4 each spatial dimension shrinks by a factor of 4 (rounded up). input should have shape [batch, size1, size2, 3].
ReLU application is pretty straightforward:
conv1 = tf.nn.relu(conv1)
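To see what the padding argument does to the output size of this layer, the spatial size can be computed by hand. The sketch below uses the output-size formulas from the TensorFlow convolution documentation; the helper name conv_output_size and the 224-pixel input size are assumptions chosen for illustration.

```python
import math

def conv_output_size(size, kernel, stride, padding):
    """Output height or width of a 2-D convolution along one axis.

    Hypothetical helper: 'SAME' pads so that out = ceil(size / stride),
    'VALID' uses no padding, so out = ceil((size - kernel + 1) / stride).
    """
    if padding == "SAME":
        return math.ceil(size / stride)
    if padding == "VALID":
        return math.ceil((size - kernel + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")

# Filter 11x11 with stride 4, as in the question; 224 is an assumed input size.
for padding in ("SAME", "VALID"):
    out = conv_output_size(224, kernel=11, stride=4, padding=padding)
    print(padding, out)
```

Under these assumptions a 224x224 input gives a 56x56 output with 'SAME' and a 54x54 output with 'VALID', each with 64 channels.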
