Input fixed length sequence of frames to CNN - python

I want my PyTorch CNN to take as input a sequence of SEQ_LEN 32x32 RGB images concatenated along the channels dimension. Therefore, a single input of the network has shape (32, 32, 3, SEQ_LEN). How should I define my CNN input layer?
The common way:
import numpy as np
import torch
import torch.nn as nn
SEQ_LEN = 10
input_conv = nn.Conv2d(in_channels=SEQ_LEN, out_channels=32, kernel_size=3)
BATCH_SIZE = 64
frames = np.random.randint(0, 255, size=(BATCH_SIZE, SEQ_LEN, 3, 32, 32))
frames_tensor = torch.tensor(frames)
input_conv(frames_tensor)
gives the error
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 10, 3, 3], but got 5-dimensional input of size [64, 10, 3, 32, 32] instead

Given your comments, it sounds like your data is not a good fit for a 2D convolutional neural network at all, and that a 3D one (Conv3d) would be more appropriate. As you can see from its documentation, its input shape is what you would expect.
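For example, a minimal sketch, assuming the frames are permuted to the channels-first layout that Conv3d expects, i.e. (batch, channels, seq_len, height, width):
import torch
import torch.nn as nn
SEQ_LEN = 10
BATCH_SIZE = 64
# Conv3d expects (N, C_in, D, H, W); the sequence length plays the role of the depth D
input_conv = nn.Conv3d(in_channels=3, out_channels=32, kernel_size=3)
frames = torch.rand(BATCH_SIZE, SEQ_LEN, 3, 32, 32)
frames = frames.permute(0, 2, 1, 3, 4)  # (64, 10, 3, 32, 32) -> (64, 3, 10, 32, 32)
out = input_conv(frames)
print(out.shape)  # torch.Size([64, 32, 8, 30, 30])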

Related

CNN for variable sized images in pytorch

I want to make a CNN model in PyTorch which can be fed images of different sizes. I am trying to use a 2D convolution layer, which takes a 4D input (PyTorch's Conv2d expects its 2D inputs to actually have 4 dimensions).
However, I'm not sure how to set up an input layer that can adjust all the variable-sized images into a fixed number of feature maps to pass on to the remaining layers.
For example, the shape of the input for color images is [4, 3, 32, 32], which corresponds to batch size, number of channels (RGB), height, and width. If the images are grayscale, the shape will be [4, 1, 32, 32], which produces an error because it is not what the layer expects.
The error message is "RuntimeError: Given groups=1, weight of size [6, 3, 5, 5], expected input[4, 1, 32, 32] to have 3 channels, but got 1 channels instead"
The architecture of my current CNN is as follows:
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):  # class name assumed; the original snippet starts at __init__
    def __init__(self, num_out, kernel_size, num_input_filters):
        super().__init__()
        self.num_input_filters = num_input_filters
        self.num_out = num_out
        self.kernel_size = kernel_size
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
    def forward(self, inp):
        inp = self.pool(F.relu(self.conv1(inp)))
        return inp
I have looked at similar questions; fully convolutional networks (FCNs) have no limitation on the input size at all, which could be the solution. PyTorch provides ConvTranspose2d() for FCNs, but its parameters still seem to require a fixed input size.
Are there any methods that can solve this problem?
You can just convert the grayscale images to RGB by duplicating the single channel three times.
In your example, a shape of [1, 32, 32] can be converted to [3, 32, 32] with the following code:
np.concatenate((images,)*3)
If the shape is [32, 32, 1], try
np.concatenate((images,)*3, axis=-1)
If the shape is [32, 32], then you can use the code below to convert it to [32, 32, 3]:
img_shape = tuple(np.ones(len(images.shape), dtype=int))
img_shape += (3,)
images = np.tile(np.expand_dims(images, axis=-1), img_shape)
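Since the question's batches are PyTorch tensors of shape [4, 1, 32, 32], here is a minimal sketch of the same idea applied directly to a tensor (the variable names are only illustrative):
import torch
gray_batch = torch.rand(4, 1, 32, 32)         # [batch, 1, H, W] grayscale images
rgb_view = gray_batch.expand(-1, 3, -1, -1)   # [batch, 3, H, W], channel duplicated without copying
rgb_copy = gray_batch.repeat(1, 3, 1, 1)      # same shape, but with an actual copy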

How to specify input_shape in Conv2D layer in Tensor Flow 2.0, keras

I'm building an image classifier model which classifies handwritten digits (the MNIST 28x28 grayscale images) using a CNN.
Here is my layer definition:
from tensorflow import keras
model = keras.Sequential()
model.add(keras.layers.Conv2D(64,(3,3),activation='relu',input_shape=(28,28,1)))
model.add(keras.layers.MaxPool2D((2,2)))
model.add(keras.layers.Conv2D(64,(3,3),activation='relu'))
model.add(keras.layers.MaxPool2D((2,2)))
model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(200,activation='relu'))
model.add(keras.layers.Dense(10,activation='softmax'))
But I get this error when I fit the model:
ValueError: Input 0 of layer sequential_6 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: [32, 28, 28]
I also want to know why we should mention 1 in input_shape in the Conv2D layer. The image shape is 28x28, but we have to mention 1 there.
The minimal change that should work is to change the line:
model.add(keras.layers.Conv2D(64,(3,3),activation='relu',input_shape=(28,28,1)))
to this, dropping the 1:
model.add(keras.layers.Conv2D(64,(3,3),activation='relu',input_shape=(28,28)))
The reason you get the error is that your input images are 28x28 and the batch you feed into the network contains 32 images, hence an array of dimension [32, 28, 28]. Unfortunately I don't see how you feed the input to the network, but what your current code expects is an array of dimension [32, 28, 28, 1]. If that's a NumPy array that you can manipulate, just reshape() it to that dimension to solve the problem.
What I suggested above is the other way round: ask the network to expect each image as a 2D array of dimension [28,28] instead of a 3D array of dimension [28,28,1].
Update:
You provided the following code change that made it work:
train_image=train_image.reshape(60000, 28, 28, 1)
train_image=train_image / 255.0
test_image = test_image.reshape(10000, 28, 28, 1)
test_image=test_image/255.0
What this does: your input images are in a single huge NumPy array and you fit your model with it directly. The model's fit function selects "tensors" from this array along its first dimension and creates a batch for each training step. The default batch size is 32, so it implicitly creates an array of shape (32, 28, 28, 1) and passes it down the layers. The 2nd to 4th dimensions are simply copied from the original array.
The reshape() command changes the dimensions of the array. Your original array before the reshape was (60000, 28, 28); if you lay it out as a single sequence of numbers, there are 60000x28x28 floats. What reshape() does is pick up these numbers and fill them into a (60000, 28, 28, 1) array, which expects 60000x28x28x1 numbers, so it can be filled exactly.
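As a side note, the same result can be obtained with np.expand_dims, which avoids spelling out the dataset size; a minimal sketch, assuming train_image and test_image are the NumPy arrays from the code above:
import numpy as np
train_image = np.expand_dims(train_image, axis=-1)  # (60000, 28, 28) -> (60000, 28, 28, 1)
test_image = np.expand_dims(test_image, axis=-1)    # (10000, 28, 28) -> (10000, 28, 28, 1)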

Multi-dimension input to a neural network

I have a neural network with many layers. The input to the neural network has dimension [batch_size, 7, 4]. When this input is passed through the network, I observed that only the third dimension of the input keeps changing; that is, if my first layer has 20 outputs, then the output of that layer is [batch_size, 7, 20]. I need the end result after many layers to be of shape [batch_size, 16].
I have the following questions:
Are the other two dimensions being used at all?
If not, how can I modify my network so that all three dimensions are used?
How do I drop one dimension meaningfully to get the 2-d output that I desire?
Following is my current implementation in Tensorflow v1.14 and Python 3:
out1 = tf.layers.dense(inputs=noisy_data, units=150, activation=tf.nn.tanh) # Outputs [batch, 7, 150]
out2 = tf.layers.dense(inputs=out1, units=75, activation=tf.nn.tanh) # Outputs [batch, 7, 75]
out3 = tf.layers.dense(inputs=out2, units=32, activation=tf.nn.tanh) # Outputs [batch, 7, 32]
out4 = tf.layers.dense(inputs=out3, units=16, activation=tf.nn.tanh) # Outputs [batch, 7, 16]
Any help is appreciated. Thanks.
Answer to Question 1: The data values along the 2nd dimension (axis=1) are not being mixed together, as you can see from the output of the code snippet below (assuming batch_size=2):
>>> import tensorflow as tf
>>> input1 = tf.placeholder(tf.float32, shape=[2, 7, 4])
>>> tf.layers.dense(inputs=input1, units=150, activation=tf.nn.tanh)
>>> graph = tf.get_default_graph()
>>> graph.get_collection('variables')
[<tf.Variable 'dense/kernel:0' shape=(4, 150) dtype=float32_ref>, <tf.Variable 'dense/bias:0' shape=(150,) dtype=float32_ref>]
you can see that the dense layer's kernel has shape (4, 150), so it acts only on the last dimension and does not mix values along the 2nd dimension. The values along the 1st dimension are treated the same way, as part of the batch, although the official TensorFlow docs don't say anything about the required input shape.
Answer to Question 2: Reshape the input [batch_size, 7, 4] to [batch_size, 28] by using the below line of code before passing the input to the first dense layer:
input1 = tf.reshape(input1, [-1, 7*4])
Answer to Question 3: If you reshape the inputs as above, there is no need to drop a dimension.
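Putting it together with the layers from the question, a minimal sketch of the modified network in the same TensorFlow 1.14 style (the placeholder is only illustrative):
import tensorflow as tf
noisy_data = tf.placeholder(tf.float32, shape=[None, 7, 4])
# flatten the last two dimensions so every dense layer mixes all 7*4 input values
flat = tf.reshape(noisy_data, [-1, 7 * 4])                               # [batch, 28]
out1 = tf.layers.dense(inputs=flat, units=150, activation=tf.nn.tanh)    # [batch, 150]
out2 = tf.layers.dense(inputs=out1, units=75, activation=tf.nn.tanh)     # [batch, 75]
out3 = tf.layers.dense(inputs=out2, units=32, activation=tf.nn.tanh)     # [batch, 32]
out4 = tf.layers.dense(inputs=out3, units=16, activation=tf.nn.tanh)     # [batch, 16]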

reshape a matrix from [?, 100] to [batch_size, ?, 100]

I'm building an autoencoder based on an RNN. After the FC layer, I have to reshape my output to [batch_size, sequence_length, embedding_dimension]. However, the sequence length (timestep) for my decoder is not known in advance. What I want is something that works as follows:
outputs = tf.reshape(outputs, [batch_size, None, word_dimension])
Or, is there any other way for me to get the sequence length from the input data, which has shape [batch_size, sequence_length, embedding_dimension]?
You can use -1 for the dimension in your reshape operation that you want to be calculated automatically.
For example, here:
import tensorflow as tf
x = tf.zeros((100 * 10 * 12,))
reshaped = tf.reshape(x, [100, -1, 12])
reshaped will have shape (100, 10, 12).
Or, is there any other way for me to get the sequence length from the input data, which has shape [batch_size, sequence_length, embedding_dimension]?
You can use the tf.shape operation to find the shape of a tensor at runtime, so if you want sequence_length from a tensor with shape [batch_size, sequence_length, embedding_dimension], you just need to call tf.shape(x)[1].
For my example above, calling:
tf.shape(reshaped)[1]
would give an int32 tensor with shape () and value 10.
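Applied to the question, a minimal sketch (fc_output, batch_size and word_dimension are placeholder names for the poster's tensors):
import tensorflow as tf
batch_size = 32
word_dimension = 100
fc_output = tf.zeros((batch_size * 25, word_dimension))             # hypothetical FC output, flattened over time
outputs = tf.reshape(fc_output, [batch_size, -1, word_dimension])   # -1 lets TensorFlow infer the sequence length
seq_len = tf.shape(outputs)[1]                                       # dynamic sequence length, a scalar int32 tensor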

Tensorflow convolution

I'm trying to perform a convolution (conv2d) on images of variable dimensions. I have those images in the form of a 1-D array, and I want to perform a convolution on them, but I'm having a lot of trouble with the shapes.
This is my code of the conv2d:
tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
where x is the input image.
The error is:
ValueError: Shape must be rank 4 but is rank 1 for 'Conv2D' (op: 'Conv2D') with input shapes: [1], [5,5,1,32].
I think I need to reshape x, but I don't know the right dimensions. When I try this code:
x = tf.reshape(self.x, shape=[-1, 5, 5, 1]) # example
I get this:
ValueError: Dimension size must be evenly divisible by 25 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [1], [4] and with input tensors computed as partial shapes: input[1] = [?,5,5,1].
You can't use conv2d with a tensor of rank 1. Here's the description from the doc:
Computes a 2-D convolution given 4-D input and filter tensors.
These four dimensions are [batch, height, width, channels] (as Engineero already wrote).
If you don't know the dimensions of the image in advance, TensorFlow allows you to provide a dynamic shape:
import tensorflow as tf
x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
with tf.Session() as session:
    print(session.run(x, feed_dict={x: data}))
In this example, a 4-D tensor x is created, but only the number of channels is known statically (3); everything else is determined at runtime. So you can pass this x into conv2d, even if the size is dynamic.
But there's another problem. You didn't say what your task is, but if you're building a convolutional neural network, I'm afraid you'll need to know the size of the input to determine the size of the FC layer after all pooling operations - this size must be static. If this is the case, I think the best solution is actually to scale your inputs to a common size before passing them into the network.
UPD:
Since it wasn't clear, here's how you can reshape any image into a 4-D array:
import numpy as np
a = np.zeros([50, 178, 3])
shape = a.shape
print(shape)    # prints (50, 178, 3)
a = a.reshape([1] + list(shape))
print(a.shape)  # prints (1, 50, 178, 3)
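For completeness, a minimal sketch showing that the dynamically shaped placeholder really can be fed into conv2d (the filter variable w and the zero image are only illustrative):
import numpy as np
import tensorflow as tf
x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
w = tf.Variable(tf.truncated_normal([5, 5, 3, 32], stddev=0.1))  # 5x5 filters, 3 -> 32 channels
conv = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
with tf.Session() as session:
    session.run(tf.global_variables_initializer())
    image = np.zeros([1, 50, 178, 3], dtype=np.float32)           # any height/width works
    print(session.run(conv, feed_dict={x: image}).shape)          # (1, 50, 178, 32)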
