I have a tensor of shape [5, 1, 3, 126, 126], which represents a video (5 frames, each a 126x126 RGB image).
I need to forward it through
self.resnet = nn.Sequential(
nn.Conv3d(5,5,1),
nn.UpsamplingBilinear2d(size=None, scale_factor=0.5)
)
but i get
RuntimeError: Given groups=1, weight of size [5, 5, 1, 1, 1], expected input[5, 1, 3, 126, 126] to have 5 channels, but got 1 channels instead
I think I have probably misunderstood how Conv3d works, but I can't really understand why the expected dimensions are so different from the ones my 5D tensor has at that point.
This is happening because the shape of your tensor is wrong. Conv3d expects its input dimensions in the order [batch_size, channels, frames, height, width]. That is why you are getting the error: the layer looks at dimension 1 of [5, 1, 3, 126, 126] and finds 1 channel instead of the 5 it was configured for. You should change the shape of your input tensor to [5, 3, 1, 126, 126]. Your Conv3d parameters are also wrong: the first argument should be the number of input channels the layer receives, which in your case is 3 because it is an RGB image. The second argument is the number of output channels, which you do not need to change.
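A minimal sketch of that fix, assuming the tensor layout from the question (the permute just moves the channel dimension into position 1):
import torch
import torch.nn as nn

video = torch.randn(5, 1, 3, 126, 126)      # tensor from the question

# Reorder to (N, C, D, H, W) = (5, 3, 1, 126, 126)
video = video.permute(0, 2, 1, 3, 4)

conv = nn.Conv3d(in_channels=3, out_channels=5, kernel_size=1)
print(conv(video).shape)                    # torch.Size([5, 5, 1, 126, 126])
Note that nn.UpsamplingBilinear2d only accepts 4D input, so the second layer in the question's nn.Sequential would also need adjusting before the whole module runs on a 5D tensor.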
My medical test PNG images have 3 channels, as shown below:
import cv2
from google.colab.patches import cv2_imshow
img= cv2.imread("a.png")
print('Image Dimensions :', img.shape)
img= cv2.imread("ax2.png")
print('Image Dimensions :', img.shape)
Results:
Image Dimensions : (625, 698, 3)
Image Dimensions : (426, 535, 3)
As shown, my test images have 3 channels, but I get the following error, which says that the images have 4 channels:
RuntimeError: Given groups=1, weight of size [3, 3, 1, 1], expected input[1, 4, 268, 300] to have 3 channels, but got 4 channels instead
What is the problem and how can I fix it?
thanks!
weight of size [3, 3, 1, 1] means that your Conv2d has an input channel size of 3 (second entry of the list).
As a hint: the weight size is [out_channels, in_channels, kernel, kernel]
input[1, 4, 268, 300] means that your input has channel size of 4. It should, however, be 3.
As a hint: input size is [N, in_channels, H_in, W_in]
Now you should check what the shape of the input fed to the network actually is. It might be that you forgot to convert it to the format mentioned above (cv2 uses a different channel order, [H, W, in_channels]), that you concatenated inputs incorrectly, or something similar. Checking the input shape should definitely help you here.
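One common source of an unexpected fourth channel (an assumption here, since the loading code for the failing input isn't shown) is a PNG being read with its alpha channel, giving RGBA instead of RGB. A small sketch of checking the shape and keeping only the first three channels before the conv layer:
import torch

# Hypothetical batch that arrived with 4 channels, as in the error message
x = torch.rand(1, 4, 268, 300)
print(x.shape)        # torch.Size([1, 4, 268, 300])

# Drop the extra (alpha) channel before feeding the network
x = x[:, :3, :, :]
print(x.shape)        # torch.Size([1, 3, 268, 300])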
I'm trying to get my head around 1D convolution - specifically, how the padding comes into it.
Suppose I have an input sequence of shape (batch,128,1) and run it through the following Keras layer:
tf.keras.layers.Conv1D(32, 5, strides=2, padding="same")
I get an output of shape (batch,64,32), but I don't understand why the sequence length has reduced from 128 to 64... I thought the padding="same" parameter kept the output length the same as the input? I suppose that's only true if strides=1; so in this case I'm confused about what padding="same" actually means.
According to the TensorFlow documentation, in your case we have:
filters (Number of filters - output dimension) = 32
kernel_size (the filter size) = 5
strides (how far the window moves along the input between successive applications of the kernel) = 2
So applying an input of shape (batch, 128, 1) means applying 32 kernels (each of size 5) and jumping two positions after each convolution, so we get 128 / 2 = 64 values per filter, and the output ends up with shape (batch, 64, 32).
padding="same" just determines how the convolution treats the borders: the input is padded so that the output length is ceil(input_length / stride), rather than being kept equal to the input length when strides > 1. For more details you can check the TensorFlow documentation.
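A quick sketch to verify those shapes (the batch size of 8 is arbitrary):
import tensorflow as tf

x = tf.zeros((8, 128, 1))   # (batch, length, channels)

same = tf.keras.layers.Conv1D(32, 5, strides=2, padding="same")(x)
valid = tf.keras.layers.Conv1D(32, 5, strides=2, padding="valid")(x)

print(same.shape)    # (8, 64, 32)  -> ceil(128 / 2)
print(valid.shape)   # (8, 62, 32)  -> floor((128 - 5) / 2) + 1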
I want to make a CNN model in PyTorch that can be fed images of different sizes. I am trying to use a 2D convolution layer, which takes a 4D input (PyTorch's Conv2d expects its 2D inputs to actually have 4 dimensions).
However, I'm not sure how to set up the input layer that can adjust all the variable sized images into fixed number of feature maps to pass over to remaining layers.
For example, the shape of the input for colored images is [4, 3, 32, 32], which corresponds to batch size, number of channels (RGB), height, and width. If the images are grayscale, the shape will be [4, 1, 32, 32], which produces an error because the shape is not what the layer expects.
Error message is "RuntimeError: Given groups=1, weight of size [6, 3, 5, 5], expected input[4, 1, 32, 32] to have 3 channels, but got 1 channels instead"
The architecture of my current CNN is like below.
def __init__(self, num_out, kernel_size, num_input_filters):
    super().__init__()
    self.num_input_filters = num_input_filters
    self.num_out = num_out
    self.kernel_size = kernel_size
    self.conv1 = nn.Conv2d(3, 6, 5)
    self.pool = nn.MaxPool2d(2, 2)

def forward(self, inp):
    inp = self.pool(F.relu(self.conv1(inp)))
    return inp
I have looked at similar questions, and fully convolutional networks (FCNs) have no limitation on the input size, which could be the solution. PyTorch provides ConvTranspose2d() for FCNs, but its parameters still seem to require a fixed input size.
Are there any methods that can solve this problem?
You can just convert the grayscale images to RGB by duplicating the single channel to three.
As in your example, shape [1, 32, 32] can be converted to [3, 32, 32] with the following code:
np.concatenate((images,)*3)
If the shape is [32, 32, 1], try
np.concatenate((images,)*3, axis=-1)
If the shape is [32, 32], then you can try the code below to convert it to [32, 32, 3]:
img_shape = tuple(np.ones(len(images.shape), dtype=int))      # (1, 1) for a 2-D image
img_shape += (3,)                                              # repeat 3 times along the new last axis
images = np.tile(np.expand_dims(images, axis=-1), img_shape)   # (32, 32) -> (32, 32, 3)
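If the grayscale data is already a PyTorch batch of shape [4, 1, 32, 32], as in the error message, a sketch of the same idea is simply repeating the single channel along the channel dimension:
import torch

gray = torch.rand(4, 1, 32, 32)     # grayscale batch from the error message
rgb = gray.repeat(1, 3, 1, 1)       # duplicate the channel dim -> [4, 3, 32, 32]
print(rgb.shape)                    # torch.Size([4, 3, 32, 32])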
I want my pytorch CNN to take as input a sequence of length SEQ_LEN of 32x32 RGB images concatenated along channels dimension. Therefore, a single input of the network has shape (32, 32, 3, SEQ_LEN). How should I define my CNN input layer?
The common way
SEQ_LEN = 10
input_conv = nn.Conv2d(in_channels=SEQ_LEN, out_channels=32, kernel_size=3)
BATCH_SIZE = 64
frames = np.random.randint(0, 255, size=(BATCH_SIZE, SEQ_LEN, 3, 32, 32))
frames_tensor = torch.tensor(frames)
input_conv(frames_tensor)
gives the error
RuntimeError: Expected 4-dimensional input for 4-dimensional weight [32, 10, 3, 3], but got 5-dimensional input of size [64, 10, 3, 32, 32] instead
Given your comments, it sounds like your data is not fit for a 2D convolutional neural network at all, and that a 3D one (Conv3d) would be more appropriate. As you can see from its documentation, its input shape is (N, C_in, D, H, W), which is what you would expect.
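A sketch of how the tensor from the question could be fed to a Conv3d, assuming the sequence dimension is treated as depth (the out_channels of 32 and kernel_size of 3 are just illustrative choices):
import torch
import torch.nn as nn

BATCH_SIZE, SEQ_LEN = 64, 10
frames = torch.randint(0, 255, (BATCH_SIZE, SEQ_LEN, 3, 32, 32)).float()

# Conv3d expects (N, C_in, D, H, W): channels second, the sequence as the depth dim
frames = frames.permute(0, 2, 1, 3, 4)                 # -> (64, 3, 10, 32, 32)

conv3d = nn.Conv3d(in_channels=3, out_channels=32, kernel_size=3)
print(conv3d(frames).shape)                            # torch.Size([64, 32, 8, 30, 30])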
I'm trying to perform a convolution (conv2d) on images of variable dimensions. I have those images in the form of a 1-D array, and I want to perform a convolution on them, but I'm having a lot of trouble with the shapes.
This is my code of the conv2d:
tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding='SAME')
where x is the input image.
The error is:
ValueError: Shape must be rank 4 but is rank 1 for 'Conv2D' (op: 'Conv2D') with input shapes: [1], [5,5,1,32].
I think I might reshape x, but I don't know the right dimensions. When I try this code:
x = tf.reshape(self.x, shape=[-1, 5, 5, 1]) # example
I get this:
ValueError: Dimension size must be evenly divisible by 25 but is 1 for 'Reshape' (op: 'Reshape') with input shapes: [1], [4] and with input tensors computed as partial shapes: input[1] = [?,5,5,1].
You can't use conv2d with a tensor of rank 1. Here's the description from the doc:
Computes a 2-D convolution given 4-D input and filter tensors.
These four dimensions are [batch, height, width, channels] (as Engineero already wrote).
If you don't know the dimensions of the image in advance, TensorFlow allows you to provide a dynamic shape:
x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')
with tf.Session() as session:
print(session.run(x, feed_dict={x: data}))
In this example, a 4-D tensor x is created, but only the number of channels is known statically (3); everything else is determined at runtime. So you can pass this x into conv2d, even if the size is dynamic.
But there's another problem. You didn't say what your task is, but if you're building a convolutional neural network, I'm afraid you'll need to know the size of the input to determine the size of the FC layer after all pooling operations - that size must be static. If this is the case, the best solution is probably to scale your inputs to a common size before passing them into the convolutional network.
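A minimal sketch of that resizing approach, using the same placeholder as above (the 224x224 target size is an arbitrary example):
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, None, None, 3], name='x')

# Resize every image to a fixed spatial size before the conv stack
x_fixed = tf.image.resize_images(x, [224, 224])   # shape: (?, 224, 224, 3)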
UPD:
Since it wasn't clear, here's how you can reshape any image into a 4-D array.
a = np.zeros([50, 178, 3])
shape = a.shape
print(shape)    # prints (50, 178, 3)
a = a.reshape([1] + list(shape))
print(a.shape)  # prints (1, 50, 178, 3)