I am having trouble understanding how 2D convolution calculations are done on 4D inputs. This is the situation: I have an image with height, width, channels = 128, 128, 103. I want each of these 103 channels to be processed individually, as if I were inputting them to the network one by one. Would the following line work?
import tensorflow.keras
from tensorflow.keras.layers import Conv2D
model1 = tensorflow.keras.models.Sequential()
model1.add(Conv2D(1, kernel_size=(3,3), input_shape = (128, 128,103,1), padding='same'))
I want to avoid splitting the image and feeding it into the network as 103 batches of shape (128, 128, 1).
As explained in the documentation: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D?version=nightly
4+D tensor with shape: batch_shape + (channels, rows, cols) if data_format='channels_first' or
4+D tensor with shape: batch_shape + (rows, cols, channels) if data_format='channels_last'.
(by default: data_format='channels_last'.)
You are passing a 5D tensor of shape (batch_shape, 128, 128, 103, 1).
I suggest you reshape your tensor so that it has a shape like (None, 128, 128, 103).
Also, please change input_shape=(128, 128, 103, 1) to input_shape=(128, 128, 103).
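A minimal sketch of the model after the suggested change (not from the original answer): the 103 bands become the channel dimension of a standard 4D (batch, rows, cols, channels) input, so no trailing dimension of 1 is needed.
import tensorflow as tf
from tensorflow.keras.layers import Conv2D

model1 = tf.keras.models.Sequential()
model1.add(Conv2D(1, kernel_size=(3, 3), input_shape=(128, 128, 103), padding='same'))
model1.summary()  # last layer's output shape: (None, 128, 128, 1)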
I have a ResNet9 model, implemented in PyTorch, which I am using for multi-class image classification. My total number of classes is 6. Using the following code from the torchsummary library, I am able to show the summary of the model, seen in the attached image:
INPUT_SHAPE = (3, 256, 256) #input shape of my image
print(summary(model.cuda(), (INPUT_SHAPE)))
However, I am quite confused about the -1 values in all layers of the ResNet9 model. Also, for the Conv2d-1 layer, I am confused about the 64 value in the output shape [-1, 64, 256, 256], as I believe the n_channels value of the input image is 3. Can anyone please help me with an explanation of the output shape values? Thanks!
Yes, your INPUT_SHAPE is torch.Size([3, 256, 256]) in channels-first format and (256, 256, 3) in channels-last format.
Since PyTorch models accept input in channels-first format, it shows torch.Size([3, 256, 256]) for you.
As for the output shape [-1, 64, 256, 256]: this is the output shape of your first conv layer, which has 64 filters, each producing a 256x256 feature map; it is not your input shape.
-1 represents the variable batch size, which can be fixed in the DataLoader.
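A hedged sketch (not from the original answer) of why the first row reads [-1, 64, 256, 256]: a Conv2d layer with 64 filters, a 3x3 kernel, and padding=1 maps a 3-channel 256x256 input to a 64-channel 256x256 output, and the leading dimension is the batch size.
import torch
import torch.nn as nn

conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
x = torch.randn(8, 3, 256, 256)   # batch of 8 RGB images, channels-first
print(conv1(x).shape)             # torch.Size([8, 64, 256, 256])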
In TensorFlow's TimeDistributed documentation, there is an example:
inputs = tf.keras.Input(shape=(10, 128, 128, 3))
conv_2d_layer = tf.keras.layers.Conv2D(64, (3, 3))
outputs = tf.keras.layers.TimeDistributed(conv_2d_layer)(inputs)
outputs.shape
And the output is TensorShape([None, 10, 126, 126, 64]). How come it ends up with this shape?
Consider a batch of 32 video samples, where each sample is a 128x128 RGB image, across 10 timesteps. The batch input shape is (32, 10, 128, 128, 3).
You can then use TimeDistributed to apply the same Conv2D layer to each of the 10 timesteps independently, as shown below:
inputs = tf.keras.Input(shape=(10, 128, 128, 3))
conv_2d_layer = tf.keras.layers.Conv2D(64, (3, 3))
outputs = tf.keras.layers.TimeDistributed(conv_2d_layer)(inputs)
outputs.shape
Output:
TensorShape([None, 10, 126, 126, 64])
And the output is TensorShape([None, 10, 126, 126, 64]). How come it ends up with this shape?
Here is the formula to calculate the output size of a convolution layer: output_size = (input_size + 2 * padding - kernel_size) / stride + 1.
In this case, the input is 128 along one spatial dimension, the kernel size is 3, the padding is 0, and the stride is 1 (the defaults when not provided), so the output size is (128 + 2*0 - 3)/1 + 1 = 126.
The same applies to the other spatial dimension, so each filter produces a 126x126 feature map.
Applying 64 such filters gives 126x126x64, and across the 10 time steps the output shape becomes [None, 10, 126, 126, 64]. Here None is the batch size.
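Here is a small Python sketch of the same arithmetic (the helper name conv_output_size is made up for illustration):
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    # (input + 2*padding - kernel) / stride + 1
    return (input_size + 2 * padding - kernel_size) // stride + 1

print(conv_output_size(128, 3))  # 126, for both spatial dimensions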
I'm building a CNN (a binary image classifier) that classifies greyscale CT images as positive or negative for pneumonia, but when I run the model it gives this error: ValueError: Error when checking input: expected conv2d_input to have shape (128, 128, 1) but got array with shape (128, 128, 3)
Here is the first Conv2D layer:
model.add(Conv2D(input_shape=(128,128,1), filters=64, kernel_size=(2, 2), padding="same", activation="relu"))
I changed my input shape to (128, 128, 1), so I don't understand what the issue is. If any extra code is needed to help gain further insight into the issue, please let me know.
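A hedged illustration of where the mismatch can come from (not from the original post; the filename is hypothetical): the error means the arrays being fed to the model have 3 channels, which typically happens when images are loaded without forcing grayscale, even for CT scans saved as RGB files.
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array

img = img_to_array(load_img('ct_scan.png', target_size=(128, 128)))  # default color_mode='rgb'
print(img.shape)        # (128, 128, 3) -- does not match input_shape=(128, 128, 1)
img_gray = img_to_array(load_img('ct_scan.png', target_size=(128, 128),
                                 color_mode='grayscale'))
print(img_gray.shape)   # (128, 128, 1)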
For part of an embedded project, I trained a network in Tensorflow, and now I'm reloading the variables in a Numpy/Scipy-based model script. However, I am unclear on how to redo the conv2d steps with the weights I have.
I've looked at this link: Difference between Tensorflow convolution and numpy convolution,
but I haven't made the connection to a problem where the weights are four-dimensional.
This is my Tensorflow code:
# input shape: (1, 224, 224, 1)
weight1 = tf.Variable(tf.truncated_normal([3, 3, 1, 16], stddev=stddev))
conv1 = tf.nn.conv2d(input, weight1, strides=[1, 1, 1, 1], padding='SAME')
# conv1 shape: (1, 224, 224, 16)
weight2 = tf.Variable(tf.truncated_normal([3, 3, 16, 32], stddev=stddev))
conv2 = tf.nn.conv2d(conv1, weight2, strides=[1, 1, 1, 1], padding='SAME')
# conv2 shape: (1, 224, 224, 32)
And when I try to use the convolve functions from the SciPy or NumPy libraries, the output dimensions are incorrect:
from scipy.ndimage.filters import convolve
conv1 = convolve(input, weight1[::-1])
# conv1 shape: (1, 224, 224, 1)
conv2 = convolve(conv1, weight2[::-1])
# conv2 shape: (1, 224, 224, 16)
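For reference, here is a minimal NumPy/SciPy sketch, not from the original question, of how the four-dimensional weights [kernel_height, kernel_width, in_channels, out_channels] map onto per-channel 2D cross-correlations summed over the input channels, assuming stride 1 and 'SAME' padding (the helper name conv2d_same is made up for illustration):
import numpy as np
from scipy.signal import correlate2d

def conv2d_same(x, w):
    # x: (1, H, W, in_ch), w: (kh, kw, in_ch, out_ch)
    _, H, W, in_ch = x.shape
    out_ch = w.shape[3]
    out = np.zeros((1, H, W, out_ch), dtype=x.dtype)
    for o in range(out_ch):
        for i in range(in_ch):
            # tf.nn.conv2d performs cross-correlation (no kernel flip),
            # which is why plain convolve needs the weights reversed
            out[0, :, :, o] += correlate2d(x[0, :, :, i], w[:, :, i, o], mode='same')
    return out

x = np.random.rand(1, 224, 224, 1).astype(np.float32)
w1 = np.random.rand(3, 3, 1, 16).astype(np.float32)
print(conv2d_same(x, w1).shape)  # (1, 224, 224, 16)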
Below is a piece of example code from the Keras documentation. It looks like the first convolution accepts a 256x256 image with 3 color channels. It has 64 output filters (I think these are the same as the feature maps I have read about elsewhere; can someone confirm this for me?). What confuses me is that the output size is (None, 64, 256, 256). I would expect it to be (None, 64 * 3, 256, 256), since it would need to do convolutions for each of the color channels. What I am wondering is how Keras handles the color channels. Do the values get averaged together (converted to greyscale) before passing through the convolution?
# apply a 3x3 convolution with 64 output filters on a 256x256 image:
model = Sequential()
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 256, 256)))
# now model.output_shape == (None, 64, 256, 256)
# add a 3x3 convolution on top, with 32 output filters:
model.add(Convolution2D(32, 3, 3, border_mode='same'))
# now model.output_shape == (None, 32, 256, 256)
A filter of size 3x3 with 3 input channels consists of 3*3*3 = 27 parameters, so the convolution kernel weights for each channel are different.
It sums up the convolution results of each channel (probably together with a bias term) to get the output, so the output shape is independent of the number of input channels: (None, 64, 256, 256) rather than (None, 64 * 3, 256, 256).
I'm not 100% sure, but I think a feature map refers to the output of applying one such filter to the input (for example, a 256x256 matrix).
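A hedged NumPy/SciPy sketch of that per-channel summation (not from the original answer; shapes follow the channels-first example above): each of the 3 input channels is cross-correlated with its own 3x3 slice of the kernel, and the three results are summed into a single feature map.
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
image = rng.random((3, 256, 256))   # channels-first, as in input_shape=(3, 256, 256)
kernel = rng.random((3, 3, 3))      # one filter: (in_channels, kh, kw)

feature_map = sum(correlate2d(image[c], kernel[c], mode='same') for c in range(3))
print(feature_map.shape)            # (256, 256) -- one feature map, not three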