I currently have a tensor of torch.Size([1, 3, 256, 224]) but I need it to be input shape [32, 3, 256, 224]. I am capturing data in real-time so dataloader doesn't seem to be a good option. Is there any easy way to take 32 of size torch.Size([1, 3, 256, 224]) and combine them to create 1 tensor of size [32, 3, 256, 224]?
You are probably using a JIT model, and the batch size must exactly match the one the model was trained with.
t = torch.rand(1, 3, 256, 224)
t.size() # torch.Size([1, 3, 256, 224])
t2 = t.expand(32, -1, -1, -1)
t2.size() # torch.Size([32, 3, 256, 224])
Expanding a tensor does not allocate new memory; it only creates a new view on the existing tensor, so you get what you need. Only the tensor's stride is changed.
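As a quick check (a minimal sketch): the expanded tensor really is a view with a zero batch stride. And if the 32 real-time captures are genuinely different tensors rather than copies of one, torch.cat (or torch.stack) builds a real [32, 3, 256, 224] batch instead:
import torch

t = torch.rand(1, 3, 256, 224)
t2 = t.expand(32, -1, -1, -1)
print(t2.stride())                    # (0, 57344, 224, 1) -- batch stride is 0, no copy
print(t2.data_ptr() == t.data_ptr())  # True: same underlying storage

# If the 32 captures are distinct frames, concatenate them along dim 0 instead:
frames = [torch.rand(1, 3, 256, 224) for _ in range(32)]
batch = torch.cat(frames, dim=0)
print(batch.shape)                    # torch.Size([32, 3, 256, 224])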
To reduce the number of feature maps in a 4D tensor, it is possible to use Conv2d while keeping the height and width of that tensor the same as before. To be clearer: for a 4D tensor [batch_size, feature_maps, height, width], I use the following approach to reduce the number of feature maps while the height and width stay the same:
self.channel_reduction = nn.Conv2d(1024, 512, kernel_size=1, stride=1)
But I have the following 5D tensor [batch_size, feature_maps, num_frames, height, width] (e.g. [16, 1024, 16, 56, 56]) and I want to reduce the number of feature maps from 1024 to 512 while keeping the num_frames, height and width the same as before (i.e. [16, 512, 16, 56, 56]). How can I reduce the number of feature maps?
self.channel_reduction = nn.Conv3d(1024, 512, kernel_size=1, stride=1)
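A minimal sketch showing the shapes (with a batch of 2 instead of 16 to keep the example light; the larger batch behaves identically):
import torch
import torch.nn as nn

# A 1x1x1 Conv3d mixes channels only and leaves num_frames, height and width untouched.
channel_reduction = nn.Conv3d(1024, 512, kernel_size=1, stride=1)

x = torch.rand(2, 1024, 16, 56, 56)   # [batch, feature_maps, num_frames, H, W]
y = channel_reduction(x)
print(y.shape)                        # torch.Size([2, 512, 16, 56, 56])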
I have a ResNet9 model, implemented in PyTorch, which I am using for multi-class image classification. My total number of classes is 6. Using the following code from the torchsummary library, I am able to show the summary of the model, seen in the attached image:
INPUT_SHAPE = (3, 256, 256) #input shape of my image
print(summary(model.cuda(), (INPUT_SHAPE)))
However, I am quite confused about the -1 values in all layers of the ResNet9 model. Also, for Conv2d-1 layer, I am confused about the 64 value in the output shape [-1, 64, 256, 256] as I believe the n_channels value of the input image is 3. Can anyone please help me with the explanation of the output shape values? Thanks!
Yes, your INPUT_SHAPE is torch.Size([3, 256, 256]) in channels-first format and (256, 256, 3) in channels-last format.
Since PyTorch models accept input in channels-first format, it shows torch.Size([3, 256, 256]) for you.
As for the output shape [-1, 64, 256, 256]: this is the output of your first conv layer, which has 64 filters, each producing a 256x256 feature map; it is not your input shape.
The -1 represents the variable batch size, which can be fixed in the DataLoader.
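A minimal sketch of that first layer, assuming a 3x3 convolution with padding=1 (consistent with the spatial size staying at 256 in the summary):
import torch
import torch.nn as nn

# 3 input channels -> 64 output channels; padding=1 keeps H and W at 256
conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

x = torch.rand(8, 3, 256, 256)   # any batch size works -- this is the "-1" in the summary
print(conv1(x).shape)            # torch.Size([8, 64, 256, 256])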
In TensorFlow's TimeDistributed documentation, there is an example:
inputs = tf.keras.Input(shape=(10, 128, 128, 3))
conv_2d_layer = tf.keras.layers.Conv2D(64, (3, 3))
outputs = tf.keras.layers.TimeDistributed(conv_2d_layer)(inputs)
outputs.shape
And the output is TensorShape([None, 10, 126, 126, 64]). How come it ends up with this shape?
Consider a batch of 32 video samples, where each sample is a 128x128 RGB image, across 10 timesteps. The batch input shape is (32, 10, 128, 128, 3).
You can then use TimeDistributed to apply the same Conv2D layer to each of the 10 timesteps independently, as shown below:
inputs = tf.keras.Input(shape=(10, 128, 128, 3))
conv_2d_layer = tf.keras.layers.Conv2D(64, (3, 3))
outputs = tf.keras.layers.TimeDistributed(conv_2d_layer)(inputs)
outputs.shape
Output:
TensorShape([None, 10, 126, 126, 64])
Here is the formula to calculate the output size of a convolution layer: output = (input + 2*padding - kernel_size) / stride + 1.
In this case, you have 128 input features, a convolution kernel size of 3, a padding of 0 and a stride of 1 (the defaults when not provided), so the output size is (128 + 2*0 - 3)/1 + 1 = 126.
The same applies to the other spatial dimension, so you now have a 126x126 map for one filter.
Applying this for all 64 filters gives another dimension, 126x126x64, and the output shape across the 10 timesteps is [None, 10, 126, 126, 64]. Here None is the batch_size.
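The same arithmetic as a tiny helper, just to make the formula concrete:
def conv_output_size(n_in, kernel, padding=0, stride=1):
    """Standard convolution output-size formula."""
    return (n_in + 2 * padding - kernel) // stride + 1

print(conv_output_size(128, kernel=3))   # 126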
Recently I have been trying to understand TensorFlow's tf.nn.conv2d_transpose, but I have a hard time understanding its input parameters. It is defined as:
tf.nn.conv2d_transpose(value, filter, output_shape, strides, padding='SAME')
For example, let's say I have an image of size [batch_size, 7, 7, 128] and want to transform it to [batch_size, 14, 14, 64]. Then output_shape=[batch_size, 14, 14, 64] and strides=[2,2], but I can't figure out how to get the shape of the filter. Any thoughts?
Furthermore, how does padding="SAME" work for conv2d_transpose? Is it applied to the output image or the input?
For the first question on filter shapes, I'd use the object oriented version tf.layers.Conv2DTranspose and look at the kernel property to figure out the filter shapes:
>>> import tensorflow as tf
>>> l = tf.layers.Conv2DTranspose(filters=64, kernel_size=1, padding='SAME', strides=[2, 2])
>>> l(tf.ones([12, 7, 7, 128]))
<tf.Tensor 'conv2d_transpose/BiasAdd:0' shape=(12, 14, 14, 64) dtype=float32>
>>> l.kernel
<tf.Variable 'conv2d_transpose/kernel:0' shape=(1, 1, 64, 128) dtype=float32_ref>
>>>
On the second question about padding: conv2d_transpose computes the gradient of conv2d. Since conv2d pads its input, conv2d_transpose needs to pad its output to fit that gradient.
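Going back to the filter-shape part of the question, here is a minimal sketch using tf.nn.conv2d_transpose directly with the TF2 API; the kernel size of 3 is an arbitrary choice. The key point is the filter layout [kernel_height, kernel_width, output_channels, input_channels], with output channels before input channels:
import tensorflow as tf

batch_size = 12
x = tf.random.normal([batch_size, 7, 7, 128])

# Filter layout for conv2d_transpose: [kernel_h, kernel_w, out_channels, in_channels]
filters = tf.random.normal([3, 3, 64, 128])

y = tf.nn.conv2d_transpose(
    x,
    filters,
    output_shape=[batch_size, 14, 14, 64],
    strides=[2, 2],
    padding='SAME')

print(y.shape)   # (12, 14, 14, 64)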
Below is a piece of example code from the documentation in Keras. It looks like the first convolution accepts a 256x256 image with 3 color channels. It has 64 output filters (I think these are the same as the feature maps I have read about elsewhere; can someone confirm this for me?). What confuses me is that the output size is (None, 64, 256, 256). I would expect it to be (None, 64 * 3, 256, 256), since it would need to do convolutions for each of the color channels. What I am wondering is how Keras handles the color channels. Do the values get averaged together (converted to grayscale) before passing through the convolution?
# apply a 3x3 convolution with 64 output filters on a 256x256 image:
model = Sequential()
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 256, 256)))
# now model.output_shape == (None, 64, 256, 256)
# add a 3x3 convolution on top, with 32 output filters:
model.add(Convolution2D(32, 3, 3, border_mode='same'))
# now model.output_shape == (None, 32, 256, 256)
A filter of size 3x3 with 3 input channels consists of 3x3x3 parameters, so the weights of the convolution kernel are different for each channel.
The convolution results of the channels are summed up (together with a bias term) to produce a single output map, so the output shape is independent of the number of input channels: (None, 64, 256, 256) rather than (None, 64 * 3, 256, 256).
I'm not 100% sure, but I think a feature map refers to the output of applying one such filter to the input (for example, a 256x256 matrix).
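A quick sanity check of the weight shapes with the current tf.keras API (the question's snippet uses the older channels-first Keras API, but the parameter count is the same):
import tensorflow as tf

# Modern analogue of Convolution2D(64, 3, 3, border_mode='same')
layer = tf.keras.layers.Conv2D(64, (3, 3), padding='same')
layer.build((None, 256, 256, 3))   # channels-last: 3 input channels

print(layer.kernel.shape)    # (3, 3, 3, 64): one 3x3x3 kernel per output filter
print(layer.bias.shape)      # (64,)
print(layer.count_params())  # 3*3*3*64 + 64 = 1792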