Meaning of output shapes of ResNet9 model layers - python

I have a ResNet9 model, implemented in PyTorch, which I am using for multi-class image classification. My total number of classes is 6. Using the following code, with the summary function from the torchsummary library, I am able to print the summary of the model (shown in the attached image):
from torchsummary import summary

INPUT_SHAPE = (3, 256, 256)  # input shape of my image
print(summary(model.cuda(), INPUT_SHAPE))
However, I am quite confused about the -1 values in all layers of the ResNet9 model. Also, for the Conv2d-1 layer, I am confused by the 64 in the output shape [-1, 64, 256, 256], as I believe the number of channels of the input image is 3. Can anyone please help me with an explanation of the output shape values? Thanks!

Yes. Your INPUT_SHAPE is torch.Size([3, 256, 256]) in channels-first format; in channels-last format it would be (256, 256, 3). Since PyTorch models accept input in channels-first format, it shows torch.Size([3, 256, 256]) for you.
As for the output shape [-1, 64, 256, 256]: this is the output shape of your first conv layer, not your input shape. That layer has 64 filters, so it produces 64 feature maps, each of dimension 256x256.
The -1 represents your variable batch size, which can be fixed in the DataLoader.
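To make this concrete, here is a minimal sketch (illustrative, not taken from your actual ResNet9 code) of how the first conv layer turns 3 input channels into 64 feature maps:

import torch
import torch.nn as nn

# A ResNet9 typically starts with a conv layer like this:
# 3 input channels (RGB) -> 64 output feature maps.
conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, padding=1)

x = torch.randn(8, 3, 256, 256)  # a batch of 8 RGB images
y = conv1(x)
print(y.shape)  # torch.Size([8, 64, 256, 256])
# torchsummary prints the batch dimension as -1 because it is not fixed.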

Related

Processing 4D input with 2D Convolution in Tensorflow

I am having trouble understanding how 2D conv calculations are done on 4D inputs. Basically, this is the situation: I have an image of height, width, channels = 128, 128, 103. I want each of these 103 channels to be processed individually, as if I were inputting them to the network one by one. Would the following code work?
import tensorflow.keras
from tensorflow.keras.layers import Conv2D

model1 = tensorflow.keras.models.Sequential()
model1.add(Conv2D(1, kernel_size=(3, 3), input_shape=(128, 128, 103, 1), padding='same'))
I want to avoid splitting the image and inputting it into the network as 103 batches of (128,128,1)
As explained in the documentation (https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D?version=nightly), Conv2D expects a:
4+D tensor with shape: batch_shape + (channels, rows, cols) if data_format='channels_first', or
4+D tensor with shape: batch_shape + (rows, cols, channels) if data_format='channels_last' (the default).
You are passing a 5D tensor of shape (batch_shape, 128, 128, 103, 1). I suggest you reshape your tensor so that it yields a shape of (None, 128, 128, 103). Also, change input_shape=(128, 128, 103, 1) to input_shape=(128, 128, 103).
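A minimal sketch of the corrected model (the name model1 follows the question's code; note this treats the 103 bands as input channels of a single 4D input):

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

# The 103 bands become input channels of one (batch, 128, 128, 103)
# tensor, so no 5D input is needed.
model1 = tf.keras.models.Sequential()
model1.add(Conv2D(1, kernel_size=(3, 3), input_shape=(128, 128, 103), padding='same'))
print(model1.output_shape)  # (None, 128, 128, 1)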

Add GlobalAveragePooling2D (before ResNet50)

I'm trying to build a model using ResNet50 for image classification into 6 classes, and I want to reduce the dimension of the images before using them to train the ResNet50 model. To do this, I start by creating a ResNet50 model using the model in Keras:
ResNet = ResNet50(
    include_top=None, weights='imagenet', input_tensor=None, input_shape=(64, 109, 3),
    pooling=None, classes=6)
And then I create a sequential model that includes ResNet50, adding some final layers for the classification and also a first layer for dimensionality reduction before ResNet50:
(About the input shape: the images I'm using have a dimension of 128x217, and the 3 is for the channels that ResNet needs.)
model = models.Sequential()
model.add(GlobalAveragePooling2D(input_shape=(128, 217, 3)))
model.add(ResNet)
model.add(GlobalAveragePooling2D())
model.add(Dense(units=512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(units=6, activation='softmax'))
But this doesn't work, because the dimensions after the first global average pooling don't match the input shape expected by the ResNet; the error I get is:
WARNING:tensorflow:Model was constructed with shape (None, 64, 109, 3) for input Tensor("input_6:0", shape=(None, 64, 109, 3), dtype=float32), but it was called on an input with incompatible shape (None, 3).
ValueError: Input 0 of layer conv1_pad is incompatible with the layer: expected ndim=4, found ndim=2. Full shape received: [None, 3]
I think I understand what the problem is, but I don't know how to fix it, since (None, 3) is not a valid input shape for ResNet50. How can I fix this? Thank you! :)
You should first understand what GlobalAveragePooling actually does. This layer cannot be applied right after the input, because it averages each channel over all spatial positions, leaving only one value per channel (in your case 3 values, because you have 3 channels).
You have to use another method to reduce the size of the images (e.g. simply resizing them to a smaller size).
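For example, a minimal sketch of resizing inside the model with a Lambda layer (this is my assumption of one way to do it; tf.image.resize is the standard op, and 64x109 matches the input_shape given to ResNet50 above):

import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

ResNet = ResNet50(include_top=False, weights='imagenet', input_shape=(64, 109, 3))

model = models.Sequential()
# Resize the 128x217 images down to 64x109 instead of pooling them away.
model.add(layers.Lambda(lambda x: tf.image.resize(x, (64, 109)),
                        input_shape=(128, 217, 3)))
model.add(ResNet)
model.add(layers.GlobalAveragePooling2D())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(256, activation='relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(6, activation='softmax'))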

create Tensorflow model which accepts tensor image as input to the model

I'm using the following configuration for the image classification model:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(100, 100, 3)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10)
])
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
If I print model.inputs, it returns:
[<tf.Tensor 'flatten_input:0' shape=(None, 100, 100, 3) dtype=float32>]
If I pass a tensor image to this model, it does not work. So my question is: what changes should I make to my model so that it will accept a tensor image?
I'm passing image using below code:
image = np.asarray(image)
# The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
input_tensor = tf.convert_to_tensor(image)
# The model expects a batch of images, so add an axis with `tf.newaxis`.
input_tensor = input_tensor[tf.newaxis,...]
# Run inference
output_dict = model(input_tensor)
I get below error if I pass tensor image
WARNING:tensorflow:Model was constructed with shape (None, 100, 100, 3) for input Tensor("flatten_input:0", shape=(None, 100, 100, 3), dtype=float32), but it was called on an input with incompatible shape (1, 886, 685, 3).
---------------------------------------------------------------------------
InvalidArgumentError Traceback (most recent call last)
I just wanted to know which Keras layers and input parameters I should update so that the model can accept a tensor image as input.
Any help would be appreciated. Thanks!
The message is a warning, not an error; that distinction is just semantics, though, because the warning does point to a real problem.
Your model takes images of shape (100, 100, 3), and you are giving it inputs of shape (886, 685, 3). The spatial dimensions do not match; you need to resize the image to 100 x 100.
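A minimal sketch of that resize step, reusing the variable names from the question's code:

import numpy as np
import tensorflow as tf

input_tensor = tf.convert_to_tensor(np.asarray(image))
input_tensor = input_tensor[tf.newaxis, ...]  # add the batch axis
# Match the spatial size the model was built with.
input_tensor = tf.image.resize(input_tensor, (100, 100))
output_dict = model(input_tensor)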
When defining your model, you need to tell Keras the number of channels of your pictures: 1 for black-and-white pictures, 3 for RGB, etc. Note that input_shape excludes the batch dimension, so for a single-channel picture you would write:
keras.layers.Flatten(input_shape=(100, 100, 1))
I also notice that you did not define an activation function for the last layer (keras.layers.Dense(10)). Since you compile the loss with from_logits=True, outputting raw logits is fine here; if you instead want probabilities out of your network, use keras.layers.Dense(10, activation='softmax') and set from_logits=False.
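To illustrate the two consistent pairings (a sketch, not part of the question's original code):

import tensorflow as tf
from tensorflow import keras

# Option 1: the last layer outputs raw logits; the loss expects logits.
logits_model = keras.Sequential([
    keras.layers.Flatten(input_shape=(100, 100, 3)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10),  # no activation: raw logits
])
logits_model.compile(optimizer='adam',
                     loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                     metrics=['accuracy'])

# Option 2: the last layer outputs probabilities; the loss expects probabilities.
probs_model = keras.Sequential([
    keras.layers.Flatten(input_shape=(100, 100, 3)),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
probs_model.compile(optimizer='adam',
                    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
                    metrics=['accuracy'])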
Hope it's clear for you

Tensorflow mismatch of input shape from model restored from SavedModel format

I restored a TensorFlow model from the SavedModel format and added a new layer to the end so I can fine-tune it. But I realized that the labels of the model from the SavedModel format are of shape (?, 256, 256, 2), whereas my current labels are (?, 256, 256, 4). As a result, I get this error:
ValueError: Cannot feed value of shape (16, 256, 256, 4) for Tensor 'labels_1:0', which has shape '(?, 256, 256, 2)'
Is there any way to somehow modify the inputs of the original model that I restored from the SavedModel format? Or is the only way to manually extract the weights from the original model and assign them to a new version of the same model?
Below is a sample of the code I'm using:
meta_graph_def = tf.saved_model.loader.load(
    sess,
    [tf.saved_model.tag_constants.SERVING],
    model_path
)
relu_op = sess.graph.get_tensor_by_name('model/Relu_1:0')
with tf.variable_scope('fine_tune_layer'):
    tune_conv_layer = slim.conv2d(relu_op,
                                  output_channels,
                                  1,
                                  stride=1,
                                  rate=1,
                                  padding='SAME',
                                  activation_fn=None)
I think you're not correctly appending the new layer. What layer did you add? It seems you're upsampling the channels somehow. Did you use something like tf.keras.layers.Conv2DTranspose, or just tf.keras.layers.Conv2D? Can you share some code?
You should have something like:
model = tf.saved_model.load("./saved_model")  # has output shape (?, 256, 256, 2)
model.add(tf.keras.layers.Conv2DTranspose(4, kernel_size=1))  # -> (?, 256, 256, 4)
model.compile(loss_function, optimizer)
model.fit(...)
Note that above code is rough and just to give you an idea, I have no idea what you are trying to do.

What do Keras convolution layers do with color channels?

Below is a piece of example code from the Keras documentation. It looks like the first convolution accepts a 256x256 image with 3 color channels. It has 64 output filters (I think these are the same as the feature maps I have read about elsewhere; can someone confirm this for me?). What confuses me is that the output size is (None, 64, 256, 256). I would expect it to be (None, 64 * 3, 256, 256), since it would need to do convolutions for each of the color channels. What I am wondering is: how does Keras handle the color channels? Do the values get averaged together (converted to greyscale) before passing through the convolution?
# apply a 3x3 convolution with 64 output filters on a 256x256 image:
model = Sequential()
model.add(Convolution2D(64, 3, 3, border_mode='same', input_shape=(3, 256, 256)))
# now model.output_shape == (None, 64, 256, 256)
# add a 3x3 convolution on top, with 32 output filters:
model.add(Convolution2D(32, 3, 3, border_mode='same'))
# now model.output_shape == (None, 32, 256, 256)
A filter of size 3*3 with 3 input channels consists of 3*3*3 parameters, so the weights of the convolution kernel are different for each channel.
The layer sums up the convolution results over the channels (together with a bias term) to produce each output map, so the output shape is independent of the number of input channels: (None, 64, 256, 256) rather than (None, 64 * 3, 256, 256).
I'm not 100% sure, but I think a feature map refers to the output of applying one such filter to the input (for example, a 256*256 matrix).
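A quick way to verify this (a sketch using the current tf.keras API rather than the old Convolution2D syntax above):

import tensorflow as tf
from tensorflow.keras.layers import Conv2D

# channels_first matches the (3, 256, 256) input shape in the question.
layer = Conv2D(64, (3, 3), padding='same', data_format='channels_first',
               input_shape=(3, 256, 256))
model = tf.keras.Sequential([layer])
kernel, bias = layer.get_weights()
print(kernel.shape)  # (3, 3, 3, 64): a 3x3 kernel per input channel, per filter
print(bias.shape)  # (64,): one bias per output feature map
print(model.output_shape)  # (None, 64, 256, 256)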
