Constraint on dimensions of activation/feature map in convolutional network - python

Let's say the input to an intermediate CNN layer is of size 512×512×128, and that in the convolutional layer we apply 48 filters of size 7×7 at stride 2 with no padding. What is the size of the resulting activation map?
I checked some previous posts (e.g., here or here), which point to this Stanford course page. The formula given there is (W − F + 2P)/S + 1 = (512 − 7)/2 + 1 = 253.5, which would imply that this setup is not possible, since the value we get is not an integer.
However, if I run the following snippet in Python 2.7, PyTorch seems to compute the size of the activation map as (512 − 6)/2 = 253, which makes sense but does not match the formula above:
>>> import torch
>>> conv = torch.nn.Conv2d(in_channels=128, out_channels=48, kernel_size=7, stride=2, padding=0)
>>> conv
Conv2d(128, 48, kernel_size=(7, 7), stride=(2, 2))
>>> img = torch.rand((1, 128, 512, 512))
>>> out = conv(img)
>>> out.shape
torch.Size([1, 48, 253, 253])
Any help in understanding this conundrum is appreciated.

Here is the formula being used in PyTorch: see the Shape section of the Conv2d documentation. The missing piece is the floor: H_out = ⌊(H_in + 2·padding − dilation·(kernel_size − 1) − 1)/stride⌋ + 1, which here gives ⌊(512 − 6 − 1)/2⌋ + 1 = 253.
Also, as far as I know, this is the best tutorial on this subject.
Bonus: here is a neat visualizer for conv calculations.
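To make the formula concrete, here is a standalone sketch of the shape arithmetic (no framework required; the function name conv2d_out is just for illustration):

```python
import math

def conv2d_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length per spatial dimension, per the PyTorch Conv2d shape formula."""
    return math.floor((size + 2 * padding - dilation * (kernel - 1) - 1) / stride) + 1

# the case from the question: 512x512 input, 7x7 kernel, stride 2, no padding
print(conv2d_out(512, kernel=7, stride=2))  # 253
```

The floor is what reconciles the two formulas: without it, (512 − 7)/2 + 1 = 253.5; with it, ⌊252.5⌋ + 1 = 253, matching the PyTorch output above.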

Related

Imbalanced aspect ratio of input in CNN

Consider the following model using Keras in TensorFlow.
Conv2D(
    filters = 2**(5+i),  # i = number of times Conv2D has been called
    kernel_size = (3, 3),
    strides = (1, 1),
    padding = 'valid')
MaxPooling2D(
    pool_size = (2, 2))
Layer Output Shape Param
-----------------------------------------------
L0: Shape (50, 250, 1 ) 0
L1: Conv2D (48, 248, 32 ) 320
L2: MaxPooling2D (24, 124, 32 ) 0
L3: Conv2D_1 (22, 122, 64 ) 18496
L4: MaxPooling2D_1 (11, 61, 64 ) 0
L5: Conv2D_2 (9, 59, 128) 73856
L6: MaxPooling2D_2 (4, 29, 128) 0
L7: Conv2D_3 (2, 27, 256) 295168 !!
L8: MaxPooling2D_3 (1, 13, 256) 0
L9: Flatten (3328) 0
L10: Dense (512) 1704448 !!!
L11: ...
Here, an input shape with a ratio of 1:5 is used. After L8 there cannot be any more convolutional layers, since one side is already 1. In general, once input_side < kernel_size, no further convolutional layers are possible; the layer is forced to be flattened into a vector with a high number of units, resulting from the large shape [1][3] and the large number of filters [2] deep into the network. The Dense layer [4] that follows will then have a high number of parameters, which requires a lot of computation time.
To reduce the number of parameters specific to the problems above (highlighted in [x]), I am considering these methods:
Adding a (1, 2) stride to early Conv2D layers. (Refer to this thread.)
Reducing the number of filters, say, from [32, 64, 128, 256, ...] to [16, 24, 32, 48, ...].
Resizing the input data to a square shape so that more Conv2D layers can be applied.
Further reducing the number of units in the first Dense layer, say, from 512 to 128.
My question is: will these methods work, and how much will they affect the performance of the CNN? Is there any better approach to this problem? Thanks.
First of all, you can try 'same' padding instead of 'valid'. It will somewhat save you from the diminishing dimensions you are getting.
For point 1:
Adding a non-uniform stride is only good if your data has more variation in a certain direction, in this case horizontal.
For point 2:
The number of filters doesn't help or hurt the way your dimensions change. Reducing it would hurt your performance if your model was not overfitting.
For point 3:
Resizing the input to a square shape may seem like a good idea, but it would lead to unnecessary dead neurons because of all the extra content you are adding. I would advise against it; it may hurt performance and lead to overfitting.
For point 4:
Here, again, the number of units doesn't change the dimensions. Reducing it would hurt your performance if your model was not overfitting.
Lastly, your network is deep enough to get good results. Rather than trying to go smaller and smaller, try adding more Conv2D layers in between the MaxPools; that would be much better.
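To see how these suggestions change the shapes without training anything, here is a plain-Python trace of a 50×250 input through four conv+pool blocks. It is only a sketch using the standard 'valid' output formulas, with a hypothetical (1, 2) stride in the first two blocks as suggested in point 1:

```python
import math

def conv_out(size, kernel, stride):
    # 'valid' padding: floor((size - kernel) / stride) + 1
    return math.floor((size - kernel) / stride) + 1

def pool_out(size, pool):
    # MaxPooling2D with pool_size == strides and 'valid' padding
    return math.floor(size / pool)

h, w = 50, 250
for i in range(4):
    # a (1, 2) stride in the first two blocks shrinks the long side faster
    sh, sw = (1, 2) if i < 2 else (1, 1)
    h, w = conv_out(h, 3, sh), conv_out(w, 3, sw)
    h, w = pool_out(h, 2), pool_out(w, 2)
    print(f"after block {i}: {h} x {w}")
# the aspect ratio moves from 1:5 toward roughly 1:2 instead of one side collapsing first
```

Running the same trace with uniform strides reproduces the shape column in the question, which makes it easy to compare layouts before committing to a model.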

Shape of 1D convolution output on a 2D data using keras

I am trying to implement a 1D convolution on a time series classification problem using keras. I am having some trouble interpreting the output size of the 1D convolutional layer.
My data consists of the time series of different features over a time interval of 128 units, and I apply a 1D convolutional layer:
x = Input((n_timesteps, n_features))
cnn1_1 = Conv1D(filters = 100, kernel_size= 10, activation='relu')(x)
which after compilation I obtain the following shapes of the outputs:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_26 (InputLayer) (None, 128, 9) 0
_________________________________________________________________
conv1d_28 (Conv1D) (None, 119, 100) 9100
I was assuming that with 1D convolution, the data is only convolved across the time axis (axis 1), so the size of my output would be (119, 100*9). But I guess the network is performing some kind of operation across the feature dimension (axis 2), and I don't know which operation that is.
I am saying this because what I interpret as 1D convolution is that the feature shape must be preserved, since I am only convolving the time domain: if I have 9 features, then for each filter I have 9 convolutional kernels, each applied to a different feature and convolved across the time axis. This should return 9 convolved features per filter, resulting in an output shape of (119, 9*100).
However, the output shape is (119, 100).
Clearly something else is happening that I can't understand or get.
Where is my reasoning failing? How is the 1D convolution performed?
I add one more comment which is my comment on one of the answers provided:
I understand the reduction from 128 to 119, but what I don't understand is why the feature dimension changes. For example, if I use
Conv1D(filters = 1, kernel_size= 10, activation='relu')
, then the output dimension is going to be (None, 119, 1), giving rise to only one feature after the convolution. What is going on in this dimension? Which operation is performed to go from 9 → 1?
Conv1D needs a 3D tensor for its input, with shape (batch_size, time_steps, features). Based on your code, filters is 100, which means each window is mapped from 9 dimensions to 100 dimensions. How does this happen? A dot product.
In the referenced figure (not shown here), X_i is the concatenation of k words (k = kernel_size), l is the number of filters (l = filters), d is the dimension of the input word vector, and p_i is the output vector for each window of k words.
What happens in your code?
[10 × 9] window dot [10 × 9] kernel => [1] => repeat l = 100 times, once per filter => [1 × 100]
do the above for every window of the sequence => [128 × 100]
Another thing that happens here is you did not specify the padding type. According to the docs, by default Conv1d use valid padding which caused your dimension to reduce from 128 to 119. If you need the dimension to be the same as the input you can choose the same option:
Conv1D(filters = 100, kernel_size= 10, activation='relu', padding='same')
It sums over the last axis, which is the feature axis. You can easily check this by doing the following:
import tensorflow as tf

x = tf.random.normal((1, 128, 9))
# initialize the kernel with ones and use a linear activation
y = tf.keras.layers.Conv1D(1, 3, activation="linear",
                           kernel_initializer="ones")(x)
If you then sum x along the feature axis (tf.reduce_sum(x, axis=-1)), you will see that the sum of the first 3 values of that per-step sum equals the first value of the convolution. I used a kernel size of 3 to make this verification easier.
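The same check can be done without TensorFlow; a minimal NumPy sketch of a 'valid' 1D convolution with a single all-ones filter (names are illustrative):

```python
import numpy as np

x = np.random.rand(128, 9)   # (time_steps, features), as in the question
kernel = np.ones((3, 9))     # one filter with kernel_size=3 spans ALL 9 features

# each output value is a single dot product over the whole (3 x 9) window
conv = np.array([np.sum(x[t:t + 3] * kernel) for t in range(128 - 3 + 1)])
print(conv.shape)            # (126,): one value per window, not 9

# identical to summing the features first, then taking a windowed sum over time
s = x.sum(axis=1)
windowed = np.array([s[t:t + 3].sum() for t in range(126)])
assert np.allclose(conv, windowed)
```

This is why filters=1 collapses 9 features to 1: each filter owns one (kernel_size × n_features) weight block and emits a single scalar per time window.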

How to make a 3-d filter with keras

I am a high schooler taking an ML course at my local university, and we are building CNNs using Keras right now. I need to use a filter of size (32, 32, 3), but Keras will only let me use 2D filters.
This is what I am trying:
model.add(Conv2D(32, kernel_size = (12, 12, 3), input_shape=(32, 32, 3), activation='relu', strides=(1,2)))
This is my error: "ValueError: The kernel_size argument must be a tuple of 2 integers. Received: (12, 12, 3)"
Note: I am using the Cifar10 dataset.
You are using Keras' Conv2D layer when you should be using the Conv3D layer.
Although, I think you are misunderstanding the concepts, since your stride doesn't have a third dimension. In a 3D convolution operation, your kernel window must move in all 3 dimensions, so your stride needs to have a third dimension as well. The channel dimension of the kernel is always inferred from the last dimension of the input to the current layer.
So, in your code snippet, if you want a 2D convolution, use kernel_size = (12, 12); if you really want a 3D convolution, define your strides parameter with a third dimension and use Conv3D instead of Conv2D.

Setting custom kernel for CNN in pytorch

Is there a way to specify our own custom kernel values for a convolutional neural network in PyTorch? Something like kernel_initializer in TensorFlow? E.g. I want a 3×3 kernel in nn.Conv2d initialized so that it acts as an identity kernel:
0 0 0
0 1 0
0 0 0
(this will effectively return the same output as my input in the very first iteration)
My non-exhaustive research on the subject:
I could use nn.init, but it only has some pre-defined kernel initialisation values.
I tried to follow the discussion on their official thread, but it doesn't suit my needs.
I might have missed something in my research; please feel free to point it out.
I think an easier solution is to:
deconv = nn.ConvTranspose2d(
    in_channels=channel_dim, out_channels=channel_dim,
    kernel_size=kernel_size, stride=stride,
    bias=False, padding=1, output_padding=1
)
deconv.weight.data.copy_(
    get_upsampling_weight(channel_dim, channel_dim, kernel_size)
)
In other words, use copy_.
Thanks to ptrblck I was able to solve it.
I can define a new convolution layer as conv and as per the example I can set the identity kernel using -
weights = ch.Tensor([[0, 0, 0], [0, 1, 0], [0, 0, 0]]).unsqueeze(0).unsqueeze(0)
weights.requires_grad = True
conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, padding=1, bias=False)
with ch.no_grad():
    conv.weight = nn.Parameter(weights)
I can then continue to use conv as my regular nn.Conv2d layer.
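If you want to verify that the identity kernel really is a no-op without spinning up PyTorch, here is a small NumPy sketch of 'same' cross-correlation with zero padding (the helper name conv2d_single is hypothetical):

```python
import numpy as np

def conv2d_single(img, kernel):
    """Naive single-channel cross-correlation, 'same' output via zero padding."""
    k = kernel.shape[0]
    padded = np.pad(img, k // 2)
    out = np.empty(img.shape)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return out

identity = np.array([[0., 0., 0.], [0., 1., 0.], [0., 0., 0.]])
img = np.random.rand(5, 5)
# the identity kernel reproduces the input exactly, as the question expects
assert np.allclose(conv2d_single(img, identity), img)
```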

How can I use a pre-trained neural network with grayscale images?

I have a dataset containing grayscale images and I want to train a state-of-the-art CNN on them. I'd very much like to fine-tune a pre-trained model (like the ones here).
The problem is that almost all models I can find the weights for have been trained on the ImageNet dataset, which contains RGB images.
I can't use one of those models, because their input layer expects a batch of shape (batch_size, height, width, 3), or (64, 224, 224, 3) in my case, but my image batches are (64, 224, 224).
Is there any way that I can use one of those models? I've thought of dropping the input layer after I've loaded the weights and adding my own (like we do for the top layers). Is this approach correct?
The model's architecture cannot be changed because the weights have been trained for a specific input configuration. Replacing the first layer with your own would pretty much render the rest of the weights useless.
-- Edit: elaboration suggested by Prune--
CNNs are built so that as they go deeper, they can extract high-level features derived from the lower-level features that the previous layers extracted. By removing the initial layers of a CNN, you are destroying that hierarchy of features because the subsequent layers won't receive the features that they are supposed to as their input. In your case the second layer has been trained to expect the features of the first layer. By replacing your first layer with random weights, you are essentially throwing away any training that has been done on the subsequent layers, as they would need to be retrained. I doubt that they could retain any of the knowledge learned during the initial training.
--- end edit ---
There is an easy way, though, to make your model work with grayscale images: you just need to make the image appear to be RGB. The easiest way to do so is to repeat the image array 3 times on a new dimension. Because you will have the same image over all 3 channels, the performance of the model should be the same as it was on RGB images.
In numpy this can be easily done like this:
print(grayscale_batch.shape) # (64, 224, 224)
rgb_batch = np.repeat(grayscale_batch[..., np.newaxis], 3, -1)
print(rgb_batch.shape) # (64, 224, 224, 3)
The way this works is that it first creates a new dimension (to place the channels) and then it repeats the existing array 3 times on this new dimension.
I'm also pretty sure that keras' ImageDataGenerator can load grayscale images as RGB.
Converting grayscale images to RGB as per the currently accepted answer is one approach to this problem, but not the most efficient. You most certainly can modify the weights of the model's first convolutional layer and achieve the stated goal. The modified model will both work out of the box (with reduced accuracy) and be finetunable. Modifying the weights of the first layer does not render the rest of the weights useless as suggested by others.
To do this, you'll have to add some code where the pretrained weights are loaded. In your framework of choice, you need to figure out how to grab the weights of the first convolutional layer in your network and modify them before assigning to your 1-channel model. The required modification is to sum the weight tensor over the dimension of the input channels. The way the weights tensor is organized varies from framework to framework. The PyTorch default is [out_channels, in_channels, kernel_height, kernel_width]. In Tensorflow I believe it is [kernel_height, kernel_width, in_channels, out_channels].
Using PyTorch as an example, in a ResNet50 model from Torchvision (https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py), the shape of the weights for conv1 is [64, 3, 7, 7]. Summing over dimension 1 results in a tensor of shape [64, 1, 7, 7]. At the bottom I've included a snippet of code that would work with the ResNet models in Torchvision assuming that an argument (inchans) was added to specify a different number of input channels for the model.
To prove this works I did three runs of ImageNet validation on ResNet50 with pretrained weights. There is a slight difference in the numbers for run 2 & 3, but it's minimal and should be irrelevant once finetuned.
Unmodified ResNet50 w/ RGB Images : Prec #1: 75.6, Prec #5: 92.8
Unmodified ResNet50 w/ 3-chan Grayscale Images: Prec #1: 64.6, Prec #5: 86.4
Modified 1-chan ResNet50 w/ 1-chan Grayscale Images: Prec #1: 63.8, Prec #5: 86.1
def _load_pretrained(model, url, inchans=3):
    state_dict = model_zoo.load_url(url)
    if inchans == 1:
        conv1_weight = state_dict['conv1.weight']
        state_dict['conv1.weight'] = conv1_weight.sum(dim=1, keepdim=True)
    elif inchans != 3:
        assert False, "Invalid number of inchans for pretrained weights"
    model.load_state_dict(state_dict)

def resnet50(pretrained=False, inchans=3):
    """Constructs a ResNet-50 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], inchans=inchans)
    if pretrained:
        _load_pretrained(model, model_urls['resnet50'], inchans=inchans)
    return model
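The reason summing over the input-channel dimension works: when all channels of the input are equal, Σ_c (W_c · x) = (Σ_c W_c) · x, so the summed 1-channel weights respond to a grayscale image exactly as the original weights respond to the 3-channel replicated image. A small NumPy check on a single patch (a sketch, independent of any framework):

```python
import numpy as np

w_rgb = np.random.rand(64, 3, 7, 7)           # [out_channels, in_channels, kH, kW], as in conv1
w_gray = w_rgb.sum(axis=1, keepdims=True)     # [64, 1, 7, 7]

gray_patch = np.random.rand(1, 7, 7)          # one 7x7 patch, 1 channel
rgb_patch = np.repeat(gray_patch, 3, axis=0)  # same image replicated on 3 channels

# response of each of the 64 filters to the patch
rgb_resp = (w_rgb * rgb_patch).sum(axis=(1, 2, 3))
gray_resp = (w_gray * gray_patch).sum(axis=(1, 2, 3))
assert np.allclose(rgb_resp, gray_resp)       # identical up to float error
```

This is consistent with the near-identical precision numbers for runs 2 and 3 above.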
A simple way to do this is to add a convolutional layer before the base model and then feed its output to the base model, like this:
from keras.models import Model
from keras.layers import Input, Conv2D
from keras.applications.resnet50 import ResNet50

resnet = ResNet50(weights='imagenet', include_top=True)
input_tensor = Input(shape=(IMG_SIZE, IMG_SIZE, 1))
x = Conv2D(3, (3, 3), padding='same')(input_tensor)  # x has a shape of (IMG_SIZE, IMG_SIZE, 3)
out = resnet(x)
model = Model(inputs=input_tensor, outputs=out)
Why not try converting a grayscale image to a fake "RGB" image?
tf.image.grayscale_to_rgb(images, name=None)
Dropping the input layer will not work out; it would cause all the following layers to suffer.
What you can do is concatenate 3 copies of the black-and-white image together to expand your color dimension:
img_input = tf.keras.layers.Input(shape=(img_size_target, img_size_target,1))
img_conc = tf.keras.layers.Concatenate()([img_input, img_input, img_input])
model = ResNet50(include_top=True, weights='imagenet', input_tensor=img_conc)
I faced the same problem while working with VGG16 and grayscale images. I solved it as follows:
Let's say our training images are in train_gray_images, each row containing the unrolled grayscale image intensities. If we pass this directly to the fit function, it will raise an error, because fit expects a 3-channel (RGB) image dataset instead of a grayscale one. So, before calling fit, do the following:
Create a dummy RGB image dataset with the same shape as the grayscale dataset (here dummy_RGB_images); the only difference is that the number of channels is 3.
dummy_RGB_images = np.ndarray(shape=(train_gray_images.shape[0], train_gray_images.shape[1], train_gray_images.shape[2], 3), dtype=np.uint8)
Then just copy the whole dataset 3 times, once into each channel of dummy_RGB_images. (Here the dimensions are [no_of_examples, height, width, channels].)
dummy_RGB_images[:, :, :, 0] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 1] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 2] = train_gray_images[:, :, :, 0]
Finally, pass dummy_RGB_images instead of the grayscale dataset, like:
model.fit(dummy_RGB_images,...)
NumPy's depth-stack function, np.dstack((img, img, img)), is a natural way to go.
If you're already using scikit-image, you can get the desired result by using gray2rgb:
from skimage.color import gray2rgb
rgb_img = gray2rgb(gray_img)
I believe you can use a pretrained ResNet with 1-channel grayscale images without repeating the image 3 times.
What I have done is to replace the first layer (this is PyTorch, not Keras, but the idea should be similar):
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
With the following layer:
(conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
And then copy the sum (over the channel axis) of the original weights into the new layer. For example, the shape of the original weights was:
torch.Size([64, 3, 7, 7])
So I did:
resnet18.conv1.weight.data = resnet18.conv1.weight.data.sum(axis=1).reshape(64, 1, 7, 7)
And then check that the output of the new model on the 1-channel image is the same as the output of the original model on the replicated 3-channel image:
y_1 = model_resnet_1(input_image_1)
y_3 = model_resnet_3(input_image_3)
print(torch.abs(y_1).sum(), torch.abs(y_3).sum())
(tensor(710.8860, grad_fn=<SumBackward0>),
tensor(710.8861, grad_fn=<SumBackward0>))
input_image_1: one channel image
input_image_3: 3 channel image (gray scale - all channels equal)
model_resnet_1: modified model
model_resnet_3: Original resnet model
It's really easy!
Example for 'resnet50':
Before doing it you should have:
resnet_50= torchvision.models.resnet50()
print(resnet_50.conv1)
Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3),
bias=False)
Just do this!
conv1_weight = resnet_50.conv1.weight.data.sum(dim=1, keepdim=True)  # sum the pretrained weights over the channel axis first
resnet_50.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
The final step is to copy the summed weights into the new layer (assigning into the dict returned by state_dict() would have no effect):
resnet_50.conv1.weight.data = conv1_weight
So if you run:
print(resnet_50.conv1)
the result will be:
Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3,
3), bias=False)
As you can see, the input channel count now matches grayscale images.
What I did is simply expand the grayscale images into "RGB" images using the following transform stage:
import torchvision as tv
tv.transforms.Compose([
tv.transforms.ToTensor(),
tv.transforms.Lambda(lambda x: x.broadcast_to(3, x.shape[1], x.shape[2])),
])
When you add the ResNet to your model, you should pass the input_shape in the ResNet definition, like:
model = ResNet50(include_top=True, input_shape=(256, 256, 1))
