Keras Conv2d own filters - python

Is it possible to pass an array of my own filters as a parameter to Conv2D, instead of just the number of filters? Something like:
filters = [[[1,0,0],[1,0,0],[1,0,0]],
           [[1,0,0],[0,1,0],[0,0,1]],
           [[0,1,0],[0,1,0],[0,1,0]],
           [[0,0,1],[0,0,1],[0,0,1]]]
model = Sequential()
model.add(Conv2D(filters, (3, 3), activation='relu', input_shape=(3, 1024, 1024), data_format='channels_first'))

The accepted answer is right, but it would certainly be more useful with a complete example, similar to the one provided in this excellent TensorFlow example showing what conv2d does.
For Keras, this is:
from keras.models import Sequential
from keras.layers import Conv2D
import numpy as np
# Keras version of this example:
# https://stackoverflow.com/questions/34619177/what-does-tf-nn-conv2d-do-in-tensorflow
# Requires a custom kernel initialise to set to value from example
# kernel = [[1,0,1],[2,1,0],[0,0,1]]
# image = [[4,3,1,0],[2,1,0,1],[1,2,4,1],[3,1,0,2]]
# output = [[14, 6],[6,12]]
#Set Image
image = [[4,3,1,0],[2,1,0,1],[1,2,4,1],[3,1,0,2]]
# Pad to "channels_last" format
# which is [batch, width, height, channels]=[1,4,4,1]
image = np.expand_dims(np.expand_dims(np.array(image),2),0)
#Initialise to set kernel to required value
def kernel_init(shape):
    kernel = np.zeros(shape)
    kernel[:,:,0,0] = np.array([[1,0,1],[2,1,0],[0,0,1]])
    return kernel
#Build Keras model
model = Sequential()
model.add(Conv2D(1, [3,3], kernel_initializer=kernel_init,
                 input_shape=(4,4,1), padding="valid"))
model.build()
# To apply existing filter, we use predict with no training
out = model.predict(image)
print(out[0,:,:,0])
which outputs
[[14, 6]
[6, 12]]
as expected.

You must keep in mind that the purpose of a Conv2D network is to train these filter values. In a traditional image processing task using morphological filters, we are expected to design the filter kernels ourselves and then convolve them across the whole image.
In a deep learning approach we are trying to do the same task, but here we assume we don't know which filters should be used, even though we know exactly what we are looking for (the labeled images). When we train a convolutional neural network, we show it what we want and ask it to find its own weights, i.e. the filter values.
So, in this context, we should just define how many filters we want to train (in your case, 4 filters) and how they will be initialized. Their weights will be set while training the network.
There are many ways to initialize your filter weights (e.g. setting them all to zero or one, or using a random function so that distinct image characteristics get captured by them). By default, the Keras Conv2D layer uses the 'glorot_uniform' initializer, as specified in https://keras.io/layers/convolutional/#conv2d.
If you really want to initialize your filter weights in the way you have shown, you can write your own function (take a look at https://keras.io/initializers/) and pass it via the kernel_initializer parameter:
model.add(Conv2D(number_of_filters, (3, 3), activation='relu', input_shape=(3, 1024, 1024), kernel_initializer=your_function, data_format='channels_first'))
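For example, a minimal sketch of such an initializer for the 4 filters from the question, assuming each 3x3 pattern should be replicated across every input channel (my_filters and kernel_init are just illustrative names):
import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D

my_filters = np.array([[[1,0,0],[1,0,0],[1,0,0]],
                       [[1,0,0],[0,1,0],[0,0,1]],
                       [[0,1,0],[0,1,0],[0,1,0]],
                       [[0,0,1],[0,0,1],[0,0,1]]], dtype='float32')  # (4, 3, 3)

def kernel_init(shape, dtype=None):
    # Keras kernels are shaped (kernel_h, kernel_w, in_channels, out_channels)
    kernel = np.zeros(shape, dtype='float32')
    for out_ch in range(shape[3]):
        for in_ch in range(shape[2]):
            kernel[:, :, in_ch, out_ch] = my_filters[out_ch]
    return kernel

model = Sequential()
model.add(Conv2D(4, (3, 3), activation='relu', input_shape=(3, 1024, 1024),
                 kernel_initializer=kernel_init, data_format='channels_first'))
Keep in mind these values are only a starting point; they will still change during training unless you also freeze the layer (trainable=False).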

Related

Model Created With Pytorch's *list, .children(), and nn.sequential Produces Different Output Tensors

I’m currently trying to use a pretrained DenseNet in my model. I’m following this tutorial: https://pytorch.org/hub/pytorch_vision_densenet/, and it works well, with an input of [1,3,244,244], it returns a [1,1000] tensor, exactly as expected.
However, currently I’m using this code to load a pretrained Densenet into my model, and use it as a “feature extraction” model. This is the code in the init function
base_model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet121', pretrained=True)
self.base_model = nn.Sequential(*list(base_model.children())[:-1])
And it is being used like this in the forward function
x = self.base_model(x)
This, however, given the same input, returns a tensor of size ([1, 1024, 7, 7]). I cannot figure out what is not working; I think it is due to the fact that DenseNet connects all the layers together, but I do not know how to get it to behave the same way. Any tips on how to use a pretrained DenseNet in my own model?
Generally, nn.Modules have logic inside their forward definition, which won't be picked up by just converting the model to a sequential block. Most notably, you can usually find downsampling and/or flattening occurring between the CNN and the classifier layer(s) of the network. This is the case for DenseNet.
If you look at Torchvision's forward implementation of DenseNet here you will see:
def forward(self, x: Tensor) -> Tensor:
    features = self.features(x)
    out = F.relu(features, inplace=True)
    out = F.adaptive_avg_pool2d(out, (1, 1))
    out = torch.flatten(out, 1)
    out = self.classifier(out)
    return out
You can see how the tensor output by the CNN self.features (shaped (*, 1024, 7, 7)) is passed through a ReLU, an adaptive average pool, and a flatten before being fed to the classifier (the last layer).
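As a hedged sketch (hypothetical class name, based on the forward() shown above), you can reproduce those missing steps around the sliced-off feature extractor yourself:
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseNetFeatures(nn.Module):
    def __init__(self):
        super().__init__()
        base_model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet121',
                                    pretrained=True)
        # keep only the convolutional part, as in the question
        self.features = nn.Sequential(*list(base_model.children())[:-1])

    def forward(self, x):
        x = self.features(x)                  # (N, 1024, 7, 7)
        x = F.relu(x, inplace=True)           # same ReLU as DenseNet.forward
        x = F.adaptive_avg_pool2d(x, (1, 1))  # global average pool -> (N, 1024, 1, 1)
        return torch.flatten(x, 1)            # (N, 1024) feature vector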

Adaptation module design for stacking two CNNs

I'm trying to stack two different CNNs using an adaptation module to bridge them, but I'm having a hard time determining the adaptation module's layer hyperparameters correctly.
To be more precise, I would like to train the adaptation module to bridge two convolutional layers:
Layer A with output shape: (29,29,256)
Layer B with input shape: (8,8,384)
So, after Layer A, I sequentially add the adaptation module, for which I choose:
Conv2D layer with 384 filters with kernel size: (3,3) / Output shape: (29,29,384)
MaxPool2D with pool size: (2,2), strides: (4,4) and padding: "same" / Output shape: (8,8,384)
Finally, I try to add layer B to the model, but I get the following error from tensorflow:
InvalidArgumentError: Dimensions must be equal, but are 384 and 288 for '{{node batch_normalization_159/FusedBatchNormV3}} = FusedBatchNormV3[T=DT_FLOAT, U=DT_FLOAT, data_format="NHWC", epsilon=0.001, exponential_avg_factor=1, is_training=false](Placeholder, batch_normalization_159/scale, batch_normalization_159/ReadVariableOp, batch_normalization_159/FusedBatchNormV3/ReadVariableOp, batch_normalization_159/FusedBatchNormV3/ReadVariableOp_1)' with input shapes: [?,8,8,384], [288], [288], [288], [288].
Here's a minimal reproducible example:
from keras.applications.inception_resnet_v2 import InceptionResNetV2
from keras.applications.mobilenet import MobileNet
from keras.layers import Conv2D, MaxPool2D
from keras.models import Sequential
mobile_model = MobileNet(weights='imagenet')
server_model = InceptionResNetV2(weights='imagenet')
hybrid = Sequential()
for i, layer in enumerate(mobile_model.layers):
    if i <= 36:
        layer.trainable = False
        hybrid.add(layer)

hybrid.add(Conv2D(384, kernel_size=(3,3), padding='same'))
hybrid.add(MaxPool2D(pool_size=(2,2), strides=(4,4), padding='same'))

for i, layer in enumerate(server_model.layers):
    if i >= 610:
        layer.trainable = False
        hybrid.add(layer)
Sequential models only support models where the layers are arranged like a linked list - each layer takes the output of only one layer, and each layer's output is only fed to a single layer. Your two base models have residual blocks, which breaks that assumption and turns the model architecture into a directed acyclic graph (DAG).
To do what you want to do, you'll need to use the Functional API. With the Functional API, you explicitly control the intermediate activations aka KerasTensors.
For the first model, you can skip that extra work and just create a new model from a subset of the existing graph like this
sub_mobile = keras.models.Model(mobile_model.inputs, mobile_model.layers[36].output)
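Putting that together with the adaptation module from the question, a hedged sketch using the Functional API (the layer index and hyperparameters come from the question and may need adjusting for your setup):
import keras
from keras.applications.mobilenet import MobileNet
from keras.layers import Conv2D, MaxPool2D

mobile_model = MobileNet(weights='imagenet')
# slice off the first part of MobileNet, as above
sub_mobile = keras.models.Model(mobile_model.inputs, mobile_model.layers[36].output)
# adaptation module from the question
x = Conv2D(384, kernel_size=(3, 3), padding='same')(sub_mobile.output)
x = MaxPool2D(pool_size=(2, 2), strides=(4, 4), padding='same')(x)
adapted = keras.models.Model(sub_mobile.inputs, x)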
Wiring up some of the layers of the second model is much more difficult. It's easy to slice off the end of a Keras model - it's much harder to slice off the beginning because of the need for a tf.keras.Input placeholder. To do this successfully, you'll need to write a model-walking algorithm that goes through the layers, tracks the output KerasTensors, then calls each layer with the new inputs to create a new output KerasTensor.
You could avoid all that work by simply finding some source code for an InceptionResNet and adding layers via Python rather than introspecting an existing model. Here's one which may fit the bill.
https://github.com/yuyang-huang/keras-inception-resnet-v2/blob/master/inception_resnet_v2.py

How can I use a pre-trained neural network with grayscale images?

I have a dataset containing grayscale images and I want to train a state-of-the-art CNN on them. I'd very much like to fine-tune a pre-trained model (like the ones here).
The problem is that almost all models I can find the weights for have been trained on the ImageNet dataset, which contains RGB images.
I can't use one of those models because their input layer expects a batch of shape (batch_size, height, width, 3), or (64, 224, 224, 3) in my case, but my image batches are (64, 224, 224).
Is there any way that I can use one of those models? I've thought of dropping the input layer after I've loaded the weights and adding my own (like we do for the top layers). Is this approach correct?
The model's architecture cannot be changed because the weights have been trained for a specific input configuration. Replacing the first layer with your own would pretty much render the rest of the weights useless.
-- Edit: elaboration suggested by Prune--
CNNs are built so that as they go deeper, they can extract high-level features derived from the lower-level features that the previous layers extracted. By removing the initial layers of a CNN, you are destroying that hierarchy of features because the subsequent layers won't receive the features that they are supposed to as their input. In your case the second layer has been trained to expect the features of the first layer. By replacing your first layer with random weights, you are essentially throwing away any training that has been done on the subsequent layers, as they would need to be retrained. I doubt that they could retain any of the knowledge learned during the initial training.
--- end edit ---
There is an easy way, though, to make your model work with grayscale images. You just need to make the image appear to be RGB. The easiest way to do so is to repeat the image array 3 times along a new dimension. Because you will have the same image over all 3 channels, the performance of the model should be the same as it was on RGB images.
In numpy this can be easily done like this:
print(grayscale_batch.shape) # (64, 224, 224)
rgb_batch = np.repeat(grayscale_batch[..., np.newaxis], 3, -1)
print(rgb_batch.shape) # (64, 224, 224, 3)
The way this works is that it first creates a new dimension (to place the channels) and then it repeats the existing array 3 times on this new dimension.
I'm also pretty sure that keras' ImageDataGenerator can load grayscale images as RGB.
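A hedged sketch of that (my_gray_dir is a placeholder path): with color_mode='rgb' (the default), flow_from_directory should load single-channel image files and hand them to the model as 3-channel arrays.
from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1./255)
gen = datagen.flow_from_directory('my_gray_dir', target_size=(224, 224),
                                  color_mode='rgb', batch_size=64)
x_batch, y_batch = next(gen)
print(x_batch.shape)  # expected: (64, 224, 224, 3)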
Converting grayscale images to RGB as per the currently accepted answer is one approach to this problem, but not the most efficient. You most certainly can modify the weights of the model's first convolutional layer and achieve the stated goal. The modified model will both work out of the box (with reduced accuracy) and be finetunable. Modifying the weights of the first layer does not render the rest of the weights useless as suggested by others.
To do this, you'll have to add some code where the pretrained weights are loaded. In your framework of choice, you need to figure out how to grab the weights of the first convolutional layer in your network and modify them before assigning to your 1-channel model. The required modification is to sum the weight tensor over the dimension of the input channels. The way the weights tensor is organized varies from framework to framework. The PyTorch default is [out_channels, in_channels, kernel_height, kernel_width]. In Tensorflow I believe it is [kernel_height, kernel_width, in_channels, out_channels].
Using PyTorch as an example, in a ResNet50 model from Torchvision (https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py), the shape of the weights for conv1 is [64, 3, 7, 7]. Summing over dimension 1 results in a tensor of shape [64, 1, 7, 7]. At the bottom I've included a snippet of code that would work with the ResNet models in Torchvision assuming that an argument (inchans) was added to specify a different number of input channels for the model.
To prove this works I did three runs of ImageNet validation on ResNet50 with pretrained weights. There is a slight difference in the numbers for run 2 & 3, but it's minimal and should be irrelevant once finetuned.
Unmodified ResNet50 w/ RGB Images : Prec #1: 75.6, Prec #5: 92.8
Unmodified ResNet50 w/ 3-chan Grayscale Images: Prec #1: 64.6, Prec #5: 86.4
Modified 1-chan ResNet50 w/ 1-chan Grayscale Images: Prec #1: 63.8, Prec #5: 86.1
def _load_pretrained(model, url, inchans=3):
    state_dict = model_zoo.load_url(url)
    if inchans == 1:
        conv1_weight = state_dict['conv1.weight']
        state_dict['conv1.weight'] = conv1_weight.sum(dim=1, keepdim=True)
    elif inchans != 3:
        assert False, "Invalid number of inchans for pretrained weights"
    model.load_state_dict(state_dict)

def resnet50(pretrained=False, inchans=3):
    """Constructs a ResNet-50 model.

    Args:
        pretrained (bool): If True, returns a model pre-trained on ImageNet
    """
    model = ResNet(Bottleneck, [3, 4, 6, 3], inchans=inchans)
    if pretrained:
        _load_pretrained(model, model_urls['resnet50'], inchans=inchans)
    return model
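Since the question uses Keras, here is a hedged sketch of the same weight-summing idea in tf.keras; it relies on the Keras kernel layout (kernel_h, kernel_w, in_channels, out_channels) and assumes the ResNet50 from keras.applications, so treat it as illustrative rather than definitive:
import tensorflow as tf

# Same architecture with a 1-channel input (randomly initialised), plus a
# reference model carrying the pretrained RGB weights.
gray_model = tf.keras.applications.ResNet50(weights=None, include_top=False,
                                            input_shape=(224, 224, 1))
rgb_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False,
                                           input_shape=(224, 224, 3))

first_conv_done = False
for gray_layer, rgb_layer in zip(gray_model.layers, rgb_model.layers):
    weights = rgb_layer.get_weights()
    if not weights:
        continue
    if not first_conv_done and isinstance(gray_layer, tf.keras.layers.Conv2D):
        # Sum the RGB kernel over the in_channels axis (axis=2 in the Keras layout).
        weights[0] = weights[0].sum(axis=2, keepdims=True)
        first_conv_done = True
    gray_layer.set_weights(weights)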
A simple way to do this is to add a convolution layer before the base model and then feed the output to the base model. Like this:
from keras.models import Model
from keras.layers import Input, Conv2D
from keras.applications.resnet50 import ResNet50

resnet = ResNet50(weights='imagenet', include_top=True)
input_tensor = Input(shape=(IMG_SIZE, IMG_SIZE, 1))
x = Conv2D(3, (3,3), padding='same')(input_tensor)  # x has a dimension of (IMG_SIZE, IMG_SIZE, 3)
out = resnet(x)
model = Model(inputs=input_tensor, outputs=out)
Why not try to convert a grayscale image to a fake "RGB" image?
tf.image.grayscale_to_rgb(
    images,
    name=None
)
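A short usage sketch, assuming a batch of grayscale images shaped (batch, height, width); note the function expects a trailing channel dimension of size 1:
import tensorflow as tf

gray_batch = tf.random.uniform((64, 224, 224))                      # placeholder grayscale batch
rgb_batch = tf.image.grayscale_to_rgb(gray_batch[..., tf.newaxis])  # add the channel axis first
print(rgb_batch.shape)                                              # (64, 224, 224, 3)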
Dropping the input layer will not work out. It would cause all the following layers to suffer.
What you can do is Concatenate 3 black and white images together to expand your color dimension.
img_input = tf.keras.layers.Input(shape=(img_size_target, img_size_target,1))
img_conc = tf.keras.layers.Concatenate()([img_input, img_input, img_input])
model = ResNet50(include_top=True, weights='imagenet', input_tensor=img_conc)
I faced the same problem while working with VGG16 and grayscale images. I solved it as follows:
Let's say our training images are in train_gray_images, each row containing the unrolled grayscale image intensities. If we pass it directly to the fit function it will raise an error, because fit is expecting a 3-channel (RGB) image data set instead of a grayscale one. So before passing it to fit, do the following:
Create a dummy RGB image data set just like the grayscale data set, with the same shape (here dummy_RGB_images). The only difference is that the number of channels is 3.
dummy_RGB_images = np.ndarray(shape=(train_gray_images.shape[0], train_gray_images.shape[1], train_gray_images.shape[2], 3), dtype= np.uint8)
Then just copy the whole data set into each of the 3 channels of dummy_RGB_images. (Here the dimensions are [no_of_examples, height, width, channel].)
dummy_RGB_images[:, :, :, 0] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 1] = train_gray_images[:, :, :, 0]
dummy_RGB_images[:, :, :, 2] = train_gray_images[:, :, :, 0]
Finally, pass dummy_RGB_images instead of the grayscale data set, like:
model.fit(dummy_RGB_images,...)
numpy's depth-stack function, np.dstack((img, img, img)) is a natural way to go.
If you're already using scikit-image, you can get the desired result by using gray2rgb.
from skimage.color import gray2rgb
rgb_img = gray2rgb(gray_img)
I believe you can use a pretrained ResNet with 1-channel grayscale images without repeating the image 3 times.
What I have done is replace the first layer (this is PyTorch, not Keras, but the idea might be similar):
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
With the following layer:
(conv1): Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
And then copy the sum (over the channel axis) of the weights into the new layer. For example, the shape of the original weights was:
torch.Size([64, 3, 7, 7])
So I did:
resnet18.conv1.weight.data = resnet18.conv1.weight.data.sum(axis=1).reshape(64, 1, 7, 7)
And then check that the output of the new model is the same as the output with the grayscale image:
y_1 = model_resnet_1(input_image_1)
y_3 = model_resnet_3(input_image_3)
print(torch.abs(y_1).sum(), torch.abs(y_3).sum())
(tensor(710.8860, grad_fn=<SumBackward0>),
tensor(710.8861, grad_fn=<SumBackward0>))
input_image_1: one channel image
input_image_3: 3 channel image (gray scale - all channels equal)
model_resnet_1: modified model
model_resnet_3: Original resnet model
It's really easy!
Example for 'resnet50':
Before doing it, you have:
resnet_50= torchvision.models.resnet50()
print(resnet_50.conv1)
Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
Just do this!
resnet_50.conv1 = nn.Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
The final step is to update the state_dict.
resnet_50.state_dict()['conv1.weight'] = resnet_50.state_dict()['conv1.weight'].sum(dim=1, keepdim=True)
So if you run:
print(resnet_50.conv1)
the result would be:
Conv2d(1, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
As you can see, the input channel is now 1, for grayscale images.
What I did is simply expand the grayscale images into RGB by using the following transform stage:
import torchvision as tv

transform = tv.transforms.Compose([
    tv.transforms.ToTensor(),
    tv.transforms.Lambda(lambda x: x.broadcast_to(3, x.shape[1], x.shape[2])),
])
When you add the ResNet to your model, you should pass the input_shape in the ResNet definition, like
model = ResNet50(include_top=True, input_shape=(256,256,1))

How to modify layers of pretrained models in Keras like Inception-v3?

I want to use Inception-v3 with pretrained weights on ImageNet to take inputs that are not just 3-channel RGB images but have more channels, such that the dimension is (224, 224, x!=3), and then assign a self-defined set of weights to the following Conv2D layer. I was trying to change the input layer and the subsequent Conv2D layer such that it suits my needs, but I could not find a structured way of doing so.
I tried building a custom Conv2D tensor with Conv2D(...)(input) and assigning that to the corresponding layer of Inception, but this fails because it requires actual layers, while the above instruction yields a tensor. For what it's worth, Conv2D(...)(Input) and Inception.layers[1].output yield the same correct output (which they should, since I just want to change the input dimensions and weights); the question is how to wrap the new Conv2D input-output mapping as a layer and replace it in Inception.
I could try hacking my way through this, but generally I wondered if there is a swift and elegant way of reassigning certain layers in those pretrained models with custom specifications.
Thank you!
Edit:
What works is modifying inception_v3.py from Keras at line 394 to disable the exception for inputs with more than 3 channels, and then simply calling the constructor with the desired input. (Note that Original refers to the original InceptionV3 constructor.)
Code:
original_model = Original(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
weights = model.get_weights()
original_weights = original_model.get_weights()
for i in range(1, len(original_weights)):
    weights[i] = original_weights[i]
averaged_weights = np.mean(weights[0], axis=2)[:, :, None, :]
replicated_weights = np.repeat(averaged_weights, 20, axis=2)
weights[0] = replicated_weights
Then I can call
InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 20))
This works and gives the desired result, but seems very hacky.

extracting Bottleneck features using pretrained Inceptionv3 - differences between Keras' implementation and Native Tensorflow implementation

(Apologies for the long post)
All,
I want to use the bottleneck features from a pretrained Inceptionv3 model to predict classification for my input images. Before training a model and predicting classification, I tried 3 different approaches for extracting the bottleneck features.
My 3 approaches yielded different bottleneck features (not just in values but even the size was different).
Size of my bottleneck features from Approach 1 and 2: (number of input images) x 3 x 3 x 2048
Size of my bottleneck features from Approach 3: (number of input images) x 2048
Why are the sizes different between the Keras-based Inceptionv3 model and the native TensorFlow model? My guess is that when I say include_top=False in Keras, I'm not extracting the 'pool_3/_reshape:0' layer. Is this correct? If yes, how do I extract the 'pool_3/_reshape:0' layer in Keras? If my guess is incorrect, what am I missing?
I compared the bottleneck feature values from Approach 1 and 2 and they were significantly different. I think I'm feeding it the same input images because I resize and rescale my images before I even read them as input for my script. I pass no options to my ImageDataGenerator in Approach 1, and according to the documentation for that function the default values do not change my input images. I have set shuffle to false, so I assumed that predict_generator and predict read images in the same order. What am I missing?
Please note:
My input images are in RGB format (so the number of channels = 3) and I resized all of them to 150x150. I used the preprocess_input function in inceptionv3.py to preprocess all my images.
def preprocess_input(image):
    image /= 255.
    image -= 0.5
    image *= 2.
    return image
Approach 1: Used Keras with tensorflow as backend, an ImageDataGenerator to read my data and model.predict_generator to compute bottleneck features
I followed the example (Section Using the bottleneck features of a pre-trained network: 90% accuracy in a minute) from Keras' blog. Instead of VGG model listed there I used Inceptionv3. Below is the snippet of code I used
(Code not shown here, but before the code below I: read all input images, resized them to 150x150x3, rescaled them according to the preprocess_input function mentioned above, and saved the resized and rescaled images.)
train_datagen = ImageDataGenerator()
train_generator = train_datagen.flow_from_directory(my_input_dir, target_size=(150,150),shuffle=False, batch_size=16)
# get bottleneck features
# use pre-trained model and exclude top layer - which is used for classification
pretrained_model = InceptionV3(include_top=False, weights='imagenet', input_shape=(150,150,3))
bottleneck_features_train_v1 = pretrained_model.predict_generator(train_generator,len(train_generator.filenames)//16)
Approach 2: Used Keras with tensorflow as backend, my own reader and model.predict to compute bottleneck features
The only difference between this approach and the earlier one is that I used my own reader to read the input images.
(Code not shown here, but before the code below I: read all input images, resized them to 150x150x3, rescaled them according to the preprocess_input function mentioned above, and saved the resized and rescaled images.)
# inputImages is a numpy array of size <number of input images x 150 x 150 x 3>
inputImages = readAllJPEGsInFolderAndMergeAsRGB(my_input_dir)
# get bottleneck features
# use pre-trained model and exclude top layer - which is used for classification
pretrained_model = InceptionV3(include_top=False, weights='imagenet', input_shape=(img_width, img_height, 3))
bottleneck_features_train_v2 = pretrained_model.predict(trainData.images,batch_size=16)
Approach 3: Used TensorFlow (no Keras) to compute bottleneck features
I followed retrain.py to extract bottleneck features for my input images. Please note that the weights for that script can be obtained from (http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz)
As mentioned in that example, I used the bottleneck_tensor_name = 'pool_3/_reshape:0' as the layer to extract and compute bottleneck features. Similar to the first 2 approaches, I used resized and rescaled images as input to the script and I called this feature list bottleneck_features_train_v3
Thank you so much
Different results between 1 and 2
Since you haven't shown your code, I (maybe wrongly) suspect that the problem is that you did not use preprocess_input when declaring the ImageDataGenerator:
from keras.applications.inception_v3 import preprocess_input
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
Make sure, though, that your saved image files range from 0 to 255. (Bit depth 24).
Different shapes between 1 and 3
There are three possible types of model in this case:
include_top = True -> this will return classes
include_top = False (only) -> this implies pooling = None (no final pooling layer)
include_top = False, pooling='avg' or ='max' -> has a final pooling layer
So, in Keras, the model you declared without an explicit pooling=something doesn't have the final pooling layer, and its outputs still have the spatial dimensions.
Solve that simply by adding a pooling at the end. One of these:
pretrained_model = InceptionV3(include_top=False, pooling = 'avg', weights='imagenet', input_shape=(img_width, img_height, 3))
pretrained_model = InceptionV3(include_top=False, pooling = 'max', weights='imagenet', input_shape=(img_width, img_height, 3))
Not sure which one the model in the tgz file is using.
As an alternative, you can also get another layer from the Tensorflow model, the one coming immediately before 'pool_3'.
You can look into the Keras implementation of inceptionv3 here:
https://github.com/keras-team/keras/blob/master/keras/applications/inception_v3.py
So, the default parameters are:
def InceptionV3(include_top=True,
                weights='imagenet',
                input_tensor=None,
                input_shape=None,
                pooling=None,
                classes=1000):
Notice that the default is pooling=None; then, when building the model, the code is:
if include_top:
    # Classification block
    x = GlobalAveragePooling2D(name='avg_pool')(x)
    x = Dense(classes, activation='softmax', name='predictions')(x)
else:
    if pooling == 'avg':
        x = GlobalAveragePooling2D()(x)
    elif pooling == 'max':
        x = GlobalMaxPooling2D()(x)

# Ensure that the model takes into account
# any potential predecessors of `input_tensor`.
if input_tensor is not None:
    inputs = get_source_inputs(input_tensor)
else:
    inputs = img_input

# Create model.
model = Model(inputs, x, name='inception_v3')
So if you do not specify pooling, the bottleneck features are extracted without any pooling; you need to specify whether you want average pooling or max pooling on top of these features.
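Alternatively, a hedged sketch of doing the pooling yourself: global average pooling is just the mean over the two spatial dimensions, so you can turn the (num_images, 3, 3, 2048) features from Approaches 1/2 into (num_images, 2048) vectors directly.
import numpy as np

# bottleneck_features_train_v1 has shape (num_images, 3, 3, 2048)
pooled_features = bottleneck_features_train_v1.mean(axis=(1, 2))
print(pooled_features.shape)  # (num_images, 2048)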
