I want to use Inception-v3 with weights pretrained on ImageNet, but with inputs that are not just 3-channel RGB images and instead have more channels, such that the dimension is (224, 224, x != 3), and then assign a self-defined set of weights to the following Conv2D layer. I tried changing the input layer and the subsequent Conv2D layer to suit my needs, but I could not find a structured way of doing so.
I tried building a custom Conv2D tensor with Conv2D(...)(input) and assigning it to the corresponding layer of Inception, but this fails because the model expects actual layers, while the above instruction yields a tensor. For what it's worth, Conv2D(...)(input) and Inception.layers[1].output yield the same correct output (which they should, since I just want to change the input dimensions and the weights). The question is: how do I wrap the new Conv2D input-output mapping as a layer and replace the corresponding layer in Inception?
I could hack my way through this, but more generally I wondered whether there is a swift and elegant way of reassigning certain layers in these pretrained models with custom specifications.
Thank you!
Edit:
What works is inserting these lines at line 394 of Keras's inception_v3.py, disabling the exception raised for inputs with more than 3 channels, and then simply calling the constructor with the desired input shape. (Note that Original calls the original InceptionV3 constructor.)
Code:
original_model = Original(weights='imagenet', include_top=False, input_shape=(299, 299, 3))
weights = model.get_weights()                    # weights of the patched multi-channel model
original_weights = original_model.get_weights()
# Copy every pretrained weight except the first conv kernel, whose
# input-channel dimension no longer matches:
for i in range(1, len(original_weights)):
    weights[i] = original_weights[i]
# Average the first conv kernel over its RGB axis, (3, 3, 3, 32) -> (3, 3, 1, 32),
# then replicate it across the 20 input channels:
averaged_weights = np.mean(weights[0], axis=2)[:, :, None, :]
replicated_weights = np.repeat(averaged_weights, 20, axis=2)
weights[0] = replicated_weights
Then I can call
InceptionV3(weights='imagenet', include_top=False, input_shape=(299, 299, 20))
This works and gives the desired result, but it seems very hacky.
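For comparison, here is a sketch of the same channel-averaging idea without patching inception_v3.py, assuming your Keras version accepts arbitrary channel counts when weights=None:
import numpy as np
from keras.applications import InceptionV3

# Build an uninitialised 20-channel model and a pretrained 3-channel one.
new_model = InceptionV3(weights=None, include_top=False,
                        input_shape=(299, 299, 20))
orig_model = InceptionV3(weights='imagenet', include_top=False,
                         input_shape=(299, 299, 3))

# Port the ImageNet weights, averaging the first conv kernel over its
# RGB axis and replicating it across the 20 input channels.
weights = orig_model.get_weights()
first = np.mean(weights[0], axis=2, keepdims=True)   # (3, 3, 3, 32) -> (3, 3, 1, 32)
weights[0] = np.repeat(first, 20, axis=2)            # -> (3, 3, 20, 32)
new_model.set_weights(weights)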
Related
I’m currently trying to use a pretrained DenseNet in my model. I’m following this tutorial: https://pytorch.org/hub/pytorch_vision_densenet/, and it works well: with an input of [1, 3, 224, 224] it returns a [1, 1000] tensor, exactly as expected.
However, I’m currently using this code to load a pretrained DenseNet into my model and use it as a “feature extraction” model. This is the code in the init function:
base_model = torch.hub.load('pytorch/vision:v0.10.0', 'densenet121', pretrained=True)
self.base_model = nn.Sequential(*list(base_model.children())[:-1])
And it is being used like this in the forward function
x = self.base_model(x)
This, however, given the same input, returns a tensor of size [1, 1024, 7, 7]. I cannot figure out what is going wrong; I think it is due to the fact that DenseNet connects all the layers together, but I do not know how to get it to work the same way. Any tips on how to use a pretrained DenseNet in my own model?
Generally, nn.Modules have logic inside their forward definition, which means it won't be preserved by just converting the model to a sequential block. Most notably, you can generally find downsampling and/or flattening occurring between the CNN and the classifier layer(s) of the network. This is the case for DenseNet.
If you look at Torchvision's forward implementation of DenseNet here, you will see:
def forward(self, x: Tensor) -> Tensor:
    features = self.features(x)
    out = F.relu(features, inplace=True)
    out = F.adaptive_avg_pool2d(out, (1, 1))
    out = torch.flatten(out, 1)
    out = self.classifier(out)
    return out
You can see how the tensor output by the CNN self.features (shaped (*, 1024, 7, 7)) is processed through a ReLU, an adaptive average pool, and a flatten before being fed to the classifier (the last layer).
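So, to reproduce the tutorial's behaviour minus the classifier, you can keep only the features block and apply the same ReLU/pool/flatten yourself. A minimal sketch (the class name is illustrative):
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseNetFeatures(nn.Module):
    def __init__(self):
        super().__init__()
        base = torch.hub.load('pytorch/vision:v0.10.0',
                              'densenet121', pretrained=True)
        self.features = base.features  # CNN backbone only

    def forward(self, x):
        x = self.features(x)                  # (N, 1024, 7, 7)
        x = F.relu(x, inplace=True)
        x = F.adaptive_avg_pool2d(x, (1, 1))  # (N, 1024, 1, 1)
        return torch.flatten(x, 1)            # (N, 1024)

print(DenseNetFeatures()(torch.rand(1, 3, 224, 224)).shape)  # torch.Size([1, 1024])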
I am trying to predict a single image. But my model returns a prediction array with the shape (1,1,1,2048) when it should be (1,10). Any idea what I am doing wrong? My x input shape is correct at (1,32,32,3).
def ResNet50V2():
    IMG_SHAPE = (32, 32, 3)
    return tf.keras.applications.ResNet50V2(input_shape=IMG_SHAPE, include_top=False, weights=None, classes=10)
model = ResNet50V2()
x = x[None, :]
predictions = model.predict(x)
You are loading your Keras model with the parameter
include_top=False
which cuts off the fully-connected projection layer responsible for projecting the model output to your expected number of classes. Change the parameter to True.
That's because you are disabling the top with include_top=False, which removes the final classification layer. You need to either add your own layer with 10 classes, or remove the include_top parameter and retrain the network with the desired inputs.
An image classification network usually works in two processing steps.
The first one is feature extraction; we call it the "base", and it consists of a stack of layers that find and reinforce patterns in the image (Conv2D, ReLU and MaxPool).
The second one is the "head", and it is used to classify the features extracted in the previous step.
Your code is getting the raw output of the "base", with no classification, and, as the other gentle people stated, the solution is adding a custom "head" or changing the include_top parameter to True.
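For example, adding a custom "head" with 10 classes could look like this (a minimal sketch; the pooling layer choice is illustrative):
import tensorflow as tf

base = tf.keras.applications.ResNet50V2(input_shape=(32, 32, 3),
                                        include_top=False, weights=None)
x = tf.keras.layers.GlobalAveragePooling2D()(base.output)     # (None, 2048)
outputs = tf.keras.layers.Dense(10, activation='softmax')(x)  # (None, 10)
model = tf.keras.Model(base.input, outputs)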
I'm having some problems making masking work with a keras RNN written in Functional API. The idea is to mask a tensor, zero-padded, with shape (batch_size, timesteps, 100) and feed it into a SimpleRNN. Right now I have the following:
input = keras.layers.Input(shape=(None, 100))
mask_layer = keras.layers.Masking(mask_value=0.)
mask = mask_layer(input)
rnn = keras.layers.SimpleRNN(20)
x = rnn(input, mask=mask)
However, this does not work, because it raises the following InvalidArgumentError:
InvalidArgumentError: Dimension 1 in both shapes must be equal, but are 20 and 2000. Shapes are [?,20] and [?,2000]. for 'Select' (op: 'Select') with input shapes: [?,2000], [?,20], [?,20].
By changing my Input's shape to (None, 1), a sequential input where each element is a single integer instead of an n-dimensional embedding, I've gotten this code to work. I've also gotten the same idea to work with the Sequential API, but I cannot use that, as my final model will have multiple inputs and outputs. I also do not want to force my Input's shape to be (None, 1), as I want to swap out different embedding models (Word2Vec, etc.) during preprocessing, which means my Inputs will be embedding vectors from the start.
Can anyone help me with using masks with RNNs when using keras's functional API?
According to Masking and Padding with Keras, you won't need to manually set the mask on the RNN layer; in the following code the RNN layer will automatically receive the mask.
import keras
input_layer = keras.layers.Input(shape=(None, 100))
masked_layer = keras.layers.Masking(mask_value=0.)(input_layer)
rnn_layer = keras.layers.SimpleRNN(20)(masked_layer)
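To check it end to end, you can wrap these tensors into a model; a minimal sketch (the loss and data are illustrative):
import numpy as np

model = keras.models.Model(input_layer, rnn_layer)
model.compile(optimizer='adam', loss='mse')

# Zero-padded batch: the last 5 timesteps of each sequence are masked out.
batch = np.random.rand(4, 10, 100)
batch[:, 5:, :] = 0.
print(model.predict(batch).shape)  # (4, 20)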
I don't understand what's happening in this code:
def construct_model(use_imagenet=True):
    # line 1: how do we keep all layers of this model?
    model = keras.applications.InceptionV3(include_top=False,
                                           input_shape=(IMG_SIZE, IMG_SIZE, 3),
                                           weights='imagenet' if use_imagenet else None)
    new_output = keras.layers.GlobalAveragePooling2D()(model.output)
    new_output = keras.layers.Dense(N_CLASSES, activation='softmax')(new_output)
    model = keras.engine.training.Model(model.inputs, new_output)
    return model
Specifically, my confusion is, when we call the last constructor
model = keras.engine.training.Model(model.inputs, new_output)
we specify input layer and output layer, but how does it know we want all the other layers to stay?
In other words, we append the new_output layer to the pretrained model loaded in line 1, and then in the final constructor (final line) we just create and return a model with the specified input and output layers. But how does it know what other layers we want in between?
Side question 1): What is the difference between keras.engine.training.Model and keras.models.Model?
Side question 2): What exactly happens when we do new_layer = keras.layers.Dense(...)(prev_layer)? Does the () operation return a new layer? What does it do exactly?
This model was created using the functional API (Model).
Basically it works like this (it may be clearer if you read "side question 2" below before this):
You have an input tensor (you can see it as "input data" too)
You create (or reuse) a layer
You pass the input tensor to a layer (you "call" a layer with an input)
You get an output tensor
You keep working with these tensors until you have created the entire graph.
But this hasn't created a "model" yet (one you can train and use for other things).
All you have is a graph telling which tensors go where.
To create a model, you define its start and end points.
In the example:
They take an existing model: model = keras.applications.InceptionV3(...)
They want to expand this model, so they get its output tensor: model.output
They pass this tensor as the input of a GlobalAveragePooling2D layer
They get this layer's output tensor as new_output
They pass this as input to yet another layer: Dense(N_CLASSES, ....)
And get its output as new_output (this var was replaced as they are not interested in keeping its old value...)
But, as it works with the functional API, we don't have a model yet, only a graph. In order to create a model, we use Model defining the input tensor and the output tensor:
new_model = Model(old_model.inputs, new_output)
Now you have your model.
If you assign it to another var, as I did (new_model), the old model will still exist in model. And these models share the same layers, in such a way that whenever you train one of them, the other gets updated as well.
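You can verify the sharing directly; a minimal sketch (weights=None just avoids the download):
import keras

old_model = keras.applications.InceptionV3(include_top=False, weights=None,
                                           input_shape=(299, 299, 3))
pooled = keras.layers.GlobalAveragePooling2D()(old_model.output)
new_model = keras.models.Model(old_model.inputs,
                               keras.layers.Dense(10)(pooled))

# Both models hold the very same layer objects:
print(all(layer in new_model.layers for layer in old_model.layers))  # True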
Question: how does it know what other layers we want in between?
When you do:
outputTensor = SomeLayer(...)(inputTensor)
you have a connection between the input and output. (Keras uses the underlying TensorFlow mechanism to add these tensors and nodes to the graph.) The output tensor cannot exist without the input. The entire InceptionV3 model is connected from start to end: its input tensor goes through all the layers to yield an output tensor. There is only one possible path for the data to follow, and the graph is that path.
When you get the output of this model and use it to get further outputs, all your new outputs are connected to this, and thus to the first input of the model.
Probably the attribute _keras_history that is added to the tensors is closely related to how it tracks the graph.
So, doing Model(old_model.inputs, new_output) will naturally follow the only way possible: the graph.
If you try doing this with tensors that are not connected, you will get an error.
Side question 1
Prefer to import from "keras.models". Basically, this module imports from the other module:
https://github.com/keras-team/keras/blob/master/keras/models.py
Notice that the file keras/models.py imports Model from keras.engine.training. So, it's the same thing.
Side question 2
It's not new_layer = keras.layers.Dense(...)(prev_layer).
It is output_tensor = keras.layers.Dense(...)(input_tensor).
You're doing two things in the same line:
Creating a layer - with keras.layers.Dense(...)
Calling the layer with an input tensor to get an output tensor
If you wanted to use the same layer with different inputs:
denseLayer = keras.layers.Dense(...) #creating a layer
output1 = denseLayer(input1) #calling a layer with an input and getting an output
output2 = denseLayer(input2) #calling the same layer on another input
output3 = denseLayer(input3) #again
Bonus - Creating a functional model that is equal to a sequential model
If you create this sequential model:
model = Sequential()
model.add(Layer1(...., input_shape=some_shape))
model.add(Layer2(...))
model.add(Layer3(...))
You're doing exactly the same as:
inputTensor = Input(some_shape)
outputTensor = Layer1(...)(inputTensor)
outputTensor = Layer2(...)(outputTensor)
outputTensor = Layer3(...)(outputTensor)
model = Model(inputTensor,outputTensor)
What is the difference?
Well, functional API models are totally free to be built any way you want. You can create branches:
out1 = Layer1(..)(inputTensor)
out2 = Layer2(..)(inputTensor)
You can join tensors:
joinedOut = Concatenate()([out1,out2])
With this, you can create anything you want with all kinds of fancy stuff, branches, gates, concatenations, additions, etc., which you can't do with a sequential model.
In fact, a Sequential model is also a Model, but created for quick use in models without branches.
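You can check this directly:
from keras.models import Sequential, Model

print(issubclass(Sequential, Model))  # True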
There's this way of building a model from a pretrained one that you may build upon.
See https://keras.io/applications/#fine-tune-inceptionv3-on-a-new-set-of-classes:
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(200, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
Each time a layer is added by an op like x = Dense(...), information about the computational graph is updated. You can type this interactively to see what it contains:
x.graph.__dict__
You can see there's all kinds of attributes, including about previous and next layers. These are internal implementation details and possibly change over time.
Is it possible to pass an array of my own filters as a parameter to Conv2D, instead of the number of filters?
filters = [[[1,0,0],[1,0,0],[1,0,0]],
[[1,0,0],[0,1,0],[0,0,1]],
[[0,1,0],[0,1,0],[0,1,0]],
[[0,0,1],[0,0,1],[0,0,1]]]
model = Sequential()
model.add(Conv2D(filters, (3, 3), activation='relu', input_shape=(3, 1024, 1024), data_format='channels_first'))
The accepted answer is right, but it would certainly be more useful with a complete example, similar to the one provided in this excellent TensorFlow example showing what Conv2D does.
For Keras, this is:
from keras.models import Sequential
from keras.layers import Conv2D
import numpy as np
# Keras version of this example:
# https://stackoverflow.com/questions/34619177/what-does-tf-nn-conv2d-do-in-tensorflow
# Requires a custom kernel initialiser to set the kernel to the value from the example
# kernel = [[1,0,1],[2,1,0],[0,0,1]]
# image = [[4,3,1,0],[2,1,0,1],[1,2,4,1],[3,1,0,2]]
# output = [[14, 6],[6,12]]
#Set Image
image = [[4,3,1,0],[2,1,0,1],[1,2,4,1],[3,1,0,2]]
# Reshape to "channels_last" format,
# which is [batch, width, height, channels] = [1, 4, 4, 1]
image = np.expand_dims(np.expand_dims(np.array(image), 2), 0)
# Initialiser to set the kernel to the required value
# (dtype=None keeps it compatible with versions that pass a dtype argument)
def kernel_init(shape, dtype=None):
    kernel = np.zeros(shape)
    kernel[:, :, 0, 0] = np.array([[1,0,1],[2,1,0],[0,0,1]])
    return kernel
#Build Keras model
model = Sequential()
model.add(Conv2D(1, [3,3], kernel_initializer=kernel_init,
input_shape=(4,4,1), padding="valid"))
model.build()
# To apply existing filter, we use predict with no training
out = model.predict(image)
print(out[0,:,:,0])
which outputs
[[14, 6]
[6, 12]]
as expected.
You must keep in mind that the purpose of a Conv2D network is to train these filter values. I mean, in a traditional image processing task using morphological filters, we are supposed to design the filter kernels and then iterate them over the whole image (convolution).
In a deep learning approach we try to do the same task. But here we assume we don't know in advance which filters should be used, even though we know exactly what we are looking for (the labeled images). When we train a convolutional neural network, we show it what we want and ask it to find its own weights, i.e. the filter values.
So, in this context, we just define how many filters we want to train (in your case, 4 filters) and how they will be initialized. Their weights will then be learned while training the network.
There are many ways to initialize your filter weights (e.g. setting them all to zero or one, or using a random function to guarantee that distinct image characteristics are caught by them). By default, the Keras Conv2D layer uses the glorot_uniform algorithm, as specified at https://keras.io/layers/convolutional/#conv2d.
If you really want to initialize your filter weights in the way you have shown, you can write your own function (take a look at https://keras.io/initializers/) and pass it via the kernel_initializer parameter:
model.add(Conv2D(number_of_filters, (3, 3), activation='relu', input_shape=(3, 1024, 1024), kernel_initializer=your_function, data_format='channels_first'))
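For your four 3x3 filters, such a function could look like this; a minimal sketch (my_filter is a hypothetical name, and replicating each filter across all input channels is an assumption):
import numpy as np

def my_filter(shape, dtype=None):
    # shape is (3, 3, in_channels, 4) for four 3x3 filters
    f = np.zeros(shape)
    kernels = [[[1,0,0],[1,0,0],[1,0,0]],
               [[1,0,0],[0,1,0],[0,0,1]],
               [[0,1,0],[0,1,0],[0,1,0]],
               [[0,0,1],[0,0,1],[0,0,1]]]
    for i, k in enumerate(kernels):
        for c in range(shape[2]):  # replicate across input channels
            f[:, :, c, i] = np.array(k)
    return f

model.add(Conv2D(4, (3, 3), activation='relu',
                 input_shape=(3, 1024, 1024),
                 kernel_initializer=my_filter,
                 data_format='channels_first'))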