What is the most efficient way to modify a Keras Model? - python

Is there a way to add nodes to a layer in an existing Keras model? If so, what is the most efficient way to do so?
Also, is it possible to do the same with layers, i.e. add a new layer to an existing Keras model (for example, right after the input layer)?
One way I know of is to use the Keras functional API, iterating over and cloning each layer of the model to create a "copy" of the original model with the desired changes. But is that the most efficient way to accomplish this task?

You can take the output of a layer in a model and build another model starting from it:
import tensorflow as tf
# One simple model
inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation='relu')(inputs)
outputs = tf.keras.layers.Dense(5, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)
# Make a second model starting from layer in previous model
x2 = tf.keras.layers.Dense(8, activation='relu')(model.layers[1].output)
outputs2 = tf.keras.layers.Dense(7, activation='softmax')(x2)
model2 = tf.keras.Model(inputs=model.input, outputs=outputs2)
Note that in this case model and model2 share the same input layer and first dense layer objects (model.layers[0] is model2.layers[0] and model.layers[1] is model2.layers[1]).
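For the other half of the question (adding a layer right after the input), the same idea can be turned around: create a new input, apply the new layer, then call the existing layers on the result. This is only a minimal sketch for a model without branches. The inserted layer and its name are made up for illustration, and it has to output the shape that the next existing layer was built for.

```python
import tensorflow as tf

# Original model, same as in the answer above.
inputs = tf.keras.Input(shape=(3,))
x = tf.keras.layers.Dense(4, activation="relu")(inputs)
outputs = tf.keras.layers.Dense(5, activation="softmax")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# New input plus a hypothetical extra layer placed right after it.
# It keeps 3 features because the first existing Dense layer was built
# for inputs with 3 features.
new_inputs = tf.keras.Input(shape=(3,))
h = tf.keras.layers.Dense(3, activation="relu", name="inserted")(new_inputs)

# Re-apply the existing layers (skipping the old InputLayer) in order.
# Only valid here because the model is a plain chain of layers.
for layer in model.layers[1:]:
    h = layer(h)

new_model = tf.keras.Model(inputs=new_inputs, outputs=h)
new_model.summary()
```

The existing layers are reused, not copied, so their weights are shared between model and new_model.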

Related

Adding Dropout Layers to Segmentation_Models Resnet34 with Keras

I want to use the Segmentation_Models UNet (with a ResNet34 backbone) for uncertainty estimation, so I want to add some Dropout layers to the upsampling part. The model is not Sequential, so I think I have to reconnect some outputs to the new Dropout layers, and the inputs of the following layers to the outputs of the Dropout layers.
I'm not sure what the right way to do this is. I'm currently trying this:
# create model
model = sm.Unet('resnet34', classes=1, activation='sigmoid', encoder_weights='imagenet')
# define optimizer, loss and metrics
optim = tf.keras.optimizers.Adam(0.001)
total_loss = sm.losses.binary_focal_dice_loss # or sm.losses.categorical_focal_dice_loss
metrics = ['accuracy', sm.metrics.IOUScore(threshold=0.5), sm.metrics.FScore(threshold=0.5)]
# get input layer
updated_model_layers = model.layers[0]
# iterate over old model and add Dropout after given Convolutions
for layer in model.layers[1:]:
    # take old layer and add to new Model
    updated_model_layers = layer(updated_model_layers.output)
    # after some convolutions, add Dropout
    if layer.name in ['decoder_stage0b_conv', 'decoder_stage0a_conv', 'decoder_stage1a_conv', 'decoder_stage1b_conv', 'decoder_stage2a_conv',
                      'decoder_stage2b_conv', 'decoder_stage3a_conv', 'decoder_stage3b_conv', 'decoder_stage4a_conv']:
        if (uncertain):
            # activate dropout in predictions
            next_layer = Dropout(0.1) (updated_model_layers, training=True)
        else:
            # add dropout layer
            next_layer = Dropout(0.1) (updated_model_layers)
        # add reconnected Dropout Layer
        updated_model_layers = next_layer
model = Model(model.layers[0], updated_model_layers)
This throws the following Error: AttributeError: 'KerasTensor' object has no attribute 'output'
But I think I'm doing something wrong. Does anybody have a Solution for this?
The problem is with the ResNet-based model you are using. It is complex and has Add and Concatenate layers (for the residual and skip connections, I guess), which take as input a list of tensors from several "subnetworks". In other words, the network is not linear, so you can't walk through the model with a simple loop.
Regarding your error: in the loop, layer is a layer, but updated_model_layers is a tensor after the first iteration, because calling a layer in the functional API returns a tensor. A tensor has no output attribute, so updated_model_layers.output fails. You are mixing up layers and tensors.
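To make the layer/tensor distinction concrete, here is a minimal sketch of the loop written against tensors. It uses a small linear stand-in model with made-up layer names, not the Segmentation_Models U-Net, and for the reason given above it would not work unchanged on a network with Add or Concatenate layers.

```python
import tensorflow as tf

# Hypothetical linear stand-in model; layer names are invented for the example.
inputs = tf.keras.Input(shape=(8,))
h = tf.keras.layers.Dense(16, activation="relu", name="block_a")(inputs)
h = tf.keras.layers.Dense(16, activation="relu", name="block_b")(h)
outputs = tf.keras.layers.Dense(1, activation="sigmoid", name="head")(h)
model = tf.keras.Model(inputs, outputs)

dropout_after = {"block_a", "block_b"}  # layers that get a Dropout behind them
uncertain = True                        # keep dropout active at prediction time

x = model.input                  # a symbolic tensor, not a layer
for layer in model.layers[1:]:   # skip the InputLayer
    x = layer(x)                 # calling a layer on a tensor returns a tensor
    if layer.name in dropout_after:
        drop = tf.keras.layers.Dropout(0.1)
        # training=True forces dropout on even during inference
        x = drop(x, training=True) if uncertain else drop(x)

new_model = tf.keras.Model(model.input, x)
new_model.summary()
```

Note that x is always a tensor here and layer is always a layer, so there is never a need for .output inside the loop.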

Using prediction from keras model as a layer inside another keras model

Suppose we have a model already trained for some task. Can we use that model's prediction as a Lambda layer inside another model? I am thinking of something in the following format:
pretrained_model=get_Model() #Loaded from a different file
pretrained_model.load_weights('pretrained_model_weights.h5')
base_model = VGG16(weights='imagenet', include_top=False, input_shape=(240,320,3))
for layer in base_model.layers:
    layer.trainable = True
img_input=base_model.input
encoded=base_model.output
pretrained_model_output=Lambda(lambda x: pretrained_model.predict(img_input))
#Then run pretrained_model_output through an architecture that gives same output size as base_model.output and then
concat = Concatenate([img_input,Output_Convolutions_pretrained_model_output],axis=-1)
#then feed this through another block in the model
Is something like this viable in Keras?
This is much easier than you think. A Keras model can be called on a tensor just like a layer, so you just need to do:
pretrained_model_output= pretrained_model(img_input)
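A self-contained sketch of that idea follows. Both models here are tiny stand-ins invented for the example (a single Conv2D instead of the real pretrained model and instead of VGG16), and freezing the pretrained part is an assumption about what is wanted.

```python
import tensorflow as tf

# Hypothetical stand-in for the already trained model.
p_in = tf.keras.Input(shape=(32, 32, 3))
p_out = tf.keras.layers.Conv2D(4, 3, padding="same")(p_in)
pretrained_model = tf.keras.Model(p_in, p_out, name="pretrained")
pretrained_model.trainable = False  # assumption: keep its weights fixed

img_input = tf.keras.Input(shape=(32, 32, 3))
# Stand-in for the VGG16 encoder from the question.
encoded = tf.keras.layers.Conv2D(8, 3, padding="same")(img_input)

# Call the whole pretrained model like a layer; no Lambda or predict() needed.
pretrained_features = pretrained_model(img_input)

# Concatenate is a layer too: construct it first, then call it on a list of tensors.
merged = tf.keras.layers.Concatenate(axis=-1)([encoded, pretrained_features])
outputs = tf.keras.layers.Conv2D(1, 1, activation="sigmoid")(merged)
model = tf.keras.Model(img_input, outputs)
model.summary()
```

The pretrained model shows up as a single nested layer in the summary, and gradients do not update it because it is marked as non-trainable.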

Adding a Concatenated layer to TensorFlow 2.0 (using Attention)

In building a model that uses TensorFlow 2.0 Attention I followed the example given in the TF docs. https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention
The last line in the example is
input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])
Then the example has the comment
# Add DNN layers, and create Model.
# ...
So it seemed logical to do this
model = tf.keras.Sequential()
model.add(input_layer)
This produces the error
TypeError: The added layer must be an instance of class Layer.
Found: Tensor("concatenate/Identity:0", shape=(None, 200), dtype=float32)
UPDATE (after @thushv89's response)
What I am trying to do in the end is add an attention layer in the following model which works well (or convert it to an attention model).
model = tf.keras.Sequential()
model.add(layers.Embedding(vocab_size, embedding_nodes, input_length=max_length))
model.add(layers.LSTM(20))
#add attention here?
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error', metrics=['accuracy'])
My data looks like this
4912,5059,5079,0
4663,5145,5146,0
4663,5145,5146,0
4840,5117,5040,0
Where the first three columns are the inputs and the last column is binary and the goal is classification. The data was prepared similarly to this example with a similar purpose, binary classification. https://machinelearningmastery.com/use-word-embedding-layers-deep-learning-keras/
First, Keras has three APIs for creating models:
Sequential - (Which is what you're doing here)
Functional - (Which is what I'm using in the solution)
Subclassing - Creating Python classes to represent custom models/layers
The model in the tutorial is built with the Functional API, not the Sequential API. In that example input_layer is a tensor (the result of calling a Concatenate layer), not a layer, which is why model.add() rejects it. So you have to do the following. Note that I've taken the liberty of defining the dense layers with arbitrary parameters (e.g. the number of output classes, which you can change as needed).
import tensorflow as tf
# Variable-length int sequences.
query_input = tf.keras.Input(shape=(None,), dtype='int32')
value_input = tf.keras.Input(shape=(None,), dtype='int32')
# ... the code in the middle
# Concatenate query and document encodings to produce a DNN input layer.
input_layer = tf.keras.layers.Concatenate()(
    [query_encoding, query_value_attention])
# Add DNN layers, and create Model.
# ...
dense_out = tf.keras.layers.Dense(50, activation='relu')(input_layer)
pred = tf.keras.layers.Dense(10, activation='softmax')(dense_out)
model = tf.keras.models.Model(inputs=[query_input, value_input], outputs=pred)
model.summary()
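For the model in the update, one possible way to slot attention between the LSTM and the final Dense layer is sketched below. This is an assumption about what is wanted, not something from the TF docs example: it uses self-attention over the LSTM outputs (query and value are the same tensor) followed by average pooling, and the sizes are made-up values chosen to fit the sample data.

```python
import numpy as np
import tensorflow as tf

# Hypothetical sizes; adjust to the real vocabulary and sequence length.
vocab_size, embedding_nodes, max_length = 6000, 16, 3

inputs = tf.keras.Input(shape=(max_length,), dtype="int32")
x = tf.keras.layers.Embedding(vocab_size, embedding_nodes)(inputs)
# return_sequences=True keeps one vector per timestep for attention to use.
seq = tf.keras.layers.LSTM(20, return_sequences=True)(x)
# Self-attention: the LSTM outputs act as both query and value.
att = tf.keras.layers.Attention()([seq, seq])
# Collapse the time axis so the classifier sees one vector per sample.
pooled = tf.keras.layers.GlobalAveragePooling1D()(att)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(pooled)
model = tf.keras.Model(inputs, outputs)

# The first three columns of the rows shown in the question.
sample = np.array([[4912, 5059, 5079],
                   [4663, 5145, 5146],
                   [4663, 5145, 5146],
                   [4840, 5117, 5040]], dtype="int32")
preds = model.predict(sample, verbose=0)
print(preds.shape)
```

Whether attention helps on sequences of only three tokens is a separate question; the sketch only shows how the pieces connect.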

Make fixed timestep length LSTM Keras model free timestep length

I have a Keras LSTM multitask model that performs two tasks. One is a sequence tagging task (so I predict a label per token). The other is a global classification task over the whole sequence using a CNN that is stacked on the hidden states of the LSTM.
In my setup (don't ask why) I only need the CNN task during training, but the labels it predicts have no use in the final product. In Keras, one can train an LSTM model without specifying the input sequence length, like this:
l_input = Input(shape=(None,), dtype="int32", name=input_name)
However, if I add the CNN stacked on the LSTM hidden states I need to set a fixed sequence length for the model.
l_input = Input(shape=(timesteps_size,), dtype="int32", name=input_name)
The problem is that once I have trained the model with a fixed timesteps_size, I can no longer use it to predict longer sequences.
In other frameworks this is not a problem. But in Keras, I cannot get rid of the CNN and change the expected input shape of the model once it has been trained.
Here is a simplified version of the model
l_input = Input(shape=(timesteps_size,), dtype="int32")
l_embs = Embedding(len(input.keys()), 100)(l_input)
l_blstm = Bidirectional(GRU(300, return_sequences=True))(l_embs)
# Sequential output
l_out1 = TimeDistributed(Dense(len(labels.keys()),
                               activation="softmax"))(l_blstm)
# Global output
conv1 = Conv1D( filters=5 , kernel_size=10 )( l_embs )
conv1 = Flatten()(MaxPooling1D(pool_size=2)( conv1 ))
conv2 = Conv1D( filters=5 , kernel_size=8 )( l_embs )
conv2 = Flatten()(MaxPooling1D(pool_size=2)( conv2 ))
conv = Concatenate()( [conv1,conv2] )
conv = Dense(50, activation="relu")(conv)
l_out2 = Dense( len(global_labels.keys()) ,activation='softmax')(conv)
model = Model(inputs=l_input, outputs=[l_out1, l_out2])
optimizer = Adam()
model.compile(optimizer=optimizer,
              loss="categorical_crossentropy",
              metrics=["accuracy"])
I would like to know if anyone here has faced this issue, and if there is any way to delete layers from a model after training and, more importantly, to change the input layer size after training.
Thanks
A variable number of timesteps is a problem here not because of the convolution layers (in fact, a nice property of convolution layers is that they do not depend on the input size). The Flatten layers cause the problem: their output size depends on the number of timesteps, and the Dense layer that follows needs an input of fixed size. Instead, you can use global pooling layers. Further, I think stacking convolution and pooling layers on top of each other might give a better result than using two separate convolution layers and merging them (although this depends on the specific problem and dataset you are working on). Considering these two points, it might be better to write your model like this:
# Global output
conv1 = Conv1D(filters=16, kernel_size=5)(l_embs)
conv1 = MaxPooling1D(pool_size=2)(conv1)
conv2 = Conv1D(filters=32, kernel_size=5)(conv1)
conv2 = MaxPooling1D(pool_size=2)(conv2)
gpool = GlobalAveragePooling1D()(conv2)
x = Dense(50, activation="relu")(gpool)
l_out2 = Dense(len(global_labels.keys()), activation='softmax')(x)
model = Model(inputs=l_input, outputs=[l_out1, l_out2])
You may need to tune the number of conv+maxpool layers, number of filters, kernel size and even add dropout or batch normalization layers.
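A quick way to check the point about global pooling is to build a small model with an unspecified sequence length and feed it batches of different lengths. This is a reduced sketch with invented sizes (vocabulary of 1000, 3 classes), not the full multitask model.

```python
import numpy as np
import tensorflow as tf

# shape=(None,) leaves the number of timesteps open.
l_input = tf.keras.Input(shape=(None,), dtype="int32")
l_embs = tf.keras.layers.Embedding(1000, 16)(l_input)
conv = tf.keras.layers.Conv1D(filters=8, kernel_size=5)(l_embs)
conv = tf.keras.layers.MaxPooling1D(pool_size=2)(conv)
# Global pooling averages over the time axis, so the output size
# no longer depends on the sequence length.
gpool = tf.keras.layers.GlobalAveragePooling1D()(conv)
out = tf.keras.layers.Dense(3, activation="softmax")(gpool)
model = tf.keras.Model(l_input, out)

# Two batches with different sequence lengths go through the same model.
short_batch = np.random.randint(0, 1000, size=(2, 20)).astype("int32")
long_batch = np.random.randint(0, 1000, size=(2, 75)).astype("int32")
short_pred = model.predict(short_batch, verbose=0)
long_pred = model.predict(long_batch, verbose=0)
print(short_pred.shape, long_pred.shape)
```

Sequences still need to be at least as long as the convolution kernel, and all sequences within one batch must share the same length.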
As a side note, using TimeDistributed on a Dense layer is redundant as the Dense layer is applied on the last axis.
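The side note can be checked directly: a Dense layer applied to a 3D tensor acts on the last axis, so wrapping the same layer in TimeDistributed should give the same values. A small sketch with arbitrary shapes:

```python
import numpy as np
import tensorflow as tf

dense = tf.keras.layers.Dense(4)
wrapped = tf.keras.layers.TimeDistributed(dense)  # wraps the very same layer

# (batch, timesteps, features)
x = tf.constant(np.random.rand(2, 5, 3).astype("float32"))

y_wrapped = np.asarray(wrapped(x))  # applied per timestep via the wrapper
y_plain = np.asarray(dense(x))      # applied directly on the last axis

print(y_plain.shape, np.allclose(y_plain, y_wrapped, atol=1e-5))
```

Both calls use the same kernel and bias, so any difference would come only from how the time axis is handled.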

Constructing a keras model

I don't understand what's happening in this code:
def construct_model(use_imagenet=True):
    # line 1: how do we keep all layers of this model ?
    model = keras.applications.InceptionV3(include_top=False, input_shape=(IMG_SIZE, IMG_SIZE, 3),
                                           weights='imagenet' if use_imagenet else None)
    new_output = keras.layers.GlobalAveragePooling2D()(model.output)
    new_output = keras.layers.Dense(N_CLASSES, activation='softmax')(new_output)
    model = keras.engine.training.Model(model.inputs, new_output)
    return model
Specifically, my confusion is, when we call the last constructor
model = keras.engine.training.Model(model.inputs, new_output)
we specify input layer and output layer, but how does it know we want all the other layers to stay?
In other words, we append the new output layers to the pre-trained model we load in line 1, and then in the final constructor (final line) we just create and return a model with the specified input and output. But how does it know what other layers we want in between?
Side question 1): What is the difference between keras.engine.training.Model and keras.models.Model?
Side question 2): What exactly happens when we do new_layer = keras.layers.Dense(...)(prev_layer)? Does the () operation return new layer, what does it do exactly?
This model was created using the functional API (Model).
Basically it works like this (it may be clearer if you read "Side question 2" below first):
You have an input tensor (you can see it as "input data" too)
You create (or reuse) a layer
You pass the input tensor to a layer (you "call" a layer with an input)
You get an output tensor
You keep working with these tensors until you have created the entire graph.
But this hasn't created a "model" yet (something you can train, predict with, and so on).
All you have is a graph telling which tensors go where.
To create a model, you define its start and end points.
In the example:
They take an existing model: model = keras.applications.InceptionV3(...)
They want to expand this model, so they get its output tensor: model.output
They pass this tensor as the input of a GlobalAveragePooling2D layer
They get this layer's output tensor as new_output
They pass this as input to yet another layer: Dense(N_CLASSES, ....)
And get its output as new_output (this var was replaced as they are not interested in keeping its old value...)
But, as it works with the functional API, we don't have a model yet, only a graph. In order to create a model, we use Model defining the input tensor and the output tensor:
new_model = Model(old_model.inputs, new_output)
Now you have your model.
If you store it in another variable, as I did (new_model), the old model still exists in model. The two models share the same layers, so whenever you train one of them, the other gets updated as well.
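The sharing can be seen without training anything. In this minimal sketch (tiny Dense layers instead of InceptionV3, and set_weights standing in for a training step), changing a layer through the new model also changes it in the old one, because it is the same Python object.

```python
import numpy as np
import tensorflow as tf

# Small stand-in for the pretrained model.
inputs = tf.keras.Input(shape=(3,))
hidden = tf.keras.layers.Dense(4, activation="relu")(inputs)
old_model = tf.keras.Model(inputs, hidden)

# Extend it the same way as in the example above.
new_output = tf.keras.layers.Dense(2, activation="softmax")(old_model.output)
new_model = tf.keras.Model(old_model.inputs, new_output)

# Overwrite the shared layer's weights through the new model...
shared = new_model.layers[1]
shared.set_weights([np.zeros_like(w) for w in shared.get_weights()])

# ...and read them back through the old model.
old_kernel = old_model.layers[1].get_weights()[0]
print(new_model.layers[1] is old_model.layers[1], old_kernel.sum())
```

If independent copies are needed, the old model would have to be cloned first (for example with tf.keras.models.clone_model) before extending it.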
Question: how does it know what other layers we want in between?
When you do:
outputTensor = SomeLayer(...)(inputTensor)
you have a connection between the input and output. (Keras will use the inner tensorflow mechanism and add these tensors and nodes to the graph). The output tensor cannot exist without the input. The entire InceptionV3 model is connected from start to end. Its input tensor goes through all the layers to yield an output tensor. There is only one possible path for the data to follow, and that path is the graph.
When you get the output of this model and use it to get further outputs, all your new outputs are connected to this, and thus to the first input of the model.
Probably the attribute _keras_history that is added to the tensors is closely related to how it tracks the graph.
So, doing Model(old_model.inputs, new_output) will naturally follow the only way possible: the graph.
If you try doing this with tensors that are not connected, you will get an error.
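A minimal sketch of that failure case, with two unrelated inputs invented for the example. The exact wording of the message depends on the Keras version, so the code only captures it.

```python
import tensorflow as tf

input_a = tf.keras.Input(shape=(3,))
input_b = tf.keras.Input(shape=(3,))
# output_b is computed from input_b only; input_a is not part of its history.
output_b = tf.keras.layers.Dense(2)(input_b)

error_message = None
try:
    # There is no path from input_a to output_b, so Keras refuses to build this.
    tf.keras.Model(input_a, output_b)
except ValueError as err:
    error_message = str(err)

print(error_message)
```

Passing input_b instead of input_a builds the model without complaint.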
Side question 1
Prefer to import from "keras.models". Basically, this module will import from the other module:
https://github.com/keras-team/keras/blob/master/keras/models.py
Notice that the file keras/models.py imports Model from keras.engine.training. So, it's the same thing.
Side question 2
It's not new_layer = keras.layers.Dense(...)(prev_layer).
It is output_tensor = keras.layers.Dense(...)(input_tensor).
You're doing two things in the same line:
Creating a layer - with keras.layers.Dense(...)
Calling the layer with an input tensor to get an output tensor
If you wanted to use the same layer with different inputs:
denseLayer = keras.layers.Dense(...) #creating a layer
output1 = denseLayer(input1) #calling a layer with an input and getting an output
output2 = denseLayer(input2) #calling the same layer on another input
output3 = denseLayer(input3) #again
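A runnable version of the same idea, with made-up shapes: one Dense layer is created once and called on two inputs, so the resulting model has a single kernel and a single bias even though it has two outputs.

```python
import tensorflow as tf

input1 = tf.keras.Input(shape=(6,))
input2 = tf.keras.Input(shape=(6,))

dense_layer = tf.keras.layers.Dense(3)  # creating the layer (once)
output1 = dense_layer(input1)           # calling it on the first input
output2 = dense_layer(input2)           # calling the same layer again

model = tf.keras.Model([input1, input2], [output1, output2])
# One kernel of shape (6, 3) plus one bias of shape (3,): 21 parameters in total.
print(len(model.trainable_weights), model.count_params())
```

Writing Dense(3)(input1) and Dense(3)(input2) instead would create two separate layers with independent weights.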
Bonus - Creating a functional model that is equal to a sequential model
If you create this sequential model:
model = Sequential()
model.add(Layer1(...., input_shape=some_shape))
model.add(Layer2(...))
model.add(Layer3(...))
You're doing exactly the same as:
inputTensor = Input(some_shape)
outputTensor = Layer1(...)(inputTensor)
outputTensor = Layer2(...)(outputTensor)
outputTensor = Layer3(...)(outputTensor)
model = Model(inputTensor,outputTensor)
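With concrete layers filled in (the sizes are arbitrary choices for illustration), the two styles can be compared side by side. They end up with the same layer structure and parameter count, though not the same weight values, since each model initializes its own layers.

```python
import tensorflow as tf

# Sequential version.
seq_model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Functional version of the same architecture.
input_tensor = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(16, activation="relu")(input_tensor)
x = tf.keras.layers.Dense(4, activation="relu")(x)
output_tensor = tf.keras.layers.Dense(1)(x)
func_model = tf.keras.Model(input_tensor, output_tensor)

print(seq_model.count_params(), func_model.count_params())
```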
What is the difference?
Well, functional API models are totally free to be built any way you want. You can create branches:
out1 = Layer1(..)(inputTensor)
out2 = Layer2(..)(inputTensor)
You can join tensors:
joinedOut = Concatenate()([out1,out2])
With this, you can create anything you want with all kinds of fancy stuff, branches, gates, concatenations, additions, etc., which you can't do with a sequential model.
In fact, a Sequential model is also a Model, but created for a quick use in models without branches.
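Putting the branch and join snippets together into something runnable, with layer types and sizes picked only for illustration:

```python
import tensorflow as tf

input_tensor = tf.keras.Input(shape=(10,))

# Two branches starting from the same input tensor.
out1 = tf.keras.layers.Dense(4, activation="relu")(input_tensor)
out2 = tf.keras.layers.Dense(6, activation="tanh")(input_tensor)

# Join the branches: 4 + 6 = 10 features.
joined = tf.keras.layers.Concatenate()([out1, out2])

# A residual-style addition with the input (both have 10 features).
summed = tf.keras.layers.Add()([joined, input_tensor])

output = tf.keras.layers.Dense(1)(summed)
model = tf.keras.Model(input_tensor, output)
model.summary()
```

None of this can be expressed with Sequential, since every layer there has exactly one input and one output.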
There's this way of building a model from a pretrained one that you may build upon.
See https://keras.io/applications/#fine-tune-inceptionv3-on-a-new-set-of-classes:
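from keras.applications.inception_v3 import InceptionV3
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

# load the pretrained base model without its classification head
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(200, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
# freeze the pretrained layers so only the new head is trained
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')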
Each time a layer is added by an op like "x=Dense(...", information about the computational graph is updated. You can type this interactively to see what it contains:
x.graph.__dict__
You will see all kinds of attributes, including information about previous and next layers. These are internal implementation details and may change over time.
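Since those internals differ between Keras versions (and the graph attribute may not be available on symbolic tensors in every version), a more stable way to look at what a model contains is the public layers list. A small sketch on a toy model with invented layer names:

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(4,))
x = tf.keras.layers.Dense(8, activation="relu", name="hidden")(inputs)
outputs = tf.keras.layers.Dense(2, activation="softmax", name="predictions")(x)
model = tf.keras.Model(inputs, outputs)

# model.layers lists every layer reachable between the inputs and the outputs.
layer_names = [layer.name for layer in model.layers]
for layer in model.layers[1:]:
    # layer.output is the symbolic tensor produced by that layer in this graph
    print(layer.name, tuple(layer.output.shape))
```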
