Training the same model with two different outputs with Keras - python

I have a simple GRU network coded with Keras in python as below:
gru1 = GRU(16, activation='tanh', return_sequences=True)(input)
dense = TimeDistributed(Dense(16, activation='tanh'))(gru1)
output = TimeDistributed(Dense(1, activation="sigmoid"))(dense)
I've used a sigmoid activation for the output since my purpose is classification. But I need to use the same model for regression as well, which would mean changing the output activation to linear while the rest of the network stays the same. So in this case I would use two different networks for the two different purposes: the inputs are the same, but the outputs are classes for the sigmoid activation and values for the linear one.
My question is, is there any way to use only one network but get two different outputs at the end? Thanks.

Yes, you can use the functional API to design a multi-output model: keep the shared layers and add two different outputs, one with a sigmoid and the other with a linear activation.
N.B.: Don't use input as a variable name; it shadows the built-in input() function in Python.
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
seq_len = 100 # your sequence length
input_ = Input(shape=(seq_len,1))
gru1 = GRU(16, activation='tanh', return_sequences=True)(input_)
dense = TimeDistributed(Dense(16, activation='tanh'))(gru1)
output1 = TimeDistributed(Dense(1, activation="sigmoid"), name="out1")(dense)  # name the TimeDistributed layer itself,
output2 = TimeDistributed(Dense(1, activation="linear"), name="out2")(dense)   # so the loss dict keys below match the output names
model = Model(input_, [output1, output2])
model.summary()
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) [(None, 100, 1)] 0
__________________________________________________________________________________________________
gru_2 (GRU) (None, 100, 16) 912 input_3[0][0]
__________________________________________________________________________________________________
time_distributed_3 (TimeDistrib (None, 100, 16) 272 gru_2[0][0]
__________________________________________________________________________________________________
out1 (TimeDistributed) (None, 100, 1) 17 time_distributed_3[0][0]
__________________________________________________________________________________________________
out2 (TimeDistributed) (None, 100, 1) 17 time_distributed_3[0][0]
==================================================================================================
Total params: 1,218
Trainable params: 1,218
Non-trainable params: 0
Compiling with two loss functions:
losses = {
    "out1": "binary_crossentropy",
    "out2": "mse",
}
# initialize the optimizer and compile the model
model.compile(optimizer='adam', loss=losses, metrics=["accuracy", "mae"])
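To train, pass one target array per output; the dict keys match the output layer names ("out1", "out2"). A minimal sketch with random placeholder data (X, y_class and y_reg are hypothetical arrays, not from the original post):
import numpy as np
num_samples = 32
X = np.random.rand(num_samples, seq_len, 1)
y_class = np.random.randint(0, 2, size=(num_samples, seq_len, 1))  # per-timestep class labels
y_reg = np.random.rand(num_samples, seq_len, 1)                    # per-timestep regression targets
model.fit(X, {"out1": y_class, "out2": y_reg}, epochs=2, batch_size=8)
If one task should dominate, you can additionally pass a loss_weights dict with the same keys to model.compile.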

Related

Cannot resolve error in keras sequential model

I am working on a gesture recognition problem, for which I have a training set. The training set consists of multiple folders, each containing a series of 30 images from which the model is trained. I also have a CSV file that contains the class label of each folder. The class labels are: "Left Swipe", "Right Swipe", "Stop", "Thumbs Down" and "Thumbs Up", and they are stored in one np.array variable, train_class. I have created a CNN model and am then feeding it into a Sequential model.
The code is available at the GitHub location below:
https://github.com/subhrajyoti-ghosh/ML-and-Deep-Learning/blob/main/Gesture_Recognition.ipynb
But when I try to fit the model, I receive an error. Can you please help me understand the error and how to solve it?
You are trying to use a TimeDistributed layer on a 2D input (batch_size, 256), which will not work, because the layer needs at least a 3D tensor. You should try using tf.keras.layers.RepeatVector:
import tensorflow as tf
resnet = tf.keras.applications.ResNet50(include_top=False,weights='imagenet',input_shape=(224,224,3))
cnn = tf.keras.Sequential([resnet])
cnn.add(tf.keras.layers.Conv2D(64,(2,2),strides=(1,1)))
cnn.add(tf.keras.layers.Conv2D(16,(3,3),strides=(1,1)))
cnn.add(tf.keras.layers.Flatten())
inputs = tf.keras.layers.Input(shape=(224,224,3))
x = cnn(inputs)
x = tf.keras.layers.RepeatVector(n=30)(x)
x = tf.keras.layers.GRU(16,return_sequences=True)(x)
x = tf.keras.layers.GRU(8)(x)
outputs = tf.keras.layers.Dense(5,activation='softmax')(x)
model = tf.keras.Model(inputs, outputs)
dummy_x = tf.random.normal((1, 224,224,3))
print(model.summary())
print(model(dummy_x))
Model: "model_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_14 (InputLayer) [(None, 224, 224, 3)] 0
sequential_6 (Sequential) (None, 256) 24121296
repeat_vector_2 (RepeatVector) (None, 30, 256) 0
gru_5 (GRU) (None, 30, 16) 13152
gru_6 (GRU) (None, 8) 624
dense_7 (Dense) (None, 5) 45
=================================================================
Total params: 24,135,117
Trainable params: 24,081,997
Non-trainable params: 53,120
_________________________________________________________________
None

Keras Transfer-Learning setting layers.trainable to True has no effect

I want to fine-tune EfficientNet using tf.keras (TensorFlow 2.3), but I cannot change the training status of the layers properly. My model looks like this:
data_augmentation_layers = tf.keras.Sequential([
    keras.layers.experimental.preprocessing.RandomFlip("horizontal_and_vertical"),
    keras.layers.experimental.preprocessing.RandomRotation(0.8)])
efficientnet = EfficientNetB3(weights="imagenet", include_top=False,
                              input_shape=(*img_size, 3))
# Setting to not trainable as described in the standard Keras FAQ
efficientnet.trainable = False
inputs = keras.layers.Input(shape=(*img_size, 3))
augmented = data_augmentation_layers(inputs)  # same variable as defined above
base = efficientnet(augmented, training=False)
pooling = keras.layers.GlobalAveragePooling2D()(base)
outputs = keras.layers.Dense(5, activation="softmax")(pooling)
model = keras.Model(inputs=inputs, outputs=outputs)
model.compile(loss="categorical_crossentropy", optimizer=keras_opt, metrics=["categorical_accuracy"])  # keras_opt is defined elsewhere
This is done so that the random weights of my custom top won't destroy the pretrained weights right away.
Model: "functional_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 512, 512, 3)] 0
_________________________________________________________________
sequential (Sequential) (None, 512, 512, 3) 0
_________________________________________________________________
efficientnetb3 (Functional) (None, 16, 16, 1536) 10783535
_________________________________________________________________
global_average_pooling2d (Gl (None, 1536) 0
_________________________________________________________________
dense (Dense) (None, 5) 7685
=================================================================
Total params: 10,791,220
Trainable params: 7,685
Non-trainable params: 10,783,535
Everything seems to work up to this point. I train my model for 2 epochs and then I want to start fine-tuning the EfficientNet base. Thus I call:
for l in model.get_layer("efficientnetb3").layers:
    if not isinstance(l, keras.layers.BatchNormalization):
        l.trainable = True
model.compile(loss="categorical_crossentropy", optimizer=keras_opt, metrics=["categorical_accuracy"])
I recompiled and printed the summary again, only to see that the number of non-trainable weights remained the same. Also, fitting does not bring better results than keeping the base frozen.
dense (Dense) (None, 5) 7685
=================================================================
Total params: 10,791,220
Trainable params: 7,685
Non-trainable params: 10,783,535
PS: I also tried efficientnet.trainable = True, but this also had no effect.
Could it have something to do with the fact that I'm using a Sequential and a functional model at the same time?
For me the problem was using the Sequential API for part of the model. When I switched to the functional API, my model.summary() displayed all the sublayers, and it was possible to set some of them as trainable and others not.
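A minimal sketch of that fix (not the original poster's exact setup; weights=None, the input size, and the adam optimizer are placeholders here): build the head directly on the base model's output tensor, so the EfficientNet sublayers appear in model.layers and their trainable flags take effect after recompiling.
import tensorflow as tf
from tensorflow import keras
base = keras.applications.EfficientNetB3(weights=None, include_top=False,
                                         input_shape=(300, 300, 3))
x = keras.layers.GlobalAveragePooling2D()(base.output)
outputs = keras.layers.Dense(5, activation="softmax")(x)
model = keras.Model(base.input, outputs)
# Unfreeze everything except BatchNormalization, then recompile so the
# change is picked up by training.
for layer in model.layers:
    layer.trainable = not isinstance(layer, keras.layers.BatchNormalization)
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["categorical_accuracy"])
model.summary()  # the trainable/non-trainable parameter counts now reflect the flags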

In Keras elmo embedding layer has 0 parameters? is this normal?

So I was using GloVe with my model and it worked, but now I changed to ELMo (following the Keras code available on GitHub: Elmo Keras Github, utils.py).
However, when I print model.summary() I get 0 parameters in the ELMo embedding layer, unlike when I was using GloVe. Is that normal? If not, can you please tell me what I am doing wrong?
Using GloVe I got over 20 million parameters.
## --------> When I was using the GloVe embedding layer
word_embedding_layer = emb.get_keras_embedding(#dropout = emb_dropout,
                                               trainable=True,
                                               input_length=sent_maxlen,
                                               name='word_embedding_layer')

## --------> Deep layers
pos_embedding_layer = Embedding(output_dim=pos_tag_embedding_size, #5
                                input_dim=len(SPACY_POS_TAGS),
                                input_length=sent_maxlen, #20
                                name='pos_embedding_layer')
latent_layers = stack_latent_layers(num_of_latent_layers)

## --------> 6] Dropout
dropout = Dropout(0.1)

## --------> 7] Prediction
predict_layer = predict_classes()

## --------> 8] Prepare input features, and indicate how to embed them
inputs = [Input((sent_maxlen,), dtype='int32', name='word_inputs'),
          Input((sent_maxlen,), dtype='int32', name='predicate_inputs'),
          Input((sent_maxlen,), dtype='int32', name='postags_inputs')]

## --------> 9] ELMo embedding; concatenate all inputs and run the deep network
from elmo import ELMoEmbedding
import utils
idx2word = utils.get_idx2word()
ELmoembedding1 = ELMoEmbedding(idx2word=idx2word, output_mode="elmo", trainable=True)(inputs[0]) # These two are interchangeable
ELmoembedding2 = ELMoEmbedding(idx2word=idx2word, output_mode="elmo", trainable=True)(inputs[1]) # These two are interchangeable
embeddings = [ELmoembedding1,
              ELmoembedding2,
              pos_embedding_layer(inputs[2])]  # inputs has 3 elements, so the POS input is inputs[2]
con1 = keras.layers.concatenate(embeddings)

## --------> 10] Build model
outputI = predict_layer(dropout(latent_layers(con1)))
model = Model(inputs, outputI)
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
model.summary()
Trials:
Note: I tried using the TF-Hub ELMo with Keras code, but the output was always a 2D tensor (even when I changed it to the 'elmo' setting and 'lstm' instead of the default), so I couldn't concatenate it with pos_embedding_layer. I tried reshaping, but eventually I got the same issue: 0 total parameters.
From the TF-Hub description (https://tfhub.dev/google/elmo/2), the embeddings of individual words are not trainable; only the weighted sum of the embedding and LSTM layers is. So you should get 4 trainable parameters at the ELMo level.
I was able to get the trainable parameters using the class defined in StrongIO's example on GitHub. That example only provides a class whose output is the default layer, a 1024-dimensional vector per input example (essentially a document/sentence encoder). To access the embeddings of each word (the elmo layer), a few changes are needed, as suggested in this issue:
class ElmoEmbeddingLayer(Layer):
    def __init__(self, **kwargs):
        self.dimensions = 1024
        self.trainable = True
        super(ElmoEmbeddingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        self.elmo = hub.Module('https://tfhub.dev/google/elmo/2', trainable=self.trainable,
                               name="{}_module".format(self.name))
        self.trainable_weights += K.tf.trainable_variables(scope="^{}_module/.*".format(self.name))
        super(ElmoEmbeddingLayer, self).build(input_shape)

    def call(self, x, mask=None):
        result = self.elmo(
            K.squeeze(K.cast(x, tf.string), axis=1),
            as_dict=True,
            signature='default',
        )['elmo']
        return result

    def compute_output_shape(self, input_shape):
        return (input_shape[0], None, self.dimensions)
You can stack the ElmoEmbeddingLayer with the POS layer.
As a more general example, one can use the ELMo embeddings in a 1D ConvNet model for classification:
elmo_input_layer = Input(shape=(None, ), dtype="string")
elmo_output_layer = ElmoEmbeddingLayer()(elmo_input_layer)
conv_layer = Conv1D(filters=100,
                    kernel_size=3,
                    padding='valid',
                    activation='relu',
                    strides=1)(elmo_output_layer)
pool_layer = GlobalMaxPooling1D()(conv_layer)
dense_layer = Dense(32)(pool_layer)
output_layer = Dense(1, activation='sigmoid')(dense_layer)
model = Model(inputs=elmo_input_layer, outputs=output_layer)
model.summary()
model.summary()
The model summary looks like this:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_62 (InputLayer) (None, None) 0
_________________________________________________________________
elmo_embedding_layer_13 (Elm (None, None, 1024) 4
_________________________________________________________________
conv1d_46 (Conv1D) (None, None, 100) 307300
_________________________________________________________________
global_max_pooling1d_42 (Glo (None, 100) 0
_________________________________________________________________
dense_53 (Dense) (None, 32) 3232
_________________________________________________________________
dense_54 (Dense) (None, 1) 33
=================================================================
Total params: 310,569
Trainable params: 310,569
Non-trainable params: 0
_________________________________________________________________

how does input_shape in keras.applications work?

I have been through the Keras documentation, but I am still unable to figure out how the input_shape parameter works and why it does not change the number of parameters of my DenseNet model when I pass it a custom input shape. An example:
import keras
from keras import applications
from keras.layers import Conv3D, MaxPool3D, Flatten, Dense
from keras.layers import Dropout, Input, BatchNormalization
from keras import Model
# define model 1
INPUT_SHAPE = (224, 224, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 224, 224, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_1 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
# define model 2
INPUT_SHAPE = (512, 512, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 512, 512, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_2 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
Ideally, with an increase in the input shape, the number of parameters should increase as well; however, as you can see, they stay exactly the same. My questions are thus:
Why do the number of parameters not change with a change in the input_shape?
I have only defined one channel in my input_shape, what would happen to my model training in this scenario? The documentation says the following:
input_shape: optional shape tuple, only to be specified if include_top
is False (otherwise the input shape has to be (224, 224, 3) (with
'channels_last' data format) or (3, 224, 224) (with 'channels_first'
data format). It should have exactly 3 inputs channels, and width and
height should be no smaller than 32. E.g. (200, 200, 3) would be one
valid value.
However, when I run the model with this configuration, it runs without any problems. Could there be something that I am missing?
Using Keras 2.2.4 with Tensorflow 1.12.0 as backend.
1.
In the convolutional layers the input size does not influence the number of weights, because the number of weights is determined by the kernel dimensions and the channel counts. A larger input size leads to a larger output size, but not to more weights.
This means that the output size of the convolutional layers of the second model will be larger than for the first model, which would increase the number of weights in a subsequent dense layer. However, if you look at the architecture of DenseNet you notice that there's a global pooling layer after all the convolutional layers (GlobalAveragePooling2D here, selected via pooling='avg'), which averages all the values of each output channel. That's why the output of DenseNet will be of size 1024, whatever the input shape.
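A quick way to verify this (a sketch using tf.keras, not from the original answer): the parameter count of a convolutional layer is the same for both input sizes, only the output shape grows.
import tensorflow as tf
for size in (224, 512):
    inp = tf.keras.Input(shape=(size, size, 1))
    out = tf.keras.layers.Conv2D(32, (3, 3))(inp)
    m = tf.keras.Model(inp, out)
    # 3*3*1*32 kernel weights + 32 biases = 320 in both cases
    print(size, m.count_params(), m.output_shape)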
2.
Yes, the model will still work. The three-channel requirement in the documentation only applies when you load the pretrained ImageNet weights; since weights=None here, Keras simply builds the first convolutional kernel with a single input channel, so the single channel is consumed directly rather than being broadcast (duplicated) to three channels.
DenseNet is composed of two parts: the convolutional part and the global pooling part.
The number of trainable weights in the convolutional part doesn't depend on the input shape.
Usually a classification network would employ fully connected layers to infer the classification, but in DenseNet global pooling is used instead, which doesn't bring any trainable weights.
Therefore, the input shape doesn't affect the number of weights of the entire network.
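The same check for the pooling part (again just a sketch): GlobalAveragePooling2D has zero weights and collapses any spatial size to (None, channels), so the classifier head always receives a fixed 1024-dimensional vector.
import tensorflow as tf
for size in (7, 16):
    inp = tf.keras.Input(shape=(size, size, 1024))
    out = tf.keras.layers.GlobalAveragePooling2D()(inp)
    m = tf.keras.Model(inp, out)
    print(m.output_shape, m.count_params())  # (None, 1024) and 0 both times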

Sharing filter weight in convolution network

I have been working with VGG16 for image recognition for quite a while, and I was already very confident about my understanding of CNNs. Today I came across a post on Quora and started to doubt it.
In that post, it says that the same filter in a CNN layer should share the same weights. So, assuming the kernel size is 3 and the number of filters is 1 in the following convolutional layer, the total number of parameters (weights) should be 3×1 = 3, represented by the red, blue, and green arrows. The Conv1D example is easy to understand.
Then, I tried to run an experiment on Conv2D with the following Keras code:
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout
from keras.models import Model
input_layer = Input(shape=(100,100,1,), name='input_layer')
ccm1_conv = Conv2D(filters=1,kernel_size=(3,3),strides=(1,1),padding='same')(input_layer)
model = Model(input_layer,ccm1_conv)
model.summary()
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) (None, 100, 100, 1) 0
_________________________________________________________________
conv2d_9 (Conv2D) (None, 100, 100, 1) 10
=================================================================
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________
Since I use only 1 filter and my kernel_size is 3×3, the kernel reads 9 neurons in the previous layer and connects them to a neuron in the next layer. Therefore, I would expect 9 parameters (weights) instead of 10.
Then, I tried 10 filters with a 5×5 kernel, and it gives 260 parameters (weights) instead of the 5*5*10 parameters that I would expect:
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout
from keras.models import Model
input_layer = Input(shape=(100,100,1,), name='input_layer')
ccm1_conv = Conv2D(filters=10,kernel_size=(5,5),strides=(1,1),padding='same')(input_layer)  # 10 filters, 5x5 kernel, matching the summary below
model = Model(input_layer,ccm1_conv)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) (None, 100, 100, 1) 0
_________________________________________________________________
conv2d_10 (Conv2D) (None, 100, 100, 10) 260
=================================================================
Total params: 260
Trainable params: 260
Non-trainable params: 0
_________________________________________________________________
It seems the number of parameters in Conv2D is calculated by the following equation (for a single input channel):
num_weights = num_filters * (kernel_width * kernel_height + 1)
And I have no idea where the +1 comes from.
The +1 comes from the bias term of each filter. In addition to the kernel weights, each filter has an extra parameter called the bias term (which multiplies a constant 1), like in a fully-connected layer. Keras uses a bias term for each filter by default, but you can also omit it by setting the argument use_bias of Conv2D to False:
from keras.layers import Input, Dense, Conv2D, MaxPool2D, Dropout
from keras.models import Model
input_layer = Input(shape=(100, 100, 1,), name='input_layer')
ccm1_conv = Conv2D(filters=1, kernel_size=(3, 3), strides=(1, 1), padding='same', use_bias=False)(input_layer)
model = Model(input_layer, ccm1_conv)
model.summary()
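With use_bias=False the summary reports 9 parameters: the 3×3×1 kernel weights and no bias. More generally the count also scales with the number of input channels; a quick sanity check of the full formula (conv2d_params is a hypothetical helper written for this answer, not a Keras function):
def conv2d_params(filters, kernel_size, in_channels, use_bias=True):
    # one bias per filter when use_bias is True
    kh, kw = kernel_size
    return filters * (kh * kw * in_channels + (1 if use_bias else 0))

print(conv2d_params(1, (3, 3), 1))                  # 10, matches the first summary
print(conv2d_params(10, (5, 5), 1))                 # 260, matches the second summary
print(conv2d_params(1, (3, 3), 1, use_bias=False))  # 9, kernel weights only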
