I saved an LSTM with multiple layers. Now, I want to load it and just fine-tune the last LSTM layer. How can I target this layer and change its parameters?
Example of a simple model trained and saved:
model = Sequential()
# first layer #neurons
model.add(LSTM(100, return_sequences=True,
               input_shape=(X.shape[1], X.shape[2])))
model.add(LSTM(50, return_sequences=True))
model.add(LSTM(25))
model.add(Dense(1))
model.compile(loss='mae', optimizer='adam')
I can load and retrain it, but I can't find a way to target a specific layer and freeze all the other layers.
An easy solution would be to name each layer, i.e.
model.add(LSTM(50, return_sequences=True, name='2nd_lstm'))
Then, upon loading the model you can iterate over the layers and freeze the ones matching a name condition:
for layer in model.layers:
    if layer.name == '2nd_lstm':
        layer.trainable = False
Then you need to recompile your model for the changes to take effect, and afterwards you may resume training as usual.
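For example, adapting this idea to the original goal (fine-tune only the last LSTM layer and freeze everything else), a minimal sketch could look like the following; 'last_lstm' and 'model.h5' are placeholder names, and X, y stand for your training data:
from tensorflow.keras.models import load_model

model = load_model('model.h5')                    # placeholder path to the saved model
for layer in model.layers:
    # freeze every layer except the one we want to fine-tune
    layer.trainable = (layer.name == 'last_lstm')
model.compile(loss='mae', optimizer='adam')       # recompile so the change takes effect
model.fit(X, y, epochs=5)                         # resume training as usual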
If you have previously built and saved the model and now want to load it and fine-tune only the last LSTM layer, then you need to set the other layers' trainable attribute to False. First, find the name of the layer (or its index, counting from zero starting at the top) using the model.summary() method. For example, this is the output produced for one of my models:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_10 (InputLayer) (None, 400, 16) 0
_________________________________________________________________
conv1d_2 (Conv1D) (None, 400, 32) 4128
_________________________________________________________________
lstm_2 (LSTM) (None, 32) 8320
_________________________________________________________________
dense_2 (Dense) (None, 1) 33
=================================================================
Total params: 12,481
Trainable params: 12,481
Non-trainable params: 0
_________________________________________________________________
Then set the trainable attribute of all the layers except the LSTM layer to False.
Approach 1:
for layer in model.layers:
    if layer.name != 'lstm_2':
        layer.trainable = False
Approach 2:
for layer in model.layers:
    layer.trainable = False
model.layers[2].trainable = True  # set the LSTM layer to be trainable
# to make sure 2 is the index of the layer
print(model.layers[2].name)  # prints 'lstm_2'
Don't forget to compile the model again to apply these changes.
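As a quick sanity check (my own addition, not part of the original answer), you can print each layer's trainable flag before recompiling; 'mae' and 'adam' below stand in for whatever loss and optimizer you originally used:
for layer in model.layers:
    print(layer.name, layer.trainable)            # only lstm_2 should print True
model.compile(loss='mae', optimizer='adam')       # recompile to apply the change
model.summary()                                   # Non-trainable params should now be non-zero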
Related
Suppose I have a model
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras.layers import Input, Dense, BatchNormalization, ReLU
from tensorflow.keras.models import Sequential

base_model = DenseNet201(input_tensor=Input(shape=basic_shape))
model = Sequential()
model.add(base_model)
model.add(Dense(400))
model.add(BatchNormalization())
model.add(ReLU())
model.add(Dense(50, activation='softmax'))
model.save('test.hdf5')
Then I load the saved model and try to make the last 40 layers of DenseNet201 trainable and the first 161 non-trainable:
from tensorflow.keras.models import load_model

saved_model = load_model('test.hdf5')
cnt = 44
saved_model.trainable = False
while cnt > 0:
    saved_model.layers[-cnt].trainable = True
    cnt -= 1
But this does not actually work, because DenseNet201 is treated as a single layer and I just get an index out of range error.
Layer (type) Output Shape Param #
=================================================================
densenet201 (Functional) (None, 1000) 20242984
_________________________________________________________________
dense (Dense) (None, 400) 400400
_________________________________________________________________
batch_normalization (BatchNo (None, 400) 1600
_________________________________________________________________
re_lu (ReLU) (None, 400) 0
_________________________________________________________________
dense_1 (Dense) (None, 50) 20050
=================================================================
Total params: 20,665,034
Trainable params: 4,490,090
Non-trainable params: 16,174,944
The question is how can I actually make the first 161 layers of DenseNet non-trainable and the last 40 layers trainable on a loaded model?
densenet201 (Functional) is a nested model, therefore you can access its layers the same way you access the layers of your 'topmost' model.
saved_model.layers[0].layers
where saved_model.layers[0] is a model with its own layers.
In your loop, you need to access the layers like this
saved_model.layers[0].layers[-cnt].trainable = True
Update
By default, the loaded model's layers are trainable (trainable=True), therefore you will need to set the bottom layers' trainable attribute to False instead.
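Putting it together, here is a minimal sketch under the assumption that you want the last 40 layers of the nested DenseNet201 trainable and everything before them frozen (the loss and optimizer are placeholders):
from tensorflow.keras.models import load_model

saved_model = load_model('test.hdf5')
base = saved_model.layers[0]              # the nested densenet201 (Functional) model
for layer in base.layers[:-40]:
    layer.trainable = False               # freeze the earlier layers
for layer in base.layers[-40:]:
    layer.trainable = True                # fine-tune only the last 40 layers
saved_model.compile(optimizer='adam', loss='categorical_crossentropy')  # recompile to apply the change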
After creating a pre-trained embedding layer using gensim, my val_accuracy has gone down to 45% on 4,600 records:
model = models.Sequential()
model.add(Embedding(input_dim=MAX_NB_WORDS, output_dim=EMBEDDING_DIM,
                    weights=[embedding_model], trainable=False,
                    input_length=seq_len, mask_zero=True))
#model.add(SpatialDropout1D(0.2))
#model.add(Embedding(vocabulary_size, 64))
model.add(GRU(units=150, return_sequences=True))
model.add(Dropout(0.4))
model.add(LSTM(units=200,dropout=0.4))
#model.add(Dropout(0.8))
#model.add(LSTM(100))
#model.add(Dropout(0.4))
#Bidirectional(tf.keras.layers.LSTM(embedding_dim))
#model.add(LSTM(400,input_shape=(1117, 100),return_sequences=True))
#model.add(Bidirectional(LSTM(128)))
model.add(Dense(100, activation='relu'))
#
#model.add(Dropout(0.4))
#model.add(Dense(200, activation='relu'))
model.add(Dense(4, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop',
              metrics=['accuracy'])
Model: "sequential_4"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_4 (Embedding) (None, 50, 100) 2746300
_________________________________________________________________
gru_4 (GRU) (None, 50, 150) 112950
_________________________________________________________________
dropout_4 (Dropout) (None, 50, 150) 0
_________________________________________________________________
lstm_4 (LSTM) (None, 200) 280800
_________________________________________________________________
dense_7 (Dense) (None, 100) 20100
_________________________________________________________________
dense_8 (Dense) (None, 4) 404
=================================================================
Total params: 3,160,554
Trainable params: 414,254
Non-trainable params: 2,746,300
_________________________________________________________________
Full code is at
https://colab.research.google.com/drive/13N94kBKkHIX2TR5B_lETyuH1QTC5VuRf?usp=sharing
Any help would be greatly appreciated. I am new to deep learning and have tried almost everything I know, but now I am stuck.
The problem is with your input. You've padded your input sequences with zeros but have not provided this information to your model. So your model doesn't ignore the zeros which is the reason it's not learning at all. To resolve this, change your embedding layer as follows:
model.add(layers.Embedding(input_dim=vocab_size+1,
                           output_dim=embedding_dim,
                           mask_zero=True))
This will enable your model to ignore the zero padding and learn. Training with this, I got a training accuracy of 100% in just 6 epochs, though validation accuracy wasn't that good (around 54%), which is expected as your training data contains only 32 examples. More about the Embedding layer: https://keras.io/api/layers/core_layers/embedding/
Since your dataset is small, the model tends to overfit on training data quite easily which gives lower validation accuracy. To mitigate this to some extent, you can try using pre-trained word embeddings like word2vec or GloVe instead of training your own embedding layer. Also, try some text data augmentation methods like creating artificial data using templates or replacing words in training data with their synonyms. You can also experiment with different types of layers (like replacing GRU with another LSTM) but in my opinion that may not help much here and should be considered after trying out pre-trained embeddings and data augmentation.
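If you want to try the pre-trained embedding route, here is a rough sketch; it assumes you have a fitted Keras Tokenizer called tokenizer and it downloads 100-dimensional GloVe vectors through gensim's downloader, so adjust the names and dimensions to your setup:
import numpy as np
import gensim.downloader as api
from tensorflow.keras.layers import Embedding

word_vectors = api.load("glove-wiki-gigaword-100")      # pre-trained 100-dim GloVe vectors
embedding_dim = 100
vocab_size = len(tokenizer.word_index) + 1              # +1 for the padding index 0

embedding_matrix = np.zeros((vocab_size, embedding_dim))
for word, idx in tokenizer.word_index.items():
    if word in word_vectors:
        embedding_matrix[idx] = word_vectors[word]      # unknown words stay all-zero

embedding_layer = Embedding(input_dim=vocab_size,
                            output_dim=embedding_dim,
                            weights=[embedding_matrix],
                            trainable=False,
                            mask_zero=True)             # keep masking the zero padding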
I have the following LSTM model. Can somebody help me understand the summary of the model?
a) How are the Param # values calculated?
b) Why do we have no value?
c) Why is the Param # next to the Dropout layer 0?
model = Sequential()
model.add(LSTM(64, return_sequences=True, recurrent_regularizer=l2(0.0015),
               input_shape=(timesteps, input_dim)))
model.add(Dropout(0.5))
model.add(LSTM(64, recurrent_regularizer=l2(0.0015), input_shape=(timesteps, input_dim)))
model.add(Dense(64, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(n_classes, activation='softmax'))
model.summary()
The values of timesteps, input_dim, and X_train are:
timesteps = 100
input_dim = 6
X_train = 1120
The summary is:
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 100, 64) 18176
_________________________________________________________________
dropout_1 (Dropout) (None, 100, 64) 0
_________________________________________________________________
lstm_2 (LSTM) (None, 64) 33024
_________________________________________________________________
dense_1 (Dense) (None, 64) 4160
_________________________________________________________________
dense_2 (Dense) (None, 64) 4160
_________________________________________________________________
dense_3 (Dense) (None, 6) 390
=================================================================
Total params: 59,910
Trainable params: 59,910
Non-trainable params: 0
Part of your question is answered here.
https://datascience.stackexchange.com/questions/10615/number-of-parameters-in-an-lstm-model
Simply put, the reason an LSTM layer has so many parameters is that every unit carries its own input and recurrent weights for each of the four internal gates, so many weights need to be trained to fit the model.
Dropout layers don't have parameters because there are no weights in a dropout layer. All a dropout layer does is give each neuron a chance of being dropped during training; in this case, you've chosen 50%. Beyond that, there is nothing to learn in a dropout layer.
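To see this concretely, a tiny sketch (assuming TensorFlow 2.x) showing that a Dropout layer has no weights and only acts during training:
import numpy as np
import tensorflow as tf

drop = tf.keras.layers.Dropout(0.5)
x = np.ones((1, 8), dtype="float32")
print(drop(x, training=True))    # roughly half the values zeroed, the rest scaled by 2
print(drop(x, training=False))   # unchanged: dropout is a no-op at inference time
print(drop.weights)              # [] -- a Dropout layer has nothing to learn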
How are the parameters calculated?
The input dimension is 6 and the first LSTM layer has 64 hidden units. At every timestep the LSTM concatenates the previous hidden state (64 values at t-1) with the current input (6 values at t), so the effective input dimension is 70.
Now the calculation:
params per gate = (hidden state dim + input dim) * hidden units + bias (1 per hidden unit)
                = (64 + 6) * 64 + 64
                = 4544
An LSTM cell contains 4 such feed-forward networks (three gates plus the cell candidate), so:
total trainable params = 4 * 4544 = 18176
The Dropout layer does not have any parameters.
As for the missing value, I am not sure which one you are talking about.
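The same calculation in code, using a small helper of my own (not part of Keras) so you can check it against the summary:
def lstm_param_count(input_dim, units):
    # 4 gate networks, each with input weights, recurrent weights and a bias vector
    return 4 * (units * input_dim + units * units + units)

print(lstm_param_count(6, 64))   # 18176 -> matches lstm_1
print(lstm_param_count(64, 64))  # 33024 -> matches lstm_2, whose input is the 64 outputs of lstm_1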
I have a simple GRU network coded with Keras in python as below:
gru1 = GRU(16, activation='tanh', return_sequences=True)(input)
dense = TimeDistributed(Dense(16, activation='tanh'))(gru1)
output = TimeDistributed(Dense(1, activation="sigmoid"))(dense)
I've used a sigmoid activation for the output since my purpose is classification, but I need to use the same model for regression as well, which means changing the output activation to linear. The rest of the network is still the same, so in this case I would end up with two different networks for two different purposes. The inputs are the same, but the outputs are classes for the sigmoid activation and values for the linear activation.
My question is, is there any way to use only one network but get two different outputs at the end? Thanks.
Yes, you can use the functional API to design a multi-output model: keep the shared layers and add two different outputs, one with a sigmoid and another with a linear activation.
N.B.: Don't use input as a variable name, it's a built-in function in Python.
from tensorflow.keras.layers import *
from tensorflow.keras.models import Model
seq_len = 100 # your sequence length
input_ = Input(shape=(seq_len,1))
gru1 = GRU(16, activation='tanh', return_sequences=True)(input_)
dense = TimeDistributed(Dense(16, activation='tanh'))(gru1)
# name the TimeDistributed wrappers (the actual output layers) so the loss dict below can refer to them
output1 = TimeDistributed(Dense(1, activation="sigmoid"), name="out1")(dense)
output2 = TimeDistributed(Dense(1, activation="linear"), name="out2")(dense)
model = Model(input_, [output1, output2])
model.summary()
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_3 (InputLayer) [(None, 100, 1)] 0
__________________________________________________________________________________________________
gru_2 (GRU) (None, 100, 16) 912 input_3[0][0]
__________________________________________________________________________________________________
time_distributed_3 (TimeDistrib (None, 100, 16) 272 gru_2[0][0]
__________________________________________________________________________________________________
out1 (TimeDistributed) (None, 100, 1) 17 time_distributed_3[0][0]
__________________________________________________________________________________________________
out2 (TimeDistributed) (None, 100, 1) 17 time_distributed_3[0][0]
==================================================================================================
Total params: 1,218
Trainable params: 1,218
Non-trainable params: 0
Compiling with two loss functions:
losses = {
    "out1": "binary_crossentropy",
    "out2": "mse",
}
# initialize the optimizer and compile the model
model.compile(optimizer='adam', loss=losses, metrics=["accuracy", "mae"])
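Training then just needs one target per named output; x_train, y_class and y_reg below are placeholders for your inputs, classification labels and regression targets:
history = model.fit(x_train,
                    {"out1": y_class, "out2": y_reg},  # one target per named output
                    epochs=10,
                    batch_size=32)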
I have been through the Keras documentation, but I am still unable to figure out how the input_shape parameter works and why it does not change the number of parameters of my DenseNet model when I pass it my custom input shape. An example:
import keras
from keras import applications
from keras.layers import Conv3D, MaxPool3D, Flatten, Dense
from keras.layers import Dropout, Input, BatchNormalization
from keras import Model
# define model 1
INPUT_SHAPE = (224, 224, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 224, 224, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_1 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
# define model 2
INPUT_SHAPE = (512, 512, 1) # used to define the input size to the model
n_output_units = 2
activation_fn = 'sigmoid'
densenet_121_model = applications.densenet.DenseNet121(include_top=False, weights=None, input_shape=INPUT_SHAPE, pooling='avg')
inputs = Input(shape=INPUT_SHAPE, name='input')
model_base = densenet_121_model(inputs)
output = Dense(units=n_output_units, activation=activation_fn)(model_base)
model = Model(inputs=inputs, outputs=output)
model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input (InputLayer) (None, 512, 512, 1) 0
_________________________________________________________________
densenet121 (Model) (None, 1024) 7031232
_________________________________________________________________
dense_2 (Dense) (None, 2) 2050
=================================================================
Total params: 7,033,282
Trainable params: 6,949,634
Non-trainable params: 83,648
_________________________________________________________________
Ideally, with an increase in the input shape the number of parameters should increase; however, as you can see, they stay exactly the same. My questions are thus:
Why do the number of parameters not change with a change in the input_shape?
I have only defined one channel in my input_shape, what would happen to my model training in this scenario? The documentation says the following:
input_shape: optional shape tuple, only to be specified if include_top
is False (otherwise the input shape has to be (224, 224, 3) (with
'channels_last' data format) or (3, 224, 224) (with 'channels_first'
data format). It should have exactly 3 inputs channels, and width and
height should be no smaller than 32. E.g. (200, 200, 3) would be one
valid value.
However when I run the model with this configuration it runs without any problems. Could there be something that I am missing out?
Using Keras 2.2.4 with Tensorflow 1.12.0 as backend.
1.
In the convolutional layers the input size does not influence the number of weights, because the number of weights is determined by the kernel dimensions and the number of input and output channels. A larger input size leads to a larger output size, but not to an increasing number of weights.
This means that the output size of the convolutional layers of the second model will be larger than for the first model, which would increase the number of weights in a following dense layer. However, if you take a look at the architecture of DenseNet, you'll notice that there's a GlobalAveragePooling2D layer (you passed pooling='avg') after all the convolutional layers, which averages all the values of each output channel. That's why the output of DenseNet will be of size 1024, whatever the input shape.
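You can verify this with a small standalone sketch (not DenseNet itself, just one convolution followed by global average pooling): the parameter count and the pooled output shape are the same for both input sizes.
from tensorflow.keras.layers import Input, Conv2D, GlobalAveragePooling2D
from tensorflow.keras.models import Model

for size in (224, 512):
    inp = Input(shape=(size, size, 1))
    x = Conv2D(32, kernel_size=3, padding='same')(inp)  # weights depend only on kernel size and channels
    out = GlobalAveragePooling2D()(x)                    # collapses the spatial dimensions
    m = Model(inp, out)
    print(size, m.count_params(), m.output_shape)        # 320 params and (None, 32) in both cases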
2.
Yes, the model will still work. Since you pass weights=None, the first convolution is simply built for a single input channel, so nothing needs to be broadcast (duplicated) to three channels; the 'exactly 3 input channels' requirement in the documentation is only enforced when you load the pretrained ImageNet weights.
DenseNet is composed of two parts: the convolutional part and the global pooling part.
The number of trainable weights in the convolutional part doesn't depend on the input shape.
Usually a classification network employs fully connected layers to infer the classification; here, however, DenseNet ends with global pooling, which doesn't add any trainable weights.
Therefore, the input shape doesn't affect the number of weights of the entire network.