model training starts all over again after unfreezing weights in tensorflow - python

I am training an image classifier using Large EfficientNet:
base_model = EfficientNetV2L(input_shape = (300, 500, 3),
                             include_top = False,
                             weights = 'imagenet',
                             include_preprocessing = True)
model = tf.keras.Sequential([base_model,
                             layers.GlobalAveragePooling2D(),
                             layers.Dropout(0.2),
                             layers.Dense(128, activation = 'relu'),
                             layers.Dropout(0.3),
                             layers.Dense(6, activation = 'softmax')])
base_model.trainable = False
model.compile(optimizer = optimizers.Adam(learning_rate = 0.001),
              loss = losses.SparseCategoricalCrossentropy(),
              metrics = ['accuracy'])
callback = [callbacks.EarlyStopping(monitor = 'val_loss', patience = 2)]
history = model.fit(ds_train, batch_size = 28, validation_data = ds_val, epochs = 20, verbose = 1, callbacks = callback)
It is working properly.
model summary:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
efficientnetv2-l (Functiona  (None, 10, 16, 1280)      117746848
l)
global_average_pooling2d (G  (None, 1280)              0
lobalAveragePooling2D)
dropout (Dropout)            (None, 1280)              0
dense (Dense)                (None, 128)               163968
dropout_1 (Dropout)          (None, 128)               0
dense_1 (Dense)              (None, 6)                 774
=================================================================
Total params: 117,911,590
Trainable params: 164,742
Non-trainable params: 117,746,848
_________________________________________________________________
output:
Epoch 4/20
179/179 [==============================] - 203s 1s/step - loss: 0.1559 - accuracy: 0.9474 - val_loss: 0.1732 - val_accuracy: 0.9428
But, while fine-tuning it, I am unfreezing some weights:
base_model.trainable = True
fine_tune_at = 900
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
model.compile(optimizer = optimizers.Adam(learning_rate = 0.0001),
              loss = losses.SparseCategoricalCrossentropy(),
              metrics = ['accuracy'])
history = model.fit(ds_train, batch_size = 28, validation_data = ds_val, epochs = 20, verbose = 1, callbacks = callback)
model summary:
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
efficientnetv2-l (Functiona  (None, 10, 16, 1280)      117746848
l)
global_average_pooling2d (G  (None, 1280)              0
lobalAveragePooling2D)
dropout (Dropout)            (None, 1280)              0
dense (Dense)                (None, 128)               163968
dropout_1 (Dropout)          (None, 128)               0
dense_1 (Dense)              (None, 6)                 774
=================================================================
Total params: 117,911,590
Trainable params: 44,592,230
Non-trainable params: 73,319,360
_________________________________________________________________
And it starts training all over again. The first time, when I trained with frozen weights, the loss decreased to 0.1559; after unfreezing the weights, the model started training again from loss = 0.444. Why is this happening? I think fine-tuning shouldn't reset the weights.

When you compile and train again, Adam's per-parameter learning-rate state is set back to its initial value, which may be the reason for the big jump when training resumes. You can also save and load the optimizer state when saving/loading the model. You are also retraining a lot of parameters (about 44.6 million are now trainable), so consider unfreezing fewer of them. If you keep more of the old parameters frozen, the jump might not be as high.
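To make the point about the optimizer starting over more concrete, here is a minimal pure-Python sketch of a single Adam update for one scalar parameter. It is not the Keras implementation; `adam_step` and `fresh_state` are hypothetical helper names, and the hyperparameters are assumed defaults apart from the learning rate of 0.0001 taken from the question. It shows that with freshly reset moment estimates the very first update has a size of roughly the learning rate, however small or large the gradient is.

```python
import math

def fresh_state():
    """Optimizer state as it looks right after a new Adam instance is created."""
    return {"t": 0, "m": 0.0, "v": 0.0}

def adam_step(grad, state, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-7):
    """One Adam update for a scalar parameter; returns the amount subtracted from it."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected second moment
    return lr * m_hat / (math.sqrt(v_hat) + eps)

# First step after a reset, for a tiny and for a huge gradient.
step_small = adam_step(1e-3, fresh_state())
step_large = adam_step(1e3, fresh_state())
print(step_small, step_large)  # both are close to lr = 1e-4
```

So a parameter that was sitting near a good value with a tiny gradient still moves by about one learning rate on the first step after a reset, which is one plausible contribution to the jump described above.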

Related

Remove top layer from pre-trained model, transfer learning, tensorflow (load_model)

I have pre-trained a model (my own saved model) with two classes, which I want to use for transfer learning to train a model with six classes.
I have loaded the pre-trained model into the new training script:
base_model = tf.keras.models.load_model("base_model_path")
How can I remove the top/head layer (a Conv1D layer)?
I see that in Keras one can use base_model.pop(), and for tf.keras.applications one can simply use include_top=False,
but is there something similar when using tf.keras and load_model?
(I have tried something like this:
for layer in base_model.layers[:-1]:
    layer.trainable = False
and then adding it to a new model (?), but I am not sure how to continue.)
Thanks for any help!
You could try something like this:
The base model is made up of a simple Conv1D network with an output layer with two classes:
import tensorflow as tf
samples = 100
timesteps = 5
features = 2
classes = 2
dummy_x, dummy_y = tf.random.normal((100, 5, 2)), tf.random.uniform((100, 1), maxval=2, dtype=tf.int32)
base_model = tf.keras.Sequential()
base_model.add(tf.keras.layers.Conv1D(32, 3, activation='relu', input_shape=(5, 2)))
base_model.add(tf.keras.layers.GlobalMaxPool1D())
base_model.add(tf.keras.layers.Dense(32, activation='relu'))
base_model.add( tf.keras.layers.Dense(classes, activation='softmax'))
base_model.compile(optimizer='adam', loss = tf.keras.losses.SparseCategoricalCrossentropy())
print(base_model.summary())
base_model.fit(dummy_x, dummy_y, batch_size=16, epochs=1)
base_model.save("base_model")
base_model = tf.keras.models.load_model("base_model")
Model: "sequential_8"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d_31 (Conv1D)           (None, 3, 32)             224
global_max_pooling1d_13 (Gl  (None, 32)                0
obalMaxPooling1D)
dense_17 (Dense)             (None, 32)                1056
dense_18 (Dense)             (None, 2)                 66
=================================================================
Total params: 1,346
Trainable params: 1,346
Non-trainable params: 0
_________________________________________________________________
None
7/7 [==============================] - 0s 3ms/step - loss: 0.6973
INFO:tensorflow:Assets written to: base_model/assets
The new model is also made up of a simple Conv1D network, but with an output layer with six classes. It also contains all the layers of the base_model except the first Conv1D layer and the last output layer:
classes = 6
dummy_x, dummy_y = tf.random.normal((100, 5, 2)), tf.random.uniform((100, 1), maxval=6, dtype=tf.int32)
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv1D(64, 3, activation='relu', input_shape=(5, 2)))
model.add(tf.keras.layers.Conv1D(32, 2, activation='relu'))
for layer in base_model.layers[1:-1]: # Skip first and last layer
    model.add(layer)
model.add(tf.keras.layers.Dense(classes, activation='softmax'))
model.compile(optimizer='adam', loss = tf.keras.losses.SparseCategoricalCrossentropy())
print(model.summary())
model.fit(dummy_x, dummy_y, batch_size=16, epochs=1)
Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv1d_32 (Conv1D)           (None, 3, 64)             448
conv1d_33 (Conv1D)           (None, 2, 32)             4128
global_max_pooling1d_13 (Gl  (None, 32)                0
obalMaxPooling1D)
dense_17 (Dense)             (None, 32)                1056
dense_19 (Dense)             (None, 6)                 198
=================================================================
Total params: 5,830
Trainable params: 5,830
Non-trainable params: 0
_________________________________________________________________
None
7/7 [==============================] - 0s 3ms/step - loss: 1.8069
<keras.callbacks.History at 0x7f90c87a3c50>

Bidirectional LSTM in encoder decoder model running out of memory on training

latent_dim = 500
embedding_dim = 256
# Encoder
encoder_inputs = Input(shape=(max_eng_len,))
enc_emb = Embedding(x_voc_size, embedding_dim,trainable=True)(encoder_inputs)
#LSTM 1
encoder_lstm1 = Bidirectional(LSTM(latent_dim,return_sequences=True,return_state=True))
encoder_output1, forw_state_h, forw_state_c, back_state_h, back_state_c = encoder_lstm1(enc_emb)
final_enc_h = Concatenate()([forw_state_h,back_state_h])
final_enc_c = Concatenate()([forw_state_c,back_state_c])
encoder_states =[final_enc_h, final_enc_c]
# Decoder
decoder_inputs = Input(shape=(None,))
dec_emb_layer = Embedding(y_voc_size, embedding_dim,trainable=True)
dec_emb = dec_emb_layer(decoder_inputs)
#LSTM using encoder_states as initial state
decoder_lstm = LSTM(latent_dim*2, return_sequences=True, return_state=True)
decoder_outputs,decoder_fwd_state, decoder_back_state = decoder_lstm(dec_emb,initial_state=encoder_states)
#from tensorflow.keras.layers import Attention
#Attention Layer
attention_layer = AttentionLayer()
attn_res, attn_weight = attention_layer([encoder_output1, decoder_outputs])
# Concat attention output and decoder LSTM output
decoder_concat_input = Concatenate(axis=-1, name='concat_layer')([decoder_outputs, attn_res])
#Dense layer
decoder_dense = TimeDistributed(Dense(y_voc_size, activation='softmax'))
decoder_outputs = decoder_dense(decoder_concat_input)
# model
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.summary()
# Compile
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
checkpoint = ModelCheckpoint("/content/drive/My Drive/checkpoint.txt", monitor='val_accuracy')
early_stopping = EarlyStopping(monitor='val_accuracy', patience=5)
callbacks_list = [checkpoint, early_stopping]
# Training set
encoder_input_data = X_train
decoder_input_data = Y_train[:,:-1]
decoder_target_data = Y_train[:,1:]
# development set
encoder_input_test = X_test
decoder_input_test = Y_test[:,:-1]
decoder_target_test= Y_test[:,1:]
history = model.fit([encoder_input_data, decoder_input_data],decoder_target_data,
                    epochs=50,
                    batch_size=64,
                    validation_data = ([encoder_input_test, decoder_input_test],decoder_target_test),
                    callbacks= callbacks_list)
x_voc_size is 45701 and y_voc_size is 84213. There are approximately 45,000 records. I am getting a memory error while training this model with 35 GB of RAM. Even after reducing the batch size to 25, I get the same error. Please suggest how to resolve this error.
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 5515)] 0
__________________________________________________________________________________________________
embedding (Embedding) (None, 5515, 256) 11699456 input_1[0][0]
__________________________________________________________________________________________________
input_2 (InputLayer) [(None, None)] 0
__________________________________________________________________________________________________
bidirectional (Bidirectional) [(None, 5515, 1000), 3028000 embedding[0][0]
__________________________________________________________________________________________________
embedding_1 (Embedding) (None, None, 256) 21558528 input_2[0][0]
__________________________________________________________________________________________________
concatenate (Concatenate) (None, 1000) 0 bidirectional[0][1]
bidirectional[0][3]
__________________________________________________________________________________________________
concatenate_1 (Concatenate) (None, 1000) 0 bidirectional[0][2]
bidirectional[0][4]
__________________________________________________________________________________________________
lstm_1 (LSTM) [(None, None, 1000), 5028000 embedding_1[0][0]
concatenate[0][0]
concatenate_1[0][0]
__________________________________________________________________________________________________
attention_layer (AttentionLayer ((None, None, 1000), 2001000 bidirectional[0][0]
lstm_1[0][0]
__________________________________________________________________________________________________
concat_layer (Concatenate) (None, None, 2000) 0 lstm_1[0][0]
attention_layer[0][0]
__________________________________________________________________________________________________
time_distributed (TimeDistribut (None, None, 84213) 168510213 concat_layer[0][0]
==================================================================================================
Total params: 211,825,197
Trainable params: 211,825,197
Non-trainable params: 0
__________________________________________________________________________________________________
EDIT - This is the model's summary. I think the number of parameters is huge. How can I efficiently reduce the complexity of the model?
That's quite a model, and I bet that if we talk it through we can find something suitable for your use case. To give you an idea of where you stand: for something that big you are really going to want a cloud TPU cluster. I've been through most of the DeepLearning.AI specializations, and they use a cloud TPU cluster for models with between 5,000,000 and 13,000,000 parameters. The model you have there would really want to be trained in a larger corporate data center or national lab environment. Failing that, it would be well worth checking out transfer learning: many great models have already been trained in that kind of environment, and you can piggyback off them for free. If you can bring the number of trainable parameters down to something like 3,000,000, you might find the model much more amenable to your hardware. Let's turn this into a conversation so everyone gets to learn. Let me know your thoughts!
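As a rough back-of-the-envelope check on where the memory might go, the sketch below sizes a few float32 tensors using only shapes that appear in the summary and the batch size from the fit call. The helper `tensor_bytes` is a hypothetical name, and the decoder length is not given in the question, so `dec_len = 100` is purely a placeholder assumption. The numbers ignore gradients, optimizer state and the intermediate LSTM activations kept for backpropagation, so real usage would be higher.

```python
def tensor_bytes(shape, bytes_per_value=4):
    """Size in bytes of a dense tensor with the given shape (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_value

batch_size = 64               # from the model.fit call in the question
enc_len = 5515                # encoder sequence length, from the summary
total_params = 211_825_197    # from the summary

weights = tensor_bytes((total_params,))
enc_embedding = tensor_bytes((batch_size, enc_len, 256))   # embedding output
enc_outputs = tensor_bytes((batch_size, enc_len, 1000))    # bidirectional LSTM output

# ASSUMPTION: the decoder length is unknown; 100 is only a placeholder value.
dec_len = 100
softmax_out = tensor_bytes((batch_size, dec_len, 84213))   # TimeDistributed softmax output

for name, size in [("weights", weights), ("enc_embedding", enc_embedding),
                   ("enc_outputs", enc_outputs), ("softmax_out", softmax_out)]:
    print(f"{name:>14}: {size / 1e9:.2f} GB")
```

Under these assumptions the per-batch outputs driven by the 5515-step encoder input and the 84213-way softmax are larger than the weights themselves, so shortening the sequences or shrinking the output vocabulary may matter as much as cutting the parameter count.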

Keras different model.evaluate() and model.predict() accuracy on the same data

I need to be able to use predict on my model to run a benchmark, but it doesn't work. When I use predict on the same validation data that the model uses during training, I only get an accuracy of .529. With model.evaluate I get .85. It doesn't make sense; other threads talk about np.argmax or forgetting normalization, but I've tried it all.
PS: I use transfer learning so it gets trained twice and some layers get frozen but that shouldn't influence this.
Model: "model_7"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_17 (InputLayer) [(None, 150, 150, 3)] 0
_________________________________________________________________
sequential_8 (Sequential) (None, 150, 150, 3) 0
_________________________________________________________________
normalization_7 (Normalizati (None, 150, 150, 3) 7
_________________________________________________________________
xception (Functional) (None, 5, 5, 2048) 20861480
_________________________________________________________________
global_average_pooling2d_7 ( (None, 2048) 0
_________________________________________________________________
dropout_7 (Dropout) (None, 2048) 0
_________________________________________________________________
dense_7 (Dense) (None, 1) 2049
=================================================================
Total params: 20,863,536
Trainable params: 20,809,001
Non-trainable params: 54,535
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),  # Low learning rate
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=['accuracy', keras.metrics.BinaryAccuracy(),f1_m,precision_m, recall_m],
)
print(model.evaluate(x_test,y_test))
OUTPUT: 38/38 [==============================] - 2s 45ms/step - loss: 0.3570 - binary_accuracy: 0.8550 - f1_m: 0.8509 - precision_m: 0.8786 - recall_m: 0.8326
[0.3569563925266266, 0.8550000190734863, 0.8509402871131897, 0.8786242604255676, 0.8326380848884583]
better = norm_layer(x_test)
# print(better[0])
y_pred = model.predict(better)
print(y_pred)
y_pred[y_pred <= 0.5] = 0
y_pred[y_pred > 0.5] = 1
print(y_pred)
print(sum(1 for x,y in zip(y_pred,y_test) if x == y) / len(y_pred))
OUTPUT: [[0.42335328]
[0.3409149 ]
[0.45328587]
...
[0.38108858]
[0.44630498]
[0.76832736]]
[[0.]
[0.]
[0.]
...
[0.]
[0.]
[1.]]
0.5291666666666667
Normalization code:
# Pre-trained Xception weights requires that input be normalized
# from (0, 255) to a range (-1., +1.), the normalization layer
# does the following, outputs = (inputs - mean) / sqrt(var)
norm_layer = keras.layers.experimental.preprocessing.Normalization()
mean = np.array([127.5] * 3)
var = mean ** 2
# Scale inputs to [-1, +1]
x = norm_layer(x)
norm_layer.set_weights([mean, var])
For the full code, see the following link (the file Xception detection):
https://console.paperspace.com/tees0czt0/notebook/rj824evpaot143v?file=Xception%20detection.ipynb
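One difference between the two calls that may be worth checking, assuming norm_layer is the same layer that shows up as normalization_7 inside the model: model.evaluate receives the raw x_test, while model.predict receives norm_layer(x_test), so on the predict path the inputs would be normalized twice. The pure-Python sketch below (the `normalize` helper is a hypothetical stand-in that mirrors the formula in the comment above, outputs = (inputs - mean) / sqrt(var)) shows what a second pass does to pixel values.

```python
import math

def normalize(value, mean=127.5, var=127.5 ** 2):
    """Same formula as in the question: (value - mean) / sqrt(var)."""
    return (value - mean) / math.sqrt(var)

pixels = [0.0, 127.5, 255.0]              # darkest, middle and brightest raw values
once = [normalize(p) for p in pixels]     # what the model expects: [-1.0, 0.0, 1.0]
twice = [normalize(p) for p in once]      # what it would see after a second pass

print(once)
print(twice)  # every value squeezed into a narrow band around -1
```

After the second pass the whole input range collapses to values between roughly -1.008 and -0.992, so every image would look almost identical to the network. If that assumption about the model holds, calling model.predict(x_test) on the raw data would be the like-for-like comparison with model.evaluate.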

Model subclassing ignores the weights of the Keras layers that are appended to a Python list

I am trying to create a subclassed model with a variable number of hidden layers and variable hidden layer sizes.
Since the number and size of the hidden layers are not fixed, I append the instantiated Keras layers to a list according to the constructor parameters. But I do not see why, when I use the list self.W to hold the Keras layers, the model ignores their weights.
class MLP(keras.Model):
    def __init__(self, first_size, num_hidden_layers, hidden_activation, num_classes, **kwargs):
        super(MLP, self).__init__()
        self.W = [Dense(units=first_size//(2**i), activation=hidden_activation) for i in range(num_hidden_layers)]
        # Regression task
        if num_classes == 0:
            self.W.append(Dense(units=1, activation='linear'))
        # Classification task
        else:
            self.W.append(Dense(units=num_classes, activation='softmax'))
    def call(self, inputs):
        x = inputs
        for w in self.W:
            x = w(x)
        return x
model = MLP(first_size=128, num_hidden_layers=4, hidden_activation='relu', num_classes=10)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
model.fit(x_train, y_train, batch_size=64, epochs=20, validation_data=(x_val, y_val))
model.summary()
Model: "mlp_23"
_________________________________________________________________
Layer (type)    Output Shape    Param #
================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
I think you can do this easily with the functional API, like this.
As an example, I used the iris dataset from sklearn.
from keras.models import Model
from keras.layers import Input, Dense
import sklearn.datasets
iris_dataset = sklearn.datasets.load_iris()
x_train = iris_dataset["data"]
y_train = iris_dataset["target"]
inputs = Input(shape=x_train[0].shape)
x = inputs
num_hidden_layers=4
num_classes=10
hidden_activation='relu'
first_size=128
for i in range(num_hidden_layers):
    x=Dense(units=first_size//(2**i), activation=hidden_activation)(x)
outputs=Dense(units=num_classes, activation='softmax')(x)
model = Model(inputs=inputs,outputs=outputs)
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
model.summary()
output
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 4) 0
_________________________________________________________________
dense_1 (Dense) (None, 128) 640
_________________________________________________________________
dense_2 (Dense) (None, 64) 8256
_________________________________________________________________
dense_3 (Dense) (None, 32) 2080
_________________________________________________________________
dense_4 (Dense) (None, 16) 528
_________________________________________________________________
dense_5 (Dense) (None, 10) 170
=================================================================
Total params: 11,674
Trainable params: 11,674
Non-trainable params: 0
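As a sanity check on the summary above, the layer widths and parameter counts can be reproduced by hand. This is a small pure-Python sketch; `dense_params` is a hypothetical helper, and it assumes each Dense layer has a weight matrix plus one bias per unit and that the input has 4 features, as in the iris data.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

first_size, num_hidden_layers, num_classes, n_features = 128, 4, 10, 4

# Each hidden layer is half as wide as the previous one.
hidden_sizes = [first_size // (2 ** i) for i in range(num_hidden_layers)]
layer_sizes = [n_features] + hidden_sizes + [num_classes]
params = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(hidden_sizes)   # [128, 64, 32, 16]
print(params)         # [640, 8256, 2080, 528, 170]
print(sum(params))    # 11674
```

The per-layer counts and the total match the summary, which confirms that every layer created in the loop is registered with the model, unlike the empty summary in the question.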

How to prevent overfitting in Keras sequential model?

I am already adding dropout regularization. I am trying to build a multiclass text classification multilayer perceptron model.
My model:
model = Sequential([
    Dropout(rate=0.2, input_shape=features),
    Dense(units=64, activation='relu'),
    Dropout(rate=0.2),
    Dense(units=64, activation='relu'),
    Dropout(rate=0.2),
    Dense(units=16, activation='softmax')])
My model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dropout_1 (Dropout) (None, 20000) 0
_________________________________________________________________
dense_1 (Dense) (None, 64) 1280064
_________________________________________________________________
dropout_2 (Dropout) (None, 64) 0
_________________________________________________________________
dense_2 (Dense) (None, 64) 4160
_________________________________________________________________
dropout_3 (Dropout) (None, 64) 0
_________________________________________________________________
dense_3 (Dense) (None, 16) 1040
=================================================================
Total params: 1,285,264
Trainable params: 1,285,264
Non-trainable params: 0
_________________________________________________________________
None
Train on 6940 samples, validate on 1735 samples
I am getting:
Epoch 16/1000
- 4s - loss: 0.4926 - acc: 0.8719 - val_loss: 1.2640 - val_acc: 0.6640
Validation accuracy: 0.6639769498140736, loss: 1.2639631692545559
The validation accuracy is about 20 percentage points lower than the training accuracy, and the validation loss is much higher than the training loss.
I am already using dropout regularization, and using epochs = 1000, batch size = 512 and early stopping on val_loss.
Any suggestions?
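A quick arithmetic look at the summary may help frame the problem. The sketch below uses only the parameter counts and the training-set size printed above; the variable names are just for illustration.

```python
layer_params = {"dense_1": 1_280_064, "dense_2": 4_160, "dense_3": 1_040}
train_samples = 6_940

total = sum(layer_params.values())
first_share = layer_params["dense_1"] / total      # fraction of parameters in the first Dense layer
params_per_sample = total / train_samples          # parameters per training example

print(total)                          # 1285264
print(f"{first_share:.1%}")           # 99.6%
print(f"{params_per_sample:.0f}")     # 185
```

With roughly 185 parameters per training sample, almost all of them in the first Dense layer fed by the 20,000 input features, reducing the number of input features or regularizing that first layer is one direction that may be worth trying alongside dropout.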
