How to prevent overfitting in a Keras Sequential model? - python

I am trying to build a multilayer perceptron model for multiclass text classification, and I am already adding dropout regularization.
My model:
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential([
    Dropout(rate=0.2, input_shape=features),   # features is the input shape tuple, (20000,) per the summary below
    Dense(units=64, activation='relu'),
    Dropout(rate=0.2),
    Dense(units=64, activation='relu'),
    Dropout(rate=0.2),
    Dense(units=16, activation='softmax')
])
My model.summary():
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dropout_1 (Dropout) (None, 20000) 0
_________________________________________________________________
dense_1 (Dense) (None, 64) 1280064
_________________________________________________________________
dropout_2 (Dropout) (None, 64) 0
_________________________________________________________________
dense_2 (Dense) (None, 64) 4160
_________________________________________________________________
dropout_3 (Dropout) (None, 64) 0
_________________________________________________________________
dense_3 (Dense) (None, 16) 1040
=================================================================
Total params: 1,285,264
Trainable params: 1,285,264
Non-trainable params: 0
_________________________________________________________________
None
Train on 6940 samples, validate on 1735 samples
I am getting:
Epoch 16/1000
- 4s - loss: 0.4926 - acc: 0.8719 - val_loss: 1.2640 - val_acc: 0.6640
Validation accuracy: 0.6639769498140736, loss: 1.2639631692545559
The validation accuracy is about 20 percentage points lower than the training accuracy, and the validation loss is much higher than the training loss.
I am already using dropout regularization, training with epochs = 1000 and batch size = 512, and applying early stopping on val_loss.
Any suggestions?
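For reference, here is a minimal sketch of the training setup described above. The optimizer, the loss, the EarlyStopping patience, and the variable names (x_train, y_train, x_val, y_val) are assumptions; only the batch size, epoch count, and the monitored quantity come from the question.

from keras.callbacks import EarlyStopping

model.compile(optimizer='adam',                        # assumed optimizer
              loss='sparse_categorical_crossentropy',  # assumed; use 'categorical_crossentropy' for one-hot labels
              metrics=['acc'])

early_stopping = EarlyStopping(monitor='val_loss', patience=3)  # patience value is an assumption

history = model.fit(x_train, y_train,                  # x_train, y_train, x_val, y_val are placeholder names
                    validation_data=(x_val, y_val),
                    epochs=1000,
                    batch_size=512,
                    callbacks=[early_stopping],
                    verbose=2)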

Related

Model training starts all over again after unfreezing weights in TensorFlow

I am training an image classifier using EfficientNetV2L:
import tensorflow as tf
from tensorflow.keras import layers, optimizers, losses, callbacks
from tensorflow.keras.applications import EfficientNetV2L

base_model = EfficientNetV2L(input_shape=(300, 500, 3),
                             include_top=False,
                             weights='imagenet',
                             include_preprocessing=True)

model = tf.keras.Sequential([base_model,
                             layers.GlobalAveragePooling2D(),
                             layers.Dropout(0.2),
                             layers.Dense(128, activation='relu'),
                             layers.Dropout(0.3),
                             layers.Dense(6, activation='softmax')])

base_model.trainable = False

model.compile(optimizer=optimizers.Adam(learning_rate=0.001),
              loss=losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

callback = [callbacks.EarlyStopping(monitor='val_loss', patience=2)]

history = model.fit(ds_train, batch_size=28, validation_data=ds_val,
                    epochs=20, verbose=1, callbacks=callback)
It is working properly.
Model summary:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
efficientnetv2-l (Functional) (None, 10, 16, 1280) 117746848
global_average_pooling2d (GlobalAveragePooling2D) (None, 1280) 0
dropout (Dropout) (None, 1280) 0
dense (Dense) (None, 128) 163968
dropout_1 (Dropout) (None, 128) 0
dense_1 (Dense) (None, 6) 774
=================================================================
Total params: 117,911,590
Trainable params: 164,742
Non-trainable params: 117,746,848
_________________________________________________________________
Output:
Epoch 4/20
179/179 [==============================] - 203s 1s/step - loss: 0.1559 - accuracy: 0.9474 - val_loss: 0.1732 - val_accuracy: 0.9428
But while fine-tuning it, I am unfreezing some of the weights:
base_model.trainable = True

# Keep everything before layer 900 frozen; fine-tune the layers above it.
fine_tune_at = 900
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

model.compile(optimizer=optimizers.Adam(learning_rate=0.0001),
              loss=losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])

history = model.fit(ds_train, batch_size=28, validation_data=ds_val,
                    epochs=20, verbose=1, callbacks=callback)
Model summary:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
efficientnetv2-l (Functional) (None, 10, 16, 1280) 117746848
global_average_pooling2d (GlobalAveragePooling2D) (None, 1280) 0
dropout (Dropout) (None, 1280) 0
dense (Dense) (None, 128) 163968
dropout_1 (Dropout) (None, 128) 0
dense_1 (Dense) (None, 6) 774
=================================================================
Total params: 117,911,590
Trainable params: 44,592,230
Non-trainable params: 73,319,360
_________________________________________________________________
And it is starting the training all over again. When I first trained it with the base frozen, the loss decreased to 0.1559; after unfreezing the weights, the model started training again from loss = 0.444. Why is this happening? I think fine-tuning shouldn't reset the weights.
When training starts again, the Adam optimizer state (the per-parameter adaptive learning rates) is reset to its initial values; that is probably the reason for the big jump when you resume training. You can also save and load the optimizer state together with the model when saving/loading it. Maybe look here. You are also retraining a lot of parameters; consider unfreezing fewer of them. If you keep more of the pretrained parameters frozen, the jump might not be as large.
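As a rough sketch of the "save and load the optimizer values" suggestion (the file name is illustrative): by default model.save() stores the optimizer state together with the weights, so reloading the model resumes with the same Adam moment estimates. Note that calling compile() again afterwards still rebuilds the optimizer.

# Illustrative only: save the full model after the frozen-base stage.
# include_optimizer=True is the default, so Adam's state is stored as well.
model.save('efficientnet_frozen_stage.h5')

# Later, reload it to continue from the same weights and optimizer state.
model = tf.keras.models.load_model('efficientnet_frozen_stage.h5')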

Very high overfitting in image classification model

I'm a beginner in developing CNNs, and for a university assignment I've been tasked with creating an image classifier for food items. The dataset I'm using is Recipes5k, which has 101 classes of food.
I'm using Google Colab with TensorFlow and have been following TensorFlow's beginner image classification tutorial.
So far, everything has been clear and easy to understand, but I've run into a problem when training my model: the validation accuracy is outrageously low (10-11%) compared to the training accuracy (90%+). I suspect this is due to overfitting. So far, I've tried image augmentation and applying dropout to the model; this did not work as expected and only boosted the accuracy by about 5%. I have posted the necessary code snippets below:
Data Augmentation layer:
data_augmentation = keras.Sequential(
    [
        layers.experimental.preprocessing.RandomFlip("horizontal",
                                                     input_shape=(img_height,
                                                                  img_width,
                                                                  3)),
        layers.experimental.preprocessing.RandomRotation(0.1),
        layers.experimental.preprocessing.RandomZoom(0.1),
    ]
)
Model:
model = Sequential([
    data_augmentation,
    layers.experimental.preprocessing.Rescaling(1./255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes)
])
Model Summary:
Model: "sequential_2"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential_1 (Sequential) (None, 224, 224, 3) 0
_________________________________________________________________
rescaling_2 (Rescaling) (None, 224, 224, 3) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 224, 224, 16) 448
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 112, 112, 16) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 112, 112, 32) 4640
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 56, 56, 32) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 56, 56, 64) 18496
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 28, 28, 64) 0
_________________________________________________________________
dropout (Dropout) (None, 28, 28, 64) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 50176) 0
_________________________________________________________________
dense_2 (Dense) (None, 128) 6422656
_________________________________________________________________
dense_3 (Dense) (None, 101) 13029
=================================================================
Total params: 6,459,269
Trainable params: 6,459,269
Non-trainable params: 0
_________________________________________________________________
Results after training for 250 epochs:
Epoch 250/250
121/121 [==============================] - 3s 25ms/step - loss: 0.2564 - accuracy: 0.9270 - val_loss: 17.6184 - val_accuracy: 0.1202
What other techniques can I use to improve the accuracy of my model?
Update: I followed Gerry P's suggestion and changed my last dense layer to use softmax activation. After 1250 epochs of training, training accuracy increased more slowly and validation accuracy ended up around 5-6% higher. This improved my model, but the validation accuracy is still very low.
For your last dense layer, change it to
layers.Dense(num_classes, activation='softmax')
In model.compile(), use
loss='categorical_crossentropy'
if your labels are one-hot encoded. If they are integers, then use
loss='sparse_categorical_crossentropy'
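A minimal sketch of the model with that change applied, reusing the definitions from the question above. The 'adam' optimizer is an assumption, and integer labels are assumed (the default for image_dataset_from_directory):

model = Sequential([
    data_augmentation,
    layers.experimental.preprocessing.Rescaling(1./255),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.3),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(num_classes, activation='softmax')    # explicit softmax output
])

model.compile(optimizer='adam',                        # assumed optimizer
              loss='sparse_categorical_crossentropy',  # integer labels assumed
              metrics=['accuracy'])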

What's the sequence length of a Keras Bidirectional layer?

If I have:
self.model.add(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True))
self.model.add(BatchNormalization())
self.model.add(Dropout(0.2))
then seq_length specifies how many time steps of data are processed at once. If it matters, my model is sequence-to-sequence (same length).
But if I have:
self.model.add(Bidirectional(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True)))
self.model.add(BatchNormalization())
self.model.add(Dropout(0.2))
then does that double the sequence length? Or, at each time step, does it see seq_length / 2 before and seq_length / 2 after that time step?
Using a bidirectional LSTM layer has no effect on the sequence length.
I tested this with the following code:
from keras.models import Sequential
from keras.layers import Bidirectional, LSTM, BatchNormalization, Dropout, Input

lstm1_size = 50
seq_length = 128
feature_dim = 20

model = Sequential()
model.add(Bidirectional(LSTM(lstm1_size, input_shape=(seq_length, feature_dim), return_sequences=True)))
model.add(BatchNormalization())
model.add(Dropout(0.2))

batch_size = 32
model.build(input_shape=(batch_size, seq_length, feature_dim))
model.summary()
This resulted in the following output for the bidirectional model:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
bidirectional_1 (Bidirection (32, 128, 100) 28400
_________________________________________________________________
batch_normalization_1 (Batch (32, 128, 100) 400
_________________________________________________________________
dropout_1 (Dropout) (32, 128, 100) 0
=================================================================
Total params: 28,800
Trainable params: 28,600
Non-trainable params: 200
No bidirectional layer:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 128, 50) 14200
_________________________________________________________________
batch_normalization_1 (Batch (None, 128, 50) 200
_________________________________________________________________
dropout_1 (Dropout) (None, 128, 50) 0
=================================================================
Total params: 14,400
Trainable params: 14,300
Non-trainable params: 100
_________________________________________________________________
In both cases the time dimension stays at seq_length = 128; the Bidirectional wrapper only doubles the feature dimension (50 → 100) and the parameter count, because the forward and backward outputs are concatenated.

Getting different results from Keras model.evaluate and model.predict

I have trained a model to predict topic categories using word2vec and an LSTM model in Keras, and got about 98% accuracy during training. I saved the model, then loaded it in another file to try it on the test set. I used model.evaluate and model.predict, and the results were very different.
I'm using Keras with TensorFlow as the backend. The model summary is:
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
lstm_1 (LSTM) (None, 22) 19624
_________________________________________________________________
dropout_1 (Dropout) (None, 22) 0
_________________________________________________________________
dense_1 (Dense) (None, 40) 920
_________________________________________________________________
activation_1 (Activation) (None, 40) 0
=================================================================
Total params: 20,544
Trainable params: 20,544
Non-trainable params: 0
_________________________________________________________________
None
The code:
import os
import numpy as np

# The model architecture is defined earlier; here only the trained weights are loaded.
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.load_weights(os.path.join('model', 'lstm_model_weights.hdf5'))

score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)
print()
print('Score: %1.4f' % score)
print('Evaluation Accuracy: %1.2f%%' % (acc * 100))

predicted = model.predict(x_test, batch_size=batch_size)
acc2 = np.count_nonzero(predicted.argmax(1) == y_test.argmax(1)) / y_test.shape[0]
print('Prediction Accuracy: %1.2f%%' % (acc2 * 100))
The output of this code is:
39680/40171 [============================>.] - ETA: 0s
Score: 0.1192
Evaluation Accuracy: 97.50%
Prediction Accuracy: 9.03%
Can anyone tell me what I missed?
I think model evaluation works on the dev set (or the average of the dev-set accuracy if you use cross-validation), while prediction works on the test set.

Why does this neural network have zero accuracy and very low loss?

I have this network:
Tensor("input_1:0", shape=(?, 5, 1), dtype=float32)
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 5, 1) 0
_________________________________________________________________
bidirectional_1 (Bidirection (None, 5, 64) 2176
_________________________________________________________________
activation_1 (Activation) (None, 5, 64) 0
_________________________________________________________________
bidirectional_2 (Bidirection (None, 5, 128) 16512
_________________________________________________________________
activation_2 (Activation) (None, 5, 128) 0
_________________________________________________________________
bidirectional_3 (Bidirection (None, 1024) 656384
_________________________________________________________________
activation_3 (Activation) (None, 1024) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 1025
_________________________________________________________________
p_re_lu_1 (PReLU) (None, 1) 1
=================================================================
Total params: 676,098
Trainable params: 676,098
Non-trainable params: 0
_________________________________________________________________
None
Train on 27496 samples, validate on 6875 samples
I compile and fit it with:
model.compile(loss='mse', optimizer=Adamx, metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=100, epochs=10,
          validation_data=(x_test, y_test), verbose=2)
When I run it and evaluate it on unseen data, it returns 0.0 accuracy with a very low loss. I can't figure out what the problem is.
Epoch 10/10
- 29s - loss: 1.6972e-04 - acc: 0.0000e+00 - val_loss: 1.7280e-04 - val_acc: 0.0000e+00
What you are getting is expected. Your model is working correctly; it is your choice of metric that is not appropriate. The aim of the optimization function is to minimize the loss, not to increase accuracy.
Since you are using PReLU as the activation of your last layer, the network always produces floating-point outputs. Comparing these float outputs with the actual labels is not a meaningful measure of accuracy: because the outputs and labels are continuous, the probability of an exact match is essentially zero. Even if the model predicts values very close to the true label, the accuracy will remain zero unless it predicts exactly the same value as the label, which is improbable.
For example, if y_true is 1.0 and the model predicts 0.99999, this prediction contributes nothing to the accuracy, since 1.0 != 0.99999.
Update
The choice of metric depends on the type of problem, and Keras also provides functionality for implementing custom metrics.
Assuming the problem in question is a linear regression, and that two values are considered equal if they differ by less than 0.01, a custom metric can be defined as:
import keras.backend as K
import tensorflow as tf

accepted_diff = 0.01

def linear_regression_equality(y_true, y_pred):
    diff = K.abs(y_true - y_pred)
    return K.mean(K.cast(diff < accepted_diff, tf.float32))
Now you can use this metric for your model:
model.compile(loss='mse', optimizer=Adamx, metrics=[linear_regression_equality])
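A quick, illustrative check of the metric on toy values (the numbers are made up): only the first pair differs by less than 0.01, so the metric returns 0.5.

y_true = tf.constant([[1.0], [2.0]])
y_pred = tf.constant([[1.005], [2.5]])
print(K.eval(linear_regression_equality(y_true, y_pred)))   # -> 0.5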
