Val_acc doesn't increse

Val_acc doesn't increse - python

I'm trying to train a model with transfer learning, mobilenetv2, but my val accurary stops incressing at around 0,60. I've tried to train the top layers that I've build, after that I've tried to also train some of the mobilenets layers. Same result. How can I fix it? I have to mention that I am new to deep learning and I am not sure that the top layers I've build are right. Feel free to correct me.
IMAGE_SIZE = 224
BATCH_SIZE = 64
train_data_dir = "/content/FER2013/Training"
validation_data_dir = "/content/FER2013/PublicTest"
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rescale=1./255,
validation_split=0)
train_generator = datagen.flow_from_directory(
train_data_dir,
target_size=(IMAGE_SIZE, IMAGE_SIZE),
class_mode = 'categorical',
batch_size=BATCH_SIZE)
val_generator = datagen.flow_from_directory(
validation_data_dir,
target_size=(IMAGE_SIZE, IMAGE_SIZE),
class_mode = 'categorical',
batch_size=BATCH_SIZE)
IMG_SHAPE = (IMAGE_SIZE, IMAGE_SIZE, 3)
base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
include_top=False,
weights='imagenet')
model = tf.keras.Sequential([
base_model,
tf.keras.layers.GlobalAveragePooling2D(),
tf.keras.layers.Dense(512, activation='relu'),
tf.keras.layers.Dense(256, activation='relu'),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.BatchNormalization(),
tf.keras.layers.Dense(32, activation='relu'),
tf.keras.layers.Dense(16, activation='relu'),
tf.keras.layers.Dense(7, activation='softmax')
])
model.compile(loss='categorical_crossentropy',
optimizer = tf.keras.optimizers.Adam(1e-5), #I've tried with .Adam as well
metrics=['accuracy'])
from keras.callbacks import ReduceLROnPlateau, EarlyStopping, ModelCheckpoint
lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=3)
early_stopper = EarlyStopping(monitor='val_accuracy', min_delta=0, patience=6, mode='auto')
checkpointer = ModelCheckpoint('/content/weights.hd5', monitor='val_loss', verbose=1, save_best_only=True)
epochs = 50
learning_rate = 0.004 #I've tried other values as well
history_fine = model.fit(train_generator,
steps_per_epoch=len(train_generator),
epochs=epochs,
callbacks=[lr_reducer, checkpointer, early_stopper],
validation_data=val_generator,
validation_steps=len(val_generator))
Epoch 1/50
448/448 [==============================] - ETA: 0s - loss: 1.7362 - accuracy: 0.2929
Epoch 00001: val_loss improved from inf to 1.58818, saving model to /content/weights.hd5
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py:111: Model.state_updates (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/tracking.py:111: Layer.updates (from tensorflow.python.keras.engine.base_layer) is deprecated and will be removed in a future version.
Instructions for updating:
This property should not be used in TensorFlow 2.0, as updates are applied automatically.
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 166s 370ms/step - loss: 1.7362 - accuracy: 0.2929 - val_loss: 1.5882 - val_accuracy: 0.4249
Epoch 2/50
448/448 [==============================] - ETA: 0s - loss: 1.3852 - accuracy: 0.4664
Epoch 00002: val_loss improved from 1.58818 to 1.31690, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 165s 368ms/step - loss: 1.3852 - accuracy: 0.4664 - val_loss: 1.3169 - val_accuracy: 0.4827
Epoch 3/50
448/448 [==============================] - ETA: 0s - loss: 1.2058 - accuracy: 0.5277
Epoch 00003: val_loss improved from 1.31690 to 1.21979, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 165s 368ms/step - loss: 1.2058 - accuracy: 0.5277 - val_loss: 1.2198 - val_accuracy: 0.5271
Epoch 4/50
448/448 [==============================] - ETA: 0s - loss: 1.0828 - accuracy: 0.5861
Epoch 00004: val_loss improved from 1.21979 to 1.18972, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 166s 370ms/step - loss: 1.0828 - accuracy: 0.5861 - val_loss: 1.1897 - val_accuracy: 0.5533
Epoch 5/50
448/448 [==============================] - ETA: 0s - loss: 0.9754 - accuracy: 0.6380
Epoch 00005: val_loss improved from 1.18972 to 1.13336, saving model to /content/weights.hd5
INFO:tensorflow:Assets written to: /content/weights.hd5/assets
448/448 [==============================] - 165s 368ms/step - loss: 0.9754 - accuracy: 0.6380 - val_loss: 1.1334 - val_accuracy: 0.5743
Epoch 6/50
448/448 [==============================] - ETA: 0s - loss: 0.8761 - accuracy: 0.6848
Epoch 00006: val_loss did not improve from 1.13336
448/448 [==============================] - 153s 342ms/step - loss: 0.8761 - accuracy: 0.6848 - val_loss: 1.1348 - val_accuracy: 0.5882
Epoch 7/50
448/448 [==============================] - ETA: 0s - loss: 0.7783 - accuracy: 0.7264
Epoch 00007: val_loss did not improve from 1.13336
448/448 [==============================] - 153s 341ms/step - loss: 0.7783 - accuracy: 0.7264 - val_loss: 1.1392 - val_accuracy: 0.5893
Epoch 8/50
448/448 [==============================] - ETA: 0s - loss: 0.6832 - accuracy: 0.7638
Epoch 00008: val_loss did not improve from 1.13336
448/448 [==============================] - 153s 342ms/step - loss: 0.6832 - accuracy: 0.7638 - val_loss: 1.1542 - val_accuracy: 0.6052

Since your validation loss is increasing while your training loss decreases, I think you may have a problem of overfitting to your training sets. Some things that could help are:
Use less dense layers. I think there are too many, but I could be wrong since I don't know what problem you are solving.
Add dropout layers after every dense layer.
Increase the dropout rates.
Use data augmentation on your training set (since you are already using ImageDataGenerator it won't be that hard).
Reduce the number of neurons in dense layers.
Use regularization.
You can try applying any of them or multiple of them at the same time. Tweaking your model is a lot of trial and error, do some experiments, and keep the model that achieves the best performance.

With your training loss decreasing and your validation loss increasing it appears that you are in an over fitting situation. The more complex your model the higher the chance of this happening so I suggest you try to simplify your model. Try removing all the dense layers except for the top layer which is producing your classifications. Run that and see how it does. My experience with MobileNet is that it should work well. If not add an additional dense layer with about 128 neurons followed by a drop out layer. Use the drop out value to adjust for over fitting. If you are still over fitting you might want to consider adding regularizers to this dense layer. Documentation is here. It could also be the case that you need more training samples. Use the Image Data Generator to augment your data set. For example set horizontal_flip-True. I notice you did not include the MobileNetV2 pre-processing function in the Image Data Generator. I believe this is necessary so modify the code as shown below. I have also found I get the best results with MobileNet if I train the entire model. So before you compile your model add the code below.
ImageDataGenerator(preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input)
for layer in model.layers:
layer.trainable=True
As an aside you can delete that layer in you model and within the MobileNet set the parameter pooling='max'. I see you start out with a very small learning rate try something like .005. Since you have the ReduceLROnPlateau callback this will adjust it if it is to large but will allow you to converge faster.

Related

Does EarlyStopping take model of last epoch or last best score?

I trained a model in keras using EarlyStopping in callbacks with patience=2:
model.fit(X_train, y_train, batch_size=batch_size, epochs=epochs, validation_split=validation_split,
callbacks= [keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=2, verbose=0, mode='min'), keras.callbacks.ModelCheckpoint('best_model.h5', monitor='val_loss', mode='min', verbose=1, save_best_only=True)], class_weight=class_weights)
Epoch 1/20
3974/3975 [============================>.] - ETA: 0s - loss: 0.3499 - accuracy: 0.7683
Epoch 00001: val_loss improved from inf to 0.30331, saving model to best_model.h5
3975/3975 [==============================] - 15s 4ms/step - loss: 0.3499 - accuracy: 0.7683 - val_loss: 0.3033 - val_accuracy: 0.8134
Epoch 2/20
3962/3975 [============================>.] - ETA: 0s - loss: 0.2821 - accuracy: 0.8041
Epoch 00002: val_loss improved from 0.30331 to 0.25108, saving model to best_model.h5
3975/3975 [==============================] - 14s 4ms/step - loss: 0.2819 - accuracy: 0.8043 - val_loss: 0.2511 - val_accuracy: 0.8342
Epoch 3/20
3970/3975 [============================>.] - ETA: 0s - loss: 0.2645 - accuracy: 0.8157
Epoch 00003: val_loss did not improve from 0.25108
3975/3975 [==============================] - 14s 4ms/step - loss: 0.2645 - accuracy: 0.8157 - val_loss: 0.2687 - val_accuracy: 0.8338
Epoch 4/20
3962/3975 [============================>.] - ETA: 0s - loss: 0.2553 - accuracy: 0.8223
Epoch 00004: val_loss did not improve from 0.25108
3975/3975 [==============================] - 15s 4ms/step - loss: 0.2553 - accuracy: 0.8224 - val_loss: 0.2836 - val_accuracy: 0.8336
Wall time: 58.4 s
Obviously the model didn't improve after epoch 2, but patience=2 led the algorithm terminate after 4 epochs. When I run model.evaluate now, does it take the model trained after 4 epochs or does it take the model trained after 2 epochs?
Is there any need to save and load the best model with ModelCheckpoint then in order to evaluate?

In you specific case, it retain the model after 4 epochs:
as I can see, the model did indeed improve after the first 2 epochs:
accuracy
0.7683 - 0.8043 - 0.8157 - 0.8224
loss
0.3499 - 0.2821 - 0.2645 - 0.2553
Early stopping would stop your training after the first two epochs if had there been no improvements. In this situation your loss isn't stuck after the first two epochs
So early stopping is not called at the end of epoch 2, and the model returned is the one at the end of your first 4 epochs and so the training has continued
From tensorflow:
Assuming the goal of a training is to minimize the loss. With this,
the metric to be monitored would be 'loss', and mode would be 'min'. A
model.fit() training loop will check at end of every epoch whether the
loss is no longer decreasing, considering the min_delta and patience
if applicable. Once it's found no longer decreasing,
model.stop_training is marked True and the training terminates.
If your model had not improved for after two epochs, your returned model would have been the model after the first two epochs.
The pointed parameter restore_best_weights is not directly connected to the early stopping.
It simply restore the weights of the model at the best detected performances. Always from tf:
Whether to restore model weights from the epoch with the best value of
the monitored quantity. If False, the model weights obtained at the
last step of training are used.
To make an example, you can use restore_best_weights even without early_stopping. The training would end only at the end of your set number of epochs, but the model returned at the end would have been the one with the best performances. This could be useful in the situation of bouncing loss due to a wrong learning rate and\or optimizer
refs: early stopping tf doc

Why is my model overfitting on the second epoch?

I'm a beginner in deep learning and I'm trying to train a deep learning model to classify different ASL hand signs using Mobilenet_v2 and Inception.
Here are my codes create an ImageDataGenerator for creating the training and validation set.
# Reformat Images and Create Batches
IMAGE_RES = 224
BATCH_SIZE = 32
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
rescale=1./255,
validation_split = 0.4
)
train_generator = datagen.flow_from_directory(
base_dir,
target_size = (IMAGE_RES,IMAGE_RES),
batch_size = BATCH_SIZE,
subset = 'training'
)
val_generator = datagen.flow_from_directory(
base_dir,
target_size= (IMAGE_RES, IMAGE_RES),
batch_size = BATCH_SIZE,
subset = 'validation'
)
Here are the codes to train the models:
# Do transfer learning with Tensorflow Hub
URL = "https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4"
feature_extractor = hub.KerasLayer(URL,
input_shape=(IMAGE_RES, IMAGE_RES, 3))
# Freeze pre-trained model
feature_extractor.trainable = False
# Attach a classification head
model = tf.keras.Sequential([
feature_extractor,
layers.Dense(5, activation='softmax')
])
model.summary()
# Train the model
model.compile(
optimizer='adam',
loss='categorical_crossentropy',
metrics=['accuracy'])
EPOCHS = 5
history = model.fit(train_generator,
steps_per_epoch=len(train_generator),
epochs=EPOCHS,
validation_data = val_generator,
validation_steps=len(val_generator)
)
Epoch 1/5
94/94 [==============================] - 19s 199ms/step - loss: 0.7333 - accuracy: 0.7730 - val_loss: 0.6276 - val_accuracy: 0.7705
Epoch 2/5
94/94 [==============================] - 18s 190ms/step - loss: 0.1574 - accuracy: 0.9893 - val_loss: 0.5118 - val_accuracy: 0.8145
Epoch 3/5
94/94 [==============================] - 18s 191ms/step - loss: 0.0783 - accuracy: 0.9980 - val_loss: 0.4850 - val_accuracy: 0.8235
Epoch 4/5
94/94 [==============================] - 18s 196ms/step - loss: 0.0492 - accuracy: 0.9997 - val_loss: 0.4541 - val_accuracy: 0.8395
Epoch 5/5
94/94 [==============================] - 18s 193ms/step - loss: 0.0349 - accuracy: 0.9997 - val_loss: 0.4590 - val_accuracy: 0.8365
I've tried using data augmentation but the model still overfits so I'm wondering if I've done something wrong in my code.

Your data is very small. Try splitting with random seeds and check if the problem still persists.
If it does, then use regularizations and decrease the complexity of neural network.
Also experiment with different optimizers and smaller learning rate (try lr scheduler)

It seems like your dataset is very small with some true outputs separated only by a small distance of inputs in the input-output curve. That is why it is fitting easily to those points.

Validation Loss is increase and validation accuracy is decrese on every epochs in my RNN Model

I am working on abusive and violent content detection. When I train my model, the training log is as follows:
Train on 9087 samples, validate on 2125 samples
Epoch 1/5
9087/9087 [==============================] - 33s 4ms/step - loss: 0.3193 - accuracy: 0.8603 - val_loss: 0.2314 - val_accuracy: 0.9322
Epoch 2/5
9087/9087 [==============================] - 33s 4ms/step - loss: 0.1787 - accuracy: 0.9440 - val_loss: 0.2039 - val_accuracy: 0.9356
Epoch 3/5
9087/9087 [==============================] - 32s 4ms/step - loss: 0.1148 - accuracy: 0.9637 - val_loss: 0.2569 - val_accuracy: 0.9180
Epoch 4/5
9087/9087 [==============================] - 33s 4ms/step - loss: 0.0805 - accuracy: 0.9738 - val_loss: 0.3409 - val_accuracy: 0.9047
Epoch 5/5
9087/9087 [==============================] - 36s 4ms/step - loss: 0.0599 - accuracy: 0.9795 - val_loss: 0.3661 - val_accuracy: 0.9082
You can see in this graph.
As you can see, the train loss and accuracy decreases but the validation loss and accuracy increases..
The code for the model:
model = Sequential()
model.add(Embedding(8941, 256,input_length=20))
model.add(LSTM(32, dropout=0.1, recurrent_dropout=0.1))
model.add(Dense(32,activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(4, activation='sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy',
optimizer=optimizers.Adam(lr=0.001),
metrics=['accuracy'])
history=model.fit(x, x_test,
batch_size=batch_size,
epochs=5,
verbose=1,
validation_data=(y, y_test))
Help would be appriciated.

It actually depends on your data, but it seems like the model overfits the train set very quickly (after the second epoch).
Try:
Reduce your learning rate
Increase your batch size
Add regularization
Increase your dropout rate
Furthermore, it seems like you use binary_crossentropy while your model outputs a 4-length output for each sample: model.add(Dense(4, activation='sigmoid')) this might cause problems too.

Could validation data be a generator in tensorflow.keras 2.0?

In official documents of tensorflow.keras,
validation_data could be: tuple (x_val, y_val) of Numpy arrays or tensors
tuple (x_val, y_val, val_sample_weights) of Numpy arrays
dataset For the first two cases, batch_size must be provided. For the last case, validation_steps could be provided.
It does not mention if generator could act as validation_data. So I want to know if validation_data could be a datagenerator? like the following codes:
net.fit_generator(train_it.generator(), epoch_iterations * batch_size, nb_epoch=nb_epoch, verbose=1,
validation_data=val_it.generator(), nb_val_samples=3,
callbacks=[checker, tb, stopper, saver])
Update:
In the official documents of keras, the same contents, but another sentense is added:
dataset or a dataset iterator
Considering that
dataset For the first two cases, batch_size must be provided. For the last case, validation_steps could be provided.
I think there should be 3 cases. Keras' documents are correct. So I will post an issue in tensorflow.keras to update the documents.

Yes it can, that's strange that it is not in the doc but is it working exactly like the x argument, you can also use a keras.Sequence or a generator. In my project I often use keras.Sequence that acts like a generator
Minimum working example that shows that it works :
import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Flatten
def generator(batch_size): # Create empty arrays to contain batch of features and labels
batch_features = np.zeros((batch_size, 1000))
batch_labels = np.zeros((batch_size,1))
while True:
for i in range(batch_size):
yield batch_features, batch_labels
model = Sequential()
model.add(Dense(125, input_shape=(1000,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
train_generator = generator(64)
validation_generator = generator(64)
model.fit(train_generator, validation_data=validation_generator, validation_steps=100, epochs=100, steps_per_epoch=100)
100/100 [==============================] - 1s 13ms/step - loss: 0.6689 - accuracy: 1.0000 - val_loss: 0.6448 - val_accuracy: 1.0000
Epoch 2/100
100/100 [==============================] - 0s 4ms/step - loss: 0.6223 - accuracy: 1.0000 - val_loss: 0.6000 - val_accuracy: 1.0000
Epoch 3/100
100/100 [==============================] - 0s 4ms/step - loss: 0.5792 - accuracy: 1.0000 - val_loss: 0.5586 - val_accuracy: 1.0000
Epoch 4/100
100/100 [==============================] - 0s 4ms/step - loss: 0.5393 - accuracy: 1.0000 - val_loss: 0.5203 - val_accuracy: 1.0000

Sudden 50% accuracy drop while training convolutional NN

Training convolutional neural network from scratch on my own dataset with Keras and Tensorflow.
learning rate = 0.0001,
5 classes to sort,
no Dropout used,
dataset checked twice, no wrong labels found
Model:
model = models.Sequential()
model.add(layers.Conv2D(16,(2,2),activation='relu',input_shape=(75,75,3)))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(16,(2,2),activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(32,(2,2),activation='relu'))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dense(128,activation='relu'))
model.add(layers.Dense(5,activation='sigmoid'))
model.compile(optimizer=optimizers.adam(lr=0.0001),
loss='categorical_crossentropy',
metrics=['acc'])
history = model.fit_generator(train_generator,
steps_per_epoch=100,
epochs=50,
validation_data=val_generator,
validation_steps=25)
Everytime when model achieves 25-35 epochs (80-90% accuracy) this happens:
Epoch 31/50
100/100 [==============================] - 3s 34ms/step - loss: 0.3524 - acc: 0.8558 - val_loss: 0.4151 - val_acc: 0.7992
Epoch 32/50
100/100 [==============================] - 3s 34ms/step - loss: 0.3393 - acc: 0.8700 - val_loss: 0.4384 - val_acc: 0.7951
Epoch 33/50
100/100 [==============================] - 3s 34ms/step - loss: 0.3321 - acc: 0.8702 - val_loss: 0.4993 - val_acc: 0.7620
Epoch 34/50
100/100 [==============================] - 3s 33ms/step - loss: 1.5444 - acc: 0.3302 - val_loss: 1.6062 - val_acc: 0.1704
Epoch 35/50
100/100 [==============================] - 3s 34ms/step - loss: 1.6094 - acc: 0.2935 - val_loss: 1.6062 - val_acc: 0.1724
There is some similar problems with answers, but mostly they recommend to lower learning rate, but it doesnt help at all.
UPD: almost all weights and biases in network became nan. Network somehow died inside

Solution in this case:
I changed sigmoid function in last layer to softmax function and drops are gone
Why this worked out?
sigmoid activation function is used for binary (two-class) classifications.
In multiclassification problems we should use softmax function - special extension of sigmoid function for multiclassification problems.
More information: Sigmoid vs Softmax
Special thanks to #desertnaut and #Shubham Panchal for error indication

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.