Fine-tuning ResNet50 with Keras - val_loss keeps increasing - python

I am trying to customize ResNet50 using Keras with a TensorFlow backend. However, during training my val_loss keeps increasing. Trying different learning rates and batch sizes does not resolve the problem.
Using different preprocessing methods, such as rescaling or using the preprocess_input function for ResNet50 inside the ImageDataGenerator, did not solve the problem either.
This is the code I am using
Importing and preprocessing data:
import os
from os import listdir
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.resnet50 import preprocess_input, decode_predictions
IMAGE_SIZE = 224
BATCH_SIZE = 32
num_classes = 27
main_path = "C:/Users/aaron/Desktop/DATEN/data"
gesamt_path = os.path.join(main_path, "ML_DATA")
labels = listdir(gesamt_path)
data_generator = ImageDataGenerator(#rescale=1./255,
                                    validation_split=0.20,
                                    preprocessing_function=preprocess_input)
train_generator = data_generator.flow_from_directory(gesamt_path, target_size=(IMAGE_SIZE, IMAGE_SIZE), shuffle=True, seed=13,
                                                     class_mode='categorical', batch_size=BATCH_SIZE, subset="training")
validation_generator = data_generator.flow_from_directory(gesamt_path, target_size=(IMAGE_SIZE, IMAGE_SIZE), shuffle=False, seed=13,
                                                          class_mode='categorical', batch_size=BATCH_SIZE, subset="validation")
Defining and training the model
import keras
from keras.layers import Dense
from keras.models import Model
from keras.optimizers import Adam
img_width = 224
img_height = 224
model = keras.applications.resnet50.ResNet50()
classes = list(iter(train_generator.class_indices))
model.layers.pop()
for layer in model.layers:
    layer.trainable=False
last = model.layers[-1].output
x = Dense(len(classes), activation="softmax")(last)
finetuned_model = Model(model.input, x)
finetuned_model.compile(optimizer=Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])
# Map each class index back to its class name.
for c in train_generator.class_indices:
    classes[train_generator.class_indices[c]] = c
finetuned_model.classes = classes
earlystopCallback = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=8, verbose=1, mode='auto')
tbCallBack = keras.callbacks.TensorBoard(log_dir='./Graph', histogram_freq=0, write_graph=True, write_images=True)
history = finetuned_model.fit_generator(train_generator,
                                        validation_data=validation_generator,
                                        epochs=85, verbose=1,callbacks=[tbCallBack,earlystopCallback])

You need to match the preprocessing used for the pretrained network, not come up with your own. Double-check the network input tensor, i.e. whether the channel-wise average of your input matches that of the data used to train the pretrained network.
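To make the point about matching preprocessing concrete, here is a minimal pure-Python sketch of what the two options do to a single RGB pixel. It assumes the "caffe"-style convention that keras.applications.resnet50.preprocess_input follows (RGB to BGR, then subtracting the ImageNet per-channel means, without scaling). The helper names are made up for illustration and are not Keras functions.

```python
# ImageNet per-channel means in BGR order (assumed "caffe"-style convention).
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def caffe_style_preprocess(rgb_pixel):
    """Hypothetical helper: reorder RGB to BGR, then subtract the channel means."""
    r, g, b = rgb_pixel
    bgr = (b, g, r)
    return tuple(c - m for c, m in zip(bgr, IMAGENET_MEAN_BGR))


def rescale_only(rgb_pixel):
    """Hypothetical helper: what rescale=1./255 does to one pixel."""
    return tuple(c / 255.0 for c in rgb_pixel)


mid_gray = (128.0, 128.0, 128.0)
print("caffe-style:", caffe_style_preprocess(mid_gray))
print("rescaled:   ", rescale_only(mid_gray))
```

The two results differ in scale by roughly two orders of magnitude and in channel order, so a network pretrained on one convention sees very unfamiliar inputs when fed the other.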
It could also be that your new data is very different from the data used for the pretrained network. In that case, all the BN (batch normalization) layers will migrate their pretrained mean/variance to new values, so an increasing loss is possible at first (but eventually the loss should decrease).
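The migration of BN statistics can be sketched with the exponential moving average that batch normalization layers keep while they are being updated during training. This is only an illustration in plain Python: the momentum of 0.99 is the Keras BatchNormalization default, while the starting value and the batch mean of 5.0 are invented numbers standing in for "new data that looks different".

```python
def update_moving_stat(moving_value, batch_value, momentum=0.99):
    """One training-step update of a BN moving mean (or variance)."""
    return momentum * moving_value + (1.0 - momentum) * batch_value


moving_mean = 0.0      # pretend this is the pretrained moving mean
new_batch_mean = 5.0   # hypothetical mean of activations on the new dataset

history = []
for step in range(500):
    moving_mean = update_moving_stat(moving_mean, new_batch_mean)
    history.append(moving_mean)

# The stored mean drifts slowly at first and only approaches 5.0 after
# several hundred steps.
print(round(history[9], 3), round(history[-1], 3))
```

While the stored statistics are still in between the old and the new values, the layers above receive inputs they were not trained on, which is one way the loss can get worse before it gets better.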

In your training you are using a pretrained model (ResNet50) and changing only the last layer, because you want to predict only a few classes rather than the 1000 classes the pretrained model was trained on (that is the idea of transfer learning).
You are freezing all the pretrained weights, so the only thing left to train is a single Dense layer, which gives your model very little room to learn. Try adding a small trainable head on top of the frozen base:
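import keras
from keras.layers import Dense, Dropout
from keras.models import Model
# Load the base network without its 1000-class top; pooling='avg' yields a flat feature vector.
model = keras.applications.resnet50.ResNet50(include_top=False, pooling='avg')
for layer in model.layers:
    layer.trainable=False
last = model.output
x = Dense(512, activation='relu')(last)
x = Dropout(0.5)(x)
#x = BatchNormalization()(x)
x = Dense(512, activation='relu')(x)
x = Dropout(0.5)(x)
#x = BatchNormalization()(x)
x = Dense(len(classes), activation="softmax")(x)
finetuned_model = Model(model.input, x)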
You can modify the code above: change the number of neurons (512), add or remove Dropout/BatchNormalization, and use as many Dense layers as you want.

There is a known "problem" (a strange design choice) regarding BN layers in Keras, and your bad result may be related to this issue.
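Assuming the issue meant here is the one where a frozen BN layer in older Keras versions still normalizes with the statistics of the current mini-batch during training, but with its stored moving statistics at inference, the effect can be sketched in plain Python. All numbers below are invented, and the function names are not Keras APIs.

```python
import math


def batch_stats(values):
    """Mean and (population) variance of one mini-batch."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def normalize(values, mean, var, eps=1e-3):
    """Core of batch normalization, without the learned scale and shift."""
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Stored statistics learned on the original (pretraining) data.
moving_mean, moving_var = 0.0, 1.0

# Activations produced by the new dataset (hypothetical values).
batch = [8.0, 9.0, 10.0, 11.0, 12.0]

train_out = normalize(batch, *batch_stats(batch))      # training-mode behaviour
infer_out = normalize(batch, moving_mean, moving_var)  # inference-mode behaviour

train_center = sum(train_out) / len(train_out)
infer_center = sum(infer_out) / len(infer_out)
print(train_center, infer_center)
```

The layers after the BN layer are trained on outputs centred near 0 but evaluated on outputs centred near 10, which would show up as good training metrics next to poor validation metrics.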

Related

High precision in training and validation but low precision in test

Hi to everyone!!
I have a problem with my model.
I am training a CNN with transfer learning using the MobileNet base model.
My dataset is made up of 3 classes, "paper, scissors, rock" (8751 images, and all classes are perfectly balanced), and I use it to create a hand gesture recognition model for the "paper, scissors, rock" game.
In the training phase with Keras I get excellent results both with the training set and with the test set (accuracy, precision and AUC are all at around 0.98):
These are the last epochs.
When I go on to use the validation set, these metrics are very low:
I think this could be due to overfitting and that I should do some tuning on my model. In fact, I used augmentation to increase the number of images in my dataset, and then I tried modifying the MobileNet base model by adding layers.
But things are not getting better ... Can you help me? I'm going crazy.
This is my model training code:
import matplotlib.pyplot as plt
import tensorflow
from tensorflow.keras.layers import Dense, Flatten, GlobalAveragePooling2D, Dropout
from tensorflow.keras.models import Model
from tensorflow.keras import Sequential
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import keras
from keras.metrics import Precision, Recall
from collections import Counter
IMAGE_SIZE = (224, 224)
IMG_SHAPE = IMAGE_SIZE + (3,)
TRAIN_DATASET_DIR = "/content/PAPER_SCISSOR_ROCK/TRAIN"
TEST_DATASET_DIR = "/content/PAPER_SCISSOR_ROCK/TEST"
NUM_CLASSES = 3
BATCH_SIZE = 16
EPOCHS = 40
FC_LAYERS = [512, 512, 256, 256]
DROPOUT = 0.4
LEARNING_RATE = 0.0001
train_datagen = ImageDataGenerator(
    vertical_flip=True,
    validation_split=0.20,
    rescale=1. / 255,
    fill_mode = 'wrap',
    rotation_range = 45,
    brightness_range=[0.2,1.0]
    #brightness_range=[1, 2],
    #preprocessing_function = keras.applications.mobilenet.preprocess_input
)
# ONLY FOR TEST: SPLIT INTO VALIDATION AND TEST IMAGES (TO CALCULATE PRECISION AND CONFUSION MATRIX AFTERWARDS)
test_datagen = ImageDataGenerator(
    rescale=1./255,
    validation_split=0.3
)
train_generator = train_datagen.flow_from_directory(
    TRAIN_DATASET_DIR,
    target_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
    shuffle=True
)
val_generator = test_datagen.flow_from_directory(
    TEST_DATASET_DIR,
    target_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
    subset='training',
    shuffle=True
)
test_generator = test_datagen.flow_from_directory(
    TEST_DATASET_DIR,
    target_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    class_mode="categorical",
    subset="validation",
    shuffle=True
)
def build_finetune_model(base_model, dropout, fc_layers, num_classes):
    # prevents weights from being updated in a given layer during training.
    for layer in base_model.layers:
        layer.trainable = False
    # THE NEW PART SUGGESTED
    for layer in base_model.layers[-30:]:
        layer.trainable=True
    for layer in base_model.layers:
        if "BatchNormalization" in layer.__class__.__name__:
            layer.trainable = False
    x = base_model.output
    x = Flatten()(x)
    for fc in fc_layers:
        print(fc)
        x = Dense(fc, activation='relu')(x)
        x = Dropout(dropout)(x)
    preditions = Dense(num_classes, activation='softmax')(x)
    finetune_model = Model(inputs = base_model.input, outputs = preditions)
    return finetune_model
mobielNetV2 = tensorflow.keras.applications.MobileNetV2(input_shape=IMG_SHAPE, include_top=False, weights='imagenet')
finetune_model = build_finetune_model(mobielNetV2, dropout = DROPOUT, fc_layers = FC_LAYERS, num_classes = NUM_CLASSES)
finetune_model.compile(tensorflow.keras.optimizers.Adam(learning_rate=LEARNING_RATE), loss='categorical_crossentropy', metrics=['accuracy', 'AUC', Precision(), Recall()])
# Set up EarlyStopping: training stops once val_loss has not improved for 5 epochs, i.e. when the model starts to overfit.
es = keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0, patience=5, verbose=0, mode='auto')
r = finetune_model.fit_generator(train_generator, validation_data=val_generator, epochs=EPOCHS, steps_per_epoch=len(train_generator)/BATCH_SIZE,
                                 validation_steps=len(test_generator), callbacks=[es])
print("\nSAVE THE MODEL")
finetune_model.save(f"/content/drive/My Drive/Computer_Vision/Models/MobileNet_ScissorPaperRock_{EPOCHS}_epochs.h5")
EDITED
This is the code I use to calculate the precision, recall and F1-score on the validation set:
import numpy as np
from sklearn.metrics import classification_report
# test_steps_per_epoch = np.math.ceil(val_generator.samples / val_generator.batch_size)
# print(test_steps_per_epoch)
predictions = finetune_model.predict(val_generator)
# Get most likely class
predicted_classes = np.argmax(predictions, axis=1)
print(val_generator, len(val_generator))
# Get ground-truth classes and class-labels
true_classes = val_generator.classes
#print(true_classes)
class_labels = list(val_generator.class_indices.keys())
#print(class_labels)
# Use scikit-learn to get statistics
report = classification_report(true_classes, predicted_classes, target_names=class_labels)
print(report)
Since you are fine-tuning a MobileNet V2 model, it is a good idea to update the weights of the last few layers. MobileNet V2 is trained to classify 1000 different classes, but your domain contains only 3 classes with similar features. The first few layers usually capture general features, while the last few layers are less general, and those affect your model the most since your domain is much smaller. I'd suggest that you allow the last 20%-30% of MobileNet V2's layers to update their weights.
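A minimal sketch of the "unfreeze only the last 20%-30% of layers" idea. To keep it runnable without TensorFlow, it uses small stand-in classes instead of real Keras layers; the only thing it relies on is that each layer has a trainable attribute, as Keras layers do. The helper name unfreeze_tail and the 30% default are assumptions for illustration, and keeping BatchNormalization layers frozen mirrors what the question's code already does.

```python
class FakeLayer:
    """Stand-in for a Keras layer (not a real Keras class)."""

    def __init__(self):
        self.trainable = True


class Conv2D(FakeLayer):
    """Stand-in only; named after the Keras layer it imitates."""


class BatchNormalization(FakeLayer):
    """Stand-in only; named after the Keras layer it imitates."""


def unfreeze_tail(layers, fraction=0.3, keep_bn_frozen=True):
    """Freeze everything except the last `fraction` of the layers.

    Returns the index of the first layer in the unfrozen tail.
    """
    n_unfreeze = max(1, round(len(layers) * fraction))
    cutoff = len(layers) - n_unfreeze
    for index, layer in enumerate(layers):
        is_bn = "BatchNormalization" in type(layer).__name__
        layer.trainable = index >= cutoff and not (keep_bn_frozen and is_bn)
    return cutoff


# Ten fake layers alternating Conv2D / BatchNormalization.
layers = [Conv2D() if i % 2 == 0 else BatchNormalization() for i in range(10)]
cutoff = unfreeze_tail(layers, fraction=0.3)
print(cutoff, [layer.trainable for layer in layers])
```

With a real model the same function could presumably be called as unfreeze_tail(base_model.layers) before compiling, since changes to trainable only take effect at compile time.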

Keras VGG16 with flow_from_directory val_acc not rising

I use keras and import VGG16 network with imagenet weights to classify male/female photos.
The directory structure is:
split_1/train/male/*.jpg
split_1/train/female/*.jpg
split_1/val/female/*.jpg
split_1/val/male/*.jpg
I tried most of the solutions I found over the internet but none of them worked:
changing batch_size
changing optimizers
changing class_mode/loss function
setting every layer to trainable
copying every layer from VGG to my sequential
import numpy as np
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dropout, Flatten, Dense
from keras import applications
[...]
img_width, img_height = 224, 224
top_model_weights_path = "%s_retry2.h5" % split
train_data_dir = "%s/train" % split
validation_data_dir = "%s/val" % split
batch_size = 48
nb_train_samples = 4000
nb_validation_samples = ( 299 // batch_size ) * batch_size
epochs = 5
def train_top_model():
    datagen = ImageDataGenerator(
        horizontal_flip=True,
        shear_range=0.2,
        rescale=1. / 255)
    vdatagen = ImageDataGenerator(rescale=1./255)
    traingen = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='categorical',
        follow_links=True,
        shuffle=True)
    valgen = vdatagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='categorical',
        follow_links=True,
        shuffle=True)
    vgg_model = applications.VGG16(input_shape=(224,224,3), weights="imagenet", include_top=False)
    model = Sequential()
    model.add(vgg_model)
    model.add(Flatten())
    model.add(Dense(2, activation='softmax'))
    model.compile(optimizer="rmsprop", loss='categorical_crossentropy', metrics=['accuracy'])
    history = model.fit_generator(traingen,
                                  epochs=epochs,
                                  steps_per_epoch=nb_train_samples // batch_size,
                                  validation_data=valgen,
                                  validation_steps=nb_validation_samples // batch_size)
It reports the actual number of images, so it finds the JPGs properly.
Validation accuracy stays "random" and unchanged (~50%) during the entire training.
Try reducing the learning rate; it may be that your model is overshooting the minimum on every step and is therefore unable to converge.
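What "overshooting the minimum" means can be seen on a toy problem. The sketch below runs plain gradient descent on f(x) = x**2 with two learning rates. It is a deliberately simplified illustration, not a model of VGG16 training, and the specific rates are picked only to show the two regimes.

```python
def gradient_descent(learning_rate, steps=20, start=1.0):
    """Minimize f(x) = x**2 (gradient 2*x) and return the final x."""
    x = start
    for _ in range(steps):
        x = x - learning_rate * 2.0 * x
    return x


too_large = gradient_descent(learning_rate=1.1)  # each step jumps past 0 and lands further away
small = gradient_descent(learning_rate=0.1)      # each step moves closer to 0

print(abs(too_large), abs(small))
```

With the large rate the distance to the minimum grows on every step, while with the small rate it shrinks steadily, which is the behaviour a lower learning rate is meant to restore.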
If no amount of hyperparameter tuning works, then you need to fix your data, but I think male/female classification shouldn't be that hard to learn for a CNN with pre-trained weights.
How many samples do you have per class?
It seems that you don't have enough data to fine-tune the large number of parameters that VGG16 has (about 138 million if you make all the layers trainable).
I suggest:
1. For a gender classification problem, try to use a standard dataset such as IMDB-WIKI.
2. If you want to use your own data, first collect more labeled samples and then augment all of them.
3. Finally, use a state-of-the-art CNN architecture such as Xception (you can load ImageNet-pretrained Xception weights in Keras), freeze the first 20 layers and fine-tune the others.

Keras: Transfer Learning - Image scaling worsens performance of the model significantly

I am working on an image classification problem with keras and tensorflow. I am using the VGG16 model with Imagenet weights and I am importing my data using the ImageDataGenerator from Keras.
Now I've been reading that one should always rescale the images using 1./255 for efficient training. However, once I implement the scaling, my model performs significantly worse than before. Changing the learning rate and batch size didn't help either.
Now I am questioning whether this is possible or if my model has some error. I am using standard .jpg image files.
import os
from os import listdir
from keras.preprocessing.image import ImageDataGenerator
IMAGE_SIZE = 224
BATCH_SIZE = 32
num_classes = 27
main_path = "C:/Users/abc/data"
final_path = os.path.join(main_path, "ML_DATA")
labels = listdir(final_path)
data_generator = ImageDataGenerator(rescale=1./255, ### rescaling done here
                                    validation_split=0.20)
train_generator = data_generator.flow_from_directory(final_path, target_size=(IMAGE_SIZE, IMAGE_SIZE), shuffle=True, seed=13,
                                                     class_mode='categorical', batch_size=BATCH_SIZE, subset="training")
validation_generator = data_generator.flow_from_directory(final_path, target_size=(IMAGE_SIZE, IMAGE_SIZE), shuffle=False, seed=13,
                                                          class_mode='categorical', batch_size=BATCH_SIZE, subset="validation")
Model definition and training
vgg16_model = keras.applications.vgg16.VGG16(weights='imagenet', include_top=True)
model = Sequential()
for layer in vgg16_model.layers[:-1]:
    model.add(layer)
for layer in model.layers:
    layer.trainable = False
model.add(Dense(num_classes, activation='softmax'))
model.compile(Adam(lr=.001), loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit_generator(train_generator,
                              validation_data=validation_generator,
                              epochs=85, verbose=1,callbacks=[tbCallBack,earlystopCallback])
It could be that the ImageNet weights are not compatible with your rescaled image values.
I see that your only trainable layer is the very last one, a dense layer, which doesn't see the image values directly. My suggestion is to also let the first few convolutional layers be trainable, so that those layers can adapt to the rescaling.
Working with ResNet and imagenet weights I improved my results using:
ImageDataGenerator(preprocessing_function=preprocess_input)
With rescaling I obtained worse results too.
This information was useful to me:
https://github.com/matterport/Mask_RCNN/issues/231
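One way to see why 1./255 rescaling can hurt with ImageNet weights is to compare input ranges. The sketch below assumes the "caffe"-style preprocessing that the VGG16 and ResNet50 preprocess_input functions in Keras use (BGR channel order, ImageNet channel means subtracted, no scaling). The helper names are invented for this illustration.

```python
# ImageNet per-channel means in BGR order (assumed "caffe"-style convention).
IMAGENET_MEAN_BGR = (103.939, 116.779, 123.68)


def caffe_style_range():
    """(min, max) per BGR channel for raw pixel values in [0, 255]."""
    return [(0.0 - m, 255.0 - m) for m in IMAGENET_MEAN_BGR]


def rescaled_range():
    """(min, max) per channel after rescale=1./255."""
    return [(0.0, 1.0)] * 3


expected = caffe_style_range()   # what the pretrained weights were trained on
fed = rescaled_range()           # what the network receives after rescaling

for (lo_e, hi_e), (lo_f, hi_f) in zip(expected, fed):
    print(f"expected [{lo_e:.1f}, {hi_e:.1f}]  fed [{lo_f:.1f}, {hi_f:.1f}]")

# How much wider the expected range is than the rescaled one (first channel).
span_ratio = (expected[0][1] - expected[0][0]) / (fed[0][1] - fed[0][0])
print(span_ratio)
```

The frozen convolutional layers were fitted to inputs spanning a few hundred units around zero, so inputs squeezed into [0, 1] produce much weaker and differently centred activations.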

Difference between fit and fit_generator when doing transfer learning in keras

I am trying to train a deep neural network using transfer learning in Keras with TensorFlow. There are different ways to do that. If your data set is small, you can afford to compute features with the pre-trained model for the entire data set once and then use those features to train and test a small network. This is good because you don't need to recompute the features for each batch at each epoch. However, if the data set is large, it is impossible to compute features for the entire data set up front; in this case we use ImageDataGenerator, flow_from_directory and fit_generator. The features are then computed again for each batch at each epoch, which makes things much slower. I was assuming that both approaches produce similar results in terms of accuracy and loss. The problem is that I took a small data set, tried both approaches, and got completely different results. I would appreciate it if someone could tell me whether something is wrong in the provided code and/or why I am getting different results.
Approach when having large data-set:
from keras.applications.inception_v3 import InceptionV3,preprocess_input
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense
datagen= ImageDataGenerator(preprocessing_function=preprocess_input)
train_generator = datagen.flow_from_directory('data/train',
                                              class_mode='categorical',
                                              batch_size=64,...)
valid_generator = datagen.flow_from_directory('data/valid',
                                              class_mode='categorical',
                                              batch_size=64,...)
base_model = InceptionV3(weights='imagenet', include_top=False)
x = base_model.output
x = Conv2D(filters = 128 , kernel_size = (2,2)) (x)
x = MaxPooling2D()(x)
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',...)
model.fit_generator(generator = train_generator,
                    steps_per_epoch = len (train_generator),
                    validation_data = valid_generator ,
                    validation_steps = len(valid_generator),
                    ...)
Approach when having small data-set:
from keras.applications.inception_v3 import InceptionV3,preprocess_input
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, GlobalAveragePooling2D, Dense
from keras.utils import np_utils
base_model = InceptionV3(weights='imagenet', include_top=False)
# Compute the features once for the whole data set.
train_features = base_model.predict(preprocess_input(train_data))
valid_features = base_model.predict(preprocess_input(valid_data))
model = Sequential()
model.add(Conv2D(filters = 128 , kernel_size = (2,2),
                 input_shape=(train_features.shape[1],
                              train_features.shape[2],
                              train_features.shape[3])))
model.add(MaxPooling2D())
model.add(GlobalAveragePooling2D())
model.add(Dense(1024, activation='relu'))
model.add(Dense(2, activation='softmax'))
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',...)
model.fit(train_features, np_utils.to_categorical(y_train,2),
          validation_data=(valid_features, np_utils.to_categorical(y_valid,2)),
          batch_size=64,...)

How to use InceptionV3 bottlenecks as input in Keras 2.0

I want to use bottlenecks for transfer learning using InceptionV3 in Keras.
I've used some of the tips on creating, loading and using bottlenecks from
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
My problem is that I don't know how to use a bottleneck (numpy array) as input to an InceptionV3 with a new top layer.
I get the following error:
ValueError: Error when checking input: expected input_3 to have shape
(None, None, None, 3) but got array with shape (248, 8, 8, 2048)
248 refers to the total number of images in this case.
I know that this line is wrong, but I don't know how to correct it:
model = Model(inputs=base_model.input, outputs=predictions)
What is the correct way to input the bottleneck into InceptionV3?
Creating the InceptionV3 bottlenecks:
def create_bottlenecks():
    datagen = ImageDataGenerator(rescale=1. / 255)
    model = InceptionV3(include_top=False, weights='imagenet')
    # Generate bottlenecks for all training images
    generator = datagen.flow_from_directory(
        train_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
    nb_train_samples = len(generator.filenames)
    bottlenecks_train = model.predict_generator(generator, int(math.ceil(nb_train_samples / float(batch_size))), verbose=1)
    np.save(open(train_bottlenecks_file, 'w'), bottlenecks_train)
    # Generate bottlenecks for all validation images
    generator = datagen.flow_from_directory(
        validation_data_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode=None,
        shuffle=False)
    nb_validation_samples = len(generator.filenames)
    bottlenecks_validation = model.predict_generator(generator, int(math.ceil(nb_validation_samples / float(batch_size))), verbose=1)
    np.save(open(validation_bottlenecks_file, 'w'), bottlenecks_validation)
Loading the bottlenecks:
def load_bottlenecks(src_dir, bottleneck_file):
    datagen = ImageDataGenerator(rescale=1. / 255)
    generator = datagen.flow_from_directory(
        src_dir,
        target_size=(img_width, img_height),
        batch_size=batch_size,
        class_mode='categorical',
        shuffle=False)
    num_classes = len(generator.class_indices)
    # load the bottleneck features saved earlier
    bottleneck_data = np.load(bottleneck_file)
    # get the class labels for the training data, in the original order
    bottleneck_class_labels = generator.classes
    # convert the training labels to categorical vectors
    bottleneck_class_labels = to_categorical(bottleneck_class_labels, num_classes=num_classes)
    return bottleneck_data, bottleneck_class_labels
Starting training:
def start_training():
    global nb_train_samples, nb_validation_samples
    create_bottlenecks()
    train_data, train_labels = load_bottlenecks(train_data_dir, train_bottlenecks_file)
    validation_data, validation_labels = load_bottlenecks(validation_data_dir, validation_bottlenecks_file)
    nb_train_samples = len(train_data)
    nb_validation_samples = len(validation_data)
    base_model = InceptionV3(weights='imagenet', include_top=False)
    # add a global spatial average pooling layer
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    # let's add a fully-connected layer
    x = Dense(1024, activation='relu')(x)
    # and a logistic layer -- let's say we have 2 classes
    predictions = Dense(2, activation='softmax')(x)
    # What is the correct input? Obviously not base_model.input.
    model = Model(inputs=base_model.input, outputs=predictions)
    # first: train only the top layers (which were randomly initialized)
    # i.e. freeze all convolutional InceptionV3 layers
    for layer in base_model.layers:
        layer.trainable = False
    model.compile(optimizer=optimizers.SGD(lr=0.01, momentum=0.9), loss='categorical_crossentropy', metrics=['accuracy'])
    # train the model on the new data for a few epochs
    history = model.fit(train_data, train_labels,
                        epochs=epochs,
                        batch_size=batch_size,
                        validation_data=(validation_data, validation_labels),
                        )
Any help would be appreciated!
This error happens when you try to train your model with input data in a different shape from the shape your model supports.
Your model supports (None, None, None, 3), meaning:
Any number of images
Any height
Any width
3 channels
So you must make sure that train_data (and validation_data) match this shape.
The error message is telling you that train_data.shape = (248,8,8,2048).
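The shape comparison behind this error can be written out as a small check in which None stands for "any size". This is a plain-Python sketch of the rule, not the code Keras runs internally; shape_matches is a hypothetical helper and the image batch shape is an invented example.

```python
def shape_matches(expected, actual):
    """True if `actual` fits `expected`, where None means "any size"."""
    if len(expected) != len(actual):
        return False
    return all(e is None or e == a for e, a in zip(expected, actual))


model_input_shape = (None, None, None, 3)  # what base_model.input accepts
bottleneck_shape = (248, 8, 8, 2048)       # what was passed to fit()
image_batch_shape = (248, 299, 299, 3)     # hypothetical batch of RGB images

print(shape_matches(model_input_shape, bottleneck_shape))   # False
print(shape_matches(model_input_shape, image_batch_shape))  # True
```

The bottleneck array fails only on the last axis: it has 2048 feature channels where the model expects the 3 colour channels of an image.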
I see that train_data comes from load_bottlenecks. Is it really supposed to come from there? What is train_data supposed to be? An image? Something else? What is a bottleneck?
Your model starts with the Inception model, and the Inception model takes images.
But if the bottlenecks are already the outputs of the Inception model, and you want to feed only bottlenecks, then the Inception model should not take part at all.
Start from:
inputTensor = Input((8,8,2048)) #Use (None,None,2048) if bottlenecks vary in size
x = GlobalAveragePooling2D()(inputTensor)
.....
Create the model with:
model = Model(inputTensor, predictions)
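To see what the first layer of this bottleneck-only model does to an input of shape (8, 8, 2048), here is a plain-Python sketch of global average pooling on a tiny made-up feature map of shape (2, 2, 3). It illustrates the operation only and is not the Keras implementation.

```python
def global_average_pool(feature_map):
    """Average a (height, width, channels) nested list over height and width."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    totals = [0.0] * channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value
    return [t / (height * width) for t in totals]


# Invented 2 x 2 feature map with 3 channels per position.
tiny_bottleneck = [
    [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]],
    [[5.0, 0.0, 2.0], [7.0, 4.0, 2.0]],
]
pooled = global_average_pool(tiny_bottleneck)
print(pooled)  # [4.0, 1.0, 2.0]
```

In the same way, each (8, 8, 2048) bottleneck is reduced to a vector of 2048 values, which the Dense layers that follow can consume directly.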
The idea is:
Inception model: Image -> Inception -> Bottlenecks
Your model: Bottlenecks -> Model -> Labels
The combination of the two models is only necessary when you don't have the bottlenecks preloaded, but you have your own images for which you want to predict the bottlenecks first. (Of course you can work with separate models as well)
Then you're going to input only images (the bottlenecks will be created by Inception and passed to your model, everything internally):
Combined model: Image -> Inception ->(bottlenecks)-> Model -> Labels
For that:
inputImage = Input((None,None,3))
bottleNecks = base_model(inputImage)
predictions = model(bottleNecks)
fullModel = Model(inputImage, predictions)
