I'm using a CNN to classify images. I have 16 classes and around 3000 images (a very small dataset), and the classes are imbalanced. I do a 60/20/20 split, with the same percentage of each class in every set. I use weight regularization, and I have experimented with data augmentation (the Keras augmenter, SMOTE, ADASYN), which helps prevent overfitting.
When I overfit (epoch = 350, loss = 2) my model performs better (70+% accuracy, and better on other metrics like F1 score) than when I don't overfit (epoch = 50, loss = 1), where accuracy is around 60%. Accuracy is measured on the TEST set, while the loss is the validation-set loss.
Is it really a bad thing to use the overfitted model as the best model, since performance is better on the test set?
I have run the same model with another test set (taken from images that were previously in the training set) and performance is still better (I tried 3 different splits).
EDIT: From what I have read, validation loss is not always the best metric to confirm that a model is overfitting. In my situation, it's better to use validation F1 score and recall: when they start to decrease, the model is probably overfitting.
I still don't understand why validation loss would be a bad metric for model evaluation, given that training loss is what the model uses to learn.
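For reference, this is roughly what I have in mind for tracking validation F1 and recall each epoch (a minimal sketch; x_val, y_val and the macro averaging are placeholders, not my actual code):
import numpy as np
from sklearn.metrics import f1_score, recall_score
from tensorflow.keras.callbacks import Callback

class F1RecallMonitor(Callback):
    """Computes macro F1 and recall on a held-out validation set after every epoch."""
    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val = x_val
        self.y_val = y_val          # integer class labels

    def on_epoch_end(self, epoch, logs=None):
        y_pred = np.argmax(self.model.predict(self.x_val, verbose=0), axis=1)
        f1 = f1_score(self.y_val, y_pred, average="macro")
        rec = recall_score(self.y_val, y_pred, average="macro")
        print(f"epoch {epoch + 1}: val_f1={f1:.3f} val_recall={rec:.3f}")

# model.fit(x_train, y_train, callbacks=[F1RecallMonitor(x_val, y_val)], ...)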
Yes, it is a bad thing to use an overfitted model as your best model. By definition, a model that overfits doesn't really perform well in real-world scenarios, i.e. on images that are not in the training or test set.
To avoid overfitting, use image augmentation to balance the classes and increase the number of training samples. Also try increasing the dropout fraction. I personally use Keras's ImageDataGenerator to augment the images and save them.
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
import glob

# There are other parameters too. Check the link given at the end of the answer.
datagen = ImageDataGenerator(
    brightness_range=(0.4, 0.6),
    horizontal_flip=True,
    fill_mode='nearest'
)

num_of_samples_per_image_augmentation = 8

for image_path in glob.glob(path_to_images):   # path_to_images is a glob pattern, e.g. 'images/*.jpg'
    img = load_img(image_path)
    x = img_to_array(img)             # convert the PIL image to a NumPy array
    x = x.reshape((1,) + x.shape)     # add a batch dimension: (1, height, width, channels)

    # Generate and save a fixed number of augmented copies per image
    count = 0
    for batch in datagen.flow(x, save_to_dir='augmented-images/preview/fist',
                              save_prefix='fist', save_format='jpg'):
        count += 1
        if count > num_of_samples_per_image_augmentation:
            break
Here is the link to the Keras image augmentation parameters: https://keras.io/preprocessing/image/
Feel free to use any other library you are comfortable with.
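Alternatively, instead of writing the augmented images to disk, the same datagen can feed batches straight into training. A minimal sketch, assuming in-memory arrays x_train, y_train, x_val, y_val and an already compiled model exist (with older Keras versions use model.fit_generator instead of model.fit):
train_gen = datagen.flow(x_train, y_train, batch_size=32)   # augmented batches generated on the fly
model.fit(train_gen, steps_per_epoch=len(x_train) // 32,
          epochs=50, validation_data=(x_val, y_val))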
A few other methods to reduce overfitting:
1) Tweak your CNN model by reducing the number of trainable parameters.
2) Reduce the fully connected layers.
3) Use transfer learning (pre-trained models); see the sketch below.
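For point 3, a minimal transfer-learning sketch is shown below. MobileNetV2, the 224x224 input size, and the dense head are illustrative assumptions; only the 16-class output comes from the question.
import tensorflow as tf
from tensorflow.keras import layers, Model

# Pre-trained base with ImageNet weights, without its classification head
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False,
                                         input_shape=(224, 224, 3))
base.trainable = False                                  # freeze the convolutional base

x = layers.GlobalAveragePooling2D()(base.output)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(16, activation="softmax")(x)     # 16 classes, as in the question

model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])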
Related
I'm learning CNNs and wondering why my network is stuck at 0% accuracy even after multiple epochs. I'm sharing the entire code, as it's really simple.
I have a dataset with faces and their respective ages. I'm using Keras and TensorFlow to train a convolutional neural network to predict age.
However, my accuracy always reports as 0%. I'm very new to neural networks; could you tell me what I am doing wrong?
path = "dataset"
pixels = []
age = []
for img in os.listdir(path):
ages = img.split("_")[0]
img = cv2.imread(str(path)+"/"+str(img))
img = cv2.cvtColor(img,cv2.COLOR_BGR2RGB)
pixels.append(np.array(img))
age.append(np.array(ages))
age = np.array(age,dtype=np.int64)
pixels = np.array(pixels)
x_train,x_test,y_train,y_test = train_test_split(pixels,age,random_state=100)
input = Input(shape=(200,200,3))
conv1 = Conv2D(70,(3,3),activation="relu")(input)
conv2 = Conv2D(65,(3,3),activation="relu")(conv1)
batch1 = BatchNormalization()(conv2)
pool3 = MaxPool2D((2,2))(batch1)
conv3 = Conv2D(60,(3,3),activation="relu")(pool3)
batch2 = BatchNormalization()(conv3)
pool4 = MaxPool2D((2,2))(batch2)
flt = Flatten()(pool4)
#age
age_l = Dense(128,activation="relu")(flt)
age_l = Dense(64,activation="relu")(age_l)
age_l = Dense(32,activation="relu")(age_l)
age_l = Dense(1,activation="relu")(age_l)
model = Model(inputs=input,outputs=age_l)
model.compile(optimizer="adam",loss=["mse","sparse_categorical_crossentropy"],metrics=['mae','accuracy'])
save = model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=2)
Well, you have to decide whether you want a classification model or a regression model. As it stands, it looks like you are trying to do regression.
Let's start at the beginning. Apparently you have a dataset of image files where the filename contains the age, something like 27_01.jpg I assume. So you split the filename on the "_" to get the age associated with the image file. You then read in the image using cv2 and convert it to RGB. Now, cv2 already returns the image as an array, so you don't need to convert it to an np array; just use
pixels.append(img)
Now the variable ages is a string, which you want to convert into an integer. So just use the code
ages = int(img.split("_")[0])
This is now a scalar integer value, not an array, so just use
age.append(ages)
You now have two lists, pixels and age. To use them in a model you need to convert them to np arrays, so use
age = np.array(age)
pixels = np.array(pixels)
Now the next thing you want to do is create a train set and a test set using the train_test_split function. Let's assume you want 90% of the data for training and 10% for testing, so use
x_train,x_test,y_train,y_test = train_test_split(pixels,age,train_size=.9, shuffle=True, random_state=100)
Now let's look at your model. This is what decides whether you are doing regression or classification, and you want to do regression. Your model is OK but needs some changes. You have 4 dense layers. I suspect this will lead to your model over-fitting, so I recommend that prior to the last layer you add a dropout layer. Use the code
drop = Dropout(rate=.4, seed=123)(age_l)
age_l = Dense(1, activation="linear")(drop)
Note the activation is set to linear. That way the output can take a range of values
that can be compared to the integer values of the age array.
Now when you compile your model, you want your loss to be mse, so it measures the error between the model's output and the ages. sparse_categorical_crossentropy is used when you are doing classification, which is NOT what you are doing. As for the metrics, accuracy is used for classification models, so you only want to use mae. Your compile code should be
model.compile(optimizer="adam",loss="mse",metrics=['mae'])
Now model.fit looks OK, but you should run for more epochs, say 20. When you run your model, look at the training loss and the validation loss. As the training loss decreases, the validation loss should, on AVERAGE, trend downward as well. If it starts to trend upward, your model is over-fitting; in that case you may want to add an additional dropout layer.
At some point your model will stop improving if you run a sufficient number of epochs. You can usually get an improvement in performance if you use an adjustable learning rate. Since you are new to this, you may not have experience using callbacks. Callbacks are used within model.fit, and there are many types. Documentation for callbacks can be found here. To implement an adjustable learning rate you can use the callback ReduceLROnPlateau. The documentation for that is here. Set it up to monitor the validation loss. If the validation loss fails to reduce for a "patience" number of epochs, the callback will reduce the learning rate by the parameter "factor", where
new_learning_rate = current_learning_rate * factor
and factor is a float between 0 and 1.0. My recommended code for this callback is shown below
rlronp=tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",factor=0.5,
patience=2, verbose=1)
I also recommend you use the callback EarlyStopping. The documentation for that is here. Set it up to monitor validation loss. If the loss fails to reduce for a "patience" number of consecutive epochs, training will be halted. Set the parameter restore_best_weights=True; that way, if the callback halts training, it leaves your model with the weights from the epoch that had the lowest validation loss. My recommended code for the callback is shown below
estop=tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
verbose=1, restore_best_weights=True)
To use the callbacks in model.fit, include the code
save = model.fit(x_train,y_train,validation_data=(x_test,y_test),epochs=20,
callbacks=[rlronp,estop])
By the way, I think I am familiar with this dataset or a similar one. Do not expect a great root mean squared error; I have seen many models for this and none had a small error margin. Incidentally, if you want to learn machine learning, there is an excellent set of about 200 tutorials by a guy called Gabriel Atkin. You can see his tutorials, called Data Everyday, here. The specific tutorial dealing with this kind of age dataset is located here.
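Putting all of these corrections together, here is a minimal sketch of the full regression pipeline (the directory name, layer sizes and epoch count simply mirror the snippets above; treat it as a starting point rather than a definitive implementation):
import os
import cv2
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import (Input, Conv2D, MaxPool2D, BatchNormalization,
                                     Flatten, Dense, Dropout)
from tensorflow.keras.models import Model

path = "dataset"                                    # files named like 27_01.jpg
pixels, age = [], []
for fname in os.listdir(path):
    age.append(int(fname.split("_")[0]))            # age parsed from the filename
    img = cv2.cvtColor(cv2.imread(os.path.join(path, fname)), cv2.COLOR_BGR2RGB)
    pixels.append(img)                              # cv2 already returns a NumPy array
pixels, age = np.array(pixels), np.array(age)

x_train, x_test, y_train, y_test = train_test_split(pixels, age, train_size=0.9,
                                                    shuffle=True, random_state=100)

inputs = Input(shape=(200, 200, 3))
x = Conv2D(70, (3, 3), activation="relu")(inputs)
x = Conv2D(65, (3, 3), activation="relu")(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)
x = Conv2D(60, (3, 3), activation="relu")(x)
x = BatchNormalization()(x)
x = MaxPool2D((2, 2))(x)
x = Flatten()(x)
x = Dense(128, activation="relu")(x)
x = Dense(64, activation="relu")(x)
x = Dense(32, activation="relu")(x)
x = Dropout(rate=0.4, seed=123)(x)
outputs = Dense(1, activation="linear")(x)          # linear output for regression

model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

rlronp = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                              patience=2, verbose=1)
estop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=4,
                                         verbose=1, restore_best_weights=True)

history = model.fit(x_train, y_train, validation_data=(x_test, y_test),
                    epochs=20, callbacks=[rlronp, estop])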
I am training a classifier using CNNs in PyTorch. My classifier has 6 labels. There are 700 training images for each label and 10 validation images for each label. The batch size is 10 and the learning rate is 0.000001. Each class makes up 16.7% of the images in the whole dataset. I have trained for 60 epochs and the architecture has 3 main layers:
Conv2D->ReLU->BatchNorm2D->MaxPool2D>Dropout2D
Conv2D->ReLU->BatchNorm2D->Flattening->Dropout2D
Linear->ReLU->BatchNorm1D->Dropout
And finally a fully connected layer and a softmax.
My optimizer is AdamW and the loss function is cross-entropy. The network trains well, in that the training accuracy keeps increasing, but the validation accuracy stays almost fixed at chance level (1/number of classes). The accuracy is shown in the image below:
[Image: accuracy of training and test]
And the loss is shown in:
[Image: loss for training and validation]
Does anyone have an idea why this is happening? How can I improve the validation accuracy? I have used L1 and L2 regularization as well as dropout layers. I have also tried adding more data, but none of this helped.
Problem solved: At first, I treated this problem as overfitting and spent a lot of time on methods to address that, such as regularization and augmentation. After trying different methods, I still couldn't improve the validation accuracy, so I went through the data. I found a bug in my data preparation that was generating similar tensors under different labels. I generated the correct data and the problem was solved to some extent (the validation accuracy increased to around 60%). Then I improved the validation accuracy to 90% by adding more "conv2d + maxpool" layers.
This is not so much a programming-related question, so it might be better to ask it again on Cross Validated, and it would be easier if you posted your architecture code.
But here are things that I would suggest:
You wrote that you "tried adding more data"; if you can, always use all the data you have. If that's still not enough (and even if it is), use augmentation (e.g. flip, crop, add noise to the image).
Your learning rate should not be so small; start with 0.001 and decay it while training, or try ~0.0001 without decaying.
Remove the dropout after the conv layers and the batchnorm after the dense layers and see if that helps. It is not so common to use dropout after conv layers, though normally that shouldn't have a negative effect; try it anyway. (A short sketch of the augmentation and learning-rate suggestions follows below.)
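A short PyTorch sketch of the augmentation and learning-rate suggestions above (the transform choices, the placeholder model, the 64x64 input size, and the scheduler settings are illustrative assumptions, not part of the original setup):
import torch
import torch.nn as nn
import torchvision.transforms as T
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR

# Augmentation for the training images only (pass these to your Dataset/ImageFolder)
train_transforms = T.Compose([
    T.RandomHorizontalFlip(),
    T.RandomCrop(64, padding=4),          # assumes 64x64 inputs
    T.ToTensor(),
])

# Placeholder model standing in for the question's 3-layer CNN (6 classes)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 6),
)

optimizer = AdamW(model.parameters(), lr=1e-3)             # start higher than 1e-6
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)     # decay the LR during training

for epoch in range(60):
    # ... one epoch of training over the DataLoader goes here ...
    scheduler.step()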
I am trying to feed a CNN model (human body pose estimation) with a dataset that contains about 1000 samples.
First, how can I make sure that the size of my dataset is enough?
Second, how should I split my data into train and test sets? (When I use train_size = 0.6 and test_size = 0.4, the network doesn't work well and shows me NaN for the weights, biases, and loss value!)
There is no fixed way to determine when you have a sufficiently large data set; it depends on many factors. The best thing to do is run with what you have and see how it performs. I usually split my data into 3 sets: training, validation and test, typically 75% for training, 15% for validation and 10% for the final test. The validation set is what I use to tweak the hyperparameters. Initially I monitor the training accuracy and loss. If I can get that up to over 95%, then I monitor the validation accuracy and loss. I use the Keras ModelCheckpoint callback to save the model with the lowest validation loss. If the validation accuracy and loss are not satisfactory, I tweak the hyperparameters to try to improve them; I have found an adjustable learning rate useful for this purpose. Finally, when I am satisfied with the training and validation accuracy, I use the saved model to make predictions on the test set. This is the final measure of how the model performs. A sketch of this workflow is shown below.
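Here is a minimal sketch of that workflow (the dummy data, the tiny model, and all hyperparameters are placeholders just to make it self-contained):
import numpy as np
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Dummy stand-ins for the real data and model
x = np.random.rand(1000, 64, 64, 3).astype("float32")
y = np.random.randint(0, 6, size=1000)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(64, 64, 3)),
    tf.keras.layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# 75% train, 15% validation, 10% final test
x_tmp, x_test, y_tmp, y_test = train_test_split(x, y, test_size=0.10, random_state=42)
x_train, x_val, y_train, y_val = train_test_split(x_tmp, y_tmp, test_size=0.15 / 0.90,
                                                  random_state=42)

# Save the weights of the epoch with the lowest validation loss
ckpt = tf.keras.callbacks.ModelCheckpoint("best_model.h5", monitor="val_loss",
                                          save_best_only=True, verbose=1)

model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=20, callbacks=[ckpt])

# One-time final evaluation on the held-out test set
best = tf.keras.models.load_model("best_model.h5")
print(best.evaluate(x_test, y_test))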
Here is a typical plot of the train/test loss behaviour as the number of epochs increases.
I'm not an expert, but I have read several topics on similar problems. Let me explain what I'm doing.
First, I have used the implementation given by https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py for resnet18 & resnet50, and by https://github.com/akamaster/pytorch_resnet_cifar10 for resnet32 and resnet56. For all these nets I get the same kind of erratic test-loss behaviour.
Second, my inputs are 5x64x64 images, so I have adapted the first convolutional layer, and the output of the last fully-connected layer consists of 180 neurons. I have used batch sizes of 64, 128 and 256 for training and 128 for testing: the same behaviour persists. I have also used either 300k or 100k images as training input (100k for the test): the same behaviour persists too. The images are not "standard" RGB photos: first, as you have probably already remarked, there are 5 channels; second, the pixel values can be negative (e.g. spanning the range (-0.01, 500)).
Third, I am aware of the model.train() statement for the training phase, as well as the model.eval() statement (coupled with with torch.no_grad():) for the testing phase. It is clear that if I do not use model.eval() during the test phase, the test loss gently decreases like the training loss. But this is not allowed, is it?
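Concretely, the pattern I follow looks like the sketch below (toy model, data and optimizer just to make it self-contained; my real loaders and ResNet are as described above):
import torch
import torch.nn as nn

# Toy stand-ins so the pattern runs on its own: 5x64x64 inputs, 180 outputs
model = nn.Sequential(nn.Flatten(), nn.Linear(5 * 64 * 64, 180))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_loader = [(torch.randn(8, 5, 64, 64), torch.randint(0, 180, (8,))) for _ in range(4)]
test_loader = [(torch.randn(8, 5, 64, 64), torch.randint(0, 180, (8,))) for _ in range(2)]

model.train()      # training mode: BatchNorm/Dropout layers (when present) use training behaviour
for images, labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()

model.eval()       # eval mode: BatchNorm uses running stats, Dropout is disabled (when present)
test_loss = 0.0
with torch.no_grad():                  # no gradient tracking during evaluation
    for images, labels in test_loader:
        test_loss += criterion(model(images), labels).item()
print(test_loss / len(test_loader))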
I have tried several things after reading posts concerning batch norm behaviour, without any success:
I have used SGD and Adam (& SWATS).
I have tried lr = 0.1 down to lr = 1e-5.
I have modified the BN momentum (default = 0.1) to 0.5 and 0.01, as well as the eps parameter.
Now, I have managed to get nice results (i.e. good training & testing losses) with a classical CNN (i.e. without any batch normalization or shortcuts), but I would like to study ResNet behaviour against adversarial attacks, so I would like to get a ResNet to fit my images.
Any idea?
Thanks
After making some tests, I have found something. I used the standard resnet20 (h=1). Then I used as the test set the same samples (100,000 images) as the train set, BUT for the test set 1) I do not shuffle, and 2) I do not apply any horizontal/vertical flip or 90/180/270-degree rotation. I observe the same kind of fluctuations in the test loss.
Moreover, when I switch OFF the transformations of the train set completely and use the same set for train & test, I get the same behaviour.
And finally, if I switch off the shuffling and the random transforms (flips & rotations) of the train set, and I use the same set for the test, then I get:
It seems that the test loss is converging towards a value, but a different one from the train loss. Why?
According to my understanding of machine learning (though I am very new to it), evaluation of a model needs to be done during the training process. This would avoid overfitting or reduce the possibility of bad predictions.
I tried to modify the abalone example provided on the official TensorFlow site to fit my project, and I found out that the code only does evaluation ONCE, after model training is done.
This is very strange to me, because a single evaluation seems to make the "evaluation phase" useless. In other words, what is the use of evaluation if the training is already done? It can't help to build a better model, can it?
Here is part of my code:
nn = tf.estimator.Estimator(model_fn=model_fn, params=model_params,
                            model_dir='/tmp/nmos_self_define')

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_features_numpy},
    y=train_labels_numpy,
    batch_size=1,
    num_epochs=1,
    shuffle=True)

# Train
nn.train(input_fn=train_input_fn)

test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": test_features_numpy},
    y=test_labels_numpy,
    batch_size=1,
    num_epochs=1,
    shuffle=False)

ev = nn.evaluate(input_fn=test_input_fn)
print("Loss: %s" % ev["loss"])
print("Root Mean Squared Error: %s" % ev["rmse"])
And the training results visualized through TensorBoard are:
As you can see, there is only one evaluation, which happened at the end of training (the blue dot).
Though I am not sure whether the loss not being reduced is because of the lack of evaluation, I'd like to know how to modify the code so that the evaluation process is executed during training.
Thanks for taking the time to read this question; I'd love to have a discussion about this, both conceptually and code-wise.
Evaluation of a model needs to be done during the training process. This would avoid overfitting or reduce the possibility of bad predictions.
By itself, evaluating while training doesn't do anything, but it allows an operator to track the performance of the model on data it has never seen before. You can then adjust hyperparameters (e.g. learning rate or regularisation factor) accordingly.
I found out that the code only does evaluation ONCE, after model training is done.
The code snippet you provided runs evaluation after only one epoch of training. You should train your model for multiple epochs to achieve better performance.
On a side note, you should create what we call a "validation set": a small subset of the training data that the algorithm doesn't train on, used for your evaluations during training. With your current approach, you might overfit your test set. The test set should only be used very rarely, to evaluate the real generalisation capabilities of your model.
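If you want evaluation to run periodically during training with the Estimator API, one option is tf.estimator.train_and_evaluate. A minimal sketch reusing the input data from the question (the epoch count and throttle_secs are illustrative, and ideally the eval input would be a separate validation set, as noted above):
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_features_numpy}, y=train_labels_numpy,
    batch_size=1, num_epochs=50, shuffle=True)        # train for many epochs, not just one

eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": test_features_numpy}, y=test_labels_numpy,
    batch_size=1, num_epochs=1, shuffle=False)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn)
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn,
                                  throttle_secs=60)   # evaluate at most once per minute

tf.estimator.train_and_evaluate(nn, train_spec, eval_spec)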