According to my understanding of machine learning (though I am very new to it), evaluation of a model needs to be done during the training process, to help avoid overfitting or reduce the chance of bad predictions.
I tried to modify the abalone example provided on the official TensorFlow site to fit my project, and I found out that the code only does evaluation ONCE, after model training is done.
This is very strange to me because only one evaluation seems to make the "evaluation phase" useless. In other words, what is the use of evaluation if the training is already done? It can't help to build up a better model, can it?
Here is part of my code:
nn = tf.estimator.Estimator(model_fn=model_fn, params=model_params,
                            model_dir='/tmp/nmos_self_define')

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_features_numpy},
    y=train_labels_numpy,
    batch_size=1,
    num_epochs=1,
    shuffle=True)

# Train
nn.train(input_fn=train_input_fn)

test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": test_features_numpy},
    y=test_labels_numpy,
    batch_size=1,
    num_epochs=1,
    shuffle=False)

ev = nn.evaluate(input_fn=test_input_fn)
print("Loss: %s" % ev["loss"])
print("Root Mean Squared Error: %s" % ev["rmse"])
And here are the training results visualized through TensorBoard:
As you can see, only one evaluation happened, at the end of training (the blue dot).
Though I am not sure whether the loss failing to decrease is caused by the lack of evaluation, I'd like to know how to modify the code so that evaluation is executed during training.
Thanks for taking the time to read this question; I'd love to have a discussion about this, both conceptually and code-wise.
Evaluation of a model needs to be done during the training process. This would avoid overfitting or reduce the possibility of bad predictions.
By itself, evaluating while training doesn't change anything, but it allows an operator to track the performance of the model on data it has never seen before. You can then adjust hyperparameters (e.g. learning rate or regularisation factor) accordingly.
I found out that the code only does evaluation ONCE, after model training is done.
The code snippet you provided runs evaluation after only one epoch of training. You should train your model for multiple epochs to achieve better performance.
On a side note, you should create what we call a "validation set": a small subset of the training data that the algorithm does not train on, and use it for your during-training evaluations. With your current approach, you might end up overfitting your test set. The test set should only be used very rarely, to evaluate the real generalisation capabilities of your model.
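For reference, if you want evaluation to run periodically during training with the Estimator API, one option is tf.estimator.train_and_evaluate. The sketch below is only a rough outline under some assumptions: it reuses the names from the question, introduces a hypothetical validation split (val_features_numpy, val_labels_numpy), and picks arbitrary values for max_steps and throttle_secs.

# Rough sketch: periodic evaluation during training with the TF 1.x Estimator API.
# Assumes `nn`, `train_features_numpy` and `train_labels_numpy` exist as in the question;
# `val_features_numpy` / `val_labels_numpy` are a hypothetical held-out validation split.
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_features_numpy},
    y=train_labels_numpy,
    batch_size=1,
    num_epochs=None,   # repeat the data so training runs until max_steps
    shuffle=True)

val_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": val_features_numpy},
    y=val_labels_numpy,
    batch_size=1,
    num_epochs=1,
    shuffle=False)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=val_input_fn, throttle_secs=60)

# Alternates training and evaluation; the results show up as separate curves in TensorBoard.
tf.estimator.train_and_evaluate(nn, train_spec, eval_spec)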
Related
I'm working on a multimodal classifier (text + image) using PyTorch (only 2 classes).
Since I don't have a lot of data, I've decided to use StratifiedKFold to avoid overfitting.
I noticed a strange behavior in the training/testing curves.
My training accuracy quickly converges towards a single value, stays there for a few epochs, and then starts evolving again.
With these results I immediately thought of overfitting, 0.67 being the maximum accuracy of the model.
Using the rest of the data held out by the KFold, I tested my model in evaluation mode.
I was quite surprised, since the test accuracy follows the training accuracy quite exactly while the loss (CrossEntropyLoss) keeps evolving.
Note: changing the batch size only delays or advances the moment at which the accuracy starts growing and the loss starts evolving.
Any ideas about this behaviour?
Here is how it looks with less smoothing:
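For context, here is a minimal sketch of the kind of StratifiedKFold setup described above; the dataset, labels and loop body are illustrative placeholders, not the asker's actual code.

# Rough sketch: stratified k-fold splits feeding PyTorch Subsets.
# `dataset` is assumed to be a torch Dataset and `labels` an array of its class labels.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from torch.utils.data import DataLoader, Subset

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

for fold, (train_idx, val_idx) in enumerate(skf.split(np.zeros(len(labels)), labels)):
    train_loader = DataLoader(Subset(dataset, train_idx), batch_size=16, shuffle=True)
    val_loader = DataLoader(Subset(dataset, val_idx), batch_size=16, shuffle=False)
    # train on train_loader here, then evaluate on val_loader with model.eval()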
Hi! I am currently training my model with Darkflow Yolov2. The optimiser is SGD with lr 0.001.
Based on this graph, my val loss > train loss, which would mean that it is overfitting? If it is, what would be the recommended course of action? It seems weird because both losses are decreasing, but the val loss is decreasing more slowly.
For more info,
My train dataset consists of 400 images per class, with single annotations, for a total of 2800 images. I did this to prevent class imbalance, by only annotating one class instance per image. My val dataset consists of 350 images, with multiple annotations; basically, I annotated every object within the images. I have 7 classes and my train-val-test split is 80-10-10. Is this the cause of the higher val loss?
Over-fitting is detected as a mismatch: training accuracy diverges from test (validation) accuracy. Since you haven't provided that data, we can't evaluate your model.
It might help to clarify stages and terms; this should let you answer the question for yourself in the future:
"Convergence" is the point in training at which we believe that the model
has learned something useful;
has reached this point via a reproducible process;
isn't going to get significantly better;
is about to get worse.
Convergence is where we want to stop training and save (checkpoint) the model for production use.
We detect convergence by use of training passes and testing (validation) passes.
At convergence, we expect:
validation loss (error function, perplexity, etc.) is at a relative minimum;
validation accuracy is at a relative maximum;
validation and training metrics are "reasonably stable", with respect to the model's general behaviour;
training accuracy and validation accuracy are essentially equal.
Once a training run passes this point, it often transitions into "over-fitting", in which the model learns things so specific to the training data that it is no longer as good at inferring about new observations. In this state,
training loss drops; validation loss rises;
training accuracy rises; validation accuracy drops.
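To make the "stop and checkpoint at convergence" idea concrete, here is a minimal, hedged sketch using Keras callbacks; the model, data names and patience value are placeholders, not anything from the question.

# Rough sketch: stop training near convergence and keep the best checkpoint.
# `model`, `x_train`, `y_train`, `x_val`, `y_val` are assumed to already exist.
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint

callbacks = [
    # Stop once validation loss has not improved for 10 epochs.
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
    # Save the weights with the lowest validation loss seen so far.
    ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True),
]

history = model.fit(x_train, y_train,
                    validation_data=(x_val, y_val),
                    epochs=200,
                    callbacks=callbacks)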
I am training a NN and getting this result on loss and validation loss:
These are 200 epochs, a batch size of 16, 500 training samples and 200 validation samples.
As you can see, after about 20 epochs, the validation loss begins to zig-zag in a very exaggerated way.
Do you know what could be the reason for that behavior?
I tried to increase the number of validation samples but that just increased the zig-zagging and made it more exaggerated.
Also, I added a decay value to the optimizer, but the loss and validation loss did not look so good.
I was looking for another way to improve it.
Any idea what the reason for the zig-zagging is, and how I could minimize it?
This might be a case of overfitting:
Overfitting refers to a model that models the "training data" too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.
Basically, you have a very small training sample (500), but are training for a very long time (200 epochs!).
The network will start learning your training data by heart and won't learn to generalise. It will thus seem to be very good during training, but will fail miserably on the test set.
Early stopping is a nice way to avoid overfitting: basically, stop as soon as the validation loss becomes erratic or starts increasing. Another way to lower the chances of overfitting is to use techniques such as dropout, or simply to increase the amount of training data.
tl;dr: you are overfitting. To avoid this issue, there are many possibilities: drastically reduce the number of epochs, use a dev set and a stopping criterion, get more training data, ...
For alternative explanations, see also this question on QUORA.
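To illustrate the dropout suggestion above, here is a minimal, hedged sketch; the layer sizes, dropout rate and input shape are arbitrary placeholders.

# Rough sketch: a small Keras network with dropout to reduce overfitting.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout

model = Sequential([
    Dense(64, activation='relu', input_shape=(20,)),  # 20 input features, arbitrary
    Dropout(0.5),   # randomly zero half of the activations during training
    Dense(64, activation='relu'),
    Dropout(0.5),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])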
I would suggest that you not worry about the zigzag pattern of the validation loss or validation accuracy. As training of the neural network goes on, it makes mistakes and updates the weights (if you know the math behind it). So it is natural that the validation data will produce a zigzag, because the model is still in training mode (learning stage). Once the model is fully trained, you will notice that the zigzag decreases (if you have chosen the correct number of epochs).
So don't worry about this.
I'm currently using a CNN to classify images. I have 16 classes and around 3000 images (a very small dataset). This is an unbalanced data set. I do a 60/20/20 split, with the same percentage of each class in every set. I use weight regularization. I experimented with data augmentation (the Keras augmenter, SMOTE, ADASYN), which helps to prevent overfitting.
When I overfit (epochs=350, loss=2), my model performs better (70+% accuracy, and better on other metrics like F1 score) than when I don't overfit (epochs=50, loss=1), where accuracy is around 60%. Accuracy is measured on the TEST set, while loss is the validation set loss.
Is it really a bad thing to use the overfitted model as the best model, since performance is better on the test set?
I have run the same model with another test set (which was previously part of the train set) and performance is still better (I tried 3 different splits).
EDIT: From what I have read, validation loss is not always the best metric to confirm that a model is overfitting. In my situation, it's better to use validation F1 score and recall; when they start to decrease, the model is probably overfitting.
I still don't understand why validation loss would be a bad metric for model evaluation, given that training loss is what the model uses to learn.
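For reference, a small sketch of computing validation F1 and recall with scikit-learn; the model and array names are placeholders, and y_val is assumed to hold integer class labels.

# Rough sketch: validation F1 and recall for a multi-class classifier.
import numpy as np
from sklearn.metrics import f1_score, recall_score

val_probs = model.predict(x_val)          # `model`, `x_val`, `y_val` assumed to exist
val_preds = np.argmax(val_probs, axis=1)  # pick the most likely of the 16 classes

print("val F1 (macro):", f1_score(y_val, val_preds, average='macro'))
print("val recall (macro):", recall_score(y_val, val_preds, average='macro'))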
Yes, it is a bad thing to use an overfitted model as the best model. By definition, a model which overfits doesn't really perform well in real-world scenarios, i.e. on images that are not in the training or test set.
To avoid overfitting, use image augmentation to balance and increase the number of training samples. Also try to increase the dropout fraction. I personally use Keras's ImageDataGenerator to augment the images and save them.
from keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img
import glob

# There are other parameters too. Check the link given at the end of the answer.
datagen = ImageDataGenerator(
    brightness_range=(0.4, 0.6),
    horizontal_flip=True,
    fill_mode='nearest'
)

num_of_samples_per_image_augmentation = 8

# `path_to_images` should be a glob pattern matching your image files.
for image_path in glob.glob(path_to_images):
    img = load_img(image_path)
    x = img_to_array(img)           # convert the image to a NumPy array
    x = x.reshape((1,) + x.shape)   # add a batch dimension

    count = 0
    for batch in datagen.flow(x, save_to_dir='augmented-images/preview/fist',
                              save_prefix='fist', save_format='jpg'):
        count += 1
        if count > num_of_samples_per_image_augmentation:
            break
Here is the link to image augmentation parameters using Keras, https://keras.io/preprocessing/image/
Feel free to use other libraries of your comfort.
A few other methods to reduce overfitting:
1) Tweak your CNN model to reduce the number of trainable parameters (a smaller model is harder to overfit).
2) Reduce the number of fully connected layers.
3) Use transfer learning (pre-trained models), as sketched below.
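As a rough illustration of point 3 (transfer learning), here is a minimal sketch with a pre-trained Keras backbone; the backbone choice, input size and head layers are placeholders rather than a prescribed recipe.

# Rough sketch: transfer learning with a frozen pre-trained backbone in Keras.
from keras.applications import MobileNetV2
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

base = MobileNetV2(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False                        # freeze the pre-trained weights

x = GlobalAveragePooling2D()(base.output)     # pool the convolutional features
outputs = Dense(16, activation='softmax')(x)  # 16 classes, as in the question

model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])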
This is a problem that I am constantly facing, but don't seem to find the answer anywhere. I have a data set of 700 samples. As a result, I have to use cross-validation instead of just using one validation and one test set to get a close estimate of the error.
I would like to use a neural network to do this. But after doing CV with a neural network and getting an error estimate, how do I train the NN on the whole data set? For other algorithms like logistic regression or SVM, there is no question of when to stop training. But for a NN, you train it until your validation score goes down. So, for the final model, trained on the whole dataset, how do you know when to stop?
Just to make it clear, my problem is not how to choose hyper-parameters for the NN; I can do that by using nested CV. My question is how to train the final NN on the whole data set (more specifically, when to stop) before applying it in the wild.
To rephrase your question:
"When training a neural network, a common stopping criterion is the 'early stopping criterion' which stops training when the validation loss increases (signaling overfitting). For small datasets, where training samples are precious, we would prefer to use some other criterion and use 100% of the data for training the model."
I think this is generally a hard problem, so I am not surprised you have not found a simple answer. I think you have a few options:
Add regularization (such as Dropout or Batch Normalization), which should help prevent overfitting. Then use the training loss as the stopping criterion (a rough sketch of this is given after this list). You could check how this approach performs on a validation set, without using it for early stopping, to make sure the model is not overfitting.
Be sure not to overprovision the model. Smaller models will have a more difficult time overfitting.
Take a look at the stopping criterion described in this paper which does not rely on a validation set: https://arxiv.org/pdf/1703.09580.pdf
Finally, you may not want to use neural networks here. Generally, these models work best with large amounts of training data. In this case of 700 samples, you can possibly get better performance with another algorithm.
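Here is a rough sketch of the first option above (a regularized network whose stopping criterion watches the training loss, so that 100% of the data can be used for training); the architecture, feature count and thresholds are placeholders.

# Rough sketch: regularized network that stops when the *training* loss plateaus.
# `X_all`, `y_all` (all 700 samples) and `num_features` are assumed to exist.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential([
    Dense(32, activation='relu', input_shape=(num_features,)),
    BatchNormalization(),
    Dropout(0.3),
    Dense(32, activation='relu'),
    Dropout(0.3),
    Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Monitor the training loss instead of a validation loss.
stop_on_train_loss = EarlyStopping(monitor='loss', patience=10, min_delta=1e-4)
model.fit(X_all, y_all, epochs=500, batch_size=32, callbacks=[stop_on_train_loss])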