In the book Introduction to Machine Learning with Python on page 50 the author is performing a Linear Regression on a dataset and gets:
training set score: 0.67
test set score: 0.66
They then state that they are “likely underfitting, not overfitting.”
However, TensorFlow’s Basic Classification Tutorial uses the Fashion MNIST dataset with a neural network and gets:
training set score: 0.892
test set score: 0.876
and then they state the following:
“It turns out, the accuracy on the test dataset is a little less than the accuracy on the training dataset. This gap between training accuracy and test accuracy is an example of overfitting. Overfitting is when a machine learning model performs worse on new data than on their training data.”
I believe that the quote taken from the TensorFlow site is the correct one. Or are they both correct, and I don’t fully understand overfitting?
Underfitting occurs when both the training and testing scores are low. This signifies a systematic problem with your model, e.g. the data would be fit better by a polynomial model but you're using a linear one. So a score of ~0.66 on both training and testing is considered underfitting because both are low. In general, high error on both sets indicates underfitting.
Overfitting occurs when you have relatively high accuracy on training, but lower accuracy on testing. This signifies that your model has fit your training data too closely and does not generalize well to other data. In general, low error on training and higher error on testing indicates overfitting.
In general, it is extremely rare to build a model that shows the same performance on the training and validation (or test, or holdout, whatever you wish to call it) sets, so there will (almost) always be a gap between the two. You will often see overfitting defined in terms of this gap, but in practice that definition is hard to apply because it is not quantitative. The more general concept here is the "bias-variance trade-off", which is worth looking up. The relevant questions are how large the gap is, how good the performance is, and how performance on the validation set changes as the complexity of the model changes.
I find this figure from Wikipedia very instructive: https://en.wikipedia.org/wiki/Overfitting#/media/File:Overfitting_svg.svg. The x axis is the number of training iterations (epochs) in the case of NNs or GBMs, but you can also think of it as a model complexity parameter, e.g. the highest power included in a polynomial model. As you can see, there is always a gap between performance on the training and validation samples. The key to choosing a model that does not overfit is to find the best trade-off between performance on the training sample (= bias) and the difference between performance on the training and validation samples (= variance).
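To make the complexity sweep concrete, here is a minimal sketch using only the Python standard library. It is not taken from either the book or the tutorial: the synthetic 1-D dataset, the 20% label noise and the tiny k-nearest-neighbours classifier are all assumptions made for illustration. Here k plays the role of the complexity parameter in reverse: k=1 is the most flexible model, and a very large k is the most rigid.

```python
import random

def make_data(n, rng, noise=0.2):
    """1-D points in [0, 1); the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (k should be odd)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)

def accuracy(train, data, k):
    """Fraction of points in `data` that a k-NN model built on `train` labels correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

rng = random.Random(0)          # fixed seed so the run is repeatable
train = make_data(200, rng)     # hypothetical training sample
valid = make_data(200, rng)     # hypothetical validation sample

results = {}
for k in (1, 5, 25, 199):
    results[k] = (accuracy(train, train, k), accuracy(train, valid, k))
    print(f"k={k:3d}  train={results[k][0]:.3f}  valid={results[k][1]:.3f}")
```

With k=1 every training point is its own nearest neighbour, so the training accuracy is perfect by construction while the validation accuracy is not. The useful comparison is how the validation column moves as k changes, rather than the size of the gap at any single setting.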
Over- and under-fitting
The most overfit a model can be is to reach 100% accuracy on the training set.
This means that your model has learned to predict exactly the inputs that it has seen before.
If you are ever in this situation, your model will probably perform very poorly on the test set.
You can detect overfitting by:
High accuracy on the training set
A large gap between the training and test set scores
You can detect underfitting by:
A low accuracy on the training set (irrespective of performance on test set)
Examples:
1)
training set score: 0.67
test set score: 0.66
This example has a low score on training set. So underfitting seems like a fair assumption.
2)
training set score: 0.892
test set score: 0.876
This one is up to interpretation. The score on the training set is quite high and there is a gap with respect to the test set.
If the examples in both sets are very similar, then I would say that there is some overfitting. However, if the two sets are quite different (for example from different sources), then the results could be deemed acceptable.
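The two checks above can be written down as a small helper. This is only a sketch: `describe_fit` is a hypothetical function, and the `low_train` and `max_gap` thresholds are arbitrary values picked for illustration, since what counts as "low" or "large" depends on the task and the metric.

```python
def describe_fit(train_score, test_score, low_train=0.8, max_gap=0.05):
    """Summarise a train/test score pair.

    low_train and max_gap are hypothetical thresholds chosen for this
    illustration only; sensible values depend on the task and the metric.
    """
    gap = train_score - test_score
    return {
        "gap": round(gap, 3),
        "low_training_score": train_score < low_train,  # hint of underfitting
        "large_gap": gap > max_gap,                      # hint of overfitting
    }

example_1 = describe_fit(0.67, 0.66)     # scores from example 1
example_2 = describe_fit(0.892, 0.876)   # scores from example 2
print(example_1)
print(example_2)
```

Under these assumed thresholds the first pair is flagged for a low training score and the second pair is not flagged at all, which is consistent with calling it a judgment call: a stricter `max_gap` would flag it.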
I'm working on a multimodal classifier (text + image) using PyTorch (only 2 classes).
Since I don't have a lot of data, I've decided to use StratifiedKFold to avoid overfitting.
I noticed a strange behavior in the training/testing curves.
My training accuracy quickly converges toward a single value and stays there for a few epochs before evolving again.
With these results, I immediately thought of overfitting, with .67 being the maximum accuracy of the model.
With the rest of the data held out by the KFold, I tested my model in evaluation mode.
I was quite surprised, since the test accuracy follows the training accuracy (quite exactly) while the loss (CrossEntropyLoss) still evolves.
Note: changing the batch size only delays or brings forward the moment at which the accuracy starts to grow and the loss evolves.
Any ideas about this behaviour?
I am trying to feed a CNN model (human body pose estimation) with a dataset that contains 1000 samples.
First, how can I make sure that my dataset is already large enough?
Second, how should I split my data into train and test sets? (When I put train size = 0.6 and test_size = 0.4, the network doesn't work well and shows me NaN for the weights, biases and loss value!)
There is no fixed way to determine when you have a sufficiently large data set. It depends on many factors. The best thing to do is run with what you have and see how it performs. I usually split my data into 3 sets: training, validation and test. I usually try 75% for training, 15% for validation and 10% for the final test. The validation set is what I use to tweak the hyperparameters. Initially I monitor the training accuracy and loss. If I can get the training accuracy up to over 95%, then I monitor the validation accuracy and loss. I use the Keras ModelCheckpoint callback to save the model with the lowest validation loss. If the validation accuracy and loss are not satisfactory, I tweak the hyperparameters to try to improve them. I have found using an adjustable learning rate to be useful for this purpose. Finally, when I am satisfied with the training accuracy and validation accuracy, I use the saved model to make predictions on the test set. This is the final measure of how the model performs.
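As a rough sketch of the 75/15/10 split described above, using only the standard library: `three_way_split` is a hypothetical helper, and the integers 0..999 stand in for the 1000 samples. In a real project you would more likely use your framework's own splitting utilities.

```python
import random

def three_way_split(samples, train_frac=0.75, val_frac=0.15, seed=0):
    """Shuffle a copy of `samples` and cut it into train/validation/test lists."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # fixed seed for a repeatable split
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]       # whatever remains (about 10% here)
    return train, val, test

# Stand-in for a dataset of 1000 samples.
train, val, test = three_way_split(range(1000))
print(len(train), len(val), len(test))
```

The test list should be set aside and touched only once, after the hyperparameters have been chosen on the validation list.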
How it looks with less smoothing
Hi! I am currently training my model with Darkflow Yolov2. The optimiser is SGD with lr 0.001.
Based on this graph, my val loss > train loss. Does that mean that it is overfitting? If it is, what would be the recommended course of action? It seems weird because both losses are decreasing, but the val loss is decreasing more slowly.
For more info,
My train dataset consists of 400 images per class with single annotations, for a total of 2800 images. I did this to prevent class imbalance, by annotating only one class instance per image. My val dataset consists of 350 images with multiple annotations; basically, I annotated every object within the images. I have 7 classes and my train-val-test split is 80-10-10. Is this the cause of the higher val loss?
Detecting over-fitting involves looking for a mismatch in which training accuracy diverges from test (validation) accuracy. Since you haven't provided that data, we can't evaluate your model.
It might help to clarify stages and terms; this should let you answer the question for yourself in the future:
"Convergence" is the point in training at which we believe that the model
has learned something useful;
has reached this point via a reproducible process;
isn't going to get significantly better;
is about to get worse.
Convergence is where we want to stop training and save (checkpoint) the model for production use.
We detect convergence by use of training passes and testing (validation) passes.
At convergence, we expect:
validation loss (error function, perplexity, etc.) is at a relative minimum;
validation accuracy is at a relative maximum;
validation and training metrics are "reasonably stable", with respect to the model's general behaviour;
training accuracy and validation accuracy are essentially equal.
Once a training run passes this point, it often transitions into "over-fitting", in which the model learns things so specific to the training data, that it is no longer as good at inferring about new observations. In this state,
training loss drops; validation loss rises;
training accuracy rises; validation accuracy drops.
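Given recorded loss curves, the convergence point and the transition into over-fitting described above can be located after the fact. The sketch below is illustrative only: the two helper functions are hypothetical, and the loss values are made up rather than taken from any real run.

```python
def best_checkpoint(val_losses):
    """Return the 0-based epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=lambda i: val_losses[i])

def diverges_after(train_losses, val_losses, epoch):
    """True if, from `epoch` to the last epoch, training loss fell while validation loss rose."""
    return (train_losses[-1] < train_losses[epoch]
            and val_losses[-1] > val_losses[epoch])

# Made-up loss histories for illustration; not taken from any real run.
train_losses = [0.90, 0.60, 0.45, 0.35, 0.28, 0.22, 0.17]
val_losses = [0.95, 0.70, 0.55, 0.50, 0.52, 0.58, 0.66]

epoch = best_checkpoint(val_losses)
print("checkpoint epoch:", epoch)
print("over-fitting afterwards:", diverges_after(train_losses, val_losses, epoch))
```

In this made-up history the validation loss bottoms out at epoch 3 and then rises while the training loss keeps falling, which is the over-fitting signature listed above.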
I am training a NN and getting this result on loss and validation loss:
These are 200 epochs, a batch size of 16, 500 training samples and 200 validation samples.
As you can see, after about 20 epochs, the validation loss begins to do a very exaggerated zig-zagging.
Do you know what could be the reason for that behavior?
I tried to increase the number of validation samples but that just increased the zig-zagging and made it more exaggerated.
Also, I added a decay value to the optimizer, but the loss and validation loss did not look so good.
I was looking for another way to improve it.
Any idea what causes the zig-zagging and how I could minimize it?
This might be a case of overfitting:
Overfitting refers to a model that models the “training data” too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data source.
Basically, you have a very small training sample (500), but are training for a very long time (200 epochs!).
The network will start learning your training data by heart and won't learn to generalise. It will thus seem to be very good during training, but will fail miserably on the test set.
Early stopping is a nice way to avoid overfitting: basically, stop as soon as the validation loss becomes erratic or starts increasing. Another way to lower the chances of overfitting is to use techniques such as dropout, or simply to increase the amount of training data.
TL;DR: you are overfitting. To avoid this, there are many options: drastically reduce the number of epochs, use a dev set and a stopping criterion, get more training data, ...
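A stopping criterion of the kind mentioned above is often implemented with a "patience" counter. The following is a framework-agnostic sketch, not the API of any particular library: `early_stopping_epoch`, `patience` and `min_delta` are names chosen here for illustration, and the validation losses are made up. In a real training loop the loss would be computed once per epoch instead of being read from a list.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once `patience` consecutive epochs fail to improve the
    best validation loss by more than `min_delta`.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best: remember it and reset the patience counter.
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    # Patience never ran out: training ran through all recorded epochs.
    return len(val_losses) - 1, best_epoch

# Made-up, slightly zig-zagging validation losses for illustration.
history = [0.80, 0.62, 0.55, 0.57, 0.54, 0.60, 0.59, 0.63, 0.70]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print("stop at epoch", stop_epoch, "and restore the weights from epoch", best_epoch)
```

A patience greater than 1 keeps a single noisy epoch from ending training too early, which matters when the validation loss zig-zags as described in the question.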
For alternative explanations, see also this question on Quora.
I would suggest that you don't worry about the zigzag pattern of the validation loss or validation accuracy. As training of the neural network goes on, the model makes mistakes and updates its weights (if you know the math behind it). So it is to be expected that the validation data will produce a zigzag, because the model is still in its learning stage. Once the model is fully trained, you will notice that the zigzag decreases (if you have chosen the correct number of epochs).
So don't worry about this.
I am training a CNN and I am getting results of 85% accuracy in the training set, and 65% accuracy in the test set.
Is it OK to assume that, with a proper setting of the regularization of the network (dropout and L2 in my case), my test accuracy should get very close to my training accuracy (which will at the same time decrease as the regularization increases)?
So let's say, for instance, a 75%-74% accuracy?
With a proper setting of the regularization of all parameters of the network and with a well representative data batch, you should have a small difference between your test accuracy and your training accuracy. But of course you need to optimize your model with parameter optimization and feature selection.
Maybe you can check this link to find some more information.
Hope it helps!