I have obtained this result after training a neural network in Keras, and I was wondering whether this is overfitting or not.
I'm having doubts because I have read that overfitting occurs when a network is overtrained, and that it shows up when the validation loss INCREASES.
But in this case it doesn't increase. It stays the same, while the training loss DECREASES.
EXTRA INFO
Single dataset, split in this way:
70% of the dataset used as training data
30% of the dataset used as validation data
500 EPOCHS TRAINING
2000 EPOCHS TRAINING
Training loss: 3.1711e-05
Validation loss: 0.0036
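For reference, the kind of setup described above (one dataset, a 70/30 split, several hundred epochs in Keras) could look roughly like this; the model and data below are placeholders, not the original code:

```python
import numpy as np
from tensorflow.keras import models, layers

# Dummy data standing in for the real dataset (not shown in the question).
X = np.random.rand(1000, 20)
y = np.random.rand(1000)

# Placeholder regression model -- the original architecture is unknown.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# validation_split=0.3 holds out the last 30% of the samples for validation,
# matching the 70/30 split described above.
history = model.fit(X, y, epochs=500, validation_split=0.3, verbose=0)

# history.history["loss"] and history.history["val_loss"] are the two
# curves (training vs. validation loss) being compared here.
```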
There is a slight overfit in the sense that your training loss keeps decreasing while the validation loss has stopped decreasing.
However, I wouldn't consider this harmful, because the validation loss isn't increasing. That is, assuming I read the graph correctly; if there is even a small increase, then it's getting bad.
A harmful overfit is when your validation loss starts increasing. The validation loss is your true measure of the performance of the network. If it goes up, your model is starting to do bad things and you should stop there.
All in all this seems pretty decent. The training loss will almost always end up lower than the validation loss at some point, since training is an optimization process over the training set.
Training loss does indeed appear to keep decreasing further than the validation loss (it also looks to me like it hadn't finished decreasing yet at the 500th epoch; it would be good to continue for more epochs and see what happens). The difference doesn't appear to be large, though.
It may be overfitting slightly, but it may also be that the distribution of your validation data is simply a bit different from the distribution of the training data.
I'd recommend testing the following:
Continue for more than 500 epochs, to see whether the training loss keeps decreasing even further, or whether it stabilizes close to the validation loss. If it keeps decreasing much further while the validation loss stays the same, it's safe to say the network is overfitting.
Try creating different splits of training and validation sets. How did you determine the training and validation sets, actually? Were you given two separate sets, one for training and one for validation? Or were you given a single large training set, and did you split it up yourself? In the first case the distributions may be different, so a difference in training vs. validation loss wouldn't be strange. In the second case, try randomly creating different splits and repeating the experiments, to see whether you consistently get the same difference in training vs. validation loss, or whether they're sometimes closer together.
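If you did split the data yourself, the re-splitting experiment from the second point can be run with something like the snippet below. This is only a sketch with dummy data and a Ridge regression standing in for the actual network; the idea is just to see whether the train/validation gap stays consistent across random splits:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 20)   # placeholder features
y = np.random.rand(1000)       # placeholder targets

# Repeat the experiment with several random 70/30 splits and compare
# the gap between training and validation loss across runs.
for seed in range(5):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.3, random_state=seed
    )
    model = Ridge().fit(X_tr, y_tr)          # stand-in for your own model
    train_loss = mean_squared_error(y_tr, model.predict(X_tr))
    val_loss = mean_squared_error(y_val, model.predict(X_val))
    print(f"seed {seed}: train {train_loss:.4f}  val {val_loss:.4f}")
```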
I'm working on a multimodal classifier (text + image) using PyTorch (only 2 classes).
Since I don't have a lot of data, I've decided to use StratifiedKFold to avoid overfitting.
I noticed a strange behavior in the training/testing curves.
My training accuracy quickly converges toward a single value for a few epochs before evolving again.
With these results I immediately thought of overfitting, 0.67 being the maximum accuracy of the model.
With the rest of the data set aside by the KFold, I tested my model in evaluation mode.
I've been quite surprised, since the test accuracy follows (quite exactly) the training accuracy, while the loss (CrossEntropyLoss) still evolves.
Note: changing the batch size only delays or brings forward the moment at which the accuracy starts growing and the loss starts evolving.
Any ideas about this behaviour?
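For context, a stratified k-fold loop around a PyTorch model typically looks something like the sketch below. The tensors, sizes, and the missing training loop are placeholders, not the asker's code; it only shows how StratifiedKFold produces index splits that keep the class ratio balanced in each fold:

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset
from sklearn.model_selection import StratifiedKFold

# Dummy tensors standing in for the fused text+image features and binary labels.
features = torch.randn(200, 32)
labels = torch.randint(0, 2, (200,))
dataset = TensorDataset(features, labels)

# Stratified folds keep the class ratio identical in every train/test split.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(skf.split(features.numpy(), labels.numpy())):
    train_loader = DataLoader(Subset(dataset, train_idx.tolist()), batch_size=16, shuffle=True)
    test_loader = DataLoader(Subset(dataset, test_idx.tolist()), batch_size=16)
    # ... build a fresh model here, train it on train_loader,
    #     then switch to model.eval() and evaluate on test_loader ...
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test samples")
```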
I am trying to feed a CNN model (human body pose estimation) with a dataset containing 1000 samples.
First, how can I make sure that the size of my dataset is already enough?
Second, how should I split my data into train and test sets? (When I put train_size = 0.6 and test_size = 0.4, the network doesn't work well and shows me NaN for the weights, biases and loss value!)
There is no fixed way to determine when you have a sufficiently large dataset; it depends on many factors. The best thing to do is to run with what you have and see how it performs.

I usually split my data into 3 sets: training, validation and test, typically 75% for training, 15% for validation and 10% for the final test. The validation set is what I use to tweak the hyperparameters.

Initially I monitor the training accuracy and loss. If I can get that up to over 95%, then I monitor the validation accuracy and loss. I use the ModelCheckpoint Keras callback to save the model with the lowest validation loss. If the validation accuracy and loss are not satisfactory, I tweak the hyperparameters to try to improve them. I have found an adjustable learning rate to be useful for this purpose.

Finally, when I am satisfied with the training and validation accuracy, I use the saved model to make predictions on the test set. This is the final measure of how the model performs.
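As a rough illustration of that workflow in Keras (a 75/15/10 split, checkpointing on the lowest validation loss, and an adjustable learning rate), something along the lines below could be used. The data, architecture and numbers are placeholders, not a definitive recipe:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow.keras import models, layers
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

# Dummy data standing in for the real dataset.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 3, 1000)

# 75% train, 15% validation, 10% final test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.25, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.4, random_state=0)

# Placeholder architecture -- swap in your own network.
model = models.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,)),
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])

callbacks = [
    # Keep only the weights that achieved the lowest validation loss.
    ModelCheckpoint("best_model.keras", monitor="val_loss", save_best_only=True),
    # Shrink the learning rate when the validation loss stops improving.
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5),
]
model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=100, callbacks=callbacks, verbose=0)

# Load the checkpointed model and take the final measurement on the held-out test set.
best = models.load_model("best_model.keras")
best.evaluate(X_test, y_test, verbose=0)
```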
How it looks with less smoothing:
Hi! I am currently training my model with Darkflow YOLOv2. The optimiser is SGD with a learning rate of 0.001.
Based on this graph, my val loss > train loss, which would mean that it is overfitting? If so, what would be the recommended course of action? It seems weird because both losses are decreasing, but the val loss decreases more slowly.
For more info:
My train dataset consists of 400 images per class, with single annotations, for a total of 2800 images. I did this to prevent class imbalance, by only annotating one class instance per image. My val dataset consists of 350 images with multiple annotations; basically, I annotated every object within those images. I have 7 classes and my train-val-test split is 80-10-10. Is this the cause of the val loss?
Over-fitting detection includes a mismatch as training accuracy diverges from test (validation) accuracy. Since you haven't provided that data, we can't evaluate your model.
It might help to clarify stages and terms; this should let you answer the question for yourself in the future:
"Convergence" is the point in training at which we believe that the model
has learned something useful;
has reached this point via reproducible process;
isn't going to get significantly better;
is about to get worse.
Convergence is where we want to stop training and save (checkpoint) the model for production use.
We detect convergence by use of training passes and testing (validation) passes.
At convergence, we expect:
- validation loss (error function, perplexity, etc.) is at a relative minimum;
- validation accuracy is at a relative maximum;
- validation and training metrics are "reasonably stable", with respect to the model's general behaviour;
- training accuracy and validation accuracy are essentially equal.
Once a training run passes this point, it often transitions into "over-fitting", in which the model learns things so specific to the training data that it is no longer as good at inferring about new observations. In this state:
- training loss drops; validation loss rises;
- training accuracy rises; validation accuracy drops.
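In Keras, stopping at (approximately) this convergence point and saving the model there is usually automated with callbacks. A minimal hedged sketch, with dummy data and a placeholder model, could look like this:

```python
import numpy as np
from tensorflow.keras import models, layers
from tensorflow.keras.callbacks import EarlyStopping

# Dummy binary-classification data in place of a real task.
X = np.random.rand(500, 10)
y = np.random.randint(0, 2, 500)

model = models.Sequential([
    layers.Dense(32, activation="relu", input_shape=(10,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop once the validation loss has not improved for `patience` epochs,
# i.e. roughly the convergence point described above, and roll the weights
# back to the best epoch so the model can be checkpointed for production.
stopper = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)
model.fit(X, y, validation_split=0.2, epochs=1000, callbacks=[stopper], verbose=0)
model.save("converged_model.keras")
```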
I am currently working on a CNN model for classification; I have to predict words from a wav file. I encountered a problem with my validation accuracy, which stays (almost) the same. At first I was thinking of overfitting, but that does not seem to be the problem. Below you can see a photo with the results at the different epochs:
I am building a CNN model with Keras, using the 'adam' optimizer and 'categorical_crossentropy' for the loss. I have already tried increasing the number of epochs up to 1000 and changing the batch size.
Your training loss seems to be decreasing, but val_loss is increasing while val_accuracy stays approximately the same. This is a standard case of overfitting. Why do you think that's not the case?
Increasing the training epochs or the batch size is not helpful, as you're just changing the number of times the model sees the data or the quantity of data it sees in one epoch.
For the current scenario, the best model is the one from the point where both val_loss and train_loss were still decreasing, before the val_loss saturated.
To address the problem, add noise to the training data so that the model generalizes better, and create categories that are balanced in terms of training data volume.
Secondly, you can increase your validation dataset to see if it continues to have the same issue. If it does, then the model is definitely overfitting. Also, please update your question with what kind of validation set and technique you're using. If possible, add the code snippet of your validation set and loss function.
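One simple way to implement the "add noise to the training data" suggestion above in Keras is a GaussianNoise layer. The sketch below is only illustrative; the spectrogram-like input shape and the layer sizes are assumptions, not the asker's architecture:

```python
from tensorflow.keras import models, layers

# Hypothetical classifier over spectrogram-like inputs (the asker's actual
# architecture is not shown). GaussianNoise perturbs the inputs during
# training only and is a no-op at inference time, acting as a regularizer.
model = models.Sequential([
    layers.Input(shape=(64, 64, 1)),
    layers.GaussianNoise(0.1),
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),            # additional regularization
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```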
I am training a NN and getting this result on loss and validation loss:
These are 200 epochs, a batch size of 16, 500 training samples and 200 validation samples.
As you can see, after about 20 epochs, the validation loss begins to zig-zag in a very exaggerated way.
Do you know what could be the reason for that behavior?
I tried to increase the number of validation samples, but that just increased the zig-zagging and made it more exaggerated.
Also, I added a decay value to the optimizer, but the loss and validation loss still did not look good.
I was looking for another way to improve it.
Any idea what the reason for the zig-zagging is, and how I could minimize it?
This might be a case of overfitting:
Overfitting refers to a model that models the training data too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.
Basically, you have a very small training sample (500), but are training for a very long time (200 epochs!).
The network will start learning your training data by heart and won't learn to generalise. It will thus seem to be very good during training, but will fail miserably on the test set.
Early stopping is a nice way to avoid overfitting: basically, stop as soon as the validation loss becomes erratic or starts increasing. Another way to lower the chances of overfitting is to use techniques such as dropout, or simply to increase the amount of training data.
tl;dr: you are overfitting. To avoid this issue there are many possibilities: drastically reduce the number of epochs, use a dev set and a stopping criterion, get more training data, ...
For alternative explanations, see also this question on Quora.
I would suggest that you don't worry about the zigzag pattern of the validation loss or validation accuracy. See, as training of the neural network goes on, it makes mistakes and updates the weights, right? (if you know the math behind it). So it is natural that the test data will produce a zigzag, because the model is still in training mode (the learning stage). Once the model is fully trained, you will notice that the zigzag decreases (if you have chosen the correct number of epochs).
So don't worry about this.