I'm building a Keras sequential model to do a binary image classification. Now when I use like 70 to 80 epochs I start getting good validation accuracy (81%). But I was told that this is a very big number to be used for epochs which would affect the performance of the network.
My question is: is there a limited number of epochs that I shouldn't exceed, note that I have 2000 training images and 800 validation images.
If the number of epochs are very high, your model may overfit and your training accuracy will reach 100%. In that approach you plot the error rate on training and validation data. The horizontal axis is the number of epochs and the vertical axis is the error rate. You should stop training when the error rate of validation data is minimum.
You need to have a trade-off between your regularization parameters. Major problem in Deep Learning is overfitting model. Various regularization techniques are used,as
i) Reducing batch-size
ii) Data Augmentation(only if your data is not diverse)
iii) Batch Normalization
iv) Reducing complexity in architecture(mainly convolutional layers)
v) Introducing dropout layer(only if you are using any dense layer)
vi) Reduced learning rate.
vii) Transfer learning
Batch-size vs epoch tradeoff is quite important. Also it is dependent on your data and varies from application to application. In that case, you have to play with your data a little bit to know the exact figure. Normally a batch size of 32 medium size images requires 10 epochs for good feature extraction from the convolutional layers. Again, it is relative
There's this Early Stopping function that Keras supply which you simply define.
EarlyStopping(patience=self.patience, verbose=self.verbose, monitor=self.monitor)
Let's say that the epochs parameter equals to 80, like you said before. When you use the EarlyStopping function the number of epochs becomes the maximum number of epochs.
You can define the EarlyStopping function to monitor the validation loss, for example, when ever this loss does not improve no more it'll give it a few last chances (the number you put in the patience parameter) and if after those last chances the monitored value didn't improve the training process will stop.
The best practice, in my opinion, is to use both EarlyStopping and ModelCheckpoint, which is another callback function supplied in Keras' API that simply saves your last best model (you decide what best means, best loss or other value that you test your results with).
This is the Keras solution for the problem your trying to deal with. In addition there is a lot of online material that you can read about how to deal with overfit.
Yaa! Their is a solution for your problem. Select epochs e.g 1k ,2k just use early stoping on your neural net.
Early Stopping:
Keras supports the early stopping of training via a callback called Early-stopping.
This callback allows you to specify the performance measure to monitor, the trigger, and once triggered, it will stop the training process. For example you apply a trigger that stop the training if accuracy is not increasing in previous 5 epochs. So keras will see previous 5 epochs through call backs and stop training if your accuracy is not increasing
Early Stopping link :
Related
I am training a classifier using CNNs in Pytorch. My classifier has 6 labels. There are 700 training images for each label and 10 validation images for each label. The batch size is 10 and the learning rate is 0.000001. Each class has 16.7% of the whole dataset images. I have trained 60 epochs and the architecture has 3 main layers:
Conv2D->ReLU->BatchNorm2D->MaxPool2D>Dropout2D
Conv2D->ReLU->BatchNorm2D->Flattening->Dropout2D
Linear->ReLU->BatchNorm1D->Dropout And finally a fully connected and
a softmax.
My optimizer is AdamW and the loss function is crossentropy. The network is training well as the training accuracy is increasing but the validation accuracy remains almost fixed and equal as the chance of each class(1/number of classes). The accuracy is shown in the image below:
Accuracy of training and test
And the loss is shown in:
Loss for training and validation
Is there any idea why this is happening?How can I improve the validation accuracy? I have used L1 and L2 Regularization as well and also the Dropout Layers. I have also tried adding more data but these didn't help.
Problem Solved: First, I looked at this problem as overfitting and spend so much time on methods to solve this such as regularization and augmentation. Finally, after trying different methods, I couldn't improve the validation accuracy. Thus, I went through the data. I found a bug in my data preparation which was resulting in similar tensors being generated under different labels. I generated the correct data and the problem was solved to some extent (The validation accuracy increased around 60%). Then finally I improved the validation accuracy to 90% by adding more "conv2d + maxpool" layers.
This is not so much a programming related question so maybe ask it again in cross-validated
and it would be easier if you would post your architecture code.
But here are things that I would suggest:
you wrote that you "tried adding more data", if you can, always use all data you have. If thats still not enough (and even if it is) use augmentation (e.g. flip, crop, add noise to the image)
your learning rate should not be so small, start with 0.001 and decay while training or try ~ 0.0001 without decaying
remove the dropout after the conv layers and the batchnorm after the dense layers and see if that helps, it is not so common to use cropout after conv but normally that shouldnt have a negative effect. try it anyways
I am currently working on a CNN model for classification, I have to predict words on a wav file. I encountered a problem with my validation accuracy that stays (almost) the same, first I was thinking of overfitting but that does not seem to be the problem. Below you can see a photo with the result at the different epochs:
I am building a CNN model with Keras and using the 'adam' optimizer and 'categorical_crossentropy' for the loss. I already have tried to increase the number of epochs until 1000 and changed the batch size.
Your training loss seems to be decreasing but val_loss is increasing while val_accuracy is approximately same. This is standard case of overfitting. Why do you think that's not the case?
Increasing the training epochs or batch size is not helpful as you're just changing the number of times the model sees the data or the quantity of data it sees in one epoch.
For current scenario, the best model is created till the point both val_loss and train_loss continues to decrease before it becomes saturated.
To address the problem, you need to add noise in the training data so that the model generalizes better, generalize the examples better, create balanced categories in terms of training data volume.
Secondly, you can increase your validation dataset to see if it continues to have the same issue. If it's there then the model is definitely overfitting. ALso please update your question about what kind of validation set and technique you're using. If possible, add the code snippet of your validation set and loss function
I am training a NN and getting this result on loss and validation loss:
These are 200 epochs, a batch size of 16, 500 training samples and 200 validation samples.
As you can see, after about 20 epochs, the validation loss begins to do a very exaggerated zig-zagging.
Do you know which could be the reason for that behavior?
I tried to increase the number of validation samples but that just increased the zig-zagging and made it more exaggerated.
Also, I added a decay value to the optimizer, but the loss and validation loss did not look so good.
.
I was looking for another way to improve it.
Any idea on which is the zig-zagging reason and how could I minimize it?
This might be a case of overfitting:
Overfitting refers to a model that models the “training data” too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data source.
Basically, you have a very small training sample (500), but are training for a very long time (200 epochs!).
The network will start learning your training data by heart and won't learn to generalise. It will thus seem to be very good during training, but will fail miserably on the test set.
early stopping is a nice way to avoid overfitting: basically, stop as soon as the validation loss becomes erratic/starts increasing. Another way to lower the chances of overfitting is to use techniques such as dropout or simply to increase the training data.
tldr; you are overfitting. To avoid this issue, many possibilities: reduce drastically the number of epochs, use a dev set and a stopping criterion, have more training data, ...
For alternative explanations, see also this question on QUORA.
I would suggest that don't be worry for the zigzag fashion of the validation loss or validation accuracy. See, what happens when training of the neural network goes on, it makes the mistakes and update the weights, right ?( if you know the math behind it). So it is obvious that testing data will create zigzag because model is in training mode (learning stage). Once the model will get trained fully , you will notice that ... zigzag will decrease (if you have chose correct number of epochs).
So don't worry for this.
I have a custom dataset of approximately 20k images (10% used of validation).
I have roughly 1/3 in label class 0, 1/3 in label class 1, and 1/3 that do not have class 0, or 1 objects with a -1 label.
I have run approximately 400 epochs, the last 40 epochs validation mAP has increased from 0.817 TO 0.831, and training cross entropy loss from 0.377->0.356
the last epoch had validation mAP <score>=(0.83138943309)
train cross_entropy <loss>=(0.356147519184)
train smooth_l1 <loss>=(0.150637295831)
The training loss still seems like its got a reasonable amount to reduce but I don't have any experience with resnet (on yolov3 this data set quickly went below .1)
Is my approach of have 1/3 of the training images not have either class present reasonable? When I was doing yolov3 training it seemed to help the network avoid false positives.
Is there any rule of thumb that helps me estimate how many epochs are appropriate based on the number of classes/images?
Its cost me about 100 bucks on aws to get to this point, I'm not sure if it needs another 100 bucks or 1000 bucks to get to the optimal mAP - at the current rate it appears 1 hour is making about 1% improvement - and i'd expect that to slow down.
Are there other metrics I should be looking at? (if so how do i export them)?
are there any hyperparameters I should change, and resume training?
My hyperparameters are:
base_network='resnet-50',
num_classes=2,
mini_batch_size=32,
epochs=200,
learning_rate=0.001,
lr_scheduler_step='3,6',
lr_scheduler_factor=0.1,
optimizer='sgd',
momentum=0.9,
weight_decay=0.0005,
overlap_threshold=0.5,
nms_threshold=0.45,
image_shape=416,
label_width=480,
num_training_samples=19732)
thanks,
John
It's hard to say ahead of time for a custom dataset because you're dealing with many different variables. Tracking the validation mAP is of course a good way to tell you when to stop. For example, the mAP stops increasing, or mAP is leveling out.
So beyond that, I would recommend looking at others who used the same architecture and similar parameters to gain an insight. You mentioned a custom dataset, but for ImageNet, DAWNBench publishes that info. For example, this page lists the hyperparameters per epoch for you to explore of a related setup.
I would also urge you to look at fine tuning pre-trained models to save money and computation. See the Vision section here
and here
and
https://github.com/apache/incubator-mxnet/issues/4616
for information on fine-tuning the FC layers.
With training & validation through a dataset for nearly 24 epochs, intermittently 8 epochs at once and saving weights cumulatively after each interval.
I observed a constant declining train & test-loss for first 16 epochs, post which the training loss continues to fall whereas test loss rises so i think it's the case of Overfitting.
For which i tried to resume training with weights saved after 16 epochs with change in hyperparameters say increasing dropout_rate a little.
Therefore i reran the dense & transition blocks with new dropout to get identical architecture with same sequence & learnable parameters count.
Now when i'm assigning previous weights to my new model(with new dropout) with model.load_weights() and compiling thereafter.
i see the training loss is even higher, that should be initially (blatantly with increased inactivity of random nodes during training) but later also it's performing quite unsatisfactory,
so i'm suspecting maybe compiling after loading pretrained weights might have ruined the performance?
what's reasoning & recommended sequence of model.load_weights() & model.compile()? i'd really appreciate any insights on above case.
The model.compile() method does not touch the weights in any way.
Its purpose is to create a symbolic function adding the loss and the optimizer to the model's existing function.
You can compile the model as many times as you want, whenever you want, and your weights will be kept intact.
Possible consequences of compile
If you got a model, well trained for some epochs, it's optimizer (depending on what type and parameters you chose for it) will also be trained for that specific epochs.
Compiling will make you lose the trained optimizer, and your first training batches might experience some bad results due to learning rates not suited to the current state of the model.
Other than that, compiling doesn't cause any harm.