Reset the weights in K-fold cross validation - python

In k-fold cross-validation, why do we need to reset the weights after each fold?
We use this function:
import torch.nn as nn

def reset_weights(m):
    # Re-initialize the parameters of convolutional and linear layers
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
        m.reset_parameters()
We reset the weights of the model so that each cross-validation fold starts from a random initial state and does not learn from the previous folds.
Why is that important? I would have thought that if we don't do that, it would be better, because the model would learn from all folds and update its parameters across all of them rather than on each fold separately.

K-fold cross-validation is meant to check whether the model's performance is consistent and robust across different subsamples of training and test data, and to tune hyperparameters in a less biased way.
If the model performs well, with low variance across the (usually 5 or 10) folds of training and test data, it means its performance does not depend on any particular subsample of the data.
https://en.wikipedia.org/wiki/Cross-validation_(statistics)
After validating the model, you can train it on the whole dataset, without splitting it, to improve performance.
But this approach alone can't tell you whether your model has overfitted, so take note of CNN regularization and validation methods.
https://www.analyticsvidhya.com/blog/2020/09/overfitting-in-cnn-show-to-treat-overfitting-in-convolutional-neural-networks/
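To make the mechanics concrete, here is a minimal sketch of how the reset fits into a fold loop, assuming a PyTorch model and scikit-learn's KFold; the toy tensors, the small Sequential network, and the inner training loop are illustrative placeholders, not code from the question:

import torch
import torch.nn as nn
from sklearn.model_selection import KFold

def reset_weights(m):
    # Re-initialize conv and linear layers so each fold starts from scratch
    if isinstance(m, nn.Conv2d) or isinstance(m, nn.Linear):
        m.reset_parameters()

X = torch.randn(100, 10)           # toy features
y = torch.randint(0, 2, (100,))    # toy labels

model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()

kfold = KFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(kfold.split(X)):
    model.apply(reset_weights)     # fresh random weights for this fold
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    for epoch in range(10):
        optimizer.zero_grad()
        loss = criterion(model(X[train_idx]), y[train_idx])
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        val_loss = criterion(model(X[val_idx]), y[val_idx])
    print(f"fold {fold}: validation loss {val_loss.item():.4f}")

Without model.apply(reset_weights), the second fold would start from weights already fitted to the first fold's training data, so its validation score would no longer measure what the model can learn from that fold alone.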

Related

How to split data into train and test sets

I am trying to feed a CNN model (human body pose estimation) with a dataset containing 1000 samples.
First, how can I make sure that this amount of data is enough?
Second, how should I split my data into train and test sets? (When I set train_size = 0.6 and test_size = 0.4, the network doesn't work well and gives NaN for the weights, biases, and loss value!)
There is no fixed way to determine when you have a sufficiently large dataset; it depends on many factors. The best thing to do is run with what you have and see how it performs. I usually split my data into 3 sets: training, validation, and test, typically 75% for training, 15% for validation, and 10% for the final test. The validation set is what I use to tweak the hyperparameters. Initially I monitor the training accuracy and loss. If I can get that up to over 95%, I then monitor the validation accuracy and loss. I use the Keras ModelCheckpoint callback to save the model with the lowest validation loss. If the validation accuracy and loss are not satisfactory, I tweak the hyperparameters to try to improve them; I have found an adjustable learning rate to be useful for this purpose. Finally, when I am satisfied with the training and validation accuracy, I use the saved model to make predictions on the test set. This is the final measure of how the model performs.
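As a rough sketch of that workflow (the toy data, the tiny Keras model, and the ReduceLROnPlateau callback standing in for the "adjustable learning rate" are my own illustrative assumptions, not part of the answer):

import numpy as np
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

# Toy data standing in for the real dataset
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# Carve out 75% for training, then split the remaining 25%
# into 15% validation and 10% test (test is 0.4 of the remainder).
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, train_size=0.75, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.4, random_state=42)

model = keras.Sequential([
    keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

callbacks = [
    # Save the weights with the lowest validation loss
    ModelCheckpoint("best_model.keras", monitor="val_loss", save_best_only=True),
    # One way to get an adjustable learning rate: reduce LR when val loss plateaus
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3),
]

model.fit(X_train, y_train, validation_data=(X_val, y_val),
          epochs=50, callbacks=callbacks, verbose=0)

print(model.evaluate(X_test, y_test, verbose=0))  # final held-out measure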

Am I overfitting?

[Graph: training and validation loss curves, shown with less smoothing]
Hi! I am currently training my model with Darkflow YOLOv2. The optimiser is SGD with lr 0.001.
Based on this graph, my val loss > train loss, which would mean that it is overfitting? If so, what would be the recommended course of action? It seems weird because both losses are decreasing, but the val loss decreases more slowly.
For more info:
My train dataset consists of 400 images per class, with single annotations, for a total of 2800 images. I did this to prevent class imbalance, by annotating only one class instance per image. My val dataset consists of 350 images with multiple annotations; basically, I annotated every object within the images. I have 7 classes and my train-val-test split is 80-10-10. Is this the cause of the val loss?
Over-fitting is detected by a mismatch in which training accuracy diverges from test (validation) accuracy. Since you haven't provided accuracy data, we can't fully evaluate your model.
It might help to clarify stages and terms; this should let you answer the question for yourself in the future:
"Convergence" is the point in training at which we believe that the model
- has learned something useful;
- has reached this point via a reproducible process;
- isn't going to get significantly better;
- is about to get worse.
Convergence is where we want to stop training and save (checkpoint) the model for production use.
We detect convergence by use of training passes and testing (validation) passes.
At convergence, we expect:
- validation loss (error function, perplexity, etc.) is at a relative minimum;
- validation accuracy is at a relative maximum;
- validation and training metrics are "reasonably stable", with respect to the model's general behaviour;
- training accuracy and validation accuracy are essentially equal.
Once a training run passes this point, it often transitions into "over-fitting", in which the model learns things so specific to the training data that it is no longer as good at inferring about new observations. In this state,
- training loss drops; validation loss rises;
- training accuracy rises; validation accuracy drops.
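In practice, this is what early stopping with checkpointing automates: keep the weights from the validation-loss minimum and stop once the loss has failed to improve for a while. A small framework-agnostic sketch (the ConvergenceMonitor class and the simulated loss values are purely illustrative):

import copy

class ConvergenceMonitor:
    """Track validation loss and flag convergence / the onset of over-fitting."""
    def __init__(self, patience=5):
        self.patience = patience
        self.best_val_loss = float("inf")
        self.best_state = None
        self.epochs_without_improvement = 0

    def update(self, model_state, val_loss):
        if val_loss < self.best_val_loss:
            self.best_val_loss = val_loss
            self.best_state = copy.deepcopy(model_state)  # checkpoint at the minimum
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        # If validation loss hasn't improved for `patience` epochs,
        # treat the best checkpoint as the convergence point.
        return self.epochs_without_improvement >= self.patience

# Simulated losses: validation loss bottoms out around epoch 6, then rises (over-fitting).
val_losses = [1.0, 0.8, 0.6, 0.5, 0.45, 0.42, 0.41, 0.43, 0.46, 0.5, 0.55, 0.6]
monitor = ConvergenceMonitor(patience=3)
for epoch, val_loss in enumerate(val_losses):
    if monitor.update(model_state={"epoch": epoch}, val_loss=val_loss):
        print(f"Stop at epoch {epoch}; best checkpoint from epoch {monitor.best_state['epoch']}")
        break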

Python - Best techniques to split a dataset to get high performance accuracy

I have applied these 4 methods:
- Train and Test Sets
- K-fold Cross Validation
- Leave One Out Cross Validation
- Repeated Random Test-Train Splits
The "Train and Test Sets" method achieves high accuracy; the remaining methods achieve roughly the same accuracy as each other, but lower than the first approach.
I want to know which method I should choose.
A plain train/test split and cross-validation are each used in different situations. Cross-validation is used when you want to compare different models. Accuracy generally increases with more training data, which is why leave-one-out cross-validation sometimes performs better than k-fold cross-validation; it depends on your dataset size and sometimes on the algorithm you are using. A single train/test split, on the other hand, is usually used when you aren't comparing different models and the time required to run cross-validation isn't worth it. In most cases cross-validation is preferred, but which method you should choose usually depends on how you handle your data and on the algorithm you are training. For example, with Random Forests you usually don't need cross-validation at all, because you can rely on the out-of-bag estimate, although you can still run cross-validation if you need more.
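To illustrate that last point about Random Forests, a minimal scikit-learn sketch of the out-of-bag estimate (the iris dataset and the hyperparameters are just placeholders):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# oob_score=True evaluates each tree on the samples it did not see during
# bootstrapping, giving a built-in generalization estimate without a
# separate cross-validation loop.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=42)
forest.fit(X, y)
print("Out-of-bag accuracy:", forest.oob_score_)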
Training a model involves tuning model accuracy as well as model generalization. If a model does not generalize, it may be underfit or overfit.
In that case, the model may perform well on training data while accuracy drops on test or unknown data.
We use training data to improve the accuracy of the model; as the training data size increases, model accuracy may also increase.
Similarly, we use different training samples to generalize the model.
So the choice of train-test splitting method depends on the size of the available data and the algorithm used for the model.
The first method, a fixed train-test split, uses fixed-size training and testing data, so on each iteration we use the same training data to train the model and the same test data to assess its accuracy.
The second, the k-fold method, also has fixed-size train and test data, but on each iteration the test and train data change, so it may be a better approach irrespective of data size.
The leave-one-out approach is useful only if the data size is small. Here we use almost the whole dataset for training, so the training accuracy will be better, but the model may not generalize.
The randomized train-test method is also a good approach for training and assessing a model's performance. Here we randomly select the train and test data each time, so it may perform better than leave-one-out when the data size is small.
Finally, each splitting approach has pros and cons, so it is up to you which splitting method suits your model. It also depends on the data size and on data selection, that is, how we select data from the sample while splitting.
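For reference, here is a small scikit-learn sketch that runs all four splitting strategies on the same toy problem (logistic regression on the iris data is just a stand-in for your model and dataset):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    train_test_split, cross_val_score, KFold, LeaveOneOut, ShuffleSplit)

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 1. Single train/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
print("Train/test split:", model.fit(X_tr, y_tr).score(X_te, y_te))

# 2. K-fold cross-validation
kf = KFold(n_splits=10, shuffle=True, random_state=42)
print("K-fold:", cross_val_score(model, X, y, cv=kf).mean())

# 3. Leave-one-out cross-validation
print("Leave-one-out:", cross_val_score(model, X, y, cv=LeaveOneOut()).mean())

# 4. Repeated random test-train splits
ss = ShuffleSplit(n_splits=10, test_size=0.3, random_state=42)
print("Shuffle splits:", cross_val_score(model, X, y, cv=ss).mean())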

Linear Regression + Cross Validation model training with sklearn

I am new to Python and sklearn. I understand the basics of cross-validation: if I split the data into 3 folds, sklearn will train the model 3 times with different training and testing sets of data. I assume this produces 3 different models, i.e. different w^ and d^. Is this right? Should I just get 1 model back? If I use model.predict() to predict an input, which model am I using?
Cross-validation evaluates the model setup, not the model parameters.
That is, if I use a bad setup, like a linear regression with 20 parameters over 10 data points, cross-validation will report low scores because a model with this setup does not generalize, not because the models' parameters were wrong.
If, after cross-validation, you conclude the model generalizes well, all the trained models will be pretty similar. It is safe to use any of them, or to obtain the final model by training over the entire dev dataset.
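A minimal sketch of that workflow with scikit-learn (the synthetic data is a placeholder): cross_val_score fits a fresh clone of the model on each fold and only returns the scores, and the model whose .predict() you later call is the single one you fit yourself on the full dataset:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Toy data standing in for the real dataset
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

model = LinearRegression()

# One R^2 score per fold; the per-fold fitted models are discarded.
scores = cross_val_score(model, X, y, cv=3, scoring="r2")
print("per-fold R^2:", scores)

# If the scores are good and consistent, fit one final model on all the data;
# this is the model you actually use for prediction.
final_model = model.fit(X, y)
print(final_model.predict(X[:2]))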

How to train the final Neural Network model after cross validation?

This is a problem I am constantly facing but don't seem to find the answer to anywhere. I have a dataset of 700 samples. As a result, I have to use cross-validation, instead of a single validation set and a single test set, to get a close estimate of the error.
I would like to use a neural network for this. But after doing CV with a neural network and getting an error estimate, how do I train the NN on the whole dataset? For other algorithms like logistic regression or SVM there is no question of when to stop training, but for an NN you train until the validation score starts to go down. So, for the final model trained on the whole dataset, how do you know when to stop?
Just to make it clear: my problem is not how to choose hyper-parameters for the NN; I can do that with a nested CV. My question is how to train the final NN on the whole dataset (when to stop, more specifically) before applying it in the wild.
To rephrase your question:
"When training a neural network, a common stopping criterion is the 'early stopping criterion' which stops training when the validation loss increases (signaling overfitting). For small datasets, where training samples are precious, we would prefer to use some other criterion and use 100% of the data for training the model."
I think this is generally a hard problem, so I am not surprised you have not found a simple answer. I think you have a few options:
Add regularization (such as Dropout or Batch Normalization), which should help prevent overfitting, then use the training loss as the stopping criterion. You could check how this approach performs on a validation set, without early stopping, to ensure the model is not overfitting; a sketch of this idea follows the list below.
Be sure not to overprovision the model. Smaller models will have a more difficult time overfitting.
Take a look at the stopping criterion described in this paper which does not rely on a validation set: https://arxiv.org/pdf/1703.09580.pdf
Finally, you might not use neural networks here at all. Generally, these models work best with large amounts of training data; with 700 samples, you can possibly get better performance with another algorithm.
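A minimal sketch of the first option (Dropout as the regularizer, stopping once the training loss stops improving); the toy data, the architecture, and the tolerance-based stopping rule are illustrative assumptions, not from the answer or the linked paper:

import torch
import torch.nn as nn

# Toy stand-in for the full 700-sample dataset
X = torch.randn(700, 20)
y = torch.randint(0, 2, (700,)).float()

# A small model with Dropout as the regularizer
model = nn.Sequential(
    nn.Linear(20, 32), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(32, 1))
criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

prev_loss, tol = float("inf"), 1e-4
for epoch in range(500):
    optimizer.zero_grad()
    loss = criterion(model(X).squeeze(1), y)
    loss.backward()
    optimizer.step()
    # Stop when the *training* loss stops improving meaningfully, since all
    # the data is used for training and no validation set is held out.
    if prev_loss - loss.item() < tol:
        print(f"stopping at epoch {epoch}, training loss {loss.item():.4f}")
        break
    prev_loss = loss.item()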
