How to get the best model when using EarlyStopping callback in Keras?

How to get the best model when using EarlyStopping callback in Keras? - python

I am training a neural network with Keras using EarlyStopping based on val_acc and patience=0. EarlyStopping stops the training as soon as val_acc decreases.
However the final model that I obtain is not the best model, namely the one with the highest val_acc. But I rather have the model corresponding to the epoch after, namely the one corresponding to a val_acc just a bit lower than the best one and that caused the early stopping!
How do I get the best one?
I tried to use the save the best model using the call back:
ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=True)]
But I get the same results.

In Keras 2.2.3, a new argument called restore_best_weights have been introduced for EarlyStopping callback that if set to True (defaults to False), it would restore the weights from the epoch with the best monitored quantity:
restore_best_weights: whether to restore model weights from the epoch with the best value of the monitored quantity. If False, the
model weights obtained at the last step of training are used.

If you would like to save the highest accuracy then you should set the checkpoint monitor='val_acc' it will automatically save on highest. Lowest loss might not necessarily correspond to highest accuracy. You can also set verbose=1 to see which model is being saved and why.

Related

Tensorflow 2.0: how can I recover the model with best (lowest) validation loss after fitting? [duplicate]

I am training a neural network with Keras using EarlyStopping based on val_acc and patience=0. EarlyStopping stops the training as soon as val_acc decreases.
However the final model that I obtain is not the best model, namely the one with the highest val_acc. But I rather have the model corresponding to the epoch after, namely the one corresponding to a val_acc just a bit lower than the best one and that caused the early stopping!
How do I get the best one?
I tried to use the save the best model using the call back:
ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=True)]
But I get the same results.

In Keras 2.2.3, a new argument called restore_best_weights have been introduced for EarlyStopping callback that if set to True (defaults to False), it would restore the weights from the epoch with the best monitored quantity:
restore_best_weights: whether to restore model weights from the epoch with the best value of the monitored quantity. If False, the
model weights obtained at the last step of training are used.

If you would like to save the highest accuracy then you should set the checkpoint monitor='val_acc' it will automatically save on highest. Lowest loss might not necessarily correspond to highest accuracy. You can also set verbose=1 to see which model is being saved and why.

CNN model validation accuracy is not improving

I am currently working on a CNN model for classification, I have to predict words on a wav file. I encountered a problem with my validation accuracy that stays (almost) the same, first I was thinking of overfitting but that does not seem to be the problem. Below you can see a photo with the result at the different epochs:
I am building a CNN model with Keras and using the 'adam' optimizer and 'categorical_crossentropy' for the loss. I already have tried to increase the number of epochs until 1000 and changed the batch size.

Your training loss seems to be decreasing but val_loss is increasing while val_accuracy is approximately same. This is standard case of overfitting. Why do you think that's not the case?
Increasing the training epochs or batch size is not helpful as you're just changing the number of times the model sees the data or the quantity of data it sees in one epoch.
For current scenario, the best model is created till the point both val_loss and train_loss continues to decrease before it becomes saturated.
To address the problem, you need to add noise in the training data so that the model generalizes better, generalize the examples better, create balanced categories in terms of training data volume.
Secondly, you can increase your validation dataset to see if it continues to have the same issue. If it's there then the model is definitely overfitting. ALso please update your question about what kind of validation set and technique you're using. If possible, add the code snippet of your validation set and loss function

what actually model.save() saves in Keras?

I have a Keras model and i trained the model with 100 epochs.
now, i got 0.0085 loss at epoch 85 and at lat epoch i got 0.0092.
My question is,
what does model.save() in Keras saves?
Is it save the weights it got from lat epoch(i.e., 100)
Or is it saves the weights from best epoch (i,e., epoch 85)
Or average or mean weights from all 100 epochs?.
What actually keras model.save() is designed to save the weights after 100 epochs completion?.
Thanks for Explanation in Advance:).

The model.save() saves the whole architecture, weights and the optimizer state. This command saves the details needed to reconstitute your model.
The command will save:
The architecture of the model, allowing to re-create the model;
The weights of the model;
The training configuration (loss, optimizer);
the state of the optimizer, allowing to resume training exactly where you left off.
So you can reuse your model using keras.models.load_model(filepath) to reinstantiate your model. load_model will also take care of compiling the model using the saved training configuration.
See the example:
from keras.models import load_model
model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'
del model # deletes the existing model
# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
Source: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model

The model.save() will save many details about your NN. Most important details are
The architectures of the network including the dimensions (inputs/outputs layers, hidden layers ...etc).
The weights matrices for every hidden unit in each layer and the activation function.
and many other details that we may not need to outline here.
Coming back to the second part of your question, when we save the trained model, it will be saved the loss value after the last epoch. Which mean, the final value will be less or more from the previous epochs depending on the number of epochs you specified and how close you get from overfitting.
Also, the number of epochs is not saved and it doesn't make sense in most situations according to Francois Chollet the creator of Keras. see this conversation
This is true unless you activate the callback option that turns on the early stopping of the training of your network after a certain number of epochs (which you called the best iteration). see this

My question is, what does model.save() saves , "Is it save the weights
it got from lat epoch(i.e., 100)" OR "Is it saves the weights from
best epoch (i,e., epoch 85)" OR "Average or mean weights from all 100
epochs"?.
What all things are saved(weights, optimizer state etc.) are already mentioned in the other answers. In your case, the weights of the model at the end of 100 epochs are saved.
In case, you would like to save the best model(with the least loss), then you need to create a ModelCheckPoint callback object and pass it to the fit() method via the callbacks argument.
https://keras.io/callbacks/#ModelCheckpoint
https://keras.io/callbacks/#example-model-checkpoints

It saves weights
Yes
For saving weights for best epoch, use chunk of code i have given below
No
What actually keras model.save() is designed to save the weights after 100 epochs completion?. Yes it does, but have a look at following code for saving weights of only best epochs.
Use this chunk of code to:
Save weights of best epochs only
Update weights after every epoch only if given criteria is improved (val_loss is min)
Additionally, history after each epoch will be save in .csv file.
Code
import pandas as pd
from keras.callbacks import EarlyStopping, ModelCheckpoint
#Stop when val_loss is not decreasing
earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')
#Save the model after every epoch.
checkpointer = ModelCheckpoint(filepath='Model_1_weights.h5', verbose=1, save_best_only=True)
#history variable will save training progress after each epoch
history = model.fit(X_train, y_train, batch_size=20, epochs=40, validation_data=(X_valid, y_valid), shuffle=True, callbacks=[checkpointer, earlyStopping])
#Save progress of each epoch in .csv file
hist_df = pd.DataFrame(history.history)
hist_csv_file = 'History_Model_1.csv'
with open(hist_csv_file, mode='w') as f:
hist_df.to_csv(f)
Link: https://keras.io/callbacks/#ModelCheckpoint

should model.compile() be run prior to using model.load_weights(), if model has been only slightly changed say dropout?

With training & validation through a dataset for nearly 24 epochs, intermittently 8 epochs at once and saving weights cumulatively after each interval.
I observed a constant declining train & test-loss for first 16 epochs, post which the training loss continues to fall whereas test loss rises so i think it's the case of Overfitting.
For which i tried to resume training with weights saved after 16 epochs with change in hyperparameters say increasing dropout_rate a little.
Therefore i reran the dense & transition blocks with new dropout to get identical architecture with same sequence & learnable parameters count.
Now when i'm assigning previous weights to my new model(with new dropout) with model.load_weights() and compiling thereafter.
i see the training loss is even higher, that should be initially (blatantly with increased inactivity of random nodes during training) but later also it's performing quite unsatisfactory,
so i'm suspecting maybe compiling after loading pretrained weights might have ruined the performance?
what's reasoning & recommended sequence of model.load_weights() & model.compile()? i'd really appreciate any insights on above case.

The model.compile() method does not touch the weights in any way.
Its purpose is to create a symbolic function adding the loss and the optimizer to the model's existing function.
You can compile the model as many times as you want, whenever you want, and your weights will be kept intact.
Possible consequences of compile
If you got a model, well trained for some epochs, it's optimizer (depending on what type and parameters you chose for it) will also be trained for that specific epochs.
Compiling will make you lose the trained optimizer, and your first training batches might experience some bad results due to learning rates not suited to the current state of the model.
Other than that, compiling doesn't cause any harm.

Changing optimizer or lr after loading model yields strange results

I'm using the latest Keras with Tensorflow backend (Python 3.6)
I'm loading a model that had a training accuracy at around 86% when I last trained it.
The orginal optimizer that I used was :
r_optimizer = optimizer=Adam(lr=0.0001, decay = .02)
model.compile(optimizer= r_optimizer,
loss='categorical_crossentropy', metrics = ['accuracy'])
If I load the model and continue training without recompiling, my
accuracy would stay around 86% (even after 10 or so more epochs).
So I wanted to try changing the learning rate or optimizer.
If I recompile the model and try to change the learning rate or the
optimizer as follows:
new_optimizer = optimizer=Adam(lr=0.001, decay = .02)
or to this one:
sgd = optimizers.SGD(lr= .0001)
and then compile:
model.compile(optimizer= new_optimizer ,
loss='categorical_crossentropy', metrics = ['accuracy'])
model.fit ....
The accuracy would reset to around 15% - 20%, instead of starting around 86%,
and my loss would be much higher.
Even if I used a small learning rate, and recompiled, I would still start
off from a very low accuracy.
From browsing the internet it seems some optimizers like ADAM or RMSPROP have
a problem with resetting weights after recompiling (can't find the link at the moment)
So I did some digging and tried to reset my optimizer without recompiling as follows:
model = load_model(load_path)
sgd = optimizers.SGD(lr=1.0) # very high for testing
model.optimizer = sgd #change optimizer
#fit for training
history =model.fit_generator(
train_gen,
steps_per_epoch = r_steps_per_epoch,
epochs = r_epochs,
validation_data=valid_gen,
validation_steps= np.ceil(len(valid_gen.filenames)/r_batch_size),
callbacks = callbacks,
shuffle= True,
verbose = 1)
However, these changes don't seem to be reflected in my training.
Despite raising the lr significantly, I'm still floundering around 86% with the same loss. During each epoch, I'm seeing very little loss or accuracy movement. I would expect the loss to be a lot more volatile.
This leads me to believe that my change in optimizer and lr isn't being
realized by the model.
Any idea what I could be doing wrong?

I think your change does not assign new lr to optimizer, and I find a solution to reset lr values after loading model in Keras, hope it will help you.

This is a partial answer referring to what you wrote here:
From browsing the internet it seems some optimizers like ADAM or RMSPROP have a problem with resetting weights after recompiling (can't find the link at the moment)
Adaptive optimizers such as ADAM RMSPROP, ADAGRAD, ADADELTA, and any variation on these, rely on previous update steps to improve the direction and magnitude of any current adjustment to the weights of the model.
Because of this, the first few steps that they take tend to be relatively "bad" as they "calibrate themselves" with information from previous steps.
When used on a random initialization, this is not a problem, but when used on a pretrained model, these few first steps, can degrade the model so much, that almost all of the pretrained work gets lost.
Even worse, now the training doesn't start from a carefully chosen random initialization like a Xavier initialization, but from some sub-optimal starting point, which could potentially prevent the model from converging to the local optimum that it would have reached if it started from a good random initialization.
Unfortunately I'm not sure how you can avoid this... Perhaps pretrain with one optimizer --> save weights --> replace optimizer --> restore weights --> train for a few epochs and hope the new adaptive optimizer learns a "useful history" --> than restore the weights agin from the saved weights of the pretrained model and without recompiling start training again, now with a better optimizer "history".
Please let us know if this works.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to get the best model when using EarlyStopping callback in Keras? - python

If you would like to save the highest accuracy then you should set the checkpoint monitor='val_acc' it will automatically save on highest. Lowest loss might not necessarily correspond to highest accuracy. You can also set verbose=1 to see which model is being saved and why.

Related

Tensorflow 2.0: how can I recover the model with best (lowest) validation loss after fitting? [duplicate]

CNN model validation accuracy is not improving

what actually model.save() saves in Keras?

should model.compile() be run prior to using model.load_weights(), if model has been only slightly changed say dropout?

Changing optimizer or lr after loading model yields strange results

Categories

Resources