What does model.save() actually save in Keras? - python

I have a Keras model that I trained for 100 epochs.
I got a loss of 0.0085 at epoch 85 and 0.0092 at the last epoch.
My question is: what does model.save() in Keras save?
Does it save the weights from the last epoch (i.e., epoch 100),
or the weights from the best epoch (i.e., epoch 85),
or the average/mean of the weights over all 100 epochs?
Is keras model.save() designed to save the weights obtained after the 100 epochs have completed?
Thanks for the explanation in advance :).

model.save() saves the whole model: the architecture, the weights, and the optimizer state. In other words, it saves everything needed to reconstitute your model.
The command will save:
The architecture of the model, allowing you to re-create the model;
The weights of the model;
The training configuration (loss, optimizer);
The state of the optimizer, allowing you to resume training exactly where you left off.
So you can reinstantiate your model with keras.models.load_model(filepath). load_model will also take care of compiling the model using the saved training configuration.
See the example:
from keras.models import load_model
model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'
del model # deletes the existing model
# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
Source: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
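Because the optimizer state is saved as well, you can continue training after reloading. Below is a minimal, self-contained sketch of that round trip; the toy model and random data are only illustrative and not part of the original answer:
import numpy as np
from keras.models import Sequential, load_model
from keras.layers import Dense
# Toy data and model, just so the sketch runs end to end.
X, y = np.random.rand(100, 8), np.random.rand(100, 1)
model = Sequential([Dense(16, activation='relu', input_shape=(8,)), Dense(1)])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=5, verbose=0)
model.save('my_model.h5')          # architecture + weights + optimizer state
del model
model = load_model('my_model.h5')  # returns a compiled model
model.fit(X, y, epochs=5, verbose=0)  # training resumes with the saved optimizer state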

model.save() saves many details about your network. The most important ones are:
The architecture of the network, including its dimensions (input/output layers, hidden layers, etc.).
The weight matrices for every layer, along with each layer's activation function.
and many other details that we do not need to outline here.
Coming back to the second part of your question: when you save the trained model, the saved weights are those obtained after the last epoch. That means the final loss value can be higher or lower than in previous epochs, depending on how many epochs you specified and how close you got to overfitting.
Also, the number of epochs is not saved, and according to Francois Chollet, the creator of Keras, saving it doesn't make sense in most situations. See this conversation.
This is true unless you activate a callback that stops training early once a monitored quantity stops improving (which is what you called the best iteration). See this.

My question is, what does model.save() save? "Does it save the weights from the last epoch (i.e., 100)" OR "does it save the weights from the best epoch (i.e., epoch 85)" OR "the average/mean of the weights from all 100 epochs"?
What exactly gets saved (weights, optimizer state, etc.) is already covered in the other answers. In your case, the weights of the model at the end of 100 epochs are saved.
If you would like to save the best model (the one with the lowest loss), you need to create a ModelCheckpoint callback object and pass it to the fit() method via the callbacks argument, as sketched after the links below.
https://keras.io/callbacks/#ModelCheckpoint
https://keras.io/callbacks/#example-model-checkpoints
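A minimal sketch of that pattern (model, X_train, y_train, X_valid and y_valid stand in for your own model and data; the filename and monitored metric are illustrative choices):
from keras.callbacks import ModelCheckpoint
# Write the model to disk only when the monitored quantity improves,
# so 'best_model.h5' always holds the best epoch seen so far.
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True, verbose=1)
model.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=100, callbacks=[checkpoint])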

It saves the weights from the last epoch, not the best one:
Does it save the weights it got from the last epoch (i.e., 100)? Yes.
Does it save the weights from the best epoch (i.e., 85)? No; for saving the weights of the best epoch, use the chunk of code I have given below.
Does it save the average or mean of the weights from all 100 epochs? No.
Is keras model.save() designed to save the weights after the 100 epochs have completed? Yes it does, but have a look at the following code for saving the weights of only the best epochs.
Use this chunk of code to:
Save the weights of the best epochs only
Update the saved weights after an epoch only if the given criterion improves (val_loss reaches a new minimum)
Additionally, the history of each epoch will be saved in a .csv file.
Code
import pandas as pd
from keras.callbacks import EarlyStopping, ModelCheckpoint
# Stop training when val_loss has not improved for 10 consecutive epochs
earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')
# Save the model only when val_loss improves, i.e. keep the best epoch so far
checkpointer = ModelCheckpoint(filepath='Model_1_weights.h5', verbose=1, save_best_only=True)
# The history object records the training progress of each epoch
history = model.fit(X_train, y_train, batch_size=20, epochs=40, validation_data=(X_valid, y_valid), shuffle=True, callbacks=[checkpointer, earlyStopping])
# Save the progress of each epoch to a .csv file
hist_df = pd.DataFrame(history.history)
hist_csv_file = 'History_Model_1.csv'
with open(hist_csv_file, mode='w') as f:
    hist_df.to_csv(f)
Link: https://keras.io/callbacks/#ModelCheckpoint

Related

Tensorflow 2.0: how can I recover the model with best (lowest) validation loss after fitting? [duplicate]


Resume training with Adam optimizer in Keras

My question is quite straightforward but I can't find a definite answer online (so far).
I have saved the weights of a keras model trained with an adam optimizer after a defined number of epochs of training using:
callback = tf.keras.callbacks.ModelCheckpoint(filepath=path, save_weights_only=True)
model.fit(X,y,callbacks=[callback])
When I resume training after closing my Jupyter session, can I simply use:
model.load_weights(path)
to continue training?
Since Adam is dependent on the epoch number (such as in the case of learning rate decay), I would like to know the easiest way to resume training in the same conditions as before.
Following ibarrond's answer, I have written a small custom callback.
import pickle
import tensorflow as tf

optim = tf.keras.optimizers.Adam()
model.compile(optimizer=optim, loss='categorical_crossentropy', metrics=['accuracy'])
weight_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, save_weights_only=True, verbose=1, save_best_only=False)

class optim_callback(tf.keras.callbacks.Callback):
    '''Custom callback to save optimiser state'''
    def on_epoch_end(self, epoch, logs=None):
        optim_state = tf.keras.optimizers.Adam.get_config(optim)
        with open(optim_state_pkl, 'wb') as f_out:
            pickle.dump(optim_state, f_out)

model.fit(X, y, callbacks=[weight_callback, optim_callback()])
When I resume training:
model.load_weights(checkpoint_path)
with open(optim_state_pkl, 'rb') as f_in:
    optim_state = pickle.load(f_in)
tf.keras.optimizers.Adam.from_config(optim_state)
I would just like to check if this is correct. Many thanks again!!
Addendum: On further reading of the default Keras implementation of Adam and the original Adam paper, I believe that the default Adam is not dependent on epoch number but only on the iteration number. Therefore, this is unnecessary. However, the code may still be useful for anyone who wishes to keep track of other optimisers.
In order to perfectly capture the status of your optimizer, you should store its configuration using the function get_config(). This function returns a dictionary (containing the options) that can be serialized and stored in a file using pickle.
To restart the process, just load the dictionary back with pickle, e.g. d = pickle.load(open('my_saved_tfconf.txt', 'rb')), to retrieve the dictionary with the configuration, and then generate your Adam optimizer using the from_config(d) function of the Keras Adam optimizer.
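For reference, a minimal sketch of that get_config()/from_config() round trip (the file name is only an example):
import pickle
import tensorflow as tf
optim = tf.keras.optimizers.Adam(learning_rate=1e-3)
# Save the optimizer configuration (hyperparameters such as learning rate, beta_1, beta_2, ...).
with open('adam_config.pkl', 'wb') as f:
    pickle.dump(optim.get_config(), f)
# Later: rebuild an optimizer with the same configuration.
with open('adam_config.pkl', 'rb') as f:
    restored_optim = tf.keras.optimizers.Adam.from_config(pickle.load(f))
Note that get_config() captures only the optimizer's hyperparameters, not its internal slot variables (the per-weight moment estimates); to preserve those you would save the whole model with model.save(), as described in the first answer above.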

How to monitor validation loss in the training of estimators in TensorFlow?

I want to ask a question about how to monitor validation loss in the training process of estimators in TensorFlow. I have checked a similar question (validation during training of Estimator) asked before, but it did not help much.
If I use estimators to build a model, I pass an input function to the Estimator.train() function. But there is no way to add separate validation_x and validation_y data to the training process. Therefore, once training starts, I can only see the training loss. The training loss is expected to decrease the longer training runs, but this information is not helpful for preventing overfitting. The more valuable information is the validation loss, which usually follows a U-shape as the number of epochs grows. To prevent overfitting, we want to find the number of epochs at which the validation loss is at its minimum.
So this is my problem: how can I get the validation loss for each epoch when training with estimators?
You need to create a validation input_fn and either use estimator.train() and estimator.evaluate() alternately, or simply use tf.estimator.train_and_evaluate()
x_train, y_train = ...
x_val, y_val = ...
...
# For example, if the arrays are numpy arrays < 2 GB.
# input_fn must be a callable that returns a tf.data.Dataset (or tensors),
# so wrap the dataset construction in small functions
# (in practice you would also .batch() the dataset here).
def train_input_fn():
    return tf.data.Dataset.from_tensor_slices((x_train, y_train))
def val_input_fn():
    return tf.data.Dataset.from_tensor_slices((x_val, y_val))
...
estimator = ...
for epoch in range(n_epochs):
    estimator.train(input_fn=train_input_fn)
    estimator.evaluate(input_fn=val_input_fn)
estimator.evaluate() will compute the loss and any other metrics that are defined in your model_fn and will save the events in a new "eval" directory inside your job_dir.
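Alternatively, a minimal sketch of the tf.estimator.train_and_evaluate() route, reusing the input functions and estimator from the snippet above (max_steps and throttle_secs are only illustrative values):
# TrainSpec/EvalSpec wrap the input functions; evaluation runs periodically
# during training and its loss/metrics end up in the "eval" directory.
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, max_steps=10000)
eval_spec = tf.estimator.EvalSpec(input_fn=val_input_fn, throttle_secs=60)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)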

How to get the best model when using EarlyStopping callback in Keras?

I am training a neural network with Keras, using EarlyStopping based on val_acc with patience=0. EarlyStopping stops the training as soon as val_acc decreases.
However, the final model I obtain is not the best model, i.e. the one with the highest val_acc. Instead, I have the model corresponding to the epoch after: the one with a val_acc just a bit lower than the best, which triggered the early stopping!
How do I get the best one?
I tried to save the best model using the callback:
ModelCheckpoint(filepath='best_model.h5', monitor='val_loss', save_best_only=True)
But I get the same results.
In Keras 2.2.3, a new argument called restore_best_weights has been introduced for the EarlyStopping callback; if set to True (it defaults to False), it restores the weights from the epoch with the best monitored quantity:
restore_best_weights: whether to restore model weights from the epoch with the best value of the monitored quantity. If False, the model weights obtained at the last step of training are used.
If you would like to save the model with the highest accuracy, set the checkpoint's monitor='val_acc' and it will automatically save on the highest value. The lowest loss does not necessarily correspond to the highest accuracy. You can also set verbose=1 to see which model is being saved and why.
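A minimal sketch combining both suggestions (model and the data arrays are placeholders for your own; the monitor and patience values are illustrative):
from keras.callbacks import EarlyStopping, ModelCheckpoint
callbacks = [
    # Stop as soon as val_acc stops improving and restore the best epoch's
    # weights afterwards (requires Keras >= 2.2.3).
    EarlyStopping(monitor='val_acc', patience=0, restore_best_weights=True, verbose=1),
    # Also keep the best model on disk.
    ModelCheckpoint('best_model.h5', monitor='val_acc', save_best_only=True, verbose=1),
]
model.fit(X_train, y_train, validation_data=(X_valid, y_valid), epochs=100, callbacks=callbacks)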

Resume training with multi_gpu_model in Keras

I'm training a modified InceptionV3 model with the multi_gpu_model in Keras, and I use model.save to save the whole model.
Then I closed and restarted the IDE and used load_model to reinstantiate the model.
The problem is that I am not able to resume the training exactly where I left off.
Here is the code:
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)
model.save('my_model.h5')
Before the IDE closed, the loss is around 0.8.
After restarting the IDE, reloading the model and re-running the above code, the loss became 1.5.
But, according to the Keras FAQ, model.save should save the whole model (architecture + weights + optimizer state), and load_model should return a compiled model that is identical to the previous one.
So I don't understand why the loss becomes larger after resuming the training.
EDIT: If I don't use the multi_gpu_model and just use the ordinary model, I'm able to resume exactly where I left off.
When you call multi_gpu_model(...), Keras automatically sets the weights of your model to some default values (at least in the version 2.2.0 which I am currently using). That's why you were not able to resume the training at the same point as it was when you saved it.
I just solved the issue by replacing the weights of the parallel model with the weights from the sequential model:
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.layers[-2].set_weights(model.get_weights()) # you can check the index of the sequential model with parallel_model.summary()
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)
I hope this will help you.
@saul19am When you compile it, you can only load the weights and the model structure, but you still lose the optimizer state. I think this can help.
