Resume training with Adam optimizer in Keras - python

My question is quite straightforward, but I can't find a definite answer online (so far).
I have saved the weights of a Keras model trained with the Adam optimizer after a defined number of training epochs using:
callback = tf.keras.callbacks.ModelCheckpoint(filepath=path, save_weights_only=True)
model.fit(X,y,callbacks=[callback])
When I resume training after closing my Jupyter notebook, can I simply use:
model.load_weights(path)
to continue training?
Since Adam is dependent on the epoch number (such as in the case of learning rate decay), I would like to know the easiest way to resume training in the same conditions as before.
Following ibarrond's answer, I have written a small custom callback.
import pickle

optim = tf.keras.optimizers.Adam()
model.compile(optimizer=optim, loss='categorical_crossentropy', metrics=['accuracy'])
weight_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, save_weights_only=True, verbose=1, save_best_only=False)

class optim_callback(tf.keras.callbacks.Callback):
    '''Custom callback to save the optimiser state.'''
    def on_epoch_end(self, epoch, logs=None):
        optim_state = optim.get_config()
        with open(optim_state_pkl, 'wb') as f_out:
            pickle.dump(optim_state, f_out)

model.fit(X, y, callbacks=[weight_callback, optim_callback()])
When I resume training:
model.load_weights(checkpoint_path)
with open(optim_state_pkl, 'rb') as f_in:
    optim_state = pickle.load(f_in)
optim = tf.keras.optimizers.Adam.from_config(optim_state)
I would just like to check if this is correct. Many thanks again!!
Addendum: On further reading of the default Keras implementation of Adam and the original Adam paper, I believe that the default Adam is not dependent on epoch number but only on the iteration number. Therefore, this is unnecessary. However, the code may still be useful for anyone who wishes to keep track of other optimisers.
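For anyone who does want to capture the full state rather than just the hyperparameters, here is a minimal sketch, assuming a TF 2.x tf.keras optimizer that still exposes get_weights()/set_weights() (true for the pre-2.11 optimizers). Note that get_config() stores only hyperparameters; the iteration count and moment estimates live in the optimizer's weights.
import pickle
import tensorflow as tf

def save_optimizer_state(optim, path):
    # get_config() holds hyperparameters only; get_weights() holds the
    # iteration count and the per-variable moment estimates (m, v).
    state = {'config': optim.get_config(), 'weights': optim.get_weights()}
    with open(path, 'wb') as f_out:
        pickle.dump(state, f_out)

def load_optimizer_state(path):
    with open(path, 'rb') as f_in:
        state = pickle.load(f_in)
    optim = tf.keras.optimizers.Adam.from_config(state['config'])
    # set_weights() only works once the optimizer has created its slot
    # variables, e.g. after compiling and running a single training step.
    return optim, state['weights']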

In order to perfectly capture the status of your optimizer, you should store its configuration using the function get_config(). This function returns a dictionary (containing the options) that can be serialized and stored in a file using pickle.
To restart the process, load the dictionary back with pickle (note that pickle.load takes an open file object, not a filename) and then generate your Adam optimizer using the from_config(d) class method of the Keras Adam optimizer.
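A minimal sketch of the full round trip (assuming optim is the Adam instance your model was compiled with):
import pickle
from tensorflow.keras.optimizers import Adam

# Save the optimizer configuration (hyperparameters only)
with open('my_saved_tfconf.txt', 'wb') as f_out:
    pickle.dump(optim.get_config(), f_out)

# ... later, restore it and rebuild the optimizer
with open('my_saved_tfconf.txt', 'rb') as f_in:
    d = pickle.load(f_in)
optim = Adam.from_config(d)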

Related

Tensorflow 2.x: What exactly does the parameter include_optimizer affect in tensorflow.keras.save_model

I have been browsing the documentation for the tensorflow.keras.save_model() API and I came across the parameter include_optimizer and I am wondering what would be the advantage of not including the optimizer, or perhaps what problems could arise if the optimizer isn't saved with the model?
To give more context for my specific use case, I want to save a model and then use the generated .pb file with Tensorflow Serving. Is there any reason I would need to save the optimizer state? Would not saving it reduce the overall size of the resultant file? If I don't save it, is it possible that the model will not work correctly in TF Serving?
Saving the optimizer state will require more space, as the optimizer has parameters that are adjusted during training. For some optimizers, this space can be significant, as several meta-parameters are saved for each tuned model parameter.
Saving the optimizer parameters allows you to restart training in exactly the same state as you saved the checkpoint, whereas without saving the optimizer state, even the same model parameters might result in a variety of training outcomes with different optimizer parameters.
Thus, if you plan on continuing to train your model from the saved checkpoint, you'd probably want to save the optimizer's state as well. However, if you're instead saving the model state for future use only for inference, you don't need the optimizer state for anything. Based on your description of wanting to deploy the model on TF Serving, it sounds like you'll only be doing inference with the saved model, so are safe to exclude the optimizer.
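As a sketch, assuming model is your trained tf.keras model, an inference-only export for TF Serving could drop the optimizer like this:
import tensorflow as tf

# Smaller artifact for serving: no optimizer slot variables are written.
tf.keras.models.save_model(model, 'export/my_model', include_optimizer=False)

# For a checkpoint you plan to resume training from, keep the default:
# tf.keras.models.save_model(model, 'checkpoints/my_model')  # include_optimizer=True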

pytorch predictions stability

This is my predict function. Is there anything wrong with it? The predictions are not stable; every time I run it on the same data, I get different predictions.
import numpy as np
import torch

def predict(model, device, inputs, batch_size=1024):
    model = model.to(device)
    dataset = torch.utils.data.TensorDataset(*inputs)
    loader = torch.utils.data.DataLoader(
        dataset,
        batch_size=batch_size,
        pin_memory=False
    )
    predictions = []
    for i, batch in enumerate(loader):
        with torch.no_grad():
            pred = model(*(item.to(device) for item in batch))
            pred = pred.detach().cpu().numpy()
            predictions.append(pred)
    return np.concatenate(predictions)
As Usman Ali suggested, you need to set your model to eval mode by calling
model.eval()
before your prediction function.
What eval mode does:
Sets the module in evaluation mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
When you finish your prediction and wish to continue training, don't forget to reset your model to training mode by calling
model.train()
There are several layers in models that may introduce randomness into the forward pass of the net. One such example is the dropout layers. A dropout layer "drops" p percent of its neurons at random to improve the model's generalization.
Additionally, BatchNorm (and possibly other adaptive normalization layers) keeps track of the statistics of the data and therefore has a different "behavior" in train mode or in eval mode.
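Putting it together, a sketch of the adjusted predict function:
import numpy as np
import torch

def predict(model, device, inputs, batch_size=1024):
    model = model.to(device)
    model.eval()  # disable dropout, use running BatchNorm statistics
    dataset = torch.utils.data.TensorDataset(*inputs)
    loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size)
    predictions = []
    with torch.no_grad():
        for batch in loader:
            pred = model(*(item.to(device) for item in batch))
            predictions.append(pred.cpu().numpy())
    model.train()  # restore training mode if training continues afterwards
    return np.concatenate(predictions)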
You have defined the function, but you haven't trained the model. An untrained model produces essentially random predictions, which is why yours are inconsistent. If you set up an optimizer with a loss function and run over multiple epochs, the predictions will stabilize. This link may help: https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html. Look at sections 3 and 4.

what actually model.save() saves in Keras?

I have a Keras model and I trained it for 100 epochs.
I got a loss of 0.0085 at epoch 85 and 0.0092 at the last epoch.
My question is:
What does model.save() in Keras save?
Does it save the weights it got from the last epoch (i.e., 100)?
Or does it save the weights from the best epoch (i.e., epoch 85)?
Or the average or mean of the weights from all 100 epochs?
Is keras model.save() actually designed to save the weights after the completion of all 100 epochs?
Thanks in advance for the explanation :).
The model.save() saves the whole architecture, weights and the optimizer state. This command saves the details needed to reconstitute your model.
The command will save:
The architecture of the model, allowing you to re-create the model;
The weights of the model;
The training configuration (loss, optimizer);
The state of the optimizer, allowing you to resume training exactly where you left off.
So you can reuse your model using keras.models.load_model(filepath) to reinstantiate your model. load_model will also take care of compiling the model using the saved training configuration.
See the example:
from keras.models import load_model
model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'
del model # deletes the existing model
# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
Source: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
The model.save() will save many details about your NN. The most important details are:
The architecture of the network, including the dimensions (input/output layers, hidden layers, etc.).
The weight matrices for every hidden unit in each layer, and the activation functions.
And many other details that we may not need to outline here.
Coming back to the second part of your question: when you save the trained model, the weights as they stand after the last epoch are saved. This means the final loss value may be higher or lower than at earlier epochs, depending on the number of epochs you specified and how close you got to overfitting.
Also, the number of epochs is not saved, and saving it doesn't make sense in most situations, according to Francois Chollet, the creator of Keras. See this conversation.
This is true unless you activate the callback option that turns on early stopping of the training of your network after a certain number of epochs (which you called the best iteration). See this.
My question is, what does model.save() save? Is it the weights it got from the last epoch (i.e., 100), or the weights from the best epoch (i.e., epoch 85), or the average or mean of the weights from all 100 epochs?
What is saved (weights, optimizer state, etc.) is already covered in the other answers. In your case, the weights of the model at the end of 100 epochs are saved.
In case you would like to save the best model (the one with the least loss), then you need to create a ModelCheckpoint callback object and pass it to the fit() method via the callbacks argument.
https://keras.io/callbacks/#ModelCheckpoint
https://keras.io/callbacks/#example-model-checkpoints
It saves the weights. Taking your sub-questions in order:
The weights it got from the last epoch (i.e., 100)? Yes.
The weights from the best epoch (i.e., epoch 85)? No; for saving the weights of the best epoch, use the chunk of code I have given below.
The average or mean of the weights from all 100 epochs? No.
Is keras model.save() designed to save the weights after the completion of all 100 epochs? Yes, it does. But have a look at the following code for saving the weights of only the best epoch.
Use this chunk of code to:
Save the weights of the best epoch only, i.e., update the saved weights after an epoch only if the given criterion improves (val_loss is at its minimum).
Additionally, the history after each epoch will be saved in a .csv file.
Code
import pandas as pd
from keras.callbacks import EarlyStopping, ModelCheckpoint
#Stop when val_loss is not decreasing
earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')
#Save the model only when val_loss improves (save_best_only=True)
checkpointer = ModelCheckpoint(filepath='Model_1_weights.h5', verbose=1, save_best_only=True)
#history variable will save training progress after each epoch
history = model.fit(X_train, y_train, batch_size=20, epochs=40, validation_data=(X_valid, y_valid), shuffle=True, callbacks=[checkpointer, earlyStopping])
#Save progress of each epoch in .csv file
hist_df = pd.DataFrame(history.history)
hist_csv_file = 'History_Model_1.csv'
with open(hist_csv_file, mode='w') as f:
hist_df.to_csv(f)
Link: https://keras.io/callbacks/#ModelCheckpoint

Changing optimizer or lr after loading model yields strange results

I'm using the latest Keras with Tensorflow backend (Python 3.6)
I'm loading a model that had a training accuracy at around 86% when I last trained it.
The original optimizer that I used was:
r_optimizer = Adam(lr=0.0001, decay=.02)
model.compile(optimizer= r_optimizer,
loss='categorical_crossentropy', metrics = ['accuracy'])
If I load the model and continue training without recompiling, my
accuracy would stay around 86% (even after 10 or so more epochs).
So I wanted to try changing the learning rate or optimizer.
If I recompile the model and try to change the learning rate or the
optimizer as follows:
new_optimizer = Adam(lr=0.001, decay=.02)
or to this one:
sgd = optimizers.SGD(lr= .0001)
and then compile:
model.compile(optimizer= new_optimizer ,
loss='categorical_crossentropy', metrics = ['accuracy'])
model.fit ....
The accuracy would reset to around 15% - 20%, instead of starting around 86%,
and my loss would be much higher.
Even if I used a small learning rate, and recompiled, I would still start
off from a very low accuracy.
From browsing the internet it seems some optimizers like ADAM or RMSPROP have
a problem with resetting weights after recompiling (can't find the link at the moment)
So I did some digging and tried to reset my optimizer without recompiling as follows:
model = load_model(load_path)
sgd = optimizers.SGD(lr=1.0) # very high for testing
model.optimizer = sgd #change optimizer
#fit for training
history =model.fit_generator(
train_gen,
steps_per_epoch = r_steps_per_epoch,
epochs = r_epochs,
validation_data=valid_gen,
validation_steps= np.ceil(len(valid_gen.filenames)/r_batch_size),
callbacks = callbacks,
shuffle= True,
verbose = 1)
However, these changes don't seem to be reflected in my training.
Despite raising the lr significantly, I'm still floundering around 86% with the same loss. During each epoch, I'm seeing very little loss or accuracy movement. I would expect the loss to be a lot more volatile.
This leads me to believe that my change in optimizer and lr isn't being
realized by the model.
Any idea what I could be doing wrong?
I think your change does not assign the new lr to the optimizer. I found a solution to reset the lr value after loading a model in Keras; I hope it helps you.
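In short, a sketch of that approach (load_path and the new rate 0.001 are illustrative):
from keras import backend as K
from keras.models import load_model

model = load_model(load_path)
# Assign the new learning rate to the existing optimizer's variable
# instead of swapping the optimizer object out:
K.set_value(model.optimizer.lr, 0.001)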
This is a partial answer referring to what you wrote here:
From browsing the internet it seems some optimizers like ADAM or RMSPROP have a problem with resetting weights after recompiling (can't find the link at the moment)
Adaptive optimizers such as ADAM, RMSPROP, ADAGRAD, ADADELTA, and any variation on these, rely on previous update steps to improve the direction and magnitude of any current adjustment to the weights of the model.
Because of this, the first few steps that they take tend to be relatively "bad" as they "calibrate themselves" with information from previous steps.
When used on a random initialization, this is not a problem, but when used on a pretrained model, these few first steps, can degrade the model so much, that almost all of the pretrained work gets lost.
Even worse, the training now doesn't start from a carefully chosen random initialization like Xavier initialization, but from some sub-optimal starting point, which could potentially prevent the model from converging to the local optimum it would have reached had it started from a good random initialization.
Unfortunately I'm not sure how you can avoid this... Perhaps: pretrain with one optimizer --> save the weights --> replace the optimizer --> restore the weights --> train for a few epochs and hope the new adaptive optimizer learns a "useful history" --> then restore the weights again from the saved weights of the pretrained model and, without recompiling, start training again, now with a better optimizer "history".
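A minimal sketch of that workflow (X/y, the warm-up epoch count, and load_path/r_epochs are placeholders):
from keras.models import load_model
from keras.optimizers import Adam

model = load_model(load_path)            # pretrained weights + old optimizer
saved_weights = model.get_weights()      # keep a copy of the good weights

model.compile(optimizer=Adam(lr=0.001),  # new optimizer, freshly compiled
              loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=3)                # let the optimizer build a "history"

model.set_weights(saved_weights)         # restore the pretrained weights
model.fit(X, y, epochs=r_epochs)         # resume with a warmed-up optimizer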
Please let us know if this works.

Resume training with multi_gpu_model in Keras

I'm training a modified InceptionV3 model with the multi_gpu_model in Keras, and I use model.save to save the whole model.
Then I closed and restarted the IDE and used load_model to reinstantiate the model.
The problem is that I am not able to resume the training exactly where I left off.
Here is the code:
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)
model.save('my_model.h5')
Before the IDE closed, the loss is around 0.8.
After restarting the IDE, reloading the model and re-running the above code, the loss became 1.5.
But, according to the Keras FAQ, model.save should save the whole model (architecture + weights + optimizer state), and load_model should return a compiled model that is identical to the previous one.
So I don't understand why the loss becomes larger after resuming the training.
EDIT: If I don't use the multi_gpu_model and just use the ordinary model, I'm able to resume exactly where I left off.
When you call multi_gpu_model(...), Keras automatically sets the weights of your model to some default values (at least in version 2.2.0, which I am currently using). That's why you were not able to resume the training at the same point as when you saved it.
I just solved the issue by replacing the weights of the parallel model with the weights from the sequential model:
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.layers[-2].set_weights(model.get_weights()) # you can check the index of the sequential model with parallel_model.summary()
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path), steps_per_epoch = num_images/batch_size, epochs = num_epochs)
I hope this will help you.
@saul19am When you compile it, you can only load the weights and the model structure, but you still lose the optimizer state. I think this can help.
