I'm training a modified InceptionV3 model with the multi_gpu_model in Keras, and I use model.save to save the whole model.
Then I closed and restarted the IDE and used load_model to reinstantiate the model.
The problem is that I am not able to resume the training exactly where I left off.
Here is the code:
from keras.utils import multi_gpu_model

parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path),
                                       steps_per_epoch=num_images / batch_size,
                                       epochs=num_epochs)
model.save('my_model.h5')
Before the IDE closed, the loss was around 0.8.
After restarting the IDE, reloading the model and re-running the above code, the loss became 1.5.
But, according to the Keras FAQ, model.save should save the whole model (architecture + weights + optimizer state), and load_model should return a compiled model that is identical to the previous one.
So I don't understand why the loss becomes larger after resuming the training.
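The resume step looks roughly like this (a sketch reusing the names from the snippet above):

from keras.models import load_model
from keras.utils import multi_gpu_model

model = load_model('my_model.h5')  # reload the saved model
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path),
                                       steps_per_epoch=num_images / batch_size,
                                       epochs=num_epochs)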
EDIT: If I don't use the multi_gpu_model and just use the ordinary model, I'm able to resume exactly where I left off.
When you call multi_gpu_model(...), Keras automatically sets the weights of your model to some default values (at least in version 2.2.0, which I am currently using). That's why you were not able to resume training at the point where you saved it.
I just solved the issue by replacing the weights of the parallel model with the weights from the original single-GPU model:
parallel_model = multi_gpu_model(model, gpus=2)
parallel_model.layers[-2].set_weights(model.get_weights())  # you can check the index of the original model with parallel_model.summary()
parallel_model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
history = parallel_model.fit_generator(generate_batches(path),
                                       steps_per_epoch=num_images / batch_size,
                                       epochs=num_epochs)
I hope this will help you.
@saul19am When you compile it, you only load the weights and the model structure, but you still lose the optimizer state. I think this can help.
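To illustrate the difference the comment above describes, here is a minimal sketch (not from the original post; build_model is a hypothetical helper that recreates the architecture):

from keras.models import load_model

# Full restore: architecture, weights, AND optimizer state come back,
# so training can resume where it left off.
model = load_model('my_model.h5')

# Weights-only restore: you rebuild and compile the model yourself, so the
# optimizer starts from scratch (its running averages are not restored).
model = build_model()  # hypothetical helper that recreates the architecture
model.compile(optimizer='rmsprop', loss='categorical_crossentropy')
model.load_weights('my_model.h5')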
Related
I tried load_model and then model.fit to load an existing model and train it on some additional data. It seems to work: the epochs run without any issue. But after saving the newly trained model, it looks like the old model: exactly the same file size, same data. What am I doing wrong?
from keras.models import load_model
model = load_model('/content/drive/MyDrive/Trained_database/diu_project.h5')
model.fit(x=X_train, y=y_train, epochs=30, batch_size=5, shuffle=False, validation_split=0.2)
model.save('/content/drive/MyDrive/Trained_database/diu_project_3.h5')
Did you check if the layers of the model you are loading are trainable?
model.summary()
Is the number of trainable parameters what you would expect? You can make all the layers trainable with the following code:
model.trainable = True
Or train only from a certain layer (let's say number 100). This is particularly useful for transfer learning between close domains.
# Fine-tune from this layer onwards
fine_tune_at = 100

# Freeze all the layers before the `fine_tune_at` layer
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False
The examples are from:
https://www.tensorflow.org/tutorials/images/transfer_learning
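One detail worth keeping in mind (a general Keras behaviour): changes to trainable only take effect after the model is compiled again, so after freezing or unfreezing layers you would recompile and re-check the summary. A minimal sketch, with an illustrative optimizer and loss:

# Recompile so the new `trainable` flags take effect
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()  # the trainable-parameter count should now reflect the frozen layers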
I trained my Keras (version 2.3.1) Sequential models for a regression problem and achieved very good results. Right after training, I make predictions on the test set and then save the model as well as the weights in separate files.
To check for the speed of the models, I recently loaded them and made predictions on a single test input array but the results are way off, which should mean that the weights at the end of the training are different from the ones being loaded.
I tried making predictions using the loaded model as is and from the loaded weights too. The results for both of them are consistent. So at least, it saves the same weights in both files, however wrong they are.
From what I have read, this looks like a common issue with Keras. I came across this suggestion in several places: set the global variable initializer manually.
My problem is that this suggestion, along with a few others (like setting a fixed seed), are to be put in place before training. Training my models takes 4-5 days! How can I fix this without having to retrain the models?
Here is how I fit the models:
hist = model.fit(
    X_train, y_train,
    batch_size=batch_size,
    verbose=1,
    epochs=epochs,
    validation_split=0.2
)
Then I save the model as well as the weights:
model.save("path to .h5 file")
model.save_weights("path to .hdf5 file")
Eventually, I am loading the model and predicting from it like so:
from keras.models import load_model
model = load_model("path to the same .h5 file")
ypred = model.predict(input_arr)
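A quick sanity check that could narrow this down (a sketch; it only needs one saved copy and one input array, not a retrain): compare predictions from the in-memory model and a freshly reloaded copy on the same input.

import numpy as np
from keras.models import load_model

before = model.predict(input_arr)             # predictions from the in-memory model
model.save("model_check.h5")                  # illustrative file name
reloaded = load_model("model_check.h5")
after = reloaded.predict(input_arr)
print(np.allclose(before, after, atol=1e-6))  # True means save/load preserved the weights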
I have a Keras model and I trained it for 100 epochs.
I got a loss of 0.0085 at epoch 85 and 0.0092 at the last epoch.
My question is: what does model.save() in Keras save?
Does it save the weights it got from the last epoch (i.e., 100)?
Or does it save the weights from the best epoch (i.e., epoch 85)?
Or the average or mean weights from all 100 epochs?
Is Keras model.save() actually designed to save the weights after the completion of 100 epochs?
Thanks in advance for the explanation :).
The model.save() saves the whole architecture, weights and the optimizer state. This command saves the details needed to reconstitute your model.
The command will save:
The architecture of the model, allowing you to re-create the model;
The weights of the model;
The training configuration (loss, optimizer);
The state of the optimizer, allowing you to resume training exactly where you left off.
So you can reuse your model using keras.models.load_model(filepath) to reinstantiate your model. load_model will also take care of compiling the model using the saved training configuration.
See the example:
from keras.models import load_model
model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'
del model # deletes the existing model
# returns a compiled model
# identical to the previous one
model = load_model('my_model.h5')
Source: https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model
model.save() will save many details about your NN. The most important details are:
The architecture of the network, including the dimensions (input/output layers, hidden layers, etc.).
The weight matrices for every hidden unit in each layer, and the activation functions.
And many other details that we may not need to outline here.
Coming back to the second part of your question: when you save the trained model, the weights as they are after the last epoch are saved. That means the final loss value can be higher or lower than in previous epochs, depending on the number of epochs you specified and how close you got to overfitting.
Also, the number of epochs is not saved, and saving it doesn't make sense in most situations, according to François Chollet, the creator of Keras; see this conversation.
This is true unless you activate the callback option that stops the training of your network early after a certain number of epochs (which relates to what you called the best epoch); see this.
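For completeness, a minimal sketch of such a callback (the argument values are illustrative, and restore_best_weights needs a reasonably recent Keras version):

from keras.callbacks import EarlyStopping

# Stop when val_loss stops improving and roll back to the best weights seen so far
early_stop = EarlyStopping(monitor='val_loss', patience=5, restore_best_weights=True)
model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early_stop])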
My question is, what does model.save() save? Is it the weights it got from the last epoch (i.e., 100), the weights from the best epoch (i.e., epoch 85), or the average or mean weights from all 100 epochs?
What exactly is saved (weights, optimizer state, etc.) is already mentioned in the other answers. In your case, the weights of the model at the end of 100 epochs are saved.
In case you would like to save the best model (with the least loss), you need to create a ModelCheckpoint callback object and pass it to the fit() method via the callbacks argument.
https://keras.io/callbacks/#ModelCheckpoint
https://keras.io/callbacks/#example-model-checkpoints
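A minimal sketch of that (the file name and argument values are illustrative):

from keras.callbacks import ModelCheckpoint

# Keep only the model with the lowest validation loss seen so far
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_loss', save_best_only=True)
model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[checkpoint])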
It saves the weights.
Does it save the weights it got from the last epoch (i.e., 100)? Yes.
Does it save the weights from the best epoch (i.e., epoch 85)? No; for saving the weights of the best epoch, use the chunk of code I have given below.
Does it save the average or mean weights from all 100 epochs? No.
Is Keras model.save() designed to save the weights after the completion of 100 epochs? Yes, it does, but have a look at the following code for saving the weights of only the best epochs.
Use this chunk of code to:
Save the weights of the best epochs only
Update the saved weights after an epoch only if the given criterion improves (val_loss is at its minimum)
Additionally, the history after each epoch will be saved in a .csv file.
Code
import pandas as pd
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop when val_loss is not decreasing
earlyStopping = EarlyStopping(monitor='val_loss', patience=10, verbose=0, mode='min')

# Save the model weights only when val_loss improves
checkpointer = ModelCheckpoint(filepath='Model_1_weights.h5', verbose=1, save_best_only=True)

# The history variable records the training progress after each epoch
history = model.fit(X_train, y_train, batch_size=20, epochs=40,
                    validation_data=(X_valid, y_valid), shuffle=True,
                    callbacks=[checkpointer, earlyStopping])

# Save the progress of each epoch to a .csv file
hist_df = pd.DataFrame(history.history)
hist_csv_file = 'History_Model_1.csv'
with open(hist_csv_file, mode='w') as f:
    hist_df.to_csv(f)
Link: https://keras.io/callbacks/#ModelCheckpoint
After a training procedure, I wanted to check the accuracy by loading the created model.h5 and executing an evaluation procedure. However, I am getting a following warning:
/usr/local/lib/python3.5/dist-packages/keras/engine/saving.py:269: UserWarning: No training configuration found in save file: the model was not compiled. Compile it manually.
  warnings.warn('No training configuration found in save file:
The warning comes from the dist-packages/keras/engine/saving.py file, so the problem is in loading the created model, i.e. this line of code:
train_model = load_model('model.h5')
The warning indicates that the model was not compiled; however, I did compile it:
from keras.optimizers import Adam

optimizer = Adam(lr=lr, clipnorm=0.001)
train_model.compile(loss=dummy_loss, optimizer=optimizer)
I can't understand what I am doing wrong...
Please help me! SOS :-(
Intro
I'd like to add to olejorgenb's answer for a specific scenario: you don't want to train the model, just use it (e.g. in production).
"Compile" means "prepare for training", which mainly includes setting up the optimizer. The optimizer state could also have been saved before, and then you can continue the "same" training after loading the saved model.
The fix
But what about the scenario where you just want to run the model? Then use the compile=False argument to load_model, like this:
trained_model = load_model('model.h5', compile=False)
You won't be able to .fit() this model without calling trained_model.compile(...) first, but most importantly, the warning will go away.
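A minimal sketch of that flow (the optimizer, loss, and the x_new/y_new arrays are illustrative, not taken from the question):

from keras.models import load_model

# Load for inference only: no warning, and no optimizer or loss needed
trained_model = load_model('model.h5', compile=False)
predictions = trained_model.predict(x_new)

# If you later decide to train it after all, compile it manually first
trained_model.compile(optimizer='adam', loss='categorical_crossentropy')
trained_model.fit(x_new, y_new, epochs=1)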
Misc Notes
By the way, in my Keras version, the argument include_optimizer has a default of True. This should also work for training callbacks like ModelCheckpoint. This means that, when loading a model saved by Keras, you can usually count on the optimizer being included (except for the situation described in Hull Gasper's answer).
But, when you have a model which was not trained by Keras (e.g. when converting a model trained by Darknet), the model is saved un-compiled. This produces the warning, and you can get rid of it in the way described above.
Do you get this warning when saving the model?
WARNING:tensorflow:TensorFlow optimizers do not make it possible to access optimizer attributes or optimizer state after instantiation. As a result, we cannot save the optimizer as part of the model save file. You will have to compile your model again after loading it. Prefer using a Keras optimizer instead (see keras.io/optimizers).
It seems TensorFlow optimizers can't be preserved by Keras :/
As mentioned, Keras can't save TensorFlow optimizers. Use the Keras one:
import keras

optimizer = keras.optimizers.Adam(lr=0.0001, beta_1=0.9, beta_2=0.999,
                                  epsilon=None, decay=0.0, amsgrad=False)

model.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(...)
model.save('...')
This works for me without manually compiling after calling load_model.
I'm using the latest Keras with the TensorFlow backend (Python 3.6).
I'm loading a model that had a training accuracy of around 86% when I last trained it.
The original optimizer that I used was:
r_optimizer = Adam(lr=0.0001, decay=.02)

model.compile(optimizer=r_optimizer,
              loss='categorical_crossentropy', metrics=['accuracy'])
If I load the model and continue training without recompiling, my accuracy would stay around 86% (even after 10 or so more epochs).
So I wanted to try changing the learning rate or optimizer.
If I recompile the model and try to change the learning rate or the optimizer as follows:
new_optimizer = Adam(lr=0.001, decay=.02)
or to this one:
sgd = optimizers.SGD(lr= .0001)
and then compile:
model.compile(optimizer=new_optimizer,
              loss='categorical_crossentropy', metrics=['accuracy'])

model.fit(...)
The accuracy would reset to around 15%-20% instead of starting around 86%, and my loss would be much higher. Even if I used a small learning rate and recompiled, I would still start off from a very low accuracy.
From browsing the internet, it seems some optimizers like ADAM or RMSPROP have a problem with resetting weights after recompiling (I can't find the link at the moment).
So I did some digging and tried to reset my optimizer without recompiling as follows:
model = load_model(load_path)
sgd = optimizers.SGD(lr=1.0)  # very high, for testing
model.optimizer = sgd  # change the optimizer

# fit for training
history = model.fit_generator(
    train_gen,
    steps_per_epoch=r_steps_per_epoch,
    epochs=r_epochs,
    validation_data=valid_gen,
    validation_steps=np.ceil(len(valid_gen.filenames) / r_batch_size),
    callbacks=callbacks,
    shuffle=True,
    verbose=1)
However, these changes don't seem to be reflected in my training.
Despite raising the lr significantly, I'm still floundering around 86% with the same loss. During each epoch, I'm seeing very little loss or accuracy movement. I would expect the loss to be a lot more volatile.
This leads me to believe that my change in optimizer and lr isn't being realized by the model.
Any idea what I could be doing wrong?
I think your change does not assign the new lr to the optimizer. I found a solution to reset the lr value after loading a model in Keras; I hope it will help you.
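One common way to do that (a sketch; load_path follows the question's naming) is to overwrite the optimizer's learning-rate variable through the Keras backend after loading:

from keras import backend as K
from keras.models import load_model

model = load_model(load_path)           # reload the trained model
K.set_value(model.optimizer.lr, 0.001)  # overwrite the learning rate in place
print(K.get_value(model.optimizer.lr))  # confirm the change before resuming training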
This is a partial answer referring to what you wrote here:
From browsing the internet it seems some optimizers like ADAM or RMSPROP have a problem with resetting weights after recompiling (can't find the link at the moment)
Adaptive optimizers such as ADAM, RMSPROP, ADAGRAD, ADADELTA, and any variation of these, rely on previous update steps to improve the direction and magnitude of any current adjustment to the weights of the model.
Because of this, the first few steps that they take tend to be relatively "bad" as they "calibrate themselves" with information from previous steps.
When used on a random initialization, this is not a problem, but when used on a pretrained model, these first few steps can degrade the model so much that almost all of the pretraining work gets lost.
Even worse, now the training doesn't start from a carefully chosen random initialization like a Xavier initialization, but from some sub-optimal starting point, which could potentially prevent the model from converging to the local optimum that it would have reached if it started from a good random initialization.
Unfortunately, I'm not sure how you can avoid this... Perhaps: pretrain with one optimizer --> save the weights --> replace the optimizer --> restore the weights --> train for a few epochs and hope the new adaptive optimizer learns a "useful history" --> then restore the weights again from the saved weights of the pretrained model and, without recompiling, start training again, now with a better optimizer "history".
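If you want to try this, a rough sketch of that sequence could look like the following (the optimizer, file name, and the X_warmup/y_warmup arrays are all illustrative):

from keras.optimizers import Adam

weights_path = 'pretrained_weights.h5'   # illustrative file name
model.save_weights(weights_path)         # 1. save the pretrained weights

model.compile(optimizer=Adam(lr=1e-4),   # 2. recompile with the new optimizer
              loss='categorical_crossentropy',
              metrics=['accuracy'])

model.fit(X_warmup, y_warmup, epochs=2)  # 3. a few warm-up epochs so the adaptive
                                         #    optimizer accumulates some history

model.load_weights(weights_path)         # 4. restore the pretrained weights
model.fit(X_train, y_train, epochs=10)   # 5. resume training without recompiling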
Please let us know if this works.