I saved a nn.Module model using (logically):
model = MyWeirdModel()
model.patched_features = .....
train(model)
torch.save(model, file)
Ideally, one would load this model using
model = torch.load(file)
However, in my case this doesn't work because Pickle uses the static class definition when un-pickling, so I get AttributeError: 'MyWeirdModel' object has no attribute 'patched_features' (this attribute was added to MyWeirdModel at runtime).
I would like to avoid having to re-train the model, so I don't want to change the code for saving, only loading.
# Initialise the model in the same way as before
model = MyWeirdModel()
model.patched_features = .....
state_dict = load_state_dict_only(file) # How does one do this?
model.load_state_dict(state_dict)
My understanding is that torch.save() saves the model AND the state dict. How do I load only the state dict from the pickled model, such that I can recover the model?
Referring to the documentation,
only layers with learnable parameters (convolutional layers, linear
layers, etc.) and registered buffers (batchnorm’s running_mean) have
entries in the model’s state_dict.
Your patched_features is not stored in the model’s state_dict, but you can do so by registering it with register_parameter as pointed out by ptrblck in this post but it may involve retraining.
As Pytorch Lightning provides automatic saving for model checkpoints, I use it to save top-k best models. Specifically in Trainer setting,
checkpoint_callback = ModelCheckpoint(
monitor='val_acc',
dirpath='checkpoints/',
filename='{epoch:02d}-{val_acc:.2f}',
save_top_k=5,
mode='max',
)
This is working well but it does not save some attribute of the model object. My model stores some Tensor at every training epoch end such that
class SampleNet(pl.LightningModule):
def __init__(self):
super().__init__()
self.save_hyperparameters()
self.layer = torch.nn.Linear(100, 1)
self.loss = torch.nn.CrossEntropy()
self.some_data = None # Initialize as None
def training_step(self, batch):
x, t = batch
out = self.layer(x)
loss = self.loss(out, t)
results = {'loss': loss}
return results
def training_epoch_end(self, outputs):
self.some_data = some_tensor_object
This is a simplified example but I want the checkpoint file made by above checkpoint_callback to remember the attribute self.some_data but when I load the model from checkpoint, it always reset to None. I confirmed that it is successfully updated during the training.
I tried not to initialize it as None in init but then the attribute will disappear when loading model.
Saving the attribute as a distinct pt file is something I want to avoid as it is associated with model configuration so I manually need to match the file with corresponding checkpoint file later.
Would it be possible to include such tensor attribute in checkpoint file?
Simply use the model class hooks on_save_checkpoint() and on_load_checkpoint() for all sorts of objects that you want to save alongside the default attributes.
def on_save_checkpoint(self, checkpoint) -> None:
"Objects to include in checkpoint file"
checkpoint["some_data"] = self.some_data
def on_load_checkpoint(self, checkpoint) -> None:
"Objects to retrieve from checkpoint file"
self.some_data= checkpoint["some_data"]
See module docs
It doesn't seem to be possible directly, as to extract the parameters most likely nn.Module.state_dict() is used.
This methods only extracts the values of the tensors that are actually considered as parameters. So in this case a workaround would be saving your data as a parameter (see docs):
self.some_data = torch.nn.parameter.Parameter(your_data)
I have created and trained a TensorFlow model using the HammingLoss metric from TensorFlow addons. Thus, it's not a custom metric that I have created on my own. I use a callbacks function with the methords ModelCheckpoint() and EarlyStopping to save the best weights of the best model and stop model training at a given threshold repsectively. When I save the model checkpoint I serialize the whole model structure (similar to model.save()), istead of model.save_weights(), which would have saved only the model weights (more about ModelCheckpoint here).
TL;DR: Here is a colab notebook with the code I post below in case you want to skip this.
The model I have trained is saved in GoogleDrive in the link here. To load the specific model I use the following code:
neural_network_parameters = {}
#======================================================================
# PARAMETERS THAT DEFINE THE NEURAL NETWORK STRUCTURE =
#======================================================================
neural_network_parameters['model_loss'] = tf.keras.losses.BinaryCrossentropy(from_logits=False, name='binary_crossentropy')
neural_network_parameters['model_metric'] = [tfa.metrics.HammingLoss(mode="multilabel", name="hamming_loss"),
tfa.metrics.F1Score(17, average="micro", name="f1_score_micro"),
tfa.metrics.F1Score(17, average=None, name="f1_score_none"),
tfa.metrics.F1Score(17, average="macro", name="f1_score_macro"),
tfa.metrics.F1Score(17, average="weighted", name="f1_score_weighted")]
"""Initialize the hyper parameters tuning the model using Tensorflow's hyperparameters module"""
HP_HIDDEN_UNITS = hp.HParam('batch_size', hp.Discrete([32]))
HP_EMBEDDING_DIM = hp.HParam('embedding_dim', hp.Discrete([50]))
HP_LEARNING_RATE = hp.HParam('learning_rate', hp.Discrete([0.001])) # Adam default: 0.001, SGD default: 0.01, RMSprop default: 0.001....0.1 to be removed
HP_DECAY_STEPS_MULTIPLIER = hp.HParam('decay_steps_multiplier', hp.Discrete([10]))
METRIC_ACCURACY = "hamming_loss"
dependencies = {
'hamming_loss': tfa.metrics.HammingLoss(mode="multilabel", name="hamming_loss"),
'attention': attention(return_sequences=True)
}
def import_trained_keras_model(model_index, method, decay_steps_mode, optimizer_name, hparams):
"""Load the model"""
training_date="2021-02-27"
model_path_structure=f"{folder_path_model_saved}/{initialize_notebbok_variables.saved_model_name}_{hparams[HP_EMBEDDING_DIM]}dim_{hparams[HP_HIDDEN_UNITS]}batchsize_{hparams[HP_LEARNING_RATE]}lr_{hparams[HP_DECAY_STEPS_MULTIPLIER]}decaymultiplier_{training_date}"
model_imported=load_model(f"{model_path_structure}", custom_objects=dependencies)
if optimizer_name=="adam":
optimizer = optimizer_adam_v2(hparams)
elif optimizer_name=="sgd":
optimizer = optimizer_sgd_v1(hparams, "step decay")
else:
optimizer = optimizer_rmsprop_v1(hparams)
model_imported.compile(optimizer=optimizer,
loss=neural_network_parameters['model_loss'],
metrics=neural_network_parameters['model_metric'])
print(f"Model {model_index} is loaded successfully\n")
return model_imported
Calling the function import trained keras model
"""Now that the functions have been created it's time to import each trained classifier from the selected dictionary of hyper parameters, calculate the evaluation metric per model and finally serialize the scores dataframe for later use."""
list_models=[] #a list to store imported models
model_optimizer="adam"
for batch_size in HP_HIDDEN_UNITS.domain.values:
for embedding_dim in HP_EMBEDDING_DIM.domain.values:
for learning_rate in HP_LEARNING_RATE.domain.values:
for decay_steps_multiplier in HP_DECAY_STEPS_MULTIPLIER.domain.values:
hparams = {
HP_HIDDEN_UNITS: batch_size,
HP_EMBEDDING_DIM: embedding_dim,
HP_LEARNING_RATE: learning_rate,
HP_DECAY_STEPS_MULTIPLIER: decay_steps_multiplier
}
print(f"\n{len(list_models)+1}/{(len(HP_HIDDEN_UNITS.domain.values)*len(HP_EMBEDDING_DIM.domain.values)*len(HP_LEARNING_RATE.domain.values)*len(HP_DECAY_STEPS_MULTIPLIER.domain.values))}")
print({h.name: hparams[h] for h in hparams},'\n')
model_object=import_trained_keras_model(len(list_models)+1, "import custom trained model", "on", model_optimizer, hparams)
list_models.append(model_object)
When I call the function I get the following error
ValueError: Unable to restore custom object of type _tf_keras_metric currently. Please make sure that the layer implements get_configand from_config when saving. In addition, please use the custom_objects arg when calling load_model().
It's strange that I get this error since the model metric to compile the NN is from a built in method of TensorFlow and NOT some sort of a custom metric that I developed myself.
I have searched also this thread in GitHub which closed without explaining the root of the problem.
[UPDATE]--Found a temporary solution
I managed to successfully import the model by turning the compile argument to False in order to re-compile the model imported inside the function.
So I did smth like model_imported=load_model(f"{model_path_structure}", custom_objects=dependencies, compile=False).
This action produced the following result:
WARNING:tensorflow:Unable to restore custom metric. Please ensure that the layer implements get_config and from_config when saving. In addition, please use the custom_objects arg when calling load_model().
Model 1 is loaded successfully.
So TensorFlow still cannot understand that HammingLoss is not a custom metric but rather a metric imported from Tensorflow Addons. However, despite the warning the model loaded successfully.
After creating a FastText model using Gensim, I want to load it but am running into errors seemingly related to callbacks.
The code used to create the model is
TRAIN_EPOCHS = 30
WINDOW = 5
MIN_COUNT = 50
DIMS = 256
vocab_model = gensim.models.FastText(sentences=model_input,
size=DIMS,
window=WINDOW,
iter=TRAIN_EPOCHS,
workers=6,
min_count=MIN_COUNT,
callbacks=[EpochSaver("./ftchkpts/")])
vocab_model.save('ft_256_min_50_model_30eps')
and the callback EpochSaver is defined as
from gensim.models.callbacks import CallbackAny2Vec
class EpochSaver(CallbackAny2Vec):
'''Callback to save model after each epoch and show training parameters '''
def __init__(self, savedir):
self.savedir = savedir
self.epoch = 0
os.makedirs(self.savedir, exist_ok=True)
def on_epoch_end(self, model):
savepath = os.path.join(self.savedir, f"ft256_{self.epoch}e")
model.save(savepath)
print(f"Epoch saved: {self.epoch + 1}")
if os.path.isfile(os.path.join(self.savedir, f"ft256_{self.epoch-1}e")):
os.remove(os.path.join(self.savedir, f"ft256_{self.epoch-1}e"))
print("Previous model deleted ")
self.epoch += 1
Aside from the type of model, this is identical to my process for Word2Vec which worked without issue. However when I open another file and try to load the model with
from gensim.models import FastText
vocab = FastText.load(r'vocab/ft_256_min_50_model_30eps')
I'm greeted with the error
AttributeError: Can't get attribute 'EpochSaver' on <module '__main__'>
What can I do to get the vocabulary to load so I can create the embedding layer for my keras model? If it's relevant, this is happening in JupyterLab.
This extra difficulty loading models with custom callbacks is a known, open issue (at least through gensim-3.8.1 and October 2019).
You can see discussions of possible workarounds and fixes there – and the gensim team is considering simply disabling the auto-saving of callbacks at all, requiring them to be re-specified for each later train()/etc call that needs them.
You may be able to load existing models saved with your custom callbacks by importing those same callback classes, as the same names, into the code context where you're doing a load().
You could save callback-free versions of your trained models by blanking the model's callbacks property to its empty default value, just before you save(), eg:
model.callbacks = ()
model.save(save_path)
Then, you wouldn't need to do any special importing of custom classes before a load(). (Of course if you again needed callback functionality on the re-loaded model, they'd then have to be explicitly reestablished after load()).
How do I save a trained model in PyTorch? I have read that:
torch.save()/torch.load() is for saving/loading a serializable object.
model.state_dict()/model.load_state_dict() is for saving/loading model state.
Found this page on their github repo:
Recommended approach for saving a model
There are two main approaches for serializing and restoring a model.
The first (recommended) saves and loads only the model parameters:
torch.save(the_model.state_dict(), PATH)
Then later:
the_model = TheModelClass(*args, **kwargs)
the_model.load_state_dict(torch.load(PATH))
The second saves and loads the entire model:
torch.save(the_model, PATH)
Then later:
the_model = torch.load(PATH)
However in this case, the serialized data is bound to the specific classes and the exact directory structure used, so it can break in various ways when used in other projects, or after some serious refactors.
See also: Save and Load the Model section from the official PyTorch tutorials.
It depends on what you want to do.
Case # 1: Save the model to use it yourself for inference: You save the model, you restore it, and then you change the model to evaluation mode. This is done because you usually have BatchNorm and Dropout layers that by default are in train mode on construction:
torch.save(model.state_dict(), filepath)
#Later to restore:
model.load_state_dict(torch.load(filepath))
model.eval()
Case # 2: Save model to resume training later: If you need to keep training the model that you are about to save, you need to save more than just the model. You also need to save the state of the optimizer, epochs, score, etc. You would do it like this:
state = {
'epoch': epoch,
'state_dict': model.state_dict(),
'optimizer': optimizer.state_dict(),
...
}
torch.save(state, filepath)
To resume training you would do things like: state = torch.load(filepath), and then, to restore the state of each individual object, something like this:
model.load_state_dict(state['state_dict'])
optimizer.load_state_dict(state['optimizer'])
Since you are resuming training, DO NOT call model.eval() once you restore the states when loading.
Case # 3: Model to be used by someone else with no access to your code:
In Tensorflow you can create a .pb file that defines both the architecture and the weights of the model. This is very handy, specially when using Tensorflow serve. The equivalent way to do this in Pytorch would be:
torch.save(model, filepath)
# Then later:
model = torch.load(filepath)
This way is still not bullet proof and since pytorch is still undergoing a lot of changes, I wouldn't recommend it.
The pickle Python library implements binary protocols for serializing and de-serializing a Python object.
When you import torch (or when you use PyTorch) it will import pickle for you and you don't need to call pickle.dump() and pickle.load() directly, which are the methods to save and to load the object.
In fact, torch.save() and torch.load() will wrap pickle.dump() and pickle.load() for you.
A state_dict the other answer mentioned deserves just a few more notes.
What state_dict do we have inside PyTorch?
There are actually two state_dicts.
The PyTorch model is torch.nn.Module which has model.parameters() call to get learnable parameters (w and b).
These learnable parameters, once randomly set, will update over time as we learn.
Learnable parameters are the first state_dict.
The second state_dict is the optimizer state dict. You recall that the optimizer is used to improve our learnable parameters. But the optimizer state_dict is fixed. Nothing to learn there.
Because state_dict objects are Python dictionaries, they can be easily saved, updated, altered, and restored, adding a great deal of modularity to PyTorch models and optimizers.
Let's create a super simple model to explain this:
import torch
import torch.optim as optim
model = torch.nn.Linear(5, 2)
# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
print("Model's state_dict:")
for param_tensor in model.state_dict():
print(param_tensor, "\t", model.state_dict()[param_tensor].size())
print("Model weight:")
print(model.weight)
print("Model bias:")
print(model.bias)
print("---")
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
print(var_name, "\t", optimizer.state_dict()[var_name])
This code will output the following:
Model's state_dict:
weight torch.Size([2, 5])
bias torch.Size([2])
Model weight:
Parameter containing:
tensor([[ 0.1328, 0.1360, 0.1553, -0.1838, -0.0316],
[ 0.0479, 0.1760, 0.1712, 0.2244, 0.1408]], requires_grad=True)
Model bias:
Parameter containing:
tensor([ 0.4112, -0.0733], requires_grad=True)
---
Optimizer's state_dict:
state {}
param_groups [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [140695321443856, 140695321443928]}]
Note this is a minimal model. You may try to add stack of sequential
model = torch.nn.Sequential(
torch.nn.Linear(D_in, H),
torch.nn.Conv2d(A, B, C)
torch.nn.Linear(H, D_out),
)
Note that only layers with learnable parameters (convolutional layers, linear layers, etc.) and registered buffers (batchnorm layers) have entries in the model's state_dict.
Non-learnable things belong to the optimizer object state_dict, which contains information about the optimizer's state, as well as the hyperparameters used.
The rest of the story is the same; in the inference phase (this is a phase when we use the model after training) for predicting; we do predict based on the parameters we learned. So for the inference, we just need to save the parameters model.state_dict().
torch.save(model.state_dict(), filepath)
And to use later
model.load_state_dict(torch.load(filepath))
model.eval()
Note: Don't forget the last line model.eval() this is crucial after loading the model.
Also don't try to save torch.save(model.parameters(), filepath). The model.parameters() is just the generator object.
On the other hand, torch.save(model, filepath) saves the model object itself, but keep in mind the model doesn't have the optimizer's state_dict. Check the other excellent answer by #Jadiel de Armas to save the optimizer's state dict.
A common PyTorch convention is to save models using either a .pt or .pth file extension.
Save/Load Entire Model
Save:
path = "username/directory/lstmmodelgpu.pth"
torch.save(trainer, path)
Load:
(Model class must be defined somewhere)
model.load_state_dict(torch.load(PATH))
model.eval()
If you want to save the model and wants to resume the training later:
Single GPU:
Save:
state = {
'epoch': epoch,
'state_dict': model.state_dict(),
'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)
Load:
checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']
Multiple GPU:
Save
state = {
'epoch': epoch,
'state_dict': model.module.state_dict(),
'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)
Load:
checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']
#Don't call DataParallel before loading the model otherwise you will get an error
model = nn.DataParallel(model) #ignore the line if you want to load on Single GPU
Saving locally
How you save your model depends on how you want to access it in the future. If you can call a new instance of the model class, then all you need to do is save/load the weights of the model with model.state_dict():
# Save:
torch.save(old_model.state_dict(), PATH)
# Load:
new_model = TheModelClass(*args, **kwargs)
new_model.load_state_dict(torch.load(PATH))
If you cannot for whatever reason (or prefer the simpler syntax), then you can save the entire model (actually a reference to the file(s) defining the model, along with its state_dict) with torch.save():
# Save:
torch.save(old_model, PATH)
# Load:
new_model = torch.load(PATH)
But since this is a reference to the location of the files defining the model class, this code is not portable unless those files are also ported in the same directory structure.
Saving to cloud - TorchHub
If you wish your model to be portable, you can easily allow it to be imported with torch.hub. If you add an appropriately defined hubconf.py file to a github repo, this can be easily called from within PyTorch to enable users to load your model with/without weights:
hubconf.py (github.com/repo_owner/repo_name)
dependencies = ['torch']
from my_module import mymodel as _mymodel
def mymodel(pretrained=False, **kwargs):
return _mymodel(pretrained=pretrained, **kwargs)
Loading model:
new_model = torch.hub.load('repo_owner/repo_name', 'mymodel')
new_model_pretrained = torch.hub.load('repo_owner/repo_name', 'mymodel', pretrained=True)
pip install pytorch-lightning
make sure your parent model uses pl.LightningModule instead of nn.Module
Saving and loading checkpoints using pytorch lightning
import pytorch_lightning as pl
model = MyLightningModule(hparams)
trainer.fit(model)
trainer.save_checkpoint("example.ckpt")
new_model = MyModel.load_from_checkpoint(checkpoint_path="example.ckpt")
These days everything is written in the official tutorial:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
You have several options on how to save and what to save and all is explained in that tutorial.
I use this approach, hope it will be useful for you.
num_labels = len(test_label_cols)
robertaclassificationtrain = '/dbfs/FileStore/tables/PM/TC/roberta_model'
robertaclassificationpath = "/dbfs/FileStore/tables/PM/TC/ROBERTACLASSIFICATION"
model = RobertaForSequenceClassification.from_pretrained(robertaclassificationpath,
num_labels=num_labels)
model.cuda()
model.load_state_dict(torch.load(robertaclassificationtrain))
model.eval()
Where I save my train model already in 'roberta_model' path. Save a train model.
torch.save(model.state_dict(), '/dbfs/FileStore/tables/PM/TC/roberta_model')