In my first script, I saved a Hugging Face 'transformers.trainer.Trainer'-based model using the save_pretrained() function.
In a second script, I want to load this saved model and use it to make predictions. I need help with this step: how do I load the saved model and then make a prediction?
Steps to create the model:
from transformers import AutoModelForQuestionAnswering, TrainingArguments, Trainer
model = AutoModelForQuestionAnswering.from_pretrained('xlm-roberta-large')
trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_train_ds,
    eval_dataset=tokenized_val_ds,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
# Arguments used above but not defined here: args, tokenized_train_ds, tokenized_val_ds, data_collator, tokenizer
# The step below fine-tunes the pre-trained model
trainer.train()
I then saved this 'trainer' model using the command below:
trainer.save_model('./trainer_sm')
In a different script, I now want to load this model and use it to make predictions. Can someone advise how to do this? I tried the command below to load it:
model_sm=AutoModelForQuestionAnswering.from_pretrained("./trainer_sm")
and used it to make predictions with this line of code, which raised the error below:
model_sm.predict(test_features)
AttributeError: 'XLMRobertaForQuestionAnswering' object has no attribute 'predict'
I also passed use_auth_token=True to from_pretrained, but that didn't work either.
Also, type(trainer) is transformers.trainer.Trainer, while type(model_sm) is transformers.models.xlm_roberta.modeling_xlm_roberta.XLMRobertaForQuestionAnswering.
What you saved is the model that the trainer was fine-tuning. Predicting, training, evaluating and so on are utilities of the transformers.trainer.Trainer object, not of transformers.models.xlm_roberta.modeling_xlm_roberta.XLMRobertaForQuestionAnswering, which is why the model has no predict() method. Given that, the easiest way to keep going is to create another Trainer instance around the reloaded model:
from transformers import AutoModelForQuestionAnswering, Trainer

model_sm = AutoModelForQuestionAnswering.from_pretrained("./trainer_sm")
reloaded_trainer = Trainer(
    model=model_sm,
    tokenizer=tokenizer,
    # other arguments if you have changed the defaults
)
reloaded_trainer.predict(test_dataset)
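For a question-answering model, predict() returns raw logits rather than answer text; in the Hugging Face QA examples the predictions typically unpack into (start_logits, end_logits). A minimal sketch of picking the best answer span for one example, assuming plain Python lists of per-token logits (best_answer_span and max_answer_len are hypothetical names, not a library API). A full post-processing step would also use the tokenizer's offset mapping to skip question tokens and map the span back to text.

```python
from typing import List, Tuple


def best_answer_span(
    start_logits: List[float],
    end_logits: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest start+end score.

    Only spans with start <= end and at most max_answer_len tokens are considered.
    """
    best_score = float("-inf")
    best_span = (0, 0)
    for start, start_score in enumerate(start_logits):
        # The end index may not precede start or exceed the length limit.
        last_end = min(start + max_answer_len, len(end_logits))
        for end in range(start, last_end):
            score = start_score + end_logits[end]
            if score > best_score:
                best_score = score
                best_span = (start, end)
    return best_span


# Hypothetical usage with the reloaded trainer:
# start_logits, end_logits = reloaded_trainer.predict(test_dataset).predictions
# span = best_answer_span(list(start_logits[0]), list(end_logits[0]))
```

The brute-force double loop is quadratic in sequence length but bounded by max_answer_len, which keeps it cheap for typical context windows.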
I defined my Keras model and used a custom_loss function to train it:
model.compile(optimizer='adam', loss=custom_loss, metrics=[custom_loss])
Then I train the model:
history = model.fit(X_train, y_train, batch_size=1024, epochs=125, validation_split=0.2, shuffle=True)
Then I save this history object using the following code:
with open('history.pkl', 'wb') as file:
    pickle.dump(history, file)
Now, when I try to read the history object back as follows:
with open('history.pkl', 'rb') as file:
    history = pickle.load(file)
I get the following error:
ValueError: Unknown loss function:custom_loss
How can I read the history object? I don't get this error when I am not using the custom_loss function.
I am using Keras 2.2.4 and TensorFlow 1.15.5.
Edit: Complete error traceback as requested:
For most use cases, you don't want to serialize the history object. What you are usually interested in is history.history, which is a dict of the logs / metrics / losses / etc.
Try this:
with open('history.pkl', 'wb') as file:
    pickle.dump(history.history, file)
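A minimal sketch of that approach as a pair of helpers (save_history and load_history are hypothetical names). Because history.history is just a dict mapping metric names to per-epoch lists, it pickles and unpickles without Keras or any custom objects in scope.

```python
import pickle
from typing import Dict, List


def save_history(history_dict: Dict[str, List[float]], path: str) -> None:
    """Pickle only the metrics dict (e.g. history.history), not the History callback."""
    with open(path, "wb") as file:
        # A plain dict of lists holds no reference to the model or custom_loss,
        # so unpickling it later needs no custom objects.
        pickle.dump(dict(history_dict), file)


def load_history(path: str) -> Dict[str, List[float]]:
    with open(path, "rb") as file:
        return pickle.load(file)


# Usage (assumed):
# save_history(history.history, "history.pkl")
# logs = load_history("history.pkl")
# final_val_loss = logs["val_loss"][-1]
```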
The fuller answer is that the returned history object is a tf.keras.callbacks.History, which subclasses tf.keras.callbacks.Callback. A Callback holds a reference to the model, which in turn references all kinds of things, including custom objects like your custom loss. Serialization of Keras custom objects is a big topic of its own. TL;DR: the recommended way to serialize Keras models is not pickle.
+1 to @Yaoshiang's answer. That's the right answer.
This is just an extended note.
Reading that traceback, it looks like Keras has custom pickle logic that reuses the standard Keras save/load logic. Keras doesn't know about your loss function unless you tell it.
Search for "custom objects" in this guide: https://keras.io/guides/serialization_and_saving/
Try something like:
import pickle
import keras

custom_objects = {"custom_loss": custom_loss}
with keras.utils.custom_object_scope(custom_objects):
    with open('history.pkl', 'rb') as file:
        history = pickle.load(file)
I have created and trained a TensorFlow model using the HammingLoss metric from TensorFlow Addons, so it's not a custom metric I created myself. I use the ModelCheckpoint() and EarlyStopping() callbacks to save the weights of the best model and to stop training at a given threshold, respectively. When I save the model checkpoint, I serialize the whole model structure (similar to model.save()) instead of using model.save_weights(), which would save only the model weights (more about ModelCheckpoint here).
TL;DR: Here is a colab notebook with the code I post below in case you want to skip this.
The model I have trained is saved in Google Drive at the link here. To load the specific model I use the following code:
neural_network_parameters = {}
#======================================================================
# PARAMETERS THAT DEFINE THE NEURAL NETWORK STRUCTURE =
#======================================================================
neural_network_parameters['model_loss'] = tf.keras.losses.BinaryCrossentropy(from_logits=False, name='binary_crossentropy')
neural_network_parameters['model_metric'] = [
    tfa.metrics.HammingLoss(mode="multilabel", name="hamming_loss"),
    tfa.metrics.F1Score(17, average="micro", name="f1_score_micro"),
    tfa.metrics.F1Score(17, average=None, name="f1_score_none"),
    tfa.metrics.F1Score(17, average="macro", name="f1_score_macro"),
    tfa.metrics.F1Score(17, average="weighted", name="f1_score_weighted"),
]
"""Initialize the hyper parameters tuning the model using Tensorflow's hyperparameters module"""
HP_HIDDEN_UNITS = hp.HParam('batch_size', hp.Discrete([32]))
HP_EMBEDDING_DIM = hp.HParam('embedding_dim', hp.Discrete([50]))
HP_LEARNING_RATE = hp.HParam('learning_rate', hp.Discrete([0.001])) # Adam default: 0.001, SGD default: 0.01, RMSprop default: 0.001....0.1 to be removed
HP_DECAY_STEPS_MULTIPLIER = hp.HParam('decay_steps_multiplier', hp.Discrete([10]))
METRIC_ACCURACY = "hamming_loss"
dependencies = {
    'hamming_loss': tfa.metrics.HammingLoss(mode="multilabel", name="hamming_loss"),
    'attention': attention(return_sequences=True)
}
def import_trained_keras_model(model_index, method, decay_steps_mode, optimizer_name, hparams):
    """Load the model"""
    training_date = "2021-02-27"
    model_path_structure = f"{folder_path_model_saved}/{initialize_notebbok_variables.saved_model_name}_{hparams[HP_EMBEDDING_DIM]}dim_{hparams[HP_HIDDEN_UNITS]}batchsize_{hparams[HP_LEARNING_RATE]}lr_{hparams[HP_DECAY_STEPS_MULTIPLIER]}decaymultiplier_{training_date}"
    model_imported = load_model(f"{model_path_structure}", custom_objects=dependencies)

    if optimizer_name == "adam":
        optimizer = optimizer_adam_v2(hparams)
    elif optimizer_name == "sgd":
        optimizer = optimizer_sgd_v1(hparams, "step decay")
    else:
        optimizer = optimizer_rmsprop_v1(hparams)

    model_imported.compile(optimizer=optimizer,
                           loss=neural_network_parameters['model_loss'],
                           metrics=neural_network_parameters['model_metric'])
    print(f"Model {model_index} is loaded successfully\n")
    return model_imported
Calling the import_trained_keras_model function:
"""Now that the functions have been created it's time to import each trained classifier from the selected dictionary of hyper parameters, calculate the evaluation metric per model and finally serialize the scores dataframe for later use."""
list_models = []  # a list to store imported models
model_optimizer = "adam"
for batch_size in HP_HIDDEN_UNITS.domain.values:
    for embedding_dim in HP_EMBEDDING_DIM.domain.values:
        for learning_rate in HP_LEARNING_RATE.domain.values:
            for decay_steps_multiplier in HP_DECAY_STEPS_MULTIPLIER.domain.values:
                hparams = {
                    HP_HIDDEN_UNITS: batch_size,
                    HP_EMBEDDING_DIM: embedding_dim,
                    HP_LEARNING_RATE: learning_rate,
                    HP_DECAY_STEPS_MULTIPLIER: decay_steps_multiplier
                }
                print(f"\n{len(list_models)+1}/{(len(HP_HIDDEN_UNITS.domain.values)*len(HP_EMBEDDING_DIM.domain.values)*len(HP_LEARNING_RATE.domain.values)*len(HP_DECAY_STEPS_MULTIPLIER.domain.values))}")
                print({h.name: hparams[h] for h in hparams}, '\n')
                model_object = import_trained_keras_model(len(list_models)+1, "import custom trained model", "on", model_optimizer, hparams)
                list_models.append(model_object)
When I call the function, I get the following error:
ValueError: Unable to restore custom object of type _tf_keras_metric currently. Please make sure that the layer implements get_configand from_config when saving. In addition, please use the custom_objects arg when calling load_model().
This is strange, because the metric used to compile the network is a ready-made metric from TensorFlow Addons, NOT some sort of custom metric that I developed myself.
I also found this GitHub thread, but it was closed without explaining the root cause of the problem.
[UPDATE]--Found a temporary solution
I managed to import the model by setting the compile argument to False, since I re-compile the imported model inside the function anyway.
So I did something like model_imported=load_model(f"{model_path_structure}", custom_objects=dependencies, compile=False).
This action produced the following result:
WARNING:tensorflow:Unable to restore custom metric. Please ensure that the layer implements get_config and from_config when saving. In addition, please use the custom_objects arg when calling load_model().
Model 1 is loaded successfully.
So TensorFlow still doesn't recognize that HammingLoss is not a custom metric but a metric imported from TensorFlow Addons. Despite the warning, however, the model loaded successfully.
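To see why the error asks for get_config/from_config and the custom_objects argument, here is a toy, library-free sketch of name-based deserialization. It is not Keras's actual implementation; ToyHammingLoss, serialize and deserialize are hypothetical. The idea it illustrates: a saved model records a class name plus a config dict, and loading only works if that recorded name resolves to a class with from_config. In the toy, a key such as "hamming_loss" (the metric's name attribute) does not match the recorded class name, so the lookup fails. Whether a key mismatch is what breaks your load is an assumption; it may be worth also trying the class name (e.g. 'HammingLoss': tfa.metrics.HammingLoss) as a custom_objects key.

```python
from typing import Any, Dict


class ToyHammingLoss:
    """Hypothetical stand-in for a metric class; only the config round trip matters here."""

    def __init__(self, mode: str = "multilabel", name: str = "hamming_loss"):
        self.mode = mode
        self.name = name

    def get_config(self) -> Dict[str, Any]:
        # Everything needed to rebuild the object through __init__.
        return {"mode": self.mode, "name": self.name}

    @classmethod
    def from_config(cls, config: Dict[str, Any]) -> "ToyHammingLoss":
        return cls(**config)


def serialize(obj: Any) -> Dict[str, Any]:
    # Record the class name plus constructor arguments, similar in spirit to a saved model config.
    return {"class_name": type(obj).__name__, "config": obj.get_config()}


def deserialize(entry: Dict[str, Any], custom_objects: Dict[str, type]) -> Any:
    # Look the class up by the recorded class name, not by the object's `name` attribute.
    cls = custom_objects.get(entry["class_name"])
    if cls is None:
        raise ValueError(f"Unable to restore custom object of type {entry['class_name']}")
    return cls.from_config(entry["config"])


entry = serialize(ToyHammingLoss())
restored = deserialize(entry, {"ToyHammingLoss": ToyHammingLoss})  # works
# deserialize(entry, {"hamming_loss": ToyHammingLoss}) would raise ValueError
```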
I'm implementing a Keras model with a custom batch-renormalization layer, which has 4 weights (beta, gamma, running_mean, and running_std) and 3 state variables (r_max, d_max, and t):
self.gamma = self.add_weight(shape=shape,  #NK - shape = shape
                             initializer=self.gamma_init,
                             regularizer=self.gamma_regularizer,
                             name='{}_gamma'.format(self.name))
self.beta = self.add_weight(shape=shape,  #NK - shape = shape
                            initializer=self.beta_init,
                            regularizer=self.beta_regularizer,
                            name='{}_beta'.format(self.name))
self.running_mean = self.add_weight(shape=shape,  #NK - shape = shape
                                    initializer='zero',
                                    name='{}_running_mean'.format(self.name),
                                    trainable=False)
# Note: running_std actually holds the running variance, not the running std.
self.running_std = self.add_weight(shape=shape, initializer='one',
                                   name='{}_running_std'.format(self.name),
                                   trainable=False)
self.r_max = K.variable(np.ones((1,)), name='{}_r_max'.format(self.name))
self.d_max = K.variable(np.zeros((1,)), name='{}_d_max'.format(self.name))
self.t = K.variable(np.zeros((1,)), name='{}_t'.format(self.name))
When I checkpoint the model, only gamma, beta, running_mean, and running_std are saved (as expected), but when I try to load the model, I get this error:
Layer #1 (named "batch_renormalization_1" in the current model) was found to correspond to layer batch_renormalization_1 in the save file. However the new layer batch_renormalization_1 expects 7 weights, but the saved weights have 4 elements.
So it looks like the model is expecting all 7 weights to be part of the saved file, even though some of them are state variables.
Any insights as to how to get around this?
EDIT: I realized that the problem is that the model was trained and saved with Keras 2.1.0 (TensorFlow 1.3.0 backend), and I only get the error when loading it with Keras 2.4.3 (TensorFlow 2.3.0 backend). I can load the model with Keras 2.1.0.
So the real question is - what changed in Keras/Tensorflow, and is there a way to load older models without receiving this error?
You can't load the model this way, because keras.models.load_model can only rebuild configuration it knows about, not components you have customized yourself.
To get around this, rebuild the model architecture in code and load the weights into it instead:
model = YourModelDeclaration()
model.load_weights("checkpoint/h5file")
I had the same problem with a custom BatchNormalization layer, so I'm fairly sure this is the only way to load it.
In Keras, there are two ways to save the state of your model.
You can call the model.save() and model.save_weights() functions.
model.save() saves the entire model, including the architecture, the weights, and the optimizer state. In your case, the 4 weights and 3 state variables will all be saved by this method. You can simply use the load_model("path.h5") method to get your model back.
The model.save_weights() function only saves the weights of the model, not its structure. The important thing to note here is that the Keras ModelCheckpoint callback calls model.save_weights() under the hood when save_weights_only=True (otherwise it calls model.save()). If you want to use checkpointed weights, you must instantiate your model structure with model = customModel() and then load the weights into it with model.load_weights("checkpoint.h5").
How do I save a trained model in PyTorch? I have read that:
torch.save()/torch.load() is for saving/loading a serializable object.
model.state_dict()/model.load_state_dict() is for saving/loading model state.
I found this page in their GitHub repo:
Recommended approach for saving a model
There are two main approaches for serializing and restoring a model.
The first (recommended) saves and loads only the model parameters:
torch.save(the_model.state_dict(), PATH)
Then later:
the_model = TheModelClass(*args, **kwargs)
the_model.load_state_dict(torch.load(PATH))
The second saves and loads the entire model:
torch.save(the_model, PATH)
Then later:
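the_model = torch.load(PATH)  # on PyTorch 2.6+, torch.load defaults to weights_only=True; pass weights_only=False to unpickle a full model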
However in this case, the serialized data is bound to the specific classes and the exact directory structure used, so it can break in various ways when used in other projects, or after some serious refactors.
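Why the whole-model approach is tied to your classes: torch.save pickles the object, and pickle stores a reference to the class (its module and name) rather than the class code. The stdlib-only sketch below, with a hypothetical TinyModel standing in for TheModelClass, shows the difference between pickling an object and pickling a plain dict of its parameters; torch.save adds tensor handling on top, but the class lookup comes from pickle.

```python
import pickle


class TinyModel:
    """Hypothetical stand-in for TheModelClass."""

    def __init__(self):
        self.weight = [0.5, -0.25]

    def state_dict(self):
        # Plain data only: no reference to the class itself.
        return {"weight": list(self.weight)}


model = TinyModel()

# Pickling the object records where to find its class (module + class name) ...
whole_model_bytes = pickle.dumps(model)
# ... while pickling the state dict records only builtin containers and numbers.
state_bytes = pickle.dumps(model.state_dict())

print(b"TinyModel" in whole_model_bytes)  # True
print(b"TinyModel" in state_bytes)        # False
```

Unpickling whole_model_bytes therefore requires TinyModel to be importable from the same module path, which is exactly what breaks after a refactor.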
See also: Save and Load the Model section from the official PyTorch tutorials.
It depends on what you want to do.
Case # 1: Save the model to use it yourself for inference: You save the model, you restore it, and then you change the model to evaluation mode. This is done because you usually have BatchNorm and Dropout layers that by default are in train mode on construction:
torch.save(model.state_dict(), filepath)
#Later to restore:
model.load_state_dict(torch.load(filepath))
model.eval()
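A self-contained sketch of Case #1, assuming a small hypothetical Sequential model and an in-memory io.BytesIO buffer in place of filepath: it round-trips the state_dict into a fresh instance of the same architecture and calls eval() so Dropout is disabled, after which both models produce identical outputs.

```python
import io

import torch


def make_model() -> torch.nn.Module:
    # Hypothetical architecture; any nn.Module works the same way.
    return torch.nn.Sequential(
        torch.nn.Linear(4, 8),
        torch.nn.Dropout(p=0.5),
        torch.nn.Linear(8, 2),
    )


torch.manual_seed(0)
trained = make_model()

# Save only the parameters (the in-memory buffer stands in for `filepath`).
buffer = io.BytesIO()
torch.save(trained.state_dict(), buffer)
buffer.seek(0)

# Restore into a fresh instance of the same class, then switch to inference mode.
restored = make_model()
restored.load_state_dict(torch.load(buffer))
restored.eval()  # disables Dropout (and makes BatchNorm use its running statistics)

trained.eval()
x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.equal(trained(x), restored(x))
```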
Case # 2: Save model to resume training later: If you need to keep training the model that you are about to save, you need to save more than just the model. You also need to save the state of the optimizer, epochs, score, etc. You would do it like this:
state = {
'epoch': epoch,
'state_dict': model.state_dict(),
'optimizer': optimizer.state_dict(),
...
}
torch.save(state, filepath)
To resume training you would do things like: state = torch.load(filepath), and then, to restore the state of each individual object, something like this:
model.load_state_dict(state['state_dict'])
optimizer.load_state_dict(state['optimizer'])
Since you are resuming training, DO NOT call model.eval() once you restore the states when loading.
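A sketch of Case #2 end to end, assuming a hypothetical Linear model with SGD + momentum and an io.BytesIO buffer in place of filepath. One training step is taken first so the optimizer actually has momentum buffers worth saving; after loading, both the weights and those buffers match the originals.

```python
import io

import torch


def make_model_and_optimizer():
    # Hypothetical model/optimizer pair; rebuild them the same way before loading.
    model = torch.nn.Linear(3, 1)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    return model, optimizer


torch.manual_seed(0)
model, optimizer = make_model_and_optimizer()

# One training step so the optimizer has momentum buffers to save.
loss = model(torch.randn(5, 3)).pow(2).mean()
loss.backward()
optimizer.step()

state = {
    "epoch": 7,
    "state_dict": model.state_dict(),
    "optimizer": optimizer.state_dict(),
}
buffer = io.BytesIO()  # stands in for `filepath`
torch.save(state, buffer)
buffer.seek(0)

# Resume: rebuild the objects, then restore each one's state.
new_model, new_optimizer = make_model_and_optimizer()
checkpoint = torch.load(buffer)
new_model.load_state_dict(checkpoint["state_dict"])
new_optimizer.load_state_dict(checkpoint["optimizer"])
start_epoch = checkpoint["epoch"] + 1
new_model.train()  # stay in training mode; do not call eval() when resuming
```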
Case # 3: Model to be used by someone else with no access to your code:
In TensorFlow you can create a .pb file that defines both the architecture and the weights of the model. This is very handy, especially when using TensorFlow Serving. The equivalent way to do this in PyTorch would be:
torch.save(model, filepath)
# Then later:
model = torch.load(filepath)
This way is still not bulletproof, and since PyTorch is still undergoing a lot of changes, I wouldn't recommend it.
The pickle Python library implements binary protocols for serializing and de-serializing a Python object.
When you import torch (or when you use PyTorch) it will import pickle for you and you don't need to call pickle.dump() and pickle.load() directly, which are the methods to save and to load the object.
In fact, torch.save() and torch.load() will wrap pickle.dump() and pickle.load() for you.
A state_dict the other answer mentioned deserves just a few more notes.
What state_dict do we have inside PyTorch?
There are actually two state_dicts.
A PyTorch model is a torch.nn.Module, which has a model.parameters() call to get the learnable parameters (w and b).
These learnable parameters, once randomly set, will update over time as we learn.
Learnable parameters are the first state_dict.
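The second state_dict is the optimizer's state_dict. Recall that the optimizer is used to improve our learnable parameters. The optimizer's state_dict contains nothing learnable, but it is not fixed either: it holds the hyperparameters plus per-parameter buffers (such as SGD momentum) that get filled in once the optimizer starts stepping.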
Because state_dict objects are Python dictionaries, they can be easily saved, updated, altered, and restored, adding a great deal of modularity to PyTorch models and optimizers.
Let's create a super simple model to explain this:
import torch
import torch.optim as optim
model = torch.nn.Linear(5, 2)
# Initialize optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
print("Model's state_dict:")
for param_tensor in model.state_dict():
    print(param_tensor, "\t", model.state_dict()[param_tensor].size())
print("Model weight:")
print(model.weight)
print("Model bias:")
print(model.bias)
print("---")
print("Optimizer's state_dict:")
for var_name in optimizer.state_dict():
    print(var_name, "\t", optimizer.state_dict()[var_name])
This code will output the following:
Model's state_dict:
weight torch.Size([2, 5])
bias torch.Size([2])
Model weight:
Parameter containing:
tensor([[ 0.1328, 0.1360, 0.1553, -0.1838, -0.0316],
[ 0.0479, 0.1760, 0.1712, 0.2244, 0.1408]], requires_grad=True)
Model bias:
Parameter containing:
tensor([ 0.4112, -0.0733], requires_grad=True)
---
Optimizer's state_dict:
state {}
param_groups [{'lr': 0.001, 'momentum': 0.9, 'dampening': 0, 'weight_decay': 0, 'nesterov': False, 'params': [140695321443856, 140695321443928]}]
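The 'state' entry above is empty only because no optimizer step has been taken yet. A small sketch reusing the same Linear(5, 2) + SGD(momentum=0.9) setup, with a random input batch assumed purely for illustration, shows that after one step SGD stores a momentum buffer per parameter, keyed by parameter index.

```python
import torch
import torch.optim as optim

torch.manual_seed(0)
model = torch.nn.Linear(5, 2)
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

print(optimizer.state_dict()["state"])  # {} before any step

# One forward/backward/step updates the parameters and fills the optimizer state.
loss = model(torch.randn(4, 5)).sum()
loss.backward()
optimizer.step()

state = optimizer.state_dict()["state"]
print(sorted(state))                      # [0, 1] (indices for weight and bias)
print(state[0]["momentum_buffer"].shape)  # torch.Size([2, 5]), same as model.weight
```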
Note this is a minimal model. You could also try a stack of layers with torch.nn.Sequential:
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.Conv2d(A, B, C),
    torch.nn.Linear(H, D_out),
)
Note that only layers with learnable parameters (convolutional layers, linear layers, etc.) and registered buffers (batchnorm layers) have entries in the model's state_dict.
The optimizer object's state_dict holds the non-learnable part: information about the optimizer's state, as well as the hyperparameters used.
The rest of the story is the same. In the inference phase (when we use the model after training to make predictions), we predict based on the parameters we learned, so for inference we only need to save the parameters, model.state_dict():
torch.save(model.state_dict(), filepath)
And to use later
model.load_state_dict(torch.load(filepath))
model.eval()
Note: Don't forget the last line, model.eval(); it is crucial after loading the model for inference.
Also, don't try torch.save(model.parameters(), filepath): model.parameters() is just a generator object.
On the other hand, torch.save(model, filepath) saves the model object itself, but keep in mind that the model doesn't contain the optimizer's state_dict. Check the other excellent answer by @Jadiel de Armas to save the optimizer's state dict.
A common PyTorch convention is to save models using either a .pt or .pth file extension.
Save/Load Entire Model
Save:
path = "username/directory/lstmmodelgpu.pth"
torch.save(model, path)
Load:
(The model class must be defined somewhere)
model = torch.load(path)
model.eval()
If you want to save the model and resume training later:
Single GPU:
Save:
state = {
'epoch': epoch,
'state_dict': model.state_dict(),
'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)
Load:
checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']
Multiple GPU:
Save:
state = {
'epoch': epoch,
'state_dict': model.module.state_dict(),
'optimizer': optimizer.state_dict(),
}
savepath='checkpoint.t7'
torch.save(state,savepath)
Load:
checkpoint = torch.load('checkpoint.t7')
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
epoch = checkpoint['epoch']
# Don't wrap the model in DataParallel before loading the state dict, otherwise you will get an error
model = nn.DataParallel(model)  # skip this line if you want to load on a single GPU
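If a checkpoint was instead saved from the wrapper itself (model.state_dict() rather than model.module.state_dict()), every key carries a "module." prefix, and loading it into an unwrapped model fails with missing/unexpected keys. A minimal sketch of cleaning such a dict, assuming string parameter names as keys; strip_module_prefix is a hypothetical helper, not a PyTorch API.

```python
from collections import OrderedDict
from typing import Any, Mapping


def strip_module_prefix(state_dict: Mapping[str, Any], prefix: str = "module.") -> "OrderedDict[str, Any]":
    """Return a copy of state_dict with a leading DataParallel prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Only strip the prefix at the start of a key; leave other keys untouched.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Usage (assumed):
# model.load_state_dict(strip_module_prefix(checkpoint["state_dict"]))
```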
Saving locally
How you save your model depends on how you want to access it in the future. If you can call a new instance of the model class, then all you need to do is save/load the weights of the model with model.state_dict():
# Save:
torch.save(old_model.state_dict(), PATH)
# Load:
new_model = TheModelClass(*args, **kwargs)
new_model.load_state_dict(torch.load(PATH))
If you cannot for whatever reason (or prefer the simpler syntax), then you can save the entire model (actually a reference to the file(s) defining the model, along with its state_dict) with torch.save():
# Save:
torch.save(old_model, PATH)
# Load:
new_model = torch.load(PATH)
But since this is a reference to the location of the files defining the model class, this code is not portable unless those files are also ported in the same directory structure.
Saving to cloud - TorchHub
If you wish your model to be portable, you can easily allow it to be imported with torch.hub. If you add an appropriately defined hubconf.py file to a github repo, this can be easily called from within PyTorch to enable users to load your model with/without weights:
hubconf.py (github.com/repo_owner/repo_name)
dependencies = ['torch']
from my_module import mymodel as _mymodel
def mymodel(pretrained=False, **kwargs):
    return _mymodel(pretrained=pretrained, **kwargs)
Loading model:
new_model = torch.hub.load('repo_owner/repo_name', 'mymodel')
new_model_pretrained = torch.hub.load('repo_owner/repo_name', 'mymodel', pretrained=True)
pip install pytorch-lightning
Make sure your parent model subclasses pl.LightningModule instead of nn.Module.
Saving and loading checkpoints using pytorch lightning
import pytorch_lightning as pl

# MyLightningModule and hparams are your own definitions
model = MyLightningModule(hparams)
trainer = pl.Trainer()
trainer.fit(model)
trainer.save_checkpoint("example.ckpt")
new_model = MyLightningModule.load_from_checkpoint(checkpoint_path="example.ckpt")
These days everything is written in the official tutorial:
https://pytorch.org/tutorials/beginner/saving_loading_models.html
You have several options on how to save and what to save and all is explained in that tutorial.
I use this approach; I hope it's useful for you.
import torch
from transformers import RobertaForSequenceClassification

num_labels = len(test_label_cols)
robertaclassificationtrain = '/dbfs/FileStore/tables/PM/TC/roberta_model'
robertaclassificationpath = "/dbfs/FileStore/tables/PM/TC/ROBERTACLASSIFICATION"
model = RobertaForSequenceClassification.from_pretrained(robertaclassificationpath,
                                                         num_labels=num_labels)
model.cuda()
model.load_state_dict(torch.load(robertaclassificationtrain))
model.eval()
Here, my trained model's state_dict was already saved at the 'roberta_model' path. To save a trained model:
torch.save(model.state_dict(), '/dbfs/FileStore/tables/PM/TC/roberta_model')