huggingface transformers longformer optimizer warning AdamW - python

I get the warning below when I try to run the code from this page.
/usr/local/lib/python3.7/dist-packages/transformers/optimization.py:309: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
FutureWarning,
I am confused because the code doesn't seem to set an optimizer at all. The most likely place where the optimizer could be configured is below, but I don't know how to change it.
# define the training arguments
training_args = TrainingArguments(
    output_dir='/media/data_files/github/website_tutorials/results',
    num_train_epochs=5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",
    disable_tqdm=False,
    load_best_model_at_end=True,
    warmup_steps=200,
    weight_decay=0.01,
    logging_steps=4,
    fp16=True,
    logging_dir='/media/data_files/github/website_tutorials/logs',
    dataloader_num_workers=0,
    run_name='longformer-classification-updated-rtx3090_paper_replication_2_warm'
)

# instantiate the trainer class and check for available devices
trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_data,
    eval_dataset=test_data
)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
device
I tried another model, distilbert-base-uncased, with identical code, and it runs without any warnings.
Is this warning more specific to longformer?
How should I change the optimizer?

It can be used this way:
import torch_optimizer as optim
optim.AdamW(params, opt.learning_rate, (opt.optim_alpha, opt.optim_beta), opt.optim_epsilon, weight_decay=opt.weight_decay)

You need to add optim='adamw_torch'; the default is optim='adamw_hf'.
Refer here
Can you try the following:
# define the training arguments
training_args = TrainingArguments(
    optim='adamw_torch',
    # your training arguments
    ...
    ...
    ...
)
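Alternatively (a sketch not taken from the original answers): you can construct the PyTorch torch.optim.AdamW yourself and pass it to the Trainer through its optimizers argument, so the deprecated transformers implementation is never instantiated. The hyperparameters below just mirror the question's TrainingArguments and the library defaults; adjust as needed.
import torch
from transformers import Trainer

# Build the PyTorch AdamW explicitly (lr=5e-5 is the TrainingArguments default).
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_data,
    eval_dataset=test_data,
    optimizers=(optimizer, None),  # None lets the Trainer create its default LR scheduler
)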

Related

Weights and Biases error: The wandb backend process has shutdown

Running the colab linked below, I get the following error:
"The wandb backend process has shutdown"
I see nothing suspicious in the way the colab uses wandb and I couldn't find anyone with the same problem. Any help is greatly appreciated. I am using the latest version of wandb in colab.
This is where I set up wandb:
if WANDB:
    wandb.login()
and this is the part where I get the error:
#setup wandb if we're using it
if WANDB:
    experiment_name = os.environ.get("EXPERIMENT_NAME")
    group = experiment_name if experiment_name != "none" else wandb.util.generate_id()

cv_scores = []
oof_data_frame = pd.DataFrame()

for fold in range(1, config.folds + 1):
    print(f"Fold {fold}/{config.folds}", end="\n"*2)

    fold_directory = os.path.join(config.output_directory, f"fold_{fold}")
    make_directory(fold_directory)
    model_path = os.path.join(fold_directory, "model.pth")
    model_config_path = os.path.join(fold_directory, "model_config.json")
    checkpoints_directory = os.path.join(fold_directory, "checkpoints/")
    make_directory(checkpoints_directory)

    # Data collators are objects that will form a batch by using a list of dataset elements as input.
    collator = Collator(tokenizer=tokenizer, max_length=config.max_length)

    train_fold = train[~train["fold"].isin([fold])]
    train_dataset = Dataset(texts=train_fold["anchor"].values,
                            pair_texts=train_fold["target"].values,
                            contexts=train_fold["title"].values,
                            targets=train_fold["score"].values,
                            max_length=config.max_length,
                            sep=tokenizer.sep_token,
                            tokenizer=tokenizer)
    train_loader = DataLoader(dataset=train_dataset,
                              batch_size=config.batch_size,
                              num_workers=config.num_workers,
                              pin_memory=config.pin_memory,
                              collate_fn=collator,
                              shuffle=True,
                              drop_last=False)
    print(f"Train samples: {len(train_dataset)}")

    validation_fold = train[train["fold"].isin([fold])]
    validation_dataset = Dataset(texts=validation_fold["anchor"].values,
                                 pair_texts=validation_fold["target"].values,
                                 contexts=validation_fold["title"].values,
                                 targets=validation_fold["score"].values,
                                 max_length=config.max_length,
                                 sep=tokenizer.sep_token,
                                 tokenizer=tokenizer)
    validation_loader = DataLoader(dataset=validation_dataset,
                                   batch_size=config.batch_size*2,
                                   num_workers=config.num_workers,
                                   pin_memory=config.pin_memory,
                                   collate_fn=collator,
                                   shuffle=True,
                                   drop_last=False)
    print(f"Validation samples: {len(validation_dataset)}")

    model = Model(**config.model)
    if not os.path.exists(model_config_path):
        model.config.to_json_file(model_config_path)

    model_parameters = model.parameters()
    optimizer = get_optimizer(**config.optimizer, model_parameters=model_parameters)

    training_steps = len(train_loader) * config.epochs
    if "scheduler" in config:
        config.scheduler.parameters.num_training_steps = training_steps
        config.scheduler.parameters.num_warmup_steps = training_steps * config.get("warmup", 0)
        scheduler = get_scheduler(**config.scheduler, optimizer=optimizer, from_transformers=True)
    else:
        scheduler = None

    model_checkpoint = ModelCheckpoint(mode="min",
                                       delta=config.delta,
                                       directory=checkpoints_directory,
                                       overwriting=True,
                                       filename_format="checkpoint.pth",
                                       num_candidates=1)

    if WANDB:
        wandb.init()
        #wandb.init(group=group, name=f"fold_{fold}", config=config)

    (train_loss, train_metrics), (validation_loss, validation_metrics, validation_outputs) = training_loop(
        model=model,
        optimizer=optimizer,
        scheduler=scheduler,
        scheduling_after=config.scheduling_after,
        train_loader=train_loader,
        validation_loader=validation_loader,
        epochs=config.epochs,
        gradient_accumulation_steps=config.gradient_accumulation_steps,
        gradient_scaling=config.gradient_scaling,
        gradient_norm=config.gradient_norm,
        validation_steps=config.validation_steps,
        amp=config.amp,
        debug=config.debug,
        verbose=config.verbose,
        device=config.device,
        recalculate_metrics_at_end=True,
        return_validation_outputs=True,
        logger="tqdm")

    if WANDB:
        wandb.finish()

    if config.save_model:
        model_state = model.state_dict()
        torch.save(model_state, model_path)
        print(f"Model's path: {model_path}")

    validation_fold["prediction"] = validation_outputs.to("cpu").numpy()
    oof_data_frame = pd.concat([oof_data_frame, validation_fold])

    cv_monitor_value = validation_loss if config.cv_monitor_value == "loss" else validation_metrics[config.cv_monitor_value]
    cv_scores.append(cv_monitor_value)

    del model, optimizer, validation_outputs, train_fold, validation_fold
    torch.cuda.empty_cache()
    gc.collect()

    print(end="\n"*6)
TL;DR: Check whether the generated run id is unique in the wandb project space you are using.
Explanation
You can check the exact reason this happened in the log files under the wandb folder for the specific run id. I had the same issue with Error communicating with wandb process and The wandb backend process has shutdown.
My problem was that I was assigning the run id of a specific instance that already existed and rerunning the whole search space, but the run id has to be unique. Using name in init is generally a safer bet if you don't intend to continue the previous run (which is possible if you indicate so in the init method).
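A minimal sketch of what that looks like (the project name is a placeholder; group and fold come from the question's loop): give every run a unique id or name so a rerun never collides with an existing run in the project.
import wandb

run = wandb.init(
    project="my-project",         # placeholder project name
    group=group,                  # grouping key from the question's code
    name=f"fold_{fold}",          # human-readable, unique per fold
    id=wandb.util.generate_id(),  # guaranteed-unique run id
    reinit=True,                  # allow several init() calls in one process
)
# ... training for this fold ...
run.finish()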
You can try running wandb in offline mode to see if that helps, and later do wandb sync.
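For example (assuming you only want local logging and a later upload):
import os
os.environ["WANDB_MODE"] = "offline"  # runs are written locally under ./wandb
# ... train as usual, then upload later from a shell with: wandb sync wandb/offline-run-*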
The solution that worked for me was to run !wandb login --relogin.

Python Unittest for a PyTorch Model

I have the following function that I am struggling with:
def load_trained_bert(
    num_classes: int, path_to_model: Union[Path, str]
) -> Tuple[BertForSequenceClassification, device]:
    """Returns a bert model and device from four required model files of a folder

    Parameters
    ----------
    num_classes: int
        Number of output layers in the bert model
    path_to_model: Union[Path, str]
        Folder where the four required model files are

    Returns
    -------
    Tuple[BertForSequenceClassification, device]
        BERT model in evaluation mode and device
    """
    # Set device
    device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

    # Initialize BERT
    model = BertForSequenceClassification.from_pretrained(
        path_to_model,
        num_classes=num_classes,
        output_attentions=False,
        output_hidden_states=False,
    )

    # Load fine tuned model weights
    weight_file = get_weight_file(path_to_model)
    path_to_weights = os.path.join(path_to_model, weight_file)
    model.load_state_dict(torch.load(path_to_weights, map_location=torch.device("cpu")))

    # Send model to device
    model.to(device)

    # Set model to inference mode
    model.eval()
    return model, device
I am in general not sure how to test this function, but I thought it would be a good idea just to check the parameters I call the function with:
class LoadModelTest(TestCase):
    @patch("abox.util.model_conversion.get_weight_file", return_value="test.model")
    def test_load_trained_bert(self, get_weight_file):
        BertForSequenceClassification.from_pretrained = Mock()
        load_trained_bert(num_classes=16, path_to_model="./model")
        BertForSequenceClassification.from_pretrained.assert_called_with(
            "./model",
            num_classes=16,
            output_attentions=False,
            output_hidden_states=False,
        )
This results in the following error:
FileNotFoundError: [Errno 2] No such file or directory: './model\\test.model'
Now it's getting difficult... I have no idea what to do with the following snippet:
model.load_state_dict(torch.load(path_to_weights, map_location=torch.device("cpu")))
Can anyone help me here?
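One way to get past that FileNotFoundError (a sketch, not a definitive answer: it assumes load_trained_bert and BertForSequenceClassification are importable in your test module, and it keeps the patch target abox.util.model_conversion from the decorator above) is to also patch torch.load so the test never touches the filesystem, and then assert that the mocked state dict is forwarded to the model:
from unittest import TestCase
from unittest.mock import Mock, patch

class LoadModelTest(TestCase):
    @patch("torch.load", return_value={"fake": "state_dict"})
    @patch("abox.util.model_conversion.get_weight_file", return_value="test.model")
    def test_load_trained_bert(self, get_weight_file, torch_load):
        mock_model = Mock()
        with patch.object(BertForSequenceClassification, "from_pretrained", return_value=mock_model) as from_pretrained:
            load_trained_bert(num_classes=16, path_to_model="./model")
        from_pretrained.assert_called_with(
            "./model",
            num_classes=16,
            output_attentions=False,
            output_hidden_states=False,
        )
        # The state dict returned by the mocked torch.load must reach load_state_dict unchanged.
        mock_model.load_state_dict.assert_called_once_with({"fake": "state_dict"})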

Why is it not recommended to save the optimizer, model etc as pickable/dillable objs in PyTorch but instead get the state dicts and load them?

Why is it recommended to save the state dicts and load them instead of saving stuff with dill for example and then just getting the usable objects immediately?
I think I've done that without many issues, and it saves the user code.
But instead we are recommended to do something like:
def _load_model_and_optimizer_from_checkpoint(args: Namespace, training: bool = True) -> Namespace:
    """
    based from: https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html
    """
    import torch
    from torch import optim
    import torch.nn as nn

    # model = Net()
    args.model = nn.Linear(in_features=1, out_features=1)  # placeholder dims; nn.Linear requires sizes
    # optimizer = optim.SGD(args.model.parameters(), lr=0.001, momentum=0.9)
    optimizer = optim.Adam(args.model.parameters(), lr=0.001)
    # scheduler...

    checkpoint = torch.load(args.PATH)
    args.model.load_state_dict(checkpoint['model_state_dict'])
    optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
    args.epoch_num = checkpoint['epoch_num']
    args.loss = checkpoint['loss']

    args.model.train() if training else args.model.eval()
    return args
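For completeness, here is a sketch of the matching save side from the same PyTorch recipe the docstring links to (args.PATH and the other names are the question's own placeholders):
torch.save({
    'epoch_num': args.epoch_num,
    'model_state_dict': args.model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'loss': args.loss,
}, args.PATH)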
For example I've saved:
def save_for_meta_learning(args: Namespace, ckpt_filename: str = 'ckpt.pt'):
    if is_lead_worker(args.rank):
        import dill
        args.logger.save_current_plots_and_stats()

        # - ckpt
        assert uutils.xor(args.training_mode == 'epochs', args.training_mode == 'iterations')
        f: nn.Module = get_model_from_ddp(args.base_model)
        # pickle vs torch.save https://discuss.pytorch.org/t/advantages-disadvantages-of-using-pickle-module-to-save-models-vs-torch-save/79016
        args_pickable: Namespace = uutils.make_args_pickable(args)

        torch.save({'training_mode': args.training_mode,  # assert uutils.xor(args.training_mode == 'epochs', args.training_mode == 'iterations')
                    'it': args.it,
                    'epoch_num': args.epoch_num,
                    'args': args_pickable,  # some versions of this might not have args!
                    'meta_learner': args.meta_learner,
                    'meta_learner_str': str(args.meta_learner),  # added later, to make it easier to check what optimizer was used
                    'f': f,
                    'f_state_dict': f.state_dict(),  # added later, to make it easier to check what optimizer was used
                    'f_str': str(f),  # added later, to make it easier to check what optimizer was used
                    # 'f_modules': f._modules,
                    # 'f_modules_str': str(f._modules),
                    'outer_opt': args.outer_opt,  # added later, to make it easier to check what optimizer was used
                    'outer_opt_state_dict': args.outer_opt.state_dict(),  # added later, to make it easier to check what optimizer was used
                    'outer_opt_str': str(args.outer_opt)  # added later, to make it easier to check what optimizer was used
                    },
                   pickle_module=dill,
                   f=args.log_root / ckpt_filename)
then loaded:
def get_model_opt_meta_learner_to_resume_checkpoint_resnets_rfs(args: Namespace,
                                                                path2ckpt: str,
                                                                filename: str,
                                                                device: Optional[torch.device] = None
                                                                ) -> tuple[nn.Module, optim.Optimizer, MetaLearner]:
    """
    Get the model, optimizer, meta_learner to resume training from checkpoint.

    Examples:
        - see: _resume_from_checkpoint_meta_learning_for_resnets_rfs_test
    """
    import uutils
    path2ckpt: Path = Path(path2ckpt).expanduser() if isinstance(path2ckpt, str) else path2ckpt.expanduser()
    ckpt: dict = torch.load(path2ckpt / filename, map_location=torch.device('cpu'))

    # args_ckpt: Namespace = ckpt['args']
    training_mode = ckpt.get('training_mode')
    if training_mode is not None:
        assert uutils.xor(training_mode == 'epochs', training_mode == 'iterations')
        if training_mode == 'epochs':
            args.epoch_num = ckpt['epoch_num']
        else:
            args.it = ckpt['it']

    # - get meta-learner
    meta_learner: MetaLearner = ckpt['meta_learner']

    # - get model
    model: nn.Module = meta_learner.base_model

    # - get outer-opt
    outer_opt_str = ckpt.get('outer_opt_str')
    if outer_opt_str is not None:
        # use the string to create the optimizer, load the state dict, etc.
        outer_opt: optim.Optimizer = get_optimizer(outer_opt_str)
        outer_opt_state_dict: dict = ckpt['outer_opt_state_dict']
        outer_opt.load_state_dict(outer_opt_state_dict)
    else:
        # this is not ideal, but since Adam keeps an exponential moving average for its adaptive learning rate,
        # hopefully this doesn't screw up my checkpoint too much
        outer_opt: optim.Optimizer = optim.Adam(model.parameters(), lr=args.outer_lr)

    # - device setup
    if device is not None:
        # if torch.cuda.is_available():
        #     meta_learner.base_model = meta_learner.base_model.cuda()
        meta_learner.base_model = meta_learner.base_model.to(device)
    return model, outer_opt, meta_learner
without issues.
Related:
Save and load model optimizer state
pytorch save and load model
Save and load a Pytorch model
save and load unserialized pytorch pretrained model
https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html
Why is it not recommended to save the optimizer, model etc as pickable/dillable objs in PyTorch but instead get the state dicts and load them?
https://discuss.pytorch.org/t/why-is-it-not-recommended-to-save-the-optimizer-model-etc-as-pickable-dillable-objs-in-pytorch-but-instead-get-the-state-dicts-and-load-them/137933

How to write serving input function for Tensorflow model trained without using Estimators?

I have a model trained on a single machine without using Estimator, and I'm looking to serve the final trained model on Google Cloud AI Platform (ML Engine). I exported the frozen graph as a SavedModel using SavedModelBuilder and deployed it on the AI Platform. It works fine for small input images, but to accept large input images for online prediction I need to change it to accept b64-encoded strings ({'image_bytes': {'b64': base64.b64encode(jpeg_data).decode()}}), which are converted to the required tensor by a serving_input_fn when using Estimators.
What options do I have if I am not using an Estimator? If I have a frozen graph or SavedModel being created from SavedModelBuilder, is there a way to have something similar to an estimator's serving_input_fn when exporting/ saving?
Here's the code I'm using for exporting:
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import tag_constants

export_dir = 'serving_model/'
graph_pb = 'model.pb'

builder = tf.saved_model.builder.SavedModelBuilder(export_dir)

with tf.gfile.GFile(graph_pb, "rb") as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

sigs = {}

with tf.Session(graph=tf.Graph()) as sess:
    # name="" is important to ensure we don't get spurious prefixing
    tf.import_graph_def(graph_def, name="")
    g = tf.get_default_graph()
    inp = g.get_tensor_by_name("image_bytes:0")
    out_f1 = g.get_tensor_by_name("feature_1:0")
    out_f2 = g.get_tensor_by_name("feature_2:0")

    sigs[signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY] = \
        tf.saved_model.signature_def_utils.predict_signature_def(
            {"image_bytes": inp}, {"f1": out_f1, "f2": out_f2})

    builder.add_meta_graph_and_variables(sess,
                                         [tag_constants.SERVING],
                                         strip_default_attrs=True,
                                         signature_def_map=sigs)

builder.save()
Use a @tf.function to specify a serving signature. Here's an example that calls Keras:
class ExportModel(tf.keras.Model):
    def __init__(self, model):
        super().__init__()
        self.model = model

    @tf.function(input_signature=[
        tf.TensorSpec([None,], dtype='int32', name='a'),
        tf.TensorSpec([None,], dtype='int32', name='b')
    ])
    def serving_fn(self, a, b):
        return {
            'pred': self.model({'a': a, 'b': b})  # , steps=1)
        }

    def save(self, export_path):
        sigs = {
            'serving_default': self.serving_fn
        }
        tf.keras.backend.set_learning_phase(0)  # inference only
        tf.saved_model.save(self, export_path, signatures=sigs)

sm = ExportModel(model)
sm.save(EXPORT_PATH)
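If it helps, you can sanity-check the exported signature by reloading the SavedModel (EXPORT_PATH as above; this is just a verification snippet, not part of the original answer):
reloaded = tf.saved_model.load(EXPORT_PATH)
print(reloaded.signatures["serving_default"].structured_input_signature)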
First, load your already exported SavedModel with
import tensorflow as tf
loaded_model = tf.saved_model.load(MODEL_DIR)
Then, wrap it with a new Keras model that takes base64 input
class Base64WrapperModel(tf.keras.Model):
    def __init__(self, model):
        super(Base64WrapperModel, self).__init__()
        self.inner_model = model

    @tf.function
    def call(self, base64_input):
        str_input = tf.io.decode_base64(base64_input)
        return self.inner_model(str_input)

wrapper_model = Base64WrapperModel(loaded_model)
Finally, save your wrapped model with Keras API
wrapper_model.save(EXPORT_DIR)
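If the request format requires the signature input to be named image_bytes (as in the question's {'image_bytes': {'b64': ...}} payload), a hedged variant is to export with an explicit input signature instead of the plain Keras save; the shape and dtype below are assumptions:
@tf.function(input_signature=[tf.TensorSpec([None], dtype=tf.string, name="image_bytes")])
def serving_fn(base64_input):
    str_input = tf.io.decode_base64(base64_input)
    return wrapper_model.inner_model(str_input)

tf.saved_model.save(wrapper_model, EXPORT_DIR, signatures={"serving_default": serving_fn})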

Watson generated Pytorch results in: "ValueError: optimizer got an empty parameter list"

I am experimenting with Watson Neural Network Modeler.
I've created a model from the built-in demo "Single Convolution layer on MNIST". The only customization I did was to specify the training data files.
I then exported the PyTorch code and I am trying to run it on my local computer.
The generated code is pretty readable. The relevant code excerpt is:
# Define network architecture
class Net(nn.Module):
    def __init__(self, inp_c):
        super(Net, self).__init__()

    def forward(self, ImageData_4, target):
        Convolution2D_9 = self.Convolution2D_9(ImageData_4)
        ReLU_1 = self.ReLU_1(Convolution2D_9)
        Pooling2D_8 = self.Pooling2D_8(ReLU_1)
        Flatten_2 = Pooling2D_8.view(-1, 10816)
        Dense_3 = self.Dense_3(Flatten_2)
        Softmax_5 = self.Softmax_5(Dense_3)
        Accuracy_6 = torch.topk(Softmax_5, 1)[0]
        CrossEntropyLoss_7 = self.CrossEntropyLoss_7(Softmax_5, target)
        return Softmax_5, Accuracy_6

# Model Initialization
inp_c = 1
model = Net(inp_c)
model.cuda()

# Define optimizer
learning_rate = 0.001000
decay = 0.000000
beta_1 = 0.900000
beta_2 = 0.999000

optim = optim.Adam(
    model.parameters(),
    lr=learning_rate,
    betas=(beta_1, beta_2),
    weight_decay=decay)
I am getting the error:
"ValueError: optimizer got an empty parameter list"
on the optim = optim.Adam() statement.
Is there any Watson user/expert out there who can shed some light on this issue? I am basically running the demo; it was not supposed to fail.
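For what it's worth, this error generally means the module owns no parameters: the generated __init__ above only calls super() and never creates the layers that forward references, so model.parameters() is empty by the time it reaches optim.Adam. A hedged sketch of what the initializer would need to define (the 64-filter and 10816 sizes are inferred from the view(-1, 10816) flatten on 28x28 MNIST input, not taken from Watson's actual export):
class Net(nn.Module):
    def __init__(self, inp_c):
        super(Net, self).__init__()
        # Every layer used in forward() must be registered here so that
        # model.parameters() is non-empty when the optimizer is built.
        self.Convolution2D_9 = nn.Conv2d(inp_c, 64, kernel_size=3)  # 28x28 -> 26x26x64
        self.ReLU_1 = nn.ReLU()
        self.Pooling2D_8 = nn.MaxPool2d(2)                           # 26x26x64 -> 13x13x64
        self.Dense_3 = nn.Linear(13 * 13 * 64, 10)                   # 13*13*64 = 10816
        self.Softmax_5 = nn.Softmax(dim=1)
        self.CrossEntropyLoss_7 = nn.CrossEntropyLoss()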
Thanks!
