Difference between transformers schedulers and Pytorch schedulers - python

The transformers library also provides its own learning-rate schedulers, such as get_constant_schedule, get_constant_schedule_with_warmup, etc. These again return torch.optim.lr_scheduler.LambdaLR (a torch scheduler). Is warmup_steps the only difference between the two?
How can we create a custom transformers-style scheduler, similar to other torch schedulers like lr_scheduler.MultiplicativeLR, lr_scheduler.StepLR, and lr_scheduler.ExponentialLR?

You can create a custom scheduler by writing a class with a method that takes an optimizer and edits the learning-rate values in its param_groups.
To see how to structure such a class, take a look at how PyTorch implements its own schedulers and reuse the same methods, changing the functionality to your liking.
The permalink I found to be a good reference is over here.
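For context on the first question: as noted, these helpers return a LambdaLR, so the warmup behaviour lives entirely in the lambda they build. A rough sketch (not the verbatim library source) of what get_constant_schedule_with_warmup does:
from torch.optim.lr_scheduler import LambdaLR

def constant_schedule_with_warmup(optimizer, num_warmup_steps, last_epoch=-1):
    # Scale the base LR linearly from 0 to 1 during warmup, then keep it constant.
    def lr_lambda(current_step):
        if current_step < num_warmup_steps:
            return float(current_step) / float(max(1.0, num_warmup_steps))
        return 1.0
    return LambdaLR(optimizer, lr_lambda, last_epoch=last_epoch)
So beyond the particular lr_lambda each helper builds (warmup, constant, linear decay, ...), there is no separate scheduler machinery.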
EDIT After comments:
This is a template you can use:
from torch.optim import lr_scheduler

class MyScheduler(lr_scheduler._LRScheduler):  # inheriting from _LRScheduler is optional
    def __init__(self, optimizer, gamma=0.9, last_epoch=-1, verbose=False):
        # Store whatever you need for updating the learning rate (gamma, step size,
        # warmup steps, ...). The base class keeps the optimizer as self.optimizer
        # and calls self.get_lr() on every step().
        self.gamma = gamma
        super(MyScheduler, self).__init__(optimizer, last_epoch, verbose)

    def get_lr(self):
        # Return the new learning rate for every param group; the base class
        # writes these values back into optimizer.param_groups.
        return [group["lr"] * self.gamma for group in self.optimizer.param_groups]
You can add more methods for extra functionality. Or you can skip the class entirely and just use a plain function to update your learning rate: it takes the optimizer, changes optimizer.param_groups[0]["lr"] (or every group's "lr"), and you keep training with the same optimizer. A minimal sketch of that approach follows below.
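A minimal sketch of the function-only approach (the exponential decay rule here is just an example assumption):
def update_lr(optimizer, epoch, base_lr=1e-3, decay=0.95):
    # Manually set the learning rate of every param group; call this once per epoch.
    new_lr = base_lr * (decay ** epoch)
    for group in optimizer.param_groups:
        group["lr"] = new_lr
    return new_lr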

Related

multi-output keras model with a callback that monitors two metrics

I have a tf model that has two outputs, as indicated by this model.compile():
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=7e-4),
              loss={"BV": tf.keras.losses.MeanAbsoluteError(), "Rsp": tf.keras.losses.MeanAbsoluteError()},
              metrics={"BV": [tf.keras.metrics.RootMeanSquaredError(name="RMSE"), tfa.metrics.r_square.RSquare(name="R2")],
                       "Rsp": [tf.keras.metrics.RootMeanSquaredError(name="RMSE"), tfa.metrics.r_square.RSquare(name="R2")]})
I would like to use the ModelCheckpoint callback, which should monitor a sum of val_BV_R2 and val_Rsp_R2. I am able to run the callback like this:
save_best_model = tf.keras.callbacks.ModelCheckpoint("xyz.hdf5", monitor="val_Rsp_R2")
However, I don't know how to make it to save the model with the highest sum of two metrics.
According to the tf.keras.callbacks.ModelCheckpoint documentation, only one metric can be monitored at a time.
One way to achieve what you want could be to define an additional custom metric that computes the sum of the two metrics. Then you could monitor your custom metric and save the checkpoints as you are already doing. However, this is a bit complicated because of the multiple outputs.
Alternatively, you could define a custom callback that does the combining. Below is a simple example of this second option. It should work (sorry, I can't test it right now):
class CombineCallback(tf.keras.callbacks.Callback):
    def __init__(self, **kwargs):
        super(CombineCallback, self).__init__(**kwargs)

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Add the combined metric to the shared logs dict so that callbacks
        # running after this one (e.g. ModelCheckpoint) can monitor it.
        logs['combine_metric'] = 0.5*logs['val_BV_R2'] + 0.5*logs['val_Rsp_R2']
Inside the callback you should be able to access your metrics directly with logs['name_of_my_metric'] or through the get function logs.get("name_of_my_metric").
Also I multiplied by 0.5 to leave the combined metric approximately in the same range, but see if this works for your case.
To use it, list it before a ModelCheckpoint that monitors the combined metric (callbacks see the same logs dict, so the value added by CombineCallback is visible to the checkpoint):
combine_callback = CombineCallback()
save_best_model = tf.keras.callbacks.ModelCheckpoint("xyz.hdf5", monitor="combine_metric", mode="max")
model.fit(..., callbacks=[combine_callback, save_best_model])
More information can be found at the Examples of Keras callback applications.

How to make pytorch lightning module have injected, nested models?

I have some nets, such as the following (augmented) resnet18:
import torch.nn as nn
from torchvision import models

num_classes = 10
resnet = models.resnet18(pretrained=True)
for param in resnet.parameters():
    param.requires_grad = True
num_ftrs = resnet.fc.in_features
resnet.fc = nn.Linear(num_ftrs, num_classes)
And I want to use them inside a lightning module, and have it handle all optimizations, to_device, stages and so on. In other words, I want to register those modules for my lightning module.
I also want to be able to access their public members.
class MyLightning(LightningModule):
    def __init__(self, resnet):
        super().__init__()
        self._resnet = resnet
        self._criterion = lambda x: 1.0

    def forward(self, x):
        resnet_out = self._resnet(x)
        loss = self._criterion(resnet_out)
        return loss

my_lightning = MyLightning(resnet)
The above doesn't optimize any parameters.
Trying
def __init__(self, resnet):
    ...
    _layers = list(resnet.children())[:-1]
    self._resnet = nn.Sequential(*_layers)
doesn't take resnet.fc into account, and it also doesn't look like the intended way of nesting models inside PyTorch Lightning.
How to nest models in pytorch lightning, and have them fully accessible and handled by the framework?
The training loop and optimization process are handled by the Trainer class. You can use it by initializing a new instance:
>>> trainer = Trainer()
And wrapping your PyTorch Lightning module with it. This way you can perform fitting, tuning, validating, and testing on that instance provided a DataLoader or LightningDataModule:
>>> trainer.fit(my_lightning, train_dataloader, val_dataloader)
You will have to implement the following functions on your Lightning module (i.e. in your case MyLightning):
Name                   Description
__init__               Define computations here
forward                Use for inference only (separate from training_step)
training_step          the complete training loop
validation_step        the complete validation loop
test_step              the complete test loop
predict_step           the complete prediction loop
configure_optimizers   define optimizers and LR schedulers
Source: the LightningModule documentation page.
Keep in mind that a LightningModule is an nn.Module, so whenever you assign an nn.Module as an attribute of a LightningModule in its __init__, that module ends up registered as a sub-module of the parent PyTorch Lightning module, and its parameters are therefore included in self.parameters() and visible to the optimizer you return from configure_optimizers.
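A minimal sketch putting this together, assuming a classification setup (the loss, optimizer, and learning rate are example choices, not something prescribed by Lightning):
import torch
import torch.nn as nn
from pytorch_lightning import LightningModule

class MyLightning(LightningModule):
    def __init__(self, resnet):
        super().__init__()
        # Assigning the nn.Module here registers it, so self.parameters()
        # includes the resnet weights (the replaced fc layer included).
        self._resnet = resnet
        self._criterion = nn.CrossEntropyLoss()

    def forward(self, x):
        return self._resnet(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = self._criterion(self(x), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
You can still reach the injected model's public members through the attribute, e.g. my_lightning._resnet.fc.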
A PyTorch model should inherit from nn.Module, so first find resnet18 in PyTorch; then you can use it as-is or revise it yourself.
The original resnet code is in this path: ...\python\Lib\site-packages\torchvision\models\resnet.py; you import the resnet network from here, so you can use it directly.
There you will find the original code:
class ResNet(nn.Module):...
https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py#L166
And import it like
from torchvision.models import ResNet
Finally, you can inherit from ResNet
class MyLightning(ResNet):

How to select subset of metrics to log on commandline when using model.fit() in Keras & Tensorflow

I have added lots of metrics to track the performance of my multi-class segmentation model in Keras and Tensorflow. These metrics include class-wise and aggregated metric functions. Tensorboard now contains everything I want, but my command-line output looks overloaded. I would like to remove the class-wise metrics from the command-line output while keeping them in Tensorboard. Is that possible?
model.compile(loss=dice_loss,
              metrics=[f1score, f1score_class0, f1score_class1, f1score_class2])
Is it possible when implementing train_step and test_step on my own? Would I need to implement the training loop from scratch?
This can easily be done using callbacks. Namely:
- write a callback that prints only the metrics you are interested in (on how to write callbacks, see this tutorial; a minimal sketch follows below),
- in model.fit, set verbose=0 and add your callback to callbacks,
- run the training.
As far as I know, TensorBoard uses all logs, so it will still get all metrics from all epochs.
If you do not need any information printed during training at all, you may just set verbose=0.
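A minimal sketch of such a callback, assuming the metric names you want on the command line are known up front (the names below are placeholders):
import tensorflow as tf

class SelectiveLogger(tf.keras.callbacks.Callback):
    def __init__(self, metrics_to_print=("loss", "f1score")):
        super().__init__()
        self.metrics_to_print = metrics_to_print

    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Print only the selected metrics; everything else still reaches TensorBoard.
        selected = {k: v for k, v in logs.items() if k in self.metrics_to_print}
        print(f"Epoch {epoch + 1}: " + ", ".join(f"{k}={v:.4f}" for k, v in selected.items()))
Use it with model.fit(..., verbose=0, callbacks=[SelectiveLogger(), tensorboard_callback]).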
Based on the answer of Ivan K. I came up with another idea:
The keras.callbacks.ProgbarLogger is responsible for logging to the command line.
It is possible to subclass Keras' ProgbarLogger and add the subclass to the callbacks. This prevents Keras from adding the default ProgbarLogger and therefore replaces it. Just implement all the methods that receive the logs as an argument, filter the logs, and pass the filtered logs on to the corresponding parent class method.
This example removes log entries whose keys contain any of the substrings in the opt_out list.
import tensorflow as tf

class CustomProgbarLogger(tf.keras.callbacks.ProgbarLogger):
    def __init__(self, count_mode="samples", stateful_metrics=None, opt_out=[]):
        super().__init__(count_mode=count_mode, stateful_metrics=stateful_metrics)
        self.opt_out = opt_out

    def _filter(self, logname):
        return all(word not in logname for word in self.opt_out)

    def _filter_logs(self, logs):
        return logs and {key: value for key, value in logs.items() if self._filter(key)}

    def on_train_batch_end(self, batch, logs=None):
        super().on_train_batch_end(batch, self._filter_logs(logs))

    def on_test_batch_end(self, batch, logs=None):
        super().on_test_batch_end(batch, self._filter_logs(logs))

    def on_epoch_end(self, epoch, logs=None):
        super().on_epoch_end(epoch, self._filter_logs(logs))

    def on_test_end(self, logs=None):
        super().on_test_end(self._filter_logs(logs))

    def on_predict_end(self, logs=None):
        super().on_predict_end(self._filter_logs(logs))
This example will remove every log entry whose key contains one of the strings class, precision, or recall.
progbar_callback = CustomProgbarLogger(opt_out=["class", "precision", "recall"])
model.fit(dataset, callbacks=[progbar_callback, tensorboard_callback])

How do we create a reusable block that share architecture in a single model but learn different set of weight in the single model in Keras?

I am using tensorflow.keras and want to know if it is possible to create reusable blocks of built-in Keras layers. For example, I would like to repeatedly use the same set of layers (each occurrence able to learn its own weights) at different positions in a model. I would like to use the following block at different times in my model.
keep_prob_ = 0.5
input_features = Input(shape=(29, 1664))
Imortant_features = SelfAttention(activation='tanh',
                                  kernel_regularizer=tf.keras.regularizers.l2(0.),
                                  kernel_initializer='glorot_uniform')(input_features)
drop3 = tf.keras.layers.Dropout(keep_prob_)(Imortant_features)
Layer_norm_feat = tf.keras.layers.Add()([input_features, drop3])
Layer_norm = tf.keras.layers.LayerNormalization(axis=-1)(Layer_norm_feat)
ff_out = tf.keras.layers.Dense(Layer_norm.shape[2], activation='relu')(Layer_norm)
ff_out = tf.keras.layers.Dense(Layer_norm.shape[2])(ff_out)
drop4 = tf.keras.layers.Dropout(keep_prob_)(ff_out)
Layer_norm_input = tf.keras.layers.Add()([Layer_norm, drop4])
Attention_block_out = tf.keras.layers.LayerNormalization(axis=-1)(Layer_norm_input)
intraEpoch_att_block = tf.keras.Model(inputs=input_features, outputs=Attention_block_out)
I have read about creating custom layers in Keras but I did not find the documentation clear enough. I want to reuse a sub-model that learns a different set of weights at each position, within a single functional API model in tensorflow.keras.
Use this code (I removed SelfAttention, so add it back):
import tensorflow as tf

class my_model(tf.keras.layers.Layer):
    def __init__(self):
        super(my_model, self).__init__()
        keep_prob_ = 0.5
        input_features = tf.keras.layers.Input(shape=(29, 1664))
        drop3 = tf.keras.layers.Dropout(keep_prob_)(input_features)
        Layer_norm_feat = tf.keras.layers.Add()([input_features, drop3])
        Layer_norm = tf.keras.layers.LayerNormalization(axis=-1)(Layer_norm_feat)
        ff_out = tf.keras.layers.Dense(Layer_norm.shape[2], activation='relu')(Layer_norm)
        ff_out = tf.keras.layers.Dense(Layer_norm.shape[2])(ff_out)
        drop4 = tf.keras.layers.Dropout(keep_prob_)(ff_out)
        Layer_norm_input = tf.keras.layers.Add()([Layer_norm, drop4])
        Attention_block_out = tf.keras.layers.LayerNormalization(axis=-1)(Layer_norm_input)
        # Each instance builds its own inner Model, so each instance gets its own weights.
        self.intraEpoch_att_block = tf.keras.Model(inputs=input_features, outputs=Attention_block_out)

    def call(self, inp, training=False):
        # Forward the training flag so the Dropout layers behave correctly.
        x = self.intraEpoch_att_block(inp, training=training)
        return x
model1 = my_model()
model2 = my_model()
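A usage sketch showing the two instances reused at different positions in one functional model (the outer input shape simply mirrors the block's expected shape):
outer_in = tf.keras.Input(shape=(29, 1664))
x = model1(outer_in)   # first block, with its own weights
x = model2(x)          # second block, a separate set of weights
outer_model = tf.keras.Model(inputs=outer_in, outputs=x)
outer_model.summary()
Because model1 and model2 are distinct layer instances, they are trained independently; reusing the same instance twice would instead share weights.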

Alter Keras model for validation step

I have a model that differs during the training and inference. More precisely, it is a SSD (Single Shot Detector) that requires additional DetectionOutput layer to be added on the top of its training counterpart. In Caffe, one can use the 'include' parameter in the layer definition to turn layers on/off.
But what should I do after having defined and compiled the model, if I wish to run validation after each epoch (inside a callback)?
I cannot add DetectionOutput during the training, since it is not compatible with the input to the loss.
I also would like to avoid creation of DetectionOutput layer somewhere inside callback or a custom metric, since it requires sensible hyperparams and I would like to keep the model creation logic inside the dedicated module.
In the following example the model is created for inference, so the DetectionOutput layer is present and the evaluation runs just fine:
model, _, _ = build_model(input_shape=(args.input_height, args.input_width, 3),
                          n_classes=num_classes,
                          mode='inference')
model.load_weights(args.model, by_name=True)

evaluation = SSDEvaluation(model=model,
                           evaluator=PascalDetectionEvaluator(categories),
                           data_files=[args.eval_data])
metrics = evaluation.evaluate()
But this callback does not work properly, because during training the model does not have DetectionOutput:
class SSDTensorboard(Callback):
    def __init__(self, evaluator, eval_data):
        self.evaluator = evaluator
        self.eval_data = eval_data

    def on_train_begin(self, logs={}):
        self.metrics = []

    def on_epoch_end(self, epoch, logs={}):
        evaluation = SSDEvaluation(self.model, self.evaluator, self.eval_data)
        metrics = evaluation.evaluate()
        self.metrics.append(metrics)
What would be the proper (pythonic, keratonic etc.) way to run the training as usual, but perform validation step on the altered model with the same weights? Maybe, having a separate model for validation with shared weights?
You should use the headless (without DetectionOutput) model for training, but provide a model with the top layer to the evaluation:
def add_detection_output(model):
    # make validation/inference model here
    ...

evaluation = SSDEvaluation(model=add_detection_output(model),
                           evaluator=PascalDetectionEvaluator(categories),
                           data_files=[args.eval_data])
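A rough sketch of what add_detection_output could look like, assuming your SSD implementation provides some detection/decoding layer (DetectionOutputLayer and its argument below are placeholders, not a fixed API). Because the new model is built from the training model's own input and output tensors, the two models share the same weights:
def add_detection_output(model):
    # Stack the (hypothetical) detection layer on top of the training outputs;
    # no weights are copied, both models reference the same underlying layers.
    detections = DetectionOutputLayer(confidence_threshold=0.5)(model.output)
    return Model(inputs=model.input, outputs=detections)  # keras.Model / tf.keras.Model, matching your imports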
Avoid using the training model inside the callback; let the evaluation object hold the reference to the validation model:
class SSDTensorboard(Callback):
    def __init__(self, evaluation):
        self.evaluation = evaluation

    def on_epoch_end(self, epoch, logs={}):
        metrics = self.evaluation.evaluate()
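Wiring it up would then look something like this (a sketch; the fit arguments are whatever you already use for training):
ssd_callback = SSDTensorboard(evaluation)
model.fit(train_data, epochs=num_epochs, callbacks=[ssd_callback])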
