Python Vetiver model - use alternative prediction method

I'm trying to use Vetiver to deploy an isolation forest model (for anomaly detection) to an API endpoint.
All is going well by adapting the example here.
However, when deployed, the endpoint uses the model.predict() method by default (which returns 1 for normal or -1 for anomaly).
I want the model to return a score between 0 and 1 as given by the model.score_samples() method.
Does anybody know how I can configure the Vetiver endpoint such that it uses .score_samples() rather than .predict() for scoring?
Thanks

vetiver.predict() primarily acts as a router to an API endpoint, so it does not have to have the same behavior as model.predict(). You can overload what function vetiver.predict() uses on your model by defining a custom handler.
In your case, an implementation might look something like below.
from vetiver.handlers.base import VetiverHandler

class ScoreSamplesHandler(VetiverHandler):
    def __init__(self, model, ptype_data):
        super().__init__(model, ptype_data)

    def handler_predict(self, input_data, check_ptype):
        """
        Define how to make predictions from your model
        """
        prediction = self.model.score_samples(input_data)
        return prediction
Initialize your custom handler and pass it into VetiverModel().
custom_model = ScoreSamplesHandler(model, ptype_data)
VetiverModel(custom_model, "model_name")
Then, when you use the vetiver.predict() function, it will return the values for model.score_samples(), rather than model.predict().
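For reference, a minimal end-to-end sketch under some assumptions (scikit-learn's IsolationForest, toy data, and a placeholder localhost URL; ScoreSamplesHandler is the class defined above) might look like:

import pandas as pd
import vetiver
from sklearn.ensemble import IsolationForest
from vetiver import VetiverAPI, VetiverModel

# toy data and model, stand-ins for your own
X = pd.DataFrame({"x1": [0.1, 0.2, 9.9], "x2": [0.3, 0.1, 8.7]})
model = IsolationForest().fit(X)

custom_model = ScoreSamplesHandler(model, X)
v = VetiverModel(custom_model, "model_name")

app = VetiverAPI(v)
# app.run(port=8080)  # serve locally, then:
# scores = vetiver.predict("http://127.0.0.1:8080/predict", X)  # score_samples values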

Related

Is there a way to embed non-TF functions into a tf.keras model graph as a SavedModel signature?

I want to add preprocessing functions and methods to the model graph as a SavedModel signature.
example:
# suppose we have a keras model
# ...

# defining the function I want to add to the model graph
@tf.function
def process(model, img_path):
    # do some preprocessing using different libs. and modules...
    outputs = {"preds": model.predict(preprocessed_img)}
    return outputs

# saving the model with a custom signature
tf.saved_model.save(new_model, dst_path,
                    signatures={"process": process})
Or we can use tf.Module here. However, the problem is that I cannot embed custom functions into the saved model graph.
Is there any way to do that?
I think you slightly misunderstand the purpose of the save_model method in Tensorflow.
As per the documentation, the intent is to have a method which serialises the model's graph so that it can be loaded with load_model afterwards.
The model returned by load_model is a class of tf.Module with all its methods and attributes. What you actually want is to serialise the whole prediction pipeline.
To be honest, I'm not aware of a good way to do that. What you can do, however, is serialise your preprocessing parameters by some other means (pickle, or whatever your preprocessing framework provides) and write a class on top of that, which would do the following:
class MyModel:
    def __init__(self, model_path, preprocessing_path):
        self.model = load_model(model_path)
        self.preprocessing = load_preprocessing(preprocessing_path)

    def predict(self, img_path):
        return self.model.predict(self.preprocessing(img_path))
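For illustration, load_preprocessing above is not a Keras helper; a pickle-based sketch of it (the file path and the pickled preprocessing callable are assumptions) could be:

import pickle
from tensorflow.keras.models import load_model

def load_preprocessing(preprocessing_path):
    # load a previously pickled preprocessing callable (hypothetical file)
    with open(preprocessing_path, "rb") as f:
        return pickle.load(f)

# usage sketch:
# my_model = MyModel("saved_model_dir", "preprocessing.pkl")
# preds = my_model.predict("image.jpg")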

Tensorflow Keras Model subclassing -- call function

I am experimenting with self supervised learning using tensorflow. The example code I'm running can be found in the Keras examples website. This is the link to the NNCLR example. The Github link to download the code can be found here. While I have no issues running the examples, I am running into issues when I try to save the pretrained or the finetuned model using model.save().
The error I'm getting is this:
f"Model {model} cannot be saved either because the input shape is not "
ValueError: Model <__main__.NNCLR object at 0x7f6bc0f39550> cannot be saved either
because the input shape is not available or because the forward pass of the model is
not defined. To define a forward pass, please override `Model.call()`.
To specify an input shape, either call `build(input_shape)` directly, or call the model on actual data using `Model()`, `Model.fit()`, or `Model.predict()`.
If you have a custom training step, please make sure to invoke the forward pass in train step through
`Model.__call__`, i.e. `model(inputs)`, as opposed to `model.call()`.
I am unsure how to override the Model.call() method. Appreciate some help.
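For reference, overriding Model.call() just means defining the forward pass on the subclass; a generic sketch (whether self.encoder is the right submodule to expose is an assumption about the NNCLR internals) looks like:

class NNCLR(keras.Model):
    ...
    def call(self, inputs, training=False):
        # the forward pass; here we assume the encoder is the part to expose
        return self.encoder(inputs, training=training)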
One way to achieve model saving in such cases is to override the save (or save_weights) method of the keras.Model class. In your case, first initialize the finetune model in the NNCLR class, and then override the save method for it. FYI, this way you may also be able to use the ModelCheckpoint API.
As said, define the finetune model in the NNCLR model class and override the save method for it.
class NNCLR(keras.Model):
    def __init__(...):
        super().__init__()
        ...
        self.finetuning_model = keras.Sequential(
            [
                layers.Input(shape=input_shape),
                self.classification_augmenter,
                self.encoder,
                layers.Dense(10),
            ],
            name="finetuning_model",
        )
        ...

    def save(
        self, filepath, overwrite=True, include_optimizer=True,
        save_format=None, signatures=None, options=None
    ):
        self.finetuning_model.save(
            filepath=filepath,
            overwrite=overwrite,
            save_format=save_format,
            options=options,
            include_optimizer=include_optimizer,
            signatures=signatures,
        )

model = NNCLR(...)
model.compile(...)
model.fit(...)
Next, you can do
model.save('finetune_model')  # SavedModel format
finetune_model = tf.keras.models.load_model('finetune_model', compile=False)

'''
NNCLR code example: Evaluate sections
"A popular way to evaluate a SSL method in computer vision or
for that fact any other pre-training method as such is to learn
a linear classifier on the frozen features of the trained backbone
model and evaluate the classifier on unseen images."
'''
for layer in finetune_model.layers:
    if not isinstance(layer, layers.Dense):
        layer.trainable = False

finetune_model.summary()  # OK
finetune_model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")],
)
finetune_model.fit(...)

multi-output keras model with a callback that monitors two metrics

I have a tf model that has two outputs, as indicated by this model.compile():
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=7e-4),
              loss={"BV": tf.keras.losses.MeanAbsoluteError(),
                    "Rsp": tf.keras.losses.MeanAbsoluteError()},
              metrics={"BV": [tf.keras.metrics.RootMeanSquaredError(name="RMSE"),
                              tfa.metrics.r_square.RSquare(name="R2")],
                       "Rsp": [tf.keras.metrics.RootMeanSquaredError(name="RMSE"),
                               tfa.metrics.r_square.RSquare(name="R2")]})
I would like to use the ModelCheckpoint callback, which should monitor a sum of val_BV_R2 and val_Rsp_R2. I am able to run the callback like this:
save_best_model = tf.keras.callbacks.ModelCheckpoint("xyz.hdf5", monitor="val_Rsp_R2")
However, I don't know how to make it save the model with the highest sum of the two metrics.
According to the tf.keras.callbacks.ModelCheckpoint documentation, only one metric can be monitored at a time.
One way to achieve what you want could be to define an additional custom metric that computes the sum of the two metrics; then you could monitor that custom metric and save checkpoints as you already do. However, this is a bit complicated due to having multiple outputs.
Alternatively, you could define a custom callback that does the combining. Below is a simple example of this second option. It should work (sorry, I can't test it right now):
class CombineCallback(tf.keras.callbacks.Callback):
    def __init__(self, **kargs):
        super(CombineCallback, self).__init__(**kargs)

    def on_epoch_end(self, epoch, logs={}):
        logs['combine_metric'] = 0.5 * logs['val_BV_R2'] + 0.5 * logs['val_Rsp_R2']
Inside the callback you should be able to access your metrics directly with logs['name_of_my_metric'] or through the get function logs.get("name_of_my_metric").
Also I multiplied by 0.5 to leave the combined metric approximately in the same range, but see if this works for your case.
To use it, put the combining callback before a ModelCheckpoint that monitors the new key (the combine callback must come first in the list, so the metric exists by the time the checkpoint reads the logs):
combine_callback = CombineCallback()
save_best_model = tf.keras.callbacks.ModelCheckpoint(
    "xyz.hdf5", monitor="combine_metric", mode="max", save_best_only=True)
model.fit(..., callbacks=[combine_callback, save_best_model])
More information can be found at the Examples of Keras callback applications.

Writing a custom predict method using MLFlow and pyspark

I am having trouble writing a custom predict method using MLFlow and pyspark (2.4.0). What I have so far is a custom transformer that changes the data into the format I need.
class CustomGroupBy(Transformer):
    def __init__(self):
        pass

    def _transform(self, dataset):
        df = dataset.select("userid", explode(split("widgetid", ',')).alias("widgetid"))
        return df
Then I built a custom estimator to run one of the pyspark machine learning algorithms:
class PipelineFPGrowth(Estimator, HasInputCol, DefaultParamsReadable, DefaultParamsWritable):
    def __init__(self, inputCol=None, minSupport=0.005, minConfidence=0.01):
        super(PipelineFPGrowth, self).__init__()
        self.minSupport = minSupport
        self.minConfidence = minConfidence

    def setInputCol(self, value):
        return self._set(inputCol=value)

    def _fit(self, dataset):
        c = self.getInputCol()
        fpgrowth = FPGrowth(itemsCol=c, minSupport=self.minSupport,
                            minConfidence=self.minConfidence)
        model = fpgrowth.fit(dataset)
        return model
This runs in the MLFlow pipeline.
pipeline = Pipeline(stages=[CustomGroupBy, PipelineFPGrowth]).fit(df)
This all works. If I create a new pyspark dataframe with new data to predict on, I get predictions.
newDF = spark.createDataFrame([(123456,['123ABC', '789JSF'])], ["userid", "widgetid"])
pipeline.stages[1].transform(newDF).show(3, False)
# How to access frequent itemset.
pipeline.stages[1].freqItemsets.show(3, False)
Where I run into problems is writing a custom predict. I need to append the frequent itemset that FPGrowth generates to the end of the predictions. I have written the logic for that, but I am having a hard time figuring out how to put it into a custom method. I have tried adding it to my custom estimator but this didn't work. Then I wrote a separate class to take in the returned model and give the extended predictions. This was also unsuccessful.
Eventually I need to log and save the model so I can Dockerize it, which means I will need a custom flavor and to use the pyfunc function. Does anyone have a hint on how to extend the predict method and then log and save the model?
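One hedged sketch of the pyfunc direction (not a tested answer: FPGrowthWrapper and append_itemsets are hypothetical names, and Spark ML models need extra care since they are not plain-pickleable):

import mlflow.pyfunc

class FPGrowthWrapper(mlflow.pyfunc.PythonModel):
    def __init__(self, pipeline_model, freq_itemsets_pdf):
        self.pipeline_model = pipeline_model        # the fitted pipeline
        self.freq_itemsets_pdf = freq_itemsets_pdf  # itemsets collected beforehand

    def predict(self, context, model_input):
        preds = self.pipeline_model.transform(model_input)
        return append_itemsets(preds, self.freq_itemsets_pdf)  # hypothetical helper

# mlflow.pyfunc.log_model("fpgrowth_model",
#                         python_model=FPGrowthWrapper(pipeline, itemsets_pdf))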

Use model method as default value for Django Model?

I actually have this method in my Model:
def speed_score_compute(self):
    # Speed score:
    #
    # - 8 point for every % of time spent in
    #   high intensity running phase.
    # - Average Speed in High Intensity
    #   running phase (30km/h = 50 points,
    #   0-15km/h = 15 points)
    try:
        high_intensity_running_ratio = (
            ((self.h_i_run_time * 100) / self.training_length) * 8)
    except ZeroDivisionError:
        return 0
    high_intensity_running_ratio = min(50, high_intensity_running_ratio)
    if self.h_i_average_speed < 15:
        average_speed_score = 10
    else:
        average_speed_score = self.cross_multiplication(
            30, self.h_i_average_speed, 50)
    final_speed_score = high_intensity_running_ratio + average_speed_score
    return final_speed_score
I want to use it as default for my Model like this:
speed_score = models.IntegerField(default=speed_score_compute)
But this doesn't work (see the error message below). I've checked different topics like this one, but they only cover plain functions (which don't use self), not methods (and I must use a method, since I'm working with the object's actual attributes).
The Django docs seem to cover this, but I don't fully understand them, maybe because I'm still a newbie to Django and programming in general.
Is there a way to achieve this?
EDIT:
My function is defined above my models. But here is my error message:
ValueError: Could not find function speed_score_compute in tournament.models.
Please note that due to Python 2 limitations, you cannot serialize unbound method functions (e.g. a method declared and used in the same class body). Please move the function into the main module body to use migrations.
For more information, see https://docs.djangoproject.com/en/1.8/topics/migrations/#serializing-values
The error message is clear: it seems that I can't do this. But is there another way to achieve it?
The problem
When we pass default=callable and that callable is a method of the model, it doesn't get called with the self argument (the model instance).
Overriding save()
I haven't found a better solution than to override MyModel.save()* like below:
class MyModel(models.Model):
    def save(self, *args, **kwargs):
        if self.speed_score is None:
            self.speed_score = ...
            # or
            self.calculate_speed_score()
        # Now we call the actual save method
        super(MyModel, self).save(*args, **kwargs)
This makes it so that if you try to save your model, without a set value for that field, it is populated before the save.
Personally, I just prefer this approach of having everything that belongs to a model defined in the model (data, methods, validation, default values, etc.). I think this approach is referred to as the fat Django model approach.
*If you find a better approach, I'd like to learn about it too!
Using a pre_save signal
Django provides a pre_save signal that runs before save() is run on the model. Signals run synchronously, i.e. the pre_save code needs to finish running before save() is called on the model. You'll get the same results (and order of execution) as overriding save().
from django.db.models.signals import pre_save
from django.dispatch import receiver
from myapp.models import MyModel

@receiver(pre_save, sender=MyModel)
def my_handler(sender, **kwargs):
    instance = kwargs['instance']
    instance.populate_default_values()
If you prefer to keep the default values behavior separated from the model, this approach is for you!
When is save() called? Can I work with the object before it gets saved?
Good question! We'd like the ability to work with our object before it is subjected to the default-value population that happens on save.
To get an object without saving it, just to work with it, we can do:
instance = MyModel()
If you create it using MyModel.objects.create(), then save() will be called. It is essentially (see source code) equivalent to:
instance = MyModel()
instance.save()
If it's interesting to you, you can also define a MyModel.populate_default_values() that you can call at any stage of the object's lifecycle (at creation, at save, or on demand, it's up to you).
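A minimal sketch of that method, assuming the speed_score field and the speed_score_compute() method from the question (populate_default_values is just the name used above):

class MyModel(models.Model):
    speed_score = models.IntegerField(null=True, blank=True)

    def populate_default_values(self):
        # only fill fields that haven't been set yet
        if self.speed_score is None:
            self.speed_score = self.speed_score_compute()

    def save(self, *args, **kwargs):
        self.populate_default_values()
        super().save(*args, **kwargs)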
