I recently deployed a custom model to google cloud's ai-platform, and I am trying to debug some parts of my preprocessing logic. However, My print statements are not being logged to the stackdriver output. I have also tried using the logging client imported from google.cloud, to no avail. Here is my custom prediction file:
import os
import pickle
import numpy as np
from sklearn.datasets import load_iris
import tensorflow as tf
from google.cloud import logging
class MyPredictor(object):
def __init__(self, model, preprocessor):
self.logging_client = logging.Client()
self._model = model
self._preprocessor = preprocessor
self._class_names = ["Snare", "Kicks", "ClosedHH", "ClosedHH", "Clap", "Crash", "Perc"]
def predict(self, instances, **kwargs):
log_name = "Here I am"
logger = self.logging_client.logger(log_name)
text = 'Hello, world!'
logger.log_text(text)
print('Logged: {}'.format(text), kwargs.get("sr"))
inputs = np.asarray(instances)
outputs = self._model.predict(inputs)
if kwargs.get('probabilities'):
return outputs.tolist()
#return "[]"
else:
return [self._class_names[index] for index in np.argmax(outputs.tolist(), axis=1)]
#classmethod
def from_path(cls, model_dir):
model_path = os.path.join(model_dir, 'model.h5')
model = tf.keras.models.load_model(model_path, custom_objects={"adam": tf.keras.optimizers.Adam,
"categorical_crossentropy":tf.keras.losses.categorical_crossentropy, "lr":0.01, "name": "Adam"})
preprocessor_path = os.path.join(model_dir, 'preprocessor.pkl')
with open(preprocessor_path, 'rb') as f:
preprocessor = pickle.load(f)
return cls(model, preprocessor)
I can't find anything online for why my logs are not showing up in stackdriver (neither print statements nor the logging library calls). Has anyone faced this issue?
Thanks,
Nikita
NOTE: If you have enough rep to create tags please add the google-ai-platform tag to this post. I think it would really help people who are in my position. Thanks!
From Documentation:
If you want to enable online prediction logging, you must configure it
when you create a model resource or when you create a model version
resource, depending on which type of logging you want to enable. There
are three types of logging, which you can enable independently:
Access logging, which logs information like timestamp and latency for
each request to Stackdriver Logging.
You can enable access logging when you create a model resource.
Stream logging, which logs the stderr and stdout streams from your
prediction nodes to Stackdriver Logging, and can be useful for
debugging. This type of logging is in beta, and it is not supported by
Compute Engine (N1) machine types.
You can enable stream logging when you create a model resource.
Request-response logging, which logs a sample of online prediction
requests and responses to a BigQuery table. This type of logging is in
beta.
You can enable request-response logging by creating a model version
resource, then updating that version.
For your use case, please use the following template to log custom information into StackDriver:
Model
gcloud beta ai-platform models create {MODEL_NAME} \
--regions {REGION} \
--enable-logging \
--enable-console-logging
Model version
gcloud beta ai-platform versions create {VERSION_NAME} \
--model {MODEL_NAME} \
--origin gs://{BUCKET}/{MODEL_DIR} \
--python-version 3.7 \
--runtime-version 1.15 \
--package-uris gs://{BUCKET}/{PACKAGES_DIR}/custom-model-0.1.tar.gz \
--prediction-class=custom_prediction.CustomModelPrediction \
--service-account custom#project_id.iam.gserviceaccount.com
I tried this and worked fine:
I did some modification to the constructor due to the #classmethod decorator.
Create a service account and grant it "Stackdriver Debugger User" role, use it during model version creation
Add google-cloud-logging library to your setup.py
Consider extra cost of enabling StackDriver logging
When using log_struct check the correct type is passed. (If using str, make sure you convert bytes to str in Python 3 using .decode('utf-8'))
Define the project_id parameter during Stackdriver client creation
logging.Client(), otherwise you will get:
ERROR:root:Prediction failed: 400 Name "projects//logs/my-custom-prediction-log" is missing the parent component. Expected the form projects/[PROJECT_ID]/logs/[ID]"
Code below:
%%writefile cloud_logging.py
import os
import pickle
import numpy as np
from datetime import date
from google.cloud import logging
import tensorflow.keras as keras
LOG_NAME = 'my-custom-prediction-log'
class CustomModelPrediction(object):
def __init__(self, model, processor, client):
self._model = model
self._processor = processor
self._client = client
def _postprocess(self, predictions):
labels = ['negative', 'positive']
return [
{
"label":labels[int(np.round(prediction))],
"score":float(np.round(prediction, 4))
} for prediction in predictions]
def predict(self, instances, **kwargs):
logger = self._client.logger(LOG_NAME)
logger.log_struct({'instances':instances})
preprocessed_data = self._processor.transform(instances)
predictions = self._model.predict(preprocessed_data)
labels = self._postprocess(predictions)
return labels
#classmethod
def from_path(cls, model_dir):
client = logging.Client(project='project_id') # Change to your project
model = keras.models.load_model(
os.path.join(model_dir,'keras_saved_model.h5'))
with open(os.path.join(model_dir, 'processor_state.pkl'), 'rb') as f:
processor = pickle.load(f)
return cls(model, processor, client)
# Verify model locally
from cloud_logging import CustomModelPrediction
classifier = CustomModelPrediction.from_path('.')
requests = ["God I hate the north", "god I love this"]
response = classifier.predict(requests)
response
Then I check with the sample library:
python snippets.py my-custom-prediction-log list
Listing entries for logger my-custom-prediction-log:
* 2020-02-19T19:51:45.809767+00:00: {u'instances': [u'God I hate the north', u'god I love this']}
* 2020-02-19T19:57:18.615159+00:00: {u'instances': [u'God I hate the north', u'god I love this']}
To visualize the logs, in StackDriver > Logging > Select Global and your Log name, if you want to see Model logs you should be able to select Cloud ML Model version.
You can use my files here: model and pre-processor
If you just want your print to work and not use the logging method above me you can just add flush flag to your print,
print(“logged”,flush=True)
Related
I am following a guide to get a Vertex AI pipeline working:
https://codelabs.developers.google.com/vertex-pipelines-intro#5
I have implemented the following custom component:
from google.cloud import aiplatform as aip
from google.oauth2 import service_account
project = "project-id"
region = "us-central1"
display_name = "lookalike_model_pipeline_1646929843"
model_name = f"projects/{project}/locations/{region}/models/{display_name}"
api_endpoint = "us-central1-aiplatform.googleapis.com" #europe-west2
model_resource_path = model_name
client_options = {"api_endpoint": api_endpoint}
# Initialize client that will be used to create and send requests.
client = aip.gapic.ModelServiceClient(credentials=service_account.Credentials.from_service_account_file('..\\service_accounts\\aiplatform_sa.json'),
client_options=client_options)
#get model evaluation
response = client.list_model_evaluations(parent=model_name)
And I get following error:
(<class 'google.api_core.exceptions.PermissionDenied'>, PermissionDenied("Permission 'aiplatform.modelEvaluations.list' denied on resource '//aiplatform.googleapis.com/projects/project-id/locations/us-central1/models/lookalike_model_pipeline_1646929843' (or it may not exist)."), <traceback object at 0x000002414D06B9C0>)
The model definitely exists and has finished training. I have given myself admin rights in the aiplatform service account. In the guide, they do not use a service account, but uses only client_options instead. The client_option has the wrong type since it is a dict(str, str) when it should be: Optional['ClientOptions']. But this doesn't cause an error.
My main question is: how do I get around this permission issue?
My subquestions are:
How can I use my model_name variable in a URL to get to the model?
How can I create an Optional['ClientOptions'] object to pass as client_option
Is there another way I can list_model_evaluations from a model that is in VertexAI, trained using automl?
Thanks
With the caveats in my comment that, while familiar with GCP, I'm less familiar with the AI|ML stuff. The following should work. I don't have a model to deploy to test it.
BILLING=[[YOUR-BILLING]]
export PROJECT=[[YOUR-PROJECT]]
export LOCATION="us-central1"
export MODEL=[[YOUR-MODEL]]
ACCOUNT="tester"
gcloud projects create ${PROJECT}
gcloud beta billing projects link ${PROJECT} \
--billing-account=${BILLING}
# Unsure whether ML is needed
for SERVICE in "aiplatform" "ml"
do
gcloud services enable ${SERVICE}.googleapis.com \
--project=${PROJECT}
done
gcloud iam service-accounts create ${ACCOUNT} \
--project=${PROJECT}
EMAIL=${ACCOUNT}#${PROJECT}.iam.gserviceaccount.com
gcloud projects add-iam-policy-binding ${PROJECT} \
--role=roles/aiplatform.admin \
--member=serviceAccount:${EMAIL}
gcloud iam service-accounts keys create ${PWD}/${ACCOUNT}.json \
--iam-account=${EMAIL} \
--project=${PROJECT}
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/${ACCOUNT}.json
python3 -m venv venv
source venv/bin/activate
python3 -m pip install google-cloud-aiplatform
python3 main.py
main.py:
import os
from google.cloud import aiplatform
project = os.getenv("PROJECT")
location = os.getenv("LOCATION")
model = os.getenv("MODEL")
aiplatform.init(
project=project,
location=location,
experiment="test",
)
parent = f"projects/{project}/locations/{location}/models/{model}"
model = aiplatform.Model(parent)
I tried using your code and it did not also work for me and got a different error. As #DazWilkin mentioned it is recommended to use the Cloud Client.
I used aiplatform_v1 and it worked fine. One thing I noticed is that you should always define a value for client_options so it will point to the correct endpoint. Checking the code for ModelServiceClient, if I'm not mistaken the endpoint defaults to "aiplatform.googleapis.com" which don't have a location prepended. AFAIK the endpoint should prepend a location.
See code below. I used AutoML models and it returns their model evaluations.
from google.cloud import aiplatform_v1 as aiplatform
from typing import Optional
def get_model_eval(
project_id: str,
model_id: str,
client_options: dict,
location: str = 'us-central1',
):
client_model = aiplatform.services.model_service.ModelServiceClient(client_options=client_options)
model_name = f'projects/{project_id}/locations/{location}/models/{model_id}'
list_eval_request = aiplatform.types.ListModelEvaluationsRequest(parent=model_name)
list_eval = client_model.list_model_evaluations(request=list_eval_request)
print(list_eval)
api_endpoint = 'us-central1-aiplatform.googleapis.com'
client_options = {"api_endpoint": api_endpoint} # api_endpoint is required for client_options
project_id = 'project-id'
location = 'us-central1'
model_id = '99999999999' # aiplatform_v1 uses the model_id
get_model_eval(
client_options = client_options,
project_id = project_id,
location = location,
model_id = model_id,
)
This is an output snippet from my AutoML Text Classification:
I'm trying to create log-based metrics programmatically with cloud functions. I didn't really find any code sample so I'm a bit lost. This is the code I have so far
from google.cloud import logging_v2
metric = {"name":"test","filter":"stuff_here"}
client = logging_v2.Client()
client.create(metric)
I have the following error module 'google.cloud.logging_v2' has no attribute 'Client'
#edit
I found some code example in the documentation:
metric = client.metric(metric_name, filter_=filter, description=description)
assert not metric.exists() # API call
metric.create() # API call
assert metric.exists() # API call
but still stuck with the same error
Indeed, it does not, see Client
But, Client has a metric method that "Creates a metric bound to the current client."
And there's a Metrics class
import os
from google.cloud import logging_v2
client = logging_v2.Client(project=os.getenv("PROJECT"))
# You need to provide a filter
# This one counts the Service Accounts created in my project
filter=(
"resource.type=\"service_account\" "
"protoPayload.methodName=\"google.iam.admin.v1.CreateServiceAccountKey\" "
"severity=\"NOTICE\""
)
metric_name=os.getenv("METRIC")
And either using client.metric:
metric = client.metric(
metric_name,
filter_=filter,
description="test")
Or using logging_v2.Metric(...).create():
metric = logging_v2.Metric(
metric_name,
filter_=filter,
client=client).create()
And:
print(metric)
And:
export PROJECT=[[YOUR-PROJECT]]
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/you/key.json
export METRIC="test"
# Before
gcloud logging metrics list \
--project=${PROJECT} \
--filter="name=${METRIC}"
Listed 0 items.
python3 python/main.py
# After
gcloud logging metrics list \
--project=${PROJECT} \
--filter="name=${METRIC}" \
--format="yaml(name,filter)"
Yields:
filter: resource.type="service_account" protoPayload.methodName="google.iam.admin.v1.CreateServiceAccountKey" severity="NOTICE"
name: test
client should be called like this client = logging_v2.client.Client()
for example:
client = logging_v2.client.Client()
metric = client.metric(metric_name, filter_=filter, description=description)
assert not metric.exists() # API call
metric.create() # API call
assert metric.exists() # API call
I am trying to deploy custom model on AI platform. I have followed the steps as mentioned in the Google Document: https://cloud.google.com/ai-platform/prediction/docs/deploying-models#global-endpoint.
The saved model is stored in Google Cloud Storage and trained with python 3.7.
These are the gcloud commands used to deploy
gcloud ai-platform models create title_topic_custom \
--regions=europe-west1 --enable_logging
MODEL_DIR="gs://ai_platform_custom/SavedModel"
VERSION_NAME="V3"
MODEL_NAME="title_topic_custom"
CUSTOM_CODE_PATH="gs://ai_platform_custom/SavedModel/my_custom_code-0.1.tar.gz"
PREDICTOR_CLASS="predictor.py.MyPredictor"
gcloud beta ai-platform versions create $VERSION_NAME \
--model=$MODEL_NAME \
--origin=$MODEL_DIR \
--runtime-version=2.1 \
--python-version=3.7 \
--machine-type=mls1-c1-m2 \
--package-uris=$CUSTOM_CODE_PATH \
--prediction-class=$PREDICTOR_CLASS
I get the following error after executing those commands:
Using endpoint [https://ml.googleapis.com/]
Creating version (this might take a few minutes)......failed.
ERROR: (gcloud.beta.ai-platform.versions.create) Create Version failed. Bad model detected with error: "There was a problem processing the user code: predictor.py.MyPredictor cannot be found. Please make sure (1) prediction_class is the fully qualified function name, and (2) it uses the correct package name as provided by the package_uris: ['gs://ai_platform_custom/SavedModel/my_custom_code-0.1.tar.gz'] (Error code: 4)"
The predictor code is as below:
%%writefile predictor.py
import os
import spacy
import numpy as np
import joblib
import tensorflow as tf
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
class MyPredictor(object):
def __init__(self, model, topic_encoder):
self._model = model
self._nlp = spacy.load('en_core_web_sm')
self._stopwords = stopwords.words('english')
self._topic_encoder = topic_encoder
def predict(self, instances, **kwargs):
inputs = np.asarray(instances)
inputs_t = [' '.join([i for i in x.split() if i not in self._stopwords]) for x in inputs]
preprocessed_inputs = [' '.join([i.lemma_ for i in self._nlp(x)]) for x in inputs_t]
outputs = self._model.predict(preprocessed_inputs)
return [self._topic_encoder[key] for key in np.argmax(outputs, axis=1)]
#classmethod
def from_path(cls, model_dir):
model_path = os.path.join(model_dir)
model = tf.keras.models.load_model(model_path)
topic_encoder = {0:'topic1',1:'topic2',3:'topic3'}
return cls(model, topic_encoder)
This is the setup file
from setuptools import setup
setup(
name='my-custom-code',
version='0.1',
install_requires=['nltk','spacy','joblib'],
scripts=['predictor.py'])
Any workarounds?
Try using
PREDICTOR_CLASS="predictor.MyPredictor" instead of "predictor.py.MyPredictor"
I have same problem as
Why does my ML model deployment in Azure Container Instance still fail?
but the above solution does not work for me. Besides I get additional errors like belos
code": "AciDeploymentFailed",
"message": "Aci Deployment failed with exception: Your container application
crashed. This may be caused by errors in your scoring file's init()
function.\nPlease check the logs for your container instance: anomaly-detection-2.
From the AML SDK, you can run print(service.get_logs()) if you have service object
to fetch the logs. \nYou can also try to run image
mlad046a4688.azurecr.io/anomaly-detection-
2#sha256:fcbba67cf683626291c1bd084f31438fcd641ddaf80f9bdf8cea274d22d1fcb5 locally.
Please refer to http://aka.ms/debugimage#service-launch-fails for more
information.",
"details": [
{
"code": "CrashLoopBackOff",
"message": "Your container application crashed. This may be caused by errors in
your scoring file's init() function.\nPlease check the logs for your container
instance: anomaly-detection-2. From the AML SDK, you can run
print(service.get_logs()) if you have service object to fetch the logs. \nYou can
also try to run image mlad046a4688.azurecr.io/anomaly-detection-
2#sha256:fcbba67cf683626291c1bd084f31438fcd641ddaf80f9bdf8cea274d22d1fcb5 locally.
Please refer to http://aka.ms/debugimage#service-launch-fails for more
information."
}
]
}
It keeps pointing to scoring file but not sure what is wrong here
import numpy as np
import os
import pickle
import joblib
#from sklearn.externals import joblib
from sklearn.linear_model import LogisticRegression
from azureml.core.authentication import AzureCliAuthentication
from azureml.core import Model,Workspace
import logging
logging.basicConfig(level=logging.DEBUG)
def init():
global model
from sklearn.externals import joblib
# retrieve the path to the model file using the model name
model_path = Model.get_model_path(model_name='admlpkl')
print(model_path)
model = joblib.load(model_path)
#ws = Workspace.from_config(auth=cli_auth)
#logging.basicConfig(level=logging.DEBUG)
#modeld = ws.models['admlpkl']
#model=Model.deserialize(ws, modeld)
def run(raw_data):
# data = np.array(json.loads(raw_data)['data'])
# make prediction
data = json.loads(raw_data)
y_hat = model.predict(data)
#r = json.dumps(y_hat.tolist())
r = json.dumps(y_hat)
return r
The model has depencency on other file which I have added in
image_config = ContainerImage.image_configuration(execution_script="score.py",
runtime="python",
conda_file='conda_dependencies.yml',
dependencies=['modeling.py']
The logs are too abstract and really does not help to debug.I am able to create the image but provisioning service fails
Any inputs will be appreciated
Have you registered the model 'admlpkl' in your workspace using the register() function on the model object? If not, there will be no model path and that can cause failure.
See this section on model registration: https://learn.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#registermodel
Please follow the below to register and deploy the model to ACI.
I'm busy configuring a TensorFlow Serving client that asks a TensorFlow Serving server to produce predictions on a given input image, for a given model.
If the model being requested has not yet been served, it is downloaded from a remote URL to a folder where the server's models are located. (The client does this). At this point I need to update the model_config and trigger the server to reload it.
This functionality appears to exist (based on https://github.com/tensorflow/serving/pull/885 and https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/model_service.proto#L22), but I can't find any documentation on how to actually use it.
I am essentially looking for a python script with which I can trigger the reload from client side (or otherwise to configure the server to listen for changes and trigger the reload itself).
So it took me ages of trawling through pull requests to finally find a code example for this. For the next person who has the same question as me, here is an example of how to do this. (You'll need the tensorflow_serving package for this; pip install tensorflow-serving-api).
Based on this pull request (which at the time of writing hadn't been accepted and was closed since it needed review): https://github.com/tensorflow/serving/pull/1065
from tensorflow_serving.apis import model_service_pb2_grpc
from tensorflow_serving.apis import model_management_pb2
from tensorflow_serving.config import model_server_config_pb2
import grpc
def add_model_config(host, name, base_path, model_platform):
channel = grpc.insecure_channel(host)
stub = model_service_pb2_grpc.ModelServiceStub(channel)
request = model_management_pb2.ReloadConfigRequest()
model_server_config = model_server_config_pb2.ModelServerConfig()
#Create a config to add to the list of served models
config_list = model_server_config_pb2.ModelConfigList()
one_config = config_list.config.add()
one_config.name= name
one_config.base_path=base_path
one_config.model_platform=model_platform
model_server_config.model_config_list.CopyFrom(config_list)
request.config.CopyFrom(model_server_config)
print(request.IsInitialized())
print(request.ListFields())
response = stub.HandleReloadConfigRequest(request,10)
if response.status.error_code == 0:
print("Reload sucessfully")
else:
print("Reload failed!")
print(response.status.error_code)
print(response.status.error_message)
add_model_config(host="localhost:8500",
name="my_model",
base_path="/models/my_model",
model_platform="tensorflow")
Add a model to TF Serving server and to the existing config file conf_filepath: Use arguments name, base_path, model_platform for the new model. Keeps the original models intact.
Notice a small difference from #Karl 's answer - using MergeFrom instead of CopyFrom
pip install tensorflow-serving-api
import grpc
from google.protobuf import text_format
from tensorflow_serving.apis import model_service_pb2_grpc, model_management_pb2
from tensorflow_serving.config import model_server_config_pb2
def add_model_config(conf_filepath, host, name, base_path, model_platform):
with open(conf_filepath, 'r+') as f:
config_ini = f.read()
channel = grpc.insecure_channel(host)
stub = model_service_pb2_grpc.ModelServiceStub(channel)
request = model_management_pb2.ReloadConfigRequest()
model_server_config = model_server_config_pb2.ModelServerConfig()
config_list = model_server_config_pb2.ModelConfigList()
model_server_config = text_format.Parse(text=config_ini, message=model_server_config)
# Create a config to add to the list of served models
one_config = config_list.config.add()
one_config.name = name
one_config.base_path = base_path
one_config.model_platform = model_platform
model_server_config.model_config_list.MergeFrom(config_list)
request.config.CopyFrom(model_server_config)
response = stub.HandleReloadConfigRequest(request, 10)
if response.status.error_code == 0:
with open(conf_filepath, 'w+') as f:
f.write(request.config.__str__())
print("Updated TF Serving conf file")
else:
print("Failed to update model_config_list!")
print(response.status.error_code)
print(response.status.error_message)
While the solutions mentioned here works fine, there is one more method that you can use to hot-reload your models. You can use --model_config_file_poll_wait_seconds
As mentioned here in the documentation -
By setting the --model_config_file_poll_wait_seconds flag to instruct the server to periodically check for a new config file at --model_config_file filepath.
So, you just have to update the config file at model_config_path and tf-serving will load any new models and unload any models removed from the config file.
Edit 1: I looked at the source code and it seems that the flag is present from the very early version of tf-serving but there have been instances where some users were not able to use this flag (see this). So, try to use the latest version if possible.
If you're using the method described in this answer, please note that you're actually launching multiple tensorflow model server instances instead of a single model server, effectively making the servers compete for resources instead of working together to optimize tail latency.