I am following a guide to get a Vertex AI pipeline working:
https://codelabs.developers.google.com/vertex-pipelines-intro#5
I have implemented the following custom component:
from google.cloud import aiplatform as aip
from google.oauth2 import service_account
project = "project-id"
region = "us-central1"
display_name = "lookalike_model_pipeline_1646929843"
model_name = f"projects/{project}/locations/{region}/models/{display_name}"
api_endpoint = "us-central1-aiplatform.googleapis.com" #europe-west2
model_resource_path = model_name
client_options = {"api_endpoint": api_endpoint}
# Initialize client that will be used to create and send requests.
client = aip.gapic.ModelServiceClient(
    credentials=service_account.Credentials.from_service_account_file('..\\service_accounts\\aiplatform_sa.json'),
    client_options=client_options)
#get model evaluation
response = client.list_model_evaluations(parent=model_name)
And I get following error:
(<class 'google.api_core.exceptions.PermissionDenied'>, PermissionDenied("Permission 'aiplatform.modelEvaluations.list' denied on resource '//aiplatform.googleapis.com/projects/project-id/locations/us-central1/models/lookalike_model_pipeline_1646929843' (or it may not exist)."), <traceback object at 0x000002414D06B9C0>)
The model definitely exists and has finished training. I have given myself admin rights on the aiplatform service account. In the guide, they do not use a service account at all, only client_options. The client_options value has the wrong type, since it is a dict(str, str) when the annotation says Optional['ClientOptions'], but this doesn't cause an error.
My main question is: how do I get around this permission issue?
My subquestions are:
How can I use my model_name variable in a URL to get to the model?
How can I create an Optional['ClientOptions'] object to pass as client_options?
Is there another way I can list_model_evaluations from a model that is in VertexAI, trained using automl?
Thanks
With the caveat from my comment that, while familiar with GCP, I'm less familiar with the AI/ML side, the following should work. I don't have a model deployed to test it against.
BILLING=[[YOUR-BILLING]]
export PROJECT=[[YOUR-PROJECT]]
export LOCATION="us-central1"
export MODEL=[[YOUR-MODEL]]
ACCOUNT="tester"
gcloud projects create ${PROJECT}
gcloud beta billing projects link ${PROJECT} \
--billing-account=${BILLING}
# Unsure whether ML is needed
for SERVICE in "aiplatform" "ml"
do
gcloud services enable ${SERVICE}.googleapis.com \
--project=${PROJECT}
done
gcloud iam service-accounts create ${ACCOUNT} \
--project=${PROJECT}
EMAIL=${ACCOUNT}@${PROJECT}.iam.gserviceaccount.com
gcloud projects add-iam-policy-binding ${PROJECT} \
--role=roles/aiplatform.admin \
--member=serviceAccount:${EMAIL}
gcloud iam service-accounts keys create ${PWD}/${ACCOUNT}.json \
--iam-account=${EMAIL} \
--project=${PROJECT}
export GOOGLE_APPLICATION_CREDENTIALS=${PWD}/${ACCOUNT}.json
python3 -m venv venv
source venv/bin/activate
python3 -m pip install google-cloud-aiplatform
python3 main.py
main.py:
import os
from google.cloud import aiplatform
project = os.getenv("PROJECT")
location = os.getenv("LOCATION")
model = os.getenv("MODEL")
aiplatform.init(
project=project,
location=location,
experiment="test",
)
parent = f"projects/{project}/locations/{location}/models/{model}"
model = aiplatform.Model(parent)
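From there, listing the evaluations should be possible through the same high-level object. A sketch, under the assumption that the SDK's Model exposes list_model_evaluations; I don't have a trained model to verify against:
# Assumption: the google-cloud-aiplatform Model object exposes list_model_evaluations()
evaluations = model.list_model_evaluations()
for evaluation in evaluations:
    print(evaluation)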
I tried using your code and it did not work for me either; I got a different error. As @DazWilkin mentioned, it is recommended to use the Cloud Client library.
I used aiplatform_v1 and it worked fine. One thing I noticed is that you should always define a value for client_options so the client points to the correct endpoint. Checking the code for ModelServiceClient, if I'm not mistaken the endpoint defaults to "aiplatform.googleapis.com", which has no location prepended; AFAIK the endpoint should be prefixed with the location (for example, us-central1-aiplatform.googleapis.com).
See code below. I used AutoML models and it returns their model evaluations.
from google.cloud import aiplatform_v1 as aiplatform
from typing import Optional
def get_model_eval(
project_id: str,
model_id: str,
client_options: dict,
location: str = 'us-central1',
):
client_model = aiplatform.services.model_service.ModelServiceClient(client_options=client_options)
model_name = f'projects/{project_id}/locations/{location}/models/{model_id}'
list_eval_request = aiplatform.types.ListModelEvaluationsRequest(parent=model_name)
list_eval = client_model.list_model_evaluations(request=list_eval_request)
print(list_eval)
api_endpoint = 'us-central1-aiplatform.googleapis.com'
client_options = {"api_endpoint": api_endpoint} # api_endpoint is required for client_options
project_id = 'project-id'
location = 'us-central1'
model_id = '99999999999' # aiplatform_v1 uses the model_id
get_model_eval(
client_options = client_options,
project_id = project_id,
location = location,
model_id = model_id,
)
This is an output snippet from my AutoML Text Classification:
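As for the subquestion about passing an Optional['ClientOptions'] object rather than a dict: the dict form is accepted (the client converts it internally), but the typed object can also be built explicitly. A minimal sketch:
from google.api_core.client_options import ClientOptions

# Same regional endpoint as above, but wrapped in an explicit ClientOptions object
client_options = ClientOptions(api_endpoint='us-central1-aiplatform.googleapis.com')
client_model = aiplatform.services.model_service.ModelServiceClient(client_options=client_options)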
I'm trying to create log-based metrics programmatically from Cloud Functions. I didn't really find any code samples, so I'm a bit lost. This is the code I have so far:
from google.cloud import logging_v2
metric = {"name":"test","filter":"stuff_here"}
client = logging_v2.Client()
client.create(metric)
I get the following error: module 'google.cloud.logging_v2' has no attribute 'Client'
Edit:
I found a code example in the documentation:
metric = client.metric(metric_name, filter_=filter, description=description)
assert not metric.exists() # API call
metric.create() # API call
assert metric.exists() # API call
but I'm still stuck with the same error.
Indeed, it does not; see Client.
But Client has a metric method that "Creates a metric bound to the current client."
And there's a Metric class:
import os
from google.cloud import logging_v2
client = logging_v2.Client(project=os.getenv("PROJECT"))
# You need to provide a filter
# This one counts the Service Accounts created in my project
filter=(
"resource.type=\"service_account\" "
"protoPayload.methodName=\"google.iam.admin.v1.CreateServiceAccountKey\" "
"severity=\"NOTICE\""
)
metric_name=os.getenv("METRIC")
And either using client.metric:
metric = client.metric(
metric_name,
filter_=filter,
description="test")
Or using logging_v2.Metric(...).create():
metric = logging_v2.Metric(
metric_name,
filter_=filter,
client=client).create()
Then:
print(metric)
And to verify from the command line:
export PROJECT=[[YOUR-PROJECT]]
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/you/key.json
export METRIC="test"
# Before
gcloud logging metrics list \
--project=${PROJECT} \
--filter="name=${METRIC}"
Listed 0 items.
python3 python/main.py
# After
gcloud logging metrics list \
--project=${PROJECT} \
--filter="name=${METRIC}" \
--format="yaml(name,filter)"
Yields:
filter: resource.type="service_account" protoPayload.methodName="google.iam.admin.v1.CreateServiceAccountKey" severity="NOTICE"
name: test
client should be instantiated like this: client = logging_v2.client.Client()
for example:
client = logging_v2.client.Client()
metric = client.metric(metric_name, filter_=filter, description=description)
assert not metric.exists() # API call
metric.create() # API call
assert metric.exists() # API call
I have created a train.py script in Azure that does the data cleaning, wrangling, and classification using XGBoost. Then I created an ipynb file to do hyperparameter tuning by calling the train.py script.
The child runs keep asking me to perform a manual interactive login for every run. Please see the image.
I did the interactive login for many runs, but it still asks me every time.
Here is the code in ipynb file:
subscription_id = 'XXXXXXXXXXXXXXXXXX'
resource_group = 'XXXXXXXXXXXXXXX'
workspace_name = 'XXXXXXXXXXXXXXX'
workspace = Workspace(subscription_id, resource_group, workspace_name)
myenv = Environment(workspace=workspace, name="myenv")
from azureml.core.conda_dependencies import CondaDependencies
conda_dep = CondaDependencies()
conda_dep.add_pip_package("numpy")
conda_dep.add_pip_package("pandas")
conda_dep.add_pip_package("nltk")
conda_dep.add_pip_package("sklearn")
conda_dep.add_pip_package("xgboost")
myenv.python.conda_dependencies = conda_dep
experiment_name = 'experiments_xgboost_hyperparams'
experiment = Experiment(workspace, experiment_name)
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
compute_cluster_name = 'shan'
try:
compute_target = ComputeTarget(workspace=workspace, name = compute_cluster_name)
print('Found the compute cluster')
except ComputeTargetException:
compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_DS3_V2", max_nodes=4)
compute_target = ComputeTarget.create(workspace, compute_cluster_name, compute_config)
compute_target.wait_for_completion(show_output=True)
early_termination_policy = BanditPolicy(slack_factor=0.01)
from azureml.train.hyperdrive import RandomParameterSampling
from azureml.train.hyperdrive import uniform, choice
ps = RandomParameterSampling( {
'learning_rate': uniform(0.1, 0.9),
'max_depth': choice(range(3,8)),
'n_estimators': choice(300, 400, 500, 600)
}
)
primary_metric_name="accuracy",
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE
from azureml.core import ScriptRunConfig
script_run_config = ScriptRunConfig(source_directory='.', script='train.py', compute_target=compute_target, environment=myenv)
# script_run_config.run_config.target = compute_target
# Create a HyperDriveConfig using the estimator, hyperparameter sampler, and policy.
hyperdrive_config = HyperDriveConfig(run_config=script_run_config,
hyperparameter_sampling=ps,
policy=early_termination_policy,
primary_metric_name="accuracy",
primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
max_total_runs=10,
max_concurrent_runs=4)
hyperdrive = experiment.submit(config=hyperdrive_config)
RunDetails(hyperdrive).show()
hyperdrive.wait_for_completion(show_output=True)
This just keeps asking me interactive login for every child run.
You need to implement a non-interactive authentication method to avoid the interactive login prompts.
The issue comes from this line :
workspace = Workspace(subscription_id, resource_group, workspace_name)
The Azure ML SDK tries to access the Workspace based only on its name, the subscription id and the associated resource group. It does not know whether you have access to it, which is why it asks you to authenticate through a URL.
I would suggest implementing authentication through a service principal; you can find the official documentation here.
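A minimal sketch of what that could look like, assuming the azureml-sdk ServicePrincipalAuthentication class; the tenant/app IDs and secret below are placeholders you would replace with your own service principal's values:
from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication

# Placeholders: create the service principal and grant it access to the workspace first
sp_auth = ServicePrincipalAuthentication(
    tenant_id='XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX',
    service_principal_id='XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX',
    service_principal_password='XXXXXXXXXXXXXXX')

# Passing auth explicitly avoids the interactive device-login prompt in the child runs
workspace = Workspace(subscription_id, resource_group, workspace_name, auth=sp_auth)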
I'm trying to read data from bigquery in my beam pipeline using cloud dataflow runner.
I want to provide a credentials to access the project.
I've seen examples in Java but none in Python.
The only possibility I found is to use the --service_account_email argument.
But what if I want to provide the .json key in the code itself, along with all the other options, like:
google_cloud_options.service_account = '/path/to/credential.json'
options = PipelineOptions(flags=argv)
google_cloud_options = options.view_as(GoogleCloudOptions)
google_cloud_options.project = 'project_name'
google_cloud_options.job_name = 'job_name'
google_cloud_options.staging_location = 'gs://bucket'
google_cloud_options.temp_location = 'gs://bucket'
options.view_as(StandardOptions).runner = 'DataflowRunner'
with beam.Pipeline(options=options) as pipeline:
query = open('query.sql', 'r')
bq_source = beam.io.BigQuerySource(query=query.read(), use_standard_sql=True)
main_table = \
pipeline \
| 'ReadAccountViewAll' >> beam.io.Read(bq_source) \
Java has a getGcpCredential method, but I can't find an equivalent in Python...
Any ideas?
Using --service_account_email is the recommended approach, as mentioned here. Downloading the key and storing it locally or on GCE is not recommended.
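If you prefer to set it in the code rather than on the command line, the same option can be assigned on the pipeline options. A sketch; the service account email is a placeholder and must already have the BigQuery and Dataflow roles the job needs:
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

options = PipelineOptions(flags=argv)
# Placeholder email; equivalent to passing --service_account_email on the command line
options.view_as(GoogleCloudOptions).service_account_email = 'my-dataflow-sa@project_name.iam.gserviceaccount.com'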
For cases where you need to point at the json key file from within the code, you can try the following Python authentication workarounds:
client = Client.from_service_account_json('/path/to/keyfile.json')
or
client = Client(credentials=credentials)
Here is an example for creating custom credentials from a file:
credentials = service_account.Credentials.from_service_account_file(
key_path,
scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
I recently deployed a custom model to Google Cloud's AI Platform, and I am trying to debug some parts of my preprocessing logic. However, my print statements are not being logged to the Stackdriver output. I have also tried using the logging client imported from google.cloud, to no avail. Here is my custom prediction file:
import os
import pickle
import numpy as np
from sklearn.datasets import load_iris
import tensorflow as tf
from google.cloud import logging
class MyPredictor(object):
def __init__(self, model, preprocessor):
self.logging_client = logging.Client()
self._model = model
self._preprocessor = preprocessor
self._class_names = ["Snare", "Kicks", "ClosedHH", "ClosedHH", "Clap", "Crash", "Perc"]
def predict(self, instances, **kwargs):
log_name = "Here I am"
logger = self.logging_client.logger(log_name)
text = 'Hello, world!'
logger.log_text(text)
print('Logged: {}'.format(text), kwargs.get("sr"))
inputs = np.asarray(instances)
outputs = self._model.predict(inputs)
if kwargs.get('probabilities'):
return outputs.tolist()
#return "[]"
else:
return [self._class_names[index] for index in np.argmax(outputs.tolist(), axis=1)]
@classmethod
def from_path(cls, model_dir):
model_path = os.path.join(model_dir, 'model.h5')
model = tf.keras.models.load_model(model_path, custom_objects={"adam": tf.keras.optimizers.Adam,
"categorical_crossentropy":tf.keras.losses.categorical_crossentropy, "lr":0.01, "name": "Adam"})
preprocessor_path = os.path.join(model_dir, 'preprocessor.pkl')
with open(preprocessor_path, 'rb') as f:
preprocessor = pickle.load(f)
return cls(model, preprocessor)
I can't find anything online about why my logs are not showing up in Stackdriver (neither the print statements nor the logging library calls). Has anyone faced this issue?
Thanks,
Nikita
NOTE: If you have enough rep to create tags please add the google-ai-platform tag to this post. I think it would really help people who are in my position. Thanks!
From the documentation:
If you want to enable online prediction logging, you must configure it when you create a model resource or when you create a model version resource, depending on which type of logging you want to enable. There are three types of logging, which you can enable independently:
Access logging, which logs information like timestamp and latency for each request to Stackdriver Logging. You can enable access logging when you create a model resource.
Stream logging, which logs the stderr and stdout streams from your prediction nodes to Stackdriver Logging, and can be useful for debugging. This type of logging is in beta, and it is not supported by Compute Engine (N1) machine types. You can enable stream logging when you create a model resource.
Request-response logging, which logs a sample of online prediction requests and responses to a BigQuery table. This type of logging is in beta. You can enable request-response logging by creating a model version resource, then updating that version.
For your use case, please use the following template to log custom information into Stackdriver:
Model
gcloud beta ai-platform models create {MODEL_NAME} \
--regions {REGION} \
--enable-logging \
--enable-console-logging
Model version
gcloud beta ai-platform versions create {VERSION_NAME} \
--model {MODEL_NAME} \
--origin gs://{BUCKET}/{MODEL_DIR} \
--python-version 3.7 \
--runtime-version 1.15 \
--package-uris gs://{BUCKET}/{PACKAGES_DIR}/custom-model-0.1.tar.gz \
--prediction-class=custom_prediction.CustomModelPrediction \
--service-account custom#project_id.iam.gserviceaccount.com
I tried this and it worked fine:
I made some modifications to the constructor due to the @classmethod decorator.
Create a service account and grant it the "Stackdriver Debugger User" role; use it during model version creation.
Add the google-cloud-logging library to your setup.py.
Consider the extra cost of enabling Stackdriver logging.
When using log_struct, check that the correct type is passed. (If using str, make sure you convert bytes to str in Python 3 using .decode('utf-8').)
Define the project_id parameter during Stackdriver client creation, logging.Client(project='project_id'), otherwise you will get:
ERROR:root:Prediction failed: 400 Name "projects//logs/my-custom-prediction-log" is missing the parent component. Expected the form projects/[PROJECT_ID]/logs/[ID]"
Code below:
%%writefile cloud_logging.py
import os
import pickle
import numpy as np
from datetime import date
from google.cloud import logging
import tensorflow.keras as keras
LOG_NAME = 'my-custom-prediction-log'
class CustomModelPrediction(object):
def __init__(self, model, processor, client):
self._model = model
self._processor = processor
self._client = client
def _postprocess(self, predictions):
labels = ['negative', 'positive']
return [
{
"label":labels[int(np.round(prediction))],
"score":float(np.round(prediction, 4))
} for prediction in predictions]
def predict(self, instances, **kwargs):
logger = self._client.logger(LOG_NAME)
logger.log_struct({'instances':instances})
preprocessed_data = self._processor.transform(instances)
predictions = self._model.predict(preprocessed_data)
labels = self._postprocess(predictions)
return labels
@classmethod
def from_path(cls, model_dir):
client = logging.Client(project='project_id') # Change to your project
model = keras.models.load_model(
os.path.join(model_dir,'keras_saved_model.h5'))
with open(os.path.join(model_dir, 'processor_state.pkl'), 'rb') as f:
processor = pickle.load(f)
return cls(model, processor, client)
# Verify model locally
from cloud_logging import CustomModelPrediction
classifier = CustomModelPrediction.from_path('.')
requests = ["God I hate the north", "god I love this"]
response = classifier.predict(requests)
response
Then I check with the sample library:
python snippets.py my-custom-prediction-log list
Listing entries for logger my-custom-prediction-log:
* 2020-02-19T19:51:45.809767+00:00: {u'instances': [u'God I hate the north', u'god I love this']}
* 2020-02-19T19:57:18.615159+00:00: {u'instances': [u'God I hate the north', u'god I love this']}
To visualize the logs, in StackDriver > Logging > Select Global and your Log name, if you want to see Model logs you should be able to select Cloud ML Model version.
You can use my files here: model and pre-processor
If you just want your print statements to work, without using the logging approach above, you can just add the flush flag to your print:
print("logged", flush=True)
I want to deploy my trained TensorFlow model to Amazon SageMaker. I am following the official guide here: https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/ to deploy my model using a Jupyter notebook.
But when I run the following code:
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.t2.medium')
It gives me the following error message:
ValueError: Error hosting endpoint sagemaker-tensorflow-2019-08-07-22-57-59-547: Failed Reason: The image '520713654638.dkr.ecr.us-west-1.amazonaws.com/sagemaker-tensorflow:1.12-cpu-py3 ' does not exist.
I think the tutorial does not tell me to create an image, and I do not know what to do.
import boto3, re
from sagemaker import get_execution_role
role = get_execution_role()
# make a tar ball of the model data files
import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
archive.add('export', recursive=True)
# create a new s3 bucket and upload the tarball to it
import sagemaker
sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')
from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
role = role,
framework_version = '1.12',
entry_point = 'train.py',
py_version='py3')
%%time
#here I fail to deploy the model and get the error message
predictor = sagemaker_model.deploy(initial_instance_count=1,
instance_type='ml.m4.xlarge')
https://github.com/aws/sagemaker-python-sdk/issues/912#issuecomment-510226311
As mentioned in the issue:
Python 3 isn't supported using the TensorFlowModel object, as the container uses the TensorFlow serving api library in conjunction with the GRPC client to handle making inferences, however the TensorFlow serving api isn't supported in Python 3 officially, so there are only Python 2 versions of the containers when using the TensorFlowModel object.
If you need Python 3 then you will need to use the Model object defined in #2 above. The inference script format will change if you need to handle pre and post processing. https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing.
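If I'm reading the linked issue right, the Model object it refers to is the one from sagemaker.tensorflow.serving (the TensorFlow Serving based container, which supports Python 3). A rough sketch, reusing the variables from the question and untested here:
from sagemaker.tensorflow.serving import Model

# Assumption: the exported model layout in model.tar.gz matches what the TF Serving container expects
sagemaker_model = Model(model_data='s3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                        role=role,
                        framework_version='1.12')
predictor = sagemaker_model.deploy(initial_instance_count=1, instance_type='ml.m4.xlarge')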