I deployed a TensorFlow SavedModel using the following code:
`model_path = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz'
from sagemaker.tensorflow.serving import Model
model = Model(model_data=model_path, role=role)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')`
The model takes images of shape (1, 48, 48, 1).
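For context, a minimal sketch of an input with that shape (np.zeros here is just a stand-in for my real preprocessed data):
import numpy as np
# one 48x48 grayscale image, batched to the (1, 48, 48, 1) shape the model expects
preprocessed_faces_emo = np.zeros((1, 48, 48, 1), dtype=np.float32)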
Immediately after, when I try to make a prediction using the following code:
`predictor.predict(preprocessed_faces_emo.tolist())`
I get the following error, and I don't understand what the problem is. I am running this code from within SageMaker with Python 3.7 and TensorFlow 1.14.0:
`---------------------------------------------------------------------------
ModelError Traceback (most recent call last)
<ipython-input-37-4dc04dc0679c> in <module>()
----> 1 predictor.predict(preprocessed_faces_emo.tolist())

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args)
    105
    106         request_args = self._create_request_args(data, initial_args)
--> 107         response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
    108         return self._handle_response(response)
    109

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    355                 "%s() only accepts keyword arguments." % py_operation_name)
    356             # The "self" in this scope is referring to the BaseClient.
--> 357             return self._make_api_call(operation_name, kwargs)
    358
    359         _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    659             error_code = parsed_response.get("Error", {}).get("Code")
    660             error_class = self.exceptions.from_code(error_code)
--> 661             raise error_class(parsed_response, operation_name)
    662         else:
    663             return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (502) from model with message "<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.16.1</center>
</body>
</html>
". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-tensorflow-serving-2020-01-13-13-43-12-354 in account 970351559819 for more information.`
I am trying to create a feature extractor using torchvision.models.feature_extraction.create_feature_extractor.
The model I am trying to use is from vit_pytorch (link: https://github.com/lucidrains/vit-pytorch). The problem I face is that when I create a model from this library:
from vit_pytorch import ViT
from torchvision.models.feature_extraction import create_feature_extractor
model = ViT(image_size=28,
patch_size=7,
num_classes=10,
dim=16,
depth=6,
heads=16,
mlp_dim=256,
dropout=0.1,
emb_dropout=0.1,
channels=1)
random_layer_name = 'transformer.layers.1.1.fn.net.4'
feature_extractor = create_feature_extractor(model,
                                             return_nodes=[random_layer_name])
and when calling create_feature_extractor() on this model, I always get this error:
RuntimeError Traceback (most recent call last)
Cell In[17], line 2
1 # torch.fx.wrap('len')
----> 2 feature_extractor = create_feature_extractor(model,
3 return_nodes=['transformer.layers.1.1.fn.net.4'])
File ~/Mokslas/AI/venv/lib/python3.10/site-packages/torchvision/models/feature_extraction.py:485, in create_feature_extractor(model, return_nodes, train_return_nodes, eval_return_nodes, tracer_kwargs, suppress_diff_warning)
483 # Instantiate our NodePathTracer and use that to trace the model
484 tracer = NodePathTracer(**tracer_kwargs)
--> 485 graph = tracer.trace(model)
487 name = model.__class__.__name__ if isinstance(model, nn.Module) else model.__name__
488 graph_module = fx.GraphModule(tracer.root, graph, name)
File ~/Mokslas/AI/venv/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py:756, in Tracer.trace(self, root, concrete_args)
749 for module in self._autowrap_search:
750 _autowrap_check(
751 patcher, module.__dict__, self._autowrap_function_ids
752 )
753 self.create_node(
754 "output",
755 "output",
--> 756 (self.create_arg(fn(*args)),),
757 {},
758 type_expr=fn.__annotations__.get("return", None),
759 )
761 self.submodule_paths = None
762 finally:
File ~/Mokslas/AI/venv/lib/python3.10/site-packages/vit_pytorch/vit.py:115, in ViT.forward(self, img)
114 def forward(self, img):
--> 115 x = self.to_patch_embedding(img)
116 b, n, _ = x.shape
118 cls_tokens = repeat(self.cls_token, '1 1 d -> b 1 d', b = b)
File ~/Mokslas/AI/venv/lib/python3.10/site-packages/torch/fx/_symbolic_trace.py:734, in Tracer.trace.<locals>.module_call_wrapper(mod, *args, **kwargs)
727 return _orig_module_call(mod, *args, **kwargs)
729 _autowrap_check(
730 patcher,
731 getattr(getattr(mod, "forward", mod), "__globals__", {}),
732 self._autowrap_function_ids,
733 )
--> 734 return self.call_module(mod, forward, args, kwargs)
File ~/Mokslas/AI/venv/lib/python3.10/site-packages/torchvision/models/feature_extraction.py:83, in NodePathTracer.call_module(self, m, forward, args, kwargs)
...
--> 396 raise RuntimeError("'len' is not supported in symbolic tracing by default. If you want "
397 "this call to be recorded, please call torch.fx.wrap('len') at "
398 "module scope")
RuntimeError: 'len' is not supported in symbolic tracing by default. If you want this call to be recorded, please call torch.fx.wrap('len') at module scope
It doesn't matter which model I choose from that library, or which layer or layers I choose to be output; I always get the same error.
I have tried adding torch.fx.wrap('len'), but the same problem persisted. I know I could work around it with the hook methods, but is there a way to solve this problem so that I can still use the create_feature_extractor() functionality?
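For reference, here is a minimal sketch of the hook-based fallback I mentioned (assuming the same model and layer name as above; get_submodule resolves the dotted path, and the random input matches the channels=1, image_size=28 configuration):
import torch

features = {}
layer = model.get_submodule('transformer.layers.1.1.fn.net.4')

def hook(module, args, output):
    # stash this layer's output on every forward pass
    features['out'] = output.detach()

handle = layer.register_forward_hook(hook)
with torch.no_grad():
    model(torch.randn(1, 1, 28, 28))
handle.remove()
print(features['out'].shape)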
I am using TuriCreate to create a model to classify human activity, but I get an error when I try to run the activity_classifier.create(...) method.
Code
This is what I did:
Load all data:
train_sf = tc.SFrame("data/cleaned_train_sframe")
valid_sf = tc.SFrame("data/cleaned_valid_sframe")
test_sf = tc.SFrame("data/cleaned_test_sframe")
Dividing the SFrame randomly into two smaller SFrames:
train, valid = tc.activity_classifier.util.random_split_by_session(train_sf, session_id='sessionId', fraction=0.9)
Trying to build and train my model:
model = tc.activity_classifier.create(dataset=train_sf,
session_id='sessionId',
target='activity',
features=["rotX", "rotY", "rotZ", "accelX", "accelY", "accelZ"],
prediction_window=50,
validation_set=valid_sf,
max_iterations=20)
Error
The third step raises the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [34], in <cell line: 1>()
----> 1 model = tc.activity_classifier.create(dataset=train_sf,
2 session_id='sessionId',
3 target='activity',
4 features=["rotX", "rotY", "rotZ", "accelX", "accelY", "accelZ"],
5 prediction_window=50,
6 validation_set=valid_sf,
7 max_iterations=20)
File ~/Desktop/PFG/lib/python3.8/site-packages/turicreate/toolkits/activity_classifier/_activity_classifier.py:200, in create(dataset, session_id, target, features, prediction_window, validation_set, max_iterations, batch_size, verbose, random_seed)
197 options["_show_loss"] = False
198 options["random_seed"] = random_seed
--> 200 model.train(dataset, target, session_id, validation_set, options)
201 return ActivityClassifier(model_proxy=model, name=name)
File ~/Desktop/PFG/lib/python3.8/site-packages/turicreate/extensions.py:305, in _ToolkitClass.__getattr__.<locals>.<lambda>(*args, **kwargs)
302 return _wrap_function_return(self._tkclass.get_property(name))
303 elif name in self._functions:
304 # is it a function?
--> 305 ret = lambda *args, **kwargs: self.__run_class_function(name, args, kwargs)
306 ret.__doc__ = (
307 "Name: " + name + "\nParameters: " + str(self._functions[name]) + "\n"
308 )
309 try:
File ~/Desktop/PFG/lib/python3.8/site-packages/turicreate/extensions.py:290, in _ToolkitClass.__run_class_function(self, fnname, args, kwargs)
288 # unwrap it
289 try:
--> 290 ret = self._tkclass.call_function(fnname, argument_dict)
291 except RuntimeError as exc:
292 # Expose C++ exceptions using ToolkitError.
293 raise _ToolkitError(exc)
File cy_model.pyx:35, in turicreate._cython.cy_model.UnityModel.call_function()
File cy_model.pyx:40, in turicreate._cython.cy_model.UnityModel.call_function()
ValueError: stod: no conversion
Does anyone know what the problem could be?
You can get past this issue by setting validation_set to None.
This does mean that you have no validation during training, but at least you can create your model.
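For example, a minimal variant of the call from the question with validation disabled (all other arguments unchanged):
model = tc.activity_classifier.create(dataset=train_sf,
                                      session_id='sessionId',
                                      target='activity',
                                      features=["rotX", "rotY", "rotZ", "accelX", "accelY", "accelZ"],
                                      prediction_window=50,
                                      validation_set=None,
                                      max_iterations=20)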
I can save a file to my Google Drive from Colab like this:
with open ("gdrive/My Drive/chapter_classification/output/hello.txt",'w')as f:
f.write('hello')
This works fine, but when I follow the approach from the official Optuna documentation, using the code:
direction = 'minimize'
name = 'opt1'
study = optuna.create_study(sampler=optuna.samplers.TPESampler(),direction=direction,study_name=name, storage=f"gdrive/My Drive/chapter_classification/output/sqlite:///{name}.db",load_if_exists=True)
study.optimize(tune, n_trials=1000)
it throws this error:
ArgumentError Traceback (most recent call last)
<ipython-input-177-f32da2c0f69a> in <module>()
2 direction = 'minimize'
3 name = 'opt1'
----> 4 study = optuna.create_study(sampler=optuna.samplers.TPESampler(),direction=direction,study_name=name, storage="gdrive/My Drive/chapter_classification/output/sqlite:///opt1.db",load_if_exists=True)
5 study.optimize(tune, n_trials=1000)
/usr/local/lib/python3.7/dist-packages/optuna/study/study.py in create_study(storage, sampler, pruner, study_name, direction, load_if_exists, directions)
1134 ]
1135
-> 1136 storage = storages.get_storage(storage)
1137 try:
1138 study_id = storage.create_new_study(study_name)
/usr/local/lib/python3.7/dist-packages/optuna/storages/__init__.py in get_storage(storage)
29 return RedisStorage(storage)
30 else:
---> 31 return _CachedStorage(RDBStorage(storage))
32 elif isinstance(storage, RDBStorage):
33 return _CachedStorage(storage)
/usr/local/lib/python3.7/dist-packages/optuna/storages/_rdb/storage.py in __init__(self, url, engine_kwargs, skip_compatibility_check, heartbeat_interval, grace_period, failed_trial_callback)
173
174 try:
--> 175 self.engine = create_engine(self.url, **self.engine_kwargs)
176 except ImportError as e:
177 raise ImportError(
<string> in create_engine(url, **kwargs)
/usr/local/lib/python3.7/dist-packages/sqlalchemy/util/deprecations.py in warned(fn, *args, **kwargs)
307 stacklevel=3,
308 )
--> 309 return fn(*args, **kwargs)
310
311 doc = fn.__doc__ is not None and fn.__doc__ or ""
/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/create.py in create_engine(url, **kwargs)
528
529 # create url.URL object
--> 530 u = _url.make_url(url)
531
532 u, plugins, kwargs = u._instantiate_plugins(kwargs)
/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/url.py in make_url(name_or_url)
713
714 if isinstance(name_or_url, util.string_types):
--> 715 return _parse_rfc1738_args(name_or_url)
716 else:
717 return name_or_url
/usr/local/lib/python3.7/dist-packages/sqlalchemy/engine/url.py in _parse_rfc1738_args(name)
775 else:
776 raise exc.ArgumentError(
--> 777 "Could not parse rfc1738 URL from string '%s'" % name
778 )
779
ArgumentError: Could not parse rfc1738 URL from string 'gdrive/My Drive/chapter_classification/output/sqlite:///opt1.db'
So, according to the official documentation of create_study:
When a database URL is passed, Optuna internally uses SQLAlchemy to handle the database. Please refer to SQLAlchemy’s document for further details. If you want to specify non-default options to SQLAlchemy Engine, you can instantiate RDBStorage with your desired options and pass it to the storage argument instead of a URL.
And when you visit the SQLAlchemy documentation, you find that the database URL must start with the dialect scheme (for SQLite, sqlite:///), followed by the path.
So all you have to do is change
storage=f"gdrive/My Drive/chapter_classification/output/sqlite:///{name}.db"
so that the scheme comes first:
storage = f"sqlite:///gdrive/My Drive/chapter_classification/output/{name}.db"
I have trained a tf.estimator based TensorFlow model in SageMaker and deployed it, and it works fine.
But I can only send requests to it in JSON format. I need to send some big input tensors, and this is very inefficient and also quickly breaks InvokeEndpoint's 5 MB request limit.
Is it possible to use a more efficient format against the TensorFlow Serving based endpoint?
I tried sending a protobuf based request:
import numpy as np
import tensorflow as tf
from sagemaker.tensorflow.serving import Model
from sagemaker.tensorflow.tensorflow_serving.apis import predict_pb2
from sagemaker.tensorflow.predictor import tf_serializer, tf_deserializer
from sagemaker.predictor import RealTimePredictor
role = 'xxx'
model = Model('s3://xxx/tmp/artifacts/sagemaker-tensorflow-scriptmode-xxx/output/model.tar.gz', role)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge', endpoint_name='test-endpoint')
# this predictor has a JSON serializer, so replace it with one that speaks protobuf
predictor = RealTimePredictor('test-endpoint', serializer=tf_serializer, deserializer=tf_deserializer)
req = predict_pb2.PredictRequest()
req.inputs['instances'].CopyFrom(tf.make_tensor_proto(np.zeros((4, 36, 64)), shape=(4, 36, 64)))
predictor.predict(req)
Which results in the following error:
---------------------------------------------------------------------------
ModelError Traceback (most recent call last)
<ipython-input-40-5ba7f281bd0d> in <module>()
----> 1 predictor.predict(req)
~/anaconda3/envs/default/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args)
76
77 request_args = self._create_request_args(data, initial_args)
---> 78 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
79 return self._handle_response(response)
80
~/anaconda3/envs/default/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
355 "%s() only accepts keyword arguments." % py_operation_name)
356 # The "self" in this scope is referring to the BaseClient.
--> 357 return self._make_api_call(operation_name, kwargs)
358
359 _api_call.__name__ = str(py_operation_name)
~/anaconda3/envs/default/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
659 error_code = parsed_response.get("Error", {}).get("Code")
660 error_class = self.exceptions.from_code(error_code)
--> 661 raise error_class(parsed_response, operation_name)
662 else:
663 return parsed_response
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (415) from model with message "{"error": "Unsupported Media Type: application/octet-stream"}".
Is JSON the only available query format for deployed TensorFlow models?
Have you looked at batch transform? If you don't actually need an HTTPS endpoint, this might solve your problem:
https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-batch-transform.html
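As a rough sketch with the model object from the question (the S3 input/output paths are placeholders, and the content type depends on what your container accepts):
transformer = model.transformer(instance_count=1,
                                instance_type='ml.c5.xlarge',
                                output_path='s3://xxx/tmp/batch-output')
transformer.transform('s3://xxx/tmp/batch-input', content_type='application/json')
transformer.wait()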
I am trying to Invoke Endpoint, previously deployed on Amazon SageMaker.
Here is my code:
import io
import numpy as np
import boto3
client = boto3.client('sagemaker-runtime')
def np2csv(arr):
csv = io.BytesIO()
np.savetxt(csv, arr, delimiter=',', fmt='%g')
return csv.getvalue().decode().rstrip()
endpoint_name = 'DEMO-XGBoostEndpoint-2018-12-12-22-07-28'
test_vector = np.array([3.60606061e+00,
3.91395664e+00,
1.34200000e+03,
4.56100000e+03,
2.00000000e+02,
2.00000000e+02])
csv_test_vector = np2csv(test_vector)
response = client.invoke_endpoint(EndpointName=endpoint_name,
ContentType='text/csv',
Body=csv_test_vector)
And here is the error I get:
ModelError Traceback (most recent call last)
in <module>()
1 response = client.invoke_endpoint(EndpointName=endpoint_name,
2 ContentType='text/csv',
----> 3 Body=csv_test_vector)
/home/ec2-user/anaconda3/envs/python2/lib/python2.7/site-packages/botocore/client.pyc in _api_call(self, *args, **kwargs)
318 "%s() only accepts keyword arguments." % py_operation_name)
319 # The "self" in this scope is referring to the BaseClient.
--> 320 return self._make_api_call(operation_name, kwargs)
321
    322         _api_call.__name__ = str(py_operation_name)
/home/ec2-user/anaconda3/envs/python2/lib/python2.7/site-packages/botocore/client.pyc in _make_api_call(self, operation_name, api_params)
621 error_code = parsed_response.get("Error", {}).get("Code")
622 error_class = self.exceptions.from_code(error_code)
--> 623 raise error_class(parsed_response, operation_name)
624 else:
625 return parsed_response
ModelError: An error occurred (ModelError) when calling the
InvokeEndpoint operation: Received client error (415) from model with
message "setting an array element with a sequence.". See
https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/DEMO-XGBoostEndpoint-2018-12-12-22-07-28
in account 249707424405 for more information.
This works:
import numpy as np
import boto3
client = boto3.client('sagemaker-runtime')
endpoint_name = 'DEMO-XGBoostEndpoint-2018-12-12-22-07-28'
test_vector = [3.60606061e+00,
               3.91395664e+00,
               1.34200000e+03,
               4.56100000e+03,
               2.00000000e+02,
               2.00000000e+02]
body = ','.join([str(item) for item in test_vector])
response = client.invoke_endpoint(EndpointName=endpoint_name,
ContentType='text/csv',
Body=body)
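(I believe the first version failed because np.savetxt writes a 1-D array as a column, one value per line, so the endpoint received six rows of one feature each; joining with commas produces the single CSV row it expects.)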