I am deploying a machine learning image to Azure Container Instances from Azure Machine Learning service according to this article, but I always get stuck with this error message:
Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
Please check the logs for your container instance xxxxxxx'.
I tried increasing memory_gb=4 in aci_config, and I did some troubleshooting locally, but I could not find anything.
Below is my score.py:
import json
import numpy as np
import joblib
from azureml.core.model import Model

def init():
    global model
    # Retrieve the path of the registered model and load it
    model_path = Model.get_model_path('pofc_fc_model')
    model = joblib.load(model_path)

def run(raw_data):
    # Expects a JSON payload like {"data": [[...], ...]}
    data = np.array(json.loads(raw_data)['data'])
    y_hat = model.predict(data)
    return y_hat.tolist()
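As a quick local sanity check, the scoring functions can be exercised directly before deploying; here is a minimal sketch (the sample feature vector is a placeholder, not from the original post):
# Hypothetical local smoke test for score.py; the input shape is a placeholder.
import json
import score

score.init()
sample = json.dumps({'data': [[0.1, 0.2, 0.3, 0.4]]})
print(score.run(sample))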
Have you registered the model 'pofc_fc_model' in your workspace using the register() function on the model object? If not, there will be no model path and that can cause failure.
See this section on model registration: https://learn.microsoft.com/en-us/azure/machine-learning/service/how-to-deploy-and-where#registermodel
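For reference, registration with the azureml SDK is typically a one-time call like the following sketch (the workspace config and model path are placeholders, not taken from the question):
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()  # assumes a local config.json for the workspace
Model.register(workspace=ws,
               model_path='outputs/pofc_fc_model.pkl',  # hypothetical path to the serialized model
               model_name='pofc_fc_model')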
I've been following this guide here: https://aws.amazon.com/blogs/machine-learning/building-an-nlu-powered-search-application-with-amazon-sagemaker-and-the-amazon-es-knn-feature/
I have successfully deployed the model from my notebook instance. I am also able to generate predictions by calling the predict() method from sagemaker.predictor.
This is how I created and deployed the model
from sagemaker.predictor import Predictor
from sagemaker.pytorch import PyTorchModel

class StringPredictor(Predictor):
    def __init__(self, endpoint_name, sagemaker_session):
        super(StringPredictor, self).__init__(endpoint_name, sagemaker_session, content_type='text/plain')

pytorch_model = PyTorchModel(model_data=inputs,
                             role=role,
                             entry_point='inference.py',
                             source_dir='./code',
                             framework_version='1.3.1',
                             py_version='py3',
                             predictor_cls=StringPredictor)

predictor = pytorch_model.deploy(instance_type='ml.m5.large', initial_instance_count=4)
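For context, this is roughly how the returned predictor is invoked from the notebook (a minimal sketch; the payload string is just a placeholder):
# Hypothetical call from the notebook; the payload content is a placeholder.
result = predictor.predict("somestring that is used here")
print(result)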
From the SageMaker dashboard, I can even see my endpoint and that its status is "InService".
If I run aws sagemaker list-endpoints I can see my desired endpoint showing up correctly as well.
My issue is that when I run this code (outside of SageMaker), I get an error:
import boto3
sm_runtime_client = boto3.client('sagemaker-runtime')
payload = "somestring that is used here"
response = sm_runtime_client.invoke_endpoint(EndpointName='pytorch-inference-xxxx',ContentType='text/plain',Body=payload)
This is the error thrown
botocore.errorfactory.ValidationError: An error occurred (ValidationError) when calling the InvokeEndpoint operation: Endpoint pytorch-inference-xxxx of account xxxxxx not found.
This is quite strange, as I'm able to see and invoke the endpoint just fine from the SageMaker notebook, and I am able to run the predict() method too.
I have verified the region, endpoint name and the account number.
I was having the exact same error, and I just fixed mine by setting the correct region.
I have verified the region, endpoint name and the account number.
I know that you have indicated that you have verified the region, but in my case, the remote computer had another region configured. So I just ran the following command on my remote computer
aws configure
And once I set the key ID and secret key again, I set the correct region and the error was gone.
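Alternatively, the region can be pinned directly when creating the client instead of relying on the aws configure default; a minimal sketch (the region name is a placeholder):
import boto3

# Hypothetical region; use the region the endpoint was actually deployed in.
sm_runtime_client = boto3.client('sagemaker-runtime', region_name='us-east-1')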
Hosting a word2vec model with gensim on AWS lambda
Using Python 2.7 with
boto==2.48.0
gensim==3.4.0
I have a few lines in my function.py file where I load the model directly from S3:
import boto.s3
from boto.s3.connection import OrdinaryCallingFormat
from gensim.models import KeyedVectors

print('################### connecting to s3...')
s3_conn = boto.s3.connect_to_region(
    region,
    aws_access_key_id=Aws_access_key_id,
    aws_secret_access_key=Aws_secret_access_key,
    is_secure=True,
    calling_format=OrdinaryCallingFormat()
)
print('################### connected to s3...')
bucket = s3_conn.get_bucket(S3_BUCKET)
print('################### got bucket...')
key = bucket.get_key(S3_KEY)
print('################### got key...')
model = KeyedVectors.load_word2vec_format(key, binary=True)
print('################### loaded model...')
On the model loading line
model = KeyedVectors.load_word2vec_format(key, binary=True)
I get a mysterious error without much detail. In CloudWatch I can see all of my print messages up to and including '################### got key...',
then I get:
START RequestId: {req_id} Version: $LATEST
then right after it [no time delays between these two messages]
module initialization error: __exit__
Please, is there a way to get a detailed error or more info?
More background details:
I was able to download the model from S3 to /tmp/ and it did authorize and retrieve the model file, but it ran out of space [file is ~2GB, /tmp/ is 512MB],
so I switched to loading the model directly with gensim as above, and now I get that mysterious error.
Running the function locally with python-lambda-local works without issues,
so this probably narrows it down to an issue with gensim's smart_open or AWS Lambda. I would appreciate any hints, thanks!
Instead of connecting using boto, simply:
model = KeyedVectors.load_word2vec_format('s3://{}:{}@{}/{}'.format(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET, S3_KEY), binary=True)
worked!
But of course, unfortunately, it doesn't answer the question of why the mysterious __exit__ error came up or how to get more info :/
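One way to get more detail in a case like this (a minimal sketch, assuming the load happens at module import time) is to wrap the loading call in a try/except and print the full traceback, so it lands in CloudWatch instead of being collapsed into "module initialization error: __exit__":
import traceback

try:
    model = KeyedVectors.load_word2vec_format(key, binary=True)
except Exception:
    # Dump the full stack trace to stdout so CloudWatch captures it,
    # then re-raise so Lambda still reports the failure.
    traceback.print_exc()
    raise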
TL;DR: I need to find a real solution to download my data from the production Datastore and load it into the local development environment.
The detailed problem:
I need to test my app on the local development server with real data (not real-time data) from the production server's Datastore. The documentation and other resources offer three options:
Using appcfg.py to download data from the production server and then load it into the local development environment. When I use this method I get a 'bad request' error due to an OAuth problem. Besides, this method will be deprecated. The official documentation advises using the second method:
Using gcloud with managed export and import. The documentation for this method explains how to back up all the data from the console (at https://console.cloud.google.com/). I have tried this method: the backup data is generated in Cloud Storage, I downloaded it, and it is in LevelDB format. I need to load it into the local development server, but there is no official explanation for that, and the loading mechanism from the first option is not compatible with the LevelDB format. I couldn't find an official way to solve this. There is a StackOverflow entry, but it didn't work for me because it just reads all entities back as dicts, and converting those dict objects into ndb entities becomes the tricky part.
Having lost hope with the first two methods, I decided to use the Cloud Datastore Emulator (beta), which allows emulating real data in the local development environment. It is still in beta and has several problems. When I run the command, I run into a problem with DATASTORE_EMULATOR_HOST anyway.
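For reference (a minimal sketch; the host, port, and project ID are placeholders), client code is usually pointed at a running emulator by exporting the variables that gcloud beta emulators datastore env-init prints, which can also be set from Python before any Datastore access happens:
import os

# Hypothetical values; use whatever the emulator actually printed on startup.
os.environ['DATASTORE_EMULATOR_HOST'] = 'localhost:8081'
os.environ['DATASTORE_PROJECT_ID'] = 'my-project-id'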
It sounds like you should be using a remote sandbox
Even if you get this to work, the localhost datastore still behaves differently than the actual datastore.
If you want to truly simulate your production environment, then I would recommend setting up a clone of your App Engine project as a remote sandbox. You could deploy your app to a new GAE project id with appcfg.py update . -A sandbox-id, use Datastore Admin to create a backup of production in Google Cloud Storage, and then use Datastore Admin in your sandbox to restore that backup.
Cloning production data into localhost
I do prime my localhost datastore with some production data, but this is not a complete clone. Just the core required objects and a few test users.
To do this I wrote a Google Dataflow job that exports selected models and saves them in Google Cloud Storage in jsonl format. Then on my localhost I have an endpoint called /init/ which launches a taskqueue job to download these exports and import them.
To do this I reuse my JSON REST handler code, which is able to convert any model to JSON and vice versa.
In theory you could do this for your entire datastore.
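A hypothetical sketch of what such an /init/ import step could look like (the bucket path, file name, and handler wiring are my assumptions, not the author's code; it relies on the set_from_dto() method shown below):
import json
import cloudstorage as gcs  # App Engine Cloud Storage client library

class InitHandler(BaseHandler):
    def get(self):
        # Hypothetical export location; one JSON object per line (jsonl).
        fh = gcs.open('/my-bucket/exports/user.jsonl')
        for line in fh.read().splitlines():
            dto = json.loads(line)
            obj = User()           # pick the model class matching this export
            obj.set_from_dto(dto)
            obj.put()
        fh.close()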
EDIT - This is what my to-json/from-json code looks like:
All of my ndb.Models subclass my BaseModel which has generic conversion code:
get_dto_typemap = {
    ndb.DateTimeProperty: dt_to_timestamp,
    ndb.KeyProperty: key_to_dto,
    ndb.StringProperty: str_to_dto,
    ndb.EnumProperty: str,
}

set_from_dto_typemap = {
    ndb.DateTimeProperty: timestamp_to_dt,
    ndb.KeyProperty: dto_to_key,
    ndb.FloatProperty: float_from_dto,
    ndb.StringProperty: strip,
    ndb.BlobProperty: str,
    ndb.IntegerProperty: int,
}
class BaseModel(ndb.Model):
    def to_dto(self):
        dto = {'key': key_to_dto(self.key)}
        for name, obj in self._properties.iteritems():
            key = obj._name
            value = getattr(self, obj._name)
            if obj.__class__ in get_dto_typemap:
                if obj._repeated:
                    value = [get_dto_typemap[obj.__class__](v) for v in value]
                else:
                    value = get_dto_typemap[obj.__class__](value)
            dto[key] = value
        return dto

    def set_from_dto(self, dto):
        for name, obj in self._properties.iteritems():
            if isinstance(obj, ndb.ComputedProperty):
                continue
            key = obj._name
            if key in dto:
                value = dto[key]
                if not obj._repeated and obj.__class__ in set_from_dto_typemap:
                    try:
                        value = set_from_dto_typemap[obj.__class__](value)
                    except Exception as e:
                        raise Exception("Error setting " + self.__class__.__name__ + "." + str(key) + " to '" + str(value) + "': " + e.message)
                try:
                    setattr(self, obj._name, value)
                except Exception as e:
                    print dir(obj)
                    raise Exception("Error setting " + self.__class__.__name__ + "." + str(key) + " to '" + str(value) + "': " + e.message)
class User(BaseModel):
    # user fields, etc
    pass
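The converter helpers referenced in the typemaps above (dt_to_timestamp, key_to_dto, and so on) aren't included in the answer; a hypothetical sketch of what they might look like (the implementations are my assumptions):
import calendar
import datetime

# Hypothetical implementations of the converters used in the typemaps.
def dt_to_timestamp(dt):
    return calendar.timegm(dt.timetuple())       # datetime -> unix timestamp

def timestamp_to_dt(ts):
    return datetime.datetime.utcfromtimestamp(ts)

def key_to_dto(key):
    return key.urlsafe()                         # ndb.Key -> urlsafe string

def dto_to_key(value):
    return ndb.Key(urlsafe=value)

def str_to_dto(value):
    return unicode(value)

def float_from_dto(value):
    return float(value)

def strip(value):
    return value.strip() if value else value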
My request handlers then use set_from_dto & to_dto like this (BaseHandler also provides some convenience methods for converting json payloads to python dicts and what not):
class RestHandler(BaseHandler):
    MODEL = None

    def put(self, resource_id=None):
        if resource_id:
            obj = ndb.Key(urlsafe=resource_id).get()
            if obj:
                obj.set_from_dto(self.json_body)
                obj.put()
                return obj.to_dto()
            else:
                self.abort(422, "Unknown id")
        else:
            self.abort(405)

    def post(self, resource_id=None):
        if resource_id:
            self.abort(405)
        else:
            obj = self.MODEL()
            obj.set_from_dto(self.json_body)
            obj.put()
            return obj.to_dto()

    def get(self, resource_id=None):
        if resource_id:
            obj = ndb.Key(urlsafe=resource_id).get()
            if obj:
                return obj.to_dto()
            else:
                self.abort(422, "Unknown id")
        else:
            cursor_key = self.request.GET.pop('$cursor', None)
            cursor = ndb.Cursor(urlsafe=cursor_key) if cursor_key else None  # decode the cursor passed by the client
            limit = max(min(200, int(self.request.GET.pop('$limit', 200))), 10)
            qs = self.MODEL.query()
            # ... other code that handles query params
            results, next_cursor, more = qs.fetch_page(limit, start_cursor=cursor)
            return {
                '$cursor': next_cursor.urlsafe() if more else None,
                'results': [result.to_dto() for result in results],
            }


class UserHandler(RestHandler):
    MODEL = User
I am developing a Flask/MongoDB application that I'm deploying on Azure. Locally, I am in the process of creating my models and testing my database connection. I am using Flask-MongoEngine to manage my DB connection. This is a sample of code that works perfectly on localhost but fails when calling its deployed version on Azure.
# On models.py
from flask_mongoengine import MongoEngine

db = MongoEngine()

class User(db.Document):
    name = db.StringField(max_length=50)
    token = db.StringField(max_length=50)
    email = db.EmailField()
Later, from views.py I call my User class like this:
from flask import jsonify
import models as mdl

@app.route('/test')
def test():
    """For testing purposes"""
    user = mdl.User(name='Matias')
    user.save()
    users = mdl.User.objects
    return jsonify(users)
which outputs as expected locally. On Azure, however, I get the following error
(I will only show the last and relevant part of the traceback):
File ".\app\views.py", line 53, in test
user = mdl.User(name='Matias')
File "D:\home\python364x86\lib\site-packages\mongoengine\base\document.py",
line 43, in _init_
self._initialised = False
File "D:\home\python364x86\lib\site-packages\mongoengine\base\document.py",
line 168, in _setattr_
self._is_document and
AttributeError: 'User' object has no attribute '_is_document'
Through pip freeze I checked that I am using the same versions of mongoengine, pymongo and flask_mongoengine in both environments. I can't seem to find anyone else with the same problem. The app is deployed as a web app on a Windows machine in the Azure cloud.
Any help is appreciated, thanks.
PS: Further info
Reviewing the mongoengine code, I found out that the _is_document attribute is set inside a metaclass for the Document class (DocumentMetaclass and TopLevelDocumentMetaclass). I tried setting the attribute to True inside User, and the following error showed up:
AttributeError: 'User' object has no attribute '_meta',
which is also an attribute defined inside those metaclasses. Somehow the metaclass code is not running? Maybe the Azure environment has something to do with it?
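For reference, the usual Flask-MongoEngine wiring looks roughly like the following sketch (the connection URI and module layout are assumptions, not taken from the question):
# Hypothetical app wiring; the host URI is a placeholder.
from flask import Flask
from models import db  # the MongoEngine() instance from models.py

app = Flask(__name__)
app.config['MONGODB_SETTINGS'] = {'host': 'mongodb://localhost:27017/mydb'}
db.init_app(app)  # binds the documents defined on db to this app's connection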
I'm trying to programmatically spin up an Azure VM using the Python REST API wrapper. All I want is a simple VM, not part of a deployment or anything like that. I've followed the example here: http://www.windowsazure.com/en-us/develop/python/how-to-guides/service-management/#CreateVM
I've gotten the code to run, but I am not seeing any new VM in the portal; all it does is create a new cloud service that says "You have nothing deployed to the production environment." What am I doing wrong?
You've created a hosted_service (cloud service) but haven't deployed anything into that service. You need to do a few more things, so I'll continue from where you left off, where name is the name of the VM:
# Where should the OS VHD be created:
media_link = 'http://portalvhdsexample.blob.core.windows.net/vhds/%s.vhd' % name
# Linux username/password details:
linux_config = azure.servicemanagement.LinuxConfigurationSet(name, 'username', 'password', True)
# Endpoint (port) configuration example, since documentation on this is lacking:
endpoint_config = azure.servicemanagement.ConfigurationSet()
endpoint_config.configuration_set_type = 'NetworkConfiguration'
endpoint1 = azure.servicemanagement.ConfigurationSetInputEndpoint(name='HTTP', protocol='tcp', port='80', local_port='80', load_balanced_endpoint_set_name=None, enable_direct_server_return=False)
endpoint2 = azure.servicemanagement.ConfigurationSetInputEndpoint(name='SSH', protocol='tcp', port='22', local_port='22', load_balanced_endpoint_set_name=None, enable_direct_server_return=False)
endpoint_config.input_endpoints.input_endpoints.append(endpoint1)
endpoint_config.input_endpoints.input_endpoints.append(endpoint2)
# Set the OS HD up for the API:
os_hd = azure.servicemanagement.OSVirtualHardDisk(image_name, media_link)
# Actually create the machine and start it up:
try:
    sms.create_virtual_machine_deployment(service_name=name, deployment_name=name,
                                          deployment_slot='production', label=name, role_name=name,
                                          system_config=linux_config, network_config=endpoint_config,
                                          os_virtual_hard_disk=os_hd, role_size='Small')
except Exception as e:
    logging.error('AZURE ERROR: %s' % str(e))
    return False
return True
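For completeness, a hypothetical sketch of the setup this snippet assumes from the earlier steps (the subscription ID, certificate path, image name, and location are all placeholders):
# Hypothetical prerequisites; all values are placeholders.
import logging
import azure.servicemanagement

sms = azure.servicemanagement.ServiceManagementService('my-subscription-id', 'path/to/mycert.pem')
name = 'myvmservice'            # reused as the cloud service, deployment, and role name
image_name = 'SomeLinuxImage'   # pick one from sms.list_os_images()
sms.create_hosted_service(service_name=name, label=name, location='West US')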
Maybe I'm not understanding your problem, but a VM is essentially a deployment within a cloud service (think of it like a logical container for machines).