Problem with init() function for model deployment in Azure - python

I want to deploy a model in Azure, but I'm struggling with the following problem.
I have my model registered in Azure. The .sav file is located locally. The registration looks like this:
import urllib.request
from azureml.core.model import Model
# Register model
model = Model.register(ws, model_name="my_model_name.sav", model_path="model/")
I have my score.py file. The init() function in the file looks like this:
import json
import numpy as np
import pandas as pd
import os
import pickle
from azureml.core.model import Model
def init():
    global model
    model_path = Model.get_model_path(model_name = 'my_model_name.sav', _workspace='workspace_name')
    model = pickle(open(model_path, 'rb'))
But when I try to deploy, I see the following error:
"code": "AciDeploymentFailed",
"statusCode": 400,
"message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
1. Please check the logs for your container instance: leak-tester-pm. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
And when I run print(service.get_logs()) I get the following output (I have only one model registered in Azure):
None
Am I doing something wrong with loading the model in the score.py file?
P.S. The .yml file for the deployment:
name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
  - python=3.6.2
  - pip:
    - scikit-learn==0.24.2
    - azureml-defaults
    - numpy
    - pickle-mixin
    - pandas
    - xgboost
    - azure-ml-api-sdk
channels:
  - anaconda
  - conda-forge

The local inference server allows you to quickly debug your entry script (score.py). If the underlying scoring script has a bug, the server will fail to initialize or to serve the model; instead, it will throw an exception and point to where the issue occurred.
There are two likely causes of the error or exception:
- an HTTP server issue, which needs to be troubleshot, or
- the Docker deployment itself.
Debug the procedure you followed; in some cases, an HTTP server issue is what causes the initialization (init()) failure. Check the Azure Machine Learning inference HTTP server for better debugging from the server's perspective.
The environment file you posted looks fine, but it is worth debugging once with the steps described in https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-deployment-local#dockerlog
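For the Docker/local path, a quick way to surface init() errors is a local debug deployment. The snippet below is only a sketch and assumes ws, model, and an inference_config pointing at score.py are already defined as in the question:
from azureml.core.model import Model
from azureml.core.webservice import LocalWebservice

# Deploy the same model + score.py locally in a Docker container
deployment_config = LocalWebservice.deploy_configuration(port=6789)
service = Model.deploy(ws, "local-debug", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)

# Any exception raised in init() shows up directly in these logs
print(service.get_logs())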

Try the code below inside the init() function:
import pickle

def init():
    global model
    # path to the serialized model file inside the deployed container
    model_path = "modelfoldername/model.pkl"
    # load the model from disk
    model = pickle.load(open(model_path, 'rb'))
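Alternatively, a minimal sketch that resolves the registered model's path at scoring time with Model.get_model_path(); it assumes the model was registered under the name 'my_model_name.sav' and that the file is a plain pickle (if the registration pointed at a folder, append the file name to the resolved path):
import pickle
from azureml.core.model import Model

def init():
    global model
    # Inside the deployed container no workspace argument is needed
    model_path = Model.get_model_path('my_model_name.sav')
    with open(model_path, 'rb') as f:
        model = pickle.load(f)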

Related

How can I run python code after a DBT run (or a specific model) is completed?

I would like to be able to run an ad-hoc Python script that accesses and runs analytics on the models calculated by a dbt run. Are there any best practices around this?
We recently built a tool that caters to exactly this scenario. It leverages the ease of referencing dbt tables from Python-land. It's called fal.
The idea is that you would define the python scripts you would like to run after your dbt models are run:
# schema.yml
models:
  - name: iris
    meta:
      owner: "#matteo"
      fal:
        scripts:
          - "notify.py"
And then the file notify.py is called if the iris model was run in the last dbt run:
# notify.py
import os
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

CHANNEL_ID = os.getenv("SLACK_BOT_CHANNEL")
SLACK_TOKEN = os.getenv("SLACK_BOT_TOKEN")
client = WebClient(token=SLACK_TOKEN)

message_text = f"""Model: {context.current_model.name}
Status: {context.current_model.status}
Owner: {context.current_model.meta['owner']}"""

try:
    response = client.chat_postMessage(
        channel=CHANNEL_ID,
        text=message_text
    )
except SlackApiError as e:
    assert e.response["error"]
Each script is run with a reference to the current model it ran for, available in a context variable.
To start using fal, just pip install fal and start writing your python scripts.
For production, I'd recommend an orchestration layer such as Apache Airflow.
See this blog post to get started, but essentially you'll have an orchestration DAG (note - not a dbt DAG) that does something like:
dbt run <with args> -> your python code
Fair warning, though, this can add a bit of complexity to your project.
I suppose you could get a similar effect with a CI/CD tool like GitHub Actions or CircleCI.
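For illustration, here is a minimal Airflow DAG sketch of that pattern (a dbt run task followed by a Python task); the project path, schedule, and the notify() callable are assumptions, not part of the original answer:
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def notify():
    # Placeholder for your post-dbt analytics or notification code
    print("dbt run finished, running follow-up analytics")


with DAG(
    dag_id="dbt_then_python",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/my_dbt_project && dbt run",
    )
    post_run = PythonOperator(task_id="post_run_script", python_callable=notify)

    dbt_run >> post_run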

Issue while using transformers package inside the docker image

I am using the transformers pipeline to perform sentiment analysis on sample texts in 6 different languages. I tested the code in my local JupyterHub and it worked fine, but when I wrap it in a Flask application and create a Docker image out of it, the execution hangs at the pipeline inference line and takes forever to return the sentiment scores.
mac os catalina 10.15.7 (no GPU)
Python version : 3.8
Transformers package : 4.4.2
torch version : 1.6.0
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
results = classifier(["We are very happy to show you the Transformers library.", "We hope you don't hate it."])
print([i['score'] for i in results])
The above code works fine in the Jupyter notebook and gives the expected result:
[0.7495927810668945, 0.2365245819091797]
But when I create a Docker image with the Flask wrapper, it gets stuck at the results = classifier([input_data]) line and the execution runs forever.
My folder structure is as follows:
- src
  |-- app
  |   |-- main.py
  |-- Dockerfile
  |-- requirements.txt
I used the below Dockerfile to create the image
FROM tiangolo/uwsgi-nginx-flask:python3.8
COPY ./requirements.txt /requirements.txt
COPY ./app /app
WORKDIR /app
RUN pip install -r /requirements.txt
RUN echo "uwsgi_read_timeout 1200s;" > /etc/nginx/conf.d/custom_timeout.conf
And my requirements.txt file is as follows:
pandas==1.1.5
transformers==4.4.2
torch==1.6.0
My main.py script looks like this:
from flask import Flask, json, request, jsonify
import traceback
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

app = Flask(__name__)
app.config["JSON_SORT_KEYS"] = False

model_name = 'nlptown/bert-base-multilingual-uncased-sentiment'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

@app.route("/")
def hello():
    return "Model: Sentiment pipeline test"

@app.route("/predict", methods=['POST'])
def predict():
    json_request = request.get_json(silent=True)
    input_list = [i['text'] for i in json_request["input_data"]]
    results = nlp(input_list)  ########## Getting stuck here
    for result in results:
        print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
    score_list = [round(i['score'], 4) for i in results]
    return jsonify(score_list)

if __name__ == "__main__":
    app.run(host='0.0.0.0', debug=False, port=80)
My input payload is of the form
{"input_data" : [{"text" : "We are very happy to show you the Transformers library."},
{"text" : "We hope you don't hate it."}]}
I tried looking into the transformers GitHub issues but couldn't find a matching one. The execution works fine even when using the Flask development server, but it runs forever when I wrap it in a Docker image. I am not sure whether I am missing an additional dependency that needs to be included when creating the Docker image.
Thanks.
I was having a similar issue. It seems that starting the app somehow pollutes the memory of transformers models, probably something to do with how Flask does threading, but I have no idea why. What fixed it for me was doing the thing that causes trouble (loading the models) in a different thread.
import threading

def preload_models():
    "LOAD MODELS"
    return 0

def start_app():
    app = create_app()
    register_handlers(app)
    preloading = threading.Thread(target=preload_models)
    preloading.start()
    preloading.join()
    return app
First reply here. I would be really glad if this helps.
Flask uses port 5000. When creating a Docker image, it's important to make sure the port is set up this way. Replace the last line with the following:
app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))
Also be sure to import os at the top.
Lastly, in the Dockerfile, add:
EXPOSE 5000
CMD ["python", "./main.py"]

Why does importing keras take so long on AWS Lambda?

Problem
I'm trying to load and use a keras model in AWS Lambda, but importing keras from tensorflow is taking a long time in my lambda function. Curiously, though, it didn't take very long in SageMaker. Why is this, and how can I fix it?
Setup Description
I'm using the serverless framework to deploy my function. The handler and serverless.yml are included below. I have an EFS volume holding my dependencies, which were installed using an EC2 instance with the EFS volume mounted. I pip installed dependencies to the EFS with the -t flag. For example, I installed tensorflow like this:
sudo miniconda3/envs/devenv/bin/pip install tensorflow -t /mnt/efs/fs1/lib
where /mnt/efs/fs1/lib is the folder on the EFS which stores my dependencies. The models are stored on s3.
I prototyped loading my model in a SageMaker notebook with the following code:
import time
start = time.time()
from tensorflow import keras
print('keras: {}'.format(time.time()-start))
import boto3
import os
import zipfile
print('imports: {}'.format(time.time()-start))

modelPath = '***model.zip'
bucket = 'predictionresources'

def load_motion_model():
    s3 = boto3.client('s3')
    s3.download_file(bucket, modelPath, 'model.motionmodel.zip')
    with zipfile.ZipFile('model.motionmodel.zip', 'r') as zip_ref:
        zip_ref.extractall('model.motionmodel')
    return keras.models.load_model('model.motionmodel/' + os.listdir('model.motionmodel')[0])

model = load_motion_model()
print('total time: {}'.format(time.time()-start))
which has the following output:
keras: 2.0228586196899414
imports: 2.0231151580810547
total time: 3.0635251998901367
so, including all imports, this takes around 3 seconds to execute.
However, when I deploy to AWS Lambda with serverless, keras takes substantially longer to import. The Lambda function (the same code as above, just wrapped in a handler) and serverless.yml are:
Handler
try:
    import sys
    import os
    sys.path.append(os.environ['MNT_DIR']+'/lib0')  # nopep8 # noqa
except ImportError:
    pass

# returns the version of all dependencies
def test(event, context):
    print('TEST LOADING')
    import time
    start = time.time()
    from tensorflow import keras
    print('keras: {}'.format(time.time()-start))
    import boto3
    import os
    import json
    import zipfile
    print('imports: {}'.format(time.time()-start))

    modelPath = '**********nmodel.zip'
    bucket = '***********'

    def load_motion_model():
        s3 = boto3.client('s3')
        s3.download_file(bucket, modelPath, 'model.motionmodel.zip')
        with zipfile.ZipFile('model.motionmodel.zip', 'r') as zip_ref:
            zip_ref.extractall('model.motionmodel')
        return keras.models.load_model('model.motionmodel/' + os.listdir('model.motionmodel')[0])

    model = load_motion_model()
    print('total time: {}'.format(time.time()-start))

    body = {
        'message': 'done!'
    }
    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }
    return response
(p.s. I know this would fail due to lack of write access, and the model needs to be saved to /tmp/)
Serverless.yml
service: test4KerasTest
plugins:
  - serverless-pseudo-parameters
custom:
  efsAccessPoint: fsap-*****
  LocalMountPath: /mnt/efs
  subnetsId: subnet-*****
  securityGroup: sg-*****
provider:
  name: aws
  runtime: python3.6
  region: us-east-2
  timeout: 30
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObject
      Resource: 'arn:aws:s3:::predictionresources/*'
package:
  exclude:
    - node_modules/**
    - .vscode/**
    - .serverless/**
    - .pytest_cache/**
    - __pychache__/**
functions:
  test:
    handler: handler.test
    environment: # Service wide environment variables
      MNT_DIR: ${self:custom.LocalMountPath}
      BUCKET: predictionresources
      REGION: us-east-2
    vpc:
      securityGroupIds:
        - ${self:custom.securityGroup}
      subnetIds:
        - ${self:custom.subnetsId}
    iamManagedPolicies:
      - arn:aws:iam::aws:policy/AmazonElasticFileSystemClientReadWriteAccess
      - arn:aws:iam::aws:policy/AmazonS3FullAccess
    events:
      - http:
          path: test
          method: get
    fileSystemConfig:
      localMountPath: '${self:custom.LocalMountPath}'
      arn: 'arn:aws:elasticfilesystem:${self:provider.region}:#{AWS::AccountId}:access-point/${self:custom.efsAccessPoint}'
results in the CloudWatch output shown in the (omitted) screenshot. As can be seen there, keras takes substantially longer to import in my Lambda environment, while the other imports don't seem to be as negatively affected. I have tried importing different modules in different orders, and keras consistently takes an unreasonable amount of time to import. Due to restrictions imposed by the API Gateway, this function can't take longer than 30 seconds, which means I have to find a way to shorten the time it takes to import keras in my Lambda function.

Tensorflow on Elastic Beanstalk and Django

import tensorflow as tf

with tf.Graph().as_default():
    sentences = tf.placeholder(tf.string)
    import tensorflow_hub as hub
    embed = hub.Module('/tmp/module')
    embeddings = embed(sentences)
This is from the views.py file in my Django app.
The execution fails at the import and shows the following error:
End of script output before headers: wsgi.py child pid 29409 exit
signal Segmentation fault (11)
It works fine on my local machine. I am currently using a t3.medium instance. Any tips on how to fix this?

Azure Functions - Unable to import other python modules in called scripts

I have created a simple HTTP-triggered Azure Function in Python which calls another Python script to create a sample file in Azure Data Lake Gen 1. My solution structure is given below.
requirements.txt contains the following packages:
azure-functions
azure-mgmt-resource
azure-mgmt-datalake-store
azure-datalake-store
__init__.py
import logging, os, sys
import azure.functions as func
import json

def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')
    if name:
        full_path_to_script = os.path.join(os.path.dirname(__file__) + '/Test.py')
        logging.info(f"Path: - {full_path_to_script}")
        os.system(f"python {full_path_to_script}")
        return func.HttpResponse(f"Hello {name}!")
    else:
        return func.HttpResponse(
            "Please pass a name on the query string or in the request body",
            status_code=400
        )
Test.py
import json
from azure.datalake.store import core, lib, multithread

directoryId = ''
applicationKey = ''
applicationId = ''

adlsCredentials = lib.auth(tenant_id = directoryId, client_secret = applicationKey, client_id = applicationId)
adlsClient = core.AzureDLFileSystem(adlsCredentials, store_name = '')

with adlsClient.open('stage1/largeFiles/TestFile.json', 'rb') as input_file:
    data = json.load(input_file)

with adlsClient.open('stage1/largeFiles/Result.json', 'wb') as responseFile:
    responseFile.write(data)
Test.py fails with an error saying that no module named azure.datalake.store was found.
Why are the other required modules not available to Test.py, since it is inside the same directory?
pip freeze output:
adal==1.2.2
azure-common==1.1.23
azure-datalake-store==0.0.48
azure-functions==1.0.4
azure-mgmt-datalake-nspkg==3.0.1
azure-mgmt-datalake-store==0.5.0
azure-mgmt-nspkg==3.0.2
azure-mgmt-resource==6.0.0
azure-nspkg==3.0.2
certifi==2019.9.11
cffi==1.13.2
chardet==3.0.4
cryptography==2.8
idna==2.8
isodate==0.6.0
msrest==0.6.10
msrestazure==0.6.2
oauthlib==3.1.0
pycparser==2.19
PyJWT==1.7.1
python-dateutil==2.8.1
requests==2.22.0
requests-oauthlib==1.3.0
six==1.13.0
urllib3==1.25.6
Problem
os.system(f"python {full_path_to_script}") from your functions project is causing the issue.
The Azure Functions runtime sets up the environment, including modifying process-level variables such as sys.path, so that your function can load any dependencies you may have. When you create a sub-process like that, not all of this information flows through. Additionally, you will face issues with logging: logs from Test.py would not show up properly unless explicitly handled.
Importing works locally because you have all your requirements.txt modules installed and available to Test.py. This is not the case in Azure. After the remote build that happens as part of publishing, your modules are included as part of the published code package; they are not "installed" globally in the Azure environment per se.
Solution
You shouldn't have to run your script like that. In the example above, you could import your test.py from your __init__.py file, and that should behave like it was called python test.py (at least in the case above). Is there a reason you'd want to do python test.py in a sub-process over importing it?
Here's the official guide on how you'd want to structure your app to import shared code -- https://learn.microsoft.com/en-us/azure/azure-functions/functions-reference-python#folder-structure
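For illustration, a minimal sketch of that approach, assuming Test.py is refactored to wrap its work in a run() function (the function name and layout are hypothetical):
import azure.functions as func

from . import Test  # relative import from the same function folder


def main(req: func.HttpRequest) -> func.HttpResponse:
    name = req.params.get('name')
    if not name:
        return func.HttpResponse(
            "Please pass a name on the query string or in the request body",
            status_code=400,
        )
    Test.run()  # runs in-process, so packages from requirements.txt are importable
    return func.HttpResponse(f"Hello {name}!")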
Side-Note
I think once you get through the import issue, you may also face problems with adlsClient.open('stage1/largeFiles/TestFile.json', 'rb'). We recommend following the developer guide above to structure your project and using __file__ to get the absolute path (reference).
For example --
import pathlib

with open(pathlib.Path(__file__).parent / 'stage1' / 'largeFiles' / 'TestFile.json') as input_file:
    ...
Now, if you really want to make os.system(f"python {full_path_to_script}") work, there are workarounds to the import issue. But I'd rather not recommend such an approach unless you have a really compelling need for it. :)
