TensorFlow on Elastic Beanstalk and Django

import tensorflow as tf

with tf.Graph().as_default():
    sentences = tf.placeholder(tf.string)
    import tensorflow_hub as hub
    embed = hub.Module('/tmp/module')
    embeddings = embed(sentences)
This is from the views.py file in my Django app.
The execution fails at the import and shows the following error:
End of script output before headers: wsgi.py child pid 29409 exit
signal Segmentation fault (11)
The same code works fine on my local machine.
I am currently using a t3.medium instance. Any tips on how to fix this?

Related

How to upgrade a TF1 GAN notebook on Colab to TF2? It doesn't work because Colab no longer supports TF1

I was trying to run this notebook on Colab,
https://colab.research.google.com/github/https-deeplearning-ai/GANs-Public/blob/master/C1W1_(Colab)_Inputs_to_a_pre_trained_GAN.ipynb,
but first I got this:
ValueError: Tensorflow 1 is unsupported in Colab.
Then I upgraded it using this script:
import tensorflow as tf
!tf_upgrade_v2 \
--intree stylegan/ \
--inplace
and I commented out these lines:
%tensorflow_version 1.x
tflib.init_tf()
But then I got the following error and couldn't solve it:
AttributeError: Can't get attribute 'Network' on <module 'dnnlib.tflib.network' from '/content/stylegan/dnnlib/tflib/network.py'>
Can somebody help? The full notebook code is below:
# Clone the official StyleGAN repository from GitHub
!git clone https://github.com/NVlabs/stylegan.git

%tensorflow_version 1.x

import os
import pickle
import numpy as np
import PIL.Image
import stylegan
from stylegan import config
from stylegan.dnnlib import tflib
from tensorflow.python.util import module_wrapper

module_wrapper._PER_MODULE_WARNING_LIMIT = 0

# Initialize TensorFlow
tflib.init_tf()

# Go into the cloned directory
path = 'stylegan/'
if "stylegan" not in os.getcwd():
    os.chdir(path)

# Load the pre-trained network
# url = 'https://drive.google.com/uc?id=1MEGjdvVpUsu1jB4zrXZN7Y4kBBOzizDQ'  # Downloads the pickled model file: karras2019stylegan-ffhq-1024x1024.pkl
url = 'https://bitbucket.org/ezelikman/gans/downloads/karras2019stylegan-ffhq-1024x1024.pkl'

with stylegan.dnnlib.util.open_url(url, cache_dir=config.cache_dir) as f:
    print(f)
    _G, _D, Gs = pickle.load(f)

# Gs.print_layers()  # Print network details
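One way to narrow down that AttributeError (a hypothetical diagnostic, assuming the working directory is the stylegan/ checkout so that dnnlib is importable on its own) is to check whether the module the unpickler resolves actually exposes a Network class:

import dnnlib.tflib.network as network

print(network.__file__)             # should point at .../stylegan/dnnlib/tflib/network.py
print(hasattr(network, "Network"))  # False reproduces the AttributeError above

If this prints False, the upgraded network.py no longer defines the class the pickled model expects, so that file (or the tf_upgrade_v2 step) is the place to look.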

Problem with init() function for model deployment in Azure

I want to deploy a model in Azure but I'm struggling with the following problem.
I have my model registered in Azure; the file with the .sav extension is located locally. The registration looks like this:
import urllib.request
from azureml.core.model import Model
# Register model
model = Model.register(ws, model_name="my_model_name.sav", model_path="model/")
I have my score.py file. The init() function in the file looks like this:
import json
import numpy as np
import pandas as pd
import os
import pickle
from azureml.core.model import Model
def init():
    global model
    model_path = Model.get_model_path(model_name='my_model_name.sav', _workspace='workspace_name')
    model = pickle(open(model_path, 'rb'))
But when I try to deploy I see the following error:
"code": "AciDeploymentFailed",
"statusCode": 400,
"message": "Aci Deployment failed with exception: Your container application crashed. This may be caused by errors in your scoring file's init() function.
1. Please check the logs for your container instance: leak-tester-pm. From the AML SDK, you can run print(service.get_logs()) if you have service object to fetch the logs.
And when I run print(service.logs()) I have the following output (I have only one model registered in Azure):
None
Am I doing something wrong with loading the model in the score.py file?
P.S. The .yml file for the deployment:
name: project_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
  - python=3.6.2
  - pip:
      - scikit-learn==0.24.2
      - azureml-defaults
      - numpy
      - pickle-mixin
      - pandas
      - xgboost
      - azure-ml-api-sdk
channels:
  - anaconda
  - conda-forge
The local inference server allows you to quickly debug your entry script (score.py). If the underlying scoring script has a bug, the server will fail to initialize or to serve the model; instead, it will raise an exception and point to the location where the issue occurred.
There are two possible reasons for the error or exception:
1. An HTTP server issue, which needs troubleshooting.
2. The Docker deployment.
You need to debug the procedure you followed. In some cases, HTTP server issues will cause the initialization (init()) to fail.
Check the Azure Machine Learning inference HTTP server for better debugging from the server's perspective.
The deployment file posted looks fine, but it's better to debug once by following the steps at https://learn.microsoft.com/en-us/azure/machine-learning/how-to-troubleshoot-deployment-local#dockerlog
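As a concrete way to surface the init() failure before going to ACI, you can deploy the same scoring setup to a local Docker container and read the logs directly. This is only a sketch under assumptions (a config.json for the workspace on disk, the registered model name from the question, and the posted conda spec saved as environment.yml):

from azureml.core import Workspace
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import LocalWebservice

ws = Workspace.from_config()                    # assumes config.json is available locally
model = Model(ws, name="my_model_name.sav")     # the registered model from the question

env = Environment.from_conda_specification("project_environment", "environment.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Deploy locally so init() failures show up in the console and in get_logs()
local_config = LocalWebservice.deploy_configuration(port=6789)
service = Model.deploy(ws, "local-debug-service", [model], inference_config, local_config)
service.wait_for_deployment(show_output=True)
print(service.get_logs())

If init() raises, the traceback shows up in these logs instead of a generic ACI 400 error.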
Try the code below inside the init() function:
def init():
    global model
    # Path to the model file inside the deployment (example path)
    model_path = "modelfoldername/model.pkl"
    # Load the model from disk
    with open(model_path, 'rb') as f:
        model = pickle.load(f)

How can I run python code after a DBT run (or a specific model) is completed?

I would like to be able to run an ad-hoc Python script that accesses and runs analytics on the models calculated by a dbt run. Are there any best practices around this?
We recently built a tool that caters very much to this scenario. It leverages the ease of referencing dbt tables from Python-land. It's called fal.
The idea is that you would define the python scripts you would like to run after your dbt models are run:
# schema.yml
models:
  - name: iris
    meta:
      owner: "#matteo"
      fal:
        scripts:
          - "notify.py"
And then the file notify.py is called if the iris model was run in the last dbt run:
# notify.py
import os
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError

CHANNEL_ID = os.getenv("SLACK_BOT_CHANNEL")
SLACK_TOKEN = os.getenv("SLACK_BOT_TOKEN")

client = WebClient(token=SLACK_TOKEN)
message_text = f"""Model: {context.current_model.name}
Status: {context.current_model.status}
Owner: {context.current_model.meta['owner']}"""

try:
    response = client.chat_postMessage(
        channel=CHANNEL_ID,
        text=message_text
    )
except SlackApiError as e:
    assert e.response["error"]
Each script is run with a reference, in a context variable, to the current model it is running for.
To start using fal, just pip install fal and start writing your Python scripts.
For production, I'd recommend an orchestration layer such as Apache Airflow.
See this blog post to get started, but essentially you'll have an orchestration DAG (note: not a dbt DAG) that does something like:
dbt run <with args> -> your python code
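A minimal sketch of that orchestration DAG, assuming Airflow 2.x, a hypothetical dbt project at /path/to/dbt_project, and run_analytics() standing in for your own script:

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def run_analytics():
    # Placeholder for the ad-hoc analysis you want to run on the freshly built models
    pass

with DAG(
    dag_id="dbt_then_python",
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /path/to/dbt_project && dbt run",
    )
    analytics = PythonOperator(
        task_id="analytics",
        python_callable=run_analytics,
    )
    dbt_run >> analytics  # the Python step runs only after dbt run succeeds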
Fair warning, though, this can add a bit of complexity to your project.
I suppose you could get a similar effect with a CI/CD tool like GitHub Actions or CircleCI.

Issue while using transformers package inside the docker image

I am using a transformers pipeline to perform sentiment analysis on sample texts from 6 different languages. I tested the code in my local JupyterHub and it worked fine. But when I wrap it in a Flask application and create a Docker image out of it, the execution hangs at the pipeline inference line and takes forever to return the sentiment scores.
macOS Catalina 10.15.7 (no GPU)
Python version: 3.8
Transformers package: 4.4.2
torch version: 1.6.0
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline
model_name = "nlptown/bert-base-multilingual-uncased-sentiment"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
classifier = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
results = classifier(["We are very happy to show you the Transformers library.", "We hope you don't hate it."])
print([i['score'] for i in results])
The above code works fine in a Jupyter notebook and gives me the expected result:
[0.7495927810668945, 0.2365245819091797]
But when I create a Docker image with the Flask wrapper, it gets stuck at the results = classifier([input_data]) line and the execution runs forever.
My folder structure is as follows:
- src
  |-- app
  |   |-- main.py
  |-- Dockerfile
  |-- requirements.txt
I used the below Dockerfile to create the image
FROM tiangolo/uwsgi-nginx-flask:python3.8
COPY ./requirements.txt /requirements.txt
COPY ./app /app
WORKDIR /app
RUN pip install -r /requirements.txt
RUN echo "uwsgi_read_timeout 1200s;" > /etc/nginx/conf.d/custom_timeout.conf
And my requirements.txt file is as follows:
pandas==1.1.5
transformers==4.4.2
torch==1.6.0
My main.py script looks like this:
from flask import Flask, json, request, jsonify
import traceback
import pandas as pd
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

app = Flask(__name__)
app.config["JSON_SORT_KEYS"] = False

model_name = 'nlptown/bert-base-multilingual-uncased-sentiment'
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

@app.route("/")
def hello():
    return "Model: Sentiment pipeline test"

@app.route("/predict", methods=['POST'])
def predict():
    json_request = request.get_json(silent=True)
    input_list = [i['text'] for i in json_request["input_data"]]
    results = nlp(input_list)  ########## Getting stuck here
    for result in results:
        print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
    score_list = [round(i['score'], 4) for i in results]
    return jsonify(score_list)

if __name__ == "__main__":
    app.run(host='0.0.0.0', debug=False, port=80)
My input payload is of the form
{"input_data" : [{"text" : "We are very happy to show you the Transformers library."},
{"text" : "We hope you don't hate it."}]}
I tried looking into the transformers GitHub issues but couldn't find a matching one. The execution works fine even when using the Flask development server, but it runs forever when I wrap it up and create a Docker image. I am not sure if I am missing any additional dependency to include while creating the Docker image.
Thanks.
I was having a similar issue. It seems that starting the app somehow pollutes the memory of the transformers models. It is probably something to do with how Flask does threading, but I have no idea why. What fixed it for me was doing the things that cause trouble (loading the models) in a different thread.
import threading

def preload_models():
    "LOAD MODELS"
    return 0

def start_app():
    app = create_app()
    register_handlers(app)
    preloading = threading.Thread(target=preload_models)
    preloading.start()
    preloading.join()
    return app
First reply here. I would be really glad if this helps.
Flask uses port 5000 by default. When creating a Docker image, it's important to make sure the port is set up accordingly. Replace the last line with the following:
app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))
Also be sure to import os at the top.
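Putting both suggestions together, the tail of main.py would look roughly like this (illustrative only; the model loading and routes stay as in the question):

import os
from flask import Flask

app = Flask(__name__)
# ... model loading and routes as defined in the question ...

if __name__ == "__main__":
    # Read the port from the environment, defaulting to Flask's usual 5000
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))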
Lastly, in the Dockerfile, add:
EXPOSE 5000
CMD ["python", "./main.py"]

why does importing keras take so long on AWS Lambda?

Problem
I'm trying to load and use a keras model in AWS Lambda, but importing keras from tensorflow is taking a long time in my lambda function. Curiously, though, it didn't take very long in SageMaker. Why is this, and how can I fix it?
Setup Description
I'm using the serverless framework to deploy my function. The handler and serverless.yml are included below. I have an EFS volume holding my dependencies, which were installed using an EC2 instance with the EFS volume mounted. I pip installed dependencies to the EFS with the -t flag. For example, I installed tensorflow like this:
sudo miniconda3/envs/devenv/bin/pip install tensorflow -t /mnt/efs/fs1/lib
where /mnt/efs/fs1/lib is the folder on the EFS which stores my dependencies. The models are stored on s3.
I prototyped loading my model in a SageMaker notebook with the following code:
import time
start = time.time()

from tensorflow import keras
print('keras: {}'.format(time.time()-start))

import boto3
import os
import zipfile
print('imports: {}'.format(time.time()-start))

modelPath = '***model.zip'
bucket = 'predictionresources'

def load_motion_model():
    s3 = boto3.client('s3')
    s3.download_file(bucket, modelPath, 'model.motionmodel.zip')
    with zipfile.ZipFile('model.motionmodel.zip', 'r') as zip_ref:
        zip_ref.extractall('model.motionmodel')
    return keras.models.load_model('model.motionmodel/' + os.listdir('model.motionmodel')[0])

model = load_motion_model()
print('total time: {}'.format(time.time()-start))
which has the following output:
keras: 2.0228586196899414
imports: 2.0231151580810547
total time: 3.0635251998901367
So, including all imports, this takes around 3 seconds to execute.
However, when I deploy to AWS Lambda with Serverless, keras takes substantially longer to import. Here are the Lambda function (which is the same as the one above, just wrapped in a handler) and the serverless.yml:
Handler
try:
    import sys
    import os
    sys.path.append(os.environ['MNT_DIR'] + '/lib0')  # nopep8 # noqa
except ImportError:
    pass

import json

# Returns the version of all dependencies
def test(event, context):
    print('TEST LOADING')
    import time
    start = time.time()

    from tensorflow import keras
    print('keras: {}'.format(time.time()-start))

    import boto3
    import os
    import zipfile
    print('imports: {}'.format(time.time()-start))

    modelPath = '**********nmodel.zip'
    bucket = '***********'

    def load_motion_model():
        s3 = boto3.client('s3')
        s3.download_file(bucket, modelPath, 'model.motionmodel.zip')
        with zipfile.ZipFile('model.motionmodel.zip', 'r') as zip_ref:
            zip_ref.extractall('model.motionmodel')
        return keras.models.load_model('model.motionmodel/' + os.listdir('model.motionmodel')[0])

    model = load_motion_model()
    print('total time: {}'.format(time.time()-start))

    body = {
        'message': 'done!'
    }
    response = {
        "statusCode": 200,
        "body": json.dumps(body)
    }
    return response
(p.s. I know this would fail due to lack of write access, and the model needs to be saved to /tmp/)
Serverless.yml
service: test4KerasTest

plugins:
  - serverless-pseudo-parameters

custom:
  efsAccessPoint: fsap-*****
  LocalMountPath: /mnt/efs
  subnetsId: subnet-*****
  securityGroup: sg-*****

provider:
  name: aws
  runtime: python3.6
  region: us-east-2
  timeout: 30
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:GetObject
        - s3:PutObject
      Resource: 'arn:aws:s3:::predictionresources/*'

package:
  exclude:
    - node_modules/**
    - .vscode/**
    - .serverless/**
    - .pytest_cache/**
    - __pycache__/**

functions:
  test:
    handler: handler.test
    environment: # Service wide environment variables
      MNT_DIR: ${self:custom.LocalMountPath}
      BUCKET: predictionresources
      REGION: us-east-2
    vpc:
      securityGroupIds:
        - ${self:custom.securityGroup}
      subnetIds:
        - ${self:custom.subnetsId}
    iamManagedPolicies:
      - arn:aws:iam::aws:policy/AmazonElasticFileSystemClientReadWriteAccess
      - arn:aws:iam::aws:policy/AmazonS3FullAccess
    events:
      - http:
          path: test
          method: get
    fileSystemConfig:
      localMountPath: '${self:custom.LocalMountPath}'
      arn: 'arn:aws:elasticfilesystem:${self:provider.region}:#{AWS::AccountId}:access-point/${self:custom.efsAccessPoint}'
results in this output from CloudWatch:
[CloudWatch screenshot: import timings for the Lambda invocation]
As can be seen, keras takes substantially longer to import in my Lambda environment, but the other imports don't seem to be as negatively affected. I have tried importing different modules in different orders, and keras consistently takes an unreasonable amount of time to import. Due to restrictions imposed by API Gateway, this function can't take longer than 30 seconds, which means I have to find a way to shorten the time it takes to import keras in my Lambda function.
