I'm trying to figure out the best way to preprocess input data for my AWS SageMaker inference endpoint. I'm using the BlazingText algorithm.
I'm not really sure the best way forward and I would be thankful for any pointers.
I currently train my model using a Jupyter notebook in Sagemaker and that works wonderfully, but the problem is that I use NLTK to clean my data (Swedish stopwords and stemming etc):
import nltk
nltk.download('punkt')
nltk.download('stopwords')
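Roughly, the cleaning step looks something like this (a simplified sketch using NLTK's Swedish stopword list and Snowball stemmer; the exact steps in my notebook differ a bit):

import nltk
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer
from nltk.tokenize import word_tokenize

nltk.download('punkt')
nltk.download('stopwords')

# Swedish resources used for the cleaning
swedish_stopwords = set(stopwords.words('swedish'))
stemmer = SnowballStemmer('swedish')

def clean(text):
    # Tokenize, drop stopwords and non-alphabetic tokens, then stem
    tokens = word_tokenize(text.lower(), language='swedish')
    return ' '.join(
        stemmer.stem(tok)
        for tok in tokens
        if tok.isalpha() and tok not in swedish_stopwords
    )

print(clean('Det här är ett enkelt exempel på förbehandling.'))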
So the question is really: how do I get the same pre-processing logic into the inference endpoint?
I have a couple of thoughts about how to proceed:
Build a docker container with the python libs & data installed with the sole purpose of pre-processing the data. Then use this container in the inference pipeline.
Supply the Python libs and script to an existing container, the same way you can supply external libs to a notebook.
Build a custom fastText container with the libs I need and run it outside of Sagemaker.
Build a Lambda function that has the proper Python libs & data installed and calls the SageMaker endpoint. This will probably work, but it feels like a "hack", and I'm worried about cold-start delays since the prediction traffic volume will be low.
I would like to go with the first option, but I'm struggling a bit to understand if there is a docker image that I could build from, and add my dependencies to, or if I need to build something from the ground up. For instance, would the image sagemaker-sparkml-serving:2.2 be a good candidate?
But maybe there is a better way all around?
Related
I have Python code which uses Keras with the TensorFlow backend. My system can't train this model due to low memory, so I want to make use of Amazon SageMaker.
However all the tutorials I find are about deploying your model in docker containers. My model isn't trained and I want to train it on Amazon Sagemaker.
Is there a way to do this?
EDIT: Also, can I turn my Python code into a script and run it on AWS SageMaker?
SageMaker lets you bring your own training script and run it on SageMaker using one of the pre-built containers for frameworks like TensorFlow, MXNet, and PyTorch.
Please take a look at https://github.com/aws/amazon-sagemaker-examples/blob/master/frameworks/tensorflow/get_started_mnist_train.ipynb
It walks through how you can bring in your training script using TensorFlow and train it using SageMaker.
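As a rough sketch of the pattern (the framework version, instance type, and S3 path below are placeholders):

import sagemaker
from sagemaker.tensorflow import TensorFlow

role = sagemaker.get_execution_role()  # or pass an explicit IAM role ARN

# "Script mode": your own Keras/TensorFlow training code lives in train.py
# and runs inside the pre-built TensorFlow container.
estimator = TensorFlow(
    entry_point='train.py',
    role=role,
    instance_count=1,
    instance_type='ml.m5.xlarge',   # pick an instance with enough memory
    framework_version='2.4.1',      # match the TF version your code needs
    py_version='py37',
)

# Channel name and S3 path are placeholders for your own data location.
estimator.fit({'training': 's3://your-bucket/path/to/training-data'})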
There are several other examples in the repository which will help you answer other questions you might have as you progress on with your SageMaker journey.
The question is really simple. I'm currently learning the AWS world, and this question has been eating at me: what is the difference between import xgboost and import sagemaker.xgboost?
On SageMaker I can work with the normal XGBoost library, and I know I can select different EC2 instance types with sagemaker.xgboost. But apart from that, what is the difference?
Is there any big difference?
Using model training as an example task: sagemaker.xgboost provides the ability to create Amazon SageMaker training jobs (and related AWS resources) in an environment that has the XGBoost library installed. So import xgboost gives you the modules for writing a training script that actually trains a model whereas import sagemaker.xgboost gives you modules for performing the training task on SageMaker.
The same applies for other tasks (e.g. predictions).
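A rough side-by-side to make the distinction concrete (the instance type, framework version, and S3 path are placeholders):

# Plain xgboost: trains a model right here, in the current process.
import numpy as np
import xgboost as xgb

X, y = np.random.rand(100, 4), np.random.rand(100)
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({'objective': 'reg:squarederror'}, dtrain, num_boost_round=10)

# sagemaker.xgboost: does not train locally; it launches a SageMaker training
# job that runs your script on a managed EC2 instance.
import sagemaker
from sagemaker.xgboost import XGBoost

estimator = XGBoost(
    entry_point='train.py',          # a script that itself does `import xgboost`
    framework_version='1.3-1',
    instance_type='ml.m5.xlarge',
    instance_count=1,
    role=sagemaker.get_execution_role(),
)
estimator.fit({'train': 's3://your-bucket/train/'})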
SageMaker XGBoost documentation: https://sagemaker.readthedocs.io/en/stable/frameworks/xgboost/using_xgboost.html#use-the-open-source-xgboost-algorithm
The requirement is that I have to trigger a SageMaker endpoint from Lambda to get predictions (which is easy), but I also have to do some extra processing for variable importance using packages such as XGBoost and SHAP.
I am able to hit the endpoint and get variable importance using the SageMaker Jupyter notebook. Now, I want to replicate the same thing on AWS Lambda.
1) How can I run Python code on AWS Lambda with package dependencies for Pandas, XGBoost and SHAP (total package size greater than 500 MB)? The unzipped deployment package size is greater than 250 MB, so Lambda does not allow the deployment. I even tried creating the Lambda function from Cloud9 and got the same error due to size restrictions. I have also tried Lambda layers, but no luck.
2) Is there a way for me to run code with such big packages on or through Lambda, bypassing the 250 MB deployment package size limitation?
3) Is there a way to trigger a SageMaker notebook execution from Lambda that would do the calculations and return the output back to Lambda?
Try uploading your dependencies as Lambda layers. FYI: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
In addition to using multiple layers for your dependencies, you may want to shrink the *.so files with the Linux strip command, which discards symbols from compiled object files that may not be necessary in production.
In order to strip all *.so files:
Use a Linux/Docker container with access to your dependencies directory
cd to your dependencies directory
Run
find . -name '*.so' -exec strip {} \;
This will run the strip command recursively on every *.so file under the current working directory.
It helped me reduce one of my dependencies objects from 94MB to just 7MB
I found the 250 MB limitation on AWS Lambda package size to be draconian. Just one file, libxgboost.so from the xgboost package, is already around 140 MB, which leaves only 110 MB for everything else. That makes AWS Lambda useless for anything but simple "hello world" stuff.
As an ugly workaround you can store the xgboost package somewhere on S3, copy it to the /tmp folder from the Lambda invocation routine, and point your Python path to it. The allowed /tmp space is a bit higher (500 MB), so it might work.
I am not sure, though, whether the /tmp folder is cleaned between Lambda function runs.
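A rough sketch of that workaround, assuming you have zipped the heavy dependencies yourself and uploaded the archive to S3 (the bucket, key, and paths below are placeholders):

import os
import sys
import zipfile

import boto3

BUCKET = 'your-bucket'                 # placeholder
KEY = 'layers/xgboost-deps.zip'        # placeholder: zip of the packaged dependencies
DEP_DIR = '/tmp/deps'

s3 = boto3.client('s3')

def _load_dependencies():
    # Download and unpack heavy dependencies into /tmp (roughly 500 MB available).
    # /tmp can survive warm invocations, so only download when it is missing.
    if not os.path.exists(DEP_DIR):
        archive = '/tmp/deps.zip'
        s3.download_file(BUCKET, KEY, archive)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(DEP_DIR)
    if DEP_DIR not in sys.path:
        sys.path.insert(0, DEP_DIR)

def handler(event, context):
    _load_dependencies()
    import xgboost                     # now importable from /tmp/deps
    return {'xgboost_version': xgboost.__version__}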
You can try using SageMaker Inference Pipelines to do pre-processing before making actual predictions. Basically, you can use the same pre-processing script used for training for inference as well. When the pipeline model is deployed, the full set of containers, including the pre-processing ones, installs and runs on each EC2 instance in the endpoint or transform job. Feature processing and inference are executed with low latency because the containers deployed in an inference pipeline are co-located on the same EC2 instance (endpoint). You can refer to the documentation here.
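A minimal deployment sketch of that approach, assuming you have already created a pre-processing model (for example an SKLearn model wrapping the NLTK cleaning) and a BlazingText model object; the names and instance type below are placeholders:

import sagemaker
from sagemaker.pipeline import PipelineModel

role = sagemaker.get_execution_role()

# Both models are sagemaker.model.Model objects you have already built, e.g.
#   preprocessor = sklearn_estimator.create_model()
#   blazingtext  = blazingtext_estimator.create_model()
pipeline_model = PipelineModel(
    name='nltk-preprocess-blazingtext',
    role=role,
    models=[preprocessor, blazingtext],   # containers run in this order
)

# One endpoint; each request passes through the pre-processing container
# before reaching the BlazingText container.
predictor = pipeline_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large',
    endpoint_name='blazingtext-with-preprocessing',
)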
Following blog posts/notebooks cover this feature in detail
Preprocess input data before making predictions using Amazon SageMaker inference pipelines and Scikit-learn
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/inference_pipeline_sparkml_xgboost_abalone/inference_pipeline_sparkml_xgboost_abalone.ipynb
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/inference_pipeline_sparkml_blazingtext_dbpedia/inference_pipeline_sparkml_blazingtext_dbpedia.ipynb
I have many tensorflow models which make use of 3rd party libraries (i.e. Gensim) to preprocess data prior to training and evaluation. This same preprocessing needs to happen when querying the model to make predictions.
If using either tensorflow-serving or the hosted Google ML solution, can I bundle 3rd party libs and a custom preprocessing step along with the model, and have either of the two serving solutions run it? Or, if I want to use 3rd party libraries, do I have to preprocess the data client-side? I have not come across any examples of this.
Just to be explicit - I know you can do server-side preprocessing using tensorflow's libs, I'm specifically interested in the 3rd-party case.
As far as ML Engine is concerned, I don't see how this would be possible. Models deployed there need to be in the SavedModel format, which doesn't include any Python files in which you could run custom processing. In contrast, the training job that creates the model can include custom dependencies.
Is there a way to compile the entire Python script with my trained model for faster inference? Seems like loading the Python interpreter, all of Tensorflow, numpy, etc. takes a non-trivial amount of time. When this has to happen at a server responding to a non-trivial frequency of requests, it seems slow.
Edit
I know I can use Tensorflow serving, but don't want to because of the costs associated with it.
How do you set up your server? If you are using a Python framework like Django, Flask, or Tornado, you just need to preload your model, keep it as a global variable, and then use this global variable to predict.
If you are using some other server, you can also turn the entire Python script you use for prediction into a local server, and translate requests and responses between the Python server and the web server.
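A minimal sketch of that preload pattern with Flask (the model path and expected request format are placeholders):

import numpy as np
import tensorflow as tf
from flask import Flask, jsonify, request

app = Flask(__name__)

# Load once at import time, not per request: the interpreter, TensorFlow,
# and the weights stay in memory for the lifetime of the server process.
model = tf.keras.models.load_model('model.h5')   # placeholder path

@app.route('/predict', methods=['POST'])
def predict():
    # Expects a JSON body like {"instances": [[...], [...]]}
    instances = np.array(request.get_json()['instances'])
    predictions = model.predict(instances)
    return jsonify(predictions.tolist())

if __name__ == '__main__':
    # Use a production WSGI server (gunicorn, uWSGI) instead of app.run() in practice.
    app.run(host='0.0.0.0', port=8080)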
Do you want to only serve the tensorflow model, or are you doing any work outside of tensorflow?
For just the tensorflow model, you could use TensorFlow Serving. If you are comfortable with gRPC, this will serve you quite well.
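For reference, a client-side sketch of calling a running TensorFlow Serving instance over gRPC (the model name, input name, and tensor shape are placeholders that depend on your SavedModel):

import grpc
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

# TensorFlow Serving's default gRPC port is 8500.
channel = grpc.insecure_channel('localhost:8500')
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'my_model'              # placeholder model name
request.model_spec.signature_name = 'serving_default'
request.inputs['inputs'].CopyFrom(                # input name depends on your SavedModel
    tf.make_tensor_proto(np.zeros((1, 10), dtype=np.float32))
)

response = stub.Predict(request, timeout=10.0)
print(response.outputs)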