How to fit TensorFlow Serving Client API in a Python lambda?

I'm trying to build a Python Lambda to send images to TensorFlow Serving for inference. I have at least two dependencies: cv2 and tensorflow_serving.apis. I've followed multiple tutorials showing it's possible to run TensorFlow in a Lambda, but they provide a prebuilt package to install and don't explain how they got it to fit under the roughly 250 MB unzipped size limit.
How to Deploy ... Lambda and TensorFlow
Using TensorFlow and the Serverless Framework...
I've tried following the official instructions for packaging, but this alone downloads 475 MB of dependencies:
$ python -m pip install tensorflow-serving-api --target .
Collecting tensorflow-serving-api
Downloading https://files.pythonhosted.org/packages/79/69/1e724c0d98f12b12f9ad583a3df7750e14ec5f06069aa4be8d75a2ab9bb8/tensorflow_serving_api-1.12.0-py2.py3-none-any.whl
...
$ du -hs .
475M .
I see that others have fought this dragon and won (1) (2) by doing contortions to rip out all unused libraries from all dependencies or compile from scratch. But such extremes strike me as complicated and hopefully outdated in a world where data science and lambdas are almost mainstream. Is it true that so few people are using TensorFlow Serving with Python that I'll have to jump through such hoops to get one working as a Lambda? Or is there an easier way?

The goal is to avoid having TensorFlow on the client side at all, since it uses a ton of space but isn't really needed for making inference requests. Unfortunately the tensorflow-serving-api package requires the entire tensorflow package, which by itself is too big to fit into a Lambda.
What you can do instead is build your own client rather than using that package. This involves using the grpcio-tools package for the protobuf/gRPC communication, plus the relevant .proto files from the tensorflow and tensorflow/serving repositories.
Specifically, you'll want to package up these files:
tensorflow/serving/
tensorflow_serving/apis/model.proto
tensorflow_serving/apis/predict.proto
tensorflow_serving/apis/prediction_service.proto
tensorflow/tensorflow/
tensorflow/core/framework/resource_handle.proto
tensorflow/core/framework/tensor_shape.proto
tensorflow/core/framework/tensor.proto
tensorflow/core/framework/types.proto
From there you can generate the Python protobuf modules and build the predict request yourself.
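For illustration, here's a rough sketch of what such a hand-rolled client might look like, assuming you've generated the *_pb2 modules from the .proto files above with grpcio-tools (the host, the model name, and the 'image_bytes' input name are placeholders to adapt to your own model's signature):

# Generated e.g. with:
#   python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. <the .proto files above>
# Only grpcio and the generated modules are needed at runtime - no tensorflow.
import grpc
from tensorflow.core.framework import tensor_pb2, tensor_shape_pb2, types_pb2
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

def predict(host, model_name, image_bytes):
    channel = grpc.insecure_channel(host)  # e.g. "serving-host:8500" (placeholder)
    stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)

    request = predict_pb2.PredictRequest()
    request.model_spec.name = model_name

    # Build a DT_STRING tensor by hand instead of calling tf.make_tensor_proto.
    tensor = tensor_pb2.TensorProto(
        dtype=types_pb2.DT_STRING,
        tensor_shape=tensor_shape_pb2.TensorShapeProto(
            dim=[tensor_shape_pb2.TensorShapeProto.Dim(size=1)]),
        string_val=[image_bytes])
    request.inputs['image_bytes'].CopyFrom(tensor)  # input name is an assumption

    return stub.Predict(request, timeout=10.0)

The exact import paths depend on the -I directories you pass to protoc, so treat this as a sketch rather than a drop-in module.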

Related

Install Tensorflow for python 3 on Deeplens

I have just started using the Amazon DeepLens device. I got the demo face detection application working. I have a custom-trained TensorFlow model (.pb) that I want to deploy on DeepLens. I followed an online guide to create a new project with a Lambda (Python 3.7) and uploaded and imported my model from an S3 bucket. Now, my issue is that I need to install TensorFlow on the device for Python 3. I tried installing various versions and was even successful once, but I still get an error in the logs saying the TensorFlow module was not found.
I have a couple of questions regarding this:
My Lambda has a Python 3.7 execution environment. Is this correct, or should it match the version on DeepLens (3.5)?
Can I upgrade Python on DeepLens to the latest version?
If not, what TensorFlow version is supported for Python 3.5 on DeepLens, and what is the correct command to install it: pip or pip3?
Any help or insight is appreciated.

Shrinking AWS Lambda deployment package with CFLAGS and PIP to fit sklearn

I'm loading a pickled machine learning model in my Lambda handler, so I need sklearn (I get "ModuleNotFoundError: No module named 'sklearn'" if it's not included).
So I created a new deployment package in Docker with sklearn.
But when I tried to upload the new lambda.zip file I could not save the lambda function. I get the error: Unzipped size must be smaller than 262144000 bytes
I did some googling and found two suggestions: (1) using CFLAGS with pip and (2) using Lambda Layers.
I don't think Layers will work. Moving parts of my deployment package to layers won't reduce the total size (AWS documentation states "The total unzipped size of the function and all layers can't exceed the unzipped deployment package size limit of 250 MB").
CFLAGS sound promising but I've never worked with CFLAGS before and I'm getting errors.
I'm trying to add the flags: -Os -g0 -Wl,--strip-all
Pre-CFLAGS, my Docker pip command was: pip3 install requests pandas s3fs datetime bs4 sklearn -t ./
First I tried: pip3 install requests pandas s3fs datetime bs4 sklearn -t -Os -g0 -Wl,--strip-all ./
That produced errors of the variety "no such option: -g"
Then I tried CFLAGS = -Os -g0 -Wl,--strip-all pip3 install requests pandas s3fs datetime bs4 sklearn -t ./ and CFLAGS = -Os -g0 -Wl,--strip-all
But they produced the error "CFLAGS: command not found"
Can anyone help me understand how to use CFLAGS?
Also, I'm familiar with the saying "beggars can't be choosers" so any advice would be appreciated.
That said, I'm a bit of a noob so if you could help me with CFLAGS in the context of my Docker deployment package workflow it'd be most appreciated.
My docker workflow is:
docker run -it olivierhervieu/amazonlinux-python36-onbuild
mkdir deploy
cd deploy
pip3 install requests pandas s3fs datetime bs4 sklearn -t ./
zip -r lambda.zip *
This kinda is an answer (I was able to shrink my deployment package and get my Lambda deployed) and kinda not an answer (I still don't know how to use CFLAGS).
A lot of googling eventually led me to this article which included a link to this list of modules that come pre-installed in the AWS Lambda Python environment.
My deployment package contained several of the modules that already exist in the AWS Lambda environment and thus do not need to be included in deployment packages.
The modules that saved the most space for me were boto3 and botocore. I didn't explicitly add these in my Docker environment, but they made their way into my deployment package anyway (I'm guessing s3fs depends on them, so installing s3fs pulls them in as well).
I was also able to remove a lot of smaller modules (datetime, dateutil, docutils, six, etc).
With these modules removed, my package was under the 250 MB limit and I was able to deploy.
Were I still not under the limit - I wasn't sure if that would be enough - I was going to try another suggestion from the linked article above: removing .py files from the deployment package (you don't need both .pyc and .py files).
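In case it helps, here's a rough sketch of that pruning step, run against the deploy folder from the workflow above before zipping (the module list mirrors the ones mentioned here; whether they are safe to drop is an assumption to verify against your own dependencies):

# Prune modules the Lambda Python runtime already provides, plus bytecode caches.
import pathlib
import shutil

build = pathlib.Path("deploy")  # the pip --target directory from the Docker workflow

PRUNE = ["boto3", "botocore", "docutils", "dateutil", "six"]  # assumed pre-installed in Lambda

for name in PRUNE:
    for path in build.glob(name + "*"):
        if path.is_dir():
            shutil.rmtree(path)
        else:
            path.unlink()

# __pycache__ directories are regenerated at import time, so they can go too.
for cache in build.rglob("__pycache__"):
    shutil.rmtree(cache)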
Hope this helps with your Lambda deployment package size!
These days you would use a Docker container image for your Lambda, as its size can be up to 10 GB, which is far greater than what traditional Lambda functions deployed using deployment packages and layers allow. From AWS:
You can now package and deploy AWS Lambda functions as a container image of up to 10 GB.
Thus you could create a Lambda container with sklearn plus any other files and dependencies that you require, up to a total size of 10 GB.
We faced this exact problem ourselves but with Spacy rather than sklearn.
You're going about it the right way by not deploying packages already included in the AWS runtime, but note that sometimes this still won't get you under the limit (especially for ML use cases where large models have to be included as part of the dependency).
In these instances, another option is to save any external static files (e.g. models) used by the library in a private S3 bucket and then read them in at runtime, for example as described in this answer.
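A minimal sketch of that pattern, assuming a pickled model in a private bucket (the bucket name, key, and paths are placeholders):

import os
import pickle
import boto3

MODEL_BUCKET = "my-model-bucket"       # placeholder
MODEL_KEY = "models/classifier.pkl"    # placeholder
LOCAL_PATH = "/tmp/classifier.pkl"     # /tmp is the writable path in Lambda

_model = None

def _load_model():
    # Download once per container (cold start), then reuse the in-memory copy.
    global _model
    if _model is None:
        if not os.path.exists(LOCAL_PATH):
            boto3.client("s3").download_file(MODEL_BUCKET, MODEL_KEY, LOCAL_PATH)
        with open(LOCAL_PATH, "rb") as f:
            _model = pickle.load(f)
    return _model

def handler(event, context):
    model = _load_model()
    # ... run model.predict(...) on the event payload ...
    return {"statusCode": 200}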
Incidentally, if you're using the Serverless Framework to deploy your Lambdas, you should check out the serverless-python-requirements plugin, which lets you implement the steps you've described, such as specifying packages not to deploy with the function and building 'slim' versions of the dependencies (automatically stripping .so files and removing __pycache__ and dist-info directories, as well as .pyc and .pyo files).
Good luck :)
We had the same problem and it was very difficult to make it work. We ended up buying this layer that includes scikit-learn, pandas, numpy and scipy.
https://www.awslambdas.com/layers/3/aws-lambda-scikit-learn-numpy-scipy-python38-layer
There is another layer that includes xgboost as well.
I found this article that mentions the use of CFLAGS. In the comments, someone named Jesper explained how to use CFLAGS, as quoted below:
If anyone else is wondering how to add the CFLAGS to pip, here is how I did it:
Before running pip do this (I did this in Ubuntu):
export CFLAGS="$CFLAGS -Os -g0 -Wl,--strip-all -I/usr/include:/usr/local/include -L/usr/lib:/usr/local/lib"
export CXXFLAGS="$CXXFLAGS -Os -g0 -Wl,--strip-all -I/usr/include:/usr/local/include -L/usr/lib:/usr/local/lib"
Then run pip like (for numpy, but do whatever other package you want):
pip install numpy --no-cache-dir --compile --global-option=build_ext

Full installation of tensorflow (all modules)?

I have this repository with me: https://github.com/layog/Accurate-Binary-Convolution-Network . As requirements.txt says, it requires tensorflow==1.4.1. So I am using miniconda (on Ubuntu 18.04) and, for the love of God, I can't get it to run (it errors out at the line below):
from tensorflow.examples.tutorial.* import input_data
This gives me an ImportError saying it can't find tensorflow.examples. I have diagnosed the problem: a few modules are missing after I installed tensorflow (I have tried all of the ways below):
pip install tensorflow==1.4.1
conda install -c conda-forge tensorflow==1.4.1
# And various wheel packages available on the internet for 1.4.1
pip install tensorflow-1.4.0rc1-cp36-cp36m-manylinux1_x86_64.whl
The question is: if I want all the modules present in the git repo source in my installed copy, do I have to COMPLETELY build tensorflow from source? If yes, can you mention the flag I should use? Are there any wheel packages available that have all modules present in them?
A link would save me tonnes of effort!
NOTE: Even if I manually import the examples directory, it says tensorflow.contrib is missing, and if I locally import that too, another ImportError pops up. There has to be an easier way, I am sure of it.
Just for reference for others stuck in the same situation:
Use the latest tensorflow build and Bazel 0.27.1 to install it. Even though the requirements state that we need an older version, use the newer one instead; it's not worth the hassle and it will get the job done.
Also, to answer the question above: building only specific directories is possible. Each module has a BUILD file which is fed to Bazel.
See the name attributes in that file for the targets specific to that folder. For reference, the command I used to build the package for examples.tutorials.mnist:
bazel build --config=opt --config=cuda --incompatible_load_argument_is_label=false //tensorflow/examples/tutorials/mnist:all_files
Here all_files is the name found in the examples/tutorials/mnist/BUILD file.

Deploying Google Cloud Functions with TensorFlow as a dependency

I'd like to use Google Cloud Functions to deploy a Keras model saved in JSON (with weights in HDF5), with TensorFlow as the backend.
The deployment succeeds when I don't specify tensorflow in requirements.txt. However, when testing the function in GCP, I get an error message saying that tensorflow could not be found.
Error: function crashed. Details:
No module named 'tensorflow'
First, I find it quite strange that Google doesn't provide environments with tensorflow pre-installed.
But now, if I specify tensorflow in requirements.txt, deployment fails with the error message:
ERROR: (gcloud.beta.functions.deploy) OperationError:
code=3, message=Build failed: USER ERROR:
`pip_download_wheels` had stderr output:
Could not find a version that satisfies the
requirement tensorflow (from -r /dev/stdin (line 5))
(from versions: )
No matching distribution found for tensorflow (from -r
/dev/stdin (line 5))
Is there a way I can get tensorflow on Cloud Functions, or is Google deliberately blocking the install to get us to use ML Engine?
EDIT: Tensorflow 1.13.1 now supports Python 3.7.
Previous answer:
There isn't currently a way to use tensorflow on Google Cloud Functions.
However, it's not because Google is deliberately blocking it: the tensorflow package only provides built distributions for CPython 2.7, 3.3, 3.4, 3.5 and 3.6, but the Cloud Functions Python runtime is based on Python version 3.7.0, so pip (correctly) can't find any compatible distributions.
There are currently some compatibility issues with TensorFlow and Python 3.7, but once that is fixed, tensorflow should be installable on Google Cloud Functions. For now though, you'll have to use ML Engine.

Unable to install pandas on AWS Lambda

I'm trying to install and run pandas on an Amazon Lambda instance. I've used the recommended zip method of packaging my code file model_a.py and related python libraries (pip install pandas -t /path/to/dir/) and uploaded the zip to Lambda. When I try to run a test, this is the error message I get:
Unable to import module 'model_a': C extension:
/var/task/pandas/hashtable.so: undefined symbol: PyFPE_jbuf not built.
If you want to import pandas from the source directory, you may need
to run 'python setup.py build_ext --inplace' to build the C extensions
first.
Looks like an error in a variable defined in hashtable.so that comes with the pandas installer. Googling for this did not turn up any relevant articles. There were some references to a failure in numpy installation but nothing concrete. Would appreciate any help in troubleshooting this! Thanks.
I would advise you to use Lambda Layers for additional libraries. The size of a Lambda function package is limited, but with layers the function and its layers can total up to 250 MB unzipped (more here).
AWS has open sourced a good package, including Pandas, for dealing with data in Lambdas. AWS has also packaged it making it convenient for Lambda layers. You can find instructions here.
I have successfully run pandas code on Lambda before. If your development environment is not binary-compatible with the Lambda environment, you will not be able to simply run pip install pandas -t /some/dir and package it up into a Lambda .zip file. Even if you are developing on Linux, you may still run into compatibility issues.
So, how do you get around this? The solution is actually pretty simple: run your pip install inside a Lambda-like container and use the pandas module it downloads/builds instead. When I did this, I had a build script that would spin up an instance of the lambci/lambda container on my local system (a clone of the AWS Lambda environment in Docker), bind my local build folder to /build, and run pip install pandas -t /build/. Once that's done, kill the container and you have the Lambda-compatible pandas module in your local build folder, ready to zip up and send to AWS along with the rest of your code.
You can do this for an arbitrary set of Python modules by making use of a requirements.txt file, and you can even do it for arbitrary versions of Python by first creating a virtual environment in the lambci container. I haven't needed to do this for a couple of years, so there may be better tools by now, but this approach should at least be functional.
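For what it's worth, a build script along those lines might look roughly like this (the image tag, folder names, and use of requirements.txt are assumptions to adapt to your setup):

# Run pip inside the lambci/lambda build image so compiled modules match the Lambda runtime.
import pathlib
import subprocess

project = pathlib.Path(".").resolve()  # assumes requirements.txt lives here

subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", f"{project}:/build",
        "lambci/lambda:build-python3.6",  # assumed image tag; match your runtime version
        "bash", "-c", "pip install -r /build/requirements.txt -t /build/package",
    ],
    check=True,
)
# ./package now holds Lambda-compatible modules, ready to zip with your handler code.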
If you want to install it directly through the AWS Console, I made a step-by-step YouTube tutorial; check out the video here: How to install Pandas on AWS Lambda
