Shrink Size of AWS Deployment Package - python

Is there any way to shrink the size of a deployment package in Python?
I'm using fbprophet and other libraries, which push the package over Lambda's maximum size limit. Can we shrink those libraries and deploy the .zip to AWS Lambda?

Lambda Layers sound like what you want - they let you keep your dependencies in a separate package that your function code can then reference.
From their docs:
Layers let you keep your deployment package small, which makes
development easier. You can avoid errors that can occur when you
install and package dependencies with your function code. For Node.js,
Python, and Ruby functions, you can develop your function code in the
Lambda console as long as you keep your deployment package under 3 MB.
For more info on Layers, see: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
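As a rough sketch of publishing a dependencies-only zip as a layer and attaching it with boto3 (the bucket, key, layer, and function names below are placeholders, and the zip is assumed to have its packages under a python/ prefix):

import boto3

lam = boto3.client("lambda")

# Publish the dependencies zip (built beforehand with pip install -t python/ ... ; zip -r ml-deps.zip python/)
layer = lam.publish_layer_version(
    LayerName="ml-dependencies",
    Content={"S3Bucket": "my-artifacts-bucket", "S3Key": "layers/ml-deps.zip"},
    CompatibleRuntimes=["python3.9"],
)

# Attach the new layer version to the function
lam.update_function_configuration(
    FunctionName="my-prophet-function",
    Layers=[layer["LayerVersionArn"]],
)

Keep in mind that layers still count toward the 250 MB unzipped limit for a function plus all of its layers, so they help with packaging and reuse rather than raising the hard cap.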

Related

Installing a Python library wheel in an Azure Machine Learning Compute Cluster

I am working on an Azure Machine Learning Studio pipeline via the Designer. I need to install a Python library wheel (a third-party tool) in the same compute, so that I can import it into the designer. I usually install packages to compute instances via the terminal, but the Azure Machine Learning Studio designer uses a compute cluster, not a compute instance.
Is there any way to access the terminal so that I can install the wheel in the compute cluster and have access to the library via the designer? Thanks!
There isn't an easy path for this. Your options are either, switch to a code-first pipeline definition approach, or try your darndest to extend the Designer UI to meet your needs.
Define pipelines with v2 CLI or Python SDK
I get the impression that you know Python quite well, so you should really check out the v2 CLI or the Python SDK for Pipelines. I'd recommend starting with the v2 CLI, as it will be the way to define AML jobs going forward.
Both require some initial learning, but will give you all the flexibility that isn't currently available in the UI.
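As a very rough sketch of the v2 Python SDK (azure-ai-ml) route, assuming the wheel is shipped alongside your source and installed at job start; the workspace details, environment, and compute names are placeholders, and a full pipeline would compose steps like this with the @dsl.pipeline decorator:

from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

# Placeholder workspace details
ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace>",
)

# One step that installs the private wheel before running the script
job = command(
    code="./src",  # assumed to contain run.py and the .whl file
    command="pip install mytool-1.0-py3-none-any.whl && python run.py",
    environment="<registered-environment-name>@latest",
    compute="<compute-cluster-name>",
)

ml_client.jobs.create_or_update(job)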
custom Docker image
The "Execute Python Script" module allows use a custom python Docker image. I think this works? I just tried it but not with a custom .whl file, and it looked like it worked

Deploy containerized lambda with layer using CDK

I'm working on an ML project that uses AWS Lambda for building models and generating predictions. The Lambdas are written in Python and use several ML libraries such as pandas, numpy, and scikit-learn.
These Lambdas use shared code that is packaged as a Lambda layer.
I use AWS CDK for project deployment. The CDK code is written in TypeScript; don't ask why I mix Python and TypeScript, it's not relevant in this situation.
The size of the package (Lambda code + layer) exceeds the maximum allowed size of 250 MB because of the ML libraries.
After AWS announced support for container image Lambdas, I decided to try it out to overcome the 250 MB limit. However, I didn't find any good example that fits my situation, so I'm trying to build it myself.
The CDK code looks like this:
...
// Create a lambda layer from code
// Code is located in lambda-code/ml directory and it looks
// like any Python package with main ML and DB connection functions
const mlLayer = new PythonLayerVersion(this, 'mlLayer', {
  entry: './lambda-code/ml/',
});
...
// Lambda function is specified like
const classifyTransactionLambda = new DockerImageFunction(this, 'classifyTransactionLambda', {
  code: DockerImageCode.fromImageAsset('./lambda-code/classify'),
  memorySize: 512,
  layers: [mlLayer],
  tracing: Tracing.ACTIVE,
  environment: {
    BUCKET_NAME: mlModelsBucket.bucketName,
    ENV: env
  }
});
...
The structure of the code looks like this:
Dockerfile in classify lambda:
# Use the python lambda image from AWS ECR
FROM public.ecr.aws/lambda/python:3.7
COPY requirements.txt ./
RUN pip3 install -r requirements.txt
COPY index.py ./
CMD ["index.classify_transaction_handler"]
When I run cdk deploy I'm getting the following error:
This lambda function uses a runtime that is incompatible with this layer (FROM_IMAGE is not in [python3.7])
Has anyone run into a problem like this? Does this error mean that the mlLayer layer is not compatible with the classifyTransactionLambda function?
Any help would be much appreciated!
At this point, Lambda functions defined as container images simply do not support layers. From the docs:
Functions defined as container images do not support layers. When you build a container image, you can package your preferred runtimes and dependencies as a part of the image.
https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
So I modified my build to copy all the layer/library code into each Lambda function's directory before building the image.
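What that copy step might look like as a small pre-deploy helper run before cdk deploy (the paths match the question's layout, but the helper itself is hypothetical):

import shutil
from pathlib import Path

SHARED = Path("lambda-code/ml")                     # shared ML/DB package
CONTAINER_LAMBDAS = [Path("lambda-code/classify")]  # add other container lambdas here

for target in CONTAINER_LAMBDAS:
    dest = target / "ml"
    shutil.rmtree(dest, ignore_errors=True)  # start from a clean copy
    shutil.copytree(SHARED, dest)            # bake the shared code into the build context
    print(f"copied {SHARED} -> {dest}")

The classify Dockerfile then also needs a COPY ml ./ml line so the copied package actually ends up inside the image.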

How to run python code on AWS lambda with package dependencies >500MB?

The requirement is that I have to trigger a SageMaker endpoint from Lambda to get predictions (which is easy), but I also have to do some extra processing for variable importance using packages such as XGBoost and SHAP.
I am able to hit the endpoint and get variable importance using the SageMaker Jupyter notebook. Now, I want to replicate the same thing on AWS Lambda.
1) How can I run Python code on AWS Lambda with package dependencies for pandas, XGBoost and SHAP (total package size greater than 500 MB)? The unzipped deployment package size is greater than 250 MB, so Lambda does not allow the deployment. I even tried creating the Lambda function from Cloud9 and got the same error due to the size restriction. I have also tried Lambda layers, but no luck.
2) Is there a way for me to run the code with such big packages on or through Lambda, bypassing the 250 MB deployment package size limit?
3) Is there a way to trigger a SageMaker notebook execution from Lambda which would do the calculations and return the output back to Lambda?
Try uploading your dependencies as a Lambda layer. FYI: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
In addition to using multiple layers for your dependencies, you may want to shrink the *.so files with the Linux strip command, which discards symbols from compiled object files that may not be necessary in production.
To strip all *.so files:
use a Linux/Docker container with access to your dependencies directory
cd into your dependencies directory
run
find . -name "*.so" -exec strip {} \;
This executes the strip command on every *.so file in the current working directory, recursively.
It helped me reduce one of my dependencies' shared objects from 94 MB to just 7 MB.
I found the 250 MB limit on AWS Lambda package size to be draconian. A single file, libxgboost.so from the xgboost package, is already around 140 MB, which leaves only 110 MB for everything else. That makes AWS Lambda useless for anything but simple "hello world" stuff.
As an ugly workaround you can store the xgboost package somewhere on S3, copy it to the /tmp folder from the Lambda invocation routine, and point your Python path to it. The allowed /tmp space is a bit higher - 512 MB - so it might work.
I am not sure, though, whether the /tmp folder is cleaned between Lambda invocations.
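A rough sketch of that workaround, assuming the (ideally stripped) xgboost package has been zipped up and uploaded to a bucket beforehand; the bucket and key names are placeholders:

import os
import sys
import zipfile

import boto3

PKG_DIR = "/tmp/python-deps"

def _load_heavy_deps():
    # Download and unpack only on a cold start; /tmp persists across warm invocations
    if not os.path.isdir(PKG_DIR):
        boto3.client("s3").download_file(
            "my-deps-bucket", "bundles/xgboost-deps.zip", "/tmp/deps.zip"
        )
        with zipfile.ZipFile("/tmp/deps.zip") as zf:
            zf.extractall(PKG_DIR)
    if PKG_DIR not in sys.path:
        sys.path.insert(0, PKG_DIR)

def handler(event, context):
    _load_heavy_deps()
    import xgboost  # now resolvable from /tmp/python-deps
    return {"xgboost_version": xgboost.__version__}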
You can try using SageMaker Inference Pipelines to do the pre-processing before making the actual predictions. Basically, you can use the same pre-processing script used for training for inference as well. When the pipeline model is deployed, the full set of containers with pre-processing tasks installs and runs on each EC2 instance in the endpoint or transform job. Feature processing and inference are executed with low latency because the containers deployed in an inference pipeline are co-located on the same EC2 instance (endpoint). You can refer to the documentation here.
The following blog posts/notebooks cover this feature in detail:
Preprocess input data before making predictions using Amazon SageMaker inference pipelines and Scikit-learn
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/inference_pipeline_sparkml_xgboost_abalone/inference_pipeline_sparkml_xgboost_abalone.ipynb
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/inference_pipeline_sparkml_blazingtext_dbpedia/inference_pipeline_sparkml_blazingtext_dbpedia.ipynb
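A minimal sketch of what such an inference pipeline looks like with the SageMaker Python SDK; the images, model artifacts, role, and names below are placeholders:

from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel

role = "arn:aws:iam::123456789012:role/MySageMakerRole"  # placeholder

# First container: the pre-processing / feature-engineering model
preprocess_model = Model(
    image_uri="<preprocessing-image-uri>",
    model_data="s3://my-bucket/preprocess/model.tar.gz",
    role=role,
)

# Second container: the XGBoost model that produces the predictions
xgb_model = Model(
    image_uri="<xgboost-image-uri>",
    model_data="s3://my-bucket/xgb/model.tar.gz",
    role=role,
)

# Containers are invoked in order on the same instance behind one endpoint
pipeline_model = PipelineModel(
    name="preprocess-then-xgb",
    role=role,
    models=[preprocess_model, xgb_model],
)
pipeline_model.deploy(initial_instance_count=1, instance_type="ml.m5.large")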

Optimizing build time for ReadTheDocs project

I am developing a reasonably-sized binary Python library, Parselmouth, which takes some time to build - mainly because I am wrapping an existing program with a large codebase. Consequently, now that I'm trying to set up API documentation, I am running into either the 15-minute time limit or the 1 GB memory limit when building on ReadTheDocs (when I multithread the build, some expensive template instantiations cause the compiler process to get killed).
However, I have successfully set up Travis CI builds, using ccache to not recompile the large codebase, but only the changed parts of the wrapper code.
I have been thinking about installing from PyPI, but then the versioning gets complicated, and intermediate development builds do not get good API documentation.
So I was wondering: is there a known solution for this kind of case, maybe using the builds from Travis CI?
What I ended up doing to solve this problem was to use BinTray to upload the wheels built on Travis CI. After the build and upload have succeeded, I manually trigger the ReadTheDocs build, which then installs the project with the right Python wheel from BinTray.
For more details, see this commit
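For reference, one way to trigger the ReadTheDocs build from the CI job is a call against the Read the Docs API v3, roughly like this (the project/version slugs and the token variable are placeholders):

import os

import requests

RTD_TOKEN = os.environ["RTD_TOKEN"]   # API token stored as a CI secret
PROJECT = "parselmouth"               # project slug on readthedocs.org
VERSION = "latest"                    # version slug to rebuild

resp = requests.post(
    f"https://readthedocs.org/api/v3/projects/{PROJECT}/versions/{VERSION}/builds/",
    headers={"Authorization": f"Token {RTD_TOKEN}"},
)
resp.raise_for_status()
print("Triggered Read the Docs build:", resp.json().get("build", {}).get("id"))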

python tox, creating rpm virtualenv, as part of ci pipeline, unsure about where in workflow

I'm investigating how Python applications can also use a CI pipeline, but I'm not sure how to create the standard workflow.
Jenkins is used to do the initial repository clone and then initiates tox. Basically this is where Maven and/or MSBuild would fetch dependency packages and build... which tox does via pip, so all good here.
But now for the confusing part: the last part of the pipeline is creating and uploading packages. Devs would likely upload the created packages to a local pip repository, but then also possibly create a deployment package. In this case it would need to be an RPM containing a virtualenv of the application. I have made one manually using rpmvenv, but regardless of how it's made, how would such a step be added to a tox config? In the case of rpmvenv, it creates its own virtualenv - a self-contained command, so to speak.
I like going with the Unix philosophy for this problem: have a tool that does one thing incredibly well, then compose other tools together. Tox is purpose-built to run your tests in a bunch of different Python environments, so using it to then build a deb/rpm/etc. for you feels like a bit of a misuse of that tool. It's probably easier to use tox just to run all your tests, and then, depending on the results, have another step in your pipeline deal with building a package for what was just tested.
Jenkins 2.x, which is fairly recent at the time of this writing, seems to be much better at building pipelines. BuildBot is going through a decent amount of development and already makes it fairly easy to build a good pipeline for this as well.
What we've done at my work is
Buildbot in AWS which receives push notifications from Github on PR's
That kicks off a docker container that pulls in the current code and runs Tox (py.test, flake8, as well as protractor and jasmine tests)
If the tox step comes back clean, kick off a different docker container to build a deb package
Push that deb package up to S3 and let Salt deal with telling those machines to update
That deb package is also just available as a build artifact, similar to what Jenkins 1.x would do. Once we're ready to go to staging, we just take that package and promote it to the staging debian repo manually. Ditto for rolling it to prod.
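To make the "tox first, package second" split concrete, a rough Buildbot sketch of such a build factory might look like this (the repo URL and fpm arguments are placeholders):

from buildbot.plugins import steps, util

factory = util.BuildFactory()

# 1. Check out the code that triggered the build
factory.addStep(steps.Git(
    repourl="https://github.com/example/app.git", mode="incremental"))

# 2. Run the whole test matrix; stop the build here if anything fails
factory.addStep(steps.ShellCommand(command=["tox"], haltOnFailure=True))

# 3. Only reached when the tests pass: build the deployable package with fpm
factory.addStep(steps.ShellCommand(command=[
    "fpm", "-s", "dir", "-t", "deb", "-n", "myapp", "-v", "1.0", "/opt/myapp",
]))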
Tools I've found useful for all this:
Buildbot, because it's in Python and thus easier for us to work on, but Jenkins would work just as well. Regardless, this is the controller for the entire pipeline.
Docker because each build should be completely isolated from every other build
Tox the glorious test runner to handle all those details
fpm builds the package. RPM, DEB, tar.gz, whatever. Very configurable and easy to script.
Aptly makes it easy to manage debian repositories and in particular push them up to S3.
