I have built a deployment package with pandas, numpy, etc. for my sample code to run. The size is about 46 MB. My question is: do I have to zip my code and re-upload the entire deployment package to AWS S3 every time, even for a simple code change?
Is there any other way I can avoid the ~46 MB S3 upload every time and just upload the few KB of code?
I would recommend creating a layer in AWS lambda.
First you need to create an instance of Amazon Linux (using the AMI specified in https://docs.aws.amazon.com/lambda/latest/dg/current-supported-versions.html; at the time of writing (26 March 2019) it is amzn-ami-hvm-2017.03.1.20170812-x86_64-gp2) or a Docker container with the same environment as the Lambda execution environment.
I personally do it with docker.
For example, to create a layer for Python 3.6, I would run
sudo docker run --rm -it -v "$PWD":/var/task lambci/lambda:build-python3.6 bash
Then, inside /var/task in the Docker container, I would create the folder python/lib/python3.6/site-packages (so it will also be accessible later in the directory on the host machine where I started Docker), run
pip3 install <your packages here> -t python/lib/python3.6/site-packages
and finally zip up the python folder, upload it as a layer and use it in my AWS lambda function.
NB! The paths in the zip file should look like "python/lib/python3.6/site-packages/{your package names}"
Now the heavy dependencies are in a separate layer and you don't have to re-upload them every time you update the function; you only need to update the code.
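For completeness, publishing the resulting zip as a layer and attaching it to the function can be done with the AWS CLI or, as sketched below, with boto3 (the layer name, function name and zip path are placeholders):

import boto3

lambda_client = boto3.client("lambda")

# Publish the zipped python/ folder as a new layer version.
with open("layer.zip", "rb") as f:
    layer = lambda_client.publish_layer_version(
        LayerName="pandas-numpy-deps",      # placeholder layer name
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.6"],
    )

# Attach the layer; future code updates only need the small code zip.
lambda_client.update_function_configuration(
    FunctionName="my-function",             # placeholder function name
    Layers=[layer["LayerVersionArn"]],
)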
Split the application into two parts. The first part is the lambda function, which only includes your application code. The second part is a lambda layer, which will include only the dependencies and be uploaded once.
A lambda layer can be uploaded and attached to the lambda function. When your function is invoked, AWS will combine the lambda function with the lambda layer and then execute the entire package.
When updating your code, you will only need to update the lambda function. Since the package is much smaller, you can edit it using the web editor, or you can zip it and upload it directly to Lambda using the CLI tools.
Example: aws lambda update-function-code --function-name Yourfunctionname --zip-file fileb://Lambda_package.zip
Here are video instructions and examples on creating a lambda layer for dependencies. It demonstrates using the pymysql library, but you can install any of your libraries there.
https://geektopia.tech/post.php?blogpost=Create_Lambda_Layer_Python
I would like to use a serverless lambda that will execute commands from a tool called WSO2 API CTL just as I would on a Linux CLI. I am not sure how to mimic downloading and calling the commands as if I were on a Linux machine, using either Node.js or Python in the lambda.
I am okay with creating and setting up the lambda, and even getting it into the right VPC so that the commands reach an application on an EC2 instance, but I am stuck on how to actually execute the Linux commands using either Node.js or Python, and which one would be better, if any.
After adding the following I get an error trying to download:
os.system("curl -O https://apim.docs.wso2.com/en/latest/assets/attachments/learn/api-controller/apictl-3.2.1-linux-x64.tar.gz")
Warning: Failed to create the file apictl-3.2.1-linux-x64.tar.gz: Read-only
The download fails because the Lambda filesystem is read-only apart from /tmp. However, it looks like there is no specific reason to download apictl during the initialisation of your Lambda, so I would propose bundling it with your deployment package.
The advantages of this approach are:
Quicker initialisation
Less code in your Lambda
You could extend your CI/CD pipeline to download the application during build and then add it to your ZIP archive that you deploy.
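Once apictl is bundled with the deployment package, executing it from Python is just a subprocess call. A minimal sketch (the bin/apictl location inside the package and the "version" argument are assumptions for illustration):

import os
import subprocess

def lambda_handler(event, context):
    # The bundled binary lives under the (read-only) package root, exposed via LAMBDA_TASK_ROOT.
    apictl = os.path.join(os.environ.get("LAMBDA_TASK_ROOT", "."), "bin", "apictl")

    # Replace "version" with the real apictl command you need to run.
    result = subprocess.run([apictl, "version"], capture_output=True, text=True)
    return {"returncode": result.returncode, "stdout": result.stdout, "stderr": result.stderr}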
Just a conceptual question here.
I'm a newbie to AWS. I have a Node app and a Python file that is currently on a Flask server. The Node app sends data to the Python server and gets data back. This takes approx 3.2 secs. I am not sure how I can apply this to AWS. I tried SageMaker but it was really costly for me. Is there any way I can create a Python server with an endpoint in AWS within the free tier?
Thanks
Rushi
You do not need to use SageMaker to deploy your Flask application to AWS. AWS has nice documentation on deploying a Flask application to an AWS Elastic Beanstalk environment.
Other than that, you can also deploy the application using two other methods:
via EC2
via Lambda
EC2 Instances
You can launch an EC2 instance with a public IP and SSH enabled from your IP address. Then SSH into the instance and install Python, its libraries and your application.
Lambda
AWS Lambda is the perfect solution. It scales automatically, depending on the requests your application receives.
Lambda needs your dependencies to be available in the package, so you need to install them using the --target parameter, zip the Python code along with the installed packages, and then upload the archive to Lambda.
pip install --target ./package Flask # Install the dependencies into a local ./package directory.
cd package
zip -r9 function.zip . # Create a ZIP archive of the dependencies.
cd .. && zip -g function.zip lambda_function.py # Add your function code to the archive.
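The lambda_function.py referenced above needs a handler entry point; a minimal sketch of what it might contain (the processing itself is just a placeholder for whatever your Flask view currently does):

import json

def lambda_handler(event, context):
    # Parse the JSON payload sent by the Node app (e.g. via API Gateway).
    body = json.loads(event.get("body") or "{}")

    # Placeholder for the processing currently done on the Flask server.
    result = {"echo": body}

    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }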
For more detailed instructions, you can read this documentation:
Lambda
I wonder, is it possible to create an HTTP-triggered Python function on Azure Functions without doing any local coding? I want to do everything in the Azure cloud. My Python function code is in a GitHub/Azure Repos repository, but I do not have all the extra files of an Azure Functions project (for example, the __init__.py script file that is the HTTP trigger function of the Azure Function App). Is it possible to generate those files from Azure (without generating any Azure Functions related files on my local computer)? I noticed that we cannot do in-portal editing for Python function apps.
As far as I know, we can only deploy the Python function from local to the Azure cloud; we cannot create it entirely in the portal as you expected. It should not be too difficult to deploy the Python function from local to Azure, though.
Since you already have the main Python function code (__init__.py), you just need to sign in to Azure in VS Code and create a Python function by following this tutorial. Then use your __init__.py code to replace the new Python function's code. After that, run the command below in the "TERMINAL" window to generate the requirements.txt:
pip freeze > requirements.txt
The "requirements.txt" includes all of the modules which imported in your "init.py" and when the function deployed to azure, azure will install these modules by this "requirements.txt". I saw you mentioned you don't have all the extra files of this function project, if these modules are what you care about, the "requirements.txt" will solve your problem.
Then use this command to deploy it to Azure:
func azure functionapp publish hurypyfunapp --build remote
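For reference, the HTTP-triggered __init__.py that the VS Code tooling generates looks roughly like this minimal sketch (the greeting logic is just a placeholder):

import azure.functions as func

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Read an optional "name" query parameter and return a simple response.
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)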
The requirement is that I have to trigger a SageMaker endpoint from Lambda to get predictions (which is easy), but I also have to do some extra processing for variable importance using packages such as XGBoost and SHAP.
I am able to hit the endpoint and get variable importance using the SageMaker Jupyter notebook. Now, I want to replicate the same thing on AWS lambda.
1) How do I run Python code on AWS Lambda with package dependencies for Pandas, XGBoost and SHAP (total package size greater than 500 MB)? The unzipped deployment package size is greater than 250 MB, hence Lambda does not allow the deployment. I even tried creating the Lambda function from Cloud9 and got the same error due to size restrictions. I have also tried Lambda layers, but no luck.
2) Is there a way for me to run the code with such big packages on or through Lambda, bypassing the deployment package size limitation of 250 MB?
3) Is there a way to trigger a SageMaker notebook execution through Lambda, which would do the calculations and return the output back to Lambda?
Try uploading your dependencies as a Lambda layer. FYI: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
In addition to using multiple layers for your dependencies, you may want to shrink the *.so files with the Linux strip command, which discards symbols from compiled object files that may not be necessary in production.
In order to strip all *.so files:
Use a Linux/Docker container with access to your dependencies directory
cd to your dependencies directory
Run
find . -name *.so -exec strip {} \;
This will run the strip command on every *.so file in the current working directory, recursively.
It helped me reduce one of my dependencies objects from 94MB to just 7MB
I found the 250 MB limitation on AWS Lambda package size to be draconian. Just one file, libxgboost.so from the xgboost package, is already around 140 MB, which leaves only 110 MB for everything else. That makes AWS Lambdas useless for anything but simple "hello world" stuff.
As an ugly workaround, you can store the xgboost package somewhere on S3, copy it to the /tmp folder from the Lambda invocation routine and point your Python path to it. The allowed /tmp space is a bit higher, 512 MB, so it might work.
I am not sure, though, whether the /tmp folder is cleaned between Lambda function runs.
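A rough sketch of that workaround (the bucket, key and paths are placeholders), with the download done once at module load time:

import os
import sys
import zipfile

import boto3

PKG_DIR = "/tmp/deps"

def _load_heavy_deps():
    # Download and unpack the pre-built packages only when missing.
    # /tmp persists while the same execution environment stays warm,
    # but a cold-started container begins with an empty /tmp.
    if not os.path.exists(PKG_DIR):
        boto3.client("s3").download_file("my-bucket", "layers/xgboost-deps.zip", "/tmp/deps.zip")
        with zipfile.ZipFile("/tmp/deps.zip") as z:
            z.extractall(PKG_DIR)
    if PKG_DIR not in sys.path:
        sys.path.insert(0, PKG_DIR)

_load_heavy_deps()
import xgboost  # now resolvable from /tmp/deps

def lambda_handler(event, context):
    return {"xgboost_version": xgboost.__version__}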
You can try using SageMaker Inference Pipelines to do pre-processing before making the actual predictions. Basically, you can use the same pre-processing script used for training for inference as well. When the pipeline model is deployed, the full set of containers with pre-processing tasks installs and runs on each EC2 instance in the endpoint or transform job. Feature processing and inference are executed with low latency because the containers deployed in an inference pipeline are co-located on the same EC2 instance (endpoint). You can refer to the documentation here.
The following blog posts/notebooks cover this feature in detail:
Preprocess input data before making predictions using Amazon SageMaker inference pipelines and Scikit-learn
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/inference_pipeline_sparkml_xgboost_abalone/inference_pipeline_sparkml_xgboost_abalone.ipynb
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/inference_pipeline_sparkml_blazingtext_dbpedia/inference_pipeline_sparkml_blazingtext_dbpedia.ipynb
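With the pre- and post-processing moved into the pipeline's containers, the Lambda function no longer needs the heavy packages at all; it only has to call the endpoint. A minimal sketch (the endpoint name and payload format are assumptions):

import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # Forward the request payload to the deployed inference pipeline endpoint.
    response = runtime.invoke_endpoint(
        EndpointName="my-inference-pipeline-endpoint",  # placeholder endpoint name
        ContentType="text/csv",                         # depends on your first container
        Body=event["payload"],
    )
    # The pipeline's last container returns the (post-processed) prediction.
    return json.loads(response["Body"].read())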
I have an AWS Python Lambda function which contains a few Python files and also several dependencies.
The app is built using Chalice, so the function is mapped like any REST function.
Before deploying to the prod environment, I want to test it locally, so I need to package this whole project (Python files and dependencies). I tried looking over the web for a solution but couldn't find one.
I managed to figure out how to deploy one Python file, but with a whole project I did not succeed.
Take a look at Atlassian's LocalStack: https://github.com/atlassian/localstack
It's a full copy of the AWS cloud stack that runs locally.
I use Travis: I hooked it to my master branch in Git, so that when I push to this branch, Travis tests my Lambda with a script that uses pytest, after installing all its dependencies with pip install. If all the tests pass, it then deploys the Lambda to AWS in my prod env.