Does AWS Lambda allow uploading binaries separately to avoid re-uploading them? - python

I am new to AWS Lambda, and I have a PhantomJS application to run there.
There is a Python script of 5 KB and the PhantomJS binary, which brings the whole uploadable zip to 32 MB.
I have to upload this whole bundle every time. Is there any way to push the PhantomJS binary to the AWS Lambda /bin folder separately?

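No, you cannot push a file into /bin on its own. Your Lambda function is always provisioned as a whole from the latest zipped package you provide (or S3 bucket/key if you choose that method). A large binary that rarely changes can instead be published as a Lambda layer, which is extracted under /opt and is not re-uploaded when only the function code changes.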

Related

How to code a serverless AWS lambda function that will download a linux third party application using wget and then execute commands from that app?

I would like to use a serverless Lambda function that executes commands from a tool called WSO2 API CTL, as I would on the Linux CLI. I am not sure how to mimic downloading the tool and calling its commands, as if I were on a Linux machine, using either Node.js or Python in the Lambda function.
I am okay with creating and setting up the Lambda function, and even with placing it in the right VPC so that the commands reach an application on an EC2 instance, but I am stuck on how to actually execute the Linux commands using either Node.js or Python, and on which one would be better, if any.
After adding the following, I get an error when trying to download:
os.system("curl -O https://apim.docs.wso2.com/en/latest/assets/attachments/learn/api-controller/apictl-3.2.1-linux-x64.tar.gz")
Warning: Failed to create the file apictl-3.2.1-linux-x64.tar.gz: Read-only
It looks like there is no specific reason to download apictl during the initialisation of your Lambda. Therefore, I would propose to bundle it with your deployment package.
The advantages of this approach are:
Quicker initialisation
Less code in your Lambda
You could extend your CI/CD pipeline to download the application during build and then add it to your ZIP archive that you deploy.
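As a rough illustration of the bundling approach, the sketch below calls an executable that ships inside the deployment package instead of downloading it at run time. The `bin/apictl` location inside the ZIP and the `version` argument are assumptions made for the example, and the binary must keep its execute permission when the archive is built. The working directory is set to /tmp because that is the only writable location in the Lambda filesystem, which is also why the `curl -O` call in the question fails.

```python
import os
import subprocess

# Lambda unpacks the deployment package into the read-only directory named by
# LAMBDA_TASK_ROOT; fall back to the current directory when running locally.
TASK_ROOT = os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())

# Hypothetical location of the bundled binary inside the ZIP archive.
APICTL_PATH = os.path.join(TASK_ROOT, "bin", "apictl")


def run_tool(binary, args, timeout=60):
    """Run an executable and return (exit code, stdout, stderr)."""
    result = subprocess.run(
        [binary, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
        # Use /tmp as the working directory when it exists (writable on Lambda).
        cwd="/tmp" if os.path.isdir("/tmp") else None,
    )
    return result.returncode, result.stdout, result.stderr


def lambda_handler(event, context):
    # "version" is only an illustrative argument; substitute the real command.
    code, out, err = run_tool(APICTL_PATH, ["version"])
    return {"exit_code": code, "stdout": out, "stderr": err}
```

Passing the command as a list rather than a shell string avoids quoting problems that `os.system` is prone to.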

Suggestions to run a python script on AWS

I currently have a Python project which reads data from an Excel file, transforms and formats it, performs intensive calculations on the formatted data, and generates an output. This output is written back to the same Excel file.
The script is run as a PyInstaller EXE, which packs all the required libraries and the code itself, so users do not need to prepare an environment to run the script.
Both the script EXE and the Excel file sit on the user's machine.
I need some suggestions on how this entire workflow could be achieved using AWS, such as which AWS services would be required.
Any input would be appreciated.
One option would include using S3 to store the input and output files. You could create a lambda function (or functions) that does the computing work and that writes the update back to S3.
You would need to include the Python dependencies in your deployment zip that you push to AWS Lambda or create a Lambda layer that has the dependencies.
You could build triggers to run on things like S3 events (a file being added to S3 triggers the Lambda), on a schedule (EventBridge rule invokes the Lambda according to a specific schedule), or on demand using an API (such as an API Gateway that users can invoke via a web browser or HTTP request). It just depends on your need.
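A minimal sketch of the S3-triggered variant might look like the following. The `transform` function is a hypothetical stand-in for the existing calculation code, and the `output/` prefix is an assumption; the sketch also assumes the S3 trigger is filtered to a different prefix so that the written result does not invoke the function again.

```python
import urllib.parse


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def transform(workbook_bytes):
    """Hypothetical placeholder for the existing calculation code."""
    return workbook_bytes


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers above can be tested without it

    s3 = boto3.client("s3")
    pairs = extract_s3_objects(event)
    for bucket, key in pairs:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        result = transform(body)
        # Assumed output prefix; the trigger should not watch this prefix.
        s3.put_object(Bucket=bucket, Key="output/" + key, Body=result)
    return {"processed": len(pairs)}
```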

How to run python code on AWS lambda with package dependencies >500MB?

The requirement is that I have to call a SageMaker endpoint from Lambda to get predictions (which is easy), but I also have to do some extra processing for variable importance using packages such as XGBoost and SHAP.
I am able to hit the endpoint and get variable importance using the SageMaker Jupyter notebook. Now, I want to replicate the same thing on AWS Lambda.
1) How can I run Python code on AWS Lambda with package dependencies on Pandas, XGBoost and SHAP (total package size greater than 500 MB)? The unzipped deployment package is larger than 250 MB, so Lambda refuses to deploy it. I even tried creating the Lambda function from Cloud9 and got the same error due to the size restrictions. I have also tried Lambda layers, but with no luck.
2) Is there a way to run code with such big packages on or through Lambda, bypassing the deployment package size limit of 250 MB?
3) Is there a way to trigger a SageMaker notebook execution through Lambda, which would do the calculations and return the output back to Lambda?
Try uploading your dependencies as a Lambda layer. FYI: https://docs.aws.amazon.com/lambda/latest/dg/configuration-layers.html
In addition to using multiple layers for your dependencies (which still count toward the 250 MB unzipped limit), you may want to shrink the *.so files with the Linux strip command, which discards symbols from compiled object files that may not be necessary in production.
To strip all *.so files:
Use a Linux machine or Docker container with access to your dependencies directory.
cd to your dependencies directory.
Run
find . -name "*.so" -exec strip {} \;
This executes the strip command on every *.so file under the current working directory, recursively.
It helped me reduce one of my dependencies' shared objects from 94 MB to just 7 MB.
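For a build script written in Python, the same stripping step could be sketched as below. It assumes the `strip` binary is on the PATH of the Linux build machine or container; the function names are made up for the example.

```python
import os
import subprocess


def find_shared_objects(root):
    """Return the sorted paths of all *.so files below root."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".so"):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def strip_all(root):
    """Run `strip` on each shared object and return the number of bytes saved."""
    saved = 0
    for path in find_shared_objects(root):
        before = os.path.getsize(path)
        # check=True raises CalledProcessError if strip fails on a file.
        subprocess.run(["strip", path], check=True)
        saved += before - os.path.getsize(path)
    return saved
```

Calling `strip_all("python")` on an unpacked layer directory would then report how much space was recovered.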
I found the 250 MB limit on AWS Lambda package size to be draconian. A single file, libxgboost.so from the xgboost package, is already around 140 MB, which leaves only 110 MB for everything else. That makes AWS Lambda useless for anything but simple "hello world" stuff.
As an ugly workaround, you can store the xgboost package somewhere on S3, copy it to the /tmp folder from the Lambda invocation routine, and point your Python path to it. The allowed /tmp space is a bit higher, 512 MB by default, so it might work.
I am not sure, though, whether the /tmp folder is cleaned between Lambda function runs.
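The /tmp workaround could be sketched roughly as follows. The bucket name, object key, and paths are hypothetical, and the archive is assumed to be a ZIP with the packages at its root. Files in /tmp generally survive between invocations that reuse the same warm execution environment, but that reuse is not guaranteed, so the code checks before downloading and extracting.

```python
import os
import sys
import zipfile


def add_packages_from_archive(archive_path, target_dir):
    """Extract a ZIP of Python packages once and put it on the import path."""
    if not os.path.isdir(target_dir):
        # Only extract when this execution environment has not done so yet.
        os.makedirs(target_dir)
        with zipfile.ZipFile(archive_path) as archive:
            archive.extractall(target_dir)
    if target_dir not in sys.path:
        sys.path.insert(0, target_dir)
    return target_dir


def fetch_archive(bucket, key, local_path):
    """Download the archive from S3 unless it is already present locally."""
    import boto3  # imported lazily so the helper above can be tested without it

    if not os.path.exists(local_path):
        boto3.client("s3").download_file(bucket, key, local_path)
    return local_path


def lambda_handler(event, context):
    # Hypothetical bucket, key, and paths; adjust to your own setup.
    archive = fetch_archive("my-deps-bucket", "xgboost.zip", "/tmp/xgboost.zip")
    add_packages_from_archive(archive, "/tmp/python-deps")
    import xgboost  # resolved from /tmp/python-deps after the path change

    return {"xgboost_version": xgboost.__version__}
```

Note that the archive and its extracted contents both occupy /tmp in this sketch, so both together have to fit within the available space.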
You can try using SageMaker inference pipelines to do pre-processing before making the actual predictions. Basically, you can reuse the pre-processing script from training for inference as well. When the pipeline model is deployed, the full set of containers with pre-processing tasks installs and runs on each EC2 instance in the endpoint or transform job. Feature processing and inference are executed with low latency because the containers deployed in an inference pipeline are co-located on the same EC2 instance (endpoint). You can refer to the documentation here.
The following blog posts and notebooks cover this feature in detail:
Preprocess input data before making predictions using Amazon SageMaker inference pipelines and Scikit-learn
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/inference_pipeline_sparkml_xgboost_abalone/inference_pipeline_sparkml_xgboost_abalone.ipynb
https://github.com/awslabs/amazon-sagemaker-examples/blob/master/advanced_functionality/inference_pipeline_sparkml_blazingtext_dbpedia/inference_pipeline_sparkml_blazingtext_dbpedia.ipynb

How can I update a CSV stored on AWS S3 with a Python script?

I have a CSV which is stored in an AWS S3 bucket and holds information that gets loaded into an HTML document via some jQuery.
I also have a Python script which is currently sat on my local machine ready to be used. This Python script scrapes another website and saves the information to the CSV file which I then upload to my AWS S3 bucket.
I am trying to figure out a way that I can have the Python script run nightly and overwrite the CSV stored in the S3 bucket. I cannot seem to find a similar solution to my problem online and am vastly out of my depth when it comes to AWS.
Does anyone have any solutions to this problem?
Cheapest way: Modify your Python script to work as an AWS Lambda function, then schedule it to run nightly.
Easiest way: Spin up an EC2 instance, copy the script to the instance, and schedule it to run nightly via cron.
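A minimal sketch of the Lambda variant is shown below. The `scrape` function stands in for the existing scraping code, and the bucket and key names are hypothetical. The nightly schedule itself would be configured outside the code, for example with an EventBridge rule using an expression such as `cron(0 2 * * ? *)`.

```python
import csv
import io


def rows_to_csv_bytes(rows):
    """Serialize an iterable of rows to UTF-8 encoded CSV bytes."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode("utf-8")


def scrape():
    """Hypothetical placeholder for the existing scraping logic."""
    return [["name", "price"], ["widget", "9.99"]]


def lambda_handler(event, context):
    import boto3  # imported lazily so the helpers above can be tested without it

    body = rows_to_csv_bytes(scrape())
    # put_object overwrites any existing object stored under the same key.
    boto3.client("s3").put_object(
        Bucket="my-site-bucket",  # hypothetical bucket name
        Key="data.csv",           # hypothetical object key
        Body=body,
        ContentType="text/csv",
    )
    return {"bytes_written": len(body)}
```

Building the CSV in memory avoids writing to the local filesystem, which is read-only on Lambda outside /tmp.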

AWS lambda function deployment

I have developed a Lambda function which calls an API URL and gets the data back in JSON format. It needs modules/libraries such as requests, which are not available in the AWS online editor using Python 2.7.
So I need to upload the code as a zip file. What are the steps to deploy the Lambda function from a local Windows server to the AWS console, and what are the requirements?
You could use CodeBuild, which will build your code in the AWS Linux environment. Then it won't matter whether your local environment is Windows or Linux.
CodeBuild will put the artifacts directly on S3, and from there you can upload them to Lambda.
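Whichever machine does the build, the packaging step amounts to zipping the handler together with its dependencies at the archive root. The sketch below assumes the dependencies were first installed into a local folder with something like `pip install requests -t build`, that the handler file sits in the same folder, and that the ZIP is written outside that folder; the folder and function names are hypothetical. Packages with compiled extensions need to be built on Linux, which is where CodeBuild helps.

```python
import os
import zipfile


def build_package(source_dir, zip_path):
    """Zip the contents of source_dir so the files sit at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, _dirnames, filenames in os.walk(source_dir):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # Store paths relative to source_dir, not the absolute path.
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path


def deploy(function_name, zip_path):
    """Upload the ZIP as the new code of an existing Lambda function."""
    import boto3  # imported lazily so build_package works without it

    with open(zip_path, "rb") as handle:
        boto3.client("lambda").update_function_code(
            FunctionName=function_name,
            ZipFile=handle.read(),
        )
```

A typical call sequence would be `build_package("build", "function.zip")` followed by `deploy("my-function", "function.zip")`, where both names are placeholders.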
