I am building a simple linear regression example in Python and I want to run it on AWS Lambda so that I can interface it with Alexa. The problem is that my Python package is 114 MB. I have tried to separate the package and the code into two Lambda functions, but to no avail. I have tried every possible way I found on the internet.
Is there any way I could upload the packages to S3 and read them from there, the way we read CSVs from S3 using the boto3 client?
Yes, you can upload the package to S3, but there is a limit there as well: the deployment package, once unzipped, must currently stay under 250 MB. See https://docs.aws.amazon.com/lambda/latest/dg/limits.html
Here's a simple command to do that.
aws lambda update-function-code --function-name FuncName --zip-file fileb://path/to/zip/file --region us-west-2
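If instead you want the function itself to pull the packages from S3 at runtime, as the question asks, one possible sketch is to download the zipped dependencies into /tmp at cold start and add them to sys.path. The bucket and key names below are hypothetical:

import sys
import zipfile
import boto3

# Download the zipped dependencies from S3 into /tmp (the only writable path
# in Lambda) and make them importable. Bucket and key names are placeholders.
s3 = boto3.client('s3')
s3.download_file('my-deps-bucket', 'packages/sklearn_deps.zip', '/tmp/deps.zip')

with zipfile.ZipFile('/tmp/deps.zip') as z:
    z.extractall('/tmp/deps')

sys.path.insert(0, '/tmp/deps')

# After this, the bundled packages can be imported as usual, e.g.
# from sklearn.linear_model import LinearRegression

Keep in mind that /tmp is limited to 512 MB, so the unzipped packages still have to fit there.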
There are certain limitations when using AWS Lambda:
1) The total size of your uncompressed code and dependencies should be less than 250 MB.
2) The size of your zipped code and dependencies, when uploaded directly, should be less than 50 MB.
3) The total size of all deployment packages in a region should not exceed 75 GB.
If you are exceeding the limit, try finding smaller libraries with fewer dependencies, or break your functionality into multiple microservices rather than building one function that does all the work. That way you don't have to include every library in each function. Hope this helps.
I'm trying to download NLTK data onto the file storage of a Lambda function like so:
import nltk

nltk.data.path.append("/tmp")  # /tmp is the only writable path on Lambda
nltk.download("popular", download_dir="/tmp")
The Lambda function keeps timing out. When I check the CloudWatch logs, I see no logs related to the download of the individual corpora (e.g. "Downloading package cmudict to /tmp..."); instead the code seems to reach nltk.download() and then hang forever.
Has anyone seen this strange behavior?
Got it: My Lambda function was running in a VPC. I had to add an endpoint to enable the VPC to access S3.
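For reference, a gateway VPC endpoint for S3 can be created with boto3 roughly like this; the VPC ID, route table ID and region below are placeholders:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-1')

# Create a gateway endpoint so traffic from the VPC can reach S3 without a NAT.
# The VPC ID and route table ID are placeholders.
ec2.create_vpc_endpoint(
    VpcEndpointType='Gateway',
    VpcId='vpc-0123456789abcdef0',
    ServiceName='com.amazonaws.us-east-1.s3',
    RouteTableIds=['rtb-0123456789abcdef0'],
)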
for k, g in <some-iterable>:  # e.g. looping over groups of a DataFrame
    g.to_csv(f'/tmp/{k}.csv')
This example makes use of /tmp/. When /tmp/ is not used in g.to_csv(f'/tmp/{k}.csv'), it gives a "Read-only file system" error (see https://stackoverflow.com/a/42002539/13016237). So my question is whether AWS Lambda clears /tmp/ on its own or whether it has to be done manually. Is there any workaround for this within the scope of boto3? Thanks!
/tmp, as the name suggests, is only temporary storage. It should not be relied upon for any long-term data storage. The files in /tmp persist only for as long as the Lambda execution context is kept alive; how long that is is not defined and varies.
To overcome the size limitation (512 MB) and to ensure long-term data storage, there are two solutions commonly employed:
Using Amazon EFS with Lambda
Using AWS Lambda with Amazon S3
Using EFS is easier (but not cheaper), as it presents a regular filesystem to your function which you can read and write directly. You can also re-use the same filesystem across multiple Lambda functions, instances, containers and more.
S3 will be cheaper, but some extra work is required to use it seamlessly in Lambda. pandas does support S3 paths, but for that to work you have to include s3fs in your deployment package (or layer) if it is not already present. S3 can also be accessed from different functions, instances and containers.
g.to_csv('s3://my_bucket/my_data.csv') should work if you package s3fs with your Lambda.
Another option is to save the CSV in memory and use boto3 to create the object in S3.
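A minimal sketch of that approach, reusing the g and k from the snippet above (the bucket name is a placeholder):

import io
import boto3

# Write the DataFrame to an in-memory buffer and upload it directly,
# so nothing has to be written to /tmp at all. Bucket name is a placeholder.
buffer = io.StringIO()
g.to_csv(buffer, index=False)

s3 = boto3.client('s3')
s3.put_object(Bucket='my_bucket', Key=f'{k}.csv', Body=buffer.getvalue())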
I have a working Python automation program, combine_excel.py. It accesses a server, extracts Excel files and combines them in an automation workflow. Currently, I need to execute this automation manually.
I would like to host this program on a cloud server and activate the script at preset times and at regular intervals. I would like to know if there is any service out there that will allow me to do that. Can I do this on Google Cloud or AWS?
The program will generate an output that I would like to have saved to my Google Drive.
An easy/cost-effective way to achieve this could be to use AWS Lambda functions. Lambda functions can be set to trigger at certain time intervals using CRON syntax.
You might need to make some minor adjustments to match the format requirements Lambda has, and maybe work out a way to include dependencies if you have any, but everything should be pretty straightforward as there is a lot of information available on the web.
The same can be achieved using Google Cloud Functions.
You could also try Serverless Framework which would take care of the deployment for you, you only need to set it up once.
Another option is to try Zeit; it's quite simple to use, and it has a free tier (as do the others).
Some useful links:
https://serverless.com/blog/serverless-python-packaging/
https://docs.aws.amazon.com/lambda/latest/dg/welcome.html
https://cloud.google.com/functions/docs/quickstart-python
https://zeit.co/docs/runtimes#official-runtimes/python
I wrote a Python script which scrapes a website and sends emails if a certain condition is met. It repeats itself every day in a loop.
I converted the Python file to an EXE and it runs as an application on my computer. But I don't think this is the best solution to my needs since my computer isn't always on and connected to the internet.
Is there a specific website I can host my Python code on which will allow it to always run?
More generally, I am trying to get the bigger picture of how this works. What do you actually have to do to have a Python script running on the cloud? Do you just upload it? What steps do you have to undertake?
Thanks in advance!
Well, I think one of the best options is pythonanywhere.com. There you can upload your Python script (script.py), run it, and you're done.
I did this with my Telegram bot.
You can deploy your application using AWS Elastic Beanstalk. It will provide you with a whole Python environment, along with a server configuration that can be changed according to your needs. It's a PaaS offering from the AWS cloud.
The best and cheapest solution I have found so far is to use Amazon EventBridge with AWS Lambda.
AWS Lambda allows you to upload and execute any script you want in most popular programming languages without needing to pay for a server monthly.
And you can use EventBridge to trigger the execution of a Lambda function on a schedule.
You only get charged for what you use in AWS Lambda, and it is extremely cheap. Below is the pricing for Lambda in the AWS N. Virginia region. For most scripts, the minimum memory is more than enough. So running a script that takes 5 seconds to finish, once every hour for a month, will cost $0.00756 (less than a cent!).
Memory (MB)   Price per 1 ms
128           $0.0000000021
512           $0.0000000083
1024          $0.0000000167
1536          $0.0000000250
2048          $0.0000000333
3072          $0.0000000500
4096          $0.0000000667
5120          $0.0000000833
6144          $0.0000001000
7168          $0.0000001167
8192          $0.0000001333
9216          $0.0000001500
10240         $0.0000001667
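For reference, the $0.00756 figure above works out like this (5-second runs, once an hour, 30-day month, 128 MB):

price_per_ms = 0.0000000021   # 128 MB tier, N. Virginia
duration_ms = 5 * 1000        # each run takes 5 seconds
invocations = 24 * 30         # once an hour for 30 days

compute_cost = price_per_ms * duration_ms * invocations
print(compute_cost)           # 0.00756 (request charges not included)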
Then you can use EventBridge to schedule an AWS Lambda function to run every minute, hour, etc.
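As a rough sketch, assuming the function already exists, the schedule can be wired up with boto3 roughly like this (the function name and ARNs are placeholders):

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:my-script'  # placeholder

# Run once per hour; cron expressions such as "cron(0 9 * * ? *)" also work.
rule = events.put_rule(Name='run-my-script-hourly', ScheduleExpression='rate(1 hour)')

# Point the rule at the Lambda function.
events.put_targets(
    Rule='run-my-script-hourly',
    Targets=[{'Id': 'my-script', 'Arn': function_arn}],
)

# Allow EventBridge to invoke the function.
lambda_client.add_permission(
    FunctionName='my-script',
    StatementId='allow-eventbridge-hourly',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn'],
)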
Here are some articles to help you run any script every minute, hour, etc.
How to Create Lambda Functions in Python
How to Schedule Running an AWS Lambda Function
I am currently working on implementing Facebook Prophet for a live production environment. I haven't done that before, so I wanted to present my plan here and hope you can give me some feedback on whether it is an okay solution, or whether you have any suggestions.
1) Within Django, I create a .csv export of the relevant data I need for my predictions. These .csv exports are uploaded to an AWS S3 bucket.
2) From there I access this S3 bucket with an AWS Lambda function, where the "heavy" calculations happen.
3) Once done, I take the forecasts from 2) and save them again in a forecast.csv export (a sketch of this Lambda follows below).
4) Now my Django application can access the forecast.csv on S3 and get the respective predictions.
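A minimal sketch of what the Lambda in steps 2) and 3) could look like, assuming pandas, Prophet and s3fs are available in a layer; the bucket and file names are placeholders:

import pandas as pd
from prophet import Prophet  # the package is called "fbprophet" in older versions

def handler(event, context):
    # Read the exported data from S3 (pandas uses s3fs under the hood);
    # the file is expected to have the columns ds and y.
    df = pd.read_csv('s3://my-bucket/exports/data.csv')

    model = Prophet()
    model.fit(df)

    future = model.make_future_dataframe(periods=30)
    forecast = model.predict(future)

    # Write the forecast back to S3 for Django to pick up.
    forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].to_csv(
        's3://my-bucket/exports/forecast.csv', index=False
    )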
I am especially curious whether an AWS Lambda function is the right tool in this case. The exports could probably also be saved in DynamoDB(?), but I am trying to keep my v1 simple, hence the .csv files. There is still some effort involved in installing the right layers/packages for AWS Lambda, so I want to make sure I am walking in the right direction before diving deep into its documentation.
I am a little concerned about using AWS Lambda for the "heavy" calculations. There are a few reasons.
Package size limit: AWS Lambda has a size limit of 250 MB for the deployment package. This is the biggest limitation we faced, as you will not be able to include all the libraries like numpy, pandas, matplotlib, etc. in that package.
Disk size limit: AWS only provides a maximum disk size of 512 MB (/tmp) for Lambda execution, which can become a problem if you want to save intermediate results to disk.
The cost can skyrocket: if your Lambda is going to run for a long time instead of many small invocations, you will end up paying a lot of money. In that case, I think you will be better off with something like EC2 or ECS.
You can consider linking the S3 bucket to an SQS queue, with a process running on an EC2 machine that listens to the queue and performs all the calculations.
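A minimal sketch of such a consumer process, with a placeholder queue URL and a stubbed-out calculation step:

import boto3

QUEUE_URL = 'https://sqs.us-east-1.amazonaws.com/123456789012/forecast-jobs'  # placeholder

def run_forecast(message_body):
    # Placeholder: parse the S3 event from the message, download the export,
    # run the heavy Prophet calculation and upload the results.
    print(message_body)

sqs = boto3.client('sqs')

# Long-poll the queue so the process only wakes up when a new export arrives.
while True:
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get('Messages', []):
        run_forecast(msg['Body'])
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg['ReceiptHandle'])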
AWS Lambda Limits.