Unable to install pandas on AWS Lambda - python

I'm trying to install and run pandas on AWS Lambda. I've used the recommended zip method of packaging my code file model_a.py and the related Python libraries (pip install pandas -t /path/to/dir/) and uploaded the zip to Lambda. When I try to run a test, this is the error message I get:
Unable to import module 'model_a': C extension:
/var/task/pandas/hashtable.so: undefined symbol: PyFPE_jbuf not built.
If you want to import pandas from the source directory, you may need
to run 'python setup.py build_ext --inplace' to build the C extensions
first.
This looks like an error around a symbol defined in hashtable.so, which ships with the pandas installation. Googling for this did not turn up any relevant articles. There were some references to a failure in the numpy installation, but nothing concrete. Would appreciate any help in troubleshooting this! Thanks.

I would advise you to use Lambda layers for additional libraries. The size of a Lambda function package is limited, but with layers the total unzipped deployment can be up to 250 MB (more here).
AWS has open sourced a good package for dealing with data in Lambdas that includes pandas (AWS Data Wrangler, now the AWS SDK for pandas), and has also packaged it conveniently for Lambda layers. You can find instructions here.
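As a hedged sketch, attaching a prebuilt layer like that to an existing function can be done with the AWS CLI (the layer ARN below is illustrative; use the published ARN for your region and runtime):

aws lambda update-function-configuration \
    --function-name my-function \
    --layers arn:aws:lambda:us-east-1:123456789012:layer:AWSSDKPandas-Python39:1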

I have successfully run pandas code on Lambda before. If your development environment is not binary-compatible with the Lambda environment, you will not be able to simply run pip install pandas -t /some/dir and package it up into a Lambda .zip file. Even if you are developing on Linux, you may still run into compatibility issues.
So, how do you get around this? The solution is actually pretty simple: run your pip install on a lambda container and use the pandas module that it downloads/builds instead. When I did this, I had a build script that would spin up an instance of the lambci/lambda container on my local system (a clone of the AWS Lambda container in docker), bind my local build folder to /build and run pip install pandas -t /build/. Once that's done, kill the container and you have the lambda-compatible pandas module in your local build folder, ready to zip up and send to AWS along with the rest of your code.
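A minimal sketch of that build step, assuming Docker is installed and the build folder is ./build (the image tag is illustrative; lambci publishes one per runtime):

docker run --rm -v "$PWD/build":/build lambci/lambda:build-python3.8 \
    pip install pandas -t /build/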
You can do this for an arbitrary set of Python modules by making use of a requirements.txt file, and you can even do it for arbitrary versions of Python by first creating a virtual environment in the lambci container. I haven't needed to do this for a couple of years, so maybe there are better tools by now, but this approach should at least be functional.
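For example, a hedged variant that installs everything listed in a requirements.txt (the paths and image tag are illustrative):

docker run --rm -v "$PWD":/var/task lambci/lambda:build-python3.8 \
    pip install -r requirements.txt -t build/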

If you want to install it directly through the AWS Console, I made a step-by-step YouTube tutorial; check out the video here: How to install Pandas on AWS Lambda

Related

Use Pandas in Azure Functions

I'm trying to create a basic Python function and use it in an Azure Function App (Consumption based). I used the HTTP template via VS Code and was able to deploy it to Azure. However, when I try to use pandas in the logic, I get an error that I am not able to rectify, me being a rookie in Python. Can you suggest how to fix it?
Tools used: VS Code, Azure Functions Tools
Python version installed locally : 3.8.5
Azure Function App Python Version : 3.8
It seems the pandas module hasn't been installed in your function on Azure. You need to add pandas to your local requirements.txt and then deploy the function from local to Azure; the deployment installs the modules listed in requirements.txt.
You can run the command below in the Terminal window to generate requirements.txt (pandas line included) automatically:
pip freeze > requirements.txt
After running the command above, your requirements.txt should look something like this:
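(An illustrative example; the exact packages and pinned versions will depend on your local environment:)

azure-functions
numpy==1.19.5
pandas==1.1.5
python-dateutil==2.8.1
pytz==2020.1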

Install Python modules in Azure Functions

I am learning how to use Azure Functions and am using my web scraping script in one.
It uses the BeautifulSoup (bs4) and pymysql modules.
It worked fine when I tried it locally in the virtual environment, as per this MS guide:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-first-azure-function-azure-cli?pivots=programming-language-python&tabs=cmd%2Cbrowser#run-the-function-locally
But when I create the function App and publish the script to it, Azure Functions logs give me this error:
Failure Exception: ModuleNotFoundError: No module named 'pymysql'.
It must happen when the function attempts to import it.
I really don't know how to proceed; where should I specify which modules it needs to install?
You need to check that you have generated a requirements.txt that lists all of the modules your function depends on. When you deploy the function to Azure, the modules in requirements.txt are installed automatically.
You can generate it locally with the command below:
pip freeze > requirements.txt
And then deploy the function to azure by running the publish command:
func azure functionapp publish hurypyfunapp --build remote
For more information about deploying a Python function from local to Azure, please refer to this tutorial.
By the way, if you use the Consumption plan for your Python function, Kudu is not available. If you want to use Kudu, you need to create an App Service plan for it rather than a Consumption plan.
Hope it helps~
You need to upload the installed modules when deploying to Azure. You can upload them using Kudu:
https://github.com/projectkudu/kudu/wiki/Kudu-console
Alternatively, you can open the Kudu console and run pip install from there.
You can install Python packages from the Python code itself with the following snippet (tried and verified on Azure Functions):
def install(package, import_name=None):
    # Install a package at runtime if it cannot be imported.
    # The pip name can differ from the import name (e.g. beautifulsoup4 vs bs4),
    # so check the import name but pass the pip name to pip.
    from importlib import import_module
    try:
        import_module(import_name or package)
    except ImportError:
        from sys import executable as se
        from subprocess import check_call
        check_call([se, '-m', 'pip', '-q', 'install', package])

for package, import_name in [('beautifulsoup4', 'bs4'), ('pymysql', None)]:
    install(package, import_name)
The libraries in the list are installed when the Azure function is triggered for the first time. For subsequent triggers, you can comment out or remove the installation code.

pysftp package is not working in a Lambda function, throwing error: cannot import name '_bcrypt' from 'bcrypt' (./lib/bcrypt/__init__.py)

I downloaded pysftp (pip install pysftp), made a zip file, and uploaded it to a Lambda function, but it is not working there and throws this error:
Response:
{
"errorMessage": "Unable to import module 'lambda_function': cannot import name '_bcrypt' from 'bcrypt' (./lib/bcrypt/__init__.py)",
"errorType": "Runtime.ImportModuleError"
}
Many thanks in advance.
The operation "pip install pysftp" is to be performed in a linux distribution for preparing the lib for aws lambda. I had used ubuntu in docker on windows to perform pip install on mounted volume to generate the lib.
Try reinstalling the packages and uploading the new packages. If it still shows the error, move your development environment from Windows to Linux.
A similar kind of error, for your reference: https://forums.aws.amazon.com/thread.jspa?messageID=804753&tstart=0
Since you need to troubleshoot module dependencies, inspect the Python runtime environment of AWS Lambda itself.
In your Lambda, print the modules that are loaded, and therefore available to the other modules that your code imports:
def lambda_handler(event, context):
    # help("modules") prints the list of importable modules to the function's log
    help("modules")
Running this in a local Python interpreter is also illuminating:
python
help("modules")
You will see "Please wait a moment while I gather a list of all available modules..." and then a big list of modules that are importable.
You will likely find that you are missing bcrypt, for within that module, as taught by help(bcrypt), lives the missing dependency _bcrypt.
Should bcrypt be available, whether to the Lambda or just to a Python interpreter, it can be found in this manner:
>>> bcrypt._bcrypt
<module 'bcrypt._bcrypt' from '/usr/local/lib/python2.7/site-packages/bcrypt/_bcrypt.so'>
To install everything into the same directory, use the command below:
pip install pysftp -t .
After that, zip the entire directory and upload it through the Lambda console.
A little old trick that helps those who are using Lambda for the first time...
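If you prefer the AWS CLI to the console, a hedged sketch of the zip-and-upload step (the function name is illustrative):

zip -r function.zip .
aws lambda update-function-code --function-name my-function --zip-file fileb://function.zip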

Firebase on AWS Lambda Import Error

I am trying to connect Firebase with an AWS Lambda. I am using their firebase-admin SDK. I have installed and created the dependency package as described here. But I am getting this error on Lambda:
Unable to import module 'index':
Failed to import the Cloud Firestore library for Python.
Make sure to install the "google-cloud-firestore" module.
I have previously also tried setting up a similar function using node.js but I received an error message because GRPC was not configured. I think that this error message might be stemming from that same problem. I don't know how to fix this. I have tried:
pip install grpcio -t path/to/...
and installing google-cloud-firestore, but neither fixed the problem. When I run the code from my terminal, I get no errors.
Part of the problem here is that grpcio compiles a platform-specific dynamic module: cygrpc.cpython-37m-darwin.so (in my case). According to this response, you cannot import dynamic modules from a zip file: https://stackoverflow.com/a/58140801
Updating to Python 3.8 fixed this for me.
As Alex DeBrie mentioned in his article on serverless.com,
The plugins section registers the plugin with the Framework. In the custom section, we tell the plugin to use Docker when installing packages with pip. It will use a Docker container that's similar to the Lambda environment so the compiled extensions will be compatible. You will need Docker installed for this to work.
In other words, the environment differs between local and Lambda, so the compiled extensions differ too. If you use a container to hold the packages installed by pip, and that container mimics the Lambda environment, everything will run well.
If you use the Serverless Framework to deploy your Python app to AWS Lambda, add these lines to your serverless.yml file:
...
plugins:
  - serverless-python-requirements
...
custom:
  pythonRequirements:
    dockerizePip: non-linux
    dockerImage: mlupin/docker-lambda:python3.9-build
...
then serverless-python-requirements will automatically start a Docker container based on the mlupin/docker-lambda:python3.9-build image.
This container mimics the Lambda environment and lets pip install and compile everything inside it, so the compiled extensions will be compatible.
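With those lines in place, a regular deploy should pick the plugin up automatically (assuming the Serverless CLI and Docker are installed):

serverless deploy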
This worked in my case. Hope this helps.

Building extensions to AWS Lambda with Continuous Delivery

I have a GitHub repository containing an AWS Lambda function. I am currently using Travis CI to build, test, and then deploy this function to Lambda if all the tests succeed, using
deploy:
  provider: lambda
  (other settings here)
My function has the following dependencies specified in its requirements.txt
Algorithmia
numpy
networkx
opencv-python
I have set the Travis CI build script to install into the working directory using the command below, so that the dependencies are properly copied over to my AWS Lambda function.
pip install --target=$TRAVIS_BUILD_DIR -r requirements.txt
The problem is that while the build in Travis CI succeeds and everything is deployed to the Lambda function successfully, testing my Lambda function results in the following error:
Unable to import module 'mymodule':
Importing the multiarray numpy extension module failed. Most
likely you are trying to import a failed build of numpy.
If you're working with a numpy git repo, try `git clean -xdf` (removes all
files not under version control). Otherwise reinstall numpy.
My best guess as to why this is happening is that numpy is being built on the Ubuntu distribution of Linux that Travis CI uses, but the Amazon Linux it runs on as a Lambda function isn't able to load it properly. There are numerous forum posts and blog posts, such as this one, detailing that Python modules which need to build C/C++ extensions must be built on an EC2 instance.
My question is: it's a real hassle to add another complication to the CD pipeline and mess around with EC2 instances. Has Amazon come up with a better way to do this (because there really should be one), or is there some way to have everything compiled properly in Travis CI or another CI solution?
Also, I suppose it's possible that I've misidentified the problem and that there is some other reason why importing numpy is failing. If anyone has suggestions on how to resolve this, that would be great!
EDIT: As suggested by @jordanm, it looks like it may be possible to run a Docker container with the amazonlinux image in Travis CI and then perform my build and test inside that container. Unfortunately, while that is certainly easier than using EC2, I don't think I can use the normal Lambda deploy tools in Travis CI; I'll have to write my own deploy script using the AWS CLI, which is a bit of a pain. Any other ideas, or ways to make this smoother? Ideally I would specify which Docker image my builds run on in Travis CI, as their default build environment already uses Docker... but they don't seem to support that functionality yet: https://github.com/travis-ci/travis-ci/issues/7726
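For what it's worth, a hand-rolled deploy script along those lines can be a single AWS CLI call (the function name and zip path are illustrative, matching whatever the build produces):

aws lambda update-function-code \
    --function-name myFunction \
    --zip-file fileb://lambda_deploy.zip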
After quite a bit of tinkering I think I've found something that works. I thought I'd post it here in case others have the same problem.
I decided to use Wercker as they have quite a generous free tier and allow you to customize the docker image for your builds.
It turns out there is a Docker image that has been created to replicate the exact environment that Lambda functions execute in! See: https://github.com/lambci/docker-lambda When running your builds in this container, extensions are built properly so they can execute successfully on Lambda.
In case anyone does want to use Wercker, here's the wercker.yml I used; it may be helpful as a template:
box: lambci/lambda:build-python3.6
build:
  steps:
    - script:
        name: Install Dependencies
        code: |
          pip install --target=$WERCKER_SOURCE_DIR -r requirements.txt
          pip install pytest
    - script:
        name: Test code
        code: pytest
    - script:
        name: Cleaning up
        code: find $WERCKER_SOURCE_DIR \( -name \*.pyc -o -name \*.pyo -o -name __pycache__ \) -prune -exec rm -rf {} +
    - script:
        name: Create ZIP
        code: |
          cd $WERCKER_SOURCE_DIR
          zip -r $WERCKER_ROOT/lambda_deploy.zip . -x *.git*
deploy:
  box: golang:latest
  steps:
    - arjen/lambda:
        access_key: $AWS_ACCESS_KEY
        secret_key: $AWS_SECRET_KEY
        function_name: yourFunction
        region: us-west-1
        filepath: $WERCKER_ROOT/lambda_deploy.zip
Although I appreciate you may not want to add further complications to your project, you could potentially use a Python-focused Lambda management tool for setting up your builds and deployments, say something like Gordon. You could also just use this tool to do your deployment from inside the Amazon Linux Docker container running within Travis.
If you wish to change CI providers, CodeShip allows you to build with any Docker container of your choice, and then deploy to Lambda.
Wercker also runs full Docker-based builds and has many user-submitted deploy "steps", some of which support deployment to Lambda.
