How to create an AWS Lambda deployment package that uses the Couchbase Python client

I'm trying to use AWS Lambda to transfer data from my S3 bucket to a Couchbase server, and I'm writing it in Python, so I need to import the couchbase module in my script. Usually, when a script uses external modules, I pip install them locally, zip the modules and the script together, and upload the zip to Lambda. That doesn't work this time, because the couchbase Python client depends on the Couchbase C client, libcouchbase, and I'm not clear what I should do. I tried simply adding the C client package, so my deployment package now has six package folders: the first five are the ones installed by "pip install couchbase" (couchbase, acouchbase, gcouchbase, txcouchbase, couchbase-2.1.0.dist-info), and the last one is the Couchbase C client I installed (libcouchbase). Lambda still doesn't work and reports:
"Unable to import module 'lambda_function': libcouchbase.so.2: cannot open shared object file: No such file or directory"
Any idea how I can get this to work? Many thanks.

The following two things worked for me:
Manually copy /usr/lib64/libcouchbase.so.2 into your project folder and zip it with your code before uploading to AWS Lambda.
Use Python 2.7 as the runtime on the AWS Lambda console to connect to Couchbase.
Thanks!
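A minimal sketch of the packaging step described above, assuming the pip-installed modules and lambda_function.py already sit in a local folder (called package here, a hypothetical name) and that libcouchbase.so.2 was built for the same Amazon Linux environment Lambda runs on. The shared library is stored at the root of the zip; the Lambda documentation lists the function's root directory (and its lib/ subfolder) on LD_LIBRARY_PATH, which is presumably why copying the file there works. If libcouchbase itself links against other libraries missing from the runtime, those would need the same treatment.

```python
import os
import zipfile


def build_deployment_zip(package_dir, extra_files, out_zip):
    """Zip everything under package_dir plus extra_files into out_zip.

    package_dir: folder with lambda_function.py and the pip-installed modules
        (for example the result of `pip install couchbase -t package`).
    extra_files: paths such as /usr/lib64/libcouchbase.so.2; each one is
        stored at the root of the archive under its base name.
    """
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Paths relative to package_dir, so modules land at the zip root.
                zf.write(full_path, os.path.relpath(full_path, package_dir))
        for path in extra_files:
            # Shared libraries go at the root, next to lambda_function.py.
            zf.write(path, os.path.basename(path))
    return out_zip
```

Usage would look like build_deployment_zip("package", ["/usr/lib64/libcouchbase.so.2"], "function.zip"), after which function.zip is uploaded as usual.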

Unfortunately, AWS Lambda does not support executing C-based Python modules like the Couchbase SDK.
Your best bet would be to use a pure-Python client. The easiest way to do this would be to use the unofficial memcached client https://github.com/couchbase/couchbase-cli/blob/master/cb_bin_client.py, which relies on the server-side moxi proxy to handle memcached clients on port 11211.
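The client linked above, cb_bin_client.py, speaks the memcached binary protocol. To give a rough idea of how small a pure-Python memcached client can be, here is a sketch that uses the simpler memcached text protocol instead; it assumes the endpoint on port 11211 accepts text-protocol set/get commands and that keys are plain ASCII without spaces. The function names are hypothetical and not part of any Couchbase library.

```python
import socket


def build_set_command(key, value, flags=0, exptime=0):
    """Return the bytes of a memcached text-protocol `set` command."""
    header = "set {} {} {} {}\r\n".format(key, flags, exptime, len(value))
    return header.encode("ascii") + value + b"\r\n"


def memcached_set(host, key, value, port=11211):
    """Store value (bytes) under key; True if the server replied STORED."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(build_set_command(key, value))
        with sock.makefile("rb") as reader:
            return reader.readline() == b"STORED\r\n"


def memcached_get(host, key, port=11211):
    """Fetch a value; None on a miss (END) or any other unexpected reply."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("get {}\r\n".format(key).encode("ascii"))
        with sock.makefile("rb") as reader:
            header = reader.readline()  # "VALUE <key> <flags> <bytes>" or "END"
            if not header.startswith(b"VALUE"):
                return None
            length = int(header.split()[3])
            value = reader.read(length + 2)[:length]  # drop the trailing \r\n
            reader.readline()  # consume the closing END line
            return value
```

Only the socket module is used, so nothing needs compiling for the Lambda runtime; the trade-off is that you lose the SDK's cluster awareness and rely entirely on the proxy.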

Related

How to code a serverless AWS lambda function that will download a linux third party application using wget and then execute commands from that app?

I would like to use a serverless Lambda function to execute commands from a tool called WSO2 API CTL, as I would on the Linux CLI. I am not sure how to mimic downloading the tool and calling its commands, as if I were on a Linux machine, using either Node.js or Python in the Lambda.
I am fine with creating and setting up the Lambda, and even with placing it in the right VPC so that the commands reach an application on an EC2 instance, but I am stuck on how to actually execute the Linux commands from Node.js or Python, and which of the two would be better, if either.
After adding the following, I get an error when trying to download:
os.system("curl -O https://apim.docs.wso2.com/en/latest/assets/attachments/learn/api-controller/apictl-3.2.1-linux-x64.tar.gz")
Warning: Failed to create the file apictl-3.2.1-linux-x64.tar.gz: Read-only
It looks like there is no specific reason to download apictl during the initialisation of your Lambda, and the download fails anyway because the Lambda file system is read-only apart from /tmp. Therefore, I would propose bundling it with your deployment package.
The advantages of this approach are:
Quicker initialisation
Less code in your Lambda
You could extend your CI/CD pipeline to download the application during build and then add it to your ZIP archive that you deploy.
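A minimal sketch of calling a bundled binary from a Python handler, assuming the apictl executable from the tarball was placed at bin/apictl inside the deployment zip (a hypothetical layout) with its executable bit preserved. Because the package directory is read-only, the sketch points HOME at /tmp in case the tool writes configuration under the home directory; whether apictl actually needs this is an assumption to verify.

```python
import os
import subprocess


def run_bundled_tool(relative_path, args, root=None, home="/tmp", timeout=60):
    """Run an executable shipped inside the deployment package; return its stdout."""
    # Lambda unpacks the zip under LAMBDA_TASK_ROOT; fall back to the CWD locally.
    root = root or os.environ.get("LAMBDA_TASK_ROOT", os.getcwd())
    binary = os.path.join(root, relative_path)
    # The package directory is read-only, so give the tool a writable HOME.
    env = dict(os.environ, HOME=home)
    result = subprocess.run(
        [binary] + list(args),
        capture_output=True,
        text=True,
        env=env,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


def lambda_handler(event, context):
    # Hypothetical layout: apictl extracted to bin/apictl inside the zip.
    return {"output": run_bundled_tool("bin/apictl", ["--help"])}
```

Passing an argument list instead of a shell string (as os.system does) avoids quoting problems, and check=True makes a failing command surface as an exception in the Lambda logs.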

How to Connect to RDS Instance from AWS Glue Python Shell?

I am trying to access an RDS instance from AWS Glue. I have a few Python scripts running on EC2 instances and currently use pyodbc to connect, but while trying to schedule the jobs in Glue, I cannot import pyodbc, as it is not natively supported by AWS Glue, and I'm not sure how drivers would work in the Glue Python shell either.
From: Introducing Python Shell Jobs in AWS Glue announcement:
Python shell jobs in AWS Glue support scripts that are compatible with Python 2.7 and come pre-loaded with libraries such as the Boto3, NumPy, SciPy, pandas, and others.
The module list doesn't include pyodbc module, and it cannot be provided as custom .egg file because it depends on libodbc.so.2 and pyodbc.so libraries.
I think you have 2 options:
Create a JDBC connection to your DB from the Glue console, and use Glue's internal methods to query it. This will require code changes, of course.
Use Lambda function instead. You'll need to pack pyodbc and the required libs along with your code in a zip file. Someone has already compiled those libs for AWS Lambda, see here.
Hope it helps
For AWS Glue, use either a DataFrame or a DynamicFrame and specify the SQL Server JDBC driver. AWS Glue already contains the JDBC driver for SQL Server in its environment, so you don't need to add any additional driver JAR to the Glue job.
df1=spark.read.format("jdbc").option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver").option("url", url_src).option("dbtable", dbtable_src).option("user", userID_src).option("password", password_src).load()
If you are using a SQL query instead of a table:
df1=spark.read.format("jdbc").option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver").option("url", url_src).option("dbtable", "(your select statement here) A").option("user", userID_src).option("password", password_src).load()
As an alternative, you can also use the jTDS driver for SQL Server in your Python script running in AWS Glue.
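The two spark.read calls above differ only in the dbtable value. A small helper, sketched here with hypothetical names, can build the option map for either case. It relies on Spark's documented behaviour that, when reading, dbtable accepts anything valid in a SQL FROM clause, so a query is wrapped in parentheses and given an alias (src here).

```python
def sqlserver_jdbc_options(url, user, password, table=None, query=None):
    """Build the option dict for spark.read.format("jdbc").

    Pass exactly one of `table` (a table name) or `query` (a SELECT statement).
    """
    if (table is None) == (query is None):
        raise ValueError("pass exactly one of table or query")
    # A query becomes a derived table with an alias, as the FROM clause requires.
    dbtable = table if table is not None else "({}) AS src".format(query)
    return {
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
        "url": url,
        "dbtable": dbtable,
        "user": user,
        "password": password,
    }
```

Inside a Glue job, where spark is already defined, it could be used as spark.read.format("jdbc").options(**sqlserver_jdbc_options(url_src, userID_src, password_src, query="SELECT id, name FROM dbo.customers")).load(); the query and table name are placeholders.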
If anyone needs a Postgres connection with SQLAlchemy from a Python shell job, it is possible by referencing the sqlalchemy, scramp and pg8000 wheel files. It's important to rebuild the pg8000 wheel after removing the scramp dependency from its setup.py.
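A sketch of the connection string for that setup, assuming SQLAlchemy and pg8000 are importable from the referenced wheels; host, database and credentials are placeholders. postgresql+pg8000 is the SQLAlchemy dialect name that selects the pure-Python pg8000 driver, and the credentials are escaped with quote_plus, as SQLAlchemy's documentation recommends for special characters such as @ in passwords.

```python
from urllib.parse import quote_plus


def pg8000_url(user, password, host, database, port=5432):
    """Build a SQLAlchemy URL that uses the pure-Python pg8000 driver."""
    # quote_plus escapes characters like '@' or '/' that would break the URL.
    return "postgresql+pg8000://{}:{}@{}:{}/{}".format(
        quote_plus(user), quote_plus(password), host, port, database
    )
```

In the job itself, this would be passed to sqlalchemy.create_engine, e.g. create_engine(pg8000_url("glue_user", "secret", "my-rds-host", "analytics")), with all four values being hypothetical.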
I needed to do something similar and ended up creating another Glue job in Scala while using Python for everything else. I know it may not work for everyone, but I wanted to mention it: How to run DDL SQL statement using AWS Glue
I was able to use the Python library psycopg2 even though it is not written in pure Python and does not come preloaded in the AWS Glue Python shell environment. This runs contrary to the AWS Glue documentation, so you might be able to use ODBC-related Python libraries in a similar way. I created .egg files for the psycopg2 library and used them successfully within the Glue Python shell environment. Following are the logs from the Glue Python shell when the script contains import psycopg2 and the Glue job refers to the related psycopg2 .egg files.
Creating /glue/lib/installation/site.py
Processing psycopg2-2.8.3-py2.7.egg
Copying psycopg2-2.8.3-py2.7.egg to /glue/lib/installation
Adding psycopg2 2.8.3 to easy-install.pth file
Installed /glue/lib/installation/psycopg2-2.8.3-py2.7.egg
Processing dependencies for psycopg2==2.8.3
Searching for psycopg2==2.8.3
Reading https://pypi.org/simple/psycopg2/
Downloading https://files.pythonhosted.org/packages/5c/1c/6997288da181277a0c29bc39a5f9143ff20b8c99f2a7d059cfb55163e165/psycopg2-2.8.3.tar.gz#sha256=897a6e838319b4bf648a574afb6cabcb17d0488f8c7195100d48d872419f4457
Best match: psycopg2 2.8.3
Processing psycopg2-2.8.3.tar.gz
Writing /tmp/easy_install-dml23ld7/psycopg2-2.8.3/setup.cfg
Running psycopg2-2.8.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-dml23ld7/psycopg2-2.8.3/egg-dist-tmp-9qwen3l_
creating /glue/lib/installation/psycopg2-2.8.3-py3.6-linux-x86_64.egg
Extracting psycopg2-2.8.3-py3.6-linux-x86_64.egg to /glue/lib/installation
Removing psycopg2 2.8.3 from easy-install.pth file
Adding psycopg2 2.8.3 to easy-install.pth file
Installed /glue/lib/installation/psycopg2-2.8.3-py3.6-linux-x86_64.egg
Finished processing dependencies for psycopg2==2.8.3
These are the steps that I used to connect to an RDS from glue python shell job:
Package your dependency into an egg file (these packages must be pure Python, if I remember correctly). Put it in S3.
Set your job to reference that egg file under job configuration > Python library path.
Verify that your job can import the package/module.
Create a Glue connection to your RDS (it's under Database > Tables, Connections), and test the connection to make sure it can reach your RDS.
Now set your job to reference/use this connection. It's under Required connections when you configure or edit your job.
Once those steps are done and verified, you should be able to connect. In my example I used pymysql.
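A sketch of the final connection step, assuming the Glue connection is a MySQL JDBC connection whose password is stored in the connection itself (not in Secrets Manager), that the job's IAM role may call glue:GetConnection, and that pymysql was supplied through the Python library path as described. The helper names are hypothetical; JDBC_CONNECTION_URL, USERNAME and PASSWORD are the property keys Glue uses for JDBC connections.

```python
from urllib.parse import urlparse


def parse_jdbc_mysql_url(jdbc_url):
    """Split 'jdbc:mysql://host:port/db' into (host, port, database)."""
    if not jdbc_url.startswith("jdbc:"):
        raise ValueError("not a JDBC URL: " + jdbc_url)
    parsed = urlparse(jdbc_url[len("jdbc:"):])
    return parsed.hostname, parsed.port or 3306, parsed.path.lstrip("/")


def connect_with_glue_connection(connection_name):
    """Open a pymysql connection from the details stored in a Glue connection."""
    import boto3  # preinstalled in the Glue Python shell
    import pymysql  # supplied via the job's Python library path

    connection = boto3.client("glue").get_connection(Name=connection_name)
    props = connection["Connection"]["ConnectionProperties"]
    host, port, database = parse_jdbc_mysql_url(props["JDBC_CONNECTION_URL"])
    return pymysql.connect(
        host=host,
        port=port,
        user=props["USERNAME"],
        password=props["PASSWORD"],
        database=database,
    )
```

Reading the details from the Glue connection keeps credentials out of the script; the connection attached in the previous step is still what gives the job network access to the RDS instance.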

How to use the jwplatform API with Python

I am going to create a search API for Android and iOS developers.
Our client has set up a Lambda function in AWS.
Now we need to fetch data using the jwplatform API based on a search keyword passed as a parameter. For this, I have to install the jwplatform module in the Lambda function or upload a zip file of the code with its dependencies. So I want to run the Python script locally and, after getting the right result, upload the zip to AWS Lambda.
I want to use the videos/list (jwplatform API) class to search the video library using Python, but I don't know much about Python. So I want to know: how do I run a Python script, and where should I put it?
There are a handful of useful Python script examples here: https://github.com/jwplayer/jwplatform-py
I succeeded in installing the jwplatform module locally.
The steps are as follows:
1. Open a command line.
2. Run 'pip install jwplatform' at the command prompt (not inside the Python interpreter).
3. Now you can use the jwplatform API.
The above command installs the jwplatform module locally.
My other challenge was to install jwplatform in AWS Lambda.
After some research, I succeeded in installing the module in AWS Lambda: I bundled the module and my code in one directory, created a zip of the bundle, and uploaded it to AWS Lambda. This makes the jwplatform module available in AWS Lambda.
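A sketch of the kind of handler the question describes, assuming the function sits behind API Gateway with proxy integration (so the keyword arrives in queryStringParameters, here under a hypothetical q parameter) and that the 1.x jwplatform client is bundled in the zip. JW_API_KEY and JW_API_SECRET are hypothetical environment variable names, and the client call should be checked against the jwplatform-py examples linked above. The search function is injectable so the handler can be exercised locally without credentials.

```python
import json
import os


def search_videos(keyword):
    """Search the video library (assumes the 1.x jwplatform client)."""
    import jwplatform  # bundled in the deployment zip

    client = jwplatform.Client(os.environ["JW_API_KEY"], os.environ["JW_API_SECRET"])
    return client.videos.list(search=keyword)


def lambda_handler(event, context, search=search_videos):
    """Return search results for ?q=<keyword> on an API Gateway proxy request."""
    params = event.get("queryStringParameters") or {}
    keyword = params.get("q")
    if not keyword:
        return {"statusCode": 400, "body": json.dumps({"error": "missing q parameter"})}
    return {"statusCode": 200, "body": json.dumps(search(keyword))}
```

To try it locally, call lambda_handler({"queryStringParameters": {"q": "cats"}}, None) from a plain Python script run with python on the command line; once the output looks right, zip the script together with the installed module and upload it.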

How can I run a simple python script hosted in the cloud on a specific schedule?

Say I have a file "main.py" and I just want it to run at 10-minute intervals, but not on my computer. The only external libraries the file uses are mysql.connector and requests.
Things I've tried:
PythonAnywhere - free tier is too limiting (need to connect to external DB)
AWS Lambda - Only supports up to Python 2.7, converted my code but still had issues
Google Cloud Platform + Heroku - can only find tutorials covering deploying applications, I think these could do what I'm looking for but I can't figure out how.
Thanks!
I'd start by taking a look at this question/answer that I asked previously on unix.stackexchange. I went with an AWS Red Hat installation and it was free to use.
Once you've decided on your VM, you can SSH into your server using any SSH client and upload your Python script. A personal preference is this application.
If you need to update the Python version on the server, you can do this by installing the required Python RPMs. A quick google should return the yum [or whichever RPM management system you're using] repository for the required RPMs.
Once you've installed the version of Python that you need, I'd suggest looking into crontab, which can be used to schedule jobs. You can set a cron job to run every 10 minutes, which will call your script.
See this site for more information on how to use crontab.
This sounds like a perfect use case for AWS Lambda which supports Python. You can invoke your Lambda on a schedule using Scheduled Events.
I see that you tried Lambda and it didn't work out for you which is too bad as that seems like the easiest route. You could also launch an EC2 instance and use userdata to schedule a cron when the instance starts.
Another option would be an Elastic Beanstalk worker with a cron.yml that defines your schedule. Elastic Beanstalk supports Python 3.4.
Update: AWS does now support Python 3.6. Just select Python 3.6 from the runtime environments when configuring.
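A sketch of the Lambda route with boto3, assuming the function is already deployed and its ARN is known. Scheduled Events are EventBridge (formerly CloudWatch Events) rules: the rule needs a schedule expression, permission to invoke the function, and the function as its target. rate_expression and schedule_lambda are hypothetical helper names; note that a value of 1 must use the singular unit, as in rate(1 minute).

```python
def rate_expression(minutes):
    """Build an EventBridge rate expression; a value of 1 needs the singular unit."""
    if minutes < 1:
        raise ValueError("minutes must be >= 1")
    unit = "minute" if minutes == 1 else "minutes"
    return "rate({} {})".format(minutes, unit)


def schedule_lambda(function_arn, rule_name, minutes=10):
    """Create a rule that invokes the function every `minutes` minutes."""
    import boto3

    events = boto3.client("events")
    rule_arn = events.put_rule(
        Name=rule_name, ScheduleExpression=rate_expression(minutes)
    )["RuleArn"]
    # Allow EventBridge to invoke the function for this rule only.
    boto3.client("lambda").add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events.put_targets(Rule=rule_name, Targets=[{"Id": "1", "Arn": function_arn}])
    return rule_arn
```

The same schedule can be set up by hand in the console; the script form is mainly useful if the setup needs to be repeatable.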

How does AWS know where my imports are?

I'm new to AWS Lambda and pretty new to Python.
I wanted to write a Python Lambda that uses the AWS API.
boto is the most popular Python module for this, so I wanted to include it.
Looking at examples online, I put import boto3 at the top of my Lambda and it just worked: I was able to use boto in my Lambda.
How does AWS know about boto? It's a community module. Is there a list of supported modules for Lambda? Does AWS cache its own copy of community modules?
AWS Lambda's Python environment comes pre-installed with boto3. Any other libraries you want need to be part of the zip you upload. You can install them locally with pip install whatever -t mysrcfolder.
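One way to see where an import comes from is to ask Python itself. This sketch, with hypothetical helper names, returns the file boto3 would be loaded from together with sys.path, the ordered list of directories Python searches. Deployed as a handler, it shows the runtime's preinstalled boto3 and the unpacked deployment package on the path; run locally, the boto3 entry is None if boto3 is not installed.

```python
import importlib.util
import sys


def module_origin(name):
    """Return the file a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec is not None else None


def lambda_handler(event, context):
    # sys.path is searched in order; the first directory containing the
    # module wins, which is how a bundled boto3 can override the built-in one.
    return {"boto3": module_origin("boto3"), "sys_path": sys.path}
```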
The documentation seems to suggest boto3 is provided by default on AWS Lambda:
AWS Lambda includes the AWS SDK for Python (Boto 3), so you don't need to include it in your deployment package. However, if you want to use a version of Boto3 other than the one included by default, you can include it in your deployment package.
As far as I know, you will need to manually install any other dependencies in your deployment package, as shown in the linked documentation, using:
pip install foobar -t <project path>
AWS Lambda includes the AWS SDK for Python (Boto 3), so you don't need to include it in your deployment package.
This link will give you a little more in-depth info on the Lambda environment:
https://aws.amazon.com/blogs/compute/container-reuse-in-lambda/
And this one too:
https://alestic.com/2014/12/aws-lambda-persistence/
