I am trying to access RDS Instance from AWS Glue, I have a few python scripts running in EC2 instances and I currently use PYODBC to connect, but while trying to schedule jobs for glue, I cannot import PYODBC as it is not natively supported by AWS Glue, not sure how drivers will work in glue shell as well.
From: Introducing Python Shell Jobs in AWS Glue announcement:
Python shell jobs in AWS Glue support scripts that are compatible with Python 2.7 and come pre-loaded with libraries such as the Boto3, NumPy, SciPy, pandas, and others.
The module list doesn't include pyodbc module, and it cannot be provided as custom .egg file because it depends on libodbc.so.2 and pyodbc.so libraries.
I think you have 2 options:
Create a jdbc connection to your DB from Glue's console, and use Glue's internal methods to query it. This will require code changes of course.
Use Lambda function instead. You'll need to pack pyodbc and the required libs along with your code in a zip file. Someone has already compiled those libs for AWS Lambda, see here.
Hope it helps
For AWS Glue use either Dataframe/DynamicFrame and specify the SQL Server JDBC driver. AWS Glue already contain JDBC Driver for SQL Server in its environment so you don't need to add any additional driver jar with glue job.
df1=spark.read.format("jdbc").option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver").option("url", url_src).option("dbtable", dbtable_src).option("user", userID_src).option("password", password_src).load()
if you are using a SQL instead of table:
df1=spark.read.format("jdbc").option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver").option("url", url_src).option("dbtable", ("your select statement here") A).option("user", userID_src).option("password", password_src).load()
As an alternate solution you can also use jtds driver for SQL server in your python script running in AWS Glue
If anyone needs a postgres connection with sqlalchemy using python shell, it is possible by referencing the sqlalchemy, scramp, pg8000 wheel files, it's important to reconstruct the wheel from pg8000 by eliminating the scramp dependency on the setup.py.
I needed to so something similar and ended up creating another Glue job in Scala while using Python for everything else. I know it may not work for everyone but wanted to mention How to run DDL SQL statement using AWS Glue
I was able to use the python library psycopg2 even though it is not written in pure python and it does not come preloaded with aws glue python shell environment. This runs contrary to aws glue documentation. So you might be able to use odbc related python libraries in a similar way. I created .egg files for psycopg2 library and used it successfully within glue python shell environment. Following are the logs from glue python shell if you have import psycopg2 in your script and the glue job refers to the related psycopg2 .egg files.
Creating /glue/lib/installation/site.py
Processing psycopg2-2.8.3-py2.7.egg
Copying psycopg2-2.8.3-py2.7.egg to /glue/lib/installation
Adding psycopg2 2.8.3 to easy-install.pth file
Installed /glue/lib/installation/psycopg2-2.8.3-py2.7.egg
Processing dependencies for psycopg2==2.8.3
Searching for psycopg2==2.8.3
Reading https://pypi.org/simple/psycopg2/
Downloading https://files.pythonhosted.org/packages/5c/1c/6997288da181277a0c29bc39a5f9143ff20b8c99f2a7d059cfb55163e165/psycopg2-2.8.3.tar.gz#sha256=897a6e838319b4bf648a574afb6cabcb17d0488f8c7195100d48d872419f4457
Best match: psycopg2 2.8.3
Processing psycopg2-2.8.3.tar.gz
Writing /tmp/easy_install-dml23ld7/psycopg2-2.8.3/setup.cfg
Running psycopg2-2.8.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-dml23ld7/psycopg2-2.8.3/egg-dist-tmp-9qwen3l_
creating /glue/lib/installation/psycopg2-2.8.3-py3.6-linux-x86_64.egg
Extracting psycopg2-2.8.3-py3.6-linux-x86_64.egg to /glue/lib/installation
Removing psycopg2 2.8.3 from easy-install.pth file
Adding psycopg2 2.8.3 to easy-install.pth file
Installed /glue/lib/installation/psycopg2-2.8.3-py3.6-linux-x86_64.egg
Finished processing dependencies for psycopg2==2.8.3
These are the steps that I used to connect to an RDS from glue python shell job:
Package up your dependency package into an egg file (these package must be pure python if I remember correctly). Put it in S3.
Set your job to reference that egg file under the job configuration > Python library path
Verify that your job can import the package/module
Create a glue connection to your RDS (it's in Database > Tables, Connections), test the connection make sure it can hit your RDS
Now in your job, you must set it to reference/use this connection. It's in the require connection as you configure your job or edit your job.
Once those steps are done and verify, you should be able to connect. In my sample I used pymysql.
Related
I wrote a python script that requires the use of a postgresql DB. For test purpose, I installed the postgresql DB manually, and the DB that comes with that. The script connects to it and make its job.
My question is about packaging : what is the best solution for the user to install this script, along with the DB and its schema juste by typing pip install xxx ?
Is that possible ?
Thanks
Postgres is great but often you can get away with SQLite. It's part of the standard library and comes bundled with Python
My Python App Engine Flex application needs to connect to an external Oracle database. Currently I'm using the cx_Oracle Python package which requires me to install the Oracle Instant Client.
I have successfully run this locally (on macOS) by following the Instant Client installation steps. The steps required me to do the following:
Make a directory called /opt/oracle
Create a symlink from /opt/oracle/instantclient_12_2/libclntsh.dylib.12.1 to ~/lib/
However, I am confused about how to do the same thing in App Engine Flex (instructions). Specifically, here's what I'm confused about:
The instructions say I should run sudo yum install libaio to install the libaio package. How do I do this on GAE Flex? Or is this package already available?
I think I can add the Instant Client files to GAE (a whopping ~100MB!), then set the LD_LIBRARY_PATH environment variable in app.yaml to export LD_LIBRARY_PATH=/opt/oracle/instantclient_12_2:$LD_LIBRARY_PATH. Will this work?
Is this even feasible without using custom Docker containers on App Engine Flex?
Overall I'm not sure if I'm on the right track. Would love to hear from someone who has managed this before :)
If any of your dependencies is not available in the base GAE flex images provided by Google and cannot be installed via pip (because it's not a python package or it's not available in PyPI or whatever other reason) then you can't use the requirements.txt file to get it installed in your GAE flex app.
The proper way to satisfy such dependencies would be to build your own custom runtime. From About Custom Runtimes:
Custom runtimes allow you to define new runtime environments, which
might include additional components like language interpreters or
application servers.
Yes, that means providing a custom Docker file. In your particular case you'd be installing the Instant Client and libaio inside this Dockerfile. See also Building Custom Runtimes.
Answering your first question, I think that the instructions in the oracle website just show that you have to install said library for your application to work.
In the case of App engine flex, they way to ensure that the libraries are present in the deployment is with the requirements.txt textfile. There is a documentation page which does explain how to do so.
On the other hand, I will assume that "Instant Client Files" are not libraries, but necessary data for your App to run. You should use Google Cloud Storage to serve them, or any other alternative of Storage within Google Cloud.
I believe that, if this is all what you need for your App to work, pushing your own custom container should not be necessary.
I have a remote ms sql server at office location and I wanted to connect from an ubuntu desktop to it. I installed DBeaver http://dbeaver.jkiss.org/ and it was working like a charm without any other dependency.
After this I started to setup a connection from python with no success until I found a straightforward tutorial at https://tryolabs.com/blog/2012/06/25/connecting-sql-server-database-python-under-ubuntu/. But this solution involved installing some packages to ubuntu itself and configuring some related config files in the system.
The question is if there is some python package to use without os dependencies?
I don't think you would be able to achieve what you want without installing some packages. Reason being you are connecting to the MSSQL which follows a different set of rules as it is from Microsoft.
If you were connecting to any DB (MySQL,MongoDB etc) you still need to install a package or module to make it work.
Try to install the package as described in the tutorial from the like you shared and if you run into any problem paste the problem here.
I'm trying to use AWS Lambda to transfer data from my S3 bucket to Couchbase server, and I'm writing in Python. So I need to import couchbase module in my Python script. Usually if there are external modules used in the script, I need to pip install those modules locally and zip the modules and script together, then upload to Lambda. But this doesn't work this time. The reason is the Python client of couchbase works with the c client of couchbase: libcouchbase. So I'm not clear what I should do. When I simply add in the c client package (with that said, I have 6 package folders in my deployment package, the first 5 are the ones installed when I run "pip install couchbase": couchbase, acouchbase, gcouchbase, txcouchbase, couchbase-2.1.0.dist-info; and the last one is the c client of Couchbase I installed: libcouchbase), lambda doesn't work and said:
"Unable to import module 'lambda_function': libcouchbase.so.2: cannot open shared object file: No such file or directory"
Any idea on how I can get the this work? With a lot of thanks.
Following two things worked for me:
Manually copy /usr/lib64/libcouchbase.so.2 into ur project folder
and zip it with your code before uploading to AWS Lambda.
Use Python 2.7 as runtime on the AWS Lambda console to connect to couchbase.
Thanks !
Unfortunately AWS Lambda does not support executing C-based python modules, like the Couchbase SDK.
Your best bet would be to use a pure-python client. The easiest way to do this would be to use the unofficial memcached client https://github.com/couchbase/couchbase-cli/blob/master/cb_bin_client.py which uses server-side moxi to handle memcached clients on port 11211.
I have a written a very small web-based survey using cgi with python(This is my first web app. ).The questions are extracted from a MySQL database table and the results are supposed to be saved in the same database. I have created the database along with its table locally. My app works fine on my local computer(localhost). To create db,table and other transaction with the MySQL i had to do import MySQLdb in my code.
Now I want to upload everything on my personal hosting. As far as I know my hosting supports Python,CGI and has MySQL database. And I know that I have to change some parameters in the connection string in my code, so I can connect to the database, but I have two problems:
I remember that I installed MySQLdb as an extra to my Python, and in my code i am using it, how would I know that my hosting's python interpretor has this installed, or do I even need it, do I have to use another library?
How do I upload my database onto my hosting?
Thanks
If you have shell access, you can fire up the python interpreter by running python and type import MySQLdb at the >>> prompt. If you get no errors in return, then its installed.
Likewise, if you have shell access, this page will help you with importing and exporting using the mysql command. I found it by googleing "import export mysql".
You can write a simple script like
import MySQLdb and catch any errors
to see if the required package is
installed. If this fails you can ask
the hosting provider to install your
package, typically via a ticket
The hosting providers typically also provide URL's to connect to the MySQL tables they provision for you, and some tools like phpmyadmin to load database dumps into the hosted MySQL instance
To check if MySQLdb library is installed on your hosting, simply open a python shell and type: import MySQLdb. If everthing goes ok, you're readt to go. If you get: ImportError: No module named MySQLdb, that means the the library is not installed and you nedd to install it.
You need that library or some library that provides similar support, because Python does not support native access to MySQL databases.
To transfer you database to your hosting check mysqldump.