I'm trying to query an RDS (Postgres) database through Python, more specifically from a Jupyter Notebook. This is what I've been trying so far:
import boto3

client = boto3.client('rds-data')
response = client.execute_sql(
    awsSecretStoreArn='string',
    database='string',
    dbClusterOrInstanceArn='string',
    schema='string',
    sqlStatements='string'
)
The error I've been receiving is:
BadRequestException: An error occurred (BadRequestException) when calling the ExecuteSql operation: ERROR: invalid cluster id: arn:aws:rds:us-east-1:839600708595:db:zprime
In the end, it was much simpler than I thought, nothing fancy or specific. It was basically the same approach I had already used to access one of my local DBs: import the library for your database type (Postgres, MySQL, etc.) and connect to the database so you can run queries through Python.
I don't know if it's the best solution, since running queries through Python will probably be slower than running them directly against the database, but it's what works for now.
import psycopg2

# Connect with the same parameters you would use for a local Postgres database
conn = psycopg2.connect(database='database_name',
                        user='user',
                        password='password',
                        host='host',
                        port='port')

cur = conn.cursor()
cur.execute('''
    SELECT *
    FROM table;
''')
rows = cur.fetchall()
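For an RDS Postgres instance specifically, the host is just the instance endpoint shown in the RDS console. A minimal sketch of how that plugs in (the endpoint, credentials, and table name below are placeholders, not values from my actual setup):

import psycopg2

# All values here are placeholders; use your own instance endpoint and credentials
conn = psycopg2.connect(database='mydb',
                        user='myuser',
                        password='mypassword',
                        host='my-instance.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com',
                        port='5432')

with conn.cursor() as cur:
    cur.execute('SELECT * FROM my_table;')
    rows = cur.fetchall()

conn.close()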
I have a SQLAlchemy connection set up to Snowflake which works, as I can run some queries and get results back. The query attempts are also logged in my user query history.
My connection:
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

engine = create_engine(URL(
    user=user, password=password, account=account,
    database=database, warehouse=warehouse, role=role
))
connection = engine.connect()
However, most of the time my queries fail, returning an OperationalError (i.e. it's a Snowflake error): https://docs.sqlalchemy.org/en/13/errors.html#error-e3q8. But these same queries run fine in the Snowflake web UI.
For example if I run
test_query = 'SELECT * FROM TABLE DB1.SCHEMA1.TABLE1'
test = pd.read_sql(test_query, connection)
When I look at my query history it shows the SQLAlchemy query failing, then a second later the base query itself being run successfully. However, I'm not sure where that output goes in the Snowflake setup, and why it isn't coming back through my SQLAlchemy connection. What I'm seeing:
Query = DESC TABLE /* sqlalchemy:_has_object */ "SELECT * FROM DB1"."SCHEMA1"."TABLE1"
Error code = 2003 Error message = SQL compilation error: Database '"SELECT * FROM DB1"' does not exist.
Then one second later, the query itself runs successfully, but it isn't clear where the result goes, as it doesn't come back over the connection.
Query = SELECT * FROM TABLE DB1.SCHEMA1.TABLE1
Any help much appreciated!
Thanks
You can try adding the schema here as well:
engine = create_engine(URL(
    account='',
    user='',
    password='',
    database='',
    schema='',
    warehouse='',
    role='',
))
connection = engine.connect()
It is very unlikely that a query runs in the web UI but fails with a syntax error when run via the CLI or a connector.
I suggest you print the query that is sent via the connector, run the same statement in the web UI, and also note which role you're running the query under.
Please share what you find.
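If it helps, one way to see exactly what SQL SQLAlchemy is emitting is to turn on engine echoing; this is a minimal sketch (echo=True is standard SQLAlchemy, not something specific to Snowflake):

from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

# echo=True logs every statement SQLAlchemy sends, so you can copy it
# into the Snowflake web UI and compare.
engine = create_engine(URL(
    account='', user='', password='',
    database='', schema='', warehouse='', role='',
), echo=True)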
The query you mentioned (SELECT * FROM TABLE DB1.SCHEMA1.TABLE1) is not valid Snowflake SQL syntax.
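For example, dropping the TABLE keyword should give a statement Snowflake accepts; this is just a sketch reusing the connection and the fully qualified name from the question:

import pandas as pd

# Same query as in the question, minus the unsupported TABLE keyword
test_query = 'SELECT * FROM DB1.SCHEMA1.TABLE1'
test = pd.read_sql(test_query, connection)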
The link here will give you more details.
Hope this helps!
I am using a Python script to connect to a SQL Server database:
import pyodbc
import pandas
server = 'SQL'
database = 'DB_TEST'
username = 'USER'
password = 'My password'
sql='''
SELECT *
FROM [DB_TEST].[dbo].[test]
'''
cnxn = pyodbc.connect('DRIVER=SQL Server;SERVER='+server+';DATABASE='+database+';UID='+username+';PWD='+ password)
data = pandas.read_sql(sql,cnxn)
cnxn.close()
The script is launched every day by an automation tool, so there is no physical user.
The issue is: how do I replace the plain-text password with a more secure method?
The automated script is still run by a Windows user. Add this Windows user to the SQL Server logins and give it the appropriate permissions, so you can use:
import pyodbc
import pandas
server = 'SQL'
database = 'DB_TEST'
sql='''
SELECT *
FROM [DB_TEST].[dbo].[test]
'''
cnxn = pyodbc.connect(
f'DRIVER=SQL Server;SERVER={server};DATABASE={database};Trusted_Connection=True;')
data = pandas.read_sql(sql,cnxn)
cnxn.close()
I am also interested in secure coding using Python. I did my own research to figure out the available options, and I would recommend reviewing this post as it summarizes them all. Check the listed options and apply the one that suits you best.
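For instance, one common option is to keep the credential out of the script entirely and read it from an environment variable set by the automation tool. A minimal sketch (the variable name DB_PASSWORD is just an illustration, not from the linked post):

import os
import pyodbc

server = 'SQL'
database = 'DB_TEST'
username = 'USER'
# Password comes from the environment instead of being hard-coded;
# DB_PASSWORD is a name chosen for this example.
password = os.environ['DB_PASSWORD']

cnxn = pyodbc.connect('DRIVER=SQL Server;SERVER=' + server + ';DATABASE=' + database +
                      ';UID=' + username + ';PWD=' + password)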
I am trying to connect to an Azure SQL Database from the Azure Machine Learning service, but I get the error below:
**('IM002', '[IM002] [unixODBC][Driver Manager]Data source name not found and no default driver specified (0) (SQLDriverConnect)')**
Here is the code I used for the database connection:
import sys
import pyodbc

class DbConnect:
    # This class is used for the Azure SQL database connection using pyodbc
    def __init__(self):
        try:
            self.sql_db = pyodbc.connect('SERVER=<servername>;PORT=1433;DATABASE=<databasename>;UID=<username>;PWD=<password>')
            get_name_query = "select name from contacts"
            names = self.sql_db.execute(get_name_query)
            for name in names:
                print(name)
        except Exception as e:
            print("Error in Azure SQL Server database connection: ", e)
            sys.exit()

if __name__ == "__main__":
    class_obj = DbConnect()
Is there any way to solve the above error? Please let me know.
I'd consider using azureml.dataprep over pyodbc for this task (the API may change, but this worked last time I tried):
import azureml.dataprep as dprep

ds = dprep.MSSQLDataSource(server_name=<server-name,port>,
                           database_name=<database-name>,
                           user_name=<username>,
                           password=<password>)
You should then be able to collect the result of an SQL query in pandas e.g. via
dataflow = dprep.read_sql(ds, "SELECT top 100 * FROM [dbo].[MYTABLE]")
dataflow.to_pandas_dataframe()
Alternatively, you can create a SQL datastore and create a dataset from it.
Learn how: https://learn.microsoft.com/en-us/azure/machine-learning/service/how-to-create-register-datasets#create-tabulardatasets
Sample code:
from azureml.core import Dataset, Datastore
# create tabular dataset from a SQL database in datastore
sql_datastore = Datastore.get(workspace, 'mssql')
sql_ds = Dataset.Tabular.from_sql_query((sql_datastore, 'SELECT * FROM my_table'))
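From there, the tabular dataset can be pulled into pandas the same way as above; a short sketch based on the TabularDataset API:

# Materialize the SQL-backed tabular dataset as a pandas DataFrame
df = sql_ds.to_pandas_dataframe()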
@AkshayGodase Any particular reason that you want to use pyodbc?
When I run a query from my local machine to a SQL server db, data is returned. If I run the same query from a JupyterHub server (using ssh), the following is returned:
TypeError: 'NoneType' object is not iterable
Implying it isn't getting any data.
The connection string is OK on both systems (albeit different), because running the same stored procedure works fine on both systems using these connection strings:
Local= "Driver={SQL Server};Server=DNS-based-address;Database=name;uid=user;pwd=pwd"
Hub = "DRIVER=FreeTDS;SERVER=IP.add.re.ss;PORT=1433;DATABASE=name;UID=dbuser;PWD=pwd;TDS_Version=8.0"
Is there something in the FreeTDS driver that affects chunksize, or that means a SET NOCOUNT is required in the original query, as in "NoneType object is not iterable error in pandas"? I tried that fix, by the way, and got nowhere.
Are you using pymssql, which is built on top of FreeTDS?
For SQL Server you could also try the Microsoft JDBC Driver with the Python package jaydebeapi: https://github.com/microsoft/mssql-jdbc.
import pandas as pd
import pymssql

conn = pymssql.connect(
    host=r'192.168.254.254',
    port='1433',
    user=r'user',
    password=r'password',
    database='DB_NAME'
)

query = """SELECT * FROM db_table"""
df = pd.read_sql(con=conn, sql=query)
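If you want to try the JDBC route instead, a rough sketch with jaydebeapi could look like this (the driver class name is the standard one for the Microsoft JDBC driver; the jar path, host, and credentials are placeholders):

import jaydebeapi
import pandas as pd

# All connection details below are placeholders for illustration
conn = jaydebeapi.connect(
    'com.microsoft.sqlserver.jdbc.SQLServerDriver',
    'jdbc:sqlserver://IP.add.re.ss:1433;databaseName=name',
    ['dbuser', 'pwd'],
    '/path/to/mssql-jdbc.jar'
)

df = pd.read_sql('SELECT * FROM db_table', conn)
conn.close()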
Is it possible to make SQLAlchemy do cross-server joins?
If I try to run something like
from sqlalchemy import create_engine, sql

engine = create_engine('mssql+pyodbc://SERVER/Database')
query = sql.text('SELECT TOP 10 * FROM [dbo].[Table]')

with engine.begin() as connection:
    data = connection.execute(query).fetchall()
It works as I'd expect. If I change the query to select from [OtherServer].[OtherDatabase].[dbo].[Table] I get the error message "Login failed for user 'NT AUTHORITY\\ANONYMOUS LOGON'".
Looks like there's an issue with how you authenticate to SQL Server.
I believe you can connect using the current Windows user; the URI syntax is then mssql+pyodbc://SERVER/Database?trusted_connection=yes (I have never tested this, but give it a try).
Another option is to create a SQL Server login (i.e. a username/password that is defined within SQL Server, NOT a Windows user) and use that login when you connect.
The database URI then becomes: mssql+pyodbc://username:password@SERVER/Database.
mssql+pyodbc://SERVER/Database?trusted_connection=yes threw an error when I tried it. It did point me in the right direction though.
from sqlalchemy import create_engine, sql
import urllib.parse

string = "DRIVER={SQL Server};SERVER=server;DATABASE=db;TRUSTED_CONNECTION=YES"
params = urllib.parse.quote_plus(string)
engine = create_engine('mssql+pyodbc:///?odbc_connect={0}'.format(params))

query = sql.text('SELECT TOP 10 * FROM [CrossServer].[database].[dbo].[Table]')

with engine.begin() as connection:
    data = connection.execute(query).fetchall()
It's quite complicated if you need to work with several different servers through one connection.
But if you need to query a different server under different credentials, you should first add a linked server with sp_addlinkedserver, and then add credentials for the linked server with sp_addlinkedsrvlogin. Have you tried this?
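As a rough sketch of that setup (run once with sufficient privileges against the local server; the linked server name, remote login, and password below are placeholders):

from sqlalchemy import create_engine, sql
import urllib.parse

# Reuse the same connection style as above; all names here are placeholders.
string = "DRIVER={SQL Server};SERVER=server;DATABASE=db;TRUSTED_CONNECTION=YES"
engine = create_engine('mssql+pyodbc:///?odbc_connect={0}'.format(urllib.parse.quote_plus(string)))

with engine.begin() as connection:
    # Register the remote SQL Server instance as a linked server
    connection.execute(sql.text(
        "EXEC sp_addlinkedserver @server = 'OtherServer', @srvproduct = 'SQL Server'"))
    # Map a remote login so queries against the linked server can authenticate
    connection.execute(sql.text(
        "EXEC sp_addlinkedsrvlogin @rmtsrvname = 'OtherServer', "
        "@useself = 'False', @locallogin = NULL, "
        "@rmtuser = 'remote_user', @rmtpassword = 'remote_password'"))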