I need to connect to Snowflake using SQLAlchemy, but the trick is that I need to authenticate using OAuth2. The Snowflake documentation only describes connecting with username and password, and that cannot be used in the solution I'm building. I can authenticate using Snowflake's Python connector, but I see no simple way to glue it to SQLAlchemy. I'd like to know if there is a ready solution before I write a custom interface for this.
Use snowflake.connector.connect to create a PEP-249 connection to the database (see the documentation). Then use the creator parameter of create_engine (docs): it takes a callable that returns a PEP-249 connection. When creator is used, the connection arguments in the URL are ignored.
Example code:
import snowflake.connector
from sqlalchemy import create_engine

def get_connection():
    return snowflake.connector.connect(
        user="<username>",
        host="<hostname>",
        account="<account_identifier>",
        authenticator="oauth",
        token="<oauth_access_token>",
        warehouse="test_warehouse",
        database="test_db",
        schema="test_schema"
    )

engine = create_engine("snowflake://not@used/db", creator=get_connection)
I got this working by just adding more params to the connection URL:
from sqlalchemy.engine import create_engine
import urllib.parse

connection_url = f"snowflake://{user}:@{account}/{database}/{schema}?warehouse={warehouse}&authenticator=oauth&token={urllib.parse.quote(access_token)}"
engine = create_engine(connection_url)

with engine.begin() as connection:
    print(connection.execute('select count(*) from lineitem').fetchone())
If you don't want to construct the URL yourself, you can use snowflake.sqlalchemy.URL like this:
from snowflake.sqlalchemy import URL

connection_url = URL(
    user=user,
    authenticator="oauth",
    token=access_token,
    host=host,
    account=account,
    warehouse=warehouse,
    database=database,
    schema=schema
)
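The URL helper only builds the connection string; you still pass it to create_engine as in the previous snippet:

from sqlalchemy import create_engine

engine = create_engine(connection_url)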
I have a postgres connection configured in Airflow. Does anyone know how to get the schema, port, host, etc. of this connection so that I don't have to hard-code these values in my code?
The code below is what I've been trying, but with no success because I don't know how to pass the parameters. Does anyone know how to get these parameters using psycopg2?
import psycopg2
from sqlalchemy import create_engine

conn_string = 'postgresql+psycopg2://{0}:{1}@{2}:{3}/{4}'.format( <How to get host, port, dbname, login and password from airflow here> )
engine = create_engine(conn_string)
return engine

connection = psycopg2.connect(
    login=engine.user,
    password=engine.password,
    dbname=engine.dbname,
    host=engine.host,
    port=engine.port
)
task_load = PythonOperator(
    task_id="task_id",
    python_callable=get_conn,
    op_kwargs={
        'postgres_conn': get_vars['postgres_conn']
    },
    provide_context=True,
)
You can use Airflow macros as explained in this answer, but that isn't really needed for your issue.
Your issue is getting a psycopg2.connect() object, and for that you can use PostgresHook. You mentioned you already have a PostgreSQL connection defined in Airflow, so all that's left to do is:
from airflow.providers.postgres.hooks.postgres import PostgresHook

def work_with_postgress():
    hook = PostgresHook(postgres_conn_id="postgres_conn_id")
    conn = hook.get_conn()  # this returns a psycopg2.connect() object
    # You can also just run SQL directly with the hook
    hook.run(sql="UPDATE my_table SET my_col = 'value'")
    df = hook.get_pandas_df("SELECT * FROM my_table")  # returns a DataFrame
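If what you actually need are the individual connection fields (host, port, schema, login, password) so you don't hard-code them, the hook can hand you the stored Connection object; a sketch, assuming Airflow 2.x and that "postgres_conn_id" is your connection id:

from airflow.hooks.base import BaseHook

conn = BaseHook.get_connection("postgres_conn_id")
print(conn.host, conn.port, conn.schema, conn.login, conn.password)

# PostgresHook (via DbApiHook) can also build a SQLAlchemy engine for you directly
engine = PostgresHook(postgres_conn_id="postgres_conn_id").get_sqlalchemy_engine()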
If for some reason you can't work with the hook and must use psycopg2 directly, then you have to use macros and place the code inside the Python callable (I don't recommend doing that!):
def work_with_postgress(**kwargs):
    import psycopg2
    from sqlalchemy import create_engine

    conn_string = 'postgresql+psycopg2://{0}:{1}@{2}:{3}/{4}'.format(...)
    engine = create_engine(conn_string)
    connection = psycopg2.connect(
        login=engine.user,
        password=engine.password,
        dbname=engine.dbname,
        host=engine.host,
        port=engine.port
    )

task_load = PythonOperator(
    task_id="task_id",
    python_callable=work_with_postgress,
    op_kwargs={
        # You can pass here the macros and they will be available in the callable
    },
)
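For completeness, a sketch of what filling in those op_kwargs could look like; this assumes an Airflow version that exposes the conn template variable (2.2+) and uses the hypothetical connection id postgres_conn_id:

task_load = PythonOperator(
    task_id="task_id",
    python_callable=work_with_postgress,
    op_kwargs={
        # Jinja templates resolved from the stored Airflow connection at runtime
        "host": "{{ conn.postgres_conn_id.host }}",
        "port": "{{ conn.postgres_conn_id.port }}",
        "login": "{{ conn.postgres_conn_id.login }}",
        "password": "{{ conn.postgres_conn_id.password }}",
        "dbname": "{{ conn.postgres_conn_id.schema }}",
    },
)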
In my Jupyter notebook I connect to Snowflake with externalbrowser authentication like so:
import snowflake.connector

conn = snowflake.connector.connect(
    user='<my user>',
    authenticator='externalbrowser',
    account='<my account>',
    warehouse='<the warehouse>')
This opens an external browser to authenticate, and after that it works fine with pandas read_sql:
pd.read_sql('<a query>', conn)
I want to use it with ipython-sql, but when I try:
%sql snowflake://conn.user@conn.account
I get:
(snowflake.connector.errors.ProgrammingError) Password is empty
Well, I don't have one :)
Any ideas how to pass this?
IPython-sql connection strings follow the SQLAlchemy URL standard, so you can do the following:
%load_ext sql
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

engine = create_engine(URL(
    account='<account>',
    user='<user>',
    database='testdb',
    schema='public',
    warehouse='<wh>',
    role='public',
    authenticator='externalbrowser'
))
connection = engine.connect()
This would open the external browser for authentication.
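From there you can query through the SQLAlchemy connection just as with the raw connector; a small sketch (the query is only a placeholder):

import pandas as pd

df = pd.read_sql('select current_version()', connection)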
I'm working with pandas and sqlalchemy, and would like to load a DataFrame into a MySQL database. I'm currently using this code snippet:
db_connection = sqlalchemy.create_engine('mysql+mysqlconnector://user:pwd@hostname/db_name')
some_data_ref.to_sql(con=db_connection, name='db_table_name', if_exists='replace')
sqlalchemy and pandas have been imported prior to this.
My MySQL backend is 8.x, which I know uses caching_sha2_password. If I were to connect to the database using mysql.connector.connect and I want to use the mysql_native_password method, I know that I should specify auth_plugin = mysql_native_password like so:
mysql.connector.connect(user=user, password=pw, host=host, database=db, auth_plugin='mysql_native_password')
My question: Is there a way to force mysql_native_password authentication with sqlalchemy.create_engine('mysql+mysqlconnector://...)?
Any advice on this would be much appreciated...
You could use connect_args:
db_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://user:pwd@hostname/db_name',
    connect_args={'auth_plugin': 'mysql_native_password'})
or the URL query:
db_connection = sqlalchemy.create_engine(
    'mysql+mysqlconnector://user:pwd@hostname/db_name?auth_plugin=mysql_native_password')
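Either engine then plugs straight into the to_sql call from the question; a minimal sketch with a placeholder DataFrame and table name:

import pandas as pd

some_data_ref = pd.DataFrame({'a': [1, 2, 3]})
some_data_ref.to_sql(con=db_connection, name='db_table_name', if_exists='replace')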
I am not sure how to connect to a mongodb database that uses an authentication database with mongoengine.
On the command prompt I need to do mongo hostname:27017/myApp -u "test" -p "test" --authenticationDatabase admin, but I don't see where I'd pass this as an argument to mongoengine so that I can use the admin database for auth but connect to the myApp database for my models.
I believe this is where it's explained in the PyMongo guide:
https://api.mongodb.com/python/current/examples/authentication.html
>>> from pymongo import MongoClient
>>> client = MongoClient('example.com')
>>> db = client.the_database
>>> db.authenticate('user', 'password', source='source_database')
and I found the pull request that added this to mongoengine:
https://github.com/MongoEngine/mongoengine/pull/590/files
It looks like you just add authentication_source as an argument to connect like connect(authentication_source='admin'). It'd be nice if it was better documented.
http://docs.mongoengine.org/apireference.html?highlight=authentication_source
According to the mongoengine connecting guide, the connect() method supports URI-style connections, i.e.:
connect(
    'project1',
    host='mongodb://username:password@host1:port1/databaseName'
)
In that sense, you can also specify the authentication source database as below:
"mongodb://username:password#host1:port1/database?authSource=source_database"
See also MongoDB connection string URI for more MongoDB URI examples.
See also the authentication options available through the connection string.
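Putting the two together, the authSource just gets appended to the URI handed to connect(); a sketch with the placeholder credentials from the question:

from mongoengine import connect

connect(
    'myApp',
    host='mongodb://test:test@hostname:27017/myApp?authSource=admin'
)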
The API has been updated, so this is the right way to do it now:
connect('mydb',
        host="localhost",
        username="admin",
        password="secret",
        authentication_source='your_auth_db')
The suggested solution doesn't work for me. What does work: just add an authSource argument to the connect method, as you would with pymongo's MongoClient. Example:
connect('database_name', host='host', username="username",
        password="password", authSource='authentication_database_name')
Here is an easy solution that worked for me.
connect(db="database_name", host="localhost", port=27017, username="username",
password="password", authentication_source="admin")
I'm trying to connect to a SQL Server 2012 database using SQLAlchemy (with pyodbc) on Python 3.3 (Windows 7, 64-bit). I am able to connect using straight pyodbc but have been unsuccessful at connecting using SQLAlchemy. I have a DSN file set up for database access.
I successfully connect using straight pyodbc like this:
con = pyodbc.connect('FILEDSN=c:\\users\\me\\mydbserver.dsn')
For sqlalchemy I have tried:
import sqlalchemy as sa
engine = sa.create_engine('mssql+pyodbc://c/users/me/mydbserver.dsn/mydbname')
The create_engine call doesn't actually set up the connection, so it succeeds, but
if I try something that causes SQLAlchemy to actually set up the connection (like engine.table_names()), it takes a while and then returns this error:
DBAPIError: (Error) ('08001', '[08001] [Microsoft][ODBC SQL Server Driver][DBNETLIB]SQL Server does not exist or access denied. (17) (SQLDriverConnect)') None None
I'm not sure where things are going wrong or how to see what connection string is actually being passed to pyodbc by SQLAlchemy. I have successfully used the same SQLAlchemy classes with SQLite and MySQL.
The file-based DSN string is being interpreted by SQLAlchemy as server name = c, database name = users.
I prefer connecting without using DSNs; it's one less configuration task to deal with during code migrations.
This syntax works using Windows Authentication:
engine = sa.create_engine('mssql+pyodbc://server/database')
Or with SQL Authentication:
engine = sa.create_engine('mssql+pyodbc://user:password#server/database')
SQLAlchemy has a thorough explanation of the different connection string options here.
In Python 3 you can use the quote_plus function from the urllib.parse module to create the connection parameters:
import urllib.parse
import sqlalchemy as sa

params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};"
                                 "SERVER=dagger;"
                                 "DATABASE=test;"
                                 "UID=user;"
                                 "PWD=password")

engine = sa.create_engine("mssql+pyodbc:///?odbc_connect={}".format(params))
In order to use Windows Authentication, you want to use Trusted_Connection as a parameter:
params = urllib.parse.quote_plus("DRIVER={SQL Server Native Client 11.0};"
                                 "SERVER=dagger;"
                                 "DATABASE=test;"
                                 "Trusted_Connection=yes")
In Python 2 you should use the quote_plus function from the urllib library instead:
params = urllib.quote_plus("DRIVER={SQL Server Native Client 11.0};"
                           "SERVER=dagger;"
                           "DATABASE=test;"
                           "UID=user;"
                           "PWD=password")
I have updated info about connecting to MSSQL Server without using DSNs, using Windows Authentication. In my example I have the following options:
My local server name is "(localdb)\ProjectsV12". I see the local server name in the database properties (I am using Windows 10 / Visual Studio 2015).
My db name is "MainTest1".
engine = create_engine('mssql+pyodbc://(localdb)\ProjectsV12/MainTest1?driver=SQL+Server+Native+Client+11.0', echo=True)
You need to specify the driver in the connection.
You may find your client version in:
Control Panel > System and Security > Administrative Tools > ODBC Data Sources > System DSN tab > Add
Look for the SQL Native Client version in the list.
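Alternatively, pyodbc can list the installed ODBC drivers for you, which saves a trip through the Control Panel:

import pyodbc

# Prints the exact driver names available on this machine,
# e.g. ['SQL Server', 'SQL Server Native Client 11.0', 'ODBC Driver 17 for SQL Server']
print(pyodbc.drivers())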
Just want to add some more recent information here:
If you are connecting using DSN connections:
engine = create_engine("mssql+pyodbc://USERNAME:PASSWORD#SOME_DSN")
If you are connecting using Hostname connections:
engine = create_engine("mssql+pyodbc://USERNAME:PASSWORD#HOST_IP:PORT/DATABASENAME?driver=SQL+Server+Native+Client+11.0")
For more details, please refer to the "Official Document"
import pyodbc
import sqlalchemy as sa
engine = sa.create_engine('mssql+pyodbc://ServerName/DatabaseName?driver=SQL+Server+Native+Client+11.0', echo=True)
This works with Windows Authentication.
I did it differently and it worked like a charm.
First you import the libraries:
import os
import pandas as pd
from sqlalchemy import create_engine
import pyodbc
Create a function to create the engine
def mssql_engine(user=os.getenv('user'), password=os.getenv('password'),
                 host=os.getenv('SERVER_ADDRESS'), db=os.getenv('DATABASE')):
    engine = create_engine(f'mssql+pyodbc://{user}:{password}@{host}/{db}?driver=SQL+Server')
    return engine
Create a variable with your query
query = 'SELECT * FROM [Orders]'
Execute the pandas command to create a DataFrame from an MSSQL table
df = pd.read_sql(query, mssql_engine())