I have a Postgres connection configured in Airflow. Does anyone know how to get the schema, port, host, etc. of this connection so that I don't hardcode these values in my code?
The code below is what I've been trying, but with no success, because I don't know how to pass the parameters. Does anyone know how to get these parameters using psycopg2?
import psycopg2
from sqlalchemy import create_engine

conn_string = 'postgresql+psycopg2://{0}:{1}@{2}:{3}/{4}'.format(
    <How to get host, port, dbname, login and password from airflow here>
)
engine = create_engine(conn_string)
return engine

connection = psycopg2.connect(
    login=engine.user,
    password=engine.password,
    dbname=engine.dbname,
    host=engine.host,
    port=engine.port
)
task_load = PythonOperator(
    task_id="task_id",
    python_callable=get_conn,
    op_kwargs={
        'postgres_conn': get_vars['postgres_conn']
    },
    provide_context=True,
)
You can use Airflow macros as explained in this answer, but it's not really needed for your issue.
Your issue is to get a psycopg2.connect() object; for that you can use PostgresHook. You mentioned you already have the PostgreSQL connection defined in Airflow, so all that's left to do is:
from airflow.providers.postgres.hooks.postgres import PostgresHook

def work_with_postgres():
    hook = PostgresHook(postgres_conn_id="postgres_conn_id")
    conn = hook.get_conn()  # this returns a psycopg2 connection object
    # You can also run SQL directly with the hook
    hook.run(sql="UPDATE my_table SET my_col = 'value'")
    df = hook.get_pandas_df("SELECT * FROM my_table")  # returns a pandas DataFrame
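Since the original goal was a SQLAlchemy engine, note that the hook can also build one for you with hook.get_sqlalchemy_engine(), and hook.get_uri() exposes the connection URI it assembles. That URI has the shape sketched below; the connection values here are hypothetical stand-ins for what Airflow stores, so the snippet runs without Airflow:

```python
# Hypothetical values, standing in for the fields of an Airflow connection.
parts = {
    "login": "airflow_user",
    "password": "secret",
    "host": "db.internal",
    "port": 5432,
    "schema": "analytics",  # Airflow calls the database field "schema"
}
uri = "postgresql://{login}:{password}@{host}:{port}/{schema}".format(**parts)
print(uri)  # postgresql://airflow_user:secret@db.internal:5432/analytics
```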
If for some reason you need to work with psycopg2 directly rather than through PostgresHook, then you have to use macros and place the code inside the Python callable (I don't recommend doing that!):
def work_with_postgres(**kwargs):
    import psycopg2
    from sqlalchemy import create_engine

    conn_string = 'postgresql+psycopg2://{0}:{1}@{2}:{3}/{4}'.format(...)
    engine = create_engine(conn_string)
    # The parts of the URL are available on engine.url
    connection = psycopg2.connect(
        user=engine.url.username,
        password=engine.url.password,
        dbname=engine.url.database,
        host=engine.url.host,
        port=engine.url.port
    )

task_load = PythonOperator(
    task_id="task_id",
    python_callable=work_with_postgres,
    op_kwargs={
        # You can pass the macros here and they will be available in the callable
    },
)
I need to connect to Snowflake using SQLAlchemy but the trick is, I need to authenticate using OAuth2. Snowflake documentation only describes connecting using username and password and this cannot be used in the solution I'm building. I can authenticate using Snowflake's python connector but I see no simple path how to glue it with SQLAlchemy. I'd like to know if there is a ready solution before I write a custom interface for this.
Use snowflake.connector.connect to create a PEP-249 Connection to the database (see the documentation). Then use the creator param of create_engine (docs); it takes a callable that returns a PEP-249 Connection. When creator is used, the connection details in the URL are ignored.
Example code:
import snowflake.connector

def get_connection():
    return snowflake.connector.connect(
        user="<username>",
        host="<hostname>",
        account="<account_identifier>",
        authenticator="oauth",
        token="<oauth_access_token>",
        warehouse="test_warehouse",
        database="test_db",
        schema="test_schema"
    )

engine = create_engine("snowflake://not@used/db", creator=get_connection)
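The creator mechanism is not Snowflake-specific, so it can be checked locally with any PEP-249 driver. A minimal sketch using stdlib sqlite3 in place of the Snowflake connector:

```python
import sqlite3
from sqlalchemy import create_engine, text

def get_connection():
    # Stands in for snowflake.connector.connect(); any PEP-249 connection works.
    return sqlite3.connect(":memory:")

# The URL only selects the dialect; actual connections come from `creator`.
engine = create_engine("sqlite://", creator=get_connection)
with engine.connect() as conn:
    print(conn.execute(text("SELECT 1 + 1")).scalar())  # 2
```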
I got this working by just adding more params in the connection URL:
from sqlalchemy.engine import create_engine
import urllib.parse

connection_url = f"snowflake://{user}:@{account}/{database}/{schema}?warehouse={warehouse}&authenticator=oauth&token={urllib.parse.quote(access_token)}"
engine = create_engine(connection_url)

with engine.begin() as connection:
    print(connection.execute('select count(*) from lineitem').fetchone())
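One detail worth knowing here: urllib.parse.quote leaves / unescaped by default (its safe parameter defaults to '/'), so for a token embedded in a query-string value, passing safe="" is the stricter choice. A quick stdlib check with a made-up token:

```python
import urllib.parse

token = "ver:1-hint:abc/def+ghi="  # hypothetical OAuth-style token fragment
print(urllib.parse.quote(token))           # ver%3A1-hint%3Aabc/def%2Bghi%3D
print(urllib.parse.quote(token, safe=""))  # ver%3A1-hint%3Aabc%2Fdef%2Bghi%3D
```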
If you don't want to be constructing the URL on your own, you can use snowflake.sqlalchemy.URL like this:
from snowflake.sqlalchemy import URL

connection_url = URL(
    user=user,
    authenticator="oauth",
    token=access_token,
    host=host,
    account=account,
    warehouse=warehouse,
    database=database,
    schema=schema
)
Consider the code below. This works:
connection = cx_Oracle.connect(dsn='DSNAME')
But when I use the format below for SQLAlchemy, it doesn't work; I get TypeError: Invalid arguments dsn passed:
connection = create_engine('oracle+cx_oracle://', dsn='DSNAME')
SQLAlchemy requires a database connection URI; there is an article about it in their documentation. It requires the format:
oracle+cx_oracle://user:pass@host:port/dbname[?key=value&key=value...]
Have you tried the following?
connection = create_engine('oracle+cx_oracle://' + 'DSNAME')
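In other words, a tnsnames.ora alias can stand in for the host:port/dbname section of the URL. A string-only sketch of that shape (credentials and alias hypothetical):

```python
# Hypothetical credentials; 'DSNAME' would be an alias defined in tnsnames.ora.
user, password, dsn = "scott", "tiger", "DSNAME"
url = "oracle+cx_oracle://{}:{}@{}".format(user, password, dsn)
print(url)  # oracle+cx_oracle://scott:tiger@DSNAME
```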
It seems a pretty old post, but I tried the code below and it works; hope this helps. Provide an empty username/password: they are read from the wallet, whose location is given in sqlnet.ora.
tnsnames.ora:
t1 = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=ip)(PORT=1521)(KEY=dbpdb1))(CONNECT_DATA=(SERVICE_NAME=dbsvc1.oracle.com)))
sqlnet.ora:
WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = $walletdir)
    )
  )
SQLNET.WALLET_OVERRIDE = TRUE
from sqlalchemy import create_engine

cstr = 'oracle://:@t1'
print(cstr)
engine = create_engine(
    cstr,
    convert_unicode=False,
    echo=True
)
s = 'select * from emp'
conn = engine.connect()
result = conn.execute(s)
for row in result:
    print(row)
I'm trying to query an RDS (Postgres) database through Python, more specifically a Jupyter Notebook. Overall, what I've been trying for now is:
import boto3

client = boto3.client('rds-data')
response = client.execute_sql(
    awsSecretStoreArn='string',
    database='string',
    dbClusterOrInstanceArn='string',
    schema='string',
    sqlStatements='string'
)
The error I've been receiving is:
BadRequestException: An error occurred (BadRequestException) when calling the ExecuteSql operation: ERROR: invalid cluster id: arn:aws:rds:us-east-1:839600708595:db:zprime
In the end, it was much simpler than I thought; nothing fancy or specific. It was basically a solution I had used before when accessing one of my local DBs: simply import a specific library for your database type (Postgres, MySQL, etc.) and then connect to it in order to execute queries through Python.
I don't know if it's the best solution, since making queries through Python will probably be slower than doing them directly, but it's what works for now.
import psycopg2

conn = psycopg2.connect(database='database_name',
                        user='user',
                        password='password',
                        host='host',
                        port='port')
cur = conn.cursor()
cur.execute('''
    SELECT *
    FROM table;
''')
cur.fetchall()
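The connect/cursor/execute/fetchall pattern above is the DB-API (PEP 249) interface, so it looks the same for any driver. A runnable illustration with stdlib sqlite3 (note that sqlite3 uses ? placeholders where psycopg2 uses %s):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER, name TEXT)")
# Use placeholders instead of string formatting to avoid SQL injection.
cur.execute("INSERT INTO t VALUES (?, ?)", (1, "alice"))
cur.execute("SELECT * FROM t")
rows = cur.fetchall()
print(rows)  # [(1, 'alice')]
conn.close()
```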
I'm totally new to using SQLAlchemy and PostgreSQL. I read this tutorial to build the following piece of code:
import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy import engine

def connect(user, password, db, host='localhost', port=5432):
    '''Returns a connection and a metadata object'''
    # We connect with the help of the PostgreSQL URL
    # postgresql://federer:grandestslam@localhost:5432/tennis
    url = 'postgresql://{}:{}@{}:{}/{}'
    url = url.format(user, password, host, port, db)
    # The return value of create_engine() is our connection object
    con = sqlalchemy.create_engine(url, client_encoding='utf8')
    # We then bind the connection to MetaData()
    meta = sqlalchemy.MetaData(bind=con, reflect=True)
    return con, meta

con, meta = connect('federer', 'grandestslam', 'tennis')
con
engine('postgresql://federer:***@localhost:5432/tennis')
meta
MetaData(bind=Engine('postgresql://federer:***@localhost:5432/tennis'))
When running it I get this error:
File "test.py", line 22, in <module>
    engine('postgresql://federer:***@localhost:5432/tennis')
TypeError: 'module' object is not callable
What should I do? Thanks!
So, your problem is happening because you've made this import:
from sqlalchemy import engine
And then you've used this later in the file:
engine('postgresql://federer:***@localhost:5432/tennis')
Strangely, in that section you have some statements that are just con and meta, with no assignments or calls or anything; I'm not sure what you're doing there. I would suggest that you check out SQLAlchemy's page on engine and connection use to help get you sorted.
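The exception itself is plain Python behavior: a module object cannot be called, which you can reproduce with any stdlib module:

```python
import json  # any module will do

try:
    json("{}")  # calling the module itself, not a function inside it
except TypeError as exc:
    print(exc)  # message says the 'module' object is not callable
```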
It will of course depend on exactly how you've set up your database. I used the declarative_base module in one of my projects, so my process of setting up a session to connect to my DB looks like this:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# Connect to Database and create database session
engine = create_engine('postgresql://catalog:catalog@localhost/menus')
Base.metadata.bind = engine
DBSession = sessionmaker(bind=engine)
session = DBSession()
And in my database setup file, I've assigned:
Base = declarative_base()
But you'll have to customize it a bit to your particular setup. I hope that helps.
Edit: I see now where those calls to con and meta were coming from, as well as your other confusing lines, it's part of the tutorial you linked to. What he was doing in that tutorial was using the Python interpreter in command line. I'll explain a few of the things he did there in the hope that it helps you some more. Lines beginning with >>> are what he enters in as commands. The other lines are the output he receives back.
>>> con, meta = connect('federer', 'grandestslam', 'tennis')  # he creates the connection and meta objects
>>> con  # now he calls the connection by itself to show that it's connected to his DB
Engine(postgresql://federer:***@localhost:5432/tennis)
>>> meta  # here he calls his meta object to show how it, too, is connected
MetaData(bind=Engine(postgresql://federer:***@localhost:5432/tennis))
Update: I've confirmed this is only a problem when using an Azure SQL instance. I can use the same conn string to connect to local, network, and remote SQL (AWS) instances - it is only failing when connecting to Azure. I can connect to the Azure instance with other tools, like Management Studio.
I am building a small Python(3.4.x)/Flask application. I'm a complete noob here so forgive me if I break any rules in posting.
I have created the database engine with:
from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('mssql+pymssql://dbadmin:dbadminpass@somedomain.server.net/databasename?charset=utf8')
db_session = scoped_session(sessionmaker(autocommit=False, autoflush=False, bind=engine))
Base = declarative_base()
Base.query = db_session.query_property()

def init_db():
    import models
    Base.metadata.create_all(bind=engine)
Everything builds/interprets correctly at runtime but I get an error on running the query:
usr = User.query.filter_by(username=form.user.data).first()
The error is:
sqlalchemy.exc.OperationalError: (OperationalError) (20002, b'DB-Lib error message 20002, severity 9:\nAdaptive Server connection failed\n') None None
packages are: Flask==0.10.1, pymssql==2.1.1, SQLAlchemy==0.9.8
Thanks in advance.
I had a similar problem and solved it by explicitly setting tds version = 7.0. FreeTDS reads the user's ${HOME}/.freetds.conf before falling back to the system-wide sysconfdir/freetds.conf, so I created ~/.freetds.conf with a [global] section:
[global]
tds version = 7.0
You can find more information on freetds.conf here: http://www.freetds.org/userguide/freetdsconf.htm
I just had the same problem. Since I could get pymssql to connect bypassing SQLAlchemy, I figured everything else should be fine, so I used the create_engine parameter connect_args to pass everything straight to pymssql.connect.
server_name = "sql_server_name"
server_address = server_name + ".database.windows.net"
database = "database_name"
username = "{}@{}".format("my_username", server_name)
password = "strong_password"

arguments = dict(server=server_address, user=username,
                 password=password, database=database, charset="utf8")
AZURE_ENGINE = create_engine('mssql+pymssql:///', connect_args=arguments)
This works fine and does not require one to meddle with the .freetds.conf file at all.
Also, note that pymssql requires the username to be in the form username@servername. For more information, see the linked documentation.