In SQLAlchemy, can I create an Engine from an existing ODBC connection?

I am working in an environment where I am given an ODBC connection, which has been created using credentials to which I don't have access (for security reasons). However I would like to access the underlying database using SQLAlchemy - so my question is, can I pass this ODBC connection to something like create_engine, or alternatively, wrap it in such a way that it looks like a SQLAlchemy connection?
As a supplementary question (and working on the optimistic assumption that the first part can be satisfied) is there a way that I can tell SQLA what dialect to use for the underlying RDBMS?
Thanks

Yes, you can:
from sqlalchemy import create_engine
from sqlalchemy.pool import StaticPool
eng = create_engine("mssql+pyodbc://", poolclass=StaticPool, creator=lambda: my_odbc_connection)
However, if you truly have only one connection already created, as opposed to a callable that creates them, you must use this engine in a single thread, one operation at a time; it is not threadsafe for use in a multithreaded application.
If, on the other hand, you can actually get at a Python function that creates new connections whenever called, this is much more appropriate:
from sqlalchemy import create_engine
eng = create_engine("mssql+pyodbc://", creator=my_odbc_connection_function)
the above engine will pool connections normally and can be used freely as a source of connectivity.
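To make the second pattern concrete, here is a minimal sketch; pyodbc and the DSN name mydsn are assumptions for illustration, since in the question's scenario the actual connection details are supplied externally:
import pyodbc
from sqlalchemy import create_engine

def my_odbc_connection_function():
    # return a brand-new DBAPI connection on every call, so the pool
    # can open and close connections as needed
    return pyodbc.connect("DSN=mydsn")  # DSN name is a placeholder

eng = create_engine("mssql+pyodbc://", creator=my_odbc_connection_function)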

I found this in an old post (from 14 years ago):
import sqlalchemy
sqlalchemy.create_engine('mssql://?dsn=mydsn')
My use case is using Jupyter SQL magics in new notebooks to be distributed to users who already have the DSN configured on their systems.
For Jupyter, after pip install ipython-sql:
import sqlalchemy
%load_ext sql
%sql mssql://?dsn=mydsn
Tested on SQL Server 2019 with Windows authentication, since everything I found indicates that DSNs don't store passwords for MSSQL.
There's no reason it shouldn't work on older versions (I'll use it on SQL Server 2005 too, with another DSN of course).
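On recent SQLAlchemy versions the DSN can also be given in the hostname position of the URL; a hedged equivalent of the snippet above (same placeholder DSN, relying on Windows authentication so no credentials appear in the URL):
import sqlalchemy
# "@mydsn" selects the pre-configured ODBC DSN; Windows authentication
# means no username/password has to appear in the URL
engine = sqlalchemy.create_engine("mssql+pyodbc://@mydsn")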

Related

How will setting autocommit = True affect queries from python to Hive server when calling pyodbc.connect()

I am trying to connect a Jupyter notebook I'm running in a conda environment to a Hadoop cluster through Apache Hive on Cloudera. I understand from this post that I should install/set up the Cloudera ODBC driver and use pyodbc with a connection as follows:
import pyodbc
import pandas as pd
with pyodbc.connect("DSN=<replace DSN name>", autocommit=True) as conn:
    df = pd.read_sql("<Hive Query>", conn)
My question is about the autocommit parameter. I see in the pyodbc connection documentation that setting autocommit to True will make it so that I don't have to explicitly commit transactions, but it doesn't specify what that actually means.
What exactly is a transaction?
I want to select data from the hive server using pd.read_sql_query() but I don't want to make any changes to the actual data on the server.
Apologies if this question is formatted incorrectly or if there are (seemingly simple) details I'm overlooking in my question - this is my first time posting on stackoverflow and I'm new to working with cloudera / Hive.
I haven't tried connecting yet or running any queries yet because I don't want to mess up anything on the server.
Hive does not have the concept of commits and transactions the way RDBMS systems do.
You should not need to worry about autocommit.
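For reference, a minimal read-only sketch under the question's own setup (the DSN name and query are placeholders); since Hive has no transactions, autocommit=True simply tells pyodbc not to try to open one:
import pyodbc
import pandas as pd

# autocommit=True stops pyodbc from trying to start a transaction,
# which Hive drivers typically do not support; a plain SELECT never
# modifies data on the server
with pyodbc.connect("DSN=MyHiveDSN", autocommit=True) as conn:
    df = pd.read_sql("SELECT * FROM my_table LIMIT 10", conn)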

Pandas DataFrame to SQL [duplicate]

Context
I just got into trouble while trying to do some I/O operations on some databases from a Python 3 script.
When I want to connect to a database, I habitually use psycopg2 in order to handle the connections and cursors.
My data are usually stored as Pandas DataFrames and/or GeoPandas's equivalent GeoDataFrames.
Difficulties
In order to read data from a database table:
Using Pandas:
I can rely on its .read_sql() method, which takes a con parameter, as stated in the doc:
con : SQLAlchemy connectable (engine/connection) or database str URI
or DBAPI2 connection (fallback mode)
Using SQLAlchemy makes it possible to use any DB supported by that
library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible
for engine disposal and connection closure for the SQLAlchemy connectable. See
`here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_
Using GeoPandas:
I can rely on its .read_postgis() method, which takes a con parameter, as stated in the doc:
con : DB connection object or SQLAlchemy engine
Active connection to the database to query.
In order to write data to a database table:
Using Pandas:
I can rely on the .to_sql() method, which takes a con parameter, as stated in the doc:
con : sqlalchemy.engine.Engine or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable. See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_
Using GeoPandas:
I can rely on the .to_sql() method (which relies directly on the Pandas .to_sql()), which takes a con parameter, as stated in the doc:
con : sqlalchemy.engine.Engine or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable. See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_
From here, I easily understand that GeoPandas is built on Pandas, especially for its GeoDataFrame object, which is, in short, a special DataFrame that can handle geographic data.
But I'm wondering why GeoPandas is able to take a psycopg2 connection directly as an argument while Pandas is not, and whether this is planned for the latter.
And why is it the case for neither of them when it comes to writing data?
I would like (as probably many others [1], [2]) to directly give them a psycopg2 connection as argument instead of relying on a SQLAlchemy engine.
Because even if this tool is really great, it makes me use two different frameworks to connect to my database, and thus handle two different connection strings (and I personally prefer the way psycopg2 handles parameter expansion from a dictionary to build a connection string properly, i.e. psycopg2.connect(**dict_params), over URL injection, as explained here for example: Is it possible to pass a dictionary into create_engine function in SQLAlchemy?).
Workaround
I was first creating my connection string with psycopg2 from a dictionary of parameters this way:
connParams = ("user={}", "password={}", "host={}", "port={}", "dbname={}")
conn = ' '.join(connParams).format(*dict_params.values())
Then I figured out it was better and more pythonic this way:
conn = psycopg2.connect(**dict_params)
Which I finally replaced with this, so that I can use it interchangeably to build either a psycopg2 connection or a SQLAlchemy engine:
def connector():
    return psycopg2.connect(**dict_params)
a) Initializing a psycopg2 connection is now done by:
conn = connector()
curs = conn.cursor()
b) And initializing a SQLAlchemy engine by:
engine = create_engine('postgresql+psycopg2://', creator=connector)
(or with whichever db+driver combination you prefer)
This is well documented here:
https://docs.sqlalchemy.org/en/13/core/engines.html#custom-dbapi-args
and here:
https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine
[1] Dataframe to sql without Sql Alchemy engine
[2] How to write data frame to Postgres table without using SQLAlchemy engine?
Probably the main reason why to_sql needs a SQLAlchemy Connectable (Engine or Connection) object is that to_sql needs to be able to create the database table if it does not exist or if it needs to be replaced. Early versions of pandas worked exclusively with DBAPI connections, but I suspect that when they were adding new features to to_sql they found themselves writing a lot of database-specific code to work around the quirks of the various DDL implementations.
On realizing that they were duplicating a lot of logic that was already in SQLAlchemy, they likely decided to "outsource" all of that complexity to SQLAlchemy itself by simply accepting an Engine/Connection object and using SQLAlchemy's (database-independent) SQL Expression language to create the table.
it makes me use two different frameworks to connect to my database
No, because .read_sql_query() also accepts a SQLAlchemy Connectable object so you can just use your SQLAlchemy connection for both reading and writing.
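To illustrate, a sketch of using one engine for both directions, reusing the connector() idea from the question (the values in dict_params are placeholders):
import pandas as pd
import psycopg2
from sqlalchemy import create_engine

dict_params = {"user": "me", "password": "secret",
               "host": "localhost", "port": 5432, "dbname": "mydb"}

def connector():
    return psycopg2.connect(**dict_params)

engine = create_engine("postgresql+psycopg2://", creator=connector)

df = pd.read_sql_query("SELECT 1 AS x", engine)              # reading
df.to_sql("demo", engine, if_exists="replace", index=False)  # writing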

Connect to sqlite3.Connection using sqlalchemy

I am using a library that creates an SQLite database in-memory by calling sqlite3.connect(':memory:'). I would like to connect to this database using sqlalchemy to use some ORM and other nice bells and whistles. Is there, in the depths of SQLAlchemy's API, a way to pass the resulting sqlite3.Connection object through so that I can re-use it?
I cannot just re-connect with connection = sqlalchemy.create_engine('sqlite:///:memory:').connect() – as the SQLite documentation states: “The database ceases to exist as soon as the database connection is closed. Every :memory: database is distinct from every other. So, opening two database connections each with the filename ":memory:" will create two independent in-memory databases.” (Which makes sense. I also tried it, and the behaviour is as expected.)
I have tried to follow SQLAlchemy's source code to find the low level location where the database connection is established and SQLite is actually called, but so far I found nothing. It looks like SQLAlchemy uses far too much obscure alchemy to do that for me to understand when and where it happens.
Here's a way to do that:
import sqlite3
from sqlalchemy import create_engine

# some connection is created - by you or someone else
conn = sqlite3.connect(':memory:')
...

def get_connection():
    # just a debug print to verify that it's indeed getting called:
    print("returning the connection")
    return conn

# create a SQLAlchemy engine that reuses the same in-memory sqlite connection
engine = create_engine('sqlite://', creator=get_connection)
From this point on, just use the engine as you wish.
This feature (the creator argument) is documented here: https://docs.sqlalchemy.org/en/13/core/engines.html#custom-dbapi-args
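As a quick sanity check (a sketch; the table name t is made up), anything created through the raw sqlite3 connection should be visible through the engine, confirming they share the same in-memory database:
import sqlite3
from sqlalchemy import create_engine, text

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE t (x INTEGER)")  # created via the raw connection
conn.commit()

engine = create_engine('sqlite://', creator=lambda: conn)
with engine.connect() as sa_conn:
    # the table created above shows up, so both share one database
    print(sa_conn.execute(text("SELECT name FROM sqlite_master")).fetchall())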

Pyodbc to SQLAlchemy connection string for Sage 50

I am trying to switch a pyodbc connection to a SQLAlchemy engine. My working pyodbc connection is:
con = pyodbc.connect('DSN=SageLine50v23;UID=#####;PWD=#####;')
This is what I've tried.
con = create_engine('pyodbc://'+username+':'+password+'#'+url+'/'+db_name+'?driver=SageLine50v23')
I am trying to connect to my Sage 50 accounting data but just can't work out how to build the connection string. This is where I downloaded the ODBC driver: https://my.sage.co.uk/public/help/askarticle.aspx?articleid=19136.
I originally got help with the (working) pyodbc connection from this website: https://www.cdata.com/kb/tech/sageuk-odbc-python-linux.rst, but I would like to use SQLAlchemy for its connection with pandas. Any ideas? I assume the issue is with the pyodbc:// part.
According to this thread Sage 50 uses MySQL to store its data. However, Sage also provides its own ODBC driver which may or may not use the same SQL dialect as MySQL itself.
SQLAlchemy needs to know which SQL dialect to use, so you could try using the mysql+pyodbc://... prefix for your connection URI. If that doesn't work (presumably because "Sage SQL" is too different from "MySQL SQL") then you may want to ask Sage support if they know of a SQLAlchemy dialect for their product.
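If you want to try that suggestion, a hedged sketch reusing the working pyodbc connection string from the question (whether the mysql dialect actually matches Sage's SQL is exactly the open question):
import pyodbc
from sqlalchemy import create_engine

# the creator callable hands SQLAlchemy the already-working pyodbc
# connection; "mysql+pyodbc://" only tells it which dialect to speak
engine = create_engine(
    "mysql+pyodbc://",
    creator=lambda: pyodbc.connect('DSN=SageLine50v23;UID=#####;PWD=#####;')
)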

Instantiating SQLAlchemy engine from DBAPI connection object [duplicate]

(This question and its answers are identical to the first thread on this page.)
