Connect to sqlite3.Connection using sqlalchemy - python

I am using a library that creates an SQLite database in-memory by calling sqlite3.connect(':memory:'). I would like to connect to this database using SQLAlchemy to use its ORM and other nice bells and whistles. Is there, in the depths of SQLAlchemy's API, a way to pass the resulting sqlite3.Connection object through so that I can re-use it?
I cannot just re-connect with connection = sqlalchemy.create_engine('sqlite:///:memory:').connect() – as the SQLite documentation states: “The database ceases to exist as soon as the database connection is closed. Every :memory: database is distinct from every other. So, opening two database connections each with the filename ":memory:" will create two independent in-memory databases.” (Which makes sense. I also tried it, and the behaviour is as expected.)
I have tried to follow SQLAlchemy's source code to find the low-level location where the database connection is established and SQLite is actually called, but so far I have found nothing. It looks like SQLAlchemy uses far too much obscure alchemy for me to understand when and where it happens.

Here's a way to do that:
import sqlite3
from sqlalchemy import create_engine

# some connection is created - by you or someone else
conn = sqlite3.connect(':memory:')
...

def get_connection():
    # just a debug print to verify that it's indeed getting called:
    print("returning the connection")
    return conn

# create a SQLAlchemy engine that re-uses the same in-memory sqlite connection
engine = create_engine('sqlite://', creator=get_connection)
From this point on, just use the engine as you wish.
This feature is documented here: https://docs.sqlalchemy.org/en/13/core/engines.html#custom-dbapi-args
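A quick way to verify the sharing works (the table name and values below are made up for illustration): create a table through the raw sqlite3 connection, then read it back through the engine built with creator.

```python
import sqlite3
from sqlalchemy import create_engine, text

# the pre-existing DBAPI connection we want to re-use
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.commit()

# creator makes the engine hand out this exact connection
engine = create_engine('sqlite://', creator=lambda: conn)

with engine.connect() as sa_conn:
    rows = sa_conn.execute(text("SELECT id, name FROM users")).fetchall()
print(rows)
```

The table created through the plain sqlite3 connection is visible through the engine, because the pool's "new" connection is the very same object.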


Pandas DataFrame to SQL [duplicate]

Context
I just got into trouble while trying to do some I/O operations on some databases from a Python 3 script.
When I want to connect to a database, I habitually use psycopg2 in order to handle the connections and cursors.
My data are usually stored as Pandas DataFrames and/or GeoPandas's equivalent GeoDataFrames.
Difficulties
In order to read data from a database table:
Using Pandas:
I can rely on its .read_sql() method, which takes con as a parameter, as stated in the doc:
con : SQLAlchemy connectable (engine/connection) or database str URI
or DBAPI2 connection (fallback mode)
Using SQLAlchemy makes it possible to use any DB supported by that
library. If a DBAPI2 object, only sqlite3 is supported. The user is responsible
for engine disposal and connection closure for the SQLAlchemy connectable. See
`here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_
Using GeoPandas:
I can rely on its .read_postgis() method, which takes con as a parameter, as stated in the doc:
con : DB connection object or SQLAlchemy engine
Active connection to the database to query.
In order to write data to a database table:
Using Pandas:
I can rely on the .to_sql() method, which takes con as a parameter, as stated in the doc:
con : sqlalchemy.engine.Engine or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_
Using GeoPandas:
I can rely on its .to_sql() method (which directly relies on the Pandas .to_sql()), which takes con as a parameter, as stated in the doc:
con : sqlalchemy.engine.Engine or sqlite3.Connection
Using SQLAlchemy makes it possible to use any DB supported by that
library. Legacy support is provided for sqlite3.Connection objects. The user
is responsible for engine disposal and connection closure for the SQLAlchemy
connectable See `here <https://docs.sqlalchemy.org/en/13/core/connections.html>`_
From here, I easily understand that GeoPandas is built on Pandas, especially for its GeoDataFrame object, which is, in short, a special DataFrame that can handle geographic data.
But I'm wondering why GeoPandas can directly take a psycopg2 connection as an argument while Pandas cannot, and whether this is planned for the latter.
And why is it the case for neither of them when it comes to writing data?
I would like (as probably many others [1], [2]) to directly give them a psycopg2 connection argument instead of relying on a SQLAlchemy engine.
Because even if this tool is really great, it makes me use two different frameworks to connect to my database and thus handle two different connection strings (and I personally prefer the way psycopg2 handles parameter expansion from a dictionary to build a connection string properly, such as psycopg2.connect(**dict_params), vs URL injection as explained here for example: Is it possible to pass a dictionary into create_engine function in SQLAlchemy?).
Workaround
I was first creating my connection string with psycopg2 from a dictionary of parameters this way:
connParams = ("user={}", "password={}", "host={}", "port={}", "dbname={}")
conn = ' '.join(connParams).format(*dict_params.values())
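With a hypothetical parameter dict, the string-building above expands to a libpq-style DSN. Note that it silently relies on the dict's insertion order matching the templates, which is one reason the **dict_params form is safer:

```python
# hypothetical credentials, for illustration only
dict_params = {
    "user": "alice",
    "password": "secret",
    "host": "localhost",
    "port": 5432,
    "dbname": "mydb",
}

# same string-building as above, shown end to end;
# format(*values) maps positionally, so key order must match the templates
connParams = ("user={}", "password={}", "host={}", "port={}", "dbname={}")
conn = ' '.join(connParams).format(*dict_params.values())
print(conn)  # user=alice password=secret host=localhost port=5432 dbname=mydb
```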
Then I figured out it was better and more pythonic this way:
conn = psycopg2.connect(**dict_params)
Which I finally replaced by this, so that I can interchangeably use it to build either a psycopg2 connections, or a SQLAlchemy engine:
def connector():
    return psycopg2.connect(**dict_params)
a) Initialize a psycopg2 connection is now done by:
conn = connector()
curs = conn.cursor()
b) And initialize a SQLAlchemy engine by:
engine = create_engine('postgresql+psycopg2://', creator=connector)
(or with any of your flavored db+driver)
This is well documented here:
https://docs.sqlalchemy.org/en/13/core/engines.html#custom-dbapi-args
and here:
https://docs.sqlalchemy.org/en/13/core/engines.html#sqlalchemy.create_engine
[1] Dataframe to sql without Sql Alchemy engine
[2] How to write data frame to Postgres table without using SQLAlchemy engine?
Probably the main reason why to_sql needs a SQLAlchemy Connectable (Engine or Connection) object is that to_sql needs to be able to create the database table if it does not exist or if it needs to be replaced. Early versions of pandas worked exclusively with DBAPI connections, but I suspect that when they were adding new features to to_sql they found themselves writing a lot of database-specific code to work around the quirks of the various DDL implementations.
On realizing that they were duplicating a lot of logic that was already in SQLAlchemy, they likely decided to "outsource" all of that complexity to SQLAlchemy itself by simply accepting an Engine/Connection object and using SQLAlchemy's (database-independent) SQL Expression language to create the table.
it makes me use two different frameworks to connect to my database
No, because .read_sql_query() also accepts a SQLAlchemy Connectable object so you can just use your SQLAlchemy connection for both reading and writing.
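A self-contained sketch of that point, using an in-memory SQLite engine in place of Postgres (table and column names here are made up): the same Connectable serves to_sql, which lets SQLAlchemy emit the CREATE TABLE, and read_sql_query.

```python
import pandas as pd
from sqlalchemy import create_engine

# in-memory SQLite stands in for the Postgres engine
engine = create_engine("sqlite://")

df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
df.to_sql("items", engine, index=False)  # SQLAlchemy generates the DDL for us
out = pd.read_sql_query("SELECT name FROM items WHERE id = 2", engine)
print(out["name"].tolist())
```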

Why should we set local_infile=1 in SQLAlchemy to load a local file? "Load file not allowed" issue in SQLAlchemy

I am using sqlalchemy to connect to MySQL database and found a strange behavior.
If I query
LOAD DATA LOCAL INFILE 'C:\\\\Temp\\\\JaydenW\\\\iata_processing\\\\icer\\\\rename\\\\ICER_2017-10-12T090337Z023870.csv'
it pops an error:
sqlalchemy.exc.InternalError: (pymysql.err.InternalError) (1148, u'The used command is not allowed with this MySQL version')
[SQL: u"LOAD DATA LOCAL INFILE 'C:\\\\Temp\\\\JaydenW\\\\iata_processing\\\\icer\\\\rename\\\\ICER_2017-10-12T090337Z023870.csv' INTO TABLE genie_etl.iata_icer_etl LINES TERMINATED BY '\\n' IGNORE 1 Lines (rtxt);"]
(Background on this error at: http://sqlalche.me/e/2j85)
And I find the reason is that I need to set the parameter in the connection URL:
args = "mysql+pymysql://" + username + ":" + password + "@" + hostname + "/" + database + "?local_infile=1"
If I use the official MySQL connection library, I do not need to do so:
myConnection = MySQLdb.connect(host=hostname, user=username, passwd=password, db=database)
Can anyone help me to understand the difference between the two mechanisms?
The reason is that the mechanisms use different drivers.
In SQLAlchemy you appear to be using the pymysql engine, which uses the PyMySQL Connection class to create the DB connection. That one requires the user to explicitly pass the local_infile parameter if they want to use the LOAD DATA LOCAL command.
The other example uses MySQLdb, which is basically a wrapper around the MySQL C API (and to my knowledge not the official connection library; that would be MySQL Connector Python, which is also available on SQLAlchemy as mysqlconnector). This one apparently creates the connection in a way that the LOAD DATA LOCAL is enabled by default.
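As an aside, if hand-concatenating the URL feels fragile, SQLAlchemy can assemble it from components, including the local_infile query argument (the credentials below are placeholders); the query dict ends up in the URL exactly like the ?local_infile=1 suffix:

```python
from sqlalchemy.engine import URL

url = URL.create(
    "mysql+pymysql",
    username="user",
    password="secret",
    host="localhost",
    database="mydb",
    query={"local_infile": "1"},  # forwarded to PyMySQL's connect()
)
print(url.render_as_string(hide_password=False))
```

Equivalently, create_engine() accepts connect_args={"local_infile": 1}, which passes the flag straight to the DBAPI without touching the URL at all.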

cx_Oracle-like package for Clojure

I'm coming from a very heavy Python->Oracle development environment and have been playing around with Clojure quite a bit. I love the ease of access that cx_Oracle gives me to the database on the Python end and was wondering if Clojure has something similar.
Specifically what I'm looking for is something to give me easy access to a database connection, à la cx_Oracle's "username/password@tns_name" format.
The best I've come up with so far is:
(defn get-datasource [user password server service]
  {:datasource (clj-dbcp.core/make-datasource {:adapter :oracle
                                               :style :service-name
                                               :host server
                                               :service-name service
                                               :user user
                                               :password password})})
This requires the server however and 95% of my users don't have the knowledge of what server they're hitting, just the tns name from tnsnames.ora.
In addition, I don't understand when I have a database connection and when it disconnects. With cx_Oracle I either had to do a with cx_Oracle.connect()... or a connection.close() to close the connection.
Can someone give me guidance as to how datasources work as far as connections go and the easiest way to connect to a database given a username, password, and tns alias?
Thanks!!
Best use Clojure's most idiomatic database library, clojure.java.jdbc.
First, because the Oracle driver isn't available from a maven repository, we need to download the latest one and install it in our local repository, using the lein-localrepo plugin:
lein localrepo install -r D:\Path\To\Repo\
D:\Path\To\ojdbc6.jar
oracle.jdbc/oracledriver "12.1.0.1"
Now we can reference it in our project.clj, together with clojure.java.jdbc.
(defproject oracle-connect "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/java.jdbc "0.3.3"]
                 [oracle.jdbc/oracledriver "12.1.0.1"]])
After starting a REPL we can connect to the database through a default host/port/SID connection
(ns oracle-connect
  (:require [clojure.java.jdbc :as jdbc]))

(def db
  {:classname "oracle.jdbc.OracleDriver"
   :subprotocol "oracle:thin"
   :subname "//@hostname:port:sid"
   :user "username"
   :password "password"})

(jdbc/query db ["select ? as one from dual" 1])
db is just a basic map, referred to as the db-spec. It is not a real connection, but has all the information needed to make one. clojure.java.jdbc makes one when needed, for instance in (query db ..).
We need to enter the classname manually because clojure.java.jdbc doesn't have a default mapping between the subprotocol and the classname for Oracle. This is probably because the Oracle JDBC driver has both thin and OCI JDBC connection options.
To make a connection with a TNS named database, the driver needs the location of the tnsnames.ora file. This is done by setting a system property called oracle.net.tns_admin.
(System/setProperty "oracle.net.tns_admin"
                    "D:/oracle/product/12.1.0.1/db_1/NETWORK/ADMIN")
Once this is set all we need for subname is the tnsname of the database.
(def db
  {:classname "oracle.jdbc.OracleDriver"
   :subprotocol "oracle:thin"
   :subname "@tnsname"
   :user "username"
   :password "password"})

(jdbc/query db ["select ? as one from dual" 1])
Now on to the 'how do connections work' part. As stated earlier, clojure.java.jdbc creates connections when needed, for instance within the query function.
If all you want to do is transform the results of a query, you can give in two extra optional named parameters: :row-fn and :result-set-fn. Every row is transformed with the row-fn, after which the whole resultset is transformed with the result-set-fn.
Both of these are executed within the context of the connection, so the connection is guaranteed to be open until all these actions have been performed, unless these functions return lazy sequences.
By default the :result-set-fn is defined as a doall, guaranteeing all results are realized, but if you redefine it, be sure to realize all lazy results. Usually, when you get a connection- or resultset-closed exception while using results outside of that scope, the problem is that you didn't.
The connection only exists within the scope of the query function. At the end it is closed. This means that every query results in a connection. If you want multiple queries done within one connection, you can wrap them in a with-db-connection:
(jdbc/with-db-connection [c db]
  (doall (map #(jdbc/query c ["select * from EMP where DEPTNO = ?" %])
              (jdbc/query c ["select * from DEPT"] :row-fn :DEPTNO))))
In the with-db-connection binding you bind the db-spec to a var, and use that var instead of the db-spec in statements inside the binding scope. It creates a connection and adds that to the var. The other statements will use that connection. This is especially handy when creating dynamic queries based on the result of other queries.
The same thing goes for with-db-transaction. It has the same semantics as with-db-connection, however here the scope not only guarantees the same connection is used, but also that either all statements or none succeed by wrapping them in a transaction block. Both with-db-connection and with-db-transaction are nestable.
There are also more advanced options like creating connection pools and instead of having query et al. create or reuse single connections, have them draw a connection from the pool. See the clojure-doc.org documentation for those.

Select a single active database connection when starting Plone

I have a Plone 4 site which uses an additional Postgres database via a Z Psycopg 2 Database Connection object. Since the ZODB is sometimes replicated for testing and development purposes, there are a few fellow database connection objects, in a project_suffix naming scheme; this way, I can select one of the existing database adapters via a buildout configuration script.
However, I noticed that all existing database connection objects are apparently opened when Plone starts up. I don't know whether this is a real problem (e.g. when applying changes to the schema of the database of another instance), but I'd rather have Plone open only the single database which is actually used. How can I achieve this?
(Plone 4.2.4, Postgres 9.1.9, psycopg2 2.5.1, Debian Linux)
Update:
I added some code to the __init__.py of my product, which looks roughly like this:
from Shared.DC.ZRDB.Connection import Connection
...
dbname = env['DATABASE']
db = None
for id, obj in portalfolder.objectItems():
    if isinstance(obj, Connection):
        if id == dbname:
            db = obj
        else:
            print 'before:', obj._v_connected
            obj._v_database_connection.close()
            print 'after: ', obj._v_connected
However, this seems not to work; there are no exceptions I'm aware of, but for both before and after, I get a timestamp, and when looking in the ZMI afterwards, the connections seem to be open.
Any ideas, please?

In Django, how can I set db connection timeout?

OK, I know it's not that simple. I have two db connections defined in my settings.py: default and cache. I'm using DatabaseCache backend from django.core.cache. I have database router defined so I can use separate database/schema/table for my models and for cache. Perfect!
Now sometimes my cache DB is not available and there are two cases:
Connection to the database was already established when the DB crashed - this is easy - I can use this recipe: http://code.activestate.com/recipes/576780-timeout-for-nearly-any-callable/ and wrap my query like this:
try:
    timelimited(TIMEOUT, self._meta.cache.get, cache_key)
except TimeLimitExpired:
    # live without cache
    pass
Connection to the database wasn't yet established - so I need to wrap the portion of code that actually establishes the database connection in timelimited. But I don't know where such code lives or how to wrap it selectively (i.e. wrap only the cache connection and leave the default connection without a timeout).
Do you know how to do point 2?
Please note, this answer https://stackoverflow.com/a/1084571/940208 is not correct:
grep -R "connect_timeout" /usr/local/lib/python2.7/dist-packages/django/db
gives no results and cx_Oracle driver doesn't support this parameter as far as I know.
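For reference, the timeout-wrapper idea from the recipe mentioned in point 1 can be sketched with a daemon worker thread (a simplified stand-in, not the recipe's exact code; note it abandons a stuck call rather than killing it):

```python
import threading

class TimeLimitExpired(Exception):
    pass

def timelimited(timeout, func, *args, **kwargs):
    # run func in a worker thread and give up waiting after `timeout` seconds
    result, error = [], []

    def target():
        try:
            result.append(func(*args, **kwargs))
        except Exception as exc:
            error.append(exc)

    worker = threading.Thread(target=target)
    worker.daemon = True  # don't keep the process alive for a stuck call
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise TimeLimitExpired  # the thread itself keeps running, abandoned
    if error:
        raise error[0]
    return result[0]
```

This wraps any callable, including one that opens a DB connection, but the underlying socket may stay open in the abandoned thread, so a driver-level timeout is preferable when the driver supports one.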
