I'm coming from a very heavy Python->Oracle development environment and have been playing around with Clojure quite a bit. I love the ease of access that cx_Oracle gives me to the database on the Python end and was wondering if Clojure has something similar.
Specifically what I'm looking for is something to give me easy access to a database connection, a la cx_Oracle's "username/password@tns_name" format.
The best I've come up with so far is:
(defn get-datasource [user password server service]
  {:datasource (clj-dbcp.core/make-datasource {:adapter :oracle
                                               :style :service-name
                                               :host server
                                               :service-name service
                                               :user user
                                               :password password})})
This requires the server name, however, and 95% of my users don't know what server they're hitting, just the TNS name from tnsnames.ora.
In addition, I don't understand when I have a database connection and when it disconnects. With cx_Oracle I either had to use a with cx_Oracle.connect() ... block or call connection.close() to close the connection.
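For clarity, the cx_Oracle pattern I'm used to looks roughly like this (just a minimal sketch):

import cx_Oracle

# connect with nothing but username/password@tns_alias - no host or port needed
conn = cx_Oracle.connect("username/password@tns_name")
try:
    cur = conn.cursor()
    cur.execute("select 1 from dual")
    print(cur.fetchone())
finally:
    conn.close()  # the connection stays open until I explicitly close it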
Can someone give me guidance as to how datasources work as far as connections go and the easiest way to connect to a database given a username, password, and tns alias?
Thanks!!
Your best bet is Clojure's most idiomatic database library, clojure.java.jdbc.
First, because the Oracle driver isn't available from a maven repository, we need to download the latest one and install it in our local repository, using the lein-localrepo plugin:
lein localrepo install -r D:\Path\To\Repo\
                       D:\Path\To\ojdbc6.jar
                       oracle.jdbc/oracledriver "12.1.0.1"
Now we can reference it in our project.clj, together with clojure.java.jdbc.
(defproject oracle-connect "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/java.jdbc "0.3.3"]
                 [oracle.jdbc/oracledriver "12.1.0.1"]])
After starting a REPL, we can connect to the database through a default host/port/SID connection:
(ns oracle-connect
  (:require [clojure.java.jdbc :as jdbc]))

(def db
  {:classname "oracle.jdbc.OracleDriver"
   :subprotocol "oracle:thin"
   :subname "@hostname:port:sid"
   :user "username"
   :password "password"})
(jdbc/query db ["select ? as one from dual" 1])
db is just a basic map, referred to as the db-spec. It is not a real connection, but has all the information needed to make one. Clojure.java.jdbc makes one when needed, for instance in (query db ..).
We need to enter the classname manually because clojure.java.jdbc doesn't have a default mapping between the subprotocol and the classname for Oracle. This is probably because the Oracle JDBC driver has both thin and OCI JDBC connection options.
To make a connection with a TNS named database, the driver needs the location of the tnsnames.ora file. This is done by setting a system property called oracle.net.tns_admin.
(System/setProperty "oracle.net.tns_admin"
                    "D:/oracle/product/12.1.0.1/db_1/NETWORK/ADMIN")
Once this is set all we need for subname is the tnsname of the database.
(def db
  {:classname "oracle.jdbc.OracleDriver"
   :subprotocol "oracle:thin"
   :subname "@tnsname"
   :user "username"
   :password "password"})
(jdbc/query db ["select ? as one from dual" 1])
Now on to the 'how do connections work' part. As stated earlier, clojure.java.jdbc creates connections when needed, for instance within the query function.
If all you want to do is transform the results of a query, you can pass two optional named parameters: :row-fn and :result-set-fn. Every row is transformed with the :row-fn, after which the whole result set is transformed with the :result-set-fn.
Both of these are executed within the context of the connection, so the connection is guaranteed to stay open until all of these actions have been performed, unless these functions return lazy sequences.
By default the :result-set-fn is doall, which guarantees all results are realized, but if you redefine it, be sure to realize all lazy results. Whenever you get a connection-closed or resultset-closed exception while using results outside of that scope, the problem is usually that you didn't.
The connection only exists within the scope of the query function; at the end it is closed. This means every call to query opens and closes its own connection. If you want multiple queries done over one connection, you can wrap them in with-db-connection:
(jdbc/with-db-connection [c db]
  (doall (map #(jdbc/query c ["select * from EMP where DEPTNO = ?" %])
              (jdbc/query c ["select * from DEPT"] :row-fn :DEPTNO))))
In the with-db-connection binding you bind the db-spec to a local (c above) and use that local instead of the db-spec in the statements inside the binding's scope. with-db-connection opens a connection and adds it to that binding, so the enclosed statements all reuse it. This is especially handy when building dynamic queries based on the results of other queries.
The same goes for with-db-transaction. It has the same semantics as with-db-connection, but its scope not only guarantees that the same connection is used, it also guarantees that either all statements succeed or none do, by wrapping them in a transaction block. Both with-db-connection and with-db-transaction are nestable.
There are also more advanced options, such as creating connection pools and having query et al. draw a connection from the pool instead of creating or reusing single connections. See the clojure-doc.org documentation for those.
I am using a library that creates an SQLite database in-memory by calling sqlite3.connect(':memory:'). I would like to connect to this database using SQLAlchemy to use its ORM and other nice bells and whistles. Is there, in the depths of SQLAlchemy's API, a way to pass the resulting sqlite3.Connection object through so that I can re-use it?
I cannot just re-connect with connection = sqlalchemy.create_engine('sqlite:///:memory:').connect() – as the SQLite documentation states: “The database ceases to exist as soon as the database connection is closed. Every :memory: database is distinct from every other. So, opening two database connections each with the filename ":memory:" will create two independent in-memory databases.” (Which makes sense. I also tried it, and the behaviour is as expected.)
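Roughly the kind of check I did (a minimal sketch):

import sqlite3

a = sqlite3.connect(':memory:')
b = sqlite3.connect(':memory:')
a.execute("CREATE TABLE t (x INTEGER)")
a.execute("INSERT INTO t VALUES (1)")

try:
    b.execute("SELECT x FROM t")
except sqlite3.OperationalError as e:
    print(e)  # "no such table: t" - b is a separate in-memory database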
I have tried to follow SQLAlchemy's source code to find the low level location where the database connection is established and SQLite is actually called, but so far I found nothing. It looks like SQLAlchemy uses far too much obscure alchemy to do that for me to understand when and where it happens.
Here's a way to do that:
import sqlite3
from sqlalchemy import create_engine

# some connection is created - by you or someone else
conn = sqlite3.connect(':memory:')
...

def get_connection():
    # just a debug print to verify that it's indeed getting called:
    print("returning the connection")
    return conn

# create a SQLAlchemy engine that reuses the same in-memory sqlite connection
engine = create_engine('sqlite://', creator=get_connection)
From this point on, just use the engine as you wish.
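For example (a hedged sketch, assuming the default pooling is acceptable for sharing the single connection), data written through the raw sqlite3 connection is visible through the engine:

from sqlalchemy import text

conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (42)")
conn.commit()

with engine.connect() as sa_conn:
    print(sa_conn.execute(text("SELECT x FROM t")).fetchall())  # [(42,)]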
See the SQLAlchemy documentation on the creator argument to create_engine for more details on this feature.
In my application I have triggers that need access to things like user id. I am storing that information with
set_config('PRIVATE.user_id', '221', false)
then, while I am doing operations that modify the database, triggers may do:
user_id = current_setting('PRIVATE.user_id');
It seems to work great. My database actions are mostly from Python via psycopg2; once I get a connection, I do the set_config() as my first operation, then go about my database business. Is this practice a good one, or could data leak from one session to another? I was doing this sort of thing with the SD and GD variables in PL/Python, but that language proved too heavy for what I was trying to do, so I had to shift to PL/pgSQL.
While it's not really what they're designed for, you can use GUCs as session variables.
They can also be transaction scoped, with SET LOCAL or the set_config equivalent.
So long as you don't allow the user to run arbitrary SQL they're a reasonable choice, and session-local GUCs aren't shared with other sessions. They're not designed for secure session-local storage but they're handy places to stash things like an application's "current user" if you're not using SET ROLE or SET SESSION AUTHORIZATION for that.
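From psycopg2, the pattern the question describes could look roughly like this (a sketch; the DSN is hypothetical):

import psycopg2

conn = psycopg2.connect("dbname=mydb user=app")  # hypothetical DSN

with conn:
    with conn.cursor() as cur:
        # session-scoped: do this first on every new connection;
        # passing true instead of false (or using SET LOCAL) scopes it
        # to the current transaction only
        cur.execute("SELECT set_config('PRIVATE.user_id', %s, false)", ('221',))
        # later statements (and their triggers) can read it back
        cur.execute("SELECT current_setting('PRIVATE.user_id')")
        print(cur.fetchone()[0])  # '221'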
Do be aware that the user can define them via environment variables if you let them run a libpq based client, e.g.
$ PGOPTIONS="-c myapp.user_id=fred" psql -c "SHOW myapp.user_id;"
myapp.user_id
---------------
fred
(1 row)
Also, on older PostgreSQL versions (before 9.2) you had to declare the namespace with custom_variable_classes in postgresql.conf before you could use it.
OK, I know it's not that simple. I have two db connections defined in my settings.py: default and cache. I'm using the DatabaseCache backend from django.core.cache. I have a database router defined so I can use a separate database/schema/table for my models and for the cache. Perfect!
Now sometimes my cache DB is not available and there are two cases:
1. The connection to the database was already established when the DB crashed - this is easy - I can use this recipe: http://code.activestate.com/recipes/576780-timeout-for-nearly-any-callable/ and wrap my query like this:
try:
    timelimited(TIMEOUT, self._meta.cache.get, cache_key)
except TimeLimitExpired:
    # live without cache
    pass
2. The connection to the database hasn't been established yet - so I need to wrap in timelimited some portion of the code that actually establishes the database connection. But I don't know where such code lives or how to wrap it selectively (i.e. wrap only the cache connection and leave the default connection without a timeout).
Do you know how to do point 2?
Please note, this answer https://stackoverflow.com/a/1084571/940208 is not correct:
grep -R "connect_timeout" /usr/local/lib/python2.7/dist-packages/django/db
gives no results, and the cx_Oracle driver doesn't support this parameter as far as I know.
Example scenario:
MySQL running on a single server -> HOSTNAME
Two MySQL databases on that server -> USERS, GAMES.
Task -> fetch the 10 newest games from GAMES.my_games_table, and fetch the users playing those games from USERS.my_users_table (assume no joins).
In Django, as well as in Python MySQLdb, why is having one cursor for each database preferable?
What is the disadvantage of an extended cursor that is single per MySQL server and can switch databases (e.g. by issuing "use USERS;"), then work on the corresponding database?
MySQL connections are cheap, but isn't a single connection better than many, if there is a linear flow and no complex transactions that might need two cursors?
A shorter answer would be, "MySQL doesn't support that type of cursor, so neither does Python-MySQL", and therefore one connection per database is preferred because that's the way MySQL works. Which is sort of a tautology.
However, the longer answer is:
A 'cursor', by your definition, would be some type of object accessing tables and indexes within an RDBMS, capable of maintaining its state.
A 'connection', by your definition, would accept commands, and either allocate or reuse a cursor to perform the action of the command, returning its results to the connection.
By your definition, a 'connection' would/could manage multiple cursors.
You believe this would be the preferred/performant way to access a database as 'connections' are expensive, and 'cursors' are cheap.
However:
A cursor in MySQL (and other RDBMSs) is not the user-accessible mechanism for performing operations. MySQL (and others) perform operations as a "set"; that is, they compile your SQL command into an internal list of commands and do numerous, complex things depending on the nature of your SQL command and your table structure.
A cursor is a specific mechanism, used within stored procedures (and only there), that gives the developer a way to work with data in a procedural way.
A 'connection' in MySQL is what you think of as a 'cursor', sort of. MySQL does not expose its internals to you as an iterator, or pointer, that merely moves over tables. It exposes its internals as a 'connection' which accepts SQL and other commands, translates those commands into internal actions, performs them, and returns the results to you.
This is the difference between a 'set' and a 'procedural' execution style (which is really about the granularity of control you, the user, are given access to, or at least the granularity inherent in how the RDBMS abstracts away its internals when it exposes them via an API).
As you say, MySQL connections are cheap, so for your case, I'm not sure there is a technical advantage either way, outside of code organization and flow. It might be easier to manage two cursors than to keep track of which database a single cursor is currently talking to by painstakingly tracking SQL 'USE' statements. Mileage with other databases may vary -- remember that Django strives to be database-agnostic.
Also, consider the case where two different databases, even on the same server, require different access credentials. In such a case, two connections will be necessary, so that each connection can successfully authenticate.
One cursor per database is not necessarily preferable; it's just the default behavior.
The rationale is that different databases are more often than not on different servers, use different engines, and/or need different initialization options. (Otherwise, why should you be using different "databases" in the first place?)
In your case, if your two databases are just namespaces of tables (what should be called "schemas" in SQL jargon) but reside on the same MySQL instance, then by all means use a single connection. (How to configure Django to do so is actually an altogether different question.)
You are also right that a single connection is better than two, if you only have a single thread and don't actually need two database workers at the same time.
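For the same-instance case, a rough sketch of the single-connection approach (using MySQLdb; the host, credentials, and column names are made up):

import MySQLdb

conn = MySQLdb.connect(host="HOSTNAME", user="app", passwd="secret")
cur = conn.cursor()

# One connection can reach both schemas by qualifying table names,
# which also avoids tracking what a bare "USE" statement last selected.
cur.execute("SELECT game_id, title FROM GAMES.my_games_table "
            "ORDER BY created_at DESC LIMIT 10")
for game_id, title in cur.fetchall():
    cur.execute("SELECT user_id FROM USERS.my_users_table WHERE game_id = %s",
                (game_id,))
    players = [row[0] for row in cur.fetchall()]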
I have a Pylons-based web application which connects via Sqlalchemy (v0.5) to a Postgres database. For security, rather than follow the typical pattern of simple web apps (as seen in just about all tutorials), I'm not using a generic Postgres user (e.g. "webapp") but am requiring that users enter their own Postgres userid and password, and am using that to establish the connection. That means we get the full benefit of Postgres security.
Complicating things still further, there are two separate databases to connect to. Although they're currently in the same Postgres cluster, they need to be able to move to separate hosts at a later date.
We're using sqlalchemy's declarative package, though I can't see that this has any bearing on the matter.
Most examples of sqlalchemy show trivial approaches such as setting up the Metadata once, at application startup, with a generic database userid and password, which is used through the web application. This is usually done with Metadata.bind = create_engine(), sometimes even at module-level in the database model files.
My question is, how can we defer establishing the connections until the user has logged in, and then (of course) re-use those connections, or re-establish them using the same credentials, for each subsequent request.
We have this working -- we think -- but I'm not only not certain of the safety of it, I also think it looks incredibly heavy-weight for the situation.
Inside the __call__ method of the BaseController we retrieve the userid and password from the web session, call sqlalchemy create_engine() once for each database, then call a routine which calls Session.bind_mapper() repeatedly, once for each table that may be referenced on each of those connections, even though any given request usually references only one or two tables. It looks something like this:
# in lib/base.py on the BaseController class
def __call__(self, environ, start_response):
    # note: web session contains {'username': XXX, 'password': YYY}
    url1 = 'postgres://%(username)s:%(password)s@server1/finance' % session
    url2 = 'postgres://%(username)s:%(password)s@server2/staff' % session
    finance = create_engine(url1)
    staff = create_engine(url2)
    db_configure(staff, finance)  # see below
    ... etc

# in another file
Session = scoped_session(sessionmaker())

def db_configure(staff, finance):
    s = Session()

    from db.finance import Employee, Customer, Invoice
    for c in [
            Employee,
            Customer,
            Invoice,
            ]:
        s.bind_mapper(c, finance)

    from db.staff import Project, Hour
    for c in [
            Project,
            Hour,
            ]:
        s.bind_mapper(c, staff)

    s.close()  # prevents leaking connections between sessions?
So the create_engine() calls occur on every request... I can see that being needed, and the Connection Pool probably caches them and does things sensibly.
But calling Session.bind_mapper() once for each table, on every request? Seems like there has to be a better way.
Obviously, since a desire for strong security underlies all this, we don't want any chance that a connection established for a high-security user will inadvertently be used in a later request by a low-security user.
Binding global objects (mappers, metadata) to a user-specific connection is not a good way to do this, and neither is using a scoped session. I suggest creating a new session for each request and configuring it to use user-specific connections. The following sample assumes that you use separate metadata objects for each database:
binds = {}
finance_engine = create_engine(url1)
binds.update(dict.fromkeys(finance_metadata.sorted_tables, finance_engine))
# The following line is required when mappings to joined tables are used (e.g.
# in joined table inheritance) due to a bug (or misfeature) in SQLAlchemy 0.5.4.
# This issue might be fixed in newer versions.
binds.update(dict.fromkeys([Employee, Customer, Invoice], finance_engine))
staff_engine = create_engine(url2)
binds.update(dict.fromkeys(staff_metadata.sorted_tables, staff_engine))
# See comment above.
binds.update(dict.fromkeys([Project, Hour], staff_engine))
session = sessionmaker(binds=binds)()
I would look at connection pooling and see if you can't find a way to have one pool per user. You can call dispose() on the pool when the user's session has expired.
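A rough sketch of that idea (the cache, URL, and hook names here are hypothetical):

from sqlalchemy import create_engine

_engines = {}

def engine_for(username, password):
    # one engine, and therefore one connection pool, per set of credentials
    key = (username, password)
    if key not in _engines:
        url = 'postgresql://%s:%s@server1/finance' % (username, password)
        _engines[key] = create_engine(url, pool_size=2, max_overflow=0)
    return _engines[key]

def on_logout(username, password):
    engine = _engines.pop((username, password), None)
    if engine is not None:
        engine.dispose()  # closes this user's pooled connections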