I want to open only one connection to my Cassandra database in Django.
Unfortunately I did not find anything on this topic.
At the moment I have a class which I instantiate whenever I need to query:
class CassConnection():
    def __init__(self):
        self.auth_prov = PlainTextAuthProvider(settings.runtime_settings[k.CASSANDRA_USER],
                                               settings.runtime_settings[k.CASSANDRA_PASSWORD])
        self.cluster = Cluster(settings.runtime_settings[k.CASSANDRA_CLUSTER], auth_provider=self.auth_prov)
        self.session = self.cluster.connect(keyspace=settings.runtime_settings[k.KEYSPACE_TIREREADINGS])
        self.session.row_factory = dict_factory

    def get_session(self):
        return self.session
In other classes I open a new session for every query I make by:
self.con = CassConnection()
self.session = self.con.get_session()
Does anyone have a hint on how to keep the session open and make it accessible across multiple packages?
For a "one connection per Django process" scheme, basically, what you want is
a connection proxy class (which you already have) with a "lazy" connection behaviour (doesn't connect until someone tries to use the connection),
a module-global instance of this class that other packages can import, and
a way to ensure the connection gets properly closed (which Cassandra requires for proper operation).
This last point will be the main difficulty, as none of the available options (mainly atexit.register() and the __del__(self) method) are 100% reliable. Implementing __del__(self) on your connection proxy might still be the most reliable choice, just beware of circular dependencies (http://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python might be a good read here).
Also note that "one single connection per Django process" means your connection must be totally thread-safe, as you will usually have many threads per Django process (depending on your WSGI container configuration).
Another solution - if thread-safety is an issue - might be to have a single connection per request...
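Putting those pieces together, a minimal sketch of such a lazy, module-global proxy (reusing the driver objects and settings keys from the question; the module name is illustrative) might look like this:

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from cassandra.query import dict_factory
# settings and k (your constants module) are the same objects used in the question


class CassConnection(object):
    """Connects lazily on first use; intended as one instance per Django process."""

    def __init__(self):
        self._cluster = None
        self._session = None

    def get_session(self):
        if self._session is None:
            auth_prov = PlainTextAuthProvider(
                settings.runtime_settings[k.CASSANDRA_USER],
                settings.runtime_settings[k.CASSANDRA_PASSWORD])
            self._cluster = Cluster(
                settings.runtime_settings[k.CASSANDRA_CLUSTER],
                auth_provider=auth_prov)
            self._session = self._cluster.connect(
                keyspace=settings.runtime_settings[k.KEYSPACE_TIREREADINGS])
            self._session.row_factory = dict_factory
        return self._session

    def __del__(self):
        # Best-effort cleanup; as noted above, not guaranteed to run.
        if self._cluster is not None:
            self._cluster.shutdown()


# module-global instance that other packages can import
connection = CassConnection()

Other modules then import that single instance and call connection.get_session(). The driver's Session object is itself thread-safe, but the lazy initialisation above is not, so guard it with a lock if two threads can race on the first call.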
Related
I've recently started using SQLAlchemy and am trying to understand how the connection pool and session work in a web application.
I am building an API using flask.
__init__.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine('mysql://username:password@localhost/my_database')
DBSession = sessionmaker(bind=engine)
views.py
@app.route('/books', methods=['GET'])
def getBooks():
    session = DBSession()
    returnedBooks = session.query(BOOK).all()
    session.close()
    return something...
1) Firstly, if I don't explicitly close my session object, will it automatically close after the request has been processed?
2) When my app receives multiple requests, are the multiple session objects being created all linked to the single engine object that was created in my __init__.py file?
3) Are the session objects being created in views.py the connections that the connection pool holds? If so, and these weren't closed, then would new connections have to be made for subsequent requests?
4) Should I be using dispose() at some point?
1) Firstly, if I don't explicitly close my session object, will it
automatically close after the request has been processed?
The garbage collector will eventually* call __del__ on the session. If the session has not been sorted out by you in some other way, this will probably cause a rollback. You should usually be doing something like:
import contextlib

def myRequestHandler(args):
    with contextlib.closing(DBSession()) as session:
        result = do_stuff(session)
        if necessary:
            session.commit()
        else:
            session.rollback()
    return result
2) When my app receives multiple requests, are the multiple session
objects being created all linked to the single engine object that was
created in my __init__.py file?
Yes; sessionmaker() holds on to all of the Session configuration. With the pattern you've got, you will get a new session for every request, and that's a good thing.
3) Are the session objects being created in view.py the connections
that the connection pool holds?
Sessions and connections are not the same thing; each session uses at most one connection at a time, and returns that connection to the pool when it's done with it.
If so, and these weren't closed, then
would new connections have to be made for subsequent requests?
Different pool implementations have different rules, but for most database engines the default is QueuePool, which allows at most 15 connections by default (pool_size=5 plus max_overflow=10). Subsequent requests for additional connections will block or time out. Connections are reused when possible, though.
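If you need more headroom than the defaults allow, the pool limits can be tuned when the engine is created; a small sketch with illustrative numbers:

from sqlalchemy import create_engine

# QueuePool defaults are pool_size=5 plus max_overflow=10 (15 connections total);
# raise them if your request concurrency regularly needs more.
engine = create_engine(
    'mysql://username:password@localhost/my_database',
    pool_size=10,     # connections kept open in the pool
    max_overflow=20,  # extra connections allowed under load
    pool_timeout=30,  # seconds to wait for a free connection before raising
)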
4) Should I be using dispose() at some point?
Usually not. If you are using pools, engines and sessions as above, then any additional resource management beyond committing or rolling back sessions is probably not needed.
*"Eventually" actually means sometime between immediately and never; don't count on it to do useful work. Note that your connections will be closed forcibly by the OS when your process dies anyway, so relying on the GC for connection management doesn't buy you anything.
OK, I know it's not that simple. I have two db connections defined in my settings.py: default and cache. I'm using DatabaseCache backend from django.core.cache. I have database router defined so I can use separate database/schema/table for my models and for cache. Perfect!
Now sometimes my cache DB is not available and there are two cases:
Connection to the database was already established when the DB crashed - this is easy - I can use this recipe: http://code.activestate.com/recipes/576780-timeout-for-nearly-any-callable/ and wrap my query like this:
try:
    timelimited(TIMEOUT, self._meta.cache.get, cache_key)
except TimeLimitExpired:
    pass  # live without cache
Connection to the database wasn't established yet - so I need to wrap in timelimited whatever portion of code actually establishes the database connection. But I don't know where that code lives or how to wrap it selectively (i.e. wrap only the cache connection and leave the default connection without a timeout).
Do you know how to do point 2?
Please note, this answer https://stackoverflow.com/a/1084571/940208 is not correct:
grep -R "connect_timeout" /usr/local/lib/python2.7/dist-packages/django/db
gives no results and cx_Oracle driver doesn't support this parameter as far as I know.
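One observation that may help with point 2: Django opens the cache database connection lazily, when the first query against it runs, so bounding the cache call itself also bounds connection establishment. A sketch of folding the timelimited wrapper into a custom cache backend (class and variable names are illustrative; timelimited and TimeLimitExpired come from the recipe above):

from django.core.cache.backends.db import DatabaseCache
# timelimited and TimeLimitExpired are from the linked recipe

TIMEOUT = 2  # seconds

class TimeLimitedDatabaseCache(DatabaseCache):
    # Bounds cache reads, including the lazy connection setup that happens
    # on the first query against the cache database; the default connection
    # is untouched because only the cache backend is wrapped.
    def get(self, key, default=None, version=None):
        try:
            return timelimited(TIMEOUT,
                               super(TimeLimitedDatabaseCache, self).get,
                               key, default, version)
        except TimeLimitExpired:
            return default  # live without cache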
I'm constructing my app such that each user has their own database (for easy isolation, and to minimize the need for sharding). This means that each web request, and all of the background scripts, need to connect to a different database based on which user is making the request, and use that connection for all function calls.
I figure I can make some sort of middleware that would pass the right connection to my web requests by attaching it to the request variable, but I don't know how I should ensure that all functions and model methods called by the request use this connection.
Well, how to "ensure that all functions and model methods called by the request use this connection" is easy. You pass the connection into your API, as with any well-designed code that isn't relying on global variables for such things. So you have a database session object loaded per request, and you pass it down. It's also easy for model objects to carry that session further without it being passed explicitly, because each managed object knows which session owns it, and you can query it from there.
db = request.db
user = db.query(User).get(1)
user.add_group('foo')
class User(Base):
    def add_group(self, name):
        db = sqlalchemy.orm.object_session(self)
        group = Group(name=name)
        db.add(group)
I'm not recommending you use that exact pattern but it serves as an example of how to grab the session from a managed object, avoiding having to pass the session everywhere explicitly.
On to your original question, how to handle multi-tenancy... In your data model! Designing a system where you are splitting things up at that low of a level is a big maintenance burden and it does not scale well. For example it becomes very difficult to use any type of connection pooling when you have an arbitrary number of independent connections. To get around that people commonly use the SQL SCHEMA feature supported by some databases. That allows you to use the same connection but have access to a different table structure per session. That's better, but again managing all of those schemas independently should raise some red flags, violating DRY with all of that duplication in your data model. Any duplication at that level quickly becomes a burden that you need to be ready for.
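For illustration, with PostgreSQL the schema approach keeps a single engine and pool and just switches search_path per request; a rough sketch (the URL and one-schema-per-tenant naming are assumptions):

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

engine = create_engine('postgresql://webapp:secret@localhost/app')  # one shared pool
Session = sessionmaker(bind=engine)

def session_for_tenant(schema_name):
    # Each request gets its own session, but they all share the pool;
    # search_path makes unqualified table names resolve inside the tenant's schema.
    session = Session()
    session.execute(text("SELECT set_config('search_path', :schema, false)"),
                    {'schema': schema_name})
    return session

Note the maintenance tax even in this small sketch: the search_path setting sticks to the underlying connection, so you have to remember to reset or discard it before the connection goes back to the pool.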
Web.py has its own database API, web.db. It's possible to use SQLObject instead, but I haven't been able to find documentation describing how to do this properly. I'm especially interested in managing database connections: it would be best to establish a connection at the WSGI entry point and reuse it. The web.py cookbook contains an example of how to do this with SQLAlchemy. I'd be interested to see how to properly do a similar thing using SQLObject.
This is how I currently do it:
class MyPage(object):
    def GET(self):
        ConnectToDatabase()
        ....
        return render.MyPage(...)
This is obviously inefficient, because it establishes a new database connection on each query. I'm sure there's a better way.
As far as I understand the SQLAlchemy example given, a processor is used, that is, a session is created for each connection and committed when the handler is complete (or rolled back if an error has occurred).
I don't see any simple way to do what you propose, i.e. open a connection at the WSGI entry point. You will probably need a connection pool to serve multiple clients at the same time. (I have no idea what are the requirements for efficiency, code simplicity and so on, though. Please comment.)
Inserting ConnectToDatabase calls into each handler is of course ugly. I suggest that you adapt the cookbook example replacing the SQLAlchemy session with a SQLObject connection.
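For the sake of illustration, a rough adaptation along those lines might look like the following, using SQLObject's sqlhub and a web.py processor in place of the cookbook's SQLAlchemy hook (the connection URI is an assumption):

import web
from sqlobject import connectionForURI, sqlhub

# One SQLObject connection object for the whole process, created at the WSGI
# entry point; model classes find it through sqlhub, so handlers no longer
# need to call ConnectToDatabase() themselves.
sqlhub.processConnection = connectionForURI('postgres://user:password@localhost/mydb')

def sqlobject_processor(handler):
    # Rough counterpart of the cookbook's SQLAlchemy processor: run each
    # request inside a transaction, commit on success, roll back on error.
    transaction = sqlhub.processConnection.transaction()
    sqlhub.threadConnection = transaction
    try:
        result = handler()
        transaction.commit()
        return result
    except Exception:
        transaction.rollback()
        raise
    finally:
        sqlhub.threadConnection = sqlhub.processConnection

# MyPage and render stay as in the question, minus the ConnectToDatabase() call
urls = ('/mypage', 'MyPage')
app = web.application(urls, globals())
app.add_processor(sqlobject_processor)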
I have a Pylons-based web application which connects via Sqlalchemy (v0.5) to a Postgres database. For security, rather than follow the typical pattern of simple web apps (as seen in just about all tutorials), I'm not using a generic Postgres user (e.g. "webapp") but am requiring that users enter their own Postgres userid and password, and am using that to establish the connection. That means we get the full benefit of Postgres security.
Complicating things still further, there are two separate databases to connect to. Although they're currently in the same Postgres cluster, they need to be able to move to separate hosts at a later date.
We're using sqlalchemy's declarative package, though I can't see that this has any bearing on the matter.
Most examples of sqlalchemy show trivial approaches such as setting up the Metadata once, at application startup, with a generic database userid and password, which is used through the web application. This is usually done with Metadata.bind = create_engine(), sometimes even at module-level in the database model files.
My question is, how can we defer establishing the connections until the user has logged in, and then (of course) re-use those connections, or re-establish them using the same credentials, for each subsequent request.
We have this working -- we think -- but I'm not only not certain of the safety of it, I also think it looks incredibly heavy-weight for the situation.
Inside the __call__ method of the BaseController we retrieve the userid and password from the web session, call sqlalchemy create_engine() once for each database, then call a routine which calls Session.bind_mapper() repeatedly, once for each table that may be referenced on each of those connections, even though any given request usually references only one or two tables. It looks something like this:
# in lib/base.py on the BaseController class

def __call__(self, environ, start_response):
    # note: web session contains {'username': XXX, 'password': YYY}
    url1 = 'postgres://%(username)s:%(password)s@server1/finance' % session
    url2 = 'postgres://%(username)s:%(password)s@server2/staff' % session
    finance = create_engine(url1)
    staff = create_engine(url2)
    db_configure(staff, finance)  # see below
    ... etc

# in another file

Session = scoped_session(sessionmaker())

def db_configure(staff, finance):
    s = Session()

    from db.finance import Employee, Customer, Invoice
    for c in [
            Employee,
            Customer,
            Invoice,
            ]:
        s.bind_mapper(c, finance)

    from db.staff import Project, Hour
    for c in [
            Project,
            Hour,
            ]:
        s.bind_mapper(c, staff)

    s.close()  # prevents leaking connections between sessions?
So the create_engine() calls occur on every request... I can see that being needed, and the Connection Pool probably caches them and does things sensibly.
But calling Session.bind_mapper() once for each table, on every request? Seems like there has to be a better way.
Obviously, since a desire for strong security underlies all this, we don't want any chance that a connection established for a high-security user will inadvertently be used in a later request by a low-security user.
Binding global objects (mappers, metadata) to user-specific connections is not a good way, and neither is using a scoped session. I suggest creating a new session for each request and configuring it to use user-specific connections. The following sample assumes that you use separate metadata objects for each database:
binds = {}
finance_engine = create_engine(url1)
binds.update(dict.fromkeys(finance_metadata.sorted_tables, finance_engine))
# The following line is required when mappings to joint tables are used (e.g.
# in joint table inheritance) due to bug (or misfeature) in SQLAlchemy 0.5.4.
# This issue might be fixed in newer versions.
binds.update(dict.fromkeys([Employee, Customer, Invoice], finance_engine))
staff_engine = create_engine(url2)
binds.update(dict.fromkeys(staff_metadata.sorted_tables, staff_engine))
# See comment above.
binds.update(dict.fromkeys([Project, Hour], staff_engine))
session = sessionmaker(binds=binds)()
I would look at the connection pooling and see if you can't find a way to have one pool per user.
You can dispose() the pool when the user's session has expired.
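A sketch of that idea, assuming the engines are cached per user and that you have some hook that fires when a web session expires (function names are illustrative):

from sqlalchemy import create_engine

_engines = {}  # one engine (and therefore one connection pool) per user

def engine_for_user(username, password, host, dbname):
    key = (username, host, dbname)
    if key not in _engines:
        url = 'postgres://%s:%s@%s/%s' % (username, password, host, dbname)
        _engines[key] = create_engine(url)
    return _engines[key]

def drop_engines_for_user(username):
    # Call this when the user's web session expires; dispose() closes the
    # pooled connections that were opened with that user's credentials.
    for key in [k for k in _engines if k[0] == username]:
        _engines.pop(key).dispose()

In a multi-threaded WSGI setup you would want a lock around that cache, but the shape stays the same: create_engine() happens once per user rather than once per request, and dispose() tears the pool down when the credentials should no longer be usable.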