I've recently started using SQLAlchemy and am trying to understand how the connection pool and the session work in a web application.
I am building an API using Flask.
__init__.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
engine = create_engine('mysql://username:password@localhost/my_database')
DBSession = sessionmaker(bind=engine)
views.py
@app.route('/books', methods=['GET'])
def getBooks():
    session = DBSession()
    returnedBooks = session.query(BOOK).all()
    session.close()
    return something...
1) Firstly, if I don't explicitly close my session object, will it automatically close after the request has been processed?
2) When my app receives multiple requests, are the multiple session objects being created all linked to the single engine object that was created in my __init__.py file?
3) Are the session objects being created in view.py the connections that the connection pool holds? If so, and these weren't closed, then would new connections have to be made for subsequent requests?
4) Should I be using dispose() at some point?
1) Firstly, if I don't explicitly close my session object, will it
automatically close after the request has been processed?
The garbage collector will eventually* call __del__ on the session. If the session has not been sorted out by you in some other way, this will probably cause a rollback. You should usually be doing something like:
import contextlib

def myRequestHandler(args):
    with contextlib.closing(DBSession()) as session:
        result = do_stuff(session)
        if necessary:
            session.commit()
        else:
            session.rollback()
    return result
2) When my app receives multiple requests, are the multiple session
objects being created all linked to the single engine object that was
created in my __init__.py file?
Yes. sessionmaker() holds on to all of the Session configuration. With the pattern you've got, you will get a new session for every request, and that's a good thing.
3) Are the session objects being created in view.py the connections
that the connection pool holds?
Sessions and connections are not the same thing: each session uses at most one connection at a time, and sessions return their connections to the pool when they're done with them.
If so, and these weren't closed, then
would new connections have to be made for subsequent requests?
Different pool implementations have different rules, but for most database engines the default is QueuePool, which allows at most 15 connections by default (a pool size of 5 plus an overflow of 10). Subsequent requests for additional connections will block or time out. Connections are reused when possible, though.
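These limits are tunable when you create the engine. A minimal sketch, reusing the question's MySQL URL and simply spelling out the QueuePool defaults explicitly:

from sqlalchemy import create_engine

engine = create_engine(
    'mysql://username:password@localhost/my_database',
    pool_size=5,      # connections the pool keeps open (the QueuePool default)
    max_overflow=10,  # extra connections allowed under load, so 15 in total
    pool_timeout=30,  # seconds to wait for a free connection before raising
)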
4) Should I be using dispose() at some point?
Usually not. If you are using pools, engines, and sessions as intended, then any additional resource management beyond committing/rolling back sessions is probably not needed.
*"eventually" actually means sometime between immediately and never; don't count on it to do useful work. Note that your connections will be closed forcibly by the OS when your process dies, so using the GC for connection management doesn't buy you anything.
I created a REST API with OpenAPI Generator that contains all the requests necessary for selecting, inserting, and updating my SQL database.
I use SQLAlchemy for my database generation and manipulation, and I'm not sure how to use the session to interact with the database in this context.
My project looks like this:
DB
| openapi_server (generated)
| __init__.py
| request.py
| database.py
In database.py I keep my database structure.
In request.py I have all the functions that need to be processed on every request (to interact with the database).
My way of handling this situation is: I create a session variable at the beginning of each function, and after the operation is complete I close it.
Are there other approaches that are more scalable and easier to maintain, or what are the best practices?
My understanding is that the SQLAlchemy session is different from the client session in that the client session stores information about authorization & permissions, whereas the SQLAlchemy session holds in-process transaction state that associates your code / machine with an external database.
Assuming you're not utilizing multithreading or parallel processing, a single SQLAlchemy session shared across your application would be appropriate. In the case where your users have different levels of database permissions, I would establish those rules in your application's authorization layer rather than in the database user-permission schema. (That should be reserved for system users.)
Bear in mind, multiple SQLAlchemy sessions are appropriate in many scenarios, and there are advantages to creating and closing sessions on the fly. But there are also potential downsides, such as write collisions (two processes trying to write the same record) and so on. In these finer-grained cases, I'd suggest a queuing process as a central orchestrator.
For implementation: I usually create a file create_session.py which has a function to create a new DB session with the appropriate DB URI. I then call that function in the main __init__.py like so: session = create_session(). That session is then used throughout the application by importing it from the main module, e.g. from database import session.
In cases where you need to create new / multiple sessions, do so with:
# Getting the path right here isn't always straightforward tbh
# basically, import the function from the module directly
from create_session import create_session

def do_something():
    # Always create your session in a method,
    # otherwise your db will open many unnecessary connections
    my_session = create_session()
    print('Done')
    # Close the session when you're done
    my_session.close()
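The create_session.py helper itself isn't shown in the question; a minimal sketch of what it might contain (the URI and engine name are placeholders, not taken from the question):

# create_session.py
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

# one engine (and therefore one connection pool) for the whole process
engine = create_engine('postgresql://user:password@localhost/mydb')
Session = sessionmaker(bind=engine)

def create_session():
    # each call hands out a fresh session drawn from the shared pool
    return Session()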
I want to open only a single connection to my Cassandra database in Django.
Unfortunately, I did not find anything on this topic.
At the moment I have a class which I instantiate whenever I need to query:
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster
from cassandra.query import dict_factory

class CassConnection():
    def __init__(self):
        self.auth_prov = PlainTextAuthProvider(settings.runtime_settings[k.CASSANDRA_USER],
                                               settings.runtime_settings[k.CASSANDRA_PASSWORD])
        self.cluster = Cluster(settings.runtime_settings[k.CASSANDRA_CLUSTER], auth_provider=self.auth_prov)
        self.session = self.cluster.connect(keyspace=settings.runtime_settings[k.KEYSPACE_TIREREADINGS])
        self.session.row_factory = dict_factory

    def get_session(self):
        return self.session
I open a new Session in other classes for every query I make by:
self.con = CassConnection()
self.session = self.con.get_session()
Does anyone have a hint on how to keep the session open and make it accessible across multiple packages?
For a "one connection per Django process" scheme, basically, what you want is
a connection proxy class (which you already have) with a "lazy" connection behaviour (doesn't connect until someone tries to use the connection),
a module-global instance of this class that other packages can import, and
a way to ensure the connection gets properly closed (which Cassandra requires for proper operation)
This last point will be the main difficulty, as none of the available options (mainly atexit.register() and the __del__(self) method) are 100% reliable. Implementing __del__(self) on your connection proxy might be the most reliable still; just beware of circular dependencies (http://eli.thegreenplace.net/2009/06/12/safely-using-destructors-in-python might be a good read here).
Also note that "one single connection per Django process" means your connection must be totally thread-safe, as you will usually have many threads per Django process (depending on your WSGI container configuration).
Another solution - if thread-safety is an issue - might be to have a single connection per request...
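Putting those pieces together, a minimal sketch of the lazy, module-global proxy described above might look like this. The settings and k lookups are carried over from the question's code; the lazy connect and the __del__ cleanup are the additions:

from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster
from cassandra.query import dict_factory

class LazyCassConnection(object):
    def __init__(self):
        # nothing connects at import time
        self._cluster = None
        self._session = None

    def get_session(self):
        # connect on first use only
        if self._session is None:
            auth_prov = PlainTextAuthProvider(
                settings.runtime_settings[k.CASSANDRA_USER],
                settings.runtime_settings[k.CASSANDRA_PASSWORD])
            self._cluster = Cluster(
                settings.runtime_settings[k.CASSANDRA_CLUSTER],
                auth_provider=auth_prov)
            self._session = self._cluster.connect(
                keyspace=settings.runtime_settings[k.KEYSPACE_TIREREADINGS])
            self._session.row_factory = dict_factory
        return self._session

    def __del__(self):
        # best-effort cleanup; see the reliability caveats above
        if self._cluster is not None:
            self._cluster.shutdown()

# module-global instance that other packages can import
connection = LazyCassConnection()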
I am writing an application which connects to a database. I want to create that db connection once, and then reuse that connection throughout the life of the application.
I also want to authenticate users. A user's auth will live for only the life of a request.
How can I differentiate between objects stored for the life of a flask app, versus specific to the request? Where would I store them so that all modules (and subsequent blueprints) have access to them?
Here is my sample app:
from flask import Flask, g
app = Flask(__name__)
@app.before_first_request
def setup_database(*args, **kwargs):
    print 'before first request', g.__dict__
    g.database = 'DATABASE'
    print 'after first request', g.__dict__

@app.route('/')
def index():
    print 'request start', g.__dict__
    g.current_user = 'USER'
    print 'request end', g.__dict__
    return 'hello'

if __name__ == '__main__':
    app.run(debug=True, port=6001)
When I run this (Flask 0.10.1) and navigate to http://localhost:6001/, here is what shows up in the console:
$ python app.py
* Running on http://127.0.0.1:6001/
* Restarting with reloader
before first request {}
after first request {'database': 'DATABASE'}
request start {'database': 'DATABASE'}
request end {'current_user': 'USER', 'database': 'DATABASE'}
127.0.0.1 - - [30/Sep/2013 11:36:40] "GET / HTTP/1.1" 200 -
request start {}
request end {'current_user': 'USER'}
127.0.0.1 - - [30/Sep/2013 11:36:41] "GET / HTTP/1.1" 200 -
That is, the first request is working as expected: flask.g is holding my database, and when the request starts, it also has my user's information.
However, upon my second request, flask.g is wiped clean! My database is nowhere to be found.
Now, I know that flask.g used to apply to the request only. But now that it is bound to the application (as of 0.10), I want to know how to bind variables to the entire application, rather than just a single request.
What am I missing?
edit: I'm specifically interested in MongoDB - and in my case, maintaining connections to multiple Mongo databases. Is my best bet to just create those connections in __init__.py and reuse those objects?
flask.g will only store things for the duration of a request. The documentation mentions that the values are stored on the application context rather than the request, but that is more of an implementation detail: it doesn't change the fact that objects in flask.g are only available in the same thread, and during the lifetime of a single request.
For example, in the official tutorial section on database connections, the connection is made once at the beginning of the request, then terminated at the end of the request.
Of course, if you really wanted to, you could create the database connection once, store it in __init__.py, and reference it (as a global variable) as needed. However, you shouldn't do this: the connection could close or timeout, and you could not use the connection in multiple threads.
Since you didn't specify HOW you will be using Mongo in Python, I assume you will be using PyMongo, since that handles all of the connection pooling for you.
In this case, you would do something like this...
from flask import Flask
from pymongo import MongoClient
# This line of code does NOT create a connection
client = MongoClient()
app = Flask(__name__)

# This can be in __init__.py, or some other file that has imported the "client" attribute
@app.route('/')
def index():
    posts = client.database.posts.find()
You could, if you wish, do something like this...
from flask import Flask, g
from pymongo import MongoClient
# This line of code does NOT create a connection
client = MongoClient()
app = Flask(__name__)

@app.before_request
def before_request():
    g.db = client.database

@app.route('/')
def index():
    posts = g.db.posts.find()
This really isn't all that different; however, it can be helpful for logic that you want to perform on every request, such as setting g.db to a specific database depending on the user that is logged in, as the sketch below illustrates.
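For instance, a hypothetical variant of the before_request hook above. The user_databases mapping and the username stored in Flask's cookie session are invented for illustration:

from flask import session

# hypothetical mapping of usernames to database names
user_databases = {'alice': 'alice_db', 'bob': 'bob_db'}

@app.before_request
def before_request():
    username = session.get('username')
    # fall back to a shared database when the user is unknown
    g.db = client[user_databases.get(username, 'database')]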
Finally, you can realize that most of the work of setting up PyMongo with Flask is probably done for you in Flask-PyMongo.
Your other question deals with how you keep track of stuff specific to the user that is logged in. Well, in this case, you DO need to store some data that sticks around between requests. flask.g is cleared at the end of the request, so that's no good.
What you want to use is sessions. This is a place where you can store values that is (with the default implementation) stored in a cookie on the user's browser. Since the cookie will be passed along with every request the user's browser makes to your web site, you will have available the data you put in the session.
Keep in mind, though, that the session is NOT stored on the server. It is turned into a string that is passed back and forth to the user. Therefore, you can't store things like DB connections onto it. You would instead store identifiers (like user IDs).
Making sure that user authentication works is VERY hard to get right. The security concerns that you need to make sure of are amazingly complex. I would strongly recommend using something like Flask-Login to handle this for you. You can still use the session for storing other items as needed, or you can let Flask-Login handle determining the user ID and store the values you need in the database and retrieving them from the database in every request.
So, in summary, there are a few different ways to do what you want to do. Each have their usages.
Globals are good for items that are thread-safe (such as the PyMongo's MongoClient).
flask.g can be used for storing data in the lifetime of a request. With SQLAlchemy-based Flask apps, a common thing to do is to ensure that all changes happen at once, at the end of a request, using an after_request method. Using flask.g for something like this is very helpful.
The Flask session can be used to store simple data (strings and numbers, not connection objects) that can be used on subsequent requests that come from the same user. This is entirely dependent on using cookies, so at any point the user could delete the cookie and everything in the "session" will be lost. Therefore, you probably want to store much of your data in databases, with the session used to identify the data that relates to the user in the session.
"bound to the application" does not mean what you think it means. It means that g is bound to the currently running request. Quoth the docs:
Flask provides you with a special object that ensures it is only valid for the active request and that will return different values for each request.
It should be noted that Flask's tutorials specifically do not persist database objects, but this is not normative for any application of substantial size. If you're really interested in diving down the rabbit hole, I suggest a database connection pooling tool (such as this one, mentioned in the SO answer ref'd above).
I suggest you use sessions to manage user information. Sessions help you keep information between multiple requests, and Flask provides you with a session framework already.
from flask import session
session['username'] = 'xyz'
Look at the extension Flask-Login. It is well designed to handle user authentications.
For database, I suggest looking at Flask-SQLAlchemy extension. This takes care of initialization, pooling, teardowns etc. for you out of the box. All you need to do is define the database URI in a config and bind it to the application.
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'
db = SQLAlchemy(app)
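Once that's bound, the extension manages a request-scoped session for you. A hypothetical model and view, just to show it in use:

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(80), unique=True)

@app.route('/users')
def list_users():
    # db.session (used implicitly by User.query) is scoped to the
    # request and cleaned up by the extension on teardown
    users = User.query.all()
    return ', '.join(u.username for u in users)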
I am currently working on a new web application that needs to execute an SQL statement before giving a session to the application itself.
In detail: I am running a PostgreSQL database server with multiple schemas and I need to execute a SET search_path statement before the application uses the session. I am also using the ZopeTransactionExtension to have transactions automatically handled at the request level.
To ensure the execution of the SQL statement, there seem to be two possible ways:
Executing the statement at the Engine/Connection level via SQLAlchemy events (from Multi-tenancy with SQLAlchemy)
Executing the statement at the session level (from SQLAlchemy support of Postgres Schemas)
Since I am using a scoped session and want to keep my transactions intact, I wonder which of these ways will possibly disturb transaction management.
For example, does the Engine hand out a new connection from the Pool on every query? Or is it attached to the session for its lifetime, i.e. until the request has been processed and the session & transaction are closed/committed?
On the other hand, since I am using a scoped session, can I perform it the way zzzeek suggested it in the second link? That is, is the context preserved and automatically reset once the transaction is over?
Is there possibly a third way that I am missing?
For example, does the Engine hand out a new connection from the Pool on every query?
Only if you have autocommit=True, which should not be the case.
Or is it attached to the session for its lifetime, i.e. until the request has been processed and the session & transaction are closed/committed?
It's attached per transaction. But the search_path in PostgreSQL is per PostgreSQL session (not to be confused with the SQLAlchemy session); it basically lasts for the lifespan of the connection itself.
The session (and the engine, and the pool) these days has a ton of event hooks you can grab onto in order to set up state like this. If you want to stick with the Session you can try after_begin.
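A minimal sketch of that approach. The schema name is a placeholder; after_begin receives the connection the session has just acquired for the new transaction:

from sqlalchemy import event, text
from sqlalchemy.orm import scoped_session, sessionmaker

session_factory = sessionmaker()
Session = scoped_session(session_factory)

@event.listens_for(session_factory, 'after_begin')
def set_search_path(session, transaction, connection):
    # runs once per transaction, before any application queries run
    connection.execute(text("SET search_path TO my_schema"))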
I have a Pylons-based web application which connects via Sqlalchemy (v0.5) to a Postgres database. For security, rather than follow the typical pattern of simple web apps (as seen in just about all tutorials), I'm not using a generic Postgres user (e.g. "webapp") but am requiring that users enter their own Postgres userid and password, and am using that to establish the connection. That means we get the full benefit of Postgres security.
Complicating things still further, there are two separate databases to connect to. Although they're currently in the same Postgres cluster, they need to be able to move to separate hosts at a later date.
We're using sqlalchemy's declarative package, though I can't see that this has any bearing on the matter.
Most examples of sqlalchemy show trivial approaches such as setting up the Metadata once, at application startup, with a generic database userid and password, which is used through the web application. This is usually done with Metadata.bind = create_engine(), sometimes even at module-level in the database model files.
My question is, how can we defer establishing the connections until the user has logged in, and then (of course) re-use those connections, or re-establish them using the same credentials, for each subsequent request.
We have this working -- we think -- but I'm not only not certain of the safety of it, I also think it looks incredibly heavy-weight for the situation.
Inside the __call__ method of the BaseController we retrieve the userid and password from the web session, call sqlalchemy create_engine() once for each database, then call a routine which calls Session.bind_mapper() repeatedly, once for each table that may be referenced on each of those connections, even though any given request usually references only one or two tables. It looks something like this:
# in lib/base.py on the BaseController class
def __call__(self, environ, start_response):
    # note: web session contains {'username': XXX, 'password': YYY}
    url1 = 'postgres://%(username)s:%(password)s@server1/finance' % session
    url2 = 'postgres://%(username)s:%(password)s@server2/staff' % session
    finance = create_engine(url1)
    staff = create_engine(url2)
    db_configure(staff, finance)  # see below
    ... etc

# in another file
Session = scoped_session(sessionmaker())

def db_configure(staff, finance):
    s = Session()
    from db.finance import Employee, Customer, Invoice
    for c in [Employee, Customer, Invoice]:
        s.bind_mapper(c, finance)
    from db.staff import Project, Hour
    for c in [Project, Hour]:
        s.bind_mapper(c, staff)
    s.close()  # prevents leaking connections between sessions?
So the create_engine() calls occur on every request... I can see that being needed, and the Connection Pool probably caches them and does things sensibly.
But calling Session.bind_mapper() once for each table, on every request? Seems like there has to be a better way.
Obviously, since a desire for strong security underlies all this, we don't want any chance that a connection established for a high-security user will inadvertently be used in a later request by a low-security user.
Binding global objects (mappers, metadata) to a user-specific connection is not a good way to go, and neither is using a scoped session. I suggest creating a new session for each request and configuring it to use user-specific connections. The following sample assumes that you use separate metadata objects for each database:
binds = {}

finance_engine = create_engine(url1)
binds.update(dict.fromkeys(finance_metadata.sorted_tables, finance_engine))
# The following line is required when mappings to joined tables are used (e.g.
# in joined-table inheritance) due to a bug (or misfeature) in SQLAlchemy 0.5.4.
# This issue might be fixed in newer versions.
binds.update(dict.fromkeys([Employee, Customer, Invoice], finance_engine))

staff_engine = create_engine(url2)
binds.update(dict.fromkeys(staff_metadata.sorted_tables, staff_engine))
# See the comment above.
binds.update(dict.fromkeys([Project, Hour], staff_engine))

session = sessionmaker(binds=binds)()
I would look at the connection pooling and see if you can't find a way to have one pool per user.
You can dispose() the pool when the user's session has expired, as in the sketch below.
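A rough sketch of that idea; the engines cache and the logout hook are hypothetical names, not part of any framework API:

from sqlalchemy import create_engine

engines = {}  # hypothetical per-user cache of engines (each with its own pool)

def get_engine_for(username, password):
    if username not in engines:
        url = 'postgres://%s:%s@server1/finance' % (username, password)
        engines[username] = create_engine(url)
    return engines[username]

def on_user_logout(username):
    # hypothetical hook: drop the user's pool when their session expires
    engine = engines.pop(username, None)
    if engine is not None:
        engine.dispose()  # closes all pooled connections for this user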