I'm building a small website and I already have all my models in SQLAlchemy. The website publishes results from calculations that are done offline. Only the results go into a slimmed-down database, i.e. it contains the results, not the raw data, but the website needs to query those results.
I'm going to use Flask, as my models are already driven with Python (and some heavy lifting in C++ via SWIG) and I don't want to use Django.
Now, this has been asked before, I'm sure, and the usual mantra, without much justification, is 'use Flask-SQLAlchemy'. The question is: why?
If I write some session handling myself, why do I have to go through the additional layer of redefining my database in Flask-SQLAlchemy? Other than having to write some code like this somewhere in my Flask app:
@app.before_request
def before_request():
    g.db = connect_db()

@app.teardown_request
def teardown_request(exception):
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()
What else do I need to worry about? SQLAlchemy even does connection pooling for me by default.
You are building a web application with Flask that does its database work through SQLAlchemy. When your application handles multiple requests, each dealing with a database session, you have to make sure that sessions are created and closed carefully.
If you read the SQLAlchemy docs, they recommend keeping the lifecycle of the session separate and external from the functions and objects that access and/or manipulate database data. This greatly helps with achieving a predictable and consistent transactional scope.
A web application is the easiest case because such an application is already constructed around a single, consistent scope - this is the request, which represents an incoming request from a browser, the processing of that request to formulate a response, and finally the delivery of that response back to the client. Integrating web applications with the Session is then the straightforward task of linking the scope of the Session to that of the request. The Session can be established as the request begins, or using a lazy initialization pattern which establishes one as soon as it is needed. The request then proceeds, with some system in place where application logic can access the current Session in a manner associated with how the actual request object is accessed. As the request ends, the Session is torn down as well, usually through the usage of event hooks provided by the web framework. The transaction used by the Session may also be committed at this point, or alternatively the application may opt for an explicit commit pattern, only committing for those requests where one is warranted, but still always tearing down the Session unconditionally at the end.
In layman's terms: SQLAlchemy recommends this because sessions in a web application should be scoped, meaning that each request handler creates and destroys its own session.
This is necessary because web servers can be multi-threaded, so multiple requests might be served at the same time, each working with a different database session.
This means that if you use plain SQLAlchemy with Flask, you have to handle sessions manually: create a scoped session and carefully remove it at the end of each request, otherwise you can get into serious trouble. That adds an extra layer of complexity to your web application.
This is where Flask-SQLAlchemy (an extension of the SQLAlchemy library for Flask apps) comes in: it provides the infrastructure to assist in aligning the lifespan of a Session with that of each web request. In fact, the SQLAlchemy docs themselves recommend using it with Flask.
Flask-SQLAlchemy creates a fresh scoped session for each request. If you dig into its source, you will find that it also installs a hook on app.teardown_appcontext (for Flask >= 0.9), app.teardown_request (for Flask 0.7-0.8), or app.after_request (for Flask < 0.7), and that hook is where it calls db.session.remove().
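In practice, that means you write only the setup and then use db.session freely; the extension's hook removes the session for you when each request's context tears down. A minimal sketch (the database URI is a placeholder):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import text

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'  # placeholder URI
db = SQLAlchemy(app)

@app.route('/ping')
def ping():
    # db.session is a scoped session; no manual teardown code is needed,
    # because the extension's hook calls db.session.remove() after the request
    db.session.execute(text('SELECT 1'))
    return 'ok'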
The code you put in the question is not actually valid for SQLAlchemy integration in Flask. I know it is just an example, but I'm pointing that out just in case.
For plain SQLAlchemy integration, all you need to do is make sure the current DbSession is cleaned up at the end of each request, via something like this:
@app.teardown_appcontext
def shutdown_session(exception=None):
    DbSession.remove()
where DbSession is a scoped session.
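For completeness, here is a minimal sketch of how such a DbSession might be defined (the connection URL is a placeholder):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('postgresql://user:password@localhost/mydb')  # placeholder URL
# scoped_session hands each thread, and hence each request, its own Session
DbSession = scoped_session(sessionmaker(bind=engine))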
Here is the documentation for the case where you don't want to use the Flask-SQLAlchemy package.
Related
Suppose I have a web request handler in Python which processes some complex logic using MySQL queries. I wrap the request in some readable methods, for example:
START TRANSACTION
get_some_users_in_range("select users where id>1 and id<24")
get_user("select users where id=10")
get_user("select users where id=10")
get_user("select users where id=12")
END TRANSACTION
All I want is a smart caching layer at the application level which understands that, within the context of the transaction, there is no need to hit the DB after the first query (because all the needed rows were already fetched by it). Are there solutions for this problem in modern Python (async preferable)?
P.S. A raw SQL library is preferred (not an ORM).
You can wrap your function with functools.lru_cache: https://docs.python.org/3/library/functools.html#functools.lru_cache
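A minimal sketch of that approach (db_execute is a hypothetical stand-in for a call through your raw SQL driver; lru_cache keys on the arguments, so the repeated get_user(10) calls hit the DB only once):

from functools import lru_cache

def db_execute(query, params):
    # Hypothetical stand-in for a real call through a raw SQL driver
    print('hitting the database:', query, params)
    return {'id': params[0], 'name': 'someone'}

@lru_cache(maxsize=128)
def get_user(user_id):
    return db_execute('SELECT * FROM users WHERE id = %s', (user_id,))

get_user(10)            # hits the database
get_user(10)            # served from the cache, no DB round-trip
get_user.cache_clear()  # clear when the transaction ends so results don't go stale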
Here is the async version of the same functionality: https://github.com/aio-libs/async-lru
An excellent library with more caching strategies (check their readme; they also have async support via a separate lib): https://github.com/tkem/cachetools
But keep in mind that such caching will only work within the scope of a single process. If you want the cache to be shared between processes/instances, consider an external cache service such as Redis.
Background:
This is the situation I am facing and so far my current solution seems rather clunky. I want to improve on it. Right now:
I set up connections to each database in the main function of the Pyramid application:
def main(global_config, **settings):
    a_engine = engine_from_config(settings, 'A.')
    b_engine = engine_from_config(settings, 'B.')
    ASession.configure(bind=a_engine)
    BSession.configure(bind=b_engine)
"ASession" and "BSession" are simply globally defined scoped_session in /models/init.py.
ASession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
I define the model base class like so. For example:

from sqlalchemy import Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base

ABase = declarative_base()

class User(ABase):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    name = Column(String)
This somehow already doesn't feel very clean. But now that this model is supposed to be accessed from a different application, I also need to define the engine and connection again in that application. This feels extremely redundant.
Problem Abstracted:
Assume that there are 2 different databases:
A and B
Also assume that you want A and B to be accessible from 2 different applications (e.g.: Pyramid application, Bokeh Server App which uses Tornado) using the same model.
In short, how would one best pattern objects/models/classes/functions to produce clean, non-redundant code in Python 3?
Initial Thought After The Question Was Posted:
Thinking about this a bit more, I think I want each model to be somehow "self-contained". The model should bring with it methods for initiating connections. In other words, the initiation of db connections should be decoupled from the web application itself.
And it should be done in an instance kind of manner, so that multiple applications can use the same models, each application having its own session/connection to either DB.
How would the community pattern this? Friday afternoons don't lend themselves to finding answers to these kinds of questions, for me at least.
I have done this. My recommendation below is how I like doing it, but is not the only way. I would ditch scoped sessions and the transaction manager and make explicit session management objects, with request lifecycle callbacks handling creation, closing, committing, or rolling back your sessions. Basically scoped sessions are a way to simulate a global by getting the same item for that thread of execution. The other way to do this in Pyramid is to attach things to the registry and the request, because you have those everywhere. You attach shared components to the registry (the ZCA) and per-request objects to the request.
When you have multiple sessions, I've found it much easier to reason about them and keep track of them if they are handled by components that wrap up everything for that engine. So for a case like the one you describe, I've made two different DB engine components, which are created on start-up, attached to the registry, and have a method for getting a fresh session. If you create these components properly, they should be usable in any application, whether it's Pyramid, Tornado, or your test script. You just make sure each has a constructor with some sane way of passing in settings for setting up the engine, whether it's a settings dict or kwargs. I then make my data model(s) live in their own Python packages, and it's easy to have any app in the family import the model, instantiate the engine components, and go to town. Note that if you like using the ZCA registry (and I love it, it's a fantastic DI system), there's nothing preventing you from using it in non-Pyramid apps; you just set it up manually in your server start-up code.
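A minimal sketch of such an engine component (the class name and constructor signature are my own assumptions, not a fixed pattern):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

class DBEngineComponent:
    # Wraps one engine and hands out fresh sessions (hypothetical helper)
    def __init__(self, url, **engine_kwargs):
        self.engine = create_engine(url, **engine_kwargs)
        self._session_factory = sessionmaker(bind=self.engine)

    def make_session(self):
        # A brand-new Session; the caller owns its lifecycle
        return self._session_factory()

# At start-up, in any app (Pyramid, Tornado, a test script):
finance_db = DBEngineComponent('postgresql://user:password@server1/finance')  # placeholder URL
staff_db = DBEngineComponent('postgresql://user:password@server2/staff')      # placeholder URL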
In Pyramid specifically, I make a custom Request class and use the reify decorator to allow other Pyramid code to get the session(s) for that request. The request class has end-of-life callbacks attached to close out the sessions and to do rollbacks or commits. There is a bit more boilerplate, but for me it's cleaner in that I can very easily trace where and when, in code and in time, my session management is happening. It's also a good approach for testing.
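A rough sketch of that request class (the attribute names and registry wiring are my own illustration, assuming engine components like the ones above have been attached to the registry at startup):

from pyramid.decorator import reify
from pyramid.request import Request

class MyRequest(Request):
    @reify
    def finance_session(self):
        # Created lazily, at most once per request
        session = self.registry.finance_db.make_session()  # hypothetical registry attribute
        # End-of-life callback: runs after the response has been sent
        self.add_finished_callback(lambda request: session.close())
        return session

# In main(): config.set_request_factory(MyRequest)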
That said, there are lots of smart folks in SQLAlchemy/Pyramid land who swear by scoped sessions and the transaction manager, so there are other valid approaches. Hope that helps.
I am writing an application which connects to a database. I want to create that db connection once, and then reuse that connection throughout the life of the application.
I also want to authenticate users. A user's auth will live for only the life of a request.
How can I differentiate between objects stored for the life of a flask app, versus specific to the request? Where would I store them so that all modules (and subsequent blueprints) have access to them?
Here is my sample app:
from flask import Flask, g

app = Flask(__name__)

@app.before_first_request
def setup_database(*args, **kwargs):
    print 'before first request', g.__dict__
    g.database = 'DATABASE'
    print 'after first request', g.__dict__

@app.route('/')
def index():
    print 'request start', g.__dict__
    g.current_user = 'USER'
    print 'request end', g.__dict__
    return 'hello'

if __name__ == '__main__':
    app.run(debug=True, port=6001)
When I run this (Flask 0.10.1) and navigate to http://localhost:6001/, here is what shows up in the console:
$ python app.py
* Running on http://127.0.0.1:6001/
* Restarting with reloader
before first request {}
after first request {'database': 'DATABASE'}
request start {'database': 'DATABASE'}
request end {'current_user': 'USER', 'database': 'DATABASE'}
127.0.0.1 - - [30/Sep/2013 11:36:40] "GET / HTTP/1.1" 200 -
request start {}
request end {'current_user': 'USER'}
127.0.0.1 - - [30/Sep/2013 11:36:41] "GET / HTTP/1.1" 200 -
That is, the first request is working as expected: flask.g is holding my database, and when the request starts, it also has my user's information.
However, upon my second request, flask.g is wiped clean! My database is nowhere to be found.
Now, I know that flask.g used to apply to the request only. But now that it is bound to the application (as of 0.10), I want to know how to bind variables to the entire application, rather than just a single request.
What am I missing?
edit: I'm specifically interested in MongoDB - and in my case, maintaining connections to multiple Mongo databases. Is my best bet to just create those connections in __init__.py and reuse those objects?
flask.g will only store things for the duration of a request. The documentation mentions that the values are stored on the application context rather than the request, but that is more of an implementation detail: it doesn't change the fact that objects in flask.g are only available in the same thread, and during the lifetime of a single request.
For example, in the official tutorial section on database connections, the connection is made once at the beginning of the request, then terminated at the end of the request.
Of course, if you really wanted to, you could create the database connection once, store it in __init__.py, and reference it (as a global variable) as needed. However, you shouldn't do this: the connection could close or time out, and you cannot use one connection across multiple threads.
Since you didn't specify HOW you will be using Mongo in Python, I assume you will be using PyMongo, since that handles all of the connection pooling for you.
In this case, you would do something like this...
from flask import Flask
from pymongo import MongoClient

# This line of code does NOT create a connection
client = MongoClient()

app = Flask(__name__)

# This can be in __init__.py, or some other file that has imported the "client" attribute
@app.route('/')
def index():
    posts = client.database.posts.find()
    return str(list(posts))
You could, if you wish, do something like this...
from flask import Flask, g
from pymongo import MongoClient

# This line of code does NOT create a connection
client = MongoClient()

app = Flask(__name__)

@app.before_request
def before_request():
    g.db = client.database

@app.route('/')
def index():
    posts = g.db.posts.find()
    return str(list(posts))
This really isn't all that different; however, it can be helpful for logic that you want to perform on every request (such as setting g.db to a specific database depending on the user that is logged in).
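For example, a sketch of that per-user selection (the session key and database naming scheme are made up for illustration):

from flask import Flask, g, session
from pymongo import MongoClient

client = MongoClient()
app = Flask(__name__)
app.secret_key = 'change-me'  # required for session support

@app.before_request
def before_request():
    # Hypothetical: pick a tenant-specific database for the logged-in user
    tenant = session.get('tenant', 'default')
    g.db = client['app_%s' % tenant]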
Finally, you can realize that most of the work of setting up PyMongo with Flask is probably done for you in Flask-PyMongo.
Your other question deals with how you keep track of stuff specific to the logged-in user. Well, in this case, you DO need to store some data that sticks around between requests. flask.g is cleared at the end of the request, so that's no good.
What you want to use is sessions. This is a place where you can store values that are (with the default implementation) stored in a cookie in the user's browser. Since the cookie is passed along with every request the user's browser makes to your web site, you will have the data you put in the session available.
Keep in mind, though, that the session is NOT stored on the server. It is turned into a string that is passed back and forth to the user. Therefore, you can't store things like DB connections onto it. You would instead store identifiers (like user IDs).
Making sure that user authentication works is VERY hard to get right. The security concerns that you need to make sure of are amazingly complex. I would strongly recommend using something like Flask-Login to handle this for you. You can still use the session for storing other items as needed, or you can let Flask-Login handle determining the user ID and store the values you need in the database and retrieving them from the database in every request.
So, in summary, there are a few different ways to do what you want to do. Each has its uses.
Globals are good for items that are thread-safe (such as PyMongo's MongoClient).
flask.g can be used for storing data during the lifetime of a request. With SQLAlchemy-based Flask apps, a common thing to do is to ensure that all changes happen at once, at the end of a request, using an after_request method (see the sketch after this list). Using flask.g for something like this is very helpful.
The Flask session can be used to store simple data (strings and numbers, not connection objects) that can be used on subsequent requests that come from the same user. This is entirely dependent on using cookies, so at any point the user could delete the cookie and everything in the "session" will be lost. Therefore, you probably want to store much of your data in databases, with the session used to identify the data that relates to the user in the session.
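As a sketch of that commit-everything-at-the-end pattern (I'm using teardown_request rather than after_request here, so the cleanup also runs when a request raises; the engine URL is a placeholder):

from flask import Flask, g
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

app = Flask(__name__)
engine = create_engine('sqlite:////tmp/app.db')  # placeholder URL
Session = sessionmaker(bind=engine)

@app.before_request
def open_session():
    g.db_session = Session()

@app.teardown_request
def close_session(exception=None):
    session = getattr(g, 'db_session', None)
    if session is None:
        return
    if exception is None:
        session.commit()  # all changes land at once, at the end of the request
    else:
        session.rollback()
    session.close()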
"bound to the application" does not mean what you think it means. It means that g is bound to the currently running request. Quoth the docs:
Flask provides you with a special object that ensures it is only valid for the active request and that will return different values for each request.
It should be noted that Flask's tutorials specifically do not persist database objects, but this is not normative for any application of substantial size. If you're really interested in diving down the rabbit hole, I suggest a database connection pooling tool, such as the one mentioned in the SO answer referenced above.
I suggest you use sessions to manage user information. Sessions help you keep information between multiple requests, and Flask already provides a session framework.
from flask import session
session['username'] = 'xyz'
Look at the extension Flask-Login. It is well designed to handle user authentications.
For the database, I suggest looking at the Flask-SQLAlchemy extension. It takes care of initialization, pooling, teardowns, etc. for you out of the box. All you need to do is define the database URI in the config and bind the extension to the application:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:////tmp/test.db'
db = SQLAlchemy(app)
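From there, models and queries go through the db object. A minimal sketch (the model is a placeholder):

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80))

@app.route('/users')
def list_users():
    # User.query uses a session that Flask-SQLAlchemy scopes per request
    # and removes automatically at the end of it
    return ', '.join(u.name for u in User.query.all())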
Scenario:
Major web app w. Python+Flask
Flask-Login and Flask.session for basic session variables (user-id and session-id)
Flask.session and limitations? (Cookies)
Cookie-based, and the data basically persists only on the client side.
For some session variables that will be read regularly (i.e. user permissions, custom application config), it feels awkward to carry all that info around in a cookie with every single page request and response.
Database is too much?
Since the session can be identified on the server side by introducing a unique session id at login, some server-side session variable management can be used. But reading this data from a database on the server side also feels like unnecessary overhead.
Question
What is the most efficient way to handle the session variables at the server side?
Perhaps a memory-based solution would work, but I am worried that different Flask requests could be executed on different threads that would not share the memory-stored session data, or that would conflict in the case of simultaneous reads and writes.
I am looking for advice and best practice for planning the basic level architecture.
Flask-Caching
What you need is a server-side caching package, and that's Flask-Caching.
A simple setup:
from flask import Flask
from flask_caching import Cache
app = Flask(__name__)
app.config['CACHE_TYPE'] = 'SimpleCache'
cache = Cache(app)
Then an explicit use of a cached variable:
@app.route('/')
def load():
    cache.set("foo", "bar")   # store a value server-side
    value = cache.get("foo")  # read it back
    return value
There is much more in Flask-Caching, and it's the approach recommended by the Flask documentation.
In the case of a multi-worker server such as gunicorn, SimpleCache lives in each worker process's own memory, so you are better off using app.config['CACHE_TYPE'] = 'FileSystemCache'.
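A sketch of that configuration (the cache directory is a placeholder):

app.config['CACHE_TYPE'] = 'FileSystemCache'
app.config['CACHE_DIR'] = '/tmp/flask_cache'  # placeholder; shared by all workers on this host
cache = Cache(app)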
Your instinct is correct, it's probably not the way to do it.
Session data should only be ephemeral information that is not too troublesome to lose and recreate. For example, the user will just have to login again to restore it.
Configuration data or anything else that's necessary on the server and that must survive a logout is not part of the session and should be stored in a DB.
Now, if you really need to easily keep this information client-side and it's not too much of a problem if it's lost, then use a session cookie for logged in/out state and a permanent cookie with a long lifespan for the rest of the configuration information.
If the information is too much size-wise, then the only option I can think of is to store the data, other than the logged in/out state, in a DB.
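In Flask terms, the split suggested above might look like this (a sketch; the cookie name, value, and lifetime are arbitrary):

from datetime import timedelta
from flask import Flask, make_response, session

app = Flask(__name__)
app.secret_key = 'change-me'

@app.route('/login')
def login():
    session['logged_in'] = True  # session cookie: gone when the browser closes
    resp = make_response('ok')
    # Long-lived cookie for the non-sensitive configuration (hypothetical example)
    resp.set_cookie('ui_theme', 'dark', max_age=int(timedelta(days=365).total_seconds()))
    return resp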
I have a Pylons-based web application which connects via Sqlalchemy (v0.5) to a Postgres database. For security, rather than follow the typical pattern of simple web apps (as seen in just about all tutorials), I'm not using a generic Postgres user (e.g. "webapp") but am requiring that users enter their own Postgres userid and password, and am using that to establish the connection. That means we get the full benefit of Postgres security.
Complicating things still further, there are two separate databases to connect to. Although they're currently in the same Postgres cluster, they need to be able to move to separate hosts at a later date.
We're using sqlalchemy's declarative package, though I can't see that this has any bearing on the matter.
Most examples of sqlalchemy show trivial approaches such as setting up the Metadata once, at application startup, with a generic database userid and password, which is used through the web application. This is usually done with Metadata.bind = create_engine(), sometimes even at module-level in the database model files.
My question is: how can we defer establishing the connections until the user has logged in, and then (of course) re-use those connections, or re-establish them using the same credentials, for each subsequent request?
We have this working -- we think -- but I'm not only not certain of the safety of it, I also think it looks incredibly heavy-weight for the situation.
Inside the __call__ method of the BaseController we retrieve the userid and password from the web session, call sqlalchemy create_engine() once for each database, then call a routine which calls Session.bind_mapper() repeatedly, once for each table that may be referenced on each of those connections, even though any given request usually references only one or two tables. It looks something like this:
# in lib/base.py on the BaseController class
def __call__(self, environ, start_response):
    # note: web session contains {'username': XXX, 'password': YYY}
    url1 = 'postgres://%(username)s:%(password)s@server1/finance' % session
    url2 = 'postgres://%(username)s:%(password)s@server2/staff' % session
    finance = create_engine(url1)
    staff = create_engine(url2)
    db_configure(staff, finance)  # see below
    # ... etc

# in another file
Session = scoped_session(sessionmaker())

def db_configure(staff, finance):
    s = Session()
    from db.finance import Employee, Customer, Invoice
    for c in [Employee, Customer, Invoice]:
        s.bind_mapper(c, finance)
    from db.staff import Project, Hour
    for c in [Project, Hour]:
        s.bind_mapper(c, staff)
    s.close()  # prevents leaking connections between sessions?
So the create_engine() calls occur on every request... I can see that being needed, and the Connection Pool probably caches them and does things sensibly.
But calling Session.bind_mapper() once for each table, on every request? Seems like there has to be a better way.
Obviously, since a desire for strong security underlies all this, we don't want any chance that a connection established for a high-security user will inadvertently be used in a later request by a low-security user.
Binding global objects (mappers, metadata) to a user-specific connection is not a good way to do this, and neither is using a scoped session. I suggest creating a new session for each request and configuring it to use user-specific connections. The following sample assumes that you use separate metadata objects for each database:
binds = {}
finance_engine = create_engine(url1)
binds.update(dict.fromkeys(finance_metadata.sorted_tables, finance_engine))
# The following lines are required when mappings to joined tables are used
# (e.g. in joined-table inheritance) due to a bug (or misfeature) in
# SQLAlchemy 0.5.4. This issue might be fixed in newer versions.
binds.update(dict.fromkeys([Employee, Customer, Invoice], finance_engine))
staff_engine = create_engine(url2)
binds.update(dict.fromkeys(staff_metadata.sorted_tables, staff_engine))
# See comment above.
binds.update(dict.fromkeys([Project, Hour], staff_engine))
session = sessionmaker(binds=binds)()
I would look at the connection pooling and see if you can't find a way to have one pool per user.
You can call dispose() on the pool when the user's session has expired.
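A sketch of that idea (the in-memory registry and the expiry hook are my own illustration):

from sqlalchemy import create_engine

# One engine, and hence one connection pool, per logged-in user (hypothetical registry)
user_engines = {}

def engine_for_user(username, password):
    if username not in user_engines:
        url = 'postgres://%s:%s@server1/finance' % (username, password)
        user_engines[username] = create_engine(url)
    return user_engines[username]

def on_session_expired(username):
    engine = user_engines.pop(username, None)
    if engine is not None:
        engine.dispose()  # closes all pooled connections for this user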