My question is about the recommended approach for handling database connections when using Flask in a production environment or other environment where performance is a concern. In Flask, the g object is available for storing things, and open database connections can be placed there to allow the application to reuse them in subsequent database queries during the same request. However, the g object doesn't persist across requests, so it seems that each new request would require a new database connection (and the performance hit that entails).
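For concreteness, the per-request pattern I'm describing looks roughly like this, where get_connection() is just a placeholder for whatever call your driver uses to open a connection:

from flask import g

def get_db():
    # g only lives for the duration of the current request, so the first
    # call in each request opens a brand new connection.
    if 'db' not in g:
        g.db = get_connection()  # placeholder for your driver's connect call
    return g.db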
The most related question I found on this matter is this one: How to preserve database connection in a python web server but the answers only raise the abstract idea of connection pooling (without tying it to how one might use it within Flask and how it would survive across requests) or propose a solution that is only relevant to one particular type of database or particular stack.
So my question is about the general approach that should be taken when productionising apps built on Flask that connect to any kind of database. It seems like something involving connection pooling is in the right direction, especially since that would work for a traditional Python application. But I'm wondering what the recommended approach is when using Flask, because of the aforementioned issue of persistence across requests, and also because Flask apps in production are run from WSGI servers, which potentially adds further complications.
EDIT: Based on the comments recommending Flask-SQLAlchemy: assuming Flask-SQLAlchemy solves the problem, does it also work for Neo4j, or indeed any arbitrary database that the Flask app uses? Many of the existing database connectors already support pooling natively, so why introduce an additional dependency whose primary purpose is to provide ORM capabilities rather than connection management? Also, how does SQLAlchemy get around the fundamental problem of persistence across requests?
Turns out there is a straightforward way to achieve what I was after, but as the commenters suggested, if it is at all possible to go the Flask-SQLAlchemy route then you might want to go that way. My approach to solving the problem is to save the connection object in a module-level variable that is then imported as necessary. That way it will be available for use from within Flask and by other modules. Here is a simplified version of what I did:
app.py
from flask import Flask
from extensions import neo4j
app = Flask(__name__)
neo4j.init_app(app)
extensions.py
from neo4j_db import Neo4j
neo4j = Neo4j()
neo4j_db.py
from neo4j import GraphDatabase


class Neo4j:
    def __init__(self):
        self.app = None
        self.driver = None

    def init_app(self, app):
        self.app = app
        self.connect()

    def connect(self):
        self.driver = GraphDatabase.driver('bolt://xxx')
        return self.driver

    def get_db(self):
        if not self.driver:
            return self.connect()
        return self.driver
example.py
from extensions import neo4j
driver = neo4j.get_db()
And from here driver will contain the database driver that will persist across Flask requests.
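As a hypothetical illustration of consuming it inside a view, something like the following would reuse that same driver on every request (the route, Cypher query and field names are made up):

from flask import jsonify
from app import app
from extensions import neo4j

@app.route('/people')
def people():
    # get_db() hands back the module-level driver created in init_app()
    with neo4j.get_db().session() as session:
        result = session.run('MATCH (p:Person) RETURN p.name AS name')
        names = [record['name'] for record in result]
    return jsonify(names)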
Hope that helps anyone that has the same issue.
Related
I’ve got a system that runs a PostgreSQL database, and uses Flask with SQLAlchemy to interact with it. A number of the queries are failing, and because SQLAlchemy automatically starts transactions behind the scenes, we’re getting a lot of PostgreSQL processes locked up in idle in transaction (aborted) status.
Now, I know that the proper thing to do is to fix those queries and improve the exception handling, but that’s easier said than done. It’s in the pipeline, but in the meantime I need a quick mitigation, and turning on autocommit looks like it will fix 99% of the current operational issues. Less critical, but I’d also like to avoid spurious transactions for queries that don’t alter database state - this system is an API and most of the calls are SELECTs. I’m quite happy managing my own transactions and telling the database when I need them. Most of the time for this application, they aren’t necessary and I don’t want the extra overhead.
I’ve read https://www.oddbird.net/2014/06/14/sqlalchemy-postgres-autocommit/ and Autocommit in Flask-SQLAlchemy. Unless I’m very much mistaken, the solution given in that Stack Overflow thread turns on what Carl Meyer described as “emulated autocommit”. This system is under very high load and if at all possible I’d like to avoid the extra overhead of requiring a commit() after every SELECT (the implicit requirement of which doesn’t seem like a sensible decision on the part of SQLAlchemy, but I digress).
Here's a minimal snippet of the database initialisation in the Flask app:
from flask_sqlalchemy import SQLAlchemy
from flask import Flask
#db = SQLAlchemy( engine_options = { 'isolation_level': 'AUTOCOMMIT' } )
# line above fails with “TypeError: __init__() got an unexpected keyword argument 'engine_options'”
db = SQLAlchemy( session_options = { 'autocommit': True } )
# this version works, but I think it’s emulated autocommit rather than true (default) PostgreSQL autocommit
def register_extensions(app):
    db.init_app(app)
    with app.app_context():
        db.create_all()


def create_app():
    app = Flask(__name__.split('.')[0])
    app.config['SQLALCHEMY_DATABASE_URI'] = 'postgresql+psycopg2://localhost/etc'
    app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = False
    app.config['SQLALCHEMY_ENGINE_OPTIONS'] = { 'isolation_level': 'AUTOCOMMIT' }
    # line above has no effect, gets ignored
    register_extensions(app)
    return app


app = create_app()
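For reference, engine-level ("true") autocommit in plain SQLAlchemy would look roughly like this, with the same placeholder URI as above:

from sqlalchemy import create_engine

# isolation_level='AUTOCOMMIT' asks the psycopg2 dialect for real autocommit,
# as opposed to SQLAlchemy's emulated autocommit at the session level.
engine = create_engine('postgresql+psycopg2://localhost/etc',
                       isolation_level='AUTOCOMMIT')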
Not sure whether any of this is version specific, but for what it’s worth we’re running on a vanilla Ubuntu 20.04 stack: python3-flask-sqlalchemy 2.1, python3-sqlalchemy 1.3.12, python 3.8.2, postgresql 12.8. Flask is served by Apache 2.4.41 using mod_wsgi.
If I still had hair I would be tearing it out at this point. Is what I’m trying to do possible, and if so, how?
EDIT: According to the Flask SQLAlchemy documentation at https://flask-sqlalchemy.palletsprojects.com/en/2.x/api/#flask_sqlalchemy.SQLAlchemy, engine_options is one of the expected arguments for the SQLAlchemy constructor, so I'm not sure why it's failing in the example above.
EDIT 2: I just spotted that engine_options wasn’t added until Flask SQLAlchemy 2.4, whereas we’re on version 2.1, so that explains why that’s failing. The question remains, is there another way of passing in the option? Because various other parts of the code need access to the db object, I don’t think I can call create_engine() directly.
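One workaround I've seen suggested, sketched here on the assumption that Flask-SQLAlchemy 2.1 still exposes the apply_driver_hacks() hook (worth verifying against the installed source), is to subclass SQLAlchemy and inject the engine option there:

from flask_sqlalchemy import SQLAlchemy

class AutocommitSQLAlchemy(SQLAlchemy):
    def apply_driver_hacks(self, app, info, options):
        # 'options' is the keyword dict that eventually reaches create_engine()
        options.setdefault('isolation_level', 'AUTOCOMMIT')
        super(AutocommitSQLAlchemy, self).apply_driver_hacks(app, info, options)

db = AutocommitSQLAlchemy()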
I'm using peewee with my Tornado web app. When I read peewee's documentation, I found:
Adding Request Hooks
When building web-applications, it is very important that you manage your database connections correctly. In this section I will describe how to add hooks to your web app to ensure the database connection is handled properly.
These steps will ensure that regardless of whether you’re using a simple SQLite database, or a pool of multiple Postgres connections, peewee will handle the connections correctly.
http://docs.peewee-orm.com/en/latest/peewee/database.html
Inside, it explains how to do this for Flask, Django, Bottle and others, but there is no solution given for Tornado.
Is there an easy way to solve this problem in Tornado, or does it not matter at all?
The idea there is that you want to open a connection when a request begins, and close it when the request is finished (the response is returned).
To do this it looks like you can subclass RequestHandler:
from peewee import SqliteDatabase
from tornado.web import RequestHandler

db = SqliteDatabase('my_db.db')


class PeeweeRequestHandler(RequestHandler):
    def prepare(self):
        # Open the database connection when the request begins.
        db.connect()
        return super(PeeweeRequestHandler, self).prepare()

    def on_finish(self):
        # Close it once the response has been sent.
        if not db.is_closed():
            db.close()
        return super(PeeweeRequestHandler, self).on_finish()
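A minimal usage sketch might look like this; the handler, route and port are purely illustrative:

import tornado.ioloop
import tornado.web

class MainHandler(PeeweeRequestHandler):
    def get(self):
        # The connection opened in prepare() is available for queries here.
        self.write("hello")

if __name__ == "__main__":
    app = tornado.web.Application([(r"/", MainHandler)])
    app.listen(8888)
    tornado.ioloop.IOLoop.current().start()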
I'm building a small website and I already have all my models in SQLAlchemy. The website is to publish some information from some calculations which are done offline. Only the results will be published to a slimmed down database i.e. it contains the results, not the raw data, but the website needs to query the results.
I'm going to use Flask, as my models are already driven with Python (and some heavy lifting in C++ via SWIG) and I don't want to use Django.
Now this has been asked before I'm sure and the usual mantra without much justification is to 'use Flask-SQLAlchemy'. The question is why?
If I write some session handling myself, why do I have to go through the additional layer of redefining my database in Flask-SQLAlchemy? Other than having to write some code like this somewhere in my Flask app:
@app.before_request
def before_request():
    g.db = connect_db()


@app.teardown_request
def teardown_request(exception):
    db = getattr(g, 'db', None)
    if db is not None:
        db.close()
What else do I need to worry about? SQLAlchemy even does connection pooling for me by default.
Actually, you are building a web application using Flask which does its database work through SQLAlchemy. So, when you are dealing with database sessions across the multiple requests your application handles, you have to make sure that you create and close sessions carefully.
If you read the SQLAlchemy docs, they recommend keeping the lifecycle of the session separate and external from functions and objects that access and/or manipulate database data. This greatly helps with achieving a predictable and consistent transactional scope.
A web application is the easiest case because such an application is already constructed around a single, consistent scope - this is the request, which represents an incoming request from a browser, the processing of that request to formulate a response, and finally the delivery of that response back to the client. Integrating web applications with the Session is then the straightforward task of linking the scope of the Session to that of the request. The Session can be established as the request begins, or using a lazy initialization pattern which establishes one as soon as it is needed. The request then proceeds, with some system in place where application logic can access the current Session in a manner associated with how the actual request object is accessed. As the request ends, the Session is torn down as well, usually through the usage of event hooks provided by the web framework. The transaction used by the Session may also be committed at this point, or alternatively the application may opt for an explicit commit pattern, only committing for those requests where one is warranted, but still always tearing down the Session unconditionally at the end.
In layman's terms, I mean to say that:
SQLAlchemy gives the above advice because sessions in a web application should be scoped, meaning that each request handler creates and destroys its own session.
This is necessary because web servers can be multi-threaded, so multiple requests might be served at the same time, each working with a different database session.
This means that if you use SQLAlchemy with Flask directly, you have to handle sessions manually: create a scoped session and carefully remove it at the end of each request, otherwise you can get into serious trouble. That adds an extra layer of complexity to your web application.
But there is Flask-SQLAlchemy (an extension of the SQLAlchemy library for Flask apps), which provides the infrastructure to align the lifespan of a Session with that of each web request. In fact, the SQLAlchemy docs themselves recommend using it with Flask.
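A minimal Flask-SQLAlchemy setup, with a made-up model and URI purely for illustration, looks roughly like this:

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///example.db'
db = SQLAlchemy(app)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(80))

@app.route('/users')
def users():
    # db.session is a scoped session tied to the current request/app context
    return ', '.join(u.name for u in User.query.all())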
Flask-SQLAlchemy creates a fresh scoped session for each request. If you dig further you will find that it also installs a hook on app.teardown_appcontext (for Flask >= 0.9), app.teardown_request (for Flask 0.7-0.8) or app.after_request (for Flask < 0.7), and that is where it calls db.session.remove().
The code you put in the question is not actually valid for SQLAlchemy integration in Flask. I know it is just an example, but I am saying that just in case.
For SQLAlchemy integration, all you need to do is make sure the current DbSession is cleaned up at the end of each request via something like this:
@app.teardown_appcontext
def shutdown_session(exception=None):
    DbSession.remove()
where DbSession is your scoped session.
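For completeness, DbSession could be constructed with plain SQLAlchemy along these lines (the URI is a placeholder):

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('postgresql+psycopg2://localhost/mydb')
DbSession = scoped_session(sessionmaker(bind=engine))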
Here is the documentation for the case when you don't want to use the Flask-SQLAlchemy package.
I've built a small python REST service using Flask, with Flask-SQLAlchemy used for talking to the MySQL DB.
If I connect directly to the MySQL server everything is good, no problems at all. If I use HAproxy (handles HA/failover, though in this dev environment there is only one DB server) then I constantly get MySQL server has gone away errors if the application doesn't talk to the DB frequently enough.
My HAproxy client timeout is set to 50 seconds, so what I think is happening is it cuts the stream, but the application isn't aware and tries to make use of an invalid connection.
Is there a setting I should be using when using services like HAproxy?
Also it doesn't seem to reconnect automatically, but if I issue a request manually I get Can't reconnect until invalid transaction is rolled back, which is odd since it is just a select() call I'm making, so I don't think it is a commit() I'm missing - or should I be calling commit() after every ORM based query?
Just to tidy up this question with an answer I'll post what I (think I) did to solve the issues.
Problem 1: HAproxy
Either increase the HAproxy client timeout value (globally, or in the frontend definition) to a value longer than what MySQL is set to reset on (see this interesting and related SF question)
Or set SQLALCHEMY_POOL_RECYCLE = 30 (30 in my case was less than HAproxy client timeout) in Flask's app.config so that when the DB is initialised it will pull in those settings and recycle connections before HAproxy cuts them itself. Similar to this issue on SO.
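For the second option, the relevant config amounts to a single line; the value just needs to be lower than the HAproxy client timeout:

# Recycle pooled connections before HAproxy's 50 second client timeout cuts them.
app.config['SQLALCHEMY_POOL_RECYCLE'] = 30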
Problem 2: Can't reconnect until invalid transaction is rolled back
I believe I fixed this by tweaking the way the DB is initialised and imported across various modules. I basically now have a module that simply has:
from flask.ext.sqlalchemy import SQLAlchemy
db = SQLAlchemy()
Then in my main application factory I simply:
from common.database import db
db.init_app(app)
Also since I wanted to easily load table structures automatically I initialised the metadata binds within the app context, and I think it was this which cleanly handled the commit() issue/error I was getting, as I believe the database sessions are now being correctly terminated after each request.
with app.app_context():
    # Setup DB binding
    db.metadata.bind = db.engine
Web.py has its own database API, web.db. It's possible to use SQLObject instead, but I haven't been able to find documentation describing how to do this properly. I'm especially interested in managing database connections. It would be best to establish a connection at the wsgi entry point, and reuse it. Webpy cookbook contains an example how to do this with SQLAlchemy. I'd be interested to see how to properly do a similar thing using SQLObject.
This is how I currently do it:
class MyPage(object):
    def GET(self):
        ConnectToDatabase()
        ....
        return render.MyPage(...)
This is obviously inefficient, because it establishes a new database connection on each query. I'm sure there's a better way.
As far as I understand the SQLAlchemy example given, a processor is used: a session is created for each request and committed when the handler completes (or rolled back if an error has occurred).
I don't see any simple way to do what you propose, i.e. open a connection at the WSGI entry point. You will probably need a connection pool to serve multiple clients at the same time. (I have no idea what are the requirements for efficiency, code simplicity and so on, though. Please comment.)
Inserting ConnectToDatabase calls into each handler is of course ugly. I suggest that you adapt the cookbook example replacing the SQLAlchemy session with a SQLObject connection.
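As a rough sketch of what that adaptation might look like, assuming SQLObject's connectionForURI()/transaction() API and with a placeholder URI and handler (check the SQLObject and web.py docs before relying on the details):

import web
from sqlobject import connectionForURI

# The connection is created once when the WSGI app is built and reused;
# each request gets its own transaction via the processor below.
connection = connectionForURI('sqlite:/:memory:')

def load_sqlobject(handler):
    # Mirror the cookbook's SQLAlchemy processor, but with a SQLObject transaction.
    web.ctx.orm = connection.transaction()
    try:
        result = handler()
        web.ctx.orm.commit()
        return result
    except web.HTTPError:
        web.ctx.orm.commit()
        raise
    except Exception:
        web.ctx.orm.rollback()
        raise

urls = ("/", "MyPage")

class MyPage(object):
    def GET(self):
        # Queries would pass connection=web.ctx.orm to run inside the transaction.
        return "ok"

app = web.application(urls, globals())
app.add_processor(load_sqlobject)
application = app.wsgifunc()  # WSGI entry point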