sqlalchemy mysql connections not closing on flask api - python

I have an API I have written in flask. It uses sqlalchemy to deal with a MySQL database. I don't use flask-sqlalchemy, because I don't like how the module forces you into a certain pattern for declaring the model.
I'm having a problem in which my database connections are not closing. The object representing the connection is going out of scope, so I assume it is being garbage collected. I also explicitly call close() on the session. Despite this, the connections stay open long after the API call has returned its response.
sqlsession.py: Here is the wrapper I am using for the session.
class SqlSession:
    def __init__(self, conn=Constants.Sql):
        self.db = SqlSession.createEngine(conn)
        Session = sessionmaker(bind=self.db)
        self.session = Session()

    @staticmethod
    def createEngine(conn):
        return create_engine(conn.URI.format(user=conn.USER, password=conn.PASS,
                                             host=conn.HOST, port=conn.PORT,
                                             database=conn.DATABASE, poolclass=NullPool))

    def close(self):
        self.session.close()
flaskroutes.py: Here is an example of the flask app instantiating and using the wrapper object. Note that it instantiates the wrapper at the beginning, within the scope of the API call, closes the session at the end, and the object is presumably garbage collected after the response is returned.
def commands(self, deviceId):
    sqlSession = SqlSession(self.sessionType)  # <---
    commandsQueued = getCommands()
    jsonCommands = []
    for command in commandsQueued:
        jsonCommand = command.returnJsonObject()
        jsonCommands.append(jsonCommand)
        sqlSession.session.delete(command)
    sqlSession.session.commit()
    resp = jsonify({'commands': jsonCommands})
    sqlSession.close()  # <---
    resp.status_code = 200
    return resp
I would expect the connections to be cleared as soon as the HTTP response is made, but instead, the connections end up in the "SLEEP" state (when viewed in the MySQL command-line interface via 'show processlist').

I ended up using the advice from this SO post:
How to close sqlalchemy connection in MySQL
I strongly recommend reading that post to anyone having this problem. Basically, I added a dispose() call to the close method. Doing so destroys the underlying connections entirely, whereas closing the session simply returns its connection to the available pool (but leaves it open).
def close(self):
    self.session.close()
    self.db.dispose()
This whole thing was a bit confusing to me, but at least now I understand more about the connection pool.
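As an aside (not part of the original posts): in the wrapper above, poolclass=NullPool is passed into str.format() rather than to create_engine(); str.format() silently ignores extra keyword arguments, so the engine still uses SQLAlchemy's default QueuePool and keeps checked-in connections open. A minimal sketch of passing NullPool to the engine itself, reusing the assumed Constants.Sql attributes:

# Sketch only: hand NullPool to create_engine(), not to the URL format call.
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

def create_engine_without_pooling(conn):
    url = conn.URI.format(
        user=conn.USER, password=conn.PASS, host=conn.HOST,
        port=conn.PORT, database=conn.DATABASE,
    )
    # With NullPool the DBAPI connection is actually closed when the Session
    # releases it, so no "SLEEP" connections linger after the request.
    return create_engine(url, poolclass=NullPool)

With the default QueuePool, calling engine.dispose() as above is the way to force the pooled connections closed.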

Related

Does SQLAlchemy close sessions after commit()?

So this question is a little like "Does SQLAlchemy reset the database session between SQLAlchemy Sessions from the same connection?"
I have a Flask/SQLAlchemy/Postgres app, which intermittently seems to drop connections after a commit() that occurs as part of a POST request.
This causes me headaches as I rely upon a customized option (https://www.postgresql.org/docs/9.6/runtime-config-custom.html) to control row level security - in effect executing the following before each Flask request while utilising scoped sessions:
@app.before_request
def load_user():
    ...
    # Set up RLS.
    statement = f"SET app.permitted_workspace_id = '{workspace_id}'"
    db.db_session.execute(statement)
    ...
This pattern generally works fine, but occasionally seems to fail when, so far as I can tell, SQLAlchemy drops the existing session after a commit() and checks out a new one, in which app.permitted_workspace_id is no longer set.
My workaround for this is to listen for session checkout events, and then re-set the parameter:
@event.listens_for(db_engine, 'checkout')
def receive_checkout(dbapi_connection, connection_record, connection_proxy):
    ...
    cursor = dbapi_connection.cursor()
    statement = f"SET app.permitted_workspace_id = '{g.user.workspace_id}'"
    cursor.execute(statement)
    return
So my question is really: is it unavoidable that SQLAlchemy may close sessions after commit(), meaning I lose my session parameters - even with more DB work still to do?
If so, do we think this pattern is secure or even acceptable practice? Ideally, I'd keep the session open until removed (via @app.teardown_appcontext), but since I'm struggling to achieve that, and still have the relevant info available within the Flask request, I think this is the next best way to go.
Thanks
Edit 1:
In terms of session scoping, the layout is this:
In a database module, I lay out the following:
def get_database_connection():
    ...
    db_engine = sa.create_engine(
        f'postgresql://{user}:{password}@{host}/postgres',
        echo=False,
        poolclass=sa.pool.NullPool
    )
    # Connect - RLS is controlled by db_get_user_details.
    db_connection = db_engine.connect()
    db_session = scoped_session(
        sessionmaker(
            autocommit=False,
            autoflush=False,
            expire_on_commit=False,
            bind=db_engine
        )
    )
    return db_engine, db_session, db_connection
This is then called up top from inside the main Flask application:
db_engine, db_session, db_connection = db.get_database_connection()
And session removal is controlled by a function as follows:
@app.teardown_appcontext
def remove_session(exception=None):
    db_session.remove()
So the answer in here seems to be that commit() does perform a checkin with this pattern:
https://github.com/sqlalchemy/sqlalchemy/issues/4925
if Session is what you're working with then yes, the Session will release connections when any of commit(), rollback(), or close() is called.
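One possible alternative (not from the original post, just a hedged sketch under the same assumptions) is to re-issue the setting at the start of every transaction with a Session-level after_begin event, using Postgres's set_config(..., true) so the value is scoped to that transaction and cannot leak between pooled connections:

import sqlalchemy as sa
from sqlalchemy import event
from sqlalchemy.orm import Session
from flask import g

@event.listens_for(Session, 'after_begin')
def set_rls_parameter(session, transaction, connection):
    # `g.user.workspace_id` is assumed from the question; skip if unavailable.
    workspace_id = getattr(getattr(g, 'user', None), 'workspace_id', None)
    if workspace_id is not None:
        # set_config(..., true) makes the setting transaction-local.
        connection.execute(
            sa.text("SELECT set_config('app.permitted_workspace_id', :wid, true)"),
            {'wid': str(workspace_id)},
        )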

Using a single redis connection (aioredis)

My application will be sending hundreds, if not thousands, of messages over redis every second, so we are going to open the connection when the app launches and use that connection for every transaction. I am having trouble keeping that connection open.
Here is some of my redis handler class:
class RedisSingleDatabase(AsyncGetSetDatabase):
    async def redis(self):
        if not self._redis:
            self._redis = await aioredis.create_redis(('redis', REDIS_PORT))
        return self._redis

    def __init__(self):
        self._redis = None

    async def get(self, key):
        r = await self.redis()
        # r = await aioredis.create_redis(('redis', REDIS_PORT))
        print(f'CONNECTION IS {"CLOSED" if r.closed else "OPEN"}!')
        data = await r.get(key, encoding='utf-8')
        if data is not None:
            return json.loads(data)
This does not work. By the time we call r.get, the connection is closed (ConnectionClosedError).
If I uncomment the second line of the get method, and connect to the database locally in the method, it works. But then we're no longer using the same connection.
I've considered trying all this using dependency injection as well. This is a Flask app, and in my create_app method I would connect to the database and construct my RedisSingleDatabase with that connection. Besides the question of whether my current problem would be the same, I can't seem to get this approach to work, since I need to await the connection, which I can't do in my root-level create_app, which can't be async (unless I'm missing something).
I've tried asyncio.run all over the place and it either doesn't fix the problem or raises its own error that it can't be called from a running event loop.
Please help!!!
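One hedged sketch (not from the original post) of a possible workaround from synchronous Flask code: keep a single long-lived event loop and run every Redis call on it, rather than calling asyncio.run() per request, because asyncio.run() creates and then closes a fresh loop, tearing down any connection bound to it. REDIS_PORT and the aioredis 1.x calls are carried over from the question; this assumes a single-threaded server.

import asyncio
import json

import aioredis  # aioredis 1.x, matching create_redis(...) used above

REDIS_PORT = 6379  # assumed value

_loop = asyncio.new_event_loop()  # one loop reused for every Redis call
asyncio.set_event_loop(_loop)
_redis = None

async def _get_redis():
    global _redis
    if _redis is None or _redis.closed:
        _redis = await aioredis.create_redis(('redis', REDIS_PORT))
    return _redis

def redis_get(key):
    # Callable from synchronous Flask views: the connection is always created
    # and used on the same loop, so it stays open between calls.
    async def _get():
        r = await _get_redis()
        data = await r.get(key, encoding='utf-8')
        return json.loads(data) if data is not None else None
    return _loop.run_until_complete(_get())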

StreamingHttpResponse: return database connection to pool / close it

If you have a StreamingHttpResponse returned from a Django view, when does it return its database connection to the pool? If by default it does this once the StreamingHttpResponse has completed, is there a way to make it return the connection earlier?
def my_view(request):
    # Some database queries using the Django ORM
    # ...

    def yield_data():
        # A generator, with no database queries using the Django ORM
        # ...

    return StreamingHttpResponse(
        yield_data(), status=200
    )
If it makes a difference, this is using https://pypi.org/project/django-db-geventpool/ with gunicorn, and any answer should also work when tested with pytest.mark.django_db (that I think wraps tests in transactions)
If you look at the documentation
https://docs.djangoproject.com/en/3.0/ref/databases/
Connection management
Django opens a connection to the database when it first makes a database query. It keeps this connection open and reuses it in subsequent requests. Django closes the connection once it exceeds the maximum age defined by CONN_MAX_AGE or when it isn’t usable any longer.
In detail, Django automatically opens a connection to the database whenever it needs one and doesn’t have one already — either because this is the first connection, or because the previous connection was closed.
At the beginning of each request, Django closes the connection if it has reached its maximum age. If your database terminates idle connections after some time, you should set CONN_MAX_AGE to a lower value, so that Django doesn’t attempt to use a connection that has been terminated by the database server. (This problem may only affect very low traffic sites.)
At the end of each request, Django closes the connection if it has reached its maximum age or if it is in an unrecoverable error state. If any database errors have occurred while processing the requests, Django checks whether the connection still works, and closes it if it doesn’t. Thus, database errors affect at most one request; if the connection becomes unusable, the next request gets a fresh connection.
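For reference, CONN_MAX_AGE is configured per database alias in settings.py. A minimal illustration with assumed values (not from the original question; with django-db-geventpool the pool itself also manages connection reuse):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'NAME': 'mydb',
        'USER': 'myuser',
        'PASSWORD': 'secret',
        'HOST': 'localhost',
        # 0 closes the connection at the end of every request (the default);
        # a positive number keeps it open for that many seconds;
        # None keeps persistent connections indefinitely.
        'CONN_MAX_AGE': 0,
    }
}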
Also, if you look at db/__init__.py in the Django source code:
# For backwards compatibility. Prefer connections['default'] instead.
connection = DefaultConnectionProxy()

# Register an event to reset saved queries when a Django request is started.
def reset_queries(**kwargs):
    for conn in connections.all():
        conn.queries_log.clear()

signals.request_started.connect(reset_queries)

# Register an event to reset transaction state and close connections past
# their lifetime.
def close_old_connections(**kwargs):
    for conn in connections.all():
        conn.close_if_unusable_or_obsolete()

signals.request_started.connect(close_old_connections)
signals.request_finished.connect(close_old_connections)
It connects close_old_connections to the request_started and request_finished signals to close old connections.
So if you are not willing to wait for the connection to be closed, you can call this function yourself. Your updated code would look like this:
from django.db import close_old_connections

def my_view(request):
    # Some database queries using the Django ORM
    # ...

    close_old_connections()

    def yield_data():
        # A generator, with no database queries using the Django ORM
        # ...

    return StreamingHttpResponse(
        yield_data(), status=200
    )

Keeping SQLAlchemy session alive when streaming a Flask Response

I'm trying to stream large CSVs to clients from my Flask server, which uses Flask-SQLAlchemy.
When configuring the app (using the factory pattern), db.session.close() is called after each request:
@app.after_request
def close_connection(r):
    db.session.close()
    return r
This configuration has worked great up until now, as all the requests are short lived. But when streaming a response, the SQLAlchemy session is closed prematurely, throwing the following error when the generator is called:
sqlalchemy.orm.exc.DetachedInstanceError: Parent instance <Question> is not bound to a Session; lazy load operation of attribute 'offered_answers' cannot proceed
Pseudo-code:
@app.route('/export')
def export_data():
    answers = Answer.query.all()
    questions = Question.query.all()

    def generate():
        # Iterate through answers and questions and write out
        # various relationships to CSV
        ...

    response = Response(stream_with_context(generate()), mimetype='text/csv')
    return response
I've tried multiple configurations of using / not using stream_with_context, and of setting global flags in close_connection so it doesn't automatically close the session, but the same error persists.
@app.after_request was closing the database session before the generator used to stream the file was ever invoked.
The solution was to move db.session.close() to @app.teardown_request. stream_with_context must also be used when instantiating Response.
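A minimal sketch of that fix, reusing names from the pseudo-code above (Question, db, and the offered_answers relationship are assumptions carried over from the question):

from flask import Response, stream_with_context

@app.teardown_request
def close_session(exception=None):
    # With stream_with_context, teardown_request runs only after the
    # streaming generator has finished, unlike after_request, which runs
    # before streaming begins.
    db.session.close()

@app.route('/export')
def export_data():
    questions = Question.query.all()

    def generate():
        yield 'question,answer_count\n'
        for question in questions:
            # Lazy loads such as question.offered_answers still work here
            # because the session has not been closed yet.
            yield f'{question.id},{len(question.offered_answers)}\n'

    return Response(stream_with_context(generate()), mimetype='text/csv')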

How to make a tornado request atomic in the Database

I have a python app written in the Tornado Asynchronous framework. When an HTTP request comes in, this method gets called:
@classmethod
def my_method(cls, my_arg1):
    # Do some Database Transaction #1
    x = get_val_from_db_table1('x', id=1)
    y = get_val_from_db_table2('y', id=7)

    x += x + (2 * y)

    # Do some Database Transaction #2
    set_val_in_db_table1('x', x, id=1)

    return True
The three database operations are interrelated. This is a concurrent application, so multiple such HTTP calls can be in flight at the same time, hitting the same DB.
For data-integrity purposes, it's important that the three database operations in this method all run without another process reading or writing those database rows in between.
How can I make sure this method has database atomicity? Does Tornado have a decorator for this?
Synchronous database access
You haven't stated how you access your database. If, as is likely, you have synchronous DB access in get_val_from_db_table1 and friends (e.g. with pymysql) and my_method is blocking (doesn't return control to the IO loop), then you block your server (which has implications for its performance and responsiveness), but you effectively serialise your clients and only one can execute my_method at a time. So in terms of data consistency you don't need to do anything, but generally it's a bad design. You can solve both problems with @xyres's solution in the short term (at the cost of keeping thread-safety concerns in mind, because most of Tornado's functionality isn't thread-safe).
Asynchronous database access
If you have asynchronous DB access in get_val_from_db_table1 and friends (e.g. with tornado-mysql) then you can use tornado.locks.Lock. Here's an example:
from tornado import web, gen, locks, ioloop

_lock = locks.Lock()

def synchronised(coro):
    async def wrapper(*args, **kwargs):
        async with _lock:
            return await coro(*args, **kwargs)
    return wrapper

class MainHandler(web.RequestHandler):
    async def get(self):
        result = await self.my_method('foo')
        self.write(result)

    @classmethod
    @synchronised
    async def my_method(cls, arg):
        # db access
        await gen.sleep(0.5)
        return 'data set for {}'.format(arg)

if __name__ == '__main__':
    app = web.Application([('/', MainHandler)])
    app.listen(8080)
    ioloop.IOLoop.current().start()
Note that the above applies to a normal single-process Tornado application. If you use tornado.process.fork_processes, then you can only go with multiprocessing.Lock.
Since you want to run those three db operations one right after the other, the function my_method must be non-asynchronous.
But this would also mean that my_method will block the server. You definitely don't want that. One way that I can think of is to run this function in another thread. This won't block the server, which will keep accepting new requests while the operations are running. And since it's going to be non-async, db atomicity is guaranteed.
Here's the relevant code to get you started:
import concurrent.futures

executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
# Don't set `max_workers` to more than 1, because then multiple
# threads will be able to perform db operations

class MyHandler(...):
    @gen.coroutine
    def get(self):
        yield executor.submit(MyHandler.my_method, my_arg1)
        # above, `yield` is used to wait for db operations to finish;
        # if you don't want to wait and would rather return a response
        # immediately, remove the `yield` keyword
        self.write('Done')

    @classmethod
    def my_method(cls, my_arg1):
        # do db stuff ...
        return True
