SQLAlchemy best practices with sessions for REST API - python

I created a rest API with openapi generator that contains all the requests necessary for selecting, inserting, and updating my SQL database.
I use from my database generation and manipulation SQLAlchemy and I'm not sure how to use the session to interact with the database in this context.
My project looks like this:
DB
| openapi_server (generated)
| __init__.py
| request.py
| database.py
In database.py I keep my database structure.
In request.py I have all the functions that need to be processed on every request(to interact with the database).
My way of handling this situation is: I create a session variable at the beginning of each function and after the operation is complete I close it.
Any other methods that are more scalable and easy to maintain or which are the best practices?

My understanding is that the sqlalchemy session is different from the client session in that the client session stores information about authorization & permissions whereas the sqlalchemy session is a gil-bound transaction state which associates your code / machine to an external database.
Assuming you're not utilizing multithreading or parallel processing, a single sqlalchemy session shared between your application would be appropriate. In the case where your users have different levels of database permissions, I would establish those rules in your application authorization, rather than the database user-permission schema. (That should be reserved for system-users.)
Bear in mind, multiple sqlalchemy sessions are appropriate in many scenarios and there are advantages for creating and closing sessions on the fly. But there are also potential downsides, such as write collisions (2 processes try to write the same record) and so on. In these more fine grained cases, I'd suggest a queuing process as a central orchestrator.
For implementation:I usually create a file create_session.py which has a function to create a new db session with the appropriate DB URI. I then call that function in the main __init__.py like so: session = create_session() --> importing that session throughout the application is done by importing session from the main module ex: from database import session.
In cases where you need to create new / multiple sessions, do so with:
# Getting the path right here isn't always straightforward tbh
# basically, import the function from the module directly
from create_session import create_session
def do_something():
# Always create your session in a method
# otherwise your db will open many unnecessary connections
my_session = create_session()
print('Done')
# Close the session when you're done
my_session.close()

Related

Flask-SQLAlchemy - how do sessions work with multiple databases?

I'm working on a Flask project and I am using Flask-SQLAlchemy.
I need to work with multiple already existing databases.
I created the "app" object and the SQLAlchemy one:
from flask import Flask
from flask_sqlalchemy import SQLAlchemy
app = Flask(__name__)
db = SQLAlchemy(app)
In the configuration I set the default connection and the additional binds:
SQLALCHEMY_DATABASE_URI = 'postgresql://pg_user:pg_pwd#pg_server/pg_db'
SQLALCHEMY_BINDS = {
'oracle_bind': 'oracle://oracle_user:oracle_pwd#oracle_server/oracle_schema',
'mssql_bind': 'mssql+pyodbc://msssql_user:mssql_pwd#mssql_server/mssql_schema?driver=FreeTDS'
}
Then I created the table models using the declarative system and, where needed, I set the
__bind_key__ parameter to indicate in which database the table is located.
For example:
class MyTable(db.Model):
__bind_key__ = 'mssql_bind'
__tablename__ = 'my_table'
id = db.Column(db.Integer, nullable=False, primary_key=True)
val = db.Column(db.String(50), nullable=False)
in this way everything works correctly, when I do a query it is made on the right database.
Reading the SQLAlchemy documentation and the Flask-SQLALchemy documentation I understand these things
(i write them down to check I understand correctly):
You can handle the transactions through the session.
In SQLAlchemy you can bind a session with a specific engine.
Flask-SQLAlchemy automatically creates the session (scoped_session) at the request start and it destroys it at the request end
so I can do:
record = MyTable(1, 'some text')
db.session.add(record)
db.session.commit()
I can not understand what happens when we use multiple databases, regarding the session, in Flask-SqlAlchemy.
I verified that the system is able to bind the table correctly at the right database through the __bind_key__ parameter,
I can, therefore, insert data on different databases through db.session and, at the commit, everything is saved.
I can't, however, understand if Flask-SQLAlchemy create multiple sessions (one for each engine) or if manages the thing in a different way.
In both cases, how is it possible refer to the session/transaction of a specific database?
If I use db.session.commit() the system does the commit on all involved databases, but how can I do if I want to commit only for a single database?
I would do something like:
db.session('mssql_bind').commit()
but I can not figure out how to do this.
I also saw a Flask-SQLAlchemy implementation which should ease the management of these situations:
Issue: https://github.com/mitsuhiko/flask-sqlalchemy/issues/107
Implementation: https://github.com/mitsuhiko/flask-sqlalchemy/pull/249
but I can not figure out how to use it.
In Flask-SQLAlchemy how can I manage sessions specifically for each single engine?
Flask-SQLAlchemy uses a customized session that handles bind routing according to given __bind_key__ attribute in mapped class. Under the hood it actually adds that key as info to the created table. In other words, Flask does not create multiple sessions, one for each bind, but a single session that routes to correct connectable (engine/connection) according to the bind key. Note that vanilla SQLAlchemy has similar functionality out of the box.
In both cases, how is it possible refer to the session/transaction of a specific database?
If I use db.session.commit() the system does the commit on all involved databases, but how can I do if I want to commit only for a single database?
It might not be a good idea to subvert and issue commits to specific databases mid session using the connections owned by the session. The session is a whole and keeps track of state for object instances, flushing changes to databases when needed etc. That means that the transaction handled by the session is not just the database transactions, but the session's own transaction as well. All that should commit and rollback as one.
You could on the other hand create new SQLAlchemy (or Flask-SQLAlchemy) sessions that possibly join the ongoing transaction in one of the binds:
session = db.create_scoped_session(
options=dict(bind=db.get_engine(app, 'oracle_bind'),
binds={}))
This is what the pull request is about. It allows using an existing transactional connection as the bind for a new Flask-SQLAlchemy session. This is very useful for example in testing, as can be seen in the rationale for that pull request. That way you can have a "master" transaction that can for example rollback everything done in testing.
Note that the SignallingSession always consults the db.get_engine() method if a bind_key is present. This means that the example session is unable to query tables without a bind key and which don't exist on your oracle DB, but would still work for tables with your mssql_bind key.
The issue you linked to on the other hand does list ways to issue SQL to specific binds:
rows = db.session.execute(query, params,
bind=db.get_engine(app, 'oracle_bind'))
There were other less explicit methods listed as well, but explicit is better than implicit.

Flask-SQLAlchemy and SQLAlchemy

I'm building a small website and I already have all my models in SQLAlchemy. The website is to publish some information from some calculations which are done offline. Only the results will be published to a slimmed down database i.e. it contains the results, not the raw data, but the website needs to query the results.
I'm going to use Flask, as my models are already driven with Python (and some heavy lifting in C++ via SWIG) and I don't want to use Django.
Now this has been asked before I'm sure and the usual mantra without much justification is to 'use Flask-SQLAlchemy'. The question is why?
If I write some session handling myself, why do I have to go through the additional layer of redefining my database in Flask-SQLAlchemy. Other than having to write some code like here in my Flask app somewhere:
#app.before_request
def before_request():
g.db = connect_db()
#app.teardown_request
def teardown_request(exception):
db = getattr(g, 'db', None)
if db is not None:
db.close()
What else do I need to worry about? SQLAlchemy even does connection pooling for me by default.
Actually, you are building a web application using Flask which do database related kinds of stuff by sqlalchemy. So, when you are dealing with database session with multiple requests which your application handles, then you have to ascertain that, you are creating and closing sessions cautiously.
If you read SQLAlchemy docs, they recommend to keep the lifecycle of the session separate and external from functions and objects that access and/or manipulate database data. This will greatly help with achieving a predictable and consistent transactional scope.
A web application is the easiest case because such an application is already constructed around a single, consistent scope - this is the request, which represents an incoming request from a browser, the processing of that request to formulate a response, and finally the delivery of that response back to the client. Integrating web applications with the Session is then the straightforward task of linking the scope of the Session to that of the request. The Session can be established as the request begins, or using a lazy initialization pattern which establishes one as soon as it is needed. The request then proceeds, with some system in place where application logic can access the current Session in a manner associated with how the actual request object is accessed. As the request ends, the Session is torn down as well, usually through the usage of event hooks provided by the web framework. The transaction used by the Session may also be committed at this point, or alternatively the application may opt for an explicit commit pattern, only committing for those requests where one is warranted, but still always tearing down the Session unconditionally at the end.
In laymen terms, i mean to say that
In SQLAlchemy, above action is mentioned because sessions in web application should be scoped, meaning that each request handler creates and destroys its own session.
This is necessary because web servers can be multi-threaded, so multiple requests might be served at the same time, each working with a different database session.
This means that if you are using SqlAlchemy with Flask, you have to manually handle session like creating scoped session and also remove them on each request cautiously otherwise you may be in deep shit, which adds extra layer of complexity to your web application.
But, there comes Flask-SqlAlchemy (an extension of sqlalchemy library for Flask app) which provides infrastructure to assist in the task of aligning the lifespan of a Session with that of each web request. Actually, you can also found that in SqlAlchmey docs, they also recommend to use this with Flask.
Flask-SQLAlchemy creates a fresh/new scoped session for each request. If you dig further it you will find out here, it also installs a hook on app.teardown_appcontext (for Flask >=0.9), app.teardown_request (for Flask 0.7-0.8), app.after_request (for Flask <0.7) and here is where it calls db.session.remove().
The code you put in the question is not actually valid for Sqlalchemy integration is Flask. I know it is just example, but saying that just in case.
For Sqlalchemy integration all you need to do is to make sure current DbSession is cleaned up at the end of request via something like this:
#app.teardown_appcontext
def shutdown_session(exception=None):
DbSession.remove()
where DbSession is scoped session.
Here is documentation for the case when you dont want to use Flask-Sqlalchemy package.

Give each user their own database

I'm constructing my app such that each user has their own database (for easy isolation, and to minimize the need for sharding). This means that each web request, and all of the background scripts, need to connect to a different database based on which user is making the request, and use that connection for all function calls.
I figure I can make some sort of middleware that would pass the right connection to my web requests by attaching it to the request variable, but I don't know how I should ensure that all functions and model methods called by the request use this connection.
Well how to "ensure that all functions and model methods called by the request use this connection" is easy. You pass the connection into your api as with any well-designed code that isn't relying on global variables for such things. So you have a database session object loaded per-request, and you pass it down. It's very easy for model objects to turtle that session object further without explicitly passing it because each managed object knows what session owns it, and you can query it from there.
db = request.db
user = db.query(User).get(1)
user.add_group('foo')
class User(Base):
def add_group(self, name):
db = sqlalchemy.orm.object_session(self)
group = Group(name=name)
db.add(group)
I'm not recommending you use that exact pattern but it serves as an example of how to grab the session from a managed object, avoiding having to pass the session everywhere explicitly.
On to your original question, how to handle multi-tenancy... In your data model! Designing a system where you are splitting things up at that low of a level is a big maintenance burden and it does not scale well. For example it becomes very difficult to use any type of connection pooling when you have an arbitrary number of independent connections. To get around that people commonly use the SQL SCHEMA feature supported by some databases. That allows you to use the same connection but have access to a different table structure per session. That's better, but again managing all of those schemas independently should raise some red flags, violating DRY with all of that duplication in your data model. Any duplication at that level quickly becomes a burden that you need to be ready for.

How do SQLAlchemy Connections relate to Session-based transactions?

I am currently working on a new web application that needs to execute an SQL statement before giving a session to the application itself.
In detail: I am running a PostgreSQL database server with multiple schemas and I need to execute a SET search_path statement before the application uses the session. I am also using the ZopeTransactionExtension to have transactions automatically handled at the request level.
To ensure the exectuion of the SQL statement, there seem to be two possible ways:
Executing the statement at the Engine/Connection level via SQLAlchemy events (from Multi-tenancy with SQLAlchemy)
Executing the statement at the session level (from SQLAlchemy support of Postgres Schemas)
Since I am using a scoped session and want to keep my transactions intact, I wonder which of these ways will possibly disturb transaction management.
For example, does the Engine hand out a new connection from the Pool on every query? Or is it attached to the session for its lifetime, i.e. until the request has been processed and the session & transaction are closed/committed?
On the other hand, since I am using a scoped session, can I perform it the way zzzeek suggested it in the second link? That is, is the context preserved and automatically reset once the transaction is over?
Is there possibly a third way that I am missing?
For example, does the Engine hand out a new connection from the Pool on every query?
only if you have autocommit=True, which should not be the case.
Or is it attached to the session for its lifetime, i.e. until the request has been processed and the session & transaction are closed/committed?
it's attached per transaction. But the "search_path" in Postgresql is per postgresql session (not to be confused with SQLAlchemy session) - its basically the lifespan of the connection itself.
The session (and the engine, and the pool) these days has a ton of event hooks you can grab onto in order to set up state like this. If you want to stick with the Session you can try after_begin.

With sqlalchemy how to dynamically bind to database engine on a per-request basis

I have a Pylons-based web application which connects via Sqlalchemy (v0.5) to a Postgres database. For security, rather than follow the typical pattern of simple web apps (as seen in just about all tutorials), I'm not using a generic Postgres user (e.g. "webapp") but am requiring that users enter their own Postgres userid and password, and am using that to establish the connection. That means we get the full benefit of Postgres security.
Complicating things still further, there are two separate databases to connect to. Although they're currently in the same Postgres cluster, they need to be able to move to separate hosts at a later date.
We're using sqlalchemy's declarative package, though I can't see that this has any bearing on the matter.
Most examples of sqlalchemy show trivial approaches such as setting up the Metadata once, at application startup, with a generic database userid and password, which is used through the web application. This is usually done with Metadata.bind = create_engine(), sometimes even at module-level in the database model files.
My question is, how can we defer establishing the connections until the user has logged in, and then (of course) re-use those connections, or re-establish them using the same credentials, for each subsequent request.
We have this working -- we think -- but I'm not only not certain of the safety of it, I also think it looks incredibly heavy-weight for the situation.
Inside the __call__ method of the BaseController we retrieve the userid and password from the web session, call sqlalchemy create_engine() once for each database, then call a routine which calls Session.bind_mapper() repeatedly, once for each table that may be referenced on each of those connections, even though any given request usually references only one or two tables. It looks something like this:
# in lib/base.py on the BaseController class
def __call__(self, environ, start_response):
# note: web session contains {'username': XXX, 'password': YYY}
url1 = 'postgres://%(username)s:%(password)s#server1/finance' % session
url2 = 'postgres://%(username)s:%(password)s#server2/staff' % session
finance = create_engine(url1)
staff = create_engine(url2)
db_configure(staff, finance) # see below
... etc
# in another file
Session = scoped_session(sessionmaker())
def db_configure(staff, finance):
s = Session()
from db.finance import Employee, Customer, Invoice
for c in [
Employee,
Customer,
Invoice,
]:
s.bind_mapper(c, finance)
from db.staff import Project, Hour
for c in [
Project,
Hour,
]:
s.bind_mapper(c, staff)
s.close() # prevents leaking connections between sessions?
So the create_engine() calls occur on every request... I can see that being needed, and the Connection Pool probably caches them and does things sensibly.
But calling Session.bind_mapper() once for each table, on every request? Seems like there has to be a better way.
Obviously, since a desire for strong security underlies all this, we don't want any chance that a connection established for a high-security user will inadvertently be used in a later request by a low-security user.
Binding global objects (mappers, metadata) to user-specific connection is not good way. As well as using scoped session. I suggest to create new session for each request and configure it to use user-specific connections. The following sample assumes that you use separate metadata objects for each database:
binds = {}
finance_engine = create_engine(url1)
binds.update(dict.fromkeys(finance_metadata.sorted_tables, finance_engine))
# The following line is required when mappings to joint tables are used (e.g.
# in joint table inheritance) due to bug (or misfeature) in SQLAlchemy 0.5.4.
# This issue might be fixed in newer versions.
binds.update(dict.fromkeys([Employee, Customer, Invoice], finance_engine))
staff_engine = create_engine(url2)
binds.update(dict.fromkeys(staff_metadata.sorted_tables, staff_engine))
# See comment above.
binds.update(dict.fromkeys([Project, Hour], staff_engine))
session = sessionmaker(binds=binds)()
I would look at the connection pooling and see if you can't find a way to have one pool per user.
You can dispose() the pool when the user's session has expired

Categories