How do SQLAlchemy Connections relate to Session-based transactions? - python

I am currently working on a new web application that needs to execute an SQL statement before giving a session to the application itself.
In detail: I am running a PostgreSQL database server with multiple schemas and I need to execute a SET search_path statement before the application uses the session. I am also using the ZopeTransactionExtension to have transactions automatically handled at the request level.
To ensure the exectuion of the SQL statement, there seem to be two possible ways:
Executing the statement at the Engine/Connection level via SQLAlchemy events (from Multi-tenancy with SQLAlchemy)
Executing the statement at the session level (from SQLAlchemy support of Postgres Schemas)
Since I am using a scoped session and want to keep my transactions intact, I wonder which of these ways will possibly disturb transaction management.
For example, does the Engine hand out a new connection from the Pool on every query? Or is it attached to the session for its lifetime, i.e. until the request has been processed and the session & transaction are closed/committed?
On the other hand, since I am using a scoped session, can I perform it the way zzzeek suggested it in the second link? That is, is the context preserved and automatically reset once the transaction is over?
Is there possibly a third way that I am missing?

For example, does the Engine hand out a new connection from the Pool on every query?
only if you have autocommit=True, which should not be the case.
Or is it attached to the session for its lifetime, i.e. until the request has been processed and the session & transaction are closed/committed?
it's attached per transaction. But the "search_path" in Postgresql is per postgresql session (not to be confused with SQLAlchemy session) - its basically the lifespan of the connection itself.
The session (and the engine, and the pool) these days has a ton of event hooks you can grab onto in order to set up state like this. If you want to stick with the Session you can try after_begin.

Related

Python select doesn't get updated records [duplicate]

I'm having a problem with the sessions in my python/wsgi web app. There is a different, persistent mysqldb connection for each thread in each of 2 wsgi daemon processes. Sometimes, after deleting old sessions and creating a new one, some connections still fetch the old sessions in a select, which means they fail to validate the session and ask for login again.
Details: Sessions are stored in an InnoDB table in a local mysql database. After authentication (through CAS), I delete any previous sessions for that user, create a new session (insert a row), commit the transaction, and redirect to the originally requested page with the new session id in the cookie. For each request, a session id in the cookie is checked against the sessions in the database.
Sometimes, a newly created session is not found in the database after the redirect. Instead, the old session for that user is still there. (I checked this by selecting and logging all of the sessions at the beginning of each request). Somehow, I'm getting cached results. I tried selecting the sessions with SQL_NO_CACHE, but it made no difference.
Why am I getting cached results? Where else could the caching occur, and how can stop it or refresh the cache? Basically, why do the other connections fail to see the newly inserted data?
MySQL defaults to the isolation level "REPEATABLE READ" which means you will not see any changes in your transaction that were done after the transaction started - even if those (other) changes were committed.
If you issue a COMMIT or ROLLBACK in those sessions, you should see the changed data (because that will end the transaction that is "in progress").
The other option is to change the isolation level for those sessions to "READ COMMITTED". Maybe there is an option to change the default level as well, but you would need to check the manual for that.
Yes, it looks like the assumption is that you are only going to perform a single transaction and then disconnect. If you have a different need, then you need to work around this assumption. As mentioned by #a_horse_with_no_name, you can put in a commit (though I would use a rollback if you are not actually changing data). Or you can change the isolation level on the cursor - from this discussion I used this:
dbcursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
Or, it looks like you can set auto commit to true on the connection:
dbconn.autocommit(True)
Though, again, this is not recommended if actually making changes in the connection.

pymysql: cursor after commit to database returns old values in multithreading mode [duplicate]

I'm having a problem with the sessions in my python/wsgi web app. There is a different, persistent mysqldb connection for each thread in each of 2 wsgi daemon processes. Sometimes, after deleting old sessions and creating a new one, some connections still fetch the old sessions in a select, which means they fail to validate the session and ask for login again.
Details: Sessions are stored in an InnoDB table in a local mysql database. After authentication (through CAS), I delete any previous sessions for that user, create a new session (insert a row), commit the transaction, and redirect to the originally requested page with the new session id in the cookie. For each request, a session id in the cookie is checked against the sessions in the database.
Sometimes, a newly created session is not found in the database after the redirect. Instead, the old session for that user is still there. (I checked this by selecting and logging all of the sessions at the beginning of each request). Somehow, I'm getting cached results. I tried selecting the sessions with SQL_NO_CACHE, but it made no difference.
Why am I getting cached results? Where else could the caching occur, and how can stop it or refresh the cache? Basically, why do the other connections fail to see the newly inserted data?
MySQL defaults to the isolation level "REPEATABLE READ" which means you will not see any changes in your transaction that were done after the transaction started - even if those (other) changes were committed.
If you issue a COMMIT or ROLLBACK in those sessions, you should see the changed data (because that will end the transaction that is "in progress").
The other option is to change the isolation level for those sessions to "READ COMMITTED". Maybe there is an option to change the default level as well, but you would need to check the manual for that.
Yes, it looks like the assumption is that you are only going to perform a single transaction and then disconnect. If you have a different need, then you need to work around this assumption. As mentioned by #a_horse_with_no_name, you can put in a commit (though I would use a rollback if you are not actually changing data). Or you can change the isolation level on the cursor - from this discussion I used this:
dbcursor.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED")
Or, it looks like you can set auto commit to true on the connection:
dbconn.autocommit(True)
Though, again, this is not recommended if actually making changes in the connection.

SQLAlchemy best practices with sessions for REST API

I created a rest API with openapi generator that contains all the requests necessary for selecting, inserting, and updating my SQL database.
I use from my database generation and manipulation SQLAlchemy and I'm not sure how to use the session to interact with the database in this context.
My project looks like this:
DB
| openapi_server (generated)
| __init__.py
| request.py
| database.py
In database.py I keep my database structure.
In request.py I have all the functions that need to be processed on every request(to interact with the database).
My way of handling this situation is: I create a session variable at the beginning of each function and after the operation is complete I close it.
Any other methods that are more scalable and easy to maintain or which are the best practices?
My understanding is that the sqlalchemy session is different from the client session in that the client session stores information about authorization & permissions whereas the sqlalchemy session is a gil-bound transaction state which associates your code / machine to an external database.
Assuming you're not utilizing multithreading or parallel processing, a single sqlalchemy session shared between your application would be appropriate. In the case where your users have different levels of database permissions, I would establish those rules in your application authorization, rather than the database user-permission schema. (That should be reserved for system-users.)
Bear in mind, multiple sqlalchemy sessions are appropriate in many scenarios and there are advantages for creating and closing sessions on the fly. But there are also potential downsides, such as write collisions (2 processes try to write the same record) and so on. In these more fine grained cases, I'd suggest a queuing process as a central orchestrator.
For implementation:I usually create a file create_session.py which has a function to create a new db session with the appropriate DB URI. I then call that function in the main __init__.py like so: session = create_session() --> importing that session throughout the application is done by importing session from the main module ex: from database import session.
In cases where you need to create new / multiple sessions, do so with:
# Getting the path right here isn't always straightforward tbh
# basically, import the function from the module directly
from create_session import create_session
def do_something():
# Always create your session in a method
# otherwise your db will open many unnecessary connections
my_session = create_session()
print('Done')
# Close the session when you're done
my_session.close()

Flask-SQLAlchemy and SQLAlchemy

I'm building a small website and I already have all my models in SQLAlchemy. The website is to publish some information from some calculations which are done offline. Only the results will be published to a slimmed down database i.e. it contains the results, not the raw data, but the website needs to query the results.
I'm going to use Flask, as my models are already driven with Python (and some heavy lifting in C++ via SWIG) and I don't want to use Django.
Now this has been asked before I'm sure and the usual mantra without much justification is to 'use Flask-SQLAlchemy'. The question is why?
If I write some session handling myself, why do I have to go through the additional layer of redefining my database in Flask-SQLAlchemy. Other than having to write some code like here in my Flask app somewhere:
#app.before_request
def before_request():
g.db = connect_db()
#app.teardown_request
def teardown_request(exception):
db = getattr(g, 'db', None)
if db is not None:
db.close()
What else do I need to worry about? SQLAlchemy even does connection pooling for me by default.
Actually, you are building a web application using Flask which do database related kinds of stuff by sqlalchemy. So, when you are dealing with database session with multiple requests which your application handles, then you have to ascertain that, you are creating and closing sessions cautiously.
If you read SQLAlchemy docs, they recommend to keep the lifecycle of the session separate and external from functions and objects that access and/or manipulate database data. This will greatly help with achieving a predictable and consistent transactional scope.
A web application is the easiest case because such an application is already constructed around a single, consistent scope - this is the request, which represents an incoming request from a browser, the processing of that request to formulate a response, and finally the delivery of that response back to the client. Integrating web applications with the Session is then the straightforward task of linking the scope of the Session to that of the request. The Session can be established as the request begins, or using a lazy initialization pattern which establishes one as soon as it is needed. The request then proceeds, with some system in place where application logic can access the current Session in a manner associated with how the actual request object is accessed. As the request ends, the Session is torn down as well, usually through the usage of event hooks provided by the web framework. The transaction used by the Session may also be committed at this point, or alternatively the application may opt for an explicit commit pattern, only committing for those requests where one is warranted, but still always tearing down the Session unconditionally at the end.
In laymen terms, i mean to say that
In SQLAlchemy, above action is mentioned because sessions in web application should be scoped, meaning that each request handler creates and destroys its own session.
This is necessary because web servers can be multi-threaded, so multiple requests might be served at the same time, each working with a different database session.
This means that if you are using SqlAlchemy with Flask, you have to manually handle session like creating scoped session and also remove them on each request cautiously otherwise you may be in deep shit, which adds extra layer of complexity to your web application.
But, there comes Flask-SqlAlchemy (an extension of sqlalchemy library for Flask app) which provides infrastructure to assist in the task of aligning the lifespan of a Session with that of each web request. Actually, you can also found that in SqlAlchmey docs, they also recommend to use this with Flask.
Flask-SQLAlchemy creates a fresh/new scoped session for each request. If you dig further it you will find out here, it also installs a hook on app.teardown_appcontext (for Flask >=0.9), app.teardown_request (for Flask 0.7-0.8), app.after_request (for Flask <0.7) and here is where it calls db.session.remove().
The code you put in the question is not actually valid for Sqlalchemy integration is Flask. I know it is just example, but saying that just in case.
For Sqlalchemy integration all you need to do is to make sure current DbSession is cleaned up at the end of request via something like this:
#app.teardown_appcontext
def shutdown_session(exception=None):
DbSession.remove()
where DbSession is scoped session.
Here is documentation for the case when you dont want to use Flask-Sqlalchemy package.

With sqlalchemy how to dynamically bind to database engine on a per-request basis

I have a Pylons-based web application which connects via Sqlalchemy (v0.5) to a Postgres database. For security, rather than follow the typical pattern of simple web apps (as seen in just about all tutorials), I'm not using a generic Postgres user (e.g. "webapp") but am requiring that users enter their own Postgres userid and password, and am using that to establish the connection. That means we get the full benefit of Postgres security.
Complicating things still further, there are two separate databases to connect to. Although they're currently in the same Postgres cluster, they need to be able to move to separate hosts at a later date.
We're using sqlalchemy's declarative package, though I can't see that this has any bearing on the matter.
Most examples of sqlalchemy show trivial approaches such as setting up the Metadata once, at application startup, with a generic database userid and password, which is used through the web application. This is usually done with Metadata.bind = create_engine(), sometimes even at module-level in the database model files.
My question is, how can we defer establishing the connections until the user has logged in, and then (of course) re-use those connections, or re-establish them using the same credentials, for each subsequent request.
We have this working -- we think -- but I'm not only not certain of the safety of it, I also think it looks incredibly heavy-weight for the situation.
Inside the __call__ method of the BaseController we retrieve the userid and password from the web session, call sqlalchemy create_engine() once for each database, then call a routine which calls Session.bind_mapper() repeatedly, once for each table that may be referenced on each of those connections, even though any given request usually references only one or two tables. It looks something like this:
# in lib/base.py on the BaseController class
def __call__(self, environ, start_response):
# note: web session contains {'username': XXX, 'password': YYY}
url1 = 'postgres://%(username)s:%(password)s#server1/finance' % session
url2 = 'postgres://%(username)s:%(password)s#server2/staff' % session
finance = create_engine(url1)
staff = create_engine(url2)
db_configure(staff, finance) # see below
... etc
# in another file
Session = scoped_session(sessionmaker())
def db_configure(staff, finance):
s = Session()
from db.finance import Employee, Customer, Invoice
for c in [
Employee,
Customer,
Invoice,
]:
s.bind_mapper(c, finance)
from db.staff import Project, Hour
for c in [
Project,
Hour,
]:
s.bind_mapper(c, staff)
s.close() # prevents leaking connections between sessions?
So the create_engine() calls occur on every request... I can see that being needed, and the Connection Pool probably caches them and does things sensibly.
But calling Session.bind_mapper() once for each table, on every request? Seems like there has to be a better way.
Obviously, since a desire for strong security underlies all this, we don't want any chance that a connection established for a high-security user will inadvertently be used in a later request by a low-security user.
Binding global objects (mappers, metadata) to user-specific connection is not good way. As well as using scoped session. I suggest to create new session for each request and configure it to use user-specific connections. The following sample assumes that you use separate metadata objects for each database:
binds = {}
finance_engine = create_engine(url1)
binds.update(dict.fromkeys(finance_metadata.sorted_tables, finance_engine))
# The following line is required when mappings to joint tables are used (e.g.
# in joint table inheritance) due to bug (or misfeature) in SQLAlchemy 0.5.4.
# This issue might be fixed in newer versions.
binds.update(dict.fromkeys([Employee, Customer, Invoice], finance_engine))
staff_engine = create_engine(url2)
binds.update(dict.fromkeys(staff_metadata.sorted_tables, staff_engine))
# See comment above.
binds.update(dict.fromkeys([Project, Hour], staff_engine))
session = sessionmaker(binds=binds)()
I would look at the connection pooling and see if you can't find a way to have one pool per user.
You can dispose() the pool when the user's session has expired

Categories