I have a slightly unusual problem with transaction state and error handling in SQLAlchemy. The short version: is there any way of preserving a transaction when SQLAlchemy raises a ProgrammingError and aborts it?
Background
I'm working on an integration test suite for a legacy codebase. Right now, I'm designing a set of fixtures that will allow us to run all tests inside transactions, inspired by the SQLAlchemy documentation. The general paradigm involves opening a connection, starting a transaction, binding a session to that connection, and then mocking out most database access methods so that they make use of that transaction. (To get a sense of what this looks like, see the code provided in the docs link above, including the note at the end.) The goal is to allow ourselves to run methods from the codebase that perform a lot of database updates in the context of a test, with the assurance that any side effects that happen to alter the test database will get rolled back after the test has completed.
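As a concrete reference, here is a minimal sketch of that fixture pattern, loosely following the docs recipe; engine and db stand in for our actual application objects:
import pytest
import sqlalchemy as sa

from api.database import db, engine  # application objects assumed by this sketch


@pytest.fixture
def session():
    # Open a connection and start the outermost transaction
    connection = engine.connect()
    transaction = connection.begin()

    # Bind the application's session to that connection for the duration
    # of the test (the real fixture also installs the mocks described below)
    db.session = sa.orm.scoped_session(sa.orm.sessionmaker(bind=connection))

    yield db.session

    # Roll back everything the test did, discarding any side effects
    db.session.remove()
    transaction.rollback()
    connection.close()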
My problem is that the code often relies on handling DBAPI errors to accomplish control flow when running queries, and those errors automatically abort transactions (per the psycopg2 docs). This poses a problem, since I need to preserve the work that has been done in that transaction up to the point that the error is raised, and I need to continue using the transaction after the error handling is done.
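To make that failure mode concrete, here is a standalone psycopg2 sketch (the connection settings are placeholders) of what "aborting" looks like at the DBAPI level:
import psycopg2

conn = psycopg2.connect(dbname='testdb')  # placeholder connection settings
cur = conn.cursor()

try:
    cur.execute('SELECT * FROM no_such_table')
except psycopg2.ProgrammingError:
    pass  # Postgres has now marked the transaction as aborted

# Every further statement on this connection fails with "current
# transaction is aborted, commands ignored until end of transaction
# block"; only conn.rollback() clears the state.
cur.execute('SELECT 1')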
Here's a representative method that uses error handling for control flow:
from sqlalchemy.exc import ProgrammingError

from api.database import engine


def entity_count():
    """
    Count the entities in a project.
    """
    get_count = '''
        SELECT COUNT(*) AS entity_count FROM entity_browser
    '''
    with engine.begin() as conn:
        try:
            count = conn.execute(get_count).first().entity_count
        except ProgrammingError:
            count = 0
    return count
In this example, the error handling provides a quick way of determining if the table entity_browser exists: if not, Postgres will throw an error that gets caught at the DBAPI level (psycopg2) and passed up to SQLAlchemy as a ProgrammingError.
In the tests, I mock out engine.begin() so that it always returns the connection with the ongoing transaction that was established in the test setup. Unfortunately, this means that when the code continues execution after SQLAlchemy has raised a ProgrammingError and psycopg2 has aborted the transaction, SQLAlchemy will raise an InternalError the next time a database query runs using the open connection, complaining that the transaction has been aborted.
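The mock is essentially a context manager that hands back the shared connection instead of checking a fresh one out of the pool; a sketch (the names are mine):
from contextlib import contextmanager


@contextmanager
def shared_begin():
    # Yield the connection opened in the fixture, so the tested code's
    # work joins the test's ongoing transaction
    yield connection


# installed in the fixture with unittest.mock, roughly:
#   mock.patch.object(engine, 'begin', shared_begin)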
Here's a sample test exhibiting this behavior:
def test_entity_count(session):
    """
    Test the `entity_count` method.

    `session` is a fixture that sets up the transaction and mocks out
    database access, returning a Flask-SQLAlchemy `scoped_session` object
    that we can use for queries.
    """
    # Make a change to a table that we can observe later
    session.execute('''
        UPDATE users
        SET name = 'in a test transaction'
        WHERE id = 1
    ''')

    # Drop `entity_browser` in order to raise a `ProgrammingError` later
    session.execute('''DROP TABLE entity_browser''')

    # Run the `entity_count` method, which should swallow the error
    # and fall back to a count of 0
    count = entity_count()
    assert count == 0

    # Make sure that the changes we made earlier in the test still exist
    altered_name = session.execute('''
        SELECT name
        FROM users
        WHERE id = 1
    ''').scalar()
    assert altered_name == 'in a test transaction'
Here's the type of output I get:
> altered_name = session.execute('''
SELECT name
FROM users
WHERE id = 1
''')
[... traceback history...]
def do_execute(self, cursor, statement, parameters, context=None):
> cursor.execute(statement, parameters)
E sqlalchemy.exc.InternalError: (psycopg2.InternalError) current transaction is
aborted, commands ignored until end of transaction block
Attempted solutions
My first instinct was to try to interrupt the error handling and force a rollback using SQLAlchemy's handle_error event listener. I added a listener into the test fixture that would roll back the raw connection (since SQLAlchemy Connection instances have no rollback API, as far as I understand it):
@sa.event.listens_for(connection, 'handle_error')
def raise_error(context):
    dbapi_conn = context.connection.connection
    dbapi_conn.rollback()
This successfully keeps the transaction open for further use, but ends up rolling back all of the previous changes made in the test. Sample output:
> assert altered_name == 'in a test transaction'
E AssertionError
Clearly, rolling back the raw connection is too aggressive an approach. Thinking that I might be able to roll back to the last savepoint, I tried rolling back the scoped session, which has an event listener attached to it that automatically opens up a new nested transaction when a previous one ends. (See the note at the end of the SQLAlchemy doc on transactions in tests for a sample of what this looks like.)
Thanks to the mocks set up in the session fixture, I can import the scoped session directly into the event listener and roll it back:
@sa.event.listens_for(connection, 'handle_error')
def raise_error(context):
    from api.database import db
    db.session.rollback()
However, this approach also raises an InternalError on the next query. It seems that it doesn't actually roll back the transaction to the satisfaction of the underlying cursor.
Summary question
Is there any way of preserving the transaction after a ProgrammingError gets raised? On a more abstract level, what is happening when psycopg2 "aborts" the transaction, and how can I work around it?
The root of the problem is that you're hiding the exception from the context manager. You catch the ProgrammingError too soon and so the with-statement never sees it. Your entity_count() should be:
def entity_count():
    """
    Count the entities in a project.
    """
    get_count = '''
        SELECT COUNT(*) AS entity_count FROM entity_browser
    '''
    try:
        with engine.begin() as conn:
            count = conn.execute(get_count).first().entity_count
    except ProgrammingError:
        count = 0
    return count
And then if you provide something like
from contextlib import contextmanager

@contextmanager
def fake_begin():
    """ Begin a nested transaction and yield the global connection.
    """
    with connection.begin_nested():
        yield connection
as the mocked engine.begin(), the connection stays usable. But @JL Peyret raises a good point about the logic of your test. Engine.begin() usually[1] provides a new connection with an armed transaction from the pool, so your session and entity_count() probably shouldn't even be using the same connection.
[1]: Depends on pool configuration.
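For completeness, a sketch of how such a replacement might be wired into the session fixture (the fixture names are assumed from the question):
from unittest import mock

# inside the session fixture, after the connection and the outer
# transaction have been created:
with mock.patch.object(engine, 'begin', fake_begin):
    yield db.session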
Related
I am writing E2E tests for our software and would like to figure out why the rollback call does not roll the DB back to the state it was in when the test started.
I use a decorator for my pytest test functions.
The issue I get is that data I write to the DB during the tests persists, even though I call rollback() in the final statement, which indicates that the transaction is not set up correctly or that SQLAlchemy is doing something else in the background.
I see SQLAlchemy has the SAVEPOINT feature, but I am not sure if this is what I really need. I think my request is pretty simple, yet the framework obfuscates it; or maybe I am simply not experienced enough with it...
Note - the functions tested can have multiple commit calls...
from sqlalchemy.orm import sessionmaker

def get_postgres_db():
    v2_session = sessionmaker(
        autocommit=False,
        autoflush=False,
        bind=v2_engine,
    )()  # instantiate a Session from the factory
    try:
        yield v2_session
    finally:
        v2_session.close()
def postgres_test_decorator(test_function):
    """
    Decorator to open db connection and roll back regardless of test outcome
    Can be ported into pytest later
    """
    def the_decorator(*args, **kwargs):
        try:
            postgres_session = list(get_postgres_db())[0]
            # IN MY SQL MIND I WOULD LIKE TO DO HERE
            # BEGIN TRANSACTION
            test_function(postgres_session)
        finally:
            # THIS SHOULD ROLLBACK TO ORIGINAL STATE
            # ROLLBACK
            postgres_session.rollback()
    return the_decorator
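What I seem to be after is the "join an external transaction" recipe from the SQLAlchemy docs: bind the session to a connection whose outer transaction is rolled back at the end. A rough sketch of the decorator in that style, reusing v2_engine (tested code that calls commit() additionally needs the savepoint variant of the recipe, e.g. join_transaction_mode='create_savepoint' on SQLAlchemy 2.0+):
from sqlalchemy.orm import sessionmaker


def postgres_test_decorator(test_function):
    def the_decorator(*args, **kwargs):
        connection = v2_engine.connect()
        outer = connection.begin()  # the BEGIN TRANSACTION from my comment
        postgres_session = sessionmaker(bind=connection)()
        try:
            test_function(postgres_session)
        finally:
            postgres_session.close()
            outer.rollback()  # the ROLLBACK to the original state
            connection.close()
    return the_decorator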
We have a pyramid web application.
We use SQLAlchemy 1.4 with Zope transactions.
In our application, it is possible for an error to occur during flush as described here which causes any subsequent usage of the SQLAlchemy session to throw a PendingRollbackError. The error which occurs during a flush is unintentional (a bug), and is raised to our exception handling view... which tries to use data from the SQLAlchemy session, which then throws a PendingRollbackError.
Is it possible to "recover" from a PendingRollbackError if you have not framed your transaction management correctly? The SQLAlchemy documentation says that to avoid this situation you essentially "just need to do things the right way". Unfortunately, this is a large codebase, and developers don't always follow correct transaction management. The issue is further complicated if savepoints/nested transactions are used.
def some_view():
    # constraint violation
    session.add_all([Foo(id=1), Foo(id=1)])
    session.commit()  # Error is raised during flush
    return {'data': 'some data'}

def exception_handling_view():  # Wired in via pyramid framework, error ^ enters here.
    session.query(... does a query to get some data)  # This throws a `PendingRollbackError`
I am wondering if we can do something like the below, but don't understand pyramid + SQLAlchemy + Zope transactions well enough to know the implications (when considering the potential for nested transactions etc).
def exception_handling_view():  # Wired in via pyramid framework, error ^ enters here.
    def _query():
        session.query(... does a query to get some data)

    try:
        _query()
    except PendingRollbackError:
        session.rollback()
        _query()
Instead of trying to execute your query, just try to get the connection:
def exception_handling_view():
    try:
        _ = session.connection()
    except PendingRollbackError:
        session.rollback()
    session.query(...)
session.rollback() only rolls back the innermost transaction, as is usually expected — assuming nested transactions are used intentionally via the explicit session.begin_nested().
You don't have to rollback parent transactions, but if you decide to do that, you can:
while session.registry().in_transaction():
    session.rollback()
I'm trying to do some schema changes inside a transaction manager provided by pyramid. I'm running into various issues trying to run commit after a rollback:
The simplified version is:
def get_version(conn):
    try:
        result = conn.execute('SELECT version FROM versions LIMIT 1')
        return result.scalar()
    except:
        conn.rollback()
        return 0

def m_version_table(conn):
    conn.execute('CREATE TABLE versions (version INT)')
    conn.execute('INSERT INTO versions VALUES (1)')

def handle(conn):
    ver = get_version(conn)
    m_version_table(conn)

# task started with pyramid's transaction manager
with env['request'].tm as tm:
    handle(env['request'].dbsession)
The transactions are started implicitly, which I can see in the logs:
BEGIN (implicit)
SELECT version FROM versions LIMIT 1
()
ROLLBACK
BEGIN (implicit)
CREATE TABLE versions (version INT)
()
INSERT INTO versions VALUES (1)
()
UPDATE versions SET version = %s
(1,)
ROLLBACK
If versions exists (and I run a different ALTER afterwards) everything works fine. But after the rollback, I just get:
Traceback (most recent call last):
File ".venv/bin/schema_refresh", line 11, in <module>
load_entry_point('project', 'console_scripts', 'schema_refresh')()
File ".../schema_refresh.py", line 270, in run
handle(env['request'].dbsession, tm)
File ".../transaction-2.4.0-py3.7.egg/transaction/_manager.py", line 140, in __exit__
self.commit()
File ".../transaction-2.4.0-py3.7.egg/transaction/_manager.py", line 131, in commit
return self.get().commit()
...
sqlalchemy.exc.ResourceClosedError: This transaction is closed
Why can't the next transaction be committed, even if a new transaction has been correctly started after the rollback? (ROLLBACK is followed by BEGIN (implicit))
tl;dr
It very much seems like it's not the new transaction that __exit__ tries to commit in your example.
Calling rollback on the DB session does create a new session transaction, but that's not joined with the transaction that the manager tracks within the context. Your subsequent calls to execute are done in the new session transaction, but commit is called on the original, first transaction, created when entering the context.
Assuming that you used cookiecutter to set up your project, your models.__init__.py will probably be the default from the repo.
That means that env['request'].tm returns a Zope TransactionManager and, when entering its context, the begin() method instantiates a Transaction object and stores it in the _txn attribute.
env['request'].dbsession returns an SQLAlchemy Session, after registering it with the transaction manager.
The TransactionManager's Transaction is now joined with the Session's SessionTransaction and should control its end and outcome.
Rolling back the SessionTransaction while handling the exception raised in the execute() call bypasses the transaction manager. Calling its commit() or rollback() methods, as done later by __exit__, will have it still try to terminate the SessionTransaction you rolled back.
Also, there's no mechanism that would join the new transaction with the manager.
You can either use the transaction manager or opt for manual transaction control. Just stick with your choice and don't mix both.
You're using conn.execute, which is not tracked by the transaction manager (by default it only tracks changes made through the ORM). You can either 1) modify the code that calls zope.sqlalchemy.register(session) to pass initial_state='changed', so that the default outcome is COMMIT instead of ROLLBACK (the default avoids extra commits when nothing is known to have changed, for performance), or 2) mark the specific sessions that issue raw SQL with zope.sqlalchemy.mark_changed(session).
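A sketch of those two options (the session setup is assumed from the cookiecutter default):
import zope.sqlalchemy

# Option 1: register the session so its initial state is already
# 'changed', making COMMIT the default outcome
zope.sqlalchemy.register(session, initial_state='changed')

# Option 2: flag an individual session that has executed raw SQL
zope.sqlalchemy.mark_changed(session)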
Finally, get_version should be done by coordinating with the transaction manager so that the entire transaction doesn't go into a bad state (despite your rollback, the manager is still marked aborted right now). To do this, use tm.savepoint():
def get_version(conn, tm):
    sp = tm.savepoint()
    try:
        result = conn.execute('SELECT version FROM versions LIMIT 1')
        return result.scalar()
    except:
        sp.rollback()
        return 0
I have a table in a database, mapped with the SQLAlchemy ORM (I have a scoped_session variable).
I want multiple instances of my program (not just threads, but also several servers) to be able to work on the same table and NOT work on the same data.
So I have coded a manual "row-lock" mechanism to make sure each row is only handled by one instance, using a full table lock while I "row-lock":
import threading
from datetime import datetime

from sqlalchemy.orm import scoped_session, sessionmaker

def instance():
    s = scoped_session(sessionmaker(bind=engine))
    engine.execute("LOCK TABLES my_data WRITE")
    rows = s.query(Row_model).filter(Row_model.condition == 1).filter(Row_model.is_locked == 0).limit(10).all()
    for row in rows:
        row.is_locked = 1
        row.lock_time = datetime.now()
    s.commit()
    engine.execute("UNLOCK TABLES")
    for row in rows:
        manipulate_data(row)
        row.is_locked = 0
    s.commit()

for i in range(10):
    t = threading.Thread(target=instance)
    t.start()
The problem is that while running some instances, several threads are collapsing and produce this error (each):
sqlalchemy.exc.DatabaseError: (raised as a result of Query-invoked
autoflush; consider using a session.no_autoflush block if this flush
is occurring prematurely) (DatabaseError) 1205 (HY000): Lock wait
timeout exceeded; try restarting transaction 'UPDATE my_daya SET
row_var = 1}
Where is the catch? What makes my DB table fail to UNLOCK successfully?
Thanks.
Locks are evil. Avoid them. Things go very bad when errors occur. Especially when you mix sessions with raw SQL statements, like you do.
The beauty of the scoped session is that it wraps a database transaction. This transaction makes the modifications to the database atomic, and also takes care of cleaning up when things go wrong.
Use scoped sessions as follows:
with scoped_session(sessionmaker(bind=engine)) as s:
    <ORM actions using s>
It may be some work to rewrite your code so that it becomes properly transactional, but it will be worth it! SQLAlchemy has tricks to help you with that.
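One such trick for this particular job is ORM-level row locking, which replaces both the table lock and the manual is_locked flag; a sketch assuming SQLAlchemy 1.4+ and a backend that supports SKIP LOCKED (MySQL 8+, Postgres):
from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine)


def instance():
    with Session() as s, s.begin():
        # SELECT ... FOR UPDATE SKIP LOCKED: each worker claims rows
        # that no other transaction currently holds
        rows = (
            s.query(Row_model)
            .filter(Row_model.condition == 1)
            .with_for_update(skip_locked=True)
            .limit(10)
            .all()
        )
        for row in rows:
            manipulate_data(row)
        # the row locks are released when the transaction commits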
Context
So I am trying to figure out how to properly override the automatic transaction handling when using SQLite in Python. When I try to run
cursor.execute("BEGIN;")
# ... an assortment of INSERT statements ...
cursor.execute("END;")
I get the following error:
OperationalError: cannot commit - no transaction is active
I understand this is because SQLite in Python automatically opens a transaction on each modifying statement, which in this case is an INSERT.
Question:
I am trying to speed up my insertions by doing one transaction per several thousand records.
How can I overcome the automatic opening of transactions?
As @CL. said, you have to set the isolation level to None. Code example:
import sqlite3

s = sqlite3.connect("./data.db")
s.isolation_level = None  # disable implicit transaction handling

try:
    c = s.cursor()
    c.execute("begin")
    ...
    c.execute("commit")
except sqlite3.Error:
    c.execute("rollback")
The documentation says:
You can control which kind of BEGIN statements sqlite3 implicitly executes (or none at all) via the isolation_level parameter to the connect() call, or via the isolation_level property of connections.
If you want autocommit mode, then set isolation_level to None.
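With isolation_level set to None, batching several thousand rows per transaction then looks something like this (the records table and generated data are illustrative):
import sqlite3

conn = sqlite3.connect("./data.db")
conn.isolation_level = None  # no implicit BEGINs; we issue our own
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS records (value INT)")  # illustrative table

cur.execute("BEGIN")
cur.executemany(
    "INSERT INTO records (value) VALUES (?)",
    [(i,) for i in range(5000)],  # illustrative data
)
cur.execute("COMMIT")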