I'm new to SQLAlchemy and have inherited a somewhat messy codebase without access to the original author.
The code is littered with calls to DBSession.flush(), seemingly any time the author wanted to make sure data was being saved. At first I just followed the patterns I saw in this code, but as I read the docs it seems this is unnecessary - autoflushing should be in place. On top of that, I've run into a few cases with AJAX calls that generate the error "InvalidRequestError: Session is already flushing".
Under what scenarios would I legitimately want to keep a call to flush()?
This is a Pyramid app, and SQLAlchemy is being set up with:
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension(), expire_on_commit=False))
Base = declarative_base()
The ZopeTransactionExtension on the DBSession, in conjunction with pyramid_tm being active on your project, will handle all commits for you. The situations where you need to flush are:
You want to create a new object and get back the primary key.
DBSession.add(obj)
DBSession.flush()
log.info('look, my new object got primary key %d', obj.id)
You want to try to execute some SQL in a savepoint and rollback if it fails without invalidating the entire transaction.
sp = transaction.savepoint()
try:
    foo = Foo()
    foo.id = 5
    DBSession.add(foo)
    DBSession.flush()
except IntegrityError:
    log.error('something already has id 5!!')
    sp.rollback()
In all other cases involving the ORM, the transaction will be aborted for you upon exception, or committed upon success, automatically by pyramid_tm. If you execute raw SQL, you will need to execute transaction.commit() yourself or mark the session as dirty via zope.sqlalchemy.mark_changed(DBSession), otherwise there is no way for the ZTE to know the session has changed.
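For example, a minimal sketch of both options after a raw SQL write (the UPDATE statement and its table are hypothetical):

import transaction
from zope.sqlalchemy import mark_changed

DBSession.execute("UPDATE users SET active = 0")  # raw SQL write; table is illustrative
mark_changed(DBSession())  # tell the ZTE this session has pending changes
# ...or, alternatively, take over and commit yourself:
# transaction.commit()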
Also you should leave expire_on_commit at the default of True unless you have a really good reason.
Related
I'm writing some tests for a REST API linked to a MySQL db with python+werkzeug+SQLAlchemy. One of the tests tries to add an "object" with the primary key missing from the JSON and verifies that the call fails and doesn't insert anything into the DB. It used to work fine with sqlite, but I switched to MySQLdb and now I get a FlushError (instead of the IntegrityError I used to catch), and when I try to rollback after the error, it doesn't throw any error but the entry is in the database with the primary key set to ''. The code looks like this:
session = Session()
try:
    res = func(*args, session=session, **kwargs)
    session.commit()
except sqlalchemy.exc.SQLAlchemyError as e:
    session.rollback()
    return abort(422)
else:
    return res
finally:
    session.close()
And here's the error that I catch during the try/except:
<class 'sqlalchemy.orm.exc.FlushError'>: Instance has a NULL identity key. If this is an auto-generated value, check that the database table allows generation of new primary key values, and that the mapped Column object is configured to expect these generated values. Ensure also that this flush() is not occurring at an inappropriate time, such as within a load() event.
I just read the documentation about the SQLAlchemy session and rollback feature, but I don't understand why the rollback doesn't work for me, as this is almost a textbook example from the documentation.
I use Python 2.7.13, werkzeug 0.12.2, sqlalchemy 1.1.13, MySQLdb 1.2.3 and mysql Ver 14.14 Distrib 5.1.73!
Thanks for your help
It looks like the problem was MySQL-only:
By default, strict mode isn't activated, which allows incorrect inserts/updates to make changes in the database (wtf?). The solution is to change the sql_mode, either globally:
MySQL: Setting sql_mode permanently
Or in SQLAlchemy, as explained in this blog post:
https://www.enricozini.org/blog/2012/tips/sa-sqlmode-traditional/
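For the SQLAlchemy route, the usual approach (and roughly what the blog post does) is a connect-event listener that sets the mode on every new DBAPI connection; the engine URL below is a placeholder:

from sqlalchemy import create_engine, event

engine = create_engine('mysql+mysqldb://user:pass@localhost/mydb')

@event.listens_for(engine, 'connect')
def set_sql_mode(dbapi_connection, connection_record):
    # reject invalid data instead of silently coercing it
    cursor = dbapi_connection.cursor()
    cursor.execute("SET sql_mode = 'TRADITIONAL'")
    cursor.close()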
So I placed this question without too much context, and got downvoted, let's try again...
For one, I don't follow the logic behind SQLAlchemy's session.add. I understand that it queues the object for insertion, and I understand that session.query looks in the connected database rather than in the session, but is it at all possible, within SQLAlchemy, to query the session without first doing session.flush? My expectation from something which reads session.query is that it queries the session...
I am now manually looking in session.new after a None comes out of session.query().first().
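A minimal sketch of that workaround, using the Location model shown further down (my illustration, not the original program's code):

u = session.query(Location).filter_by(code=u'mario').first()
if u is None:
    # fall back to scanning the pending objects in the session
    u = next((obj for obj in session.new
              if isinstance(obj, Location) and obj.code == u'mario'),
             None)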
There are two reasons why I don't want to do session.flush before my session.query,
one based on efficiency fears (why should I write to the database, and query the database, if I am still within a session which the user may want to roll back?);
two is because I've adopted a fairly large program, and it manages to define its own Session whose instances cause flush to also commit.
So really the core of this question is who's helping me find an error in a GPL program on github!
This is a code snippet with a surprising behaviour in bauble/ghini:
# setting up things in ghini
# <replace-later>
import bauble
import bauble.db as db
db.open('sqlite:///:memory:', verify=False)
from bauble.prefs import prefs
import bauble.pluginmgr as pluginmgr
prefs.init()
prefs.testing = True
pluginmgr.load()
db.create(True)
Session = bauble.db.Session
from bauble.plugins.garden import Location
# </replace-later>
# now just plain straightforward usage
session = Session()
session.query(Location).delete()
session.commit()
u0 = session.query(Location).filter_by(code=u'mario').first()
print u0
u1 = Location(code=u'mario')
session.add(u1)
session.flush()
u2 = session.query(Location).filter_by(code=u'mario').one()
print u1, u2, u1==u2
session.rollback()
u3 = session.query(Location).filter_by(code=u'mario').first()
print u3
the output here is:
None
mario mario True
mario
Here you have what I think is just standard, simple code to set up a database:
from sqlalchemy import Column, Unicode
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Location(Base):
    __tablename__ = 'location'
    code = Column(Unicode(64), index=True, primary_key=True)

    def __init__(self, code=None):
        self.code = code

    def __repr__(self):
        return self.code

from sqlalchemy import create_engine
engine = create_engine('sqlite:///joindemo.db')
Base.metadata.create_all(engine)

from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine, autoflush=False)
with this, the output of the same above code snippet is less surprising:
None
mario mario True
None
The reason why flushes in bauble end up emitting a COMMIT is line 133 in db.py, where they handle their history table:
table.insert(dict(table_name=mapper.local_table.name,
                  table_id=instance.id, values=str(row),
                  operation=operation, user=user,
                  timestamp=datetime.datetime.today())).execute()
Instead of issuing the additional SQL in the event handler using the passed-in transactional connection, as they should, they execute the statement itself as is, which means it ends up using the engine as the bind (found through the table's metadata). Executing using the engine has autocommit behaviour. Since bauble always uses a SingletonThreadPool, there's just one connection per thread, and so that statement ends up committing the flushed changes as well. I wonder if this bug is the reason why bauble disables autoflush...
The fix is to change the event handling to use the transactional connection:
class HistoryExtension(orm.MapperExtension):
    """
    HistoryExtension is a
    :class:`~sqlalchemy.orm.interfaces.MapperExtension` that is added
    to all classes that inherit from bauble.db.Base so that all
    inserts, updates, and deletes made to the mapped objects are
    recorded in the `history` table.
    """

    def _add(self, operation, mapper, connection, instance):
        """
        Add a new entry to the history table.
        """
        ...  # a ton of code here
        table = History.__table__
        stmt = table.insert(dict(table_name=mapper.local_table.name,
                                 table_id=instance.id, values=str(row),
                                 operation=operation, user=user,
                                 timestamp=datetime.datetime.today()))
        connection.execute(stmt)

    def after_update(self, mapper, connection, instance):
        self._add('update', mapper, connection, instance)

    def after_insert(self, mapper, connection, instance):
        self._add('insert', mapper, connection, instance)

    def after_delete(self, mapper, connection, instance):
        self._add('delete', mapper, connection, instance)
It's worth a note that MapperExtension has been deprecated since version 0.7.
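For reference, a rough sketch of the modern equivalent using mapper-level events instead of the deprecated extension; _add here stands in for the same helper logic shown above:

from sqlalchemy import event
from sqlalchemy.orm import mapper

# Listening on the mapper() function targets all mappers; each hook
# receives the transactional connection, which _add uses for its insert.
@event.listens_for(mapper, 'after_insert')
def history_after_insert(mapper_obj, connection, target):
    _add('insert', mapper_obj, connection, target)

@event.listens_for(mapper, 'after_update')
def history_after_update(mapper_obj, connection, target):
    _add('update', mapper_obj, connection, target)

@event.listens_for(mapper, 'after_delete')
def history_after_delete(mapper_obj, connection, target):
    _add('delete', mapper_obj, connection, target)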
Regarding your views about the session I quote "Session Basics", which you really should read through:
In the most general sense, the Session establishes all conversations with the database and represents a “holding zone” for all the objects which you’ve loaded or associated with it during its lifespan. It provides the entrypoint to acquire a Query object, which sends queries to the database using the Session object’s current database connection, ...
and "Is the Session a cache?":
Yeee…no. It’s somewhat used as a cache, in that it implements the identity map pattern, and stores objects keyed to their primary key. However, it doesn’t do any kind of query caching. This means, if you say session.query(Foo).filter_by(name='bar'), even if Foo(name='bar') is right there, in the identity map, the session has no idea about that. It has to issue SQL to the database, get the rows back, and then when it sees the primary key in the row, then it can look in the local identity map and see that the object is already there. It’s only when you say query.get({some primary key}) that the Session doesn’t have to issue a query.
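In other words (a minimal sketch, assuming a mapped class Foo with primary key id):

foo = Foo(name='bar')
session.add(foo)
session.flush()  # INSERT emitted; foo now sits in the identity map

same = session.query(Foo).get(foo.id)                    # identity map hit, no SQL
also = session.query(Foo).filter_by(name='bar').first()  # always emits SQL first
assert same is foo and also is foo                       # same object either way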
So:
My expectation from something which reads session.query is that it queries the session...
Your expectations are wrong. The Session handles talking to the DB – among other things.
There are two reasons why I don't want to do session.flush before my session.query,
one based on efficiency fears (why should I write to the database, and query the database, if I am still within a session which the user may want to roll back?);
Because your DB may do validation, have triggers, and generate values for some columns – primary keys, timestamps, and the like. The data you thought you were inserting may end up as something else in the DB, and the Session has absolutely no way to know about that.
Also, why should SQLAlchemy implement a sort of an in-memory DB in itself, with its own query engine, and all the problems that come with synchronizing 2 databases? How would SQLAlchemy support all the different operations and functions of different DBs you query against? Your simple equality predicate example just scratches the surface.
When you rollback, you roll back the DB's transaction (along with the session's unflushed changes).
two is because I've adopted a fairly large program, and it manages to define its own Session whose instances cause flush to also commit.
Caused by the event handling bug.
I'm using Pony ORM version 0.7 with a Sqlite3 database on disk, and running into this issue: I am performing a select, then an update, then a select, then another update, and getting an error message of
pony.orm.core.UnrepeatableReadError: Value of Task.order_id for Task[23654] was updated outside of current transaction (was: 1, now: 2)
I've reduced the problem to the minimum set of commands that causes the problem (i.e. removing anything causes the problem not to occur):
@db_session
def test_method():
    tasks = list(map(Task.to_dict, Task.select()))
    db.execute("UPDATE Task SET order_id=order_id*2")
    task_to_move = select(task for task in Task if task.order_id == 2).first()
    task_to_move.order_id = 1
test_method()
For completeness's sake, here is the definition of Task:
class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int)
Also, if I remove the constraint task.order_id == 2 from my select, the problem no longer occurs, so I assume the problem has something to do with querying on a field that has changed since the transaction started, but I don't know why the error message is telling me that it was changed by a different transaction (unless maybe db.execute runs in a separate transaction because it is raw SQL?).
I've already looked at this similar question, but the problem was different (Pony ORM reports record "was updated outside of current transaction" while there is no other transaction), and at this documentation (https://docs.ponyorm.com/transactions.html), but neither solved my problem.
Any ideas what might be going on here?
Pony uses optimistic concurrency control by default. For each attribute, Pony remembers its current value (potentially modified by application code) as well as the original value that was read from the database. During an UPDATE, Pony checks that the value of the column in the database is still the same. If the value has changed, Pony assumes that some concurrent transaction did it, and throws an exception in order to avoid the "lost update" situation.
If you execute a raw SQL query, Pony does not know what exactly was modified in the database. So when Pony sees that the column value has changed, it mistakenly thinks that the value was changed by another transaction.
In order to avoid the problem you can mark the order_id attribute as volatile. Then Pony will assume that the value of the attribute can change at any time (by a trigger or a raw SQL update), and will exclude that attribute from the optimistic checks:
class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int, volatile=True)
Note that Pony will cache the value of a volatile attribute and will not re-read it from the database until the object is saved, so in some situations you can get an obsolete value in Python.
Update:
Starting from release 0.7.4 you can also specify the optimistic=False option to db_session to turn off optimistic checks for a specific transaction that uses raw SQL queries:
with db_session(optimistic=False):
    ...
or
@db_session(optimistic=False)
def some_function():
    ...
It is also now possible to specify the optimistic=False option for an attribute instead of volatile=True. Then Pony will not make optimistic checks for that attribute, but will still treat it as non-volatile.
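A sketch of that per-attribute variant, using the same Task entity as above:

class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    # no optimistic check on this column, but unlike volatile=True
    # Pony still treats the cached value as authoritative
    order_id = Required(int, optimistic=False)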
Why isn't a record being inserted? There is an id returned but when I check the database there is no new record.
From models.py
from zope.sqlalchemy import ZopeTransactionExtension
DBSession = scoped_session(sessionmaker(extension=ZopeTransactionExtension()))
And views.py
DBSession.execute(text('INSERT INTO (a,b,c) VALUES (\'a\',\'b\',\'c\') RETURNING id'), params=dict(a=a,b=b,c=c))
I've tried committing with transaction.commit(), which doesn't raise an error but doesn't insert a record either. result.fetchone()[0] does return an id.
And DBSession.commit, which gets
assert self.transaction_manager.get().status == ZopeStatus.COMMITTING, "Transaction must be committed using the transaction manager"
This is because you are not using the ORM to insert new rows, therefore the transaction doesn't know it should commit on its own, because the transaction state is not marked as dirty.
Place the following code after your DBSession.execute call in views.py.
from zope.sqlalchemy import mark_changed
session = DBSession()
session.execute(...your query...)
mark_changed(session)
At this point the transaction should be able to properly commit your query; alternatively, use the ORM to insert the new row.
Here is a bit more on this subject:
https://pypi.python.org/pypi/zope.sqlalchemy/0.7.4#id15
By default, zope.sqlalchemy puts sessions in an 'active' state when they are first used. ORM write operations automatically move the session into a 'changed' state. This avoids unnecessary database commits. Sometimes it is necessary to interact with the database directly through SQL. It is not possible to guess whether such an operation is a read or a write. Therefore we must manually mark the session as changed when manual SQL statements write to the DB.
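If you would rather avoid mark_changed altogether, the ORM route looks roughly like this (MyModel is a hypothetical mapped class with columns a, b, c):

obj = MyModel(a='a', b='b', c='c')
DBSession.add(obj)
DBSession.flush()  # emits the INSERT, populates obj.id, and the ZTE sees the ORM write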
Try DBSession.flush() after the execute.
I just started using SQLAlchemy and am getting a DetachedInstanceError, and I can't find much information on this anywhere. I am using the instance outside a session, so it is natural that SQLAlchemy is unable to load any relations if they are not already loaded; however, the attribute I am accessing is not a relation. In fact, this object has no relations at all. I found solutions such as eager loading, but I can't apply them because this is not a relation. I even tried "touching" this attribute before closing the session, but it still doesn't prevent the exception. What could be causing this exception for a non-relational property, even after it has been successfully accessed once before? Any help in debugging this issue is appreciated. I will meanwhile try to get a reproducible stand-alone scenario and update here.
Update: This is the actual exception message with a few stacks:
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/attributes.py", line 159, in __get__
return self.impl.get(instance_state(instance), instance_dict(instance))
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/attributes.py", line 377, in get
value = callable_(passive=passive)
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/state.py", line 280, in __call__
self.manager.deferred_scalar_loader(self, toload)
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/mapper.py", line 2323, in _load_scalar_attributes
(state_str(state)))
DetachedInstanceError: Instance <ReportingJob at 0xa41cd8c> is not bound to a Session; attribute refresh operation cannot proceed
The partial model looks like this:
metadata = MetaData()
ModelBase = declarative_base(metadata=metadata)

class ReportingJob(ModelBase):
    __tablename__ = 'reporting_job'
    job_id = Column(BigInteger, Sequence('job_id_sequence'), primary_key=True)
    client_id = Column(BigInteger, nullable=True)
And the field client_id is what is causing this exception with a usage like the below:
Query:
jobs = session \
    .query(ReportingJob) \
    .filter(ReportingJob.job_id == job_id) \
    .all()
if jobs:
    # FIXME(Hari): Workaround for the attribute getting lazy-loaded.
    jobs[0].client_id
    return jobs[0]
This is what triggers the exception later out of the session scope:
msg = msg + ", client_id: %s" % job.client_id
I found the root cause while trying to narrow down the code that caused the exception. I placed the same attribute access code at different places after session close and found that it definitely doesn't cause any issue immediately after the close of query session. It turns out the problem starts appearing after closing a fresh session that is opened to update the object. Once I understood that the state of the object is unusable after a session close, I was able to find this thread that discussed this same issue. Two solutions that come out of the thread are:
Keep a session open (which is obvious)
Specify expire_on_commit=False to sessionmaker().
The 3rd option is to manually set expire_on_commit to False on the session once it is created, something like: session.expire_on_commit = False. I verified that this solves my issue.
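A minimal sketch of the second and third options, assuming the usual engine setup:

from sqlalchemy.orm import sessionmaker

Session = sessionmaker(bind=engine, expire_on_commit=False)  # option 2

session = Session()
session.expire_on_commit = False  # option 3: override on an existing session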
We were getting similar errors, even with expire_on_commit set to False. In the end it was actually caused by having two sessionmakers that were both getting used to make sessions in different requests. I don't really understand what was going on, but if you see this exception with expire_on_commit=False, make sure you don't have two sessionmakers initialized.
I had a similar problem with the DetachedInstanceError: Instance <> is not bound to a Session;
The situation was quite simple: I pass the session and the record to be updated to my function, and it merges the record and commits it to the database. In the first sample I would get the error, as I was lazy and thought that I could just return the merged object so my operating record would be updated (i.e. its is_modified value would be false). It did return the updated record, and is_modified was now false, but subsequent uses threw the error. I think this was compounded by related child records, but I'm not entirely sure of that.
def EditStaff(self, session, record):
    try:
        r = session.merge(record)
        session.commit()
        return r
    except:
        return False
After much googling and reading about sessions etc, I realized that since I had captured the instance r before the commit and returned it, when that same record was sent back to this function for another edit/commit it had lost its session.
So to fix this I just query the database for the record just updated and return it to keep it in session and mark its is_modified value back to false.
def EditStaff(self, session, record):
    try:
        session.merge(record)
        session.commit()
        r = self.GetStaff(session, record)
        return r
    except:
        return False
Setting the expire_on_commit=False also avoided the error as mentioned above, but I don't think it actually addresses the error, and could lead to many other issues IMO.
To throw my cause & solution into the ring: I use flask and flask-sqlalchemy to manage all my session stuff. This is fine when I'm doing things via the site, but when doing things via the command line and scripts, you have to ensure that anything that's doing flask-y things does it within the flask app context.
So, in my situation, I needed to get things from a database (using flask-sqlalchemy), then render them to templates (using flask's render_template), then email them (using flask-mail).
In code, what I'd done was something like,
def render_obj(db_obj):
    with app.app_context():
        return render_template('template_for_my_db_obj.html', db_obj=db_obj)

def get_renders():
    my_db_objs = MyDbObj.query.all()
    renders = []
    for day, _db_objs in itertools.groupby(my_db_objs, MyDbObj.get_date):
        renders.extend(list(map(render_obj, _db_objs)))
    return renders

def email_report():
    renders = get_renders()
    report = '\n'.join(renders)
    with app.app_context():
        mail.send(Message('Subject', ['me@me.com'], html=report))
(this is basically pseudocode, I was doing other things in the grouping section)
And when I was running, I'd get through the first _db_obj, but then I'd get the error on any run after.
The culprit? with app.app_context().
Basically it does a few things when you come out of that context, including kinda freshening up the db connections. One of the things that comes from that is getting rid of the last session that was around, which was the session that all the my_db_objs were associated with.
There's a few different options for solutions, but I went with a variant of,
def render_obj(db_obj):
    return render_template('template_for_my_db_obj.html', db_obj=db_obj)

def get_renders():
    my_db_objs = MyDbObj.query.all()
    renders = []
    for day, _db_objs in itertools.groupby(my_db_objs, MyDbObj.get_date):
        renders.extend(list(map(render_obj, _db_objs)))
    return renders

def email_report():
    with app.app_context():
        renders = get_renders()
        report = '\n'.join(renders)
        mail.send(Message('Subject', ['me@me.com'], html=report))
Only one with app.app_context(), which wraps them all. The main thing you need to do (if you have a setup like mine) is ensure that any DB object you're using stays "inside" whatever app_context you're using. If you do what I did in the first iteration, all your DB objects will lose their session, ending in DetachedInstanceError like me.
My solution was a simple oversight:
I created an object, added and committed it to the db, and after that I tried to access one of the original object's attributes without refreshing the session via session.refresh(object).
user = UserFactory()
session.add(user)
session.commit()
# missing session.refresh(user) and causing the problem
print(user.name)
As for me (a newbie), I made a mistake with the indentation and closed the session inside my loop, in which I loop over each row, do some operations and commit each time.
So for those newbies like me: check your code before setting things like expire_on_commit=False, it may lead you into another trap.
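A sketch of the kind of slip described above; the loop body and helper are hypothetical:

# Bug: closing the session inside the loop detaches every object
# loaded so far, so later iterations raise DetachedInstanceError.
session = Session()
for row in rows:
    process(row)
    session.commit()
    session.close()  # <- wrongly indented into the loop

# Fix: close once, after the loop has finished.
session = Session()
for row in rows:
    process(row)
    session.commit()
session.close()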
My solution to this error was also a simple oversight, which I don't think any of the other answers cover.
My function is fetching object x, modifying it, then returning the original x, because I would like the older version.
Before committing and returning x, I was calling expunge_all, but it was "too late", as the object was already marked dirty.
The solution was simply to expunge the object as early as possible.
# pseudo code
x = session.fetch_x()
# adding the following line fixed it
session.expunge(x)
y = session.update(x)
return x
I had a similar problem in my current project and this fix worked for me. Check your DB relationships for the option lazy=True and change it to lazy='dynamic'.
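For illustration, a hypothetical model showing the change; Parent and Child stand in for your own mapped classes:

from sqlalchemy import Column, Integer, ForeignKey
from sqlalchemy.orm import relationship

class Parent(Base):
    __tablename__ = 'parent'
    id = Column(Integer, primary_key=True)
    # lazy='dynamic' makes the relationship return a Query object
    # instead of loading the rows into instance state up front
    children = relationship('Child', lazy='dynamic')

class Child(Base):
    __tablename__ = 'child'
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey('parent.id'))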