I have a SQLAlchemy Session object and would like to know whether it is dirty or not. The exact question what I would like to (metaphorically) ask the Session is: "If at this point I issue a commit() or a rollback(), the effect on the database is the same or not?".
The rationale is this: I want to ask the user wether he wants or not to confirm the changes. But if there are no changes, I would like not to ask anything. Of course I may monitor myself all the operations that I perform on the Session and decide whether there were modifications or not, but because of the structure of my program this would require some quite involved changes. If SQLAlchemy already offered this opportunity, I'd be glad to take advantage of it.
Thanks everybody.
you're looking for a net count of actual flushes that have proceeded for the whole span of the session's transaction; while there are some clues to whether or not this has happened (called the "snapshot"), this structure is just to help with rollbacks and isn't strong referencing. The most direct route to this would be to track "after_flush" events, since this event only emits if flush were called and also that the flush found state to flush:
from sqlalchemy import event
import weakref
transactions_with_flushes = weakref.WeakSet()
#event.listens_for(Session, "after_flush")
def log_transaction(session, flush_context):
for trans in session.transaction._iterate_parents():
transactions_with_flushes.add(trans)
def session_has_pending_commit(session):
return session.transaction in transactions_with_flushes
edit: here's an updated version that's a lot simpler:
from sqlalchemy import event
#event.listens_for(Session, "after_flush")
def log_transaction(session, flush_context):
session.info['has_flushed'] = True
def session_has_pending_commit(session):
return session.info.get('has_flushed', False)
The session has a dirty attribute
session.dirty
persistent objects which currently have changes detected
(this collection is now created on the fly each time the property is called)
sqlalchemy.orm.session.Session.dirty
Here is my solution based on #zzzeek's answer and updated comment. I've unit tested it and it seems to play well with rollbacks (a session is clean after issuing a rollback):
from sqlalchemy import event
from sqlalchemy.orm import Session
#event.listens_for(Session, "after_flush")
def log_flush(session, flush_context):
session.info['flushed'] = True
#event.listens_for(Session, "after_commit")
#event.listens_for(Session, "after_rollback")
def reset_flushed(session):
if 'flushed' in session.info:
del session.info['flushed']
def has_uncommitted_changes(session):
return any(session.new) or any(session.deleted) \
or any([x for x in session.dirty if session.is_modified(x)]) \
or session.info.get('flushed', False)
Sessions have a private _is_clean() member which seems to return true if there is nothing to flush to the database. However, the fact that it is private may mean it's not suitable for external use. I'd stop short of personally recommending this, since any mistake here could obviously result in data loss for your users.
Related
The Problem
I am using a pattern to cache SQLAlchemy objects in Redis. Whenever an instance is modified and committed, I want to clear the relevant caches so that it will be reloaded when next fetched. This clear must happen after commit to avoid race conditions (another thread querying cache, missing, and reloading stale data into the cache).
I have fought with this for a long time, coming up with various solutions that work sometimes but nothing bulletproof. This seems like a straightforward enough problem that there should be a solution to it. How do I trigger some code every time a change is committed to a SQLAlchemy instance?
What I've Tried
Events
I've tried to stitch together some SQLAlchemy events to achieve my goal with varying levels of success. Listening to "after_insert" and "after_update" will tell me when an object is modified, and "after_commit" tells me that whatever was modified was saved, so I had a scheme where the first two events would register listeners for "after_commit", which would in turn pass the object to my cache clearing function. Like this:
def _register_after_commit(_: Mapper, __: Connection, target: MyClass) -> None:
""" Generic callback that adds this function for a target change without params """
targets.add(target) # Clear cache uses this set to know which instances to clear
event.listen(get_session(), "after_commit", clear_cache)
event.listen(MyClass, "after_insert", _register_after_commit)
event.listen(MyClass, "after_update", _register_after_commit)
This works most of the time, but I occasionally get DetachedInstanceError when accessing the attributes on the target that I need to know to clear them from cache (e.g. id). I've read that this happens because of automatic expiring during the commit which causes SQLAlchemy to want to refresh all attributes. I can't turn off auto-expiring nor can I expunge every object that passes through here, either of those could end up breaking other pieces of the code base.
A Custom Session
I made my own session class that looked something like this:
class SessionWithCallback(scoped_session):
""" A version of orm.Session which can call a method after commit completes """
def __init__(self, session_factory, scopefunc = None) -> None:
super().__init__(session_factory=session_factory, scopefunc=scopefunc)
self._callbacks = {}
def add_callback(self, func, *args, **kwargs) -> None:
"""
Adds a callback to be called after commit, ensuring only a single
instance of the callback for each set of args and kwargs is used
"""
key = f"{func}.{args}.{kwargs}"
self._callbacks[key] = (func, args, kwargs)
def run_callbacks(self) -> None:
"""
Executes all callbacks
"""
for (func, args, kwargs) in self._callbacks.values():
func(*args, **kwargs)
self._callbacks = {}
def commit(self) -> None:
""" Flush and commit the current transaction """
super().commit()
self.run_callbacks()
Then instead of _register_after_commit using the "after_commit" event, it would call the current session's add_callback function. This seemed to work when running tests with just SQLAlchemy, but it fell apart when integrating with a Flask app which uses these models and utilizes Flask-SQLAlchemy. I followed instructions to customize session (overriding create_session on the SQLAlchemy instance) but as soon as I commit anything I get an exception that scoped_session has no attribute add_callback. I stepped through and it is using my class internally somehow, but the session it gives me is not an instance of my class. Confusing.
I've Considered
Storing primary keys in my listeners, then requiring the callbacks to open a session and query for a new instance itself if it needs more info. Might work, but it feels like extra I/O that I shouldn't need. I could have several different callbacks for one instance, them all querying feels like a lot of work.
Having some global place to store callbacks instead of on the Session, so that I can avoid the add_callback function. I'd need to still make this session-specific and thread-safe though. Easy enough in Flask, but Flask isn't the only app that needs to share this code.
Just doing these cache clears manually... but that's bound to cause developer error.
Spawning some time-delayed job to clear cache from "after_insert/update". That gets really complicated really fast and sounds like a real headache. For instance, how do you decide how long to wait?
I have a function which is registered as an event on a sqlalchemy model, as show in the code snippets below (not fully-functional as I don't show the db fixture), which should be enough to explain the problem.
root/myapp/models.py:
class MyModel:
id = Column(UUID, primary_key=True)
value = ''
#classmethod
def register_hook(cls, hook_fn):
event.listen(cls, "after_update", hook_fn, propagate=True)
root/myapp/app.py:
from models import MyModel
def hook_fn(mapper, connection, target):
print('fired hook!')
MyModel.register_hook(hook_fn)
root/test/conftest.py:
#pytest.fixture
def patched_hook_fn(mocker):
with mocker.patch("root.myapp.app.hook_fn") as patched:
yield patched
root/test/tests.py:
def test_hook_fires_on_change(db, patched_hook_fn):
model = MyModel(value="initial")
db.session.commit()
model.value = "changed"
db.session.commit() # hook fires here
assert patched_hook_fn.called # assert fails
What I'd like to know is:
Why doesn't the patched function get called?
Is there a simple way in a debug session to see where I should be patching in the with mocker.patch("myapp.app.hook_fn") as patched line?
It doesn't get called because you've already registered the unpatched version with the event system. SQLAlchemy does not read the value at root.myapp.app.hook_fn every time the event is fired, so even if you later set root.myapp.app.hook_fn = some_other_function (which is what patch is doing), it has no visible effect.
The way to fix this is to simply force your app to read the value every time the event is fired, by introducing a level of indirection:
MyModel.register_hook(lambda: hook_fn())
This takes advantage of the way Python resolves identifiers in a closure, where changing root.myapp.app.hook_fn actually changes the value of hook_fn in the closure.
As for your second question, there's no straightforward way to figure out what you need to patch because in order to patch it directly you need to figure out where it is stored in the internals of SQLAlchemy, and depending on that, even in your tests, is quite fragile.
Is there any way to explicitly mark an object as clean in the SQLAlchemy ORM?
This is related partly to a previous question on bulk update strategies.
I want to, within a before_flush event listener mark a bunch of object as actually not needing to be flushed. This is due to them being manually synced with the database by other means.
I have tried the strategy below, but it results in the object being removed from the session, which then can cause problems later when a lazy load happens.
#event.listens_for(SignallingSession, 'before_flush')
def before_flush(session, flush_context, instances):
ledgers = []
if session.dirty:
for elem in session.dirty:
if ( session.is_modified(elem, include_collections=False) ):
if isinstance(elem, Wallet):
session.expunge(elem) # causes problems later
ledgers.append(Ledger(id=elem.id, amount=elem.balance))
if ledgers:
session.bulk_save_objects(ledgers)
session.execute('UPDATE wallet w JOIN ledger l on w.id = l.id SET w.balance = l.amount')
session.execute('TRUNCATE ledger')
I want to do something like:
session.dirty.remove(MyObject)
But that doesn't work as session.dirty is a computed property, not a regular attribute. I've been digging around the instrumentation code, but can't see how I might fool the dirty list to not contain something. I see there is also a history on the object state that will need taking care of as well.
Any ideas? The underlying database is MySQL if that makes any difference.
-Matt
When you modify the database outside of the ORM, you can let the ORM know the current database state by using set_committed_value().
Example:
wallet = session.query(Wallet).filter_by(id=123)
wallet.balance = 0
session.execute("UPDATE wallet SET balance = 0 WHERE id = 123;")
set_committed_value(wallet, "balance", 0)
session.commit() # won't issue additional SQL to update wallet
If you really wanted to mark the instance as not dirty, you can muck with the internals of SQLAlchemy:
state = inspect(p)
session.identity_map._modified.discard(state)
state.modified = False
print(p in session.dirty) # False
Let me summarize this insanity.
from sqlalchemy.orm import attributes
attributes.instance_state(your_object).committed_state.clear()
Easy. (no)
If I have an object retrieved from a model, for example:
obj = Foo.objects.first()
I know that if I want to reference this object later and make sure that it has the current values from the database, I can call:
obj.refresh_from_db()
My question is, is there any advantage to using the refresh_from_db() method over simply doing?:
obj = Foo.objects.get(id=obj.id)
As far as I know, the result will be the same. refresh_from_db() seems more explicit, but in some cases it means an extra line of code. Lets say I update the value field for obj and later want to test that it has been updated to False. Compare:
obj = Foo.objects.first()
assert obj.value is True
# value of foo obj is updated somewhere to False and I want to test below
obj.refresh_from_db()
assert obj.value is False
with this:
obj = Foo.objects.first()
assert obj.value is True
# value of foo obj is updated somewhere to False and I want to test below
assert Foo.objects.get(id=obj.id).value is False
I am not interested in a discussion of which of the two is more pythonic. Rather, I am wondering if one method has a practical advantage over the other in terms of resources, performance, etc. I have read this bit of documentation, but I was not able to ascertain from that whether there is an advantage to using reload_db(). Thank you!
Django sources are usually relatively easy to follow. If we look at the refresh_from_db() implementation, at its core it is still using this same Foo.objects.get(id=obj.id) approach:
db_instance_qs = self.__class__._default_manager.using(db).filter(pk=self.pk)
...
db_instance_qs = db_instance_qs.only(*fields)
...
db_instance = db_instance_qs.get()
Only there are couple extra bells and whistles:
deferred fields are ignored
stale foreign key references are cleared (according to the comment explanation)
So for everyday usage it is safe to say that they are pretty much the same, use whatever you like.
Just to add to #serg's answer, there's a case where explicitly re-fetching from the db is helpful and refreshing from the db isn't so much useful.
This is the case when you're adding permissions to an object and checking them immediately afterwards, and you need to clear the cached permissions for the object so that your permission checks work as expected.
According to the permission caching section of the django documentation:
The ModelBackend caches permissions on the user object after the first time they need to be fetched for a permissions check. This is typically fine for the request-response cycle since permissions aren’t typically checked immediately after they are added (in the admin, for example). If you are adding permissions and checking them immediately afterward, in a test or view for example, the easiest solution is to re-fetch the user from the database...
For an example, consider this block of code inspired by the one in the documentation cited above:
from django.contrib.auth import get_user_model
from django.contrib.auth.models import Permission
from django.contrib.contenttypes.models import ContentType
from smoothies.models import Smoothie
def force_deblend(user, smoothie):
# Any permission check will cache the current set of permissions
if not user.has_perm('smoothies.deblend_smoothie'):
permission = Permission.objects.get(
codename='deblend_smoothie',
content_type=ContentType.objects.get_for_model(Smoothie)
)
user.user_permissions.add(permission)
# Subsequent permission checks hit the cached permission set
print(user.has_perm('smoothies.deblend_smoothie')) # False
# Re-fetch user (explicitly) from db to clear permissions cache
# Be aware that user.refresh_from_db() won't help here
user = get_user_model().objects.get(pk=user.pk)
# Permission cache is now repopulated from the database
print(user.has_perm('smoothies.deblend_smoothie')) # True
...
...
It seems there is a difference if you use cached properties.
See here:
p.roles[0]
<Role: 16888649>
p.refresh_from_db()
p.roles[0]
<Role: 16888649>
p = Person.objects.get(id=p.id)
p.roles[0]
<Role: 16888650>
Definition from models.py:
#cached_property
def roles(self):
return Role.objects.filter(employer__person=self).order_by("id")
I just started using SQLAlchemy and get a DetachedInstanceError and can't find much information on this anywhere. I am using the instance outside a session, so it is natural that SQLAlchemy is unable to load any relations if they are not already loaded, however, the attribute I am accessing is not a relation, in fact this object has no relations at all. I found solutions such as eager loading, but I can't apply to this because this is not a relation. I even tried "touching" this attribute before closing the session, but it still doesn't prevent the exception. What could be causing this exception for a non-relational property even after it has been successfully accessed once before? Any help in debugging this issue is appreciated. I will meanwhile try to get a reproducible stand-alone scenario and update here.
Update: This is the actual exception message with a few stacks:
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/attributes.py", line 159, in __get__
return self.impl.get(instance_state(instance), instance_dict(instance))
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/attributes.py", line 377, in get
value = callable_(passive=passive)
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/state.py", line 280, in __call__
self.manager.deferred_scalar_loader(self, toload)
File "/home/hari/bin/lib/python2.6/site-packages/SQLAlchemy-0.6.1-py2.6.egg/sqlalchemy/orm/mapper.py", line 2323, in _load_scalar_attributes
(state_str(state)))
DetachedInstanceError: Instance <ReportingJob at 0xa41cd8c> is not bound to a Session; attribute refresh operation cannot proceed
The partial model looks like this:
metadata = MetaData()
ModelBase = declarative_base(metadata=metadata)
class ReportingJob(ModelBase):
__tablename__ = 'reporting_job'
job_id = Column(BigInteger, Sequence('job_id_sequence'), primary_key=True)
client_id = Column(BigInteger, nullable=True)
And the field client_id is what is causing this exception with a usage like the below:
Query:
jobs = session \
.query(ReportingJob) \
.filter(ReportingJob.job_id == job_id) \
.all()
if jobs:
# FIXME(Hari): Workaround for the attribute getting lazy-loaded.
jobs[0].client_id
return jobs[0]
This is what triggers the exception later out of the session scope:
msg = msg + ", client_id: %s" % job.client_id
I found the root cause while trying to narrow down the code that caused the exception. I placed the same attribute access code at different places after session close and found that it definitely doesn't cause any issue immediately after the close of query session. It turns out the problem starts appearing after closing a fresh session that is opened to update the object. Once I understood that the state of the object is unusable after a session close, I was able to find this thread that discussed this same issue. Two solutions that come out of the thread are:
Keep a session open (which is obvious)
Specify expire_on_commit=False to sessionmaker().
The 3rd option is to manually set expire_on_commit to False on the session once it is created, something like: session.expire_on_commit = False. I verified that this solves my issue.
We were getting similar errors, even with expire_on_commit set to False. In the end it was actually caused by having two sessionmakers that were both getting used to make sessions in different requests. I don't really understand what was going on, but if you see this exception with expire_on_commit=False, make sure you don't have two sessionmakers initialized.
I had a similar problem with the DetachedInstanceError: Instance <> is not bound to a Session;
The situation was quite simple, I pass the session and the record to be updated to my function and it would merge the record and commit it to the database. In the first sample I would get the error, as I was lazy and thought that I could just return the merged object so my operating record would be updated (ie its is_modified value would be false). It did return the updated record and is_modified was now false but subsequent uses threw the error. I think this was compounded because of related child records but not entirely sure of that.
def EditStaff(self, session, record):
try:
r = session.merge(record)
session.commit()
return r
except:
return False
After much googling and reading about sessions etc, I realized that since I had captured the instance r before the commit and returned it, when that same record was sent back to this function for another edit/commit it had lost its session.
So to fix this I just query the database for the record just updated and return it to keep it in session and mark its is_modified value back to false.
def EditStaff(self, session, record):
try:
session.merge(record)
session.commit()
r = self.GetStaff(session, record)
return r
except:
return False
Setting the expire_on_commit=False also avoided the error as mentioned above, but I don't think it actually addresses the error, and could lead to many other issues IMO.
To throw my cause & solution into the ring, I use flask and flask-sqlalchemy to manage all my session stuff. This is fine when I'm doing things via the site, but when doing things via command line and scripts, you have to ensure that anything that's doing flask-y things has to do it with the flask context.
So, in my situation, I needed to get things from a database (using flask-sqlalchemy), then render them to templates (using flask's render_template), then email them (using flask-mail).
In code, what I'd done was something like,
def render_obj(db_obj):
with app.app_context():
return render_template('template_for_my_db_obj.html', db_obj=db_obj
def get_renders():
my_db_objs = MyDbObj.query.all()
renders = []
for day, _db_objs in itertools.groupby(my_db_objs, MyDbObj.get_date):
renders.extend(list(map(render_obj, _db_obj)))
return renders
def email_report():
renders = get_renders()
report = '\n'.join(renders)
with app.app_context():
mail.send(Message('Subject', ['me#me.com'], html=report))
(this is basically pseudocode, I was doing other things in the grouping section)
And when I was running, I'd get through the first _db_obj, but then I'd get the error on any run after.
The culprit? with app.app_context().
Basically it does a few things when you come out of that context, including kinda freshening up the db connections. One of the things that comes from that is getting rid of the last session that was around, which was the session that all the my_db_objs were associated with.
There's a few different options for solutions, but I went with a variant of,
def render_obj(db_obj):
return render_template('template_for_my_db_obj.html', db_obj=db_obj
def get_renders():
my_db_objs = MyDbObj.query.all()
renders = []
for day, _db_objs in itertools.groupby(my_db_objs, MyDbObj.get_date):
renders.extend(list(map(render_obj, _db_obj)))
return renders
def email_report():
with app.app_context():
renders = get_renders()
report = '\n'.join(renders)
mail.send(Message('Subject', ['me#me.com'], html=report))
Only 1 with app.app_context() which wraps them all. The main thing you need to do (if you've a setup like mine) is ensure any dB object you're using to be "inside" any app_context you're using. If you do what I did in the first iteration, all your dB objects will lose their session, ending in DetachedInstanceError like me.
My solution was a simple oversight;
I created an object, added and ,committed it to the db and after that I tried to access on of the original object attributes without refreshing session session.refresh(object)
user = UserFactory()
session.add(user)
session.commit()
# missing session.refresh(user) and causing the problem
print(user.name)
As for me (newbie), I made a mistake on the indent and close the session inside my loop, in which I loop each row, do some operation and commit each time.
So for those newbie like me, check your code before setting things like expire_on_commit=False, it may lead your to another trap.
My solution to this error was also a simple oversight, which I don't think any of the other answers cover.
My function is fetching object x, modifying it, then returning the original x, because I would like the older version.
Before committing and returning x, I was calling expunge_all, but it was "too late", as the object was already marked dirty.
The solution was simply to expunge the object as early as possible.
# pseudo code
x = session.fetch_x()
# adding the following line fixed it
session.expunge(x)
y = session.update(x)
return x
I have a similar problem in my current project and this fix works for me. Please check in your DB relationship for options lazy=True and change it to lazy='dynamic'.