Is there any way to explicitly mark an object as clean in the SQLAlchemy ORM?
This is related partly to a previous question on bulk update strategies.
I want to, within a before_flush event listener, mark a bunch of objects as not actually needing to be flushed, because they have been manually synced with the database by other means.
I have tried the strategy below, but it results in the object being removed from the session, which then can cause problems later when a lazy load happens.
@event.listens_for(SignallingSession, 'before_flush')
def before_flush(session, flush_context, instances):
    ledgers = []
    if session.dirty:
        for elem in session.dirty:
            if session.is_modified(elem, include_collections=False):
                if isinstance(elem, Wallet):
                    session.expunge(elem)  # causes problems later
                    ledgers.append(Ledger(id=elem.id, amount=elem.balance))
    if ledgers:
        session.bulk_save_objects(ledgers)
        session.execute('UPDATE wallet w JOIN ledger l ON w.id = l.id SET w.balance = l.amount')
        session.execute('TRUNCATE ledger')
I want to do something like:
session.dirty.remove(MyObject)
But that doesn't work, as session.dirty is a computed property, not a regular attribute. I've been digging around the instrumentation code, but can't see how I might fool the dirty list into not containing something. I see there is also a history on the object state that will need taking care of as well.
Any ideas? The underlying database is MySQL if that makes any difference.
-Matt
When you modify the database outside of the ORM, you can let the ORM know the current database state by using set_committed_value().
Example:
from sqlalchemy.orm.attributes import set_committed_value

wallet = session.query(Wallet).filter_by(id=123).one()
wallet.balance = 0
session.execute("UPDATE wallet SET balance = 0 WHERE id = 123;")
set_committed_value(wallet, "balance", 0)
session.commit()  # won't issue additional SQL to update wallet
If you really want to mark the instance as not dirty, you can muck with the internals of SQLAlchemy:
from sqlalchemy import inspect

state = inspect(p)  # p is the instance you want to mark as not dirty
session.identity_map._modified.discard(state)
state.modified = False
print(p in session.dirty)  # False
Let me summarize this insanity.
from sqlalchemy.orm import attributes
attributes.instance_state(your_object).committed_state.clear()
Easy. (no)
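A quick hedged demonstration of what that buys you, continuing from the import above; the wallet object and session are illustrative. Clearing committed_state empties the attribute change history, so both is_modified() and the next flush should treat the object as clean:

wallet.balance = 0
assert session.is_modified(wallet)       # the change is tracked

attributes.instance_state(wallet).committed_state.clear()
assert not session.is_modified(wallet)   # history is gone
session.flush()                          # should no longer emit an UPDATE for wallet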
I'm using Pony ORM version 0.7 with an SQLite3 database on disk, and running into this issue: I perform a select, then an update, then a select, then another update, and get this error message:
pony.orm.core.UnrepeatableReadError: Value of Task.order_id for Task[23654] was updated outside of current transaction (was: 1, now: 2)
I've reduced the problem to the minimum set of commands that reproduces it (removing any of them makes the problem go away):
@db_session
def test_method():
    tasks = list(map(Task.to_dict, Task.select()))
    db.execute("UPDATE Task SET order_id=order_id*2")
    task_to_move = select(task for task in Task if task.order_id == 2).first()
    task_to_move.order_id = 1

test_method()
For completeness's sake, here is the definition of Task:
class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int)
Also, if I remove the constraint task.order_id == 2 from my select, the problem no longer occurs, so I assume it has something to do with querying on a field that has been changed since the transaction started. But I don't know why the error message says the value was changed by a different transaction (unless db.execute is executing in a separate transaction because it is raw SQL?).
I've already looked at this similar question, but the problem was different (Pony ORM reports record "was updated outside of current transaction" while there is not other transaction) and at this documentation (https://docs.ponyorm.com/transactions.html) but neither solved my problem.
Any ideas what might be going on here?
Pony uses optimistic concurrency control by default. For each attribute, Pony remembers its current value (potentially modified by application code) as well as the original value that was read from the database. During an UPDATE, Pony checks that the value of the column in the database is still the same. If the value has changed, Pony assumes that some concurrent transaction changed it and throws an exception in order to avoid a "lost update" situation.
If you execute a raw SQL query, Pony does not know what exactly was modified in the database. So when Pony sees that the order_id value has changed, it mistakenly thinks that the value was changed by another transaction.
In order to avoid the problem, you can mark the order_id attribute as volatile. Then Pony will assume that the value of the attribute can change at any time (by a trigger or a raw SQL update) and will exclude that attribute from optimistic checks:
class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int, volatile=True)
Note that Pony will cache the value of a volatile attribute and will not re-read it from the database until the object is saved, so in some situations you can get an obsolete value in Python.
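A minimal sketch of that caveat, reusing the Task entity from the question: the raw UPDATE changes the row, but the already-loaded object keeps its cached value:

with db_session:
    task = Task.select().first()
    db.execute("UPDATE Task SET order_id = order_id * 2")
    print(task.order_id)  # may still print the old, cached value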
Update:
Starting from release 0.7.4, you can also pass the optimistic=False option to db_session to turn off optimistic checks for a specific transaction that uses raw SQL queries:
with db_session(optimistic=False):
    ...

or

@db_session(optimistic=False)
def some_function():
    ...
Also, it is now possible to specify the optimistic=False option for an attribute instead of volatile=True. Then Pony will not perform optimistic checks for that attribute, but will still treat it as non-volatile.
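A minimal sketch of the attribute-level option, applied to the Task entity from the question:

class Task(db.Entity):
    text = Required(unicode)
    heading = Required(int)
    create_timestamp = Required(datetime)
    done_timestamp = Optional(datetime)
    order_id = Required(int, optimistic=False)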
I'm inserting/updating objects into a MySQL database using the peewee ORM for Python. I have a model like this:
class Person(Model):
    person_id = CharField(primary_key=True)
    name = CharField()
I create the objects/rows with a loop, and each time through the loop I have a dictionary like:
pd = {"name":"Alice","person_id":"A123456"}
Then I try creating an object and saving it.
po = Person()
for key, value in pd.items():
    setattr(po, key, value)
po.save()
This takes a while to execute, and runs without errors, but it doesn't save anything to the database -- no records are created.
This works:
Person.create(**pd)
But it also throws an error (and terminates the script) when the primary key already exists. From reading the manual, I thought save() was the function I needed -- that peewee would perform the update or insert as required.
Not sure what I need to do here -- try getting each record first? Catch errors and try updating a record if it can't be created? I'm new to peewee, and would normally just write INSERT ... ON DUPLICATE KEY UPDATE or even REPLACE.
po.save(force_insert=True)
It's documented: http://docs.peewee-orm.com/en/latest/peewee/models.html#non-integer-primary-keys-composite-keys-and-other-tricks
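In context, a short sketch using the pd dict from the question: because person_id is a non-auto-incrementing primary key that is already set, a plain save() issues an UPDATE, and force_insert makes peewee issue an INSERT instead.

po = Person(**pd)
po.save(force_insert=True)  # INSERT, even though the primary key is already set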
I've had a chance to re-test my answer, and I think it should be replaced. Here's the pattern I can now recommend: first, use get_or_create() on the model, which will create the database row if it doesn't exist. Then, if it was not created (i.e. the object was retrieved from the database instead), set all the attributes from the data dictionary and save the object.
po, created = Person.get_or_create(person_id=pd["person_id"], defaults=pd)
if not created:
    for key in pd:
        setattr(po, key, pd[key])
    po.save()
As before, I should mention that these are two distinct transactions, so this should not be used with multi-user databases requiring a true upsert in one transaction.
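Wrapping both steps in a single peewee transaction at least narrows that window, though it is still not a true single-statement upsert. A hedged sketch, assuming db is the peewee Database instance behind Person:

with db.atomic():
    po, created = Person.get_or_create(person_id=pd["person_id"], defaults=pd)
    if not created:
        for key in pd:
            setattr(po, key, pd[key])
        po.save()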
I think you might try get_or_create()? http://peewee.readthedocs.org/en/latest/peewee/querying.html#get-or-create
You may do something like:
po = Person()
for key, value in pd.items():
    setattr(po, key, value)
updated = po.save()  # save() returns the number of rows modified
if not updated:
    # no existing row was updated, so insert one instead
    po.save(force_insert=True)
I have a SQLAlchemy Session object and would like to know whether it is dirty or not. The exact question I would like to (metaphorically) ask the Session is: "If at this point I issue a commit() or a rollback(), is the effect on the database the same or not?"
The rationale is this: I want to ask the user whether or not to confirm the changes. But if there are no changes, I would like not to ask anything. Of course I could monitor all the operations I perform on the Session myself and decide whether there were modifications, but because of the structure of my program this would require some quite involved changes. If SQLAlchemy already offered this capability, I'd be glad to take advantage of it.
Thanks everybody.
You're looking for a net count of actual flushes that have proceeded over the whole span of the session's transaction. While there are some clues to whether or not this has happened (a structure called the "snapshot"), it exists only to help with rollbacks and isn't strongly referenced. The most direct route is to track "after_flush" events, since that event is only emitted if flush was called and the flush actually found state to flush:
from sqlalchemy import event
from sqlalchemy.orm import Session
import weakref

transactions_with_flushes = weakref.WeakSet()

@event.listens_for(Session, "after_flush")
def log_transaction(session, flush_context):
    for trans in session.transaction._iterate_parents():
        transactions_with_flushes.add(trans)

def session_has_pending_commit(session):
    return session.transaction in transactions_with_flushes
edit: here's an updated version that's a lot simpler:
from sqlalchemy import event
from sqlalchemy.orm import Session

@event.listens_for(Session, "after_flush")
def log_transaction(session, flush_context):
    session.info['has_flushed'] = True

def session_has_pending_commit(session):
    return session.info.get('has_flushed', False)
The session has a dirty attribute:

session.dirty

The docs for sqlalchemy.orm.session.Session.dirty describe it as "persistent objects which currently have changes detected" (this collection is now created on the fly each time the property is called).
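Note that session.dirty is intentionally liberal: it also lists objects whose attributes were merely set, even to their existing values. A small hedged sketch pairing it with is_modified() and the new/deleted collections for a practical "anything to commit?" check; it does not catch changes already flushed earlier in the transaction, which is what the event-based answers above address:

def session_is_dirty(session):
    # new or deleted objects always mean pending changes; for dirty
    # objects, is_modified() filters out attribute sets that changed nothing
    return (any(session.new) or any(session.deleted)
            or any(session.is_modified(obj) for obj in session.dirty))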
Here is my solution based on @zzzeek's answer and updated comment. I've unit tested it and it seems to play well with rollbacks (a session is clean after issuing a rollback):
from sqlalchemy import event
from sqlalchemy.orm import Session

@event.listens_for(Session, "after_flush")
def log_flush(session, flush_context):
    session.info['flushed'] = True

@event.listens_for(Session, "after_commit")
@event.listens_for(Session, "after_rollback")
def reset_flushed(session):
    if 'flushed' in session.info:
        del session.info['flushed']

def has_uncommitted_changes(session):
    return any(session.new) or any(session.deleted) \
        or any(x for x in session.dirty if session.is_modified(x)) \
        or session.info.get('flushed', False)
Sessions have a private _is_clean() member which seems to return true if there is nothing to flush to the database. However, the fact that it is private may mean it's not suitable for external use. I'd stop short of personally recommending this, since any mistake here could obviously result in data loss for your users.
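For completeness, a hedged sketch of what relying on it would look like; _is_clean() is private API and may change or disappear between SQLAlchemy releases:

if not session._is_clean():
    # ask_user_to_confirm() is a hypothetical UI helper standing in
    # for your confirmation dialog
    if ask_user_to_confirm():
        session.commit()
    else:
        session.rollback()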
I have a Flask app where I made a bunch of classes, all with relationships to each other:
User
Course
Lecture
Note
Queue
Asset
So I'm trying to make a new lecture and note, and I have a method defined for each thing.
create Note
def createPad(user, course, lecture):
    lecture.queues.first().users.append(user)
    # make new etherpad for user to wait in
    newNote = Note(dt)  # init also creates a new pad at /p/groupID$noteID
    db.session.add(newNote)
    #db.session.commit()
    # add note to user, course, and lecture
    user.notes.append(newNote)
    course.notes.append(newNote)
    lecture.notes.append(newNote)
    db.session.commit()
    return newNote
createLecture
def createLecture(user, course):
    # create new lecture
    now = datetime.now()
    dt = now.strftime("%Y-%m-%d-%H-%M")
    newLecture = Lecture(dt)
    db.session.add(newLecture)
    # add lecture to course, add new queue to lecture, add user to queue, add new user to lecture
    course.lectures.append(newLecture)
    newQueue = MatchQueue('neutral')
    db.session.add(newQueue)
    newLecture.users.append(user)
    # hook up the new queue to the user, lecture
    newQueue.users.append(user)
    newQueue.lecture = newLecture
    # put new lecture in correct course
    db.session.commit()
    newLecture.groupID = pad.createGroupIfNotExistsFor(newLecture.course.name + dt)['groupID']
    db.session.commit()
    return newLecture
which is all called from some controller logic
newlec = createLecture(user, courseobj)
# make new pad
newNote = createPad(user, courseobj, newlec)
# make lecture live
newlec.live = True
db.session.commit()
redirect(somewhere)
This ends up throwing this error:
ObjectDereferencedError: Can't emit change event for attribute 'Queue.users' - parent object of type has been garbage collected.
At lecture.queues.first().users.append(user) in createPad.
I have no clue what this means. I think I'm lacking some fundamental knowledge of sqlalchemy here (I am a sqlalchemy noob). What's going on?
lecture.queues.first().users.append(user)
It means:
The first() method hits the database and produces an object. I'm not following your mappings, but my guess is that it's a Queue object.
Then, you access the "users" collection on that Queue.
At this point, Python itself garbage collects the Queue: it is not referenced anywhere once "users" has been returned. This is how reference-counting garbage collection works.
Then you attempt to append a "user" to "users". SQLAlchemy has to track changes to all mapped attributes; if you were to say Queue.name = "some name", SQLAlchemy needs to register that with the parent Queue object so it knows to flush it. If you say Queue.users.append(someuser), same idea: it needs to register the change event with the parent.
SQLAlchemy can't do this, because the Queue is gone. Hence the message is raised. SQLAlchemy only has a weakref to the parent here, so it knows exactly what has happened (and we can't prevent it, because people get very upset when we create unnecessary reference cycles in their object models).
The solution is very easy, and also easier to read: assign the query result to a variable:
queue = lecture.queues.first()
queue.users.append(user)
I did a bunch of refactoring and passed the corresponding objects in during instantiation, which made things a lot neater. Somehow the problem went away.
One thing I did differently was in my many-to-many relationships, I changed the backref to a db.backref():
courses = db.relationship('Course', secondary=courseTable, backref=db.backref('users', lazy='dynamic'))
lectures = db.relationship('Lecture', secondary=lectureTable, backref=db.backref('users', lazy='dynamic'))
notes = db.relationship('Note', secondary=noteTable, backref=db.backref('users', lazy='dynamic'))
queues = db.relationship('Queue', secondary=queueTable, backref=db.backref('users', lazy='dynamic'))
Because of legacy data which is not available in the database but only in some external files, I want to create a SQLAlchemy object that contains data read from those files but isn't written to the database when I execute session.flush().
My code looks like this:
from sqlalchemy.orm.exc import NoResultFound

try:
    return session.query(Phone).populate_existing().filter(Phone.mac == ident).one()
except NoResultFound:
    return self.createMockPhoneFromLicenseFile(ident)

def createMockPhoneFromLicenseFile(self, ident):
    # Some code to read necessary data from file deleted....
    phone = Phone()
    phone.mac = foo
    phone.data = bar
    phone.state = "Read from legacy file"
    phone.purchaseOrderPosition = self.getLegacyOrder(ident)
    # SQLAlchemy magic doesn't seem to work here, probably because we don't
    # insert the created phone object into the database. So we set the id
    # fields manually.
    phone.order_id = phone.purchaseOrderPosition.order_id
    phone.order_position_id = phone.purchaseOrderPosition.order_position_id
    return phone
Everything works fine except that, on a session.flush() executed later in the application, SQLAlchemy tries to write the created Phone object to the database (which fortunately doesn't succeed, because phone.state is longer than the column's data type allows), and that breaks the function which issues the flush.
Is there any way to prevent SQLAlchemy from trying to write such an object?
Update
While I didn't find anything on
using_mapper_options(save_on_init=False)
in the Elixir documentation (maybe you can provide a link?), it seemed to me worth a try (I would have preferred a way to prevent a single instance from being written instead of the whole entity).
At first it seemed that the statement had no effect, and I suspected my SQLAlchemy/Elixir versions were too old, but then I found out that the connection to the PurchaseOrderPosition entity (which I didn't modify), made with
phone.purchaseOrderPosition = self.getLegacyOrder(ident)
causes the phone object to be written again. If I remove the statement, everything seems to be fine.
You need to do
import elixir
elixir.options_defaults['mapper_options'] = { 'save_on_init': False }
to prevent Entity instances which you instantiate being auto-added to the session. Ideally, this should be done as early as possible in your code. You can also do this on a per-entity basis, via using_mapper_options(save_on_init=False) - see the Elixir documentation for more details.
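A hedged sketch of the per-entity form, assuming the Elixir DSL (which re-exports the SQLAlchemy column types); the Phone fields shown are illustrative:

from elixir import Entity, Field, String, using_mapper_options

class Phone(Entity):
    using_mapper_options(save_on_init=False)  # instances are no longer auto-added
    mac = Field(String(20))     # illustrative field
    data = Field(String(200))   # illustrative field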
Update:
See this post on the Elixir mailing list indicating that this is the solution.
Also, as Ants Aasma points out, you can use cascade options on the Elixir relationship to set up cascade options in SQLAlchemy. See this page for more details.
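A hedged sketch of the cascade route, assuming Elixir passes extra keyword arguments through to SQLAlchemy's relation(); cascade='none' is the SQLAlchemy value that disables all cascades, including save-update:

class PurchaseOrderPosition(Entity):
    # with cascade='none', assigning a Phone to this relationship should
    # no longer cascade the Phone into the session automatically
    phones = OneToMany('Phone', cascade='none')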
Well, sqlalchemy doesn't, by default.
Consider the following self-contained example code.
from sqlalchemy import Column, Integer, Unicode, create_engine
from sqlalchemy.orm import create_session
from sqlalchemy.ext.declarative import declarative_base

e = create_engine('sqlite://')
Base = declarative_base(bind=e)

class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(Unicode(50))

# create the empty table and a session
Base.metadata.create_all()
s = create_session(bind=e, autoflush=False, autocommit=False)

# assert the table is empty
assert s.query(User).all() == []

# create a new User instance but don't save it to database:
u = User()
u.name = 'siebert'

# I could run s.add(u) here but I won't
s.flush()
s.commit()

# assert the table is still empty
assert s.query(User).all() == []
So I'm not sure what's implicitly adding your instances to the session. Normally you have to manually call s.add(u) to make an object go into the session. I'm not familiar with Elixir, so perhaps this is some Elixir trickery... Maybe you could remove the object from the session by using session.expunge().
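A minimal hedged sketch of that suggestion, applied to the phone object from the question; a Session supports membership testing with in:

# if something (Elixir's save_on_init, a backref cascade, ...) has
# added the mock phone to the session, detach it again:
if phone in session:
    session.expunge(phone)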
Old post, but I came across a similar issue; in my case in SQLAlchemy it was caused by cascading on backrefs:
http://docs.sqlalchemy.org/en/rel_0_7/orm/session.html#backref-cascade
Turn it off on your backrefs so that you have to explicitly add things to the session.
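A hedged sketch of what turning it off looks like in plain SQLAlchemy, using the 0.7-era cascade_backrefs flag from the linked page; the mapping is illustrative and assumes Phone has the usual ForeignKey back to this table:

from sqlalchemy import Column, Integer
from sqlalchemy.orm import relationship, backref

class PurchaseOrderPosition(Base):
    __tablename__ = 'purchase_order_position'  # illustrative name
    id = Column(Integer, primary_key=True)
    # with cascade_backrefs=False on the backref, assigning
    # phone.purchaseOrderPosition = pos no longer pulls the Phone into
    # the Session that pos belongs to
    phones = relationship(
        "Phone",
        backref=backref("purchaseOrderPosition", cascade_backrefs=False),
    )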