Issues with scoped_session in sqlalchemy - how does it work?

I'm not really sure how scoped_session works, other than it seems to be a wrapper that hides several real sessions, keeping them separate for different requests. Does it do this with thread locals?
Anyway the trouble is as follows:
S = elixir.session # = scoped_session(...)
f = Foo(bar=1)
S.add(f) # ERROR, f is already attached to session (different session)
Not sure how f ended up in a different session, I've not had problems with that before. Elsewhere I have code that looks just like that, but actually works. As you can imagine I find that very confusing.
I just don't know what's going on here: f seems to be magically added to a session in the constructor, but I don't seem to have any references to the session it uses. Why would it end up in a different session? How can I get it to end up in the right session? How does this scoped_session thing work, anyway? It just seems to work sometimes, and other times it doesn't.
I'm definitely very confused.

Scoped session creates a proxy object that keeps a registry of (by default) per-thread session objects, created on demand from the passed session factory. When you access a session method such as ScopedSession.add, it finds the session corresponding to the current thread and returns the add method bound to that session. The active session can be removed using the ScopedSession.remove() method.
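Here is a toy sketch of that mechanism, not SQLAlchemy's actual code (the real proxy uses a ScopedRegistry from sqlalchemy.util and generates its proxy methods), but the same idea:

import threading

class ToyScopedSession:
    """Toy model of scoped_session: one real session per thread."""
    def __init__(self, session_factory):
        self.session_factory = session_factory
        self.registry = threading.local()

    def __call__(self):
        # Return this thread's session, creating it on first access.
        if not hasattr(self.registry, 'session'):
            self.registry.session = self.session_factory()
        return self.registry.session

    def __getattr__(self, name):
        # S.add(obj), S.commit(), ... delegate to the current thread's session.
        return getattr(self(), name)

    def remove(self):
        # Close and discard the current thread's session.
        if hasattr(self.registry, 'session'):
            self.registry.session.close()
            del self.registry.session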
ScopedSession has a few convenience methods. One is query_property, which creates a property that returns a query object bound to the scoped session it was created on and the class it was accessed through. The other is ScopedSession.mapper, which adds a default __init__(**kwargs) constructor and by default adds created objects to the scoped session the mapper was created from. This behavior can be controlled by the save_on_init keyword argument to the mapper. ScopedSession.mapper is deprecated because of exactly the problem in the question. This is one case where the Python "explicit is better than implicit" philosophy really applies. Unfortunately Elixir still uses ScopedSession.mapper by default.
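For example, query_property is used like this (a sketch with a hypothetical Foo model; scoped_session and query_property are the real APIs):

from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('sqlite://')
Session = scoped_session(sessionmaker(bind=engine))
Base = declarative_base()

class Foo(Base):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    bar = Column(Integer)
    # Foo.query is bound to whatever session the current thread gets.
    query = Session.query_property()

Base.metadata.create_all(engine)
print(Foo.query.filter_by(bar=1).all())  # same as Session.query(Foo).filter_by(bar=1).all()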

It turns out Elixir sets save_on_init=True on the created mappers. This can be disabled with:
using_mapper_options(save_on_init=False)
This solves the problem. Kudos to stepz on #sqlalchemy for figuring out what was going on immediately. I am still curious how scoped_session really works, though, so if someone answers that, they'll get credit for answering the question.

Related

threading.Timer in Python -- two functions agree on variable identity, but not the value

Are there any caveats similar to variable caching when using threading.Timer in Python?
I'm observing an effect similar to not putting "volatile" keyword in other languages, but I've read that this doesn't apply to Python and yet something is off. In summary, two methods (threads?) agree on the identity of a list variable but disagree on the contents.
I have a class with member variable self.x (a list) which is assigned once in constructor and then its identity never changes (but it can be cleared and refilled).
A Timer is also started in the constructor that periodically updates the contents of self.x. I'm not using any locks though I probably should (eventually).
Other users of the class instance then sometimes try to read the contents of the list.
Problem: At the end of the timer handler, the list is populated correctly (I've printed its contents in the logs) but when a user of the class instance reads the instance variable, it's empty! (more specifically, same value as it was initialized in the constructor, i.e. if I put some other item in there, it'll be that value).
The weird thing is that the getter and the timer agree on the identity of the list! (id(self.x) returns the same value). Also, I was not able to repro this in tests, even though I'm doing the same thing.
Any idea what I might be doing wrong?
Thanks in advance!
Seems like I misunderstood how multiprocessing works. What happened is the code forked into multiple processes and even though the id() of objects remained the same, I didn't realize they were in different (forked) processes. I knew this was happening but I thought that the fork would take care of copying the timer as well. I've changed the code to create the object (and timers) once the fork is complete and it seems to have solved the problem.

Pyramid with SQLAlchemy: scoped or non-scoped database session

For older versions of pyramid the setup for sqlalchemy session was done with scooped_session similar to this
DBSession = scoped_session(
sessionmaker(
autoflush=True,
expire_on_commit=False,
extension=zope.sqlalchemy.ZopeTransactionExtension()
)
However I see that newer tutorials as well the pyramid docs 'promotes' sqlalchemy with no threadlocals where the DBSession is attached to the request object.
Is the 'old' way broken and what is the advantage of the no threadlocals ?
I spearheaded this transition with help from several other contributors who had blogged [1] about some advantages. It basically boils down to following the pyramid philosophy of making it possible for applications to be written that do not require any global variables. This is really important when writing reusable, composable code. It makes your code's dependencies (api surface) clear, instead of having random functions dependent on your database, despite their function signatures / member variables not exposing those dependencies. This also makes it easier to test code because you don't have to worry as much about threadlocal variables. With globals you need to track down what modules may be holding references to them and patch them to use the new object. Without globals, you simply pass in the objects you want to use and the code uses them, just like any other parameter to a function or state on an object.
A lot of people complain about having to pass their database to tons of functions. This is a smell and just means you aren't designing your apis well. Many times you can structure things as an object that's created once per-request and stores the handle as something like self.dbsession, and each method on the object now has access to it.
[1] https://metaclassical.com/testing-pyramid-apps-without-a-scoped-session/

How is post_delete firing before delete in Django?

I am seeing post_delete fire on a model before the instance is actually deleted from the database, which is contrary to https://docs.djangoproject.com/en/1.6/ref/signals/#post-delete
Note that the object will no longer be in the database, so be very careful what you do with this instance.
If I look in the database, the record remains, if I requery using the ORM, the record is returned, and is equivalent to the instance:
>>> instance.__class__.objects.get(pk=instance.pk) == instance
True
I don't have much relevant code to show, my signal looks like this:
from django.db.models.signals import post_delete, post_save
#receiver(post_delete, sender=UserInvite)
def invite_delete_action(sender, instance, **kwargs):
raise Exception(instance.__class__.objects.get(pk=instance.pk) == instance)
I am deleting this instance directly, it's not a relation of something else that is being deleted
My model is pretty normal looking
My view is a generic DeleteView
I haven't found any transactional decorators anywhere - which was my first thought as to how it might be happening
Any thoughts on where I would start debugging how on earth this happening? Is anyone aware of this as a known bug, I can't find any tickets describing any behaviour like this – also I am sure this works as expected in various other places in my application that are seemingly unaffected.
If I allow the execution to continue the instance does end up deleted... so it's not like it's present because it's failing to delete it (pretty sure post_delete shouldn't fire in that case anyway).
I believe what I am seeing is because of Django's default transactional behaviour, where the changes are not committed until the request is complete.
I don't really have a solution – I can't see a way to interrogate the state an instance or record would be in once the transaction is completed (or even a way to have any visibility of the transaction) nor any easy way to prevent this behaviour without significantly altering the way the application runs.
I am opting for ignore the problem for now, and not worrying about the repercussions in my use-case, which in fact, aren't that severe – I do welcome any and all suggestions regarding how to handle this properly however.
I fire a more generic signal for activity logging in my post_delete, and in the listener for that I need to be able to check if the instance is being deleted – otherwise it binds a bad GenericRelation referencing a pk that does not exist, what I intended to do is nullify it if I see the relation is being deleted - but as described, I can't tell at this point, unless I was to pass an extra argument whenever I fire the signal inside the post_delete.

Refreshing detached entity in sqlalchemy

I need to attach an object to session in such a way that it will not differ from one persisted in db. (Easier to explain it with code):
session.query(type(some_object)).filter_by(id=some_object.id).one()
Is there more proper way to do that?
session.add(some_object) doesn't work since an entity with such id can already be attached to this session, and object = session.merge(some_object) doesn't work for me because it translates state from detached copy (if i make object.name='asdfasdf' these changes will be pending after merging object)
EDIT:
I found a bit less ugly way:
some_object = session.merge(some_object)
session.refresh(some_object)
But is there a way todo this in one call?
I need to attach an object to session in such a way that it will not differ from one persisted in db.
"will not differ from DB" pretty much means you're looking to load it, so query it. You might want to consider that the object might already be present in that target session. so your approach with query(type(object)) is probably the most direct, though you can use get() to hit the primary key directly, and populate_existing() to guarantee that state which already exists in the session is overwritten:
session.query(type(some_object)).populate_existing().get(some_object.id)
the above calls down to almost the same codepaths that refresh() does. The merge/refresh approach you have works too but emits at least two SELECT calls.

What's a good general way to look SQLAlchemy transactions, complete with authenticated user, etc?

I'm using SQLAlchemy's declarative extension. I'd like all changes to tables logs, including changes in many-to-many relationships (mapping tables). Each table should have a separate "log" table with a similar schema, but additional columns specifying when the change was made, who made the change, etc.
My programming model would be something like this:
row.foo = 1
row.log_version(username, change_description, ...)
Ideally, the system wouldn't allow the transaction to commit without row.log_version being called.
Thoughts?
There are too many questions in one, so they that full answers to all them won't fit StackOverflow answer format. I'll try to describe hints in short, so ask separate question for them if it's not enough.
Assigning user and description to transaction
The most popular way to do so is assigning user (and other info) to some global object (threading.local() in threaded application). This is very bad way, that causes hard to discover bugs.
A better way is assigning user to the session. This is OK when session is created for each web request (in fact, it's the best design for application with authentication anyway), since there is the only user using this session. But passing description this way is not as good.
And my favorite solution is to extent Session.commit() method to accept optional user (and probably other info) parameter and assign it current transaction. This is the most flexible, and it suites well to pass description too. Note that info is bound to single transaction and is passed in obvious way when transaction is closed.
Discovering changes
There is a sqlalchemy.org.attributes.instance_state(obj) contains all information you need. The most useful for you is probably state.committed_state dictionary which contains original state for changed fields (including many-to-many relations!). There is also state.get_history() method (or sqlalchemy.org.attributes.get_history() function) returning a history object with has_changes() method and added and deleted properties for new and old value respectively. In later case use state.manager.keys() (or state.manager.attributes) to get a list of all fields.
Automatically storing changes
SQLAlchemy supports mapper extension that can provide hooks before and after update, insert and delete. You need to provide your own extension with all before hooks (you can't use after since the state of objects is changed on flush). For declarative extension it's easy to write a subclass of DeclarativeMeta that adds a mapper extension for all your models. Note that you have to flush changes twice if you use mapped objects for log, since a unit of work doesn't account objects created in hooks.
We have a pretty comprehensive "versioning" recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/LogVersions . It seems some other users have contributed some variants on it. The mechanics of "add a row when something changes at the ORM level" are all there.
Alternatively you can also intercept at the execution level using ConnectionProxy, search through the SQLA docs for how to use that.
edit: versioning is now an example included with SQLA: http://docs.sqlalchemy.org/en/rel_0_8/orm/examples.html#versioned-objects
