I need to attach an object to a session in such a way that it will not differ from the one persisted in the DB (it's easier to explain with code):
session.query(type(some_object)).filter_by(id=some_object.id).one()
Is there more proper way to do that?
session.add(some_object) doesn't work, since an entity with that id may already be attached to this session, and object = session.merge(some_object) doesn't work for me because it copies state from the detached object (if I set object.name = 'asdfasdf', those changes will still be pending after the merge).
EDIT:
I found a bit less ugly way:
some_object = session.merge(some_object)
session.refresh(some_object)
But is there a way to do this in one call?
I need to attach an object to session in such a way that it will not differ from one persisted in db.
"will not differ from DB" pretty much means you're looking to load it, so query it. You might want to consider that the object might already be present in that target session, so your approach with query(type(object)) is probably the most direct, though you can use get() to hit the primary key directly, and populate_existing() to guarantee that any state already present in the session is overwritten:
session.query(type(some_object)).populate_existing().get(some_object.id)
The above calls down to almost the same code paths that refresh() does. The merge/refresh approach you have works too, but it emits at least two SELECT statements.
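For example, wrapped into a small helper (a sketch only; the helper name is made up, and it uses the legacy Query.get() API shown above):

def reattach_fresh(session, obj):
    # re-fetch by primary key, overwriting any stale state in the session
    return session.query(type(obj)).populate_existing().get(obj.id)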
So let's say there's a model A which looks like this:
class A(model):
    name = char(unique=True)
When a user tries to create a new A, a view will check whether the name is already taken, like this:
name_taken = A.objects.get(name=name_passed_by_user)
if name_taken:
    return "Name exists!"
# Creating A here
It used to work well, but as the system grew, concurrent attempts to create A's with the same name started to appear, and sometimes multiple requests pass the "name exists" check within the same few milliseconds, resulting in integrity errors, since the name field has to be UNIQUE.
The current solution is a lot of "try: except IntegrityError:" wraps around the creation parts, despite the prior check. Is there a way to avoid that? There are a lot of models with UNIQUE constraints like this, and thus a lot of ugly "try: except IntegrityError:" wraps. Is it possible to take a lock that doesn't block plain SELECTs but does block SELECT FOR UPDATE? Or maybe there's a more proper solution? I'm certain this is a common problem with usernames and similar fields/columns, and there must be a proper approach other than exception catching.
The DB is PostgreSQL 10 and the ORM is SQLAlchemy (Python), but tweaks made directly to the DB are applicable too.
The only thing you can do is set the appropriate transaction isolation level directly in Postgres; neither Python nor the ORM can do anything beyond that. The SERIALIZABLE level will most likely solve your problem, but it might slow down performance, so you should try REPEATABLE READ too.
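For what it's worth, a sketch of setting the isolation level from SQLAlchemy (the connection URL is a placeholder; the same option can also be applied per connection or configured in Postgres itself):

from sqlalchemy import create_engine

# ask the driver for SERIALIZABLE transactions; "REPEATABLE READ" works the same way
engine = create_engine(
    "postgresql+psycopg2://user:password@localhost/mydb",
    isolation_level="SERIALIZABLE",
)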
If you are using Python, you should have heard of the “ask forgiveness, not permission” design principle.
To avoid the race condition you describe, simply try to add the new row to the table.
If you get a unique_violation (SQLSTATE 23505), roll back the transaction and return that the name exists.
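A minimal sketch of that pattern with SQLAlchemy (the session and model names are illustrative; the error-code check assumes the psycopg2 driver):

from sqlalchemy.exc import IntegrityError

def create_a(session, name):
    session.add(A(name=name))
    try:
        session.commit()
    except IntegrityError as e:
        session.rollback()
        # with psycopg2, e.orig.pgcode == "23505" means unique_violation
        return "Name exists!"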
Two code examples (simplified):
.get outside the transaction (object from .get passed into the transactional function)
@db.transactional
def update_object_1_txn(obj, new_value):
    obj.prop1 = new_value
    return obj.put()
.get inside the transaction
@db.transactional
def update_object2_txn(obj_key, new_value):
    obj = db.get(obj_key)
    obj.prop1 = new_value
    return obj.put()
Is the first example logically sound? Is the transaction there useful at all; does it provide anything? I'm trying to better understand App Engine's transactions. Would choosing the second option prevent concurrent modifications to that object?
To answer your question in one word: yes, your second example is the way to do it. In the boundaries of a transaction, you get some data, change it, and commit the new value.
Your first one is not wrong, though, because you don't read from obj. So even though it might not have the same value that the earlier get returned, you wouldn't notice. Put another way: as written, your examples aren't good at illustrating the point of a transaction, which is usually called "test and set". See a good Wikipedia article on it here: http://en.wikipedia.org/wiki/Test-and-set
More specific to GAE, as defined in GAE docs, a transaction is:
a set of Datastore operations on one or more entities. Each transaction is guaranteed to be atomic, which means that transactions are never partially applied. Either all of the operations in the transaction are applied, or none of them are applied.
which tells you it doesn't have to be just for test and set, it could also be useful for ensuring the batch commit of several entities, etc.
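To make the "test and set" point concrete, here is a sketch of a read-modify-write transaction in the same db API (the model and its count property are assumptions, not part of the question):

@db.transactional
def increment_counter_txn(obj_key, delta):
    # the read and the write happen inside one transaction, so concurrent
    # increments are serialized and no update is lost
    obj = db.get(obj_key)
    obj.count += delta
    return obj.put()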
Say I have a query that will be executed often, most likely yielding the same results.
Is it correct that using:
for key in qry.iter(keys_only=True):
    item = key.get()
    # do something with item
Would perform better than:
for item in qry:
    # do something with item
Because in the first example, the query will only load the keys and subsequent calls to key.get() will take advantage of NDB's caching mechanism, whereas example 2 will always fetch the entities from the store? Or have I misunderstood something?
I would doubt that the second form would perform better -- it is always possible that the values are not in the cache, and then, presuming you are getting more than one entity back, you'd be making multiple roundtrips. That quickly gets slower.
A better approach is indeed what's shown in http://code.google.com/p/appengine-ndb-experiment/issues/detail?id=118 -- use ndb.get_multi(q.fetch(keys_only=True)). But even that is worse if your cache hit rate is too low; this is extensively discussed in the issue.
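Spelled out, that suggestion looks roughly like this (a sketch; qry is assumed to be an ndb.Query):

from google.appengine.ext import ndb

keys = qry.fetch(keys_only=True)
# cached entities come from NDB's cache, the rest are fetched in one batch RPC
items = ndb.get_multi(keys)
for item in items:
    # do something with item
    pass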
AFAIK it will not make any difference, because internally NDB caches everything, including queries. If you are going to do other stuff with each item, try the async API; that can save valuable time. Edit: moreover, if NDB knows the query in advance, it can even prefetch the results.
I read this about six months back, so I'm not sure what the current behavior is.
I'm not really sure how scoped_session works, other than it seems to be a wrapper that hides several real sessions, keeping them separate for different requests. Does it do this with thread locals?
Anyway the trouble is as follows:
S = elixir.session # = scoped_session(...)
f = Foo(bar=1)
S.add(f) # ERROR, f is already attached to session (different session)
Not sure how f ended up in a different session; I've not had problems with that before. Elsewhere I have code that looks just like this, but actually works. As you can imagine, I find that very confusing.
I just don't know anything here, f seems to be magically added to a session in the constructor, but I don't seem to have any references to the session it uses. Why would it end up in a different session? How can I get it to end up in the right session? How does this scoped_session thing work anyway? It just seems to work sometimes, and other times it just doesn't.
I'm definitely very confused.
Scoped session creates a proxy object that keeps a registry of (by default) per-thread session objects, created on demand from the passed session factory. When you access a session method such as ScopedSession.add, it finds the session corresponding to the current thread and returns the add method bound to that session. The active session can be removed using the ScopedSession.remove() method.
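A small sketch of that registry behavior (the engine URL and usage are illustrative):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker, scoped_session

engine = create_engine("sqlite://")
Session = scoped_session(sessionmaker(bind=engine))

s1 = Session()           # session for the current thread, created on demand
s2 = Session()
assert s1 is s2          # same thread -> same underlying session
Session.remove()         # discard the current thread's session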
ScopedSession has a few convenience methods. One is query_property, which creates a property that returns a query object bound to the scoped session it was created on and the class it was accessed through. The other is ScopedSession.mapper, which adds a default __init__(**kwargs) constructor and by default adds created objects to the scoped session the mapper was created from. This behavior can be controlled with the save_on_init keyword argument to the mapper. ScopedSession.mapper is deprecated because of exactly the problem in this question. This is one case where the Python "explicit is better than implicit" philosophy really applies. Unfortunately, Elixir still uses ScopedSession.mapper by default.
It turns out Elixir sets save_on_init=True on the created mappers. This can be disabled by:
using_mapper_options(save_on_init=False)
This solves the problem. Kudos to stepz on #sqlalchemy for figuring out what was going on immediately. Although I am still curious how scoped_session really works, so if someone answers that, they'll get credit for answering the question.
I'm using SQLAlchemy's declarative extension. I'd like all changes to tables to be logged, including changes in many-to-many relationships (mapping tables). Each table should have a separate "log" table with a similar schema, plus additional columns specifying when the change was made, who made the change, etc.
My programming model would be something like this:
row.foo = 1
row.log_version(username, change_description, ...)
Ideally, the system wouldn't allow the transaction to commit without row.log_version being called.
Thoughts?
There are too many questions in one, so full answers to all of them won't fit the Stack Overflow answer format. I'll try to describe the hints briefly; ask a separate question about any of them if that's not enough.
Assigning user and description to transaction
The most popular way to do this is assigning the user (and other info) to some global object (threading.local() in a threaded application). This is a very bad way that causes hard-to-discover bugs.
A better way is assigning the user to the session. This is OK when a session is created for each web request (in fact, that's the best design for an application with authentication anyway), since there is only one user using that session. But passing a description this way is not as good.
And my favorite solution is to extend the Session.commit() method to accept an optional user (and probably other info) parameter and assign it to the current transaction. This is the most flexible, and it suits passing a description well too. Note that the info is bound to a single transaction and is passed in an obvious way when the transaction is closed.
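A sketch of that last idea (the class and key names are made up; the point is just that audit info rides along with the session until commit, where flush hooks can read it):

from sqlalchemy.orm import Session, sessionmaker

class AuditedSession(Session):
    def commit(self, user=None, description=None):
        # stash audit info on the session before committing
        self.info["audit_user"] = user
        self.info["audit_description"] = description
        return super().commit()

SessionFactory = sessionmaker(class_=AuditedSession)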
Discovering changes
There is sqlalchemy.orm.attributes.instance_state(obj), which contains all the information you need. The most useful for you is probably the state.committed_state dictionary, which contains the original state of changed fields (including many-to-many relations!). There is also the state.get_history() method (or the sqlalchemy.orm.attributes.get_history() function) returning a history object with a has_changes() method and added and deleted properties for the new and old values respectively. In the latter case, use state.manager.keys() (or state.manager.attributes) to get a list of all fields.
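An illustrative sketch of inspecting those structures on a mapped object row before flush (row and the attribute name "foo" are placeholders):

from sqlalchemy.orm.attributes import instance_state, get_history

state = instance_state(row)
print(state.committed_state)           # original values of changed fields

hist = get_history(row, "foo")
if hist.has_changes():
    print("new:", hist.added, "old:", hist.deleted)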
Automatically storing changes
SQLAlchemy supports mapper extensions that can provide hooks before and after update, insert and delete. You need to provide your own extension with all the before hooks (you can't use the after hooks, since the state of the objects has already changed by flush time). For the declarative extension it's easy to write a subclass of DeclarativeMeta that adds a mapper extension to all your models. Note that you have to flush changes twice if you use mapped objects for the log, since the unit of work doesn't account for objects created in hooks.
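In newer SQLAlchemy versions the same hooks are exposed as mapper events; a sketch of that variant (the model A and the log table are placeholders):

from sqlalchemy import event

@event.listens_for(A, "before_update")
def log_a_update(mapper, connection, target):
    # write the old values to the log table through `connection` here,
    # e.g. connection.execute(a_log_table.insert().values(...))
    pass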
We have a pretty comprehensive "versioning" recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/LogVersions . It seems some other users have contributed some variants on it. The mechanics of "add a row when something changes at the ORM level" are all there.
Alternatively you can also intercept at the execution level using ConnectionProxy, search through the SQLA docs for how to use that.
Edit: versioning is now an example included with SQLAlchemy: http://docs.sqlalchemy.org/en/rel_0_8/orm/examples.html#versioned-objects