Django - Inception style transaction handling - python

I'm no transaction / database expert, so pardon my ignorance in the wording here:
When you use Django's transaction.commit_on_success(func), any error that propagates up to commit_on_success rolls back the transaction, which is of course really helpful when you need all-or-nothing behaviour in a method, etc. This makes Django's view-based transaction handling great for views that do a lot of work.
Sometimes I wrap model methods or plain old helper functions in commit_on_success to achieve the same all-or-nothing behavior.
The problem comes when you have nested Django transactions. Example: a transaction-protected view calls a model method that's wrapped in commit_on_success, then does some other work with another model and raises an exception. Oops: when control returned to commit_on_success from the model method, the transaction was committed, and now the failing view has turned my all-or-nothing into all-or-some.
This isn't limited to views. I may have nested operations in which lots_o_foo_or_nothing() uses commit_on_success and calls all_or_nothing_1() and all_or_nothing_2(), both of which are also wrapped in commit_on_success. If lots_o_foo_or_nothing() errors out, the sub-function calls will already have committed their transactions to the DB, logically corrupting my data.
Is there a way around this? Again, pardon me, if I'm misunderstanding something, but it seems this is the behavior I've witnessed, and a way around it would be a great convenience.

Not a final solution, but an idea based on this snippet (which is a good idea per se): combined with savepoints it can make a nice solution, namely a decorator that knows whether it is running inside another transaction and, if so, uses a savepoint instead of a transaction.
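A minimal sketch of that idea, assuming an older Django where commit_on_success, is_managed() and the savepoint API are all available (on Django 1.6+ transaction.atomic() handles this nesting natively):

from functools import wraps
from django.db import transaction

def nested_commit_on_success(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if transaction.is_managed():
            # Already inside an outer transaction: protect only this call
            # with a savepoint so the outer block stays all-or-nothing.
            sid = transaction.savepoint()
            try:
                result = func(*args, **kwargs)
            except Exception:
                transaction.savepoint_rollback(sid)
                raise
            transaction.savepoint_commit(sid)
            return result
        # Outermost call: behave like plain commit_on_success.
        return transaction.commit_on_success(func)(*args, **kwargs)
    return wrapper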

Related

How is post_delete firing before delete in Django?

I am seeing post_delete fire on a model before the instance is actually deleted from the database, which is contrary to https://docs.djangoproject.com/en/1.6/ref/signals/#post-delete
"Note that the object will no longer be in the database, so be very careful what you do with this instance."
If I look in the database, the record remains, if I requery using the ORM, the record is returned, and is equivalent to the instance:
>>> instance.__class__.objects.get(pk=instance.pk) == instance
True
I don't have much relevant code to show, my signal looks like this:
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver

@receiver(post_delete, sender=UserInvite)
def invite_delete_action(sender, instance, **kwargs):
    raise Exception(instance.__class__.objects.get(pk=instance.pk) == instance)
I am deleting this instance directly, it's not a relation of something else that is being deleted
My model is pretty normal looking
My view is a generic DeleteView
I haven't found any transactional decorators anywhere - which was my first thought as to how it might be happening
Any thoughts on where I would start debugging how on earth this is happening? Is anyone aware of this as a known bug? I can't find any tickets describing behaviour like this. Also, I am sure this works as expected in various other places in my application, which are seemingly unaffected.
If I allow the execution to continue the instance does end up deleted... so it's not like it's present because it's failing to delete it (pretty sure post_delete shouldn't fire in that case anyway).
I believe what I am seeing is because of Django's default transactional behaviour, where the changes are not committed until the request is complete.
I don't really have a solution. I can't see a way to interrogate the state that an instance or record will be in once the transaction completes (or even a way to have any visibility into the transaction), nor any easy way to prevent this behaviour without significantly altering the way the application runs.
I am opting to ignore the problem for now and not worrying about the repercussions in my use-case, which, in fact, aren't that severe. I do welcome any and all suggestions on how to handle this properly, however.
I fire a more generic signal for activity logging in my post_delete, and in the listener for that I need to be able to check whether the instance is being deleted; otherwise it binds a bad GenericRelation referencing a pk that does not exist. What I intended to do was nullify it if I see the relation is being deleted, but as described, I can't tell at this point unless I pass an extra argument whenever I fire the signal inside the post_delete.
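For what it's worth, on newer Django versions (1.9+) transaction.on_commit() offers a way to defer that follow-up work until the delete has actually been committed. A minimal sketch, reusing the UserInvite sender from the question and a made-up helper for the activity-log signal:

from django.db import transaction
from django.db.models.signals import post_delete
from django.dispatch import receiver

@receiver(post_delete, sender=UserInvite)  # UserInvite as in the question
def invite_delete_action(sender, instance, **kwargs):
    def after_commit():
        # By now the surrounding transaction has committed and the row is gone.
        fire_activity_log_signal(instance)  # hypothetical helper
    transaction.on_commit(after_commit)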

How to handle errors from ndb.put_multi

The documentation for GAE's ndb.put_multi is severely lacking. NDB Entities and Keys - Python — Google Cloud Platform shows that it returns a list of keys (list_of_keys = ndb.put_multi(list_of_entities)), but it says nothing about failures. NDB Functions doesn't provide much more information.
Spelunking through the code (below) shows me that, at least for now, put_multi just aggregates the Future.get_result()s returned from the async method, which itself delegates to the entities' put code. Now, the docs for the NDB Future Class indicate that a result will be returned or else an exception will be raised. I've been told, however, that the result will be None if a particular put failed (I can't find any authoritative documentation to that effect, but if it's anything like db.get then that would make sense).
So all of this boils down to some questions I can't find the answers to:
Clearly, the return value is a list - is it a list with some elements possibly None? Or are exceptions used instead?
When there is an error, what should be re-put? Can all entities be re-put (idempotent), or only those whose return value are None (if that's even how errors are communicated)?
How common are errors (One answer: 1/3000)? Do they show up in logs (because I haven't seen any)? Is there a way to reliably simulate an error for testing?
Usage of the function in an open source library implies that the operation is idempotent, but that's about it. (Other usages don't even bother checking the return value or catching exceptions.)
Handling Datastore Errors makes no mention of anything but exceptions.
I agree with your reading of the code: put_multi() reacts to an error the same way put_async().get_result() does. If put() would raise an exception, put_multi() will also, and will be unhelpful about which of the multiple calls failed. I'm not aware of a circumstance where put_multi() would return None for some entries in the key list.
You can re-put entities that have been put, assuming no other user has updated those entities since the last put attempt. Entities that are created with system-generated IDs have their in-memory keys updated, so re-putting these would overwrite the existing entities and not create new ones. I wouldn't call it idempotent exactly because the retry would overwrite any updates to those entities made by other processes.
Of course, if you need more control over the result, you can perform this update in a transaction, though all entities would need to be in the same entity group for this to work with a primitive transaction. (Cross-group transactions support up to five distinct entity groups.) A failure during a transaction would ensure that none of the entities are created or updated if any of the attempts fail.
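A minimal sketch of that transactional route, assuming the batch touches at most five entity groups and that retrying the whole batch is acceptable:

from google.appengine.ext import ndb

def put_batch_all_or_nothing(entities):
    def txn():
        return ndb.put_multi(entities)
    # If anything inside fails, the transaction rolls back and the exception
    # propagates; nothing is partially written, so the whole call can be retried.
    return ndb.transaction(txn, xg=True)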
I don't know a general error rate for update failures. Such failures are most likely to include contention errors or "hot tablets" (too many updates to nearby records in too short a time, resulting in a timeout), which would depend on app behavior. All such errors are reported by the API as exceptions.
The easiest way to test error handling call paths would be to wrap the Model class and override the methods with a test mode behavior. You could get fancier and dig into the stub API used by testbed, which may have a way to hook into low-level calls and simulate errors. (I don't think this is a feature of testbed directly but you could use a similar technique.)
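One variation on that, using the mock library to make put_multi raise one of the datastore exceptions so the error path can be exercised; save_everything() is a made-up stand-in for the code under test:

import mock  # the standalone mock package on GAE's Python 2.7 runtime
from google.appengine.api import datastore_errors
from google.appengine.ext import ndb

def test_put_multi_timeout_is_handled():
    with mock.patch.object(ndb, 'put_multi',
                           side_effect=datastore_errors.Timeout('simulated')):
        try:
            save_everything()  # hypothetical code under test that calls put_multi
        except datastore_errors.Timeout:
            pass  # or assert whatever recovery behaviour you expect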

Getting and serializing the state of dynamically created python instances to a relational model

I'm developing a framework of sorts. I'm providing a base class, that will be subclassed by other developers to add behavior to the system. The instances of those classes will have attributes that my framework doesn't necessarily expect, except by inspecting those instances' __dict__. To make things even more interesting, some of those classes can be created dynamically, at any time.
I'd like some things to be handled by the framework, namely, I will need to persist those instances, display their attribute values to the user, and let her search/filter instances using those values.
I have to use a relational database. I know there are some decent Python OO databases out there, but unfortunately they're not an option in this case.
I'm not looking for a full-blown ORM either... and it may not even be an option, given that some of the classes can be created dynamically.
So, my question is, what state of a python instance do I need to serialize to ensure that I can deserialize it later on? Is it enough to look at __dict__, or are there other private attributes that I should be using?
Pickling the instances is not enough, because I'll need to unpickle them to search/filter the attribute values, and I'm afraid it's too much data to do it in-memory (instead of letting the database do it).
Just use an ORM. This is what they are for.
What you are proposing to do is create your own half-assed ORM on your own time. Save your time for your own code that does things, and use the effort other people put for free into solving this problem for you.
Note that all class creation in python is "dynamic" - this is not an issue, for, well, anything at all. In fact, if you are assembling classes programmatically, it is probably slightly easier with an ORM, because they provide reifications of fields.
In the worst case, if you really do need to store your objects in a fake nosql-type schema, you will still only have to write your own backend driver if you use an existing ORM, rather than coding the whole stack yourself. (As it happens, you're not the first person to face this - solutions exist. Google "python orm store dynamically created models" and "sqlalchemy store dynamically created models")
Candidates include:
Django ORM
SQLAlchemy
Some others you can find by googling "Python ORM".
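To make the "dynamic classes are not a problem" point concrete, here is a minimal sketch with SQLAlchemy's declarative extension that assembles a mapped class at runtime with type(); every name in it is made up:

from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

def make_model(name, fields):
    # fields: mapping of attribute name -> Column, decided at runtime
    attrs = {'__tablename__': name.lower(),
             'id': Column(Integer, primary_key=True)}
    attrs.update(fields)
    return type(name, (Base,), attrs)

Widget = make_model('Widget', {'label': Column(String(50))})

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Widget(label='hello'))
session.commit()
print(session.query(Widget).filter_by(label='hello').count())  # 1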

Can django lazy-load fields in a model?

One of my django models has a large TextField which I often don't need to use. Is there a way to tell django to "lazy-load" this field? i.e. not to bother pulling it from the database unless I explicitly ask for it. I'm wasting a lot of memory and bandwidth pulling this TextField into python every time I refer to these objects.
The alternative would be to create a new table for the contents of this field, but I'd rather avoid that complexity if I can.
This functionality lives in the query, via the defer() method, rather than in the model definition. Check it out here in the docs:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#defer
Now, actually, your alternative solution of refactoring and pulling the data into another table is a really good solution. Some people would say that the need to lazy load fields means there is a design flaw, and the data should have been modeled differently.
Either way works, though!
There are two options for lazy-loading in Django: https://docs.djangoproject.com/en/1.6/ref/models/querysets/#django.db.models.query.QuerySet.only
defer(*fields)
Avoid loading those fields that require expensive processing to convert them to Python objects.
Entry.objects.defer("text")
only(*fields)
Only load the fields that you actually need
Person.objects.only("name")
Personally, I think only is better than defer since the code is not only easier to understand, but also more maintainable in the long run.
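As a quick illustration of the trade-off (reusing the Entry model from the defer() example and assuming it also has a headline field), a deferred field is fetched lazily, at the cost of one extra query per instance that touches it:

entry = Entry.objects.defer("text").get(pk=1)
print(entry.headline)  # loaded by the original query
print(entry.text)      # deferred: triggers a second query for just this field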
For something like this you can just override the default manager. Usually, it's not advised but for a defer() it makes sense:
from django.db import models

class CustomManager(models.Manager):
    def get_queryset(self):
        return super(CustomManager, self).get_queryset().defer('YOUR_TEXTFIELD_FIELDNAME')

class DjangoModel(models.Model):
    objects = CustomManager()

What's a good general way to log SQLAlchemy transactions, complete with authenticated user, etc?

I'm using SQLAlchemy's declarative extension. I'd like all changes to tables to be logged, including changes in many-to-many relationships (mapping tables). Each table should have a separate "log" table with a similar schema, but additional columns specifying when the change was made, who made the change, etc.
My programming model would be something like this:
row.foo = 1
row.log_version(username, change_description, ...)
Ideally, the system wouldn't allow the transaction to commit without row.log_version being called.
Thoughts?
There are too many questions in one, and full answers to all of them won't fit the Stack Overflow answer format. I'll try to describe hints in short, so ask a separate question for any of them if that's not enough.
Assigning user and description to transaction
The most popular way to do this is assigning the user (and other info) to some global object (threading.local() in a threaded application). This is a very bad way that causes hard-to-discover bugs.
A better way is assigning the user to the session. This is OK when a session is created for each web request (in fact, that's the best design for an application with authentication anyway), since there is only one user using that session. But passing a description this way is not as good.
And my favorite solution is to extend the Session.commit() method to accept an optional user (and probably other info) parameter and assign it to the current transaction. This is the most flexible option, and it suits passing a description well too. Note that the info is bound to a single transaction and is passed in an obvious way when the transaction is closed.
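A minimal sketch of that idea (the class and the attribute names used for stashing the info are made up, not SQLAlchemy API):

from sqlalchemy.orm import Session, sessionmaker

class AuditedSession(Session):
    def commit(self, user=None, description=None):
        # Stash the audit info where flush-time hooks can read it for this transaction.
        self._audit_user = user
        self._audit_description = description
        return super(AuditedSession, self).commit()

SessionFactory = sessionmaker(class_=AuditedSession)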
Discovering changes
There is sqlalchemy.orm.attributes.instance_state(obj), which contains all the information you need. The most useful part for you is probably the state.committed_state dictionary, which contains the original state of changed fields (including many-to-many relations!). There is also a state.get_history() method (or the sqlalchemy.orm.attributes.get_history() function) returning a history object with a has_changes() method and added and deleted properties for the new and old values respectively. In the latter case, use state.manager.keys() (or state.manager.attributes) to get a list of all fields.
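For illustration, a small helper that uses those APIs to print what changed on a mapped instance before flush (the helper name is made up):

from sqlalchemy.orm.attributes import instance_state, get_history

def describe_changes(obj):
    # obj is any mapped instance with pending (unflushed) changes
    state = instance_state(obj)
    print(state.committed_state)  # original values of the changed fields
    for key in state.manager.keys():
        hist = get_history(obj, key)
        if hist.has_changes():
            print(key, 'was', list(hist.deleted), 'now', list(hist.added))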
Automatically storing changes
SQLAlchemy supports mapper extensions that can provide hooks before and after update, insert and delete. You need to provide your own extension with all the before hooks (you can't use the after hooks, since the state of objects is changed on flush). For the declarative extension it's easy to write a subclass of DeclarativeMeta that adds a mapper extension to all your models. Note that you have to flush changes twice if you use mapped objects for the log, since a unit of work doesn't account for objects created in hooks.
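On current SQLAlchemy the same before-update/insert/delete hooks are exposed through the event API, which replaced MapperExtension; a self-contained sketch with made-up table and column names:

from sqlalchemy import (Column, Integer, String, Table,
                        create_engine, event, inspect)
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Account(Base):
    __tablename__ = 'account'
    id = Column(Integer, primary_key=True)
    name = Column(String(50))

account_log = Table(
    'account_log', Base.metadata,
    Column('id', Integer, primary_key=True),
    Column('account_id', Integer),
    Column('changes', String(200)))

@event.listens_for(Account, 'before_update')
def log_account_update(mapper, connection, target):
    state = inspect(target)
    changes = {a.key: (list(a.history.deleted), list(a.history.added))
               for a in state.attrs if a.history.has_changes()}
    # Write through the low-level connection, not the session, to avoid
    # triggering a nested flush from inside the flush.
    connection.execute(account_log.insert().values(
        account_id=target.id, changes=repr(changes)))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Account(name='old'))
session.commit()
acct = session.query(Account).one()
acct.name = 'new'
session.commit()
print(session.execute(account_log.select()).fetchall())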
We have a pretty comprehensive "versioning" recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/LogVersions . It seems some other users have contributed some variants on it. The mechanics of "add a row when something changes at the ORM level" are all there.
Alternatively you can also intercept at the execution level using ConnectionProxy, search through the SQLA docs for how to use that.
edit: versioning is now an example included with SQLA: http://docs.sqlalchemy.org/en/rel_0_8/orm/examples.html#versioned-objects
