I am seeing post_delete fire on a model before the instance is actually deleted from the database, which contradicts the documentation at https://docs.djangoproject.com/en/1.6/ref/signals/#post-delete:
Note that the object will no longer be in the database, so be very careful what you do with this instance.
If I look in the database, the record remains; if I re-query using the ORM, the record is returned and is equal to the instance:
>>> instance.__class__.objects.get(pk=instance.pk) == instance
True
I don't have much relevant code to show; my signal looks like this:
from django.db.models.signals import post_delete, post_save
from django.dispatch import receiver

@receiver(post_delete, sender=UserInvite)
def invite_delete_action(sender, instance, **kwargs):
    # For debugging: this raises Exception(True), showing the row is still readable
    raise Exception(instance.__class__.objects.get(pk=instance.pk) == instance)
I am deleting this instance directly; it's not being cascade-deleted as a relation of something else.
My model is pretty normal looking
My view is a generic DeleteView
I haven't found any transactional decorators anywhere - which was my first thought as to how it might be happening
Any thoughts on where I would start debugging how on earth this is happening? Is anyone aware of this as a known bug? I can't find any tickets describing behaviour like this – also, I am sure this works as expected in various other places in my application that are seemingly unaffected.
If I allow the execution to continue the instance does end up deleted... so it's not like it's present because it's failing to delete it (pretty sure post_delete shouldn't fire in that case anyway).
I believe what I am seeing is because of Django's default transactional behaviour, where the changes are not committed until the request is complete.
I don't really have a solution – I can't see a way to interrogate the state an instance or record will be in once the transaction completes (or even a way to get any visibility into the transaction), nor any easy way to prevent this behaviour without significantly altering the way the application runs.
I am opting to ignore the problem for now and not worry about the repercussions in my use-case, which, in fact, aren't that severe – I do welcome any and all suggestions regarding how to handle this properly, however.
I fire a more generic signal for activity logging in my post_delete, and in the listener for that I need to be able to check whether the instance is being deleted – otherwise it binds a bad GenericRelation referencing a pk that does not exist. What I intended to do is nullify it if I see the relation is being deleted, but as described, I can't tell at this point unless I pass an extra argument whenever I fire the signal inside the post_delete (sketched below).
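Something like this is what I have in mind – a rough sketch only; activity_logged stands in for my real activity signal and its name is illustrative:

from django.db.models.signals import post_delete
from django.dispatch import Signal, receiver

activity_logged = Signal()  # the generic activity-logging signal mentioned above

@receiver(post_delete, sender=UserInvite)
def invite_delete_action(sender, instance, **kwargs):
    # Tell the listener explicitly that this instance is mid-deletion, because
    # querying the database still returns the row until the transaction commits.
    activity_logged.send(sender=sender, instance=instance, deleting=True)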
Related
I am currently working on a Django 2+ project involving a blockchain, and I want to make copies of some of my object's states into that blockchain.
Basically, I have a model (say "contract") that has a list of several "signature" objects.
I want to make a snapshot of that contract, with the signatures. What I am basically doing is taking the contract at some point in time (when it's created for example) and building a JSON from it.
My problem is: I want to update that snapshot anytime a signature is added/updated/deleted, and each time the contract is modified.
The intuitive solution would be to override each "delete", "create", and "update" of each of the models involved in that snapshot, and pray that all of them are implemented right and that I didn't forget any. But I think this is not scalable at all, and hard to debug and to maintain.
I have thought of a solution that might be more centralized: using a periodic job to get the last update date of my object, compare it to the date of my snapshot, and update the snapshot if necessary.
However, with that solution, I can identify changes when objects are modified or created, but not when they are deleted.
So, this is my big question mark: how, with Django, can you identify deletions in relationships without any prior context, just by looking at the current database's state? Is there a Django module to record deleted objects? What are your thoughts on my issue?
As I understand your problem, you need something like Django signals: they listen for changes made through your models and, when a change is identified (and all the desired conditions are met), execute whatever code you hook up in your application (they can even act on the database).
This is the most recent documentation:
https://docs.djangoproject.com/en/3.1/topics/signals/
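For the deletion case specifically, a post_delete receiver fires for every row the ORM deletes (including cascade deletes), so it can trigger the snapshot rebuild. A minimal sketch, assuming a Signature model with a contract foreign key and a rebuild_snapshot() helper (both names are just illustrations of your setup):

from django.db.models.signals import post_delete
from django.dispatch import receiver

@receiver(post_delete, sender=Signature)
def signature_deleted(sender, instance, **kwargs):
    # instance is the Signature that was just deleted; its field values
    # (including the contract foreign key) are still available in memory.
    rebuild_snapshot(instance.contract)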
I need to attach an object to a session in such a way that it will not differ from the one persisted in the DB. (It's easier to explain with code):
session.query(type(some_object)).filter_by(id=some_object.id).one()
Is there more proper way to do that?
session.add(some_object) doesn't work, since an entity with that id may already be attached to this session, and object = session.merge(some_object) doesn't work for me because it transfers state from the detached copy (if I set object.name = 'asdfasdf', these changes will be pending after merging the object).
EDIT:
I found a bit less ugly way:
some_object = session.merge(some_object)
session.refresh(some_object)
But is there a way to do this in one call?
I need to attach an object to session in such a way that it will not differ from one persisted in db.
"will not differ from DB" pretty much means you're looking to load it, so query it. You might want to consider that the object might already be present in that target session. so your approach with query(type(object)) is probably the most direct, though you can use get() to hit the primary key directly, and populate_existing() to guarantee that state which already exists in the session is overwritten:
session.query(type(some_object)).populate_existing().get(some_object.id)
The above calls down to almost the same code paths that refresh() does. The merge/refresh approach you have works too, but it emits at least two SELECT calls.
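For what it's worth, on SQLAlchemy 1.4 and later the same thing can, as I understand it, be done in one call through Session.get(), which accepts a populate_existing flag directly:

# Sketch, assuming SQLAlchemy 1.4+: fetch by primary key and overwrite any
# state already present in the session, in a single call.
refreshed = session.get(type(some_object), some_object.id, populate_existing=True)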
I have something of a master table of Persons. Everything in my Django app somehow relates to one or more People, either directly or through long fk chains. Also, all my models have the standard bookkeeping fields 'created_at' and 'updated_at'. I want to add a field to my Person table called 'last_active_at', mostly for raw sql ordering purposes.
Creating or editing certain related models produces new timestamps for those objects. I need to somehow update Person.last_active_at with those values. Functionally, this isn't too hard to accomplish, but I'm concerned about undue stress on the app.
My two greatest causes of concern are that I'm restricted to a real db field--I can't assign a function to the Person table as a @property--and that one of these 'activity' models receives and processes new instances from a foreign datasource I have no control over, sporadically receiving a lot of data at once.
My first thought was to add a post_save hook to the 'activity' models. Still seems like my best option, but I know nothing about them, how hard they hit the db, etc.
My second thought was to write some sort of script that goes through the day's activity and updates those models overnight. My employers want a 'live'-er stream, though.
My third thought was to modify the post_save algo to check if the 'updated_at' is less than half an hour from the Person's 'last_active_at', and not update the person if true.
Are my thoughts tending in a scalable direction? Are there other approaches I should pursue?
It is said that premature optimization is the mother of all problems. You should start with the dumbest implementation (update it every time), and then measure and - if needed - replace it with something more efficient.
First of all, let's put a method on Person to update the last_active_at field. That way, all the updating logic is concentrated in one place, and we can easily modify it later.
The signals are quite easy to use: it's just a matter of declaring a function and registering it as a receiver, and it will be run each time the signal is emitted. See the documentation for the full explanation, but here is what it might look like:
from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=RelatedModel)
def my_handler(sender, instance, **kwargs):
    # sender is the model class; instance is the object that was just saved
    person = ...  # look up the Person to be updated from the instance
    person.update_activity()
As for the updating itself, start with the dumbest way to do it.
from django.utils.timezone import now

def update_activity(self):
    self.last_active_at = now()
    self.save(update_fields=['last_active_at'])
Then measure and decide if it's a problem or not. If it's a problem, some of the things you can do are:
Check if the previous update is recent before updating again. Might be useless if a read to your database is not faster than a write. Not a problem if you use a cache.
Write it down somewhere for a deferred process to update later. It doesn't need to be daily: if the problem is that you have 100 updates per second, you can just have a script update the database every 10 seconds, or every minute. You can probably find a good performance/up-to-dateness trade-off using this technique.
These are just some thoughts based on what you proposed, but the right choice depends on the kind of figures you have. Determine what kind of load you'll have, what kind of reaction time is needed for that field, and experiment.
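As an illustration of the first option, the method could skip the write when the last recorded activity is recent enough; a sketch only, where the 30-minute window is an arbitrary assumption:

from datetime import timedelta
from django.utils.timezone import now

ACTIVITY_RESOLUTION = timedelta(minutes=30)  # assumed acceptable staleness

def update_activity(self):
    # Skip the write if the person was already marked active recently enough.
    if self.last_active_at and now() - self.last_active_at < ACTIVITY_RESOLUTION:
        return
    self.last_active_at = now()
    self.save(update_fields=['last_active_at'])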
I'm no transaction / database expert, so pardon my ignorance in the wording here:
When you use Django's transaction.commit_on_success(func), any error that's propagated to the control of commit_on_success will roll back the transaction which is really helpful of course in case you need some all-or-nothing action in a method, etc. This makes Django's view-based transaction handling great for views that do a lot of stuff.
Sometimes I wrap model methods or plain old helper functions in commit_on_success to achieve the same all-or-nothing behavior.
The problem comes when you have nested Django transactions. Example: a transaction-protected view calls a model method that's wrapped in commit_on_success, then does some other stuff with another model and causes an exception. Oops: when control returned to commit_on_success from the model method, the transaction was committed, and now the view errors out, changing my view to all-or-some instead of all-or-nothing. This isn't limited to views. I may have nested operations in which lots_o_foo_or_nothing() uses commit_on_success and calls all_or_nothing_1() and all_or_nothing_2(), which are both wrapped in commit_on_success. If lots_o_foo_or_nothing() errors out, the sub-function calls will have already committed their transactions to the DB, logically corrupting my data.
Is there a way around this? Again, pardon me, if I'm misunderstanding something, but it seems this is the behavior I've witnessed, and a way around it would be a great convenience.
Not a final solution, but an idea based on this snippet (which is a good idea per se):
this plus savepoints can make a nice solution: a decorator that is aware of whether it is running inside another transaction (and, if it is, uses a savepoint instead of a transaction).
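Very roughly, and assuming the old pre-1.6 transaction API from the question, the decorator idea could look like the sketch below (note that Django 1.6's transaction.atomic implements exactly this nesting behaviour for you, and that savepoints only help on backends that support them):

from functools import wraps
from django.db import transaction

def nestable_commit_on_success(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        if transaction.is_managed():
            # Already inside an outer transaction: protect only this call
            # with a savepoint, so the outer block stays all-or-nothing.
            sid = transaction.savepoint()
            try:
                result = func(*args, **kwargs)
            except Exception:
                transaction.savepoint_rollback(sid)
                raise
            transaction.savepoint_commit(sid)
            return result
        # No surrounding transaction: behave like plain commit_on_success.
        return transaction.commit_on_success(func)(*args, **kwargs)
    return wrapper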
I'm using SQLAlchemy's declarative extension. I'd like all changes to tables to be logged, including changes in many-to-many relationships (mapping tables). Each table should have a separate "log" table with a similar schema, but with additional columns specifying when the change was made, who made the change, etc.
My programming model would be something like this:
row.foo = 1
row.log_version(username, change_description, ...)
Ideally, the system wouldn't allow the transaction to commit without row.log_version being called.
Thoughts?
There are too many questions in one, so full answers to all of them won't fit the Stack Overflow answer format. I'll try to describe brief hints; ask a separate question for any of them if that's not enough.
Assigning user and description to transaction
The most popular way to do so is assigning the user (and other info) to some global object (threading.local() in a threaded application). This is a very bad way that causes hard-to-discover bugs.
A better way is assigning the user to the session. This is OK when a session is created for each web request (in fact, that's the best design for an application with authentication anyway), since there is only one user using the session. But passing a description this way is not as good.
And my favorite solution is to extend the Session.commit() method to accept an optional user (and probably other info) parameter and assign it to the current transaction. This is the most flexible, and it suits passing a description well too. Note that the info is bound to a single transaction and is passed in an obvious way when the transaction is closed.
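A minimal sketch of that commit() extension, assuming a reasonably recent SQLAlchemy where Session.info is available; the user and description parameters are illustrative names, not an existing API:

from sqlalchemy.orm import Session, sessionmaker

class AuditedSession(Session):
    def commit(self, user=None, description=None):
        # Stash the audit info on the session so flush-time hooks can read it.
        self.info["audit_user"] = user
        self.info["audit_description"] = description
        try:
            super().commit()
        finally:
            self.info.pop("audit_user", None)
            self.info.pop("audit_description", None)

SessionFactory = sessionmaker(class_=AuditedSession)  # usage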
Discovering changes
There is sqlalchemy.orm.attributes.instance_state(obj), which contains all the information you need. The most useful for you is probably the state.committed_state dictionary, which contains the original state of changed fields (including many-to-many relations!). There is also a state.get_history() method (or the sqlalchemy.orm.attributes.get_history() function) returning a history object with a has_changes() method and added and deleted properties for the new and old values respectively. In the latter case, use state.manager.keys() (or state.manager.attributes) to get a list of all fields.
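To make the history part concrete, here is a small sketch that collects old and new values for every changed attribute of an object before it is flushed (inspect() returns the same state object that instance_state() does):

from sqlalchemy import inspect

def changed_fields(obj):
    changes = {}
    state = inspect(obj)
    for attr in state.attrs:
        history = attr.history
        if history.has_changes():
            # deleted holds the old values, added holds the new ones
            changes[attr.key] = {"old": history.deleted, "new": history.added}
    return changes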
Automatically storing changes
SQLAlchemy supports mapper extensions that can provide hooks before and after update, insert, and delete. You need to provide your own extension with all the before hooks (you can't use the after ones, since the state of objects is changed on flush). For the declarative extension it's easy to write a subclass of DeclarativeMeta that adds a mapper extension to all your models. Note that you have to flush changes twice if you use mapped objects for the log, since the unit of work doesn't account for objects created in hooks.
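On newer SQLAlchemy versions the same hooks are exposed through the event API (the successor to MapperExtension). A rough sketch of the idea, where audit_table is a hypothetical plain Core table for the log rows:

from sqlalchemy import event
from sqlalchemy.orm import Mapper

def make_listener(operation):
    def listener(mapper, connection, target):
        # Write via the Connection so no new ORM objects are created mid-flush.
        connection.execute(
            audit_table.insert().values(
                table_name=target.__table__.name,
                operation=operation,
                pk=str(mapper.primary_key_from_instance(target)),
            )
        )
    return listener

# Register the "before" hooks for every mapped class, as suggested above.
for operation, hook in (("insert", "before_insert"),
                        ("update", "before_update"),
                        ("delete", "before_delete")):
    event.listen(Mapper, hook, make_listener(operation))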
We have a pretty comprehensive "versioning" recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/LogVersions. It seems some other users have contributed some variants on it. The mechanics of "add a row when something changes at the ORM level" are all there.
Alternatively you can also intercept at the execution level using ConnectionProxy, search through the SQLA docs for how to use that.
edit: versioning is now an example included with SQLA: http://docs.sqlalchemy.org/en/rel_0_8/orm/examples.html#versioned-objects