Garbage collecting objects in Django - python

I have a one-to-many relationship, and I would like to automatically delete the one side after the last referencing object on the many side has been deleted. That is to say, I want to perform garbage collection, or do a kind of reverse cascade operation.
I have tried to solve this by using Django's post_delete signal. Here is a simplified example of what I'm trying to do:
models.py
class Bar(models.Model):
    j = models.IntegerField()
    # implicit foo_set

class Foo(models.Model):
    i = models.IntegerField()
    bar = models.ForeignKey(Bar)

def garbage_collect(sender, instance, **kwargs):
    # Bar should be deleted after the last Foo.
    if instance.bar.foo_set.count() == 0:
        instance.bar.delete()

post_delete.connect(garbage_collect, Foo)
This works when using Model.delete, but with QuerySet.delete it breaks horribly.
tests.py
class TestGarbageCollect(TestCase):
    # Bar(j=1)
    # Foo(bar=bar, i=1)
    # Foo(bar=bar, i=2)
    # Foo(bar=bar, i=3)
    fixtures = ['db.json']

    def test_separate_post_delete(self):
        for foo in Foo.objects.all():
            foo.delete()
        self.assertEqual(Foo.objects.count(), 0)
        self.assertEqual(Bar.objects.count(), 0)
This works just fine.
tests.py continued
    def test_queryset_post_delete(self):
        Foo.objects.all().delete()
        self.assertEqual(Foo.objects.count(), 0)
        self.assertEqual(Bar.objects.count(), 0)
This breaks the second time the signal is emitted. As Django's documentation says, QuerySet.delete is applied instantly, so instance.bar.foo_set.count() == 0 is already true the first time the signal is emitted. The docs also say that QuerySet.delete emits a post_delete signal for every deleted object, so garbage_collect gets called again after Bar has already been deleted.
To the questions then:
Is there a better way of garbage collecting the one side of a one-to-many relationship?
If not, what should I change to be able to use QuerySet.delete?

Looking at delete() in django/db/models/deletion.py, I found that QuerySet.delete deletes the collected instances in a batch and THEN triggers post_delete for each deleted instance. If you delete the Bar() in the post_delete call for the first deleted Foo() instance, the later post_delete calls for the other Foo() instances fail, because the Bar() they point to has already been deleted.
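To make that ordering concrete, here is a minimal plain-Python sketch, not Django code. The dictionaries bars and foos stand in for the two tables, queryset_delete is a hypothetical stand-in for QuerySet.delete, and KeyError plays the role of Bar.DoesNotExist; all of these names are assumptions made for illustration.

```python
# Plain-Python stand-in for the ORM: two "tables" keyed by primary key.
bars = {1: {"j": 1}}
foos = {
    1: {"i": 1, "bar_id": 1},
    2: {"i": 2, "bar_id": 1},
    3: {"i": 3, "bar_id": 1},
}


def garbage_collect(foo):
    """Mimic the post_delete handler: drop the bar once no foo references it."""
    bar_id = foo["bar_id"]
    bars[bar_id]  # lookup raises KeyError, the analogue of Bar.DoesNotExist
    if not any(f["bar_id"] == bar_id for f in foos.values()):
        del bars[bar_id]


def queryset_delete(pks, handler):
    """Mimic a bulk delete: remove every row first, then signal for each one."""
    deleted = [foos.pop(pk) for pk in pks]
    errors = 0
    for foo in deleted:
        try:
            handler(foo)
        except KeyError:
            errors += 1
    return errors


failed_calls = queryset_delete([1, 2, 3], garbage_collect)
print(failed_calls, bars)
```

With three Foos sharing one Bar, the first handler call already sees no remaining Foos and removes the Bar, so the other two calls fail on the lookup.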
The key here is that Foo()s with the same bar do not point to the same Bar() instance, so each one fetches its bar separately, and the bar gets deleted too early. Then we could
simply wrap the lookup of instance.bar in try...except
def garbage_collect(sender, instance, **kwargs):
    try:
        # Delete the Bar only when no Foo references it any more.
        if not instance.bar.foo_set.exists():
            instance.bar.delete()
    except Bar.DoesNotExist:
        pass
preload the Bar() for each instance to avoid the above exception
def test_queryset_post_delete(self):
    Foo.objects.select_related('bar').delete()

def garbage_collect(sender, instance, **kwargs):
    # Delete the Bar only when no Foo references it any more.
    if not instance.bar.foo_set.exists():
        instance.bar.delete()
Both of the above solutions issue extra SELECT queries. More graceful options could be:
Delete orphaned Bars with a single query, either in garbage_collect or manually later, if you can:
Bar.objects.filter(foo__isnull=True).delete()
In garbage_collect, record the planned deletion of the Bar() instead of deleting it right away, for example in a ref-count flag or a queued task.
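The single-query cleanup amounts to "delete every bar that no foo references". Here is a rough sketch of that idea in plain SQL through the standard-library sqlite3 module. The table and column names (bar, foo, bar_id) are assumptions that mirror the models, and the statement is not necessarily the SQL Django itself emits for the filter above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bar (id INTEGER PRIMARY KEY, j INTEGER);
    CREATE TABLE foo (id INTEGER PRIMARY KEY, i INTEGER,
                      bar_id INTEGER REFERENCES bar(id));
    INSERT INTO bar (id, j) VALUES (1, 1), (2, 2);
    INSERT INTO foo (i, bar_id) VALUES (1, 1), (2, 1), (3, 2);
""")

# Bulk-delete every foo that points at bar 1 (the QuerySet.delete analogue).
conn.execute("DELETE FROM foo WHERE bar_id = 1")

# One cleanup pass: remove every bar that no foo references any more.
conn.execute("""
    DELETE FROM bar
    WHERE NOT EXISTS (SELECT 1 FROM foo WHERE foo.bar_id = bar.id)
""")

remaining_bars = [row[0] for row in conn.execute("SELECT id FROM bar ORDER BY id")]
print(remaining_bars)
```

Because the cleanup runs once after the bulk delete, it does not depend on the order in which per-object signals fire.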

I guess you can override the model's delete method, find the related objects, and delete them too.

Related

Django CASCADE and post_delete interaction

I have the following model:
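class A(models.Model):
    # max_length is required by CharField; 255 is an arbitrary illustrative value
    foriegn_id1 = models.CharField(max_length=255)  # ref to a database not managed by django
    foriegn_id2 = models.CharField(max_length=255)

class B(models.Model):
    a = models.OneToOneField(A, on_delete=models.CASCADE)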
So I want A to be deleted as well when B is deleted:
@receiver(post_delete, sender=B)
def post_delete_b(sender, instance, *args, **kwargs):
    if instance.a:
        instance.a.delete()
And on the deletion of A, I want to delete the objects from the unmanaged databases:
@receiver(post_delete, sender=A)
def post_delete_a(sender, instance, *args, **kwargs):
    if instance.foriegn_id1:
        delete_foriegn_obj_1(instance.foriegn_id1)
    if instance.foriegn_id2:
        delete_foriegn_obj_2(instance.foriegn_id2)
Now, if I delete object B, it works fine. But if I delete object A, then object B is deleted by cascade, which emits a post_delete signal that triggers the deletion of A again. Django knows how to manage that on its end, so it works fine until it reaches delete_foriegn_obj, which is then called twice and returns a failure on the second attempt.
I thought about validating that the object exists in delete_foriegn_obj, but it adds 3 more calls to the DB.
So the question is: is there a way to know during post_delete_b that object a has been deleted?
Both instance.a and A.objects.get(id=instance.a.id) return the object (I guess Django holds back the DB update until all of the deletions are done).
The problem is that the cascaded deletions are performed before the requested object is deleted, hence when you query the DB (A.objects.get(id=instance.a.id)) the related a instance is still present there. instance.a can even return a cached result, so there's no way it would show otherwise.
So while a B model instance is being deleted, the related A instance will always exist (if there actually is one). Hence, from the B model's post_delete signal receiver, you can get the related A instance and check in the DB whether its related B still exists (there's no way to avoid the DB here if you want the actual picture underneath):
@receiver(post_delete, sender=B)
def post_delete_b(sender, instance, *args, **kwargs):
    try:
        a = instance.a
    except AttributeError:
        return

    try:
        a._state.fields_cache = {}
    except AttributeError:
        pass

    try:
        a.b  # one extra query
    except AttributeError:
        # This is cascaded delete
        return

    a.delete()
We also need to make sure we're not getting any cached result by making a._state.fields_cache empty. The fields_cache (which is actually a descriptor that returns a dict upon first access) is used by the ReverseOneToOneDescriptor (accessor to the related object on the opposite side of a one-to-one) to cache the related field name-value. FWIW, the same is done on the forward side of the relationship by the ForwardOneToOneDescriptor accessor.
Edit based on comment:
If you're using this function for multiple senders' post_delete, you can dynamically get the related attribute via getattr:
getattr(a, sender.a.field.related_query_name())
This does the same as a.b above but looks the attribute up dynamically by name, so it results in exactly the same query.

Django - How to dynamically create signals inside model Mixin

I'm working on a model Mixin which needs to dynamically set signals based on one attribute.
It's more complicated but for simplicity, let's say the Mixin has this attribute:
models = ['app.model1','app.model2']
This attribute is defined in model which extends this mixin.
How can I register signals dynamically?
I tried to create a classmethod:
@classmethod
def set_signals(cls):
    def status_sig(sender, instance, created, *args, **kwargs):
        print('SIGNAL')
        # ... do some things
    for m in cls.get_target_models():
        post_save.connect(status_sig, m)
My idea was to call this method somewhere in class automatically (for example __call__ method) but for now, I just tried to call it and then save the model to see if it works but it didn't.
from django.db.models.signals import post_save
print(post_save.receivers)
Realestate.set_signals()
print(post_save.receivers)
r = Realestate.objects.first()
r.status = 1
r.save()
output
[]
[((139967044372680, 46800232), <weakref at 0x7f4c9d702408; dead>), ((139967044372680, 46793464), <weakref at 0x7f4c9d702408; dead>)]
So you see that it registered those models but no signal has been triggered after saving the realestate.
Do you know how to make it work? Even better, without having to call the method explicitly?
EDIT:
I can't just put the signals creation inside mixin file because models depends on the string in child model.
If you haven't already solved this:
In the connect method, set weak=False. By default it's True, so the signal holds only a weak reference to the receiver, and a locally defined function is lost as soon as it is garbage collected.
This is likely what's happening to your status_sig function; as you can see in the printout of the post_save receivers, the weakrefs are dead, so they will always just return None.
In the Django docs:
weak – Django stores signal handlers as weak references by default. Thus, if your receiver is a local function, it may be garbage collected. To prevent this, pass weak=False when you call the signal’s connect() method.
For more info on weakrefs, see Python docs
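The effect can be reproduced without Django. This is a small standard-library sketch in which connect_weakly and connect_strongly are hypothetical helpers that imitate, respectively, keeping only a weak reference to a local function and also keeping a strong one. They are illustrative assumptions, not Django's implementation.

```python
import gc
import weakref


def connect_weakly():
    def status_sig(**kwargs):
        return "SIGNAL"
    # Only a weak reference leaves this scope, as with weak=True.
    return weakref.ref(status_sig)


def connect_strongly(registry):
    def status_sig(**kwargs):
        return "SIGNAL"
    registry.append(status_sig)  # strong reference, as with weak=False
    return weakref.ref(status_sig)


weak_only = connect_weakly()
registry = []
kept_alive = connect_strongly(registry)
gc.collect()  # CPython frees the function on scope exit; this covers other runtimes

weak_is_dead = weak_only() is None
strong_is_alive = kept_alive() is not None
print(weak_is_dead, strong_is_alive)
```

Calling a dead weak reference returns None, so the receiver is never invoked. Keeping any strong reference to the function, such as a module-level definition, has the same effect as the registry list here.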

determine if Django model is marked for deletion

My example is very contrived, but hopefully it gets the point across.
Say I have two models like this:
class Group(models.Model):
    name = models.CharField(max_length=50)

class Member(models.Model):
    name = models.CharField(max_length=50)
    group = models.ForeignKey(Group)
I want to add some code so that when a Member is deleted it gets recreated as a new entry (remember, very contrived!). So I do this:
@receiver(post_delete, sender=Member)
def member_delete(sender, instance, **kwargs):
    instance.pk = None
    instance.save()
This works perfectly fine for when a Member is deleted.
The issue, though, is if a Group is deleted this same handler is called. The Member is re-created with a reference to the Group and an IntegrityError is thrown when the final commit occurs.
Is there any way within the signal handler to determine that Group is being deleted?
What I've tried:
The sender seems to always be Member regardless.
I can't seem to find anything on instance.group to indicate a delete. Even trying to do a Group.objects.filter(id=instance.group_id).exists() returns true. It may be that the actual delete of the parent occurs after post_delete calls occur on the children, in which case what I'm trying to do is impossible.
Try doing this with a classmethod on the Member class and forget about signals.
@classmethod
def reinit(cls, instance):
    instance.delete()
    instance.save()

Django model, default records

I am a beginner in Django, and I am learning models for now.
I have two tables in the backend, one a child of another (1-to-many relationship).
So far, so good.
What I want to do is set Django, so that if a record is created in the parent table, the child table will automatically create 3 records.
How do I program this?
Thanks.
You may be interested in something like this:
# ... other imports ...
from django.db.models.signals import post_save

class Parent(models.Model):
    # this decorator & this method go inside your model
    @staticmethod
    def create_children(sender, instance=None, created=False, **kwargs):
        # Only act when the parent row has just been inserted.
        if not created:
            return
        for x in range(3):
            # I'm assuming you want the child to be linked to the parent immediately.
            # You can set any other attributes you want here too, of course.
            child = Child(parent=instance)
            child.save()

post_save.connect(Parent.create_children, Parent)
The handler is connected to post_save because a new parent has no primary key before it is saved, and the created flag keeps later saves of the same parent from adding three more children. Note that in the post_save.connect() call, you can call any [SomeClass].[SomeMethodOfThatClass] (this is the first argument) on the save of some other class (this is the second argument). In practice, though, I don't think I've actually ever done that, and I'm not sure that you need to do that here.

'No implementation for Kind' when Kind is a PolyModel child

I have a model with properties:
_foo = db.ListProperty(db.Key)

@property
def foo(self):
    return [db.get(f) for f in self._foo]
_foo is list of Bar keys, where:
class Barbie(polymodel.PolyModel):
    #...

class Bar(Barbie):
    #...
However, when get() in foo is called, I get a:
KindError: No implementation for kind 'Bar'
Every other question I've found on this has been answered by "you need to import the model for that kind".
Well, I have. To sanity check, I changed db.get to Bar.get and got the same error. If I then remove my import, obviously I get 'global name Bar is not defined', so it is picking up that import okay.
Those other questions weren't about PolyModels though, so I tried changing my Bar and sibling model to db.Models, with duplicate properties rather than derivations of a PolyModel. Then, it worked immediately.
To check I hadn't changed anything else, I reverted to the PolyModel, and it immediately broke with the same KindError.
Why does this get fail with a PolyModel class but work fine with a db.Model class? According to the docs, PolyModel only adds to the available methods.
