Cost of Django many-to-many "through" relationship

If the foreign keys that define a many to many relationship are necessary anyway, is there any / much extra cost at the database level to telling Django that they define a many-to-many "through" relationship? Also, can a foreign key remain nullable in this circumstance?
What has to be there:
class StockLine(models.Model):  # a line of stock (i.e. a batch)
    # One or other of the following two is null depending on
    # whether the StockLine was manufactured in-house or bought in
    # (maybe both if it's been in stock "forever", no computer records).
    production_record = models.ForeignKey('ProductionRecord',
                                          null=True, blank=True)
    purchase_order = models.ForeignKey('PurchaseOrder',
                                       null=True, blank=True)
    itemdesc = models.ForeignKey('ItemDesc')
    # other fields ...
class ItemDesc(models.Model):  # a description of an item
    # various fields
class ProductionRecord(models.Model):  # description of a manufacturing process
    # various fields
There is an implied many-to-many relationship between ProductionRecord and ItemDesc through StockLine. Given that one of the foreign keys is nullable, can I make the M2M explicit by adding
class ItemDesc(models.Model):
    production_records = models.ManyToManyField(ProductionRecord,
                                                through='StockLine')
and if I can, is there any added cost at the database level, or is this change purely at the Django ORM level? It's not an essential relationship to make explicit and it won't be heavily used, but it would certainly make programming easier.

Nullable fields shouldn't cause any problems: null=True only means the column may hold NULL, not that it must. The foreign keys therefore remain usable for a many-to-many relationship; a row whose foreign key is NULL simply doesn't match anything in the join.
Keep in mind the restrictions on intermediate models (for example, the through model needs a foreign key to each side of the relationship) and you should be fine. At the database level nothing is added: without an intermediate model Django creates an extra join table for the many-to-many relationship, whereas with the "through" argument it reuses the intermediate model's table.
The SQL queries shouldn't be affected as far as performance is concerned.
Generally, I'd recommend having your models follow your project's real-life logic, so use the intermediate model if it's appropriate.
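To illustrate the point about the database level, here is a minimal sketch using SQLite from Python's standard library rather than Django itself. The table and column names are hypothetical stand-ins for what the models in the question would map to. It shows that the many-to-many lookup is just a join over the existing stock line table, and that a row with a NULL foreign key takes no part in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE production_record (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_desc (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stock_line (
    id INTEGER PRIMARY KEY,
    production_record_id INTEGER REFERENCES production_record(id),  -- nullable
    item_desc_id INTEGER NOT NULL REFERENCES item_desc(id)
);
INSERT INTO production_record VALUES (1, 'run A');
INSERT INTO item_desc VALUES (1, 'widget'), (2, 'gadget');
-- stock line 2 was bought in, so it has no production record
INSERT INTO stock_line VALUES (1, 1, 1), (2, NULL, 2);
""")


def items_for_production_record(record_id):
    """Item names linked to a production record through stock_line."""
    rows = conn.execute(
        """
        SELECT item_desc.name
        FROM item_desc
        JOIN stock_line ON stock_line.item_desc_id = item_desc.id
        WHERE stock_line.production_record_id = ?
        ORDER BY item_desc.name
        """,
        (record_id,),
    ).fetchall()
    return [name for (name,) in rows]


# Only the three tables that had to exist anyway are present.
tables = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```

Calling items_for_production_record(1) returns only the item that has a stock line pointing at that production record; the bought-in stock line is skipped because its foreign key is NULL.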


Django model constraint for related objects

I have the following code for models:
class Tag(models.Model):
    user = models.ForeignKey('auth.User', on_delete=models.CASCADE)
class Activity(models.Model):
    user = models.ForeignKey('auth.User', on_delete=models.CASCADE)
    tags = models.ManyToManyField(Tag, through='TagBinding')
class TagBinding(models.Model):
    tag = models.ForeignKey(Tag, on_delete=models.CASCADE)
    activity = models.ForeignKey(Activity, on_delete=models.CASCADE)
I want to write a database constraint on the TagBinding model using the new Django 2.2 syntax. The constraint should check that the tag and activity fields of a TagBinding belong to the same user. What I've tried:
class TagBinding(models.Model):
    tag = models.ForeignKey(Tag, on_delete=models.CASCADE)
    activity = models.ForeignKey(Activity, on_delete=models.CASCADE)
    class Meta:
        constraints = [
            models.CheckConstraint(
                name='user_equality',
                check=Q(tag__user=F('activity__user')),
            )
        ]
But this doesn't work, because Django doesn't allow joins inside an F expression. A Subquery with OuterRef didn't work for me either, because the models referenced in the query were not yet registered.
Is there any way I can implement this constraint using the new syntax, without raw SQL?
Update
It seems that some SQL backends don't support joins in constraint definitions, so the question now is: is it even possible to implement this behavior in a relational database?
In Postgres, there are two types of constraints (other than things like unique and foreign key constraints): CHECK constraints and EXCLUDE constraints.
Check constraints can only apply to a single row.
Exclusion constraints can only apply to a single table.
You will not be able to use either of these to enforce the constraint you want, which crosses table boundaries to ensure consistency.
What you could use instead are trigger-based constraints, which can perform other queries in order to validate the data.
For instance, you could have a BEFORE INSERT OR UPDATE trigger on the various tables that checks the users match. I have some similar code that runs on some self-relation tree code and ensures a parent and child both have the same "category" as one another.
In this case, it's going to be a bit trickier, because you would need some mechanism for deferring the check until all the tables involved have been updated.
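As a rough sketch of the trigger idea, the following uses SQLite from Python's standard library as a stand-in for Postgres (the trigger syntax differs between the two, so treat this as an illustration only). The table, column, and trigger names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tag (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE TABLE activity (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL);
CREATE TABLE tag_binding (
    id INTEGER PRIMARY KEY,
    tag_id INTEGER NOT NULL REFERENCES tag(id),
    activity_id INTEGER NOT NULL REFERENCES activity(id)
);

-- Reject a binding whose tag and activity belong to different users.
CREATE TRIGGER tag_binding_same_user
BEFORE INSERT ON tag_binding
BEGIN
    SELECT RAISE(ABORT, 'tag and activity belong to different users')
    WHERE (SELECT user_id FROM tag WHERE id = NEW.tag_id)
       <> (SELECT user_id FROM activity WHERE id = NEW.activity_id);
END;

INSERT INTO tag VALUES (1, 10), (2, 20);
INSERT INTO activity VALUES (1, 10);
""")


def bind(tag_id, activity_id):
    """Try to insert a binding; return False if the database rejects it."""
    try:
        conn.execute(
            "INSERT INTO tag_binding (tag_id, activity_id) VALUES (?, ?)",
            (tag_id, activity_id),
        )
        return True
    except sqlite3.DatabaseError:
        return False


def binding_count():
    return conn.execute("SELECT COUNT(*) FROM tag_binding").fetchone()[0]
```

This sketch only guards inserts into the binding table. Updates to a binding, and changes to the user of an existing tag or activity, would each need a trigger of their own, which is where the ordering problem mentioned above comes in.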

Django - Check if a non-foreign-key integer field exists as a Primary Key to another Model?

Say you have the following legacy model:
class Foo(models.Model):
    bar_id = models.IntegerField(null=True, blank=True)
    # more fields
bar_id is supposed to refer to a primary key of the Bar model, but for some reason it's not registered as a foreign key. Now, how can I filter out all the Foos that do not have corresponding Bar objects?
We can make a list of the primary keys of all Bars, and then exclude every Foo that refers to one of those primary keys:
Foo.objects.exclude(bar_id__in=Bar.objects.all().values_list('pk', flat=True))
This is a QuerySet that gives all Foo objects with an "invalid" bar_id (that is, an id that refers to a non-existing Bar).
But it is better to use a ForeignKey, since most databases will then enforce this constraint transparently. As a result, the database typically ensures that no such rows can exist at all. Typically you also specify what should happen when, for example, the referenced Bar object is removed.
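For reference, the same check can be written directly in SQL as an anti-join. This is a minimal sketch using SQLite from the standard library, with hypothetical table names foo and bar. It assumes rows whose bar_id is NULL should not be reported as orphans, which may or may not match what you want from the ORM query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bar (id INTEGER PRIMARY KEY);
CREATE TABLE foo (id INTEGER PRIMARY KEY, bar_id INTEGER);  -- no FK constraint
INSERT INTO bar VALUES (1), (2);
INSERT INTO foo VALUES (1, 1), (2, 99), (3, NULL), (4, 2), (5, 7);
""")


def orphaned_foo_ids():
    """Ids of foo rows whose non-NULL bar_id matches no bar row."""
    rows = conn.execute(
        """
        SELECT foo.id
        FROM foo
        LEFT JOIN bar ON bar.id = foo.bar_id
        WHERE foo.bar_id IS NOT NULL
          AND bar.id IS NULL
        ORDER BY foo.id
        """
    ).fetchall()
    return [foo_id for (foo_id,) in rows]
```

Here foo rows 2 and 5 point at bar ids that do not exist, so they are the ones returned.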
Reading some of the comments, I understand that the OP would prefer to use a foreign key but cannot do so because of corrupt or missing data in the database.
Two solutions:
1. Mark the column as a foreign key in your model, but do not enforce it in the database (use the --fake flag when migrating with manage.py). This approach helps define your business / model logic better and enforces data integrity in local development environments.
2. Mark the column as a foreign key in your model and set db_constraint=False (see the ForeignKey.db_constraint option in the Django documentation). Use this approach for legacy systems where data integrity has already been compromised and you just need to use Django's ORM joins the natural way.

Having a model to relate to several different models

I have a simple notification model:
class Notification(models.Model):
    user = models.ForeignKey(User)
    sender = models.ForeignKey(User)
    model = '''What to put here?'''
    comment = models.CharField(max_length=200)
    created = models.DateTimeField(auto_now=False, auto_now_add=True)
I need the notification to relate to several different models, for example posts, user follows, etc.
Is there any way in Django to relate to several models instead of creating a notification model for each one?
I want to avoid models like this:
PostLikeNotification, UserFollowNotification, etc.
So does Django have this functionality? I couldn't find it anywhere in the docs.
You could use Content Types / Generic Relations:
from django.contrib.contenttypes.models import ContentType
from django.db import models

class Notification(models.Model):
    user = models.ForeignKey(User)
    sender = models.ForeignKey(User)
    object_id = models.PositiveIntegerField(default=None, null=True)
    content_type = models.ForeignKey(ContentType, default=None, null=True)
    comment = models.CharField(max_length=200)
    created = models.DateTimeField(auto_now=False, auto_now_add=True)

    @property
    def model_object(self):
        # Resolve the stored (content_type, object_id) pair to the related object.
        content_type = self.content_type
        object_id = self.object_id
        if content_type is not None and object_id is not None:
            MyClass = content_type.model_class()
            model_object = MyClass.objects.filter(pk=object_id)
            if model_object.exists():
                return model_object.first()
        return None
Here we are storing the Model (Using the Content Types framework) and Primary Key (must be an Integer in this example) of the related object in the Notification model, then adding a property method to fetch the related object.
With this you can relate your notifications to any other model. You could also use the ForeignKey.limit_choices_to argument on the content_type field to validate that it only accepts certain models.
Django needs to know the model before creating a relation, so you can instead store the model reference in a char field, like post:23 or user_follow:41, and define a get_model method that parses that field and returns the right model object.
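A minimal sketch of the parsing side of that idea, in plain Python so it runs without Django. The function names and the loaders mapping are hypothetical: in a real project each loader would presumably wrap something like a queryset lookup for the corresponding model.

```python
def parse_model_ref(ref):
    """Split a reference such as 'post:23' into ('post', 23)."""
    label, sep, raw_id = ref.partition(":")
    if not sep or not label or not raw_id.isdigit():
        raise ValueError(f"malformed model reference: {ref!r}")
    return label, int(raw_id)


def get_model(ref, loaders):
    """Look up the object a reference points at.

    `loaders` maps a label (e.g. 'post') to a callable that takes a
    primary key and returns the object, or None if it does not exist.
    """
    label, object_id = parse_model_ref(ref)
    try:
        loader = loaders[label]
    except KeyError:
        raise ValueError(f"unknown model label: {label!r}") from None
    return loader(object_id)


# Stand-in data instead of real models, purely for illustration.
posts = {23: "post #23"}
follows = {41: "follow #41"}
loaders = {"post": posts.get, "user_follow": follows.get}
```

Because nothing in the database ties the stored string to a real row, a reference can point at an object that no longer exists, so callers have to handle a None result.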
It all depends on your design; you have several options, and which one fits depends on the size of your database:
How many notifications are there?
Do you need to update the notifications often?
Or are most of the notifications inserted once and then read often?
Use an abstract model
Use an abstract model and actually create the PostLikeNotification and UserFollowNotification and other models of such a kind.
class Notification(models.Model):
    # ...
    class Meta:
        abstract = True
class PostLikeNotification(Notification):
    model = models.ForeignKey(SomePost)
class UserFollowNotification(Notification):
    model = models.ForeignKey(Follower)
# ...
This has several advantages:
You keep your relations in your (relational) database.
You have strong foreign keys to prevent inconsistent data.
It is "Djangoic": keeping relations in the database, starting from a normalised schema, and avoiding early optimisation are Django's way of doing things.
And, of course, this has some disadvantages:
If you need to search all notifications for something the query will be complex.
Moreover, a query over all notifications will be slow, since it filters several tables.
Use a CharField
You can use a simple CharField and store the model name and id in it, or use two fields: one for the name and another for the id.
class Notification(models.Model):
    model_type = models.CharField(max_length=48)
    model_id = models.PositiveIntegerField()
Advantages:
You have a single table, querying is faster if you have the right indexes.
You can get one of the types of notifications with a simple comparison (index model_type for extra speed).
Disadvantages:
Inconsistent data may appear.
You will need to add extra code at a higher level to deal with possible inconsistent data.
Parallel writes (that may need to lock the entire table) may be a problem.
The middle ground, use several foreign keys
This is just one way of implementing a middle ground between the two options above: you add several nullable foreign keys. Other ways of achieving a middle ground exist.
class Notification(models.Model):
    model_post = models.ForeignKey(SomePost, null=True, blank=True)
    model_follow = models.ForeignKey(Follower, null=True, blank=True)
Advantage:
Verification of inconsistent data can be made without searching other tables (foreign keys are foreign keys, the database takes care of their consistency).
Disadvantage:
It has most of the disadvantages of the other two methods, though mostly to a lesser extent.
Conclusion
If you're just starting a project and you do not know (or are not worried) about the volume of data, then do create several tables. Abstract models were created for this purpose.
On the other hand, if you have a lot of notifications to be read and filtered (by a lot, I mean millions), then you have good reasons to create a single notification table and process the relations at a higher level. Note that this incurs locking problems: you should (almost) never lock notifications if you have a single table.

why is models.ForeignKey advantageous?

In my models I have a Concert class and a Venue class. Each venue has multiple concerts. I have been linking the Concert class to a Venue with a simple
venue = models.IntegerField(max_length=10)
...containing the venue object's primary key. A colleague suggested we use venue = models.ForeignKey(Venue) instead. While this also works, I wonder if it's worth the switch, because I have been able to pull out all the concerts for a venue simply by using the venue's ID in Concert.objects.filter(venue=4), the same way I could with a ForeignKey: venue_instance.concert_set.all(). I've never had any problems using my method.
The way I see it, using the IntegerField and objects.filter() is just as much of a "ManyToOne" relationship as a ForeignKey, so I want to know where I'm wrong. Why are ForeignKeys advantageous? Are they faster? Is it better database design? Cleaner code?
I would say that the most practical benefit of a foreign key is the ability to query across relationships automatically. Django generates the JOINs automatically.
The automatic reverse relation helpers are great too as you mentioned.
Here are some examples that would be more complicated with only an integer relationship.
concerts = Concert.objects.filter(...)
concerts.order_by('venue__attribute')  # ordering beyond PK
concerts.filter(venue__name='foo')  # filter by a value across the relationship
concerts.values_list('venue__name')  # get just venue names
concerts.values('venue__city').distinct()  # get unique values across the venue
concerts.filter(venue__more__relationships='foo')
Venue.objects.filter(concert__name='Coachella')  # reverse lookups work too
# with an integer field for Concert.venue, you'd have to do something like...
Venue.objects.filter(id__in=Concert.objects.filter(name='Coachella').values_list('venue', flat=True))
As others have pointed out, database integrity is useful, as are cascading deletes (customizable, of course), and (facepalm) it just occurred to me that the Django admin and forms framework work amazingly well with foreign keys.
class ConcertInline(admin.TabularInline):
    model = Concert
class VenueAdmin(admin.ModelAdmin):
    inlines = [ConcertInline]
# that was quick!
I'm sure there are more examples of django features handling foreign keys.
ForeignKey is a database concept, implemented in most databases, that also enforces referential integrity.
Because Django knows that this column refers to another table, which may itself hold a foreign key to some other table, it can chain the relationships and produce the corresponding joins in the SQL.
Besides the normal one-way chaining, Django also adds an accessor on the opposite side, as you have recognized. When you have a venue instance, you are able to query venue.concert_set.
The thing that bothers me the most about not using a FK and rolling your own with an integer is that:
You don't have a referential integrity check.
You lose out on the power of SQL. Every moderately deep query of yours will now need multiple hits to the database, since you can't join.
You also lose out on all the levers the framework provides to deal with the SQL.
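The referential integrity point can be seen at the database level with a small sketch. It uses SQLite from Python's standard library, where foreign key enforcement has to be switched on per connection. The two concert tables are hypothetical and exist only to contrast a plain integer column with a real foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE venue (id INTEGER PRIMARY KEY, name TEXT);
-- plain integer column: the database knows nothing about venues here
CREATE TABLE concert_int (id INTEGER PRIMARY KEY, venue INTEGER);
-- real foreign key: the database checks that the venue exists
CREATE TABLE concert_fk (
    id INTEGER PRIMARY KEY,
    venue_id INTEGER REFERENCES venue(id)
);
INSERT INTO venue VALUES (4, 'Town Hall');
""")


def try_insert(table, column, venue_id):
    """Insert a concert row; return False if the database rejects it.

    The table and column names come from this sketch, not from user input.
    """
    try:
        conn.execute(f"INSERT INTO {table} ({column}) VALUES (?)", (venue_id,))
        return True
    except sqlite3.IntegrityError:
        return False
```

With the integer column, a concert pointing at venue 999 is accepted even though no such venue exists; with the foreign key, the same insert is rejected, while a concert for the existing venue 4 goes through.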

Django m2m queries, distinct Users for a m2m relationship of a Model

I have a model, Model, with an m2m field:
user = .. fk user
...
watchers = models.ManyToManyField(User, related_name="boardShot_watchers", null=True)
How do I select all the distinct Users involved in this watchers relationship, across all my entries of type Model?
I don't think there is an ORM way to access the intermediary M2M table.
Greg
Not in your current model. If you want explicit access to the joining table, you need to make it part of the Django object model. The docs explain how to do this:
http://www.djangoproject.com/documentation/models/m2m_intermediary/
The admin and other django.contrib.* components can be configured to treat such fields much the same as a plain models.ManyToManyField, but it will take a little configuration.
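At the SQL level, what the question asks for is a SELECT DISTINCT over the join table. The sketch below shows this with SQLite from the standard library. The table and column names (auth_user, app_model, app_model_watchers) are assumptions made for illustration, not names taken from the question's database. In the ORM, a reverse lookup via the related_name combined with .distinct() should yield the same set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
CREATE TABLE app_model (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES auth_user(id)
);
-- join table backing the watchers many-to-many field
CREATE TABLE app_model_watchers (
    id INTEGER PRIMARY KEY,
    model_id INTEGER NOT NULL REFERENCES app_model(id),
    user_id INTEGER NOT NULL REFERENCES auth_user(id)
);
INSERT INTO auth_user VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
INSERT INTO app_model VALUES (1, 1), (2, 1);
-- bob watches both entries, cat watches one, ann watches none
INSERT INTO app_model_watchers (model_id, user_id)
VALUES (1, 2), (2, 2), (2, 3);
""")


def distinct_watchers():
    """Usernames that appear at least once in the watchers join table."""
    rows = conn.execute(
        """
        SELECT DISTINCT auth_user.username
        FROM auth_user
        JOIN app_model_watchers ON app_model_watchers.user_id = auth_user.id
        ORDER BY auth_user.username
        """
    ).fetchall()
    return [username for (username,) in rows]
```

Although bob watches two entries, DISTINCT collapses him to a single row, and ann is left out because she watches nothing.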
