why is models.ForeignKey advantageous? - python

In my models I have a Concert class and a Venue class. Each venue has multiple concerts. I have been linking the Concert class to a Venue with a simple
venue = models.IntegerField(max_length = 10)
...containing the venue object's primary key. A colleague suggested we use venue = models.ForeignKey(Venue) instead. While this also works, I wonder if it's worth the switch, because I have been able to pull out all the concerts for a venue simply by using the venue's ID in Concert.objects.filter(venue=4), the same way I could with a ForeignKey: venue_instance.concert_set.all(). I've never had any problems using my method.
The way I see it, using the IntegerField and objects.filter() is just as much of a "ManyToOne" relationship as a ForeignKey, so I want to know where I'm wrong. Why are ForeignKeys advantageous? Are they faster? Is it better database design? Cleaner code?

I would say that the most practical benefit of a foreign key is the ability to query across relationships automatically. Django generates the JOINs automatically.
The automatic reverse relation helpers are great too, as you mentioned.
Here are some examples that would be more complicated with only an integer relationship.
concerts = Concert.objects.filter(...)
concerts.order_by('venue__attribute') # ordering beyond PK.
concerts.filter(venue__name='foo') # filter by a value across the relationship
concerts.values_list('venue__name') # get just venue names
concerts.values('venue__city').annotate() # get unique values across the venue
concerts.filter(venue__more__relationships='foo')
Venue.objects.filter(concert__name='Coachella') # reverse lookups work too
# with an integer field for Concert.venue, you'd have to do something like...
Venue.objects.filter(id__in=Concert.objects.filter(name='Coachella').values_list('venue', flat=True))
As others have pointed out, database integrity is useful, as are cascading deletes (customizable, of course). And (facepalm) it just occurred to me that the Django admin and forms framework work amazingly well with foreign keys.
class ConcertInline(admin.TabularInline):
    model = Concert

class VenueAdmin(admin.ModelAdmin):
    inlines = [ConcertInline]
# that was quick!
I'm sure there are more examples of Django features that handle foreign keys.

A foreign key is a database concept, implemented in most databases, that also enforces referential integrity.
Because Django knows that this column refers to another table, which may itself hold a foreign key to yet another table, it can chain the relationships and produce the corresponding joins in the SQL.
Besides the normal one-way chaining, Django also adds an attribute to the opposite side, as you have recognized. When you have a venue instance, you can query venue.concert_set.
The things that bother me the most about not using an FK and rolling your own with an integer are:
- You don't have a referential integrity check.
- You lose out on the power of SQL. Every moderately deep query of yours will now need multiple hits to the database, since you can't join.
- You also lose out on all the levers the framework provides to deal with the SQL.
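To see what the referential integrity and join points mean at the database level, here is a minimal sketch using Python's built-in sqlite3 module rather than Django. The table names (concert_int, concert_fk), the columns, and the sample rows are made up for the illustration; Django would generate its own names and SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked to
conn.executescript("""
    CREATE TABLE venue (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Plain integer column: the database has no idea it points at venue.
    CREATE TABLE concert_int (id INTEGER PRIMARY KEY, name TEXT, venue INTEGER);
    -- Real foreign key: the database checks every value against venue.id.
    CREATE TABLE concert_fk (
        id INTEGER PRIMARY KEY,
        name TEXT,
        venue_id INTEGER NOT NULL REFERENCES venue(id) ON DELETE CASCADE
    );
    INSERT INTO venue (id, name) VALUES (4, 'Main Hall');
    INSERT INTO concert_fk (name, venue_id) VALUES ('Coachella', 4);
""")

# The integer column happily stores a venue id that does not exist.
conn.execute("INSERT INTO concert_int (name, venue) VALUES ('Ghost gig', 999)")

# The foreign key column refuses the same orphaned row.
try:
    conn.execute("INSERT INTO concert_fk (name, venue_id) VALUES ('Ghost gig', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# One JOIN answers "which venues host a concert named Coachella?" in a single query.
venues = conn.execute("""
    SELECT venue.name FROM venue
    JOIN concert_fk ON concert_fk.venue_id = venue.id
    WHERE concert_fk.name = 'Coachella'
""").fetchall()
```

With a ForeignKey, the ORM writes this kind of JOIN for you; with a bare integer field, nothing stops the 'Ghost gig' row from being saved.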

Related

Django multi-table inheritance alternatives for basic data model pattern

tl;dr
Is there a simple alternative to multi-table inheritance for implementing the basic data-model pattern depicted below, in Django?
Premise
Please consider the very basic data-model pattern in the image below, based on e.g. Hay, 1996.
Simply put: Organizations and Persons are Parties, and all Parties have Addresses. A similar pattern may apply to many other situations.
The important point here is that the Address has an explicit relation with Party, rather than explicit relations with the individual sub-models Organization and Person.
Note that each sub-model introduces additional fields (not depicted here, but see code example below).
This specific example has several obvious shortcomings, but that is beside the point. For the sake of this discussion, suppose the pattern perfectly describes what we wish to achieve, so the only question that remains is how to implement the pattern in Django.
Implementation
The most obvious implementation, I believe, would use multi-table-inheritance:
class Party(models.Model):
    """ Note this is a concrete model, not an abstract one. """
    name = models.CharField(max_length=20)

class Organization(Party):
    """
    Note that a one-to-one relation 'party_ptr' is automatically added,
    and this is used as the primary key (the actual table has no 'id'
    column). The same holds for Person.
    """
    type = models.CharField(max_length=20)

class Person(Party):
    favorite_color = models.CharField(max_length=20)

class Address(models.Model):
    """
    Note that, because Party is a concrete model, rather than an abstract
    one, we can reference it directly in a foreign key.
    Since the Person and Organization models have one-to-one relations
    with Party which act as primary key, we can conveniently create
    Address objects setting either party=party_instance,
    party=organization_instance, or party=person_instance.
    """
    party = models.ForeignKey(to=Party, on_delete=models.CASCADE)
This seems to match the pattern perfectly. It almost makes me believe this is what multi-table-inheritance was intended for in the first place.
However, multi-table-inheritance appears to be frowned upon, especially from a performance point-of-view, although it depends on the application. Especially this scary, but ancient, post from one of Django's creators is quite discouraging:
In nearly every case, abstract inheritance is a better approach for the long term. I’ve seen more than few sites crushed under the load introduced by concrete inheritance, so I’d strongly suggest that Django users approach any use of concrete inheritance with a large dose of skepticism.
Despite this scary warning, I guess the main point in that post is the following observation regarding multi-table inheritance:
These joins tend to be "hidden" — they’re created automatically — and mean that what look like simple queries often aren’t.
Disambiguation: The above post refers to Django's "multi-table inheritance" as "concrete inheritance", which should not be confused with Concrete Table Inheritance on the database level. The latter actually corresponds better with Django's notion of inheritance using abstract base classes.
I guess this SO question nicely illustrates the "hidden joins" issue.
Alternatives
Abstract inheritance does not seem like a viable alternative to me, because we cannot set a foreign key to an abstract model, which makes sense, because it has no table. I guess this implies that we would need a foreign key for every "child" model plus some extra logic to simulate this.
Proxy inheritance does not seem like an option either, as the sub-models each introduce extra fields. EDIT: On second thought, proxy models could be an option if we use Single Table Inheritance on the database level, i.e. use a single table that includes all the fields from Party, Organization and Person.
GenericForeignKey relations may be an option in some specific cases, but to me they are the stuff of nightmares.
As another alternative, it is often suggested to use explicit one-to-one relations (eoto for short, here) instead of multi-table-inheritance (so Party, Person and Organization would all just be subclasses of models.Model).
Both approaches, multi-table-inheritance (mti) and explicit one-to-one relations (eoto), result in three database tables. So, depending on the type of query, of course, some form of JOIN is often inevitable when retrieving data.
By inspecting the resulting tables in the database, it becomes clear that the only difference between the mti and eoto approaches, on the database level, is that an eoto Person table has an id column as primary-key, and a separate foreign-key column to Party.id, whereas an mti Person table has no separate id column, but instead uses the foreign-key to Party.id as its primary-key.
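The table-level difference described above can be reproduced with a small sqlite3 sketch. The table names person_mti and person_eoto are hypothetical and the schemas are simplified by hand; they are not the exact DDL Django would emit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (id INTEGER PRIMARY KEY, name TEXT);
    -- mti style: the link to party doubles as the primary key.
    CREATE TABLE person_mti (
        party_ptr_id INTEGER PRIMARY KEY REFERENCES party(id),
        favorite_color TEXT
    );
    -- eoto style: a separate id, plus a unique foreign key to party.
    CREATE TABLE person_eoto (
        id INTEGER PRIMARY KEY,
        party_id INTEGER NOT NULL UNIQUE REFERENCES party(id),
        favorite_color TEXT
    );
""")

def all_columns(table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def primary_key_columns(table):
    # pk is non-zero for columns that are part of the primary key.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows if row[5]]

mti_pk = primary_key_columns("person_mti")    # the foreign key itself
eoto_pk = primary_key_columns("person_eoto")  # the separate id column
```

Either way, reading a person's name still means joining to party; the two layouts differ only in which column carries the primary key.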
Question(s)
I don't think the behavior from the example (especially the single direct relation to the parent) can be achieved with abstract inheritance, can it? If it can, then how would you achieve that?
Is an explicit one-to-one relation really that much better than multi-table-inheritance, except for the fact that it forces us to make our queries more explicit? To me the convenience and clarity of the multi-table approach outweighs the explicitness argument.
Note that this SO question is very similar, but does not quite answer my questions. Moreover, the latest answer there is almost nine years old now, and Django has changed a lot since.
[1]: Hay 1996, Data Model Patterns
While awaiting a better one, here's my attempt at an answer.
As suggested by Kevin Christopher Henry in the comments above, it makes sense to approach the problem from the database side. As my experience with database design is limited, I have to rely on others for this part.
Please correct me if I'm wrong at any point.
Data-model vs (Object-Oriented) Application vs (Relational) Database
A lot can be said about the object/relational mismatch, or, more accurately, the data-model/object/relational mismatch. In the present context I guess it is important to note that a direct translation between data-model, object-oriented implementation (Django), and relational database implementation is not always possible or even desirable. A nice three-way Venn diagram could probably illustrate this.
Data-model level
To me, a data-model as illustrated in the original post represents an attempt to capture the essence of a real world information system. It should be sufficiently detailed and flexible to enable us to reach our goal. It does not prescribe implementation details, but may limit our options nonetheless.
In this case, the inheritance poses a challenge mostly on the database implementation level.
Relational database level
Some SO answers dealing with database implementations of (single) inheritance are:
How can you represent inheritance in a database?
How do you effectively model inheritance in a database?
Techniques for database inheritance?
These all more or less follow the patterns described in Martin Fowler's book
Patterns of Enterprise Application Architecture.
Until a better answer comes along, I am inclined to trust these views.
The inheritance section in chapter 3 (2011 edition) sums it up nicely:
For any inheritance structure there are basically three options.
You can have one table for all the classes in the hierarchy: Single Table Inheritance (278) ...;
one table for each concrete class: Concrete Table Inheritance (293) ...;
or one table per class in the hierarchy: Class Table Inheritance (285) ...
and
The trade-offs are all between duplication of data structure and speed of access. ...
There's no clearcut winner here. ... My first choice tends to be Single Table Inheritance ...
A summary of patterns from the book is found on martinfowler.com.
Application level
Django's object-relational mapping (ORM) API
allows us to implement these three approaches, although the mapping is not
strictly one-to-one.
The Django Model inheritance docs
distinguish three "styles of inheritance", based on the type of model class used (concrete, abstract, proxy):
abstract parent with concrete children (abstract base classes):
The parent class has no database table. Instead each child class has its own database
table with its own fields and duplicates of the parent fields.
This sounds a lot like Concrete Table Inheritance in the database.
concrete parent with concrete children (multi-table inheritance):
The parent class has a database table with its own fields, and each child class
has its own table with its own fields and a foreign-key (as primary-key) to the
parent table.
This looks like Class Table Inheritance in the database.
concrete parent with proxy children (proxy models):
The parent class has a database table, but the children do not.
Instead, the child classes interact directly with the parent table.
Now, if we add all the fields from the children (as defined in our data-model)
to the parent class, this could be interpreted as an implementation of
Single Table Inheritance.
The proxy models provide a convenient way of dealing with the application side of
the single large database table.
Conclusion
It seems to me that, for the present example, the combination of Single Table Inheritance with Django's proxy models may be a good solution that does not have the disadvantages of "hidden" joins.
Applied to the example from the original post, it would look something like this:
class Party(models.Model):
    """ All the fields from the hierarchy are on this class """
    name = models.CharField(max_length=20)
    type = models.CharField(max_length=20)
    favorite_color = models.CharField(max_length=20)

class Organization(Party):
    class Meta:
        """ A proxy has no database table (it uses the parent's table) """
        proxy = True

    def __str__(self):
        """ We can do subclass-specific stuff on the proxies """
        return '{} is a {}'.format(self.name, self.type)

class Person(Party):
    class Meta:
        proxy = True

    def __str__(self):
        return '{} likes {}'.format(self.name, self.favorite_color)

class Address(models.Model):
    """
    As required, we can link to Party, but we can set the field using
    either party=person_instance, party=organization_instance,
    or party=party_instance
    """
    party = models.ForeignKey(to=Party, on_delete=models.CASCADE)
One caveat, from the Django proxy-model documentation:
There is no way to have Django return, say, a MyPerson object whenever you query for Person objects. A queryset for Person objects will return those types of objects.
A potential workaround is presented here.
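The linked workaround is not reproduced here. As a rough, framework-free illustration of the general idea behind Single Table Inheritance (one table, plus something that says which subclass a row belongs to), the sketch below uses plain sqlite3 and dataclasses. The kind discriminator column and the row_to_party helper are assumptions made for this example; neither appears in the models above.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Party:
    name: str

@dataclass
class Organization(Party):
    type: str

@dataclass
class Person(Party):
    favorite_color: str

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table holds every field in the hierarchy; unused ones stay NULL.
    CREATE TABLE party (
        id INTEGER PRIMARY KEY,
        kind TEXT NOT NULL,          -- hypothetical discriminator column
        name TEXT NOT NULL,
        type TEXT,
        favorite_color TEXT
    );
    INSERT INTO party (kind, name, type) VALUES ('organization', 'ACME', 'company');
    INSERT INTO party (kind, name, favorite_color) VALUES ('person', 'Alice', 'green');
""")

def row_to_party(row):
    """Pick the Python class for a row based on its discriminator value."""
    kind, name, type_, favorite_color = row
    if kind == "organization":
        return Organization(name=name, type=type_)
    if kind == "person":
        return Person(name=name, favorite_color=favorite_color)
    return Party(name=name)

# A single SELECT on a single table: no joins, hidden or otherwise.
rows = conn.execute(
    "SELECT kind, name, type, favorite_color FROM party ORDER BY id"
).fetchall()
parties = [row_to_party(row) for row in rows]
```

The price of the single table is visible in the schema: every subclass-specific column has to be nullable, and the application has to keep the discriminator consistent.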

Django - Check if a non-foreign-key integer field exists as a Primary Key to another Model?

Say you have the following legacy model:
class Foo(models.Model):
    bar_id = models.IntegerField(null=True, blank=True)
    # more fields
bar_id is supposed to refer to a primary key from the Bar model, but for some reason, it's not registered as a foreign key. Now, how can I filter out all the Foos that do not have corresponding Bar objects?
We can build a query for the primary keys of all Bars, and then exclude every Foo whose bar_id is among them.
Foo.objects.exclude(bar_id__in=Bar.objects.all().values_list('pk', flat=True))
This is a QuerySet that will give all Foo objects with an "invalid" bar_id (that is, an id that refers to a non-existent Bar).
But it is better to use a ForeignKey, since most databases will then enforce this constraint transparently. As a result, the database typically ensures that no such rows can exist at all. Typically you also specify what should happen when the referenced Bar object is, for example, removed.
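The exclusion query above corresponds roughly to an anti-join in SQL. Here is a minimal sketch with sqlite3 and made-up rows, to show which rows count as dangling. Whether rows with a NULL bar_id should be reported as well is a decision for you; this sketch leaves them out, which is an assumption rather than something the question specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bar (id INTEGER PRIMARY KEY);
    CREATE TABLE foo (id INTEGER PRIMARY KEY, bar_id INTEGER);
    INSERT INTO bar (id) VALUES (1), (2);
    INSERT INTO foo (id, bar_id) VALUES
        (10, 1),     -- valid reference
        (11, 2),     -- valid reference
        (12, 99),    -- dangling: there is no bar with id 99
        (13, NULL);  -- no reference at all
""")

# Anti-join: keep foo rows that point somewhere, but where nothing matches in bar.
orphans = conn.execute("""
    SELECT foo.id, foo.bar_id
    FROM foo
    LEFT JOIN bar ON bar.id = foo.bar_id
    WHERE foo.bar_id IS NOT NULL AND bar.id IS NULL
    ORDER BY foo.id
""").fetchall()
```

Running a check like this before adding a real foreign key constraint shows which rows would have to be repaired or removed first.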
Reading some of the comments, I understand that the OP would prefer to implement a foreign key but cannot do so because of corrupt or missing data in the database.
Two solutions:
- Mark the column as a foreign key in your model, but do not enforce it in the database (use the --fake flag when migrating with manage.py). This approach helps to better define your business / model logic, and enforces data integrity in local development environments.
- Mark the column as a foreign key in your model and use the db_constraint=False flag. Read more here. Use this approach for legacy systems where data integrity has already been compromised and you just need to use Django's ORM joins the natural way.

Django Custom primary key

When I inspect the Trac database I get:
class TicketChange(models.Model):
    ticket = models.IntegerField()
    time = models.BigIntegerField()
    author = models.TextField(blank=True)
    field = models.TextField()
    oldvalue = models.TextField(blank=True)
    newvalue = models.TextField(blank=True)

    class Meta:
        managed = False
        db_table = 'ticket_change'
With no primary key:
>>> TicketChange.objects.all()
DatabaseError: column ticket_change.id does not exist
LINE 1: SELECT "ticket_change"."id", "ticket_change"."ticket", "tick...
This is because I need to specify a pk, but the original primary key of ticket_change in Trac is:
Primary key (ticket, time, field)
But that's not possible in Django: Django Multi-Column Primary Key Discussion.
If I define time as the pk, I can't add two ticket changes with the same time.
What can I do?
You're right. It's a known problem. So the only solutions are hacks (sort of).
Your best option is to use django-compositepks. The drawbacks are that it doesn't really support model relationships, so you will not be able to navigate to any relationship from your composite-pk model. However, looking at your TicketChange model this doesn't seem like an issue (unless you have more models with relationships to this one).
Another option would be to manually add the id column (and make sure to apply any additional changes to the db), thereby creating the new one-column primary key. A third option (and what I would probably do), pretty similar to the last one but cleaner, would be to create a new database from scratch and then populate it with a script by fetching the data from your existing legacy db.
Sorry I don't have any better solution for you; that's the way it is. Legacy databases and Django are always a headache, and I have gone through some similar processes myself. Hope this helps!
This is a use case the Django ORM simply does not support. If you're able to modify the database schema, add an additional column that will serve as the primary key for Django: an integer field with AUTO_INCREMENT (MySQL) or a SERIAL (PostgreSQL). It shouldn't disturb your other applications using the table, since it will be managed by your database when new records are inserted. You can still use the original key columns when making queries through the ORM:
ticket_change = TicketChange.objects.get(ticket=1234, time=5678, field='myfield')
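A sketch of what that schema change amounts to, using sqlite3 for convenience: SQLite's AUTOINCREMENT stands in here for MySQL's AUTO_INCREMENT or PostgreSQL's SERIAL, and keeping the old composite key as a UNIQUE constraint is an assumption of this example, not something the answer spells out. The sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket_change (
        id INTEGER PRIMARY KEY AUTOINCREMENT,   -- new surrogate key for the ORM
        ticket INTEGER NOT NULL,
        time INTEGER NOT NULL,
        field TEXT NOT NULL,
        oldvalue TEXT,
        newvalue TEXT,
        UNIQUE (ticket, time, field)            -- the original composite key
    );
""")

insert = "INSERT INTO ticket_change (ticket, time, field, newvalue) VALUES (?, ?, ?, ?)"
# Two changes to the same ticket at the same time are fine if the field differs.
conn.execute(insert, (1234, 5678, "status", "closed"))
conn.execute(insert, (1234, 5678, "owner", "alice"))

# A true duplicate of (ticket, time, field) is still rejected.
try:
    conn.execute(insert, (1234, 5678, "status", "reopened"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Look a row up by its natural key; the database filled in id on its own.
row = conn.execute(
    "SELECT id, newvalue FROM ticket_change WHERE ticket = ? AND time = ? AND field = ?",
    (1234, 5678, "owner"),
).fetchone()
```

Other applications that insert without mentioning id keep working, because the database assigns it.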
If you’d like to specify a custom primary key, specify primary_key=True on one of your fields. If Django sees you’ve explicitly set Field.primary_key, it won’t add the automatic id column.
Each model requires exactly one field to have primary_key=True (either explicitly declared or automatically added).
https://docs.djangoproject.com/en/3.1/topics/db/models/#automatic-primary-key-fields

Django GenericForeignKey lookup for a given model

I use a voting app (django-ratings, if that makes any difference) which uses Django's GenericForeignKey, has a ForeignKey to User, and has several other fields such as the date of the latest change.
I'd like to get all the objects of one content type that a single user voted for, ordered by date of latest change. As far as I understand, all the info can be found in a single table (except the content_type, which can be prefetched/cached). Unfortunately Django still makes an extra query each time I request a content_object.
So the question is - how do I get all the votes on a given model, by a given user, with related objects and given ordering with minimum database hits?
Edit: Right now I'm using 2 queries: first selecting all the votes, then fetching all the objects I need with .filter(pk__in=obj_ids), and finally attaching them to the vote objects. But it seems that a reverse generic relation could help solve the problem.
Have you checked out select_related()? That may help.
Returns a QuerySet that will automatically "follow" foreign-key relationships, selecting that additional related-object data when it executes its query. This is a performance booster which results in (sometimes much) larger queries but means later use of foreign-key relationships won't require database queries.
https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
Well, right now we're using prefetch_related() from Django 1.4 on a GenericRelation. It still uses 2 queries, but has a very intuitive interface.
From looking at the models.py of the django-ratings app, I think you would have to do user.votes.filter(content_type__model=Model._meta.module_name).order_by("date_changed") (assuming the model you want to filter by is Model; in Django 1.6 and later, _meta.module_name is spelled _meta.model_name) to get all the Vote objects. For the related objects, loop through the queryset and access content_object on each item. IMHO, this would result in the fewest DB queries.
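The two-query pattern described in the question's edit can be sketched outside Django with sqlite3. The vote and article tables, their columns, and the sample rows are hypothetical stand-ins, not the actual django-ratings schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    -- A generic relation stores a type label plus an id, with no real foreign key.
    CREATE TABLE vote (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        content_type TEXT,
        object_id INTEGER,
        date_changed INTEGER
    );
    INSERT INTO article (id, title) VALUES (1, 'First'), (2, 'Second'), (3, 'Third');
    INSERT INTO vote (user_id, content_type, object_id, date_changed) VALUES
        (7, 'article', 2, 300),
        (7, 'article', 1, 100),
        (7, 'comment', 5, 200),
        (8, 'article', 3, 400);
""")

# Query 1: the user's votes on one content type, in the wanted order.
votes = conn.execute(
    "SELECT object_id, date_changed FROM vote "
    "WHERE user_id = ? AND content_type = ? ORDER BY date_changed",
    (7, "article"),
).fetchall()

# Query 2: all target objects in one go, instead of one query per vote.
object_ids = [object_id for object_id, _ in votes]
placeholders = ", ".join("?" for _ in object_ids)
articles = dict(
    conn.execute(
        f"SELECT id, title FROM article WHERE id IN ({placeholders})", object_ids
    ).fetchall()
)

# Attach each object to its vote in Python, keeping the vote ordering.
voted_titles = [articles[object_id] for object_id, _ in votes]
```

The number of queries stays at two no matter how many votes the user has, which is the same shape of work that prefetching a generic relation does.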

Django Admin relations between tables: save database updates in several tables

I am using Django admin for managing my data.
I have Users, Groups and Domains tables.
The Users table has a many-to-many relationship with the Groups and Domains tables.
The Domains table has a one-to-many relationship with the Groups table.
When I save the User data through the admin, I also need some additional database updates in the users_group and users_domains tables.
How do I do this? Where do I put the code?
I think you are looking for inline models (InlineModelAdmin). They allow you to edit related models on the same page as the parent model. If you are looking for greater control than this, you can override the ModelAdmin save methods.
Also, always check out the Manual when you need something. It really is quite good.
The best way to update other database tables is to perform the necessary get and save operations. However, if you have a many-to-many relationship, by default, both sides of the relationship are accessible from a <lower_case_model_name>_set parameter. That is, user.group_set.all() will give you all Group objects associated with a user, while group.user_set.all() will give you all User objects associated with a group. So if you override the save method (or register a signal listener--whichever option sounds stylistically more pleasing), try:
for group in user.group_set.all():
    # play with group object
    ...
    group.save()
