I use a voting app (django-ratings, if that makes any difference) which uses Django's GenericForeignKey, has a ForeignKey to User, and has several other fields, such as the date of the latest change.
I'd like to get all the objects of one content type that a single user voted for, ordered by date of latest change. As far as I understand, all the info can be found in a single table (except the content_type, which can be prefetched/cached). Unfortunately, Django still makes an extra query each time I access a content_object.
So the question is - how do I get all the votes on a given model, by a given user, with related objects and given ordering with minimum database hits?
Edit: Right now I'm using 2 queries: first selecting all the votes, then fetching all the objects I need with .filter(pk__in=obj_ids), and finally populating them into the vote objects. But it seems that a reverse generic relation could help solve the problem.
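The two-query approach from the edit can be sketched without Django, using the standard-library sqlite3 module. The table and column names (vote, article, content_type, object_id, date_changed) are hypothetical stand-ins, not the actual django-ratings schema. One query fetches the user's ordered votes, one IN query fetches the referenced objects (the pk__in step), and a dict lookup attaches each object back to its vote while keeping the vote ordering.

```python
import sqlite3


def votes_with_objects(conn, user_id, content_type):
    """Return [(vote_row, article_row), ...] ordered by date_changed, using two queries."""
    # Query 1: the user's votes for one content type, latest change first.
    votes = conn.execute(
        "SELECT id, object_id, date_changed FROM vote "
        "WHERE user_id = ? AND content_type = ? ORDER BY date_changed DESC",
        (user_id, content_type),
    ).fetchall()
    if not votes:
        return []
    # Query 2: fetch every referenced object at once (the pk__in step).
    ids = sorted({object_id for _, object_id, _ in votes})
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title FROM article WHERE id IN ({placeholders})", ids
    ).fetchall()
    # IN (...) guarantees no particular order, so index the rows by primary key.
    by_id = {row[0]: row for row in rows}
    # Iterating over the votes (not the rows) preserves the date ordering.
    return [(vote, by_id.get(vote[1])) for vote in votes]


# Hypothetical demo data: two votes by user 7, one by user 8.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE vote (id INTEGER PRIMARY KEY, user_id INTEGER, content_type TEXT,
                   object_id INTEGER, date_changed TEXT);
INSERT INTO article VALUES (1, 'First'), (2, 'Second');
INSERT INTO vote VALUES (10, 7, 'article', 1, '2012-01-01'),
                        (11, 7, 'article', 2, '2012-03-01'),
                        (12, 8, 'article', 1, '2012-02-01');
""")
result = votes_with_objects(conn, 7, "article")
```

The number of queries stays at two no matter how many votes the user has, which is the same 1 + 1 shape that prefetch_related() gives you.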
Have you checked out select_related()? That may help.
Returns a QuerySet that will automatically "follow" foreign-key relationships, selecting that additional related-object data when it executes its query. This is a performance booster which results in (sometimes much) larger queries but means later use of foreign-key relationships won't require database queries.
https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
Well, right now we're using prefetch_related() from Django 1.4 on a GenericRelation. It still uses 2 queries, but it has a very intuitive interface.
From looking at the models.py of the django-ratings app, I think you would have to do user.votes.filter(content_type__model=Model._meta.module_name).order_by("date_changed") (assuming the model you want to filter by is Model; from Django 1.6 on, _meta.module_name is spelled _meta.model_name) to get all the Vote objects. For the related objects, loop through the queryset, getting content_object on each item. IMHO, this would result in the fewest DB queries.
Disclaimer: I have searched and a question tackling this particular challenge could not be found at the time of posting.
The Requirement
For a Class Based View I need to implement Pagination for a QuerySet derived through a many-to-many relationship. Here's the requirement with a more concrete description:
Many Library Records can belong to many Collections
Web pages are required for most (but not necessarily all) Collections, and so I need to build views/templates/urls based on what the client identifies as required
Each Collection Page displaying the relevant Library Records requires Pagination, as there may be hundreds of records to display.
The First Approach
And so, with this requirement in mind, I approached this as I normally would when building a CBV with Pagination. However, this approach did not allow me to meet the requirement. What I quickly discovered was that the Pagination in the CBV was building the object based on the declared model, and the many-to-many relationship was not working for me.
I explored the use of object in the template, but after a number of attempts I was getting nowhere. I need to display Library Record objects, but the many-to-many relationship demands that I do so after determining the records based on the Collection they belong to.
EDIT - Addition of model
models.py
from django.db import models

# Collection and LibraryRecord are defined elsewhere in this models.py.
class CollectionOrder(models.Model):
    collection = models.ForeignKey(
        Collection,
        related_name='collection_in_collection_order',
        on_delete=models.PROTECT,
        null=True,
        blank=True,
        verbose_name='Collection'
    )
    record = models.ForeignKey(
        LibraryRecord,
        related_name='record_in_collection_order',
        on_delete=models.PROTECT,
        null=True,
        blank=True,
        verbose_name='Library record',
    )
    order_number = models.PositiveIntegerField(
        blank=True,
        null=True,
    )
Please do not work with record.record.id: each access to record.record makes a separate query for that CollectionOrder object, so 100 CollectionOrder objects cost 100 extra queries, 102 queries in total. If the number of matches is large, the view will eventually no longer respond within a reasonable time.
Furthermore, pk__in=library_record_ids will not respect the order of library_record_ids. Indeed, it can return the LibraryRecords in any order, as long as their primary keys are members of the list.
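If you do need to keep an explicit list of ids (for example, one collected from CollectionOrder rows sorted by order_number), one common workaround is to re-sort the fetched objects in Python. This is a minimal sketch, with a hypothetical Record dataclass standing in for LibraryRecord instances; filtering and ordering in a single query, as this answer recommends, avoids the extra step entirely.

```python
from dataclasses import dataclass


@dataclass
class Record:  # hypothetical stand-in for a LibraryRecord instance
    pk: int
    title: str


def in_id_order(records, ids):
    """Return records sorted to match the order of ids (re-sorting a pk__in result)."""
    position = {pk: index for index, pk in enumerate(ids)}
    # Records whose pk is missing from ids are dropped instead of raising KeyError.
    return sorted(
        (record for record in records if record.pk in position),
        key=lambda record: position[record.pk],
    )


library_record_ids = [3, 1, 2]  # e.g. collected in order_number order
fetched = [Record(1, "a"), Record(2, "b"), Record(3, "c")]  # database order is arbitrary
ordered = in_id_order(fetched, library_record_ids)
```

Note that this sorts in Python after the query, so it does not combine well with database-side pagination; the ORDER BY in the query is preferable.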
You can query with:
def get_queryset(self):
    return LibraryRecord.objects.filter(
        collectionorder__collection__collection='collection-name'
    ).order_by('collectionorder__order_number')
Where collectionorder is the related_query_name=… for the ForeignKey, OneToOneField or ManyToManyField named record from CollectionOrder to the LibraryRecord model. If you did not specify a value for the related_query_name=… parameter, it will take the value of the related_name=… parameter, and if you did not specify that one either, it will use the name of the source model (where the relation is defined) in lowercase, so in this case collectionorder. With the related_name='record_in_collection_order' from the model above, the lookups become record_in_collection_order__collection__collection and record_in_collection_order__order_number.
This will thus respect the collectionorder__order_number as ordering condition, and perform this in a single database query, minimizing the amount of queries to the database.
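To see the difference in query counts, here is a stdlib-only sketch with sqlite3. The tables are simplified, hypothetical versions of Collection, LibraryRecord and CollectionOrder, and the JOIN is hand-written SQL approximating the shape of the ORM query above, not Django's exact SQL. A trace callback records every statement the connection executes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collection (id INTEGER PRIMARY KEY, collection TEXT);
CREATE TABLE libraryrecord (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE collectionorder (
    id INTEGER PRIMARY KEY,
    collection_id INTEGER REFERENCES collection(id),
    record_id INTEGER REFERENCES libraryrecord(id),
    order_number INTEGER
);
INSERT INTO collection VALUES (1, 'collection-name');
INSERT INTO libraryrecord VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO collectionorder VALUES (1, 1, 2, 1), (2, 1, 3, 2), (3, 1, 1, 3);
""")

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement executed


def n_plus_one(conn):
    # One query for the ordered CollectionOrder rows...
    rows = conn.execute(
        "SELECT co.record_id FROM collectionorder co "
        "JOIN collection c ON c.id = co.collection_id "
        "WHERE c.collection = ? ORDER BY co.order_number",
        ("collection-name",),
    ).fetchall()
    # ...then one extra query per row, like accessing record.record in a loop.
    return [
        conn.execute("SELECT title FROM libraryrecord WHERE id = ?", (rid,)).fetchone()[0]
        for (rid,) in rows
    ]


def single_join(conn):
    # Filter and order across the relationship in a single statement.
    rows = conn.execute(
        "SELECT r.title FROM libraryrecord r "
        "JOIN collectionorder co ON co.record_id = r.id "
        "JOIN collection c ON c.id = co.collection_id "
        "WHERE c.collection = ? ORDER BY co.order_number",
        ("collection-name",),
    ).fetchall()
    return [title for (title,) in rows]


statements.clear()
slow = n_plus_one(conn)
slow_count = len(statements)

statements.clear()
fast = single_join(conn)
fast_count = len(statements)
```

Both functions return the records in order_number order, but the first needs 1 + n statements while the second needs exactly one.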
Hopefully, this Q&A helps someone else. If, in reading the following approach, you can think of ways to refactor/optimize it, I'd love to learn. Note: I deliberately did not use Pythonic list comprehensions, as a personal preference for readability.
What I ended up doing was adding get_queryset() to:
Query the Collection for the records that belong to it, to then
Build a list of record ids, to then
Return the QuerySet by filtering for pk__in (the pk exists in the list of library_record_ids)
Here's the resulting code. (Edit: This code has been optimized following another answer - I just didn't want to leave a lesser snippet up)
def get_queryset(self):
    return LibraryRecord.objects.filter(
        record_in_collection_order__collection__collection='Collection Name'
    ).order_by('record_in_collection_order__order_number')
The requirement has been met. I welcome constructive criticism. My intention in sharing this Q&A is to try and give a little back to the Stack Overflow Community that has served me so well since starting this journey into Django.
For instance:
e = Entries.objects.filter(blog__name='Something')
Does this cache blog as well, or should I still add select_related('blog') to this?
The filter(blog__name='Something') means that Django will do a join when it fetches the queryset, but only to filter the entries. If you want to access the related blogs without an extra query per entry, you still have to use select_related.
You might find django-debug-toolbar useful so that you can check the queries yourself.
You will be able to see all the attributes of blog, and of anything farther down the chain, as well; Django automagically takes care of it all. But it will do additional queries internally, as needed, to get blog (beyond blog_id) and anything else. Use select_related to get anything you know you will use. select_related only follows single-valued relations (ForeignKey and OneToOneField); where it doesn't work (many-to-many, reverse foreign keys, generic relations), prefetch_related usually will. The difference is that prefetch_related does an extra query for each related table. That is still better than letting Django do everything automagically if the query returns more than one record of the main table, i.e., 1 + 1 queries instead of 1 + n.
I suspect part of the confusion is about filter(). filter(), exclude() and other ways of getting anything less than all() will reference the other tables in the WHERE part of the query, but Django doesn't retrieve fields from those tables unless/until you access them, unless you use select_related or prefetch_related.
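The same idea in plain SQL, using stdlib sqlite3 with hypothetical blog and entry tables. The first query joins blog only to filter (roughly the shape of filter(blog__name='Something'), not Django's exact SQL), so its result columns come from entry alone; the second also selects a blog column, which is the kind of extra data select_related('blog') adds to each row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT,
                    blog_id INTEGER REFERENCES blog(id));
INSERT INTO blog VALUES (1, 'Something'), (2, 'Other');
INSERT INTO entry VALUES (1, 'Hello', 1), (2, 'Bye', 2);
""")

# Like filter(blog__name='Something'): blog appears only in JOIN/WHERE.
filtered = conn.execute(
    "SELECT entry.id AS id, entry.headline AS headline, entry.blog_id AS blog_id "
    "FROM entry JOIN blog ON blog.id = entry.blog_id WHERE blog.name = ?",
    ("Something",),
)
filter_columns = [column[0] for column in filtered.description]
filter_rows = filtered.fetchall()

# Like adding select_related('blog'): blog's data comes back in the same row.
joined = conn.execute(
    "SELECT entry.id AS id, entry.headline AS headline, entry.blog_id AS blog_id, "
    "blog.name AS blog_name "
    "FROM entry JOIN blog ON blog.id = entry.blog_id WHERE blog.name = ?",
    ("Something",),
)
select_related_columns = [column[0] for column in joined.description]
select_related_rows = joined.fetchall()
```

Only blog_id is available after the first query, so reading the blog's name would need another round trip; the second query already carries it.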
I have optimized the query below as best I can.
message = Message.objects.defer('gateway', 'batch', 'content_type', 'sender',
'reply_callback')\
.select_related().get(pk=message_id)
However, the model has a field called billee (see below)
billee = generic.GenericForeignKey()
I don't seem to be able to use select_related or defer on this field, maybe because it's a GenericForeignKey. Can someone explain why, and then give me an example of how to achieve this?
select_related() can't follow generic relations (it works only with ForeignKey and OneToOneField), so you may need to write a raw SQL query if you really want to eliminate this one additional query.
When fetching many messages at once, you may use prefetch_related(), which can follow generic relations (but still makes additional queries, one per content type involved).
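Conceptually, prefetching a generic relation means grouping the (content_type, object_id) pairs by content type and issuing one bulk query per type. This is a framework-free sketch of that idea, not Django's implementation: fetch_many is a hypothetical stand-in for a Model.objects.filter(pk__in=...) call, content types are plain strings instead of ContentType rows, and "customer"/"company" are hypothetical billee types.

```python
from collections import defaultdict


def prefetch_generic(items, fetch_many):
    """items: list of (content_type, object_id) pairs. Returns objects in item order.

    fetch_many(content_type, ids) -> {pk: obj} is called once per distinct content type.
    """
    ids_by_type = defaultdict(set)
    for content_type, object_id in items:
        ids_by_type[content_type].add(object_id)
    # One bulk lookup per distinct content type, not one per item.
    cache = {
        content_type: fetch_many(content_type, sorted(ids))
        for content_type, ids in ids_by_type.items()
    }
    return [cache[content_type].get(object_id) for content_type, object_id in items]


calls = []


def fake_fetch_many(content_type, ids):
    # Hypothetical data source; records each call so the "query" count is visible.
    calls.append((content_type, ids))
    return {pk: f"{content_type}#{pk}" for pk in ids}


messages = [("customer", 1), ("company", 5), ("customer", 2), ("customer", 1)]
billees = prefetch_generic(messages, fake_fetch_many)
```

Four messages pointing at two content types cost two lookups here, which is why prefetch_related() still adds queries for generic relations but does not grow with the number of messages.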
In my models I have a Concert class and a Venue class. Each venue has multiple concerts. I have been linking the Concert class to a Venue with a simple
venue = models.IntegerField(max_length = 10)
...containing the venue object's primary key. A colleague suggested we use venue = models.ForeignKey(Venue) instead. While this also works, I wonder if it's worth the switch, because I have been able to pull out all the concerts for a venue by simply using the venue's ID in Concert.objects.filter(venue=4), the same way I could with a ForeignKey: venue_instance.concert_set.all(). I've never had any problems using my method.
The way I see it, using the IntegerField and objects.filter() is just as much of a "ManyToOne" relationship as a ForeignKey, so I want to know where I'm wrong. Why are ForeignKeys advantageous? Are they faster? Is it better database design? Cleaner code?
I would say that the most practical benefit of a foreign key is the ability to query across relationships automatically. Django generates the JOINs automatically.
The automatic reverse relation helpers are great too as you mentioned.
Here are some examples that would be more complicated with only an integer relationship.
concerts = Concert.objects.filter(...)
concerts.order_by('venue__attribute') # ordering beyond PK.
concerts.filter(venue__name='foo') # filter by a value across the relationship
concerts.values_list('venue__name') # get just venue names
concerts.values('venue__city').distinct()  # get unique cities across the related venues
concerts.filter(venue__more__relationships='foo')
Venue.objects.filter(concert__name='Coachella') # reverse lookups work too
# with an integer field for Concert.venue, you'd have to do something like...
Venue.objects.filter(id__in=Concert.objects.filter(name='Coachella').values('venue'))
As others have pointed out, database integrity is useful, as are cascading deletes (customizable, of course). And, facepalm, it just occurred to me that the Django admin and forms framework work amazingly well with foreign keys.
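from django.contrib import admin
from .models import Concert, Venue  # assumes both models live in this app

class ConcertInline(admin.TabularInline):
    model = Concert

class VenueAdmin(admin.ModelAdmin):
    inlines = [ConcertInline]
admin.site.register(Venue, VenueAdmin)  # that was quick!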
I'm sure there are more examples of django features handling foreign keys.
A foreign key is a database concept, implemented in most databases, that also enforces referential integrity.
Because Django knows which table this column refers to (and that table may itself have foreign keys to other tables), it can chain relationships and produce the corresponding JOINs in the SQL.
Besides the normal one-way chaining, Django also adds a reverse accessor on the other side, as you have recognized: when you have a venue instance, you can query venue.concert_set.
What bothers me most about not using an FK and rolling your own with an integer is that:
You don't get a referential integrity check.
You lose out on the power of SQL. Every moderately deep query of yours will now need multiple hits to the database, since the ORM can't join for you.
You also lose out on all the levers the framework provides to deal with the SQL.
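The referential-integrity point can be demonstrated with stdlib sqlite3 (note that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON). The tables are hypothetical simplifications of Venue and Concert: concert_fk declares a real foreign key, while concert_int stores a bare integer, like the IntegerField approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE venue (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE concert_fk (id INTEGER PRIMARY KEY, name TEXT,
                         venue_id INTEGER NOT NULL REFERENCES venue(id));
CREATE TABLE concert_int (id INTEGER PRIMARY KEY, name TEXT,
                          venue_id INTEGER NOT NULL);
INSERT INTO venue VALUES (4, 'Main Hall');
""")


def try_insert(table, venue_id):
    """Return True if the row was stored, False if the database rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            # table only ever comes from the literals below, never from user input
            conn.execute(
                f"INSERT INTO {table} (name, venue_id) VALUES (?, ?)", ("Gig", venue_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


fk_ok = try_insert("concert_fk", 4)           # venue 4 exists: accepted
fk_orphan = try_insert("concert_fk", 999)     # no such venue: rejected
int_orphan = try_insert("concert_int", 999)   # plain integer: silently accepted
```

With the plain integer column, the orphaned row goes in without complaint, and nothing stops a venue from being deleted while concerts still point at it.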
I am using Django admin for managing my data.
I have Users, Groups and Domains tables.
The Users table has a many-to-many relationship with the Groups and Domains tables.
The Domains table has a one-to-many relationship with the Groups table.
When I save the User data through the admin, I also need some additional database updates in the users_group and users_domains tables.
How do I do this? Where do I put the code?
I think you are looking for inlines (InlineModelAdmin). They allow you to edit related models on the same page as the parent model. If you need greater control than this, you can override the ModelAdmin save methods, such as save_model() and save_related().
Also, always check the Django documentation when you need something. It really is quite good.
The best way to update other database tables is to perform the necessary get and save operations. With a many-to-many relationship, the model that declares the ManyToManyField accesses it through the field's own name, while by default the other model gets a reverse accessor named <lower_case_model_name>_set. That is, if Group declares the field, user.group_set.all() will give you all Group objects associated with a user. So if you override the save method (or register a signal listener, whichever option sounds stylistically more pleasing), try:
for group in user.group_set.all():
    # play with the group object
    ...
    group.save()