Many answers and articles say that a QuerySet in Django is lazy: it isn't evaluated until you actually do something with it.
My question is: how is that possible? How do methods such as filter(), all(), or order_by() work without knowing what data the objects hold?
I assume that hitting the database and knowing the data in model objects are different things, but it doesn't make sense to me.
Cheers!
A QuerySet accumulates all the filters, excludes, annotations, and so on. Only when you actually do something with the QuerySet does Django generate the SQL query from those filters and send it to the database.
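To make the answer above concrete, here is a toy sketch of the idea in plain Python. It is not Django's implementation: the LazyQuery class, the TABLE list, and the log attribute are all made up for illustration. Each call only records a condition and returns a new object; the "database" is touched the first time the result is actually needed.

```python
# Toy illustration only: LazyQuery is a hypothetical class, not Django code.
class LazyQuery:
    def __init__(self, rows, conditions=(), order_key=None, log=None):
        self._rows = rows                      # stands in for a database table
        self._conditions = tuple(conditions)   # accumulated filter conditions
        self._order_key = order_key
        self._cache = None                     # filled on first evaluation
        self.log = log if log is not None else []  # shared record of "database hits"

    def filter(self, **lookups):
        # Only remember the condition and return a new object; nothing is fetched.
        return LazyQuery(self._rows, self._conditions + tuple(lookups.items()),
                         self._order_key, self.log)

    def order_by(self, field):
        # Same idea: remember the ordering, fetch nothing.
        return LazyQuery(self._rows, self._conditions, field, self.log)

    def _evaluate(self):
        if self._cache is None:
            self.log.append("hit")  # stands in for the single SQL round trip
            result = [row for row in self._rows
                      if all(row[key] == value for key, value in self._conditions)]
            if self._order_key is not None:
                result.sort(key=lambda row: row[self._order_key])
            self._cache = result
        return self._cache

    def __iter__(self):
        return iter(self._evaluate())

    def __len__(self):
        return len(self._evaluate())


TABLE = [
    {"id": 1, "user": "ann", "category": "books"},
    {"id": 2, "user": "ann", "category": "music"},
    {"id": 3, "user": "bob", "category": "books"},
]

qs = LazyQuery(TABLE).filter(user="ann").order_by("id")
qs = qs.filter(category="music")   # still nothing fetched
hits_before = len(qs.log)

rows = list(qs)                    # evaluation happens here
hits_after = len(qs.log)
```

The methods never need to see the data because all they do is describe the query; the description is turned into one lookup at the end.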
For instance:
e = Entries.objects.filter(blog__name='Something')
Does this cache blog as well, or should I still add select_related('blog') to this?
The filter(blog__name='Something') means that Django will do a join when it fetches the queryset. However, if you want to access the related blogs without extra queries, you still have to use select_related.
You might find django-debug-toolbar useful so that you can check the queries yourself.
You will be able to access the attributes of blog and of anything further down the relation chain; Django takes care of that automatically. But it will run additional queries internally as needed to fetch blog (beyond blog_id) and anything else. Use select_related to fetch anything you know you will use. If select_related doesn't work (it only follows ForeignKey and OneToOneField relations), prefetch_related will usually work instead. The difference is that prefetch_related runs an extra query for each related table. That is still better than letting Django fetch everything on demand if the result includes more than one record of the main table, i.e., 1 + 1 queries instead of 1 + n.
I suspect part of the confusion is about filter(). filter(), exclude(), and other ways of getting anything less than all() will reference the other tables in the WHERE part of the query, but Django doesn't retrieve fields from those tables until you access them, unless you use select_related or prefetch_related.
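The 1 + n versus 1 pattern can be sketched at the SQL level without Django. The snippet below uses the standard-library sqlite3 module; the table layout, the run() helper, and the SQL strings are assumptions that only approximate what the ORM would send.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT, blog_id INTEGER);
    INSERT INTO blog VALUES (1, 'Something'), (2, 'Other');
    INSERT INTO entry VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 2);
""")

query_count = 0

def run(sql, params=()):
    """Hypothetical helper: execute a query and count it."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()

# Roughly filter(blog__name='Something'): blog is joined for the WHERE clause,
# but only entry columns are selected.
entries = run(
    "SELECT e.id, e.title, e.blog_id FROM entry e "
    "JOIN blog b ON b.id = e.blog_id WHERE b.name = ? ORDER BY e.id",
    ("Something",),
)
# Accessing the related blog afterwards costs one more query per entry.
lazy_names = [
    run("SELECT name FROM blog WHERE id = ?", (blog_id,))[0][0]
    for _, _, blog_id in entries
]
lazy_queries = query_count      # 1 + n

query_count = 0
# Roughly the same filter with select_related('blog'): blog columns come back
# in the same query.
joined = run(
    "SELECT e.id, e.title, b.id, b.name FROM entry e "
    "JOIN blog b ON b.id = e.blog_id WHERE b.name = ? ORDER BY e.id",
    ("Something",),
)
joined_names = [row[3] for row in joined]
joined_queries = query_count    # 1
```

With two matching entries, the first approach issues three queries and the second issues one, which is the difference django-debug-toolbar would show.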
I have a question that I think is very basic, but I couldn't find an answer to it.
Based on the thread "django conditionally filtering objects"
Using the same example:
user = User.objects.get(pk=1)
category = Category.objects.get(pk=1)
qs = Item.objects.filter(user=user, date=now())
if category:
    qs = qs.filter(category=category)
When does the qs variable actually retrieve the results?
If the line Item.objects.filter(user=user, date=now()) matches 1 million records (with the category filter applied only afterwards), will those records be loaded into memory? Or is the data retrieved only when the view is rendered (or whatever else I end up doing with the queryset)?
I suggest reading the official Django documentation on retrieving objects.
A Django QuerySet is lazy and is executed only when it is evaluated:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated.
For example, if you do
print(qs)
you'll actually evaluate qs: Django runs the SELECT against the database and maps the returned rows to model instances.
I have a form like this in a Django app:
class CustomForm(forms.Form):
    field1 = forms.ModelChoiceField(queryset=ModelA.objects.filter(type=A))
    field2 = forms.ModelChoiceField(queryset=ModelA.objects.filter(type=B))
The Debug Toolbar tells me there are two duplicate queries on ModelA, but their filter conditions are different. Is this a bug? I was also wondering whether there is a way to optimize this case and make only one query.
Thanks!
ModelA.objects.filter(type=A) and ModelA.objects.filter(type=B) are two separate querysets, so they require two queries.
In theory, you could do
ModelA.objects.filter(type__in=[A, B])
Which would get all objects where type=A or type=B. You could then filter the list in Python. However, this wouldn't necessarily perform any better. You wouldn't be able to use the ModelChoiceField any more, so your code would be more complicated.
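As a rough sketch of what "filter the list in Python" could look like: the rows below are hypothetical stand-ins for the result of the single type__in query, and the label format is invented. The (value, label) pairs are the shape a plain ChoiceField expects, which is where the extra complexity mentioned above comes from.

```python
from collections import defaultdict

A, B = "a", "b"  # placeholder type values

# Hypothetical rows standing in for ModelA.objects.filter(type__in=[A, B])
rows = [
    {"id": 1, "type": A},
    {"id": 2, "type": B},
    {"id": 3, "type": A},
]

# Split the single result set by type in one pass.
by_type = defaultdict(list)
for row in rows:
    by_type[row["type"]].append(row)

# Build (value, label) pairs for each field by hand.
choices_a = [(row["id"], f"Item {row['id']}") for row in by_type[A]]
choices_b = [(row["id"], f"Item {row['id']}") for row in by_type[B]]
```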
I have optimized the query below the best I can.
message = Message.objects.defer('gateway', 'batch', 'content_type', 'sender',
'reply_callback')\
.select_related().get(pk=message_id)
However, the model has a field called billee (see below)
billee = generic.GenericForeignKey()
I don't seem to be able to use select_related or defer on this field, maybe because it's a GenericForeignKey. Can someone explain why and then give me an example of how to achieve this?
select_related() can't follow generic relations (it works only with ForeignKey and OneToOneField), so you may need to write a raw SQL query if you really want to avoid this one additional query.
If you are fetching many messages at once, you can use prefetch_related(), which can follow generic relations (but still makes an additional query).
I use a voting app (django-ratings, if that makes any difference) which uses Django's GenericForeignKey, has a ForeignKey to User, and has several other fields such as the date of the latest change.
I'd like to get all the objects of one content type that a single user voted for, ordered by the date of the latest change. As far as I understand, all the info can be found in a single table (except the content_type, which can be prefetched/cached). Unfortunately, Django still makes an extra query each time I request a content_object.
So the question is: how do I get all the votes on a given model, by a given user, with related objects and a given ordering, with minimum database hits?
Edit: Right now I'm using 2 queries: first selecting all the votes, then fetching all the objects I need with .filter(pk__in=obj_ids), and finally attaching them to the vote objects. But it seems that a reverse generic relation can help solve the problem.
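For reference, the two-query approach from the edit can be sketched at the SQL level with the standard-library sqlite3 module. The table names, columns, and sample data are invented for illustration and do not reflect the real django-ratings schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE vote (
        id INTEGER PRIMARY KEY, user_id INTEGER,
        content_type TEXT, object_id INTEGER, date_changed TEXT
    );
    INSERT INTO article VALUES (10, 'First'), (11, 'Second'), (12, 'Third');
    INSERT INTO vote VALUES
        (1, 7, 'article', 10, '2012-01-03'),
        (2, 7, 'article', 12, '2012-01-01'),
        (3, 8, 'article', 11, '2012-01-02');
""")

# Query 1: all votes by one user on one content type, in the wanted order.
votes = conn.execute(
    "SELECT id, object_id FROM vote "
    "WHERE user_id = ? AND content_type = ? ORDER BY date_changed",
    (7, "article"),
).fetchall()

# Query 2: fetch every voted-for object in one go (the pk__in step).
object_ids = [object_id for _, object_id in votes]
placeholders = ", ".join("?" for _ in object_ids)
articles = dict(
    conn.execute(
        f"SELECT id, title FROM article WHERE id IN ({placeholders})",
        object_ids,
    )
)

# Attach each object to its vote in Python, keeping the vote ordering.
votes_with_objects = [(vote_id, articles[object_id]) for vote_id, object_id in votes]
```

The ordering comes from the first query only; the second query's order does not matter because the objects are looked up by id.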
Have you checked out select_related()? That may help.
Returns a QuerySet that will automatically "follow" foreign-key relationships, selecting that additional related-object data when it executes its query. This is a performance booster which results in (sometimes much) larger queries but means later use of foreign-key relationships won't require database queries.
https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
Well, right now we're using prefetch_related() from Django 1.4 on a GenericRelation. It still uses 2 queries, but has a very intuitive interface.
From looking at the models.py of the django-ratings app, I think you would have to do user.votes.filter(content_type__model=Model._meta.module_name).order_by("date_changed") (assuming the model you want to filter by is Model; newer Django versions call this attribute model_name) to get all the Vote objects. For the related objects, loop through the queryset getting content_object on each item. IMHO, this would result in the fewest DB queries.