I have a question that I think is very basic, but I couldn't find an answer to it.
It is based on the thread django conditionally filtering objects.
Using the same example:
user = User.objects.get(pk=1)
category = Category.objects.get(pk=1)
qs = Item.objects.filter(user=user, date=now())
if category:
    qs = qs.filter(category=category)
When does the qs variable actually retrieve the results?
Because if the line Item.objects.filter(user=user, date=now()) gives 1 million records as a result (after filtering by category), will those records be loaded into memory? Or are the queries executed at the moment the view is rendered (or whatever method I end up using)?
I would suggest you read the official Django documentation regarding retrieving objects.
A Django QuerySet is lazy; it is only executed when it is evaluated:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated.
For example, if you do
print(qs)
you'll actually evaluate your qs, which triggers the database SELECT and maps the rows into the QuerySet.
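To make the timing concrete, here is a rough sketch reusing the names from the question above (the evaluation triggers listed in the comments are the usual ones from the QuerySet docs):
from django.utils.timezone import now

qs = Item.objects.filter(user=user, date=now())   # no SQL yet - the filter is only recorded
qs = qs.filter(category=category)                 # still no SQL

# Any of the following finally sends a SELECT to the database:
items = list(qs)      # iterating / list() loads the matching rows into memory
count = qs.count()    # runs COUNT(*) in the database, loads no rows
first = qs.first()    # runs the query with LIMIT 1
print(qs)             # repr() evaluates (a slice of) the queryset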
Related
There are many answers and articles saying that a QuerySet in Django is lazy: it isn't evaluated until you actually do something with it.
My question is: how is that possible? How do methods like filter(), all() or order_by() work without knowing what data the objects contain?
I assume that hitting the database and knowing the data inside the model objects are two different things, but it doesn't quite make sense to me.
Cheers!
A QuerySet just accumulates all of the filters, excludes, annotations and so on. Only when you do something with the QuerySet does Django generate the SQL query from those accumulated pieces and send it to the database.
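You can actually watch this happening: a QuerySet's query attribute shows the SQL Django has composed so far, before anything is executed (a sketch reusing the Item model from the earlier question; the archived field is made up for illustration):
qs = Item.objects.filter(user=user)    # nothing is sent to the database yet
qs = qs.exclude(archived=True)         # still nothing
qs = qs.order_by('-date')

print(qs.query)    # shows the SELECT ... WHERE ... ORDER BY that Django has composed so far
rows = list(qs)    # only now is that SQL actually executed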
For instance:
e = Entries.objects.filter(blog__name='Something')
Does this cache blog as well, or should I still add select_related('blog') to it?
The filter(blog__name='Something') means that Django will do a join when it fetches the queryset. However, if you want to access the related blogs without extra queries, you still have to use select_related.
You might find django-debug-toolbar useful so that you can check the queries yourself.
You will see the attributes of blog and of anything farther down as well; Django automagically takes care of it all. But it will do additional queries internally as needed to fetch blog (beyond blog_id) and anything else. Use select_related for anything you know you will use. If select_related doesn't work, most of the time prefetch_related will work instead. The difference is that prefetch_related does one extra query per related table. That is still better than letting Django do everything automagically when the query returns more than one record of the main table - i.e., 1 + 1 queries instead of 1 + n.
I suspect part of the confusion is about filter(). filter, exclude and the other ways of getting anything less than all() will reference the other tables in the WHERE part of the query, but Django doesn't retrieve fields from those tables unless/until you access them - unless you use select_related or prefetch_related.
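A sketch of the difference in query counts, using the Entries/blog names from the question (n stands for the number of entries returned):
# Plain filter: 1 query for the entries, plus 1 query per entry the first
# time you touch entry.blog - i.e. 1 + n queries in the worst case.
for entry in Entries.objects.filter(blog__name='Something'):
    print(entry.blog.name)

# select_related: a single query with a JOIN; entry.blog is already populated.
for entry in Entries.objects.filter(blog__name='Something').select_related('blog'):
    print(entry.blog.name)

# prefetch_related: 2 queries in total - the entries, then all the needed blogs in one IN (...) query.
for entry in Entries.objects.filter(blog__name='Something').prefetch_related('blog'):
    print(entry.blog.name)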
I have built an API with Django REST Framework. I want to change the pagination for a better user experience.
The problem:
The client makes a call to request all posts. The request looks like:
http://website.com/api/v1/posts/?page=1
This returns the first page of posts. However, new posts are always being created. So when the user requests:
http://website.com/api/v1/posts/?page=2
The posts are almost always the same as page 1 (since new data is always coming in and we are ordering by -created).
Possible Solution?
I had the idea of sending an object ID along with the request, so that when we grab the posts we grab them relative to the last query.
http://website.com/api/v1/posts/?page=2&post_id=12345
And when we paginate we filter where post_id < 12345
But this only works assuming our post_id is an integer.
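Roughly, the view-side filter I have in mind looks like this (just a sketch; I'm assuming post_id is compared against the integer primary key):
def get_queryset(self):
    qs = Post.objects.order_by('-created')
    post_id = self.request.query_params.get('post_id')
    if post_id:
        qs = qs.filter(id__lt=post_id)   # only posts older than the last one seen
    return qs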
Right now I'm only using a basic ListAPIView:
class PostList(generics.ListAPIView):
    """
    API endpoint that allows posts to be viewed
    """
    serializer_class = serializers.PostSerializer  # just a basic serializer
    model = Post
Is there a better way of paginating, so that the next request in that user's session doesn't look like the first one while new data is being created?
You're looking for CursorPagination, from the DRF docs:
Cursor Pagination provides the following benefits:
Provides a consistent pagination view. When used properly CursorPagination ensures that the client will never see the same item twice when paging through records, even when new items are being inserted by other clients during the pagination process.
Supports usage with very large datasets. With extremely large datasets pagination using offset-based pagination styles may become inefficient or unusable. Cursor based pagination schemes instead have fixed-time properties, and do not slow down as the dataset size increases.
You can also use -created as the ordering field as you mentioned above.
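A minimal sketch of wiring that up (requires DRF 3.1+; the PostCursorPagination class and the page_size value are just illustrative choices):
from rest_framework import generics
from rest_framework.pagination import CursorPagination

class PostCursorPagination(CursorPagination):
    page_size = 20
    ordering = '-created'   # the cursor is built from this field, so it should not change after creation

class PostList(generics.ListAPIView):
    """
    API endpoint that allows posts to be viewed
    """
    queryset = Post.objects.all()
    serializer_class = serializers.PostSerializer
    pagination_class = PostCursorPagination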
How about caching the queryset, so that the next page is served from the same queryset and not from a new one? Then you could use a parameter to get a fresh queryset whenever you want.
Something like this:
from django.core.cache import cache

class PostList(generics.ListAPIView):
    def get_queryset(self):
        # one cache entry per user, so each user pages through their own snapshot of the posts
        qs_key = str(self.request.user.id) + '_key'
        if 'refresh' in self.request.QUERY_PARAMS or cache.get(qs_key) is None:
            # build (and cache) a new query set
            qs = Post.objects.all()
            cache.set(qs_key, qs)
        return cache.get(qs_key)
So basically, only when your URL looks like this:
http://website.com/api/v1/posts/?refresh=whatever
will the request return fresh data.
UPDATE
In order to provide each user with their own set of posts, the cache key must contain a unique identifier (which might be the user's ID):
I also updated the code.
The downside to this approach is that for a very large number of users and a large number of posts for each user, it might not work very well.
So, here is my second idea:
Use a TimeStamped model for the Post model and filter the queryset on the created field.
I don't know much about your models and how exactly they are built, so I guess you will have to choose which solution is best for your app :)
Maybe you can add a field like created_at/updated_at to every object. Then you can save the timestamp of when the user made the first request and filter out everything that came after it.
I haven't tried it myself, but I guess it might work in your case.
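Something along these lines, for example (an untested sketch; the before query parameter and the created field name are assumptions for illustration):
from django.utils.dateparse import parse_datetime
from rest_framework import generics

class PostList(generics.ListAPIView):
    serializer_class = serializers.PostSerializer

    def get_queryset(self):
        qs = Post.objects.order_by('-created')
        # the client echoes back the timestamp of its first request on every page,
        # so posts created after that moment never shift the pages around
        before = self.request.query_params.get('before')  # QUERY_PARAMS on older DRF versions
        if before:
            timestamp = parse_datetime(before)
            if timestamp:
                qs = qs.filter(created__lte=timestamp)
        return qs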
How can I do some processing before returning something through Tastypie for a specific user?
For example, let's say I have an app where a user has posts and can also follow other people's posts. I'd like to combine this person's posts with the posts of the people they're following and return them as one array.
So let's say that in Tastypie I'd like to get the latest 20 posts from this person's timeline: I'd need to get the user, process this information and return it in JSON, but I'm not exactly sure how I'd process this and return it using Tastypie.
Any help?
Do more complex processing in get_object_list. It gets called before the dehydration process starts, i.e., before the JSON is created that gets passed back.
class PostResource(ModelResource):
    class Meta:
        queryset = Post.objects.all()

    def get_object_list(self, request):
        # these two filters are placeholders - adjust the lookups to your own models
        this_users_posts = super(PostResource, self).get_object_list(request).filter(user=request.user)
        all_the_posts_this_user_follows = super(PostResource, self).get_object_list(request).filter(follower=request.user)
        return this_users_posts | all_the_posts_this_user_follows
You need to fix those queries so that they work for your particular case. Then the trick is to combine the two querysets you get back. Using | gets their union, using & gets only their overlap. You want the union (and if users can also follow their own posts, you can probably call distinct() on the resulting superset to drop the duplicates).
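To illustrate the two operators on the querysets from the snippet above (distinct() only matters when the two sets can overlap):
combined = this_users_posts | all_the_posts_this_user_follows   # union - what you want here
combined = combined.distinct()                                  # drop duplicates if a post can appear in both sets
overlap = this_users_posts & all_the_posts_this_user_follows    # intersection - only posts present in both sets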
I use a voting app (django-ratings, if that makes any difference) which uses Django's GenericForeignKey, has a ForeignKey to User, and several other fields such as the date of the latest change.
I'd like to get all the objects of one content type that a single user voted for, ordered by the date of the latest change. As far as I understand, all the information can be found in a single table (except for the content_type, which can be prefetched/cached). Unfortunately, Django still makes an extra query each time I access a content_object.
So the question is: how do I get all the votes on a given model, by a given user, with the related objects and the given ordering, using the minimum number of database hits?
Edit: right now I'm using 2 queries - first selecting all the votes, then fetching all the objects I need with .filter(pk__in=obj_ids), and finally attaching them to the vote objects. But it seems that a reverse generic relation might help solve the problem.
Have you checked out select_related()? That may help.
Returns a QuerySet that will automatically "follow" foreign-key relationships, selecting that additional related-object data when it executes its query. This is a performance booster which results in (sometimes much) larger queries but means later use of foreign-key relationships won't require database queries.
https://docs.djangoproject.com/en/dev/ref/models/querysets/#select-related
Well, right now we're using prefetch_related() from Django 1.4 on a GenericRelation. It still uses 2 queries, but it has a very intuitive interface.
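For reference, one way to get that 2-query behaviour is to prefetch the GenericForeignKey itself (Article is an illustrative model here; user, content_type, date_changed and content_object are django-ratings' usual Vote field names, but double-check them against your version):
from django.contrib.contenttypes.models import ContentType
from djangoratings.models import Vote

content_type = ContentType.objects.get_for_model(Article)

votes = (Vote.objects
         .filter(user=user, content_type=content_type)
         .order_by('-date_changed')
         .prefetch_related('content_object'))   # 2 queries: the votes, then the rated objects

for vote in votes:
    print(vote.content_object, vote.date_changed)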
From looking at the models.py of the django-ratings app, I think you would have to do user.votes.filter(content_type__model=Model._meta.module_name).order_by("date_changed") (assuming the model you want to filter by is Model) to get all the Vote objects. For the related objects, loop through the queryset and get content_object on each item. IMHO, this would result in the fewest DB queries.