Django custom QuerySet evaluation - python

So QuerySets are "lazy" and only run in certain instances (repr,list,etc.). I have created a custom QuerySet for doing many queries, but these could end up having millions of items!! That is way more than I want to return.
When returning the evaluated QuerySet, it should not have more than 25 results! Now I know I could do the following:
first = example.objects.filter(...)
last = first.filter(...)
result = last[:25]
#do stuff with result
but I will be doing so many queries with example objects that I feel it unnecessary to have the line result = last[:25]. Is there a way to specify how a QuerySet is returned?
If there is, how can I change it so that whenever the QuerySet would be evaluated it only returns the first x items in the QuerySet where, in this case, x = 25
Important note:
slicing must be on evaluation because that way I can chain queries without limited results, but when I return a result upon evaluation, it would have a max of x

I'm not sure I understand what your issue with slicing the queryset is. If it's the extra line of code or the hardcoded number that's bothering you, you can run
example.objects.filter(**filters)[:x]
and pass x into whatever method you're using.

You can write a custom Manager:
class LimitedNumberOfResultsManager(models.Manager):
def get_queryset(self):
return super(LimitedNumberOfResultsManager, self).get_queryset()[:25]
Note: You may think that adding a slice here will immediately evaluate the queryset. It won't. Instead information about the query limit will be saved to an underlying Query object and used later, during the final evaluation - as long as it is not overwritten by an another slice in the meantime.
Then add the manager to your model:
class YourModel(models.Model):
# ...
objects = LimitedNumberOfResultsManager()
After setting this up YourModel.objects.all() and other operations on your queryset will always return only up to 25 results. You can still overwrite this any time using slicing. For example:
This will return up to 25 resuls:
YourModel.objects.filter(lorem='ipsum')
but this will return up to 100 resuls:
YourModel.objects.filter(lorem='ipsum')[:100]
One more point. Overwriting the default manager may be confusing to other people reading your code. So I think it would be better to leave the default manager alone and use the custom one as an optional alternative:
class YourModel(models.Model):
# ...
objects = models.Manager()
limited = LimitedNumberOfResultsManager()
With this set up this will return all results:
YourModel.objects.all()
and this will return only up to 25 results:
YourModel.limited.all()
Depending on your exact use case you man also want to look at pagination in Django.

Related

Pass a queryset as the argument to __in in django?

I have a list of object ID's that I am getting from a query in an model's method, then I'm using that list to delete objects from a different model:
class SomeObject(models.Model):
# [...]
def do_stuff(self, some_param):
# [...]
ids_to_delete = {item.id for item in self.items.all()}
other_object = OtherObject.objects.get_or_create(some_param=some_param)
other_object.items.filter(item_id__in=ids_to_delete).delete()
What I don't like is that this takes 2 queries (well, technically 3 for the get_or_create() but in the real code it's actually .filter(some_param=some_param).first() instead of the .get(), so I don't think there's any easy way around that).
How do I pass in an unevaluated queryset as the argument to an __in lookup?
I would like to do something like:
ids_to_delete = self.items.all().values("id")
other_object.items.filter(item_id__in=ids_to_delete).delete()
You can, pass a QuerySet to the query:
other_object.items.filter(id__in=self.items.all()).delete()
this will transform it in a subquery. But not all databases, especially MySQL ones, are good with such subqueries. Furthermore Django handles .delete() manually. It will thus make a query to fetch the primary keys of the items, and then trigger the delete logic (and also remove items that have a CASCADE dependency). So .delete() is not done as one query, but at least two queries, and often a larger amount due to ForeignKeys with an on_delete trigger.
Note however that you here remove Item objects, not "unlink" this from the other_object. For this .remove(…) [Django-doc] can be used.
I should've tried the code sample I posted, you can in fact do this. It's given as an example in the documentation, but it says "be cautious about using nested queries and understand your database server’s performance characteristics" and recommends against doing this, casting the subquery into a list:
values = Blog.objects.filter(
name__contains='Cheddar').values_list('pk', flat=True)
entries = Entry.objects.filter(blog__in=list(values))

What if I delete rows from Django queryset and then filter again?

Consider the following code:
questions = Question.objects.only('id', 'pqa_id', 'retain')
del_questions = questions.filter(retain=False)
# Some computations on del_questions
del_questions.delete()
add_questions = questions.filter(pqa_id=None)
Will add_questions not contain questions with retain=False? I.e. is questions object re-evaluated when we run delete() on its subset del_questions?
Short answer: you here use different QuerySets, so you will here, by creating a copy, make another query. If you would use the same QuerySet, Django will remove the cache, and so it will re-evaluate the QuerySet. It is however possible to let objects temporarily survive a .delete() call, due to caching in another QuerySet that was evaluated before.
is questions object re-evaluated when we run delete() on its subset del_questions
questionss is never evaluated in the first place. A QuerySet is iterable and in case you iterate over it (or fetch the length, or something else), will result in a query. But if you write Model.objects.all().filter(foo=3) then Django will not first "evaluate" the .all() by fetching all Model objects into memory.
A QuerySet is in essence a tool to build a query, by chaining operations and each time constructing a new queryset. Eventually you can evaluate one of the querysets.
Here by apply a .filter(..) for the two calls. We thus constructed two different QuerySets, and so if you evaluate the former, then this will not result in any caching in the latter.
A second important note is that a .delete() does not evaluate the queryset, and thus does not cache the results. If we inspect the .delete() method [GitHub], we see:
def delete(self):
"""Delete the records in the current QuerySet."""
assert self.query.can_filter(), \
"Cannot use 'limit' or 'offset' with delete."
if self._fields is not None:
raise TypeError("Cannot call delete() after .values() or .values_list()")
del_query = self._chain()
# The delete is actually 2 queries - one to find related objects,
# and one to delete. Make sure that the discovery of related
# objects is performed on the same database as the deletion.
del_query._for_write = True
# Disable non-supported fields.
del_query.query.select_for_update = False
del_query.query.select_related = False
del_query.query.clear_ordering(force_empty=True)
collector = Collector(using=del_query.db)
collector.collect(del_query)
deleted, _rows_count = collector.delete()
# Clear the result cache, in case this QuerySet gets reused.
self._result_cache = None
return deleted, _rows_count
With self._chain(), one creates a copy of the querset. So even if this would change the state of a QuerySet, then it would not change the state of this QuerySet.
Another interesting part is self._result_cache = None, here Django resets the cache. So if the queryset was already evaluated before you called .delete() (for example you materialized the queryset before calling .delete()), then it will remove that cache. So if you would reevaluate the QuerySet, this would result in another query to fetch the items.
There is however a scenario where data can still get outdated. For example the following:
questions = Question.objects.all() # create a queryset
list(questions) # materialize the result
questions2 = questions.all() # create a copy of this queryset
questions2.delete() # remove the entries
If we now would call list(questions), we thus obtain the elements in the cache of questions, and this QuerySet is not invalidated, so the elements "survive" a .delete() from another queryset (a copy from this one, although that is not necessary, a simply Questions.objects.all().delete() would also do the trick).

"connection.queries" returns nothing in Django

from django.db import connection, reset_queries
Prints: []
reset_queries()
p = XModel.objects.filter(id=id) \
.values('name') \
.annotate(quantity=Count('p_id'))\
.order_by('-quantity') \
.distinct()[:int(count)]
print(connection.queries)
While this prints:
reset_queries()
tc = ZModel.objects\
.filter(id=id, stock__gt=0) \
.aggregate(Sum('price'))
print(connection.queries)
I have changed fields names to keep things simple. (Fields are of parent tables i.e. __ to multiple level)
I was trying to print MySQL queries that Django makes and came across connection.queries, I was wondering why doesn't it prints empty with first, while with second it works fine. Although I am getting the result I expect it to. Probably the query is executed. Also am executing only one at a time.
As the accepted answer says you must consume the queryset first since it's lazy (e.g. list(qs)).
Another reason can be that you must be in DEBUG mode (see FAQ):
connection.queries is only available if Django DEBUG setting is True.
Because QuerySets in Django are lazy: as long as you do not consume the result, the QuerySet is not evaluated: no querying is done, until you want to obtain non-QuerySet objects like lists, dictionaries, Model objects, etc.
We can however not doe this for all ORM calls: for example Model.objects.get(..) has as type a Model object, we can not postpone that fetch (well of course we could wrap it in a function, and call it later, but then the "type" is a function, not a Model instance).
The same with a .aggregate(..) since then the result is a dictionary, that maps the keys to the corresponding result of the aggregation.
But your first query does not need to be evaluated. By writing a slicing, you only have added a LIMIT statement at the end of the query, but no need to evaluate it immediately: the type of this is still a QuerySet.
If you would however call list(qs) on a QuerySet (qs), then this means the QuerySet has to be evaluated, and Django will make the query.
The laziness of QuerySets also makes these chainings possible. Imagine that you write:
Model.objects.filter(foo=42).filter(bar=1425)
If the QuerySet of Model.objects.filter(foo=42) would be evaluated immediately, then this could result in a huge amount of Model instances, but by postponing this, we now filter on bar=1425 as well (we constructed a new QuerySet that takes both .filter(..)s into account). This can result in a query that can be evaluated more efficiently, and for example, can result in less data that has to be transferred from the database to the Django server.
The documentation says QuerySets are lazy as shown below:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated. Take a look at this example:
>>> q = Entry.objects.filter(headline__startswith="What")
>>> q = q.filter(pub_date__lte=datetime.date.today())
>>> q = q.exclude(body_text__icontains="food")
>>> print(q)
Though this looks like three database hits, in fact it hits the
database only once, at the last line (print(q)). In general, the
results of a QuerySet aren’t fetched from the database until you “ask”
for them. When you do, the QuerySet is evaluated by accessing the
database. For more details on exactly when evaluation takes place, see
When QuerySets are evaluated.

Building Django Q() objects from other Q() objects, but with relation crossing context

I commonly find myself writing the same criteria in my Django application(s) more than once. I'll usually encapsulate it in a function that returns a Django Q() object, so that I can maintain the criteria in just one place.
I will do something like this in my code:
def CurrentAgentAgreementCriteria(useraccountid):
'''Returns Q that finds agent agreements that gives the useraccountid account current delegated permissions.'''
AgentAccountMatch = Q(agent__account__id=useraccountid)
StartBeforeNow = Q(start__lte=timezone.now())
EndAfterNow = Q(end__gte=timezone.now())
NoEnd = Q(end=None)
# Now put the criteria together
AgentAgreementCriteria = AgentAccountMatch & StartBeforeNow & (NoEnd | EndAfterNow)
return AgentAgreementCriteria
This makes it so that I don't have to think through the DB model more than once, and I can combine the return values from these functions to build more complex criterion. That works well so far, and has saved me time already when the DB model changes.
Something I have realized as I start to combine the criterion from these functions that is that a Q() object is inherently tied to the type of object .filter() is being called on. That is what I would expect.
I occasionally find myself wanting to use a Q() object from one of my functions to construct another Q object that is designed to filter a different, but related, model's instances.
Let's use a simple/contrived example to show what I mean. (It's simple enough that normally this would not be worth the overhead, but remember that I'm using a simple example here to illustrate what is more complicated in my app.)
Say I have a function that returns a Q() object that finds all Django users, whose username starts with an 'a':
def UsernameStartsWithAaccount():
return Q(username__startswith='a')
Say that I have a related model that is a user profile with settings including whether they want emails from us:
class UserProfile(models.Model):
account = models.OneToOneField(User, unique=True, related_name='azendalesappprofile')
emailMe = models.BooleanField(default=False)
Say I want to find all UserProfiles which have a username starting with 'a' AND want use to send them some email newsletter. I can easily write a Q() object for the latter:
wantsEmails = Q(emailMe=True)
but find myself wanting to something to do something like this for the former:
startsWithA = Q(account=UsernameStartsWithAaccount())
# And then
UserProfile.objects.filter(startsWithA & wantsEmails)
Unfortunately, that doesn't work (it generates invalid PSQL syntax when I tried it).
To put it another way, I'm looking for a syntax along the lines of Q(account=Q(id=9)) that would return the same results as Q(account__id=9).
So, a few questions arise from this:
Is there a syntax with Django Q() objects that allows you to add "context" to them to allow them to cross relational boundaries from the model you are running .filter() on?
If not, is this logically possible? (Since I can write Q(account__id=9) when I want to do something like Q(account=Q(id=9)) it seems like it would).
Maybe someone suggests something better, but I ended up passing the context manually to such functions. I don't think there is an easy solution, as you might need to call a whole chain of related tables to get to your field, like table1__table2__table3__profile__user__username, how would you guess that? User table could be linked to table2 too, but you don't need it in this case, so I think you can't avoid setting the path manually.
Also you can pass a dictionary to Q() and a list or a dictionary to filter() functions which is much easier to work with than using keyword parameters and applying &.
def UsernameStartsWithAaccount(context=''):
field = 'username__startswith'
if context:
field = context + '__' + field
return Q(**{field: 'a'})
Then if you simply need to AND your conditions you can combine them into a list and pass to filter:
UserProfile.objects.filter(*[startsWithA, wantsEmails])

What is the most efficient way to iterate django objects updating them?

So I have a queryset to update
stories = Story.objects.filter(introtext="")
for story in stories:
#just set it to the first 'sentence'
story.introtext = story.content[0:(story.content.find('.'))] + ".</p>"
story.save()
And the save() operation completely kills performance. And in the process list, there are multiple entries for "./manage.py shell" yes I ran this through django shell.
However, in the past I've ran scripts that didn't need to use save(), as it was changing a many to many field. These scripts were very performant.
My project has this code, which could be relevant to why these scripts were so good.
#receiver(signals.m2m_changed, sender=Story.tags.through)
def save_story(sender, instance, action, reverse, model, pk_set, **kwargs):
instance.save()
What is the best way to update a large queryset (10000+) efficiently?
As far as new introtext value depends on content field of the object you can't do any bulk update. But you can speed up saving list of individual objects by wrapping it into transaction:
from django.db import transaction
with transaction.atomic():
stories = Story.objects.filter(introtext='')
for story in stories:
introtext = story.content[0:(story.content.find('.'))] + ".</p>"
Story.objects.filter(pk=story.pk).update(introtext=introtext)
transaction.atomic() will increase speed by order of magnitude.
filter(pk=story.pk).update() trick allows you to prevent any pre_save/post_save signals which are emitted in case of the simple save(). This is the officially recommended method of updating single field of the object.
You can use update built-in function over a queryset
Exmaple:
MyModel.objects.all().update(color=red)
In your case, you need use F() (read more here) built-in function to use instance own attributes:
from django.db.models import F
stories = Story.objects.filter(introtext__exact='')
stories.update(F('introtext')[0:F('content').find(.)] + ".</p>" )

Categories