I am wondering when I actually touch the database when doing queries; more precisely, when is the query performed?
I have this kwargs dict:
kwargs = {'name__startswith':'somename','color__iexact':'somecolor'}
but I need distinct() only for the name__startswith lookup, not for color__iexact.
I thought I would apply distinct() for name__startswith in a loop like this:
for q in kwargs:
    if q == 'name__startswith':
        Thing.objects.filter(name__startswith=somename).distinct('id')
and then query for everything dynamically:
allthings = Thing.objects.filter(**kwargs)
but this is somehow wrong; I seem to be doing two different things here.
How can I do these two queries dynamically?
Django QuerySets are lazy, so the actual queries aren't executed until you use the data.
allthings = Thing.objects.filter(**kwargs)
if 'name__startswith' in kwargs:
    allthings = allthings.distinct('id')
No queries are performed above until you actually use the data, which is great for building up filters the way you want.
From the docs:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve any database activity. You can stack filters together all day long, and Django won’t actually run the query until the QuerySet is evaluated. Take a look at this example:
>>> q = Entry.objects.filter(headline__startswith="What")
>>> q = q.filter(pub_date__lte=datetime.date.today())
>>> q = q.exclude(body_text__icontains="food")
>>> print(q)
Though this looks like three database hits, in fact it hits the database only once, at the last line (print(q)). In general, the results of a QuerySet aren’t fetched from the database until you “ask” for them. When you do, the QuerySet is evaluated by accessing the database. For more details on exactly when evaluation takes place, see When QuerySets are evaluated.
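For illustration, here is a minimal sketch (reusing the Thing model from the question) of which operations do and do not trigger evaluation:
qs = Thing.objects.filter(name__startswith='somename')  # no query yet
qs = qs.distinct('id')  # still no query, just a new QuerySet (field arguments to distinct() are PostgreSQL-only)
print(qs.count())   # evaluates: a COUNT query runs here
results = list(qs)  # evaluates: the SELECT runs here and the results are cached
for thing in results:
    print(thing.id)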
You can use models.Q to create dynamic queries in Django:
query = models.Q(name__startswith=somename)
query &= models.Q(color__iexact=somecolor)
all_things = Thing.objects.filter(query).distinct('name')
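Combining both answers, a hedged sketch that builds the Q object dynamically from the original kwargs dict and applies distinct() only when the name__startswith lookup is present:
from django.db import models

kwargs = {'name__startswith': 'somename', 'color__iexact': 'somecolor'}

query = models.Q()
for lookup, value in kwargs.items():
    query &= models.Q(**{lookup: value})

all_things = Thing.objects.filter(query)
if 'name__startswith' in kwargs:
    all_things = all_things.distinct('id')  # distinct on a field is PostgreSQL-only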
Also read
Constructing Django filter queries dynamically with args and kwargs
Related
Let's say I store my query result temporarily in a variable:
temp_doc = Document.objects.filter(detail=res)
and then I want to insert some data into said model, which will look something like this:
p = Document(detail=res)
p.save()
Note that res is an object from another model, used to make an FK relation.
For some reason, temp_doc will then contain the new data.
Is .filter() supposed to work like that?
Because with .get(), the data inside temp_doc doesn't change.
Django QuerySets are lazy; this behavior is well documented:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated.
So basically, until you ask for the data to be evaluated, the database query won't be executed.
In your example
temp_doc = Document.objects.filter(detail=res)
p = Document(detail=res)
p.save()
Evaluating temp_doc now will include the newly created Document, since the database query runs at that point and returns it.
Simply constructing a list would evaluate the QuerySet at the start:
# evaluation happens here
temp_doc = list(Document.objects.filter(detail=res))
p = Document(detail=res)
p.save()
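This also explains why .get() behaves differently: it returns a single model instance rather than a QuerySet, so the query runs immediately and a later insert cannot change what was fetched. A minimal sketch:
temp_doc = Document.objects.get(detail=res)  # query executes on this line
p = Document(detail=res)
p.save()
# temp_doc is a plain instance; the newly saved row does not appear in it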
I have a list of object IDs that I'm getting from a query in a model's method, and I then use that list to delete objects from a different model:
class SomeObject(models.Model):
    # [...]
    def do_stuff(self, some_param):
        # [...]
        ids_to_delete = {item.id for item in self.items.all()}
        other_object = OtherObject.objects.get_or_create(some_param=some_param)
        other_object.items.filter(item_id__in=ids_to_delete).delete()
What I don't like is that this takes 2 queries (well, technically 3 with the get_or_create(), but in the real code it's actually .filter(some_param=some_param).first() instead of the .get(), so I don't think there's an easy way around that).
How do I pass in an unevaluated queryset as the argument to an __in lookup?
I would like to do something like:
ids_to_delete = self.items.all().values("id")
other_object.items.filter(item_id__in=ids_to_delete).delete()
You can pass a QuerySet to the query:
other_object.items.filter(id__in=self.items.all()).delete()
this will transform it into a subquery, but not all databases (MySQL in particular) handle such subqueries well. Furthermore, Django handles .delete() manually: it first makes a query to fetch the primary keys of the items, and then triggers the delete logic (which also removes items that have a CASCADE dependency). So .delete() is not done as one query, but as at least two, and often more due to ForeignKeys with an on_delete trigger.
Note however that this removes the Item objects themselves; it does not just "unlink" them from other_object. For unlinking, .remove(…) [Django-doc] can be used.
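A hedged sketch of the unlink variant, assuming items is a ManyToManyField on OtherObject:
# only removes rows from the M2M join table; the Item objects stay in the database
to_unlink = other_object.items.filter(item_id__in=ids_to_delete)
other_object.items.remove(*to_unlink)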
I should've tried the code sample I posted: you can in fact do this. It's given as an example in the documentation, but the docs say to "be cautious about using nested queries and understand your database server's performance characteristics" and recommend casting the subquery into a list instead:
values = Blog.objects.filter(
    name__contains='Cheddar'
).values_list('pk', flat=True)
entries = Entry.objects.filter(blog__in=list(values))
m = MyModel.objects.all().only("colA", "colB").prefetch_related("manyToManyField")
for mm in m:
    print(mm.id)
    list(mm.manyToManyField.values_list('id', flat=True))
This code takes too long to execute.
This takes virtually no time (no reference to manyToManyField in loop):
m = MyModel.objects.all().only("colA", "colB").prefetch_related("manyToManyField")
for mm in m:
    print(mm.id)
And this takes nearly the exact same time as the first
m = MyModel.objects.all().only("colA", "colB")
for mm in m:
    print(mm.id)
    list(mm.manyToManyField.values_list('id', flat=True))
This makes me think that .prefetch_related("manyToManyField") is useless, that it is not actually fetching anything, and that list(mm.manyToManyField.values_list('id', flat=True)) hits the database on every iteration.
Why is this and how can I force to prefetch from a manytomany field?
I've tried removing list(), but then mm.manyToManyField.all().values_list gives me a queryset that is not JSON serializable (no, I don't want to install REST framework).
I also tried wrapping it with list(), as list(mm.manyToManyField.all().values_list): still extremely slow.
Why is this and how can I force to prefetch from a manytomany field?
The reason this happens is that you make a different query than manyToManyField.all(), and thus the prefetched result is not used. Imagine that you called mm.manyToManyField.filter(some_col=some_val): that would hit the database as well, since a database is optimized to filter effectively.
If you instead fetch the values with:
# no extra query
for mm in m:
    print(list(mm.manyToManyField.all()))
or, if you want to print the primary keys, you can fetch these with a list comprehension, for example:
# no extra query
for mm in m:
    print([k.id for k in mm.manyToManyField.all()])
then it will not make an additional query, since you already loaded that relation with the .prefetch_related('manyToManyField'); but all variants, like filtering, annotating, etc., are not loaded.
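For the JSON-serialization use case from the question, that means you can build plain lists from the prefetch cache without any per-row queries, for example:
# .all() is served from the prefetch cache, so this loop issues no extra queries
data = [
    {'id': mm.id, 'related_ids': [obj.pk for obj in mm.manyToManyField.all()]}
    for mm in m
]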
You can however pass arbitrary querysets to prefetch with Prefetch objects [Django-doc]. For example, if you want to prefetch only a filtered subset of the related objects, you can do that with:
from django.db.models import Prefetch
m = MyModel.objects.only("colA", "colB").prefetch_related(
    Prefetch(
        'myManyToManyField',
        queryset=TargetModel.objects.filter(pk__gt=5),
        to_attr='filtered_pks'
    )
)
The MyModel objects that arise from this will then have an extra attribute filtered_pks, which contains the result of the .filter(pk__gt=5) on the related model. TargetModel is thus the model to which the ManyToManyField refers.
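A usage sketch: with to_attr, the prefetched objects land in a plain list on each instance, so no further queries are made in the loop:
for mm in m:
    # filtered_pks is a list of TargetModel instances with pk > 5,
    # loaded in the single prefetch query
    print(mm.id, [obj.pk for obj in mm.filtered_pks])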
from django.db import connection, reset_queries
This prints []:
reset_queries()
p = XModel.objects.filter(id=id) \
    .values('name') \
    .annotate(quantity=Count('p_id')) \
    .order_by('-quantity') \
    .distinct()[:int(count)]
print(connection.queries)
While this prints the executed query:
reset_queries()
tc = ZModel.objects\
    .filter(id=id, stock__gt=0) \
    .aggregate(Sum('price'))
print(connection.queries)
I have changed field names to keep things simple. (The fields span parent tables, i.e. the lookups use __ across multiple levels.)
I was trying to print the MySQL queries that Django makes and came across connection.queries. I was wondering why it prints an empty list for the first query, while with the second it works fine, even though I am getting the result I expect, so the query is presumably executed. Also, I am executing only one at a time.
As the accepted answer says, you must consume the queryset first, since it's lazy (e.g. list(qs)).
Another reason can be that you must be in DEBUG mode (see FAQ):
connection.queries is only available if the Django DEBUG setting is True.
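Putting both together, a minimal sketch (assuming DEBUG = True and reusing the names from the question):
from django.db import connection, reset_queries

reset_queries()
qs = XModel.objects.filter(id=id).values('name')  # lazy: nothing logged yet
print(connection.queries)  # []

list(qs)  # forces evaluation
print(connection.queries)  # now contains the executed SQL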
Because QuerySets in Django are lazy: as long as you do not consume the result, the QuerySet is not evaluated, and no querying is done until you want to obtain non-QuerySet objects like lists, dictionaries, Model objects, etc.
We can however not do this for all ORM calls: for example, Model.objects.get(..) has a Model object as its type, so we cannot postpone that fetch (well, of course we could wrap it in a function and call it later, but then the "type" is a function, not a Model instance).
The same holds for .aggregate(..), since its result is a dictionary that maps keys to the corresponding results of the aggregations.
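In other words, the return type tells you whether evaluation can be postponed, for example:
from django.db.models import Sum

qs = ZModel.objects.filter(stock__gt=0)  # QuerySet: lazy, no query yet
obj = ZModel.objects.get(id=id)  # Model instance: the query runs now
total = ZModel.objects.aggregate(Sum('price'))  # dict: the query runs now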
But your first query does not need to be evaluated. By slicing, you have only added a LIMIT clause at the end of the query, and there is no need to evaluate it immediately: the type of the result is still a QuerySet.
If you would however call list(qs) on a QuerySet (qs), then this means the QuerySet has to be evaluated, and Django will make the query.
The laziness of QuerySets also makes these chainings possible. Imagine that you write:
Model.objects.filter(foo=42).filter(bar=1425)
If the QuerySet of Model.objects.filter(foo=42) were evaluated immediately, this could result in a huge number of Model instances, but by postponing this, we now filter on bar=1425 as well (we construct a new QuerySet that takes both .filter(..)s into account). This can result in a query that can be evaluated more efficiently, and can, for example, result in less data being transferred from the database to the Django server.
The documentation says QuerySets are lazy as shown below:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated. Take a look at this example:
>>> q = Entry.objects.filter(headline__startswith="What")
>>> q = q.filter(pub_date__lte=datetime.date.today())
>>> q = q.exclude(body_text__icontains="food")
>>> print(q)
Though this looks like three database hits, in fact it hits the
database only once, at the last line (print(q)). In general, the
results of a QuerySet aren’t fetched from the database until you “ask”
for them. When you do, the QuerySet is evaluated by accessing the
database. For more details on exactly when evaluation takes place, see
When QuerySets are evaluated.
Consider the following (pseudo-Python) code:
l = [some, list]
for i in l:
    o, c = Model.objects.get_or_create(par1=i["something"], defaults={'par2': i["else"]})
Assuming that most of the time the objects will be retrieved rather than created, there is an obvious performance gain in first issuing a single SELECT for the objects already present in the set defined by par1, and then bulk-inserting the missing ones.
But is there a neat Python/Django pattern for accomplishing that without diving into SQL?
This is a bulk import routine, so l contains dictionaries, not Django model instances.
Given a list of IDs, you can use Django to quickly give you the corresponding Model instances using the __in operator: https://docs.djangoproject.com/en/dev/ref/models/querysets/#in
photos_exist = Photo.objects.filter(
    id__in=photo_ids
)
You can use Q objects to create a complex query to SELECT the existing rows. Something like:
query_parameters = Q()
for i in l:
    query_parameters |= Q(first=i['this']) & Q(second=i['that'])
found = MyModel.objects.filter(query_parameters)
Then you can figure out (in Python) which rows are missing and create() them (or bulk_create() for efficiency, or get_or_create() if there are potential race conditions).
Of course, long complex queries can have performance problems of their own, but I imagine this would be faster than doing a separate query for each item.
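A hedged sketch of the full pattern for the original example, assuming the par1 values in l are unique:
from django.db.models import Q

# one SELECT for the rows that already exist
query_parameters = Q()
for i in l:
    query_parameters |= Q(par1=i["something"])
existing = set(Model.objects.filter(query_parameters).values_list('par1', flat=True))

# one bulk INSERT for the missing ones (no race-condition handling here)
Model.objects.bulk_create([
    Model(par1=i["something"], par2=i["else"])
    for i in l
    if i["something"] not in existing
])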