Curious about how django achieves the ORM query optimization

Curious about how django achieves the ORM query optimization - python

I am no python expert and I am curious about how django optimize the following query
Model.objects.filter(field = 'abc')[0]
Somehow django will intelligently add 'limit 1' to SQL query like 'select * from model where field = 'abc' limit 1'

This is because Model.objects.filter(...) doesn't actually return a list, it returns a queryset object. When you do qset[0], it calls the __getitem__ method on querysets, which adds the limit 1 and executes it. Here's the source of that method; there's logic for various cases when the result has already been cached or not and so on.

Related

django custom Func for specific SQL function

I'm currently performing a raw query in my database because i use the MySQL function instr. I would like to translate is into a django Func class.I've spend several days reading the docs, Django custom for complex Func (sql function) and Annotate SQL Function to Django ORM Queryset `FUNC` / `AGGREGATE` but i still fail to write succesfully my custom Func.
This is my database
from django.db import models
class Car(models.Model):
brand = models.CharField("brand name", max_length=50)
#then add the __str__ function
Then I populate my database for this test
Car.objects.create(brand="mercedes")
Car.objects.create(brand="bmw")
Car.objects.create(brand="audi")
I want to check if something in my table is in my user input. This is how i perform my SQL query currently
query = Car.objects.raw("SELECT * FROM myAppName_car WHERE instr(%s, brand)>0", ["my audi A3"])
# this return an sql query with one element in this example
I'm trying to tranform it in something that would look like this
from django.db.models import Func
class Instr(Func):
function = 'INSTR'
query = Car.objects.filter(brand=Instr('brand'))
EDIT
Thank to the response, the correct answer is
from django.db.models import Value
from django.db.models.functions import StrIndex
query = Car.objects.annotate(pos=StrIndex(Value("my audi A3"), "brand")).filter(pos__gt=0)

Your custom database function is totally correct, but you're using it in the wrong way.
When you look at your usage in the raw SQL function, you can clearly see that you need 3 parameters for your filtering to work correctly: a string, a column name and a threshold (in your case it is always zero)
If you want to use this function in the same way in your query, you should do it like this:
query = Car.objects.annotate(
position_in_brand=Instr("my audi A3")
).filter(position_in_brand__gt=0)
For this to work properly, you also need to add output_field = IntegerField() to your function class, so Django knows that the result will always be an integer.
But... You've missed 2 things and your function is not needed at all, because it already exist in Django as StrIndex.
And the 2nd thing is: you can achieve the same outcome by using already existing django lookups, like contains or icontains:
query = Car.objects.filter(brand__contains="my audi A3")
You can find out more about lookups in Django docs

Why django ORM .filter() two way binding my data?

Let's say I store my query result temporarily to a variable
temp_doc = Document.objects.filter(detail=res)
and then I want to insert some data in said model
and will be something like this
p = Document(detail=res)
p.save()
note that res are object from other model to make some FK relation.
For some reason the temp_doc will contain the new data.
Are .filter() supposed to work like that?
Because with .get() the data inside temp_doc don't change

Django Querysets are lazy, this behavior is well documented
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated.
So basically that means until you don't ask for data evaluation, database query won't be executed
In your example
temp_doc = Document.objects.filter(detail=res)
p = Document(detail=res)
p.save()
enter code here
evaluating temp_doc now would include newly created Document as database query would return it
simply constructing list would evaluate QuerySet at the start
#evaluation happens here
list(temp_doc) = Document.objects.filter(detail=res)
p = Document(detail=res)
p.save()

Pass a queryset as the argument to __in in django?

I have a list of object ID's that I am getting from a query in an model's method, then I'm using that list to delete objects from a different model:
class SomeObject(models.Model):
# [...]
def do_stuff(self, some_param):
# [...]
ids_to_delete = {item.id for item in self.items.all()}
other_object = OtherObject.objects.get_or_create(some_param=some_param)
other_object.items.filter(item_id__in=ids_to_delete).delete()
What I don't like is that this takes 2 queries (well, technically 3 for the get_or_create() but in the real code it's actually .filter(some_param=some_param).first() instead of the .get(), so I don't think there's any easy way around that).
How do I pass in an unevaluated queryset as the argument to an __in lookup?
I would like to do something like:
ids_to_delete = self.items.all().values("id")
other_object.items.filter(item_id__in=ids_to_delete).delete()

You can, pass a QuerySet to the query:
other_object.items.filter(id__in=self.items.all()).delete()
this will transform it in a subquery. But not all databases, especially MySQL ones, are good with such subqueries. Furthermore Django handles .delete() manually. It will thus make a query to fetch the primary keys of the items, and then trigger the delete logic (and also remove items that have a CASCADE dependency). So .delete() is not done as one query, but at least two queries, and often a larger amount due to ForeignKeys with an on_delete trigger.
Note however that you here remove Item objects, not "unlink" this from the other_object. For this .remove(…) [Django-doc] can be used.

I should've tried the code sample I posted, you can in fact do this. It's given as an example in the documentation, but it says "be cautious about using nested queries and understand your database server’s performance characteristics" and recommends against doing this, casting the subquery into a list:
values = Blog.objects.filter(
name__contains='Cheddar').values_list('pk', flat=True)
entries = Entry.objects.filter(blog__in=list(values))

"connection.queries" returns nothing in Django

from django.db import connection, reset_queries
Prints: []
reset_queries()
p = XModel.objects.filter(id=id) \
.values('name') \
.annotate(quantity=Count('p_id'))\
.order_by('-quantity') \
.distinct()[:int(count)]
print(connection.queries)
While this prints:
reset_queries()
tc = ZModel.objects\
.filter(id=id, stock__gt=0) \
.aggregate(Sum('price'))
print(connection.queries)
I have changed fields names to keep things simple. (Fields are of parent tables i.e. __ to multiple level)
I was trying to print MySQL queries that Django makes and came across connection.queries, I was wondering why doesn't it prints empty with first, while with second it works fine. Although I am getting the result I expect it to. Probably the query is executed. Also am executing only one at a time.

As the accepted answer says you must consume the queryset first since it's lazy (e.g. list(qs)).
Another reason can be that you must be in DEBUG mode (see FAQ):
connection.queries is only available if Django DEBUG setting is True.

Because QuerySets in Django are lazy: as long as you do not consume the result, the QuerySet is not evaluated: no querying is done, until you want to obtain non-QuerySet objects like lists, dictionaries, Model objects, etc.
We can however not doe this for all ORM calls: for example Model.objects.get(..) has as type a Model object, we can not postpone that fetch (well of course we could wrap it in a function, and call it later, but then the "type" is a function, not a Model instance).
The same with a .aggregate(..) since then the result is a dictionary, that maps the keys to the corresponding result of the aggregation.
But your first query does not need to be evaluated. By writing a slicing, you only have added a LIMIT statement at the end of the query, but no need to evaluate it immediately: the type of this is still a QuerySet.
If you would however call list(qs) on a QuerySet (qs), then this means the QuerySet has to be evaluated, and Django will make the query.
The laziness of QuerySets also makes these chainings possible. Imagine that you write:
Model.objects.filter(foo=42).filter(bar=1425)
If the QuerySet of Model.objects.filter(foo=42) would be evaluated immediately, then this could result in a huge amount of Model instances, but by postponing this, we now filter on bar=1425 as well (we constructed a new QuerySet that takes both .filter(..)s into account). This can result in a query that can be evaluated more efficiently, and for example, can result in less data that has to be transferred from the database to the Django server.

The documentation says QuerySets are lazy as shown below:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated. Take a look at this example:
>>> q = Entry.objects.filter(headline__startswith="What")
>>> q = q.filter(pub_date__lte=datetime.date.today())
>>> q = q.exclude(body_text__icontains="food")
>>> print(q)
Though this looks like three database hits, in fact it hits the
database only once, at the last line (print(q)). In general, the
results of a QuerySet aren’t fetched from the database until you “ask”
for them. When you do, the QuerySet is evaluated by accessing the
database. For more details on exactly when evaluation takes place, see
When QuerySets are evaluated.

django - how to do this with kwargs

I am wondering when I touch db when doing queries. more precisely, when is the query performed:
i have this kwargs dic:
kwargs = {'name__startswith':'somename','color__iexact':'somecolor'}
but only for name__startswith query, i need to distinct(). and not for color__iexact.
I thought, i would set for name__startswith the distinct() in loop like this:
for q in kwargs:
if q == 'name__startswith':
Thing.objects.filter(name__startswith=somename).distinct('id')
and then query for all dynamically:
allthings = Thing.objects.filter(**kwargs)
but this is somehow wrong, i seem to be doing two different things here..
how can i do these two queries dynamically?

django querysets are lazy, so the actual queries aren't evaluated until you use the data.
allthings = Thing.objects.filter(**kwargs)
if 'name__startswith' in kwargs:
allthings = allthings.distinct('id')
No queries should be preformed above unitl you actually use the data. This is great for filtering queries as you wish to do
From the docs:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve any database activity. You can stack filters together all day long, and Django won’t actually run the query until the QuerySet is evaluated. Take a look at this example:
>>> q = Entry.objects.filter(headline__startswith="What")
>>> q = q.filter(pub_date__lte=datetime.date.today())
>>> q = q.exclude(body_text__icontains="food")
>>> print(q)
Though this looks like three database hits, in fact it hits the database only once, at the last line (print(q)). In general, the results of a QuerySet aren’t fetched from the database until you “ask” for them. When you do, the QuerySet is evaluated by accessing the database. For more details on exactly when evaluation takes place, see When QuerySets are evaluated.

You can use models.Q to create dynamic queries in django.
query = models.Q(name__startswith=somename)
query &= models.Q('color__iexact':'somecolor')
all_things = Thing.objects.filter(query).distinct('name')
Also read
Constructing Django filter queries dynamically with args and kwargs

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Curious about how django achieves the ORM query optimization - python

I am no python expert and I am curious about how django optimize the following query Model.objects.filter(field = 'abc')[0] Somehow django will intelligently add 'limit 1' to SQL query like 'select * from model where field = 'abc' limit 1'

Related

django custom Func for specific SQL function

Why django ORM .filter() two way binding my data?

Pass a queryset as the argument to __in in django?

"connection.queries" returns nothing in Django

django - how to do this with kwargs

Categories

Resources