Column comparison in Django queries

Column comparison in Django queries - python

I have a following model:
class Car(models.Model):
make = models.CharField(max_length=40)
mileage_limit = models.IntegerField()
mileage = models.IntegerField()
I want to select all cars where mileage is less than mileage_limit, so in SQL it would be something like:
select * from car where mileage < mileage_limit;
Using Q object in Django, I know I can compare columns with any value/object, e.g. if I wanted to get cars that have mileage say less than 100,000 it would be something like:
cars = Car.objects.filter(Q(mileage__lt=100000))
Instead of a fixed value I would like to use the column name (in my case it is mileage_limit). So I would like to be able to do something like:
cars = Car.objects.filter(Q(mileage__lt=mileage_limit))
However this results in an error, since it is expecting a value/object, not a column name. Is there a way to compare two columns using Q object? I feel like it would be a very commonly used feature and there should be an easy way to do this, however couldn't find anything about it in the documentation.
Note: this is a simplified example, for which the use of Q object might seem to be unnecessary. However the real model has many more columns, and the real query is more complex, that's why I am using Q. Here in this question I just wanted to figure out specifically how to compare columns using Q.
EDIT
Apparently after release of Django 1.1 it would be possible to do the following:
cars = Car.objects.filter(mileage__lt=F('mileage_limit'))
Still not sure if F is supposed to work together with Q like this:
cars = Car.objects.filter(Q(mileage__lt=F('mileage_limit')))

You can't do this right now without custom SQL. The django devs are working on an F() function that would make it possible: #7210 - F() syntax, design feedback required.

Since I had to look this up based on the accepted answer, I wanted to quickly mention that the F() expression has indeed been released and is available for being used in queries.
This is what the Django documentation on F() says about it:
An F() object represents the value of a model field, transformed value of a model field, or annotated column. It makes it possible to refer to model field values and perform database operations using them without actually having to pull them out of the database into Python memory.
Instead, Django uses the F() object to generate an SQL expression that describes the required operation at the database level.
The reference for making queries using F() also gives useful examples.

Related

Django: How to annotate M2M or OneToMany fields using a SubQuery?

I have Order objects and OrderOperation objects that represent an action on a Order (creation, modification, cancellation).
Conceptually, an order has 1 to many order operations. Each time there is an operation on the order, the total is computed in this operation. Which means when I need to find an attribute of an order, I just get the last order operation attribute instead, using a Subquery.
The simplified code
class OrderOperation(models.Model):
order = models.ForeignKey(Order)
total = DecimalField(max_digits=9, decimal_places=2)
class Order(models.Model)
# ...
class OrderQuerySet(query.Queryset):
#staticmethod
def _last_oo(field):
return Subquery(OrderOperation.objects
.filter(order_id=OuterRef("pk"))
.order_by('-id')
.values(field)
[:1])
def annotated_total(self):
return self.annotate(oo_total=self._last_oo('total'))
This way, I can run my_order_total = Order.objects.annotated_total()[0].oo_total. It works great.
The issue
Computing total is easy as it's a simple value. However, when there is a M2M or OneToMany field, this method does not work. For example, using the example above, let's add this field:
class OrderOperation(models.Model):
order = models.ForeignKey(Order)
total = DecimalField(max_digits=9, decimal_places=2)
ordered_articles = models.ManyToManyField(Article,through='orders.OrderedArticle')
Writing something like the following does NOT work as it returns only 1 foreign key (not a list of all the FKs):
def annotated_ordered_articles(self):
return self.annotate(oo_ordered_articles=self._last_oo('ordered_articles'))
The purpose
The whole purpose is to allow a user to search among all orders, providing a list or articles in input. For example: "Please find all orders containing at least article 42 or article 43", or "Please find all orders containing exactly article 42 and 43", etc.
If I could get something like:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
<ArticleQuerySet [<Article: Article42>, <Article: Article43>]>
or even:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
[42,43]
That would solve my issue.
My current idea
Maybe something like ArrayAgg (I'm using pgSQL) could do the trick, but I'm not sure to understand how to use it in my case.
Maybe this has to do with values() method that seems to not be intended to handle M2M and 1TM relations as stated in the doc:
values() and values_list() are both intended as optimizations for a
specific use case: retrieving a subset of data without the overhead of
creating a model instance. This metaphor falls apart when dealing with
many-to-many and other multivalued relations (such as the one-to-many
relation of a reverse foreign key) because the “one row, one object”
assumption doesn’t hold.

ArrayAgg will be great if you want to fetch only one variable (ie. name) from all articles. If you need more, there is a better option for that:
prefetch_related
Instead, you can prefetch for each Order, latest OrderOperation as a whole object. This adds the ability to easily get any field from OrderOperation without extra magic.
The only caveat with that is that you will always get a list with one operation or an empty list when there are no operations for selected order.
To do that, you should use prefetch_related queryset model together with Prefetch object and custom query for OrderOperation. Example:
from django.db.models import Max, F, Prefetch
last_order_operation_qs = OrderOperation.objects.annotate(
lop_pk=Max('order__orderoperation__pk')
).filter(pk=F('lop_pk'))
orders = Order.objects.prefetch_related(
Prefetch('orderoperation_set', queryset=last_order_operation_qs, to_attr='last_operation')
)
Then you can just use order.last_operation[0].ordered_articles to get all ordered articles for particular order. You can add prefetch_related('ordered_articles') to first queryset to have improved performance and less queries on database.

To my surprise, your idea with ArrayAgg is right on the money. I didn't know there was a way to annotate with an array (and I believe there still isn't for backends other than Postgres).
from django.contrib.postgres.aggregates.general import ArrayAgg
qs = Order.objects.annotate(oo_articles=ArrayAgg(
'order_operation__ordered_articles__id',
'DISTINCT'))
You can then filter the resulting queryset using the ArrayField lookups:
# Articles that contain the specified array
qs.filter(oo_articles__contains=[42,43])
# Articles that are identical to the specified array
qs.filter(oo_articles=[42,43,44])
# Articles that are contained in the specified array
qs.filter(oo_articles__contained_by=[41,42,43,44,45])
# Articles that have at least one element in common
# with the specified array
qs.filter(oo_articles__overlap=[41,42])
'DISTINCT' is needed only if the operation may contain duplicate articles.
You may need to tweak the exact name of the field passed to the ArrayAgg function. For subsequent filtering to work, you may also need to cast id fields in the ArrayAgg to int as otherwise Django casts the id array to ::serial[], and my Postgres complained about type "serial[]" does not exist:
from django.db.models import IntegerField
from django.contrib.postgres.fields.array import ArrayField
from django.db.models.functions import Cast
ArrayAgg(Cast('order_operation__ordered_articles__id', IntegerField()))
# OR
Cast(ArrayAgg('order_operation__ordered_articles__id'), ArrayField(IntegerField()))
Looking at your posted code more closely, you'll also have to filter on the one OrderOperation you are interested in; the query above looks at all operations for the relevant order.

Wrong result when annotating with foreign key set filter

Model C has a foreign key link to model B. Model B has a foreign key link to model A. This is, model A instance may have many model B instances, and Model B may have many model C instances.
Pseudo-code:
class A:
...
class B:
a = models.ForeignKey(A, ...)
class C:
b = models.ForeignKey(B, ...)
I would like to retrieve the entire list of elements of model A from the database, annotating it with the count of B elements it has, where their respective count of C elements is not none (or any random number, for that matter).
I tried this:
A.objects.annotate(
at_least_one_count=Count('b', filter=Q(b__c__isnull=False))
)
To the best of my knowledge, this should be working. However, numbers returned are not correct (in a practical case). It returns a higher number than without the filter, which is obviously not possible:
A.objects.annotate(
at_least_one_count=Count('b')
)
Edit: when I run the following query individually for every A instance, I get the same numbers, which makes me think there might be something wrong in my code:
A.objects.first().b_set.filter(c__isnull=False).__len__()
Note: I would like to perform this query without SQL. If I have to utilise some more advance Pythonic tools that Django provides, I am happy to do it, as long as I stay Object-Oriented. I am trying to move away from using raw SQL for all database operations, and re-write them all with Django ORM. However, it seems to be overly complicated.

The answer is simple: after applying queries that translate to join statements, one must execute a distinct on filter, which in Django is done by calling .distinct(...) on the query set.
In this case, if you are using a filter, and the distinct restriction is on the filter object, then you want to use:
...=Count('b', filter=Q(b__c__isnull=False), distinct=True)

Reorder Django QuerySet by dynamically added field

A have piece of code, which fetches some QuerySet from DB and then appends new calculated field to every object in the Query Set. It's not an option to add this field via annotation (because it's legacy and because this calculation based on another already pre-fetched data).
Like this:
from django.db import models
class Human(models.Model):
name = models.CharField()
surname = models.CharField()
def calculate_new_field(s):
return len(s.name)*42
people = Human.objects.filter(id__in=[1,2,3,4,5])
for s in people:
s.new_column = calculate_new_field(s)
# people.somehow_reorder(new_order_by=new_column)
So now all people in QuerySet have a new column. And I want order these objects by new_column field. order_by() will not work obviously, since it is a database option. I understand thatI can pass them as a sorted list, but there is a lot of templates and other logic, which expect from this object QuerySet-like inteface with it's methods and so on.
So question is: is there some not very bad and dirty way to reorder existing QuerySet by dinamically added field or create new QuerySet-like object with this data? I believe I'm not the only one who faced this problem and it's already solved with django. But I can't find anything (except for adding third-party libs, and this is not an option too).

Conceptually, the QuerySet is not a list of results, but the "instructions to get those results". It's lazily evaluated and also cached. The internal attribute of the QuerySet that keeps the cached results is qs._result_cache
So, the for s in people sentence is forcing the evaluation of the query and caching the results.
You could, after that, sort the results by doing:
people._result_cache.sort(key=attrgetter('new_column'))
But, after evaluating a QuerySet, it makes little sense (in my opinion) to keep the QuerySet interface, as many of the operations will cause a reevaluation of the query. From this point on you should be dealing with a list of Models

Can you try it functions.Length:
from django.db.models.functions import Length
qs = Human.objects.filter(id__in=[1,2,3,4,5])
qs.annotate(reorder=Length('name') * 42).order_by('reorder')

Building Django Q() objects from other Q() objects, but with relation crossing context

I commonly find myself writing the same criteria in my Django application(s) more than once. I'll usually encapsulate it in a function that returns a Django Q() object, so that I can maintain the criteria in just one place.
I will do something like this in my code:
def CurrentAgentAgreementCriteria(useraccountid):
'''Returns Q that finds agent agreements that gives the useraccountid account current delegated permissions.'''
AgentAccountMatch = Q(agent__account__id=useraccountid)
StartBeforeNow = Q(start__lte=timezone.now())
EndAfterNow = Q(end__gte=timezone.now())
NoEnd = Q(end=None)
# Now put the criteria together
AgentAgreementCriteria = AgentAccountMatch & StartBeforeNow & (NoEnd | EndAfterNow)
return AgentAgreementCriteria
This makes it so that I don't have to think through the DB model more than once, and I can combine the return values from these functions to build more complex criterion. That works well so far, and has saved me time already when the DB model changes.
Something I have realized as I start to combine the criterion from these functions that is that a Q() object is inherently tied to the type of object .filter() is being called on. That is what I would expect.
I occasionally find myself wanting to use a Q() object from one of my functions to construct another Q object that is designed to filter a different, but related, model's instances.
Let's use a simple/contrived example to show what I mean. (It's simple enough that normally this would not be worth the overhead, but remember that I'm using a simple example here to illustrate what is more complicated in my app.)
Say I have a function that returns a Q() object that finds all Django users, whose username starts with an 'a':
def UsernameStartsWithAaccount():
return Q(username__startswith='a')
Say that I have a related model that is a user profile with settings including whether they want emails from us:
class UserProfile(models.Model):
account = models.OneToOneField(User, unique=True, related_name='azendalesappprofile')
emailMe = models.BooleanField(default=False)
Say I want to find all UserProfiles which have a username starting with 'a' AND want use to send them some email newsletter. I can easily write a Q() object for the latter:
wantsEmails = Q(emailMe=True)
but find myself wanting to something to do something like this for the former:
startsWithA = Q(account=UsernameStartsWithAaccount())
# And then
UserProfile.objects.filter(startsWithA & wantsEmails)
Unfortunately, that doesn't work (it generates invalid PSQL syntax when I tried it).
To put it another way, I'm looking for a syntax along the lines of Q(account=Q(id=9)) that would return the same results as Q(account__id=9).
So, a few questions arise from this:
Is there a syntax with Django Q() objects that allows you to add "context" to them to allow them to cross relational boundaries from the model you are running .filter() on?
If not, is this logically possible? (Since I can write Q(account__id=9) when I want to do something like Q(account=Q(id=9)) it seems like it would).

Maybe someone suggests something better, but I ended up passing the context manually to such functions. I don't think there is an easy solution, as you might need to call a whole chain of related tables to get to your field, like table1__table2__table3__profile__user__username, how would you guess that? User table could be linked to table2 too, but you don't need it in this case, so I think you can't avoid setting the path manually.
Also you can pass a dictionary to Q() and a list or a dictionary to filter() functions which is much easier to work with than using keyword parameters and applying &.
def UsernameStartsWithAaccount(context=''):
field = 'username__startswith'
if context:
field = context + '__' + field
return Q(**{field: 'a'})
Then if you simply need to AND your conditions you can combine them into a list and pass to filter:
UserProfile.objects.filter(*[startsWithA, wantsEmails])

What is the difference between a mongoengine.DynamicEmbeddedDocument vs mongoengine.DictField?

A mongoengine.DynamicEmbeddedDocument can be used to leverage MongoDB's flexible schema-less design. It's expandable and doesn't apply type constraints to the fields, afaik.
A mongoengine.DictField similarly allows for use of MongoDB's schema-less nature. In the documentation they simply say (w.r.t. the DictField)
This is similar to an embedded document, but the structure is not defined.
Does that mean, then, the mongoengine.fields.DictField and the mongoengine.DynamicEmbeddedDocument are completely interchangeable?
EDIT (for more information):
mongoengine.DynamicEmbeddedDocument inherits from mongoengine.EmbeddedDocument which, from the code is:
A mongoengine.Document that isn't stored in its own collection. mongoengine.EmbeddedDocuments should be used as fields on mongoengine.Documents through the mongoengine.EmbeddedDocumentField field type.
A mongoengine.fields.EmbeddedDocumentField is
An embedded document field - with a declared document_type. Only valid values are subclasses of EmbeddedDocument.
Does this mean the only thing that makes the DictField and DynamicEmbeddedDocument not totally interchangeable is that the DynamicEmbeddedDocument has to be defined through the EmbeddedDocumentField field type?

From what I’ve seen, the two are similar, but not entirely interchangeable. Each approach may have a slight advantage based on your needs. First of all, as you point out, the two approaches require differing definitions in the document, as shown below.
class ExampleDynamicEmbeddedDoc(DynamicEmbeddedDocument):
pass
class ExampleDoc(Document):
dict_approach = DictField()
dynamic_doc_approach = EmbeddedDocumentField(ExampleDynamicEmbeddedDoc, default = ExampleDynamicEmbeddedDoc())
Note: The default is not required, but the dynamic_doc_approach field will need to be set to a ExampleDynamicEmbeddedDoc object in order to save. (i.e. trying to save after setting example_doc_instance.dynamic_doc_approach = {} would throw an exception). Also, you could use the GenericEmbeddedDocumentField if you don’t want to tie the field to a specific type of EmbeddedDocument, but the field would still need to be point to an object subclassed from EmbeddedDocument in order to save.
Once set up, the two are functionally similar in that you can save data to them as needed and without restrictions:
e = ExampleDoc()
e.dict_approach["test"] = 10
e.dynamic_doc_approach.test = 10
However, the one main difference that I’ve seen is that you can query against any values added to a DictField, whereas you cannot with a DynamicEmbeddedDoc.
ExampleDoc.objects(dict_approach__test = 10) # Returns a QuerySet containing our entry.
ExampleDoc.objects(dynamic_doc_approach__test = 10) # Throws an exception.
That being said, using an EmbeddedDocument has the advantage of validating fields which you know will be present in the document. (We simply would need to add them to the ExampleDynamicEmbeddedDoc definition). Because of this, I think it is best to use a DynamicEmbeddedDocument when you have a good idea of a schema for the field and only anticipate adding fields minimally (which you will not need to query against). However, if you are not concerned about validation or anticipate adding a lot of fields which you’ll query against, go with a DictField.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Column comparison in Django queries - python

You can't do this right now without custom SQL. The django devs are working on an F() function that would make it possible: #7210 - F() syntax, design feedback required.

Related

Django: How to annotate M2M or OneToMany fields using a SubQuery?

Wrong result when annotating with foreign key set filter

Reorder Django QuerySet by dynamically added field

Building Django Q() objects from other Q() objects, but with relation crossing context

What is the difference between a mongoengine.DynamicEmbeddedDocument vs mongoengine.DictField?

Categories

Resources