If I have some models like:
class Tag(models.Model):
    name = models.CharField()
class Thing(models.Model):
    title = models.CharField()
    tags = models.ManyToManyField(Tag)
I can do a filter:
Thing.objects.filter(tags__name='foo')
Thing.objects.filter(tags__name__in=['foo', 'bar'])
But is it possible to order a queryset on the tags value?
Thing.objects.order_by(tags__name='foo')
Thing.objects.order_by(tags__name__in=['foo','bar'])
What I would expect (or like) back in this example, would be ALL Thing models, but ordered where they have a Tag/Tags that I know. I don't want to filter them out, but bring them to the top.
I gather this is possible using the FIELD operator, but seemingly I can only make it work on columns in that models table, e.g. title, but not on linked tables.
Thanks!
EDIT: After having accepted the below solution, I realised a bug/limitation with it.
If a particular Thing has multiple Tags, then (due to the left join done behind the scenes in the SQL) it will produce one entry for that Thing, for each Tag that it has. With a True or False for each Tag that matches or not.
Adding .distinct() to the queryset helps only slightly, limiting to a max of 2 rows per Thing (i.e. one tagged=True, and one tagged=False).
I know what I need to do in the SQL, which is to MAX() the CASE(), and then GROUP BY Thing's primary key, which means I will get one row per Thing, and if there has been any tag matches, tagged will be True (and False otherwise).
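To make the duplication and the MAX()/GROUP BY fix concrete, here is a minimal sketch using Python's built-in sqlite3 rather than the ORM. The table names (thing, tag, thing_tags) and the sample rows are made up for illustration; they only mimic the shape of the join Django produces for a many-to-many relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE thing_tags (thing_id INTEGER, tag_id INTEGER);
    INSERT INTO thing VALUES (1, 'untagged'), (2, 'rock and music'), (3, 'rock only');
    INSERT INTO tag VALUES (1, 'music'), (2, 'rock');
    INSERT INTO thing_tags VALUES (2, 1), (2, 2), (3, 2);
""")

# Hypothetical stand-in for the joins behind a many-to-many lookup.
JOINS = """
    FROM thing
    LEFT JOIN thing_tags ON thing_tags.thing_id = thing.id
    LEFT JOIN tag ON tag.id = thing_tags.tag_id
"""
CASE = "CASE WHEN tag.name IN ('music') THEN 1 ELSE 0 END"

# Without grouping: one row per (thing, tag) pair, so thing 2 appears twice.
ungrouped = conn.execute(
    f"SELECT thing.id, {CASE} AS tagged {JOINS} ORDER BY thing.id, tagged"
).fetchall()

# With MAX() and GROUP BY: one row per thing, tagged = 1 if any tag matched.
grouped = conn.execute(
    f"SELECT thing.id, MAX({CASE}) AS tagged {JOINS} "
    "GROUP BY thing.id ORDER BY tagged DESC, thing.id"
).fetchall()

print(ungrouped)
print(grouped)
```

The grouped query keeps every thing and moves the one with a known tag to the front, which is the behaviour described above.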
I see the way that people typically achieve this kind of thing is to use .values() like this:
Thing.objects.values('pk').annotate(tagged=Max(Case(...)))
But the result is only pk and tagged, I need the whole Thing model as the result. So I've managed to achieve what I want, thusly:
from django.db.models import Case, When, Max, BooleanField
tags = ['music']  # for example
queryset = Thing.objects.all().annotate(tagged=Max(Case(
    When(tags__name__in=tags, then=True),
    default=False,
    output_field=BooleanField()
)))
queryset.query.group_by = ['pk']
# order_by() returns a new queryset, so keep the result
queryset = queryset.order_by('-tagged')
This seems to work, but the group by mechanism feels weird/hacky. Is it acceptable/reliable to group in this way?
Sorry for the epic update :(
I'd try annotating the query with a conditional value that is true when the tag is in the list you provide:
from django.db.models import Case, When, IntegerField
Thing.objects.annotate(tag_is_known=Case(
    When(tags__name__in=['foo', 'bar'], then=1),
    default=0,
    output_field=IntegerField()
))
Next we use the annotation we called tag_is_known to sort with order_by(). The leading minus sorts in descending order, so Things with a known tag come first:
Thing.objects.annotate(tag_is_known=...).order_by('-tag_is_known')
Boolean version:
from django.db.models import BooleanField, Case, When
Thing.objects.annotate(tag_is_known=Case(
    When(tags__name__in=['foo', 'bar'], then=True),
    default=False,
    output_field=BooleanField()
))
Related
Context
I am quite new to Django and I am trying to write a complex query that I think would be easily writable in raw SQL, but for which I am struggling using the ORM.
Models
I have several models named SignalValue, SignalCategory, SignalSubcategory, SignalType, SignalSubtype that have the same structure like the following model:
class MyModel(models.Model):
    id = models.BigAutoField(primary_key=True)
    name = models.CharField()
    fullname = models.CharField()
I also have explicit models that represent the relationships between SignalValue and the other models (SignalCategory, SignalSubcategory, SignalType, SignalSubtype). These relationship models are named SignalValueCategory, SignalValueSubcategory, SignalValueType and SignalValueSubtype respectively. Below is the SignalValueCategory model as an example:
class SignalValueCategory(models.Model):
    signal_value = models.OneToOneField(SignalValue)
    signal_category = models.ForeignKey(SignalCategory)
Finally, I also have the two following models. ResultSignal stores all the signals related to the model Result:
class Result(models.Model):
    pass
class ResultSignal(models.Model):
    id = models.BigAutoField(primary_key=True)
    result = models.ForeignKey(
        Result
    )
    signal_value = models.ForeignKey(
        SignalValue
    )
Query
What I am trying to achieve is the following.
For a given Result, I want to retrieve all the ResultSignals that belong to it, filter them to keep the ones I am interested in, and annotate them with two fields that we will call filter_group_id and filter_group_name. The values of these two fields are determined by the SignalValue of the given ResultSignal.
From my perspective, the easiest way to achieve this would be first to annotate the SignalValues with their corresponding filter_group_name and filter_group_id, and then to join the resulting QuerySet with the ResultSignals. However, I think that it is not possible to join two QuerySets together in Django. Consequently, I thought that we could maybe use Prefetch objects to achieve what I am trying to do, but it seems that I am unable to make it work properly.
Code
I will now describe the current state of my queries.
First, annotating the SignalValues with their corresponding filter_group_name and filter_group_id. Note that filter_aggregator in the following code is just a complex filter that allows me to select the wanted SignalValues only. group_filter is the same filter but as a list of subfilters. Additionally, filter_name_case is a conditional expression (Case() construct):
# Assign a filter_group_id and filter_group_name to each signal
signal_filters = SignalValue.objects.filter(
    filter_aggregator
).annotate(
    filter_group_id=Window(
        expression=DenseRank(),
        order_by=group_filters
    ),
    filter_group_name=filter_name_case
)
Then, trying to join/annotate the ResultSignals:
prefetch_object = Prefetch(
    lookup="signal_value",
    queryset=signal_filters,
    to_attr="test"
)
result_signals: QuerySet = (
    last_interview_result
    .resultsignal_set
    .filter(signal_value__in=signal_values_of_interest)
    .select_related(
        'signal_value__signalvaluecategory__signal_category',
        'signal_value__signalvaluesubcategory__signal_subcategory',
        'signal_value__signalvaluetype__signal_type',
        'signal_value__signalvaluesubtype__signal_subtype',
    )
    .prefetch_related(
        prefetch_object
    )
    .values(
        "signal_value",
        "test",
        category=F('signal_value__signalvaluecategory__signal_category__name'),
        subcategory=F('signal_value__signalvaluesubcategory__signal_subcategory__name'),
        type=F('signal_value__signalvaluetype__signal_type__name'),
        subtype=F('signal_value__signalvaluesubtype__signal_subtype__name'),
    )
)
From my understanding, the resulting QuerySet should now have a field "test" available, containing the fields of signal_filters, the first QuerySet. However, Django complains that "test" is not found when calling .values(...) in the last part of my code: Cannot resolve keyword 'test' into field. Choices are: [...]. It is as if the to_attr parameter of the Prefetch object was not taken into account at all.
Questions
Did I misunderstand how the annotate() and prefetch_related() functions work? If not, what am I doing wrong in my code that makes the name given in to_attr not exist in my resulting QuerySet?
Is there a better way to join two QuerySets in Django or am I better off using RawSQL? An alternative way would be to switch to Pandas to make the join in-memory, but it is very often more efficient to do such transformations on the SQL side with well-designed queries.
You're on the right path, but just missing what prefetch does.
Your annotations are correct, but the "test" prefetch isn't a database field, so values() cannot resolve it. prefetch_related batches up the SELECT * FROM signal_value queries so you don't have to execute a select per row, and attaches the results to the model instances in Python. Just drop "test" from the values() call and you should be fine. https://docs.djangoproject.com/en/3.2/ref/models/querysets/#prefetch-related
Please don't use pandas; it's definitely not necessary and adds a ton of overhead. As you say yourself, it's more efficient to do the transforms on the SQL side.
From the docs on prefetch_related:
Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.
It's not obvious but the values() call is part of these chained methods that imply a different query, and will actually cancel prefetch_related. This should work if you remove it.
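To illustrate why a prefetched attribute can never be a column in values(), here is a rough sketch of what a prefetch amounts to, written with the standard sqlite3 module instead of the ORM. The table layout and sample rows are assumptions for the example, and this is not Django's actual implementation: it only shows the two-queries-then-attach-in-Python idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row
conn.executescript("""
    CREATE TABLE signal_value (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE result_signal (id INTEGER PRIMARY KEY, signal_value_id INTEGER);
    INSERT INTO signal_value VALUES (1, 'a'), (2, 'b');
    INSERT INTO result_signal VALUES (10, 1), (11, 2), (12, 1);
""")

# Query 1: the main rows.
signals = [
    dict(row)
    for row in conn.execute(
        "SELECT id, signal_value_id FROM result_signal ORDER BY id"
    )
]

# Query 2: one batched lookup for all related rows (instead of one per row).
ids = sorted({s["signal_value_id"] for s in signals})
placeholders = ", ".join("?" for _ in ids)
values_by_id = {
    row["id"]: dict(row)
    for row in conn.execute(
        f"SELECT id, name FROM signal_value WHERE id IN ({placeholders})", ids
    )
}

# The "join" happens in Python: "test" is attached after both queries ran,
# so it never exists as a column the database could select.
for s in signals:
    s["test"] = values_by_id[s["signal_value_id"]]

print([s["test"]["name"] for s in signals])
```

Because the attribute only appears after the second query, anything that has to be expressed in the first query's SELECT (which is what values() does) cannot see it.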
I have Order objects and OrderOperation objects that represent an action on an Order (creation, modification, cancellation).
Conceptually, an order has 1 to many order operations. Each time there is an operation on the order, the total is computed in this operation. Which means when I need to find an attribute of an order, I just get the last order operation attribute instead, using a Subquery.
The simplified code
class OrderOperation(models.Model):
    order = models.ForeignKey(Order)
    total = models.DecimalField(max_digits=9, decimal_places=2)
class Order(models.Model):
    # ...
class OrderQuerySet(query.QuerySet):
    @staticmethod
    def _last_oo(field):
        # Value of `field` on the most recent operation of the outer order
        return Subquery(OrderOperation.objects
                        .filter(order_id=OuterRef("pk"))
                        .order_by('-id')
                        .values(field)
                        [:1])
    def annotated_total(self):
        return self.annotate(oo_total=self._last_oo('total'))
This way, I can run my_order_total = Order.objects.annotated_total()[0].oo_total. It works great.
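For reference, the Subquery/OuterRef pattern above boils down to a correlated scalar subquery. The sketch below reproduces that shape with the standard sqlite3 module; the table names and rows are invented for the example, and totals are stored as text only to avoid floating-point noise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY);
    CREATE TABLE order_operation (
        id INTEGER PRIMARY KEY, order_id INTEGER, total TEXT
    );
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO order_operation VALUES
        (1, 1, '10.00'), (2, 1, '12.50'), (3, 2, '7.00');
""")

# For each order, pick `total` from its operation with the highest id.
rows = conn.execute("""
    SELECT o.id,
           (SELECT oo.total
              FROM order_operation oo
             WHERE oo.order_id = o.id      -- plays the role of OuterRef("pk")
             ORDER BY oo.id DESC
             LIMIT 1) AS oo_total          -- plays the role of [:1]
      FROM orders o
     ORDER BY o.id
""").fetchall()

print(rows)
```

A scalar subquery like this can only return one value per outer row, which is why the same trick cannot return a whole list of related articles.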
The issue
Computing total is easy as it's a simple value. However, when there is a M2M or OneToMany field, this method does not work. For example, using the example above, let's add this field:
class OrderOperation(models.Model):
    order = models.ForeignKey(Order)
    total = models.DecimalField(max_digits=9, decimal_places=2)
    ordered_articles = models.ManyToManyField(Article, through='orders.OrderedArticle')
Writing something like the following does NOT work as it returns only 1 foreign key (not a list of all the FKs):
def annotated_ordered_articles(self):
    return self.annotate(oo_ordered_articles=self._last_oo('ordered_articles'))
The purpose
The whole purpose is to allow a user to search among all orders, providing a list of articles as input. For example: "Please find all orders containing at least article 42 or article 43", or "Please find all orders containing exactly article 42 and 43", etc.
If I could get something like:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
<ArticleQuerySet [<Article: Article42>, <Article: Article43>]>
or even:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
[42,43]
That would solve my issue.
My current idea
Maybe something like ArrayAgg (I'm using PostgreSQL) could do the trick, but I'm not sure I understand how to use it in my case.
Maybe this has to do with values() method that seems to not be intended to handle M2M and 1TM relations as stated in the doc:
values() and values_list() are both intended as optimizations for a
specific use case: retrieving a subset of data without the overhead of
creating a model instance. This metaphor falls apart when dealing with
many-to-many and other multivalued relations (such as the one-to-many
relation of a reverse foreign key) because the “one row, one object”
assumption doesn’t hold.
ArrayAgg will be great if you want to fetch only one variable (ie. name) from all articles. If you need more, there is a better option for that:
prefetch_related
Instead, you can prefetch the latest OrderOperation for each Order as a whole object. This makes it easy to get any field from OrderOperation without extra magic.
The only caveat is that you will always get a list: one with a single operation, or an empty list when there are no operations for the selected order.
To do that, use the prefetch_related queryset method together with a Prefetch object and a custom query for OrderOperation. Example:
from django.db.models import Max, F, Prefetch
last_order_operation_qs = OrderOperation.objects.annotate(
    lop_pk=Max('order__orderoperation__pk')
).filter(pk=F('lop_pk'))
orders = Order.objects.prefetch_related(
    Prefetch('orderoperation_set', queryset=last_order_operation_qs, to_attr='last_operation')
)
Then you can just use order.last_operation[0].ordered_articles to get all ordered articles for a particular order. You can add prefetch_related('ordered_articles') to the first queryset to improve performance and issue fewer queries to the database.
To my surprise, your idea with ArrayAgg is right on the money. I didn't know there was a way to annotate with an array (and I believe there still isn't for backends other than Postgres).
from django.contrib.postgres.aggregates import ArrayAgg
qs = Order.objects.annotate(oo_articles=ArrayAgg(
    'order_operation__ordered_articles__id',
    distinct=True))
You can then filter the resulting queryset using the ArrayField lookups:
# Orders whose articles contain all of the specified ids
qs.filter(oo_articles__contains=[42,43])
# Orders whose articles are identical to the specified array
qs.filter(oo_articles=[42,43,44])
# Orders whose articles are all contained in the specified array
qs.filter(oo_articles__contained_by=[41,42,43,44,45])
# Orders that have at least one article in common
# with the specified array
qs.filter(oo_articles__overlap=[41,42])
DISTINCT is needed only if the operation may contain duplicate articles.
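As a rough mental model of the four array lookups used above, the following plain-Python sketch mirrors them with set operations. The helper names are made up for this example and are not part of Django; sets ignore duplicates and ordering, which approximates the containment and overlap lookups, whereas the exact-match lookup compares the arrays element by element.

```python
def contains(stored, wanted):
    """Approximates __contains: every wanted id is among the stored ids."""
    return set(wanted) <= set(stored)


def contained_by(stored, allowed):
    """Approximates __contained_by: every stored id is among the allowed ids."""
    return set(stored) <= set(allowed)


def overlap(stored, candidates):
    """Approximates __overlap: at least one id in common."""
    return bool(set(stored) & set(candidates))


def exact(stored, wanted):
    """Approximates the exact lookup: same elements in the same order."""
    return list(stored) == list(wanted)


order_articles = [42, 43, 44]  # hypothetical aggregated ids for one order

print(contains(order_articles, [42, 43]))
print(contained_by(order_articles, [41, 42, 43, 44, 45]))
print(overlap(order_articles, [41, 42]))
print(exact(order_articles, [42, 43, 44]))
```

In terms of the original question, "at least article 42 or article 43" maps to the overlap check, and "exactly article 42 and 43" maps to the exact check (with the caveat that element order matters there).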
You may need to tweak the exact name of the field passed to the ArrayAgg function. For subsequent filtering to work, you may also need to cast id fields in the ArrayAgg to int as otherwise Django casts the id array to ::serial[], and my Postgres complained about type "serial[]" does not exist:
from django.db.models import IntegerField
from django.contrib.postgres.fields.array import ArrayField
from django.db.models.functions import Cast
ArrayAgg(Cast('order_operation__ordered_articles__id', IntegerField()))
# OR
Cast(ArrayAgg('order_operation__ordered_articles__id'), ArrayField(IntegerField()))
Looking at your posted code more closely, you'll also have to filter on the one OrderOperation you are interested in; the query above looks at all operations for the relevant order.
I have a couple of models:
class Order(models.Model):
    user = models.ForeignKey(User)
class Lot(models.Model):
    order = models.ForeignKey(Order)
    buyer = models.ForeignKey(User)
What I'm trying to do is annotate Lot objects with the number of purchases a given user has made from the same seller (it's not a mistake, Order.user really is the seller). Like “you’ve bought 4 items from this user recently”.
The closest I got was:
recent_sold_lots = Lot.objects.filter(
    order__user_id=OuterRef('order__user_id'),
    status=Lot.STATUS_SOLD,
    buyer_id=self.user_id,
    date_sold__gte=now() - timedelta(hours=24),
)
qs = Lot.objects.filter(
    status=Lot.STATUS_READY,
    date_ready__lte=now() - timedelta(seconds=self.lag)
).annotate(same_user_recent_buys=Count(Subquery(recent_sold_lots.values('id'))))
But it fails when recent_sold_lots count is more than one: more than one row returned by a subquery used as an expression.
.annotate(same_user_recent_buys=Subquery(recent_sold_lots.aggregate(Count('id'))) doesn't work either: This queryset contains a reference to an outer query and may only be used in a subquery.
.annotate(same_user_recent_buys=Subquery(recent_sold_lots.annotate(c=Count('id')).values('c')) is giving me Expression contains mixed types. You must set output_field.. If I add output_field=models.IntegerField() to the subquery call, it throws more than one row returned by a subquery used as an expression.
I'm stuck with this one. I feel I'm close to the solution, but what am I missing here?
The models you defined in the question do not fully match the query you are making (status, date_sold and date_ready are missing). In any case, I'll use the models as posted as the reference for my query.
from django.db.models import Count
user_id = 123 # my user id and also the buyer
buyer = User.objects.get(pk=user_id)
Lot.objects.filter(buyer=buyer).values('order__user').annotate(unique_seller_order_count=Count('id'))
What the query does is:
Filters the Lot objects down to the ones you have bought
Groups the returned lots by the user who created the order (the seller)
Counts the lots in each group
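If the count is needed on each ready lot rather than once per seller, the SQL shape is a correlated COUNT subquery, which always returns exactly one row. The sketch below shows that shape with the standard sqlite3 module; the table names, status strings and sample rows are assumptions, and the 24-hour date filter from the question is left out for brevity. In ORM terms this is usually written by grouping inside the subquery, along the lines of .values('order__user_id').annotate(c=Count('id')).values('c'), though that is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- orders.user_id is the seller
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE lot (
        id INTEGER PRIMARY KEY, order_id INTEGER, buyer_id INTEGER, status TEXT
    );
    INSERT INTO orders VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO lot VALUES
        (1, 1, 7, 'sold'), (2, 2, 7, 'sold'), (3, 3, 7, 'sold'),
        (4, 1, NULL, 'ready'), (5, 3, NULL, 'ready'), (6, 2, 8, 'sold');
""")

buyer_id = 7  # hypothetical current user

# For every ready lot, count the lots this buyer already bought
# from the same seller. COUNT(*) yields a single value per outer row.
rows = conn.execute("""
    SELECT l.id,
           (SELECT COUNT(*)
              FROM lot s
              JOIN orders so ON so.id = s.order_id
             WHERE so.user_id = o.user_id
               AND s.status = 'sold'
               AND s.buyer_id = ?) AS same_user_recent_buys
      FROM lot l
      JOIN orders o ON o.id = l.order_id
     WHERE l.status = 'ready'
     ORDER BY l.id
""", (buyer_id,)).fetchall()

print(rows)
```

The "more than one row returned by a subquery" error from the question goes away because the aggregation happens inside the subquery, not around it.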
I have activity logs for user activities, basically structured like this:
class ActivityLog(TimeStampedModel):
user = models.ForeignKey(User, on_delete=models.CASCADE)
action_type = models.CharField(max_length=25)
object_raw = models.CharField(max_length=500)
I want to aggregate all the entries where object_raw matches, i.e. if a user searched for 'foo' on 4 different occasions, I get back one entry for 'foo', with count=4. I'm having trouble doing this right now. I know how to do it in SQL, but I don't understand the Django syntax. I've been reading through the docs but I still don't get it. If anyone could help, it would be much appreciated!
To get one object with 'foo' and how many objects have object_raw='foo' you can do:
activity_logs = ActivityLog.objects.filter(object_raw='foo')
if activity_logs.exists():
    activity_logs.first()  # get one object
    activity_logs.count()  # get number of objects
If you just want to know how many objects have object_raw='foo', you can use conditional expressions with aggregates:
from django.db.models import Case, IntegerField, Sum, When
ActivityLog.objects.aggregate(
    num_object_raw=Sum(
        Case(
            When(object_raw='foo', then=1),
            output_field=IntegerField()
        )
    )
)
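To get one row per distinct object_raw with its count, as the question asks, the underlying SQL is a plain GROUP BY. Here is a small sketch with the standard sqlite3 module; the table name and rows are invented for illustration. With the ORM, the usual way to produce this kind of grouping is values('object_raw') followed by annotate(count=Count('id')), which is stated here as the common idiom rather than something run against the asker's project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity_log (
        id INTEGER PRIMARY KEY, user_id INTEGER,
        action_type TEXT, object_raw TEXT
    );
    INSERT INTO activity_log (user_id, action_type, object_raw) VALUES
        (1, 'search', 'foo'), (1, 'search', 'foo'), (1, 'search', 'bar'),
        (1, 'search', 'foo'), (1, 'search', 'foo');
""")

# One row per distinct object_raw, with the number of matching entries.
rows = conn.execute("""
    SELECT object_raw, COUNT(id) AS n
      FROM activity_log
     GROUP BY object_raw
     ORDER BY n DESC, object_raw
""").fetchall()

print(rows)
```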
Let's say I have a Product model with products in a storefront, and a ProductImages table with images of the product, which can have zero or more images. Here's a simplified example:
class Product(models.Model):
    product_name = models.CharField(max_length=255)
    # ...
class ProductImage(models.Model):
    product = models.ForeignKey(Product, related_name='images')
    image_file = models.CharField(max_length=255)
    # ...
When displaying search results for products, I want to prioritize products which have images associated with them. I can easily get the number of images:
from django.db.models import Count
Product.objects.annotate(image_count=Count('images'))
But that's not actually what I want. I'd like to annotate it with a boolean field, have_images, indicating whether the product has one or more images, so that I can sort by that:
Product.objects.annotate(have_images=(?????)).order_by('-have_images', 'product_name')
How can I do that? Thanks!
I eventually found a way to do this using Django 1.8's new conditional expressions:
from django.db.models import Case, Count, IntegerField, Value, When
q = (
    Product.objects
    .filter(...)
    .annotate(image_count=Count('images'))
    .annotate(
        have_images=Case(
            When(image_count__gt=0,
                 then=Value(1)),
            default=Value(0),
            output_field=IntegerField()))
    .order_by('-have_images')
)
And that's how I finally found incentive to upgrade to 1.8 from 1.7.
As of Django 1.11 it is possible to use Exists. The example below comes from the Exists documentation:
>>> from django.db.models import Exists, OuterRef
>>> from datetime import timedelta
>>> from django.utils import timezone
>>> one_day_ago = timezone.now() - timedelta(days=1)
>>> recent_comments = Comment.objects.filter(
... post=OuterRef('pk'),
... created_at__gte=one_day_ago,
... )
>>> Post.objects.annotate(recent_comment=Exists(recent_comments))
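Applied to the Product/ProductImage case from the question, the Exists approach corresponds to an EXISTS subquery per product. The following sketch shows that SQL shape with the standard sqlite3 module; the table names and sample rows are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE product_image (
        id INTEGER PRIMARY KEY, product_id INTEGER, image_file TEXT
    );
    INSERT INTO product VALUES (1, 'apple'), (2, 'banana'), (3, 'cherry');
    INSERT INTO product_image VALUES
        (1, 2, 'b1.png'), (2, 2, 'b2.gif'), (3, 3, 'c.png');
""")

# have_images is 1 when at least one image row exists, otherwise 0.
# No join is involved, so products are never duplicated.
rows = conn.execute("""
    SELECT p.product_name,
           EXISTS (SELECT 1
                     FROM product_image i
                    WHERE i.product_id = p.id) AS have_images
      FROM product p
     ORDER BY have_images DESC, p.product_name
""").fetchall()

print(rows)
```

Compared with counting images and wrapping the count in a Case, this avoids the join and the GROUP BY entirely.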
Use conditional expressions and set output_field to BooleanField:
Product.objects.annotate(
    image_count=Count('images')
).annotate(
    has_image=Case(
        When(image_count=0, then=Value(False)),
        default=Value(True),
        output_field=BooleanField()
    )
).order_by('-has_image')
Read the docs about extra
qs = Product.objects.extra(select={'has_images': 'CASE WHEN images IS NOT NULL THEN 1 ELSE 0 END' })
I tested it and it works.
But ordering or filtering (where) by this field doesn't work for me (Django 1.8), despite what the docs say:
If you need to order the resulting queryset using some of the new
fields or tables you have included via extra() use the order_by
parameter to extra() and pass in a sequence of strings. These strings
should either be model fields (as in the normal order_by() method on
querysets), of the form table_name.column_name or an alias for a
column that you specified in the select parameter to extra().
qs = qs.extra(order_by = ['-has_images'])
qs = qs.extra(where = ['has_images=1'])
FieldError: Cannot resolve keyword 'has_images' into field.
I found that https://code.djangoproject.com/ticket/19434 is still open.
So if you run into the same trouble, you can use raw().
If performance matters, my suggestion is to add a hasPictures boolean field (with editable=False).
Then keep its value up to date through ProductImage model signals (or by overriding the save and delete methods).
Advantages:
Index friendly.
Better performance. Avoid joins.
Database agnostic.
Coding it will raise your Django skills to the next level.
When you have to annotate existence with some filters, a Sum annotation can be used. For example, the following annotates whether there are any GIFs among the images:
Product.objects.filter(
).annotate(
    animated_images=Sum(
        Case(
            When(images__image_file__endswith='gif', then=Value(1)),
            default=Value(0),
            output_field=IntegerField()
        )
    )
)
This actually counts them, but a Pythonic check such as if product.animated_images: works the same as if it were a boolean.
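The counting-as-truthiness idea can be checked with a small sketch of the equivalent SQL, again using the standard sqlite3 module with made-up table names and rows. Products without any image still get a row from the LEFT JOIN, and the ELSE 0 branch makes their sum 0 rather than NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE product_image (
        id INTEGER PRIMARY KEY, product_id INTEGER, image_file TEXT
    );
    INSERT INTO product VALUES (1, 'apple'), (2, 'banana'), (3, 'cherry');
    INSERT INTO product_image VALUES
        (1, 2, 'b1.png'), (2, 2, 'b2.gif'), (3, 2, 'b3.gif'), (4, 3, 'c.png');
""")

# Sum a 1 for every image whose file name ends in 'gif', 0 otherwise.
rows = conn.execute("""
    SELECT p.product_name,
           SUM(CASE WHEN i.image_file LIKE '%gif' THEN 1 ELSE 0 END)
               AS animated_images
      FROM product p
      LEFT JOIN product_image i ON i.product_id = p.id
     GROUP BY p.id
     ORDER BY p.id
""").fetchall()

# The count doubles as a flag: 0 is falsy, any positive count is truthy.
with_gifs = [name for name, animated_images in rows if animated_images]

print(rows)
print(with_gifs)
```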