Django - Aggregate equal values of fields in model - python

I have activity logs for user activities, basically structured like this:
class ActivityLog(TimeStampedModel):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    action_type = models.CharField(max_length=25)
    object_raw = models.CharField(max_length=500)
I want to aggregate all the entries where object_raw matches, i.e. if a user searched for 'foo' on 4 different occasions, I get back one entry for 'foo', with count=4. I'm having trouble doing this right now. I know how to do it in SQL, but I don't understand the Django syntax. I've been reading through the docs but I still don't get it. If anyone could help, it would be much appreciated!

To get one object with 'foo' and how many objects have object_raw='foo' you can do:
activity_logs = ActivityLog.objects.filter(object_raw='foo')
if activity_logs.exists():
    activity_logs.first()  # get one object
    activity_logs.count()  # get number of objects
If you just want how many objects have object_raw='foo', you can use conditional expressions with aggregates:
from django.db.models import Case, IntegerField, Sum, When
ActivityLog.objects.aggregate(
    num_object_raw=Sum(
        Case(
            When(object_raw='foo', then=1),
            output_field=IntegerField()
        )
    )
)
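If the goal is one row per distinct object_raw with its count, as described in the question, the usual ORM idiom is ActivityLog.objects.values('object_raw').annotate(count=Count('id')), which compiles to a GROUP BY. The sketch below shows the shape of that result with the standard-library sqlite3 module, so it runs without a Django project. The table name, column names, and sample rows are assumptions made for illustration.

```python
import sqlite3

# In-memory table standing in for ActivityLog (names and data are hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_log (id INTEGER PRIMARY KEY, user_id INTEGER, object_raw TEXT)"
)
conn.executemany(
    "INSERT INTO activity_log (user_id, object_raw) VALUES (?, ?)",
    [(1, "foo"), (1, "foo"), (1, "bar"), (1, "foo"), (1, "foo")],
)

# One row per distinct object_raw, with the number of matching entries.
grouped = conn.execute(
    "SELECT object_raw, COUNT(id) AS count "
    "FROM activity_log GROUP BY object_raw ORDER BY object_raw"
).fetchall()
print(grouped)
```

If the model (or TimeStampedModel) declares a default ordering, adding an empty .order_by() before .values() may be needed on older Django versions so that the ordering columns do not end up in the GROUP BY.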

Related

Referencing twice related field for django model query

So I am using Django to construct a Query and I have 3 models as defined:
class Book(models.Model):
    ...
class Upload(models.Model):
    ...
    book = models.ForeignKey(Book, on_delete=models.CASCADE)
class Region(models.Model):
    ...
    page = models.ForeignKey(Upload, on_delete=models.CASCADE)
Given these 3 models, I want a query that lists all the books and annotates each with a segmented_pages value: the count of that book's Upload objects that have a non-zero number of regions.
Basically, counting the number of uploads per book that have at least one region.
I assume the basic structure of the query would look like this; mainly the logic inside filter needs to change, as there is no convenient count lookup.
Book.objects.annotate(segmented_pages=Count('upload', filter=Q(upload__region__count__gt=0)))
Can someone please help me with the logic of the filter and a simple explanation of how to go about designing these types of queries using django models?
You can rewrite "non-zero number of regions" as "in the join produced by the query, the region for any upload must not be null", so you can simply use the isnull lookup. distinct=True is needed because an upload with several regions appears once per region in the join:
from django.db.models import Count, Q
Book.objects.annotate(
    segmented_pages=Count(
        'upload',
        filter=Q(upload__region__isnull=False),
        distinct=True
    )
)
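To see why distinct=True matters here, the following sketch reproduces the join in plain SQL through sqlite3. It is an approximation of what the ORM generates, not the exact SQL Django emits, and the table names and sample data are assumptions. Upload 1 has two regions, so it shows up twice in the join and inflates the non-distinct count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Book 1: upload 1 has two regions, upload 2 has one, upload 3 has none.
# Book 2: upload 4 has no regions.
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY);
    CREATE TABLE upload (id INTEGER PRIMARY KEY, book_id INTEGER);
    CREATE TABLE region (id INTEGER PRIMARY KEY, page_id INTEGER);
    INSERT INTO book VALUES (1), (2);
    INSERT INTO upload VALUES (1, 1), (2, 1), (3, 1), (4, 2);
    INSERT INTO region VALUES (1, 1), (2, 1), (3, 2);
""")

# Count uploads whose joined region is not null, with and without DISTINCT.
rows = conn.execute("""
    SELECT b.id,
           COUNT(CASE WHEN r.id IS NOT NULL THEN u.id END) AS plain_count,
           COUNT(DISTINCT CASE WHEN r.id IS NOT NULL THEN u.id END) AS distinct_count
    FROM book b
    LEFT JOIN upload u ON u.book_id = b.id
    LEFT JOIN region r ON r.page_id = u.id
    GROUP BY b.id
    ORDER BY b.id
""").fetchall()
print(rows)
```

For book 1 the plain count is 3 (one per join row with a region) while the distinct count is 2, the number of uploads that actually have at least one region.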

Count rows of a subquery in Django 1.11

I have a couple models
class Order(models.Model):
    user = models.ForeignKey(User)
class Lot(models.Model):
    order = models.ForeignKey(Order)
    buyer = models.ForeignKey(User)
What I'm trying to do is to annotate Lot objects with a number of buys made by a given user to the same seller. (it's not a mistake, Order.user is really a seller). Like “you’ve bought 4 items from this user recently”.
The closest I got was
recent_sold_lots = Lot.objects.filter(
    order__user_id=OuterRef('order__user_id'),
    status=Lot.STATUS_SOLD,
    buyer_id=self.user_id,
    date_sold__gte=now() - timedelta(hours=24),
)
qs = Lot.objects.filter(
    status=Lot.STATUS_READY,
    date_ready__lte=now() - timedelta(seconds=self.lag)
).annotate(same_user_recent_buys=Count(Subquery(recent_sold_lots.values('id'))))
But it fails when recent_sold_lots returns more than one row: more than one row returned by a subquery used as an expression.
.annotate(same_user_recent_buys=Subquery(recent_sold_lots.aggregate(Count('id')))) doesn't work either: This queryset contains a reference to an outer query and may only be used in a subquery.
.annotate(same_user_recent_buys=Subquery(recent_sold_lots.annotate(c=Count('id')).values('c'))) gives me Expression contains mixed types. You must set output_field. If I add output_field=models.IntegerField() to the Subquery call, it throws more than one row returned by a subquery used as an expression.
I'm stuck with this one. I feel I'm close to the solution, but what am I missing here?
The models you defined in the question do not correctly reflect the query you are making. In any case, I'll use the models as a reference for my query.
from django.db.models import Count
user_id = 123 # my user id and also the buyer
buyer = User.objects.get(pk=user_id)
Lot.objects.filter(buyer=buyer).values('order__user').annotate(unique_seller_order_count=Count('id'))
What the query does is:
1. Filters the Lot objects to the ones you have bought.
2. Groups the returned lots by the user who created the order.
3. Counts the lots in each group and annotates the group with that count.
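The three steps above map onto WHERE, GROUP BY, and COUNT. The sketch below runs the equivalent SQL through sqlite3 as an illustration. The table names (orders, lot) and the sample rows are assumptions, and the SQL only approximates what the ORM would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# orders.user_id is the seller, as in the question; the data is hypothetical.
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE lot (id INTEGER PRIMARY KEY, order_id INTEGER, buyer_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO lot (order_id, buyer_id)
    VALUES (1, 123), (1, 123), (2, 123), (3, 123), (3, 999);
""")

buyer_id = 123
# WHERE keeps this buyer's lots, GROUP BY groups them per seller,
# and COUNT gives the number of lots in each group.
per_seller = conn.execute(
    """
    SELECT o.user_id AS seller_id, COUNT(l.id) AS unique_seller_order_count
    FROM lot l
    JOIN orders o ON o.id = l.order_id
    WHERE l.buyer_id = ?
    GROUP BY o.user_id
    ORDER BY o.user_id
    """,
    (buyer_id,),
).fetchall()
print(per_seller)
```

Note that this returns one row per seller rather than one annotated Lot per row, which is the main difference from what the question was attempting with Subquery.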

How to prefetch a count and use it to annotate a queryset in Django ORM

I’m trying to perform an advanced SQL query but I can’t figure out how to do this.
My goal is to implement a read/unread system for a forum, so I have the following models (incomplete for succinctness):
class ReadActivity(models.Model):
    content_object = GenericForeignKey('content_type', 'object_id')
    reader = models.ForeignKey(User, null=False, blank=False)
    read_until = models.DateTimeField(blank=False, null=False, default=timezone.now)
    class Meta:
        unique_together = (("content_type", "object_id", "reader"),)
class Readable(models.Model):
    read_activities = GenericRelation(ReadActivity)
    class Meta:
        abstract = True
class Forum(models.Model):
    pass
class Topic(Readable):
    pass
I think this is quite self-explanatory: Topics can be marked as read. A user has zero or one ReadActivity per topic. (In fact Forum is also readable, but that is out of scope for this question.)
So: I want to be able to fetch a forum list and, for each forum, determine whether it is read or unread (for a given user).
My approach is:
For each forum, if any topic has no ReadActivity, or has a ReadActivity where read_until < last_message, mark the forum as unread.
In SQL, I would do this in the following way:
SELECT COUNT("forums_topic"."id")
FROM "forums_topic"
LEFT JOIN "reads_readactivity"
ON ("forums_topic"."id" = "reads_readactivity"."object_id" AND "reads_readactivity"."content_type_id" = '11')
WHERE (("reads_readactivity"."id" IS NULL
OR ("reads_readactivity"."reader_id" = '1' AND "reads_readactivity"."read_until" < "forums_topic"."last_message_date"))
AND "forums_topic"."forum_id" IN ('1', '2'))
GROUP BY "forums_topic"."forum_id"
But I can’t find a way to do this using Django ORM (Without many many queries). I tried to use Prefetch :
topics_prefetch = Topic.objects.filter(Q(read_activities=None) | (Q(read_activities__reader=self.request.user) & Q(read_activities__read_until__gt=F('created_at'))))
queryset = queryset.prefetch_related(Prefetch('read_activities', queryset=ReadActivity.objects.filter(reader=self.request.user)))
But this solution is not good enough, because there is no way I can select COUNT(), or limit to 1 result, and group the count by forum.
I tried to combine this solution with annotate :
queryset = queryset.annotate(is_unread=Count('topics__read_activities'))
But this just doesn’t work; it counts ALL the topics.
Then I tried to use annotate with case/when :
queryset = queryset.annotate(is_unread=Case(
    When(topics=None, then=False),
    When(topics__read_activities=None, then=True),
    When(topics__read_activities__reader=self.request.user, topics__read_activities__read_until__lte=F('topics__created_at'), then=True),
    default=False,
    output_field=BooleanField()
))
But it does not give the right results. Moreover, I’m not sure this is the best solution. I’m not sure how SQL CASE/WHEN is implemented, but if it iterates internally over every topic, this may be too expensive.
Now I’m considering writing RawSQL; however, I don’t know how to add a manager method that would let me do something like Forums.objects.filter().with_unread(), because I don’t know how to access the filtered forums to build a condition like
"forums_topic"."forum_id" IN ('1', '2'))
from the manager itself.
Maybe I’m missing something, maybe you know how to do that. Any help would be appreciated, thank you !

Order a django queryset by ManyToManyField = Value

I have some models like:
class Tag(models.Model):
    name = models.CharField()
class Thing(models.Model):
    title = models.CharField()
    tags = models.ManyToManyField(Tag)
I can do a filter:
Thing.objects.filter(tags__name='foo')
Thing.objects.filter(tags__name__in=['foo', 'bar'])
But is it possible to order a queryset on the tags value?
Thing.objects.order_by(tags__name='foo')
Thing.objects.order_by(tags__name__in=['foo','bar'])
What I would expect (or like) back in this example, would be ALL Thing models, but ordered where they have a Tag/Tags that I know. I don't want to filter them out, but bring them to the top.
I gather this is possible using the FIELD operator, but seemingly I can only make it work on columns in that models table, e.g. title, but not on linked tables.
Thanks!
EDIT: After having accepted the below solution, I realised a bug/limitation with it.
If a particular Thing has multiple Tags, then (due to the left join done behind the scenes in the SQL) it will produce one entry for that Thing, for each Tag that it has. With a True or False for each Tag that matches or not.
Adding .distinct() to the queryset helps only slightly, limiting to a max of 2 rows per Thing (i.e. one tagged=True, and one tagged=False).
I know what I need to do in the SQL, which is to MAX() the CASE(), and then GROUP BY Thing's primary key, which means I will get one row per Thing, and if there has been any tag matches, tagged will be True (and False otherwise).
I see the way that people typically achieve this kind of thing is to use .values() like this:
Thing.objects.values('pk').annotate(tagged=Max(Case(...)))
But the result is only pk and tagged, I need the whole Thing model as the result. So I've managed to achieve what I want, thusly:
from django.db.models import Case, When, Max, BooleanField
tags = ['music']  # for example
queryset = Thing.objects.all().annotate(tagged=Max(Case(
    When(tags__name__in=tags, then=True),
    default=False,
    output_field=BooleanField()
)))
queryset.query.group_by = ['pk']
queryset = queryset.order_by('-tagged')  # order_by returns a new queryset
This seems to work, but the group by mechanism feels weird/hacky. Is it acceptable/reliable to group in this way?
Sorry for the epic update :(
I'd try annotating the query with a conditional value that is true when the tag is in the list you provide:
from django.db.models import BooleanField, Case, IntegerField, When
Thing.objects.annotate(tag_is_known=Case(
    When(tags__name__in=['foo', 'bar'], then=1),
    default=0,
    output_field=IntegerField()
))
Next, we use the annotation we called tag_is_known to sort with order_by(). Sorting in descending order puts the things with a known tag first:
Thing.objects.annotate(tag_is_known=...).order_by('-tag_is_known')
Boolean version:
Thing.objects.annotate(tag_is_known=Case(
    When(tags__name__in=['foo', 'bar'], then=True),
    default=False,
    output_field=BooleanField()
))
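The duplicate-row limitation described in the question's edit, and the MAX-plus-GROUP BY fix, can be reproduced in plain SQL. The sketch below uses sqlite3 with assumed table names and sample data. The first query yields one row per (thing, tag) pair; the second collapses them to one row per thing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Thing 1 has two tags (music, art), thing 2 has one (art), thing 3 has none.
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE thing_tags (thing_id INTEGER, tag_id INTEGER);
    INSERT INTO thing VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO tag VALUES (1, 'music'), (2, 'art');
    INSERT INTO thing_tags VALUES (1, 1), (1, 2), (2, 2);
""")

JOINS = """
    FROM thing t
    LEFT JOIN thing_tags tt ON tt.thing_id = t.id
    LEFT JOIN tag g ON g.id = tt.tag_id
"""

# Plain CASE: thing 1 appears twice, once per tag.
per_join_row = conn.execute(
    "SELECT t.id, CASE WHEN g.name IN ('music') THEN 1 ELSE 0 END AS tagged"
    + JOINS + "ORDER BY t.id, tagged"
).fetchall()

# MAX(CASE) with GROUP BY: one row per thing, tagged is 1 if any tag matched.
per_thing = conn.execute(
    "SELECT t.id, MAX(CASE WHEN g.name IN ('music') THEN 1 ELSE 0 END) AS tagged"
    + JOINS + "GROUP BY t.id ORDER BY tagged DESC, t.id"
).fetchall()
print(per_join_row, per_thing)
```

In the ORM, annotating a full model queryset with an aggregate such as Max(Case(...)) normally makes Django group by the model's own columns already, so setting query.group_by by hand is likely unnecessary; treat that as something to verify against the generated SQL (str(queryset.query)) for your Django version.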

Django query annotation with boolean field

Let's say I have a Product model with products in a storefront, and a ProductImages table with images of the product, which can have zero or more images. Here's a simplified example:
class Product(models.Model):
    product_name = models.CharField(max_length=255)
    # ...
class ProductImage(models.Model):
    product = models.ForeignKey(Product, related_name='images')
    image_file = models.CharField(max_length=255)
    # ...
When displaying search results for products, I want to prioritize products which have images associated with them. I can easily get the number of images:
from django.db.models import Count
Product.objects.annotate(image_count=Count('images'))
But that's not actually what I want. I'd like to annotate it with a boolean field, have_images, indicating whether the product has one or more images, so that I can sort by that:
Product.objects.annotate(have_images=(?????)).order_by('-have_images', 'product_name')
How can I do that? Thanks!
I eventually found a way to do this using Django 1.8's new conditional expressions:
from django.db.models import Case, Count, IntegerField, Value, When
q = (
    Product.objects
    .filter(...)
    .annotate(image_count=Count('images'))
    .annotate(
        have_images=Case(
            When(image_count__gt=0, then=Value(1)),
            default=Value(0),
            output_field=IntegerField()))
    .order_by('-have_images')
)
And that's how I finally found incentive to upgrade to 1.8 from 1.7.
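As a rough illustration of what this annotation computes, the sketch below runs a comparable query through sqlite3: count the images per product, then collapse the count into a 0/1 flag and sort on it. A second query derives the same flag from an EXISTS subquery for comparison. Table names and sample rows are assumptions, and the SQL is an approximation rather than Django's exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Lamp has two images, Desk has one, Chair has none (hypothetical data).
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE product_image (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(id),
        image_file TEXT
    );
    INSERT INTO product VALUES (1, 'Lamp'), (2, 'Chair'), (3, 'Desk');
    INSERT INTO product_image (product_id, image_file)
    VALUES (1, 'lamp-1.png'), (1, 'lamp-2.png'), (3, 'desk.png');
""")

# Count images per product, then turn the count into a 0/1 flag.
by_count = conn.execute("""
    SELECT p.product_name,
           CASE WHEN COUNT(i.id) > 0 THEN 1 ELSE 0 END AS have_images
    FROM product p
    LEFT JOIN product_image i ON i.product_id = p.id
    GROUP BY p.id
    ORDER BY have_images DESC, p.product_name
""").fetchall()

# Same flag from an EXISTS subquery, with no join and no GROUP BY.
by_exists = conn.execute("""
    SELECT p.product_name,
           EXISTS (SELECT 1 FROM product_image i WHERE i.product_id = p.id) AS have_images
    FROM product p
    ORDER BY have_images DESC, p.product_name
""").fetchall()
print(by_count, by_exists)
```

Both queries put products with images first and order by name within each group.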
As of Django 1.11 it is possible to use Exists. The example below comes from the Exists documentation:
>>> from django.db.models import Exists, OuterRef
>>> from datetime import timedelta
>>> from django.utils import timezone
>>> one_day_ago = timezone.now() - timedelta(days=1)
>>> recent_comments = Comment.objects.filter(
...     post=OuterRef('pk'),
...     created_at__gte=one_day_ago,
... )
>>> Post.objects.annotate(recent_comment=Exists(recent_comments))
Use conditional expressions and set output_field to BooleanField:
from django.db.models import BooleanField, Case, Count, Value, When
Product.objects.annotate(image_count=Count('images')).annotate(
    has_image=Case(
        When(image_count=0, then=Value(False)),
        default=Value(True),
        output_field=BooleanField(),
    )
).order_by('-has_image')
Read the docs about extra
qs = Product.objects.extra(select={'has_images': 'CASE WHEN images IS NOT NULL THEN 1 ELSE 0 END' })
I tested it and it works.
But ordering by this field, or filtering on it with where, doesn't work for me (Django 1.8), even though the docs say:
If you need to order the resulting queryset using some of the new
fields or tables you have included via extra() use the order_by
parameter to extra() and pass in a sequence of strings. These strings
should either be model fields (as in the normal order_by() method on
querysets), of the form table_name.column_name or an alias for a
column that you specified in the select parameter to extra().
qs = qs.extra(order_by = ['-has_images'])
qs = qs.extra(where = ['has_images=1'])
FieldError: Cannot resolve keyword 'has_images' into field.
I found that https://code.djangoproject.com/ticket/19434 is still open.
So if you run into the same trouble, you can fall back to raw queries.
If performance matters, my suggestion is to add a hasPictures boolean field (with editable=False).
Then keep its value up to date through ProductImage model signals (or by overriding the save and delete methods).
Advantages:
- Index friendly.
- Better performance. Avoids joins.
- Database agnostic.
- Coding it will raise your Django skills to the next level.
When you have to annotate existence with some filters, a Sum annotation can be used. For example, the following annotates whether there are any GIFs in images:
from django.db.models import Case, IntegerField, Sum, Value, When
Product.objects.annotate(
    animated_images=Sum(
        Case(
            When(images__image_file__endswith='gif', then=Value(1)),
            default=Value(0),
            output_field=IntegerField()
        )
    )
)
This actually counts them, but a Pythonic check such as if product.animated_images: works the same as if it were a boolean.
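A small sqlite3 sketch of the same idea follows, under assumed table names and sample data: SUM over a CASE yields a per-product count, and Python truthiness turns that count into a yes/no answer. A product with no images still gets 0 rather than NULL, because the LEFT JOIN row falls through to the CASE default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Lamp has two GIFs and one PNG, Chair has no images, Desk has one JPG.
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE product_image (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        image_file TEXT
    );
    INSERT INTO product VALUES (1, 'Lamp'), (2, 'Chair'), (3, 'Desk');
    INSERT INTO product_image (product_id, image_file)
    VALUES (1, 'spin.gif'), (1, 'front.png'), (1, 'glow.gif'), (3, 'desk.jpg');
""")

# Sum 1 for every image whose file name ends in 'gif', 0 otherwise.
rows = conn.execute("""
    SELECT p.id,
           SUM(CASE WHEN i.image_file LIKE '%gif' THEN 1 ELSE 0 END) AS animated_images
    FROM product p
    LEFT JOIN product_image i ON i.product_id = p.id
    GROUP BY p.id
    ORDER BY p.id
""").fetchall()

# The count behaves like a boolean in a truthiness check.
flags = [bool(count) for _, count in rows]
print(rows, flags)
```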
