django left join + group by count(*), includes zero counts

class ItemGroup(models.Model):
    name = models.CharField(max_length=50)
class Item(models.Model):
    name = models.CharField(max_length=50)
    group = models.ForeignKey(ItemGroup, on_delete=models.CASCADE)
I want to display all the item groups, and for each group show the number of items the group has, and include groups which have no items.
In SQL I use left join for this purpose:
SELECT item_group.name, COUNT(item.id) AS item_count
FROM item_group
LEFT JOIN item ON item_group.id = item.group_id
GROUP BY item_group.id, item_group.name
So I get groups with zero counts as well.
How do I do the equivalent query with the Django ORM?

You need to use django annotations. Modify your foreign key slightly to:
group = models.ForeignKey(ItemGroup, related_name='items', on_delete=models.CASCADE)
for convenient naming.
Then run python manage.py makemigrations followed by
python manage.py migrate
Then it's time to annotate.
from django.db.models import Count
ItemGroup.objects.annotate(no_of_items=Count('items'))
This adds to each ItemGroup object in the resulting queryset an attribute called no_of_items, which is a count of the items related to it.
The attribute exists only on objects fetched through the annotated queryset. For example, with item_group = ItemGroup.objects.annotate(no_of_items=Count('items')).get(id=1), and assuming item group 1 has 3 related items, item_group.no_of_items gives 3. For an ItemGroup with no items it gives 0.
For more on annotations see https://docs.djangoproject.com/en/2.1/topics/db/aggregation/
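To see why the annotated count comes out as 0 rather than 1 for empty groups, here is a minimal sketch that uses Python's standard sqlite3 module instead of Django. The table and column names (item_group, item, group_id) are assumptions for illustration, and the SQL Django actually emits will differ in naming. The point it shows: counting a column of the joined table skips the NULLs a LEFT JOIN produces, whereas COUNT(*) counts the joined row itself.

```python
import sqlite3

# In-memory database with hypothetical tables mirroring the two models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_group (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT,
                       group_id INTEGER REFERENCES item_group(id));
    INSERT INTO item_group (id, name) VALUES (1, 'full'), (2, 'empty');
    INSERT INTO item (name, group_id) VALUES ('a', 1), ('b', 1), ('c', 1);
""")

# Same LEFT JOIN + GROUP BY, only the counted expression changes.
query = """
    SELECT item_group.name, COUNT({target}) AS item_count
    FROM item_group
    LEFT JOIN item ON item_group.id = item.group_id
    GROUP BY item_group.id, item_group.name
    ORDER BY item_group.id
"""
counts_by_column = conn.execute(query.format(target="item.id")).fetchall()
counts_by_star = conn.execute(query.format(target="*")).fetchall()

print(counts_by_column)  # [('full', 3), ('empty', 0)]
print(counts_by_star)    # [('full', 3), ('empty', 1)]
```

Count('items') in the ORM counts the related rows, so it behaves like the first variant.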

Related

Runtime Foreign Key vs Integerfield

I have a problem for which I already have two solutions, but I was wondering which of them is faster.
I guess that the second solution is not only more convenient to use but also faster, but I want to be sure, which is why I'm asking.
My problem is that I want to group multiple rows together. The group won't hold any metadata, so I'm only interested in runtime.
On the one hand, I can use an integer field and filter on it later when I need all entries that belong to the group. I guess this has a runtime of O(n).
class SingleEntries(models.Model):
    name = models.CharField(max_length=20)
    group = models.IntegerField(null=True)
def find_all_group_members(id):
    return SingleEntries.objects.filter(group=id)
The second solution, and probably the more practical one, would be to create a foreign key to another model that only uses the pk.
Then I can use the reverse relation to find all the entries that belong to the group.
class Group(models.Model):
    id = models.AutoField(primary_key=True)
class SingleEntries(models.Model):
    name = models.CharField(max_length=20)
    group = models.ForeignKey(Group, on_delete=models.CASCADE, null=True)
def find_all_group_members(id):
    return Group.objects.get(id=id).singleentries_set.all()

The first is more efficient, since this will use one query, whereas the latter will first fetch the Group, and then another one for the SingleEntries.
Indeed, if you work with:
SingleEntries.objects.filter(group=id)
this will make a simple query:
SELECT appname_singleentries.*
FROM appname_singleentries
WHERE appname_singleentries.group_id = id
It thus does not first fetch the Group into memory.
The latter will however make two queries. Indeed, it will first make a query to retrieve the Group, and then it will make a query like the one above to fetch the SingleEntries.
The two are also not entirely the same semantically: if there is no such group, the former returns an empty QuerySet, whereas the latter raises a Group.DoesNotExist exception.
But you can model this with:
class Group(models.Model):
    pass
class SingleEntries(models.Model):
    name = models.CharField(max_length=20)
    group = models.ForeignKey(Group, on_delete=models.CASCADE, null=True)
def find_all_group_members(id):
    return SingleEntries.objects.filter(group_id=id)
So you can use a Group model without having to retrieve the Group first.

If the groups are static, meaning you don't expect new groups to be added to your system, you can use choices in Django.
Define the choices as below:
class GroupType(models.IntegerChoices):
    GROUP_0 = 0, "Group 0 name"
    GROUP_1 = 1, "Group 1 name"
    GROUP_2 = 2, "Group 2 name"
And use it as the choices of the field in the SingleEntries model:
class SingleEntries(models.Model):
    name = models.CharField(max_length=20)
    group = models.IntegerField(choices=GroupType.choices, default=<set default here>)
If the groups are dynamic, meaning users can create groups whenever they want, go with your second approach of having a separate model for the group.

Querying Many To Many relationship by number of joins using Django

I have two models: ActorModel and FilmModel joined as follows:
class FilmModel(models.Model):
    actors = models.ManyToManyField('ActorModel', blank=True, related_name='film_actors')
class ActorModel(models.Model):
    name = models.CharField(max_length=40)
    def __str__(self):
        return self.imdb_id
I want to filter my ActorModel for any instance which has more than 5 joins with the FilmModel. I can do this as follows:
actors = ActorModel.objects.all()
more_than_five_films = []
for actor in actors:
    actor_film_list = FilmModel.objects.filter(actors__imdb_id=actor.imdb_id)
    if len(actor_film_list) > 5:
        more_than_five_films.append(actor)
However, using the above code uses lots of processing power. Is there a more efficient way of finding the actors with more than 5 joins? Could I do this at the filtering stage for example?

You could use a query like this:
from django.db.models import Count
more_than_five_films = ActorModel.objects.annotate(count=Count('film_actors')).filter(count__gt=5)
This reaches the FilmModel objects of each ActorModel through the related_name, annotates each actor with a new field named count holding the number of related FilmModel objects, and then keeps only the actors whose count is greater than 5.
As for the code you provided: avoid calling len() on a queryset when you only need the number of rows, because it evaluates the whole query and loads every object into memory. Use the count() method instead, which asks the database for the number and returns the same value as len() would:
FilmModel.objects.filter(actors__imdb_id=actor.imdb_id).count()
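Filtering on an annotated aggregate corresponds to a GROUP BY with a HAVING clause, so the counting and the comparison both happen in the database in a single query. The following is a rough sketch with the standard sqlite3 module. The table names (actor, film_actors) and sample data are made up for illustration and are not the names Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film_actors (film_id INTEGER, actor_id INTEGER);
    INSERT INTO actor (id, name)
    VALUES (1, 'busy'), (2, 'occasional'), (3, 'unknown');
""")
# Actor 1 appears in six films, actor 2 in two, actor 3 in none.
conn.executemany(
    "INSERT INTO film_actors (film_id, actor_id) VALUES (?, ?)",
    [(film_id, 1) for film_id in range(1, 7)] + [(1, 2), (2, 2)],
)

# One query: count films per actor, keep only actors above the threshold.
busy_actors = conn.execute("""
    SELECT actor.name, COUNT(film_actors.film_id) AS film_count
    FROM actor
    LEFT JOIN film_actors ON film_actors.actor_id = actor.id
    GROUP BY actor.id, actor.name
    HAVING COUNT(film_actors.film_id) > ?
""", (5,)).fetchall()

print(busy_actors)  # [('busy', 6)]
```

Compared with the loop in the question, this issues one query instead of one per actor.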

django ORM turns two conditions on related table into two separate JOINs

I have the case that I need to filter on two attributes from a related table.
class Item(models.Model):
    vouchers = models.ManyToManyField('Voucher')
class Voucher(models.Model):
    is_active = models.BooleanField()
    status = models.PositiveIntegerField()
When I query the ORM like this:
Item.objects.exclude(
    vouchers__is_active=False,
    vouchers__status__in=[1, 2])
The created query looks like this:
SELECT *
FROM `item`
WHERE NOT (`item`.`id` IN (
        SELECT U1.`item_id`
        FROM `itemvouchers` U1
        INNER JOIN `voucher` U2 ON (U1.`voucher_id` = U2.`id`)
        WHERE U2.`is_active` = FALSE)
    AND
    `item`.`id` IN (
        SELECT U1.`item_id`
        FROM `itemvouchers` U1
        INNER JOIN `voucher` U2 ON (U1.`voucher_id` = U2.`id`)
        WHERE U2.`status` IN (1, 2))
)
I want to exclude vouchers which are both inactive AND have status 1 or 2.
What the query does is create two separate joins. First, this is unnecessary and bad for performance. Second, it's just wrong.
Case:
voucher_a = Voucher.objects.create(status=3, is_active=True)
voucher_b = Voucher.objects.create(status=1, is_active=False)
If I have an item related to both voucher_a and voucher_b, it does not get found because it is in JOIN 1 but not in JOIN 2.
It looks like a bug in Django, but I wasn't able to find anything useful on the web about this topic.
We are on django==2.1.1 and tried switching exclude for filter and using Q-expressions. Nothing has worked so far.

Your setup is an m2m relation, and you want to exclude any single object that has at least one m2m relation for which this AND combination of conditions is true.
M2M relationships are special when it comes to filter/exclude querysets, see https://docs.djangoproject.com/en/2.1/topics/db/queries/#spanning-multi-valued-relationships
Also note in that documentation:
The behavior of filter() for queries that span multi-value relationships, as described above, is not implemented equivalently for exclude(). Instead, the conditions in a single exclude() call will not necessarily refer to the same item.
The solution presented in the documentation is the following:
Blog.objects.exclude(
    entry__in=Entry.objects.filter(
        headline__contains='Lennon',
        pub_date__year=2008,
    ),
)
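The difference between the two query shapes can be reproduced outside Django. Below is a minimal sketch using the standard sqlite3 module. The table names and sample rows are assumptions for illustration. The first query mirrors the structure of the generated SQL from the question (two independent subqueries), the second puts both conditions on the same voucher row, which is what the documented entry__in pattern achieves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item (id INTEGER PRIMARY KEY);
    CREATE TABLE voucher (id INTEGER PRIMARY KEY, is_active INTEGER, status INTEGER);
    CREATE TABLE item_vouchers (item_id INTEGER, voucher_id INTEGER);
    INSERT INTO item (id) VALUES (1), (2), (3);
    INSERT INTO voucher (id, is_active, status)
    VALUES (10, 0, 3), (11, 1, 1), (12, 0, 1);
    INSERT INTO item_vouchers (item_id, voucher_id)
    VALUES (1, 10), (1, 11), (2, 12);
""")

# Item 1: one inactive voucher (status 3) and one active voucher (status 1).
# Item 2: a single voucher that is inactive AND has status 1.
# Item 3: no vouchers at all.
joined = "FROM item_vouchers iv JOIN voucher v ON v.id = iv.voucher_id"

# Conditions checked independently: they may match different vouchers.
separate_conditions = conn.execute(f"""
    SELECT id FROM item
    WHERE NOT (id IN (SELECT iv.item_id {joined} WHERE v.is_active = 0)
               AND id IN (SELECT iv.item_id {joined} WHERE v.status IN (1, 2)))
    ORDER BY id
""").fetchall()

# Both conditions must hold for the same voucher.
same_voucher = conn.execute(f"""
    SELECT id FROM item
    WHERE id NOT IN (SELECT iv.item_id {joined}
                     WHERE v.is_active = 0 AND v.status IN (1, 2))
    ORDER BY id
""").fetchall()

print(separate_conditions)  # [(3,)]
print(same_voucher)         # [(1,), (3,)]
```

Adapted to the models in the question, the documented pattern would presumably read Item.objects.exclude(vouchers__in=Voucher.objects.filter(is_active=False, status__in=[1, 2])); this adaptation is untested here.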

Django filter by the number of rows matching a certain condition in a ManyToMany

I need to filter for objects where the number of elements in a ManyToMany relationship matches a condition. Here's some simplified models:
class Place(models.Model):
    name = models.CharField(max_length=100)
class Person(models.Model):
    type = models.CharField(max_length=1)
    place = models.ManyToManyField(Place, related_name="people")
I tried to do this:
c = Count(Q(people__type='V'))
p = Place.objects.annotate(v_people=c)
But this just makes the .v_people attribute count the total number of people.

Since Django 2.0, you can use the filter=... parameter of the Count(..) function [Django-doc] for this:
Place.objects.annotate(
    v_people=Count('people', filter=Q(people__type='V'))
)
So this will assign to v_people the number of people with type='V' for that specific Place object.
An alternative is to .filter(..) the relation first:
Place.objects.filter(
    Q(people__type='V') | Q(people__isnull=True)
).annotate(
    v_people=Count('people')
)
Here we filter the relation so that we keep people with type='V', as well as places with no people at all (since a Place may have no people). We then count the related model.
This generates a query like:
SELECT `place`.*, COUNT(`person_place`.`person_id`) AS `v_people`
FROM `place`
LEFT OUTER JOIN `person_place` ON `place`.`id` = `person_place`.`place_id`
LEFT OUTER JOIN `person` ON `person_place`.`person_id` = `person`.`id`
WHERE `person`.`type` = 'V' OR `person_place`.`person_id` IS NULL
GROUP BY `place`.`id`
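The two approaches are not interchangeable for places that have people but none of type 'V'. The sketch below, written with the standard sqlite3 module, compares a conditional count with the filter-then-count query. The table names and rows are assumptions for illustration, and the CASE expression is only one portable way to write a conditional count, not necessarily the exact SQL Django emits for filter=.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE person_place (person_id INTEGER, place_id INTEGER);
    INSERT INTO place (id, name) VALUES (1, 'mixed'), (2, 'no_v'), (3, 'empty');
    INSERT INTO person (id, type) VALUES (1, 'V'), (2, 'V'), (3, 'X'), (4, 'X');
    INSERT INTO person_place (person_id, place_id)
    VALUES (1, 1), (2, 1), (3, 1), (4, 2);
""")

joins = """
    FROM place
    LEFT JOIN person_place ON place.id = person_place.place_id
    LEFT JOIN person ON person_place.person_id = person.id
"""

# Conditional count: every place is kept, only 'V' people are counted.
conditional_count = conn.execute(f"""
    SELECT place.name, COUNT(CASE WHEN person.type = 'V' THEN person.id END)
    {joins}
    GROUP BY place.id, place.name
    ORDER BY place.id
""").fetchall()

# Filter first, then count: rows of non-'V' people are removed before grouping.
filter_then_count = conn.execute(f"""
    SELECT place.name, COUNT(person_place.person_id)
    {joins}
    WHERE person.type = 'V' OR person_place.person_id IS NULL
    GROUP BY place.id, place.name
    ORDER BY place.id
""").fetchall()

print(conditional_count)  # [('mixed', 2), ('no_v', 0), ('empty', 0)]
print(filter_then_count)  # [('mixed', 2), ('empty', 0)]
```

The place named 'no_v' disappears from the second result because its only row fails the WHERE clause, so the filter-first variant fits only when such places may be omitted.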

Simple Subquery with OuterRef

I am trying to make a very simple Subquery that uses OuterRef (not for practical purposes, but just to get it working), but I keep running into the same error.
posts/models.py code
from django.db import models
class Tag(models.Model):
    name = models.CharField(max_length=120)
    def __str__(self):
        return self.name
class Post(models.Model):
    title = models.CharField(max_length=120)
    tags = models.ManyToManyField(Tag)
    def __str__(self):
        return self.title
manage.py shell code
>>> from django.db.models import OuterRef, Subquery
>>> from posts.models import Tag, Post
>>> tag1 = Tag.objects.create(name='tag1')
>>> post1 = Post.objects.create(title='post1')
>>> post1.tags.add(tag1)
>>> Tag.objects.filter(post=post1.pk)
<QuerySet [<Tag: tag1>]>
>>> tags_list = Tag.objects.filter(post=OuterRef('pk'))
>>> Post.objects.annotate(count=Subquery(tags_list.count()))
The last two lines should give me number of tags for each Post object. And here I keep getting the same error:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.

One of the problems with your example is that you cannot use queryset.count() as a subquery, because .count() tries to evaluate the queryset and return the count.
So one may think that the right approach would be to use Count() instead. Maybe something like this:
Post.objects.annotate(
    count=Count(Tag.objects.filter(post=OuterRef('pk')))
)
This won't work for two reasons:
1) The Tag queryset selects all Tag fields, while Count can only count on one field. Thus: Tag.objects.filter(post=OuterRef('pk')).only('pk') is needed (to select counting on tag.pk).
2) Count itself is not a Subquery class, Count is an Aggregate. So the expression generated by Count is not recognized as a Subquery (OuterRef requires a subquery). We can fix that by using Subquery.
Applying fixes for 1) and 2) would produce:
Post.objects.annotate(
    count=Count(Subquery(Tag.objects.filter(post=OuterRef('pk')).only('pk')))
)
However, if you inspect the query being produced:
SELECT
    "tests_post"."id",
    "tests_post"."title",
    COUNT((SELECT U0."id"
           FROM "tests_tag" U0
           INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
           WHERE U1."post_id" = ("tests_post"."id"))
    ) AS "count"
FROM "tests_post"
GROUP BY
    "tests_post"."id",
    "tests_post"."title"
you will notice a GROUP BY clause. This is because COUNT is an aggregate function. Right now it does not affect the result, but in some other cases it may. That's why the docs suggest a different approach, where the aggregation is moved into the subquery via a specific combination of values + annotate + values :
Post.objects.annotate(
    count=Subquery(
        Tag.objects
            .filter(post=OuterRef('pk'))
            # The first .values call defines our GROUP BY clause.
            # It's important to filter on every field listed here,
            # otherwise you will have more than one group per row.
            # That would make the subquery return more than one row,
            # which subqueries are not allowed to do.
            # In our example we group only by post
            # and we filter by post via OuterRef.
            .values('post')
            # Here we say: count how many rows we have per group
            .annotate(count=Count('pk'))
            # Here we say: return only the count
            .values('count')
    )
)
Finally this will produce:
SELECT
    "tests_post"."id",
    "tests_post"."title",
    (SELECT COUNT(U0."id") AS "count"
     FROM "tests_tag" U0
     INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
     WHERE U1."post_id" = ("tests_post"."id")
     GROUP BY U1."post_id"
    ) AS "count"
FROM "tests_post"
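One consequence of the GROUP BY inside the subselect is worth checking: for a post without any tags the subselect yields no row at all, so the value is NULL rather than 0. The sketch below demonstrates this with the standard sqlite3 module on hypothetical tables (post, tag, post_tags) and shows COALESCE as one way to turn the NULL into 0. In the ORM, wrapping the Subquery in Coalesce from django.db.models.functions would presumably play the same role, though that is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER);
    INSERT INTO post (id, title) VALUES (1, 'post1'), (2, 'untagged');
    INSERT INTO tag (id, name) VALUES (1, 'tag1'), (2, 'tag2');
    INSERT INTO post_tags (post_id, tag_id) VALUES (1, 1), (1, 2);
""")

# Correlated subselect with the same shape as the final query above.
count_subquery = """
    (SELECT COUNT(tag.id)
     FROM tag
     INNER JOIN post_tags ON tag.id = post_tags.tag_id
     WHERE post_tags.post_id = post.id
     GROUP BY post_tags.post_id)
"""

raw_counts = conn.execute(
    f"SELECT post.title, {count_subquery} FROM post ORDER BY post.id"
).fetchall()
coalesced_counts = conn.execute(
    f"SELECT post.title, COALESCE({count_subquery}, 0) FROM post ORDER BY post.id"
).fetchall()

print(raw_counts)        # [('post1', 2), ('untagged', None)]
print(coalesced_counts)  # [('post1', 2), ('untagged', 0)]
```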

The django-sql-utils package makes this kind of subquery aggregation simple. Just pip install django-sql-utils and then:
from sql_util.utils import SubqueryCount
posts = Post.objects.annotate(
    tag_count=SubqueryCount('tags'))
The API for SubqueryCount is the same as Count, but it generates a subselect in the SQL instead of joining to the related table.
