I am trying to make a very simple Subquery that uses OuterRef (not for practical purposes, but just to get it working), but I keep running into the same error.
posts/models.py code
from django.db import models
class Tag(models.Model):
name = models.CharField(max_length=120)
def __str__(self):
return self.name
class Post(models.Model):
title = models.CharField(max_length=120)
tags = models.ManyToManyField(Tag)
def __str__(self):
return self.title
manage.py shell code
>>> from django.db.models import OuterRef, Subquery
>>> from posts.models import Tag, Post
>>> tag1 = Tag.objects.create(name='tag1')
>>> post1 = Post.objects.create(title='post1')
>>> post1.tags.add(tag1)
>>> Tag.objects.filter(post=post1.pk)
<QuerySet [<Tag: tag1>]>
>>> tags_list = Tag.objects.filter(post=OuterRef('pk'))
>>> Post.objects.annotate(count=Subquery(tags_list.count()))
The last two lines should give me number of tags for each Post object. And here I keep getting the same error:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
One of the problems with your example is that you cannot use queryset.count() as a subquery, because .count() tries to evaluate the queryset and return the count.
So one may think that the right approach would be to use Count() instead. Maybe something like this:
Post.objects.annotate(
count=Count(Tag.objects.filter(post=OuterRef('pk')))
)
This won't work for two reasons:
The Tag queryset selects all Tag fields, while Count can only count on one field. Thus: Tag.objects.filter(post=OuterRef('pk')).only('pk') is needed (to select counting on tag.pk).
Count itself is not a Subquery class, Count is an Aggregate. So the expression generated by Count is not recognized as a Subquery (OuterRef requires subquery), we can fix that by using Subquery.
Applying fixes for 1) and 2) would produce:
Post.objects.annotate(
count=Count(Subquery(Tag.objects.filter(post=OuterRef('pk')).only('pk')))
)
However
if you inspect the query being produced:
SELECT
"tests_post"."id",
"tests_post"."title",
COUNT((SELECT U0."id"
FROM "tests_tag" U0
INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."post_id" = ("tests_post"."id"))
) AS "count"
FROM "tests_post"
GROUP BY
"tests_post"."id",
"tests_post"."title"
you will notice a GROUP BY clause. This is because COUNT is an aggregate function. Right now it does not affect the result, but in some other cases it may. That's why the docs suggest a different approach, where the aggregation is moved into the subquery via a specific combination of values + annotate + values :
Post.objects.annotate(
count=Subquery(
Tag.objects
.filter(post=OuterRef('pk'))
# The first .values call defines our GROUP BY clause
# Its important to have a filtration on every field defined here
# Otherwise you will have more than one group per row!!!
# This will lead to subqueries to return more than one row!
# But they are not allowed to do that!
# In our example we group only by post
# and we filter by post via OuterRef
.values('post')
# Here we say: count how many rows we have per group
.annotate(count=Count('pk'))
# Here we say: return only the count
.values('count')
)
)
Finally this will produce:
SELECT
"tests_post"."id",
"tests_post"."title",
(SELECT COUNT(U0."id") AS "count"
FROM "tests_tag" U0
INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."post_id" = ("tests_post"."id")
GROUP BY U1."post_id"
) AS "count"
FROM "tests_post"
The django-sql-utils package makes this kind of subquery aggregation simple. Just pip install django-sql-utils and then:
from sql_util.utils import SubqueryCount
posts = Post.objects.annotate(
tag_count=SubqueryCount('tag'))
The API for SubqueryCount is the same as Count, but it generates a subselect in the SQL instead of joining to the related table.
Related
suppose we have a model in django defined as follows:
class Literal:
name = models.CharField(...)
...
Name field is not unique, and thus can have duplicate values. I need to accomplish the following task:
Select all rows from the model that have at least one duplicate value of the name field.
I know how to do it using plain SQL (may be not the best solution):
select * from literal where name IN (
select name from literal group by name having count((name)) > 1
);
So, is it possible to select this using django ORM? Or better SQL solution?
Try:
from django.db.models import Count
Literal.objects.values('name')
.annotate(Count('id'))
.order_by()
.filter(id__count__gt=1)
This is as close as you can get with Django. The problem is that this will return a ValuesQuerySet with only name and count. However, you can then use this to construct a regular QuerySet by feeding it back into another query:
dupes = Literal.objects.values('name')
.annotate(Count('id'))
.order_by()
.filter(id__count__gt=1)
Literal.objects.filter(name__in=[item['name'] for item in dupes])
This was rejected as an edit. So here it is as a better answer
dups = (
Literal.objects.values('name')
.annotate(count=Count('id'))
.values('name')
.order_by()
.filter(count__gt=1)
)
This will return a ValuesQuerySet with all of the duplicate names. However, you can then use this to construct a regular QuerySet by feeding it back into another query. The django ORM is smart enough to combine these into a single query:
Literal.objects.filter(name__in=dups)
The extra call to .values('name') after the annotate call looks a little strange. Without this, the subquery fails. The extra values tricks the ORM into only selecting the name column for the subquery.
try using aggregation
Literal.objects.values('name').annotate(name_count=Count('name')).exclude(name_count=1)
In case you use PostgreSQL, you can do something like this:
from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Func, Value
duplicate_ids = (Literal.objects.values('name')
.annotate(ids=ArrayAgg('id'))
.annotate(c=Func('ids', Value(1), function='array_length'))
.filter(c__gt=1)
.annotate(ids=Func('ids', function='unnest'))
.values_list('ids', flat=True))
It results in this rather simple SQL query:
SELECT unnest(ARRAY_AGG("app_literal"."id")) AS "ids"
FROM "app_literal"
GROUP BY "app_literal"."name"
HAVING array_length(ARRAY_AGG("app_literal"."id"), 1) > 1
Ok, so for some reason none of the above worked for, it always returned <MultilingualQuerySet []>. I use the following, much easier to understand but not so elegant solution:
dupes = []
uniques = []
dupes_query = MyModel.objects.values_list('field', flat=True)
for dupe in set(dupes_query):
if not dupe in uniques:
uniques.append(dupe)
else:
dupes.append(dupe)
print(set(dupes))
If you want to result only names list but not objects, you can use the following query
repeated_names = Literal.objects.values('name').annotate(Count('id')).order_by().filter(id__count__gt=1).values_list('name', flat='true')
class ItemGroup(models.Model):
name = models.CharField(max_length=50)
class Item(models.Model):
name = models.CharField(max_length=50)
group = models.ForeignKey(ItemGroup, on_delete=models.CASCADE)
I want to display all the item groups, and for each group show the number of items the group has, and include groups which have no items.
In SQL I use left join for this purpose:
SELECT item_group.name, COUNT(*) as item_count
FROM item_group
LEFT JOIN item ON item_group.id = item.id
GROUP BY item_group.id, item_group.name
So I get groups with zero counts as well.
How to do the equivalent query with django ORM?
You need to use django annotations. Modify your foreign key slightly to:
group = models.ForeignKey(ItemGroup, related_name='items', on_delete=models.CASCADE)
for convenient naming.
Then run python manage.py makemigration followed by
python manage.py migrate
Then it's time to annotate.
from django.db.models import Count
ItemGroup.objects.annotate(no_of_items=Count('items'))
What this does is add to each ItemGroup object an attribute called no_of_items which is a count of the items related to it.
Now if you do item_group = ItemGroup.objects.get(id=1) assuming item group 1 had 3 items related to it you can do item_group.no_of_items which would give you 3. If another ItemGroup had 0 items it would return zero.
For more on annotations see https://docs.djangoproject.com/en/2.1/topics/db/aggregation/
I have the case that I need to filter on two attributes from a related table.
class Item(models.Model):
vouchers = models.ManyToManyField()
class Voucher(models.Model):
is_active = models.BooleanField()
status = models.PositiveIntegerField()
When I query the ORM like this:
Item.objects.exclude(
vouchers__is_active=False,
vouchers__status__in=[1, 2])
The created query looks like this:
SELECT *
FROM `item`
WHERE NOT (`item`.`id` IN (
SELECT U1.`item_id`
FROM `itemvouchers` U1
INNER JOIN `voucher` U2 ON (U1.`voucher_id` = U2.`id`)
WHERE U2.`is_active` = FALSE)
AND
`item`.`id` IN (
SELECT U1.`item_id`
FROM `itemvouchers` U1
INNER JOIN `voucher` U2 ON (U1.`voucher_id` = U2.`id`)
WHERE U2.`status` IN (1, 2))
)
I want to exclude vouchers which are both inactive AND have status 1 or 2.
What the query does is creating two separate joins. This is at first unnecessary and bad for performance. Second it's just wrong.
Case:
voucher_a = Voucher.objects.create(status=3, is_active=True)
voucher_b = Voucher.objects.create(status=1, is_active=False)
If I have an item in related with voucher_a and voucher_b it does not get found because it is in JOIN 1 but not in JOIN 2.
It looks like a bug in django but I wasn't able to find anything useful on the web to this topic.
We are on django==2.1.1 and tried out switching exclude with filter or using Q-expressions. Nothing worked so far.
Your setup is an m2m relation, and you want to exclude any single object that has at least one m2m relation for which this AND combination of conditions is true.
M2M relationships are special when it comes to filter/exclude querysets, see https://docs.djangoproject.com/en/2.1/topics/db/queries/#spanning-multi-valued-relationships
Also note in that documentation:
The behavior of filter() for queries that span multi-value relationships, as described above, is not implemented equivalently for exclude(). Instead, the conditions in a single exclude() call will not necessarily refer to the same item.
The solution presented in the documentation is the following:
Blog.objects.exclude(
entry__in=Entry.objects.filter(
headline__contains='Lennon',
pub_date__year=2008,
),
)
I need to filter for objects where the number of elements in a ManyToMany relationship matches a condition. Here's some simplified models:
Place(models.Model):
name = models.CharField(max_length=100)
Person(models.Model):
type = models.CharField(max_length=1)
place = models.ManyToManyField(Place, related_name="people")
I tried to do this:
c = Count(Q(people__type='V'))
p = Places.objects.annotate(v_people=c)
But this just makes the .v_people attribute count the number of People.
Since python-2.0, you can use the filter=... parameter of the Count(..) function [Django-doc] for this:
Place.objects.annotate(
v_people=Count('people', filter=Q(people__type='V'))
)
So this will assign to v_people the number of people with type='V' for that specific Place object.
An alternative is to .filter(..) the relation first:
Place.objects.filter(
Q(people__type='V') | Q(people__isnull=True)
).annotate(
v_people=Count('people')
)
Here we thus filter the relation such that we allow people that either have type='V', or with no people at all (since it is possible that the Place has no people. We then count the related model.
This generates a query like:
SELECT `place`.*, COUNT(`person_place`.`person_id`) AS `v_people`
FROM `place`
LEFT OUTER JOIN `person_place` ON `place`.`id` = `person_place`.`place_id`
LEFT OUTER JOIN `person` ON `person_place`.`person_id` = `person`.`id`
WHERE `person`.`type` = V OR `person_place`.`person_id` IS NULL
I have a QuerySet object with 100 items, for each of them I need to know how many times a particular contract_number occurs in the contract_number field.
Example of expected output:
[{'contract_number': 123, 'contract_count': 2}, {'contract_number': 456, 'contract_count': 1} ...]
This means that value 123 occurs 2 times for the whole contract_number field.
Important thing: I cannot reduce the amount of items, so grouping won't work here.
The SQL equivalent for this would be an additional field contract_count as below:
SELECT *,
(SELECT count(contract_number) FROM table where t.contract_number = contract_number) as contract_count
FROM table as t
The question is how to do it with a Python object. After some research, I have found out that for more complex queries the Queryset extra method should be used. Below is one of my tries, but the result is not what I have expected
queryset = Tracker.objects.extra(
select={
'contract_count': '''
SELECT COUNT(*)
FROM table
WHERE contract_number = %s
'''
},select_params=(F('contract_number'),),)
My models.py:
class Tracker(models.Model):
contract_number = models.IntegerField()
EDIT:
The solution to my problem was Subquery()
You can use annotation like this:
from django.db.models import Count
Tracker.objects.values('contract_number').annotate(contract_count=Count('contract_number')).order_by()
Solutions:
counttraker=Traker.objects.values('contract_number').annotate(Count('contract_number'))
subquery=counttraker.filter(contract_number=OuterRef('contract_number').values('contract_number__count')[:1]
traker=Traker.objects.annotate(count=Subquery(subquery))