Django ORM turns two conditions on a related table into two separate JOINs

I need to filter on two attributes of a related table:
from django.db import models
class Item(models.Model):
    vouchers = models.ManyToManyField('Voucher')
class Voucher(models.Model):
    is_active = models.BooleanField()
    status = models.PositiveIntegerField()
When I query the ORM like this:
Item.objects.exclude(
vouchers__is_active=False,
vouchers__status__in=[1, 2])
The created query looks like this:
SELECT *
FROM `item`
WHERE NOT (`item`.`id` IN (
SELECT U1.`item_id`
FROM `itemvouchers` U1
INNER JOIN `voucher` U2 ON (U1.`voucher_id` = U2.`id`)
WHERE U2.`is_active` = FALSE)
AND
`item`.`id` IN (
SELECT U1.`item_id`
FROM `itemvouchers` U1
INNER JOIN `voucher` U2 ON (U1.`voucher_id` = U2.`id`)
WHERE U2.`status` IN (1, 2))
)
I want to exclude items that have a voucher which is both inactive AND has status 1 or 2.
Instead, the query creates two separate subqueries, each with its own JOIN. First, this is unnecessary and bad for performance. Second, it is simply wrong.
Case:
voucher_a = Voucher.objects.create(status=1, is_active=True)
voucher_b = Voucher.objects.create(status=3, is_active=False)
If an item is related to both voucher_a and voucher_b, it is not returned: it matches the first subquery (through voucher_b) and the second one (through voucher_a), even though no single voucher is both inactive and in status 1 or 2.
It looks like a bug in Django, but I wasn't able to find anything useful on the web about this topic.
We are on django==2.1.1 and have tried swapping exclude for filter as well as using Q expressions. Nothing has worked so far.

Your setup is an m2m relation, and you want to exclude every item that has at least one related voucher for which both conditions are true at the same time.
M2M relationships are special when it comes to filter/exclude querysets, see https://docs.djangoproject.com/en/2.1/topics/db/queries/#spanning-multi-valued-relationships
Also note in that documentation:
The behavior of filter() for queries that span multi-value relationships, as described above, is not implemented equivalently for exclude(). Instead, the conditions in a single exclude() call will not necessarily refer to the same item.
The solution presented in the documentation is the following:
Blog.objects.exclude(
entry__in=Entry.objects.filter(
headline__contains='Lennon',
pub_date__year=2008,
),
)
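To see why the two-subquery SQL from the question gives the wrong answer, and why the documented workaround fixes it, here is a small sketch using Python's built-in sqlite3 module. The table names follow the SQL in the question; the rows are hypothetical, and the second query is equivalent in meaning to the workaround rather than the exact SQL Django emits for it. For the models in the question, the workaround would presumably read Item.objects.exclude(vouchers__in=Voucher.objects.filter(is_active=False, status__in=[1, 2])).

```python
import sqlite3

# Hypothetical data; table names follow the SQL in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY);
CREATE TABLE voucher (id INTEGER PRIMARY KEY, is_active INTEGER, status INTEGER);
CREATE TABLE itemvouchers (item_id INTEGER, voucher_id INTEGER);
INSERT INTO item (id) VALUES (1), (2), (3);
-- voucher 10: active, status 1; voucher 11: inactive, status 3
-- voucher 20: inactive, status 2 (the only one matching both conditions)
INSERT INTO voucher (id, is_active, status) VALUES (10, 1, 1), (11, 0, 3), (20, 0, 2);
-- item 1 has vouchers 10 and 11, item 2 has voucher 20, item 3 has none
INSERT INTO itemvouchers (item_id, voucher_id) VALUES (1, 10), (1, 11), (2, 20);
""")

# Shape of the query Django generated: one subquery per condition, so the
# two conditions may be satisfied by different vouchers of the same item.
TWO_SUBQUERIES = """
SELECT id FROM item
WHERE NOT (
    id IN (SELECT U1.item_id FROM itemvouchers U1
           INNER JOIN voucher U2 ON U1.voucher_id = U2.id
           WHERE U2.is_active = 0)
    AND id IN (SELECT U1.item_id FROM itemvouchers U1
               INNER JOIN voucher U2 ON U1.voucher_id = U2.id
               WHERE U2.status IN (1, 2))
)
ORDER BY id
"""

# Same meaning as the documented workaround: both conditions are checked
# against the same voucher row inside a single subquery.
ONE_SUBQUERY = """
SELECT id FROM item
WHERE id NOT IN (
    SELECT U1.item_id FROM itemvouchers U1
    INNER JOIN voucher U2 ON U1.voucher_id = U2.id
    WHERE U2.is_active = 0 AND U2.status IN (1, 2)
)
ORDER BY id
"""

two_subquery_ids = [row[0] for row in conn.execute(TWO_SUBQUERIES)]
one_subquery_ids = [row[0] for row in conn.execute(ONE_SUBQUERY)]
print(two_subquery_ids)  # [3]: item 1 is wrongly excluded
print(one_subquery_ids)  # [1, 3]
```

Only item 2 has a voucher that satisfies both conditions, so only the single-subquery form excludes exactly that item.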

Related

How to do an exclude django query on multiple foreign key

My model example:
from django.db import models
from django.utils.translation import gettext_lazy as _
class Thing(models.Model):
    alpha = models.ForeignKey('auth.User', on_delete=models.CASCADE,
                              related_name='alpha_thing')
    beta = models.ForeignKey('auth.User', on_delete=models.CASCADE,
                             related_name='beta_thing')
    assigned_at = models.DateTimeField(
        _('assigned at'),
        null=True,
        help_text=_('Assigned at this date'))
I want to query all users who don't have a Thing without an assigned_at date, i.e. they may have Things, but every one of them must have a date set.
I've tried:
return User.objects.exclude(
alpha_thing__assigned_at__isnull=True
).exclude(
beta_thing__assigned_at__isnull=True
).all()
but the result is empty (the thing table is empty, so I'm not sure whether it has something to do with the join?).
What about this,
from django.db.models import Q
User.objects.filter(Q(alpha_thing__assigned_at__isnull=False) | Q(beta_thing__assigned_at__isnull=False)).distinct()
There is another way, since you want to filter users whose things all have an assigned_at date.
You could:
User.objects.filter(
alpha_thing__assigned_at__isnull=False,
beta_thing__assigned_at__isnull=False,
)
Simple.
There is no need to use Q objects or | (or) operations here.
What you want is not
alpha_thing__assigned_at__isnull=False OR
beta_thing__assigned_at__isnull=False
What you're looking for is
alpha_thing__assigned_at__isnull=False AND
beta_thing__assigned_at__isnull=False
For all users who don't have a Thing with an empty date, try:
return User.objects.exclude(
alpha_thing__assigned_at=None
).exclude(
beta_thing__assigned_at=None
).all()
By the way, I got the same result whether I used .all() at the end or not, so:
return User.objects.exclude(
alpha_thing__assigned_at=None
).exclude(
beta_thing__assigned_at=None
)
returned the same result as the first example.
Have you tried something like this?
from django.db.models import Q
has_null_alpha = Q(alpha_thing__isnull=False, alpha_thing__assigned_at__isnull=True)
has_null_beta = Q(beta_thing__isnull=False, beta_thing__assigned_at__isnull=True)
User.objects.exclude(has_null_alpha | has_null_beta)
Reasoning
I think the reason you're seeing unexpected results may not have anything to do with the fact that there are multiple ForeignKey paths in the queryset. Your statement that "the thing table is empty" might be the key, and the reason users aren't showing up is because they have no alpha_thing or beta_thing relation.
NOTES:
The QuerySet User.objects.exclude(alpha_thing__assigned_at__isnull=True) produces a left outer join between the User table and the Thing table, which means that before doing any comparisons in the WHERE clause, you're getting NULL for assigned_at in any row where there is no Thing.
One surprising detail is that filter() produces an INNER JOIN, so User.objects.filter(alpha_thing__assigned_at__isnull=False) yields only the users who actually have related alpha_thing objects with a non-NULL assigned_at (users with no related alpha_thing are left out).
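The NULL-padding effect described in these notes can be reproduced in plain SQL with sqlite3. This sketch models only the alpha side, uses hypothetical table contents, and is not the exact SQL Django generates; it contrasts a row-level test after a LEFT OUTER JOIN with a NOT EXISTS subquery expressing "no Thing of this user has a NULL date".

```python
import sqlite3

# Hypothetical stand-ins for auth_user and the Thing table (alpha side only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
CREATE TABLE thing (id INTEGER PRIMARY KEY, alpha_id INTEGER, assigned_at TEXT);
INSERT INTO auth_user (id) VALUES (1), (2), (3);
""")

# Row-level test after a LEFT OUTER JOIN: a user without Things is padded
# with assigned_at = NULL, so requiring a non-NULL date drops that user.
OUTER_JOIN = """
SELECT DISTINCT u.id FROM auth_user u
LEFT OUTER JOIN thing t ON t.alpha_id = u.id
WHERE NOT (t.assigned_at IS NULL)
ORDER BY u.id
"""

# Intended meaning: no Thing of this user has a NULL assigned_at.
NO_UNDATED_THING = """
SELECT u.id FROM auth_user u
WHERE NOT EXISTS (
    SELECT 1 FROM thing t
    WHERE t.alpha_id = u.id AND t.assigned_at IS NULL
)
ORDER BY u.id
"""

def ids(sql):
    return [row[0] for row in conn.execute(sql)]

# With an empty thing table.
empty_outer = ids(OUTER_JOIN)           # []
empty_intended = ids(NO_UNDATED_THING)  # [1, 2, 3]

# User 1 gets an undated Thing, user 2 a dated one; user 3 still has none.
conn.execute(
    "INSERT INTO thing (id, alpha_id, assigned_at) "
    "VALUES (1, 1, NULL), (2, 2, '2020-01-01')"
)
filled_outer = ids(OUTER_JOIN)           # [2]
filled_intended = ids(NO_UNDATED_THING)  # [2, 3]
```

The outer-join version also drops user 3, who has no Things at all, which matches the empty result reported in the question.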

Django filter by the number of rows matching a certain condition in a ManyToMany

I need to filter for objects where the number of elements in a ManyToMany relationship matches a condition. Here are some simplified models:
class Place(models.Model):
    name = models.CharField(max_length=100)
class Person(models.Model):
    type = models.CharField(max_length=1)
    place = models.ManyToManyField(Place, related_name="people")
I tried to do this:
from django.db.models import Count, Q
c = Count(Q(people__type='V'))
p = Place.objects.annotate(v_people=c)
But this just makes the .v_people attribute count the number of People.
Since Django 2.0, you can use the filter=... parameter of the Count(..) aggregate [Django-doc] for this:
Place.objects.annotate(
v_people=Count('people', filter=Q(people__type='V'))
)
So this will assign to v_people the number of people with type='V' for that specific Place object.
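What the conditional Count does at the SQL level can be sketched with Python's sqlite3 module. On databases that support it, Django renders filter= as a FILTER (WHERE ...) clause, and otherwise as an equivalent CASE WHEN expression; the sketch uses the CASE WHEN form, hypothetical rows, and table names matching the generated SQL shown later in this answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE person_place (person_id INTEGER, place_id INTEGER);
INSERT INTO place (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO person (id, type) VALUES (1, 'V'), (2, 'W'), (3, 'W');
-- A: one V and one W person; B: only a W person; C: nobody
INSERT INTO person_place (person_id, place_id) VALUES (1, 1), (2, 1), (3, 2);
""")

# COUNT ignores NULLs, so only joined rows whose person has type 'V' count.
rows = conn.execute("""
SELECT place.name,
       COUNT(CASE WHEN person.type = 'V' THEN person_place.person_id END) AS v_people
FROM place
LEFT OUTER JOIN person_place ON place.id = person_place.place_id
LEFT OUTER JOIN person ON person_place.person_id = person.id
GROUP BY place.id, place.name
ORDER BY place.name
""").fetchall()
print(rows)  # [('A', 1), ('B', 0), ('C', 0)]
```

Every place stays in the result; places without a matching person simply get 0.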
An alternative is to .filter(..) the relation first:
Place.objects.filter(
Q(people__type='V') | Q(people__isnull=True)
).annotate(
v_people=Count('people')
)
Here we filter the relation so that we keep only people with type='V', or rows with no people at all (since a Place may have no people). We then count the related model.
This generates a query like:
SELECT `place`.*, COUNT(`person_place`.`person_id`) AS `v_people`
FROM `place`
LEFT OUTER JOIN `person_place` ON `place`.`id` = `person_place`.`place_id`
LEFT OUTER JOIN `person` ON `person_place`.`person_id` = `person`.`id`
WHERE `person`.`type` = V OR `person_place`.`person_id` IS NULL
GROUP BY `place`.`id`
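One caveat with the filter-first approach becomes visible in the same kind of sqlite3 sketch (hypothetical data, table names as in the query above): the WHERE clause runs before the aggregation, so a place whose people are all of some other type loses all its joined rows and disappears from the result instead of getting v_people = 0. The Count(..., filter=...) version keeps such places.

```python
import sqlite3

# Hypothetical data: A has one V and one W person, B only a W person, C nobody.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, type TEXT);
CREATE TABLE person_place (person_id INTEGER, place_id INTEGER);
INSERT INTO place (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO person (id, type) VALUES (1, 'V'), (2, 'W'), (3, 'W');
INSERT INTO person_place (person_id, place_id) VALUES (1, 1), (2, 1), (3, 2);
""")

# Filter the joined rows first, then count what is left per place.
rows = conn.execute("""
SELECT place.name, COUNT(person_place.person_id) AS v_people
FROM place
LEFT OUTER JOIN person_place ON place.id = person_place.place_id
LEFT OUTER JOIN person ON person_place.person_id = person.id
WHERE person.type = 'V' OR person_place.person_id IS NULL
GROUP BY place.id, place.name
ORDER BY place.name
""").fetchall()
print(rows)  # [('A', 1), ('C', 0)]: place B is missing entirely
```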

Simple Subquery with OuterRef

I am trying to make a very simple Subquery that uses OuterRef (not for practical purposes, but just to get it working), but I keep running into the same error.
posts/models.py code
from django.db import models
class Tag(models.Model):
    name = models.CharField(max_length=120)
    def __str__(self):
        return self.name
class Post(models.Model):
    title = models.CharField(max_length=120)
    tags = models.ManyToManyField(Tag)
    def __str__(self):
        return self.title
manage.py shell code
>>> from django.db.models import OuterRef, Subquery
>>> from posts.models import Tag, Post
>>> tag1 = Tag.objects.create(name='tag1')
>>> post1 = Post.objects.create(title='post1')
>>> post1.tags.add(tag1)
>>> Tag.objects.filter(post=post1.pk)
<QuerySet [<Tag: tag1>]>
>>> tags_list = Tag.objects.filter(post=OuterRef('pk'))
>>> Post.objects.annotate(count=Subquery(tags_list.count()))
The last two lines should give me the number of tags for each Post object, but instead I keep getting the same error:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
One of the problems with your example is that you cannot use queryset.count() as a subquery, because .count() tries to evaluate the queryset and return the count.
So one may think that the right approach would be to use Count() instead. Maybe something like this:
Post.objects.annotate(
count=Count(Tag.objects.filter(post=OuterRef('pk')))
)
This won't work for two reasons:
1) The Tag queryset selects all Tag fields, while Count can only count one field. Thus Tag.objects.filter(post=OuterRef('pk')).only('pk') is needed (so that the count is done on tag.pk).
2) Count itself is not a Subquery class; Count is an Aggregate. So the expression generated by Count is not recognized as a subquery (OuterRef requires one). We can fix that by wrapping the queryset in Subquery.
Applying fixes for 1) and 2) would produce:
Post.objects.annotate(
count=Count(Subquery(Tag.objects.filter(post=OuterRef('pk')).only('pk')))
)
However, if you inspect the query being produced:
SELECT
"tests_post"."id",
"tests_post"."title",
COUNT((SELECT U0."id"
FROM "tests_tag" U0
INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."post_id" = ("tests_post"."id"))
) AS "count"
FROM "tests_post"
GROUP BY
"tests_post"."id",
"tests_post"."title"
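you will notice a GROUP BY clause. This is because COUNT is an aggregate function. Worse, COUNT here wraps a scalar subquery that is evaluated once per post, so it counts at most one value per post instead of the number of tags, and a post with several tags makes that subquery return several rows, which databases such as PostgreSQL reject. That's why the docs suggest a different approach, where the aggregation is moved into the subquery via a specific combination of values + annotate + values: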
Post.objects.annotate(
count=Subquery(
Tag.objects
.filter(post=OuterRef('pk'))
# The first .values call defines our GROUP BY clause
# It's important to filter on every field defined here,
# otherwise you will get more than one group per row,
# which makes the subquery return more than one row,
# and a subquery used as an expression must not do that!
# In our example we group only by post
# and we filter by post via OuterRef
.values('post')
# Here we say: count how many rows we have per group
.annotate(count=Count('pk'))
# Here we say: return only the count
.values('count')
)
)
Finally this will produce:
SELECT
"tests_post"."id",
"tests_post"."title",
(SELECT COUNT(U0."id") AS "count"
FROM "tests_tag" U0
INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."post_id" = ("tests_post"."id")
GROUP BY U1."post_id"
) AS "count"
FROM "tests_post"
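Both query shapes above can be compared with a small sqlite3 sketch. The table names follow the printed SQL; the data are hypothetical (post 1 has two tags, post 2 has none). SQLite silently uses the first row of a multi-row scalar subquery, which is why the naive form returns 1 here instead of failing. The sketch also shows that a GROUP BY subquery over no rows yields NULL rather than 0, so in Django one would presumably wrap the Subquery in Coalesce(..., 0) from django.db.models.functions to get 0 for untagged posts.

```python
import sqlite3

# Hypothetical data: post 1 has two tags, post 2 has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tests_post (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE tests_tag (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tests_post_tags (post_id INTEGER, tag_id INTEGER);
INSERT INTO tests_post (id, title) VALUES (1, 'post1'), (2, 'post2');
INSERT INTO tests_tag (id, name) VALUES (1, 'tag1'), (2, 'tag2');
INSERT INTO tests_post_tags (post_id, tag_id) VALUES (1, 1), (1, 2);
""")

# COUNT wrapped around a scalar subquery: one value per post, so at most 1.
naive = conn.execute("""
SELECT p.id,
       COUNT((SELECT U0.id FROM tests_tag U0
              INNER JOIN tests_post_tags U1 ON U0.id = U1.tag_id
              WHERE U1.post_id = p.id)) AS tag_count
FROM tests_post p
GROUP BY p.id, p.title
ORDER BY p.id
""").fetchall()

# Aggregation moved inside the correlated subquery, as in the final SQL above.
TAG_COUNT = """(SELECT COUNT(U0.id) FROM tests_tag U0
                INNER JOIN tests_post_tags U1 ON U0.id = U1.tag_id
                WHERE U1.post_id = p.id
                GROUP BY U1.post_id)"""
correct = conn.execute(f"""
SELECT p.id, {TAG_COUNT} AS tag_count, COALESCE({TAG_COUNT}, 0) AS tag_count_or_zero
FROM tests_post p
ORDER BY p.id
""").fetchall()

print(naive)    # [(1, 1), (2, 0)]
print(correct)  # [(1, 2, 2), (2, None, 0)]
```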
The django-sql-utils package makes this kind of subquery aggregation simple. Just pip install django-sql-utils and then:
from sql_util.utils import SubqueryCount
posts = Post.objects.annotate(
    tag_count=SubqueryCount('tags'))
The API for SubqueryCount is the same as Count, but it generates a subselect in the SQL instead of joining to the related table.

Django filtering AND loop in a Django on M2M field

I have a list of IDs I need to query and filter (using AND) in Django. I would like to use something along the lines of example 2 below, but it incorrectly returns 0 results. The models are simple: many Products can have many Tags. What is wrong with example 2?
Correct Results
Example 1:
q = Product.objects.all()
for id in _list_of_ids:
    q = q.filter(tags__id=id)
Example 2:
Incorrect results but seems better (edited for brevity) ...
q = Q()
for id in _list_of_ids:
    q &= Q(tags__id=id)
# q = (AND: ('tags__id', 1), ('tags__id', 2))
Product.objects.filter(q)
What you are searching for is:
from functools import reduce
products = reduce(lambda qs, p_id: qs.filter(tags=p_id), _list_of_ids, Product.objects.all())
Basically there is a difference between a single .filter call with several Q objects and multiple .filter calls each one with a single Q object.
In the first scenario you get one inner join with all Q filters applied to it.
In the second scenario you get many inner joins, each applying only one Q object.
In your case, since you are searching for a product that has a combination of multiple tags, you need one inner join per tag to find it (the second scenario), so you need multiple .filter calls.
More about that in the docs: Spanning multi-valued relationships
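The difference between the two scenarios can be seen in plain SQL with sqlite3. This is a sketch with hypothetical product_tags rows, not the exact SQL Django emits: the first query mirrors a single .filter() with both Q objects (one join), the second mirrors chained .filter() calls (one join per tag).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (id INTEGER PRIMARY KEY);
CREATE TABLE product_tags (product_id INTEGER, tag_id INTEGER);
INSERT INTO product (id) VALUES (1), (2);
-- product 1 has tags 1 and 2; product 2 only has tag 1
INSERT INTO product_tags (product_id, tag_id) VALUES (1, 1), (1, 2), (2, 1);
""")

# One join: the same product_tags row would need tag_id 1 and 2 at once.
one_join = conn.execute("""
SELECT DISTINCT p.id FROM product p
JOIN product_tags pt ON pt.product_id = p.id
WHERE pt.tag_id = 1 AND pt.tag_id = 2
""").fetchall()

# One join per tag: each condition is checked against its own row.
two_joins = conn.execute("""
SELECT DISTINCT p.id FROM product p
JOIN product_tags pt1 ON pt1.product_id = p.id
JOIN product_tags pt2 ON pt2.product_id = p.id
WHERE pt1.tag_id = 1 AND pt2.tag_id = 2
""").fetchall()

print(one_join)   # []
print(two_joins)  # [(1,)]
```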
What is the full code for Example 2?
Something like this seems like it should work...
import operator
from functools import reduce
q_expression = [Q(tags__id=id) for id in list_of_ids]
queryset = Product.objects.filter(reduce(operator.and_, q_expression))
q = Product.objects.filter(tags__id__in=list_of_ids)

How to restrict query of a ManyToMany relationship with Q.AND in Django

I want to get all images that have 2 specific tags, 'tag1' AND 'tag2'. My simplified models:
class Image(models.Model):
    title = models.CharField(max_length=100)
class Tag(models.Model):
    name = models.CharField(max_length=64, unique=True)
    images = models.ManyToManyField(Image, null=True, blank=True)
Concatenating filter works:
query = Image.objects.filter(tag__name='tag1').filter(tag__name='tag2')
However, I thought I could do it using the Q object from Django. I'm building a complex query, so using Q would be more straightforward. I'm adding all parameters to a qobj = Q() using qobj.add(Q(tag__name='tag1'), Q.AND). But... the following retrieves nothing:
qobj = Q()
qobj.add(Q(tag__name='tag1'), Q.AND)
qobj.add(Q(tag__name='tag2'), Q.AND)
query = Image.objects.filter(qobj)
Everything works as expected when using OR connector in the code above, returning correctly images that have tag1 OR tag2.
It seems that in the AND case it is looking for a row in app_tag_images with both tags, which is obviously absent, since each row has only one tag_id for a image_id.
Is there a way to build this query with Q?
ps: let me know if more details of the code are needed.
edit:
Here is the SQL of the query with Q (I removed most SELECT columns for clarity):
SELECT "meta_image"."id", "meta_image"."title"
FROM "meta_image"
INNER JOIN "meta_tag_images" ON ("meta_image"."id" = "meta_tag_images"."image_id")
INNER JOIN "meta_tag" ON ("meta_tag_images"."tag_id" = "meta_tag"."id")
WHERE ("meta_tag"."name" = tag1 AND "meta_tag"."name" = tag2)
The OR query is identical to the one above (with AND replaced by OR).
Just for reference, the working method using filter concatenating prints this query (also simplified):
SELECT "meta_image"."id", "meta_image"."title"
FROM "meta_image"
INNER JOIN "meta_tag_images" ON ("meta_image"."id" = "meta_tag_images"."image_id")
INNER JOIN "meta_tag" ON ("meta_tag_images"."tag_id" = "meta_tag"."id")
INNER JOIN "meta_tag_images" T4 ON ("meta_image"."id" = T4."image_id")
INNER JOIN "meta_tag" T5 ON (T4."tag_id" = T5."id")
WHERE ("meta_tag"."name" = tag1 AND T5."name" = tag2)
I wasn't even aware of that format!
What's wrong with the way the docs show Q object usage? http://docs.djangoproject.com/en/dev/topics/db/queries/#complex-lookups-with-q-objects
Image.objects.filter(Q(tag__name='tag1') & Q(tag__name='tag2'))
UPDATE:
I tested the qobj.add() method on my model with m2m and it works fine on 1.2.3
It also works fine copy and pasting your simplified model.
Are you sure your query is supposed to return something?
Does the standard Q usage Q(tag__name='tag1') & Q(tag__name='tag2') return results?
Can you print myquery.query as well?
Let's narrow this down.
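Besides chaining filter() calls, another common way to require all tags is a different technique: keep a single join, restrict it with IN, and compare the number of distinct matching tags per image with the number of wanted tags. In Django it would presumably look like Image.objects.filter(tag__name__in=['tag1', 'tag2']).annotate(num_tags=Count('tag', distinct=True)).filter(num_tags=2), an untested sketch. The sqlite3 code below shows the SQL idea with hypothetical rows and the table names from the printed query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE meta_image (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE meta_tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE meta_tag_images (tag_id INTEGER, image_id INTEGER);
INSERT INTO meta_image (id, title) VALUES (1, 'both'), (2, 'only tag1');
INSERT INTO meta_tag (id, name) VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3');
-- image 1 has tag1, tag2 and tag3; image 2 only has tag1
INSERT INTO meta_tag_images (tag_id, image_id) VALUES (1, 1), (2, 1), (3, 1), (1, 2);
""")

wanted = ["tag1", "tag2"]
placeholders = ", ".join("?" for _ in wanted)

# Keep only joined rows for wanted tags, then require all of them per image.
rows = conn.execute(f"""
SELECT i.id, i.title FROM meta_image i
JOIN meta_tag_images ti ON ti.image_id = i.id
JOIN meta_tag t ON t.id = ti.tag_id
WHERE t.name IN ({placeholders})
GROUP BY i.id, i.title
HAVING COUNT(DISTINCT t.id) = ?
""", [*wanted, len(wanted)]).fetchall()
print(rows)  # [(1, 'both')]
```

This needs only one join regardless of how many tags are requested, at the cost of a GROUP BY.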
