Django: filter ManyToMany join

Django: filter ManyToMany join - python

I have app where subscribers are subscribing to various lists.
The domain here is: List model/SubscriberModel/ListSubscription model.
The List class definition contains the following line
subscribers = models.ManyToManyField(Subscriber, through='ListSubscription')
While this code allows me to get all subscribers, I need only some of them. the trick is that ListSubscription
class contains "is_active" boolean field identifiying subscriptions that are either active or inactive.
Is there some straighforward solution to add "is_active=True" to many to many join?
In plain SQL I would add this condition to a join clause, but not sure about Django ORM way.
The ideal result here would be ability to have a queryset to get all Lists with respective *active" subscribers.

A ManyToMany field is already a queryset, so if you want the active subscribers you can just call its filter method, perhaps via a method in the List class. The through table is available for filtering in the same way as the target table:
class List(models.Model):
# ... etc ...
#property
def active_subscribers(self):
return self.subscribers.filter(listsubscription__is_active = True)
To return lists with at least one active subscriber, use this query:
List.objects.filter(listsubscription__is_active = True)

Related

Django: How to "join" two querysets using Prefetch Object?

Context
I am quite new to Django and I am trying to write a complex query that I think would be easily writable in raw SQL, but for which I am struggling using the ORM.
Models
I have several models named SignalValue, SignalCategory, SignalSubcategory, SignalType, SignalSubtype that have the same structure like the following model:
class MyModel(models.Model):
id = models.BigAutoField(primary_key=True)
name = models.CharField()
fullname = models.CharField()
I also have explicit models that represent the relationships between the model SignalValue and the other models SignalCategory, SignalSubcategory, SignalType, SignalSubtype. Each of these relationships are named SignalValueCategory, SignalValueSubcategory, SignalValueType, SignalValueSubtype respectively. Below is the SignalValueCategory model as an example:
class SignalValueCategory(models.Model):
signal_value = models.OneToOneField(SignalValue)
signal_category = models.ForeignKey(SignalCategory)
Finally, I also have the two following models. ResultSignal stores all the signals related to the model Result:
class Result(models.Model):
pass
class ResultSignal(models.Model):
id = models.BigAutoField(primary_key=True)
result = models.ForeignKey(
Result
)
signal_value = models.ForeignKey(
SignalValue
)
Query
What I am trying to achieve is the following.
For a given Result, I want to retrieve all the ResultSignals that belong to it, filter them to keep the ones of my interest, and annotate them with two fields that we will call filter_group_id and filter_group_name. The values of two fields are determined by the SignalValue of the given ResultSignal.
From my perspective, the easiest way to achieve this would be first to annotate the SignalValues with their corresponding filter_group_name and filter_group_id, and then to join the resulting QuerySet with the ResultSignals. However, I think that it is not possible to join two QuerySets together in Django. Consequently, I thought that we could maybe use Prefetch objects to achieve what I am trying to do, but it seems that I am unable to make it work properly.
Code
I will now describe the current state of my queries.
First, annotating the SignalValues with their corresponding filter_group_name and filter_group_id. Note that filter_aggregator in the following code is just a complex filter that allows me to select the wanted SignalValues only. group_filter is the same filter but as a list of subfilters. Additionally, filter_name_case is a conditional expression (Case() construct):
# Attribute a group_filter_id and group_filter_name for each signal
signal_filters = SignalValue.objects.filter(
filter_aggregator
).annotate(
filter_group_id=Window(
expression=DenseRank(),
order_by=group_filters
),
filter_group_name=filter_name_case
)
Then, trying to join/annotate the SignalResults:
prefetch_object = Prefetch(
lookup="signal_value",
queryset=signal_filters,
to_attr="test"
)
result_signals: QuerySet = (
last_interview_result
.resultsignal_set
.filter(signal_value__in=signal_values_of_interest)
.select_related(
'signal_value__signalvaluecategory__signal_category',
'signal_value__signalvaluesubcategory__signal_subcategory',
'signal_value__signalvaluetype__signal_type',
'signal_value__signalvaluesubtype__signal_subtype',
)
.prefetch_related(
prefetch_object
)
.values(
"signal_value",
"test",
category=F('signal_value__signalvaluecategory__signal_category__name'),
subcategory=F('signal_value__signalvaluesubcategory__signal_subcategory__name'),
type=F('signal_value__signalvaluetype__signal_type__name'),
subtype=F('signal_value__signalvaluesubtype__signal_subtype__name'),
)
)
Normally, from my understanding, the resulting QuerySet should have a field "test" that is now available, that would contain the fields of signal_filter, the first QuerySet. However, Django complains that "test" is not found when calling .values(...) in the last part of my code: Cannot resolve keyword 'test' into field. Choices are: [...]. It is like the to_attr parameter of the Prefetch object was not taken into account at all.
Questions
Did I missunderstand the functioning of annotate() and prefetch_related() functions? If not, what am I doing wrong in my code for the specified parameter to_attr to not exist in my resulting QuerySet?
Is there a better way to join two QuerySets in Django or am I better off using RawSQL? An alternative way would be to switch to Pandas to make the join in-memory, but it is very often more efficient to do such transformations on the SQL side with well-designed queries.

You're on the right path, but just missing what prefetch does.
Your annotations are correct, but the "test" prefetch isn't really an attribute. You batch up the SELECT * FROM signal_value queries so you don't have to execute the select per row. Just drop the "test" annotation and you should be fine. https://docs.djangoproject.com/en/3.2/ref/models/querysets/#prefetch-related
Please don't use pandas, it's definitely not necessary and is a ton of overhead. As you say yourself, it's more efficient to do the transforms on the sql side

From the docs on prefetch_related:
Remember that, as always with QuerySets, any subsequent chained methods which imply a different database query will ignore previously cached results, and retrieve data using a fresh database query.
It's not obvious but the values() call is part of these chained methods that imply a different query, and will actually cancel prefetch_related. This should work if you remove it.

Django filter related field using related model's custom manager

How can I apply annotations and filters from a custom manager queryset when filtering via a related field? Here's some code to demonstrate what I mean.
Manager and models
from django.db.models import Value, BooleanField
class OtherModelManager(Manager):
def get_queryset(self):
return super(OtherModelManager, self).get_queryset().annotate(
some_flag=Value(True, output_field=BooleanField())
).filter(
disabled=False
)
class MyModel(Model):
other_model = ForeignKey(OtherModel)
class OtherModel(Model):
disabled = BooleanField()
objects = OtherModelManager()
Attempting to filter the related field using the manager
# This should only give me MyModel objects with related
# OtherModel objects that have the some_flag annotation
# set to True and disabled=False
my_model = MyModel.objects.filter(some_flag=True)
If you try the above code you will get the following error:
TypeError: Related Field got invalid lookup: some_flag
To further clarify, essentially the same question was reported as a bug with no response on how to actually achieve this: https://code.djangoproject.com/ticket/26393.
I'm aware that this can be achieved by simply using the filter and annotation from the manager directly in the MyModel filter, however the point is to keep this DRY and ensure this behaviour is repeated everywhere this model is accessed (unless explicitly instructed not to).

How about running nested queries (or two queries, in case your backend is MySQL; performance).
The first to fetch the pk of the related OtherModel objects.
The second to filter the Model objects on the fetched pks.
other_model_pks = OtherModel.objects.filter(some_flag=...).values_list('pk', flat=True)
my_model = MyModel.objects.filter(other_model__in=other_model_pks)
# use (...__in=list(other_model_pks)) for MySQL to avoid a nested query.

I don't think what you want is possible.
1) I think you are miss-understanding what annotations do.
Generating aggregates for each item in a QuerySet
The second way to generate summary values is to generate an
independent summary for each object in a QuerySet. For example, if you
are retrieving a list of books, you may want to know how many authors
contributed to each book. Each Book has a many-to-many relationship
with the Author; we want to summarize this relationship for each book
in the QuerySet.
Per-object summaries can be generated using the annotate() clause.
When an annotate() clause is specified, each object in the QuerySet
will be annotated with the specified values.
The syntax for these annotations is identical to that used for the
aggregate() clause. Each argument to annotate() describes an aggregate
that is to be calculated.
So when you say:
MyModel.objects.annotate(other_model__some_flag=Value(True, output_field=BooleanField()))
You are not annotation some_flag over other_model.
i.e. you won't have: mymodel.other_model.some_flag
You are annotating other_model__some_flag over mymodel.
i.e. you will have: mymodel.other_model__some_flag
2) I'm not sure how familiar SQL is for you, but in order to preserve MyModel.objects.filter(other_model__some_flag=True) possible, i.e. to keep the annotation when doing JOINS, the ORM would have to do a JOIN over subquery, something like:
INNER JOIN
(
SELECT other_model.id, /* more fields,*/ 1 as some_flag
FROM other_model
) as sub on mymodel.other_model_id = sub.id
which would be super slow and I'm not surprised they are not doing it.
Possible solution
don't annotate your field, but add it as a regular field in your model.

The simplified answer is that models are authoritative on the field collection and Managers are authoritative on collections of models. In your efforts to make it DRY you made it WET, cause you alter the field collection in your manager.
In order to fix it, you would have to teach the model about the lookup and need to do that using the Lookup API.
Now I'm assuming that you're not actually annotating with a fixed value, so if that annotation is in fact reducible to fields, then you may just get it done, because in the end it needs to be mapped to database representation.

How to combine two querysets when defining choices in a ModelMultipleChoiceField?

I am using a piece of code in two separate places in order to dynamically generate some form fields. In both cases, dynamic_fields is a dictionary where the keys are objects and the values are lists of objects (in the event of an empty list, the value is False instead):
class ExampleForm(forms.ModelForm):
def __init__(self, *args, **kwargs):
dynamic_fields = kwargs.pop('dynamic_fields')
super(ExampleForm, self).__init__(*args, **kwargs)
for key in dynamic_fields:
if dynamic_fields[key]:
self.fields[key.description] = forms.ModelMultipleChoiceField(widget=forms.CheckboxSelectMultiple, queryset=dynamic_fields[key], required=False)
class Meta:
model = Foo
fields = ()
In one view, for any key the value is a list of objects returned with a single DB query - a single, normal queryset. This view works just fine.
In the other view, it takes multiple queries to get everything I need to construct a given value. I am first instantiating the dictionary with the values set equal to blank lists, then adding the querysets I get from these multiple queries to the appropriate lists one at a time with basic list comprehension (dict[key] += queryset). This makes each value a 2-D list, which I then flatten (and remove duplicates) by doing:
for key in dict:
dict[key] = list(set(dict[key]))
I have tried this several different ways - directly appending the queries in each queryset to the values/lists, leaving it as a list of lists, using append instead of += - but I get the same error every time: 'list' object has no attribute 'none'.
Looking through the traceback, the error is coming up in the form's clean method. This is the relevant section from the code in django.forms.models:
def clean(self, value):
if self.required and not value:
raise ValidationError(self.error_messages['required'], code='required')
elif not self.required and not value:
return self.queryset.none() # the offending line
My thought process so far: in my first view, I'm generating the list that serves as the value for each key via a single query, but I'm combining multiple queries into a list in my second view. That list doesn't have a none method like I would normally have with a single queryset.
How do I combine multiple querysets without losing access to this method?
I found this post, but I'm still running into the same issue using itertools.chain as suggested there. The only thing I've been able to accomplish with that is changing the error to say 'chain' or 'set' object has no attribute 'none'.
Edit: here's some additional information about how the querysets are generated. I have the following models (only relevant fields are shown):
class Profile(models.Model):
user = models.OneToOneField(User)
preferred_genres = models.ManyToManyField(Genre, blank=True)
class Genre(models.Model):
description = models.CharField(max_length=200, unique=True)
parent = models.ForeignKey("Genre", null=True, blank=True)
class Trope(models.Model):
description = models.CharField(max_length=200, unique=True)
genre_relation = models.ManyToManyField(Genre)
In (the working) view #1, the dictionary I use to generate my fields has keys equal to a certain Genre, and values equal to a list of Genres for whom the key is a parent. In other words, for every key, the queryset is Genre.objects.filter(parent=key, **kwargs).
In the non-functional view #2, we need to start with the profile's preferred_genres field. For every preferred_genre I need to pull the associated Tropes and combine them into a single queryset. Right now, I am looping through preferred_genres and doing something like this:
for g in preferred_genres:
tropeset = g.trope_set.all()
This gets me a bunch of individual querysets containing the information I need, but I can't find a way to combine the multiple tropesets into one big queryset (as opposed to a list without the none attribute). (As an aside, this also hammers my database with a bunch of queries. I am also trying to wrap my head around how I can maybe use prefetch_related to reduce the number of queries, but one thing at a time.)
If I can't combine these querysets into one but CAN somehow accomplish these lookups with a single query, I am all ears! I am now reading the documentation regarding complex queries with the Q object. It is tantalizing - I can conceptualize how this would accomplish what I'm looking for, but only if I can call all of the queries at one time. Since I have to call them iteratively one at a time, I am not sure how to use the Q object to | or & them together.

You can combine querysets by using the | and & operators.
from functools import reduce
from operator import and_, or_
querysets = [q1, q2, q3, ...] # List of querysets you want to combine.
# Objects that are present in *at least one* of the queries
combined_or_querysets = reduce(or_, querysets[1:], querysets[0])
# Objects that are present in *all* of the queries
combined_and_querysets = reduce(and_, querysets[1:], querysets[0])
From Django 1.11+ you can also use the union and intersection methods.

I've found a "solution" to this problem. If I structure the query like so, I can get everything I need in one swoop without having to combine querysets after the fact:
desired_value = Trope.objects.filter(genre_relation__in=preferred_genres).distinct()
I still do not know how to combine multiple querysets into one without losing the inherent "queryset-ness" that seems to be necessary for the form to render properly. However, for my specific use case, restructuring the query as noted renders the issue moot.

How to filter Haystack SearchQuerySets by related models

How do you filter/join a Haystack SearchQuerySet by related model fields?
I have a query like:
sqs = SearchQuerySet().models(models.Person)
and this returns the same results that the equivalent admin page returns.
However, if I try and filter by model records linked by a foreign key:
sqs = sqs.filter(workplace__role__name='teacher')
it returns nothing, even though the page /admin/myapp/person/?workplace__role__name=teacher returns several records.
I don't want to do any full-text searching of these related models. I only want to do a simple exact-match filter. Is that possible with Haystack?

You cannot perform joins using a search engine like the ones supported by haystack.
To make queries like this you need to add the information you want to filter on in a "denormalized" fashion in your search index:
class ProfileIndex(indexes.SearchIndex, indexes.Indexable):
# your other fields, most likely model attributes
role_name = indexes.CharField()
def get_model(self):
return Person
def prepare_role_name(self, person):
return person.workplace.role_name
Then you can filter on a field role_name. Just make sure to update your index if eg. the name changes, then you have to update all the according entries in the search index.

You can also do this:
class ProfileIndex(indexes.SearchIndex, indexes.Indexable):
# your other fields, most likely model attributes
role_name = indexes.CharField(model_attr='workplace__role__name')
def get_model(self):
return Person
And you can filter by role_name.
I saw it here. http://django-haystack.readthedocs.org/en/latest/searchindex_api.html

Tastypie Dehydrate reverse relation count

I have a simple model which includes a product and category table. The Product model has a foreign key Category.
When I make a tastypie API call that returns a list of categories /api/vi/categories/
I would like to add a field that determines the "product count" / the number of products that have a giving category. The result would be something like:
category_objects[
{
id: 53
name: Laptops
product_count: 7
},
...
]
The following code is working but the hit on my DB is heavy
def dehydrate(self, bundle):
category = Category.objects.get(pk=bundle.obj.id)
products = Product.objects.filter(category=category)
bundle.data['product_count'] = products.count()
return bundle
Is there a more efficient way to build this query? Perhaps with annotate ?

You can use prefetch_related method of QuerSet to reverse select_related.
Asper documentation,
prefetch_related(*lookups)
Returns a QuerySet that will automatically
retrieve, in a single batch, related objects for each of the specified
lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different.
If you change your dehydrate function to following then database will be hit single time.
def dehydrate(self, bundle):
category = Category.objects.prefetch_related("product_set").get(pk=bundle.obj.id)
bundle.data['product_count'] = category.product_set.count()
return bundle
UPDATE 1
You should not initialize queryset inside dehydrate function. queryset should be always set in Meta class only. Please have a look at following example from django-tastypie documentation.
class MyResource(ModelResource):
class Meta:
queryset = User.objects.all()
excludes = ['email', 'password', 'is_staff', 'is_superuser']
def dehydrate(self, bundle):
# If they're requesting their own record, add in their email address.
if bundle.request.user.pk == bundle.obj.pk:
# Note that there isn't an ``email`` field on the ``Resource``.
# By this time, it doesn't matter, as the built data will no
# longer be checked against the fields on the ``Resource``.
bundle.data['email'] = bundle.obj.email
return bundle
As per official django-tastypie documentation on dehydrate() function,
dehydrate
The dehydrate method takes a now fully-populated bundle.data & make
any last alterations to it. This is useful for when a piece of data
might depend on more than one field, if you want to shove in extra
data that isn’t worth having its own field or if you want to
dynamically remove things from the data to be returned.
dehydrate() is only meant to make any last alterations to bundle.data.

Your code does additional count query for each category. You're right about annotate being helpfull in this kind of a problem.
Django will include all queryset's fields in GROUP BY statement. Notice .values() and empty .group_by() serve limiting field set to required fields.
cat_to_prod_count = dict(Product.objects
.values('category_id')
.order_by()
.annotate(product_count=Count('id'))
.values_list('category_id', 'product_count'))
The above dict object is a map [category_id -> product_count].
It can be used in dehydrate method:
bundle.data['product_count'] = cat_to_prod_count[bundle.obj.id]
If that doesn't help, try to keep similar counter on category records and use singals to keep it up to date.
Note categories are usually a tree-like beings and you probably want to keep count of all subcategories as well.
In that case look at the package django-mptt.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.