Django conditional annotation without filtering - python

I have a Student model and an Entry model. Each Entry has a foreign key to a Student, a year-stamp, and two numeric values (value1 and value2). I am overriding the get_queryset() method in the StudentAdmin class, and using the Django ORM, I want to annotate a field that we'll call "specialvalue".
Students have at most one Entry for each year, but they might have none, and they might have an Entry for years in the future. The value of "specialvalue" will be equal to Entry__value1 minus Entry__value2 for the Entry for the current year. If the Student has no Entry for the current year, then specialvalue will just be equal to None, and these Students will NOT be removed from the queryset.
How can I do this? At first I tried splitting the queryset into two:
queryset_1 = queryset.filter(entry__year=THIS_YEAR).annotate(specialvalue=...)
queryset_2 = queryset.exclude(entry__year=THIS_YEAR).annotate(specialvalue=Value(None))
and then annotating them separately and merging them with the | pipe operator, but unfortunately this results in the wrong results due to a known bug in Django's ORM.
Thank you!

I think, using Django conditional expression will be applicable solution in your case. Read about Case expression for more information (https://docs.djangoproject.com/en/1.11/ref/models/conditional-expressions/#case). All you need is to check whether an entry is related to a CURRENT_YEAR or not.
Case() accepts any number of When() objects as individual arguments. Other options are provided using keyword arguments. If none of the conditions evaluate to TRUE, then the expression given with the default keyword argument is returned. If a default argument isn’t provided, None is used.
I didn't test this solution but I think it should work:
from django.db.models import Case, When, F, IntegerField
queryset = queryset.annotate(
specialvalue=Case(
When(entry__year=THIS_YEAR,
then=F('entry__value1') - F('entry__value2')
), output_field=IntegerField()
)
).values_list('specialvalue')

Related

Django: How to annotate M2M or OneToMany fields using a SubQuery?

I have Order objects and OrderOperation objects that represent an action on a Order (creation, modification, cancellation).
Conceptually, an order has 1 to many order operations. Each time there is an operation on the order, the total is computed in this operation. Which means when I need to find an attribute of an order, I just get the last order operation attribute instead, using a Subquery.
The simplified code
class OrderOperation(models.Model):
order = models.ForeignKey(Order)
total = DecimalField(max_digits=9, decimal_places=2)
class Order(models.Model)
# ...
class OrderQuerySet(query.Queryset):
#staticmethod
def _last_oo(field):
return Subquery(OrderOperation.objects
.filter(order_id=OuterRef("pk"))
.order_by('-id')
.values(field)
[:1])
def annotated_total(self):
return self.annotate(oo_total=self._last_oo('total'))
This way, I can run my_order_total = Order.objects.annotated_total()[0].oo_total. It works great.
The issue
Computing total is easy as it's a simple value. However, when there is a M2M or OneToMany field, this method does not work. For example, using the example above, let's add this field:
class OrderOperation(models.Model):
order = models.ForeignKey(Order)
total = DecimalField(max_digits=9, decimal_places=2)
ordered_articles = models.ManyToManyField(Article,through='orders.OrderedArticle')
Writing something like the following does NOT work as it returns only 1 foreign key (not a list of all the FKs):
def annotated_ordered_articles(self):
return self.annotate(oo_ordered_articles=self._last_oo('ordered_articles'))
The purpose
The whole purpose is to allow a user to search among all orders, providing a list or articles in input. For example: "Please find all orders containing at least article 42 or article 43", or "Please find all orders containing exactly article 42 and 43", etc.
If I could get something like:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
<ArticleQuerySet [<Article: Article42>, <Article: Article43>]>
or even:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
[42,43]
That would solve my issue.
My current idea
Maybe something like ArrayAgg (I'm using pgSQL) could do the trick, but I'm not sure to understand how to use it in my case.
Maybe this has to do with values() method that seems to not be intended to handle M2M and 1TM relations as stated in the doc:
values() and values_list() are both intended as optimizations for a
specific use case: retrieving a subset of data without the overhead of
creating a model instance. This metaphor falls apart when dealing with
many-to-many and other multivalued relations (such as the one-to-many
relation of a reverse foreign key) because the “one row, one object”
assumption doesn’t hold.
ArrayAgg will be great if you want to fetch only one variable (ie. name) from all articles. If you need more, there is a better option for that:
prefetch_related
Instead, you can prefetch for each Order, latest OrderOperation as a whole object. This adds the ability to easily get any field from OrderOperation without extra magic.
The only caveat with that is that you will always get a list with one operation or an empty list when there are no operations for selected order.
To do that, you should use prefetch_related queryset model together with Prefetch object and custom query for OrderOperation. Example:
from django.db.models import Max, F, Prefetch
last_order_operation_qs = OrderOperation.objects.annotate(
lop_pk=Max('order__orderoperation__pk')
).filter(pk=F('lop_pk'))
orders = Order.objects.prefetch_related(
Prefetch('orderoperation_set', queryset=last_order_operation_qs, to_attr='last_operation')
)
Then you can just use order.last_operation[0].ordered_articles to get all ordered articles for particular order. You can add prefetch_related('ordered_articles') to first queryset to have improved performance and less queries on database.
To my surprise, your idea with ArrayAgg is right on the money. I didn't know there was a way to annotate with an array (and I believe there still isn't for backends other than Postgres).
from django.contrib.postgres.aggregates.general import ArrayAgg
qs = Order.objects.annotate(oo_articles=ArrayAgg(
'order_operation__ordered_articles__id',
'DISTINCT'))
You can then filter the resulting queryset using the ArrayField lookups:
# Articles that contain the specified array
qs.filter(oo_articles__contains=[42,43])
# Articles that are identical to the specified array
qs.filter(oo_articles=[42,43,44])
# Articles that are contained in the specified array
qs.filter(oo_articles__contained_by=[41,42,43,44,45])
# Articles that have at least one element in common
# with the specified array
qs.filter(oo_articles__overlap=[41,42])
'DISTINCT' is needed only if the operation may contain duplicate articles.
You may need to tweak the exact name of the field passed to the ArrayAgg function. For subsequent filtering to work, you may also need to cast id fields in the ArrayAgg to int as otherwise Django casts the id array to ::serial[], and my Postgres complained about type "serial[]" does not exist:
from django.db.models import IntegerField
from django.contrib.postgres.fields.array import ArrayField
from django.db.models.functions import Cast
ArrayAgg(Cast('order_operation__ordered_articles__id', IntegerField()))
# OR
Cast(ArrayAgg('order_operation__ordered_articles__id'), ArrayField(IntegerField()))
Looking at your posted code more closely, you'll also have to filter on the one OrderOperation you are interested in; the query above looks at all operations for the relevant order.

Django conditional create

Does the Django ORM provide a way to conditionally create an object?
For example, let's say you want to use some sort of optimistic concurrency control for inserting new objects.
At a certain point, you know the latest object to be inserted in that table, and you want to only create a new object only if no new objects have been inserted since then.
If it's an update, you could filter based on a revision number:
updated = Account.objects.filter(
id=self.id,
version=self.version,
).update(
balance=balance + amount,
version=self.version + 1,
)
However, I can't find any documented way to provide conditions for a create() or save() call.
I'm looking for something that will apply these conditions at the SQL query level, so as to avoid "read-modify-write" problems.
EDIT: This is not an Optimistic Lock attempt. This is a direct answer to OP's provided code.
Django offers a way to implement conditional queries. It also offers the update_or_create(defaults=None, **kwargs) shortcut which:
The update_or_create method tries to fetch an object from the database based on the given kwargs. If a match is found, it updates the fields passed in the defaults dictionary.
The values in defaults can be callables.
So we can attempt to mix and match those two in order to recreate the supplied query:
obj, created = Account.objects.update_or_create(
id=self.id,
version=self.version,
defaults={
balance: Case(
When(version=self.version, then=F('balance')+amount),
default=amount
),
version: Case(
When(version=self.version, then=F('version')+1),
default=self.version
)
}
)
Breakdown of the Query:
The update_or_create will try to retrieve an object with id=self.id and version=self.version in the database.
Found: The object's balance and version fields will get updated with the values inside the Case conditional expressions accordingly (see the next section of the answer).
Not Found: The object with id=self.id and version=self.version will be created and then it will get its balance and version fields updated.
Breakdown of the Conditional Queries:
balance Query:
If the object exists, the When expression's condition will be true, therefore the balance field will get updated with the value of:
# Existing balance # Added amount
F('balance') + amount
If the object gets created, it will receive as an initial balance the amount value.
version Query:
If the object exists, the When expression's condition will be true, therefore the version field will get updated with the value of:
# Existing version # Next Version
F('version') + 1
If the object gets created, it will receive as an initial version the self.version value (it can also be a default initial version like 1.0.0).
Notes:
You may need to provide an output_field argument to the Case expression, have a look here.
In case (pun definitely intended) of curiosity about what F() expression is and how it is used, I have a Q&A style example here: How to execute arithmetic operations between Model fields in django
Except for QuerySet.update returning the number of affected rows Django doesn't provide any primitives to deal with optimistic locking.
However there's a few third-party apps out there that provide such a feature.
django-concurrency which is the most popular option that provides both database level constraints and application one
django-optimistic-lock which is a bit less popular but I've tried in a past project and it was working just fine.
django-locking unmaintained.
Edit: It looks like OP was not after optimistic locking solutions after all.

django ORM AND-lookups on ManyToMany

How do I implement AND-lookups on Q() of the same field.
class Foo(models.Model):
name = models.ManyToManyField(Type)
class Type(models.Model):
type = models.CharField()
I tried:
Foo.objects.filter(Q(name__type='any'), Q(name__type='some'))
and
Foo.objects.filter(Q(name__type='any') & Q(name__type='some'))
However they both deliver empty querysets.
I am aware, that this will work:
Foo.objects.filter(name__type='any').filter(name__type='some')
However when creating more complex queries where Q1, Q2, Q3 are Q() objects then
Foo.objects.filter(Q1).filter(Q2).filter(Q3)
could deliver different results than
Foo.objects.filter(Q2).filter(Q3).filter(Q1)
due to the order in which the filter is applied.
UPDATE:
I realized that this AND query works, as the modelField is not ManyToMany but rather CharField:
Type.objects.filter(Q(type__icontains='some') & Q(type__icontains='thing'))
So making an AND-Query only does not work on the ManyToMany-Relations.
The filter is already an AND lookup. Usually you will want to use Q() queries to compose OR lookups. I'm saying that because with AND lookups there are other ways to compose dynamic queries for example.
It's expected that the following query would return an empty queryset:
Foo.objects.filter(Q(name__type='any') & Q(name__type='some'))
Because either Type instance is named "any" or "some" -- it cannot be named both at the same time, so the queryset will always be empty.
But the syntax is correct. You can use it to join multiple AND lookup expressions (as long as they are filtering different fields, otherwise you may end up with an empty queryset). Because the filter is applied to each row in the database.
Another way to create complex and dynamic filter (using AND lookup) is using a dictionary:
params = {
'first_name': 'foo',
'last_name': 'bar'
}
if condition:
params['email'] = 'foo#bar.com'
Foo.objects.filter(**params)
As Victor Freitas already explained, it is not possible that a field can have two values ​​in the same record, but since your relationship is a many-to-many, there can be several records and therefore the two values ​​can exist for the same field, but to make this query you must use the OR operator:
Foo.objects.filter(Q(name__type='any') | Q(name__type='some'))

Reorder Django QuerySet by dynamically added field

A have piece of code, which fetches some QuerySet from DB and then appends new calculated field to every object in the Query Set. It's not an option to add this field via annotation (because it's legacy and because this calculation based on another already pre-fetched data).
Like this:
from django.db import models
class Human(models.Model):
name = models.CharField()
surname = models.CharField()
def calculate_new_field(s):
return len(s.name)*42
people = Human.objects.filter(id__in=[1,2,3,4,5])
for s in people:
s.new_column = calculate_new_field(s)
# people.somehow_reorder(new_order_by=new_column)
So now all people in QuerySet have a new column. And I want order these objects by new_column field. order_by() will not work obviously, since it is a database option. I understand thatI can pass them as a sorted list, but there is a lot of templates and other logic, which expect from this object QuerySet-like inteface with it's methods and so on.
So question is: is there some not very bad and dirty way to reorder existing QuerySet by dinamically added field or create new QuerySet-like object with this data? I believe I'm not the only one who faced this problem and it's already solved with django. But I can't find anything (except for adding third-party libs, and this is not an option too).
Conceptually, the QuerySet is not a list of results, but the "instructions to get those results". It's lazily evaluated and also cached. The internal attribute of the QuerySet that keeps the cached results is qs._result_cache
So, the for s in people sentence is forcing the evaluation of the query and caching the results.
You could, after that, sort the results by doing:
people._result_cache.sort(key=attrgetter('new_column'))
But, after evaluating a QuerySet, it makes little sense (in my opinion) to keep the QuerySet interface, as many of the operations will cause a reevaluation of the query. From this point on you should be dealing with a list of Models
Can you try it functions.Length:
from django.db.models.functions import Length
qs = Human.objects.filter(id__in=[1,2,3,4,5])
qs.annotate(reorder=Length('name') * 42).order_by('reorder')

Django filter related field using related model's custom manager

How can I apply annotations and filters from a custom manager queryset when filtering via a related field? Here's some code to demonstrate what I mean.
Manager and models
from django.db.models import Value, BooleanField
class OtherModelManager(Manager):
def get_queryset(self):
return super(OtherModelManager, self).get_queryset().annotate(
some_flag=Value(True, output_field=BooleanField())
).filter(
disabled=False
)
class MyModel(Model):
other_model = ForeignKey(OtherModel)
class OtherModel(Model):
disabled = BooleanField()
objects = OtherModelManager()
Attempting to filter the related field using the manager
# This should only give me MyModel objects with related
# OtherModel objects that have the some_flag annotation
# set to True and disabled=False
my_model = MyModel.objects.filter(some_flag=True)
If you try the above code you will get the following error:
TypeError: Related Field got invalid lookup: some_flag
To further clarify, essentially the same question was reported as a bug with no response on how to actually achieve this: https://code.djangoproject.com/ticket/26393.
I'm aware that this can be achieved by simply using the filter and annotation from the manager directly in the MyModel filter, however the point is to keep this DRY and ensure this behaviour is repeated everywhere this model is accessed (unless explicitly instructed not to).
How about running nested queries (or two queries, in case your backend is MySQL; performance).
The first to fetch the pk of the related OtherModel objects.
The second to filter the Model objects on the fetched pks.
other_model_pks = OtherModel.objects.filter(some_flag=...).values_list('pk', flat=True)
my_model = MyModel.objects.filter(other_model__in=other_model_pks)
# use (...__in=list(other_model_pks)) for MySQL to avoid a nested query.
I don't think what you want is possible.
1) I think you are miss-understanding what annotations do.
Generating aggregates for each item in a QuerySet
The second way to generate summary values is to generate an
independent summary for each object in a QuerySet. For example, if you
are retrieving a list of books, you may want to know how many authors
contributed to each book. Each Book has a many-to-many relationship
with the Author; we want to summarize this relationship for each book
in the QuerySet.
Per-object summaries can be generated using the annotate() clause.
When an annotate() clause is specified, each object in the QuerySet
will be annotated with the specified values.
The syntax for these annotations is identical to that used for the
aggregate() clause. Each argument to annotate() describes an aggregate
that is to be calculated.
So when you say:
MyModel.objects.annotate(other_model__some_flag=Value(True, output_field=BooleanField()))
You are not annotation some_flag over other_model.
i.e. you won't have: mymodel.other_model.some_flag
You are annotating other_model__some_flag over mymodel.
i.e. you will have: mymodel.other_model__some_flag
2) I'm not sure how familiar SQL is for you, but in order to preserve MyModel.objects.filter(other_model__some_flag=True) possible, i.e. to keep the annotation when doing JOINS, the ORM would have to do a JOIN over subquery, something like:
INNER JOIN
(
SELECT other_model.id, /* more fields,*/ 1 as some_flag
FROM other_model
) as sub on mymodel.other_model_id = sub.id
which would be super slow and I'm not surprised they are not doing it.
Possible solution
don't annotate your field, but add it as a regular field in your model.
The simplified answer is that models are authoritative on the field collection and Managers are authoritative on collections of models. In your efforts to make it DRY you made it WET, cause you alter the field collection in your manager.
In order to fix it, you would have to teach the model about the lookup and need to do that using the Lookup API.
Now I'm assuming that you're not actually annotating with a fixed value, so if that annotation is in fact reducible to fields, then you may just get it done, because in the end it needs to be mapped to database representation.

Categories