Django models - how to properly calculate multiple values with various conditions - python

I want to calculate multiple values in a single query on my model. Each metric should have a different filter (or no filters at all). I'm using Django==2.2.3 and a Djongo model.
MyModel columns:
user_id = models.IntegerField(blank=True, null=True)
group_id = models.IntegerField(blank=True, null=True)
What I try to run, suggested in another topic, does not work for me.
Query:
MyModel.objects.aggregate(
    total=Count('user_id'),
    test=Count('user_id', filter=Q(user_id='just_a_fake_id')),
    group_1_value=Count('user_id', filter=Q(group_id=1)),
    group_2_value=Count('user_id', filter=Q(group_id=2)),
)
The result, {'total': 0, 'test': 47479, 'group_1_value': 47479, 'group_2_value': 47479}, does not make sense: every value except total is the same number, which is the count of all the records.
What I want to run is a query similar to
SELECT COUNT(user_id) as total,
COUNT(CASE WHEN group_id=1 THEN user_id END) as group_1_value,
COUNT(CASE WHEN group_id=2 THEN user_id END) as group_2_value
FROM MyModel
How do I modify the query in order to get the correct values?
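One way to check what the correct numbers should look like, independent of the ORM backend, is to run the target SQL directly. The sketch below uses the standard-library sqlite3 module; the table name mymodel and the sample rows are made up for illustration and are not taken from the question.

```python
import sqlite3


def conditional_counts(rows):
    """Run the CASE WHEN counting query on an in-memory table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mymodel (user_id INTEGER, group_id INTEGER)")
    conn.executemany("INSERT INTO mymodel VALUES (?, ?)", rows)
    row = conn.execute(
        """
        SELECT COUNT(user_id) AS total,
               COUNT(CASE WHEN group_id = 1 THEN user_id END) AS group_1_value,
               COUNT(CASE WHEN group_id = 2 THEN user_id END) AS group_2_value
        FROM mymodel
        """
    ).fetchone()
    conn.close()
    return dict(zip(("total", "group_1_value", "group_2_value"), row))


# Illustrative rows: (user_id, group_id); COUNT skips NULL user_id values.
sample_rows = [(10, 1), (11, 1), (12, 2), (None, 2), (13, None)]
counts = conditional_counts(sample_rows)
print(counts)
```

Each filtered count is expected to be no larger than total, which is the property the reported result violates.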

Related

How can I change this SQL to Django ORM code?

select *
from sample
join process
on sample.processid = process.id
where (processid) in (
select max(processid) as processid
from main_sample
group by serialnumber
)
ORDER BY sample.create_at desc;
models.py
class Sample(models.Model):
    processid = models.IntegerField(default=0)
    serialnumber = models.CharField(max_length=256)
    create_at = models.DateTimeField(null=True)
class Process(models.Model):
    sample = models.ForeignKey(Sample, blank=False, null=True, on_delete=models.SET_NULL)
Hi, I have two models and I need to convert this SQL query to Django ORM (Python) code.
I need to retrieve the latest Sample (by processid) for each unique serial number.
for example,
=> after RUN query
How can I change the SQL query to ORM code?
In particular, how can I express the subquery in the ORM?
Thanks for reading.
EDIT: To also order by a column that is not one of the distinct or retrieved columns, you can fall back on subqueries. To filter by a single row from a subquery, you can use the syntax described in the docs here:
from django.db.models import Subquery, OuterRef
# Latest processid among the rows sharing the outer row's serial number
subquery = Subquery(
    Sample.objects.filter(
        serialnumber=OuterRef('serialnumber')
    ).order_by(
        '-processid'
    ).values(
        'processid'
    )[:1]
)
results = Sample.objects.filter(
    processid=subquery
).order_by(
    '-create_at'  # descending, to match ORDER BY sample.create_at DESC in the question
)
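For reference, the Subquery/OuterRef pattern corresponds roughly to a correlated subquery in SQL. The sketch below runs such a query with the standard-library sqlite3 module; the rows are invented for illustration, and the SQL is an approximation of the shape of the query, not the exact statement Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample (processid INTEGER, serialnumber TEXT, create_at TEXT)"
)
# Hypothetical data: two serial numbers with several rows, one with a single row.
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [
        (1, "A", "2021-01-01"),
        (3, "A", "2021-01-03"),
        (2, "B", "2021-01-02"),
        (4, "B", "2021-01-05"),
        (5, "C", "2021-01-04"),
    ],
)
# For each outer row, keep it only if its processid is the highest
# among rows with the same serial number.
latest = conn.execute(
    """
    SELECT s.processid, s.serialnumber, s.create_at
    FROM sample AS s
    WHERE s.processid = (
        SELECT u.processid
        FROM sample AS u
        WHERE u.serialnumber = s.serialnumber
        ORDER BY u.processid DESC
        LIMIT 1
    )
    ORDER BY s.create_at DESC
    """
).fetchall()
conn.close()
print(latest)
```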
When using PostgreSQL, you can pass fields to distinct() to get a single result per value of a certain column. It returns the first row for each value, so combined with ordering it does what you need:
Sample.objects.order_by('serialnumber', '-processid').distinct('serialnumber')
If you don't use PostgreSQL, use a values() query on the column that should be unique, then annotate the queryset with the aggregate that picks a value per group, Max in this case:
from django.db.models import Max
Sample.objects.order_by(
    'serialnumber'
).values(
    'serialnumber'
).annotate(
    max_processid=Max('processid')
)
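Note that the values()/annotate() form yields one dictionary per serial number containing only the serial number and the maximum processid, so a second step is needed to get back to full rows. The plain-Python sketch below mimics that two-step logic on a list of dictionaries; the function names and sample rows are hypothetical.

```python
def max_processid_per_serial(rows):
    """Mimic values('serialnumber').annotate(max_processid=Max('processid'))."""
    best = {}
    for row in rows:
        serial = row["serialnumber"]
        if serial not in best or row["processid"] > best[serial]:
            best[serial] = row["processid"]
    return [
        {"serialnumber": serial, "max_processid": best[serial]}
        for serial in sorted(best)
    ]


def latest_rows(rows):
    """Second step: keep only the rows that match a (serial, max processid) pair."""
    wanted = {
        (group["serialnumber"], group["max_processid"])
        for group in max_processid_per_serial(rows)
    }
    return [row for row in rows if (row["serialnumber"], row["processid"]) in wanted]


# Hypothetical rows standing in for Sample objects.
rows = [
    {"processid": 1, "serialnumber": "A"},
    {"processid": 3, "serialnumber": "A"},
    {"processid": 2, "serialnumber": "B"},
]
groups = max_processid_per_serial(rows)
latest = latest_rows(rows)
```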
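I think this is what you need:
If you want the related objects for multiple samples:
samples = Sample.objects.prefetch_related('process_set')
If you want the related objects for only one sample:
samples = Sample.objects.filter(id=1).prefetch_related('process_set')
Note that QuerySet has no group_by() method; grouping is expressed with values() followed by annotate(). Because Process points to Sample through a ForeignKey without a related_name, the reverse accessor is process_set, and reverse foreign keys need prefetch_related() rather than select_related().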

Django ORM: How to group on a value and get a different value of last element in that group

I have been trying to tackle this problem all week but I just can't seem to find the solution.
Basically I want to group on 2 values (user and assignment), then take the last element based on date and get a sum of these scores. Below a description of the problem.
With Postgres this would be easily solved by using the .distinct("value") but unfortunately I do not use Postgres.
Any help would be much appreciated!!
UserAnswer
- user
- assignment
- date
- answer
- score
So I want to group on all user / assignment combinations. Then I want to get the score of each last element in that group. So basically:
user_1, assignment_1, 2019, score 1
user_1, assignment_1, 2020, score 2 <- Take this one
user_2, assignment_1, 2020, score 1
user_2, assignment_1, 2021, score 2 <- Take this one
My best attempt uses annotation, but then I no longer have the score value:
latest_dates = (
    UserAnswer.objects.filter(user=student, assignment__in=assignments)
    .values("user", "assignment")
    .annotate(latest_date=Max('date'))
)
In the end, I had to use a raw query rather than Django's ORM.
subquery2 = UserAnswer.objects.raw("""
    SELECT id, user_id, assignment_id, score, MAX(date) AS latest_date
    FROM soforms_useranswer
    GROUP BY user_id, assignment_id
""")
# The raw queryset from the query above is very similar to the queryset
# you get from a Django ORM query. The difference is that 'id' and 'score'
# are now among the selected fields, so they can be retrieved later, as below.
# Note: which row supplies the non-aggregated 'id' and 'score' columns
# depends on the database.
sum2 = 0
for obj in subquery2:
    print(obj.score)
    sum2 += obj.score
print('sum2 is')
print(sum2)
Here I assumed that both user and assignment are foreign keys, something like below:
from django.conf import settings
from django.db import models
from django.utils import timezone
class Assignment(models.Model):
    name = models.CharField(max_length=50)
class UserAnswer(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name='answers')
    assignment = models.ForeignKey(Assignment, on_delete=models.CASCADE)
    #assignment = models.CharField(max_length=200)
    score = models.IntegerField()
    date = models.DateTimeField(default=timezone.now)
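A variant of the raw query that does not depend on how the database treats non-aggregated columns is to join the table against its own grouped maximum dates. The sketch below shows this with the standard-library sqlite3 module, using the four rows from the question; the table and column names are simplified stand-ins, and years are stored as plain integers for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE useranswer "
    "(user_id INTEGER, assignment_id INTEGER, date INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO useranswer VALUES (?, ?, ?, ?)",
    [
        (1, 1, 2019, 1),
        (1, 1, 2020, 2),  # latest for user 1
        (2, 1, 2020, 1),
        (2, 1, 2021, 2),  # latest for user 2
    ],
)
# Inner query: latest date per (user, assignment).
# Outer query: join back to pick the score of exactly those rows, then sum.
total = conn.execute(
    """
    SELECT SUM(ua.score)
    FROM useranswer AS ua
    JOIN (
        SELECT user_id, assignment_id, MAX(date) AS latest_date
        FROM useranswer
        GROUP BY user_id, assignment_id
    ) AS latest
      ON latest.user_id = ua.user_id
     AND latest.assignment_id = ua.assignment_id
     AND latest.latest_date = ua.date
    """
).fetchone()[0]
conn.close()
print(total)
```

If two answers in the same group share the latest date, both would be counted, so a tie-breaker would be needed in that case.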

How do I include columns properly in a GROUP BY clause of my Django aggregate query?

I'm using Django and Python 3.7. I have the following model ...
class ArticleStat(models.Model):
    objects = ArticleStatManager()
    article = models.ForeignKey(Article, on_delete=models.CASCADE, related_name='articlestats')
    elapsed_time_in_seconds = models.IntegerField(default=0, null=False)
    score = models.FloatField(default=0, null=False)
I want to write a MAX/GROUP BY query subject to certain conditions. Specifically I want each row to contain
MAX(ArticleStat.elapsed_time_in_seconds)
ArticleStat.Article.id
ArticleStat.Article.title
ArticleStat.score
in which the columns "ArticleStat.Article.id," "ArticleStat.Article.title," and "ArticleStat.score" are unique per result set row. So I tried this ...
def get_current_articles(self, article):
    qset = ArticleStat.objects.values('article__id', 'article__title', 'score').filter(
        article__article=article).values('elapsed_time_in_seconds').annotate(
        max_date=Max('elapsed_time_in_seconds'))
    print(qset.query)
    return qset
However, the resulting SQL does not include the values I want to use in my GROUP BY clause (notice that neither "article" nor "score" is in the GROUP BY) ...
SELECT "myproject_articlestat"."elapsed_time_in_seconds",
MAX("myproject_articlestat"."elapsed_time_in_seconds") AS "max_date"
FROM "myproject_articlestat"
INNER JOIN "myproject_article" ON ("myproject_articlestat"."article_id" = "myproject_article"."id")
WHERE ("myproject_article"."article_id" = 2) GROUP BY "myproject_articlestat"."elapsed_time_in_seconds"
How do I modify my Django query to generate SQL consistent with what I want?
I don't think the answer by @Oleg will work, but it's close.
A Subquery expression can be used to select a single value from another queryset. To accomplish this you sort by the value you wish to target then select the first value.
sq = (
    ArticleStat.objects
    .filter(article=OuterRef('pk'))
    .order_by('-elapsed_time_in_seconds')
)
articles = (
    Article.objects
    .annotate(
        max_date=Subquery(sq.values('elapsed_time_in_seconds')[:1]),
        score=Subquery(sq.values('score')[:1]),
    )
    # .values('id', 'path', 'title', 'score', 'max_date')
)
You should not use the 'elapsed_time_in_seconds' in your .values(..) clause, and add the GROUP BY property in the .order_by(..) clause:
qset = ArticleStat.objects.filter(
    article__article=article
).values(
    'article__path', 'article__title', 'score'
).annotate(
    max_date=Max('elapsed_time_in_seconds')
).order_by('article__path', 'article__title', 'score')
This will thus make a QuerySet of dictionaries such that each dictionary contains four elements: 'article__path', 'article__title', 'score', and 'max_date'.
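The effect of grouping by the article and score columns while aggregating only the elapsed time can be checked directly in SQL. The sketch below uses the standard-library sqlite3 module with a simplified, hypothetical articlestat table (the article is reduced to an id, and the rows are invented).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articlestat "
    "(article_id INTEGER, score REAL, elapsed_time_in_seconds INTEGER)"
)
conn.executemany(
    "INSERT INTO articlestat VALUES (?, ?, ?)",
    [
        (1, 0.5, 10),
        (1, 0.5, 30),
        (1, 0.9, 20),
        (2, 0.5, 5),
    ],
)
# One output row per distinct (article_id, score) pair,
# carrying the largest elapsed time seen for that pair.
grouped = conn.execute(
    """
    SELECT article_id, score, MAX(elapsed_time_in_seconds) AS max_date
    FROM articlestat
    GROUP BY article_id, score
    ORDER BY article_id, score
    """
).fetchall()
conn.close()
print(grouped)
```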
As far as I can see, there is no one-step way of doing this in Django.
One way is to use two queries: use .annotate() to add the max id of the related ObjectRevision to each object, then fetch all of those ObjectRevisions.
Example:
objects = Object.objects.all().annotate(revision_id=Max('objectrevision__id'))
objectrevisions = ObjectRevision.objects.filter(id__in=[o.revision_id for o in objects])
This is untested and also a bit slow, so maybe you can also try writing custom SQL, as mentioned by Wolfram Kriesing in the blog here.
If I understand correctly from all the comments:
The goal is to get Articles (id, path, title), filtered by the article argument of the get_current_articles method, with additional data: the maximum elapsed_time_in_seconds across all the ArticleStats of each filtered article, and the score of the ArticleStat with that maximum elapsed time.
If so, then the base query can be on the Article model: Article.objects.filter(article=article).
We can annotate it with the Max() of the corresponding ArticleStats. This can be done directly on the main query with .annotate(max_date=Max('articlestats__elapsed_time_in_seconds')), or with a Subquery on ArticleStat that is filtered the same way as the base query on article (we want the subquery to run on the same set of article objects as the main query), i.e.
max_sq = ArticleStat.objects.filter(
    article__article=OuterRef('article')
).annotate(max_date=Max('elapsed_time_in_seconds'))
Now, to add the score column to the result. Max() is an aggregate function and carries no row information. To get the score for the maximum elapsed time, we can make another subquery and filter by the maximum elapsed time from the previous column.
Note: This filter can return multiple ArticleStat objects for the same maximum elapsed time and article, but we will use only the first one. It is up to your data structure to make sure that the filter returns only one row, or you need to provide additional filtering or ordering so that the first result is the one required.
score_sq = ArticleStat.objects.filter(
    article__article=OuterRef('article'),
    elapsed_time_in_seconds=OuterRef('max_date')
)
And use our subqueries in the main query:
qset = Article.objects.filter(
    article=article
).annotate(
    max_date=Max('articlestats__elapsed_time_in_seconds'),
    # or, using the subquery instead:
    # max_date=Subquery(
    #     max_sq.values('max_date')[:1]
    # ),
    score=Subquery(
        score_sq.values('score')[:1]
    )
).values(
    'id', 'path', 'title', 'score', 'max_date'
)
And a somewhat tricky option without using the Max() function, emulating it with ORDER BY and fetching the first row:
artstat_sq = ArticleStat.objects.filter(
    article__article=OuterRef('article')
).order_by().order_by('-elapsed_time_in_seconds', '-score')
# empty order_by() to clear any base ordering
qset = Article.objects.filter(
    article=article
).annotate(
    max_date=Subquery(
        artstat_sq.values('elapsed_time_in_seconds')[:1]
    ),
    score=Subquery(
        artstat_sq.values('score')[:1]
    )
).values(
    'id', 'path', 'title', 'score', 'max_date'
)
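The idea of emulating Max() with a descending sort and taking the first row can be illustrated in plain Python. In the sketch below the function name and the sample dictionaries are hypothetical; the secondary sort on score plays the role of the tie-breaker in the order_by() call above.

```python
def stat_with_max_elapsed(stats):
    """Return the row with the largest elapsed time, breaking ties by highest score."""
    if not stats:
        return None
    ordered = sorted(
        stats,
        key=lambda s: (s["elapsed_time_in_seconds"], s["score"]),
        reverse=True,  # descending on both keys, like '-elapsed_time_in_seconds', '-score'
    )
    return ordered[0]  # equivalent of slicing with [:1]


# Hypothetical ArticleStat rows for a single article.
stats = [
    {"elapsed_time_in_seconds": 30, "score": 0.2},
    {"elapsed_time_in_seconds": 90, "score": 0.4},
    {"elapsed_time_in_seconds": 90, "score": 0.7},
]
top = stat_with_max_elapsed(stats)
```

Because both values come from the same first row, the score is guaranteed to belong to a row holding the maximum elapsed time.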

Django get most recent value AND aggregate values

I have a model which I want to get both the most recent values out of, meaning the values in the most recently added item, and an aggregated value over a period of time. I can get the answers in separate QuerySets and then unite them in Python but I feel like there should be a better ORM approach to this. Anybody know how it can be done?
Simplified example:
class Rating(models.Model):
    movie = models.ForeignKey(Movie, related_name="movieRatings")
    rating = models.IntegerField(blank=True, null=True)
    timestamp = models.DateTimeField(auto_now_add=True)
I wish to get the avg rating in the past month and the most recent rating per movie.
Current approach:
recent_rating = Rating.objects.order_by('movie_id','-timestamp').distinct('movie')
monthly_ratings = Rating.objects.filter(timestamp__gte=datetime.datetime.now() - datetime.timedelta(days=30)).values('movie').annotate(month_rating=Avg('rating'))
And then I need to somehow join them on the movie id.
Thank you!
Try this solution based on Subquery expressions:
import datetime
from django.db.models import OuterRef, Subquery, Avg, DecimalField
# Average rating over the last 30 days for the outer row's movie
month_rating_subquery = Rating.objects.filter(
    movie=OuterRef('movie'),
    timestamp__gte=datetime.datetime.now() - datetime.timedelta(days=30)
).values('movie').annotate(monthly_avg=Avg('rating'))
result = Rating.objects.order_by('movie', '-timestamp').distinct('movie').values(
    'movie', 'rating'
).annotate(
    monthly_rating=Subquery(month_rating_subquery.values('monthly_avg'), output_field=DecimalField())
)
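To make the intended result concrete, the plain-Python sketch below computes the same two values per movie (most recent rating and average over the last 30 days) from a list of tuples. The function name, the fixed reference time, and the sample data are assumptions made for illustration; null ratings are skipped in the average.

```python
import datetime


def latest_and_monthly_avg(ratings, now):
    """ratings: iterable of (movie_id, rating, timestamp) tuples (hypothetical shape)."""
    cutoff = now - datetime.timedelta(days=30)
    latest = {}  # movie_id -> (rating, timestamp) of the most recent rating
    recent = {}  # movie_id -> ratings inside the 30-day window
    for movie, rating, ts in ratings:
        if movie not in latest or ts > latest[movie][1]:
            latest[movie] = (rating, ts)
        if ts >= cutoff and rating is not None:
            recent.setdefault(movie, []).append(rating)
    return {
        movie: {
            "rating": rating,
            "monthly_rating": (
                sum(recent[movie]) / len(recent[movie]) if movie in recent else None
            ),
        }
        for movie, (rating, _ts) in latest.items()
    }


now = datetime.datetime(2024, 3, 31)
ratings = [
    (1, 4, datetime.datetime(2024, 3, 30)),
    (1, 2, datetime.datetime(2024, 3, 10)),
    (1, 5, datetime.datetime(2024, 1, 1)),   # outside the 30-day window
    (2, 3, datetime.datetime(2023, 12, 1)),  # only rating, also outside the window
]
summary = latest_and_monthly_avg(ratings, now)
```

A movie with no ratings in the window gets None for monthly_rating, which mirrors a subquery that returns no row.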
I suggest you add a property method (monthly_rating) to your Rating model using the @property decorator instead of calculating it in your views.py:
@property
def monthly_rating(self):
    return 'calculate your avg rating here'

GROUP BY in Django Queries

Dear StackOverFlow community:
I need your help in executing following SQL query.
select DATE(creation_date), COUNT(creation_date) from blog_article WHERE creation_date BETWEEN SYSDATE() - INTERVAL 30 DAY AND SYSDATE() GROUP BY DATE(creation_date) AND author="scott_tiger";
Here is my Django Model
class Article(models.Model):
    title = models.CharField(...)
    author = models.CharField(...)
    creation_date = models.DateField(...)
How can I form the aforementioned query in Django using the aggregate() and annotate() functions? I created something like this:
now = datetime.datetime.now()
date_diff = datetime.datetime.now() + datetime.timedelta(-30)
records = Article.objects.values('creation_date', Count('creation_date')).aggregate(Count('creation_date')).filter(author='scott_tiger', created_at__gt=date_diff, created_at__lte=now)
When I run this query, it gives me the following error:
'Count' object has no attribute 'split'
Any idea how to use it?
Delete Count('creation_date') from values and add annotate(Count('creation_date')) after filter.
Try
records = Article.objects.filter(
    author='scott_tiger', creation_date__gt=date_diff, creation_date__lte=now
).values('creation_date').annotate(
    ccd=Count('creation_date')
).values('creation_date', 'ccd')
You need to use creation_date__count or a custom name (ccd here) to refer to the count column after annotate(). Note that aggregate() returns a plain dict rather than a queryset, so annotate() is what allows the chained values() call.
Also, the values() before annotate() restricts the GROUP BY columns, and the last values() declares the columns to be selected. There is no need to group by COUNT, which is already computed per group of rows.
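As a cross-check of what the grouped query should return, the plain-Python sketch below counts articles per creation date for one author within the last 30 days. The function name, the fixed value of today, and the sample data are hypothetical.

```python
import datetime
from collections import Counter


def articles_per_day(articles, author, today, days=30):
    """articles: iterable of (author, creation_date) pairs (hypothetical shape)."""
    start = today - datetime.timedelta(days=days)
    counts = Counter(
        created
        for art_author, created in articles
        if art_author == author and start < created <= today
    )
    return dict(counts)  # one entry per date, like one row per GROUP BY value


today = datetime.date(2024, 3, 31)
articles = [
    ("scott_tiger", datetime.date(2024, 3, 30)),
    ("scott_tiger", datetime.date(2024, 3, 30)),
    ("scott_tiger", datetime.date(2024, 3, 15)),
    ("scott_tiger", datetime.date(2024, 1, 1)),    # older than 30 days
    ("someone_else", datetime.date(2024, 3, 30)),  # different author
]
per_day = articles_per_day(articles, "scott_tiger", today)
```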
