How to aggregate the average of a calculation based on two columns?

I want to write a Django query to give me the average across all rows in my table. My model looks like
from django.core.validators import MaxValueValidator, MinValueValidator
from django.db import models
class StatByDow(models.Model):
    total_score = models.DecimalField(default=0, max_digits=12, decimal_places=2)
    num_articles = models.IntegerField(default=0)
    day_of_week = models.IntegerField(
        null=True,
        validators=[
            MaxValueValidator(6),
            MinValueValidator(0)
        ]
    )
and I attempt to calculate the average like this
everything_avg = StatByDow.objects.all().aggregate(Avg(Func(F('total_score') / F('num_articles'))))
but this results in the error
File "/Users/davea/Documents/workspace/mainsite_project/venv/lib/python3.7/site-packages/django/db/models/query.py", line 362, in aggregate
raise TypeError("Complex aggregates require an alias")
TypeError: Complex aggregates require an alias
What's the right way to calculate the average?

You don't need Func for the division, but you do need to reconcile the two different field types. Use an ExpressionWrapper around Avg:
from django.db.models import Avg, DecimalField, ExpressionWrapper, F
everything_avg = StatByDow.objects.aggregate(
    avg=ExpressionWrapper(
        Avg(F('total_score') / F('num_articles')),
        output_field=DecimalField(),
    )
)
You could also use a Cast from integer to decimal (though not with PostgreSQL, which rejects the ::numeric(NONE, NONE) syntax Django generates) or put an ExpressionWrapper around the division, but a single ExpressionWrapper around the whole aggregate is the quickest solution because the conversion happens only once, at the end.
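To make the arithmetic concrete, here is a minimal sketch of what the aggregate asks the database to compute. It uses Python's built-in sqlite3 module rather than Django, and the table name stat_by_dow and the sample rows are hypothetical; only the column names come from the model in the question.

```python
import sqlite3


def average_score_per_article(rows):
    """Return AVG(total_score / num_articles) over (total_score, num_articles) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table mirroring the two relevant model fields.
    conn.execute("CREATE TABLE stat_by_dow (total_score REAL, num_articles INTEGER)")
    conn.executemany("INSERT INTO stat_by_dow VALUES (?, ?)", rows)
    # The division is evaluated per row; AVG then averages those quotients.
    (avg,) = conn.execute(
        "SELECT AVG(total_score / num_articles) FROM stat_by_dow"
    ).fetchone()
    conn.close()
    return avg


sample_rows = [(10.0, 2), (9.0, 3), (8.0, 4)]  # per-row quotients: 5.0, 3.0, 2.0
everything_avg = average_score_per_article(sample_rows)
```

Note that this is the average of the per-row quotients, which is generally not the same as dividing the summed score by the summed article count.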

As the error text says, you need to pass an alias name for the aggregate function. The query should look something like this:
everything_avg = StatByDow.objects.all().aggregate(avg_f=Avg(Func(F('total_score') / F('num_articles'))))

Related

How can I change this SQL to Django ORM code?

select *
from sample
join process
on sample.processid = process.id
where (processid) in (
select max(processid) as processid
from main_sample
group by serialnumber
)
ORDER BY sample.create_at desc;
models.py
class Sample(models.Model):
    processid = models.IntegerField(default=0)
    serialnumber = models.CharField(max_length=256)
    create_at = models.DateTimeField(null=True)
class Process(models.Model):
    sample = models.ForeignKey(Sample, blank=False, null=True, on_delete=models.SET_NULL)
Hi, I have two models and I need to convert this SQL query to Django ORM (Python) code.
I need to retrieve the latest Sample (by processid) for each unique serial number.
for example,
=> after RUN query
How can I convert the SQL query, and in particular the subquery, to ORM code?
Thanks for reading.

EDIT: To also order by a column that is not one of the distinct or retrieved columns, you can fall back on subqueries. To filter by a single row from a subquery, you can use the syntax described in the docs here
from django.db.models import OuterRef, Subquery
subquery = Subquery(
    Sample.objects.filter(serialnumber=OuterRef('serialnumber'))
    .order_by('-processid')
    .values('processid')[:1]
)
# '-create_at' matches the ORDER BY ... DESC in the original SQL
results = Sample.objects.filter(processid=subquery).order_by('-create_at')
When using PostgreSQL, you can pass fields to distinct() to get a single result per value of a certain column. This returns the first result, so combined with ordering it will do what you need:
Sample.objects.order_by('serialnumber', '-processid').distinct('serialnumber')
If you don't use PostgreSQL, use a values query on the column that should be unique and then annotate the queryset with the aggregate that should be computed per group, Max in this case:
from django.db.models import Max
Sample.objects.order_by(
    'serialnumber'
).values(
    'serialnumber'
).annotate(
    max_processid=Max('processid')
)
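As a rough illustration of the "latest row per serial number" logic, the sketch below runs the correlated-subquery form of the query directly in SQLite through Python's sqlite3 module. It is not Django code; the table layout and the sample rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with the same columns as the Sample model.
conn.execute(
    "CREATE TABLE sample ("
    "id INTEGER PRIMARY KEY, processid INTEGER, serialnumber TEXT, create_at TEXT)"
)
conn.executemany(
    "INSERT INTO sample (processid, serialnumber, create_at) VALUES (?, ?, ?)",
    [
        (1, "A", "2021-01-01"),
        (3, "A", "2021-01-03"),
        (2, "B", "2021-01-02"),
        (4, "B", "2021-01-04"),
    ],
)
# For each outer row, the subquery finds the highest processid that shares
# its serial number; only rows matching that value are kept.
latest_per_serial = conn.execute("""
    SELECT serialnumber, processid
    FROM sample AS outer_sample
    WHERE processid = (
        SELECT inner_sample.processid
        FROM sample AS inner_sample
        WHERE inner_sample.serialnumber = outer_sample.serialnumber
        ORDER BY inner_sample.processid DESC
        LIMIT 1
    )
    ORDER BY create_at DESC
""").fetchall()
conn.close()
```

With the rows above, one row per serial number survives, newest first.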

I think this is what you need:
If you want multiple related objects
samples = Sample.objects.prefetch_related('process').group_by('serialinumber')
If you want related objects for only one object
samples = Sample.objects.filter(id=1).select_related('process').group_by('serialinumber')

django : using if else or while else inside Sum function with annotation? Cannot compute Sum('<CombinedExpression:..') is an aggregate

I want to apply a condition to my Sum function inside annotate. I tried to use Case/When, but it didn't work in my case.
These are my models in models.py:
class MyModel(models.Model):
    # 'Product' is referenced by name because the class is defined below
    name = models.ForeignKey('Product', on_delete=models.CASCADE)
    order = models.IntegerField()
    price = models.IntegerField()
class Product(models.Model):
    name = models.CharField(max_length=20)
    cost = models.IntegerField()
    price = models.IntegerField()
I want to do something like this:
total = F('price') * F('order')
base = (F('name__cost') + F('name__price')) * F('order')
if total > base:
    income = Sum(F('total') - F('base'))
I tried this:
MyModel.objects.values('name__name').annotate(
    total=Sum(F('price') * F('order'), output_field=IntegerField()),
    base=Sum((F('name__price') + F('name__cost')) * F('order'), output_field=IntegerField()),
    income=Sum(
        Case(When(total__gt=F('base'), then=Sum(F('total') - F('base'))), default=0),
        output_field=IntegerField(),
    ),
)
but this raises the following error:
Cannot compute Sum('<CombinedExpression: F(total) - F(base)>'): '<CombinedExpression: F(total) - F(base)>' is an aggregate
I don't want to use .filter(income__gt=0) because it stops the quantity from being counted, and I don't want income to include products that were sold at a loss.
For example, suppose I post MyModel(name=mouse, order=2, price=20) and my Product model holds Product(name=mouse, cost=4, price=10). The income for this product is (2 * 20) - ((4 + 10) * 2) = 40 - 28 = 12. Sometimes, however, the result is negative, e.g. (2 * 10) - ((4 + 10) * 2) = 20 - 28 = -8.
I use MySQL 8 as the database.
I want to prevent negative numbers from being added to my income while still counting the quantity in the other columns.

The problem is that you cannot use an aggregate (total and base) inside yet another aggregate in the same query. There is only one GROUP BY clause, and Django cannot automatically produce a valid query here. As far as I understand, you need to first calculate total and base for each row, derive each MyModel's income from them, and only then aggregate:
from django.db.models import Case, F, IntegerField, Sum, When
MyModel.objects.annotate(
    total=F('price') * F('order'),
    base=(F('name__price') + F('name__cost')) * F('order'),
    income=Case(
        When(total__gt=F('base'), then=F('total') - F('base')),
        default=0,
        output_field=IntegerField()
    )
).values('name__name').annotate(income=Sum('income'))
P.S. Please, format your code so people can read it without difficulties :)
P.P.S. I can see another possible way: you don't need Sum() for the income, because total and base are sums already
MyModel.objects.values('name__name').annotate(
    total=Sum(F('price') * F('order')),
    base=Sum((F('name__price') + F('name__cost')) * F('order')),
).annotate(
    income=Case(
        When(total__gt=F('base'), then=F('total') - F('base')),
        default=0,
        output_field=IntegerField()
    )
)
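The row-level version of this idea (compute each row's income, clamp losses to zero, then sum per product) can be sketched in plain SQL. The example below uses Python's sqlite3 module, not Django or MySQL, with the mouse figures from the question; the table names product and mymodel and the name_id column are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, cost INTEGER, price INTEGER);
    CREATE TABLE mymodel (id INTEGER PRIMARY KEY, name_id INTEGER, "order" INTEGER, price INTEGER);
    INSERT INTO product (id, name, cost, price) VALUES (1, 'mouse', 4, 10);
    INSERT INTO mymodel (name_id, "order", price) VALUES (1, 2, 20);
    INSERT INTO mymodel (name_id, "order", price) VALUES (1, 2, 10);
""")
# The CASE expression is evaluated per row, so a loss contributes 0 to the
# income while the row's quantity is still counted.
income_by_product = conn.execute("""
    SELECT p.name,
           SUM(m."order") AS quantity,
           SUM(CASE
                   WHEN m.price * m."order" > (p.cost + p.price) * m."order"
                   THEN m.price * m."order" - (p.cost + p.price) * m."order"
                   ELSE 0
               END) AS income
    FROM mymodel AS m
    JOIN product AS p ON p.id = m.name_id
    GROUP BY p.name
""").fetchall()
conn.close()
```

The first order earns 40 - 28 = 12 and the second would lose 8, so the product ends up with a quantity of 4 and an income of 12.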

Try this; it may need some tweaks. The idea is to use conditional expressions:
from django.db.models import Case, F, IntegerField, Sum, When
MyModel.objects.values('name__name').annotate(
    total=F('price') * F('order'),
    base=(F('name__cost') + F('name__price')) * F('order'),
).annotate(
    income=Case(
        When(total__gt=F('base'), then=Sum(F('total') - F('base'))),
        default=F('total'),
        output_field=IntegerField(),
    )
)

How do I include columns properly in a GROUP BY clause of my Django aggregate query?

I'm using Django and Python 3.7. I have the following model ...
class ArticleStat(models.Model):
    objects = ArticleStatManager()
    article = models.ForeignKey(Article, on_delete=models.CASCADE, related_name='articlestats')
    elapsed_time_in_seconds = models.IntegerField(default=0, null=False)
    score = models.FloatField(default=0, null=False)
I want to write a MAX/GROUP BY query subject to certain conditions. Specifically I want each row to contain
MAX(ArticleStat.elapsed_time_in_seconds)
ArticleStat.Article.id
ArticleStat.Article.title
ArticleStat.score
in which the columns "ArticleStat.Article.id," "ArticleStat.Article.title," and "ArticleStat.score" are unique per result set row. So I tried this ...
def get_current_articles(self, article):
    qset = ArticleStat.objects.values('article__id', 'article__title', 'score').filter(
        article__article=article).values('elapsed_time_in_seconds').annotate(
        max_date=Max('elapsed_time_in_seconds'))
    print(qset.query)
    return qset
However, the resulting SQL does not include the values I want to use in my GROUP BY clause (notice that neither "article" nor "score" is in the GROUP BY) ...
SELECT "myproject_articlestat"."elapsed_time_in_seconds",
MAX("myproject_articlestat"."elapsed_time_in_seconds") AS "max_date"
FROM "myproject_articlestat"
INNER JOIN "myproject_article" ON ("myproject_articlestat"."article_id" = "myproject_article"."id")
WHERE ("myproject_article"."article_id" = 2) GROUP BY "myproject_articlestat"."elapsed_time_in_seconds"
How do I modify my Django query to generate SQL consistent with what I want?

I don't think the answer by @Oleg will work, but it's close.
A Subquery expression can be used to select a single value from another queryset. To accomplish this, sort by the value you wish to target and then select the first row.
from django.db.models import OuterRef, Subquery
sq = (
    ArticleStat.objects
    .filter(article=OuterRef('pk'))
    .order_by('-elapsed_time_in_seconds')
)
articles = (
    Article.objects
    .annotate(
        max_date=Subquery(sq.values('elapsed_time_in_seconds')[:1]),
        score=Subquery(sq.values('score')[:1]),
    )
    # .values('id', 'path', 'title', 'score', 'max_date')
)

You should not use the 'elapsed_time_in_seconds' in your .values(..) clause, and add the GROUP BY property in the .order_by(..) clause:
qset = ArticleStat.objects.filter(
    article__article=article
).values(
    'article__path', 'article__title', 'score'
).annotate(
    max_date=Max('elapsed_time_in_seconds')
).order_by('article__path', 'article__title', 'score')
This will thus make a QuerySet of dictionaries such that each dictionary contains four elements: 'article__path', 'article__title', 'score', and 'max_date'.
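To see the grouping this produces, here is a small sketch of the corresponding SQL, run with Python's sqlite3 module instead of Django. The flattened articlestat table (with the title stored next to the stats) and its rows are assumptions chosen to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, denormalized stand-in for the ArticleStat/Article join.
conn.execute(
    "CREATE TABLE articlestat ("
    "article_id INTEGER, title TEXT, score REAL, elapsed_time_in_seconds INTEGER)"
)
conn.executemany(
    "INSERT INTO articlestat VALUES (?, ?, ?, ?)",
    [
        (1, "First", 0.5, 10),
        (1, "First", 0.5, 30),
        (1, "First", 0.9, 20),
        (2, "Second", 0.1, 5),
    ],
)
# Every non-aggregated column that is selected also appears in GROUP BY,
# so MAX is computed once per distinct (article_id, title, score).
grouped = conn.execute("""
    SELECT article_id, title, score, MAX(elapsed_time_in_seconds) AS max_date
    FROM articlestat
    GROUP BY article_id, title, score
    ORDER BY article_id, title, score
""").fetchall()
conn.close()
```

Because score is part of the grouping key, article 1 yields two rows here, one per distinct score.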

As far as I can see, there is no one-step way of doing this in Django.
One way is to use two queries: use .annotate() to add the max id of the related object revision to each object, then fetch all those object revisions.
Example:
objects = Object.objects.annotate(revision_id=Max('objectrevision__id'))
objectrevisions = ObjectRevision.objects.filter(id__in=[o.revision_id for o in objects])
This is untested and also a bit slow, so you may also want to try writing custom SQL, as mentioned by Wolfram Kriesing in the blog here

If I understand correctly from all the comments:
The goal is to get Articles (id, path, title) filtered by the article argument of the get_current_articles method, with additional data: the maximum elapsed_time_in_seconds over all the ArticleStats of each filtered article, and the score of the ArticleStat that has that maximum elapsed time.
If so, then the base query can be on the Article model: Article.objects.filter(article=article).
We can annotate it with the Max() of the corresponding ArticleStats. This can be done directly on the main query with .annotate(max_date=Max('articlestats__elapsed_time_in_seconds')), or with a Subquery on ArticleStat that is filtered the same way as the base query (we want the subquery to run on the same set of Article objects as the main query), i.e.
max_sq = ArticleStat.objects.filter(
    article__article=OuterRef('article')
).annotate(max_date=Max('elapsed_time_in_seconds'))
Now, to add the score column to the result. Max() is an aggregate function and carries no row information. To get the score for the maximum elapsed time, we can make another subquery and filter by the maximum elapsed time from the previous column.
Note: This filter can return multiple ArticleStat objects for the same maximum elapsed time and article, but we will use only the first one. It is up to your data structure to make sure that the filter returns only one row; otherwise, provide additional filtering or ordering so that the first result is the one required.
score_sq = ArticleStat.objects.filter(
    article__article=OuterRef('article'),
    elapsed_time_in_seconds=OuterRef('max_date')
)
Then use the subqueries in the main query:
qset = Article.objects.filter(
    article=article
).annotate(
    max_date=Max('articlestats__elapsed_time_in_seconds'),
    # or, using the subquery instead:
    # max_date=Subquery(
    #     max_sq.values('max_date')[:1]
    # ),
    score=Subquery(
        score_sq.values('score')[:1]
    )
).values(
    'id', 'path', 'title', 'score', 'max_date'
)
A somewhat trickier option avoids the Max() function and emulates it with ORDER BY, fetching the first row:
# the empty order_by() clears any base ordering before the new one is applied
artstat_sq = ArticleStat.objects.filter(
    article__article=OuterRef('article')
).order_by().order_by('-elapsed_time_in_seconds', '-score')
qset = Article.objects.filter(
article=article
).annotate(
max_date=Subquery(
artstat_sq.values('elapsed_time_in_seconds')[:1]
),
score=Subquery(
artstat_sq.values('score')[:1]
)
).values(
'id', 'path', 'title', 'score', 'max_date'
)
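The "order and take the first row" idea, including the tie-break on score, can be illustrated without a database. The following is a pure-Python analogue, not what Django executes; the Stat tuple and the sample values are hypothetical.

```python
from collections import namedtuple

# Hypothetical in-memory stand-in for ArticleStat rows.
Stat = namedtuple("Stat", ["article_id", "elapsed_time_in_seconds", "score"])


def top_stat_per_article(stats):
    """Pick, per article, the stat with the largest elapsed time (ties: higher score)."""
    ordered = sorted(
        stats,
        key=lambda s: (s.elapsed_time_in_seconds, s.score),
        reverse=True,
    )
    best = {}
    for stat in ordered:
        # The first stat seen for an article is its top row, similar to
        # slicing the ordered subquery with [:1].
        best.setdefault(stat.article_id, stat)
    return best


stats = [
    Stat(1, 30, 0.2),
    Stat(1, 30, 0.7),  # same elapsed time as the row above, higher score
    Stat(1, 10, 0.9),
    Stat(2, 5, 0.1),
]
top = top_stat_per_article(stats)
```

Ordering by both fields means that when two stats share the maximum elapsed time, the same row supplies both the elapsed time and the score.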

Django get most recent value AND aggregate values

I have a model which I want to get both the most recent values out of, meaning the values in the most recently added item, and an aggregated value over a period of time. I can get the answers in separate QuerySets and then unite them in Python but I feel like there should be a better ORM approach to this. Anybody know how it can be done?
Simplified example:
class Rating(models.Model):
    movie = models.ForeignKey(Movie, related_name="movieRatings")
    rating = models.IntegerField(blank=True, null=True)
    timestamp = models.DateTimeField(auto_now_add=True)
I wish to get the avg rating in the past month and the most recent rating per movie.
Current approach:
recent_rating = Rating.objects.order_by('movie_id','-timestamp').distinct('movie')
monthly_ratings = Rating.objects.filter(timestamp__gte=datetime.datetime.now() - datetime.timedelta(days=30)).values('movie').annotate(month_rating=Avg('rating'))
And then I need to somehow join them on the movie id.
Thank you!

Try this solution based on Subquery expressions:
import datetime
from django.db.models import Avg, DecimalField, OuterRef, Subquery
month_rating_subquery = Rating.objects.filter(
    movie=OuterRef('movie'),
    timestamp__gte=datetime.datetime.now() - datetime.timedelta(days=30)
).values('movie').annotate(monthly_avg=Avg('rating'))
result = Rating.objects.order_by('movie', '-timestamp').distinct('movie').values(
    'movie', 'rating'
).annotate(
    monthly_rating=Subquery(month_rating_subquery.values('monthly_avg'), output_field=DecimalField())
)
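For a feel of what "latest rating plus monthly average in one query" looks like at the SQL level, here is a sketch using Python's sqlite3 module. It is not the SQL Django generates; the table name rating, the fixed reference date and the sample rows are all assumptions so that the result is reproducible.

```python
import sqlite3
from datetime import datetime, timedelta

now = datetime(2024, 3, 31)  # hypothetical fixed "now" for a reproducible result
cutoff = (now - timedelta(days=30)).date().isoformat()  # 30 days earlier, as ISO text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rating (movie_id INTEGER, rating INTEGER, timestamp TEXT)")
conn.executemany(
    "INSERT INTO rating VALUES (?, ?, ?)",
    [
        (1, 5, "2024-01-10"),  # older than the cutoff, ignored by the average
        (1, 3, "2024-03-05"),
        (1, 4, "2024-03-20"),
        (2, 2, "2024-03-15"),
    ],
)
# The outer query keeps the newest rating per movie; the scalar subquery
# attaches that movie's average over the last 30 days.
result = conn.execute("""
    SELECT r.movie_id,
           r.rating,
           (SELECT AVG(m.rating) FROM rating AS m
            WHERE m.movie_id = r.movie_id AND m.timestamp >= :cutoff) AS monthly_rating
    FROM rating AS r
    WHERE r.timestamp = (SELECT MAX(t.timestamp) FROM rating AS t
                         WHERE t.movie_id = r.movie_id)
    ORDER BY r.movie_id
""", {"cutoff": cutoff}).fetchall()
conn.close()
```

Each result row carries the movie id, its most recent rating and its average rating since the cutoff, so no join in Python is needed afterwards.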

I suggest you add a property method (monthly_rating) to your Rating model using the @property decorator instead of calculating it in your views.py:
@property
def monthly_rating(self):
    return 'calculate your avg rating here'

Django calculate Avg for a computed field

I have a model as below.
class Transaction(models.Model):
    time1 = models.DateTimeField(null=True)
    time2 = models.DateTimeField(null=True)
    @property
    def time_diff(self):
        return (self.time2 - self.time1).total_seconds()
I need to get the average of the (time2 - time1) in seconds for a list of records, if both values are not null.
time_avg = transactions.aggregate(total=Avg('time_diff',field='time_diff'))
This gives an error saying 'time_diff' is not a valid field. I want to keep this column as a derived property. Not a stored column.

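You first need to compute the difference and then take the average. You can use Django F expressions for this. The value under 'time_diff__avg' is a timedelta, so call .total_seconds() on it to get seconds.
from django.db.models import Avg, DurationField, ExpressionWrapper, F
diff = ExpressionWrapper(F('time2') - F('time1'), output_field=DurationField())
qs = Transaction.objects.filter(time1__isnull=False, time2__isnull=False)
result = qs.annotate(time_diff=diff).aggregate(Avg('time_diff'))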
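For a small set of records, the same average can also be computed in Python, which keeps time_diff a derived value rather than a stored column. This is only a sketch: the helper name average_diff_seconds and the sample timestamps are made up, and it works on plain (time1, time2) pairs rather than on model instances.

```python
from datetime import datetime, timedelta


def average_diff_seconds(pairs):
    """Average (time2 - time1) in seconds, skipping pairs with a missing value."""
    diffs = [
        (time2 - time1).total_seconds()
        for time1, time2 in pairs
        if time1 is not None and time2 is not None
    ]
    # Return None when there is nothing to average, mirroring an empty aggregate.
    return sum(diffs) / len(diffs) if diffs else None


start = datetime(2024, 1, 1, 12, 0, 0)
pairs = [
    (start, start + timedelta(seconds=10)),
    (start, start + timedelta(seconds=20)),
    (start, None),  # incomplete record, ignored
    (None, start),  # incomplete record, ignored
]
time_avg = average_diff_seconds(pairs)
```

This loads every record into Python, so the database-side aggregate is usually the better fit for large tables.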

Have you tried
time_avg = transactions.aggregate(Avg('time_diff')).values()
