django - annotate() instead of distinct() - python

I am stuck on this issue.
I have two models:
Location and Rate.
Each location has one or more rates.
I want to get locations ordered by their rates, ascending.
Obviously, order_by() and distinct() don't work together:
locations = Location.objects.filter(**s_kwargs).order_by('locations_rate__rate').distinct('id')
Then I read the docs and came across annotate(), but I am not sure whether I have to use an aggregate function inside annotate().
If I do this:
locations = Location.objects.filter(**s_kwargs).annotate(rate=Count('locations_rate__rate')).order_by('rate')
But this counts the rates and orders by that count. I want to get locations with their rates, ordered by the value of those rates.
My model definitions are:
class Location(models.Model):
    name = models.TextField()
    adres = models.TextField()
class Rate(models.Model):
    location = models.ForeignKey(Location, related_name='locations_rate')
    rate = models.IntegerField(max_length=2)
    price_rate = models.IntegerField(max_length=2)  # <--- added now
    datum = models.DateTimeField(auto_now_add=True, blank=True)  # <--- added now

Well, the issue is not how to write the query in Django. It's that the problem as described is either ill-defined or not properly thought through. Let me explain with an example:
Suppose you have two Location objects, l1 and l2. l1 has two Rate objects related to it, r1 and r3, such that r1.rate = 1 and r3.rate = 3. l2 has one Rate object related to it, r2, such that r2.rate = 2. Now what should the order of your query's result be: l1 followed by l2, or l2 followed by l1? One of l1's rates is less than l2's rate and the other one is greater.
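The ambiguity can be made concrete with a small sketch in plain Python. The dictionary below is a hypothetical stand-in for the two tables, using the rates from the example; it only illustrates that the result depends on which aggregate you pick per location.

```python
# Hypothetical in-memory stand-in for Location/Rate: location -> list of rate values.
rates = {"l1": [1, 3], "l2": [2]}

# Ordering by each location's lowest rate puts l1 first ...
by_min = sorted(rates, key=lambda loc: min(rates[loc]))
# ... while ordering by each location's highest rate puts l2 first.
by_max = sorted(rates, key=lambda loc: max(rates[loc]))

print(by_min)  # ['l1', 'l2']
print(by_max)  # ['l2', 'l1']
```

So the question has to say which value represents a location. In the ORM this would presumably mean annotating with an aggregate such as Min or Max from django.db.models over 'locations_rate__rate' and ordering by that annotation.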

Try this:
from django.db.models import Count, Sum
# if you want to order by the number of rates
locations = Location.objects.filter(**s_kwargs) \
    .annotate(rate_count=Count('locations_rate')) \
    .order_by('rate_count')
# if you want to order by an aggregate of the rate values, e.g. Sum
locations = Location.objects.filter(**s_kwargs) \
    .annotate(rate_sum=Sum('locations_rate__rate')) \
    .order_by('rate_sum')

Possibly you want something like this:
locations = (Location.objects.filter(**s_kwargs)
.values('locations_rate__rate')
.annotate(Count('locations_rate__rate'))
.order_by('locations_rate__rate'))
You need the Count() since you actually need a GROUP BY query, and GROUP BY only works with aggregate functions like COUNT or SUM.
Anyway, I think your problem can be solved with a plain distinct() (note that passing field names to distinct() only works on PostgreSQL):
locations = (Location.objects.filter(**s_kwargs)
    .order_by('locations_rate__rate')
    .distinct('locations_rate__rate'))
Why would you want to use annotate() instead?
I haven't tested either of these, but I hope it helps.

annotate(*args, **kwargs) annotates each object in the QuerySet with the provided list of aggregate values (averages, sums, etc.) that have been computed over the objects related to the objects in the QuerySet.
So if you only want to get locations ordered by their rates, ascending, you don't have to use annotate().
You can try this:
loc = Location.objects.all()
rate = Rate.objects.filter(location__in=loc).order_by('rate')

Related

Calculate the ForeignKey type Percentage (individual) Django ORM

I want to calculate the percentage of each car type using the Django ORM, i.e. group all of the cars by their type and calculate each type's share. I have multiple solutions, but they are old-fashioned and iterative. I am going to use this query on a dashboard where multiple queries already calculate different analytics. I don't want to compromise on performance, which is why I prefer a single query. Here is the structure of my tables, written as Django models:
class CarType(models.Model):
    name = models.CharField(max_length=50)
class Car(models.Model):
    car_type = models.ForeignKey(CarType, on_delete=models.CASCADE)
I have a utility function that has the following details:
input => cars: (Queryset) of cars Django.
output => list of all car_types (dictionaries) having percentage.
[{'car_type': 'car01', 'percentage': 70, 'this_car_type_count': 20}, ...]
What I've tried so far:
cars.annotate(
    total=Count('pk')
).annotate(
    car_type_name=F('car_type__name')
).values(
    'car_type_name'
).annotate(
    car_type_count=Count('car_type_name'),
    percentage=Cast(F('car_type_count') * 100.0 / F('total'), FloatField()),
)
But this solution gives 100% for every car type. I know this weird behavior is caused by the values() call I'm using, but I'm stuck here.
F('total') will be the count of cars within each group (each car type) not the total count of the whole table. This is why you always get 100%. You can achieve what you want in two queries:
total = cars.count()
cars.annotate(
    car_type_name=F('car_type__name')
).values(
    'car_type_name'
).annotate(
    car_type_count=Count('car_type_name'),
    percentage=Cast(F('car_type_count') * 100.0 / total, FloatField())
)
If you really want to do this in one query instead of two, the computation of total will need to be a window function instead of a regular aggregate.
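To show what the single-query variant looks like at the SQL level, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The table names, column names and sample rows are assumptions made up for illustration; this is not the SQL Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE car_type (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE car (id INTEGER PRIMARY KEY, car_type_id INTEGER);
    INSERT INTO car_type (id, name) VALUES (1, 'sedan'), (2, 'suv');
    INSERT INTO car (car_type_id) VALUES (1), (1), (1), (2);
""")

# The inner query groups cars by type; the outer query divides each group's
# count by the grand total, computed with SUM(...) OVER () across all groups.
rows = conn.execute("""
    SELECT name, n, n * 100.0 / SUM(n) OVER () AS percentage
    FROM (
        SELECT t.name AS name, COUNT(*) AS n
        FROM car JOIN car_type t ON t.id = car.car_type_id
        GROUP BY t.name
    )
    ORDER BY name
""").fetchall()

print(rows)  # [('sedan', 3, 75.0), ('suv', 1, 25.0)]
```

The point is that the total is a window aggregate over the grouped rows, so it is not collapsed into each group the way F('total') is in the original attempt.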

Aggregating a windowed queryset in Django

Background
Suppose we have a set of questions, and a set of students that answered these questions.
The answers have been reviewed, and scores have been assigned, on some unknown range.
Now, we need to normalize the scores with respect to the extreme values within each question.
For example, if question 1 has a minimum score of 4 and a maximum score of 12, those scores would be normalized to 0 and 1 respectively. Scores in between are interpolated linearly (as described e.g. in Normalization to bring in the range of [0,1]).
Then, for each student, we would like to know the mean of the normalized scores for all questions combined.
Minimal example
Here's a very naive minimal implementation, just to illustrate what we would like to achieve:
from statistics import mean
from django.db import models
class Question(models.Model):
    pass
class Student(models.Model):
    def mean_normalized_score(self):
        normalized_scores = []
        for score in self.score_set.all():
            normalized_scores.append(score.normalized_value())
        return mean(normalized_scores) if normalized_scores else None
class Score(models.Model):
    student = models.ForeignKey(to=Student, on_delete=models.CASCADE)
    question = models.ForeignKey(to=Question, on_delete=models.CASCADE)
    value = models.FloatField()
    def normalized_value(self):
        # One extra query per score: min and max over all scores for the same question
        limits = Score.objects.filter(question=self.question).aggregate(
            min=models.Min('value'), max=models.Max('value'))
        return (self.value - limits['min']) / (limits['max'] - limits['min'])
This works well, but it is quite inefficient in terms of database queries, etc.
Goal
Instead of the implementation above, I would prefer to offload the number-crunching on to the database.
What I've tried
Consider, for example, these two use cases:
list the normalized_value for all Score objects
list the mean_normalized_score for all Student objects
The first use case can be covered using window functions in a query, something like this:
w_min = Window(expression=Min('value'), partition_by=[F('question')])
w_max = Window(expression=Max('value'), partition_by=[F('question')])
annotated_scores = Score.objects.annotate(
normalized_value=(F('value') - w_min) / (w_max - w_min))
This works nicely, so the Score.normalized_value() method from the example is no longer needed.
Now, I would like to do something similar for the second use case, to replace the Student.mean_normalized_score() method by a single database query.
The raw SQL could look something like this (for sqlite):
SELECT id, student_id, AVG(normalized_value) AS mean_normalized_score
FROM (
SELECT
myapp_score.*,
((myapp_score.value - MIN(myapp_score.value) OVER (PARTITION BY myapp_score.question_id)) / (MAX(myapp_score.value) OVER (PARTITION BY myapp_score.question_id) - MIN(myapp_score.value) OVER (PARTITION BY myapp_score.question_id)))
AS normalized_value
FROM myapp_score
)
GROUP BY student_id
I can make this work as a raw Django query, but I have not yet been able to reproduce this query using Django's ORM.
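For reference, the raw SQL above can be checked against a small in-memory SQLite database with Python's sqlite3 module (SQLite 3.25 or newer for window functions). The sample scores are made up for illustration, and the query is trimmed to the two columns of interest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_score (
        id INTEGER PRIMARY KEY,
        student_id INTEGER,
        question_id INTEGER,
        value REAL
    );
    -- Question 1 ranges from 4 to 12, question 2 from 10 to 20.
    INSERT INTO myapp_score (student_id, question_id, value) VALUES
        (1, 1, 4.0), (2, 1, 12.0), (3, 1, 8.0),
        (1, 2, 10.0), (2, 2, 20.0), (3, 2, 15.0);
""")

query = """
    SELECT student_id, AVG(normalized_value) AS mean_normalized_score
    FROM (
        SELECT
            student_id,
            (value - MIN(value) OVER (PARTITION BY question_id))
            / (MAX(value) OVER (PARTITION BY question_id)
               - MIN(value) OVER (PARTITION BY question_id)) AS normalized_value
        FROM myapp_score
    )
    GROUP BY student_id
    ORDER BY student_id
"""
means = dict(conn.execute(query).fetchall())
print(means)  # {1: 0.0, 2: 1.0, 3: 0.5}
```

Student 1 has the minimum on both questions, student 2 the maximum, and student 3 sits exactly halfway, which matches the linear interpolation described in the background.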
I've tried building on the annotated_scores queryset described above, using Django's Subquery, annotate(), aggregate(), Prefetch, and combinations of those, but I must be making a mistake somewhere.
Probably the closest I've gotten is this:
subquery = Subquery(annotated_scores.values('normalized_value'))
Score.objects.values('student_id').annotate(mean=Avg(subquery))
But this is incorrect.
Could someone point me in the right direction, without resorting to raw queries?
I may have found a way to do this using subqueries. The main obstacle is that, at least in Django, we cannot aggregate over window functions, and that is what blocks the calculation of the mean of the normalized values. I've added comments to explain what I'm trying to do:
# Get the minimum score per question
min_subquery = Score.objects.filter(question=OuterRef('question')).values('question').annotate(min=Min('value'))
# Get the maximum score per question
max_subquery = Score.objects.filter(question=OuterRef('question')).values('question').annotate(max=Max('value'))
# Calculate the normalized value per score, then get the average by grouping by students
mean_subquery = Score.objects.filter(student=OuterRef('pk')).annotate(
    min=Subquery(min_subquery.values('min')[:1]),
    max=Subquery(max_subquery.values('max')[:1]),
    normalized=ExpressionWrapper((F('value') - F('min'))/(F('max') - F('min')), output_field=FloatField())
).values('student').annotate(mean=Avg('normalized'))
# Get the calculated mean per student
Student.objects.annotate(mean=Subquery(mean_subquery.values('mean')[:1]))
The resulting SQL is:
SELECT
"student"."id",
"student"."name",
(
SELECT
AVG(
(
(
V0."value" - (
SELECT
MIN(U0."value") AS "min"
FROM
"score" U0
WHERE
U0."question_id" = (V0."question_id")
GROUP BY
U0."question_id"
LIMIT
1
)
) / (
(
SELECT
MAX(U0."value") AS "max"
FROM
"score" U0
WHERE
U0."question_id" = (V0."question_id")
GROUP BY
U0."question_id"
LIMIT
1
) - (
SELECT
MIN(U0."value") AS "min"
FROM
"score" U0
WHERE
U0."question_id" = (V0."question_id")
GROUP BY
U0."question_id"
LIMIT
1
)
)
)
) AS "mean"
FROM
"score" V0
WHERE
V0."student_id" = ("student"."id")
GROUP BY
V0."student_id"
LIMIT
1
) AS "mean"
FROM
"student"
As mentioned by @bdbd, and judging from this Django issue, it appears that annotating a windowed queryset is not yet possible (using Django 3.2).
As a temporary workaround, I refactored @bdbd's excellent Subquery solution as follows.
class ScoreQuerySet(models.QuerySet):
    def annotate_normalized(self):
        w_min = Subquery(self.filter(
            question=OuterRef('question')).values('question').annotate(
            min=Min('value')).values('min')[:1])
        w_max = Subquery(self.filter(
            question=OuterRef('question')).values('question').annotate(
            max=Max('value')).values('max')[:1])
        return self.annotate(normalized=(F('value') - w_min) / (w_max - w_min))
    def aggregate_student_mean(self):
        return self.annotate_normalized().values('student_id').annotate(
            mean=Avg('normalized'))
class Score(models.Model):
    objects = ScoreQuerySet.as_manager()
    ...
Note: If necessary, we can add more Student lookups to the values() in aggregate_student_mean(), e.g. student__name, as long as we take care not to mess up the grouping.
Now, if it ever becomes possible to filter and annotate windowed querysets, we can simply replace the Subquery lines by the much simpler Window implementation:
w_min = Window(expression=Min('value'), partition_by=[F('question')])
w_max = Window(expression=Max('value'), partition_by=[F('question')])

Pyomo | Creating simple model with indexed set

I am having trouble creating a simple model in pyomo. I want to define the following abstract model:
An attempt at creating an abstract model
I define
m.V = pyo.Set()
m.C = pyo.Set() # I first wanted to make this an indexed set in m.V, but this does not work as I cannot create variables with indexed sets (in next line)
m.Components = pyo.Var(m.V*m.C, domain=Binary)
Now I have no idea how to add the constraint. Just adding
def constr(m, v):
    return sum([m.Components[v,c] for c in m.C]) == 2
m.Constraint = Constraint(m.V, rule= constr)
will lead to the model also summing over components in m.C that should not fall under m.V (e.g. if I pass m.V = ['Cars', 'Boats'], and one of the 'Boats' components I want to pass is 'New sails', the above constraint will also put a constraint on m.Components['Cars','New sails'], which does not make much sense).
Trying to work out a concrete example
Now if I try to work through this problem in a concrete way and follow e.g. Variable indexed by an indexed Set with Pyomo, I still get an issue with the constraint. E.g. say I want to create a model that has this structure:
set_dict = {'Car': ['New wheels', 'New gearbox', 'New seats'], 'Boat': ['New seats', 'New sail', 'New rudder']}
I then create these sets and variables:
m.V = pyo.Set(initialize=['Car', 'Boat'])
m.C = pyo.Set(initialize=['New wheels', 'New gearbox', 'New seats', 'New sail', 'New rudder'])
m.VxC = pyo.Set(m.V*m.C, within = set_dict)
m.Components = pyo.Var(m.VxC, domain=Binary)
But now I still don't see a way to add the constraint in a Pyomo-native way. I cannot define a function that sums just over m.C, as it would again sum over values that are not allowed (e.g. as above, 'New sail' for the 'Cars' vehicle type). It seems the only way to do this is to refer back to set_dict and loop and sum over that?
I need to create an abstract model, so I want to be able to write out this model in a pyomo native way, not relying on additional dictionaries and other objects to pass the right dimensions/sets into the model.
Any idea how I could do this?
You didn't say what form your data is in, but some variation of the code below should work. I'm not a huge fan of AbstractModels, but each data format should have some way to build sparse sets, which is what you want here to represent the legal combinations of V x C.
By adding a membership test within your constraint(s), you can still sum across either V or C as needed.
import pyomo.environ as pyo
m = pyo.AbstractModel()
### SETS
m.V = pyo.Set()
m.C = pyo.Set()
m.VC = pyo.Set(within = m.V*m.C)
### VARS
m.select = pyo.Var(m.VC, domain=pyo.Binary)
### CONSTRAINTS
def constr(m, v):
    return sum(m.select[v,c] for c in m.C if (v,c) in m.VC) == 2
m.Constraint = pyo.Constraint(m.V, rule=constr)
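To see what the membership test buys you, here is a sketch with plain Python sets standing in for m.V, m.C and m.VC (no Pyomo involved). The data is the set_dict from the question, and summed_pairs is a hypothetical helper that lists the index pairs the constraint for a given v would sum over.

```python
# Data from the question; V, C and VC mimic the Pyomo sets of the same names.
set_dict = {
    "Car": ["New wheels", "New gearbox", "New seats"],
    "Boat": ["New seats", "New sail", "New rudder"],
}
V = list(set_dict)
C = sorted({c for comps in set_dict.values() for c in comps})
# Sparse set of legal (vehicle, component) pairs, a subset of V x C.
VC = {(v, c) for v, comps in set_dict.items() for c in comps}

def summed_pairs(v):
    """Pairs visited by `for c in C if (v, c) in VC` for one vehicle."""
    return [(v, c) for c in C if (v, c) in VC]

print(summed_pairs("Car"))
# [('Car', 'New gearbox'), ('Car', 'New seats'), ('Car', 'New wheels')]
```

The loop still runs over all of C, but illegal pairs such as ('Car', 'New sail') are skipped, so no variable is ever requested for them.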

How to search in ManyToManyField

I am new to Django, and I am trying to query a many-to-many field.
An example of my query:
In the models I have
class Line(models.Model):
    name = models.CharField("Name of line", max_length=50, blank=True)
class Cross(models.Model):
    lines = models.ManyToManyField(Line, verbose_name="Lines crossed")
    date = models.DateField('Cross Date', null=True, blank=False)
I am building a search that finds all the crosses that have certain lines.
That is, the query in the search box will look like: line_1, line_2, line_3
and the result should be all the crosses that have all of those lines (line_1, line_2, line_3).
I don't know what the filter condition should be!
all_crosses = Cross.objects.all().filter(???)
The view code:
def inventory(request):
    if request.method == "POST":
        if 'btn_search' in request.POST:
            if 'search_by_lines' in request.POST:
                lines_query = request.POST['search_by_lines']
                queried_lines = split_query(lines_query, ',')
                query = [Q(lines__name=l) for l in queried_lines]
                print(query)
                result = Cross.objects.filter(reduce(operator.and_, query))
Thank you very much
You should be able to do:
crosses = Cross.objects.filter(lines__name__in=['line_1', 'line_2', 'line_3'])
for any of the three values. If you're looking for all of the values that match, you'll need to use a Q object:
from django.db.models import Q
crosses = Cross.objects.filter(
Q(lines__name='line_1') &
Q(lines__name='line_2') &
Q(lines__name='line_3')
)
There is at least one other approach you can use, which would be chaining filters:
crosses = (Cross.objects.filter(lines__name='line_1')
    .filter(lines__name='line_2')
    .filter(lines__name='line_3'))
If you need to dynamically construct the Q objects, and assuming the "name" value is what you're posting:
import operator
from functools import reduce
lines = [Q(lines__name=line) for line in request.POST.getlist('lines')]
crosses = Cross.objects.filter(reduce(operator.and_, lines))
[Update]
Turns out, I was dead wrong. I tried a couple of different ways of querying Cross objects where the value of lines matched all of the items searched. Q objects, annotations of counts on the number of objects contained... nothing worked as expected.
In the end, I ended up matching cross.lines as a list to the list of values posted.
In short, the search view I created matched in this fashion:
results = []
posted_lines = []
search_by_lines = 'search_by_lines' in request.POST.keys()
crosses = Cross.objects.all().prefetch_related('lines')
if request.method == 'POST' and search_by_lines:
    posted_lines = request.POST.getlist('line')
    for cross in crosses:
        if list(cross.lines.values_list('name', flat=True)) == posted_lines:
            results.append(cross)
return render(request, 'search.html', {'lines': lines, 'results': results,
                                       'posted_lines': posted_lines})
What I would probably do in this case is add a column on the Cross model to keep a comma separated list of the primary keys of the related lines values, which you could keep in sync via a post_save signal.
With the additional field, you could query directly against the "line" values without joins.
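For comparison, "crosses that have all of the given lines" can also be expressed directly in SQL by grouping the join table and counting distinct matches. The sketch below uses Python's sqlite3 module; the table and column names (line, cross_lines) and the sample rows are assumptions for illustration, not Django's actual schema, and how this maps onto the ORM is not verified here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE line (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cross_lines (cross_id INTEGER, line_id INTEGER);
    INSERT INTO line (id, name) VALUES (1, 'line_1'), (2, 'line_2'), (3, 'line_3');
    -- cross 1 has all three lines, cross 2 has line_1 and line_2, cross 3 only line_3
    INSERT INTO cross_lines (cross_id, line_id) VALUES
        (1, 1), (1, 2), (1, 3),
        (2, 1), (2, 2),
        (3, 3);
""")

wanted = ["line_1", "line_2"]
placeholders = ", ".join("?" for _ in wanted)
# Keep only rows for wanted lines, then require every wanted line to be present.
sql = f"""
    SELECT cl.cross_id
    FROM cross_lines cl JOIN line l ON l.id = cl.line_id
    WHERE l.name IN ({placeholders})
    GROUP BY cl.cross_id
    HAVING COUNT(DISTINCT l.name) = ?
    ORDER BY cl.cross_id
"""
matching = [row[0] for row in conn.execute(sql, wanted + [len(wanted)])]
print(matching)  # [1, 2]
```

Unlike the list comparison in the view above, this matches crosses that contain at least the requested lines, regardless of order or of any extra lines.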

Django ORM calculate number of days between two date attributes

Scenario
I have a table student. It has the following attributes:
name,
age,
school_passout_date,
college_start_date
I need a report showing the average number of free days students get between passing out of school and starting college.
Current approach
Currently I am iterating over the records, computing the number of days for each student, and then taking the average.
Problem
That is highly inefficient when the record set gets bigger.
Question
Is there any feature in the Django ORM that gives me the total days between the two dates?
Possibility
I am looking for something like this.
Students.objects.filter(school_passed=True, started_college=True).annotate(total_days_between=Count('school_passout_date', 'college_start_date'), Avg_days=Avg('school_passout_date', 'college_start_date'))
You can do this like so:
Model.objects.annotate(age=Cast(ExtractDay(TruncDate(Now()) - TruncDate(F('created'))), IntegerField()))
This lets you work with the integer value, e.g. you could then do something like this:
from django.db.models import Case, CharField, F, IntegerField, Value, When
from django.db.models.functions import Cast, ExtractDay, Now, TruncDate
qs = (
    Model
    .objects
    .annotate(age=Cast(ExtractDay(TruncDate(Now()) - TruncDate(F('created'))), IntegerField()))
    .annotate(age_bucket=Case(
        When(age__lt=30, then=Value('new')),
        When(age__lt=60, then=Value('current')),
        default=Value('aged'),
        output_field=CharField(),
    ))
)
This question is very old, but the Django ORM is much more advanced now.
It's possible to do this using F() expressions.
from django.db.models import Avg, F
college_students = Students.objects.filter(school_passed=True, started_college=True)
duration = college_students.aggregate(avg_no_of_days=Avg(F('college_start_date') - F('school_passout_date')))
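At the SQL level, this boils down to averaging a date difference. A minimal sketch with Python's sqlite3 module follows; the table layout and sample dates are assumptions for illustration, and julianday() is SQLite-specific, so other databases will need their own date arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        name TEXT,
        school_passout_date TEXT,
        college_start_date TEXT
    );
    -- Gaps of 10, 20 and 30 days respectively.
    INSERT INTO student VALUES
        ('a', '2020-05-01', '2020-05-11'),
        ('b', '2020-06-01', '2020-06-21'),
        ('c', '2020-04-15', '2020-05-15');
""")

# julianday() turns an ISO date string into a day number, so the
# difference of two julianday() values is a number of days.
(avg_days,) = conn.execute("""
    SELECT AVG(julianday(college_start_date) - julianday(school_passout_date))
    FROM student
""").fetchone()
print(avg_days)
```

The whole computation happens in one query, so nothing has to be iterated in Python.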
Mathematically, given the (expected) fact that the college start date is always later than the school pass-out date, you can take the average of all the start dates and the average of all the pass-out dates, and subtract one from the other.
This gives you a solution like this one:
from django.db.models import Avg
students = Students.objects.filter(school_passed=True, started_college=True)
# whether Avg() accepts date columns depends on the database backend
avg_start_date = students.aggregate(avg=Avg('college_start_date'))['avg']
avg_passout_date = students.aggregate(avg=Avg('school_passout_date'))['avg']
avg_free_time = avg_start_date - avg_passout_date
Django currently only accepts four aggregation functions (Max, Min, Count and Avg), so this is a little tricky to do.
The solution then is to use the extra() method, like this:
Students.objects.extra(
    select={'difference': 'college_start_date - school_passout_date'}
).filter(school_passed=True, started_college=True)
But then you still have to compute the average on the application side.
