How to use inner join on Subquery() Django ORM? - python

I have two models:
class FirstModel(models.Model):
    some_fields...
class SecondModel(models.Model):
    date = models.DateTimeField()
    value = models.IntegerField()
    first_model = models.ForeignKey(to="FirstModel", on_delete=models.CASCADE)
and I need to do the following query:
select sum(value) from second_model
inner join (
select max(date) as max_date, id from second_model
where date < NOW()
group by id
) as subquery
on date = max_date and id = subquery.id
I think I can do it using Subquery
subquery = Subquery(SecondModel.objects.values("first_model")
.annotate(max_date=Max("date"))
.filter(date__lt=Func(function="NOW")))
and F() expressions, but F() can only resolve model fields, not a subquery.
Question
Is it possible to implement using Django ORM only?
Also, can I evaluate the sum of values from the second model for all values in the first model by annotating this value? Like
FirstModel.objects.annotate(sum_values=sum_with_inner_join_query).all()
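For reference, here is a minimal sketch of the SQL side of the question, run with Python's built-in sqlite3 module rather than the Django ORM. It assumes the intent is to take the latest past row for each first_model and sum those values, so the subquery groups by first_model_id instead of id. The table layout and sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE second_model (
        id INTEGER PRIMARY KEY,
        first_model_id INTEGER NOT NULL,
        date TEXT NOT NULL,
        value INTEGER NOT NULL
    );
    INSERT INTO second_model (first_model_id, date, value) VALUES
        (1, '2020-01-01 00:00:00', 10),
        (1, '2020-02-01 00:00:00', 20),   -- latest past row for first_model 1
        (2, '2020-01-15 00:00:00', 5),    -- latest past row for first_model 2
        (2, '2999-01-01 00:00:00', 100);  -- in the future, must be ignored
""")

# Join every row to the latest past date of its group, then sum the matches.
query = """
    SELECT SUM(s.value)
    FROM second_model AS s
    INNER JOIN (
        SELECT first_model_id, MAX(date) AS max_date
        FROM second_model
        WHERE date < datetime('now')
        GROUP BY first_model_id
    ) AS latest
      ON s.first_model_id = latest.first_model_id
     AND s.date = latest.max_date
"""
total = conn.execute(query).fetchone()[0]
print(total)
```

In the ORM, the usual counterpart of such a join is a correlated Subquery() with OuterRef rather than a literal join on a derived table, so the SQL Django emits would likely look different from this sketch.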

Related

Can I write RIGHT JOIN and ORDER BY queryset in Django ORM? [duplicate]

Here are my models
class Student:
user = ForeignKey(User)
department = IntegerField()
semester = IntegerField()
...
class Attendance:
student = ForeignKey(Student)
subject = ForeignKey(Subject)
month = IntegerField()
year = IntegerField()
present = IntegerField()
total = IntegerField()
students = Student.objects.filter(semester=semester)
How can I perform a right join between the Student and Attendance models, so that I get a
queryset with all of the students, plus their attendances if any exist for a student, else null?
The documentation mentions left joins but not right joins.
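Change the join type for the subject table (alias_map is an internal, undocumented part of the query object, so this may break between Django versions):
queryset.query.alias_map['subject'].join_type = "RIGHT OUTER JOIN"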
You can use a query like this:
queryset = Student.objects.all().prefetch_related('attendance_set')
student = queryset[0]
# all attendances for the student
attendances = student.attendance_set.all()
Note that select_related only follows forward ForeignKey and one-to-one relations, so the reverse relation attendance_set needs prefetch_related instead. Django has no explicit join method in the ORM, and prefetch_related does not issue a SQL JOIN: it runs one additional query for the attendances and attaches them to the students in Python. The resulting queryset contains all students, and when you call attendance_set.all() on each student, no additional queries are performed.
Check the docs for the _set feature.
If you want to query only those students who have at least one attendance, you can use a query like this:
from django.db.models import Count
(Student.objects
    .annotate(n_attendances=Count('attendance'))
    .filter(n_attendances__gt=0))
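What the question describes (every student, with attendance rows where they exist and NULL otherwise) is what a LEFT OUTER JOIN starting from the student table returns. The sketch below illustrates that with Python's built-in sqlite3 module; the table and column names are simplified assumptions, not the tables Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, semester INTEGER);
    CREATE TABLE attendance (
        id INTEGER PRIMARY KEY,
        student_id INTEGER REFERENCES student(id),
        present INTEGER,
        total INTEGER
    );
    INSERT INTO student (id, semester) VALUES (1, 3), (2, 3);
    INSERT INTO attendance (student_id, present, total) VALUES (1, 18, 20);
""")

# Student 2 has no attendance row, so its attendance columns come back as NULL.
rows = conn.execute("""
    SELECT student.id, attendance.present, attendance.total
    FROM student
    LEFT OUTER JOIN attendance ON attendance.student_id = student.id
    WHERE student.semester = 3
    ORDER BY student.id
""").fetchall()
print(rows)
```

Writing the same join as attendance RIGHT OUTER JOIN student would return the same rows, which is why a left join from Student is normally enough here.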

Alternative way of querying through a models' method field

I have this model about Invoices which has a property method which refers to another model in order to get the cancelation date of the invoice, like so:
class Invoice(models.Model):
    # (...)
    @property
    def cancel_date(self):
        if self.canceled:
            return self.records.filter(change_type='cancel').first().date
        else:
            return None
And in one of my views, I need to query every invoice that has been canceled after max_date or hasn't been canceled at all.
Like so:
def ExampleView(request):
    # (...)
    qs = Invoice.objects.all()
    if r.get('maxDate'):
        max_date = datetime.strptime(r.get('maxDate'), r'%Y-%m-%d')
        ids = [i.pk for i in qs if i.cancel_date is None or i.cancel_date > max_date]
        qs = qs.filter(pk__in=ids)  # Error -> django.db.utils.OperationalError: too many SQL variables
However, ids might give me a huge list of ids which causes the error too many SQL variables.
What's the smartest approach here?
EDIT:
I'm looking for a solution that does not involve adding cancel_date as a model field since invoice.records refers to another model where we store every date attribute of the invoice
Like so:
class InvoiceRecord(models.Model):
invoice = models.ForeignKey(Invoice, related_name = 'records', on_delete = models.CASCADE)
date = models.DateTimeField(default = timezone.now)
change_type = models.CharField(max_length = 32) # Multiple choices field
Also, an invoice might have more than one record of the same type. For example, one invoice might have two cancelation dates.
You can annotate each invoice with a Subquery() expression [Django docs] that yields the cancel date, and then filter on it:
from datetime import datetime
from django.db.models import OuterRef, Q, Subquery
def ExampleView(request):
    # (...)
    qs = Invoice.objects.annotate(
        cancel_date=Subquery(
            # only cancel records, as in the cancel_date property
            InvoiceRecord.objects.filter(
                invoice=OuterRef("pk"), change_type='cancel'
            ).values('date')[:1]
        )
    )
    if r.get('maxDate'):
        max_date = datetime.strptime(r.get('maxDate'), r'%Y-%m-%d')
        qs = qs.filter(Q(cancel_date__isnull=True) | Q(cancel_date__gt=max_date))
I would store cancel_date as a database field on Invoice, set whenever you set the canceled flag. Then you can use a single query:
qs = Invoice.objects.filter(Q(cancel_date__isnull=True) | Q(cancel_date__gt=max_date))
This selects invoices whose cancel_date is NULL or greater than max_date.
I'm not sure about your cancel_date property, though. It returns the date of the first record with change_type='cancel', which (I don't know your code flow) may not be the cancelation record you actually want.
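To illustrate why filtering inside the database avoids the "too many SQL variables" error, here is a rough sketch with Python's built-in sqlite3 module. It computes the cancel date with a correlated subquery and filters on it, so only max_date is passed as a bound variable instead of a long list of ids. The table names and sample rows are assumptions for illustration, not the tables Django generates for these models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoice (id INTEGER PRIMARY KEY);
    CREATE TABLE invoice_record (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER REFERENCES invoice(id),
        date TEXT,
        change_type TEXT
    );
    INSERT INTO invoice (id) VALUES (1), (2), (3);
    INSERT INTO invoice_record (invoice_id, date, change_type) VALUES
        (1, '2021-01-10', 'cancel'),  -- canceled before max_date
        (2, '2021-03-05', 'cancel'),  -- canceled after max_date
        (3, '2021-01-01', 'create');  -- never canceled
""")

max_date = "2021-02-01"
# The scalar subquery picks the first cancel record of each invoice.
query = """
    SELECT id FROM (
        SELECT invoice.id AS id,
               (SELECT r.date
                FROM invoice_record AS r
                WHERE r.invoice_id = invoice.id AND r.change_type = 'cancel'
                ORDER BY r.id
                LIMIT 1) AS cancel_date
        FROM invoice
    )
    WHERE cancel_date IS NULL OR cancel_date > ?
    ORDER BY id
"""
ids = [row[0] for row in conn.execute(query, (max_date,))]
print(ids)
```

Invoice 1 is dropped because it was canceled before max_date; invoices 2 and 3 are kept.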

Why does select_related in Django ORM apply the join before the where clause?

Suppose I have two tables, namely:
class Department(models.Model):
    id = models.AutoField(primary_key=True)
class Employee(models.Model):
    dep = models.ForeignKey(Department)
    exp = models.IntegerField()
Now, in a view class, I am using the get_queryset method:
def get_queryset(self):
    return Employee.objects.using(schema).filter(exp__gte=date).select_related('dep')
The generated SQL statement has this form:
Select `Employee`.`dep`, `Employee`.`exp`, `Department`.`id` from `Employee`
Inner Join `Department` on `Department`.`id`= `Employee`.`dep`
Where `Employee`.`exp` >= date;
Why is the join placed before the WHERE clause? And how can I apply the join after the WHERE clause using the Django ORM?
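One point worth illustrating: in SQL syntax the JOIN belongs to the FROM clause, which is always written before WHERE, and the written order is not an execution order, since the database's query planner is generally free to apply the filter first. The sketch below, using Python's built-in sqlite3 module with made-up table names and rows, compares the query shape from the question with a variant that filters in a subquery before joining. Both return the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        dep_id INTEGER REFERENCES department(id),
        exp INTEGER
    );
    INSERT INTO department (id) VALUES (1), (2);
    INSERT INTO employee (dep_id, exp) VALUES (1, 2), (1, 7), (2, 10);
""")

# Shape of the query in the question: JOIN written before WHERE.
join_then_where = conn.execute("""
    SELECT employee.id, employee.exp, department.id
    FROM employee
    INNER JOIN department ON department.id = employee.dep_id
    WHERE employee.exp >= 5
    ORDER BY employee.id
""").fetchall()

# Same result with the filter applied in a subquery before the join.
where_then_join = conn.execute("""
    SELECT e.id, e.exp, department.id
    FROM (SELECT * FROM employee WHERE exp >= 5) AS e
    INNER JOIN department ON department.id = e.dep_id
    ORDER BY e.id
""").fetchall()

print(join_then_where == where_then_join)
```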

Django: sum annotation on relation with a limit clause

I have a situation similar to the following one:
class Player(models.Model):
pass
class Item(models.Model):
player = models.ForeignKey(Player,
on_delete=models.CASCADE,
related_name='item_set')
power = models.IntegerField()
I would like to annotate Player.objects.all() with Sum(item_set__power), taking into account only the top N Item when sorted by descending power. Ideally, I would like to do this with a subquery, but I don't know how to write it. How can this be done?
This is the solution using raw queryset (it was easier to implement for me than using ORM, but it might be possible with ORM using Subquery):
N = 2
query = """
SELECT id,
(
SELECT SUM(power)
FROM (SELECT power FROM myapp_item WHERE myapp_item.player_id = players.id ORDER BY power DESC LIMIT %s)
)
AS power__sum
FROM myapp_player AS players GROUP BY players.id
"""
players = Player.objects.raw(query, [N])
Update
Adding annotations is not possible with RawQuerySet, but you can use a RawSQL expression:
from django.db.models.expressions import RawSQL
N = 2
queryset = Player.objects.all()
query2 = """
SELECT SUM(power)
FROM (SELECT power FROM myapp_item WHERE myapp_item.player_id = myapp_player.id ORDER BY power DESC LIMIT %s)
"""
queryset.annotate(power__sum=RawSQL(query2, (N,)), my_annotation1=..., my_annotation2=...)
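The correlated subquery with LIMIT can be tried outside Django. The following sketch runs essentially the same SQL with Python's built-in sqlite3 module. The sample rows are invented, and the top_items alias is an addition for readability (some databases require an alias on a derived table). Other databases may not accept an outer reference inside a derived table like this, so treat it as an SQLite-specific illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myapp_player (id INTEGER PRIMARY KEY);
    CREATE TABLE myapp_item (
        id INTEGER PRIMARY KEY,
        player_id INTEGER REFERENCES myapp_player(id),
        power INTEGER
    );
    INSERT INTO myapp_player (id) VALUES (1), (2);
    INSERT INTO myapp_item (player_id, power) VALUES
        (1, 5), (1, 9), (1, 7), (2, 4);
""")

N = 2  # number of strongest items to sum per player
query = """
    SELECT players.id,
           (SELECT SUM(power)
            FROM (SELECT power
                  FROM myapp_item
                  WHERE myapp_item.player_id = players.id
                  ORDER BY power DESC
                  LIMIT ?) AS top_items
           ) AS power__sum
    FROM myapp_player AS players
    ORDER BY players.id
"""
top_sums = conn.execute(query, (N,)).fetchall()
print(top_sums)
```

Player 1 owns items with power 5, 9 and 7, so only 9 and 7 are summed. Player 2 has a single item, which is summed on its own.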

Get least recently rented movies in Django

So imagine you have the following two tables:
CREATE TABLE movies (
    id int,
    name varchar(255),
    ...
    PRIMARY KEY (id)
);
CREATE TABLE movieRentals (
    id int,
    movie_id int,
    customer varchar(255),
    dateRented datetime,
    ...
    PRIMARY KEY (id),
    FOREIGN KEY (movie_id) REFERENCES movies(id)
);
With SQL directly, I'd approach this query as:
(
SELECT movie_id, count(movie_id) AS rent_count
FROM movieRentals
WHERE dateRented > [TIME_ARG_HERE]
GROUP BY movie_id
)
UNION
(
SELECT id AS movie_id, 0 AS rent_count
FROM movie
WHERE movie_id NOT IN
(
SELECT movie_id
FROM movieRentals
WHERE dateRented > [TIME_ARG_HERE]
GROUP BY movie_id
)
)
(Get a count of all movie rentals, by id, since a given date)
Obviously the Django versions of these tables are simple models:
class Movies(models.Model):
name = models.CharField(max_length=255, unique=True)
class MovieRentals(models.Model):
customer = models.CharField(max_length=255)
dateRented = models.DateTimeField()
movie = models.ForeignKey(Movies)
However, translating this to an equivalent query appears to be difficult:
timeArg = datetime.datetime.now() - datetime.timedelta(7,0)
queryset = models.MovieRentals.objects.all()
queryset = queryset.filter(dateRented__gte=timeArg)
queryset = queryset.annotate(rent_count=Count('movies'))
querysetTwo = models.Movies.objects.all()
querysetTwo = querysetTwo.filter(~Q(id__in=[val["movie_id"] for val in queryset.values("movie_id")]))
# Somehow need to set the 0 count. For now force it with Extra:
querysetTwo.extra(select={"rent_count": "SELECT 0 AS rent_count FROM app_movies LIMIT 1"})
# Now union these - for some reason this doesn't work:
# return querysetOne | querysetTwo
# so instead
set1List = [_getMinimalDict(model) for model in queryset]
# Where getMinimalDict just extracts the values I am interested in.
set2List = [_getMinimalDict(model) for model in querysetTwo]
return sorted(set1List + set2List, key=lambda x: x['rent_count'])
However, while this method seems to work, it is incredibly slow. Is there a better way I am missing?
With straight SQL, this would be much easier expressed like this:
SELECT movie.id, count(movieRentals.id) as rent_count
FROM movie
LEFT JOIN movieRentals ON (movieRentals.movie_id = movie.id AND dateRented > [TIME_ARG_HERE])
GROUP BY movie.id
The left join will produce a single row for each movie unrented since [TIME_ARG_HERE], but in those rows, the movieRentals.id column will be NULL.
Then, COUNT(movieRentals.id) will count all of the rentals where they exist, and return 0 if there was only the NULL value.
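The behaviour described above can be checked with a small sketch using Python's built-in sqlite3 module. The sample movies, rentals and cutoff date are made up for illustration. Note that the date condition sits in the ON clause, not in WHERE, which is what keeps unrented movies in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE movieRentals (
        id INTEGER PRIMARY KEY,
        movie_id INTEGER REFERENCES movie(id),
        customer TEXT,
        dateRented TEXT
    );
    INSERT INTO movie (id, name) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO movieRentals (movie_id, customer, dateRented) VALUES
        (1, 'ann', '2024-01-10'),
        (1, 'bob', '2024-01-12'),
        (2, 'cat', '2023-06-01');  -- before the cutoff, not counted
""")

cutoff = "2024-01-01"
# COUNT(movieRentals.id) skips the NULLs produced for movies without a match.
rent_counts = conn.execute("""
    SELECT movie.id, COUNT(movieRentals.id) AS rent_count
    FROM movie
    LEFT JOIN movieRentals
      ON movieRentals.movie_id = movie.id AND movieRentals.dateRented > ?
    GROUP BY movie.id
    ORDER BY rent_count, movie.id
""", (cutoff,)).fetchall()
print(rent_counts)
```

In Django 2.0 and later, a comparable result can usually be obtained by annotating Movies with Count('movierentals', filter=Q(movierentals__dateRented__gt=timeArg)), although the SQL Django generates for that may differ from the query above.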
I must be missing something obvious. Why wouldn't the following work:
queryset = models.MovieRentals.objects.filter(dateRented__gte=timeArg).values('movie').annotate(Count('movie')).aggregate(Min('movie__count'))
Also, clauses can be chained (as shown in the code above), so there is no reason to keep assigning the intermediate querysets to a variable.
