Django - Select MAX of field of related query set - python

Say I have these models:
class User(models.Model):
    name = ...
    dob = ...

class Event(models.Model):
    user = models.ForeignKey(User, ...)
    timestamp = models.DateTimeField()
I want to query all Users and annotate each with both the Count of its events and the MAX of Event.timestamp.
I know that for the count I can do:
User.objects.all().annotate(event_count=models.Count('event'))
But how do I get the max of a field on the related queryset? I want it to be a single query, like:
SELECT Users.*, MAX(Events.timestamp), COUNT(Events)
FROM Users JOIN Events on Users.id = Events.user_id

You could use Query Expressions to achieve that. You might have to play around with the foreign key related name depending on your code, but it would result in something looking like this:
from django.db.models import Count, Max

User.objects.annotate(
    event_count=Count('event'),
    max_timestamp=Max('event__timestamp'),
)

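You can try an aggregate query for this as follows:
from django.db.models import Max

User.objects.aggregate(Max('event__timestamp'))
Note that aggregate() returns a single dictionary for the whole queryset (here, the latest timestamp across all users), not one value per user.
See the Django aggregation documentation for details on the above code.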
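As a sanity check on the target SQL: the query in the question needs a GROUP BY to return one row per user, and a LEFT JOIN if users without events should still appear. Below is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample rows are assumptions chosen to mirror the models, not necessarily what Django would create or generate.

```python
import sqlite3

# Hypothetical schema mirroring the User and Event models; the table and
# column names are assumptions made for this illustration only.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    timestamp TEXT
);
INSERT INTO users VALUES (1, 'ann'), (2, 'bob');
INSERT INTO events (user_id, timestamp) VALUES
    (1, '2024-01-01 00:00:00'),
    (1, '2024-02-01 00:00:00');
"""

# LEFT JOIN keeps users that have no events; GROUP BY yields one row per user.
QUERY = """
SELECT u.id, u.name, COUNT(e.id) AS event_count, MAX(e.timestamp) AS last_event
FROM users AS u
LEFT JOIN events AS e ON e.user_id = u.id
GROUP BY u.id, u.name
ORDER BY u.id
"""


def user_event_stats(conn):
    """Return one (id, name, event_count, last_event) row per user."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = user_event_stats(conn)
for row in rows:
    print(row)
```

A user with no events gets a count of 0 and a NULL (None) maximum, which is the shape a per-user annotation is expected to have.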

Related

LEFT JOIN with other param in ON Django ORM

I have the following models:
class Customer(models.Model):
    name = models.CharField(max_length=255)
    email = models.EmailField(max_length=255, default='example@example.com')
    authorized_credit = models.IntegerField(default=0)
    balance = models.IntegerField(default=0)

class Transaction(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    payment_amount = models.IntegerField(default=0)  # can be 0 or have a value
    exit_amount = models.IntegerField(default=0)  # can be 0 or have a value
    transaction_date = models.DateField()
I want to query all customer information together with the date of each customer's last payment.
I have this query in Postgres that is correct and returns exactly what I need:
select e.*, max(l.transaction_date) as last_date_payment
from app_customer as e
left join app_transaction as l
on e.id = l.customer_id and l.payment_amount != 0
group by e.id
order by e.id
But I need this query in Django for a serializer. I tried the following, but it produces a different query.
In Python:
print(Customer.objects.filter(transaction__isnull=True).order_by('id').query)
>>> SELECT app_customer.id, app_customer.name, app_customer.email, app_customer.balance FROM app_customer
LEFT OUTER JOIN app_transaction
ON (app_customer.id = app_transaction.customer_id)
WHERE app_transaction.id IS NULL
ORDER BY app_customer.id ASC
What I need are rows like the ones returned by the SQL query above.
Whether or not you are working with a serializer, you can reuse the same view or function for both tasks.
First, to get the transaction details for a given customer object, you need to be aware of related_name. It has a default value, but you can set something explicit that is easier to remember.
Change your model:
class Transaction(models.Model):
    customer = models.ForeignKey(Customer, related_name="transac_set", on_delete=models.CASCADE)
related_name is how Django names the reverse relationship from Customer to Transaction. With it, for a Customer instance cus you can call cus.transac_set.all() to fetch all transactions of that customer.
Since you may need transaction details for several customers, you can use select_related() when querying. It follows the foreign key in the same query, so the database is hit as few times as possible.
Create a function to get all transactions of the given customers:
def get_cus_transac(cus_id_list):
    # cus_id_list is the list of customer ids to fetch
    cus_transac_list = Transaction.objects.select_related('customer').filter(customer_id__in=cus_id_list)
    return cus_transac_list
For your purpose you need a different approach, prefetch_related(), which is why you needed related_name.
Create a function to get the customers together with their transactions. ***Warning: I typed this answer just before going to sleep, so the code below does not yet fetch only the latest transaction. I will add that later, but you can work along similar lines to get it done.
def get_latest_transac(cus_id_list):
    # cus_id_list is the list of customer ids to fetch
    latest_transac_list = Customer.objects.filter(id__in=cus_id_list).prefetch_related('transac_set')
    return latest_transac_list
Now for the serializer: you need three serializers (strictly only two, but a third can serialize the Customer data plus the latest transaction you need): one for Transaction, one for Customer, and a third to combine them.
There might be mistakes in the code, or I might have missed some details, as I have not tested it. I am assuming you know how to write the serializers and views.
One approach is to use subqueries:
from django.db.models import OuterRef, Subquery

transaction_subquery = Transaction.objects.filter(
    customer=OuterRef('pk'), payment_amount__gt=0,
).order_by('-transaction_date')
Customer.objects.annotate(
    last_date_payment=Subquery(
        transaction_subquery.values('transaction_date')[:1]
    )
)
This gets all customer data and annotates each customer with the date of their last transaction that has a positive payment_amount, in one query. Customers without such a transaction get None.
To solve your problem:
I want to query for get all customer information and date of last payment.
You can try combining order_by with distinct:
Customer.objects.values(
    'id', 'name', 'email', 'authorized_credit', 'balance',
    'transaction__transaction_date',
).order_by('id', '-transaction__transaction_date').distinct('id')
Note:
Passing field names to distinct() only works on PostgreSQL. The order_by() must start with the same fields as distinct(), so each customer keeps only the row with the latest transaction date.
Usage of distinct: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#distinct
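The reason the question's SQL puts the payment filter in the ON clause rather than in WHERE is worth spelling out: a WHERE filter on the joined table drops customers that have no matching payment, while the same filter in ON keeps them with a NULL date. A minimal sketch with Python's built-in sqlite3 module follows; the reduced schema and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# Hypothetical, reduced schema and data (assumptions for this sketch only).
SCHEMA = """
CREATE TABLE app_customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE app_transaction (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES app_customer(id),
    payment_amount INTEGER,
    transaction_date TEXT
);
INSERT INTO app_customer VALUES (1, 'paid'), (2, 'exit only'), (3, 'no rows');
INSERT INTO app_transaction (customer_id, payment_amount, transaction_date) VALUES
    (1, 50, '2021-03-01'),
    (1, 20, '2021-04-01'),
    (1, 0,  '2021-05-01'),
    (2, 0,  '2021-06-01');
"""

# Filter inside ON: every customer survives the LEFT JOIN.
ON_FILTER = """
SELECT e.id, MAX(l.transaction_date) AS last_date_payment
FROM app_customer AS e
LEFT JOIN app_transaction AS l
  ON e.id = l.customer_id AND l.payment_amount != 0
GROUP BY e.id
ORDER BY e.id
"""

# Same filter in WHERE: customers without a payment are removed.
WHERE_FILTER = """
SELECT e.id, MAX(l.transaction_date) AS last_date_payment
FROM app_customer AS e
LEFT JOIN app_transaction AS l ON e.id = l.customer_id
WHERE l.payment_amount != 0
GROUP BY e.id
ORDER BY e.id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
on_rows = conn.execute(ON_FILTER).fetchall()
where_rows = conn.execute(WHERE_FILTER).fetchall()
print("filter in ON:   ", on_rows)
print("filter in WHERE:", where_rows)
```

Any ORM solution should therefore be checked against customers that have no payments at all, since that is where the two forms differ.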

django orm sort by foreign key occurence in another table

Suppose I have the following models:
class Book(models.Model):
    title = models.CharField(max_length=200)

class Purchase(models.Model):
    book = models.ForeignKey(Book, db_column="book", on_delete=models.CASCADE)
    date = models.DateField()
and I want to retrieve a queryset of books ordered by number of purchases (i.e. occurrences of the foreign key in the other table).
Book.objects.all().annotate(number_of_purchases=Count(**something**)).order_by('number_of_purchases')
Is this possible? I currently have no idea what the "something" should be replaced with.
You can work with .annotate(…) [Django-doc]:
from django.db.models import Count

Book.objects.annotate(
    npurchase=Count('purchase')
).order_by('-npurchase')

How to filter not only by outerref id in a subquery?

I have a problem with filtering by boolean field in a subquery.
For example, I have two models: Good and Order.
class Good(models.Model):
    objects = GoodQuerySet.as_manager()

class Order(models.Model):
    good = models.ForeignKey(Good, related_name="orders", on_delete=models.CASCADE)
    is_completed = models.BooleanField(default=False)
I want to calculate how many completed orders each good has.
I implemented a method in Good's manager:
class GoodQuerySet(models.QuerySet):
    def completed_orders_count(self):
        subquery = Subquery(
            Order.objects.filter(good_id=OuterRef("id"))
            .order_by()
            .values("good_id")
            .annotate(c=Count("*"))
            .values("c")
        )
        return self.annotate(completed_orders_count=Coalesce(subquery, 0))
This method counts all existing orders for a good (not only the completed ones), but it works when I call it like this:
Good.objects.completed_orders_count().first().completed_orders_count
To get the correct value of completed orders I tried to add filter is_completed=True. The final version looks like this:
class GoodQuerySet(models.QuerySet):
    def completed_orders_count(self):
        subquery = Subquery(
            Order.objects.filter(good_id=OuterRef("id"), is_completed=True)
            .order_by()
            .values("good_id")
            .annotate(c=Count("*"))
            .values("c")
        )
        return self.annotate(completed_orders_count=Coalesce(subquery, 0))
When I call Good.objects.completed_orders_count().first().completed_orders_count, I get an error:
django.core.exceptions.FieldError: Expression contains mixed types. You must set output_field.

Alternative way of querying through a models' method field

I have this model about Invoices which has a property method which refers to another model in order to get the cancelation date of the invoice, like so:
class Invoice(models.Model):
    # (...)
    @property
    def cancel_date(self):
        if self.canceled:
            return self.records.filter(change_type = 'cancel').first().date
        else:
            return None
In one of my views, I need to query every invoice that was canceled after max_date or has not been canceled at all.
Like so:
def ExampleView(request):
    # (...)
    qs = Invoice.objects.all()
    if r.get('maxDate'):
        max_date = datetime.strptime(r.get('maxDate'), r'%Y-%m-%d')
        ids = list(map(lambda i: i.pk, filter(lambda i: (i.cancel_date is None) or (i.cancel_date > max_date), qs)))
        qs = qs.filter(pk__in = ids) #Error -> django.db.utils.OperationalError: too many SQL variables
However, ids can be a huge list, which causes the "too many SQL variables" error.
What's the smartest approach here?
EDIT:
I'm looking for a solution that does not involve adding cancel_date as a model field, since invoice.records refers to another model where we store every date attribute of the invoice.
Like so:
class InvoiceRecord(models.Model):
    invoice = models.ForeignKey(Invoice, related_name = 'records', on_delete = models.CASCADE)
    date = models.DateTimeField(default = timezone.now)
    change_type = models.CharField(max_length = 32) # Multiple choices field
An invoice may also have more than one record of the same type. For example, one invoice might have two cancelation dates.
You can annotate a Subquery() expression [Django docs] which will give you the date to do this:
from django.db.models import OuterRef, Q, Subquery

def ExampleView(request):
    # (...)
    qs = Invoice.objects.annotate(
        cancel_date=Subquery(
            # Only 'cancel' records, to mirror the cancel_date property
            InvoiceRecord.objects.filter(
                invoice=OuterRef("pk"), change_type='cancel'
            ).values('date')[:1]
        )
    )
    if r.get('maxDate'):
        max_date = datetime.strptime(r.get('maxDate'), r'%Y-%m-%d')
        qs = qs.filter(Q(cancel_date__isnull=True) | Q(cancel_date__gt=max_date))
I would store cancel_date as a database field and set it whenever you set the canceled flag. Then you can use a single query:
qs = Invoice.objects.filter(Q(cancel_date__isnull=True) | Q(cancel_date__gt=max_date))
This says that cancel_date is NULL or greater than max_date.
I'm not sure about your cancel_date property. It returns the first record with change_type='cancel', which (I don't know your code flow) may be a different record from the one you expect when an invoice has several.
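To make the idea behind the annotated subquery concrete: the filtering happens inside the database through a correlated subquery, so no list of ids is ever passed as SQL parameters. Here is a rough sketch with Python's built-in sqlite3 module. The table names, the reduced columns, the sample rows, and the helper invoices_cancelled_after are all hypothetical; picking the first cancel record by id is an assumption meant to mimic the property's .first().

```python
import sqlite3

# Hypothetical, reduced schema and data (assumptions for this sketch only).
SCHEMA = """
CREATE TABLE invoice (id INTEGER PRIMARY KEY);
CREATE TABLE invoice_record (
    id INTEGER PRIMARY KEY,
    invoice_id INTEGER REFERENCES invoice(id),
    date TEXT,
    change_type TEXT
);
INSERT INTO invoice VALUES (1), (2), (3);
INSERT INTO invoice_record (invoice_id, date, change_type) VALUES
    (1, '2021-01-10', 'cancel'),
    (2, '2021-03-10', 'cancel'),
    (2, '2021-04-10', 'cancel'),
    (3, '2021-01-05', 'create');
"""

# The inner SELECT computes cancel_date per invoice (first 'cancel' record);
# the outer WHERE keeps invoices never canceled or canceled after max_date.
QUERY = """
SELECT id, cancel_date FROM (
    SELECT i.id AS id,
           (SELECT r.date
            FROM invoice_record AS r
            WHERE r.invoice_id = i.id AND r.change_type = 'cancel'
            ORDER BY r.id
            LIMIT 1) AS cancel_date
    FROM invoice AS i
)
WHERE cancel_date IS NULL OR cancel_date > ?
ORDER BY id
"""


def invoices_cancelled_after(conn, max_date):
    """Return (id, cancel_date) for invoices not canceled by max_date."""
    return conn.execute(QUERY, (max_date,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
rows = invoices_cancelled_after(conn, "2021-02-01")
print(rows)
```

Only one parameter (the date) is bound no matter how many invoices match, which is what avoids the "too many SQL variables" error.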

how to let django achieve inner join

There are two tables:
class TBLUserProfile(models.Model):
    userid = models.IntegerField(primary_key=True)
    relmusicuid = models.IntegerField()
    fansnum = models.IntegerField()

class TSinger(models.Model):
    fsinger_id = models.IntegerField()
    ftbl_user_profile = models.ForeignKey(TBLUserProfile, db_column='Fsinger_id')
I want to get TSinger info ordered by TBLUserProfile.fansnum. I know how to write the SQL query: select * from t_singer INNER JOIN tbl_user_profile ON (tbl_user_profile.relmusicuid=t_singer.Fsinger_id) order by tbl_user_profile.fansnum, but I don't want to use raw queries. relmusicuid is not the primary key; otherwise I could use a ForeignKey to make it work. How can I achieve this with Django models?
You can do it like this:
TSinger.objects.all().order_by('ftbl_user_profile__fansnum')
For information about joins in Django:
https://docs.djangoproject.com/en/1.8/topics/db/examples/many_to_one/
