Optimizing a Django query - python

I have 2 models like these:

class Client(models.Model):
    ...  # some fields

class Transaction(models.Model):
    client = models.ForeignKey(Client, on_delete=models.CASCADE)
    created = models.DateTimeField(auto_now_add=True)
    amount = models.DecimalField(max_digits=9, decimal_places=2)
I want to write a query that sums the amount of the last created Transaction per client, but only considering transactions whose created date is earlier than a provided date argument.
For example, if I have a dataset like this one and the provided date is 01/20:
Client1:
- Transaction 1, created on 01/15, 5€
- Transaction 2, created on 01/16, 6€
- Transaction 3, created on 01/22, 7€
Client2:
- Transaction 4, created on 01/18, 8€
- Transaction 5, created on 01/19, 9€
- Transaction 6, created on 01/21, 10€
Client3:
- Transaction 7, created on 01/21, 11€
Then, the query should return 15 (6€ + 9€ from transaction 2 and 5).
From a performance point of view, my goal is to avoid running N queries for N clients.
Currently, I have trouble selecting the right Transaction objects.
Maybe I could start with:
Transaction.objects.filter(created__lt=date).select_related('client')
But then, I can't figure out how to select only the latest per client.

Take a look at Django's docs on aggregation, the Sum and Subquery expressions, and QuerySet.values(). With those we can construct a single query through the ORM to get what you're after, letting the database do all the work:
from django.db.models import Sum, Subquery, OuterRef
from django.utils import timezone

from . import models

# first, start with the client list, rather than the transaction list
aggregation = models.Client.objects.aggregate(
    # aggregate the sum of our per-client subqueries
    result=Sum(
        Subquery(
            models.Transaction.objects.filter(
                # filter transactions by the outer query's client pk
                client=OuterRef('pk'),
                created__lt=timezone.datetime(2018, 1, 20),
            )
            # order descending so the transaction we're after is first in the list
            .order_by('-created')
            # use QuerySet.values() to grab just amount and slice the queryset
            # to limit the subquery result to a single transaction for each client
            .values('amount')[:1]
        )
    )
)

# aggregation == {'result': Decimal('15.00')}

Something along the lines of the following should do the trick, using Django's latest() QuerySet method:

total = 0
for client in clients:
    try:
        total += Transaction.objects.filter(client=client, created__lt=date).latest('created').amount
    except Transaction.DoesNotExist:
        pass

Related

How can I change this SQL to Django ORM code?

select *
from sample
join process
on sample.processid = process.id
where (processid) in (
select max(processid) as processid
from main_sample
group by serialnumber
)
ORDER BY sample.create_at desc;
models.py

class Sample(models.Model):
    processid = models.IntegerField(default=0)
    serialnumber = models.CharField(max_length=256)
    create_at = models.DateTimeField(null=True)

class Process(models.Model):
    sample = models.ForeignKey(Sample, blank=False, null=True, on_delete=models.SET_NULL)
Hi, I have two models and I need to translate this SQL query into Django ORM (Python) code.
I need to retrieve the latest Sample (by processid) per unique serialnumber.
How can I change the SQL query, and in particular its subquery, to ORM code?
Thanks for reading.
EDIT: To also order by a column that is not one of the distinct or retrieved columns, you can fall back on subqueries. To filter by a single row from a subquery, you can use the syntax described in the docs here:
from django.db.models import Subquery, OuterRef

subquery = Subquery(Sample.objects.filter(
    serialnumber=OuterRef('serialnumber')
).order_by(
    '-processid'
).values(
    'processid'
)[:1])

results = Sample.objects.filter(
    processid=subquery
).order_by(
    'create_at'
)
When using PostgreSQL, you can pass field names to distinct() to get a single result per value of that column. It returns the first row of each group, so combined with ordering it will do what you need:
Sample.objects.order_by('serialnumber', '-processid').distinct('serialnumber')
If you don't use PostgreSQL, use a values() query on the column that should be unique, and then annotate the queryset with the aggregate that should group the values, Max in this case:
from django.db.models import Max

Sample.objects.order_by(
    'serialnumber'
).values(
    'serialnumber'
).annotate(
    max_processid=Max('processid')
)
I think this is what you need:
If you want multiple related objects:
samples = Sample.objects.prefetch_related('process').group_by('serialnumber')
If you want related objects for only one object:
samples = Sample.objects.filter(id=1).select_related('process').group_by('serialnumber')

Filter by CharField pretending it is DateField in Django ORM/MySQL

I am working with a pre-existing MySQL database through the Django ORM and I need to filter rows by date, except that the dates are not stored as a Date type but as plain Varchar(20) strings in the form dd/mm/yyyy hh:mm(:ss).
With a raw query I would convert the field into a date and use the > and < operators to filter the results, but before doing this I was wondering whether the Django ORM provides a more elegant way to do so without writing raw SQL.
I look forward to any suggestion.
EDIT: my raw query would look like
SELECT * FROM table WHERE STR_TO_DATE(mydate,'%d/%m/%Y %H:%i') > STR_TO_DATE('30/12/2020 00:00', '%d/%m/%Y %H:%i')
Thank you.
I will assume your model looks like this:
from django.db import models

class Event(models.Model):
    mydate = models.CharField(max_length=20)

    def __str__(self):
        return f'Event at {self.mydate}'
You can construct a Django query expression to represent this computation. This expression consists of:
Func objects representing your STR_TO_DATE function calls.
An F object representing your field name.
A GreaterThan function to represent your > comparison.
from django.db.models import F, Func, Value
from django.db.models.lookups import GreaterThan

from .models import Event

# Create some events for this example
Event(mydate="29/12/2020 00:00").save()
Event(mydate="30/12/2020 00:00").save()
Event(mydate="31/12/2020 00:00").save()

class STR_TO_DATE(Func):
    "Lets us use MySQL's STR_TO_DATE() function directly from Python"
    function = 'STR_TO_DATE'

# This Django query expression converts the mydate field
# from a string into a date, using the STR_TO_DATE function.
mydate = STR_TO_DATE(F('mydate'), Value('%d/%m/%Y %H:%i'))

# This Django query expression represents the value 30/12/2020
# as a date.
date_30_12_2020 = STR_TO_DATE(Value('30/12/2020 00:00'), Value('%d/%m/%Y %H:%i'))

# This Django query expression puts the other two together,
# creating a condition like this: mydate > 30/12/2020
expr = GreaterThan(mydate, date_30_12_2020)

# Use the expression as a filter
events = Event.objects.filter(expr)
print(events)

# You can also use the annotate function to add a calculated
# column to your query...
events_with_date = Event.objects.annotate(date=mydate)

# Then, you just treat your calculated column like any other
# field in your database. This example uses a range filter
# (see https://docs.djangoproject.com/en/4.0/ref/models/querysets/#range)
events = events_with_date.filter(date__range=["2020-12-30", "2020-12-31"])
print(events)
I tested this answer with Django 4.0.1 and MySQL 8.0.

Django: Aggregating Across Submodels Fields

I currently have the following models, where there is a Product class, which has many Ratings. Each Rating has a date_created DateTime field, and a stars field, which is an integer from 1 to 10. Is there a way I can add up the total number of stars given to all products on a certain day, for all days?
For instance, on December 21st, 543 stars were given to all Products in total (ie. 200 on Item A, 10 on Item B, 233 on Item C). On the next day, there might be 0 stars, because there were no ratings for any Products.
I can imagine first getting a list of dates, and then filtering on each date, and aggregating each one, but this seems very intensive. Is there an easier way?
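For reference, the naive per-date loop described above would look roughly like this (a sketch only; Rating and its fields come from the question, while the dates() and __date lookups are assumptions about the Django version in use):

from django.db.models import Sum

# Naive approach: one aggregation query per distinct day (slow when there are many days)
totals = {}
for day in Rating.objects.dates('date_created', 'day'):
    totals[day] = Rating.objects.filter(
        date_created__date=day
    ).aggregate(day_total=Sum('stars'))['day_total'] or 0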
You should be able to do it all in one query, using values:
from datetime import date, timedelta
from django.db.models import Sum

end_date = date.today()
start_date = end_date - timedelta(days=7)

qs = Rating.objects.filter(date_created__gt=start_date, date_created__lt=end_date)
qs = qs.values('date_created').annotate(total=Sum('stars'))
print(qs)
Should output something like:
[{'date_created': '1-21-2013', 'total': 150}, ... ]
The SQL for it looks like this (WHERE clause omitted):
SELECT "myapp_ratings"."date_created", SUM("myapp_ratings"."stars") AS "total" FROM "myapp_ratings" GROUP BY "myapp_ratings"."date_created"
You'll want to use Django's aggregation functions; specifically, Sum.
>>> from django.db.models import Sum
>>>
>>> date = '2012-12-21'
>>> Rating.objects.filter(date_created=date).aggregate(Sum('stars'))
{'stars__sum': 543}
As a side note, your scenario actually doesn't need to use any submodels at all. Since the date_created field and the stars field are both members of the Rating model, you can just do a query over it directly.
You could always just perform some raw SQL:
from django.db import connection

cursor = connection.cursor()
cursor.execute('SELECT date_created, SUM(stars) FROM yourapp_rating GROUP BY date_created')
result = cursor.fetchall()  # looks like [('date1', 'sum1'), ('date2', 'sum2'), etc.]

GROUP BY in Django Queries

Dear Stack Overflow community:
I need your help in executing following SQL query.
select DATE(creation_date), COUNT(creation_date) from blog_article WHERE creation_date BETWEEN SYSDATE() - INTERVAL 30 DAY AND SYSDATE() AND author="scott_tiger" GROUP BY DATE(creation_date);
Here is my Django Model
class Article(models.Model):
    title = models.CharField(...)
    author = models.CharField(...)
    creation_date = models.DateField(...)
How can I form the aforementioned Django query using the aggregate() and annotate() functions? I created something like this -
now = datetime.datetime.now()
date_diff = datetime.datetime.now() + datetime.timedelta(-30)
records = Article.objects.values('creation_date', Count('creation_date')).aggregate(Count('creation_date')).filter(author='scott_tiger', created_at__gt=date_diff, created_at__lte=now)
When I run this query, it gives me the following error -
'Count' object has no attribute 'split'
Any idea how to use it?
Delete Count('creation_date') from values() and add annotate(Count('creation_date')) after the filter.
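Applied to the query in the question, that suggestion would look roughly like this (a sketch; it assumes you filter on the model's creation_date field, since the Article model shown has no created_at):

from django.db.models import Count

records = Article.objects.filter(
    author='scott_tiger',
    creation_date__gt=date_diff,
    creation_date__lte=now,
).values('creation_date').annotate(Count('creation_date'))
# each row is a dict like {'creation_date': ..., 'creation_date__count': ...}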
Try

records = Article.objects.filter(
    author='scott_tiger',
    creation_date__gt=date_diff,
    creation_date__lte=now,
).values('creation_date').annotate(ccd=Count('creation_date'))
You need to use creation_date__count, or a customized name (ccd here), to refer to the count column in each result row.
Also, values() before annotate() limits the GROUP BY columns to those listed. There is no need to group by the COUNT itself, since it is computed over each group of rows.
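A minimal usage sketch of the grouped result (assuming the names above):

for row in records:
    # each row is a dict, e.g. {'creation_date': date(2016, 5, 1), 'ccd': 3}
    print(row['creation_date'], row['ccd'])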

Can Django ORM do an ORDER BY on a specific value of a column?

I have a table 'tickets' with the following columns
id - primary key - auto increment
title - varchar(256)
status - smallint(6) - Can have any value between 1 and 5, handled by Django
When I do a SELECT *, I want the rows with status = 4 at the top; the other records will follow them. It can be achieved by the following query:
select * from tickets order by status=4 DESC
Can this query be executed through Django ORM? What parameters should be passed to the QuerySet.order_by() method?
q = Ticket.objects.extra(select={'is_top': "status = 4"})
q = q.extra(order_by = ['-is_top'])
I did this while using PostgreSQL with Django.
from django.db.models import Case, Count, When

Ticket.objects.annotate(
    relevancy=Count(Case(When(status=4, then=1)))
).order_by('-relevancy')
It will return all objects from Ticket, but tickets with status = 4 will be at the beginning.
Hope someone will find it useful.
For those who, like me, stumbled on this now and are using newer versions of Django:
from django.db.models import Case, IntegerField, When

Ticket.objects.annotate(
    relevancy=Case(
        When(status=4, then=1),
        When(status=3, then=2),
        When(status=2, then=3),
        output_field=IntegerField(),
    )
).order_by('-relevancy')
Using Count() will return 1 or 0 depending on whether your case matched or not, which is not ideal if you are ordering by more than one status.
