Django: Aggregating Across Submodels' Fields - python

I currently have the following models, where there is a Product class, which has many Ratings. Each Rating has a date_created DateTime field, and a stars field, which is an integer from 1 to 10. Is there a way I can add up the total number of stars given to all products on a certain day, for all days?
For instance, on December 21st, 543 stars were given to all Products in total (e.g. 300 on Item A, 10 on Item B, 233 on Item C). On the next day, there might be 0 stars, because there were no ratings for any Products.
I can imagine first getting a list of dates, and then filtering on each date, and aggregating each one, but this seems very intensive. Is there an easier way?

You should be able to do it all in one query, using values:
from datetime import date, timedelta
from django.db.models import Sum
end_date = date.today()
start_date = end_date - timedelta(days=7)
qs = Rating.objects.filter(date_created__gt=start_date, date_created__lt=end_date)
qs = qs.values('date_created').annotate(total=Sum('stars'))
print(qs)
Should output something like:
[{'date_created': '1-21-2013', 'total': 150}, ... ]
The SQL for it looks like this (WHERE clause omitted):
SELECT "myapp_ratings"."date_created", SUM("myapp_ratings"."stars") AS "total" FROM "myapp_ratings" GROUP BY "myapp_ratings"."date_created"

You'll want to use Django's aggregation functions; specifically, Sum.
>>> from django.db.models import Sum
>>>
>>> date = '2012-12-21'
>>> Rating.objects.filter(date_created=date).aggregate(Sum('stars'))
{'stars__sum': 543}
As a side note, your scenario actually doesn't need to use any submodels at all. Since the date_created field and the stars field are both members of the Rating model, you can just do a query over it directly.
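One caveat, as a sketch: if date_created is a DateTimeField rather than a DateField, an exact-match filter on a plain date will miss rows that carry a time component; the __date lookup (available since Django 1.9) matches any time on that calendar day:
from datetime import date
from django.db.models import Sum

# __date truncates the stored datetime before comparing, so every rating
# made on December 21st is included regardless of its time of day
Rating.objects.filter(date_created__date=date(2012, 12, 21)).aggregate(Sum('stars'))
# {'stars__sum': 543}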

You could always just perform some raw SQL:
from django.db import connection, transaction
cursor = connection.cursor()
cursor.execute('SELECT date_created, SUM(stars) FROM yourapp_rating GROUP BY date_created')
result = cursor.fetchall() # looks like [('date1', 'sum1'), ('date2', 'sum2'), etc]
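If you would rather have rows keyed by column name than positional tuples, a small helper in the style of the dictfetchall() snippet from the Django docs works; a sketch (re-run execute() before fetching, since fetchall() above has already consumed the rows):
def dictfetchall(cursor):
    # build one dict per row, keyed by the column names in cursor.description
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

cursor.execute('SELECT date_created, SUM(stars) AS total FROM yourapp_rating GROUP BY date_created')
result = dictfetchall(cursor)  # [{'date_created': ..., 'total': ...}, ...]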

Related

Using the SQLAlchemy ORM in a Python REST API, how can I aggregate resources by hour for each day?

I have a MySQL db table that looks like:
time_slot | sales
2022-08-26T01:00:00 | 100
2022-08-26T01:06:40 | 103
...
I am serving the data via an API to a client. The FE engineer wants the data aggregated by hour for each day within the query period (at the moment it's a week). So he gives from and to dates and wants the sum of sales within each hour for each day as a nested array. Because it's a week, it's a 7-element array, where each element is an array containing all the hourly slots where we have data.
[
[
"07:00": 567,
"08:00": 657,
....
],
[], [], ...
]
The API is built in Python. There is an ORM (SQLAlchemy) model for the data that looks like:
class HourlyData(Base):
    hour = Column(DateTime)
    sales = Column(Float)
I can query the hourly data and then aggregate it into a list of lists in Python memory. But to save compute time (and conceptual complexity), I would like to run the aggregation through ORM queries.
What is the sqlalchemy syntax to achieve this?
The below should get you started, where the solution is a mix of SQL and Python using existing tools, and it should work with any RDBMS.
Assumed model definition, and imports
from itertools import groupby
import json

from sqlalchemy import Column, DateTime, Float, Integer, func

class TimelyData(Base):
    __tablename__ = "timely_data"
    id = Column(Integer, primary_key=True)
    time_slot = Column(DateTime)
    sales = Column(Float)
We get the data from the DB aggregated enough for us to group properly
# below works for PostgreSQL (tested); MySQL has no date_trunc, see the
# DATE_FORMAT sketch after this block
# see: https://mode.com/blog/date-trunc-sql-timestamp-function-count-on
col_hour = func.date_trunc("hour", TimelyData.time_slot)
q = (
    session.query(
        col_hour.label("hour"),
        func.sum(TimelyData.sales).label("total_sales"),
    )
    .group_by(col_hour)
    .order_by(col_hour)  # this is important for the `groupby` function later on
)
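For a MySQL backend, a sketch of an equivalent hourly bucket built with DATE_FORMAT (the bucket then comes back as a string, so the groupby key below would slice it, e.g. row.hour[:10], instead of calling .date()):
# assumption: MySQL backend; DATE_FORMAT yields strings like '2022-08-26 01:00:00'
col_hour = func.date_format(TimelyData.time_slot, "%Y-%m-%d %H:00:00")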
Group the results by date again using python groupby
groups = groupby(q.all(), key=lambda row: row.hour.date())
# truncate and format the final list as required
data = [
[(f"{row.hour:%H}:00", int(row.total_sales)) for row in rows]
for _, rows in groups
]
Example result:
[[["01:00", 201], ["02:00", 102]], [["01:00", 103]], [["08:00", 104]]]
I am not familiar with MySQL, but with PostgreSQL one could implement all of this at the DB level thanks to its extensive JSON support. However, I would argue the readability of that implementation would not improve, and neither would the speed, assuming we get at most 168 rows (7 days x 24 hours) from the database.

How to merge two different querysets with one common field in to one in django?

I have 2 Querysets Sales_order and Proc_order. The only common field in both is the product_id. I want to merge both these query sets to one with all fields.
sales_order has fields product_id,sales_qty,sales_price.
proc_order has fields product_id, proc_qty, proc_price. I want to merge both these to get a queryset which looks like.
combined_report, which has fields product_id, sales_qty, sales_price, proc_qty, proc_price.
My final aim is to calculate the difference between the number of products.
I'm using Django 2.1
You can try this way to capture all the values.
from django.db.models import Subquery, OuterRef, FloatField
from django.db.models.functions import Cast
subquery_qs = proc_order_qs.filter(product_id=OuterRef('product_id'))
combined_qs = sales_order_qs.annotate(
    proc_qty=Cast(Subquery(subquery_qs.values('proc_qty')[:1]), output_field=FloatField()),
    proc_price=Cast(Subquery(subquery_qs.values('proc_price')[:1]), output_field=FloatField()),
)
And then you can get all the values in combined_qs
combined_qs.values('product_id','sales_qty','sales_price','proc_qty', 'proc_price')
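For the stated final aim, the per-product difference can then be annotated onto the combined queryset; a sketch (qty_diff is a name introduced here for illustration):
from django.db.models import ExpressionWrapper, F, FloatField

combined_qs = combined_qs.annotate(
    # difference between sold and procured quantities for each product
    qty_diff=ExpressionWrapper(F('sales_qty') - F('proc_qty'), output_field=FloatField()),
)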
you can try to do something like this:
views.py
from itertools import chain
def yourview(request):
    Sales_order = ....
    Proc_order = ....
    combined_report = chain(Sales_order, Proc_order)
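Note that chain() only concatenates the two querysets one after the other; if the rows should actually be lined up on product_id, a sketch of an in-memory merge (field names as in the question) could look like:
combined = {}
for row in Sales_order.values('product_id', 'sales_qty', 'sales_price'):
    combined[row['product_id']] = dict(row)
for row in Proc_order.values('product_id', 'proc_qty', 'proc_price'):
    # merge the procurement columns onto the matching sales row, or create one
    combined.setdefault(row['product_id'], {}).update(row)
combined_report = list(combined.values())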

Count records per day in a Django Model where date is Unix

I'm trying to create a query that counts how many queries per day there were on a certain Django table. I found a bunch of examples about it, but none dealt with Unix timestamps. Here is what my model looks like:
class myData(models.Model):
    user_id = models.IntegerField()
    user = models.CharField(max_length=150)
    query = models.CharField(max_length=100)
    unixtime = models.IntegerField()

    class Meta:
        managed = False
        db_table = 'myData'
So the result i'm trying to get is something like: {'27/06/2020': 10, '26/06/2020': 15 ... }
The doubt I have is: should I use a raw MySQL query or should I use Django's ORM?
I tried to make it with a raw query, but didn't get the expected output:
select FROM_UNIXTIME(`unixtime`, '26.06.2020') as ndate,
count(id) as query_count
from myData
group by ndate
But it gave the following output:
ndate query_count
26/06/2020 1
26/06/2020 1
26/06/2020 1
26/06/2020 1
....
Can anyone help me out with this? It doesn't make a difference whether the query is made with raw MySQL or the Django ORM, I just need a simple way to do this.
You should read up on how to use the function FROM_UNIXTIME(), especially the allowed format string options.
Your query should probably be modified to something like this:
select FROM_UNIXTIME(unixtime, '%Y/%m/%d') as ndate,
count(id) as query_count
from myData
group by ndate
Does that work for you?
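If the Django ORM is preferred, a rough equivalent can be built with a Func expression; a sketch (MySQL-specific, since it relies on FROM_UNIXTIME; model and field names as in the question):
from django.db.models import Count, DateField, F, Func

# wrap FROM_UNIXTIME(unixtime) in DATE() so rows group by calendar day
day = Func(
    F('unixtime'),
    function='FROM_UNIXTIME',
    template='DATE(%(function)s(%(expressions)s))',
    output_field=DateField(),
)

counts = (
    myData.objects
    .annotate(ndate=day)
    .values('ndate')
    .annotate(query_count=Count('id'))
    .order_by('ndate')
)
# e.g. <QuerySet [{'ndate': datetime.date(2020, 6, 26), 'query_count': 15}, ...]>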

Optimizing a Django query

I have 2 models like these:
class Client(models.Model):
    # some fields

class Transaction(models.Model):
    client = models.ForeignKey(Client)
    created = models.DateTimeField(auto_now_add=True)
    amount = models.DecimalField(max_digits=9, decimal_places=2)
I want to write a query that adds up the last created Transaction amount per client, but only if created is earlier than a provided date argument.
For example, if I have a dataset like this one, and the provided date is 01/20:
Client1:
- Transaction 1, created on 01/15, 5€
- Transaction 2, created on 01/16, 6€
- Transaction 3, created on 01/22, 7€
Client2:
- Transaction 4, created on 01/18, 8€
- Transaction 5, created on 01/19, 9€
- Transaction 6, created on 01/21, 10€
Client3:
- Transaction 7, created on 01/21, 11€
Then, the query should return 15€ (6€ + 9€, from transactions 2 and 5).
From a performance view, my purpose is to avoid having N queries for N clients.
Currently, I have trouble selecting the right Transaction objects.
Maybe I could start by:
Transaction.objects.filter(created__lt=date).select_related('client').
But then, I can't figure out how to select only the latest per client.
Take a look at Django's docs on aggregation, the usage of Sum, SubQuery expressions, and QuerySet.values(). With those we can construct a single query through the ORM to get at what you're after, allowing the database to do all the work:
from django.db.models import Sum, Subquery, OuterRef
from django.utils import timezone
from . import models
# first, start with the client list, rather than the transaction list
aggregation = models.Client.objects.aggregate(
    # aggregate the sum of our per-client subqueries
    result=Sum(
        Subquery(
            models.Transaction.objects.filter(
                # filter transactions by the outer query's client pk
                client=OuterRef('pk'),
                created__lt=timezone.datetime(2018, 1, 20),
            )
            # order descending so the transaction we're after is first in the list
            .order_by('-created')
            # use QuerySet.values() to grab just amount and slice the queryset
            # to limit the subquery result to a single transaction for each client
            .values('amount')[:1]
        )
    )
)
# aggregation == {'result': Decimal('15.00')}
Something along the lines of the following should do the trick, using Django's latest() QuerySet method:
total = 0
for client in clients:
    try:
        total += Transaction.objects.filter(client=client, created__lt=date).latest('created').amount
    except Transaction.DoesNotExist:
        pass

How to execute a GROUP BY ... COUNT or SUM in Django ORM?

Prologue:
This is a question arising often in SO:
Django Models Group By
Django equivalent for count and group by
How to query as GROUP BY in django?
How to use the ORM for the equivalent of a SQL count, group and join query?
I have composed an example on SO Documentation but since the Documentation will get shut down on August 8, 2017, I will follow the suggestion of this widely upvoted and discussed meta answer and transform my example to a self-answered post.
Of course, I would be more than happy to see any different approach as well!!
Question:
Assume the model:
class Books(models.Model):
    title = models.CharField()
    author = models.CharField()
    price = models.FloatField()
How can I perform the following queries on that model utilizing Django ORM:
GROUP BY ... COUNT:
SELECT author, COUNT(author) AS count
FROM myapp_books GROUP BY author
GROUP BY ... SUM:
SELECT author, SUM(price) AS total_price
FROM myapp_books GROUP BY author
We can perform GROUP BY ... COUNT or GROUP BY ... SUM SQL-equivalent queries on the Django ORM with the use of annotate(), values(), the django.db.models Count and Sum functions respectively, and optionally the order_by() method:
GROUP BY ... COUNT:
from django.db.models import Count
result = (Books.objects.values('author')
          .order_by('author')
          .annotate(count=Count('author')))
Now result contains a QuerySet of dictionaries, each with two keys: author and count:
author | count
------------|-------
OneAuthor | 5
OtherAuthor | 2
... | ...
GROUP BY ... SUM:
from django.db.models import Sum
result = (Books.objects.values('author')
          .order_by('author')
          .annotate(total_price=Sum('price')))
Now result contains a QuerySet of dictionaries, each with two keys: author and total_price:
author | total_price
------------|-------------
OneAuthor | 100.35
OtherAuthor | 50.00
... | ...
UPDATE 13/04/2021
As #dgw points out in the comments, in the case that the model uses a meta option to order rows (ex. ordering), the order_by() clause is paramount for the success of the aggregation!
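A sketch of that pitfall: if Books.Meta defined, say, ordering = ['title'], the default ordering would leak into the GROUP BY and break the aggregation; either order by the grouping field as above, or clear the ordering entirely with a bare order_by():
from django.db.models import Count

# a bare order_by() strips Meta.ordering so it cannot break the grouping
result = (Books.objects.values('author')
          .order_by()
          .annotate(count=Count('author')))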
With a filtered SUM() you can get two totals in one aggregate call, for example:
inv_data_tot_paid = Invoice.objects.aggregate(
    total=Sum('amount', filter=Q(status=True, month=m, created_at__year=y)),
    paid=Sum('amount', filter=Q(status=True, month=m, created_at__year=y, paid=1)),
)
print(inv_data_tot_paid)
# output: {'total': 103456, 'paid': None}
Do not try to pack many more filtered aggregates into one call, though, otherwise you may run into errors.
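As a related sketch, Coalesce can replace the None that a filtered Sum returns when nothing matches the filter (assuming amount is an integer field, as the sample output suggests):
from django.db.models import Q, Sum
from django.db.models.functions import Coalesce

inv_data_tot_paid = Invoice.objects.aggregate(
    # Coalesce turns a NULL sum into 0 instead of None
    total=Coalesce(Sum('amount', filter=Q(status=True, month=m, created_at__year=y)), 0),
    paid=Coalesce(Sum('amount', filter=Q(status=True, month=m, created_at__year=y, paid=1)), 0),
)
# e.g. {'total': 103456, 'paid': 0}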
