Django query: how to apply the mode calculation in the query?

In a previous question I was asking about how to do a complex query in Django. Here is my example model:
class Foo(models.Model):
    name = models.CharField(max_length=50)
    type = models.CharField(max_length=100, blank=True)
    foo_value = models.CharField(max_length=14, blank=True)
    time_event = models.DateTimeField(blank=True)
    # ... and many many other fields
In my previous question, @BearBrown answered by using the When ... then expression to control my query.
Now I need something more. I need to calculate the mode (most repeated value) of the quarter of the month in the time_event field. Manually, I do it like this:
- I manually iterate over all records for the same user.
- Get the day using q['time_event'].day
- Define quarters using quarts = range(1, 31, 7)
- Then, append the calculated quarters to a list month_quarts.append(quarter if quarter <= 4 else 4)
- Then get the mode value for this specific user qm = mode(month_quarts)
Is there a way to automate this mode calculation function in the When .. then expression instead of manually iterating through all records for every user and calculating it?
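The manual steps above can be sketched in plain Python as follows. The day-to-quarter mapping (days 1-7 are quarter 1, days 8-14 are quarter 2, and so on, with days 29-31 folded into quarter 4) is an assumption inferred from range(1, 31, 7). The names month_quarter and quarter_mode and the sample dates are hypothetical, not taken from the original code.

```python
from collections import Counter
from datetime import datetime


def month_quarter(day):
    """Map a day of the month (1-31) to a quarter 1-4 (assumed mapping)."""
    quarter = (day - 1) // 7 + 1
    # Days 29-31 would land in a fifth bucket, so cap at 4 as in the question
    return quarter if quarter <= 4 else 4


def quarter_mode(events):
    """Return the most frequent month quarter among the given datetimes."""
    counts = Counter(month_quarter(e.day) for e in events)
    # most_common(1) returns [(value, count)]; ties go to the value seen first
    return counts.most_common(1)[0][0]


# Hypothetical time_event values for a single user
sample_events = [
    datetime(2018, 3, 2),
    datetime(2018, 3, 9),
    datetime(2018, 3, 12),
    datetime(2018, 3, 30),
]
qm = quarter_mode(sample_events)
```

This only restates the per-user loop. Pushing the same bucketing into the database would presumably need a conditional annotation on the extracted day plus a count per user and quarter, which is not shown here.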

Related

Calculate the Average delivery time (days) Django ORM

I want to calculate the average delivery time (in days) of products using a single ORM query. The reason for using a single query is that I have 10,000+ records in the database and don't want to iterate over them in a loop. Here is an example of my models file:
class Product(models.Model):
    name = models.CharField(max_length=10)
class ProductEvents(models.Model):
    class Status(models.TextChoices):
        IN_TRANSIT = ("in_transit", "In Transit")
        DELIVERED = ("delivered", "Delivered")
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    status = models.CharField(max_length=255, choices=Status.choices)
    created = models.DateTimeField(blank=True)
The delivery time for one product can be calculated like this:
product = Product.objects.first()
# delivered_date - in_transit_date = days_taken
duration = product.productevents_set.get(status='delivered').created - product.productevents_set.get(status='in_transit').created
I'd like help getting started so that I can calculate the average delivery time across all Products. I'd prefer it to be done in a single query for performance reasons.

A basic solution is to annotate each Product with the minimum created time of its related events that have the "in_transit" status and the maximum created time of its related events that have the "delivered" status, then annotate the difference and aggregate the average of the differences:
from django.db.models import Min, Max, Q, F, Avg
Product.objects.annotate(
    start=Min('productevents__created', filter=Q(productevents__status=ProductEvents.Status.IN_TRANSIT)),
    end=Max('productevents__created', filter=Q(productevents__status=ProductEvents.Status.DELIVERED))
).annotate(
    diff=F('end') - F('start')
).aggregate(
    Avg('diff')
)
Returns a dictionary that should look like
{'diff__avg': datetime.timedelta(days=x, seconds=x)}
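To make the arithmetic concrete, here is a plain-Python sketch of what the query is meant to compute: the earliest "in_transit" time and the latest "delivered" time per product, their difference, and the average of those differences, converted to days at the end. The event rows and the function name average_delivery_time are hypothetical and only illustrate the calculation; this is not how the ORM executes it.

```python
from datetime import datetime, timedelta

# Hypothetical events: (product_id, status, created)
events = [
    (1, "in_transit", datetime(2021, 1, 1, 12, 0)),
    (1, "delivered", datetime(2021, 1, 3, 12, 0)),
    (2, "in_transit", datetime(2021, 1, 5, 0, 0)),
    (2, "delivered", datetime(2021, 1, 9, 0, 0)),
]


def average_delivery_time(rows):
    """Average (latest delivered - earliest in_transit) over products that have both."""
    starts, ends = {}, {}
    for product_id, status, created in rows:
        if status == "in_transit":
            starts[product_id] = min(created, starts.get(product_id, created))
        elif status == "delivered":
            ends[product_id] = max(created, ends.get(product_id, created))
    # Products missing either event are skipped
    diffs = [ends[p] - starts[p] for p in starts if p in ends]
    if not diffs:
        return None
    return sum(diffs, timedelta()) / len(diffs)


avg = average_delivery_time(events)
avg_days = avg.total_seconds() / 86400  # timedelta expressed in days
```

The same conversion to days applies to the timedelta returned under the 'diff__avg' key.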

Django ORM: How to group on a value and get a different value of last element in that group

I have been trying to tackle this problem all week but I just can't seem to find the solution.
Basically I want to group on 2 values (user and assignment), then take the last element of each group based on date and get the sum of these scores. Below is a description of the problem.
With Postgres this would be easily solved by using the .distinct("value") but unfortunately I do not use Postgres.
Any help would be much appreciated!!
UserAnswer
- user
- assignment
- date
- answer
- score
So I want to group on all user / assignment combinations. Then I want to get the score of each last element in that group. So basically:
user_1, assignment_1, 2019, score 1
user_1, assignment_1, 2020, score 2 <- Take this one
user_2, assignment_1, 2020, score 1
user_2, assignment_1, 2021, score 2 <- Take this one
My best attempt is using annotation but then I do not have the score value anymore:
(
    UserAnswer.objects.filter(user=student, assignment__in=assignments)
    .values("user", "assignment")
    .annotate(latest_date=Max('date'))
)

In the end, I had to use a raw query rather than Django's ORM.
subquery2 = UserAnswer.objects.raw("""
    SELECT id, user_id, assignment_id, score, MAX(date) AS latest_date
    FROM soforms_useranswer
    GROUP BY user_id, assignment_id
""")
# The raw queryset from the query above
# is very similar to the queryset you get from a Django ORM query.
# The difference is that 'id' and 'score' are now among the fields,
# so we can retrieve them later, as below.
sum2 = 0
for obj in subquery2:
    print(obj.score)
    sum2 += obj.score
print('sum2 is')
print(sum2)
Here, I assumed that both user and assignment are foreign keys, something like below:
from django.conf import settings
from django.db import models
from django.utils import timezone
class Assignment(models.Model):
    name = models.CharField(max_length=50)
class UserAnswer(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE, related_name='answers')
    assignment = models.ForeignKey(Assignment, on_delete=models.CASCADE)
    #assignment = models.CharField(max_length=200)
    score = models.IntegerField()
    date = models.DateTimeField(default=timezone.now)
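How non-grouped columns such as score are resolved in a GROUP BY query varies between databases, so a more portable shape may be to join each row against the per-group maximum date. The sketch below tries this with Python's built-in sqlite3 module on the four example rows from the question. The table name useranswer, the integer year values, and the variable names are simplifications made for illustration, and the query assumes at most one row per user, assignment and date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE useranswer ("
    "id INTEGER PRIMARY KEY, user_id INTEGER, assignment_id INTEGER, "
    "date INTEGER, score INTEGER)"
)
# The four example rows from the question (years stand in for dates)
conn.executemany(
    "INSERT INTO useranswer (user_id, assignment_id, date, score) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1, 2019, 1), (1, 1, 2020, 2), (2, 1, 2020, 1), (2, 1, 2021, 2)],
)

# Keep only the rows whose date equals the latest date of their group
LATEST_SCORES_SQL = """
    SELECT ua.user_id, ua.assignment_id, ua.score
    FROM useranswer AS ua
    JOIN (
        SELECT user_id, assignment_id, MAX(date) AS latest_date
        FROM useranswer
        GROUP BY user_id, assignment_id
    ) AS latest
      ON latest.user_id = ua.user_id
     AND latest.assignment_id = ua.assignment_id
     AND latest.latest_date = ua.date
    ORDER BY ua.user_id, ua.assignment_id
"""
latest_rows = conn.execute(LATEST_SCORES_SQL).fetchall()
total_score = sum(score for _, _, score in latest_rows)
```

The same statement could be passed to UserAnswer.objects.raw() if id is added to the selected columns, since raw querysets need the primary key.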

Django App : Perform arithmetic operation on two fields in database and store result in third field in DB while adding data in form

I have a form in my application with several fields. I need to subtract the value of one field from another and store the result in a third field in the database on the fly.
For example, I have 2 fields:
1. Cost of PR Basic & 2. Cost of PO Basic
I need to calculate Delta: Cost of PO Basic - Cost of PR Basic.
Delta is also a field in the database table.
In models.py I have
class PR_Data(models.Model):
    Cost_PR_Basic_INR = models.DecimalField(max_digits=19,decimal_places=2)
    Cost_Of_PO_Basic_INR = models.DecimalField(max_digits=19,decimal_places=2)
    Delta = models.DecimalField(max_digits=19,decimal_places=2, editable=True)
So how can I calculate Delta from the values entered in the other two fields and store the result in the Delta field?
Thanks in advance!

In your views.py, do something like this:
pr_data = PR_Data()
pr_data.Cost_PR_Basic_INR = <value1>
pr_data.Cost_Of_PO_Basic_INR = <value2>
pr_data.Delta = <value2> - <value1>  # Cost of PO Basic - Cost of PR Basic
pr_data.save()
This is just an idea of how it should work, not the complete code ;)
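The subtraction itself can be sketched with Python's decimal module, which is the type a DecimalField works with. The helper name compute_delta, the rounding mode, and the sample values are assumptions made for illustration; form input is assumed to arrive as strings or numbers.

```python
from decimal import Decimal, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")  # matches decimal_places=2 on the model fields


def compute_delta(cost_pr_basic, cost_po_basic):
    """Return Cost of PO Basic minus Cost of PR Basic, rounded to 2 places."""
    # Going through str() avoids binary float artifacts if a float is passed in
    pr = Decimal(str(cost_pr_basic))
    po = Decimal(str(cost_po_basic))
    return (po - pr).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)


# Hypothetical form values
delta = compute_delta("100.50", "120.75")
```

The result would then be assigned to pr_data.Delta before calling save().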

Need to quickly update Django model with difference between two other models

I'm populating my database from an API that provides year-to-date stats, and I'll be pulling from this API multiple times a day. Using the year-to-date stats, I need to generate monthly and weekly stats. I'm currently trying to do this by subtracting the stats at the start of the month from the stats at the end of the month and saving it in a separate model, but the process is taking far too long and I need it to go faster.
My models look something like this:
class Stats(models.Model):
    date = models.DateField(default=timezone.now) # Date pulled from API
    customer_id = models.IntegerField(default=0) # Simplified for this example
    a = models.IntegerField(default=0)
    b = models.IntegerField(default=0)
    c = models.IntegerField(default=0)
    d = models.IntegerField(default=0)
class Leaderboard(models.Model):
    duration = models.CharField(max_length=7, default="YEARLY") # "MONTHLY", "WEEKLY"
    customer_id = models.IntegerField(default=0)
    start_stats = models.ForeignKey(Stats, related_name="start_stats") # Stats from the start of the Year/Month/Week
    end_stats = models.ForeignKey(Stats, related_name="end_stats") # Stats from the end of the Year/Month/Week
    needs_update = models.BooleanField(default=False) # set to True only if the end_stats changed (likely when a new pull happened)
    a = models.IntegerField(default=0)
    b = models.IntegerField(default=0)
    c = models.IntegerField(default=0)
    d = models.IntegerField(default=0)
    e = models.IntegerField(default=0) # A value computed based on a-d, used to sort Leaderboards
I thought I was going to be home free using Leaderboard.objects.filter(needs_update=True).update(a=F("end_stats__a")-F("start_stats__a"), ...), but that gave me an error "Joined field references are not permitted in this query".
I'm currently iterating over the QuerySet Leaderboard.objects.filter(needs_update=True), doing the subtraction operations, and saving (all with @transaction.atomic), but ~380,000 test records processed this way took just over an hour, so I suspect that this way is going to be too slow for what I need.
I'm OK with changing how I store the data if a different format would help this Leaderboard update go faster (maybe do the subtraction when pulling in the data and saving daily deltas instead?), but I feel like I keep rushing towards whatever comes to mind without any idea of what I should be doing in this situation. Any feedback at this point would be very much appreciated.

After a lot of tinkering, I think I've got a method that will work for this situation. My test sample is smaller than before (84,600 records), but it completes in 8 seconds, which is about 10,575 records per second (compared to roughly 6,300 records per minute, or about 105 per second, in my earlier tests).
There's probably a way to refine this even more, but here's what I'm doing:
from django.db.models import F, Subquery, OuterRef
# Get the latest versions of the stats
Leaderboard.objects.filter(needs_update=True).update(
    a=Subquery(Stats.objects.filter(pk=OuterRef('end_stats')).values('a')[:1]),
    b=Subquery(Stats.objects.filter(pk=OuterRef('end_stats')).values('b')[:1]),
    c=Subquery(Stats.objects.filter(pk=OuterRef('end_stats')).values('c')[:1]),
    d=Subquery(Stats.objects.filter(pk=OuterRef('end_stats')).values('d')[:1])
)
# Subtract the earliest versions of the stats
Leaderboard.objects.filter(needs_update=True).update(
    a=F('a') - Subquery(Stats.objects.filter(pk=OuterRef('start_stats')).values('a')[:1]),
    b=F('b') - Subquery(Stats.objects.filter(pk=OuterRef('start_stats')).values('b')[:1]),
    c=F('c') - Subquery(Stats.objects.filter(pk=OuterRef('start_stats')).values('c')[:1]),
    d=F('d') - Subquery(Stats.objects.filter(pk=OuterRef('start_stats')).values('d')[:1])
)
# Calculate stats that require earlier stats.
Leaderboard.objects.filter(needs_update=True).update(
    e=F('a') + F('b') * F('c') / F('d'),
    needs_update=False
)
I feel like there should be a way to only use one Subquery per update, which should improve the speed even more.
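One possible direction is to compute "end minus start" inside a single correlated subquery per column, so that the first two updates collapse into one statement. The sketch below shows only the SQL shape of that idea, using Python's built-in sqlite3 module with a simplified two-column schema and made-up rows. It is not the Django ORM call, it still needs one subquery per column, and it has not been benchmarked against the approach above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stats (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER);
    CREATE TABLE leaderboard (
        id INTEGER PRIMARY KEY,
        start_stats_id INTEGER REFERENCES stats(id),
        end_stats_id INTEGER REFERENCES stats(id),
        needs_update INTEGER DEFAULT 0,
        a INTEGER DEFAULT 0,
        b INTEGER DEFAULT 0
    );
    -- Hypothetical start (id 1) and end (id 2) snapshots
    INSERT INTO stats (id, a, b) VALUES (1, 10, 5), (2, 25, 9);
    INSERT INTO leaderboard (id, start_stats_id, end_stats_id, needs_update)
    VALUES (1, 1, 2, 1);
""")

# One UPDATE: each column is set to (end value - start value) directly
conn.execute("""
    UPDATE leaderboard
    SET a = (SELECT e.a - s.a FROM stats AS e, stats AS s
             WHERE e.id = leaderboard.end_stats_id
               AND s.id = leaderboard.start_stats_id),
        b = (SELECT e.b - s.b FROM stats AS e, stats AS s
             WHERE e.id = leaderboard.end_stats_id
               AND s.id = leaderboard.start_stats_id),
        needs_update = 0
    WHERE needs_update = 1
""")
row = conn.execute(
    "SELECT a, b, needs_update FROM leaderboard WHERE id = 1"
).fetchone()
```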

Django model filter compare month fail

Suppose I have this model:
class User(models.Model):
    id = models.AutoField(primary_key=True)
    username = models.CharField(max_length=10)
    last_login = models.DateTimeField(null=True)
Here is one of the records in database,
id=15,
username='yhbohh'
last_login='2015-03-31 10:57:18'
I would like to count the objects whose last login falls in month 3.
I tried in shell,
User.objects.filter(last_login__year=2015).count() # return 80
User.objects.filter(last_login__month=3).count() # return 0
User.objects.filter(last_login__day=31).count() # return 0
May I know why the last 2 queries return no records?
I have searched other questions and noticed that some suggest using a date range comparison to solve this problem. But I just want to know the root cause of this unexpected result.
Thanks a lot!

According to the Django documentation:
When USE_TZ is True, datetime fields are converted to the current time zone before filtering.
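As a rough illustration of why that conversion matters for month and day lookups, the following plain-Python sketch converts a UTC timestamp to a fixed UTC+8 offset and compares the month and day before and after. The timestamp and the offset are hypothetical and chosen so that the date rolls over; this only shows the effect of a time zone conversion and does not establish the cause of the zero counts in the question.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical value stored in UTC, late on the last day of March
stored_utc = datetime(2015, 3, 31, 20, 0, tzinfo=timezone.utc)

# Hypothetical "current time zone", modelled as a fixed UTC+8 offset
current_tz = timezone(timedelta(hours=8))
local = stored_utc.astimezone(current_tz)

# (month, day) as seen in UTC and in the current time zone
utc_parts = (stored_utc.month, stored_utc.day)
local_parts = (local.month, local.day)
```

Here a filter on month 3 or day 31 would match the UTC value but not the converted one, which falls on April 1.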
