Django. Doing complex calculations in database - python

Here's part of my model file :
class Metric(models.Model):
    Team = models.ForeignKey(Team)
    metric_name = models.CharField(max_length = 40)
    def __unicode__(self):
        return self.metric_name
class Members(models.Model):
    Metric = models.ForeignKey(Metric, through="Calculate")
    member_name = models.CharField(max_length = 40, null=True, blank=True)
    week_one = models.IntegerField(null=True, blank=True)
    week_two = models.IntegerField(null=True, blank=True)
    week_three = models.IntegerField(null=True, blank=True)
    week_four = models.IntegerField(null=True, blank=True)
    total = models.IntegerField(null=True, blank=True)
    def __unicode__(self):
        return self.member_ID
    def save(self, *args, **kwargs):
        self.total = int(self.week_one)+int(self.week_two)+int(self.week_three)+int(self.week_four)
        super(Members, self).save(*args, **kwargs) # Call the "real" save() method.
What I want to do now is calculate, for each metric: the number of members, the aggregate total of all its members, and the highest total among its members.
I can't figure out a way to do this in Django.
I want to make these calculations and store them in the database.
Can anyone please help me out with this.
Thanks

If you wish to "memoize" these results, there are at least two paths that you could follow:
1. A per-row post-save trigger on the Members table that updates "members_count", "members_total" and "members_max" fields in the Metric table.
The challenge with this is in maintaining the trigger-creation DDL alongside the rest of your code and applying it automatically whenever the models are re-created or altered.
The Django ORM does not make this especially easy. The most common migration tool (South) also doesn't go out of its way to help. Also note that this solution will be specific to one RDBMS, and that some RDBMSs may not support it.
2. You could create "synthetic" fields in your Metric model, then use a post-save signal handler to update them whenever a Member is added or changed.
class Metric(models.Model):
    Team = models.ForeignKey(Team)
    metric_name = models.CharField(max_length = 40)
    members_count = models.IntegerField()
    members_max = models.IntegerField()
    members_total = models.IntegerField()
    def __unicode__(self):
        return self.metric_name
# signal handling
from django.db.models import signals
def members_post_save(sender, instance, **kwargs):
    # Update the count, max, total fields.
    # sender is the Members class; instance is the Members row that was just saved.
    met = instance.Metric
    met.members_count = Members.objects.filter(Metric=met).count()
    # more code here to compute the total, max, etc.
    met.save()
signals.post_save.connect(members_post_save, sender=Members)
The Django signals documentation can be of use here.
Caveats
While this sort of approach could be made to achieve your stated goal, it is brittle. You need to have test coverage that ensures that this signal handler always fires, even after you've done some refactoring of your code.
I would also consider using "related objects" queries (documented at https://docs.djangoproject.com/en/dev/topics/db/queries/#related-objects). For example, assuming we have a "me" instance of Metric:
>>> members_count = me.members_set.count()
>>> # aggregation queries for the total and max
If these aggregates are not used very often, path #2 could be a viable and more maintainable option.
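The three values asked for in the question come down to COUNT, SUM and MAX grouped by metric. Here is a minimal sketch of that computation, run through Python's built-in sqlite3 module rather than the Django ORM. The table and column names (metric, members, metric_id) and the sample rows are assumptions made for the illustration, not the names Django would generate for the models above.

```python
import sqlite3

# In-memory database with hypothetical tables mirroring Metric and Members.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE metric (id INTEGER PRIMARY KEY, metric_name TEXT);
    CREATE TABLE members (
        id INTEGER PRIMARY KEY,
        metric_id INTEGER REFERENCES metric(id),
        member_name TEXT,
        total INTEGER
    );
    INSERT INTO metric (id, metric_name) VALUES (1, 'sales'), (2, 'support');
    INSERT INTO members (metric_id, member_name, total) VALUES
        (1, 'ann', 10), (1, 'bob', 25), (2, 'cy', 7);
""")


def metric_summary(connection):
    """Return {metric_name: (members_count, members_total, members_max)}."""
    rows = connection.execute("""
        SELECT m.metric_name, COUNT(mb.id), SUM(mb.total), MAX(mb.total)
        FROM metric AS m
        LEFT JOIN members AS mb ON mb.metric_id = m.id
        GROUP BY m.id
    """).fetchall()
    return {name: (count, total, highest) for name, count, total, highest in rows}


summary = metric_summary(conn)
```

In the ORM the same three values are available through the Count, Sum and Max aggregates in django.db.models, so nothing needs to be stored unless the queries prove too slow.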

Related

How do I check if two instances of a Django model are the same across a set of attributes and annotate a queryset accordingly?

My app has a model "OptimizationResult", where I store results from mathematical optimization. The optimization distributes timeslots over projects. I need to indicate whether the current result is different from a recent result, based on a set of attributes (in particular, not the primary key).
The attribute optimization_run is a counter for different runs.
Project is a ForeignKey to the project.
By overriding the __hash__ and __eq__ methods on the model I can compare different instances with
OptimizationResults.objects.filter(proj = 1).filter(optimization_run =1).first() == OptimizationResults.objects.filter(proj = 1).filter(optimization_run = 2).first()
But as I understand it, __eq__ and __hash__ are not available at the database level.
How would I annotate the results accordingly? Something like
OptimizationResults.objects.filter(optimization_run = 2).annotate(same_as_before = Case(When(),default=False))
Edit
Added .first() to the code, to ensure that there is only one element.
class OptimizationResult(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    request_weight = models.IntegerField()
    periods_to_plan = models.IntegerField()
    unscheduled_periods = models.IntegerField()
    scheduled_periods = models.IntegerField()
    start = models.DateField(null=True, blank=True, default=None)
    end = models.DateField(null=True, blank=True, default=None)
    pub_date = models.DateTimeField('Erstellungsdatum', auto_now_add=True, editable=False)
    optimization_run = models.ForeignKey(OptimizationRun, on_delete=models.CASCADE)
I'd like to compare different entries on the basis of start and end.
Edit 2
My fruitless attempt with Subquery:
old = OptimizationResult.objects.filter(project=OuterRef('pk')).filter(optimization_run=19)
newest = OptimizationResult.objects.filter(project=OuterRef('pk')).filter(optimization_run=21)
Project.objects.annotate(changed = Subquery(newest.values('start')[:1])== Subquery(old.values('start')[:1]))
results in TypeError: QuerySet.annotate() received non-expression(s): False
We can use a subquery here, to make an annotation:
from django.db.models import Exists, OuterRef
to_exclude = {'pk', 'id', 'project', 'project_id', 'optimization_run', 'optimization_run_id'}
subquery = OptimizationResult.objects.filter(
    project_id=OuterRef('project_id'),
    optimization_run=1,
    # every remaining field must match the outer row
    **{f.name: OuterRef(f.name)
       for f in OptimizationResult._meta.get_fields()
       if f.name not in to_exclude
    }
)
OptimizationResult.objects.filter(
    optimization_run=2
).annotate(
    are_same=Exists(subquery)
)
Here we will thus annotate all the OptimizationResults with an optimization_run=2, with an extra attribute .are_same that checks if there exists an OptimizationResult object for optimization_run=1 and for the same project_id, where all fields are the same, except the ones in the to_exclude set.

Django using an object in an ORM query, within its __init__ function

I have a model which exists only to combine a number of other models. There are a series of 'payable' objects, which need to be clubbed together into a 'payment' object, called 'ClubbedPayment'.
The model is given a list of payable objects, and it uses them to calculate various attributes. This works fine, so far. These payable objects have a foreign key to ClubbedPayment. Rather than going through the payable objects and adding the new ClubbedPayment object to each of them as a foreign key, it seems easier to add them to ClubbedPayment.payable_set(). I've tried to do this, but it fails because in the __init__ function, the ClubbedPayment object is not yet saved:
class ClubbedPayment(models.Model):
    recipient = models.ForeignKey(User, on_delete=models.PROTECT)
    paid_at = models.DateTimeField(null=True)
    is_unpaid = models.BooleanField(default=True)
    total = models.DecimalField(max_digits=6, decimal_places=2)
    payment_id = models.UUIDField(null=True, editable=False)
    response = models.CharField(null=True, max_length=140)
    payment_type = models.CharField(max_length=2, choices=PAYMENT_TYPES)
    def __init__(self, *args, **kwargs):
        super().__init__()
        objects = kwargs["objects"]
        recipients = set([x.recipient for x in objects])
        types = set([type(x) for x in objects])
        if len(recipients) > 1:
            raise Exception("Cannot club payments to more than one recipient")
        self.total = sum([x.total for x in objects])
        self.recipient = list(recipients)[0]
        for x in objects:
            self.payable_set.add(x)
Now, I know that overriding __init__ with non optional arguments is considered dodgy, because it will prevent saving elsewhere, but it's not a problem, because there is only one place in the entire codebase where these objects are created or modified.
The question is, how do I get around this? Calling save() from within the __init__ function seems like a bad plan ;)

Django - How to update a field for a model on another model's creation?

I have two models, a Product model and a Rating model. What I want to accomplish is, every time a "Rating" is created, via an API POST to an endpoint created using DRF, I want to compute and update the average_rating field in the associated Product.
class Product(models.Model):
    name = models.CharField(max_length=100)
    ...
    average_rating = models.DecimalField(max_digits=3, decimal_places=2)
    def __str__(self):
        return self.name
class Rating(models.Model):
    rating = models.IntegerField()
    product = models.ForeignKey('Product', related_name='ratings')
    def __str__(self):
        return "{}".format(self.rating)
What is the best way to do this? Do I use a post_save (Post create?) signal?
I think the problem here is not so much how to do this technically, but how to make it robust. After all, it is not only the creation of new ratings that matters: if people change or remove a rating, the average rating needs to be updated as well. If you define a ForeignKey with a cascade, deleting something related to a Rating can even remove several ratings at once, and thus require updating several Products. So keeping the average in sync can become quite hard, especially if you allow other programs to manipulate the database.
It might therefore be better to calculate the average rating on demand. For example with an aggregate:
from django.db.models import Avg
class Product(models.Model):
    name = models.CharField(max_length=100)
    @property
    def average_rating(self):
        return self.ratings.aggregate(average_rating=Avg('rating'))['average_rating']
    def __str__(self):
        return self.name
Or if you want to load multiple Products in a QuerySet, you can do an .annotate(..) to calculate the average rating in bulk:
Product.objects.annotate(
    # 'ratings' is the related_name set on Rating.product
    average_rating=Avg('ratings__rating')
)
Here the Products will have an attribute average_rating that is the average rating of the related ratings.
In case the number of ratings can be huge, it can take considerable time to calculate the average. In that case I propose to add a field, and use a periodic task to update the rating. For example:
from django.db.models import Avg, OuterRef, Subquery
class Product(models.Model):
    name = models.CharField(max_length=100)
    avg_rating = models.DecimalField(
        max_digits=3,
        decimal_places=2,
        null=True,
        default=None
    )
    @property
    def average_rating(self):
        # Use the cached value when present; otherwise compute it on demand.
        if self.avg_rating is not None:
            return self.avg_rating
        return self.ratings.aggregate(average_rating=Avg('rating'))['average_rating']
    @classmethod
    def update_averages(cls):
        # 'ratings' is the related_name set on Rating.product
        subq = cls.objects.filter(
            id=OuterRef('id')
        ).annotate(
            avg=Avg('ratings__rating')
        ).values('avg')[:1]
        cls.objects.update(
            avg_rating=Subquery(subq)
        )
    def __str__(self):
        return self.name
You can then periodically call Product.update_averages() to update the average ratings of all products. When you create, update, or remove a rating, you can set the avg_rating field of the related product(s) to None to force recalculation, for example with a post_save signal. But note that signals can be circumvented (for example with the .update(..) of a queryset, or by bulk_create(..)), so it is still a good idea to periodically synchronize the average ratings.
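To make the bulk recalculation concrete, here is a rough sketch of the kind of statement it amounts to: a single UPDATE with a correlated subquery. It uses Python's built-in sqlite3 module instead of the ORM, and the table names, column names and sample rows are assumptions made for the illustration.

```python
import sqlite3

# In-memory database with hypothetical product and rating tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, avg_rating REAL);
    CREATE TABLE rating (
        id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(id),
        rating INTEGER
    );
    INSERT INTO product (id, name) VALUES (1, 'kettle'), (2, 'toaster');
    INSERT INTO rating (product_id, rating) VALUES (1, 4), (1, 5);
""")


def update_averages(connection):
    """Recompute avg_rating for every product in one statement."""
    connection.execute("""
        UPDATE product
        SET avg_rating = (
            SELECT AVG(rating.rating) FROM rating
            WHERE rating.product_id = product.id
        )
    """)
    connection.commit()


update_averages(conn)
averages = dict(conn.execute("SELECT name, avg_rating FROM product"))
```

A product without ratings ends up with NULL (None in Python), which matches the nullable avg_rating field above.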

How to manipulate value of one Model Field from another Model?

I have two models
class Employee(models.Model):
    name = models.CharField(max_length=20)
    ID = models.IntegerField()
    basic_salary = models.IntegerField()
    total_leave = models.IntegerField(default = 14)
    paid_leave = models.IntegerField(default = 0)
    unpaid_leave = models.IntegerField(default = 0)
    def __str__(self):
        return self.name
class Leave_management(models.Model):
    name = models.OnetoOneField(Employee,on_delete= models.CASCADE)
    reason = models.CharField(max_length=50)
    from = models.DateTimeField()
    to = models.DateTimeField()
    total_days = models.IntegerField()
    def __str__(self):
        return self.name
So, I want to subtract the total_days of a Leave_management from the total_leave field of the Employee model, and update paid_leave and unpaid_leave according to the leave taken.
I could do this if the two models were one model (example below), but I don't know how to do it across different models.
def save(self,*args,**kwargs):
    if self.total_days<=self.total_leave:
        self.total_leave -= self.total_days
        self.unpaid_leave = 14 - self.total_leave
    else:
        self.total_days -= 14
        self.paid_leaves = self.total_days
    super(Model_name,self).save(*args,**kwargs)
Please guide me.
In fact your OneToOneField(..) to an Employee is not a name. At the database level it stores values that correspond to primary keys of an Employee, and in Django, name will be a lazy fetch of the corresponding Employee. Therefore I suggest renaming the field to (for example) employee.
Another problem is that you define it as a OneToOneField. That means that an Employee has one Leave_management. But based on the fields (reason, from, to, etc.), it looks like an Employee can have zero, one, or more Leave_managements. So that means it is a ForeignKey.
So our model looks like:
class Leave_management(models.Model):
    employee = models.ForeignKey(Employee,on_delete= models.CASCADE)
    reason = models.CharField(max_length=50)
    # "from" is a reserved word in Python, so the date fields get a suffix here
    from_date = models.DateTimeField()
    to_date = models.DateTimeField()
    total_days = models.IntegerField()
    def __str__(self):
        return self.employee.name
Like the __str__ function already suggests, we can obtain the name of the employee by querying self.employee, and we can then fetch its .name attribute.
But now the challenge is what to do when we save a Leave_management object. In that case the leave counters on the Employee (total_leave, unpaid_leave and paid_leave) should be updated.
We first have to figure out the total number of total_days that are stored in Leave_management objects related to an Employee, this is equal to:
(Leave_management.objects
    .filter(employee=some_employee)
    .aggregate(totals=Sum('total_days'))['totals'] or 0)
So we can then subtract this from 14, and store the (possibly) remaining days in paid_leave, like:
from django.db.models import Sum
class Leave_management(models.Model):
    # ...
    def save(self, *args, **kwargs):
        super(Leave_management, self).save(*args, **kwargs)
        # Sum the days of all leave records of this employee (0 if there are none).
        totals = (Leave_management.objects
                  .filter(employee=self.employee)
                  .aggregate(totals=Sum('total_days'))['totals'] or 0)
        employee = self.employee
        unpaid = min(14, totals)
        employee.total_leave = 14 - unpaid
        employee.unpaid_leave = unpaid
        employee.paid_leave = totals - unpaid
        employee.save()
Note: typically we do not handle this by overriding the .save(..) function, but by using Django signals: triggers that can be implemented when certain objects are saved, etc. This especially should be used since the objects can be changed circumventing the .save(..) function, and sometimes such objects might get deleted as well. So the above is not a good design decision.
Even when we use signals, it is a good idea to frequently (for example once a day) recalculate the total leave, and update the corresponding Employee models.
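The arithmetic in save() can be pulled out into a plain function, which makes it easy to reuse in a signal handler or a daily recalculation job and to test without a database. The following is a sketch under that assumption; split_leave and ANNUAL_ALLOWANCE are hypothetical names, and the field meanings follow the answer's code above.

```python
ANNUAL_ALLOWANCE = 14  # matches the default of Employee.total_leave


def split_leave(days_taken, allowance=ANNUAL_ALLOWANCE):
    """Return (total_leave, unpaid_leave, paid_leave) for the days taken so far."""
    if days_taken < 0:
        raise ValueError("days_taken must not be negative")
    within_allowance = min(allowance, days_taken)
    return (
        allowance - within_allowance,   # total_leave: days still available
        within_allowance,               # unpaid_leave, as named in the save() above
        days_taken - within_allowance,  # paid_leave: days beyond the allowance
    )


total_leave, unpaid_leave, paid_leave = split_leave(20)
```

With 20 days taken against an allowance of 14, no days remain, 14 fall within the allowance and 6 exceed it.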

How to aggregate a custom model function when it's not a part of the database in django?

I have a project that I need to open and close tickets. So, this is my Ticket model:
class Ticket(models.Model):
    issue = models.CharField(max_length=100)
    user = models.ForeignKey('Users', blank=True, null=True, related_name="tickets")
    date_opened = models.DateTimeField('Date opened')
    date_closed = models.DateTimeField('Date closed', blank=True, null=True)
    def __str__(self):
        return self.issue
    def time_to_solve(self):
        # closed minus opened gives a positive timedelta
        time_to_solve = self.date_closed - self.date_opened
        out = [ int(time_to_solve.total_seconds() // 3600) ]
        return '{:d} hours'.format(*out)
and I want to calculate the average of the time difference between date_opened and date_closed.
In my views.py I have created a view :
class Dashboard(ListView):
    model = Ticket
    template_name = 'assets/dashboard.html'
    def get_context_data(self, **kwargs):
        context = super(Dashboard, self).get_context_data(**kwargs)
        context['time_to_complete'] = Ticket.objects.filter(Q(status__contains='closed')).aggregate(time_opened = Avg('time_to_solve'))
        return context
Unfortunately it does not work because "time_to_solve" is not a part of the database.
How can I achieve that?
You can only aggregate model fields, but it's not hard to do that in python:
tickets = Ticket.objects.filter(status__contains='closed')
average = sum(map(lambda x: x.time_to_solve(), tickets)) / tickets.count()
In this case time_to_solve should return something like the number of seconds and you can format that as you need right after that.
Depending on the number of tickets, this might not be the fastest solution. If performance is an issue, you might want to use some kind of denormalization.
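A minimal sketch of the Python-side average, assuming time_to_solve() is changed to return seconds as suggested. The Ticket class below is a hypothetical stand-in for the model that holds only the two dates, so the example runs without Django; it also skips tickets that are still open and avoids dividing by zero.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Ticket:
    """Hypothetical stand-in for the Ticket model, holding only the two dates."""
    date_opened: datetime
    date_closed: Optional[datetime] = None

    def time_to_solve(self) -> float:
        """Seconds between opening and closing the ticket."""
        return (self.date_closed - self.date_opened).total_seconds()


def average_time_to_solve(tickets) -> Optional[float]:
    """Average solve time in seconds over closed tickets, or None if there are none."""
    durations = [t.time_to_solve() for t in tickets if t.date_closed is not None]
    if not durations:
        return None
    return sum(durations) / len(durations)


tickets = [
    Ticket(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 0)),  # 2 hours
    Ticket(datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 13, 0)),  # 4 hours
    Ticket(datetime(2024, 1, 3, 9, 0)),                               # still open
]
average_hours = average_time_to_solve(tickets) / 3600
```

Formatting (hours, days, and so on) is then applied to the numeric result, as the answer recommends.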
I don't think you can do that directly with the ORM. You can do it in Python but that will retrieve all closed Ticket rows from the database. If you want to do it in SQL you'll need to express your query as raw SQL. If you're using PostgreSQL you might find this useful : Working with Dates and Times in PostgreSQL.
Found the answer from a friend in the #django IRC channel on Freenode:
average = Ticket.objects.extra(
    select={ 'date_difference': 'AVG(time_to_sec(TIMEDIFF(date_closed,date_opened)))'}
).first().date_difference
context['average'] = "{:.2f}".format(average/86400)
return context
This way it returns the average in days (86400 seconds per day) with two decimal places, and does everything at the database level, so it is much lighter to run than fetching all rows.
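The query above relies on MySQL's TIMEDIFF and time_to_sec functions. As a rough, runnable illustration of the same idea (letting the database compute the average), here is a sketch using Python's built-in sqlite3 module, where SQLite's julianday() stands in for the MySQL functions. The ticket table layout and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory database with a hypothetical ticket table; dates stored as ISO strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (
        id INTEGER PRIMARY KEY,
        date_opened TEXT NOT NULL,
        date_closed TEXT
    );
    INSERT INTO ticket (date_opened, date_closed) VALUES
        ('2024-01-01 09:00:00', '2024-01-02 09:00:00'),
        ('2024-01-05 09:00:00', '2024-01-08 09:00:00'),
        ('2024-01-09 09:00:00', NULL);
""")


def average_days_to_close(connection):
    """Average days between opening and closing; open tickets are ignored."""
    row = connection.execute("""
        SELECT AVG(julianday(date_closed) - julianday(date_opened))
        FROM ticket
        WHERE date_closed IS NOT NULL
    """).fetchone()
    return row[0]


average_days = average_days_to_close(conn)
formatted = "{:.2f}".format(average_days)
```

Only a single number travels from the database to Python, regardless of how many tickets exist.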
