Persistent Calculated Fields in Django

In MS SQL Server there is a feature to create a calculated column: a table column that is calculated on the fly at retrieval time. This more-or-less maps on to using a method on a Django model to retrieve a calculated value (the common example being retrieving Full Name, based on stored Forename/Surname fields).
For expensive operations, SQL Server provides a Persisted option. This populates the table column with the results of the calculation, and updates those results when the table is updated - a very useful feature when the calculation is not quick but does not change often compared to access.
However, in Django I cannot find a way to duplicate this functionality. Am I missing something obvious? My best guess would be some sort of custom Field that takes a function as a parameter, but I couldn't see a pre-existing one of those. Is there a better way?

One approach is simply to use a regular model field that is recalculated whenever an object is saved, e.g.:
    from django.db import models

    class MyModel(models.Model):
        first_name = models.CharField(max_length=255)
        surname = models.CharField(max_length=255)
        # This is your 'persisted' field
        full_name = models.CharField(max_length=255, blank=True)

        def save(self, *args, **kwargs):
            # set the full name whenever the object is saved
            self.full_name = '{} {}'.format(self.first_name, self.surname)
            super().save(*args, **kwargs)
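You could make this special field read-only in the admin and similarly exclude it from any model forms. Keep in mind that save() is not called by QuerySet.update() or by bulk operations such as bulk_create(), so values written that way will not refresh full_name.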

Related

Django aggregate sum on child model field

Consider the following models:
    from django.db import models
    from django.db.models import Sum
    from decimal import *

    class Supply(models.Model):
        """Addition of new batches to stock"""
        bottles_number = models.PositiveSmallIntegerField()
        bottles_remaining = models.DecimalField(max_digits=4, decimal_places=1, default=0.0)

        def remain(self, *args, **kwargs):
            used = Pick.objects.filter(supply=self).aggregate(
                total=Sum(Pick.n_bottles))[bottles_used__sum]
            left = self.bottles_number - used
            return left

        def save(self, *args, **kwargs):
            self.bottles_remaining = self.remain()
            super(Supply, self).save(*args, **kwargs)

    class Pick(models.Model):
        """ Removals from specific stock batch """
        supply = models.ForeignKey(Supply, on_delete=models.CASCADE)
        n_bottles = models.DecimalField(max_digits=4, decimal_places=1)
Every time an item (bottles in this case) is used, I need to update the "bottles_remaining" field to show the current number in stock. I do know that best practice is normally to avoid storing in the database values that can be calculated on the fly, but I need to do so in order to have the data available for use outside of Django.
This is part of a stock management system originally built in PHP through Xataface. Not being a trained programmer, I managed to get most of it done by googling, but now I am totally stuck on this key feature. The remain() function is probably a total mess. Any pointers as to how to perform that calculation and extract the value would be greatly appreciated.

I'm not sure exactly which problem you want to solve, but here are some possible solutions:
1. Use an SQL view:
    CREATE VIEW bottles_extended AS
    SELECT id, total_number, used, (total_number - used) AS remaining
    FROM bottles;
After that you can read the data on the PHP side with: select total_number, used, remaining from bottles_extended
2. Add (total_number - used) AS remaining directly to the SQL query on the PHP side.
3. Keep the current Django approach of updating the field automatically on save. It is not perfect, but it should work. (In addition, you can expose the field through a serializer and mark it read-only there, which prevents users from changing the value manually.)
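As a rough illustration of the view approach, here is a minimal sketch using Python's standard sqlite3 module rather than the asker's actual database. The table names supply and pick, their columns, and the view name supply_extended are assumptions modelled on the Supply and Pick models in the question; the view sums the picks per supply instead of relying on a stored used column.

```python
import sqlite3

# Hypothetical tables mirroring the question's Supply and Pick models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supply (id INTEGER PRIMARY KEY, bottles_number INTEGER NOT NULL);
    CREATE TABLE pick (
        id INTEGER PRIMARY KEY,
        supply_id INTEGER NOT NULL REFERENCES supply(id),
        n_bottles REAL NOT NULL
    );
    -- The view derives "remaining" on every read, so it cannot go stale.
    CREATE VIEW supply_extended AS
    SELECT s.id,
           s.bottles_number,
           COALESCE(SUM(p.n_bottles), 0) AS used,
           s.bottles_number - COALESCE(SUM(p.n_bottles), 0) AS remaining
    FROM supply AS s
    LEFT JOIN pick AS p ON p.supply_id = s.id
    GROUP BY s.id, s.bottles_number;
""")
conn.execute("INSERT INTO supply (id, bottles_number) VALUES (1, 12), (2, 6)")
conn.execute("INSERT INTO pick (supply_id, n_bottles) VALUES (1, 2.5), (1, 1.5)")

# Any client (PHP, Django, a reporting tool) can read the view like a table.
rows = conn.execute(
    "SELECT id, bottles_number, used, remaining FROM supply_extended ORDER BY id"
).fetchall()
print(rows)
```

The COALESCE call covers supplies that have no picks yet, where SUM would otherwise return NULL.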

Can I create a Django object using a subquery for a field value?

TLDR
When creating a new object using Django ORM, can I, in a transactionally safe / race-condition-free manner, set a field's value based on an already existing object's value, say F('sequence_number') + 1 where F('sequence_number') refers not to the current object (which does not exist yet) but to the most recent object with that prefix in the table?
Longer version
I have a model Issue with properties sequence_number and sequence_prefix. There is a unique constraint on (sequence_prefix, sequence_number) (e.g. DATA-1).
    class Issue(models.Model):
        created_at = models.DateTimeField(auto_now_add=True)
        sequence_prefix = models.CharField(blank=True, default="", max_length=32)
        sequence_number = models.IntegerField(null=False)

        class Meta:
            constraints = [
                models.UniqueConstraint(
                    fields=["sequence_prefix", "sequence_number"], name="unique_sequence"
                )
            ]
The idea is that issues (for auditing purposes) have unique sequence numbers for each variable, user-determined prefix: when creating an issue the user selects a prefix, e.g. REVIEW or DATA, and the sequence number is the incremented value of the previous issue with that same prefix. So it's like an AutoField, but dependent on the value of another field. There cannot be two issues DATA-1, but REVIEW-1, DATA-1 and OTHER-1 may all exist at the same time.
How can I tell Django, when creating an Issue, that it must find the most recent object for the given sequence_prefix, take its sequence_number + 1 and use that as the new object's sequence_number, in a way that is safe from race conditions?

A good way to achieve this is to override the save() method of the Issue model.
For example:
    from django.db import models
    from django.db.models import Max

    class Issue(models.Model):
        created_at = models.DateTimeField(auto_now_add=True)
        sequence_prefix = models.CharField(blank=True, default="", max_length=32)
        sequence_number = models.IntegerField(null=False)

        def save(self, *args, **kwargs):
            if self.pk is None:
                # New issue: continue from the highest number used for this prefix
                last = Issue.objects.filter(
                    sequence_prefix=self.sequence_prefix
                ).aggregate(last=Max("sequence_number"))["last"]
                self.sequence_number = (last or 0) + 1
            super().save(*args, **kwargs)

        class Meta:
            constraints = [
                models.UniqueConstraint(
                    fields=["sequence_prefix", "sequence_number"], name="unique_sequence"
                )
            ]
In this way, before saving a new object, you look up the highest sequence_number already used for its sequence_prefix. Note that two concurrent saves can still read the same maximum; the unique constraint then rejects one of them with an IntegrityError, so the caller has to retry.

Unless you want to use database sequences (AutoField), I believe you will need to implement something on your own. There are two options:
1. Prevent concurrent inserts for a given sequence_prefix with some locking mechanism (I would use Redis for a distributed lock, to support a multi-process setup).
2. Implement your own sequencing (again, Redis is a perfect choice), which will provide you with an auto-incrementing sequence_number per prefix. For example:
    sequence_number = redis_client.incr('sequence:REVIEW')
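To make the idea of an atomic per-prefix counter concrete, here is a small sketch that uses an in-process lock as a stand-in for Redis. The class PrefixSequencer and the helper create_issues are hypothetical names invented for this illustration; the sketch only shows the semantics an atomic increment provides and does not work across processes, which is the reason the answer reaches for Redis.

```python
import threading
from collections import defaultdict


class PrefixSequencer:
    """In-process stand-in for an atomic per-key counter such as Redis INCR."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = defaultdict(int)

    def next_number(self, prefix):
        # The lock turns read-increment-write into one atomic step,
        # so two callers can never receive the same number for a prefix.
        with self._lock:
            self._counters[prefix] += 1
            return self._counters[prefix]


sequencer = PrefixSequencer()
issued = []


def create_issues(prefix, count):
    # Simulates concurrent requests that each create one issue at a time.
    for _ in range(count):
        issued.append((prefix, sequencer.next_number(prefix)))


threads = [
    threading.Thread(target=create_issues, args=(prefix, 100))
    for prefix in ("DATA", "REVIEW")
    for _ in range(4)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Each prefix gets its own gap-free run of numbers, and no (prefix, number) pair is handed out twice, which is the property the unique constraint on Issue expects.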

How to write manager class which use filter field as computed field not as a part of model fields?

I have a model Student with a manager StudentManager, as given below. The college_end_date property gives the last date by adding college_duration to join_date. The property computation itself works well, but using it in StudentManager gives an error. How do I write a manager that computes a value on the fly from model fields and uses it to filter records?
The computed field is not among the model fields; still, I want to use it as a filter criterion.
    class StudentManager(models.Manager):
        def passed_students(self):
            return self.filter(college_end_date__lt=timezone.now())

    class Student(models.Model):
        join_date = models.DateTimeField(auto_now_add=True)
        college_duration = models.IntegerField(default=4)
        objects = StudentManager()

        @property
        def college_end_date(self):
            last_date = self.join_date + timezone.timedelta(days=self.college_duration)
            return last_date
Django gives this error when I try to call Student.objects.passed_students():
    django.core.exceptions.FieldError: Cannot resolve keyword 'college_end_date' into field. Choices are: join_date, college_duration

Q 1. How are alias queries done in the Django ORM?
By using annotate(...) (Django docs), or alias(...) (new in Django 3.2) if you're using the value only as a filter.
Q 2. Why can't a property be accessed in Django managers?
Because model managers (more accurately, QuerySets) wrap operations that are executed in the database. You can think of a model manager as a high-level database wrapper.
The property college_end_date, however, is defined only on your model class; the database is not aware of it, hence the error.
Q 3. How to write a manager that filters records based on a field which is not in the model, but can be calculated from fields present in the model?
Using the annotate(...) method is the proper Django way of doing so. As a side note, complex property logic may not be reproducible with annotate(...).
In your case, I would change the college_duration field from IntegerField(...) to DurationField(...) (Django docs), since it makes more sense (to me).
Then update your manager and the property as follows:
    from django.db import models
    from django.utils import timezone

    class StudentManager(models.Manager):
        def passed_students(self):
            default_qs = self.get_queryset()
            college_end = models.ExpressionWrapper(
                models.F('join_date') + models.F('college_duration'),
                output_field=models.DateField()
            )
            return default_qs \
                .annotate(college_end=college_end) \
                .filter(college_end__lt=timezone.now().date())

    class Student(models.Model):
        join_date = models.DateTimeField()
        college_duration = models.DurationField()
        objects = StudentManager()

        @property
        def college_end_date(self):
            # return date by summing the datetime and timedelta objects
            return (self.join_date + self.college_duration).date()
Note:
DurationField(...) works as expected in PostgreSQL, and this implementation will work there as-is. You may run into problems on other databases; if so, you may need a "database function" specific to your database that operates on the datetime and duration values.
Personally, I like this solution.
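For a feel of what moving the computation to the database side looks like on a non-PostgreSQL backend, here is a minimal sketch in plain SQL through Python's standard sqlite3 module, not the Django ORM. The student table, the storage of college_duration as an integer number of days, and the helper passed_student_ids are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, join_date TEXT, college_duration INTEGER)"
)
conn.executemany(
    "INSERT INTO student (id, join_date, college_duration) VALUES (?, ?, ?)",
    [(1, "2020-01-01", 4), (2, "2030-01-01", 4)],
)


def passed_student_ids(today):
    # The end date is computed inside the query and never stored;
    # SQLite's date() accepts modifiers such as '+4 days'.
    query = """
        SELECT id FROM student
        WHERE date(join_date, '+' || college_duration || ' days') < ?
        ORDER BY id
    """
    return [row[0] for row in conn.execute(query, (today,))]


passed = passed_student_ids("2024-06-01")
print(passed)
```

The date arithmetic here is SQLite-specific, which is the kind of per-database difference the note above warns about.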

To quote @Willem Van Onsem's comment:
You don't. The database does not know anything about properties, etc. So it can not filter on this. You can make use of .annotate(..) to move the logic to the database side.
You can either do what he suggested, or make it a model field that is calculated automatically.
    from datetime import timedelta

    from django.db import models
    from django.utils import timezone

    class StudentManager(models.Manager):
        def passed_students(self):
            return self.filter(college_end_date__lt=timezone.now())

    class Student(models.Model):
        join_date = models.DateTimeField(auto_now_add=True)
        college_duration = models.IntegerField(default=4)
        college_end_date = models.DateTimeField()
        objects = StudentManager()

        def save(self, *args, **kwargs):
            # Add logic here
            if not self.college_end_date:
                # auto_now_add fills join_date only during save, so fall back to now
                start = self.join_date or timezone.now()
                self.college_end_date = start + timedelta(days=self.college_duration)
            return super().save(*args, **kwargs)
Now you can search it in the database.
NOTE: This sort of thing is best to do from the start on data you KNOW you're going to want to filter. If you have pre-existing data, you'll need to re-save all existing instances.

Problem
You're attempting to query on a column that doesn't exist in the database. The Django ORM doesn't recognize a property as a field it can query.
Solution
The direct answer to your question would be to create annotations, which can then be filtered on. However, I would reconsider your table design for Student, as it introduces unnecessary complexity and maintenance overhead.
There's much more framework and database support for a start date/end date pair than for a start date plus a timedelta.
Instead of storing duration, store end_date and calculate duration in a model method. This not only makes more sense, as students are generally given a start date and an estimated graduation date rather than a duration, but also makes queries like these much easier.
Example
Querying which students are graduating in 2020:
    Student.objects.filter(end_date__year=2020)
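The design choice can be sketched without Django at all. The following plain-Python stand-in (a dataclass, not a Django model) stores end_date and derives duration on demand; the field names follow the answer, while the sample dates are invented for the illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Student:
    join_date: date
    end_date: date  # stored value, so it can be filtered on directly

    @property
    def duration(self) -> timedelta:
        # Derived on demand from the two stored dates; nothing to keep in sync.
        return self.end_date - self.join_date


students = [
    Student(join_date=date(2016, 9, 1), end_date=date(2020, 6, 30)),
    Student(join_date=date(2018, 9, 1), end_date=date(2022, 6, 30)),
]

# Filtering on the stored end date needs no computation per row.
graduating_2020 = [s for s in students if s.end_date.year == 2020]
print(graduating_2020[0].duration)
```

In a Django model the same split would mean end_date is a real column that the ORM can filter on, while duration stays a property.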

Django/MySQL Unique Constraint how to treat NULLs as equal

I have a Django model which has a recursive field. A simplified version is below. The idea is roughly to have a tree data structure in SQL. The problem is that, apparently, Django does not treat NULLs as equal. Since every tree's root has a 'null pointer' by necessity, I can have two identical trees, but Django will treat them as different because of the NULL value. How can I implement the UniqueConstraint below so that two 'Link' objects with NULL link values and equal node values are treated as identical and fail the UniqueConstraint test? Thank you.
    class Link(models.Model):
        node = models.ForeignKey(Node, on_delete=models.CASCADE)
        link = models.ForeignKey('self', on_delete=models.CASCADE, null=True)

        class Meta:
            constraints = [
                models.UniqueConstraint(fields=['node', 'link'], name='pipe_unique')
            ]
EDIT
Of course ideally the constraint would be enforced by the db. But even if I can enforce it in application logic by hooking somewhere or using a custom constraint, that would be good enough.

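You may be able to do this with a conditional unique constraint. Conditional constraints rely on partial indexes, which MySQL and MariaDB do not support, so on those backends Django does not enforce this constraint:
    UniqueConstraint(fields=['node'], condition=Q(link__isnull=True), name='unique_root_node')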
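The behaviour behind the conditional constraint can be observed directly in SQLite, which supports partial indexes. This is a sketch in raw SQL through Python's standard sqlite3 module, not the migration Django would generate; the table and column names (link, node_id, link_id) are assumptions based on the model in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE link (
        id INTEGER PRIMARY KEY,
        node_id INTEGER NOT NULL,
        link_id INTEGER REFERENCES link(id),
        UNIQUE (node_id, link_id)
    );
""")

# A plain UNIQUE constraint treats NULLs as distinct,
# so two root rows for the same node are both accepted.
conn.execute("INSERT INTO link (node_id, link_id) VALUES (1, NULL)")
conn.execute("INSERT INTO link (node_id, link_id) VALUES (1, NULL)")
duplicate_roots = conn.execute(
    "SELECT COUNT(*) FROM link WHERE link_id IS NULL"
).fetchone()[0]

conn.execute("DELETE FROM link")
# Partial unique index: at most one row per node where link_id IS NULL.
conn.execute(
    "CREATE UNIQUE INDEX unique_root_node ON link (node_id) WHERE link_id IS NULL"
)
conn.execute("INSERT INTO link (node_id, link_id) VALUES (1, NULL)")
try:
    conn.execute("INSERT INTO link (node_id, link_id) VALUES (1, NULL)")
    second_root_rejected = False
except sqlite3.IntegrityError:
    second_root_rejected = True
```

The original two-column constraint still covers non-root links; the partial index only adds the missing rule for rows whose link is NULL.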
EDIT:
If you wish to add the check manually, you can do it in the clean method of Link, which is run by any model form before you even get to saving the instance, and also call it from the save method:
    # at module level
    from django.core.exceptions import ValidationError

    def clean(self):
        if self.node_id and not self.link_id:
            if self.__class__.objects.exclude(pk=self.pk).filter(node=self.node, link__isnull=True).exists():
                raise ValidationError(f'A root node already exists for {self.node}')
Excluding pk=self.pk avoids a conflict with the object itself when it is being updated.
    def save(self, *args, **kwargs):
        self.clean()
        super().save(*args, **kwargs)

How can I iterate through an SQL database using a function within a Django model?

I need to know how I can create a function within a Django model which iterates through a SQL database, specifically another model.
My Django project contains two models, 'Accelerator' and 'Review'. The Accelerator model has a decimal field 'overall_rating', the value of which is to depend on the accumulation of values inputted into one of the Review model fields, 'overall'. To make this work, I have concluded that I need to create a function within the Accelerator model which:
1. Iterates through the Review model database
2. Adds the value of Review.overall to a list where a certain condition is met
3. Calculates the total value of the list and divides it by the list length to determine the value for overall_rating
The value of accelerator.overall_rating will be prone to change (i.e. it will need to update whenever a new review of that 'accelerator' has been published and hence a new value for review.overall added). So my questions are:
1. Would inserting a function into my accelerator model ensure that its value changes in accordance with the review model input?
2. If yes, what syntax is required to iterate through the review model contents of my database?
(I've only included the relevant model fields in the shared code)
    class Accelerator(models.Model):
        overall_rating = models.DecimalField(decimal_places=2, max_digits=3)

    class Review(models.Model):
        subject = models.ForeignKey(Accelerator, on_delete=models.CASCADE, blank=False)
        overall = models.DecimalField(decimal_places=2, max_digits=3)

You usually do not calculate aggregates yourself, but let the database do this. Databases are optimized for it. Iterating over the collection means the database has to send every relevant record to the application, which uses a lot of bandwidth.
If you aim to calculate the average rating, you can just aggregate over the review_set with the Avg aggregate, like:
    from django.db import models
    from django.db.models import Avg

    class Accelerator(models.Model):

        @property
        def average_rating(self):
            return self.review_set.aggregate(
                average_rating=Avg('overall')
            )['average_rating']
The above is not very useful if you need to do this for a large set of Accelerators, since that would result in a query per Accelerator. You can however annotate(..) your Accelerator queryset, for example with a Manager:
    from django.db import models
    from django.db.models import Avg

    class AcceleratorManager(models.Manager):
        def get_queryset(self):
            return super().get_queryset().annotate(
                _average_rating=Avg('review__overall')
            )
Then we can alter the Accelerator class to first check whether the value has already been calculated by the annotation:
    class Accelerator(models.Model):
        objects = models.Manager()
        objects_with_rating = AcceleratorManager()

        @property
        def average_rating(self):
            try:
                return self._average_rating
            except AttributeError:
                self._average_rating = result = self.review_set.aggregate(
                    average_rating=Avg('overall')
                )['average_rating']
                return result
If we then access, for example, Accelerator.objects_with_rating.filter(pk__lt=15), the average ratings of these Accelerators are calculated in bulk, in a single query.
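Roughly speaking, the bulk calculation boils down to one grouped query instead of one query per row. The sketch below shows that shape in raw SQL via Python's standard sqlite3 module; it is not the exact SQL Django generates, and the table names, the name column and the sample ratings are assumptions for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accelerator (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE review (
        id INTEGER PRIMARY KEY,
        subject_id INTEGER NOT NULL REFERENCES accelerator(id),
        overall REAL NOT NULL
    );
    INSERT INTO accelerator (id, name) VALUES (1, 'A'), (2, 'B');
    INSERT INTO review (subject_id, overall) VALUES (1, 4.0), (1, 5.0);
""")

# One round trip returns the average for every accelerator;
# the LEFT JOIN keeps accelerators that have no reviews yet.
rows = conn.execute("""
    SELECT a.id, AVG(r.overall) AS average_rating
    FROM accelerator AS a
    LEFT JOIN review AS r ON r.subject_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
print(rows)
```

An accelerator without reviews comes back with a NULL average (None in Python), which matches what the Avg aggregate returns for an empty set.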
Storing the average in the database is likely not a good idea, since it introduces data duplication, and synchronizing data duplication tends to be a hard problem.
