Django aggregate sum on child model field - python

Consider the following models:
from django.db import models
from django.db.models import Sum
from decimal import *
class Supply(models.Model):
    """Addition of new batches to stock"""
    bottles_number = models.PositiveSmallIntegerField()
    bottles_remaining = models.DecimalField(max_digits=4, decimal_places=1, default=0.0)

    def remain(self, *args, **kwargs):
        used = Pick.objects.filter(supply=self).aggregate(
            total=Sum('n_bottles'))['total']
        left = self.bottles_number - used
        return left

    def save(self, *args, **kwargs):
        self.bottles_remaining = self.remain()
        super(Supply, self).save(*args, **kwargs)

class Pick(models.Model):
    """Removals from specific stock batch"""
    supply = models.ForeignKey(Supply, on_delete=models.CASCADE)
    n_bottles = models.DecimalField(max_digits=4, decimal_places=1)
Every time an item (bottles in this case) is used, I need to update the "bottles_remaining" field to show the current number in stock. I do know that best practice is normally to avoid storing in the database values that can be calculated on the fly, but I need to do so in order to have the data available for use outside of Django.
This is part of a stock management system originally built in PHP through Xataface. Not being a trained programmer, I managed to get most of it done by googling, but now I am totally stuck on this key feature. The remain() function is probably a total mess. Any pointers as to how to perform that calculation and extract the value would be greatly appreciated.

I'm not sure what you actually want to solve or what the exact problem is.
Possible solutions are:
Use an SQL view:
CREATE VIEW bottles_extended AS
SELECT id, total_number, used, (total_number - used) as remaining
FROM bottles;
After that you can simply run select total_number, used, remaining from bottles_extended on the PHP side.
Add (total_number - used) as remaining as a computed column directly in the PHP SQL query.
Your current Django solution of updating the field automatically on save does not look perfect, but it should also work. (In addition, you could add a serializer and mark its remaining field as read-only; this prevents the value from being changed by users manually.)
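One detail the corrected remain() still needs to handle: Django's aggregate() returns None rather than 0 when no matching Pick rows exist yet, so the subtraction needs a zero fallback. As a sketch in plain Python (function and argument names here are illustrative, not from the models above), the calculation is:

```python
from decimal import Decimal

def bottles_remaining(bottles_number, picked_amounts):
    # picked_amounts stands in for the n_bottles values of the related Pick
    # rows. sum() with a Decimal start value keeps the arithmetic exact, and
    # the empty case yields Decimal("0") -- the equivalent of handling
    # aggregate() returning None when no picks exist yet.
    used = sum(picked_amounts, Decimal("0"))
    return Decimal(bottles_number) - used
```

In the ORM itself this would be roughly used = Pick.objects.filter(supply=self).aggregate(total=Sum('n_bottles'))['total'] or 0 before computing self.bottles_number - used.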

Related

How to write manager class which use filter field as computed field not as a part of model fields?

I have a model Student with a manager StudentManager, as given below. The property gives the last date by adding college_duration to join_date. When I execute it directly, the property computation works well, but through StudentManager it gives an error. How do I write a manager class that computes some field on the fly using model fields, and lets that field be used to filter records?
The computed field is not among the model fields; still, I want to use it as filter criteria.
class StudentManager(models.Manager):
    def passed_students(self):
        return self.filter(college_end_date__lt=timezone.now())

class Student(models.Model):
    join_date = models.DateTimeField(auto_now_add=True)
    college_duration = models.IntegerField(default=4)

    objects = StudentManager()

    @property
    def college_end_date(self):
        last_date = self.join_date + timezone.timedelta(days=self.college_duration)
        return last_date
The error Django gives when I try to access Student.objects.passed_students():
django.core.exceptions.FieldError: Cannot resolve keyword 'college_end_date' into field. Choices are: join_date, college_duration
Q 1. How are alias queries done in the Django ORM?
By using annotate(...) (Django doc), or alias(...) (new in Django 3.2) if you're using the value only as a filter.
Q 2. Why is the property not accessible in Django managers?
Because the model managers (more accurately, the QuerySets) wrap operations that are performed in the database; you can think of a model manager as a high-level database wrapper.
But the property college_end_date is only defined in your model class; the database is not aware of it, hence the error.
Q 3. How to write a manager that filters records on a field which is not in the model, but can be calculated from fields that are?
Using the annotate(...) method is the proper Django way of doing so. As a side note, complex property logic may not be re-creatable with annotate(...).
In your case, I would change the college_duration field from IntegerField(...) to DurationField(...) (Django doc), since it makes more sense (to me).
Later, update your manager and the properties as,
from django.db import models
from django.utils import timezone

class StudentManager(models.Manager):
    def passed_students(self):
        default_qs = self.get_queryset()
        college_end = models.ExpressionWrapper(
            models.F('join_date') + models.F('college_duration'),
            output_field=models.DateField()
        )
        return default_qs \
            .annotate(college_end=college_end) \
            .filter(college_end__lt=timezone.now().date())

class Student(models.Model):
    join_date = models.DateTimeField()
    college_duration = models.DurationField()

    objects = StudentManager()

    @property
    def college_end_date(self):
        # return date by summing the datetime and timedelta objects
        return (self.join_date + self.college_duration).date()
Note:
DurationField(...) works as expected in PostgreSQL, so this implementation will work as-is there. With other databases you may have problems; if so, you may need a database function that operates on the datetime and duration values for your specific database.
Personally, I like this solution.
To quote @Willem Van Olsem's comment:
You don't. The database does not know anything about properties, etc. So it can not filter on this. You can make use of .annotate(..) to move the logic to the database side.
You can either follow the approach he shared, or make that a model field that is calculated automatically on save.
class StudentManager(models.Manager):
    def passed_students(self):
        return self.filter(college_end_date__lt=timezone.now())

class Student(models.Model):
    join_date = models.DateTimeField(auto_now_add=True)
    college_duration = models.IntegerField(default=4)
    college_end_date = models.DateTimeField()

    objects = StudentManager()

    def save(self, *args, **kwargs):
        # Compute the end date once, when the object is first saved
        if not self.college_end_date:
            self.college_end_date = self.join_date + timezone.timedelta(days=self.college_duration)
        return super().save(*args, **kwargs)
Now you can search it in the database.
NOTE: This sort of thing is best to do from the start on data you KNOW you're going to want to filter. If you have pre-existing data, you'll need to re-save all existing instances.
Problem
You're attempting to query on a column that doesn't exist in the database. Also, the Django ORM doesn't recognize a property as a field.
Solution
The direct answer to your question would be to create annotations, which could be subsequently queried off of. However, I would reconsider your table design for Student as it introduces unnecessary complexity and maintenance overhead.
There’s much more framework/db support for start date, end date idiosyncrasy than there is start date, timedelta.
Instead of storing duration, store end_date and calculate duration in a model method. This not only makes more sense, as students are generally given a start date and an estimated graduation date rather than a duration, but it also makes queries like these much easier.
Example
Querying which students are graduating in 2020.
Student.objects.filter(end_date__year=2020)
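A minimal sketch of the suggested design (names are illustrative, not from the original models): with the join and end dates both stored, the duration becomes a value derived on demand rather than a column.

```python
def college_duration(join_date, end_date):
    # join_date and end_date are datetime.date values stored on the model;
    # the duration is derived on demand instead of being stored.
    return end_date - join_date
```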

How can I iterate through an SQL database using a function within a Django model?

I need to know how I can create a function within a Django model which iterates through a SQL database, specifically another model.
My Django project contains two models, 'Accelerator' and 'Review'. The Accelerator model has a decimal field 'overall_rating', the value of which is to depend on the accumulation of values inputted into one of the Review model fields, 'overall'. To make this work, I have concluded that I need to create a function within the Accelerator model which:
Iterates through the Review model database
Adds the value of Review.overall to a list where a certain condition is met
Calculates the total value of the list and divides it by the list length to determine the value for overall_rating
The value of accelerator.overall_rating will be prone to change (i.e. it will need to update whenever a new review of that 'accelerator' has been published and hence a new value for review.overall added). So my questions are:
Would inserting a function into my accelerator model ensure that its value changes in accordance with the review model input?
If yes, what syntax is required to iterate through the review model contents of my database?
(I've only included the relevant model fields in the shared code)
class Accelerator(models.Model):
    overall_rating = models.DecimalField(decimal_places=2, max_digits=3)

class Review(models.Model):
    subject = models.ForeignKey(Accelerator, on_delete=models.CASCADE, blank=False)
    overall = models.DecimalField(decimal_places=2, max_digits=3)
You usually do not calculate aggregates yourself; you let the database do it. Databases are optimized for this. Iterating over the collection in Python would force the database to send over all the relevant records, wasting a lot of bandwidth.
If you aim to calculate the average rating, you can just aggregate over the review_set with the Avg aggregate [Django-doc], like:
from django.db.models import Avg

class Accelerator(models.Model):
    @property
    def average_rating(self):
        return self.review_set.aggregate(
            average_rating=Avg('overall')
        )['average_rating']
The above is not very useful if you need to do this for a large set of Accelerators, since it results in one query per Accelerator. You can however annotate(..) [Django-doc] your Accelerator queryset, for example with a custom Manager [Django-doc]:
from django.db.models import Avg

class AcceleratorManager(models.Manager):
    def get_queryset(self):
        return super().get_queryset().annotate(
            _average_rating=Avg('review__overall')
        )
Then we can alter the Accelerator class to first check whether the value has already been calculated by the annotation:
class Accelerator(models.Model):
    objects = models.Manager()
    objects_with_rating = AcceleratorManager()

    @property
    def average_rating(self):
        try:
            return self._average_rating
        except AttributeError:
            self._average_rating = result = self.review_set.aggregate(
                average_rating=Avg('overall')
            )['average_rating']
            return result
If we then access, for example, Accelerator.objects_with_rating.filter(pk__lt=15), the average rating of these Accelerators is calculated in bulk.
Storing the average in the database is likely not a good idea, since it introduces data duplication, and synchronizing data duplication tends to be a hard problem.

Persistent Calculated Fields in Django

In MS SQL Server there is a feature to create a calculated column: a table column that is calculated on the fly at retrieval time. This more-or-less maps on to using a method on a Django model to retrieve a calculated value (the common example being retrieving Full Name, based on stored Forename/Surname fields).
For expensive operations, SQL Server provides a Persisted option. This populates the table column with the results of the calculation, and updates those results when the table is updated - a very useful feature when the calculation is not quick but does not change often compared to access.
However, in Django I cannot find a way to duplicate this functionality. Am I missing something obvious? My best guess would be some sort of custom Field that takes a function as a parameter, but I couldn't see a pre-existing one of those. Is there a better way?
One approach is just to use a regular model field that is calculated whenever an object is saved, e.g.,:
class MyModel(models.Model):
    first_name = models.CharField(max_length=255)
    surname = models.CharField(max_length=255)
    # This is your 'persisted' field
    full_name = models.CharField(max_length=255, blank=True)

    def save(self, *args, **kwargs):
        # set the full name whenever the object is saved
        self.full_name = '{} {}'.format(self.first_name, self.surname)
        super(MyModel, self).save(*args, **kwargs)
You could make this special field read-only in the admin and similarly exclude it from any model forms.

Django Model: Default method for field called multiple times

I have the following two models (just for a test):
class IdGeneratorModel(models.Model):
    table = models.CharField(primary_key=True, unique=True,
                             null=False, max_length=32)
    last_created_id = models.BigIntegerField(default=0, null=False,
                                             unique=False)

    @staticmethod
    def get_id_for_table(table: str) -> int:
        try:
            last_id_set = IdGeneratorModel.objects.get(table=table)
            new_id = last_id_set.last_created_id + 1
            last_id_set.last_created_id = new_id
            last_id_set.save()
            return new_id
        except IdGeneratorModel.DoesNotExist:
            np = IdGeneratorModel()
            np.table = table
            np.save()
            return IdGeneratorModel.get_id_for_table(table)

class TestDataModel(models.Model):
    class Generator:
        @staticmethod
        def get_id():
            return IdGeneratorModel.get_id_for_table('TestDataModel')

    id = models.BigIntegerField(null=False, primary_key=True,
                                editable=False, auto_created=True,
                                default=Generator.get_id)
    data = models.CharField(max_length=16)
Now I use the normal Django admin site to create a new Test Data Set element. What I expected (and maybe I'm wrong here) is that the method Generator.get_id() is called exactly once, when saving the new dataset to the database. But what really happens is that Generator.get_id() is called three times:
First time when I click the "add a Test Data Set" button in the admin area
A second time shortly after that (no extra interaction from the user's side)
And a third time when finally saving the new data set
The first time could be OK: This would be the value pre-filled in a form field. Since the primary key field is not displayed in my form, this may be an unnecessary call.
The third time is also clear: It's done before saving. When it's really needed.
The code above is only an example and a test for me. In the real project I have to ask a remote system for an ID instead of another table model. But whenever I query that system, the delivered ID gets locked there - much like the way get_id_for_table() counts up.
I'm sure there are better ways to get an ID from a method only when really needed - the method should be called exactly one time - when inserting the new dataset. Any idea how to achieve that?
Forgot the version: It's Django 1.8.5 on Python 3.4.
This is not an answer to your question, but it could be a solution to your problem.
I believe this issue is very complicated, especially because you want a transaction that spans a web-service call and a database insert. What I would use in this case: generate a UUID locally. This value is practically guaranteed to be unique in time and space, so use it as the id. Later, when the save is done, sync with your remote services.
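A minimal sketch of that suggestion (make_id is an illustrative name; the commented field shows how it would typically appear on a model):

```python
import uuid

def make_id():
    # uuid4 draws 122 random bits, so collisions are practically impossible
    return uuid.uuid4()

# On the model this would look roughly like:
# id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
# Note: default=uuid.uuid4, NOT uuid.uuid4() -- passing the callable itself
# means a fresh value is generated once per new instance.
```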

Django: Ordering objects by their children's attributes

Consider the models:
class Author(models.Model):
    name = models.CharField(max_length=200, unique=True)

class Book(models.Model):
    pub_date = models.DateTimeField()
    author = models.ForeignKey(Author)
Now suppose I want to order all the books by, say, their pub_date. I would use order_by('pub_date'). But what if I want a list of all authors ordered according to who most recently published books?
It's really very simple when you think about it. It's essentially:
The author on top is the one who most recently published a book
The next one is the one who published books not as new as the first,
So on etc.
I could probably hack something together, but since this could grow big, I need to know that I'm doing it right.
Help appreciated!
Edit: Lastly, would the option of just adding a new field to each one to show the date of the last book and just updating that the whole time be better?
from django.db.models import Max
Author.objects.annotate(max_pub_date=Max('books__pub_date')).order_by('-max_pub_date')
This requires Django 1.1 or later.
I also assumed you will add a related_name to the author field in the Book model, so the books can be accessed as Author.books instead of Author.book_set; it's much more readable.
Or, you could play around with something like this:
Author.objects.filter(book__pub_date__isnull=False).order_by('-book__pub_date')
Lastly, would the option of just adding a new field to each one to show the date of the last book and just updating that the whole time be better?
Actually it would! This is a normal denormalization practice and can be done like this:
class Author(models.Model):
    name = models.CharField(max_length=200, unique=True)
    latest_pub_date = models.DateTimeField(null=True, blank=True)

    def update_pub_date(self):
        try:
            # take the pub_date of the newest book, not the Book object itself
            self.latest_pub_date = self.book_set.order_by('-pub_date')[0].pub_date
            self.save()
        except IndexError:
            pass  # no books yet!

class Book(models.Model):
    pub_date = models.DateTimeField()
    author = models.ForeignKey(Author)

    def save(self, **kwargs):
        super(Book, self).save(**kwargs)
        self.author.update_pub_date()

    def delete(self):
        super(Book, self).delete()
        self.author.update_pub_date()
This is the third common option you have besides two already suggested:
doing it in SQL with a join and grouping
getting all the books to Python side and remove duplicates
Both of these options compute pub_dates from normalized data at the time you read them. Denormalization does this computation for each author at the time you write new data. The idea is that most web apps do reads more often than writes, so this approach is preferable.
One of the perceived downsides is that you have the same data in different places and it requires you to keep it in sync. It usually horrifies database people to death :-). But this is usually not a problem as long as you use your ORM model to work with the data (which you probably do anyway). In Django it's the app that controls the database, not the other way around.
Another (more realistic) downside is that with the naive code I've shown, massive book updates may be much slower, since they ping authors to update their data on every single save. This is usually solved by having a flag to temporarily disable calling update_pub_date and calling it manually afterwards. Basically, denormalized data requires more maintenance than normalized data.
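The flag idea can be sketched in plain Python (all names here are illustrative stand-ins, not actual Django code):

```python
class Author:
    """Plain-Python stand-in for the model, just to illustrate the flag pattern."""
    suspend_updates = False  # class-level flag checked by the save hooks

    def __init__(self):
        self.update_calls = 0

    def update_pub_date(self):
        if Author.suspend_updates:
            return  # skipped during bulk updates
        self.update_calls += 1  # stands in for the real recomputation + save
```

During a bulk import you would set suspend_updates = True, run the writes, reset the flag, and then call update_pub_date() once per affected author.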
def remove_duplicates(seq):
    seen = {}
    result = []
    for item in seq:
        if item in seen:
            continue
        seen[item] = 1
        result.append(item)
    return result

# Get the authors of the most recent books (newest first)
query_result = Book.objects.order_by('-pub_date').values_list('author', flat=True)
# Remove duplicate authors while preserving that order
recent_authors = remove_duplicates(query_result)
Building on ayaz's solution, what about:
Author.objects.filter(book__pub_date__isnull=False).distinct().order_by('-book__pub_date')
