I want to build a simple hot questions list using Django. I have a function that evaluates "hotness" of each question based on some arguments.
Function looks similar to this (full function here)
def hot(ups, downs, date):
    # Do something here..
    return hotness
My Question and Vote models (relevant part):
class Question(models.Model):
    title = models.CharField(max_length=150)
    body = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

class Vote(models.Model):
    question = models.ForeignKey(Question, related_name='questions_votes')
    delta = models.IntegerField(default=0)
Now, the delta attribute is either positive or negative. The hot function receives number of positive votes and number of negative votes and creation date of question.
I've tried something like this, but it isn't working.
questions = Question.objects.annotate(hotness=hot(question_votes.filter(delta, > 0),question_votes.filter(delta < 0), 'created_at')).order_by('hotness')
The error I'm getting is: global name 'question_votes' is not defined
I understand the error, but I don't know the correct way of doing this.
You can't use Python functions for annotations. An annotation is a computation performed at the database level. Django provides only a set of basic computations that the database can process: Sum, Avg, Min, Max and so on. For anything more complex, an API for query expressions is available only from version 1.8 onward. Before Django 1.8, the only way to achieve similar functionality was to use .extra, which means writing plain SQL.
So you basically have two options.
First
Write your hotness computation in plain SQL using .extra or via the new API if your Django version is >= 1.8.
Second
Create a hotness field inside your model, which will be calculated by a cron job once a day (or more often, depending on your needs), and use it to build the hottest list.
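To make the first option concrete, here is a minimal sketch of a score that is computed and sorted entirely inside the database. It uses Python's built-in sqlite3 module rather than Django, the table and column names are made up for illustration, and the "ups minus downs" score is only a placeholder assumption, not the real hot function.

```python
import sqlite3

# In-memory database with hypothetical tables mirroring Question and Vote.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE question (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE vote (
        id INTEGER PRIMARY KEY,
        question_id INTEGER REFERENCES question(id),
        delta INTEGER
    );
    INSERT INTO question (id, title) VALUES (1, 'first'), (2, 'second'), (3, 'third');
    INSERT INTO vote (question_id, delta) VALUES
        (1, 1), (1, -1),
        (2, 1), (2, 1), (2, -1),
        (3, -1);
""")

# Placeholder score (an assumption): positive votes minus negative votes.
# The counting and the ordering both happen in SQL, not in Python.
HOT_SQL = """
    SELECT q.title,
           SUM(CASE WHEN v.delta > 0 THEN 1 ELSE 0 END)
         - SUM(CASE WHEN v.delta < 0 THEN 1 ELSE 0 END) AS hotness
    FROM question AS q
    LEFT JOIN vote AS v ON v.question_id = q.id
    GROUP BY q.id, q.title
    ORDER BY hotness DESC
"""

rows = conn.execute(HOT_SQL).fetchall()
for title, hotness in rows:
    print(title, hotness)
```

A real hotness formula would replace the placeholder expression, provided it can be written with functions the database itself supports.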
For those looking for an updated answer (Django 2.0+), it is possible to subclass Func to generate custom functions for aggregations, as per the documentation. There is a good explanation and example here, about 80% of the way through the post, in the "Extending with custom database functions" section.
Disclaimer: I have searched and a question tackling this particular challenge could not be found at the time of posting.
The Requirement
For a Class Based View I need to implement Pagination for a QuerySet derived through a many to many relationship. Here's the requirement with a more concrete description:
Many Library Records can belong to many Collections
Web pages are required for most (but not necessarily all) Collections, and so I need to build views/templates/urls based on what the client identifies as required
Each Collection Page displaying the relevant Library Records requires Pagination, as there may be 100's of records to display.
The First Approach
And so with this requirement in mind I approached this as I normally would when building a CBV with Pagination. However, this approach did not allow me to meet the requirement. What I quickly discovered was that the Pagination method in the CBV was building the object based on the declared model, but the many to many relationship was not working for me.
I explored the use of object in the template, but after a number of attempts I was getting nowhere. I need to display Library Record objects but the many to many relationship demands that I do so after determining the records based on the Collection they belong to.
EDIT - Addition of model
models.py
class CollectionOrder(models.Model):
    collection = models.ForeignKey(
        Collection,
        related_name='collection_in_collection_order',
        on_delete=models.PROTECT,
        null=True,
        blank=True,
        verbose_name='Collection'
    )
    record = models.ForeignKey(
        LibraryRecord,
        related_name='record_in_collection_order',
        on_delete=models.PROTECT,
        null=True,
        blank=True,
        verbose_name='Library record',
    )
    order_number = models.PositiveIntegerField(
        blank=True,
        null=True,
    )
Please do not work with record.record.id: each access makes a separate query per CollectionOrder object, so with 100 CollectionOrder objects that is 100 extra queries, for a total of 102. If the number of matches is large, the view will eventually stop responding within a reasonable time.
Furthermore, pk__in=library_record_ids will not respect the order of library_record_ids. It can return the LibraryRecords in any order, as long as their primary keys are members of the list.
You can query with:
def get_queryset(self):
    return LibraryRecord.objects.filter(
        collectionorder__collection__collection='collection-name'
    ).order_by('collectionorder__order_number')
Where collectionorder is the related_query_name=… [Django-doc] for the ForeignKey, OneToOneField or ManyToManyField named record from CollectionOrder to the LibraryRecord model. If you did not specify a value for the related_query_name=… parameter, it will take the value for the related_name=… parameter [Django-doc], and if you did not specify that one either, it will use the name of the source model (so where the relation is defined) in lowercase, so in this case collectionorder.
This will thus respect the collectionorder__order_number as ordering condition, and perform this in a single database query, minimizing the amount of queries to the database.
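The difference between the two approaches can be illustrated outside Django with plain SQL. The sketch below uses Python's built-in sqlite3 module with simplified, hypothetical tables (a collection_name text column stands in for the foreign key to Collection), so it is an approximation of the idea rather than the SQL Django would actually generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE libraryrecord (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE collectionorder (
        id INTEGER PRIMARY KEY,
        collection_name TEXT,
        record_id INTEGER REFERENCES libraryrecord(id),
        order_number INTEGER
    );
    INSERT INTO libraryrecord (id, title) VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO collectionorder (collection_name, record_id, order_number) VALUES
        ('maps', 3, 1), ('maps', 1, 2), ('maps', 2, 3), ('other', 1, 1);
""")

# Two-step approach: collect the ids in the desired order, then filter with IN.
ids = [row[0] for row in conn.execute(
    "SELECT record_id FROM collectionorder"
    " WHERE collection_name = ? ORDER BY order_number",
    ("maps",),
)]
placeholders = ", ".join("?" for _ in ids)
# No ORDER BY here, so the database may return these rows in any order.
by_in = [row[0] for row in conn.execute(
    f"SELECT title FROM libraryrecord WHERE id IN ({placeholders})", ids
)]

# Single query: join through the ordering table and sort by order_number.
by_join = [row[0] for row in conn.execute(
    "SELECT r.title FROM libraryrecord AS r"
    " JOIN collectionorder AS o ON o.record_id = r.id"
    " WHERE o.collection_name = ? ORDER BY o.order_number",
    ("maps",),
)]

print(ids)      # ids in collection order
print(by_in)    # same records, order not guaranteed
print(by_join)  # records in collection order
```

Only the joined query carries an explicit ordering, which is why the single-query form is the one that reliably follows order_number.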
Hopefully, this Q&A helps someone else. If in reading the following approach you can think of ways to refactor/optimize I'd love to learn. Note: I deliberately did not implement Pythonic List Comprehension for my personal preference of readability.
What I ended up doing was adding get_queryset() to:
Query the Collection for the records that belong to it, to then
Build a list of record ids, to then
Return the QuerySet by filtering for pk__in (the pk exists in the list of library_record_ids)
Here's the resulting code. (Edit: This code has been optimized following another answer - I just didn't want to leave a lesser snippet up)
def get_queryset(self):
    return LibraryRecord.objects.filter(
        record_in_collection_order__collection__collection='Collection Name'
    ).order_by('record_in_collection_order__order_number')
The requirement has been met. I welcome constructive criticism. My intention in sharing this Q&A is to try and give a little back to the Stack Overflow Community that has served me so well since starting this journey into Django.
I have two models - a Task model, and a Worker model. I have a property on Worker that counts how many tasks they have completed this month.
import datetime

class Task(models.Model):
    # ...
    completed_on = models.DateField()

class Worker(models.Model):
    # ...
    @property
    def completed_this_month(self):
        # Read the date once so year and month always come from the same day.
        today = datetime.date.today()
        return Task.objects.filter(worker=self,
                                   completed_on__year=today.year,
                                   completed_on__month=today.month).count()
I've added this field to the Worker admin, and it displays correctly.
I would like to be able to sort by this field. Is there a way to do this?
Edit: It has been suggested that my question is a duplicate of this question, which uses extra(). The Django documentation strongly advises against using the extra() method, and even asks you to file a ticket explaining why you had to use it.
Use this method as a last resort
This is an old API that we aim to deprecate at some point in the future. Use it only if you cannot express your query using other queryset methods. If you do need to use it, please file a ticket using the QuerySet.extra keyword with your use case (please check the list of existing tickets first) so that we can enhance the QuerySet API to allow removing extra().
Suppose I have a model that represents scientific articles. Doing some research, I may find the same article more than once, with approximately equal titles:
Some Article Title
Some Article  Title
Notice that the second title string is slightly different: it has an extra space before "Title".
If the problem was because there could be more or less spacing, it would be easy since I could just trim it before saving.
But say there could be more small differences that consist of characters other than spaces:
Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT) project.
Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT).
This is some random article I used here as an example
Those titles clearly refer to the same unique work, but the second one for some reason is missing some letters.
What is the best way of defining uniqueness in this situation?
In my mind, I was thinking of some function that calculates the Levenshtein distance and decides if the strings are the same title based on some threshold. But is it possible to do this on a Django model, or to define this behavior at the database level?
My first thought was the levenshtein distance too, so it's probably the way to go here ;) You could implement it yourself or find the code that already knows how to compute it (there's a lot of them) and then...
...use it in the model validation:
https://docs.djangoproject.com/en/2.0/ref/models/instances/#validating-objects
You can basically raise an exception in the custom validate_unique if you decide the new object violates this special type of uniqueness. The flipside is you'll probably need to load all other objects there.
If you create these objects on your own, you'll have to call full_clean() explicitly before saving. If the articles come from some kind of form, calling is_valid() on that form is enough.
You have 2 options here, 0 of which are perfect.
Option 1
This assumes you have implemented a function titles_are_similar(title_1: str, title_2: str) -> bool that decides whether the two titles are similar. Use any sort of fuzzy string comparison of your choice to implement this function.
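As one possible way to fill in that function, here is a minimal pure-Python sketch based on the Levenshtein distance mentioned in the question. The max_ratio parameter, its default of 0.1, and the case and whitespace normalisation are all assumptions chosen for illustration; tune them to your data.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def titles_are_similar(title_1: str, title_2: str, max_ratio: float = 0.1) -> bool:
    """Treat titles as similar when the edit distance is a small share of the length."""
    # Assumption: case and runs of whitespace are not meaningful differences.
    a = " ".join(title_1.lower().split())
    b = " ".join(title_2.lower().split())
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return levenshtein(a, b) / longest <= max_ratio


print(titles_are_similar("Some Article Title", "Some Article  Title"))
print(titles_are_similar("Some Article Title", "Another Paper Entirely"))
```

A relative threshold is used instead of a fixed distance so that a long title tolerates a few more missing characters than a short one.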
We will need to use an enhanced validator.
I said "enhanced" because it will optionally accept the object you are currently trying to save, when a typical django validator for obvious reasons does not do so.
The current object's id is required. When you change and save an already existing instance/row x, validation should not fail because the table already contains a "similar" value that belongs to this exact instance/row x.
The validator itself will use values_list to reduce the performance impact.
from django.core.exceptions import ValidationError

def title_unique_enough_validator(value, exclude_obj=None):
    query_set = Article.objects.all()
    if exclude_obj is not None:
        # Skip the row being edited so it cannot collide with itself.
        query_set = query_set.exclude(pk=exclude_obj.pk)  # pk -> id
    old_titles = query_set.values_list("title", flat=True)
    if any(titles_are_similar(old_title, value) for old_title in old_titles):
        raise ValidationError("Similar title already exists")  # also use _()
If you use title = models.CharField(validators=[title_unique_enough_validator], ...), you will get a ValidationError every time you modify and save an existing object, because the object is not passed into the validator and therefore not excluded from the check (as mentioned above). Instead, we will override the Article.clean() method (docs):
class Article(Model):
    ...
    def clean(self):
        super().clean()
        # Pass the instance so its own row is excluded from the comparison.
        title_unique_enough_validator(value=self.title, exclude_obj=self)
It will nicely work with forms. But there are 2 other major problems left.
Problem 1
Quoting the docs:
Note, however, that like Model.full_clean(), a model’s clean() method is not invoked when you call your model’s save() method.
To solve this, override the .save() method:
class Article(...):
    ...
    def save(self, *args, **kwargs):
        # Can raise ValidationError.
        title_unique_enough_validator(value=self.title, exclude_obj=self)
        return super().save(*args, **kwargs)
However, Django does not expect a ValidationError when calling save(). So every time you manually call article.save() from your Python code (without Django forms), you need to wrap it in a try ... except block. Otherwise your software will return a 500 on ValidationError.
Problem 2
Do you ever explicitly call Article.objects.update()? If so, bad news (docs):
update() does an update at the SQL level and, thus, does not call any save() methods on your models
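As a workaround, you might want to create a custom model manager for the Article model and override update(): simply make it unusable (raise NotImplementedError), or implement an additional check there. Just something that will prevent it from violating your constraint. Note that Article.objects.filter(...).update() is called on the QuerySet, so the override has to live on a custom QuerySet class used by the manager to cover that case as well.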
Option 2
Use database constraints.
Why did I not list this option first? Well, you will encounter tons and tons of problems with it. Django is not aware of what database constraints might do. It just dies with OperationalError (docs) every time a constraint prevents it from doing what it wants.
As I have to work with many unmanaged models using Django, I can confirm that it takes a crap load of effort to enhance Django classes so that they can deal with the OperationalError every now and then without exploding every bloody time. It is especially painful if you're using django.contrib.admin, as it's just an endless pile of spaghetti.
So, seriously, avoid database constraints, unless you already must use unmanaged models or you're a masochist in search of adventures.
Let's say I have a model Comments and another model Answers. I want to query all comments but also include in the query the number of answers each comment has. I thought annotate() would be useful in this case, but I just don't know how to write it down. The docs say:
New in Django 1.8: Previous versions of Django only allowed aggregate
functions to be used as annotations. It is now possible to annotate a
model with all kinds of expressions.
So. This is an example of my models:
class Comments(models.Model):
    ...

class Answers(models.Model):
    comment = models.ForeignKey(Comments)
And this is an example query:
queryset = Comments.objects.all().annotate(...?)
I'm not sure how to annotate the Count of answers each comment has. That is, how many answers point to each comment through the FK field comment. Is it even possible? Is there a better way? Is it better to just write a method on the manager?
You need to use a Count aggregation:
from django.db.models import Count
comments = Comments.objects.annotate(num_answers=Count('answers'))
I have a Django model object, Record, which has foreign keys to two other models RecordType and Source:
class Record(models.Model):
    title = models.CharField(max_length=200)
    record_type = models.ForeignKey(RecordType)
    source = models.ForeignKey(Source)
Question: If I want to count the number of Record objects which refer to RecordType with id "x" and Source with id "y", where is the appropriate area of code to put that function?
Right now I have it in views.py and I feel that is a violation of best practices for "fat model, thin views", so I want to move it from views.py. But I'm not entirely sure if this is a row-based or table-based type of operation, so I'm not sure if it should be implemented as a model method, or instead as a manager.
Here's the current (working) logic in views.py:
record_count = Record.objects.filter(record_type__id=record_type_.id, source__id=source_.id).count()
Just to be clear, this isn't a question of how to get the count, but simply in which area of code to put the function.
Here's a similar question, but which was addressing "how to" not "where":
Counting and summing values of records, filtered by a dictionary of foreign keys in Django
If the result involves multiple rows, it is a table-related method, and according to Django conventions, should be a manager method.
From the Django docs:
Adding extra Manager methods is the preferred way to add “table-level” functionality to your models. (For “row-level” functionality – i.e., functions that act on a single instance of a model object – use Model methods, not custom Manager methods.)