I have a simple model which includes a product and category table. The Product model has a foreign key Category.
When I make a tastypie API call that returns a list of categories /api/vi/categories/
I would like to add a field that determines the "product count" / the number of products that have a giving category. The result would be something like:
category_objects[
{
id: 53
name: Laptops
product_count: 7
},
...
]
The following code is working but the hit on my DB is heavy
def dehydrate(self, bundle):
category = Category.objects.get(pk=bundle.obj.id)
products = Product.objects.filter(category=category)
bundle.data['product_count'] = products.count()
return bundle
Is there a more efficient way to build this query? Perhaps with annotate ?
You can use prefetch_related method of QuerSet to reverse select_related.
Asper documentation,
prefetch_related(*lookups)
Returns a QuerySet that will automatically
retrieve, in a single batch, related objects for each of the specified
lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different.
If you change your dehydrate function to following then database will be hit single time.
def dehydrate(self, bundle):
category = Category.objects.prefetch_related("product_set").get(pk=bundle.obj.id)
bundle.data['product_count'] = category.product_set.count()
return bundle
UPDATE 1
You should not initialize queryset inside dehydrate function. queryset should be always set in Meta class only. Please have a look at following example from django-tastypie documentation.
class MyResource(ModelResource):
class Meta:
queryset = User.objects.all()
excludes = ['email', 'password', 'is_staff', 'is_superuser']
def dehydrate(self, bundle):
# If they're requesting their own record, add in their email address.
if bundle.request.user.pk == bundle.obj.pk:
# Note that there isn't an ``email`` field on the ``Resource``.
# By this time, it doesn't matter, as the built data will no
# longer be checked against the fields on the ``Resource``.
bundle.data['email'] = bundle.obj.email
return bundle
As per official django-tastypie documentation on dehydrate() function,
dehydrate
The dehydrate method takes a now fully-populated bundle.data & make
any last alterations to it. This is useful for when a piece of data
might depend on more than one field, if you want to shove in extra
data that isn’t worth having its own field or if you want to
dynamically remove things from the data to be returned.
dehydrate() is only meant to make any last alterations to bundle.data.
Your code does additional count query for each category. You're right about annotate being helpfull in this kind of a problem.
Django will include all queryset's fields in GROUP BY statement. Notice .values() and empty .group_by() serve limiting field set to required fields.
cat_to_prod_count = dict(Product.objects
.values('category_id')
.order_by()
.annotate(product_count=Count('id'))
.values_list('category_id', 'product_count'))
The above dict object is a map [category_id -> product_count].
It can be used in dehydrate method:
bundle.data['product_count'] = cat_to_prod_count[bundle.obj.id]
If that doesn't help, try to keep similar counter on category records and use singals to keep it up to date.
Note categories are usually a tree-like beings and you probably want to keep count of all subcategories as well.
In that case look at the package django-mptt.
Related
I want to add an extra field to a query set in Django.
The field does not exist in the model but I want to add it to the query set.
Basically I want to add an extra field called "upcoming" which should return "True"
I already tried adding a #property method to my model class. This does not work because apparently django queries access the DB directly.
models.py
class upcomingActivity(models.Model):
title = models.CharField (max_length=150)
address = models.CharField (max_length=150)
Views.py
def get(self, request):
query = upcomingActivity.objects.all()
feature_collection = serialize('geojson', query ,
geometry_field='location',
fields= ( 'upcoming','title','address','pk' )
)
This answer is for the case that you do not want to add a virtual property to the model (the model remains as is).
To add an additional field to the queryset result objects:
from django.db.models import BooleanField, Value
upcomingActivity.objects.annotate(upcoming=Value(True, output_field=BooleanField())).all()
Each object in the resulting queryset will have the attribute upcoming with the boolean value True.
(Performance should be nice because this is easy work for the DB, and Django/Python does not need to do much additional work.)
EDIT after comment by Juan Carlos:
The Django serializer is for serializing model objects and thus will not serialize any non-model fields by default (because basically, the serializer in this case is for loading/dumping DB data).
See https://docs.djangoproject.com/en/2.2/topics/serialization/
Django’s serialization framework provides a mechanism for “translating” Django models into other formats.
See also: Django - How can you include annotated results in a serialized QuerySet?
From my own experience:
In most cases, we are using json.dumps for serialization in views and this works like a charm. You can prepare the data very flexibly for whatever needs arize, either by annotations or by processing the data (list comprehension etc.) before dumping it via json.
Another possibility in your situation could be to customize the serializer or the input to the serializer after fetching the data from the DB.
You can use a class function to return the upcoming value like this:
def upcoming(self):
is_upcoming = # some logic query or just basically set it to true.
return is_upcoming
then call it normally in your serializer the way you did it.
fields= ( 'upcoming','title','address','pk' )
Using Django 11 with PostgreSQL db. I have the models as shown below. I'm trying to prefetch a related queryset, using the Prefetch object and prefetch_related without assigning it to an attribute.
class Person(Model):
name = Charfield()
#property
def latest_photo(self):
return self.photos.order_by('created_at')[-1]
class Photo(Model):
person = ForeignKey(Person, related_name='photos')
created_at = models.DateTimeField(auto_now_add=True)
first_person = Person.objects.prefetch_related(Prefetch('photos', queryset=Photo.objects.order_by('created_at'))).first()
first_person.photos.order_by('created_at') # still hits the database
first_person.latest_photo # still hits the database
In the ideal case, calling person.latest_photo will not hit the database again. This will allow me to use that property safely in a list display.
However, as noted in the comments in the code, the prefetched queryset is not being used when I try to get the latest photo. Why is that?
Note: I've tried using the to_attr argument of Prefetch and that seems to work, however, it's not ideal since it means I would have to edit latest_photo to try to use the prefetched attribute.
The problem is with slicing, it creates a different query.
You can work around it like this:
...
#property
def latest_photo(self):
first_use_the_prefetch = list(self.photos.order_by('created_at'))
then_slice = first_use_the_prefetch[-1]
return then_slice
And in case you want to try, it is not possible to use slicing inside the Prefetch(query=...no slicing here...) (there is a wontfix feature request for this somewhere in Django tracker).
A have piece of code, which fetches some QuerySet from DB and then appends new calculated field to every object in the Query Set. It's not an option to add this field via annotation (because it's legacy and because this calculation based on another already pre-fetched data).
Like this:
from django.db import models
class Human(models.Model):
name = models.CharField()
surname = models.CharField()
def calculate_new_field(s):
return len(s.name)*42
people = Human.objects.filter(id__in=[1,2,3,4,5])
for s in people:
s.new_column = calculate_new_field(s)
# people.somehow_reorder(new_order_by=new_column)
So now all people in QuerySet have a new column. And I want order these objects by new_column field. order_by() will not work obviously, since it is a database option. I understand thatI can pass them as a sorted list, but there is a lot of templates and other logic, which expect from this object QuerySet-like inteface with it's methods and so on.
So question is: is there some not very bad and dirty way to reorder existing QuerySet by dinamically added field or create new QuerySet-like object with this data? I believe I'm not the only one who faced this problem and it's already solved with django. But I can't find anything (except for adding third-party libs, and this is not an option too).
Conceptually, the QuerySet is not a list of results, but the "instructions to get those results". It's lazily evaluated and also cached. The internal attribute of the QuerySet that keeps the cached results is qs._result_cache
So, the for s in people sentence is forcing the evaluation of the query and caching the results.
You could, after that, sort the results by doing:
people._result_cache.sort(key=attrgetter('new_column'))
But, after evaluating a QuerySet, it makes little sense (in my opinion) to keep the QuerySet interface, as many of the operations will cause a reevaluation of the query. From this point on you should be dealing with a list of Models
Can you try it functions.Length:
from django.db.models.functions import Length
qs = Human.objects.filter(id__in=[1,2,3,4,5])
qs.annotate(reorder=Length('name') * 42).order_by('reorder')
How can I apply annotations and filters from a custom manager queryset when filtering via a related field? Here's some code to demonstrate what I mean.
Manager and models
from django.db.models import Value, BooleanField
class OtherModelManager(Manager):
def get_queryset(self):
return super(OtherModelManager, self).get_queryset().annotate(
some_flag=Value(True, output_field=BooleanField())
).filter(
disabled=False
)
class MyModel(Model):
other_model = ForeignKey(OtherModel)
class OtherModel(Model):
disabled = BooleanField()
objects = OtherModelManager()
Attempting to filter the related field using the manager
# This should only give me MyModel objects with related
# OtherModel objects that have the some_flag annotation
# set to True and disabled=False
my_model = MyModel.objects.filter(some_flag=True)
If you try the above code you will get the following error:
TypeError: Related Field got invalid lookup: some_flag
To further clarify, essentially the same question was reported as a bug with no response on how to actually achieve this: https://code.djangoproject.com/ticket/26393.
I'm aware that this can be achieved by simply using the filter and annotation from the manager directly in the MyModel filter, however the point is to keep this DRY and ensure this behaviour is repeated everywhere this model is accessed (unless explicitly instructed not to).
How about running nested queries (or two queries, in case your backend is MySQL; performance).
The first to fetch the pk of the related OtherModel objects.
The second to filter the Model objects on the fetched pks.
other_model_pks = OtherModel.objects.filter(some_flag=...).values_list('pk', flat=True)
my_model = MyModel.objects.filter(other_model__in=other_model_pks)
# use (...__in=list(other_model_pks)) for MySQL to avoid a nested query.
I don't think what you want is possible.
1) I think you are miss-understanding what annotations do.
Generating aggregates for each item in a QuerySet
The second way to generate summary values is to generate an
independent summary for each object in a QuerySet. For example, if you
are retrieving a list of books, you may want to know how many authors
contributed to each book. Each Book has a many-to-many relationship
with the Author; we want to summarize this relationship for each book
in the QuerySet.
Per-object summaries can be generated using the annotate() clause.
When an annotate() clause is specified, each object in the QuerySet
will be annotated with the specified values.
The syntax for these annotations is identical to that used for the
aggregate() clause. Each argument to annotate() describes an aggregate
that is to be calculated.
So when you say:
MyModel.objects.annotate(other_model__some_flag=Value(True, output_field=BooleanField()))
You are not annotation some_flag over other_model.
i.e. you won't have: mymodel.other_model.some_flag
You are annotating other_model__some_flag over mymodel.
i.e. you will have: mymodel.other_model__some_flag
2) I'm not sure how familiar SQL is for you, but in order to preserve MyModel.objects.filter(other_model__some_flag=True) possible, i.e. to keep the annotation when doing JOINS, the ORM would have to do a JOIN over subquery, something like:
INNER JOIN
(
SELECT other_model.id, /* more fields,*/ 1 as some_flag
FROM other_model
) as sub on mymodel.other_model_id = sub.id
which would be super slow and I'm not surprised they are not doing it.
Possible solution
don't annotate your field, but add it as a regular field in your model.
The simplified answer is that models are authoritative on the field collection and Managers are authoritative on collections of models. In your efforts to make it DRY you made it WET, cause you alter the field collection in your manager.
In order to fix it, you would have to teach the model about the lookup and need to do that using the Lookup API.
Now I'm assuming that you're not actually annotating with a fixed value, so if that annotation is in fact reducible to fields, then you may just get it done, because in the end it needs to be mapped to database representation.
I've looked through Tastypie's documentation and searched for a while, but can't seem to find an answer to this.
Let's say that we've got two models: Student and Assignment, with a one-to-many relationship between them. The Assignment model includes an assignment_date field. Basically, I'd like to build an API using Tastypie that returns Student objects sorted by most recent assignment date. Whether the sorting is done on the server or in the client side doesn't matter - but wherever the sorting is done, the assignment_date is needed to sort by.
Idea #1: just return the assignments along with the students.
class StudentResource(ModelResource):
assignments = fields.OneToManyField(
AssignmentResource, 'assignments', full=True)
class Meta:
queryset = models.Student.objects.all()
resource_name = 'student'
Unfortunately, each student may have tens or hundreds of assignments, so this is bloated and unnecessary.
Idea #2: augment the data during the dehydrate cycle.
class StudentResource(ModelResource):
class Meta:
queryset = models.Student.objects.all()
resource_name = 'student'
def dehydrate(self, bundle):
bundle.data['last_assignment_date'] = (models.Assignment
.filter(student=bundle.data['id'])
.order_by('assignment_date')[0].assignment_date)
This is not ideal, since it'll be performing a separate database roundtrip for each student record. It's also not very declarative, nor elegant.
So, is there a good way to get this kind of functionality with Tastypie? Or is there a better way to do what I'm trying to achieve?
You can sort a ModelResource by a field name. Check out this part of the documentation http://django-tastypie.readthedocs.org/en/latest/resources.html#ordering
You could also set this ordering by default in the Model: https://docs.djangoproject.com/en/dev/ref/models/options/#ordering