Django rest framework - Optimizng serialization of nested related field - python

Given the following models:
class Model_A:
...
class Model_B:
...
class Model_C:
model_a = ForeignKey(Model_A, related_name='c_items')
model_b = ForeignKey(Model_B)
...
And the following model serializers setup:
class Model_A_Serializer:
class Model_C_Serializer:
class Meta:
model = Model_C
fields = ( 'model_b', ... )
c_items = Model_C_Serializer(many=True)
class Meta:
model = Model_A
fields = ( 'c_items', ... )
And a basic vieweset:
class Model_A_Viewset:
model = Model_A
queryset = model.objects.all()
serializer_class = Model_A_Serializer
...
When the user POST's the following JSON payload to create an instance of Model_A along with instances of Model_C:
{
'c_items': [
{
'model_b': 1
},
{
'model_b': 2
},
{
'model_b': 3
}
]
}
Note: Model_B instances with IDs 1, 2, and 3 already exist, but no Model_A and no Model_C instances exist in the above example.
Then I noticed that django seems to execute the following queries when serializing the incoming data:
SELECT ... FROM Model_B WHERE id = 1;
SELECT ... FROM Model_B WHERE id = 2;
SELECT ... FROM Model_B WHERE id = 3;
This seems unnecessary to me as a single SELECT ... FROM Model_B WHERE id IN (1,2,3) would do the job.
How do I go about optimizng this?
I have tried to modify the queryset in the viewset above like so:
queryset = model.objects.prefetch_related('c_items__model_b').all()
But this did not make a difference in the number of queries being executed.

been a while since I django'd but I'm pretty sure you need to prefetch the relations and the relations relations, and I'm pretty sure prefetch is for many to many or reverse foreign keys, while select is for direct foreign keys, which means you can save one query by using select_related instead:
model.objects.prefetch_related('c_items').select_related('c_items__model_b').all()

Related

Django recursive serializing

Let's say I have models like this:
class A(models.Model):
...some fields here
class B(models.Model):
...some fields here
a = models.ForeignKey(A, on_delete=CASCADE)
class C(models.Model):
...some fields here
b = models.ForeignKey(B, on_delete=CASCADE)
...
And I want my API endpoint to return something like this
{
...some fields here
b: [
{
...some field here
c: [{...}, {...} ...]
},
{
...some field here
c: [{...}, {...} ...]
}
...
]
}
I know I can do something like this:
class Bserializer(serializers.ModelSerializer):
c = Cserializer(source="c_set", many=True, read_only=True,)
class Meta:
model = B
fields = [...some fields, "c"]
class Aserializer(serializers.ModelSerializer):
b = Bserializer(source="b_set", many=True, read_only=True,)
class Meta:
model = A
fields = [...some fields, "b"]
But if this goes deeper or/and models have more foreign keys it starts to become really complicated. Is there a way to add recursively all instances referencing the model.
You can specify a depth value in the serializer Meta class, however, this won't' work if you want to customize any part of the nested data, as this will automatically create model serializers (with all fields) for nested relations. Check docs here
class Aserializer(serializers.ModelSerializer):
class Meta:
model = A
fields = [...some fields, "b"]
depth = 2
The bottom line, if you don't do any customization of the nested serializers, use it, otherwise, stick with custom serializers for each of the relations. But I'll suggest keeping the nested data at a minimum as it will impact performance

Find parent data using Django ORM

I have 3 tables
AllTests
ReportFormat
Parameter
AllTests has reportFormat as ForeignKey with one to one relation.
class AllTests(models.Model):
id=models.AutoField(primary_key=True)
name=models.CharField(max_length=200)
reportFormat=models.ForeignKey(ReportFormat)
.....
.....
class Meta:
db_table = 'AllTests'
Parameter table has reportFormat as ForeignKey with one too many relations. Means one report format has many parameters.
class Parameter(models.Model):
id=models.AutoField(primary_key=True)
name=models.CharField(max_length=200)
reportFormat=models.ForeignKey(ReportFormat)
.....
.....
class Meta:
db_table = 'AllTests
ReportFormat Table:-
class ReportFormat(models.Model):
id=models.AutoField(primary_key=True)
.....
.....
class Meta:
db_table = 'ReportFormat'
I want to make a query on the parameter model that return parameter data with the related test data. Please suggest me a better way for the same.
my current query is like this.
from django.db.models import (
Sum, Value, Count, OuterRef, Subquery
)
data = Parameter.objects.filter(isDisable=0).values('id', 'name', 'reportFormat_id').annotate(
test=Subquery(Alltsets.objects.filter(reportFormat_id=OuterRef('reportFormat_id').values('id', 'name')))
)
Since, report format column of both the tables refer to the same column of reportformat table, maybe you can directly relate them with something like below.
select * from parameter inner join alltests on parameter.reportformat=alltests.reportformat
Maybe in ORM something like this?
Parameter.objects.filter().annotate(alltests_id=F('reportformat__alltests__id'),alltests_name=F('reportformat__alltests__name'),....other fields)
Fetch all Parameter and AllTests objects. Find the parameter's alltests object using reportFormat field. Add the corresponding data to the parameter dict.
Solution:
parameter_qs = Parameter.objects.all().select_related('reportFormat')
alltests_qs = AllTests.objects.all().select_related('reportFormat')
data = []
for parameter in parameter_qs:
item = {
'id': parameter.id,
'name': parameter.name,
'alltests': {}
}
# fetch related AllTests object using `reportFormat` field
alltests_obj = None
for alltests in alltests_qs:
if parameter.reportFormat == alltests.reportFormat:
alltests_obj = alltests
break
if alltests_obj is not None:
item['alltests'] = {
'id': alltests_obj.id,
'name': alltests_obj.name
}
data.append(item)

Django Serializer Nested Creation: How to avoid N+1 queries on relations

There are dozens of posts about n+1 queries in nested relations in Django, but I can't seem to find the answer to my question. Here's the context:
The Models
class Book(models.Model):
title = models.CharField(max_length=255)
class Tag(models.Model):
book = models.ForeignKey('app.Book', on_delete=models.CASCADE, related_name='tags')
category = models.ForeignKey('app.TagCategory', on_delete=models.PROTECT)
page = models.PositiveIntegerField()
class TagCategory(models.Model):
title = models.CharField(max_length=255)
key = models.CharField(max_length=255)
A book has many tags, each tag belongs to a tag category.
The Serializers
class TagSerializer(serializers.ModelSerializer):
class Meta:
model = Tag
exclude = ['id', 'book']
class BookSerializer(serializers.ModelSerializer):
tags = TagSerializer(many=True, required=False)
class Meta:
model = Book
fields = ['title', 'tags']
def create(self, validated_data):
with transaction.atomic():
tags = validated_data.pop('tags')
book = Book.objects.create(**validated_data)
Tag.objects.bulk_create([Tag(book=book, **tag) for tag in tags])
return book
The Problem
I am trying to POST to the BookViewSet with the following example data:
{
"title": "The Jungle Book"
"tags": [
{ "page": 1, "category": 36 }, // plot intro
{ "page": 2, "category": 37 }, // character intro
{ "page": 4, "category": 37 }, // character intro
// ... up to 1000 tags
]
}
This all works, however, during the post, the serializer proceeds to make a call for each tag to check if the category_id is a valid one:
With up to 1000 nested tags in a call, I can't afford this.
How do I "prefetch" for the validation?
If this is impossible, how do I turn off the validation that checks if a foreign_key id is in the database?
EDIT: Additional Info
Here is the view:
class BookViewSet(views.APIView):
queryset = Book.objects.all().select_related('tags', 'tags__category')
permission_classes = [IsAdminUser]
def post(self, request, format=None):
serializer = BookSerializer(data=request.data)
if serializer.is_valid():
serializer.save()
return Response(serializer.data, status=status.HTTP_201_CREATED)
return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)
The DRF serializer is not the place (in my own opinion) to optimize a DB query. Serializer has 2 jobs:
Serialize and check the validity of input data.
Serialize output data.
Therefore the correct place to optimize your query is the corresponding view.
We will use the select_related method that:
Returns a QuerySet that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.
to avoid the N+1 database queries.
You will need to modify the part of your view code that creates the corresponding queryset, in order to include a select_related call.
You will also need to add a related_name to the Tag.category field definition.
Example:
# In your Tag model:
category = models.ForeignKey(
'app.TagCategory', on_delete=models.PROTECT, related_name='categories'
)
# In your queryset defining part of your View:
class BookViewSet(views.APIView):
queryset = Book.objects.all().select_related(
'tags', 'tags__categories'
) # We are using the related_name of the ForeignKey relationships.
If you want to test something different that uses also the serializer to cut down the number of queries, you can check this article.
I think the issue here is that the Tag constructor is automatically converting the category id that you pass in as category into a TagCategory instance by looking it up from the database. The way to avoid that is by doing something like the following if you know that all of the category ids are valid:
def create(self, validated_data):
with transaction.atomic():
tags = validated_data.pop('tags')
book = Book.objects.create(**validated_data)
tag_instances = [ Tag(book_id=book.id, page=x['page'], category_id=x['category']) for x in tags ]
Tag.objects.bulk_create(tag_instances)
return book
I've come up with an answer that gets things working (but that I'm not thrilled about): Modify the Tag Serializer like this:
class TagSerializer(serializers.ModelSerializer):
category_id = serializers.IntegerField()
class Meta:
model = Tag
exclude = ['id', 'book', 'category']
This allows me to read/write a category_id without having the overhead of validations. Adding category to exclude does mean that the serializer will ignore category if it's set on the instance.
Problem is that you don't set created tags to the book instance so serializer try to get this while returning.
You need to set it to the book as a list:
def create(self, validated_data):
with transaction.atomic():
book = Book.objects.create(**validated_data)
# Add None as a default and check that tags are provided
# If you don't do that, serializer will raise error if request don't have 'tags'
tags = validated_data.pop('tags', None)
tags_to_create = []
if tags:
tags_to_create = [Tag(book=book, **tag) for tag in tags]
Tag.objects.bulk_create(tags_to_create)
# Here I set tags to the book instance
setattr(book, 'tags', tags_to_create)
return book
Provide Meta.fields tuple for TagSerializer (it's weird that this serializer don't raise error saying that fields tuple is required)
class TagSerializer(serializers.ModelSerializer):
class Meta:
model = Tag
fields = ('category', 'page',)
Prefetching tag.category should be NOT necessary in this case because it's just id.
You will need prefetching Book.tags for GET method. The simplest solution is to create static method for serializer and use it in viewset get_queryset method like this:
class BookSerializer(serializers.ModelSerializer):
...
#staticmethod
def setup_eager_loading(queryset): # It can be named any name you like
queryset = queryset.prefetch_related('tags')
return queryset
class BookViewSet(views.APIView):
...
def get_queryset(self):
self.queryset = BookSerializer.setup_eager_loading(self.queryset)
# Every GET request will prefetch 'tags' for every book by default
return super(BookViewSet, self).get_queryset()
select_related function will check ForeignKey in the first time.
Actually,this is a ForeignKey check in the relational database and you can use SET FOREIGN_KEY_CHECKS=0; in database to close inspection.

how to display my M2M model in Django Rest Framework

I'm trying to display a 'nested' model in my API response and having trouble shaping the data.
I have the model the API is called from:
something like
class Rules(Model):
conditions = models.ManyToManyField(RulesPoliciesConditions)
...
...
class RulesPoliciesConditions(Model):
rules = models.ForeignKey(Rules, ...)
policies = models.ForeignKey(Policy, ...)
Rules and Policies are their own models with a few TextFields (name, nickname, timestamp, etc)
My problem is that when I use the Rules model to call a field called conditions, only the rules and policies PK display. I want to reach the other attributes like name, timestamp, nickname, etc.
I tried making my fields (in my Serializer) try to call specifically like "conditions__rules__name" but it's invalid, I also tried "conditions.rules.name" which is also invalid. Maybe I'm using the wrong field in my serializer, I'm trying out conditions = serializers.SlugRelatedField(many=True, queryset=q, slug_field="id")
My intention is to display something like:
conditions: [
{
rules: {id: rulesId, name: rulesName, ...},
policies: {id: policiesId, name: policiesName, ...}
}, ...
]
or just even:
conditions: [
{
rules: rulesName,
policies: policiesName
}, ...
]
since right now it just returns the rulesId and policiesId and it doesn't "know" about the other fields
EDIT: I found a relevant question on SO but couldn't get a relevant answer
Django REST Framework: Add field from related object to ModelSerializer
This can be achieved by using nested serializers. The level of nesting can be controlled/customized by various methods
class RulesPoliciesConditionsSerializer(serializers.ModelSerializer):
class Meta:
fields = '__all__'
model = RulesPoliciesConditions
depth = 1
class RulesSerializer(serializers.ModelSerializer):
conditions = RulesPoliciesConditionsSerializer(many=True)
class Meta:
fields = '__all__'
model = Rules
Pass your Rules queryset to the RulesSerializer serializer to
get the desired output
Example
rules_qs = Rules.objects.all()
rules_serializer = RulesSerializer(rules_qs, many=True)
data = rules_serializer.data
References
1. serializer depth
2. Nested serializer
You can use nested serializers for the purpose.
class RuleSerializer(serializers.ModelSerializer):
...
class Meta:
model = Rules(rulesId, rulesName)
fields = ('id', 'email', 'country')
class RulesPoliciesConditionsSerializer(serializers.ModelSerializer):
rules = RuleSerializer()
policies = PolicySerializer()
...

django admin inline many to many custom fields

Hi I am trying to customize my inlines in django admin.
Here are my models:
class Row(models.Model):
name = models.CharField(max_length=255)
class Table(models.Model):
rows = models.ManyToManyField(Row, blank=True)
name = models.CharField(max_length=255)
def __unicode__(self):
return self.name
and my admin:
class RowInline(admin.TabularInline):
model = Table.rows.through
fields = ['name']
class TableAdmin(admin.ModelAdmin):
inlines = [
RowInline,
]
exclude = ('rows',)
However I get this error
ImproperlyConfigured at /admin/table_app/table/1/
'RowInline.fields' refers to field 'name' that is missing from the
form.
How is that possible ?
class RowInline(admin.TabularInline):
model = Table.rows.through
fields = ['name']
This presents a problem because Table.rows.through represents an intermediate model. If you would like to understand this better have a look at your database. You'll see an intermediate table which references this model. It is probably named something like apname_table_rows. This intermeditate model does not contain the field, name. It just has two foreign key fields: table and row. (And it has an id field.)
If you need the name it can be referenced as a readonly field through the rows relation.
class RowInline(admin.TabularInline):
model = Table.rows.through
fields = ['row_name']
readonly_fields = ['row_name']
def row_name(self, instance):
return instance.row.name
row_name.short_description = 'row name'
class TableAdmin(admin.ModelAdmin):
inlines = [
RowInline,
]
exclude = ('rows',)
Django can not display it as you expected. Because there is an intermediary join table which joins your tables. In your example:
admin.py:
class RowInline(admin.TabularInline):
model = Table.rows.through # You are not addressing directly Row table but intermediary table
fields = ['name']
As the above note, model in RowInline addressing following table in your database, not Your Row table and model
table: your-app-name_table_row
--------------------------------
id | int not null
table_id | int
row_id | int
You can think it like there is an imaginary table in your model that joins the two tables.
class Table_Row(Model):
table = ForeignKey(Table)
row = ForeignKey(Row)
So if you edit your Inline as following
class RowInline(admin.TabularInline):
model = Table.rows.through # You are not addressing directly Row table but intermediary table
fields = ['row', 'table']
you will not see any error or exception. Because your model in RowInline addresses an intermediary table and that table do have those fields. Django can virtualize the imaginary table Table_Row up to here and can handle this.
But we can use relations in admin, with using __. If your code do have a ForeignKey relation instead of ManyToManyField relation, then following be valid in your admin
class Row(models.Model):
name = models.CharField(max_length=255)
class Table(models.Model):
rows = models.ForeignKey(Row, blank=True)
name = models.CharField(max_length=255)
def __unicode__(self):
return self.name
and your admin:
class RowInline(admin.TabularInline):
model = Table
fields = ['rows__name']
Because you will have real Models and djnago can evaluate __ relation on them
But if you try that in your structure:
class Row(models.Model):
name = models.CharField(max_length=255)
class Table(models.Model):
rows = models.ManyToManyField(Row, blank=True)
name = models.CharField(max_length=255)
def __unicode__(self):
return self.name
and your admin:
class RowInline(admin.TabularInline):
model = Table.rows.through
fields = ['row__name']
it will raise Exception! Because you do not have real table in your model, and django can not evaluate __ relations on virtual models it designs on top of its head.
Conclusion:
In your Inlines addressing ManyToMany relations, you are dealing with an imaginary intermediary model and you can not use fields orexclude attributes on that because your imaginary model do not have those fields and django can not handle relations ober that imaginary table. Following will be acceptable
class RowInline(admin.TabularInline):
model = Table.rows.through
# No fields or exclude declarations in here
and django will display combo boxes for your virtual intermediary table options and add a fancy green + sign to add new records, but you will can not have inline fields to add new records directly to your database within the same single page. Djnago can not handle this on a single page.
You can try creating real intermediary table and show it using through, but thats totally a longer job and I do not test it to see its results.
Update: There is also the reason why django do not let something like that. Consider following:
Table | Table_Row | Row
-----------+-----------+---------
Start 5 | |
Step 1 5 | | 1
Step 2 5 | 5-1 | 1
At the beginning, You have a table with no related Rows, you want to add a row to the table... For joining a row with a table, you must first create a row so you execute step 1. After You do create your row, you can create Table_Row record to join these two. So in contains more than a single database insertion. Django crew may avoid such usage since it contains multiple inserts and operation is related more tables.
But this is just an assumption on the reason of the behavior.
In your admin.py try this
class RowInline(admin.TabularInline):
model = Table.rows.through
list_display = ('name',)
class TableAdmin(admin.ModelAdmin):
inlines = [
RowInline,
]
readonly_fields = ('rows',)

Categories