Django Serializer Nested Creation: How to avoid N+1 queries on relations

There are dozens of posts about n+1 queries in nested relations in Django, but I can't seem to find the answer to my question. Here's the context:
The Models
class Book(models.Model):
    title = models.CharField(max_length=255)

class Tag(models.Model):
    book = models.ForeignKey('app.Book', on_delete=models.CASCADE, related_name='tags')
    category = models.ForeignKey('app.TagCategory', on_delete=models.PROTECT)
    page = models.PositiveIntegerField()

class TagCategory(models.Model):
    title = models.CharField(max_length=255)
    key = models.CharField(max_length=255)
A book has many tags, each tag belongs to a tag category.
The Serializers
class TagSerializer(serializers.ModelSerializer):
    class Meta:
        model = Tag
        exclude = ['id', 'book']

class BookSerializer(serializers.ModelSerializer):
    tags = TagSerializer(many=True, required=False)

    class Meta:
        model = Book
        fields = ['title', 'tags']

    def create(self, validated_data):
        with transaction.atomic():
            tags = validated_data.pop('tags')
            book = Book.objects.create(**validated_data)
            Tag.objects.bulk_create([Tag(book=book, **tag) for tag in tags])
        return book
The Problem
I am trying to POST to the BookViewSet with the following example data:
{
    "title": "The Jungle Book",
    "tags": [
        { "page": 1, "category": 36 }, // plot intro
        { "page": 2, "category": 37 }, // character intro
        { "page": 4, "category": 37 }, // character intro
        // ... up to 1000 tags
    ]
}
This all works. However, during the POST, the serializer issues a separate query for each tag to check that its category id is valid.
With up to 1000 nested tags in a call, I can't afford this.
How do I "prefetch" for the validation?
If this is impossible, how do I turn off the validation that checks if a foreign_key id is in the database?
EDIT: Additional Info
Here is the view:
class BookViewSet(views.APIView):
    queryset = Book.objects.all().select_related('tags', 'tags__category')
    permission_classes = [IsAdminUser]

    def post(self, request, format=None):
        serializer = BookSerializer(data=request.data)
        if serializer.is_valid():
            serializer.save()
            return Response(serializer.data, status=status.HTTP_201_CREATED)
        return Response(serializer.errors, status=status.HTTP_400_BAD_REQUEST)

In my opinion, the DRF serializer is not the place to optimize a DB query. A serializer has two jobs:
1. Serialize and check the validity of input data.
2. Serialize output data.
Therefore the correct place to optimize your query is the corresponding view.
To avoid the N+1 database queries, we will use the select_related method, which, according to the Django documentation:
Returns a QuerySet that will “follow” foreign-key relationships, selecting additional related-object data when it executes its query. This is a performance booster which results in a single more complex query but means later use of foreign-key relationships won’t require database queries.
You will need to modify the part of your view code that creates the corresponding queryset, in order to include a select_related call.
You will also need to add a related_name to the Tag.category field definition.
Example:
# In your Tag model:
category = models.ForeignKey(
    'app.TagCategory', on_delete=models.PROTECT, related_name='categories'
)

# In your queryset defining part of your View:
class BookViewSet(views.APIView):
    queryset = Book.objects.all().select_related(
        'tags', 'tags__categories'
    )  # We are using the related_name of the ForeignKey relationships.
If you want to test something different that uses also the serializer to cut down the number of queries, you can check this article.

I think the issue here is that the Tag constructor is automatically converting the category id that you pass in as category into a TagCategory instance by looking it up from the database. The way to avoid that is by doing something like the following if you know that all of the category ids are valid:
def create(self, validated_data):
    with transaction.atomic():
        tags = validated_data.pop('tags')
        book = Book.objects.create(**validated_data)
        tag_instances = [
            Tag(book_id=book.id, page=x['page'], category_id=x['category'])
            for x in tags
        ]
        Tag.objects.bulk_create(tag_instances)
    return book

I've come up with an answer that gets things working (but that I'm not thrilled about): Modify the Tag Serializer like this:
class TagSerializer(serializers.ModelSerializer):
    category_id = serializers.IntegerField()

    class Meta:
        model = Tag
        exclude = ['id', 'book', 'category']
This allows me to read/write a category_id without having the overhead of validations. Adding category to exclude does mean that the serializer will ignore category if it's set on the instance.
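A possible middle ground between one lookup per tag and no validation at all is to check every submitted category id with a single IN query. The sketch below illustrates the idea with the standard-library sqlite3 module rather than the Django ORM, so it runs on its own; the table name tag_category and the helper find_missing_ids are hypothetical. In Django the equivalent lookup would presumably be something like TagCategory.objects.filter(pk__in=ids).values_list('pk', flat=True).

```python
import sqlite3


def find_missing_ids(conn, table, ids):
    """Return the ids that have no row in `table`, using a single query."""
    unique_ids = sorted(set(ids))
    if not unique_ids:
        return []
    placeholders = ", ".join("?" for _ in unique_ids)
    # `table` is a trusted constant in this sketch, never user input.
    rows = conn.execute(
        f"SELECT id FROM {table} WHERE id IN ({placeholders})", unique_ids
    ).fetchall()
    found = {row[0] for row in rows}
    return [i for i in unique_ids if i not in found]


# Hypothetical in-memory table standing in for TagCategory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tag_category (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO tag_category (id, title) VALUES (?, ?)",
    [(36, "plot intro"), (37, "character intro")],
)

tags = [
    {"page": 1, "category_id": 36},
    {"page": 2, "category_id": 37},
    {"page": 4, "category_id": 99},  # does not exist
]
missing = find_missing_ids(conn, "tag_category", [t["category_id"] for t in tags])
print(missing)  # [99]
```

If the returned list is non-empty, the request can be rejected before any Tag rows are created, at the cost of one query regardless of how many tags were sent.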

The problem is that you don't set the created tags on the book instance, so the serializer tries to fetch them when returning the response.
You need to set them on the book as a list:
def create(self, validated_data):
    with transaction.atomic():
        # Pop 'tags' before creating the book, with None as a default.
        # Otherwise the serializer raises an error if the request has no 'tags'.
        tags = validated_data.pop('tags', None)
        book = Book.objects.create(**validated_data)
        tags_to_create = []
        if tags:
            tags_to_create = [Tag(book=book, **tag) for tag in tags]
            Tag.objects.bulk_create(tags_to_create)
        # Here I set tags to the book instance
        setattr(book, 'tags', tags_to_create)
    return book
Provide a Meta.fields tuple for TagSerializer (it's odd that this serializer doesn't raise an error saying that a fields tuple is required):
class TagSerializer(serializers.ModelSerializer):
    class Meta:
        model = Tag
        fields = ('category', 'page',)
Prefetching tag.category should not be necessary in this case because it's just an id.
You will need to prefetch Book.tags for the GET method. The simplest solution is to create a static method on the serializer and use it in the viewset's get_queryset method like this:
class BookSerializer(serializers.ModelSerializer):
    ...

    @staticmethod
    def setup_eager_loading(queryset):  # It can be named anything you like
        queryset = queryset.prefetch_related('tags')
        return queryset

class BookViewSet(views.APIView):
    ...

    def get_queryset(self):
        self.queryset = BookSerializer.setup_eager_loading(self.queryset)
        # Every GET request will prefetch 'tags' for every book by default
        return super(BookViewSet, self).get_queryset()

The select_related function checks the ForeignKey the first time it is used.
This is actually a foreign key check in the relational database. In MySQL you can run SET FOREIGN_KEY_CHECKS=0; to disable the check.

Related

Django REST Framework | many-to-many relation returning "detail: not found"

I'm using Django REST framework. I've created a model for items in an inventory system and also a through table (named subassembly) for many-to-many relationship between items themselves, so an item can be a subpart of other items and vice versa.
I'm just not sure I've done it right and I can't seem to get any results. When I visit the backend at a URL such as http://localhost:8000/api/subassemblies/2/, the response is
{"detail": "Not found."}
but I'm hoping to see all of an item's subparts, or "children". PUT or any other type of request has the same outcome.
If it matters, when accessing the subassemblies from the admin page, I can create relationships between items just fine. But only one at a time though. And I need to be able to edit all of an item's subparts in one go (at least, frontend-wise). Currently, the request body is structured like so:
{
    "parent_id": 2,
    "children": [
        { "child_id": 5, "qty": 2 },
        { "child_id": 4, "qty": 3 }
    ]
}
This also allows me to use .set() on a particular item's children which is useful because I think it also removes any prior children that are not included in the new set.
views.py
class SubassemblyDetail(generics.RetrieveUpdateDestroyAPIView):
    """
    Retrieve, update, or delete a particular item's subassembly
    """
    queryset = Subassembly.objects.all()
    serializer_class = SubassemblySerializer

    def get_queryset(self):
        item_id = self.kwargs['pk']
        item = Item.objects.get(pk=item_id)
        return item.children.all()
models.py
class Item(models.Model):
    # ... (various other fields)
    children = models.ManyToManyField('self', through='Subassembly', blank=True)

class Subassembly(models.Model):
    parent_id = models.ForeignKey(Item, related_name='parent_item', on_delete=models.CASCADE)
    child_id = models.ForeignKey(Item, related_name='child_item', on_delete=models.CASCADE)
    child_qty = models.PositiveSmallIntegerField(default=1)
serializers.py
class SubassemblySerializer(ModelSerializer):
    parent_id = get_primary_key_related_model(ShortItemSerializer)
    child_id = get_primary_key_related_model(ShortItemSerializer)

    class Meta:
        model = Subassembly
        fields = '__all__'

    def update(self, instance, validated_data):
        children = validated_data.pop('children')
        child_list = []
        qty_list = []
        for child in children:
            pk = child['child_id']
            qty = child['qty']
            obj = Item.objects.get(id=pk)
            child_list.append(obj)
            qty_list.append(qty)
        instance.children.set(child_list, through_defaults={'qty': qty_list})
        instance.save()
        return instance

    def delete(self, instance):
        instance.children.clear()
For referencing the parent and child item objects, I'm using a mixin or something from this post so I can do so with just their primary keys.
Also, the item serializer code currently has no reference to the subassembly through table or its serializer.
I tried adding something like that but the error was still there and it also made the children field required when creating an item, which I don't want.
That code looked like this:
class ItemSerializer(ModelSerializer):
    # ... (other fields)
    children = SubassemblySerializer(many=True)

    def create(self, validated_data):
        return Item.objects.create(**validated_data)

    class Meta:
        model = Item
        fields = '__all__'
urls.py
urlpatterns = [
    path('', views.RouteList.as_view()),
    path('items/', views.ItemList.as_view(), name='item-list'),
    path('items/<int:pk>/', views.ItemDetail.as_view(), name='item-detail'),
    path('suppliers/', views.SupplierList.as_view(), name='supplier-list'),
    path('suppliers/<int:pk>/', views.SupplierDetail.as_view(), name='supplier-detail'),
    path('subassemblies/<int:pk>/', views.SubassemblyDetail.as_view(), name='subassembly-detail')
]
The other urls all work fine, it's just the subassemblies url that returns this error.

If I understand correctly, your problem involves retrieving children from an Item object. My suspicion is that the Item object isn't being retrieved. You can diagnose this as follows:
Enter the Django interactive shell using the following command:
python manage.py shell
From within the shell, do the following:
# Import Item
from <your_app_name>.models import Item
# Retrieve an Item object
item = Item.objects.get(id=2)
# Retrieve children
children = item.children.all()
# Get the count of children
print(children.count())
If children.count() is 0, your item object doesn't have any children. If you can rule this possibility out, your views or serializers are the issue.
From a cursory glance, the get_queryset() method in SubassemblyDetail returns an Item queryset, which is the likely issue. Under the hood, SubassemblyDetail executes the retrieve() method; see: https://www.cdrf.co/3.13/rest_framework.generics/RetrieveUpdateDestroyAPIView.html

Solved! Thanks to Hamster Hooey - the issue was that in my get_queryset() method under the SubassemblyDetail view, I was returning an item queryset rather than a subassembly queryset, which seems to be unacceptable to Django.
So rather than item.children.all(), I did Subassembly.objects.filter(parent_id=item_id) and it all showed up fine.

Id instead of String when displaying foreign key field in DRF

I'm trying to return the name of the pricing field but all I get is its foreign key id instead. What am I doing wrong here? I looked at some similar issues on here but I didn't find anything that resembled my situation.
class UserProfileSerializer(serializers.ModelSerializer):
    class Meta:
        model = UserProfile
        fields = (
            "assignedteams",
            "agent",
            "facility",
            "organisor",
            "avatar",
        )

class UserSubscriptionSerializer(serializers.ModelSerializer):
    class Meta:
        model = Subscription
        fields = (
            "user",
            "pricing",
            "status",
        )

class UserSerializer(UserDetailsSerializer):
    profile = UserProfileSerializer(source="userprofile")
    subscription = UserSubscriptionSerializer(source="usersubscription")

    class Meta(UserDetailsSerializer.Meta):
        fields = UserDetailsSerializer.Meta.fields + ('profile', 'subscription',)

    def update(self, instance, validated_data):
        userprofile_serializer = self.fields['profile']
        userprofile_instance = instance.userprofile
        userprofile_data = validated_data.pop('userprofile', {})
        usersubscription_serializer = self.fields['subscription']
        usersubscription_instance = instance.usersubscription
        usersubscription_data = validated_data.pop('usersubscription', {})
        # update the userprofile and usersubscription fields
        userprofile_serializer.update(userprofile_instance, userprofile_data)
        usersubscription_serializer.update(usersubscription_instance, usersubscription_data)
        instance = super().update(instance, validated_data)
        return instance

You have two options to solve this problem.
Option 1:
If you want to return only the name of your pricing model, you can use SlugRelatedField.
Example:
class UserSubscriptionSerializer(serializers.ModelSerializer):
    pricing = serializers.SlugRelatedField(slug_field='name', read_only=True)

    class Meta:
        model = Subscription
        fields = (
            "user",
            "pricing",
            "status",
        )
Option 2:
If you want to return the Pricing object, you can create a new ModelSerializer for your Pricing model and use it.
Example:
class PricingSerializer(serializers.ModelSerializer):
    class Meta:
        model = Pricing
        fields = ["id", "name"]

class UserSubscriptionSerializer(serializers.ModelSerializer):
    pricing = PricingSerializer(read_only=True)

    class Meta:
        model = Subscription
        fields = (
            "user",
            "pricing",
            "status",
        )
There are other options you can use, but you would need to explain more about your problem before I can help with those.

You can easily add a new field to the representation, or override the pricing field, when the data is serialized.
To do so, add the following code to your serializer:
class UserSubscriptionSerializer(serializers.ModelSerializer):
    class Meta:
        model = Subscription
        fields = (
            "user",
            "pricing",
            "status",
        )

    def to_representation(self, instance):
        data = super().to_representation(instance)
        data['pricing_name'] = instance.pricing.name  # or replace name with your pricing name field
        return data

Since you say pricing is returned as an FK id, I assume the pricing column in the Subscription model is a foreign key to another model; let's call it the Pricing model.
You can create a serializer for Pricing and use it in UserSubscriptionSerializer, the same way you created UserProfileSerializer and UserSubscriptionSerializer for UserSerializer.
However, using a nested serializer directly will cause problems on write operations, since as far as I can tell you accept pricing as an FK value when creating or updating.
To solve this, you can add a condition in the get_fields() method:
class UserSubscriptionSerializer(serializers.ModelSerializer):
    class Meta:
        model = Subscription
        fields = (
            "user",
            "pricing",
            "status",
        )

    def get_fields(self):
        fields = super().get_fields()
        # make sure request is passed through context
        if self.context['request'] and self.context['request'].method == 'GET':
            fields['pricing'] = PricingSerializer()
        return fields
Now, coming back to the question: since you only need the pricing name, which I assume is a name column on the Pricing model, simply rewrite the previous code as:
    def get_fields(self):
        fields = super().get_fields()
        # make sure request is passed through context
        if self.context['request'] and self.context['request'].method == 'GET':
            fields['pricing'] = serializers.CharField(source='pricing.name', read_only=True)
        return fields
P.S.: I haven't tested this code on my computer.

Django + Django Rest Framework: get correct related objects on intermediate model

I have an intermediate model with the following fields:
class UserSkill(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    skill = models.ForeignKey(Skill, on_delete=models.CASCADE, related_name='user_skills')
    disabled = models.BooleanField(default=False)
As you can see, it has two foreign keys, one to the auth user and one to another table called skill.
I am trying to get all Skills assigned to a specific user, so I use the following get_queryset in my ViewSet:
class AssignedSkillViewSet(viewsets.ModelViewSet):
    queryset = Skill.objects.all()
    serializer_class = AssignedSkillSerializer
    permission_classes = [permissions.IsAuthenticated]

    def get_queryset(self):
        user = self.request.user
        return Skill.objects.filter(user_skills__user=user, user_skills__disabled=False)
Now, I also need to include the intermediate model information in the API, which I can access through the user_skills related name in DRF's serializer, as follows:
class AssignedSkillSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = Skill
        fields = [
            'id',
            'url',
            'title',
            'description',
            'user_skills',
        ]
But when I try to get that information it returns ALL user_skills related to the assigned skill, no matter if they are assigned to other users. I need the related model information only for that user and that skill.
For Example:
If I have a skill named Math, and a user named Maria
related_skills = Skill.objects.filter(user_skills__user=user, user_skills__disabled=False).user_skills.all()
The above code will return:
[
<UserSkill: Math+Jenniffer>,
<UserSkill: Math+Gabriel>,
<UserSkill: Math+John>,
<UserSkill: Math+Maria>,
]
I only need to get the item <UserSkill: Math+Maria>. The list is not ordered in any way so getting the last item on the list does not work in all cases.
I know there is something I'm probably missing. I appreciate any help or clues you people can give me.

In this case you need to use a Prefetch object with a custom queryset that uses the same filters as your main queryset, like this:
from django.db.models import Prefetch

def get_queryset(self):
    user = self.request.user
    return Skill.objects.filter(
        user_skills__user=user,
        user_skills__disabled=False,
    ).prefetch_related(
        Prefetch(
            "user_skills",
            queryset=UserSkill.objects.filter(
                user=user,
                disabled=False,
            ),
        )
    )

I think that when you do the filter:
Skill.objects.filter(
    user_skills__user=user,  # condition_1
    user_skills__disabled=False,  # condition_2
).user_skills.all()
you have already run a query that involves the UserSkill model: the filter is applied to the Skill model, and condition_1 (user_skills__user=user) uses information from the UserSkill model to filter by user. But when you add .user_skills.all() at the end of the query, you are overriding that filter with all the data from the UserSkill model.
To get a list of UserSkill instances from the filter, you could try:
UserSkill.objects.filter(
    user__username="Maria",
    skill__title="Math",
)
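To make the point above concrete, here is a minimal sketch using the standard-library sqlite3 module instead of the Django ORM, so it can run on its own. The table and column names (skill, user_skill, username) are assumptions chosen to mirror the models in the question. It shows that filtering skills through the join does not restrict a later reverse lookup; the user filter has to be repeated on the intermediate table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE skill (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE user_skill (
        id INTEGER PRIMARY KEY,
        username TEXT,
        skill_id INTEGER REFERENCES skill(id),
        disabled INTEGER DEFAULT 0
    );
    INSERT INTO skill (id, title) VALUES (1, 'Math');
    INSERT INTO user_skill (username, skill_id) VALUES
        ('Jenniffer', 1), ('Gabriel', 1), ('John', 1), ('Maria', 1);
""")

# Step 1: skills assigned to Maria (the filter goes through the join).
skills = conn.execute(
    """SELECT DISTINCT s.id, s.title
       FROM skill s JOIN user_skill us ON us.skill_id = s.id
       WHERE us.username = ? AND us.disabled = 0""",
    ("Maria",),
).fetchall()

# Step 2a: an unfiltered reverse lookup returns every user's row for the skill.
all_rows = [
    row[0]
    for row in conn.execute(
        "SELECT username FROM user_skill WHERE skill_id = ? ORDER BY id", (1,)
    )
]

# Step 2b: repeating the user filter returns only Maria's row.
maria_rows = [
    row[0]
    for row in conn.execute(
        "SELECT username FROM user_skill "
        "WHERE skill_id = ? AND username = ? AND disabled = 0",
        (1, "Maria"),
    )
]

print(skills)      # [(1, 'Math')]
print(all_rows)    # ['Jenniffer', 'Gabriel', 'John', 'Maria']
print(maria_rows)  # ['Maria']
```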

Maybe this will help.
serializers.py
class SkillSerializer(serializers.ModelSerializer):
    class Meta:
        model = Skill
        fields = ['id', ...]

class UserSkillSerializer(serializers.ModelSerializer):
    # 'skill' is a single foreign key, so nest one serializer via source
    skill_detail = SkillSerializer(source='skill', read_only=True)

    class Meta:
        model = UserSkill
        fields = ['id', 'user', 'skill_detail']
views.py
class AssignedSkillViewSet(viewsets.ModelViewSet):
    queryset = UserSkill.objects.all()
    serializer_class = UserSkillSerializer
    permission_classes = [permissions.IsAuthenticated]

    def get_queryset(self):
        user = self.request.user
        return UserSkill.objects.filter(user=user, disabled=False)

How to serialize an array of objects in Django

I am working with Django and REST Framework and I am trying to create a get function for one of my Views and running into an error. The basic idea is that I am creating a market which can have multiple shops. For each shop there can be many products. So, I am trying to query all those products which exist in one shop. Once I get all those products I want to send it to my serializer which will finally return it as a JSON object. I have been able to make it work for one product but it does not work for an array of products.
My Product model looks like this:
class Product(models.Model):
    '''Product model to store the details of all the products'''
    # Define the fields of the product model
    name = models.CharField(max_length=100)
    price = models.IntegerField(default=0)
    quantity = models.IntegerField(default=0)
    description = models.CharField(max_length=200, default='', null=True, blank=True)
    image = models.ImageField(upload_to='uploads/images/products')
    category = models.ForeignKey(Category, on_delete=models.CASCADE, default=1)  # Foreign key to the Category model
    store = models.ForeignKey(Store, on_delete=models.CASCADE, default=1)

    ''' Filter functions for the product model '''
    # Create a static method to retrieve all products from the database
    @staticmethod
    def get_all_products():
        # Return all products
        return Product.objects.all()

    # Filter the data by store ID:
    @staticmethod
    def get_all_products_by_store(store_id):
        # Check if store ID was passed
        if store_id:
            return Product.objects.filter(store=store_id)
The product serializer that I built is as follows:-
class ProductSerializer(serializers.ModelSerializer):
    class Meta:
        model = Product
        fields = '__all__'
and the view that I created is below
class StoreView(generics.ListAPIView):
    """Store view which returns the store data as a Json file.
    """
    # Define class variables
    serializer_class = StoreSerializer

    # Manage a get request
    def get(self, request):
        # Get storeid for filtering from the page
        store_id = request.GET.get('id')
        if store_id:
            queryset = Product.get_all_products_by_store(store_id)
            # queryset = Product.get_all_products_by_store(store_id)[0]
        else:
            queryset = Product.get_all_products()
            # queryset = Product.get_all_products()[0]
        print("QUERYSET", queryset)
        return Response(ProductSerializer(queryset).data)
The above view gives me the following error
AttributeError at /market
Got AttributeError when attempting to get a value for field `name` on serializer `ProductSerializer`.
The serializer field might be named incorrectly and not match any attribute or key on the `QuerySet` instance.
Original exception text was: 'QuerySet' object has no attribute 'name'.
If, instead of queryset = Product.get_all_products_by_store(store_id), I use the commented-out line below it, which selects only the first result, I get the correct JSON response. But when there are multiple products, I am not able to serialize them. How do I make it work?

If you want to serialize more than one record, either use ListSerializer instead, or pass many=True to the constructor of ModelSerializer:
return Response(ProductSerializer(queryset, many=True).data)
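As a rough illustration of what many=True changes, here is a plain-Python model of the behaviour. It is not DRF's implementation; serialize, serialize_one and the Product stand-in class are hypothetical names. Without many, the serializer reads each field from the object it was given, so a collection fails with the same kind of AttributeError shown in the question.

```python
def serialize_one(obj, fields):
    """Read each field off a single object, like a serializer without many=True."""
    return {name: getattr(obj, name) for name in fields}


def serialize(data, fields, many=False):
    """With many=True, treat `data` as an iterable and serialize each item."""
    if many:
        return [serialize_one(item, fields) for item in data]
    return serialize_one(data, fields)


class Product:
    """Simple stand-in for a model instance."""

    def __init__(self, name, price):
        self.name = name
        self.price = price


products = [Product("apple", 3), Product("pear", 4)]

# Passing the whole collection without many=True fails on the first field.
try:
    serialize(products, ["name", "price"])
    error = None
except AttributeError as exc:
    error = str(exc)

result = serialize(products, ["name", "price"], many=True)
print(error)
print(result)
```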

I found the answer, thanks to @yedpodtrzitko for pointing me in the right direction.
I had to make two changes:
1. Define queryset outside the function.
2. Pass many=True to the constructor of ModelSerializer.
class StoreView(generics.ListAPIView):
    """Store view which returns the store data as a Json file.
    """
    # Define class variables
    queryset = []
    serializer_class = StoreSerializer

    # Manage a get request
    def get(self, request):
        # Get storeid for filtering from the page
        store_id = request.GET.get('id')
        if store_id:
            queryset = Product.get_all_products_by_store(store_id)
        else:
            queryset = Product.get_all_products()
        print("QUERYSET", queryset)
        return Response(ProductSerializer(queryset, many=True).data)

Django Rest Framework- can I allow pk id or full objects in a serializer's create method?

Let's say I have the following models:
class Author(models.Model):
    first_name = models.CharField(max_length=32)
    last_name = models.CharField(max_length=32)

class Book(models.Model):
    title = models.CharField(max_length=64)
    author = models.ForeignKey(Author, on_delete=models.CASCADE)
And I have the following serializer:
class BookSerializer(serializers.ModelSerializer):
    class Meta:
        model = Book
        fields = ('id', 'title', 'author')
        read_only_fields = ('id',)
If I then query my books, A book's data looks like:
{
"id": 1,
"title": "Book Title",
"author": 4
}
Which is what I want, as I return both an array of books, as well as an array of authors, and allow the client to join everything up. This is because I have many authors that are repeated across books.
However, I want to allow the client to either submit an existing author id to create a new book, or all of the data for a new author. E.g.:
Payload for new book with existing author:
{
"title": "New Book!",
"author": 7
}
or, payload for a new book with a new author:
{
"title": "New Book!",
"author": {
"first_name": "New",
"last_name": "Author"
}
}
However the second version, will not pass the data validation step in my serializer. Is there a way to override the validation step, to allow either an author id, or a full object? Then in my serializer's create method, I can check the type, and either create a new author, get its id, and create the new book, or just attach the existing id. Thoughts?

I believe it is not possible to do this the way you want (using the single field author), because one serializer field cannot handle two different types.
Note: I might be wrong about the previous statement.
However, the following is a potential solution. You just need to use a different field name for creating a new author.
class BookSerializer(serializers.ModelSerializer):
    author = serializers.PrimaryKeyRelatedField(
        required=False,
        queryset=Author.objects.all(),
    )
    author_add = AuthorSerializer(write_only=True, required=False)

    class Meta:
        model = Book
        fields = ('id', 'title', 'author', 'author_add')
        read_only_fields = ('id',)

    def create(self, validated_data):
        author_add_data = validated_data.pop('author_add', None)
        # Create the new author only when author_add was supplied
        if author_add_data is not None:
            validated_data['author'] = Author.objects.create(**author_add_data)
        return super().create(validated_data)
Note: you need to handle the case where both author and author_add are sent. You should probably add a check to the validation step and raise a ValidationError if both are provided.
Off-topic hint: you don't need to explicitly state read_only_fields = ('id',); primary keys are read-only.
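The check mentioned in the note above could look roughly like the following sketch. It is written as a plain function so it runs without DRF installed; the ValidationError class here is a local stand-in for rest_framework.serializers.ValidationError, and validate_author_fields is a hypothetical name. In a real serializer this logic would presumably live in the object-level validate(self, attrs) method.

```python
class ValidationError(Exception):
    """Local stand-in for rest_framework.serializers.ValidationError."""


def validate_author_fields(attrs):
    """Require exactly one of 'author' (existing id) or 'author_add' (new author data)."""
    has_existing = attrs.get("author") is not None
    has_new = attrs.get("author_add") is not None
    if has_existing and has_new:
        raise ValidationError("Provide either 'author' or 'author_add', not both.")
    if not has_existing and not has_new:
        raise ValidationError("One of 'author' or 'author_add' is required.")
    return attrs


# Existing author by id: accepted.
ok_existing = validate_author_fields({"title": "New Book!", "author": 7})

# New author data: accepted.
ok_new = validate_author_fields(
    {"title": "New Book!", "author_add": {"first_name": "New", "last_name": "Author"}}
)

# Both supplied: rejected.
try:
    validate_author_fields({"title": "New Book!", "author": 7, "author_add": {}})
    rejected_both = False
except ValidationError:
    rejected_both = True
```

Rejecting the request when neither field is present is an extra assumption here; the original note only asks for the both-provided case to be handled.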

For anyone else trying to do this, here is what I ended up getting working.
For my book serializer I did the following:
from collections import OrderedDict

from rest_framework.exceptions import ValidationError

class BookSerializer(serializers.ModelSerializer):
    # make author a foreign key/id, read-only field so that it isn't
    # processed by the validator, and on read returns just the id.
    class Meta:
        model = Book
        fields = ('id', 'title', 'author')
        read_only_fields = ('id', 'author',)

    # override run_validation to validate our author
    def run_validation(self, data):
        # run base validation. Since `author` is read_only, it will
        # be ignored.
        value = super().run_validation(data)
        # inject our validated author into the validated data
        value['author'] = self.validate_author(data['author'])
        return value

    # Custom author validation
    def validate_author(self, author):
        errors = OrderedDict()
        if isinstance(author, int):  # if just a key, retrieve the author
            try:
                author_instance = Author.objects.get(pk=author)
            except Author.DoesNotExist:
                errors['author'] = "Author with pk {} does not exist.".format(author)
                raise ValidationError(errors)
        else:  # if passed an author object...
            author_serializer = AuthorSerializer(data=author, many=False)
            author_serializer.is_valid(raise_exception=True)
            author_instance = author_serializer.save()
        return author_instance
I need to do a bit more error checking (e.g.- no author passed), but it works quite well- the consumer of the API can submit either an author id, or a serialized author object to create a new author. And the API itself returns just an id as was needed.
