I was wondering how I can decrease the number of calls to my database when serializing:
I have the following 2 models:
class House(models.Model):
name = models.CharField(max_length = 100, null = True, blank = True)
address = models.CharField(max_length = 500, null = True, blank = True)
class Room(models.Model):
house = models.ForeignKey(House)
name = models.CharField(max_length = 100)
There is 1 house, it can have multiple Room.
I am using django-rest-framework and trying to serialize all 3 things together at the house level.
class HouseSerializer(serializers.ModelSerializer)
rooms = serializers.SerializerMethodField('room_serializer')
def room_serializer(self):
rooms = Room.objects.filter(house_id = self.id) # we are in House serializer, so self is a house
return RoomSerializer(rooms).data
class Meta:
model = House
fields = ('id', 'name', 'address')
So now, for every house I want to serialize, I need to make a separate call for its Rooms. It works, but that's an extra call.
(imagine me trying to package a lot of stuff together!)
Now, if I had 100 houses, to serialize everything, I would need to make 100 Database hits, O(n) time
I know I can decrease this to 2 hits, if I can get all the information together. O(1) time
my_houses = Houses.objects.filter(name = "mine")
my_rooms = Rooms.objects.filter(house_id__in = [house.id for house in my_houses])
My question is how can I do this? and get the serializers to be happy?
Can I somehow do a loop after doing my two calls, to "attach" a Room to a House, then serialize it? (am I allowed to add an attribute like that?) If I can, how do i get my serializer to read it?
Please note that I do not need django-rest-serializer to allow me to change the attributes in the Rooms, this way. This is for GET only.
As it is currently writen, using a SerializerMethodField, you are making N+1 queries. I have covered this a few times on Stack Overflow for optimizing the database queries and in general, it's similar to how you would improve the performance in Django. You are dealing with a one-to-many relationship, which can be optimized the same way as many-to-many relationships with prefetch_related.
class HouseSerializer(serializers.ModelSerializer)
rooms = RoomSerializer(read_only=True, source="room_set", many=True)
class Meta:
model = House
fields = ('id', 'name', 'address', )
The change I made uses nested serializers instead of manually generating the serializer within a SerializerMethodField. I had restricted it to be read_only, as you mentioned you only need it for GET requests and writable serializers have problems in Django REST Framework 2.4.
As your reverse relationship for the Room -> House relationship has not been set, it is the default room_set. You can (and should) override this by setting the related_name on the ForeignKey field, and you would need to adjust the source accordingly.
In order to prevent the N+1 query issue, you will need to override the queryset on your view. In the case of a generic view, this would be done on the queryset attribute or within the get_queryset method like queyset = House.objects.prefetch_related('room_set'). This will request all of the related rooms alongisde the request for the House object, so instead of N+1 requests you will only have two requests.
Related
Using the Django ORM, how does one access data from related tables without effectively making a separate call for each record (or redundantly denormalizing data to make it more easily accessible)?
Say I have 3 Models:
class Tournament(models.Model):
name = models.CharField(max_length=250)
active = models.BooleanField(null=True,default=1)
class Team(models.Model):
name = models.CharField(max_length=250)
coach_name = models.CharField(max_length=250)
active = models.BooleanField(null=True,default=1)
class Player(models.Model):
user = models.ForeignKey(
settings.AUTH_USER_MODEL,
on_delete=models.DO_NOTHING
)
number = models.PositiveIntegerField()
age = models.PositiveIntegerField()
active = models.BooleanField(null=True,default=1)
Note that this Player model is important in the application as it's a major connection to most of the models - from registration to teams to stats to results to prizes. But this Player model doesn't actually contain the person's name as the model contains a user field which is the foreign key to a custom AUTH_USER_MODEL ('user') model which contains first/last name information. This allows the player to log in to the application and perform certain actions.
In addition to these base models, say that since a player can play on different teams in different tournaments, I also have a connecting ManyToMany model:
class PlayerToTeam(models.Model):
player = models.ForeignKey(
Player,
on_delete=models.DO_NOTHING
)
team = models.ForeignKey(
Team,
on_delete=models.DO_NOTHING
)
tournament = models.ForeignKey(
Tournament,
on_delete=models.DO_NOTHING
)
As an example of one of the challenges I'm encountering, let's say I'm trying to create a form that allows coaches to select their starting lineup. So I need my form to list the names of the Players on a particular Team at a particular Tournament.
Given the tournament and team IDs, I can easily pull back the necessary QuerySet to describe the initial records I'm interested in.
playersOnTeam = PlayerToTeam.objects.filter(tournament=[Tournament_id]).filter(team=[Team_id])
This returns the QuerySet of the IDs (but only the IDs) of the team, the tournament, and the players. However, the name data is two models away:
PlayerToTeam->[player_id]->Player->[user_id]->User->[first_name] [last_name]
Now, if I pull back only a single record, I could simply do
onlyPlayerOnTeam = PlayerToTeam.objects.filter(tournament=[Tournament_id]).filter(team=[Team_id]).filter(player=[Player_id]).get()
onlyPlayerOnTeam.player.user.first_name
So if I was only needing to display the names, I believe I could pass the QuerySet in the view return and loop through it in the template and display what I need. But I can't figure out if you can do something similar when I need the names to be displayed as part of a form.
To populate the form, I believe I could loop through the initial QuerySet and build a new datastructure:
playersOnTeam = PlayerToTeam.objects.filter(tournament=[Tournament_id]).filter(team=[Team_id])
allPlayersData= []
for nextPlayer in playersOnTeam:
playerDetails= {
"player_id": nextPlayer.player.id,
"first_name": nextPlayer.player.user.first_name,
"last_name": nextPlayer.player.user.last_name,
}
allPlayersData.append(playerDetails)
form = StartingLineupForm(allPlayersData)
However, I fear that would result in a separate database call for every player/user!
And while that may be tolerable for 6-10 players, for larger datasets, that seems less than ideal. Looping through performing a query for every user seems completely wrong.
Furthermore, what's frustrating is that this would be simple enough with a straight SQL query:
SELECT User.first_name, User.last_name
FROM PlayerToTeam
INNER JOIN Player ON PlayerToTeam.player_id = Player.id
INNER JOIN User ON Player.user_id = User.id
WHERE PlayerToTeam.tournament_id=[tourney_id] AND PlayerToTeam.team_id=[team_id]
But I'm trying to stick to the Django ORM best practices as much as I can and avoid just dropping to SQL queries when I can't immediately figure something out, and I'm all but certain that this isn't so complicated of a situation that I can't accomplish this without resorting to direct SQL queries.
I'm starting to look at select_related and prefetch_related, but I'm having trouble wrapping my head around how those work for relations more than a single table connection away. Like it seems like I could access the Player.age data using the prefetch, but I don't know how to get to User.first_name from that.
Any help would be appreciated.
I would suggest two approaches:
A) select related (one DB query):
objects = PlayerToTeam.objects.filter(
...
).select_related(
'player__user',
).only('player__user__name')
name = objects.first().user.name
B) annotate (one DB query):
objects = PlayerToTeam.objects.filter(
...
).annotate(
player_name=F('player__user__name'),
)
name = objects.first().player_name
To be sure you have only one object for specific player, team and tournament, I would suggest adding unique_together:
class PlayerToTeam(models.Model):
...
class Meta:
unique_together = ('player', 'team', 'tournament', )
I am using the Django Python Serializer to serialize a list of models which contain a many-to-many relationship. Even with prefetch_related, the serialization retrieves the prefetched-fields. For example:
class House(models.Model):
name = models.CharField(...)
rooms = models.ManyToManyField(Door)
class Room(models.Model):
name = models.CharField(...)
num_windows = models.PositiveIntegerField(...)
Using debug mode I can see that the following function makes the expected 2 database requests.
getHouses():
House.objects.all().prefetch_related('rooms')
However, when I attempt to serialize this object using the django.python.Serializer, it makes an additional query for the rooms in each house. Is there a way to configure the serializer to see the prefetched m2m relationships?
The only way to get rid of this is to build the map by yourself. You will need 2 seperate serializers, one for House and other for Room.
While serializing, first iterate on houses query and serialize it, then run Room serializer for house.rooms which gives you serialized room which can be put as another key in serialized house using house_serialized['rooms_serialized'] = rooms_serialized.
Hope this helps.
I have a simple model which includes a product and category table. The Product model has a foreign key Category.
When I make a tastypie API call that returns a list of categories /api/vi/categories/
I would like to add a field that determines the "product count" / the number of products that have a giving category. The result would be something like:
category_objects[
{
id: 53
name: Laptops
product_count: 7
},
...
]
The following code is working but the hit on my DB is heavy
def dehydrate(self, bundle):
category = Category.objects.get(pk=bundle.obj.id)
products = Product.objects.filter(category=category)
bundle.data['product_count'] = products.count()
return bundle
Is there a more efficient way to build this query? Perhaps with annotate ?
You can use prefetch_related method of QuerSet to reverse select_related.
Asper documentation,
prefetch_related(*lookups)
Returns a QuerySet that will automatically
retrieve, in a single batch, related objects for each of the specified
lookups.
This has a similar purpose to select_related, in that both are
designed to stop the deluge of database queries that is caused by
accessing related objects, but the strategy is quite different.
If you change your dehydrate function to following then database will be hit single time.
def dehydrate(self, bundle):
category = Category.objects.prefetch_related("product_set").get(pk=bundle.obj.id)
bundle.data['product_count'] = category.product_set.count()
return bundle
UPDATE 1
You should not initialize queryset inside dehydrate function. queryset should be always set in Meta class only. Please have a look at following example from django-tastypie documentation.
class MyResource(ModelResource):
class Meta:
queryset = User.objects.all()
excludes = ['email', 'password', 'is_staff', 'is_superuser']
def dehydrate(self, bundle):
# If they're requesting their own record, add in their email address.
if bundle.request.user.pk == bundle.obj.pk:
# Note that there isn't an ``email`` field on the ``Resource``.
# By this time, it doesn't matter, as the built data will no
# longer be checked against the fields on the ``Resource``.
bundle.data['email'] = bundle.obj.email
return bundle
As per official django-tastypie documentation on dehydrate() function,
dehydrate
The dehydrate method takes a now fully-populated bundle.data & make
any last alterations to it. This is useful for when a piece of data
might depend on more than one field, if you want to shove in extra
data that isn’t worth having its own field or if you want to
dynamically remove things from the data to be returned.
dehydrate() is only meant to make any last alterations to bundle.data.
Your code does additional count query for each category. You're right about annotate being helpfull in this kind of a problem.
Django will include all queryset's fields in GROUP BY statement. Notice .values() and empty .group_by() serve limiting field set to required fields.
cat_to_prod_count = dict(Product.objects
.values('category_id')
.order_by()
.annotate(product_count=Count('id'))
.values_list('category_id', 'product_count'))
The above dict object is a map [category_id -> product_count].
It can be used in dehydrate method:
bundle.data['product_count'] = cat_to_prod_count[bundle.obj.id]
If that doesn't help, try to keep similar counter on category records and use singals to keep it up to date.
Note categories are usually a tree-like beings and you probably want to keep count of all subcategories as well.
In that case look at the package django-mptt.
I've looked through Tastypie's documentation and searched for a while, but can't seem to find an answer to this.
Let's say that we've got two models: Student and Assignment, with a one-to-many relationship between them. The Assignment model includes an assignment_date field. Basically, I'd like to build an API using Tastypie that returns Student objects sorted by most recent assignment date. Whether the sorting is done on the server or in the client side doesn't matter - but wherever the sorting is done, the assignment_date is needed to sort by.
Idea #1: just return the assignments along with the students.
class StudentResource(ModelResource):
assignments = fields.OneToManyField(
AssignmentResource, 'assignments', full=True)
class Meta:
queryset = models.Student.objects.all()
resource_name = 'student'
Unfortunately, each student may have tens or hundreds of assignments, so this is bloated and unnecessary.
Idea #2: augment the data during the dehydrate cycle.
class StudentResource(ModelResource):
class Meta:
queryset = models.Student.objects.all()
resource_name = 'student'
def dehydrate(self, bundle):
bundle.data['last_assignment_date'] = (models.Assignment
.filter(student=bundle.data['id'])
.order_by('assignment_date')[0].assignment_date)
This is not ideal, since it'll be performing a separate database roundtrip for each student record. It's also not very declarative, nor elegant.
So, is there a good way to get this kind of functionality with Tastypie? Or is there a better way to do what I'm trying to achieve?
You can sort a ModelResource by a field name. Check out this part of the documentation http://django-tastypie.readthedocs.org/en/latest/resources.html#ordering
You could also set this ordering by default in the Model: https://docs.djangoproject.com/en/dev/ref/models/options/#ordering
Howdy. I'm working on migrating an internal system to Django and have run into a few wrinkles.
Intro
Our current system (a billing system) tracks double-entry bookkeeping while allowing users to enter data as invoices, expenses, etc.
Base Objects
So I have two base objects/models:
JournalEntry
JournalEntryItems
defined as follows:
class JournalEntry(models.Model):
gjID = models.AutoField(primary_key=True)
date = models.DateTimeField('entry date');
memo = models.CharField(max_length=100);
class JournalEntryItem(models.Model):
journalEntryID = models.AutoField(primary_key=True)
gjID = models.ForeignKey(JournalEntry, db_column='gjID')
amount = models.DecimalField(max_digits=10,decimal_places=2)
So far, so good. It works quite smoothly on the admin side (inlines work, etc.)
On to the next section.
We then have two more models
InvoiceEntry
InvoiceEntryItem
An InvoiceEntry is a superset of / it inherits from JournalEntry, so I've been using a OneToOneField (which is what we're using in the background on our current site). That works quite smoothly too.
class InvoiceEntry(JournalEntry):
invoiceID = models.AutoField(primary_key=True, db_column='invoiceID', verbose_name='')
journalEntry = models.OneToOneField(JournalEntry, parent_link=True, db_column='gjID')
client = models.ForeignKey(Client, db_column='clientID')
datePaid = models.DateTimeField(null=True, db_column='datePaid', blank=True, verbose_name='date paid')
Where I run into problems is when trying to add an InvoiceEntryItem (which inherits from JournalEntryItem) to an inline related to InvoiceEntry. I'm getting the error:
<class 'billing.models.InvoiceEntryItem'> has more than 1 ForeignKey to <class 'billing.models.InvoiceEntry'>
The way I see it, InvoiceEntryItem has a ForeignKey directly to InvoiceEntry. And it also has an indirect ForeignKey to InvoiceEntry through the JournalEntry 1->M JournalEntryItems relationship.
Here's the code I'm using at the moment.
class InvoiceEntryItem(JournalEntryItem):
invoiceEntryID = models.AutoField(primary_key=True, db_column='invoiceEntryID', verbose_name='')
invoiceEntry = models.ForeignKey(InvoiceEntry, related_name='invoiceEntries', db_column='invoiceID')
journalEntryItem = models.OneToOneField(JournalEntryItem, db_column='journalEntryID')
I've tried removing the journalEntryItem OneToOneField. Doing that then removes my ability to retrieve the dollar amount for this particular InvoiceEntryItem (which is only stored in journalEntryItem).
I've also tried removing the invoiceEntry ForeignKey relationship. Doing that removes the relationship that allows me to see the InvoiceEntry 1->M InvoiceEntryItems in the admin inline. All I see are blank fields (instead of the actual data that is currently stored in the DB).
It seems like option 2 is closer to what I want to do. But my inexperience with Django seems to be limiting me. I might be able to filter the larger pool of journal entries to see just invoice entries. But it would be really handy to think of these solely as invoices (instead of a subset of journal entries).
Any thoughts on how to do what I'm after?
First, inheriting from a model creates an automatic OneToOneField in the inherited model towards the parents so you don't need to add them. Remove them if you really want to use this form of model inheritance.
If you only want to share the member of the model, you can use Meta inheritance which will create the inherited columns in the table of your inherited model. This way would separate your JournalEntry in 2 tables though but it would be easy to retrieve only the invoices.
All fields in the superclass also exist on the subclass, so having an explicit relation is unnecessary.
Model inheritance in Django is terrible. Don't use it. Python doesn't need it anyway.