Using the Django ORM, how does one access data from related tables without effectively making a separate call for each record (or redundantly denormalizing data to make it more easily accessible)?
Say I have 3 Models:
class Tournament(models.Model):
    name = models.CharField(max_length=250)
    active = models.BooleanField(null=True, default=1)

class Team(models.Model):
    name = models.CharField(max_length=250)
    coach_name = models.CharField(max_length=250)
    active = models.BooleanField(null=True, default=1)

class Player(models.Model):
    user = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.DO_NOTHING
    )
    number = models.PositiveIntegerField()
    age = models.PositiveIntegerField()
    active = models.BooleanField(null=True, default=1)
Note that this Player model is important in the application as it's a major connection to most of the models - from registration to teams to stats to results to prizes. But this Player model doesn't actually contain the person's name as the model contains a user field which is the foreign key to a custom AUTH_USER_MODEL ('user') model which contains first/last name information. This allows the player to log in to the application and perform certain actions.
In addition to these base models, say that since a player can play on different teams in different tournaments, I also have a connecting ManyToMany model:
class PlayerToTeam(models.Model):
    player = models.ForeignKey(
        Player,
        on_delete=models.DO_NOTHING
    )
    team = models.ForeignKey(
        Team,
        on_delete=models.DO_NOTHING
    )
    tournament = models.ForeignKey(
        Tournament,
        on_delete=models.DO_NOTHING
    )
As an example of one of the challenges I'm encountering, let's say I'm trying to create a form that allows coaches to select their starting lineup. So I need my form to list the names of the Players on a particular Team at a particular Tournament.
Given the tournament and team IDs, I can easily pull back the necessary QuerySet to describe the initial records I'm interested in.
playersOnTeam = PlayerToTeam.objects.filter(tournament=[Tournament_id]).filter(team=[Team_id])
This returns the QuerySet of the IDs (but only the IDs) of the team, the tournament, and the players. However, the name data is two models away:
PlayerToTeam->[player_id]->Player->[user_id]->User->[first_name] [last_name]
Now, if I pull back only a single record, I could simply do
onlyPlayerOnTeam = PlayerToTeam.objects.filter(tournament=[Tournament_id]).filter(team=[Team_id]).filter(player=[Player_id]).get()
onlyPlayerOnTeam.player.user.first_name
So if I was only needing to display the names, I believe I could pass the QuerySet in the view return and loop through it in the template and display what I need. But I can't figure out if you can do something similar when I need the names to be displayed as part of a form.
To populate the form, I believe I could loop through the initial QuerySet and build a new datastructure:
playersOnTeam = PlayerToTeam.objects.filter(tournament=[Tournament_id]).filter(team=[Team_id])
allPlayersData = []
for nextPlayer in playersOnTeam:
    playerDetails = {
        "player_id": nextPlayer.player.id,
        "first_name": nextPlayer.player.user.first_name,
        "last_name": nextPlayer.player.user.last_name,
    }
    allPlayersData.append(playerDetails)
form = StartingLineupForm(allPlayersData)
However, I fear that would result in a separate database call for every player/user!
And while that may be tolerable for 6-10 players, for larger datasets, that seems less than ideal. Looping through performing a query for every user seems completely wrong.
Furthermore, what's frustrating is that this would be simple enough with a straight SQL query:
SELECT User.first_name, User.last_name
FROM PlayerToTeam
INNER JOIN Player ON PlayerToTeam.player_id = Player.id
INNER JOIN User ON Player.user_id = User.id
WHERE PlayerToTeam.tournament_id=[tourney_id] AND PlayerToTeam.team_id=[team_id]
But I'm trying to stick to the Django ORM best practices as much as I can and avoid just dropping to SQL queries when I can't immediately figure something out, and I'm all but certain that this isn't so complicated of a situation that I can't accomplish this without resorting to direct SQL queries.
I'm starting to look at select_related and prefetch_related, but I'm having trouble wrapping my head around how those work for relations more than a single table connection away. Like it seems like I could access the Player.age data using the prefetch, but I don't know how to get to User.first_name from that.
Any help would be appreciated.
I would suggest two approaches:
A) select related (one DB query):
objects = PlayerToTeam.objects.filter(
    ...
).select_related(
    'player__user',
).only('player__user__first_name')
# The user is reached through the player, so follow both relations.
name = objects.first().player.user.first_name
B) annotate (one DB query):
from django.db.models import F

objects = PlayerToTeam.objects.filter(
    ...
).annotate(
    player_name=F('player__user__first_name'),
)
name = objects.first().player_name
To be sure you have only one object for specific player, team and tournament, I would suggest adding unique_together:
class PlayerToTeam(models.Model):
    ...
    class Meta:
        unique_together = ('player', 'team', 'tournament', )
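To make the difference concrete, here is a minimal sketch that uses only the standard-library sqlite3 module instead of Django. The table names, column names and rows are hypothetical stand-ins for what Django would create for User, Player and PlayerToTeam. It compares the per-row lookups the question is worried about with the kind of single JOIN that select_related('player__user') is meant to produce.

```python
import sqlite3

# Hypothetical stand-ins for the User, Player and PlayerToTeam tables;
# the names and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE auth_user (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE player (id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE playertoteam (id INTEGER PRIMARY KEY, player_id INTEGER,
                           team_id INTEGER, tournament_id INTEGER);
INSERT INTO auth_user VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim'), (3, 'Cy', 'Ray');
INSERT INTO player VALUES (10, 1), (11, 2), (12, 3);
INSERT INTO playertoteam VALUES (100, 10, 7, 5), (101, 11, 7, 5), (102, 12, 8, 5);
""")


def names_per_row(tournament_id, team_id):
    """Follow each relation with its own query, as a lazy loop would."""
    queries = 1
    links = conn.execute(
        "SELECT player_id FROM playertoteam "
        "WHERE tournament_id = ? AND team_id = ? ORDER BY id",
        (tournament_id, team_id),
    ).fetchall()
    names = []
    for (player_id,) in links:
        (user_id,) = conn.execute(
            "SELECT user_id FROM player WHERE id = ?", (player_id,)
        ).fetchone()
        names.append(conn.execute(
            "SELECT first_name, last_name FROM auth_user WHERE id = ?",
            (user_id,),
        ).fetchone())
        queries += 2  # one query for the player, one for the user
    return names, queries


def names_joined(tournament_id, team_id):
    """Fetch the same names with one JOIN across both relations."""
    names = conn.execute(
        "SELECT u.first_name, u.last_name FROM playertoteam pt "
        "JOIN player p ON pt.player_id = p.id "
        "JOIN auth_user u ON p.user_id = u.id "
        "WHERE pt.tournament_id = ? AND pt.team_id = ? ORDER BY pt.id",
        (tournament_id, team_id),
    ).fetchall()
    return names, 1


print(names_per_row(5, 7))  # names plus the number of queries issued
print(names_joined(5, 7))   # same names from a single query
```

With two players on the team, the loop issues five statements and the join issues one; the gap grows by two statements for every extra player.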
Related
Django newbie here!
I am coming from a .NET background, and I am frustrated trying to work out how to do the following simple thing:
My simplified models are as follows
class Circle(BaseClass):
    name = models.CharField("Name", max_length=2048, blank=False, null=False)
    active = models.BooleanField(default=False)
    ...

class CircleParticipant(BaseClass):
    circle = models.ForeignKey(Circle, on_delete=models.CASCADE, null=True, blank=True)
    user = models.ForeignKey(User, on_delete=models.SET_NULL, null=True, blank=True)
    status = models.CharField("Status", max_length=256, blank=False, null=False)
    ...

class User(AbstractBaseUser, PermissionsMixin):
    email = models.EmailField(verbose_name="Email", unique=True, max_length=255, validators=[email_validator])
    first_name = models.CharField(verbose_name="First name", max_length=30, default="first")
    last_name = models.CharField(verbose_name="Last name", max_length=30, default="last")
    ...
My goal is to get a single circle with participants that include the users as well. With the extra requirement to do all that in a single DB trip.
in SQL terms I want to accomplish this:
SELECT circle.name, circle.active, circle_participant.status, user.email, user.first_name, user.last_name
FROM circle
JOIN circle_participant ON circle.id = circle_participant.circle_id
JOIN user ON user.id = circle_participant.user_id
WHERE circle.id = 43
I've tried the following:
Circle.objects.filter(id=43) \
.prefetch_related(Prefetch('circleparticipant_set', queryset=CircleParticipant.objects.prefetch_related('user')))
This is supposed to be working but when I check the query property on that statement it returns
SELECT "circle"."id", "circle"."created", "circle"."updated", "circle"."name", "circle"."active", FROM "circle" WHERE "circle"."id" = 43
(additional fields omitted for brevity.)
Am I missing something or is the query property incorrect?
More importantly how can I achieve fetching all that data with a single DB trip.
For reference here's how to do it in .NET Entity Framework
dbContext.Circle
.Filter(x => x.id == 43)
.Include(x => x.CircleParticipants) // This will exist in the entity/model
.ThenInclude(x => x.User)
prefetch_related uses a second query to reduce bandwidth; with a single JOIN, the data for the same Circle and CircleParticipants would be repeated multiple times. Your CircleParticipant acts as a junction table, however, so you can use select_related for the user inside the Prefetch:
Circle.objects.filter(id=43).prefetch_related(
    Prefetch('circleparticipant_set', queryset=CircleParticipant.objects.select_related('user'))
)
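As a rough illustration of what that combination amounts to, the sketch below uses the standard-library sqlite3 module rather than Django. The table names, columns and rows are assumptions made up for the example, and the exact SQL Django emits will differ (for instance, it filters the second query with an IN clause). The shape is the point: one query for the circle, one query that joins participants to users, and a merge in Python.

```python
import sqlite3

# Hypothetical tables standing in for Circle, User and CircleParticipant.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE circle (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
CREATE TABLE app_user (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE circle_participant (id INTEGER PRIMARY KEY, circle_id INTEGER,
                                 user_id INTEGER, status TEXT);
INSERT INTO circle VALUES (43, 'Readers', 1);
INSERT INTO app_user VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO circle_participant VALUES (1, 43, 1, 'accepted'), (2, 43, 2, 'invited');
""")


def load_circle(circle_id):
    """Load one circle with its participants and users in two queries."""
    # Query 1: the circle row itself.
    row = conn.execute(
        "SELECT id, name, active FROM circle WHERE id = ?", (circle_id,)
    ).fetchone()
    circle = {"id": row[0], "name": row[1], "active": bool(row[2])}

    # Query 2: participants joined to their users (the select_related part).
    rows = conn.execute(
        "SELECT cp.status, u.email FROM circle_participant cp "
        "JOIN app_user u ON u.id = cp.user_id "
        "WHERE cp.circle_id = ? ORDER BY cp.id",
        (circle_id,),
    ).fetchall()

    # Merge in memory, which is the step prefetch_related does for you.
    circle["participants"] = [{"status": s, "email": e} for s, e in rows]
    return circle


print(load_circle(43))
```

The circle's own columns are transferred once instead of once per participant, which is the bandwidth saving described above.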
Am I missing something or is the query property incorrect?
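There are two ways that Django gives you to solve the SELECT N+1 problem. The first is prefetch_related(), which runs a separate query for each relation and joins the results in memory. The query property shows only the first of those queries, which is why the participants do not appear in it. The second is select_related(), which creates a SQL join but only follows foreign-key and one-to-one relations, so it cannot be used for a reverse relation such as circleparticipant_set. (You also haven't set related_name on any of your foreign keys. That is not required for select_related() or prefetch_related(); without it, the reverse accessor simply defaults to circleparticipant_set.)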
More importantly how can I achieve fetching all that data with a single DB trip.
I would suggest that you not worry too much about doing it all in one query. One of the downsides of doing this in one query as you suggest is that lots of the data that comes back will be redundant. For example, the circle.name column will be the same for every row in the table which is returned.
You should absolutely care about how many queries you do - but only to the extent that you avoid a SELECT N+1 problem. If you're doing one query for each model class involved, that's pretty good.
If you care strongly about SQL performance, I also recommend the tool Django Debug Toolbar, which can show you the number of queries, the exact SQL, and the time taken by each.
in SQL terms I want to accomplish this:
There are a few ways you could accomplish that.
Use many-to-many
Django has a field which can be used to create a many-to-many relationship. It's called ManyToManyField. It will implicitly create a many-to-many table to represent the relationship, and some helper methods to allow you to easily query for all circles a user is in, or all users that a circle has.
You're also attaching some metadata to each user/circle relationship. That means you'll need to define an explicit table using ManyToManyField.through.
There are examples in the docs here.
Use a related model query
If I specifically wanted a join, and not a subquery, I would query the users like this:
User.objects.filter(circleparticipant__circle_id=43)
Use a subquery
This also creates only one query, but it uses a subquery instead.
User.objects.filter(circleparticipant__in=CircleParticipant.objects.filter(circle_id=43))
I have two models UserProfile and ChatUser.
ChatUser.models.py
class ChatUser(models.Model):
    chat = models.ForeignKey(ChatRoom, on_delete=models.CASCADE)
    user = models.ForeignKey(User, on_delete=models.CASCADE)
UserProfile.models.py
class UserProfile(models.Model):
    user = models.OneToOneField(User, on_delete=models.CASCADE)
    phone_number = models.IntegerField(default=0)
    image = models.ImageField(upload_to='profile_image', blank=True, default='prof1.jpeg')
    gender = models.CharField(max_length=10)
    joined = JSONField(null=True)
ChatRoom.models
class ChatRoom(models.Model):
    eid = models.CharField(max_length=64, unique=True)
    name = models.CharField(max_length=100)
    location = models.CharField(max_length=50)
    vehicle = models.CharField(max_length=50)
    brand = models.CharField(max_length=50)
    max_limit = models.IntegerField()
joined in UserProfile is an array of room ids from the ChatRoom model. When I delete a ChatRoom row, the ChatUser objects that reference it are deleted automatically, since I am using on_delete=models.CASCADE. But how do I update joined in the UserProfile model? I want to remove the id of the deleted ChatRoom from UserProfile.joined.
I have used the django.db.models.signals to solve the updating part.
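from django.db.models.signals import post_delete
from django.dispatch import receiver

@receiver(post_delete, sender=ChatUser)
def update_profile(sender, instance, **kwargs):
    # Remove the deleted chat room's id from the user's joined list.
    id = instance.chat_id
    joined = instance.user.userprofile.joined
    if id in joined:
        joined.remove(id)
        model = profiles.models.UserProfile.objects.filter(user_id=instance.user.id).update(joined=joined)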
SDRJ and Willem Van OnSem, thank you for your suggestions
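One detail worth guarding against: joined is declared with null=True, so it may be None for a profile that has never joined a room, and a membership test on None raises a TypeError. A small helper along these lines (the name without_room is hypothetical, and this is only a sketch of the list handling, not of the signal wiring) keeps that logic separate and easy to test:

```python
def without_room(joined, room_id):
    """Return a copy of `joined` without `room_id`.

    `joined` may be None (the JSON field is nullable), which is treated
    as an empty list. The input list is not modified.
    """
    return [rid for rid in (joined or []) if rid != room_id]


print(without_room([3, 5, 8], 5))
print(without_room(None, 5))
```

The handler could then save the helper's result instead of mutating the list in place.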
@SAI SANTOSH CHIRAG: Please explain this. You have a ChatUser model that stores user_id and chatroom_id. If I need to find out the list of chatrooms a user has joined, I can simply query this model. If I want to find out the total number of users in a specific chatroom, I can still query this table. Why do I need to keep track of joined in UserProfile? I am basing this on the premise that joined keeps track of the chatroom ids that a user has joined.
At any point, if you choose to add a many-to-many field in any of the models then this is my opinion. E.g Let's assume that you add the following in the UserProfile model
chatroom = models.ManyToManyField(ChatRoom)
Imagine as the number of chatrooms the user joins grows, the list becomes larger and larger and I find it inconvenient because I will have this tiny scroll bar with a large list. It's not wrong but I simply stay away from M2M field for this purpose especially if I expect my list to grow as my application scales.
I prefer the ChatUser approach that you used. Yes, I might have repeating rows of user_ids or repeating chatroom_ids but I don't mind. I can live with it. It's still a bit cleaner to me. And this is simply my opinion. Feel free to disagree.
Lastly, I would rename the ChatUser model to ChatRoomUser...Why? Just by the name of it, I can infer it has something to do with two entities Chatroom and User.
I was wondering how I can decrease the number of calls to my database when serializing:
I have the following 2 models:
class House(models.Model):
    name = models.CharField(max_length=100, null=True, blank=True)
    address = models.CharField(max_length=500, null=True, blank=True)

class Room(models.Model):
    house = models.ForeignKey(House)
    name = models.CharField(max_length=100)
There is 1 house, it can have multiple Room.
I am using django-rest-framework and trying to serialize all 3 things together at the house level.
class HouseSerializer(serializers.ModelSerializer):
    rooms = serializers.SerializerMethodField('room_serializer')

    def room_serializer(self, house):
        # The House being serialized is passed in as the second argument.
        rooms = Room.objects.filter(house_id=house.id)
        return RoomSerializer(rooms, many=True).data

    class Meta:
        model = House
        fields = ('id', 'name', 'address', 'rooms')
So now, for every house I want to serialize, I need to make a separate call for its Rooms. It works, but that's an extra call.
(imagine me trying to package a lot of stuff together!)
Now, if I had 100 houses, to serialize everything, I would need to make 100 Database hits, O(n) time
I know I can decrease this to 2 hits, if I can get all the information together. O(1) time
my_houses = House.objects.filter(name="mine")
my_rooms = Room.objects.filter(house_id__in=[house.id for house in my_houses])
My question is how can I do this? and get the serializers to be happy?
Can I somehow do a loop after doing my two calls, to "attach" a Room to a House, then serialize it? (am I allowed to add an attribute like that?) If I can, how do i get my serializer to read it?
Please note that I do not need django-rest-serializer to allow me to change the attributes in the Rooms, this way. This is for GET only.
As it is currently written, using a SerializerMethodField, you are making N+1 queries. I have covered this a few times on Stack Overflow for optimizing the database queries and in general, it's similar to how you would improve the performance in Django. You are dealing with a one-to-many relationship, which can be optimized the same way as many-to-many relationships with prefetch_related.
class HouseSerializer(serializers.ModelSerializer):
    rooms = RoomSerializer(read_only=True, source="room_set", many=True)

    class Meta:
        model = House
        fields = ('id', 'name', 'address', 'rooms', )
The change I made uses nested serializers instead of manually generating the serializer within a SerializerMethodField. I had restricted it to be read_only, as you mentioned you only need it for GET requests and writable serializers have problems in Django REST Framework 2.4.
As your reverse relationship for the Room -> House relationship has not been set, it is the default room_set. You can (and should) override this by setting the related_name on the ForeignKey field, and you would need to adjust the source accordingly.
In order to prevent the N+1 query issue, you will need to override the queryset on your view. In the case of a generic view, this would be done on the queryset attribute or within the get_queryset method, like queryset = House.objects.prefetch_related('room_set'). This will request all of the related rooms alongside the request for the House objects, so instead of N+1 requests you will only have two requests.
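For intuition about the "attach the rooms in a loop" idea from the question, here is a plain-Python sketch of the in-memory step that happens after the two queries have run. The dictionaries stand in for rows returned by the House and Room queries, and the nest_rooms name and sample data are hypothetical. With prefetch_related and the nested serializer you do not need to write this yourself.

```python
from collections import defaultdict


def nest_rooms(houses, rooms):
    """Group rooms by house_id, then attach each group to its house."""
    rooms_by_house = defaultdict(list)
    for room in rooms:
        rooms_by_house[room["house_id"]].append(
            {"id": room["id"], "name": room["name"]}
        )
    # One pass over the houses; no per-house lookup against the database.
    return [
        {**house, "rooms": rooms_by_house.get(house["id"], [])}
        for house in houses
    ]


# Made-up rows standing in for the results of the two queries.
houses = [
    {"id": 1, "name": "mine", "address": "1 Elm St"},
    {"id": 2, "name": "mine", "address": "2 Oak Ave"},
]
rooms = [
    {"id": 10, "house_id": 1, "name": "Kitchen"},
    {"id": 11, "house_id": 1, "name": "Den"},
]

for house in nest_rooms(houses, rooms):
    print(house)
```

The grouping is linear in the number of rooms, so the cost stays at two queries plus one pass over the results regardless of how many houses are serialized.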
What I have
I have an app, that archives tournaments in the game of chess. The app includes the following models:
class Tournament(models.Model):
    name = models.CharField(max_length=128)

class Player(models.Model):
    name = models.CharField(max_length=128)

# Abstract base class
class Match(models.Model):
    tournament = models.ForeignKey(Tournament)
    playerA = models.ForeignKey(Player, related_name='%(class)s_A')  # eg. mastertournament_A
    playerB = models.ForeignKey(Player, related_name='%(class)s_B')
    score = models.CharField(max_length=16)

    class Meta:
        abstract = True

# here are tables of ``Match`` instances played out in a particular
# tournaments. All ``Match`` instances share the same fields
# so, I could also have one big table for all matches but I want to keep
# each Tournament in separate table for easiness.
class MasterTournament(Match):
    pass

class AmateurTournament(Match):
    pass
Now, I plan to have two different views: tournament_view (lists all matches played in a tournament) and player_view (lists all matches a player played throughout all tournaments)
Problem to solve
Given the views I mentioned, I need to perform two different queries for each.
In a tournament_view I will have filters (Choice Filter) playerA and playerB and I need to dynamically populate choices for them. This can easily be done with:
playersA_all = MasterTournament.objects.values_list('playerA')
playersB_all = MasterTournament.objects.values_list('playerB')
However, I am struggling to come up with the query for player_view. This view is very similar with Choice Filters playerA and playerB but now, for the choices I need to query all Tournament tables to get all opponents of the player who is being viewed. This will result in a bunch of database hits each time and in the process I'll need to introduce a temporary list to save and append results from different tables.
That's why I am feeling like I need to reorganize my models, but the only solution that comes to my mind is to have that huge one table with all tournaments' matches packed together, something I wanted to prevent from happening.
My question is, do you have any ideas how to tweak my models, or perhaps django does provide a solution to perform the query I need for player_view?
I've actually done something like this before, though I wasn't using Django to do it. The concept of getting all the opponents is a problem when the number of matches gets large. I was able to leverage my solution to also keep track of wins and losses, without having to calculate on the fly.
See www.eurosportscoreboard.com.
Anyway, the way I solved it was with triggers. You could do the same with a save signal.
Create an Opponent model with a fk relationship with Player and Match. When a Match is saved, create an Opponent for each player. The write will be a little slow, but the reads will be very fast.
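The write-time bookkeeping can be sketched in plain Python. The OpponentIndex class below is a hypothetical in-memory stand-in for the Opponent table, and on_match_saved plays the role of the save signal handler; the names and sample players are made up for illustration.

```python
from collections import defaultdict


class OpponentIndex:
    """In-memory stand-in for an Opponent table keyed by player."""

    def __init__(self):
        # player -> set of (opponent, match_id) pairs
        self._rows = defaultdict(set)

    def on_match_saved(self, match_id, player_a, player_b):
        """Record one row per player, as a save handler would."""
        self._rows[player_a].add((player_b, match_id))
        self._rows[player_b].add((player_a, match_id))

    def opponents_of(self, player):
        """Read side: a single lookup, no scan over the matches."""
        return sorted({opponent for opponent, _ in self._rows.get(player, ())})


index = OpponentIndex()
index.on_match_saved(1, "ann", "bo")
index.on_match_saved(2, "ann", "cy")
index.on_match_saved(3, "bo", "ann")
print(index.opponents_of("ann"))
```

Each save does two extra writes, and in exchange the opponents of a player come from one keyed lookup no matter which tournament table the match lives in.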
Instead of having two ForeignKey fields, have one ManyToMany Field:
class Match(models.Model):
    tournament = models.ForeignKey(Tournament)
    players = models.ManyToManyField(Player, through='Participate')
    score = models.CharField(max_length=16)

class Participate(models.Model):
    player = models.ForeignKey(Player)
    match = models.ForeignKey(Match)
    visitor = models.BooleanField()
I think it solves most of your problem, and also makes a lot more sense, since there's no point in defining one as A and one is B. Both are players, there's nothing exceptionally distinguishable between them.
You can't query multiple tables with a single queryset! You can make several queries and union the results.
# so, I could also have one big table for all matches but I want to keep
# each Tournament in separate table for easiness.
This is a bad decision if you will need information from both tables in query results. Think about how you will query (with 2 tables) for one tournament's matches, or positive-score matches, or matches with specific players: you must query both tables and union the results, which doubles the DB load. In this case, I think you should create one table for matches and add one field for the match type, Master or Amateur:
class Match(models.Model):
    tournament = models.ForeignKey(Tournament)
    playerA = models.ForeignKey(Player, related_name='%(class)s_A')  # eg. mastertournament_A
    playerB = models.ForeignKey(Player, related_name='%(class)s_B')
    score = models.CharField(max_length=16)
    master_or_amauter = models.BooleanField(default=True)  # master by default
And with one table you have no problem in player_view...
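To show why player_view becomes simple with one table, here is a sketch using the standard-library sqlite3 module. The chess_match table, its columns and its rows are assumptions for the example, not the schema Django would generate. A single statement returns every opponent of a player across all tournaments, whichever side the player was on.

```python
import sqlite3

# Hypothetical single table holding master and amateur matches together.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chess_match (id INTEGER PRIMARY KEY, tournament_id INTEGER,
                          player_a_id INTEGER, player_b_id INTEGER,
                          score TEXT, is_master INTEGER);
INSERT INTO chess_match VALUES
    (1, 1, 10, 11, '1-0', 1),
    (2, 1, 12, 10, '0-1', 1),
    (3, 2, 10, 11, '1/2-1/2', 0),
    (4, 2, 11, 12, '1-0', 0);
""")


def opponents_of(player_id):
    """Return the distinct opponents of a player with one query."""
    rows = conn.execute(
        "SELECT DISTINCT CASE WHEN player_a_id = ? THEN player_b_id "
        "ELSE player_a_id END AS opponent_id "
        "FROM chess_match "
        "WHERE player_a_id = ? OR player_b_id = ? "
        "ORDER BY opponent_id",
        (player_id, player_id, player_id),
    ).fetchall()
    return [row[0] for row in rows]


print(opponents_of(10))
```

With separate tables per tournament type, the same question needs one such query per table plus a merge of the results.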
I am creating a web application to manage robotics teams for our area. In the application I have a django model that looks like this:
class TeamFormNote(models.Model):
    team = models.ForeignKey(Team, blank=True, null=True)
    member = models.ForeignKey(TeamMember, blank=True, null=True)
    notes = models.TextField()

    def __unicode__(self):
        if self.team:
            return "Team Form Record: " + unicode(self.team)
        if self.member:
            return "Member Form Record: " + unicode(self.member)
Essentially, I want it to have a relationship with team or a relationship with member, but not both. Is there a way to enforce this?
I can only see two viable solutions. The first is the one @mariodev suggested in the comments, which is to use a generic foreign key. That will look something like:
# make sure to change the app name
ALLOWED_RELATIONSHIPS = models.Q(app_label='app_name', model='team') | models.Q(app_label='app_name', model='teammember')

class TeamFormNote(models.Model):
    content_type = models.ForeignKey(ContentType, limit_choices_to=ALLOWED_RELATIONSHIPS)
    relation_id = models.PositiveIntegerField()
    relation = generic.GenericForeignKey('content_type', 'relation_id')
What that does is it sets up a generic foreign key which will allow you to link to any other model within your project. Since it can link to any other model, to restrict it to only the models you need, I use the limit_choices_to parameter of the ForeignKey. This will solve your problem since there is only one generic foreign key hence there is no way multiple relationships will be created. The disadvantage is that you cannot easily apply joins to generic foreign keys so you will not be able to do things like:
Team.objects.filter(teamformnote__notes__contains='foo')
The second approach is to leave the model as it is and manually go into the database backend and add a db constraint. For example, in Postgres:
ALTER TABLE foo ADD CONSTRAINT bar CHECK ...;
This will work however it will not be transparent to your code.
This sounds like a malformed object model under the hood...
How about an abstract class which defines all common elements and two derived classes, one for team and one for member?
If you are running into trouble with this because you want to have both referenced in the same table, you can use Generic Relations.