Suppose I have the following models:
class User(models.Model):
    # ... some fields
class Tag(models.Model):
    # ... some fields
class UserTag(models.Model):
    user = models.ForeignKey(User, related_name='tags')
    tag = models.ForeignKey(Tag, related_name='users')
    date_removed = models.DateTimeField(null=True, blank=True)
Now let's say I want to get all the users that have a given tag that has not yet been removed (i.e. date_removed=None). If I didn't have to worry about the date_removed constraint, I could do:
User.objects.filter(tags__tag=given_tag)
But I want to get all users who have that given tag and have the tag without a date_removed on it. Is there an easy way in Django to get that in a single queryset? And assume I have millions of Users, so getting any sort of list of User IDs and keeping it in memory is not practical.
Your filter() call can include multiple constraints:
User.objects.filter(tags__tag=given_tag, tags__date_removed=None)
When both conditions are passed to a single filter() call, they must both match on the same UserTag row, not on two possibly different ones.
See the documentation on spanning multi-valued relationships; in particular, the difference between filter(a, b) and filter(a).filter(b).
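To make that difference concrete, here is a minimal sketch in plain SQL through sqlite3, so it runs without Django. The table names, column names and sample rows are made up for illustration, and the two queries only approximate the shape of SQL involved: one join for a single filter() call, two independent joins for chained filter() calls.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE usertag (
        id INTEGER PRIMARY KEY,
        user_id INTEGER, tag_id INTEGER, date_removed TEXT
    );
    INSERT INTO app_user VALUES (1, 'alice'), (2, 'bob');
    -- alice: tag 10 is still active
    INSERT INTO usertag VALUES (1, 1, 10, NULL);
    -- bob: tag 10 was removed, but a different tag (20) is still active
    INSERT INTO usertag VALUES (2, 2, 10, '2020-01-01');
    INSERT INTO usertag VALUES (3, 2, 20, NULL);
""")

# One join: both conditions must hold on the same usertag row.
same_row = conn.execute("""
    SELECT DISTINCT u.name FROM app_user u
    JOIN usertag t ON t.user_id = u.id
    WHERE t.tag_id = 10 AND t.date_removed IS NULL
    ORDER BY u.name
""").fetchall()

# Two joins: each condition may be satisfied by a different usertag row.
any_rows = conn.execute("""
    SELECT DISTINCT u.name FROM app_user u
    JOIN usertag t1 ON t1.user_id = u.id
    JOIN usertag t2 ON t2.user_id = u.id
    WHERE t1.tag_id = 10 AND t2.date_removed IS NULL
    ORDER BY u.name
""").fetchall()

print(same_row)
print(any_rows)
```

In this sample data bob only shows up in the second result, because his removed tag 10 and his active tag 20 are different rows. That is the behaviour the question wants to avoid, so the single filter() call is the right form here.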
Django newbie here!
I am coming from a .NET background, and I am frustrated that I can't work out how to do the following simple thing:
My simplified models are as follows
class Circle(BaseClass):
    name = models.CharField("Name", max_length=2048, blank=False, null=False)
    active = models.BooleanField(default=False)
    ...
class CircleParticipant(BaseClass):
    circle = models.ForeignKey(Circle, on_delete=models.CASCADE, null=True, blank=True)
    user = models.ForeignKey(User, on_delete=models.SET_NULL, null=True, blank=True)
    status = models.CharField("Status", max_length=256, blank=False, null=False)
    ...
class User(AbstractBaseUser, PermissionsMixin):
    email = models.EmailField(verbose_name="Email", unique=True, max_length=255, validators=[email_validator])
    first_name = models.CharField(verbose_name="First name", max_length=30, default="first")
    last_name = models.CharField(verbose_name="Last name", max_length=30, default="last")
    ...
My goal is to get a single circle with its participants, including the users, and to do all of that in a single DB trip.
In SQL terms I want to accomplish this:
SELECT circle.name, circle.active, circle_participant.status, user.email, user.first_name, user.last_name
FROM circle
JOIN circle_participant ON circle.id = circle_participant.circle_id
JOIN user ON user.id = circle_participant.user_id
WHERE circle.id = 43
I've tried the following:
Circle.objects.filter(id=43) \
.prefetch_related(Prefetch('circleparticipant_set', queryset=CircleParticipant.objects.prefetch_related('user')))
This is supposed to be working but when I check the query property on that statement it returns
SELECT "circle"."id", "circle"."created", "circle"."updated", "circle"."name", "circle"."active", FROM "circle" WHERE "circle"."id" = 43
(additional fields omitted for brevity.)
Am I missing something or is the query property incorrect?
More importantly how can I achieve fetching all that data with a single DB trip.
For reference here's how to do it in .NET Entity Framework
dbContext.Circle
.Filter(x => x.id == 43)
.Include(x => x.CircleParticipants) // This will exist in the entity/model
.ThenInclude(x => x.User)
prefetch_related() fetches the related objects in a second query instead of a JOIN; a JOIN would repeat the Circle columns on every CircleParticipant row. Your CircleParticipant acts as a junction table, though, so you can prefetch the participants and pull each participant's user into that same second query with select_related():
from django.db.models import Prefetch
Circle.objects.filter(id=43).prefetch_related(
    Prefetch(
        'circleparticipant_set',
        queryset=CircleParticipant.objects.select_related('user'),
    )
)
Am I missing something or is the query property incorrect?
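There are two ways that Django gives you to solve the SELECT N+1 problem. The first is prefetch_related(), which runs one extra query per prefetched relation and joins the results in memory. The second is select_related(), which creates a SQL join but only follows single-valued relations (ForeignKey and OneToOneField), so it cannot follow the reverse relation from Circle to CircleParticipant. That is also why the query property shows only the circle table: it prints the main query, and the prefetch queries run separately when the queryset is evaluated.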
More importantly how can I achieve fetching all that data with a single DB trip.
I would suggest that you not worry too much about doing it all in one query. One of the downsides of doing this in one query as you suggest is that a lot of the data that comes back will be redundant. For example, the circle.name column will hold the same value in every row of the result set.
You should absolutely care about how many queries you do - but only to the extent that you avoid a SELECT N+1 problem. If you're doing one query for each model class involved, that's pretty good.
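As a rough illustration of that point, the sketch below counts round trips for an N+1 access pattern versus a "one query per step, stitched together in Python" pattern. It uses sqlite3 directly rather than Django; the table names, the run() helper and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE circle (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE account (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE participant (
        id INTEGER PRIMARY KEY, circle_id INTEGER, account_id INTEGER, status TEXT
    );
    INSERT INTO circle VALUES (43, 'book club');
    INSERT INTO account VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    INSERT INTO participant VALUES (1, 43, 1, 'active'), (2, 43, 2, 'active'), (3, 43, 3, 'invited');
""")

counter = {"queries": 0}

def run(sql, params=()):
    """Execute one statement and count it as one database round trip."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()

# N+1 pattern: one query for the participants, then one more per participant.
counter["queries"] = 0
rows = run("SELECT account_id, status FROM participant WHERE circle_id = ? ORDER BY id", (43,))
n_plus_one = [
    (run("SELECT email FROM account WHERE id = ?", (account_id,))[0][0], status)
    for account_id, status in rows
]
n_plus_one_queries = counter["queries"]

# Batched pattern: one query for the circle, one joined query for the
# participants together with their accounts.
counter["queries"] = 0
circle = run("SELECT id, name FROM circle WHERE id = ?", (43,))[0]
batched = run(
    """SELECT a.email, p.status FROM participant p
       JOIN account a ON a.id = p.account_id
       WHERE p.circle_id = ? ORDER BY p.id""",
    (circle[0],),
)
batched_queries = counter["queries"]

print(n_plus_one_queries, batched_queries)
```

Both approaches return the same data, but the first needs a query for every participant while the second stays at two queries no matter how many participants the circle has.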
If you care strongly about SQL performance, I also recommend the tool Django Debug Toolbar, which can show you the number of queries, the exact SQL, and the time taken by each.
in SQL terms I want to accomplish this:
There are a few ways you could accomplish that.
Use many-to-many
Django has a field which can be used to create a many-to-many relationship. It's called ManyToManyField. It will implicitly create a many-to-many table to represent the relationship, and some helper methods to allow you to easily query for all circles a user is in, or all users that a circle has.
You're also attaching some metadata to each user/circle relationship. That means you'll need to define an explicit table using ManyToManyField.through.
There are examples in the Django documentation.
Use a related model query
If I specifically wanted a join, and not a subquery, I would query the users like this:
User.objects.filter(circleparticipant__circle_id=43)
Use a subquery
This also creates only one query, but it uses a subquery instead.
User.objects.filter(circleparticipant__in=CircleParticipant.objects.filter(circle_id=43))
I have models defined as follows:
class Employee(models.Model):
    ...
    user = models.OneToOneField(User, on_delete=models.CASCADE, primary_key=True)
class Project(models.Model):
    ...
    employees = models.ManyToManyField(Employee, null=True, blank=True)
I'm trying to retrieve all the projects that have at least one employee assigned to them, but I don't know how. I tried the following things:
projects.filter(employees__gt=0)
where projects = Project.objects.all(), but I don't think this is the right query, because if I do projects.filter(employees__lte=0) it returns nothing, even though I have projects with no employees assigned. How can I retrieve what I'm looking for? Could you point to a page where I can find all the lookups I can use?
Thanks!
You can filter on the relation with isnull. Add distinct(), because the join otherwise returns a project once per assigned employee:
Project.objects.filter(employees__isnull=False).distinct()
Update
If you want to filter on a specific number of employees, annotate each project with a count:
from django.db.models import Count
Project.objects.annotate(employee_count=Count('employees')).filter(employee_count__gt=5)
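For intuition, annotating with Count and then filtering on the annotation corresponds roughly to a GROUP BY with a HAVING clause. The sketch below shows that shape with sqlite3; the table names and rows are invented for the example and are not what Django would name them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project_employees (project_id INTEGER, employee_id INTEGER);
    INSERT INTO project VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    -- alpha has two employees, beta has none, gamma has one
    INSERT INTO project_employees VALUES (1, 100), (1, 101), (3, 100);
""")

COUNT_SQL = """
    SELECT p.name, COUNT(pe.employee_id) AS employee_count
    FROM project p
    LEFT JOIN project_employees pe ON pe.project_id = p.id
    GROUP BY p.id, p.name
    HAVING COUNT(pe.employee_id) {condition}
    ORDER BY p.name
"""

# Projects with at least one employee, each listed exactly once.
with_staff = conn.execute(COUNT_SQL.format(condition="> 0")).fetchall()

# Projects with no employees: the LEFT JOIN keeps them with a count of 0.
without_staff = conn.execute(COUNT_SQL.format(condition="= 0")).fetchall()

print(with_staff)
print(without_staff)
```

Because the rows are grouped per project, each project appears once regardless of how many employees it has, and the same pattern answers both "at least one" and "none".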
I have a django app with the following models:
class Person(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
class Job(models.Model):
    title = models.CharField(max_length=100)
class PersonJob(models.Model):
    person = models.ForeignKey(Person, related_name='person_jobs')
    job = models.ForeignKey(Job, related_name='person_jobs')
    is_active = models.BooleanField()
Multiple Person instances can hold the same job at once. I have a Job queryset and am trying to annotate or through some other method attach the names of each person with that job onto each item in the queryset. I want to be able to loop through the queryset and get those names without doing an additional query for each item. The closest I have gotten is the following:
qs = (Job.objects.all()
      .annotate(first_names=F('person_jobs__person__first_name'))
      .annotate(last_names=F('person_jobs__person__last_name')))
This will store the name on the Job instance as I would like; however, if a job has multiple people in it, the queryset will have multiple copies of the same Job in it, each with the name of one person. Instead, I need there to only ever be one instance of a given Job in the queryset, which holds the names of all people in it. I don't care how the values are combined and stored; a list, delimited char field, or really any other standard data type would be fine.
I'm using Django 2.1 and Postgres 10.3. I would strongly prefer to not use any Postgres specific features.
You can use either ArrayAgg or StringAgg. Both live in django.contrib.postgres, so they are PostgreSQL-specific:
from django.contrib.postgres.aggregates import ArrayAgg, StringAgg
Job.objects.all().annotate(first_names=StringAgg('person_jobs__person__first_name', delimiter=','))
Job.objects.all().annotate(people=ArrayAgg('person_jobs__person__first_name'))
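Since the question prefers to avoid Postgres-specific features, one backend-agnostic alternative is to fetch flat (job id, first name) pairs in a single query and group them in Python. The sketch below assumes rows shaped like what a values_list() call over the relation would return; the sample data and the group_names() helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows: (job_id, first_name). A job with nobody assigned
# comes back from an outer join as a single row with None for the name.
rows = [(1, "Ada"), (1, "Grace"), (2, "Linus"), (3, None)]

def group_names(rows):
    """Collapse flat (job_id, name) pairs into one list of names per job."""
    names_by_job = defaultdict(list)
    for job_id, first_name in rows:
        names = names_by_job[job_id]  # registers the job even if it has no people
        if first_name is not None:
            names.append(first_name)
    return dict(names_by_job)

grouped = group_names(rows)
print(grouped)
```

This keeps one entry per job and costs one query, at the price of doing the aggregation in application code instead of in the database.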
I have a model 'Status' with a ManyToManyField 'groups'. Each group has a ManyToManyField 'users'. I want to get all the users for a certain status. I know I can do a for loop on the groups and add all the users to a list. But the users in the groups can overlap so I have to check to see if the user is already in the group. Is there a more efficient way to do this using queries?
edit: The status has a list of groups. Each group has a list of users. I want to get the list of users from all the groups for one status.
Models
class Status(geomodels.Model):
    class Meta:
        ordering = ['-date']
    def __unicode__(self):
        username = self.user.user.username
        return "{0} - {1}".format(username, self.text)
    user = geomodels.ForeignKey(UserProfile, related_name='statuses')
    date = geomodels.DateTimeField(auto_now=True, db_index=True)
    groups = geomodels.ManyToManyField(Group, related_name='receivedStatuses', null=True, blank=True)
class Group(models.Model):
    def __unicode__(self):
        return self.name + " - " + self.user.user.username
    name = models.CharField(max_length=64, db_index=True)
    members = models.ManyToManyField(UserProfile, related_name='groupsIn')
    user = models.ForeignKey(UserProfile, related_name='groups')
I ended up creating a list of the groups I was looking for and then querying all users that were in any of those groups. This should be pretty efficient, as it takes only two queries: one for the groups and one for the users.
statusGroups = list(status.groups.all())
users = UserProfile.objects.filter(groupsIn__in=statusGroups)
As you haven't posted your models, it's a bit difficult to give you a Django queryset answer, but you can solve your overlapping problem by adding your users to a set, which doesn't allow duplicates. For example:
from collections import defaultdict
users_by_status = defaultdict(set)
for i in Status.objects.all():
    for group in i.group_set.all():
        users_by_status[i].add(group.user.pk)
Based on your posted model code, the query for a given status is:
UserProfile.objects.filter(groupsIn__receivedStatuses=some_status).distinct()
I'm not 100% sure that the distinct() call is necessary, but I seem to recall that you'd risk duplicates if a given UserProfile were in multiple groups that share the same status. The main point is that filtering on many-to-many relationships works using the usual underscore notation, if you use the names as defined either by related_name or the default related name.
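A quick way to check the duplicate concern is to look at what the underlying joins return. The sketch below uses sqlite3 with invented table names and sample rows that only roughly mirror the two many-to-many join tables; it is an illustration, not the SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE userprofile (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grp_members (group_id INTEGER, userprofile_id INTEGER);
    CREATE TABLE status_groups (status_id INTEGER, group_id INTEGER);
    INSERT INTO userprofile VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    -- group 1 holds ann and bob, group 2 holds bob and cat
    INSERT INTO grp_members VALUES (1, 1), (1, 2), (2, 2), (2, 3);
    -- status 1 was sent to both groups
    INSERT INTO status_groups VALUES (1, 1), (1, 2);
""")

JOINS = """
    FROM userprofile u
    JOIN grp_members m ON m.userprofile_id = u.id
    JOIN status_groups sg ON sg.group_id = m.group_id
    WHERE sg.status_id = 1
    ORDER BY u.name
"""

# Without DISTINCT, a user appears once per matching group.
plain = conn.execute("SELECT u.name " + JOINS).fetchall()

# With DISTINCT, each user appears once.
unique = conn.execute("SELECT DISTINCT u.name " + JOINS).fetchall()

print(plain)
print(unique)
```

In this data bob belongs to both groups that received the status, so he is returned twice unless DISTINCT is applied, which supports keeping the distinct() call.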
I have two models Category and Entry. There is another model ExtEntry that inherits from Entry
class Category(models.Model):
    title = models.CharField('title', max_length=255)
    description = models.TextField('description', blank=True)
    ...
class Entry(models.Model):
    title = models.CharField('title', max_length=255)
    categories = models.ManyToManyField(Category)
    ...
class ExtEntry(Entry):
    groups = models.CharField('title', max_length=255)
    value = models.CharField('title', max_length=255)
    ...
I am able to use Category.entry_set, but I want to be able to use Category.extentry_set, which is not available. If it cannot be made available, I need another method to get all ExtEntry objects related to one particular Category.
EDIT
My end goal is to have a QuerySet of ExtEntry objects
Thanks
I need another method to get all ExtEntry related to one particular Category
Easy:
ExtEntry.objects.filter(categories=my_category)
Do you know if there is a way to use the _set feature of an inherited
I don't know if there is a direct way to do that. It is not mentioned in the documentation.
But it is possible to get similar results with the select_related.
for e in category.entry_set.select_related('extentry'):
    e.extentry  # already loaded because of `select_related`;
                # raises ExtEntry.DoesNotExist if there is no ExtEntry for the current e
It is possible to select only entries which have an ExtEntry:
for e in category.entry_set.select_related('extentry').exclude(extentry=None):
    e.extentry  # now this is definitely an ExtEntry
The bad thing about exclude() is that it generates a terribly inefficient query:
SELECT entry.*, extentry.* FROM entry
LEFT OUTER JOIN `extentry` ON (entry.id = extentry.entry_ptr_id)
WHERE NOT (entry.id IN (SELECT U0.id FROM entry U0 LEFT OUTER JOIN
extentry U1 ON (U0.id = U1.entry_ptr_id)
WHERE U1.entry_ptr_id IS NULL))
So my summary would be: use ExtEntry.objects.filter() to get your results. The backward relations (object.something_set) are just a convenience and do not work in every situation.
See the documentation here for an explanation of how this works.
Basically, since you can get the parent model item, you should be able to get its child because an implicit one-to-one linkage is created.
The inheritance relationship introduces links between the child model and each of its parents (via an automatically-created OneToOneField).
So, you should be able to do:
categories = Category.objects.all()
for c in categories:
    entries = c.entry_set.all()
    for e in entries:
        extentry = e.extentry  # raises ExtEntry.DoesNotExist for a plain Entry
        print(extentry.value)
It isn't documented that I can see, but I believe that generally, your one-to-one field name will be the lowercase version of the inheriting model name.
The problem you're running into arises because Entry and ExtEntry are stored in separate tables. This may be the best solution for you, but you should be aware of that when you choose to use multi-table inheritance.
Something like category.entry_set.exclude(extentry=None) should work for you.
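To picture the separate-tables point, here is a minimal sqlite3 sketch of the layout that multi-table inheritance implies: a parent table, a child table keyed by entry_ptr_id, and a join table for the categories. Table names are simplified assumptions (Django would normally prefix them with the app label) and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entry (id INTEGER PRIMARY KEY, title TEXT);
    -- the child table holds only the extra columns plus a pointer to its parent row
    CREATE TABLE extentry (
        entry_ptr_id INTEGER PRIMARY KEY REFERENCES entry(id),
        value TEXT
    );
    CREATE TABLE entry_categories (entry_id INTEGER, category_id INTEGER);
    INSERT INTO entry VALUES (1, 'plain'), (2, 'extended');
    INSERT INTO extentry VALUES (2, 'extra');
    INSERT INTO entry_categories VALUES (1, 7), (2, 7);
""")

# Every entry in category 7; plain entries have no child row, hence None.
all_entries = conn.execute("""
    SELECT e.title, x.value FROM entry e
    JOIN entry_categories ec ON ec.entry_id = e.id
    LEFT JOIN extentry x ON x.entry_ptr_id = e.id
    WHERE ec.category_id = 7
    ORDER BY e.id
""").fetchall()

# Starting from the child table returns only the extended entries.
ext_entries = conn.execute("""
    SELECT e.title, x.value FROM extentry x
    JOIN entry e ON e.id = x.entry_ptr_id
    JOIN entry_categories ec ON ec.entry_id = e.id
    WHERE ec.category_id = 7
    ORDER BY e.id
""").fetchall()

print(all_entries)
print(ext_entries)
```

Starting from the child table is the SQL counterpart of querying ExtEntry directly: the inner join to the parent drops plain entries without needing an exclude.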