Django order_by after distinct(). Or distinct? - python

I need something like
user_messages = UserMessage.objects.filter(Q(from_user=user) | Q(to_user=user)).order_by('dialog_id').distinct('dialog_id').order_by('created')
Of course, it doesn't work. I found that I should use annotate(), but it seems to be quite difficult for me; I'm new to Django. Can you help me?

This is the code I have to implement a similar feature. Maybe it will be of use to you.
In the view:
queryset = Conversation.objects.filter(Q(user_1=user) | Q(user_2=user)).filter(
    Q(messages__sender=user, messages__archived_by_sender=False) |
    Q(messages__recipient=user, messages__archived_by_recipient=False)).annotate(
    last_activity=Max("messages__created")).order_by("-last_activity")
The models look like: (methods etc. omitted)
class Conversation(models.Model):  # there is only one of these for each pair of users who message one another
    user_1 = models.ForeignKey(settings.AUTH_USER_MODEL, related_name="conversations_as_user_1")
    user_2 = models.ForeignKey(settings.AUTH_USER_MODEL, related_name="conversations_as_user_2")
    read_by_user_1 = models.BooleanField(default=False)
    read_by_user_2 = models.BooleanField(default=False)

    class Meta:
        unique_together = (("user_1", "user_2"),)

class Message(TimeTrackable):
    conversation = models.ForeignKey(Conversation, related_name="messages", blank=True, help_text="the conversation between these two users (ALWAYS selected automatically -- leave this blank!)")
    sender = models.ForeignKey(settings.AUTH_USER_MODEL, related_name="messages_sent")
    recipient = models.ForeignKey(settings.AUTH_USER_MODEL, related_name="messages_received")
    text = models.TextField()  # flat text, NOT bleached or marksafe
    archived_by_sender = models.BooleanField(default=False)
    archived_by_recipient = models.BooleanField(default=False)
This is for an application where you can never have multiple separate conversation objects between the same users, but you can use the archive feature to archive all of the messages (from your perspective) which is as good as deleting the conversation from the user's perspective. Also, "read" status is stored on the conversation, not on the message.
In order to guarantee no two users have multiple "Conversation" objects with one another, user_1 on a conversation is always the lower user ID, and user_2 is always the higher. (In truth, I didn't think of this clever idea in time to actually implement it, so instead I have complex and unnecessary overridden save() logic. But if I were doing it again, I would do this part too, and maybe even call the fields user_lower and user_higher or something like that to make it clear.)
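The lower-ID-first idea can be sketched in plain Python. This is only an illustration of the ordering rule, not the code from the application above; canonical_pair, get_or_create_conversation and the conversations dict are hypothetical names, and the dict merely stands in for the unique_together constraint.

```python
from typing import Dict, List, Tuple


def canonical_pair(user_a_id: int, user_b_id: int) -> Tuple[int, int]:
    """Return the two user IDs as (lower, higher), whatever the argument order."""
    if user_a_id == user_b_id:
        raise ValueError("a conversation needs two different users")
    return (min(user_a_id, user_b_id), max(user_a_id, user_b_id))


# Hypothetical in-memory store keyed on the canonical pair.
conversations: Dict[Tuple[int, int], List[str]] = {}


def get_or_create_conversation(sender_id: int, recipient_id: int) -> List[str]:
    """Look up the single conversation for this pair, creating it if needed."""
    key = canonical_pair(sender_id, recipient_id)
    return conversations.setdefault(key, [])


# Both directions land in the same conversation.
get_or_create_conversation(7, 3).append("hi")
get_or_create_conversation(3, 7).append("hello back")
```

Because the key is normalised before the lookup, it does not matter which of the two users sends the first message.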
Let's break the view code down:
Fetch all conversation objects where the current user is either user_1 or user_2. Via a filter, require that each returned conversation has at least one message, sent or received by the current user, that this user has not archived. The conversations don't have timestamps, but the messages do, so annotate each conversation with its most recent activity. Then order by that annotated timestamp, newest first.
This avoids any distinct() work because you are fetching on the conversation list with a join instead of on the message list with a join.
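To see why no distinct() is needed, it may help to look at the kind of SQL this approach boils down to. The sketch below uses the standard-library sqlite3 module with made-up table names and data; the SQL Django actually generates for the queryset above will differ in detail (it also includes the user and archive filters).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversation (id INTEGER PRIMARY KEY);
    CREATE TABLE message (
        id INTEGER PRIMARY KEY,
        conversation_id INTEGER NOT NULL REFERENCES conversation(id),
        created TEXT NOT NULL
    );
    INSERT INTO conversation (id) VALUES (1), (2), (3);
    INSERT INTO message (conversation_id, created) VALUES
        (1, '2020-01-01 10:00'),
        (1, '2020-01-05 09:00'),
        (2, '2020-01-03 12:00'),
        (3, '2020-01-04 08:00'),
        (3, '2020-01-02 08:00');
""")

# One row per conversation, newest activity first. GROUP BY collapses the
# joined message rows, so there is nothing left for DISTINCT to remove.
rows = conn.execute("""
    SELECT c.id, MAX(m.created) AS last_activity
    FROM conversation AS c
    JOIN message AS m ON m.conversation_id = c.id
    GROUP BY c.id
    ORDER BY last_activity DESC
""").fetchall()
```

Each conversation appears once, carrying the timestamp of its latest message, which is the same shape the annotated queryset gives you.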

Related

How to fetch related entries in Django through reverse foreign key

Django newbie here!
I am coming from a .NET background and I am frustrated by how hard it is to do the following simple thing:
My simplified models are as follows
class Circle(BaseClass):
    name = models.CharField("Name", max_length=2048, blank=False, null=False)
    active = models.BooleanField(default=False)
    ...

class CircleParticipant(BaseClass):
    circle = models.ForeignKey(Circle, on_delete=models.CASCADE, null=True, blank=True)
    user = models.ForeignKey(User, on_delete=models.SET_NULL, null=True, blank=True)
    status = models.CharField("Status", max_length=256, blank=False, null=False)
    ...

class User(AbstractBaseUser, PermissionsMixin):
    email = models.EmailField(verbose_name="Email", unique=True, max_length=255, validators=[email_validator])
    first_name = models.CharField(verbose_name="First name", max_length=30, default="first")
    last_name = models.CharField(verbose_name="Last name", max_length=30, default="last")
    ...
My goal is to get a single circle with participants that include the users as well. With the extra requirement to do all that in a single DB trip.
in SQL terms I want to accomplish this:
SELECT circle.name, circle.active, circle_participant.status, user.email, user.first_name, user.last_name
FROM circle
JOIN circle_participant on circle.id = circle_participant.circle_id
JOIN user on user.id = circle_participant.user_id
WHERE circle.id = 43
I've tried the following:
Circle.objects.filter(id=43) \
    .prefetch_related(Prefetch('circleparticipant_set', queryset=CircleParticipant.objects.prefetch_related('user')))
This is supposed to be working but when I check the query property on that statement it returns
SELECT "circle"."id", "circle"."created", "circle"."updated", "circle"."name", "circle"."active", FROM "circle" WHERE "circle"."id" = 43
(additional fields omitted for brevity.)
Am I missing something or is the query property incorrect?
More importantly how can I achieve fetching all that data with a single DB trip.
For reference here's how to do it in .NET Entity Framework
dbContext.Circle
    .Filter(x => x.id == 43)
    .Include(x => x.CircleParticipants) // This will exist in the entity/model
    .ThenInclude(x => x.User)
.prefetch_related will use a second query to reduce the bandwidth, otherwise it will repeat data for the same Circle and CircleParticipants multiple times. Your CircleParticipant however acts as a junction table, so you can use:
Circle.objects.filter(id=43).prefetch_related(
    Prefetch('circleparticipant_set', queryset=CircleParticipant.objects.select_related('user')
    )
)
Am I missing something or is the query property incorrect?
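There are two ways that Django gives you to solve the SELECT N+1 problem. The first is prefetch_related(), which creates two queries and joins the result in memory. The query attribute only shows the first of those queries, which is why the participants don't appear in it. The second is select_related(), which creates a join, but only works for single-valued relations (ForeignKey and OneToOneField). (You also haven't set related_name on any of your foreign keys. That isn't required for either method; without it the reverse accessor simply defaults to circleparticipant_set.)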
More importantly how can I achieve fetching all that data with a single DB trip.
I would suggest that you not worry too much about doing it all in one query. One of the downsides of doing this in one query as you suggest is that lots of the data that comes back will be redundant. For example, the circle.name column will be the same for every row in the table which is returned.
You should absolutely care about how many queries you do - but only to the extent that you avoid a SELECT N+1 problem. If you're doing one query for each model class involved, that's pretty good.
If you care strongly about SQL performance, I also recommend the tool Django Debug Toolbar, which can show you the number of queries, the exact SQL, and the time taken by each.
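As a rough picture of what "two queries, joined in memory" means, here is a sketch using the standard-library sqlite3 module instead of the ORM. The table layout and sample rows are assumptions made for the example, and Django's real prefetch machinery is more general than this.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE circle (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE circle_participant (
        id INTEGER PRIMARY KEY,
        circle_id INTEGER REFERENCES circle(id),
        user_id INTEGER REFERENCES user(id),
        status TEXT
    );
    INSERT INTO circle VALUES (43, 'Book club');
    INSERT INTO user VALUES (1, 'a@example.com'), (2, 'b@example.com');
    INSERT INTO circle_participant VALUES
        (10, 43, 1, 'active'),
        (11, 43, 2, 'invited');
""")

# Query 1: the circles themselves.
circles = conn.execute(
    "SELECT id, name FROM circle WHERE id = ?", (43,)
).fetchall()
circle_ids = [row[0] for row in circles]

# Query 2: every participant of those circles, joined to its user
# (the join plays the role of select_related('user')).
placeholders = ",".join("?" for _ in circle_ids)
participants = conn.execute(
    f"""
    SELECT cp.circle_id, cp.status, u.email
    FROM circle_participant AS cp
    JOIN user AS u ON u.id = cp.user_id
    WHERE cp.circle_id IN ({placeholders})
    ORDER BY cp.id
    """,
    circle_ids,
).fetchall()

# Stitch the two result sets together in memory.
by_circle = defaultdict(list)
for circle_id, status, email in participants:
    by_circle[circle_id].append((email, status))

result = [
    {"id": cid, "name": name, "participants": by_circle[cid]}
    for cid, name in circles
]
```

The circle's own columns are fetched once rather than repeated on every participant row, and the number of queries stays at two no matter how many participants there are.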
in SQL terms I want to accomplish this:
There are a few ways you could accomplish that.
Use many-to-many
Django has a field which can be used to create a many-to-many relationship. It's called ManyToManyField. It will implicitly create a many-to-many table to represent the relationship, and some helper methods to allow you to easily query for all circles a user is in, or all users that a circle has.
You're also attaching some metadata to each user/circle relationship. That means you'll need to define an explicit table using ManyToManyField.through.
There are examples in the docs here.
Use a related model query
If I specifically wanted a join, and not a subquery, I would query the users like this:
User.objects.filter(circleparticipant__circle_id=43)
Use a subquery
This also creates only one query, but it uses a subquery instead.
User.objects.filter(circleparticipant__in=CircleParticipant.objects.filter(circle_id=43))

Update a model after deleting a row in another model in Django

I have two models UserProfile and ChatUser.
ChatUser.models.py
class ChatUser(models.Model):
    chat = models.ForeignKey(ChatRoom,on_delete=models.CASCADE)
    user = models.ForeignKey(User,on_delete=models.CASCADE)
UserProfile.models.py
class UserProfile(models.Model):
    user = models.OneToOneField(User,on_delete=models.CASCADE)
    phone_number = models.IntegerField(default=0)
    image = models.ImageField(upload_to='profile_image',blank=True,default='prof1.jpeg')
    gender = models.CharField(max_length=10)
    joined = JSONField(null=True)
ChatRoom.models
class ChatRoom(models.Model):
    eid = models.CharField(max_length=64, unique=True)
    name = models.CharField(max_length=100)
    location = models.CharField(max_length=50)
    vehicle = models.CharField(max_length=50)
    brand = models.CharField(max_length=50)
    max_limit = models.IntegerField()
joined in UserProfile is an array consisting of the ids of the ChatRoom rows the user has joined. Now when I delete a ChatRoom row, it automatically deletes the ChatUser objects that reference it, since I am using on_delete=models.CASCADE. But how do I update joined in the UserProfile model? I want to remove the id of the deleted ChatRoom from UserProfile.joined.
I have used the django.db.models.signals to solve the updating part.
from django.db.models.signals import post_delete
from django.dispatch import receiver

@receiver(post_delete, sender=ChatUser)
def update_profile(sender, instance, **kwargs):
    id = instance.chat_id
    joined = instance.user.userprofile.joined
    if id in joined:
        joined.remove(id)
    model = profiles.models.UserProfile.objects.filter(user_id=instance.user.id).update(joined=joined)
SDRJ and Willem Van OnSem, thank you for your suggestions
@SAI SANTOSH CHIRAG: Please explain this. You have a ChatUser model that stores user_id and chatroom_id. Now, if I need to find out the list of chatrooms a user has joined, I can simply query this model. If I want to find out the total number of users in a specific chatroom, then I can still query this table. Why do I need to keep track of joined in UserProfile? And I am basing this on the premise that joined keeps track of the chatroom ids that a user has joined.
At any point, if you choose to add a many-to-many field in any of the models then this is my opinion. E.g Let's assume that you add the following in the UserProfile model
chatroom = models.ManyToManyField(ChatRoom)
Imagine as the number of chatrooms the user joins grows, the list becomes larger and larger and I find it inconvenient because I will have this tiny scroll bar with a large list. It's not wrong but I simply stay away from M2M field for this purpose especially if I expect my list to grow as my application scales.
I prefer the ChatUser approach that you used. Yes, I might have repeating rows of user_ids or repeating chatroom_ids but I don't mind. I can live with it. It's still a bit cleaner to me. And this is simply my opinion. Feel free to disagree.
Lastly, I would rename the ChatUser model to ChatRoomUser...Why? Just by the name of it, I can infer it has something to do with two entities Chatroom and User.
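The point about querying ChatUser instead of maintaining joined can be illustrated with the standard-library sqlite3 module. The table names and rows below are invented for the sketch, and joined_rooms is a hypothetical helper. Note also that Django performs the cascade itself rather than relying on a database-level ON DELETE CASCADE, but the effect on the rows is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE chat_room (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE chat_user (
        id INTEGER PRIMARY KEY,
        chat_id INTEGER NOT NULL REFERENCES chat_room(id) ON DELETE CASCADE,
        user_id INTEGER NOT NULL
    );
    INSERT INTO chat_room VALUES (1, 'bikes'), (2, 'cars');
    INSERT INTO chat_user (chat_id, user_id) VALUES (1, 100), (2, 100), (2, 200);
""")


def joined_rooms(user_id):
    """Derive the list of joined room ids from the membership table."""
    rows = conn.execute(
        "SELECT chat_id FROM chat_user WHERE user_id = ? ORDER BY chat_id",
        (user_id,),
    ).fetchall()
    return [row[0] for row in rows]


before = joined_rooms(100)
conn.execute("DELETE FROM chat_room WHERE id = ?", (2,))  # cascades to chat_user
after = joined_rooms(100)
```

Because the list is computed from the membership rows, deleting the room is enough; there is no second copy of the data to update.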

In Django, how to keep many-to-many relations in sync?

What's the best way, in Django, to set and keep up-to-date a many-to-many field that is (for a lack of a better term) a composite of many-to-many fields from other models?
To give a concrete example, I have a Model representing a Resume.
class Resume(models.Model):
    owner = models.OneToOneField(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        related_name="resume",
    )
    description = models.TextField(default="Resume description")
    skills = models.ManyToManyField(Skill, related_name="resume")
The Resume object is referenced by another model called WorkExperience:
class WorkExperience(models.Model):
    ...
    skills = models.ManyToManyField(Skill, related_name="work")
    owner = models.ForeignKey(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        null=False,
        default=1,
        related_name="work_experience",
    )
    resume = models.ForeignKey(
        Resume,
        on_delete=models.CASCADE,
        null=False,
        default=1,
        related_name="work_experience",
    )
Notice that there's a fair amount of redundancy here, with both Resume and WorkExperience pointing to the same owner etc. That said, both of these models (Resume & WorkExperience) have a field called skills that references another model called Skill.
What I'd like is to have the Resume skills to point to the same skills as the ones in WorkExperience. Any suggestions on how to do this? I also have a Model called EducationExperience which also references Skills and has a foreign key relation to Resume. Is there any way to keep the skills in Resume be in sync with both the skills in WorkExperience and EducationExperience?
A straightforward option would be to implement a method called set_resume_skills which would, when called, add the skills they have to the list of skills that Resume has
class WorkExperience(models.Model):
    # Same model as above
    def set_resume_skills(self):
        # add() takes model instances, so unpack the related skills
        self.resume.skills.add(*self.skills.all())
The issue I have with this approach is that it only adds skills, it doesn't really keep them in sync. So if I remove a skill from WorkExperience it won't be removed from Resume. Another boundary condition would be that if multiple WorkExperience objects are referencing a skill and then I remove the skill from one Object how would I make sure that the reference in Resume is still intact? I.e., two work experience objects refer to the Skill "Javascript". I remove the Skill from one WorkExperience object. The Skill "Javascript" should still be referenced by the Resume because one WorkExperience object still has a reference to it.
Edit: The reason I want to do it like this is to reduce the amount of querying done on the front-end. If the only way to filter the skills is through the "sub-models" (WorkExperience, EducationExperience), I'd need to do two queries in my front-end instead of one. Although now that I think about it, doing two queries isn't that bad.
By your design, it seems that Skills actually belong to the owner, i.e. the AUTH_USER. Having an M2M relation to Skill in Resume and in the other models will definitely cause redundancy. Why don't you just create an M2M to Skill on the User model?
As for reducing queries, there are other ways to do this as well, such as select_related or prefetch_related.
class User(models.Model):
    skills = models.ManyToManyField(Skill, related_name="resume")
    # ... Other fields here

class Resume(models.Model):
    owner = models.OneToOneField(
        settings.AUTH_USER_MODEL,
        on_delete=models.CASCADE,
        related_name="resume",
    )
    description = models.TextField(default="Resume description")
    # ... Other fields here
I was approaching the problem from the wrong end. I don't have to worry about keeping the Skills in check when I just do a reverse query on the Skill objects. Currently I'm filtering the Skill QuerySet in my view like this:
class SkillViewSet(viewsets.ModelViewSet):
    serializer_class = SkillSerializer

    def get_queryset(self):
        queryset = Skill.objects.all()
        email = self.request.query_params.get("email", None)
        if email:
            User = get_user_model()
            user = User.objects.get(email=email)
            query_work_experience = Q(work__owner=user)
            query_education_experience = Q(education__owner=user)
            queryset = queryset.filter(
                query_work_experience | query_education_experience
            )
        return queryset
This removes the need to keep the Skills in sync with the Resume Model.
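The reason deriving the skills works better than syncing them can be shown with plain sets. This is only a sketch of the idea; skills_for_owner and the sample data are hypothetical and stand in for the reverse query above.

```python
from typing import Dict, Set


def skills_for_owner(experiences: Dict[str, Set[str]]) -> Set[str]:
    """Union of the skills referenced by every experience record."""
    skills: Set[str] = set()
    for experience_skills in experiences.values():
        skills |= experience_skills
    return skills


# Hypothetical data: two work experiences and one education record.
experiences = {
    "work: Acme": {"JavaScript", "Python"},
    "work: Initech": {"JavaScript"},
    "education: BSc": {"SQL"},
}

before = skills_for_owner(experiences)

# Removing the skill from one record keeps it, because another still has it.
experiences["work: Acme"].discard("JavaScript")
after_one_removal = skills_for_owner(experiences)

# Only when the last reference goes does the skill disappear.
experiences["work: Initech"].discard("JavaScript")
after_both_removals = skills_for_owner(experiences)
```

The "JavaScript referenced by two work experiences" edge case from the question is handled without any bookkeeping, since the result is recomputed from the source records each time.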

Is there a better way to design the Message model?

Is there a better way to design the Message model ?
I have a Message model:
class Message(models.Model):
    """
    message
    """
    title = models.CharField(max_length=64, help_text="title")
    content = models.CharField(max_length=1024, help_text="content")
    is_read = models.BooleanField(default=False, help_text="whether message is read")
    create_user = models.ForeignKey(User, related_name="messages",help_text="creator")
    receive_user = models.CharField(max_length=1024, help_text="receive users' id")

    def __str__(self):
        return self.title

    def __unicode__(self):
        return self.title
As you can see, I use models.CharField to store the users' ids, so I know which users should receive this message.
I don't know whether this design is good. Is there a better way to do it?
I have considered using a ManyToManyField, but I think that if there are many users, creating one message in the admin will create as many rows as there are users, so I think this is not a good idea.
I would definitely use ManyToManyField for your receive_user. You're going to find that keeping a CharField updated and sanitised with user_ids is going to be a nightmare that will involve re-implementing vast swathes of existing Django functionality.
I'm not sure I understand your concern about using ManyToManyField. Users of the admin will be able to select which users are to be recipients of the message; it doesn't automatically create a message for each user.
e: Also, depending on which version of python you're using (2 or 3) you only need one of either __str__ or __unicode__
__unicode__ is the method to use for python2, __str__ for python3: See this answer for more details
So it actually depends on your needs in which direction I would change your message Model.
General Changes
Based on the guess that you don't ever need an index on the content field:
I would change content to a TextField (also because a length of 1024 is already too large for a proper index on MySQL, for example). See https://docs.djangoproject.com/en/1.11/ref/databases/#textfield-limitations for more information on this topic.
I would probably increase the size of the title field just because it seems convenient to me.
1. Simple -> One User to One User
The single read field indicates a one to one message:
I would change the receiver to also be a ForeignKey and adapt the related names of the sender and receiver fields to represent these connections, e.g. sent_messages and received_messages.
Like @sebastian-fleck already suggested, I'd also change the read field to a datetime field. It only changes your querysets from filter(read=True) to filter(read__isnull=False) to get the same results, and you could create a property representing the read status as a boolean for convenience, e.g.
@property
def read(self):
    return bool(self.read_datetime)  # assumes the datetime field is named read_datetime
2. More Complex: One User to Multiple Users
This can get a lot more complex; here is the least complex solution I could think of.
Conditions:
- there are only messages and no conversation-like structure
- a message should have a read status for every receiver
(I removed descriptions for an easier overview and changed the models according to my opinions from before, this is based on my experience and the business needs I assumed from your example and answers)
@python_2_unicode_compatible
class Message(models.Model):
    title = models.CharField(max_length=160)
    content = models.TextField()
    create_user = models.ForeignKey(User, related_name="sent_messages")
    # Refer to the through model by name because it is defined below.
    receive_users = models.ManyToManyField(User, through="MessageReceiver")

    def __str__(self):
        return 'Message: %s' % self.title

class MessageReceiver(models.Model):
    is_read = models.DateTimeField(null=True, blank=True)  # null means unread
    receiver = models.ForeignKey(User)
    message = models.ForeignKey(Message)
This structure uses the power of ManyToMany with a custom through model. Check it out, it's very powerful: https://docs.djangoproject.com/en/1.11/ref/models/fields/#django.db.models.ManyToManyField.through.
tldr: we want every receiver to have a read status, so we modeled this in a separate object
Longer version: we utilize the power of a custom ManyToMany through model to have a separate read status for every receiver. This means we need to change some parts of our code to work for the many to many structure, e.g. if we want to know if a message was read by all receivers:
def did_all_receiver_read_the_message(message):
    # is_read lives on the through model, so query MessageReceiver rather than User.
    unread_count = message.messagereceiver_set.filter(is_read__isnull=True).count()
    if unread_count > 0:
        return False
    return True
if we want to know if a specific user read a specific message:
def did_user_read_this_message(user, message):
    # Look up this user's row in the through model.
    receiver = message.messagereceiver_set.get(receiver=user)
    return bool(receiver.is_read)
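Stripped of the ORM, the per-receiver read status amounts to one nullable timestamp per (message, receiver) pair. The following plain-Python sketch shows that shape; the Message dataclass and its read_at mapping are illustrative stand-ins for the two models above, not a replacement for them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Message:
    title: str
    # receiver id -> time that receiver read the message (None means unread)
    read_at: Dict[int, Optional[datetime]] = field(default_factory=dict)

    def mark_read(self, user_id: int, when: datetime) -> None:
        if user_id not in self.read_at:
            raise KeyError(f"user {user_id} is not a receiver")
        self.read_at[user_id] = when

    def read_by(self, user_id: int) -> bool:
        return self.read_at[user_id] is not None

    def read_by_all(self) -> bool:
        return all(ts is not None for ts in self.read_at.values())


msg = Message("Maintenance window", read_at={1: None, 2: None})
msg.mark_read(1, datetime(2024, 1, 1, 9, 0))
partially_read = (msg.read_by(1), msg.read_by(2), msg.read_by_all())

msg.mark_read(2, datetime(2024, 1, 1, 9, 30))
fully_read = msg.read_by_all()
```

Storing a timestamp instead of a boolean costs nothing extra and also records when each receiver read the message.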
3. Conversations + Messages + Participants
This is something that would exceed my time limit but some short hints:
Conversation holds everything together
Message is written by a Participant and holds a created timestamp
Participant allows access to a conversation and links a User to the Conversation object
the Participant holds a last_read timestamp which can be used to calculate if a message was read or not using the messages' created timestamps (-> annoyingly complex part & milliseconds are important)
Everything else would probably need to be adapted to your specific business needs. This scenario is probably the most flexible, but it's a lot of work (based on personal experience) and adds quite a bit of complexity to your architecture - I only recommend this if it's really really needed ^^.
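A minimal sketch of the last_read idea, in plain Python and under the assumption that a message counts as unread when it was written by someone else after the participant's last_read timestamp. Msg, Participant and unread_messages are hypothetical names, and the sample timestamps differ only by microseconds to show why sub-second precision matters.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Msg:
    author_id: int
    created: datetime


@dataclass
class Participant:
    user_id: int
    last_read: Optional[datetime] = None  # None means never opened the conversation


def unread_messages(participant: Participant, messages: List[Msg]) -> List[Msg]:
    """Messages by other users created after the participant last read the conversation."""
    return [
        m for m in messages
        if m.author_id != participant.user_id
        and (participant.last_read is None or m.created > participant.last_read)
    ]


messages = [
    Msg(author_id=1, created=datetime(2024, 1, 1, 10, 0, 0)),
    Msg(author_id=2, created=datetime(2024, 1, 1, 10, 0, 0, 500)),
    Msg(author_id=1, created=datetime(2024, 1, 1, 10, 0, 1)),
]

# Bob (user 2) last read the conversation between the first and second message.
bob = Participant(user_id=2, last_read=datetime(2024, 1, 1, 10, 0, 0, 250))
newcomer = Participant(user_id=3)

bob_unread = unread_messages(bob, messages)
newcomer_unread = unread_messages(newcomer, messages)
```

With timestamps truncated to whole seconds, the first two messages and Bob's last_read would all compare equal, which is the precision problem mentioned above.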
Disclaimer:
This could be an overall structure. Most design decisions I made for the examples are based on assumptions; I could only mention some or the text would get too long, but feel free to ask.
Please excuse any typos and errors, I didn't have the chance to run the code.

Complex Django queryset filtering, involving Q objects and tricky logic

I have a web-based chat app developed in Django. People can form their own chat groups in it, invite others and chatter away in the said groups.
I want to isolate all users who were recently online AND: (i) have already been sent an invite for a given group, OR (ii) have already participated (i.e. replied) at least once in the same given group. Note that user is a vanilla django.contrib.auth user.
To accomplish this, I'm writing:
online_invited_replied_users = User.objects.filter(id__in=recently_online,(Q()|Q()))
Assume recently_online to be correctly formulated. What should be the two Q objects?
The first Q() ought to refer to invitees, the second to users who have replied at least once. I seem to be running into code-writer's block in formulating a well-rounded, efficient db query here. Please advise!
Relevant models are:
class Group(models.Model):
    topic = models.TextField(validators=[MaxLengthValidator(200)])
    owner = models.ForeignKey(User)
    created_at = models.DateTimeField(auto_now_add=True)

class GroupInvite(models.Model):
    #this is a group invite object
    invitee = models.ForeignKey(User, related_name='invitee')
    inviter = models.ForeignKey(User, related_name ='inviter')
    sent_at = models.DateTimeField(auto_now_add=True)
    which_group = models.ForeignKey(Group)

class Reply(models.Model):
    #if a user has replied in a group, count that as participation
    text = models.TextField(validators=[MaxLengthValidator(500)])
    which_group = models.ForeignKey(Group)
    writer = models.ForeignKey(User)
    submitted_on = models.DateTimeField(auto_now_add=True)
Note: feel free to ask for more info; I'm cognizant of the fact that I may have left something out.
For users that are already invited to a given group (group below), follow the GroupInvite.invitee foreign key backwards.
already_invited = Q(invitee__which_group=group)
The related_name you chose, invitee, doesn't really make sense, because it's linking to a group invite, not a user. Maybe 'group_invites_received' would be better.
For users that have already replied, follow the Reply.writer foreign key backwards.
already_replied = Q(reply__which_group=group)
In this case it's reply, because you haven't specified a related name.
Note that your current pseudo code
User.objects.filter(id__in=recently_online,(Q()|Q()))
will give an error 'non-keyword arg after keyword arg'.
You can fix this by either moving the non-keyword argument before the keyword argument,
User.objects.filter((Q()|Q()), id__in=recently_online)
or put them in separate filters:
User.objects.filter(id__in=recently_online).filter(Q()|Q())
Finally, note that you might need to use distinct() on your queryset, otherwise users may appear twice.
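Why duplicates can appear is easiest to see in SQL. The sketch below uses the standard-library sqlite3 module with invented tables and rows; it is not the SQL Django generates for the Q objects above, only the same kind of join-plus-OR query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE group_invite (id INTEGER PRIMARY KEY, invitee_id INTEGER, group_id INTEGER);
    CREATE TABLE reply (id INTEGER PRIMARY KEY, writer_id INTEGER, group_id INTEGER);
    INSERT INTO user VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO group_invite (invitee_id, group_id) VALUES (1, 5), (3, 9);
    INSERT INTO reply (writer_id, group_id) VALUES (1, 5), (1, 5), (2, 5);
""")

# Users who were invited to group 5 OR replied in group 5.
QUERY = """
    SELECT {distinct} u.name
    FROM user AS u
    LEFT JOIN group_invite AS gi ON gi.invitee_id = u.id
    LEFT JOIN reply AS r ON r.writer_id = u.id
    WHERE u.id IN (1, 2, 3)
      AND (gi.group_id = 5 OR r.group_id = 5)
    ORDER BY u.name
"""

# ann replied twice in group 5, so the join yields her row twice.
plain = [row[0] for row in conn.execute(QUERY.format(distinct=""))]
deduped = [row[0] for row in conn.execute(QUERY.format(distinct="DISTINCT"))]
```

Every matching reply row produces its own joined result row, so a user with several replies is repeated until DISTINCT collapses the rows again.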
