How to speed up sorting of django queryset? - python

We have a custom User model, and each instance has multiple interests.
Here is the Interest model:
class Interest(models.Model):
    name = models.CharField(max_length=50, unique=True)
    users = models.ManyToManyField(settings.AUTH_USER_MODEL,
                                   related_name="interests", blank=True)
When a user opens the page, we suggest other people to contact: those who have the most interests in common with this user.
To generate the list of users we use:
def suggested_people(user):
    queryset = User.objects.custom_filter(is_verified=True).exclude(pk=user.pk).order_by('-date_joined').select_related()
    users_sorted = sorted(queryset, key=lambda x: x.get_common_interest(user).count(), reverse=True)
    return users_sorted
Method of the User model:
def get_common_interest(self, user):
    """ Return a queryset of the interests this user shares with `user` """
    your_interests = user.interests.values_list('pk', flat=True)
    return self.interests.filter(pk__in=your_interests)
The problem is that the sort is very slow (about 8 seconds for 1,000 users). Is it possible to simplify it or speed it up?
I would be grateful for any advice!

Let's say we have an incoming user u for whom we want to show suggestions. Then the query would be:
from django.db.models import Count

# ids of the incoming user's interests
interests_ids = u.interests.values_list('id', flat=True)
suggestions = (
    User.objects
    .exclude(id=u.id)  # exclude the current user
    .filter(is_verified=True)  # keep only verified users
    .filter(interests__id__in=interests_ids)  # keep users sharing at least one interest
    .annotate(interests_count=Count('interests'))  # count the shared interests per user
    .order_by('-interests_count')  # most shared interests first
)
The suggestions queryset will not contain users who share no interests with user u. If you still want to show suggestions when this query returns nothing, you can filter User on some other criterion, e.g. users living in the same country or city.
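To make the ranking concrete, here is a plain-Python sketch (no database involved) of what the annotated query computes: for each other user, the number of interests shared with u, sorted in descending order. The dictionary interests_by_user and the function name rank_by_common_interests are made up for illustration; in the real application the database does this work in one query instead of one query per user.

```python
from collections import Counter


def rank_by_common_interests(interests_by_user, user):
    """Return (other_user, shared_count) pairs, most shared interests first.

    Users sharing no interest with `user` are omitted, mirroring the
    filter(interests__id__in=...) step of the query.
    """
    mine = interests_by_user[user]
    shared = Counter()
    for other, interests in interests_by_user.items():
        if other == user:
            continue  # exclude the current user
        common = len(mine & interests)
        if common:
            shared[other] = common
    # Sort by count descending, then by name for a predictable order.
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample data: user name -> set of interest names.
interests_by_user = {
    "u": {"chess", "django", "hiking"},
    "alice": {"django", "hiking", "music"},
    "bob": {"chess"},
    "carol": {"cooking"},
}
print(rank_by_common_interests(interests_by_user, "u"))
```

In this sample, carol drops out because she shares nothing with u, which matches the behaviour described for the suggestions queryset.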

Related

Django Get Last Object for each Value in List

I have a model called Purchase, with two fields, User and amount_spent.
This is models.py:
class Purchase(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    amount_spent = models.IntegerField()
    created_at = models.DateTimeField(auto_now_add=True)
I want to get the last purchase of each user in a list of users.
In views.py I have a list of User objects, and I want to get the last purchase for each user in the list. I can't find a way to do this in a single query. I checked the latest() method on QuerySets, but it only returns one object.
This is views.py:
purchases = Purchase.objects.filter(user__in=list_of_users)
# purchases contains all the purchases from these users; now I need the most recent one for each user.
I know I could group the purchases by user and then get the most recent ones, but I was wondering if there is a way to do this in a single query to the DB.
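Try this (distinct() with field names works only on PostgreSQL, and the order_by() must start with the same fields as distinct()):
Purchase.objects.filter(user__in=list_of_users).values("user_id", "amount_spent").order_by("user_id", "-id").distinct("user_id")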
You can annotate the users with the primary key of their last purchase, then fetch those purchases and attach them to the users:
from django.db.models import OuterRef, Subquery

users = User.objects.annotate(
    last_purchase_pk=Subquery(
        Purchase.objects.order_by('-created_at')
        .filter(user_id=OuterRef('pk'))
        .values('pk')[:1]
    )
)
purchases = {
    p.pk: p
    for p in Purchase.objects.filter(
        pk__in=[user.last_purchase_pk for user in users]
    )
}
for user in users:
    user.last_purchase = purchases.get(user.last_purchase_pk)
After this code snippet, the User objects in users will all have a last_purchase attribute that contains the last Purchase for that user, or None in case there is no such purchase.
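For readers who want to see the idea without a Django project, the following is a minimal sketch using the standard-library sqlite3 module. It shows a correlated subquery that picks the newest purchase per user, which is roughly the shape of SQL the Subquery/OuterRef annotation relies on (the exact SQL Django generates may differ). The table name purchase, the sample rows, and the helper last_purchases are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        amount_spent INTEGER NOT NULL,
        created_at TEXT NOT NULL
    );
    INSERT INTO purchase (user_id, amount_spent, created_at) VALUES
        (1, 10, '2021-01-01 10:00:00'),
        (1, 25, '2021-03-01 09:00:00'),
        (2, 40, '2021-02-15 12:00:00');
""")


def last_purchases(conn, user_ids):
    """Return (user_id, amount_spent, created_at) of the newest purchase per user."""
    placeholders = ", ".join("?" for _ in user_ids)
    sql = f"""
        SELECT p.user_id, p.amount_spent, p.created_at
        FROM purchase AS p
        WHERE p.user_id IN ({placeholders})
          AND p.id = (
              -- correlated subquery: newest purchase of the same user
              SELECT p2.id FROM purchase AS p2
              WHERE p2.user_id = p.user_id
              ORDER BY p2.created_at DESC
              LIMIT 1
          )
        ORDER BY p.user_id
    """
    return conn.execute(sql, list(user_ids)).fetchall()


# User 3 has no purchases, so no row comes back for that user.
print(last_purchases(conn, [1, 2, 3]))
```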

LEFT JOIN with other param in ON Django ORM

I have the following models:
class Customer(models.Model):
    name = models.CharField(max_length=255)
    email = models.EmailField(max_length=255, default='example@example.com')
    authorized_credit = models.IntegerField(default=0)
    balance = models.IntegerField(default=0)

class Transaction(models.Model):
    customer = models.ForeignKey(Customer, on_delete=models.CASCADE)
    payment_amount = models.IntegerField(default=0)  # can be 0 or have a value
    exit_amount = models.IntegerField(default=0)  # can be 0 or have a value
    transaction_date = models.DateField()
I want to query all customer information together with the date of each customer's last payment.
I have this query in PostgreSQL that is correct; it is exactly what I need:
select e.*, max(l.transaction_date) as last_date_payment
from app_customer as e
left join app_transaction as l
on e.id = l.customer_id and l.payment_amount != 0
group by e.id
order by e.id
But I need this query in Django for a serializer. I tried the following, but it produces a different query.
In Python:
print(Customer.objects.filter(transaction__isnull=True).order_by('id').query)
>>> SELECT app_customer.id, app_customer.name, app_customer.email, app_customer.balance FROM app_customer
LEFT OUTER JOIN app_transaction
ON (app_customer.id = app_transaction.customer_id)
WHERE app_transaction.id IS NULL
ORDER BY app_customer.id ASC
But what I need is these rows:
example
Whether or not you are working with a serializer, you can reuse the same view or function for both tasks.
First, to get the transaction details for a given customer object, you need to be aware of related_name. related_name has a default value, but you can set something distinctive so that you remember it.
Change your model:
class Transaction(models.Model):
    customer = models.ForeignKey(Customer, related_name="transac_set", on_delete=models.CASCADE)
related_name is Django's way of naming the reverse relationship from Customer to Transaction. With it, given a Customer object cus, cus.transac_set.all() fetches all of that customer's transactions.
Since you might need transaction details for multiple customers, you can use select_related() when querying. It follows the foreign key in the same SQL query, so the related customers are loaded without extra database hits.
Create a function to get all transactions of the given customers:
def get_cus_transac(cus_id_list):
    # cus_id_list is the list of customer ids you need to fetch
    cus_transac_list = Transaction.objects.select_related('customer').filter(customer_id__in=cus_id_list)
    return cus_transac_list
For your purpose you need the other direction, which is why you needed related_name: prefetch_related().
Create a function to get the customers together with their transactions. Warning: I typed this answer right before going to sleep, so it does not yet fetch only the latest transaction. I will add that later, but you can work along similar lines to get it done.
def get_latest_transac(cus_id_list):
    # cus_id_list is the list of customer ids you need to fetch
    latest_transac_list = Customer.objects.filter(id__in=cus_id_list).prefetch_related('transac_set')
    return latest_transac_list
Now, coming to the serializers: you need three (strictly only two, but a third one can serialize the Customer data plus the latest transaction that you need): one for Transaction, one for Customer, and a third to combine them.
There might be some mistakes in the code, or I might have missed some details, as I have not tested it. I am assuming you know how to write the serializers and views for this.
One approach is to use subqueries:
from django.db.models import OuterRef, Subquery

transaction_subquery = Transaction.objects.filter(
    customer=OuterRef('pk'), payment_amount__gt=0,
).order_by('-transaction_date')
Customer.objects.annotate(
    last_date_payment=Subquery(
        transaction_subquery.values('transaction_date')[:1]
    )
)
This will get all customer data and annotate each customer with the date of their last transaction whose payment_amount is greater than zero, in one query.
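The reason the question's SQL puts the payment condition in the ON clause rather than in WHERE is worth seeing with data: in the ON clause, customers without a payment are kept with a NULL date, while a WHERE condition drops them. The sketch below demonstrates this with the standard-library sqlite3 module; the sample rows are invented, and only the columns needed for the comparison are created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app_customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE app_transaction (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES app_customer(id),
        payment_amount INTEGER DEFAULT 0,
        transaction_date TEXT
    );
    INSERT INTO app_customer (id, name) VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO app_transaction (customer_id, payment_amount, transaction_date) VALUES
        (1, 50, '2021-01-10'),
        (1, 30, '2021-02-20'),
        (1, 0,  '2021-03-05'),
        (2, 0,  '2021-01-15');
""")

# Condition inside ON: every customer survives the LEFT JOIN.
ON_SQL = """
    SELECT e.id, MAX(l.transaction_date) AS last_date_payment
    FROM app_customer AS e
    LEFT JOIN app_transaction AS l
      ON e.id = l.customer_id AND l.payment_amount != 0
    GROUP BY e.id
    ORDER BY e.id
"""
# Same condition in WHERE: rows with NULL payment_amount are filtered out.
WHERE_SQL = """
    SELECT e.id, MAX(l.transaction_date) AS last_date_payment
    FROM app_customer AS e
    LEFT JOIN app_transaction AS l ON e.id = l.customer_id
    WHERE l.payment_amount != 0
    GROUP BY e.id
    ORDER BY e.id
"""
rows_on = conn.execute(ON_SQL).fetchall()
rows_where = conn.execute(WHERE_SQL).fetchall()
print(rows_on)
print(rows_where)
```

The subquery annotation behaves like the ON version: customers with no qualifying transaction are still returned, with last_date_payment set to None.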
To solve your problem:
I want to query for get all customer information and date of last payment.
You can try combining order_by() with distinct(), keeping one row per customer (the one with the latest transaction date):
Customer.objects.values('id', 'name', 'email', 'authorized_credit', 'balance', 'transaction__transaction_date').order_by('id', '-transaction__transaction_date').distinct('id')
Note:
distinct() with field arguments works only on PostgreSQL, and the order_by() must start with the same fields.
Usage of distinct: https://docs.djangoproject.com/en/3.2/ref/models/querysets/#distinct

Django form list field or json field?

I have a model called participants as below
class participants(models.Model):
    username = models.CharField(max_length=50)
    votes = models.IntegerField(default=0)
    voted_by = ?
votes is the total number of votes given by users, and a single user can vote multiple times. Once a user has voted, that user must wait 1 hour before voting again. Now I am wondering how I can store the users' ids so that it is easy to know who voted how many times, and the date and time of each user's most recent vote.
Can someone make a suggestion or point me to some examples that would help me solve this problem?
You can create another model (e.g. VotesHistory):
class VotesHistory(models.Model):
    class Meta:
        verbose_name = "Vote Log"
        verbose_name_plural = "Vote Logs"

    time = models.DateTimeField(auto_now=True, verbose_name="Time")
    uid = models.IntegerField(verbose_name="Voter's UserID")
    pid = models.IntegerField(verbose_name="Voted UserID")
Now, when user 1 votes for user 2, you can create an entry such as:
VotesHistory(uid=user1.id, pid=user2.id).save()
This kind of problem is generally solved with ForeignKey references.
# class names should begin with a capital letter and be singular for a model
class Participant(models.Model):
    username = models.CharField(max_length=50)

class Vote(models.Model):
    vote_to = models.ForeignKey(Participant, on_delete=models.CASCADE, related_name='vote_to')
    voted_by = models.ForeignKey(Participant, on_delete=models.CASCADE, related_name='voted_by')
    date_time = models.DateTimeField(auto_now_add=True)
Each vote by a participant is a row in the Vote table, i.e. an object of type Vote.
Something like:
vote = Vote(vote_to=some_participant_object,
            voted_by=someother_participant_object)
vote.save()
auto_now_add=True means the timestamp is set automatically when the object is first created, so you don't have to record when the vote was cast yourself. (auto_now=True would instead update the timestamp on every save.)
You can then query the number of votes cast by a particular participant using the ORM.
A basic filter query should be enough. Get all the votes by a particular participant.
Something like,
# just as an idea here, the next lines might not be perfect
votes = Vote.objects.filter(voted_by__id=some_participant_id)
# or
votes = Vote.objects.filter(voted_by=some_participant_object)
# check the timestamp of the last vote and build logic accordingly
This way it'll be easier to write ORM queries to count the number of votes a particular participant has or the number of votes a particular participant has cast.
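The "check the timestamp of the last vote" step for the one-hour rule from the question could look like the following sketch. It is plain Python so that it runs without a database; the names VOTE_COOLDOWN, can_vote and time_until_next_vote are hypothetical, and last_vote_time stands for the timestamp of the voter's most recent vote (or None if there is none).

```python
from datetime import datetime, timedelta

# Waiting period between two votes by the same user (from the question).
VOTE_COOLDOWN = timedelta(hours=1)


def can_vote(last_vote_time, now):
    """Return True if the user has never voted or the cooldown has elapsed."""
    if last_vote_time is None:
        return True
    return now - last_vote_time >= VOTE_COOLDOWN


def time_until_next_vote(last_vote_time, now):
    """Return how long the user still has to wait (zero if they may vote)."""
    if can_vote(last_vote_time, now):
        return timedelta(0)
    return VOTE_COOLDOWN - (now - last_vote_time)


last = datetime(2021, 5, 1, 12, 0)
print(can_vote(last, datetime(2021, 5, 1, 12, 30)))
print(time_until_next_vote(last, datetime(2021, 5, 1, 12, 30)))
print(can_vote(last, datetime(2021, 5, 1, 13, 0)))
```

In a Django view, last_vote_time would typically come from the newest Vote row of the voter, and now from django.utils.timezone.now() so that both values are timezone-aware.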

Django - Filter the prefetch_related queryset

I am trying to reduce my complexity by doing the following. I am trying to get all the teachers in active classrooms.
teacher/models.py:
class Teacher(models.Model):
    name = models.CharField(max_length=300)
classroom/models.py:
class Classroom(models.Model):
    name = models.CharField(max_length=300)
    teacher = models.ForeignKey(Teacher)
    students = models.ManyToManyField(Student)
    status = models.CharField(max_length=50)
admin/views.py:
teachers = Teacher.objects.prefetch_related(Prefetch('classroom_set', queryset=Classroom.objects.filter(status='Active')))
for teacher in teachers:
    classrooms = teacher.classroom_set.all()
    # run functions
By doing this I get teachers with their classrooms. But it also returns teachers with no active classrooms (an empty list), which I don't want. Because of this, I have to loop over thousands of teachers with an empty classroom_set. Is there any way to exclude the teachers whose classroom_set is empty?
This is my original question -
Django multiple queries with foreign keys
Thanks
If you want all teachers with at least one related active classroom, you do not need to prefetch anything; you can filter on the related objects:
Teacher.objects.filter(classroom__status='Active').distinct()
If you also want classroom_set to contain only the active classrooms, combine the filter with .prefetch_related:
from django.db.models import Prefetch

Teacher.objects.filter(
    classroom__status='Active'
).prefetch_related(
    Prefetch('classroom_set', queryset=Classroom.objects.filter(status='Active'))
).distinct()
The filter thus translates to a query like:
SELECT DISTINCT teacher.*
FROM teacher
JOIN classroom ON classroom.teacher_id = teacher.id
WHERE classroom.status = 'Active'
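Why the .distinct() call matters can be checked with a small experiment: joining to the classroom table yields one row per matching classroom, so a teacher with two active classrooms appears twice unless duplicates are removed. The sketch below uses the standard-library sqlite3 module; the table names teacher and classroom follow the question's models and, like the sample rows, are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE teacher (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE classroom (
        id INTEGER PRIMARY KEY,
        name TEXT,
        teacher_id INTEGER REFERENCES teacher(id),
        status TEXT
    );
    INSERT INTO teacher (id, name) VALUES (1, 'Kim'), (2, 'Lee'), (3, 'Max');
    INSERT INTO classroom (name, teacher_id, status) VALUES
        ('Math', 1, 'Active'),
        ('Physics', 1, 'Active'),
        ('Art', 2, 'Closed'),
        ('Music', 3, 'Active');
""")

# The same join, with and without DISTINCT.
JOIN_SQL = """
    SELECT {distinct} teacher.id, teacher.name
    FROM teacher
    JOIN classroom ON classroom.teacher_id = teacher.id
    WHERE classroom.status = 'Active'
    ORDER BY teacher.id
"""
without_distinct = conn.execute(JOIN_SQL.format(distinct="")).fetchall()
with_distinct = conn.execute(JOIN_SQL.format(distinct="DISTINCT")).fetchall()
print(without_distinct)  # Kim is listed once per active classroom
print(with_distinct)     # each teacher with an active classroom, once
```

Lee has no active classroom and is absent from both results, which is the behaviour the question asks for.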

How to write a complex Mysql Query in Django

I am new to Django and I am working on a small module of a Django application where I need to display the list of people who share interests with a particular user. For example, as a user I can see the list of people whose interests are similar to mine.
For this I have two models:
models.py
class Entity(models.Model):
    name = models.CharField(max_length=255, unique=True)

    def __str__(self):
        return self.name

class UserLikes(models.Model):
    class Meta:
        unique_together = (('user', 'entity'),)

    user = models.ForeignKey(User)
    entity = models.ForeignKey(Entity)

    def __str__(self):
        return self.user.username + " : " + self.entity.name
In the Entity table I store the entities a user can be interested in, e.g. football, music, code.
In UserLikes I store the relation recording which user likes which entity.
Now I have a query that finds which users have the most interests in common with a particular user:
SELECT y.user_id, GROUP_CONCAT(y.entity_id) likes, COUNT(*) total
FROM likes_userlikes x
JOIN likes_userlikes y ON y.entity_id = x.entity_id AND y.user_id <> x.user_id
WHERE x.user_id = ?
GROUP BY y.user_id
ORDER BY total desc;
The problem is how to write this query using Django querysets and turn it into a function.
# this gives you what are current user's interests
current_user_likes = UserLikes.objects.filter(user__id=user_id) \
.values_list('entity', flat=True).distinct()
# this gives you who are the persons that shares the same interests
user_similar_interests = UserLikes.objects.filter(entity__id__in=current_user_likes) \
.exclude(user__id=user_id) \
.values('user', 'entity').distinct()
# finally the count
user_similar_interests_count = user_similar_interests.count()
Here user_id is the id of the user you want to query for.
One piece of advice, though: it's not good practice to use the plural form for model names. Use UserLike or, better, UserInterest. Django adds the plural form where it needs it.
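To check what the question's self-join is expected to return (one row per other user with the number of shared entities), here is a sketch that runs a trimmed version of that SQL with the standard-library sqlite3 module. The sample rows and the helper name similar_users are invented for illustration, and the GROUP_CONCAT column is left out to keep the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE likes_userlikes (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        entity_id INTEGER NOT NULL,
        UNIQUE (user_id, entity_id)
    );
    INSERT INTO likes_userlikes (user_id, entity_id) VALUES
        (1, 10), (1, 20), (1, 30),
        (2, 10), (2, 20),
        (3, 30), (3, 40),
        (4, 40);
""")


def similar_users(conn, user_id):
    """Return (other_user_id, shared_entity_count), most shared entities first."""
    sql = """
        SELECT y.user_id, COUNT(*) AS total
        FROM likes_userlikes AS x
        JOIN likes_userlikes AS y
          ON y.entity_id = x.entity_id AND y.user_id <> x.user_id
        WHERE x.user_id = ?
        GROUP BY y.user_id
        ORDER BY total DESC, y.user_id
    """
    return conn.execute(sql, (user_id,)).fetchall()


# User 4 shares nothing with user 1 and therefore does not appear.
print(similar_users(conn, 1))
```

In the ORM, a per-user count like this is usually expressed by grouping with values('user') and annotating with Count('entity') before ordering by the count, rather than by calling .count() on the whole queryset, which returns a single number.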
