Optimize QuerySets in a Loop with indexes and better SQL

I have a View that returns some statistics about email lists growth. The models involved are:
models.py
class Contact(models.Model):
    email_list = models.ForeignKey(EmailList, related_name='contacts')
    customer = models.ForeignKey('Customer', related_name='contacts')
    status = models.CharField(max_length=8)
    create_date = models.DateTimeField(auto_now_add=True)
class EmailList(models.Model):
    customers = models.ManyToManyField('Customer',
                                       related_name='lists',
                                       through='Contact')
class Customer(models.Model):
    is_unsubscribed = models.BooleanField(default=False, db_index=True)
    unsubscribe_date = models.DateTimeField(null=True, blank=True, db_index=True)
In the view, I iterate over all EmailList objects and compute some metrics the following way:
view.py
class ListHealthView(View):
    def get(self, request, *args, **kwargs):
        start_date, end_date = get_dates_from_querystring(request)
        data = []
        for email_list in EmailList.objects.all():
            # historic data up to start_date
            past_contacts = email_list.contacts.filter(
                status='active',
                create_date__lt=start_date).count()
            past_unsubscribes = email_list.customers.filter(
                is_unsubscribed=True,
                unsubscribe_date__lt=start_date,
                contacts__status='active').count()
            past_deleted = email_list.contacts.filter(
                status='deleted',
                modify_date__lt=start_date).count()
            # data for the given timeframe
            new_contacts = email_list.contacts.filter(
                status='active',
                create_date__range=(start_date, end_date)).count()
            new_unsubscribes = email_list.customers.filter(
                is_unsubscribed=True,
                unsubscribe_date__range=(start_date, end_date),
                contacts__status='active').count()
            new_deleted = email_list.contacts.filter(
                status='deleted',
                modify_date__range=(start_date, end_date)).count()
            data.append({
                'new_contacts': new_contacts,
                'new_unsubscribes': new_unsubscribes,
                'new_deleted': new_deleted,
                'past_contacts': past_contacts,
                'past_unsubscribes': past_unsubscribes,
                'past_deleted': past_deleted,
            })
        return Response({'data': data})
This works fine, but as my DB has grown, the response time of this view has risen above 1s, and it occasionally causes long-running queries in the database. I think the most obvious improvement would be to index EmailList.customers, but maybe it needs to be a compound index? Also, is there a better way of doing this, maybe using aggregates?
EDIT
After @bdoubleu's answer I tried the following:
data = (
    EmailList.objects.annotate(
        past_contacts=Count(Subquery(
            Contact.objects.values('id').filter(
                email_list=F('pk'),
                status='active',
                create_date__lt=start_date)
        )),
        past_deleted=Count(Subquery(
            Contact.objects.values('id').filter(
                email_list=F('pk'),
                status='deleted',
                modify_date__lt=start_date)
        )),
    )
    .values(
        'past_contacts', 'past_deleted',
    )
)
I had to use F instead of OuterRef because my EmailList model has id = HashidAutoField(primary_key=True, salt='...'), which I think was causing ProgrammingError: more than one row returned by a subquery used as an expression, but I'm not completely sure about it.
Now the query works, but sadly all counts are returned as 0.

As is, your code produces 6 queries for every EmailList instance. For 100 instances that's a minimum of 600 queries, which slows things down.
You can optimize by using Subquery() expressions and .values().
from django.db.models import Count, OuterRef, Subquery
data = (
    EmailList.objects
    .annotate(
        past_contacts=Count(Subquery(
            Contact.objects.filter(
                email_list=OuterRef('pk'),
                status='active',
                create_date__lt=start_date
            ).values('id')
        )),
        past_unsubscribes=...,
        past_deleted=...,
        new_contacts=...,
        new_unsubscribes=...,
        new_deleted=...,
    )
    .values(
        'past_contacts', 'past_unsubscribes',
        'past_deleted', 'new_contacts',
        'new_unsubscribes', 'new_deleted',
    )
)
Update: for older versions of Django your subquery may need to look like the one below:
customers = (
    Customer.objects
    .annotate(
        template_count=Subquery(
            CustomerTemplate.objects
            .filter(customer=OuterRef('pk'))
            .values('customer')
            .annotate(count=Count('*')).values('count')
        )
    ).values('name', 'template_count')
)
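On the question about compound indexes and aggregates: at the SQL level, one option is conditional aggregation, where a single GROUP BY query computes several filtered counts at once. The sketch below only illustrates that idea with the standard-library sqlite3 module. The contact table, its columns, the index name and the sample rows are simplified assumptions, not the schema or SQL that Django generates for the models above.

```python
import sqlite3

# Hypothetical, simplified stand-in for the Contact table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (
        id INTEGER PRIMARY KEY,
        email_list_id INTEGER NOT NULL,
        status TEXT NOT NULL,
        create_date TEXT NOT NULL
    );
    -- Compound index on the columns used for grouping and filtering.
    CREATE INDEX contact_list_status_created
        ON contact (email_list_id, status, create_date);
""")
conn.executemany(
    "INSERT INTO contact (email_list_id, status, create_date) VALUES (?, ?, ?)",
    [
        (1, "active", "2019-01-10"),
        (1, "active", "2019-03-05"),
        (1, "deleted", "2019-03-06"),
        (2, "active", "2019-03-20"),
    ],
)


def contact_counts(conn, start_date, end_date):
    """Return {email_list_id: (past_contacts, new_contacts)} using one query."""
    rows = conn.execute(
        """
        SELECT email_list_id,
               SUM(CASE WHEN status = 'active' AND create_date < :start_date
                        THEN 1 ELSE 0 END) AS past_contacts,
               SUM(CASE WHEN status = 'active'
                         AND create_date BETWEEN :start_date AND :end_date
                        THEN 1 ELSE 0 END) AS new_contacts
        FROM contact
        GROUP BY email_list_id
        """,
        {"start_date": start_date, "end_date": end_date},
    ).fetchall()
    return {list_id: (past, new) for list_id, past, new in rows}


counts = contact_counts(conn, "2019-03-01", "2019-03-31")
print(counts)
```

In Django 2.0 and later, the closest ORM counterpart is probably an aggregate with a filter argument, such as Count('contacts', filter=Q(...)), but whether that fits the unsubscribe counts here would need checking against the real data.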

Related

Annotation inside an annotation in Django Subquery?

I've got a few models and am trying to speed up the page where I list out users.
The issue is that I was using model methods to display some of the data, but when I listed the Users it hit the DB multiple times per User. That added hundreds of extra queries (thousands when there were thousands of User objects in the list), so it was a serious performance hit.
I've since begun using annotate and prefetch_related, which has cut the queries down significantly. There is just one bit I can't figure out how to annotate.
I have a model method (on Summation model) I use to get a summary of Evaluation data for a user like this:
def evaluations_summary(self):
    evaluations_summary = (
        self.evaluation_set.all()
        .values("evaluation_type__name")
        .annotate(Count("evaluation_type"))
    )
    return evaluations_summary
I'm trying to figure out how to annotate that particular query on a User object.
The relationship looks like this: a User has multiple Summations, but only one is ever 'active', which is the one we display in the User list. Each Summation has multiple Evaluations, the summary of which we're trying to show as well.
Here is a summary of the relevant parts of code (including the Summation model method which gives an example of what is currently 'working' to display the data as needed) - I have also made a pastebin example for easier viewing.
# MODELS
class User(AbstractUser):
    employee_no = models.IntegerField(default=1)
    ...all the other usual attributes...
class Summation(CreateUpdateMixin, CreateUpdateUserMixin):
    # CreateUpdateMixin adds 'created_at' & 'updated_at'
    # CreateUpdateUserMixin adds 'created_by' & 'updated_by'
    employee = models.ForeignKey(
        User, on_delete=models.PROTECT, related_name="%(class)s_employee"
    )
    report_url = models.CharField(max_length=350, blank=True)
    ...other unimportant attributes...
    def evaluations_summary(self):
        evaluations_summary = (
            self.evaluation_set.all()
            .values("evaluation_type__name")
            .annotate(Count("evaluation_type"))
        )
        return evaluations_summary
class Evaluation(CreateUpdateMixin, CreateUpdateUserMixin):
    summation = models.ForeignKey(Summation, on_delete=models.PROTECT)
    evaluation_type = models.ForeignKey(
        EvaluationType, on_delete=models.PROTECT
    )
    evaluation_level = models.ForeignKey(
        EvaluationLevel, on_delete=models.PROTECT
    )
    evaluation_date = models.DateField(
        auto_now=False, auto_now_add=False, null=True, blank=True
    )
    published = models.BooleanField(default=False)
class EvaluationLevel(CreateUpdateMixin):
    name = models.CharField(max_length=50)
    description = models.CharField(max_length=50)
class EvaluationType(CreateUpdateMixin):
    name = models.CharField(max_length=50)
    description = models.CharField(max_length=50)
    evaluation_levels = models.ManyToManyField(EvaluationLevel)
# SERIALIZERS
class UserSerializer(serializers.HyperlinkedModelSerializer):
    multiple_locations = serializers.BooleanField()
    multiple_jobs = serializers.BooleanField()
    summation_status_due_date = serializers.DateField()
    summation_employee = SummationSerializer(many=True, read_only=True)
    evaluations_summary = serializers.SerializerMethodField()
    class Meta:
        model = User
        fields = [
            "url",
            "id",
            "username",
            "first_name",
            "last_name",
            "full_name",
            "email",
            "is_staff",
            "multiple_locations",
            "multiple_jobs",
            "summation_status_due_date",
            "summation_employee",
            "evaluations_summary",
        ]
    def get_evaluations_summary(self, obj):
        return (
            obj.summation_employee__evaluation_set.all()
            .values("evaluation_type__name")
            .annotate(Count("evaluation_type"))
        )
# CURRENT ANNOTATIONS
# Subqueries for evaluation_summary
active_summations = (
    Summation.objects.filter(employee=OuterRef("pk"), locked=False)
)
evaluations_set = (
    Evaluation.objects.filter(summation__in=active_summations)
    .order_by()
    .values("evaluation_type__name")
)
summary_set = evaluations_set.annotate(Count("evaluation_type"))
# the 'summation_employee__evaluation_set' prefetch does not seem
# to make an impact on queries needed
user_list = (
    User.objects.prefetch_related("summation_employee")
    .prefetch_related("summation_employee__evaluation_set")
    .filter(id__in=all_user_ids)
    # Get the total locations and if > 1, set multiple_locations to True
    .annotate(total_locations=Subquery(total_locations))
    .annotate(
        multiple_locations=Case(
            When(total_locations__gt=1, then=Value(True)),
            default=Value(False),
            output_field=BooleanField(),
        )
    )
    # Get the total jobs and if > 1, set multiple_jobs to True
    .annotate(total_jobs=Subquery(total_jobs))
    .annotate(
        multiple_jobs=Case(
            When(total_jobs__gt=1, then=Value(True)),
            default=Value(False),
            output_field=BooleanField(),
        )
    )
    # Get the due_date of the summation from the SummationStatus object
    .annotate(
        summation_status_due_date=Subquery(
            summation_status.values("summation_due")
        )
    )
    # I need to add the annotation here for the 'evaluations_summary' to avoid
    # having the database hit for every user (which could possibly range into the
    # thousands in certain cases)
    # I have tried a number of ways to obtain what I'm looking for
    .annotate(
        evaluations_summary=Subquery(
            evaluations_set.order_by()
            .values("evaluation_type__name")
            .annotate(Count("evaluation_type"))
        )
    )
    # this annotation gives the error: Only one expression can be specified in the
    # select list when the subquery is not introduced with EXISTS.
)
Is it even possible to turn that model method into an annotation? Am I close?
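The error quoted above is what the database reports when a subquery used as a single expression selects more than one column. A per-type summary is a set of rows, so it may not fit into one annotation at all. One possible workaround, sketched below in plain Python, is to fetch every (user id, evaluation type name) pair in a single query and group the pairs in memory. The sample rows are made up, and the idea that they come from something like Evaluation.objects.filter(summation__locked=False).values_list("summation__employee_id", "evaluation_type__name") is an assumption about the models, not tested code.

```python
from collections import Counter, defaultdict


def summarize_evaluations(rows):
    """Group (user_id, evaluation_type_name) pairs into per-user counts."""
    summary = defaultdict(Counter)
    for user_id, type_name in rows:
        summary[user_id][type_name] += 1
    # Convert to plain dicts so the result is easy to serialize.
    return {user_id: dict(counts) for user_id, counts in summary.items()}


# Hypothetical rows, shaped like the output of one values_list() query
# over the evaluations that belong to active summations.
rows = [
    (1, "Observation"),
    (1, "Observation"),
    (1, "Review"),
    (2, "Review"),
]
summary = summarize_evaluations(rows)
print(summary)
```

The serializer method could then look up summary.get(obj.id, {}) instead of querying once per user, which keeps the total at one extra query regardless of the number of users.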

How to add a property to a model which will be calculated based on input?

from django.db import models
from django.db.models import Q
from django.utils import timezone
class TransactionHistory(models.Model):
    from_account = models.ForeignKey(
        'Account',
        on_delete=models.CASCADE,
        related_name='from_account'
    )
    to_account = models.ForeignKey(
        'Account',
        on_delete=models.CASCADE,
        related_name='to_account'
    )
    amount = models.DecimalField(
        max_digits=12,
        decimal_places=2
    )
    created_at = models.DateTimeField(default=timezone.now)
    @property
    def way(self):
        # Here I need to access a list of user's accounts
        # (request.user.accounts) to mark the transaction
        # as going out or in.
        return
def get_own_transaction_history(me_user):
    my_accounts = me_user.accounts.all()
    # TODO: mark transactions with way as in and out
    own_transactions = TransactionHistory.objects.filter(
        Q(from_account__in=my_accounts) |
        Q(to_account__in=my_accounts)
    )
    return own_transactions
I want to add a "way" property to the model so that when I return the queryset via the serializer, the user can tell whether the transaction is going out of or into their account. But if I just add a property, it cannot be calculated with me_user in mind; AFAIK a property can only access the model's own fields, like "from_account" or "to_account".

Something like the following should work as an annotation using conditional expressions, although using __in in the When expressions may give a bit of trouble. The objects returned by this queryset will have an attribute way added to them by the annotation.
from django.db.models import Case, CharField, Q, Value, When
def get_own_transaction_history(me_user):
    my_accounts = me_user.accounts.all()
    return TransactionHistory.objects.filter(
        Q(from_account__in=my_accounts) |
        Q(to_account__in=my_accounts)
    ).annotate(
        way=Case(
            When(from_account__in=my_accounts, then=Value('out')),
            When(to_account__in=my_accounts, then=Value('in')),
            output_field=CharField(),
        )
    )
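To make the semantics of that Case expression concrete, here is a plain-Python equivalent of the same branching. It is only an illustration: the way function and the account ids are hypothetical, and the real annotation is evaluated by the database, not by this code. The point to notice is that When clauses are checked in order, so a transfer between two of the user's own accounts is labelled 'out'.

```python
def way(from_account_id, to_account_id, my_account_ids):
    """Mirror the Case/When annotation: the first matching branch wins."""
    if from_account_id in my_account_ids:
        return "out"
    if to_account_id in my_account_ids:
        return "in"
    # A Case() without a default evaluates to NULL, i.e. None in Python.
    return None


# Hypothetical ids of the accounts owned by the current user.
my_account_ids = {10, 11}

print(way(10, 99, my_account_ids))  # money leaves one of my accounts
print(way(99, 11, my_account_ids))  # money arrives in one of my accounts
print(way(10, 11, my_account_ids))  # transfer between my own accounts
```

If transfers between own accounts should be shown differently, a third When placed first could handle that case, but that goes beyond what the question asked for.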

How to build a queryset in a specific order in Django

I'm trying to list profiles of users. I want to list them in such a way that profiles with same city of the user should come first, then next priority should be state, then country, and finally, rest of the profiles. This is what I have tried.
model
class Profile(models.Model):
    uuid = UUIDField(auto=True)
    user = models.OneToOneField(User)
    country = models.ForeignKey(Country, null=True)
    state = models.ForeignKey(State, null=True)
    city = models.ForeignKey(City, null=True)
views.py
current_user = Profile.objects.filter(user=request.user)
profiles_city = Profile.objects.filter(city=current_user.city)
profiles_state = Profile.objects.filter(state=current_user.state)
profiles_country = Profile.objects.filter(country=current_user.country)
profiles_all = Profile.objects.all()
profiles = (profiles_city | profiles_state | profiles_country | profiles_all).distinct()
But it is yielding the same result as Profile.objects.all()
Please help me. thanks in advance

You need the order_by method of QuerySet, which orders objects based on the passed parameters; this is done in the database:
Profile.objects.order_by(
    'current_user__city',
    'current_user__state',
    'current_user__country',
)
Edit:
If you want to sort by the city, state and country names of the logged in user, you can do this on the Python level, using sorted, and a custom key callable:
from functools import partial
def get_sort_order(profile, logged_in_profile):
    # This is a simple example, you need to filter against
    # the city-state-country combo to match precisely. For
    # example, multiple countries can have the same city/
    # state name.
    if logged_in_profile.city == profile.city:
        return 1
    if logged_in_profile.state == profile.state:
        return 2
    if logged_in_profile.country == profile.country:
        return 3
    return 4
logged_in_profile = request.user.profile  # logged-in user's profile
get_sort_order_partial = partial(get_sort_order, logged_in_profile=logged_in_profile)
sorted(
    Profile.objects.all(),
    key=get_sort_order_partial,
)
Doing the same on the database level, using Case and When to have a Python if-elif-else like construct:
from django.db.models import Case, When, IntegerField
Profile.objects.order_by(
    Case(
        When(city=logged_in_profile.city, then=1),
        When(state=logged_in_profile.state, then=2),
        When(country=logged_in_profile.country, then=3),
        default=4,
        output_field=IntegerField(),
    )
)
This will result in a queryset and also has the added advantage of being faster as all the operations would be done on the database (SELECT CASE WHEN ...).
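For readers who want to see the SELECT CASE WHEN idea in isolation, here is a small sketch using the standard-library sqlite3 module. The profile table, its text columns and the sample rows are assumptions made for the demo (the real model uses foreign keys), and the SQL is hand-written, not what Django emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile "
    "(id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT)"
)
# Hypothetical sample profiles.
conn.executemany(
    "INSERT INTO profile (id, city, state, country) VALUES (?, ?, ?, ?)",
    [
        (1, "Lyon", "ARA", "FR"),
        (2, "Austin", "TX", "US"),
        (3, "Dallas", "TX", "US"),
        (4, "Boston", "MA", "US"),
        (5, "Austin", "TX", "US"),
    ],
)


def ordered_profile_ids(conn, city, state, country):
    """Order profiles by how closely they match the given location."""
    rows = conn.execute(
        """
        SELECT id FROM profile
        ORDER BY CASE
                     WHEN city = ? THEN 1
                     WHEN state = ? THEN 2
                     WHEN country = ? THEN 3
                     ELSE 4
                 END,
                 id
        """,
        (city, state, country),
    ).fetchall()
    return [row[0] for row in rows]


ids = ordered_profile_ids(conn, "Austin", "TX", "US")
print(ids)
```

The extra id in the ORDER BY only makes the order within each priority group deterministic; any other tie-breaker would work as well.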

Django - Multi filtering queryset return empty queryset

I have a problem with a queryset in Django 2.0, and after some research I haven't found any problem that looks like mine.
I think it's because of my very old legacy database, created by someone I don't know.
So, I have an SQLite database that looks like this:
As you can see, the Properties table doesn't have a primary key, so I generated the models with Django's inspectdb command; they look like this:
from django.db import models
class Record(models.Model):
    id = models.IntegerField(db_column='ID', primary_key=True)
    class Meta:
        db_table = 'Records'
    def __str__(self):
        return "%s" % self.id
class Propertie(models.Model):
    id = models.ForeignKey(Record, models.DO_NOTHING, db_column='ID', primary_key=True)
    item = models.CharField(db_column='Item', max_length=500)
    value = models.CharField(db_column='Value', max_length=500)
    class Meta:
        db_table = 'Properties'
    def __str__(self):
        return '[%s]- %s -> %s' % (self.item, self.value, self.id)
I set Propertie.id as primary_key, but it's a ForeignKey, and Django says to make this field a OneToOneField. That is normal and logical, but 1 Record is linked to 9 Properties, so Propertie.id can't be unique. This is my first problem, because I can't alter the database.
My second and real problem appears when I run this query:
def my_view(request):
    epoch = datetime.date(1970, 1, 1)
    period_from = stat_form.cleaned_data.get("period_from")
    period_to = stat_form.cleaned_data.get("period_to")
    product = stat_form.cleaned_data.get("kit")
    timestamp_from = period_from - epoch
    timestamp_to = period_to - epoch
    records = Record.objects.using("statool").filter(
        propertie__item="product",
        propertie__value=product,
    ).filter(
        propertie__item="stamp",
        propertie__value__gt=str(int(timestamp_from.total_seconds())),
        propertie__value__lt=str(int(timestamp_to.total_seconds())),
    ).count()
This QuerySet is empty, but it should return approximately 16XXX Records.
I don't know what is happening, because if I run this query:
records = Record.objects.using("statool").filter(
    propertie__item="product",
    propertie__value=product,
)
it returns a result, but the second filter doesn't work...
The goal of these queries is to get the Records with the specific date and product name.
The 9 possible values of the item field in Properties are:
product
version
tool
stamp
user
host
site
project
args
A future query with the same logic will be applied just after to get version by product and by site.
Thank you for your help!
And sorry for my bad English :)

To answer my own question:
First, I stopped trying to use multiple .filter() calls, because when I run:
records = Record.objects.using("statool").filter(
    propertie__item="product",
    propertie__value=product,
).filter(
    propertie__item="stamp",
    propertie__value__gt=str(int(timestamp_from.total_seconds())),
    propertie__value__lt=str(int(timestamp_to.total_seconds())),
).count()
After the first .filter(), the Record objects lost their reference to propertie_set, so I couldn't filter by propertie.
As @ukemi and @Ralf said, using:
.filter(
    propertie__item="stamp",
    propertie__value__gt=str(int(timestamp_from.total_seconds())),
    propertie__value__lt=str(int(timestamp_to.total_seconds())),
)
is a really bad idea if you want an exact query.
So this is my solution:
def select_stats(request):
    epoch = datetime.date(1970, 1, 1)
    period_from = stat_form.cleaned_data.get("period_from")
    period_to = stat_form.cleaned_data.get("period_to")
    product = stat_form.cleaned_data.get("kit")
    timestamp_from = period_from - epoch
    timestamp_to = period_to - epoch
    timestamp_from = int(timestamp_from.total_seconds())
    timestamp_to = int(timestamp_to.total_seconds())
    all_product = Propertie.objects.using("statool").filter(
        item="product",
        value=product
    ).values_list("id", flat=True)
    all_stamp = Propertie.objects.using("statool").annotate(
        date=Cast("value", IntegerField())
    ).filter(
        date__gte=timestamp_from,
        date__lt=timestamp_to
    ).values_list("id", flat=True)
    all_records = Record.objects.using("statool").filter(
        id__in=all_product.intersection(all_stamp)
    )
    all_recorded_propertie = Propertie.objects.using("statool").filter(id__in=all_records)
    all_version = all_recorded_propertie.filter(
        id__in=all_records,
        item="version"
    ).values_list("value", flat=True).distinct()
    all_site = all_recorded_propertie.filter(
        id__in=all_records,
        item="site"
    ).values_list("value", flat=True).distinct()
    stats_site = {}
    for version in all_version:
        stats_site[version] = {}
        id_version = all_recorded_propertie.filter(
            item="version",
            value=version
        ).values_list("id", flat=True)
        for site in all_site:
            id_site = all_recorded_propertie.filter(
                item="site",
                value=site
            ).values_list("id", flat=True)
            stats_site[version][site] = id_version.intersection(id_site).count()
I solved the timestamp problem this way:
all_stamp = Propertie.objects.using("statool").annotate(
    date=Cast("value", IntegerField())
).filter(
    date__gte=timestamp_from,
    date__lt=timestamp_to
).values_list("id", flat=True)
Thanks to @erikreed from this thread: Django QuerySet Cast
By the way, this is the most efficient way I've found to do the job.
But if we run this view, we get this runtime:
view query runtime
As you can see, every QuerySet is very fast, but the intersections between version.id and site.id are very slow (more than 2 minutes).
If someone knows a better way to do these queries, just let us know :)
I hope this helps someone.
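As a possible alternative to the nested intersections, the version-by-site counts can be expressed as one SQL query that joins the Properties table to itself once per item and groups the result. The sketch below uses the standard-library sqlite3 module with made-up rows. It assumes every record has exactly one row per item, and it is hand-written SQL, not the solution from this answer and not measured against the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Properties (ID INTEGER, Item TEXT, Value TEXT)")

# Hypothetical records: one row per (record id, item) pair.
records = {
    1: {"product": "kitA", "stamp": "1500000100", "version": "1.0", "site": "paris"},
    2: {"product": "kitA", "stamp": "1500000200", "version": "1.0", "site": "paris"},
    3: {"product": "kitA", "stamp": "1500000300", "version": "2.0", "site": "lyon"},
    4: {"product": "kitB", "stamp": "1500000150", "version": "1.0", "site": "paris"},
    5: {"product": "kitA", "stamp": "1400000000", "version": "1.0", "site": "lyon"},
}
conn.executemany(
    "INSERT INTO Properties (ID, Item, Value) VALUES (?, ?, ?)",
    [
        (record_id, item, value)
        for record_id, props in records.items()
        for item, value in props.items()
    ],
)


def version_site_counts(conn, product, ts_from, ts_to):
    """Count matching records per (version, site) pair in a single query."""
    rows = conn.execute(
        """
        SELECT v.Value, s.Value, COUNT(*)
        FROM Properties AS p
        JOIN Properties AS t ON t.ID = p.ID AND t.Item = 'stamp'
        JOIN Properties AS v ON v.ID = p.ID AND v.Item = 'version'
        JOIN Properties AS s ON s.ID = p.ID AND s.Item = 'site'
        WHERE p.Item = 'product' AND p.Value = ?
          AND CAST(t.Value AS INTEGER) >= ?
          AND CAST(t.Value AS INTEGER) < ?
        GROUP BY v.Value, s.Value
        """,
        (product, ts_from, ts_to),
    ).fetchall()
    return {(version, site): count for version, site, count in rows}


stats = version_site_counts(conn, "kitA", 1500000000, 1600000000)
print(stats)
```

Unlike the nested loops above, this only returns the (version, site) pairs that actually occur, so combinations with a count of 0 would have to be filled in afterwards if they are needed.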

Django queryset filter model attribute against other model attribute

I don't know if I made myself clear with this question title, but here's my problem:
I have this model, which is just a transactional model:
class InstanceItemEvaluation(models.Model):
    instance = models.ForeignKey(Instance)
    item = models.ForeignKey(InstanceItem)
    user = models.ForeignKey(User)
    factor = models.ForeignKey(Factor)
    measure = models.ForeignKey(Measure)
    measure_value = models.ForeignKey(MeasureValue, null=True, blank=True)
    evaluated_at = models.DateTimeField(null=True, blank=True)
Here is a query I must run to only retrieve valid values from the database:
@staticmethod
def get_user_evaluations_by_instance(user, instance):
    qs = InstanceItemEvaluation.objects.filter(
        user=user,
        instance=instance,
        factor__is_active=True,
        measure__is_active=True).exclude(
        factor__measure=None)
    return qs
The queryset speaks for itself: I am just filtering by the user, the working instance, and so on. This queryset produces this SQL:
SELECT "workspace_instanceitemevaluation"."id",
"workspace_instanceitemevaluation"."instance_id",
"workspace_instanceitemevaluation"."item_id",
"workspace_instanceitemevaluation"."user_id",
"workspace_instanceitemevaluation"."factor_id",
"workspace_instanceitemevaluation"."measure_id",
"workspace_instanceitemevaluation"."measure_value_id",
"workspace_instanceitemevaluation"."evaluated_at"
FROM "workspace_instanceitemevaluation"
INNER JOIN "measures_measure" ON ( "workspace_instanceitemevaluation"."measure_id" = "measures_measure"."id" )
INNER JOIN "factors_factor" ON ( "workspace_instanceitemevaluation"."factor_id" = "factors_factor"."id" )
WHERE ("measures_measure"."is_active" = True
AND "workspace_instanceitemevaluation"."user_id" = 1
AND "factors_factor"."is_active" = True
AND "workspace_instanceitemevaluation"."instance_id" = 5
AND NOT ("factors_factor"."measure_id" IS NULL));
So far so good. But now I need to put this clause on the query:
AND "factors_factor"."measure_id" = "measures_measure"."id"
Which would mean I am only looking for measure values that are currently associated with my factors. Anyway, I tried to do something like this (look at the last filter):
@staticmethod
def get_user_evaluations_by_instance(user, instance):
    qs = InstanceItemEvaluation.objects.filter(
        user=user,
        instance=instance,
        factor__is_active=True,
        measure__is_active=True).exclude(
        factor__measure=None).filter(
        factor__measure=measure)
    return qs
But that doesn't even make sense. Now I am kinda stuck, and couldn't find a solution. Of course that's something I can do iterating the result and removing the results I don't need. But I am trying to figure out if it is possible to achieve this SQL query I mentioned using the Django queryset API.

I'm not sure if it will work in this case, but generally you can use F() objects for this.
from django.db.models import F
qs = InstanceItemEvaluation.objects.filter(
    user=user,
    instance=instance,
    factor__is_active=True,
    measure__is_active=True).exclude(
    factor__measure=None).filter(
    factor__measure=F('measure_id'))
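What the F() filter is meant to express is a comparison between two columns inside the same query. As a rough illustration of that SQL clause only, here is a sketch with the standard-library sqlite3 module; the factor and evaluation tables are cut-down, hypothetical versions of the real ones, and the query is hand-written, not Django's output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE factor (id INTEGER PRIMARY KEY, measure_id INTEGER);
    CREATE TABLE evaluation (
        id INTEGER PRIMARY KEY,
        factor_id INTEGER NOT NULL REFERENCES factor (id),
        measure_id INTEGER NOT NULL
    );
    -- Hypothetical data: factor 3 has no measure attached.
    INSERT INTO factor (id, measure_id) VALUES (1, 100), (2, 200), (3, NULL);
    INSERT INTO evaluation (id, factor_id, measure_id) VALUES
        (1, 1, 100),
        (2, 1, 999),
        (3, 2, 200),
        (4, 3, 100);
""")


def current_evaluation_ids(conn):
    """Keep evaluations whose measure is the one currently set on the factor."""
    rows = conn.execute(
        """
        SELECT e.id
        FROM evaluation AS e
        JOIN factor AS f ON f.id = e.factor_id
        WHERE f.measure_id = e.measure_id
        ORDER BY e.id
        """
    ).fetchall()
    return [row[0] for row in rows]


matching_ids = current_evaluation_ids(conn)
print(matching_ids)
```

In this demo the row whose factor has a NULL measure drops out as well, because NULL never compares equal to anything, so the separate exclude on factor__measure=None may turn out to be redundant once the column comparison is in place.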
