I'm trying to improve the performance of one of my Django applications to make it run a bit smoother, as a first iteration on improving what I currently have running. While profiling I noticed a very high number of SQL queries being executed on a couple of pages.
The dashboard page, for instance, easily triggers 250+ SQL queries. Further investigation pointed me to the following piece of code in my views.py:
for project in projects:
    for historicaldata in project.historical_data_for_n_months_ago(i):
        for key in ('hours', 'expenses'):
            history_data[key] = history_data[key] + getattr(historicaldata, key)
Relevant function in models.py file:
def historical_data_for_n_months_ago(self, n=1):
    n_year, n_month = n_months_ago(n)
    try:
        return self.historicaldata_set.filter(year=n_year, month=n_month)
    except HistoricalData.DoesNotExist:
        return []
As you can see, this causes a lot of queries to be executed for each project in the list. Originally it was set up this way to keep functionality central at the model level and provide convenience functions across the application.
What would be possible ways to reduce the number of queries executed when loading this page? I was thinking of either removing the convenience function and just working with select_related() in the view, but it would still need a lot of queries in order to filter out records for a given year and month.
Thanks a lot in advance!
Edit: As requested, some more info on the related models.
Project
class Project(models.Model):
    name = models.CharField(max_length=200)
    status = models.IntegerField(choices=PROJECT_STATUS_CHOICES, default=1)
    last_updated = models.DateTimeField(default=datetime.datetime.now)
    total_hours = models.DecimalField(default=0, max_digits=10, decimal_places=2)
    total_expenses = models.DecimalField(default=0, max_digits=10, decimal_places=2)

    def __str__(self):
        return "{i.name}".format(i=self)

    def historical_data_for_n_months_ago(self, n=1):
        n_year, n_month = n_months_ago(n)
        try:
            return self.historicaldata_set.filter(year=n_year, month=n_month)
        except HistoricalData.DoesNotExist:
            return []
HistoricalData
class HistoricalData(models.Model):
    project = models.ForeignKey(Project, on_delete=models.CASCADE)
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    year = models.IntegerField()
    month = models.IntegerField()
    hours = models.DecimalField(max_digits=10, decimal_places=2, default=0)
    expenses = models.DecimalField(max_digits=10, decimal_places=2, default=0)

    def __str__(self):
        return "Historical data {i.month}/{i.year} for {i.person} ({i.project})".format(i=self)
I don't think looping through querysets like this is ever a good idea, so it would be better if you could find some other way. If you could elaborate on your view function and exactly what it's supposed to do, maybe I could help further.
If you want all the historical_data entries for a project (a reverse relation), you need to use prefetch_related. Since you want only a specific portion of the historical data associated with each project, you need to combine it with Prefetch.
from django.db.models import Prefetch
Project.objects.prefetch_related(
    Prefetch(
        'historicaldata_set',
        queryset=HistoricalData.objects.filter(year=n_year, month=n_month)
    )
)
After that, you can loop over this queryset in your Django template (if you are using one). You can also pass it to a DRF serializer, and that would also get your work done :)
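To then total the hours and expenses without any per-project queries, the view loop can work purely on the prefetched rows. A minimal sketch, with two assumptions not in the original post: the Prefetch above gets a to_attr='history' argument, and sum_history is an illustrative helper name:

```python
from collections import defaultdict
from decimal import Decimal

def sum_history(projects, fields=('hours', 'expenses')):
    # Walk the already-prefetched rows; no SQL is issued inside the loop.
    totals = defaultdict(Decimal)
    for project in projects:
        # 'history' is set by Prefetch(..., to_attr='history') (assumption).
        for row in project.history:
            for field in fields:
                totals[field] += getattr(row, field)
    return dict(totals)
```

With projects = Project.objects.prefetch_related(Prefetch('historicaldata_set', queryset=HistoricalData.objects.filter(year=n_year, month=n_month), to_attr='history')), the whole dashboard should come down to two queries regardless of the number of projects.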
I have 3 models, Company, Discount and DiscountCompanyRelation, as below:
class Company(models.Model):
    name = models.CharField(max_length=150)

    def __str__(self):
        return self.name

class Discount(models.Model):
    name = models.CharField(max_length=150)
    discount_value = models.IntegerField()

    def __str__(self):
        return self.name

class DiscountCompanyRelation(models.Model):
    company = models.ForeignKey(Company, on_delete=models.CASCADE)
    discount = models.ForeignKey(Discount, on_delete=models.CASCADE)
    is_active = models.BooleanField(default=True)
I know how to assign a previously created discount to one company: I do it through a DiscountCompanyRelationForm and choose the company from the form's list. But I want to assign a discount to all companies with one click. How can I do this? I tried getting all the IDs with:

Company.objects.values_list('pk', flat=True)

and iterating through them, but I don't think that's how it should be done, and I have a problem saving the form with:

form.save()

I tried all day but now I've given up.
Sorry if this is basic knowledge. I've only been working with Django for a few days.
If I understand the question, you want to choose a subset of companies in the Company table, and apply a particular Discount.
The first can be a ModelMultipleChoiceField, the second a ModelChoiceField. Put these in a form with appropriate querysets for the companies and discount that can be chosen, and when the form validates, apply the discount:
discount = form.cleaned_data['discount']
companies = form.cleaned_data['companies']
for company in companies:
    relation = DiscountCompanyRelation(
        discount=discount,
        company=company,
        is_active=True,  # or whatever you need
    )
    relation.save()
You need to think about what happens when a discount is applied to a company which already has a discount. You'll put code in the above loop to check and implement the appropriate action.
I'd strongly recommend specifying a related_name on your ForeignKeys rather than relying on whatever Django generates automagically if you don't.
You might also want to look at the "through" option of a model ManyToManyField, because that's another way to create the same DB structure but brings Django's ManyToMany support code online for you.
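If the set of companies is large, the per-company save() in the loop above costs one INSERT each; building the rows in memory and persisting them with a single bulk_create() avoids that. A sketch, where build_relations is an illustrative helper name (not from the original post):

```python
def build_relations(relation_model, discount, companies, is_active=True):
    # Build unsaved relation rows in memory; the caller persists them
    # in one query with relation_model.objects.bulk_create(rows).
    return [
        relation_model(company=company, discount=discount, is_active=is_active)
        for company in companies
    ]
```

In the view that would be something like DiscountCompanyRelation.objects.bulk_create(build_relations(DiscountCompanyRelation, discount, Company.objects.all())). Keep in mind that bulk_create skips the model's save() method and the pre_save/post_save signals.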
My problem is that Django inserts entries way too slowly (I didn't even time it, but it was more than 5 minutes) for 100k entries from a Pandas CSV file. What I am doing is parsing the CSV file and then saving those objects to PostgreSQL in Django. It is going to be a daily cronjob, with CSV files that differ in most of their entries (some can be duplicates of the previous day's, or the owner could be the same).
I haven't tried raw queries, but I don't think that would help much.
I am really stuck at this point, honestly. Apart from some iteration tweaks and using a generator rather than an iterator, I cannot improve the insertion time.
class TrendType(models.Model):
    """Describes the report type (posts, videos, owners)."""
    TREND_TYPE = Choices('video', 'posts', 'owners')  # << mnemonic
    title = models.CharField(max_length=50)
    mnemonic = models.CharField(choices=TREND_TYPE, max_length=30)

class TrendSource(models.Model):
    """The source (file) of a report."""
    trend_type = models.ForeignKey(TrendType, on_delete=models.CASCADE)
    load_date = models.DateTimeField()
    filename = models.CharField(max_length=100)

class TrendOwner(models.Model):
    """The owner of the data (group, user, etc.)."""
    TREND_OWNERS = Choices('group', 'user', '_')
    title = models.CharField(max_length=50)
    mnemonic = models.CharField(choices=TREND_OWNERS, max_length=30)

class Owner(models.Model):
    """Details about the owner."""
    link = models.CharField(max_length=255)
    name = models.CharField(max_length=255)
    trend_type = models.ForeignKey(TrendType, on_delete=models.CASCADE)
    trend_owner = models.ForeignKey(TrendOwner, on_delete=models.CASCADE)

class TrendData(models.Model):
    """The model packaging all the data."""
    owner = models.ForeignKey(Owner, on_delete=models.CASCADE)
    views = models.IntegerField()
    views_u = models.IntegerField()
    likes = models.IntegerField()
    shares = models.IntegerField()
    interaction_rate = models.FloatField()
    mean_age = models.IntegerField()
    source = models.ForeignKey(TrendSource, on_delete=models.CASCADE)
    date_trend = models.DateTimeField()  # << take it as a date
Basically, I would love a good solution for 'fast' insertion into the database. Is that even possible given these models?
Maybe you don't need an ORM here? You can try implementing a simple wrapper around the typical SQL requests.
Use bulk reads/writes, e.g. bulk_create() in the Django ORM, or in your wrapper.
Check https://docs.djangoproject.com/en/2.2/topics/db/optimization/
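For 100k rows, the biggest win is usually batching the inserts so each bulk_create() call sends one multi-row INSERT instead of one statement per object. A sketch of a chunking helper (batched is an illustrative name; note bulk_create also accepts a batch_size argument that does similar splitting for you):

```python
from itertools import islice

def batched(iterable, size=5000):
    # Yield lists of at most `size` items so one bulk_create call never
    # builds a single enormous INSERT and memory stays bounded.
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Then something like: for chunk in batched(trend_rows_from_csv): TrendData.objects.bulk_create(chunk), ideally wrapped in transaction.atomic() so the daily load commits once.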
The problem is not with Django but rather with PostgreSQL itself. My suggestion would be to change your backend. PostgreSQL is good for UPDATE-heavy data, but there are better DBs for INSERT-heavy data (Postgresql vs TimescaleDB). However, I don't think there is a Django ORM for TimescaleDB.
My suggestion would be to use Redis. Its primary use is as an in-memory cache, but you can make it persist your data too. There is also a Python ORM for Redis called ROM.
I have a Django system that runs billing for thousands of customers on a regular basis. Here are my models:
class Invoice(models.Model):
    balance = models.DecimalField(
        max_digits=6,
        decimal_places=2,
    )

class Transaction(models.Model):
    amount = models.DecimalField(
        max_digits=6,
        decimal_places=2,
    )
    invoice = models.ForeignKey(
        Invoice,
        on_delete=models.CASCADE,
        related_name='invoices',
        null=False
    )
When billing is run, thousands of invoices with tens of transactions each are created using several nested for loops, which triggers an insert for each created record. I could run bulk_create() on the transactions for each individual invoice, but this still results in thousands of calls to bulk_create().
How would one bulk-create thousands of related models so that the relationship is maintained and the database is used in the most efficient way possible?
Notes:
I'm looking for a native Django solution that would work on all databases (with the possible exception of SQLite).
My system runs billing in a celery task to decouple long-running code from active requests, but I am still concerned with how long it takes to complete a billing cycle.
The solution should assume that other requests or running tasks are also reading from and writing to the tables in question.
You could bulk_create all the Invoice objects, refresh them from the db, so that they all have ids, create the Transaction objects for all the invoices and then also save them with bulk_create. All of this can be done inside a single transaction.atomic context.
Also, specifically for Django 1.10 and Postgres, look at this answer.
You can do it with two bulk_create queries, using the following method.

new_invoices = []
new_transactions = []

for invoice_params in invoice_rows:  # placeholder loops, as in the original
    invoice = Invoice(**invoice_params)
    new_invoices.append(invoice)
    for transaction_params in transaction_rows:
        transaction = Transaction(**transaction_params)
        transaction.invoice = invoice
        new_transactions.append(transaction)

Invoice.objects.bulk_create(new_invoices)
for each in new_transactions:
    each.invoice_id = each.invoice.id
Transaction.objects.bulk_create(new_transactions)
Another way to do this is the code snippet below:

from django.db import transaction

new_invoices = []
new_transactions = []

for sth in sth_else:
    ...
    invoice = Invoice(params)
    new_invoices.append(invoice)

with transaction.atomic():
    # Materialize the pre-existing ids now; a lazy queryset would be
    # re-evaluated after the insert and include the new rows.
    other_invoice_ids = list(Invoice.objects.values_list('id', flat=True))
    Invoice.objects.bulk_create(new_invoices)
    # The newly created ids are the ones not present before the insert.
    created_ids = Invoice.objects.exclude(id__in=other_invoice_ids).values_list('id', flat=True)
    for invoice_id in created_ids:
        # Named new_transaction so we don't shadow django.db.transaction.
        new_transaction = Transaction(params, invoice_id=invoice_id)
        new_transactions.append(new_transaction)
    Transaction.objects.bulk_create(new_transactions)
I wrote this answer based on this post on another question in the community.
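On PostgreSQL with Django ≥ 1.10 (mentioned in an answer above), bulk_create() populates the primary keys on the objects it creates, so the id-diffing workaround isn't needed: you can zip the saved invoices with their pending transactions. A sketch; link_transactions is an illustrative helper name, and it assumes each invoice's transactions are kept in a parallel list:

```python
def link_transactions(invoices, transactions_per_invoice):
    # After Invoice.objects.bulk_create(invoices) on PostgreSQL
    # (Django >= 1.10), each invoice has .pk populated; attach it to
    # the children before their own bulk_create.
    linked = []
    for invoice, transactions in zip(invoices, transactions_per_invoice):
        for t in transactions:
            t.invoice_id = invoice.pk
            linked.append(t)
    return linked
```

Afterwards a single Transaction.objects.bulk_create(linked) persists every child row, all inside one transaction.atomic() block.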
So I've been stuck on a design problem for the last couple of days and have sunk countless hours into it, to no avail.
My problem is that I wish to return all the active Articles. I have made a method on the model, but I am unable to use .filter(is_active=True), which would be the world's best solution.
So now I have turned the method into one long filter in the ArticleManager, the problem being that I cannot figure out a way to count the current clicks in a way that is useful to me. (The current_clicks method on the Article model is what I am aiming for.)
models.py
class ArticleManager(models.Manager):
    def get_queryset(self):
        return super(ArticleManager, self).get_queryset().filter(
            article_finish_date=None
        ).filter(article_publish_date__lte=timezone.now())
        # this is where I need something to the effect of
        # .filter(article_max_clicks__gt=click_set.count())

class Article(models.Model):
    article_name_text = models.CharField(max_length=200)
    article_max_clicks = models.IntegerField(default=0)
    article_creation_date = models.DateTimeField('date created')
    article_publish_date = models.DateTimeField('date published', null=True, blank=True)
    article_finish_date = models.DateTimeField('date finished', null=True, blank=True)

    def __str__(self):
        return self.article_name_text

    def is_active(self):
        if self.article_finish_date is None:
            if self.article_publish_date <= timezone.now():
                return self.current_clicks() < self.article_max_clicks
            else:
                return False
        else:
            return False

    def current_clicks(self):
        return self.click_set.count()

    is_active.boolean = True
    actives = ArticleManager()

class Click(models.Model):
    click_article = models.ForeignKey(Article, on_delete=models.CASCADE)
    click_user = models.ForeignKey(User, on_delete=models.CASCADE)
    click_date = models.DateTimeField('date clicked')

    def __str__(self):
        return str(self.id) + " " + str(self.click_date)
This is how the clicks are created in views.py, if this helps:
article.click_set.create(click_article=article, click_user=user, click_date=timezone.now())
If anyone has any sort of idea of how abouts I should do this it would be greatly appreciated!
Many thanks in advance, just let me know if you need anymore information!
Django's annotate functionality is great for adding properties at the time of querying. From the docs -
Per-object summaries can be generated using the annotate() clause. When an annotate() clause is specified, each object in the QuerySet will be annotated with the specified values.
In order to keep your querying performance-minded, you can use this in your manager and not make the (possibly very slow) call of related objects for each of your Articles. Once you have an annotated property, you can use it in your query. Since Django only executes your query when objects are called, you can use this annotation instead of counting the click_set, which would call a separate query per related item. The current_clicks method may still be useful to you, but if calling it for multiple articles your queries will add up quickly and cause a big performance hit.
Please note: I added related_name='clicks' to your click_article field in order to use it in place of 'click_set'.
In addition, you'll see the use of Q objects in the query below. These allow us to chain multiple filters together, nested with AND (,) and OR (|) operands. So a reading of the Q objects below would be:
Find all articles where the publish date is before now AND (the article has no finish date OR the finish date is after now)
from django.db.models import Q, Count, F

class ArticleManager(models.Manager):
    def get_queryset(self):
        return super(ArticleManager, self).get_queryset().filter(
            Q(article_publish_date__lte=timezone.now()),
            (Q(article_finish_date__isnull=True) |
             Q(article_finish_date__gte=timezone.now()))
        ).annotate(
            click_count=Count('clicks')
        ).filter(
            article_max_clicks__gt=F('click_count')
        )

class Article(models.Model):
    actives = ArticleManager()

    def current_clicks(self):
        return self.clicks.count()

# Now you can call Article.actives.all() to get all active articles

class Click(models.Model):
    click_article = models.ForeignKey(Article, on_delete=models.CASCADE, related_name='clicks')  # related_name allows a more explicit call than 'click_set'
In Django, I've got a Checkout model, which is a ticket for somebody checking out equipment. I've also got an OrganizationalUnit model that the Checkout model relates to (via ForeignKey), as the person on the checkout belongs to an OrganizationalUnit on our campus.
The OrganizationalUnit has a self relation, so several OUs can be the children of a certain OU, and those children can have children, and so on. Here are the models, somewhat simplified.
class OrganizationalUnit(models.Model):
    name = models.CharField(max_length=100)
    parent = models.ForeignKey(
        'self',
        blank=True, null=True,
        related_name='children',
    )

class Checkout(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    department = models.ForeignKey(
        OrganizationalUnit,
        null=True,
        blank=True,
        related_name='checkouts',
    )
I want to get a count of the Checkouts that are related to a certain OrganizationalUnit and all of its children. I know how to get the count of all the checkouts that are related to an OU.
ou = OrganizationalUnit.objects.get(pk=1)
count = ou.checkouts.all().count()
But how do I make that count reflect the checkouts of this OU's children and their children? Do I use some sort of iterative loop?
EDIT: I guess I still can't quite wrap my head around the loop to do this. The organizational units can be nested as deep as the user wants, but right now the deepest in the DB is 5 levels. I've written this…
for kid in ou.children.all():
    child_checkout_count += kid.checkouts.all().count()
    for kid2 in kid.children.all():
        child_checkout_count += kid2.checkouts.all().count()
        for kid3 in kid2.children.all():
            child_checkout_count += kid3.checkouts.all().count()
            for kid4 in kid3.children.all():
                child_checkout_count += kid4.checkouts.all().count()
                for kid5 in kid4.children.all():
                    child_checkout_count += kid5.checkouts.all().count()
…which is total crap. And it takes a while to run because it pretty much traverses a major chunk of the database. Help! (I can't seem to think very well today.)
I think the most efficient way of calculating this is at write time. You should modify OrganizationalUnit like this:
class OrganizationalUnit(models.Model):
    name = models.CharField(max_length=100)
    parent = models.ForeignKey(
        'self',
        blank=True, null=True,
        related_name='children',
    )
    checkout_number = models.IntegerField(default=0)
Then create the functions that will update the OrganizationalUnit and its parents at write time:

def pre_save_checkout(sender, instance, **kwargs):
    if isinstance(instance, Checkout) and instance.id and instance.department:
        subtract_checkout(instance.department)

def post_save_checkout(sender, instance, **kwargs):
    if isinstance(instance, Checkout) and instance.department:
        add_checkout(instance.department)

def subtract_checkout(organizational_unit):
    organizational_unit.checkout_number -= 1
    organizational_unit.save()
    if organizational_unit.parent:
        subtract_checkout(organizational_unit.parent)

def add_checkout(organizational_unit):
    organizational_unit.checkout_number += 1
    organizational_unit.save()
    if organizational_unit.parent:
        add_checkout(organizational_unit.parent)
Now all you need is to connect those functions to the pre_save, post_save and pre_delete signals:

from django.db.models.signals import post_save, pre_save, pre_delete

pre_save.connect(pre_save_checkout, Checkout)
pre_delete.connect(pre_save_checkout, Checkout)
post_save.connect(post_save_checkout, Checkout)
That should do it...
What you need is a recursive function that traverses the OrganizationalUnit relation tree and gets the number of related Checkouts for each OrganizationalUnit. So your code will look like this:

def count_checkouts(ou):
    checkout_count = ou.checkouts.count()
    for kid in ou.children.all():
        checkout_count += count_checkouts(kid)
    return checkout_count
Also note that to get the number of related checkouts I use:

checkout_count = ou.checkouts.count()

instead of:

count = ou.checkouts.all().count()

My variant is more efficient (see http://docs.djangoproject.com/en/1.1/ref/models/querysets/#count).
I'm not sure how SQL performs on this one, but what you want to do is exactly what you explained:
get each OU and its children with a loop, then count the Checkouts and sum them.
The ORM gives you dynamic operations over SQL, but it can kill performance :)
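Since the asker mentions the tree is at most 5 levels deep today, another option is a single query that ORs together one lookup path per level, instead of issuing a count per node. A sketch under that fixed-depth assumption; descendant_lookups is an illustrative helper that only builds the lookup strings:

```python
def descendant_lookups(depth, base='department'):
    # Lookup paths from Checkout up an OU's parent chain, e.g.
    # ['department', 'department__parent', 'department__parent__parent'].
    paths = []
    path = base
    for _ in range(depth):
        paths.append(path)
        path += '__parent'
    return paths
```

In the view that could be combined with Q objects: query = Q(); for p in descendant_lookups(5): query |= Q(**{p: ou}); then Checkout.objects.filter(query).count() hits the database once, at the cost of hard-coding the maximum depth.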