Bulk update in Django with calculations - Python

I have 2 models in my project:
class Currency(models.Model):
    title = models.CharField(max_length=100, unique=True)
    value = models.FloatField()

class Good(models.Model):
    name = models.CharField(max_length=100)
    slug = models.SlugField(max_length=100, unique=True)
    cost_to_display = models.IntegerField(default=0)
    cost_in_currency = models.IntegerField()
    currency = models.ForeignKey(Currency)
The idea of this model is to speed up searching by price and to have all goods priced in one currency.
Therefore I need some hook that will update all Goods whenever an exchange rate is updated.
In raw SQL it looks like this:
mysql> update core_good set cost_to_display = cost_in_currency * (select core_currency.value from core_currency where core_currency.id = currency_id ) ;
Query OK, 663 rows affected (0.10 sec)
Rows matched: 7847 Changed: 663 Warnings: 0
That works pretty fast. I tried to implement the same thing in the Django admin like this (using bulk_update):
def save_model(self, request, obj, form, change):
    """Update rate values"""
    goods = Good.objects.all()
    for good in goods:
        good.cost_to_display = good.cost_in_currency * good.currency.value
    bulk_update(goods)
    obj.save()
It takes up to 20 minutes to update all records via the Django admin this way.
What am I doing wrong? What is the right way to update all the prices?

This is purely untested, but it should work in my mind:
from django.db.models import F
Good.objects.all().update(cost_to_display=F('cost_in_currency') * F('currency__value'))
Even though you are calling bulk_update, you still loop through all goods, which is why your process is slow.
Edit:
This won't work because F() doesn't support joined fields in update(). It can be done using a raw query.
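For example, a minimal sketch of running that same UPDATE through Django's database connection (table and column names taken from the raw SQL in the question):
from django.db import connection

def update_display_costs():
    # A single UPDATE statement, the same one as the raw SQL above.
    with connection.cursor() as cursor:
        cursor.execute(
            "UPDATE core_good SET cost_to_display = cost_in_currency * "
            "(SELECT core_currency.value FROM core_currency "
            "WHERE core_currency.id = core_good.currency_id)"
        )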

For future readers: every access to good.currency in your loop hits the database. Consider using select_related to fetch the Good and Currency objects in one query:
goods = Good.objects.select_related('currency')
Also, Django ships with a built-in QuerySet.bulk_update() method since version 2.2 (see the docs).
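Putting those two points together, a sketch of the admin hook from the question rewritten with select_related and the built-in bulk_update (assumes Django 2.2+; untested):
def save_model(self, request, obj, form, change):
    """Update rate values without one query per Good."""
    # One query fetches every Good together with its Currency.
    goods = list(Good.objects.select_related('currency'))
    for good in goods:
        good.cost_to_display = good.cost_in_currency * good.currency.value
    # One chunked bulk UPDATE instead of a save() per object.
    Good.objects.bulk_update(goods, ['cost_to_display'], batch_size=1000)
    obj.save()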

Related

Query efficiency in MongoDB

Which query will be more efficient:
for id in user.posts:
    Post.objects.get(id=id)
or
posts = Post.objects(user=user_id)
with the following schema:
class Post(Document):
    user = ObjectIdField()

class User(Document):
    posts = ListField(ObjectIdField())
given that the user field in the Post document is indexed and there are on average 20 posts per User. I am also curious about other usage pattern scenarios.
The following block fires as many database queries as there are posts in user.posts, so it will be slow in any case:
for id in user.posts:
    Post.objects.get(id=id)
but if you use it like this:
Post.objects(id__in=user.posts)
then the performance will be similar to using Post.objects(user=user_id), because the primary key is indexed by default.
I believe you should also use a ReferenceField instead of a plain ObjectId. References allow for lazy loading:
from mongoengine import Document, ReferenceField, StringField

class Post(Document):
    user = ReferenceField("User")

class User(Document):
    name = StringField()

    @property
    def posts(self):
        return Post.objects(user=self)

john = User(name='John').save()
post = Post(user=john).save()
print(john.posts)  # [<Post: Post object>]

Django admin search by field of an external table via 'foreign key' too slow

I have two models in Django:
class Dog(models.Model):
    nick = models.CharField(max_length=30, db_index=True)

class Bark(models.Model):
    date_of_bark = models.DateTimeField(default=datetime.utcnow)
    pet = models.ForeignKey('Dog',
                            related_name='bark_dog',
                            on_delete=models.CASCADE)
In the admin, I want to search all the Barks of a specific Dog:
class BarkAdmin(BaseAdmin, BaseActions):
    paginator = MyPaginator
    list_display = ('id', 'date_of_bark', 'pet')
    search_fields = ('pet__nick', )
In my database, every Dog has millions of Barks.
The problem is that every search takes a lot of time:
Load times (approx.):
Load of table: instant
Search results: 15 seconds
To improve that time, I added ordering on the searched field:
class BarkAdmin(BaseAdmin, BaseActions):
    paginator = MyPaginator
    list_display = ('id', 'date_of_bark', 'pet')
    search_fields = ('pet__nick', )
    ordering = ('pet__nick',)
And now the load times are (approx.):
Load of table: 15 seconds
Search results: instant
How can I improve both times simultaneously?
EDIT: Using the get_search_results function
Based on the Django admin documentation, the get_search_results method can be overridden to improve the search, like this:
class BarkAdmin(BaseAdmin, BaseActions, admin.ModelAdmin):
    paginator = MyPaginator
    list_display = ('id', 'date_of_bark', 'pet')
    search_fields = ('pet__nick', )

    def get_search_results(self, request, queryset, search_term):
        queryset, use_distinct = super(BarkAdmin, self).get_search_results(request, queryset, search_term)
        # Get the pet ids with the searched nick
        pets = Dog.objects.filter(nick__contains=search_term)
        # Pick only the Barks with a pet in that set
        queryset |= self.model.objects.filter(pet__in=pets)
        return queryset, use_distinct
But I am doing something wrong, because now the load times are (approx.):
Load of table: instant
Search results: 15 seconds
My first aim would be to tune the query as far as possible. I see that you've indexed the nick field.
I would guess that the 15-second load time with either approach shows the query itself still costs the same; ordering simply moves that cost to loading the table.
You can tune the query to make the search more efficient for what you are trying to do. I'm not sure how you want the search to behave, i.e. whether a Bark should match whenever its Dog's nick merely contains the text entered in the search.
The following documentation (adjust for your Django version) can help you: https://docs.djangoproject.com/en/2.0/ref/contrib/admin/#django.contrib.admin.ModelAdmin.search_fields
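For instance, if a prefix or exact match on the nick is acceptable, the search_fields prefixes let the database use the index on Dog.nick instead of scanning for a substring (a sketch only; the default __icontains lookup cannot use a plain b-tree index):
class BarkAdmin(BaseAdmin, BaseActions, admin.ModelAdmin):
    paginator = MyPaginator
    list_display = ('id', 'date_of_bark', 'pet')
    # '^' makes the admin use a startswith lookup, '=' an exact (iexact) lookup;
    # both are far cheaper than the default icontains on a large table.
    search_fields = ('^pet__nick',)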
Edit:
You could try optimising the two queries into one using select_related. A possible solution, although untested; it may also require an additional annotation for counting:
queryset = self.model.objects.select_related('pet').filter(pet__nick__contains=search_term)
https://docs.djangoproject.com/en/2.0/ref/models/querysets/#select-related
https://docs.djangoproject.com/en/2.0/topics/db/optimization/#retrieve-everything-at-once-if-you-know-you-will-need-it
It would be good to know what the generated queries currently look like, i.e. whether lots of small queries are being fired off for this search.
You can log the queries being run to the console while investigating.
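A minimal sketch of such a logging setup in settings.py, using the standard django.db.backends logger (SQL is only logged while DEBUG = True):
# settings.py
LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        # Prints every SQL statement Django runs to the console.
        'django.db.backends': {
            'handlers': ['console'],
            'level': 'DEBUG',
        },
    },
}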

How to handle concurrency with django queryset get method?

I'm using Django's select_for_update method (Django 1.5 with MySQL) to fetch data from one model and serve it to the user upon request, but when two users make requests simultaneously it returns the same data for both of them. See the sample code below.
models.py
class SaveAccessCode(models.Model):
    code = models.CharField(max_length=10)

class AccessCode(models.Model):
    code = models.CharField(max_length=10)
    state = models.CharField(max_length=10, default='OPEN')
views.py
def view(request, code):
    # for example code = 234567
    access_code = AccessCode.objects.select_for_update().filter(
        code=code, state='OPEN')
    access_code.delete()
    SaveAccessCode.objects.create(code=code)
    return
Concurrent requests generate two SaveAccessCode records with the same code. Please guide me on how to handle this scenario in a better way.
You need to set some flag on the model when doing the select_for_update, something like:
qs.filter(pk=qs.first().pk).update(is_locked=True)
and before that the select should look like:
qs = self.select_for_update().filter(state='OPEN', is_locked=False).order_by('id')
Then after the user has, presumably, done something with it and saved, set the flag is_locked=False and save again.
Also make fetch_available_code a @staticmethod.
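As a side note, select_for_update() only takes its row lock when it runs inside a transaction, so the view from the question could also be wrapped in transaction.atomic(); a rough, untested sketch (assumes Django 1.6+, where transaction.atomic is available):
from django.db import transaction

def view(request, code):
    with transaction.atomic():
        # The second concurrent request blocks on this lock until the first
        # transaction commits, and then finds no remaining OPEN row.
        codes = list(
            AccessCode.objects.select_for_update().filter(code=code, state='OPEN')
        )
        if not codes:
            return  # already claimed by another request
        AccessCode.objects.filter(pk__in=[c.pk for c in codes]).delete()
        SaveAccessCode.objects.create(code=code)
    return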

Django 1.6: How to order a query set by a computed DateTimeField

I have a Project model with a DateTimeField and a duration PositiveIntegerField.
The model also has a method days_left implementing relatively involved logic to compute the number of days left until expiry. The method returns an integer.
I want to perform a simple queryset ordering by the value returned by this method.
Project.objects.all().order_by('days_left')
However, I keep getting an exception to the effect that days_left is not a field.
Is there any efficient way to do this in native SQL (maybe through views, etc.) and bypass the queryset, or does a Django solution exist for such cases?
The whole code:
import datetime as nsdt
from django.db import models
from django.utils import timezone
class Project(models.Model):
name = models.CharField(max_length=128)
publish_date = models.DateTimeField()
duration_days = models.PositiveIntegerField()
def days_left(self):
t1 = timezone.now()
t2 = self.publish_date + nsdt.timedelta(days=self.duration_days)
return (t2 - t1).days if t2 > t1 else 0
if __name__ == '__main__':
print Project.objects.all().order_by('days_left')
# throws excpetion:
# django.core.exceptions.FieldError: Cannot resolve keyword 'days_left_computed' into field.
Since sorting happens at the database level, you cannot use Django's order_by with a Python method. Instead, you can try sorting the objects using sorted():
projects = sorted(Project.objects.all(), key=lambda x: x.days_left())
Update:
Since you have a large number of records, maybe you can use the QuerySet.extra() method instead.
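A possible sketch with extra(), untested; the SQL expression assumes a MySQL backend, so adjust it for your database:
# Compute the expiry timestamp in SQL and order by it (MySQL date arithmetic).
projects = Project.objects.extra(
    select={'expires_at': "DATE_ADD(publish_date, INTERVAL duration_days DAY)"},
    order_by=['expires_at'],
)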
Another approach you could try is django.db.models.F.
Example using F() (disclaimer: this was not tested):
from django.db.models import F
projects = Project.objects.all().order_by((F('publish_date') + nsdt.timedelta(days=F('duration_days'))) - timezone.now())

Django - how to implement lock data

I have a database table. Some database items can be edited by a user, but only one user can edit the table content at a time, and if after 2 hours the user hasn't finished editing, other users can edit the table. How can I do this?
The table is like this:
class NodeRevision(BaseModel, NodeContent):
    node = models.ForeignKey(Node, related_name='revisions')
    summary = models.CharField(max_length=300)
    revision = models.PositiveIntegerField()
    revised_at = models.DateTimeField(default=datetime.datetime.now)
    suggested = models.BooleanField(default=False)
    suggest_status = models.CharField(max_length=16, default="")
Should I add a BooleanField to it, such as editing_locked=models.BooleanField(default=False) ? Or something else? And how could I implement the 2 hour check?
You'd need a locked_at time field and locked_by field.
Every time somebody loads an edit page, update the database with the locked_at and locked_by information.
To implement the 2 hour restriction, I'd just have the result calculated only when a user asks for permission (as opposed to polling or updating models in the background). When a user tries to edit a model, have it check locked_by/locked_at and return a Boolean indicating whether it's editable by that user.
def can_edit(self, user):
    if user == self.locked_by:
        return True
    # The lock has expired if it was taken more than 2 hours ago.
    elif self.locked_at and (datetime.datetime.now() - self.locked_at).total_seconds() > 2 * 60 * 60:
        return True
    return False
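And a rough sketch of the acquisition side, assuming hypothetical locked_by and locked_at fields have been added to NodeRevision; it would be called when someone opens the edit page:
import datetime

def try_lock(self, user):
    """Claim the edit lock for `user` if it is free, theirs, or expired (hypothetical helper)."""
    if self.locked_by and not self.can_edit(user):
        return False
    self.locked_by = user
    self.locked_at = datetime.datetime.now()
    self.save()
    return True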
