Update a field in a django model only if it needs updating

Update a field in a django model only if it needs updating - python

Suppose I have some django model and I'm updating an instance
def modify_thing(id, new_blah):
mything = MyModel.objects.get(pk=id)
mything.blah = new_blah
mything.save()
My question is, if it happened that it was already the case that mything.blah == new_blah, does django somehow know this and not bother to save this [non-]modification again? Or will it always go into the db (MySQL in my case) and update data?
If I want to avoid an unnecessary write, does it make any sense to do something like:
if mything.blah != new_blah:
mything.blah = new_blah
mything.save()
Given that the record would have to be read from db anyway in order to do the comparison in the first place? Is there any efficiency to be gained from this sort of construction - and if so, is there a less ugly way of doing that than with the if statement in python?

You can use Django Signals to ensure that code like that you just posted don´t write to the db. Take a look at pre_save, that's the signal you're looking for.

Given that django does not cache the values, a trip to DB is inevitable, you have to fetch it to compare the value. And definitely we have less ugly ways to do that. You could do it as
if mything.blah is new_blah:
#Do nothing
else:
mything.blah = new_blah
mything.blah.save()

Related

Should you check before setting a Django model's field?

I have a background service that imports django, and uses my django project's ORM directly.
It monitors something, in a loop, very often - every few seconds.
It goes through every user in my database, checks a condition, and based on that, sets a flag to either be True or False. I might have thousands of users in the database, so the efficiency here can add up.
while True:
time.sleep(5)
for user in User.objects.all():
if user.check():
user.flag = True
else:
user.flag = False
user.save()
I'm using MySQL as my database.
What I'm curious about is this: if a particular user has .flag set to True, am I doing a disk write every time I run user.flag = True; user.save(), even though nothing changed? Or is Django or MySQL smart enough not to do a disk write if nothing changed?
I assume a MySQL read operation is less expensive than a write operation. Would it make more sense to check the value of user.flag, and only try to set user.flag if the value actually changed? This would essentially be exchanging a database read for a database write, from what I understand (except in cases where something actually changed, in which case, first a read is performed, and then a write).
Note: This is just a basic example. I'm not actually dealing with users.

If you can move the logic of check into a .filter() clause, that would be best. That way you could do:
User.objects.filter(match_check, flag=False).update(flag=True)
User.objects.filter(~Q(match_check), flag=True).update(flag=False)
Or if you could annotate the value:
User.objects.annotate(
new_flag=some_check,
).exclude(flag=F('new_flag')).update(flag= ('new_flag'))
Otherwise if you can't then maybe do it like this:
new_flag = user.check()
if new_flag != user.flag:
user.flag = new_flag
user.save(update_fields=['flag']) # This will only update that single column.

Solution needed to a scenario

I am trying to make use of a column's value as a radio button's choice using below code
Forms.py
#retreiving data from database and assigning it to diction list
diction = polls_datum.objects.values_list('poll_choices', flat=True)
#initializing list and dictionary
OPTIONS1 = {}
OPTIONS = []
#creating the dictionary with 0 to no of options given in list
for i in range(len(diction)):
OPTIONS1[i] = diction[i]
#creating tuples from the dictionary above
#OPTIONS = zip(OPTIONS1.keys(), OPTIONS1.values())
for i in OPTIONS1:
k = (i,OPTIONS1[i])
OPTIONS.append(k)
class polls_form(forms.ModelForm):
#retreiving data from database and assigning it to diction list
options = forms.ChoiceField(choices=OPTIONS, widget = forms.RadioSelect())
class Meta:
model = polls_model
fields = ['options']
Using a form I am saving the data or choices in a field (poll_choices), when trying to display it on the index page, it is not reflecting until a server restart.
Can someone help on this please

of course "it is not reflecting until a server restart" - that's obvious when you remember that django server processes are long-running processes (it's not like PHP where each script is executed afresh on each request), and that top-level code (code that's at the module's top-level, not in a function) is only executed once per process when the module is first imported. As a general rule: don't do ANY db query at a module's top-level or at the top-level of a class statement - at best you'll get stale data, at worse it will crash your server process (if you're doing query before everything has been properly setup by django, or if you're doing query based on a schema update before the migration has been applied).
The possible solutions are either to wait until the form's initialisation to setup your field's choices, or to pass a callable as the formfield's choices options, cf https://docs.djangoproject.com/en/2.1/ref/forms/fields/#django.forms.ChoiceField.choices
Also, the way you're building your choices list is uselessly complicated - you could do it as a one-liner:
OPTIONS = list(enumerate(polls_datum.objects.values_list('poll_choices', flat=True))
but it's also very brittle - you're relying on the current db content and ordering for the choice value when you should use the polls_datum's pk instead (which is garanteed to be stable).
And finally: since you're working with what seems to be a related model, you may want to use a ModelChoiceField instead.

For future reference:
What version of Django are you using?
Have you read up on the documentation of ModelForms? https://docs.djangoproject.com/en/2.1/topics/forms/modelforms/
I'm not sure what you're trying to do with diction to dictionary to tuple. I think you could skip a step there and your future self will thank you for that.
Try to follow some tutorials and understand why certain steps are being taken. I can see from your code that you're rather new to coding or Python and there's room for improvement. Not trying to talk you down, but I'm trying to push you into the direction of becoming a better developer ;-)
REAL ANSWER:
That being said, I think the solution is to write the loading of the data somewhere in your form model, rather than 'loose' in forms.py. See bruno's answer for more information on this.
If you want to reload the data on each request that loads the form, you should create a function that gets called every time the form is loaded (for example in the form's __init__ function).

Concurent Access to datastore in app engine

i want to know if db.run_in_transaction() acts as a lock for Data store operations
and helps in case of concurrent access on same entity.
Does in following code it is guarantied that a concurrent access will not cause a race and instead of creating new entity it will not do a over-write
Is db.run_in_transaction() correct/best way to do so
in following code i m trying to create new unique entity with following code
def txn(charmer=None):
new = None
key = my_magic() + random_part()
sk = Snake.get_by_name(key)
if not sk:
new = Snake(key_name=key, charmer= charmer)
new.put()
return new
db.run_in_transaction(txn, charmer)

That is a safe method. Should the same name get generated twice, only one entity would be created.
It sounds like you have already looked at the transactions documentation. There is also a more detailed description.
Check out the docs (specifically the equivalent code) on Model.get_or_insert, it answers exactly the question you are asking:
The get and subsequent (possible) put
are wrapped in a transaction to ensure
atomicity. Ths means that
get_or_insert() will never overwrite
an existing entity, and will insert a
new entity if and only if no entity
with the given kind and name exists.

What you've done is right and sort of duplicates the Model.get_or_insert, like Robert already explained.
I don't know if this can be called a 'lock'... the way this works is optimistic concurrency - the operation will execute assuming that no one else is trying to do the same thing at the same time, and if someone is, it will give you an exception. You'll need to figure out what you want to do in that case. Maybe ask the user to choose a new name?

Iterating over a large Django queryset while the data is changing elsewhere

Iterating over a queryset, like so:
class Book(models.Model):
# <snip some other stuff>
activity = models.PositiveIntegerField(default=0)
views = models.PositiveIntegerField(default=0)
def calculate_statistics():
self.activity = book.views * 4
book.save()
def cron_job_calculate_all_book_statistics():
for book in Book.objects.all():
book.calculate_statistics()
...works just fine. However, this is a cron task. book.views is being incremented while this is happening. If book.views is modified while this cronjob is running, it gets reverted.
Now, book.views is not being modified by the cronjob, but it is being cached during the .all() queryset call. When book.save(), I have a feeling it is using the old book.views value.
Is there a way to make sure that only the activity field is updated? Alternatively, let's say there are 100,000 books. This will take quite a while to run. But the book.views will be from when the queryset originally starts running. Is the solution to just use an .iterator()?
UPDATE: Here's effectively what I am doing. If you have ideas about how to make this work well inline, then I'm all for it.
def calculate_statistics(self):
self.activity = self.views + self.hearts.count() * 2
# Can't do self.comments.count with a comments GenericRelation, because Comment uses
# a TextField for object_pk, and that breaks the whole system. Lame.
self.activity += Comment.objects.for_model(self).count() * 4
self.save()

The following will do the job for you in Django 1.1, no loop necessary:
from django.db.models import F
Book.objects.all().update(activity=F('views')*4)
You can have a more complicated calculation too:
for book in Book.objects.all().iterator():
Book.objects.filter(pk=book.pk).update(activity=book.calculate_activity())
Both these options have the potential to leave the activity field out of sync with the rest, but I assume you're ok with that, given that you're calculating it in a cron job.

In addition to what others have said if you are iterating over a large queryset you should use iterator():
Book.objects.filter(stuff).order_by(stuff).iterator()
this will cause Django to not cache the items as it iterates (which could use a ton of memory for a large result set).

No matter how you solve this, beware of transaction-related issues. E.g. default transaction isolation level is set to REPEATABLE READ, at least for MySQL backend. This, plus the fact that both Django and db backend work in a specific autocommit mode with an ongoing transaction means, that even if you use (very nice) whrde suggestion, value of `views' could be no longer valid. I could be wrong here, but feel warned.

Recursive delete in google app engine

I'm using google app engine with django 1.0.2 (and the django-helper) and wonder how people go about doing recursive delete.
Suppose you have a model that's something like this:
class Top(BaseModel):
pass
class Bottom(BaseModel):
daddy = db.ReferenceProperty(Top)
Now, when I delete an object of type 'Top', I want all the associated 'Bottom' objects to be deleted as well.
As things are now, when I delete a 'Top' object, the 'Bottom' objects stay and then I get data that doesn't belong anywhere. When accessing the datastore in a view, I end up with:
Caught an exception while rendering: ReferenceProperty failed to be resolved.
I could of course find all objects and delete them, but since my real model is at least 5 levels deep, I'm hoping there's a way to make sure this can be done automatically.
I've found this article about how it works with Java and that seems to be pretty much what I want as well.
Anyone know how I could get that behavior in django as well?

You need to implement this manually, by looking up affected records and deleting them at the same time as you delete the parent record. You can simplify this, if you wish, by overriding the .delete() method on your parent class to automatically delete all related records.
For performance reasons, you almost certainly want to use key-only queries (allowing you to get the keys of entities to be deleted without having to fetch and decode the actual entities), and batch deletes. For example:
db.delete(Bottom.all(keys_only=True).filter("daddy =", top).fetch(1000))

Actually that behavior is GAE-specific. Django's ORM simulates "ON DELETE CASCADE" on .delete().
I know that this is not an answer to your question, but maybe it can help you from looking in the wrong places.

Reconsider the data structure. If the relationship will never change on the record lifetime, you could use "ancestors" feature of GAE:
class Top(db.Model): pass
class Middle(db.Model): pass
class Bottom(db.Model): pass
top = Top()
middles = [Middle(parent=top) for i in range(0,10)]
bottoms = [Bottom(parent=middle) for i in range(0,10) for middle in middles]
Then querying for ancestor=top will find all the records from all levels. So it will be easy to delete them.
descendants = list(db.Query().ancestor(top))
# should return [top] + middles + bottoms

If your hierarchy is only a small number of levels deep, then you might be able to do something with a field that looks like a file path:
daddy.ancestry = "greatgranddaddy/granddaddy/daddy/"
me.ancestry = daddy.ancestry + me.uniquename + "/"
sort of thing. You do need unique names, at least unique among siblings.
The path in object IDs sort of does this already, but IIRC that's bound up with entity groups, which you're advised not to use to express relationships in the data domain.
Then you can construct a query to return all of granddaddy's descendants using the initial substring trick, like this:
query = Person.all()
query.filter("ancestry >", gdaddy.ancestry + "\U0001")
query.filter("ancestry <", gdaddy.ancestry + "\UFFFF")
Obviously this is no use if you can't fit the ancestry into a 500 byte StringProperty.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Update a field in a django model only if it needs updating - python

You can use Django Signals to ensure that code like that you just posted don´t write to the db. Take a look at pre_save, that's the signal you're looking for.

Given that django does not cache the values, a trip to DB is inevitable, you have to fetch it to compare the value. And definitely we have less ugly ways to do that. You could do it as if mything.blah is new_blah: #Do nothing else: mything.blah = new_blah mything.blah.save()

Related

Should you check before setting a Django model's field?

Solution needed to a scenario

Concurent Access to datastore in app engine

Iterating over a large Django queryset while the data is changing elsewhere

Recursive delete in google app engine

Categories

Resources