Django model reload_from_db() vs. explicitly recalling from db

Django model reload_from_db() vs. explicitly recalling from db - python

If I have an object retrieved from a model, for example:
obj = Foo.objects.first()
I know that if I want to reference this object later and make sure that it has the current values from the database, I can call:
obj.refresh_from_db()
My question is, is there any advantage to using the refresh_from_db() method over simply doing?:
obj = Foo.objects.get(id=obj.id)
As far as I know, the result will be the same. refresh_from_db() seems more explicit, but in some cases it means an extra line of code. Lets say I update the value field for obj and later want to test that it has been updated to False. Compare:
obj = Foo.objects.first()
assert obj.value is True
# value of foo obj is updated somewhere to False and I want to test below
obj.refresh_from_db()
assert obj.value is False
with this:
obj = Foo.objects.first()
assert obj.value is True
# value of foo obj is updated somewhere to False and I want to test below
assert Foo.objects.get(id=obj.id).value is False
I am not interested in a discussion of which of the two is more pythonic. Rather, I am wondering if one method has a practical advantage over the other in terms of resources, performance, etc. I have read this bit of documentation, but I was not able to ascertain from that whether there is an advantage to using reload_db(). Thank you!

Django sources are usually relatively easy to follow. If we look at the refresh_from_db() implementation, at its core it is still using this same Foo.objects.get(id=obj.id) approach:
db_instance_qs = self.__class__._default_manager.using(db).filter(pk=self.pk)
...
db_instance_qs = db_instance_qs.only(*fields)
...
db_instance = db_instance_qs.get()
Only there are couple extra bells and whistles:
deferred fields are ignored
stale foreign key references are cleared (according to the comment explanation)
So for everyday usage it is safe to say that they are pretty much the same, use whatever you like.

Just to add to #serg's answer, there's a case where explicitly re-fetching from the db is helpful and refreshing from the db isn't so much useful.
This is the case when you're adding permissions to an object and checking them immediately afterwards, and you need to clear the cached permissions for the object so that your permission checks work as expected.
According to the permission caching section of the django documentation:
The ModelBackend caches permissions on the user object after the first time they need to be fetched for a permissions check. This is typically fine for the request-response cycle since permissions aren’t typically checked immediately after they are added (in the admin, for example). If you are adding permissions and checking them immediately afterward, in a test or view for example, the easiest solution is to re-fetch the user from the database...
For an example, consider this block of code inspired by the one in the documentation cited above:
from django.contrib.auth import get_user_model
from django.contrib.auth.models import Permission
from django.contrib.contenttypes.models import ContentType
from smoothies.models import Smoothie
def force_deblend(user, smoothie):
# Any permission check will cache the current set of permissions
if not user.has_perm('smoothies.deblend_smoothie'):
permission = Permission.objects.get(
codename='deblend_smoothie',
content_type=ContentType.objects.get_for_model(Smoothie)
)
user.user_permissions.add(permission)
# Subsequent permission checks hit the cached permission set
print(user.has_perm('smoothies.deblend_smoothie')) # False
# Re-fetch user (explicitly) from db to clear permissions cache
# Be aware that user.refresh_from_db() won't help here
user = get_user_model().objects.get(pk=user.pk)
# Permission cache is now repopulated from the database
print(user.has_perm('smoothies.deblend_smoothie')) # True
...
...

It seems there is a difference if you use cached properties.
See here:
p.roles[0]
<Role: 16888649>
p.refresh_from_db()
p.roles[0]
<Role: 16888649>
p = Person.objects.get(id=p.id)
p.roles[0]
<Role: 16888650>
Definition from models.py:
#cached_property
def roles(self):
return Role.objects.filter(employer__person=self).order_by("id")

Related

How can I deal with a massive delete from Django Admin?

I'm working with Django 2.2.10.
I have a model called Site, and a model called Record.
Each record is associated with a single site (Foreign Key).
After my app runs for a few days/weeks/months, each site can have thousands of records associated with it. I use the database efficiently, so this isn't normally a problem.
In Django Admin, when I try to delete a site however, Django Admin tries to figure out every single associated object that will also be deleted, and because my ForeignKey uses on_delete=models.CASCADE, which is what I want, it tries to generate a page that lists thousands, possibly millions of records that will be deleted. Sometimes this succeeds, but takes a few seconds. Sometimes the browser just gives up waiting.
How can I have Django Admin not list every single record it intends to delete? Maybe just say something like "x number of records will be deleted" instead.
Update: Should I be overriding Django admin's delete_confirmation.html? It looks like the culprit might be this line:
<ul>{{ deleted_objects|unordered_list }}</ul>
Or is there an option somewhere that can be enabled to automatically not list every single object to be deleted, perhaps if the object count is over X number of objects?
Update 2: Removing the above line from delete_confirmation.html didn't help. I think it's the view that generates the deleted_objects variable that is taking too long. Not quite sure how to override a Django Admin view

Add this to your admin class, and than you can delete with this action without warning
actions = ["silent_delete"]
def silent_delete(self, request, queryset):
queryset.delete()
If you want to hide default delete action, add this to your admin class
def get_actions(self, request):
actions = super().get_actions(request)
if 'delete_selected' in actions:
del actions['delete_selected']
return actions

Since django 2.1 you can override get_deleted_objects to limit the amount of deleted objects listed (it's either a list or a nested list). The timeout is probably due to the django app server timing out on the view's response.
You could limit the size of the returned list:
class YourModelAdmin(django.admin.ModelAdmin):
def get_deleted_objects(self, objs, request):
deleted = super().get_deleted_objects(objs, request)
deleted_objs = deleted[0]
return (self.__limit_nested(deleted_objs),) + deleted[1:]
def __limit_nested(self, objs):
limit = 10
if isinstance(objs, list):
return list(map(self.__limit_nested, objs))
if len(objs) > limit:
return objs[:limit] + ['...']
return objs
But chances are the call to super takes too long as well, so you probably want to return [], {}, set(), [] instead of calling super; though it doesn't tell you about missing permissions or protected relations then (but I saw no alternative other than copy pasting code from django github). You will want to override the delete_confirmation.html and the delete_selected_confirmation.html template as well. You'll also want to make sure the admin has permission to delete any related objects that might get deleted by the cascading deletes.
In fact, the deletion itself may take too long. It's probably best defer the deletion (and permission checks if those are slow too) to a celery task.

Custom the `on_delete` param function in Django model fields

I have a IPv4Manage model, in it I have a vlanedipv4network field:
class IPv4Manage(models.Model):
...
vlanedipv4network = models.ForeignKey(
to=VlanedIPv4Network, related_name="ipv4s", on_delete=models.xxx, null=True)
As we know, on the on_delete param, we general fill the models.xxx, such as models.CASCADE.
Is it possible to custom a function, to fill there? I want to do other logic things there.

The choices for on_delete can be found in django/db/models/deletion.py
For example, models.SET_NULL is implemented as:
def SET_NULL(collector, field, sub_objs, using):
collector.add_field_update(field, None, sub_objs)
And models.CASCADE (which is slightly more complicated) is implemented as:
def CASCADE(collector, field, sub_objs, using):
collector.collect(sub_objs, source=field.remote_field.model,
source_attr=field.name, nullable=field.null)
if field.null and not connections[using].features.can_defer_constraint_checks:
collector.add_field_update(field, None, sub_objs)
So, if you figure out what those arguments are then you should be able to define your own function to pass to the on_delete argument for model fields. collector is most likely an instance of Collector (defined in the same file, not sure what it's for exactly), field is most likely the model field being deleted, sub_objs is likely instances that relate to the object by that field, and using denotes the database being used.
There are alternatives for custom logic for deletions too, incase overriding the on_delete may be a bit overkill for you.
The post_delete and pre_delete allows you define some custom logic to run before or after an instance is deleted.
from django.db.models.signals import post_save
def delete_ipv4manage(sender, instance, using):
print('{instance} was deleted'.format(instance=str(instance)))
post_delete.connect(delete_ipv4manage, sender=IPv4Manage)
And lastly you can override the delete() method of the Model/Queryset, however be aware of caveats with bulk deletes using this method:
Overridden model methods are not called on bulk operations
Note that the delete() method for an object is not necessarily called when deleting objects in bulk using a QuerySet or as a result of a cascading delete. To ensure customized delete logic gets executed, you can use pre_delete and/or post_delete signals.

Another useful solution is to use the models.SET() where you can pass a function (deleted_guest in the example below)
guest = models.ForeignKey('Guest', on_delete=models.SET(deleted_guest))
and the function deleted_guest is
DELETED_GUEST_EMAIL = 'deleted-guest#introtravel.com'
def deleted_guest():
""" used for setting the guest field of a booking when guest is deleted """
from intro.models import Guest
from django.conf import settings
deleted_guest, created = Guest.objects.get_or_create(
first_name='Deleted',
last_name='Guest',
country=settings.COUNTRIES_FIRST[0],
email=DELETED_GUEST_EMAIL,
gender='M')
return deleted_guest
You can't send any parameters and you have to be careful with circular imports. In my case I am just setting a filler record, so the parent model has a predefined guest to represent one that has been deleted. With the new GDPR rules we gotta be able to delete guest information.

CASCADE and PROTECT etc are in fact functions, so you should be able to inject your own logic there. However, it will take a certain amount of inspection of the code to figure out exactly how to get the effect you're looking for.
Depending what you want to do it might be relatively easy, for example the PROTECT function just raises an exception:
def PROTECT(collector, field, sub_objs, using):
raise ProtectedError(
"Cannot delete some instances of model '%s' because they are "
"referenced through a protected foreign key: '%s.%s'" % (
field.remote_field.model.__name__, sub_objs[0].__class__.__name__, field.name
),
sub_objs
)
However if you want something more complex you'd have to understand what the collector is doing, which is certainly discoverable.
See the source for django.db.models.deletion to get started.

There is nothing stopping you from adding your own logic. However, you need to consider multiple factors including compatibility with the database that you are using.
For most use cases, the out of the box logic is good enough if your database design is correct. Please check out your available options here https://docs.djangoproject.com/en/2.0/ref/models/fields/#django.db.models.ForeignKey.on_delete.

Clearing specific cache in Django

I am using view caching for a django project.
It says the cache uses the URL as the key, so I'm wondering how to clear the cache of one of the keys if a user updates/deletes the object.
An example: A user posts a blog post to domain.com/post/1234/ .. If the user edits that, i'd like to delete the cached version of that URL by adding some kind of delete cache command at the end of the view that saves the edited post.
I'm using:
#cache_page(60 * 60)
def post_page(....):
If the post.id is 1234, it seems like this might work, but it's not:
def edit_post(....):
# stuff that saves the edits
cache.delete('/post/%s/' % post.id)
return Http.....

From django cache docs, it says that cache.delete('key') should be enough. So, it comes to my mind two problems you might have:
Your imports are not correct, remember that you have to import cache from the django.core.cache module:
from django.core.cache import cache
# ...
cache.delete('my_url')
The key you're using is not correct (maybe it uses the full url, including "domain.com"). To check which is the exact url you can go into your shell:
$ ./manage.py shell
>>> from django.core.cache import cache
>>> cache.has_key('/post/1234/')
# this will return True or False, whether the key was found or not
# if False, keep trying until you find the correct key ...
>>> cache.has_key('domain.com/post/1234/') # including domain.com ?
>>> cache.has_key('www.domain.com/post/1234/') # including www.domain.com ?
>>> cache.has_key('/post/1234') # without the trailing / ?

I make a function to delete key starting with some text. This help me to delete dynamic keys.
list posts cached
def get_posts(tag, page=1):
cached_data = cache.get('list_posts_home_tag%s_page%s' % (tag, page))
if not cached_data:
cached_data = mycontroller.get_posts(tag, page)
cache.set('list_posts_home_tag%s_page%s' % (tag, page), cached_data, 60)
return cached_data
when update any post, call flush_cache
def update(data):
response = mycontroller.update(data)
flush_cache('list_posts_home')
return response
flush_cache to delete any dynamic cache
def flush_cache(text):
for key in list(cache._cache.keys()):
if text in key:
cache.delete(key.replace(':1:', ''))
Do not forget to import cache from django
from django.core.cache import cache

I had same problem and I found this solution.
if you are using this decorator:
django.views.decorators.cache.cache_page
you can use this function:
import hashlib
from typing import List
from django.core.cache import cache
def get_cache_keys_from_url(absolute_uri) -> List[str]:
url = hashlib.md5(f"absolute_uri".encode('ascii'))
return cache.keys(f'*{url.hexdigest()}*')
which you need to get cache keys for absolute_uri
and than use cache.delete_many(get_cache_keys_from_url(http://exemple.com/post/1234/)) for clear page with url - http://exemple.com/post/1234/

There's a trick that might work for this. The cache_page decorator takes an optional argument for key_prefix. It's supposed to be used when you have, say, multiple sites on a single cache. Here's what the docs say:
CACHE_MIDDLEWARE_KEY_PREFIX – If the cache is shared across multiple sites using the same Django installation, set this to the name of the site, or some other string that is unique to this Django instance, to prevent key collisions. Use an empty string if you don’t care.
But if you don't have multiple sites, you can abuse it thus:
cache_page(cache_length, key_prefx="20201215")
I used the date as the prefix so it's easy to rotate through new invalidation keys if you want. This should work nicely.
However! Please note that if you are using this trick, some caches (e.g., the DB cache, filesystem cache, and probably others) do not clean up expired entries except when they're accessed. If you use the trick above, you won't ever access the cache entry again and it'll linger until you clear it out. Probably doesn't matter, but worth considering.

Python - caching a property to avoid future calculations

In the following example, cached_attr is used to get or set an attribute on a model instance when a database-expensive property (related_spam in the example) is called. In the example, I use cached_spam to save queries. I put print statements when setting and when getting values so that I could test it out. I tested it in a view by passing an Egg instance into the view and in the view using {{ egg.cached_spam }}, as well as other methods on the Egg model that make calls to cached_spam themselves. When I finished and tested it out the shell output in Django's development server showed that the attribute cache was missed several times, as well as successfully gotten several times. It seems to be inconsistent. With the same data, when I made small changes (as little as changing the print statement's string) and refreshed (with all the same data), different amounts of misses / successes happened. How and why is this happening? Is this code incorrect or highly problematic?
class Egg(models.Model):
... fields
#property
def related_spam(self):
# Each time this property is called the database is queried (expected).
return Spam.objects.filter(egg=self).all() # Spam has foreign key to Egg.
#property
def cached_spam(self):
# This should call self.related_spam the first time, and then return
# cached results every time after that.
return self.cached_attr('related_spam')
def cached_attr(self, attr):
"""This method (normally attached via an abstract base class, but put
directly on the model for this example) attempts to return a cached
version of a requested attribute, and calls the actual attribute when
the cached version isn't available."""
try:
value = getattr(self, '_p_cache_{0}'.format(attr))
print('GETTING - {0}'.format(value))
except AttributeError:
value = getattr(self, attr)
print('SETTING - {0}'.format(value))
setattr(self, '_p_cache_{0}'.format(attr), value)
return value

Nothing wrong with your code, as far as it goes. The problem probably isn't there, but in how you use that code.
The main thing to realise is that model instances don't have identity. That means that if you instantiate an Egg object somewhere, and a different one somewhere else, even if they refer to the same underlying database row they won't share internal state. So calling cached_attr on one won't cause the cache to be populated in the other.
For example, assuming you have a RelatedObject class with a ForeignKey to Egg:
my_first_egg = Egg.objects.get(pk=1)
my_related_object = RelatedObject.objects.get(egg__pk=1)
my_second_egg = my_related_object.egg
Here my_first_egg and my_second_egg both refer to the database row with pk 1, but they are not the same object:
>>> my_first_egg.pk == my_second_egg.pk
True
>>> my_first_egg is my_second_egg
False
So, filling the cache on my_first_egg doesn't fill it on my_second_egg.
And, of course, objects won't persist across requests (unless they're specifically made global, which is horrible), so the cache won't persist either.

Http servers that scale are shared-nothing; you can't rely on anything being singleton. To share state, you need to connect to a special-purpose service.
Django's caching support is appropriate for your use case. It isn't necessarily a global singleton either; if you use locmem://, it will be process-local, which could be the more efficient choice.

How do I force Django to ignore any caches and reload data?

I'm using the Django database models from a process that's not called from an HTTP request. The process is supposed to poll for new data every few seconds and do some processing on it. I have a loop that sleeps for a few seconds and then gets all unhandled data from the database.
What I'm seeing is that after the first fetch, the process never sees any new data. I ran a few tests and it looks like Django is caching results, even though I'm building new QuerySets every time. To verify this, I did this from a Python shell:
>>> MyModel.objects.count()
885
# (Here I added some more data from another process.)
>>> MyModel.objects.count()
885
>>> MyModel.objects.update()
0
>>> MyModel.objects.count()
1025
As you can see, adding new data doesn't change the result count. However, calling the manager's update() method seems to fix the problem.
I can't find any documentation on that update() method and have no idea what other bad things it might do.
My question is, why am I seeing this caching behavior, which contradicts what Django docs say? And how do I prevent it from happening?

Having had this problem and found two definitive solutions for it I thought it worth posting another answer.
This is a problem with MySQL's default transaction mode. Django opens a transaction at the start, which means that by default you won't see changes made in the database.
Demonstrate like this
Run a django shell in terminal 1
>>> MyModel.objects.get(id=1).my_field
u'old'
And another in terminal 2
>>> MyModel.objects.get(id=1).my_field
u'old'
>>> a = MyModel.objects.get(id=1)
>>> a.my_field = "NEW"
>>> a.save()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>>
Back to terminal 1 to demonstrate the problem - we still read the old value from the database.
>>> MyModel.objects.get(id=1).my_field
u'old'
Now in terminal 1 demonstrate the solution
>>> from django.db import transaction
>>>
>>> #transaction.commit_manually
... def flush_transaction():
... transaction.commit()
...
>>> MyModel.objects.get(id=1).my_field
u'old'
>>> flush_transaction()
>>> MyModel.objects.get(id=1).my_field
u'NEW'
>>>
The new data is now read
Here is that code in an easy to paste block with docstring
from django.db import transaction
#transaction.commit_manually
def flush_transaction():
"""
Flush the current transaction so we don't read stale data
Use in long running processes to make sure fresh data is read from
the database. This is a problem with MySQL and the default
transaction mode. You can fix it by setting
"transaction-isolation = READ-COMMITTED" in my.cnf or by calling
this function at the appropriate moment
"""
transaction.commit()
The alternative solution is to change my.cnf for MySQL to change the default transaction mode
transaction-isolation = READ-COMMITTED
Note that that is a relatively new feature for Mysql and has some consequences for binary logging / slaving. You could also put this in the django connection preamble if you wanted.
Update 3 years later
Now that Django 1.6 has turned on autocommit in MySQL this is no longer a problem. The example above now works fine without the flush_transaction() code whether your MySQL is in REPEATABLE-READ (the default) or READ-COMMITTED transaction isolation mode.
What was happening in previous versions of Django which ran in non autocommit mode was that the first select statement opened a transaction. Since MySQL's default mode is REPEATABLE-READ this means that no updates to the database will be read by subsequent select statements - hence the need for the flush_transaction() code above which stops the transaction and starts a new one.
There are still reasons why you might want to use READ-COMMITTED transaction isolation though. If you were to put terminal 1 in a transaction and you wanted to see the writes from the terminal 2 you would need READ-COMMITTED.
The flush_transaction() code now produces a deprecation warning in Django 1.6 so I recommend you remove it.

We've struggled a fair bit with forcing django to refresh the "cache" - which it turns out wasn't really a cache at all but an artifact due to transactions. This might not apply to your example, but certainly in django views, by default, there's an implicit call to a transaction, which mysql then isolates from any changes that happen from other processes ater you start.
we used the #transaction.commit_manually decorator and calls to transaction.commit() just before every occasion where you need up-to-date info.
As I say, this definitely applies to views, not sure whether it would apply to django code not being run inside a view.
detailed info here:
http://devblog.resolversystems.com/?p=439

I'm not sure I'd recommend it...but you can just kill the cache yourself:
>>> qs = MyModel.objects.all()
>>> qs.count()
1
>>> MyModel().save()
>>> qs.count() # cached!
1
>>> qs._result_cache = None
>>> qs.count()
2
And here's a better technique that doesn't rely on fiddling with the innards of the QuerySet: Remember that the caching is happening within a QuerySet, but refreshing the data simply requires the underlying Query to be re-executed. The QuerySet is really just a high-level API wrapping a Query object, plus a container (with caching!) for Query results. Thus, given a queryset, here is a general-purpose way of forcing a refresh:
>>> MyModel().save()
>>> qs = MyModel.objects.all()
>>> qs.count()
1
>>> MyModel().save()
>>> qs.count() # cached!
1
>>> from django.db.models import QuerySet
>>> qs = QuerySet(model=MyModel, query=qs.query)
>>> qs.count() # refreshed!
2
>>> party_time()
Pretty easy! You can of course implement this as a helper function and use as needed.

If you append .all() to a queryset, it'll force a reread from the DB. Try
MyModel.objects.all().count() instead of MyModel.objects.count().

Seems like the count() goes to cache after the first time. This is the django source for QuerySet.count:
def count(self):
"""
Performs a SELECT COUNT() and returns the number of records as an
integer.
If the QuerySet is already fully cached this simply returns the length
of the cached results set to avoid multiple SELECT COUNT(*) calls.
"""
if self._result_cache is not None and not self._iter:
return len(self._result_cache)
return self.query.get_count(using=self.db)
update does seem to be doing quite a bit of extra work, besides what you need.
But I can't think of any better way to do this, short of writing your own SQL for the count.
If performance is not super important, I would just do what you're doing, calling update before count.
QuerySet.update:
def update(self, **kwargs):
"""
Updates all elements in the current QuerySet, setting all the given
fields to the appropriate values.
"""
assert self.query.can_filter(), \
"Cannot update a query once a slice has been taken."
self._for_write = True
query = self.query.clone(sql.UpdateQuery)
query.add_update_values(kwargs)
if not transaction.is_managed(using=self.db):
transaction.enter_transaction_management(using=self.db)
forced_managed = True
else:
forced_managed = False
try:
rows = query.get_compiler(self.db).execute_sql(None)
if forced_managed:
transaction.commit(using=self.db)
else:
transaction.commit_unless_managed(using=self.db)
finally:
if forced_managed:
transaction.leave_transaction_management(using=self.db)
self._result_cache = None
return rows
update.alters_data = True

You can also use MyModel.objects._clone().count(). All of the methods in the the QuerySet call _clone() prior to doing any work - that ensures that any internal caches are invalidated.
The root cause is that MyModel.objects is the same instance each time. By cloning it you're creating a new instance without the cached value. Of course, you can always reach in and invalidate the cache if you'd prefer to use the same instance.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Django model reload_from_db() vs. explicitly recalling from db - python

Related

How can I deal with a massive delete from Django Admin?

Custom the `on_delete` param function in Django model fields

Clearing specific cache in Django

Python - caching a property to avoid future calculations

How do I force Django to ignore any caches and reload data?

Categories

Resources