What are the specifics and conditions to django auto-caching query sets - python

Django stated in their docs that all query sets are automatically cached, https://docs.djangoproject.com/en/dev/topics/db/queries/#caching-and-querysets. But they weren't super specific with the details of this functionality.
The example that they gave was to save the qs in a python variable, and subsequent calls after the first will be taken from the cache.
queryset = Entry.objects.all()
print([p.headline for p in queryset]) # Evaluate the query set.
print([p.pub_date for p in queryset]) # Re-use the cache from the evaluation.
So if the exact same queryset call is made twice in a row without keeping it in a variable, for example each time a user loads a view, would the results not be cached?
# When the user loads the homepage, call number one (not cached)
def home(request):
    entries = Entry.objects.filter(something)
    return render_to_response(...)

# Call number two, is this cached automatically? Or do I need to import cache and
# manually do it? This is the same method as above, called twice
def home(request):
    entries = Entry.objects.filter(something)
    return render_to_response(...)
Sorry if this is confusing; I pasted the method twice to make it look like the user is calling it twice, but it's just one method. Are entries automatically cached?
Thanks

The queryset example you have given rightly shows that querysets are evaluated lazily, i.e. the first time they are used. When the same queryset object, held in a variable, is used again in the same flow, it is not evaluated a second time. This is less a general-purpose cache than the re-use of an already evaluated result for as long as that object is available.
For the kind of caching you are looking for, i.e. the same view called twice, you will need to manually cache the database object when it is fetched the first time. Memcached is good for this. Then check the cache first on each call and fall back to the database, as in the example below.
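from django.core.cache import cache

def view(request):
    # Look up previously stored results for this user.
    results = cache.get(request.user.id)
    if results is None:
        # Cache miss: compute once and store for later requests.
        results = do_a_ton_of_work()
        cache.set(request.user.id, results)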
There are of course a lot of other ways to do caching at different levels right from your proxy server to per url caching. Whatever works best for you. Here is a good read on this topic.

It is not cached for two reasons:
When you only call filter but don't "loop" through the results, the queryset is not yet evaluated, which means its cache is still empty.
Even if it were evaluated, the result would not be reused, because when you call the function a second time the queryset is recreated (a new local variable), even though you already created one the first time you called the function. The second function call simply does not "know" what you did before; it is a new queryset instance. In this case you might rely on the database's own cache, though.
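The two reasons above can be illustrated without Django. The sketch below uses a hypothetical LazyResults class as a stand-in for a QuerySet; it is not Django's implementation, only a model of the behaviour described: nothing is fetched until iteration, the results are then stored on that one instance, and a second call to the view builds a fresh instance with an empty cache.

```python
class LazyResults:
    """Hypothetical stand-in for a QuerySet: lazy, with a per-instance cache."""

    db_hits = 0  # counts simulated database queries across all instances

    def __init__(self, rows):
        self._rows = rows            # pretend these rows live in the database
        self._result_cache = None    # stays empty until the first evaluation

    def __iter__(self):
        if self._result_cache is None:
            LazyResults.db_hits += 1              # simulated query
            self._result_cache = list(self._rows)
        return iter(self._result_cache)


def home():
    """Simulated view: builds a new LazyResults on every call."""
    entries = LazyResults(["first", "second"])
    headlines = [e for e in entries]   # evaluates: one simulated query
    again = [e for e in entries]       # same instance: served from its cache
    return headlines, again


home()   # first "request"
home()   # second "request": new instance, so it queries again
print(LazyResults.db_hits)   # prints 2
```

Within one call the second loop costs nothing, but each call to home() pays for one query, which is why a cross-request cache has to be set up explicitly.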

Django ships with a built-in Memcached cache backend, and it works well, but you can also try "johnny cache" for better results.
You can get more info here:
http://packages.python.org/johnny-cache/

Related

Is using any with a QuerySet unoptimal?

Many times, one needs to check if there is at least one element inside a QuerySet. Mostly, I use exists:
if queryset.exists():
    ...
However, I've seen colleagues using python's any function:
if any(queryset):
    ...
Is using python's any function unoptimal?
My intuition tells me that this is a similar dilemma to one between using count and len: any will iterate through the QuerySet and, therefore, will need to evaluate it. In a case where we will use the QuerySet items, this doesn't create any slowdowns. However, if we need just to check if any pieces of data that satisfy a query exist, this might load data that we do not need.
Is using python's any function unoptimal?
The most Pythonic way would be:
if queryset:
    # …
Indeed, a QuerySet has truthiness True if it contains at least one item, and False otherwise.
In case you later want to enumerate over the queryset (with a for loop for example), it will load the items in the cache if you check its truthiness, so for example:
if queryset:
    for item in queryset:
        # …
will only make one query to the database: one that will fetch all items when you check the if queryset, and then later you can reuse that cache without making a second query.
In case you do not consume the queryset later in the process, then you can work with a .exists() [Django-doc]: this will not load records in memory, but only make a query to check if at least one such record exists, this is thus less expensive in terms of bandwidth between the application and the database. If you however have to consume the queryset later, using .exists() is not a good idea, since then we make two queries.
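Using any(queryset), however, is nonsensical: you can already check whether a queryset contains elements by its truthiness. any() evaluates the queryset in the same way and then additionally tests the truthiness of the items it iterates over, so it will usually only make the check slightly less efficient.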
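To make the cost difference concrete, here is a sketch using Python's built-in sqlite3 module rather than Django. The table name entry and the two helper functions are made up for illustration, and the SQL only approximates what Django sends: checking truthiness loads every row, whereas .exists() asks the database for at most one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, headline TEXT)")
conn.executemany(
    "INSERT INTO entry (headline) VALUES (?)",
    [("headline %d" % i,) for i in range(1000)],
)


def truthiness_check():
    # Roughly what `if queryset:` costs: every row is fetched and kept.
    rows = conn.execute("SELECT id, headline FROM entry").fetchall()
    return bool(rows), len(rows)


def exists_check():
    # Roughly what `queryset.exists()` costs: at most one tiny row.
    row = conn.execute("SELECT 1 FROM entry LIMIT 1").fetchone()
    return row is not None


print(truthiness_check())  # (True, 1000)
print(exists_check())      # True
```

Both checks give the same answer; the difference is only in how much data travels from the database, which matters when the rows are not used afterwards.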

What does it mean when written id=-1 in django request?

I'm reading someone's code, and there is written
get_object_or_404(Order, id=-1)
Could someone explain the purpose of id=-1?
Well, get_object_or_404 [Django-doc] takes as input a model or queryset, and filters it with the remaining positional and named parameters. It then tries to fetch that object, and raises an Http404 exception (which results in a 404 response) in case the object does not exist.
Here we thus aim to obtain an Order object with id=-1. So the query that is executed "behind the curtains" is:
Order.objects.get(id=-1) # SELECT order.* FROM order WHERE id=-1
In most databases ids are however (strictly) positive (if these are assigned automatically). So unless an Order object is explicitly saved with id=-1, this will always raise a 404 exception.
Sometimes however one stores objects with negative id to make it easy to retrieve and update "special" ones (although personally I think it is not a good practice, since this actually is related to the singleton and global state anti-patterns). You thus can look (for example in the database, or in the code) if there are objects with negative ids. If these objects are not created, then this code will always result in a 404 response.
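The claim that automatically assigned ids are positive can be checked with a small sketch using sqlite3. The orders table and the get_order helper are hypothetical, and other databases may behave differently if a sequence has been configured by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, note TEXT)"
)
for note in ("a", "b", "c"):
    conn.execute("INSERT INTO orders (note) VALUES (?)", (note,))


def get_order(order_id):
    # Plays the role of Order.objects.get(id=...); returns None when no row matches.
    return conn.execute(
        "SELECT id, note FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


ids = [row[0] for row in conn.execute("SELECT id FROM orders ORDER BY id")]
print(ids)             # [1, 2, 3]
print(get_order(-1))   # None: the case in which get_object_or_404 raises Http404
```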

Django - queryset caching request-independent?

I need to cache a mid-sized queryset (about 500 rows). I had a look on some solutions, django-cache-machine being the most promising.
Since the queryset is pretty much static (it's a table of cities that's been populated in advance and gets updated only by me and anyway, almost never), I just need to serve the same queryset at every request for filtering.
In my search, one detail was really not clear to me: is the cache a sort of singleton object, which is available to every request? By which I mean, if two different users access the same page, and the queryset is evaluated for the first user, does the second one get the cached queryset?
I could not figure out what problem exactly you are facing. What you describe is the classic use case for caching. Memcached and Redis are the two most popular options. You just need to write a method or function that first tries to load the result from the cache and, if it is not there, queries the database. E.g.:
from django.core.cache import cache

def cache_user(userid):
    key = "user_{0}".format(userid)
    value = cache.get(key)
    if value is None:
        # fetch value from db and assign it to `value` here
        cache.set(key, value)
    return value
Although for simplicity, I have written this as function, ideally this should be a manager method of the concerned model.
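On the question of whether the cache is shared between requests: with a backend such as Memcached or Redis the cache lives outside any single request, so a value stored while serving one user is available when serving the next. The sketch below models that with a hypothetical SharedCache class standing in for django.core.cache.cache (only get and set are imitated) and a made-up load_cities function in place of the real query.

```python
class SharedCache:
    """Hypothetical stand-in for a cache backend shared by all requests."""

    def __init__(self):
        self._store = {}

    def get(self, key, default=None):
        return self._store.get(key, default)

    def set(self, key, value):
        self._store[key] = value


cache = SharedCache()   # one object, created once, outside any request
db_queries = 0


def load_cities():
    """Pretend database query for the rarely changing city table."""
    global db_queries
    db_queries += 1
    return ["Amsterdam", "Berlin", "Cairo"]


def cities_view(user):
    cities = cache.get("cities")
    if cities is None:              # cache miss: query once and store
        cities = load_cities()
        cache.set("cities", cities)
    return user, cities


cities_view("first user")    # miss: hits the "database"
cities_view("second user")   # hit: served from the shared cache
print(db_queries)            # 1
```

The key "cities" does not depend on the user, which is what makes the entry reusable across users; the earlier cache_user example keys by user id and therefore caches per user.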

Django caching queries (I don't want it to)

So I'm currently working in Python/Django and I have a problem where Django caches querysets "within a session".
If I run python manage.py shell and do so:
>>> from myproject.services.models import *
>>> test = TestModel.objects.filter(pk=5)
>>> print(test[0].name)
John
Now, if I then update it directly in SQL to Bob and run it again, it'll still say John. If I however CTRL+D out (exit) and run the same thing, it will have updated and will now print Bob.
My problem is that I'm running a SOAP service in a screen and it'll always return the same result, even if the data gets changed.
I need a way to force the query to actually pull the data from the database again, not just pull the cached data. I could just use raw queries but that doesn't feel like a solution to me, any ideas?
The queryset is not cached 'within a session'.
The Django documentation: Caching and QuerySets mentions:
Each QuerySet contains a cache to minimize database access. Understanding how it works will allow you to write the most efficient code.
In a newly created QuerySet, the cache is empty. The first time a QuerySet is evaluated – and, hence, a database query happens – Django saves the query results in the QuerySet’s cache and returns the results that have been explicitly requested (e.g., the next element, if the QuerySet is being iterated over). Subsequent evaluations of the QuerySet reuse the cached results.
Keep this caching behavior in mind, because it may bite you if you don’t use your QuerySets correctly.
(emphasis mine)
For more information on when querysets are evaluated, refer to this link.
If it is critical for your application that the queryset gets updated, you have to evaluate a fresh queryset each time you need current data, be it within a single view function or in response to an AJAX call; calling .all() on an existing queryset, for example, returns a new queryset with an empty cache.
It is like running a SQL query again and again, as in the old days when there were no querysets and you kept the data in some structure that you had to refresh yourself.
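The effect described above, a result that stays fixed until the query is run again, can be reproduced outside Django. This is a minimal sketch with sqlite3; the testmodel table and the fetch_names helper are assumptions made for illustration, and the list cached merely plays the role of an evaluated queryset's cache.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testmodel (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO testmodel (id, name) VALUES (5, 'John')")


def fetch_names():
    # Runs the query every time it is called, like evaluating a new queryset.
    cursor = conn.execute("SELECT name FROM testmodel WHERE id = 5")
    return [row[0] for row in cursor]


cached = fetch_names()   # evaluated once and kept, like a result cache

# The row changes behind the back of the stored result.
conn.execute("UPDATE testmodel SET name = 'Bob' WHERE id = 5")

fresh = fetch_names()    # running the query again sees the change
print(cached, fresh)     # ['John'] ['Bob']
```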

How can I quickly set a field of all instances of a Django model at once?

To clarify, I've got several thousand Property items, each with a 'present' field (among others). To reset the system for use again, I need to set every item's 'present' field to false. Now, of course there's the easy way to do it, which is just:
for obj in Property.objects.all():
    obj.present = False
    obj.save()
But this takes nearly 30 seconds on my development server. I feel there must be a better way, so I tried limiting the loaded fields using Django's only queryset:
for obj in Property.objects.only('present'):
    obj.present = False
    obj.save()
For whatever reason, this actually takes longer than just getting the entire object.
Because I need to indiscriminately set all of these values to False, is there a faster way? This function takes no user input other than the 'go do it' command, so I feel a native SQL command would be a safe option, but I don't know SQL enough to draft such a command.
Thanks everyone.
Use the update query:
Property.objects.all().update(present=False)
Note that update() runs at the SQL level, so if your model has a custom save() method, it is not going to be called here (and the pre_save and post_save signals are not sent either). In that case, the normal for-loop version that you're using is the way to go.
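For reference, the SQL that a call like update(present=False) boils down to is a single UPDATE statement. The sketch below runs such a statement with sqlite3; the table name property and its columns are assumptions, since the real table name depends on the app label and model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE property (id INTEGER PRIMARY KEY, present INTEGER)")
conn.executemany("INSERT INTO property (present) VALUES (?)", [(1,)] * 5000)

# One statement updates every row; no objects are loaded into Python.
cursor = conn.execute("UPDATE property SET present = 0")
conn.commit()

remaining = conn.execute(
    "SELECT COUNT(*) FROM property WHERE present = 1"
).fetchone()[0]
print(cursor.rowcount, remaining)   # 5000 0
```

The for-loop version instead issues one UPDATE per object after loading each of them, which is where most of the 30 seconds goes.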
