So I'm currently working in Python/Django and I have a problem where Django caches querysets "within a session".
If I run python manage.py shell and do the following:
>>> from myproject.services.models import *
>>> test = TestModel.objects.filter(pk = 5)
>>> print test[0].name
John
Now, if I update the name directly in SQL to Bob and run the query again, it will still print John. If, however, I exit the shell (CTRL+D) and run the same thing again, it will pick up the change and print Bob.
My problem is that I'm running a SOAP service in a screen and it'll always return the same result, even if the data gets changed.
I need a way to force the query to actually pull the data from the database again, not just pull the cached data. I could just use raw queries but that doesn't feel like a solution to me, any ideas?
The queryset is not cached 'within a session'.
The Django documentation: Caching and QuerySets mentions:
Each QuerySet contains a cache to minimize database access. Understanding how it works will allow you to write the most efficient code.
In a newly created QuerySet, the cache is empty. The first time a QuerySet is evaluated – and, hence, a database query happens – Django saves the query results in the QuerySet’s cache and returns the results that have been explicitly requested (e.g., the next element, if the QuerySet is being iterated over). Subsequent evaluations of the QuerySet reuse the cached results.
Keep this caching behavior in mind, because it may bite you if you don’t use your QuerySets correctly.
(emphasis mine)
For more information on when querysets are evaluated, refer to this link.
If it is critical for your application that the querysets are up to date, you have to build and evaluate a new queryset each time, be it within a single view function or via AJAX.
It is like running the SQL query again and again, as in the old days when there were no querysets and you kept the data in some structure that you had to refresh yourself.
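To make the result cache concrete, here is a minimal stdlib-only simulation (not Django's real implementation, just a sketch of the behaviour the docs describe): iterating the same object twice hits the "database" once, while a freshly built object hits it again.

```python
class CachedQuery:
    """Toy stand-in for a Django QuerySet's result cache."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable standing in for the SQL query
        self._result_cache = None  # filled on first evaluation

    def __iter__(self):
        if self._result_cache is None:           # first evaluation only
            self._result_cache = list(self._fetch())
        return iter(self._result_cache)


db_hits = []

def fetch_from_db():
    db_hits.append(1)              # count round-trips to the "database"
    return ["John"]

qs = CachedQuery(fetch_from_db)
list(qs)
list(qs)                           # evaluated twice, queried once
print(len(db_hits))                # -> 1

fresh = CachedQuery(fetch_from_db) # a brand-new queryset has an empty cache
list(fresh)
print(len(db_hits))                # -> 2
```

This is why re-building the queryset on every SOAP request is enough to get fresh data.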
Related
This is the use case: I have a server that receives instructions from many clients. Each client's instructions are handled by its own Session object, which holds all the information about the state of the session and queries mongoengine for the data it needs.
Now, suppose session1 queries mongoengine and gets document "A" as a document object.
Later, session2 also queries and gets document "A", as another separate document object.
Now we have 2 document objects representing document "A", and to get them consistent I need to call A.update() and A.reload() all the time, which seems unnecessary.
Is there any way I can get a reference to the same document object over the two queries? This way both sessions could make changes to the document object and those changes would be seen by the other sessions, since they would be made to the same python object.
I've thought about making a wrapper for mongoengine that caches the documents that we have as document objects at runtime and ensures there are no multiple objects for the same document at any given time. But my knowledge of mongoengine is too rudimentary to do it at the time.
Any thoughts on this? Is my entire design flawed? Is there any easy solution?
I don't think going in that direction is a good idea. From what I understand you are in a web application context; you might be able to get something working for threads within a single process, but you won't be able to share instances across different processes (and it gets even worse if you have processes running on different machines).
One way to address this is optimistic concurrency control: you maintain a field like a version identifier that gets updated whenever the instance is updated, and whenever you save the object you run a query like "update the object if the version-identifier matches, else fail".
This means that if there are concurrent requests, one of them will succeed (the first one to be flushed) and the other will fail, because the version identifier it holds is outdated. MongoEngine has no built-in support for this, but more info can be found here: https://github.com/MongoEngine/mongoengine/issues/1563
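The version-check idea is easy to sketch without any database. Everything below (StaleWriteError, save_with_version_check, the dict standing in for the collection) is made up for illustration, not a MongoEngine API:

```python
class StaleWriteError(Exception):
    pass

# a plain dict standing in for the collection: id -> document
store = {"A": {"version": 1, "name": "John"}}

def save_with_version_check(doc_id, expected_version, new_name):
    """Update only if the stored version still matches, i.e.
    'update ... where version = expected_version, else fail'."""
    doc = store[doc_id]
    if doc["version"] != expected_version:
        raise StaleWriteError("document was modified by someone else")
    doc["name"] = new_name
    doc["version"] += 1

# session1 and session2 both read the document at version 1
seen_version = store["A"]["version"]
save_with_version_check("A", seen_version, "Bob")        # session1 wins
try:
    save_with_version_check("A", seen_version, "Carol")  # session2 is stale
except StaleWriteError:
    print("session2 must reload and retry")
```

Each session then reloads on failure instead of constantly calling update()/reload() just in case.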
I need to cache a mid-sized queryset (about 500 rows). I had a look on some solutions, django-cache-machine being the most promising.
Since the queryset is pretty much static (it's a table of cities that's been populated in advance and gets updated only by me and anyway, almost never), I just need to serve the same queryset at every request for filtering.
In my search, one detail was really not clear to me: is the cache a sort of singleton object, which is available to every request? By which I mean, if two different users access the same page, and the queryset is evaluated for the first user, does the second one get the cached queryset?
I could not figure out exactly what problem you are facing. What you describe is the classical use case for caching. Memcached and Redis are the two most popular options. You just need to write a method or function that first tries to load the result from the cache and, if it is not there, queries the database. E.g.:
from django.core.cache import cache
from django.contrib.auth.models import User  # or whichever model you cache

def cache_user(userid):
    key = "user_{0}".format(userid)
    value = cache.get(key)
    if value is None:
        # cache miss: fetch the value from the db and store it under the key
        value = User.objects.get(pk=userid)
        cache.set(key, value)
    return value
Although for simplicity I have written this as a function, ideally it should be a manager method on the model concerned.
In the Django 1.8 release notes, it is mentioned that Django fields no longer use SubfieldBase, and that the to_python call on assignment has been replaced with from_db_value.
The docs also state that from_db_value is called whenever data is loaded from the database.
My question is, is from_db_value called if I directly read/write the db (i.e. using cursor.execute())? My initial tries and intuition says no, but I just want to make sure.
See The Django Documentation for Executing custom SQL directly.
Sometimes even Manager.raw() isn’t quite enough: you might need to perform queries that don’t map cleanly to models, or directly execute UPDATE, INSERT, or DELETE queries.
In these cases, you can always access the database directly, routing around the model layer entirely.
The above states that using cursor.execute() will bypass the model logic entirely, returning the raw row results.
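The same behaviour can be seen with any DB-API driver. A small sketch with stdlib sqlite3 (standing in for your actual backend; Django's connection.cursor() behaves the same way) shows that cursor.execute() hands back plain tuples, with no model instances and no field-conversion hooks involved:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE myapp_person (first_name TEXT, last_name TEXT)")
cur.execute("INSERT INTO myapp_person VALUES ('John', 'Smith')")

# Raw cursor access: no model layer, no from_db_value, just the rows
# exactly as the database driver returns them.
cur.execute("SELECT first_name, last_name FROM myapp_person")
print(cur.fetchall())  # -> [('John', 'Smith')]
```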
If you want to perform raw queries and return model objects, see the Django Documentation on Performing raw queries.
The raw() manager method can be used to perform raw SQL queries that return model instances:
>>> for p in Person.objects.raw('SELECT * FROM myapp_person'):
...     print(p)
John Smith
Jane Jones
I have a django model, TestModel, over an SQL database.
Whenever I do
TestModel.objects.all()
I seem to be getting the same results if I run it multiple times from the same process. I tested that by manually deleting (without using ANY of the django primitives) a row from the table the model is built on; the query still returns the same results, even though there should obviously be fewer objects after the delete.
Is there a caching mechanism of some sort and django is not going to the database every time I want to retrieve the objects?
If there is, is there a way I could still force django to go to the database on each query, preferably without writing raw SQL queries?
I should also specify that by restarting the process the model once again returns the correct objects, I don't see the deleted ones anymore, but if I delete some more the issue occurs again.
This is because your database isolation level is REPEATABLE READ. In a Django shell, all queries are enclosed in a single transaction.
You can try in your shell:
from django.db import transaction

with transaction.autocommit():   # transaction API from before Django 1.6
    t = TestModel.objects.all()
    ...
Sounds like a db transaction issue. If you're keeping a shell session open while you separately go into the database itself and modify data, the transaction that's open in the shell won't see the changes because of isolation. You'll need to exit and reload the shell to get a new transaction before you can see them.
Note that in production, transactions are tied to the request/response cycle so this won't be a significant issue.
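The snapshot effect can be reproduced outside Django with stdlib sqlite3 in WAL mode (similar in spirit to MySQL's REPEATABLE READ, which the shell scenario above typically runs under). The long-lived read transaction keeps seeing its snapshot until it ends:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path, isolation_level=None)  # autocommit
writer.execute("PRAGMA journal_mode=WAL")             # readers get a stable snapshot
writer.execute("CREATE TABLE testmodel (name TEXT)")
writer.execute("INSERT INTO testmodel VALUES ('John')")

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")  # long-lived transaction, like the open shell session
print(reader.execute("SELECT name FROM testmodel").fetchone())  # ('John',)

writer.execute("UPDATE testmodel SET name = 'Bob'")  # committed outside

# Inside the still-open transaction we keep seeing the old snapshot:
print(reader.execute("SELECT name FROM testmodel").fetchone())  # ('John',)

reader.execute("COMMIT")  # end the transaction, like restarting the shell
print(reader.execute("SELECT name FROM testmodel").fetchone())  # ('Bob',)
```

Ending the transaction (or starting a new one) is what makes the change visible, not anything queryset-related.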
Django stated in their docs that all query sets are automatically cached, https://docs.djangoproject.com/en/dev/topics/db/queries/#caching-and-querysets. But they weren't super specific with the details of this functionality.
The example that they gave was to save the qs in a python variable, and subsequent calls after the first will be taken from the cache.
queryset = Entry.objects.all()
print([p.headline for p in queryset]) # Evaluate the query set.
print([p.pub_date for p in queryset]) # Re-use the cache from the evaluation.
So even if two exact queryset calls were made without a variable subsequently when a user loads a view, would the results not be cached?
# When the user loads the homepage, call number one (not cached)
def home(request):
    entries = Entry.objects.filter(something)
    return render_to_response(...)

# Call number two, is this cached automatically? Or do I need to import cache and
# manually do it? This is the same method as above, called twice
def home(request):
    entries = Entry.objects.filter(something)
    return render_to_response(...)
Sorry if this is confusing, I pasted the method twice to make it look like the user is calling it twice, its just one method. Are entries automatically cached?
Thanks
The queryset example you have given rightly shows that querysets are evaluated lazily, i.e. the first time they are used. When the variable the queryset was assigned to is used again later, the already-evaluated results are re-used rather than fetched again. This is not exactly caching, but re-use of an evaluated expression for as long as it is available.
For the kind of caching you are looking for, i.e. the same view called twice, you will need to manually cache the database results when they are fetched the first time; Memcached is good for this. Then check and fetch as in the example below.
from django.core.cache import cache

def view(request):
    results = cache.get(request.user.id)
    if results is None:
        results = do_a_ton_of_work()
        cache.set(request.user.id, results)
    return render_to_response(...)
There are of course many other ways to do caching at different levels, right from your proxy server down to per-URL caching. Whatever works best for you. Here is a good read on this topic.
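The check-then-fetch pattern itself is framework-independent. Here is a stdlib-only sketch with a plain dict standing in for Memcached (fetch_entries and home are illustrative names, not Django APIs), showing that the second call to the same view never reaches the database:

```python
db_hits = []
cache = {}  # stand-in for Memcached/Redis

def fetch_entries(key):
    db_hits.append(1)  # simulated SQL round-trip
    return ["entry1", "entry2"]

def home(key):
    results = cache.get(key)
    if results is None:          # miss: hit the database and remember
        results = fetch_entries(key)
        cache[key] = results
    return results

home("user-1")       # first call queries the "database"
home("user-1")       # second call is served from the cache
print(len(db_hits))  # -> 1
```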
It is not cached, for two reasons:
1. When you just call filter but don't "loop" through the results, the queryset is not yet evaluated, which means its cache is still empty.
2. Even if it were evaluated, the results would not be re-used, because the second call to the function recreates the queryset as a new local variable, even though you already created an identical one the first time you called the function. The second function call simply does not "know" what you did before; it is a new queryset instance. In this case you might still benefit from the database's own cache, though.
Memcached support is built into Django and works nicely, but you can also try johnny-cache for even better results.
You can get more info here:
http://packages.python.org/johnny-cache/