Many times, one needs to check if there is at least one element inside a QuerySet. Mostly, I use exists:
if queryset.exists():
    ...
However, I've seen colleagues using python's any function:
if any(queryset):
    ...
Is using Python's any function suboptimal?
My intuition tells me this is a dilemma similar to the one between count and len: any will iterate through the QuerySet and will therefore need to evaluate it. If we are going to use the QuerySet's items anyway, this causes no slowdown. However, if we only need to check whether any data satisfying the query exists, it may load data we do not need.
The most Pythonic way would be:
if queryset:
    # …
Indeed, a QuerySet has truthiness True if it contains at least one item, and False otherwise.
If you later want to enumerate over the queryset (with a for loop, for example), checking its truthiness will load the items into the queryset's cache, so for example:
if queryset:
    for item in queryset:
        # …
will make only one query to the database: one that fetches all items when the if queryset check runs; afterwards you can reuse that cache without making a second query.
If you do not consume the queryset later in the process, you can work with .exists() [Django-doc]: this will not load records into memory, but only make a query that checks whether at least one such record exists, which is less expensive in terms of bandwidth between the application and the database. If you do have to consume the queryset later, however, using .exists() is not a good idea, since then two queries are made.
Using any(queryset), however, is nonsensical: you can already check whether a queryset contains elements via its truthiness, so using any() will usually only make that check slightly less efficient.
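To make the trade-off concrete, here is a minimal sketch of the two patterns side by side (assuming a model named Entry with a hypothetical published field; notify_someone and process are placeholder helpers):

qs = Entry.objects.filter(published=True)

# Cheap existence check: issues a small EXISTS-style query, loads no rows.
if qs.exists():
    notify_someone()  # hypothetical helper

# Truthiness check: evaluates the queryset and caches the full result set.
if qs:
    for entry in qs:   # reuses the cache, no second query
        process(entry)  # hypothetical helper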
Suppose I have a QuerySet of all db entries:
all_db_entries = Entry.objects.all()
And then I want to get some specific objects from it by calling get(param=value) (or any other method).
The problem is that the documentation of QuerySet methods says: "These methods do not use a cache. Rather, they query the database each time they’re called."
But what I want to achieve is to load all the data once (like doing a SELECT *) and only afterwards do some searches on it. I don't want to hit the database every time I call get(), in order to avoid a heavy load on it.
You can use values to convert your resulting queryset into an ordinary Python list, which you can then use for searches and so on, e.g.:
list(MyModel.objects.values('pk', 'field'))
The list() call evaluates the queryset once; after that you can search the resulting list of dicts in plain Python without further queries.
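For instance, a small sketch (the name field and the search values are made up):

# Fetch everything once; after this line, no further queries are made.
rows = list(MyModel.objects.values('pk', 'name'))

# Build a lookup table so repeated "find by pk" searches stay in memory.
by_pk = {row['pk']: row for row in rows}

row = by_pk.get(42)                                # no database hit
matches = [r for r in rows if r['name'] == 'foo']  # plain Python filtering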
I need to make a function that will be launched in Celery and will take records from a model one at a time, check something, and write data to another model via a OneToOne relationship. There are a lot of entries, so using model_name.objects.all() is not appropriate (it would take a lot of memory and time). How do I do this correctly?
You can use an iterator over the queryset (https://docs.djangoproject.com/en/dev/ref/models/querysets/#iterator), so your records are fetched one by one instead of the whole result set being cached in memory:
model_iterator = your_model.objects.all().iterator()
for record in model_iterator:
    do_something(record)
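Applied to the task in the question, a hedged sketch might look like this (SourceModel, ResultModel, and some_check are placeholder names; ResultModel is assumed to have a OneToOneField named source pointing at SourceModel):

def process_all_records():
    # iterator() streams rows instead of caching the whole result set.
    for record in SourceModel.objects.all().iterator():
        if some_check(record):  # placeholder for the actual check
            # Write to the related model; get_or_create keeps the task idempotent.
            ResultModel.objects.get_or_create(source=record)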
Django official docs state that:
In some rare circumstances, it’s necessary to be able to force the save() method to perform an SQL INSERT and not fall back to doing an UPDATE. Or vice-versa: update, if possible, but not insert a new row. In these cases you can pass the force_insert=True or force_update=True parameters to the save() method. Obviously, passing both parameters is an error: you cannot both insert and update at the same time!
What are these rare circumstances?
Basically, when should one use force_update=True?
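The docs don't enumerate the circumstances, but the mechanics are simple. A minimal illustration of the API from the quoted passage (Entry and its fields are placeholders):

entry = Entry.objects.get(pk=1)
entry.headline = 'Updated headline'
# Forces an UPDATE; raises a DatabaseError if no row with this pk exists.
entry.save(force_update=True)

new_entry = Entry(pk=9999, headline='Fresh row')
# Forces an INSERT; fails instead of silently updating an existing pk 9999.
new_entry.save(force_insert=True)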
I am relatively new to Django and Python, but I have not been able to quite figure this one out.
I essentially want to query the database using filter for a large number of users. Then I want to make a bunch of queries on just this subset of users. So I thought it would be most efficient to first run the query with my broader filter parameters, and then make my separate filter queries on that set. In code, it looks like this:
# Get the big group of users, like all people with brown hair.
group_of_users = Data.objects.filter(......)

# Now get all the people with brown hair and blue eyes, then all with green eyes, etc.
for haircolor in haircolors:
    subset_of_group = group_of_users.filter(....)
That is just pseudo-code, by the way; I am not that inept. I thought this would be more efficient, but it seems that if I eliminate the first query and simply build the querysets in the for loop, it is much faster (I actually timed it).
I fear this is because when I filter first, and then filter again each time in the for loop, both sets of filter conditions are actually being run against the database on every loop iteration, so it's doing twice the work I want. I thought that with caching this would not matter, as the first filter's results would be cached and it would still be faster, but again, I timed it with multiple tests and the single filter is faster. Any ideas?
EDIT:
So it seems that querying for a set of data, and then trying to further query only against that set of data, is not possible. Rather, I should query for a set of data and then further parse that data using regular Python.
As garnertb and lanzz said, it doesn't matter where you call the filter function; the only thing that matters is when you evaluate the query (see when querysets are evaluated). My guess is that in your tests you evaluate the queryset somewhere in your code, and that you do more evaluations in the test with separate filter calls.
Whenever a queryset is evaluated, its results are cached. However, this cache does not carry over when you call another method, such as filter or order_by, on the queryset. So you can't evaluate the bigger set and then use filtering on the queryset to retrieve the smaller sets without doing another query.
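A sketch of that behaviour (Data and the field names are placeholders):

big = Data.objects.filter(hair='brown')
list(big)                        # query 1: evaluates and caches the full set

blue = big.filter(eyes='blue')   # a NEW queryset: the cache does not carry over
list(blue)                       # query 2: hits the database again

list(big)                        # no query: reuses the cache on the original queryset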
If you only have a small set of haircolours, you can get away with doing a query for each haircolour. However, if you have many of them, the number of queries will have a severe impact on performance. In that case it might be better to do a single query for the full set of users you want to use, and then do the subsequent processing in Python:
qs = Data.objects.filter(hair='brown')
objects = {}
for obj in qs:
    objects.setdefault(obj.haircolour, []).append(obj)
for k, v in objects.items():
    print("Objects for colour '%s':" % k)
    for obj in v:
        print("- %s" % obj)
Filtering Django querysets does not perform any database operation until you actually try to access the results. Filtering only adds conditions to the queryset, which are then used to build the final query when you access its results.
When you assign group_of_users = Data.objects.filter(...), no data is retrieved from the database; you just get a queryset that knows you want records satisfying a specific condition (the filtering parameters you supplied to Data.objects.filter), but it does not pre-fetch those actual users. After that, when you assign subset_of_group = group_of_users.filter(....), you don't filter that previous group of users; you only add more conditions to the queryset. Still no data has been retrieved from the database at this point.

Only when you actually try to access the results of the queryset (e.g. by iterating over it, slicing it, or accessing a single index in it) will the queryset build a single query (usually) that retrieves only the user records satisfying all the filtering conditions you have accumulated up to that point. It will still need to filter your entire users table to find those matching users; it cannot take advantage of the "previously retrieved" users from the group_of_users = Data.objects.filter(...) queryset, because nothing was actually retrieved at that point.
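You can see this laziness for yourself (Data and the field names are again placeholders):

group_of_users = Data.objects.filter(hair='brown')
subset_of_group = group_of_users.filter(eyes='blue')  # still no database access

print(subset_of_group.query)  # inspect the single combined SQL statement
first = subset_of_group[0]    # evaluation happens here, in one query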
Your approach is exactly right, and it is efficient. Querysets don't touch the database until they are evaluated, so you can add as many filters as you like and the database won't be touched. Django's excellent documentation provides all the information you need to figure out which operations cause a queryset to be evaluated.