I need to get multiple random objects from a Django model.
I know I can get one random object from the model Person by typing:
person = Person.objects.order_by('?')[0]
Then I saw suggestions in "How to get two random records with Django" saying I could simply do this:
people = Person.objects.order_by('?')[0:n]
However, as soon as I add that [0:n], instead of returning the objects, Django returns a QuerySet object. This has the unfortunate consequence that if I then ask for
print(people[0].first_name, people[0].last_name)
I get the first_name and last_name of two different people, since QuerySets are evaluated each time they are accessed (right?). How do I get the actual list of people that was returned from the first query?
I am using Python 3.4.0 and Django 1.7.1
Simeon Popov's answer solves the problem, but let me explain where it comes from.
As you probably know, querysets are lazy and won't be evaluated until it's necessary. They also have an internal cache that gets filled once the entire queryset is evaluated. If only a single object is taken from a queryset (or a slice with a step specified, e.g. [0:n:2]), Django evaluates it, but the results won't get cached.
Take these two examples:
Example 1
>>> people = Person.objects.order_by('?')[0:n]
>>> print(people[0].first_name, people[0].last_name)
# first and last name of different people
Example 2
>>> people = Person.objects.order_by('?')[0:n]
>>> for person in people:
...     print(person.first_name, person.last_name)
# first and last name are properly matched
In example 1, the queryset is not yet evaluated when you access the first item. It won't get cached, so when you access the first item again it runs another query on the database.
In the second example, the entire queryset is evaluated when you loop over it. Thus, the cache is filled and there won't be any additional database queries that would change the order of the returned items. In that case the names are properly aligned to each other.
Methods that evaluate an entire queryset include iteration, list(), bool() and len(). There are some subtle differences between these methods. If all you want to do is make sure the queryset is cached, I'd suggest using bool(), i.e.:
>>> people = Person.objects.order_by('?')[0:n]
>>> bool(people)
True
>>> print(people[0].first_name, people[0].last_name)
# matching names
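If you would rather end up with a plain list than a cached queryset, forcing evaluation with list() works as well; a minimal sketch, assuming the same Person model and n as above:

# Evaluate once; people is now an ordinary list of Person instances,
# so repeated indexing never hits the database again.
people = list(Person.objects.order_by('?')[0:n])
print(people[0].first_name, people[0].last_name)  # both belong to the same person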
Try this ...
people = []
for person in Person.objects.order_by('?')[0:n]:
    people.append(person)
I have Order objects and OrderOperation objects that represent an action on an Order (creation, modification, cancellation).
Conceptually, an order has 1 to many order operations. Each time there is an operation on the order, the total is computed in that operation, which means that when I need an attribute of an order, I just get it from the last order operation instead, using a Subquery.
The simplified code
from django.db import models
from django.db.models import OuterRef, Subquery

class OrderOperation(models.Model):
    order = models.ForeignKey('Order', on_delete=models.CASCADE)
    total = models.DecimalField(max_digits=9, decimal_places=2)

class Order(models.Model):
    # ...

class OrderQuerySet(models.QuerySet):
    @staticmethod
    def _last_oo(field):
        return Subquery(OrderOperation.objects
                        .filter(order_id=OuterRef("pk"))
                        .order_by('-id')
                        .values(field)
                        [:1])

    def annotated_total(self):
        return self.annotate(oo_total=self._last_oo('total'))
This way, I can run my_order_total = Order.objects.annotated_total()[0].oo_total. It works great.
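For Order.objects.annotated_total() to resolve on the simplified listing above, the custom queryset also has to be attached as the model's manager; a minimal sketch of that wiring, assuming the class names from the question:

class Order(models.Model):
    # ... fields ...
    objects = OrderQuerySet.as_manager()  # exposes annotated_total() on Order.objects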
The issue
Computing the total is easy, as it's a simple value. However, when there is an M2M or one-to-many field, this method does not work. For example, using the example above, let's add this field:
class OrderOperation(models.Model):
    order = models.ForeignKey('Order', on_delete=models.CASCADE)
    total = models.DecimalField(max_digits=9, decimal_places=2)
    ordered_articles = models.ManyToManyField(Article, through='orders.OrderedArticle')
Writing something like the following does NOT work as it returns only 1 foreign key (not a list of all the FKs):
def annotated_ordered_articles(self):
    return self.annotate(oo_ordered_articles=self._last_oo('ordered_articles'))
The purpose
The whole purpose is to allow a user to search among all orders, providing a list of articles as input. For example: "Please find all orders containing at least article 42 or article 43", or "Please find all orders containing exactly articles 42 and 43", etc.
If I could get something like:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
<ArticleQuerySet [<Article: Article42>, <Article: Article43>]>
or even:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
[42,43]
That would solve my issue.
My current idea
Maybe something like ArrayAgg (I'm using PostgreSQL) could do the trick, but I'm not sure I understand how to use it in my case.
Maybe this has to do with the values() method, which does not seem to be intended to handle M2M and one-to-many relations, as stated in the docs:
values() and values_list() are both intended as optimizations for a
specific use case: retrieving a subset of data without the overhead of
creating a model instance. This metaphor falls apart when dealing with
many-to-many and other multivalued relations (such as the one-to-many
relation of a reverse foreign key) because the “one row, one object”
assumption doesn’t hold.
ArrayAgg will be great if you want to fetch only one value (e.g. name) from all articles. If you need more, there is a better option for that:
prefetch_related
Instead, you can prefetch, for each Order, the latest OrderOperation as a whole object. That way you can easily get any field from the OrderOperation without extra magic.
The only caveat is that you will always get a list: one with a single operation, or an empty list when there are no operations for the selected order.
To do that, use the prefetch_related() queryset method together with a Prefetch object and a custom queryset for OrderOperation. Example:
from django.db.models import Max, F, Prefetch
last_order_operation_qs = OrderOperation.objects.annotate(
    lop_pk=Max('order__orderoperation__pk')
).filter(pk=F('lop_pk'))

orders = Order.objects.prefetch_related(
    Prefetch('orderoperation_set', queryset=last_order_operation_qs, to_attr='last_operation')
)
Then you can just use order.last_operation[0].ordered_articles to get all ordered articles for a particular order. You can add prefetch_related('ordered_articles') to the first queryset for better performance and fewer database queries, as shown in the sketch below.
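A sketch of how the two prefetches can be combined, assuming the reverse accessor names from the simplified models above:

from django.db.models import Max, F, Prefetch

# Also prefetch the articles of each operation, so accessing them later
# does not trigger extra queries.
last_order_operation_qs = OrderOperation.objects.annotate(
    lop_pk=Max('order__orderoperation__pk')
).filter(pk=F('lop_pk')).prefetch_related('ordered_articles')

orders = Order.objects.prefetch_related(
    Prefetch('orderoperation_set', queryset=last_order_operation_qs, to_attr='last_operation')
)

for order in orders:
    if order.last_operation:  # empty list when the order has no operations
        articles = order.last_operation[0].ordered_articles.all()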
To my surprise, your idea with ArrayAgg is right on the money. I didn't know there was a way to annotate with an array (and I believe there still isn't for backends other than Postgres).
from django.contrib.postgres.aggregates.general import ArrayAgg
qs = Order.objects.annotate(oo_articles=ArrayAgg(
    'order_operation__ordered_articles__id',
    distinct=True))
You can then filter the resulting queryset using the ArrayField lookups:
# Articles that contain the specified array
qs.filter(oo_articles__contains=[42,43])
# Articles that are identical to the specified array
qs.filter(oo_articles=[42,43,44])
# Articles that are contained in the specified array
qs.filter(oo_articles__contained_by=[41,42,43,44,45])
# Articles that have at least one element in common
# with the specified array
qs.filter(oo_articles__overlap=[41,42])
distinct=True is needed only if the operation may contain duplicate articles.
You may need to tweak the exact name of the field passed to the ArrayAgg function. For subsequent filtering to work, you may also need to cast the id fields in the ArrayAgg to int, as otherwise Django casts the id array to ::serial[] and my Postgres complained that type "serial[]" does not exist:
from django.db.models import IntegerField
from django.contrib.postgres.fields.array import ArrayField
from django.db.models.functions import Cast
ArrayAgg(Cast('order_operation__ordered_articles__id', IntegerField()))
# OR
Cast(ArrayAgg('order_operation__ordered_articles__id'), ArrayField(IntegerField()))
Looking at your posted code more closely, you'll also have to filter on the one OrderOperation you are interested in; the query above looks at all operations for the relevant order.
This question relates to Python and the Django framework, and probably to experienced Django developers. I googled for some time and also looked into the Django QuerySet source itself, but found no answer. Is it possible to know whether a queryset has been filtered and, if so, to get the key/value pairs of the filter parameters?
I'm developing a web system with a huge set of filters, and I must trigger some predefined background behavior for the user when certain filters have been applied.
Yes, but since to the best of my knowledge this is not documented, you probably should not use it. Furthermore it looks to me like bad design if you need to obtain this from a QuerySet.
For a QuerySet, for example qs, you can access its .query attribute and then its .where attribute. That attribute is a WhereNode, a node in the syntax tree of the query; its truthiness tells you whether the node has children (individual WHERE conditions, or groups of such conditions), and hence whether some filtering has been done.
So for example:
qs = Model.objects.all()
bool(qs.query.where) # --> False
qs = Model.objects.filter(foo='bar')
bool(qs.query.where) # --> True
If you inspect the WhereNode, you can see the elements out of which it is composed, for example:
>>> qs.query.where
<WhereNode: (AND: <django.db.models.lookups.Exact object at 0x7f2c55615160>)>
and by looking at the children, we can even obtain details:
>>> c1 = qs.query.where.children[0]
>>> c1.lhs
Col(app_model, app.Model.foo)
>>> c1.lookup_name
'exact'
>>> c1.rhs
'bar'
But the notation is rather cryptic. Furthermore, the WhereNode is not necessarily a conjunctive one (the AND); it can also be a disjunctive one (the OR), and it is not guaranteed that any filtering will actually be done (the tests can be trivially true, like 1 > 0). We thus only check whether there will be a non-empty WHERE in the SQL query, not whether that query will restrict the queryset in any way (although you can of course inspect the WhereNode and check whether that holds).
Note that some constraints are not part of the WHERE clause: for example, a JOIN has an ON condition, but that is not a WHERE clause.
Since the above is, to the best of my knowledge, not extensively documented, it is probably not a good idea to depend on it: it can easily change and then no longer work.
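If you decide to rely on it anyway, it is worth isolating the access to the internals in one small helper; a sketch:

def is_filtered(qs):
    # Relies on undocumented internals: a truthy WhereNode means the generated
    # SQL will contain a WHERE clause. This may break in future Django versions.
    return bool(qs.query.where)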
You can use the query attribute (i.e. queryset.query) to get the data used in the SQL query (the output isn't exactly valid SQL).
You can also use queryset.query.__dict__ to get that data in a dictionary format.
I agree with Willem Van Onsen, in that accessing the internals of the query object isn't guaranteed to work in the future. It's correct for now, but might change.
But going half-way down that path, you could use the following:
is_filtered_query = bool(' WHERE ' in str(queryset.query))
which will pretty much do the job!
from django.db import connection, reset_queries
This prints an empty list:
reset_queries()
p = XModel.objects.filter(id=id) \
.values('name') \
.annotate(quantity=Count('p_id'))\
.order_by('-quantity') \
.distinct()[:int(count)]
print(connection.queries)
While this prints the executed query:
reset_queries()
tc = ZModel.objects\
.filter(id=id, stock__gt=0) \
.aggregate(Sum('price'))
print(connection.queries)
I have changed field names to keep things simple. (The fields belong to parent tables, i.e. the __ lookups span multiple levels.)
I was trying to print the MySQL queries that Django makes and came across connection.queries. I was wondering why it prints an empty list for the first snippet while the second works fine. I am getting the result I expect, so the query is probably executed. Also, I am executing only one query at a time.
As the accepted answer says you must consume the queryset first since it's lazy (e.g. list(qs)).
Another reason can be that you must be in DEBUG mode (see FAQ):
connection.queries is only available if Django DEBUG setting is True.
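Putting both points together, a sketch of how to make the first snippet show its SQL (assuming DEBUG=True and the XModel query from the question):

from django.db import connection, reset_queries

reset_queries()
qs = XModel.objects.filter(id=id).values('name')
list(qs)                   # consume the queryset so the SELECT actually runs
print(connection.queries)  # now contains the executed query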
Because QuerySets in Django are lazy: as long as you do not consume the result, the QuerySet is not evaluated. No querying is done until you want to obtain non-QuerySet objects such as lists, dictionaries, Model objects, etc.
We cannot do this for all ORM calls, however: for example Model.objects.get(..) returns a Model object, so we cannot postpone that fetch (of course we could wrap it in a function and call it later, but then the "type" is a function, not a Model instance).
The same holds for .aggregate(..), since its result is a dictionary that maps the keys to the corresponding results of the aggregation.
But your first query does not need to be evaluated. By slicing, you have only added a LIMIT clause at the end of the query; there is no need to evaluate it immediately, so the result is still a QuerySet.
If you would however call list(qs) on a QuerySet (qs), then this means the QuerySet has to be evaluated, and Django will make the query.
The laziness of QuerySets also makes these chainings possible. Imagine that you write:
Model.objects.filter(foo=42).filter(bar=1425)
If the QuerySet of Model.objects.filter(foo=42) were evaluated immediately, this could fetch a huge number of Model instances. By postponing it, we now filter on bar=1425 as well (we have constructed a new QuerySet that takes both .filter(..)s into account). This can result in a query that is evaluated more efficiently and, for example, in less data that has to be transferred from the database to the Django server.
The documentation says QuerySets are lazy as shown below:
QuerySets are lazy – the act of creating a QuerySet doesn’t involve
any database activity. You can stack filters together all day long,
and Django won’t actually run the query until the QuerySet is
evaluated. Take a look at this example:
>>> q = Entry.objects.filter(headline__startswith="What")
>>> q = q.filter(pub_date__lte=datetime.date.today())
>>> q = q.exclude(body_text__icontains="food")
>>> print(q)
Though this looks like three database hits, in fact it hits the
database only once, at the last line (print(q)). In general, the
results of a QuerySet aren’t fetched from the database until you “ask”
for them. When you do, the QuerySet is evaluated by accessing the
database. For more details on exactly when evaluation takes place, see
When QuerySets are evaluated.
I have a piece of code which fetches a QuerySet from the DB and then appends a new calculated field to every object in the QuerySet. Adding this field via annotation is not an option (because it's legacy and because the calculation is based on other already pre-fetched data).
Like this:
from django.db import models

class Human(models.Model):
    name = models.CharField()
    surname = models.CharField()

def calculate_new_field(s):
    return len(s.name) * 42

people = Human.objects.filter(id__in=[1, 2, 3, 4, 5])
for s in people:
    s.new_column = calculate_new_field(s)
# people.somehow_reorder(new_order_by=new_column)
So now all people in the QuerySet have a new column. And I want to order these objects by the new_column field. order_by() will obviously not work, since it is a database operation. I understand that I can pass them around as a sorted list, but there is a lot of templates and other logic which expect a QuerySet-like interface from this object, with its methods and so on.
So the question is: is there some not-too-dirty way to reorder an existing QuerySet by a dynamically added field, or to create a new QuerySet-like object with this data? I believe I'm not the only one who has faced this problem and that it has already been solved within Django. But I can't find anything (except adding third-party libs, which is not an option either).
Conceptually, the QuerySet is not a list of results, but the "instructions to get those results". It's lazily evaluated and also cached. The internal attribute of the QuerySet that keeps the cached results is qs._result_cache
So, the for s in people statement forces the evaluation of the query and caches the results.
You could, after that, sort the results by doing:
from operator import attrgetter
people._result_cache.sort(key=attrgetter('new_column'))
But, after evaluating a QuerySet, it makes little sense (in my opinion) to keep the QuerySet interface, as many of its operations would cause a re-evaluation of the query. From this point on you should be dealing with a list of model instances.
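In other words, a sketch of the plain-list approach, using the models from the question:

from operator import attrgetter

# Evaluate once, then treat the result as an ordinary Python list.
people = list(Human.objects.filter(id__in=[1, 2, 3, 4, 5]))
for s in people:
    s.new_column = calculate_new_field(s)
people.sort(key=attrgetter('new_column'))  # in-place sort by the computed field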
You can try functions.Length:
from django.db.models.functions import Length
qs = Human.objects.filter(id__in=[1, 2, 3, 4, 5])
qs = qs.annotate(reorder=Length('name') * 42).order_by('reorder')
I commonly find myself writing the same criteria in my Django application(s) more than once. I'll usually encapsulate it in a function that returns a Django Q() object, so that I can maintain the criteria in just one place.
I will do something like this in my code:
def CurrentAgentAgreementCriteria(useraccountid):
    '''Returns Q that finds agent agreements that gives the useraccountid account current delegated permissions.'''
    AgentAccountMatch = Q(agent__account__id=useraccountid)
    StartBeforeNow = Q(start__lte=timezone.now())
    EndAfterNow = Q(end__gte=timezone.now())
    NoEnd = Q(end=None)
    # Now put the criteria together
    AgentAgreementCriteria = AgentAccountMatch & StartBeforeNow & (NoEnd | EndAfterNow)
    return AgentAgreementCriteria
This makes it so that I don't have to think through the DB model more than once, and I can combine the return values from these functions to build more complex criterion. That works well so far, and has saved me time already when the DB model changes.
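For example, such a helper composes naturally with additional criteria at the call site; a sketch (the AgentAgreement model name and the extra condition are only illustrative, not from the code above):

# Illustrative only: combine the reusable criterion with an extra ad-hoc Q().
current = CurrentAgentAgreementCriteria(useraccountid)
agreements = AgentAgreement.objects.filter(current & Q(agent__account__isnull=False))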
Something I have realized as I start to combine the criteria from these functions is that a Q() object is inherently tied to the type of object that .filter() is being called on. That is what I would expect.
I occasionally find myself wanting to use a Q() object from one of my functions to construct another Q object that is designed to filter a different, but related, model's instances.
Let's use a simple/contrived example to show what I mean. (It's simple enough that normally this would not be worth the overhead, but remember that I'm using a simple example here to illustrate what is more complicated in my app.)
Say I have a function that returns a Q() object that finds all Django users whose username starts with an 'a':
def UsernameStartsWithAaccount():
    return Q(username__startswith='a')
Say that I have a related model that is a user profile with settings including whether they want emails from us:
class UserProfile(models.Model):
    account = models.OneToOneField(User, unique=True, related_name='azendalesappprofile')
    emailMe = models.BooleanField(default=False)
Say I want to find all UserProfiles which have a username starting with 'a' AND want us to send them an email newsletter. I can easily write a Q() object for the latter:
wantsEmails = Q(emailMe=True)
but I find myself wanting to do something like this for the former:
startsWithA = Q(account=UsernameStartsWithAaccount())
# And then
UserProfile.objects.filter(startsWithA & wantsEmails)
Unfortunately, that doesn't work (it generates invalid PSQL syntax when I tried it).
To put it another way, I'm looking for a syntax along the lines of Q(account=Q(id=9)) that would return the same results as Q(account__id=9).
So, a few questions arise from this:
Is there a syntax with Django Q() objects that allows you to add "context" to them to allow them to cross relational boundaries from the model you are running .filter() on?
If not, is this logically possible? (Since I can write Q(account__id=9) when I want to do something like Q(account=Q(id=9)), it seems like it should be.)
Maybe someone will suggest something better, but I ended up passing the context manually to such functions. I don't think there is an easy solution, as you might need to traverse a whole chain of related tables to get to your field, like table1__table2__table3__profile__user__username; how would you guess that? The User table could be linked to table2 too, but you don't need that in this case, so I think you can't avoid setting the path manually.
Also, you can unpack a dictionary into Q() and a list or a dictionary into filter(), which is much easier to work with than spelling out keyword parameters and applying &.
def UsernameStartsWithAaccount(context=''):
    field = 'username__startswith'
    if context:
        field = context + '__' + field
    return Q(**{field: 'a'})
Then if you simply need to AND your conditions you can combine them into a list and pass to filter:
UserProfile.objects.filter(*[startsWithA, wantsEmails])
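A usage sketch tying it together, assuming the UserProfile model from the question (the 'account' prefix matches its OneToOneField to User):

# Build the username condition relative to UserProfile via its 'account' relation.
startsWithA = UsernameStartsWithAaccount('account')  # -> Q(account__username__startswith='a')
wantsEmails = Q(emailMe=True)
UserProfile.objects.filter(*[startsWithA, wantsEmails])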