I'm having a attribute on my model to allow the user to order the objects. I have to update the element's order depending on a list, that contains the object's ids in the new order; right now I'm iterating over the whole queryset and set one objects after the other. What would be the easiest/fastest way to do the same with the whole queryset?
def update_ordering(model, order):
""" order is in the form [id,id,id,id] for example: [8,4,5,1,3] """
id_to_order = dict((order[i], i) for i in range(len(order)))
for x in model.objects.all():
x.order = id_to_order[x.id]
x.save()
This cannot be done in one single queryset operation. As far as I know it can't even be done in one query with raw SQL. So you will always need the update call for each object that has to be updated. So both your and Collin Anderson's solutions seem quite optimal for your description.
However, what's your use case? Is really the whole list going to change every time? In most situations this seems very unlikely. I can see some different approaches.
You save the order field like you say, but you generate a diff for the order list:def
update_ordering(model, order):
""" order is in the form [id,id,id,id] for example: [8,4,5,1,3] """
original_order = model.objects.value_list('id', flat=True).order_by('order')
order = filter( lambda x: x[1]!=x[2], zip(xrange(len(order)), order, original_order))
for i in order:
model.objects.filter(id=i[1]).update(order=i[0])
Another approach, depending on what you're making is to do a partial update (e.g. using AJAX) if possible, instead of updating the whole re-ordered set, just update every update separately. This will often increase the total load, but will spread it more over time. Take for example moving the 5th element step-by-step to place 2, this will introduce 3 swaps: (5,4); (4,3); (3,2). Resulting in 6 updates, while with the all-in-one-time approach only 4 will be needed. But the small operations will be spread over time.
I don't know if it is possible to do this using one query, but this is more efficient in any case:
def update_ordering(model, order):
""" order is in the form [id,id,id,id] for example: [8,4,5,1,3] """
for id in model.objects.values_list('id', flat=True):
model.objects.filter(id=id).update(order=order.index(id))
Related
I have Order objects and OrderOperation objects that represent an action on a Order (creation, modification, cancellation).
Conceptually, an order has 1 to many order operations. Each time there is an operation on the order, the total is computed in this operation. Which means when I need to find an attribute of an order, I just get the last order operation attribute instead, using a Subquery.
The simplified code
class OrderOperation(models.Model):
order = models.ForeignKey(Order)
total = DecimalField(max_digits=9, decimal_places=2)
class Order(models.Model)
# ...
class OrderQuerySet(query.Queryset):
#staticmethod
def _last_oo(field):
return Subquery(OrderOperation.objects
.filter(order_id=OuterRef("pk"))
.order_by('-id')
.values(field)
[:1])
def annotated_total(self):
return self.annotate(oo_total=self._last_oo('total'))
This way, I can run my_order_total = Order.objects.annotated_total()[0].oo_total. It works great.
The issue
Computing total is easy as it's a simple value. However, when there is a M2M or OneToMany field, this method does not work. For example, using the example above, let's add this field:
class OrderOperation(models.Model):
order = models.ForeignKey(Order)
total = DecimalField(max_digits=9, decimal_places=2)
ordered_articles = models.ManyToManyField(Article,through='orders.OrderedArticle')
Writing something like the following does NOT work as it returns only 1 foreign key (not a list of all the FKs):
def annotated_ordered_articles(self):
return self.annotate(oo_ordered_articles=self._last_oo('ordered_articles'))
The purpose
The whole purpose is to allow a user to search among all orders, providing a list or articles in input. For example: "Please find all orders containing at least article 42 or article 43", or "Please find all orders containing exactly article 42 and 43", etc.
If I could get something like:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
<ArticleQuerySet [<Article: Article42>, <Article: Article43>]>
or even:
>>> Order.objects.annotated_ordered_articles()[0].oo_ordered_articles
[42,43]
That would solve my issue.
My current idea
Maybe something like ArrayAgg (I'm using pgSQL) could do the trick, but I'm not sure to understand how to use it in my case.
Maybe this has to do with values() method that seems to not be intended to handle M2M and 1TM relations as stated in the doc:
values() and values_list() are both intended as optimizations for a
specific use case: retrieving a subset of data without the overhead of
creating a model instance. This metaphor falls apart when dealing with
many-to-many and other multivalued relations (such as the one-to-many
relation of a reverse foreign key) because the “one row, one object”
assumption doesn’t hold.
ArrayAgg will be great if you want to fetch only one variable (ie. name) from all articles. If you need more, there is a better option for that:
prefetch_related
Instead, you can prefetch for each Order, latest OrderOperation as a whole object. This adds the ability to easily get any field from OrderOperation without extra magic.
The only caveat with that is that you will always get a list with one operation or an empty list when there are no operations for selected order.
To do that, you should use prefetch_related queryset model together with Prefetch object and custom query for OrderOperation. Example:
from django.db.models import Max, F, Prefetch
last_order_operation_qs = OrderOperation.objects.annotate(
lop_pk=Max('order__orderoperation__pk')
).filter(pk=F('lop_pk'))
orders = Order.objects.prefetch_related(
Prefetch('orderoperation_set', queryset=last_order_operation_qs, to_attr='last_operation')
)
Then you can just use order.last_operation[0].ordered_articles to get all ordered articles for particular order. You can add prefetch_related('ordered_articles') to first queryset to have improved performance and less queries on database.
To my surprise, your idea with ArrayAgg is right on the money. I didn't know there was a way to annotate with an array (and I believe there still isn't for backends other than Postgres).
from django.contrib.postgres.aggregates.general import ArrayAgg
qs = Order.objects.annotate(oo_articles=ArrayAgg(
'order_operation__ordered_articles__id',
'DISTINCT'))
You can then filter the resulting queryset using the ArrayField lookups:
# Articles that contain the specified array
qs.filter(oo_articles__contains=[42,43])
# Articles that are identical to the specified array
qs.filter(oo_articles=[42,43,44])
# Articles that are contained in the specified array
qs.filter(oo_articles__contained_by=[41,42,43,44,45])
# Articles that have at least one element in common
# with the specified array
qs.filter(oo_articles__overlap=[41,42])
'DISTINCT' is needed only if the operation may contain duplicate articles.
You may need to tweak the exact name of the field passed to the ArrayAgg function. For subsequent filtering to work, you may also need to cast id fields in the ArrayAgg to int as otherwise Django casts the id array to ::serial[], and my Postgres complained about type "serial[]" does not exist:
from django.db.models import IntegerField
from django.contrib.postgres.fields.array import ArrayField
from django.db.models.functions import Cast
ArrayAgg(Cast('order_operation__ordered_articles__id', IntegerField()))
# OR
Cast(ArrayAgg('order_operation__ordered_articles__id'), ArrayField(IntegerField()))
Looking at your posted code more closely, you'll also have to filter on the one OrderOperation you are interested in; the query above looks at all operations for the relevant order.
Model C has a foreign key link to model B. Model B has a foreign key link to model A. This is, model A instance may have many model B instances, and Model B may have many model C instances.
Pseudo-code:
class A:
...
class B:
a = models.ForeignKey(A, ...)
class C:
b = models.ForeignKey(B, ...)
I would like to retrieve the entire list of elements of model A from the database, annotating it with the count of B elements it has, where their respective count of C elements is not none (or any random number, for that matter).
I tried this:
A.objects.annotate(
at_least_one_count=Count('b', filter=Q(b__c__isnull=False))
)
To the best of my knowledge, this should be working. However, numbers returned are not correct (in a practical case). It returns a higher number than without the filter, which is obviously not possible:
A.objects.annotate(
at_least_one_count=Count('b')
)
Edit: when I run the following query individually for every A instance, I get the same numbers, which makes me think there might be something wrong in my code:
A.objects.first().b_set.filter(c__isnull=False).__len__()
Note: I would like to perform this query without SQL. If I have to utilise some more advance Pythonic tools that Django provides, I am happy to do it, as long as I stay Object-Oriented. I am trying to move away from using raw SQL for all database operations, and re-write them all with Django ORM. However, it seems to be overly complicated.
The answer is simple: after applying queries that translate to join statements, one must execute a distinct on filter, which in Django is done by calling .distinct(...) on the query set.
In this case, if you are using a filter, and the distinct restriction is on the filter object, then you want to use:
...=Count('b', filter=Q(b__c__isnull=False), distinct=True)
Is there a way to speed this up?
section_list_active = Section.objects.all().filter(course=this_course,Active=True,filter(teacher__in=[this_teacher]).order_by('pk')
if section_list_active.count() > 0:
active_section_id = section_list_active.first().pk
else:
section_list = Section.objects.all().filter(course=this_course).filter(teacher__in=[this_teacher]).order_by('pk')
active_section_id = section_list.first().pk
I know that the querysets aren't evaluated until they're used (the .first().pk bit?), so is there then no faster way to do this? If they weren't lazy, I could hit the DB for the whole list of sections, then just filter that set for ones with the property Active, but I'm not seeing how I can do this without hitting it twice (assuming that I have to go to the else).
Given you are only intersted in the Section (or the corresponding pk), Yes. You can order by theActive field first:
active_section_id = Section.objects.filter(
course=this_course,
teacher=this_teacher
).order_by('-Active', 'pk').first().pk
or even a bit more elegant:
active_section_id = Section.objects.filter(
course=this_course,
teacher=this_teacher
).earliest('-Active', 'pk').pk
We thus will first order Active in descending order such that if there is a row with Active=True, then this one will be sorted at the top. Next we thus take the .first() object, and optionally take the pk.
In case there is no Section that satsifies the filter conditions, then .first() or .earliest() will return None, and this will thus result in an AttributeError when fetching the pk attribute.
Note: typically fields are written in lowercase, so active instead of Active.
I need to get multiple random objects from a Django model.
I know I can get one random object from the model Person by typing:
person = Person.objects.order_by('?')[0]
Then, I saw suggestions in How to get two random records with Django saying I could simply do this by:
people = Person.objects.order_by('?')[0:n]
However, as soon as I add that [0:n], instead of returning the objects, Django returns a QuerySet object. This results in the unfortunate consequences that if I then ask for
print(people[0].first_name, people[0].last_name)
I get the first_name and last_name for 2 different people as QuerySets are evaluated as they are called (right?). How do I get the actual list of people that were returned from the first query?
I am using Python 3.4.0 and Django 1.7.1
Simeon Popov's answer solves the problem, but let me explain where it comes from.
As you probably know querysets are lazy and won't be evaluated until it's necessary. They also have an internal cache that gets filled once the entire queryset is evaluated. If only a single object is taken from a queryset (or a slice with a step specified, i.e. [0:n:2]), Django evaluates it, but the results won't get cached.
Take these two examples:
Example 1
>>> people = Person.objects.order_by('?')[0:n]
>>> print(people[0].first_name, people[0].last_name)
# first and last name of different people
Example 2
>>> people = Person.objects.order_by('?')[0:n]
>>> for person in people:
>>> print(person.first_name, person.last_name)
# first and last name are properly matched
In example 1, the queryset is not yet evaluated when you access the first item. It won't get cached, so when you access the first item again it runs another query on the database.
In the second example, the entire queryset is evaluated when you loop over it. Thus, the cache is filled and there won't be any additional database queries that would change the order of the returned items. In that case the names are properly aligned to each other.
Methods for evaluating an entire queryset are a.o. iteration, list(), bool() and len(). There are some subtle differences between these methods. If all you want to do is make sure the queryset is cached, I'd suggest using bool(), i.e.:
>>> people = Person.objects.order_by('?')[0:n]
>>> bool(people)
True
>>> print(people[0].first_name, people[0].last_name)
# matching names
Try this ...
people = []
for person in Person.objects.order_by('?')[0:n]:
people.append(person)
I am using a piece of code in two separate places in order to dynamically generate some form fields. In both cases, dynamic_fields is a dictionary where the keys are objects and the values are lists of objects (in the event of an empty list, the value is False instead):
class ExampleForm(forms.ModelForm):
def __init__(self, *args, **kwargs):
dynamic_fields = kwargs.pop('dynamic_fields')
super(ExampleForm, self).__init__(*args, **kwargs)
for key in dynamic_fields:
if dynamic_fields[key]:
self.fields[key.description] = forms.ModelMultipleChoiceField(widget=forms.CheckboxSelectMultiple, queryset=dynamic_fields[key], required=False)
class Meta:
model = Foo
fields = ()
In one view, for any key the value is a list of objects returned with a single DB query - a single, normal queryset. This view works just fine.
In the other view, it takes multiple queries to get everything I need to construct a given value. I am first instantiating the dictionary with the values set equal to blank lists, then adding the querysets I get from these multiple queries to the appropriate lists one at a time with basic list comprehension (dict[key] += queryset). This makes each value a 2-D list, which I then flatten (and remove duplicates) by doing:
for key in dict:
dict[key] = list(set(dict[key]))
I have tried this several different ways - directly appending the queries in each queryset to the values/lists, leaving it as a list of lists, using append instead of += - but I get the same error every time: 'list' object has no attribute 'none'.
Looking through the traceback, the error is coming up in the form's clean method. This is the relevant section from the code in django.forms.models:
def clean(self, value):
if self.required and not value:
raise ValidationError(self.error_messages['required'], code='required')
elif not self.required and not value:
return self.queryset.none() # the offending line
My thought process so far: in my first view, I'm generating the list that serves as the value for each key via a single query, but I'm combining multiple queries into a list in my second view. That list doesn't have a none method like I would normally have with a single queryset.
How do I combine multiple querysets without losing access to this method?
I found this post, but I'm still running into the same issue using itertools.chain as suggested there. The only thing I've been able to accomplish with that is changing the error to say 'chain' or 'set' object has no attribute 'none'.
Edit: here's some additional information about how the querysets are generated. I have the following models (only relevant fields are shown):
class Profile(models.Model):
user = models.OneToOneField(User)
preferred_genres = models.ManyToManyField(Genre, blank=True)
class Genre(models.Model):
description = models.CharField(max_length=200, unique=True)
parent = models.ForeignKey("Genre", null=True, blank=True)
class Trope(models.Model):
description = models.CharField(max_length=200, unique=True)
genre_relation = models.ManyToManyField(Genre)
In (the working) view #1, the dictionary I use to generate my fields has keys equal to a certain Genre, and values equal to a list of Genres for whom the key is a parent. In other words, for every key, the queryset is Genre.objects.filter(parent=key, **kwargs).
In the non-functional view #2, we need to start with the profile's preferred_genres field. For every preferred_genre I need to pull the associated Tropes and combine them into a single queryset. Right now, I am looping through preferred_genres and doing something like this:
for g in preferred_genres:
tropeset = g.trope_set.all()
This gets me a bunch of individual querysets containing the information I need, but I can't find a way to combine the multiple tropesets into one big queryset (as opposed to a list without the none attribute). (As an aside, this also hammers my database with a bunch of queries. I am also trying to wrap my head around how I can maybe use prefetch_related to reduce the number of queries, but one thing at a time.)
If I can't combine these querysets into one but CAN somehow accomplish these lookups with a single query, I am all ears! I am now reading the documentation regarding complex queries with the Q object. It is tantalizing - I can conceptualize how this would accomplish what I'm looking for, but only if I can call all of the queries at one time. Since I have to call them iteratively one at a time, I am not sure how to use the Q object to | or & them together.
You can combine querysets by using the | and & operators.
from functools import reduce
from operator import and_, or_
querysets = [q1, q2, q3, ...] # List of querysets you want to combine.
# Objects that are present in *at least one* of the queries
combined_or_querysets = reduce(or_, querysets[1:], querysets[0])
# Objects that are present in *all* of the queries
combined_and_querysets = reduce(and_, querysets[1:], querysets[0])
From Django 1.11+ you can also use the union and intersection methods.
I've found a "solution" to this problem. If I structure the query like so, I can get everything I need in one swoop without having to combine querysets after the fact:
desired_value = Trope.objects.filter(genre_relation__in=preferred_genres).distinct()
I still do not know how to combine multiple querysets into one without losing the inherent "queryset-ness" that seems to be necessary for the form to render properly. However, for my specific use case, restructuring the query as noted renders the issue moot.