How can I find the intersection of two Django querysets? - python

I’ve got a Django model with two custom manager methods. Each returns a different subset of the model’s objects, based on a different property of the object.
class FeatureManager(models.Manager):
    def without_test_cases(self):
        return self.get_query_set().annotate(num_test_cases=models.Count('testcase_set')).filter(num_test_cases=0)
    def standardised(self):
        return self.get_query_set().annotate(standardised=models.Count('documentation_set__standard')).filter(standardised__gt=0)
(Both testcase_set and documentation_set refer to ManyToManyFields on other models.)
Is there any way to get a queryset, or just a list of objects, that’s the intersection of the querysets returned by each manager method?

In most cases you can just write (exploiting the "Set" part of QuerySet):
intersection = Model.objects.filter(...) & Model.objects.filter(...)
This isn't very well documented, but should behave almost exactly like using AND conditions on conditions from both queries. Relevant code: https://github.com/django/django/blob/1.8c1/django/db/models/query.py#L203

You can just do something like this:
intersection = queryset1 & queryset2
To do a union, just replace & with |.

As of Django 1.11, the intersection() method is available:
>>> qs1.intersection(qs2, qs3)

I believe qs1.filter(pk__in=qs2) should work (usually). It seems to work for a similar case for me, it makes sense that it would work, and the generated query looks sane. (If one of your querysets uses values() to not select the primary key column or something weird, I can believe it'd break, though...)

Refactor
class FeatureManager(models.Manager):
    @staticmethod
    def _test_cases_eq_0(qs):
        return qs.annotate(num_test_cases=models.Count('testcase_set')).filter(num_test_cases=0)
    @staticmethod
    def _standardized_gt_0(qs):
        return qs.annotate(standardised=models.Count('documentation_set__standard')).filter(standardised__gt=0)
    def without_test_cases(self):
        return self._test_cases_eq_0(self.get_query_set())
    def standardised(self):
        return self._standardized_gt_0(self.get_query_set())
    def intersection(self):
        # Apply both annotate/filter steps to the same base queryset
        return self._test_cases_eq_0(self._standardized_gt_0(self.get_query_set()))

If you want to do it in python, not in the database:
intersection = set(queryset1) & set(queryset2)
The problem is that if the two queries use different annotations, the objects might look different because of the added annotations...
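To make the set-based approach a bit more concrete: Django model instances compare equal, and hash, by primary key rather than by the extra attributes that annotate() attaches. The sketch below mimics that with a hypothetical stand-in class `Row` (not a real model; the class and its fields are assumptions for illustration), so it runs without a database.

```python
class Row:
    """Hypothetical stand-in for a model instance: identity is the primary key."""

    def __init__(self, pk, **annotations):
        self.pk = pk
        # Extra attributes play the role of values added by annotate().
        self.__dict__.update(annotations)

    def __eq__(self, other):
        return isinstance(other, Row) and self.pk == other.pk

    def __hash__(self):
        return hash(self.pk)


# Two "querysets" evaluated into Python objects, each with its own annotation.
without_tests = [Row(1, num_test_cases=0), Row(2, num_test_cases=0)]
standardised = [Row(2, standardised=3), Row(3, standardised=1)]

# Intersection by primary key; the differing annotations do not affect membership.
intersection = set(without_tests) & set(standardised)
common_pks = sorted(row.pk for row in intersection)
print(common_pks)
```

Which of the two differently annotated copies ends up in the result is not something to rely on, so it is probably safest to read only the primary key (or unannotated fields) from the intersected objects.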

One way may be to use the python sets module and just do an intersection:
Make a couple of querysets that overlap at id=5:
In [42]: first = Location.objects.filter(id__lt=6)
In [43]: last = Location.objects.filter(id__gt=4)
Run "import sets" first (this gives a deprecation warning... ummm... oh well). Now build the sets and intersect them - we get one element in the set:
In [44]: sets.Set(first).intersection(sets.Set(last))
Out[44]: Set([<Location: Location object>])
Now get the id of the intersection elements to check it really is 5:
In [48]: [s.id for s in sets.Set(first).intersection(sets.Set(last))]
Out[48]: [5]
This obviously hits the database twice and returns all the elements of each queryset. A better way would be to chain the filters on your managers, which should be able to do it in one DB hit at the SQL level. I can't see a QuerySet.and/or(QuerySet) method.
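For a rough picture of what "doing it at the SQL level" means, here is a sketch that uses the standard-library sqlite3 module directly instead of the Django ORM. The table name `location` and its ten rows are assumptions made up for this example, and the SQL is only an approximation of the idea, not the exact statement Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE location (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO location (id) VALUES (?)", [(i,) for i in range(1, 11)])

# Both conditions in one WHERE clause: roughly what chaining two filters amounts to.
chained = conn.execute(
    "SELECT id FROM location WHERE id < 6 AND id > 4"
).fetchall()

# The same result expressed as an explicit set intersection in SQL.
intersected = conn.execute(
    "SELECT id FROM location WHERE id < 6 "
    "INTERSECT "
    "SELECT id FROM location WHERE id > 4"
).fetchall()

chained_ids = [row[0] for row in chained]
intersected_ids = [row[0] for row in intersected]
print(chained_ids, intersected_ids)
conn.close()
```

Either form is a single round trip to the database and only returns the overlapping rows, unlike building two Python sets from two fully evaluated querysets.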

If you really are just using annotation to filter based on whether the count is zero or not, then this should work instead:
class FeatureManager(models.Manager):
    def without_test_cases(self):
        return self.get_query_set().filter(testcase__pk__isnull=True)
    def standardised(self):
        return self.get_query_set().filter(documentation_set__standard__isnull=False)
Since you no longer are worrying about annotation, the two queries should intersect very smoothly.

Related

Django - was queryset filtered using some parameters or not

The question relates to Python and the Django framework, and is probably one for experienced Django developers. I googled it for some time and also looked through the Django QuerySet code itself, but found no answer. Is it possible to know whether a queryset has been filtered and, if so, to get the keys and values of the filter parameters?
I'm developing a web system with a huge set of filters, and I must predefine some behind-the-scenes behavior for the user when certain filters have been applied.
Yes, but since to the best of my knowledge this is not documented, you probably should not use it. Furthermore it looks to me like bad design if you need to obtain this from a QuerySet.
For a QuerySet, for example qs, you can obtain the .query attribute, and then query for the .where attribute. The truthiness of that attribute checks if that node (this attribute is a WhereNode, which is a node in the syntax of the query) has children (these children are then individual WHERE conditions, or groups of such conditions), hence has done some filtering.
So for example:
qs = Model.objects.all()
bool(qs.query.where) # --> False
qs = Model.objects.filter(foo='bar')
bool(qs.query.where) # --> True
If you inspect the WhereNode, you can see the elements out of which it is composed, for example:
>>> qs.query.where
<WhereNode: (AND: <django.db.models.lookups.Exact object at 0x7f2c55615160>)>
and by looking at the children, we can even obtain details:
>>> c1 = qs.query.where.children[0]
>>> c1.lhs
Col(app_model, app.Model.foo)
>>> c1.lookup_name
'exact'
>>> c1.rhs
'bar'
But the notation is rather cryptic. Furthermore, the WhereNode is not necessarily a conjunctive one (the AND); it can also be a disjunctive one (the OR), and it is not guaranteed that any filtering will actually be done (since the tests can be trivially true, like 1 > 0). We thus only check whether there will be a non-empty WHERE in the SQL query, not whether this query will restrict the queryset in any way (although you can of course inspect the WhereNode and see whether that holds).
Note that some constraints are not part of the WHERE, for example if you make a JOIN, you will perform an ON, but this is not a WHERE clause.
However, since the above is - to the best of my knowledge - not extensively documented, it is probably not a good idea to depend on it: it can easily change and thus no longer work.
You can use the query attribute (i.e. queryset.query) to get the data used in the SQL query (the output isn't exactly valid SQL).
You can also use queryset.query.__dict__ to get that data in a dictionary format.
I agree with Willem Van Onsen, in that accessing the internals of the query object isn't guaranteed to work in the future. It's correct for now, but might change.
But going half-way down that path, you could use the following:
is_filtered_query = bool(' WHERE ' in str(queryset.query))
which will pretty much do the job!

Building Django Q() objects from other Q() objects, but with relation crossing context

I commonly find myself writing the same criteria in my Django application(s) more than once. I'll usually encapsulate it in a function that returns a Django Q() object, so that I can maintain the criteria in just one place.
I will do something like this in my code:
def CurrentAgentAgreementCriteria(useraccountid):
    '''Returns Q that finds agent agreements that give the useraccountid account current delegated permissions.'''
    AgentAccountMatch = Q(agent__account__id=useraccountid)
    StartBeforeNow = Q(start__lte=timezone.now())
    EndAfterNow = Q(end__gte=timezone.now())
    NoEnd = Q(end=None)
    # Now put the criteria together
    AgentAgreementCriteria = AgentAccountMatch & StartBeforeNow & (NoEnd | EndAfterNow)
    return AgentAgreementCriteria
This means I don't have to think through the DB model more than once, and I can combine the return values from these functions to build more complex criteria. That has worked well so far, and has already saved me time when the DB model changed.
Something I have realized as I start to combine the criteria from these functions is that a Q() object is inherently tied to the type of object .filter() is being called on. That is what I would expect.
I occasionally find myself wanting to use a Q() object from one of my functions to construct another Q object that is designed to filter a different, but related, model's instances.
Let's use a simple/contrived example to show what I mean. (It's simple enough that normally this would not be worth the overhead, but remember that I'm using a simple example here to illustrate what is more complicated in my app.)
Say I have a function that returns a Q() object that finds all Django users, whose username starts with an 'a':
def UsernameStartsWithAaccount():
    return Q(username__startswith='a')
Say that I have a related model that is a user profile with settings including whether they want emails from us:
class UserProfile(models.Model):
    account = models.OneToOneField(User, unique=True, related_name='azendalesappprofile')
    emailMe = models.BooleanField(default=False)
Say I want to find all UserProfiles which have a username starting with 'a' AND want us to send them some email newsletter. I can easily write a Q() object for the latter:
wantsEmails = Q(emailMe=True)
but find myself wanting to do something like this for the former:
startsWithA = Q(account=UsernameStartsWithAaccount())
# And then
UserProfile.objects.filter(startsWithA & wantsEmails)
Unfortunately, that doesn't work (it generates invalid PSQL syntax when I tried it).
To put it another way, I'm looking for a syntax along the lines of Q(account=Q(id=9)) that would return the same results as Q(account__id=9).
So, a few questions arise from this:
Is there a syntax with Django Q() objects that allows you to add "context" to them to allow them to cross relational boundaries from the model you are running .filter() on?
If not, is this logically possible? (Since I can write Q(account__id=9) when I want to do something like Q(account=Q(id=9)) it seems like it would).
Maybe someone will suggest something better, but I ended up passing the context manually to such functions. I don't think there is an easy solution, as you might need to traverse a whole chain of related tables to get to your field, like table1__table2__table3__profile__user__username. How would Django guess that? The User table could be linked to table2 too, even though you don't need that link in this case, so I think you can't avoid setting the path manually.
Also, you can unpack a dictionary into Q() and a list or a dictionary into filter(), which is much easier to work with than using keyword parameters and applying &.
def UsernameStartsWithAaccount(context=''):
    field = 'username__startswith'
    if context:
        field = context + '__' + field
    return Q(**{field: 'a'})
Then if you simply need to AND your conditions you can combine them into a list and pass to filter:
UserProfile.objects.filter(*[startsWithA, wantsEmails])
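The prefixing trick above can be generalised so the relation path is applied to a whole dictionary of lookups at once. This is only a sketch under assumptions: `prefix_lookups` is a hypothetical helper name, and it works on plain dictionaries so that it runs without Django installed.

```python
def prefix_lookups(lookups, context=""):
    """Return a copy of `lookups` with every key prefixed by `context`.

    `context` is the relation path from the model being filtered to the
    model the lookups were written for, e.g. "account".
    """
    if not context:
        return dict(lookups)
    return {f"{context}__{field}": value for field, value in lookups.items()}


# Lookups written once against the User model...
username_starts_with_a = {"username__startswith": "a"}

# ...and re-targeted for a query on UserProfile via its `account` relation.
profile_lookups = prefix_lookups(username_starts_with_a, "account")
print(profile_lookups)
# With Django available, this could then be used as Q(**profile_lookups).
```

Keeping the criteria as dictionaries until the last moment means the same definition can be reused from any related model, at the cost of still having to supply the path by hand.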

Is it possible to modify Django Q() objects after construction?

Is it possible to modify Django Q() objects after construction? I create a Q() object like so:
q = Q(foo=1)
Is it possible to later change q to be the same as if I had constructed:
q2 = Q(foo=1, bar=2)
There's no mention of such an interface in the Django docs that I could find.
I was looking for something like:
Q.append_clause(bar=2)
You can just make another Q() object and AND them together:
q2 = q & Q(bar=2)
You can add Q objects together, using their add method. For example:
>>> q = Q(sender=x)
>>> q.add(Q(receiver=y), Q.AND)
The second argument to add is the connector, which can also be Q.OR
EDIT: My answer is merely a different way of doing what Perrin Harkins suggested, but regarding your other concern, about different behavior of filter depending on the way you construct the query, you don't have to worry about that if you join Q objects. My example is equivalent to filter(sender=x, receiver=y), and not filter(sender=x).filter(receiver=y), because Q objects, as far as I could see in a quick test, do an immediate AND on the clauses and don't have the special behavior of filter for multi-valued relations.
In any case, nothing like looking at the SQL and making sure it really is doing the same in your specific queries.
The answers here are a little old and unsatisfactory imo, so here is my answer.
This is how you deep copy:
from django.db.models import Q
def deep_copy(q: Q) -> Q:
    new_q = Q()
    # Copy the connector (AND/OR) and the negation flag so that
    # complicated queries keep their meaning
    new_q.connector = q.connector
    new_q.negated = q.negated
    # Go through the children of the query: if a child is another
    # Q object, this function runs recursively on it
    for sub_q in q.children:
        if isinstance(sub_q, Q):
            # Recurse into sub-queries
            sub_q = deep_copy(sub_q)
        else:
            pass  # Do your modification here
        new_q.children.append(sub_q)
    return new_q
The else branch is where your leaf conditions (name='nathan', for example) are handled. You can change or drop them there if you'd like, and the resulting query should work fine.

django get_or_create - performance optimization for a list of objects

Consider the following (pseudoPython) code:
l = [some, list]
for i in l:
    o, c = Model.objects.get_or_create(par1=i["something"], defaults={'par2': i["else"]})
Assuming that most of the time the objects would be retrieved, not created,
there is an obvious performance gain from first running a single SELECT to find which objects in the set defined by par1 already exist, and then bulk-inserting the missing ones.
But is there a neat Python/Django pattern for accomplishing that without diving into SQL?
This is a bulk import routine, so l contains dictionaries, not django model instances.
Given a list of IDs, you can use Django to quickly give you the corresponding Model instances using the __in operator: https://docs.djangoproject.com/en/dev/ref/models/querysets/#in
photos_exist = Photo.objects.filter(
    id__in=photo_ids
)
You can use Q objects to create a complex query to SELECT the existing rows. Something like:
query_parameters = Q()
for i in l:
    query_parameters |= Q(first=i['this']) & Q(second=i['that'])
found = MyModel.objects.filter(query_parameters)
Then you can figure out (in Python) the rows that are missing and create() them (or bulk_create() for efficiency, or get_or_create() if there are potential race conditions).
Of course, long complex queries can have performance problems of their own, but I imagine this would be faster than doing a separate query for each item.
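The "figure out in Python which rows are missing" step might look something like the sketch below. It is plain Python with no database: `split_found_and_missing` is a hypothetical helper, and the `already_stored` list stands in for the keys returned by one SELECT (for instance a values_list() on a queryset filtered with __in).

```python
def split_found_and_missing(rows, existing_keys, key_field):
    """Partition import rows by whether their key is already stored.

    rows: list of dicts from the import
    existing_keys: keys already present in the database
    key_field: name of the dict entry that identifies a row
    """
    existing = set(existing_keys)
    found, missing = [], []
    seen = set()
    for row in rows:
        key = row[key_field]
        if key in existing:
            found.append(row)
        elif key not in seen:
            # Skip duplicate keys within the import itself.
            seen.add(key)
            missing.append(row)
    return found, missing


rows = [
    {"something": "a", "else": 1},
    {"something": "b", "else": 2},
    {"something": "c", "else": 3},
]
# Stand-in for the result of a single SELECT ... WHERE par1 IN (...) query.
already_stored = ["a", "c"]

found, missing = split_found_and_missing(rows, already_stored, "something")
print([row["something"] for row in missing])
# `missing` would then be turned into model instances for bulk_create().
```

This keeps the database work to one read and one bulk write, though it does not protect against another process inserting the same rows in between.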

How do I test Django QuerySets are equal?

I am trying to test my Django views. This view passes a QuerySet to the template:
def merchant_home(request, slug):
    merchant = Merchant.objects.get(slug=slug)
    product_list = merchant.products.all()
    return render_to_response('merchant_home.html',
                              {'merchant': merchant,
                               'product_list': product_list},
                              context_instance=RequestContext(request))
and test:
def test(self):
    "Merchant home view should send merchant and merchant products to the template"
    merchant = Merchant.objects.create(name='test merchant')
    product = Product.objects.create(name='test product', price=100.00)
    merchant.products.add(product)
    test_client = Client()
    response = test_client.get('/' + merchant.slug)
    # self.assertListEqual(response.context['product_list'], merchant.products.all())
    self.assertQuerysetEqual(response.context['product_list'], merchant.products.all())
EDIT
I am using self.assertQuerysetEqual instead of self.assertListEqual. Unfortunately this still doesn't work, and the terminal displays this:
['<Product: Product object>'] != [<Product: Product object>]
assertListEqual raises: 'QuerySet' object has no attribute 'difference' and
assertEqual does not work either, although self.assertSetEqual(response.context['product_list'][0], merchant.products.all()[0]) does pass.
I assume this is because the QuerySets are different objects even though they contain the same model instances.
How do I test that two QuerySets contain the same data? Am I even testing this correctly? This is my 4th day learning Django so I would like to know best practices, if possible. Thanks.
By default assertQuerysetEqual uses repr() on the first argument. This is why you were having issues with the strings in the queryset comparison.
To work around this you can override the transform argument with a lambda function that doesn't use repr():
self.assertQuerysetEqual(queryset_1, queryset_2, transform=lambda x: x)
Use assertQuerysetEqual, which is built to compare the two querysets for you. You will need to subclass Django's django.test.TestCase for it to be available in your tests.
I just had the same problem. The second argument of assertQuerysetEqual needs to be a list of the expected repr()s as strings. Here is an example from the Django test suite:
self.assertQuerysetEqual(c1.tags.all(), ["<Tag: t1>", "<Tag: t2>"], ordered=False)
I ended up solving this issue using map to repr() each entry in the queryset inside the self.assertQuerysetEqual call, e.g.
self.assertQuerysetEqual(queryset_1, map(repr, queryset_2))
An alternative, but not necessarily better, method might look like this (testing context in a view, for example) when using pytest:
all_the_things = Things.objects.all()
assert set(response.context_data['all_the_things']) == set(all_the_things)
This converts it to a set, which is directly comparable with another set. Be careful with the behaviour of set though, it might not be exactly what you want since it will remove duplicates and ignore the order of objects.
I found that using self.assertCountEqual(queryset1, queryset2) also solves the issue.
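The difference between the set comparison and assertCountEqual can be seen in a small standard-library sketch. Plain integers stand in for model instances here (an assumption to keep the example free of Django); both approaches ignore order, but only the set comparison drops duplicates.

```python
import unittest


class CompareExample(unittest.TestCase):
    def test_order_is_ignored(self):
        # Same elements in a different order still count as equal.
        self.assertCountEqual([1, 2, 3], [3, 1, 2])

    def test_duplicates_are_counted(self):
        # assertCountEqual notices the extra 1...
        with self.assertRaises(AssertionError):
            self.assertCountEqual([1, 1, 2], [1, 2])
        # ...whereas converting to sets hides it.
        self.assertEqual(set([1, 1, 2]), set([1, 2]))


# Run the tests programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(CompareExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
all_passed = result.wasSuccessful()
print(all_passed)
```

If the order of the results matters for the view being tested, comparing list(queryset1) with list(queryset2) is probably the stricter check.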
