Django: Unpack argument list for use in aggregate query - python

I am attempting to create a semi-dynamic aggregate function that will return the sums of all fields within a list. The assumption is that running get_query_set() will return a filtered query that contains all the fields in the list and some others that may not play so well with a Sum aggregate (date fields, char fields, foreign keys, etc.).
The best examples I've come up with so far are below. This is largely a Python question with Django-specific usage, though my Python-fu is not the strongest yet.
Works
qs = cl.get_query_set().aggregate(Sum('permits_submitted'), Sum('permits_pulled'), Sum('permits_posted'))
return:
{'permits_pulled__sum': 5772, 'permits_posted__sum': 6723, 'permits_submitted__sum': 7276}
Does not work
qs = cl.get_query_set().aggregate(Sum('permits_submitted')).aggregate(Sum('permits_pulled'))
return: error
qs = cl.get_query_set().aggregate(Sum('permits_submitted', 'permits_pulled', Sum('permits_posted'))
return: error
Does not work - presents the idea
tuple = (
    'permits_submitted',
    'permits_pulled',
    'permits_posted',
)
qs = cl.get_query_set()
for field in tuple:
    qs.aggregate(Sum(field))
return: error
qs = cl.get_query_set()
qs.aggregate(*[Sum(field) for field in tuple])
return:
[<permit_runner: User's report for 2010-02-18>, <permit_runner: User's report for 2010-02-19>, '...(remaining elements truncated)...']
(this is the same as the return without aggregation)
WORKS
qs = cl.get_query_set()
qs = qs.aggregate(*[Sum(field) for field in tuple])
I had missed assigning the result back with qs = when adding the aggregation. It helps to take a break for a few minutes and look again with fresh eyes.

Since this works, I'm posting it as an answer so people can find it easily when googling:
qs = cl.get_query_set()
qs = qs.aggregate(*[Sum(field) for field in tuple])
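The fix rests on two plain-Python behaviours: `*` unpacks a list into positional arguments, and a method that returns a new value only helps if its result is assigned. Here is a minimal sketch of both, using hypothetical stand-ins (`FakeSum`, `FakeQuerySet`) rather than Django's real classes:

```python
class FakeSum:
    """Hypothetical stand-in for django.db.models.Sum (illustration only)."""

    def __init__(self, field):
        self.field = field


class FakeQuerySet:
    """Hypothetical stand-in for a queryset, backed by a list of dict rows."""

    def __init__(self, rows):
        self.rows = rows

    def aggregate(self, *aggregates):
        # Returns a NEW dict; the queryset object itself is left unchanged.
        return {
            f"{agg.field}__sum": sum(row[agg.field] for row in self.rows)
            for agg in aggregates
        }


fields = ("permits_submitted", "permits_pulled")
qs = FakeQuerySet([
    {"permits_submitted": 3, "permits_pulled": 1},
    {"permits_submitted": 4, "permits_pulled": 2},
])

qs.aggregate(*[FakeSum(f) for f in fields])           # result is discarded
totals = qs.aggregate(*[FakeSum(f) for f in fields])  # result is kept
print(totals)
```

After the first call `qs` is still the original object, which matches the symptom described in the question.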

Creating dynamic aggregations (or dynamic annotations)
from typing import Dict
from django.db.models import F, Q, Sum
# 1. Prepare your querysets
years = year_allowances.values_list("year", flat=True)
time_aways = TimeAway.objects.filter(
    sequence__pk__in=sequences.values_list("pk", flat=True)
).actual_paid_sick_leave()
# 2. Define your individual aggregate expression.
def get_aggregate(key) -> Dict[str, Sum]:
    return {
        str(key): Sum(F('actual_paid_sick_leave'), filter=Q(local_timezone_start__year=key))
    }
# 3. Create the dictionary of aggregate expressions.
aggregate_expressions = {}
for year in years:
    aggregate_expressions.update(get_aggregate(year))
# 4. Create your aggregations.
x = time_aways.aggregate(**aggregate_expressions)
# x == {'2021': datetime.timedelta(0), '2022': datetime.timedelta(days=5)}
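The dictionary of expressions can also be built in one step with a dict comprehension. The sketch below shows only the dictionary building and the `**` unpacking in plain Python; `make_expression` and `fake_aggregate` are hypothetical placeholders for the `Sum(...)` expression and for `QuerySet.aggregate`, not real Django calls.

```python
def make_expression(year):
    # Placeholder for Sum(F(...), filter=Q(local_timezone_start__year=year)).
    return f"sum-for-{year}"


def fake_aggregate(**expressions):
    # Placeholder for QuerySet.aggregate: reports the keyword names it received.
    return sorted(expressions)


years = [2021, 2022]

# One entry per year, keyed by the year as a string.
aggregate_expressions = {str(year): make_expression(year) for year in years}

# ** unpacking accepts any string keys, even ones that are not valid identifiers.
received = fake_aggregate(**aggregate_expressions)
print(received)
```

Keys such as "2021" could never be written as literal keyword arguments, which is why the dictionary plus `**` is the natural way to pass them.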

Related

Django: How to filter model objects after passing through functions?

I have a model called Service, which has a field url of type str. I have a function f that returns the hostname of an url:
import urllib.parse
def f(url):
    return urllib.parse.urlparse(url).hostname
I want to get all the objects whose f(url) equals a value target.
One way to achieve this would be by doing the following:
[x for x in Service.objects.all() if f(x.url) == target]
But in that case, I'll get a list, not a QuerySet.
Is there a way to filter the objects and get a QuerySet satisfying the above criteria?
Instead of looping, you could try something like this, building the two possible URLs from target:
from django.db.models import Q
target_not_safe = 'http://'+target
target_safe = 'https://'+target
queryset = Service.objects.filter(Q(url=target_not_safe) | Q(url=target_safe))
See Q objects in the Django documentation.
EDIT
How about using __istartswith:
queryset = Service.objects.filter(Q(url__istartswith=target_not_safe) | Q(url__istartswith=target_safe))
Edit 2
Another trick could be to check inside the list using __in. So:
query_list = [x.id for x in Service.objects.all() if(f(x.url) == target)]
queryset = Service.objects.filter(id__in=query_list)
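Prefix matching and hostname comparison do not always select the same URLs, which is worth checking before choosing between the approaches above. The following plain-Python sketch compares the two on a handful of made-up URLs (the URLs and the `target` value are assumptions for illustration):

```python
from urllib.parse import urlparse


def f(url):
    return urlparse(url).hostname


target = "example.com"
urls = [
    "https://example.com/status",
    "http://example.com:8080/",
    "https://example.com.evil.org/",
    "ftp://example.com/files",
]

# Exact hostname comparison, as in the question.
by_hostname = [u for u in urls if f(u) == target]

# Case-insensitive prefix match, mimicking the __istartswith lookups.
prefixes = ("http://" + target, "https://" + target)
by_prefix = [u for u in urls if u.lower().startswith(prefixes)]

print(by_hostname)
print(by_prefix)
```

The prefix match lets through a longer hostname that merely starts with the target and misses other schemes, so it may be safer as a coarse pre-filter in the database followed by the exact check and an id__in query.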

Using set instead of list in Django and checking the id with each id in the list

def user(request):
    user_list = UserConfig.objects.get(meta_key="ALLOWED_USERS")
    users_list = [int(x) for x in user_list.meta_value.split(",")]
    if request.user.id not in users_list:
        pass  # some logic
    else:
        pass  # other logic
How can I convert the above snippet to use a set? Currently I am checking membership in a list, so it won't be efficient as the number of ids grows.
Using Django 1.8.
How can I convert the above snippet to use a set?
Well, technically you could just do this:
users_set = set(int(x) for x in user_list.meta_value.split(","))
BUT your real issue here is a design issue - you should NOT store a list of related pks (you should not store any list of any kind FWIW) in one single field. Either use existing auth.User features, or use permissions, or use a custom User model and add your own things to it etc, but by all means keep your relational schema properly normalized. My 2 cents...
While it’s true that sets have a much quicker lookup time compared to lists, they don’t preserve the order of the data. So just be aware of that when deciding which data structure to use.
def user(request):
    user_list = UserConfig.objects.get(meta_key="ALLOWED_USERS")
    users_list = [int(x) for x in user_list.meta_value.split(",")]
    users_set = set(users_list)
    if request.user.id not in users_set:
        pass  # some logic
    else:
        pass  # other logic
I know this is not a direct answer to your question, but if you must use that model as is, you can use a dynamic regex pattern to check whether the user id is in the stored string without building a list:
import re
def user(request):
    user_list = UserConfig.objects.get(meta_key="ALLOWED_USERS")
    # Anchor on the start of the string or a preceding comma so that id 5 does not match "15,"
    pattern = r"(?:^|,){0},".format(request.user.id)
    if re.search(pattern, user_list.meta_value):
        pass  # User is in list
    else:
        pass  # User not in list
This is just an idea and assumes that your string has every id followed by a comma, including the last one.
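One thing to watch with a pattern of the form `<id>,` is that a short id can match the tail of a longer one. Here is a small check, assuming a stored string in the format described above (the sample value is made up):

```python
import re

meta_value = "15,27,"  # assumed format: every id is followed by a comma

# Unanchored: "5," also matches the tail of "15,".
naive = re.search("{0},".format(5), meta_value)

# Anchored on the start of the string or a preceding comma.
anchored = re.search(r"(?:^|,){0},".format(5), meta_value)
anchored_hit = re.search(r"(?:^|,){0},".format(27), meta_value)

print(bool(naive), bool(anchored), bool(anchored_hit))
```

Only the anchored pattern distinguishes id 5 from id 15 while still finding id 27.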
You can create a set from the list.
user_list = UserConfig.objects.get(meta_key="ALLOWED_USERS")
users_list = [int(x) for x in user_list.meta_value.split(",")]
users_set = set(users_list)
You can also use a set comprehension.
users_set = {int(x) for x in user_list.meta_value.split(",")}
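The parsing step can be tried outside Django. Below is a minimal sketch with a hypothetical helper, `parse_allowed_ids`, that takes the raw comma-separated string; skipping empty pieces (for example after a trailing comma) is an added assumption, not something the original snippets do.

```python
def parse_allowed_ids(meta_value):
    """Parse a comma-separated id string into a set, skipping empty pieces."""
    return {int(piece) for piece in meta_value.split(",") if piece.strip()}


allowed = parse_allowed_ids("3,17,42,")

# Membership tests on a set are hash lookups rather than linear scans.
print(17 in allowed)
print(5 in allowed)
```

The set still has to be rebuilt from the string on every request, so the gain over a list is small unless the parsed set is reused.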

Simple Subquery with OuterRef

I am trying to make a very simple Subquery that uses OuterRef (not for practical purposes, but just to get it working), but I keep running into the same error.
posts/models.py code
from django.db import models
class Tag(models.Model):
    name = models.CharField(max_length=120)
    def __str__(self):
        return self.name
class Post(models.Model):
    title = models.CharField(max_length=120)
    tags = models.ManyToManyField(Tag)
    def __str__(self):
        return self.title
manage.py shell code
>>> from django.db.models import OuterRef, Subquery
>>> from posts.models import Tag, Post
>>> tag1 = Tag.objects.create(name='tag1')
>>> post1 = Post.objects.create(title='post1')
>>> post1.tags.add(tag1)
>>> Tag.objects.filter(post=post1.pk)
<QuerySet [<Tag: tag1>]>
>>> tags_list = Tag.objects.filter(post=OuterRef('pk'))
>>> Post.objects.annotate(count=Subquery(tags_list.count()))
The last two lines should give me the number of tags for each Post object, but I keep getting the same error:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
One of the problems with your example is that you cannot use queryset.count() as a subquery, because .count() tries to evaluate the queryset and return the count.
So one may think that the right approach would be to use Count() instead. Maybe something like this:
Post.objects.annotate(
    count=Count(Tag.objects.filter(post=OuterRef('pk')))
)
This won't work for two reasons:
1. The Tag queryset selects all Tag fields, while Count can only count on one field. Thus: Tag.objects.filter(post=OuterRef('pk')).only('pk') is needed (to select counting on tag.pk).
2. Count itself is not a Subquery class, Count is an Aggregate. So the expression generated by Count is not recognized as a Subquery (OuterRef requires a subquery). We can fix that by using Subquery.
Applying fixes for 1) and 2) would produce:
Post.objects.annotate(
    count=Count(Subquery(Tag.objects.filter(post=OuterRef('pk')).only('pk')))
)
However, if you inspect the query being produced:
SELECT
    "tests_post"."id",
    "tests_post"."title",
    COUNT((SELECT U0."id"
           FROM "tests_tag" U0
           INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
           WHERE U1."post_id" = ("tests_post"."id"))
    ) AS "count"
FROM "tests_post"
GROUP BY
    "tests_post"."id",
    "tests_post"."title"
you will notice a GROUP BY clause. This is because COUNT is an aggregate function. Right now it does not affect the result, but in some other cases it may. That's why the docs suggest a different approach, where the aggregation is moved into the subquery via a specific combination of values + annotate + values :
from django.db.models import Count, OuterRef, Subquery
Post.objects.annotate(
    count=Subquery(
        Tag.objects
        .filter(post=OuterRef('pk'))
        # The first .values call defines our GROUP BY clause.
        # It's important to have a filter on every field defined here,
        # otherwise you will have more than one group per row,
        # which makes the subquery return more than one row,
        # and subqueries are not allowed to do that.
        # In our example we group only by post
        # and we filter by post via OuterRef.
        .values('post')
        # Here we say: count how many rows we have per group
        .annotate(count=Count('pk'))
        # Here we say: return only the count
        .values('count')
    )
)
Finally this will produce:
SELECT
    "tests_post"."id",
    "tests_post"."title",
    (SELECT COUNT(U0."id") AS "count"
     FROM "tests_tag" U0
     INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
     WHERE U1."post_id" = ("tests_post"."id")
     GROUP BY U1."post_id"
    ) AS "count"
FROM "tests_post"
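The shape of this correlated subquery can be tried directly with Python's built-in sqlite3 module. The sketch below uses simplified, assumed table names (`post`, `tag`, `post_tags`) rather than the ones Django generates, and compares the grouped subquery with an ungrouped one for a post that has no tags:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE post_tags (post_id INTEGER, tag_id INTEGER);
    INSERT INTO post VALUES (1, 'post1'), (2, 'post2');
    INSERT INTO tag VALUES (1, 'tag1'), (2, 'tag2');
    INSERT INTO post_tags VALUES (1, 1), (1, 2);
""")

# Correlated subquery with GROUP BY, mirroring the SQL shown above.
grouped = conn.execute("""
    SELECT p.title,
           (SELECT COUNT(t.id)
              FROM tag t
              JOIN post_tags pt ON t.id = pt.tag_id
             WHERE pt.post_id = p.id
             GROUP BY pt.post_id) AS count
      FROM post p
     ORDER BY p.id
""").fetchall()

# Same subquery without GROUP BY.
ungrouped = conn.execute("""
    SELECT p.title,
           (SELECT COUNT(t.id)
              FROM tag t
              JOIN post_tags pt ON t.id = pt.tag_id
             WHERE pt.post_id = p.id) AS count
      FROM post p
     ORDER BY p.id
""").fetchall()

print(grouped)
print(ungrouped)
```

With GROUP BY, a post without tags produces no group at all, so the subquery yields NULL rather than 0. In Django that shows up as None on the annotation, and wrapping the Subquery in Coalesce with a default of 0 is one way to handle it.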
The django-sql-utils package makes this kind of subquery aggregation simple. Just pip install django-sql-utils and then:
from sql_util.utils import SubqueryCount
posts = Post.objects.annotate(tag_count=SubqueryCount('tag'))
The API for SubqueryCount is the same as Count, but it generates a subselect in the SQL instead of joining to the related table.

Django: Retrieving IDs of manyToMany fields quickly

I have the following model schema in Django (with Postgres).
class A(models.Model):
    related = models.ManyToManyField("self", null=True)
Given a QuerySet of A, I would like to return a dictionary mapping each instance of A in the QuerySet to a list of ids of its related instances as quickly as possible.
I can surely iterate through each A and query the related field, but is there a more optimal way?
Assuming you have three instances, you can use the values_list method to retrieve just the ids of the related instances.
I filter on pk here because I don't know your schema, but any filter works as long as the result is a QuerySet.
>>> result = A.objects.filter(pk=1)
>>> result.values('related__id')
[{'related__id': 2}, {'related__id': 3}]
>>> result.values_list('related__id')
[(2,), (3,)]
>>> result.values_list('related__id', flat=True)
[2, 3]
You can get pretty close like this:
from django.db.models import Prefetch
qs = A.objects.prefetch_related(Prefetch(
    'related',
    queryset=A.objects.only('pk'),
    to_attr='related_insts')).in_bulk(my_list_of_pks)
This will give a mapping from pks of the current object to the instance itself, so you can iterate through as follows:
for pk, inst in qs.items():
    related_ids = (related.pk for related in inst.related_insts)
Or given an instance, you can do a fast lookup like so:
related_ids = (related.pk for related in qs[instance.pk].related_insts)
This method maps the instance ids to the related ids (indirectly) since you specifically requested a dictionary. If you aren't doing lookups, you may want the following instead:
qs = A.objects.prefetch_related(Prefetch(
    'related',
    queryset=A.objects.only('pk'),
    to_attr='related_insts')).filter(pk__in=my_list_of_pks)
for inst in qs:
    related_ids = (related.pk for related in inst.related_insts)
You may take note of the use of only to only pull the pks from the db. There is an open ticket to allow the use of values and (I presume) values_list in Prefetch queries. This would allow you to do the following.
qs = A.objects.prefetch_related(Prefetch(
    'related',
    queryset=A.objects.values_list('pk', flat=True),
    to_attr='related_ids')).filter(pk__in=my_list_of_pks)
for inst in qs:
    related_ids = inst.related_ids
You could of course optimize further, for example by using qs.only('related_insts') on the primary queryset, but make sure you aren't doing anything with these instances-- they're essentially just expensive containers to hold your related_ids.
I believe this is the best that's available for now (without custom queries). To get to exactly what you want, two things are needed:
1. The feature above is implemented.
2. values_list is made to work with Prefetch to_attr like it does for annotations.
With these two things in place (and continuing the above example) you could do the following to get exactly what you requested:
d = qs.values_list('related_ids', flat=True).in_bulk()
for pk, related_pks in d.items():
    print('Containing Objects %s' % pk)
    print('Related objects %s' % related_pks)
# And lookups
print('Object %d has related objects %s' % (20, d[20]))
I've left off some details explaining things, but it should be pretty clear from the documentation. If you need any clarification, don't hesitate!
If you're using Postgres:
from django.contrib.postgres.aggregates import ArrayAgg
qs = A.objects.filter(pk__in=[1,2,6]).annotate(related_ids=ArrayAgg('related')).only('id')
mapping = {a.id: a.related_ids for a in qs}
You can also use filter/ordering in the ArrayAgg.
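On databases without ArrayAgg, a similar mapping could be assembled in Python from flat (id, related id) pairs fetched in one query. The sketch below shows only the grouping step on hard-coded pairs. That values_list('pk', 'related__pk') returns rows in this shape, with None for instances that have no related rows, is an assumption to verify against your Django version.

```python
from collections import defaultdict

# Hypothetical (a_id, related_id) pairs; None stands for "no related rows".
pairs = [(1, 2), (1, 3), (2, 1), (6, None)]

mapping = defaultdict(list)
for a_id, related_id in pairs:
    bucket = mapping[a_id]  # touching the key guarantees an entry for every A
    if related_id is not None:
        bucket.append(related_id)

print(dict(mapping))
```

This keeps the work to a single pass over the rows, at the cost of transferring one row per relation instead of one row per instance.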

A puzzle concerning Q objects and Foreign Keys

I've got a model like this:
class Thing(models.Model):
    property1 = models.IntegerField()
    property2 = models.IntegerField()
    property3 = models.IntegerField()
class Subthing(models.Model):
    subproperty = models.IntegerField()
    thing = models.ForeignKey(Thing)
    main = models.BooleanField()
I've got a function that is passed a list of filters where each filter is of the form {'type':something, 'value':x}. This function needs to return a set of results ANDing all the filters together:
final_q = Q()
for filter in filters:
    q = None
    if filter['type'] == 'thing-property1':
        q = Q(property1=filter['value'])
    elif filter['type'] == 'thing-property2':
        q = Q(property2=filter['value'])
    elif filter['type'] == 'thing-property3':
        q = Q(property3=filter['value'])
    if q:
        final_q = final_q & q
return Thing.objects.filter(final_q).distinct()
Each Subthing has a Boolean property 'main'. Every Thing has 1 and only 1 Subthing where main==True.
I now need to add a filter that returns all the Things which have a Subthing where main==True and subproperty==filter['value'].
Can I do this as part of the Q object I'm constructing? If not how else? The queryset I get before my new filter can be quite large so I would like a method that doesn't involve looping over the results.
It's a bit easier to understand if you explicitly give your Subthings a "related_name" in their relationship to the Thing:
class Subthing(models.Model):
    ...
    thing = models.ForeignKey(Thing, related_name='subthings')
    ...
Now, you use Django join syntax to build your Q object:
Q(subthings__main=True) & Q(subthings__subproperty=filter['value'])
Without related_name, the reverse accessor on a Thing instance is named 'subthing_set' and the name used in query lookups is 'subthing', but I find that it's easier to follow if you give it a better name like 'subthings'.
Using (instead of final_q = Q() at the beginning)
final_q = Q(subthing__main=True)
sub_vals = [v['value'] for v in filters]
if sub_vals:
    final_q = final_q & Q(subthing__subproperty__in=sub_vals)
should get you what you want. You can also adjust your loop to build the sub_vals list and apply it after the loop.
subthing_set is the accessor automatically added to Thing for reaching its related Subthings; in query lookups the reverse relation is spelled subthing.
You can assign another related name, e.g.
thing = models.ForeignKey(Thing, related_name='subthings')
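The if/elif chain in the question can also be replaced by a lookup table that turns the filter list into keyword arguments. The sketch below is plain Python and does not touch Django; the filter type `subthing-main-subproperty` and the helper `build_lookups` are hypothetical, and the `subthings` name assumes the related_name suggested in the answers.

```python
LOOKUP_BY_TYPE = {
    "thing-property1": "property1",
    "thing-property2": "property2",
    "thing-property3": "property3",
    # Hypothetical filter type for the main Subthing's subproperty.
    "subthing-main-subproperty": "subthings__subproperty",
}


def build_lookups(filters):
    """Translate [{'type': ..., 'value': ...}] into filter keyword arguments."""
    lookups = {}
    for item in filters:
        field = LOOKUP_BY_TYPE.get(item["type"])
        if field is None:
            continue  # unknown filter types are ignored, as in the original loop
        lookups[field] = item["value"]
        if field.startswith("subthings__"):
            # Restrict the match to the Subthing flagged as main.
            lookups["subthings__main"] = True
    return lookups


lookups = build_lookups([
    {"type": "thing-property1", "value": 1},
    {"type": "subthing-main-subproperty", "value": 7},
    {"type": "unknown", "value": 0},
])
print(lookups)
```

The resulting dictionary could then be passed as Thing.objects.filter(**lookups), which ANDs the conditions together. Note that a repeated filter type overwrites the earlier value in this sketch.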
