I'm trying to use Django annotations to create a queryset field that holds a list of values of some related model attribute.
queryset = ...
qs = queryset.annotate(
list_field=SomeAggregateFunction(
Case(When(related_model__field="abc"), then="related_model__id")
),
list_elements=Count(F('list_field'))
)
I was thinking about concatenating all these ids with some separator, but I don't know the appropriate functions. Another solution would be to make list_field a queryset. I know this syntax is wrong. Thank you for any help.
If you are using PostgreSQL and Django >= 1.9, you could use the PostgreSQL-specific aggregate functions from django.contrib.postgres.aggregates, e.g.
ArrayAgg:
Returns a list of values, including nulls, concatenated into an array.
If you need to concatenate these values using a delimiter, you could also use StringAgg.
I have done something like that:
from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Case, When

qs = queryset.annotate(
    field_a=ArrayAgg(Case(When(
        related_model__field="A",
        then="related_model__pk",
    ))),
    field_b=ArrayAgg(Case(When(
        related_model__field="B",
        then="related_model__pk",
    ))),
    field_c=ArrayAgg(Case(When(
        related_model__field="C",
        then="related_model__pk",
    ))),
)
Now every object in the queryset has field_a, field_b and field_c, each holding a list that contains a pk for every matching related object and None for every non-matching one. You can also give Case a default value other than None.
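To make the shape of that result concrete, here is a plain-Python emulation of what a conditional ArrayAgg yields for one parent object. No Django or database is involved; the conditional_array_agg helper and the sample rows are hypothetical and only meant as a sketch.

```python
# Plain-Python emulation of ArrayAgg(Case(When(field=..., then=pk))).
# Each related row contributes one element: its pk when the condition
# matches, otherwise the Case default (None unless another is given).

def conditional_array_agg(related_rows, wanted, default=None):
    return [row["pk"] if row["field"] == wanted else default
            for row in related_rows]


# Hypothetical related rows of a single parent object.
related = [
    {"pk": 1, "field": "A"},
    {"pk": 2, "field": "B"},
    {"pk": 3, "field": "A"},
]

field_a = conditional_array_agg(related, "A")
field_b = conditional_array_agg(related, "B")

# Dropping the placeholders afterwards leaves only the matching pks.
field_a_no_nulls = [pk for pk in field_a if pk is not None]

print(field_a, field_b, field_a_no_nulls)
```

In the real aggregate the element order is up to the database unless an ordering is specified, so the positions shown here should not be relied on.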
Suppose we have a model in Django defined as follows:
class Literal(models.Model):
    name = models.CharField(...)
    ...
The name field is not unique, and thus can have duplicate values. I need to accomplish the following task:
Select all rows from the model that have at least one duplicate value of the name field.
I know how to do it using plain SQL (maybe not the best solution):
select * from literal where name IN (
    select name from literal group by name having count(name) > 1
);
So, is it possible to select this using the Django ORM? Or is there a better SQL solution?
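For reference, the SQL in the question can be tried out on an in-memory SQLite database. This is only a sketch: the table layout and the sample names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE literal (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO literal (name) VALUES (?)",
    [("foo",), ("bar",), ("foo",), ("baz",), ("bar",), ("foo",)],
)

# Every row whose name occurs more than once in the table.
rows = conn.execute(
    """
    SELECT id, name FROM literal
    WHERE name IN (
        SELECT name FROM literal GROUP BY name HAVING COUNT(name) > 1
    )
    ORDER BY id
    """
).fetchall()

for row in rows:
    print(row)
```

Only the row named "baz" is left out, since it is the only name that occurs once.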
Try:
from django.db.models import Count
(
    Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1)
)
This is as close as you can get with Django. The problem is that this will return a ValuesQuerySet with only name and count. However, you can then use this to construct a regular QuerySet by feeding it back into another query:
dupes = (
    Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1)
)
Literal.objects.filter(name__in=[item['name'] for item in dupes])
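As a rough illustration of the two steps (find the repeated names, then fetch the full rows), here is the same logic in plain Python. The literals list is a hypothetical in-memory stand-in for the Literal table, not real ORM output.

```python
from collections import Counter

# Hypothetical stand-in for the rows of the Literal table.
literals = [
    {"id": 1, "name": "foo"},
    {"id": 2, "name": "bar"},
    {"id": 3, "name": "foo"},
    {"id": 4, "name": "baz"},
    {"id": 5, "name": "bar"},
]

# Step 1: names that occur more than once (the values/annotate/filter step).
name_counts = Counter(row["name"] for row in literals)
dupe_names = {name for name, count in name_counts.items() if count > 1}

# Step 2: every full row whose name is in that set (the name__in step).
dupe_rows = [row for row in literals if row["name"] in dupe_names]

print(sorted(dupe_names))
print([row["id"] for row in dupe_rows])
```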
This was rejected as an edit, so here it is as a better answer:
dups = (
    Literal.objects.values('name')
    .annotate(count=Count('id'))
    .values('name')
    .order_by()
    .filter(count__gt=1)
)
This will return a ValuesQuerySet with all of the duplicate names. However, you can then use this to construct a regular QuerySet by feeding it back into another query. The Django ORM is smart enough to combine these into a single query:
Literal.objects.filter(name__in=dups)
The extra call to .values('name') after the annotate call looks a little strange. Without this, the subquery fails. The extra values tricks the ORM into only selecting the name column for the subquery.
Try using aggregation:
Literal.objects.values('name').annotate(name_count=Count('name')).exclude(name_count=1)
In case you use PostgreSQL, you can do something like this:
from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Func, Value
duplicate_ids = (
    Literal.objects.values('name')
    .annotate(ids=ArrayAgg('id'))
    .annotate(c=Func('ids', Value(1), function='array_length'))
    .filter(c__gt=1)
    .annotate(ids=Func('ids', function='unnest'))
    .values_list('ids', flat=True)
)
It results in this rather simple SQL query:
SELECT unnest(ARRAY_AGG("app_literal"."id")) AS "ids"
FROM "app_literal"
GROUP BY "app_literal"."name"
HAVING array_length(ARRAY_AGG("app_literal"."id"), 1) > 1
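For readers without PostgreSQL at hand, the three pieces of that query (ARRAY_AGG per group, the array_length filter, and unnest) can be mimicked in plain Python. This is only an emulation on hypothetical sample rows, not something that runs against the database.

```python
from collections import defaultdict

# Hypothetical (id, name) rows of the literal table.
rows = [(1, "foo"), (2, "bar"), (3, "foo"), (4, "baz"), (5, "bar")]

# GROUP BY name with ARRAY_AGG(id): one list of ids per name.
ids_by_name = defaultdict(list)
for pk, name in rows:
    ids_by_name[name].append(pk)

duplicate_ids = [
    pk
    for ids in ids_by_name.values()
    if len(ids) > 1   # HAVING array_length(ids, 1) > 1
    for pk in ids     # unnest(ids): one output row per element
]

print(sorted(duplicate_ids))
```

The result is a flat list of the ids of all rows whose name is repeated, which can then be passed to a filter such as id__in.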
OK, so for some reason none of the above worked for me; it always returned <MultilingualQuerySet []>. I use the following solution, which is much easier to understand but not as elegant:
dupes = []
uniques = []
dupes_query = MyModel.objects.values_list('field', flat=True)
# Iterate over the raw values, not set(...): a set has already dropped the repeats
for value in dupes_query:
    if value not in uniques:
        uniques.append(value)
    else:
        dupes.append(value)
print(set(dupes))
If you only want the list of names rather than the objects, you can use the following query:
repeated_names = Literal.objects.values('name').annotate(Count('id')).order_by().filter(id__count__gt=1).values_list('name', flat=True)
My model example:
class Thing(models.Model):
    alpha = models.ForeignKey('auth.User', on_delete=models.CASCADE,
                              related_name='alpha_thing')
    beta = models.ForeignKey('auth.User', on_delete=models.CASCADE,
                             related_name='beta_thing')
    assigned_at = models.DateTimeField(
        _('assigned at'),
        null=True,
        help_text=_('Assigned at this date'))
I wish to query all the users who don't have a Thing with an empty assigned_at date, i.e. they may have Things, but every one of those must have a date set.
I've tried:
return User.objects.exclude(
    alpha_thing__assigned_at__isnull=True
).exclude(
    beta_thing__assigned_at__isnull=True
).all()
but the result is empty (the thing table is empty, so I'm not sure if it has something to do with the join?).
What about this:
from django.db.models import Q
User.objects.filter(Q(alpha_thing__assigned_at__isnull=False) | Q(beta_thing__assigned_at__isnull=False)).distinct()
[Screenshots: 1. Auth model structure (User); 2. Thing model]
There is another way, since you want the users whose "things" all have an assigned date.
You could:
User.objects.filter(
    alpha_thing__assigned_at__isnull=False,
    beta_thing__assigned_at__isnull=False,
)
Simple.
There is no need to use Q objects or | (or) operations here.
What you want is not
alpha_thing__assigned_at__isnull=False OR
beta_thing__assigned_at__isnull=False
What you're looking for is
alpha_thing__assigned_at__isnull=False AND
beta_thing__assigned_at__isnull=False
For all users which don't have a Thing with an empty date try:
return User.objects.exclude(
    alpha_thing__assigned_at=None
).exclude(
    beta_thing__assigned_at=None
).all()
By the way, I got the same result whether I used .all() at the end or not, so:
return User.objects.exclude(
    alpha_thing__assigned_at=None
).exclude(
    beta_thing__assigned_at=None
)
returned the same result as the first example.
Have you tried something like this?
from django.db.models import Q
has_null_alpha = Q(alpha_thing__isnull=False, alpha_thing__assigned_at__isnull=True)
has_null_beta = Q(beta_thing__isnull=False, beta_thing__assigned_at__isnull=True)
User.objects.exclude(has_null_alpha | has_null_beta)
Reasoning
I think the reason you're seeing unexpected results may not have anything to do with the fact that there are multiple ForeignKey paths in the queryset. Your statement that "the thing table is empty" might be the key, and the reason users aren't showing up is because they have no alpha_thing or beta_thing relation.
NOTES:
The QuerySet User.objects.exclude(alpha_thing__assigned_at__isnull=True) produces a left outer join between the User table and the Thing table, which means that before doing any comparisons in the WHERE clause, you're getting NULL for assigned_at in any row where there is no Thing.
One really odd thing here is that a filter causes an INNER join, so the statement User.objects.filter(alpha_thing__assigned_at__isnull=False) yields only the users who actually have alpha_thing related objects with a non-NULL value for assigned_at (leaving out the users with no related alpha_thing).
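The join behaviour described in these notes can be reproduced with hand-written SQL on an in-memory SQLite database. This is only a sketch: the app_user and app_thing tables are hypothetical stand-ins, and the statements are written by hand to mirror the description, not copied from what Django actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE app_thing (
        id INTEGER PRIMARY KEY,
        alpha_id INTEGER REFERENCES app_user(id),
        assigned_at TEXT
    );
    INSERT INTO app_user (username) VALUES ('ann'), ('bob');
    """
)  # app_thing is left empty on purpose, as in the question

JOIN = "FROM app_user u {kind} JOIN app_thing t ON t.alpha_id = u.id"

# With a LEFT OUTER JOIN every user appears once, with NULL for assigned_at.
left_join_rows = conn.execute(
    "SELECT u.username, t.assigned_at "
    + JOIN.format(kind="LEFT OUTER") + " ORDER BY u.id"
).fetchall()

# Excluding "assigned_at IS NULL" on top of that join drops every user.
excluded = conn.execute(
    "SELECT u.username " + JOIN.format(kind="LEFT OUTER")
    + " WHERE NOT (t.assigned_at IS NULL)"
).fetchall()

# An INNER JOIN never returns users that have no thing at all.
inner_join_rows = conn.execute(
    "SELECT u.username " + JOIN.format(kind="INNER")
    + " WHERE t.assigned_at IS NOT NULL"
).fetchall()

# Requiring that the thing exists before excluding keeps those users.
guarded = conn.execute(
    "SELECT u.username " + JOIN.format(kind="LEFT OUTER")
    + " WHERE NOT (t.id IS NOT NULL AND t.assigned_at IS NULL) ORDER BY u.id"
).fetchall()

print(left_join_rows, excluded, inner_join_rows, guarded)
```

The last statement corresponds to the idea behind the Q(alpha_thing__isnull=False, alpha_thing__assigned_at__isnull=True) condition above: only rows where a Thing really exists and has no date are excluded.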
I need to create a new field in the queryset that flags whether a record is a duplicate. I treat the concatenated values of 2 fields as an identifier. If that concatenated value is seen more than once in the queryset, the record is considered a duplicate.
First, on my queryset, I create another field from the 2 existing fields, case number and hearing date. The output field is named dupe_id:
qs = file.objects.annotate(
    dupe_id=Concat(
        F('case_no'),
        F('hearing_date'),
        output_field=CharField()
    )
)
Then I count the occurrences of each dupe_id value. If the count is more than 1, the value is considered a duplicate:
dupes = qs.values('dupe_id').annotate(dupe_count=Count('dupe_id')).filter(dupe_count__gt=1)
At this point I have another queryset that contains the duplicate values from the original queryset. Here are the records in dupes, along with the number of times each value was found:
<QuerySet [{'dupe_id': 'Test Case No.2018-12-26', 'dupe_count': 3}, {'dupe_id': '123452018-12-26', 'dupe_count': 2}]>
Now this is where I'm having a bit of difficulty. What I'm thinking is that I will do an annotation on my main query set and I will use the dupes query set to help in identifying the records that need to be tagged as duplicate.
I tried this:
qs = qs.annotate(
dupe_id2 = Value(('duplicate' if dupes.filter(dupe_id__exact=Concat(F('case_no'), F('hearing_date')))[0] else '--'), output_field=CharField())
)
This is just a simple test that says if the concatenated values are seen in the dupes query set, then the field will be tagged as duplicate, if not then '--'.
But it does not seem to work as expected. All the records are being tagged as duplicate even though I have 1 record that should not be tagged as duplicate.
Also I checked using conditional expressions but I won't be able to use the dupes query set I created.
If there is a more robust way of tagging records in a query set as duplicate, please let me know.
One way to handle duplicates is the following algorithm:
GROUP BY in SQL > find duplicates > loop over duplicates
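from django.db.models import Count

# Find the (case_no, hearing_date) pairs that occur more than once
files = (
    File.objects.values('case_no', 'hearing_date')
    .annotate(records=Count('case_no'))
    .order_by()
    .filter(records__gt=1)
)

# Check the generated GROUP BY query
print(files.query)

# Flag every record after the first in each duplicate group.
# A sliced queryset cannot be updated, so collect the primary keys first.
for file in files:
    group = File.objects.filter(
        case_no=file['case_no'],
        hearing_date=file['hearing_date'],
    ).order_by('pk')
    duplicate_pks = list(group.values_list('pk', flat=True)[1:])
    File.objects.filter(pk__in=duplicate_pks).update(duplicate=True)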
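The group, find, and flag steps can also be sketched in plain Python, which may help when checking the expected outcome. The flag_duplicates helper, the is_duplicate key, and the sample records are hypothetical; the key is the (case_no, hearing_date) pair used as a tuple instead of a concatenated string.

```python
from collections import Counter


def flag_duplicates(records):
    """Return copies of the records with an added is_duplicate flag."""
    counts = Counter((r["case_no"], r["hearing_date"]) for r in records)
    return [
        {**r, "is_duplicate": counts[(r["case_no"], r["hearing_date"])] > 1}
        for r in records
    ]


# Hypothetical sample data shaped like the records in the question.
records = [
    {"case_no": "Test Case No.", "hearing_date": "2018-12-26"},
    {"case_no": "12345", "hearing_date": "2018-12-26"},
    {"case_no": "Test Case No.", "hearing_date": "2018-12-26"},
    {"case_no": "67890", "hearing_date": "2018-12-27"},
    {"case_no": "12345", "hearing_date": "2018-12-26"},
]

flagged = flag_duplicates(records)
flags = [r["is_duplicate"] for r in flagged]
print(flags)
```

Unlike the loop above, this flags every member of a duplicate group, including the first one; which of the two is wanted depends on how the flag is used.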
I could not find a way to use the dupes queryset inside a conditional expression in annotate(), so I took a different route.
What I did was override the get_context_data function and get the duplicate keys there. The returned object was a queryset, so I collected its dupe_id values in a list and stored that list in the context, which makes it available in the template.
This is what my get_context_data function looks like; if it could be improved further, please let me know.
def get_context_data(self, **kwargs):
    ctx = super(fileList, self).get_context_data(**kwargs)
    qs = file.objects.annotate(
        dupe_id=Concat(
            F('case_no'),
            F('hearing_date'),
            output_field=CharField()
        )
    )
    dupes = qs.values('dupe_id').annotate(dupe_count=Count('dupe_id')).filter(dupe_count__gt=1)
    dupe_keys = []
    for dupe in dupes:
        dupe_keys.append(dupe['dupe_id'])
    ctx['dupe_keys'] = dupe_keys
    return ctx
Now in the template, inside the for loop over the queryset, I added another column that checks whether the record's dupe_id is in the list of duplicates. If it is, the record is tagged as duplicate (or the cell could be highlighted so that it stands out to the user).
<td>{% if object.dupe_id in dupe_keys %} duplicate {% else %} not duplicate {% endif %}</td>
I need to annotate a queryset with strings from dictionary. The dictionary keys come from the model's field called 'field_name'.
I can easily annotate with strings from dictionary using the Value operator:
q = MyModel.objects.annotate(
new_value=Value(value_dict[key], output_field=CharField()))
And I can get the field value from the model with F expression:
q = MyModel.objects.annotate(new_value=F('field_name'))
Putting them together however fails:
# doesn't work, throws
# KeyError: F(field_name)
q = MyModel.objects.annotate(
new_value=Value(value_dict[F('field_name')],
output_field=CharField()))
Found this question, which afaiu tries to do the same thing but that solution throws another error:
Unsupported lookup 'field_name' for CharField or join on the field not permitted.
I feel like I'm missing something really obvious here but I just can't get it to work. Any help appreciated.
Right, just as I thought, a tiny piece was missing. The Case(When(... solution in the linked question worked, I just needed to wrap the dictionary value in Value() operator as follows:
qs = MyModel.objects.annotate(
    new_value=Case(
        *[When(field_name=k, then=Value(v)) for k, v in value_dict.items()],
        output_field=CharField()
    )
)
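A small plain-Python sketch may help show why the direct dictionary lookup fails while the Case/When list works. FieldRef is a hypothetical stand-in for an F expression, and case_when only emulates the first-match semantics for a single row; neither is Django code.

```python
class FieldRef:
    """Hypothetical stand-in for an F expression: just a named placeholder."""

    def __init__(self, name):
        self.name = name


value_dict = {"a": "Alpha", "b": "Beta"}

# The dict lookup runs immediately in Python, before any SQL exists, so the
# placeholder object itself is used as the key and is not found.
try:
    value_dict[FieldRef("field_name")]
    lookup_failed = False
except KeyError:
    lookup_failed = True


def case_when(field_value, mapping, default=None):
    """Emulate Case(*[When(field_name=k, then=Value(v)) ...]) for one row."""
    for key, result in mapping.items():
        if field_value == key:
            return result
    return default


# One annotated value per row; unmatched rows get the default (None).
annotated = [case_when(value, value_dict) for value in ["a", "b", "zzz"]]
print(lookup_failed, annotated)
```

The list of When clauses moves the whole mapping into the query, so the database can pick the matching value per row instead of Python trying to pick it once up front.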
In Django, I have two models:
class A(models.Model):
    # lots of fields
class B(models.Model):
    a = models.ForeignKey(A)
    member = models.BooleanField()
I need to construct a query that filters B and selects all A, something like this:
result = B.objects.filter(member=True).a
The example above will of course raise an error: 'QuerySet' object has no attribute 'a'
Expected result:
a QuerySet containing only A objects
What's the best and fastest way to achieve the desired functionality?
I assume you are looking for something like
result = A.objects.filter(b__member=True)
An alternative to Andrey Zarubin's answer would be to iterate over the queryset you had and create a list of a objects.
b_objects = B.objects.filter(member=True)
a_objects = [result.a for result in b_objects]
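One thing to watch for with either approach: when several B rows with member=True point at the same A, that A shows up once per matching B. The effect can be seen with hand-written SQL on an in-memory SQLite database; the tables and rows below are hypothetical, and the SQL is a sketch of the join, not Django's generated query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE a (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE b (
        id INTEGER PRIMARY KEY,
        a_id INTEGER REFERENCES a(id),
        member INTEGER
    );
    INSERT INTO a (label) VALUES ('first'), ('second'), ('third');
    INSERT INTO b (a_id, member) VALUES (1, 1), (1, 1), (2, 0), (3, 1);
    """
)

# Joining A to B repeats an A row for every matching B row.
joined = conn.execute(
    "SELECT a.id FROM a INNER JOIN b ON b.a_id = a.id "
    "WHERE b.member = 1 ORDER BY a.id"
).fetchall()

# DISTINCT collapses the repeats so each A appears once.
distinct = conn.execute(
    "SELECT DISTINCT a.id FROM a INNER JOIN b ON b.a_id = a.id "
    "WHERE b.member = 1 ORDER BY a.id"
).fetchall()

print(joined, distinct)
```

In Django the usual remedy is to add .distinct() to the queryset. For the list-comprehension variant, using select_related('a') on the B queryset avoids one extra query per object.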
The code below does not select the A objects themselves, but it does filter B by a field of the related A, which might be what you are looking for:
B.objects.filter(member=True).filter(a__somefield='some value')