I have a QuerySet with 100 items. For each of them I need to know how many times its contract_number value occurs in the contract_number field.
Example of expected output:
[{'contract_number': 123, 'contract_count': 2}, {'contract_number': 456, 'contract_count': 1} ...]
This means that value 123 occurs 2 times for the whole contract_number field.
Important: I cannot reduce the number of items, so grouping won't work here.
The SQL equivalent for this would be an additional field contract_count as below:
SELECT *,
(SELECT count(contract_number) FROM table where t.contract_number = contract_number) as contract_count
FROM table as t
The question is how to do it with a Python object. After some research, I found that the QuerySet extra() method should be used for more complex queries. Below is one of my attempts, but the result is not what I expected:
queryset = Tracker.objects.extra(
    select={
        'contract_count': '''
            SELECT COUNT(*)
            FROM table
            WHERE contract_number = %s
        '''
    },
    select_params=(F('contract_number'),),
)
My models.py:
class Tracker(models.Model):
    contract_number = models.IntegerField()
EDIT:
The solution to my problem was Subquery()
You can use annotation like this:
from django.db.models import Count
Tracker.objects.values('contract_number').annotate(contract_count=Count('contract_number')).order_by()
Solutions:
from django.db.models import Count, OuterRef, Subquery
count_tracker = Tracker.objects.values('contract_number').annotate(Count('contract_number'))
subquery = count_tracker.filter(contract_number=OuterRef('contract_number')).values('contract_number__count')[:1]
tracker = Tracker.objects.annotate(count=Subquery(subquery))
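To check what such an annotation should return, the correlated subquery from the question can be run against an in-memory SQLite table. This is only a sketch: the table name tracker and the sample rows are assumptions made for illustration, not the real schema.

```python
import sqlite3

# Hypothetical in-memory table standing in for the Tracker model's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tracker (id INTEGER PRIMARY KEY, contract_number INTEGER)"
)
conn.executemany(
    "INSERT INTO tracker (contract_number) VALUES (?)",
    [(123,), (456,), (123,)],
)

# Correlated subquery: every row is kept, and each row gets the number of
# rows that share its contract_number.
rows = conn.execute(
    """
    SELECT t.contract_number,
           (SELECT COUNT(*)
            FROM tracker
            WHERE contract_number = t.contract_number) AS contract_count
    FROM tracker AS t
    ORDER BY t.id
    """
).fetchall()

result = [{"contract_number": n, "contract_count": c} for n, c in rows]
print(result)
```

Note that all three rows survive, which is the difference from the plain values().annotate() grouping.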
Suppose we have a model in Django defined as follows:
class Literal(models.Model):
    name = models.CharField(...)
    ...
The name field is not unique and can therefore have duplicate values. I need to accomplish the following task:
Select all rows from the model that have at least one duplicate value of the name field.
I know how to do it using plain SQL (maybe not the best solution):
select * from literal where name IN (
select name from literal group by name having count((name)) > 1
);
So, is it possible to select this using the Django ORM? Or is there a better SQL solution?
Try:
from django.db.models import Count
(
    Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1)
)
This is as close as you can get with Django. The problem is that this will return a ValuesQuerySet with only name and count. However, you can then use this to construct a regular QuerySet by feeding it back into another query:
dupes = (
    Literal.objects.values('name')
    .annotate(Count('id'))
    .order_by()
    .filter(id__count__gt=1)
)
Literal.objects.filter(name__in=[item['name'] for item in dupes])
This was rejected as an edit, so here it is as a separate answer:
dups = (
Literal.objects.values('name')
.annotate(count=Count('id'))
.values('name')
.order_by()
.filter(count__gt=1)
)
This will return a ValuesQuerySet with all of the duplicate names. However, you can then use this to construct a regular QuerySet by feeding it back into another query. The Django ORM is smart enough to combine these into a single query:
Literal.objects.filter(name__in=dups)
The extra call to .values('name') after the annotate call looks a little strange. Without this, the subquery fails. The extra values tricks the ORM into only selecting the name column for the subquery.
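The single combined query has the same shape as the plain SQL from the question. A minimal sketch with SQLite, assuming a table called literal and made-up names, shows which rows it selects:

```python
import sqlite3

# Hypothetical stand-in for the Literal model's table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE literal (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO literal (name) VALUES (?)",
    [("a",), ("b",), ("a",), ("c",), ("b",)],
)

# The inner query finds names that occur more than once; the outer query
# returns every full row that carries one of those names.
dupes = conn.execute(
    """
    SELECT id, name
    FROM literal
    WHERE name IN (
        SELECT name FROM literal GROUP BY name HAVING COUNT(id) > 1
    )
    ORDER BY id
    """
).fetchall()

print(dupes)
```

The row with the unique name "c" is the only one left out.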
Try using aggregation:
Literal.objects.values('name').annotate(name_count=Count('name')).exclude(name_count=1)
In case you use PostgreSQL, you can do something like this:
from django.contrib.postgres.aggregates import ArrayAgg
from django.db.models import Func, Value
duplicate_ids = (Literal.objects.values('name')
.annotate(ids=ArrayAgg('id'))
.annotate(c=Func('ids', Value(1), function='array_length'))
.filter(c__gt=1)
.annotate(ids=Func('ids', function='unnest'))
.values_list('ids', flat=True))
It results in this rather simple SQL query:
SELECT unnest(ARRAY_AGG("app_literal"."id")) AS "ids"
FROM "app_literal"
GROUP BY "app_literal"."name"
HAVING array_length(ARRAY_AGG("app_literal"."id"), 1) > 1
OK, so for some reason none of the above worked for me; it always returned <MultilingualQuerySet []>. I use the following solution, which is much easier to understand but not as elegant:
dupes = []
uniques = []
dupes_query = MyModel.objects.values_list('field', flat=True)
# Iterate over the raw values; wrapping them in set() first would drop the duplicates.
for value in dupes_query:
    if value not in uniques:
        uniques.append(value)
    else:
        dupes.append(value)
print(set(dupes))
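A more compact variant of the same in-Python idea uses collections.Counter. This is a sketch under the assumption that all values fit in memory; the values list is hypothetical and stands in for the result of values_list('field', flat=True).

```python
from collections import Counter

# Hypothetical sample standing in for the flat list of field values.
values = ["a", "b", "a", "c", "b", "a"]

# Count every value once, then keep those seen more than one time.
counts = Counter(values)
dupes = {value for value, n in counts.items() if n > 1}

print(dupes)
```

Counter does the counting in a single pass, whereas membership checks against a growing list get slower as the list grows.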
If you want only the list of names rather than the objects, you can use the following query:
repeated_names = Literal.objects.values('name').annotate(Count('id')).order_by().filter(id__count__gt=1).values_list('name', flat=True)
With the following models:
class OrderOperation(models.Model):
    ordered_articles = models.ManyToManyField(Article,
                                              through='orders.OrderedArticle')

class OrderedArticle(models.Model):
    order_operation = models.ForeignKey(OrderOperation)
    article = models.ForeignKey(Article)

articles = ...  # some queryset containing multiple articles
If I want to find order operations containing at least one article, this works as expected:
OrderOperation.objects.filter(ordered_articles__in=articles)
However, if I want to find order operations with all the articles in the order, what is the correct way to do it?
OrderOperation.objects.filter(ordered_articles=articles) raises a ProgrammingError: more than one row returned by a subquery used as an expression error (I understand why actually).
A simple solution:
order_operations = OrderOperation.objects.all()
for article in articles:
    order_operations = order_operations.filter(ordered_articles=article)
It's just one query, but with an inner join per article. For more than a few articles Willem’s more ingenious solution should perform better.
We can first construct a set of articles:
articles_set = set(articles)
Next we can count the number of articles related to the OrderOperation that appear in that set, and check if that number is equal to the size of that set, like:
from django.db.models import Count
OrderOperation.objects.filter(
ordered_articles__in=articles_set
).annotate(
narticles=Count('ordered_articles')
).filter(
narticles=len(articles_set)
)
Since in a ManyToManyField each Article normally occurs at most once per OrderOperation, if the number of related Articles that are in articles_set equals the number of elements in articles_set, we know that the OrderOperation contains every article in the set.
This will create a query that looks like:
SELECT orderoperation.*,
       COUNT(orderoperation_article.article_id) AS narticles
FROM orderoperation
JOIN orderoperation_article ON orderoperation_article.orderoperation_id = orderoperation.id
WHERE orderoperation_article.article_id IN (article_set)
GROUP BY orderoperation.id
HAVING COUNT(orderoperation_article.article_id) = len(article_set)
where the article_set and len(article_set) are of course replaced by the primary keys of the articles in the set, or the number of elements in that set.
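The filter, count, and compare idea can be tried out with SQLite. In this sketch the table names, the sample data, and the wanted set are assumptions for illustration only; the query shape follows the one described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orderoperation (id INTEGER PRIMARY KEY);
    CREATE TABLE orderoperation_article (
        orderoperation_id INTEGER,
        article_id INTEGER
    );
    INSERT INTO orderoperation (id) VALUES (1), (2), (3);
    -- operation 1 has articles 1, 2, 3; operation 2 has 1; operation 3 has 2, 3
    INSERT INTO orderoperation_article VALUES
        (1, 1), (1, 2), (1, 3), (2, 1), (3, 2), (3, 3);
    """
)

wanted = {1, 2}  # hypothetical primary keys of the required articles
placeholders = ", ".join("?" for _ in wanted)

# Keep only link rows for wanted articles, then require that the number of
# such rows per operation equals the number of wanted articles.
matching = [
    row[0]
    for row in conn.execute(
        f"""
        SELECT o.id
        FROM orderoperation AS o
        JOIN orderoperation_article AS oa ON oa.orderoperation_id = o.id
        WHERE oa.article_id IN ({placeholders})
        GROUP BY o.id
        HAVING COUNT(oa.article_id) = ?
        ORDER BY o.id
        """,
        [*wanted, len(wanted)],
    )
]

print(matching)
```

Only operation 1 contains both wanted articles; operations 2 and 3 each match just one of them and are dropped by the HAVING clause.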
Let's say I have a model:
class Testmodel(models.Model):
    amount = models.IntegerField(null=True)
    contact = models.CharField()
Now I am making a query like:
obj1 = Testmodel.objects.filter(contact=123)
Suppose it returns n objects (obj1, obj2, obj3, ...).
What is the best way to sum the amount of all the returned objects?
Any help will be appreciated.
It is usually better to do this at the database level than in Python. We can use .aggregate(..) for that:
from django.db.models import Sum
Testmodel.objects.filter(contact=123).aggregate(total=Sum('amount'))['total']
The .aggregate(total=Sum('amount')) will return a dictionary that contains a single key-value pair: 'total' will be associated with the sum of the amount of the rows. In case no rows are selected (i.e. the filter does not match anything), then it will associate None with the key.
Provided the database supports summing values (most databases do), this constructs a query similar to:
SELECT SUM(amount) AS total
FROM app_testmodel
WHERE contact = 123
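The None case is easy to reproduce with SQLite. In this sketch the table layout and rows are made up, and total_for is a hypothetical helper, not part of Django:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_testmodel "
    "(id INTEGER PRIMARY KEY, amount INTEGER, contact TEXT)"
)
conn.executemany(
    "INSERT INTO app_testmodel (amount, contact) VALUES (?, ?)",
    [(10, "123"), (None, "123"), (5, "123"), (7, "456")],
)


def total_for(contact):
    """Return SUM(amount) for one contact, as the database computes it."""
    row = conn.execute(
        "SELECT SUM(amount) AS total FROM app_testmodel WHERE contact = ?",
        (contact,),
    ).fetchone()
    return row[0]


print(total_for("123"))  # NULL amounts are skipped by SUM
print(total_for("999"))  # no matching rows: SUM yields NULL, i.e. None
```

If a numeric result is always needed, falling back with `total_for(...) or 0` on the Python side is one simple option.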
Use aggregate
from django.db.models import Sum
Testmodel.objects.filter(contact=123).aggregate(
total_sum=Sum('amount')
)
I am trying to make a very simple Subquery that uses OuterRef (not for practical purposes, but just to get it working), but I keep running into the same error.
posts/models.py code
from django.db import models

class Tag(models.Model):
    name = models.CharField(max_length=120)

    def __str__(self):
        return self.name

class Post(models.Model):
    title = models.CharField(max_length=120)
    tags = models.ManyToManyField(Tag)

    def __str__(self):
        return self.title
manage.py shell code
>>> from django.db.models import OuterRef, Subquery
>>> from posts.models import Tag, Post
>>> tag1 = Tag.objects.create(name='tag1')
>>> post1 = Post.objects.create(title='post1')
>>> post1.tags.add(tag1)
>>> Tag.objects.filter(post=post1.pk)
<QuerySet [<Tag: tag1>]>
>>> tags_list = Tag.objects.filter(post=OuterRef('pk'))
>>> Post.objects.annotate(count=Subquery(tags_list.count()))
The last two lines should give me the number of tags for each Post object. And here I keep getting the same error:
ValueError: This queryset contains a reference to an outer query and may only be used in a subquery.
One of the problems with your example is that you cannot use queryset.count() as a subquery, because .count() tries to evaluate the queryset and return the count.
So one may think that the right approach would be to use Count() instead. Maybe something like this:
Post.objects.annotate(
count=Count(Tag.objects.filter(post=OuterRef('pk')))
)
This won't work for two reasons:
1) The Tag queryset selects all Tag fields, while Count can only count on one field. Thus Tag.objects.filter(post=OuterRef('pk')).only('pk') is needed (to count on tag.pk).
2) Count itself is not a Subquery class; Count is an Aggregate. So the expression generated by Count is not recognized as a Subquery (OuterRef requires a subquery). We can fix that by using Subquery.
Applying fixes for 1) and 2) would produce:
Post.objects.annotate(
count=Count(Subquery(Tag.objects.filter(post=OuterRef('pk')).only('pk')))
)
However, if you inspect the query being produced:
SELECT
"tests_post"."id",
"tests_post"."title",
COUNT((SELECT U0."id"
FROM "tests_tag" U0
INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."post_id" = ("tests_post"."id"))
) AS "count"
FROM "tests_post"
GROUP BY
"tests_post"."id",
"tests_post"."title"
you will notice a GROUP BY clause. This is because COUNT is an aggregate function. Right now it does not affect the result, but in some other cases it may. That's why the docs suggest a different approach, where the aggregation is moved into the subquery via a specific combination of values + annotate + values :
Post.objects.annotate(
    count=Subquery(
        Tag.objects
        .filter(post=OuterRef('pk'))
        # The first .values() call defines our GROUP BY clause.
        # It's important to filter on every field listed here;
        # otherwise there will be more than one group per row,
        # the subquery will return more than one row,
        # and a subquery used as an expression may not do that.
        # In our example we group only by post
        # and we filter by post via OuterRef.
        .values('post')
        # Count how many rows we have per group.
        .annotate(count=Count('pk'))
        # Return only the count.
        .values('count')
    )
)
Finally this will produce:
SELECT
"tests_post"."id",
"tests_post"."title",
(SELECT COUNT(U0."id") AS "count"
FROM "tests_tag" U0
INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."post_id" = ("tests_post"."id")
GROUP BY U1."post_id"
) AS "count"
FROM "tests_post"
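One consequence of the GROUP BY inside the subquery is worth checking: a post without any tags produces no group at all, so the subquery yields NULL rather than 0. The sketch below runs the same SQL shape in SQLite with made-up rows. On the Django side, wrapping the Subquery in Coalesce (from django.db.models.functions) is a common way to get 0 instead, but treat that as a suggestion to verify rather than something the answer above states.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tests_post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tests_tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tests_post_tags (post_id INTEGER, tag_id INTEGER);
    INSERT INTO tests_post (id, title) VALUES (1, 'post1'), (2, 'post2');
    INSERT INTO tests_tag (id, name) VALUES (1, 'tag1'), (2, 'tag2');
    -- post1 has two tags, post2 has none
    INSERT INTO tests_post_tags (post_id, tag_id) VALUES (1, 1), (1, 2);
    """
)

# Same shape as the generated subquery: filtered and grouped by post.
SUBQUERY = """
    SELECT COUNT(U0.id)
    FROM tests_tag U0
    INNER JOIN tests_post_tags U1 ON (U0.id = U1.tag_id)
    WHERE U1.post_id = tests_post.id
    GROUP BY U1.post_id
"""

raw = conn.execute(
    f"SELECT id, title, ({SUBQUERY}) AS tag_count "
    "FROM tests_post ORDER BY id"
).fetchall()

# COALESCE replaces the NULL from an empty subquery result with 0.
coalesced = conn.execute(
    f"SELECT id, title, COALESCE(({SUBQUERY}), 0) AS tag_count "
    "FROM tests_post ORDER BY id"
).fetchall()

print(raw)
print(coalesced)
```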
The django-sql-utils package makes this kind of subquery aggregation simple. Just pip install django-sql-utils and then:
from sql_util.utils import SubqueryCount
posts = Post.objects.annotate(
tag_count=SubqueryCount('tag'))
The API for SubqueryCount is the same as Count, but it generates a subselect in the SQL instead of joining to the related table.
I have the following two models:
class A(models.Model):
    name = models.CharField()
    age = models.SmallIntegerField()

class B(models.Model):
    a = models.OneToOneField(A)
    salary = models.IntegerField()
Now I have records in both of them. I want to query model A by a known id and get both the A and B records.
The SQL query is:
SELECT A.id, A.name, A.age, B.salary
FROM A INNER JOIN B ON A.id = B.a_id
WHERE A.id=1
Please provide the Django query (using the ORM). I want to achieve this with one queryset.
q = B.objects.filter(a__id=id).values('salary', 'a__id', 'a__name', 'a__age')
This will return a ValuesQuerySet:
values
values(*fields) Returns a ValuesQuerySet — a QuerySet subclass that
returns dictionaries when used as an iterable, rather than
model-instance objects.
Each of those dictionaries represents an object, with the keys
corresponding to the attribute names of model objects.
You can actually print q.query to get the sql query behind the QuerySet, which in this case is exactly as you requested.
Please try this:
result = B.objects.filter(a__id=1).values('a__id', 'a__name', 'a__age', 'salary')
The result is a <class 'django.db.models.query.ValuesQuerySet'>, which is essentially a list of dictionaries with key as the field name and value as the actual value. If you want only the values, do this:
result = B.objects.filter(a__id=1).values_list('a__id', 'a__name', 'a__age', 'salary')
The result is a <class 'django.db.models.query.ValuesListQuerySet'>, and it's essentially a list of tuples.
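The difference between the two result shapes can be sketched with SQLite's own row types. The tables, the sample row, and the column aliases (chosen to mirror the a__ keys) are assumptions for illustration; this does not use Django itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows can be read by name or by position
conn.executescript(
    """
    CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INTEGER, salary INTEGER);
    INSERT INTO a (id, name, age) VALUES (1, 'Ann', 30);
    INSERT INTO b (id, a_id, salary) VALUES (10, 1, 5000);
    """
)

# The join from the question, with aliases that mimic the values() keys.
row = conn.execute(
    """
    SELECT a.id AS a__id, a.name AS a__name, a.age AS a__age, b.salary AS salary
    FROM a INNER JOIN b ON a.id = b.a_id
    WHERE a.id = ?
    """,
    (1,),
).fetchone()

as_dict = dict(row)    # comparable to one item of values(...)
as_tuple = tuple(row)  # comparable to one item of values_list(...)

print(as_dict)
print(as_tuple)
```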