I'm using Django ORM to handle my database queries. I have the following db tables:
resource
resource_pool
resource_pool_elem
reservation
and the following models:
class Resource(models.Model):
    name = models.CharField(max_length=200)
class Reservation(models.Model):
    pass
class ResourcePool(models.Model):
    reservation = models.ForeignKey(Reservation, related_name="pools", db_column="reservation")
    resources = models.ManyToManyField(Resource, through="ResourcePoolElem")
    mode = models.IntegerField()
class ResourcePoolElem(models.Model):
    resPool = models.ForeignKey(ResourcePool)
    resource = models.ForeignKey(Resource)
Currently, I need to query the resources used in a set of reservations. I use the following query:
resourcesNames = []
reservations = []
resources = models.Resource.objects.filter(
    name__in=resourcesNames, resPool__reservation__in=reservations).all()
which I think corresponds to a SQL query similar to this one:
select *
from resource r join resource_pool rp join resource_pool_elem rpe join reservation reserv
where r.id = rpe.resource and
      rpe.pool = rp.id and
      reserv.id = rp.reservation and
      r.name in (resourcesNames[0], ..., resourcesNames[n-1]) and
      reserv.id in (reservations[0], ..., reservations[n-1])
Now I want to add a restriction to this query. Each pool may have an exclusive-mode boolean flag. There will be an extra input list with the requested exclusive flag for each pool, and I only want the resources of pools whose exclusive flag matches the requested flag when exclusive = true, or the resources of pools whose exclusive flag is false. I could build the SQL query in Python with code similar to this:
query = """select *
from resource r join resource_pool rp join resource_pool_elem rpe
join reservation reserv
where r.id = rpe.resource and
rpe.pool = rp.id and
reserv.id = rp.reservation and
reserv.id in (reservations[0], ..., reservations[n-1]) and ("""
for i, name in enumerate(resourcesNames):
    if i > 0:
        query += " or "
    # Quote the name so the generated SQL is at least syntactically valid
    query += "r.name = '" + name + "'"
    if exclusive[i]:
        query += " and rp.mode = 0"
query += ")"
Is there a way to express this SQL query as a Django query?
Perhaps you can do this with Q objects. I have some trouble wrapping my head around your example, but let's look at it with a simpler model.
class Garage(models.Model):
    name = models.CharField()
class Vehicle(models.Model):
    wheels = models.IntegerField()
    gears = models.IntegerField()
    garage = models.ForeignKey(Garage)
Say you want to get all "multiple-wheeled" vehicles in the garage (e.g. all motorcycles and cars, but no unicycles), but for cars, you only want those with a CVT transmission, meaning they only have a single gear. (How this came up, no clue, but bear with me... ;) The following should give you that:
from django.db.models import Q
garage = Garage.objects.all()[0]
query = Vehicle.objects.filter(Q(garage=garage))
query = query.filter(Q(wheels=2) | (Q(wheels=4) & Q(gears=1)))
Given the following available data:
for v in Vehicle.objects.filter(garage=garage):
    print('Wheels: {}, Gears: {}'.format(v.wheels, v.gears))
Wheels: 1, Gears: 1
Wheels: 2, Gears: 4
Wheels: 2, Gears: 5
Wheels: 4, Gears: 1
Wheels: 4, Gears: 5
Running the query will give us:
for v in query:
    print('Wheels: {}, Gears: {}'.format(v.wheels, v.gears))
Wheels: 2, Gears: 4
Wheels: 2, Gears: 5
Wheels: 4, Gears: 1
Finally, to adapt it to your case, you might be able to use something along the following lines:
query = models.Resource.objects.filter(Q(resPool__reservation__in=reservations))
query = query.filter(Q(name__in=resourcesNames))
query = query.filter(Q(resPool__exclusive=True) & Q(resPool__mode=0))
You could use a Django database cursor to run raw SQL queries. For instance (see the documentation):
https://docs.djangoproject.com/en/dev/topics/db/sql/
from django.db import connection
def my_custom_sql(self):
    cursor = connection.cursor()
    cursor.execute("UPDATE bar SET foo = 1 WHERE baz = %s", [self.baz])
    cursor.execute("SELECT foo FROM bar WHERE baz = %s", [self.baz])
    row = cursor.fetchone()
    return row
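If you do drop to raw SQL for the original question, it is safer to build the per-name conditions with placeholders than to concatenate the values into the string. Below is a minimal sketch that mirrors the question's pseudocode. It uses the standard-library sqlite3 module instead of Django's cursor, and the helper name build_resource_query, the simplified table layout and the sample rows are all assumptions made for illustration. With Django's cursor the placeholder would be %s rather than ?.

```python
import sqlite3


def build_resource_query(names, exclusive, reservation_ids):
    """Return (sql, params); names[i] only matches mode-0 pools when exclusive[i] is true."""
    if not names or not reservation_ids:
        raise ValueError("names and reservation_ids must not be empty")
    params = list(reservation_ids)  # placeholders for the IN (...) list come first
    name_clauses = []
    for name, is_exclusive in zip(names, exclusive):
        if is_exclusive:
            name_clauses.append("(r.name = ? AND rp.mode = 0)")
        else:
            name_clauses.append("r.name = ?")
        params.append(name)
    reservation_marks = ", ".join("?" for _ in reservation_ids)
    sql = (
        "SELECT DISTINCT r.name FROM resource r "
        "JOIN resource_pool_elem rpe ON rpe.resource = r.id "
        "JOIN resource_pool rp ON rp.id = rpe.pool "
        "WHERE rp.reservation IN (" + reservation_marks + ") "
        "AND (" + " OR ".join(name_clauses) + ") "
        "ORDER BY r.name"
    )
    return sql, params


# Hypothetical sample data: 'cpu' sits in a mode-0 pool, 'gpu' in a mode-1 pool.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE resource (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE resource_pool (id INTEGER PRIMARY KEY, reservation INTEGER, mode INTEGER);
    CREATE TABLE resource_pool_elem (pool INTEGER, resource INTEGER);
    INSERT INTO resource VALUES (1, 'cpu'), (2, 'gpu');
    INSERT INTO resource_pool VALUES (10, 100, 0), (11, 100, 1);
    INSERT INTO resource_pool_elem VALUES (10, 1), (11, 2);
""")

sql, params = build_resource_query(["cpu", "gpu"], [True, False], [100])
print(conn.execute(sql, params).fetchall())
```

Only the structure of the statement (how many placeholders, which clauses) is assembled from Python values; the names and ids themselves always travel as parameters.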
I have the following data structure:
from enum import IntEnum, unique
from pathlib import Path
from datetime import datetime
from peewee import *
@unique
class Status(IntEnum):
    CREATED = 0
    FAIL = -1
    SUCCESS = 1
db_path = Path(__file__).parent / "test.sqlite"
database = SqliteDatabase(db_path)
class BaseModel(Model):
    class Meta:
        database = database
class Unit(BaseModel):
    name = TextField(unique=True)
    some_field = TextField(null=True)
    created_at = DateTimeField(default=datetime.now)
class Campaign(BaseModel):
    id_ = AutoField()
    created_at = DateTimeField(default=datetime.now)
class Task(BaseModel):
    id_ = AutoField()
    status = IntegerField(default=Status.CREATED)
    unit = ForeignKeyField(Unit, backref="tasks")
    campaign = ForeignKeyField(Campaign, backref="tasks")
The following code creates units, a campaign, and tasks:
def fill_units(count):
    units = []
    with database.atomic():
        for i in range(count):
            units.append(Unit.create(name=f"unit{i}"))
    return units
def init_campaign(count):
    units = Unit.select().limit(count)
    with database.atomic():
        campaign = Campaign.create()
        for unit in units:
            Task.create(unit=unit, campaign=campaign)
    return campaign
The problem appears when I try to add more units to an existing campaign. I need to select units which haven't been used in this campaign yet. In SQL I can do this with the following query:
SELECT * FROM unit WHERE id NOT IN (SELECT unit_id FROM task WHERE campaign_id = 1) LIMIT 10
But how can I do this using peewee?
The only way I've found so far is:
def get_new_units_for_campaign(campaign, count):
    unit_names = [task.unit.name for task in campaign.tasks]
    units = Unit.select().where(Unit.name.not_in(unit_names)).limit(count)
    return units
It somehow works, but I'm 100% sure that it's the dumbest way to implement this. Could you show me the proper way to do it?
Finally I found this:
Unit.select().where(Unit.id.not_in(campaign.tasks.select(Task.unit))).limit(10)
Which produces
SELECT "t1"."id", "t1"."name", "t1"."some_field", "t1"."created_at" FROM "unit" AS "t1" WHERE ("t1"."id" NOT IN (SELECT "t2"."unit_id" FROM "task" AS "t2" WHERE ("t2"."campaign_id" = 1))) LIMIT 10
This matches the SQL query I provided in my question.
P.S. I've done some research and it seems to be a proper implementation, but I'd appreciate it if somebody corrected me and showed a better way (if one exists).
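To see what the generated statement does without setting up peewee, here is a small sketch that runs the same NOT IN subquery against an in-memory SQLite database through the standard-library sqlite3 module. The helper name unused_unit_names and the sample rows are made up for illustration, and the tables are reduced to the columns the query touches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE unit (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE task (id INTEGER PRIMARY KEY, unit_id INTEGER NOT NULL, campaign_id INTEGER NOT NULL);
    INSERT INTO unit (id, name) VALUES (1, 'unit0'), (2, 'unit1'), (3, 'unit2'), (4, 'unit3');
    INSERT INTO task (unit_id, campaign_id) VALUES (1, 1), (2, 1), (3, 2);
""")


def unused_unit_names(campaign_id, count):
    """Names of up to `count` units that have no task in the given campaign."""
    rows = conn.execute(
        "SELECT name FROM unit "
        "WHERE id NOT IN (SELECT unit_id FROM task WHERE campaign_id = ?) "
        "ORDER BY id LIMIT ?",
        (campaign_id, count),
    )
    return [name for (name,) in rows]


print(unused_unit_names(1, 10))
print(unused_unit_names(2, 1))
```

One thing to keep in mind with this pattern: if the subquery can return NULL, NOT IN matches nothing. The unit_id column is declared NOT NULL here, which is also what a non-nullable foreign key gives you.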
I have a table with the following columns:
key
time
value
And I need a query like this:
SELECT
    "time",
    SUM("value")
FROM (
    SELECT
        "key",
        django_trunc_datetime("time") AS "time",
        AVG("value") AS "value"
    FROM my_table
    GROUP BY "key", django_trunc_datetime("time")
) AS sub
GROUP BY "time"
Is it possible in Django ORM? Maybe with some fake model based on the subquery?
Thanks
UPDATED:
It looks like I have to create five database views (because django_trunc_datetime takes Hour/Day/Week/Month/Year arguments), but that could perform badly because in that case I can't apply the earlier filtering before aggregating. :(
I also thought about SQLAlchemy, but it doesn't have a universal datetime truncation function.
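For reference, the target query itself (sum of the per-key averages within each time bucket) is easy to try out in plain SQL. The sketch below uses the standard-library sqlite3 module; strftime stands in for django_trunc_datetime, and the function name sum_of_key_averages, the TRUNC_FORMATS mapping and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table ("key" TEXT, "time" TEXT, "value" REAL);
    INSERT INTO my_table VALUES
        ('a', '2020-01-01 10:05:00', 2.0),
        ('a', '2020-01-01 10:45:00', 4.0),
        ('b', '2020-01-01 10:30:00', 10.0),
        ('a', '2020-01-01 11:15:00', 6.0);
""")

# strftime patterns that truncate a timestamp to the start of the hour or day.
TRUNC_FORMATS = {"hour": "%Y-%m-%d %H:00:00", "day": "%Y-%m-%d 00:00:00"}


def sum_of_key_averages(kind):
    """Average "value" per (key, bucket) in the subquery, then sum the averages per bucket."""
    sql = """
        SELECT bucket, SUM(avg_value)
        FROM (
            SELECT "key", strftime(?, "time") AS bucket, AVG("value") AS avg_value
            FROM my_table
            GROUP BY "key", bucket
        ) AS per_key
        GROUP BY bucket
        ORDER BY bucket
    """
    return conn.execute(sql, (TRUNC_FORMATS[kind],)).fetchall()


print(sum_of_key_averages("hour"))
print(sum_of_key_averages("day"))
```

Because the truncation pattern is just a parameter, one statement covers every granularity, and any earlier filtering can go into the inner WHERE clause instead of requiring a separate view per granularity.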
SOLUTION
A solution with the Django ORM (not a complete solution, but it illustrates the idea):
from datetime import datetime
from django.db import models
from django.db.models import F, Max
class TheApp(models.Model):
    a = models.DateTimeField()
    b = models.IntegerField()
class B(models.Model):
    class Meta:
        managed = False
    c = models.DateTimeField()
    d = models.IntegerField()
TheApp.objects.create(a=datetime.now(), b=4)
TheApp.objects.create(a=datetime.now(), b=5)
TheApp.objects.create(a=datetime.now(), b=7)
# Inner query, grouped by c
q1 = TheApp.objects.annotate(c=F('b'), d=Max('a')).values('c', 'd', 'id').query
q1.group_by = ('c',)
# Outer query over the unmanaged model B, grouped by a
q2 = B.objects.annotate(a=F('c') * 2, b=Max('d')).values('a', 'b', 'id').query
q2.group_by = ('a',)
# Splice the inner query in as the FROM source of the outer one
q3 = str(q2).replace('theapp_b', 'sub').replace('FROM "sub" ', f'FROM ({q1}) AS "sub" ')
print(q3)
print(list(B.objects.raw(q3)))
The solution I have chosen:
Use SQLAlchemy via aldjemy
My model is:
class AndroidOffer(models.Model):
    name = models.CharField(max_length=128, db_index=True)
    # ...
    countries = models.ManyToManyField(Country)
And the following code (I skipped previous filtering):
active_offers = active_offers.filter(countries__in=[country])
It generates this SQL query:
SELECT "offers_androidoffer"."id", "offers_androidoffer"."name", "offers_androidoffer"."title", "offers_androidoffer"."is_for_android", "offers_androidoffer"."is_for_ios", "offers_androidoffer"."url", "offers_androidoffer"."icon", "offers_androidoffer"."cost", "offers_androidoffer"."quantity", "offers_androidoffer"."hourly_completions", "offers_androidoffer"."is_active", "offers_androidoffer"."description", "offers_androidoffer"."comment", "offers_androidoffer"."priority", "offers_androidoffer"."offer_type", "offers_androidoffer"."package_name", "offers_androidoffer"."is_search_install", "offers_androidoffer"."search_query", "offers_androidoffer"."launches" FROM "offers_androidoffer" INNER JOIN "offers_androidoffer_platform_versions" ON ("offers_androidoffer"."id" = "offers_androidoffer_platform_versions"."androidoffer_id") INNER JOIN "offers_androidoffer_countries" ON ("offers_androidoffer"."id" = "offers_androidoffer_countries"."androidoffer_id") WHERE ("offers_androidoffer"."is_active" = True AND "offers_androidoffer"."quantity" > 0 AND NOT ("offers_androidoffer"."id" IN (SELECT U0."offer_id" FROM "offers_androidofferstate" U0 WHERE (U0."device_id" = 1 AND (U0."state" = 3 OR U0."state" = 4)))) AND NOT ("offers_androidoffer"."package_name" IN (SELECT V0."package_name" FROM "applications_app" V0 INNER JOIN "applications_deviceapp" V1 ON (V0."id" = V1."app_id") WHERE (V1."device_id" IN (SELECT U0."device_id" FROM "users_userdevice" U0 WHERE U0."user_id" = 2) AND NOT (V0."package_name" IN (SELECT U2."package_name" FROM "offers_androidofferstate" U0 INNER JOIN "offers_androidoffer" U2 ON (U0."offer_id" = U2."id") WHERE (U0."device_id" = 1 AND (U0."state" = 0 OR U0."state" = 1 OR U0."state" = 2))))))) AND "offers_androidoffer_platform_versions"."platformversion_id" IN (14) AND "offers_androidoffer_countries"."country_id" IN (6252001)) ORDER BY "offers_androidoffer"."priority" DESC;
If I run this query in the PostgreSQL console, it returns 0 rows, but active_offers has 4 results (all rows in the table), as if I had removed the AND "offers_androidoffer_countries"."country_id" IN (6252001) condition.
I run this code from tests (APITestCase.client -> DRF view -> filter queryset). Django version is 2.0.2.
Why does it ignore the country filter?
UPD. I've just checked with a simple TestCase (test -> filter queryset) and it returns the correct number of rows. So the problem exists only with DRF testing.
UPD 2. Test case where it works incorrectly:
class AndroidOffersListTests(APITestCase):
    fixtures = [
        'geo/fixtures/cities.json',
        'offers/fixtures/users.json',
        'offers/fixtures/devices.json',
        'offers/fixtures/geo.json',
        'offers/fixtures/apps.json',
        'offers/fixtures/offers.json',
    ]
    def test_list_offers_1(self):
        user_device = UserDevice.objects.get(pk=1)
        token = AndroidOffersListTests.get_token_for_device(user_device)
        self.client.credentials(HTTP_AUTHORIZATION='Token {}'.format(token))
        url = AndroidOffersListTests.get_url(user_device)
        response = self.client.get(url)
        self.assertEqual(status.HTTP_200_OK, response.status_code)
        self.assertEqual(0, len(response.data))  # result is 4
View code:
class AndroidOffersView(ListAPIView):
    model = AndroidOffer
    serializer_class = AndroidOffersSerializer
    permission_classes = (IsAuthenticated,)
    def get_queryset(self):
        device = UserDevice.get_from_request(self.request)
        if device is None:
            raise PermissionDenied()
        return AndroidOffer.get_offers_for_device(device)
get_offers_for_device:
@staticmethod
def get_offers_for_device(user_device):
    active_offers = AndroidOffer.get_active_offers()
    # Filter completed
    completed_states = AndroidOfferState.get_completed_for_device(user_device)
    completed_offers_ids = completed_states.values_list('offer__pk', flat=True)
    active_offers = active_offers.exclude(pk__in=completed_offers_ids)
    # Filter apps already installed on the user's devices
    apps = user_device.user.apps
    # Remove packages that are in progress
    in_progress_states = AndroidOfferState.get_in_progress_for_device(user_device)
    in_progress_packages = in_progress_states.values_list('offer__package_name', flat=True)
    apps = apps.exclude(package_name__in=in_progress_packages)
    packages = apps.values_list('package_name', flat=True)
    active_offers = active_offers.exclude(package_name__in=packages)
    # Filter by platform version
    active_offers = active_offers.filter(platform_versions__in=[user_device.device.version])
    # Filter by country
    country = user_device.last_geo_record.country
    if country is not None:
        active_offers = active_offers.filter(countries__in=[country])
    return active_offers
Test case where it works fine:
class AndroidOffersListTests(TestCase):
    fixtures = [
        'geo/fixtures/cities.json',
        'offers/fixtures/users.json',
        'offers/fixtures/devices.json',
        'offers/fixtures/geo.json',
        'offers/fixtures/apps.json',
        'offers/fixtures/offers.json',
    ]
    def test_list_offers_1(self):
        user_device = UserDevice.objects.get(pk=1)
        offers = AndroidOffer.get_offers_for_device(user_device)
        self.assertEqual(0, offers.count())  # 0, that's OK
UPD 3: when I'm running the same request in browser, it works fine:
You said this response is incorrect:
self.assertEqual(0, len(response.data)) # result is 4
But you also say this JSON response is correct:
{
"count": 0,
"next": null,
"previous": null,
"results": []
}
You're using a paginated API here. The length of 4 is due to the number of keys present in the deserialized json:
>>> len(json.loads('{"count": 0, "next": null, "previous": null, "results": []}'))
4
Note that you don't actually need to call json.loads yourself; DRF has already handled that for you when preparing the response, i.e. response.data will already be a dict.
In the "Test case where it works fine", you're dealing with the queryset directly:
self.assertEqual(0, offers.count()) # 0 — thats ok
^
|____ here you go to the database, no serializer!
If you want to check the number of results from the paginated JSON API, you'll need to drill down into that page:
len_results = len(response.data['results'])
For a test that is expected to return 0 results, this is sufficient. But take care: if you ever have tests which you expect to generate more results than the page size (configured in the settings), you may also want to check the count and next values. You'll have to make additional requests to subsequent pages to collect all results.
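A rough sketch of both points, using plain dicts in place of real responses. The helper collect_results, the PAGES mapping and its URLs are hypothetical; in an APITestCase the fetch_page argument could be something like `lambda url: self.client.get(url).data`.

```python
import json


def collect_results(fetch_page, first_url):
    """Follow the `next` links of a paginated payload and gather every item."""
    results = []
    url = first_url
    while url is not None:
        page = fetch_page(url)
        results.extend(page["results"])
        url = page["next"]
    return results


# The empty page from the question: four top-level keys, zero results.
empty_page = json.loads('{"count": 0, "next": null, "previous": null, "results": []}')
print(len(empty_page), len(empty_page["results"]))

# Hypothetical two-page listing standing in for the API.
PAGES = {
    "/offers/": {"count": 3, "next": "/offers/?page=2", "previous": None, "results": [1, 2]},
    "/offers/?page=2": {"count": 3, "next": None, "previous": "/offers/", "results": [3]},
}
all_items = collect_results(PAGES.__getitem__, "/offers/")
print(all_items, len(all_items) == PAGES["/offers/"]["count"])
```

Comparing the collected length against count is a cheap sanity check that no page was skipped.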
field__in checks if the field is in the list that you pass in to it.
You can get your desired behavior with just this
active_offers = active_offers.filter(countries=country)
Assuming I have two models:
class Profile(models.Model):
    # some fields here
    pass
class Ratings(models.Model):
    profile = models.ForeignKey(Profile)
    category = models.IntegerField()
    points = models.IntegerField()
Assuming the following example of the MySQL table "ratings":
profile | category | points
1       | 1        | 10
1       | 1        | 4
1       | 2        | 10
1       | 3        | 0
1       | 4        | 10
1       | 4        | 10
1       | 4        | 10
1       | 5        | 0
I have the following values in my POST data, along with values for other fields:
category_1_avg_val = 7
category_2_avg_val = 5
category_3_avg_val = 5
category_4_avg_val = 7
category_5_avg_val = 9
I want to filter profiles whose average rating in each category is higher than or equal to the required value.
Some filters are applied initially:
q1 = [('associated_with', search_for),
      ('profile_type__slug__exact', profile_type),
      ('gender__in', gender),
      ('rank__in', rank),
      ('styles__style__in', styles),
      ('age__gte', age_from),
      ('age__lte', age_to)]
q1_list = [Q(x) for x in q1 if x[1]]
q2 = [('user__first_name__icontains', search_term),
      ('user__last_name__icontains', search_term),
      ('profile_type__name__icontains', search_term),
      ('styles__style__icontains', search_term),
      ('rank__icontains', search_term)]
q2_list = [Q(x) for x in q2 if x[1]]
if q1_list:
    objects = Profile.objects.filter(
        reduce(operator.and_, q1_list))
if q2_list:
    if objects:
        objects = objects.filter(
            reduce(operator.or_, q2_list))
    else:
        objects = Profile.objects.filter(
            reduce(operator.or_, q2_list))
if order_by_ranking_level == 'desc':
    objects = objects.order_by('-ranking_level').distinct()
else:
    objects = objects.order_by('ranking_level').distinct()
Now I want to filter profiles whose average points, grouped by category, are >= the average value for that category coming in the POST data.
I tried to do this one category at a time:
objects = objects.filter(
    ratings__category=1) \
    .annotate(avg_points=Avg('ratings__points'))\
    .filter(avg_points__gte=category_1_avg_val)
objects = objects.filter(
    ratings__category=2) \
    .annotate(avg_points=Avg('ratings__points'))\
    .filter(avg_points__gte=category_2_avg_val)
But I think this is wrong. Please help me out. It would be great if the result were a queryset.
Edited
Using the answer posted by hynekcer, I came up with a slightly different solution, since I already have a queryset of profiles that needs to be filtered further based on ratings.
def check_ratings_avg(pr, rtd):
    ok = True
    qr = Ratings.objects.filter(profile__id=pr.id) \
        .values('category')\
        .annotate(points_avg=Avg('points'))
    qr = {i['category']:i['points_avg'] for i in qr}
    for cat in rtd:
        val = rtd[cat]
        if qr[cat] >= val:
            pass
        else:
            ok = False
            break
    return ok
rtd = {1: category_1_avg_val, 2: category_2_avg_val, 3: category_3_avg_val,
       4: category_4_avg_val, 5: category_5_avg_val}
objects = [i for i in objects if check_ratings_avg(i, rtd)]
Your complex query requires a subquery in principle. Possible solutions are:
A subquery written with the 'extra' queryset method or a raw SQL query. It is not DRY and it was unsupported by some db backends, e.g. by some versions of MySQL; however, subqueries have been used in a limited way since Django 1.1.
Saving intermediate results into a temporary table in the database. It is not nice in Django.
Emulation of the outer query by a loop in Python. The best universal solution. A loop in Python over database data aggregated by the first query can aggregate and filter the data fast enough.
A) Subquery emulated by Python
from django.db.models import Q, Avg
from itertools import groupby
from myapp.models import Profile, Ratings
def iterator_filtered_by_average(dictionary):
    # Group by (profile, category) only; including 'points' would split the groups
    qr = Ratings.objects.values('profile', 'category').order_by(
        'profile', 'category').annotate(points_avg=Avg('points'))
    f = Q()
    for k, v in dictionary.items():
        f |= Q(category=k, points_avg__gte=v)
    # A profile qualifies when every requested category passed its threshold
    for profile, grp in groupby(qr.filter(f).values('profile')):
        if len(list(grp)) == len(dictionary):
            yield profile
# example
FILTER_DATA = {1:category_1_avg_val, 2:category_2_avg_val, 3:category_3_avg_val,
               4:category_4_avg_val, 5:category_5_avg_val}
for row in iterator_filtered_by_average(FILTER_DATA):
    print(row)
This is a simple solution for the original question without later additional requirements.
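The same idea can be tried without a database. The following sketch works on plain tuples taken from the question's example table; the function names category_averages and profiles_meeting are made up for illustration, and a profile with no ratings in a required category is treated as failing that category (an assumption, since the question does not say).

```python
from itertools import groupby
from operator import itemgetter

# (profile, category, points) rows from the question's example table
RATINGS = [
    (1, 1, 10), (1, 1, 4), (1, 2, 10), (1, 3, 0),
    (1, 4, 10), (1, 4, 10), (1, 4, 10), (1, 5, 0),
]


def category_averages(rows):
    """Map (profile, category) to the mean of its points."""
    keyfunc = itemgetter(0, 1)
    averages = {}
    # groupby only merges adjacent rows, so sort by the grouping key first
    for key, group in groupby(sorted(rows, key=keyfunc), key=keyfunc):
        points = [row[2] for row in group]
        averages[key] = sum(points) / len(points)
    return averages


def profiles_meeting(rows, required):
    """Profiles whose average reaches the threshold in every required category."""
    averages = category_averages(rows)
    profiles = sorted({profile for profile, _ in averages})
    return [
        p for p in profiles
        if all(averages.get((p, cat), float("-inf")) >= minimum
               for cat, minimum in required.items())
    ]


print(category_averages(RATINGS)[(1, 1)])
print(profiles_meeting(RATINGS, {1: 7, 2: 5, 3: 5, 4: 7, 5: 9}))
print(profiles_meeting(RATINGS, {1: 7, 2: 5, 4: 7}))
```

With the thresholds from the question, profile 1 is rejected because its averages in categories 3 and 5 are 0.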
B) Solution with subqueries:
This is necessary for the more detailed version of the question, because the initial filters are based on a field of type ManyToManyField and because the query contains a distinct clause:
# objects: QuerySet that you get from your initial filters. Not yet executed.
if rtd:
    # Method `as_nested_sql` removes the `order_by` clause, unlike `as_sql`
    subquery3 = objects.values('id').query \
        .get_compiler(connection=connection).as_nested_sql()
    subquery2 = ("""SELECT profile_id, category, avg(points) AS points_avg
        FROM myapp_ratings
        WHERE profile_id in
        ( %s
        ) GROUP BY profile_id, category
        """ % subquery3[0], subquery3[1]
    )
    where_sql = ' OR '.join(
        'category = %d AND points_avg >= %%s' % cat for cat in rtd.keys()
    )
    subquery = (
        """SELECT profile_id
        FROM
        ( %s
        ) subquery2
        WHERE %s
        GROUP BY profile_id
        HAVING count(*) = %s
        """ % (subquery2[0], where_sql, len(rtd)),
        subquery2[1] + tuple(rtd.values())
    )
    assert order_by_ranking_level in ('asc', 'desc')
    mainquery = ("""SELECT myapp_profile.* FROM myapp_profile
        INNER JOIN
        ( %s
        ) subquery ON subquery.profile_id=myapp_profile.id
        ORDER BY ranking_level %s"""
        % (subquery[0], order_by_ranking_level), subquery[1]
    )
    objects = Profile.objects.raw(mainquery[0], params=mainquery[1])
return objects
Please replace every occurrence of the string myapp with name_of_your_application.
Example of SQL generated by this code
SELECT myapp_profile.* FROM myapp_profile
INNER JOIN
  ( SELECT profile_id
    FROM
      ( SELECT profile_id, category, avg(points) AS points_avg
        FROM myapp_ratings
        WHERE profile_id IN
          ( SELECT U0.`id` FROM `myapp_profile` U0 WHERE U0.`ranking_level` >= 4
          ) GROUP BY profile_id, category
      ) subquery2
    WHERE category = 1 AND points_avg >= 7 OR category = 2 AND points_avg >= 5
      OR category = 3 AND points_avg >= 5 OR category = 4 AND points_avg >= 7
      OR category = 5 AND points_avg >= 9
    GROUP BY profile_id
    HAVING count(*) = 5
  ) subquery ON subquery.profile_id=myapp_profile.id
ORDER BY ranking_level asc
(For better readability, this SQL was formatted manually with the %s placeholders replaced by their parameters; the database engine, however, receives the parameters separately for security reasons.)
Your problem is due to Django's limited support for generating subqueries. Only some of the more complicated queries from the documentation create a subquery (e.g. aggregate after annotate, count after annotate, or aggregate after distinct, but not annotate after distinct or annotate after annotate). Complicated nested aggregations are simplified to one query, which is unexpected.
All other solutions that execute a new individual SQL query for every object filtered by the first query are discouraged for production, although they can be very useful for testing the results of any better solution.
You could add methods to a manager:
# Untested code
class ProfileManager(models.Manager):
    def with_category_average(self, cat, avg):
        # Give each filter a unique annotation key
        key = 'avg_pts_' + str(cat)
        return self.filter(ratings__category=cat) \
            .annotate(**{key: Avg('ratings__points')}) \
            .filter(**{key + '__gte': avg})
    # Expects a dict of `cat: avg` pairs
    def filter_by_averages(self, avg_dict):
        qs = self.get_query_set()
        for key, val in avg_dict.items():
            qs &= self.with_category_average(key, val)
        return qs
class Tag(models.Model):
    name = models.CharField(max_length=100)
class Blog(models.Model):
    name = models.CharField(max_length=100)
    tags = models.ManyToManyField(Tag)
These are simple models, just to ask my question.
I wonder how I can query blogs by tags in the following ways.
Blog entries that are tagged with "tag1" or "tag2":
Blog.objects.filter(tags__in=[1,2]).distinct()
Blog objects that are tagged with "tag1" and "tag2": ?
Blog objects that are tagged with exactly "tag1" and "tag2" and nothing else: ??
Tag and Blog are just used as an example.
You could use Q objects for #1:
# Blogs who have either hockey or django tags.
from django.db.models import Q
Blog.objects.filter(
    Q(tags__name__iexact='hockey') | Q(tags__name__iexact='django')
)
Unions and intersections are, I believe, a bit outside the scope of the Django ORM, but it's possible to do these. The following examples are from a Django application called django-tagging that provides this functionality.
For part two, you're looking for a union of two queries, basically. See line 346 of models.py:
def get_union_by_model(self, queryset_or_model, tags):
    """
    Create a ``QuerySet`` containing instances of the specified
    model associated with *any* of the given list of tags.
    """
    tags = get_tag_list(tags)
    tag_count = len(tags)
    queryset, model = get_queryset_and_model(queryset_or_model)
    if not tag_count:
        return model._default_manager.none()
    model_table = qn(model._meta.db_table)
    # This query selects the ids of all objects which have any of
    # the given tags.
    query = """
    SELECT %(model_pk)s
    FROM %(model)s, %(tagged_item)s
    WHERE %(tagged_item)s.content_type_id = %(content_type_id)s
      AND %(tagged_item)s.tag_id IN (%(tag_id_placeholders)s)
      AND %(model_pk)s = %(tagged_item)s.object_id
    GROUP BY %(model_pk)s""" % {
        'model_pk': '%s.%s' % (model_table, qn(model._meta.pk.column)),
        'model': model_table,
        'tagged_item': qn(self.model._meta.db_table),
        'content_type_id': ContentType.objects.get_for_model(model).pk,
        'tag_id_placeholders': ','.join(['%s'] * tag_count),
    }
    cursor = connection.cursor()
    cursor.execute(query, [tag.pk for tag in tags])
    object_ids = [row[0] for row in cursor.fetchall()]
    if len(object_ids) > 0:
        return queryset.filter(pk__in=object_ids)
    else:
        return model._default_manager.none()
For part #3 I believe you're looking for an intersection. See line 307 of models.py:
def get_intersection_by_model(self, queryset_or_model, tags):
    """
    Create a ``QuerySet`` containing instances of the specified
    model associated with *all* of the given list of tags.
    """
    tags = get_tag_list(tags)
    tag_count = len(tags)
    queryset, model = get_queryset_and_model(queryset_or_model)
    if not tag_count:
        return model._default_manager.none()
    model_table = qn(model._meta.db_table)
    # This query selects the ids of all objects which have all the
    # given tags.
    query = """
    SELECT %(model_pk)s
    FROM %(model)s, %(tagged_item)s
    WHERE %(tagged_item)s.content_type_id = %(content_type_id)s
      AND %(tagged_item)s.tag_id IN (%(tag_id_placeholders)s)
      AND %(model_pk)s = %(tagged_item)s.object_id
    GROUP BY %(model_pk)s
    HAVING COUNT(%(model_pk)s) = %(tag_count)s""" % {
        'model_pk': '%s.%s' % (model_table, qn(model._meta.pk.column)),
        'model': model_table,
        'tagged_item': qn(self.model._meta.db_table),
        'content_type_id': ContentType.objects.get_for_model(model).pk,
        'tag_id_placeholders': ','.join(['%s'] * tag_count),
        'tag_count': tag_count,
    }
    cursor = connection.cursor()
    cursor.execute(query, [tag.pk for tag in tags])
    object_ids = [row[0] for row in cursor.fetchall()]
    if len(object_ids) > 0:
        return queryset.filter(pk__in=object_ids)
    else:
        return model._default_manager.none()
I've tested these out with Django 1.0:
The "or" queries:
Blog.objects.filter(tags__name__in=['tag1', 'tag2']).distinct()
or you could use the Q class:
Blog.objects.filter(Q(tags__name='tag1') | Q(tags__name='tag2')).distinct()
The "and" query:
Blog.objects.filter(tags__name='tag1').filter(tags__name='tag2')
I'm not sure about the third one, you'll probably need to drop to SQL to do it.
Please don't reinvent the wheel; use the django-tagging application, which was made exactly for your use case. It can do all the queries you describe, and much more.
If you need to add custom fields to your Tag model, you can also take a look at my branch of django-tagging.
This will do the trick for you:
Blog.objects.filter(tags__name__in=['tag1', 'tag2']).annotate(tag_matches=models.Count('tags')).filter(tag_matches=2)
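To compare the three variants from the question side by side (any of the tags, all of the tags, exactly these tags and nothing else), here is a sketch in plain SQL run through the standard-library sqlite3 module. The table layout, the helper name blogs_with_tags and the sample rows are assumptions for illustration; it also assumes the requested tag names are distinct and that the join table holds no duplicate pairs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE blog_tags (blog_id INTEGER, tag_id INTEGER);
    INSERT INTO blog VALUES (1, 'only-tag1'), (2, 'both'), (3, 'both-plus-extra'), (4, 'untagged');
    INSERT INTO tag VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3');
    INSERT INTO blog_tags VALUES (1, 1), (2, 1), (2, 2), (3, 1), (3, 2), (3, 3);
""")


def blogs_with_tags(names, mode):
    """mode is 'any', 'all' or 'exact'; returns blog names ordered by id."""
    marks = ", ".join("?" for _ in names)
    # Number of this blog's tags that are among the requested names
    matched = "SUM(CASE WHEN t.name IN (" + marks + ") THEN 1 ELSE 0 END)"
    having = {
        "any": matched + " >= 1",
        "all": matched + " = ?",
        # 'exact' also requires that the blog has no other tags at all
        "exact": matched + " = ? AND COUNT(*) = ?",
    }[mode]
    extra_params = {"any": 0, "all": 1, "exact": 2}[mode]
    params = list(names) + [len(names)] * extra_params
    sql = (
        "SELECT b.name FROM blog b "
        "JOIN blog_tags bt ON bt.blog_id = b.id "
        "JOIN tag t ON t.id = bt.tag_id "
        "GROUP BY b.id, b.name HAVING " + having + " ORDER BY b.id"
    )
    return [name for (name,) in conn.execute(sql, params)]


for mode in ("any", "all", "exact"):
    print(mode, blogs_with_tags(["tag1", "tag2"], mode))
```

The difference between "all" and "exact" is the extra COUNT(*) condition: counting only the matching tags tells you both are present, while counting every tag on the blog rules out additional ones.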