Union and Intersect in Django

class Tag(models.Model):
    name = models.CharField(max_length=100)
class Blog(models.Model):
    name = models.CharField(max_length=100)
    tags = models.ManyToManyField(Tag)
These are simple models, just to ask my question.
I wonder how I can query blogs by their tags in the following ways.
Blog entries that are tagged with "tag1" or "tag2":
Blog.objects.filter(tags__in=[1,2]).distinct()
Blog objects that are tagged with "tag1" and "tag2": ?
Blog objects that are tagged with exactly "tag1" and "tag2" and nothing else: ??
Tag and Blog are just used as an example.

You could use Q objects for #1:
# Blogs that have either the hockey or the django tag.
from django.db.models import Q
Blog.objects.filter(
    Q(tags__name__iexact='hockey') | Q(tags__name__iexact='django')
).distinct()  # distinct() avoids listing a blog twice when it has both tags
Unions and intersections, I believe, are a bit outside the scope of the Django ORM, but it's possible to do these. The following examples are from a Django application called django-tagging that provides the functionality.
A union returns the objects that carry any of the given tags. See line 346 of models.py:
def get_union_by_model(self, queryset_or_model, tags):
    """
    Create a ``QuerySet`` containing instances of the specified
    model associated with *any* of the given list of tags.
    """
    tags = get_tag_list(tags)
    tag_count = len(tags)
    queryset, model = get_queryset_and_model(queryset_or_model)
    if not tag_count:
        return model._default_manager.none()
    model_table = qn(model._meta.db_table)
    # This query selects the ids of all objects which have any of
    # the given tags.
    query = """
    SELECT %(model_pk)s
    FROM %(model)s, %(tagged_item)s
    WHERE %(tagged_item)s.content_type_id = %(content_type_id)s
      AND %(tagged_item)s.tag_id IN (%(tag_id_placeholders)s)
      AND %(model_pk)s = %(tagged_item)s.object_id
    GROUP BY %(model_pk)s""" % {
        'model_pk': '%s.%s' % (model_table, qn(model._meta.pk.column)),
        'model': model_table,
        'tagged_item': qn(self.model._meta.db_table),
        'content_type_id': ContentType.objects.get_for_model(model).pk,
        'tag_id_placeholders': ','.join(['%s'] * tag_count),
    }
    cursor = connection.cursor()
    cursor.execute(query, [tag.pk for tag in tags])
    object_ids = [row[0] for row in cursor.fetchall()]
    if len(object_ids) > 0:
        return queryset.filter(pk__in=object_ids)
    else:
        return model._default_manager.none()
For blogs that carry all of the given tags, I believe you're looking for an intersection. Note that it does not exclude objects that have additional tags. See line 307 of models.py:
def get_intersection_by_model(self, queryset_or_model, tags):
    """
    Create a ``QuerySet`` containing instances of the specified
    model associated with *all* of the given list of tags.
    """
    tags = get_tag_list(tags)
    tag_count = len(tags)
    queryset, model = get_queryset_and_model(queryset_or_model)
    if not tag_count:
        return model._default_manager.none()
    model_table = qn(model._meta.db_table)
    # This query selects the ids of all objects which have all the
    # given tags.
    query = """
    SELECT %(model_pk)s
    FROM %(model)s, %(tagged_item)s
    WHERE %(tagged_item)s.content_type_id = %(content_type_id)s
      AND %(tagged_item)s.tag_id IN (%(tag_id_placeholders)s)
      AND %(model_pk)s = %(tagged_item)s.object_id
    GROUP BY %(model_pk)s
    HAVING COUNT(%(model_pk)s) = %(tag_count)s""" % {
        'model_pk': '%s.%s' % (model_table, qn(model._meta.pk.column)),
        'model': model_table,
        'tagged_item': qn(self.model._meta.db_table),
        'content_type_id': ContentType.objects.get_for_model(model).pk,
        'tag_id_placeholders': ','.join(['%s'] * tag_count),
        'tag_count': tag_count,
    }
    cursor = connection.cursor()
    cursor.execute(query, [tag.pk for tag in tags])
    object_ids = [row[0] for row in cursor.fetchall()]
    if len(object_ids) > 0:
        return queryset.filter(pk__in=object_ids)
    else:
        return model._default_manager.none()
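To see what the two queries above boil down to, here is a minimal sketch that runs the same GROUP BY / HAVING idea against an in-memory SQLite database. The blog_tags table and the blogs_with_tags helper are hypothetical, not part of django-tagging, and the HAVING clause counts distinct tag ids rather than primary keys, which is a small variation on the quoted code.

```python
import sqlite3

# Hypothetical join table linking blog ids to tag ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_tags (blog_id INTEGER, tag_id INTEGER)")
conn.executemany(
    "INSERT INTO blog_tags VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3)],
)


def blogs_with_tags(conn, tag_ids, require_all):
    """Return sorted blog ids tagged with any (or all) of tag_ids."""
    wanted = sorted(set(tag_ids))
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    query = (
        "SELECT blog_id FROM blog_tags "
        f"WHERE tag_id IN ({placeholders}) "
        "GROUP BY blog_id"
    )
    params = list(wanted)
    if require_all:
        # Intersection: every requested tag must have matched for this blog.
        query += " HAVING COUNT(DISTINCT tag_id) = ?"
        params.append(len(wanted))
    return sorted(row[0] for row in conn.execute(query, params))


union_ids = blogs_with_tags(conn, [1, 2], require_all=False)
intersection_ids = blogs_with_tags(conn, [1, 2], require_all=True)
```

The only difference between the two modes is the HAVING clause, which mirrors the difference between the two django-tagging functions quoted above.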

I've tested these out with Django 1.0:
The "or" queries:
Blog.objects.filter(tags__name__in=['tag1', 'tag2']).distinct()
or you could use the Q class:
Blog.objects.filter(Q(tags__name='tag1') | Q(tags__name='tag2')).distinct()
The "and" query:
Blog.objects.filter(tags__name='tag1').filter(tags__name='tag2')
I'm not sure about the third one, you'll probably need to drop to SQL to do it.
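For the third case (exactly these tags and nothing else), one possible SQL shape is to group by blog and require both that the blog has as many tags as were requested and that all of them are in the requested set. The sketch below tries this on an in-memory SQLite database. The blog_tags table and the blogs_with_exactly helper are made up for illustration, and it assumes the join table holds no duplicate (blog, tag) rows.

```python
import sqlite3

# Hypothetical join table linking blog ids to tag ids.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blog_tags (blog_id INTEGER, tag_id INTEGER)")
conn.executemany(
    "INSERT INTO blog_tags VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 1), (3, 2), (3, 3), (4, 1), (4, 2), (4, 3)],
)


def blogs_with_exactly(conn, tag_ids):
    """Return sorted blog ids whose tag set equals tag_ids exactly."""
    wanted = sorted(set(tag_ids))
    if not wanted:
        return []
    placeholders = ",".join("?" for _ in wanted)
    query = (
        "SELECT blog_id FROM blog_tags GROUP BY blog_id "
        # The blog has exactly as many tags as requested...
        "HAVING COUNT(*) = ? "
        # ...and every one of them is a requested tag.
        f"AND SUM(CASE WHEN tag_id IN ({placeholders}) THEN 1 ELSE 0 END) = ?"
    )
    params = [len(wanted), *wanted, len(wanted)]
    return sorted(row[0] for row in conn.execute(query, params))


exact_ids = blogs_with_exactly(conn, [1, 2])
```

Blog 4 carries both tags plus a third one, so it is excluded here even though it would match the plain "and" query.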

Please don't reinvent the wheel; use the django-tagging application, which was made exactly for your use case. It can do all the queries you describe, and much more.
If you need to add custom fields to your Tag model, you can also take a look at my branch of django-tagging.

This will do the trick for you (it returns the blogs that have both tags):
Blog.objects.filter(tags__name__in=['tag1', 'tag2']).annotate(tag_matches=models.Count('tags')).filter(tag_matches=2)

Related

Django Query to Combine Multiple ArrayFields to one text string

I have an object model where Documents are long text files that can have Attachments and both sets of objects can also have spreadsheet-like Tables. Each table has a rectangular array with text. I want users to be able to search for a keyword across the table contents, but the results will be displayed by the main document (so instead of seeing each table that matches, you'll just see the document that has the most tables that match your query).
Below you can see a test query I'm trying to run that in an ideal world would convert all of the table contents (across all attachments) to one long string, that I can then pass to a SearchHighlight to make the headline. For some reason, the test query returns the tables as different objects, rather than concatenated to one long string.
I'm using a custom function that mimics the Postgres 13 StringAgg as I'm using Postgres 10.
Thanks in advance for your help, let me know if I need to provide more information to replicate this.
my models.py:
class Document(AbstractDocument):
    tables = GenericRelation(Table)
class Attachment(AbstractDocument):
    tables_new = GenericRelation(Table)
    main_document = ForeignKey(Document, on_delete=CASCADE, related_name="attachments")
class Table(models.Model):
    content_type = models.ForeignKey(ContentType, on_delete=models.CASCADE)
    object_id = models.SlugField()
    content_object = GenericForeignKey()
    content = ArrayField(ArrayField(models.TextField(null=True)))
my query:
def myStringAgg(field: str):
    return Func(
        F(field),
        Value(" "),
        Value(""),
        function="array_to_string",
        output_field=models.TextField(),
    )
s = Document.objects.all() \
    .annotate(tt=myStringAgg("attachments__tables__content")) \
    .values_list('tt', flat=True)
# what I get
>>> <DocumentSet ['table1', 'table2']>
# what I want
>>> <DocumentSet ['table1 table2']>
I'm using Django 3.2 and Postgres 10.
To clarify my full scope, this is what the final query would look like:
qs = (
    Document.objects.filter(
        Q(tables__search_vector=query)
        | Q(attachments__tables__search_vector=query)
    )
    .annotate(rank=rank)
    .order_by("-rank")
    .annotate(snippet=SearchHeadline(
        myStringAgg("attachments__tables__content"),
        query, max_fragments=5,
    ))
)

You can use the join function to build a single string from the list of results:
s = Document.objects.all() \
    .annotate(tt=myStringAgg("attachments__tables__content")) \
    .values_list('tt', flat=True)
s = " ".join(s)
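One thing to watch with the join: the annotated value may come back as None for rows that have no table content, and str.join raises a TypeError on None. A minimal sketch of a tolerant join, using plain Python lists in place of the queryset (join_snippets is a made-up helper name):

```python
def join_snippets(values, sep=" "):
    """Join text values into one string, skipping None and empty strings."""
    return sep.join(v for v in values if v)


# Stand-in for the values_list('tt', flat=True) result.
combined = join_snippets(["table1", None, "table2", ""])
```

This concatenates in Python after the query has run, so it does not help if the combined string is needed inside the query itself.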

Can I lookup a related field using a Q object in the Django ORM?

In Django, can I re-use an existing Q object on multiple models, without writing the same filters twice?
I was thinking about something along the lines of the pseudo-Django code below, but did not find anything relevant in the documentation:
class Author(Model):
    name = TextField()
    company_name = TextField()
class Book(Model):
    author = ForeignKey(Author)
# Create a Q object for the Author model
q_author = Q(company_name="Books & co.")
# Use it to retrieve Book objects
qs = Book.objects.filter(author__matches=q_author)
If that is not possible, can I extend an existing Q object to work on a related field? Pseudo-example :
# q_book == Q(author__company_name="Books & co.")
q_book = q_author.extend("author")
# Use it to retrieve Book objects
qs = Book.objects.filter(q_book)
The only thing I've found that comes close is using a subquery, which is a bit unwieldy :
qs = Book.objects.filter(author__in=Author.objects.filter(q_author))

From what I can tell from your comment, you're trying to pass a common set of arguments to multiple filters. To do that you can unpack a dictionary of keyword lookups.
Q objects cannot be values in that dictionary; pass them as positional arguments in front of the unpacked keywords.
args = {'author__company_name': "Books & co"}
qs = Book.objects.filter(**args)
args['author__name'] = 'Foo'
qs = Book.objects.filter(**args)
To share this between different models, you'd have to do some dictionary mangling. str.lstrip removes a set of characters rather than a prefix, so slice the prefix off instead:
prefix = 'author__'
author_args = {k[len(prefix):]: v for k, v in args.items() if k.startswith(prefix)}
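A note on the dictionary mangling: str.lstrip takes a set of characters to strip, not a prefix, so it only works here when the field name happens not to start with one of the characters in 'author__'. The sketch below shows the difference on plain dictionaries; strip_prefix is a hypothetical helper, and "rating" is an invented field name used only to trigger the problem.

```python
def strip_prefix(key, prefix):
    """Remove prefix from key if present; otherwise return key unchanged."""
    return key[len(prefix):] if key.startswith(prefix) else key


book_args = {"author__company_name": "Books & co.", "author__name": "Foo"}
author_args = {strip_prefix(k, "author__"): v for k, v in book_args.items()}

# lstrip keeps eating characters from the set {a, u, t, h, o, r, _},
# so the start of the field name disappears as well.
mangled = "author__rating".lstrip("author__")
fixed = strip_prefix("author__rating", "author__")
```

On Python 3.9 and later, str.removeprefix does the same job as the helper.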

You can do this
books = Book.objects.filter(author__company_name="Books & co")

Django Queryset sort by order_by with relatedManager

I am trying to get objects sorted. This is my code:
ratings = Rate.objects.order_by(sortid)
locations = Location.objects.filter(locations_rate__in=ratings).order_by('locations_rate').distinct('id')
This is my model:
class Rate(models.Model):
    von_location = models.ForeignKey(Location, related_name="locations_rate")
    price_leistung = models.IntegerField(max_length=5, default=0)
    bewertung = models.IntegerField(max_length=3, default=0)
How can I get all Locations in the same order as the ratings?
What I have above is not working.
EDIT:
def sort(request):
    sortid = request.GET.get('sortid')
    ratings = Rate.objects.all()
    locations = Location.objects.filter(locations_rate__in=ratings).order_by('locations_rate__%s' % sortid).distinct('id')
    if request.is_ajax():
        template = 'resultpart.html'
    return render_to_response(template, {'locs': locations}, context_instance=RequestContext(request))

You must specify a field to use for sorting the Rate objects, for example:
ratings = Rate.objects.all()
locations = Location.objects.filter(
    locations_rate__in=ratings
).order_by('locations_rate__%s' % sortid).distinct('id')
You do not need to sort ratings beforehand.
The documentation provides examples of using order_by on related fields.
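Since sortid comes straight from the request, it can be missing or name a field that does not exist, and order_by would then fail. A minimal sketch of building the order_by argument defensively, assuming the two integer fields from the Rate model in the question are the only ones you want to allow (rate_ordering and ALLOWED_SORT_FIELDS are invented names):

```python
# Fields on Rate that may be used for sorting (taken from the question's model).
ALLOWED_SORT_FIELDS = {"price_leistung", "bewertung"}


def rate_ordering(sortid, default="bewertung"):
    """Build an order_by() argument for Location from a request parameter."""
    sortid = sortid or default
    descending = sortid.startswith("-")
    field = sortid[1:] if descending else sortid
    if field not in ALLOWED_SORT_FIELDS:
        # Unknown field: fall back to the default ascending ordering.
        return f"locations_rate__{default}"
    sign = "-" if descending else ""
    return f"{sign}locations_rate__{field}"


ordering = rate_ordering("-price_leistung")
```

The result would then be passed as order_by(ordering) in place of the string formatting in the view.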

How can I cut down the number of queries?

This code currently executes about 50 SQL queries:
c = Category.objects.all()
categories_w_rand_books = []
for category in c:
    r = Book.objects.filter(author__category=category).order_by('?')[:5]
    categories_w_rand_books.append((category, r))
I need to cut the number of queries down to the minimum to speed things up and avoid loading the server.
Basically, I have three models: Category, Author, Book. Authors (not books) belong to a Category, and I need to get a list of all categories with 5 random books under each one.

If you prefer a single query and are using MySQL, check the excellent link provided by @Crazyshezy in his comment.
For PostgreSQL backends, a possible query is (assuming there are non-nullable FK relationships from Book to Author and from Author to Category):
SELECT * FROM (
    SELECT book_table.*, author_table.category_id,
           row_number() OVER (PARTITION BY author_table.category_id ORDER BY RANDOM()) AS rn
    FROM book_table INNER JOIN author_table ON book_table.author_id = author_table.id
) AS sq
WHERE rn <= 5
You could then wrap it inside a RawQuerySet to get Book instances:
from collections import defaultdict
qs = Book.objects.raw("""The above sql suited for your tables...""")
collection = defaultdict(list)
for obj in qs:
    collection[obj.category_id].append(obj)
categories_w_rand_books = []
for category in c:  # c is the Category queryset from the question
    categories_w_rand_books.append((category, collection[category.id]))
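To try the window-function idea without a PostgreSQL server, here is a small sketch against in-memory SQLite, which supports row_number() from version 3.25 on. The author and book tables are cut down to the columns the query needs, and their names are placeholders rather than the real Django table names.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, category_id INTEGER NOT NULL);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER NOT NULL);
""")
# Category 1 ends up with 8 books (authors 1 and 2), category 2 with 3.
conn.executemany("INSERT INTO author VALUES (?, ?)", [(1, 1), (2, 1), (3, 2)])
conn.executemany(
    "INSERT INTO book (author_id) VALUES (?)",
    [(1,)] * 5 + [(2,)] * 3 + [(3,)] * 3,
)

# Number the books of each category in random order, keep the first five.
rows = conn.execute("""
    SELECT id, category_id FROM (
        SELECT book.id, author.category_id,
               row_number() OVER (
                   PARTITION BY author.category_id ORDER BY RANDOM()
               ) AS rn
        FROM book INNER JOIN author ON book.author_id = author.id
    ) AS sq
    WHERE rn <= 5
""").fetchall()

books_by_category = defaultdict(list)
for book_id, category_id in rows:
    books_by_category[category_id].append(book_id)
```

Each category gets at most five books from a single query, and the grouping into a defaultdict is the same step as in the RawQuerySet loop above.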
You may not want to run this query directly on every request without some caching.
Furthermore, your code generates up to 50*5=250 books at random. I wonder why, because that seems like too many for a single page. Are the items displayed as tabs or something else? Perhaps you could reduce the number of SQL queries by loading them via Ajax, or simplify the requirement.
Update
To use book.author without triggering more than one additional query, try prefetch_related_objects:
from django.db.models import prefetch_related_objects  # public API since Django 1.10
qs = list(qs)  # the raw queryset has to be evaluated first
prefetch_related_objects(qs, 'author')  # lookups are passed as positional arguments
# now instances inside qs already contain cached author instances, and
qs[0].author  # will not trigger an extra query
The above code prefetches the authors in one batch and fills them into qs. This adds just one more query.

I'm not sure if this will help you because I don't know the details and context of your problem, but using order_by('?') is very inefficient, especially with some DB back-ends.
For displaying entities with a bit of randomness I use this approach, using a custom filter:
@register.filter
def random_iterator(list, k):
    import random
    class MyIterator:
        def __init__(self, obj, order):
            self.obj = obj
            self.cnt = 0
            self.order = order
        def __iter__(self):
            return self
        def __next__(self):
            try:
                result = self.obj[self.order[self.cnt]]
                self.cnt += 1
                return result
            except IndexError:
                raise StopIteration
        next = __next__  # Python 2 name for the same method
    if list is None:
        list = []
    n = len(list)
    k = min(n, k)
    return MyIterator(list, random.sample(range(n), k))
The code in my Django view is something like this:
RAND_BOUND = 50
categories = Category.objects.filter(......)[:RAND_BOUND]
And I use it in my template in this way:
{% for cat in categories|random_iterator:5 %}
    <li>{{ cat }}</li>
{% endfor %}
This code will pick 5 random categories from a (reduced) set of RAND_BOUND.
This is not THE perfect solution, but I hope it helps.
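If the custom iterator class feels heavy, the same pick can probably be done with random.sample alone. A minimal sketch on a plain list (pick_random is a made-up name, and the category names are sample data); registering it as a template filter is left out:

```python
import random


def pick_random(items, k):
    """Return up to k distinct items from a sequence, in random order."""
    items = list(items or [])
    return random.sample(items, min(k, len(items)))


categories = ["fiction", "history", "science", "travel", "cooking", "art", "music"]
picked = pick_random(categories, 5)
```

Converting to a list first evaluates a queryset once, so this is only sensible on an already reduced set, as with RAND_BOUND above.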

How do I order by date when using ReferenceProperty?

I have a simple one-to-many structure like this:
class User(db.Model):
    userEmail = db.StringProperty()
class Comment(db.Model):
    user = db.ReferenceProperty(User, collection_name="comments")
    comment = db.StringProperty()
    date = db.DateTimeProperty()
I fetch a user by his email:
q = User.all() # prepare User table for querying
q.filter("userEmail =", "az@example.com") # apply filter, email lookup
results = q.fetch(1) # execute the query, apply limit 1
the_user = results[0] # the results is a list of objects, grab the first one
this_users_comments = the_user.comments # get the user's comments
How can I order the user's comments by date, and limit it to 10 comments?

You will want to use the key keyword argument of the built-in sorted function, and use the "date" property as the key:
import operator
sorted_comments = sorted(this_users_comments, key=operator.attrgetter("date"))
# The comments will probably be sorted with earlier comments at the front of the list
# If you want ten most recent, also add the following line:
# sorted_comments.reverse()
ten_comments = sorted_comments[:10]
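As a quick check of the sorting approach, here is a self-contained sketch with a stand-in Comment class (a plain dataclass, not the datastore model). It uses reverse=True so that the slice keeps the ten most recent comments.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from operator import attrgetter


@dataclass
class Comment:
    """Hypothetical stand-in for the datastore Comment model."""
    comment: str
    date: datetime


start = datetime(2020, 1, 1)
comments = [Comment(f"comment {i}", start + timedelta(days=i)) for i in range(15)]

# Newest first, then keep the first ten.
ten_most_recent = sorted(comments, key=attrgetter("date"), reverse=True)[:10]
```

Note that sorting in Python loads every comment of the user first, which is fine for small collections only.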

That query fetches the user. You need to run another query for the comments:
# order('date') sorts oldest first; use order('-date') for newest first
this_users_comments = the_user.comments.order('date').fetch(10)
for comment in this_users_comments:
    ...
