How to search in ManyToManyField - python

I am new to Django, and I am trying to query a many-to-many field.
Here is an example of what I am trying to do.
In the models I have:
class Line(models.Model):
    name = models.CharField("Name of line", max_length=50, blank=True)
class Cross(models.Model):
    lines = models.ManyToManyField(Line, verbose_name="Lines crossed")
    date = models.DateField('Cross Date', null=True, blank=False)
I am building a search that returns all the crosses that have certain lines.
The query in the search box will look like: line_1, line_2, line_3
and the result should be all the crosses that have all of those lines (line_1, line_2, line_3).
I don't know what the filter condition should be:
all_crosses = Cross.objects.all().filter(???)
The view code:
def inventory(request):
    if request.method == "POST":
        if 'btn_search' in request.POST:
            if 'search_by_lines' in request.POST:
                lines_query = request.POST['search_by_lines']
                queried_lines = split_query(lines_query, ',')
                query = [Q(lines__name=l) for l in queried_lines]
                print(query)
                result = Cross.objects.filter(reduce(operator.and_, query))
Thank you very much

You should be able to do:
crosses = Cross.objects.filter(lines__name__in=['line_1', 'line_2', 'line_3'])
for any of the three values. If you're looking for all of the values that match, you'll need to use a Q object:
from django.db.models import Q
crosses = Cross.objects.filter(
Q(lines__name='line_1') &
Q(lines__name='line_2') &
Q(lines__name='line_3')
)
There is at least one other approach you can use, which would be chaining filters:
crosses = (Cross.objects.filter(lines__name='line_1')
           .filter(lines__name='line_2')
           .filter(lines__name='line_3'))
If you need to dynamically construct the Q objects, and assuming the "name" value is what you're posting:
import operator
from functools import reduce  # reduce is not a builtin in Python 3
lines = [Q(lines__name=line) for line in request.POST.getlist('lines')]
crosses = Cross.objects.filter(reduce(operator.and_, lines))
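To make the reduce(operator.and_, ...) idiom concrete, here is a minimal sketch of the same fold on plain Python sets instead of Q objects. The crosses_by_line mapping and its ids are made-up illustration data, not anything from the models above. Folding with & over the per-line sets leaves only the crosses that appear for every requested line, which is the result the question is after.

```python
import operator
from functools import reduce

# Hypothetical data: for each line name, the ids of the crosses that contain it.
crosses_by_line = {
    "line_1": {10, 11},
    "line_2": {10, 11},
    "line_3": {10, 12},
}

wanted = ["line_1", "line_2", "line_3"]

# reduce applies `&` pairwise: (set_1 & set_2) & set_3, i.e. a set intersection.
crosses_with_all = reduce(operator.and_, (crosses_by_line[name] for name in wanted))
print(crosses_with_all)
```

Only cross 10 is present in all three sets, so it is the only id that survives the fold.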
[Update]
Turns out, I was dead wrong. I tried a couple of different ways of querying Cross objects where the value of lines matched all of the items searched. Q objects, annotations of counts on the number of objects contained... nothing worked as expected.
In the end, I ended up matching cross.lines as a list to the list of values posted.
In short, the search view I created matched in this fashion:
results = []
posted_lines = []
search_by_lines = 'search_by_lines' in request.POST.keys()
crosses = Cross.objects.all().prefetch_related('lines')
if request.method == 'POST' and search_by_lines:
    posted_lines = request.POST.getlist('line')
    for cross in crosses:
        if list(cross.lines.values_list('name', flat=True)) == posted_lines:
            results.append(cross)
return render(request, 'search.html', {'lines': lines, 'results': results,
                                       'posted_lines': posted_lines})
What I would probably do in this case is add a column on the Cross model to keep a comma separated list of the primary keys of the related lines values, which you could keep in sync via a post_save signal.
With the additional field, you could query directly against the "line" values without joins.
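For reference, "crosses that have all of the given lines" is what SQL people call relational division, and it is usually expressed with GROUP BY plus a HAVING count. The sketch below shows that idea with the standard-library sqlite3 module rather than the ORM. The table names (line, cross_lines), the column names, and the helper crosses_with_all_lines are hypothetical stand-ins, not the tables Django would generate. In ORM terms this is commonly written as a filter on lines__name__in, an annotate with a distinct Count of the lines, and a filter on that count, but treat that mapping as an assumption to verify against your own data.

```python
import sqlite3


def crosses_with_all_lines(conn, names):
    """Return ids of crosses that are related to every line name in `names`."""
    names = sorted(set(names))
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    sql = (
        "SELECT cl.cross_id "
        "FROM cross_lines AS cl "
        "JOIN line AS l ON l.id = cl.line_id "
        f"WHERE l.name IN ({placeholders}) "
        "GROUP BY cl.cross_id "
        # A cross qualifies only if it matched as many distinct names as were asked for.
        "HAVING COUNT(DISTINCT l.name) = ? "
        "ORDER BY cl.cross_id"
    )
    return [row[0] for row in conn.execute(sql, [*names, len(names)])]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE line (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cross_lines (cross_id INTEGER, line_id INTEGER);
    INSERT INTO line VALUES (1, 'line_1'), (2, 'line_2'), (3, 'line_3');
    -- cross 10 has all three lines, cross 11 has two, cross 12 has one
    INSERT INTO cross_lines VALUES
        (10, 1), (10, 2), (10, 3),
        (11, 1), (11, 2),
        (12, 3);
""")

matching = crosses_with_all_lines(conn, ["line_1", "line_2", "line_3"])
print(matching)
```

Unlike comparing lists for equality, this does not depend on the order of the posted names, and it also returns crosses that have extra lines beyond the ones searched for.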

Related

Django ORM filter by Max column value of two related models

I have 3 related models:
class Program(Model):
    ...  # which aggregates ProgramVersions
class ProgramVersion(Model):
    program = ForeignKey(Program)
    index = IntegerField()
class UserProgramVersion(Model):
    user = ForeignKey(User)
    version = ForeignKey(ProgramVersion)
    index = IntegerField()
ProgramVersion and UserProgramVersion are orderable models based on the index field: the object with the highest index in the table is considered the latest/newest object (this is handled by some custom logic, not relevant here).
I would like to select all the latest UserProgramVersions, i.e. for each Program, the newest UPV among those that point to it.
This can be handled by this UserProgramVersion queryset method:
def latest_user_program_versions(self):
    latest = self\
        .order_by('version__program_id', '-version__index', '-index')\
        .distinct('version__program_id')
    return self.filter(id__in=latest)
This works fine; however, I'm looking for a solution which does NOT use .distinct().
I tried something like this:
def latest_user_program_versions(self):
    latest = self\
        .annotate(
            max_version_index=Max('version__index'),
            max_index=Max('index'))\
        .filter(
            version__index=F('max_version_index'),
            index=F('max_index'))
    return self.filter(id__in=latest)
This, however, does not work.
Use Subquery() expressions, available since Django 1.11. The example in the docs is similar; its purpose is also to get the newest item for the required parent records.
(You could probably start from that example with your objects, but I also wrote a complete, more complicated suggestion to avoid possible performance pitfalls.)
from django.db.models import OuterRef, Subquery
...
def latest_user_program_versions(self, *args, **kwargs):
    # You should filter users by args or kwargs here, for performance reasons.
    # If you do it here it is also applied to the subquery - much faster on a big db.
    qs = self.filter(*args, **kwargs)
    parent = Program.objects.filter(pk__in=qs.values('version__program'))
    newest = (
        qs.filter(version__program=OuterRef('pk'))
        .order_by('-version__index', '-index')
    )
    pks = (
        parent.annotate(newest_id=Subquery(newest.values('pk')[:1]))
        .values_list('newest_id', flat=True)
    )
    # Maybe you prefer to uncomment this so that it is compiled into two shorter SQL queries.
    # pks = list(pks)
    return self.filter(pk__in=pks)
If you considerably improve it, write the solution in your answer.
EDIT: The problem with your second solution:
You cannot saw off the branch you are sitting on, not even in SQL, but you can sit on a temporary copy of it in a subquery and survive :-) That is also why I ask for a filter at the beginning. The second problem is that Max('version__index') and Max('index') could come from two different objects, so no valid intersection is found.
EDIT2: Verified: The internal SQL from my query is complicated, but seems correct.
SELECT app_userprogramversion.id,...
FROM app_userprogramversion
WHERE app_userprogramversion.id IN
    (SELECT
        (SELECT U0.id
         FROM app_userprogramversion U0
         INNER JOIN app_programversion U2 ON (U0.version_id = U2.id)
         WHERE (U0.user_id = 123 AND U2.program_id = (V0.id))
         ORDER BY U2.index DESC, U0.index DESC LIMIT 1
        ) AS newest_id
     FROM app_program V0 WHERE V0.id IN
        (SELECT U2.program_id AS Col1
         FROM app_userprogramversion U0
         INNER JOIN app_programversion U2 ON (U0.version_id = U2.id)
         WHERE U0.user_id = 123
        )
    )
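To see the correlated-subquery shape of that SQL in action without a Django project, here is a small sketch using the standard-library sqlite3 module. The table names are shortened, the index columns are renamed idx (index is a reserved word), the sample rows are invented, and the outer pre-filter on program ids is left out for brevity, so this is an illustration of the technique rather than the exact query Django emits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE program (id INTEGER PRIMARY KEY);
    CREATE TABLE programversion (id INTEGER PRIMARY KEY, program_id INTEGER, idx INTEGER);
    CREATE TABLE userprogramversion (
        id INTEGER PRIMARY KEY, user_id INTEGER, version_id INTEGER, idx INTEGER);
    INSERT INTO program VALUES (1), (2);
    INSERT INTO programversion VALUES (1, 1, 0), (2, 1, 1), (3, 2, 0);
    INSERT INTO userprogramversion VALUES
        (100, 123, 1, 0),
        (101, 123, 2, 0),
        (102, 123, 2, 1),
        (103, 123, 3, 0),
        (104, 999, 2, 5);
""")

# For every program, the inner scalar subquery picks the single newest
# userprogramversion row of the given user (highest version idx, then highest idx).
LATEST_SQL = """
    SELECT upv.id
    FROM userprogramversion AS upv
    WHERE upv.id IN (
        SELECT (
            SELECT u.id
            FROM userprogramversion AS u
            JOIN programversion AS v ON u.version_id = v.id
            WHERE u.user_id = :user AND v.program_id = p.id
            ORDER BY v.idx DESC, u.idx DESC
            LIMIT 1
        )
        FROM program AS p
    )
    ORDER BY upv.id
"""

latest_ids = [row[0] for row in conn.execute(LATEST_SQL, {"user": 123})]
print(latest_ids)
```

For user 123 this selects row 102 for program 1 (newest version, highest index) and row 103 for program 2. A program with no rows for the user simply contributes NULL, which never matches in the IN clause.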

Django Querying database

I'm trying to implement search in django.
My view is as follows :
search_term = request.GET['search_term']
customers = Customer.objects.filter(
    Q(chassis__icontains=search_term) | Q(registration__icontains=search_term) |
    Q(email__icontains=search_term) | Q(firstname__icontains=search_term) |
    Q(lastname__icontains=search_term))
calculations_data = []
if customers:
    for customer in customers:
        try:
            calculation = Calculations.objects.get(customer=customer, user=request.user)
            calculations_data.append({
                'calculation': calculation,
                'price': price_incl_vat(calculation.purchase_price),
                'customer_fullname': '{} {} '.format(customer.firstname, customer.lastname),
                'car_chassis': customer.chassis,
                'car_registration': customer.registration,
            })
        except Calculations.DoesNotExist:
            pass
context = {'search_term': search_term, 'total_result': len(calculations_data), 'calculation_data': calculations_data}
return render(request, 'master/search.html', context)
I have two models, Calculations and Customer. Calculations has a ForeignKey to Customer, but it can be empty, so not every calculation needs to have a customer.
In my example, if I have a search term the result is good, but if there is no search term, I get only the calculations which have a customer.
What I need is: if there is no search_term, I want to get all calculations.
Is there maybe a better way to write the query?
Thanks.
Since the results depend on whether search_term is present, why not use if-else on search_term?
search_term = request.GET.get('search_term', None)
if search_term:
    # when search term is not None
    # get relevant calculations
else:
    calculations = Calculations.objects.all()
# rest of code
You can further simplify your code when search_term is not None by putting the Q objects directly in a Calculations.objects.filter() itself (instead of getting relevant customers and then finding the relevant calculations). In Django, you can query on attributes of foreign key in Q objects. You are first fetching Customers and then using those results to find Calculations. That will increase number of queries to database.
You can do something like following:
calculations = Calculations.objects.filter(
    Q(customer__email__icontains=search_term) |
    Q(customer__chassis__icontains=search_term) |
    Q(....)).select_related('customer')
Related links:
1. Lookups that span relationships
2. select_related
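The effect of querying from the calculation side can be sketched with the standard-library sqlite3 module. This is only an illustration under assumptions: the calculation and customer tables, their columns, the sample rows, and the helper find_calculations are invented here, and only two of the searched fields are included. The point it shows is that a LEFT JOIN from calculations keeps rows without a customer when there is no search term, while a search term narrows the result to matching customers.

```python
import sqlite3


def find_calculations(conn, search_term=None):
    """Return calculation ids; filter by customer fields only when a term is given."""
    sql = (
        "SELECT calc.id FROM calculation AS calc "
        "LEFT JOIN customer AS cust ON cust.id = calc.customer_id"
    )
    params = []
    if search_term:
        # SQLite's LIKE is case-insensitive for ASCII, similar in spirit to icontains.
        pattern = f"%{search_term}%"
        sql += " WHERE cust.email LIKE ? OR cust.chassis LIKE ?"
        params = [pattern, pattern]
    sql += " ORDER BY calc.id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT, chassis TEXT);
    CREATE TABLE calculation (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'ann@example.com', 'CH-001'), (2, 'bob@example.com', 'CH-002');
    -- calculation 3 has no customer
    INSERT INTO calculation VALUES (1, 1), (2, 2), (3, NULL);
""")

all_ids = find_calculations(conn)
ann_ids = find_calculations(conn, "ann")
print(all_ids, ann_ids)
```

With no search term the customer-less calculation 3 is still returned, which is the behaviour the question asks for.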
try this:
calculations = Calculations.objects.filter(user=request.user)
if search_term:
    # Narrow to the matching customers only when a search term was given.
    calculations = calculations.filter(customer__in=customers)
for calculation in calculations:
    customer = calculation.customer  # None when the calculation has no customer
    calculations_data.append({
        'calculation': calculation,
        'price': price_incl_vat(calculation.purchase_price),
        'customer_fullname': '{} {}'.format(customer.firstname, customer.lastname) if customer else '',
        'car_chassis': customer.chassis if customer else '',
        'car_registration': customer.registration if customer else '',
    })

How to annotate Count with a condition in a Django queryset

Using Django ORM, can one do something like queryset.objects.annotate(Count('queryset_objects', gte=VALUE)). Catch my drift?
Here's a quick example to use for illustrating a possible answer:
In a Django website, content creators submit articles, and regular users view (i.e. read) the said articles. Articles can either be published (i.e. available for all to read), or in draft mode. The models depicting these requirements are:
class Article(models.Model):
    author = models.ForeignKey(User)
    published = models.BooleanField(default=False)
class Readership(models.Model):
    reader = models.ForeignKey(User)
    which_article = models.ForeignKey(Article)
    what_time = models.DateTimeField(auto_now_add=True)
My question is: How can I get all published articles, sorted by unique readership from the last 30 mins? I.e. I want to count how many distinct (unique) views each published article got in the last half an hour, and then produce a list of articles sorted by these distinct views.
I tried:
date = datetime.now()-timedelta(minutes=30)
articles = Article.objects.filter(published=True).extra(select = {
    "views" : """
    SELECT COUNT(*)
    FROM myapp_readership
    JOIN myapp_article on myapp_readership.which_article_id = myapp_article.id
    WHERE myapp_readership.reader_id = myapp_user.id
    AND myapp_readership.what_time > %s """ % date,
}).order_by("-views")
This sprang the error: syntax error at or near "01" (where "01" was the datetime object inside extra). It's not much to go on.
For django >= 1.8
Use Conditional Aggregation:
from django.db.models import Count, Case, When, IntegerField
Article.objects.annotate(
    numviews=Count(Case(
        When(readership__what_time__lt=threshold, then=1),
        output_field=IntegerField(),
    ))
)
Explanation:
The normal query over your articles will be annotated with a numviews field. That field is constructed as a CASE/WHEN expression, wrapped by Count, that returns 1 for readership rows matching the criteria and NULL for rows that do not. Count ignores NULLs and counts only values.
You will get zeros for articles that haven't been viewed recently, and you can use the numviews field for sorting and filtering.
Query behind this for PostgreSQL will be:
SELECT
    "app_article"."id",
    "app_article"."author",
    "app_article"."published",
    COUNT(
        CASE WHEN "app_readership"."what_time" < 2015-11-18 11:04:00.000000+01:00 THEN 1
        ELSE NULL END
    ) as "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
    ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author", "app_article"."published"
If we want to count only unique readers, we can pass distinct=True to Count and make the When clause return the value we want to be distinct on.
from django.db.models import Count, Case, When, CharField, F
Article.objects.annotate(
    numviews=Count(Case(
        When(readership__what_time__lt=threshold, then=F('readership__reader')),  # it can also be `readership__reader_id`, it doesn't matter
        output_field=CharField(),
    ), distinct=True)
)
That will produce:
SELECT
    "app_article"."id",
    "app_article"."author",
    "app_article"."published",
    COUNT(
        DISTINCT CASE WHEN "app_readership"."what_time" < 2015-11-18 11:04:00.000000+01:00 THEN "app_readership"."reader_id"
        ELSE NULL END
    ) as "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
    ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author", "app_article"."published"
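The two COUNT(CASE ...) forms above can be tried out directly with the standard-library sqlite3 module. The sketch below is an illustration only: table names are shortened, the rows are invented, timestamps are stored as text, and the comparison uses > so that it counts views newer than the cutoff, which is what the question asks for. It shows that COUNT skips the NULLs produced by a non-matching CASE, and that DISTINCT collapses repeated views by the same reader.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE article (id INTEGER PRIMARY KEY);
    CREATE TABLE readership (reader_id INTEGER, which_article_id INTEGER, what_time TEXT);
    INSERT INTO article VALUES (1), (2), (3);
    INSERT INTO readership VALUES
        (7, 1, '2015-11-18 10:50:00'),
        (7, 1, '2015-11-18 10:55:00'),
        (8, 1, '2015-11-18 09:00:00'),
        (8, 2, '2015-11-18 10:59:00');
""")

SQL = """
    SELECT a.id,
           -- CASE yields NULL when the condition fails; COUNT ignores NULLs
           COUNT(CASE WHEN r.what_time > :cutoff THEN 1 END) AS views,
           COUNT(DISTINCT CASE WHEN r.what_time > :cutoff THEN r.reader_id END) AS unique_views
    FROM article AS a
    LEFT OUTER JOIN readership AS r ON a.id = r.which_article_id
    GROUP BY a.id
    ORDER BY unique_views DESC, a.id
"""

rows = conn.execute(SQL, {"cutoff": "2015-11-18 10:30:00"}).fetchall()
for article_id, views, unique_views in rows:
    print(article_id, views, unique_views)
```

Article 1 has two recent views by the same reader, so views is 2 but unique_views is 1. Article 3 has no readership rows at all and still appears with zeros because of the LEFT OUTER JOIN.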
For django < 1.8 and PostgreSQL
You can just use raw for executing SQL statement created by newer versions of django. Apparently there is no simple and optimized method for querying that data without using raw (even with extra there are some problems with injecting required JOIN clause).
Article.objects.raw(
    'SELECT'
    ' "app_article"."id",'
    ' "app_article"."author",'
    ' "app_article"."published",'
    ' COUNT('
    '  DISTINCT CASE WHEN "app_readership"."what_time" < %s THEN "app_readership"."reader_id"'
    '  ELSE NULL END'
    ' ) as "numviews"'
    ' FROM "app_article" LEFT OUTER JOIN "app_readership"'
    '  ON ("app_article"."id" = "app_readership"."which_article_id")'
    ' GROUP BY "app_article"."id", "app_article"."author", "app_article"."published"',
    [threshold])  # threshold: the cutoff datetime, passed as a query parameter instead of a literal
For django >= 2.0 you can use Conditional aggregation with a filter argument in the aggregate functions:
from datetime import timedelta
from django.utils import timezone
from django.db.models import Count, Q # need import
Article.objects.annotate(
    numviews=Count(
        'readership__reader__id',
        filter=Q(readership__what_time__gt=timezone.now() - timedelta(minutes=30)),
        distinct=True
    )
)

django - annotate() instead of distinct()

I am stuck on this issue:
I have two models, Location and Rate.
Each location has its rate, possibly multiple rates.
I want to get locations ordered by their rates, ascending.
Obviously, order_by() and distinct() don't work together:
locations = Location.objects.filter(**s_kwargs).order_by('locations_rate__rate').distinct('id')
Then I read the docs and came to annotate(), but I am not sure whether I have to use a function inside annotate().
If I do this:
locations = Location.objects.filter(**s_kwargs).annotate(rate=Count('locations_rate__rate')).order_by('rate')
it counts the rates and orders by that count. I want to get locations with their rates, ordered by the value of those rates.
My model definitions are:
class Location(models.Model):
    name = models.TextField()
    adres = models.TextField()
class Rate(models.Model):
    location = models.ForeignKey(Location,related_name='locations_rate')
    rate = models.IntegerField(max_length=2)
    price_rate = models.IntegerField(max_length=2) #<--- added now
    datum = models.DateTimeField(auto_now_add=True,blank=True) #<--- added now
Well, the issue is not how to write the query in Django for the problem you described. It's that your problem is either incorrect or not properly thought through. Let me explain with an example:
Suppose you have two Location objects, l1 and l2. l1 has two Rate objects related to it, r1 and r3, such that r1.rate = 1 and r3.rate = 3, and l2 has one Rate object related to it, r2, such that r2.rate = 2. Now what should the order of your query's result be: l1 followed by l2, or l2 followed by l1? One of l1's rates is less than l2's rate and the other is greater than l2's rate.
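The ambiguity in that example can be made concrete in a few lines of plain Python. The dictionary below just encodes the l1/l2 rates from the paragraph above. Ordering by each location's lowest rate and ordering by its highest rate give opposite results, which is why the query needs an explicit aggregate (in the ORM that would presumably be something like Min or Max from django.db.models) before it can be ordered.

```python
# Rates per location, taken from the example: l1 has rates 1 and 3, l2 has rate 2.
rates_by_location = {"l1": [1, 3], "l2": [2]}

# Order locations by their lowest rate, then by their highest rate.
by_lowest = sorted(rates_by_location, key=lambda loc: min(rates_by_location[loc]))
by_highest = sorted(rates_by_location, key=lambda loc: max(rates_by_location[loc]))

print(by_lowest)
print(by_highest)
```

By lowest rate l1 comes first, by highest rate l2 comes first, so "ordered by rate" is not well defined until one of these is chosen.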
Try this:
from django.db.models import Count, Sum
# if you want to order by the number of rates
locations = Location.objects.filter(**s_kwargs) \
    .annotate(rate_count=Count('locations_rate')) \
    .order_by('rate_count')
# if you want to order by an aggregate of the rate values, e.g. Sum
locations = Location.objects.filter(**s_kwargs) \
    .annotate(rate_sum=Sum('locations_rate__rate')) \
    .order_by('rate_sum')
Possibly you want something like this:
locations = (Location.objects.filter(**s_kwargs)
             .values('locations_rate__rate')
             .annotate(Count('locations_rate__rate'))
             .order_by('locations_rate__rate'))
You need the Count() since you actually need a GROUP BY query, and GROUP BY only works with aggregate functions like COUNT or SUM.
Anyway, I think your problem can be solved with a normal distinct():
locations = (Location.objects.filter(**s_kwargs)
             .order_by('locations_rate__rate')
             .distinct('locations_rate__rate'))
Why would you want to use annotate() instead?
I haven't tested either of them, but I hope it helps.
annotate(*args, **kwargs) annotates each object in the QuerySet with the provided list of aggregate values (averages, sums, etc.) that have been computed over the objects that are related to the objects in the QuerySet.
So if you only want to get locations ordered by their rates, ascending, you don't have to use annotate().
You can try this:
loc = Location.objects.all()
rate = Rate.objects.filter(location__in=loc).order_by('rate')

Django Queryset: Need help in optimizing this set of queries

I'm trying to sieve out some common tag combinations from a list of educational question records.
For this example, I'm looking only at 2-tag combinations (tag-tag), for which I should get a result like:
"point" + "curve" (65 entries)
"add" + "subtract" (40 entries)
...
This is the desired outcome in SQL statement:
SELECT a.tag, b.tag, count(*)
FROM examquestions.dbmanagement_tag as a
INNER JOIN examquestions.dbmanagement_tag as b on a.question_id_id = b.question_id_id
where a.tag != b.tag
group by a.tag, b.tag
Basically we are getting different tags with common questions to be identified into a list and group them within the same matching tag combinations.
I have tried to do a similar query using django queryset:
twotaglist = [] #final set of results
alphatags = tag.objects.all().values('tag', 'type').annotate().order_by('tag')
betatags = tag.objects.all().values('tag', 'type').annotate().order_by('tag')
startindex = 0 #startindex reduced by 1 to shorten betatag range each time the atag changes. this is to reduce the double count of comparison of similar matches of tags
for atag in alphatags:
    for btag in betatags[startindex:]:
        if (atag['tag'] != btag['tag']):
            commonQns = [] #to check how many common qns
            atagQns = tag.objects.filter(tag=atag['tag'], question_id__in=qnlist).values('question_id').annotate()
            btagQns = tag.objects.filter(tag=btag['tag'], question_id__in=qnlist).values('question_id').annotate()
            for atagQ in atagQns:
                for btagQ in btagQns:
                    if (atagQ['question_id'] == btagQ['question_id']):
                        commonQns.append(atagQ['question_id'])
            if (len(commonQns) > 0):
                twotaglist.append({'atag': atag['tag'],
                                   'btag': btag['tag'],
                                   'count': len(commonQns)})
    startindex=startindex+1
The logic works fine; however, as I am pretty new to this platform, I'm not sure if there is a shorter way to make it much more efficient.
Currently, the query needs about 45 seconds for a comparison of about 5K x 5K tags :(
Addon: Tag class
class tag(models.Model):
    id = models.IntegerField('id',primary_key=True,null=False)
    question_id = models.ForeignKey(question,null=False)
    tag = models.TextField('tag',null=True)
    type = models.CharField('type',max_length=1)
    def __str__(self):
        return str(self.tag)
Unfortunately django doesn't allow joining unless there's a foreign key (or one to one) involved. You're going to have to do it in code. I've found a way (totally untested) to do it with a single query which should improve execution time significantly.
from collections import Counter
from itertools import combinations
# Assuming Models
class Question(models.Model):
    ...
class Tag(models.Model):
    tag = models.CharField(..)
    question = models.ForeignKey(Question, related_name='tags')
c = Counter()
questions = Question.objects.all().prefetch_related('tags')  # prefetch the reverse relation
for q in questions:
    # sort them so 'point' + 'curve' == 'curve' + 'point'
    tags = sorted([tag.tag for tag in q.tags.all()])
    c.update(combinations(tags, 2))  # get all 2-pair combinations and update counter
c.most_common(5)  # show the top 5
The above code uses Counters, itertools.combinations, and django prefetch_related which should cover most of the bits above that might be unknown. Look at those resources if the above code doesn't work exactly, and modify accordingly.
If you're not using a M2M field on your Question model you can still access tags as if it were a M2M field by using reverse relations. See my edit that changes the reverse relation from tag_set to tags. I've made a couple of other edits that should work with the way you've defined your models.
If you don't specify related_name='tags', then just change tags in the filters and prefetch_related to tag_set and you're good to go.
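The counting part of this answer does not depend on Django at all, so it can be checked in isolation. Below is a minimal sketch with an invented tags_by_question mapping standing in for the prefetched q.tags.all() results. It deduplicates and sorts each question's tags so that a pair is always counted under one canonical key.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: question id -> tags attached to that question.
tags_by_question = {
    1: ["point", "curve"],
    2: ["curve", "point", "add"],
    3: ["add", "subtract"],
    4: ["point"],
}

pair_counts = Counter()
for tags in tags_by_question.values():
    # Sorting makes ('curve', 'point') the single key for that pair, whatever the input order.
    unique_sorted = sorted(set(tags))
    pair_counts.update(combinations(unique_sorted, 2))

print(pair_counts.most_common(1))
```

Questions with fewer than two tags contribute nothing, because combinations() yields no pairs for them.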
If I understood your question correctly, I would keep things simpler and do something like this
relevant_tags = Tag.objects.filter(question_id__in=qnlist)
# Here relevant_tags has both a and b tags
unique_tags = set()
for tag_item in relevant_tags:
    unique_tags.add(tag_item.tag)
# unique_tags should have your A and B tags
a_tag = unique_tags.pop()
b_tag = unique_tags.pop()
# Some logic to make sure what is A and what is B
# Lists (not lazy filter objects) so they can be iterated more than once
a_tags = [t for t in relevant_tags if t.tag == a_tag]
b_tags = [t for t in relevant_tags if t.tag == b_tag]
# a_tags and b_tags contain A and B tags filtered from relevant_tags
same_question_tags = dict()
for q in qnlist:
    a_list = [a for a in a_tags if a.question_id == q.id]
    b_list = [b for b in b_tags if b.question_id == q.id]
    same_question_tags[q] = a_list + b_list
The good thing about this is you can extend it to N number of tags by iterating over the returned tags in a loop to get all unique ones and then iterating further to filter them out tag wise.
There are definitely more ways to do this too.
