QuerySet: LEFT JOIN with AND - python

I use an old Django version, 1.1, with a hack that supports joins in extra(). It works, but now it is time for a change. Django 1.2 has RawQuerySet, so I've rewritten my code for that solution. The problem is that RawQuerySet doesn't support filters etc., which I use a lot in my code.
Digging through Google, I found on CaktusGroup that I could use query.join().
That would be great, but in my code I have:
LEFT OUTER JOIN "core_rating" ON
("core_film"."parent_id" = "core_rating"."parent_id"
AND "core_rating"."user_id" = %i)
In query.join() I've written the first part, "core_film"."parent_id" = "core_rating"."parent_id", but I don't know how to add the second part after the AND.
Is there any solution for Django that lets me use custom JOINs without rewriting all the filter code (as raw queries)?
This is our current code fragment using extra():
top_films = top_films.extra(
    select=dict(guess_rating='core_rating.guess_rating_alg1'),
    join=['LEFT OUTER JOIN "core_rating" ON ("core_film"."parent_id" = "core_rating"."parent_id" and "core_rating"."user_id" = %i)' % user_id] + extra_join,
    where=['core_film.parent_id in (select parent_id from core_film EXCEPT select film_id from filmbasket_basketitem where "wishlist" IS NOT NULL and user_id=%i)' % user_id,
           '( ("core_rating"."type"=1 AND "core_rating"."rating" IS NULL) OR "core_rating"."user_id" IS NULL)',
           ' "core_rating"."last_displayed" IS NULL'],
)

Unfortunately, the answer here is no.
The Django ORM, like most of Django, follows a philosophy that easy things should be easy and hard things should be possible. In this case, you are definitely in the "hard things" area and the "possible" solution is to simply write the raw query. There are definitely situations like this where writing the raw query can be difficult and feels kinda gross, but from the project's perspective situations like this are too rare to justify the cost of adding such functionality.
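To see why the second condition has to sit inside the ON clause rather than in WHERE, here is a minimal sketch using Python's built-in sqlite3 module instead of Django. The table layout and sample rows are assumptions made up for illustration (only the table and column names echo the question), and the user id is passed as a bound parameter rather than formatted into the string with %i.

```python
import sqlite3

# Hypothetical, simplified schema; not the asker's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE core_film (parent_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE core_rating (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER,
        user_id INTEGER,
        rating INTEGER
    );
    INSERT INTO core_film VALUES (1, 'Film A'), (2, 'Film B'), (3, 'Film C');
    -- user 1 rated Film A; user 2 rated Film A and Film B; nobody rated Film C
    INSERT INTO core_rating (parent_id, user_id, rating) VALUES
        (1, 1, 8), (1, 2, 5), (2, 2, 7);
""")

user_id = 1

# Condition inside ON: films that *this user* has no rating row for.
unrated_by_user = [row[0] for row in conn.execute("""
    SELECT f.title
    FROM core_film AS f
    LEFT OUTER JOIN core_rating AS r
        ON (f.parent_id = r.parent_id AND r.user_id = ?)
    WHERE r.user_id IS NULL
    ORDER BY f.title
""", (user_id,))]

# Without the extra ON condition: only films that *nobody* has rated.
unrated_by_anyone = [row[0] for row in conn.execute("""
    SELECT f.title
    FROM core_film AS f
    LEFT OUTER JOIN core_rating AS r ON (f.parent_id = r.parent_id)
    WHERE r.user_id IS NULL
    ORDER BY f.title
""")]

print(unrated_by_user)    # ['Film B', 'Film C']
print(unrated_by_anyone)  # ['Film C']
```

The two result lists differ, which is why the condition cannot simply be moved into a regular filter on the joined table.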

Try this patch: https://code.djangoproject.com/ticket/7231

Related

How can I count across several relationships in django

For a small project I have a registry of matches and results. Every match is between teams (could be a single player team), and has a winner. So I have Match and Team models, joined by a MatchTeam model. This looks like so (simplified; see below for notes):
from django.db import models

class Team(models.Model):
    ...

class Match(models.Model):
    teams = models.ManyToManyField(Team, through='MatchTeam')
    ...

class MatchTeam(models.Model):
    match = models.ForeignKey(Match, related_name='matchteams',)
    team = models.ForeignKey(Team)
    winner = models.NullBooleanField()
    ...
Now I want to do some stats on the matches, starting with looking up who is the person that beats you the most. I'm not completely sure how to do this, at least, not efficiently.
In SQL (just approximating here), I mean something like this:
SELECT their_mt.team_id, COUNT(*) AS cnt
FROM matchteam AS your_mt
JOIN matchteam AS their_mt ON your_mt.match_id = their_mt.match_id
WHERE your_mt.team_id IN <<:your teams>>
  AND your_mt.winner = false
GROUP BY their_mt.team_id
ORDER BY cnt DESC
(this also needs a "their_mt is not your_mt" clause btw, but the concept is clear, right?)
While I have not tested this as SQL, it's just to give an insight to what I'm looking for: I want to find this result via a Django aggregation.
According to the manual I can annotate results with an aggregation, in this case a Count. Joining MatchTeams straight on MatchTeams as I'm doing in the SQL is maybe a bit of a shortcut, as there 'should' be a Match in between? At least, I wouldn't know how to translate that into Django.
So maybe I need to find certain matches for my team, and then annotate them with the count of the other team? But what is 'the other team'?
Quick write-up would look like:
nemesis = Match.objects \
    .filter(matchteams__in=yourteams) \
    .annotate(cnt=Count('<<otherteam>>')).order_by('-cnt')[0]
If this is the right track, how should I define the Count here?
And if it's not the right track, what is?
As is, this is all about teams instead of users. This is just to keep things simple :)
An additional question might be: should I even do this with that Django ORM stuff, or am I better off just adding SQL? That has the obvious disadvantage that you're stuck with writing very generic code (is this even possible?) or fixing your DB-backend. If not needed, I'd like to avoid that.
About the model: I really want to understand what I can change about the model to make it better, but I can't really see a solution without downsides. Let me try to explain:
I want to support matches with an arbitrary number of teams, for instance a 5-team match. This means I have a many-to-many relationship and not one that is, for instance, 1 match to 2 teams. If that were the case, you could denormalize and put the winners/scores in the match table. But this is not the case.
Extra data about the results of one team (e.g. their final score, their time) is by definition a property of the relation. It cannot go into the team table (as it would be per match and you can have an undefined amount of matches), and it cannot go in the match table for the same reason mutatis mutandis.
Example: I have teams A,B,C,D and E playing a match. Team A and Team B have 10 points, the rest all have 0 points. I want to save the amount of points, and that Team A and Team B are the winners of this match.
So to the comments suggesting I need a 'better' design, by all means, if you have one I would gladly see it, but if you want to support what I support, it's going to be hard.
And as a final remark: this data can be easily retrieved in SQL, so the model seems fine to me. I'm just too much of a beginner in Django to be able to do it in Django's ORM!
Funny problem! I think I have the answer (get the team that beats yourteams the most):
Team.objects.get(  # the expected result is a team
    pk=list(  # filter out yourteams
        filter(lambda x: x not in [y.id for y in yourteams],
               list(
                   Match.objects  # search matches
                   .filter(matchteams__in=yourteams)  # in which you were involved
                   .filter(matchteams__winner=False)  # that you lost
                   .annotate(cnt=Count('teams'))  # and count them
                   .order_by('-cnt')  # sort appropriately
                   .values_list('teams__id', flat=True)  # finally get only pks
               )
        )
    )[0]  # take the first item that should be the super winner
)
I did not test it explicitly, but even if it does not work as is, I think it may be the right track.
You can do something like this
# MatchTeam rows where my team took part and did not win
lost_matchteams = MatchTeam.objects.filter(team=my_team, winner=False)
# teams that won one of those same matches
teams_that_beat_my_team = MatchTeam.objects.filter(match__in=lost_matchteams.values('match'), winner=True).values_list('team', flat=True)
But as suggested you can probably model better.
I would hold two fields in the Match model: home_team and away_team.
Simpler and more indicative.
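The self-join sketched in the question can be tried out directly with Python's built-in sqlite3 module. This is only a sketch under assumptions: the matchteam table, its integer winner column, and the sample rows are made up for illustration, and two conditions that the question only hints at are added (the other row must be a different matchteam, and it must be a winner).

```python
import sqlite3

# Hypothetical stand-in for the MatchTeam table; winner is stored as 0/1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE matchteam (
        id INTEGER PRIMARY KEY,
        match_id INTEGER,
        team_id INTEGER,
        winner INTEGER
    );
    INSERT INTO matchteam (match_id, team_id, winner) VALUES
        (1, 1, 0), (1, 2, 1), (1, 3, 0),  -- match 1: team 2 wins
        (2, 1, 0), (2, 2, 1),             -- match 2: team 2 wins
        (3, 1, 0), (3, 3, 1),             -- match 3: team 3 wins
        (4, 1, 1), (4, 2, 0);             -- match 4: team 1 wins
""")

your_teams = [1]
placeholders = ",".join("?" for _ in your_teams)

# For every match one of your teams lost, count the winning opponents.
nemesis_counts = conn.execute(f"""
    SELECT their_mt.team_id, COUNT(*) AS cnt
    FROM matchteam AS your_mt
    JOIN matchteam AS their_mt
        ON your_mt.match_id = their_mt.match_id
       AND their_mt.id != your_mt.id
    WHERE your_mt.team_id IN ({placeholders})
      AND your_mt.winner = 0
      AND their_mt.winner = 1
    GROUP BY their_mt.team_id
    ORDER BY cnt DESC, their_mt.team_id
""", your_teams).fetchall()

print(nemesis_counts)  # [(2, 2), (3, 1)]
```

The first row is the team that beat you most often. Only the placeholder list is interpolated into the SQL string; the team ids themselves are bound as parameters.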

Select attributes from different joined tables with Flask and SQLAlchemy

Can't get myself to do something as easy as good ol'
SELECT phrase.content, meaning.content
FROM phrase JOIN meaning
ON phrase.id = meaning.phrase_id
All the examples I can find in the documentation/SO are variations of
a = Phrase.query.join(Meaning).all()
which doesn't really work cause then a is a list of Phrase objects, whereas I want to select one attribute from Phrase and one from Meaning.
Anybody? Thanks
q = db.session.query(Phrase.content, Meaning.content).join(Meaning).all()

Efficient Django full-text search without Haystack

What's the next best option for database-agnostic full-text search for Django without Haystack?
I have a model like:
class Paper(models.Model):
title = models.CharField(max_length=1000)
class Person(models.Model):
name = models.CharField(max_length=100)
class PaperReview(models.Model):
paper = models.ForeignKey(Paper)
person = models.ForeignKey(Person)
I need to search for papers by title and reviewer name, but I also want to search from the perspective of a person and find which papers they have and haven't reviewed. With Haystack, it's trivial to implement a full-text index to search by title and name fields, but as far as I can tell, there's no way to do the "left outer join" necessary to find papers without a review by a specific person.
Haystack is just a wrapper that exposes a few different search engine backends:
Solr
ElasticSearch
Whoosh
Xapian
There might be other backends as well available as plugins.
So the real question here is, is there a search backend that gives me the desired functionality, and does haystack expose that functionality?
The answer to that is, you can probably use elasticsearch*, but note the asterisk.
Generally, when creating a search index, it's a good idea to think about the documents the same way you would if you were designing a non-relational (NoSQL) database, where you want the documents to be as flat as possible.
So one possibility might be to have an array of char fields on a paperreview index. The array would contain all of the related foreign key references.
Another might be to use "nested documents" in elasticsearch.
And lastly, to use "parent/child documents" in elasticsearch.
You can still use haystack for indexing, with some hacking, but you will probably want to use one of the raw backends directly, such as pyelasticsearch or pyes.
http://www.elasticsearch.org/guide/reference/mapping/nested-type/
http://www.elasticsearch.org/guide/reference/mapping/parent-field/
http://pyelasticsearch.readthedocs.org/en/latest/
http://pyes.readthedocs.org/en/latest/
I know this question is older, but I spent some time investigating this recently (and answered it elsewhere as well). It is actually not too hard to implement this yourself, so I wanted to share.
I found that the SearchVector/SearchQuery approach does not catch all cases, for example partial words (see https://www.fusionbox.com/blog/detail/partial-word-search-with-postgres-full-text-search-in-django/632/ for reference). You can implement your own without much trouble, depending on your constraints.
For example, within a viewset's get_queryset method:
# module-level imports
import re
from functools import reduce
from django.db.models import Q

# ...other params...
search_terms = self.request.GET.get('q')
if search_terms:
    # remove possible other delimiters and other chars
    # that could interfere
    cleaned_terms = re.sub(r'[!\'()|&;,]', ' ', search_terms).strip()
    if cleaned_terms:
        # Check each word against all the properties we want;
        # AND the per-word conditions together
        q = reduce(
            lambda p, n: p & n,
            map(
                lambda word:
                    Q(your_property__icontains=word)
                    | Q(second_property__icontains=word)
                    | Q(third_property__icontains=word),
                cleaned_terms.split()
            )
        )
        qs = YourModel.objects.filter(q)
return qs
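The matching rule that the Q expression builds (every word must appear in at least one of the fields) can be checked without a database. The following is a plain-Python stand-in, not Django code: clean_terms, matches, and the sample records are hypothetical names and data used only to illustrate the logic.

```python
import re

def clean_terms(search_terms):
    """Replace delimiter characters with spaces and split into words."""
    return re.sub(r"[!'()|&;,]", " ", search_terms).split()

def matches(record, words, fields):
    """True if every word occurs, case-insensitively, in at least one field."""
    return all(
        any(word.lower() in str(record[field]).lower() for field in fields)
        for word in words
    )

# Made-up sample data standing in for model rows.
papers = [
    {"title": "Partial word search", "reviewer": "Alice"},
    {"title": "Full-text indexing", "reviewer": "Bob"},
    {"title": "Search engines compared", "reviewer": "Alice"},
]

words = clean_terms("sear, (ali)")
hits = [p["title"] for p in papers if matches(p, words, ("title", "reviewer"))]

print(words)  # ['sear', 'ali']
print(hits)   # ['Partial word search', 'Search engines compared']
```

Note that an empty word list matches everything here, which is why the original snippet only builds the filter when cleaned_terms is non-empty.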
I use Haystack + Elasticsearch and so far it's working pretty well. I don't think it's trivial. You can easily implement your requirement if there's an association between paper and person.
I ended up using djorm-ext-pgfulltext, which provides a simple Django interface for PostgreSQL's built-in full text search features.

Is it possible to combine annotations with defer/only in django 1.2.1?

I have two simple models: Book, and Author
Each Book has one Author, linked through a ForeignKey.
Things work normally until I try to use defer/only on an annotation:
authors=Author.objects.all().annotate(bookcount=Count('books'))
That works. The query looks like:
select table_author.name, table_author.birthday, COUNT(table_book.id) as bookcount
from table_book left outer join table_author on table_author.id=table_book.author_id
group by table_author.id
so very simple - selecting everything from author, and additionally selecting a count of the books.
But when I do the following, everything changes:
simple=authors.defer('birthday')
Now the simple query looks like this:
select COUNT(table_book.id) as bookcount from table_book left outer join
table_author on table_author.id=table_book.author_id group by table_author.id
and it has completely lost the extra information. What's the deal?
Well, this would seem to be a bug. There's already a ticket, but it hasn't had much attention for a while. Might be worth making a post to the django-developers Google group to chivvy things along.

Using data from django queries in the same view

I might have missed something while searching through the documentation, but I can't seem to find a way to use data from one query to form another query.
My query is:
sites_list = Site.objects.filter(worker=worker)
I'm trying to do something like this:
for site in sites_list:
    [Insert Query Here]
Edit: I saw the answer and I'm not sure how I didn't get that. Maybe that's a sign I'm up too late coding :S
You could easily do something like this:
sites_list = Site.objects.filter(worker=worker)
for site in sites_list:
    new_sites_list = Site.objects.filter(name=site.name).filter(something else)
You can also use the __in lookup type. For example, if you had an Entry model with a relation to Site, you could write:
Entry.objects.filter(site__in=Site.objects.filter(...some conditions...))
This will end up doing one query in the DB (the filter condition on sites would be turned into a subquery in the WHERE clause).
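The shape of that single query can be sketched with Python's built-in sqlite3 module. The site and entry tables, their columns, and the rows below are assumptions for illustration, and the SQL that Django actually generates will differ in its details.

```python
import sqlite3

# Hypothetical tables standing in for the Site and Entry models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE site (id INTEGER PRIMARY KEY, name TEXT, worker_id INTEGER);
    CREATE TABLE entry (id INTEGER PRIMARY KEY, site_id INTEGER, title TEXT);
    INSERT INTO site VALUES (1, 'north', 10), (2, 'south', 10), (3, 'east', 20);
    INSERT INTO entry VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 3, 'c');
""")

worker_id = 10

# One round trip: the site filter becomes a subquery in the WHERE clause.
entry_titles = [row[0] for row in conn.execute("""
    SELECT e.title
    FROM entry AS e
    WHERE e.site_id IN (SELECT s.id FROM site AS s WHERE s.worker_id = ?)
    ORDER BY e.title
""", (worker_id,))]

print(entry_titles)  # ['a', 'b']
```

Compared with looping over sites_list and issuing one query per site, this keeps the work in a single statement.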
