For a small project I have a registry of matches and results. Every match is between teams (a team could be a single player), and has a winner. So I have Match and Team models, joined by a MatchTeam model. This looks like so (simplified; see below for notes):
class Team(models.Model):
    ...
class Match(models.Model):
    teams = models.ManyToManyField(Team, through='MatchTeam')
    ...
class MatchTeam(models.Model):
    match = models.ForeignKey(Match, related_name='matchteams')
    team = models.ForeignKey(Team)
    winner = models.NullBooleanField()
    ...
Now I want to do some stats on the matches, starting with looking up who is the person that beats you the most. I'm not completely sure how to do this, at least, not efficiently.
In SQL (just approximating here), I would mean something like this:
SELECT their_mt.team_id, COUNT(*) AS cnt
FROM matchteam AS your_mt
JOIN matchteam AS their_mt ON your_mt.match_id = their_mt.match_id
WHERE your_mt.team_id IN <<:your teams>>
AND your_mt.winner = false
GROUP BY their_mt.team_id
ORDER BY cnt DESC
(this also needs a "their_mt is not your_mt" clause btw, but the concept is clear, right?)
While I have not tested this as SQL, it's just to give an insight to what I'm looking for: I want to find this result via a Django aggregation.
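To make the intended query concrete, here is a minimal sketch that runs the self-join against an in-memory SQLite database using only the standard library. The table name `matchteam`, the sample rows, and the helper `nemesis_counts` are assumptions for illustration (a real Django table would carry an app prefix). It also adds the "their_mt is not your_mt" clause, and assumes that "beats you" means the other team is flagged as a winner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE matchteam (
    id INTEGER PRIMARY KEY,
    match_id INTEGER NOT NULL,
    team_id INTEGER NOT NULL,
    winner INTEGER          -- 1 = won, 0 = lost, NULL = unknown
);
-- Hypothetical sample data: team 1 loses matches 1-3 and wins match 4.
INSERT INTO matchteam (match_id, team_id, winner) VALUES
    (1, 1, 0), (1, 2, 1),
    (2, 1, 0), (2, 2, 1), (2, 3, 0),
    (3, 1, 0), (3, 3, 1),
    (4, 1, 1), (4, 2, 0);
""")


def nemesis_counts(conn, your_team_ids):
    """Return (team_id, count) rows for teams that won matches you lost."""
    placeholders = ", ".join("?" for _ in your_team_ids)
    sql = f"""
        SELECT their_mt.team_id, COUNT(*) AS cnt
        FROM matchteam AS your_mt
        JOIN matchteam AS their_mt
          ON your_mt.match_id = their_mt.match_id
         AND their_mt.id <> your_mt.id        -- their_mt is not your_mt
        WHERE your_mt.team_id IN ({placeholders})
          AND your_mt.winner = 0
          AND their_mt.winner = 1             -- assumption: only count winners
        GROUP BY their_mt.team_id
        ORDER BY cnt DESC, their_mt.team_id
    """
    return conn.execute(sql, list(your_team_ids)).fetchall()


result = nemesis_counts(conn, [1])
```

The first row of `result` is the team that beat you most often; the remaining rows give the runners-up.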
According to the manual I can annotate results with an aggregation, in this case a Count. Joining MatchTeams straight on MatchTeams as I'm doing in the SQL is a bit of a shortcut maybe, as there 'should' be a Match in between? At least, I wouldn't know how to translate that into Django
So maybe I need to find certain matches for my team, and then annotate them with the count of the other team? But what is 'the other team'?
Quick write-up would look like:
nemesis = Match.objects \
.filter(matchteams__in=yourteams) \
.annotate(cnt=Count('<<otherteam>>')).order_by('-cnt')[0]
If this is the right track, how should I define the Count here?
And if it's not the right track, what is?
As is, this is all about teams instead of users. This is just to keep things simple :)
An additional question might be: should I even do this with that Django ORM stuff, or am I better off just adding SQL? That has the obvious disadvantage that you're stuck with writing very generic code (is this even possible?) or fixing your DB-backend. If not needed, I'd like to avoid that.
About the model: I really want to understand what I can change about the model to make it better, but I can't really see a solution without downsides. Let me try to explain:
I want to support matches with arbitrary amount of teams, so for instance a 5-team-match. This means I have many-to-many relationship and not one that is for instance 1 match to 2 teams. If that was the case, you could denormalize and put the winners/scores in the team table. But this is not the case.
Extra data about the results of one team (e.g. their final score, their time) is by definition a property of the relation. It cannot go into the team table (as it would be per match and you can have an undefined amount of matches), and it cannot go in the match table for the same reason mutatis mutandis.
Example: I have teams A,B,C,D and E playing a match. Team A and Team B have 10 points, the rest all have 0 points. I want to save the amount of points, and that Team A and Team B are the winners of this match.
So to the comments suggesting I need a 'better' design, by all means, if you have one I would gladly see it, but if you want to support what I support, it's going to be hard.
And as a final remark: this data can be easily retrieved in SQL, so the model seems fine to me. I'm just too much of a beginner in Django to be able to do it in Django's ORM!
Funny problem ! I think I have the answer (get the team that beats yourteams the most):
Team.objects.get( # the expected result is a team
pk=list( # filter out yourteams
filter(lambda x: x not in [ y.id for y in yourteams ],
list(
Match.objects # search matches
.filter(matchteams__in=yourteams) # in which you were involved
.filter(matchteams__winner=False) # that you lose
.annotate(cnt=Count('teams')) # and count them
.order_by('-cnt') # sort appropriately
.values_list('teams__id', flat=True) # finally get only pks
)
)
)[0] # take the first item that should be the super winner
)
I did not test it explicitly, but even if it does not work as written, I think it may be the right track.
You can do something like this
# MatchTeam rows for the matches my team lost
lost_matchteams = MatchTeam.objects.filter(team=my_team, winner=False)
# teams flagged as winners in those same matches
teams_won_matches_against_my_team = MatchTeam.objects.filter(match__in=lost_matchteams.values('match'), winner=True).values_list('team', flat=True)
But as suggested you can probably model better.
I would hold two fields in the Match model: home_team and away_team.
Simpler and more indicative.
I am trying to apply multiple conditions to my filter. The model looks like this
class modelChat(models.Model):
source = models.ForeignKey(modelEmployer,related_name = 'rsource',on_delete=models.CASCADE,null=True,default=None,blank=True)
job = models.ForeignKey(modelJob,on_delete=models.CASCADE,null=True,default=None,blank=True)
destination = models.ForeignKey(modelEmployer,related_name = 'rdestination',on_delete=models.CASCADE,null=True,default=None,blank=True)
Initially I am trying to obtain an instance of chat that involves 2 parties based on a job. At one point source can be a destination and sometimes the destination can be the source. but the job remains the same.
This is what my query looks like
querySet = modelChat.objects.filter(
(Q(source=modelEmployerSourceInstance) | Q(destination=modelEmployerSourceInstance))
&
(Q(destination=modelEmployerDestinationInstance) | Q(destination=modelEmployerDestinationInstance))
&
Q(job_id=job_id)
)
The job id is correct and I know there is only one item in the DB. However, this query always returns an empty result. Any suggestions as to why this is wrong and how I can fix it?
I can't say for sure if that's the problem since you forgot to show what you really have in your DB but here:
(Q(destination=modelEmployerDestinationInstance) | Q(destination=modelEmployerDestinationInstance))
I assume you want:
(Q(source=modelEmployerDestinationInstance) | Q(destination=modelEmployerDestinationInstance))
instead...
Note that the logic would be much more obvious with shorter names, i.e. source and destination instead of modelEmployerSourceInstance and modelEmployerDestinationInstance:
q = (
    (Q(source=source) | Q(destination=source))
    & (Q(source=destination) | Q(destination=destination))
    & Q(job_id=job_id)
)
querySet = modelChat.objects.filter(q)
Meaningful names are a good thing, but they have to be short and distinct enough. With "modelEmployerXXXInstance", you have four words to parse, and with the only distinctive (hence relevant) part of the name being in third position, your brain tends to skip over this part. The "model", "Employer" and "Instance" parts are actually just noise.
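To see why the original filter comes back empty, the two conditions can be mimicked with plain Python predicates. This is only an illustrative sketch: the `Chat` tuple and the names `matches_buggy` and `matches_fixed` are hypothetical stand-ins for the model and the Q expressions, not Django code.

```python
from collections import namedtuple

# Hypothetical stand-in for a modelChat row.
Chat = namedtuple("Chat", "source destination job_id")


def matches_buggy(chat, source, destination, job_id):
    # Second clause tests chat.destination twice, as in the question.
    return (
        (chat.source == source or chat.destination == source)
        and (chat.destination == destination or chat.destination == destination)
        and chat.job_id == job_id
    )


def matches_fixed(chat, source, destination, job_id):
    # Second clause tests both chat.source and chat.destination.
    return (
        (chat.source == source or chat.destination == source)
        and (chat.source == destination or chat.destination == destination)
        and chat.job_id == job_id
    )


# The stored chat has the two parties in the opposite roles.
chat = Chat(source="bob", destination="alice", job_id=7)
buggy = matches_buggy(chat, "alice", "bob", 7)
fixed = matches_fixed(chat, "alice", "bob", 7)
```

With the roles swapped, the buggy condition rejects the row because it never checks whether the second party is the source, while the fixed condition accepts it.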
I'm making an application in which a user can create categories to put items in them. The items share some basic properties, but the rest of them are defined by the category they belong to. The problem is that both the category and its special properties are created by the user.
For instance, the user may create two categories: books and buttons. In the 'book' category he may create two properties: number of pages and author. In the buttons category he may create different properties: number of holes and color.
Initially, I placed these properties in a JsonProperty inside the Item. While this works, it means that I query the Datastore just by specifying the category that I am looking for and then I have to filter the results of the query in the code. For example, if I'm looking for all the books whose author is Carl Sagan, I would query the Item class with category == books and the loop through the results to keep only those that match the author.
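The filter-in-code approach described above can be sketched in plain Python. This is only an illustration under assumptions: `stored_items` stands in for entities already fetched by category, each holding its JSON-encoded properties, and `filter_items` is a hypothetical helper rather than part of the Datastore API.

```python
import json

# Hypothetical stand-in for entities returned by the Datastore.
stored_items = [
    {"category": "books",
     "properties": json.dumps({"author": "Carl Sagan", "pages": 365})},
    {"category": "books",
     "properties": json.dumps({"author": "Ann Druyan", "pages": 320})},
    {"category": "buttons",
     "properties": json.dumps({"holes": 4, "color": "red"})},
]


def filter_items(items, category, **wanted):
    """Keep items of one category whose JSON properties match all of `wanted`."""
    matches = []
    for item in items:
        if item["category"] != category:
            continue
        props = json.loads(item["properties"])
        if all(props.get(name) == value for name, value in wanted.items()):
            matches.append(item)
    return matches


sagan_books = filter_items(stored_items, "books", author="Carl Sagan")
```

Every item in the category is decoded and inspected, so the cost grows with the category size rather than with the number of matches.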
While I don't really expect to have that many items per category (probably in the hundreds, unlikely to get to one thousand), this looks inefficient. So I tried to use ndb.Expando to make those special properties real properties that are indexed. I did this, adding the corresponding special properties to the item when putting it to the Datastore. So if the user creates an Item in the 'books' category and previously created in that category the special property 'author', an Item is saved with the special property expando_author = author in it. It worked as I expected until this point (dev server).
The real problem though became visible when I did some queries. While they worked in the dev server, they created composite indexes for each special/expando property, even if the query filters were equality only. And while each category can have at most five properties, it is evident that it can easily get out of control.
Example query:
items = Item.query()
for p in properties:
    items = items.filter(ndb.GenericProperty(p) == properties[p])
items.fetch()
Now, since I don't know in advance what the properties will be (though I will limit it to 5), I can't build the indexes before uploading the application, and even if I knew, it would probably mean having more indexes than I'm comfortable with. Is Expando the wrong tool for what I'm trying to do? Should I just keep filtering the results in the code using the JsonProperty? I would greatly appreciate any advice I can get.
PS. To make this post shorter I omitted a few details about what I did. If you need to know something I may have left out, just ask in the comments.
Consider storing category's properties in a single list property prefixed with category property name.
Like (forgive me, I forgot the exact Python syntax, having switched to Go):
class Item(ndb.Model):
    category = ndb.StringProperty()
    props = ndb.StringProperty(repeated=True)
book = Item(category='book', props=['author:Carl Sagan'])
button = Item(category='button', props=['holes:5'])
Then you can have a single composite index on category+props and do queries like this:
def filter_items(category, propName, propValue):
    return Item.query(Item.category == category, Item.props == propName + ':' + propValue)
And you would need a function on Item to get property values cleaned up from prop names.
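Such a helper could look roughly like the sketch below. The names `encode_prop` and `decode_props` are hypothetical, and the sketch assumes property names never contain a colon; values come back as strings because that is how they are stored in the list.

```python
def encode_prop(name, value):
    """Build the 'name:value' string stored in the list property."""
    return f"{name}:{value}"


def decode_props(props):
    """Turn ['name:value', ...] back into a {name: value} dict of strings."""
    decoded = {}
    for entry in props:
        # Split on the first colon only, so values may contain colons.
        name, _, value = entry.partition(":")
        decoded[name] = value
    return decoded


props = [encode_prop("author", "Carl Sagan"), encode_prop("pages", 365)]
decoded = decode_props(props)
```

Numeric values would need to be converted back by the caller, for example based on a type recorded with the category definition.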
What's the next best option for database-agnostic full-text search for Django without Haystack?
I have a model like:
class Paper(models.Model):
title = models.CharField(max_length=1000)
class Person(models.Model):
name = models.CharField(max_length=100)
class PaperReview(models.Model):
paper = models.ForeignKey(Paper)
person = models.ForeignKey(Person)
I need to search for papers by title and reviewer name, but I also want to search from the perspective of a person and find which papers they have and haven't reviewed. With Haystack, it's trivial to implement a full-text index to search by title and name fields, but as far as I can tell, there's no way to do the "left outer join" necessary to find papers without a review by a specific person.
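For reference, the "left outer join" in question can be sketched with the standard library's sqlite3 module. The table names, the sample rows, and the helper `unreviewed_titles` are assumptions for illustration; they mirror the models above but are not what Django or Haystack would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paper (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE paperreview (
    id INTEGER PRIMARY KEY,
    paper_id INTEGER,
    person_id INTEGER
);
-- Hypothetical sample data.
INSERT INTO paper VALUES
    (1, 'Graph search'), (2, 'Text indexing'), (3, 'Query planning');
INSERT INTO person VALUES (1, 'Ann'), (2, 'Ben');
INSERT INTO paperreview (paper_id, person_id) VALUES (1, 1), (2, 1), (2, 2);
""")


def unreviewed_titles(conn, person_id):
    """Titles of papers that have no review by the given person."""
    rows = conn.execute(
        """
        SELECT paper.title
        FROM paper
        LEFT OUTER JOIN paperreview
          ON paperreview.paper_id = paper.id
         AND paperreview.person_id = ?
        WHERE paperreview.id IS NULL      -- no matching review row
        ORDER BY paper.id
        """,
        (person_id,),
    )
    return [title for (title,) in rows]


ann_todo = unreviewed_titles(conn, 1)
ben_todo = unreviewed_titles(conn, 2)
```

The person filter sits in the join condition rather than the WHERE clause, which is what keeps papers reviewed only by other people in the result.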
Haystack is just a wrapper that exposes a few different search engine backends:
Solr
ElasticSearch
Whoosh
Xapian
There might be other backends as well available as plugins.
So the real question here is, is there a search backend that gives me the desired functionality, and does haystack expose that functionality?
The answer to that is, you can probably use elasticsearch*, but note the asterisk.
Generally, when creating a search index, it's a good idea to think about the documents the same way you would if you were designing a non-relational database, where you want those documents to be as flat as possible.
So one possibility might be to have an array of char fields on a paperreview index. The array would contain all of the related foreign key references.
Another might be to use "nested documents" in elasticsearch.
And lastly, to use "parent/child documents" in elasticsearch.
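The first of these options, a flat document carrying an array of related references, can be sketched in plain Python. This is one possible reading of the idea, with one document per paper; the field names (`reviewer_ids`, `reviewer_names`) and the sample data are assumptions, and no search engine API is involved.

```python
# Hypothetical sample data mirroring the Paper, Person and PaperReview models.
papers = {1: "Graph search", 2: "Text indexing", 3: "Query planning"}
people = {1: "Ann", 2: "Ben"}
reviews = [(1, 1), (2, 1), (2, 2)]  # (paper_id, person_id)


def build_documents(papers, people, reviews):
    """Flatten each paper and its reviewers into a single indexable dict."""
    docs = []
    for paper_id, title in sorted(papers.items()):
        reviewer_ids = sorted(pid for (p, pid) in reviews if p == paper_id)
        docs.append({
            "paper_id": paper_id,
            "title": title,
            "reviewer_ids": reviewer_ids,
            "reviewer_names": [people[pid] for pid in reviewer_ids],
        })
    return docs


docs = build_documents(papers, people, reviews)
# "Not reviewed by Ben" becomes a simple negative filter on the array field.
not_reviewed_by_ben = [d["title"] for d in docs if 2 not in d["reviewer_ids"]]
```

With the reviewer ids stored on the document, the outer join turns into a "must not contain" filter, at the cost of reindexing a paper whenever its reviews change.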
You can still use haystack for indexing, with some hacking, but you will probably want to use one of the raw backends directly, such as pyelasticsearch or pyes.
http://www.elasticsearch.org/guide/reference/mapping/nested-type/
http://www.elasticsearch.org/guide/reference/mapping/parent-field/
http://pyelasticsearch.readthedocs.org/en/latest/
http://pyes.readthedocs.org/en/latest/
I know this question is older, but I spent some time investigating this recently and answered it elsewhere as well. It is actually not too hard to implement this yourself, and I wanted to share.
I found the SearchVector/SearchQuery approach actually does not catch all cases, for example partial words (see https://www.fusionbox.com/blog/detail/partial-word-search-with-postgres-full-text-search-in-django/632/ for reference). You can implement your own without much trouble, depending on your constraints.
Example, within a viewset's get_queryset method:
# Needs: import re; from functools import reduce; from django.db.models import Q
...other params...
search_terms = self.request.GET.get('q')
if search_terms:
    # remove possible other delimiters and other chars
    # that could interfere
    cleaned_terms = re.sub(r'[!\'()|&;,]', ' ', search_terms).strip()
    if cleaned_terms:
        # Check against all the params we want
        # apply to previous terms' filtered results
        q = reduce(
            lambda p, n: p & n,
            map(
                lambda word:
                    Q(your_property__icontains=word)
                    | Q(second_property__icontains=word)
                    | Q(third_property__icontains=word),
                cleaned_terms.split()
            )
        )
        qs = YourModel.objects.filter(q)
return qs
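The matching rule that the Q expression builds (every word must appear in at least one of the fields) can be checked without a database. The sketch below is a plain-Python approximation: `clean_terms` and `matches` are hypothetical helpers, and a lower-cased substring test stands in for `icontains`.

```python
import re


def clean_terms(search_terms):
    """Replace delimiter characters with spaces and split into words."""
    return re.sub(r"[!'()|&;,]", " ", search_terms).split()


def matches(record, words, fields):
    """True if every word occurs (case-insensitively) in at least one field."""
    return all(
        any(word.lower() in str(record[field]).lower() for field in fields)
        for word in words
    )


# Hypothetical sample records.
records = [
    {"title": "Partial word search", "author": "Ann Lee"},
    {"title": "Full-text indexing", "author": "Ben Wordsworth"},
]
words = clean_terms("word, ann!")
hits = [r["title"] for r in records if matches(r, words, ["title", "author"])]
```

The second record contains "word" as part of "Wordsworth" but has no "ann" in any field, so only the first record is a hit.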
I use Haystack + Elasticsearch and so far it's working pretty well. I don't think it's trivial, though. You can easily implement your requirement if there's an association between paper and person.
I ended up using djorm-ext-pgfulltext, which provides a simple Django interface for PostgreSQL's built-in full text search features.
I'm not sure the best way to describe what it is that I'm trying to do so forgive my title.
I have two models, User and Group. Group contains a field, members, which is a ManyToManyField referring to User.
Given a User, I want to find all of the Groups to which that user belongs.
My idea would be to do something like this:
groups = Group.objects.filter(user in members)
Something like that, even though I realize that this isn't right.
I tried reading through this link but couldn't figure out how to apply:
http://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships
Thanks
EDIT:
Figured it out
groups = Group.objects.filter(members__username=user.username)
If you have the user and you want his groups, then start querying from the user, not the other way around ;)
Here's an example:
james = User.objects.get(pk=123)
james_groups = james.group_set.all()
The most concise way is probably
groups = user1.group_set.all()
which gives you a queryset that is iterable.
I have two simple models: Book, and Author
Each Book has one Author, linked through a foreignkey.
Things work normally until I try to use defer/only on an annotation:
authors=Author.objects.all().annotate(bookcount=Count('books'))
that works. The query looks like:
select table_author.name, table_author.birthday, COUNT(table_book.id) as bookcount
from table_book left outer join table_author on table_author.id=table_book.author_id
group by table_author.id
so very simple - selecting everything from author, and additionally selecting a count of the books.
But when I do the following, everything changes:
simple=authors.defer('birthday')
now, the simple query looks like this:
select COUNT(table_book.id) as bookcount from table_book left outer join
table_author on table_author.id=table_book.author_id group by table_author.id
and it has completely lost the extra information. What's the deal?
Well, this would seem to be a bug. There's already a ticket, but it hasn't had much attention for a while. Might be worth making a post to the django-developers Google group to chivvy things along.