Runtime Foreign Key vs Integerfield - python

I have a problem. I already have two solutions for it, but I was wondering which of them is faster.
I guess that the second solution is not only more convenient to use but also faster, but I want to be sure, which is why I'm asking.
My problem is that I want to group multiple rows together. The group won't hold any metadata, so I'm only interested in runtime.
On the one hand, I can use an integer field and filter on it later when I need to get all entries that belong to the group. I guess this has a runtime of O(n).
class SingleEntries(models.Model):
    name = models.CharField(max_length=20)
    group = models.IntegerField(null=True)
def find_all_group_members(id):
    return SingleEntries.objects.filter(group=id)
The second solution, and probably the more practical one, would be to create a foreign key to another model, using only the pk there.
Then I can use the reverse relation to find all the entries that belong to the group.
class Group(models.Model):
    id = models.AutoField(primary_key=True)
class SingleEntries(models.Model):
    name = models.CharField(max_length=20)
    group = models.ForeignKey(Group, on_delete=models.CASCADE, null=True)
def find_all_group_members(id):
    return Group.objects.get(id=id).singleentries_set.all()

The first is more efficient, since this will use one query, whereas the latter will first fetch the Group, and then another one for the SingleEntries.
Indeed, if you work with:
SingleEntries.objects.filter(group=id)
this will make a simple query:
SELECT appname_singleentries.*
FROM appname_singleentries
WHERE appname_singleentries.group_id = id
It thus does not first fetch the Group into memory.
The latter will however make two queries. Indeed, it will first make a query to retrieve the Group, and then it will make a query like the one above to fetch the SingleEntries.
The two are also semantically not entirely the same: if there is no such group, then the former will return an empty QuerySet, whereas the latter will raise a Group.DoesNotExist exception.
But you can model this with:
class Group(models.Model):
    pass
class SingleEntries(models.Model):
    name = models.CharField(max_length=20)
    group = models.ForeignKey(Group, on_delete=models.CASCADE, null=True)
def find_all_group_members(id):
    return SingleEntries.objects.filter(group_id=id)
So you can use a Group model without having to retrieve the Group first.
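On the runtime question itself: whether the lookup is a full scan or an index search depends on whether the filtered column is indexed. Django creates an index for a ForeignKey column by default, while a plain IntegerField only gets one with db_index=True. The sketch below is not Django code; it uses the standard-library sqlite3 module, with a hypothetical table and index name, to show how the query plan changes once the group column is indexed.

```python
import sqlite3

# Hypothetical stand-in table for SingleEntries; "grp" plays the role of the group column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, name TEXT, grp INTEGER)")
conn.executemany(
    "INSERT INTO entries (name, grp) VALUES (?, ?)",
    [(f"entry{i}", i % 10) for i in range(1000)],
)


def plan_for_group_lookup(connection):
    """Return SQLite's query plan text for filtering entries by group."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM entries WHERE grp = ?", (3,)
    ).fetchall()
    # The last column of each plan row is the human-readable detail.
    return " ".join(row[-1] for row in rows)


# Without an index, the filter has to scan the whole table.
plan_without_index = plan_for_group_lookup(conn)

# With an index on the group column, the filter becomes an index search.
conn.execute("CREATE INDEX idx_entries_grp ON entries (grp)")
plan_with_index = plan_for_group_lookup(conn)

print(plan_without_index)
print(plan_with_index)
```

The exact plan wording differs between SQLite versions, so treat the printed text as illustrative only.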

If the groups are static in nature, meaning you don't expect new groups to be added to your system, you can use choices in Django.
Define the choices as below:
class GroupType(models.IntegerChoices):
    GROUP_0 = 0, "Group 0 name"
    GROUP_1 = 1, "Group 1 name"
    GROUP_2 = 2, "Group 2 name"
And use it as the choices of the field in the SingleEntries model as below:
class SingleEntries(models.Model):
    name = models.CharField(max_length=20)
    group = models.IntegerField(choices=GroupType.choices, default=<set default here>)
If the groups are dynamic, meaning users can create groups whenever they want, in that case, go with your second approach of having another model for group.

Related

Querying Many To Many relationship by number of joins using Django

I have two models: ActorModel and FilmModel joined as follows:
class FilmModel(models.Model):
    actors = models.ManyToManyField('ActorModel', blank=True, related_name='film_actors')
class ActorModel(models.Model):
    name = models.CharField(max_length=40)
    def __str__(self):
        return self.imdb_id
I want to filter my ActorModel for any instance which has more than 5 joins with the FilmModel. I can do this as follows:
actors = ActorModel.objects.all()
more_than_five_films = []
for actor in actors:
    actor_film_list = FilmModel.objects.filter(actors__imdb_id=str(name))
    if len(actor_film_list)>5:
        more_than_five_films.append(actor)
However, using the above code uses lots of processing power. Is there a more efficient way of finding the actors with more than 5 joins? Could I do this at the filtering stage for example?
You could use a query like this:
from django.db.models import Count
more_than_five_films = ActorModel.objects.annotate(count=Count('film_actors')).filter(count__gt=5)
You access the FilmModel objects of an ActorModel through the related_name, annotate a new field named count with the number of FilmModel objects related to each ActorModel object, and then keep only the objects whose count is greater than 5.
As for the code you provided: avoid calling len() on a queryset when you only need the number of rows, because len() evaluates the whole queryset and loads every object, which is expensive. Use the count() method instead, which returns the same number via a COUNT query. It looks like this:
FilmModel.objects.filter(actors__imdb_id=str(name)).count()
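Conceptually, the annotate(...).filter(count__gt=5) query above is a GROUP BY with a HAVING clause, evaluated in the database in a single round trip. The following is a minimal sketch of that idea using the standard-library sqlite3 module rather than Django; the table and column names are hypothetical and not what Django would generate for these models.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film_actors (film_id INTEGER, actor_id INTEGER);
""")
conn.executemany(
    "INSERT INTO actor (id, name) VALUES (?, ?)",
    [(1, "busy actor"), (2, "occasional actor")],
)
# Actor 1 appears in 6 films, actor 2 in 2 films.
links = [(film_id, 1) for film_id in range(1, 7)] + [(1, 2), (2, 2)]
conn.executemany("INSERT INTO film_actors (film_id, actor_id) VALUES (?, ?)", links)

# Count the films per actor and keep only actors with more than 5 of them.
more_than_five = conn.execute("""
    SELECT actor.name, COUNT(film_actors.film_id) AS film_count
    FROM actor
    LEFT OUTER JOIN film_actors ON film_actors.actor_id = actor.id
    GROUP BY actor.id
    HAVING COUNT(film_actors.film_id) > 5
""").fetchall()
print(more_than_five)
```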

Django filter by the number of rows matching a certain condition in a ManyToMany

I need to filter for objects where the number of elements in a ManyToMany relationship matches a condition. Here's some simplified models:
class Place(models.Model):
    name = models.CharField(max_length=100)
class Person(models.Model):
    type = models.CharField(max_length=1)
    place = models.ManyToManyField(Place, related_name="people")
I tried to do this:
c = Count(Q(people__type='V'))
p = Place.objects.annotate(v_people=c)
But this just makes the .v_people attribute count the number of People.
Since Django 2.0, you can use the filter=... parameter of the Count(..) function for this:
Place.objects.annotate(
v_people=Count('people', filter=Q(people__type='V'))
)
So this will assign to v_people the number of people with type='V' for that specific Place object.
An alternative is to .filter(..) the relation first:
Place.objects.filter(
Q(people__type='V') | Q(people__isnull=True)
).annotate(
v_people=Count('people')
)
Here we thus filter the relation such that we allow people that have type='V', or no people at all (since it is possible that the Place has no people). We then count the related model.
This generates a query like:
SELECT `place`.*, COUNT(`person_place`.`person_id`) AS `v_people`
FROM `place`
LEFT OUTER JOIN `person_place` ON `place`.`id` = `person_place`.`place_id`
LEFT OUTER JOIN `person` ON `person_place`.`person_id` = `person`.`id`
WHERE `person`.`type` = 'V' OR `person_place`.`person_id` IS NULL
GROUP BY `place`.`id`
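Note that the two approaches do not return the same rows. A place whose people are all of another type is dropped by the WHERE-based variant, while the conditional count keeps it with a count of 0. The sketch below illustrates this with the standard-library sqlite3 module instead of Django; the schema is a hypothetical simplification, and the conditional count is written with CASE WHEN, which is only one of the forms a backend may use for a filtered aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE person (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE person_place (person_id INTEGER, place_id INTEGER);
    INSERT INTO place VALUES (1, 'has V'), (2, 'only X'), (3, 'empty');
    INSERT INTO person VALUES (1, 'V'), (2, 'X'), (3, 'X');
    INSERT INTO person_place VALUES (1, 1), (2, 1), (3, 2);
""")

joins = """
    FROM place
    LEFT OUTER JOIN person_place ON place.id = person_place.place_id
    LEFT OUTER JOIN person ON person_place.person_id = person.id
"""

# Conditional count: every place is kept, only 'V' people are counted.
conditional = conn.execute(f"""
    SELECT place.name, COUNT(CASE WHEN person.type = 'V' THEN person.id END)
    {joins}
    GROUP BY place.id ORDER BY place.id
""").fetchall()

# Filter first, then count: places with only non-'V' people disappear.
where_filtered = conn.execute(f"""
    SELECT place.name, COUNT(person_place.person_id)
    {joins}
    WHERE person.type = 'V' OR person_place.person_id IS NULL
    GROUP BY place.id ORDER BY place.id
""").fetchall()

print(conditional)
print(where_filtered)
```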

django annotate a function that returns the maximum of a field among all objects that has a feature

Suppose that I have this model:
class Student(models.Model):
    class_name = models.CharField()
    mark = models.IntegerField()
And I want to get all the students that have the highest mark in their class. I can get the student who has the highest mark in all the classes like it is mentioned in this post. But I want all the students that have the highest mark in their class, something like this:
Student.objects.annotate(
    highest_mark_in_class=Max(
        Students.objects.filter(class_name=F('class_name'))
        .filter(mark=highest_mark_in_class)
    )
)
I can do this with a for loop, but with a large database for loops are rather slow. I don't know if it's possible to write such a query in one line?
You will have to use 2 queries for that:
import operator
from functools import reduce
from django.db.models import Max, Q
best_marks = Student.objects.values('class_name').annotate(mark=Max('mark'))
q_object = reduce(operator.or_, (Q(**x) for x in best_marks))
queryset = Student.objects.filter(q_object)
The first query gets the best mark for each class.
The second query gets all students whose mark and class match one item of that list.
Note that if you call .annotate(best_mark=Max('mark')) instead of .annotate(mark=Max('mark')), you will have to do some extra work to rename best_mark to mark before passing the dictionary to the Q object, whereas with matching names Q(**x) is quite convenient.
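The logic of the two steps can be illustrated without a database. The following plain-Python sketch mirrors it on a list of dictionaries: step 1 builds one {class_name, mark} condition per class, and step 2 keeps every student matching any of those conditions, which is what OR-ing the Q objects does. The sample data and the name key are made up for illustration.

```python
# Hypothetical sample rows; "name" is not part of the original model.
students = [
    {"name": "Ann", "class_name": "A", "mark": 90},
    {"name": "Bob", "class_name": "A", "mark": 90},
    {"name": "Cid", "class_name": "A", "mark": 70},
    {"name": "Dee", "class_name": "B", "mark": 60},
]

# Step 1: best mark per class, shaped like the rows of the first query.
best_by_class = {}
for s in students:
    current = best_by_class.get(s["class_name"])
    if current is None or s["mark"] > current:
        best_by_class[s["class_name"]] = s["mark"]
best_marks = [{"class_name": c, "mark": m} for c, m in best_by_class.items()]

# Step 2: keep students matching ANY (class_name, mark) condition,
# i.e. the OR of all per-class conditions.
top_students = [
    s["name"]
    for s in students
    if any(all(s[key] == value for key, value in cond.items()) for cond in best_marks)
]
print(top_students)
```

Ties are kept, so both students sharing the best mark in class A appear in the result.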

Highly challenging queryset filtering, sorting and annotation (in Django-based app)

I have a Django-based web-app where users congregate and chat with one another. I've just finished writing a feature whereby users can make their own "chat groups", centered around any topic of interest. These could either be private, or publicly visible.
My next challenge is showing a list of all existing public groups, paginated, and sorted by the most happening group first. After some deep thinking, I've decided that the most happening group is one which sees the most unique visitors (silent or otherwise) in the previous 60 mins. To be sure, by unique I mean distinct users, and not the same user hitting a group again and again.
What's the most efficient way to get the desired, ordered query-set in the get_queryset() method of my class-based view associated to this popularity listing? Secondly, what's the most efficient way to also annotate total distinct views to each group object in the same queryset, so that I can additionally show total views, while sort according to what's currently hot?
Relevant models are:
class Group(models.Model):
    topic = models.TextField(validators=[MaxLengthValidator(200)], null=True)
    owner = models.ForeignKey(User)
    private = models.CharField(max_length=50, default=0)
    created_at = models.DateTimeField(auto_now_add=True)
class GroupTraffic(models.Model):
    visitor = models.ForeignKey(User)
    which_group = models.ForeignKey(Group)
    time = models.DateTimeField(auto_now_add=True)
The relevant view is:
class GroupListView(ListView):
    model = Group
    form_class = GroupListForm
    template_name = "group_list.html"
    paginate_by = 25
    def get_queryset(self):
        return Group.objects.filter(private=0,date__gte=???).distinct('grouptraffic__visitor').annotate(recent_views=Count('grouptraffic__???')).order_by('-recent_views').annotate(total_views=Count('grouptraffic__which_group=group'))
As you can see, I've struggled rather mightily in the get_queryset(self) method above, annotating twice and what not. Please advise!
You cannot combine annotate() and distinct() in a single django query. So you can try like:
import datetime
date = datetime.datetime.now()-datetime.timedelta(hours=1)
The next queries get the group traffic with unique visitors:
new_traff = GroupTraffic.objects.filter(time__gte=date).distinct('visitor','which_group').values_list('id',flat=True)
trendingGrp_ids = GroupTraffic.objects.filter(id__in=new_traff).values('which_group').annotate(total=Count('which_group')).order_by('-total')
The above query will get you trending groupids ordered by total like:
[{'total': 4, 'which_group': 2}, {'total': 2, 'which_group': 1}, {'total': 1, 'which_group': 3}]
Here total refers to the number of unique visitors for each group in the last 60 minutes.
Now iterate over trendingGrp_ids to get the trending groups (trendingGrps) with their views:
trendingGrps = [Group.objects.filter(id=grp['which_group']).extra(select={"views":grp['total']})[0] for grp in trendingGrp_ids]
Update:
To get all public groups and sort them by how hot they are, measured by the traffic they received in the past hour:
new_traff = GroupTraffic.objects.filter(time__gte=date,which_group__private=0).distinct('visitor','which_group').values_list('id',flat=True)
trendingGrp_ids = GroupTraffic.objects.filter(id__in=new_traff).values('which_group').annotate(total=Count('which_group')).order_by('-total')
trendingGrps = [Group.objects.filter(id=grp['which_group']).extra(select={"views":grp['total']})[0] for grp in trendingGrp_ids]
trndids = [grp['which_group'] for grp in trendingGrp_ids]
nonTrendingGrps = Group.objects.filter(private=0).exclude(id__in=trndids).extra(select={"views":0})
allGrps = trendingGrps + list(nonTrendingGrps)  # list.append() returns None, so concatenate instead
1) Create a separate function that tallies the distinct views in a chatbox. For every chatbox, put the result in a list. Return the biggest value in the list and assign it to a variable. Import the function and filter with the variable.
2) Make a set for each box. The set contains all the distinct users of the chatbox. Filter by the length of the set.
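The set-based idea can be sketched in plain Python as follows. This is only an illustration under assumptions: the function name and the (visitor_id, group_id, visit_time) tuple layout are hypothetical, and the traffic rows are assumed to be already loaded into memory, which may not be practical for a large table.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rank_groups_by_unique_visitors(traffic, now, window=timedelta(hours=1)):
    """Rank groups by distinct visitors seen within the time window.

    traffic: iterable of (visitor_id, group_id, visit_time) tuples.
    Returns a list of (group_id, unique_visitor_count), hottest first.
    """
    cutoff = now - window
    visitors_by_group = defaultdict(set)
    for visitor_id, group_id, visit_time in traffic:
        if visit_time >= cutoff:
            # A set ignores repeat visits by the same user.
            visitors_by_group[group_id].add(visitor_id)
    return sorted(
        ((group_id, len(visitors)) for group_id, visitors in visitors_by_group.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )


now = datetime(2024, 1, 1, 12, 0)
traffic = [
    (1, "g1", now - timedelta(minutes=5)),
    (1, "g1", now - timedelta(minutes=4)),   # repeat visit, counted once
    (2, "g1", now - timedelta(minutes=30)),
    (1, "g2", now - timedelta(minutes=10)),
    (3, "g2", now - timedelta(hours=2)),     # outside the 60-minute window
]
ranking = rank_groups_by_unique_visitors(traffic, now)
print(ranking)
```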

Understanding ndb key class vs KeyProperty

I've looked through the documentation, the docs and SO questions and answers and am still struggling with understanding a small piece of this. Which should you choose and when?
This is what I've read so far (just sample):
ndb documentation
movie database structure on SO
Parent Key issues
The key class seems pretty straightforward to me. When you create an ndb entity, the datastore automatically creates a key for you, usually in the form of key(Kind, id), where the id is created for you.
So say you have these two models:
class Blah(ndb.Model):
    last_name = ndb.StringProperty()
class Blah2(ndb.Model):
    first_name = ndb.StringProperty()
    blahkey = ndb.KeyProperty()
So just using the key kind and you want to make Blah1 a parent (or have several family members with the same last name)
lname = Blah(last_name = "Bonaparte")
l_key = lname.put() **OR**
l_key = lname.key.id() # spits out some long id
fname_key = l_key **OR**
fname_key = ndb.Key('Blah', lname.last_name) # which is more readable..
then:
lname = Blah2( parent=fname_key, first_name = "Napoleon")
lname.put()
lname2 = Blah2( parent=fname_key, first_name = "Lucien")
lname2.put()
So far so good (I think). Now about the KeyProperty for Blah2. Assume Blah1 is still the same.
lname3 = Blah2( first_name = "Louis", blahkey = fname_key)
lname3.put()
Is this correct ?
How to query various things
Query Last Name:
Blah.query() # all last names
Blah.query(last_name='Bonaparte') # That specific entity.
First Name:
Blah2.query()
napol = Blah2.query(first_name = "Napoleon")
bonakey = napol.key.parent().get() # returns Bonaparte's key ??
bona = bonakey.get() # I think this might be redundant
This is where I get lost: how to look up Bonaparte from the first name by using either the key or the KeyProperty. I didn't add it here, and perhaps should have, but there is also the discussion of parents, grandparents and great-grandparents, since keys keep track of ancestors/parents.
How and why would you use KeyProperty vs the inherent key class. Also imagine you had 3 sensors s1, s2, s3. Each sensor had thousands of readings but you want to keep readings associated with s1 so that you could graph say All readings for today for s1. Which would you use? KeyProperty or the key class ? I apologize if this has been answered elsewhere but I didn't see a clear example/guide about choosing which and why/how.
I think the confusion comes from using a Key. A Key is not associated with any properties inside of an entity, it is only a unique identifier to locate a single entity. It can be either a number or a string.
Fortunately, all your code looks good except for this one line:
fname_key = ndb.Key('Blah', lname.last_name) # which is more readable..
Constructing a Key takes a unique ID, which is not the same as a property. That is, it won't associate the variable lname.last_name with the property last_name. Instead, you can create your record like this:
lname = Blah(id = "Bonaparte")
lname.put()
lname_key = ndb.Key('Blah', "Bonaparte")
You are guaranteed to have only one Blah entity with that ID. In fact, if you use a string like last_name as the ID, you don't need to store it as a separate property. Think of the entity ID as an extra string property that is unique.
Next, be careful not to assume that Blah.last_name and Blah2.first_name are unique in your queries:
lname = Blah2( parent=fname_key, first_name = "Napoleon")
lname.put()
If you do this more than once, there will be multiple entities with a first_name of Napoleon (all with the same parent key).
Continuing with your code from above:
napol = Blah2.query(first_name = "Napoleon")
bonakey = napol.key.parent().get() # returns Bonaparte's key ??
bona = bonakey.get() # I think this might be redundant
napol holds a Query, not a result. You need to call napol.fetch() to get all entities with "Napoleon" (or napol.get() if you're sure there is just one entity).
bonakey is the opposite, it holds the parent entity because of the get() and not Bonaparte's key. If you left the .get() off, then bona would correctly have the parent.
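To make the difference between a parent key and a KeyProperty more concrete, here is a toy model in plain Python. It is not the ndb API: ToyKey is a hypothetical class that only mimics the idea that a key is a path of (kind, id) pairs, so a parent is part of the child's own key, whereas a KeyProperty is just a stored reference to another entity's key.

```python
class ToyKey:
    """Toy stand-in for a datastore key: a path of (kind, id) pairs."""

    def __init__(self, *path):
        self.path = tuple(path)

    def parent(self):
        # Dropping the last pair gives the parent key; a root key has none.
        return ToyKey(*self.path[:-1]) if len(self.path) > 1 else None

    def __eq__(self, other):
        return isinstance(other, ToyKey) and self.path == other.path

    def __hash__(self):
        return hash(self.path)


bonaparte_key = ToyKey(("Blah", "Bonaparte"))

# Ancestor relationship: the parent is part of the child's own key.
napoleon_key = ToyKey(("Blah", "Bonaparte"), ("Blah2", 1))

# KeyProperty-style relationship: the child has a root key of its own and
# merely stores a reference to the other entity as ordinary data.
louis_key = ToyKey(("Blah2", 2))
louis_entity = {"first_name": "Louis", "blahkey": bonaparte_key}

print(napoleon_key.parent() == bonaparte_key)
print(louis_key.parent() is None)
```

In this picture, a stored reference such as blahkey can be changed later, while a parent is fixed because it is part of the key itself.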
Finally, your question about sensors. You may not need KeyProperty or "inherent" keys. If you have a Readings class like this:
class Readings(ndb.Model):
    sensor = ndb.StringProperty()
    reading = ndb.IntegerProperty()
then you can store them all in a single table without keys. (You may want to include a timestamp or other attribute.) Later, you can retrieve them with this query:
s1_readings = Readings.query(Readings.sensor == 'S1').fetch()
I'm new to NDB too, and I don't understand all of it yet, but I think that when you create a Blah2 with a parent for Napoleon, you will need the parent to query for it, or it will not appear. For example:
napol = Blah2.query(first_name = "Napoleon")
will not get anything (and you are not using the right format for NDB), but using the parent will:
napol = Blah2.query(ancestor=fname_key).filter(Blah2.first_name == "Napoleon").get()
I don't know if this sheds some light on your question.
