Django: avoid multiple DB queries for recursive model

I have the following models:
class Topic(models.Model):
    ...

class Article(models.Model):
    ...

class ArticleInTopic(models.Model):
    topic = models.ForeignKey(Topic, on_delete=models.PROTECT)
    article = models.ForeignKey(Article, on_delete=models.PROTECT)
    # A recursive relation has to be declared with 'self'; the class name is not bound yet here
    depends_on = models.ForeignKey('self', on_delete=models.PROTECT)

    class Meta:
        unique_together = ('topic', 'article', 'depends_on')
With this set of models I'm trying to express the following situation: there are some study topics, each consisting of multiple articles. However, to learn a topic one should read its articles not in arbitrary order but in the order defined by article dependencies. This means that, in the context of a topic, an article may have another article that must be read before it. It is guaranteed that the article on which the current article depends comes from the same topic.
So, basically, the whole structure is a directed acyclic graph (acyclicity is guaranteed) with parent-child relations between nodes. In the business logic of my app I'm going to sort the graph topologically so I can tell the user which Article to read 1st, 2nd, etc.
However, the problem is the way Django fetches ForeignKey data from the database. AFAIK it ends up issuing N+1 queries to fetch all the foreign keys. There is a solution to this, select_related, but it only works when you can specify the exact fields that need to be fetched in the same query as the main data. In my case this is not possible, because I do not know in advance how many children each node has, so I cannot list them all in select_related.
Instead I was thinking about fetching all ArticleInTopic objects that have the same topic foreign key (the topic name comes from the user, so I know it by the time I need to show them the Articles) and then sorting them topologically in memory. But I am not sure whether Django will understand that it has already fetched all the required objects when I access the depends_on field of one of them.
For example, I fetch 2 ArticleInTopic objects for topic 'Cars'; let's say those objects are A and B, and B depends on A. Django has queried them already and they are in memory. Now, what happens if I do B.depends_on? Will Django make another request to the DB in order to select A? Or is it smart enough to understand that A has already been fetched by the previous request? If it is not, is there any way to prevent the extra DB queries?
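A minimal sketch of the in-memory approach described above, not an authoritative answer. As far as I know, Django keeps no identity map, so B.depends_on on a plain queryset result issues its own query, while the raw depends_on_id value is already loaded with each row. The sketch therefore works on (pk, depends_on_id) pairs and sorts them with the standard library's graphlib (Python 3.9+). The reading_order helper and the topic__name lookup in the comment are hypothetical, and one row per node is assumed.

```python
from graphlib import TopologicalSorter


def reading_order(rows):
    """Return primary keys in an order where every dependency comes first.

    rows is an iterable of (pk, depends_on_pk) pairs; depends_on_pk may be
    None for a node without a prerequisite.
    """
    graph = {}
    for pk, depends_on_pk in rows:
        predecessors = graph.setdefault(pk, set())
        if depends_on_pk is not None:
            predecessors.add(depends_on_pk)
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    return list(TopologicalSorter(graph).static_order())


# With Django, the pairs could come from one query (hypothetical lookup):
# rows = ArticleInTopic.objects.filter(topic__name='Cars').values_list('id', 'depends_on_id')
rows = [(3, 2), (1, None), (2, 1)]
order = reading_order(rows)
```

Because only integer ids are compared, no related object is ever loaded lazily; the full objects can be looked up afterwards from a dict keyed by pk.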

Related

Django 4.x - Conditional QuerySet for Pagination and a many-to-many relationship

Disclaimer: I have searched and a question tackling this particular challenge could not be found at the time of posting.
The Requirement
For a Class Based View I need to implement Pagination for a QuerySet derived through a many to many relationship. Here's the requirement with a more concrete description:
Many Library Records can belong to many Collections
Web pages are required for most (but not necessarily all) Collections, and so I need to build views/templates/urls based on what the client identifies as required
Each Collection Page displaying the relevant Library Records requires Pagination, as there may be hundreds of records to display.
The First Approach
And so with this requirement in mind I approached this as I normally would when building a CBV with Pagination. However, this approach did not allow me to meet the requirement. What I quickly discovered was that the Pagination method in the CBV was building the object based on the declared model, but the many to many relationship was not working for me.
I explored the use of object in the template, but after a number of attempts I was getting nowhere. I need to display Library Record objects but the many to many relationship demands that I do so after determining the records based on the Collection they belong to.
EDIT - Addition of model
models.py
class CollectionOrder(models.Model):
    collection = models.ForeignKey(
        Collection,
        related_name='collection_in_collection_order',
        on_delete=models.PROTECT,
        null=True,
        blank=True,
        verbose_name='Collection'
    )
    record = models.ForeignKey(
        LibraryRecord,
        related_name='record_in_collection_order',
        on_delete=models.PROTECT,
        null=True,
        blank=True,
        verbose_name='Library record',
    )
    order_number = models.PositiveIntegerField(
        blank=True,
        null=True,
    )

Please do not work with record.record.id: this makes a separate query for each CollectionOrder object, so if there are 100 CollectionOrder objects, that means 100 extra queries and eventually 102 queries in total. If the number of matches is large, the view will eventually no longer respond within a reasonable time.
Furthermore, pk__in=library_record_ids will not respect the order of library_record_ids. It can return the LibraryRecords in any order, as long as their primary keys are members of the list.
You can query with:
def get_queryset(self):
    return LibraryRecord.objects.filter(
        collectionorder__collection__collection='collection-name'
    ).order_by('collectionorder__order_number')
Where collectionorder is the related_query_name=… for the ForeignKey named record from CollectionOrder to the LibraryRecord model. If you did not specify a value for the related_query_name=… parameter, it takes the value of the related_name=… parameter, and if you did not specify that one either, it uses the name of the source model (the model where the relation is defined) in lowercase, which would be collectionorder. Note that the CollectionOrder model shown in the question does set related_name='record_in_collection_order', so with that model the lookup has to start with record_in_collection_order instead of collectionorder.
This will thus respect collectionorder__order_number as the ordering condition and do so in a single database query, minimizing the number of queries to the database.
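If a pk__in lookup has to be kept for some reason, the order of the id list can be restored in Python after the fetch. A minimal sketch, assuming the records expose a pk attribute; in_given_order is a hypothetical helper, and SimpleNamespace objects stand in for LibraryRecord instances:

```python
from types import SimpleNamespace


def in_given_order(records, ordered_ids):
    """Return records sorted by the position of their pk in ordered_ids."""
    position = {pk: index for index, pk in enumerate(ordered_ids)}
    return sorted(records, key=lambda record: position[record.pk])


# Stand-ins for what filter(pk__in=[3, 1, 2]) might return, in arbitrary order.
fetched = [SimpleNamespace(pk=1), SimpleNamespace(pk=2), SimpleNamespace(pk=3)]
ordered = in_given_order(fetched, [3, 1, 2])
ordered_pks = [record.pk for record in ordered]
```

This sorts an already fetched list, so it does not combine with database-side pagination; ordering in the query, as shown above, remains the better fit for a paginated view.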

Hopefully, this Q&A helps someone else. If in reading the following approach you can think of ways to refactor/optimize I'd love to learn. Note: I deliberately did not implement Pythonic List Comprehension for my personal preference of readability.
What I ended up doing was adding get_queryset() to:
Query the Collection for the records that belong to it, to then
Build a list of record ids, to then
Return the QuerySet by filtering for pk__in (the pk exists in the list of library_record_ids)
Here's the resulting code. (Edit: This code has been optimized following another answer - I just didn't want to leave a lesser snippet up)
def get_queryset(self):
    return LibraryRecord.objects.filter(
        record_in_collection_order__collection__collection='Collection Name'
    ).order_by('record_in_collection_order__order_number')
The requirement has been met. I welcome constructive criticism. My intention in sharing this Q&A is to try and give a little back to the Stack Overflow Community that has served me so well since starting this journey into Django.

Django set privacy options per model field

I have gone through the question "best way to implement privacy on each field in model django", and its answers don't seem to solve my problem, so I am asking a somewhat related question here.
I have a User model. I want to make it possible for users to control the privacy of each and every field of their profile (maybe gender, education, interests, etc.).
The privacy options must not be limited to just private or public, but be as descriptive as:
public
friends
only me
friend List 1 (User.friendlist.one)
friend List 2 (User.friendlist.two)
friend List 3 (User.friendlist.three)
any number of further lists that the user may create.
I also don't want these privacy options to be saved on another model, but the same so that with one query I could get the user object along with the privacy options.
So if I have the User model:
class User(models.Model):
    name = models.CharField()
    email = models.EmailField()
    phone = models.CharField()
How do I set up a privacy setting here? I am using Postgres; can I use a JSONField, an HStoreField or even an ArrayField?
What is the best solution people have used in Django for the same problem?
update:
I have n model fields. What I really want is to store the privacy settings of each instance on itself or some other convenient way.

I have worked on my issue, tried solutions with permissions and other relations. I have a Relationship Model and all other relationship lists are derived from the Relationship model, so I don't want to maintain a separate list of Relationships.
So my pick was to go with a Postgres JSONField or HStoreField. Since Django has good support for Postgres features, I found the following points in favor of that choice:
JSON/HStore can be queried with the Django ORM.
The configurations are plain JSON/HStore, which is easier to edit and maintain than permissions and relations.
I found that database query times are larger with permissions than with JSON/HStore (the number of hits is higher with permissions).
Adding and validating permissions per field is more complex than adding/validating JSON.
If a simpler or more hassle-free solution comes along in the future, I can migrate to it, since the whole configuration lives in a single field.
So my choice was to go with a configuration model.
class UserConfiguration(models.Model):
    user = ...           # link to the user model
    configuration = ...  # either an HStoreField or a JSONField
Then I wrote a validator to make sure the configuration data is not messed up while saving and updating. I grouped the fields to minimize the number of fields to validate. Then I wrote a simple parser that takes two users, finds the relationship between them, and maps it against the configuration to return the allowed field data (logged at 2-4ms in an unoptimized implementation, which is enough for now). (With permissions I would need to maintain a separate list of friends and update all the group permissions whenever the privacy configuration changes, and I would still have to validate and process the permissions. That may take less time than this, but at the cost of a more complex system.)
I think this method is scalable as well, as most of the processing is done in Python and database calls are cut down to a minimum.
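The parser step might look roughly like the sketch below. The configuration layout (field name mapped to a list of audiences), the audience labels and the allowed_field_data name are all assumptions for illustration; the actual schema used above is not shown.

```python
def allowed_field_data(profile, configuration, viewer_audiences):
    """Return only the profile fields the viewer may see.

    configuration maps a field name to the audiences allowed to read it
    (an assumed layout). Every viewer implicitly belongs to "public".
    """
    audiences = set(viewer_audiences) | {"public"}
    return {
        name: value
        for name, value in profile.items()
        if audiences & set(configuration.get(name, ()))
    }


profile = {"name": "Ann", "email": "ann@example.com", "phone": "555-0100"}
configuration = {"name": ["public"], "email": ["friends"], "phone": ["only_me"]}
visible_to_friend = allowed_field_data(profile, configuration, ["friends"])
visible_to_stranger = allowed_field_data(profile, configuration, [])
```

A field missing from the configuration is treated as private here, which keeps the default on the safe side.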
Update
I have trimmed down the database queries further. In the previous implementation the relations between users were iterated, which took around 1-2ms; changing this to .values_list('relations', flat=True) cut the query time down to 400-520µs.

I also don't want these privacy options to be saved on another model, but the same so that with one query I could get the user object along with the privacy options.
I would advise you to decouple the privacy objects from the User model, so that your users' data does not get mixed up with those options. To minimize the number of database queries, use Django's select_related and prefetch_related.
The requirements you have defined, IMO, lead to a set of privacy-related objects that are bound to the User model. django.contrib.auth is a good place to start in this case. It is built to be extensible. Read the docs on that topic.
If you expect a large number of users, and therefore an even larger number of groups, you might want to consider writing the permissions resolved for one user into a Redis-based session so you can fetch them quickly on each page load.
UPDATE:
I thought a little more about your requirements and came to the conclusion that you need per object permission as implemented in django-guardian. You should start reading their samples and code first. They build that on top of django.contrib.auth but without depending on it, which makes it also usable with custom implementations that follow the interfaces in django.contrib.auth.

What about something like this?
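# The related_name values are illustrative; distinct names avoid reverse-accessor clashes on User
class EditorList(models.Model):
    name = models.CharField(...)
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='editor_lists')
    editor = models.ManyToManyField(User, related_name='member_of_editor_lists')

class UserPermission(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE, related_name='permissions')
    name = models.BooleanField(default=False)
    email = models.BooleanField(default=False)
    phone = models.BooleanField(default=False)
    ...
    editor = models.ManyToManyField(User, related_name='editor_permissions')
    editor_list = models.ManyToManyField(EditorList)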
If a user wants to give 'email' permissions to public, then she creates a UserPermission with editor=None and editor_list=None and email=True.
If she wants to allow user 'rivadiz' to edit her email, then she creates a UserPermission with editor='rivadiz' and email=True.
If she wants to create a list of friends that can edit her phone, then she creates and populates an EditorList called 'my_friends', then creates a UserPermission with editor_list='my_friends' and phone=True
You should then be able to query all the users that have permission to edit any field on any user.
You could define some properties in the User model for easily checking which fields are editable, given a User and an editor.
You would first need to get all the EditorLists an editor belonged to, then do something like
perms = UserPermission.objects.filter(user=self).filter(Q(editor=editor) | Q(editor_list__in=editor_lists))

First of all, in my opinion you should go for multiple models and for making the queries faster, as already mentioned in other answers, you can use caching or select_related or prefetch_related as per your usecase.
So here is my proposed solution:
User model
class User(models.Model):
    name = models.CharField()
    email = models.EmailField()
    phone = models.CharField()
    ...
    # ArrayField comes from django.contrib.postgres.fields
    public_allowed_read_fields = ArrayField(models.IntegerField())
    friends_allowed_read_fields = ArrayField(models.IntegerField())
    me_allowed_read_fields = ArrayField(models.IntegerField())
    # 'self' and string names are used because these models are not defined yet at this point
    friends = models.ManyToManyField('self')
    part_of = models.ManyToManyField('Group', through='GroupPrivacy')
Group(friends list) model
class Group(models.Model):
    name = models.CharField()
Through model
class GroupPrivacy(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    group = models.ForeignKey(Group, on_delete=models.CASCADE)
    allowed_read_fields = ArrayField(models.IntegerField())
User Model fields mapping to integers
USER_FIELDS_MAPPING = (
    (1, User._meta.get_field('name')),
    (2, User._meta.get_field('email')),
    (3, User._meta.get_field('phone')),
    ...
)
How does this help?
For each of public, friends and me, you have a field in the User model itself, as mentioned above: public_allowed_read_fields, friends_allowed_read_fields and me_allowed_read_fields respectively. Each of these fields contains a list of integers that map to the entries in USER_FIELDS_MAPPING.
For friend_list_1, you will have a group named friend_list_1. The point is that the user wants to show or hide a specific set of fields to this friends list. That's where the through model, GroupPrivacy, comes into play. Using this through model you define an M2M relation between a user and a group with some additional properties that are unique to this relation. The allowed_read_fields field of GroupPrivacy stores an array of integers corresponding to the ones in USER_FIELDS_MAPPING. So let's say, for group friend_list_1 and user A, allowed_read_fields = [1,2]. If you map this to USER_FIELDS_MAPPING, you will know that user A wants to show only name and email to the friends in this list. Similarly, different users in the friend_list_1 group will have different values in allowed_read_fields on their corresponding GroupPrivacy instance.
This works the same way for multiple groups.
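The lookup from stored integers back to readable fields can be sketched as below. This is an illustration only: it maps the integers to plain field names instead of the field objects used above, and readable_profile is a hypothetical helper.

```python
# Simplified mapping: integer code -> field name (the answer maps to field objects).
USER_FIELDS_MAPPING = {1: "name", 2: "email", 3: "phone"}


def readable_profile(user_data, allowed_read_fields):
    """Keep only the fields whose integer code appears in allowed_read_fields."""
    allowed_names = {USER_FIELDS_MAPPING[code] for code in allowed_read_fields}
    return {name: value for name, value in user_data.items() if name in allowed_names}


user_a = {"name": "A", "email": "a@example.com", "phone": "555-0100"}
# allowed_read_fields = [1, 2] as stored on the GroupPrivacy row for friend_list_1.
shown_to_friend_list_1 = readable_profile(user_a, [1, 2])
```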

This will be much more cumbersome without a separate permissions model. The fact that you can associate a given field of an individual user's profile with more than one friend list implies a Many to Many table, and you're better off just letting Django handle that for you.
I'm thinking something more like:
class Visibility(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    field = models.CharField(max_length=32)
    public = models.BooleanField(default=False)
    friends = models.BooleanField(default=False)
    lists = models.ManyToManyField(FriendList)

    @staticmethod
    def visible_profile(request_user, profile_user):
        """Get a dictionary of profile_user's profile, as
        should be visible to request_user..."""
(I'll leave the details of such a method as an exercise, but it's not too complex.)
I'll caution that the UI involved for a user to set those permissions is likely to be a challenge because of the many-to-many connection to friend lists. Not impossible, definitely, but a little tedious.
A key advantage of the M2M table here is that it'll be self-maintaining if the user or any friend list is removed -- with one exception. The idea in this scheme is that without any Visibility records, all data is private (to allow everyone to see your name, you'd add a Visibility record with user=(yourself), field="name", and public=True). Since a Visibility record where public=False, friends=False, and lists=[] is pointless, I'd check for that situation after the user edits it and remove that record entirely.
Another valid strategy is to have two special FriendList records: one for "public", and one for "all friends". This simplifies the Visibility model quite a bit at the expense of a little more code elsewhere.
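One possible shape for the visible_profile logic, written against plain Python objects so it runs without a database. VisibilityRule is a hypothetical stand-in for a Visibility row, and the way friendship and list membership are passed in is an assumption, not part of the scheme above.

```python
from dataclasses import dataclass, field


@dataclass
class VisibilityRule:
    """Plain stand-in for one Visibility row (hypothetical, not a Django model)."""
    field_name: str
    public: bool = False
    friends: bool = False
    lists: set = field(default_factory=set)


def visible_profile(profile, rules, viewer_is_friend, viewer_lists):
    """Return the profile fields the viewer may see; no rule means private."""
    visible = {}
    for rule in rules:
        allowed = (
            rule.public
            or (rule.friends and viewer_is_friend)
            or bool(rule.lists & set(viewer_lists))
        )
        if allowed and rule.field_name in profile:
            visible[rule.field_name] = profile[rule.field_name]
    return visible


profile = {"name": "Ann", "email": "ann@example.com", "phone": "555-0100"}
rules = [
    VisibilityRule("name", public=True),
    VisibilityRule("email", friends=True),
    VisibilityRule("phone", lists={"family"}),
]
stranger_view = visible_profile(profile, rules, False, [])
friend_view = visible_profile(profile, rules, True, [])
family_view = visible_profile(profile, rules, False, ["family"])
```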

Django one-to many

I'm really really confused about how django handles database relationships.
Originally I had an article model that contained a simple IntegerField for article_views. Recently I've been trying to expand the definition of an article_view to contain its own fields (IP, session key, etc.), so I created a model for it.
I'm at a bit of a loss regarding how to make the relationship. To me it makes the most sense to have a one-to-many field inside the article model, because an article can have many different views, but a view can only be part of one article.
All the implementations I'm seeing have this set up in a really weird reverse manner. What gives?

Unfortunately Django does not have a One-to-Many field. The relationship is achieved by creating a ForeignKey on the "many" side, in this case the ArticleView model. If you want to easily access the article views in your template, you can set the related_name on the ForeignKey.
class Article(models.Model):
    ...  # Article definition

class ArticleView(models.Model):
    article = models.ForeignKey(Article, on_delete=models.CASCADE, related_name='views')
In the template you can now use article.views.count() to get the number of views coupled to an article.
Please note that this creates a database query for each count you want. It would probably be better to have a queryset with annotate: Article.objects.annotate(num_views=Count('views')), with Count imported from django.db.models.

App Engine: Structured Property vs Reference Property for one-to-many relationship

My background with designing data stores comes from Core Data on iOS, which supports properties having a one-to-many relationship with another entity.
I'm working on an App Engine project which currently has three entity types:
User, which represents a person using the app.
Project, which represents a project. A User may be associated with many projects.
Post, which is the main content behind a Project. A Project may have many posts.
Currently, User has a property, projects, that is a one-to-many relationship to Project entities. Project has a property, posts, that is a one-to-many relationship to Post entities.
In this case, is Datastore's Reference Property or NDB's Structured Property better for the job (and how are the two conceptually different)? Is there a better way to structure my data?

By reference property you probably mean Key Property. This is a reference to another datastore entity. It is present in both db and ndb APIs. Using these, you can model a many to one relationship by pointing many entities to the key of another entity.
Structured property is a completely different beast. It allows you to define a data structure, and then include it within another entity.
Here's an example from the docs where you include multiple addresses for a single contact:
class Address(ndb.Model):
    type = ndb.StringProperty()  # E.g., 'home', 'work'
    street = ndb.StringProperty()
    city = ndb.StringProperty()

class Contact(ndb.Model):
    name = ndb.StringProperty()
    addresses = ndb.StructuredProperty(Address, repeated=True)

guido = Contact(name='Guido',
                addresses=[Address(type='home',
                                   city='Amsterdam'),
                           Address(type='work',
                                   street='Spear St',
                                   city='SF')])
guido.put()
For your specific application I'd recommend using NDB (it's always best to use the latest version of the api available), with the following:
Post model included under Project model as a repeated structured property.
Users include a repeated KeyProperty that contains the keys of the Projects they have permissions to.
To make it a bit more complex, you can create another model to represent projects and permissions/roles, and then include that as a repeated structured property within the user model.
The main reason you want to hang on to the keys is to keep the data accessible in light of HRD's eventual consistency.
Let me know if you need any more help on this.
EDIT:
To clarify, here's the proposed structure:
Models:
User
User-Project-Mapping (optional, needed to handle permissions)
Project
Post
User model should contain User-Project-Mapping as repeated structured property.
Project model should contain Post as repeated structured property.
User-Project-Mapping only needs to contain Key reference to the Project and relevant permissions representation.
Since this sounds like a commercial project, if you'd like further help with this, I'll gladly consult for you. Hope you have enough to succeed!

There is another point that was not mentioned and might be relevant: entities inserted in a StructuredProperty "are not full-fledged entities", as mentioned in this part of the docs. Below is the complete quote (it refers to the same example mentioned in the answer by @Sologoub):
Although the Address instances are defined using the same
syntax as for model classes, they are not full-fledged entities. They
don't have their own keys in the Datastore. They cannot be retrieved
independently of the Contact entity to which they belong.
This may impose some limitations on the design, given that you cannot reuse an embedded entity elsewhere without duplicating its data. The KeyProperty, on the other hand, refers to another entity's key and therefore represents the relationship between entities in a more "relational" way. And KeyProperties can also be repeated: just include the repeated=True parameter.

How to model one way one-to-one relationship in Django

I want to model an article with revisions in Django:
I have following in my article's models.py:
class Article(models.Model):
    title = models.CharField(blank=False, max_length=80)
    slug = models.SlugField(max_length=80)

    def __unicode__(self):
        return self.title

class ArticleRevision(models.Model):
    article = models.ForeignKey(Article)
    revision_nr = models.PositiveSmallIntegerField(blank=True, null=True)
    body = models.TextField(blank=False)
On the article model I want to have 2 direct references to a revision: one would point to a published revision and another to a revision that is being actively edited. However, from what I understand, OneToOne and ForeignKey references generate a backreference on the other side of the model reference, so my question is: how do I create a one-way one-to-one reference in Django?
Is there some special incantation for that, or do I have to fake it by including state in the revision and writing custom implementations of the fields that ask for a revision in a specific state?
Edit: I guess I've done a somewhat poor job of explaining my intent. Let's try it on a higher abstraction level:
My original intent was to implement a sort of revisioned article model, where each article may have multiple revisions, where one of those revisions may be "published" and one actively edited.
This means that the article will have one-to-many relationship to revisions (represented by ForeignKey(Article) reference in ArticleRevision class) and two one way references from Article to revision: published_revision and edited_revision.
My question is mainly, how can I model this with Django's ORM.

The back-references that Django produces are programmatic and do not affect the underlying database schema. In other words, if you have a one-to-one or foreign key field on your Article pointing to your Revision, a column will be added to the Article table in the database, but not to the Revision table.
Thus, removing the reverse relationship from the revision to the article is unnecessary. If you really feel strongly about it, and want to document in your code that the backlink is never used, a fairly common Django idiom is to give the fields a related_name attribute like _unused_1. So your Article model might look like the following:
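class Article(models.Model):
    title = models.CharField(blank=False, max_length=80)
    slug = models.SlugField(max_length=80)
    # Referenced by name because ArticleRevision is defined after Article;
    # on_delete=models.CASCADE matches the old implicit default
    revision_1 = models.OneToOneField('ArticleRevision', on_delete=models.CASCADE, related_name='_unused_1')
    revision_2 = models.OneToOneField('ArticleRevision', on_delete=models.CASCADE, related_name='_unused_2')

    def __unicode__(self):
        return self.title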
That said, it's rare that a one-to-one relationship is actually useful in an application (unless you're optimizing for some reason) and I'd suggest carefully reviewing your DB schema to make sure this is really what you want. It may make sense to keep a single ForeignKey field on your ArticleRevision pointing back to an Article (since an ArticleRevision will, presumably, always need to be associated with an Article) and adding another column to Revision indicating whether it's published.

What is wrong with the link going both ways? I would think that the OneToOneField would be the perfect choice here. Is there a specific reason why this will be a detriment to your application? If you don't need the backreference why can't you just ignore it?
