I'm working on some networking-related model mixins and I have two particular models that are supposed to be identical in every way except for their fieldname prefixes.
Picture:
class SrcEvent(models.Model):
    src_ip = models.GenericIPField...
    (...many more properties and methods...)

class DstEvent(models.Model):
    dst_ip = models.GenericIPField...
    (...many more properties and methods...)
Repeating everything twice (or even just extending one to get the methods on the other) doesn't sit well with me. What I'd like to end up with is a generic abstract class Event that just contains attributes like ip, hostname and such, and then extend that with two child classes (SrcEvent and DstEvent) that prepend either "src_" or "dst_" to each field name when the model is generated/migrated.
I can't just make Event and call it a day; some models mix in one, the other, or both sets of attributes, and the direction matters. These models are mixins. The models they get mixed into can have attributes pertaining to a source event (such as an alert), a destination event (such as an email), or both a source and destination event (netflow). So for example a Netflow(SrcMixin, DstMixin) model will have both the src_* and the dst_* sets of fields, which doesn't work if both mixins call their respective IP address field ip. This is why I need to maintain the distinction.
I do not know how to go about this within Django, or what to call it to look it up myself. Any tips would be appreciated!
I'm not sure about the 'mixin' aspects of this, but it sounds like a case for using an abstract base class, with Source(Event) and Destination(Event) classes underneath it.
To define an abstract base class you would use something like:
class Event(models.Model):
    class Meta:
        abstract = True

    # define all your common fields here
In the ORM, Source and Destination would become separate tables. As I said, I'm not sure about the 'mixin' aspects, but to a first approximation I think making Source and Destination abstract as well might work, so that objects which instantiate Source or Destination need all the fields populated?
I'm working around this through the use of formsets. I leave the fields generic, but I added a new CharField to indicate direction ('src' or 'dst'). Then I create the objects and references to events based on the number of forms submitted and their direction.
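One way to get the prefixing asked about in the question is to generate the two mixins from a single generic field list. The following is only a minimal sketch of that idea in plain Python: `Field`, `EVENT_FIELDS` and `make_event_mixin` are hypothetical stand-ins, not Django APIs. In Django the factory would presumably subclass models.Model, pass a Meta with abstract = True and a `__module__` entry, and create a fresh field instance per call; that part is an assumption and is not shown here.

```python
class Field:
    """Stand-in for a Django model field; it only remembers its type name."""

    def __init__(self, kind):
        self.kind = kind


# Generic, direction-agnostic field list (hypothetical names).
EVENT_FIELDS = {
    "ip": "GenericIPAddressField",
    "hostname": "CharField",
}


def make_event_mixin(prefix):
    """Build a mixin class whose attributes are the generic fields, prefixed."""
    attrs = {
        f"{prefix}_{name}": Field(kind)  # new Field object for every mixin
        for name, kind in EVENT_FIELDS.items()
    }
    return type(f"{prefix.capitalize()}Mixin", (object,), attrs)


SrcMixin = make_event_mixin("src")
DstMixin = make_event_mixin("dst")


class Netflow(SrcMixin, DstMixin):
    """Gets both the src_* and dst_* attribute sets without name clashes."""


print(sorted(n for n in dir(Netflow) if n.startswith(("src_", "dst_"))))
# prints ['dst_hostname', 'dst_ip', 'src_hostname', 'src_ip']
```

Because the field list lives in one place, adding a field there adds it to both directions, which is the duplication the question wants to avoid.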
tl;dr
Is there a simple alternative to multi-table inheritance for implementing the basic data-model pattern depicted below, in Django?
Premise
Please consider the very basic data-model pattern in the image below, based on e.g. Hay, 1996.
Simply put: Organizations and Persons are Parties, and all Parties have Addresses. A similar pattern may apply to many other situations.
The important point here is that the Address has an explicit relation with Party, rather than explicit relations with the individual sub-models Organization and Person.
Note that each sub-model introduces additional fields (not depicted here, but see code example below).
This specific example has several obvious shortcomings, but that is beside the point. For the sake of this discussion, suppose the pattern perfectly describes what we wish to achieve, so the only question that remains is how to implement the pattern in Django.
Implementation
The most obvious implementation, I believe, would use multi-table-inheritance:
from django.db import models


class Party(models.Model):
    """ Note this is a concrete model, not an abstract one. """
    name = models.CharField(max_length=20)


class Organization(Party):
    """
    Note that a one-to-one relation 'party_ptr' is automatically added,
    and this is used as the primary key (the actual table has no 'id'
    column). The same holds for Person.
    """
    type = models.CharField(max_length=20)


class Person(Party):
    favorite_color = models.CharField(max_length=20)


class Address(models.Model):
    """
    Note that, because Party is a concrete model, rather than an abstract
    one, we can reference it directly in a foreign key.
    Since the Person and Organization models have one-to-one relations
    with Party which act as primary key, we can conveniently create
    Address objects setting either party=party_instance,
    party=organization_instance, or party=person_instance.
    """
    party = models.ForeignKey(to=Party, on_delete=models.CASCADE)
This seems to match the pattern perfectly. It almost makes me believe this is what multi-table-inheritance was intended for in the first place.
However, multi-table-inheritance appears to be frowned upon, especially from a performance point-of-view, although it depends on the application. Especially this scary, but ancient, post from one of Django's creators is quite discouraging:
In nearly every case, abstract inheritance is a better approach for the long term. I’ve seen more than few sites crushed under the load introduced by concrete inheritance, so I’d strongly suggest that Django users approach any use of concrete inheritance with a large dose of skepticism.
Despite this scary warning, I guess the main point in that post is the following observation regarding multi-table inheritance:
These joins tend to be "hidden" — they’re created automatically — and mean that what look like simple queries often aren’t.
Disambiguation: The above post refers to Django's "multi-table inheritance" as "concrete inheritance", which should not be confused with Concrete Table Inheritance on the database level. The latter actually corresponds better with Django's notion of inheritance using abstract base classes.
I guess this SO question nicely illustrates the "hidden joins" issue.
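To make the "hidden join" concrete, here is a small sketch using Python's built-in sqlite3 module instead of Django. The table and column names mirror the Party/Person example, but the exact SQL Django emits is an assumption here; the point is only that reading one child object touches two tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (id INTEGER PRIMARY KEY, name TEXT);
    -- Child table laid out the multi-table-inheritance way:
    -- the link to the parent doubles as the primary key.
    CREATE TABLE person (
        party_ptr_id INTEGER PRIMARY KEY REFERENCES party(id),
        favorite_color TEXT
    );
    INSERT INTO party VALUES (1, 'Alice');
    INSERT INTO person VALUES (1, 'green');
""")

# Fetching a complete Person needs columns from both tables, hence the join.
row = conn.execute("""
    SELECT party.name, person.favorite_color
    FROM person
    JOIN party ON person.party_ptr_id = party.id
""").fetchone()
print(row)  # prints ('Alice', 'green')
```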
Alternatives
Abstract inheritance does not seem like a viable alternative to me, because we cannot set a foreign key to an abstract model, which makes sense, because it has no table. I guess this implies that we would need a foreign key for every "child" model plus some extra logic to simulate this.
Proxy inheritance does not seem like an option either, as the sub-models each introduce extra fields. EDIT: On second thought, proxy models could be an option if we use Single Table Inheritance on the database level, i.e. use a single table that includes all the fields from Party, Organization and Person.
GenericForeignKey relations may be an option in some specific cases, but to me they are the stuff of nightmares.
As another alternative, it is often suggested to use explicit one-to-one relations (eoto for short, here) instead of multi-table-inheritance (so Party, Person and Organization would all just be subclasses of models.Model).
Both approaches, multi-table-inheritance (mti) and explicit one-to-one relations (eoto), result in three database tables. So, depending on the type of query, of course, some form of JOIN is often inevitable when retrieving data.
By inspecting the resulting tables in the database, it becomes clear that the only difference between the mti and eoto approaches, on the database level, is that an eoto Person table has an id column as primary-key, and a separate foreign-key column to Party.id, whereas an mti Person table has no separate id column, but instead uses the foreign-key to Party.id as its primary-key.
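The difference described above can be reproduced outside Django with a rough sqlite3 sketch. The table names `mti_person` and `eoto_person` are made up for the comparison, and the column definitions are a simplification of what Django would generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE party (id INTEGER PRIMARY KEY, name TEXT);

    -- mti: no separate id; the foreign key to party is the primary key.
    CREATE TABLE mti_person (
        party_ptr_id INTEGER PRIMARY KEY REFERENCES party(id),
        favorite_color TEXT
    );

    -- eoto: own id column plus a separate, unique foreign key to party.
    CREATE TABLE eoto_person (
        id INTEGER PRIMARY KEY,
        party_id INTEGER NOT NULL UNIQUE REFERENCES party(id),
        favorite_color TEXT
    );
""")


def columns(table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


print(columns("mti_person"))   # prints ['party_ptr_id', 'favorite_color']
print(columns("eoto_person"))  # prints ['id', 'party_id', 'favorite_color']
```

Either way a query for a full Person still has to join against party; the layouts differ only in which column serves as the primary key.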
Question(s)
I don't think the behavior from the example (especially the single direct relation to the parent) can be achieved with abstract inheritance, can it? If it can, then how would you achieve that?
Is an explicit one-to-one relation really that much better than multi-table-inheritance, except for the fact that it forces us to make our queries more explicit? To me the convenience and clarity of the multi-table approach outweighs the explicitness argument.
Note that this SO question is very similar, but does not quite answer my questions. Moreover, the latest answer there is almost nine years old now, and Django has changed a lot since.
[1]: Hay 1996, Data Model Patterns
While awaiting a better one, here's my attempt at an answer.
As suggested by Kevin Christopher Henry in the comments above, it makes sense to approach the problem from the database side. As my experience with database design is limited, I have to rely on others for this part.
Please correct me if I'm wrong at any point.
Data-model vs (Object-Oriented) Application vs (Relational) Database
A lot can be said about the object/relational mismatch, or, more accurately, the data-model/object/relational mismatch.
In the present context I guess it is important to note that a direct translation between data-model, object-oriented implementation (Django), and relational database implementation, is not always possible or even desirable. A nice three-way Venn-diagram could probably illustrate this.
Data-model level
To me, a data-model as illustrated in the original post represents an attempt to capture the essence of a real world information system. It should be sufficiently detailed and flexible to enable us to reach our goal. It does not prescribe implementation details, but may limit our options nonetheless.
In this case, the inheritance poses a challenge mostly on the database implementation level.
Relational database level
Some SO answers dealing with database implementations of (single) inheritance are:
How can you represent inheritance in a database?
How do you effectively model inheritance in a database?
Techniques for database inheritance?
These all more or less follow the patterns described in Martin Fowler's book Patterns of Enterprise Application Architecture.
Until a better answer comes along, I am inclined to trust these views.
The inheritance section in chapter 3 (2011 edition) sums it up nicely:
For any inheritance structure there are basically three options.
You can have one table for all the classes in the hierarchy: Single Table Inheritance (278) ...;
one table for each concrete class: Concrete Table Inheritance (293) ...;
or one table per class in the hierarchy: Class Table Inheritance (285) ...
and
The trade-offs are all between duplication of data structure and speed of access. ...
There's no clearcut winner here. ... My first choice tends to be Single Table Inheritance ...
A summary of patterns from the book is found on martinfowler.com.
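As a rough illustration of the three options quoted above, the sketch below builds each layout for the Party/Organization/Person example with sqlite3 and lists the resulting tables. The column choices are assumptions based on the models in the question, not something taken from the book.

```python
import sqlite3

SCHEMAS = {
    # One table for the whole hierarchy; unused columns stay empty.
    "single_table": """
        CREATE TABLE party (
            id INTEGER PRIMARY KEY, name TEXT,
            type TEXT,            -- used by organizations only
            favorite_color TEXT   -- used by persons only
        );
    """,
    # One table per concrete class; the shared column is duplicated.
    "concrete_table": """
        CREATE TABLE organization (id INTEGER PRIMARY KEY, name TEXT, type TEXT);
        CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, favorite_color TEXT);
    """,
    # One table per class; child rows point at their parent row.
    "class_table": """
        CREATE TABLE party (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE organization (
            party_id INTEGER PRIMARY KEY REFERENCES party(id), type TEXT);
        CREATE TABLE person (
            party_id INTEGER PRIMARY KEY REFERENCES party(id), favorite_color TEXT);
    """,
}


def tables(schema_sql):
    """Create the schema in a fresh in-memory database and list its tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


for pattern, sql in SCHEMAS.items():
    print(pattern, tables(sql))
```

Only the single-table and class-table layouts leave a party table that an Address row could reference directly.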
Application level
Django's object-relational mapping (ORM) API allows us to implement these three approaches, although the mapping is not strictly one-to-one.
The Django Model inheritance docs distinguish three "styles of inheritance", based on the type of model class used (concrete, abstract, proxy):
abstract parent with concrete children (abstract base classes):
The parent class has no database table. Instead each child class has its own database table with its own fields and duplicates of the parent fields. This sounds a lot like Concrete Table Inheritance in the database.
concrete parent with concrete children (multi-table inheritance):
The parent class has a database table with its own fields, and each child class has its own table with its own fields and a foreign-key (as primary-key) to the parent table. This looks like Class Table Inheritance in the database.
concrete parent with proxy children (proxy models):
The parent class has a database table, but the children do not. Instead, the child classes interact directly with the parent table. Now, if we add all the fields from the children (as defined in our data-model) to the parent class, this could be interpreted as an implementation of Single Table Inheritance. The proxy models provide a convenient way of dealing with the application side of the single large database table.
Conclusion
It seems to me that, for the present example, the combination of Single Table Inheritance with Django's proxy models may be a good solution that does not have the disadvantages of "hidden" joins.
Applied to the example from the original post, it would look something like this:
from django.db import models


class Party(models.Model):
    """ All the fields from the hierarchy are on this class """
    name = models.CharField(max_length=20)
    type = models.CharField(max_length=20)
    favorite_color = models.CharField(max_length=20)


class Organization(Party):
    class Meta:
        """ A proxy has no database table (it uses the parent's table) """
        proxy = True

    def __str__(self):
        """ We can do subclass-specific stuff on the proxies """
        return '{} is a {}'.format(self.name, self.type)


class Person(Party):
    class Meta:
        proxy = True

    def __str__(self):
        return '{} likes {}'.format(self.name, self.favorite_color)


class Address(models.Model):
    """
    As required, we can link to Party, but we can set the field using
    either party=person_instance, party=organization_instance,
    or party=party_instance
    """
    party = models.ForeignKey(to=Party, on_delete=models.CASCADE)
One caveat, from the Django proxy-model documentation:
There is no way to have Django return, say, a MyPerson object whenever you query for Person objects. A queryset for Person objects will return those types of objects.
A potential workaround is presented here.
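One possible shape for such a workaround (not necessarily the one linked above) is to store a discriminator value with each row and use it to pick the subclass when objects are built. The sketch below shows the idea in plain Python; the `kind` key and the `REGISTRY` mapping are assumptions for illustration and are not part of the models above.

```python
class Party:
    def __init__(self, name, type="", favorite_color=""):
        self.name = name
        self.type = type
        self.favorite_color = favorite_color


class Organization(Party):
    def __str__(self):
        return f"{self.name} is a {self.type}"


class Person(Party):
    def __str__(self):
        return f"{self.name} likes {self.favorite_color}"


# Hypothetical discriminator value -> class to instantiate.
REGISTRY = {"organization": Organization, "person": Person}


def materialize(row):
    """Turn one row of the shared table into the matching subclass instance."""
    fields = dict(row)  # copy, so the caller's row is left untouched
    cls = REGISTRY.get(fields.pop("kind"), Party)  # unknown kinds fall back to Party
    return cls(**fields)


rows = [
    {"kind": "organization", "name": "Acme", "type": "company"},
    {"kind": "person", "name": "Alice", "favorite_color": "green"},
]
for obj in map(materialize, rows):
    print(type(obj).__name__, "->", obj)
```

The price is one extra column on the shared table, which has to be kept in sync whenever an object is saved through one of the proxies.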
I am building a control panel that will have multiple sub-applications in Django. One of my models is an application, which will have important settings like name, description, install_path and id (so that I can associate specific settings and configuration values with this application).
Right now I'm struggling with trying to figure out how to declare this particular model. Each application will do something completely different from every other application. One might manage specific CMS settings and another may handle password resets for our development environment. The goal is to get the common support items in one place.
The main information for each application will be the same. Each will have a name, description, etc. The difference will be what they do and what settings they use. The settings are in their own model though, with a link back to the application via foreign key.
I'm unsure which model type would be most appropriate for my use case. Both look like they'd be useful, but if that's the case, I'm assuming that I am missing an aspect of one (or both) of them.
My question is, what is the difference between declaring my applications using abstract base class models vs. proxy models?
Nobody's touched this for 8 months. I should know better, but I'm going to take a stab at it.
Your first option, obviously, is to not use base classes at all and duplicate your Fields on each model. I know you didn't ask about this, but for others looking at this post, it is a good way to go for beginners. It's easy, and everything for the model is listed in one place rather than pointing to another model located somewhere else in the code for some of your fields.
Abstract base classes are probably the next easiest and next most commonly used. When you have a lot of duplication of fields across two or more models it is worth considering. Using this method you can eliminate the need to type (or cut and paste) fields over and over across multiple models. When you declare the base class abstract, the table is never actually built in the database. The base class is only used when the child tables are built. This keeps your database simpler and maintains performance because you don't have to build relationships to the base class and use joins to query data. You can also add extra fields (attributes) to each of your child models, which proxy models cannot do.
Proxy models are somewhat similar in that you have a base or parent class, but there are significant differences from there. You will use proxy models in situations where all of your models have the same fields (attributes), but you might have different "types" of objects. For instance you might have a base class of Cars, and use the manufacturer as your type. Then you may have Ford, Chevy and Honda models that are all proxy models of Cars. They all have the same fields. The manager class chosen for the model is what really makes them different from each other. From a database perspective, really only one table is built (Cars), leading to better performance than building multiple tables, but the drawback is that you can't add manufacturer-specific fields to the model without adding them to the entire Cars table.
In general I would recommend starting with Abstract Base Classes for models with lots of duplicate fields. Proxy models seem to be a more specific use case, but can be used as well if you have the use case and once you're more well-versed.
I'm not 100% clear on your specific use case based on your description, but hopefully I've given you enough information to decide what's best on your own.
We are developing a collection management project using Django, usable for different types of collections.
This problem quite naturally divides itself in two:
The common part, that will be shared by all collections.
The specializations, different for each collection type.
Example
To illustrate this a bit further, let's take a simplified example in pseudocode.
Common part
class ItemBase:  # ideally abstract
    name = CharField()

class Rental
    item = ForeignKey("Item")
    rented_to_person = CharField()
Specialization for a collection of cars
class ItemSpecialization
    horse_power = Int()
The problem
The question is how we could organize the code to allow reuse of the common part without duplicating its content?
We would imagine it would be best to have the common part as a non-installed application, and have each specialized configuration as a separate installed application. But this would cause a problem with the Rental concrete class, because it resides in the common-part application.
Any advice on how we could proceed?
It really depends on what you want. You may use an abstract model class for common stuff and inherit from it in specialized model classes.
Otherwise, if you really want one table for all common data, typically to be able to relate to it, then you'll need your specialized model to have a relation to the common model. It can be a foreign key, or you can use model inheritance, in which case the foreign key in question will be managed for you by Django, but it'll be harder to use.
It sounds like you're looking for a OneToOneField field relationship. Based on your example:
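class ItemBase(models.Model):
    name = models.CharField(max_length=50)

class Rental(models.Model):
    # on_delete is a required argument from Django 2.0 on
    item = models.OneToOneField(ItemBase, on_delete=models.CASCADE)
    rented_to_person = models.CharField(max_length=50)

class ItemSpecialization(models.Model):
    item = models.OneToOneField(ItemBase, on_delete=models.CASCADE)
    horse_power = models.IntegerField()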
With this model hierarchy, you could fetch Rental or ItemSpecialization objects and also gain access to ItemBase fields. It's basically OO inheritance for Django models. More details in the docs: https://docs.djangoproject.com/en/1.9/topics/db/examples/one_to_one/
I have several models that implement a so called 'Taggable' behavior:
class Tag(models.Model):
    name = models.CharField(max_length=200)

class Taggable(models.Model):
    tags = models.ManyToManyField(Tag)

    class Meta:
        abstract = True

class A(Taggable):
    ...

class B(Taggable):
    ...

class C(Taggable):
    ...

class D(Taggable):
    ...
This scenario causes an intermediary table to be created for each and every model inheriting from Taggable. I.e.,
appname_a_tags
appname_b_tags
appname_c_tags
appname_d_tags
I am still at the beginning of the development, and the number of such models and thus tables might increase. So, it bothers me a little to have a clutter of tables with similar data, plus, in the future, I might need common functionality for all the tags assigned in the application (e.g., collecting statistical data of which tags used where, or maybe for a search feature).
Now, my question is: from a general engineering perspective, would it be feasible to use one common intermediary/join table for all of the models consuming this 'Taggable' behavior? If so, what would be the disadvantages of doing so, and what would be the best approach to tackle this?
Being new to Django, I would have tried (or researched more on) these 2 scenarios if I had to do it:
SCENARIO 1:
include a 'class' field in the Tag model above that will keep the type of object that the tag is assigned to (e.g., A, B, C, or D)
create a proxy Tag class for each of A, B, C, D (e.g., ATag, BTag, ..), override the models.Model class' 'save' method in order to update 'class' column appropriately. I would also have to customize model's manager in order to filter and properly retrieve tags that belong to that class only.
let each class consume its custom Tag class. I.e.,
class A(models.Model):
    tags = models.ManyToManyField(ATag)
    ...
SCENARIO 2:
create a common intermediary model TagAssigned, and include the extra 'class' field besides the 2 foreign keys for the tag and the item tagged
use TagAssigned as intermediary table for all 4 classes. E.g.,
class A(models.Model):
    tags = models.ManyToManyField(Tag, through='TagAssigned')
    ...
override and customize the 'save' methods and the model managers of classes A, B, C, and D (as opposed to proxy models in scenario 1)
I would suggest keeping your initial implementation as simple as possible (e.g. no proxy tables, no overriding save) and not worrying about the number of tables unless you reach some pretty amazing numbers, hit some kind of serious issue with it, or outgrow the usefulness of "simple tags". This might save you a hairy implementation that you might not have needed in the first place.
P.S. I am using https://github.com/alex/django-taggit which is pretty handy and easy to work with. I just saw that the repo has ~60 issues on GitHub :) but I have not hit any of them.
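For reference, the single shared join table from Scenario 2 could look roughly like the sqlite3 sketch below. The table and column names (`tag_assigned`, `item_class`, `object_id`) are assumptions made for illustration; Django's contenttypes framework offers a comparable (content type, object id) pair through GenericForeignKey.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    -- One join table for every taggable model; item_class records
    -- which model (A, B, C or D) the object_id refers to.
    CREATE TABLE tag_assigned (
        tag_id INTEGER NOT NULL REFERENCES tag(id),
        item_class TEXT NOT NULL,
        object_id INTEGER NOT NULL,
        UNIQUE (tag_id, item_class, object_id)
    );
    INSERT INTO tag VALUES (1, 'red'), (2, 'blue');
    INSERT INTO tag_assigned VALUES (1, 'A', 10), (1, 'B', 7), (2, 'A', 10);
""")

# Cross-model statistics become a single query over one table.
usage = conn.execute("""
    SELECT tag.name, tag_assigned.item_class, COUNT(*)
    FROM tag_assigned
    JOIN tag ON tag.id = tag_assigned.tag_id
    GROUP BY tag.name, tag_assigned.item_class
    ORDER BY tag.name, tag_assigned.item_class
""").fetchall()
print(usage)  # prints [('blue', 'A', 1), ('red', 'A', 1), ('red', 'B', 1)]
```

The main disadvantage is visible in the schema: object_id points into different tables depending on item_class, so the database cannot enforce it as a real foreign key.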
In my GAE project, I have a base class called Part.
From this class I derive other classes such as Motor and Battery.
If I run the following:
motors = Motor.query().fetch()
batterys = Battery.query().fetch()
I will get all the parts, but I am looking for something more elegant.
If I run:
parts = Part.query().fetch()
I get an empty list [ ].
How can I run the above query and get all results in one list?
Thank you
You can do this, but all your Parts classes must inherit from PolyModel
https://developers.google.com/appengine/docs/python/ndb/polymodelclass
So
from google.appengine.ext.ndb import polymodel

class Part(polymodel.PolyModel):
    pass  # stuff

class Motor(Part):
    pass  # stuff

class Wheel(Part):
    pass  # stuff

parts = Part.query().fetch()
However, all items are stored in the datastore as Part; they have an additional classes attribute which names each of the classes in its inheritance hierarchy. Then, when the entity is retrieved, the correct subclass is instantiated.
Another potential downside: the property names in the stored model are a union of all subclasses, if you have default values for any of the properties. I haven't checked to see how far this goes. So if you have lots of very different properties in all the subclasses, you could be storing a lot of empty properties and incur the cost of storing all the property names in each entity.
The datastore has no concept of inheritance, and doesn't know that your entity types derive from Part.
There isn't really any way of doing this sort of thing with GAE: ancestor keys are not really the answer, as they would have all Motor/Battery entities descending from a single Part, which would severely limit update rates.
The best way to model this kind of relationship would really be to drop the separate models and have a single Part model, with a part_type field that can be "motor" or "battery".
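The difference between the answers can be shown with a tiny in-memory stand-in for the datastore. This is a plain-Python sketch, not the ndb API: the dictionaries, the `class` list key and the property names `rpm` and `volts` are all assumptions for illustration.

```python
def kind_query(entities, kind):
    """Kind-based query: matches the exact kind name only."""
    return [e for e in entities if e["kind"] == kind]


def class_query(entities, cls_name):
    """PolyModel-style query: matches entities whose class list names cls_name."""
    return [e for e in entities if cls_name in e.get("class", [])]


# Separate kinds: the store knows nothing about Motor deriving from Part.
separate_kinds = [
    {"kind": "Motor", "rpm": 3000},
    {"kind": "Battery", "volts": 12},
]

# PolyModel-style: everything is stored as Part, plus the class hierarchy.
poly = [
    {"kind": "Part", "class": ["Part", "Motor"], "rpm": 3000},
    {"kind": "Part", "class": ["Part", "Battery"], "volts": 12},
]

print(len(kind_query(separate_kinds, "Part")))  # prints 0
print(len(class_query(poly, "Part")))           # prints 2
print(len(class_query(poly, "Motor")))          # prints 1
```

A single Part model with a part_type field behaves like the second list, with one string in place of the class list.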