I'm working on a multi-tenanted application in which some users can define their own data fields (via the admin) to collect additional data in forms and report on that data. The latter requirement makes JSONField a poor fit, so instead I have the following solution:
from django.conf import settings
from django.contrib.sites.models import Site
from django.db import models

class CustomDataField(models.Model):
    """
    Abstract specification for arbitrary data fields.
    Not used for holding data itself, but metadata about the fields.
    """
    site = models.ForeignKey(Site, default=settings.SITE_ID)
    name = models.CharField(max_length=64)

    class Meta:
        abstract = True

class CustomDataValue(models.Model):
    """
    Abstract specification for arbitrary data.
    """
    value = models.CharField(max_length=1024)

    class Meta:
        abstract = True
Note how CustomDataField has a ForeignKey to Site - each Site will have a different set of custom data fields, but use the same database.
Then the various concrete data fields can be defined as:
from django.contrib.auth.models import User

class UserCustomDataField(CustomDataField):
    pass

class UserCustomDataValue(CustomDataValue):
    custom_field = models.ForeignKey(UserCustomDataField)
    user = models.ForeignKey(User, related_name='custom_data')

    class Meta:
        unique_together = (('user', 'custom_field'),)
This leads to the following use:
custom_field = UserCustomDataField.objects.create(name='zodiac', site=my_site)  # probably created in the admin
user = User.objects.create(username='foo')
user_sign = UserCustomDataValue(custom_field=custom_field, user=user, value='Libra')
user.custom_data.add(user_sign)  # actually, what does this even do?
But this feels very clunky, particularly with the need to manually create the related data and associate it with the concrete model. Is there a better approach?
Options that have been pre-emptively discarded:
Custom SQL to modify tables on-the-fly. Partly because this won't scale and partly because it's too much of a hack.
Schema-less solutions like NoSQL. I have nothing against them, but they're still not a good fit. Ultimately this data is typed, and the possibility exists of using a third-party reporting application.
JSONField, as noted above, since it's not going to work well with queries.
As of today, there are four available approaches, two of them requiring a certain storage backend:
Django-eav (the original package is no longer maintained but has some thriving forks)
This solution is based on the Entity-Attribute-Value data model; essentially, it uses several tables to store dynamic attributes of objects. The great parts about this solution are that it:
uses several pure and simple Django models to represent dynamic fields, which makes it simple to understand and database-agnostic;
allows you to effectively attach/detach dynamic attribute storage to Django model with simple commands like:
eav.unregister(Encounter)
eav.register(Patient)
nicely integrates with the Django admin;
and is really powerful at the same time.
Downsides:
Not very efficient. This is more a criticism of the EAV pattern itself, which requires manually merging data from a column format into a set of key-value pairs in the model.
Harder to maintain. Maintaining data integrity requires a multi-column unique key constraint, which may be inefficient on some databases (see the sketch after this list).
You will need to pick one of the forks, since the official package is no longer maintained and there is no clear leader.
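To make that constraint concrete, here is a minimal sketch of an EAV value table (illustrative only; django-eav's actual schema differs, e.g. it uses generic foreign keys and one value column per datatype):

from django.contrib.contenttypes.models import ContentType
from django.db import models

class Value(models.Model):
    # Which object and which dynamic attribute this row stores a value for.
    entity_ct = models.ForeignKey(ContentType)
    entity_id = models.PositiveIntegerField()
    attribute = models.ForeignKey('Attribute')
    value_text = models.TextField(blank=True, null=True)

    class Meta:
        # The multi-column unique key mentioned above: at most one
        # value per (object, attribute) pair.
        unique_together = (('entity_ct', 'entity_id', 'attribute'),)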
The usage is pretty straightforward:
import eav
from eav.models import Attribute, EnumValue, EnumGroup
from django.db.models import Q
from app.models import Patient, Encounter

eav.register(Encounter)
eav.register(Patient)

Attribute.objects.create(name='age', datatype=Attribute.TYPE_INT)
Attribute.objects.create(name='height', datatype=Attribute.TYPE_FLOAT)
Attribute.objects.create(name='weight', datatype=Attribute.TYPE_FLOAT)
Attribute.objects.create(name='city', datatype=Attribute.TYPE_TEXT)
Attribute.objects.create(name='country', datatype=Attribute.TYPE_TEXT)

yes = EnumValue.objects.create(value='yes')
no = EnumValue.objects.create(value='no')
unknown = EnumValue.objects.create(value='unknown')

ynu = EnumGroup.objects.create(name='Yes / No / Unknown')
ynu.enums.add(yes)
ynu.enums.add(no)
ynu.enums.add(unknown)

Attribute.objects.create(name='fever', datatype=Attribute.TYPE_ENUM,
                         enum_group=ynu)

# When you register a model with EAV,
# you can access all of its EAV attributes:
Patient.objects.create(name='Bob', eav__age=12,
                       eav__fever=no, eav__city='New York',
                       eav__country='USA')

# You can filter queries based on EAV fields:
query1 = Patient.objects.filter(Q(eav__city__contains='Y'))
query2 = Q(eav__city__contains='Y') | Q(eav__fever=no)
Hstore, JSON or JSONB fields in PostgreSQL
PostgreSQL supports several more complex data types. Most are supported via third-party packages, but in recent years Django has adopted them into django.contrib.postgres.fields.
HStoreField:
Django-hstore was originally a third-party package, but Django 1.8 added HStoreField as a built-in, along with several other PostgreSQL-supported field types.
This approach is good in the sense that it lets you have the best of both worlds: dynamic fields and a relational database. However, hstore is not ideal performance-wise, especially if you are going to end up storing thousands of items in one field. It also only supports string values.
# app/models.py
from django.contrib.postgres.fields import HStoreField
from django.db import models

class Something(models.Model):
    name = models.CharField(max_length=32)
    # default=dict so new rows start with an empty hstore; the column
    # also requires the hstore extension (see the HStoreExtension
    # migration operation).
    data = HStoreField(db_index=True, default=dict)
In Django's shell you can use it like this:
>>> instance = Something.objects.create(
...     name='something',
...     data={'a': '1', 'b': '2'}
... )
>>> instance.data['a']
'1'
>>> empty = Something.objects.create(name='empty')
>>> empty.data
{}
>>> empty.data['a'] = '1'
>>> empty.save()
>>> Something.objects.get(name='something').data['a']
'1'
You can issue indexed queries against hstore fields:
# equivalence
Something.objects.filter(data={'a': '1', 'b': '2'})
# subset by key/value mapping
Something.objects.filter(data__a='1')
# subset by list of keys
Something.objects.filter(data__has_keys=['a', 'b'])
# subset by single key
Something.objects.filter(data__has_key='a')
JSONField:
JSON/JSONB fields support any JSON-encodable data type, not just key/value pairs; they also tend to be faster and (in the case of JSONB) more compact than hstore.
Several packages implement JSON/JSONB fields, including django-pgfields, but as of Django 1.9, JSONField is a built-in that uses jsonb for storage. (Since Django 3.1 it lives in django.db.models and works on all supported database backends.)
JSONField is similar to HStoreField, and may perform better with large dictionaries. It also supports types other than strings, such as integers, booleans and nested dictionaries.
# app/models.py
from django.contrib.postgres.fields import JSONField
from django.db import models

class Something(models.Model):
    name = models.CharField(max_length=32)
    data = JSONField(db_index=True)
Creating in the shell:
>>> instance = Something.objects.create(
...     name='something',
...     data={'a': 1, 'b': 2, 'nested': {'c': 3}}
... )
Indexed queries are nearly identical to HStoreField's, except that nesting is possible. Complex indexes may require manual creation (or a scripted migration), as sketched below.
>>> Something.objects.filter(data__a=1)
>>> Something.objects.filter(data__nested__c=3)
>>> Something.objects.filter(data__has_key='a')
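For instance, a hand-written migration adding a GIN index might look like this (a sketch; the app, table and index names are illustrative):

# app/migrations/0002_add_gin_index.py
from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [
        ('app', '0001_initial'),
    ]

    operations = [
        migrations.RunSQL(
            "CREATE INDEX something_data_gin ON app_something USING gin (data);",
            reverse_sql="DROP INDEX something_data_gin;",
        ),
    ]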
Django MongoDB
Or other NoSQL Django adaptations -- with them you can have fully dynamic models.
NoSQL Django libraries are great, but keep in mind that they are not 100% Django-compatible. For example, to migrate to Django-nonrel from standard Django you will need to replace ManyToMany with ListField, among other things.
Check out this Django MongoDB example:
from django.db import models
from djangotoolbox.fields import DictField

class Image(models.Model):
    exif = DictField()
...
>>> image = Image.objects.create(exif=get_exif_data(...))
>>> image.exif
{u'camera_model': 'Spamcams 4242', u'exposure_time': 0.3, ...}
You can even create embedded lists of any Django models:
from djangotoolbox.fields import EmbeddedModelField, ListField

class Container(models.Model):
    stuff = ListField(EmbeddedModelField())

class FooModel(models.Model):
    foo = models.IntegerField()

class BarModel(models.Model):
    bar = models.CharField()
...
>>> Container.objects.create(
...     stuff=[FooModel(foo=42), BarModel(bar='spam')]
... )
Django-mutant: Dynamic models based on syncdb and South-hooks
Django-mutant implements fully dynamic foreign key and m2m fields, and is inspired by the incredible but somewhat hackish solutions by Will Hardy and Michael Hall.
All of these are based on Django South hooks, which, according to Will Hardy's talk at DjangoCon 2011 (watch it!), are nevertheless robust and tested in production (relevant source code).
First to implement this was Michael Hall.
Yes, this is magic: with these approaches you can achieve fully dynamic Django apps, models and fields with any relational database backend. But at what cost? Will the application's stability suffer under heavy use? These are the questions to consider. You also need to maintain a proper lock so that simultaneous schema-altering requests are handled safely.
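As a rough illustration of that locking concern, schema changes could be serialized across processes with a shared cache lock (a hedged sketch, not how django-mutant actually implements it; names are illustrative):

import time

from django.core.cache import cache

def run_schema_change(alteration):
    # cache.add() is atomic: it succeeds for exactly one caller at a time,
    # so it can serve as a crude cross-process lock.
    while not cache.add('dynamic-models-schema-lock', 'locked', timeout=30):
        time.sleep(0.1)  # another process is altering the schema; wait
    try:
        alteration()
    finally:
        cache.delete('dynamic-models-schema-lock')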
If you are using Michael Hall's lib, your code will look like this:
from dynamo import models

test_app, created = models.DynamicApp.objects.get_or_create(
    name='dynamo'
)
test, created = models.DynamicModel.objects.get_or_create(
    name='Test',
    verbose_name='Test Model',
    app=test_app
)
foo, created = models.DynamicModelField.objects.get_or_create(
    name='foo',
    verbose_name='Foo Field',
    model=test,
    field_type='dynamiccharfield',
    null=True,
    blank=True,
    unique=False,
    help_text='Test field for Foo',
)
bar, created = models.DynamicModelField.objects.get_or_create(
    name='bar',
    verbose_name='Bar Field',
    model=test,
    field_type='dynamicintegerfield',
    null=True,
    blank=True,
    unique=False,
    help_text='Test field for Bar',
)
I've been working on pushing the django-dynamo idea further. The project is still undocumented but you can read the code at https://github.com/charettes/django-mutant.
Actually FK and M2M fields (see contrib.related) also work, and it's even possible to define wrappers for your own custom fields.
There's also support for model options such as unique_together and ordering, plus model bases, so you can subclass model proxies, abstract models or mixins.
I'm currently working on a lock mechanism that isn't in-memory, to make sure model definitions can be shared across multiple running Django instances while preventing them from using obsolete definitions.
The project is still very alpha, but it's a cornerstone technology for one of my projects so I'll have to take it to production-ready. The big plan is to support django-nonrel too, so we can leverage the mongodb driver.
Further research reveals that this is a somewhat special case of the Entity-Attribute-Value design pattern, which has been implemented for Django by a couple of packages.
First, there's the original eav-django project, which is on PyPI.
Second, there's a more recent fork of the first project, django-eav, which is primarily a refactor to allow use of EAV with Django's own models or models in third-party apps.
Related
I have a Django application that handles data analysis workflows, with database models that look something like this:
class Workflow(models.Model):
    execution_id = models.UUIDField()

class WorkflowItem(models.Model):
    workflow = models.ForeignKey(Workflow)
    type = models.CharField(choices=["input", "output"])
    files = models.ManyToManyField(File)

class File(models.Model):
    path = models.CharField()

class FileMetadata(models.Model):
    metadata = models.JSONField()
    file = models.ForeignKey(File)
    version = models.IntegerField()
A given Workflow will have many WorkflowItems, which correspond to Files, which can be used by WorkflowItems across many Workflows. Each File can have many associated FileMetadata entries, of which the entry with the max version value is typically used for a given operation.
As the application has been growing, it's getting tedious to build out all the different combinations of logic needed to find the entries in one table based on a given entry in another table just by using each table's foreign-key interface (Workflow <-> WorkflowItem <-> File <-> FileMetadata).
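For example, collecting the latest metadata for a workflow's input files currently looks something like this (an illustrative sketch of the traversal, not code from the actual app):

def latest_input_metadata(workflow):
    # Walk Workflow -> WorkflowItem -> File -> FileMetadata by hand,
    # keeping only the highest-version metadata row per file.
    results = []
    for item in workflow.workflowitem_set.filter(type="input"):
        for f in item.files.all():
            md = (FileMetadata.objects
                  .filter(file=f)
                  .order_by('-version')
                  .first())
            if md is not None:
                results.append(md)
    return results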
I am considering just building a table that holds all the foreign keys for every relationship in a single place. Something like this:
class WorkflowFile(models.Model):
    workflow = models.ForeignKey(Workflow)
    workflow_item = models.ForeignKey(WorkflowItem)
    file = models.ForeignKey(File)
    file_metadata = models.ForeignKey(FileMetadata)
However, I am not sure if this is a good idea or not. It's not clear to me whether implementing a table like this is advantageous compared to just following all the foreign-key relationships individually per table. It's also not clear how I should set up such a table through Django, and whether the new requirement to manually enter values into this table all the time would outweigh the reduced need for unique query logic every time I want to query these relationships. My end goal is to provide a simpler, more consistent way to get all of the items in the relationship based on any of the other items in the relationship.
This question seems similar in premise, but I am not clear that the problem or proposed solution is relevant to what I am looking for here.
Not sure this will actually answer your question, but if you want to go the route with multiple FKs then you may consider using a through table in combination with the m2m_changed signal, to fill in the proper FKs on that model after M2M records are added to WorkflowItem.
It'll be something like:
from django.db.models.signals import m2m_changed

class WorkflowItem(models.Model):
    workflow = models.ForeignKey(Workflow)
    type = models.CharField(choices=["input", "output"])
    files = models.ManyToManyField(File, through='IntermediateTable')

class IntermediateTable(models.Model):
    file = models.ForeignKey(File, related_name='file')
    workflow_item = models.ForeignKey(WorkflowItem, related_name='workflowitem')
    workflow = models.ForeignKey(Workflow, null=True)
    file_metadata = models.ForeignKey(FileMetadata, null=True)

def workflow_item_changed(sender, instance, action, **kwargs):
    # Once files have been added, copy the denormalized FK onto the
    # freshly created through rows.
    if action == 'post_add':
        IntermediateTable.objects.filter(workflow_item=instance).update(
            workflow=instance.workflow)

m2m_changed.connect(workflow_item_changed, sender=WorkflowItem.files.through)
I am running Django on Heroku with the zero-downtime feature. This means that during a deployment two versions of the code are running (old and new) against the same database. That's why we need to avoid any backward-incompatible migrations.
Is there a possibility to exclude a field from a Django query on a given model?
Let's say we have a model (version 1):
class Person(models.Model):
    name = models.CharField()
    address = models.TextField()
At some point in the future we want to move address to a separate table. We know that we should not delete a field if the older code is to keep working, so the Person model may look like this (version 2):
class Person(models.Model):
    name = models.CharField()
    address = models.ForeignKey(Address)
    _address = models.TextField(db_column='address')
This way, if the old code queries for address, it will still get it from the Person table even after the database has been migrated (it will be a stale value, but let's assume that's not a big issue).
How can I now safely delete the _address field? If we deploy version 3 with the _address field removed, the code for version 2 will still try to fetch _address on select, even though it's not used anywhere, and will fail with a "no such column" error.
Is there a way to prevent this and mark a field as "non-fetchable" in the code for version 2? Then version 2 would not delete the field but would no longer fetch it, and version 3 would delete it.
You can use a custom manager to defer your specific field(s) in every queryset:
class CustomManager(models.Manager):
    def get_queryset(self):
        return super(CustomManager, self).get_queryset().defer('_address')

class Person(models.Model):
    name = models.CharField()
    address = models.ForeignKey(Address)
    _address = models.TextField(db_column='address')

    objects = CustomManager()
After that, any queryset against the Person model will exclude the _address column from the query by default.
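A quick way to verify this in the shell (a sketch, assuming an app label of app; deferred columns are simply left out of the SELECT clause):

>>> print(Person.objects.all().query)
SELECT "app_person"."id", "app_person"."name", "app_person"."address_id" FROM "app_person"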
Yes, you can do it:
QuerySet.defer():
"In some complex data-modeling situations, your models might contain a lot of fields, some of which could contain a lot of data (for example, text fields), or require expensive processing to convert them to Python objects. If you are using the results of a queryset in some situation where you don’t know if you need those particular fields when you initially fetch the data, you can tell Django not to retrieve them from the database." - docs
Entry.objects.defer("headline", "body")
OR
From Django 1.8 onwards, you can use values_list() to include only the fields you want. You can also use QuerySet.only() and QuerySet.defer() to refine your queryset queries, and defer() calls can be chained.
Entry.objects.values_list('id', 'headline')
class BaseCommentAbstractModel(models.Model):
    """
    An abstract base class that any custom comment models probably should
    subclass.
    """

    # Content-object field
    content_type = models.ForeignKey(ContentType,
        verbose_name=_('content type'),
        related_name="content_type_set_for_%(class)s")
    object_pk = models.TextField(_('object ID'))
    content_object = generic.GenericForeignKey(ct_field="content_type",
                                               fk_field="object_pk")

    # Metadata about the comment
    site = models.ForeignKey(Site)

    class Meta:
        abstract = True

    def get_content_object_url(self):
        """
        Get a URL suitable for redirecting to the content object.
        """
        return urlresolvers.reverse(
            "comments-url-redirect",
            args=(self.content_type_id, self.object_pk)
        )
I have two questions related to this model code.
In models.TextField(_('object ID')), 'object ID' presumably is the verbose name of this TextField. How is it reflected in the database?
Why does Django rely on the abstract attribute of the Meta inner class instead of using the abc (abstract base class) module?
That is indeed the verbose name. I assume you understand that _ is a call to ugettext_lazy, which localizes strings. The verbose name is not represented in the database; the name of the column in the database would be object_pk.
I'm not a Django dev so I can't speak with authority, but some things are obvious. abc is new in Python 2.6. This is an issue because, as of the most recent release, the minimum Python version was only just moved to 2.5, and that minimum has been raised quite quickly of late: it was only in Django 1.2 that Python 2.4 became the minimum. Abstract models have existed at least as far back as Django 1.0, and I think even further back than that (though I can't recall for sure). So even if abc were suitable (which I'm not sure it is, since the behavior of models is rather complex), it wouldn't be suitable for Django at this point because of the required Python version.
Additionally, there is some complexity in managing classes that represent database tables rather than plain data structures. I'm not sure how much this impacts abstract models, but, for example, you can't perform field hiding on Django model attributes that are Field instances.
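A minimal hypothetical example of that restriction; with model inheritance, redefining an inherited Field raises an error as soon as the class is defined:

from django.db import models

class Place(models.Model):
    name = models.CharField(max_length=30)

class Restaurant(Place):
    # Raises django.core.exceptions.FieldError: local field 'name'
    # clashes with the field of the same name from base class 'Place'.
    name = models.TextField()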
Howdy. I'm working on migrating an internal system to Django and have run into a few wrinkles.
Intro
Our current system (a billing system) tracks double-entry bookkeeping while allowing users to enter data as invoices, expenses, etc.
Base Objects
So I have two base objects/models:
JournalEntry
JournalEntryItem
defined as follows:
class JournalEntry(models.Model):
    gjID = models.AutoField(primary_key=True)
    date = models.DateTimeField('entry date')
    memo = models.CharField(max_length=100)

class JournalEntryItem(models.Model):
    journalEntryID = models.AutoField(primary_key=True)
    gjID = models.ForeignKey(JournalEntry, db_column='gjID')
    amount = models.DecimalField(max_digits=10, decimal_places=2)
So far, so good. It works quite smoothly on the admin side (inlines work, etc.)
On to the next section.
We then have two more models
InvoiceEntry
InvoiceEntryItem
An InvoiceEntry is a superset of / it inherits from JournalEntry, so I've been using a OneToOneField (which is what we're using in the background on our current site). That works quite smoothly too.
class InvoiceEntry(JournalEntry):
    invoiceID = models.AutoField(primary_key=True, db_column='invoiceID', verbose_name='')
    journalEntry = models.OneToOneField(JournalEntry, parent_link=True, db_column='gjID')
    client = models.ForeignKey(Client, db_column='clientID')
    datePaid = models.DateTimeField(null=True, db_column='datePaid', blank=True, verbose_name='date paid')
Where I run into problems is when trying to add an InvoiceEntryItem (which inherits from JournalEntryItem) to an inline related to InvoiceEntry. I'm getting the error:
<class 'billing.models.InvoiceEntryItem'> has more than 1 ForeignKey to <class 'billing.models.InvoiceEntry'>
The way I see it, InvoiceEntryItem has a ForeignKey directly to InvoiceEntry. And it also has an indirect ForeignKey to InvoiceEntry through the JournalEntry 1->M JournalEntryItems relationship.
Here's the code I'm using at the moment.
class InvoiceEntryItem(JournalEntryItem):
    invoiceEntryID = models.AutoField(primary_key=True, db_column='invoiceEntryID', verbose_name='')
    invoiceEntry = models.ForeignKey(InvoiceEntry, related_name='invoiceEntries', db_column='invoiceID')
    journalEntryItem = models.OneToOneField(JournalEntryItem, db_column='journalEntryID')
I've tried removing the journalEntryItem OneToOneField. Doing that then removes my ability to retrieve the dollar amount for this particular InvoiceEntryItem (which is only stored in journalEntryItem).
I've also tried removing the invoiceEntry ForeignKey relationship. Doing that removes the relationship that allows me to see the InvoiceEntry 1->M InvoiceEntryItems in the admin inline. All I see are blank fields (instead of the actual data that is currently stored in the DB).
It seems like option 2 is closer to what I want to do. But my inexperience with Django seems to be limiting me. I might be able to filter the larger pool of journal entries to see just invoice entries. But it would be really handy to think of these solely as invoices (instead of a subset of journal entries).
Any thoughts on how to do what I'm after?
First, inheriting from a model automatically creates a OneToOneField on the child model pointing to the parent, so you don't need to add one yourself. Remove the explicit ones if you really want to use this form of model inheritance.
If you only want to share the fields of the model, you can use abstract (Meta) inheritance instead, which creates the inherited columns directly in the table of your child model. This would split your journal entries across two tables, though, but it would make it easy to retrieve only the invoices.
All fields in the superclass also exist on the subclass, so having an explicit relation is unnecessary.
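A hedged sketch of what the models might look like once the explicit parent links are dropped (Django creates the implicit journalentry_ptr / journalentryitem_ptr OneToOneFields itself; the custom column names from the original are omitted for brevity):

class InvoiceEntry(JournalEntry):
    client = models.ForeignKey(Client, db_column='clientID')
    datePaid = models.DateTimeField(null=True, blank=True,
                                    verbose_name='date paid')

class InvoiceEntryItem(JournalEntryItem):
    invoiceEntry = models.ForeignKey(InvoiceEntry,
                                     related_name='invoiceEntryItems')

# Fields of the parent are directly accessible on the child, e.g. the
# amount stored on JournalEntryItem:
# InvoiceEntryItem.objects.get(pk=1).amount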
Model inheritance in Django is terrible. Don't use it. Python doesn't need it anyway.
I have a couple of models in Django which are connected many-to-many. I want to create instances of these models in memory, present them to the user (via custom method calls inside the view templates) and, if the user is satisfied, save them to the database.
However, if I try to do anything with the model instances (call rendering methods, e.g.), I get an error message saying that I have to save the instances first. The documentation says that this is because the models are in a many-to-many relationship.
How do I present objects to the user and allow him/her to save or discard them without cluttering my database?
(I guess I could turn off transaction handling and do it myself throughout the whole project, but this sounds like a potentially error-prone measure...)
Thx!
I would add a field which indicates whether the objects are "draft" or "live". That way they are persisted across requests, sessions, etc. and django stops complaining.
You can then filter your objects to only show "live" objects in public views and only show "draft" objects to the user that created them. This can also be extended to allow "archived" objects (or any other state that makes sense).
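A minimal sketch of that idea (model and field names are illustrative):

from django.db import models

class Gallery(models.Model):
    STATUS_CHOICES = (
        ('draft', 'Draft'),
        ('live', 'Live'),
    )
    name = models.CharField(max_length=100)
    status = models.CharField(max_length=10, choices=STATUS_CHOICES,
                              default='draft')

# Public views only ever see live objects:
Gallery.objects.filter(status='live')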
I think that using django forms may be the answer, as outlined in this documentation (search for m2m...).
Edited to add some explanation for other people who might have the same problem:
say you have a model like this:
from django.db import models
from django.forms import ModelForm

class Foo(models.Model):
    name = models.CharField(max_length=30)

class Bar(models.Model):
    foos = models.ManyToManyField(Foo)

    def __unicode__(self):
        return " ".join(x.name for x in self.foos.all())
then you cannot call unicode() on an unsaved Bar object. If you do want to print things out before they are saved, you have to do something like this:
class BarForm(ModelForm):
    class Meta:
        model = Bar

def example():
    f1 = Foo(name='sue')
    f1.save()
    f2 = Foo(name='wendy')
    f2.save()
    bf = BarForm({'foos': [f1.id, f2.id]})
    if not bf.is_valid():
        print bf.errors
    else:
        b = bf.save(commit=False)
        # unfortunately, unicode(b) doesn't work before it is saved properly,
        # so we inspect the validated form data instead:
        for key, value in bf.cleaned_data.items():
            print key + " => " + str(value)
So, in this case, you have to have saved Foo objects (which you might validate before saving, using their own form), and before saving the model with many-to-many keys you can validate it as well. All without the need to save data too early and mess up the database or deal with transactions...
Very late answer, but Wagtail's team has made a separate Django extension called django-modelcluster. It's what powers their CMS's draft previews.
It allows you to do something like this (from their README):
from django.db import models

from modelcluster.models import ClusterableModel
from modelcluster.fields import ParentalKey

class Band(ClusterableModel):
    name = models.CharField(max_length=255)

class BandMember(models.Model):
    band = ParentalKey('Band', related_name='members')
    name = models.CharField(max_length=255)
Then the models can be used like so:
beatles = Band(name='The Beatles')
beatles.members = [
    BandMember(name='John Lennon'),
    BandMember(name='Paul McCartney'),
]
Here, ParentalKey is the replacement for Django's ForeignKey. Similarly, they have ParentalManyToManyField to replace Django's ManyToManyField.
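Per modelcluster's README, the child objects live purely in memory until the parent is saved; committing everything is then a single call:

# The in-memory relation can be queried without touching the database:
names = [m.name for m in beatles.members.all()]  # ['John Lennon', 'Paul McCartney']

# Saving the parent commits the Band and its members in one go:
beatles.save()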