Django model with hundreds of fields


I have a model with hundreds of properties. The properties can be of different types (integers, strings, uploaded files, ...). I would like to implement this complex model step by step, starting with the most important properties. I can think of two options:
Define the properties as regular model fields
Define a separate model to hold each property, and link it to the main model with a ForeignKey
I have not found any suggestions on how to handle models with lots of properties in Django. What are the advantages and drawbacks of each approach?

You definitely should not define your properties as ForeignKeys. Every time you need a full model, your database server will have to make hundreds of JOINs, therefore ruining your performance.
If your properties are needed almost every time you access the model, you should keep them in the same model. If not, you could make a separate Properties model and link it to your original model via OneToOneField.
I have personal experience with this. We had to build a hotel recommendation engine, and we were using Drupal back then. Because Drupal stores every custom property in a separate MySQL table, we quickly realised we had to switch frameworks: every single query crashed our production servers (20+ JOINs are deadly for MySQL). BTW, we ended up using a custom solution based on ElasticSearch, which handles hundreds of fields just fine.
Update: If you're lucky enough to be using a recent version of PostgreSQL, you could use a JSONField to pack all your properties into a single model field. Note, though, that you'll have to implement the validation scheme yourself.
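As a rough illustration of the validation the update mentions, here is a minimal sketch in plain Python. The schema format, the property names and the validate_properties helper are all hypothetical and not part of Django; in a Django project a function like this could be called from the model's clean() method before the dictionary is saved to the JSON field.

```python
# Hypothetical schema: property name -> (expected type, required?)
PROPERTY_SCHEMA = {
    "star_rating": (int, True),
    "name": (str, True),
    "has_pool": (bool, False),
}


def validate_properties(data, schema=PROPERTY_SCHEMA):
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []
    for key, (expected_type, required) in schema.items():
        if key not in data:
            if required:
                errors.append(f"missing required property: {key}")
            continue
        value = data[key]
        # bool is a subclass of int, so reject True/False where an int is expected.
        if expected_type is int and isinstance(value, bool):
            errors.append(f"{key}: expected int, got bool")
        elif not isinstance(value, expected_type):
            errors.append(
                f"{key}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
    # Unknown keys are reported so that typos do not silently create new properties.
    for key in data.keys() - schema.keys():
        errors.append(f"unknown property: {key}")
    return errors


print(validate_properties({"star_rating": "4", "name": "Grand"}))
```

Adding a new property then means adding one schema entry rather than a migration, which fits the "step by step" rollout the question asks about.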

customer requirement.
First off, I feel your pain and wish you the best! If this is not a hard requirement, I want to reiterate that you should first try to change it: there should never be a need for hundreds of properties on a single object. It normally indicates a need for an array, inheritance, separate classes, and so on.
Going forward, you're going to need to make heavy use of values() and values_list() to return only the properties you actually need from the database, since performance will otherwise be severely crippled.
Since you can't do anything about the model, you should try to address your performance issues from the design side of things. The single responsibility principle should feature heavily in your website, which means any given piece of code will only ever need a few values from the model. That way it really won't make much difference which option you choose, since what is returned will be very limited.
Filter where you can, and use ordering sparingly.

You could group them into a few separate models, linked by OneToOneFields to the main model. That would "namespace" your data, and namespaces are "one honking great idea".

Related

MultiSelectField vs separate model

I'm building a directory of hospitals and clinics, and in a speciality field I'd like to store the speciality or type of each clinic or hospital (Dermatologist, etc.). However, some places, especially big ones, have many different specialities under one roof, and since the choices= option of a CharField doesn't let me select more than one value, I had to think of an alternative.
At first I didn't think it was necessary to create a separate table and add a relation, so I tried the django-multiselectfield package, and it works just fine. Still, I was wondering whether it would be better to create a separate table and relate it to the Hospitals model. Once built, the contents of that 'type' or 'speciality' table likely won't ever change. Is a separate table better performance-wise?
I'm also trying to keep the choices in a separate choices.py file as TextChoices classes, since I will be using the same choices in various fields of different models across different apps. I know it is generally better to define the choices inside the model class itself, but does that make sense in my case?

Performance is probably not the primary concern here; I think the difference between the two approaches would be negligible. Whether one or more than one model would use the same set of choices doesn't lean one way or another; either a fixed list or many-to-many relation could accommodate that.
Although you say that the selections aren't expected to change (an argument in favor of a hard-coded list of choices), medical specialties are a kind of data that do change in the long run. Contrast this with, say, months of the year or days of the week, which are a lot less likely to change.
That said, if you already have a multi-select field working, I'd be inclined to leave it alone until there's a compelling reason to change it.

For that 2nd part, I see no issue with storing the Choice list in another .py file.
I've done that strictly to keep my models.py looking somewhat pretty- I don't want to scroll past 150 choices to double check a model method.
The 1st part is all about taste. I'd personally go the Relation + Many-To-Many route.
I always plan for edge cases, so "likely won't change" = "there's a possibility it will".
I also like that the Relation + Many-To-Many route doesn't add a dependency; it's a core Django feature, so it's pretty rock solid and future proof.
An added benefit of making it another table is that a non-technical person could add new options, so in theory you're not spending your time constantly changing it.

Is there a django idiom to store app-related variables in the DB?

I'm quite new to django, and moved to it from Drupal.
In Drupal it is possible to define module-level variables (read "application-level" for Django) which are stored in the DB in one of Drupal's "core tables". The idiom would be something like:
variable_set('mymodule_variablename', $value);
variable_get('mymodule_variablename', $default_value);
variable_del('mymodule_variablename');
The idea is that it wouldn't make sense for each module (app) to create a whole "module table" just to store one value, so the core provides a common table shared across modules.
To the best of my newbie understanding, Django lacks such functionality, but since it is a common pattern, I thought I'd turn to the SO community to check whether there is a typical/standard/idiomatic way that Django devs solve this problem.
(BTW: the value is not a constant that I could put in a settings file. It's a value that should be refreshed daily, and should be read at each request).

There are several apps that do this, but I'd recommend django-modeldict from Disqus. As its description puts it:
ModelDict is a very efficient way to store things like settings in
your database. The entire model is transformed into a dictionary
(lazily) as well as stored in your cache. It's invalidated only when
it needs to be (both in process and based on CACHE_BACKEND).

Data that is not static is stored in a model. If you need to share data or functions between apps I have seen the convention of making a shared app, something like 'common'. This would house shared models, or utility functions.
In the Django projects I have seen, the data is usually specific. The data you are storing should live in a model that is representative of that data; I would rather have an explicit model/object representing my data than a generic object that houses vastly different data.
If you are only defining one or two variables which change daily, perhaps just a key/value store like memcached would work for you?
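For comparison, the Drupal-style variable_set / variable_get / variable_del idiom from the question comes down to one shared key/value table. The sketch below uses the standard-library sqlite3 module only to stay self-contained; the VariableStore class and the table layout are hypothetical, and in a Django project the same idea would normally be a small model with a unique name field, possibly fronted by the cache.

```python
import json
import sqlite3


class VariableStore:
    """One shared key/value table, with values stored as JSON text."""

    def __init__(self, connection):
        self.connection = connection
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS variable "
            "(name TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def set(self, name, value):
        # INSERT OR REPLACE overwrites any previous value stored under the same name.
        self.connection.execute(
            "INSERT OR REPLACE INTO variable (name, value) VALUES (?, ?)",
            (name, json.dumps(value)),
        )
        self.connection.commit()

    def get(self, name, default=None):
        row = self.connection.execute(
            "SELECT value FROM variable WHERE name = ?", (name,)
        ).fetchone()
        return default if row is None else json.loads(row[0])

    def delete(self, name):
        self.connection.execute("DELETE FROM variable WHERE name = ?", (name,))
        self.connection.commit()


store = VariableStore(sqlite3.connect(":memory:"))
store.set("mymodule_variablename", {"rate": 1.25})
print(store.get("mymodule_variablename"))
store.delete("mymodule_variablename")
print(store.get("mymodule_variablename", "fallback"))
```

Prefixing each name with the app name, as in the Drupal example, keeps different apps from colliding in the shared table.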

Another +1 for ModelDict. Another potential, similar solution is Django Constance:
https://github.com/jazzband/django-constance
It's meant to store app config parameters in the database and has the advantage that it exposes a nice backend to edit them for administrators (with the right permissions), handles default values and also has caching etc.
EDIT:
In case it's not clear from the documentation (which it isn't), you can set settings the 'Pythonic' way, i.e. to set a setting to a value, you do:
from constance import config
config.variable_name = value

Getting and serializing the state of dynamically created python instances to a relational model

I'm developing a framework of sorts. I'm providing a base class, that will be subclassed by other developers to add behavior to the system. The instances of those classes will have attributes that my framework doesn't necessarily expect, except by inspecting those instances' __dict__. To make things even more interesting, some of those classes can be created dynamically, at any time.
I'd like some things to be handled by the framework, namely, I will need to persist those instances, display their attribute values to the user, and let her search/filter instances using those values.
I have to use a relational database. I know there are some decent Python OO databases out there, but unfortunately they're not an option in this case.
I'm not looking for a full-blown ORM either... and it may not even be an option, given that some of the classes can be created dynamically.
So, my question is, what state of a python instance do I need to serialize to ensure that I can deserialize it later on? Is it enough to look at __dict__, or are there other private attributes that I should be using?
Pickling the instances is not enough, because I'll need to unpickle them to search/filter the attribute values, and I'm afraid it's too much data to do it in-memory (instead of letting the database do it).
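To make the idea concrete, here is a minimal sketch of one way to persist __dict__ so that the database can do the filtering: one row per attribute (sometimes called an entity-attribute-value layout). The table layout, the save_instance and find_instances helpers and the Sensor class are hypothetical. The sketch assumes every attribute value is JSON-serializable and that instances keep their state in __dict__ (classes using __slots__ would need extra handling).

```python
import json
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE attribute ("
    "instance_id INTEGER, class_name TEXT, name TEXT, value TEXT, "
    "PRIMARY KEY (instance_id, name))"
)


def save_instance(instance_id, obj):
    """Store each entry of the instance __dict__ as one JSON-encoded row."""
    rows = [
        (instance_id, type(obj).__name__, name, json.dumps(value))
        for name, value in vars(obj).items()
    ]
    connection.executemany("INSERT INTO attribute VALUES (?, ?, ?, ?)", rows)


def find_instances(name, value):
    """Let the database filter: ids of instances whose attribute equals value."""
    cursor = connection.execute(
        "SELECT instance_id FROM attribute "
        "WHERE name = ? AND value = ? ORDER BY instance_id",
        (name, json.dumps(value)),
    )
    return [row[0] for row in cursor]


# A class created dynamically, as in the question.
Sensor = type("Sensor", (), {})
first, second = Sensor(), Sensor()
first.location, first.threshold = "roof", 5
second.location, second.threshold = "cellar", 5

save_instance(1, first)
save_instance(2, second)
print(find_instances("threshold", 5))
print(find_instances("location", "roof"))
```

Equality filters work directly on the stored text; range queries or nested values would need typed columns or more careful encoding.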

Just use an ORM. This is what they are for.
What you are proposing to do is create your own half-assed ORM on your own time. Save your time for your own code that does things, and use the effort other people put for free into solving this problem for you.
Note that all class creation in python is "dynamic" - this is not an issue, for, well, anything at all. In fact, if you are assembling classes programmatically, it is probably slightly easier with an ORM, because they provide reifications of fields.
In the worst case, if you really do need to store your objects in a fake NoSQL-type schema, you will still only have to write your own backend driver if you use an existing ORM, rather than coding the whole stack yourself. (As it happens, you're not the first person to face this; solutions exist. Google "python orm store dynamically created models" and "sqlalchemy store dynamically created models".)
Candidates include:
Django ORM
SQLAlchemy
Some others you can find by googling "Python ORM".

Can django lazy-load fields in a model?

One of my django models has a large TextField which I often don't need to use. Is there a way to tell django to "lazy-load" this field? i.e. not to bother pulling it from the database unless I explicitly ask for it. I'm wasting a lot of memory and bandwidth pulling this TextField into python every time I refer to these objects.
The alternative would be to create a new table for the contents of this field, but I'd rather avoid that complexity if I can.

This is controlled when you make the query, using the defer() queryset method, rather than in the model definition. Check it out here in the docs:
http://docs.djangoproject.com/en/dev/ref/models/querysets/#defer
Now, actually, your alternative solution of refactoring and pulling the data into another table is a really good solution. Some people would say that the need to lazy load fields means there is a design flaw, and the data should have been modeled differently.
Either way works, though!

There are two options for lazy-loading in Django: https://docs.djangoproject.com/en/1.6/ref/models/querysets/#django.db.models.query.QuerySet.only
defer(*fields)
Avoid loading those fields that require expensive processing to convert them to Python objects.
Entry.objects.defer("text")
only(*fields)
Only load the fields that you actually need
Person.objects.only("name")
Personally, I think only is better than defer since the code is not only easier to understand, but also more maintainable in the long run.

For something like this you can just override the default manager. That is usually not advised, but for a defer() it makes sense:
from django.db import models

class CustomManager(models.Manager):
    def get_queryset(self):
        # Defer the large text field on every query made through this manager.
        return super().get_queryset().defer('YOUR_TEXTFIELD_FIELDNAME')

class DjangoModel(models.Model):
    objects = CustomManager()

Examples of use for PickledObjectField (django-picklefield)?

While surfing the web and reading about Django development best practices, I keep seeing advice to use pickled model fields with extreme caution.
But in a real life example, where would you use a PickledObjectField, to solve what specific problems?

We have a system of social-network "backends" which do some generic stuff like "post message", "get status", "get friends" etc. The link between each backend class and a user is a Django model, which keeps the user, the backend name and the credentials. Now imagine how many auth systems there are: OAuth, plain passwords, Facebook's obscure JS stuff etc. This is where JSONField shines: we keep all backend-specific auth data in a dictionary on this model, which is stored in the DB as JSON, so we can put anything into it with no problem.

You would use it to store... almost-arbitrary Python objects. In general there's little reason to use it; JSON is safer and more portable.

You can definitely substitute a PickledObjectField with JSON and some extra logic to create an object out of the JSON. At the end of the day, whether you choose a PickledObjectField or JSON plus logic, your use case is serializing a Python object into your database. If you can trust the data in the Python object, and know that it will always be serializable, you can reasonably use the PickledObjectField. In my case (I don't use Django's ORM, but this should still apply), I have a couple of different object types that can go into my PickledObjectField, and their definitions are constantly changing. Rather than constantly updating my JSON parsing logic to create an object out of JSON values, I simply use a PickledObjectField to store the different objects, and later retrieve them in perfectly usable form (including calling their methods). Caveat: if you store an object via PickledObjectField, then change the object's definition, and then retrieve the object, the old object may have trouble fitting into the new definition (depending on what you changed).

The problems to be solved are the efficiency and the convenience of defining and handling a complex object consisting of many parts.
You can turn each part type into a Model and connect them via ForeignKeys.
Or you can turn each part type into a class, dictionary, list, tuple, enum or what have you, and use PickledObjectField to store and retrieve the whole beast in one step.
That approach makes sense if you will never manipulate parts individually, only the complex object as a whole.
Real life example
In my application there are RQdef objects that represent essentially a type with a certain basic structure (if you are curious what they mean, look here).
RQdefs consist of several Aspects and some fixed attributes.
Aspects consist of one or more Facets and some fixed attributes.
Facets consist of two or more Levels and some fixed attributes.
Levels consist of a few fixed attributes.
Overall, a typical RQdef will have about 20-40 parts.
An RQdef is always completely constructed in a single step before it is stored in the database and it is henceforth never modified, only read (but read frequently).
PickledObjectField is more convenient and much more efficient for this purpose than would be a set of four models and 20-40 objects for each RQdef.
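The "store and retrieve the whole beast in one step" idea can be sketched with the standard library alone. This is not django-picklefield's implementation, only an illustration of the principle; the field names inside the nested structure and the table layout are hypothetical. As with any pickle-based storage, only data from a trusted source should ever be unpickled.

```python
import pickle
import sqlite3

# Hypothetical nested structure: an RQdef with Aspects, Facets and Levels.
rqdef = {
    "name": "example",
    "aspects": [
        {
            "title": "aspect 1",
            "facets": [
                {"title": "facet 1", "levels": ("low", "high")},
                {"title": "facet 2", "levels": ("low", "medium", "high")},
            ],
        }
    ],
}

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE rqdef (id INTEGER PRIMARY KEY, payload BLOB)")

# One INSERT stores the whole object, however many parts it has.
connection.execute(
    "INSERT INTO rqdef (id, payload) VALUES (?, ?)", (1, pickle.dumps(rqdef))
)

# One SELECT brings it back, with Python types such as tuples intact.
(payload,) = connection.execute("SELECT payload FROM rqdef WHERE id = 1").fetchone()
restored = pickle.loads(payload)
print(restored == rqdef)
```

The trade-off is the one the answer names: the database cannot query or update individual parts, so this only fits objects that are always handled as a whole.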
