Django DRY Model/Form/Serializer Validation

Django DRY Model/Form/Serializer Validation - python

I'm having some issues figuring out the best (read: DRY & maintainable) place for introducing validation logic in Django, namely between models, forms, and DRF serializers.
I've worked with Django for several years and have been following the various conventions for handling model, form, and REST API endpoint validation. I've tried a lot of variations for ensuring overall data integrity, but I've hit a bit of a stumbling block recently. Here is a brief list of what I've tried after looking through many articles, SO posts, and tickets:
Validation at the model level; namely, ensuring all of my custom constraints are matched before calling myModel.save() by overriding myModel.clean() (as well as field-specific and unique together methods). To do this, I ensured myModel.full_clean() was called in myForm.clean() (for forms -- and the admin panel actually already does this) and mySerializer.validate() (for DRF serializers) methods.
Validation at the form and serializer level, calling a shared method for maintainable, DRY code.
Validation at the form and serializer level, with a distinct method for each to ensure maximum flexibility (i.e. for when forms and endpoints have different constraints).
Method one seems the most intuitive to me for when forms and serializers have identical constraints, but is a bit messy in practice; first, data is automatically cleaned and validated by the form or serializer, then the model entity is instantiated, and more validation is run again -- which is a little convoluted and can get complicated.
Method three is what Django Rest Framework recommends as of version 3.0; they eliminated a lot of their model.save() hooks and prefer to leave validation to the user-facing aspects of your application. This makes some sense to me, since Django's base model.save() implementation doesn't call model.full_clean() anyway.
So, method two seems to be the best overall generalized outcome to me; validation lives in a distinct place -- before the model is ever touched -- and the codebase is less cluttered / more DRY due to the shared validation logic.
Unfortunately, most of the trouble I've encountered is with getting Django Rest Framework's serializers to cooperate. All three approaches work well for forms, and in fact work well for most HTTP methods (most notably when POSTing for entity creation) -- but none seem to play well when updating an existing entity (PUT, PATCH).
Long story short, it has proved rather difficult to validate incoming data when it is incomplete (but otherwise valid -- often the case for PATCH). The request data may only contain some fields -- those that contain different / new information -- and the model instance's existing information is maintained for all other fields. In fact, DRF issue #4306 perfectly sums up this particular challenge.
I've also considered running custom model validation at the viewset level (after serializer.validated_data is populated and serializer.instance exists, but before serializer.save() is called), but I'm still struggling to come up with a clean, generalized approach due to the complexities of handling updates.
TL;DR Django Rest Framework makes it a bit hard to write clean, maintainable validation logic in an obvious place, especially for partial updates that rely on a blend of existing model data and incoming request data.
I'd love to have some Django gurus weigh in on what they've gotten to work, because I'm not seeing any convenient solution.
Thanks.

Just realized I never posted my solution back to this question. I ended up writing a model mixin to always run validation before saving; it's a bit inconvenient as validation will technically be run twice in Django's forms (i.e. in the admin panel), but it lets me guarantee that validation is run -- regardless of what triggers a model save. I generally don't use Django's forms, so this doesn't have much impact on my applications.
Here's a quick snippet that does the trick:
class ValidatesOnSaveModelMixin:
""" ValidatesOnSaveModelMixin
A mixin that ensures valid model state prior to saving.
"""
def save(self, **kwargs):
self.full_clean()
super(ValidatesOnSaveModelMixin, self).save(**kwargs)
Here is how you'd use it:
class ImportantModel(ValidatesOnSaveModelMixin, models.Model):
""" Will always ensure its fields pass validation prior to saving. """
There is one important caveat: any of Django's direct-to-database operations (i.e. ImportantModel.objects.update()) don't call a model's save() method, and therefore will not be validated. There's not much to do about this, since these methods are really about optimizing performance by skipping a bunch of database calls -- so just be aware of their impact if you use them.

I agree, the link between models/serializers/validation is broken.
The best DRY solution I've found is to keep validation in model, with validators specified on fields, then if needed, model level validation in clean() overridden.
Then in serializer, override validate and call the model clean() e.g. in MySerializer:
def validate(self, data):
instance = FooModel(**data)
instance.clean()
return data
It's not nice, but I prefer this to 2-level validation in serializer and model.

Just wanted to add on SamuelMS's answer.
In case you use F() expressions and similar. As explained here this will fail.
class ValidatesOnSaveModelMixin:
""" ValidatesOnSaveModelMixin
A mixin that ensures valid model state prior to saving.
"""
def save(self, **kwargs):
if 'clean_on_save_exclude' in kwargs:
self.full_clean(exclude=kwargs.pop('clean_on_save_exclude', None)
  else:
self.full_clean()
super(ValidatesOnSaveModelMixin, self).save(**kwargs)
Then just use it the same way he explained.
And now when calling save, if you use query expressions can just call
instance.save(clean_on_save_exclude=['field_name'])
Just like you would exclude if you were calling full_clean and exclude the fields with query expressions.
See https://docs.djangoproject.com/en/2.2/ref/models/instances/#django.db.models.Model.full_clean

Related

How can I define a metric for uniqueness of strings on Django model?

Suppose I have a model that represents scientific articles. Doing some research, I may find the same article more than once, with approximately equal titles:
Some Article Title
Some Article Title
Notice that the second title string is slightly different: it has an extra space before "Title".
If the problem was because there could be more or less spacing, it would be easy since I could just trim it before saving.
But say there could be more small differences that consist of characters other than spaces:
Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT) project.
Comparison of machine learning techniques to predict all-cause mortality using fitness data: the Henry ford exercIse testing (FIT).
This is some random article I used here as an example
Those titles clearly refer to the same unique work, but the second one for some reason is missing some letters.
What is the best way of defining uniqueness in this situation?
In my mind, I was thinking of some function that calculates the levenshtein distance and decides if the strings are the same title based on some threshold. But is it possible to do on a django model, or define this behavior on a database level?

My first thought was the levenshtein distance too, so it's probably the way to go here ;) You could implement it yourself or find the code that already knows how to compute it (there's a lot of them) and then...
...use it in the model validation:
https://docs.djangoproject.com/en/2.0/ref/models/instances/#validating-objects
You can basically raise an exception in the custom validate_unique if you decide the new object violates this special type of uniqueness. The flipside is you'll probably need to load all other objects there.
If you create these objects on your own, you'll have to call full_clean() explicitly before saving. If the articles come from some kind of form, calling is_valid() on that form is enough.

You have 2 options here, 0 of which are perfect.
Option 1
This assumes you have a function titles_are_similar(title_1: str, title_2: str): bool implemented, that decides whether the two titles are similar. Use any sort of fuzzy string comparison of your choice to implement this function.
We will need to use an enhanced validator.
I said "enhanced" because it will optionally accept the object you are currently trying to save, when a typical django validator for obvious reasons does not do so.
The current object's id is required. When you change and save an already existing instance/row x, validation should not fail because the table already contains a "similar" value that belongs to this exact instance/row x.
The validator itself will use values_list to reduce the performance impact.
def title_unique_enough_validator(value, exclude_obj=None):
query_set = Article.objects.all()
if exclude_obj:
query_set = query_set.exclude(pk=exclude_obj.pk) # pk -> id
old_titles = query_set.values_list("title", flat=True)
if any(titles_are_similar(old_title, new_title) for old_title in old_titles):
raise ValidationError("Similar title already exists") # also use _()
If you will use: title = models.CharField(validators=[title_unique_enough_validator], ...) you will get a ValidationError every time you try to modify and save an existing object, as this object is not passed into the validator and therefore not excluded from the check (I mentioned it above). Instead, we will override the Article.clean() method (docs):
class Article(Model):
...
def clean(self):
super().clean()
title_unique_enough_validator(value=self.title, obj=self)
It will nicely work with forms. But there are 2 other major problems left.
Problem 1
Quoting the docs:
Note, however, that like Model.full_clean(), a model’s clean() method is not invoked when you call your model’s save() method.
To solve this, override the .save() method:
class Article(...):
...
def save(self, *args, **kwargs)
title_unique_enough_validator(value=self.value, obj=self) # can raise ValidationError
return super().save(*args, **kwargs)
However, django does not expect to have a ValidationError when calling save(). So, every time you manually call article.save() from your Python code (without djano forms) you need to wrap it into a try ... except block. Otherwise your software will 500 on ValidationError.
Problem 2
Do you ever explicitly call Article.objects.update()? If so, bad news (docs):
update() does an update at the SQL level and, thus, does not call any save() methods on your models
As a workaround, you might want to create a custom model manager for the Article model and override the update(): simply make it unusable (raise NotImplemented), or implement an additional check there. Just something that will prevent it from violating your constraint.
Option 2
Use database constraints.
Why I did not list this option first? Well, you will encounter tons and tons of problems with it. Django is not aware what database constraints might do. It just dies with OperationalError (docs) every time a constraint prevents it from doing what it wants.
As I have to work with many unmanaged models using django, I can confirm that you will require crap load of efforts to enhance django classes, so that it can deal with the OperationalError every now and then without exploding every bloody time. Especially painful is to deal with it if you're using django.contrib.admin, as it's just an endless pile of spaghetti.
So, seriously, avoid database constraints, unless you already must use unmanaged models or you're a masochist in search of adventures.

What is the proper process for validating and saving data with with Django/Django Rest Framework regardless the data source?

I have a particular model that I'd like to perform custom validations on. I'd like to guarantee that at least one identifier field is always present when creating a new instance such that its impossible to create an instance without one of these fields, though no field in particular is individually required.
from django.db import models
class Security(models.Model):
symbol = models.CharField(unique=True, blank=True)
sedol = models.CharField(unique=True, blank=True)
tradingitemid = models.Charfield(unique=True, blank=True)
I'd like a clean, reliable way to do this no matter where the original data is coming from (e.g., an API post or internal functions that get this data from other sources like a .csv file).
I understand that I could overwrite the models .save() method and perform validation, but best practice stated here suggests that raising validation errors in the .save() method is a bad idea because views will simply return a 500 response instead of returning a validation error to a post request.
I know that I can define a custom serializer with a validator using Django Rest Framework for this model that validates the data (this would be a great solution for a ModelViewSet where the objects are created and I can guarantee this serializer is used each time). But this data integrity guarantee is only good on that API endpoint and then as good as the developer is at remembering to use that serializer each and every time an object is created elsewhere in the codebase (objects can be created throughout the codebase from sources besides the web API).
I am also familiar with Django's .clean() and .full_clean() methods. These seem like the perfect solutions, except that it again relies upon the developer always remembering to call these methods--a guarantee that's only as good as the developer's memory. I know the methods are called automatically when using a ModelForm, but again, for my use case models can be created from .csv downloads as well--I need a general purpose guarantee that's best practice. I could put .clean() in the model's .save() method, but this answer (and related comments and links in the post) seem to make this approach controversial and perhaps an anti-pattern.
Is there a clean, straightforward way to make a guarantee that this model can never be saved without one of the three fields that 1. doesn't raise 500 errors through a view, 2. that doesn't rely upon the developer explicitly using the correct serializer throughout the codebase when creating objects, and 3. Doesn't rely upon hacking a call to .clean() into the .save() method of the model (a seeming anti-pattern)? I feel like there must be a clean solution here that isn't a hodge podge of putting some validation in a serializer, some in a .clean() method, hacking the .save() method to call .clean() (it would get called twice with saves from ModelForms), etc...

One could certainly imagine a design where save() did double duty and handled validation for you. For various reasons (partially summarized in the links here), Django decided to make this a two-step process. So I agree with the consensus you found that trying to shoehorn validation into Model.save() is an anti-pattern. It runs counter to Django's design, and will probably cause problems down the road.
You've already found the "perfect solution", which is to use Model.full_clean() to do the validation. I don't agree with you that remembering this will be burdensome for developers. I mean, remembering to do anything right can be hard, especially with a large and powerful framework, but this particular thing is straightforward, well documented, and fundamental to Django's ORM design.
This is especially true when you consider what is actually, provably difficult for developers, which is the error handling itself. It's not like developers could just do model.validate_and_save(). Rather, they would have to do:
try:
model.validate_and_save()
except ValidationError:
# handle error - this is the hard part
Whereas Django's idiom is:
try:
model.full_clean()
except ValidationError:
# handle error - this is the hard part
else:
model.save()
I don't find Django's version any more difficult. (That said, there's nothing stopping you from writing your own validate_and_save convenience method.)
Finally, I would suggest adding a database constraint for your requirement as well. This is what Django does when you add a constraint that it knows how to enforce at the database level. For example, when you use unique=True on a field, Django will both create a database constraint and add Python code to validate that requirement. But if you want to create a constraint that Django doesn't know about you can do the same thing yourself. You would simply write a Migration that creates the appropriate database constraint in addition to writing your own Python version in clean(). That way, if there's a bug in your code and the validation isn't done, you end up with an uncaught exception (IntegrityError) rather than corrupted data.

Idiomatic Way to Transfer Django Model Validation Errors to a Form

Please help me understand if the following choices I have made are idiomatic/good and if not, how to improve.
1) Model Validation over Form Validation
I prefer to use the new model validation over form validation whenever possible as it seems a more DRY and fundamental way to create rules for the data. Two examples for a simple calendar-entry model:
"start must be before end"
"period must be less than (end-start)"
Is it idiomatic/good to put those at the model level so they don't have to be put into forms? If ModelForm is the best answer, then what about the case of using multiple models in the same form?
(edit: I didn't realize that multiple ModelForms can actually be used together)
2) Transferring Model Validation to a Form (not ModelForm)
(edit: It turns out reinventing the plumbing between model validation and form validation is not necessary in my situation and the solutions below show why)
Let's assume for a moment that any model validation errors I have can be directly transferred and displayed directly to the user (i.e. ignore translating model validation errors to user-friendly form validation errors).
This is one working way I came up with to do it (all in one place, no helper functions):
view_with_a_form:
...
if form.is_valid():
instance = ...
...
#save() is overridden to do full_clean with optional exclusions
try: instance.save()
except ValidationError e:
nfe= e.message_dict[NON_FIELD_ERRORS]
#this is one big question mark:
instance._errors['__all__'] = ErrorList(nfe)
#this is the other big question mark. seems unnatural
if form.is_valid()
return response...
... #other code in standard format
#send the user the form as 1)new 2)form errors 3)model errors
return form
As commented in the code:
a) Is this an idiomatic/good way to transfer model errors to a form?
b) Is this an idiomatic/good way to test for new "form" errors?
Note: This example uses non-field errors, but I think it could equally apply to field errors.

[Edited - hope this answers your comments]
I'll be brief and direct but do not want be impolite :)
1) Model Validation over Form Validation
I guess that the rule of thumb could be that as long as the rule is really associated with the model, yes, it's better to do validation at model level.
For multiple-model forms, please checkout this other SO question: Django: multiple models in one template using forms, not forgetting the attached link which is slightly old but still relevant on the dryest way to achieve it. It's too long to re-discuss it all here!
2)
a) No! You should derive your form from a ModelForm that will perform model validation for you -> you do not need to call model validation yourself
b) No! If you want to validate a model you should not try and save it; you should use full_clean -> so if your ModelForm is valid, then you know the model is valid too and you can save.
And yes, these answers still apply even with multi-model forms.
So what you should do is:
Still use ModelForm
do not worry about validating a model, because a ModelForm will do that for you.
In general if you find yourself doing some plumbing with django it's because there's another, simpler way to do it!
The 2nd pointer should get you started and actually simplify your code a lot...

1) Yes, it's perfectly valid to do validation on the model. That's why the Django folks added it. Forms aren't always used in the process of saving models, so if the validation is only done through forms, you'll have problems. Of course, people worked around this limitation in the past by overriding the save method and including validation that way. However, the new model validation is more semantic and gives you a hook into the validation process when actually using a form.
2) The docs pretty clearly say that model validation (full_clean) is run when ModelForm.is_valid is called. However, if you don't use a ModelForm or want to do additional processing, you need to call full_clean manually. You're doing that, but putting it in an overridden save is the wrong approach. Remember: "Explicit is better than implicit." Besides, save is called in many other places and ways, and in the case of a ModelForm, you'll actually end up running full_clean twice that way.
That said, since the docs say that ModelForm does full_clean automatically, I figured it would make sense to see how it handles errors. In the _post_clean method, starting on line 323 of django.forms.models:
# Clean the model instance's fields.
try:
self.instance.clean_fields(exclude=exclude)
except ValidationError, e:
self._update_errors(e.message_dict)
# Call the model instance's clean method.
try:
self.instance.clean()
except ValidationError, e:
self._update_errors({NON_FIELD_ERRORS: e.messages})
In turn, the code for _update_errors starts on line 248 of the same module:
def _update_errors(self, message_dict):
for k, v in message_dict.items():
if k != NON_FIELD_ERRORS:
self._errors.setdefault(k, self.error_class()).extend(v)
# Remove the data from the cleaned_data dict since it was invalid
if k in self.cleaned_data:
del self.cleaned_data[k]
if NON_FIELD_ERRORS in message_dict:
messages = message_dict[NON_FIELD_ERRORS]
self._errors.setdefault(NON_FIELD_ERRORS, self.error_class()).extend(messages)
You'll have to play with the code a bit, but that should give you a good starting point for combining the form and model validation errors.

data validation for SQLAlchemy declarative models

I'm using CherryPy, Mako templates, and SQLAlchemy in a web app. I'm coming from a Ruby on Rails background and I'm trying to set up some data validation for my models. I can't figure out the best way to ensure, say, a 'name' field has a value when some other field has a value. I tried using SAValidation but it allowed me to create new rows where a required column was blank, even when I used validates_presence_of on the column. I've been looking at WTForms but that seems to involve a lot of duplicated code--I already have my model class set up with the columns in the table, why do I need to repeat all those columns again just to say "hey this one needs a value"? I'm coming from the "skinny controller, fat model" mindset and have been looking for Rails-like methods in my model like validates_presence_of or validates_length_of. How should I go about validating the data my model receives, and ensuring Session.add/Session.merge fail when the validations fail?

Take a look at the documentation for adding validation methods. You could just add an "update" method that takes the POST dict, makes sure that required keys are present, and uses the decorated validators to set the values (raising an error if anything is awry).

I wrote SAValidation for the specific purpose of avoiding code duplication when it comes to validating model data. It works well for us, at least for our use cases.
In our tests, we have examples of the model's setup and tests to show the validation works.

API Logic Server provides business rules for SQLAlchemy models. This includes not only multi-field, multi-table validations, but multi-table validations. It's open source.

I ended up using WTForms after all.

Separation of ORM and validation

I use django and I wonder in what cases where model validation should go. There are at least two variants:
Validate in the model's save method and to raise IntegrityError or another exception if business rules were violated
Validate data using forms and built-in clean_* facilities
From one point of view, answer is obvious: one should use form-based validation. It is because ORM is ORM and validation is completely another concept. Take a look at CharField: forms.CharField allows min_length specification, but models.CharField does not.
Ok cool, but what the hell all that validation features are doing in django.db.models? I can specify that CharField can't be blank, I can use EmailField, FileField, SlugField validation of which are performed here, in python, not on RDBMS. Furthermore there is the URLField which checks existance of url involving some really complex logic.
From another side, if I have an entity I want to guarantee that it will not be saved in inconsistent state whether it came from a form or was modified/created by some internal algorithms. I have a model with name field, I expect it should be longer than one character. I have a min_age and a max_age fields also, it makes not much sense if min_age > max_age. So should I check such conditions in save method?
What are the best practices of model validation?

I am not sure if this is best practise but what I do is that I tend to validate both client side and server side before pushing the data to the database. I know it requires a lot more effort but this can be done by setting some values before use and then maintaining them.
You could also try push in size contraints with **kwargs into a validation function that is called before the put() call.

Your two options are two different things.
Form-based validation can be regarded as syntactic validation + convert HTTP request parameters from text to Python types.
Model-based validation can be regarded as semantic validation, sometimes using context not available at the HTTP/form layer.
And of course there is a third layer at the DB where constraints are enforced, and may not be checkable anywhere else because of concurrent requests updating the database (e.g. uniqueness constraints, optimistic locking).

"but what the hell all that validation features are doing in django.db.models? "
One word: Legacy. Early versions of Django had less robust forms and the validation was scattered.
"So should I check such conditions in save method?"
No, you should use a form for all validation.
"What are the best practices of model validation?"*
Use a form for all validation.
"whether it came from a form or was modified/created by some internal algorithms"
What? If your algorithms suffer from psychotic episodes or your programmers are sociopaths, then -- perhaps -- you have to validate internally-generated data.
Otherwise, internally-generated data is -- by definition -- valid. Only user data can be invalid. If you don't trust your software, what's the point of writing it? Are your unit tests broken?

There's an ongoing Google Summer of Code project that aims to bring validation to the Django model layer. You can read more about it in this presentation from the GSoC student (Honza Kral). There's also a github repository with the preliminary code.
Until that code finds its way into a Django release, one recommended approach is to use ModelForms to validate data, even if the source isn't a form. It's described in this blog entry from one of the Django core devs.

DB/Model validation
The data store in database must always be in a certain form/state. For example: required first name, last name, foreign key, unique constraint. This is where the logic of you app resides. No matter where you think the data comes from - it should be "validated" here and an exception raised if the requirements are not met.
Form validation
Data being entered should look right. It is ok if this data is entered differently through some other means (through admin or api calls).
Examples: length of person's name, proper capitalization of the sentence...
Example1: Object has a StartDate and an EndDate. StartDate must always be before EndDate. Where do you validate this? In the model of course! Consider a case when you might be importing data from some other system - you don't want this to go through.
Example2: Password confirmation. You have a field for storing the password in the db. However you display two fields: password1 and password2 on your form. The form, and only the form, is responsible for comparing those two fields to see that they are the same. After form is valid you can safely store the password1 field into the db as the password.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.