data validation for SQLAlchemy declarative models

data validation for SQLAlchemy declarative models - python

I'm using CherryPy, Mako templates, and SQLAlchemy in a web app. I'm coming from a Ruby on Rails background and I'm trying to set up some data validation for my models. I can't figure out the best way to ensure, say, a 'name' field has a value when some other field has a value. I tried using SAValidation but it allowed me to create new rows where a required column was blank, even when I used validates_presence_of on the column. I've been looking at WTForms but that seems to involve a lot of duplicated code--I already have my model class set up with the columns in the table, why do I need to repeat all those columns again just to say "hey this one needs a value"? I'm coming from the "skinny controller, fat model" mindset and have been looking for Rails-like methods in my model like validates_presence_of or validates_length_of. How should I go about validating the data my model receives, and ensuring Session.add/Session.merge fail when the validations fail?

Take a look at the documentation for adding validation methods. You could just add an "update" method that takes the POST dict, makes sure that required keys are present, and uses the decorated validators to set the values (raising an error if anything is awry).

I wrote SAValidation for the specific purpose of avoiding code duplication when it comes to validating model data. It works well for us, at least for our use cases.
In our tests, we have examples of the model's setup and tests to show the validation works.

API Logic Server provides business rules for SQLAlchemy models. This includes not only multi-field, multi-table validations, but multi-table validations. It's open source.

I ended up using WTForms after all.

Related

What is the proper process for validating and saving data with with Django/Django Rest Framework regardless the data source?

I have a particular model that I'd like to perform custom validations on. I'd like to guarantee that at least one identifier field is always present when creating a new instance such that its impossible to create an instance without one of these fields, though no field in particular is individually required.
from django.db import models
class Security(models.Model):
symbol = models.CharField(unique=True, blank=True)
sedol = models.CharField(unique=True, blank=True)
tradingitemid = models.Charfield(unique=True, blank=True)
I'd like a clean, reliable way to do this no matter where the original data is coming from (e.g., an API post or internal functions that get this data from other sources like a .csv file).
I understand that I could overwrite the models .save() method and perform validation, but best practice stated here suggests that raising validation errors in the .save() method is a bad idea because views will simply return a 500 response instead of returning a validation error to a post request.
I know that I can define a custom serializer with a validator using Django Rest Framework for this model that validates the data (this would be a great solution for a ModelViewSet where the objects are created and I can guarantee this serializer is used each time). But this data integrity guarantee is only good on that API endpoint and then as good as the developer is at remembering to use that serializer each and every time an object is created elsewhere in the codebase (objects can be created throughout the codebase from sources besides the web API).
I am also familiar with Django's .clean() and .full_clean() methods. These seem like the perfect solutions, except that it again relies upon the developer always remembering to call these methods--a guarantee that's only as good as the developer's memory. I know the methods are called automatically when using a ModelForm, but again, for my use case models can be created from .csv downloads as well--I need a general purpose guarantee that's best practice. I could put .clean() in the model's .save() method, but this answer (and related comments and links in the post) seem to make this approach controversial and perhaps an anti-pattern.
Is there a clean, straightforward way to make a guarantee that this model can never be saved without one of the three fields that 1. doesn't raise 500 errors through a view, 2. that doesn't rely upon the developer explicitly using the correct serializer throughout the codebase when creating objects, and 3. Doesn't rely upon hacking a call to .clean() into the .save() method of the model (a seeming anti-pattern)? I feel like there must be a clean solution here that isn't a hodge podge of putting some validation in a serializer, some in a .clean() method, hacking the .save() method to call .clean() (it would get called twice with saves from ModelForms), etc...

One could certainly imagine a design where save() did double duty and handled validation for you. For various reasons (partially summarized in the links here), Django decided to make this a two-step process. So I agree with the consensus you found that trying to shoehorn validation into Model.save() is an anti-pattern. It runs counter to Django's design, and will probably cause problems down the road.
You've already found the "perfect solution", which is to use Model.full_clean() to do the validation. I don't agree with you that remembering this will be burdensome for developers. I mean, remembering to do anything right can be hard, especially with a large and powerful framework, but this particular thing is straightforward, well documented, and fundamental to Django's ORM design.
This is especially true when you consider what is actually, provably difficult for developers, which is the error handling itself. It's not like developers could just do model.validate_and_save(). Rather, they would have to do:
try:
model.validate_and_save()
except ValidationError:
# handle error - this is the hard part
Whereas Django's idiom is:
try:
model.full_clean()
except ValidationError:
# handle error - this is the hard part
else:
model.save()
I don't find Django's version any more difficult. (That said, there's nothing stopping you from writing your own validate_and_save convenience method.)
Finally, I would suggest adding a database constraint for your requirement as well. This is what Django does when you add a constraint that it knows how to enforce at the database level. For example, when you use unique=True on a field, Django will both create a database constraint and add Python code to validate that requirement. But if you want to create a constraint that Django doesn't know about you can do the same thing yourself. You would simply write a Migration that creates the appropriate database constraint in addition to writing your own Python version in clean(). That way, if there's a bug in your code and the validation isn't done, you end up with an uncaught exception (IntegrityError) rather than corrupted data.

Why doesn't Django support Single Table Inheritance?

What is the rationale behind the decision to not support Single Table Inheritance in Django?
Is STI a bad design? Does it result in poor performance? Would it conflict with the Django ORM as it is?
Just wondering because it's been a missing feature for like ten years now and so there must have been a conscious decision made that it would never be supported.

One reason is possibly that Django does not (currently) have the ability to modify database tables after creation.
You can 'kind-of' do STI using proxy models. This will not allow you to have different fields on the different models, but it will allow you to attach different behaviour (via model methods) to different subclasses.
However, if you decide to create a subclass with extra fields, Django will not be able to update the database to reflect that.

Flask: View, model and business logic segration

Please help me how to solve following task in "pythonic" way:
There are several model classes, which are mapped to the DB with the help of SQLAlchemy.
There is a Flask view, which handles the "POST" request.
The business logic of this method contains complex logic which includes:
Getting input parameters from input JSON
Validation
Creation of several different models and the saving to database.
Is it a good idea to leave this logic in the "View"? Or it would be much better to separate this this logic into different modules or classes, for instance by introducing business logic class?

If you need to unit test the code separate from the View then you should defiantly separate it into another module or class.
As there seems to be three parts to your business logic then I would say starting by splitting the view into three functions of a module seems a good place to start.

TastyPie and Django ORM - how tightly coupled are they?

Is it possible to develop an API in Django "TastyPie" in a way which doesn't tie it directly to a "single" Django ORM model? i.e. a call /api/xyz/ would retrieve data from "a", "b" & "c" into a single JSON output. If so, please point me in the right direction.

tastypie is more tightly coupled to the ORM than django-piston, but there are methods that you can define in a tastypie resource to specify how to handle create, read, update, delete: http://readthedocs.org/docs/django-tastypie/en/latest/resources.html?highlight=put_list#obj-get
And you would just not set the queryset meta field.
django-piston on the other hand, has a more direct initial approach to having you define one or more of these methods. The resource can still be bound to a model to give you out of the box REST, but its more up front about showing you the methods to define for custom handling.
tastypie is a bit more robust in its process and features, but it makes this specific feature set a little less apparent.

Tastypie has ModelResource and Resource. The former is tied to a model(which you can override a lot of its methods as jdi suggested) and the latter is what you need I think. Example of Resource here. The example is for a Riak data source, in your case it would be a combination of django models.

Separation of ORM and validation

I use django and I wonder in what cases where model validation should go. There are at least two variants:
Validate in the model's save method and to raise IntegrityError or another exception if business rules were violated
Validate data using forms and built-in clean_* facilities
From one point of view, answer is obvious: one should use form-based validation. It is because ORM is ORM and validation is completely another concept. Take a look at CharField: forms.CharField allows min_length specification, but models.CharField does not.
Ok cool, but what the hell all that validation features are doing in django.db.models? I can specify that CharField can't be blank, I can use EmailField, FileField, SlugField validation of which are performed here, in python, not on RDBMS. Furthermore there is the URLField which checks existance of url involving some really complex logic.
From another side, if I have an entity I want to guarantee that it will not be saved in inconsistent state whether it came from a form or was modified/created by some internal algorithms. I have a model with name field, I expect it should be longer than one character. I have a min_age and a max_age fields also, it makes not much sense if min_age > max_age. So should I check such conditions in save method?
What are the best practices of model validation?

I am not sure if this is best practise but what I do is that I tend to validate both client side and server side before pushing the data to the database. I know it requires a lot more effort but this can be done by setting some values before use and then maintaining them.
You could also try push in size contraints with **kwargs into a validation function that is called before the put() call.

Your two options are two different things.
Form-based validation can be regarded as syntactic validation + convert HTTP request parameters from text to Python types.
Model-based validation can be regarded as semantic validation, sometimes using context not available at the HTTP/form layer.
And of course there is a third layer at the DB where constraints are enforced, and may not be checkable anywhere else because of concurrent requests updating the database (e.g. uniqueness constraints, optimistic locking).

"but what the hell all that validation features are doing in django.db.models? "
One word: Legacy. Early versions of Django had less robust forms and the validation was scattered.
"So should I check such conditions in save method?"
No, you should use a form for all validation.
"What are the best practices of model validation?"*
Use a form for all validation.
"whether it came from a form or was modified/created by some internal algorithms"
What? If your algorithms suffer from psychotic episodes or your programmers are sociopaths, then -- perhaps -- you have to validate internally-generated data.
Otherwise, internally-generated data is -- by definition -- valid. Only user data can be invalid. If you don't trust your software, what's the point of writing it? Are your unit tests broken?

There's an ongoing Google Summer of Code project that aims to bring validation to the Django model layer. You can read more about it in this presentation from the GSoC student (Honza Kral). There's also a github repository with the preliminary code.
Until that code finds its way into a Django release, one recommended approach is to use ModelForms to validate data, even if the source isn't a form. It's described in this blog entry from one of the Django core devs.

DB/Model validation
The data store in database must always be in a certain form/state. For example: required first name, last name, foreign key, unique constraint. This is where the logic of you app resides. No matter where you think the data comes from - it should be "validated" here and an exception raised if the requirements are not met.
Form validation
Data being entered should look right. It is ok if this data is entered differently through some other means (through admin or api calls).
Examples: length of person's name, proper capitalization of the sentence...
Example1: Object has a StartDate and an EndDate. StartDate must always be before EndDate. Where do you validate this? In the model of course! Consider a case when you might be importing data from some other system - you don't want this to go through.
Example2: Password confirmation. You have a field for storing the password in the db. However you display two fields: password1 and password2 on your form. The form, and only the form, is responsible for comparing those two fields to see that they are the same. After form is valid you can safely store the password1 field into the db as the password.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.