I have a django model class that maintains state as a simple property. I have added a couple of helper properties to the class to access aggregate states - e.g. is_live returns false if the state is any one of ['closed', 'expired', 'deleted'] etc.
As a result of this my model has a collection of is_ properties, that do very simple lookups on internal properties of the object.
I now want to add a new property, is_complete - which is semantically the same as all the other properties - a boolean check on the state of the object - however, this check involves loading up dependent (one-to-many) child objects, checking their state and reporting back based on the results - i.e. this property actually does some (more than one) database query, and processes the results.
So, is it still valid to model as a property (using the @property decorator), or should I instead forego the decorator and leave it as a method?
Pro of using a property is that it's semantically consistent with all the other is_ properties.
Pro of using a method is that it indicates to other developers that this is something that has a more complex implementation, and so should be used sparingly (i.e. not inside a for.. loop).
from django.db import models


class MyModel(models.Model):
    state = models.CharField(max_length=20, default='new')

    @property
    def is_open(self):
        # this is a simple lookup, so makes sense as a property
        return self.state in ['new', 'open', 'sent']

    def is_complete(self):
        # this is a complex database activity, but semantically correct
        related_objects = self.do_complicated_database_lookup()
        return len(related_objects) == 0
EDIT: I come from a .NET background originally, where the split is admirably defined by Jeff Atwood as
"if there's any chance at all that code could spawn an hourglass, it definitely should be a method."
EDIT 2: slight update to the question - would it be a problem to have it as a method, called is_complete, so that there are mixed properties and methods with similar names - or is that just confusing?
So - it would look something like this:
>>> m = MyModel()
>>> m.is_live
True
>>> m.is_complete()
False
It is okay to do that, especially if you use the following pattern:
class SomeClass(models.Model):
    @property
    def is_complete(self):
        if not hasattr(self, '_is_complete'):
            related_objects = self.do_complicated_database_lookup()
            self._is_complete = len(related_objects) == 0
        return self._is_complete
Just remember that it "caches" the result on the instance: the first access does the calculation, and subsequent accesses reuse the stored value, even if the related objects have changed in the meantime.
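On Python 3.8 or later, the standard library's functools.cached_property gives the same compute-once behaviour without the hasattr check. This is only a sketch on a plain class: Order, pending_items and the lookups counter are hypothetical stand-ins for the model and its database query.

```python
from functools import cached_property


class Order:
    """Plain-Python stand-in for the model (hypothetical names)."""

    def __init__(self, pending_items):
        self._pending_items = pending_items
        self.lookups = 0  # counts how often the "expensive" lookup runs

    def do_complicated_database_lookup(self):
        # Stand-in for the real query; here it only returns a list.
        self.lookups += 1
        return list(self._pending_items)

    @cached_property
    def is_complete(self):
        # Runs once per instance; the result is then stored on the instance.
        return len(self.do_complicated_database_lookup()) == 0


order = Order(pending_items=[])
first = order.is_complete
second = order.is_complete
print(first, second, order.lookups)
```

The second access never reaches do_complicated_database_lookup, which is exactly the caveat above: the value is as fresh as the first access, not the latest one.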
This is actually language agnostic, but I always prefer Python.
The builder design pattern is used to validate that a configuration is valid prior to creating an object, via delegation of the creation process.
Some code to clarify:
class A():
    def __init__(self, m1, m2):  # obviously more complex in life
        self._m1 = m1
        self._m2 = m2


class ABuilder():
    def __init__(self):
        self._m1 = None
        self._m2 = None

    def set_m1(self, m1):
        self._m1 = m1
        return self

    def set_m2(self, m2):
        self._m2 = m2
        return self

    def _validate(self):
        # complicated validations
        assert self._m1 < 1000
        assert self._m1 < self._m2

    def build(self):
        self._validate()
        return A(self._m1, self._m2)
My problem is similar, with the extra constraint that I can't re-create the object each time due to performance limitations.
Instead, I want to only update an existing object.
Bad solutions I came up with:
I could do as suggested here and just use setters like so
class A():
    ...
    def set_m1(self, m1):
        self._m1 = m1
    # and so on
But this is bad because using setters
Defeats the purpose of encapsulation
Defeats the purpose of the builder (now updater), which is supposed to validate that some complex configuration is preserved after the creation, or update in this case.
As I mentioned earlier, I can't recreate the object every time, as this is expensive and I only want to update some fields, or sub-fields, and still validate or sub-validate.
I could add update and validation methods to A and call those, but this defeats the purpose of delegating the responsibility of updates, and is intractable in the number of fields.
class A():
    ...
    def update1(self, m1):
        pass  # complex_logic1

    def update2(self, m2):
        pass  # complex_logic2

    def update12(self, m1, m2):
        pass  # complex_logic12
I could force every field of A to be updated through a single method with optional parameters
class A():
    ...
    def update(self, m1=None, m2=None):  # one optional parameter per field of A
        pass
Which again is not tractable, as this method will soon become a god method due to the many combinations possible.
Forcing the method to always accept changes in A, and validating in the Updater, also can't work, as the Updater will need to look at A's internal state to make a decision, causing a circular dependency.
How can I delegate updating fields in my object A
in a way that
Doesn't break encapsulation of A
Actually delegates the responsibility of updating to another object
Is tractable as A becomes more complicated
I feel like I am missing something trivial to extend building to updating.
I am not sure I understand all of your concerns, but I want to try and answer your post. From what you have written I assume:
Validation is complex and multiple properties of an object must be checked to decide if any change to the object is valid.
The object must always be in a valid state. Changes that make the object invalid are not permitted.
It is too expensive to copy the object, make the change, validate the object, and then reject the change if the validation fails.
Move the validation logic out of the builder and into a separate class like ModelValidator with an isValid(model) method.
The first option is to use a command pattern.
Create an abstract class or interface named Update (Python has no interface keyword, but an abstract base class from the abc module, or plain duck typing, works fine). The Update interface declares two methods, execute() and undo().
A concrete class has a name like UpdateAddress, UpdatePortfolio, or UpdatePaymentInfo.
Each concrete Update object also holds a reference to your model object.
The concrete classes hold the state needed for a particular kind of update. Imagine these methods exist on the UpdateAddress class:
UpdateAddress
    setStreetNumber(...)
    setCity(...)
    setPostcode(...)
    setCountry(...)
The update object needs to hold both the current and new values of a property. Like:
setStreetNumber(aString):
    self.oldStreetNumber = model.getStreetNumber()
    self.newStreetNumber = aString
When the execute method is called, the model is updated:
execute:
    model.setStreetNumber(newStreetNumber)
    model.setCity(newCity)
    # Set postcode and country
    if not ModelValidator.isValid(model):
        self.undo()
        raise ValidationError
and the undo method looks like:
undo:
    model.setStreetNumber(oldStreetNumber)
    model.setCity(oldCity)
    # Set postcode and country
That is a lot of typing, but it would work. Mutating your model object is nicely encapsulated by different kinds of updates. You can execute or undo the changes by calling those methods on the update object. You can even store a list of update objects for multi-level undos and re-tries.
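The command-pattern outline above can be sketched in runnable Python roughly as follows. Everything here is an assumption made for illustration: the Address model, the is_valid rule and the generic set(field, value) method are hypothetical, and the old values are captured in execute() rather than in each setter, which is a small deviation from the pseudocode.

```python
class ValidationError(Exception):
    pass


class Address:
    """Hypothetical model object with two fields."""

    def __init__(self, city, postcode):
        self.city = city
        self.postcode = postcode


def is_valid(address):
    # Hypothetical rule standing in for the complex validation.
    return bool(address.city) and len(address.postcode) == 5


class UpdateAddress:
    """Command that records old values so a failed update can be undone."""

    def __init__(self, model):
        self.model = model
        self.new_values = {}
        self.old_values = {}

    def set(self, field, value):
        self.new_values[field] = value
        return self

    def execute(self):
        # Remember current values, apply the new ones, then validate.
        self.old_values = {f: getattr(self.model, f) for f in self.new_values}
        for field, value in self.new_values.items():
            setattr(self.model, field, value)
        if not is_valid(self.model):
            self.undo()
            raise ValidationError("update rejected")

    def undo(self):
        # Restore every field this command touched.
        for field, value in self.old_values.items():
            setattr(self.model, field, value)


home = Address("Leeds", "12345")
UpdateAddress(home).set("city", "York").execute()
try:
    UpdateAddress(home).set("postcode", "1").execute()
    rejected = False
except ValidationError:
    rejected = True
print(home.city, home.postcode, rejected)
```

The valid update sticks, while the invalid one is rolled back before the exception reaches the caller, so the model is never left in an invalid state.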
However, it is a lot of typing for the programmer. Consider using persistent data structures. Persistent data structures can be used to copy objects very quickly -- approximately constant time complexity. Here is a python library of persistent data structures.
Let's assume your data was in a persistent data structure version of a dict. The library I referenced calls it a PMap.
The implementation of the update classes can be simpler. Starting with the constructor:
UpdateAddress(pmap)
    self.oldPmap = pmap
    self.newPmap = pmap
The setters are easier:
setStreetNumber(aString):
    self.newPmap = self.newPmap.set('streetNumber', aString)
Execute passes back a new instance of the model, with all the updates.
execute:
    if ModelValidator.isValid(newModel):
        return newModel
    else:
        raise ValidationError
The original object has not changed at all, thanks to the magic of persistent data structures.
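The same "validate a candidate, keep the original untouched" shape can be tried with only the standard library. Note that this swaps in a different technique: a frozen dataclass with dataclasses.replace makes a full shallow copy on every update, so it has none of the structural sharing that makes persistent data structures cheap. Address, is_valid and update are hypothetical names.

```python
from dataclasses import dataclass, replace


class ValidationError(Exception):
    pass


@dataclass(frozen=True)
class Address:
    """Immutable stand-in for the model (hypothetical fields)."""
    city: str
    postcode: str


def is_valid(address):
    # Hypothetical rule standing in for the complex validation.
    return bool(address.city) and len(address.postcode) == 5


def update(address, **changes):
    # Build a candidate copy; the original is never mutated.
    candidate = replace(address, **changes)
    if not is_valid(candidate):
        raise ValidationError("update rejected")
    return candidate


old = Address("Leeds", "12345")
new = update(old, city="York")
try:
    update(old, postcode="1")
    rejected = False
except ValidationError:
    rejected = True
print(old, new, rejected)
```

No undo method is needed here because a rejected candidate is simply discarded.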
The best thing is to not do any of this. Instead, use an ORM or object database. That is the "enterprise grade" solution. These libraries give you sophisticated tools like transactions and object version history.
Being new to Django, I'm starting to care a bit about performance of my web application.
I'm trying to transform many of my custom functions / properties which were originally in my models to querysets within custom managers.
In my model I have:
class Shape(models.Model):
    @property
    def nb_color(self):
        return 1 if self.colors=='' else int(1+sum(self.colors.upper().count(x) for x in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

    def __str__(self):
        return self.name + "-" + self.constraints

    @property
    def image_url(self):
        return format_html(f'{settings.SVG_DIR}/{self}.svg')

    @property
    def image_src(self):
        return format_html('<img src="{url}"|urlencode />'.format(url = self.image_url))

    def image_display(self):
        return format_html(f'{self.image_src}"')
But I'm not clear on a few points:
1/ Are there any pros or cons to declaring these with the property decorator in a Django model?
2/ What is the cost of calling a function/property in terms of database calls,
and therefore, is there any added value in using custom managers / querysets and defining annotations to simulate my functions at that level?
3/ How would you suggest I transform my image & nb_color functions into annotations?
Thanks in advance
PS: For the image related functions, I mostly figured it out:
self.annotate(
    image_url = Concat(Value(join(settings.SVG_DIR,'')), F('fullname'), Value('.svg'), output_field=CharField()),
    image_src = Concat(Value('<img src="'), F('image_url'), Value('"|urlencode />'), output_field=CharField()),
    image_display = Concat(Value(''),F('image_src'), Value(''), output_field=CharField()),
)
I am however having an issue with the display of image_src through:
readonly_fields=['image']

def image(self, obj):
    return format_html(obj.image_src)
It doesn't seem to find the image, even though the address is OK.
If anybody has an idea...
I figured out my image problem: I should simply use a relative path and let Django manage it:
self.annotate(image_url = Concat(Value('/static/SVG_shapes/'), F('fullname'), Value('.svg'), output_field=CharField()),)
With 1.5 years more experience now, I'll try to answer my newbie questions for anyone else who may have the same questions popping into their minds.
1/ Are there any pros or cons to declaring these with the property decorator in a Django model?
No cons that I could see so far.
It allows the data to be retrieved as a property of the model (my_shape.image_url), instead of having to call the corresponding method (my_shape.image_url())
However, for some purposes, one may prefer to have a callable (the method) instead of a property.
2/ What is the cost of calling a function/property in terms of database calls?
No extra calling to the database if the data it needs as input are already available, or are themselves attributes of the instance object (fields / properties / methods that don't require input from outside the instance object)
However, if external data are needed, a database call will be generated for each of them.
For this reason, it can be valuable to cache the result of such a property by using the @cached_property decorator instead of the @property decorator.
The only thing needed to use cached properties is the following import:
from django.utils.functional import cached_property
After being called for the first time, the cached property remains available at no extra cost for the whole lifetime of the object instance,
and its content can be manipulated like any other property / variable.
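As a rough illustration of "manipulated like any other variable", here is a sketch using functools.cached_property from the standard library (Python 3.8+) as a stand-in, since it stores its result in the instance's __dict__ in the same way. The Shape class, its simplified nb_color rule and the computations counter are assumptions made for the example, not the model from the question.

```python
from functools import cached_property


class Shape:
    """Plain-Python stand-in for the model (hypothetical, simplified)."""

    def __init__(self, colors):
        self.colors = colors
        self.computations = 0  # counts how often nb_color is recomputed

    @cached_property
    def nb_color(self):
        self.computations += 1
        return 1 + sum(ch.isalpha() for ch in self.colors)


shape = Shape("ab")
computed = shape.nb_color      # first access runs the method
shape.nb_color = 10            # the cached value can be overwritten
overwritten = shape.nb_color
del shape.nb_color             # deleting it clears the cache
shape.colors = "abc"
recomputed = shape.nb_color    # next access runs the method again
print(computed, overwritten, recomputed, shape.computations)
```

Deleting the attribute is the usual way to force a recomputation after the underlying data has changed.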
and therefore, is there any added value in using custom managers / querysets and defining annotations to simulate my functions at that level?
In my understanding and practice so far, it is not uncommon to replicate the same functionality in both properties & managers.
The reason is that properties are easily available when we are interested in only one specific object instance,
while when you are interested in comparing / retrieving a given property for a range of objects, it is much more efficient to calculate & annotate this property for the whole queryset, for instance by using model managers.
My takeaway would be:
For a given model,
(1) try to put all the business logic concerning a single object instance into model methods / properties
(2) and all the business logic concerning a range of objects into model managers
3/ How would you suggest I transform my image & nb_color functions into annotations?
Already answered in previous answer
In the following example, cached_attr is used to get or set an attribute on a model instance when a database-expensive property (related_spam in the example) is called. In the example, I use cached_spam to save queries. I put print statements when setting and when getting values so that I could test it out.
I tested it in a view by passing an Egg instance into the view and in the view using {{ egg.cached_spam }}, as well as other methods on the Egg model that make calls to cached_spam themselves. When I finished and tested it out, the shell output in Django's development server showed that the attribute cache was missed several times, as well as successfully gotten several times.
It seems to be inconsistent. With the same data, when I made small changes (as little as changing the print statement's string) and refreshed, different numbers of misses / successes happened. How and why is this happening? Is this code incorrect or highly problematic?
class Egg(models.Model):
    # ... fields

    @property
    def related_spam(self):
        # Each time this property is called the database is queried (expected).
        return Spam.objects.filter(egg=self).all()  # Spam has foreign key to Egg.

    @property
    def cached_spam(self):
        # This should call self.related_spam the first time, and then return
        # cached results every time after that.
        return self.cached_attr('related_spam')

    def cached_attr(self, attr):
        """This method (normally attached via an abstract base class, but put
        directly on the model for this example) attempts to return a cached
        version of a requested attribute, and calls the actual attribute when
        the cached version isn't available."""
        try:
            value = getattr(self, '_p_cache_{0}'.format(attr))
            print('GETTING - {0}'.format(value))
        except AttributeError:
            value = getattr(self, attr)
            print('SETTING - {0}'.format(value))
            setattr(self, '_p_cache_{0}'.format(attr), value)
        return value
Nothing wrong with your code, as far as it goes. The problem probably isn't there, but in how you use that code.
The main thing to realise is that model instances don't have identity. That means that if you instantiate an Egg object somewhere, and a different one somewhere else, even if they refer to the same underlying database row they won't share internal state. So calling cached_attr on one won't cause the cache to be populated in the other.
For example, assuming you have a RelatedObject class with a ForeignKey to Egg:
my_first_egg = Egg.objects.get(pk=1)
my_related_object = RelatedObject.objects.get(egg__pk=1)
my_second_egg = my_related_object.egg
Here my_first_egg and my_second_egg both refer to the database row with pk 1, but they are not the same object:
>>> my_first_egg.pk == my_second_egg.pk
True
>>> my_first_egg is my_second_egg
False
So, filling the cache on my_first_egg doesn't fill it on my_second_egg.
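The effect is easy to reproduce without Django. The sketch below uses a plain class as a stand-in for the model; Egg here, the compute callback and the queries list are hypothetical, and only the '_p_cache_' attribute naming is taken from the question.

```python
class Egg:
    """Plain-Python stand-in for a model instance (hypothetical)."""

    def __init__(self, pk):
        self.pk = pk

    def cached_attr(self, name, compute):
        # Same idea as in the question: look for a per-instance cache first.
        key = '_p_cache_{0}'.format(name)
        try:
            return getattr(self, key)
        except AttributeError:
            value = compute()
            setattr(self, key, value)
            return value


queries = []


def load_spam():
    # Stand-in for the database query; records that it ran.
    queries.append('query')
    return ['spam']


first_egg = Egg(pk=1)
second_egg = Egg(pk=1)  # same "row", different Python object

first_egg.cached_attr('related_spam', load_spam)   # miss
first_egg.cached_attr('related_spam', load_spam)   # hit
second_egg.cached_attr('related_spam', load_spam)  # miss: its own cache is empty
print(len(queries))
```

Two queries run for three calls, because each instance carries its own cache.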
And, of course, objects won't persist across requests (unless they're specifically made global, which is horrible), so the cache won't persist either.
HTTP servers that scale are shared-nothing; you can't rely on anything being a singleton. To share state, you need to connect to a special-purpose service.
Django's caching support is appropriate for your use case. It isn't necessarily a global singleton either; if you use locmem://, it will be process-local, which could be the more efficient choice.
I came across this syntax browsing through code for examples. From its surrounding code, it looked like it would a) get the entity with the given keyname or b) if the entity did not exist, create a new entity that could be saved. Assume my model class is called MyModel.
my_model = MyModel(key_name='mymodelkeyname',
                   kwarg1='first arg', kwarg2='second arg')
I'm now running into issues, but only in certain situations. Is my assumption about what this snippet does correct? Or should I always do the following?
my_model = MyModel.get_by_key_name('mymodelkeyname')
if not my_model:
    my_model = MyModel(key_name='mymodelkeyname',
                       kwarg1='first arg', kwarg2='second arg')
else:
    pass  # do something with my_model
The constructor, which is what you're using, always constructs a new entity. When you store it, it overwrites any other entity with the same key.
The alternate code you propose also has an issue: it's susceptible to race conditions. Two instances of that code running simultaneously could both determine that the entity does not exist, and each create it, resulting in one overwriting the work of the other.
What you want is the Model.get_or_insert method, which is syntactic sugar for this:
def get_or_insert(cls, key_name, **kwargs):
    def _tx():
        model = cls.get_by_key_name(key_name)
        if not model:
            model = cls(key_name=key_name, **kwargs)
            model.put()
        return model
    return db.run_in_transaction(_tx)
Because the get operation and the conditional insert take place in a transaction, the race condition is not possible.
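To see why doing the check and the insert atomically matters, here is an in-process sketch. It is not the App Engine API: Store is a hypothetical in-memory dictionary, and a threading.Lock plays the role that the datastore transaction plays in the real method.

```python
import threading


class Store:
    """Hypothetical in-memory store; the lock stands in for a transaction."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()
        self.inserts = 0  # counts how many entities were actually created

    def get_or_insert(self, key_name, **kwargs):
        # The lookup and the conditional insert happen as one atomic step.
        with self._lock:
            row = self._rows.get(key_name)
            if row is None:
                row = dict(kwargs)
                self._rows[key_name] = row
                self.inserts += 1
            return row


store = Store()
results = []


def worker():
    results.append(store.get_or_insert('mymodelkeyname', kwarg1='first arg'))


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(store.inserts, len(results))
```

However the threads interleave, only one of them creates the entity and the rest receive that same object.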
Is this what you are looking for -> http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_get_or_insert
I have a complex network of objects being spawned from a sqlite database using sqlalchemy ORM mappings. I have quite a few deeply nested loops like this:
for parent in owner.collection:
    for child in parent.collection:
        for foo in child.collection:
            ...  # do lots of calcs with foo.property
My profiling is showing me that the sqlalchemy instrumentation is taking a lot of time in this use case.
The thing is: I don't ever change the object model (mapped properties) at runtime, so once they are loaded I don't NEED the instrumentation, or indeed any sqlalchemy overhead at all. After much research, I'm thinking I might have to clone a 'pure python' set of objects from my already loaded 'instrumented objects', but that would be a pain.
Performance is really crucial here (it's a simulator), so maybe writing those layers as C extensions using sqlite api directly would be best. Any thoughts?
If you reference a single attribute of a single instance lots of times, a simple trick is to store it in a local variable.
If you want a way to create cheap pure python clones, share the dict object with the original object:
class CheapClone(object):
    def __init__(self, original):
        self.__dict__ = original.__dict__
Creating a copy like this costs about half as much as a single instrumented attribute access, and attribute lookups on the clone are as fast as on a normal object.
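One consequence of sharing the dict is worth seeing in a small sketch. Instrumented below is a hypothetical plain class standing in for a mapped one, so no ORM is involved.

```python
class Instrumented:
    """Hypothetical stand-in for a mapped class."""

    def __init__(self, value):
        self.value = value


class CheapClone(object):
    def __init__(self, original):
        # Share the attribute dict instead of copying it.
        self.__dict__ = original.__dict__


original = Instrumented(42)
clone = CheapClone(original)
original.value = 43  # a later change to the original...
print(clone.value)   # ...is visible through the clone as well
```

The clone is a second view onto the same attributes rather than a snapshot, which suits data that never changes after loading.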
There might also be a way to have the mapper create instances of an uninstrumented class instead of the instrumented one. If I have some time, I might take a look how deeply ingrained is the assumption that populated instances are of the same type as the instrumented class.
Found a quick and dirty way that seems to at least somewhat work on 0.5.8 and 0.6. I didn't test it with inheritance or other features that might interact badly. Also, this touches some non-public APIs, so beware of breakage when changing versions.
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.attributes import ClassManager, instrumentation_registry


class ReadonlyClassManager(ClassManager):
    """Enables configuring a mapper to return instances of uninstrumented
    classes instead. To use add a readonly_type attribute referencing the
    desired class to use instead of the instrumented one."""

    def __init__(self, class_):
        ClassManager.__init__(self, class_)
        self.readonly_version = getattr(class_, 'readonly_type', None)
        if self.readonly_version:
            # default instantiation logic doesn't know to install finders
            # for our alternate class
            instrumentation_registry._dict_finders[self.readonly_version] = self.dict_getter()
            instrumentation_registry._state_finders[self.readonly_version] = self.state_getter()

    def new_instance(self, state=None):
        if self.readonly_version:
            instance = self.readonly_version.__new__(self.readonly_version)
            self.setup_instance(instance, state)
            return instance
        return ClassManager.new_instance(self, state)


Base = declarative_base()
Base.__sa_instrumentation_manager__ = ReadonlyClassManager
Usage example:
from sqlalchemy import Column, Integer, String

class ReadonlyFoo(object):
    pass

class Foo(Base, ReadonlyFoo):
    __tablename__ = 'foo'
    id = Column(Integer, primary_key=True)
    name = Column(String(32))
    readonly_type = ReadonlyFoo

assert type(session.query(Foo).first()) is ReadonlyFoo
You should be able to disable lazy loading on the relationships in question and sqlalchemy will fetch them all in a single query.
Try using a single query with JOINs instead of the Python loops.