I built my own class to implement an estimation procedure (call it EstimationProcedure). To run the procedure, the user calls its fit method. First, fit estimates a Pooled OLS model using the fit method of the PooledOLS class from the linearmodels package; this returns a PanelResults object, which I store in the variable model. Second, my fit method estimates standard errors, t-statistics, p-values, etc. (using a custom bootstrapping method I wrote) and stores the results in local variables such as std_errors, tstats, and pvalues. My method should then return a PanelResults object that combines information from the initial estimation with my own estimates (because I want to use linearmodels' capabilities to compare multiple regressions and produce LaTeX output).
To this end, I need to create a new PanelResults object. However, the necessary information is not accessible through attributes of model.
Conceptually, what would I need to do to implement this? Or is there a smarter way to achieve this? I suppose that this is rather a question on OOP which I have no experience with.
The following code illustrates the structure of my class:
import pandas as pd

from linearmodels.panel import PooledOLS
from linearmodels.panel.results import PanelResults

class EstimationProcedure:
    def __init__(self, data):
        self.data = data

    def fit(self):
        # estimate Pooled OLS; its fit method returns a PanelResults object
        model = PooledOLS(self.data).fit()
        # construct my own results using a bootstrap procedure;
        # this requires the result from an initial PooledOLS estimation
        std_errors, tstats, pvalues = self.bootstrap(self.data)
        # To create and return a new PanelResults object, I need to pass
        # a number of results, say `res`, from the initial pooled OLS
        # estimation along with my own results to the constructor.
        # However, `PooledOLS` prepares the estimation results required
        # by `PanelResults`'s constructor internally without making them
        # accessible through attributes. Hence, I cannot "recreate" it.
        res = dict()
        return PanelResults(res)

# data is stored in some dataframe
df = pd.DataFrame()

# usage of my estimation procedure
model = EstimationProcedure(df)
model.fit()
This is not a very clean solution, but sometimes it's impossible to do otherwise.
Create a child class of PooledOLS:
class CustomPooledOLS(PooledOLS):
    ...
Find the method (or methods) that computes what you need. I assume PooledOLS calculates something you need and disregards it afterwards.
Overwrite the method(s) that compute(s) the things you need, and change them so they save what you need in some attribute:
def name_of_the_function(self, *args, **kwargs):
    # - copy all of the original function's code here
    # - modify it so that it saves what is needed on the object, e.g.
    self.tstats_cache = temp_tstats
    # later you can use those in EstimationProcedure to feed to PanelResults
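To make this concrete, here is a minimal sketch of the idea; the cached attribute name is my own invention, and which internal method you actually need to copy and modify depends on the linearmodels source:
from linearmodels.panel import PooledOLS

class CustomPooledOLS(PooledOLS):
    def fit(self, *args, **kwargs):
        # run the original estimation unchanged
        results = super().fit(*args, **kwargs)
        # cache whatever intermediate values you need on the instance
        # (here, simply the full results object, as an illustration)
        self.cached_results = results
        return results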
I had difficulty understanding all your needs, but hopefully this helps you.
You can create a subclass of PanelResults named CustomPanelResults (line 1) and override its constructor to take in your custom results (line 2). The constructor calls the parent class's constructor with the results from the PooledOLS estimation (line 3) and then attaches your custom results (line 4), giving a new PanelResults object that carries both sets of results.
class CustomPanelResults(PanelResults):          # line 1
    def __init__(self, model, custom_results):  # line 2
        super().__init__(model)                  # line 3
        self.custom_results = custom_results     # line 4
Then modify the fit method of your EstimationProcedure class to create an instance of this subclass and pass in the necessary arguments to its constructor:
def fit(self):
    model = PooledOLS(self.data)
    std_errors, tstats, pvalues = self.bootstrap(self.data)
    res = {'std_errors': std_errors, 'tstats': tstats, 'pvalues': pvalues}
    return CustomPanelResults(model, res)
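Hypothetical usage, assuming bootstrap is defined as in the question:
proc = EstimationProcedure(df)
results = proc.fit()
print(results.custom_results['tstats'])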
This is actually language agnostic, but I always prefer Python.
The builder design pattern is used to validate that a configuration is valid prior to creating an object, via delegation of the creation process.
Some code to clarify:
class A:
    def __init__(self, m1, m2):  # obviously more complex in real life
        self._m1 = m1
        self._m2 = m2

class ABuilder:
    def __init__(self):
        self._m1 = None
        self._m2 = None

    def set_m1(self, m1):
        self._m1 = m1
        return self

    def set_m2(self, m2):
        self._m2 = m2
        return self

    def _validate(self):
        # complicated validations
        assert self._m1 < 1000
        assert self._m1 < self._m2

    def build(self):
        self._validate()
        return A(self._m1, self._m2)
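Usage of the builder would then look like this:
# validation runs once, just before the object is constructed
a = ABuilder().set_m1(10).set_m2(20).build()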
My problem is similar, with an extra constraint: I can't re-create the object each time due to performance limitations.
Instead, I want to only update an existing object.
Bad solutions I came up with:
I could do as suggested here and just use setters, like so:
class A:
    ...
    def set_m1(self, m1):
        self._m1 = m1
    # and so on
But this is bad because using setters
defeats the purpose of encapsulation, and
defeats the purpose of the builder (now updater), which is supposed to validate that some complex configuration is preserved after the creation, or in this case the update.
As I mentioned earlier, I can't recreate the object every time, as this is expensive and I only want to update some fields, or sub-fields, and still validate or sub-validate.
I could add update and validation methods to A and call those, but this defeats the purpose of delegating the responsibility of updates, and is intractable in the number of fields.
class A:
    ...
    def update1(self, m1):
        pass  # complex_logic1
    def update2(self, m2):
        pass  # complex_logic2
    def update12(self, m1, m2):
        pass  # complex_logic12
I could just force an update of every single field of A in one method with optional parameters:
class A:
    ...
    def update(self, **kwargs):  # accepts any subset of A's fields
        pass
Which again is not tractable, as this method will soon become a god method due to the many possible combinations.
Forcing the method to always accept changes in A and validating in the Updater also can't work, as the Updater would need to look at A's internal state to make a decision, causing a circular dependency.
How can I delegate updating fields in my object A
in a way that
Doesn't break encapsulation of A
Actually delegates the responsibility of updating to another object
Is tractable as A becomes more complicated
I feel like I am missing something trivial to extend building to updating.
I am not sure I understand all of your concerns, but I want to try and answer your post. From what you have written I assume:
Validation is complex and multiple properties of an object must be checked to decide if any change to the object is valid.
The object must always be in a valid state. Changes that make the object invalid are not permitted.
It is too expensive to copy the object, make the change, validate the object, and then reject the change if the validation fails.
Move the validation logic out of the builder and into a separate class, like a ModelValidator with a validateModel(model) method.
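For illustration, a minimal sketch of such a validator, reusing the constraints from the builder above (m1 and m2 are assumed to be readable attributes of the model):
class ModelValidator:
    @staticmethod
    def validateModel(model):
        # the complicated validations from the builder move here
        return model.m1 < 1000 and model.m1 < model.m2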
The first option is to use a command pattern.
Create an abstract class or interface named Update (I don't think Python has abstract classes/interfaces, but that's fine). The Update interface declares two methods, execute() and undo().
A concrete class has a name like UpdateAddress, UpdatePortfolio, or UpdatePaymentInfo.
Each concrete Update object also holds a reference to your model object.
The concrete classes hold the state needed for a particular kind of update. Imagine these methods exist on the UpdateAddress class:
UpdateAddress
setStreetNumber(...)
setCity(...)
setPostcode(...)
setCountry(...)
The update object needs to hold both the current and new values of a property, like:
setStreetNumber(aString):
    self.oldStreetNumber = model.getStreetNumber()
    self.newStreetNumber = aString
When the execute method is called, the model is updated:
execute():
    model.setStreetNumber(newStreetNumber)
    model.setCity(newCity)
    # set postcode and country
    if not ModelValidator.isValid(model):
        self.undo()
        raise ValidationError
and the undo method looks like:
undo():
    model.setStreetNumber(oldStreetNumber)
    model.setCity(oldCity)
    # set postcode and country
That is a lot of typing, but it would work. Mutating your model object is nicely encapsulated by different kinds of updates. You can execute or undo the changes by calling those methods on the update object. You can even store a list of update objects for multi-level undos and re-tries.
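Here is a runnable Python sketch of this command pattern under the assumptions above (the model is assumed to expose plain street_number and city attributes, and the validator is the ModelValidator sketched earlier):
class UpdateAddress:
    """Command that updates address fields on a model, with undo support."""

    def __init__(self, model, validator):
        self.model = model
        self.validator = validator
        self.old_values = {}
        self.new_values = {}

    def set_street_number(self, value):
        self.old_values['street_number'] = self.model.street_number
        self.new_values['street_number'] = value
        return self

    def set_city(self, value):
        self.old_values['city'] = self.model.city
        self.new_values['city'] = value
        return self

    def execute(self):
        # apply all pending changes, then validate; roll back on failure
        for field, value in self.new_values.items():
            setattr(self.model, field, value)
        if not self.validator.validateModel(self.model):
            self.undo()
            raise ValueError("update left the model in an invalid state")

    def undo(self):
        for field, value in self.old_values.items():
            setattr(self.model, field, value)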
However, it is a lot of typing for the programmer. Consider using persistent data structures instead. Persistent data structures can be used to copy objects very quickly -- approximately constant time complexity. Here is a Python library of persistent data structures: pyrsistent.
Let's assume your data was in a persistent data structure version of a dict. The library I referenced calls it a PMap.
The implementation of the update classes can be simpler. Starting with the constructor:
UpdateAddress(pmap):
    self.oldPmap = pmap
    self.newPmap = pmap
The setters are easier:
setStreetNumber(aString):
    self.newPmap = self.newPmap.set('streetNumber', aString)
Execute passes back a new instance of the model, with all the updates.
execute():
    if ModelValidator.isValid(newModel):
        return newModel
    else:
        raise ValidationError
The original object has not changed at all, thanks to the magic of persistent data structures.
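As a concrete sketch of this variant using the pyrsistent library (the validation function below is a stand-in, not from the original post):
from pyrsistent import pmap

def validate_address(address):
    # stand-in for the ModelValidator logic
    return bool(address.get('street_number')) and bool(address.get('city'))

class UpdateAddress:
    def __init__(self, model):
        self.old_pmap = model  # never mutated
        self.new_pmap = model

    def set_street_number(self, value):
        # .set() returns a new PMap; the original is left untouched
        self.new_pmap = self.new_pmap.set('street_number', value)
        return self

    def execute(self):
        if not validate_address(self.new_pmap):
            raise ValueError("invalid update")
        return self.new_pmap

address = pmap({'street_number': '12', 'city': 'Springfield'})
updated = UpdateAddress(address).set_street_number('42').execute()
assert address['street_number'] == '12'  # original unchanged
assert updated['street_number'] == '42'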
The best thing is to not do any of this. Instead, use an ORM or object database. That is the "enterprise grade" solution. These libraries give you sophisticated tools like transactions and object version history.
Being new to Django, I'm starting to care a bit about performance of my web application.
I'm trying to transform many of my custom functions / properties, which were originally in my models, into querysets within custom managers.
In my model I have:
from django.conf import settings
from django.db import models
from django.utils.html import format_html

class Shape(models.Model):
    @property
    def nb_color(self):
        return 1 if self.colors == '' else int(1 + sum(self.colors.upper().count(x) for x in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'))

    def __str__(self):
        return self.name + "-" + self.constraints

    @property
    def image_url(self):
        return format_html(f'{settings.SVG_DIR}/{self}.svg')

    @property
    def image_src(self):
        return format_html('<img src="{url}"|urlencode />'.format(url=self.image_url))

    def image_display(self):
        return format_html(f'{self.image_src}"')
But I'm not clear on a few points:
1/ Are there any pros or cons to declaring methods with the property decorator in a Django model?
2/ What is the cost of calling a function/property in terms of database calls?
and therefore, is there added value in using custom managers / querysets and defining annotations to simulate my functions at that level?
3/ How would you suggest transforming my image & nb_color functions into annotations?
Thanks in advance
PS: For the image-related functions, I mostly figured it out:
self.annotate(
    image_url=Concat(Value(join(settings.SVG_DIR, '')), F('fullname'), Value('.svg'), output_field=CharField()),
    image_src=Concat(Value('<img src="'), F('image_url'), Value('"|urlencode />'), output_field=CharField()),
    image_display=Concat(Value(''), F('image_src'), Value(''), output_field=CharField()),
)
I am however having an issue with the display of image_src through:
readonly_fields = ['image']

def image(self, obj):
    return format_html(obj.image_src)
It doesn't seem to find the image, while the address is OK.
If anybody has an idea...
I figured it out for my image problem: I should simply use a relative path and let Django manage it:
self.annotate(image_url=Concat(Value('/static/SVG_shapes/'), F('fullname'), Value('.svg'), output_field=CharField()))
With 1.5 years' more experience, I'll try to answer my newbie questions for the next ones who may have the same questions popping into their minds.
1/ Are there any pros or cons to declaring methods with the property decorator in a Django model?
No cons that I could see so far.
It allows the data to be retrieved as a property of the model (my_shape.image_url), instead of having to call the corresponding method (my_shape.image_url())
However, for different purposes, one may prefer to have a callable (the method) instead of a property.
2/ What is the cost of calling a function/property in terms of database calls?
No extra calls to the database if the data it needs as input is already available, or is itself an attribute of the instance object (fields / properties / methods that don't require input from outside the instance object).
However, if external data are needed, a database call will be generated for each of them.
For this reason, it can be valuable to cache the result of such a property by using the @cached_property decorator instead of the @property decorator.
The only thing needed to use cached properties is the following import:
from django.utils.functional import cached_property
After being called for the first time, the cached property will remain available at no extra cost for the entire lifetime of the object instance,
and its content can be manipulated like any other property / variable:
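For instance, a minimal sketch based on the Shape model above:
from django.db import models
from django.utils.functional import cached_property

class Shape(models.Model):
    @cached_property
    def nb_color(self):
        # computed on first access, then cached on the instance
        return 1 if self.colors == '' else int(1 + sum(self.colors.upper().count(x) for x in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'))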
and therefore, is there added value in using custom managers / querysets and defining annotations to simulate my functions at that level?
In my understanding and practice so far, it is not uncommon to replicate the same functionality in both properties & managers.
The reason is that properties are easily available when we are interested in only one specific object instance,
while when we are interested in comparing / retrieving a given property for a range of objects, it is much more efficient to calculate & annotate this property for the whole queryset, for instance by using model managers.
My take-away would be:
For a given model,
(1) try to put all the business logic concerning a single object instance into model methods / properties,
(2) and all the business logic concerning a range of objects into model managers.
3/ How would you suggest transforming my image & nb_color functions into annotations?
Already answered in the previous answer.
I'm trying to use Factoryboy to create, inside an object, a list whose length is specified at creation time.
I can create the list, but every attempt to create a list with a specified length causes issues due to the lazy nature of the provided length/size.
This is what I have so far:
import factory

class FooFactory(factory.Factory):
    class Meta:
        model = command.Foo

    foo_uuid = factory.Faker("uuid4")
    bars = factory.List([
        factory.LazyAttribute(lambda o: BarFactory())
        for _ in range(3)
    ])
This creates a list of 3 random Bars. I have tried using a combination of Params and exclude, but because range expects an int, and the int won't be lazily loaded until later, it causes an error.
I would like something similar to how one-to-many relationships are generated with post_generation, i.e.:
foo = FooFactory(number_of_bars=5)
Anyone had any luck with this?
Main solution
Two things are needed for this:
parameters
and LazyAttribute
(see their documentation for more detail).
Parameters are like factory attributes that are not passed to the instance that will be created.
In this case, they provide a way to parametrize the length of the list of Bars.
But in order to use parameters to customize a field in the factory, we need to have access to self,
that is, the instance being built.
We can achieve that with LazyAttribute, which is a declaration that takes a function with one argument:
the object being built.
Just what we needed.
So the snippet in the question could be re-written as follows:
class FooFactory(factory.Factory):
    class Meta:
        model = command.Foo

    class Params:
        number_of_bars = 1

    foo_uuid = factory.Faker("uuid4")
    # a list comprehension (rather than list multiplication) ensures
    # each entry is a distinct Bar instance
    bars = factory.LazyAttribute(lambda self: [BarFactory() for _ in range(self.number_of_bars)])
And used like this:
foo = FooFactory(number_of_bars=3)
If the number_of_bars argument is not provided, the default of 1 is used.
Drawbacks
Sadly, there are some limitations to what we can do here.
The preferred way to use a factory in the definition of another factory is via
SubFactory.
That is preferred for two reasons:
it respects the build strategy used for the parent factory
it collects extra keyword arguments to customize the subfactory
The first one means that if we used SubFactory to build a Bar in FooFactory
and called FooFactory with FooFactory.create or FooFactory.build,
the Bar subfactory would respect that and use the same strategy.
In summary, the build strategy only builds an instance,
while the create strategy builds and saves the instance to the persistent storage being used,
for example a database, so respecting this choice is important.
See the docs
for more details.
The second one means that we can directly customize attributes of Bar when calling FooFactory.
For example:
foo = FooFactory(bar__id=2)
would set the id of the bar of foo to be 2 instead of what the Bar subfactory would generate by default.
But I could not find a way to use SubFactory and a dynamic length via Params.
There is no way, as far as I know, to access the value of a parameter in a context where FactoryBoy expects a SubFactory.
The problem is that the declarations that give us access to the object being built always expect a final value to be returned,
not another factory to be called later.
This means that, in the example above, if we write instead:
class FooFactory(factory.Factory):
    # ... rest of the factory
    bars = factory.LazyAttribute(lambda self: [factory.SubFactory(BarFactory)] * self.number_of_bars)
then calling it like
foo = FooFactory(number_of_bars=3)
would result in a foo that has a list of 3 SubFactory declarations in foo.bars instead of a list of 3 Bars.
And using SelfAttribute,
which is a way to reference another attribute of the instance being built, doesn't work either
because it is not evaluated before the rest of the expression in a declaration like this:
class FooFactory(factory.Factory):
    # ... rest of the factory
    bars = factory.List([factory.SubFactory(BarFactory)] * SelfAttribute("number_of_bars"))
That raises TypeError: can't multiply sequence by non-int of type 'SelfAttribute'.
A possible workaround is to call BarFactory beforehand and pass it to FooFactory:
number_of_bars = 3
bars = BarFactory.create_batch(number_of_bars)
foo = FooFactory(bars=bars)
But that's certainly not as nice.
Another one that I found out recently is RelatedFactoryList.
But that's still experimental and it doesn't seem to have a way to access parameters.
Additionally, since it's generated after the base factory, it also might not work if the instance constructor
expects that attribute as an argument.
There is a way to pass the length of a list and retain the ability to set additional properties on the subfactory. It requires creating a post_generation method.
class FooFactory(factory.Factory):
    class Meta:
        model = command.Foo

    foo_uuid = factory.Faker("uuid4")
    bars__count = 5  # optional: default number of bars to create

    @factory.post_generation
    def bars(self, create, extracted, **kwargs):
        if not create:
            return
        num_bars = kwargs.get('count', 0)
        color = kwargs.get('color')
        if num_bars > 0:
            # build distinct Bar instances
            self.bars = [BarFactory(color=color) for _ in range(num_bars)]
        elif extracted:
            self.bars = extracted
Any parameter with the construct modelname__paramname will be passed to the post_generation method as paramname in kwargs.
You can then call the FooFactory as:
FooFactory.create(bars__color='blue')
and it will create Foo with 5 Bars (the default value).
You can also call FooFactory and tell it to create 10 Bars.
FooFactory.create(bars__color='blue', bars__count=10)
I actually have this method in my Model:
def speed_score_compute(self):
    # Speed score:
    #
    # - 8 points for every % of time spent in the
    #   high-intensity running phase.
    # - Average speed in the high-intensity
    #   running phase (30 km/h = 50 points,
    #   0-15 km/h = 15 points)
    try:
        high_intensity_running_ratio = (
            ((self.h_i_run_time * 100) / self.training_length) * 8)
    except ZeroDivisionError:
        return 0
    high_intensity_running_ratio = min(50, high_intensity_running_ratio)
    if self.h_i_average_speed < 15:
        average_speed_score = 10
    else:
        average_speed_score = self.cross_multiplication(
            30, self.h_i_average_speed, 50)
    final_speed_score = high_intensity_running_ratio + average_speed_score
    return final_speed_score
I want to use it as the default for my Model like this:
speed_score = models.IntegerField(default=speed_score_compute)
But this doesn't work (see the error message below). I've checked different topics like this one, but those work only for functions (which don't use self attributes), not for methods (I must use methods since I'm working with the actual object's attributes).
The Django docs seem to talk about this, but I don't get it clearly, maybe because I'm still a newbie to Django and programming in general.
Is there a way to achieve this?
EDIT:
My function is defined above my models. But here is my error message:
ValueError: Could not find function speed_score_compute in tournament.models.
Please note that due to Python 2 limitations, you cannot serialize unbound method functions (e.g. a method declared and used in the same class body). Please move the function into the main module body to use migrations.
For more information, see https://docs.djangoproject.com/en/1.8/topics/migrations/#serializing-values
The error message is clear: it seems that I'm not able to do this. But is there another way to achieve it?
The problem
When we provide a default=callable and pass a method from a model, it doesn't get called with the self argument, i.e. the model instance.
Overriding save()
I haven't found a better solution than to override MyModel.save()* like below:
class MyModel(models.Model):
    def save(self, *args, **kwargs):
        if self.speed_score is None:
            self.speed_score = ...  # compute the value here
            # or, alternatively:
            self.calculate_speed_score()
        # now we call the actual save method
        super(MyModel, self).save(*args, **kwargs)
This makes it so that if you try to save your model without a set value for that field, the field is populated before the save.
Personally I just prefer this approach of having everything that belongs to a model, defined in the model (data, methods, validation, default values etc). I think this approach is referred to as the fat Django model approach.
*If you find a better approach, I'd like to learn about it too!
Using a pre_save signal
Django provides a pre_save signal that runs before save() is run on the model. Signals run synchronously, i.e. the pre_save code needs to finish running before save() is called on the model. You'll get the same results (and order of execution) as with overriding save().
from django.db.models.signals import pre_save
from django.dispatch import receiver

from myapp.models import MyModel

@receiver(pre_save, sender=MyModel)
def my_handler(sender, **kwargs):
    instance = kwargs['instance']
    instance.populate_default_values()
If you prefer to keep the default values behavior separated from the model, this approach is for you!
When is save() called? Can I work with the object before it gets saved?
Good question! We'd like the ability to work with our object before it is subjected to the default-value population that happens on save.
To get an object without saving it, just to work with it, we can do:
instance = MyModel()
If you create it using MyModel.objects.create(), then save() will be called. It is essentially (see source code) equivalent to:
instance = MyModel()
instance.save()
If it's interesting to you, you can also define a MyModel.populate_default_values() that you can call at any stage of the object's lifecycle (at creation, at save, or on demand; it's up to you).
Apologies if this is a duplicate - I'm not sure how to word what I'm trying to accomplish.
I have two classes of interest here (in brief):
class Patient:
    ...
    self.weight = (some float)
    self.medicationDays = (some float)
    self.AverageWeightChange = (some float)
    # etc.

class PtAnalyzer:
    ...
    self.ptList1 = [listOfPatients]
    self.ptList2 = [anotherListOfPatients]

    def getSummaryStats(self, ptList, metric):
        list = [patient.metric for patient in ptList]
        self.getStats(list)
        return list

    def sendForStats(self):
        weightStats = self.getSummaryStats(self.ptList1, metric=weight)
        avgWeightStats = self.getSummaryStats(self.ptList1, metric=AverageWtChange)
        ...
So the program gathers a bunch of Patient instances, then passes them off to the PtAnalyzer, which has attributes holding lists of the Patient instances. Since most of the patient metrics I'm analyzing are simple floats, I can run stats on them in a standard fashion, though I need to convert the metrics to a list first (for the stats function).
My question: how can I tell the getSummaryStats function which metric to use? I'm trying not to write separate functions for each metric - that seems non-parsimonious.
(This is actually run in a Jython 2.5.2 environment, as it needs JDBC, though I use no other Jython-specific functionality.)
You want getattr(); pass the metric to use as a string.
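For example, adapted to the getSummaryStats method above (assuming the rest of PtAnalyzer as in the question):
def getSummaryStats(self, ptList, metric):
    # `metric` is the attribute name as a string, e.g. "weight"
    values = [getattr(patient, metric) for patient in ptList]
    self.getStats(values)
    return values

def sendForStats(self):
    weightStats = self.getSummaryStats(self.ptList1, metric="weight")
    avgWeightStats = self.getSummaryStats(self.ptList1, metric="AverageWeightChange")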