Can you code a django-fsm transition that targets an arbitrary state?

I'm using django-fsm on a state field (type FSMField) to track sample tubes through a process. I need to allow some power users to "jump" objects from one state to another, logging and sending notifications when it happens. My question is: how do I write this transition while avoiding code repetition (i.e. staying DRY)?
More details:
I've set protected=True on my FSMField: I really like the protection it provides - no other code paths are able to change state.
Here's the basics (note: not full code, not expected to work, just to illustrate)
class SampleTube(models.Model):
    state = FSMField(default='new', choices=(...), protected=True)

    @transition(field=state, source='*', target='*', permission='my_app.superpowers')  # <-- problem 1
    def set_state(self, user, new_state):
        assert is_valid_state(new_state)
        log_event(self, user, new_state)
        send_notification(self, self.owner)
        self.state = new_state  # <-- problem 2
Problem 1: As I understand it, I can only use a single string value for target (docs link). Fair enough. I love the fact that calling a transition method automatically sets the state. So I don't think I can write a transition to an arbitrary state.
Problem 2: If I want to retain protected=True (for the reasons above), I can't directly modify the state field (raises AttributeError, as documented)
Do I have to resort to writing this inside my model class? Is there some metaprogramming approach that'll keep me DRY?
@transition(field=state, source='*', target='used', permission='myapp.superpowers')
def set_used(self, user):
    # ...

@transition(field=state, source='*', target='received', permission='myapp.superpowers')
def set_received(self, user):
    # ...

# ... loads more set_xyz methods that all have the same signature ...
The reason I want to work this out (other than an appreciation of concise code) is that the number of potential states is quite large (10+).
[edit] It occurs to me that temporarily and explicitly disabling protection on the state field inside a set_state method might be another way to approach this, if I could work out how...

Although, as the author of django-fsm, I strongly advise against your approach (I think it would eventually lead you to an if-elif mess inside the set_state function), I can suggest not implementing it as a model method.
Instead, just make a regular function and update the db state directly:
def set_state(tube, new_state):
    SampleTube.objects.filter(pk=tube.pk).update(state=new_state)
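Fleshed out a little, that approach might look like the sketch below. It reuses the question's hypothetical helpers (is_valid_state, log_event, send_notification) and adds an explicit permission check; queryset .update() writes straight to the database, so the protected FSMField descriptor is never touched:

from django.core.exceptions import PermissionDenied

def set_state(user, tube, new_state):
    if not user.has_perm('my_app.superpowers'):
        raise PermissionDenied
    assert is_valid_state(new_state)
    # UPDATE at the SQL level, bypassing the protected field descriptor
    SampleTube.objects.filter(pk=tube.pk).update(state=new_state)
    tube.refresh_from_db()  # keep the in-memory instance in sync
    log_event(tube, user, new_state)
    send_notification(tube, tube.owner)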


Hook every variable reference and execute code

I have a Python app split across different files. One of them, models.py, contains, alongside PyQt5 table models, several maps that are referenced from several PyQt5 form files:
# first lines:
agents_id_map = \
    {agent.name: agent.id for agent in db.session.query(db.Agent, db.Agent.id)}
# ...
# ~2,000 more lines of similar maps
I want to keep these kinds of maps centralized in a single place. I'm also using SQLAlchemy; the Agent class is defined in a db.py file. I use these maps to fill in the foreign key of another object, say, an invoice, like:
invoice = db.Invoice()
# Here is a reference
invoice.agent_id = models.agents_id_map[agent_combo.currentText()]
# ...
db.session.add(invoice)
db.session.commit()
The problem is that the models.py module gets cached, so several parts of the application access stale data: if one running instance A of the app creates a new agent, another running instance B that wants to create a new invoice won't see the new Agent created by A until the app restarts. The same happens within a single running instance if a user creates an agent and then wants to create an invoice. My candidate solutions are:
Reload the module, to get the whole code executed again, but this could be very expensive.
Isolate the code building those maps in another file, say maps.py, which would be less expensive to reload, and change all the code that references it through refactoring.
Is there a solution that would allow me to touch only the code building those maps and the rest of the application remains ignorant of the change, and every time the map is referenced from another module or even the same, the code gets executed, effectively re-building maps with fresh data?
Certainly: put your maps inside a function, or even better, a class.
If I understand this problem correctly, you have stateful data (maps) which need regenerating under some condition (every time they are accessed? Or just every time the db is updated?). I would do something like this:
class Mappings:
    def __init__(self, db):
        self._db = db
        ...  # do any initial db stuff you need to here

    def id_map(self, thing):
        db_thing = getattr(self._db, thing.title())
        return {x.name: x.id for x in self._db.session.query(db_thing, db_thing.id)}

    def other_property_map(self, prop):
        ...  # etc.

mapping = Mappings(db)
mapping.id_map("agent")
This assumes that the mapping example you've given is your major use-case, but this model could easily be adapted for almost any other mapping you might want.
You would write a method for every kind of 'mapping' you need, and it would return the desired dictionary. Note that here I've assumed you handle setting up the db elsewhere and pass a fully initialised db access object to the class, which is probably what you want to do---this class is just about encapsulating mapper state, not re-inventing your ORM.
Caching
I have not provided any caching. But if you have complete control over the db, it is easy enough to run a hook before any db commit that checks whether you've touched a particular model, and then flags that its maps need rebuilding. Something like this:
class DbAccess(Mappings):
    def __init__(self, db, models):
        super().__init__(db)
        self._cached_map = {model: {} for model in models}

    def db_update(self, model: str, params: dict):
        if model in self._cached_map:
            self._cached_map[model] = {}  # wipe cache
        self._db.update_with_model(model, params)  # dummy fn

    def id_map(self, thing: str):
        try:
            return self._cached_map[thing]["id"]
        except KeyError:
            self._cached_map[thing]["id"] = super().id_map(thing)
            return self._cached_map[thing]["id"]
I don't really think DbAccess should inherit from Mappings---put it all in one class, or have a DB class and a Mappings mixin and inherit from both. I just didn't want to write everything out again.
I've not written any real db access routines (hence my dummy fn), as I don't know how you're doing it (though clearly via an ORM). But the basic idea is just to handle the caching yourself: store the mapping every time, but delete all the stored mappings whenever you commit a transaction involving the model in question (thus rebuilding the cache as needed).
Aside
Note that if you really do have 2,000 lines of manually declared mappings of the form thing.name: thing.id, you really should generate them at runtime anyway. Declarative is all very well and good, but writing out 2,000 permutations of the same thing isn't declarative, it's just time-consuming---and it does by hand a job that a simple loop loading the data into RAM at startup could do for you.
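For instance, a minimal sketch of generating the maps in a loop at startup, assuming the SQLAlchemy model classes live on the db module (the model names here are illustrative):

MODEL_NAMES = ["Agent", "Customer", "Product"]  # hypothetical models

def build_id_maps(db):
    """Build a {name: id} map for each model in a single loop."""
    maps = {}
    for name in MODEL_NAMES:
        model = getattr(db, name)
        maps[name] = {row.name: row.id for row in db.session.query(model)}
    return maps

id_maps = build_id_maps(db)
id_maps["Agent"]  # same content as the hand-written agents_id_map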

"Updater" design pattern, as opposed to "Builder"

This is actually language agnostic, but I always prefer Python.
The builder design pattern is used to validate that a configuration is valid prior to creating an object, via delegation of the creation process.
Some code to clarify:
class A:
    def __init__(self, m1, m2):  # obviously more complex in real life
        self._m1 = m1
        self._m2 = m2

class ABuilder:
    def __init__(self):
        self._m1 = None
        self._m2 = None

    def set_m1(self, m1):
        self._m1 = m1
        return self

    def set_m2(self, m2):
        self._m2 = m2
        return self

    def _validate(self):
        # complicated validations
        assert self._m1 < 1000
        assert self._m1 < self._m2

    def build(self):
        self._validate()
        return A(self._m1, self._m2)
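For reference, using the builder then looks like this:

a = ABuilder().set_m1(5).set_m2(10).build()  # validates, then constructs A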
My problem is similar, with an extra constraint: I can't re-create the object each time due to performance limitations. Instead, I want to only update an existing object.
Bad solutions I came up with:
I could do as suggested here and just use setters, like so:
class A:
    ...
    def set_m1(self, m1):
        self._m1 = m1
    # and so on
But this is bad, because using setters:
Defeats the purpose of encapsulation.
Defeats the purpose of the builder (now updater), which is supposed to validate that some complex configuration is preserved after the creation, or in this case the update.
As I mentioned earlier, I can't recreate the object every time, as this is expensive and I only want to update some fields, or sub-fields, and still validate or sub-validate.
I could add update and validation methods to A and call those, but this defeats the purpose of delegating the responsibility of updates, and is intractable as the number of fields grows.
class A:
    ...
    def update1(self, m1):
        pass  # complex_logic1

    def update2(self, m2):
        pass  # complex_logic2

    def update12(self, m1, m2):
        pass  # complex_logic12
I could just force the updater to update every single field of A in a method with optional parameters:
class A:
    ...
    def update(self, **all_fields_of_A):
        pass
Which, again, is not tractable, as this method will soon become a god method due to the many possible combinations.
Forcing the method to always accept changes to A, and validating in the Updater, also can't work, as the Updater would need to look at A's internal state to make a decision, causing a circular dependency.
How can I delegate updating fields in my object A
in a way that
Doesn't break encapsulation of A
Actually delegates the responsibility of updating to another object
Is tractable as A becomes more complicated
I feel like I am missing something trivial that would extend building to updating.
I am not sure I understand all of your concerns, but I want to try and answer your post. From what you have written I assume:
Validation is complex and multiple properties of an object must be checked to decide if any change to the object is valid.
The object must always be in a valid state. Changes that make the object invalid are not permitted.
It is too expensive to copy the object, make the change, validate the object, and then reject the change if the validation fails.
First, move the validation logic out of the builder and into a separate class, like a ModelValidator with a validateModel(model) method.
The first option is to use a command pattern.
Create an abstract class or interface named Update (I don't think Python has abstract classes/interfaces, but that's fine). The Update interface defines two methods, execute() and undo().
A concrete class has a name like UpdateAddress, UpdatePortfolio, or UpdatePaymentInfo.
Each concrete Update object also holds a reference to your model object.
The concrete classes hold the state needed for a particular kind of update. Imagine these methods exist on the UpdateAddress class:
UpdateAddress
setStreetNumber(...)
setCity(...)
setPostcode(...)
setCountry(...)
The update object needs to hold both the current and new values of a property. Like:
def setStreetNumber(self, aString):
    self.oldStreetNumber = self.model.getStreetNumber()
    self.newStreetNumber = aString
When the execute method is called, the model is updated:
def execute(self):
    self.model.setStreetNumber(self.newStreetNumber)
    self.model.setCity(self.newCity)
    # set postcode and country ...
    if not ModelValidator.isValid(self.model):
        self.undo()
        raise ValidationError
and the undo method looks like:
def undo(self):
    self.model.setStreetNumber(self.oldStreetNumber)
    self.model.setCity(self.oldCity)
    # set postcode and country ...
That is a lot of typing, but it would work. Mutating your model object is nicely encapsulated by different kinds of updates. You can execute or undo the changes by calling those methods on the update object. You can even store a list of update objects for multi-level undos and re-tries.
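Putting those fragments together, a self-contained Python sketch of the command pattern might look like this (Address, ModelValidator, and the validation rule are illustrative stand-ins, not part of the original answer):

class ValidationError(Exception):
    pass

class ModelValidator:
    @staticmethod
    def is_valid(model):
        # placeholder rule: a street number must be present
        return bool(model.street_number)

class Address:
    def __init__(self, street_number, city):
        self.street_number = street_number
        self.city = city

class UpdateAddress:
    """Command object: stages new values, applies them, and can undo."""
    def __init__(self, model):
        self.model = model
        self.old = {}
        self.new = {}

    def set_street_number(self, value):
        self.old['street_number'] = self.model.street_number
        self.new['street_number'] = value
        return self

    def set_city(self, value):
        self.old['city'] = self.model.city
        self.new['city'] = value
        return self

    def execute(self):
        for field, value in self.new.items():
            setattr(self.model, field, value)
        if not ModelValidator.is_valid(self.model):
            self.undo()
            raise ValidationError("update would leave the model invalid")

    def undo(self):
        for field, value in self.old.items():
            setattr(self.model, field, value)

address = Address("12", "London")
update = UpdateAddress(address).set_street_number("42").set_city("Paris")
update.execute()  # applies the staged values, validating afterwards
update.undo()     # restores the previous values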
However, it is a lot of typing for the programmer. Consider using persistent data structures instead. Persistent data structures can be used to copy objects very quickly -- in approximately constant time. One Python library of persistent data structures is pyrsistent.
Let's assume your data is in a persistent version of a dict. The library I referenced calls it a PMap.
The implementation of the update classes can be simpler. Starting with the constructor:
class UpdateAddress:
    def __init__(self, pmap):
        self.oldPmap = pmap
        self.newPmap = pmap
The setters are easier:
def setStreetNumber(self, aString):
    self.newPmap = self.newPmap.set('streetNumber', aString)
Execute passes back a new instance of the model, with all the updates.
Execute passes back a new instance of the model, with all the updates applied:
def execute(self):
    newModel = self.newPmap
    if ModelValidator.isValid(newModel):
        return newModel
    else:
        raise ValidationError
The original object has not changed at all, thanks to the magic of persistent data structures.
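For instance, with pyrsistent (whose persistent map type is PMap), every set() returns a new map and the original is left untouched:

from pyrsistent import pmap

original = pmap({'streetNumber': '12', 'city': 'London'})
updated = original.set('streetNumber', '42')  # returns a new PMap

print(original['streetNumber'])  # '12' -- unchanged
print(updated['streetNumber'])   # '42'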
The best thing is to not do any of this. Instead, use an ORM or object database. That is the "enterprise grade" solution. These libraries give you sophisticated tools like transactions and object version history.

Is it possible to create a Django model w/ variable functions as an attribute (passed in either via __init__ or some method)?

I'm creating an app that will generate math problems. They're specific problems where some parameters can be altered. Each problem will be different and will require a different method to solve (all of which will be implemented programmatically).
For example:
models.py
import random
from django.db import models
class Problem(models.Model):
    unformattedText = models.TextField()

    def __init__(self, unformattedText, genFunction, *args, **kwargs):
        super(Problem, self).__init__(*args, **kwargs)
        self.unformattedText = unformattedText
        self.genFunction = genFunction

    def genQAPair(self):
        return self.genFunction(self.unformattedText)
views.py
def genP1(text):
    num_1 = random.randrange(0, 100)
    num_2 = random.randrange(0, 100)
    text = text.format(num_1, num_2)
    return {'question': text, 'answer': num_1 - num_2}

def genP2(text, lim=4):
    num_1 = random.randrange(0, lim)
    text = text.format(num_1)
    return {'question': text, 'answer': num_1 * 40}
p1 = Problem(
    unformattedText='Sally has {} apples. Frank takes {}. How many apples does Sally have?',
    genFunction=genP1,
)
p1.save()

p2 = Problem(
    unformattedText='John jumps {} feet into the air. How long does it take for him to age?',
    genFunction=genP2,
)
p2.save()
When I try this, the function isn't actually saved; Django just saves the integer 1. When I instantiate the model, the function is there as intended, but apparently only 1 is saved to the database.
Bonus question: I'm actually beginning to question whether or not I even need Django models for this. I'm using Django because it's super easy to get everything onto a webpage. Is there a better way to do this? (Maybe store the text of each problem in a JSON file and the generating functions in some separate script.)
The persistence layer for a Django application is the database, and the database schema is specified by your model definitions. In this case you've only defined a single field in your model, unformattedText; you haven't specified any storage for the corresponding function. Your self.genFunction = genFunction is just creating an attribute on an object in memory; it won't be persisted.
There are various possible ways to store the function. You could store it as raw text; you could store it as a pickle blob; you could store the function path and name (e.g. "my.path.to.problems.genP1"); or do something else. In any case, you'll need to create a database field for that information.
Here is a rough outline of an example solution using the function path:
models.py
class Problem(models.Model):
    unformattedText = models.TextField()
    genPath = models.TextField()
views.py
import importlib

from django.shortcuts import render

def problem_view(request, problem_id):
    problem = Problem.objects.get(id=problem_id)
    gen_path, gen_name = problem.genPath.rsplit(".", 1)
    gen_module = importlib.import_module(gen_path)
    gen_function = getattr(gen_module, gen_name)
    context = gen_function(problem.unformattedText)
    return render(request, 'template.html', context)
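Creating a problem then means storing the dotted path to the generator (the module path here is hypothetical):

p1 = Problem(
    unformattedText='Sally has {} apples. Frank takes {}. How many apples does Sally have?',
    genPath='myapp.views.genP1',
)
p1.save()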
Only you can determine if you need to use a database at all. If you only have a few fixed questions then you could just stuff everything into a Python file and be done with it. But there are advantages to using Django's models, including the ability to use the admin.
There are a couple of options, depending on the actual task. I've ordered them from the safest to the most dangerous (but most flexible):
1. Store function identifiers:
You can store genP1 and genP2 by name, as 'genP1' and 'genP2' (or you can use any other unique identifier).
Pros:
You can validate user input and execute trusted code only, because in this case you control almost everything.
You can easily debug your functions, because they are part of your system.
Cons:
You need to define all your functions in the code. That means that if you want to add a new function, you need to redeploy your application.
If you are storing function names, you need to manually import the module (or package) containing the functions and call them.
If you are storing identifiers, you need to define a mapping {identifier: path to actual function}, as sketched below.
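A minimal sketch of that identifier-to-function mapping, reusing the question's genP1/genP2 and a hypothetical genKey field on the model:

GENERATORS = {
    'genP1': genP1,
    'genP2': genP2,
}

def generate(problem):
    try:
        gen_function = GENERATORS[problem.genKey]  # validates the stored identifier
    except KeyError:
        raise ValueError('unknown generator: %r' % problem.genKey)
    return gen_function(problem.unformattedText)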
2. Use DSL
You can write your own DSL (or use an existing one).
Pros:
You can add new functions at runtime without redeploying the application.
You can control which code the user can execute.
You can see the source code of your functions.
Cons:
It is hard to write a safe and flexible DSL, especially if you want to call Python code from it.
It is hard to debug huge functions.
3. Serialize them
You can serialize functions using pickle.
Pros:
You can add new functions at runtime without redeploying the application.
Easier than writing your own DSL.
Cons:
Unsafe - you must not execute untrusted code. If you allow users to create their own functions, serialization is not the way to go - define (or use an existing) safe DSL instead.
It might be impossible to show the source Python code for a serialized function. For more information, see: How can I get the source code of a Python function?
It is hard to debug huge functions.
4. Just store actual Python code
Just store the Python source code in the DB as a string.
Pros:
You can add new functions at runtime without redeploying the application.
You can see the source code without any additional processing.
Easier than writing your own DSL.
Cons:
Unsafe - you must not execute untrusted code. If you allow users to create their own functions, storing source code is not the way to go - define (or use an existing) safe DSL instead.
It is hard to debug huge functions.

Should I add new methods to a class instead of using Single Responsibility Principle

We had a seminar where I presented the Single Responsibility Principle to my team so that we could use it in our projects.
I used the following popular example:
class Employee:
    save()
    calculate_salary()
    generate_report()
And I asked the team to tell me whether everything is okay with this class. Everybody told me that it's okay.
But I see three violations of the SRP here.
Am I right to say that all of these methods should be extracted from the class?
My reasoning:
The save() method is a reason for change if we change our database.
The calculate_salary() method is a reason for change because the salary policy may change.
The generate_report() method is a reason for change if we want to change the presentation of the report (e.g. csv instead of html).
Let's take the last method. I came up with the following report generator classes:
class HTMLReportGenerator:
    def __init__(self, reportable):
        self.reportable = reportable

    def generate_html_report()

class CSVReportGenerator:
    def __init__(self, reportable):
        self.reportable = reportable

    def generate_csv_report()
Now, even if the business logic of a generator changes, it does not touch the Employee class, and this was my main point. Moreover, we can now reuse these classes with objects other than Employee objects.
But the team came up with a different class:
class Employee:
    save()
    calculate_salary()
    generate_html_report()
    generate_csv_report()
They understand that they violate the SRP, but it is okay for them.
And this is where I had no further arguments to offer.
Any ideas on the situation?
I agree with you: by adding additional functions they violated both the SRP and the open/closed principle, and every time there is a new report type they will violate it again.
I would keep the generate_report() function but have it take a parameter of an interface type Report, which has a generate() method.
This means that for example you can call (pardon my Java):
employee.generate_report(new CSVReport())
employee.generate_report(new HTMLReport())
And tomorrow, if you want to add an XML report, you just implement XMLReport from the Report interface and call:
employee.generate_report(new XMLReport())
This gives you a lot of flexibility: there is no need to change Employee for new report types, and it is much easier to test (for example, if generate_report had complicated logic, you could just create a TestReport class that implements the Report interface and simply prints to the output stream for debugging, and call generate_report(new TestReport())).
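Translated to Python (the asker's preferred language), the same strategy might look like the sketch below; the Employee fields are illustrative:

class Report:
    def generate(self, employee):
        raise NotImplementedError

class CSVReport(Report):
    def generate(self, employee):
        return "{},{}".format(employee.name, employee.salary)

class HTMLReport(Report):
    def generate(self, employee):
        return "<p>{}: {}</p>".format(employee.name, employee.salary)

class Employee:
    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def generate_report(self, report):
        # New report types only require new Report subclasses;
        # Employee itself never changes.
        return report.generate(self)

employee = Employee("Sally", 50000)
print(employee.generate_report(CSVReport()))
print(employee.generate_report(HTMLReport()))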

Django: When to customize save vs using post-save signal

I have a series of tests and cases in a database. Whenever a test is obsoleted, it gets end dated, and any sub-cases of that test should also be end dated. I see two ways to accomplish this:
1) Modify the save function to end date sub-cases.
2) Create a receiver which listens for Test models being saved, and then end dates their sub-cases.
Any reason to use one over the other?
Edit: I see this blog post suggests using the save method whenever you check given values of the model. Since I'm checking the end_date, maybe that suggests I should use a custom save?
Edit 2: Also, for the record, the full hierarchy is Protocol -> Test -> Case -> Planned_Execution, and any time one is end dated, every child must also be end dated. I figure I'll end up doing basically the same thing for each.
Edit 3: It turns out that in order to tell whether the current save() is the one that end dates the Test, I need access to both the old and the new data, so I used a custom save. Here's what it looks like:
def save(self, *args, **kwargs):
    """Use a custom save to end date any subCases."""
    try:
        orig = Test.objects.get(id=self.id)
        enddated = (not orig.end_date) and self.end_date is not None
    except Test.DoesNotExist:
        enddated = False
    super(Test, self).save(*args, **kwargs)
    if enddated:
        for case in self.case_set.filter(end_date__isnull=True):
            case.end_date = self.end_date
            case.enddater = self.enddater
            case.save()
I generally use this rule of thumb:
If you have to modify the data so that the save won't fail, then override save() (you don't really have another option). For example, in an app I'm working on, I have a model with a text field that has a list of choices. This interfaces with old code, and replaces an older model that had a similar text field but a different list of choices. The old code sometimes passes my model a choice from the older model, but there's a 1:1 mapping between choices, so in such a case I can translate the choice to the new one. It makes sense to do this in save().
Otherwise, if the save can proceed without intervention, I generally use a post-save signal.
In my understanding, signals are a means for decoupling modules. Since your task seems to happen in only one module I'd customize save.
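For contrast, a minimal sketch of option 2, a post-save receiver for the question's Test/Case models; note that by the time the receiver runs, the old end_date is gone, which is exactly why the asker ended up with a custom save():

from django.db.models.signals import post_save
from django.dispatch import receiver

@receiver(post_save, sender=Test)
def end_date_subcases(sender, instance, **kwargs):
    # Only the current state is visible here; we cannot tell whether
    # this particular save was the one that end dated the Test.
    if instance.end_date is None:
        return
    for case in instance.case_set.filter(end_date__isnull=True):
        case.end_date = instance.end_date
        case.enddater = instance.enddater
        case.save()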
