I'm working on designing a piece of software that has a few levels of abstraction. This might be the most complex piece of code I've ever started designing, and it has a requirement for easy upgrading, so I want to make sure I'm on the right track before I even start coding anything.
Essentially, there will be 3 main levels of classes, and these levels will need to talk with each other.
The first is the input source data. There are currently 2 main types of input data, which produce similar, but not identical, output. The main goal of these classes will be to get the data from the two different sources and convert it into a common interface for use in the rest of the program.
The second set will be an adapter for an external library. The library has been periodically updated, and I have no reason to suspect that it will not continue to be updated throughout the years. Most likely, each upgrade will remain very similar to the previous one, but there might be some small changes made to support a new library version. This level will be responsible for taking the inputs and formatting them for use by an output class.
The last level is the outputs. I don't think that multiple versions will be required for it, but there will need to be at least two different output directories specified. I suspect the easiest thing to do would be to simply pass in an output directory when the output class is created, and that is the only abstraction required. This class will be frequently updated, but there is no requirement to support multiple versions.
Set up the code as follows, essentially following a bridge pattern, but with multiple abstraction layers.
The input class will be the abstraction. The two current means of getting input data will be the two concrete classes, and more concrete classes can be added if required.
The wrapper class will use a factory pattern. Most of the code should be common between the various implementations, so this should work well to handle the minute differences.
The output class will be included as part of the implementor class. There isn't really a pattern required, as only one version will ever be needed for this class. Also, the implementor will likely be a singleton.
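A minimal sketch of that layering in Python (all class and method names here are hypothetical, just to show the shape; the singleton detail is omitted for brevity):

from abc import ABC, abstractmethod

class LibraryAdapter(ABC):
    """Implementor level: adapts one version of the external library."""
    @abstractmethod
    def format(self, records): ...

class LibraryV1Adapter(LibraryAdapter):
    def format(self, records):
        return ["v1:" + r for r in records]  # would call library v1 here

class LibraryV2Adapter(LibraryAdapter):
    def format(self, records):
        return ["v2:" + r for r in records]  # would call library v2 here

def make_adapter(version):
    """Factory: picks the adapter matching the installed library version."""
    return {1: LibraryV1Adapter, 2: LibraryV2Adapter}[version]()

class Output:
    """Output level: one class, parameterized by directory."""
    def __init__(self, directory):
        self.directory = directory

    def write(self, lines):
        print("writing %d lines to %s" % (len(lines), self.directory))

class InputSource(ABC):
    """Abstraction level: the input sources share one interface."""
    def __init__(self, adapter, output):
        self.adapter = adapter
        self.output = output

    @abstractmethod
    def read(self): ...

    def run(self):
        self.output.write(self.adapter.format(self.read()))

class FileSource(InputSource):
    def read(self):
        return ["record-a", "record-b"]

class NetworkSource(InputSource):
    def read(self):
        return ["record-c"]

FileSource(make_adapter(2), Output("/tmp/out")).run()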
I am implementing a workflow management system, where the workflow developer overloads a little process function and inherits from a Workflow class. The class offers a method named add_component in order to add a component to the workflow (a component is the execution of a piece of software, or can be more complex).
My Workflow class, in order to display status, needs to know what components have been added to the workflow. To do so, I tried 2 things:
Execute the process function twice: the first run gathers all the components required, and the second one is the real execution. The problem is that if the workflow developer does something other than adding components (adds a row to a database, creates a file), it will be done twice!
Parse the Python code of the function to extract only the add_component lines. This works, but if some components are inside an if/else statement and a component should not be executed, it still appears in the monitoring!
I'm wondering if there is another solution (I thought about making my workflow an XML file or something easier to parse, but that is less flexible).
You cannot know what a program does without "executing" it (you could do it in some context where you mock the things you don't want modified, but that looks like shooting at a moving target).
If you do handmade parsing, there will always be some cases you miss.
You should break the code into two functions (see the sketch below):
a first one where the code can only add_component(s), without any side effects, but with the possibility of running real code to check the environment etc. to know which components to add;
a second one that can have side effects and relies on the added components.
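As a rough illustration of that split (the Workflow base class and the declare/execute method names are invented for this sketch, not part of the original system):

class Workflow:
    def __init__(self):
        self.components = []

    def add_component(self, component):
        self.components.append(component)

    def declare(self):
        # Phase 1: side-effect-free; may inspect the environment
        # to decide which components to add.
        raise NotImplementedError

    def execute(self):
        # Phase 2: may have side effects; relies on self.components.
        raise NotImplementedError

class MyWorkflow(Workflow):
    def declare(self):
        self.add_component("build")
        self.add_component("deploy")

    def execute(self):
        for component in self.components:
            print("running", component)  # databases, files, etc. go here

wf = MyWorkflow()
wf.declare()          # monitoring can now read wf.components safely
print(wf.components)  # ['build', 'deploy']
wf.execute()          # only this phase performs real work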
Using an XML file (or any static format) is similar, except:
you are certain there are no side effects (you don't need to rely on the programmer respecting the documentation);
you get much less flexibility, so be sure you actually need that flexibility.
I've noticed I have the same piece of code sitting at the top of several of my controllers. They tend to look like this:
def app_description(app):
    """ Dictionary describing an app. """
    return {'name': app.name,
            'id': app.id,
            'is_new': app.is_new(),
            'created_on': app.created_on.strftime("%m/%d/%Y"),
            'configured': app.configured}
I'll call this from a couple different actions in the controller, but generally not outside that controller. It accesses properties. It calls methods. It formats opaque objects (like dates).
My question is: is this controller code, or model code?
The case for controller:
It defines my API.
It's currently only used in that module.
There doesn't seem to be any logic here.
The case for model:
It seems like a description of the data, which the model should be responsible for.
It feels like I might want to use this in other controllers. I haven't gotten there yet, but these functions are still pretty new, so it might happen.
Attaching a function to the object it clearly belongs to seems better than leaving it as a module-level function.
It could be defined more succinctly on the model. Something like having the top-level model object define .description(), with the subclasses just defining a black/whitelist of properties, plus overriding the method itself to call functions. I'm pretty sure that would be fewer lines of code (it would save me the repetition of things like 'name': app.name), which seems like a good thing.
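A sketch of what that might look like (the whitelist mechanism and field names here are hypothetical, just to make the idea concrete):

class Model:
    description_fields = ()  # subclasses whitelist the attributes to expose

    def description(self):
        return {name: getattr(self, name) for name in self.description_fields}

class App(Model):
    description_fields = ('name', 'id', 'configured')

    def __init__(self, name, id, configured):
        self.name, self.id, self.configured = name, id, configured

    def is_new(self):
        return True

    def description(self):
        d = super().description()
        d['is_new'] = self.is_new()  # computed values still need an override
        return d

print(App('demo', 1, True).description())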
Not sure which framework you are using, but I would suggest creating this helper functionality in its own class and putting it in a shared folder like lib/.
Alternatively you could have an application helper module that just has a bunch of these helpful application-wide functions.
Either way, I'd keep it away from both the model and the controller.
The answer I finally decided on:
In the short term, having these methods in the controllers is fine. If they define the output, then, OK, they can stay there. They're only used in the controller.
There's a couple of things to watch out for which indicate they've grown up and need to go elsewhere:
In one case, I needed access to a canonical serialization of the object. At that point, it moved into the model, as a model method.
In another case, I found that I was formatting all timestamps the same. I have a standard #ajaxify decorator that does things like set Content-Type headers, do JSON encoding, etc. In this case, I moved the standard datetime formatting into there -- when the JSON encoder hits a datetime (formerly unserializable), it always treats it the same way (seconds since the epoch, for me); see the encoder sketch after these cases.
In yet a third case, I realized that I was re-using this function in a couple controllers. For that, I pulled it out into a common class (like another answer suggested) and used that to define my "Web API". I'd used this pattern before -- it makes sense for grouping similarly-used data (like timeseries data, or top-N lists).
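For the second case, the datetime handling could be done with a custom JSON encoder along these lines (a sketch of the idea, not the actual #ajaxify decorator):

import json
from datetime import datetime, timezone

class APIEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return int(obj.timestamp())  # seconds since the epoch
        return super().default(obj)

payload = {'created_on': datetime(2024, 1, 1, tzinfo=timezone.utc)}
print(json.dumps(payload, cls=APIEncoder))  # {"created_on": 1704067200}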
I suspect there's more, but basically, I don't think they're all as similar as I thought they were initially. I'm currently happy thinking about them as a convention for simple objects in our (small-ish, new-ish) codebase, with the understanding that after a few iterations, a better solution may present itself. In the meantime, they stay in the controller and define my AJAXy-JSON-only interface.
(Note: I would appreciate help generalizing the title. I am sure that this is a class of problems in OO land that probably has a reasonable pattern; I just don't know a better way to describe it.)
I'm considering the following -- our server script will be called by an outside program and have a bunch of text dumped at it (usually XML).
There are multiple possible types of data we could be getting, and multiple versions of the data representation we could be getting, e.g. "Report Type A, version 1.2" vs. "Report Type A, version 2.0"
We will generally want to do the same thing with all the data -- namely, determine what sort and version it is, then parse it with a custom parser, then call a synchronize-to-database function on it.
We will definitely be adding types and versions as time goes on.
So, what's a good design pattern here? I can come up with two, but both seem like they may have some problems.
Option 1
Write a monolithic ID script which determines the type, and then imports and calls the properly named class functions.
Benefits
Probably pretty easy to debug.
Only one file that does the parsing.
Downsides
Seems hack-ish.
It would be nice not to have to encode knowledge of data formats in two places: once for identification, once for the actual merging.
Option 2
Write an "ID" function for each class; returns Yes / No / Maybe when given identifying text.
The ID script now imports a bunch of classes, instantiates them on the text, and asks whether the text and class type match.
Upsides:
Cleaner in that everything lives in one module?
Downsides:
Slower? Depends on logic of running through the classes.
Put abstractly: should Python instantiate a bunch of classes and consume an ID function, or should it instantiate one (or many) ID classes which have a paired item class, or something else entirely?
You could use the Strategy pattern, which would allow you to separate the logic for the different formats that need to be parsed into concrete strategies. Your code would typically parse a portion of the file and then decide on a concrete strategy.
As far as defining the grammar for your files, I would find a fast way to identify the file without implementing the full definition -- perhaps a header or other unique feature at the beginning of the document. Then, once you know how to handle the file, you can pick the best concrete strategy for it, handling the parsing and the writes to the database.
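A rough sketch of that arrangement (the parser names and the matches/parse interface are made up for illustration; the header strings are placeholders):

from abc import ABC, abstractmethod

class ReportParser(ABC):
    @classmethod
    @abstractmethod
    def matches(cls, text):
        """Cheap identification check against a header or unique feature."""

    @abstractmethod
    def parse(self, text): ...

class ReportAv1(ReportParser):
    @classmethod
    def matches(cls, text):
        return text.startswith('<report type="A" version="1.2">')

    def parse(self, text):
        return {"type": "A", "version": "1.2"}  # real parsing goes here

class ReportAv2(ReportParser):
    @classmethod
    def matches(cls, text):
        return text.startswith('<report type="A" version="2.0">')

    def parse(self, text):
        return {"type": "A", "version": "2.0"}

PARSERS = [ReportAv1, ReportAv2]  # extend as new types/versions arrive

def dispatch(text):
    for parser_cls in PARSERS:
        if parser_cls.matches(text):
            return parser_cls().parse(text)
    raise ValueError("unrecognized report")

data = dispatch('<report type="A" version="2.0">...</report>')
# ...then hand `data` to the synchronize-to-database step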
My first "serious" language was Java, so I have comprehended object-oriented programming in sense that elemental brick of program is a class.
Now I write in VBA and Python. These are module languages, and I feel a persistent discomfort: I don't know how I should decompose a program into modules/classes.
I understand that one module corresponds to one knowledge domain, and that one module should be testable separately...
Should I think of a module only as a namespace (as in C++)?
I don't do VBA, but in Python, modules are fundamental. As you say, they can be viewed as namespaces, but they are also objects in their own right. They are not classes, however, so you cannot inherit from them (at least not directly).
I find that it's a good rule to keep a module concerned with one domain area. The rule that I use for deciding whether something is a module-level function or a class method is to ask myself if it could meaningfully be used on any objects that satisfy the "interface" its arguments take. If so, then I free it from the class hierarchy and make it a module-level function. If its usefulness truly is restricted to a particular class hierarchy, then I make it a method.
If you need it to work on all instances of a class hierarchy and you make it a module-level function, just remember that all the subclasses still need to implement the given interface with the given semantics. This is one of the tradeoffs of stepping away from methods: you can no longer make a slight modification and call super(). On the other hand, if subclasses are likely to redefine the interface and its semantics, then maybe that particular class hierarchy isn't a very good abstraction and should be rethought.
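A small example of that rule (the Shape hierarchy here is invented for illustration):

def total_area(shapes):
    """Works on anything with an .area() method, so it lives at module level."""
    return sum(shape.area() for shape in shapes)

class Shape:
    def area(self):
        raise NotImplementedError

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2  # tied to this class, so it stays a method

print(total_area([Square(2), Square(3)]))  # 13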
It is a matter of taste. If you use modules, your program will be more procedurally oriented. If you choose classes, it will be more or less object-oriented. I've been working with Excel for a couple of months, and personally I choose classes whenever I can because it is more comfortable to me. If you stop thinking about them as objects and think of them as components, you can use them with elegance. The main reason I prefer classes is that you can have more than one: you can't have two instances of a module. That lets me use encapsulation and get better code reuse.
For example, let's assume that you'd like some kind of logger, to log actions performed by your program during execution. You could write a module for that; it could have, for example, a global variable indicating which sheet the logging will be done on. But consider the following hypothetical situation: your client wants you to include some fancy report-generation functionality in your program. You are smart, so you figure out that you can use your logging code to prepare the reports. But you can't log and report simultaneously with one module, while you can with two instances of a logging component, without any changes to their code.
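Translated into Python terms (a toy sketch; the SheetLogger name and its behavior are invented here), the difference is roughly:

class SheetLogger:
    """Each instance writes to its own destination."""
    def __init__(self, sheet):
        self.sheet = sheet
        self.lines = []

    def log(self, message):
        self.lines.append("[%s] %s" % (self.sheet, message))

audit = SheetLogger("AuditLog")
report = SheetLogger("FancyReport")  # a second instance, zero code changes
audit.log("user saved file")
report.log("report section generated")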
Language idioms differ, and that's the reason a problem solved in different languages takes different approaches.
"C" is all about procedural decomposition.
The main idiom in Java is class (or object) decomposition. Functions are not absent, but they become part of the exhibited behavior of these classes.
"Python" provides support for both Class based problem decomposition as well as procedural based.
All of these use files, packages, or modules as concepts for organizing large pieces of code together. There is nothing that restricts you to one module per knowledge domain.
These are decomposition and organizing techniques and can be applied based on the problem at hand.
If you are comfortable with OO, you should be able to use it very well in Python.
VBA also allows the use of classes. Unfortunately, those classes don't support all the features of a full-fledged object-oriented language; in particular, inheritance is not supported.
But you can work with interfaces, at least up to a certain degree.
I have only used modules as "one module = one singleton". My modules contain "static" or even stateless methods, so in my opinion a VBA module is not a namespace. More often, a bunch of classes and modules together form a "namespace". I often create a new project (DLL, DVB, or something similar) for such a "namespace".
OK, I've got 2 really big classes (> 1k lines each) that I currently have split up into multiple ones, which then get recombined using multiple inheritance. Now I'm wondering if there is any cleaner/better, more Pythonic way of doing this. Completely factoring them out would result in endless amounts of self.otherself.do_something calls, which I don't think is the way it should be done.
To make things clear, here's what it currently looks like:
# GUI.py
from gui_events import GUIEvents    # event handlers
from gui_helpers import GUIHelpers  # helper methods that don't directly modify the GUI

class GUI(gtk.Window, GUIEvents, GUIHelpers):
    # general stuff here
    ...
One problem resulting from this is that Pylint complains, giving me trillions of "init not called" / "undefined attribute" / "attribute accessed before definition" warnings.
EDIT:
You may want to take a look at the code to get a picture of what the whole thing actually is.
http://github.com/BonsaiDen/Atarashii/tree/next/atarashii/usr/share/pyshared/atarashii/
Please note, I'm really trying everything to keep this thing as DRY as possible. I'm using Pylint to detect code duplication, and the only thing it complains about is the imports.
If you want to use multiple inheritance to combine everything into one big class (it might make sense to do this), then you can refactor each of the parent classes so that every method and property is either private (starts with '__') or has a short 2-3 character prefix unique to that class. For example, all the methods and properties in your GUIEvents class could start with ge_, and everything in GUIHelpers could start with gh_. By doing this, you'll achieve some of the clarity of using separate sub-class instances (self.ge.doSomething() vs self.ge_doSomething()) and you'll avoid conflicting member names, which is the main risk when combining such large classes into one.
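For instance (method names invented, and gtk.Window left out so the sketch runs standalone), the prefixing might look like:

class GUIEvents:
    def ge_on_click(self, widget):  # ge_ prefix marks the events mixin
        print("clicked", widget)

class GUIHelpers:
    def gh_format_title(self, text):  # gh_ prefix marks the helpers mixin
        return text.title()

class GUI(GUIEvents, GUIHelpers):
    def show(self):
        self.ge_on_click("button")
        print(self.gh_format_title("my window"))

GUI().show()  # the prefixes make the origin of each method obvious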
Start by finding classes that model real world concepts that your application needs to work with. Those are natural candidates for classes.
Try to avoid multiple inheritance as much as possible; it's rarely useful and always somewhat confusing. Instead, look to use composition ("HAS-A" relationships) to give rich attributes to your objects made of other objects (see the sketch after these points).
Remember to make each method do one small, specific thing; this necessarily entails breaking up methods that do too many things into smaller pieces.
Refactor cases where you find many such methods are duplicating each other's functionality; this is another way to find natural collections of functionality that deserve to be in a distinct class.
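The composition suggestion, as a small sketch (the classes here are hypothetical stand-ins):

class Engine:
    def start(self):
        print("engine started")

class Radio:
    def play(self):
        print("music playing")

class Car:
    """HAS-A: composed of parts instead of inheriting from them."""
    def __init__(self):
        self.engine = Engine()
        self.radio = Radio()

    def drive(self):
        self.engine.start()
        self.radio.play()

Car().drive()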
I think this is more of a general OO-design problem than Python problem. Python pretty much gives you all the classic OOP tools, conveniently packaged. You'd have to describe the problem in more detail (e.g. what do the GUIEvents and GUIHelpers classes contain?)
One Python-specific aspect to consider is the following: Python supports multiple programming paradigms, and often the best solution is not OOP. This may be the case here. But again, you'll have to throw in more details to get a meaningful answer.
Your code may be substantially improved by implementing a Model-View-Controller design. Depending on how your GUI and tool are setup, you may also benefit from "widgetizing" portions of your GUI, so that rather than having one giant Model-View-Controller, you have a main Model-View-Controller that manages a bunch of smaller Model-View-Controllers, each for distinct portions of your GUI. This would allow you to break up your tool and GUI into many classes, and you may be able to reuse portions of it, reducing the total amount of code you need to maintain.
While python does support multiple programming paradigms, for GUI tools, the best solution will nearly always be an Object-Oriented design.
One possibility is to assign imported functions to class attributes:
In file a_part_1.py:
def add(self, n):
    self.n += n

def __init__(self, n):
    self.n = n
And in the main class file:
import a_part_1

class A:
    __init__ = a_part_1.__init__
    add = a_part_1.add
Or, if you don't want to update the main file when new methods are added:
class A: pass

import a_part_1

for k, v in a_part_1.__dict__.items():
    if callable(v):
        setattr(A, k, v)