Should Domain Model Classes always depend on primitives? - python

Halfway through Architecture Patterns with Python, I have two questions about how should the Domain Model Classes be structured and instantiated. Assume on my Domain Model I have the class DepthMap:
class DepthMap:
def __init__(self, map: np.ndarray):
self.map = map
According to what I understood from the book, this class is not correct since it depends on Numpy, and it should depend only on Python primitives, hence the question: Should Domain Model classes rely only on Python primitives, or is there an exception?
Assuming the answer to the previous question is that classes should solely depend on primitives, what would the correct way create a DepthMap from a Numpy array be? Assume now I have more formats from where I can make a DepthMap object.
class DepthMap:
def __init__(self, map: List):
self.map = map
#classmethod
def from_numpy(cls, map: np.ndarray):
return cls(map.tolist())
#classmethod
def from_str(cls, map: str):
return cls([float(i) for i in s.split(',')])
or a factory:
class DepthMapFactory:
#staticmethod
def from_numpy(map: np.ndarray):
return DepthMap(map.tolist())
#staticmethod
def from_str(map: str):
return DepthMap([float(i) for i in s.split(',')])
I think even the Repository Pattern, which they go through in the book, could fit in here:
class StrRepository:
def get(map: str):
return DepthMap([float(i) for i in s.split(',')])
class NumpyRepository:
def get(map: np.ndarray):
return DepthMap(map.tolist())
The second question: When creating a Domain Model Object from different sources, what is the correct approach?
Note: My background is not software; hence some OOP concepts may be incorrect. Instead of downvoting, please comment and let me know how to improve the question.

I wrote the book, so I can at least have a go at answering your question.
You can use things other than primitives (str, int, boolean etc) in your domain model. Generally, although we couldn't show it in the book, your model classes will contain whole hierarchies of objects.
What you want to avoid is your technical implementation leaking into your code in a way that makes it hard to express your intent. It would probably be inappropriate to pass instances of Numpy arrays around your codebase, unless your domain is Numpy. We're trying to make code easier to read and test by separating the interesting stuff from the glue.
To that end, it's fine for you to have a DepthMap class that exposes some behaviour, and happens to have a Numpy array as its internal storage. That's not any different to you using any other data structure from a library.
If you've got data as a flat file or something, and there is complex logic involved in creating the Numpy array, then I think a Factory is appropriate. That way you can keep the boring, ugly code for producing a DepthMap at the edge of your system, and out of your model.
If creating a DepthMap from a string is really a one-liner, then a classmethod is probably better because it's easier to find and understand.

I think it's perfectly fine to depend on librairies that are pure language extensions or else you will just end up with having to define tons of "interface contracts" (Python doesn't have interfaces as a language construct -- but those can be conceptual) to abstract away these data structures and in the end those newly introduced contracts will probably be poor abstractions anyway and just result in additional complexity.
That means your domain objects can generally depend on these pure types. On the other hand I also think these types should be considered as language "primitives" (native may be more accurate) just like datetime and that you'd want to avoid primitive obsession.
In other words, DepthMap which is a domain concept is allowed to depend on Numpy for it's construction (no abstraction necessary here), but Numpy shouldn't necessarily be allowed to flow deep into the domain (unless it's the appropriate abstraction).
Or in pseudo-code, this could be bad:
someOperation(Numpy: depthMap);
Where this may be better:
class DepthMap(Numpy: data);
someOperation(DepthMap depthMap);
And regarding the second question, from a DDD perspective if the
DepthMap class has a Numpy array as it's internal structure but has to
be constructed from other sources (string or list for example) would
the best approach be a repository pattern? Or is this just for
handling databases and a Factory is a better approach?
The Repository pattern is exclusively for storage/retrieval so it wouldn't be appropriate. Now, you may have a factory method directly on DepthMap that accepts a Numpy or you may have a dedicated factory. If you want to decouple DepthMap from Numpy then it could make sense to introduce a dedicated factory, but it seems unnecessary here at first glance.

Should Domain Model classes rely only on Python primitives
Speaking purely from a domain-driven-design perspective, there's absolutely no reason that this should be true
Your domain dynamics are normally going to be described using the language of your domain, ie the manipulation of ENTITIES and VALUE OBJECTS (Evans, 2003) that are facades that place domain semantics on top of your data structures.
The underlying data structures, behind the facades, are whatever you need to get the job done.
There is nothing in domain driven design requiring that you forsake a well-tested off the shelf implementation of a highly optimized Bazzlefraz and instead write your own from scratch.
Part of the point of domain driven design is that we want to be making our investment into the code that helps the business, not the plumbing.

Related

Both Inheritance and composition in Python, bad practice?

I'm working a project, where the natural approach is to implement a main object with sub-components based on different classes, e.g. a PC consisting of CPU, GPU, ...
I've started with a composition structure, where the components have attributes and functions inherent to their sub-system and whenever higher level attributes are needed, they are given as arguments.
Now, as I'm adding more functionality, it would make sense to have different types of the main object, e.g. a notebook, which would extend the PC class, but still have a CPU, etc. At the moment, I'm using a separate script, which contains all the functions related to the type.
Would it be considered bad practice to combine inheritance and composition, by using child classes for different types of the main object?
In short
Preferring composition over inheritance does not exclude inheritance, and does not automatically make the combination of both a bad practice. It's about making an informed decision.
More details
The recommendation to prefer composition over inheritance is a rule of thumb. It was first coined by GoF. If you'll read their full argumentation, you'll see that it's not about composition being good and inheritance bad; it's that composition is more flexible and therefore more suitable in many cases.
But you'll need to decide case by case. And indeed, if you consider some variant of the composite pattern, specialization of the leaf and composite classes can be perfectly justified in some situations:
polymorphism could avoid a lot of if and cases,
composition could in some circumstances require additional call-forwarding overhead that might not be necessary when it's really about type specialization.
combination of composition and inheritance could be used to get the best of both worlds (caution: if applied carelessly, it could also give the worst of both worlds)
Note: If you'd provide a short overview of the context with an UML diagram, more arguments could be provided in your particular context. Meanwhile, this question on SE, could also be of interest

Can monads do what this OOP can't?

Trying to understand monads and wondering if they'd be useful for data transformation programming in Python, I looked at many introductory explanations. But I don't understand why monads are important.
Is the following Python code a good representation of the Maybe monad?
class Some:
def __init__(self, val):
self.val=val
def apply(self, func):
return func(self.val)
class Error:
def apply(self, func):
return Error()
a = Some(1)
b = a.apply(lambda x:Some(x+5))
Can you give an example of a monad solution, which cannot be transformed into such OOP code?
(Do you think monads for data transformation in OOP languages can be useful?)
Here is a post that discusses monads using Swift as the language that might help you make more sense of them: http://www.javiersoto.me/post/106875422394
The basic point is that a monad is an object that has three components.
A constructor that can take some other object, or objects, and create a monad wrapping that object.
A method that can apply a function that knows nothing about the monad to its contents and return the result wrapped in a monad.
A method that can apply a function that takes the raw object and returns a result wrapped in the monad, and return that monad.
Note that this means even an Array class can be a monad if the language treats methods as first class objects, and the necessary methods exist.
If the language doesn't treat methods as first class objects, even if it's an OO language, it will not able able to implement Monads. I think your confusion may be coming from the fact that you are using a multi-paradigm language (Python) and assuming it is a pure OO language.
The apply you've written corresponds to the monad function called bind.
Given that and the constructor Some (which for a generalized monad is called unit or return) you can define the functions fmap and apply which promote other kinds of functions: where bind promotes and uses a function that takes plain data and returns a monad, fmap does so for for functions from plain data to plain data, and apply does so for functions that are themselves wrapped inside the monad.
You would have issues, I think, defining the last monadic operation: join. join takes a nested monad and flattens it to a single layer. So this is a method that can't be used on all objects of the class, only those with a particular structure. And while join can be avoided if you define bind instead (you can write join in terms of bind and the identity function), it's inconvenient to define bind for array-like monads. Functions on arrays are more naturally described in terms of fmap (loop over the array and apply the function) and join (take a nested array and flatten it).
I think if you can attack these challenges, you can implement a monad in whatever language you choose. And I do think that monads are useful in many languages, as many sub-computations can be described using them. They represent a known solution to many common problems, and if they're already available to you, that means code you don't have to test or debug; they're tools that already work.
Implementing them yourself lets you use the functional literature and way of thinking to attack problems. Whether that convenience is worth the effort of implementing them is up to you.

Explain polymorphism

What is polymorphism? I'm not sure I am understanding it correctly.
In the Python scope, what I am getting out of it is that I can define parameters as followed:
def blah (x, y)
without having to specify the type, as opposed to another language like Java, where it'd look more along the lines of:
public void blah (int x, string y)
Am I getting that right?
Beware that different people use different terminology; in particular there is often a rift between the object oriented community and the (rest of the) programming language theory community.
Generally speaking, polymorphism means that a method or function is able to cope with different types of input. For example the add method (or + operator) in the Integer class might perform integer addition, while the add method in the Float class performs floating-point addition, and the add method in the Bignum class performs the correct operations for an arbitrary-size number. Polymorphism is the ability to call the add method on an object without knowing what kind of a number it is.
One particular kind of polymorphism, usually called parametric polymorphism in the functional community and generic programming in the OOP community, is the ability to perform certain operations on an object without caring about its precise type. For example, to reverse a list, you don't need to care about the type of the elements of the list, you just need to know that it's a list. So you can write generic (in this sense) list reversal code: it'll work identically on lists of integers, strings, widgets, arbitrary objects, whatever. But you can't write code that adds the elements of a list in a generic way, because the way the elements are interpreted as numbers depends on what type they are.
Another kind of polymorphism, usually called ad-hoc polymorphism or (at least for some forms of it) generic programming in the functional community, and often subtyping polymorphism (though this somewhat restricts the concept) in the OOP community, it the ability to have a single method or function that behaves differently depending on the precise type of its arguments (or, for methods, the type of the object whose method is being invoked). The add example above is ad-hoc polymorphism. In dynamically typed languages this ability goes without saying; statically-typed languages tend to (but don't have to) have restrictions such as requiring that the argument be a subclass of some particular class (Addable).
Polymorphism is not about having to specify types when you define a function. That's more related to static vs. dynamic typing, though it's not an intrinsic part of the issue. Dynamically typed languages have no need for type declarations, while statically typed languages usually need some type declarations (going from quite a lot in Java to almost none in ML).
Hope from this example, you will understand what Polymorphism is. In this picture, all objects have a method Speak() but each has a different implementation. Polymorphism allows you to do this, you can declare an action for a class and its subclasses but for each subclass, you can write exactly what you want later.
The answers you've gotten are good, and explain what polymorphism is. I think it can also help to understand some of the reasons it is useful.
In some languages that lack polymorphism you find yourself in situations where you want to perform what is conceptually the same operation on different types of objects, in cases where that operation has to be implemented differently for each type. For instance, in a python-like syntax:
def dosomething(thing):
if type(thing)==suchandsuch:
#do some stuff
elif type(thing)==somesuch:
#do some other stuff
elif type(thing)==nonesuch:
#yet more stuff
There are some problems with this. The biggest is that it causes very tight coupling and a lot of repetition. You are likely to have this same set of tests in a lot of places in your code. What happens if you add a new type that has to support this operation? You have to go find every place you have this sort of conditional and add a new branch. And of course you have to have access to all the source code involved to be able to make those changes. On top of that conditional logic like this is wordy, and hard to understand in real cases.
It's nicer to be able to just write:
thing.dosomething()
On top of being a lot shorter this leads to much looser coupling. This example/explanation is geared to traditional OO languages like Python. The details are a bit different in, say, functional languages. But a lot of the general utility of polymorphism remains the same.
Polymorphism literally means "many shapes", and that's pretty good at explaining its purpose. The idea behind polymorphism is that one can use the same calls on different types and have them behave appropriately.
It is important to distinguish this from the typing system - strongly typed languages require that objects be related through an inheritance chain to be polymorphic, but with weakly typed languages, this is not necessary.
In Java (et al.), this is why interfaces are useful - they define the set of functions that can be called on objects without specifying the exact object - the objects that implement that interface are polymorphic with respect to that interface.
In Python, since things are dynamically typed, the interface is less formal, but the idea is the same - calling foo() will succeed on two objects that know how to properly implement foo(), but we don't care about what type they really are.
No, that is not polymorphism. Polymorphism is the concept that there can be many different implementations of an executable unit and the difference happen all behind the scene without the caller awareness.
For example, Bird and Plane are FlyingObject. So FlyingObject has a method call fly() and both Bird and Plane implement fly() method. Bird and Plan flies differently so the implementations are different. To the clients point of view, it does not matter how Bird or Plane fly, they can just call fly() method to a FlyingObject object does not matter if that object is Plan or Bird.
What you are describing is dynamic and static type checking which the type compatibility is done at run-time and compile-time respectively.
Hope this out.
NawaMan
Short answer: The ability to treat programmatic type instances of different types as the same for certain purposes.
Long answer:
From Ancient Greek poly (many) + morph (form) + -ism.
Polymorphism is a general technique enabling different types to be treated uniformly in some way. Examples in the programming world include:
parametric polymorphism (seen as
generics in Java)
subtyping
polymorphism, implemented in Java
using dynamic message dispatch
between object instances.
ad-hoc
polymorphism, which relates to the
ability to define functions of the
same name that vary only by the types
they deal with (overloading in Java).
The word polymorphism is also used to describe concepts in other, unrelated, domains such as genetics.
What you are talking about is auto-typing--or maybe type detection. It is something a Dynamic language tends to do--it allows the user to not know or care about the types at build time--the types are figured out at runtime and not restricted to a specific type.
Polymorphism is where you have two classes that inherit from a main class but each implement a method differently.
For instance, if you have Vehicle as a root class and Car and Bicycle as instances, then vehicle.calculateDistance() would operate based on gas available if the vehicle is an instance of Car, and would operate based on the endurance of the cyclist if it was a Bicycle type.
It's generally used like this:
getTotalDistance(Vehicle[] vehicle) {
int dist=0
for each vehicle
dist+=vehicle.calculateDistance();
Note that you are passing in the base type, but the instances will never be Vehicle itself, but always a child class of Vehicle--but you DO NOT cast it to the child type. Polymorphis means that vehicle morphs into whatever child is required.
Yes, that is an example of "type-polymorphism". However, when talking about object-oriented programming "polymorphism" typically relates to "subtype polymorphism." The example you gave is often called "typing".
Java, C, C++ and others, use static typing. In that, you have to specify the types are compile time.
Python, and Ruby use dynamic in that the typing is inferred during interpretation.
Subtype polymorphism or just "polymorphism" is the ability for a base class reference that is a derived type, to properly invoke the derived type's method.
For example (in near pseudo code):
class Base
{
virtual void Draw() { //base draw}
}
class Derived: public Base
{
void Draw() { //derived draw}
}
Base* myBasePtr = new Derived();
myBasePtr->Draw(); //invokes Derived::Draw(), even though it's a base pointer.
This is polymorphism.
Polymorphism:
One method call works on several classes, even if the classes need different implementations;
Ability to provide multiple implementations of an action, and to select the correct implementation based on the surrounding context;
Provides overloading and overriding;
Could occurs both in Runtime and Compile-Time;
Run time Polymorphism :
Run time Polymorphism also known as method overriding
Method overriding means having two or more methods with the same name , same signature but with different implementation
Compile time Polymorphism :
Compile time Polymorphism also known as method overloading
Method overloading means having two or more methods with the same name but with different signatures
In computer science, polymorphism is a programming language feature that allows values of different data types to be handled using a uniform interface. The concept of parametric polymorphism applies to both data types and functions. A function that can evaluate to or be applied to values of different types is known as a polymorphic function. A data type that can appear to be of a generalized type (e.g., a list with elements of arbitrary type) is designated polymorphic data type like the generalized type from which such specializations are made.
Disadvantages of Polymorphism:
Polymorphism reduces readability of the program. One needs to visualize runtime behaviour of program to identify actual execution time class involved. It also becomes difficult to navigate through classes implementing this concept. Even sofasticated IDEs can not provide this navigation feature. This adds to the maintainance cost to some extent.
Polymorphism - The same object acting differently based on the scenario it is in. For example, if a 12-year old kid was in a room with a bunch of kids, the type of music they would listen to would be different than if a 12 year old kid was in a room full of adults. The 12 year old kid is the same, however the kid acting differently based on the scenario it is in (the different music).
The ability to define a function in multiple forms is called Polymorphism. In C#, Java, C++ there are two types of polymorphism: compile time polymorphism (overloading) and runtime polymorphism (overriding).
Overriding: Overriding occurs when a class method has the same name and signature as a method in parent class.
Overloading: Overloading is determined at the compile time. It occurs when several methods have same names with:
Different method signature and different number or type of
parameters.
Same method signature but different number of parameters.
Same method signature and same number of parameters but of different type

How many private variables are too many? Capsulizing classes? Class Practices?

Okay so i am currently working on an inhouse statistics package for python, its mainly geared towards a combination of working with arcgis geoprocessor, for modeling comparasion and tools.
Anyways, so i have a single class, that calculates statistics. Lets just call it Stats. Now my Stats class, is getting to the point of being very large. It uses statistics calculated by other statistics, to calculate other statistics sets, etc etc. This leads to alot of private variables, that are kept simply to prevent recalculation. however there is certain ones, while used quite frequintly they are often only used by one or two key subsections of functionality. (e.g. summation of matrix diagonals, and probabilities). However its starting to become a major eyeesore, and i feel as if i am doing this terribly wrong.
So is this bad?
I was recommended by a coworker, to simply start putting core and common functionality togther, in the main class, then simply having capsules, that take a reference to the main class, and simply do what ever functionality they need to within themselves. E.g. for calculating accuracy of model predictions, i would create a capsule, who simply takes a reference to the parent, and it will offload all of the calculations needed, for model predictions.
Is something like this really a good idea? Is there a better way? Right now i have over a dozen different sub statistics that are dumped to a text file to make a smallish report. The code base is growing, and i would just love it if i could start splitting up more and more of my python classes. I am just not sure really what the best way about doing stuff like this is.
Why not create a class for each statistic you need to compute and when of the statistics requires other, just pass an instance of the latter to the computing method? However, there is little known about your code and required functionalities. Maybe you could describe in a broader fashion, what kind of statistics you need calculate and how they depend on each other?
Anyway, if I had to count certain statistics, I would instantly turn to creating separate class for each of them. I did once, when I was writing code statistics library for python. Every statistic, like how many times class is inherited or how often function was called, was a separate class. This way each of them was simple, however I didn't need to use any of them in the other.
I can think of a couple of solutions. One would be to simply store values in an array with an enum like so:
StatisticType = enum('AveragePerDay','MedianPerDay'...)
Another would be to use a inheritance like so:
class StatisticBase
....
class AveragePerDay ( StatisticBase )
...
class MedianPerDay ( StatisticBase )
...
There is no hard and fast rule on "too many", however a guideline is that if the list of fields, properties, and methods when collapsed, is longer than a single screen full, it's probably too big.
It's a common anti-pattern for a class to become "too fat" (have too much functionality and related state), and while this is commonly observed about "base classes" (whence the "fat base class" monicker for the anti-pattern), it can really happen without any inheritance involved.
Many design patterns (DPs for short_ can help you re-factor your code to whittle down the large, untestable, unmaintainable "fat class" to a nice package of cooperating classes (which can be used through "Facade" DPs for simplicity): consider, for example, State, Strategy, Memento, Proxy.
You could attack this problem directly, but I think, especially since you mention in a comment that you're looking at it as a general class design topic, it may offer you a good opportunity to dig into the very useful field of design patterns, and especially "refactoring to patterns" (Fowler's book by that title is excellent, though it doesn't touch on Python-specific issues).
Specifically, I believe you'll be focusing mostly on a few Structural and Behavioral patterns (since I don't think you have much need for Creational ones for this use case, except maybe "lazy initialization" of some of your expensive-to-compute state that's only needed in certain cases -- see this wikipedia entry for a pretty exhaustive listing of DPs, with classification and links for further explanations of each).
Since you are asking about best practices you might want to check out pylint (http://www.logilab.org/857). It has many good suggestions about code style including ones relating to how many private variables in a class.

How bad is it to override a method from a third-party module?

How bad is it to redefine a class method from another, third-party module, in Python?
In fact, users can create NumPy matrices that contain numbers with uncertainty; ideally, I would like their code to run unmodified (compared to when the code manipulates float matrices); in particular, it would be great if the inverse of matrix m could still be obtained with m.I, despite the fact that m.I has to be calculated with my own code (the original I method does not work, in general).
How bad is it to redefine numpy.matrix.I? For one thing, it does tamper with third-party code, which I don't like, as it may not be robust (what if other modules do the same?…). Another problem is that the new numpy.matrix.I is a wrapper that involves a small overhead when the original numpy.matrix.I can actually be applied in order to obtain the inverse matrix.
Is subclassing NumPy matrices and only changing their I method better? this would force users to update their code and create matrices of numbers with uncertainty with m = matrix_with_uncert(…) (instead of keeping using numpy.matrix(…), as for a matrix of floats), but maybe this is an inconvenience that should be accepted for the sake of robustness? Matrix inversions could still be performed with m.I, which is good… On the other hand, it would be nice if users could build all their matrices (of floats or of numbers with uncertainties) with numpy.matrix() directly, without having to bother checking for data types.
Any comment, or additional approach would be welcome!
Subclassing (which does involve overriding, as the term is generally used) is generally much preferable to "monkey-patching" (stuffing altered methods into existing classes or modules), even when the latter is available (built-in types, meaning ones implemented in C, can protect themselves against monkey-patching, and most of them do).
For example, if your functionality relies on monkey-patching, it will break and stop upgrades if at any time the class you're monkey patching gets upgraded to be implemented in C (for speed or specifically to defend against monkey patching). Maintainers of third party packages hate monkey-patching because it means they get bogus bug reports from hapless users who (unbeknownst to them) are in fact using a buggy monkey-patch which breaks the third party package, where the latter (unless broken monkey-wise) is flawless. You've already remarked on the possible performance hit.
Conceptually, a "matrix of numbers with uncertainty" is a different concept from a "matrix of numbers". Subclassing expresses this cleanly, monkey-patching tries to hide it. That's really the root of what's wrong with monkey-patching in general: a covert channel operating through global, hidden means, without clarity and transparency. All the many practical problems descend in a sense from this root conceptual problem.
I strongly urge you to reject monkey-patching in favor of clean solutions such as subclassing.
In general, it's perfectly acceptable to override methods that are ...
Intentionally permit overrides
In a way they document (satisfying LSP won't hurt)
If both conditions are met, then overriding should be safe.
Depends on what you mean with "redefine". Obviously you can use your own version of it, no problem at all. Also you can redefine it by subclassing if it's a method.
You can also make a new method, and patch it into the class, a practice known as monkey_patching. Like so:
from amodule import aclass
def newfunction(self, param):
do_something()
aclass.oldfunction = newfunction
This will make all instances of aclass use your new function instead of the old one, including instances in any "fourth-party" modules. This works and is highly useful, but it's regarded as very ugly and a last resort option. This is because there is nothing in the aclass code to suggest that you have overridden the method, so it's hard to debug. And even worse it gets when two modules monkeypatch the same thing. Then you really get confused.

Categories