Subclass numpy.ndarray without 'propagating' subtype

Subclass numpy.ndarray without 'propagating' subtype - python

I have a Python-based system which operates on abstract simplified representations of floor plans. Currently, a floor plan is represented as a plain NumPy array, and operations on it are implemented as functions which accept that array:
floor_plan = numpy.ndarray(...)
def find_rooms(floor_plan: numpy.ndarray) ...
I'd like to refactor the system to instead use a dedicated floor plan class, with these operations defined as method on that class. This will allow me to also store supplementary information on floor plan instances, perform centralized caching on its methods, etc.
However, I'd also like to retain the ability to treat a floor plan as an array. At the least, this is useful during the refactoring process, as all existing code will continue to work even when the new class is introduced. I can accomplish this by having my new class inherit from numpy.ndarray:
class FloorPlan(numpy.ndarray):
def find_rooms(self) ...
floor_plan = FloorPlan(...)
But there's a problem: if I derive an array from an operation on this subtype, then the derived array is also of the same subtype:
type(floor_plan[:, 0]) == FloorPlan
This is intentional by NumPy's design (https://numpy.org/doc/stable/user/basics.subclassing.html). However, it's not appropriate for my purposes, as a floor plan has discrete semantics which don't necessarily apply to every derived value.
Is there any way to disable or avoid this 'propagation' of my array subtype to derived arrays?

Related

Should Domain Model Classes always depend on primitives?

Halfway through Architecture Patterns with Python, I have two questions about how should the Domain Model Classes be structured and instantiated. Assume on my Domain Model I have the class DepthMap:
class DepthMap:
def __init__(self, map: np.ndarray):
self.map = map
According to what I understood from the book, this class is not correct since it depends on Numpy, and it should depend only on Python primitives, hence the question: Should Domain Model classes rely only on Python primitives, or is there an exception?
Assuming the answer to the previous question is that classes should solely depend on primitives, what would the correct way create a DepthMap from a Numpy array be? Assume now I have more formats from where I can make a DepthMap object.
class DepthMap:
def __init__(self, map: List):
self.map = map
#classmethod
def from_numpy(cls, map: np.ndarray):
return cls(map.tolist())
#classmethod
def from_str(cls, map: str):
return cls([float(i) for i in s.split(',')])
or a factory:
class DepthMapFactory:
#staticmethod
def from_numpy(map: np.ndarray):
return DepthMap(map.tolist())
#staticmethod
def from_str(map: str):
return DepthMap([float(i) for i in s.split(',')])
I think even the Repository Pattern, which they go through in the book, could fit in here:
class StrRepository:
def get(map: str):
return DepthMap([float(i) for i in s.split(',')])
class NumpyRepository:
def get(map: np.ndarray):
return DepthMap(map.tolist())
The second question: When creating a Domain Model Object from different sources, what is the correct approach?
Note: My background is not software; hence some OOP concepts may be incorrect. Instead of downvoting, please comment and let me know how to improve the question.

I wrote the book, so I can at least have a go at answering your question.
You can use things other than primitives (str, int, boolean etc) in your domain model. Generally, although we couldn't show it in the book, your model classes will contain whole hierarchies of objects.
What you want to avoid is your technical implementation leaking into your code in a way that makes it hard to express your intent. It would probably be inappropriate to pass instances of Numpy arrays around your codebase, unless your domain is Numpy. We're trying to make code easier to read and test by separating the interesting stuff from the glue.
To that end, it's fine for you to have a DepthMap class that exposes some behaviour, and happens to have a Numpy array as its internal storage. That's not any different to you using any other data structure from a library.
If you've got data as a flat file or something, and there is complex logic involved in creating the Numpy array, then I think a Factory is appropriate. That way you can keep the boring, ugly code for producing a DepthMap at the edge of your system, and out of your model.
If creating a DepthMap from a string is really a one-liner, then a classmethod is probably better because it's easier to find and understand.

I think it's perfectly fine to depend on librairies that are pure language extensions or else you will just end up with having to define tons of "interface contracts" (Python doesn't have interfaces as a language construct -- but those can be conceptual) to abstract away these data structures and in the end those newly introduced contracts will probably be poor abstractions anyway and just result in additional complexity.
That means your domain objects can generally depend on these pure types. On the other hand I also think these types should be considered as language "primitives" (native may be more accurate) just like datetime and that you'd want to avoid primitive obsession.
In other words, DepthMap which is a domain concept is allowed to depend on Numpy for it's construction (no abstraction necessary here), but Numpy shouldn't necessarily be allowed to flow deep into the domain (unless it's the appropriate abstraction).
Or in pseudo-code, this could be bad:
someOperation(Numpy: depthMap);
Where this may be better:
class DepthMap(Numpy: data);
someOperation(DepthMap depthMap);
And regarding the second question, from a DDD perspective if the
DepthMap class has a Numpy array as it's internal structure but has to
be constructed from other sources (string or list for example) would
the best approach be a repository pattern? Or is this just for
handling databases and a Factory is a better approach?
The Repository pattern is exclusively for storage/retrieval so it wouldn't be appropriate. Now, you may have a factory method directly on DepthMap that accepts a Numpy or you may have a dedicated factory. If you want to decouple DepthMap from Numpy then it could make sense to introduce a dedicated factory, but it seems unnecessary here at first glance.

Should Domain Model classes rely only on Python primitives
Speaking purely from a domain-driven-design perspective, there's absolutely no reason that this should be true
Your domain dynamics are normally going to be described using the language of your domain, ie the manipulation of ENTITIES and VALUE OBJECTS (Evans, 2003) that are facades that place domain semantics on top of your data structures.
The underlying data structures, behind the facades, are whatever you need to get the job done.
There is nothing in domain driven design requiring that you forsake a well-tested off the shelf implementation of a highly optimized Bazzlefraz and instead write your own from scratch.
Part of the point of domain driven design is that we want to be making our investment into the code that helps the business, not the plumbing.

Any way to prevent modifications to content of a ndarray subclass?

I am creating various classes for computational geometry that all subclass numpy.ndarray. The DataCloud class, which is typical of these classes, has Python properties (for example, convex_hull, delaunay_trangulation) that would be time consuming and wasteful to calculate more than once. I want to do calculations once and only once. Also, just in time, because for a given instance, I might not need a given property at all. It is easy enough to set this up by setting self.__convex_hull = None in the constructor and, if/when the convex_hull property is called, doing the required calculation, setting self.__convex_hull, and returning the calculated value.
The problem is that once any of those complicated properties is invoked, any changes to the contents made, external to my subclass, by the various numpy (as opposed to DataCloud subclass) methods will invalidate all the calculated properties, and I won't know about it. For example, suppose external code simply does this to the instance: datacloud[3,8] = 5. So is there any way to either (1) make the ndarray base class read-only once any of those properties is calculated or (2) have ndarray set some indicator that there has been a change to its contents (which for my purposes makes it dirty), so that then invoking any of the complex properties will require recalculation?

Looks like the answer is:
np.ndarray.setflags(write=False)

how to access np array element according to its last element

Actually, I am doing a sequence operation about numpy array, therefore, I want to know how to access a[i] quickly?
(Because I accessa[i-1] in the last loop, therefore, in c++, we may simply access a[i] by adding 1 to the address of a[i-1],but I don't know whether it is possible in numpy. Thanks.

I don't think this is possible/a[i] is the fastest way.
Python is a programming language that is easier to learn (and use) than c++, this of course comes at a cost, one of these costs is, that it's slower.
The references you're talking about can be "dangerous", hence python makes them not (easily) available to people, to protect them from things the do not understand.
While references are faster, you can't use them in python (as it is anyway slower, the difference in using references or not doesn't matter that much)

It's best not to think of a Python NumPy: ndarray as a C++ array. They're much different. Python also offers its own native list objects and includes an array module in its standard libraries.
A Python list behaves mostly like a generic array (as found in many programming languages). It's an ordered sequence; elements can be accessed by integer index from 0 up through (but not including) the length of the list (len(mylist)); ranges of elements can be access using "slice" notation (mylist[start_offset:end_offset]) returning another list object; negative indexes are treated as offsets from the end of the list (mylist[-1] is the last item of the list) and so on.
Additionally they support a number of methods such as .count(), .find() and .replace().
Unlike the arrays in most programming languages, Python lists are heterogenous. The elements can be any mixture of any object types in Python, including references to nested lists, dictionaries, code, classes, generator objects and other callable first class objects and, of course, instances of custom objects.
The Python array module allows one to instantiate homogenous list-like objects. That is you instantiate them using any of a dozen primitive data types (character or Unicode, signed or unsigned short or long integers, floating point or double precision floating point). Indexing and slicing are identical to Python native lists.
The primary advantage of Python array.array() instances is that they can store large numbers of their elements far more compactly than the more generalized list objects. Various operations on these arrays are likely to be somewhat faster than similar operations performed by iterating over or otherwise referencing elements in a native Python list because there's greater locality of reference in the more compact array layout (in memory) and because the type constraint obviates some dispatch overhead that's incurred when handling generalized objects.
NumPy, on the other hand, is far more sophisticated than the Python array module.
For one thing the ndarray can be multi-dimensional and can be dynamically reshaped. It's common to start with a linear ndarray and to reshape it into a matrix or other higher dimensional structure. Also the ndarray supports a much richer set of data types than the Python array module. NumPy also implements some rather advanced fancy indexingfeatures.
But the real performance advantages of NumPy relate to how it "vectorizes" most operations, broadcasts them across the data structures (possibly using any SIMD features supported by your CPU or even your GPU in the process. At the very list many common matrix operations, when properly written in Python for NumPy, are execute as native machine code speed. This performance edge does well beyond the minor effects of locality of references and obviating dispatch tables that one gains using the simple array module.

Can monads do what this OOP can't?

Trying to understand monads and wondering if they'd be useful for data transformation programming in Python, I looked at many introductory explanations. But I don't understand why monads are important.
Is the following Python code a good representation of the Maybe monad?
class Some:
def __init__(self, val):
self.val=val
def apply(self, func):
return func(self.val)
class Error:
def apply(self, func):
return Error()
a = Some(1)
b = a.apply(lambda x:Some(x+5))
Can you give an example of a monad solution, which cannot be transformed into such OOP code?
(Do you think monads for data transformation in OOP languages can be useful?)

Here is a post that discusses monads using Swift as the language that might help you make more sense of them: http://www.javiersoto.me/post/106875422394
The basic point is that a monad is an object that has three components.
A constructor that can take some other object, or objects, and create a monad wrapping that object.
A method that can apply a function that knows nothing about the monad to its contents and return the result wrapped in a monad.
A method that can apply a function that takes the raw object and returns a result wrapped in the monad, and return that monad.
Note that this means even an Array class can be a monad if the language treats methods as first class objects, and the necessary methods exist.
If the language doesn't treat methods as first class objects, even if it's an OO language, it will not able able to implement Monads. I think your confusion may be coming from the fact that you are using a multi-paradigm language (Python) and assuming it is a pure OO language.

The apply you've written corresponds to the monad function called bind.
Given that and the constructor Some (which for a generalized monad is called unit or return) you can define the functions fmap and apply which promote other kinds of functions: where bind promotes and uses a function that takes plain data and returns a monad, fmap does so for for functions from plain data to plain data, and apply does so for functions that are themselves wrapped inside the monad.
You would have issues, I think, defining the last monadic operation: join. join takes a nested monad and flattens it to a single layer. So this is a method that can't be used on all objects of the class, only those with a particular structure. And while join can be avoided if you define bind instead (you can write join in terms of bind and the identity function), it's inconvenient to define bind for array-like monads. Functions on arrays are more naturally described in terms of fmap (loop over the array and apply the function) and join (take a nested array and flatten it).
I think if you can attack these challenges, you can implement a monad in whatever language you choose. And I do think that monads are useful in many languages, as many sub-computations can be described using them. They represent a known solution to many common problems, and if they're already available to you, that means code you don't have to test or debug; they're tools that already work.
Implementing them yourself lets you use the functional literature and way of thinking to attack problems. Whether that convenience is worth the effort of implementing them is up to you.

Explain polymorphism

What is polymorphism? I'm not sure I am understanding it correctly.
In the Python scope, what I am getting out of it is that I can define parameters as followed:
def blah (x, y)
without having to specify the type, as opposed to another language like Java, where it'd look more along the lines of:
public void blah (int x, string y)
Am I getting that right?

Beware that different people use different terminology; in particular there is often a rift between the object oriented community and the (rest of the) programming language theory community.
Generally speaking, polymorphism means that a method or function is able to cope with different types of input. For example the add method (or + operator) in the Integer class might perform integer addition, while the add method in the Float class performs floating-point addition, and the add method in the Bignum class performs the correct operations for an arbitrary-size number. Polymorphism is the ability to call the add method on an object without knowing what kind of a number it is.
One particular kind of polymorphism, usually called parametric polymorphism in the functional community and generic programming in the OOP community, is the ability to perform certain operations on an object without caring about its precise type. For example, to reverse a list, you don't need to care about the type of the elements of the list, you just need to know that it's a list. So you can write generic (in this sense) list reversal code: it'll work identically on lists of integers, strings, widgets, arbitrary objects, whatever. But you can't write code that adds the elements of a list in a generic way, because the way the elements are interpreted as numbers depends on what type they are.
Another kind of polymorphism, usually called ad-hoc polymorphism or (at least for some forms of it) generic programming in the functional community, and often subtyping polymorphism (though this somewhat restricts the concept) in the OOP community, it the ability to have a single method or function that behaves differently depending on the precise type of its arguments (or, for methods, the type of the object whose method is being invoked). The add example above is ad-hoc polymorphism. In dynamically typed languages this ability goes without saying; statically-typed languages tend to (but don't have to) have restrictions such as requiring that the argument be a subclass of some particular class (Addable).
Polymorphism is not about having to specify types when you define a function. That's more related to static vs. dynamic typing, though it's not an intrinsic part of the issue. Dynamically typed languages have no need for type declarations, while statically typed languages usually need some type declarations (going from quite a lot in Java to almost none in ML).

Hope from this example, you will understand what Polymorphism is. In this picture, all objects have a method Speak() but each has a different implementation. Polymorphism allows you to do this, you can declare an action for a class and its subclasses but for each subclass, you can write exactly what you want later.

The answers you've gotten are good, and explain what polymorphism is. I think it can also help to understand some of the reasons it is useful.
In some languages that lack polymorphism you find yourself in situations where you want to perform what is conceptually the same operation on different types of objects, in cases where that operation has to be implemented differently for each type. For instance, in a python-like syntax:
def dosomething(thing):
if type(thing)==suchandsuch:
#do some stuff
elif type(thing)==somesuch:
#do some other stuff
elif type(thing)==nonesuch:
#yet more stuff
There are some problems with this. The biggest is that it causes very tight coupling and a lot of repetition. You are likely to have this same set of tests in a lot of places in your code. What happens if you add a new type that has to support this operation? You have to go find every place you have this sort of conditional and add a new branch. And of course you have to have access to all the source code involved to be able to make those changes. On top of that conditional logic like this is wordy, and hard to understand in real cases.
It's nicer to be able to just write:
thing.dosomething()
On top of being a lot shorter this leads to much looser coupling. This example/explanation is geared to traditional OO languages like Python. The details are a bit different in, say, functional languages. But a lot of the general utility of polymorphism remains the same.

Polymorphism literally means "many shapes", and that's pretty good at explaining its purpose. The idea behind polymorphism is that one can use the same calls on different types and have them behave appropriately.
It is important to distinguish this from the typing system - strongly typed languages require that objects be related through an inheritance chain to be polymorphic, but with weakly typed languages, this is not necessary.
In Java (et al.), this is why interfaces are useful - they define the set of functions that can be called on objects without specifying the exact object - the objects that implement that interface are polymorphic with respect to that interface.
In Python, since things are dynamically typed, the interface is less formal, but the idea is the same - calling foo() will succeed on two objects that know how to properly implement foo(), but we don't care about what type they really are.

No, that is not polymorphism. Polymorphism is the concept that there can be many different implementations of an executable unit and the difference happen all behind the scene without the caller awareness.
For example, Bird and Plane are FlyingObject. So FlyingObject has a method call fly() and both Bird and Plane implement fly() method. Bird and Plan flies differently so the implementations are different. To the clients point of view, it does not matter how Bird or Plane fly, they can just call fly() method to a FlyingObject object does not matter if that object is Plan or Bird.
What you are describing is dynamic and static type checking which the type compatibility is done at run-time and compile-time respectively.
Hope this out.
NawaMan

Short answer: The ability to treat programmatic type instances of different types as the same for certain purposes.
Long answer:
From Ancient Greek poly (many) + morph (form) + -ism.
Polymorphism is a general technique enabling different types to be treated uniformly in some way. Examples in the programming world include:
parametric polymorphism (seen as
generics in Java)
subtyping
polymorphism, implemented in Java
using dynamic message dispatch
between object instances.
ad-hoc
polymorphism, which relates to the
ability to define functions of the
same name that vary only by the types
they deal with (overloading in Java).
The word polymorphism is also used to describe concepts in other, unrelated, domains such as genetics.

What you are talking about is auto-typing--or maybe type detection. It is something a Dynamic language tends to do--it allows the user to not know or care about the types at build time--the types are figured out at runtime and not restricted to a specific type.
Polymorphism is where you have two classes that inherit from a main class but each implement a method differently.
For instance, if you have Vehicle as a root class and Car and Bicycle as instances, then vehicle.calculateDistance() would operate based on gas available if the vehicle is an instance of Car, and would operate based on the endurance of the cyclist if it was a Bicycle type.
It's generally used like this:
getTotalDistance(Vehicle[] vehicle) {
int dist=0
for each vehicle
dist+=vehicle.calculateDistance();
Note that you are passing in the base type, but the instances will never be Vehicle itself, but always a child class of Vehicle--but you DO NOT cast it to the child type. Polymorphis means that vehicle morphs into whatever child is required.

Yes, that is an example of "type-polymorphism". However, when talking about object-oriented programming "polymorphism" typically relates to "subtype polymorphism." The example you gave is often called "typing".
Java, C, C++ and others, use static typing. In that, you have to specify the types are compile time.
Python, and Ruby use dynamic in that the typing is inferred during interpretation.
Subtype polymorphism or just "polymorphism" is the ability for a base class reference that is a derived type, to properly invoke the derived type's method.
For example (in near pseudo code):
class Base
{
virtual void Draw() { //base draw}
}
class Derived: public Base
{
void Draw() { //derived draw}
}
Base* myBasePtr = new Derived();
myBasePtr->Draw(); //invokes Derived::Draw(), even though it's a base pointer.
This is polymorphism.

Polymorphism:
One method call works on several classes, even if the classes need different implementations;
Ability to provide multiple implementations of an action, and to select the correct implementation based on the surrounding context;
Provides overloading and overriding;
Could occurs both in Runtime and Compile-Time;
Run time Polymorphism :
Run time Polymorphism also known as method overriding
Method overriding means having two or more methods with the same name , same signature but with different implementation
Compile time Polymorphism :
Compile time Polymorphism also known as method overloading
Method overloading means having two or more methods with the same name but with different signatures
In computer science, polymorphism is a programming language feature that allows values of different data types to be handled using a uniform interface. The concept of parametric polymorphism applies to both data types and functions. A function that can evaluate to or be applied to values of different types is known as a polymorphic function. A data type that can appear to be of a generalized type (e.g., a list with elements of arbitrary type) is designated polymorphic data type like the generalized type from which such specializations are made.
Disadvantages of Polymorphism:
Polymorphism reduces readability of the program. One needs to visualize runtime behaviour of program to identify actual execution time class involved. It also becomes difficult to navigate through classes implementing this concept. Even sofasticated IDEs can not provide this navigation feature. This adds to the maintainance cost to some extent.

Polymorphism - The same object acting differently based on the scenario it is in. For example, if a 12-year old kid was in a room with a bunch of kids, the type of music they would listen to would be different than if a 12 year old kid was in a room full of adults. The 12 year old kid is the same, however the kid acting differently based on the scenario it is in (the different music).

The ability to define a function in multiple forms is called Polymorphism. In C#, Java, C++ there are two types of polymorphism: compile time polymorphism (overloading) and runtime polymorphism (overriding).
Overriding: Overriding occurs when a class method has the same name and signature as a method in parent class.
Overloading: Overloading is determined at the compile time. It occurs when several methods have same names with:
Different method signature and different number or type of
parameters.
Same method signature but different number of parameters.
Same method signature and same number of parameters but of different type

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.