Using super to make a pipeline? - python

I was thinking about how to use super to make a pipeline in Python. I have a series of transformations I must apply to a stream, and I thought a good way to do it was something along these lines:
class MyBase(object):
    def transformData(self, x):
        return x

class FirstStage(MyBase):
    def transformData(self, x):
        y = super(FirstStage, self).transformData(x)
        return self.__transformation(y)

    def __transformation(self, x):
        return x * x

class SecondStage(FirstStage):
    def transformData(self, x):
        y = super(SecondStage, self).transformData(x)
        return self.__transformation(y)

    def __transformation(self, x):
        return x + 1
It works as I intended, but there's a potential repetition. If I have N stages, I'll have N identical transformData methods where the only thing I change is the name of the current class.
Is there a way to remove this boilerplate? I tried a few things but the results only proved to me that I hadn't understood perfectly how super worked.
What I wanted was to define only the __transformation method and naturally inherit a transformData method that would go up the MRO, call that class's transformData method, and then call the current class's __transformation on the result. Is it possible, or do I have to define a new, identical transformData for each child class?
I agree that this is a poor way of implementing a pipeline; it can be done with much simpler (and clearer) schemes. I thought of this as the smallest modification I could make to an existing model to get a pipeline out of the existing classes without changing the code too much. It would be a trick, and tricks should be avoided. I also saw it as a way to better understand how super works.
Buuuut. Out of curiosity... is it possible to do it in the above scheme without repeating transformData? This is a genuine question. Is there a trick to inherit transformData in a way that its super call is made relative to the current class?
It would be a tremendously unclear, unreadable, smart-ass trickery. I know. But is it possible?
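It does seem possible if you drop the per-class super() call and let a single transformData in the base class walk the MRO itself. A minimal sketch in Python 3, assuming each stage renames its step to a single-underscore _transformation (instead of the name-mangled __transformation) and that only the step defined directly in each class's own __dict__ should run, most basic class first:

```python
class MyBase:
    def transformData(self, x):
        # Walk the MRO from the most basic class up to the most derived one.
        for cls in reversed(type(self).__mro__):
            # Only look at the step defined directly on this class,
            # so an inherited _transformation is not applied twice.
            step = cls.__dict__.get("_transformation")
            if step is not None:
                x = step(self, x)
        return x


class FirstStage(MyBase):
    def _transformation(self, x):
        return x * x


class SecondStage(FirstStage):
    def _transformation(self, x):
        return x + 1


print(SecondStage().transformData(3))  # (3 * 3) + 1 = 10
```

With this, a ThirdStage(SecondStage) only needs its own _transformation; no transformData override is required. Note that this swaps super() for an explicit MRO walk, so it is a different technique from the one in the question.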

I don't think using inheritance for a pipeline is the right way to go.
Instead, consider something like this, with "simple" examples and a parametrized one (a class using the __call__ magic method; a function returning a closure would do too, or even "JITing" one by way of eval).
def two_power(x):
    return x * x

def add_one(x):
    return x + 1

class CustomTransform(object):
    def __init__(self, multiplier):
        self.multiplier = multiplier

    def __call__(self, value):
        return value * self.multiplier

def transform(data, pipeline):
    for datum in data:
        # Apply each step in order to the current datum.
        for step in pipeline:
            datum = step(datum)
        yield datum

pipe = (two_power, two_power, add_one, CustomTransform(1.25))
print(list(transform([1, 2, 4, 8], pipe)))
would output
[2.5, 21.25, 321.25, 5121.25]
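As mentioned above, a function returning a closure can stand in for the CustomTransform class. A minimal sketch; make_multiplier is a hypothetical name, and two_power, add_one, and transform are repeated so the block runs on its own:

```python
def make_multiplier(multiplier):
    """Hypothetical closure-based replacement for CustomTransform."""
    def multiply(value):
        # `multiplier` is captured from the enclosing call.
        return value * multiplier
    return multiply


def two_power(x):
    return x * x


def add_one(x):
    return x + 1


def transform(data, pipeline):
    for datum in data:
        for step in pipeline:
            datum = step(datum)
        yield datum


pipe = (two_power, two_power, add_one, make_multiplier(1.25))
result = list(transform([1, 2, 4, 8], pipe))
print(result)  # [2.5, 21.25, 321.25, 5121.25]
```

The closure keeps its state in the enclosing scope instead of on an instance; which one to use is mostly a matter of taste, though a class is easier to inspect or extend later.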

The problem is that using inheritance here is rather weird in terms of OOP. And do you really need to define the whole chain of transformations when defining classes?
But it's better to forget OOP here; this task doesn't call for it. Just define functions for the transformations:
def get_pipeline(*functions):
    def pipeline(x):
        # Feed the output of each function into the next one.
        for f in functions:
            x = f(x)
        return x
    return pipeline

p = get_pipeline(lambda x: x * 2, lambda x: x + 1)
print(p(5))  # 11
An even shorter version:
from functools import reduce  # a builtin in Python 2, moved to functools in Python 3

def get_pipeline(*fs):
    return lambda v: reduce(lambda x, f: f(x), fs, v)

p = get_pipeline(lambda x: x * 2, lambda x: x + 1)
print(p(5))  # 11
And here is an OOP solution. It is rather clumsy compared with the previous ones:
class Transform(object):
    def __init__(self, prev=None):
        self.prev_transform = prev

    def transformation(self, x):
        raise NotImplementedError

    def transformData(self, x):
        # Run the previous transform in the chain first, if any.
        if self.prev_transform is not None:
            x = self.prev_transform.transformData(x)
        return self.transformation(x)

class TransformAdd1(Transform):
    def transformation(self, x):
        return x + 1

class TransformMul2(Transform):
    def transformation(self, x):
        return x * 2

t = TransformAdd1(TransformMul2())
print(t.transformData(1))  # 1 * 2 + 1 = 3

Related

How to define toString method?

I have almost finished my assignment, and the only thing I have left is to define the toString method shown here.
import math

class RegularPolygon:
    def __init__(self, n = 1, l = 1):
        self.__n = n
        self.__l = l

    def set_n(self, n):
        self.__n = n

    def get_n(self):
        return self.__n

    def addSides(self, x):
        self.__n = self.__n + x

    def setLength(self, l):
        self.__l = l

    def getLength(self):
        return self.__l

    def setPerimeter(self):
        return (self.__n * self.__l)

    def getArea(self):
        return (self.__l ** 2 / 4 * math.tan(math.radians(180/self.__n)))

    def toString(self):
        return

x = 3
demo_object = RegularPolygon(3, 1)
print(demo_object.get_n(), demo_object.getLength())
demo_object.addSides(x)
print(demo_object.get_n(), demo_object.getLength())
print(demo_object.getArea())
print(demo_object.setPerimeter())
Basically, toString should return a string that includes the values of the internal variables. I also need help with the getArea portion.
Assignment instructions
The assignment says
... printing a string representation of a RegularPolygon object.
So I would expect you get to choose a suitable "representation". You could go for something like this:
return f'{self.__n}-sided regular polygon of side length {self.__l}'
or, as suggested by @Roy Cohen:
return f'{self.__class__.__name__}({self.__n}, {self.__l})'
However, as @Klaus D. wrote in the comments, Python is not Java, and it has its own conventions and magic methods to use instead.
I recommend reading this answer for an explanation of the differences between the two built-in string representation magic methods: __repr__ and __str__. If you implement these methods, Python calls them automatically when you use print(), str(), or repr(), so you don't have to call .toString() every time.
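A minimal sketch of the two magic methods on a trimmed-down RegularPolygon. The attributes are renamed to _n and _length here (an assumption, to avoid name mangling), and the exact wording of both strings is just one possible choice:

```python
class RegularPolygon:
    def __init__(self, n=1, length=1):
        self._n = n
        self._length = length

    def __repr__(self):
        # Developer-facing form that mirrors the constructor call.
        return f"{type(self).__name__}({self._n!r}, {self._length!r})"

    def __str__(self):
        # Human-readable form used by print() and str().
        return f"{self._n}-sided regular polygon with side length {self._length}"


p = RegularPolygon(6, 2)
print(p)        # 6-sided regular polygon with side length 2
print(repr(p))  # RegularPolygon(6, 2)
print([p])      # containers use repr: [RegularPolygon(6, 2)]
```

If only __repr__ is defined, print() falls back to it, so __repr__ alone is often enough.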
Now to address the getters and setters. In Python you typically avoid these and use properties instead. See this answer for more information, but to summarise: either access the object's attributes directly, or use the @property decorator to turn a method into a property.
Edit
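Your area formula most likely has an order-of-operations error: self.__l ** 2 / 4 * math.tan(...) divides by 4 and then multiplies by the tangent, instead of dividing by their product. Make the grouping explicit. The standard area of a regular polygon, n * l**2 / (4 * tan(pi / n)), also needs the factor n (the number of sides):
return self.__n * self.__l ** 2 / (4 * math.tan(math.radians(180 / self.__n)))
As a quick check, a square (n = 4) with side length 1 gives an area of 1, up to floating-point rounding.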
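Putting the property advice together with the standard area formula n * l**2 / (4 * tan(pi / n)), the class might look like the sketch below. The validation rule in the n setter, the defaults, and the attribute names are assumptions for illustration, not part of the assignment:

```python
import math


class RegularPolygon:
    def __init__(self, n=3, length=1.0):
        self.n = n            # assignment goes through the n setter below
        self.length = length  # plain attribute: no getter/setter needed

    @property
    def n(self):
        return self._n

    @n.setter
    def n(self, value):
        # Hypothetical validation rule: a polygon needs at least 3 sides.
        if value < 3:
            raise ValueError("a regular polygon needs at least 3 sides")
        self._n = value

    @property
    def perimeter(self):
        return self.n * self.length

    @property
    def area(self):
        # n * l**2 / (4 * tan(pi / n))
        return self.n * self.length ** 2 / (4 * math.tan(math.pi / self.n))


poly = RegularPolygon(4, 2)
print(poly.perimeter)          # 8
poly.n += 2                    # replaces addSides(2)
print(poly.n, poly.perimeter)  # 6 12
```

Callers read and write poly.n and poly.length like plain attributes, while the class keeps the option of adding checks later without changing that interface.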

Python method calls in constructor and variable naming conventions inside a class

I'm trying to process some data in Python, and I defined a class for a sub-type of the data. You can find a very simplified version of the class definition below.
import numpy as np

class MyDataClass(object):
    def __init__(self, input1, input2, input3):
        """
        input1 and input2 are 1D arrays
        input3 is a 2D array
        """
        self._x_value = None  # int
        self._y_value = None  # int
        self.data_array_1 = None  # 2D array
        self.data_array_2 = None  # 1D array
        self.set_data(input1, input2, input3)

    def set_data(self, input1, input2, input3):
        self._x_value, self._y_value = self.get_x_and_y_value(input1, input2)
        self.data_array_1 = self.get_data_array_1(input1)
        self.data_array_2 = self.get_data_array_2(input3)

    @staticmethod
    def get_x_and_y_value(input1, input2):
        # do some stuff
        return x_value, y_value

    def get_data_array_1(self, input1):
        # do some stuff
        return input1[self._x_value:self._y_value + 1]

    def get_data_array_2(self, input3):
        q = self.data_array_1 - input3[self._x_value:self._y_value + 1, :]
        return np.linalg.norm(q, axis=1)
I'm trying to follow the 'Zen of Python' and write beautiful code. I'm quite skeptical about whether the class definition above is good practice. While thinking about alternatives, I came up with the following questions, on which I would appreciate your opinions and suggestions.
Does it make sense to define "get" and "set" methods?
IMHO, since the resulting data will be used several times (in several plots and computation routines), it is more convenient to create and store it once. Hence, I calculate the data arrays once in the constructor.
I don't deal with large amounts of data, so processing takes no more than a second; however, I cannot estimate the potential RAM implications if someone used the same procedure on huge data.
Should I move get_x_and_y_value() out of the class scope and convert the static method to a function?
Since the method is only called inside the class definition, I think it is better as a static method. If I should define it as a function, should I put all the code relevant to this class in a script and make a module of it?
The argument names of get_x_and_y_value() are the same as those of the __init__ method. Should I change them?
Keeping them would ease refactoring but could confuse others who read the code.
In Python, you do not need getter and setter methods; use properties instead. This is why you can access attributes directly in Python, unlike in languages such as Java, where getters and setters are the standard way to protect attributes.
Consider the following example of a Circle class. Because we can use the @property decorator, we don't need the getter and setter methods that other languages rely on. This is the Pythonic answer.
This should address all of your questions.
class Circle(object):
    def __init__(self, radius):
        self.radius = radius
        self.x = 0
        self.y = 0

    @property
    def diameter(self):
        return self.radius * 2

    @diameter.setter
    def diameter(self, value):
        self.radius = value / 2

    @property
    def xy(self):
        return (self.x, self.y)

    @xy.setter
    def xy(self, xy_pair):
        self.x, self.y = xy_pair
>>> c = Circle(radius=10)
>>> c.radius
10
>>> c.diameter
20
>>> c.diameter = 10
>>> c.radius
5.0
>>> c.xy
(0, 0)
>>> c.xy = (10, 20)
>>> c.x
10
>>> c.y
20
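For the "compute once and store" concern in the question, functools.cached_property (Python 3.8+) is one option: each array is computed on first access and then stored on the instance, and arrays that are never accessed are never computed. A minimal sketch under stated assumptions: plain lists of rows stand in for the NumPy arrays so it runs without NumPy, input1 and input3 are assumed to have the same number of columns, and get_x_and_y_value is a hypothetical stand-in for the question's "do some stuff":

```python
import math
from functools import cached_property  # Python 3.8+


def get_x_and_y_value(input1, input2):
    # Hypothetical stand-in: input2 holds the first and last row index to keep.
    return input2[0], input2[-1]


class MyDataClass:
    def __init__(self, input1, input2, input3):
        # Only cheap work here: keep the inputs and compute the window.
        self.input1 = input1
        self.input3 = input3
        self.x_value, self.y_value = get_x_and_y_value(input1, input2)

    @cached_property
    def data_array_1(self):
        # Computed on first access, then stored on the instance.
        return self.input1[self.x_value:self.y_value + 1]

    @cached_property
    def data_array_2(self):
        # Row-wise Euclidean distance, like np.linalg.norm(a - b, axis=1).
        rows = self.input3[self.x_value:self.y_value + 1]
        return [math.dist(a, b) for a, b in zip(self.data_array_1, rows)]


d = MyDataClass([[0, 0], [3, 4], [6, 8]], [0, 1], [[0, 0], [0, 0], [1, 1]])
print(d.data_array_2)                    # [0.0, 5.0]
print(d.data_array_1 is d.data_array_1)  # True: cached, not recomputed
```

Deleting the attribute (del d.data_array_1) drops the cached value, which frees the memory and forces a recomputation on the next access.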

Is it OK to replace a method by a plain function?

This works as expected, but I am somehow unsure about this approach. Is it safe? Is it pythonic?
class Example:
    def __init__(self, parameter):
        if parameter == 0:
            # trivial case, the result is always zero
            self.calc = lambda x: 0.0  # <== replacing a method
        self._parameter = parameter

    def calc(self, x):
        # ... long calculation of result ...
        return result
(If there is any difference between Python2 and Python3, I'm using Python3 only.)
This is very confusing. If someone else reads it, they won't understand what is going on. Just put an if statement at the beginning of your method.
def calc(self, x):
    if self._parameter == 0:
        return 0.0
    # ... long calculation of result ...
    return result
Also, with the replacement approach, if you change self._parameter after the object was initialized with 0, calc keeps returning 0.0, so it no longer works correctly.
You'll have a problem should parameter ever change, so I don't consider it good practice.
Instead, I think you should do this:
class Example:
    def __init__(self, parameter):
        self._parameter = parameter

    def calc(self, x):
        if not self._parameter:
            return 0.0
        # ... long calculation of result ...
        return result
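The staleness problem raised in these answers can be shown directly. A minimal sketch; Replaced and Checked are hypothetical names, and self._parameter * x stands in for the long calculation:

```python
class Replaced:
    def __init__(self, parameter):
        if parameter == 0:
            self.calc = lambda x: 0.0  # decided once, at construction time
        self._parameter = parameter

    def calc(self, x):
        return self._parameter * x  # stand-in for the long calculation


class Checked:
    def __init__(self, parameter):
        self._parameter = parameter

    def calc(self, x):
        if not self._parameter:  # decided on every call
            return 0.0
        return self._parameter * x


r, c = Replaced(0), Checked(0)
r._parameter = c._parameter = 3  # change the parameter later
print(r.calc(2))  # 0.0: stale, the lambda is still installed
print(c.calc(2))  # 6
```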
I decided to post a summary of several comments and answers. Please do not vote for this summary, but give +1 to the original authors instead.
- the approach is safe, except for special __methods__, which Python looks up on the class rather than on the instance
- the approach is deemed unpythonic, undesirable, or unnecessary
- the parameter that determines which function to use must be constant; if it is not, this approach makes no sense at all
- of the several suggestions, I prefer the code below for general cases, and the obvious if cond: return 0.0 for simple cases:
class Example:
    def __init__(self, parameter):
        if parameter == 0:
            self.calc = self._calc_trivial
        else:
            # ... pre-compute data if necessary ...
            self.calc = self._calc_regular
        self._parameter = parameter

    def _calc_regular(self, x):
        # ... long calculation of result ...
        return result

    @staticmethod
    def _calc_trivial(x):
        return 0.0
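The first point of the summary, about special methods, is easy to demonstrate: implicit invocations such as len(obj) look the special method up on the type, so an instance attribute with the same name is ignored. A minimal sketch with a hypothetical Box class:

```python
class Box:
    def __len__(self):
        return 1


b = Box()
b.__len__ = lambda: 99  # instance attribute shadows the method for normal lookup
print(b.__len__())      # 99: explicit attribute access finds the instance attribute
print(len(b))           # 1: len() looks up __len__ on the type, not the instance
```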

Python nested Classes - returning multiple values

I'm fairly new to classes in python, so please be gentle. My script is a tad more complicated than this, but this is essentially what it boils down to:
class primary_state:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
        self.substates = []

    def add_substate(self, i, j, k):
        self.substates.append(self.substate(i, j, k))

    class substate:
        def __init__(self, i, j, k):
            self.i = i
            self.j = j
            self.k = k

state = primary_state(1, 2, 3)
state.add_substate(4, 5, 6)
state.add_substate(7, 8, 9)
Now my question is: is it possible to get an array of one attribute's values from all the substate objects? For example, I'd like to do:
state.substates[:].i
and have it return the values 4 and 7, but alas, substates is a list, so it can't handle that. There must also be a more efficient way to do this, but I haven't quite figured it out yet. Any advice or thoughts would be greatly appreciated! Thanks.
Use a list comprehension.
[sub.i for sub in state.substates]
This is roughly equivalent to the following:
x = []
for sub in state.substates:
    x.append(sub.i)
except that the comprehension is shorter, and it's an expression you can embed in other expressions instead of a series of statements.
You can get the list of i values with:
[substate.i for substate in state.substates]
List comprehensions are the way to do it, as the other answers point out.
If the only job of the primary state class is to hold substates, you can make your class behave like an iterable. In the example you give this is mostly syntactic sugar, but it can be useful. Complete instructions on how to do it are here, but it's pretty simple:
class PrimaryState(object):  # always use "new style" classes! it's 2013!
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z
        self.substates = []

    def __len__(self):
        return len(self.substates)

    def __getitem__(self, index):
        return self.substates[index]

    def __iter__(self):
        for sub in self.substates:
            yield sub

    def __contains__(self, item):
        return item in self.substates

    def add(self, item):
        self.substates.append(item)
This way you can do:
primary = PrimaryState(1, 2, 3)
primary.add(SubState(4, 5, 6))
primary.add(SubState(7, 8, 9))
for item in primary:
    print(item)
# Substate 4,5,6
# Substate 7,8,9
PS: Check out PEP 8, the standard Python style guide, for naming classes and so on. And use new-style classes (inheriting from object) in Python 2; down the road it's important!
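The snippet above uses a SubState class it never defines, and its output comments assume a particular printed form. A self-contained sketch, with a hypothetical __repr__ chosen to produce that output; here __iter__ simply returns iter(self.substates), and __contains__ is left out because the in operator falls back to __iter__:

```python
class SubState(object):
    def __init__(self, i, j, k):
        self.i, self.j, self.k = i, j, k

    def __repr__(self):
        # Hypothetical format matching the "# Substate 4,5,6" comments.
        return "Substate {},{},{}".format(self.i, self.j, self.k)


class PrimaryState(object):
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z
        self.substates = []

    def __len__(self):
        return len(self.substates)

    def __getitem__(self, index):
        return self.substates[index]

    def __iter__(self):
        return iter(self.substates)

    def add(self, item):
        self.substates.append(item)


primary = PrimaryState(1, 2, 3)
primary.add(SubState(4, 5, 6))
primary.add(SubState(7, 8, 9))

for item in primary:
    print(item)                    # Substate 4,5,6 then Substate 7,8,9
print([sub.i for sub in primary])  # [4, 7]
```

Because the container is iterable, the list comprehension from the other answers works on it directly.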

Is there a way to get a view of several lists/variables as a list like thing in python

I have several variables like:
class X(object):
    ...

class XY(X):
    ...

class XZ(X):
    ...

class XA(X):
    ...

y = XY()
z = [XZ(i) for i in range(1, 10)]
a = [XA(i) for i in range(1, 10)]
I would like to have a list-like view (including iteration and length) over the variable y and the elements of z and a.
This is purely for convenience; I'm not worried about performance.
I could just do
view = [self.y] + self.z + self.a
each time, but that seems to break the DRY principle.
Edit: to clarify, this isn't about getting the instance variables of a class. I just want a view class, probably implementing a list-like interface, that forwards to other variables.
Or would it be better to make a closure that returns a view list when called (since I don't care about performance)? Which is simpler, more Pythonic, or a better idea? How would I implement a list-like forwarding class? Etc.
If I understand correctly, you want to "chain" the iterables self.z and self.a and add self.y to them. Perhaps this does what you want:
import itertools
itertools.chain(self.a, self.z, [self.y])
The cleanest would probably be to implement it like this:
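import itertools

class C(object):
    def __init__(self, y, z, a):
        self.y = y
        self.z = z
        self.a = a

    def __iter__(self):
        # A fresh chain on every iteration, so the object can be iterated repeatedly.
        return itertools.chain(self.a, self.z, [self.y])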
class X:
    def __init__(self, x, y, z):
        self.x = x
        self.y = y
        self.z = z

x = X(1, 2, 3)
print(x.__dict__)  # {'x': 1, 'y': 2, 'z': 3}
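For the "list-like forwarding class" asked about in the question, one option is to subclass collections.abc.Sequence: implementing only __len__ and __getitem__ gives iteration, in, index, count, and reversed for free. A minimal sketch; ChainView is a hypothetical name, slicing simply materializes a list, and y is wrapped in a one-element list, so rebinding y later is not reflected (mutating the lists z and a is):

```python
from collections.abc import Sequence


class ChainView(Sequence):
    """Read-only, list-like view over several sequences, without copying them."""

    def __init__(self, *parts):
        self._parts = parts

    def __len__(self):
        return sum(len(part) for part in self._parts)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Simple approach: build a list, then slice it.
            return list(self)[index]
        if index < 0:
            index += len(self)
        # Find which underlying sequence holds this position.
        for part in self._parts:
            if 0 <= index < len(part):
                return part[index]
            index -= len(part)
        raise IndexError("ChainView index out of range")


y = "y"
z = ["z1", "z2"]
a = ["a1"]
view = ChainView([y], z, a)   # y wrapped in a one-element list
print(len(view), list(view))  # 4 ['y', 'z1', 'z2', 'a1']
z.append("z3")                # the view sees changes to z
print(len(view), view[-1])    # 5 a1
```

Unlike the concatenation [self.y] + self.z + self.a, nothing is copied, and unlike a bare itertools.chain, the view can be iterated more than once and supports len().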
