In my code I have a class where one method is responsible for filtering some data. To allow customization in descendants I would like to define the filtering function as a class attribute, as below:
def my_filter_func(x):
    return x % 2 == 0

class FilterClass(object):
    filter_func = my_filter_func

    def filter_data(self, data):
        return filter(self.filter_func, data)

class FilterClassDescendant(FilterClass):
    filter_func = my_filter_func2
However, this code raises a TypeError, because filter_func receives "self" as its first argument.
What is a Pythonic way to handle such use cases? Perhaps I should define my "filter_func" as a regular class method?
You could just add it as a plain old attribute?
def my_filter_func(x):
    return x % 2 == 0

class FilterClass(object):
    def __init__(self):
        self.filter_func = my_filter_func

    def filter_data(self, data):
        return filter(self.filter_func, data)
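A descendant can then swap in its own predicate per instance; here the my_filter_func2 from the question is sketched as a hypothetical odd-number filter:

def my_filter_func2(x):    # hypothetical predicate for the descendant
    return x % 2 == 1

class FilterClassDescendant(FilterClass):
    def __init__(self):
        super(FilterClassDescendant, self).__init__()
        self.filter_func = my_filter_func2

print(list(FilterClassDescendant().filter_data([1, 2, 3, 4])))    # [1, 3]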
Alternatively, force it to be a staticmethod:
def my_filter_func(x):
    return x % 2 == 0

class FilterClass(object):
    filter_func = staticmethod(my_filter_func)

    def filter_data(self, data):
        return filter(self.filter_func, data)
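With the staticmethod approach, a descendant can override the attribute at class level; my_other_filter_func below is made up for illustration:

def my_other_filter_func(x):    # hypothetical predicate for the descendant
    return x > 10

class FilterClassDescendant(FilterClass):
    filter_func = staticmethod(my_other_filter_func)

print(list(FilterClassDescendant().filter_data([4, 11, 20])))    # [11, 20]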
Python has a lot of magic within. One of those bits of magic has to do with turning functions into methods when they are assigned to a class (rather than to an instance).
When you assign a function to a class attribute (and I'm not sure whether this applies to arbitrary callables or just to functions), looking that attribute up on an instance no longer gives you the plain function: it gives you a method object that remembers the instance. (In Python 2, looking it up on the class gave you an "unbound method"; in Python 3, class lookup simply returns the original function.)
Looked up through the class in Python 3, you can therefore call it like an ordinary function:
def myfunction(a, b):
    return a + b

class A(object):
    a = myfunction

A.a(1, 2)
# prints 3
This will not fail. However, there's a distinct case when you try to call it from an instance:
A().a(1, 2)
This will fail, because when an instance looks up (say, via internal getattr) an attribute that is a function, it returns a bound method: a wrapper whose __self__ member is populated with the instance and whose __func__ member holds the original function (in Python 2 these members are also known as im_self and im_func). The function you intended to call sits in __func__. When you call the method, you're actually calling __func__ with the value in __self__ inserted as the first argument, so the underlying function needs an additional first parameter (the one that would normally stand for self).
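A small sketch of that binding, using the A class above (the attribute names shown are the Python 3 spellings; in Python 2 they are also available as im_self and im_func):

a_inst = A()
bound = a_inst.a                       # attribute lookup on an instance gives a bound method
print(bound.__self__ is a_inst)        # True: the instance is baked into the method
print(bound.__func__ is myfunction)    # True: the original function is still underneath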
To avoid this magic, Python has two possible decorators:
If you want the function passed through as-is, use @staticmethod. In that case it is not converted into a method and is called exactly as stored; however, you will not be able to access the class it lives in, except as a global reference.
If you want the same, but also want access to the current class (regardless of whether the call is made on an instance or on the class), your function should take a class reference as its first argument (conventionally cls instead of self), and the decorator to use is @classmethod.
Examples:
class A(object):
    a = staticmethod(lambda a, b: a + b)

A.a(1, 2)
A().a(1, 2)
Both will work.
Another example:
def add_print(cls, a, b):
    print cls.__name__
    return a + b

class A(object):
    ap = classmethod(add_print)

class B(A):
    pass

A.ap(1, 2)
B.ap(1, 2)
A().ap(1, 2)
B().ap(1, 2)
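For reference, those four calls should print the name of the class they were invoked through:

A
B
A
B

because a classmethod receives the class it was actually called on (or the class of the instance), not necessarily the class where it was defined.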
Check this for yourself and enjoy the magic.
I recently studied how decorators work in Python and found an example that integrates decorators with nested functions.
The code is here:
def integer_check(method):
    def inner(ref):
        if not isinstance(ref._val1, int) or not isinstance(ref._val2, int):
            raise TypeError('val1 and val2 must be integers')
        else:
            return method(ref)
    return inner

class NumericalOps(object):
    def __init__(self, val1, val2):
        self._val1 = val1
        self._val2 = val2

    @integer_check
    def multiply_together(self):
        return self._val1 * self._val2

    def power(self, exponent):
        return self.multiply_together() ** exponent
y = NumericalOps(1, 2)
print(y.multiply_together())
print(y.power(3))
My question is: how does the inner function's argument ("ref") get access to the instance attributes (ref._val1 and ref._val2)?
It seems like ref equals the instance, but I have no idea how that happens.
Let's first recall how a decorator works:
Decorating the method multiply_together with the decorator @integer_check is equivalent to adding the line multiply_together = integer_check(multiply_together), and since integer_check returns inner, this is equivalent to multiply_together = inner.
Now, when you call the method multiply_together, since it is an instance method, Python implicitly passes the class instance used to invoke the method as its first (and, in this case, only) argument. But multiply_together is actually inner, so in fact inner is invoked with the class instance as its argument. That instance is mapped to the parameter ref, and through this parameter the function gets access to the required instance attributes.
Well, one explanation I found some time ago about the self argument is that this:
y.multiply_together()
is roughly the same as
NumericalOps.multiply_together(y)
So now that you use that decorator, it returns the function inner, which requires the ref argument, so I see it roughly happening like this (at a lower level):
NumericalOps.inner(y)
because inner "substitutes" multiply_together while also adding the extra functionality.
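To see this concretely, both spellings below call the same underlying function (the inner returned by the decorator) and should print 2, assuming the NumericalOps class from the question is in scope:

y = NumericalOps(1, 2)
print(y.multiply_together())              # instance call: Python passes y in as ref
print(NumericalOps.multiply_together(y))  # the same call spelled out explicitly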
inner replaces the original function as the value of the class attribute.
@integer_check
def multiply_together(self):
    return self._val1 * self._val2

# def multiply_together(self):
#     ...
#
# multiply_together = integer_check(multiply_together)
first defines a function and binds it to the name multiply_together. That function is then passed as the argument to integer_check, and the return value of integer_check is bound to the name multiply_together. The original function is now only referenced by the name method, which is local to integer_check and captured in inner's closure.
The definition of inner implies that integer_check can only be applied to functions whose first argument will have attributes named _val1 and _val2.
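As a hedged sketch of how that constraint could be relaxed a little, the checker can forward any extra positional and keyword arguments, so it would also work on a method such as power that takes arguments besides self (this code is an illustration, not part of the original example):

import functools

def integer_check(method):
    @functools.wraps(method)    # keep the wrapped method's name and docstring
    def inner(ref, *args, **kwargs):
        # still assumes the first argument carries _val1 and _val2
        if not isinstance(ref._val1, int) or not isinstance(ref._val2, int):
            raise TypeError('val1 and val2 must be integers')
        return method(ref, *args, **kwargs)
    return inner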
I know the first argument in Python methods will be an instance of the class, so we need to use "self" as the first argument in methods. But should we also prefix the attributes (variables) in a method with "self."?
My method works even if I don't prefix its variables with "self":
class Test:
    def y(self, x):
        c = x + 3
        print(c)

t = Test()
t.y(2)
5
and
class Test:
    def y(self, x):
        self.c = x + 3
        print(self.c)

t = Test()
t.y(2)
5
Why would I need to write an attribute in a method as "self.a" instead of just "a"?
In which cases will the first example not work but the second will? I want to see a situation that shows the real difference between the two, because right now they behave the same from my point of view.
The reason you use self.attribute_name in a method is to perform the computation on that instance's attribute, as opposed to some arbitrary local variable. For example:
class Car:
    def __init__(self, size):
        self.size = size

    def can_accomodate(self, number_of_people):
        return self.size > number_of_people

    def change_size(self, new_size):
        self.size = new_size

    # works, but bad practice
    def can_accomodate_v2(self, size, number_of_people):
        return size > number_of_people

c = Car(5)
print(c.can_accomodate(2))
print(c.can_accomodate_v2(4, 2))
In the above example you can see that can_accomodate uses self.size, while can_accomodate_v2 has size passed in as an argument. Both will work, but v2 is bad practice and should not be used. You can still pass arguments that are not related to the instance/class into a method, for example number_of_people in the can_accomodate function.
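To see a case where the difference really matters, here is a small sketch reusing the Car class above: an attribute stored with self persists between method calls, while a plain local variable is gone as soon as the method returns.

c = Car(5)
c.change_size(1)                  # updates self.size on this instance
print(c.can_accomodate(2))        # False: the stored size is now 1
print(c.can_accomodate_v2(4, 2))  # True: v2 ignores the stored size entirely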
Hope this helps.
I have a base class A with some heavy attributes (actually large numpy arrays) that are derived from data given to A's __init__() method.
First, I would like to subclass A into a new class B to perform modifications on these attributes with some of B's specific methods. As these attributes are quite expensive to obtain, I don't want to instantiate B the same way as A, but rather use an A instance to initialize a B object. This is a kind of type casting between A and B, and I think I should use the __new__() method to return a B object.
Second, before every computation on B's attributes, I must be sure that the initial state of B has been restored to the current state of the A instance that was used for B's instantiation, without creating a new B object every time; a kind of dynamic linkage...
Here is an example code I wrote:
from copy import deepcopy
import numpy as np

class A(object):
    def __init__(self, data):
        self.data = data

    def generate_derived_attributes(self):
        print "generating derived attributes..."
        self.derived_attributes = self.data.copy()

class B(A):
    def __new__(cls, obj_a):
        assert isinstance(obj_a, A)
        cls = deepcopy(obj_a)
        cls.__class__ = B
        cls._super_cache = obj_a  # This is not a copy... no additional memory required
        return cls

    def compute(self):
        # First reset the state (may use a decorator?)
        self.reset()
        print "Doing some computations..."

    def reset(self):
        print "\nResetting object to its initial state"
        _super_cache = self._super_cache  # For not being destroyed...
        self.__dict__ = deepcopy(self._super_cache.__dict__)
        self._super_cache = _super_cache

if __name__ == '__main__':
    a = A(np.zeros(100000000, dtype=np.float))
    a.generate_derived_attributes()
    print a
    b = B(a)
    print b
    b.compute()
    b.compute()
Is this implementation a reasonable way to reach my objective in Python, or are there more Pythonic ways? Could it be more generic? (I know that using __dict__ will not be a good choice in every case, especially when using __slots__...). Do you think that using a decorator around B.compute() would give me more flexibility for using this along with other classes?
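(Not an authoritative answer, just a sketch of the decorator idea mentioned in the compute() comment: reset_first is a made-up name, and it simply calls reset() before the wrapped method runs.)

import functools

def reset_first(method):
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self.reset()                          # restore the cached initial state first
        return method(self, *args, **kwargs)
    return wrapper

# inside B, compute() could then be written as:
#     @reset_first
#     def compute(self):
#         print "Doing some computations..."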
Edit: There was some confusion, but I want to ask a general question about object oriented design in Python.
Consider a class that lets you map data values to counts or frequencies:
class DataMap(dict):
    pass
Now consider a subclass that allows you to construct a histogram from a list of data:
class Histogram(DataMap):
    def __init__(self, list_of_values):
        # 1. Put appropriate super(...) call here if necessary
        # 2. Build the map of values to counts in self
        pass
Now consider a class that lets you make a smoothed probability mass table rather than a Histogram.
class ProbabilityMass(DataMap):
    pass
What is the best way to allow a ProbabilityMass to be constructed from either a Histogram or a list of values?
I "grew up" programming in C++, and in this case I would use an overloaded constructor. In Python I've thought of doing this with:
1. The constructor takes multiple arguments (all but one of these should == None)
2. I define from_Histogram and from_list methods
In the second case (which I believe is better), what is the best way to allow the from_list method to use the shared code from the Histogram constructor? A ProbabilityMass table is nearly identical to a Histogram table, but it is scaled so that the sum of all values is 1.0.
If you have come across a similar problem, please share your expertise!
To start with, if you think you want @staticmethod, you almost always don't. Either the function is not part of the class, in which case it should just be a free function, or it is part of the class but not tied to an instance, in which case it should be a @classmethod. Your named constructor is a good candidate for a @classmethod.
Also note that you should invoke A.__init__ from B via super(), otherwise multiple inheritance can bite you bad.
class A(object):  # new-style class, so that super() works under Python 2
    def __init__(self, data):
        self.values_to_counts = {}
        for val in data:
            if val in self.values_to_counts:
                self.values_to_counts[val] += 1
            else:
                self.values_to_counts[val] = 1

    @classmethod
    def from_values_to_counts(cls, values_to_counts):
        self = cls([])
        self.values_to_counts = values_to_counts
        return self

class B(A):
    def __init__(self, data, parameter):
        super(B, self).__init__(data)
        self.parameter = parameter

    def print_parameter(self):
        print self.parameter
In this case, you don't need a B.from_values_to_counts; it is inherited from A, and it will return an instance of B, since that's how it was called (provided B's __init__ can accept the same arguments that the classmethod passes to cls).
If you need to do more complex initialization in B, you can, using super(), which looks very similar to the way it does when you use it with instances. After all, a classmethod isn't anything more complex than an instance method whose im_self attribute is set to the class itself.
class A(object):  # new-style class, so that super() works under Python 2
    def __init__(self, data):
        self.values_to_counts = {}
        for val in data:
            if val in self.values_to_counts:
                self.values_to_counts[val] += 1
            else:
                self.values_to_counts[val] = 1

    @classmethod
    def from_values_to_counts(cls, values_to_counts):
        self = cls([])
        self.values_to_counts = values_to_counts
        return self

class B(A):
    def __init__(self, data, parameter):
        super(B, self).__init__(data)
        self.parameter = parameter

    def print_parameter(self):
        print self.parameter

    @classmethod
    def from_values_to_counts(cls, values_to_counts):
        self = super(B, cls).from_values_to_counts(values_to_counts)
        do_more_initialization(self)  # placeholder for B-specific setup
        return self
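As a quick usage sketch (the dictionary here is made up for illustration), the alternate constructor is called directly on the class:

counts = {'a': 2, 'b': 1}
hist = A.from_values_to_counts(counts)
print hist.values_to_counts    # {'a': 2, 'b': 1}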
This works in the desired way:
class d:
    def __init__(self, arg):
        self.a = arg

    def p(self):
        print "a= ", self.a

x = d(1)
y = d(2)
x.p()
y.p()
yielding
a= 1
a= 2
I've tried eliminating the "self"s and using a global statement in __init__:
class d:
    def __init__(self, arg):
        global a
        a = arg

    def p(self):
        print "a= ", a

x = d(1)
y = d(2)
x.p()
y.p()
yielding, undesirably:
a= 2
a= 2
Is there a way to write it without having to use "self"?
"self" is the way how Python works. So the answer is: No! If you want to cut hair: You don't have to use "self". Any other name will do also. ;-)
Python methods are just functions that are bound to a class or to an instance of a class. The only difference is that a method (a bound function) expects the instance object as its first argument, and when you invoke a method on an instance, Python automatically passes the instance in as that first argument. So by naming self in a method, you're telling it which namespace to work with.
This way, when you write self.a, the method knows you're modifying the instance variable a that lives in the instance's namespace.
Python scoping works from the inside out, so each function (or method) has its own namespace. If you create a variable a locally from within the method p (these names suck, BTW), it is distinct from self.a. Example using your code:
class d:
    def __init__(self, arg):
        self.a = arg

    def p(self):
        a = self.a - 99
        print "my a= ", a
        print "instance a= ", self.a

x = d(1)
y = d(2)
x.p()
y.p()
Which yields:
my a= -98
instance a= 1
my a= -97
instance a= 2
Lastly, you don't have to call the first variable self. You could call it whatever you want, although you really shouldn't. It's convention to define and reference self from within methods, so if you care at all about other people reading your code without wanting to kill you, stick to the convention!
Further reading:
Python Classes tutorial
When you remove the selfs, you end up with only one variable a that is shared not only amongst all your d objects but across your entire module, since the global statement makes it a module-level name.
You can't just eliminate the self's for this reason.
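As a small illustration of that sharing, with the global-based version of d from the question, the name a ends up living at module level, visible outside the class entirely:

x = d(1)
y = d(2)
print a    # 2 -- a single module-level global; the last assignment wins
x.p()      # both calls print that same shared value
y.p()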