Suppose I want to run the following code using Python 3.6.3:
class Foo:
    def bar(self):
        return 1

    def __len__(self):
        return 2

class FooWrapper:
    def __init__(self, foo):
        self.bar = foo.bar
        self.__len__ = foo.__len__
f = Foo()
print(f.bar())
print(f.__len__())
print(len(f))
w = FooWrapper(Foo())
print(w.bar())
print(w.__len__())
print(len(w))
Here's the output:
1
2
2
1
2
TypeError: object of type 'FooWrapper' has no len()
So __len__() works, but len() does not? What gives, and how do I properly copy the __len__ method from Foo to FooWrapper?
By the way, this behavior is universal for all 'special' methods, not only __len__: __iter__ or __getitem__, for example, do not work either (unless called directly).
This happens because special methods have to be defined on an object's class, not on the instance.
len will look for __len__ on the FooWrapper class. By the way, although this looks like it "works", you are actually attaching foo.__len__, i.e. the method already bound to the foo instance of Foo, to your FooWrapper object. That may be the intent, but you have to be aware of it.
The easiest way for this to work is to make FooWrapper have itself a __len__ method that will call the wrapped instance's __len__:
class FooWrapper:
    def __init__(self, foo):
        self.foo = foo
        self.bar = foo.bar

    def __len__(self):
        return len(self.foo)
Does that mean that any and all special methods have to explicitly exist in the wrapper class? Yes, it does - and it is one of the pains of creating proxies that behave just the same as the wrapped object.
That is because the existence check and call for special methods are done directly in C, not through Python's lengthy lookup mechanisms, as that would be too inefficient.
It is possible to create a wrapper-class factory thing that would inspect the object and create a brand new wrapper class, with all meaningful special methods proxied, though - but I think that would be too advanced for what you have in mind right now.
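For the curious, a minimal sketch of such a factory could look like the following (make_wrapper and the set of proxied names are illustrative choices, not a library API; it reuses the Foo class from the question):

def make_wrapper(obj, special_names=('__len__', '__iter__', '__getitem__')):
    # Build a brand new class whose special methods delegate to the wrapped object.
    namespace = {}
    for name in special_names:
        if hasattr(type(obj), name):
            def proxy(self, *args, _name=name, **kwargs):
                return getattr(self._wrapped, _name)(*args, **kwargs)
            namespace[name] = proxy
    cls = type('Wrapped' + type(obj).__name__, (), namespace)
    wrapper = cls()
    wrapper._wrapped = obj
    return wrapper

w = make_wrapper(Foo())
print(len(w))  # 2 - len() now finds __len__ on the generated class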
You'd better just use explicit special methods, or explicit access to the wrapped object in the remainder of the code. (Like, when you will need to use __iter__ from a wrapped object, instead of doing just for x in wrapper, do for x in wrapper.wrapped )
From the documentation:
x[i] is roughly equivalent to type(x).__getitem__(x, i).
What is the benefit of the above rather than having a seemingly simpler x.__getitem__(i)?
EDIT: Why is Python behaving this way?
As a downside of the standard behavior let me show this sample code where I was surprised to find the last assertion fails while second to last one (calling __getitem__ directly) passes.
def poww_bar(base):
    class Bar():
        def __getitem__(self, x):
            return lambda: base**x
    return Bar()

def poww_foo(base):
    class Foo():
        pass
    f = Foo()
    f.__getitem__ = lambda x: lambda: base ** x
    return f
pow_bar2 = poww_bar(2)
pow_foo2 = poww_foo(2)
assert pow_bar2.__getitem__(3)() == 8 # OK
assert pow_bar2[3]() == 8 # OK
assert pow_foo2.__getitem__(3)() == 8 # OK
assert pow_foo2[3]() == 8 # TypeError: 'Foo' object is not subscriptable
Methods are class attributes, not instance attributes.
There is no instance attribute named __getitem__ associated with pow_bar2. So lookup proceeds to checking the class for an attribute by that name, and it succeeds in finding Bar.__getitem__.
But the process doesn't end there. pow_bar2.__getitem__(i) is not equivalent to Bar.__getitem__(i), because Python first checks whether the attribute lookup produces an object that implements the descriptor protocol. Since Bar.__getitem__ is an instance of function, it does implement the descriptor protocol.
The next step is then to return not the function itself, but the result of Bar.__dict__['__getitem__'].__get__(pow_bar2, Bar). (I'm switching to the use of Bar.__dict__ to emphasize that we do not get into an infinite loop of triggering the descriptor protocol.) This is an instance of method, which is itself a callable that passes its own arguments, along with pow_bar2, as arguments to the original function.
Thus, pow_bar2.__getitem__(i) is equivalent to Bar.__dict__['__getitem__'].__get__(pow_bar2, Bar)(i), which is roughly equivalent to Bar.__dict__['__getitem__'](pow_bar2, i).
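A quick check (reusing pow_bar2 from above; type() recovers the class, since Bar was defined locally inside poww_bar) confirms these spellings all agree:

Bar = type(pow_bar2)  # recover the locally defined class
assert pow_bar2[3]() == 8
assert Bar.__dict__['__getitem__'].__get__(pow_bar2, Bar)(3)() == 8
assert Bar.__dict__['__getitem__'](pow_bar2, 3)() == 8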
But really, pow_bar2[i] is just shorter and more easily recognizable (due to decades of established support for this syntax in other languages) than pow_bar2.__getitem__(i). __getitem__ is what makes the use of [] extendable to other classes, rather than limiting it to built-in types.
The descriptor protocol is not just a one-shot feature that makes instance-method behavior seem more complicated than necessary. It also determines how class methods, static methods, and properties work, and can further be used to customize attribute behavior in other ways.
It could just be an optimization. A class function has only one reference, in the class definition, whereas an instance function would carry a reference in every single object. Specifying that __getitem__ lives on the class means the interpreter never needs to waste time searching each instance for it.
This is all speculation of course.
I recently spent way too long debugging a piece of code, only to realize that the issue was I did not include a () after a command. What is the logic behind which commands require a () and which do not?
For example:
import pandas as pd
col1=['a','b','c','d','e']
col2=[1,2,3,4,5]
df=pd.DataFrame(list(zip(col1,col2)),columns=['col1','col2'])
df.columns
Returns Index(['col1', 'col2'], dtype='object') as expected. If we use .columns() we get an error.
Other commands it is the opposite:
df.isna()
Returns:
col1 col2
0 False False
1 False False
2 False False
3 False False
4 False False
but df.isna returns:
<bound method DataFrame.isna of col1 col2
0 a 1
1 b 2
2 c 3
3 d 4
4 e 5>
Which, while not throwing an error, is clearly not what we're looking for.
What's the logic behind which commands use a () and which do not?
I use pandas as an example here, but I think this is relevant to python more generally.
Because functions need parentheses for their arguments, while plain attributes do not; that's why it's my_list.append(item) (a method call) but df.columns (an attribute).
If you reference a function without the parentheses, like my_list.append, what you get back is the function object itself: not a description of what the function does, but of what it is.
As for classes, calling a class with parentheses creates an instance of that class, while referring to the class without parentheses points to the class object itself. If you were to execute print(SomeClass) you'd get <class '__main__.SomeClass'>, which describes what it is - the same kind of response you'd get when referencing a function without parentheses.
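A tiny illustration of the difference (greet is a made-up example function):

def greet():
    return "hello"

print(greet)    # <function greet at 0x...> - the function object itself
print(greet())  # hello - the result of actually calling it

class SomeClass:
    pass

print(SomeClass)    # <class '__main__.SomeClass'> - the class object
print(SomeClass())  # <__main__.SomeClass object at 0x...> - a new instance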
What's the logic behind which commands use a () and which do not?
An object needs to have a __call__ method associated with it for it to called as a function using ():
class Test:
    def __call__(self, arg):
        print("Called with", arg)

t = Test()  # The Test class object uses __call__ to create instances
t(5)        # Then this line prints "Called with 5"
So, the difference is that the Index object returned by df.columns doesn't define a __call__ method, while df.isna is a bound method, which does.
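You can verify this with the built-in callable (continuing with the df from the question):

print(callable(df.isna))     # True  - a bound method defines __call__
print(callable(df.columns))  # False - an Index object does not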
TL;DR you just kinda have to know
Nominally, the parens are needed to call a function instead of just returning an object.
foo.bar # get the bar object
foo.bar() # call the bar object
Callable objects have a __call__ method. When python sees the (), it knows to call __call__. This is done at the C level.
In addition, python has the concept of a property: it's a callable data object that looks like a regular data attribute.
class Foo:
    def __init__(self):
        self._foo = "foo"

    @property
    def foo(self):
        return "I am " + self._foo

    @foo.setter
    def foo(self, val):
        assert isinstance(val, str)
        self._foo = val + " you bet"

f = Foo()
f.foo = "Hello"  # calls setter
print(f.foo)     # calls getter
Similarly, when python sees array notation foo[1] it will call an object's __getitem__ or __setitem__ methods and the object is free to overload that call in any way it sees fit.
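For instance, a minimal class (Squares is a made-up example) can hook into the [] syntax:

class Squares:
    def __getitem__(self, i):
        # s[i] dispatches here via type(s).__getitem__
        return i * i

s = Squares()
print(s[4])  # 16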
Finally, the object itself can intercept attribute access with __getattr__, __getattribute__ and __setattr__ methods, leaving everything up in the air. In fact, python doesn't really know what getting and setting attributes means. It is calling these methods. Most objects just use the default versions inherited from object. If the class is implemented in C, there is no end to what could be going on in the background.
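As a small sketch of that interception (Lazy is a hypothetical example class):

class Lazy:
    def __getattr__(self, name):
        # called only when normal attribute lookup fails
        value = f"computed-{name}"
        setattr(self, name, value)  # cache for next time
        return value

obj = Lazy()
print(obj.anything)  # computed-anything
print(obj.anything)  # second access hits the cached instance attribute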
Python is a dynamic language and many packages add abstractions to make it easier (?) to use their services. The downside is that you may spend more time with help text and documentation than one may like.
Object method vs object attribute.
Objects have methods and attributes.
Methods require parentheses to call them, even if the method takes no arguments.
Attributes, on the other hand, are like variables pointing to objects as the program progresses; you access them by name alone, without parentheses. Of course, you may have to qualify both methods and attributes with the object's name as required.
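A short illustration of the distinction (Point is a made-up example class):

class Point:
    def __init__(self, x, y):
        self.x = x  # attributes: accessed without parentheses
        self.y = y

    def norm(self):  # method: needs parentheses, even with no arguments
        return (self.x ** 2 + self.y ** 2) ** 0.5

p = Point(3, 4)
print(p.x)       # 3
print(p.norm())  # 5.0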
Why doesn't Python have an instancemethod function analogous to staticmethod and classmethod?
Here is how this arose for me. Suppose I have an object which I know will be hashed frequently and whose hash is expensive to calculate. Under this assumption, it is reasonable to compute the hash value once and cache it, as in the following toy example:
class A:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)

    def __hash__(self):
        return self._hash_cache
The __hash__ function in this class does very little, just an attribute lookup and a return. Naively, it seems it ought to be equivalent to instead write:
import operator

class B:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)

    __hash__ = operator.attrgetter('_hash_cache')
According to the documentation, operator.attrgetter returns a callable object that fetches the given attribute from its operand. If its operand is self, then it will return self._hash_cache, which is the desired result. Unfortunately this does not work:
>>> hash(A(1))
1
>>> hash(B(1))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: attrgetter expected 1 arguments, got 0
The reason for this is as follows. If one reads the descriptor HOWTO, one finds that class dictionaries store methods as functions; functions are non-data descriptors whose __get__ method returns a bound method. But operator.attrgetter does not return a function; it returns a callable object. And in fact, it is a callable object with no __get__ method:
>>> hasattr(operator.attrgetter('_hash_cache'), '__get__')
False
Lacking a __get__ method, this of course will not automatically be turned into a bound method. We can make a bound method from it using types.MethodType, but using it in our class B would require creating a bound method for every object instance and assigning it to __hash__.
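To see that manual binding in action (a sketch, reusing the B class from above):

import types
import operator

b = B(1)
# reproduce by hand what a function's __get__ would normally do for us
bound = types.MethodType(operator.attrgetter('_hash_cache'), b)
print(bound())  # 1 - the attrgetter now receives b as its operand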
We can see the fact that operator.attrgetter has no __get__ directly if we browse the CPython source. I'm not very familiar with the CPython API, but I believe that what's going on is as follows. The definition of the attrgetter_type is in Modules/_operator.c, at line 1439 as I write this. This type sets tp_descr_get to 0. And according to the type object documentation, that means an object whose type is attrgetter_type will not have a __get__.
Of course, if we give ourselves a __get__ method, then everything works. This is the case in the first example above, where __hash__ is actually a function and not just a callable. It's also true in some other cases. For example, if we want to lookup a class attribute, we could write the following:
class C:
    y = 'spam'
    get_y = classmethod(operator.attrgetter('y'))
As written this is terribly un-Pythonic (though it might be defensible if there were a strange custom __getattr__ for which we wanted to provide convenience functions). But at least it gives the desired result:
>>> C.get_y()
'spam'
I can't think of any reason why it would be bad for attrgetter_type to implement __get__. But on the other hand, even if it did, there would be other situations where we run into trouble. For example, suppose we have a class whose instances are callable:
class D:
    def __call__(self, other):
        ...
We can't use an instance of this class as a class attribute and expect instance lookups to generate bound methods. For instance,
d = D()

class E:
    apply_d = d
When D.__call__ is called, it will receive self but not other, and that generates a TypeError. This example might be a little far-fetched, but I'd be a little surprised if nobody had ever encountered something like this in practice. It could be fixed by giving D a __get__ method; but if D is from a third-party library that could be inconvenient.
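As a hedged sketch of that fix, giving D a __get__ that mimics function binding might look like this:

import types

class D:
    def __call__(self, other):
        print("called on", other)

    def __get__(self, obj, objtype=None):
        # bind like a plain function would, so `other` receives the instance
        if obj is None:
            return self
        return types.MethodType(self, obj)

class E:
    apply_d = D()

E().apply_d()  # prints: called on <__main__.E object at 0x...>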
It seems that the easiest solution would be to have an instancemethod function. Then we could write __hash__ = instancemethod(operator.attrgetter('_hash_cache')) and apply_d = instancemethod(d) and they would both work as intended. Yet, as far as I know, no such function exists. Hence my question: Why is there no instancemethod function?
EDIT: Just to be clear, the functionality of instancemethod would be equivalent to:
import functools

def instancemethod(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper
This could be applied as in the original question above. One could also imagine writing a class decorator that could be applied to D that would give it a __get__ method; but this code doesn't do this.
So I'm not talking about adding a new feature to Python. Really the question is one of language design: Why not provide it as, say, functools.instancemethod? If the answer is simply, "The use cases are so obscure that nobody's bothered," that's okay. But I would be happy to learn about other reasons, if there are any.
There is no instancemethod decorator because this is the default behaviour for functions declared inside a class.
class A:
    ...
    # This is an instance method
    def __hash__(self):
        return self._hash_cache
Any callable which does not have a __get__ method can thus be wrapped into an instance method like so.
class A:
    def instance_method(*args):
        # any_callable stands in for some callable that lacks __get__
        return any_callable(*args)
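Applied to the question's attrgetter, for example, the wrapping looks like this (a sketch, not the only possible spelling):

import operator

class B:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(x)

    def __hash__(self):
        # a plain function wrapping the attrgetter, so normal binding applies
        return operator.attrgetter('_hash_cache')(self)

print(hash(B(1)))  # 1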
Thus creating an instancemethod decorator would just add another syntax for a feature which already exists. This would go against the saying that there should be one-- and preferably only one --obvious way to do it.
Side note
If it is so expensive to hash your instances, you might want to avoid calling your hash function on instantiation and delay it until the objects are actually hashed.
One way to do that is to set the attribute _hash_cache in __hash__ instead of __init__. Alternatively, let me suggest a slightly more self-contained method which relies on caching your hash.
from weakref import finalize

class CachedHash:
    def __init__(self, x):
        self.x = x

    def __hash__(self, _cache={}):
        if id(self) not in _cache:
            finalize(self, _cache.pop, id(self))
            _cache[id(self)] = hash(self.x)  # or some complex hash function
        return _cache[id(self)]
The use of finalize ensures the cache is cleared of an id when its instance is garbage collected.
I have a satisfying answer to my question. Python does have the internal interface necessary for an instancemethod function, but it's not exposed by default.
import ctypes
import operator

instancemethod = ctypes.pythonapi.PyInstanceMethod_New
instancemethod.argtypes = (ctypes.py_object,)
instancemethod.restype = ctypes.py_object

class A:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(x)

    __hash__ = instancemethod(operator.attrgetter('_hash_cache'))

a = A(1)
print(hash(a))
The instancemethod function this creates works in essentially the same way as classmethod and staticmethod. These three functions return new objects of types instancemethod, classmethod, and staticmethod, respectively. We can see how they work by looking at Objects/funcobject.c. These objects all have __func__ members which store a callable object. They also have a __get__. For a staticmethod object, the __get__ returns __func__ unchanged. For a classmethod object, __get__ returns a bound method object, where the binding is to the class object. And for an instancemethod object, __get__ returns a bound method object, where the binding is to the object instance. This is precisely the same behavior as __get__ for a function object and is exactly what we want.
The only documentation on these objects seems to be in the Python C API here. My guess is that they're not exposed because they're so rarely needed. I think it would be nice to have PyInstanceMethod_New available as functools.instancemethod.
I have a function that generally accepts lists, but on occasion needs to accept functions as well. There were several ways of dealing with this, but it would have been very, very useful to be able to do len(foo) for a given function foo.
In the end, instead of passing in functions, I passed in callable classes that had a __len__ function defined. But it got me thinking, since in python everything is an object, and functions can have attributes etc. just as a curiosity...
Question
Is there any way to give a function a len? A quick google didn't bring up anything.
My attempt
def foo():
    return True

def my_len(self):
    return 5

foo.__len__ = my_len
len(foo)  # TypeError: object of type 'function' has no len()
Adding __len__ to an object does not work (see the link added by Aran-Fey for why). A function is just an object defining a __call__ method. You can define a class like this:
class Foo:
    def __call__(self):
        return True

    def __len__(self):
        return 5
Using it:
>>> foo=Foo()
>>> foo()
True
>>> len(foo)
5
It is possible to create a function-like object which has a length, but you should consider the use case: Python gives you a lot of power, but not everything that's possible is actually a good idea.
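If you really want an existing function to support len(), a small wrapper class can do it (a sketch; WithLen is a made-up name):

import functools

class WithLen:
    def __init__(self, func, length):
        functools.update_wrapper(self, func)  # keep func's name and docstring
        self.func = func
        self.length = length

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __len__(self):
        return self.length

def foo():
    return True

foo = WithLen(foo, 5)
print(foo())     # True
print(len(foo))  # 5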
I'm working on a project right now that deals with functions in an abstract mathematical sense. Without boring the reader with the details, I'll say that I had the following structure in an earlier version:
class Foo(Bar):
    def __init__(self, a, b):
        self.a = a
        self.b = b
        self.sub_unit = Foo(a, not b)
Of course, that's not quite the change I'm making to the arguments to Foo, but suffice to say, it is necessary that this property, if accessed repeatedly, result in an indefinitely long chain of Foo objects. Obviously, this results in infinite recursion when one instantiates Foo. I solved this in the earlier version by removing the last line of __init__ and adding the following to the Foo class:
def __getattr__(self, attr: str):
    if attr == 'sub_unit':
        return Foo(self.a, not self.b)
    else:
        return super().__getattr__(attr)
This worked quite well, as I could calculate the next object in the chain as needed.
In going over the code, though, I realize that for other reasons, I need an instance of Bar, not a sub-class of it. To see if I could override the getattr for a single instance, I tried the following:
>>> foo = Bar(a=1, b=2) # sub_unit doesn't get set here.
>>> foo.__getattr__ = lambda attr: 'foo'
>>> foo.a
1
>>> foo.__getattr__('a')
'foo'
What is happening here that I don't understand? Why isn't foo.a calling foo.__getattr__('a')?
Is there a good way to overwrite __getattr__ for a single instance, or is my best bet to re-factor all the code I have that reads sub_unit and friends to call those as functions, to handle this special case?
When you look up the attribute a with foo.a, Python first checks the instance's __dict__; only when the attribute is not found there (or on the class) is the __getattr__ method called.
Conversely, since a exists on the instance, __getattr__ is never consulted for it. Note also that, like other special methods, __getattr__ is only invoked implicitly when it is defined on the class: assigning it to the instance, as done above, has no effect on attribute lookup.
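One way to get a per-instance override anyway is to swap the instance onto a one-off subclass, a sketch of the trick from the linked duplicate (Bar here is a stand-in for the real base class):

class Bar:
    def __init__(self, a, b):
        self.a = a
        self.b = b

foo = Bar(a=1, b=2)

# give this one instance its own class, where the special method lives
class PatchedBar(Bar):
    def __getattr__(self, attr):
        if attr == 'sub_unit':
            return Bar(self.a, not self.b)
        raise AttributeError(attr)

foo.__class__ = PatchedBar
print(foo.a)         # 1 - normal attributes are untouched
print(foo.sub_unit)  # a new Bar instance (a=1, b=False)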