I'm reading through this blog post on OOP in python 3. In it there is:
As you can see, a __get__ method is listed among the members of the function, and Python recognizes it as a method-wrapper. This method shall connect the open function to the door1 instance, so we can call it passing the instance alone.
I'm trying to understand this more intuitively. In this context what is 'wrapping' what?
The method-wrapper object is wrapping a C function. It binds together an instance (here a function instance, defined in a C struct) and a C function, so that when you call it, the right instance information is passed to the C function.
It is the C API equivalent of method objects and functions on a custom class. Given a class Foo with a function bar, Foo().bar produces a bound method, which combines together the Foo() instance and the Foo.bar function, so that when called, you can pass in that instance as the first argument (usually called self).
Also see the descriptor protocol; this is what defines the __get__ method and how it is invoked.
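To make the wrapping concrete, here is a minimal sketch in Python 3 on CPython. The Door class and its open method are stand-ins modelled on the blog post's door1 example, not the post's actual code; the names open_func, getter and bound_open are made up for illustration.

```python
class Door:
    def __init__(self, number):
        self.number = number

    def open(self):
        return f"door {self.number} is open"


door1 = Door(1)

# The plain function as stored in the class namespace.
open_func = Door.__dict__["open"]

# Its __get__ is implemented in C; CPython exposes it as a method-wrapper
# that ties this particular function object to that C implementation.
getter = open_func.__get__
getter_type = type(getter).__name__

# Calling the wrapper connects the function to door1, yielding a bound method.
bound_open = getter(door1, Door)
message = bound_open()
```

So there are two layers of wrapping: the method-wrapper pairs the function object with the C implementation of `__get__`, and the bound method it returns pairs the function with door1.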
Related
I'm trying to develop an understanding of OOP programming in python and one thing that has confused me is how a function becomes a bound method. What I think I understand so far is
1. In Python 3.x, "unbound methods" no longer exist. There are only functions and (bound) methods.
2. When a function defined on a class is accessed through an instance, as in obj.func, the descriptor protocol is used and func.__get__ is invoked.
3. The object returned is a method bound to the obj instance, which passes obj as the first parameter (usually self).
What happens in between step 2 and 3?
It seems like something along the lines of
obj.func=type(obj).func.__get__(func, obj)
Where somewhere in __get__ there's something like a decorator that will then return the bound method.
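As a rough illustration of what sits between those two steps, the sketch below models a function's descriptor behaviour in pure Python. FunctionLike and Greeter are hypothetical names invented for this example; real functions implement the same idea in C. Note that the call is `__get__(obj, type(obj))`: the instance comes first, then its class.

```python
import types


class FunctionLike:
    """Hypothetical stand-in that mimics how a function behaves as a descriptor."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: Python 3 hands back the function itself.
            return self
        # Accessed on an instance: pack callable and instance into a bound method.
        return types.MethodType(self, obj)


class Greeter:
    def __init__(self, name):
        self.name = name

    greet = FunctionLike(lambda self: "hello " + self.name)


obj = Greeter("world")

# Normal attribute access and the spelled-out protocol give the same result.
via_attribute = obj.greet()
via_protocol = type(obj).__dict__["greet"].__get__(obj, type(obj))()
```

There is no decorator involved; `__get__` simply returns a new object that remembers both the callable and the instance.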
According to Python 2.7.12 documentation, User-defined methods:
User-defined method objects may be created when getting an attribute
of a class (perhaps via an instance of that class), if that attribute
is a user-defined function object, an unbound user-defined method
object, or a class method object. When the attribute is a
user-defined method object, a new method object is only created if the
class from which it is being retrieved is the same as, or a derived
class of, the class stored in the original method object; otherwise,
the original method object is used as it is.
I know that everything in Python is an object, so a "user-defined method" must be identical to a "user-defined method object". However, I can't understand why there is a "user-defined function object attribute". Say, in the following code:
class Foo(object):
    def meth(self):
        pass
meth is a function defined inside a class body, and thus a method. So why can we have a "user-defined function object attribute"? Aren't all attributes defined inside a class body?
Bonus question: Provide some examples illustrating how a user-defined method object is created by getting an attribute of a class. Aren't objects defined in their class definition? (I know methods can be assigned to a class instance, but that's monkey patching.)
I'm asking for help because this part of the documentation is really confusing to me, a programmer who only knows C, since Python is such a magical language that supports both functional programming and object-oriented programming, which I haven't mastered yet. I've done a lot of searching, but still can't figure it out.
When you do
class Foo(object):
    def meth(self):
        pass
you are defining a class Foo with a method meth. However, when this class definition is executed, no method object is created to represent the method. The def statement creates an ordinary function object.
If you then do
Foo.meth
or
Foo().meth
the attribute lookup finds the function object, but the function object is not used as the value of the attribute. Instead, using the descriptor protocol, Python calls the __get__ method of the function object to construct a method object, and that method object is used as the value of the attribute for that lookup. For Foo.meth, the method object is an unbound method object, which behaves like the function you defined, but with an extra type check on self. For Foo().meth, the method object is a bound method object, which already knows what self is.
This is why Foo().meth() doesn't complain about a missing self argument; you pass 0 arguments to the method object, which then prepends self to the (empty) argument list and passes the arguments on to the underlying function object. If Foo().meth evaluated to the meth function directly, you would have to pass it self explicitly.
In Python 3, Foo.meth doesn't create an unbound method object; the function's __get__ still gets called, but it returns the function directly, since Guido decided unbound method objects weren't useful. Foo().meth still creates a bound method object, though.
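The Python 3 behaviour described above can be checked with a short sketch. The variable names are made up for illustration; Foo and meth follow the question.

```python
import types


class Foo:
    def meth(self):
        return self


foo = Foo()

# Python 3: looking the function up on the class returns the function itself.
class_lookup_is_function = Foo.meth is Foo.__dict__["meth"]

# Looking it up on an instance builds a bound method that remembers foo.
bound = foo.meth
is_bound_method = isinstance(bound, types.MethodType)

# Each lookup builds a fresh method object, although the two compare equal.
first, second = foo.meth, foo.meth
fresh_each_time = first is not second and first == second
```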
While looking through the webapp2 documentation online, I found information on the decorator: webapp2.cached_property
In the documentation, it says:
A decorator that converts a function into a lazy property.
My question is:
→ What is a lazy property?
It is a property decorator that gets out of the way after the first call. It allows you to auto-cache a computed value.
The standard library @property decorator is a data descriptor object and is always called, even if there is an attribute on the instance of the same name.
The @cached_property decorator on the other hand, only has a __get__ method, which means that it is not called if there is an attribute with the same name already present. It makes use of this by setting an attribute with the same name on the instance on the first call.
Given a @cached_property-decorated bar method on an instance named foo, this is what happens:
1. Python resolves foo.bar. No bar attribute is found on the instance.
2. Python finds the bar descriptor on the class, and calls __get__ on that.
3. The cached_property __get__ method calls the decorated bar method.
4. The bar method calculates something, and returns the string 'spam'.
5. The cached_property __get__ method takes the return value and sets a new attribute bar on the instance; foo.bar = 'spam'.
6. The cached_property __get__ method returns the 'spam' return value.
If you ask for foo.bar again, Python finds the bar attribute on the instance, and uses that from here on out.
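The steps above can be sketched as a small non-data descriptor. This is an illustrative reimplementation under the name lazy_property (a made-up name), not the webapp2 or Werkzeug source; the calls counter exists only to show that the getter runs once.

```python
class lazy_property:
    """Minimal sketch of a non-data descriptor that caches on the instance."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the descriptor itself.
            return self
        value = self.func(obj)
        # Store the result under the same name in the instance __dict__.
        # Because this class defines no __set__, the instance attribute
        # wins on every later lookup and __get__ is never called again.
        obj.__dict__[self.name] = value
        return value


class Foo:
    def __init__(self):
        self.calls = 0

    @lazy_property
    def bar(self):
        self.calls += 1
        return "spam"


foo = Foo()
first = foo.bar   # runs bar() and caches the result
second = foo.bar  # served straight from foo.__dict__
```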
Also see the source code for the original Werkzeug implementation:
# implementation detail: this property is implemented as non-data
# descriptor. non-data descriptors are only invoked if there is
# no entry with the same name in the instance's __dict__.
# this allows us to completely get rid of the access function call
# overhead. If one choses to invoke __get__ by hand the property
# will still work as expected because the lookup logic is replicated
# in __get__ for manual invocation.
Note that as of Python 3.8, the standard library has a similar object, @functools.cached_property. Its implementation is a little more robust: it guards against accidental re-use under a different name and produces a better error message if used on an object without a __dict__ attribute or where that object doesn't support item assignment. Up to Python 3.11 it also took a lock so that the getter ran only once per instance across threads; that lock was removed in Python 3.12.
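For comparison, a short usage sketch of the standard library version, assuming Python 3.8 or later. The Report class and its computations counter are hypothetical and only there to show the caching.

```python
import functools


class Report:
    def __init__(self):
        self.computations = 0

    @functools.cached_property
    def total(self):
        self.computations += 1
        return sum(range(10))


report = Report()
first = report.total   # computed on first access
second = report.total  # read back from the instance __dict__
```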
I am new to python, and I don't quite understand the __func__ in python 2.7.
I know when I define a class like this:
class Foo:
    def f(self, arg):
        print arg
I can use either Foo().f('a') or Foo.f(Foo(), 'a') to call this method. However, I can't call this method by Foo.f(Foo, 'a'). But I accidentally found that I can use Foo.f.__func__(Foo, 'a') or even Foo.f.__func__(1, 'a') to get the same result.
I print out the values of Foo.f, Foo().f and Foo.f.__func__, and they are all different. However, I have only one piece of code in definition. Who can help to explain how above code actually works, especially the __func__? I get really confused now.
When you access Foo.f or Foo().f a method is returned; it's unbound in the first case and bound in the second. A python method is essentially a wrapper around a function that also holds a reference to the class it is a method of. When bound, it also holds a reference to the instance.
When you call a method, it'll do a type-check on the first argument passed in to make sure it is an instance (it has to be an instance of the referenced class, or a subclass of that class). When the method is bound, it'll provide that first argument; on an unbound method you provide it yourself.
It's this method object that has the __func__ attribute, which is just a reference to the wrapped function. By accessing the underlying function instead of calling the method, you remove the typecheck, and you can pass in anything you want as the first argument. Functions don't care about their argument types, but methods do.
Note that in Python 3, this has changed; Foo.f just returns the function, not an unbound method. Foo().f returns a method still, still bound, but there is no way to create an unbound method any more.
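A small Python 3 sketch of that difference, with f returning its argument instead of printing it so the results can be inspected. The variable names are made up for illustration.

```python
class Foo:
    def f(self, arg):
        return arg


bound = Foo().f

# The bound method wraps the very function stored on the class.
wraps_class_function = bound.__func__ is Foo.f

# In Python 3, Foo.f is a plain function, so there is no type check on the
# first argument and no __func__ attribute to unwrap.
result_with_class = Foo.f(Foo, "a")
result_with_int = Foo.f(1, "a")
class_attr_has_func = hasattr(Foo.f, "__func__")
```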
Under the hood, each function object has a __get__ method, this is what returns the method object:
>>> class Foo(object):
...     def f(self): pass
...
>>> Foo.f
<unbound method Foo.f>
>>> Foo().f
<bound method Foo.f of <__main__.Foo object at 0x11046bc10>>
>>> Foo.__dict__['f']
<function f at 0x110450230>
>>> Foo.f.__func__
<function f at 0x110450230>
>>> Foo.f.__func__.__get__(Foo(), Foo)
<bound method Foo.f of <__main__.Foo object at 0x11046bc50>>
>>> Foo.f.__func__.__get__(None, Foo)
<unbound method Foo.f>
This isn't the most efficient code path, so Python 3.7 added a new LOAD_METHOD / CALL_METHOD opcode pair that replaces the LOAD_ATTR / CALL_FUNCTION opcode pair for method calls, precisely to avoid creating a new method object each time. This optimisation transforms the execution path for instance.foo() from type(instance).__dict__['foo'].__get__(instance, type(instance))() to type(instance).__dict__['foo'](instance), 'manually' passing the instance directly to the function object. This saves about 20% time on existing microbenchmarks.
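The two execution paths can be written out by hand to see that they do the same work. This is only a Python-level illustration of the equivalence, not a view of the bytecode; Counter and increment are hypothetical names.

```python
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count


instance = Counter()
func = type(instance).__dict__["increment"]

# Long path: build a bound method through the descriptor protocol, then call it.
via_bound_method = func.__get__(instance, type(instance))()

# Short path: hand the instance straight to the function, no method object needed.
via_direct_call = func(instance)
```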
Consider this example of a strategy pattern in Python (adapted from the example here). In this case the alternate strategy is a function.
class StrategyExample(object):
    def __init__(self, strategy=None) :
        if strategy:
            self.execute = strategy

    def execute(*args):
        # I know that the first argument for a method
        # must be 'self'. This is just for the sake of
        # demonstration
        print locals()

#alternate strategy is a function
def alt_strategy(*args):
    print locals()
Here are the results for the default strategy.
>>> s0 = StrategyExample()
>>> print s0
<__main__.StrategyExample object at 0x100460d90>
>>> s0.execute()
{'args': (<__main__.StrategyExample object at 0x100460d90>,)}
In the above example s0.execute is a method (not a plain vanilla function) and hence the first argument in args, as expected, is self.
Here are the results for the alternate strategy.
>>> s1 = StrategyExample(alt_strategy)
>>> s1.execute()
{'args': ()}
In this case s1.execute is a plain vanilla function and as expected, does not receive self. Hence args is empty. Wait a minute! How did this happen?
Both the method and the function were called in the same fashion. How does a method automatically get self as the first argument? And when a method is replaced by a plain vanilla function how does it not get the self as the first argument?
The only difference that I was able to find was when I examined the attributes of default strategy and alternate strategy.
>>> print dir(s0.execute)
['__cmp__', '__func__', '__self__', ...]
>>> print dir(s1.execute)
# does not have __self__ attribute
Does the presence of __self__ attribute on s0.execute (the method), but lack of it on s1.execute (the function) somehow account for this difference in behavior? How does this all work internally?
You can read the full explanation here in the python reference, under "User defined methods". A shorter and easier explanation can be found in the python tutorial's description of method objects:
If you still don’t understand how methods work, a look at the implementation can perhaps clarify matters. When an instance attribute is referenced that isn’t a data attribute, its class is searched. If the name denotes a valid class attribute that is a function object, a method object is created by packing (pointers to) the instance object and the function object just found together in an abstract object: this is the method object. When the method object is called with an argument list, a new argument list is constructed from the instance object and the argument list, and the function object is called with this new argument list.
Basically, what happens in your example is this:
a function assigned to a class (as happens when you declare a method inside the class body) is... a method.
When you access that method through the class, e.g. StrategyExample.execute, you get an "unbound method": it doesn't "know" to which instance it "belongs", so if you want to use that on an instance, you would need to provide the instance as the first argument yourself, e.g. StrategyExample.execute(s0)
When you access the method through the instance, e.g. self.execute or s0.execute, you get a "bound method": it "knows" which object it "belongs" to, and will get called with the instance as the first argument.
a function that you assign to an instance attribute directly however, as in self.execute = strategy or even s0.execute = strategy is... just a plain function (contrary to a method, it doesn't pass via the class)
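The difference between the two lookups can be reproduced in Python 3 with a version of the question's example that returns its arguments instead of printing locals(). This is a sketch adapted for inspection, not the original code.

```python
import types


class StrategyExample:
    def __init__(self, strategy=None):
        if strategy:
            # Stored in the instance __dict__, so the function is returned
            # as-is and its __get__ is never invoked.
            self.execute = strategy

    def execute(*args):
        return args


def alt_strategy(*args):
    return args


s0 = StrategyExample()
s1 = StrategyExample(alt_strategy)

default_args = s0.execute()    # found on the class, so bound to s0
alternate_args = s1.execute()  # found on the instance, so left unbound
```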
To get your example to work the same in both cases:
either you turn the function into a "real" method: you can do this with types.MethodType:
self.execute = types.MethodType(strategy, self, StrategyExample)
(you more or less tell the class that when execute is asked for this particular instance, it should turn strategy into a bound method)
or - if your strategy doesn't really need access to the instance - you go the other way around and turn the original execute method into a static method (making it a normal function again: it won't get called with the instance as the first argument, so s0.execute() will do exactly the same as StrategyExample.execute()):
@staticmethod
def execute(*args):
    print locals()
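Both options can be sketched in Python 3, where types.MethodType takes only the function and the instance (the three-argument form shown above is Python 2). BoundStrategy and StaticStrategy are hypothetical class names used to keep the two variants side by side.

```python
import types


def alt_strategy(*args):
    return args


class BoundStrategy:
    """Option 1: turn the replacement function into a bound method."""

    def __init__(self, strategy=None):
        if strategy:
            # Python 3 form: MethodType(function, instance).
            self.execute = types.MethodType(strategy, self)

    def execute(*args):
        return args


class StaticStrategy:
    """Option 2: make the default a static method so neither receives self."""

    def __init__(self, strategy=None):
        if strategy:
            self.execute = strategy

    @staticmethod
    def execute(*args):
        return args


bound_default = BoundStrategy()
bound_alternate = BoundStrategy(alt_strategy)
static_default = StaticStrategy()
static_alternate = StaticStrategy(alt_strategy)
```

With the first option both strategies receive the instance; with the second neither does.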
You need to assign an unbound method (i.e. with a self parameter) to the class or a bound method to the object.
Via the descriptor mechanism, you can make your own bound methods, it's also why it works when you assign the (unbound) function to a class:
my_instance = MyClass()
MyClass.my_method = my_method
When calling my_instance.my_method(), the lookup will not find an entry on my_instance, which is why it will at a later point end up doing this: MyClass.my_method.__get__(my_instance, MyClass) - this is the descriptor protocol. This will return a new method that is bound to my_instance, which you then execute using the () operator after the property.
This shares the method among all instances of MyClass, no matter when they were created. However, an instance that already has its own attribute with the same name will "hide" the method.
If you only want specific objects to have that method, just create a bound method manually:
my_instance.my_method = my_method.__get__(my_instance, MyClass)
For more detail about descriptors (a guide), see here.
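A short Python 3 sketch of the three situations described in this answer: assigning to the class, an instance attribute hiding the class-level method, and binding by hand for a single object. The function bodies and the only_here name are invented for illustration.

```python
class MyClass:
    pass


def my_method(self):
    return "called on " + type(self).__name__


early = MyClass()

# Assigning to the class makes the method visible on every instance,
# including ones created before the assignment.
MyClass.my_method = my_method
late = MyClass()

# An instance attribute of the same name hides the class-level method.
shadowed = MyClass()
shadowed.my_method = lambda: "instance-level override"


def only_here(self):
    return "only on this instance"


# Binding by hand attaches a method to one object only.
special = MyClass()
special.only_here = only_here.__get__(special, MyClass)
```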
The method is a wrapper for the function, and calls the function with the instance as the first argument. Yes, it contains a __self__ attribute (also im_self in Python prior to 3.x) that keeps track of which instance it is attached to. However, adding that attribute to a plain function won't make it a method; you need to add the wrapper. To build the wrapper you call the method type's constructor; you may want to take it as MethodType from the types module rather than using type(some_obj.some_method).
The function wrapped, by the way, is accessible through the __func__ (or im_func) attribute of the method.
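A minimal Python 3 sketch of both points: setting __self__ on a plain function does not bind it, while wrapping it with the method type does. Thing and replacement are hypothetical names; some_obj and some_method follow the text above.

```python
import types


class Thing:
    def some_method(self):
        return "original"


def replacement(self):
    return "replacement for " + type(self).__name__


some_obj = Thing()

# Setting __self__ on a plain function does not turn it into a method:
# calling it still fails because nothing supplies the first argument.
replacement.__self__ = some_obj
try:
    replacement()
    attribute_alone_binds = True
except TypeError:
    attribute_alone_binds = False
del replacement.__self__

# The method type can be taken from an existing method or from the types module.
method_type = type(some_obj.some_method)
same_constructor = method_type is types.MethodType

# Wrapping the function is what actually produces a bound method.
wrapped = types.MethodType(replacement, some_obj)
```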
When you do self.execute = strategy you set the attribute to a plain function:
>>> s = StrategyExample()
>>> s.execute
<bound method StrategyExample.execute of <__main__.StrategyExample object at 0x1dbbb50>>
>>> s2 = StrategyExample(alt_strategy)
>>> s2.execute
<function alt_strategy at 0x1dc1848>
A bound method is a callable object that calls a function passing an instance as the first argument in addition to passing through all arguments it was called with.
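That definition fits in a few lines of pure Python. The BoundMethod class below is a hypothetical model for illustration only; the real method type is implemented in C and does more, such as comparison and attribute forwarding.

```python
class BoundMethod:
    """Hypothetical pure-Python model of a bound method object."""

    def __init__(self, func, instance):
        self.__func__ = func
        self.__self__ = instance

    def __call__(self, *args, **kwargs):
        # Prepend the stored instance, then pass everything else through.
        return self.__func__(self.__self__, *args, **kwargs)


def describe(self, greeting, punctuation="!"):
    return f"{greeting}, {self}{punctuation}"


method = BoundMethod(describe, "instance")
result = method("hello", punctuation="?")
```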
See: Python: Bind an Unbound Method?