I just read about descriptors and it felt very counterintuitive that the behavior of a class can depend on who uses it. The two methods
__get__(self, instance, owner)
__set__(self, instance, value)
do exactly that. They receive the instance of the class that uses them. What is the reason for this design decision? How is it used?
Update: I think of descriptors as normal types. The class that uses them as a member type can easily be manipulated by side effects of the descriptor. Here is an example of what I mean. Why does Python support that?
class Age(object):
    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner):
        instance.name = 'You got manipulated'
        return self.value

class Person(object):
    age = Age(42)
    name = 'Peter'

peter = Person()
print(peter.name, 'is', peter.age)
__get__ and __set__ receive no information about who's calling them. The 3 arguments are the descriptor object itself, the object whose attribute is being accessed, and the type of the object.
I think the best way to clear this up is with an example. So, here's one:
class Foo:
    def descriptor(self):
        return

foo_instance = Foo()
method_object = foo_instance.descriptor
Functions are descriptors. When you access an object's method, the method object is created by finding the function that implements the method and calling __get__. Here,
method_object = foo_instance.descriptor
calls descriptor.__get__(foo_instance, Foo) to create the method_object. The __get__ method receives no information about who's calling it, only the information needed to perform its task of attribute access.
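For illustration, you can reproduce that binding by calling __get__ explicitly yourself (the exact repr will vary):

>>> manual = Foo.__dict__['descriptor'].__get__(foo_instance, Foo)
>>> manual
<bound method Foo.descriptor of <__main__.Foo object at 0x...>>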
Descriptors are used to implement binding behaviour; a descriptor requires a context, the object on which it acts.
That object is the instance object passed in.
Note that without a descriptor, attribute access on an object acts directly on the object's attributes (the instance __dict__ when setting or deleting; otherwise the class and base class attributes are searched as well).
A descriptor lets you delegate that access to a separate object entirely, encapsulating getting, setting and deleting. But to be able to do so, that object needs access to the context, the instance. Because getting an attribute also normally searches the class and its bases, the __get__ descriptor method is also passed the class (owner) of the instance.
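As a minimal sketch of such delegation (the names Positive and Order are illustrative, and __set_name__ needs Python 3.6+), a data descriptor that validates assignments and stores its data per instance might look like this:

class Positive:
    # A data descriptor that validates assignments, storing data per instance.
    def __set_name__(self, owner, name):
        self.name = name  # remember which attribute we were assigned to

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class: return the descriptor itself
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError('must be positive')
        # Safe to reuse the name: data descriptors take precedence over
        # the instance __dict__ on lookup.
        instance.__dict__[self.name] = value

class Order:
    quantity = Positive()

order = Order()
order.quantity = 3     # goes through Positive.__set__
print(order.quantity)  # 3, via Positive.__get__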
Take functions, for example. A function is a descriptor too, and binding one to an instance produces a method. A class can have any number of instances, but it would be wasteful to store bound methods on all those instances at creation time.
Instead, functions are bound dynamically; you look up the function name on the instance, the function is found on the class instead, and with a call to __get__ the function is bound to the instance, returning a method object. This method object can then pass in the instance to the function when called, producing the self argument.
An example of the descriptor protocol in action is bound methods. When you access an instance method o.foo you can either call it immediately or save it into a variable: a = o.foo. Now, when you call a(x, y, z) the instance o is passed to foo as the first self parameter:
class C(object):
    def foo(self, x, y, z):
        print(self, x, y, z)

o = C()
a = o.foo
a(1, 2, 3)  # prints <__main__.C object at 0x...> 1 2 3
This works because functions implement the descriptor protocol; when you __get__ a function on an object instance it returns a bound method, with the instance bound to the function.
There would be no way for the above to work without the descriptor protocol giving access to the object instance.
Related
I'd like a particular function to be callable as a classmethod, and to behave differently when it's called on an instance.
For example, if I have a class Thing, I want Thing.get_other_thing() to work, but also thing = Thing(); thing.get_other_thing() to behave differently.
I think overwriting the get_other_thing method on initialization should work (see below), but that seems a bit hacky. Is there a better way?
class Thing:
    def __init__(self):
        self.get_other_thing = self._get_other_thing_inst

    @classmethod
    def get_other_thing(cls):
        ...  # do something

    def _get_other_thing_inst(self):
        ...  # do something else
Great question! What you seek can be easily done using descriptors.
Descriptors are Python objects which implement the descriptor protocol, usually starting with __get__().
They exist, mostly, to be set as a class attribute on different classes. Upon accessing them, their __get__() method is called, with the instance and owner class passed in.
class DifferentFunc:
    """Deploys a different function according to attribute access.

    I am a descriptor.
    """
    def __init__(self, clsfunc, instfunc):
        # Set our functions
        self.clsfunc = clsfunc
        self.instfunc = instfunc

    def __get__(self, inst, owner):
        # Accessed from the class
        if inst is None:
            return self.clsfunc.__get__(None, owner)
        # Accessed from an instance
        return self.instfunc.__get__(inst, owner)

class Test:
    @classmethod
    def _get_other_thing(cls):
        print("Accessed through class")

    def _get_other_thing_inst(inst):
        print("Accessed through instance")

    get_other_thing = DifferentFunc(_get_other_thing,
                                    _get_other_thing_inst)
And now for the result:
>>> Test.get_other_thing()
Accessed through class
>>> Test().get_other_thing()
Accessed through instance
That was easy!
By the way, did you notice me using __get__ on the class and instance function? Guess what? Functions are also descriptors, and that's the way they work!
>>> def func(self):
... pass
...
>>> func.__get__(object(), object)
<bound method func of <object object at 0x000000000046E100>>
Upon accessing a function attribute, its __get__ is called, and that's how you get function binding.
For more information, I highly suggest reading the Python manual and the "How-To" linked above. Descriptors are one of Python's most powerful features and are barely even known.
Why not set the function on instantiation?
Or: why not set self.func = self._func inside __init__?
Setting the function on instantiation comes with quite a few problems:
self.func = self._func causes a circular reference. The instance is stored inside the bound method object returned by self._func, which in turn is stored on the instance during the assignment. The end result is that the instance references itself and will be cleaned up in a much slower and heavier manner (see the sketch after this list).
Other code interacting with your class might attempt to take the function straight out of the class, and use __get__(), which is the usual expected method, to bind it. They will receive the wrong function.
Will not work with __slots__.
Although with descriptors you need to understand the mechanism, setting functions in __init__ isn't as clean and requires a separate assignment in __init__ for every such function.
Takes more memory. Instead of storing one single function, you store a bound function for each and every instance.
Will not work with properties.
There are many more that I didn't add as the list goes on and on.
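To make the circular-reference and memory points concrete, here is a small sketch (the class name is illustrative) showing that binding in __init__ stores a bound method on every instance, and that the stored method references the instance back:

class WithInitBinding:
    def __init__(self):
        # Stores a bound method in this instance's __dict__.
        self.get_thing = self._get_thing

    def _get_thing(self):
        return 42

obj = WithInitBinding()
print('get_thing' in vars(obj))       # True: every instance carries its own copy
print(obj.get_thing.__self__ is obj)  # True: the stored method points back at obj,
                                      # so the instance now references itself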
Here is a slightly hacky solution:
class Thing(object):
    @staticmethod
    def get_other_thing():
        return 1

    def __getattribute__(self, name):
        if name == 'get_other_thing':
            return lambda: 2
        return super(Thing, self).__getattribute__(name)

print(Thing.get_other_thing())    # 1
print(Thing().get_other_thing())  # 2
If we access it on the class, the staticmethod is executed. If we access it on an instance, __getattribute__ runs first, so we can return not Thing.get_other_thing but some other function (a lambda in my case).
Consider:
class Parent():
    def __init__(self, last_name, eye_color):
        self.last_name = last_name
        self.eye_color = eye_color

    def show_info(self):
        print("Last Name - " + self.last_name)
        print("Eye Color - " + self.eye_color)

billy_cyrus = Parent("Cyrus", "blue")
The above is from the Udacity Python course. I discovered I'm able to call show_info for instance billy_cyrus using either of the following:
billy_cyrus.show_info()
Parent.show_info(billy_cyrus)
I'm curious as to why. Is there a difference between the two methods? If so when would one be used vs. the other? I'm using Python 3.6 if that matters.
In terms of just calling the method, there is no difference most of the time. In terms of how the underlying machinery works, there is a bit of a difference.
Since show_info is a method, it is a descriptor in the class. That means that when you access it through an instance in which it is not shadowed by another attribute, the . operator calls __get__ on the descriptor to create a bound method for that instance. A bound method is basically a closure that passes in the self parameter for you before any of the other arguments you supply. You can see the binding happen like this:
>>> billy_cyrus.show_info
<bound method Parent.show_info of <__main__.Parent object at 0x7f7598b14be0>>
A different bound method object is created every time you use the . operator to access the method through an instance.
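You can verify that a fresh object is produced on each access:

>>> billy_cyrus.show_info is billy_cyrus.show_info
False
>>> billy_cyrus.show_info == billy_cyrus.show_info
True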
If you access the method through the class object, on the other hand, it does not get bound. The method is a descriptor, which is just a regular attribute of the class:
>>> Parent.show_info
<function __main__.Parent.show_info>
You can simulate the exact behavior of binding a method before calling it by calling its __get__ yourself:
>>> bound_meth = Parent.show_info.__get__(billy_cyrus, type(billy_cyrus))
>>> bound_meth
<bound method Parent.show_info of <__main__.Parent object at 0x7f7598b14be0>>
Again, this will not make any difference to you in 99.99% of cases, since functionally bound_meth() and Parent.show_info(billy_cyrus) end up calling the same underlying function object with the same parameters.
Where it matters
There are a couple of places where it matters how you call a class method. One common use case is when you override a method, but want to use the definition provided in the parent class. For example, say I have a class that I made "immutable" by overriding __setattr__. I can still set attributes on the instance, as in the __init__ method shown below:
class Test:
    def __init__(self, a):
        object.__setattr__(self, 'a', a)

    def __setattr__(self, name, value):
        raise ValueError('I am immutable!')
If I tried to do a normal call to __setattr__ in __init__ by doing self.a = a, a ValueError would be raised every time. But by using object.__setattr__, I can bypass this limitation. Alternatively, I could do super().__setattr__('a', a) for the same effect, or self.__dict__['a'] = a for a very similar one.
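For example, the class above behaves like this:

>>> t = Test(1)
>>> t.a
1
>>> t.a = 2
Traceback (most recent call last):
  ...
ValueError: I am immutable!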
@Silvio Mayolo's answer has another good example, where you would deliberately want to use the class method as a function that could be applied to many objects.
Another place it matters (although not in terms of calling methods), is when you use other common descriptors like property. Unlike methods, properties are data-descriptors. This means that they define a __set__ method (and optionally __delete__) in addition to __get__. A property creates a virtual attribute whose getter and setter are arbitrarily complex functions instead of just simple assignments. To properly use a property, you have to do it through the instance. For example:
class PropDemo:
    def __init__(self, x=0):
        self.x = x

    @property
    def x(self):
        return self.__dict__['x']

    @x.setter
    def x(self, value):
        if value < 0:
            raise ValueError('Not negatives, please!')
        self.__dict__['x'] = value
Now you can do something like
>>> inst = PropDemo()
>>> inst.x
0
>>> inst.x = 3
>>> inst.x
3
If you try to access the property through the class, you can get the underlying descriptor object since it will be an unbound attribute:
>>> PropDemo.x
<property at 0x7f7598af00e8>
On a side note, hiding attributes with the same name as a property in __dict__ is a neat trick that works because data descriptors in a class __dict__ trump entries in the instance __dict__, even though instance __dict__ entries trump non-data-descriptors in a class.
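You can see this with PropDemo above: the instance __dict__ holds an 'x' entry, yet attribute access is still routed through the property:

>>> inst = PropDemo(5)
>>> inst.__dict__
{'x': 5}
>>> inst.x  # the data descriptor wins over the instance __dict__
5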
Where it can Get Weird
You can override a class method with an instance method in Python. That would mean that type(foo).bar(foo) and foo.bar() don't call the same underlying function at all. This is irrelevant for magic methods because they always use the former invocation, but it can make a big difference for normal method calls.
There are a few ways to override a method on an instance. The one I find most intuitive is to set the instance attribute to a bound method. Here is an example of a modified billy_cyrus, assuming the definition of Parent in the original question:
def alt_show_info(self):
    print('Another version of', self)

billy_cyrus.show_info = alt_show_info.__get__(billy_cyrus, Parent)
In this case, calling the method on the instance vs the class would have completely different results. This only works because methods are non-data descriptors, by the way. If they were data descriptors (with a __set__ method), the assignment billy_cyrus.show_info = alt_show_info.__get__(billy_cyrus, Parent) would not override anything but would instead just redirect to __set__, and manually setting it in billy_cyrus's __dict__ would just get it ignored, as happens with a property.
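With that override in place (the exact reprs will vary):

>>> billy_cyrus.show_info()
Another version of <__main__.Parent object at 0x...>
>>> Parent.show_info(billy_cyrus)
Last Name - Cyrus
Eye Color - blue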
Additional Resources
Here are a couple of resources on descriptors:
Python Reference - Descriptor Protocol: http://python-reference.readthedocs.io/en/latest/docs/dunderdsc/
(Official?) Descriptor HowTo Guide: https://docs.python.org/3/howto/descriptor.html
There is no semantic difference between the two. It's entirely a matter of style. You would generally use billy_cyrus.show_info() in normal use, but the fact that the second approach is allowed permits you to use Parent.show_info to get the method as a first-class object itself. If that were not allowed, then it would not be possible (or at least, it would be fairly difficult) to do something like this:
function = Parent.show_info
so_many_billy_cyrus = [billy_cyrus, billy_cyrus, billy_cyrus]
list(map(function, so_many_billy_cyrus))  # map is lazy in Python 3; list() forces the calls
From Python 3's documentation, super() "returns a proxy object that delegates method calls to a parent or sibling class of type." What does that mean?
Suppose I have the following code:
class SuperClass():
    def __init__(self):
        print("__init__ from SuperClass.")
        print("self object id from SuperClass: " + str(id(self)))

class SubClass(SuperClass):
    def __init__(self):
        print("__init__ from SubClass.")
        print("self object id from SubClass: " + str(id(self)))
        super().__init__()

sc = SubClass()
The output I get from this is:
__init__ from SubClass.
self object id from SubClass: 140690611849200
__init__ from SuperClass.
self object id from SuperClass: 140690611849200
This means that in the line super().__init__(), super() is returning the current object which is then implicitly passed to the superclass' __init__() method. Is this accurate or am I missing something here?
To put it simply, I want to understand the following:
When super().__init__() is run,
What exactly is being passed to __init__() and how? We are calling it on super() so whatever this is returning should be getting passed to the __init__() method from what I understand about Python so far.
Why don't we have to pass in self to super().__init__()?
returns a proxy object that delegates method calls to a parent or
sibling class of type.
This proxy is an object that acts as the method-calling portion of the parent class. It is not the class itself; rather, it's just enough information so that you can use it to call the parent class methods.
If you call __init__(), you get your own, local, sub-class __init__ function. When you call super(), you get that proxy object, which will redirect you to the parent-class methods. Thus, when you call super().__init__(), that proxy redirects the call to the parent-class __init__ method.
Similarly, if you were to call super().foo, you would get the foo method from the parent class -- again, re-routed by that proxy.
Is that clear to you?
Responses to OP comments
But that must mean that this proxy object is being passed to
__init__() when running super().__init__() right?
Wrong. The proxy object is like a package name, such as calling math.sqrt(). You're not passing math to sqrt, you're using it to denote which sqrt you're using. If you wanted to pass the proxy to __init__, the call would be __init__(super()). That call would be semantically ridiculous, of course.
When we have to actually pass in self which is the sc object in my example.
No, you are not passing in sc; that is the result of the object creation call (the internal method __new__), which includes an invocation of __init__. For __init__, the self object is a new item created for you by the Python run-time system. For most class methods, that first argument (called self by convention, this in other languages) is the object that invoked the method.
This means that in the line super().__init__(), super() is returning the current object which is then implicitly passed to the superclass' __init__() method. Is this accurate or am I missing something here?
>>> help(super)
super() -> same as super(__class__, <first argument>)
super call returns a proxy/wrapper object which remembers:
The instance invoking super()
The class of the calling object
The class that's invoking super()
This is perfectly sound. super always fetches the attribute from the next class in the hierarchy (really, the MRO) that has the attribute you're looking for. So it's not returning the current object; rather, and more accurately, it returns an object that remembers enough information to search for attributes higher in the class hierarchy.
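Using the SubClass/SuperClass example from the question, you can inspect the hierarchy that super searches:

>>> SubClass.__mro__
(<class '__main__.SubClass'>, <class '__main__.SuperClass'>, <class 'object'>)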
What exactly is being passed to __init__() and how? We are calling it on super() so whatever this is returning should be getting passed to the __init__() method from what I understand about Python so far.
You're almost right. But super loves to play tricks on us. The super class defines __getattribute__, the method responsible for attribute search. When you do something like super().y(), super.__getattribute__ gets called searching for y. Once it finds y, it passes the instance that invoked the super call to y. Also, super has a __get__ method, which makes it a descriptor; I'll omit the details of descriptors here, refer to the documentation to know more. This answers your second question as well, as to why self isn't passed explicitly.
Note: super is a little bit different and relies on some magic. For almost all other classes, the behavior is the same. That is:
a = A() # A is a class
a.y() # same as A.y(a), self is a
But super is different:
class A:
    def y(self):
        return self

class B(A):
    def y(self):
        return super().y()  # equivalent to: A.y(self)

b = B()
b.y() is b  # True: returns b, not super(); self is b, not super()
I wrote a simple test to investigate what CPython does for super:
class A:
    pass

class B(A):
    def f(self):
        return super()

    @classmethod
    def g(cls):
        return super()

    def h(selfish):
        selfish = B()
        return super()

class C(B):
    pass

c = C()
for method in 'fgh':
    super_object = getattr(c, method)()
    # These attributes were found using dir().
    print(super_object, super_object.__self__,
          super_object.__self_class__, super_object.__thisclass__)
The zero-argument super call returns an object that stores three things:
__self__ stores the object whose name matches the first parameter of the method—even if that name has been reassigned.
__self_class__ stores its type, or itself in the case of a class method.
__thisclass__ stores the class in which the method is defined.
(It is unfortunate that __thisclass__ was implemented this way rather than fetching an attribute on the method because it makes it impossible to use the zero-argument form of super with meta-programming.)
The object returned by super implements __getattribute__, which forwards method calls to the type found in the __mro__ of __self_class__ one step after __thisclass__.
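As a rough sketch (an approximation, not CPython's actual implementation), the search performed by that __getattribute__ looks something like this:

def super_lookup(sup, name):
    # Approximate the attribute search done by a super object.
    mro = sup.__self_class__.__mro__
    # Start one step past the class in which the calling method was defined.
    start = mro.index(sup.__thisclass__) + 1
    for klass in mro[start:]:
        if name in vars(klass):
            attr = vars(klass)[name]
            # Bind descriptors (such as functions) to the remembered object.
            if hasattr(attr, '__get__'):
                return attr.__get__(sup.__self__, sup.__self_class__)
            return attr
    raise AttributeError(name)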
I was searching for the meaning of the default parameters (object, self) that appear in class and function definitions. Moving on from that: when accessing an attribute of a class, should we use Foo (the class reference) or Foo() (an instance of the class)?
If you are reading a normal attribute then it doesn't matter. If you are binding a normal attribute then you must use the correct one in order for the code to work. If you are accessing a descriptor then you must use an instance.
The details of Python's class semantics are quite well documented in the data model; the __get__ semantics in particular are at work here. Instances basically stack their namespace on top of their class's namespace and add some boilerplate for calling methods.
There are some large "it depends on what you are doing" gotchas at work here. The most important question: do you want to access class or instance attributes? Second, do you want attribute or methods?
Let's take this example:
class Foo(object):
    bar = 1
    baz = 2

    def __init__(self, foobar="barfoo", baz=3):
        self.foobar = foobar
        self.baz = baz

    def meth(self, param):
        print(self, param)

    @classmethod
    def clsmeth(cls, param):
        print(cls, param)

    @staticmethod
    def stcmeth(param):
        print(param)
Here, bar is a class attribute, so you can get it via Foo.bar. Since instances have implicit access to their class namespace, you can also get it as Foo().bar. foobar is an instance attribute, since it is never bound to the class (only instances, i.e. selfs) - you can only get it as Foo().foobar. Last, baz is both a class and an instance attribute. By default, Foo.baz == 2 and Foo().baz == 3, since the class attribute is hidden by the instance attribute set in __init__.
Similarly, in an assignment there are slight differences whether you work on the class or an instance. Foo.bar=2 will set the class attribute (also for all instances) while Foo().bar=2 will create an instance attribute that shadows the class attribute for this specific instance.
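A quick demonstration of that shadowing behaviour:

>>> f = Foo()
>>> Foo.bar = 5   # rebinding the class attribute is visible through instances
>>> f.bar
5
>>> f.bar = 99    # this creates an instance attribute that shadows the class one
>>> Foo.bar, f.bar
(5, 99)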
For methods, it is somewhat similar. However, here you get the implicit self parameter for instance methods (which is what a function defined in a class becomes). Basically, the call Foo().meth(param=x) is silently translated to Foo.meth(self=Foo(), param=x). This is why it is usually not valid to call Foo.meth(param=x) - meth is not "bound" to an instance and thus lacks the self parameter.
Now, sometimes you do not need any instance data in a method - for example, you have a strict string transformation that is an implementation detail of a larger parser class. This is where @classmethod and @staticmethod come into play. A classmethod's first parameter is always the class, as opposed to the instance for regular methods. Foo().clsmeth(param=x) and Foo.clsmeth(param=x) result in a call of clsmeth(cls=Foo, param=x); here, the two are equivalent. Going one step further, a staticmethod doesn't get any class or instance information - it is like a raw function bound to the class's namespace.
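Concretely, with the Foo class above (reprs abbreviated):

>>> Foo().meth(param=1)    # the instance is passed implicitly as self
<__main__.Foo object at 0x...> 1
>>> Foo.clsmeth(param=2)   # the class is passed implicitly as cls
<class '__main__.Foo'> 2
>>> Foo.stcmeth(param=3)   # no implicit first argument at all
3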
class Cls():
    def __init__(self, start):
        self.value = start

    class Desc():
        def __get__(self, instance, owner):
            print("In Descriptor's __get__ method")
            return self.value

        def __set__(self, instance, start):
            print("In Descriptor's __set__ method")
            self.value = start

    value = Desc()

X = Cls('Hello')
X.value = "Hi"
The above implementation of a descriptor is obscure to me. X.value and Cls.value refer to the same object, which is of class str, but Cls.__dict__['value'] is the descriptor object. There are two types assigned to the one name 'value'. Can somebody explain this? What is the logic behind this particular implementation? Why is Cls.value or X.value not the descriptor object? I am using Python 3.3.
You are confusing things by using the name value for two things: one is an attribute of Cls, and its value is a descriptor object. The other is an attribute of that descriptor object, and that is the one whose value is the string.
The key thing to remember is that there is only one descriptor object, shared across all instances of the class. When you do self.value = start in your __set__ method, self refers to the descriptor object, so you are setting a "value" attribute on the descriptor object itself, not on the Cls instance. (If you change it to instance.value = start instead, you will get a recursion error, since that will try to call __set__ again.)
You will see what is going on if you create multiple instances of your class:
>>> x = Cls("oops")
In Descriptor's __set__ method
>>> y = Cls("dang")
In Descriptor's __set__ method
>>> x.value
In Descriptor's __get__ method
'dang'
>>> Cls.__dict__['value'].value
'dang'
Notice that creating y changed x.value. This is because there is only one descriptor object, and it only has one "value" attribute, so that value is shared across all instances of Cls.
It's not clear what you're trying to achieve here, so it's hard to say how to "fix" this. Some general principles are:
Don't use self in __get__ and __set__ unless you want to store class-level information. To store instance-specific data, you need to use instance (see the sketch after this list).
Even if you do the above, don't use the same name for the descriptor attribute itself and the hidden attribute where it stores its data, or you will step on your own toes.
Unless you want to do something really fancy, you probably can just use property and not write your own descriptor at all. If you "fixed" the descriptor you wrote above, it would still be useless, because it wouldn't do anything that property doesn't already do.
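For reference, a minimal per-instance rewrite of the descriptor above might look like this (the storage key '_value' is an arbitrary, illustrative choice):

class Desc:
    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class: return the descriptor itself
        return instance.__dict__['_value']

    def __set__(self, instance, start):
        # Store on the instance, under a different name than the descriptor.
        instance.__dict__['_value'] = start

class Cls:
    value = Desc()

    def __init__(self, start):
        self.value = start  # routed through Desc.__set__

x = Cls("oops")
y = Cls("dang")
print(x.value, y.value)  # oops dang: each instance keeps its own data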