Particular implementation of Python Descriptors - python

class Cls():
    def __init__(self, start):
        self.value = start

    class Desc():
        def __get__(self, instance, owner):
            print("In Descriptor's __get__method")
            return self.value

        def __set__(self, instance, start):
            print("In Descriptor's __set__ method")
            self.value = start

    value = Desc()

X = Cls('Hello')
X.value = "Hi"
The above implementation of a descriptor is obscure to me. X.value and Cls.value refer to the same object, which is of class str, but Cls.__dict__['value'] is a descriptor object. There are two types assigned to the name 'value'.
Can somebody explain this? What is the logic behind this particular implementation? Why is Cls.value or X.value not the descriptor object? I am using Python 3.3.

You are confusing things by using the name value for two things: one is an attribute of Cls, and its value is a descriptor object. The other is an attribute of that descriptor object, and that is the one whose value is the string.
The key thing to remember is that there is only one descriptor object, shared across all instances of the class. When you do self.value = start in your __set__ method, self refers to the descriptor object, so you are setting a "value" attribute on the descriptor object itself, not on the Cls instance. (If you change it to instance.value = start instead, you will get a recursion error, since that will try to call __set__ again.)
You will see what is going on if you create multiple instances of your class:
>>> x = Cls("oops")
In Descriptor's __set__ method
>>> y = Cls("dang")
In Descriptor's __set__ method
>>> x.value
In Descriptor's __get__method
'dang'
>>> Cls.__dict__['value'].value
'dang'
Notice that creating y changed x.value. This is because there is only one descriptor object, and it only has one "value" attribute, so that value is shared across all instances of Cls.
It's not clear what you're trying to achieve here, so it's hard to say how to "fix" this. Some general principles are:
Don't use self in __get__ and __set__ unless you want to store class-level information. To store instance-specific data, you need to use instance.
Even if you do the above, don't use the same name for the descriptor attribute itself and the hidden attribute where it stores its data, or you will step on your own toes.
Unless you want to do something really fancy, you probably can just use property and not write your own descriptor at all. If you "fixed" the descriptor you wrote above, it would still be useless, because it wouldn't do anything that property doesn't already do.
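As a rough sketch of the first two principles, assuming the goal was one value per instance: the descriptor below stores its data on instance, under a hidden name that differs from the descriptor's own name. The name _value is an illustrative choice, not something from the question.

```python
class Desc:
    """Descriptor that keeps its data on each instance, not on itself."""

    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed through the class (Cls.value): return the descriptor.
            return self
        # "_value" is a hypothetical hidden name, distinct from "value".
        return instance._value

    def __set__(self, instance, start):
        instance._value = start


class Cls:
    value = Desc()

    def __init__(self, start):
        self.value = start  # routed through Desc.__set__


x = Cls("oops")
y = Cls("dang")
print(x.value)  # oops
print(y.value)  # dang
```

Creating y no longer changes x.value, because each instance carries its own _value entry. As noted above, property would do the same job with less code.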


How does object.__getattribute__ redirect to __get__ method on my descriptor and __getattr__?

I have a two-part question regarding the implementation of object.__getattribute__(self, key), but both parts are centered around my confusion about how it works.
I've defined a data descriptor called NonNullStringDescriptor that I intended to attach to attributes.
class NonNullStringDescriptor:
    def __init__(self, value: str = "Default Name"):
        self.value = value

    def __get__(self, instance, owner):
        return self.value

    def __set__(self, instance, value):
        if isinstance(value, str) and len(value.strip()) > 0:
            self.value = value
        else:
            raise TypeError("The value provided is not a non-null string.")
I then declare a Person class with an attribute of name.
class Person:
    name = NonNullStringDescriptor()

    def __init__(self):
        self.age = 22

    def __getattribute__(self, key):
        print(f"__getattribute__({key})")
        v = super(Person, self).__getattribute__(key)
        print(f"Returned {v}")
        if hasattr(v, '__get__'):
            print("Invoking __get__")
            return v.__get__(None, self)
        return v

    def __getattr__(self, item):
        print(f"__getattr__ invoked.")
        return "Unknown"
Now, I try accessing variable attributes, some that are descriptors, some normal instance attributes, and others that don't exist:
person = Person()
print("Printing", person.age) # "normal" attribute
print("Printing", person.hobby) # non-existent attribute
print("Printing", person.name) # descriptor attribute
The output that I see is
__getattribute__(age)
Returned 22
Printing 22
__getattribute__(hobby)
__getattr__ invoked.
Printing Unknown
__getattribute__(name)
Returned Default Name
Printing Default Name
I have two main questions, both of which center around super(Person, self).__getattribute__(key):
When I attempt to access a non-existent attribute, like hobby, I see that it redirects to __getattr__, which I know is often the "fallback" method in attribute lookup, and it looks as though __getattribute__ is what invokes it. However, the Returned ... console output is never printed, meaning that the rest of __getattribute__ does not complete. So how exactly is __getattribute__ invoking __getattr__ directly and returning this default "Unknown" value without executing the rest of its own function body?
I would expect that what is returned from super(Person, self).__getattribute__(key) (v), is the data descriptor instance of NonNullStringDescriptor. However, I see that v is actually the string "Default Name" itself! So how does object.__getattribute__(self, key) just know to use the __get__ method of my descriptor, instead of returning the descriptor instance?
There's references to behavior in the Descriptor Protocol:
If the looked-up value is an object defining one of the descriptor
methods, then Python may override the default behavior and invoke the
descriptor method instead.
But it's never explicitly defined to me what is actually happening in object.__getattribute__(self, key) that performs the override. I know that ultimately person.name gets converted into a low-level call to type(person).__dict__["name"].__get__(person, type(person)). Is this all happening in object.__getattribute__?
I found this SO post, which describes proper implementation of __getattribute__, but I'm more curious at what is actually happening in object.__getattribute__. However, my IDE (PyCharm) only provides a stub for its implementation:
def __getattribute__(self, *args, **kwargs): # real signature unknown
    """ Return getattr(self, name). """
    pass
__getattribute__ doesn't call __getattr__. The __getattr__ fallback happens in the attribute access machinery, after __getattribute__ raises an AttributeError. If you want to see the implementation, it's in slot_tp_getattr_hook in Objects/typeobject.c.
object.__getattribute__ knows to call __get__ because there's code in object.__getattribute__ that calls __get__. It's pretty straightforward. If you want to see the implementation, object.__getattribute__ is PyObject_GenericGetAttr in Objects/object.c (yes, even though it says GetAttr - the C side of things is a little different from the Python side), and there are two __get__ call sites: one for data descriptors and one for non-data descriptors.
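The two answers above can be approximated in pure Python. This is only a rough sketch of the lookup order, not the actual C code: find_in_mro, generic_getattribute and getattr_machinery are made-up names, the Name and Person classes are simplified stand-ins for the ones in the question, and the sketch assumes every object has an instance __dict__.

```python
_MISSING = object()


def find_in_mro(cls, name):
    """Return the first class-level entry for name along the MRO, if any."""
    for base in cls.__mro__:
        if name in vars(base):
            return vars(base)[name]
    return _MISSING


def generic_getattribute(obj, name):
    """Rough emulation of object.__getattribute__ (hypothetical helper)."""
    cls = type(obj)
    cls_attr = find_in_mro(cls, name)
    attr_type = type(cls_attr)
    has_get = cls_attr is not _MISSING and hasattr(attr_type, "__get__")

    # 1. Data descriptors on the class win over the instance __dict__.
    if has_get and (hasattr(attr_type, "__set__") or hasattr(attr_type, "__delete__")):
        return attr_type.__get__(cls_attr, obj, cls)

    # 2. Then the instance __dict__ (assumes the object has one).
    if name in vars(obj):
        return vars(obj)[name]

    # 3. Then non-data descriptors and plain class attributes.
    if has_get:
        return attr_type.__get__(cls_attr, obj, cls)
    if cls_attr is not _MISSING:
        return cls_attr

    raise AttributeError(name)


def getattr_machinery(obj, name):
    """Rough emulation of the fallback: __getattr__ runs only after AttributeError."""
    try:
        return generic_getattribute(obj, name)
    except AttributeError:
        fallback = find_in_mro(type(obj), "__getattr__")
        if fallback is _MISSING:
            raise
        return fallback(obj, name)


class Name:
    """Simplified data descriptor standing in for NonNullStringDescriptor."""

    def __get__(self, instance, owner=None):
        return "Default Name"

    def __set__(self, instance, value):
        raise AttributeError("read-only in this sketch")


class Person:
    name = Name()

    def __init__(self):
        self.age = 22

    def __getattr__(self, item):
        return "Unknown"


person = Person()
print(getattr_machinery(person, "age"))    # 22
print(getattr_machinery(person, "hobby"))  # Unknown
print(getattr_machinery(person, "name"))   # Default Name
```

In this sketch the __get__ call happens inside generic_getattribute, which is why v in the question is already the string, and the __getattr__ fallback happens outside it, which is why the rest of the overridden __getattribute__ never runs for hobby.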

Difference between foo.bar() and bar(foo)?

Consider:
class Parent():
    def __init__(self, last_name, eye_color):
        self.last_name = last_name
        self.eye_color = eye_color

    def show_info(self):
        print("Last Name - "+self.last_name)
        print("Eye Color - "+self.eye_color)

billy_cyrus = Parent("Cyrus", "blue")
The above is from the Udacity Python course. I discovered I'm able to call show_info for instance billy_cyrus using either of the following:
billy_cyrus.show_info()
Parent.show_info(billy_cyrus)
I'm curious as to why. Is there a difference between the two methods? If so when would one be used vs. the other? I'm using Python 3.6 if that matters.
In terms of just calling the method, there is no difference most of the time. In terms of how the underlying machinery works, there is a bit of a difference.
Since show_info is a method, it is a descriptor in the class. That means that when you access it through an instance in which it is not shadowed by another attribute, the . operator calls __get__ on the descriptor to create a bound method for that instance. A bound method is basically a closure that passes in the self parameter for you before any of the other arguments you supply. You can see the binding happen like this:
>>> billy_cyrus.show_info
<bound method Parent.show_info of <__main__.Parent object at 0x7f7598b14be0>>
A different closure is created every time you use the . operator to look up a method on an instance.
If you access the method through the class object, on the other hand, it does not get bound. The method is a descriptor, which is just a regular attribute of the class:
>>> Parent.show_info
<function __main__.Parent.show_info>
You can simulate the exact behavior of binding a method before calling it by calling its __get__ yourself:
>>> bound_meth = Parent.show_info.__get__(billy_cyrus, type(billy_cyrus))
>>> bound_meth
<bound method Parent.show_info of <__main__.Parent object at 0x7f7598b14be0>>
Again, this will not make any difference to you in 99.99% of cases, since functionally bound_meth() and Parent.show_info(billy_cyrus) end up calling the same underlying function object with the same parameters.
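A small sketch of the binding behavior described above, using a trimmed-down Parent (one attribute, and show_info returns a string instead of printing, which is an assumption made to keep the example short):

```python
class Parent:
    def __init__(self, last_name):
        self.last_name = last_name

    def show_info(self):
        return "Last Name - " + self.last_name


billy_cyrus = Parent("Cyrus")

first = billy_cyrus.show_info
second = billy_cyrus.show_info

print(first is second)                     # False: two distinct bound-method objects
print(first == second)                     # True: same function bound to the same instance
print(first.__func__ is Parent.show_info)  # True: both wrap the one function on the class
print(first.__self__ is billy_cyrus)       # True: the instance is stored on the bound method
print(first() == Parent.show_info(billy_cyrus))  # True: same result either way
```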
Where it matters
There are a couple of places where it matters how you call a method of a class. One common use case is when you override a method, but want to use the definition provided in the parent class. For example, say I have a class that I made "immutable" by overriding __setattr__. I can still set attributes on the instance, as in the __init__ method shown below:
class Test:
    def __init__(self, a):
        object.__setattr__(self, 'a', a)

    def __setattr__(self, name, value):
        raise ValueError('I am immutable!')
If I tried to do a normal call to __setattr__ in __init__ by doing self.a = a, a ValueError would be raised every time. But by using object.__setattr__, I can bypass this limitation. Alternatively, I could do super().__setattr__('a', a) for the same effect, or self.__dict__['a'] = a for a very similar one.
@Silvio Mayolo's answer has another good example, where you would deliberately want to use the class method as a function that could be applied to many objects.
Another place it matters (although not in terms of calling methods), is when you use other common descriptors like property. Unlike methods, properties are data-descriptors. This means that they define a __set__ method (and optionally __delete__) in addition to __get__. A property creates a virtual attribute whose getter and setter are arbitrarily complex functions instead of just simple assignments. To properly use a property, you have to do it through the instance. For example:
class PropDemo:
    def __init__(self, x=0):
        self.x = x

    @property
    def x(self):
        return self.__dict__['x']

    @x.setter
    def x(self, value):
        if value < 0:
            raise ValueError('Not negatives, please!')
        self.__dict__['x'] = value
Now you can do something like
>>> inst = PropDemo()
>>> inst.x
0
>>> inst.x = 3
>>> inst.x
3
If you try to access the property through the class, you can get the underlying descriptor object since it will be an unbound attribute:
>>> PropDemo.x
<property at 0x7f7598af00e8>
On a side note, hiding attributes with the same name as a property in __dict__ is a neat trick that works because data descriptors in a class __dict__ trump entries in the instance __dict__, even though instance __dict__ entries trump non-data-descriptors in a class.
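The precedence rule in the side note can be checked with a minimal sketch. NonData, Data and Demo are hypothetical classes made up for this illustration:

```python
class NonData:
    """Non-data descriptor: defines only __get__."""

    def __get__(self, instance, owner=None):
        return "from non-data descriptor"


class Data:
    """Data descriptor: defines __get__ and __set__."""

    def __get__(self, instance, owner=None):
        return "from data descriptor"

    def __set__(self, instance, value):
        raise AttributeError("read-only in this sketch")


class Demo:
    nd = NonData()
    d = Data()


obj = Demo()
# Write straight into the instance __dict__, bypassing __set__.
obj.__dict__["nd"] = "from instance __dict__"
obj.__dict__["d"] = "from instance __dict__"

print(obj.nd)  # from instance __dict__   (instance entry beats non-data descriptor)
print(obj.d)   # from data descriptor     (data descriptor beats instance entry)
```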
Where it can Get Weird
You can override a method defined on the class with an attribute set on the instance in Python. That would mean that type(foo).bar(foo) and foo.bar() don't call the same underlying function at all. This is irrelevant for magic methods because they always use the former invocation, but it can make a big difference for normal method calls.
There are a few ways to override a method on an instance. The one I find most intuitive is to set the instance attribute to a bound method. Here is an example of a modified billy_cyrus, assuming the definition of Parent in the original question:
def alt_show_info(self):
print('Another version of', self)
billy_cyrus.show_info = alt_show_info.__get__(billy_cyrus, Parent)
In this case, calling the method on the instance vs the class would have completely different results. This only works because methods are non-data descriptors, by the way. If they were data descriptors (with a __set__ method), the assignment billy_cyrus.show_info = alt_show_info.__get__(billy_cyrus, Parent) would not override anything but would instead just redirect to __set__, and manually setting it in billy_cyrus's __dict__ would just get it ignored, as happens with a property.
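A short sketch of the difference, assuming a cut-down Parent whose show_info returns a string so the two call styles can be compared directly:

```python
class Parent:
    def show_info(self):
        return "original"


def alt_show_info(self):
    return "alternative"


billy_cyrus = Parent()
# Store a bound method in the instance __dict__, shadowing the class function.
billy_cyrus.show_info = alt_show_info.__get__(billy_cyrus, Parent)

print(billy_cyrus.show_info())                   # alternative
print(type(billy_cyrus).show_info(billy_cyrus))  # original
print(Parent().show_info())                      # original: other instances are unaffected

# Magic methods are looked up on the type, so an instance attribute is ignored.
billy_cyrus.__str__ = lambda: "patched"
print(str(billy_cyrus) == "patched")             # False
```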
Additional Resources
Here are a couple of resources on descriptors:
Python Reference - Descriptor Protocol: http://python-reference.readthedocs.io/en/latest/docs/dunderdsc/
(Official?) Descriptor HowTo Guide: https://docs.python.org/3/howto/descriptor.html
There is no semantic difference between the two. It's entirely a matter of style. You would generally use billy_cyrus.show_info() in normal use, but the fact that the second approach is allowed permits you to use Parent.show_info to get the method as a first-class object itself. If that was not allowed, then it would not be possible (or at least, it would be fairly difficult) to do something like this.
function = Parent.show_info
so_many_billy_cyrus = [billy_cyrus, billy_cyrus, billy_cyrus]
list(map(function, so_many_billy_cyrus))  # map is lazy in Python 3, so list() forces the calls

How can I access attributes stored on Python descriptors?

Let's say I have the following descriptor:
class MyDescriptor(object):
    def __init__(self, name, type_):
        self.name = name
        self.type_ = type_

    def __set__(self, obj, value):
        assert isinstance(value, self.type_)
        obj.__dict__[self.name] = value
Is there a way to access type_ from an object employing MyDescriptor?
i.e.
class MyObject(object):
    x = MyDescriptor('x', int)

my_object = MyObject()
my_object.x = 5
print(my_object.x.type_)
As far as I'm aware, this will raise AttributeError as my_object.x is an int. But, I'm wondering if there's a good way to associate metadata with descriptors.
EDIT: adjusted wording to indicate that there's one instance of a descriptor per class.
Is there a way to access type_ from the object instance which owns the MyDescriptor instance?
There is no object instance which owns the MyDescriptor instance. There is one instance of MyDescriptor which is stored on the class of which the descriptor is an attribute (MyObject in your example). That's how descriptors work. You can access this descriptor instance via the class as described in user2357112's answer, but be aware that you're accessing class-level data. If you want to store instance-level data with the descriptor, you need to store it on the instance itself (i.e., on the object passed as obj to your __set__/__get__) rather than on the descriptor.
You need to access the actual descriptor object. For your descriptor, that can be done with
type(my_object).x.type_
or
MyObject.x.type_
For descriptors where MyObject.x is not the actual descriptor object, such as functions on Python 2, you may need to find the descriptor by looking in the class __dict__, or looking through the dicts of all classes in the MRO if you want a generic way to find inherited descriptors. (For the specific case I just mentioned, you can also use the __func__ attribute of the unbound method object, but that won't work for other descriptors.)
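The generic approach mentioned above (looking through the class __dict__ of every class in the MRO) might look roughly like this. find_descriptor and Child are hypothetical names introduced for the sketch, and the assert in __set__ has been swapped for a TypeError:

```python
class MyDescriptor:
    def __init__(self, name, type_):
        self.name = name
        self.type_ = type_

    def __set__(self, obj, value):
        if not isinstance(value, self.type_):
            raise TypeError(f"{self.name} must be {self.type_.__name__}")
        obj.__dict__[self.name] = value


def find_descriptor(obj, name):
    """Hypothetical helper: return the raw class-level object stored under name."""
    for cls in type(obj).__mro__:
        if name in vars(cls):
            return vars(cls)[name]
    raise AttributeError(name)


class MyObject:
    x = MyDescriptor("x", int)


class Child(MyObject):
    pass


child = Child()
child.x = 5
print(child.x)                            # 5
print(find_descriptor(child, "x").type_)  # <class 'int'>
```

Because vars(cls) reads the class __dict__ directly, no __get__ is triggered, so this also finds inherited descriptors whose __get__ would otherwise return something else.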

Why should classes with __get__ or __set__ know who uses them?

I just read about descriptors and it felt very unintuitive that the behavior of a class can depend on who uses it. The two methods
__get__(self, instance, owner)
__set__(self, instance, value)
do exactly that. They are passed the instance of the class that uses them. What is the reason for this design decision? How is it used?
Update: I think of descriptors as normal types. The class that uses them as a member type can be easily manipulated by side effects of the descriptor. Here is an example of what I mean. Why does Python support that?
class Age(object):
    def __init__(self, value):
        self.value = value

    def __get__(self, instance, owner):
        instance.name = 'You got manipulated'
        return self.value

class Person(object):
    age = Age(42)
    name = 'Peter'

peter = Person()
print(peter.name, 'is', peter.age)
__get__ and __set__ receive no information about who's calling them. The 3 arguments are the descriptor object itself, the object whose attribute is being accessed, and the type of the object.
I think the best way to clear this up is with an example. So, here's one:
class Foo:
    def descriptor(self):
        return

foo_instance = Foo()
method_object = foo_instance.descriptor
Functions are descriptors. When you access an object's method, the method object is created by finding the function that implements the method and calling __get__. Here,
method_object = foo_instance.descriptor
calls descriptor.__get__(foo_instance, Foo) to create the method_object. The __get__ method receives no information about who's calling it, only the information needed to perform its task of attribute access.
Descriptors are used to implement binding behaviour; a descriptor requires a context, the object on which they act.
That object is the instance object passed in.
Note that without a descriptor, attribute access on an object acts directly on the object attributes (the instance __dict__ when setting or deleting, otherwise the class and base classes attributes are searched as well).
A descriptor lets you delegate that access to a separate object entirely, encapsulating getting, setting and deleting. But to be able to do so, that object needs access to the context, the instance. Because getting an attribute also normally searches the class and its bases, the __get__ descriptor method is also passed the class (owner) of the instance.
Take functions, for example. A function is a descriptor too, and binding them to an instance produces a method. A class can have any number of instances, but it makes little sense to store bound methods on all those instances when you create the instance, that would be wasteful.
Instead, functions are bound dynamically; you look up the function name on the instance, the function is found on the class instead, and with a call to __get__ the function is bound to the instance, returning a method object. This method object can then pass in the instance to the function when called, producing the self argument.
An example of the descriptor protocol in action is bound methods. When you access an instance method o.foo you can either call it immediately or save it into a variable: a = o.foo. Now, when you call a(x, y, z) the instance o is passed to foo as the first self parameter:
class C(object):
    def foo(self, x, y, z):
        print(self, x, y, z)

o = C()
a = o.foo
a(1, 2, 3) # prints <__main__.C object at 0x...> 1 2 3
This works because functions implement the descriptor protocol; when you __get__ a function on an object instance it returns a bound method, with the instance bound to the function.
There would be no way for the above to work without the descriptor protocol giving access to the object instance.

Why does a python descriptor __get__ method accept the owner class as an arg?

Why does the __get__ method in a Python descriptor accept the owner class as its third argument? Can you give an example of its use?
The first argument (self) is self-evident, the second (instance) makes sense in the context of the typically shown descriptor pattern (example to follow), but I've never really seen the third (owner) used. Can someone explain what the use case is for it?
Just by way of reference and facilitating answers this is the typical use of descriptors I've seen:
class Container(object):
    class ExampleDescriptor(object):
        def __get__(self, instance, owner):
            return instance._name

        def __set__(self, instance, value):
            instance._name = value

    managed_attr = ExampleDescriptor()
Given that instance.__class__ is available, all I can think of is that explicitly passing the class has something to do with directly accessing the descriptor from the class instead of an instance (e.g. Container.managed_attr). Even so, I'm not clear on what one would do in __get__ in this situation.
owner is used when the attribute is accessed from the class instead of an instance of the class, in which case instance will be None.
In your example attempting something like print(Container.managed_attr) would fail because instance is None so instance._name would raise an AttributeError.
You could improve this behavior by checking to see if instance is None, and it may be useful for logging or raising a more helpful exception to know which class the descriptor belongs to, hence the owner parameter. For example:
def __get__(self, instance, owner):
    if instance is None:
        # special handling for Container.managed_attr;
        # one option is to return the descriptor itself
        return self
    else:
        return instance._name
When the descriptor is accessed from the class, instance will be None. If you have not accounted for that situation (as your example code does not) then an error will occur at that point.
What should you do in that case? Whatever is sensible. ;) If nothing else makes sense you could follow property's example and return the descriptor itself when accessed from the class.
Yes, it's used so that the descriptor can see Container when Container.managed_attr is accessed. You could return some object appropriate to the use case, like an unbound method when descriptors are used to implement methods.
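Putting these suggestions together, a minimal sketch might look like the following. Returning self for class access and using owner in the error message are illustrative choices, not the only sensible ones, and the descriptor is defined at module level here rather than nested inside Container:

```python
class ExampleDescriptor:
    def __get__(self, instance, owner=None):
        if instance is None:
            # Accessed as Container.managed_attr: follow property's example.
            return self
        if owner is None:
            owner = type(instance)
        try:
            return instance._name
        except AttributeError:
            # owner makes the message name the class that was actually used.
            raise AttributeError(
                f"{owner.__name__}.managed_attr has not been set on this instance"
            ) from None

    def __set__(self, instance, value):
        instance._name = value


class Container:
    managed_attr = ExampleDescriptor()


print(isinstance(Container.managed_attr, ExampleDescriptor))  # True

c = Container()
c.managed_attr = "box"
print(c.managed_attr)  # box

try:
    Container().managed_attr
except AttributeError as exc:
    print(exc)  # Container.managed_attr has not been set on this instance
```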
I think the most famous application of the owner parameter of the __get__ method in Python is the classmethod decorator. Here is a pure Python version:
import types

class ClassMethod:
    "Emulate PyClassMethod_Type() in Objects/funcobject.c."

    def __init__(self, f):
        self.f = f

    def __get__(self, instance, owner=None):
        if instance is None and owner is None:
            raise TypeError("__get__(None, None) is invalid")
        if owner is None:
            owner = type(instance)
        if hasattr(self.f, "__get__"):
            return self.f.__get__(owner)
        return types.MethodType(self.f, owner)
Thanks to the owner parameter, classmethod works for attribute lookup not only from an instance but also from a class:
class A:
    @ClassMethod
    def name(cls):
        return cls.__name__

A().name() # returns 'A' so attribute lookup from an instance works
A.name() # returns 'A' so attribute lookup from a class works too
