According to this guide on Python descriptors,
https://docs.python.org/howto/descriptor.html
method objects in new-style classes are implemented using descriptors in order to avoid special-casing them in attribute lookup.
The way I understand this is that there is a method object type that implements __get__ and returns a bound method object when called with an instance, and an unbound method object when called with no instance and only a class. The article also states that this logic is implemented in the object.__getattribute__ method, like so:
def __getattribute__(self, key):
    "Emulate type_getattro() in Objects/typeobject.c"
    v = object.__getattribute__(self, key)
    if hasattr(v, '__get__'):
        return v.__get__(None, self)
    return v
However, object.__getattribute__ is itself a method! So how is it bound to an object (without infinite recursion)? If it is special-cased in the attribute lookup, doesn't that defeat the purpose of removing the old-style special casing?
Actually, in CPython the default __getattribute__ implementation is not a Python method, but is instead implemented in C. It can access object slots (entries in the C structure representing Python objects) directly, without bothering to go through the pesky attribute access routine.
Just because your Python code has to do this, doesn't mean the C code has to. :-)
If you do implement a Python __getattribute__ method, just use object.__getattribute__(self, attrname), or better still, super().__getattribute__(attrname) to access attributes on self. That way you won't hit recursion either.
In the CPython implementation, the attribute access is actually handled by the tp_getattro slot in the C type object, with a fallback to the tp_getattr slot.
To be exhaustive and to fully expose what the C code does, when you use attribute access on an instance, here is the full set of functions called:
Python translates attribute access to a call to the PyObject_GetAttr() C function. The implementation for that function looks up the tp_getattro or tp_getattr slot for your class.
The object type has filled the tp_getattro slot with the PyObject_GenericGetAttr function, which delegates the call to _PyObject_GenericGetAttrWithDict (with the *dict pointer set to NULL and the suppress argument set to 0). This function is your object.__getattribute__ method (a special table maps between the name and the slots).
This _PyObject_GenericGetAttrWithDict function can access the instance __dict__ object directly (it locates it inside the instance structure using the type's tp_dictoffset slot), but for descriptors (including methods), the _PyType_Lookup function is used.
_PyType_Lookup handles caching and delegates to find_name_in_mro on cache misses; the latter looks up attributes on the class (and superclasses). The code uses direct pointers to the tp_dict slot on each class in the MRO to reference class attributes.
If a descriptor is found by _PyType_Lookup, it is returned to _PyObject_GenericGetAttrWithDict, which calls the tp_descr_get function on that object (the __get__ hook).
When you access an attribute on the class itself, instead of _PyObject_GenericGetAttrWithDict, the type->tp_getattro slot is instead serviced by the type_getattro() function, which takes metaclasses into account too. This version calls __get__ too, but leaves the instance parameter set to None.
Nowhere does this code have to recursively call __getattribute__ to access the __dict__ attribute, as it can simply reach into the C structures directly.
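To make the C-level steps above more concrete, here is a rough pure-Python model of what PyObject_GenericGetAttr does for an instance. It is a sketch under simplifying assumptions (no __getattr__ fallback, no __slots__ special cases, no type-attribute cache), generic_getattr is a hypothetical name, and unlike the C code it has to use Python-level helpers such as vars() to reach the namespaces.

```python
def generic_getattr(obj, name):
    """Rough model of PyObject_GenericGetAttr; not CPython's actual code."""
    cls = type(obj)

    # Like _PyType_Lookup: search each class namespace along the MRO.
    cls_attr, found = None, False
    for klass in cls.__mro__:
        if name in vars(klass):
            cls_attr, found = vars(klass)[name], True
            break

    descr_get = getattr(type(cls_attr), '__get__', None) if found else None
    is_data_descr = found and (hasattr(type(cls_attr), '__set__')
                               or hasattr(type(cls_attr), '__delete__'))

    # 1. Data descriptors (e.g. property) win over the instance dictionary.
    if descr_get is not None and is_data_descr:
        return descr_get(cls_attr, obj, cls)

    # 2. The instance __dict__, if the object has one.
    try:
        inst_dict = vars(obj)
    except TypeError:
        inst_dict = {}
    if name in inst_dict:
        return inst_dict[name]

    # 3. Non-data descriptors (e.g. functions) are bound via __get__.
    if descr_get is not None:
        return descr_get(cls_attr, obj, cls)

    # 4. Plain class attributes are returned as-is.
    if found:
        return cls_attr
    raise AttributeError(f'{cls.__name__!r} object has no attribute {name!r}')


class Demo:
    @property
    def x(self):
        return 'from property'

    def meth(self):
        return 'from method'


d = Demo()
d.__dict__['x'] = 'shadowed?'        # loses to the property (a data descriptor)
d.y = 1
print(generic_getattr(d, 'x'))       # from property
print(generic_getattr(d, 'y'))       # 1
print(generic_getattr(d, 'meth')())  # from method
```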
Related
I'm learning overloading in Python 3.x, and to better understand the topic, I wrote the following code, which works in 3.x but not in 2.x. I expected the code below to fail, since I haven't defined __call__ for class Test, but to my surprise it works and prints "constructor called".
class Test:
    def __init__(self):
        print("constructor called")

#Test.__getitem__()  # error, as expected
Test.__call__()  # this works in 3.x (but not in 2.x) and prints "constructor called"! Why doesn't this give an error in 3.x?
So my question is: how and why exactly does this code work in 3.x but not in 2.x? I want to understand the mechanics behind what is going on.
More importantly, why is __init__ being used here when I am calling __call__?
In 3.x:
About attribute lookup, type and object
Every time an attribute is looked up on an object, Python follows a process like this:
1. Is it directly a part of the actual data in the object? If so, use that and stop.
2. Is it directly a part of the object's class? If so, hold onto that for step 4.
3. Otherwise, check the object's class for __getattr__ and __getattribute__ overrides, look through base classes in the MRO, etc. (This is a massive simplification, of course.)
4. If something was found in step 2 or 3, check if it has a __get__. If it does, look that up (yes, that means starting over at step 1 for the attribute named __get__ on that object), call it, and use its return value. Otherwise, use what was returned directly.
Functions have a __get__ automatically; it is used to implement method binding. Classes are objects; that's why it's possible to look up attributes in them. That is: the purpose of the class Test: block is to define a data type; the code creates an object named Test which represents the data type that was defined.
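A short Python 3 sketch of that method binding; Person, greet and the name 'Ada' are made up for illustration. Reading the function straight out of the class __dict__ skips step 4, and calling its __get__ by hand reproduces what p.greet does implicitly.

```python
class Person:
    def __init__(self, name):
        self.name = name

    def greet(self, greeting='hello'):
        return f'{greeting} from {self.name}'


p = Person('Ada')
func = Person.__dict__['greet']     # the raw function, fetched without __get__
bound = func.__get__(p, Person)     # what step 4 does for p.greet
print(bound())                      # hello from Ada
print(p.greet('hi'))                # hi from Ada (same mechanism, done implicitly)
print(bound.__self__ is p, bound.__func__ is func)  # True True
```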
But since the Test class is an object, it must be an instance of some class. That class is called type, and has a built-in implementation.
>>> type(Test)
<class 'type'>
Notice that type(Test) is not a call to a function: the name type is predefined to refer to a class, which every other class created in user code is (by default) an instance of.
In other words, type is the default metaclass: the class of classes.
>>> type
<class 'type'>
One may ask, what class does type belong to? The answer is surprisingly simple - itself:
>>> type(type) is type
True
Since the above examples call type, we conclude that type is callable. To be callable, it must have a __call__ attribute, and it does:
>>> type.__call__
<slot wrapper '__call__' of 'type' objects>
When type is called with a single argument, it looks up the argument's class (roughly equivalent to accessing the __class__ attribute of the argument). When called with three arguments, it creates a new instance of type, i.e., a new class.
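A quick Python 3 check of both calling forms; the Point class and its attributes are invented for illustration.

```python
# 3-argument form: name, bases, namespace -> a brand-new class.
Point = type('Point', (object,), {'x': 0, 'describe': lambda self: f'x={self.x}'})

p = Point()
print(type(p) is Point)                 # True: 1-argument form returns the class
print(type(Point) is type)              # True: classes are instances of type
print(Point.__name__, Point.__bases__)  # Point (<class 'object'>,)
print(p.describe())                     # x=0
```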
How does type work?
Because this is digging right at the core of the language (allocating memory for the object), it's not quite possible to implement this in pure Python, at least for the reference C implementation (and I have no idea what sort of magic is going on in PyPy here). But we can approximately model the type class like so:
def _validate_type(obj, required_type, context):
    if not isinstance(obj, required_type):
        good_name = required_type.__name__
        bad_name = type(obj).__name__
        raise TypeError(f'{context} must be {good_name}, not {bad_name}')

class type:
    def __new__(cls, name_or_obj, *args):
        # __new__ is implicitly passed the class being instantiated; when we
        # call `type(...)` itself, that class is `type`.
        if len(args) == 0:  # 1-argument form: check the type of an existing object.
            return name_or_obj.__class__
        # otherwise, 3-argument form: create a new class.
        try:
            bases, attrs = args
        except ValueError:
            raise TypeError('type() takes 1 or 3 arguments')
        _validate_type(name_or_obj, str, 'type.__new__() argument 1')
        _validate_type(bases, tuple, 'type.__new__() argument 2')
        _validate_type(attrs, dict, 'type.__new__() argument 3')
        # This line would not work if we were actually implementing
        # a replacement for `type`, as it would route to `object.__new__(type)`,
        # which is explicitly disallowed. But let's pretend it does...
        result = super().__new__(cls)
        # Now, fill in attributes from the parameters.
        result.__name__ = name_or_obj
        # Assigning to `__bases__` triggers a lot of other internal checks!
        result.__bases__ = bases
        for attr_name, value in attrs.items():
            setattr(result, attr_name, value)
        return result
    # Conceptually, this `__new__` should have no `__get__`, since the
    # `__new__`s of builtins don't implement it (not expressible in Python).

    # `__call__`, however, does have a `__get__`.
    def __call__(self, *args):
        # (The real type.__call__ also runs __init__ on the new instance;
        # this model omits that step.)
        return self.__new__(self, *args)
What happens (conceptually) when we call the class (Test())?
1. Test() uses function-call syntax, but it's not a function. To figure out what should happen, we translate the call into Test.__class__.__call__(Test). (We use __class__ directly here, because translating the function call using type - asking type to categorize itself - would end up in endless recursion.)
2. Test.__class__ is type, so this becomes type.__call__(Test).
3. type contains a __call__ directly (type is its own class, remember?), so it's used directly - we don't go through the __get__ descriptor. We call the function, with Test as self, and no other arguments. (We have a function now, so we don't need to translate the function call syntax again. We could - given a function func, func.__class__.__call__.__get__(func) gives us an instance of an unnamed builtin "method wrapper" type, which does the same thing as func when called. Repeating the loop on the method wrapper creates a separate method wrapper that still does the same thing.)
4. This attempts the call Test.__new__(Test) (since self was bound to Test). Test.__new__ isn't explicitly defined in Test, but since Test is a class, we don't look in Test's class (type), but instead in Test's base (object).
5. object.__new__(Test) exists, and does magical built-in stuff to allocate memory for a new instance of the Test class, make it possible to assign attributes to that instance (even though Test is a subtype of object, which disallows that), and set its __class__ to Test.
Similarly, when we call type, the same logical chain turns type(Test) into type.__class__.__call__(type, Test) into type.__call__(type, Test), which forwards to type.__new__(type, Test). This time, there is a __new__ attribute directly in type, so this doesn't fall back to looking in object. Instead, with name_or_obj being set to Test, we simply return Test.__class__, i.e., type. And with separate name, bases, attrs arguments, type.__new__ instead creates an instance of type.
Finally: what happens when we call Test.__call__() explicitly?
If there's a __call__ defined in the class, it gets used, since it's found directly. This will fail, however, because there aren't enough arguments: the descriptor protocol isn't used since the attribute was found directly, so self isn't bound, and so that argument is missing.
If there isn't a __call__ method defined, then we look in Test's class, i.e., type. There's a __call__ there, so the rest proceeds like steps 3-5 in the previous section.
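A small Python 3 sketch of both cases; Test mirrors the question's class, while WithCall is a hypothetical class that defines its own __call__.

```python
class Test:
    def __init__(self):
        print("constructor called")


class WithCall:
    def __call__(self):
        return "instance called"


# Test defines no __call__, so the lookup falls back to its metaclass, type,
# and yields type.__call__ already bound to Test.
t = Test.__call__()          # prints "constructor called"
t2 = type.__call__(Test)     # equivalent explicit form; also prints it
print(type(t) is Test)       # True

# WithCall defines __call__ itself, so the lookup finds the plain function in
# the class namespace; no self is bound, so the call is missing an argument.
try:
    WithCall.__call__()
except TypeError as exc:
    print("TypeError:", exc)

print(WithCall()())          # instance called
```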
In Python 3.x, every class implicitly inherits from the built-in class object, and every class is itself an instance of the metaclass type. At least in CPython, type defines a __call__ method, so when a class like Test doesn't define __call__ itself, Test.__call__ finds type.__call__, bound to Test.
That means that Test.__call__() is exactly the same as Test() and will return a new Test object, calling your custom __init__ method.
In Python 2.x, classes are old-style by default and do not inherit from object. Because of that, __call__ is not defined for them. You can get the same behaviour in Python 2.x by using new-style classes, that is, by explicitly inheriting from object:
# Python 2 new-style class
class Test(object):
    ...
Why does the function descriptor for dict.fromkeys behave differently than other normal functions?
First of all, you can't access __get__ like this: dict.fromkeys.__get__; you have to get it from the __dict__ (dict.__dict__['fromkeys'].__get__).
And then it doesn't work like any other function, because it will only let itself be bound to a dict.
This works as I expected:
class Thing:
    def __init__(self):
        self.v = 5
    def test(self):
        print self.v

class OtherThing:
    def __init__(self):
        self.v = 6

print Thing.test
Thing.test.__get__(OtherThing())
This, however, does something unexpected:
#unbound method fromkeys
func = dict.__dict__["fromkeys"]
But its description differs from a normal unbound method: it looks like <method 'fromkeys' of 'dict' objects> instead of <unbound method dict.fromkeys>, which was what I expected.
This works as expected:
func.__get__({})([1,2,3])
But you can't bind it to anything else. I understand it wouldn't work, but that doesn't usually stop us:
func.__get__([])([1,2,3])
This fails with a type error from the function descriptor...:
descriptor 'fromkeys' for type 'dict' doesn't apply to type 'list'
Why does python differentiate builtin type functions and normal functions like this? And can we do this too? Can we make a function that will only be bound to the type it belongs to?
Accessing class_or_type.descriptor always calls descriptor.__get__(None, class_or_type), which is why you can't get __get__ from a built-in unbound method object: what dict.fromkeys returns is already the result of that call, and that object type doesn't implement __get__. The (unbound) method type produced by __get__ on a Python 2 function explicitly implements __get__, because Python methods are much more flexible. So both builtin.descriptor and class.descriptor invoke .__get__(), but the type of object each returns differs, and for Python methods that object proxies attribute access to the underlying function.
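A short Python 3 sketch of these differences (Python 3 has no unbound method objects, so a bound method stands in for the Python-method side); Thing and the note attribute are illustrative. The method object forwards unknown attributes to its function, while dict.fromkeys, a built-in method, has no __get__ at all.

```python
class Thing:
    def test(self):
        return 'test called'


Thing.test.note = 'custom function attribute'  # in Python 3, Thing.test is the plain function

m = Thing().test                     # bound method created by function.__get__
print(type(m).__name__)              # method
print(m.note)                        # custom function attribute (proxied to m.__func__)

# dict.fromkeys is already a built-in method bound to dict; it has no __get__.
print(hasattr(dict.fromkeys, '__get__'))              # False
# The raw descriptor in the class namespace does have one, and it still
# refuses to bind to a non-dict, as in the question.
print(hasattr(dict.__dict__['fromkeys'], '__get__'))  # True
try:
    dict.__dict__['fromkeys'].__get__([])
except TypeError as exc:
    print('TypeError:', exc)
```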
Accessing attributes via __dict__, on the other hand, never calls __get__, so class_or_type.__dict__['fromkeys'] gives you the original descriptor object.
Under the hood, types like dict, including their methods, are implemented in C code, not in Python code, and C is much more picky about types. You can't ever use functions that are designed to work with the internal implementation details of a dict object with a list instead, so why bother?
That's really all there is to it; methods for built-in types are not the same thing as functions / methods implemented in Python, so while they mostly work the same, they can't work with dynamic types, and the implementation of how the descriptors work reflects that.
And because Python functions (wrapped in methods or not), could work with any Python type (if so designed), unbound method objects support __get__ so that you can take any such object and stick it onto another class; by supporting __get__ they support being re-bound to the new class.
If you want to achieve the same thing with Python functions, you'll have to wrap the function objects in custom descriptors, then implement your own __get__ behaviour to restrict what is acceptable.
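A minimal Python 3 sketch of such a wrapper; restricted_method is a hypothetical name, not a standard-library decorator. Its __get__ refuses to bind to anything that isn't an instance of the class it was defined in (or a subclass), mimicking the error message of the built-in descriptor.

```python
import types


class restricted_method:
    """Hypothetical descriptor: binds only to instances of its owner class."""

    def __init__(self, func):
        self.func = func

    def __set_name__(self, owner, name):
        self.owner = owner  # the class whose body this descriptor was defined in

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class: hand back the descriptor itself
        if not isinstance(instance, self.owner):
            raise TypeError(
                f"descriptor {self.func.__name__!r} for type "
                f"{self.owner.__name__!r} doesn't apply to type "
                f"{type(instance).__name__!r}"
            )
        return types.MethodType(self.func, instance)


class Thing:
    def __init__(self):
        self.v = 5

    @restricted_method
    def test(self):
        return self.v


class OtherThing:
    def __init__(self):
        self.v = 6


raw = Thing.__dict__['test']
print(raw.__get__(Thing())())  # 5
try:
    raw.__get__(OtherThing())
except TypeError as exc:
    print(exc)  # descriptor 'test' for type 'Thing' doesn't apply to type 'OtherThing'
```

In this sketch, accessing Thing.test on the class returns the descriptor itself, so calling it unbound isn't supported; __set_name__ needs Python 3.6 or later.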
Note that in Python 3, function.__get__(None, classobject) just returns the function object itself, not an unbound method object like Python 2 does. The whole unbound / bound method distinction, where unbound method objects restrict the type for the first argument when called, wasn't found to be all that useful so it was dropped.
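A Python 3 sketch of that change, reusing simplified versions of the question's Thing and OtherThing classes: looking up a method on the class returns the plain function, so nothing checks the type of self.

```python
class Thing:
    def __init__(self):
        self.v = 5

    def test(self):
        return self.v


class OtherThing:
    def __init__(self):
        self.v = 6


func = Thing.__dict__['test']
print(func.__get__(None, Thing) is func)  # True: no unbound method object
print(Thing.test is func)                 # True
print(Thing.test(OtherThing()))           # 6: no check on the type of self
print(func.__get__(OtherThing())())       # 6: binding to another type works
```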
I am trying to understand when to define __getattr__ or __getattribute__. The Python documentation mentions that __getattribute__ applies to new-style classes. What are new-style classes?
A key difference between __getattr__ and __getattribute__ is that __getattr__ is only invoked if the attribute wasn't found the usual ways. It's good for implementing a fallback for missing attributes, and is probably the one of the two you want.
__getattribute__ is invoked before looking at the actual attributes on the object, and so can be tricky to implement correctly. You can end up in infinite recursion very easily.
New-style classes derive from object, old-style classes are those in Python 2.x with no explicit base class. But the distinction between old-style and new-style classes is not the important one when choosing between __getattr__ and __getattribute__.
You almost certainly want __getattr__.
Let's look at some simple examples of both the __getattr__ and __getattribute__ magic methods.
__getattr__
Python calls the __getattr__ method whenever you request an attribute that hasn't been defined. In the following example, my class Count has no __getattr__ method. When I access obj1.mymin and obj1.mymax, everything works fine, but when I access obj1.mycurrent, Python gives me AttributeError: 'Count' object has no attribute 'mycurrent'.
class Count():
    def __init__(self, mymin, mymax):
        self.mymin = mymin
        self.mymax = mymax

obj1 = Count(1, 10)
print(obj1.mymin)
print(obj1.mymax)
print(obj1.mycurrent)  # AttributeError: 'Count' object has no attribute 'mycurrent'
Now my class Count has a __getattr__ method. When I access obj1.mycurrent, Python returns whatever my __getattr__ method returns. In my example, whenever I access an attribute that doesn't exist, __getattr__ creates that attribute and sets it to the integer 0.
class Count:
    def __init__(self, mymin, mymax):
        self.mymin = mymin
        self.mymax = mymax

    def __getattr__(self, item):
        self.__dict__[item] = 0
        return 0

obj1 = Count(1, 10)
print(obj1.mymin)
print(obj1.mymax)
print(obj1.mycurrent)  # 0
__getattribute__
Now let's look at the __getattribute__ method. If your class has a __getattribute__ method, Python invokes it for every attribute access, regardless of whether the attribute exists or not. So why do we need __getattribute__? One good reason is that you can restrict access to some attributes, as shown in the following example.
Whenever someone tries to access an attribute whose name starts with the substring 'cur', my __getattribute__ raises an AttributeError exception. Otherwise, it returns that attribute.
class Count:
    def __init__(self, mymin, mymax):
        self.mymin = mymin
        self.mymax = mymax
        self.current = None

    def __getattribute__(self, item):
        if item.startswith('cur'):
            raise AttributeError
        return object.__getattribute__(self, item)
        # or you can use: return super().__getattribute__(item)

obj1 = Count(1, 10)
print(obj1.mymin)
print(obj1.mymax)
print(obj1.current)  # raises AttributeError
Important: to avoid infinite recursion in a __getattribute__ method, its implementation should always call the base class method with the same name to access any attributes it needs, for example object.__getattribute__(self, item) or super().__getattribute__(item), and not self.__dict__[item].
IMPORTANT
If your class contains both the __getattr__ and __getattribute__ magic methods, then __getattribute__ is called first. But if __getattribute__ raises an AttributeError exception, the exception is ignored and the __getattr__ method is invoked. See the following example:
class Count(object):  # note: this class subclasses object
    def __init__(self, mymin, mymax):
        self.mymin = mymin
        self.mymax = mymax
        self.current = None

    def __getattr__(self, item):
        self.__dict__[item] = 0
        return 0

    def __getattribute__(self, item):
        if item.startswith('cur'):
            raise AttributeError
        return object.__getattribute__(self, item)
        # or you can use: return super().__getattribute__(item)

obj1 = Count(1, 10)
print(obj1.mymin)
print(obj1.mymax)
print(obj1.current)  # 0: __getattribute__ raised AttributeError, so __getattr__ ran
This is just an example based on Ned Batchelder's explanation.
__getattr__ example:
class Foo(object):
    def __getattr__(self, attr):
        print "looking up", attr
        value = 42
        self.__dict__[attr] = value
        return value

f = Foo()
print f.x
#output >>> looking up x 42
f.x = 3
print f.x
#output >>> 3
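# __getattr__ can set a default value for undefined attributes; more generally, __getattr__ defines how to handle attributes that are not found.
If the same example is written with __getattribute__ instead, you get RuntimeError: maximum recursion depth exceeded while calling a Python object (a RecursionError in Python 3), because the self.__dict__ lookup inside __getattribute__ calls __getattribute__ again.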
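A Python 3 sketch of that failure and one way around it; Broken is a hypothetical name, and Foo is rewritten here with __getattribute__ instead of __getattr__. Going through super() avoids the recursion, but unlike __getattr__, the hook now runs on every access, not only for missing attributes.

```python
class Broken:
    def __getattribute__(self, attr):
        # self.__dict__ is itself an attribute lookup, so this line calls
        # __getattribute__ again, which evaluates self.__dict__ again, ...
        return self.__dict__[attr]


class Foo:
    def __getattribute__(self, attr):
        print("looking up", attr)
        try:
            # The default lookup, reached without re-entering this method.
            return super().__getattribute__(attr)
        except AttributeError:
            value = 42
            setattr(self, attr, value)  # safe: __setattr__ is not overridden
            return value


try:
    Broken().x
except RecursionError:
    print("RecursionError")

f = Foo()
print(f.x)  # prints "looking up x", then 42
f.x = 3
print(f.x)  # prints "looking up x" again (every access is intercepted), then 3
```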
New-style classes inherit from object, or from another new style class:
class SomeObject(object):
    pass

class SubObject(SomeObject):
    pass
Old-style classes don't:
class SomeObject:
    pass
This only applies to Python 2 - in Python 3 all the above will create new-style classes.
See 9. Classes (Python tutorial), NewClassVsClassicClass and What is the difference between old style and new style classes in Python? for details.
New-style classes are ones that subclass object (directly or indirectly). They support a __new__ method (implicitly a static method) in addition to __init__, and have somewhat more rational low-level behavior.
Usually, you'll want to override __getattr__ (if you're overriding either), otherwise you'll have a hard time supporting "self.foo" syntax within your methods.
Extra info: http://www.devx.com/opensource/Article/31482/0/page/4
__getattribute__: is used to retrieve an attribute from an instance. It captures every attempt to access an instance attribute using dot notation or the getattr() built-in function.
__getattr__: is executed as the last resort when an attribute is not found on an object. You can choose to return a default value or to raise AttributeError.
Going back to the __getattribute__ function: if the default implementation is not overridden, the following checks are done when the method executes:
Check whether a data descriptor with the same name (attribute name) is defined in any class in the MRO chain (method resolution order); if so, its __get__ is used.
Then look into the instance's namespace (its __dict__).
Then look into the class namespace; a non-data descriptor found there, such as a function, is bound via its __get__.
Then look into each base's namespace, and so on along the MRO.
Finally, if nothing is found, the default implementation raises an AttributeError; if the class defines __getattr__, Python then calls it as the fallback.
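The following Python 3 sketch shows those precedence rules in action; DataDesc, NonDataDesc and Demo are illustrative names. It also shows that __getattr__ is consulted only once the default lookup has failed.

```python
class DataDesc:
    def __get__(self, obj, objtype=None):
        return 'data descriptor'

    def __set__(self, obj, value):  # defining __set__ makes this a data descriptor
        raise AttributeError('read-only')


class NonDataDesc:
    def __get__(self, obj, objtype=None):
        return 'non-data descriptor'


class Demo:
    a = DataDesc()
    b = NonDataDesc()
    c = 'class attribute'

    def __getattr__(self, name):
        return f'__getattr__ fallback for {name}'


d = Demo()
# Write straight into the instance dict, bypassing DataDesc.__set__.
d.__dict__.update(a='instance a', b='instance b', c='instance c')
print(d.a)         # data descriptor (beats the instance dict)
print(d.b)         # instance b (instance dict beats a non-data descriptor)
print(d.c)         # instance c
print(Demo().b)    # non-data descriptor (nothing in the instance dict)
print(d.missing)   # __getattr__ fallback for missing
```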
This is how the CPython C API documentation describes PyObject_GenericGetAttr, the C function behind the default object.__getattribute__ method:
.. c:function:: PyObject* PyObject_GenericGetAttr(PyObject *o, PyObject *name)

   Generic attribute getter function that is meant to be put into a type
   object's tp_getattro slot. It looks for a descriptor in the dictionary of
   classes in the object's MRO as well as an attribute in the object's
   :attr:`~object.__dict__` (if present). As outlined in :ref:`descriptors`,
   data descriptors take preference over instance attributes, while non-data
   descriptors don't. Otherwise, an :exc:`AttributeError` is raised.
I find that no one mentions this difference:
__getattribute__ has a default implementation, but __getattr__ does not.
class A:
    pass

a = A()
a.__getattr__       # AttributeError
a.__getattribute__  # returns a method-wrapper
This has a clear meaning: since __getattribute__ has a default implementation while __getattr__ does not, Python clearly encourages users to implement __getattr__.
In reading through Beazley & Jones's Python Cookbook (PCB), I stumbled on an explicit and practical use case for __getattr__ that helps answer the "when" part of the OP's question. From the book:
"The __getattr__() method is kind of like a catch-all for attribute lookup. It's a method that gets called if code tries to access an attribute that doesn't exist." We know this from the above answers, but in PCB recipe 8.15, this functionality is used to implement the delegation design pattern. If Object A has an attribute Object B that implements many methods that Object A wants to delegate to, rather than redefining all of Object B's methods in Object A just to call Object B's methods, define a __getattr__() method as follows:
def __getattr__(self, name):
    return getattr(self._b, name)
where _b is the name of Object A's attribute that is an Object B. When a method defined on Object B is called on Object A, the __getattr__ method will be invoked at the end of the lookup chain. This would make code cleaner as well, since you do not have a list of methods defined just for delegating to another object.
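A runnable Python 3 sketch of this delegation pattern; Engine and Car are hypothetical stand-ins for Object B and Object A, and _b follows the naming above.

```python
class Engine:  # plays the role of "Object B"
    def start(self):
        return "engine started"

    def stop(self):
        return "engine stopped"


class Car:  # plays the role of "Object A"
    def __init__(self):
        self._b = Engine()

    def drive(self):
        return "driving"

    def __getattr__(self, name):
        # Only reached when normal lookup on the Car instance fails.
        return getattr(self._b, name)


car = Car()
print(car.drive())  # driving (found on Car itself)
print(car.start())  # engine started (delegated to Engine)
```

One caveat: implicit special-method lookups such as len(car) go through the type, not instance attribute lookup, so __getattr__ won't delegate dunder methods like __len__.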
According to Python 2.7.12 documentation, User-defined methods:
User-defined method objects may be created when getting an attribute of a class (perhaps via an instance of that class), if that attribute is a user-defined function object, an unbound user-defined method object, or a class method object. When the attribute is a user-defined method object, a new method object is only created if the class from which it is being retrieved is the same as, or a derived class of, the class stored in the original method object; otherwise, the original method object is used as it is.
I know that everything in Python is an object, so a "user-defined method" must be identical to a "user-defined method object". However, I can't understand why there is a "user-defined function object attribute". Say, in the following code:
class Foo(object):
    def meth(self):
        pass
meth is a function defined inside a class body, and thus a method. So why can we have a "user-defined function object attribute"? Aren't all attributes defined inside a class body?
Bonus question: Provide some examples illustrating how a user-defined method object is created by getting an attribute of a class. Aren't objects defined in their class definition? (I know methods can be assigned to a class instance, but that's monkey patching.)
I'm asking for help because this part of the documentation is really confusing to me, a programmer who only knows C, since Python is such a magical language that supports both functional and object-oriented programming, which I haven't mastered yet. I've done a lot of searching, but still can't figure it out.
When you do
class Foo(object):
    def meth(self):
        pass
you are defining a class Foo with a method meth. However, when this class definition is executed, no method object is created to represent the method. The def statement creates an ordinary function object.
If you then do
Foo.meth
or
Foo().meth
the attribute lookup finds the function object, but the function object is not used as the value of the attribute. Instead, using the descriptor protocol, Python calls the __get__ method of the function object to construct a method object, and that method object is used as the value of the attribute for that lookup. For Foo.meth, the method object is an unbound method object, which behaves like the function you defined, but with an extra type check on self. For Foo().meth, the method object is a bound method object, which already knows what self is.
This is why Foo().meth() doesn't complain about a missing self argument; you pass 0 arguments to the method object, which then prepends self to the (empty) argument list and passes the arguments on to the underlying function object. If Foo().meth evaluated to the meth function directly, you would have to pass it self explicitly.
In Python 3, Foo.meth doesn't create an unbound method object; the function's __get__ still gets called, but it returns the function directly, since Guido decided unbound method objects weren't useful. Foo().meth still creates a bound method object, though.
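A Python 3 sketch of these steps, which also touches on the bonus question: each attribute lookup through an instance builds a new method object from the function stored in the class. Foo and meth follow the example above (meth returns self here so the binding is visible).

```python
class Foo:
    def meth(self):
        return self


func = Foo.__dict__['meth']        # the plain function created by the def statement
print(type(func).__name__)         # function
print(Foo.meth is func)            # True in Python 3: no unbound method object

obj = Foo()
bound = obj.meth                   # func.__get__(obj, Foo) builds a bound method
print(type(bound).__name__)        # method
print(bound.__self__ is obj, bound.__func__ is func)  # True True
print(obj.meth is obj.meth)        # False: a new method object per lookup
print(bound() is obj)              # True: self is prepended automatically
```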
While looking through the webapp2 documentation online, I found information on the decorator: webapp2.cached_property
In the documentation, it says:
A decorator that converts a function into a lazy property.
My question is:
→ What is a lazy property?
It is a property decorator that gets out of the way after the first call. It allows you to auto-cache a computed value.
The standard library @property decorator is a data descriptor object and is always called, even if there is an attribute of the same name on the instance.
The @cached_property decorator, on the other hand, only has a __get__ method, which means that it is not called if an attribute with the same name is already present on the instance. It makes use of this by setting an attribute with the same name on the instance on the first call.
Given a @cached_property-decorated bar method on an instance named foo, this is what happens:
1. Python resolves foo.bar. No bar attribute is found on the instance.
2. Python finds the bar descriptor on the class, and calls __get__ on that.
3. The cached_property __get__ method calls the decorated bar method.
4. The bar method calculates something, and returns the string 'spam'.
5. The cached_property __get__ method takes the return value and sets a new attribute bar on the instance; foo.bar = 'spam'.
6. The cached_property __get__ method returns the 'spam' return value.
If you ask for foo.bar again, Python finds the bar attribute on the instance, and uses that from here on out.
Also see the source code for the original Werkzeug implementation:
# implementation detail: this property is implemented as non-data
# descriptor. non-data descriptors are only invoked if there is
# no entry with the same name in the instance's __dict__.
# this allows us to completely get rid of the access function call
# overhead. If one chooses to invoke __get__ by hand the property
# will still work as expected because the lookup logic is replicated
# in __get__ for manual invocation.
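Following the description above, a minimal non-data descriptor that behaves this way might look like the sketch below. It is an illustration only, not Werkzeug's or functools' actual code; the class name mirrors theirs, and the calls list only exists to count how often bar runs.

```python
class cached_property:
    """Minimal non-data descriptor that caches its result on the instance."""

    def __init__(self, func):
        self.func = func
        self.__name__ = func.__name__
        self.__doc__ = func.__doc__

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # Replicate the lookup for manual __get__ calls: reuse a cached value.
        if self.__name__ in obj.__dict__:
            return obj.__dict__[self.__name__]
        value = self.func(obj)
        obj.__dict__[self.__name__] = value  # shadows this descriptor from now on
        return value


calls = []


class Foo:
    @cached_property
    def bar(self):
        calls.append(1)
        return 'spam'


foo = Foo()
print(foo.bar, foo.bar)  # spam spam
print(len(calls))        # 1: computed once, then found in the instance dict
print(vars(foo))         # {'bar': 'spam'}
```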
Note that as of Python 3.8, the standard library has a similar object, functools.cached_property(). Its implementation is a little more robust: it guards against accidental reuse under a different name, and produces a better error message if used on an object without a __dict__ attribute or where that object doesn't support item assignment. Before Python 3.12 it also used a lock so that the getter ran only once per instance in multi-threaded code; that lock was removed in Python 3.12.