How does attribute resolution work in Python?

How does attribute resolution work in Python? - python

Consider the following code:
class A(object):
def do(self):
print self.z
class B(A):
def __init__(self, y):
self.z = y
b = B(3)
b.do()
Why does this work? When executing b = B(3), attribute z is set. When b.do() is called, Python's MRO finds the do function in class A. But why is it able to access an attribute defined in a subclass?
Is there a use case for this functionality? I would love an example.

It works in a pretty simple way: when a statement is executed that sets an attribute, it is set. When a statement is executed that reads an attribute, it is read. When you write code that reads an attribute, Python does not try to guess whether the attribute will exist when that code is executed; it just waits until the code actually is executed, and if at that time the attribute doesn't exist, then you'll get an exception.
By default, you can always set any attribute on an instance of a user-defined class; classes don't normally define lists of "allowed" attributes that could be set (although you can make that happen too), they just actually set attributes. Of course, you can only read attributes that exist, but again, what matters is whether they exist when you actually try to read them. So it doesn't matter if an attribute exists when you define a function that tries to read it; it only matters when (or if) you actually call that function.
In your example, it doesn't matter that there are two classes, because there is only one instance. Since you only create one instance and call methods on one instance, the self in both methods is the same object. First __init__ is run and it sets the attribute on self. Then do is run and it reads the attribute from the same self. That's all there is to it. It doesn't matter where the attribute is set; once it is set on the instance, it can be accessed from anywhere: code in a superclass, subclass, other class, or not in any class.

Since new attributes can be added to any object at any time, attribute resolution happens at execution time, not compile time. Consider this example which may be a bit more instructive, derived from yours:
class A(object):
def do(self):
print(self.z) # references an attribute which we have't "declared" in an __init__()
#make a new A
aa = A()
# this next line will error, as you would expect, because aa doesn't have a self.z
aa.do()
# but we can make it work now by simply doing
aa.z = -42
aa.do()
The first one will squack at you, but the second will print -42 as expected.
Python objects are just dictionaries. :)

When retrieving an attribute from an object (print self.attrname) Python follows these steps:
If attrname is a special (i.e. Python-provided) attribute for objectname, return it.
Check objectname.__class__.__dict__ for attrname. If it exists and is a data-descriptor, return the descriptor result. Search all bases of objectname.__class__ for the same case.
Check objectname.__dict__ for attrname, and return if found. If objectname is a class, search its bases too. If it is a class and a descriptor exists in it or its bases, return the descriptor result.
Check objectname.__class__.__dict__ for attrname. If it exists and is a non-data descriptor, return the descriptor result. If it exists, and is not a descriptor, just return it. If it exists and is a data descriptor, we shouldn't be here because we would have returned at point 2. Search all bases of objectname.__class__ for same case.
Raise AttributeError
Source
Understanding get and set and Python descriptors

Since you instanciated a B object, B.__init__ was invoked and added an attribute z. This attribute is now present in the object. It's not some weird overloaded magical shared local variable of B methods that somehow becomes inaccessible to code written elsewhere. There's no such thing. Neither does self become a different object when it's passed to a superclass' method (how's polymorphism supposed to work if that happens?).
There's also no such thing as a declaration that A objects have no such object (try o = A(); a.z = whatever), and neither is self in do required to be an instance of A1. In fact, there are no declarations at all. It's all "go ahead and try it"; that's kind of the definition of a dynamic language (not just dynamic typing).
That object's z attribute present "everywhere", all the time2, regardless of the "context" from which it is accessed. It never matters where code is defined for the resolution process, or for several other behaviors3. For the same reason, you can access a list's methods despite not writing C code in listobject.c ;-) And no, methods aren't special. They are just objects too (instances of the type function, as it happens) and are involved in exactly the same lookup sequence.
1 This is a slight lie; in Python 2, A.do would be "bound method" object which in fact throws an error if the first argument doesn't satisfy isinstance(A, <first arg>).
2 Until it's removed with del or one of its function equivalents (delattr and friends).
3 Well, there's name mangling, and in theory, code could inspect the stack, and thereby the caller code object, and thereby the location of its source code.

Related

Explicit call to call works and uses init

I'm learning overloading in Python 3.X and to better understand the topic, I wrote the following code that works in 3.X but not in 2.X. I expected the below code to fail since I've not defined __call__ for class Test. But to my surprise, it works and prints "constructor called". Demo.
class Test:
def __init__(self):
print("constructor called")
#Test.__getitem__() #error as expected
Test.__call__() #this works in 3.X(but not in 2.X) and prints "constructor called"! WHY THIS DOESN'T GIVE ERROR in 3.x?
So my question is that how/why exactly does this code work in 3.x but not in 2.x. I mean I want to know the mechanics behind what is going on.
More importantly, why __init__ is being used here when I am using __call__?

In 3.x:
About attribute lookup, type and object
Every time an attribute is looked up on an object, Python follows a process like this:
Is it directly a part of the actual data in the object? If so, use that and stop.
Is it directly a part of the object's class? If so, hold onto that for step 4.
Otherwise, check the object's class for __getattr__ and __getattribute__ overrides, look through base classes in the MRO, etc. (This is a massive simplification, of course.)
If something was found in step 2 or 3, check if it has a __get__. If it does, look that up (yes, that means starting over at step 1 for the attribute named __get__ on that object), call it, and use its return value. Otherwise, use what was returned directly.
Functions have a __get__ automatically; it is used to implement method binding. Classes are objects; that's why it's possible to look up attributes in them. That is: the purpose of the class Test: block is to define a data type; the code creates an object named Test which represents the data type that was defined.
But since the Test class is an object, it must be an instance of some class. That class is called type, and has a built-in implementation.
>>> type(Test)
<class 'type'>
Notice that type(Test) is not a function call. Rather, the name type is pre-defined to refer to a class, which every other class created in user code is (by default) an instance of.
In other words, type is the default metaclass: the class of classes.
>>> type
<class 'type'>
One may ask, what class does type belong to? The answer is surprisingly simple - itself:
>>> type(type) is type
True
Since the above examples call type, we conclude that type is callable. To be callable, it must have a __call__ attribute, and it does:
>>> type.__call__
<slot wrapper '__call__' of 'type' objects>
When type is called with a single argument, it looks up the argument's class (roughly equivalent to accessing the __class__ attribute of the argument). When called with three arguments, it creates a new instance of type, i.e., a new class.
How does type work?
Because this is digging right at the core of the language (allocating memory for the object), it's not quite possible to implement this in pure Python, at least for the reference C implementation (and I have no idea what sort of magic is going on in PyPy here). But we can approximately model the type class like so:
def _validate_type(obj, required_type, context):
if not isinstance(obj, required_type):
good_name = required_type.__name__
bad_name = type(obj).__name__
raise TypeError(f'{context} must be {good_name}, not {bad_name}')
class type:
def __new__(cls, name_or_obj, *args):
# __new__ implicitly gets passed an instance of the class, but
# `type` is its own class, so it will be `type` itself.
if len(args) == 0: # 1-argument form: check the type of an existing class.
return obj.__class__
# otherwise, 3-argument form: create a new class.
try:
bases, attrs = args
except ValueError:
raise TypeError('type() takes 1 or 3 arguments')
_validate_type(name, str, 'type.__new__() argument 1')
_validate_type(bases, tuple, 'type.__new__() argument 2')
_validate_type(attrs, dict, 'type.__new__() argument 3')
# This line would not work if we were actually implementing
# a replacement for `type`, as it would route to `object.__new__(type)`,
# which is explicitly disallowed. But let's pretend it does...
result = super().__new__()
# Now, fill in attributes from the parameters.
result.__name__ = name_or_obj
# Assigning to `__bases__` triggers a lot of other internal checks!
result.__bases__ = bases
for name, value in attrs.items():
setattr(result, name, value)
return result
del __new__.__get__ # `__new__`s of builtins don't implement this.
def __call__(self, *args):
return self.__new__(self, *args)
# this, however, does have a `__get__`.
What happens (conceptually) when we call the class (Test())?
Test() uses function-call syntax, but it's not a function. To figure out what should happen, we translate the call into Test.__class__.__call__(Test). (We use __class__ directly here, because translating the function call using type - asking type to categorize itself - would end up in endless recursion.)
Test.__class__ is type, so this becomes type.__call__(Test).
type contains a __call__ directly (type is its own class, remember?), so it's used directly - we don't go through the __get__ descriptor. We call the function, with Test as self, and no other arguments. (We have a function now, so we don't need to translate the function call syntax again. We could - given a function func, func.__class__.__call__.__get__(func) gives us an instance of an unnamed builtin "method wrapper" type, which does the same thing as func when called. Repeating the loop on the method wrapper creates a separate method wrapper that still does the same thing.)
This attempts the call Test.__new__(Test) (since self was bound to Test). Test.__new__ isn't explicitly defined in Test, but since Test is a class, we don't look in Test's class (type), but instead in Test's base (object).
object.__new__(Test) exists, and does magical built-in stuff to allocate memory for a new instance of the Test class, make it possible to assign attributes to that instance (even though Test is a subtype of object, which disallows that), and set its __class__ to Test.
Similarly, when we call type, the same logical chain turns type(Test) into type.__class__.__call__(type, Test) into type.__call__(type, Test), which forwards to type.__new__(type, Test). This time, there is a __new__ attribute directly in type, so this doesn't fall back to looking in object. Instead, with name_or_obj being set to Test, we simply return Test.__class__, i.e., type. And with separate name, bases, attrs arguments, type.__new__ instead creates an instance of type.
Finally: what happens when we call Test.__call__() explicitly?
If there's a __call__ defined in the class, it gets used, since it's found directly. This will fail, however, because there aren't enough arguments: the descriptor protocol isn't used since the attribute was found directly, so self isn't bound, and so that argument is missing.
If there isn't a __call__ method defined, then we look in Test's class, i.e., type. There's a __call__ there, so the rest proceeds like steps 3-5 in the previous section.

In Python 3.x, every class is implicitely a child of the builtin class object. And at least in the CPython implementation, the object class has a __call__ method which is defined in its metaclass type.
That means that Test.__call__() is exactly the same as Test() and will return a new Test object, calling your custom __init__ method.
In Python 2.x classes are by default old-style classes and are not child of object. Because of that __call__ is not defined. You can get the same behaviour in Python 2.x by using new style classes, meaning by making an explicit inheritance on object:
# Python 2 new style class
class Test(object):
...

Variables and functions defined in inherited class run in base class

I came across weird behavior in Python 3.6.
I was able to call function and access variable defined only in child class from base class method.
I find this useful in my code but I come from C++ and this code looks very weird.
Can someone please explain this behavior?
class a:
def __init__(self):
print(self.var)
self.checker()
class b(a):
def __init__(self):
self.var=5
super().__init__()
def checker(self):
print('inside B checker')
myB = b()
Output:
5
inside B checker

All methods in Python are looked up dynamically. You're calling a method on self, and self is a b instance, and b instances have a checker method, so that method gets called.
Consider this code at the module top level, or in a top-level function:
myB = b()
myB.checker()
Obviously the global module code isn't part of the b class definition, and yet, this is obviously legal. Why should it be any different if you put the code inside the class a definition, and rename myB to welf? Python doesn't care. You're just asking the value—whether you've called it myB or self—"do you have something named checker?", and the answer is yes, so you can call it.
And var is even simpler; self.var just adds var to self.__dict__, so it's there; the fact that it's a b instance isn't even relevant here (except indirectly—being a b instance means it had b.__init___ called n it, and that's where var was created).
If you're wondering how this "asking the value", a slightly oversimplified version is:
Every object has a __dict__. When you do self.var=5, that actually does self.__dict__['var'] = 5. And when you print(self.var), that does print(self.__dict__['var']).
When that raises a KeyError, as it will for self.checker, Python tries type(self).__dict__['checker'], and, if that doesn't work, it loops over type(self).mro() and tries all of those dicts.
When all of those raise a KeyError, as they would with self.spam, Python calls self.__getattr__('spam').
If even that fails, you get an AttributeError.
Notice that if you try to construct an a instance, this will fail with an AttributeError. That's because now self is an a, not a b. It doesn't have a checker method, and it hasn't gone through the __init__ code that adds a var attribute.
The reason you can't do this in C++ is that C++ methods are looked up statically. It's not a matter of what type the value is at runtime, but what type the variable is at compile time. If the statically looked-up method says it's virtual, then the compiler inserts some dynamic-lookup code, but otherwise, it doesn't.1
One way it's often explained is that in Python (and other languages with SmallTalk semantics, like ObjC and Ruby), all methods are automatically virtual. But that's a bit misleading, because in C++, even with virtual methods, the method name and signature still has to be findable on the base class; in Python (and SmallTalk, etc.), that isn't necessary.
If you're thinking this must be horribly slow, that Python must have to do something like search some stack of namespaces for the method by name every time you call a method—well, it does that, but it's not as slow as you've expect. For one thing, a namespace is a dict, so it's a constant-time search. And the strings are interned and have their hash values cached. And the interpreter can even cache the lookup results if it wants to. The result is still slower than dereferencing a pointer through a vtable, but not by a huge margin (and besides, there are plenty of other things in Python that can be 20x slower than C++, like for loops; you don't use pure Python when you need every detail to work as fast as possible).
1. C++ also has another problem: even if you defined a var attribute and a checker virtual method in a, you don't get to choose the order the initializers get called; the compiler automatically calls a's constructor first, then b's. In Python, it calls b.__init__, and you choose when you want to call super().__init__(), whether it's at the start of the method, at the end, in the middle, or even never.

First you are creating an instance of class b which will call the constructor of class b __init__.
Inside the constructor your are setting the attribute self.var as 5.Later super().__init__() will call the constructor of the parent class A.
Inside the constructor of class A both self.var is printed and self.checker() is called.
Note that when calling the super().__init__() will place the child class instance self as the first argument by default.

How to correctly "stub" objclass in a Python class?

I have Python class looking somewhat like this:
class some_class:
def __getattr__(self, name):
# Do something with "name" (by passing it to a server)
Sometimes, I am working with ptpython (an interactive Python shell) for debugging. ptpython inspects instances of the class and tries to access the __objclass__ attribute, which does not exist. In __getattr__, I could simply check if name != "__objclass__" before working with name, but I'd like to know whether there is a better way by either correctly implementing or somehow stubbing __objclass__.
The Python documentation does not say very much about it, or at least I do not understand what I have to do:
The attribute __objclass__ is interpreted by the inspect module as specifying the class where this object was defined (setting this appropriately can assist in runtime introspection of dynamic class attributes). For callables, it may indicate that an instance of the given type (or a subclass) is expected or required as the first positional argument (for example, CPython sets this attribute for unbound methods that are implemented in C).

You want to avoid interfering with this attribute. There is no reason to do any kind of stubbing manually - you want to get out of the way and let it do what it usually does. If it behaves like attributes usually do, everything will work correctly.
The correct implementation is therefore to special-case the __objclass__ attribute in your __getattr__ function and throw an AttributeError.
class some_class:
def __getattr__(self, name):
if name == "__objclass__":
raise AttributeError
# Do something with "name" (by passing it to a server)
This way it will behave the same way as it would in a class that has no __getattr__: The attribute is considered non-existant by default, until it's assigned to. The __getattr__ method won't be called if the attribute already exists, so it can be used without any issues:
>>> obj = some_class()
>>> hasattr(obj, '__objclass__')
False
>>> obj.__objclass__ = some_class
>>> obj.__objclass__
<class '__main__.some_class'>

Which special methods bypasses getattribute in Python?

In addition to bypassing any instance attributes in the interest of correctness, implicit special method lookup generally also bypasses the __getattribute__() method even of the object’s metaclass.
The docs mention special methods such as __hash__, __repr__ and __len__, and I know from experience it also includes __iter__ for Python 2.7.
To quote an answer to a related question:
"Magic __methods__() are treated specially: They are internally assigned to "slots" in the type data structure to speed up their look-up, and they are only looked up in these slots."
In a quest to improve my answer to another question, I need to know: Which methods, specifically, are we talking about?

You can find an answer in the python3 documentation for object.__getattribute__, which states:
Called unconditionally to implement attribute accesses for instances of the class. If the class also defines __getattr__(), the
latter will not be called unless __getattribute__() either calls it
explicitly or raises an AttributeError. This method should return the
(computed) attribute value or raise an AttributeError exception. In
order to avoid infinite recursion in this method, its implementation
should always call the base class method with the same name to access
any attributes it needs, for example, object.__getattribute__(self,
name).
Note
This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in
functions. See Special method lookup.
also this page explains exactly how this "machinery" works. Fundamentally __getattribute__ is called only when you access an attribute with the .(dot) operator(and also by hasattr as Zagorulkin pointed out).
Note that the page does not specify which special methods are implicitly looked up, so I deem that this hold for all of them(which you may find here.

Checked in 2.7.9
Couldn't find any way to bypass the call to __getattribute__, with any of the magical methods that are found on object or type:
# Preparation step: did this from the console
# magics = set(dir(object) + dir(type))
# got 38 names, for each of the names, wrote a.<that_name> to a file
# Ended up with this:
a.__module__
a.__base__
#...
Put this at the beginning of that file, which i renamed into a proper python module (asdf.py)
global_counter = 0
class Counter(object):
def __getattribute__(self, name):
# this will count how many times the method was called
global global_counter
global_counter += 1
return super(Counter, self).__getattribute__(name)
a = Counter()
# after this comes the list of 38 attribute accessess
a.__module__
#...
a.__repr__
#...
print global_counter # you're not gonna like it... it printer 38
Then i also tried to get each of those names by getattr and hasattr -> same result. __getattribute__ was called every time.
So if anyone has other ideas... I was too lazy to look inside C code for this, but I'm sure the answer lies somewhere there.
So either there's something that i'm not getting right, or the docs are lying.

super().method will also bypass __getattribute__. This atrocious code will run just fine (Python 3.11).
class Base:
def print(self):
print("whatever")
def __getattribute__(self, item):
raise Exception("Don't access this with a dot!")
class Sub(Base):
def __init__(self):
super().print()
a = Sub()
# prints 'whatever'
a.print()
# Exception Don't access this with a dot!

Python Suite, Package, Module, TestCase and TestSuite differences

Best Guess:
method - def(self, maybeSomeVariables); lines of code which achieve some purpose
Function - same as method but returns something
Class - group of methods/functions
Module - a script, OR one or more classes. Basically a .py file.
Package - a folder which has modules in, and also a __init__.py file in there.
Suite - Just a word that gets thrown around a lot, by convention
TestCase - unittest's equivalent of a function
TestSuite - unittest's equivalent of a Class (or Module?)
My question is: Is this completely correct, and did I miss any hierarchical building blocks from that list?

I feel that you're putting in differences that don't actually exist. There isn't really a hierarchy as such. In python everything is an object. This isn't some abstract notion, but quite fundamental to how you should think about constructs you create when using python. An object is just a bunch of other objects. There is a slight subtlety in whether you're using new-style classes or not, but in the absence of a good reason otherwise, just use and assume new-style classes. Everything below is assuming new-style classes.
If an object is callable, you can call it using the calling syntax of a pair of braces, with the arguments inside them: my_callable(arg1, arg2). To be callable, an object needs to implement the __call__ method (or else have the correct field set in its C level type definition).
In python an object has a type associated with it. The type describes how the object was constructed. So, for example, a list object is of type list and a function object is of type function. The types themselves are of type type. You can find the type by using the built-in function type(). A list of all the built-in types can be found in the python documentation. Types are actually callable objects, and are used to create instances of a given type.
Right, now that's established, the nature of a given object is defined by it's type. This describes the objects of which it comprises. Coming back to your questions then:
Firstly, the bunch of objects that make up some object are called the attributes of that object. These attributes can be anything, but they typically consist of methods and some way of storing state (which might be types such as int or list).
A function is an object of type function. Crucially, that means it has the __call__ method as an attribute which makes it a callable (the __call__ method is also an object that itself has the __call__ method. It's __call__ all the way down ;)
A class, in the python world, can be considered as a type, but typically is used to refer to types that are not built-in. These objects are used to create other objects. You can define your own classes with the class keyword, and to create a class which is new-style you must inherit from object (or some other new-style class). When you inherit, you create a type that acquires all the characteristics of the parent type, and then you can overwrite the bits you want to (and you can overwrite any bits you want!). When you instantiate a class (or more generally, a type) by calling it, another object is returned which is created by that class (how the returned object is created can be changed in weird and crazy ways by modifying the class object).
A method is a special type of function that is called using the attribute notation. That is, when it is created, 2 extra attributes are added to the method (remember it's an object!) called im_self and im_func. im_self I will describe in a few sentences. im_func is a function that implements the method. When the method is called, like, for example, foo.my_method(10), this is equivalent to calling foo.my_method.im_func(im_self, 10). This is why, when you define a method, you define it with the extra first argument which you apparently don't seem to use (as self).
When you write a bunch of methods when defining a class, these become unbound methods. When you create an instance of that class, those methods become bound. When you call an bound method, the im_self argument is added for you as the object in which the bound method resides. You can still call the unbound method of the class, but you need to explicitly add the class instance as the first argument:
class Foo(object):
def bar(self):
print self
print self.bar
print self.bar.im_self # prints the same as self
We can show what happens when we call the various manifestations of the bar method:
>>> a = Foo()
>>> a.bar()
<__main__.Foo object at 0x179b610>
<bound method Foo.bar of <__main__.Foo object at 0x179b610>>
<__main__.Foo object at 0x179b610>
>>> Foo.bar()
TypeError: unbound method bar() must be called with Foo instance as first argument (got nothing instead)
>>> Foo.bar(a)
<__main__.Foo object at 0x179b610>
<bound method Foo.bar of <__main__.Foo object at 0x179b610>>
<__main__.Foo object at 0x179b610>
Bringing all the above together, we can define a class as follows:
class MyFoo(object):
a = 10
def bar(self):
print self.a
This generates a class with 2 attributes: a (which is an integer of value 10) and bar, which is an unbound method. We can see that MyFoo.a is just 10.
We can create extra attributes at run time, both within the class methods, and outside. Consider the following:
class MyFoo(object):
a = 10
def __init__(self):
self.b = 20
def bar(self):
print self.a
print self.b
def eep(self):
print self.c
__init__ is just the method that is called immediately after an object has been created from a class.
>>> foo = Foo()
>>> foo.bar()
10
20
>>> foo.eep()
AttributeError: 'MyFoo' object has no attribute 'c'
>>> foo.c = 30
>>> foo.eep()
30
This example shows 2 ways of adding an attribute to a class instance at run time (that is, after the object has been created from it's class).
I hope you can see then, that TestCase and TestSuite are just classes that are used to create test objects. There's nothing special about them except that they happen to have some useful features for writing tests. You can subclass and overwrite them to your heart's content!
Regarding your specific point, both methods and functions can return anything they want.
Your description of module, package and suite seems pretty sound. Note that modules are also objects!

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.