From the documentation:
x[i] is roughly equivalent to type(x).__getitem__(x, i).
What is the benefit of the above rather than having a seemingly simpler x.__getitem__(i)?
EDIT: Why is Python behaving this way?
As a downside of the standard behavior, here is some sample code where I was surprised to find that the last assertion fails while the second-to-last one (calling __getitem__ directly) passes.
def poww_bar(base):
    class Bar():
        def __getitem__(self, x):
            return lambda: base**x
    return Bar()

def poww_foo(base):
    class Foo():
        pass
    f = Foo()
    f.__getitem__ = lambda x: lambda: base ** x
    return f
pow_bar2 = poww_bar(2)
pow_foo2 = poww_foo(2)
assert pow_bar2.__getitem__(3)() == 8 # OK
assert pow_bar2[3]() == 8 # OK
assert pow_foo2.__getitem__(3)() == 8 # OK
assert pow_foo2[3]() == 8 # TypeError: 'Foo' object is not subscriptable
Methods are class attributes, not instance attributes.
There is no instance attribute named __getitem__ associated with pow_bar2. So lookup proceeds to checking the class for an attribute by that name, and it succeeds in finding Bar.__getitem__.
But the process doesn't end there. pow_bar2.__getitem__(i) is not equivalent to Bar.__getitem__(i), because Python next checks whether the attribute lookup produced an object that implements the descriptor protocol. Since Bar.__getitem__ is an instance of function, it does implement the descriptor protocol.
The next step is then to return not the function itself, but the result of Bar.__dict__['__getitem__'].__get__(pow_bar2, Bar). (I'm switching to the use of Bar.__dict__ to emphasize that we do not get into an infinite loop of triggering the descriptor protocol.) This is an instance of method, which is itself a callable that passes its own arguments, along with pow_bar2, as arguments to the original function.
Thus, pow_bar2.__getitem__(i) is equivalent to Bar.__dict__['__getitem__'].__get__(pow_bar2, Bar)(i), which is roughly equivalent to Bar.__dict__['__getitem__'](pow_bar2, i).
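We can replay that machinery by hand on the pow_bar2 object from the question (reaching Bar via type(pow_bar2), since the class is local to poww_bar):

>>> B = type(pow_bar2)                 # the Bar class created inside poww_bar
>>> raw = B.__dict__['__getitem__']    # the plain function, not yet bound
>>> bound = raw.__get__(pow_bar2, B)   # descriptor protocol produces a bound method
>>> bound(3)()                         # same result as pow_bar2[3]()
8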
But really, pow_bar2[i] is just shorter and more easily recognizable (due to decades of established support for this syntax in other languages) than pow_bar2.__getitem__(i). __getitem__ is what makes the use of [] extendable to other classes, rather than limiting it to built-in types.
The descriptor protocol is not just a one-shot feature that makes instance-method behavior seem more complicated than necessary. It also determines how class methods, static methods, and properties work, and can further be used to customize attribute behavior in other ways.
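For instance, here is a minimal custom descriptor that loosely mirrors what property does; OnAccess and Circle are invented names for illustration:

class OnAccess:
    """A minimal non-data descriptor: __get__ runs on every attribute access."""
    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self          # accessed on the class, not an instance
        return self.func(obj)    # compute the value from the instance

class Circle:
    def __init__(self, r):
        self.r = r
    area = OnAccess(lambda c: 3.14159 * c.r ** 2)

print(Circle(1.0).area)  # 3.14159 -- produced by OnAccess.__get__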
It could just be an optimization. A class function has only one reference, in the class definition, while an instance function would need a reference in every object. So __getitem__ was specified to be looked up on the class, sparing the interpreter from checking every object's own dictionary for it.
This is all speculation of course.
Related
I'm learning overloading in Python 3.X and to better understand the topic, I wrote the following code that works in 3.X but not in 2.X. I expected the below code to fail since I've not defined __call__ for class Test. But to my surprise, it works and prints "constructor called". Demo.
class Test:
    def __init__(self):
        print("constructor called")

#Test.__getitem__()  # error as expected
Test.__call__()  # this works in 3.X (but not in 2.X) and prints "constructor called"! WHY DOESN'T THIS GIVE AN ERROR in 3.x?
So my question is: how/why exactly does this code work in 3.x but not in 2.x? I mean, I want to know the mechanics behind what is going on.
More importantly, why is __init__ being invoked here when I am calling __call__?
In 3.x:
About attribute lookup, type and object
Every time an attribute is looked up on an object, Python follows a process like this:
1. Is it directly a part of the actual data in the object? If so, use that and stop.
2. Is it directly a part of the object's class? If so, hold onto that for step 4.
3. Otherwise, check the object's class for __getattr__ and __getattribute__ overrides, look through base classes in the MRO, etc. (This is a massive simplification, of course.)
4. If something was found in step 2 or 3, check if it has a __get__. If it does, look that up (yes, that means starting over at step 1 for the attribute named __get__ on that object), call it, and use its return value. Otherwise, use what was returned directly.
Functions have a __get__ automatically; it is used to implement method binding. Classes are objects; that's why it's possible to look up attributes in them. That is: the purpose of the class Test: block is to define a data type; the code creates an object named Test which represents the data type that was defined.
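For instance, the __get__ on a plain function can be invoked by hand:

>>> def f(self):
...     return self
>>> class T:
...     pass
>>> t = T()
>>> hasattr(f, '__get__')   # plain functions are descriptors
True
>>> f.__get__(t, T)() is t  # binding by hand: calling the bound method passes t as self
True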
But since the Test class is an object, it must be an instance of some class. That class is called type, and has a built-in implementation.
>>> type(Test)
<class 'type'>
Notice that type(Test) is not a function call. Rather, the name type is pre-defined to refer to a class, which every other class created in user code is (by default) an instance of.
In other words, type is the default metaclass: the class of classes.
>>> type
<class 'type'>
One may ask, what class does type belong to? The answer is surprisingly simple - itself:
>>> type(type) is type
True
Since the above examples call type, we conclude that type is callable. To be callable, it must have a __call__ attribute, and it does:
>>> type.__call__
<slot wrapper '__call__' of 'type' objects>
When type is called with a single argument, it looks up the argument's class (roughly equivalent to accessing the __class__ attribute of the argument). When called with three arguments, it creates a new instance of type, i.e., a new class.
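For example:

>>> type(42)                    # 1-argument form: report the class
<class 'int'>
>>> Point = type('Point', (object,), {'dims': 2})   # 3-argument form: make a class
>>> Point.dims
2
>>> type(Point) is type
True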
How does type work?
Because this is digging right at the core of the language (allocating memory for the object), it's not quite possible to implement this in pure Python, at least for the reference C implementation (and I have no idea what sort of magic is going on in PyPy here). But we can approximately model the type class like so:
def _validate_type(obj, required_type, context):
    if not isinstance(obj, required_type):
        good_name = required_type.__name__
        bad_name = type(obj).__name__
        raise TypeError(f'{context} must be {good_name}, not {bad_name}')

class type:
    def __new__(cls, name_or_obj, *args):
        # __new__ implicitly gets passed an instance of the class, but
        # `type` is its own class, so it will be `type` itself.
        if len(args) == 0:  # 1-argument form: check the type of an existing object.
            return name_or_obj.__class__
        # otherwise, 3-argument form: create a new class.
        try:
            bases, attrs = args
        except ValueError:
            raise TypeError('type() takes 1 or 3 arguments')
        _validate_type(name_or_obj, str, 'type.__new__() argument 1')
        _validate_type(bases, tuple, 'type.__new__() argument 2')
        _validate_type(attrs, dict, 'type.__new__() argument 3')
        # This line would not work if we were actually implementing
        # a replacement for `type`, as it would route to `object.__new__(type)`,
        # which is explicitly disallowed. But let's pretend it does...
        result = super().__new__()
        # Now, fill in attributes from the parameters.
        result.__name__ = name_or_obj
        # Assigning to `__bases__` triggers a lot of other internal checks!
        result.__bases__ = bases
        for name, value in attrs.items():
            setattr(result, name, value)
        return result

    # Conceptually: `del __new__.__get__` -- the `__new__` of builtins doesn't
    # implement the descriptor protocol. (An actual `del` here would fail.)

    def __call__(self, *args):
        # This method, however, does have a `__get__`.
        return self.__new__(self, *args)
What happens (conceptually) when we call the class (Test())?
1. Test() uses function-call syntax, but it's not a function. To figure out what should happen, we translate the call into Test.__class__.__call__(Test). (We use __class__ directly here, because translating the function call using type - asking type to categorize itself - would end up in endless recursion.)
2. Test.__class__ is type, so this becomes type.__call__(Test).
3. type contains a __call__ directly (type is its own class, remember?), so it's used directly - we don't go through the __get__ descriptor. We call the function, with Test as self, and no other arguments. (We have a function now, so we don't need to translate the function call syntax again. We could - given a function func, func.__class__.__call__.__get__(func) gives us an instance of an unnamed builtin "method wrapper" type, which does the same thing as func when called. Repeating the loop on the method wrapper creates a separate method wrapper that still does the same thing.)
4. This attempts the call Test.__new__(Test) (since self was bound to Test). Test.__new__ isn't explicitly defined in Test, but since Test is a class, we don't look in Test's class (type), but instead in Test's base (object).
5. object.__new__(Test) exists, and does magical built-in stuff to allocate memory for a new instance of the Test class, make it possible to assign attributes to that instance (even though Test is a subtype of object, which disallows that), and set its __class__ to Test.
Similarly, when we call type, the same logical chain turns type(Test) into type.__class__.__call__(type, Test) into type.__call__(type, Test), which forwards to type.__new__(type, Test). This time, there is a __new__ attribute directly in type, so this doesn't fall back to looking in object. Instead, with name_or_obj being set to Test, we simply return Test.__class__, i.e., type. And with separate name, bases, attrs arguments, type.__new__ instead creates an instance of type.
Finally: what happens when we call Test.__call__() explicitly?
If there's a __call__ defined in the class, it gets used, since it's found directly. This will fail, however, because there aren't enough arguments: the descriptor protocol isn't used since the attribute was found directly, so self isn't bound, and so that argument is missing.
If there isn't a __call__ method defined, then we look in Test's class, i.e., type. There's a __call__ there, so the rest proceeds like steps 3-5 in the previous section.
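A quick check that both spellings run the same machinery, using the Test class from the question:

>>> t1 = Test()              # normal instantiation
constructor called
>>> t2 = Test.__call__()     # explicit form: resolves to type.__call__(Test)
constructor called
>>> type(t1) is type(t2) is Test
True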
In Python 3.x, every class is implicitly a child of the builtin class object, and its metaclass is type. At least in the CPython implementation, it is this metaclass type that defines the __call__ method being used here.
That means that Test.__call__() is exactly the same as Test() and will return a new Test object, calling your custom __init__ method.
In Python 2.x classes are by default old-style classes and are not children of object. Because of that, __call__ is not defined. You can get the same behaviour in Python 2.x by using new-style classes, i.e. by inheriting explicitly from object:
# Python 2 new style class
class Test(object):
    ...
If someone writes a class in Python and fails to specify their own __repr__() method, then a default one is provided for them. Now suppose we want to write a function with the same (or similar) behavior as that default __repr__(), and we want it to behave that way even when the actual __repr__() of the class has been overloaded. How might we do it?
class DemoClass:
    def __init__(self):
        self.var = 4

    def __repr__(self):
        return str(self.var)

def true_repr(x):
    # [magic happens here]
    s = "I'm not implemented yet"
    return s
obj = DemoClass()
print(obj.__repr__())
print(true_repr(obj))
Desired Output:
print(obj.__repr__()) prints 4, but print(true_repr(obj)) prints something like:
<__main__.DemoClass object at 0x0000000009F26588>
You can use object.__repr__(obj). This works because the default repr behavior is defined in object.__repr__.
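For example, with the DemoClass from the question (the address will vary):

>>> obj = DemoClass()
>>> repr(obj)               # the overridden __repr__
'4'
>>> object.__repr__(obj)    # the default, bypassing the override
'<__main__.DemoClass object at 0x...>'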
Note, the best answer is probably just to use object.__repr__ directly, as the others have pointed out. But one could implement that same functionality roughly as:
>>> def true_repr(x):
...     type_ = type(x)
...     module = type_.__module__
...     qualname = type_.__qualname__
...     return f"<{module}.{qualname} object at {hex(id(x))}>"
...
So....
>>> DemoClass()
4
>>> true_repr(DemoClass())
'<__main__.DemoClass object at 0x106549208>'
Typically we can use object.__repr__ for that, but this will do the "object repr" for every item, so:
>>> object.__repr__(4)
'<int object at 0xa6dd20>'
This is because an int is an object, but with __repr__ overridden.
If we want to go up one level of overriding, we can use super(..):
>>> super(type(4), 4).__repr__() # going up one level
'<int object at 0xa6dd20>'
For an int that thus again means that we will print <int object at ...>, but if we, for instance, subclass int, then it would use the __repr__ of int again, like:
class special_int(int):
    def __repr__(self):
        return 'Special int'
Then it will look like:
>>> s = special_int(4)
>>> super(type(s), s).__repr__()
'4'
What we do here is create a proxy object with super(..). super will walk the method resolution order (MRO) of the object and try to find the first function (from a superclass of s) that overrides the function. With single inheritance, that is the closest parent that overrides the function; with multiple inheritance involved, this is more tricky. We thus select the __repr__ of that parent, and call that function.
This is also a rather weird application of super, since usually the class (here type(s)) is fixed and does not depend on the type of s itself; otherwise, multiple such super(..) calls could result in an infinite loop.
But usually it is a bad idea to break overriding anyway. The reason a programmer overrides a function is to change the behavior. Not respecting this can occasionally yield something useful, but frequently it will mean that the code's contracts are no longer satisfied. For example, a programmer who overrides __eq__ will usually also override __hash__; if you use the hash of another class together with the real __eq__, things will start breaking.
Calling magic functions directly is also frequently seen as an antipattern, so you had better avoid that as well.
I am trying to make a class that wraps a value that will be used across multiple other objects. For computational reasons, the aim is for this wrapped value to only be calculated once and the reference to the value passed around to its users. I don't believe this is possible in vanilla python due to its object container model. Instead, my approach is a wrapper class that is passed around, defined as follows:
from typing import Any

class DynamicProperty():
    def __init__(self, value=None):
        # Value of the property
        self.value: Any = value

    def __repr__(self):
        # Use value's repr instead
        return repr(self.value)

    def __getattr__(self, attr):
        # Doesn't exist in wrapper, get it from the value instead
        return getattr(self.value, attr)
The following works as expected:
wrappedString = DynamicProperty("foo")
wrappedString.upper() # 'FOO'
wrappedFloat = DynamicProperty(1.5)
wrappedFloat.__add__(2) # 3.5
However, implicitly calling __add__ through normal syntax fails:
wrappedFloat + 2 # TypeError: unsupported operand type(s) for
# +: 'DynamicProperty' and 'float'
Is there a way to intercept these implicit method calls without explicitly defining magic methods for DynamicProperty to call the method on its value attribute?
Talking about "passing by reference" will only confuse you. Keep that terminology for languages where you have a choice about it, and where it makes a difference. In Python you always pass objects around - and this passing is the equivalent of "passing by reference" - for all objects, from None to int to a live asyncio network connection pool instance.
With that out of the way: the algorithm the language follows to retrieve attributes from an object is complicated and full of details - implementing __getattr__ is just the tip of the iceberg. Reading the document called "Data Model" in its entirety will give you a better grasp of all the mechanisms involved in retrieving attributes.
That said, here is how it works for "magic" or "dunder" methods - (special functions with two underscores before and two after the name): when you use an operator that requires the existence of the method that implements it (like __add__ for +), the language checks the class of your object for the __add__ method - not the instance. And __getattr__ on the class can dynamically create attributes for instances of that class only.
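We can see this class-only lookup directly; P here is just an invented throwaway class:

>>> class P:
...     pass
>>> p = P()
>>> p.__add__ = lambda other: 42   # instance attribute: invisible to the + operator
>>> p.__add__(1)                   # explicit lookup still finds it
42
>>> p + 1                          # implicit lookup goes through type(p) only
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for +: 'P' and 'int'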
But that is not the only problem: you could create a metaclass (inheriting from type) and put a __getattr__ method on this metaclass. For all querying you would do from Python, it would look like your object had the __add__ (or any other dunder method) in its class. However, for dunder methods, Python does not go through the normal attribute lookup mechanism - it "looks" directly at the class to check whether the dunder method is "physically" there. The memory structure that holds each class has a slot for every possible dunder method, and each slot either refers to the corresponding method or is "null" (this is only "viewable" when coding in C; on the Python side, the default dir will show these methods when they exist, or omit them if not). If the slot is empty, Python will just "say" the object does not implement that operation, period.
The way to work around that with a proxy object like you want is to create a proxy class that either features the dunder methods from the class you want to wrap, or features all possible methods, and upon being called, check if the underlying object actually implements the called method.
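As a rough sketch of that approach (Proxy and _forward are invented names; only a few binary dunders are covered, and a real proxy would need many more):

class Proxy:
    def __init__(self, value):
        self.value = value

    def __getattr__(self, attr):
        # Ordinary attributes can still be forwarded dynamically.
        return getattr(self.value, attr)

def _forward(name):
    # Build one forwarding dunder method for the class.
    def method(self, *args):
        target = getattr(self.value, name, None)
        if target is None:
            return NotImplemented  # let Python raise the usual TypeError
        return target(*args)
    return method

# Dunders must live on the class, not the instance, to be found by operators.
for _name in ('__add__', '__radd__', '__sub__', '__mul__'):
    setattr(Proxy, _name, _forward(_name))

print(Proxy(1.5) + 2)  # 3.5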
That is why "serious" code will rarely, if ever, offer true "transparent" proxy objects. There are exceptions, but from "Weakrefs", to "super()", to concurrent.futures, just to mention a few in the core language and stdlib, no one attempts a "fully working transparent proxy" - instead, the api is more like you call a ".value()" or ".result()" method on the wrapper to get to the original object itself.
However, it can be done, as I described above. I even have a small (long unmaintained) package on pypi that does that, wrapping a proxy for a future.
The code is at https://bitbucket.org/jsbueno/lelo/src/master/lelo/_lelo.py
The + operator in your case does not work, because DynamicProperty does not inherit from float. See:
>>> class Foo(float):
...     pass
>>> Foo(1.5) + 2
3.5
So, you'll need to do some kind of dynamic inheritance:
def get_dynamic_property(instance):
    base = type(instance)
    class DynamicProperty(base):
        pass
    return DynamicProperty(instance)
wrapped_string = get_dynamic_property("foo")
print(wrapped_string.upper())
wrapped_float = get_dynamic_property(1.5)
print(wrapped_float + 2)
Output:
FOO
3.5
Why doesn't Python have an instancemethod function analogous to staticmethod and classmethod?
Here is how this arose for me. Suppose I have an object which I know will be hashed frequently and whose hash is expensive to calculate. Under this assumption, it is reasonable to compute the hash value once and cache it, as in the following toy example:
class A:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)

    def __hash__(self):
        return self._hash_cache
The __hash__ function in this class does very little, just an attribute lookup and a return. Naively, it seems it ought to be equivalent to instead write:
import operator

class B:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)

    __hash__ = operator.attrgetter('_hash_cache')
According to the documentation, operator.attrgetter returns a callable object that fetches the given attribute from its operand. If its operand is self, then it will return self._hash_cache, which is the desired result. Unfortunately this does not work:
>>> hash(A(1))
1
>>> hash(B(1))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: attrgetter expected 1 arguments, got 0
The reason for this is as follows. If one reads the descriptor HOWTO, one finds that class dictionaries store methods as functions; functions are non-data descriptors whose __get__ method returns a bound method. But operator.attrgetter does not return a function; it returns a callable object. And in fact, it is a callable object with no __get__ method:
>>> hasattr(operator.attrgetter('_hash_cache'), '__get__')
False
Lacking a __get__ method, this of course will not automatically be turned into a bound method. We can make a bound method from it using types.MethodType, but using it in our class B would require creating a bound method for every object instance and assigning it to __hash__.
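For instance, continuing with the class B above, an explicit call works (though hash() itself would still look __hash__ up on the type):

>>> import types
>>> b = B(1)
>>> bound = types.MethodType(operator.attrgetter('_hash_cache'), b)
>>> bound()
1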
We can see the fact that operator.attrgetter has no __get__ directly if we browse the CPython source. I'm not very familiar with the CPython API, but I believe that what's going on is as follows. The definition of the attrgetter_type is in Modules/_operator.c, at line 1439 as I write this. This type sets tp_descr_get to 0. And according to the type object documentation, that means an object whose type is attrgetter_type will not have a __get__.
Of course, if we give ourselves a __get__ method, then everything works. This is the case in the first example above, where __hash__ is actually a function and not just a callable. It's also true in some other cases. For example, if we want to lookup a class attribute, we could write the following:
class C:
    y = 'spam'
    get_y = classmethod(operator.attrgetter('y'))
As written this is terribly un-Pythonic (though it might be defensible if there were a strange custom __getattr__ for which we wanted to provide convenience functions). But at least it gives the desired result:
>>> C.get_y()
'spam'
I can't think of any reason why it would be bad for attrgetter_type to implement __get__. But on the other hand, even if it did, there would be other situations where we run into trouble. For example, suppose we have a class whose instances are callable:
class D:
    def __call__(self, other):
        ...
We can't use an instance of this class as a class attribute and expect instance lookups to generate bound methods. For instance,
d = D()

class E:
    apply_d = d
When D.__call__ is called, it will receive self but not other, and that generates a TypeError. This example might be a little far-fetched, but I'd be a little surprised if nobody had ever encountered something like this in practice. It could be fixed by giving D a __get__ method; but if D is from a third-party library that could be inconvenient.
It seems that the easiest solution would be to have an instancemethod function. Then we could write __hash__ = instancemethod(operator.attrgetter('_hash_cache')) and apply_d = instancemethod(d) and they would both work as intended. Yet, as far as I know, no such function exists. Hence my question: Why is there no instancemethod function?
EDIT: Just to be clear, the functionality of instancemethod would be equivalent to:
import functools

def instancemethod(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper
This could be applied as in the original question above. One could also imagine writing a class decorator that could be applied to D that would give it a __get__ method; but this code doesn't do this.
So I'm not talking about adding a new feature to Python. Really the question is one of language design: Why not provide it as, say, functools.instancemethod? If the answer is simply, "The use cases are so obscure that nobody's bothered," that's okay. But I would be happy to learn about other reasons, if there are any.
There is no instancemethod decorator because this is the default behaviour for functions declared inside a class.
class A:
    ...
    # This is an instance method
    def __hash__(self):
        return self._hash_cache
Any callable which does not have a __get__ method can thus be wrapped into an instance method like so.
class A:
    def instance_method(*args):
        return any_callable(*args)
Thus creating an instancemethod decorator would just add another syntax for a feature which already exists. This would go against the saying that there should be one-- and preferably only one --obvious way to do it.
Side note
If it is so expensive to hash your instances, you might want to avoid calling your hash function on instantiation and delay it until the object is hashed.
One way to do that is to set the attribute _hash_cache in __hash__ instead of __init__. Although, let me suggest a slightly more self-contained method which relies on caching your hash.
from weakref import finalize

class CachedHash:
    def __init__(self, x):
        self.x = x

    def __hash__(self, _cache={}):  # shared, class-level cache via mutable default
        if id(self) not in _cache:
            finalize(self, _cache.pop, id(self))
            _cache[id(self)] = hash(self.x)  # or some complex hash function
        return _cache[id(self)]
The use of finalize ensures the cache is cleared of an id when its instance is garbage collected.
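A quick check of the caching behaviour, assuming a hashable x:

>>> c = CachedHash((1, 2, 3))
>>> hash(c) == hash((1, 2, 3))   # computed once, then served from _cache
True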
I have a satisfying answer to my question. Python does have the internal interface necessary for an instancemethod function, but it's not exposed by default.
import ctypes
import operator

instancemethod = ctypes.pythonapi.PyInstanceMethod_New
instancemethod.argtypes = (ctypes.py_object,)
instancemethod.restype = ctypes.py_object

class A:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(x)

    __hash__ = instancemethod(operator.attrgetter('_hash_cache'))

a = A(1)
print(hash(a))
The instancemethod function this creates works in essentially the same way as classmethod and staticmethod. These three functions return new objects of types instancemethod, classmethod, and staticmethod, respectively. We can see how they work by looking at Objects/funcobject.c. These objects all have __func__ members which store a callable object. They also have a __get__. For a staticmethod object, the __get__ returns __func__ unchanged. For a classmethod object, __get__ returns a bound method object, where the binding is to the class object. And for an instancemethod object, __get__ returns a bound method object, where the binding is to the object instance. This is precisely the same behavior as __get__ for a function object and is exactly what we want.
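Those differing __get__ behaviours can be observed directly:

>>> def f(*args):
...     return args
>>> sm = staticmethod(f)
>>> sm.__func__ is f                      # the wrapped callable sits on __func__
True
>>> sm.__get__(None, object)()            # staticmethod.__get__ returns __func__ unchanged
()
>>> classmethod(f).__get__(None, int)()   # classmethod.__get__ binds the class
(<class 'int'>,)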
The only documentation on these objects seems to be in the Python C API here. My guess is that they're not exposed because they're so rarely needed. I think it would be nice to have PyInstanceMethod_New available as functools.instancemethod.
How are you supposed to access the 10 in this? I've been informed we're returning a function in this function, but how does this make sense?
function([1, 2, 3, 4])(10)
I'm assuming a lot based on the limited information you've provided in your question.
But it looks like you're trying to understand a functional closure. Here's a totally contrived example:
def function(a):
    def inner(b):
        return sum(a) == b
    return inner
>>> function([1,2,3,4])(10)
True
>>> eq = function([1,2,3,4])
>>> eq(10)
True
>>> eq(11)
False
In your expression function([1, 2, 3, 4])(10), there are two calls, one with the argument [1, 2, 3, 4] and the other with the argument 10. For this to work, function must be a callable that returns a callable. Python relies heavily on objects having types which define their behaviour, and callability is one of those behaviours, recursively defined by objects having a __call__ method (which is a type of callable). Because of this dynamic behaviour, we can't tell from the expression what type function is.
We can provide examples that would make the expression valid, though. For instance:
function = lambda x: x.__contains__
This creates an anonymous function using a lambda expression, which is a callable. That function returns a bound method (assuming its argument has the __contains__ method) which in turn is callable, and the expression would evaluate to False.
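Checking that claim:

>>> function = lambda x: x.__contains__
>>> function([1, 2, 3, 4])(10)   # bound list.__contains__, called with 10
False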
class function:
    def __init__(self, a):
        "Method called during object initialization"
        # Note that the return value doesn't come from this method.
        # self is created before it is called and returned after.

    def __call__(self, b):
        "Method called when the object is called"
        return "Well, the first one wasn't quite a function."
This makes a class named function, and classes are callable, which is how we instantiate them. So the first call became an object instantiation and the second call calls an object. In this example, we don't actually have a function, though we do have two methods that are called within the two calls.
AChampion's example uses two normal function definitions, one of which occurs inside another, creating a closure over that call's a value. That is a more traditional approach, though we can still muddy the waters using mutable values:
def function(a):
    def inner(b):
        return sum(a) == b
    return inner
>>> l = [1,2,3,4]
>>> eq = function(l)
>>> eq(10)
True
>>> eq(15)
False
>>> l.append(5)
>>> eq(15)
True
>>> eq(10)
False
We see here that this isn't a pure function in the mathematical sense, as its value is affected by other state than its arguments. We frequently try to avoid such side effects, or at least expose them by prominently displaying the state container, such as in method calls.
Lastly, depending on the context, the expression could fail in a variety of ways including NameError if function simply isn't defined, or TypeError if one of the calls was attempted on a non-callable object. It's still syntactically correct Python, and both of those exceptions are possible to handle, although doing so is likely a bit of a perversion. An example might be a spreadsheet program in which the cell formulae are Python expressions; you'd evaluate them with specific namespaces (globals), and catch any error to account for mistyped formulae.