Python __iter__ and for loops

As I understand it, I can use the for loop construction on an object with an __iter__ method that returns an iterator. I have an object for which I implement the following __getattribute__ method:
def __getattribute__(self,name):
    if name in ["read","readlines","readline","seek","__iter__","closed","fileno","flush","mode","tell","truncate","write","writelines","xreadlines"]:
        return getattr(self.file,name)
    return object.__getattribute__(self,name)
I have an instance of this class, a, for which the following happens:
>>> hasattr(a,"__iter__")
True
>>> for l in a: print l
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'TmpFile' object is not iterable
>>> for l in a.file: print l
...
>>>
So python sees that a has an __iter__ method, but doesn't think it is iterable. What have I done wrong? This is with python 2.6.4.

There's a subtle implementation detail getting in your way: special methods such as __iter__ are looked up on the class, not on the instance. That is, type(obj).__iter__(obj) is called, rather than obj.__iter__().
This is due to slot optimizations under the hood (each type stores its special methods in C-level slots, such as tp_iter for iteration), which let the Python runtime set up iterators faster. This matters because iteration is so common that it needs to be as fast as possible.
This implicit lookup bypasses __getattribute__, even one defined on the metaclass, so it's not possible to return this method dynamically. This applies to most special (dunder) methods; you'll need to write an actual wrapper method on the class.
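A minimal Python 3 sketch (not the asker's 2.6 setup) of the behavior described above; the class name Plain is hypothetical. Binding __iter__ on the instance is visible to ordinary attribute access but ignored by iter(), which is what a for loop calls; binding it on the class works.

```python
class Plain:
    pass

obj = Plain()

# An instance-level binding is visible to normal attribute lookup...
obj.__iter__ = lambda: iter([1, 2, 3])
print(hasattr(obj, "__iter__"))  # True

# ...but iter() looks the method up on type(obj), so it is ignored.
try:
    iter(obj)
    instance_binding_used = True
except TypeError:
    instance_binding_used = False
print(instance_binding_used)  # False

# A class-level binding is what iter() and for loops actually use.
Plain.__iter__ = lambda self: iter([1, 2, 3])
print(list(obj))  # [1, 2, 3]
```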

Special methods invoked implicitly are looked up on the class, bypassing both instance attributes and __getattribute__, so they cannot be supplied per instance or through __getattribute__. See the documentation for __getattribute__, which says:
This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in functions.
What you need to do in this case is provide a direct implementation of __iter__ that forwards the call:
def __iter__(self):
    return self.file.__iter__()
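Here is a Python 3 sketch of that fix, using io.StringIO as a stand-in for a real file; FileWrapper is a hypothetical name. Ordinary methods still go through __getattr__, while __iter__ is defined explicitly on the class so that for loops find it.

```python
import io

class FileWrapper:
    """Hypothetical wrapper: forwards ordinary attributes to self.file."""

    def __init__(self, file):
        self.file = file

    def __getattr__(self, name):
        # Only called for attributes not found normally (e.g. readline, seek).
        return getattr(self.file, name)

    def __iter__(self):
        # Must live on the class: for loops look up type(self).__iter__.
        return iter(self.file)

w = FileWrapper(io.StringIO("first\nsecond\n"))
for line in w:
    print(line, end="")
```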

Related

Why doesn't Python have an instancemethod function?

Why doesn't Python have an instancemethod function analogous to staticmethod and classmethod?
Here is how this arose for me. Suppose I have an object which I know will be hashed frequently and whose hash is expensive to calculate. Under this assumption, it is reasonable to compute the hash value once and cache it, as in the following toy example:
class A:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)
    def __hash__(self):
        return self._hash_cache
The __hash__ function in this class does very little, just an attribute lookup and a return. Naively, it seems it ought to be equivalent to instead write:
import operator
class B:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)
    __hash__ = operator.attrgetter('_hash_cache')
According to the documentation, operator.attrgetter returns a callable object that fetches the given attribute from its operand. If its operand is self, then it will return self._hash_cache, which is the desired result. Unfortunately this does not work:
>>> hash(A(1))
1
>>> hash(B(1))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: attrgetter expected 1 arguments, got 0
The reason for this is as follows. If one reads the descriptor HOWTO, one finds that class dictionaries store methods as functions; functions are non-data descriptors whose __get__ method returns a bound method. But operator.attrgetter does not return a function; it returns a callable object. And in fact, it is a callable object with no __get__ method:
>>> hasattr(operator.attrgetter('_hash_cache'), '__get__')
False
Lacking a __get__ method, this of course will not automatically be turned into a bound method. We can make a bound method from it using types.MethodType, but using it in our class B would require creating a bound method for every object instance and assigning it to __hash__.
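For illustration, a small sketch of the types.MethodType route, reusing class B's attributes (written here without the failing __hash__ line). Keep in mind that a per-instance bound method like this would not be picked up by hash() anyway, since special methods are looked up on the class.

```python
import operator
import types

class B:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(self.x)

b = B(1)
getter = operator.attrgetter("_hash_cache")

# Bind the plain callable to one specific instance by hand.
bound = types.MethodType(getter, b)
print(bound())  # 1: calls getter(b), i.e. b._hash_cache
```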
We can see the fact that operator.attrgetter has no __get__ directly if we browse the CPython source. I'm not very familiar with the CPython API, but I believe that what's going on is as follows. The definition of the attrgetter_type is in Modules/_operator.c, at line 1439 as I write this. This type sets tp_descr_get to 0. And according to the type object documentation, that means an object whose type is attrgetter_type will not have a __get__.
Of course, if we give ourselves a __get__ method, then everything works. This is the case in the first example above, where __hash__ is actually a function and not just a callable. It's also true in some other cases. For example, if we want to look up a class attribute, we could write the following:
class C:
    y = 'spam'
    get_y = classmethod(operator.attrgetter('y'))
As written this is terribly un-Pythonic (though it might be defensible if there were a strange custom __getattr__ for which we wanted to provide convenience functions). But at least it gives the desired result:
>>> C.get_y()
'spam'
I can't think of any reason why it would be bad for attrgetter_type to implement __get__. But on the other hand, even if it did, there would be other situations where we run into trouble. For example, suppose we have a class whose instances are callable:
class D:
    def __call__(self, other):
        ...
We can't use an instance of this class as a class attribute and expect instance lookups to generate bound methods. For instance,
d = D()
class E:
    apply_d = d
When we call E().apply_d(), d is returned unchanged (no bound method is created), so D.__call__ receives self but not other, and that generates a TypeError. This example might be a little far-fetched, but I'd be a little surprised if nobody had ever encountered something like this in practice. It could be fixed by giving D a __get__ method; but if D is from a third-party library, that could be inconvenient.
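As a sketch of the "give D a __get__ method" fix, assuming you control D: the return value of __call__ below is a hypothetical placeholder, and binding is done with types.MethodType so the E instance is passed as other.

```python
import types

class D:
    """Callable whose instances also bind like functions (hypothetical)."""

    def __call__(self, other):
        return ("called with", other)

    def __get__(self, obj, objtype=None):
        # Accessed on the class: return the callable itself, as functions do.
        if obj is None:
            return self
        # Accessed on an instance: bind it, so obj is passed as `other`.
        return types.MethodType(self, obj)

d = D()

class E:
    apply_d = d

e = E()
print(e.apply_d() == ("called with", e))  # True
```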
It seems that the easiest solution would be to have an instancemethod function. Then we could write __hash__ = instancemethod(operator.attrgetter('_hash_cache')) and apply_d = instancemethod(d) and they would both work as intended. Yet, as far as I know, no such function exists. Hence my question: Why is there no instancemethod function?
EDIT: Just to be clear, the functionality of instancemethod would be equivalent to:
import functools
def instancemethod(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper
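A quick check of that definition, as a sketch: because wrapper is an ordinary function, it has a __get__ and binds like any method, so class B from the question works once its __hash__ is wrapped.

```python
import functools
import operator

def instancemethod(func):
    # A plain function is a descriptor, so this wrapper binds like a method.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

class B:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(x)

    __hash__ = instancemethod(operator.attrgetter("_hash_cache"))

print(hash(B(1)))  # 1
```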
This could be applied as in the original question above. One could also imagine writing a class decorator that could be applied to D that would give it a __get__ method; but this code doesn't do this.
So I'm not talking about adding a new feature to Python. Really the question is one of language design: Why not provide it as, say, functools.instancemethod? If the answer is simply, "The use cases are so obscure that nobody's bothered," that's okay. But I would be happy to learn about other reasons, if there are any.
There is no instancemethod decorator because this is the default behaviour for functions declared inside a class.
class A:
    ...
    # This is an instance method
    def __hash__(self):
        return self._hash_cache
Any callable which does not have a __get__ method can thus be wrapped into an instance method like so.
class A:
    def instance_method(*args):
        return any_callable(*args)
Thus creating an instancemethod decorator would just add another syntax for a feature which already exists. This would go against the saying that there should be one-- and preferably only one --obvious way to do it.
Side note
If your instances are so expensive to hash, you might want to avoid calling your hash function at instantiation and delay it until the object is actually hashed.
One way to do that could be to set the attribute _hash_cache in __hash__ instead of __init__. Alternatively, let me suggest a slightly more self-contained method which caches the hash outside the instance:
from weakref import finalize
class CachedHash:
    def __init__(self, x):
        self.x = x
    def __hash__(self, _cache={}):
        if id(self) not in _cache:
            finalize(self, _cache.pop, id(self))
            _cache[id(self)] = hash(self.x) # or some complex hash function
        return _cache[id(self)]
The use of finalize ensures the cache is cleared of an id when its instance is garbage collected.
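For comparison, a sketch of the first option mentioned (computing _hash_cache lazily inside __hash__ and storing it on the instance). LazyHash and the hash_computations counter are hypothetical and exist only to show that the hash is computed once.

```python
class LazyHash:
    def __init__(self, x):
        self.x = x
        self._hash_cache = None  # nothing computed at construction time
        self.hash_computations = 0  # only here to show the caching

    def __hash__(self):
        if self._hash_cache is None:
            self.hash_computations += 1
            self._hash_cache = hash(self.x)  # stand-in for an expensive hash
        return self._hash_cache

obj = LazyHash((1, 2, 3))
print(obj.hash_computations)  # 0
seen = {obj}  # first hash: computed here
hash(obj)     # second hash: served from the cache
print(obj.hash_computations)  # 1
```

Unlike the weakref version, this needs no cleanup, at the cost of one extra attribute per instance.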
I have a satisfying answer to my question. Python does have the internal interface necessary for an instancemethod function, but it's not exposed by default.
import ctypes
import operator
instancemethod = ctypes.pythonapi.PyInstanceMethod_New
instancemethod.argtypes = (ctypes.py_object,)
instancemethod.restype = ctypes.py_object
class A:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(x)
    __hash__ = instancemethod(operator.attrgetter('_hash_cache'))
a = A(1)
print(hash(a))
The instancemethod function this creates works in essentially the same way as classmethod and staticmethod. These three functions return new objects of types instancemethod, classmethod, and staticmethod, respectively. We can see how they work by looking at Objects/funcobject.c (for classmethod and staticmethod) and Objects/classobject.c (for instancemethod). These objects all have __func__ members which store a callable object. They also have a __get__. For a staticmethod object, __get__ returns __func__ unchanged. For a classmethod object, __get__ returns a bound method object, where the binding is to the class object. And for an instancemethod object, __get__ returns a bound method object, where the binding is to the object instance. This is precisely the same behavior as __get__ for a function object and is exactly what we want.
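If ctypes feels too hacky, the same __get__ behavior can be approximated in pure Python. This is a sketch, not CPython's implementation; the class name InstanceMethod is hypothetical, and binding uses types.MethodType.

```python
import operator
import types

class InstanceMethod:
    """Pure-Python sketch of an instancemethod-style descriptor (hypothetical)."""

    def __init__(self, func):
        self.__func__ = func

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Looked up on the class: hand back the raw callable.
            return self.__func__
        # Looked up on an instance: bind it, as a function's __get__ would.
        return types.MethodType(self.__func__, obj)

class A:
    def __init__(self, x):
        self.x = x
        self._hash_cache = hash(x)

    __hash__ = InstanceMethod(operator.attrgetter("_hash_cache"))

print(hash(A(1)))  # 1
```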
The only documentation on these objects seems to be in the Python C API reference. My guess is that they're not exposed because they're so rarely needed. I think it would be nice to have PyInstanceMethod_New available as functools.instancemethod.

"implicit uses of special methods always rely on the class-level binding of the special method"

I have difficulty understanding the last part (in bold) from Python in a Nutshell
Per-Instance Methods
An instance can have instance-specific bindings for all attributes, including callable attributes (methods). For a method, just like for any other attribute (except those bound to overriding descriptors), an instance-specific binding hides a class-level binding: attribute lookup does not consider the class when it finds a binding directly in the instance. An instance-specific binding for a callable attribute does not perform any of the transformations detailed in “Bound and Unbound Methods” on page 110: the attribute reference returns exactly the same callable object that was earlier bound directly to the instance attribute.
However, this does not work as you might expect for per-instance bindings of the special methods that Python calls implicitly as a result of various operations, as covered in “Special Methods” on page 123. Such implicit uses of special methods always rely on the class-level binding of the special method, if any. For example:
def fake_get_item(idx): return idx
class MyClass(object): pass
n = MyClass()
n.__getitem__ = fake_get_item
print(n[23]) # results in:
# Traceback (most recent call last):
# File "<stdin>", line 1, in ?
# TypeError: unindexable object
What does it mean specifically?
What causes the error in the example?
Thanks.
Neglecting all the fine details, it basically says that special methods (as defined in Python's data model: generally the methods whose names start and end with two underscores, which are rarely, if ever, called directly) will never be used implicitly from the instance, even if defined there:
n[whatever] # will always call type(n).__getitem__(n, whatever)
This differs from attribute look-up which checks the instance first:
def fake_get_item(idx):
    return idx
class MyClass(object):
    pass
n = MyClass()
n.__getitem__ = fake_get_item
print(n.__getitem__(23)) # works because attribute lookup checks the instance first
There is a whole section in the documentation about this (including rationale): "Special method lookup":
3.3.9. Special method lookup
For custom classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object’s type, not in the object’s instance dictionary. That behaviour is the reason why the following code raises an exception:
>>> class C:
... pass
...
>>> c = C()
>>> c.__len__ = lambda: 5
>>> len(c)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: object of type 'C' has no len()
The rationale behind this behaviour lies with a number of special methods such as __hash__() and __repr__() that are implemented by all objects, including type objects. If the implicit lookup of these methods used the conventional lookup process, they would fail when invoked on the type object itself:
>>> 1 .__hash__() == hash(1)
True
>>> int.__hash__() == hash(int)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: descriptor '__hash__' of 'int' object needs an argument
[...]
Bypassing the __getattribute__() machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter).
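To make the rationale concrete, a small Python 3 sketch contrasting the two lookups for the type object int: implicit lookup goes through type(int), while conventional lookup finds the method meant for int instances.

```python
# hash(int) dispatches through type(int), which is the metaclass `type`.
print(hash(int) == type(int).__hash__(int))  # True

# Conventional lookup on int finds the __hash__ meant for int *instances*,
# which then complains that it was called without an instance.
try:
    int.__hash__()
    conventional_lookup_failed = False
except TypeError:
    conventional_lookup_failed = True
print(conventional_lookup_failed)  # True
```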
To put it even more plainly, it means that you can't redefine the dunder methods on individual instances. As a consequence, ==, +, and the rest of the operators always mean the same thing for all objects of a given type T.
I'll try to summarize what the extract says and in particular the part in bold.
Generally speaking, when Python tries to find the value of an attribute (including a method), it first checks the instance (i.e. the actual object you created), then the class.
The code below illustrates the generic behavior.
class MyClass(object):
    def a(self):
        print("howdy from the class")
n = MyClass()
# here the method defined on the class is called
n.a()
# howdy from the class
def new_a():
    print("hello from new a")
n.a = new_a
# the new instance binding hides the class binding
n.a()
# hello from new a
What the part in bold states is that this behavior does not apply to special methods such as __getitem__. In other words, overriding __getitem__ at the instance level (n.__getitem__ = fake_get_item in your example) does nothing: when the method is called through the n[] syntax, an error is raised because the class does not implement the method.
(If the generic behavior also held in this case, print(n[23]) would have printed 23, i.e. it would have executed the fake_get_item function.)
Another example of the same behavior:
class MyClass(object):
    def __getitem__(self, idx):
        return idx
n = MyClass()
fake_get_item = lambda x: "fake"
print(fake_get_item(23))
# fake
n.__getitem__ = fake_get_item
print(n[23])
# 23
In this example, the class's __getitem__ (which returns the index) is called instead of the instance binding (which returns 'fake').
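If you do want per-instance behavior, one common workaround (a sketch, not something the book prescribes) is to keep the dunder on the class and have it delegate to an ordinary instance attribute. The attribute name getitem_impl is hypothetical.

```python
class MyClass:
    """Hypothetical pattern: the dunder stays on the class and delegates."""

    def __getitem__(self, idx):
        # n[idx] always arrives here via type(n); this method then consults
        # an ordinary instance attribute, which *is* looked up per instance.
        impl = self.__dict__.get("getitem_impl")
        if impl is None:
            raise TypeError(f"{type(self).__name__!r} object is not subscriptable")
        return impl(idx)

def fake_get_item(idx):
    return idx

n = MyClass()
n.getitem_impl = fake_get_item  # plain attribute name, not a dunder
print(n[23])  # 23
```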

How come an object that implements __iter__ is not recognized as iterable?

Let's say you work with a wrapper object:
class IterOrNotIter:
    def __init__(self):
        self.f = open('/tmp/toto.txt')
    def __getattr__(self, item):
        try:
            return self.__getattribute__(item)
        except AttributeError:
            return self.f.__getattribute__(item)
This object implements __iter__, because it passes any call to it to its member f, which implements it. Case in point:
>>> x = IterOrNotIter()
>>> x.__iter__().__next__()
'Whatever was in /tmp/toto.txt\n'
According to the documentation (https://docs.python.org/3/library/stdtypes.html#iterator-types), IterOrNotIter should thus be iterable.
However, the Python interpreter does not recognize an IterOrNotIter object as actually being iterable:
>>> x = IterOrNotIter()
>>> for l in x:
... print(l)
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'IterOrNotIter' object is not iterable
Whereas this works:
>>> x = IterOrNotIter()
>>> for l in x.f:
... print(l)
...
Whatever was in /tmp/toto.txt
I don't understand why.
Basically because your class just doesn't have a real __iter__ method:
>>> hasattr(IterOrNotIter, '__iter__')
False
So it doesn't qualify as an iterable, because the check for __iter__ looks for the method on the class itself rather than going through normal attribute lookup on the instance. So workarounds with __getattr__ or __getattribute__ (unfortunately) don't work.
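A sketch of the same check, adapted from the question's class so it runs without /tmp/toto.txt (it takes an io.StringIO instead, and the __getattr__ is simplified). collections.abc.Iterable also inspects the class, so it agrees with the interpreter rather than with hasattr on the instance.

```python
import io
from collections.abc import Iterable

class IterOrNotIter:
    def __init__(self, f):
        self.f = f

    def __getattr__(self, item):
        # Reached only when normal lookup fails; forward to the wrapped file.
        return getattr(self.f, item)

x = IterOrNotIter(io.StringIO("Whatever\n"))
print(hasattr(x, "__iter__"))              # True: instance lookup -> __getattr__
print(hasattr(IterOrNotIter, "__iter__"))  # False: the class has no __iter__
print(isinstance(x, Iterable))             # False: the ABC also checks the class
```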
This is actually mentioned in the documentation for __getattribute__:
Note
This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in functions. See Special method lookup.
The latter section also explains the why:
Bypassing the __getattribute__() machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter).
Emphasis mine.

Why can't I iterate over an object which delegates via __getattr__ to an iterable?

An example from the book Core Python Programming on the topic of delegation doesn't seem to work. Or maybe I didn't understand the topic clearly.
Below is the code, in which the class CapOpen wraps a file object and defines a modified behaviour of file when opened in write mode. It should write all strings in UPPERCASE only.
However when I try to open the file for reading, and iterate over it to print each line, I get the following exception:
Traceback (most recent call last):
File "D:/_Python Practice/Core Python Programming/chapter_13_Classes/WrappingFileObject.py", line 29, in <module>
for each_line in f:
TypeError: 'CapOpen' object is not iterable
This is strange, because although I haven't explicitly defined iterator methods, I'd expect the calls to be delegated via __getattr__ to the underlying file object. Here's the code. Have I missed anything?
class CapOpen(object):
    def __init__(self, filename, mode='r', buf=-1):
        self.file = open(filename, mode, buf)
    def __str__(self):
        return str(self.file)
    def __repr__(self):
        return `self.file`
    def write(self, line):
        self.file.write(line.upper())
    def __getattr__(self, attr):
        return getattr(self.file, attr)
f = CapOpen('wrappingfile.txt', 'w')
f.write('delegation example\n')
f.write('faye is good\n')
f.write('at delegating\n')
f.close()
f = CapOpen('wrappingfile.txt', 'r')
for each_line in f: # I am getting Exception Here..
    print each_line,
I am using Python 2.7.
This is a non-intuitive consequence of a Python implementation decision for new-style classes:
In addition to bypassing any instance attributes in the interest of correctness, implicit special method lookup generally also bypasses the __getattribute__() method even of the object’s metaclass...
Bypassing the __getattribute__() machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter).
This is also explicitly pointed out in the documentation for __getattr__/__getattribute__:
Note
This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in functions. See Special method lookup for new-style classes.
In other words, you can't rely on __getattr__ to always intercept your method lookups when your attributes are undefined. This is not intuitive, because it is reasonable to expect these implicit lookups to follow the same path as all other clients that access your object. If you call f.__iter__ directly from other code, it will resolve as expected. However, that isn't the case when the language itself looks up the method implicitly, as a for loop does.
The book you quote is pretty old, so the original example probably used old-style classes. If you remove the inheritance from object, your code will work as intended. That being said, you should avoid writing old-style classes, since they were removed in Python 3. If you want to, you can still maintain the delegation style here by implementing __iter__ and immediately delegating to the underlying self.file.__iter__.
Alternatively, inherit from the file object directly and __iter__ will be available by normal lookup, so that will also work.
For an object to be iterable, its class has to have __iter__ or __getitem__ defined.
__getattr__ is only called when something is being retrieved from the instance, but because there are several ways that iteration is supported, Python is looking first to see if the appropriate methods even exist.
Try this:
class Fake(object):
    def __getattr__(self, name):
        print "Nope, no %s here!" % name
        raise AttributeError
f = Fake()
for not_here in f:
    print not_here
As you can see, the same error is raised: TypeError: 'Fake' object is not iterable.
If you then do this:
print '__getattr__' in Fake.__dict__
print '__iter__' in Fake.__dict__
print '__getitem__' in Fake.__dict__
You can see what Python is seeing: that neither __iter__ nor __getitem__ exist, so Python does not know how to iterate over it. While Python could just try and then catch the exception, I suspect the reason why it does not is that catching exceptions is quite a bit slower.
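To illustrate the __getitem__ half of that rule, a Python 3 sketch (Squares is a hypothetical name): a class with only __getitem__ is still iterable, because iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised.

```python
class Squares:
    """Hypothetical: iterable via the old sequence protocol, no __iter__."""

    def __getitem__(self, idx):
        if idx >= 5:
            # IndexError tells the for loop the sequence has ended.
            raise IndexError(idx)
        return idx * idx

for value in Squares():
    print(value)

print(list(Squares()))  # [0, 1, 4, 9, 16]
```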
See my answer here for the many ways to make an iterator.

need memoized function to quack like a function

In a bit of my code I'm using the nice memoized class from the Python Decorator Library.
One of the libraries I'm using uses introspection on a function to get the number of arguments it takes, and fails on the decorated function. Specifically, it checks the co_argcount variable.
if (PyInt_AsLong(co_argcount) < 1) {
PyErr_SetString(PyExc_TypeError, "This function has no parameters to mini\
mize.");
It seems the argcount isn't being transferred to the memoized function.
>>> def f(x):
... return x
...
>>> f.func_code.co_argcount
1
>>> g = memoized(f)
>>> g.func_code.co_argcount
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'memoized' object has no attribute 'func_code'
How can I modify the memoized class so that my memoized functions look, taste, and smell like the original function?
You need to create a signature-preserving decorator. The easiest way to do that is to use the library http://pypi.python.org/pypi/decorator which takes care of preserving the signature for you.
The internals of the library are quite ugly (it uses exec!) but it encapsulates them quite well.
Add this to your memoized class:
def __getattr__(self, name):
    if name.startswith('func_'):
        return getattr(self.func, name)
    raise AttributeError(name)
So it'll pass attribute lookup for func_... to the original function.
Maybe you will also want to write a __setattr__ function to deny writing these attributes, but it's not necessary if you know you won't try to change the values.
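In Python 3, func_code was renamed __code__, so the func_ prefix check no longer applies. Here is a Python 3 sketch of a memoizing class (not the Python Decorator Library's actual code) that copies metadata with functools.update_wrapper and forwards any other missing attribute to the wrapped function. It assumes the introspecting library reaches co_argcount through ordinary attribute access; a check on the object's type would still fail.

```python
import functools

class memoized:
    """Python 3 sketch of a memoizing decorator class (not the library's code)."""

    def __init__(self, func):
        self.func = func
        self.cache = {}
        # Copies __name__, __doc__, etc. and sets __wrapped__ (for inspect).
        functools.update_wrapper(self, func)

    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.func(*args)
        return self.cache[args]

    def __getattr__(self, name):
        # Only reached for attributes missing on the memoized object, such as
        # __code__; guard "func" to avoid infinite recursion before __init__ sets it.
        if name == "func":
            raise AttributeError(name)
        return getattr(self.func, name)

def f(x):
    return x

g = memoized(f)
print(g.__code__.co_argcount)  # 1
print(g(3))  # 3
```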
