I'm trying to write a custom wrapper class for containers. To implement the iterator protocol I provide __iter__ and __next__, and to access individual items I provide __getitem__:
#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, print_function, unicode_literals, with_statement
from future_builtins import *

import numpy as np

class MyContainer(object):
    def __init__(self, value):
        self._position = 0
        self.value = value

    def __str__(self):
        return str(self.value)

    def __len__(self):
        return len(self.value)

    def __getitem__(self, key):
        return self.value[key]

    def __iter__(self):
        return self

    def next(self):
        if self._position >= len(self.value):
            raise StopIteration
        else:
            self._position += 1
            return self.value[self._position - 1]
So far, everything works as expected, e.g. when trying things like:
if __name__ == '__main__':
    a = MyContainer([1, 2, 3, 4, 5])
    print(a)
    iter(a) is iter(a)
    for i in a:
        print(i)
    print(a[2])
But I run into problems when trying to use numpy.maximum:
b = MyContainer([2, 3, 4, 5, 6])
np.maximum(a, b)
Raises "ValueError: cannot copy sequence with size 5 to array axis with dimension 0".
When commenting out the __iter__ method, I get back a NumPy array with the correct results (while no longer conforming to the iterator protocol):
print(np.maximum(a,b)) # results in [2 3 4 5 6]
And when commenting out __getitem__, I get back an instance of MyContainer
print(np.maximum(a,b)) # results in [2 3 4 5 6]
But I lose the access to individual items.
Is there any way to achieve all three goals together (Iterator-Protocol, __getitem__ and numpy.maximum working)? Is there anything I'm doing fundamentally wrong?
To note: The actual wrapper class has more functionality but this is the minimal example where I could reproduce the behaviour.
(Python 2.7.12, NumPy 1.11.1)
Your container is its own iterator, which limits it greatly. You can only iterate each container once; after that, it is considered "empty" as far as the iteration protocol goes.
Try this with your code to see:
c = MyContainer([1,2,3])
l1 = list(c) # the list constructor will call iter on its argument, then consume the iterator
l2 = list(c) # this one will be empty, since the container has no more items to iterate on
When you don't provide an __iter__ method but do implement a __len__ method and a __getitem__ method that accepts small integer indexes, Python will use __getitem__ to iterate. Such iteration can be done multiple times, since the iterator objects that are created are all distinct from each other.
If you try the above code after taking the __iter__ method out of your class, both lists will be [1, 2, 3], as expected. You could also fix up your own __iter__ method so that it returns independent iterators. For instance, you could return an iterator from your internal sequence:
def __iter__(self):
    return iter(self.value)
Or, as suggested in a comment by Bakuriu:
def __iter__(self):
    return (self[i] for i in range(len(self)))
This latter version is essentially what Python will provide for you if you have a __getitem__ method but no __iter__ method.
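To tie it back to the question: with __iter__ returning a fresh iterator on every call, repeated iteration works, and NumPy can consume the container like an ordinary sequence. A minimal sketch (my code, not from the original post; it assumes NumPy builds the array from __len__ and __getitem__/__iter__):

import numpy as np

class MyContainer(object):
    def __init__(self, value):
        self.value = value

    def __len__(self):
        return len(self.value)

    def __getitem__(self, key):
        return self.value[key]

    def __iter__(self):
        return iter(self.value)  # a fresh, independent iterator on each call

a = MyContainer([1, 2, 3, 4, 5])
b = MyContainer([2, 3, 4, 5, 6])
print(list(a), list(a))  # both [1, 2, 3, 4, 5]: iteration is repeatable
print(np.maximum(a, b))  # should now print [2 3 4 5 6]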
Related
I am learning to use classes.
I want to create a class with several methods and attributes.
In my case, some of the methods might be functions that I have written elsewhere in the code.
Is it a good practice to re-define the functions as methods of the class like in the following example?
def f(x, b):
    return b * x

class Class1:
    def __init__(self, b):
        self.b = 8.

    def f(self, x):
        return f(x, self.b)

instance1 = Class1(5)
print instance1.f(7.)
The code returns what it should, but is this the right way to do it, or is it redundant, or might it lead to trouble in larger programs?
What is the right way to define methods using function written elsewhere?
Functions...
Consider the following group of functions:
def stack_push(stack, value):
    stack.append(value)

def stack_is_empty(stack):
    return len(stack) == 0

def stack_pop(stack):
    return stack.pop()
Together, they implement a stack in terms of a built-in list. As long as you don't interact with the list directly, the only things they allow are adding values to one end, removing values from the same end, and testing whether the stack is empty:
>>> s = []
>>> stack_push(s, 3)
>>> stack_push(s, 5)
>>> stack_push(s, 10)
>>> if not stack_is_empty(s): stack_pop(s)
10
>>> if not stack_is_empty(s): stack_pop(s)
5
>>> if not stack_is_empty(s): stack_pop(s)
3
>>> if not stack_is_empty(s): stack_pop(s)
>>>
... vs. Methods
Notice that each function takes the same argument: a list being treated as a stack. This is an indication that we can instead write a class that represents a stack, so that we don't need to maintain a list that could potentially be (mis)used outside of these three functions. It also guarantees that we start with an empty list for our new stack.
class Stack:
    def __init__(self):
        self.data = []

    def push(self, value):
        self.data.append(value)

    def is_empty(self):
        return len(self.data) == 0

    def pop(self):
        return self.data.pop()
Now, we don't work with a list that supports all sorts of non-stack operations like indexing, iteration, and mutation at the beginning or middle of the list: we can only push, pop, and test for emptiness.
>>> s = Stack()
Things like s[3], s.insert(2, 9), etc. are not allowed.
(Note that we aren't strictly prevented from using s.data directly, but it's considered bad practice to do so unless the class says it is OK to do so in its documentation. In this case, we do not allow that.)
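A common convention (my addition here, not something the Stack class above declares) is to prefix such internal attributes with an underscore, which signals to readers that the attribute is not part of the public interface:

class Stack:
    def __init__(self):
        self._data = []  # leading underscore: internal detail, don't touch from outside

    def push(self, value):
        self._data.append(value)

    def is_empty(self):
        return len(self._data) == 0

    def pop(self):
        return self._data.pop()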
We use these methods much like we used the stack_* functions.
>>> s.push(3)
>>> s.push(5)
>>> s.push(10)
>>> if not s.is_empty(): s.pop()
10
>>> if not s.is_empty(): s.pop()
5
>>> if not s.is_empty(): s.pop()
3
>>> if not s.is_empty(): s.pop()
>>>
The difference is, we cannot "accidentally" use other list methods, because Stack does not expose them.
>>> s.insert(3, 9)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'Stack' object has no attribute 'insert'
Finally, note that we don't write our original stack_* functions and use them in the definition of the Stack class: there is no need; we just define the methods explicitly inside the class statement.
# No.
class Stack:
    def push(self, value):
        stack_push(self.data, value)
We also don't continue to use the stack_* functions on an instance of Stack.
# No no no!
>>> stack_is_empty(s.data)
I am trying to implement an iterable proxy for a web resource (lazily fetched images).
Firstly, I did (returning ids, in production those will be image buffers)
def iter(ids=[1, 2, 3]):
    for id in ids:
        yield id
and that worked nicely, but now I need to keep state.
I read about the four ways to define iterators. I judged that the iterator protocol is the way to go. Below is my attempt to implement it, and its failure.
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        return self

    def __next__(self):
        for id in self.ids:
            yield id
        raise StopIteration

test = Test([1, 2, 3])
for t in test:
    print('new value', t)
Output:
new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>
forever.
What's wrong?
Thanks to absolutely everyone! It's all new to me, but I'm learning new cool stuff.
Your __next__ method uses yield, which makes it a generator function. Generator functions return a new iterator when called.
But the __next__ method is part of the iterator interface. It should not itself be an iterator. __next__ should return the next value, not something that returns all values(*).
Because you wanted to create an iterable, you can just make __iter__ a generator here:
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        for id in self.ids:
            yield id
Note that a generator function should not use raise StopIteration; just returning from the function does that for you. (In fact, since PEP 479, in Python 3.7+ a StopIteration raised inside a generator is converted to a RuntimeError.)
The above class is an iterable. Iterables only have an __iter__ method, and no __next__ method. Iterables produce an iterator when __iter__ is called:
Iterable -> (call __iter__) -> Iterator
In the above example, because Test.__iter__ is a generator function, it creates a new object each time we call it:
>>> test = Test([1,2,3])
>>> test.__iter__() # create an iterator
<generator object Test.__iter__ at 0x111e85660>
>>> test.__iter__()
<generator object Test.__iter__ at 0x111e85740>
A generator object is a specific kind of iterator, one created by calling a generator function, or by using a generator expression. Note that the hex values in the representations differ, two different objects were created for the two calls. This is by design! Iterables produce iterators, and can create more at will. This lets you loop over them independently:
>>> test_it1 = test.__iter__()
>>> test_it1.__next__()
1
>>> test_it2 = test.__iter__()
>>> test_it2.__next__()
1
>>> test_it1.__next__()
2
Note that I called __next__() on the object returned by test.__iter__(), the iterator, not on test itself, which doesn't have that method because it is only an iterable, not an iterator.
Iterators also have an __iter__ method, which always must return self, because they are their own iterators. It is the __next__ method that makes them an iterator, and the job of __next__ is to be called repeatedly, until it raises StopIteration. Until StopIteration is raised, each call should return the next value. Once an iterator is done (has raised StopIteration), it is meant to then always raise StopIteration. Iterators can only be used once, unless they are infinite (never raise StopIteration and just keep producing values each time __next__ is called).
So this is an iterator:
class IteratorTest:
    def __init__(self, ids):
        self.ids = ids
        self.nextpos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.ids is None or self.nextpos >= len(self.ids):
            # we are done
            self.ids = None
            raise StopIteration
        value = self.ids[self.nextpos]
        self.nextpos += 1
        return value
This has to do a bit more work; it has to keep track of what the next value to produce would be, and if we have raised StopIteration yet. Other answerers here have used what appear to be simpler ways, but those actually involve letting something else do all the hard work. When you use iter(self.ids) or (i for i in ids) you are creating a different iterator to delegate __next__ calls to. That's cheating a bit, hiding the state of the iterator inside ready-made standard library objects.
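A quick demonstration of that one-shot behaviour with the IteratorTest class above:

it = IteratorTest([1, 2, 3])
print(list(it))  # [1, 2, 3]
print(list(it))  # [] -- the iterator is spent, and stays spent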
You don't usually see anything calling __iter__ or __next__ in Python code, because those two methods are just the hooks that you can implement in your Python classes; if you were to implement an iterator in the C API then the hook names are slightly different. Instead, you either use the iter() and next() functions, or just use the object in syntax or a function call that accepts an iterable.
The for loop is such syntax. When you use a for loop, Python uses the (moral equivalent) of calling __iter__() on the object, then __next__() on the resulting iterator object to get each value. You can see this if you disassemble the Python bytecode:
>>> from dis import dis
>>> dis("for t in test: pass")
1 0 LOAD_NAME 0 (test)
2 GET_ITER
>> 4 FOR_ITER 4 (to 10)
6 STORE_NAME 1 (t)
8 JUMP_ABSOLUTE 4
>> 10 LOAD_CONST 0 (None)
12 RETURN_VALUE
The GET_ITER opcode at position 2 calls test.__iter__(), and FOR_ITER uses __next__ on the resulting iterator to keep looping (executing STORE_NAME to set t to the next value, then jumping back to position 4), until StopIteration is raised. Once that happens, it'll jump to position 10 to end the loop.
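In plain Python, the loop is roughly equivalent to this sketch of what those opcodes do:

it = iter(test)            # GET_ITER
while True:
    try:
        t = next(it)       # FOR_ITER advances the iterator
    except StopIteration:  # FOR_ITER jumps past the loop instead
        break
    # the loop body runs here, with t bound (STORE_NAME)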
If you want to play more with the difference between iterators and iterables, take a look at the Python standard types and see what happens when you use iter() and next() on them. Like lists or tuples:
>>> foo = (42, 81, 17, 111)
>>> next(foo) # foo is a tuple, not an iterator
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not an iterator
>>> t_it = iter(foo) # so use iter() to create one from the tuple
>>> t_it # here is an iterator object for our foo tuple
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) # it returns itself
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) is t_it # really, it returns itself, not a new object
True
>>> next(t_it) # we can get values from it, one by one
42
>>> next(t_it) # another one
81
>>> next(t_it) # yet another one
17
>>> next(t_it) # this is getting boring..
111
>>> next(t_it) # and now we are done
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> next(t_it) # and *stay* done
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> foo # but foo itself is still there
(42, 81, 17, 111)
You could make Test, the iterable, return a custom iterator class instance too (and not cop out by having a generator function create the iterator for us):
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        return TestIterator(self)

class TestIterator:
    def __init__(self, test):
        self.test = test
        self.nextpos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.test is None or self.nextpos >= len(self.test.ids):
            # we are done
            self.test = None
            raise StopIteration
        value = self.test.ids[self.nextpos]
        self.nextpos += 1
        return value
That's a lot like the original IteratorTest class above, but TestIterator keeps a reference to the Test instance. That's really how tuple_iterator works too.
A brief, final note on naming conventions here: I am sticking with using self for the first argument to methods, so the bound instance. Using different names for that argument only serves to make it harder to talk about your code with other, experienced Python developers. Don't use me, however cute or short it may seem.
(*) Unless your goal was to create an iterator of iterators, of course (which is basically what the itertools.groupby() iterator does, it is an iterator producing (object, group_iterator) tuples, but I digress).
It is unclear to me exactly what you are trying to achieve, but if you really want to use your instance attributes like this, you can convert the input to an iterator and then delegate to it. But, as I said, this feels odd and I don't think you'd actually want a setup like this.
class Test:
    def __init__(self, ids):
        self.ids = iter(ids)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self.ids)

test = Test([1, 2, 3])
for t in test:
    print('new value', t)
The simplest solution is to use __iter__ and return an iterator to the main list:
class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        return iter(self.ids)

test = Test([1, 2, 3])
for t in test:
    print('new value', t)
As the update, for lazily loading you can return an iterator to a generator:
def __iter__(self):
    return iter(load_file(id) for id in self.ids)
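For the lazily fetched images from the question, that could look like this sketch (load_file is a hypothetical stand-in for whatever does the real network fetch):

def load_file(image_id):
    print('fetching image', image_id)  # hypothetical fetch, e.g. an HTTP request
    return 'buffer-%s' % image_id

class LazyImages:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        return iter(load_file(i) for i in self.ids)

for buf in LazyImages([1, 2, 3]):
    print('new value', buf)  # each image is fetched only when the loop reaches it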
The __next__ function is supposed to return the next value provided by an iterator. Since you have used yield in your implementation, the function returns a generator, which is what you get.
You need to make clear whether you want Test to be an iterable or an iterator. If it is an iterable, it will have the ability to provide an iterator with __iter__. If it is an iterator, it will have the ability to provide new elements with __next__. Iterators can typically work as iterables by returning themselves in __iter__. Martijn's answer shows what you probably want. However, if you want an example of how you could specifically implement __next__ (by making Test explicitly an iterator), it could be something like this:
class Test:
    def __init__(self, ids):
        self.ids = ids
        self.idx = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.idx >= len(self.ids):
            raise StopIteration
        else:
            self.idx += 1
            return self.ids[self.idx - 1]

test = Test([1, 2, 3])
for t in test:
    print('new value', t)
Documentation says:
The constructor builds a list whose items are the same and in the same order as iterable's items. iterable may be either a sequence, a container that supports iteration, or an iterator object. If iterable is already a list, a copy is made and returned, similar to iterable[:]...
But if I have an object a of my class A, which implements __iter__, __len__ and __getitem__, which interface is used by list(a) to iterate my object, and what logic is behind this?
My quick experimenting confuses me:
class A(object):
    def __iter__(self):
        print '__iter__ was called'
        return iter([1, 2, 3])

    def __len__(self):
        print '__len__ was called'
        return 3

    def __getitem__(self, index):
        print '__getitem(%i)__ was called' % index
        return index + 1

a = A()
list(a)
Outputs
__iter__ was called
__len__ was called
[1, 2, 3]
A.__iter__ was called first, OK. But why was A.__len__ called then? And why was A.__getitem__ not called?
Then I turned __iter__ into a generator, and this changed the order of magic method calls!
class B(object):
    def __iter__(self):
        print '__iter__ was called'
        yield 1
        yield 2
        yield 3

    def __len__(self):
        print '__len__ was called'
        return 3

    def __getitem__(self, index):
        print '__getitem(%i)__ was called' % index
        return index + 1

b = B()
list(b)
Outputs
__len__ was called
__iter__ was called
[1, 2, 3]
Why was B.__len__ called first now? And why was B.__getitem__ not called, while the conversion was done with B.__iter__?
What confuses me most is why the order of the __len__ and __iter__ calls differs between A and B.
The call order didn't change. __iter__ still got called first, but when __iter__ is a generator function, calling it doesn't run the function body immediately. The print only happens once next() is first called.
__len__ getting called is an implementation detail. Python wants a hint for how much space to allocate for the list, so it calls _PyObject_LengthHint on your object, which uses len if the object supports it. It is expected that calling len on an object will generally be fast and free of visible side effects.
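On Python 3.4+ you can poke at this machinery yourself with operator.length_hint(), which performs the same lookup: it tries __len__ first, then __length_hint__, then falls back to a default (the class below is my own toy example):

import operator

class WithLen:
    def __len__(self):
        print('__len__ was called')
        return 3

print(operator.length_hint(WithLen()))    # prints '__len__ was called', then 3
print(operator.length_hint(iter([]), 0))  # list iterators provide __length_hint__: 0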
The core of the class is a dictionary with words as keys and id numbers as values (note: the ids are not sequential, because some of the entries have been removed):
x = {'foo':0, 'bar':1, 'king':3}
When I wrote the iterator function for a customdict class I created, it breaks with a KeyError when iterating, because the fallback iteration tries the integer keys 0, 1, 2, ... in order and some ids are missing.
class customdict():
    def __init__(self, dic):
        self.cdict = dic
        self.inverse = {}

    def keys(self):
        # this is necessary when I try to overload UserDict.DictMixin
        return self.cdict.values()

    def __getitem__(self, valueid):
        """ Getter of the inversed dictionary """
        if self.inverse == {}:
            self.inverse = {v: k for k, v in self.cdict.items()}
        return self.inverse[valueid]

x = {'foo': 0, 'bar': 1, 'king': 3}
y = customdict(x)
for i in y:
    print i
Without try/except and without calling len(x), how could I resolve the iteration of the dictionary within the customdict class? The reason being that x is very large; len(x) would take too long for realtime use.
I've tried UserDict.DictMixin and suddenly it works. Why is that?
import UserDict

class customdict(UserDict.DictMixin):
    ...
Is there a way to do this without the Mixin? Judging by __future__ and Python 3, UserDict.DictMixin looks like it has been deprecated.
Define the following method:
def __iter__(self):
    for k in self.keys():
        yield k
I've tried UserDict.DictMixin and suddenly it works. Why is that?
Because DictMixin defines the above __iter__ method for you.
(UserDict.py source code.)
Just share another way:
class customdict(dict):
    def __init__(self, dic):
        dict.__init__(self, {v: k for k, v in dic.items()})
x = {'foo': 0, 'bar': 1, 'king': 3}
y = customdict(x)
for i in y:
    print i, y[i]
result:
0 foo
1 bar
3 king
def __iter__(self):
    return iter(self.cdict.itervalues())
In Python3 you'd call values() instead.
You're correct that UserDict.DictMixin is out of date, but it's not the fact that it's a mixin that's the problem; it's the fact that collections.Mapping and collections.MutableMapping (found in collections.abc on Python 3) use a more sensible underlying interface. So if you want to update from UserDict.DictMixin, you should switch to collections.Mapping and implement __iter__() and __len__() instead of keys().
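A minimal sketch of that switch, keeping the inverse-mapping behaviour from the question (my code, using collections.abc, the Python 3 home of these classes):

from collections.abc import Mapping

class CustomDict(Mapping):
    """Read-only mapping over a dict inverted to {value: key}."""
    def __init__(self, dic):
        self._inverse = {v: k for k, v in dic.items()}

    def __getitem__(self, valueid):
        return self._inverse[valueid]

    def __iter__(self):
        return iter(self._inverse)

    def __len__(self):
        return len(self._inverse)

y = CustomDict({'foo': 0, 'bar': 1, 'king': 3})
for i in y:
    print(i, y[i])  # 0 foo / 1 bar / 3 king

Given those three methods, Mapping fills in keys(), items(), values(), get(), __contains__ and the comparison methods for free.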
When using a list comprehension or the in keyword in a for loop context, i.e.:
for o in X:
    do_something_with(o)
or
l = [o for o in X]
How does the mechanism behind in work?
Which functions/methods within X does it call?
If X supports more than one of these methods, what's the precedence?
How do I write an efficient X, so that list comprehension will be quick?
The, afaik, complete and correct answer.
for, both in for loops and list comprehensions, calls iter() on X. iter() will return an iterator if X has either an __iter__ method or a __getitem__ method. If it implements both, __iter__ is used. If it has neither, you get TypeError: 'Nothing' object is not iterable.
This implements a __getitem__:
class GetItem(object):
    def __init__(self, data):
        self.data = data

    def __getitem__(self, x):
        return self.data[x]
Usage:
>>> data = range(10)
>>> print [x*x for x in GetItem(data)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
This is an example of implementing __iter__:
class TheIterator(object):
    def __init__(self, data):
        self.data = data
        self.index = -1

    # Note: In Python 3 this is called __next__
    def next(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration

    def __iter__(self):
        return self

class Iter(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return TheIterator(self.data)
Usage:
>>> data = range(10)
>>> print [x*x for x in Iter(data)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
As you see, you need to implement both an iterator, and an __iter__ method that returns the iterator.
You can combine them:
class CombinedIter(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        self.index = -1
        return self

    def next(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration
Usage:
>>> well, you get it, it's all the same...
But then you can only have one iterator going at once.
OK, in this case you could just do this:
class CheatIter(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return iter(self.data)
But that's cheating because you are just reusing the __iter__ method of list.
An easier way is to use yield, and make __iter__ into a generator:
class Generator(object):
    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for x in self.data:
            yield x
This last is the way I would recommend. Easy and efficient.
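One reason to prefer it over CombinedIter: every call to __iter__ creates a fresh generator, so several independent iterations can be in flight at once. A quick demonstration:

g = Generator([1, 2, 3])
it1 = iter(g)
it2 = iter(g)
print(next(it1))  # 1
print(next(it1))  # 2
print(next(it2))  # 1 -- it2 keeps its own position, unaffected by it1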
X must be iterable. It must implement __iter__(), which returns an iterator object; the iterator object must implement next(), which returns the next item every time it is called, or raises StopIteration when there is no next item.
Lists, tuples and generators are all iterable.
Note that the plain for operator uses the same mechanism.
Answering the question's comments, I can say that reading the source is not the best idea in this case. The code responsible for executing compiled code (ceval.c) is not very approachable for someone seeing the Python sources for the first time. Here is the snippet that handles iteration in for loops:
TARGET(FOR_ITER)
    /* before: [iter]; after: [iter, iter()] *or* [] */
    v = TOP();
    /* Here tp_iternext corresponds to next() in Python */
    x = (*v->ob_type->tp_iternext)(v);
    if (x != NULL) {
        PUSH(x);
        PREDICT(STORE_FAST);
        PREDICT(UNPACK_SEQUENCE);
        DISPATCH();
    }
    if (PyErr_Occurred()) {
        if (!PyErr_ExceptionMatches(PyExc_StopIteration))
            break;
        PyErr_Clear();
    }
    /* iterator ended normally */
    x = v = POP();
    Py_DECREF(v);
    JUMPBY(oparg);
    DISPATCH();
To find out what actually happens here you would need to dive into a bunch of other files whose readability is not much better. Thus I think that in such cases documentation and sites like SO are the first places to go, while the source should be checked only for uncovered implementation details.
X must be an iterable object, meaning it needs to have an __iter__() method.
So, to start a for..in loop, or a list comprehension, first X's __iter__() method is called to obtain an iterator object; then that object's next() method is called for each iteration until StopIteration is raised, at which point the iteration stops.
I'm not sure what your third question means, and how to provide a meaningful answer to your fourth question except that your iterator should not construct the entire list in memory at once.
Maybe this helps (tutorial http://docs.python.org/tutorial/classes.html Section 9.9):
Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method next() which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception which tells the for loop to terminate.
To answer your questions:
How does the mechanism behind in works?
It is the exact same mechanism as used for ordinary for loops, as others have already noted.
Which functions\methods within X does it call?
As noted in a comment below, it calls iter(X) to get an iterator. If X has an __iter__() method defined, this will be called to return an iterator; otherwise, if X defines __getitem__(), this will be called repeatedly to iterate over X. See the Python documentation for iter() here: http://docs.python.org/library/functions.html#iter
If X can comply to more than one method, what's the precedence?
I'm not sure what your question is here, exactly, but Python has standard rules for how it resolves method names, and they are followed here. Here is a discussion of this:
Method Resolution Order (MRO) in new style Python classes
How to write an efficient X, so that list comprehension will be quick?
I suggest you read up more on iterators and generators in Python. One easy way to make any class support iteration is to write its __iter__() method as a generator function. Here is a discussion of generators:
http://linuxgazette.net/100/pramode.html