I've got the following wrapper for a dictionary:
class MyDict:
    def __init__(self):
        self.container = {}
    def __setitem__(self, key, value):
        self.container[key] = value
    def __getitem__(self, key):
        return self.container[key]
    def __iter__(self):
        return self
    def next(self):
        pass
dic = MyDict()
dic['a'] = 1
dic['b'] = 2
for key in dic:
    print(key)
My problem is that I don't know how to implement the next method to make MyDict iterable. Any advice would be appreciated.
Dictionaries are not themselves iterators (an iterator can only be iterated over once). You usually make such containers iterables instead: objects from which you can produce multiple iterators.
Drop the next method altogether, and have __iter__ return a new iterator object each time it is called. That can be as simple as returning an iterator for self.container:
def __iter__(self):
    return iter(self.container)
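Here is a minimal Python 3 sketch of the whole wrapper from the question with that change applied. It assumes Python 3.7+, where dicts keep insertion order. Because every loop calls __iter__ and gets its own fresh key iterator, the same object can be looped over repeatedly, even in nested loops:

```python
class MyDict:
    """Dictionary wrapper that is iterable (not an iterator)."""

    def __init__(self):
        self.container = {}

    def __setitem__(self, key, value):
        self.container[key] = value

    def __getitem__(self, key):
        return self.container[key]

    def __iter__(self):
        # A new dict key iterator is created on every call,
        # so each loop gets its own independent position.
        return iter(self.container)


dic = MyDict()
dic['a'] = 1
dic['b'] = 2

# Nested loops work because each loop calls __iter__ separately.
pairs = [(k1, k2) for k1 in dic for k2 in dic]
print(pairs)  # [('a', 'a'), ('a', 'b'), ('b', 'a'), ('b', 'b')]
```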
If you must make your class an iterator, you'll have to somehow track a current iteration position and raise StopIteration once you reach the 'end'. A naive implementation could be to store the iter(self.container) object on self the first time __iter__ is called:
def __iter__(self):
    return self
def next(self):
    if not hasattr(self, '_iter'):
        self._iter = iter(self.container)
    return next(self._iter)
At that point the iter(self.container) object takes care of tracking the iteration position for you, and will raise StopIteration when the end is reached. It'll also raise a RuntimeError if the underlying dictionary is altered (has keys added or deleted) during iteration, since the iteration order can no longer be trusted.
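To see that safeguard in action, here is a small Python 3 sketch with made-up dictionary contents. Adding a key while a dict iterator is live makes the next step of iteration raise RuntimeError:

```python
container = {'a': 1, 'b': 2}
it = iter(container)

first = next(it)          # the iterator starts normally
container['c'] = 3        # mutate the dict: its size changes

try:
    next(it)              # the dict iterator detects the size change
except RuntimeError as exc:
    error = str(exc)
    print(error)          # dictionary changed size during iteration
```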
Another way to do this would be to store an integer position and index into list(self.container) each time, simply ignoring the fact that insertion or deletion can alter the iteration order of a dictionary:
_iter_index = 0
def __iter__(self):
    return self
def next(self):
    idx = self._iter_index
    if idx is None or idx >= len(self.container):
        # once we reach the end, all iteration is done, end of.
        self._iter_index = None
        raise StopIteration()
    value = list(self.container)[idx]
    self._iter_index = idx + 1
    return value
In both cases your object is then an iterator that can only be iterated over once. Once you reach the end, you can't restart it again.
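A Python 3 sketch of the first (hasattr-based) approach makes the one-shot behaviour visible. Note that the method is spelled __next__ in Python 3 rather than next, and that the class name OneShotDict is only for illustration:

```python
class OneShotDict:
    """Dictionary wrapper that is its own iterator (single pass only)."""

    def __init__(self):
        self.container = {}

    def __setitem__(self, key, value):
        self.container[key] = value

    def __iter__(self):
        return self

    def __next__(self):
        # Lazily create the underlying dict iterator on first use.
        if not hasattr(self, '_iter'):
            self._iter = iter(self.container)
        return next(self._iter)  # raises StopIteration at the end


d = OneShotDict()
d['a'] = 1
d['b'] = 2

first_pass = list(d)
second_pass = list(d)
print(first_pass)   # ['a', 'b']
print(second_pass)  # [] because the iterator is already exhausted
```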
If you want to be able to use your dict-like object inside nested loops, for example, or any other application that requires multiple iterations over the same object, then you need to implement an __iter__ method that returns a newly-created iterator object.
Python's iterable objects all do this:
>>> [1, 2, 3].__iter__()
<listiterator object at 0x7f67146e53d0>
>>> iter([1, 2, 3]) # A simpler equivalent
<listiterator object at 0x7f67146e5390>
The simplest thing for your objects' __iter__ method to do would be to return an iterator on the underlying dict, like this:
def __iter__(self):
    return iter(self.container)
For more detail than you will probably ever need, see this GitHub repository.
I am currently working on a LinkedList and have the following code, but I don't understand what __iter__ and __repr__ are doing exactly.
class Node:
    def __init__(self, value):
        self.value = value
        self.next = None
class LinkedList:
    def __init__(self):
        self.head = None
    def append(self, value):
        if self.head is None:
            self.head = Node(value)
            return
        node = self.head
        while node.next:
            node = node.next
        node.next = Node(value)
    def __iter__(self):
        node = self.head
        while node:
            yield node.value
            node = node.next
    def __repr__(self):
        return str([v for v in self])
Here I create the LinkedList and append values to the end of the list:
llist = LinkedList()
for value in [4,2,5,1,-3,0]:
    llist.append(value)
If I print the list with print(llist), I get [4, 2, 5, 1, -3, 0].
I guess this is coming from __iter__ and __repr__. What I don't understand is when __iter__ and __repr__ start, and which one runs first. How can I print objects outside my class?
From my limited understanding of Python internals: when you print() an object, str() is called on it. Strings and numbers know how to format themselves. For a class like LinkedList that defines no __str__() method, the default object.__str__() falls back to calling __repr__(), which is supposed to return a string. So __repr__() is called first.
The __repr__() method was written by the developer to return a normal list representation which is created by the list comprehension [v for v in self]. The iteration ultimately calls the __iter__() method which is a generator function (as indicated by the use of yield). This function iterates over the elements of the list, and every yield makes one element available to the for ... in ... construct.
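A small tracing sketch confirms that order. The class name Traced and the calls list are illustrative only. str() reaches __repr__() first, and the list comprehension inside it is what drives __iter__():

```python
calls = []  # records which special method runs, in order


class Traced:
    def __init__(self, values):
        self.values = values

    def __iter__(self):
        # Generator body: this runs when iteration actually starts
        # (on the first next() call), not when __iter__ is called.
        calls.append('__iter__')
        for v in self.values:
            yield v

    def __repr__(self):
        calls.append('__repr__')
        # The list comprehension iterates over self, triggering __iter__
        return str([v for v in self])


text = str(Traced([4, 2, 5]))  # print() would make the same str() call
print(text)   # [4, 2, 5]
print(calls)  # ['__repr__', '__iter__']
```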
Read the docs.
__iter__: defines how iterating over an instance of the class works, i.e. what each step of the iteration produces (what comes "next"). If you wanted, say, to iterate over animals and have each step yield an animal's weight, you could use self.weight in your animal class, much the way you're using self.value right now.
__repr__: when you do print(my_object), Python uses a default representation (repr) unless you define one. You can redefine it with this method.
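To illustrate that redefinition, here is a sketch with two made-up classes. It shows how print() (via str()) chooses between __str__ and __repr__:

```python
class OnlyRepr:
    def __repr__(self):
        return 'OnlyRepr()'


class Both:
    def __repr__(self):
        return 'Both()'

    def __str__(self):
        return 'a friendly Both'


# Without __str__, str() (and therefore print()) falls back to __repr__.
print(str(OnlyRepr()))  # OnlyRepr()
# With __str__ defined, print() uses it, while repr() still uses __repr__.
print(str(Both()))      # a friendly Both
print(repr(Both()))     # Both()
# Containers show the repr of their items, even when printed.
print(str([Both()]))    # [Both()]
```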
When you implement the __iter__ method, you change the behaviour of looping through your list.
What I mean is, in a normal for loop such as:
for l in list:
    print(l)
this is what happens behind the scenes:
# create an iterator object from that iterable
iter_obj = iter(list)
# infinite loop
while True:
    try:
        # get the next item
        element = next(iter_obj)
        # do something with element
    except StopIteration:
        # if StopIteration is raised, break from loop
        break
So if you redefine __iter__ inside your linked list class, you are redefining what iterating through your object returns.
The same goes for __repr__: you redefine the representation of your object. For example, when you call print(obj) you get a default representation printed, but you can change it to whatever format you want by implementing __repr__.
Take a look here for more explanation:
https://www.programiz.com/python-programming/iterator
https://www.pythonforbeginners.com/basics/str-vs-__repr
__iter__ represents the iterator method of the class. Let's understand through code:
llist = LinkedList()
for value in [4,2,5,1,-3,0]:
    llist.append(value)
for value in llist:
    print(value)
Output
4
2
5
1
-3
0
Here, when we iterate over the object, the __iter__() method comes into action: it traverses the linked list and yields the node values.
print(llist)
Output
[4, 2, 5, 1, -3, 0]
Here, the __repr__() method comes into action and returns the node values formatted as a list, as specified in the code.
I am trying to implement an iterable proxy for a web resource (lazily fetched images).
First, I wrote this (it returns ids here; in production those will be image buffers):
def iter(ids=[1,2,3]):
    for id in ids:
        yield id
and that worked nicely, but now I need to keep state.
I read about the four ways to define iterators and judged that the iterator protocol is the way to go. Here is my attempt, which fails:
class Test:
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        return self
    def __next__(self):
        for id in self.ids:
            yield id
        raise StopIteration
test = Test([1,2,3])
for t in test:
    print('new value', t)
Output:
new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>
new value <generator object Test.__next__ at 0x7f9c46ed1660>
new value <generator object Test.__next__ at 0x7f9c46ed1750>
forever.
What's wrong?
Thanks to absolutely everyone! It's all new to me, but I'm learning new cool stuff.
Your __next__ method uses yield, which makes it a generator function. Generator functions return a new iterator when called.
But the __next__ method is part of the iterator interface. It should not itself be an iterator. __next__ should return the next value, not something that returns all values(*).
Because you wanted to create an iterable, you can just make __iter__ the generator here:
class Test:
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        for id in self.ids:
            yield id
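Note that a generator function should not use raise StopIteration; just returning from the function does that for you. (Since Python 3.7, per PEP 479, a StopIteration raised inside a generator is turned into a RuntimeError.)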
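Here is a short sketch of the difference, assuming Python 3.7 or later (PEP 479). Returning ends a generator cleanly, while an explicit StopIteration inside the generator body surfaces as a RuntimeError:

```python
def bad_gen():
    yield 1
    raise StopIteration  # the wrong way to end a generator


def good_gen():
    yield 1
    return  # the right way: just return (or fall off the end)


print(list(good_gen()))  # [1]

try:
    list(bad_gen())
except RuntimeError as exc:
    print(type(exc).__name__, exc)  # RuntimeError generator raised StopIteration
```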
The above class is an iterable. Iterables only have an __iter__ method, and no __next__ method. Iterables produce an iterator when __iter__ is called:
Iterable -> (call __iter__) -> Iterator
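The standard collections.abc classes offer one way to check which side of that arrow an object is on. This sketch uses a copy of the generator-based Test class above. The instance passes only the Iterable check, while what __iter__ returns is also an Iterator:

```python
from collections.abc import Iterable, Iterator


class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        for ident in self.ids:
            yield ident


test = Test([1, 2, 3])
it = iter(test)  # calls Test.__iter__, producing a generator object

print(isinstance(test, Iterable), isinstance(test, Iterator))  # True False
print(isinstance(it, Iterable), isinstance(it, Iterator))      # True True
print(iter(it) is it)  # True: an iterator's __iter__ returns itself
```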
In the above example, because Test.__iter__ is a generator function, it creates a new object each time we call it:
>>> test = Test([1,2,3])
>>> test.__iter__() # create an iterator
<generator object Test.__iter__ at 0x111e85660>
>>> test.__iter__()
<generator object Test.__iter__ at 0x111e85740>
A generator object is a specific kind of iterator, one created by calling a generator function, or by using a generator expression. Note that the hex values in the representations differ: two different objects were created for the two calls. This is by design! Iterables produce iterators, and can create more at will. This lets you loop over them independently:
>>> test_it1 = test.__iter__()
>>> test_it1.__next__()
1
>>> test_it2 = test.__iter__()
>>> test_it2.__next__()
1
>>> test_it1.__next__()
2
Note that I called __next__() on the object returned by test.__iter__(), the iterator, not on test itself, which doesn't have that method because it is only an iterable, not an iterator.
Iterators also have an __iter__ method, which always must return self, because they are their own iterators. It is the __next__ method that makes them an iterator, and the job of __next__ is to be called repeatedly, until it raises StopIteration. Until StopIteration is raised, each call should return the next value. Once an iterator is done (has raised StopIteration), it is meant to then always raise StopIteration. Iterators can only be used once, unless they are infinite (never raise StopIteration and just keep producing values each time __next__ is called).
So this is an iterator:
class IteratorTest:
    def __init__(self, ids):
        self.ids = ids
        self.nextpos = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.ids is None or self.nextpos >= len(self.ids):
            # we are done
            self.ids = None
            raise StopIteration
        value = self.ids[self.nextpos]
        self.nextpos += 1
        return value
This has to do a bit more work; it has to keep track of what the next value to produce would be, and if we have raised StopIteration yet. Other answerers here have used what appear to be simpler ways, but those actually involve letting something else do all the hard work. When you use iter(self.ids) or (i for i in ids) you are creating a different iterator to delegate __next__ calls to. That's cheating a bit, hiding the state of the iterator inside ready-made standard library objects.
You don't usually see anything calling __iter__ or __next__ in Python code, because those two methods are just the hooks that you can implement in your Python classes; if you were to implement an iterator in the C API then the hook names are slightly different. Instead, you either use the iter() and next() functions, or just use the object in syntax or a function call that accepts an iterable.
The for loop is such syntax. When you use a for loop, Python does the moral equivalent of calling __iter__() on the object, then __next__() on the resulting iterator object to get each value. You can see this if you disassemble the Python bytecode:
>>> from dis import dis
>>> dis("for t in test: pass")
  1           0 LOAD_NAME                0 (test)
              2 GET_ITER
        >>    4 FOR_ITER                 4 (to 10)
              6 STORE_NAME               1 (t)
              8 JUMP_ABSOLUTE            4
        >>   10 LOAD_CONST               0 (None)
             12 RETURN_VALUE
The GET_ITER opcode at position 2 calls test.__iter__(), and FOR_ITER uses __next__ on the resulting iterator to keep looping (executing STORE_NAME to set t to the next value, then jumping back to position 4), until StopIteration is raised. Once that happens, it'll jump to position 10 to end the loop.
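A sketch with a hypothetical Tracing class that logs its own protocol calls makes the GET_ITER / FOR_ITER sequence visible from Python. There is one __iter__() call, then __next__() calls until StopIteration, including the final, failing call:

```python
class Tracing:
    """Iterator that logs every protocol call (illustrative only)."""

    def __init__(self, values):
        self.values = values
        self.pos = 0
        self.log = []

    def __iter__(self):
        self.log.append('__iter__')
        return self

    def __next__(self):
        self.log.append('__next__')
        if self.pos >= len(self.values):
            raise StopIteration
        value = self.values[self.pos]
        self.pos += 1
        return value


t = Tracing([10, 20])
for _ in t:
    pass
print(t.log)  # ['__iter__', '__next__', '__next__', '__next__']
```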
If you want to play more with the difference between iterators and iterables, take a look at the Python standard types and see what happens when you use iter() and next() on them. Like lists or tuples:
>>> foo = (42, 81, 17, 111)
>>> next(foo) # foo is a tuple, not an iterator
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'tuple' object is not an iterator
>>> t_it = iter(foo) # so use iter() to create one from the tuple
>>> t_it # here is an iterator object for our foo tuple
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) # it returns itself
<tuple_iterator object at 0x111e9af70>
>>> iter(t_it) is t_it # really, it returns itself, not a new object
True
>>> next(t_it) # we can get values from it, one by one
42
>>> next(t_it) # another one
81
>>> next(t_it) # yet another one
17
>>> next(t_it) # this is getting boring..
111
>>> next(t_it) # and now we are done
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> next(t_it) # and *stay* done
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> foo # but foo itself is still there
(42, 81, 17, 111)
You could make Test, the iterable, return an instance of a custom iterator class too (instead of copping out by having a generator function create the iterator for us):
class Test:
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        return TestIterator(self)
class TestIterator:
    def __init__(self, test):
        self.test = test
        self.nextpos = 0  # index of the next value to produce
    def __iter__(self):
        return self
    def __next__(self):
        if self.test is None or self.nextpos >= len(self.test.ids):
            # we are done
            self.test = None
            raise StopIteration
        value = self.test.ids[self.nextpos]
        self.nextpos += 1
        return value
That's a lot like the original IteratorTest class above, but TestIterator keeps a reference to the Test instance. That's really how tuple_iterator works too.
A brief, final note on naming conventions here: I am sticking with using self for the first argument to methods, so the bound instance. Using different names for that argument only serves to make it harder to talk about your code with other, experienced Python developers. Don't use me, however cute or short it may seem.
(*) Unless your goal was to create an iterator of iterators, of course (which is basically what the itertools.groupby() iterator does, it is an iterator producing (object, group_iterator) tuples, but I digress).
It is unclear to me exactly what you are trying to achieve, but if you really want to use your instance attributes like this, you can convert the input to an iterator with iter() and delegate to it. But, as I said, this feels odd and I don't think you'd actually want a setup like this.
class Test:
    def __init__(self, ids):
        self.ids = iter(ids)
    def __iter__(self):
        return self
    def __next__(self):
        return next(self.ids)
test = Test([1,2,3])
for t in test:
    print('new value', t)
The simplest solution is to use __iter__ and return an iterator to the main list:
class Test:
    def __init__(self, ids):
        self.ids = ids
    def __iter__(self):
        return iter(self.ids)
test = Test([1,2,3])
for t in test:
    print('new value', t)
As for your update: for lazy loading, __iter__ can return a generator expression, which is already an iterator (so wrapping it in iter() is not needed):
def __iter__(self):
    return (load_file(id) for id in self.ids)
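Here is a sketch of that laziness. load_file is a hypothetical stand-in for the real fetch, and it records which ids it was asked for. Nothing is loaded until a loop (or next()) actually requests an item:

```python
loaded = []  # records which ids were actually "fetched"


def load_file(ident):
    """Hypothetical stand-in for an expensive fetch (e.g. an image download)."""
    loaded.append(ident)
    return f'<buffer {ident}>'


class Test:
    def __init__(self, ids):
        self.ids = ids

    def __iter__(self):
        # A generator expression is already an iterator; nothing is
        # fetched until a loop asks for the next item.
        return (load_file(ident) for ident in self.ids)


test = Test([1, 2, 3])
it = iter(test)
print(loaded)    # [] because nothing has been fetched yet
print(next(it))  # <buffer 1>
print(loaded)    # [1]
```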
The __next__ function is supposed to return the next value provided by an iterator. Since you have used yield in your implementation, the function returns a generator, which is what you get.
You need to make clear whether you want Test to be an iterable or an iterator. If it is an iterable, it will have the ability to provide an iterator with __iter__. If it is an iterator, it will have the ability to provide new elements with __next__. Iterators can typically work as iterables by returning themselves in __iter__. Martijn's answer shows what you probably want. However, if you want an example of how you could specifically implement __next__ (by making Test explicitly an iterator), it could be something like this:
class Test:
    def __init__(self, ids):
        self.ids = ids
        self.idx = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.idx >= len(self.ids):
            raise StopIteration
        else:
            self.idx += 1
            return self.ids[self.idx - 1]
test = Test([1,2,3])
for t in test:
    print('new value', t)
I have a class with __iter__ defined like below:
class MyIterator:
    def __iter__(self):
        for value in self._iterator:
            if is_iterator(value):
                yield from value
            else:
                yield value
I want to do next(my_iterator) but I have to implement __next__ to do so. But it would change this simple implementation to a fairly complicated one - or actually I don't know how to implement this instead of defining __iter__ as a generator function.
Generally speaking, if __iter__ is implemented as a generator function, and the same logic would be hard to write without a generator, what should I do if I want to use __next__?
Note: Apparently, next(iter(my_iterator)) works, but I don't want to do it.
If your class is supposed to be an iterator, it should not have its __iter__ method implemented as a generator function. That makes the class iterable, but not an iterator. An iterator's __iter__ method is supposed to return itself.
If you really want your class to be an iterator, try something like this:
class MyIterator:
    def __init__(self, iterator):
        self._iterator = iter(iterator)  # accept any iterable, not just iterators
        self._subiterator = None
    def __iter__(self):
        return self
    def __next__(self):
        while True:
            if self._subiterator is None:
                value = next(self._iterator) # may raise StopIteration
                try: # could test is_iterator(value) here for LBYL style code
                    self._subiterator = iter(value)
                except TypeError:
                    return value
            try:
                return next(self._subiterator)
            except StopIteration:
                self._subiterator = None
The next(self._iterator) call may raise StopIteration, which I deliberately do not catch. That exception is the signal we're finished iterating, so if we caught it we'd only have to raise it again.
This code uses an "Easier to Ask Forgiveness than Permission" (EAFP) approach to detecting iterable items within the iterator it's been given. It simply tries calling iter on each one and catches the TypeError that is raised if they're not iterable. If you prefer to stick with the "Look Before You Leap" (LBYL) style and explicitly test with is_iterator (which is badly named, since it checks for any kind of iterable, not only iterators), you could replace the inner try with:
if is_iterator(value):
    self._subiterator = iter(value)
else:
    return value
I usually prefer EAFP style to LBYL style in my Python code, but there are situations where either one can be better. Other times it's just a matter of style.
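Here is the LBYL variant put together as a sketch. It assumes is_iterator is the collections.abc.Iterable check used in the next answer, since the question never shows its definition. Because the class is its own iterator, both next() and for loops work on the instance:

```python
from collections.abc import Iterable


def is_iterator(value):
    # Assumed definition: despite the name, this tests for any iterable.
    return isinstance(value, Iterable)


class MyIterator:
    """LBYL variant: flattens one level of nesting."""

    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._subiterator = None

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if self._subiterator is None:
                value = next(self._iterator)  # may raise StopIteration
                if is_iterator(value):
                    self._subiterator = iter(value)
                else:
                    return value
            try:
                return next(self._subiterator)
            except StopIteration:
                self._subiterator = None  # move on to the next outer item


flat = MyIterator([1, [2, 3], [], (4,), 5])
print(next(flat))  # 1
print(list(flat))  # [2, 3, 4, 5]
```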
As @BrenBarm commented, the apparent answer was to return next(iter(self)), or next(self._iter_self) after storing self._iter_self = iter(self). I just couldn't come up with it myself.
an iterator is an object with a __next__ method that returns values until finally raising StopIteration.
an iterable is an object with an __iter__ method that returns an iterator.
a generator is a special kind of iterator that Python creates when you call a function or method that contains a yield statement.
In your example, __iter__ contains yield, so it is a generator function: each call to __iter__ returns a new generator. That makes your class an iterable, not an iterator (it has no __next__). That's why you have to do the strange next(iter(my_iterator)) thing, and even that doesn't work as intended because it restarts the enumeration each time.
How best to solve this problem depends on how you are using this class. You could create a generator function instead and then use iter to make iterators as needed:
import collections.abc
def is_iterator(i):
    return isinstance(i, collections.abc.Iterable)
def MyIterator(iterable):
    for value in iterable:
        if is_iterator(value):
            yield from value
        else:
            yield value
test_this = [1,2, [3, 4, 5], [6], [], 7, 'foo']
my_iterator = iter(MyIterator(test_this))
try:
    while True:
        print(next(my_iterator))
except StopIteration:
    pass
Or you can implement __next__ instead of __iter__. But you can't use yield and have to return a value on each call until the outer iterator completes and raises StopIteration.
import collections.abc
class MyIterator:
    def __init__(self, iterable):
        self._iterator = iter(iterable)
        self._iterating = None
    def __next__(self):
        while True:
            if self._iterating is not None:
                try:
                    return next(self._iterating)
                except StopIteration:
                    self._iterating = None
            value = next(self._iterator)
            if isinstance(value, collections.abc.Iterable):
                self._iterating = iter(value)
            else:
                return value
test_this = [1,2, [3, 4, 5], [6], [], 7, 'foo']
my_iterator = MyIterator(test_this)
try:
    while True:
        print(next(my_iterator))
except StopIteration:
    pass
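The class above defines __next__ but not __iter__, so next() works on it while a for loop would not. Here is a minimal sketch with made-up class names that shows the difference, and shows that adding an __iter__ that returns self fixes it:

```python
class NextOnly:
    """Has __next__ but no __iter__ (illustrative only)."""

    def __init__(self, values):
        self._it = iter(values)

    def __next__(self):
        return next(self._it)


class FullIterator(NextOnly):
    def __iter__(self):
        return self  # iterators return themselves from __iter__


print(next(NextOnly([1, 2])))  # 1: next() only needs __next__

try:
    for _ in NextOnly([1, 2]):
        pass
except TypeError as exc:
    print(exc)  # 'NextOnly' object is not iterable

print(list(FullIterator([1, 2])))  # [1, 2]
```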
class Node:
    def __init__(self, value):
        self._value = value
        self._children = []
    def __repr__(self):
        return 'Node({!r})'.format(self._value)
    def add_child(self, node):
        self._children.append(node)
    def __iter__(self):
        return iter(self._children)
    def depth_first(self):
        yield self
        for c in self:
            yield from c.depth_first()
if __name__ == '__main__':
    root = Node(0)
    child1 = Node(1)
    child2 = Node(2)
    root.add_child(child1)
    root.add_child(child2)
    child1.add_child(Node(3))
    child1.add_child(Node(4))
    child2.add_child(Node(5))
    for a in root.depth_first():
        print(a)
    # Outputs Node(0), Node(1), Node(3), Node(4), Node(2), Node(5)
I thought a list is an object we can already iterate over, so why use iter()? I am new to Python, so this looks weird to me.
Because returning self._children would return a list object, which doesn't work as an iterator. Remember, iterators implement the __next__ method in order to supply items during iteration:
>>> next(list()) # doesn't implement a __next__ method
TypeError: 'list' object is not an iterator
Lists are iterable: they can be iterated over because calling iter on them returns an iterator. But lists themselves are not iterators; a good breakdown of the difference can be found in the top answer of this Question.
A list's __iter__ method returns a new list_iterator object each time __iter__ is called:
list().__iter__()
Out[93]: <list_iterator at 0x7efe7802d748>
By doing that, the list supports being iterated multiple times; the list_iterator object is what implements the required __next__ method.
Calling iter on the list and returning the result does exactly that: it returns the list's iterator and saves you the trouble of implementing __next__ yourself.
As for your comment asking why for c in self._children works: it's essentially doing the same thing. What basically happens in the for loop is:
it = iter(self._children)  # returns the list iterator
while True:
    try:
        i = next(it)
    except StopIteration:
        break
    # <loop body>, using i
meaning iter() is called on the list object, and the for loop then makes its next() calls on the iterator it returns.
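Here is a sketch with two made-up classes that shows what happens if __iter__ returns the list itself instead of iter() of it. The for loop refuses it, because a list is iterable but not an iterator:

```python
class ReturnsList:
    def __init__(self):
        self._children = [1, 2]

    def __iter__(self):
        return self._children  # wrong: a list is iterable, not an iterator


class ReturnsIterator(ReturnsList):
    def __iter__(self):
        return iter(self._children)  # right: a fresh list_iterator


try:
    for child in ReturnsList():
        pass
except TypeError as exc:
    print(exc)  # iter() returned non-iterator of type 'list'

print(list(ReturnsIterator()))  # [1, 2]
```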
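Every iterator object implements an __iter__() method that returns itself.
iter(obj) calls obj.__iter__(). For a list this returns a new list_iterator rather than the list itself; returning the list object itself from your __iter__ would make iter() raise a TypeError, because a list is not an iterator.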
When using list comprehension or the in keyword in a for loop context, i.e:
for o in X:
    do_something_with(o)
or
l=[o for o in X]
How does the mechanism behind in work?
Which functions/methods within X does it call?
If X can comply with more than one method, what's the precedence?
How to write an efficient X, so that list comprehension will be quick?
The complete and correct answer, as far as I know:
for, both in for loops and list comprehensions, calls iter() on X. iter() will return an iterator if X has either an __iter__ method or a __getitem__ method. If it implements both, __iter__ is used. If it has neither, you get a TypeError such as TypeError: 'Nothing' object is not iterable (for a class named Nothing).
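Here is a small sketch with hypothetical class names that shows both rules. __iter__ wins when both methods are defined, and the __getitem__ fallback is called with 0, 1, 2, ... until an index raises IndexError:

```python
class Both:
    """Defines both protocols; the names here are illustrative."""

    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        # Used only when __iter__ is missing.
        return ('getitem', self.data[index])

    def __iter__(self):
        for x in self.data:
            yield ('iter', x)


class OnlyGetItem:
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        return self.data[index]  # IndexError past the end stops iteration


print([x for x in Both('ab')])         # [('iter', 'a'), ('iter', 'b')]
print([x for x in OnlyGetItem('ab')])  # ['a', 'b']
```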
This implements a __getitem__:
class GetItem(object):
def __init__(self, data):
self.data = data
def __getitem__(self, x):
return self.data[x]
Usage:
>>> data = range(10)
>>> print([x*x for x in GetItem(data)])
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
This is an example of implementing __iter__:
class TheIterator(object):
    def __init__(self, data):
        self.data = data
        self.index = -1
    # Note: In Python 3 this is called __next__
    def next(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration
    __next__ = next  # alias so the class also works on Python 3
    def __iter__(self):
        return self
class Iter(object):
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        return TheIterator(self.data)
Usage:
>>> data = range(10)
>>> print([x*x for x in Iter(data)])
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
As you can see, you need both an iterator, and an __iter__ that returns that iterator.
You can combine them:
class CombinedIter(object):
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        self.index = -1
        return self
    def next(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration
    __next__ = next  # alias so the class also works on Python 3
Usage:
>>> well, you get it, it's all the same...
But then you can only have one iterator going at once.
OK, in this case you could just do this:
class CheatIter(object):
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        return iter(self.data)
But that's cheating because you are just reusing the __iter__ method of list.
An easier way is to use yield, and make __iter__ into a generator:
class Generator(object):
    def __init__(self, data):
        self.data = data
    def __iter__(self):
        for x in self.data:
            yield x
This last is the way I would recommend. Easy and efficient.
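To make "only one iterator going at once" concrete, here is a Python 3 sketch (using __next__ in place of next) of CombinedIter and Generator in a nested loop. CombinedIter keeps a single shared index, so the inner loop exhausts it for the outer loop too. Generator hands out an independent generator per loop:

```python
class CombinedIter:
    """Its own iterator: __iter__ resets shared state and returns self."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        self.index = -1
        return self

    def __next__(self):
        self.index += 1
        try:
            return self.data[self.index]
        except IndexError:
            raise StopIteration from None


class Generator:
    """Iterable: each __iter__ call returns a fresh generator."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        for x in self.data:
            yield x


c = CombinedIter([0, 1, 2])
g = Generator([0, 1, 2])
print([(a, b) for a in c for b in c])       # [(0, 0), (0, 1), (0, 2)]
print(len([(a, b) for a in g for b in g]))  # 9
```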
X must be iterable. It must implement __iter__(), which returns an iterator object; the iterator object must implement next() (__next__() in Python 3), which returns the next item every time it is called, or raises StopIteration if there's no next item.
Lists, tuples and generators are all iterable.
Note that the plain for statement uses the same mechanism.
To answer the question's comments: reading the source is not the best idea in this case. The code responsible for executing compiled code (ceval.c) is not very readable for someone seeing the Python sources for the first time. Here is the snippet that implements iteration in for loops:
TARGET(FOR_ITER)
    /* before: [iter]; after: [iter, iter()] *or* [] */
    v = TOP();
    /*
       Here tp_iternext corresponds to next() in Python
    */
    x = (*v->ob_type->tp_iternext)(v);
    if (x != NULL) {
        PUSH(x);
        PREDICT(STORE_FAST);
        PREDICT(UNPACK_SEQUENCE);
        DISPATCH();
    }
    if (PyErr_Occurred()) {
        if (!PyErr_ExceptionMatches(
                        PyExc_StopIteration))
            break;
        PyErr_Clear();
    }
    /* iterator ended normally */
    x = v = POP();
    Py_DECREF(v);
    JUMPBY(oparg);
    DISPATCH();
To find out what actually happens here, you need to dive into a bunch of other files whose readability is not much better. So I think that in such cases the documentation and sites like SO are the first place to go, and the source should only be checked for implementation details they don't cover.
X must be an iterable object, meaning it needs to have an __iter__() method.
So, to start a for..in loop or a list comprehension, X's __iter__() method is first called to obtain an iterator object; then that object's next() method (__next__() in Python 3) is called for each iteration until StopIteration is raised, at which point the iteration stops.
I'm not sure what your third question means, and how to provide a meaningful answer to your fourth question except that your iterator should not construct the entire list in memory at once.
Maybe this helps (tutorial http://docs.python.org/tutorial/classes.html Section 9.9):
Behind the scenes, the for statement calls iter() on the container object. The function returns an iterator object that defines the method next() which accesses elements in the container one at a time. When there are no more elements, next() raises a StopIteration exception which tells the for loop to terminate.
To answer your questions:
How does the mechanism behind in work?
It is the exact same mechanism as used for ordinary for loops, as others have already noted.
Which functions/methods within X does it call?
As noted in a comment below, it calls iter(X) to get an iterator. If X has an __iter__() method defined, it will be called to return an iterator; otherwise, if X defines __getitem__(), it will be called repeatedly with the indices 0, 1, 2, ... until it raises IndexError. See the Python documentation for iter() here: http://docs.python.org/library/functions.html#iter
If X can comply with more than one method, what's the precedence?
I'm not sure exactly what your question is here, but if X defines both __iter__() and __getitem__(), iter() uses __iter__(). Beyond that, Python has standard rules for how it resolves method names, and they are followed here. Here is a discussion of this:
Method Resolution Order (MRO) in new style Python classes
How to write an efficient X, so that list comprehension will be quick?
I suggest you read up more on iterators and generators in Python. One easy way to make any class support iteration is to make its __iter__() method a generator function. Here is a discussion of generators:
http://linuxgazette.net/100/pramode.html