Confusion about iterator in python [duplicate] - python

I am confused about iterators in Python. Please take a look at this code:
class MyNumbers:
    def __init__(self):
        self.a = 4
    def __iter__(self):
        return self
    def __next__(self):
        if self.a <= 20:
            x = self.a
            self.a += 1
            return x
        else:
            raise StopIteration
myclass = MyNumbers()
myiter1 = iter(myclass)
c = list(myiter1)
for x in myiter1:
    print(x)
I get no output from the above code. I was expecting it to iterate over the values starting from 4.
But when I remove c=list(myiter1), I get the expected output. Why is this happening?

Your iterator is exhausted: list in the assignment to c consumed all the available values, resulting in a list [4, 5, 6, ..., 20] and an iterator that will raise StopIteration immediately when you try to iterate over it again.
You may be confused because it appears you can iterate over a list multiple times. But that is because a list itself is not an iterator.
>>> list.__next__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'list' has no attribute '__next__'
Calling iter on a list produces a list_iterator value, not the list itself. That iterator is exhaustible like any other iterator.
>>> x = [1,2,3]
>>> y = iter(x)
>>> type(y)
<class 'list_iterator'>
>>> list(y)
[1, 2, 3]
>>> list(y)
[]
The for loop implicitly calls iter on the thing you are trying to iterate over. By convention, all iterators are also iterable: an iterator's __iter__ should return self.
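A minimal sketch of one way around the exhaustion, assuming the goal is to loop over the same numbers more than once: keep the data in an iterable whose __iter__ hands out a fresh iterator on every call. The class name ReusableNumbers and its start/stop parameters are made up for illustration and are not part of the question.

```python
class ReusableNumbers:
    """Iterable (not an iterator): every iter() call starts over."""

    def __init__(self, start=4, stop=20):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # A generator function returns a brand-new generator (an iterator)
        # each time it is called, so no state is shared between passes.
        current = self.start
        while current <= self.stop:
            yield current
            current += 1


numbers = ReusableNumbers()
first_pass = list(numbers)    # consumes one temporary iterator
second_pass = list(numbers)   # gets a fresh iterator, so it is not empty
```

Here list(numbers) and a later for loop each call iter(numbers) on their own, so neither can exhaust the other.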

Related

Does next() eliminate values from a generator?

I've written a generator that does nothing more or less than store a range from 0 to 10:
result = (num for num in range(11))
When I want to print values, I can use next():
print(next(result))
[Out]: 0
print(next(result))
[Out]: 1
print(next(result))
[Out]: 2
print(next(result))
[Out]: 3
print(next(result))
[Out]: 4
If I then run a for loop on the generator, it runs on the values that I have not called next() on:
for value in result:
    print(value)
[Out]: 5
6
7
8
9
10
Has the generator eliminated the other values by acting on them with a next() function? I've tried to find some documentation on the functionality of next() and generators but haven't been successful.
Actually, this can be deduced from next's documentation and from the iterator protocol/contract:
next(iterator[, default])
Retrieve the next item from the iterator by
calling its next() method. If default is given, it is returned if
the iterator is exhausted, otherwise StopIteration is raised.
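A small sketch of what that docstring describes, reusing the generator from the question. The sentinel string "done" is an arbitrary choice for illustration.

```python
result = (num for num in range(11))

first = next(result)        # takes 0 out of the generator
second = next(result)       # takes 1 out of the generator
remaining = list(result)    # whatever has not been consumed yet

# The generator is now exhausted. With a default, next() returns the
# default instead of raising StopIteration.
after_exhaustion = next(result, "done")

# Without a default, StopIteration is raised.
try:
    next(result)
    raised = False
except StopIteration:
    raised = True
```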
Yes. Using a generator's __next__ method retrieves and removes the next value from the generator.
tldr; yes
An iterator is essentially a value producer that yields successive values from its associated iterable object. The built-in function next() is used to obtain the next value from an iterator.
Here is an example using a list:
>>> l = ['Sarah', 'Roark']
>>> itr = iter(l)
>>> itr
<list_iterator object at 0x100ba8950>
>>> next(itr)
'Sarah'
>>> next(itr)
'Roark'
>>> next(itr)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
In this example, l is an iterable list and itr is the associated iterator, obtained with iter(). Each next(itr) call obtains the next value from itr.
Notice how an iterator retains its state internally. It knows which values have been obtained already, so when you call next(), it knows what value to return next.
If all the values from an iterator have been returned already, a subsequent next() call raises a StopIteration exception. Any further attempts to obtain values from the iterator will fail.
We can only obtain values from an iterator in one direction. We can’t go backward. There is no prev() function. But we can define two independent iterators on the same iterable object:
>>> l = ['Sarah', 'Roark', 30]
>>> itr1 = iter(l)
>>> itr2 = iter(l)
>>> next(itr1)
'Sarah'
>>> next(itr1)
'Roark'
>>> next(itr1)
30
>>> next(itr2)
'Sarah'
Yes, a for loop in Python just retrieves the next item from the iterator on each pass, the same way that next() does.
https://docs.python.org/3/reference/compound_stmts.html#the-for-statement
The suite is then executed once for each item provided by the iterator, in the order returned by the iterator.
So you can think of a for loop like this:
for x in container:
    statement()
As (almost) equivalent to a while loop:
iterator = iter(container)
while True:
    x = next(iterator, None)  # None stands in for "no more items"
    if x is None:
        break
    statement()
If container is already an iterator, then iter(container) is container.
Note: Technically, a for loop is more like this:
iterator = iter(container)
while True:
    try:
        x = iterator.__next__()
    except StopIteration:
        break
    statement()
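To see why a loop that stops on a None check is only "almost" equivalent, here is a hedged sketch comparing the two approaches on data that contains None. The helper names loop_with_none_check and loop_with_stopiteration are invented for this illustration.

```python
def loop_with_none_check(container):
    """Stops at the first None, whether or not the data is exhausted."""
    seen = []
    iterator = iter(container)
    while True:
        x = next(iterator, None)   # None doubles as the "exhausted" signal
        if x is None:
            break
        seen.append(x)
    return seen


def loop_with_stopiteration(container):
    """Mirrors what a for loop does: stop only on StopIteration."""
    seen = []
    iterator = iter(container)
    while True:
        try:
            x = next(iterator)
        except StopIteration:
            break
        seen.append(x)
    return seen


data = [1, None, 3]
truncated = loop_with_none_check(data)
complete = loop_with_stopiteration(data)
```

The first helper cuts the loop short as soon as the data itself contains None; the second visits every item, as a real for loop would.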

Does for loop call __iter__?

Look at the sample below:
a = [1, 2, 3, 4]
for i in a:
    print(a)
a is the list (iterable), not the iterator.
I'm not asking whether __iter__ or iter() converts a list to an iterator!
I'm asking whether the for loop itself converts the list implicitly and then calls __iter__ for the iteration, keeping the list intact rather than consuming it the way an iterator would.
Since Stack Overflow flagged my question as a possible duplicate:
The unique part is that I'm not asking about the for loop as a concept, nor about __iter__; I'm asking about the core mechanism of the for loop and its relationship with iter.
I'm asking whether the for loop itself converts the list implicitly and then calls iter for the iteration, keeping the list intact rather than consuming it the way an iterator would.
The for loop does not convert the list implicitly in the sense that it mutates the list, but it implicitly creates an iterator from the list. The list itself will not change state during iteration, but the created iterator will.
a = [1, 2, 3]
for x in a:
    print(x)
is equivalent to
a = [1, 2, 3]
it = iter(a)  # calls a.__iter__
while True:
    try:
        x = next(it)
    except StopIteration:
        break
    print(x)
Here's proof that __iter__ actually gets called:
import random
class DemoIterable(object):
    def __iter__(self):
        print('__iter__ called')
        return DemoIterator()
class DemoIterator(object):
    def __iter__(self):
        return self
    def __next__(self):
        print('__next__ called')
        r = random.randint(1, 10)
        if r == 5:
            print('raising StopIteration')
            raise StopIteration
        return r
Iteration over a DemoIterable:
>>> di = DemoIterable()
>>> for x in di:
...     print(x)
...
__iter__ called
__next__ called
9
__next__ called
8
__next__ called
10
__next__ called
3
__next__ called
10
__next__ called
raising StopIteration

What do you call the item of list when used as an iterator in a for loop?

I'm not sure what to call the n in the following for loop. Is there a term for it?
for n in [1,2,3,4,5]:
    print(n)
And, am I correct that the list itself is the iterator of the for loop ?
While n is called a loop variable, the list is absolutely not an iterator. It is an iterable object, i.e. an iterable, but it is not an iterator. An iterable may be an iterator itself, but not always. That is to say, iterators are iterable, but not all iterables are iterators. In the case of a list, it is simply an iterable.
It is an iterable because it implements an __iter__ method, which returns an iterator:
From the Python Glossary an iterable is:
An object capable of returning its members one at a time. Examples of
iterables include all sequence types (such as list, str, and tuple)
and some non-sequence types like dict, file objects, and objects of
any classes you define with an __iter__() or __getitem__() method.
Iterables can be used in a for loop and in many other places where a
sequence is needed (zip(), map(), ...). When an iterable object is
passed as an argument to the built-in function iter(), it returns an
iterator for the object. This iterator is good for one pass over the
set of values. When using iterables, it is usually not necessary to
call iter() or deal with iterator objects yourself. The for statement
does that automatically for you, creating a temporary unnamed variable
to hold the iterator for the duration of the loop.
So, observe:
>>> x = [1,2,3]
>>> iterator = iter(x)
>>> type(iterator)
<class 'list_iterator'>
>>> next(iterator)
1
>>> next(iterator)
2
>>> next(iterator)
3
>>> next(iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
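The glossary entry quoted above also mentions __getitem__(). That refers to the older sequence protocol: when an object has no __iter__, iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError is raised. A minimal sketch, with a made-up class name Squares:

```python
class Squares:
    """Defines only __getitem__, yet iter() can still build an iterator."""

    def __init__(self, count):
        self.count = count

    def __getitem__(self, index):
        # IndexError tells the fallback iterator that the data has ended.
        if index < 0 or index >= self.count:
            raise IndexError(index)
        return index * index


squares = Squares(4)
values = list(squares)                          # iterates via __getitem__
has_iter_method = hasattr(squares, "__iter__")  # no __iter__ was defined
```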
It is illuminating to understand that a for-loop in Python such as the following:
for n in some_iterable:
    # do something
is equivalent to:
iterator = iter(some_iterable)
while True:
    try:
        n = next(iterator)
        # do something
    except StopIteration as e:
        break
Iterators, which are returned by a call to an object's __iter__ method, also implement the __iter__ method (usually returning themselves), but they additionally implement a __next__ method. Thus, an easy way to check whether something is an iterator is to see if it implements a __next__ method, for example by passing it to next():
>>> next(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
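Instead of calling next() and watching for the TypeError, the standard library's collections.abc module offers abstract base classes for the same check. A short sketch (the variable names are illustrative):

```python
from collections.abc import Iterable, Iterator

x = [1, 2, 3]
it = iter(x)

list_is_iterable = isinstance(x, Iterable)   # list defines __iter__
list_is_iterator = isinstance(x, Iterator)   # list has no __next__
it_is_iterable = isinstance(it, Iterable)    # iterators define __iter__
it_is_iterator = isinstance(it, Iterator)    # has both __iter__ and __next__
```

One caveat: isinstance(obj, Iterable) only looks for __iter__, so it does not recognize objects that are iterable solely through __getitem__.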
Again, from the Python Glossary, an iterator is:
An object representing a stream of data. Repeated calls to the
iterator’s __next__() method (or passing it to the built-in function
next()) return successive items in the stream. When no more data are
available a StopIteration exception is raised instead. At this point,
the iterator object is exhausted and any further calls to its
__next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object
itself so every iterator is also iterable and may be used in most
places where other iterables are accepted. One notable exception is
code which attempts multiple iteration passes. A container object
(such as a list) produces a fresh new iterator each time you pass it
to the iter() function or use it in a for loop. Attempting this with
an iterator will just return the same exhausted iterator object used
in the previous iteration pass, making it appear like an empty
container.
I've illustrated the behavior of an iterator with the next function above, so now I want to concentrate on the part about code that attempts multiple iteration passes.
Basically, an iterator can be used in the place of an iterable because iterators are always iterable. However, an iterator is good for only a single pass. So, if I use a non-iterator iterable, like a list, I can do stuff like this:
>>> my_list = ['a','b','c']
>>> for c in my_list:
...     print(c)
...
a
b
c
And this:
>>> for c1 in my_list:
...     for c2 in my_list:
...         print(c1,c2)
...
a a
a b
a c
b a
b b
b c
c a
c b
c c
>>>
An iterator behaves almost in the same way, so I can still do this:
>>> it = iter(my_list)
>>> for c in it:
...     print(c)
...
a
b
c
>>>
However, iterators do not support multiple iteration (well, you can make your own iterator that does, but generally they do not):
>>> it = iter(my_list)
>>> for c1 in it:
...     for c2 in it:
...         print(c1,c2)
...
a b
a c
Why is that? Well, recall what is happening with the iterator protocol which is used by a for loop under the hood, and consider the following:
>>> my_list = ['a','b','c','d','e','f','g']
>>> iterator = iter(my_list)
>>> iterator_of_iterator = iter(iterator)
>>> next(iterator)
'a'
>>> next(iterator)
'b'
>>> next(iterator_of_iterator)
'c'
>>> next(iterator_of_iterator)
'd'
>>> next(iterator)
'e'
>>> next(iterator_of_iterator)
'f'
>>> next(iterator)
'g'
>>> next(iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> next(iterator_of_iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
When I used iter() on an iterator, it returned itself!
>>> id(iterator)
139788446566216
>>> id(iterator_of_iterator)
139788446566216
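The consequence for nested loops can be checked directly. A small sketch, using list comprehensions in place of the nested for loops shown earlier (variable names are illustrative):

```python
my_list = ['a', 'b', 'c']

# Both loops draw from the same iterator, so the inner loop uses up
# everything that the outer loop has not taken yet.
it = iter(my_list)
shared_pairs = [(c1, c2) for c1 in it for c2 in it]

# Looping over the list itself creates a fresh iterator for every loop.
fresh_pairs = [(c1, c2) for c1 in my_list for c2 in my_list]

iterator_returns_self = iter(it) is it
list_returns_new = iter(my_list) is not iter(my_list)
```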
The example you gave is an "iterator-based for-loop"
n is called the loop variable.
The role that list plays is more troublesome to name.
Indeed, after an interesting conversation with @juanpa.arrivillaga I've concluded that there simply isn't a "clearly correct formal name", nor a commonly used name, for that syntactic element.
That being said, I do think that if you referred to it in context in a sentence as "the loop iterator" everyone would know what you meant.
In doing so, you take the risk of confusing yourself or someone else with the fact that the syntactic element in that position is not in fact an iterator; it's a collection or (loosely, but from the definition in the referenced article) an "iterable of some sort".
I suspect that one reason why there isn't a name for this is that we hardly ever have to refer to it in a sentence. Another is that the types of element that can appear in that position vary widely, so it is hard to safely cover them all with a label.

Why have an __iter__ method in Python?

Why have an __iter__ method? If an object is an iterator, then it is pointless to have a method which returns itself. If it is not an iterator but is instead an iterable, i.e. something with an __iter__ and __getitem__ method, then why would one ever want to define something which returns an iterator but is not an iterator itself? In Python, when would one want to define an iterable that is not itself an iterator? Or, what is an example of something that is an iterable but not an iterator?
Trying to answer your questions one at a time:
Why have an __iter__ method? If an object is an iterator, then it is pointless to have a method which returns itself.
It's not pointless. The iterator protocol demands an __iter__ and __next__ (or next in Python 2) method. All sane iterators I have ever seen just return self in their __iter__ method, but it is still crucial to have that method. Not having it would lead to all kinds of weirdness, for example:
somelist = [1, 2, 3]
it = iter(somelist)
now
iter(it)
or
for x in it: pass
would throw a TypeError and complain that it is not iterable, because when iter(x) is called (which implicitly happens when you employ a for loop) it expects the argument object x to be able to produce an iterator (it just tries to call __iter__ on that object). Concrete example (Python 3):
>>> class A:
...     def __iter__(self):
...         return B()
...
>>> class B:
...     def __next__(self):
...         pass
...
>>> iter(iter(A()))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'B' object is not iterable
Consider any function, especially those from itertools, that expects an iterable, for example dropwhile. Calling it with any object that has an __iter__ method will be fine, regardless of whether it's an iterable that is not an iterator, or an iterator, because you can expect the same result when calling iter with that object as an argument. Making a weird distinction between two kinds of iterables here would go against the principle of duck typing, which Python strongly embraces.
Neat tricks like
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(zip(*[iter(a)]*3))
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
would just stop working if you could not pass iterators to zip.
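The trick works because [iter(a)]*3 is a list holding the same iterator object three times, so zip pulls three consecutive values from it for each tuple. A spelled-out sketch of the same idea (variable names are illustrative):

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

it = iter(a)
same_iterator_three_times = [it] * 3   # three references, one object
chunks = list(zip(*same_iterator_three_times))

# Passing the list itself three times gives zip three independent
# iterators, so every tuple just repeats one element.
not_chunks = list(zip(a, a, a))[:2]
```

The trailing 10 is dropped because zip stops as soon as one of its arguments runs out, and here all three arguments run out together.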
why would one want to ever define something which returns an iterator but is not an iterator itself
Let's consider this simple list iterator:
>>> class MyList(list):
...     def __iter__(self):
...         return MyListIterator(self)
...
>>> class MyListIterator:
...     def __init__(self, lst):
...         self._lst = lst
...         self.index = 0
...     def __iter__(self):
...         return self
...     def __next__(self):
...         try:
...             n = self._lst[self.index]
...             self.index += 1
...             return n
...         except IndexError:
...             raise StopIteration
...
>>> a = MyList([1,2,3])
>>> for x in a:
...     for x in a:
...         x
...
1
2
3
1
2
3
1
2
3
Remember that iter is called with the iterable in question for both for loops, expecting a fresh iterator each time from the object's __iter__ method.
Now, without an iterator being produced each time a for loop is employed, how would you be able to keep track of the current state of any iteration when a MyList object is iterated over an arbitrary number of times at the same time? Oh, that's right, you can't. :)
edit: Bonus and sort of a reply to Tadhg McDonald-Jensen's comment
A reusable iterator is not unthinkable, but of course a bit weird because it would rely on being initialized with a "non-consumable" iterable (i.e. not a classic iterator):
>>> class riter(object):
...     def __init__(self, iterable):
...         self.iterable = iterable
...         self.it = iter(iterable)
...     def __next__(self): # python 2: next
...         try:
...             return next(self.it)
...         except StopIteration:
...             self.it = iter(self.iterable)
...             raise
...     def __iter__(self):
...         return self
...
>>>
>>> a = [1, 2, 3]
>>> it = riter(a)
>>> for x in it:
...     x
...
1
2
3
>>> for x in it:
...     x
...
1
2
3
An iterable is something that can be iterated (looped) over, whereas an iterator is something that is consumed.
what is an example of something that is an iterable but not an iterator?
Simple, a list. Or any sequence, since you can iterate over a list as many times as you want without destruction to the list:
>>> a = [1,2,3]
>>> for i in a:
...     print(i,end=" ")
...
1 2 3
>>> for i in a:
...     print(i,end=" ")
...
1 2 3
Whereas an iterator (like a generator) can only be used once:
>>> b = (i for i in range(3))
>>> for i in b:
...     print(i,end=" ")
...
0 1 2
>>> for i in b:
...     print(i,end=" ")
...
>>> #iterator has already been used up, nothing gets printed
For a list to be consumed like an iterator you would need to use something like self.pop(0) to remove the first element of the list for iteration:
class IteratorList(list):
    def __iter__(self):
        return self  # since the current mechanics require this
    def __next__(self):
        try:
            return self.pop(0)
        except IndexError:  # we need to raise the expected kind of error
            raise StopIteration
    next = __next__  # for compatibility with python 2
a = IteratorList([1,2,3,4,5])
for i in a:
    print(i)
    if i==3:   # lets stop at three and
        break  # see what the list is after
print(a)
which gives this output:
1
2
3
[4, 5]
You see? This is what iterators do: once a value is returned from __next__, it has no reason to hang around in the iterator or in memory, so it is removed. That's why we need __iter__: to define iterators that let us iterate over sequences without destroying them in the process.
In response to @timgeb's comment, I suppose if you added items to an IteratorList and then iterated over it again, that would make sense:
a = IteratorList([1,2,3,4,5])
for i in a:
    print(i)
a.extend([6,7,8,9])
for i in a:
    print(i)
But all iterators only make sense to either be consumed or never end. (like itertools.repeat)
You are thinking in the wrong direction. The reason why an iterator has to implement __iter__ is that this way, both containers and iterators can be used in for and in statements.
> # list is a container
> list = [1,2,3]
> dir(list)
[...,
'__iter__',
'__getitem__',
...]
> # let's get its iterator
> it = iter(list)
> dir(it)
[...,
'__iter__',
'__next__',
...]
> # you can use the container directly:
> for i in list:
>     print(i)
1
2
3
> # you can also use the iterator directly:
> for i in it:
>     print(i)
1
2
3
> # the above will fail if it does not implement '__iter__'
And that is also why you simply need to return self in almost all implementations of an iterator. It is not meant for anything funky, just a bit of syntactic convenience.
Ref: https://docs.python.org/dev/library/stdtypes.html#iterator-types

What is the purpose of __iter__ returning the iterator object itself?

I don't understand exactly why the __iter__ special method just returns the object it's called on (if it's called on an iterator). Is it essentially just a flag indicating that the object is an iterator?
EDIT: Actually, I discovered that "This is required to allow both containers and iterators to be used with the for and in statements." https://docs.python.org/3/library/stdtypes.html#iterator.iter
Alright, here's how I understand it: When writing a for loop, you're allowed to specify either an iterable or an iterator to loop over. But Python ultimately needs an iterator for the loop, so it calls the __iter__ method on whatever it's given. If it's been given an iterable, the __iter__ method will produce an iterator, and if it's been given an iterator, the __iter__ method will likewise produce an iterator (the original object given).
When you loop over something using for x in something, then the loop actually calls iter(something) first, so it has something to work with. In general, the for loop is approximately equivalent to something like this:
something_iterator = iter(something)
while True:
    try:
        x = next(something_iterator)
        # loop body
    except StopIteration:
        break
So as you already figured out yourself, in order to be able to loop over an iterator, i.e. when something is already an iterator, iterators should always return themselves when calling iter() on them. So this basically makes sure that iterators are also iterable.
This depends what object you call iter on. If an object is already an iterator, then there is no operation required to convert it to an iterator, because it already is one. But if the object is not an iterator, but is iterable, then an iterator is constructed from the object.
A good example of this is the list object:
>>> x = [1, 2, 3]
>>> iter(x) == x
False
>>> iter(x)
<list_iterator object at 0x7fccadc5feb8>
>>> x
[1, 2, 3]
Lists are iterable, but they are not themselves iterators. The result of list.__iter__ is not the original list.
In Python, whenever you use a loop or otherwise iterate over an object, the following happens.
Let's try to understand it for a list object.
>>> l = [1, 2, 3] # Defined list l
If we iterate over the above list:
>>> for i in l:
...     print(i)
...
1
2
3
When you iterate over the list l, Python's for loop calls l.__iter__(), which in turn returns an iterator object.
>>> for i in l.__iter__():
...     print(i)
...
1
2
3
To understand this better, let's customize the list and create a new list class.
>>> class ListOverride(list):
...     def __iter__(self):
...         raise TypeError('Not iterable')
...
Here I've created a ListOverride class which inherits from list and overrides the list.__iter__ method to raise TypeError.
>>> ll = ListOverride([1, 2, 3])
>>> ll
[1, 2, 3]
I've created a new list using the ListOverride class, and since it is a list object, one might expect it to iterate in the same way a list does.
>>> for i in ll:
...     print(i)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __iter__
TypeError: Not iterable
If we try to iterate over the ListOverride object ll, we end up getting the TypeError raised in __iter__, which shows that the for loop calls __iter__.
