When to use __iter__() vs iter()? - python

Let's say I have a class which implements an __iter__() method. Is it preferred to use iter(obj) or to call obj.__iter__() directly? Are there any real differences besides iter(obj) being a few characters shorter?
In contrast, for next() and __next__() I can see an advantage: the built-in next() accepts a default value.

The difference is mostly just convenience: iter(obj) is less typing and has fewer symbols to read, so it is faster to read. However, the various built-in functions (e.g. iter, len et al.) usually do a little type checking to catch errors early. If you wrote a custom __iter__ method and it returned 2, then invoking obj.__iter__() wouldn't catch that, but iter(obj) raises a TypeError. E.g.:
>>> class X:
...     def __iter__(self):
...         return 2
...
>>> x = X()
>>> x.__iter__()
2
>>> iter(x)
Traceback (most recent call last):
  File "<pyshell#37>", line 1, in <module>
    iter(x)
TypeError: iter() returned non-iterator of type 'int'
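The check that iter() performs can be approximated in pure Python. This is a minimal sketch, not CPython's real implementation: checked_iter is a hypothetical name, and the sketch leaves out the __getitem__ fallback that iter() also supports.

```python
def checked_iter(obj):
    """Rough pure-Python approximation of one-argument iter(obj)."""
    # Special methods are looked up on the type, not on the instance.
    method = getattr(type(obj), "__iter__", None)
    if method is None:
        raise TypeError(f"'{type(obj).__name__}' object is not iterable")
    result = method(obj)
    # The extra safety net: whatever __iter__ returned must be an iterator.
    if not hasattr(type(result), "__next__"):
        raise TypeError(
            f"iter() returned non-iterator of type '{type(result).__name__}'"
        )
    return result


class X:
    def __iter__(self):
        return 2


print(list(checked_iter([1, 2, 3])))  # [1, 2, 3]
try:
    checked_iter(X())
except TypeError as exc:
    print(exc)  # iter() returned non-iterator of type 'int'
```

Calling X().__iter__() directly would simply return 2; only the wrapper notices the mistake.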
iter also implements the iterator protocol for objects that have no __iter__ but do implement the sequence protocol, that is, objects with a __getitem__ method that accepts integer indexes starting at 0 and raises an IndexError for indexes that are out of bounds. This is an older feature of Python and not really something new code should be relying on. E.g.:
>>> class Y:
...     def __getitem__(self, index):
...         if 0 <= index < 5:
...             return index ** 2
...         else:
...             raise IndexError(index)
...
>>> list(iter(Y()))  # iter not strictly needed here
[0, 1, 4, 9, 16]
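The iterator that iter() builds for such a sequence behaves roughly like the class below. SequenceIterator is a hypothetical name used only for illustration; CPython implements this fallback in C, so treat this as a sketch of the behavior rather than the actual code.

```python
class SequenceIterator:
    """Hypothetical stand-in for the iterator iter() builds from __getitem__."""

    def __init__(self, seq):
        self._seq = seq
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._seq is None:  # already exhausted: keep raising StopIteration
            raise StopIteration
        try:
            item = self._seq[self._index]
        except IndexError:
            self._seq = None  # drop the reference once the sequence runs out
            raise StopIteration from None
        self._index += 1
        return item


class Y:
    def __getitem__(self, index):
        if 0 <= index < 5:
            return index ** 2
        raise IndexError(index)


print(list(SequenceIterator(Y())))  # [0, 1, 4, 9, 16]
```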
When should you call __iter__ directly? This is not so specific to __iter__, but if you need access to the parent class's implementation of a special method, it is best to invoke it in the style super().__<dunder_method>__() (using Python 3 style super). E.g.:
>>> class BizzareList(list):
...     def __iter__(self):
...         for item in super().__iter__():
...             yield item * 10
...
>>> l = BizzareList(range(5))
>>> l  # normal access
[0, 1, 2, 3, 4]
>>> l[0]  # also normal access
0
>>> tuple(iter(l))  # iter not strictly needed here
(0, 10, 20, 30, 40)
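One more difference, beyond the type check: iter(), like the other built-ins, looks the special method up on the object's type, whereas obj.__iter__() is an ordinary attribute lookup that also finds instance attributes. The class Z below is a hypothetical illustration; assigning dunder methods to instances is not something real code should do.

```python
class Z:
    pass


z = Z()
# Attach __iter__ to the *instance*, not to the class.
z.__iter__ = lambda: iter([1, 2, 3])

print(list(z.__iter__()))  # plain attribute lookup finds it: [1, 2, 3]
try:
    iter(z)  # iter() consults type(z), which defines no __iter__
except TypeError as exc:
    print(exc)  # 'Z' object is not iterable
```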

The iter() function (which in turn calls the __iter__() method) is a Python built-in function that returns an iterator from an iterable. You can use either iter(obj) or obj.__iter__(); they do essentially the same thing. It's just that __iter__() is a user-defined method, and calling it will do exactly what the written code says. Calling the iter() function, on the other hand, will in turn call the user-defined __iter__() method and run the same code, but iter() also performs a built-in type check: if the customized __iter__() doesn't return an iterator object, it raises an error.
*The same thing applies to next() & __next__().
Thus, iter(obj) ~= obj.__iter__(),
next(obj) ~= obj.__next__()
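That equivalence is only approximate. One practical difference, the one the question alludes to, is that the built-in next() accepts an optional default that is returned instead of raising StopIteration, while __next__() takes no such argument:

```python
it = iter([1, 2])

print(next(it))          # 1
print(it.__next__())     # 2
print(next(it, "done"))  # the built-in returns the default: done

try:
    it.__next__()        # the dunder has no default parameter
except StopIteration:
    print("StopIteration raised")
```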
Note that __iter__() and __next__() must both be defined by the user when creating a class if the object is intended to be an iterator.
Source: https://www.programiz.com/python-programming/iterator

Related

Iter returning self as a placeholder

I have seen a lot of examples where the __iter__ method returns self, and I've done my own example:
class Transactions:
    def __init__(self):
        self.t = [1,2,9,12.00]
        self.idx = 0
    def __iter__(self):
        return self
    def __next__(self):
        pos = self.idx
        self.idx += 1
        try:
            return self.t[pos]
        except IndexError:
            raise StopIteration
>>> list(iter(Transactions()))
[1, 2, 9, 12.0]
How does "returning self" make the object iterable? What exactly does that do?
What you are making is both an Iterator and an Iterable. The instances of the Transactions class have both __iter__, which makes them Iterables, and __next__, which makes them Iterators.
So when you return self from __iter__, the object you hand back is an Iterable (it has __iter__). Since calling __iter__ must return an Iterator, you have also defined __next__ on that same instance, so that it behaves like one.
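A quick way to see what "returning self" does: iter() on such an object gives back the object itself, so every consumer shares one position. A minimal sketch, reusing the Transactions class from the question:

```python
class Transactions:
    def __init__(self):
        self.t = [1, 2, 9, 12.00]
        self.idx = 0

    def __iter__(self):
        return self  # the instance is its own iterator

    def __next__(self):
        pos = self.idx
        self.idx += 1
        try:
            return self.t[pos]
        except IndexError:
            raise StopIteration


t = Transactions()
print(iter(t) is t)  # True: iter() hands back the very same object
print(next(t))       # 1
print(list(t))       # [2, 9, 12.0], because the position is shared state
```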
This is highlighted in the documentation:
Iterable
An object capable of returning its members one at a time. Examples of
iterables include all sequence types (such as list, str, and tuple)
and some non-sequence types like dict, file objects, and objects of
any classes you define with an __iter__() method or with a
__getitem__() method that implements Sequence semantics.
Iterables can be used in a for loop and in many other places where a sequence is needed (zip(), map(), …). When an iterable object
is passed as an argument to the built-in function iter(), it returns
an iterator for the object. This iterator is good for one pass over
the set of values. When using iterables, it is usually not necessary
to call iter() or deal with iterator objects yourself. The for
statement does that automatically for you, creating a temporary
unnamed variable to hold the iterator for the duration of the loop.
See also iterator, sequence, and generator.
Iterator
An object representing a stream of data. Repeated calls to the
iterator’s __next__() method (or passing it to the built-in function
next()) return successive items in the stream. When no more data are
available a StopIteration exception is raised instead. At this point,
the iterator object is exhausted and any further calls to its
__next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object
itself so every iterator is also iterable and may be used in most
places where other iterables are accepted. One notable exception is
code which attempts multiple iteration passes. A container object
(such as a list) produces a fresh new iterator each time you pass it
to the iter() function or use it in a for loop. Attempting this with
an iterator will just return the same exhausted iterator object used
in the previous iteration pass, making it appear like an empty
container.
A nice answer was given; I'd like to show a practical example.
The Transactions class defined in the question is an iterator. It can be iterated over just once.
While the typical iteration has the form for x in ...:, let's continue to use the shorter list() for demonstration:
>>> t=Transactions()
>>> list(t)
[1, 2, 9, 12.0]
>>> list(t)
[]
>>> list(t)
[]
For a real class, this is not what people expect. In order to iterate over the data every time, a new iterator must be created for each iteration, making the original class an iterable:
class TransactionsIterator:
    def __init__(self, t):
        self.t = t
        self.idx = 0
    def __iter__(self):
        return self
    def __next__(self):
        pos = self.idx
        self.idx += 1
        try:
            return self.t[pos]
        except IndexError:
            raise StopIteration
class Transactions:
    def __init__(self):
        self.t = [1,2,9,12.00]
    def __iter__(self):
        return TransactionsIterator(self.t)
This behaves as other classes usually do:
>>> t=Transactions()
>>> list(t)
[1, 2, 9, 12.0]
>>> list(t)
[1, 2, 9, 12.0]
>>> list(t)
[1, 2, 9, 12.0]
And finally, you don't have to reinvent a list iterator; this is all you need:
class Transactions:
    def __init__(self):
        self.t = [1,2,9,12.00]
    def __iter__(self):
        return iter(self.t)
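Another common option, not mentioned in the answer, is to write __iter__ as a generator function. Each call to iter() then creates a fresh generator object, which is an iterator, so the class stays re-iterable without a separate iterator class. A sketch under that assumption:

```python
class Transactions:
    def __init__(self):
        self.t = [1, 2, 9, 12.00]

    def __iter__(self):
        # Because of "yield", each call to __iter__ returns a new generator
        # (an iterator), so every loop starts from the beginning.
        for item in self.t:
            yield item


t = Transactions()
print(list(t))  # [1, 2, 9, 12.0]
print(list(t))  # [1, 2, 9, 12.0]
```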
Back to the question. We can iterate over data once with an iterator and every time with an iterable. The whole point of iterators returning self in __iter__ is that the iteration code does not have to distinguish between these two cases.

The Iterator Protocol. Is it Dark Magic?

So I've been writing iterators for a while, and I thought that I understood them. But I've been struggling with some issues tonight, and the more I play with it, the more confused I become.
I thought that for an iterator you had to implement __iter__ and next (or __next__), and that when you first tried to iterate over the iterator the __iter__ method would be called, and then next would be called until a StopIteration was raised.
When I run this code though
class Iter(object):
    def __iter__(self):
        return iter([2, 4, 6])
    def next(self):
        for y in [1, 2, 3]:
            return y
iterable = Iter()
for x in iterable:
    print(x)
The output is 2 4 6. So __iter__ is being called, but not next. That does seem to match with the documentation that I found here. But then that raises a whole bunch more questions in my mind.
Specifically, what's the difference between a container type and iterator if it's not the implementation of next? How do I know before hand which way my class is going to be treated? And most importantly, if I want to write a class where my next method is called when I use for x in Iter(), how can I do that?
A list is iterable, but it is not an iterator. Compare and contrast:
>>> type([])
list
>>> type(iter([]))
list_iterator
Calling iter on a list creates and returns a new iterator object for iterating the contents of that list.
In your object, you just return a list iterator, specifically an iterator over the list [2, 4, 6], so that object knows nothing about yielding elements 1, 2, 3.
def __iter__(self):
    return iter([2, 4, 6])  # <-- you're returning the list iterator, not your own
Here's a more fundamental implementation conforming to the iterator protocol in Python 2, which doesn't confuse matters by relying on list iterators, generators, or anything fancy at all.
class Iter(object):
    def __iter__(self):
        self.val = 0
        return self
    def next(self):
        self.val += 1
        if self.val > 3:
            raise StopIteration
        return self.val
According to the documentation you link to, an iterator's __iter__ method is supposed to return itself. So your iterator isn't really an iterator at all: when for invokes __iter__ to get the iterator, you're giving it iter([2,4,6]), but you should be giving it self.
(Also, I don't think your next method does what you intend: it returns 1 every time it's called, and never raises StopIteration. So if you fixed __iter__, then your iterator would be an iterator over an infinite stream of ones, rather than over the finite list [1, 2, 3]. But that's a side-issue.)
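In Python 3 the iterator method is named __next__ rather than next. A method called next is never consulted there, so the Python 2 class above would fail in Python 3 with a TypeError from iter() ("returned non-iterator"). A sketch of the same class ported to Python 3, which yields 1, 2, 3 and then stops, as the question intended:

```python
class Iter:
    """Python 3 spelling: the method is __next__, not next."""

    def __iter__(self):
        self.val = 0
        return self  # the object itself is the iterator

    def __next__(self):
        self.val += 1
        if self.val > 3:
            raise StopIteration
        return self.val


for x in Iter():
    print(x)  # prints 1, then 2, then 3
```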

What do you call the item of list when used as an iterator in a for loop?

I'm not sure how to name the n in the following for loop. Is there a term for it?
for n in [1,2,3,4,5]:
    print(n)
And, am I correct that the list itself is the iterator of the for loop ?
While n is called the loop variable, the list is absolutely not an iterator. It is an iterable object, i.e. an iterable, but it is not an iterator. An iterable may be an iterator itself, but not always. That is to say, iterators are iterable, but not all iterables are iterators. A list is simply an iterable.
It is an iterable because it implements an __iter__ method, which returns an iterator:
From the Python Glossary an iterable is:
An object capable of returning its members one at a time. Examples of
iterables include all sequence types (such as list, str, and tuple)
and some non-sequence types like dict, file objects, and objects of
any classes you define with an __iter__() or __getitem__() method.
Iterables can be used in a for loop and in many other places where a
sequence is needed (zip(), map(), ...). When an iterable object is
passed as an argument to the built-in function iter(), it returns an
iterator for the object. This iterator is good for one pass over the
set of values. When using iterables, it is usually not necessary to
call iter() or deal with iterator objects yourself. The for statement
does that automatically for you, creating a temporary unnamed variable
to hold the iterator for the duration of the loop.
So, observe:
>>> x = [1,2,3]
>>> iterator = iter(x)
>>> type(iterator)
<class 'list_iterator'>
>>> next(iterator)
1
>>> next(iterator)
2
>>> next(iterator)
3
>>> next(iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
It is illuminating to understand that a for-loop in Python such as the following:
for n in some_iterable:
    # do something
is equivalent to:
iterator = iter(some_iterable)
while True:
    try:
        n = next(iterator)
    except StopIteration:
        break
    # do something
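The equivalence can be checked by running it. A minimal sketch, where collect_with_while is a hypothetical helper and appending to a list stands in for the loop body. The body sits after the try/except, so a StopIteration raised by the body itself is not mistaken for the end of the loop.

```python
def collect_with_while(some_iterable):
    """Hypothetical helper mimicking `for n in some_iterable: result.append(n)`."""
    result = []
    iterator = iter(some_iterable)
    while True:
        try:
            n = next(iterator)
        except StopIteration:
            break
        result.append(n)  # the loop body
    return result


print(collect_with_while([1, 2, 3]))  # [1, 2, 3]
print(collect_with_while("ab"))       # ['a', 'b']
```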
Iterators, which are returned by a call to an object's __iter__ method, also implement the __iter__ method (usually returning themselves), but they also implement a __next__ method. Thus, an easy way to check whether something is an iterator is to see if it implements __next__, for example by calling next() on it. A list is not an iterator:
>>> next(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
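For a programmatic check, the abstract base classes in collections.abc can be used with isinstance(). A short sketch; note the documented caveat that isinstance(obj, Iterable) does not detect classes that only iterate via __getitem__, so calling iter(obj) remains the only fully reliable test for iterability.

```python
from collections.abc import Iterable, Iterator

x = [1, 2, 3]
it = iter(x)

print(isinstance(x, Iterable), isinstance(x, Iterator))    # True False
print(isinstance(it, Iterable), isinstance(it, Iterator))  # True True
```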
Again, from the Python Glossary, an iterator is:
An object representing a stream of data. Repeated calls to the
iterator’s __next__() method (or passing it to the built-in function
next()) return successive items in the stream. When no more data are
available a StopIteration exception is raised instead. At this point,
the iterator object is exhausted and any further calls to its
__next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object
itself so every iterator is also iterable and may be used in most
places where other iterables are accepted. One notable exception is
code which attempts multiple iteration passes. A container object
(such as a list) produces a fresh new iterator each time you pass it
to the iter() function or use it in a for loop. Attempting this with
an iterator will just return the same exhausted iterator object used
in the previous iteration pass, making it appear like an empty
container.
I've illustrated the behavior of an iterator with the next function above, so now I want to concentrate on the last part of that definition, about multiple iteration passes.
Basically, an iterator can be used in the place of an iterable because iterators are always iterable. However, an iterator is good for only a single pass. So, if I use a non-iterator iterable, like a list, I can do stuff like this:
>>> my_list = ['a','b','c']
>>> for c in my_list:
... print(c)
...
a
b
c
And this:
>>> for c1 in my_list:
... for c2 in my_list:
... print(c1,c2)
...
a a
a b
a c
b a
b b
b c
c a
c b
c c
>>>
An iterator behaves almost in the same way, so I can still do this:
>>> it = iter(my_list)
>>> for c in it:
... print(c)
...
a
b
c
>>>
However, iterators do not support multiple iteration (well, you can make an iterator that does, but generally they do not):
>>> it = iter(my_list)
>>> for c1 in it:
... for c2 in it:
... print(c1,c2)
...
a b
a c
Why is that? Well, recall what is happening with the iterator protocol which is used by a for loop under the hood, and consider the following:
>>> my_list = ['a','b','c','d','e','f','g']
>>> iterator = iter(my_list)
>>> iterator_of_iterator = iter(iterator)
>>> next(iterator)
'a'
>>> next(iterator)
'b'
>>> next(iterator_of_iterator)
'c'
>>> next(iterator_of_iterator)
'd'
>>> next(iterator)
'e'
>>> next(iterator_of_iterator)
'f'
>>> next(iterator)
'g'
>>> next(iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> next(iterator_of_iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
When I used iter() on an iterator, it returned itself!
>>> id(iterator)
139788446566216
>>> id(iterator_of_iterator)
139788446566216
The example you gave is an "iterator-based for-loop"
n is called the loop variable.
The role that list plays is more troublesome to name.
Indeed, after an interesting conversation with @juanpa.arrivillaga I've concluded that there simply isn't a "clearly correct formal name", nor a commonly used name, for that syntactic element.
That being said, I do think that if you referred to it in context in a sentence as "the loop iterator" everyone would know what you meant.
In doing so, you take the risk of confusing yourself or someone else, because the syntactic element in that position is not in fact an iterator; it's a collection or (loosely, but following the definition in the referenced article) an "iterable of some sort".
I suspect that one reason why there isn't a name for this is that we hardly ever have to refer to it in a sentence. Another is that the types of element that can appear in that position vary widely, so it is hard to safely cover them all with a label.

What is the purpose of __iter__ returning the iterator object itself?

I don't understand exactly why the __iter__ special method just returns the object it's called on (if it's called on an iterator). Is it essentially just a flag indicating that the object is an iterator?
EDIT: Actually, I discovered that "This is required to allow both containers and iterators to be used with the for and in statements." https://docs.python.org/3/library/stdtypes.html#iterator.iter
Alright, here's how I understand it: When writing a for loop, you're allowed to specify either an iterable or an iterator to loop over. But Python ultimately needs an iterator for the loop, so it calls the __iter__ method on whatever it's given. If it's been given an iterable, the __iter__ method will produce an iterator, and if it's been given an iterator, the __iter__ method will likewise produce an iterator (the original object given).
When you loop over something using for x in something, then the loop actually calls iter(something) first, so it has something to work with. In general, the for loop is approximately equivalent to something like this:
something_iterator = iter(something)
while True:
    try:
        x = next(something_iterator)
    except StopIteration:
        break
    # loop body
So as you already figured out yourself, in order to be able to loop over an iterator, i.e. when something is already an iterator, iterators should always return themselves when calling iter() on them. So this basically makes sure that iterators are also iterable.
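The consequence of leaving __iter__ out can be seen directly. In this sketch CountDown and IterableCountDown are hypothetical classes: the first defines only __next__, so next() works on it, but a for loop, which starts by calling iter(), does not.

```python
class CountDown:
    """Hypothetical iterator that defines __next__ but not __iter__."""

    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


class IterableCountDown(CountDown):
    def __iter__(self):
        return self  # the missing piece: an iterator returns itself


print(next(CountDown(3)))  # next() alone works: 3
try:
    for n in CountDown(3):  # but a for loop first calls iter()
        pass
except TypeError as exc:
    print(exc)  # 'CountDown' object is not iterable
print(list(IterableCountDown(3)))  # [3, 2, 1]
```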
This depends what object you call iter on. If an object is already an iterator, then there is no operation required to convert it to an iterator, because it already is one. But if the object is not an iterator, but is iterable, then an iterator is constructed from the object.
A good example of this is the list object:
>>> x = [1, 2, 3]
>>> iter(x) == x
False
>>> iter(x)
<list_iterator object at 0x7fccadc5feb8>
>>> x
[1, 2, 3]
Lists are iterable, but they are not themselves iterators. The result of list.__iter__ is not the original list.
Whenever you use a loop in Python, or otherwise iterate over an object as below, the iterator protocol is involved.
Let's look at a list object:
>>> l = [1, 2, 3]  # Defined list l
If we iterate over the above list:
>>> for i in l:
...     print(i)
...
1
2
3
When you iterate over the list l, the Python for loop calls l.__iter__() (strictly speaking, iter(l), which looks __iter__ up on the list type), which in turn returns an iterator object.
>>> for i in l.__iter__():
...     print(i)
...
1
2
3
To understand this better, let's customize the list and create a new list class:
>>> class ListOverride(list):
...     def __iter__(self):
...         raise TypeError('Not iterable')
...
Here I've created a ListOverride class which inherits from list and overrides the list.__iter__ method to raise TypeError.
>>> ll = ListOverride([1, 2, 3])
>>> ll
[1, 2, 3]
And I've created a new list using the ListOverride class; since it's a list object, you might expect it to iterate in the same way a list does.
>>> for i in ll:
...     print(i)
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __iter__
TypeError: Not iterable
If we try to iterate over the ListOverride object ll, we end up getting the TypeError raised by our overridden __iter__.

How does iter() work, it's giving "TypeError: iter(v, w): v must be callable"

What is the problem with this code?
l = [1,2,3,4,5,6]
for val in iter(l, 4):
    print(val)
It returns
TypeError: iter(v, w): v must be callable
Why does callable(list) return True but callable(l) does not?
EDIT
What method should be preferred here:
manual breaks
hundred others
From iter help:
iter(...)
iter(collection) -> iterator
iter(callable, sentinel) -> iterator
Get an iterator from an object. In the first form, the argument must
supply its own iterator, or be a sequence.
In the second form, the callable is called until it returns the sentinel.
You are mixing the two variants of the iter function. The first accepts a collection; the second accepts two arguments, a function and a sentinel value. You're trying to pass a collection and a sentinel value, which is wrong.
Short note: you can get a lot of interesting info from python's built-in help function. Simply type in python's console help(iter) and you'll get documentation on it.
Why does callable(list) return True but callable(l) does not?
Because list is a class (a type object) that returns a new list object when called. A class is callable (calling it is how you create instances), while the instance it returns, the new list object, is not.
When called with two arguments, iter takes a callable and a sentinel value. Its behavior is as if it were implemented like this:
def iter2args(f, sentinel):
    value = f()
    while value != sentinel:
        yield value
        value = f()
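To compare that description with the real built-in, both versions can be fed the same kind of zero-argument callable. A sketch, where the bound __next__ method of an itertools.count object serves as the callable (an illustrative choice, not from the answer):

```python
import itertools


def iter2args(f, sentinel):
    """Generator mirroring the answer's description of iter(f, sentinel)."""
    value = f()
    while value != sentinel:
        yield value
        value = f()


# A zero-argument callable: successive calls return 0, 1, 2, ...
counter = itertools.count().__next__
print(list(iter2args(counter, 3)))  # [0, 1, 2]

counter = itertools.count().__next__
print(list(iter(counter, 3)))       # [0, 1, 2]
```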
What gets passed in as f must be callable, which just means that you can call it like a function. The list builtin is a type object, which you use to create new list instances, by calling it like a function:
>>> list('abcde')
['a', 'b', 'c', 'd', 'e']
The list l you passed in is an existing list instance, which can't be used like a function:
>>> l = [1,2,3,4,5,6]
>>> l(3)
Traceback (most recent call last):
File "<pyshell#20>", line 1, in <module>
l(3)
TypeError: 'list' object is not callable
Thus, there is a large and important difference between the list type object and list instances, which shows up when used with iter.
To iterate through a list until a sentinel is reached, you can use itertools.takewhile:
import itertools
for val in itertools.takewhile(lambda x: x != 4, l):  # predicate first, then the iterable
    print(val)
It has to do with the second value being passed (a so-called sentinel value): when it is present, iter() requires the first argument to be callable, i.e. something like a function.
On every iteration, the resulting iterator calls that object with no arguments, and iteration stops as soon as the call returns the sentinel.
iter() has two distinct behaviors,
without a sentinel value
with a sentinel value
The example in the documentation is great for understanding it
with open("mydata.txt") as fp:
    for line in iter(fp.readline, ""):  # fp.readline is a function here; it returns "" at end of file
        process_line(line)  # process_line stands for your own code
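The same pattern can be tried without a file on disk. In this sketch io.StringIO stands in for mydata.txt; readline() returns an empty string only at end of file, which is why "" works as the sentinel.

```python
import io

# io.StringIO stands in for a real file, so the sketch needs no disk access.
fp = io.StringIO("first\nsecond\nthird\n")

# Call fp.readline() repeatedly until it returns the sentinel "".
lines = [line.rstrip("\n") for line in iter(fp.readline, "")]
print(lines)  # ['first', 'second', 'third']
```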
Have a look at docs: http://docs.python.org/2/library/functions.html#iter
When second argument in iter is present then first argument is treated very differently. It is supposed to be a function which is called in each step. If it returns sentinel (i.e. the second argument), then the iteration stops. For example:
l = [1,2,3,4,5,6]
index = -1
def fn():
    global index
    index += 1
    return l[index]
for val in iter(fn, 4):
    print(val)
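The global counter is only there to turn the list into a zero-argument callable. A possibly tidier sketch (an alternative, not part of the original answer) uses the bound __next__ method of a list iterator as that callable:

```python
l = [1, 2, 3, 4, 5, 6]

# A list iterator's bound __next__ is a zero-argument callable that returns
# successive items, so no global index variable is needed.
vals = list(iter(iter(l).__next__, 4))
print(vals)  # [1, 2, 3]
```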
EDIT: If you want to just loop over a list and stop when you see a sentinel, then I recommend simply doing this:
for val in l:
    if val == 4:
        break
    # main body
Remember that classes are objects in Python.
>>> callable(list)
True
Means that list itself is callable, rather than instances of it being callable. As you've seen, they aren't:
>>> callable([])
False
In fact, all classes in Python are callable - if they don't have literals like list, this is the usual way to instantiate them. Consider:
class MyClass:
    pass
a = MyClass()
The last line calls the MyClass class object, which creates an instance - so, MyClass must be callable. But you wouldn't expect that instance to be callable, since MyClass itself doesn't define an __call__.
On the other hand, the class of MyClass (i.e. its metaclass, type) does:
>>> type.__call__
<slot wrapper '__call__' of 'type' objects>
which is what makes MyClass callable.
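The rule can be checked directly: a class is callable because its metaclass defines __call__, and instances become callable only if their own class defines __call__. CallableClass below is a hypothetical example.

```python
class MyClass:
    pass


class CallableClass:
    def __call__(self):
        return "called"


print(callable(MyClass), callable(MyClass()))  # True False
print(callable(CallableClass()))               # True: its class defines __call__
print(CallableClass()())                       # called
print(type(MyClass) is type)                   # True: the metaclass is type
```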
Why does callable(list) return True but callable(l) does not?
Because list is a built-in class (a type object), while l is a list instance.
Maybe you are looking for something like this?
>>> l = [1,2,3,4,5,6]
>>> l_it = iter(l)
>>> while True:
...     next_val = next(l_it, None)
...     if next_val is None:  # "if not next_val" would also stop on 0 or ""
...         break
...     print(next_val)
...
