Confusion about iterators and iterables in Python

I am currently reading in the official documentation of Python 3.5.
It states that range() is iterable, and that list() and for are iterators. [section 4.3]
However, here it states that zip() makes an iterator.
My question is that when we use this instruction:
list(zip(list1, list2))
are we using an iterator (list()) to iterate through another iterator?

The documentation is creating some confusion here, by re-using the term 'iterator'.
There are three components to the iterator protocol:
Iterables: things you can potentially iterate over to get their elements, one by one.
Iterators: things that do the iterating. Every time you want to step through all items of an iterable, you need one of these to keep track of where you are in the process. They are not re-usable; once you reach the end, that's it. For most iterables, you can create multiple independent iterators, each tracking position separately.
Consumers of iterators: the things that want to do something with the items.
A for loop is an example of the last of these, #3. A for loop uses the iter() function to produce an iterator (#2 above) for whatever you want to loop over, so that "whatever" must be an iterable (#1 above).
range() is an example of #1; it is an iterable object. You can iterate over it multiple times, independently:
>>> r = range(5)
>>> r_iter_1 = iter(r)
>>> next(r_iter_1)
0
>>> next(r_iter_1)
1
>>> r_iter_2 = iter(r)
>>> next(r_iter_2)
0
>>> next(r_iter_1)
2
Here r_iter_1 and r_iter_2 are two separate iterators, and each time you ask for a next item they do so based on their own internal bookkeeping.
list() is an example of both an iterable (#1) and an iteration consumer (#3). If you pass another iterable (#1) to the list() call, a list object is produced containing all elements from that iterable. But list objects are themselves also iterables.
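A quick illustration of that dual role (the variable names here are arbitrary):
>>> numbers = list(range(3))    # list() consumes the range iterable (#3)
>>> numbers
[0, 1, 2]
>>> next(iter(numbers))         # and the list it builds is itself an iterable (#1)
0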
zip(), in Python 3, takes in multiple iterables (#1), and is itself an iterator (#2). zip() stores a new iterator (#2) for each of the iterables you gave it. Each time you ask zip() for the next element, zip() builds a new tuple with the next elements from each of the contained iterables:
>>> lst1, lst2 = ['foo', 'bar'], [42, 81]
>>> zipit = zip(lst1, lst2)
>>> next(zipit)
('foo', 42)
>>> next(zipit)
('bar', 81)
So in the end, list(zip(list1, list2)) uses both list1 and list2 as iterables (#1); zip() consumes them (#3) while it is itself being consumed by the outer list() call.
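You can see the single-use nature of the zip() iterator by trying to consume it twice (a fresh session with the same two lists):
>>> zipit = zip(['foo', 'bar'], [42, 81])
>>> list(zipit)                 # the outer list() call consumes the iterator
[('foo', 42), ('bar', 81)]
>>> list(zipit)                 # a second pass finds it already exhausted
[]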

The documentation is badly worded. Here's the section you're referring to:
We say such an object is iterable, that is, suitable as a target for functions and constructs that expect something from which they can obtain successive items until the supply is exhausted. We have seen that the for statement is such an iterator. The function list() is another; it creates lists from iterables:
In this paragraph, iterator does not refer to a Python iterator object, but the general idea of "something which iterates over something". In particular, the for statement cannot be an iterator object because it isn't an object at all; it's a language construct.
To answer your specific question:
... when we use this instruction:
list(zip(list1, list2))
are we using an iterator (list()) to iterate through another iterator?
No, list() is not an iterator. It's the constructor for the list type. It can accept any iterable (including an iterator) as an argument, and uses that iterable to construct a list.
zip() is an iterator function, that is, a function which returns an iterator. In your example, the iterator it returns is passed to list(), which constructs a list object from it.
A simple way to tell whether an object is an iterator is to call next() with it, and see what happens:
>>> list1 = [1, 2, 3]
>>> list2 = [4, 5, 6]
>>> zipped = zip(list1, list2)
>>> zipped
<zip object at 0x7f27d9899688>
>>> next(zipped)
(1, 4)
In this case, the next element of zipped is returned.
>>> list3 = list(zipped)
>>> list3
[(2, 5), (3, 6)]
Notice that only the last two elements of the iterator are found in list3, because we already consumed the first one with next().
>>> next(list3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
This doesn't work, because lists are not iterators.
>>> next(zipped)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
This time, although zipped is an iterator, calling next() with it raises StopIteration because it's already been exhausted to construct list3.
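If you would rather not consume an element just to test, Python 3's collections.abc module offers another check via its abstract base classes:
>>> import collections.abc
>>> isinstance(zip(list1, list2), collections.abc.Iterator)
True
>>> isinstance(list1, collections.abc.Iterator)
False
>>> isinstance(list1, collections.abc.Iterable)
True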

Related

What do you call the item of list when used as an iterator in a for loop?

I'm not sure what to call the n in the following for loop. Is there a term for it?
for n in [1,2,3,4,5]:
    print(n)
And am I correct that the list itself is the iterator of the for loop?
While n is called a loop variable, the list is absolutely not an iterator. It is an iterable object, i.e. an iterable, but it is not an iterator. An iterable may itself be an iterator, but not always. That is to say, iterators are iterable, but not all iterables are iterators. A list is simply an iterable.
It is an iterable because it implements an __iter__ method, which returns an iterator:
From the Python Glossary an iterable is:
An object capable of returning its members one at a time. Examples of
iterables include all sequence types (such as list, str, and tuple)
and some non-sequence types like dict, file objects, and objects of
any classes you define with an __iter__() or __getitem__() method.
Iterables can be used in a for loop and in many other places where a
sequence is needed (zip(), map(), ...). When an iterable object is
passed as an argument to the built-in function iter(), it returns an
iterator for the object. This iterator is good for one pass over the
set of values. When using iterables, it is usually not necessary to
call iter() or deal with iterator objects yourself. The for statement
does that automatically for you, creating a temporary unnamed variable
to hold the iterator for the duration of the loop.
So, observe:
>>> x = [1,2,3]
>>> iterator = iter(x)
>>> type(iterator)
<class 'list_iterator'>
>>> next(iterator)
1
>>> next(iterator)
2
>>> next(iterator)
3
>>> next(iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
It is illuminating to understand that a for-loop in Python such as the following:
for n in some_iterable:
    # do something
is equivalent to:
iterator = iter(some_iterable)
while True:
    try:
        n = next(iterator)
        # do something
    except StopIteration:
        break
Iterators, which are returned by a call to an object's __iter__ method, implement the __iter__ method themselves (usually returning themselves), but they also implement a __next__ method. Thus, an easy way to check whether something is an iterator is to see if it implements a __next__ method:
>>> next(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
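Equivalently, you can look for the method itself without calling next() (reusing the x list from above):
>>> hasattr(x, '__next__')        # lists have no __next__, so they are not iterators
False
>>> hasattr(iter(x), '__next__')  # but the iterator that iter(x) returns does
True
>>> hasattr(x, '__iter__')        # both are iterable
True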
Again, from the Python Glossary, an iterator is:
An object representing a stream of data. Repeated calls to the
iterator’s __next__() method (or passing it to the built-in function
next()) return successive items in the stream. When no more data are
available a StopIteration exception is raised instead. At this point,
the iterator object is exhausted and any further calls to its
__next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object
itself so every iterator is also iterable and may be used in most
places where other iterables are accepted. One notable exception is
code which attempts multiple iteration passes. A container object
(such as a list) produces a fresh new iterator each time you pass it
to the iter() function or use it in a for loop. Attempting this with
an iterator will just return the same exhausted iterator object used
in the previous iteration pass, making it appear like an empty
container.
I've illustrated the behavior of an iterator with the next() function above, so now I want to concentrate on the last part of that quote, about multiple iteration passes.
Basically, an iterator can be used in the place of an iterable because iterators are always iterable. However, an iterator is good for only a single pass. So, if I use a non-iterator iterable, like a list, I can do stuff like this:
>>> my_list = ['a','b','c']
>>> for c in my_list:
...     print(c)
...
a
b
c
And this:
>>> for c1 in my_list:
...     for c2 in my_list:
...         print(c1,c2)
...
a a
a b
a c
b a
b b
b c
c a
c b
c c
>>>
An iterator behaves almost in the same way, so I can still do this:
>>> it = iter(my_list)
>>> for c in it:
...     print(c)
...
a
b
c
>>>
However, iterators do not support multiple iteration passes (well, you could write an iterator that does, but generally they do not):
>>> it = iter(my_list)
>>> for c1 in it:
...     for c2 in it:
...         print(c1,c2)
...
a b
a c
Why is that? Well, recall what is happening with the iterator protocol which is used by a for loop under the hood, and consider the following:
>>> my_list = ['a','b','c','d','e','f','g']
>>> iterator = iter(my_list)
>>> iterator_of_iterator = iter(iterator)
>>> next(iterator)
'a'
>>> next(iterator)
'b'
>>> next(iterator_of_iterator)
'c'
>>> next(iterator_of_iterator)
'd'
>>> next(iterator)
'e'
>>> next(iterator_of_iterator)
'f'
>>> next(iterator)
'g'
>>> next(iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> next(iterator_of_iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
When I used iter() on an iterator, it returned itself!
>>> id(iterator)
139788446566216
>>> id(iterator_of_iterator)
139788446566216
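The same behaviour is easy to reproduce in a minimal hand-written iterator; the Countdown class below is made up purely to illustrate the protocol:
>>> class Countdown:
...     def __init__(self, start):
...         self.current = start
...     def __iter__(self):
...         return self              # an iterator returns itself
...     def __next__(self):
...         if self.current <= 0:
...             raise StopIteration
...         self.current -= 1
...         return self.current + 1
...
>>> cd = Countdown(3)
>>> iter(cd) is cd
True
>>> list(cd)
[3, 2, 1]
>>> list(cd)                         # exhausted, just like the shared iterator above
[]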
The example you gave is an "iterator-based for-loop".
n is called the loop variable.
The role that list plays is more troublesome to name.
Indeed, after an interesting conversation with @juanpa.arrivillaga, I've concluded that there simply isn't a "clearly correct formal name", nor a commonly used name, for that syntactic element.
That being said, I do think that if you referred to it in context in a sentence as "the loop iterator", everyone would know what you meant.
In doing so, you take the risk of confusing yourself or someone else with the fact that the syntactic element in that position is not in fact an iterator; it's a collection or (loosely, following the definition in the referenced article) an "iterable of some sort".
I suspect that one reason why there isn't a name for this is that we hardly ever have to refer to it in a sentence. Another is that the types of element that can appear in that position vary widely, so it is hard to safely cover them all with a single label.

What is the purpose of __iter__ returning the iterator object itself?

I don't understand exactly why the __iter__ special method just returns the object it's called on (if it's called on an iterator). Is it essentially just a flag indicating that the object is an iterator?
EDIT: Actually, I discovered that "This is required to allow both containers and iterators to be used with the for and in statements." https://docs.python.org/3/library/stdtypes.html#iterator.iter
Alright, here's how I understand it: When writing a for loop, you're allowed to specify either an iterable or an iterator to loop over. But Python ultimately needs an iterator for the loop, so it calls the __iter__ method on whatever it's given. If it's been given an iterable, the __iter__ method will produce an iterator, and if it's been given an iterator, the __iter__ method will likewise produce an iterator (the original object given).
When you loop over something using for x in something, then the loop actually calls iter(something) first, so it has something to work with. In general, the for loop is approximately equivalent to something like this:
something_iterator = iter(something)
while True:
    try:
        x = next(something_iterator)
        # loop body
    except StopIteration:
        break
So, as you already figured out yourself, to make it possible to loop over an iterator (i.e. when something is already an iterator), an iterator should simply return itself when you call iter() on it. This basically makes sure that iterators are also iterable.
This depends on what object you call iter() on. If an object is already an iterator, then no operation is required to convert it to an iterator, because it already is one. But if the object is not an iterator, yet is iterable, then an iterator is constructed from the object.
A good example of this is the list object:
>>> x = [1, 2, 3]
>>> iter(x) == x
False
>>> iter(x)
<list_iterator object at 0x7fccadc5feb8>
>>> x
[1, 2, 3]
Lists are iterable, but they are not themselves iterators. The result of list.__iter__ is not the original list.
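To contrast, here is a quick check with a generator (the name g is arbitrary): a generator is already an iterator, so iter() simply hands back the same object:
>>> g = (i for i in range(3))
>>> iter(g) is g
True
>>> next(g)
0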
In Python, whenever you use a loop, you iterate over some object, as below.
Let's try to understand this for a list object.
>>> l = [1, 2, 3] # Defined list l
If we iterate over the above list:
>>> for i in l:
...     print(i)
...
1
2
3
When you iterate over the list l, the Python for loop calls l.__iter__(), which in turn returns an iterator object.
>>> for i in l.__iter__():
...     print(i)
...
1
2
3
To understand this better, let's subclass list and create a new list class:
>>> class ListOverride(list):
...     def __iter__(self):
...         raise TypeError('Not iterable')
...
Here I've created a ListOverride class, which inherits from list and overrides the list.__iter__ method to raise a TypeError.
>>> ll = ListOverride([1, 2, 3])
>>> ll
[1, 2, 3]
I've created a new list using the ListOverride class, and since it's a list object you might expect it to iterate the same way a list does.
>>> for i in ll:
...     print(i)
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in __iter__
TypeError: Not iterable
If we try to iterate over the ListOverride object ll, we end up getting the TypeError raised by our __iter__ method.
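As a side note, the Python glossary quoted earlier also mentions __getitem__(): an object that defines no __iter__() at all can still be iterated over if it supports indexing from zero upwards and raises IndexError at the end. A small sketch (the Squares class is made up for illustration):
>>> class Squares:
...     def __getitem__(self, index):
...         if index >= 5:
...             raise IndexError     # tells the iteration machinery to stop
...         return index * index
...
>>> for n in Squares():
...     print(n)
...
0
1
4
9
16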

How to use Python generator expressions to create a one-liner to run a function multiple times and get a list output

I am wondering if there is a simple Pythonic way (maybe using generators) to run a function over each item in a list and collect the return values in a list.
Example:
def square_it(x):
    return x*x
x_set = [0,1,2,3,4]
squared_set = square_it(x for x in x_set)
I notice that when I do a line by line debug on this, the object that gets passed into the function is a generator.
Because of this, I get an error:
TypeError: unsupported operand type(s) for *: 'generator' and 'generator'
I understand that this generator expression created a generator to be passed into the function, but I am wondering if there is a cool way to accomplish running the function multiple times only by specifying an iterable as the argument? (without modifying the function to expect an iterable).
It seems to me that this ability would be really useful to cut down on lines of code, because you would not need to write a loop to run the function and a variable to save the output in a list.
Thanks!
You want a list comprehension:
squared_set = [square_it(x) for x in x_set]
There's a builtin function, map(), for this common problem.
>>> map(square_it, x_set)
[0, 1, 4, 9, 16]  # Python 2 output; on Python 3, map() returns a lazy iterator instead of a list.
Alternatively, one can use a generator expression, which is memory-efficient but lazy (meaning the values will not be computed now, only when needed):
>>> (square_it(x) for x in x_set)
<generator object <genexpr> at ...>
Similarly, one can also use a list comprehension, which computes all the values upon creation, returning a list.
Additionally, it is worth comparing generator expressions and list comprehensions side by side.
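As a quick sketch of that difference (assuming square_it and x_set exactly as defined in the question):
>>> [square_it(x) for x in x_set]         # eager: the whole list exists immediately
[0, 1, 4, 9, 16]
>>> lazy = (square_it(x) for x in x_set)  # lazy: nothing has been computed yet
>>> lazy
<generator object <genexpr> at 0x...>
>>> list(lazy)                            # values are produced only when consumed
[0, 1, 4, 9, 16]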
You want to call the square_it function inside the generator, not on the generator.
squared_set = (square_it(x) for x in x_set)
As the other answers have suggested, I think it is best (most "pythonic") to call your function explicitly on each element, using a list or generator comprehension.
To actually answer the question though, you can wrap your function that operates over scalars with a function that sniffs the input and behaves differently depending on what it sees. For example:
>>> import types
>>> def scaler_over_generator(f):
...     def wrapper(x):
...         if isinstance(x, types.GeneratorType):
...             return [f(i) for i in x]
...         return f(x)
...     return wrapper
>>> def square_it(x):
...     return x * x
>>> square_it_maybe_over = scaler_over_generator(square_it)
>>> square_it_maybe_over(10)
100
>>> square_it_maybe_over(x for x in range(5))
[0, 1, 4, 9, 16]
I wouldn't use this idiom in my code, but it is possible to do.
You could also code it up with a decorator, like so:
>>> @scaler_over_generator
... def square_it(x):
...     return x * x
>>> square_it(x for x in range(5))
[0, 1, 4, 9, 16]
If you didn't want/need a handle to the original function.
Note that there is a difference between a list comprehension, which returns a list:
squared_set = [square_it(x) for x in x_set]
and a generator expression, which returns a generator that you can iterate over:
squared_set = (square_it(x) for x in x_set)

How to identify a generator vs list comprehension

I have this:
>>> sum( i*i for i in xrange(5))
My question is: in this case, am I passing a list comprehension or a generator object to sum? How do I tell? Is there a general rule around this?
Also remember sum by itself needs a pair of parentheses to surround its arguments. I'd think that the parentheses above are for sum and not for creating a generator object. Wouldn't you agree?
You are passing in a generator expression.
A list comprehension is specified with square brackets ([...]). It builds a list object first, so it uses syntax closely related to the list literal syntax:
list_literal = [1, 2, 3]
list_comprehension = [i for i in range(4) if i > 0]
A generator expression, on the other hand, creates an iterator object. Only when iterating over that object is the contained loop executed and are items produced. The generator expression does not retain those items; there is no list object being built.
A generator expression always uses round parentheses (...), but when it is the only argument to a call, those parentheses can be omitted; the following two expressions are equivalent:
sum((i*i for i in xrange(5)))  # with parentheses
sum(i*i for i in xrange(5))    # without parentheses around the generator
Quoting from the generator expression documentation:
The parentheses can be omitted on calls with only one argument. See section Calls for the detail.
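With more than one argument, omitting them is a syntax error, so the generator expression needs its own set of parentheses. A quick illustration, using Python 3's range (xrange behaves the same way here on Python 2):
>>> sum(i*i for i in range(5))          # sole argument: the call's parentheses are enough
30
>>> sum((i*i for i in range(5)), 10)    # second argument (a start value): the genexpr must be parenthesized
40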
List comprehensions are enclosed in []:
>>> [i*i for i in xrange(5)] # list comprehension
[0, 1, 4, 9, 16]
>>> (i*i for i in xrange(5)) # generator
<generator object <genexpr> at 0x2cee40>
You are passing a generator.
That is a generator:
>>> (i*i for i in xrange(5))
<generator object <genexpr> at 0x01A27A08>
>>>
List comprehensions are enclosed in [].
You might also be asking, "does this syntax truly cause sum to consume a generator one item at a time, or does it secretly create a list of every item in the generator first"? One way to check this is to try it on a very large range and watch memory usage:
sum(i for i in xrange(int(1e8)))
Memory usage for this case is constant, whereas using range(int(1e8)) (which, in Python 2, creates the full list) consumes several hundred MB of RAM.
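If you do not want to wait for a hundred-million-element sum, you can also compare the objects' own sizes directly; a rough Python 3 sketch (exact byte counts vary by version and platform):
import sys

gen = (i*i for i in range(10**6))   # generator object: fixed, tiny size regardless of length
lst = [i*i for i in range(10**6)]   # list: stores a reference to every element up front
print(sys.getsizeof(gen))           # well under a kilobyte
print(sys.getsizeof(lst))           # several megabytes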
You can test that the parentheses are optional:
def print_it(obj):
    print obj

print_it(i for i in xrange(5))
# prints <generator object <genexpr> at 0x03853C60>
I tried this:
#!/usr/bin/env python
class myclass:
    def __init__(self, arg):
        self.p = arg
        print type(self.p)
        print self.p

if __name__ == '__main__':
    c = myclass(i*i for i in xrange(5))
And this prints:
$ ./genexprorlistcomp.py
<type 'generator'>
<generator object <genexpr> at 0x7f5344c7cf00>
Which is consistent with what Martin and mdscruggs explained in their post.
You are passing a generator object; a list comprehension is surrounded by [].

Python `for` syntax: block code vs single line generator expressions

I'm familiar with the for loop in a block-code context. eg:
for c in "word":
print c
I just came across some examples that use for differently. Rather than beginning with the for statement, they tag it at the end of an expression (and don't involve an indented code-block). eg:
sum(x*x for x in range(10))
Can anyone point me to some documentation that outlines this use of for? I've been able to find examples, but not explanations. All the for documentation I've been able to find describes the previous use (block-code example). I'm not even sure what to call this use, so I apologize if my question's title is unclear.
What you are pointing to is a generator expression in Python. Take a look at:
http://wiki.python.org/moin/Generators
http://www.python.org/dev/peps/pep-0255/
http://docs.python.org/whatsnew/2.5.html#pep-342-new-generator-features
See the documentation on Generator Expressions, which contains exactly the same example you have posted.
From the documentation:
Generators are a simple and powerful tool for creating iterators. They
are written like regular functions but use the yield statement
whenever they want to return data. Each time next() is called, the
generator resumes where it left-off (it remembers all the data values
and which statement was last executed)
Generator expressions are similar to list comprehensions, but use parentheses instead of square brackets and are more memory efficient. They don't return the complete list of results at once; they return a generator object. Whenever you invoke next() on the generator object, the generator runs up to the next yield to return the next value.
The list comprehension for the above code would look like:
[x * x for x in range(10)]
You can also add a condition at the end of the for to filter out results.
[x * x for x in range(10) if x % 2 != 0]
This will return a list of the squares of the numbers in range(10) that are not divisible by 2 (the odd numbers).
An example of a generator depicting the use of yield:
def city_generator():
    yield("Konstanz")
    yield("Zurich")
    yield("Schaffhausen")
    yield("Stuttgart")
>>> x = city_generator()
>>> x.next()
'Konstanz'
>>> x.next()
'Zurich'
>>> x.next()
'Schaffhausen'
>>> x.next()
'Stuttgart'
>>> x.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
So you see that every call to next() runs the generator up to its next yield, and at the end it raises StopIteration.
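For comparison, the generator expression in the question could also be written as a full generator function (a small sketch; the name squares is made up here):
def squares(n):
    for x in range(n):
        yield x * x          # execution pauses here until the next value is requested

print(sum(squares(10)))      # 285, the same result as sum(x*x for x in range(10))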
Those are generator expressions, and they are related to list comprehensions.
List comprehensions allow for the easy creation of lists. For example, if you wanted to create a list of perfect squares you could do this:
>>> squares = []
>>> for x in range(10):
...     squares.append(x**2)
...
>>> squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
But instead you could use a list comprehension:
squares = [x**2 for x in range(10)]
Generator expressions are like list comprehensions, except they return a generator object instead of a list. You can iterate over this generator object in a similar manner to list comprehensions, but you don't have to store the whole list in memory at once, as you would if you created the list in a list comprehension.
Documentation for generator expressions is here https://www.python.org/dev/peps/pep-0289/
Following is the code using a generator expression:
list(x**2 for x in range(0,10))
Your specific example is called a generator expression. List comprehensions, dictionary comprehensions, and set comprehensions are similar in meaning (different result types, and generator expressions are lazy) and have the same syntax, modulo being inside other kinds of brackets, and in the case of a dict comprehension having expr1: expr2 instead of a single expression (x*x in your example).
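For completeness, here is what the different forms look like side by side in Python 3.7+ (where dicts keep insertion order); these are just illustrative one-liners:
>>> [x*x for x in range(5)]              # list comprehension
[0, 1, 4, 9, 16]
>>> {x: x*x for x in range(5)}           # dict comprehension (key: value pairs)
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
>>> sorted({x*x for x in range(5)})      # set comprehension (shown sorted, since sets are unordered)
[0, 1, 4, 9, 16]
>>> (x*x for x in range(5))              # generator expression: lazy, produces items on demand
<generator object <genexpr> at 0x...>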
