Why is range not an iterator but reversed(range) is?

How come I can call next() on a reversed range but can't call it on a regular range?
r1 = range(6)
next(r1)  # TypeError: 'range' object is not an iterator
r2 = reversed(range(6))
next(r2)  # -> 5

There is a subtle distinction here. First, range is a type. An instance of range is not an iterator, because range.__next__ is not defined. An instance is iterable, though, because range.__iter__ is defined, so you can get an iterator with, for example, iter(range(3)).
>>> type(range(1))
<class 'range'>
>>> type(iter(range(1)))
<class 'range_iterator'>
range.__next__ is not defined, but range_iterator.__next__ is.
An instance of range represents a bounded sequence of integers without actually storing each integer in memory. As such, you can have multiple independent iterators over the same range.
>>> r = range(10)
>>> i1 = iter(r)
>>> next(i1)
0
>>> next(i1)
1
>>> next(i1)
2
>>> i2 = iter(r)
>>> next(i2)
0
>>> next(i1)
3
reversed, however, by definition returns an iterator. If need be, it can call iter on its iterable argument in order to get an iterator to reverse. It can also use its argument's __reversed__ method to get a reverse iterator. range.__reversed__ yields an iterator like range.__iter__, but going in the opposite direction.
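A quick way to see this split in the interactive interpreter (a sketch; the checks below only probe for the relevant special methods):
>>> hasattr(range(6), '__next__')      # not an iterator: no __next__
False
>>> hasattr(range(6), '__iter__')      # but it is iterable
True
>>> hasattr(range(6), '__reversed__')  # and it knows how to build a reverse iterator
True
>>> r_rev = reversed(range(6))
>>> next(r_rev), next(r_rev)
(5, 4)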

Because, per the reversed() docs:
Return a reverse iterator.
range(), however, returns an immutable sequence.
next() can only be used on objects with a __next__() method.

Related

Does next() eliminate values from a generator?

I've written a generator that does nothing more or less than store a range from 0 to 10:
result = (num for num in range(11))
When I want to print values, I can use next():
print(next(result))
[Out]: 0
print(next(result))
[Out]: 1
print(next(result))
[Out]: 2
print(next(result))
[Out]: 3
print(next(result))
[Out]: 4
If I then run a for loop on the generator, it runs on the values that I have not called next() on:
for value in result:
    print(value)
[Out]: 5
6
7
8
9
10
Has the generator eliminated the other values by acting on them with a next() function? I've tried to find some documentation on the functionality of next() and generators but haven't been successful.
Actually, this can be deduced from next()'s docs and from the iterator protocol/contract:
next(iterator[, default])
Retrieve the next item from the iterator by calling its __next__() method. If default is given, it is returned if the iterator is exhausted, otherwise StopIteration is raised.
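For instance, a small sketch of the default argument in action (the values here are just illustrative):
>>> it = iter([1, 2])
>>> next(it)
1
>>> next(it)
2
>>> next(it, 'done')  # the default is returned instead of raising StopIteration
'done'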
Yes. Calling a generator's __next__ method (which is what next() does) consumes the next value: the generator advances past it and cannot yield it again.
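You can see this with the generator from the question:
>>> result = (num for num in range(11))
>>> next(result)
0
>>> next(result)
1
>>> list(result)  # only the values not yet consumed remain
[2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> list(result)  # the generator is now exhausted
[]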
tl;dr: yes
An iterator is essentially a value producer that yields successive values from its associated iterable object. The built-in function next() is used to obtain the next value from an iterator.
Here is an example using a short list:
>>> l = ['Sarah', 'Roark']
>>> itr = iter(l)
>>> itr
<list_iterator object at 0x100ba8950>
>>> next(itr)
'Sarah'
>>> next(itr)
'Roark'
>>> next(itr)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
In this example, l is an iterable list and itr is the associated iterator, obtained with iter(). Each next(itr) call obtains the next value from itr.
Notice how an iterator retains its state internally. It knows which values have been obtained already, so when you call next(), it knows what value to return next.
If all the values from an iterator have been returned already, a subsequent next() call raises a StopIteration exception. Any further attempts to obtain values from the iterator will fail.
We can only obtain values from an iterator in one direction. We can’t go backward. There is no prev() function. But we can define two independent iterators on the same iterable object:
>>> l
['Sarah', 'Roark']
>>> itr1 = iter(l)
>>> itr2 = iter(l)
>>> next(itr1)
'Sarah'
>>> next(itr1)
'Roark'
>>> next(itr2)
'Sarah'
Yes. A for loop in Python simply keeps requesting the next item from the iterator, the same way next() does.
https://docs.python.org/3/reference/compound_stmts.html#the-for-statement
The suite is then executed once for each item provided by the iterator, in the order returned by the iterator.
So you can think of a for loop like this:
for x in container:
    statement()
As (almost) equivalent to a while loop (this simplified sketch stops on None, which a real for loop does not do; see the note below for the accurate version):
iterator = iter(container)
while True:
    x = next(iterator)
    if x is None:
        break
    statement()
If container is already an iterator, then iter(container) is container.
Note: Technically, a for loop is more like this:
iterator = iter(container)
while True:
    try:
        x = iterator.__next__()
    except StopIteration:
        break
    statement()

What do you call the item of list when used as an iterator in a for loop?

I'm not sure what to call the n in the following for loop. Is there a term for it?
for n in [1,2,3,4,5]:
    print(n)
And am I correct that the list itself is the iterator of the for loop?
While n is called a loop variable, the list is absolutely not an iterator. It is an iterable object, i.e. an iterable, but it is not an iterator. An iterable may itself be an iterator, but not always; that is to say, iterators are iterable, but not all iterables are iterators. A list is simply an iterable.
It is an iterable because it implements an __iter__ method, which returns an iterator:
From the Python Glossary an iterable is:
An object capable of returning its members one at a time. Examples of
iterables include all sequence types (such as list, str, and tuple)
and some non-sequence types like dict, file objects, and objects of
any classes you define with an __iter__() or __getitem__() method.
Iterables can be used in a for loop and in many other places where a
sequence is needed (zip(), map(), ...). When an iterable object is
passed as an argument to the built-in function iter(), it returns an
iterator for the object. This iterator is good for one pass over the
set of values. When using iterables, it is usually not necessary to
call iter() or deal with iterator objects yourself. The for statement
does that automatically for you, creating a temporary unnamed variable
to hold the iterator for the duration of the loop.
So, observe:
>>> x = [1,2,3]
>>> iterator = iter(x)
>>> type(iterator)
<class 'list_iterator'>
>>> next(iterator)
1
>>> next(iterator)
2
>>> next(iterator)
3
>>> next(iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
It is illuminating to understand that a for-loop in Python such as the following:
for n in some_iterable:
    # do something
is equivalent to:
iterator = iter(some_iterable)
while True:
    try:
        n = next(iterator)
        # do something
    except StopIteration:
        break
Iterators, which are returned by a call to an object's __iter__ method, implement the __iter__ method themselves (usually returning themselves), but they also implement a __next__ method. Thus, an easy way to check whether something is an iterator is to see whether it implements a __next__ method:
>>> next(x)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
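If you prefer not to poke at dunder methods directly, a rough equivalent (assuming Python 3's collections.abc) is to test against the iterable/iterator abstract base classes:
>>> from collections.abc import Iterable, Iterator
>>> x = [1, 2, 3]
>>> isinstance(x, Iterable)
True
>>> isinstance(x, Iterator)
False
>>> isinstance(iter(x), Iterator)
True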
Again, from the Python Glossary, an iterator is:
An object representing a stream of data. Repeated calls to the
iterator’s __next__() method (or passing it to the built-in function
next()) return successive items in the stream. When no more data are
available a StopIteration exception is raised instead. At this point,
the iterator object is exhausted and any further calls to its
__next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object
itself so every iterator is also iterable and may be used in most
places where other iterables are accepted. One notable exception is
code which attempts multiple iteration passes. A container object
(such as a list) produces a fresh new iterator each time you pass it
to the iter() function or use it in a for loop. Attempting this with
an iterator will just return the same exhausted iterator object used
in the previous iteration pass, making it appear like an empty
container.
I've illustrated the behavior of an iterator with the next function above, so now I want to concentrate on the last part of that quote, about multiple iteration passes.
Basically, an iterator can be used in the place of an iterable because iterators are always iterable. However, an iterator is good for only a single pass. So, if I use a non-iterator iterable, like a list, I can do stuff like this:
>>> my_list = ['a','b','c']
>>> for c in my_list:
...     print(c)
...
a
b
c
And this:
>>> for c1 in my_list:
...     for c2 in my_list:
...         print(c1, c2)
...
a a
a b
a c
b a
b b
b c
c a
c b
c c
>>>
An iterator behaves almost in the same way, so I can still do this:
>>> it = iter(my_list)
>>> for c in it:
...     print(c)
...
a
b
c
>>>
However, iterators do not support multiple iteration (well, you can write an iterator that does, but generally they do not):
>>> it = iter(my_list)
>>> for c1 in it:
...     for c2 in it:
...         print(c1, c2)
...
a b
a c
Why is that? Well, recall what is happening with the iterator protocol which is used by a for loop under the hood, and consider the following:
>>> my_list = ['a','b','c','d','e','f','g']
>>> iterator = iter(my_list)
>>> iterator_of_iterator = iter(iterator)
>>> next(iterator)
'a'
>>> next(iterator)
'b'
>>> next(iterator_of_iterator)
'c'
>>> next(iterator_of_iterator)
'd'
>>> next(iterator)
'e'
>>> next(iterator_of_iterator)
'f'
>>> next(iterator)
'g'
>>> next(iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>> next(iterator_of_iterator)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
When I used iter() on an iterator, it returned itself!
>>> id(iterator)
139788446566216
>>> id(iterator_of_iterator)
139788446566216
The example you gave is an "iterator-based for-loop"
n is called the loop variable.
The role that list plays is more troublesome to name.
Indeed, after an interesting conversation with @juanpa.arrivillaga, I've concluded that there simply isn't a "clearly correct formal name", nor a commonly used name, for that syntactic element.
That being said, I do think that if you referred to it in context in a sentence as "the loop iterator" everyone would know what you meant.
In doing so, you take the risk of confusing yourself or someone else with the fact that the syntactic element in that position is not in fact an iterator; it's a collection or, loosely speaking, an "iterable of some sort".
I suspect that one reason there isn't a name for this is that we hardly ever have to refer to it in a sentence. Another is that the types of element that can appear in that position vary widely, so it is hard to safely cover them all with a label.

What is the purpose of __iter__ returning the iterator object itself?

I don't understand exactly why the __iter__ special method just returns the object it's called on (if it's called on an iterator). Is it essentially just a flag indicating that the object is an iterator?
EDIT: Actually, I discovered that "This is required to allow both containers and iterators to be used with the for and in statements." https://docs.python.org/3/library/stdtypes.html#iterator.iter
Alright, here's how I understand it: When writing a for loop, you're allowed to specify either an iterable or an iterator to loop over. But Python ultimately needs an iterator for the loop, so it calls the __iter__ method on whatever it's given. If it's been given an iterable, the __iter__ method will produce an iterator, and if it's been given an iterator, the __iter__ method will likewise produce an iterator (the original object given).
When you loop over something using for x in something, then the loop actually calls iter(something) first, so it has something to work with. In general, the for loop is approximately equivalent to something like this:
something_iterator = iter(something)
while True:
    try:
        x = next(something_iterator)
        # loop body
    except StopIteration:
        break
So, as you already figured out yourself, in order to be able to loop over an iterator (i.e. when something is already an iterator), iterators should always return themselves when iter() is called on them. This basically makes sure that iterators are also iterable.
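A quick sketch of that contract in the REPL:
>>> it = iter([1, 2, 3])
>>> iter(it) is it  # an iterator's __iter__ returns the iterator itself
True
>>> next(it)
1
>>> for x in it:  # the for loop picks up where next() left off
...     print(x)
...
2
3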
This depends on what object you call iter on. If an object is already an iterator, then no conversion is required, because it already is one. But if the object is not an iterator, yet is iterable, then an iterator is constructed from the object.
A good example of this is the list object:
>>> x = [1, 2, 3]
>>> iter(x) == x
False
>>> iter(x)
<list_iterator object at 0x7fccadc5feb8>
>>> x
[1, 2, 3]
Lists are iterable, but they are not themselves iterators. The result of list.__iter__ is not the original list.
In Python, whenever you use a loop, you are iterating over some object, as below.
Let's look at a list object.
>>> l = [1, 2, 3]  # Define a list l
If we iterate over the above list:
>>> for i in l:
...     print(i)
...
1
2
3
When you iterate over the list l, Python's for loop calls l.__iter__(), which in turn returns an iterator object.
>>> for i in l.__iter__():
...     print(i)
...
1
2
3
To understand this better, let's subclass list and create a new list class:
>>> class ListOverride(list):
...     def __iter__(self):
...         raise TypeError('Not iterable')
...
Here I've created a ListOverride class, which inherits from list and overrides the list.__iter__ method to raise a TypeError.
>>> ll = ListOverride([1, 2, 3])
>>> ll
[1, 2, 3]
I've created a new list using the ListOverride class; since it's a list object, you might expect it to iterate the same way a list does.
>>> for i in ll:
...     print(i)
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 3, in __iter__
TypeError: Not iterable
When we try to iterate over the ListOverride object ll, we end up with the TypeError raised by our __iter__ method.
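For contrast, here is a hypothetical subclass (ListSquares is my own name, not from the question) whose __iter__ returns a proper iterator, in this sketch a generator that transforms each item:
>>> class ListSquares(list):
...     def __iter__(self):
...         # delegate to the parent's iterator and square each item
...         for item in super().__iter__():
...             yield item * item
...
>>> for i in ListSquares([1, 2, 3]):
...     print(i)
...
1
4
9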

How to use Python generator expressions to create a one-liner that runs a function multiple times and collects the output in a list

I am wondering if there is a simple Pythonic way (maybe using generators) to run a function over each item in a list and get back a list of the return values?
Example:
def square_it(x):
    return x*x

x_set = [0,1,2,3,4]
squared_set = square_it(x for x in x_set)
I notice that when I do a line by line debug on this, the object that gets passed into the function is a generator.
Because of this, I get an error:
TypeError: unsupported operand type(s) for *: 'generator' and 'generator'
I understand that this generator expression created a generator to be passed into the function, but I am wondering if there is a cool way to accomplish running the function multiple times only by specifying an iterable as the argument? (without modifying the function to expect an iterable).
It seems to me that this ability would be really useful to cut down on lines of code, because you would not need to create a loop to run the function and a variable to save the output in a list.
Thanks!
You want a list comprehension:
squared_set = [square_it(x) for x in x_set]
There's a builtin function, map(), for this common problem.
>>> map(square_it, x_set)
[0, 1, 4, 9, 16]  # Python 2; on Python 3, map() returns a lazy map object, so wrap it in list()
Alternatively, one can use a generator expression, which is memory-efficient but lazy (meaning the values will not be computed now, only when needed):
>>> (square_it(x) for x in x_set)
<generator object <genexpr> at ...>
Similarly, one can also use a list comprehension, which computes all the values upon creation, returning a list.
Additionally, it is worth comparing the two: a generator expression is lazy and can only be consumed once, while a list comprehension builds the whole list up front.
You want to call the square_it function inside the generator, not on the generator.
squared_set = (square_it(x) for x in x_set)
As the other answers have suggested, I think it is best (most "pythonic") to call your function explicitly on each element, using a list or generator comprehension.
To actually answer the question though, you can wrap your function that operates over scalars with a function that sniffs the input and behaves differently depending on what it sees. For example:
>>> import types
>>> def scaler_over_generator(f):
...     def wrapper(x):
...         if isinstance(x, types.GeneratorType):
...             return [f(i) for i in x]
...         return f(x)
...     return wrapper
>>> def square_it(x):
...     return x * x
>>> square_it_maybe_over = scaler_over_generator(square_it)
>>> square_it_maybe_over(10)
100
>>> square_it_maybe_over(x for x in range(5))
[0, 1, 4, 9, 16]
I wouldn't use this idiom in my code, but it is possible to do.
You could also code it up with a decorator, like so:
>>> @scaler_over_generator
... def square_it(x):
...     return x * x
>>> square_it(x for x in range(5))
[0, 1, 4, 9, 16]
If you didn't want/need a handle to the original function.
Note that there is a difference between a list comprehension, which returns a list:
squared_set = [square_it(x) for x in x_set]
and a generator expression, which returns a generator you can iterate over:
squared_set = (square_it(x) for x in x_set)
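A short sketch of the practical difference, reusing square_it and x_set from the question:
squared_list = [square_it(x) for x in x_set]  # computed immediately
squared_gen = (square_it(x) for x in x_set)   # computed lazily, on demand

print(squared_list)       # [0, 1, 4, 9, 16]
print(list(squared_gen))  # [0, 1, 4, 9, 16]
print(list(squared_gen))  # [] -- a generator can only be consumed once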

How to identify a generator vs list comprehension

I have this:
>>> sum( i*i for i in xrange(5))
My question is, in this case am I passing a list comprehension or a generator object to sum ? How do I tell that? Is there a general rule around this?
Also remember sum by itself needs a pair of parentheses to surround its arguments. I'd think that the parentheses above are for sum and not for creating a generator object. Wouldn't you agree?
You are passing in a generator expression.
A list comprehension is specified with square brackets ([...]). A list comprehension builds a list object first, so it uses syntax closely related to the list literal syntax:
list_literal = [1, 2, 3]
list_comprehension = [i for i in range(4) if i > 0]
A generator expression, on the other hand, creates an iterator object. Only when iterating over that object is the contained loop executed and are items produced. The generator expression does not retain those items; there is no list object being built.
A generator expression always uses (...) round parentheses, but when used as the sole argument to a call, the parentheses can be omitted; the following two expressions are equivalent:
sum((i*i for i in xrange(5)))  # with parentheses
sum(i*i for i in xrange(5))  # without parentheses around the generator
Quoting from the generator expression documentation:
The parentheses can be omitted on calls with only one argument. See section Calls for the detail.
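As a sketch of that rule (using Python 3's range; the numbers are only illustrative):
>>> sum(i*i for i in range(5))         # sole argument: parentheses may be omitted
30
>>> sum((i*i for i in range(5)), 10)   # with a second argument, the genexpr needs its own parentheses
40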
List comprehensions are enclosed in []:
>>> [i*i for i in xrange(5)] # list comprehension
[0, 1, 4, 9, 16]
>>> (i*i for i in xrange(5)) # generator
<generator object <genexpr> at 0x2cee40>
You are passing a generator.
That is a generator:
>>> (i*i for i in xrange(5))
<generator object <genexpr> at 0x01A27A08>
>>>
List comprehensions are enclosed in [].
You might also be asking, "does this syntax truly cause sum to consume a generator one item at a time, or does it secretly create a list of every item in the generator first"? One way to check this is to try it on a very large range and watch memory usage:
sum(i for i in xrange(int(1e8)))
Memory usage for this case is constant, whereas range(int(1e8)) on Python 2 builds the full list and consumes several hundred MB of RAM.
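One rough way to see the difference without watching a memory monitor is sys.getsizeof; the exact figures vary by platform and Python version, and Python 3's range is used here, so treat this as a sketch:
>>> import sys
>>> gen = (i for i in range(10**6))
>>> lst = list(range(10**6))
>>> sys.getsizeof(gen) < 1000      # the generator object itself is tiny
True
>>> sys.getsizeof(lst) > 1000000   # the list's pointer array alone is several MB
True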
You can test that the parentheses are optional:
def print_it(obj):
    print obj

print_it(i for i in xrange(5))
# prints <generator object <genexpr> at 0x03853C60>
I tried this:
#!/usr/bin/env python
class myclass:
    def __init__(self, arg):
        self.p = arg
        print type(self.p)
        print self.p

if __name__ == '__main__':
    c = myclass(i*i for i in xrange(5))
And this prints:
$ ./genexprorlistcomp.py
<type 'generator'>
<generator object <genexpr> at 0x7f5344c7cf00>
Which is consistent with what Martin and mdscruggs explained in their posts.
You are passing a generator object; a list comprehension is surrounded by [].
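If you want to check programmatically rather than by eye, one option (assuming Python 3 here) is types.GeneratorType:
>>> import types
>>> gen = (i*i for i in range(5))
>>> lst = [i*i for i in range(5)]
>>> isinstance(gen, types.GeneratorType)
True
>>> isinstance(lst, types.GeneratorType)
False
>>> isinstance(lst, list)
True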
