How to tell if an iterable can be iterated only once? - python

Iterables like generators can only be iterated once:
def f():
    for i in range(10):
        yield i

a = f()
for x in a:
    print(x)  # prints 0 through 9
for x in a:
    print(x)  # prints nothing; the generator is already exhausted
Iterables like list can be iterated many times:
a = list(range(10))
for x in a:
    print(x)  # prints 0 through 9
for x in a:
    print(x)  # prints 0 through 9 again
How can I tell if an iterable can only be iterated once or not?
The motivation for this question comes from the implementation of itertools.cycle:
def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element
If we can tell if an iterable can be iterated only once, we can make the implementation more memory-efficient:
def cycle(iterable):
    it = iterable
    if only_iterated_once(iterable):
        it = list(iterable)
    while True:
        for element in it:
            yield element
If the argument can be iterated multiple times, we don't need to save an additional copy.

The main difference between your examples is that in the generator example, a single iterator is created before the loops happen, then that same iterator is used twice. In the list example however, a new iterator is used for each loop.
In the first example, a generator is an iterator itself. When you do
a = f()
The call to f creates a generator (which is an iterator). When you give a to the for loops, they call iter on a, which returns itself. A short MCVE shows this easily:
l = [1]
i = iter(l)
j = iter(i)
print(i is j) # Prints True
One iterator is used for both loops. This means that by the time the second loop starts, the shared iterator will already be exhausted.
In the second example however, when for calls iter on a, a new iterator is created, each time; so two iterators are created. This means that each loop uses its own iterator, so the second loop isn't using an exhausted iterator.
In other words, the way to tell is to think about whether or not you're creating a new iterator with each use, or using an old iterator multiple times.
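Building on that, here is a minimal sketch of the only_iterated_once helper the question left as a placeholder (it assumes __iter__ is well behaved). Since calling iter on an object that is already an iterator returns the object itself, an identity check distinguishes one-shot iterators from re-iterable containers:
def only_iterated_once(iterable):
    # an iterator's __iter__ returns itself; a container returns a fresh iterator
    return iter(iterable) is iterable

print(only_iterated_once(f()))             # True: a generator is its own iterator
print(only_iterated_once(list(range(3))))  # False: a list hands out new iterators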

Related

Unsubscriptable result of generator function

I have a generator function and a function that works with the results of the first one. For example:
def gen():
    a = 2
    b = 3
    yield (a, b)

def func():
    c = gen()[0]
    d = gen()[1]
I get the error "'generator' object is not subscriptable".
How can I fix it and work with the result in func?
You have two problems here.
First, generator objects are not sequences, they're iterators. And you can't index an iterator the way you can a sequence, by subscripting it like [1]. You can loop over them with a for statement or a comprehension, or manually call next on them until they're done, but you can't [1] them.
That's why you get an error message that says the generator object is not subscriptable.
Second, you didn't want to subscript the generator anyway. Your generator yields pairs. It happens to yield only once, but that's no different from a sequence with just one pair in it—it's still not the same thing as a pair.
Consider the nearest sequence equivalent:
def seq():
    a = 2
    b = 3
    return [(a, b)]
Obviously seq()[0] is going to be the tuple (2, 3), and seq()[1] is going to be an IndexError. So, even if you could subscript generators, your code wouldn't make sense.
What you actually want to do is either take the first pair, or loop over all the pairs (I'm not sure which). And then you can do [0] and [1] to the/each pair.
So, either this:
def func():
    for pair in gen():
        c = pair[0]
        d = pair[1]
… or this:
def func():
    pair = next(gen())
    c = pair[0]
    d = pair[1]
Or, if you really wanted to call it twice for some reason, this:
def func():
    for pair in gen():
        c = pair[0]
    for pair in gen():
        d = pair[1]
… or this:
def func():
    c = next(gen())[0]
    d = next(gen())[1]
What you are trying to do is get the first and second elements without iterating over the iterator. You need to iterate over it to get values from it, like:
for i in gen():
    c, d = i  # you need this because the generator yields a tuple
You can go through this post to learn more about iterators and generators
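As a small sketch of the same idea: since gen() yields exactly one tuple here, you can also pull it with next and unpack in one step.
c, d = next(gen())
print(c, d)  # 2 3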

How does yield know that its iteration was done once?

I have read about yield at "What does the "yield" keyword do in Python?", but I have one question: how does the system identify that the generator has already been iterated once?
def test_yield():
    name = 'Hello World!'
    for i in name:
        yield i

y = test_yield()
print "first yield", y
for c in y:
    print c
print "second yield", y
for c in y:
    print c
Output
first yield <generator object test_yield at 0x7f4c2bc10be0>
H
e
l
l
o
W
o
r
l
d
!
second yield <generator object test_yield at 0x7f4c2bc10be0>
In the output, the generator object is printed the second time, but it does not iterate. So how does the program know that it has already been executed once?
Looping through an iterator "uses it up". So your first yield loop iterates through it and reaches the end. Just as if you opened a file and read all of the lines until EOF. After that, no matter how many times you call read(), you won't get more data, only EOF. Similarly, once your iterator has reached its last element, calling .next on it just raises StopIteration.
When a generator function calls yield, the "state" of the generator function is frozen; the values of all variables are saved and the next line of code to be executed is recorded until next() is called again. Once it is, the generator function simply resumes where it left off. If next() is never called again, the state recorded during the yield call is (eventually) discarded.
When the generator function reaches its end, a StopIteration exception is raised and the generator is exhausted; you would have to create it again. That is why it didn't yield any values during the second loop.
Note: for gets values by calling next() implicitly.
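A quick sketch of that implicit protocol, driving the same generator by hand:
g = test_yield()
print(next(g))  # H
print(next(g))  # e
# ...and so on; after the final '!', next(g) raises StopIteration,
# which is exactly the signal a for loop uses to stop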
The first time the for loop calls the generator object created from your function, it runs the code in your function from the beginning until it hits yield, then returns the first value of the loop. Each subsequent call runs the loop you wrote in the function one more time and returns the next value, until there is no value left to return. In your second for loop there is nothing left to retrieve, because you have already consumed all the values.
For more insight as to what's happening behind the scenes, the for loop can be rewritten to this:
iterator = some_func()
try:
    while 1:
        print iterator.next()
except StopIteration:
    pass
When you wrote
y = test_yield()
you initialized the iterator, and when you finished your iteration after the first
for c in y:
the iterator was exhausted. You need to initialize it once more, like:
def test_yield():
    name = 'Hello World!'
    for i in name:
        yield i

y = test_yield()
print "first yield", y
for c in y:
    print c

# once again
y = test_yield()
print "second yield", y
for c in y:
    print c

recursion to reset a generator in python

I'm trying to write a function that returns the next element of a generator and if it is at the end of the generator it resets it and returns the next result. The expected output of the code below would be:
1
2
3
1
2
However that is not what I get obviously. What am I doing that is incorrect?
a = '123'

def convert_to_generator(iterable):
    return (x for x in iterable)

ag = convert_to_generator(a)

def get_next_item(gen, original):
    try:
        return next(gen)
    except StopIteration:
        gen = convert_to_generator(original)
        get_next_item(gen, original)

for n in range(5):
    print(get_next_item(ag, a))
1
2
3
None
None
Is itertools.cycle(iterable) a possible alternative?
You need to return the result of your recursive call:
return get_next_item(gen, original)
which still does not make this a working approach.
The generator ag used in your for-loop is not changed by the rebinding of the local variable gen in your function. It will stay exhausted...
As has been mentioned in the comments, check out itertools.cycle.
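For the question's concrete case, that looks like this (a short sketch of the itertools.cycle approach):
import itertools

ag = itertools.cycle(a)  # a = '123'
for n in range(5):
    print(next(ag))
# prints 1 2 3 1 2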
The easy way is to just use itertools.cycle. Otherwise you would need to remember the elements of the iterable if it is an iterator (e.g. a generator), because those can't be reset; if it is not an iterator, you can reuse it many times.
The documentation includes an example implementation:
def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    saved = []
    for element in iterable:
        yield element
        saved.append(element)
    while saved:
        for element in saved:
            yield element
or, for example, to reuse the argument when it is re-iterable:
def cycle(iterable):
    # cycle('ABCD') --> A B C D A B C D A B C D ...
    if iter(iterable) is iter(iterable):  # it is an iterator
        saved = []
        for element in iterable:
            yield element
            saved.append(element)
    else:
        saved = iterable
    while saved:
        for element in saved:
            yield element
Example use:
test = cycle("123")
for i in range(5):
    print(next(test))
Now, about your code: the problem is simple, it doesn't remember its state.
def get_next_item(gen, original):
    try:
        return next(gen)
    except StopIteration:
        gen = convert_to_generator(original)  # <-- the problem is here
        get_next_item(gen, original)  # and you should return something here
On the marked line a new generator is built, but you would need to update your ag variable outside this function to get the desired behavior. There are ways to do it, such as changing your function to return both the element and the generator; other ways are not recommended or are more complicated, like building a class so it remembers its state.
In your function, the recursive call does produce a value, but that value is discarded instead of returned, so the statement doesn't do anything.
What you want to do is this:
def get_next_item(gen, original):
    try:
        return next(gen)
    except StopIteration:
        gen = convert_to_generator(original)
        return get_next_item(gen, original)
or, rewritten as a generator (which works as long as gen has an __iter__ method, which it probably has):
def get_next_item(gen, original):
    for i in gen:
        yield i
    for i in get_next_item(convert_to_generator(original), original):
        yield i
Or without recursion (which is a big problem in python, as it is 1. limited in depth and 2. slow):
def get_next_item(gen, original):
    for i in gen:
        yield i
    while True:
        for i in convert_to_generator(original):
            yield i
If convert_to_generator is just a call to iter, it is even shorter:
def get_next_item(gen, original):
    for i in gen:
        yield i
    while True:
        for i in original:
            yield i
or, with itertools:
import itertools

def get_next_item(gen, original):
    return itertools.chain(gen, itertools.cycle(original))
and get_next_item is equivalent to itertools.cycle if gen is guaranteed to be an iterator for original.
Side note: You can exchange for i in x: yield i for yield from x (where x is some expression) with Python 3.3 or higher.
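With that, the recursion-free version above shrinks to this sketch (it assumes original is re-iterable, e.g. a string or list):
def get_next_item(gen, original):
    yield from gen
    while True:
        yield from original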

Python for loop and iterator behavior

I wanted to understand a bit more about iterators, so please correct me if I'm wrong.
An iterator is an object which has a pointer to the next object and is read as a buffer or stream (i.e. a linked list). They're particularly efficient because all they do is tell you what is next by references instead of using indexing.
However I still don't understand why is the following behavior happening:
In [1]: iter = (i for i in range(5))
In [2]: for _ in iter:
....: print _
....:
0
1
2
3
4
In [3]: for _ in iter:
....: print _
....:
In [4]:
After a first loop through the iterator (In [2]) it's as if it was consumed and left empty, so the second loop (In [3]) prints nothing.
However I never assigned a new value to the iter variable.
What is really happening under the hood of the for loop?
Your suspicion is correct: the iterator has been consumed.
In actuality, your iterator is a generator, which is an object which has the ability to be iterated through only once.
type((i for i in range(5)))  # says it's type generator

def another_generator():
    yield 1  # the yield expression makes it a generator, not a function

type(another_generator())  # also a generator
The reason they are efficient has nothing to do with telling you what is next "by reference." They are efficient because they only generate the next item upon request; all of the items are not generated at once. In fact, you can have an infinite generator:
def my_gen():
    while True:
        yield 1  # again: yield means it is a generator, not a function

for _ in my_gen(): print(_)  # hit ctrl+c to stop this infinite loop!
Some other corrections to help improve your understanding:
The generator is not a pointer, and does not behave like a pointer as you might be familiar with in other languages.
One of the differences from other languages: as said above, each result of the generator is generated on the fly. The next result is not produced until it is requested.
The keyword combination for in accepts an iterable object as its second argument.
The iterable object can be a generator, as in your example case, but it can also be any other iterable object, such as a list, or dict, or a str object (string), or a user-defined type that provides the required functionality.
The iter function is applied to the object to get an iterator (by the way: don't use iter as a variable name in Python, as you have done - it shadows the built-in function). Actually, to be more precise, the object's __iter__ method is called (which is, for the most part, all the iter function does anyway; __iter__ is one of Python's so-called "magic methods").
If the call to __iter__ is successful, the function next() is applied to the iterable object over and over again, in a loop, and the first variable supplied to for in is assigned to the result of the next() function. (Remember: the iterable object could be a generator, or a container object's iterator, or any other iterable object.) Actually, to be more precise: it calls the iterator object's __next__ method, which is another "magic method".
The for loop ends when next() raises the StopIteration exception (which usually happens when the iterable does not have another object to yield when next() is called).
You can "manually" implement a for loop in python this way (probably not perfect, but close enough):
try:
temp = iterable.__iter__()
except AttributeError():
raise TypeError("'{}' object is not iterable".format(type(iterable).__name__))
else:
while True:
try:
_ = temp.__next__()
except StopIteration:
break
except AttributeError:
raise TypeError("iter() returned non-iterator of type '{}'".format(type(temp).__name__))
# this is the "body" of the for loop
continue
There is pretty much no difference between the above and your example code.
Actually, the more interesting part of a for loop is not the for, but the in. Using in by itself produces a different effect than for in, but it is very useful to understand what in does with its arguments, since for in implements very similar behavior.
When used by itself, the in keyword first calls the object's __contains__ method, which is yet another "magic method" (note that this step is skipped when using for in). Using in by itself on a container, you can do things like this:
1 in [1, 2, 3] # True
'He' in 'Hello' # True
3 in range(10) # True
'eH' in 'Hello'[::-1] # True
If the iterable object is NOT a container (i.e. it doesn't have a __contains__ method), in next tries to call the object's __iter__ method. As was said previously: the __iter__ method returns what is known in Python as an iterator. Basically, an iterator is an object that you can use the built-in generic function next() on1. A generator is just one type of iterator.
If the call to __iter__ is successful, the in keyword applies the function next() to the iterable object over and over again. (Remember: the iterable object could be a generator, or a container object's iterator, or any other iterable object.) Actually, to be more precise: it calls the iterator object's __next__ method).
If the object doesn't have a __iter__ method to return an iterator, in then falls back on the old-style iteration protocol using the object's __getitem__ method2.
If all of the above attempts fail, you'll get a TypeError exception.
If you wish to create your own object type to iterate over (i.e, you can use for in, or just in, on it), it's useful to know about the yield keyword, which is used in generators (as mentioned above).
class MyIterable():
    def __iter__(self):
        yield 1

m = MyIterable()
for _ in m: print(_)  # 1
1 in m  # True
The presence of yield turns a function or method into a generator instead of a regular function/method. You don't need the __next__ method if you use a generator (it brings __next__ along with it automatically).
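A quick check, reusing m from above, ties this back to the exhaustion behavior discussed earlier: each loop over m calls __iter__ again and gets a fresh generator, which is why MyIterable can be iterated many times.
print(iter(m) is iter(m))  # False: every call to __iter__ builds a new generator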
If you wish to create your own container object type (i.e, you can use in on it by itself, but NOT for in), you just need the __contains__ method.
class MyUselessContainer():
    def __contains__(self, obj):
        return True

m = MyUselessContainer()
1 in m  # True
'Foo' in m  # True
TypeError in m  # True
None in m  # True
1 Note that, to be an iterator, an object must implement the iterator protocol. This only means that both the __next__ and __iter__ methods must be correctly implemented (generators come with this functionality "for free", so you don't need to worry about it when using them). Also note that the __next__ method is actually next (no underscores) in Python 2.
2 See this answer for the different ways to create iterable classes.
A for loop basically calls the next method of the object it is applied to (__next__ in Python 3).
You can simulate this simply by doing:
iter = (i for i in range(5))
print(next(iter))
print(next(iter))
print(next(iter))
print(next(iter))
print(next(iter))
# this prints 0 1 2 3 4
At this point there is no next element in the input object. So doing this:
print(next(iter))
will result in a StopIteration exception being thrown. At that point the for loop stops. An iterator can be any object that responds to the next() function and throws the exception when there are no more elements. It does not have to be a pointer or reference (there are no such things in Python in the C/C++ sense), a linked list, etc.
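For example, here is a minimal hand-written iterator (a sketch; the class name is made up): any object with __iter__ and a __next__ that eventually raises StopIteration will do.
class CountDown:
    def __init__(self, n):
        self.n = n
    def __iter__(self):
        return self
    def __next__(self):
        if self.n <= 0:
            raise StopIteration  # the for loop stops here
        self.n -= 1
        return self.n + 1

for x in CountDown(3):
    print(x)  # 3 2 1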
There is an iterator protocol in python that defines how the for statement will behave with lists and dicts, and other things that can be looped over.
It's in the python docs here and here.
Typically the iterator protocol is implemented in the form of a Python generator: we yield a value as long as we have one, and when we reach the end, StopIteration is raised.
So let's write our own iterator:
def my_iter():
    yield 1
    yield 2
    yield 3
    # note: this explicit raise is unnecessary (a generator raises StopIteration
    # by itself when it returns), and under PEP 479 it is an error in Python 3.7+
    raise StopIteration()

for i in my_iter():
    print i
The result is:
1
2
3
A couple of things to note about that. The my_iter is a function. my_iter() returns an iterator.
If I had written using iterator like this instead:
j = my_iter() #j is the iterator that my_iter() returns
for i in j:
print i #this loop runs until the iterator is exhausted
for i in j:
print i #the iterator is exhausted so we never reach this line
And the result is the same as above. The iter is exhausted by the time we enter the second for loop.
But that's rather simplistic. What about something more complicated, perhaps yielding inside a loop?
def capital_iter(name):
    for x in name:
        yield x.upper()

for y in capital_iter('bobert'):
    print y
When it runs, we iterate over the string (strings are iterable built in). This, in turn, lets us run a for loop over it and yield the results until we are done.
B
O
B
E
R
T
So now this raises the question: what happens between yields in the iterator?
j = capital_iter("bobert")
print i.next()
print i.next()
print i.next()
print("Hey there!")
print i.next()
print i.next()
print i.next()
print i.next() #Raises StopIteration
The answer is that the function is paused at the yield, waiting for the next call to next().
B
O
B
Hey There!
E
R
T
Traceback (most recent call last):
  File "<stdin>", line 13, in <module>
StopIteration
Some additional details about the behaviour of iter() with __getitem__ classes that lack their own __iter__ method.
Before __iter__ there was __getitem__. If __getitem__ works with ints from 0 to len(obj)-1, then iter() supports these objects. It will construct a new iterator that repeatedly calls __getitem__ with 0, 1, 2, ... until it gets an IndexError, which it converts to a StopIteration.
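A small sketch of that fallback (the class name is illustrative): with only __getitem__, both iter() and in still work.
class Squares:
    def __getitem__(self, i):
        if i >= 4:
            raise IndexError  # iter() converts this to StopIteration
        return i * i

for s in Squares():
    print(s)  # 0 1 4 9
print(9 in Squares())  # True: in also falls back to __getitem__ iteration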
See this answer for more details of the different ways to create an iterator.
Excerpt from the Python Practice book:
5. Iterators & Generators
5.1. Iterators
We use the for statement for looping over a list.
>>> for i in [1, 2, 3, 4]:
... print i,
...
1
2
3
4
If we use it with a string, it loops over its characters.
>>> for c in "python":
... print c
...
p
y
t
h
o
n
If we use it with a dictionary, it loops over its keys.
>>> for k in {"x": 1, "y": 2}:
... print k
...
y
x
If we use it with a file, it loops over lines of the file.
>>> for line in open("a.txt"):
... print line,
...
first line
second line
So there are many types of objects which can be used with a for loop. These are called iterable objects.
There are many functions which consume these iterables.
>>> ",".join(["a", "b", "c"])
'a,b,c'
>>> ",".join({"x": 1, "y": 2})
'y,x'
>>> list("python")
['p', 'y', 't', 'h', 'o', 'n']
>>> list({"x": 1, "y": 2})
['y', 'x']
5.1.1. The Iteration Protocol
The built-in function iter takes an iterable object and returns an iterator.
>>> x = iter([1, 2, 3])
>>> x
<listiterator object at 0x1004ca850>
>>> x.next()
1
>>> x.next()
2
>>> x.next()
3
>>> x.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Each time we call the next method, the iterator gives us the next element. If there are no more elements, it raises a StopIteration.
Iterators are implemented as classes. Here is an iterator that works like the built-in xrange function.
class yrange:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        return self

    def next(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()
The __iter__ method is what makes an object iterable. Behind the scenes, the iter function calls the __iter__ method on the given object.
The return value of __iter__ is an iterator. It should have a next method and raise StopIteration when there are no more elements.
Let's try it out:
>>> y = yrange(3)
>>> y.next()
0
>>> y.next()
1
>>> y.next()
2
>>> y.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 14, in next
StopIteration
Many built-in functions accept iterators as arguments.
>>> list(yrange(5))
[0, 1, 2, 3, 4]
>>> sum(yrange(5))
10
In the above case, both the iterable and the iterator are the same object. Notice that the __iter__ method returned self. This need not always be the case.
class zrange:
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return zrange_iter(self.n)

class zrange_iter:
    def __init__(self, n):
        self.i = 0
        self.n = n

    def __iter__(self):
        # Iterators are iterables too.
        # Adding this function makes them so.
        return self

    def next(self):
        if self.i < self.n:
            i = self.i
            self.i += 1
            return i
        else:
            raise StopIteration()
If both the iterable and the iterator are the same object, it is consumed in a single iteration.
>>> y = yrange(5)
>>> list(y)
[0, 1, 2, 3, 4]
>>> list(y)
[]
>>> z = zrange(5)
>>> list(z)
[0, 1, 2, 3, 4]
>>> list(z)
[0, 1, 2, 3, 4]
5.2. Generators
Generators simplify the creation of iterators. A generator is a function that produces a sequence of results instead of a single value.
def yrange(n):
    i = 0
    while i < n:
        yield i
        i += 1
Each time the yield statement is executed the function generates a new value.
>>> y = yrange(3)
>>> y
<generator object yrange at 0x401f30>
>>> y.next()
0
>>> y.next()
1
>>> y.next()
2
>>> y.next()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
So a generator is also an iterator. You don’t have to worry about the iterator protocol.
The word “generator” is confusingly used to mean both the function that generates and what it generates. In this chapter, I’ll use the word “generator” to mean the generated object and “generator function” to mean the function that generates it.
Can you think about how it is working internally?
When a generator function is called, it returns a generator object without even beginning execution of the function. When the next method is called for the first time, the function starts executing until it reaches a yield statement. The yielded value is returned by that call to next.
The following example demonstrates the interplay between yield and calls to the next method on a generator object.
>>> def foo():
... print "begin"
... for i in range(3):
... print "before yield", i
... yield i
... print "after yield", i
... print "end"
...
>>> f = foo()
>>> f.next()
begin
before yield 0
0
>>> f.next()
after yield 0
before yield 1
1
>>> f.next()
after yield 1
before yield 2
2
>>> f.next()
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Let's see an example:
def integers():
    """Infinite sequence of integers."""
    i = 1
    while True:
        yield i
        i = i + 1

def squares():
    for i in integers():
        yield i * i

def take(n, seq):
    """Returns first n values from the given sequence."""
    seq = iter(seq)
    result = []
    try:
        for i in range(n):
            result.append(seq.next())
    except StopIteration:
        pass
    return result
print take(5, squares()) # prints [1, 4, 9, 16, 25]
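As a Python 3 sketch of the same pipeline, itertools.islice can stand in for the hand-rolled take():
import itertools

def integers():
    i = 1
    while True:
        yield i
        i += 1

squares = (i * i for i in integers())
print(list(itertools.islice(squares, 5)))  # [1, 4, 9, 16, 25]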
Concept 1
All generators are iterators, but not all iterators are generators.
Concept 2
An iterator is an object with a next (Python 2) or __next__ (Python 3)
method.
Concept 3
Quoting from the wiki:
Generator functions allow you to declare a function that behaves like an
iterator, i.e. it can be used in a for loop.
In your case
>>> it = (i for i in range(5))
>>> type(it)
<type 'generator'>
>>> callable(getattr(it, 'iter', None))
False
>>> callable(getattr(it, 'next', None))
True

Lazy evaluation in Python

What is lazy evaluation in Python?
One website said :
In Python 3.x the range() function returns a special range object which computes elements of the list on demand (lazy or deferred evaluation):
>>> r = range(10)
>>> print(r)
range(0, 10)
>>> print(r[3])
3
What is meant by this?
The object returned by range() (or xrange() in Python 2.x) is known as a lazy iterable.
Instead of storing the entire range, [0,1,2,..,9], in memory, it stores a definition for (i=0; i<10; i+=1) and computes the next value only when needed (a.k.a. lazy evaluation).
Essentially, a generator allows you to return a list-like structure, but here are some differences:
A list stores all elements when it is created. A generator generates the next element when it is needed.
A list can be iterated over as much as you need, a generator can only be iterated over exactly once.
A list can get elements by index, a generator cannot -- it only generates values once, from start to end.
A generator can be created in two ways:
(1) Very similar to a list comprehension:
# this is a list, create all 5000000 x/2 values immediately, uses []
lis = [x/2 for x in range(5000000)]
# this is a generator, creates each x/2 value only when it is needed, uses ()
gen = (x/2 for x in range(5000000))
(2) As a function, using yield to return the next value:
# this is also a generator; it will run until a yield occurs, and return that result.
# on the next call it picks up where it left off and continues until a yield occurs...
def divby2(n):
    num = 0
    while num < n:
        yield num/2
        num += 1

# same as (x/2 for x in range(5000000))
print divby2(5000000)
Note: Even though range(5000000) is lazy in Python 3.x, [x/2 for x in range(5000000)] is still a list. range(...) does its job and generates x one at a time, but the entire list of x/2 values will be computed when this list is created.
In a nutshell, lazy evaluation means that the object is evaluated when it is needed, not when it is created.
In Python 2, range will return a list - this means that if you give it a large number, it will calculate the range and return at the time of creation:
>>> i = range(100)
>>> type(i)
<type 'list'>
In Python 3, however you get a special range object:
>>> i = range(100)
>>> type(i)
<class 'range'>
Only when you consume it, will it actually be evaluated - in other words, it will only return the numbers in the range when you actually need them.
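You can see the difference directly in memory use (a quick sketch; the exact byte counts vary by platform and Python version):
import sys

r = range(1000000)
print(sys.getsizeof(r))        # tens of bytes: only start/stop/step are stored
print(sys.getsizeof(list(r)))  # several MB: every element is materialized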
A GitHub repo named python-patterns and Wikipedia tell us what lazy evaluation is:
it delays the evaluation of an expression until its value is needed and avoids repeated evaluations.
range in Python 3 is not complete lazy evaluation, because it doesn't avoid repeated evaluation.
A more classic example for lazy evaluation is cached_property:
import functools

class cached_property(object):
    def __init__(self, function):
        self.function = function
        functools.update_wrapper(self, function)

    def __get__(self, obj, type_):
        if obj is None:
            return self
        val = self.function(obj)
        obj.__dict__[self.function.__name__] = val
        return val
cached_property (a.k.a. lazy_property) is a decorator that converts a function into a lazily evaluated property: the first time the property is accessed, the function is called to compute the result, and that value is reused on every later access.
e.g.:
class LogHandler:
    def __init__(self, file_path):
        self.file_path = file_path

    @cached_property
    def load_log_file(self):
        with open(self.file_path) as f:
            # the file is so big that it takes ~2s to read it all
            return f.read()

log_handler = LogHandler('./sys.log')
# only the first access will cost 2s.
print(log_handler.load_log_file)
# the return value is cached onto the log_handler obj.
print(log_handler.load_log_file)
To use the proper term, a Python object like range is designed around the call-by-need pattern, rather than full lazy evaluation.
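Note: since Python 3.8 the standard library ships an equivalent decorator, functools.cached_property, so the hand-rolled class above is only needed on older versions. A minimal sketch (the attribute name here is illustrative):
from functools import cached_property

class LogHandler:
    def __init__(self, file_path):
        self.file_path = file_path

    @cached_property
    def log_file(self):
        with open(self.file_path) as f:
            return f.read()  # runs once; the result is cached on the instance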
