Is there a way to copy an arbitrary generator in Python? - python

Among the best-known features of functional programming are lazy evaluation and infinite lists. In Python, one generally implements these features with generators. But one of the precepts of functional programming is immutability, and generators are not immutable. Just the opposite. Every time one calls next() on a generator, it changes its internal state.
A possible work-around would be to copy a generator before calling next() on it. That works for some generators such as count(). (Perhaps count() is not generator?)
from itertools import count
count_gen = count()
count_gen_copy = copy(count_gen)
print(next(count_gen), next(count_gen), next(count_gen)) # => 0 1 2
print(next(count_gen_copy), next(count_gen_copy), next(count_gen_copy)) # => 0 1 2
But if I define my own generator, e.g., my_count(), I can't copy it.
def my_count(n=0):
while True:
yield n
n += 1
my_count_gen = my_count()
my_count_gen_copy = copy(my_count_gen)
print(next(my_count_gen), next(my_count_gen), next(my_count_gen))
print(next(my_count_gen_copy), next(my_count_gen_copy), next(my_count_gen_copy))
I get an error message when I attempt to execute copy(my_count_gen): TypeError: can't pickle generator objects.
Is there a way around this, or is there some other approach?
Perhaps another way to ask this is: what is copy() copying when it copies copy_gen?
Thanks.
P.S. If I use __iter__() rather than copy(), the __iter__() version acts like the original.
my_count_gen = my_count()
my_count_gen_i = my_count_gen.__iter__()
print(next(my_count_gen), next(my_count_gen), next(my_count_gen)) # => 0 1 2
print(next(my_count_gen_i), next(my_count_gen_i), next(my_count_gen_i)) # => 3 4 5

There's no way to copy arbitrary generators in Python. The operation just doesn't make sense. A generator could depend on all sorts of other uncopyable resources, like file handles, database connections, locks, worker processes, etc. If a generator is holding a lock and you copied it, what would happen to the lock? If a generator is in the middle of a database transaction and you copy it, what would happen to the transaction?
The things you thought were copyable generators aren't generators at all. They're instances of other iterator classes. If you want to write your own iterator class, you can:
class MyCount:
def __init__(self, n=0):
self._n = n
def __iter__(self):
return self
def __next__(self):
retval = self._n
self._n += 1
return retval
Some iterators you write that way might even be reasonably copyable. Others, copy.copy will do something completely unreasonable and useless.

While copy doesn't make sense on a generator, you can effectively "copy" an iterator so that you can iterate it many times. The easiest way is to use tee from the itertools module.
def my_count(n=0):
while True:
yield n
n += 1
a, b, c = itertools.tee(my_count(), 3)
# now use a, b, c ...
This uses memory to cache the iterator's results and pass them on.

Related

When is it better to use an iterator than a generator? [duplicate]

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
iterator is a more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that does return self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
For example, a generator such as:
def squares(start, stop):
for i in range(start, stop):
yield i * i
generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
def __init__(self, start, stop):
self.start = start
self.stop = stop
def __iter__(self): return self
def __next__(self): # next in Python 2
if self.start >= self.stop:
raise StopIteration
current = self.start * self.start
self.start += 1
return current
iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, i.e.
def current(self):
return self.start
if you have any actual need for such extra functionality in your application.
What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
"when called, returns generator object"
yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
A Generator is an Iterator
Specifically, generator is a subtype of iterator.
>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is a function, that, when called, returns a generator:
>>> def a_function():
"just a function definition with yield in it"
yield
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.Iterator)
True
An Iterator is an Iterable
An Iterator is an Iterable,
>>> issubclass(collections.Iterator, collections.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.Iterable()
Traceback (most recent call last):
File "<pyshell#79>", line 1, in <module>
collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.Iterable) for element in (
(), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
Iterators require a next or __next__ method
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
File "<pyshell#80>", line 1, in <module>
collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.Iterator()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.Iterator) for element in (
(), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
From the documentation
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
(Emphasis added.)
So from this we learn that Generators are a (convenient) type of Iterator.
Example Iterator Objects
You might create object that implements the Iterator protocol by creating or extending your own object.
class Yes(collections.Iterator):
def __init__(self, stop):
self.x = 0
self.stop = stop
def __iter__(self):
return self
def next(self):
if self.x < self.stop:
self.x += 1
return 'yes'
else:
# Iterators must raise when done, else considered broken
raise StopIteration
__next__ = next # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
for _ in range(stop):
yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
('yes' for _ in range(stop))):
... print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
Conclusion
You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
Iterators are objects which use the next() method to get the following values of a sequence.
Generators are functions that produce or yield a sequence of values using the yield keyword.
Every next() method call on a generator object(for ex: f below) returned by a generator function (for ex: foo() below), generates the next value in the sequence.
When a generator function is called, it returns an generator object without even beginning the execution of the function. When the next() method is called for the first time, the function starts executing until it reaches a yield statement which returns the yielded value. The yield keeps track of what has happened, i.e. it remembers the last execution. And secondly, the next() call continues from the previous value.
The following example demonstrates the interplay between yield and the call to the next method on a generator object.
>>> def foo():
... print("begin")
... for i in range(3):
... print("before yield", i)
... yield i
... print("after yield", i)
... print("end")
...
>>> f = foo()
>>> next(f)
begin
before yield 0 # Control is in for loop
0
>>> next(f)
after yield 0
before yield 1 # Continue for loop
1
>>> next(f)
after yield 1
before yield 2
2
>>> next(f)
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Adding an answer because none of the existing answers specifically address the confusion in the official literature.
Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator - it has a next() method. When you call next(), the next value yielded by the generator function is returned.
Either the function or the object may be called the "generator" depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.
Despite all this confusion, one can seek out the Python language reference for the clear and final word:
The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.
So, in formal and precise usage, "generator" unqualified means generator object, not generator function.
The above references are for Python 2 but Python 3 language reference says the same thing. However, the Python 3 glossary states that
generator ... Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.
Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short few lines answer for people who are still not quite clear conceptually:
If you create your own iterator, it is a little bit involved - you have
to create a class and at least implement the iter and the next methods. But what if you don't want to go through this hassle and want to quickly create an iterator. Fortunately, Python provides a short-cut way to defining an iterator. All you need to do is define a function with at least 1 call to yield and now when you call that function it will return "something" which will act like an iterator (you can call next method and use it in a for loop). This something has a name in Python called Generator
Hope that clarifies a bit.
Examples from Ned Batchelder highly recommended for iterators and generators
A method without generators that do something to even numbers
def evens(stream):
them = []
for n in stream:
if n % 2 == 0:
them.append(n)
return them
while by using a generator
def evens(stream):
for n in stream:
if n % 2 == 0:
yield n
We don't need any list nor a return statement
Efficient for large/ infinite length stream ... it just walks and yield the value
Calling the evens method (generator) is as usual
num = [...]
for n in evens(num):
do_smth(n)
Generator also used to Break double loop
Iterator
A book full of pages is an iterable, A bookmark is an
iterator
and this bookmark has nothing to do except to move next
litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## StopIteration (Exception) as we got end of the iterator
To use Generator ... we need a function
To use Iterator ... we need next and iter
As been said:
A Generator function returns an iterator object
The Whole benefit of Iterator:
Store one element a time in memory
No-code 4 line cheat sheet:
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
Previous answers missed this addition: a generator has a close method, while typical iterators don’t. The close method triggers a StopIteration exception in the generator, which may be caught in a finally clause in that iterator, to get a chance to run some clean‑up. This abstraction makes it most usable in the large than simple iterators. One can close a generator as one could close a file, without having to bother about what’s underneath.
That said, my personal answer to the first question would be: iteratable has an __iter__ method only, typical iterators have a __next__ method only, generators has both an __iter__ and a __next__ and an additional close.
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since it’s more resilient: the close method an a greater composability with yield from. Locally, I may use iterators, but only if it’s a flat and simple structure (iterators does not compose easily) and if there are reasons to believe the sequence is rather short especially if it may be stopped before it reach the end. I tend to look at iterators as a low level primitive, except as literals.
For control flow matters, generators are an as much important concept as promises: both are abstract and composable.
It's difficult to answer the question without 2 other concepts: iterable and iterator protocol.
What is difference between iterator and iterable?
Conceptually you iterate over iterable with the help of corresponding iterator. There are a few differences that can help to distinguish iterator and iterable in practice:
One difference is that iterator has __next__ method, iterable does not.
Another difference - both of them contain __iter__ method. In case of iterable it returns the corresponding iterator. In case of iterator it returns itself.
This can help to distinguish iterator and iterable in practice.
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
list_iterator
What are iterables in python? list, string, range etc. What are iterators? enumerate, zip, reversed etc. We may check this using the approach above. It's kind of confusing. Probably it would be easier if we have only one type. Is there any difference between range and zip? One of the reasons to do this - range has a lot of additional functionality - we may index it or check if it contains some number etc. (see details here).
How can we create an iterator ourselves? Theoretically we may implement Iterator Protocol (see here). We need to write __next__ and __iter__ methods and raise StopIteration exception and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators. It seems to be by far the main method to create iterators in python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods; so it's not an iterator (or generator);
if you call its flow_from_dataframe() method you'll get DataFrameIterator that has those methods; but it doesn't implement StopIteration (which is not common in build-in iterators in python); in documentation we may read that "A DataFrameIterator yielding tuples of (x, y)" - again confusing usage of terminology;
we also have Sequence class in keras and that's custom implementation of a generator functionality (regular generators are not suitable for multithreading) but it doesn't implement __next__ and __iter__, rather it's a wrapper around generators (it uses yield statement);
Generator Function, Generator Object, Generator:
A Generator function is just like a regular function in Python but it contains one or more yield statements. Generator functions is a great tool to create Iterator objects as easy as possible. The Iterator object returend by generator function is also called Generator object or Generator.
In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next() which returns the next value from generator.
def fib(max):
a, b = 0, 1
for i in range(max):
yield a
a, b = b, a + b
print(fib(10)) #<generator object fib at 0x01342480>
for i in fib(10):
print(i) # 0 1 1 2 3 5 8 13 21 34
print(next(myfib)) #0
print(next(myfib)) #1
print(next(myfib)) #1
print(next(myfib)) #2
So a generator function is the easiest way to create an Iterator object.
Iterator:
Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements __iter__ and __next__ method (also called iterator protocol).
However, it is much easier to use generators function to create iterators because they simplify their creation, but a custom Iterator gives you more freedom and you can also implement other methods according to your requirements as shown in the below example.
class Fib:
def __init__(self,max):
self.current=0
self.next=1
self.max=max
self.count=0
def __iter__(self):
return self
def __next__(self):
if self.count>self.max:
raise StopIteration
else:
self.current,self.next=self.next,(self.current+self.next)
self.count+=1
return self.next-self.current
def __str__(self):
return "Generator object"
itobj=Fib(4)
print(itobj) #Generator object
for i in Fib(4):
print(i) #0 1 1 2
print(next(itobj)) #0
print(next(itobj)) #1
print(next(itobj)) #1
This thread covers in many details all the differences between the two, but wanted to add something on the conceptual difference between the two:
[...] an iterator as defined in the GoF book retrieves items from a collection, while a generator can produce items “out of thin air”. That’s why the Fibonacci sequence generator is a common example: an infinite series of numbers cannot be stored in a collection.
Ramalho, Luciano. Fluent Python (p. 415). O'Reilly Media. Kindle Edition.
Sure, it does not cover all the aspects but I think it gives a good notion when one can be useful.
You can compare both approaches for the same data:
def myGeneratorList(n):
for i in range(n):
yield i
def myIterableList(n):
ll = n*[None]
for i in range(n):
ll[i] = i
return ll
# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
print("{} {}".format(i1, i2))
# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
Besides, if you check the memory footprint, the generator takes much less memory as it doesn't need to store all the values in memory at the same time.
An iterable object is something which can be iterated (naturally). To do that, however, you will need something like an iterator object, and, yes, the terminology may be confusing. Iterable objects include a __iter__ method which will return the iterator object for the iterable object.
An iterator object is an object which implements the iterator protocol - a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).
One iterable object is a list object. It’s not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).
If you want to iterate through any collection, you will need to use its iterator:
things_iterator = iter(things)
for i in things_iterator:
print(i)
However, Python will automatically use the iterator, which is why you never see the above example. Instead you write:
for i in things:
print(i)
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object.
A generator object is like an iterator object in that it implements the iterator protocol. That’s good enough for most purposes. There are many examples of generators in the other answers.
In short, an iterator is an object which allows you to iterate through another object, whether it’s a collection or some other source of values. A generator is a simplified iterator which does more-or-less the same job, but is easier to implement.
Normally, you would go for a generator if that’s all you need. If, however, you’re building a more complex object which includes iteration among other features, you would use the iterator protocol instead.
I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.
Let’s start with the very basic:
Consider a list,
l = [1,2,3]
Let’s write an equivalent function:
def f():
return [1,2,3]
o/p of print(l): [1,2,3] &
o/p of print(f()) : [1,2,3]
Let’s make list l iterable: In python list is always iterable that means you can apply iterator whenever you want.
Let’s apply iterator on list:
iter_l = iter(l) # iterator applied explicitly
Let’s make a function iterable, i.e. write an equivalent generator function.
In python as soon as you introduce the keyword yield; it becomes a generator function and iterator will be applied implicitly.
Note: Every generator is always iterable with implicit iterator applied and here implicit iterator is the crux
So the generator function will be:
def f():
yield 1
yield 2
yield 3
iter_f = f() # which is iter(f) as iterator is already applied implicitly
So if you have observed, as soon as you made function f a generator, it is already iter(f)
Now,
l is the list, after applying iterator method "iter" it becomes,
iter(l)
f is already iter(f), after applying iterator method "iter" it
becomes, iter(iter(f)), which is again iter(f)
It's kinda you are casting int to int(x) which is already int and it will remain int(x).
For example o/p of :
print(type(iter(iter(l))))
is
<class 'list_iterator'>
Never forget this is Python and not C or C++
Hence the conclusion from above explanation is:
list l ~= iter(l)
generator function f == iter(f)
All generators are iterators but not vice versa.
from typing import Iterator
from typing import Iterable
from typing import Generator
class IT:
def __init__(self):
self.n = 0
def __iter__(self):
return self
def __next__(self):
if self.n == 4:
raise StopIteration
try:
return self.n
finally:
self.n += 1
def g():
for i in range(4):
yield i
def test(it):
print(f'type(it) = {type(it)}')
print(f'isinstance(it, Generator) = {isinstance(it, Generator)}')
print(f'isinstance(it, Iterator) = {isinstance(it, Iterator)}')
print(f'isinstance(it, Iterable) = {isinstance(it, Iterable)}')
print(next(it))
print(next(it))
print(next(it))
print(next(it))
try:
print(next(it))
except StopIteration:
print('boom\n')
print(f'issubclass(Generator, Iterator) = {issubclass(Generator, Iterator)}')
print(f'issubclass(Iterator, Iterable) = {issubclass(Iterator, Iterable)}')
print()
test(IT())
test(g())
Output:
issubclass(Generator, Iterator) = True
issubclass(Iterator, Iterable) = True
type(it) = <class '__main__.IT'>
isinstance(it, Generator) = False
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
type(it) = <class 'generator'>
isinstance(it, Generator) = True
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom

Shared python generator

I am trying to reproduce the reactive extensions "shared" observable concept with Python generators.
Say I have an API that gives me an infinite stream that I can use like this:
def my_generator():
for elem in the_infinite_stream():
yield elem
I could use this generator multiple times like so:
stream1 = my_generator()
stream2 = my_generator()
And the_infinite_stream() will be called twice (once for each generator).
Now say that the_infinite_stream() is an expensive operation. Is there a way to "share" the generator between multiple clients? It seems like tee would do that, but I have to know in advance how many independent generators I want.
The idea is that in other languages (Java, Swift) using the reactive extensions (RxJava, RxSwift) "shared" streams, I can conveniently duplicate the stream on the client side. I am wondering how to do that in Python.
Note: I am using asyncio
I took tee implementation and modified it such you can have various number of generators from infinite_stream:
import collections
def generators_factory(iterable):
it = iter(iterable)
deques = []
already_gone = []
def new_generator():
new_deque = collections.deque()
new_deque.extend(already_gone)
deques.append(new_deque)
def gen(mydeque):
while True:
if not mydeque: # when the local deque is empty
newval = next(it) # fetch a new value and
already_gone.append(newval)
for d in deques: # load it to all the deques
d.append(newval)
yield mydeque.popleft()
return gen(new_deque)
return new_generator
# test it:
infinite_stream = [1, 2, 3, 4, 5]
factory = generators_factory(infinite_stream)
gen1 = factory()
gen2 = factory()
print(next(gen1)) # 1
print(next(gen2)) # 1 even after it was produced by gen1
print(list(gen1)) # [2, 3, 4, 5] # the rest after 1
To cache only some amount of values you can change already_gone = [] into already_gone = collections.deque(maxlen=size) and add size=None parameter to generators_factory.
Consider simple class attributes.
Given
def infinite_stream():
"""Yield a number from a (semi-)infinite iterator."""
# Alternatively, `yield from itertools.count()`
yield from iter(range(100000000))
# Helper
def get_data(iterable):
"""Print the state of `data` per stream."""
return ", ".join([f"{x.__name__}: {x.data}" for x in iterable])
Code
class SharedIterator:
"""Share the state of an iterator with subclasses."""
_gen = infinite_stream()
data = None
#staticmethod
def modify():
"""Advance the shared iterator + assign new data."""
cls = SharedIterator
cls.data = next(cls._gen)
Demo
Given a tuple of client streams (A, B and C),
# Streams
class A(SharedIterator): pass
class B(SharedIterator): pass
class C(SharedIterator): pass
streams = A, B, C
let us modify and print the state of one iterator shared between them:
# Observe changed state in subclasses
A.modify()
print("1st access:", get_data(streams))
B.modify()
print("2nd access:", get_data(streams))
C.modify()
print("3rd access:", get_data(streams))
Output
1st access: A: 0, B: 0, C: 0
2nd access: A: 1, B: 1, C: 1
3rd access: A: 2, B: 2, C: 2
Although any stream can modify the iterator, the class attribute is shared between sub-classes.
See Also
Docs on asyncio.Queue - an async alternative to shared container
Post on the Observer Pattern + asyncio
You can call "tee" repeatedly to create multiple iterators as needed.
it = iter([ random.random() for i in range(100)])
base, it_cp = itertools.tee(it)
_, it_cp2 = itertools.tee(base)
_, it_cp3 = itertools.tee(base)
Sample: http://tpcg.io/ZGc6l5.
You can use single generator and "subscriber generators":
subscribed_generators = []
def my_generator():
while true:
elem = yield
do_something(elem) # or yield do_something(elem) depending on your actual use
def publishing_generator():
for elem in the_infinite_stream():
for generator in subscribed_generators:
generator.send(elem)
subscribed_generators.extend([my_generator(), my_generator()])
# Next is just ane example that forces iteration over `the_infinite_stream`
for elem in publishing_generator():
pass
Instead of generator-function you may also create a class with methods: __next__, __iter__, send, throw. That way you can modify MyGenerator.__init__ method to automatically add new instances of it to subscribed_generators.
This is somewhat similar to event-based approach with a "dumb implementation":
for elem in the_infinite_stream is similar to emitting event
for generator ...: generator.send is similar to sending event to each subscriber.
So one way to implement a "more complex but structured solution" would be to use event-based approach:
For example you can use asyncio.Event
Or some third-party solution like aiopubsub
For any of those approaches you should emit event for each element from the_infinite_stream, and your instances of my_generator should be subscribed to those events.
And other approaches can also be used and the best choice depends: on details of your task, on how are you using event-loop in asyncio. For example:
You can implement the_infinite_stream (or wrapper for it) as some class with "cursors" (objects that track current position in the stream for different subscribers); then each my_generator registers new cursor and uses it to get next item in the infinite stream. In this approach event-loop will not automatically revisit my_generator instances, which might be required if those instances "are not equal" (for example have some "priority balancing")
Intermediate generator calling all the instances of my_generator (as described earlier). In this approach each instance of my_generator is automatically revisited by event-loop. Most likely this approach is thread-safe.
Event-based approaches:
using asyncio.Event. Similar to use of intermediate generator. Not
thread-safe
aiopubsub.
something that uses Observer pattern
Make the_infinite_generator (or wrapper for it) to be "Singleton" that "caches" latest event. Some approaches were described in other answers. Another "caching" solutions can be used:
emit the same element once for each instance of the_infinite_generator (use class with custom __new__ method that tracks instances, or use the same instance of class that has a method returning "shifted" iterator over the_infinite_loop) until someone calls special method on
instance of the_infinite_generator (or on class): infinite_gen.next_cycle. In
this case there should always be some "last finalizing
generator/processor" that at the end of each event-loop's cycle will
do the_infinite_generator().next_cycle()
Similar to previous but same event is allowed to fire multiple times in the same my_generator instance (so they should watch for this case). In this approach the_infinite_generator().next_cycle() can be called "periodically" with loop.call_later or loop.cal_at. This approach might be needed if "subscribers" should be able to handle/analyze: delays, rate-limits, timeouts between events, etc.
Many other solutions are possible. It's hard to propose something specific without looking at your current implementation and without knowing what is the desired behavior of generators that use the_infinite_loop
If I understand your description of "shared" streams correctly, that you really need "one" the_infinite_stream generator and a "handler" for it. Example that tries to do this:
class StreamHandler:
def __init__(self):
self.__real_stream = the_infinite_stream()
self.__sub_streams = []
def get_stream(self):
sub_stream = [] # or better use some Queue/deque object. Using list just to show base principle
self.__sub_streams.append(sub_stream)
while True:
while sub_stream:
yield sub_stream.pop(0)
next(self)
def __next__(self):
next_item = next(self.__real_stream)
for sub_stream in self.__sub_steams:
sub_stream.append(next_item)
some_global_variable = StreamHandler()
# Or you can change StreamHandler.__new__ to make it singleton, or you can create an instance at the point of creation of event-loop
def my_generator():
for elem in some_global_variable.get_stream():
yield elem
But if all your my_generator objects are initialized at the same point of infinite stream, and "equally" iterated inside the loop, then this approach will introduce "unnecessary" memory overhead for each "sub_stream" (used as queue). Unnecessary: because those queues will always be the same (but that can be optimized: if there are some existing "empty" sub_stream than it can be re-used for new sub_streams with some changes to "pop-logic"). And many-many other implementations and nuances can be discussed
If you have a single generator, you can use one queue per "subscriber" and route events to each subscriber as the primary generator produces results.
This has the advantage of allowing the subscribers to move at their own pace, and it can be dropped in existing code with very little changes to the original source.
For example:
def my_gen():
...
m1 = Muxer(my_gen)
m2 = Muxer(my_gen)
consumer1(m1).start()
consumer2(m2).start()
As items are pulled from the primary generator they are inserted into queues for each listener. Listeners can subscribe any time by constructing a new Muxer():
import queue
from threading import Lock
from collections import namedtuple
class Muxer():
Entry = namedtuple('Entry', 'genref listeners, lock')
already = {}
top_lock = Lock()
def __init__(self, func, restart=False):
self.restart = restart
self.func = func
self.queue = queue.Queue()
with self.top_lock:
if func not in self.already:
self.already[func] = self.Entry([func()], [], Lock())
ent = self.already[func]
self.genref = ent.genref
self.lock = ent.lock
self.listeners = ent.listeners
self.listeners.append(self)
def __iter__(self):
return self
def __next__(self):
try:
e = self.queue.get_nowait()
except queue.Empty:
with self.lock:
try:
e = self.queue.get_nowait()
except queue.Empty:
try:
e = next(self.genref[0])
for other in self.listeners:
if not other is self:
other.queue.put(e)
except StopIteration:
if self.restart:
self.genref[0] = self.func()
raise
return e
Original source code, including test suite:
https://gist.github.com/earonesty/cafa4626a2def6766acf5098331157b3
The unit tests run many threads concurrently processing the same generated events in sequence. The code is order preserving, with a lock acquired during the single generator's access.
Caveats: the version here uses a singleton to gate access, otherwise it would be possible to accidentally evade its control over the contained generators. It also allows the contained generators to be "restartable", which was a useful feature for me a the time. There is no "close()" feature, simply because I didn't need it. This is an appropriate use case for __del__ however, since the last reference to a listener is the right time to clean up.

Chaining generators considered harmful?

I claim: Chaining generators in Python is memory-inefficient and renders them unusable for certain types of applications. If possible, please prove me wrong.
First, a very simple and straight-forward example without generators:
import gc
def cocktail_objects():
# find all Cocktail objects currently tracked by the garbage collector
return filter(lambda obj: isinstance(obj, Cocktail), gc.get_objects())
class Cocktail(object):
def __init__(self, ingredients):
# ingredients represents our object data, imagine some heavy arrays
self.ingredients = ingredients
def __str__(self):
return self.ingredients
def __repr__(self):
return 'Cocktail(' + str(self) + ')'
def create(first_ingredient):
return Cocktail(first_ingredient)
def with_ingredient(cocktail, ingredient):
# this could be some data transformation function
return Cocktail(cocktail.ingredients + ' and ' + ingredient)
first_ingredients = ['rum', 'vodka']
print 'using iterative style:'
for ingredient in first_ingredients:
cocktail = create(ingredient)
cocktail = with_ingredient(cocktail, 'coke')
cocktail = with_ingredient(cocktail, 'limes')
print cocktail
print cocktail_objects()
This prints as expected:
rum and coke and limes
[Cocktail(rum and coke and limes)]
vodka and coke and limes
[Cocktail(vodka and coke and limes)]
Now let's use iterator objects to make the cocktail transformation easier composable:
class create_iter(object):
def __init__(self, first_ingredients):
self.first_ingredients = first_ingredients
self.i = 0
def __iter__(self):
return self
def next(self):
try:
ingredient = self.first_ingredients[self.i]
except IndexError:
raise StopIteration
else:
self.i += 1
return create(ingredient)
class with_ingredient_iter(object):
def __init__(self, cocktails_iter, ingredient):
self.cocktails_iter = cocktails_iter
self.ingredient = ingredient
def __iter__(self):
return self
def next(self):
cocktail = next(self.cocktails_iter)
return with_ingredient(cocktail, self.ingredient)
print 'using iterators:'
base = create_iter(first_ingredients)
with_coke = with_ingredient_iter(base, 'coke')
with_coke_and_limes = with_ingredient_iter(with_coke, 'limes')
for cocktail in with_coke_and_limes:
print cocktail
print cocktail_objects()
The output is identical to before.
Finally, let's replace iterators with generators to get rid of boiler-plate:
def create_gen(first_ingredients):
for ingredient in first_ingredients:
yield create(ingredient)
def with_ingredient_gen(cocktails_gen, ingredient):
for cocktail in cocktails_gen:
yield with_ingredient(cocktail, ingredient)
print 'using generators:'
base = create_gen(first_ingredients)
with_coke = with_ingredient_gen(base, 'coke')
with_coke_and_limes = with_ingredient_gen(with_coke, 'limes')
for cocktail in with_coke_and_limes:
print cocktail
print cocktail_objects()
This however prints:
rum and coke and limes
[Cocktail(rum), Cocktail(rum and coke), Cocktail(rum and coke and limes)]
vodka and coke and limes
[Cocktail(vodka), Cocktail(vodka and coke), Cocktail(vodka and coke and limes)]
This means that in a chain of generators, all currently yielded objects in that chain stay in memory and don't get released, even though the ones in earlier chain positions aren't needed anymore. Result: higher than necessary memory consumption.
Now, the question is: Why do the generators hold on to the objects that they are yielding until the next iteration starts? Obviously the objects aren't needed anymore in the generators and the references to them could be released.
I'm using generators in one of my projects to transform heavy data (numpy arrays of hundreds of megabytes) in a kind of pipeline. But as you can see this is very inefficient memory-wise. I am using Python 2.7. If this is a behaviour that got fixed in Python 3, please tell me. Otherwise, does this qualify for a bug report? And most importantly, are there any work-arounds except rewriting as shown?
Work-around 1:
print 'using imap:'
from itertools import imap
base = imap(lambda ingredient: create(ingredient), first_ingredients)
with_coke = imap(lambda cocktail: with_ingredient(cocktail, 'coke'), base)
with_coke_and_limes = imap(lambda cocktail: with_ingredient(cocktail, 'limes'), with_coke)
for cocktail in with_coke_and_limes:
print cocktail
print gc.collect()
print cocktail_objects()
Obviously this would be only useable if no state needs to be kept between the "yields". In the examples this is the case.
Preliminary conclusion: If you use iterator classes, then you decide what state you want to keep. If you use generators, Python implicitly decides what state to keep. If you use itertools.imap you cannot keep any state.
Your with_coke_and_limes yields at a certain point in its execution. At that point, the function has a local variable called cocktail (from its for loop) that refers to the "intermediate" cocktail from the next step up in the generator nesting (i.e., "rum and coke"). Just because the generator yields at that point does not mean it can throw away that object. The execution of with_ingredient_gen is suspended at that point, and at that point the local variable cocktail still exists. The function might need to refer to it later, after it resumes. There's nothing that says the yield has to be the last thing in your for loop, or that there has to be only one yield. You could have written with_ingredient_gen like this:
def with_ingredient_gen(cocktails_gen, ingredient):
for cocktail in cocktails_gen:
yield with_ingredient(cocktail, ingredient)
yield with_ingredient(cocktail, "another ingredient")
If Python threw away cocktail after the first yield, what would it do when it resumed the generator on the next iteration and found it needed that cocktail object again for the second yield?
The same applies to the other generators in the chain. Once you advance with_coke_and_limes to create a cocktail, with_coke and base are also activated and then suspended, and they have local variables referring to their own intermediate cocktails. Just as described above, these functions cannot delete the objects they refer to, since they might need them after resuming.
The generator function has to have some sort of reference to an object in order to yield it. And it has to keep that reference after it yields it, because it is suspended immediately after it yields, but it can't know whether it will need the reference once it is resumed.
Note that the only reason you didn't see the intermediate objects in your first example is because you overwrote the same local variable with each successive cocktail, allowing the earlier cocktail objects to be released. If in your first code snippet you do this instead:
for ingredient in first_ingredients:
cocktail = create(ingredient)
cocktail2 = with_ingredient(cocktail, 'coke')
cocktail3 = with_ingredient(cocktail, 'limes')
print cocktail3
print cocktail_objects()
...then you will see all three intermediate cocktails printed in that case as well, because each now has a separate local variable referring to it. Your generator version splits each of these intermediate variables into separate functions, so you can't overwrite the "parent" cocktail with the "derived" cocktail.
You are right that this can cause a problem if you have a deeply nested sequence of generators, each one of which creates large objects in memory and stores them in local variables. However, that is not a common situation. In such a situation, you have a couple of options. One is to perform the operations in a "flat" iterative style as in your first example.
Another option is to write your intermediate generators so that they don't actually create the large objects, but only "stack" the information needed to do so. For instance, in your example, if you don't want the intermediate Cocktail objects, don't create them. Instead of having each generator create a Cocktail and then having the next generator extract the previous cocktail's ingredients, have the generators pass on just the ingredients and have one final generator that combines the stacked ingredients and creates just one cocktail at the end.
It's hard to say exactly how to do this for your real application, but it may be possible. For instance, if your generators working on numpy arrays are doing things like add this, subtract that, transpose, etc., you can pass on "deltas" that describe what to do without actually doing it. Instead of having an intermediate generator, say, multiply an array by 3 and yield the array, have it yield some kind of indicator like "*3" (or possibly even a function doing the multiplication). Then your last generator can iterate over these "instructions" and perform the operations all in one place.

Subclassing and overriding a generator function in python

I need to override a method of a parent class, which is a generator, and am wondering the correct way to do this. Is there anything wrong with the following, or a more efficient way?
class A:
def gen(self):
yield 1
yield 2
class B(A):
def gen(self):
yield 3
for n in super().gen():
yield n
For Python 3.3 and up, the best, most general way to do this is:
class A:
def gen(self):
yield 1
yield 2
class B(A):
def gen(self):
yield 3
yield from super().gen()
This uses the new yield from syntax for delegating to a subgenerator. It's better than the other solutions because it's actually handing control to the generator it delegates to; if said generator supports .send and .throw to pass values and exceptions into the generator, then delegation means it actually receives the values; explicitly looping and yielding one by one will receive the values in the gen wrapper, not the generator actually producing the values, and the same problem applies to other solutions like using itertools.chain.
What you have looks fine, but is not the only approach. What's important about a generator function is that it returns an iterable object. Your subclass could thus instead directly create an iterable, for example:
import itertools
class B(A):
def gen(self):
return itertools.chain([3], super().gen())
The better approach is going to depend on exactly what you're doing; the above looks needlessly complex, but I wouldn't want to generalize from such a simple example.
To call a method from a subclass you need the keyword super.
New Source Code:
class B(A):
def gen(self):
yield 3
for n in super().gen():
yield n
This:
b = B()
for i in b.gen():
print(i)
produces the output:
3
1
2
In the first Iteration your generator stops at '3', for the following iterations it just goes on as the superclass normally would.
This Question provides a really good and lengthy explanation of generators, iterators and the yield- keyword:
What does the "yield" keyword do in Python?
Your code is correct.
Or rather, I don't see problem in it and it apparently runs correctly.
The only thing I can think of is the following one.
.
Post-scriptum
For new-style classes, see other answers that use super()
But super() only works for new-style classes
Anyway, this answer could be useful at least, but only, for classic-style classes.
.
When the interpreter arrives on the instruction for n in A.gen(self):, it must find the function A.gen.
The notation A.gen doesn't mean that the object A.gen is INSIDE the object A.
The object A.gen is SOMEWHERE in the memory and the interpreter will know where to find it by obtaining the needed information (an address) from A.__dict__['gen'] , in which A.__dict__ is the namespace of A.
So, finding the function object A.gen in the memory requires a lookup in A.__dict__
But to perform this lookup, the interpreter must first find the object A itself.
So, when it arrives on the instruction for n in A.gen(self): , it first searches if the identifier A is among the local identifiers, that is to say it searches for the string 'A' in the local namespace of the function (of which I don't know the name).
Since it is not, the interpreter goes outside the function and searches for this identifier at the module level, in the global namespace (which is globals() )
At this point, it may be that the global namespace would have hundreds or thousands of attributes names among which to perform the lookup for 'A'.
However, A has very few attributes: its __dict__ 's keys are only '_ module _' , 'gen' and '_ doc _' (to see that, make print A.__dict__ )
So, it would be a pity that the little search for the string 'gen' in A._dict_ should be done after a search among hundreds of items in the dictionary-namespace globals() of the module level.
.
That's why I suggest another way to make the interpreter able to find the function A.gen
class A:
def gen(self):
yield 1
yield 2
class BB(A):
def gen(self):
yield 3
for n in self.__class__.__bases__[0].gen(self):
yield n
bb = BB()
print list(bb.gen()) # prints [3, 1, 2]
self._class_ is the class from which has been instanciated the instance, that is to say it is Bu
self._class_._bases_ is a tuple containing the base classes of Bu
Presently there is only one element in this tuple , so self._class_._bases_[0] is A
__class__ and __bases__ are names of special attributes that aren't listed in _dict_ ;
In fact _class_ , _bases_ and _dict_ are special attributes of similar nature, they are Python-provided attributes, see:
http://www.cafepy.com/article/python_attributes_and_methods/python_attributes_and_methods.html
.
Well, what I mean , in the end, is that there are few elements in self._class_ and in self._class_._bases_ , so it is rational to think that the successive lookups in these objects to finally find the way to access to A.gen will be faster than the lookup to search for 'gen' in the global namespace in case this one contains hundreds of elements.
Maybe that's trying to do too much optimization, maybe not.
This answer is mainly to give information on the underlying implied mechanisms, that I personally find interesting to know.
.
Edit
You can obtain the same as your code with a more concise instruction
class A:
def gen(self):
yield 1
yield 2
class Bu(A):
def gen(self):
yield 3
for n in A.gen(self):
yield n
b = Bu()
print 'list(b.gen()) ==',list(b.gen())
from itertools import chain
w = chain(iter((3,)),xrange(1,3))
print 'list(w) ==',list(w)
produces
list(b.gen()) == [3, 1, 2]
list(w) == [3, 1, 2]
If A.gen() also may contain a return statement, then you also need to make sure your override returns with a value. This is easiest done as follows:
class A:
def gen(self):
yield 1
return 2
class B:
def gen(self):
yield 3
ret = yield from super().gen()
return ret
This gives:
>>> i = A.gen()
>>> next(i)
1
>>> next(i)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration: 2
>>> i = B.gen()
>>> next(i)
3
>>> next(i)
1
>>> next(i)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration: 2
Without an explicit return statement, the last line is StopIteration instaed of StopIteration: 2.

Difference between Python's Generators and Iterators

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
iterator is a more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that does return self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
For example, a generator such as:
def squares(start, stop):
for i in range(start, stop):
yield i * i
generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
def __init__(self, start, stop):
self.start = start
self.stop = stop
def __iter__(self): return self
def __next__(self): # next in Python 2
if self.start >= self.stop:
raise StopIteration
current = self.start * self.start
self.start += 1
return current
iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, i.e.
def current(self):
return self.start
if you have any actual need for such extra functionality in your application.
What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
"when called, returns generator object"
yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
A Generator is an Iterator
Specifically, generator is a subtype of iterator.
>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is a function, that, when called, returns a generator:
>>> def a_function():
"just a function definition with yield in it"
yield
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.Iterator)
True
An Iterator is an Iterable
An Iterator is an Iterable,
>>> issubclass(collections.Iterator, collections.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.Iterable()
Traceback (most recent call last):
File "<pyshell#79>", line 1, in <module>
collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.Iterable) for element in (
(), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
Iterators require a next or __next__ method
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
File "<pyshell#80>", line 1, in <module>
collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.Iterator()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.Iterator) for element in (
(), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
From the documentation
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
(Emphasis added.)
So from this we learn that Generators are a (convenient) type of Iterator.
Example Iterator Objects
You might create object that implements the Iterator protocol by creating or extending your own object.
class Yes(collections.Iterator):
def __init__(self, stop):
self.x = 0
self.stop = stop
def __iter__(self):
return self
def next(self):
if self.x < self.stop:
self.x += 1
return 'yes'
else:
# Iterators must raise when done, else considered broken
raise StopIteration
__next__ = next # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
for _ in range(stop):
yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
('yes' for _ in range(stop))):
... print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
Conclusion
You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
Iterators are objects which use the next() method to get the following values of a sequence.
Generators are functions that produce or yield a sequence of values using the yield keyword.
Every next() method call on a generator object(for ex: f below) returned by a generator function (for ex: foo() below), generates the next value in the sequence.
When a generator function is called, it returns an generator object without even beginning the execution of the function. When the next() method is called for the first time, the function starts executing until it reaches a yield statement which returns the yielded value. The yield keeps track of what has happened, i.e. it remembers the last execution. And secondly, the next() call continues from the previous value.
The following example demonstrates the interplay between yield and the call to the next method on a generator object.
>>> def foo():
... print("begin")
... for i in range(3):
... print("before yield", i)
... yield i
... print("after yield", i)
... print("end")
...
>>> f = foo()
>>> next(f)
begin
before yield 0 # Control is in for loop
0
>>> next(f)
after yield 0
before yield 1 # Continue for loop
1
>>> next(f)
after yield 1
before yield 2
2
>>> next(f)
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Adding an answer because none of the existing answers specifically address the confusion in the official literature.
Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator - it has a next() method. When you call next(), the next value yielded by the generator function is returned.
Either the function or the object may be called the "generator" depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.
Despite all this confusion, one can seek out the Python language reference for the clear and final word:
The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.
So, in formal and precise usage, "generator" unqualified means generator object, not generator function.
The above references are for Python 2 but Python 3 language reference says the same thing. However, the Python 3 glossary states that
generator ... Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.
Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short few lines answer for people who are still not quite clear conceptually:
If you create your own iterator, it is a little bit involved - you have
to create a class and at least implement the iter and the next methods. But what if you don't want to go through this hassle and want to quickly create an iterator. Fortunately, Python provides a short-cut way to defining an iterator. All you need to do is define a function with at least 1 call to yield and now when you call that function it will return "something" which will act like an iterator (you can call next method and use it in a for loop). This something has a name in Python called Generator
Hope that clarifies a bit.
Examples from Ned Batchelder highly recommended for iterators and generators
A method without generators that do something to even numbers
def evens(stream):
them = []
for n in stream:
if n % 2 == 0:
them.append(n)
return them
while by using a generator
def evens(stream):
for n in stream:
if n % 2 == 0:
yield n
We don't need any list nor a return statement
Efficient for large/ infinite length stream ... it just walks and yield the value
Calling the evens method (generator) is as usual
num = [...]
for n in evens(num):
do_smth(n)
Generator also used to Break double loop
Iterator
A book full of pages is an iterable, A bookmark is an
iterator
and this bookmark has nothing to do except to move next
litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## StopIteration (Exception) as we got end of the iterator
To use Generator ... we need a function
To use Iterator ... we need next and iter
As been said:
A Generator function returns an iterator object
The Whole benefit of Iterator:
Store one element a time in memory
No-code 4 line cheat sheet:
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
Previous answers missed this addition: a generator has a close method, while typical iterators don’t. The close method triggers a StopIteration exception in the generator, which may be caught in a finally clause in that iterator, to get a chance to run some clean‑up. This abstraction makes it most usable in the large than simple iterators. One can close a generator as one could close a file, without having to bother about what’s underneath.
That said, my personal answer to the first question would be: iteratable has an __iter__ method only, typical iterators have a __next__ method only, generators has both an __iter__ and a __next__ and an additional close.
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since it’s more resilient: the close method an a greater composability with yield from. Locally, I may use iterators, but only if it’s a flat and simple structure (iterators does not compose easily) and if there are reasons to believe the sequence is rather short especially if it may be stopped before it reach the end. I tend to look at iterators as a low level primitive, except as literals.
For control flow matters, generators are an as much important concept as promises: both are abstract and composable.
It's difficult to answer the question without 2 other concepts: iterable and iterator protocol.
What is difference between iterator and iterable?
Conceptually you iterate over iterable with the help of corresponding iterator. There are a few differences that can help to distinguish iterator and iterable in practice:
One difference is that iterator has __next__ method, iterable does not.
Another difference - both of them contain __iter__ method. In case of iterable it returns the corresponding iterator. In case of iterator it returns itself.
This can help to distinguish iterator and iterable in practice.
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
list_iterator
What are iterables in python? list, string, range etc. What are iterators? enumerate, zip, reversed etc. We may check this using the approach above. It's kind of confusing. Probably it would be easier if we have only one type. Is there any difference between range and zip? One of the reasons to do this - range has a lot of additional functionality - we may index it or check if it contains some number etc. (see details here).
How can we create an iterator ourselves? Theoretically we may implement Iterator Protocol (see here). We need to write __next__ and __iter__ methods and raise StopIteration exception and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators. It seems to be by far the main method to create iterators in python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods; so it's not an iterator (or generator);
if you call its flow_from_dataframe() method you'll get DataFrameIterator that has those methods; but it doesn't implement StopIteration (which is not common in build-in iterators in python); in documentation we may read that "A DataFrameIterator yielding tuples of (x, y)" - again confusing usage of terminology;
we also have Sequence class in keras and that's custom implementation of a generator functionality (regular generators are not suitable for multithreading) but it doesn't implement __next__ and __iter__, rather it's a wrapper around generators (it uses yield statement);
Generator Function, Generator Object, Generator:
A Generator function is just like a regular function in Python but it contains one or more yield statements. Generator functions is a great tool to create Iterator objects as easy as possible. The Iterator object returend by generator function is also called Generator object or Generator.
In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next() which returns the next value from generator.
def fib(max):
a, b = 0, 1
for i in range(max):
yield a
a, b = b, a + b
print(fib(10)) #<generator object fib at 0x01342480>
for i in fib(10):
print(i) # 0 1 1 2 3 5 8 13 21 34
print(next(myfib)) #0
print(next(myfib)) #1
print(next(myfib)) #1
print(next(myfib)) #2
So a generator function is the easiest way to create an Iterator object.
Iterator:
Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements __iter__ and __next__ method (also called iterator protocol).
However, it is much easier to use generators function to create iterators because they simplify their creation, but a custom Iterator gives you more freedom and you can also implement other methods according to your requirements as shown in the below example.
class Fib:
def __init__(self,max):
self.current=0
self.next=1
self.max=max
self.count=0
def __iter__(self):
return self
def __next__(self):
if self.count>self.max:
raise StopIteration
else:
self.current,self.next=self.next,(self.current+self.next)
self.count+=1
return self.next-self.current
def __str__(self):
return "Generator object"
itobj=Fib(4)
print(itobj) #Generator object
for i in Fib(4):
print(i) #0 1 1 2
print(next(itobj)) #0
print(next(itobj)) #1
print(next(itobj)) #1
This thread covers in many details all the differences between the two, but wanted to add something on the conceptual difference between the two:
[...] an iterator as defined in the GoF book retrieves items from a collection, while a generator can produce items “out of thin air”. That’s why the Fibonacci sequence generator is a common example: an infinite series of numbers cannot be stored in a collection.
Ramalho, Luciano. Fluent Python (p. 415). O'Reilly Media. Kindle Edition.
Sure, it does not cover all the aspects but I think it gives a good notion when one can be useful.
You can compare both approaches for the same data:
def myGeneratorList(n):
for i in range(n):
yield i
def myIterableList(n):
ll = n*[None]
for i in range(n):
ll[i] = i
return ll
# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
print("{} {}".format(i1, i2))
# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
Besides, if you check the memory footprint, the generator takes much less memory as it doesn't need to store all the values in memory at the same time.
An iterable object is something which can be iterated (naturally). To do that, however, you will need something like an iterator object, and, yes, the terminology may be confusing. Iterable objects include a __iter__ method which will return the iterator object for the iterable object.
An iterator object is an object which implements the iterator protocol - a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).
One iterable object is a list object. It’s not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).
If you want to iterate through any collection, you will need to use its iterator:
things_iterator = iter(things)
for i in things_iterator:
print(i)
However, Python will automatically use the iterator, which is why you never see the above example. Instead you write:
for i in things:
print(i)
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object.
A generator object is like an iterator object in that it implements the iterator protocol. That’s good enough for most purposes. There are many examples of generators in the other answers.
In short, an iterator is an object which allows you to iterate through another object, whether it’s a collection or some other source of values. A generator is a simplified iterator which does more-or-less the same job, but is easier to implement.
Normally, you would go for a generator if that’s all you need. If, however, you’re building a more complex object which includes iteration among other features, you would use the iterator protocol instead.
I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.
Let’s start with the very basic:
Consider a list,
l = [1,2,3]
Let’s write an equivalent function:
def f():
return [1,2,3]
o/p of print(l): [1,2,3] &
o/p of print(f()) : [1,2,3]
Let’s make list l iterable: In python list is always iterable that means you can apply iterator whenever you want.
Let’s apply iterator on list:
iter_l = iter(l) # iterator applied explicitly
Let’s make a function iterable, i.e. write an equivalent generator function.
In python as soon as you introduce the keyword yield; it becomes a generator function and iterator will be applied implicitly.
Note: Every generator is always iterable with implicit iterator applied and here implicit iterator is the crux
So the generator function will be:
def f():
yield 1
yield 2
yield 3
iter_f = f() # which is iter(f) as iterator is already applied implicitly
So if you have observed, as soon as you made function f a generator, it is already iter(f)
Now,
l is the list, after applying iterator method "iter" it becomes,
iter(l)
f is already iter(f), after applying iterator method "iter" it
becomes, iter(iter(f)), which is again iter(f)
It's kinda you are casting int to int(x) which is already int and it will remain int(x).
For example o/p of :
print(type(iter(iter(l))))
is
<class 'list_iterator'>
Never forget this is Python and not C or C++
Hence the conclusion from above explanation is:
list l ~= iter(l)
generator function f == iter(f)
All generators are iterators but not vice versa.
from typing import Iterator
from typing import Iterable
from typing import Generator
class IT:
def __init__(self):
self.n = 0
def __iter__(self):
return self
def __next__(self):
if self.n == 4:
raise StopIteration
try:
return self.n
finally:
self.n += 1
def g():
for i in range(4):
yield i
def test(it):
print(f'type(it) = {type(it)}')
print(f'isinstance(it, Generator) = {isinstance(it, Generator)}')
print(f'isinstance(it, Iterator) = {isinstance(it, Iterator)}')
print(f'isinstance(it, Iterable) = {isinstance(it, Iterable)}')
print(next(it))
print(next(it))
print(next(it))
print(next(it))
try:
print(next(it))
except StopIteration:
print('boom\n')
print(f'issubclass(Generator, Iterator) = {issubclass(Generator, Iterator)}')
print(f'issubclass(Iterator, Iterable) = {issubclass(Iterator, Iterable)}')
print()
test(IT())
test(g())
Output:
issubclass(Generator, Iterator) = True
issubclass(Iterator, Iterable) = True
type(it) = <class '__main__.IT'>
isinstance(it, Generator) = False
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
type(it) = <class 'generator'>
isinstance(it, Generator) = True
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom

Categories