Ability to use generator-like functions without using yield? (Python 3.x)

There are some cases where it's convenient to use a generator with yield to pass data back to the caller over an extended period. Is there a way to do something similar to yield without having to make the function into a generator?
The reason for this is that in some cases I end up having to make all callees into generators, when those nested functions may have useful return values.
# currently this works fine, but requires a return arg
def nested(return_store):
    return_store[0] = some_test()
    yield from some_generator()
def do_stuff():
    yield some_data
    for more_data in data:
        yield more_data
    # Annoying workaround!
    return_store = [None]
    yield from nested(return_store)
    if return_store[0]:
        pass  # do anything
def main():
    return Reply(do_stuff())
Instead, I'd like to pass an object as an argument and hand values to it (instead of using yield):
# is something like this possible?
def nested(iter_obj):
    iter_obj.yield_replacement(some_generator())
    return some_test()
def do_stuff(iter_obj):
    iter_obj.yield_replacement(some_data)
    for more_data in data:
        iter_obj.yield_replacement(more_data)
    # No annoying workaround
    if nested(iter_obj):
        pass  # do anything
def main():
    iter_obj = yield_replacement_object(consumer=print)
    # sets up the generator (Reply should consume iter_obj)
    do_stuff(iter_obj)
    return Reply(iter_obj)

Generators are just one form of iterators. Anything that implements the iterator protocol will do.
This means you can replace your nested function with an object with more attributes:
class Nested:
    def __iter__(self):
        self.some_flag = some_test()
        yield data
I even implemented the __iter__ method as a generator function.
Then use the object in your generator at will:
n = Nested()
yield from n
if n.some_flag:
    ...  # act on the flag
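A self-contained version of the approach above might look like this. It is a sketch that assumes a hypothetical stand-in for the question's `some_test()` and passes the data to `Nested` as a list; the flag becomes readable once `yield from` has iterated the object:

```python
def some_test():
    # Hypothetical stand-in for the question's some_test()
    return True


class Nested:
    """Iterable whose __iter__ is a generator and which exposes extra state."""

    def __init__(self, items):
        self.items = items
        self.some_flag = None  # filled in once iteration starts

    def __iter__(self):
        self.some_flag = some_test()
        yield from self.items


def do_stuff():
    yield "first"
    n = Nested(["a", "b"])
    yield from n           # delegates to the generator returned by n.__iter__()
    if n.some_flag:        # the "return value" is read from the object afterwards
        yield "flag was set"


print(list(do_stuff()))  # ['first', 'a', 'b', 'flag was set']
```

The caller never needs a mutable `return_store` list; the extra result simply lives on the object it already holds.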
Another method is to throw exceptions; if you are trying to communicate some out-of-band state change, throw an exception and catch it in the parent generator.
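A minimal sketch of the exception route, assuming a hypothetical `FlagRaised` exception class: the nested generator raises once it has something to report, and the exception propagates through `yield from` into the parent, which catches it. A custom exception is used because raising `StopIteration` inside a generator is turned into a `RuntimeError` on Python 3.7+.

```python
class FlagRaised(Exception):
    """Hypothetical exception type carrying out-of-band state."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def nested():
    yield 1
    yield 2
    # Report state to the caller instead of returning it
    raise FlagRaised("done")


def parent():
    try:
        yield from nested()
    except FlagRaised as exc:
        # The exception surfaces here, after nested() has yielded everything
        yield f"nested reported {exc.value}"


print(list(parent()))  # [1, 2, 'nested reported done']
```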

Related

When is it better to use an iterator than a generator? [duplicate]

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
iterator is a more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that returns self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
For example, a generator such as:
def squares(start, stop):
    for i in range(start, stop):
        yield i * i
generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
    def __iter__(self): return self
    def __next__(self):  # next in Python 2
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current
iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, e.g.
    def current(self):
        return self.start
if you have any actual need for such extra functionality in your application.
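Putting the pieces together, here is a runnable Python 3 version (so `__next__` only) that checks the class against the generator and exercises the extra `current()` method; the start/stop values are arbitrary:

```python
def squares(start, stop):
    for i in range(start, stop):
        yield i * i


class Squares:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current

    def current(self):
        # Extra method a plain generator cannot offer: the next base value
        return self.start


it = Squares(2, 5)
print(next(it), it.current())            # 4 3
print(list(it) == list(squares(3, 5)))   # True: both give [9, 16]
```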
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
    "when called, returns generator object"
    yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
A Generator is an Iterator
Specifically, generator is a subtype of iterator.
>>> import collections.abc, types
>>> issubclass(types.GeneratorType, collections.abc.Iterator)
True
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is a function, that, when called, returns a generator:
>>> def a_function():
...     "just a function definition with yield in it"
...     yield
...
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.abc.Iterator)
True
An Iterator is an Iterable
An Iterator is an Iterable,
>>> issubclass(collections.abc.Iterator, collections.abc.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.abc.Iterable()
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    collections.abc.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.abc.Iterable) for element in (
...         (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
Iterators require a next or __next__ method
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
File "<pyshell#80>", line 1, in <module>
collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.abc.Iterator()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.abc.Iterator) for element in (
...         (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
From the documentation
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
(Emphasis added.)
So from this we learn that Generators are a (convenient) type of Iterator.
Example Iterator Objects
You might create an object that implements the Iterator protocol by creating or extending your own class.
class Yes(collections.abc.Iterator):
    def __init__(self, stop):
        self.x = 0
        self.stop = stop
    def __iter__(self):
        return self
    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration
    __next__ = next  # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
    for _ in range(stop):
        yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
...                          ('yes' for _ in range(stop))):
...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
Conclusion
You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
Iterators are objects whose __next__ method (called through the built-in next()) returns the following values of a sequence.
Generators are functions that produce, or yield, a sequence of values using the yield keyword.
Each next() call on a generator object (for example f below) returned by a generator function (for example foo() below) generates the next value in the sequence.
When a generator function is called, it returns a generator object without even beginning to execute the function body. When next() is called for the first time, the function starts executing until it reaches a yield statement, which returns the yielded value. The generator keeps track of where it paused, so each subsequent next() call resumes right after the previous yield.
The following example demonstrates the interplay between yield and the call to the next method on a generator object.
>>> def foo():
...     print("begin")
...     for i in range(3):
...         print("before yield", i)
...         yield i
...         print("after yield", i)
...     print("end")
...
>>> f = foo()
>>> next(f)
begin
before yield 0 # Control is in for loop
0
>>> next(f)
after yield 0
before yield 1 # Continue for loop
1
>>> next(f)
after yield 1
before yield 2
2
>>> next(f)
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Adding an answer because none of the existing answers specifically address the confusion in the official literature.
Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator: it has a next() method (__next__() in Python 3). When you call next(), the next value yielded by the generator function is returned.
Either the function or the object may be called the "generator" depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.
Despite all this confusion, one can seek out the Python language reference for the clear and final word:
The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.
So, in formal and precise usage, "generator" unqualified means generator object, not generator function.
The above references are for Python 2, but the Python 3 language reference says the same thing. However, the Python 3 glossary states that
generator ... Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.
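The standard library's `inspect` module draws the same line between the two meanings: `inspect.isgeneratorfunction()` is true for the function, `inspect.isgenerator()` for the object it returns. A short check (the function name is arbitrary):

```python
import inspect


def gen_func():
    yield 1


gen_obj = gen_func()  # calling the generator function creates the generator object

print(inspect.isgeneratorfunction(gen_func), inspect.isgenerator(gen_func))  # True False
print(inspect.isgeneratorfunction(gen_obj), inspect.isgenerator(gen_obj))    # False True
```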
Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short answer of a few lines for people who are still not quite clear conceptually:
If you create your own iterator, it is a little bit involved: you have to create a class and at least implement the __iter__ and __next__ methods. But what if you don't want to go through this hassle and want to quickly create an iterator? Fortunately, Python provides a shortcut for defining an iterator. All you need to do is define a function with at least one yield in it, and when you call that function it will return "something" that acts like an iterator (you can call next() on it and use it in a for loop). This something has a name in Python: a generator.
Hope that clarifies a bit.
Examples from Ned Batchelder, highly recommended for iterators and generators
A function without generators that does something with even numbers:
def evens(stream):
    them = []
    for n in stream:
        if n % 2 == 0:
            them.append(n)
    return them
while using a generator:
def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
We don't need a list or a return statement
Efficient for large or infinite-length streams: it just walks the stream and yields each value
Calling the evens generator function works as usual:
num = [...]
for n in evens(num):
    do_smth(n)
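To see the "large/infinite stream" point concretely, the generator version can be fed an endless source such as `itertools.count()` and cut off with `itertools.islice()`; the list-building version would never return on the same input:

```python
import itertools


def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n


# itertools.count() never ends; the generator still hands back values lazily
first_five = list(itertools.islice(evens(itertools.count()), 5))
print(first_five)  # [0, 2, 4, 6, 8]
```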
Generators can also be used to break out of a double loop
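One way to read the double-loop remark, as a sketch (the grid size and the search condition are made up for illustration): move the nested loops into a generator, so a single `break` in the consumer stops both.

```python
def pairs(rows, cols):
    # Flatten the two nested loops into one stream of (row, col) pairs
    for r in range(rows):
        for c in range(cols):
            yield r, c


found = None
for r, c in pairs(3, 4):
    if r * c == 6:
        found = (r, c)
        break  # a single break leaves both "loops"

print(found)  # (2, 3)
```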
Iterator
A book full of pages is an iterable; a bookmark is an iterator, and this bookmark has nothing to do except move to the next page.
litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## raises StopIteration, as we reached the end of the iterator
To create a generator ... we need a function (with yield in it)
To use an iterator ... we need next and iter
As has been said:
A generator function returns an iterator object
The whole benefit of an iterator:
It stores one element at a time in memory
No-code 4 line cheat sheet:
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
Previous answers missed this addition: a generator has a close method, while typical iterators don't. The close method raises a GeneratorExit exception inside the generator at the point where it is paused, so a finally clause in that generator gets a chance to run some clean-up. This abstraction makes generators more usable at scale than simple iterators. One can close a generator as one could close a file, without having to bother about what's underneath.
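A small demonstration of `close()`: it raises `GeneratorExit` at the paused `yield`, so both an `except GeneratorExit` handler and a `finally` clause run. The names are arbitrary; the handler re-raises so that `close()` completes normally.

```python
events = []


def reader():
    try:
        for i in range(10):
            yield i
    except GeneratorExit:
        # close() raises GeneratorExit at the paused yield
        events.append("GeneratorExit")
        raise  # re-raise so close() completes normally
    finally:
        # The finally clause is the natural home for clean-up code
        events.append("cleaned up")


g = reader()
print(next(g))               # 0
g.close()
print(events)                # ['GeneratorExit', 'cleaned up']
print(next(g, "exhausted"))  # exhausted: a closed generator yields nothing more
```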
That said, my personal answer to the first question would be: an iterable has an __iter__ method; an iterator also has a __next__ method (and its __iter__ returns itself); a generator has both __iter__ and __next__ plus an additional close method.
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since they are more resilient: the close method and a greater composability with yield from. Locally, I may use iterators, but only if it's a flat and simple structure (iterators do not compose easily) and if there are reasons to believe the sequence is rather short, especially if it may be stopped before it reaches the end. I tend to look at iterators as a low-level primitive, except as literals.
For control-flow matters, generators are as important a concept as promises: both are abstract and composable.
It's difficult to answer the question without 2 other concepts: iterable and iterator protocol.
What is difference between iterator and iterable?
Conceptually you iterate over iterable with the help of corresponding iterator. There are a few differences that can help to distinguish iterator and iterable in practice:
One difference is that an iterator has a __next__ method and an iterable does not.
Another difference - both of them contain __iter__ method. In case of iterable it returns the corresponding iterator. In case of iterator it returns itself.
This can help to distinguish iterator and iterable in practice.
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
list_iterator
What are iterables in Python? list, string, range, etc. What are iterators? enumerate, zip, reversed, etc. We may check this using the approach above. It's kind of confusing; it might be easier if there were only one type. Why is range an iterable while zip is an iterator? One of the reasons is that range has a lot of additional functionality: we can index it or check whether it contains some number, etc. (see details here).
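The same distinction can be checked with the abstract base classes in `collections.abc`, which also shows the extra features `range` offers (indexing, membership tests):

```python
from collections.abc import Iterable, Iterator

r = range(5)
z = zip([1, 2], "ab")

print(isinstance(r, Iterator), isinstance(r, Iterable))  # False True
print(isinstance(z, Iterator))                           # True
print(r[2], 3 in r)                                      # 2 True: extra range features
print(iter(z) is z, iter(r) is r)                        # True False
```

`iter(z) is z` holds because an iterator's `__iter__` returns itself, while `iter(r)` builds a fresh iterator each time, which is why a `range` can be looped over repeatedly.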
How can we create an iterator ourselves? In principle, we can implement the iterator protocol (see here). We need to write __next__ and __iter__ methods, raise a StopIteration exception, and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators, which seem to be by far the main way to create iterators in Python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods; so it's not an iterator (or generator);
if you call its flow_from_dataframe() method you'll get a DataFrameIterator that has those methods; but it never raises StopIteration (which is uncommon for built-in iterators in Python); in the documentation we may read "A DataFrameIterator yielding tuples of (x, y)", again a confusing usage of terminology;
we also have the Sequence class in Keras, which is a custom implementation of generator functionality (regular generators are not suitable for multithreading), but it doesn't implement __next__ and __iter__; rather, it's a wrapper around generators (it uses the yield statement);
Generator Function, Generator Object, Generator:
A generator function is just like a regular function in Python, but it contains one or more yield statements. Generator functions are a great tool for creating iterator objects as easily as possible. The iterator object returned by a generator function is also called a generator object or generator.
In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next() which returns the next value from generator.
def fib(max):
    a, b = 0, 1
    for i in range(max):
        yield a
        a, b = b, a + b
print(fib(10))  # <generator object fib at 0x01342480>
for i in fib(10):
    print(i)  # 0 1 1 2 3 5 8 13 21 34
myfib = fib(10)
print(next(myfib))  # 0
print(next(myfib))  # 1
print(next(myfib))  # 1
print(next(myfib))  # 2
So a generator function is the easiest way to create an Iterator object.
Iterator:
Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements __iter__ and __next__ method (also called iterator protocol).
However, it is much easier to use generator functions to create iterators because they simplify their creation, but a custom iterator gives you more freedom, and you can also implement other methods according to your requirements, as shown in the example below.
class Fib:
    def __init__(self, max):
        self.current = 0
        self.next = 1
        self.max = max
        self.count = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.count >= self.max:  # stop after `max` values, like fib(max) above
            raise StopIteration
        self.current, self.next = self.next, self.current + self.next
        self.count += 1
        return self.next - self.current
    def __str__(self):
        return "Generator object"
itobj = Fib(4)
print(itobj)  # Generator object
for i in Fib(4):
    print(i)  # 0 1 1 2
print(next(itobj))  # 0
print(next(itobj))  # 1
print(next(itobj))  # 1
This thread covers in great detail all the differences between the two, but I wanted to add something on the conceptual difference between them:
[...] an iterator as defined in the GoF book retrieves items from a collection, while a generator can produce items “out of thin air”. That’s why the Fibonacci sequence generator is a common example: an infinite series of numbers cannot be stored in a collection.
Ramalho, Luciano. Fluent Python (p. 415). O'Reilly Media. Kindle Edition.
Sure, it does not cover all the aspects, but I think it gives a good notion of when each one can be useful.
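The Fibonacci case from the quote, as an infinite generator: values are computed on demand and never stored in a collection, and `itertools.islice()` takes a finite prefix.

```python
import itertools


def fibonacci():
    # Produces values "out of thin air"; nothing is kept except the last two numbers
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


print(list(itertools.islice(fibonacci(), 10)))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```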
You can compare both approaches for the same data:
def myGeneratorList(n):
    for i in range(n):
        yield i
def myIterableList(n):
    ll = n*[None]
    for i in range(n):
        ll[i] = i
    return ll
# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))
# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
Besides, if you check the memory footprint, the generator takes much less memory as it doesn't need to store all the values in memory at the same time.
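A rough way to check the memory claim is `sys.getsizeof()`. It reports only the object's own size (not the integer objects a list points to), so the exact numbers vary by Python version and platform and are best read as a comparison rather than as figures:

```python
import sys


def myGeneratorList(n):
    for i in range(n):
        yield i


n = 1_000_000
gen = myGeneratorList(n)   # holds only a paused frame, whatever n is
lst = list(range(n))       # holds a pointer for every one of the n items

print(sys.getsizeof(gen), sys.getsizeof(lst))
```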
An iterable object is something which can be iterated (naturally). To do that, however, you will need something like an iterator object, and, yes, the terminology may be confusing. Iterable objects include an __iter__ method, which returns the iterator object for the iterable object.
An iterator object is an object which implements the iterator protocol - a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).
One iterable object is a list object. It’s not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).
If you want to iterate through any collection, you will need to use its iterator:
things_iterator = iter(things)
for i in things_iterator:
    print(i)
However, Python will automatically use the iterator, which is why you never see the above example. Instead you write:
for i in things:
    print(i)
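Roughly, the `for` loop above expands to the following: get an iterator with `iter()`, then call `next()` until `StopIteration` is raised. This is a simplified model for illustration (the real loop is implemented inside the interpreter), with a sample `things` list:

```python
things = ["a", "b", "c"]
seen = []

things_iterator = iter(things)          # same as things.__iter__()
while True:
    try:
        i = next(things_iterator)       # same as things_iterator.__next__()
    except StopIteration:
        break                           # iterator exhausted: leave the loop
    seen.append(i)
    print(i)
```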
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object.
A generator object is like an iterator object in that it implements the iterator protocol. That’s good enough for most purposes. There are many examples of generators in the other answers.
In short, an iterator is an object which allows you to iterate through another object, whether it’s a collection or some other source of values. A generator is a simplified iterator which does more-or-less the same job, but is easier to implement.
Normally, you would go for a generator if that’s all you need. If, however, you’re building a more complex object which includes iteration among other features, you would use the iterator protocol instead.
I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.
Let’s start with the very basics:
Consider a list,
l = [1,2,3]
Let’s write an equivalent function:
def f():
    return [1,2,3]
Output of print(l): [1, 2, 3]
Output of print(f()): [1, 2, 3]
Let’s make list l iterable. In Python, a list is always iterable, which means you can get an iterator from it whenever you want.
Let’s get an iterator from the list:
iter_l = iter(l)  # iterator obtained explicitly
Let’s make the function iterable, i.e. write an equivalent generator function.
In Python, as soon as you use the keyword yield in a function, it becomes a generator function, and calling it returns an iterator (a generator object) directly.
Note: every generator object is already an iterator, without any explicit iter() call, and this implicit iterator is the crux.
So the generator function will be:
def f():
    yield 1
    yield 2
    yield 3
iter_f = f()  # f() already returns an iterator, so iter(f()) gives back that same object
So, as you may have noticed, as soon as you make f a generator function, calling it already gives you an iterator.
Now,
l is the list; after applying the iterator function iter, it becomes
iter(l)
f() is already an iterator; applying iter to it, iter(f()), returns that same generator object, and iter(iter(f())) is still that object.
It's like calling int() on something that is already an int: you just get the int back.
For example, the output of:
print(type(iter(iter(l))))
is
<class 'list_iterator'>
Never forget this is Python and not C or C++
Hence the conclusion from the above explanation is:
list l ~= iter(l)
generator object f() == iter(f())
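The identities behind that conclusion can be checked with `is`: calling `iter()` on an iterator (including a generator object) returns the same object, while calling it on a list creates a new `list_iterator` each time.

```python
def f():
    yield 1
    yield 2
    yield 3


l = [1, 2, 3]
gen = f()

print(iter(gen) is gen)     # True: a generator object is its own iterator
print(iter(l) is l)         # False: a list hands out a separate list_iterator
it = iter(l)
print(iter(it) is it)       # True: iter() on an iterator returns it unchanged
print(type(iter(iter(l))))  # <class 'list_iterator'>
```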
All generators are iterators but not vice versa.
from collections.abc import Generator, Iterable, Iterator
class IT:
    def __init__(self):
        self.n = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.n == 4:
            raise StopIteration
        try:
            return self.n
        finally:
            self.n += 1
def g():
    for i in range(4):
        yield i
def test(it):
    print(f'type(it) = {type(it)}')
    print(f'isinstance(it, Generator) = {isinstance(it, Generator)}')
    print(f'isinstance(it, Iterator) = {isinstance(it, Iterator)}')
    print(f'isinstance(it, Iterable) = {isinstance(it, Iterable)}')
    print(next(it))
    print(next(it))
    print(next(it))
    print(next(it))
    try:
        print(next(it))
    except StopIteration:
        print('boom\n')
print(f'issubclass(Generator, Iterator) = {issubclass(Generator, Iterator)}')
print(f'issubclass(Iterator, Iterable) = {issubclass(Iterator, Iterable)}')
print()
test(IT())
test(g())
Output:
issubclass(Generator, Iterator) = True
issubclass(Iterator, Iterable) = True
type(it) = <class '__main__.IT'>
isinstance(it, Generator) = False
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
type(it) = <class 'generator'>
isinstance(it, Generator) = True
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom

Shared python generator

I am trying to reproduce the reactive extensions "shared" observable concept with Python generators.
Say I have an API that gives me an infinite stream that I can use like this:
def my_generator():
    for elem in the_infinite_stream():
        yield elem
I could use this generator multiple times like so:
stream1 = my_generator()
stream2 = my_generator()
And the_infinite_stream() will be called twice (once for each generator).
Now say that the_infinite_stream() is an expensive operation. Is there a way to "share" the generator between multiple clients? It seems like tee would do that, but I have to know in advance how many independent generators I want.
The idea is that in other languages (Java, Swift) using the reactive extensions (RxJava, RxSwift) "shared" streams, I can conveniently duplicate the stream on the client side. I am wondering how to do that in Python.
Note: I am using asyncio
I took the tee implementation and modified it so that you can create any number of generators from infinite_stream:
import collections
def generators_factory(iterable):
    it = iter(iterable)
    deques = []
    already_gone = []
    def new_generator():
        new_deque = collections.deque()
        new_deque.extend(already_gone)
        deques.append(new_deque)
        def gen(mydeque):
            while True:
                if not mydeque:            # when the local deque is empty
                    try:
                        newval = next(it)  # fetch a new value and
                    except StopIteration:
                        return             # source exhausted (avoids RuntimeError on Python 3.7+)
                    already_gone.append(newval)
                    for d in deques:       # load it to all the deques
                        d.append(newval)
                yield mydeque.popleft()
        return gen(new_deque)
    return new_generator
# test it:
infinite_stream = [1, 2, 3, 4, 5]
factory = generators_factory(infinite_stream)
gen1 = factory()
gen2 = factory()
print(next(gen1))  # 1
print(next(gen2))  # 1 even after it was produced by gen1
print(list(gen1))  # [2, 3, 4, 5] # the rest after 1
To cache only a limited number of values, you can change already_gone = [] into already_gone = collections.deque(maxlen=size) and add a size=None parameter to generators_factory.
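Here is what that bounded-cache variant could look like, as a sketch: `size` caps how many past values a late subscriber replays (`None` keeps everything, as before), and it also stops cleanly when a finite source runs out. The sample data and the `size=2` choice are only for illustration.

```python
import collections


def generators_factory(iterable, size=None):
    it = iter(iterable)
    deques = []
    # Bounded history: late subscribers only replay the last `size` values
    already_gone = collections.deque(maxlen=size)

    def new_generator():
        new_deque = collections.deque(already_gone)  # start from the cached history
        deques.append(new_deque)

        def gen(mydeque):
            while True:
                if not mydeque:
                    try:
                        newval = next(it)
                    except StopIteration:
                        return  # finite source exhausted
                    already_gone.append(newval)
                    for d in deques:  # fan the new value out to every subscriber
                        d.append(newval)
                yield mydeque.popleft()

        return gen(new_deque)

    return new_generator


factory = generators_factory(range(1, 6), size=2)
gen1 = factory()
print([next(gen1) for _ in range(3)])  # [1, 2, 3]
gen2 = factory()    # joins late: replays only the last 2 values
print(list(gen2))   # [2, 3, 4, 5]
print(list(gen1))   # [4, 5]
```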
Consider simple class attributes.
Given
def infinite_stream():
    """Yield a number from a (semi-)infinite iterator."""
    # Alternatively, `yield from itertools.count()`
    yield from iter(range(100000000))
# Helper
def get_data(iterable):
    """Return the state of `data` per stream as a string."""
    return ", ".join([f"{x.__name__}: {x.data}" for x in iterable])
Code
class SharedIterator:
    """Share the state of an iterator with subclasses."""
    _gen = infinite_stream()
    data = None
    @staticmethod
    def modify():
        """Advance the shared iterator + assign new data."""
        cls = SharedIterator
        cls.data = next(cls._gen)
Demo
Given a tuple of client streams (A, B and C),
# Streams
class A(SharedIterator): pass
class B(SharedIterator): pass
class C(SharedIterator): pass
streams = A, B, C
let us modify and print the state of one iterator shared between them:
# Observe changed state in subclasses
A.modify()
print("1st access:", get_data(streams))
B.modify()
print("2nd access:", get_data(streams))
C.modify()
print("3rd access:", get_data(streams))
Output
1st access: A: 0, B: 0, C: 0
2nd access: A: 1, B: 1, C: 1
3rd access: A: 2, B: 2, C: 2
Any stream can advance the iterator, and because data is a class attribute of SharedIterator, the change is seen by every subclass.
See Also
Docs on asyncio.Queue - an async alternative to shared container
Post on the Observer Pattern + asyncio
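Along the lines of the `asyncio.Queue` pointer, a minimal fan-out sketch: one task reads the source once and copies every item into one queue per subscriber, with `None` as an end-of-stream sentinel. The source here is a hypothetical, finite stand-in for `the_infinite_stream()` so the demo ends, and the subscribers are fixed up front; supporting late joiners would need extra bookkeeping.

```python
import asyncio


async def the_infinite_stream():
    # Hypothetical stand-in for the expensive stream (finite so the demo ends)
    for i in range(3):
        await asyncio.sleep(0)
        yield i


async def broadcast(source, queues):
    # Read the source exactly once and copy each item into every subscriber queue
    async for item in source:
        for q in queues:
            await q.put(item)
    for q in queues:
        await q.put(None)  # sentinel: no more items


async def consume(name, q, out):
    while (item := await q.get()) is not None:
        out.append((name, item))


async def main():
    queues = [asyncio.Queue(), asyncio.Queue()]
    out = []
    await asyncio.gather(
        broadcast(the_infinite_stream(), queues),
        consume("A", queues[0], out),
        consume("B", queues[1], out),
    )
    return out


result = asyncio.run(main())
print(sorted(result))
```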
You can call "tee" repeatedly to create multiple iterators as needed.
import itertools
import random
it = iter([random.random() for i in range(100)])
base, it_cp = itertools.tee(it)
_, it_cp2 = itertools.tee(base)
_, it_cp3 = itertools.tee(base)
Sample: http://tpcg.io/ZGc6l5.
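To illustrate creating copies on demand, here is a sketch with a small hypothetical `subscribe()` helper: calling `tee()` on an existing tee iterator gives a new iterator that starts wherever that one currently is, so subscribers can be added later without knowing their number in advance. Because `base` is never advanced, every value stays buffered, which is the memory cost of this approach.

```python
import itertools


def subscribe(source):
    """Hypothetical helper: return a new iterator positioned where `source` is."""
    _, copy_ = itertools.tee(source)
    return copy_


base, first = itertools.tee(iter(range(5)))
print(next(first), next(first))  # 0 1
late = subscribe(base)           # created later, still starts where base is
print(list(late))                # [0, 1, 2, 3, 4]
print(list(first))               # [2, 3, 4]
```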
You can use a single generator and "subscriber generators":
subscribed_generators = []
def my_generator():
    while True:
        elem = yield
        do_something(elem)  # or yield do_something(elem) depending on your actual use
def publishing_generator():
    for elem in the_infinite_stream():
        for generator in subscribed_generators:
            generator.send(elem)
        yield elem
subscribers = [my_generator(), my_generator()]
for subscriber in subscribers:
    next(subscriber)  # prime each one so it is paused at `yield` and can accept send()
subscribed_generators.extend(subscribers)
# Next is just an example that forces iteration over `the_infinite_stream`
for elem in publishing_generator():
    pass
Instead of a generator function, you may also create a class with __next__, __iter__, send and throw methods. That way you can modify the MyGenerator.__init__ method to automatically add new instances of it to subscribed_generators.
This is somewhat similar to an event-based approach with a "dumb implementation":
for elem in the_infinite_stream is similar to emitting an event
for generator ...: generator.send is similar to sending the event to each subscriber.
So one way to implement a "more complex but structured solution" would be to use an event-based approach:
For example, you can use asyncio.Event
Or some third-party solution like aiopubsub
For any of those approaches you should emit an event for each element from the_infinite_stream, and your instances of my_generator should be subscribed to those events.
Other approaches can also be used, and the best choice depends on the details of your task and on how you are using the event loop in asyncio. For example:
You can implement the_infinite_stream (or a wrapper for it) as a class with "cursors" (objects that track the current position in the stream for different subscribers); then each my_generator registers a new cursor and uses it to get the next item in the infinite stream. In this approach the event loop will not automatically revisit my_generator instances, which might be required if those instances "are not equal" (for example, have some "priority balancing").
An intermediate generator calling all the instances of my_generator (as described earlier). In this approach each instance of my_generator is automatically revisited by the event loop. Most likely this approach is thread-safe.
Event-based approaches:
using asyncio.Event: similar to using an intermediate generator; not thread-safe
aiopubsub
something that uses the Observer pattern
Make the_infinite_generator (or a wrapper for it) a "singleton" that "caches" the latest event. Some approaches were described in other answers. Other "caching" solutions can also be used:
emit the same element once for each instance of the_infinite_generator (use a class with a custom __new__ method that tracks instances, or use the same instance of a class that has a method returning a "shifted" iterator over the_infinite_loop) until someone calls a special method on the instance of the_infinite_generator (or on the class): infinite_gen.next_cycle. In this case there should always be some "last finalizing generator/processor" that, at the end of each event-loop cycle, will call the_infinite_generator().next_cycle()
Similar to the previous one, but the same event is allowed to fire multiple times in the same my_generator instance (so they should watch for this case). In this approach the_infinite_generator().next_cycle() can be called "periodically" with loop.call_later or loop.call_at. This approach might be needed if "subscribers" should be able to handle/analyze delays, rate limits, timeouts between events, etc.
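A sketch of the "singleton that caches the latest event" option, assuming a class with a custom __new__ as suggested above; the name CachedStream and the list used as a source are hypothetical. Every CachedStream() call returns the same object, and its current element stays fixed until next_cycle() is called.

```python
class CachedStream:
    """Singleton that hands out the same element until next_cycle() is called."""

    _instance = None

    def __new__(cls, source=None):
        if cls._instance is None:
            if source is None:
                raise ValueError("the first call must supply a source iterable")
            instance = super().__new__(cls)
            instance._source = iter(source)
            instance.current = next(instance._source)
            cls._instance = instance
        return cls._instance

    def next_cycle(self):
        # Called once per event-loop cycle by the "last finalizing" processor
        self.current = next(self._source)


stream = CachedStream(iter([10, 20, 30]))  # hypothetical source
seen_by_a = CachedStream().current
seen_by_b = CachedStream().current  # same element for every subscriber
stream.next_cycle()
after_cycle = CachedStream().current
```

In an asyncio program, the "last finalizing processor" could call next_cycle() at the end of each cycle, or it could be scheduled periodically with loop.call_later as described above.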
Many other solutions are possible. It's hard to propose something specific without looking at your current implementation and without knowing the desired behavior of the generators that use the_infinite_loop.
If I understand your description of "shared" streams correctly, you really need "one" the_infinite_stream generator and a "handler" for it. Here is an example that tries to do this:
class StreamHandler:
    def __init__(self):
        self.__real_stream = the_infinite_stream()
        self.__sub_streams = []
    def get_stream(self):
        sub_stream = [] # or better use some Queue/deque object. Using list just to show base principle
        self.__sub_streams.append(sub_stream)
        while True:
            while sub_stream:
                yield sub_stream.pop(0)
            next(self)  # pull one item from the real stream into every sub_stream
    def __next__(self):
        next_item = next(self.__real_stream)
        for sub_stream in self.__sub_streams:
            sub_stream.append(next_item)
some_global_variable = StreamHandler()
# Or you can change StreamHandler.__new__ to make it singleton, or you can create an instance at the point of creation of event-loop
def my_generator():
    for elem in some_global_variable.get_stream():
        yield elem
But if all your my_generator objects are initialized at the same point of the infinite stream, and "equally" iterated inside the loop, then this approach will introduce "unnecessary" memory overhead for each "sub_stream" (used as a queue). Unnecessary because those queues will always be the same (but that can be optimized: if there are existing "empty" sub_streams, then they can be re-used for new sub_streams with some changes to the "pop logic"). And many other implementations and nuances can be discussed.
If you have a single generator, you can use one queue per "subscriber" and route events to each subscriber as the primary generator produces results.
This has the advantage of allowing the subscribers to move at their own pace, and it can be dropped into existing code with very few changes to the original source.
For example:
def my_gen():
    ...
m1 = Muxer(my_gen)
m2 = Muxer(my_gen)
consumer1(m1).start()
consumer2(m2).start()
As items are pulled from the primary generator they are inserted into queues for each listener. Listeners can subscribe any time by constructing a new Muxer():
import queue
from threading import Lock
from collections import namedtuple
class Muxer():
    Entry = namedtuple('Entry', 'genref listeners lock')
    already = {}
    top_lock = Lock()
    def __init__(self, func, restart=False):
        self.restart = restart
        self.func = func
        self.queue = queue.Queue()
        with self.top_lock:
            if func not in self.already:
                self.already[func] = self.Entry([func()], [], Lock())
            ent = self.already[func]
            self.genref = ent.genref
            self.lock = ent.lock
            self.listeners = ent.listeners
            self.listeners.append(self)
    def __iter__(self):
        return self
    def __next__(self):
        try:
            e = self.queue.get_nowait()
        except queue.Empty:
            with self.lock:
                try:
                    e = self.queue.get_nowait()
                except queue.Empty:
                    try:
                        e = next(self.genref[0])
                        for other in self.listeners:
                            if other is not self:
                                other.queue.put(e)
                    except StopIteration:
                        if self.restart:
                            self.genref[0] = self.func()
                        raise
        return e
Original source code, including test suite:
https://gist.github.com/earonesty/cafa4626a2def6766acf5098331157b3
The unit tests run many threads concurrently processing the same generated events in sequence. The code is order preserving, with a lock acquired during the single generator's access.
Caveats: the version here uses a singleton to gate access; otherwise it would be possible to accidentally evade its control over the contained generators. It also allows the contained generators to be "restartable", which was a useful feature for me at the time. There is no "close()" feature, simply because I didn't need it. This is an appropriate use case for __del__, however, since the last reference to a listener is the right time to clean up.

python: iterator from a function

What is an idiomatic way to create an infinite iterator from a function? For example
from itertools import islice
import random
rand_characters = to_iterator( random.randint(0,256) )
print(' '.join(map(str, islice(rand_characters, 100))))
would produce 100 random numbers
You want an iterator which continuously yields values until you stop asking it for new ones? Simply use
it = iter(function, sentinel)
which calls function() for each iteration step until the result == sentinel.
So choose a sentinel which can never be returned by your wanted function, such as None, in your case.
rand_iter = lambda start, end: iter(lambda: random.randint(start, end), None)
rand_bytes = rand_iter(0, 256)
If you want to monitor some state on your machine, you could do
iter_mystate = iter(getstate, None)
which, in turn, infinitely calls getstate() for each iteration step.
But beware of functions returning None as a valid value! In this case, you should choose a sentinel which is guaranteed to be unique, maybe an object created for exactly this job:
iter_mystate = iter(getstate, object())
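A short sketch of both uses of the two-argument iter(): an infinite stream of random byte values (randint never returns None, so None is a safe sentinel), and a finite one that stops when io.BytesIO.read returns b"" at the end of the data. functools.partial is used here to build the zero-argument callable that iter() requires.

```python
import io
import random
from functools import partial
from itertools import islice

# Infinite: randint never returns None, so the iterator never stops by itself
rand_bytes = iter(lambda: random.randint(0, 255), None)
sample = list(islice(rand_bytes, 100))  # take just the first 100 values

# Finite: read(4) returns b"" once the data is exhausted, which ends iteration
data = io.BytesIO(b"abcdefghij")
chunks = list(iter(partial(data.read, 4), b""))
```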
Every time I see iter with 2 arguments, I need to scratch my head and look up the documentation to figure out exactly what is going on. Simply because of that, I would probably roll my own:
def call_forever(callback):
    while True:
        yield callback()
Or, as stated in the comments by Jon Clements, you could use the repeatfunc recipe from the itertools documentation, which allows you to pass arguments to the function as well:
import itertools as it
def repeatfunc(func, times=None, *args):
    """
    Repeat calls to func with specified arguments.
    Example: repeatfunc(random.random)
    """
    if times is None:
        return it.starmap(func, it.repeat(args))
    return it.starmap(func, it.repeat(args, times))
Although I think that the function signature def repeatfunc(func,times=None,*args) is a little awkward. I'd prefer to pass a tuple as args (it seems more explicit to me, and "explicit is better than implicit"):
import itertools as it
def repeatfunc(func, args=(), times=None):
    """
    Repeat calls to func with specified arguments.
    Example: repeatfunc(random.random)
    """
    if times is None:
        return it.starmap(func, it.repeat(args))
    return it.starmap(func, it.repeat(args, times))
which allows it to be called like:
repeatfunc(func,(arg1,arg2,...,argN),times=4) #repeat 4 times
repeatfunc(func,(arg1,arg2,...)) #repeat infinitely
instead of the vanilla version from itertools:
repeatfunc(func,4,arg1,arg2,...) #repeat 4 times
repeatfunc(func,None,arg1,arg2,...) #repeat infinitely

Create default values for dictionary in python

Let's have a method that would cache results it calculates.
"If" approach:
def calculate1(input_values):
    if input_values not in calculate1.cache.keys():
        # do some calculation
        result = input_values
        calculate1.cache[input_values] = result
    return calculate1.cache[input_values]
calculate1.cache = {}
"Except" approach:
def calculate2(input_values):
    try:
        return calculate2.cache[input_values]
    except AttributeError:
        calculate2.cache = {}
    except KeyError:
        pass
    # do some calculation
    result = input_values
    calculate2.cache[input_values] = result
    return result
"get/has" approach:
def calculate3(input_values):
    if not hasattr(calculate3, 'cache'):
        calculate3.cache = {}
    result = calculate3.cache.get(input_values)
    if result is None:  # `if not result` would also recompute falsy cached values
        # do some calculation
        result = input_values
        calculate3.cache[input_values] = result
    return result
Is there another (faster) way? Which one is most pythonic? Which one would you use?
Note: There's a speed difference:
calculate = calculateX # depending on test run
for i in xrange(10000):
    calculate(datetime.utcnow())
Results (time python test.py):
calculate1: 0m9.579s
calculate2: 0m0.130s
calculate3: 0m0.095s
Use a collections.defaultdict. It's designed precisely for this purpose.
Of course; this is Python after all: Just use a defaultdict.
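A sketch contrasting the two standard-library tools. defaultdict calls its default_factory with no arguments, so it fits counting and grouping but cannot compute a value from the missing key; for a cache, a dict subclass with __missing__ (which does receive the key) fits better. slow_square and the calls list are illustrative only.

```python
from collections import defaultdict

# defaultdict: a missing key gets a value from a zero-argument factory
counts = defaultdict(int)
for word in ["a", "b", "a"]:
    counts[word] += 1


class Cache(dict):
    """dict whose missing keys are computed from the key and then stored."""

    def __init__(self, func):
        super().__init__()
        self.func = func

    def __missing__(self, key):
        # Only called by d[key] when key is absent
        value = self[key] = self.func(key)
        return value


calls = []


def slow_square(x):
    calls.append(x)  # record each real computation
    return x * x


squares = Cache(slow_square)
```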
Well, if you are trying to memoize something, it's best to use a Memoize class and decorators.
class Memoize(object):
    def __init__(self, func):
        self.func = func
        self.cache = {}
    def __call__(self, *args):
        if args not in self.cache:
            self.cache[args] = self.func(*args)
        return self.cache[args]
Now define some function to be memoized, say a key-strengthening function that computes, say, 1,000,000 chained MD5 hashes of a string:
import hashlib
def one_md5(init_str):
    return hashlib.md5(init_str.encode()).hexdigest()
@Memoize
def repeat_md5(cur_str, num=1000000, salt='aeb4f89a2'):
    for i in range(num):
        cur_str = one_md5(cur_str+salt)
    return cur_str
The @Memoize function decorator is equivalent to defining the function and then writing repeat_md5 = Memoize(repeat_md5). The first time you call it for a particular set of arguments, the function takes about a second to compute; the next time you call it, it's near-instantaneous because it reads from its cache.
As for the method of memoization: as long as you aren't doing something silly (like the first method, where you do if key in some_dict.keys() rather than if key in some_dict), there shouldn't be a significant difference. (The first method is bad because, in Python 2, keys() first builds a list from the dictionary and then searches that list, rather than just checking whether the key is in the dict; see Coding like a pythonista. In Python 3, keys() returns a view with fast lookup, but if key in some_dict is still the idiom.) Also, catching exceptions is slower than an if statement when the exception is actually raised (an exception object has to be created, and then the handler has to catch it).
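In Python 3 the standard library also ships a memoizing decorator, functools.lru_cache (functools.cache is an unbounded shortcut in 3.9+), which covers what the Memoize class above does for hashable arguments. A minimal sketch; square and the computed list are illustrative:

```python
from functools import lru_cache

computed = []


@lru_cache(maxsize=None)  # unbounded cache, like the Memoize class above
def square(x):
    computed.append(x)  # record each real computation
    return x * x


first = square(3)   # computed and stored
second = square(3)  # served from the cache
```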

Difference between Python's Generators and Iterators

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
iterator is a more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that does return self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
For example, a generator such as:
def squares(start, stop):
    for i in range(start, stop):
        yield i * i
generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop
    def __iter__(self): return self
    def __next__(self): # next in Python 2
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current
iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, e.g.
    def current(self):
        return self.start
if you have any actual need for such extra functionality in your application.
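A quick check that the three versions above produce the same values, plus the extra method that only the class can offer (a sketch; the arguments 2 and 6 stand in for a and b):

```python
def squares(start, stop):
    for i in range(start, stop):
        yield i * i


class Squares:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current

    def current(self):
        # Extra method a plain generator cannot offer
        return self.start


gen_result = list(squares(2, 6))
expr_result = list(i * i for i in range(2, 6))
it_obj = Squares(2, 6)
first = next(it_obj)          # 4
position = it_obj.current()   # 3: the base of the next square
rest = list(it_obj)           # [9, 16, 25]
```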
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
    "when called, returns generator object"
    yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
A Generator is an Iterator
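Specifically, generator is a subtype of iterator. (In Python 3 these abstract base classes live in collections.abc; the old aliases such as collections.Iterator were removed in Python 3.10.)
>>> import collections.abc, types
>>> issubclass(types.GeneratorType, collections.abc.Iterator)
True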
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is a function, that, when called, returns a generator:
>>> def a_function():
...     "just a function definition with yield in it"
...     yield
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.abc.Iterator)
True
An Iterator is an Iterable
An Iterator is an Iterable,
>>> issubclass(collections.abc.Iterator, collections.abc.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.abc.Iterable()
Traceback (most recent call last):
File "<pyshell#79>", line 1, in <module>
collections.abc.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.abc.Iterable) for element in (
...         (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
Iterators require a next or __next__ method
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
File "<pyshell#80>", line 1, in <module>
collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.abc.Iterator()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.abc.Iterator) for element in (
...         (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
From the documentation
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
(Emphasis added.)
So from this we learn that Generators are a (convenient) type of Iterator.
Example Iterator Objects
You might create an object that implements the Iterator protocol by creating or extending your own object.
class Yes(collections.abc.Iterator):
    def __init__(self, stop):
        self.x = 0
        self.stop = stop
    def __iter__(self):
        return self
    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration
    __next__ = next # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
    for _ in range(stop):
        yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
...                          ('yes' for _ in range(stop))):
...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
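Note that subclassing the collections.abc.Iterator ABC already provides an __iter__ that returns self, so a subclass only has to write __next__ (the Yes class above defines __iter__ anyway, which is harmless). A minimal sketch; Countdown is a hypothetical example:

```python
from collections.abc import Iterator


class Countdown(Iterator):
    """Only __next__ is written; the Iterator ABC supplies __iter__."""

    def __init__(self, start):
        self.current = start

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1


values = list(Countdown(3))  # [3, 2, 1]
```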
Conclusion
You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
Iterators are objects which use the next() method to get the following values of a sequence.
Generators are functions that produce or yield a sequence of values using the yield keyword.
Every next() call on a generator object (for example, f below) returned by a generator function (for example, foo() below) generates the next value in the sequence.
When a generator function is called, it returns a generator object without even beginning the execution of the function. When next() is called for the first time, the function starts executing until it reaches a yield statement, which returns the yielded value. The generator keeps track of what has happened, i.e. it remembers where execution stopped, and the next next() call continues from that point.
The following example demonstrates the interplay between yield and the call to the next method on a generator object.
>>> def foo():
... print("begin")
... for i in range(3):
... print("before yield", i)
... yield i
... print("after yield", i)
... print("end")
...
>>> f = foo()
>>> next(f)
begin
before yield 0 # Control is in for loop
0
>>> next(f)
after yield 0
before yield 1 # Continue for loop
1
>>> next(f)
after yield 1
before yield 2
2
>>> next(f)
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Adding an answer because none of the existing answers specifically address the confusion in the official literature.
Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator - it has a next() method. When you call next(), the next value yielded by the generator function is returned.
Either the function or the object may be called the "generator" depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.
Despite all this confusion, one can seek out the Python language reference for the clear and final word:
The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.
So, in formal and precise usage, "generator" unqualified means generator object, not generator function.
The above references are for Python 2 but Python 3 language reference says the same thing. However, the Python 3 glossary states that
generator ... Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.
Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short few lines answer for people who are still not quite clear conceptually:
If you create your own iterator, it is a little bit involved: you have to create a class and at least implement the __iter__ and __next__ methods. But what if you don't want to go through this hassle and want to quickly create an iterator? Fortunately, Python provides a shortcut for defining an iterator. All you need to do is define a function with at least one yield; now when you call that function it will return "something" which will act like an iterator (you can call next on it and use it in a for loop). This something has a name in Python: a generator.
Hope that clarifies a bit.
Examples from Ned Batchelder (highly recommended) for iterators and generators
A function without generators that does something to even numbers
def evens(stream):
    them = []
    for n in stream:
        if n % 2 == 0:
            them.append(n)
    return them
while by using a generator
def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
We don't need a list or a return statement
Efficient for large or infinite streams: it just walks the stream and yields each value
Calling the evens function (a generator) works as usual
num = [...]
for n in evens(num):
    do_smth(n)
Generators can also be used to break out of a double loop
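A sketch of the double-loop idea, assuming the goal is to stop both loops at the first match: a generator flattens the nested loops into one stream, so a single return (or break) leaves both. pairs, find and grid are hypothetical names.

```python
def pairs(rows):
    # Flatten the double loop into one stream of (row, col, value)
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            yield r, c, value


def find(rows, target):
    for r, c, value in pairs(rows):
        if value == target:
            return r, c  # one return exits both "loops" at once
    return None


grid = [[1, 2], [3, 4]]
```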
Iterator
A book full of pages is an iterable; a bookmark is an iterator, and this bookmark has nothing to do except move to the next page
litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## StopIteration (Exception) as we reached the end of the iterator
To use Generator ... we need a function
To use Iterator ... we need next and iter
As has been said:
A generator function returns an iterator object
The whole benefit of an iterator:
It stores one element at a time in memory
No-code 4 line cheat sheet:
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
Previous answers missed this addition: a generator has a close method, while typical iterators don’t. The close method raises a GeneratorExit exception inside the generator, at the point where it is paused, which may be caught (or handled by a finally clause) to get a chance to run some clean-up. This abstraction makes generators more usable at scale than simple iterators. One can close a generator as one could close a file, without having to bother about what’s underneath.
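A minimal sketch of close() in action: GeneratorExit is raised at the paused yield, so both an except GeneratorExit handler and a finally clause get to run. reader and log are illustrative names.

```python
log = []


def reader():
    try:
        for i in range(10):
            yield i
    except GeneratorExit:
        # close() raises GeneratorExit at the paused yield
        log.append("GeneratorExit")
        raise
    finally:
        # Clean-up runs whether the loop finished or close() was called
        log.append("cleanup")


g = reader()
first = next(g)
g.close()
after_close = next(g, "done")  # a closed generator is exhausted
```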
That said, my personal answer to the first question would be: an iterable has an __iter__ method only, an iterator has a __next__ method (plus an __iter__ that returns itself), and a generator has both an __iter__ and a __next__, plus an additional close (and send and throw).
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since they are more resilient: the close method and greater composability with yield from. Locally, I may use iterators, but only if it’s a flat and simple structure (iterators do not compose easily) and if there are reasons to believe the sequence is rather short, especially if it may be stopped before it reaches the end. I tend to look at iterators as a low-level primitive, except as literals.
For control-flow matters, generators are as important a concept as promises: both are abstract and composable.
It's difficult to answer the question without 2 other concepts: iterable and iterator protocol.
What is difference between iterator and iterable?
Conceptually you iterate over iterable with the help of corresponding iterator. There are a few differences that can help to distinguish iterator and iterable in practice:
One difference is that an iterator has a __next__ method, while an iterable does not.
Another difference - both of them contain __iter__ method. In case of iterable it returns the corresponding iterator. In case of iterator it returns itself.
This can help to distinguish iterator and iterable in practice.
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
<class 'list_iterator'>
What are iterables in python? list, string, range etc. What are iterators? enumerate, zip, reversed etc. We may check this using the approach above. It's kind of confusing. Probably it would be easier if we have only one type. Is there any difference between range and zip? One of the reasons to do this - range has a lot of additional functionality - we may index it or check if it contains some number etc. (see details here).
How can we create an iterator ourselves? Theoretically we may implement Iterator Protocol (see here). We need to write __next__ and __iter__ methods and raise StopIteration exception and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators. It seems to be by far the main method to create iterators in python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods; so it's not an iterator (or generator);
if you call its flow_from_dataframe() method you'll get a DataFrameIterator that has those methods; but it never raises StopIteration (which is unusual compared with Python's built-in iterators); in the documentation we may read that "A DataFrameIterator yielding tuples of (x, y)" - again a confusing usage of terminology;
we also have the Sequence class in keras, which is a custom implementation of generator functionality (regular generators are not suitable for multithreading), but it doesn't implement __next__ and __iter__; rather it's a wrapper around generators (it uses a yield statement);
Generator Function, Generator Object, Generator:
A generator function is just like a regular function in Python, but it contains one or more yield statements. Generator functions are a great tool for creating iterator objects as easily as possible. The iterator object returned by a generator function is also called a generator object or generator.
In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next() which returns the next value from generator.
def fib(max):
    a, b = 0, 1
    for i in range(max):
        yield a
        a, b = b, a + b
print(fib(10)) #<generator object fib at 0x01342480>
for i in fib(10):
    print(i) # 0 1 1 2 3 5 8 13 21 34
myfib = fib(10)
print(next(myfib)) #0
print(next(myfib)) #1
print(next(myfib)) #1
print(next(myfib)) #2
So a generator function is the easiest way to create an Iterator object.
Iterator:
Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements __iter__ and __next__ method (also called iterator protocol).
However, it is much easier to use generator functions to create iterators because they simplify their creation, but a custom iterator gives you more freedom, and you can also implement other methods according to your requirements, as shown in the example below.
class Fib:
    def __init__(self,max):
        self.current=0
        self.next=1
        self.max=max
        self.count=0
    def __iter__(self):
        return self
    def __next__(self):
        if self.count>=self.max:  # stop after max values
            raise StopIteration
        else:
            self.current,self.next=self.next,(self.current+self.next)
            self.count+=1
            return self.next-self.current
    def __str__(self):
        return "Generator object"
itobj=Fib(4)
print(itobj) #Generator object
for i in Fib(4):
    print(i) #0 1 1 2
print(next(itobj)) #0
print(next(itobj)) #1
print(next(itobj)) #1
This thread covers all the differences between the two in great detail, but I wanted to add something on the conceptual difference between the two:
[...] an iterator as defined in the GoF book retrieves items from a collection, while a generator can produce items “out of thin air”. That’s why the Fibonacci sequence generator is a common example: an infinite series of numbers cannot be stored in a collection.
Ramalho, Luciano. Fluent Python (p. 415). O'Reilly Media. Kindle Edition.
Sure, it does not cover all the aspects but I think it gives a good notion when one can be useful.
You can compare both approaches for the same data:
def myGeneratorList(n):
    for i in range(n):
        yield i
def myIterableList(n):
    ll = n*[None]
    for i in range(n):
        ll[i] = i
    return ll
# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))
# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
# Generator can be read several times if converted into a list
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
Besides, if you check the memory footprint, the generator takes much less memory as it doesn't need to store all the values in memory at the same time.
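A rough way to see the memory difference is sys.getsizeof, with the caveat that it is shallow: it counts the list's array of references but not the int objects they point to, so the real gap is even larger. A minimal sketch:

```python
import sys


def my_generator_list(n):
    for i in range(n):
        yield i


n = 100_000
gen = my_generator_list(n)
lst = list(range(n))

gen_size = sys.getsizeof(gen)    # small, and does not grow with n
list_size = sys.getsizeof(lst)   # grows with n (one reference per element)
total = sum(gen)                 # the generator still produces every value
```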
An iterable object is something which can be iterated (naturally). To do that, however, you will need something like an iterator object, and, yes, the terminology may be confusing. Iterable objects include a __iter__ method which will return the iterator object for the iterable object.
An iterator object is an object which implements the iterator protocol - a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).
One iterable object is a list object. It’s not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).
If you want to iterate through any collection, you will need to use its iterator:
things_iterator = iter(things)
for i in things_iterator:
    print(i)
However, Python will automatically use the iterator, which is why you never see the above example. Instead you write:
for i in things:
    print(i)
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object.
A generator object is like an iterator object in that it implements the iterator protocol. That’s good enough for most purposes. There are many examples of generators in the other answers.
In short, an iterator is an object which allows you to iterate through another object, whether it’s a collection or some other source of values. A generator is a simplified iterator which does more-or-less the same job, but is easier to implement.
Normally, you would go for a generator if that’s all you need. If, however, you’re building a more complex object which includes iteration among other features, you would use the iterator protocol instead.
I am writing this specifically for Python newcomers, in a very simple way, even though under the hood Python does a great deal more.
Let’s start with the basics:
Consider a list:
l = [1,2,3]
Let’s write an equivalent function:
def f():
    return [1,2,3]
The output of print(l) is [1, 2, 3], and the output of print(f()) is also [1, 2, 3].
A list is already iterable in Python, which means you can get an iterator from it whenever you want.
Let’s apply iter() to the list:
iter_l = iter(l)  # iterator obtained explicitly
Now let’s produce the same values from a function, i.e. write an equivalent generator function.
In Python, as soon as a function body contains the keyword yield, it becomes a generator function: calling it returns a generator object, and that object is already an iterator, so no explicit iter() call is needed.
Note: every generator object is iterable and is also its own iterator; this built-in iterator behaviour is the crux.
So the generator function will be:
def f():
    yield 1
    yield 2
    yield 3
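iter_f = f()  # a generator object; it is already an iterator, so iter(iter_f) is iter_f
So, as you can see, once f is a generator function, the object returned by calling f() is already an iterator. (The function f itself is not iterable: iter(f) raises TypeError.)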
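The standard library's inspect.isgeneratorfunction can confirm which functions are generator functions. The sketch below (with hypothetical function names plain and sneaky) also shows that the mere presence of yield is what matters: even an unreachable yield turns a function into a generator function, so calling it returns a generator instead of running the return statement.

```python
import inspect

def plain():
    return [1, 2, 3]

def f():
    yield 1
    yield 2
    yield 3

def sneaky():
    return 42
    yield  # never reached, but still makes this a generator function

print(inspect.isgeneratorfunction(plain))   # False
print(inspect.isgeneratorfunction(f))       # True
print(inspect.isgeneratorfunction(sneaky))  # True

result = sneaky()
print(type(result).__name__)                # generator
print(list(result))                         # [] (the return ends it at once)
```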
Now:
l is the list; applying the built-in iter to it gives iter(l), a new list_iterator object.
iter_f = f() is already an iterator; applying iter to it gives iter(iter_f), which is iter_f itself, and iter(iter(iter_f)) is still iter_f.
It is a bit like calling int(x) on a value x that is already an int: you simply get the same int back.
For example, applying iter twice to the list still gives a list iterator; the output of:
print(type(iter(iter(l))))
is
<class 'list_iterator'>
Never forget: this is Python, not C or C++.
Hence the conclusion from the above explanation is:
list l: iterable but not an iterator; iter(l) returns a separate list_iterator
generator object g = f(): already an iterator; iter(g) is g
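These conclusions can be checked directly with the is operator, reusing l and the generator function f from above (redefined here so the snippet runs on its own):

```python
l = [1, 2, 3]

def f():
    yield 1
    yield 2
    yield 3

# A list is not its own iterator: iter() builds a new list_iterator each time.
iter_l = iter(l)
print(iter_l is l)             # False
print(iter(l) is iter_l)       # False (a second, independent iterator)

# A list_iterator's __iter__ returns itself, so iter(iter(l)) adds nothing.
print(iter(iter_l) is iter_l)  # True

# A generator object is already an iterator, so iter() returns it unchanged.
gen = f()
print(iter(gen) is gen)        # True
print(list(gen))               # [1, 2, 3]
```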
All generators are iterators but not vice versa.
from collections.abc import Generator, Iterable, Iterator

class IT:
    """A hand-written iterator that produces 0, 1, 2, 3."""
    def __init__(self):
        self.n = 0
    def __iter__(self):
        # An iterator's __iter__ returns the iterator itself.
        return self
    def __next__(self):
        if self.n == 4:
            raise StopIteration
        value = self.n
        self.n += 1
        return value

def g():
    # The equivalent generator function.
    for i in range(4):
        yield i

def test(it):
    print(f'type(it) = {type(it)}')
    print(f'isinstance(it, Generator) = {isinstance(it, Generator)}')
    print(f'isinstance(it, Iterator) = {isinstance(it, Iterator)}')
    print(f'isinstance(it, Iterable) = {isinstance(it, Iterable)}')
    print(next(it))
    print(next(it))
    print(next(it))
    print(next(it))
    try:
        # The fifth call finds the iterator exhausted.
        print(next(it))
    except StopIteration:
        print('boom\n')

print(f'issubclass(Generator, Iterator) = {issubclass(Generator, Iterator)}')
print(f'issubclass(Iterator, Iterable) = {issubclass(Iterator, Iterable)}')
print()
test(IT())
test(g())
Output:
issubclass(Generator, Iterator) = True
issubclass(Iterator, Iterable) = True

type(it) = <class '__main__.IT'>
isinstance(it, Generator) = False
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom

type(it) = <class 'generator'>
isinstance(it, Generator) = True
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
