Consider the following Python code:
def values():
    with somecontext():
        yield 1
        yield 2

for v in values():
    print(v)
    break
In this case, does Python guarantee that the generator is properly closed and, thus, that the context is exited?
I realize that, in practice, this is going to be the case in CPython due to reference counting and eager destruction of the generator, but does Python guarantee this behavior? I notice that it does not work in Jython, so should that be considered a bug or allowable behavior?
Yes, you can use a with statement in a generator without issue. Python will handle the context correctly, because the generator will be closed when garbage collected.
A GeneratorExit exception is raised inside the generator when it is garbage collected, because it is closed at that time:
>>> from contextlib import contextmanager
>>> @contextmanager
... def somecontext():
...     print('Entering')
...     try:
...         yield None
...     finally:
...         print('Exiting')
...
>>> def values():
...     with somecontext():
...         yield 1
...         yield 2
...
>>> next(values())
Entering
Exiting
1
This is part of PEP 342, which specifies that closing a generator raises the exception inside it. Reaping a generator that has no references left should always close that generator; if Jython is not closing the generator, I'd consider that a bug.
See points 4 and 5 of the Specification Summary:

4. Add a close() method for generator-iterators, which raises GeneratorExit at the point where the generator was paused. If the generator then raises StopIteration (by exiting normally, or due to already being closed) or GeneratorExit (by not catching the exception), close() returns to its caller. If the generator yields a value, a RuntimeError is raised. If the generator raises any other exception, it is propagated to the caller. close() does nothing if the generator has already exited due to an exception or normal exit.

5. Add support to ensure that close() is called when a generator iterator is garbage-collected.
The only caveat is that in Jython, IronPython, and PyPy the garbage collector is not guaranteed to run before the interpreter exits. If this is important to your application, you can explicitly close the generator:
gen = values()
next(gen)
gen.close()
or trigger garbage collection explicitly.
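For instance, a minimal sketch of that second option, reusing the values() generator from the question (when the collector actually runs finalizers is still implementation-dependent):

import gc

gen = values()
next(gen)
del gen        # drop the last reference to the generator
gc.collect()   # ask the collector to reclaim it, which should call close() for us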
If your emphasis is on safety, you can always wrap the generator in a contextlib.closing - this seems like the most straightforward solution:
from contextlib import closing
with closing(gen_values()) as values:
    for value in values:
        ...
In fact, if it were me I'd write the function as
def gen_values():
    def inner():
        ...
    return closing(inner())
ensuring that any user has to put this in a with to use it.
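A self-contained sketch of that pattern (the values yielded by inner here are made up for illustration):

from contextlib import closing

def gen_values():
    def inner():
        yield 1
        yield 2
        yield 3
    return closing(inner())

with gen_values() as values:   # closing.__enter__ hands back the wrapped generator
    for value in values:
        print(value)
# the generator is closed when the with block exits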
Related
Do I have to write
def count10():
    for i in range(10):
        yield i

gen = count10()
for j in gen:
    print(j)
gen.close()
to save memory, or just
def count10():
    for i in range(10):
        yield i

for j in count10():
    print(j)
In fact, I would like to learn the details of the lifecycle of a Python generator, but I failed to find relevant resources.
You don't need to close that generator.
close-ing a generator isn't about saving memory. (close-ing things is almost never about saving memory.) The idea behind the close method on a generator is that you might stop iterating over a generator while it's still in the middle of a try or with:
def gen():
    with something_important():
        yield from range(10)

for i in gen():
    if i == 5:
        break
close-ing a suspended generator throws a GeneratorExit exception into the generator, with the intent of triggering finally blocks and context manager __exit__ methods. Here, close would cause the generator to run the __exit__ method of something_important(). If you don't abandon a generator in the middle like this (or if your generator doesn't have any finally or with blocks, including in generators it delegates to with yield from), then close is unnecessary (and does nothing).
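A minimal sketch of that, with a stand-in implementation of something_important (an assumption, since the question doesn't define it):

from contextlib import contextmanager

@contextmanager
def something_important():
    print('entered')
    try:
        yield
    finally:
        print('exited')

def gen():
    with something_important():
        yield from range(10)

g = gen()
next(g)    # prints 'entered' and yields 0
g.close()  # throws GeneratorExit into the suspended generator, so 'exited' is printed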
The memory management system usually runs close for you, but to really ensure prompt closure across Python implementations, you'd have to replace code like
for thing in gen():
    ...
with
import contextlib

with contextlib.closing(gen()) as generator:
    for thing in generator:
        ...
I've never seen anyone do this.
The close method for generators came about in PEP 342:
Add a close() method for generator-iterators, which raises GeneratorExit at the point where the generator was paused. If the generator then raises StopIteration (by exiting normally, or due to already being closed) or GeneratorExit (by not catching the exception), close() returns to its caller. If the generator yields a value, a RuntimeError is raised. If the generator raises any other exception, it is propagated to the caller. close() does nothing if the generator has already exited due to an exception or normal exit.
Note the last sentence: close() does nothing if the generator has already exited due to an exception or normal exit.
I am facing a strange behavior with nested generators.
def empty_generator():
    for i in []:
        yield

def gen():
    next(empty_generator())
    print("This is not printed, why?")
    yield

list(gen())              # No Error
next(empty_generator())  # Error
I would expect the gen() function to raise an error, as I am calling next() on an empty generator. But this is not the case; the function just returns, without raising or printing anything.
That seems to violate the principle of least astonishment, doesn't it?
Technically, you don't have an error; you have an uncaught StopIteration exception, which is used for flow control. The call to list, which takes an arbitrary iterable as its argument, catches the exception raised by gen for you.
for loops work similarly; every iterator raises StopIteration at the end, but the for loop catches it and ends in response.
Put another way, the consumer of an iterable is responsible for catching StopIteration. When gen calls next, it lets the exception bubble up. The call to list catches it, but you don't when you call next explicitly.
Note that PEP 479 changes this behavior. Python 3.5 provides the new semantics via a __future__ import, Python 3.6 provides a deprecation warning, and Python 3.7 completes the transition. I refer the reader to the PEP itself for further details.
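For illustration, a sketch of what the same code does once that transition is complete (Python 3.7+):

def empty_generator():
    for i in []:
        yield

def gen():
    next(empty_generator())  # the StopIteration raised here escapes gen...
    yield

list(gen())  # ...so Python 3.7+ raises: RuntimeError: generator raised StopIteration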
Once an iterator reaches its end, it raises StopIteration which... stops the iteration, so list(gen()) constructs an empty list.
Since Python 3.3, Python supports a new syntax for use in generator functions:
yield from <expression>
I gave it a quick try:
>>> def g():
...     yield from [1, 2, 3, 4]
...
>>> for i in g():
...     print(i)
...
1
2
3
4
>>>
It seems simple to use, but the PEP document is complex. My question is: is there any other difference compared to the previous yield statement? Thanks.
For most applications, yield from just yields everything from another iterable, in order:
def iterable1():
    yield 1
    yield 2

def iterable2():
    yield from iterable1()
    yield 3

assert list(iterable2()) == [1, 2, 3]
For 90% of users who see this post, I'm guessing that this will be explanation enough for them. yield from simply delegates to the iterable on the right hand side.
Coroutines
However, there are some more esoteric generator circumstances that also matter here. A lesser-known fact about generators is that they can be used as co-routines. This isn't super common, but you can send data to a generator if you want:
def coroutine():
    x = yield None
    yield 'You sent: %s' % x

c = coroutine()
next(c)
print(c.send('Hello world'))
Aside: You might be wondering what the use-case is for this (and you're not alone). One example is the contextlib.contextmanager decorator. Co-routines can also be used to parallelize certain tasks. I don't know too many places where this is taken advantage of, but google app-engine's ndb datastore API uses it for asynchronous operations in a pretty nifty way.
Now, let's assume you send data to a generator that is yielding data from another generator... how does the original generator get notified? The answer is that it doesn't in Python 2.x, where you need to wrap the generator yourself:
def python2_generator_wrapper():
    for item in some_wrapped_generator():
        yield item
At least not without a whole lot of pain:
def python2_coroutine_wrapper():
    """This doesn't work. Somebody smarter than me needs to fix it...
    Pain. Misery. Death lurks here :-("""
    # See https://www.python.org/dev/peps/pep-0380/#formal-semantics for an actual working implementation :-)
    g = some_wrapped_generator()
    for item in g:
        try:
            val = yield item
        except Exception as forward_exception:  # What exceptions should I not catch again?
            g.throw(forward_exception)
        else:
            if val is not None:
                g.send(val)  # Oops, we just consumed another cycle of g... How do we handle that properly?
This all becomes trivial with yield from:
def coroutine_wrapper():
    yield from coroutine()
Because yield from truly delegates (everything!) to the underlying generator.
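For example, reusing the coroutine() defined above, send() now reaches the wrapped generator straight through the wrapper:

c = coroutine_wrapper()
next(c)                       # advances through the wrapper to coroutine()'s first yield
print(c.send('Hello world'))  # prints: You sent: Hello world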
Return semantics
Note that the PEP in question also changes the return semantics. While not directly part of the OP's question, it's worth a quick digression if you are up for it. In Python 2.x, you can't do the following:
def iterable():
    yield 'foo'
    return 'done'
It's a SyntaxError. With the update to yield, the above function is legal. Again, the primary use-case is with coroutines (see above). You can send data to the generator and it can do its work magically (maybe using threads?) while the rest of the program does other things. When the generator finishes, StopIteration will be raised (as is normal at the end of a generator), but now the StopIteration will have a data payload. It is the same thing as if the programmer had instead written:
raise StopIteration('done')
Now the caller can catch that exception and do something with the data payload to benefit the rest of humanity.
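A small sketch of how that payload surfaces when you delegate with yield from (the names inner and outer are just for illustration):

def inner():
    yield 'foo'
    return 'done'

def outer():
    result = yield from inner()   # the StopIteration payload becomes the value of `yield from`
    print('inner returned:', result)

print(list(outer()))  # prints "inner returned: done", then ['foo']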
At first sight, yield from is an algorithmic shortcut for:
def generator1():
    for item in generator2():
        yield item
    # do more things in this generator
Which is then mostly equivalent to just:
def generator1():
    yield from generator2()
    # more things on this generator
In English: when used inside a generator, yield from yields each element of another iterable, as if that item were coming from the first generator, from the point of view of the code calling the first generator.
The main reasoning for its creation is to allow easy refactoring of code that relies heavily on iterators. Code that uses ordinary functions can always, at very little extra cost, have a block of one function refactored into another function that is then called; that divides tasks, simplifies reading and maintaining the code, and allows small code snippets to be reused more often.
So, large functions like this:
def func1():
    # some calculation
    for i in somesequence:
        # complex calculation using i
        # ...
        # ...
        # ...
    # some more code to wrap up results
    # finalizing
    # ...
Can become code like this, without drawbacks:
def func2(i):
    # complex calculation using i
    # ...
    # ...
    # ...
    return calculated_value

def func1():
    # some calculation
    for i in somesequence:
        func2(i)
    # some more code to wrap up results
    # finalizing
    # ...
When getting to iterators however, the form
def generator1():
    for item in generator2():
        yield item
    # do more things in this generator

for item in generator1():
    # do things
requires that, for each item consumed from generator2, the running context first be switched to generator1, nothing be done in that context, and the context then be switched back to generator2; and when generator2 yields a value, there is another intermediate context switch to generator1 before the value reaches the code actually consuming those values.
With yield from, these intermediate context switches are avoided, which can save quite a few resources if there are many chained iterators: the context switches straight from the code consuming the outermost generator to the innermost generator, skipping the contexts of the intermediate generators altogether, until the inner ones are exhausted.
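A tiny sketch of such a chain (chain_of is made up for illustration); values from the innermost range() reach the consumer without being re-yielded at every level:

def chain_of(depth):
    if depth == 0:
        yield from range(3)
    else:
        yield from chain_of(depth - 1)

print(list(chain_of(100)))  # [0, 1, 2]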
Later on, the language took advantage of this "tunnelling" through intermediate contexts to use these generators as co-routines: functions that can make asynchronous calls. With the proper framework in place, as described in https://www.python.org/dev/peps/pep-3156/ , these co-routines are written so that when they call a function that would take a long time to resolve (due to a network operation, or a CPU-intensive operation that can be offloaded to another thread), that call is made with a yield from statement. The framework's main loop then arranges for the expensive function to be properly scheduled and retakes execution (the framework main loop is always the code calling the co-routines themselves). When the expensive result is ready, the framework makes the called co-routine behave like an exhausted generator, and execution of the first co-routine resumes.
From the programmer's point of view it is as if the code was running straight forward, with no interruptions. From the process point of view, the co-routine was paused at the point of the expensive call, and other (possibly parallel calls to the same co-routine) continued running.
So, one might write as part of a web crawler some code along:
@asyncio.coroutine
def crawler(url):
    page_content = yield from async_http_fetch(url)
    urls = parse(page_content)
    ...
Which could fetch tens of html pages concurrently when called from the asyncio loop.
Python 3.4 added the asyncio module to the stdlib as the default provider for this kind of functionality. It worked so well that in Python 3.5 several new keywords were added to the language to distinguish co-routines and asynchronous calls from the generator usage described above. These are described in https://www.python.org/dev/peps/pep-0492/
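For comparison, a hedged sketch of the same crawler written with the PEP 492 keywords (async_http_fetch and parse are hypothetical names, just as in the example above):

async def crawler(url):
    page_content = await async_http_fetch(url)
    urls = parse(page_content)
    ...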
Here is an example that illustrates it:
>>> def g():
...     yield from range(5)
...
>>> list(g())
[0, 1, 2, 3, 4]
>>> def g():
...     yield range(5)
...
>>> list(g())
[range(0, 5)]
>>>
yield from yields each item of the iterable, but yield yields the iterable itself.
The difference is simple:
yield:
[Extra info; if you already know how generators work, you can skip this.]
yield is used to produce a single value from the generator function. When the generator is iterated, the function body starts executing, and when a yield statement is encountered, execution is temporarily suspended, the value is returned to the caller, and the current state is saved. The next time a value is requested, execution resumes from where it left off and continues until it hits the next yield statement.
In the example below, calling generator1 and generator2 returns generator objects that yield values. combined_generator also returns a generator object, but that one delegates to the other two generators; to consume their values directly, we use yield from.
class Gen:
    def generator1(self):
        yield 1
        yield 2
        yield 3

    def generator2(self):
        yield 'a'
        yield 'b'
        yield 'c'

    def combined_generator(self):
        """
        This function yields from generators, which in turn yield values,
        so we use `yield from` so that the consuming code receives the values directly.
        """
        yield from self.generator1()
        yield from self.generator2()

    def run(self):
        print("Gen running ...")
        for item in self.combined_generator():
            print(item)

g = Gen()
g.run()
The output of the above is:
Gen running ...
1
2
3
a
b
c
I've been testing a dirty hack inspired by this http://docs.python.org/2/library/contextlib.html .
The main idea is to bring the try/finally idea to the class level and get a reliable and simple class destructor.
class Foo():
    def __init__(self):
        self.__res_mgr__ = self.__acquire_resources__()
        self.__res_mgr__.next()

    def __acquire_resources__(self):
        try:
            # Acquire some resources here
            print "Initialize"
            self.f = 1
            yield
        finally:
            # Release the resources here
            print "Releasing Resources"
            self.f = 0

f = Foo()
print "testing resources"
print f.f
But it always gives me:
Initialize
testing resources
1
and never "Releasing Resources". I'm basing my hope on:
As of Python version 2.5, the yield statement is now allowed in the try clause of a try ... finally construct. If the generator is not resumed before it is finalized (by reaching a zero reference count or by being garbage collected), the generator-iterator's close() method will be called, allowing any pending finally clauses to execute. (Source link)
But it seems that when the class member is garbage collected together with the instance, their reference counts don't decrease, so the generator's close() is never called and thus the finally clause never runs. As for the second part of the quote,
"or by being garbage collected"
I just don't know why it's not true. Any chance to make this utopia work? :)
BTW, this works at module level:
def f():
    try:
        print "ack"
        yield
    finally:
        print "release"

a = f()
a.next()
print "testing"
Output will be as I expect:
ack
testing
release
NOTE: In my task I'm not able to use a with statement because I'm releasing the resource inside the end_callback of the thread (it will be outside of any with). So I wanted to get a reliable destructor for cases when the callback won't be called for some reason.
The problem you are having is caused by a reference cycle and an implicit __del__ defined on your generator (it's so implicit, CPython doesn't actually show __del__ when you introspect, because only the C level tp_del exists, no Python-visible __del__ is created). Basically, when a generator has a yield inside:
A try block, or equivalently
A with block
it has an implicit __del__-like implementation. On Python 3.3 and earlier, if a reference cycle contains an object whose class implements __del__ (technically, has tp_del in CPython), unless the cycle is manually broken, the cyclic garbage collector cannot clean it up, and just sticks it in gc.garbage (import gc to gain access), because it doesn't know which objects (if any) must be collected first to clean up "nicely".
Because your class's __acquire_resources__(self) contains a reference to the instance's self, you form a reference cycle:
self -> self.__res_mgr__ (generator object) -> generator frame (whose locals include self) -> self
Because of this reference cycle, and the fact that the generator has a try/finally in it (creating tp_del equivalent to __del__), the cycle is uncollectable, and your finally block never gets executed unless you manually advance self.__res_mgr__ (which defeats the whole purpose).
Your experiment happens to display this problem automatically because the reference cycle is implicit/automatic, but any accidental reference cycle where an object in the cycle has a class with __del__ will trigger the same problem, so even if you just did:
class Foo():
    def __init__(self):
        # Acquire some resources here
        print "Initialize"
        self.f = 1

    def __del__(self):
        # Release the resources here
        print "Releasing Resources"
        self.f = 0
if the "resources" involved could conceivably lead to a reference cycle with an instance of Foo, you'd have the same problem.
The solution here is one or both of:
Make your class a context manager so users provide the information necessary for deterministic finalization (by using with blocks), as well as providing an explicit cleanup method (e.g. close) for when with blocks aren't feasible (e.g. when it is part of another object's state that is cleaned up through its own resource management); see the sketch after this list. This is also the only way to provide deterministic cleanup on most non-CPython interpreters, where reference counting semantics have never been used (so all finalizers are called non-deterministically, if at all).
Move to Python 3.4 or higher, where PEP 442 resolves the issue with uncollectable cyclic garbage (it's technically still possible to produce such cycles on CPython, but only via third party extensions that continue to use tp_del instead of updating to use the tp_finalize slot that allows cyclic garbage to be cleaned properly). It's still non-deterministic cleanup (if a reference cycle exists, you're waiting on the cyclic gc to run, sometime), but it's possible, where pre-3.4, cyclic garbage of this sort could not be cleaned up at all.
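A minimal sketch of the first option, staying in the question's Python 2 style: the resource handling moves into __enter__/__exit__ plus an explicit close(), so cleanup no longer depends on the garbage collector at all.

class Foo(object):
    def __init__(self):
        # Acquire some resources here
        print "Initialize"
        self.f = 1

    def close(self):
        # Release the resources here
        print "Releasing Resources"
        self.f = 0

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()

with Foo() as f:
    print "testing resources"
    print f.f
# "Releasing Resources" is printed deterministically when the with block exits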
Why does the function in this example terminate:
def func(iterable):
    while True:
        val = next(iterable)
        yield val
but if I take out the yield statement, the function raises a StopIteration exception?
EDIT: Sorry for misleading you guys. I know what generators are and how to use them. Of course, when I said the function terminates I didn't mean eager evaluation of the function. I just meant that when I use the function to produce a generator:
gen = func(iterable)
in the case of func it works and returns a generator, but in the case of func2:
def func2(iterable):
    while True:
        val = next(iterable)
it raises StopIteration instead of returning None or looping forever.
Let me be more specific. There is a function tee in itertools which is equivalent to:
import collections

def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:             # when the local deque is empty
                newval = next(it)       # fetch a new value and
                for d in deques:        # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
There is, in fact, some magic here, because the nested function gen has an infinite loop with no break statement. gen terminates due to a StopIteration exception when there are no items left, yet it terminates cleanly (without the exception reaching my code), i.e. it just stops the loop. So the question is: where is StopIteration handled?
Note: This question (and the original part of my answer to it) are only really meaningful for Python versions prior to 3.7. The behavior that was asked about no longer happens in 3.7 and later, thanks to changes described in PEP 479. So this question and the original answer are only really useful as historical artifacts. After the PEP was accepted, I added an additional section at the bottom of the answer which is more relevant to modern versions of Python.
To answer your question about where the StopIteration gets caught in the gen generator created inside of itertools.tee: it doesn't. It is up to the consumer of the tee results to catch the exception as they iterate.
First off, it's important to note that a generator function (which is any function with a yield statement in it, anywhere) is fundamentally different from a normal function. Instead of running the function's code when it is called, you just get a generator object back. Only when you iterate over the generator will you run the code.
A generator function will never finish iterating without raising StopIteration (unless it raises some other exception instead). StopIteration is the signal from the generator that it is done, and it is not optional. If you reach a return statement or the end of the generator function's code without raising anything, Python will raise StopIteration for you!
This is different from regular functions, which return None if they reach the end without returning anything else. It ties in with the different ways that generators work, as I described above.
Here's an example generator function that will make it easy to see how StopIteration gets raised:
def simple_generator():
    yield "foo"
    yield "bar"
    # StopIteration will be raised here automatically
Here's what happens when you consume it:
>>> g = simple_generator()
>>> next(g)
'foo'
>>> next(g)
'bar'
>>> next(g)
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    next(g)
StopIteration
Calling simple_generator always returns a generator object immediately (without running any of the code in the function). Each call of next on the generator object runs the code until the next yield statement, and returns the yielded value. If there is no more to get, StopIteration is raised.
Now, normally you don't see StopIteration exceptions. The reason for this is that you usually consume generators inside for loops. A for statement will automatically call next over and over until StopIteration gets raised. It will catch and suppress the StopIteration exception for you, so you don't need to mess around with try/except blocks to deal with it.
A for loop like for item in iterable: do_stuff(item) is almost exactly equivalent to this while loop (the only difference being that a real for doesn't need a temporary variable to hold the iterator):
iterator = iter(iterable)
try:
    while True:
        item = next(iterator)
        do_stuff(item)
except StopIteration:
    pass
finally:
    del iterator
The gen generator function you showed at the top is one exception. It uses the StopIteration exception produced by the iterator it is consuming as its own signal that it is done being iterated on. That is, rather than catching the StopIteration and then breaking out of the loop, it simply lets the exception go uncaught (presumably to be caught by some higher-level code).
Unrelated to the main question, there is one other thing I want to point out. In your code, you're calling next on a variable called iterable. If you take that name as documentation for what type of object you will get, this is not necessarily safe.
next is part of the iterator protocol, not the iterable (or container) protocol. It may work for some kinds of iterables (such as files and generators, as those types are their own iterators), but it will fail for other iterables, such as tuples and lists. The more correct approach is to call iter on your iterable value, then call next on the iterator you receive. (Or just use for loops, which call both iter and next for you at the appropriate times!)
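For example:

>>> next([1, 2, 3])        # a list is iterable, but it is not an iterator
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
>>> it = iter([1, 2, 3])   # ask the list for an iterator first
>>> next(it)
1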
I just found my own answer in a Google search for a related question, and I feel I should update to point out that the answer above is not true in modern Python versions.
PEP 479 has made it an error to allow a StopIteration to bubble up uncaught from a generator function. If that happens, Python will turn it into a RuntimeError exception instead. This means that code like the examples in older versions of itertools that used a StopIteration to break out of a generator function needs to be modified. Usually you'll need to catch the exception with a try/except and then return.
Because this was a backwards incompatible change, it was phased in gradually. In Python 3.5, all code worked as before by default, but you could get the new behavior with from __future__ import generator_stop. In Python 3.6, unmodified code would still work, but it would give a warning. In Python 3.7 and later, the new behavior applies all the time.
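A sketch of how the inner gen from the tee recipe above could be adapted for this (it and deques refer to the enclosing tee function's locals, as in the recipe):

def gen(mydeque):
    while True:
        if not mydeque:                 # when the local deque is empty
            try:
                newval = next(it)       # fetch a new value and
            except StopIteration:
                return                  # end this generator cleanly instead of letting StopIteration escape
            for d in deques:            # load it to all the deques
                d.append(newval)
        yield mydeque.popleft()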
When a function contains yield, calling it does not actually execute anything; it merely creates a generator object. Only iterating over this object will execute the code. So my guess is that you're merely calling the function, which means the function doesn't raise StopIteration because it is never being executed.
Given your function, and an iterable:
def func(iterable):
    while True:
        val = next(iterable)
        yield val

iterable = iter([1, 2, 3])
This is the wrong way to call it:
func(iterable)
This is the right way:
for item in func(iterable):
    # do something with item
You could also store the generator in a variable and call next() on it (or iterate over it in some other way):
gen = func(iterable)
print(next(gen))  # prints 1
print(next(gen))  # prints 2
print(next(gen))  # prints 3
print(next(gen))  # raises StopIteration (RuntimeError in Python 3.7+, per PEP 479)
By the way, a better way to write your function is as follows:
def func(iterable):
    for item in iterable:
        yield item
Or in Python 3.3 and later:
def func(iterable):
    yield from iterable
Of course, real generators are rarely so trivial. :-)
Without the yield, you iterate over the entire iterable without stopping to do anything with val. The while loop does not catch the StopIteration exception. An equivalent for loop would be:
def func(iterable):
    for val in iterable:
        pass
which does catch the StopIteration, simply exits the loop, and thus returns from the function.
You can explicitly catch the exception:
def func(iterable):
    while True:
        try:
            val = next(iterable)
        except StopIteration:
            break
yield doesn't catch the StopIteration. What yield does for your function is cause it to become a generator function rather than a regular function. Thus, the object returned from the function call is an iterable object, which calculates the next value when you ask for it with the next function (called implicitly by a for loop). If you leave the yield statement out, then Python executes the entire while loop right away, which ends up exhausting the iterable (if it is finite) and raising StopIteration right when you call the function.
Consider:
x = func(x for x in [])
next(x)  # raises StopIteration
A for loop catches the exception; that's how it knows when to stop calling next on the iterator you gave it.
Tested on Python 3.8: chunking as a lazy generator.
import itertools
from typing import Iterable

def split_to_chunk(size: int, iterable: Iterable) -> Iterable[Iterable]:
    source_iter = iter(iterable)
    while True:
        batch_iter = itertools.islice(source_iter, size)
        try:
            yield itertools.chain([next(batch_iter)], batch_iter)
        except StopIteration:
            return
Why the StopIteration has to be handled: https://www.python.org/dev/peps/pep-0479/
import pprint
import time

def sample_gen() -> Iterable[int]:
    i = 0
    while True:
        yield i
        i += 1

for chunk in split_to_chunk(7, sample_gen()):
    pprint.pprint(list(chunk))
    time.sleep(2)
Output:
[0, 1, 2, 3, 4, 5, 6]
[7, 8, 9, 10, 11, 12, 13]
[14, 15, 16, 17, 18, 19, 20]
[21, 22, 23, 24, 25, 26, 27]
............................