Coroutines vs generators for parsing [duplicate] - python

What is the difference between a coroutine, a continuation, and a generator?

I'll start with generators, seeing as they're the simplest case. As @zvolkov mentioned, they're functions/objects that can be repeatedly called without running to completion: when called, they return (yield) a value and then suspend their execution. When they're called again, they start up from where they last suspended execution and do their thing again.
A generator is essentially a cut down (asymmetric) coroutine. The difference between a coroutine and generator is that a coroutine can accept arguments after it's been initially called, whereas a generator can't.
It's a bit difficult to come up with a trivial example of where you'd use coroutines, but here's my best try. Take this (made up) Python code as an example.
def my_coroutine_body(*args):
    while True:
        # Do some funky stuff
        *args = yield value_im_returning
        # Do some more funky stuff

my_coro = make_coroutine(my_coroutine_body)

x = 0
while True:
    # The coroutine does some funky stuff to x, and returns a new value.
    x = my_coro(x)
    print x
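Real Python generators can get close to this with .send(); here's a runnable sketch of what the made-up code above might look like (the doubling is just a stand-in for the "funky stuff"):

import types  # not required; just emphasizing this is plain Python

def my_coroutine_body():
    x = yield              # wait for the first value sent in
    while True:
        # do some funky stuff to x, then hand the result back
        x = yield x * 2    # doubling stands in for the funky stuff

my_coro = my_coroutine_body()
next(my_coro)              # prime the coroutine up to the first yield

x = 1
for _ in range(5):
    x = my_coro.send(x)
    print(x)               # prints 2, 4, 8, 16, 32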
An example of where coroutines are used is lexers and parsers. Without coroutines in the language or emulated somehow, lexing and parsing code needs to be mixed together even though they're really two separate concerns. But using a coroutine, you can separate out the lexing and parsing code.
(I'm going to brush over the difference between symmetric and asymmetric coroutines. Suffice it to say that they're equivalent, you can convert from one to the other, and asymmetric coroutines--which are the most like generators--are the easier to understand. I was outlining how one might implement asymmetric coroutines in Python.)
Continuations are actually quite simple beasts. All they are, are functions representing another point in the program which, if you call it, will cause execution to automatically switch to the point that function represents. You use very restricted versions of them every day without even realising it. Exceptions, for instance, can be thought of as a kind of inside-out continuation. I'll give you a Python based pseudocode example of a continuation.
Say Python had a function called callcc(), and this function took two arguments, the first being a function, and the second being a list of arguments to call it with. The only restriction on that function would be that the last argument it takes will be a function (which will be our current continuation).
def foo(x, y, cc):
    cc(max(x, y))

biggest = callcc(foo, [23, 42])
print biggest
What would happen is that callcc() would in turn call foo() with the current continuation (cc), that is, a reference to the point in the program at which callcc() was called. When foo() calls the current continuation, it's essentially the same as telling callcc() to return with the value you're calling the current continuation with, and when it does that, it rolls back the stack to where the current continuation was created, i.e., when you called callcc().
The result of all of this would be that our hypothetical Python variant would print '42'.
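Python has no callcc(), but the "exceptions are inside-out continuations" remark above suggests how to fake the one-shot escape variant used here. This is my own sketch, not a general callcc (it can only escape outward, once):

class _Continuation(Exception):
    def __init__(self, value):
        self.value = value

def callcc(f, args):
    # The "current continuation": calling it unwinds straight back here.
    def cc(value):
        raise _Continuation(value)
    try:
        f(*args, cc)
        raise RuntimeError("f returned without calling its continuation")
    except _Continuation as escape:
        return escape.value

def foo(x, y, cc):
    cc(max(x, y))

biggest = callcc(foo, [23, 42])
print(biggest)  # prints 42, as in the hypothetical example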

Coroutine is one of several procedures that take turns doing their job and then pause to give control to the other coroutines in the group.
Continuation is a "pointer to a function" you pass to some procedure, to be executed ("continued with") when that procedure is done.
Generator (in .NET) is a language construct that can spit out a value, "pause" execution of the method and then proceed from the same point when asked for the next value.

In newer versions of Python, you can send values into generators with generator.send(), which makes Python generators effectively coroutines.
The main difference between a Python generator and another kind of generator, say greenlet, is that in Python a yielded value can only go back to the caller, whereas in greenlet, target.switch(value) can transfer control to a specific target coroutine, yielding a value there so the target continues to run.
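For contrast, here is the classic example from the greenlet documentation (greenlet is a third-party package: pip install greenlet). Each switch() names its target explicitly instead of yielding back to a caller:

from greenlet import greenlet

def test1():
    print(12)
    gr2.switch()     # jump to gr2, not back to a caller
    print(34)

def test2():
    print(56)
    gr1.switch()     # jump back into gr1, right after its switch()
    print(78)

gr1 = greenlet(test1)
gr2 = greenlet(test2)
gr1.switch()         # prints 12, 56, 34 -- 78 is never reached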

Related

Convert callback function to generator

Suppose I am building a library function that does work incrementally and passes the results through callback functions:
def do_something(even_callback, odd_callback):
    for i in range(10):
        if i % 2 == 0:
            even_callback(i)
        else:
            odd_callback(i)
Some callers care about handling the two types of events differently, and so they use separate callback implementations. However, many callers don't care about the difference and just use the same implementation:
do_something(print, print)
For these callers, it would be more convenient to have a version of the function do_something_generator:
for result in do_something_generator():
    print(result)
Leaving aside the solution of rewriting do_something to be a generator itself, or async/thread/process, how could I wrap do_something to turn it into a generator? (Also excluding "get all the results in a list and then yield them")
Per discussion in the comments, one possible solution would be to rewrite the do_something function as a generator and then wrap it to get the callback version; however, this would require tagging the events as "even" or "odd" so they could be distinguished in the callback implementation.
In my particular case, this is infeasible due to frequent events with chunks of data - adding tagging into the base implementation is a nontrivial overhead cost versus callbacks.
Leaving aside the solution of rewriting do_something to be a generator itself, or async/thread/process, how could I wrap do_something to turn it into a generator?
Assuming that you exclude rewriting the function on the fly so that it actually yields rather than calls callbacks, this is impossible.
Python is single-threaded. What you're describing has two functions running at once, one of which (the wrapped) signals to the other (the wrapper) whenever a value is ready and then passes that value up, whereupon the wrapper yields it. This is impossible, unless one of the following conditions is met:
the wrapped function is in a separate thread/process, able to run by itself
the wrapped function switches execution whilst passing a value rather than calling callbacks
Since the only two ways of switching execution whilst passing a value are yielding or async yielding, there is no way of doing this in the language if your constraints are met. (There is no way to write a callback which signals to the wrapper to yield a value and then allows the wrapped function to resume, except by placing them in different threads.)
What you're describing sounds kind of like protothreads, only for Python, by the way.
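For completeness, here's a minimal sketch of the thread-based escape hatch mentioned above (generate_from_callbacks and the _DONE sentinel are my names, not a standard API):

import queue
import threading

_DONE = object()  # sentinel marking the end of the stream

def generate_from_callbacks(func):
    """Run func(cb, cb) in a worker thread, yielding each value it emits."""
    q = queue.Queue()

    def worker():
        try:
            func(q.put, q.put)  # same callback for both event types
        finally:
            q.put(_DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item

for result in generate_from_callbacks(do_something):
    print(result)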
However there's no reason not to write it the other way round:
from collections import namedtuple

Monad = namedtuple("Monad", "is_left,val")

def do_stuff():
    for i in range(10):
        yield Monad(is_left=bool(i % 2), val=i)

def do_stuff_always():
    for monad in do_stuff():
        yield monad.val

def do_stuff_callback(left_callback, right_callback):
    for monad in do_stuff():
        if monad.is_left:
            left_callback(monad.val)
        else:
            right_callback(monad.val)
I can't resist calling these structures monads, but perhaps that's not what you're trying to encode here.

Why would I ever want to use `async def` over `@asyncio.coroutine`?

Python 3.5 greatly expanded support for asynchronous programming with a new function definition syntax. Whereas async functions were previously just "generators with benefits":
import asyncio

def generate_numbers():
    """
    Generator function that lazily returns 1 - 100
    """
    for i in range(100):
        yield i

generate_async = asyncio.coroutine(generate_numbers)
generate_async.__doc__ = """
Coroutine that lazily returns 1 - 100
This can be used interchangeably as a generator or a coroutine
"""
they now have their own special declaration syntax and special behavior by which they are no longer usable as usual generator functions:
async def generate_async_native():
    """
    A coroutine that returns 1 - 100
    This CANNOT be used as a generator, and can ONLY be executed by running it from an event loop
    """
    for i in range(100):
        await i  # illustrative only: awaiting a plain int would raise TypeError at runtime
This is not a question about the functional or practical differences between these types -- that is discussed in this StackOverflow answer.
My question is: why would I ever want to use async def? It seems like it provides no additional benefit over @asyncio.coroutine, but imposes an additional cost in that it:
breaks backward compatibility (Python 3.5 code with async def won't even parse in older versions, although this is arguably a feature and not a bug), and
seems to provide less flexibility in how the function can be called.
One possible answer is given by Martijn Pieters:
The advantages are that with native support, you can also introduce additional syntax to support asynchronous context managers and iterators. Entering and exiting a context manager, or looping over an iterator, can then become more points in your coroutine that signal that other code can run instead because something is waiting again.
This has in fact come to fruition with new async with and async for syntax, which cannot be as easily implemented with a "tack-on" solution like a decorated generator.
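For instance, async for drives any object implementing __aiter__/__anext__. A minimal sketch (Countdown is my toy class, not a standard one):

import asyncio

class Countdown:
    """A minimal async iterator, usable only with `async for`."""
    def __init__(self, n):
        self.n = n

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.n <= 0:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # a real iterator would await actual I/O here
        self.n -= 1
        return self.n + 1

async def main():
    async for i in Countdown(3):
        print(i)  # prints 3, 2, 1

asyncio.run(main())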

GridSearchCV: print some expression each time a function completes a loop

Assume you have some function function in Python that works by looping: for example, it could be a function that evaluates a certain mathematical expression, e.g. x**2, for all elements of an array, e.g. [1, 2, ..., 100] (obviously this is a toy example). Would it be possible to write code such that, each time function goes through the loop and obtains a result, some code is executed, e.g. print("Loop %s has been executed" % i)? So, in our example, when 1**2 has been computed, the program prints Loop 1 has been executed; then when 2**2 has been computed, it prints Loop 2 has been executed, and so on.
Note that the difficulty comes from the fact that I do not program the function; it is a preexisting function from some package (more specifically, the function I am interested in is GridSearchCV from the package scikit-learn).
The easiest way to do this would be to just copy the function's code into your own function, tweak it, and then use it. In your case, you would have to subclass GridSearchCV and override the _fit method. The problem with this approach is that it may not survive a package upgrade.
In your case, that's not necessary. You can just specify a verbosity level when creating the object:
GridSearchCV(verbose=100)
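As a rough, runnable illustration with a current scikit-learn (the exact text of the progress output is an implementation detail):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, verbose=100)
search.fit(X, y)  # prints a progress line for every single fit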
I'm not entirely sure what the verbosity number itself means. Here's the documentation from the package used internally that does the printing:
The verbosity level: if non zero, progress messages are printed. Above 50, the output is sent to stdout. The frequency of the messages increases with the verbosity level. If it more than 10, all iterations are reported.
You can look at the source code if you really want to know what the verbosity number does. I can't tell.
You could potentially use monkey-patching ("monkey" because it's hacky)
Assuming the library function is
def function(f):
    for i in range(100):
        i**2
and you want to enter a print statement, you would need to copy the entire function into your own file, and make your tiny edit:
def my_function(f):
    for i in range(100):
        i**2
        print("Loop %s" % i)
Now you overwrite the library function:
from library import module
module.existing_function = my_function
Obviously this is not an easily maintainable solution (if your target library is upgraded, you might have to go through this process again), so make sure you use it only for temporary debugging purposes.

Is it possible to determine if a function is called in the body of another function or as argument of a function?

Suppose we have a Python function:
def func():
    # if called in the body of another function, do something
    # if called as argument to a function, do something different
    pass
func() can be called in the body of another function:
def funcA():
    func()
func() can be also called as an argument to a function:
def funcB(arg1):
    pass

def funcC(**kwargs):
    pass

funcB(func())
funcC(kwarg1=func())
Is there a way to distinguish between those two cases in the body of func()?
EDIT. Here is my use case. I'd like to use Python as a language for rule based 3D model generation. Each rule is a small Python function. Each subsequent rule refines the model and adds additional details. Here is an example rule set. Here is a tutorial describing how the rule set works. My rule language mimics a rule language called CGA shape grammar. If this StackExchange question can be solved, my rule language could be significantly simplified.
EDIT2. Code patching would also suffice for me. For example all cases when func is called in the body of another function are substituted for something like
call_function_on_the_right()>>func()
Others have already pointed out that this might not be a good idea. You could achieve the same thing by requiring, for example, funct() at top level and funca() as an argument; both could call the same func() with a keyword argument specifying whether you are at top level or in an argument. But I'm not here to argue whether this is a good idea or not, I'm here to answer the question.
Is it possible? It might be.
How would you do this, then? Well, the first thing to know is that you can use inspect.stack() to get information about the context you were called from.
You could figure out the line you were called from and read the source file to see how the function is called on that line.
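A rough sketch of that source-reading approach (fragile, for the reasons listed just below; the .function, .lineno, and .code_context attributes are real inspect.stack() fields):

import inspect

def func():
    caller = inspect.stack()[1]  # FrameInfo for whoever called us
    if caller.code_context:
        line = caller.code_context[0].strip()
    else:
        line = "<source unavailable>"
    print("called from %r at line %d: %s" % (caller.function, caller.lineno, line))

def funcA():
    func()  # -> called from 'funcA' at line ...: func()

funcA()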
There are two problems with this, however.
The source file could've been modified and give wrong information.
What if you do func(func())? It's called in both ways on the same line!
To get accurate information, you should look at the frame objects in the stack. They have a f_code member that contains the bytecode and a f_lasti member that contains the position of the last instruction. Using those you should be able to figure out which function on the line is currently being called and where the return value is going. You have to parse the whole bytecode (have a look at the dis module) for the frame, though, and keep track of the internal stack used by the interpreter to see where the return value goes.
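To get a feel for the bytecode side of this, the dis module can show the difference between the two call sites (a self-contained toy, with stub definitions standing in for the question's functions):

import dis

def func():
    pass

def funcB(arg1):
    pass

def body_call():
    func()            # the result is discarded (a POP_TOP in the bytecode)

def arg_call():
    funcB(func())     # the result stays on the stack as funcB's argument

dis.dis(body_call)
dis.dis(arg_call)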
Now, I'm not 100% sure that it'll work, but I can't see why it wouldn't. The only thing I can see that could "go wrong" is if keeping track of the return value proves to be too hard. But since you only have to keep track of it for the duration of one line of code, there really shouldn't be any structures that would be "impossible" to handle, as far as I can see.
I didn't say it would be easy, only that it might be possible. :)

What is the most pythonic way to have a generator expression executed?

More and more features of Python are becoming "lazily executable", like generator expressions and other kinds of iterators.
Sometimes, however, I see myself wanting to roll a one-liner "for" loop, just to perform some action.
What would be the most pythonic thing to get the loop actually executed?
For example:
a = open("numbers.txt", "w")
(a.write ("%d " % i) for i in xrange(100))
a.close()
Not actual code, but you see what I mean. If I use a list comprehension instead, I have the side effect of creating an N-length list filled with None's.
Currently what I do is use the expression as the argument in a call to "any" or to "all". But I would like to find a way that does not depend on the result of the expression performed in the loop - both "any" and "all" can stop early depending on the values the expression evaluates to.
To be clear, these are ways to do it that I already know about, and each one has its drawbacks:
[a.write("%d " % i) for i in xrange(100)]
any((a.write ("%d " % i) for i in xrange(100)))
for item in (a.write ("%d " % i) for i in xrange(100)): pass
There is one obvious way to do it, and that is the way you should do it. There is no excuse for doing it a clever way.
a = open("numbers.txt", "w")
for i in xrange(100):
    a.write("%d " % i)
a.close()
Lazy execution gives you a serious benefit: It allows you to pass a sequence to another piece of code without having to hold the entire thing in memory. It is for the creation of efficient sequences as data types.
In this case, you do not want lazy execution. You want execution. You can just ... execute. With a for loop.
If I wanted to do this specific example, I'd write
for i in xrange(100): a.write('%d ' % i)
If I often needed to consume an iterator for its effect, I'd define
def for_effect(iterable):
    for _ in iterable:
        pass
There are many accumulators which have the effect of consuming the whole iterable they're given, such as min or max -- but even they don't entirely ignore the results yielded in the process (min and max, for example, will raise an exception if some of the results are complex numbers). I don't think there's a built-in accumulator that does exactly what you want -- you'll have to write (and add to your personal stash of tiny utility functions) something such as
def consume(iterable):
    for item in iterable:
        pass
The main reason, I guess, is that Python has a for statement and you're supposed to use it when it fits like a glove (i.e., for the cases you'd want consume for;-).
BTW, a.write returns None, which is falsish, so any will actually consume it (and a.writelines will do even better!). But I realize you were just giving that as an example;-).
It is 2019, and this is a question from 2010 that keeps showing up. A recent thread on one of Python's mailing lists generated over 70 e-mails on this subject, and they again declined to add a consume call to the language.
In that thread, the most efficient way to do this actually showed up, and it is far from obvious, so I am posting it as the answer here:
from collections import deque

consume = deque(maxlen=0).extend
And then use the consume callable to process generator expressions.
It turns out the deque native code in cPython actually is optimized for the maxlen=0 case, and will just consume the iterable.
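Applied to the question's own example (xrange becomes range on Python 3):

from collections import deque

consume = deque(maxlen=0).extend

a = open("numbers.txt", "w")
consume(a.write("%d " % i) for i in range(100))
a.close()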
The any and all calls I mentioned in the question should be just as efficient, but one has to worry about the truthiness of the expression in order for the iterable to be fully consumed.
I see this still may be controversial; after all, an explicit two-line for loop can handle this. I remembered this question because I had just made a commit where I create some threads, start them, and join them back - without a consume callable, that is 4 lines of mostly boilerplate, and without benefiting from cycling through the iterable in native code:
https://github.com/jsbueno/extracontext/blob/a5d24be882f9aa18eb19effe3c2cf20c42135ed8/tests/test_thread.py#L27
