Suppose I am building a library function that does work incrementally and passes the results through callback functions:
def do_something(even_callback, odd_callback):
    for i in range(10):
        if i % 2:
            odd_callback(i)
        else:
            even_callback(i)
Some callers care about handling the two types of events differently, and so they use separate callback implementations. However, many callers don't care about the difference and just use the same implementation:
do_something(print, print)
For these callers, it would be more convenient to have a version of the function do_something_generator:
for result in do_something_generator():
    print(result)
Leaving aside the solution of rewriting do_something to be a generator itself, or async/thread/process, how could I wrap do_something to turn it into a generator? (Also excluding "get all the results in a list and then yield them")
Per discussion in the comments, one possible solution would be to rewrite the do_something function as a generator and then wrap it to recover the callback version. However, this would require tagging the events as "even" or "odd" so they could be distinguished in the callback implementation.
In my particular case this is infeasible: the events are frequent and carry chunks of data, so adding tagging to the base implementation is a nontrivial overhead compared with callbacks.
Leaving aside the solution of rewriting do_something to be a generator itself, or async/thread/process, how could I wrap do_something to turn it into a generator?
Assuming that you exclude rewriting the function on the fly so that it actually yields rather than calls callbacks, this is impossible.
Within a single thread, Python has only one flow of control. What you're describing requires two functions to be active at once: the wrapped function signals the wrapper whenever a value is ready and passes that value up, whereupon the wrapper yields it. This is impossible unless one of the following conditions is met:
the wrapped function is in a separate thread/process, able to run by itself
the wrapped function switches execution whilst passing a value rather than calling callbacks
Since the only two ways of switching execution whilst passing a value are yielding or async yielding, there is no way of doing this in the language if your constraints are met. (There is no way to write a callback which signals to the wrapper to yield a value and then allows the wrapped function to resume, except by placing them in different threads.)
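For completeness, here is roughly what that first condition (a separate thread) might look like in practice. This is only a sketch: it assumes the do_something from the question, and uses a queue plus a sentinel object to hand values across to the wrapper.

import queue
import threading

def do_something_generator():
    q = queue.Queue()
    _SENTINEL = object()

    def worker():
        do_something(q.put, q.put)   # both callbacks just enqueue the value
        q.put(_SENTINEL)             # signal that do_something has finished

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = q.get()
        if item is _SENTINEL:
            return
        yield item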
What you're describing sounds kind of like protothreads, only for python, by the way.
However there's no reason not to write it the other way round:
from collections import namedtuple

Monad = namedtuple("Monad", "is_left,val")

def do_stuff():
    for i in range(10):
        yield Monad(is_left=bool(i % 2), val=i)

def do_stuff_always():
    for monad in do_stuff():
        yield monad.val

def do_stuff_callback(left_callback, right_callback):
    for monad in do_stuff():
        if monad.is_left:
            left_callback(monad.val)
        else:
            right_callback(monad.val)
I can't resist calling these structures monads, but perhaps that's not what you're trying to encode here.
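Both derived interfaces could then be used like this (a small usage sketch):

for value in do_stuff_always():
    print(value)                    # 0 through 9, with no even/odd distinction

do_stuff_callback(print, print)     # the same values, via the callback interface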
So I was writing a function where a lot of processing happens in the body of a loop, and occasionally it may be of interest to the caller to have the answer to some of the computations.
Normally I would just put the results in a list and return the list, but in this case the results are too large (a few hundred MB on each loop).
I wrote this without really thinking about it, expecting Python's dynamic typing to figure things out, but the following is always created as a generator.
def mixed(is_generator=False):
    for i in range(5):
        # process some stuff including writing to a file
        if is_generator:
            yield i
    return
From this I have two questions:
1) Does the presence of the yield keyword in a scope immediately turn the function it's in into a generator?
2) Is there a sensible way to obtain the behaviour I intended?
2.1) If no, what is the reasoning behind it not being possible? (In terms of how functions and generators work in Python.)
Let's go step by step:
1) Does the presence of the yield keyword in a scope immediately turn the function it's in into a generator? Yes
2) Is there a sensible way to obtain the behaviour I intended? Yes, see example below
The thing is to wrap the computation and either return a generator or a list with the data of that generator:
def mixed(is_generator=False):
    # create a generator object
    gen = (compute_stuff(i) for i in range(5))
    # if we want just the generator
    if is_generator:
        return gen
    # if not, we consume it with a list and return that list
    return list(gen)
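A short usage sketch of the wrapper above (it assumes compute_stuff is defined elsewhere):

lazy_results = mixed(is_generator=True)   # a generator; nothing computed yet
eager_results = mixed()                   # a list; everything computed up front

for value in lazy_results:
    print(value)
print(eager_results)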
Anyway, I would say this is bad practice. You should keep the two concerns separate: usually just have the generator function and put the branching logic outside:
def computation():
    for i in range(5):
        # process some stuff including writing to a file
        yield i

gen = computation()
if lazy:
    for data in gen:
        use_data(data)
else:
    data = list(gen)
    use_all_data(data)
If you are incrementally designing a function that could have variable number of outputs, what is the best way to design that function? E.g.
def function(input):
    return output1, output2, ...
or
def function(input):
    return dict(output1=...)
In both cases, you need a bunch of if statements to sort through and utilize the outputs; the difference is where the if statements are used (within function or outside of the function over the dictionary). I am not sure what principle to use to decide on what to do.
If you need to return multiple things, it means the function is either complex and you should break it down, or you need an object with attributes that is "processed" in the function. A dict is a standard object, but you can also create your own; it depends on whether you want to go more the OOP way or the functional/procedural way.
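As a rough sketch of the "object with attributes" option, a dataclass is one lightweight way to define such an object (the field names here are made up, not from the question):

from dataclasses import dataclass
from typing import Optional

@dataclass
class FunctionResult:
    output1: int
    output2: Optional[str] = None   # optional outputs default to None

def function(input):
    # ... compute things ...
    return FunctionResult(output1=42, output2="extra detail")

result = function(None)
if result.output2 is not None:
    print(result.output1, result.output2)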
What is the difference between a coroutine and a continuation and a generator?
I'll start with generators, seeing as they're the simplest case. As @zvolkov mentioned, they're functions/objects that can be repeatedly called without returning, but when called will return (yield) a value and then suspend their execution. When they're called again, they will start up from where they last suspended execution and do their thing again.
A generator is essentially a cut down (asymmetric) coroutine. The difference between a coroutine and generator is that a coroutine can accept arguments after it's been initially called, whereas a generator can't.
It's a bit difficult to come up with a trivial example of where you'd use coroutines, but here's my best try. Take this (made up) Python code as an example.
def my_coroutine_body(*args):
    while True:
        # Do some funky stuff
        args = yield value_im_returning
        # Do some more funky stuff

my_coro = make_coroutine(my_coroutine_body)

x = 0
while True:
    # The coroutine does some funky stuff to x, and returns a new value.
    x = my_coro(x)
    print(x)
An example of where coroutines are used is lexers and parsers. Without coroutines in the language or emulated somehow, lexing and parsing code needs to be mixed together even though they're really two separate concerns. But using a coroutine, you can separate out the lexing and parsing code.
(I'm going to brush over the difference between symmetric and asymmetric coroutines. Suffice it to say that they're equivalent, you can convert from one to the other, and asymmetric coroutines--which are the most like generators--are the easier to understand. I was outlining how one might implement asymmetric coroutines in Python.)
Continuations are actually quite simple beasts. All they are, are functions representing another point in the program which, if you call it, will cause execution to automatically switch to the point that function represents. You use very restricted versions of them every day without even realising it. Exceptions, for instance, can be thought of as a kind of inside-out continuation. I'll give you a Python based pseudocode example of a continuation.
Say Python had a function called callcc(), and this function took two arguments, the first being a function, and the second being a list of arguments to call it with. The only restriction on that function would be that the last argument it takes will be a function (which will be our current continuation).
def foo(x, y, cc):
    cc(max(x, y))

biggest = callcc(foo, [23, 42])
print(biggest)
What would happen is that callcc() would in turn call foo() with the current continuation (cc), that is, a reference to the point in the program at which callcc() was called. When foo() calls the current continuation, it's essentially the same as telling callcc() to return with the value you're calling the current continuation with, and when it does that, it rolls back the stack to where the current continuation was created, i.e., when you called callcc().
The result of all of this would be that our hypothetical Python variant would print '42'.
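For a concrete feel, here is a minimal sketch that emulates this hypothetical callcc() in real Python by using an exception as the continuation. Note that this only gives a one-shot "escape" continuation (it cannot be resumed after the stack unwinds), not a full first-class continuation.

class _ContinuationCalled(Exception):
    """Carries the value passed to the current continuation."""
    def __init__(self, value):
        self.value = value

def callcc(f, args):
    def cc(value):
        # "Jump back" to where callcc() was called, carrying the value.
        raise _ContinuationCalled(value)
    try:
        f(*args, cc)
    except _ContinuationCalled as exc:
        return exc.value

def foo(x, y, cc):
    cc(max(x, y))

biggest = callcc(foo, [23, 42])
print(biggest)  # prints 42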
Coroutine is one of several procedures that take turns doing their job and then pause to give control to the other coroutines in the group.
Continuation is a "pointer to a function" you pass to some procedure, to be executed ("continued with") when that procedure is done.
Generator (in .NET) is a language construct that can spit out a value, "pause" execution of the method and then proceed from the same point when asked for the next value.
In newer versions of Python, you can send values to generators with generator.send(), which makes Python generators effectively coroutines.
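A minimal illustration of send() (running_total is just an example name):

def running_total():
    total = 0
    while True:
        # Each send() resumes the generator here with a new value.
        value = yield total
        total += value

gen = running_total()
next(gen)            # prime the generator; runs up to the first yield
print(gen.send(5))   # 5
print(gen.send(10))  # 15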
The main difference between a Python generator and other coroutine implementations, greenlet for example, is that in Python the yielded value can only go back to the caller. In greenlet, target.switch(value) can take you to a specific target coroutine and yield a value there, where the target then continues to run.
I am writing a top-down parser which consists of a top-level function that initiates a recursive parse down the text with lower-level functions. Note that the lower-level functions never call the top-level function, but the lower-level functions are mutually recursive.
I noticed that the parser runs somewhat slowly, and I suspect this to be caused by exponential growth in the recursion, because the parser might repeatedly try to parse the same type of object on the same text at the same offset, resulting in wasted effort.
For this reason I want to memoize the lower-level function calls, but after the top-level function returns, I want to clear the memoization cache to release the memory.
That means that if the user calls the top-level function multiple times with the same parameters, the program should actually go through the whole parsing procedure again.
My motivation is that it is unlikely the same text will be parsed at top-level multiple times, so the memory overhead is not worth it (each parse will generate a fairly large cache).
One possible solution is to rewrite all the lower-level functions to take an additional cache argument like this:
def low_level_parse(text, start, cache):
    if (text, start) not in cache:
        # Do something to compute a result
        # ...
        cache[(text, start)] = result
    return cache[(text, start)]
and rewrite all calls to the low-level functions to pass down the cache argument (which is initially set to {} in the top-level function).
Unfortunately there are many low-level parse functions, and each may also call other low-level parse functions many times. Refactoring the code to implement caching this way would be very tedious and error prone.
Another solution would be to use decorators, and I believe this would be best in terms of maintainability, but I don't know how to implement the memoize decorator in such a way that its cache exists only during the top-level function scope.
I also thought of defining the cache as a global variable in my module, and clear it explicitly after returning from the top-level function. This would spare me the need to modify the low-level functions to take the cache argument explicitly, and I could then use a memoize decorator that makes use of the global cache. But I am not sure the global cache would be a good idea if this is used in a multi-threaded environment.
I found this link to Decorators with Arguments which I think is what is needed here:
class LowLevelProxy:
    def __init__(self, cache):
        self.cache = cache

    def __call__(self, f):
        def wrapped_f(*args, **kwargs):
            key = (f, args)  # <== had to remove kwargs as dicts cannot be keys
            if key not in self.cache:
                result = f(*args, **kwargs)
                self.cache[key] = result
            return self.cache[key]
        return wrapped_f
NB each function that is wrapped will have its own section in the cache.
You might be able to wrap each of your low-level functions like this:
@LowLevelProxy(cache)
def low_level(param_1, param_2):
    # do stuff
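One hedged way to get the "cache only lives for one top-level parse" behaviour on top of this: create the cache dict before decorating, and clear it when the top-level function finishes. The top_level name and the call signature below are illustrative, not from the original code.

cache = {}   # shared by every function wrapped with LowLevelProxy(cache)

def top_level(text):
    try:
        # low_level (and its siblings) consult and fill `cache` as they recurse.
        return low_level(text, 0)
    finally:
        cache.clear()   # release the memory once this parse is done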
Python 3.5 greatly expanded support for asynchronous programming with a new function definition syntax. Whereas async functions were previously just "generators with benefits":
def generate_numbers():
    """
    Generator function that lazily returns 1 - 100
    """
    for i in range(100):
        yield i
generate_async = asyncio.coroutine(generate_numbers)
generate_async.__doc__ = """
Coroutine that lazily returns 1 - 100
This can be used interchangeably as a generator or a coroutine
"""
they now have their own special declaration syntax and special behavior by which they are no longer usable as usual generator functions:
async def generate_async_native():
    """
    A coroutine that returns 1 - 100
    This CANNOT be used as a generator, and can ONLY be executed by running it from an event loop
    """
    for i in range(100):
        await i
This is not a question about the functional or practical differences between these types -- that is discussed in this StackOverflow answer.
My question is: why would I ever want to use async def? It seems like it provides no additional benefit over @asyncio.coroutine, but imposes an additional cost in that it
breaks backward-compatibility (Python 3.5 code with async def
won't even parse in older versions, although this is arguably a feature and not a bug) and
seems to provide less flexibility in how the function can be called.
One possible answer is given by Martijn Pieters:
The advantages are that with native support, you can also introduce additional syntax to support asynchronous context managers and iterators. Entering and exiting a context manager, or looping over an iterator, then can become more points in your co-routine that signal that other code can run instead because something is waiting again.
This has in fact come to fruition with new async with and async for syntax, which cannot be as easily implemented with a "tack-on" solution like a decorated generator.
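As a small sketch of what async for looks like in practice (this needs Python 3.6+ for async generators and 3.7+ for asyncio.run; the names are illustrative):

import asyncio

async def generate_numbers_async():
    for i in range(1, 101):
        await asyncio.sleep(0)   # a point where other tasks may run
        yield i

async def main():
    async for number in generate_numbers_async():
        print(number)

asyncio.run(main())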