Python generators and reduce - python

I am working on a Python 3 Tornado web server with asynchronous coroutines for GET requests, using the @gen.coroutine decorator. I want to use this function from a library:
@gen.coroutine
def foo(x):
    yield do_something(x)
which is simple enough:
@gen.coroutine
def get(self):
    x = self.some_parameter
    yield response(foo(x))
Now assume there are multiple functions foo1, foo2, etc. of the same type. I want to do something like ...foo3(foo2(foo1(x).result()).result())... and yield that instead of just response(foo(x)) in the get method.
I thought this would be easy with reduce and the result method. However, because of how tornado works, I cannot force the foos to return something with the result method. This means that yield reduce(...) gives an error: "DummyFuture does not support blocking for results". From other answers on SO and elsewhere, I know I will have to use IOLoop or something, which I didn't really understand, and...
...my question is, how can I avoid evaluating all the foos and yield that unevaluated chunk from the get method?
Edit: This is not a duplicate of this question because I want to: 1. nest a lot of functions and 2. try not to evaluate immediately.

In Tornado, you must yield a Future inside a coroutine in order to get a result. Review Tornado's coroutine guide.
You could write a reducer that is a coroutine. It runs each coroutine to get a Future, calls yield with the Future to get a result, then runs the next coroutine on that result:
from tornado.ioloop import IOLoop
from tornado import gen
@gen.coroutine
def f(x):
    # Just to prove we're really a coroutine.
    yield gen.sleep(1)
    return x * 2
@gen.coroutine
def g(x):
    return x + 1
@gen.coroutine
def h():
    return 10
@gen.coroutine
def coreduce(*funcs):
    # Start by calling last function in list.
    result = yield funcs[-1]()
    # Call remaining functions.
    for func in reversed(funcs[:-1]):
        result = yield func(result)
    return result
# Wrap in lambda to satisfy your requirement, to
# NOT evaluate immediately.
latent_result = lambda: coreduce(f, g, h)
final_result = IOLoop.current().run_sync(latent_result)
print(final_result)
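For comparison, here is a minimal sketch of the same reducer written with native async/await and the standard-library asyncio loop instead of Tornado's decorator. This is an assumption on my part about what a modern equivalent could look like, not something from the answer above; the short sleep duration is arbitrary.

```python
import asyncio


async def f(x):
    # Sleep briefly to show that this step really suspends.
    await asyncio.sleep(0.01)
    return x * 2


async def g(x):
    return x + 1


async def h():
    return 10


async def coreduce(*funcs):
    # Innermost call first: the last function takes no arguments.
    result = await funcs[-1]()
    # Feed each result into the next function, moving outward.
    for func in reversed(funcs[:-1]):
        result = await func(result)
    return result


def latent_result():
    # Nothing runs until this is called and the coroutine is driven by a loop.
    return coreduce(f, g, h)


final_result = asyncio.run(latent_result())
print(final_result)  # f(g(h())) = (10 + 1) * 2 = 22
```

As in the Tornado version, no function in the chain is evaluated until the event loop actually runs the wrapped call.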

Related

Terminating multiprocess pool when one of the workers found proper solution

I've created a program, which can be summed up as something like this:
from itertools import combinations
class Test(object):
    def __init__(self, t2):
        self.another_class_object = t2
    def function_1(self,n):
        a = 2
        while(a <= n):
            all_combs = combinations(range(n),a)
            for comb in all_combs:
                if(another_class_object.function_2(comb)):
                    return 1
            a += 1
        return -1
Function combinations is imported from itertools. Function_2 returns True or False depending on the input and is a method in another class object, e.g.:
class Test_2(object):
    def __init__(self, list):
        self.comb_list = list
    def function_2(self,c):
        return c in self.comb_list
Everything is working just fine. But now I want to change it a little bit and implement multiprocessing. I found this topic that shows an example of how to exit the script when one of the worker process determines no more work needs to be done. So I made following changes:
added a definition of pool into __init__ method: self.pool = Pool(processes=8)
created a callback function:
all_results = []
def callback_function(self, result):
    self.all_results.append(result)
    if(result):
        self.pool.terminate()
changed function_1:
def function_1(self,n):
    a = 2
    while(a <= n):
        all_combs = combinations(range(n),a)
        for comb in all_combs:
            self.pool.apply_async(self.another_class_object.function_2, args=comb, callback=self.callback_function)
        #self.pool.close()
        #self.pool.join()
        if(True in all_results):
            return 1
        a += 1
    return -1
Unfortunately, it does not work as I expected. Why? After debugging it looks like the callback function is never reached. I thought that it would be reached by every worker. Am I wrong? What can be the problem?
I did not try your code as such, but I tried your structure. Are you sure the problem is in the callback function and not the worker function? I could not get apply_async to launch a single instance of the worker function when that function was a class method. It just did not do anything: apply_async completes without error, but the worker never runs.
As soon as I moved the worker function (in your case another_class_object.function2) as a standalone global function outside classes, it started working as expected and the callback was triggered normally. The callback function, in contrast, seems to work fine as a class method.
There seems to be discussion about this for example here: Why can I pass an instance method to multiprocessing.Process, but not a multiprocessing.Pool?
Is this in any way useful?
Hannu
Question: ... not work as I expected. ... What can be the problem?
It's always necessary to get() the Results from pool.apply_async(... to see the Errors from the Pool Processes.
Change to the following:
pp = []
for comb in all_combs:
    pp.append(pool.apply_async(func=self.another_class_object.function_2, args=comb, callback=self.callback_function))
pool.close()
for ar in pp:
    print('ar=%s' % ar.get())
And you will see this Error:
TypeError: function_2() takes 2 positional arguments but 3 were given
Fix for this Error, change args=comb to args=(comb,):
pp.append(pool.apply_async(func=self.another_class_object.function_2, args=(comb,), callback=self.callback_function))
Tested with Python: 3.4.2
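To make the args point concrete, here is a small self-contained sketch. It swaps in multiprocessing.pool.ThreadPool (same apply_async API) so it runs without pickling concerns; WANTED and the module-level function_2 are hypothetical stand-ins for the asker's objects.

```python
from multiprocessing.pool import ThreadPool

WANTED = [(1, 3)]  # hypothetical stand-in for comb_list


def function_2(c):
    # Takes ONE argument: the whole combination tuple.
    return c in WANTED


comb = (1, 3)
with ThreadPool(processes=2) as pool:
    try:
        # args=comb is unpacked, so this calls function_2(1, 3).
        pool.apply_async(function_2, args=comb).get()
        error = None
    except TypeError as exc:
        # get() re-raises the exception that happened in the worker.
        error = exc
    # args=(comb,) passes the tuple as a single argument: function_2((1, 3)).
    ok = pool.apply_async(function_2, args=(comb,)).get()

print(type(error).__name__, ok)
```

Without the call to get(), the TypeError would stay hidden inside the result object and the callback would simply never fire, which matches the symptom described in the question.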

Iterating a loop using await or yield causes error

I come from the land of Twisted/Klein. I come in peace and to ask for Tornado help. I'm investigating Tornado and how its take on async differs from Twisted. Twisted has something similar to gen.coroutine which is defer.inlineCallbacks and I'm able to write async code like this:
kleinsample.py
@app.route('/endpoint/<int:n>')
@defer.inlineCallbacks
def myRoute(request, n):
    jsonlist = []
    for i in range(n):
        yield jsonlist.append({'id': i})
    return json.dumps(jsonlist)
curl cmd:
curl localhost:9000/json/2000
This endpoint will create a JSON string with n number of elements. n can be small or very big. I'm able to break it up in Twisted such that the event loop won't block using yield. Now here's how I tried to convert this into Tornado:
tornadosample.py
async def get(self, n):
    jsonlist = []
    for i in range(n):
        await gen.Task(jsonlist.append, {'id': i}) # exception here
    self.write(json.dumps(jsonlist))
The traceback:
TypeError: append() takes no keyword arguments
I'm confused about what I'm supposed to do to properly iterate each element in the loop so that the event loop doesn't get blocked. Does anyone know the "Tornado" way of doing this?
You cannot and must not await append, since it isn't a coroutine and doesn't return a Future. If you want to occasionally yield to allow other coroutines to proceed using Tornado's event loop, await gen.moment.
from tornado import gen
async def get(self, n):
    jsonlist = []
    for i in range(n):
        jsonlist.append({'id': i})
        if not i % 1000:  # Yield control for a moment every 1k ops
            await gen.moment
    return json.dumps(jsonlist)
That said, unless this function is extremely CPU-intensive and requires hundreds of milliseconds or more to complete, you're probably better off just doing all your computation at once instead of taking multiple trips through the event loop before your function returns.
list.append() returns None, so it's a little misleading that your Klein sample looks like it's yielding some object. This is equivalent to jsonlist.append(...); yield as two separate statements. The tornado equivalent would be to do await gen.moment in place of the bare yield.
Also note that in Tornado, handlers produce their responses by calling self.write(), not by returning values, so the return statement should be self.write(json.dumps(jsonlist)).
Let's have a look at gen.Task docs:
Adapts a callback-based asynchronous function for use in coroutines.
Takes a function (and optional additional arguments) and runs it with those arguments plus a callback keyword argument. The argument passed to the callback is returned as the result of the yield expression.
Since append doesn't accept keyword arguments, it doesn't know what to do with that callback kwarg and raises that exception.
What you could do is wrap append in your own function that does accept a callback kwarg, or use the approach shown in this answer.
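The "yield control every so often" idea from the answers can be sketched with only the standard library. Here asyncio.sleep(0) plays the role that gen.moment plays in Tornado; that substitution, the build_json name, and the ticker task are my assumptions for illustration.

```python
import asyncio
import json


async def build_json(n, every=1000):
    items = []
    for i in range(n):
        items.append({'id': i})  # plain synchronous call, nothing to await
        if i % every == 0:
            # Hand control back to the event loop so other tasks can run.
            await asyncio.sleep(0)
    return json.dumps(items)


async def main():
    ticks = []

    async def ticker():
        # A competing task; it only advances when the loop gets control.
        for _ in range(3):
            ticks.append(len(ticks))
            await asyncio.sleep(0)

    body, _ = await asyncio.gather(build_json(2500), ticker())
    return body, ticks


body, ticks = asyncio.run(main())
print(len(json.loads(body)), ticks)
```

The list is still built with ordinary append calls; only the occasional await gives other coroutines a turn.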

what's the difference between yield from and yield in python 3.3.2+

Since Python 3.3, Python supports a new syntax for writing generator functions:
yield from <expression>
I gave it a quick try:
>>> def g():
...     yield from [1,2,3,4]
...
>>> for i in g():
...     print(i)
...
1
2
3
4
>>>
It seems simple to use, but the PEP document is complex. My question is: are there any other differences compared to the plain yield statement? Thanks.
For most applications, yield from just yields everything from the iterable on its right, in order:
def iterable1():
    yield 1
    yield 2
def iterable2():
    yield from iterable1()
    yield 3
assert list(iterable2()) == [1, 2, 3]
For 90% of users who see this post, I'm guessing that this will be explanation enough for them. yield from simply delegates to the iterable on the right hand side.
Coroutines
However, there are some more esoteric generator circumstances that also matter here. A lesser-known fact about generators is that they can be used as co-routines. This isn't super common, but you can send data to a generator if you want:
def coroutine():
    x = yield None
    yield 'You sent: %s' % x
c = coroutine()
next(c)
print(c.send('Hello world'))
Aside: You might be wondering what the use-case is for this (and you're not alone). One example is the contextlib.contextmanager decorator. Co-routines can also be used to parallelize certain tasks. I don't know too many places where this is taken advantage of, but google app-engine's ndb datastore API uses it for asynchronous operations in a pretty nifty way.
Now, let's assume you send data to a generator that is yielding data from another generator ... How does the original generator get notified? The answer is that it doesn't in python2.x, where you need to wrap the generator yourself:
def python2_generator_wrapper():
    for item in some_wrapped_generator():
        yield item
At least not without a whole lot of pain:
def python2_coroutine_wrapper():
    """This doesn't work. Somebody smarter than me needs to fix it. . .
    Pain. Misery. Death lurks here :-("""
    # See https://www.python.org/dev/peps/pep-0380/#formal-semantics for actual working implementation :-)
    g = some_wrapped_generator()
    for item in g:
        try:
            val = yield item
        except Exception as forward_exception:  # What exceptions should I not catch again?
            g.throw(forward_exception)
        else:
            if val is not None:
                g.send(val)  # Oops, we just consumed another cycle of g ... How do we handle that properly ...
This all becomes trivial with yield from:
def coroutine_wrapper():
    yield from coroutine()
Because yield from truly delegates (everything!) to the underlying generator.
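A short runnable sketch of that delegation, reusing the coroutine() example from earlier in this answer: a value sent to the wrapper arrives inside the wrapped generator, with no forwarding code in between.

```python
def coroutine():
    x = yield None
    yield 'You sent: %s' % x


def coroutine_wrapper():
    # send() and throw() on the wrapper are forwarded to coroutine().
    yield from coroutine()


c = coroutine_wrapper()
next(c)                        # advance to the first yield inside coroutine()
reply = c.send('Hello world')  # the value reaches `x = yield None`
print(reply)                   # You sent: Hello world
```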
Return semantics
Note that the PEP in question also changes the return semantics. While not directly in OP's question, it's worth a quick digression if you are up for it. In python2.x, you can't do the following:
def iterable():
    yield 'foo'
    return 'done'
It's a SyntaxError. With the update to yield, the above function is now legal. Again, the primary use-case is with coroutines (see above). You can send data to the generator and it can do its work magically (maybe using threads?) while the rest of the program does other things. When flow control passes back to the generator, StopIteration will be raised (as is normal for the end of a generator), but now the StopIteration will have a data payload. It is the same thing as if a programmer instead wrote:
raise StopIteration('done')
Now the caller can catch that exception and do something with the data payload to benefit the rest of humanity.
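As a minimal sketch of both ways to get at that payload in Python 3.3+: catch StopIteration and read its value attribute, or let yield from hand the return value back as the result of the expression. The delegator name is just for illustration.

```python
def iterable():
    yield 'foo'
    return 'done'


# Option 1: drive the generator by hand and inspect StopIteration.
gen = iterable()
first = next(gen)
try:
    next(gen)
    payload = None
except StopIteration as stop:
    payload = stop.value  # the value given to `return`


# Option 2: `yield from` evaluates to the inner generator's return value.
def delegator():
    result = yield from iterable()
    yield 'inner returned: %s' % result


items = list(delegator())
print(first, payload, items)
```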
At first sight, yield from is an algorithmic shortcut for:
def generator1():
    for item in generator2():
        yield item
    # do more things in this generator
Which is then mostly equivalent to just:
def generator1():
    yield from generator2()
    # more things on this generator
In English: when used inside a generator, yield from yields each element of another iterable, so that to the code calling the first generator each item appears to come from that generator.
The main reason for its creation is to allow easy refactoring of code that relies heavily on iterators. Code that uses ordinary functions has always been able, at very little extra cost, to move blocks of one function into other functions that are then called. That divides tasks, simplifies reading and maintaining the code, and allows more reuse of small code snippets.
So, large functions like this:
def func1():
    # some calculation
    for i in somesequence:
        # complex calculation using i
        # ...
        # ...
        # ...
    # some more code to wrap up results
    # finalizing
    # ...
Can become code like this, without drawbacks:
def func2(i):
    # complex calculation using i
    # ...
    # ...
    # ...
    return calculated_value
def func1():
    # some calculation
    for i in somesequence:
        func2(i)
    # some more code to wrap up results
    # finalizing
    # ...
When getting to iterators however, the form
def generator1():
    for item in generator2():
        yield item
    # do more things in this generator
for item in generator1():
    # do things
requires that for each item consumed from generator2, the running context is first switched to generator1, where nothing is done, and then has to be switched to generator2. When that one yields a value, there is another intermediate context switch to generator1 before the value reaches the code that actually consumes it.
With yield from these intermediate context switches are avoided, which can save quite some resources if there are a lot of iterators chained: the context switches straight from the context consuming the outermost generator to the innermost generator, skipping the context of the intermediate generators altogether, until the inner ones are exhausted.
Later on, the language took advantage of this "tunnelling" through intermediate contexts to use these generators as co-routines: functions that can make asynchronous calls. With the proper framework in place, as described in https://www.python.org/dev/peps/pep-3156/ , a co-routine that needs to call a function that would take a long time to resolve (due to a network operation, or a CPU-intensive operation that can be offloaded to another thread) makes that call with a yield from statement. The framework main loop then arranges for the expensive function to be properly scheduled, and takes back execution (the framework main loop is always the code calling the co-routines themselves). When the expensive result is ready, the framework makes the called co-routine behave like an exhausted generator, and execution of the first co-routine resumes.
From the programmer's point of view it is as if the code was running straight forward, with no interruptions. From the process point of view, the co-routine was paused at the point of the expensive call, and other (possibly parallel calls to the same co-routine) continued running.
So, as part of a web crawler, one might write code along these lines:
@asyncio.coroutine
def crawler(url):
    page_content = yield from async_http_fetch(url)
    urls = parse(page_content)
    ...
Which could fetch tens of html pages concurrently when called from the asyncio loop.
Python 3.4 added the asyncio module to the stdlib as the default provider for this kind of functionality. It worked so well, that in Python 3.5 several new keywords were added to the language to distinguish co-routines and asynchronous calls from the generator usage, described above. These are described in https://www.python.org/dev/peps/pep-0492/
Here is an example that illustrates it:
>>> def g():
...     yield from range(5)
...
>>> list(g())
[0, 1, 2, 3, 4]
>>> def g():
...     yield range(5)
...
>>> list(g())
[range(0, 5)]
>>>
yield from yields each item of the iterable, but yield yields the iterable itself.
The difference is simple:
yield:
[extra info; if you already know how generators work you can skip this]
yield is used to produce a single value from a generator function. Calling the generator function returns a generator object, and the body starts executing on the first call to next(). When a yield statement is encountered, it temporarily suspends the execution of the function, returns the value to the caller, and saves its current state. The next time next() is called on the generator (for example by a for loop), it resumes execution from where it left off and continues until it hits the next yield statement.
In the example below, generator1 and generator2 each return a generator object that produces plain values. combined_generator also returns a generator object, but it has to produce the values of those two inner generators; yield from lets it pass them through one by one instead of yielding the inner generator objects themselves.
class Gen:
    def generator1(self):
        yield 1
        yield 2
        yield 3
    def generator2(self):
        yield 'a'
        yield 'b'
        yield 'c'
    def combined_generator(self):
        """
        Delegates to generator1 and then generator2. `yield from` passes each of
        their values straight through, so our caller consumes values rather than generator objects.
        """
        yield from self.generator1()
        yield from self.generator2()
    def run(self):
        print("Gen running ...")
        for item in self.combined_generator():
            print(item)
g = Gen()
g.run()
The output of the above is:
Gen running ...
1
2
3
a
b
c

When will yield actually yield in the function call stack?

I am working with Tornado and Motor on Python 3.4.3.
I have three files; let's call them main.py, model.py and core.py.
I have three functions, one in each...
main.py
def getLoggedIn(request_handler):
    # request_handler = tornado.web.RequestHandler()
    db = request_handler.settings["db"]
    uid = request_handler.get_secure_cookie("uid")
    result = model.Session.get(db, uid=uid)
    return result.get("_id", None) if result else None
model.py
@classmethod
def get(cls, db, user_id=None, **kwargs):
    session = core.Session(db)
    return session.get(user_id, **kwargs)
core.py
@gen.coroutine
def get(self, user_id, **kwargs):
    params = kwargs
    if user_id:
        params.update({"_id": ObjectId(user_id)})  # This does not exist in DB
    future = self.collection.find_one(params)
    print(future)  # prints <tornado.concurrent.Future object at 0x04152A90>
    result = yield future
    print(result)  # prints None
    return result
The calls look like getLoggedIn => model.get => core.get
core.get is decorated with @gen.coroutine and I call yield self.collection.find_one(params)
The print(result) prints None but if I return result and try to print the return value in getLoggedIn function it prints .
I believe this is related to asynchronous nature of tornado and the print gets called before yield but I am not sure. It would be a great help if someone could explain about coroutine/generators principles and behavior in different possible cases.
PEP 255 covers the original specification for generators.
However, tornado uses yield inside of coroutines in a very specific way: http://www.tornadoweb.org/en/stable/guide/coroutines.html#how-it-works
Your code doesn't really look or smell like an ordinary generator because the Python notion of generators is being co-opted by tornado to define coroutines.
I would say that you don't really want the principles of generator writing, but the principles of tornado generators -- a wholly different beast.
Assigning the value of the yield is a way for the wrapping @gen.coroutine decorator to pass the result of the future back into core.get.
That way, result is not assigned the future object, but future.result().
yield future essentially suspends your function and turns it into a callback that the future will invoke, resuming execution at the location of the yield.
The asynchronous nature of tornado does not allow the print to run before the yield has produced its result, as you worried.
Most likely, your Future is not returning anything, or is returning None (semantically equivalent, I know).
It might be best to think of result = yield future as a specialized version of result = future.result()
Every call to a coroutine must be yielded, and the caller must also be a coroutine. So getLoggedIn must be a coroutine that calls:
result = yield model.Session.get(db, uid=uid)
And so on. See my article on refactoring Tornado coroutines for a detailed example and explanation.
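The rule can be illustrated with a standard-library sketch. It uses native asyncio coroutines as a stand-in for Tornado's decorated ones (an assumption on my part), and the function names and the {'_id': 42} record are hypothetical.

```python
import asyncio
import inspect


async def core_get(params):
    # Stand-in for the database lookup; the record is made up.
    await asyncio.sleep(0)
    return {'_id': 42}


def model_get_sync(params):
    # Plain function: the coroutine is created but never awaited,
    # so the caller receives a pending object, not the record.
    return core_get(params)


async def model_get(params):
    # Coroutine all the way down: await the inner call.
    return await core_get(params)


async def get_logged_in():
    result = await model_get({})
    return result.get('_id') if result else None


pending = model_get_sync({})
was_unresolved = inspect.iscoroutine(pending)
pending.close()  # tidy up the coroutine we never ran

user_id = asyncio.run(get_logged_in())
print(was_unresolved, user_id)
```

The first call shows what happens when one link in the chain is an ordinary function: it hands back an unresolved object, and calling .get("_id") on that would not reach the record.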

Python generator's 'yield' in separate function

I'm implementing a utility library which is a sort-of task manager intended to run within the distributed environment of Google App Engine cloud computing service. (It uses a combination of task queues and memcache to execute background processing). I plan to use generators to control the execution of tasks, essentially enforcing a non-preemptive "concurrency" via the use of yield in the user's code.
The trivial example - processing a bunch of database entities - could be something like the following:
class EntityWorker(Worker):
    def setup(self):
        self.entity_query = Entity.all()
    def run(self):
        for e in self.entity_query:
            do_something_with(e)
            yield
As we know, yield is a two-way communication channel that allows values to be passed to the code that drives the generator. This makes it possible to simulate a "preemptive API" such as the SLEEP call below:
def run(self):
    for e in self.entity_query:
        do_something_with(e)
        yield Worker.SLEEP, timedelta(seconds=1)
But this is ugly. It would be great to hide the yield within a separate function that could be invoked in a simple way:
self.sleep(timedelta(seconds=1))
The problem is that putting yield in function sleep turns it into a generator function. The call above would therefore just return another generator. Only after adding .next() and yield back again we would obtain previous result:
yield self.sleep(timedelta(seconds=1)).next()
which is of course even uglier and more unnecessarily verbose than before.
Hence my question: Is there a way to put yield into function without turning it into generator function but making it usable by other generators to yield values computed by it?
You seem to be missing the obvious:
class EntityWorker(Worker):
    def setup(self):
        self.entity_query = Entity.all()
    def run(self):
        for e in self.entity_query:
            do_something_with(e)
            yield self.sleep(timedelta(seconds=1))
    def sleep(self, wait):
        return Worker.SLEEP, wait
It's the yield that turns functions into generators, it's impossible to leave it out.
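A quick way to check this for yourself, as a small sketch: inspect.isgeneratorfunction reports whether a def contains a yield. The SLEEP constant here is a hypothetical stand-in for Worker.SLEEP.

```python
import inspect
from datetime import timedelta

SLEEP = 'sleep'  # hypothetical stand-in for Worker.SLEEP


def sleep_with_yield(wait):
    # Contains a yield, so calling it only builds a generator object.
    yield SLEEP, wait


def sleep_plain(wait):
    # No yield: an ordinary function that returns the command tuple.
    return SLEEP, wait


def run():
    # The yield stays in the caller; the helper just builds the value.
    yield sleep_plain(timedelta(seconds=1))


is_gen = inspect.isgeneratorfunction(sleep_with_yield)
is_plain = inspect.isgeneratorfunction(sleep_plain)
commands = list(run())
print(is_gen, is_plain, commands)
```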
To hide the yield you need a higher order function, in your example it's map:
from itertools import imap
def slowmap(f, sleep, *iters):
    for row in imap(f, self.entity_query):
        yield Worker.SLEEP, wait
def run():
    return slowmap(do_something_with,
                   (Worker.SLEEP, timedelta(seconds=1)),
                   self.entity_query)
Alas, this won't work. But a "middle-way" could be fine:
def sleepjob(*a, **k):
    if a:
        return Worker.SLEEP, a[0]
    else:
        return Worker.SLEEP, timedelta(**k)
So
yield self.sleepjob(timedelta(seconds=1))
yield self.sleepjob(seconds=1)
looks OK to me.
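On Python 3.3 or later there is one more option that none of the answers here rely on: the helper can be a generator function if the caller delegates to it with yield from. The sketch below assumes that syntax is available; SLEEP and the simplified Worker classes are hypothetical stand-ins for the asker's library.

```python
from datetime import timedelta

SLEEP = 'sleep'  # hypothetical stand-in for Worker.SLEEP


class Worker:
    def sleep(self, wait):
        # The yield lives in the helper; callers delegate to it.
        yield SLEEP, wait


class EntityWorker(Worker):
    def __init__(self, entities):
        self.entities = entities
        self.done = []

    def run(self):
        for e in self.entities:
            self.done.append(e)  # stand-in for do_something_with(e)
            yield from self.sleep(timedelta(seconds=1))


worker = EntityWorker(['a', 'b'])
commands = list(worker.run())
print(commands, worker.done)
```

run is still a generator function, so the keyword does not disappear from user code, but the details of what gets yielded are kept in one place.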
I would suggest you have a look at the ndb. It uses generators as co-routines (as you are proposing here), allowing you to write programs that work with rpcs asynchronously.
The api does this by wrapping the generator with another function that 'primes' the generator (it calls .next() immediately so that the code begins execution). The tasklets are also designed to work with App Engine's rpc infrastructure, making it possible to use any of the existing asynchronous api calls.
With the concurrency model used in ndb, you yield either a future object (similar to what is described in pep-3148) or an App Engine rpc object. When that rpc has completed, the execution in the function that yielded the object is allowed to continue.
If you are using a model derived from ndb.model.Model then the following will allow you to asynchronously iterate over a query:
from ndb import tasklets
@tasklets.tasklet
def run():
    it = iter(Entity.query())
    # Other tasklets will be allowed to run if the next call has to wait for an rpc.
    while (yield it.has_next_async()):
        entity = it.next()
        do_something_with(entity)
Although ndb is still considered experimental (some of its error handling code still needs some work), I would recommend you have a look at it. I have used it in my last 2 projects and found it to be an excellent library.
Make sure you read through the documentation linked from the main page, and also the companion documentation for the tasklet stuff.
