How to understand Tornado gen.coroutine - python

I'm just a newbie to Tornado, but not to Python.
I'm trying to write an async client for CouchDB (using the couchdb package).
After a day of research/Googling, I found that every post simply uses HttpAsyncClient as an example without explaining why or how gen.coroutine works.
The source code is just too complicated for me to understand, decorator after decorator, so HttpAsyncClient is a bad example for me...

Frankly, for me it was only after reading the source that I (partially) understood the logic.
It sort of works like this if you have decorated somefunction() with tornado.gen.coroutine:

1. In the statement yield somefunction(), somefunction() is actually called.
2. Since it's a wrapper, it isn't your code that runs first but the gen.coroutine decorator code. It runs your code until the first yield inside somefunction().
3. If a Future (placeholder) is yielded (back to the decorator code!), Tornado says: "OK, when this Future resolves, schedule a task (IOLoop callback) for the IOLoop so that it calls another_callback()."
4. To track how far the execution of your somefunction() has gone, Tornado maintains a special object called Runner. It remembers the last yield statement that blocked the execution of your somefunction(), and it is first run when the gen.coroutine decorator executes.
5. After point 3, where the Future was "registered", this Runner returns from its main run() method and the decorator exits, returning a Future of its own.
6. When the Future from point 3 is ready, a task is added to the IOLoop, which then calls another_callback. The latter is a special callback created by Tornado; put shortly, it run()s the same Runner that was running when the newly resolved Future was yielded in point 3.
7. The Runner uses the .send() method to inject the value of the newly resolved Future back into your somefunction(), which causes it to be assigned to a variable (if any) in your function, in a statement like this:
   a = yield some_other_coroutine_for_example_async_client_fetch()

OK, that's the gist. There are lots of details, some of which I cannot wrap my head around, especially exception handling, but HTH.
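To make the steps above concrete, here is a heavily stripped-down sketch of the same mechanism. It is a toy illustration only, not Tornado's actual Future/Runner classes, and all names in it are illustrative:

class Future(object):
    def __init__(self):
        self._callbacks = []
        self.result = None
    def add_done_callback(self, cb):
        self._callbacks.append(cb)
    def set_result(self, value):
        self.result = value
        for cb in self._callbacks:
            cb(self)

class Runner(object):
    def __init__(self, gen):
        self.gen = gen
    def run(self, value=None):
        try:
            future = self.gen.send(value)   # run the coroutine until its next yield (points 2 and 7)
        except StopIteration:
            return                          # the coroutine finished
        # point 3: when this future resolves, schedule the next step
        future.add_done_callback(lambda f: self.run(f.result))

pending = Future()                          # stands in for e.g. an HTTP fetch

def somefunction():
    a = yield pending                       # execution freezes here
    print('resumed with: %s' % a)

Runner(somefunction()).run()                # points 2-5: runs up to the yield, then returns
pending.set_result('response body')         # point 6: simulates the IOLoop resolving the future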

The first answer covers a lot. I just want to offer a simpler abstraction.
@gen.coroutine
res = yield foo()
The coroutine idea is to execute foo() asynchronously, especially when foo() does a lot of I/O or network work.
The yield fires up the foo() execution and transfers control out, say, back to the caller. It leaves a Runner to register this foo() task as a Future object. Then, when foo() successfully returns a result, the magic happens: the Runner sends the result back as the result of the yield expression (note the difference between the value being yielded and the result of the yield expression).
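For reference, here is a minimal, hedged example of this pattern using Tornado's actual API (the URL is a placeholder and fetch_page is mine, not from the original post):

from tornado import gen, ioloop
from tornado.httpclient import AsyncHTTPClient

@gen.coroutine
def fetch_page():
    client = AsyncHTTPClient()
    # fetch() returns a Future; yield suspends fetch_page() until it resolves
    response = yield client.fetch('http://example.com/')
    raise gen.Return(response.body)   # Python 2-compatible way to "return" a value

# run_sync() starts the IOLoop, runs the coroutine to completion, then stops the loop
body = ioloop.IOLoop.current().run_sync(fetch_page)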

Related

Can't pass awaitable to asyncio.run_coroutine_threadsafe

I have observed that the asyncio.run_coroutine_threadsafe function does not accept general awaitable objects, and I do not understand the reason for this restriction. Observe
import asyncio

async def native_coro():
    return

@asyncio.coroutine
def generator_based_coro():
    return

class Awaitable:
    def __await__(self):
        return asyncio.Future()

loop = asyncio.get_event_loop()

asyncio.run_coroutine_threadsafe(native_coro(), loop)
asyncio.run_coroutine_threadsafe(generator_based_coro(), loop)
asyncio.run_coroutine_threadsafe(Awaitable(), loop)
Running this with Python 3.6.6 yields
Traceback (most recent call last):
  File "awaitable.py", line 24, in <module>
    asyncio.run_coroutine_threadsafe(Awaitable(), loop)
  File "~/.local/python3.6/lib/python3.6/asyncio/tasks.py", line 714, in run_coroutine_threadsafe
    raise TypeError('A coroutine object is required')
TypeError: A coroutine object is required
where line 24 is asyncio.run_coroutine_threadsafe(Awaitable(), loop).
I know I can wrap my awaitable object in a coroutine defined like
awaitable = Awaitable()

async def wrapper():
    return await awaitable

asyncio.run_coroutine_threadsafe(wrapper(), loop)
however my expectation was that the awaitable would be a valid argument directly to run_coroutine_threadsafe.
My questions are:
What is the reason for this restriction?
Is the wrapper function defined above the most conventional way to pass an awaitable to run_coroutine_threadsafe and other APIs that demand an async def or generator-defined coroutine?
What is the reason for this restriction?
Looking at the implementation, the reason is certainly not technical. Since the code already invokes ensure_future (rather than, say, create_task), it would automatically work, and work correctly, on any awaitable object.
The reason for the restriction can be found on the tracker. The function was added in 2015 as a result of a pull request. In the discussion on the related bpo issue the submitter explicitly requests the function be renamed to ensure_future_threadsafe (in parallel to ensure_future) and accept any kind of awaitable, a position seconded by Yury Selivanov. However, Guido was against the idea:
I'm against that idea. I don't really see a great important future for this method either way: It's just a little bit of glue between the threaded and asyncio worlds, and people will learn how to use it by finding an example.
[...]
But honestly I don't want to encourage flipping back and forth between threads and event loops; I see it as a necessary evil. The name we currently have is fine from the POV of someone coding in the threaded world who wants to hand off something to the asyncio world.
Why would someone in the threaded world have an asyncio.future that they need to wait for? That sounds like they're mixing up the two worlds -- or they should be writing asyncio code instead of threaded code.
There are other comments in a similar vein, but the above pretty much sums up the argument.
Is the wrapper function defined above the most conventional way to pass an awaitable to run_coroutine_threadsafe and other APIs that demand an async def or generator-defined coroutine?
If you actually need a coroutine object, something like wrapper is certainly a straightforward and correct way to get one.
If the only reason you're creating the wrapper is to call run_coroutine_threadsafe, but you're not actually interested in the result or the concurrent.futures.Future returned by run_coroutine_threadsafe, you can avoid the wrapping by calling call_soon_threadsafe directly:
loop.call_soon_threadsafe(asyncio.ensure_future, awaitable)
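If you do need the result back on the calling thread for an arbitrary awaitable, you can bridge it yourself. The helper below is a hypothetical sketch (run_awaitable_threadsafe is not part of asyncio); it only relies on documented calls (call_soon_threadsafe, ensure_future, add_done_callback):

import asyncio
import concurrent.futures

def run_awaitable_threadsafe(awaitable, loop):
    """Hypothetical helper: schedule any awaitable on the given loop from
    another thread and return a concurrent.futures.Future with its result."""
    done = concurrent.futures.Future()

    def schedule():
        task = asyncio.ensure_future(awaitable)  # accepts any awaitable (3.5.1+)

        def copy_result(t):
            if t.cancelled():
                done.cancel()
            elif t.exception() is not None:
                done.set_exception(t.exception())
            else:
                done.set_result(t.result())

        task.add_done_callback(copy_result)

    loop.call_soon_threadsafe(schedule)
    return done

This roughly mirrors what run_coroutine_threadsafe does internally, minus the coroutine-only type check.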

How "yield" works in tornado.gen.coroutine [duplicate]

I'm new to the concept of non-blocking IO, and there is something I'm having trouble understanding about coroutines. Consider this code:
class UserPostHandler(RequestHandler):
    @gen.coroutine
    def get(self):
        var = 'some variable'
        data = json.loads(self.request.body)
        # async, non-blocking db insert call
        yield motor_db.users.insert({self.request.remote_ip: data})
        # success
        self.set_status(201)
        print var
When the get function is called, it creates the string var. What happens to this variable while the function waits for the motor insert to complete? To my understanding, "non-blocking" implies that no thread is waiting for the IO call to complete and no memory is being used while waiting. So where is the value of var stored? How is it accessible when execution resumes?
Any help would be appreciated!
The memory for var is still being used while insert executes, but the get function itself is "frozen", which allows other functions to execute. Tornado's coroutines are implemented using Python generators, which allow function execution to be temporarily suspended when a yield occurs, and then be restarted again (with the function's state preserved) after the yield point. Here's how the behavior is described in the PEP that introduced generators:
If a yield statement is encountered, the state of the function is
frozen, and the value [yielded] is returned to .next()'s caller. By
"frozen" we mean that all local state is retained, including the
current bindings of local variables, the instruction pointer, and the
internal evaluation stack: enough information is saved so that the
next time .next() is invoked, the function can proceed exactly as if
the yield statement were just another external call.
The @gen.coroutine decorator has magic in it that ties into Tornado's event loop, so that the Future returned by the insert call is registered with the event loop, allowing the get generator to be restarted when the insert call completes.
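A quick way to see this "freezing" for yourself, with a plain generator and no Tornado at all (a toy example, not the handler above):

def demo():
    var = 'some variable'              # local state created before the yield
    result = yield                     # execution freezes here; var is kept alive
    print('%s / %s' % (var, result))   # resumes later with all locals intact

gen = demo()
next(gen)                              # run up to the first yield
try:
    gen.send('insert finished')        # resume; prints, then the generator finishes
except StopIteration:
    pass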

Creating an asynchronous method with Google App Engine's NDB

I want to make sure I understand how to create tasklets and asynchronous methods. What I have is a method that returns a list. I want it to be called from somewhere and immediately allow other calls to be made. So I have this:
future_1 = get_updates_for_user(userKey, aDate)
future_2 = get_updates_for_user(anotherUserKey, aDate)
somelist.extend(future_1)
somelist.extend(future_2)
....
@ndb.tasklet
def get_updates_for_user(userKey, lastSyncDate):
    noteQuery = ndb.GqlQuery('SELECT * FROM Comments WHERE ANCESTOR IS :1 AND modifiedDate > :2', userKey, lastSyncDate)
    note_list = list()
    qit = noteQuery.iter()
    while (yield qit.has_next_async()):
        note = qit.next()
        noteDic = note.to_dict()
        note_list.append(noteDic)
    raise ndb.Return(note_list)
Is this code doing what I'd expect it to do? Namely, will the two calls run asynchronously? Am I using futures correctly?
Edit: Well, after testing, the code does produce the desired results. I'm a newbie to Python - what are some ways to test whether the methods are running async?
It's pretty hard to verify for yourself that the methods are running concurrently -- you'd have to put copious logging in. Also in the dev appserver it'll be even harder as it doesn't really run RPCs in parallel.
Your code looks okay, it uses yield in the right place.
My only recommendation is to name your function get_updates_for_user_async() -- that matches the convention NDB itself uses and is a hint to the reader of your code that the function returns a Future and should be yielded to get the actual result.
An alternative way to do this is to use the map_async() method on the Query object; it would let you write a callback that just contains the to_dict() call:
@ndb.tasklet
def get_updates_for_user_async(userKey, lastSyncDate):
    noteQuery = ndb.gql('...')
    note_list = yield noteQuery.map_async(lambda note: note.to_dict())
    raise ndb.Return(note_list)
Advanced tip: you can simplify this even more by dropping the @ndb.tasklet decorator and just returning the Future returned by map_async():
def get_updates_for_user_async(userKey, lastSyncDate):
    noteQuery = ndb.gql('...')
    return noteQuery.map_async(lambda note: note.to_dict())
This is a general slight optimization for async functions that contain only one yield and immediately return the value yielded. (If you don't immediately get this you're in good company, and it risks being broken by a future maintainer who doesn't get it either. :-)
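As a hedged sketch of the calling side (variable names taken from the question): both queries are started back to back, and it is get_result() that actually waits on each NDB Future:

# Kick off both queries; each call returns an NDB Future immediately.
future_1 = get_updates_for_user_async(userKey, aDate)
future_2 = get_updates_for_user_async(anotherUserKey, aDate)

# Both RPCs are now in flight; get_result() blocks until each one finishes.
somelist.extend(future_1.get_result())
somelist.extend(future_2.get_result())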

Python generator's 'yield' in separate function

I'm implementing a utility library which is a sort-of task manager intended to run within the distributed environment of the Google App Engine cloud computing service. (It uses a combination of task queues and memcache to execute background processing.) I plan to use generators to control the execution of tasks, essentially enforcing a non-preemptive "concurrency" via the use of yield in the user's code.
The trivial example - processing a bunch of database entities - could be something like the following:
class EntityWorker(Worker):
    def setup():
        self.entity_query = Entity.all()

    def run():
        for e in self.entity_query:
            do_something_with(e)
            yield
As we know, yield is a two-way communication channel, which allows values to be passed into the code that uses generators. This makes it possible to simulate a "preemptive API" such as the SLEEP call below:
def run():
    for e in self.entity_query:
        do_something_with(e)
        yield Worker.SLEEP, timedelta(seconds=1)
But this is ugly. It would be great to hide the yield within a separate function which could be invoked in a simple way:
self.sleep(timedelta(seconds=1))
The problem is that putting yield in the sleep function turns it into a generator function. The call above would therefore just return another generator. Only after adding .next() and yield back again would we obtain the previous result:
yield self.sleep(timedelta(seconds=1)).next()
which is of course even more ugly and unnecessarily verbose than before.
Hence my question: is there a way to put yield into a function without turning it into a generator function, while still making it usable by other generators to yield values computed by it?
You seem to be missing the obvious:
class EntityWorker(Worker):
    def setup(self):
        self.entity_query = Entity.all()

    def run(self):
        for e in self.entity_query:
            do_something_with(e)
            yield self.sleep(timedelta(seconds=1))

    def sleep(self, wait):
        return Worker.SLEEP, wait
It's the yield that turns functions into generators, it's impossible to leave it out.
To hide the yield you need a higher order function, in your example it's map:
from itertools import imap

def slowmap(f, sleep, *iters):
    for row in imap(f, *iters):
        yield sleep

def run():
    return slowmap(do_something_with,
                   (Worker.SLEEP, timedelta(seconds=1)),
                   self.entity_query)
Alas, this won't work. But a "middle-way" could be fine:
def sleepjob(*a, **k):
    if a:
        return Worker.SLEEP, a[0]
    else:
        return Worker.SLEEP, timedelta(**k)
So
yield self.sleepjob(timedelta(seconds=1))
yield self.sleepjob(seconds=1)
look OK to me.
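For completeness, here is a hedged sketch of the consumer side: a toy driver (not part of the question's library) that honours the (Worker.SLEEP, timedelta) commands yielded by run():

import time
from datetime import timedelta

class Worker(object):
    SLEEP = 'sleep'

def drive(gen):
    # Pull commands out of the worker generator and act on them.
    for command in gen:
        if command and command[0] == Worker.SLEEP:
            time.sleep(command[1].total_seconds())

def run():
    for e in range(3):                      # stands in for self.entity_query
        print('processing %s' % e)
        yield Worker.SLEEP, timedelta(seconds=1)

drive(run())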
I would suggest you have a look at ndb. It uses generators as coroutines (as you are proposing here), allowing you to write programs that work with RPCs asynchronously.
The API does this by wrapping the generator with another function that "primes" the generator (it calls .next() immediately so that the code begins execution). The tasklets are also designed to work with App Engine's RPC infrastructure, making it possible to use any of the existing asynchronous API calls.
With the concurrency model used in ndb, you yield either a future object (similar to what is described in PEP 3148) or an App Engine RPC object. When that RPC has completed, execution in the function that yielded the object is allowed to continue.
If you are using a model derived from ndb.model.Model then the following will allow you to asynchronously iterate over a query:
from ndb import tasklets

@tasklets.tasklet
def run():
    it = iter(Entity.query())
    # Other tasklets will be allowed to run if the next call has to wait for an rpc.
    while (yield it.has_next_async()):
        entity = it.next()
        do_something_with(entity)
Although ndb is still considered experimental (some of its error handling code still needs some work), I would recommend you have a look at it. I have used it in my last 2 projects and found it to be an excellent library.
Make sure you read through the documentation linked from the main page, and also the companion documentation for the tasklet stuff.

Python asynchronous callbacks and generators

I'm trying to convert a synchronous library to use an internal asynchronous IO framework. I have several methods that look like this:
def foo():
    ....
    sync_call_1()  # synchronous blocking call
    ....
    sync_call_2()  # synchronous blocking call
    ....
    return bar
For each of the synchronous functions (sync_call_*), I have written a corresponding async function that takes a callback. E.g.
def async_call_1(callback=None):
    # do the I/O
    callback()
Now for the Python newbie question: what's the easiest way to translate the existing methods to use these new async methods instead? That is, the method foo() above now needs to be:
def async_foo(callback):
    # Do the foo() stuff using async_call_*
    callback()
One obvious choice is to pass into each async method a callback that effectively "resumes" the calling foo() function, and then to call the outer callback at the very end of the method. However, that makes the code brittle and ugly, and I would need to add a new callback for every call to an async_call_* method.
Is there an easy way to do that using a python idiom, such as a generator or coroutine?
UPDATE: take this with a grain of salt, as I'm out of touch with modern python async developments, including gevent and asyncio and don't actually have serious experience with async code.
There are 3 common approaches to thread-less async coding in Python:
Callbacks - ugly but workable, Twisted does this well.
Generators - nice but require all your code to follow the style.
Use a Python implementation with real tasklets - Stackless (RIP) and greenlet.
Unfortunately, ideally the whole program should use one style, or things become complicated. If you are OK with your library exposing a fully synchronous interface, you are probably OK, but if you want several calls to your library to work in parallel, especially in parallel with other async code, then you need a common event "reactor" that can work with all the code.
So if you have (or expect the user to have) other async code in the application, adopting the same model is probably smart.
If you don't want to understand the whole mess, consider using bad old threads. They are also ugly, but work with everything else.
If you do want to understand how coroutines might help you (and how they might complicate your life), David Beazley's "A Curious Course on Coroutines and Concurrency" is good stuff.
Greenlets might actually be the cleanest way if you can use the extension. I don't have any experience with them, so I can't say much.
There are several ways to multiplex tasks. We can't say which is best for your case without deeper knowledge of what you are doing. Probably the easiest and most universal way is to use threads. Take a look at this question for some ideas.
You need to make function foo also async. How about this approach?
@make_async
def foo(somearg, callback):
    # This function is now async. Expect a callback argument.
    ...
    # change
    # x = sync_call1(somearg, some_other_arg)
    # to the following:
    x = yield async_call1, somearg, some_other_arg
    ...
    # same transformation again
    y = yield async_call2, x
    ...
    # change
    # return bar
    # to a callback call
    callback(bar)
And make_async can be defined like this:
def make_async(f):
    """Decorator to convert a sync function to async
    using the above-mentioned transformations."""
    def g(*a, **kw):
        async_call(f(*a, **kw))
    return g

def async_call(it, value=None):
    # This function is the core of the async transformation.
    try:
        # send the current value to the iterator and
        # expect a function to call and args to pass to it
        x = it.send(value)
    except StopIteration:
        return
    func = x[0]
    args = list(x[1:])
    # define a callback and append it to args
    # (assuming that the callback is always the last argument)
    callback = lambda new_value: async_call(it, new_value)
    args.append(callback)
    func(*args)
CAUTION: I haven't tested this
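As a hedged, self-contained usage sketch of this approach (assuming make_async and async_call as defined above; the toy async_call1/async_call2 here invoke their callbacks immediately instead of doing real I/O):

def async_call1(somearg, some_other_arg, callback):
    callback(somearg + some_other_arg)   # pretend I/O, call back right away

def async_call2(x, callback):
    callback(x * 2)

def show(result):
    print('result: %s' % result)

@make_async
def foo(somearg, some_other_arg, callback):
    x = yield async_call1, somearg, some_other_arg
    y = yield async_call2, x
    callback(y)

foo(1, 2, callback=show)                 # prints: result: 6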
