Is tornado an async webserver? - python

I am learning to write a backend server that can handle thousands of connections.
I looked at some sample code, but it still seems to be written in a synchronous style.
For example (taken from http://www.tornadoweb.org/en/stable/gen.html):
@gen.coroutine
def get(self):
    http_client = AsyncHTTPClient()
    response1, response2 = yield [http_client.fetch(url1), http_client.fetch(url2)]
    print(response1.body, response2.body)
Obviously the print statement cannot execute before both fetches have returned their responses, or it would raise an exception for accessing data that does not exist yet.
So there must be some blocking between the last two lines. But isn't Tornado known for being non-blocking, asynchronous, and event-driven, which is what lets it handle thousands of connections?

Yes, Tornado is asynchronous. The example you're showing is a coroutine; it is non-blocking and releases control back to the Tornado event loop at the yield. Control returns to the get function only when both http_client.fetch calls have completed.
These two examples are actually functionally equivalent in tornado:
class AsyncHandler(RequestHandler):
    @asynchronous
    def get(self):
        http_client = AsyncHTTPClient()
        http_client.fetch("http://example.com",
                          callback=self.on_fetch)

    def on_fetch(self, response):
        do_something_with_response(response)
        self.render("template.html")
And a coroutine version:
class GenAsyncHandler(RequestHandler):
    @gen.coroutine
    def get(self):
        http_client = AsyncHTTPClient()
        response = yield http_client.fetch("http://example.com")
        do_something_with_response(response)
        self.render("template.html")
Coroutines allow you to write asynchronous code that looks synchronous, which is more readable. When the above code hits the yield, get suspends and yields the Future object returned by http_client.fetch to the gen.coroutine decorator. The gen.coroutine decorator has magic in it that schedules the result of that Future to be passed back into get once it's ready.
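To take some of the magic out of that description, here is a minimal sketch of the idea using only plain generators. It is not Tornado's implementation: ToyFuture and toy_coroutine are hypothetical names made up for illustration, and the "I/O" is completed by hand instead of by an event loop.

```python
class ToyFuture:
    """Hypothetical stand-in for a Future: stores callbacks to run on completion."""

    def __init__(self):
        self._callbacks = []

    def add_done_callback(self, fn):
        self._callbacks.append(fn)

    def set_result(self, value):
        for fn in self._callbacks:
            fn(value)


def toy_coroutine(func):
    """Drive a generator: each yielded ToyFuture resumes it when resolved."""

    def wrapper(*args, **kwargs):
        gen = func(*args, **kwargs)

        def step(value=None):
            try:
                future = gen.send(value)    # run until the next yield
            except StopIteration:
                return                      # the generator has finished
            future.add_done_callback(step)  # resume later instead of blocking

        step()

    return wrapper


log = []
pending = ToyFuture()  # plays the role of the Future returned by fetch()


@toy_coroutine
def get():
    log.append("before yield")
    response = yield pending
    log.append("got " + response)


get()                        # runs up to the yield, then returns immediately
log.append("event loop is free")
pending.set_result("body")   # the "I/O" completes, so get() resumes
print(log)  # ['before yield', 'event loop is free', 'got body']
```

The point to notice is that get() returns to its caller at the yield, so nothing is blocked while the result is outstanding.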

Yes!
It is easy to confuse the two different patterns for using yield and coroutines: long-running jobs versus slow I/O operations. I made a critical mistake on exactly that.
In long-running jobs
The next() method of the generator is called repeatedly, and each call does one unit of work.
If more than one coroutine is running, the scheduler calls each one's next() method in turn, so the jobs share CPU time. The jobs cooperate with each other, hence the name coroutine.
In slow I/O operations
The next() method is called only once for each yield point.
When the code yields at the point where it starts an I/O operation, that operation has been delegated to the OS kernel. The scheduler registers a callback that fires when the I/O operation completes, and that callback calls next().
The remaining question is how the scheduler knows when to call next(). This is powered by OS-level asynchronous facilities such as epoll or IOCP, which notify the scheduler when the I/O has completed.
So the whole flow is: run to the point that starts the I/O, then yield to hand over execution. After the I/O completes, the scheduler calls next() and execution continues.
The effect of this control flow is exactly the same as with the callback pattern. Both:
run to a point that does I/O
hand over execution so the process can do other jobs
continue running once the I/O has completed
The only difference is that one continues in the same function, while the other continues in a new callback function.
So, in summary:
in long-running jobs, the scheduler calls each next() method whenever the process is idle.
in slow I/O operations, next() is called only once per yield point, when the I/O operation has finished.
I think once you realize that, you will see that yield and coroutines can have the same power as callbacks.
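The long-running-jobs case can be sketched with a tiny round-robin scheduler. This is only an illustration under my own assumptions: job and run_round_robin are hypothetical names, and a real scheduler would also handle the I/O case by calling next() from a completion callback rather than in a loop.

```python
from collections import deque


def job(name, steps, log):
    """A long-running job split into units of work, one per next() call."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back to the scheduler


def run_round_robin(jobs):
    """Call next() on each generator in turn until all are exhausted."""
    queue = deque(jobs)
    while queue:
        gen = queue.popleft()
        try:
            next(gen)
        except StopIteration:
            continue       # this job is done, drop it
        queue.append(gen)  # not done yet, give it another turn later


log = []
run_round_robin([job("a", 2, log), job("b", 3, log)])
print(log)  # ['a:0', 'b:0', 'a:1', 'b:1', 'b:2']
```

The interleaved output shows the jobs sharing CPU time cooperatively: each one runs only until its next yield.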

Related

Replacing asyncio with concurrent.futures

asyncio is causing issues in my Spyder IDE, so I would like to replace it with the concurrent.futures library.
How can I replace the code below so that it relies only on concurrent.futures?
asyncio.get_event_loop().run_until_complete(api(message))
The exact function looks as follows:
def async_loop(api, message):
    return asyncio.get_event_loop().run_until_complete(api(message))
As written, you're starting up the event loop only until a particular task completes (which may or may not launch or wait on other tasks), and blocking until it completes. The only reason it's a task is that it needs to use async functions; those can only run in an event loop, and while running they may launch other tasks or wait on other awaitables, during which the event loop can do other work.
In short, if not for the need to be an async task running in a non-async context, this would just be:
def async_loop(api, message):
    return api(message)
which calls api and waits for it to complete.
Really, that's it. If the things api does or calls need to run some tasks asynchronously, without blocking on them immediately, you'd have some global executor, e.g.
executor = concurrent.futures.ThreadPoolExecutor()
which would be used to launch tasks with:
fut = executor.submit(callable, 'arg1', 'arg2', kwarg1='somevalue')
and, when the result of the task is needed, someone would call:
value = fut.result()
on it (which would block if it wasn't done yet, return the result if it completed without an exception, or raise the exception it died with if it died with an exception).
Whenever you no longer need the executor, you just call .shutdown() on it and it will wait for all outstanding tasks to complete. That's it.
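Put together, the executor lifecycle described above might look like the following sketch. The functions slow_square and failing_task are hypothetical placeholders for whatever work api would need to run in the background.

```python
import concurrent.futures


def slow_square(x):
    """Hypothetical task standing in for real background work."""
    return x * x


def failing_task():
    """Hypothetical task that dies with an exception."""
    raise ValueError("boom")


executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)

fut = executor.submit(slow_square, 7)
value = fut.result()          # blocks until done, then returns the result

bad = executor.submit(failing_task)
try:
    bad.result()              # re-raises the exception the task died with
except ValueError as exc:
    error = str(exc)

executor.shutdown(wait=True)  # waits for outstanding tasks to complete
print(value, error)  # 49 boom
```

In longer-lived code you would more likely use the executor as a context manager (with ThreadPoolExecutor() as executor:), which calls shutdown for you on exit.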
As a side-note, the error you're experiencing is part of why they've deprecated get_event_loop() in 3.10 (and discouraged it since 3.7). In all likelihood, the simplest solution to your problem (avoiding a switch to threads, because all that means is you've got new problems) is to use the much simpler high-level API, asyncio.run (introduced in 3.7), which creates an event loop, runs the task in it to completion, does reasonable cleanup, then returns the result:
def async_loop(api, message):
    return asyncio.run(api(message))
There's also the asyncio.get_running_loop function (the exact replacement for get_event_loop), which you use when an event loop already exists. You should typically be aware of that: event loops don't pop into existence in a given thread on their own, so you should know if you launched one. In this case you hadn't, so asyncio.run is the correct one to use.
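As a self-contained sketch of the asyncio.run variant: the api coroutine below is a hypothetical stand-in, since the real one was not shown in the question.

```python
import asyncio


async def api(message):
    """Hypothetical coroutine standing in for the asker's api function."""
    await asyncio.sleep(0)  # placeholder for real awaitable work
    return message.upper()


def async_loop(api, message):
    # asyncio.run creates a fresh event loop, runs the coroutine to
    # completion, closes the loop, and returns the coroutine's result.
    return asyncio.run(api(message))


result = async_loop(api, "hello")
print(result)  # HELLO
```

One caveat worth knowing: asyncio.run raises RuntimeError if an event loop is already running in the current thread. If the IDE's console runs its own loop (which may be what is happening in Spyder, though I have not verified it), calling this from a plain script or a separate thread avoids the conflict.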

is asyncio.to_thread() method different to ThreadPoolExecutor?

I see that the asyncio.to_thread() method was added in Python 3.9+. Its description says it runs blocking code in a separate thread so that it runs right away. See the example below:
import asyncio
import time

def blocking_io():
    print(f"start blocking_io at {time.strftime('%X')}")
    # Note that time.sleep() can be replaced with any blocking
    # IO-bound operation, such as file operations.
    time.sleep(1)
    print(f"blocking_io complete at {time.strftime('%X')}")

async def main():
    print(f"started main at {time.strftime('%X')}")
    await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.sleep(1))
    print(f"finished main at {time.strftime('%X')}")

asyncio.run(main())
# Expected output:
#
# started main at 19:50:53
# start blocking_io at 19:50:53
# blocking_io complete at 19:50:54
# finished main at 19:50:54
From the explanation, it seems to use a thread mechanism rather than context switching or coroutines. Does this mean it is not actually async after all? Is it the same as traditional multithreading, as in concurrent.futures.ThreadPoolExecutor? What is the benefit of using a thread this way, then?
The source code of to_thread is quite simple. It boils down to awaiting run_in_executor with the default executor (the executor argument is None), which is a ThreadPoolExecutor.
In fact, yes, this is traditional multithreading: the code intended to run on a separate thread is not asynchronous, but to_thread allows you to await its result asynchronously.
Also note that the function runs in a copy of the current task's context, so context variable values will be available inside func.
async def to_thread(func, /, *args, **kwargs):
    """Asynchronously run function *func* in a separate thread.

    Any *args and **kwargs supplied for this function are directly passed
    to *func*. Also, the current :class:`contextvars.Context` is propagated,
    allowing context variables from the main thread to be accessed in the
    separate thread.

    Return a coroutine that can be awaited to get the eventual result of *func*.
    """
    loop = events.get_running_loop()
    ctx = contextvars.copy_context()
    func_call = functools.partial(ctx.run, func, *args, **kwargs)
    return await loop.run_in_executor(None, func_call)
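To see that equivalence in action, here is a small sketch (Python 3.9+ assumed) that calls the same blocking function both ways. The names request_id and blocking_read are hypothetical, chosen only for this illustration.

```python
import asyncio
import contextvars
import functools
import threading

# Hypothetical context variable used to observe context propagation.
request_id = contextvars.ContextVar("request_id", default="unset")


def blocking_read():
    """Runs in a worker thread; reports (is main thread?, context value)."""
    on_main = threading.current_thread() is threading.main_thread()
    return on_main, request_id.get()


async def main():
    request_id.set("abc123")

    # High-level helper.
    via_to_thread = await asyncio.to_thread(blocking_read)

    # Roughly what it does internally: copy the context, then hand the
    # call to the loop's default executor (None selects it).
    loop = asyncio.get_running_loop()
    ctx = contextvars.copy_context()
    call = functools.partial(ctx.run, blocking_read)
    via_executor = await loop.run_in_executor(None, call)

    return via_to_thread, via_executor


results = asyncio.run(main())
print(results)  # ((False, 'abc123'), (False, 'abc123'))
```

Both calls run off the main thread and both see the value set in the calling task, which is the context propagation the docstring describes.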
You would use asyncio.to_thread whenever you need to call a blocking API from a third-party library that either does not have an asyncio adapter/interface, or where you do not want to create one because you only need a limited number of functions from that library.
A concrete example: I am currently writing an application that will eventually run as a daemon, at which point it will use asyncio for its core event loop. The event loop will involve monitoring a Unix socket for notifications, which will trigger the daemon to take an action.
For rapid prototyping it is currently a CLI, but one of the dependencies/external systems the daemon will interact with is called libvirt, an abstraction layer for virtual machine management written in C, with a Python wrapper called libvirt-python.
The Python bindings are blocking and communicate with the libvirt daemon over a separate Unix socket using a blocking request/response protocol.
You can conceptually think of a call to the libvirt bindings as each function internally making an HTTP request to a server and waiting for the server to complete the action. The exact mechanics of how it does that are not important for this discussion, only that it is a blocking I/O operation that depends on an external process and may take some time, i.e. this is not a CPU-bound call and therefore it can be offloaded to a thread and awaited.
If I were to directly call “domains = libvirt.conn.listAllDomains()” in an async function, that would block my asyncio event loop until I got a response from libvirt.
So if any events were received on the Unix socket my main loop is monitoring, they would not be processed while we are waiting for the libvirt daemon to look up all domains and return the list to us.
If I use “domains = await asyncio.to_thread(libvirt.conn.listAllDomains)” instead, the await suspends my current coroutine until we get the response, yielding execution back to the asyncio event loop. That means that if the daemon receives a notification while we are waiting on libvirt, it can be scheduled to run concurrently instead of being blocked.
In my application I will also need to read and write Linux special files in /sys. Linux has native AIO file support, which can be used with asyncio via aiofile; however, Linux does not support the AIO interface for special files, so I would have to use blocking I/O.
One way to do that in an async application would be to wrap the function that writes to the special files in asyncio.to_thread.
I could, and might, use a decorator that calls run_in_executor directly, since I own the write_sysfs function; but if I did not, to_thread is more polite than monkeypatching someone else’s library and less work than creating my own wrapper API.
Hopefully those are useful examples of where you might want to use to_thread. It is really just a convenience function, and you can use run_in_executor to do the same thing with some additional overhead.
If you need to support older Python releases you might also prefer run_in_executor, since it predates the introduction of to_thread; but if you can assume 3.9+, it is a nice addition to leverage when you need to.
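Here is a rough, runnable sketch of that situation without needing libvirt installed. blocking_lookup is a hypothetical stand-in for a call like listAllDomains (it just sleeps), and heartbeat simulates the other events the loop has to keep servicing.

```python
import asyncio
import time


def blocking_lookup():
    """Hypothetical stand-in for a blocking library call."""
    time.sleep(0.2)
    return ["domain-a", "domain-b"]


async def heartbeat(ticks):
    """Simulates other events the event loop must keep processing."""
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    # The blocking call runs in a worker thread, so the loop stays free
    # to run heartbeat() while we wait for the result.
    domains = await asyncio.to_thread(blocking_lookup)
    beat.cancel()
    return domains, len(ticks)


domains, tick_count = asyncio.run(main())
print(domains, tick_count)
```

If you replace the to_thread line with a direct call to blocking_lookup(), the heartbeat gets no chance to run during the 0.2 seconds and the tick count stays at zero, which is exactly the stall described above.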

Python asyncio: synchronize all access to a shared object

I have a class which processes a bunch of work elements asynchronously (mainly due to overlapping HTTP connection requests) using asyncio. A very simplified example to demonstrate the structure of my code:
import asyncio
from concurrent.futures import ThreadPoolExecutor

class Work:
    ...

    def worker(self, item):
        # do some work on item...
        return

    def queue(self):
        # generate the work items...
        yield from range(100)

    async def run(self):
        with ThreadPoolExecutor(max_workers=10) as executor:
            loop = asyncio.get_event_loop()
            tasks = [
                loop.run_in_executor(executor, self.worker, item)
                for item in self.queue()
            ]
            for result in await asyncio.gather(*tasks):
                pass

work = Work()
asyncio.run(work.run())
In practice, the workers need to access a shared container-like object and call its methods which are not async-safe. For example, let's say the worker method calls a function defined like this:
def func(shared_obj, value):
    for node in shared_obj.filter(value):
        shared_obj.remove(node)
However, calling func from a worker might affect the other asynchronous workers in this or any other function involving the shared object. I know that I need to use some synchronization, such as a global lock, but I don't find its usage easy:
asyncio.Lock can be used only in async functions, so I would have to mark all such function definitions as async
I would also have to await all calls of these functions
await is also usable only in async functions, so eventually all functions between worker and func would be async
if the worker was async, it would not be possible to pass it to loop.run_in_executor (it does not await)
Furthermore, some of the functions where I would have to add async may be generic in the sense that they should be callable from asynchronous as well as "normal" context.
I'm probably missing something serious in the whole concept. With the threading module, I would just create a lock and work with it in a couple of places, without having to further annotate the functions. Also, there is a nice solution to wrap the shared object such that all access is transparently guarded by a lock. I'm wondering if something similar is possible with asyncio...
"I'm probably missing something serious in the whole concept. With the threading module, I would just create a lock..."
What you are missing is that you're not really using asyncio at all. run_in_executor serves to integrate CPU-bound or legacy sync code into an asyncio application. It works by submitting the function to a ThreadPoolExecutor and returning an awaitable handle which gets resolved once the function completes. This is "async" in the sense of running in the background, but not in the sense that is central to asyncio. An asyncio program is composed of non-blocking pieces that use async/await to suspend execution when data is unavailable and rely on the event loop to efficiently wait for multiple events at once and resume appropriate async functions.
In other words, as long as you rely on run_in_executor, you are just using threading (more precisely concurrent.futures with a threading executor). You can use a threading.Lock to synchronize between functions, and things will work exactly as if you used threading in the first place.
To get the benefits of asyncio such as scaling to a large number of concurrent tasks or reliable cancellation, you should design your program as async (or mostly async) from the ground up. Then you'll be able to modify shared data atomically simply by doing it between two awaits, or use asyncio.Lock for synchronized modification across awaits.
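For the run_in_executor design from the question, a sketch of the threading.Lock approach could look like this. LockedList is a hypothetical wrapper invented for the example (the real shared object and its filter/remove methods were not shown), but the shape is the same: the lock lives inside the wrapper and the workers stay ordinary synchronous functions.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedList:
    """Hypothetical wrapper: every method call holds the same lock."""

    def __init__(self):
        self._items = []
        self._lock = threading.Lock()

    def append_if_absent(self, value):
        with self._lock:  # the check-then-act sequence is atomic under the lock
            if value not in self._items:
                self._items.append(value)

    def snapshot(self):
        with self._lock:
            return list(self._items)


def worker(shared, item):
    # Plain synchronous function: no async/await is needed for locking.
    shared.append_if_absent(item % 10)


async def run():
    shared = LockedList()
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=10) as executor:
        tasks = [loop.run_in_executor(executor, worker, shared, i)
                 for i in range(100)]
        await asyncio.gather(*tasks)
    return shared.snapshot()


result = asyncio.run(run())
print(sorted(result))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Nothing between worker and the shared object had to become async, which addresses the concern about async spreading through the call chain.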

Twisted callRemote

I have to make remote calls that can take quite a long time (over 60 seconds). Our entire code relies on processing the return value from the callRemote, so that's pretty bad since we're blocking on I/O the whole time despite using Twisted with 50 worker threads running.
We currently use something like
result = threads.blockingCallFromThread(reactor, callRemote, "method", args)
and get the result/go on, but as its name says it's blocking the event loop so we cannot wait for several results at the same time.
There's no way I can refactor the whole code to make it asynchronous, so I think the only way is to defer the long I/O tasks to threads.
I'm trying to make the remote calls in threads, but I can't find a way to get the result from the blocking calls back. The remoteCalls are made, the result is somewhere but I just can't get a hook on it.
What I'm trying to do currently looks like
reactor.callInThread(callRemote, name, *args, **kw)
which returns an empty Deferred (why?).
I'm trying to put the result in some sort of queue but it just won't work. How do I do that?
AFAIK, blockingCallFromThread executes code in the reactor's thread. That's why it doesn't work the way you need.
If I understand you correctly, you need to move some operation out of the reactor's thread and get the result back into the reactor's thread.
I use deferToThread for the same case.
Example with deferreds:
import time
from twisted.internet import reactor, threads

def doLongCalculation():
    time.sleep(1)
    return 3

def printResult(x):
    print(x)

# run method in thread and get result as defer.Deferred
d = threads.deferToThread(doLongCalculation)
d.addCallback(printResult)
reactor.run()
Also, you might be interested in threads.deferToThreadPool.
Documentation about threading in Twisted.

Executing python code in parallel with ndb tasklets

First of all, I know I can use threading to accomplish such a task, like so:
import Queue
import threading

# called by each thread
def do_stuff(q, arg):
    result = heavy_operation(arg)
    q.put(result)

operations = range(1, 10)
q = Queue.Queue()
for op in operations:
    t = threading.Thread(target=do_stuff, args=(q, op))
    t.daemon = True
    t.start()

s = q.get()
print(s)
However, in Google App Engine there's something called ndb tasklets, and according to the documentation you can execute code in parallel using them.
Tasklets are a way to write concurrently running functions without
threads; tasklets are executed by an event loop and can suspend
themselves blocking for I/O or some other operation using a yield
statement. The notion of a blocking operation is abstracted into the
Future class, but a tasklet may also yield an RPC in order to wait for
that RPC to complete.
Is it possible to accomplish something like the example with threading above?
I already know how to retrieve entities using get_async() (from the examples on their docs page), but it's very unclear to me how that applies to parallel code execution.
Thanks.
The answer depends on what your heavy_operation really is. If heavy_operation uses RPCs (Remote Procedure Calls, such as datastore access, UrlFetch, etc.), then the answer is yes.
I asked a similar question in "how to understand appengine ndb.tasklet?"; you may find more details there:
May I put any kind of code inside a function and decorate it as ndb.tasklet, then use it as an async function later? Or must it be an App Engine RPC?
The Answer
Technically yes, but it will not run asynchronously. When you decorate a non-yielding function with @tasklet, its Future's value is computed and set when you call that function. That is, it runs through the entire function when you call it. If you want to achieve asynchronous operation, you must yield on something that does asynchronous work. Generally in GAE it will work its way down to an RPC call.
