How do you understand the ioloop in tornado? - python

I am looking for a way to understand ioloop in tornado, since I read the official doc several times, but can't understand it. Specifically, why it exists.
from tornado.concurrent import Future
from tornado.httpclient import AsyncHTTPClient
from tornado.ioloop import IOLoop
def async_fetch_future():
http_client = AsyncHTTPClient()
future = Future()
fetch_future = http_client.fetch(
"http://mock.kite.com/text")
fetch_future.add_done_callback(
lambda f: future.set_result(f.result()))
return future
response = IOLoop.current().run_sync(async_fetch_future)
# why get current IO of this thread? display IO, hard drive IO, or network IO?
print response.body
I know what is IO, input and output, e.g. read a hard drive, display graph on the screen, get keyboard input.
by definition, IOLoop.current() returns the current io loop of this thread.
There are many IO device on my laptop running this python code. Which IO does this IOLoop.current() return? I never heard of IO loop in javascript nodejs.
Furthermore, why do I care this low level thing if I just want to do a database query, read a file?

I never heard of IO loop in javascript nodejs.
In node.js, the equivalent concept is the event loop. The node event loop is mostly invisible because all programs use it - it's what's running in between your callbacks.
In Python, most programs don't use an event loop, so when you want one, you have to run it yourself. This can be a Tornado IOLoop, a Twisted Reactor, or an asyncio event loop (all of these are specific types of event loops).
Tornado's IOLoop is perhaps confusingly named - it doesn't do any IO directly. Instead, it coordinates all the different IO (mainly network IO) that may be happening in the program. It may help you to think of it as an "event loop" or "callback runner".

Rather to say it is IOLoop, maybe EventLoop is clearer for you to understand.
IOLoop.current() doesn't really return an IO device but just a pure python event loop which is basically the same as asyncio.get_event_loop() or the underlying event loop in nodejs.
The reason why you need event loop to just do a database query is that you are using event-driven structure to do databse query(In your example, you are doing http request).
Most of time you do not need to care about this low level structure. Instead you just need to use async&await keywords.
Let's say there is a lib which supports asynchronous database access:
async def get_user(user_id):
user = await async_cursor.execute("select * from user where user_id = %s" % user_id)
return user
Then you just need to use this function in your handler:
class YourHandler(tornado.web.RequestHandler):
async def get():
user = await get_user(self.get_cookie("user_id"))
if user is None:
return self.finish("No such user")
return self.finish("Your are %s" % user.user_name)

Related

Replace 'While-True'-Loop with something more efficient

Problem
It's very common for beginners to solve IO waiting while concurrent processing in an similar way like here:
#!/usr/bin/env python3
"""Loop example."""
from time import sleep
WAITING: bool = True
COUNTER: int = 10
def process() -> None:
"""Non-blocking routine, that needs to be invoked periodically."""
global COUNTER # pylint: disable=global-statement
print(f"Done in {COUNTER}.")
COUNTER -= 1
sleep(1)
# Mimicking incoming IO callback
if COUNTER <= 0:
event()
def event() -> None:
"""Incoming IO callback routine."""
global WAITING # pylint: disable=global-statement
WAITING = False
try:
while WAITING:
process()
except KeyboardInterrupt:
print("Canceled.")
Possible applications might be servers, what are listening for incomming messages, while still processing some other internal stuff.
Possible Solution 1
Threading might in some cases a good solution.
But after some research it seems that threading adds a lot of overheading for the communcation between the threads.
One example for this might be the 'Warning' in the osc4py3 package documentation below the headline 'No thread'.
Also i have read somewhere the thumb rule, that 'Threading suits not for slow IO' (sorry, lost the source of this rule).
Possible Solution 2
Asynchronous processing (with the asyncio package) might be another solution.
Especially because the ominous thumb rule also says that 'For slow IO is asyncio efficient'.
What i tried
So i tried to rewrite this example with asyncio but failed completely, even after reading about Tasks, Futures and Awaitables in general in the Python asyncio documentation.
My problem was to solve the perodically (instead of one time) call while waiting.
Of course there are infinite loops possible, but all examples i found in the internet are still using 'While-True'-Loops what does not look like an improvement to me.
For example this snippet:
import asyncio
async def work():
while True:
await asyncio.sleep(1)
print("Task Executed")
loop = asyncio.get_event_loop()
try:
asyncio.ensure_future(work())
loop.run_forever()
except KeyboardInterrupt:
pass
finally:
print("Closing Loop")
loop.close()
Source: https://tutorialedge.net/python/concurrency/asyncio-event-loops-tutorial/#the-run_forever-method
What i want
To know the most elegant and efficient way of rewriting these stupid general 'While-True'-Loop from my first example code.
If my 'While-True'-Loop is still the best way to solve it (beside my global variables), then it's also okay to me.
I just want to improve my code, if possible.
What you describe is some kind of polling operation and is similar to busy waiting. You should rarely rely on those methods as they can incur a serious performance penalty if used incorrectly. Instead, you should rely on concurrency primitives provided by the OS of a concurrency library.
As said in a comment, you could rely on a condition or an event (and more broadly on mutexes) to schedule some come to run after an event occurs. For I/O operations you can also rely on low-level OS facilities such as select, poll and signals/interruptions.
Possible applications might be servers, what are listening for
incomming messages, while still processing some other internal stuff.
For such use cases you should really use a dedicated library to do that efficiently. For instance, here is an example of a minimal server developed with AsyncIO's low-level socket operations. Internally, AsyncIO probably uses the select system call and exposes a friendly interface with async-await.
Solution with asyncio:
#!/usr/bin/env python3
"""Asyncronous loop example."""
from typing import Callable
from asyncio import Event, get_event_loop
DONE = Event()
def callback():
"""Incoming IO callback routine."""
DONE.set()
def process():
"""Non-blocking routine, that needs to be invoked periodically."""
print('Test.')
try:
loop = get_event_loop()
run: Callable = lambda loop, processing: (
processing(),
loop.call_soon(run, loop, processing)
)
loop.call_soon(run, loop, process)
loop.call_later(1, callback) # Mimicking incoming IO callback after 1 sec
loop.run_until_complete(DONE.wait())
except KeyboardInterrupt:
print("Canceled.")
finally:
loop.close()
print("Bye.")

Python asyncio in a thread for migrating existing codebase

We have a rather big project that is doing a lot of networking (API calls, Websocket messages) and that also has a lot of internal jobs running in intervals in threads. Our current architecture involves spawning a lot of threads and the app is not working very well when the system is under a big load, so we've decided to give asyncio a try.
I know that the best way would be to migrate the whole codebase to async code, but that is not realistic in the very near future because of the size of the codebase and the limited development resources. However, we would like to start migrating parts of our codebase to use asyncio event loop and hopefully, we will be able to convert the whole project at some point.
The problem we have encountered so far is that the whole codebase has sync code and in order to add non-blocking asyncio code inside, the code needs to be run in different thread since you can't really run async and sync code in the same thread.
In order to combine async and sync code, I came up with this approach of running the asyncio code in a separate thread that is created on app start. Other parts of the code add jobs to this loop simply by calling add_asyncio_task.
import threading
import asyncio
_tasks = []
def threaded_loop(loop):
asyncio.set_event_loop(loop)
global _tasks
while True:
if len(_tasks) > 0:
# create a copy of needed tasks
needed_tasks = _tasks.copy()
# flush current tasks so that next tasks can be easily added
_tasks = []
# run tasks
task_group = asyncio.gather(*needed_tasks)
loop.run_until_complete(task_group)
def add_asyncio_task(task):
_tasks.append(task)
def start_asyncio_loop():
loop = asyncio.get_event_loop()
t = threading.Thread(target=threaded_loop, args=(loop,))
t.start()
and somewhere in app.py:
start_asyncio_loop()
and anywhere else in the code:
add_asyncio_task(some_coroutine)
Since I am new to asyncio, I am wondering if this is a good approach in our situation or if this approach is considered an anti-pattern and has some problems that will hit us later down the road? Or maybe asyncio already has some solution for this and I'm just trying to invent the wheel here?
Thanks for your inputs!
The approach is fine in general. You have some issues though:
(1) Almost all asyncio objects are not thread safe
(2) Your code is not thread safe on its own. What if a task appears after needed_tasks = _tasks.copy() but before _tasks = []? You need a lock here. Btw making a copy is pointless. Simple needed_tasks = _tasks will do.
(3) Some asyncio constructs are thread safe. Use them:
import threading
import asyncio
# asyncio.get_event_loop() creates a new loop per thread. Keep
# a single reference to the main loop. You can even try
# _loop = asyncio.new_event_loop()
_loop = asyncio.get_event_loop()
def get_app_loop():
return _loop
def asyncio_thread():
loop = get_app_loop()
asyncio.set_event_loop(loop)
loop.run_forever()
def add_asyncio_task(task):
asyncio.run_coroutine_threadsafe(task, get_app_loop())
def start_asyncio_loop():
t = threading.Thread(target=asyncio_thread)
t.start()

asyncio and gevent loops - how to communicate between each other?

I'm trying to build bridge between two protocols based on existing libraries, basically do something based on event (like transmit message, or announce it). The problem is that one library is using Gevent loop and the other is using Asyncio loop, so I'm not able to use built-in loop functionality to do signal/event actions on the other loop, and basically no way to access the other loop.
How to setup event-based communication between them? I can't seem to access the other loop from within existing one. I feel like overthinking.
Is there some way to do it via multithreading by sharing objects between loops?
Sample code:
import libraryBot1
import libraryBot2
bot1 = libraryBot1.Client()
bot2 = libraryBot2.Client()
#bot1.on('chat_message')
def handle_message(user, message_text):
bot2.send(message_text)
#bot2.on('send')
def handle_message(message_text):
print(message_text)
if __name__ == "__main__"
# If I login here, then its run_forever on behind the scenes
# So I cant reach second connection
bot1.login(username="username", password="password")
# Never reached
bot2.login(username="username", password="password")
If I on the other side try to use multithreading, then both of them are started, but they can't access each other (communicate).
Here is an example using only gevent. It might be possible to wrap the greenlets in such a way that it would be compatible with asyncio:
import gevent
from gevent.pool import Pool
from gevent.event import AsyncResult
a = AsyncResult()
pool = Pool(2)
def shared(stuff):
print(stuff)
pool.map(bot1.login, username="username", password="password", event=a, shared=shared)
pool.map(bot2.login, username="username", password="password", event=a, shared=shared)
# and then in both you could something like this
if event.get() == 'ready':
shared('some other result to share')
related:
deleted from pypi https://pypi.python.org/pypi/aiogevent/0.2
see ( https://github.com/gevent/gevent/issues/982 )
http://sdiehl.github.io/gevent-tutorial/#events

is it possible to list all blocked tornado coroutines

I have a "gateway" app written in tornado using #tornado.gen.coroutine to transfer information from one handler to another. I'm trying to do some debugging/status testing. What I'd like to be able to do is enumerate all of the currently blocked/waiting coroutines that are live at a given moment. Is this information accessible somewhere in tornado?
You talk about ioloop _handlers dict maybe. Try to add this in periodic callback:
def print_current_handlers():
io_loop = ioloop.IOLoop.current()
print io_loop._handlers
update: I've checked source code and now think that there is no simple way to trace current running gen.corouitines, A. Jesse Jiryu Davis is right!
But you can trace all "async" calls (yields) from coroutines - each yield from generator go into IOLoop.add_callback (http://www.tornadoweb.org/en/stable/ioloop.html#callbacks-and-timeouts)
So, by examining io_loop._callbacks you can see what yields are in ioloop right now.
Many interesting stuff is here :) https://github.com/tornadoweb/tornado/blob/master/tornado/gen.py
No there isn't, but you could perhaps create your own decorator that wraps gen.coroutine, then updates a data structure when the coroutine begins.
import weakref
import functools
from tornado import gen
from tornado.ioloop import IOLoop
all_coroutines = weakref.WeakKeyDictionary()
def tracked_coroutine(fn):
coro = gen.coroutine(fn)
#functools.wraps(coro)
def start(*args, **kwargs):
future = coro(*args, **kwargs)
all_coroutines[future] = str(fn)
return future
return start
#tracked_coroutine
def five_second_coroutine():
yield gen.sleep(5)
#tracked_coroutine
def ten_second_coroutine():
yield gen.sleep(10)
#gen.coroutine
def tracker():
while True:
running = list(all_coroutines.values())
print(running)
yield gen.sleep(1)
loop = IOLoop.current()
loop.spawn_callback(tracker)
loop.spawn_callback(five_second_coroutine)
loop.spawn_callback(ten_second_coroutine)
loop.start()
If you run this script for a few seconds you'll see two active coroutines, then one, then none.
Note the warning in the docs about the dictionary changing size, you should catch "RuntimeError" in "tracker" to deal with that problem.
This is a bit complex, you might get all you need much more simply by turning on Tornado's logging and using set_blocking_log_threshold.

What kind of problems (if any) would there be combining asyncio with multiprocessing?

As almost everyone is aware when they first look at threading in Python, there is the GIL that makes life miserable for people who actually want to do processing in parallel - or at least give it a chance.
I am currently looking at implementing something like the Reactor pattern. Effectively I want to listen for incoming socket connections on one thread-like, and when someone tries to connect, accept that connection and pass it along to another thread-like for processing.
I'm not (yet) sure what kind of load I might be facing. I know there is currently setup a 2MB cap on incoming messages. Theoretically we could get thousands per second (though I don't know if practically we've seen anything like that). The amount of time spent processing a message isn't terribly important, though obviously quicker would be better.
I was looking into the Reactor pattern, and developed a small example using the multiprocessing library that (at least in testing) seems to work just fine. However, now/soon we'll have the asyncio library available, which would handle the event loop for me.
Is there anything that could bite me by combining asyncio and multiprocessing?
You should be able to safely combine asyncio and multiprocessing without too much trouble, though you shouldn't be using multiprocessing directly. The cardinal sin of asyncio (and any other event-loop based asynchronous framework) is blocking the event loop. If you try to use multiprocessing directly, any time you block to wait for a child process, you're going to block the event loop. Obviously, this is bad.
The simplest way to avoid this is to use BaseEventLoop.run_in_executor to execute a function in a concurrent.futures.ProcessPoolExecutor. ProcessPoolExecutor is a process pool implemented using multiprocessing.Process, but asyncio has built-in support for executing a function in it without blocking the event loop. Here's a simple example:
import time
import asyncio
from concurrent.futures import ProcessPoolExecutor
def blocking_func(x):
time.sleep(x) # Pretend this is expensive calculations
return x * 5
#asyncio.coroutine
def main():
#pool = multiprocessing.Pool()
#out = pool.apply(blocking_func, args=(10,)) # This blocks the event loop.
executor = ProcessPoolExecutor()
out = yield from loop.run_in_executor(executor, blocking_func, 10) # This does not
print(out)
if __name__ == "__main__":
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
For the majority of cases, this is function alone is good enough. If you find yourself needing other constructs from multiprocessing, like Queue, Event, Manager, etc., there is a third-party library called aioprocessing (full disclosure: I wrote it), that provides asyncio-compatible versions of all the multiprocessing data structures. Here's an example demoing that:
import time
import asyncio
import aioprocessing
import multiprocessing
def func(queue, event, lock, items):
with lock:
event.set()
for item in items:
time.sleep(3)
queue.put(item+5)
queue.close()
#asyncio.coroutine
def example(queue, event, lock):
l = [1,2,3,4,5]
p = aioprocessing.AioProcess(target=func, args=(queue, event, lock, l))
p.start()
while True:
result = yield from queue.coro_get()
if result is None:
break
print("Got result {}".format(result))
yield from p.coro_join()
#asyncio.coroutine
def example2(queue, event, lock):
yield from event.coro_wait()
with (yield from lock):
yield from queue.coro_put(78)
yield from queue.coro_put(None) # Shut down the worker
if __name__ == "__main__":
loop = asyncio.get_event_loop()
queue = aioprocessing.AioQueue()
lock = aioprocessing.AioLock()
event = aioprocessing.AioEvent()
tasks = [
asyncio.async(example(queue, event, lock)),
asyncio.async(example2(queue, event, lock)),
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
Yes, there are quite a few bits that may (or may not) bite you.
When you run something like asyncio it expects to run on one thread or process. This does not (by itself) work with parallel processing. You somehow have to distribute the work while leaving the IO operations (specifically those on sockets) in a single thread/process.
While your idea to hand off individual connections to a different handler process is nice, it is hard to implement. The first obstacle is that you need a way to pull the connection out of asyncio without closing it. The next obstacle is that you cannot simply send a file descriptor to a different process unless you use platform-specific (probably Linux) code from a C-extension.
Note that the multiprocessing module is known to create a number of threads for communication. Most of the time when you use communication structures (such as Queues), a thread is spawned. Unfortunately those threads are not completely invisible. For instance they can fail to tear down cleanly (when you intend to terminate your program), but depending on their number the resource usage may be noticeable on its own.
If you really intend to handle individual connections in individual processes, I suggest to examine different approaches. For instance you can put a socket into listen mode and then simultaneously accept connections from multiple worker processes in parallel. Once a worker is finished processing a request, it can go accept the next connection, so you still use less resources than forking a process for each connection. Spamassassin and Apache (mpm prefork) can use this worker model for instance. It might end up easier and more robust depending on your use case. Specifically you can make your workers die after serving a configured number of requests and be respawned by a master process thereby eliminating much of the negative effects of memory leaks.
Based on #dano's answer above I wrote this function to replace places where I used to use multiprocess pool + map.
def asyncio_friendly_multiproc_map(fn: Callable, l: list):
"""
This is designed to replace the use of this pattern:
with multiprocessing.Pool(5) as p:
results = p.map(analyze_day, list_of_days)
By letting caller drop in replace:
asyncio_friendly_multiproc_map(analyze_day, list_of_days)
"""
tasks = []
with ProcessPoolExecutor(5) as executor:
for e in l:
tasks.append(asyncio.get_event_loop().run_in_executor(executor, fn, e))
res = asyncio.get_event_loop().run_until_complete(asyncio.gather(*tasks))
return res
See PEP 3156, in particular the section on Thread interaction:
http://www.python.org/dev/peps/pep-3156/#thread-interaction
This documents clearly the new asyncio methods you might use, including run_in_executor(). Note that the Executor is defined in concurrent.futures, I suggest you also have a look there.

Categories