Silently passing exceptions with Future objects - python

I have a problem with exceptions being silently swallowed in Tornado when using futures in situations where I am not explicitly waiting for the result of a future coroutine (yield some_future_obj), such as an infinite-loop coroutine:
@gen.coroutine
def base_func():

    @gen.coroutine
    def tail_something():
        raise

    while True:
        yield some_other_coroutine

base_func()
I have also noticed that this topic has already been discussed: see here or here.
The problem is that if we don't wait for the future's completion explicitly, future.result() will never be called and the exception will never be raised. But tornado.concurrent is committed to using the concurrent.futures package.
Right now I just hang an ioloop.add_future on the current loop and simply execute log.exception(future.result()). But I don't like this approach, since it is a bit noisy (redundant lines in production code).
Please contribute your ideas, or maybe a real answer.

The reason Futures "hide" exceptions is that you have to decide where you want the exception to show up. If you want to be able to handle the exception in your code, you must access its result somewhere (which in a Tornado coroutine means yielding it). If you just want to log the exception, you can ask the IOLoop to do it for you:
IOLoop.instance().add_future(fut, lambda fut: fut.result())
Note that I'm just calling result() instead of logging its value. This ensures that we don't log anything when there is no error, and the exception (with traceback) is logged by IOLoop's normal unhandled-exception machinery.
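For context, here is a minimal sketch of the two options (the coroutine names are made up for illustration, not from the question): yield the future where you want to handle the error, or hand it to the IOLoop purely so the error gets logged.
from tornado import gen
from tornado.ioloop import IOLoop

@gen.coroutine
def may_fail():
    raise ValueError("boom")

@gen.coroutine
def caller():
    try:
        yield may_fail()          # option 1: yield the future and handle the error here
    except ValueError:
        pass                      # recover, log, or re-raise as appropriate

def fire_and_forget():
    fut = may_fail()              # option 2: don't yield the future at all
    # just ask the IOLoop to unwrap it; any exception is logged with its traceback
    IOLoop.instance().add_future(fut, lambda fut: fut.result())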

Related

Python Asyncio - Eventloop in a contextmanager

Since I don't like the approach of using loop.run() for various reasons, I wanted to code a contextual loop, since the docs state on several occasions that if you don't go with the canonical .run() you have to prevent memory leaks yourself (i.e.). After a bit of research it seems like the Python devs' answer to this feature request is "We don't need it!", even though context managers generally seem perfectly fine when you use the lower-level API of asyncio; see PEP 343 - The "with" Statement, example 10:
This can be used to deterministically close anything with a close
method, be it file, generator, or something else. It can even be used
when the object isn’t guaranteed to require closing (e.g., a function
that accepts an arbitrary iterable)
So can we do it anyway?
Related links:
https://bugs.python.org/issue24795
https://bugs.python.org/issue32875
https://groups.google.com/g/python-tulip/c/8bRLexUzeU4
https://bugs.python.org/issue19860#msg205062
https://github.com/python/asyncio/issues/261
Yes, we can have a context manager for our event loop, even though there seems to be no good way to do it via subclassing, due to the C implementations (i.e.). Basically, the idea worked out below is the following:
TL;DR
Create an object with __enter__ and __exit__ so that the with syntax works.
Instead of returning the wrapper object itself, as is usual, return the loop served up by asyncio.
Wrap asyncio.loop.close() so that the loop gets stopped and our __exit__ method gets invoked first.
Close all connections which could lead to memory leaks, and then close the loop.
Side-note
The implementation uses a wrapper object that returns a new loop into an anonymous block statement. Be aware that loop.stop() will finalize the loop, so no further actions should be called on it. Overall, the code below is just a little helper and more of a styling choice in my opinion, especially given that it is not a real subclass. But if someone wants to use the lower-level API without having to remember to finalize everything themselves, here is one possibility.
import asyncio

class OpenLoop:
    def close(self, *args, **kwargs):
        self._loop.stop()

    def _close_wrapper(self):
        self._close = self._loop.close
        self._loop.close = self.close

    def __enter__(self):
        self._loop = asyncio.new_event_loop()
        self._close_wrapper()
        return self._loop

    def __exit__(self, *exc_info):
        asyncio.run(self._loop.shutdown_asyncgens())
        asyncio.run(self._loop.shutdown_default_executor())
        # close other services
        self._close()

if __name__ == '__main__':
    with OpenLoop() as loop:
        loop.call_later(1, loop.close)
        loop.run_forever()

    assert loop.is_closed()

Python coroutines

I have a little bit of experience with promises in Javascript. I am quite experienced with Python, but new to its coroutines, and there is a bit that I just fail to understand: where does the asynchronicity kick in?
Let's consider the following minimal example:
async def gen():
    await something
    return 42
As I understand it, await something puts execution of our function aside and lets the main program run other bits. At some point something has a new result and gen will have a result soon after.
If gen and something are coroutines, then by all internet wisdom they are generators. And the only way to know when a generator has a new item available, afaik, is by polling it: x=gen(); next(x). But this is blocking! How does the scheduler "know" when x has a result? The answer can't be "when something has a result" because something must be a generator, too (for it is a coroutine). And this argument applies recursively.
I can't get past this idea that at some point the process will just have to sit and wait synchronously.
The secret sauce here is the asyncio module. Your something object has to be an awaitable object itself, and must either depend on more awaitable objects or ultimately yield from a Future object.
For example, the asyncio.sleep() coroutine yields a Future:
@coroutine
def sleep(delay, result=None, *, loop=None):
    """Coroutine that completes after a given time (in seconds)."""
    if delay == 0:
        yield
        return result
    if loop is None:
        loop = events.get_event_loop()
    future = loop.create_future()
    h = future._loop.call_later(delay,
                                futures._set_result_unless_cancelled,
                                future, result)
    try:
        return (yield from future)
    finally:
        h.cancel()
(The syntax here uses the older generator syntax, to remain backwards compatible with older Python 3 releases).
Note that a Future doesn't use await or yield from itself; it simply uses yield self until some condition is met. In the above asyncio.sleep() coroutine, that condition is met when a result has been produced (in the asyncio.sleep() code above, via the futures._set_result_unless_cancelled() function called after a delay).
An event loop then keeps pulling in the next 'result' from each pending future it manages (polling them efficiently) until the future signals it is done (by raising a StopIteration exception holding the result; a return from a coroutine would do that, for example). At that point the coroutine that yielded the future can be signalled to continue (either by sending it the future's result, or by throwing in an exception if the future raised anything other than StopIteration).
So for your example, the loop will kick off your gen() coroutine, and await something then (directly or indirectly) yields a future. That future is polled until it raises StopIteration (signalling it is done) or raises some other exception. If the future is done, coroutine.send(result) is executed, allowing it to then advance to the return 42 line, triggering a new StopIteration exception with that value, allowing a calling coroutine awaiting on gen() to continue, etc.
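To make those polling mechanics concrete, here is a toy sketch (deliberately not asyncio; all names are made up): a bare-bones "future" keeps yielding itself until a result is set, and a tiny driver keeps send()-ing into the coroutine until StopIteration carries out the final value.
class ToyFuture:
    def __init__(self):
        self.done = False
        self.result = None

    def set_result(self, value):
        self.result = value
        self.done = True

    def __iter__(self):
        while not self.done:
            yield self          # "not ready yet": hand control back to the driver
        return self.result      # becomes StopIteration(value) for the caller

def gen(fut):
    value = yield from fut      # suspends until fut.done is True
    return value + 42

def drive(coro, fut):
    step = 0
    try:
        while True:
            coro.send(None)     # poll: run the coroutine until its next yield
            step += 1
            if step == 3:       # pretend an external event completes the future
                fut.set_result(100)
    except StopIteration as exc:
        return exc.value        # the coroutine's return value

fut = ToyFuture()
print(drive(gen(fut), fut))     # prints 142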

Detect failed tasks in concurrent.futures

I've been using concurrent.futures since it has a simple interface and lets the user easily control the max number of threads/processes. However, it seems like concurrent.futures hides failed tasks and continues the main thread after all tasks have finished or failed.
import concurrent.futures

def f(i):
    return (i + 's')

with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    fs = [executor.submit(f, i) for i in range(10)]
    concurrent.futures.wait(fs)
Calling f on any integer leads to a TypeError. However, the whole script runs just fine and exits with code 0. Is there any way to make it throw an exception/error when any thread fails?
Or, is there a better way to limit number of threads/processes without using concurrent.futures?
concurrent.futures.wait will ensure all the tasks have completed, but it doesn't distinguish success (something return-ed) from failure (an exception raised and not caught in the worker function). To do that, you need to call .result() on each Future (which will cause it either to re-raise the exception from the task or to produce the return-ed value). There are other methods to check without actually raising in the main thread (e.g. .exception()), but .result() is the most straightforward.
If you want to make it re-raise, the simplest approach is just to replace the wait() call with:
for fut in concurrent.futures.as_completed(fs):
    fut.result()
which will process results as the Futures complete, and promptly raise an Exception if one occurred. Alternatively, you can continue to use wait so all tasks finish before you check for exceptions on any of them, then iterate over fs directly and call .result() on each, as sketched below.
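That alternative might look roughly like this (a sketch reusing the fs list from the question):
# Let every task finish first, then surface any failure.
concurrent.futures.wait(fs)
for fut in fs:
    fut.result()   # re-raises the exception from the first failed task it reaches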
There is another way to do the same thing with multiprocessing.Pool (for processes) or multiprocessing.pool.ThreadPool (for threads). As far as I know, it re-raises any caught exceptions.
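For illustration, a rough sketch of the multiprocessing.pool.ThreadPool route (the function mirrors the f from the question; the wiring is an assumption, not code from the answer):
from multiprocessing.pool import ThreadPool

def f(i):
    return i + 's'                      # raises TypeError for integers

with ThreadPool(10) as pool:
    results = pool.map(f, range(10))    # map() re-raises the worker's TypeError here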

How should I use asyncio in a library without interfering with other callers?

I want to write a library that manages child processes with asyncio. I don't want to force my callers to be asynchronous themselves, so I'd prefer to get a new_event_loop, do a run_until_complete, and then close it. Ideally I'd like to do this without conflicting with any other asyncio stuff the caller might be doing.
My problem is that waiting on subprocesses doesn't work unless you call set_event_loop, which attaches the internal watcher. But of course if I do that, I might conflict with other event loops in the caller. A workaround is to cache the caller's current loop (if any), and then call set_event_loop one more time when I'm done to restore the caller's state. That almost works. But if the caller is not an asyncio user, a side effect of calling get_event_loop is that I've now created a global loop that didn't exist before, and Python will print a scary warning if the program exits without calling close on that loop.
The only meta-workaround I can think of is to do an atexit.register callback that closes the global loop. That won't conflict with the caller because close is safe to call more than once, unless the caller has done something crazy like trying to start the global loop during exit. So it's still not perfect.
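For reference, the meta-workaround described above might look something like this sketch (assuming the global loop is the one obtained with get_event_loop):
import asyncio
import atexit

loop = asyncio.get_event_loop()   # may create the global loop as a side effect
atexit.register(loop.close)       # close() is safe to call more than once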
Is there a perfect solution to this?
What you're trying to achieve looks very much like ProcessPoolExecutor (in concurrent.futures).
Asynchronous caller:
from asyncio import coroutine, get_event_loop
from concurrent.futures import ProcessPoolExecutor

@coroutine
def in_process(callback, *args, executor=ProcessPoolExecutor()):
    loop = get_event_loop()
    result = yield from loop.run_in_executor(executor, callback, *args)
    return result
Synchronous caller:
with ProcessPoolExecutor() as executor:
    future = executor.submit(callback, *args)
    result = future.result()

Communication between threads in Python (without using Global Variables)

Let's say we have a main thread which launches two threads for the test modules "test_a" and "test_b".
Both test-module threads maintain their state: whether they are done performing the test, whether they encountered an error or a warning, or whether they want to report some other information.
How can the main thread get access to this information and act accordingly?
For example, if "test_a" raises an error flag, how will "main" know, and how can it stop the rest of the tests before exiting with an error?
One way to do this is with global variables, but that gets very ugly... very soon.
The obvious solution is to share some kind of mutable variable, by passing it in to the thread objects/functions at constructor/start.
The clean way to do this is to build a class with appropriate instance attributes. If you're using a threading.Thread subclass, instead of just a thread function, you can usually use the subclass itself as the place to stick those attributes. But I'll show it with a list just because it's shorter:
def test_a_func(thread_state):
    # ...
    thread_state[0] = my_error_state
    # ...

def main_thread():
    test_states = [None]
    test_a = threading.Thread(target=test_a_func, args=(test_states,))
    test_a.start()
You can (and usually want to) also pack a Lock or Condition into the mutable state object, so you can properly synchronize between main_thread and test_a.
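For example, a minimal sketch (names are made up) of bundling a Lock together with the shared state and handing it to the worker at construction time:
import threading

class TestState:
    def __init__(self):
        self.lock = threading.Lock()
        self.error = None                    # set by the worker when something goes wrong

def test_a_func(state):
    with state.lock:
        state.error = "error flag from test_a"

state = TestState()
worker = threading.Thread(target=test_a_func, args=(state,))
worker.start()
worker.join()
with state.lock:
    print(state.error)                       # the main thread reads what the worker reported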
(Another option is to use a queue.Queue, an os.pipe, etc. to pass information around, but you still need to get that queue or pipe to the child thread—which you do in the exact same way as above.)
However, it's worth considering whether you really need to do this. If you think of test_a and test_b as "jobs", rather than "thread functions", you can just execute those jobs on a pool, and let the pool handle passing results or errors back.
For example:
import concurrent.futures

try:
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        tests = [executor.submit(job) for job in (test_a, test_b)]
        for test in concurrent.futures.as_completed(tests):
            result = test.result()
except Exception as e:
    pass  # do stuff
Now, if the test_a function raises an exception, the main thread will get that exception; and because that means exiting the with block, all of the other jobs get cancelled and thrown away, and the worker threads shut down.
If you're using 2.5-3.1, you don't have concurrent.futures built in, but you can install the backport off PyPI, or you can rewrite things around multiprocessing.dummy.Pool. (It's slightly more complicated that way, because you have to create a sequence of jobs and call map_async to get back an iterator over AsyncResult objects… but really that's still pretty simple.)
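That fallback might be sketched roughly like this, using the thread-backed pool (the test functions and the call wrapper are assumptions for illustration, not from the original answer):
from multiprocessing.dummy import Pool   # thread-backed Pool with the multiprocessing API

def test_a():
    return "a passed"

def test_b():
    raise RuntimeError("b failed")

def call(job):
    return job()

with Pool(2) as pool:
    async_result = pool.map_async(call, [test_a, test_b])
    results = async_result.get()         # .get() re-raises the RuntimeError from test_b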
