I want to save state into the database of a background aiohttp coroutine before the server is shut down. I was thinking of creating a global list of coroutine jobs that need to be finished and calling await asyncio.gather(*global_jobs) in the shutdown handler.
Is this the proper approach?
I'm not sure I understand what you mean by "the database of a background aiohttp coroutine" but, as far as cleanup actions at shutdown go, you can:
Use a signal handler
import asyncio
import signal
loop = asyncio.get_event_loop()
loop.add_signal_handler(signal.SIGINT, my_signal_handler, *additional_args_list)
In a Unix setting, if you know that the application is going to be interrupted with a specific signal, you can perform cleanup actions selectively for that signal. See loop.add_signal_handler(signum, callback, *args) for further information.
Note: you can employ a callable class rather than a function as callback, so that the class instance can hold a reference to any resource you wish to interact with during shutdown, e.g. the coroutines you mentioned in your question.
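A minimal sketch of the callable-class idea, assuming the cleanup you want is to cancel a known set of tasks. The names ShutdownHandler, worker and install are hypothetical and not part of asyncio, and loop.add_signal_handler is only available on Unix:

```python
import asyncio
import signal


class ShutdownHandler:
    """Callable that remembers which tasks to cancel at shutdown."""

    def __init__(self, tasks):
        self.tasks = list(tasks)

    def __call__(self):
        # Invoked by the event loop when the signal arrives.
        for task in self.tasks:
            task.cancel()


async def worker(log):
    try:
        while True:
            await asyncio.sleep(0.1)  # stand-in for real background work
    except asyncio.CancelledError:
        log.append("state saved")  # stand-in for writing to the database
        raise


def install(loop, tasks):
    """Register the handler for SIGINT and return it (Unix only)."""
    handler = ShutdownHandler(tasks)
    loop.add_signal_handler(signal.SIGINT, handler)
    return handler
```

Because the instance holds the task references itself, nothing has to be passed through *args when registering it.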
Catch asyncio.CancelledError
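import asyncio

async def my_coro():
    try:
        # Normal interaction with aiohttp
        while True:
            await asyncio.sleep(1)  # placeholder; the loop must await to be cancellable
    except asyncio.CancelledError:
        cleanup_actions()
        raise  # re-raise so the task is marked as cancelled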
If you can assume that, at shutdown, the event loop will be stopped cleanly, you can count on your running coroutines to be thrown an asyncio.CancelledError before the loop is closed.
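The idea from the question fits in here: keep references to the background tasks, cancel them at shutdown, and await them with asyncio.gather so each one gets a chance to run its cleanup. A rough sketch, where background_job, shutdown and the in-memory list saved (standing in for the database) are all hypothetical:

```python
import asyncio

saved = []  # stand-in for the database


async def background_job(name):
    try:
        while True:
            await asyncio.sleep(0.1)  # normal work
    except asyncio.CancelledError:
        saved.append(name)  # cleanup: persist state before exiting
        raise


async def shutdown(jobs):
    for job in jobs:
        job.cancel()
    # return_exceptions=True collects each job's outcome instead of
    # raising on the first cancelled job.
    return await asyncio.gather(*jobs, return_exceptions=True)


async def main():
    jobs = [asyncio.create_task(background_job(n)) for n in ("a", "b")]
    await asyncio.sleep(0.05)  # let the jobs start
    return await shutdown(jobs)


results = asyncio.run(main())
```

In a real server, something like shutdown would be awaited from whatever shutdown hook the framework offers, for example an aiohttp on_cleanup handler.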
I have a Python task that reports on its progress during execution using a status updater that has one or more handlers associated with it. I want the updates to be dispatched to each handler asynchronously (each handler is responsible for making an I/O-bound call with the updates: pushing to a queue, logging to a file, calling an HTTP endpoint, etc.). The status updater has a method like the one below, with each handler.dispatch method being a coroutine. This was working until a handler using aiohttp was added, and now I am getting weird errors from the aiohttp module.
def _dispatch(self, **updates):
    event_loop = asyncio.get_event_loop()
    tasks = (event_loop.create_task(handler.dispatch(**updates)) for handler in self._handlers)
    event_loop.run_until_complete(asyncio.gather(*tasks))
Every example of asyncio I've seen basically has this pattern
if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
My question is, is the way I am attempting to use the asyncio module in this case just completely wrong? Does the event loop need to be created once and only once and then everything else goes through that?
I see that asyncio.to_thread() was added in Python 3.9. Its description says it runs blocking code in a separate thread. See the example below:
import asyncio
import time

def blocking_io():
    print(f"start blocking_io at {time.strftime('%X')}")
    # Note that time.sleep() can be replaced with any blocking
    # IO-bound operation, such as file operations.
    time.sleep(1)
    print(f"blocking_io complete at {time.strftime('%X')}")

async def main():
    print(f"started main at {time.strftime('%X')}")

    await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.sleep(1))

    print(f"finished main at {time.strftime('%X')}")

asyncio.run(main())

# Expected output:
#
# started main at 19:50:53
# start blocking_io at 19:50:53
# blocking_io complete at 19:50:54
# finished main at 19:50:54
From the explanation, it seems to use threads rather than context switching or coroutines. Does this mean it is not actually async after all? Is it the same as traditional multithreading, as in concurrent.futures.ThreadPoolExecutor? What is the benefit of using a thread this way, then?
The source code of to_thread is quite simple. It boils down to awaiting run_in_executor with the default executor (the executor argument is None), which is a ThreadPoolExecutor.
In fact, yes, this is traditional multithreading. Code intended to run on a separate thread is not asynchronous, but to_thread allows you to await its result asynchronously.
Also note that the function runs in the context of the current task, so its context variable values will be available inside the func.
async def to_thread(func, /, *args, **kwargs):
    """Asynchronously run function *func* in a separate thread.

    Any *args and **kwargs supplied for this function are directly passed
    to *func*. Also, the current :class:`contextvars.Context` is propagated,
    allowing context variables from the main thread to be accessed in the
    separate thread.

    Return a coroutine that can be awaited to get the eventual result of *func*.
    """
    loop = events.get_running_loop()
    ctx = contextvars.copy_context()
    func_call = functools.partial(ctx.run, func, *args, **kwargs)
    return await loop.run_in_executor(None, func_call)
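To illustrate the context propagation mentioned above, here is a small sketch (Python 3.9+). The variable request_id and the function blocking_read are made-up names for the example:

```python
import asyncio
import contextvars

request_id = contextvars.ContextVar("request_id", default="unset")


def blocking_read():
    # Runs in a worker thread, but inside a copy of the caller's context.
    return request_id.get()


async def main():
    request_id.set("req-42")
    return await asyncio.to_thread(blocking_read)


result = asyncio.run(main())
```

Calling loop.run_in_executor(None, blocking_read) directly would not copy the context, so the worker thread would see the default value instead.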
You would use asyncio.to_thread whenever you need to call a blocking API from a third-party library that either does not have an asyncio adapter/interface, or where you do not want to create one because you only need a limited number of functions from that library.
A concrete example: I am currently writing an application that will eventually run as a daemon, at which point it will use asyncio for its core event loop. The event loop will involve monitoring a Unix socket for notifications, which will trigger the daemon to take an action.
For rapid prototyping it is currently a CLI, but one of the dependencies/external systems the daemon will interact with is called libvirt, an abstraction layer for virtual machine management written in C, with a Python wrapper called libvirt-python.
The Python bindings are blocking and communicate with the libvirt daemon over a separate Unix socket with a blocking request-response protocol.
You can conceptually think of a call to the libvirt bindings as each function internally making an HTTP request to a server and waiting for the server to complete the action. The exact mechanics of how it does that are not important for this discussion, just that it is a blocking I/O operation that depends on an external process and may take some time, i.e. this is not a CPU-bound call and therefore it can be offloaded to a thread and awaited.
If I were to call "domains = libvirt.conn.listAllDomains()" directly in an async function, that would block my asyncio event loop until I got a response from libvirt. So if any events were received on the Unix socket my main loop is monitoring, they would not be processed while we are waiting for the libvirt daemon to look up all domains and return the list to us.
If I use "domains = await asyncio.to_thread(libvirt.conn.listAllDomains)", however, the await will suspend my current coroutine until we get the response, yielding execution back to the asyncio event loop. That means if the daemon receives a notification while we are waiting on libvirt, it can be scheduled to run concurrently instead of being blocked.
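A small sketch of that behaviour, with time.sleep standing in for the blocking libvirt call (slow_blocking_call and heartbeat are hypothetical names, and the returned domain names are made up). The heartbeat task keeps ticking while the blocking call runs in a worker thread:

```python
import asyncio
import time


def slow_blocking_call():
    time.sleep(0.3)  # stand-in for a blocking library call
    return ["domain-a", "domain-b"]


async def heartbeat(ticks):
    # Stand-in for the main loop reacting to socket notifications.
    while True:
        ticks.append(time.monotonic())
        await asyncio.sleep(0.05)


async def main():
    ticks = []
    beat = asyncio.create_task(heartbeat(ticks))
    # The coroutine is suspended here, but the event loop keeps running.
    domains = await asyncio.to_thread(slow_blocking_call)
    beat.cancel()
    return domains, len(ticks)


domains, tick_count = asyncio.run(main())
```

If slow_blocking_call() were called directly inside main(), the heartbeat would not get to run at all until the call returned.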
In my application I will also need to read and write Linux special files in /sys. Linux has native AIO file support, which can be used with asyncio via aiofile; however, Linux does not support the AIO interface for special files, so I would have to use blocking I/O.
One way to do that in an async application would be to wrap the function that writes to the special files in asyncio.to_thread.
I could, and might, use a decorator that calls run_in_executor directly, since I own the write_sysfs function. But if I did not, then to_thread is more polite than monkeypatching someone else's library and less work than creating my own wrapper API.
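A decorator along those lines might look roughly like this. It is only a sketch: run_in_thread and write_value are hypothetical names, and a temporary file is used in place of a real file under /sys:

```python
import asyncio
import functools
import os
import tempfile


def run_in_thread(func):
    """Wrap a blocking function so that calling it returns an awaitable."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        call = functools.partial(func, *args, **kwargs)
        # None selects the loop's default ThreadPoolExecutor.
        return await loop.run_in_executor(None, call)
    return wrapper


@run_in_thread
def write_value(path, value):
    # Plain blocking file I/O, as it would be for a special file.
    with open(path, "w") as fh:
        return fh.write(value)


async def main():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "attr")  # temp file instead of /sys
        written = await write_value(path, "1\n")
        with open(path) as fh:
            return written, fh.read()


written, content = asyncio.run(main())
```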
Hopefully those are useful examples of where you might want to use to_thread. It is really just a convenience function, and you can do the same thing with run_in_executor at the cost of a little extra code.
If you need to support older Python releases you might also prefer run_in_executor, since it predates the introduction of to_thread, but if you can assume 3.9+ then it is a nice addition to leverage when you need to.
This is my first attempt at using asyncio in a project. I'd like my class to initialize and run, with several of its functions running periodically "in the background". I'd like the class' init to return after starting these background tasks, so that it can continue to do its synchronous stuff at the same time.
What I have:
import asyncio
import threading

class MyClass(threading.Thread):
    def __init__(self, param):
        self.stoprequest = threading.Event()
        threading.Thread.__init__(self)
        self.param = param
        self.loop = asyncio.new_event_loop()
        asyncio.set_event_loop(self.loop)
        asyncio.ensure_future(self.periodic(), loop=self.loop)
        print("Initialized")

    async def periodic(self):
        while True:
            print("I'm here")
            await asyncio.sleep(1)

    def run(self):
        # continue to do synchronous things
        pass
I'm sure unsurprisingly, this doesn't work. I've also tried using a "normal" asyncio function with run_until_complete() in init, but of course init never returns then.
How can I run asyncio functions that belong to this class periodically in the background, while the rest of the class (run()) continues to do synchronous work?
Passing loop as an argument to ensure_future doesn't start the loop. You have to call run_until_complete or run_forever for your coroutines to start running; there's no other way to do it.
How can I run asyncio functions that belong to this class periodically
in the background, while the rest of the class (run()) continues to do
synchronous work?
You can't, just as you can't run an event loop and synchronous code simultaneously in the main thread. Starting the loop blocks the thread's execution flow until the loop is stopped. This is just how asyncio works.
If you want to run asyncio in the background, you should run it in a separate thread and do your synchronous things in the main thread. An example of how to do it can be found here.
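As a rough sketch of the separate-thread approach (not necessarily how the linked example does it): the event loop runs in a daemon thread, and coroutines are handed to it from the main thread with asyncio.run_coroutine_threadsafe. The helper cancel_all_tasks is a hypothetical name:

```python
import asyncio
import threading
import time


async def periodic(counter, interval=0.05):
    while True:
        counter.append(time.monotonic())  # stand-in for periodic work
        await asyncio.sleep(interval)


async def cancel_all_tasks():
    """Cancel every other task on this loop and wait for them to finish."""
    current = asyncio.current_task()
    tasks = [t for t in asyncio.all_tasks() if t is not current]
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)


# Run the event loop in a background thread.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()

counter = []
asyncio.run_coroutine_threadsafe(periodic(counter), loop)

time.sleep(0.2)  # the main thread is free to do synchronous work here

# Orderly shutdown: cancel tasks inside the loop, then stop and close it.
asyncio.run_coroutine_threadsafe(cancel_all_tasks(), loop).result(timeout=5)
loop.call_soon_threadsafe(loop.stop)
thread.join(timeout=5)
loop.close()
```

Anything that touches the loop from the main thread has to go through the thread-safe entry points (run_coroutine_threadsafe, call_soon_threadsafe), since the loop itself is not thread-safe.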
If you need to run blocking code in a thread alongside asyncio, the most convenient way now is to run asyncio in the main thread and run the blocking code in a background thread using the run_in_executor function. You can find an example of doing it here.
It's important to say that asyncio itself is usually used in the main thread (without other threads) to get the benefits of asynchronous programming. Are you sure you need a second thread? If not, please read this answer to see why asyncio is used.
I'm trying to connect to more than one server at the same time. I am currently using loop.create_connection but it freezes up at the first non-responding server.
gsock = loop.create_connection(lambda: opensock(sid), server, port)
transport, protocol = loop.run_until_complete(gsock)
I tried threading this, but it created problems with the sid value being used, as well as various errors such as RuntimeError: Event loop is running and RuntimeError: Event loop stopped before Future completed. Also, according to my variables (though they were getting mixed up), the protocol's connection_made() method gets executed when transport, protocol = loop.run_until_complete(gsock) throws an exception.
I don't understand much about the asyncio module, so please be as thorough as possible. I don't think I need reader/writer variables, as the reading should be done automatically and trigger the data_received() method.
Thank You.
You can connect to many servers at the same time by scheduling all the coroutines concurrently, rather than using loop.run_until_complete to make each connection individually. One way to do that is to use asyncio.gather to schedule them all and wait for each to finish:
import asyncio

# define opensock (and sid) somewhere

async def connect_serv(server, port):
    try:
        transport, protocol = await loop.create_connection(lambda: opensock(sid), server, port)
    except Exception:
        print("Connection to {}:{} failed".format(server, port))

loop = asyncio.get_event_loop()
loop.run_until_complete(
    asyncio.gather(
        connect_serv('1.2.3.4', 3333),
        connect_serv('2.3.4.5', 5555),
        connect_serv('google.com', 80),
    ))
loop.run_forever()
This will kick off all three coroutines listed in the call to gather concurrently, so that if one of them hangs, the others won't be affected; they'll be able to carry on with their work while the other connection hangs. Then, once all of them complete, loop.run_forever() gets executed, which will allow your program to continue running until you stop the loop or kill the program.
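Note that gather still waits for the hanging connection before it returns. If you would rather give up on an unresponsive server after a while, one option is to wrap each attempt in asyncio.wait_for. The sketch below uses fake_connect, a made-up coroutine that simulates connection latency with asyncio.sleep, so it runs without any network access:

```python
import asyncio


async def fake_connect(name, delay):
    """Stand-in for loop.create_connection; delay simulates latency."""
    await asyncio.sleep(delay)
    return name


async def connect_with_timeout(name, delay, timeout=0.2):
    try:
        return await asyncio.wait_for(fake_connect(name, delay), timeout)
    except asyncio.TimeoutError:
        return None  # treat an unresponsive server as a failed connection


async def main():
    return await asyncio.gather(
        connect_with_timeout("fast", 0.01),
        connect_with_timeout("hung", 10),  # never answers in time
        connect_with_timeout("slow", 0.05),
    )


results = asyncio.run(main())
```

The results come back in the order the coroutines were passed to gather, with None for the server that timed out.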
The reader/writer variables you mentioned would only be relevant if you used asyncio.open_connection to connect to the servers, rather than create_connection. It uses the Stream API, which is a higher-level API than the protocol/transport-based API that create_connection uses. It's really up to you to decide which you prefer to use. There are examples of both in the asyncio docs, if you want to see a comparison.
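For comparison, here is a minimal sketch of the Stream API. To keep it self-contained it starts a throwaway echo server on 127.0.0.1 (handle_client is a hypothetical handler) and connects to it with asyncio.open_connection:

```python
import asyncio


async def handle_client(reader, writer):
    # Server side: echo one line back, then close the connection.
    data = await reader.readline()
    writer.write(b"echo: " + data)
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main():
    # Port 0 lets the OS pick a free port for the local test server.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"hello\n")
        await writer.drain()
        reply = await reader.readline()
        writer.close()
        await writer.wait_closed()
    return reply


reply = asyncio.run(main())
```

With streams you read explicitly from the reader, instead of having data_received() called on a protocol object.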
I am writing a class that creates threads that timeout if not used within a certain time. The class allows you to pump data to a specific thread (by keyword), and if it doesn't exist it creates the thread.
Anywho, the problem I have is that the main supervisor class doesn't know when threads have ended. I can't use blocking code like join, or poll to see if a thread is alive. What I want is an event handler that is called when a thread ends (or is just about to end), so that I can inform the supervisor that the thread is no longer active.
Is this something that can be done with signal or something similar?
As pseudocode, I'm looking for something like:
def myHandlerFunc():
    # inform supervisor the thread is dead

t1 = ThreadFunc()
t1.eventHandler(condition=thread_dies, handler=myHandlerFunc)
EDIT: Perhaps a better way would be to pass a ref to the parent down to the thread, and have the thread tell parent class directly. I'm sure someone will tell me off for data flow inversion.
EDIT: Here is some pseudocode:
class supervisor():
    def __init__:
        Setup thread dict with all threads as inactive
    def dispatch(target, message):
        if(target thread inactive):
            create new thread
        send message to thread
    def thread_timeout_handler():
        # Func is called asynchronously when a thread dies
        # Does some stuff over here

def ThreadFunc():
    while( !timeout ):
        wait for message:
            do stuff with message
    (Tell supervisor thread is closing?)
    return
The main point is that you send messages to the threads (referenced by keyword) through the supervisor. The supervisor makes sure the thread is alive (since they timeout after a while), creates a new one if it dies, and sends the data over.
Looking at this again, it's easy to avoid needing an event handler, as I can just check if the thread is alive using threadObj.is_alive() (spelled isAlive() in older Python versions) instead of dynamically keeping a dict of thread statuses.
But out of curiosity, is it possible to get a handler to be called in the supervisor class by signals sent from the thread? The main App code would call the supervisor.dispatch() function once, then do other stuff. It would later be interrupted by the thread_timeout_handler function, as the thread had closed.
You still don't mention if you are using a message/event loop framework, which would provide a way for you to dispatch a call to the "main" thread and call an event handler.
Assuming you're not, then you can't just interrupt or call into the main thread.
You don't need to, though, as you only need to know if a thread is alive when you decide if you need to create a new one. You can do your checking at this time. This way, you only need a way to communicate the "finished" state between threads. There are a lot of ways to do this (I've never used .isAlive(), but you can pass information back in a Queue, Event, or even a shared variable).
Using Event it would look something like this:
class supervisor():
    def __init__:
        Setup thread dict with all threads as inactive
    def dispatch(target, message):
        if(thread.event.is_set()):
            create new thread
            thread.event = Event()
        send message to thread

def ThreadFunc(event):
    while( !timeout ):
        wait for message:
            do stuff with message
    event.set()
    return
Note that this way there is still a possible race condition. The supervisor thread might check is_set() right before the worker thread calls .set() which will lie about the thread's ability to do work. The same problem would exist with isAlive().
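For what it's worth, a runnable version of that idea could look something like the sketch below. Worker and Supervisor are hypothetical classes, the messages are just collected in a list, and the race condition described above is still present (a worker may time out right after the is_set() check):

```python
import queue
import threading


class Worker:
    """One worker thread plus the objects used to talk to it."""

    def __init__(self, timeout):
        self.inbox = queue.Queue()
        self.finished = threading.Event()
        self.handled = []
        self.thread = threading.Thread(
            target=self._run, args=(timeout,), daemon=True)
        self.thread.start()

    def _run(self, timeout):
        while True:
            try:
                message = self.inbox.get(timeout=timeout)
            except queue.Empty:
                break  # idle for too long: time out
            self.handled.append(message)  # "do stuff with message"
        self.finished.set()  # tell the supervisor this worker is done


class Supervisor:
    def __init__(self, timeout=0.1):
        self.timeout = timeout
        self.workers = {}

    def dispatch(self, target, message):
        worker = self.workers.get(target)
        # The check happens only when a message is dispatched.
        if worker is None or worker.finished.is_set():
            worker = self.workers[target] = Worker(self.timeout)
        worker.inbox.put(message)
        return worker
```

A second dispatch to the same target after its worker has timed out creates a fresh worker thread for that target.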
Is there a reason you don't just use a threadpool?
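With concurrent.futures.ThreadPoolExecutor, for instance, you get something close to the "handler called when the work ends" you asked about through Future.add_done_callback. A small sketch, where handle_message and on_done are made-up names:

```python
from concurrent.futures import ThreadPoolExecutor

finished = []


def handle_message(target, message):
    return f"{target} processed {message}"


def on_done(future):
    # Called when the job ends. It runs in the worker thread, or right
    # away in the calling thread if the job had already finished.
    finished.append(future.result())


with ThreadPoolExecutor(max_workers=4) as pool:
    for target, message in [("a", "m1"), ("b", "m2")]:
        pool.submit(handle_message, target, message).add_done_callback(on_done)
# Leaving the with-block waits for all submitted jobs to finish.
```

The pool reuses its threads, so there is no need to track which threads are alive or to recreate them after a timeout.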