Problem
It's very common for beginners to handle waiting for IO alongside other concurrent processing in a way similar to this:
#!/usr/bin/env python3
"""Loop example."""
from time import sleep

WAITING: bool = True
COUNTER: int = 10


def process() -> None:
    """Non-blocking routine, that needs to be invoked periodically."""
    global COUNTER  # pylint: disable=global-statement
    print(f"Done in {COUNTER}.")
    COUNTER -= 1
    sleep(1)
    # Mimicking incoming IO callback
    if COUNTER <= 0:
        event()


def event() -> None:
    """Incoming IO callback routine."""
    global WAITING  # pylint: disable=global-statement
    WAITING = False


try:
    while WAITING:
        process()
except KeyboardInterrupt:
    print("Canceled.")
Possible applications might be servers that listen for incoming messages while still processing some other internal stuff.
Possible Solution 1
Threading might be a good solution in some cases.
But after some research, it seems that threading adds a lot of overhead for the communication between the threads.
One example of this might be the 'Warning' in the osc4py3 package documentation below the headline 'No thread'.
Also, I have read somewhere the rule of thumb that 'threading is not suited for slow IO' (sorry, I lost the source of this rule).
Possible Solution 2
Asynchronous processing (with the asyncio package) might be another solution.
Especially because the ominous rule of thumb also says that 'asyncio is efficient for slow IO'.
What I tried
So I tried to rewrite this example with asyncio but failed completely, even after reading about Tasks, Futures and Awaitables in general in the Python asyncio documentation.
My problem was how to make the call periodically (instead of just once) while waiting.
Of course infinite loops are possible, but all the examples I found on the internet still use 'while True' loops, which does not look like an improvement to me.
For example this snippet:
import asyncio


async def work():
    while True:
        await asyncio.sleep(1)
        print("Task Executed")


loop = asyncio.get_event_loop()
try:
    asyncio.ensure_future(work())
    loop.run_forever()
except KeyboardInterrupt:
    pass
finally:
    print("Closing Loop")
    loop.close()
Source: https://tutorialedge.net/python/concurrency/asyncio-event-loops-tutorial/#the-run_forever-method
What I want
To know the most elegant and efficient way of rewriting the stupid general 'while True' loop from my first example code.
If my 'while True' loop is still the best way to solve it (besides my global variables), then that's also okay with me.
I just want to improve my code, if possible.
What you describe is some kind of polling operation and is similar to busy waiting. You should rarely rely on those methods as they can incur a serious performance penalty if used incorrectly. Instead, you should rely on concurrency primitives provided by the OS or a concurrency library.
As said in a comment, you could rely on a condition or an event (and more broadly on mutexes) to schedule some code to run after an event occurs. For I/O operations you can also rely on low-level OS facilities such as select, poll and signals/interrupts.
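To illustrate the difference from polling, here is a minimal sketch using threading.Event from the standard library. The function name wait_for_event and the timer that stands in for an incoming IO callback are assumptions made up for this example, not part of the question's code.

```python
import threading


def wait_for_event(callback_delay=0.05, timeout=5.0):
    """Block until a (simulated) IO callback fires or the timeout expires."""
    ready = threading.Event()
    # Hypothetical stand-in for an incoming IO callback: a timer thread
    # sets the event after callback_delay seconds.
    timer = threading.Timer(callback_delay, ready.set)
    timer.start()
    # wait() parks this thread without spinning; it returns True as soon
    # as the event is set, or False once the timeout has elapsed.
    signalled = ready.wait(timeout)
    timer.join()
    return signalled
```

The waiting thread consumes no CPU while it is blocked, which is the main advantage over a loop that checks a flag and sleeps.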
Possible applications might be servers that listen for
incoming messages, while still processing some other internal stuff.
For such use cases you should really use a dedicated library to do that efficiently. For instance, here is an example of a minimal server developed with asyncio's low-level socket operations. Internally, asyncio is built on OS facilities such as the select system call (through the selectors module on Unix-like systems) and exposes a friendly interface with async-await.
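As a rough idea of what such a server can look like, here is a small echo server sketch built on loop.sock_accept(), loop.sock_recv() and loop.sock_sendall(). The helper names (make_listening_socket, serve, handle_client) are made up for this illustration, and it is not the example the answer originally linked to.

```python
import asyncio
import socket


async def handle_client(loop, conn):
    """Echo data back until the peer closes the connection."""
    with conn:
        while True:
            data = await loop.sock_recv(conn, 1024)
            if not data:
                break
            await loop.sock_sendall(conn, data)


async def serve(server_sock):
    """Accept clients forever; each one is handled in its own task."""
    loop = asyncio.get_running_loop()
    clients = set()  # keep references so the tasks are not garbage collected
    while True:
        conn, _addr = await loop.sock_accept(server_sock)
        task = loop.create_task(handle_client(loop, conn))
        clients.add(task)
        task.add_done_callback(clients.discard)


def make_listening_socket(host="127.0.0.1", port=0):
    """Create a non-blocking listening socket (port 0 means any free port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, port))
    sock.listen()
    sock.setblocking(False)
    return sock
```

Running it would be something like asyncio.run(serve(make_listening_socket(port=8888))), where the port number is arbitrary. Other coroutines can run on the same loop while the server waits for connections.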
Solution with asyncio:
#!/usr/bin/env python3
"""Asynchronous loop example."""
from typing import Callable
from asyncio import Event, get_event_loop

DONE = Event()


def callback():
    """Incoming IO callback routine."""
    DONE.set()


def process():
    """Non-blocking routine, that needs to be invoked periodically."""
    print('Test.')


try:
    loop = get_event_loop()
    # Re-schedules itself on every loop iteration after calling processing()
    run: Callable = lambda loop, processing: (
        processing(),
        loop.call_soon(run, loop, processing)
    )
    loop.call_soon(run, loop, process)
    loop.call_later(1, callback)  # Mimicking incoming IO callback after 1 sec
    loop.run_until_complete(DONE.wait())
except KeyboardInterrupt:
    print("Canceled.")
finally:
    loop.close()
    print("Bye.")
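For comparison, here is a sketch of the same idea written as coroutines. It is a variation rather than a drop-in replacement: instead of re-scheduling on every loop iteration, it calls the routine at a fixed interval. The names periodic and main and the timing parameters are assumptions chosen for the example.

```python
import asyncio


async def periodic(interval, func, stop):
    """Call func every `interval` seconds until the `stop` event is set."""
    while not stop.is_set():
        func()
        try:
            # Sleep for one interval, but wake up early if stop gets set.
            await asyncio.wait_for(stop.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass


async def main(ticks, interval=0.01, stop_after=0.05):
    """Run the periodic routine until a simulated IO callback arrives."""
    stop = asyncio.Event()
    loop = asyncio.get_running_loop()
    loop.call_later(stop_after, stop.set)  # mimics the incoming IO callback
    await periodic(interval, lambda: ticks.append(1), stop)
    return len(ticks)
```

A call such as asyncio.run(main([])) returns the number of times the routine ran. There is still a loop inside periodic, but it yields to the event loop on every pass, so other tasks are never starved.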
Related
I looked online and found some SO discussions and ActiveState recipes for running some code with a timeout. It looks like there are some common approaches:
Use a thread that runs the code, and join it with a timeout. If the timeout elapses, kill the thread. This is not directly supported in Python (it uses the private _Thread__stop function), so it is bad practice.
Use signal.SIGALRM, but this approach does not work on Windows!
Use a subprocess with a timeout, but this is too heavy: what if I want to start interruptible tasks often? I don't want to fire up a process for each one!
So, what is the right way? I'm not asking about workarounds (e.g. use Twisted and async IO), but about an actual way to solve the actual problem: I have some function and I want to run it only with some timeout. If the timeout elapses, I want control back. And I want it to work on Linux and Windows.
A completely general solution to this really, honestly does not exist. You have to use the right solution for a given domain.
If you want timeouts for code you fully control, you have to write it to cooperate. Such code has to be able to break up into little chunks in some way, as in an event-driven system. You can also do this by threading if you can ensure nothing will hold a lock too long, but handling locks right is actually pretty hard.
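As an illustration of code that cooperates, here is a small sketch in which the work is split into chunks and the function checks a deadline between chunks. The function name, the chunk size and the use of the built-in TimeoutError are choices made for this example only.

```python
import time


def sum_squares_with_deadline(n, timeout, check_every=1000):
    """Sum i*i for i < n, giving up once `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    total = 0
    for i in range(n):
        # Only look at the clock once per chunk to keep the overhead low.
        if i % check_every == 0 and time.monotonic() > deadline:
            raise TimeoutError("gave up after %s seconds" % timeout)
        total += i * i
    return total
```

This works on any platform, but only because the function itself was written to check the clock. It cannot rescue code that blocks inside a single long call.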
If you want timeouts because you're afraid code is out of control (for example, if you're afraid the user will ask your calculator to compute 9**(9**9)), you need to run it in another process. This is the only easy way to sufficiently isolate it. Running it in your event system or even a different thread will not be enough. It is also possible to break things up into little chunks similar to the other solution, but requires very careful handling and usually isn't worth it; in any event, that doesn't allow you to do the same exact thing as just running the Python code.
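For the "another process" case, one possible sketch uses subprocess.run() with its timeout parameter and a fresh interpreter. The helper name run_isolated is hypothetical, and passing the code as a string to "python -c" is just one way to get it into the child process.

```python
import subprocess
import sys


def run_isolated(code, timeout):
    """Run `code` in a fresh interpreter; return its stdout, or None on timeout."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising TimeoutExpired.
        return None
    return done.stdout
```

Starting an interpreter per call is comparatively expensive, which is the trade-off the question already mentions, but the runaway code can be stopped no matter what it is doing.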
What you might be looking for is the multiprocessing module. If subprocess is too heavy, then this may not suit your needs either.
import time
import multiprocessing


def do_this_other_thing_that_may_take_too_long(duration):
    time.sleep(duration)
    return 'done after sleeping {0} seconds.'.format(duration)


if __name__ == '__main__':  # needed on Windows, where workers re-import this module
    pool = multiprocessing.Pool(1)
    print('starting....')
    res = pool.apply_async(do_this_other_thing_that_may_take_too_long, [8])
    for timeout in range(1, 10):
        try:
            # get() waits at most `timeout` seconds for the result
            print('{0}: {1}'.format(timeout, res.get(timeout)))
        except multiprocessing.TimeoutError:
            print('{0}: timed out'.format(timeout))
    print('end')
If it's network related you could try:
import socket
socket.setdefaulttimeout(number)
I found this in the eventlet library:
http://eventlet.net/doc/modules/timeout.html
from eventlet.timeout import Timeout

timeout = Timeout(seconds, exception)
try:
    ...  # execution here is limited by timeout
finally:
    timeout.cancel()
For "normal" Python code, that doesn't linger for prolonged periods in C extensions or I/O waits, you can achieve your goal by setting a trace function with sys.settrace() that aborts the running code when the timeout is reached.
Whether that is sufficient or not depends on how co-operating or malicious the code you run is. If it's well-behaved, a tracing function is sufficient.
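A minimal sketch of the trace-function idea could look like this. The names TraceTimeout and run_with_trace_timeout are made up for the example. It assumes the traced code is pure Python, since the trace function is only called on Python-level events, and tracing slows the code down noticeably.

```python
import sys
import time


class TraceTimeout(Exception):
    """Raised inside the traced code once the deadline has passed."""


def run_with_trace_timeout(func, timeout, *args, **kwargs):
    """Call func(*args, **kwargs), aborting it after `timeout` seconds."""
    deadline = time.monotonic() + timeout

    def tracer(frame, event, arg):
        # Called on call/line/return events of the traced Python code.
        if time.monotonic() > deadline:
            raise TraceTimeout("timed out after %s seconds" % timeout)
        return tracer  # keep tracing inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        return func(*args, **kwargs)
    finally:
        sys.settrace(previous)
```

Code that sits in a long time.sleep() or a C extension will not be interrupted until it returns to Python bytecode, which is exactly the limitation described above.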
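Another way is to use faulthandler:
import time
import faulthandler

faulthandler.enable()
try:
    # Dump the tracebacks of all threads if we are still running after 3 seconds
    faulthandler.dump_traceback_later(3)
    time.sleep(10)
finally:
    faulthandler.cancel_dump_traceback_later()
N.B.: The faulthandler module has been part of the stdlib since Python 3.3. Note that dump_traceback_later() only reports where the code is stuck; it does not interrupt it unless you pass exit=True, which terminates the whole process.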
If you're running code that you expect to die after a set time, then you should write it properly so that there aren't any negative effects on shutdown, no matter if it's a thread or a subprocess. A command pattern with undo would be useful here.
So, it really depends on what the thread is doing when you kill it. If it's just crunching numbers, who cares if you kill it? If it's interacting with the filesystem and you kill it, then maybe you should really rethink your strategy.
What is supported in Python when it comes to threads? Daemon threads and joins. Why does Python let the main thread exit if you've joined a daemon while it's still active? Because it's understood that someone using daemon threads will (hopefully) write the code in a way that it won't matter when that thread dies. Giving a timeout to a join and then letting main die, and thus taking any daemon threads with it, is perfectly acceptable in this context.
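The daemon-thread-plus-join approach can be sketched as follows. The helper name run_in_daemon_thread and its (finished, value) return convention are assumptions for this example. Note that the worker is not killed on timeout: it keeps running in the background until it finishes or the process exits.

```python
import threading
import time


def run_in_daemon_thread(func, timeout, *args):
    """Run func(*args) in a daemon thread; return (finished, value)."""
    result = {}

    def target():
        result["value"] = func(*args)

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout)  # wait at most `timeout` seconds
    if worker.is_alive():
        # Timed out: hand control back and abandon the worker.
        return False, None
    return True, result.get("value")
```

For example, run_in_daemon_thread(time.sleep, 0.05, 0.5) gives up after 0.05 seconds, while the sleeping thread carries on for the rest of its half second.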
I've solved it this way:
For me it worked great (on Windows, and not heavy at all). I hope it will be useful for someone.
import threading
import time


class LongFunctionInside(object):
    lock_state = threading.Lock()
    working = False

    def long_function(self, timeout):
        self.working = True
        timeout_work = threading.Thread(name="thread_name", target=self.work_time, args=(timeout,))
        timeout_work.daemon = True
        timeout_work.start()
        while True:  # endless/long work
            time.sleep(0.1)  # at this rate the CPU is almost not used
            if not self.working:  # working is still True while the timeout has not elapsed
                break
        self.set_state(True)

    def work_time(self, sleep_time):
        # Thread function that just sleeps for the specified time; on waking up
        # it checks whether the function is still working and, if so, sets the
        # secured variable to False.
        time.sleep(sleep_time)
        if self.working:
            self.set_state(False)

    def set_state(self, state):  # secured state change
        while True:
            self.lock_state.acquire()
            try:
                self.working = state
                break
            finally:
                self.lock_state.release()


lw = LongFunctionInside()
lw.long_function(10)
The main idea is to create a thread that just sleeps in parallel to the "long work" and, on waking up (after the timeout), changes the state of the secured variable; the long function checks the secured variable during its work.
I'm pretty new to Python programming, so if this solution has fundamental errors, like resource, timing or deadlock problems, please respond.
Here is a solution using the 'with' construct, merged with the solution from the thread "Timeout function if it takes too long to finish", which works better.
import threading, time


class Exception_TIMEOUT(Exception):
    pass


class linwintimeout:
    def __init__(self, f, seconds=1.0, error_message='Timeout'):
        self.seconds = seconds
        self.thread = threading.Thread(target=f)
        self.thread.daemon = True
        self.error_message = error_message

    def handle_timeout(self):
        raise Exception_TIMEOUT(self.error_message)

    def __enter__(self):
        # Run f in the daemon thread and wait at most self.seconds for it
        try:
            self.thread.start()
            self.thread.join(self.seconds)
        except Exception as te:
            raise te

    def __exit__(self, type, value, traceback):
        if self.thread.is_alive():
            return self.handle_timeout()


def function():
    while True:
        print("keep printing ...")
        time.sleep(1)


try:
    with linwintimeout(function, seconds=5.0, error_message='exceeded timeout of %s seconds' % 5.0):
        pass
except Exception_TIMEOUT as e:
    print(" attention !! exceeded timeout, giving up ... %s " % e)
My application needs remote control over SSH.
I wish to use this example: https://asyncssh.readthedocs.io/en/latest/#simple-server-with-input
The original app is rather big (it uses GPIO, 600 lines of code and 10 libraries), so I've made a simple example here:
import asyncio, asyncssh, sys, time
# here would be 10 libraries in the original 600-line application

is_open = True
return_value = 0


async def handle_client(process):
    process.stdout.write('Enter numbers one per line, or EOF when done:\n')
    process.stdout.write('%s\n' % is_open)  # write() expects a string, not a bool
    total = 0
    try:
        async for line in process.stdin:
            line = line.rstrip('\n')
            if line:
                try:
                    total += int(line)
                except ValueError:
                    process.stderr.write('Invalid number: %s\n' % line)
    except asyncssh.BreakReceived:
        pass
    process.stdout.write('Total = %s\n' % total)
    process.exit(0)


async def start_server():
    await asyncssh.listen('', 8022, server_host_keys=['key'],
                          authorized_client_keys='key.pub',
                          process_factory=handle_client)


loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(start_server())
except (OSError, asyncssh.Error) as exc:
    sys.exit('Error starting server: ' + str(exc))
loop.run_forever()

# here is the "old" program: that would not run now as loop.run_forever() runs.
#while True:
#    print(return_value)
#    time.sleep(0.1)
The main app is mostly driven by a while True loop with lots of functions and sleep.
I've commented that part out in the simple example above.
My question is: How should I implement the SSH part, that uses loop.run_forever() - and still be able to run my main loop?
Also: the handle_client(process) - must be able to interact with variables in the main program. (read/write)
You have basically three options:
Rewrite your main loop to be asyncio compatible
A main while True loop with lots of sleeps is exactly the kind of code you want to write asynchronously. Convert this:
while True:
    task_1()  # takes n ms
    sleep(0.2)
    task_2()  # takes n ms
    sleep(0.4)
into this:
async def task_1():
    while True:
        stuff()
        await asyncio.sleep(0.6)


async def task_2():
    while True:
        stuff()
        await asyncio.sleep(0.01)
        other_stuff()
        await asyncio.sleep(0.8)


loop = asyncio.get_event_loop()
loop.create_task(task_1())
loop.create_task(task_2())
...
loop.run_forever()
This is the most work, but it is almost certain that your current code will be better written, clearer, easier to maintain and easier to develop if written as a bunch of coroutines. If you do this the problem goes away: with cooperative multitasking you tell the code when to yield, so sharing state is generally pretty easy. By not awaiting anything in between getting and using a state var you prevent race conditions: no need for any kind of thread-safe var.
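Here is a small runnable sketch of that idea, with two coroutines sharing a plain dict. The names ticker, reporter and state, as well as the intervals, are invented for the illustration. Both loops are bounded so that the example terminates.

```python
import asyncio

state = {"count": 0}  # plain shared state, no locks needed


async def ticker(interval, rounds):
    """Stand-in for the old main loop: updates state, then sleeps."""
    for _ in range(rounds):
        # No await between reading and writing, so no other coroutine
        # can run in the middle of this update.
        state["count"] += 1
        await asyncio.sleep(interval)


async def reporter(interval, rounds, seen):
    """Stand-in for an SSH handler that reads the shared state."""
    for _ in range(rounds):
        seen.append(state["count"])
        await asyncio.sleep(interval)


async def main():
    seen = []
    await asyncio.gather(ticker(0.01, 5), reporter(0.02, 3, seen))
    return seen
```

Calling asyncio.run(main()) runs both coroutines on one thread. Control only switches at the await points, which is why the shared dict needs no protection.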
Run your asyncio loop in a thread
Leave your current loop intact, but run your asyncio loop in a thread (or process) with either threading or multiprocessing. Expose some kind of thread-safe variable to allow the background thread to change state, or transition to a (thread-safe) messaging paradigm, where the ssh thread emits messages into a queue which your main loop handles in its own time (a message could be something like ("a", 5), which would be handled by doing something like state_dict[msg[0]] = msg[1] for everything in the queue).
If you want to go this way, have a look at the multiprocessing and/or threading docs for examples of the right ways to pass variables or messages between threads. Note that this version will likely be less performant than a pure asyncio solution, particularly if your code is mostly sleeping in the main loop anyhow.
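A possible shape for the message-passing variant is sketched below, assuming a queue.Queue as the channel. The coroutine fake_ssh_session is a hypothetical stand-in for the real asyncssh handler, and drain would be called from the existing main loop on each pass.

```python
import asyncio
import queue
import threading

messages = queue.Queue()  # thread-safe channel from asyncio thread to main loop


async def fake_ssh_session():
    """Hypothetical stand-in for handle_client: emits (key, value) messages."""
    for value in (1, 2, 3):
        await asyncio.sleep(0.01)
        messages.put(("return_value", value))


def run_asyncio_in_background(coro_factory):
    """Start a daemon thread that runs its own event loop."""
    thread = threading.Thread(
        target=lambda: asyncio.run(coro_factory()), daemon=True
    )
    thread.start()
    return thread


def drain(state):
    """Apply every queued message to the state dict without blocking."""
    while True:
        try:
            key, value = messages.get_nowait()
        except queue.Empty:
            return
        state[key] = value
```

The main loop stays synchronous: it calls drain(state) whenever convenient and otherwise carries on with its sleeps and GPIO work.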
Run your synchronous code in a thread, and have asyncio in the foreground
As @MisterMiyagi points out, asyncio has loop.run_in_executor() for running blocking code in a thread (or process) pool. It's more generally used to run the odd blocking bit of code without tying up the whole loop, but you can run your whole main loop in it. The same concerns about some kind of thread-safe variable or message sharing apply. This has the advantage (as @MisterMiyagi points out) of keeping asyncio where it expects to be. I have a few projects which use background asyncio threads in generally non-asyncio code (event-driven GUI code with an asyncio thread interacting with custom hardware over USB). It can be done, but you do have to be careful as to how you write it.
Note btw that if you do decide to use multiple threads, message-passing (with a queue) is usually easier than directly sharing variables.
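To make the third option more concrete, here is a sketch that runs a blocking main loop via loop.run_in_executor() and feeds it through a queue. The function blocking_main_loop is a made-up stand-in for the original while True loop. The stop event and the bounded run are there only so the example terminates.

```python
import asyncio
import queue
import threading


def blocking_main_loop(stop, inbox, log):
    """Stand-in for the old synchronous loop; runs in a worker thread."""
    # Keep going until asked to stop and every queued message is handled.
    while not (stop.is_set() and inbox.empty()):
        try:
            log.append(inbox.get(timeout=0.01))
        except queue.Empty:
            pass


async def main():
    stop = threading.Event()
    inbox = queue.Queue()
    log = []
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor.
    worker = loop.run_in_executor(None, blocking_main_loop, stop, inbox, log)
    for n in range(3):
        inbox.put(n)  # e.g. a value received over SSH
        await asyncio.sleep(0.01)
    stop.set()
    await worker  # wait for the blocking loop to finish cleanly
    return log
```

The event loop stays in the main thread, and the blocking code only sees ordinary thread-safe objects (queue.Queue and threading.Event).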
We have a rather big project that is doing a lot of networking (API calls, Websocket messages) and that also has a lot of internal jobs running in intervals in threads. Our current architecture involves spawning a lot of threads and the app is not working very well when the system is under a big load, so we've decided to give asyncio a try.
I know that the best way would be to migrate the whole codebase to async code, but that is not realistic in the very near future because of the size of the codebase and the limited development resources. However, we would like to start migrating parts of our codebase to use asyncio event loop and hopefully, we will be able to convert the whole project at some point.
The problem we have encountered so far is that the whole codebase has sync code and in order to add non-blocking asyncio code inside, the code needs to be run in different thread since you can't really run async and sync code in the same thread.
In order to combine async and sync code, I came up with this approach of running the asyncio code in a separate thread that is created on app start. Other parts of the code add jobs to this loop simply by calling add_asyncio_task.
import threading
import asyncio

_tasks = []


def threaded_loop(loop):
    asyncio.set_event_loop(loop)
    global _tasks
    while True:
        if len(_tasks) > 0:
            # create a copy of needed tasks
            needed_tasks = _tasks.copy()
            # flush current tasks so that next tasks can be easily added
            _tasks = []
            # run tasks
            task_group = asyncio.gather(*needed_tasks)
            loop.run_until_complete(task_group)


def add_asyncio_task(task):
    _tasks.append(task)


def start_asyncio_loop():
    loop = asyncio.get_event_loop()
    t = threading.Thread(target=threaded_loop, args=(loop,))
    t.start()
and somewhere in app.py:
start_asyncio_loop()
and anywhere else in the code:
add_asyncio_task(some_coroutine)
Since I am new to asyncio, I am wondering if this is a good approach in our situation or if this approach is considered an anti-pattern and has some problems that will hit us later down the road? Or maybe asyncio already has some solution for this and I'm just trying to invent the wheel here?
Thanks for your inputs!
The approach is fine in general. You have some issues though:
(1) Almost all asyncio objects are not thread safe
(2) Your code is not thread safe on its own. What if a task appears after needed_tasks = _tasks.copy() but before _tasks = []? You need a lock here. Btw making a copy is pointless. Simple needed_tasks = _tasks will do.
(3) Some asyncio constructs are thread safe. Use them:
import threading
import asyncio

# asyncio.get_event_loop() creates a new loop per thread. Keep
# a single reference to the main loop. You can even try
# _loop = asyncio.new_event_loop()
_loop = asyncio.get_event_loop()


def get_app_loop():
    return _loop


def asyncio_thread():
    loop = get_app_loop()
    asyncio.set_event_loop(loop)
    loop.run_forever()


def add_asyncio_task(task):
    asyncio.run_coroutine_threadsafe(task, get_app_loop())


def start_asyncio_loop():
    t = threading.Thread(target=asyncio_thread)
    t.start()
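One thing the code above leaves out is how synchronous callers get results back and how the loop is shut down. The following sketch shows one way to do both. The helper names start_background_loop and stop_background_loop and the coroutine add are made up for the example. The key fact is that asyncio.run_coroutine_threadsafe() returns a concurrent.futures.Future.

```python
import asyncio
import threading


def start_background_loop():
    """Create an event loop and run it forever in a daemon thread."""
    loop = asyncio.new_event_loop()

    def run():
        asyncio.set_event_loop(loop)
        loop.run_forever()

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return loop, thread


def stop_background_loop(loop, thread):
    """Ask the loop to stop from another thread, then clean up."""
    loop.call_soon_threadsafe(loop.stop)
    thread.join()
    loop.close()


async def add(a, b):
    """Example coroutine; any async function would do."""
    await asyncio.sleep(0.01)
    return a + b
```

From synchronous code, asyncio.run_coroutine_threadsafe(add(2, 3), loop).result(timeout=5) blocks the calling thread (not the loop) until the coroutine has finished, and re-raises any exception it threw.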
import _thread
import time


def test1():
    while True:
        time.sleep(1)
        print('TEST1')


def test2():
    while True:
        time.sleep(3)
        print('TEST2')


try:
    _thread.start_new_thread(test1,())
    _thread.start_new_thread(test2,())
except:
    print("ERROR")
How can I stop the two threads for example in case of KeyboardInterrupts?
Because for "except KeyboardInterrupt" the threads are still running :/
Important:
The question is about closing threads only with the module _thread!
Is it possible?
There's no way to directly interact with another thread, except for the main thread. While some platforms do offer thread cancel or kill semantics, Python doesn't expose them, and for good reason (see footnote 1 below).
So, the usual solution is to use some kind of signal to tell everyone to exit. One possibility is a done flag with a Lock around it:
done = False
donelock = _thread.allocate_lock()


def test1():
    while True:
        try:
            donelock.acquire()
            if done:
                return
        finally:
            donelock.release()
        time.sleep(1)
        print('TEST1')


_thread.start_new_thread(test1,())
time.sleep(3)
try:
    donelock.acquire()
    done = True
finally:
    donelock.release()
Of course the same thing is a lot cleaner if you use threading (or a different higher-level API like Qt's threads). Plus, you can use a Condition or Event to make the background threads exit as soon as possible, instead of only after their next sleep finishes.
done = threading.Event()


def test1():
    while True:
        if done.wait(1):
            return
        print('TEST1')


t1 = threading.Thread(target=test1)
t1.start()
time.sleep(3)
done.set()
The _thread module doesn't have an Event or Condition, of course, but you can always build one yourself, or just borrow one from the threading source.
Or, if you wanted the threads to be killed asynchronously (which obviously isn't safe if they're, e.g., writing files, but if they're just doing computation or downloads or the like that you don't care about if you're canceling, that's fine), threading makes it even easier:
t1 = threading.Thread(target=test1, daemon=True)
As a side note, the behavior you're seeing isn't actually reliable across platforms:
Background threads created with _thread may keep running, or shut down semi-cleanly, or terminate hard. So, when you use _thread in a portable application, you have to write code that can handle any of the three.
KeyboardInterrupt may be delivered to an arbitrary thread rather than the main thread. If it is, it will usually kill that thread, unless you've set up a handler. So, if you're using _thread, you usually want to handle KeyboardInterrupt and call _thread.interrupt_main().
Also, I don't think your except: is doing what you think it is. That try only covers the start_new_thread calls. If the threads start successfully, the main thread exits the try block and reaches the end of the program. If a KeyboardInterrupt or other exception is raised, the except: isn't going to be triggered. (Also, using a bare except: and not even logging which exception got handled is a really bad idea if you want to be able to understand what your code is doing.) Presumably, on your platform, background threads continue running, and the main thread blocks on them (and probably at the OS level, not the Python level, so there's no code you can write that gets involved there).
If you want your main thread to keep running to make sure it can handle a KeyboardInterrupt and do something with it (but see the caveats above!), you have to give it code to keep running:
try:
    while True:
        time.sleep(1<<31)
except KeyboardInterrupt:
    # background-thread-killing code goes here.
    pass
1. TerminateThread on Windows makes it impossible to do all the cleanup Python needs to do. pthread_cancel on POSIX systems like Linux and macOS makes it possible, but very difficult. And the semantics are different enough between the two that trying to write a cross-platform wrapper would be a nightmare. Not to mention that Python supports systems (mostly older Unixes) that don't have the full pthread API, or even have a completely different threading API.
As almost everyone is aware when they first look at threading in Python, there is the GIL that makes life miserable for people who actually want to do processing in parallel - or at least give it a chance.
I am currently looking at implementing something like the Reactor pattern. Effectively I want to listen for incoming socket connections on one thread-like, and when someone tries to connect, accept that connection and pass it along to another thread-like for processing.
I'm not (yet) sure what kind of load I might be facing. I know there is currently a 2MB cap set up on incoming messages. Theoretically we could get thousands per second (though I don't know if practically we've seen anything like that). The amount of time spent processing a message isn't terribly important, though obviously quicker would be better.
I was looking into the Reactor pattern, and developed a small example using the multiprocessing library that (at least in testing) seems to work just fine. However, now/soon we'll have the asyncio library available, which would handle the event loop for me.
Is there anything that could bite me by combining asyncio and multiprocessing?
You should be able to safely combine asyncio and multiprocessing without too much trouble, though you shouldn't be using multiprocessing directly. The cardinal sin of asyncio (and any other event-loop based asynchronous framework) is blocking the event loop. If you try to use multiprocessing directly, any time you block to wait for a child process, you're going to block the event loop. Obviously, this is bad.
The simplest way to avoid this is to use BaseEventLoop.run_in_executor to execute a function in a concurrent.futures.ProcessPoolExecutor. ProcessPoolExecutor is a process pool implemented using multiprocessing.Process, but asyncio has built-in support for executing a function in it without blocking the event loop. Here's a simple example:
import time
import asyncio
from concurrent.futures import ProcessPoolExecutor


def blocking_func(x):
    time.sleep(x)  # Pretend this is expensive calculations
    return x * 5


async def main():
    #pool = multiprocessing.Pool()
    #out = pool.apply(blocking_func, args=(10,))  # This blocks the event loop.
    executor = ProcessPoolExecutor()
    out = await loop.run_in_executor(executor, blocking_func, 10)  # This does not
    print(out)


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
For the majority of cases, this function alone is good enough. If you find yourself needing other constructs from multiprocessing, like Queue, Event, Manager, etc., there is a third-party library called aioprocessing (full disclosure: I wrote it), that provides asyncio-compatible versions of all the multiprocessing data structures. Here's an example demoing that:
import time
import asyncio
import aioprocessing
import multiprocessing


def func(queue, event, lock, items):
    with lock:
        event.set()
        for item in items:
            time.sleep(3)
            queue.put(item+5)
    queue.close()


async def example(queue, event, lock):
    l = [1,2,3,4,5]
    p = aioprocessing.AioProcess(target=func, args=(queue, event, lock, l))
    p.start()
    while True:
        result = await queue.coro_get()
        if result is None:
            break
        print("Got result {}".format(result))
    await p.coro_join()


async def example2(queue, event, lock):
    await event.coro_wait()
    async with lock:
        await queue.coro_put(78)
        await queue.coro_put(None)  # Shut down the worker


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    queue = aioprocessing.AioQueue()
    lock = aioprocessing.AioLock()
    event = aioprocessing.AioEvent()
    tasks = [
        asyncio.ensure_future(example(queue, event, lock)),
        asyncio.ensure_future(example2(queue, event, lock)),
    ]
    loop.run_until_complete(asyncio.wait(tasks))
    loop.close()
Yes, there are quite a few bits that may (or may not) bite you.
When you run something like asyncio it expects to run on one thread or process. This does not (by itself) work with parallel processing. You somehow have to distribute the work while leaving the IO operations (specifically those on sockets) in a single thread/process.
While your idea to hand off individual connections to a different handler process is nice, it is hard to implement. The first obstacle is that you need a way to pull the connection out of asyncio without closing it. The next obstacle is that you cannot simply send a file descriptor to a different process unless you use platform-specific (probably Linux) code from a C-extension.
Note that the multiprocessing module is known to create a number of threads for communication. Most of the time when you use communication structures (such as Queues), a thread is spawned. Unfortunately those threads are not completely invisible. For instance they can fail to tear down cleanly (when you intend to terminate your program), but depending on their number the resource usage may be noticeable on its own.
If you really intend to handle individual connections in individual processes, I suggest examining different approaches. For instance, you can put a socket into listen mode and then simultaneously accept connections from multiple worker processes in parallel. Once a worker is finished processing a request, it can go accept the next connection, so you still use fewer resources than forking a process for each connection. SpamAssassin and Apache (mpm prefork) can use this worker model, for instance. It might end up easier and more robust depending on your use case. Specifically, you can make your workers die after serving a configured number of requests and be respawned by a master process, thereby eliminating much of the negative effects of memory leaks.
Based on @dano's answer above I wrote this function to replace places where I used to use multiprocess pool + map.
import asyncio
from concurrent.futures import ProcessPoolExecutor
from typing import Callable


def asyncio_friendly_multiproc_map(fn: Callable, l: list):
    """
    This is designed to replace the use of this pattern:

        with multiprocessing.Pool(5) as p:
            results = p.map(analyze_day, list_of_days)

    By letting caller drop in replace:

        asyncio_friendly_multiproc_map(analyze_day, list_of_days)
    """
    tasks = []
    with ProcessPoolExecutor(5) as executor:
        for e in l:
            tasks.append(asyncio.get_event_loop().run_in_executor(executor, fn, e))
        res = asyncio.get_event_loop().run_until_complete(asyncio.gather(*tasks))
    return res
See PEP 3156, in particular the section on Thread interaction:
http://www.python.org/dev/peps/pep-3156/#thread-interaction
This documents clearly the new asyncio methods you might use, including run_in_executor(). Note that the Executor is defined in concurrent.futures, I suggest you also have a look there.