I have a multiprocessing.Pool running tasks that I want to exit gracefully when terminated, by handling the SIGTERM signal.
This is my code example (using Python 3.9):
import os
import signal
import time
from multiprocessing import Pool

class SigTermException(Exception):
    pass

def sigtermhandler(signum, frame):
    raise SigTermException('sigterm')

def f():
    print(os.getpid())
    try:
        while True:
            print("loop")
            time.sleep(5)
    except SigTermException:
        print("Received SIGTERM")

def main():
    signal.signal(signal.SIGTERM, sigtermhandler)
    pool = Pool()
    pool.apply_async(f)
    print("wait 5")
    time.sleep(5)
    print("Terminating")
    pool.terminate()
    print("Joining")
    pool.join()
    print("Exiting")

if __name__ == '__main__':
    main()
I was expecting it to print:
...
Terminating
Received SIGTERM
Joining
Exiting
However, it seems it doesn't go past pool.terminate().
Here's an example run:
wait 5
92363
loop
Terminating
loop
Received SIGTERM
Performing a ps, I see the following:
92362 pts/0 S+ 0:00 | | \_ python signal_pool.py
92363 pts/0 S+ 0:00 | | \_ python signal_pool.py
So it looks like the child process is still 'alive'.
I also tested the solution mentioned here, to no avail.
Any hints or help are appreciated.
Your worker function, f, runs forever, yet your main process sleeps for only 5 seconds and then calls terminate on the pool, which kills any running tasks. This contradicts your stated goal of having tasks exit gracefully on SIGTERM, because as it now stands they will not exit gracefully even in the absence of a SIGTERM.
So I would think the main process should wait as long as necessary for the submitted task or tasks to complete -- that is the usual situation, right? It also seems that when I tried this and issued a kill -15 command, the worker function alone handled the signal, perhaps because the main process was simply blocked waiting for the submitted task to complete, so the signal was never passed to the main process. I therefore did not need a try/except block in the main process.
import os
import signal
import time
from multiprocessing import Pool

class SigTermException(Exception):
    pass

def sigtermhandler(signum, frame):
    raise SigTermException('sigterm')

def f():
    print(os.getpid())
    try:
        while True:
            print("loop")
            time.sleep(5)
    except SigTermException:
        print("Received SIGTERM")

def main():
    signal.signal(signal.SIGTERM, sigtermhandler)
    pool = Pool()
    async_result = pool.apply_async(f)
    print("waiting for task to complete ...")
    async_result.get()  # wait for task to complete
    pool.close()
    print("Joining")
    pool.join()
    print("Exiting")

if __name__ == '__main__':
    main()
Printed:
waiting for task to complete ...
98
loop
Received SIGTERM
Joining
Exiting
You can also just do:
def main():
    signal.signal(signal.SIGTERM, sigtermhandler)
    pool = Pool()
    pool.apply_async(f)
    print("waiting for all tasks to complete ...")
    pool.close()
    pool.join()
    print("Exiting")
Related
When using a multiprocessing queue to communicate between processes, many articles recommend sending a terminate message to the queue.
However, if a child process is the producer, it may fail unexpectedly, leaving the consumer without any notification that no more messages are coming.
However, the parent process can be notified when a child process dies.
It seems it should then be possible for it to notify a worker thread in its own process to quit and not expect more messages. But how?
multiprocessing.Queue.close()
...doesn't notify consumers (Really? Wait? what!)
def onProcessQuit():  # Notify worker that we are done.
    messageQ.put("TERMINATE")
... doesn't let me wait for pending work to complete.
def onProcessQuit():  # Notify worker that we are done.
    messageQ.put("TERMINATE")
    # messageQ.close()
    messageQ.join_thread()  # Wait for worker to complete
... fails because the queue is not yet closed.
def onProcessQuit():  # Notify worker that we are done.
    messageQ.put("TERMINATE")
    messageQ.close()
    messageQ.join_thread()  # Wait for worker to complete
... seems like it should work, but fails in the worker with a TypeError exception:
msg = messageQ.get()
File "/usr/lib/python3.7/multiprocessing/queues.py", line 94, in get
res = self._recv_bytes()
File "/usr/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
buf = self._recv_bytes(maxlength)
File "/usr/lib/python3.7/multiprocessing/connection.py", line 411, in _recv_bytes
return self._recv(size)
File "/usr/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
chunk = read(handle, remaining)
TypeError: an integer is required (got type NoneType)
while not quit:
    try:
        msg = messageQ.get(block=True, timeout=0.5)
    except Empty:
        continue
... is terrible in that it needlessly trades off shutdown latency against keeping the CPU busy polling.
Full example
import multiprocessing
import threading

def producer(messageQ):
    messageQ.put("1")
    messageQ.put("2")
    messageQ.put("3")

if __name__ == '__main__':
    messageQ = multiprocessing.Queue()

    def worker():
        try:
            while True:
                msg = messageQ.get()
                print(msg)
                if msg == "TERMINATE":
                    return
                # messageQ.task_done()
        finally:
            print("Worker quit")
            # messageQ.close()        # End thread
            # messageQ.join_thread()

    thr = threading.Thread(target=worker,
                           daemon=False)  # The work queue is precious.
    thr.start()

    def onProcessQuit():  # Notify worker that we are done.
        messageQ.put("TERMINATE")  # Notify worker we are done
        messageQ.close()           # No more messages
        messageQ.join_thread()     # Wait for worker to complete

    def runProcess():
        proc = multiprocessing.Process(target=producer, args=(messageQ,))
        proc.start()
        proc.join()
        print("runProcess quitting ...")
        onProcessQuit()
        print("runProcess quitting .. OK")

    runProcess()
If you are concerned about the producer process not completing normally, then I am not sure what your question is, because your code as posted should work except for a few corrections: (1) it is missing an import statement, (2) there is no call to runProcess, and (3) your worker thread is incorrectly a daemon thread (as such, it may end up terminating before it has had a chance to process all the messages on the queue).
As a personal preference (not a correction), I would also use None as the special sentinel message instead of TERMINATE and remove some extraneous queue calls that you don't really need (I don't see explicitly closing the queue accomplishing anything that is necessary).
These are the changes:
def producer(messageQ):
    messageQ.put("1")
    messageQ.put("2")
    messageQ.put("3")

if __name__ == '__main__':
    import multiprocessing
    import threading

    SENTINEL = None

    def worker():
        try:
            while True:
                msg = messageQ.get()
                if msg is SENTINEL:
                    return  # No need to print the sentinel
                print(msg)
        finally:
            print("Worker quit")

    def onProcessQuit():  # Notify worker that we are done.
        messageQ.put(SENTINEL)  # Notify worker we are done

    def runProcess():
        proc = multiprocessing.Process(target=producer, args=(messageQ,))
        proc.start()
        proc.join()
        print("runProcess quitting ...")
        onProcessQuit()
        print("runProcess quitting .. OK")
        thr.join()

    messageQ = multiprocessing.Queue()
    thr = threading.Thread(target=worker)  # The work queue is precious.
    thr.start()
    runProcess()
Prints:
1
2
3
runProcess quitting ...
runProcess quitting .. OK
Worker quit
If I use asyncio to spawn a subprocess which runs another Python script, there is a warning at the end: RuntimeWarning: A loop is being detached from a child watcher with pending handlers, if the subprocess is terminated by terminate().
For example, a very simple dummy:
import datetime
import time
import os

if __name__ == '__main__':
    for i in range(3):
        msg = 'pid({}) {}: continue'.format(os.getpid(), datetime.datetime.now())
        print(msg)
        time.sleep(1.0)
Then I spawn the dummy:
import asyncio
from asyncio import subprocess

T = 3  # if T=5, which allows the subprocess to finish, then there is no such warning.

async def handle_proc():
    p = None
    try:
        p = await subprocess.create_subprocess_exec(
            'python3', 'dummy.py',
            # 'dummy.sh'
        )
        await asyncio.sleep(T)
    finally:
        if p and p.returncode is None:
            p.terminate()
        print('handle_proc Done!')

if __name__ == '__main__':
    asyncio.run(handle_proc())
The warning appears whenever T is short enough for p.terminate() to get called.
If I spawn a bash script instead, there is no warning:
#!/usr/bin/env bash
set -e

N=10
T=1
for i in $(seq 1 $N)
do
    echo "loop=$i/$N, sleep $T"
    >&2 echo "msg in stderror: ($i/$N,$T)"
    sleep $T
done
What did I do wrong here?
To get rid of this warning, you need to await p.wait() after p.terminate().
The reason you need to do this is that p.terminate() only sends signal.SIGTERM to the process. The process might still do some cleanup work after receiving the signal. If you don't wait for the process to finish terminating, your script finishes before the process and the process is cut off prematurely.
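Applied to the example above, a minimal sketch of the fix (only the finally block changes):

async def handle_proc():
    p = None
    try:
        p = await subprocess.create_subprocess_exec(
            'python3', 'dummy.py',
        )
        await asyncio.sleep(T)
    finally:
        if p and p.returncode is None:
            p.terminate()
            await p.wait()  # reap the child so the watcher has no pending handlers
        print('handle_proc Done!')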
I have a Python function which calls into a C library I cannot control or update. Unfortunately, there is an intermittent bug in the C library and occasionally it hangs. To protect my application from also hanging, I'm trying to isolate the function call in a ThreadPoolExecutor or ProcessPoolExecutor so only that thread or process crashes.
However, the following code hangs, because the executor cannot shut down while the hung call is still running!
Is it possible to cancel an executor with a future that has hung?
import time
from concurrent.futures import ThreadPoolExecutor, wait

if __name__ == "__main__":
    def hang_forever(*args):
        print("Starting hang_forever")
        time.sleep(10.0)
        print("Finishing hang_forever")

    print("Starting executor")
    with ThreadPoolExecutor() as executor:
        future = executor.submit(hang_forever)
        print("Submitted future")
        done, not_done = wait([future], timeout=1.0)
        print("Done", done, "Not done", not_done)
    # with never exits because future has hung!
    if len(not_done) > 0:
        raise IOError("Timeout")
The docs say that it's not possible to shut down the executor until all pending futures are done executing:
Regardless of the value of wait, the entire Python program will not
exit until all pending futures are done executing.
Calling future.cancel() won't help as it will also hang. Fortunately, you can solve your problem by using multiprocessing.Process directly instead of using ProcessPoolExecutor:
import time
from multiprocessing import Process

def hang_forever():
    while True:
        print('hang forever...')
        time.sleep(1)

def main():
    proc = Process(target=hang_forever)
    print('start the process')
    proc.start()
    time.sleep(1)

    timeout = 3
    print(f'trying to join the process in {timeout} sec...')
    proc.join(timeout)

    if proc.is_alive():
        print('timeout is exceeded, terminate the process!')
        proc.terminate()
        proc.join()

    print('done')

if __name__ == '__main__':
    main()
Output:
start the process
hang forever...
trying to join the process in 3 sec...
hang forever...
hang forever...
hang forever...
hang forever...
timeout is exceeded, terminate the process!
done
This question concerns multiprocessing in Python. I want to execute some code when I terminate the process, or to be more specific, just before it is terminated. I'm looking for a solution that works like atexit.register does for a Python program.
I have a worker method which looks like this:
def worker():
    while True:
        print('work')
        time.sleep(2)
    return
I run it by:
proc = multiprocessing.Process(target=worker, args=())
proc.start()
My goal is to execute some extra code just before terminating it, which I do by:
proc.terminate()
Use signal handling and intercept SIGTERM:
import multiprocessing
import time
import sys
from signal import signal, SIGTERM

def before_exit(*args):
    print('Hello')
    sys.exit(0)  # don't forget to exit!

def worker():
    signal(SIGTERM, before_exit)
    time.sleep(10)

proc = multiprocessing.Process(target=worker, args=())
proc.start()
time.sleep(3)
proc.terminate()
This produces the desired output just before the subprocess terminates.
When you import and use a package, that package can start non-daemon threads. Until these threads finish, Python cannot exit properly (e.g. with sys.exit(0)). For example, imagine that thread t below comes from some package. When an unhandled exception occurs in the main thread, you want to terminate, but this won't exit immediately; it will wait 60 seconds until the thread terminates.
import sys
import time, threading

def main():
    t = threading.Thread(target=time.sleep, args=(60,))
    t.start()
    a = 5 / 0

if __name__ == '__main__':
    try:
        main()
    except:
        sys.exit(1)
So I came up with two options: replace sys.exit(1) with os._exit(1), or enumerate all threads and make them daemonic. Both of them seem to work, but which do you think is better? os._exit won't flush stdio buffers, but setting the daemon attribute on running threads seems like a hack and maybe it's not guaranteed to work all the time.
import sys
import time, threading

def main():
    t = threading.Thread(target=time.sleep, args=(60,))
    t.start()
    a = 5 / 0

if __name__ == '__main__':
    try:
        main()
    except:
        for t in threading.enumerate():
            if not t.daemon and t.name != "MainThread":
                t._daemonic = True
        sys.exit(1)
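For comparison, here is a minimal sketch of the os._exit variant; since os._exit skips the normal shutdown (including flushing stdio buffers), the buffers are flushed manually first:

import os
import sys
import time, threading

def main():
    t = threading.Thread(target=time.sleep, args=(60,))
    t.start()
    a = 5 / 0

if __name__ == '__main__':
    try:
        main()
    except:
        sys.stdout.flush()  # os._exit does not flush stdio buffers
        sys.stderr.flush()
        os._exit(1)  # exit immediately, without waiting for non-daemon threads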