Python Multiprocessing weird behavior when NOT USING time.sleep() - python

This is the exact code from Python.org. If you comment out the time.sleep(), it crashes with a long exception traceback. I would like to know why.
And, I do understand why Python.org included it in their example code. But artificially creating "working time" via time.sleep() shouldn't break the code when it's removed. It seems to me that the time.sleep() is affording some sort of spin up time. But as I said, I'd like to know from people who might actually know the answer.
A user comment asked me to fill in more details on the environment this was happening in. It was on OSX Big Sur 11.4. Using a clean install of Python 3.95 from Python.org (no Homebrew, etc). Run from within Pycharm inside a venv. I hope that helps add to understanding the situation.
import time
import random
from multiprocessing import Process, Queue, current_process, freeze_support
#
# Function run by worker processes
#
def worker(input, output):
for func, args in iter(input.get, 'STOP'):
result = calculate(func, args)
output.put(result)
#
# Function used to calculate result
#
def calculate(func, args):
result = func(*args)
return '%s says that %s%s = %s' % \
(current_process().name, func.__name__, args, result)
#
# Functions referenced by tasks
#
def mul(a, b):
#time.sleep(0.5*random.random()) # <--- time.sleep() commented out
return a * b
def plus(a, b):
#time.sleep(0.5*random.random()). # <--- time.sleep() commented out
return a + b
#
#
#
def test():
NUMBER_OF_PROCESSES = 4
TASKS1 = [(mul, (i, 7)) for i in range(20)]
TASKS2 = [(plus, (i, 8)) for i in range(10)]
# Create queues
task_queue = Queue()
done_queue = Queue()
# Submit tasks
for task in TASKS1:
task_queue.put(task)
# Start worker processes
for i in range(NUMBER_OF_PROCESSES):
Process(target=worker, args=(task_queue, done_queue)).start()
# Get and print results
print('Unordered results:')
for i in range(len(TASKS1)):
print('\t', done_queue.get())
# Add more tasks using `put()`
for task in TASKS2:
task_queue.put(task)
# Get and print some more results
for i in range(len(TASKS2)):
print('\t', done_queue.get())
# Tell child processes to stop
for i in range(NUMBER_OF_PROCESSES):
task_queue.put('STOP')
if __name__ == '__main__':
freeze_support()
test()
This is the traceback if it helps anyone:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 110, in __setstate__
Traceback (most recent call last):
File "<string>", line 1, in <module>
self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 110, in __setstate__
self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 116, in spawn_main
exitcode = _main(fd, parent_sentinel)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 126, in _main
self = reduction.pickle.load(from_parent)
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/synchronize.py", line 110, in __setstate__
self._semlock = _multiprocessing.SemLock._rebuild(*state)
FileNotFoundError: [Errno 2] No such file or directory

Here's a technical breakdown.
This is a race condition where the main process finishes, and exits before some of the children have a chance to fully start up. As long as a child fully starts, there are mechanisms in-place to ensure they shut down smoothly, but there's an unsafe in-between time. Race conditions can be very system dependent, as it is up to the OS and the hardware to schedule the different threads, as well as how fast they chew through their work.
Here's what's going on when a process is started... Early on in the creation of a child process, it registers itself in the main process so that it will be either joined or terminated when the main process exits depending on if it's daemonic (multiprocessing.util._exit_function). This exit function was registered with the atexit module on import of multiprocessing.
Also during creation of the child process, a pair of Pipes are opened which will be used to pass the Process object to the child interpreter (which includes what function you want to execute and its arguments). This requires 2 file handles to be shared with the child, and these file handles are also registered to be closed using atexit.
The problem arises when the main process exits before the child has a chance to read all the necessary data from the pipe (un-pickling the Process object) during the startup phase. If the main process first closes the pipe, then waits for the child to join, then we have a problem. The child will continue spinning up the new python instance until it gets to the point when it needs to read in the Process object containing your function and arguments it should run. It will try to read from a pipe which has already been closed, which is an error.
If all the children get a chance to fully start-up you won't see this ever, because that pipe is only used for startup. Putting in a delay which will in some way guarantee that all the children have some time to fully start up is what solves this problem. Manually calling join will provide this delay by waiting for the children before any of the atexit handlers are called. Additionally, any amount of processing delay means that q.get in the main thread will have to wait a while which also gives the children time to start up before closing. I was never able to reproduce the problem you encountered, but presumably you saw the output from all the TASKS (" Process-1 says that mul(19, 7) = 133 "). Only one or two of the child processes ended up doing all the work, allowing the main process to get all the results, and finish up before the other children finished startup.
EDIT:
The error is unambiguous as to what's happening, but I still can't figure how it happens... As far as I can tell, the file handles should be closed when calling _run_finalizers() in _exit_function after joining or terminating all active_children rather than before via _run_finalizers(0)
EDIT2:
_run_finalizers will seemingly actually never call Popen.finalizer to close the pipes, because exitpriority is None. I'm very confused as to what's going on here, and I think I need to sleep on it...

Apparently #user2357112supportsMonica was on the right track. It totally solves the problem if you join the processes before exiting the program. Also #Aaron's answer has the deep knowledge as to why this fixes the issue!
I added the following bits of code as was suggested and it totally fixed the need to have time.sleep() in there.
First I gathered all the processes when they were started:
processes: list[Process] = []
# Start worker processes
for i in range(NUMBER_OF_PROCESSES):
p = Process(target=worker, args=(task_queue, done_queue))
p.start()
processes.append(p)
Then at the end of the program I joined them as follows:
# Join the processes
for p in processes:
p.join()
Totally solved the issues. Thanks for the advice.

Related

How to use multiprocess inside an asynchronous function? It returns me that freeze_support() has been omitted

I have a program that uses many asynchronous functions that are important for its general operation, but I need one of them to execute 2 processes at the same time.
For this I decided to use the multiprocess, creating and starting the process p1 and the process p2.
p1 will be a kind of timer that will count 8 seconds and that will be the time limit that the process p2 will have to finish your tasks, but if p2 does not complete its tasks before 8 seconds (that is, p1 finishes before p2) then both processes are closed, and the program continues.
In this case p2 is a process that does operations with strings, and encodes the content of the variable called text, returning it to the main program after finishing. On the other hand, p1 is a process whose goal is to limit the time that process p2 has to perform string operations and load the result into the text variable.
The objective of this algorithm would be to avoid the possible case that if a user sent super long operations (or in a possible case that the system was saturated to process that information in a reasonable time) that took more than 8 seconds, then p2 would stop doing them and the content of the variable text would not change.
This is my simplified code, where I try to make 2 threads (called p1 and p2) each one executing a different function, but both functions are within the same class, and I am calling this class from an asynchronous function:
import asyncio, multiprocessing, time
#class BotBrain(BotModule):
class BotBrain:
# real __init__()
'''
def __init__(self, chatbot: object) -> None:
self.chatbot = chatbot
self.max_learn_range = 4
corpus_name = self.chatbot.config["CorpusName"]
self.train(corpus_name)
'''
# test __init__()
def __init__(self):
self.finish_state = multiprocessing.Event()
def operation_process(self, input_text, text):
#text = name_identificator(input_text, text)
text = "aaa" #for this example I change the info directly without detailing the nlp process
return input_text, text
def finish_process(self, finish_state):
time.sleep(8)
print("the process has been interrupted!")
self.finish_state.set()
def process_control(self, finish_state, input_text, text):
p1 = multiprocessing.Process(target=self.finish_process, args=(finish_state,))
p1.start()
p2 = multiprocessing.Process(target=self.operation_process, args=(input_text, text,))
p2.start()
#close all unfinished processes, p1 or p2
finish_state.wait()
for process in [p1, p2]:
process.terminate()
def process(self, **kwargs) -> str:
input_text, text = "Hello, how are you?", ""
text = self.process_control(self.finish_state, input_text, text) #for multiprocessing code
async def main_call():
testInstance = BotBrain() #here execute the instructions of the __init__() function
testInstance.process()
asyncio.run(main_call())
But it is giving me these errors, that I do not know how to solve them:
File "J:\async_and_multiprocess.py", line 46, in process_control
p1.start()
File "C:\Users\MIPC\anaconda3\lib\multiprocessing\process.py", line 121, in start
self._popen = self._Popen(self)
File "C:\Users\MIPC\anaconda3\lib\multiprocessing\context.py", line 224, in _Popen
return _default_context.get_context().Process._Popen(process_obj)
return _default_context.get_context().Process._Popen(process_obj)
File "C:\Users\MIPC\anaconda3\lib\multiprocessing\context.py", line 327, in _Popen
File "C:\Users\MIPC\anaconda3\lib\multiprocessing\context.py", line 327, in _Popen
return Popen(process_obj)
File "C:\Users\MIPC\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
return Popen(process_obj)
File "C:\Users\MIPC\anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\MIPC\anaconda3\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
prep_data = spawn.get_preparation_data(process_obj._name)
File "C:\Users\MIPC\anaconda3\lib\multiprocessing\spawn.py", line 154, in get_preparation_data
_check_not_importing_main()
File "C:\Users\MIPC\anaconda3\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
_check_not_importing_main()
File "C:\Users\MIPC\anaconda3\lib\multiprocessing\spawn.py", line 134, in _check_not_importing_main
raise RuntimeError('''
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
What does freeze_support() mean? And where in the class should I put that?
This is the flowchart of how my program should work, though it doesn't consider freeze_support() because I don't know what you mean by that.
Multiprocessing can be tricky, in part because of Python's import system. From the programming guidelines of the multiprocessing module:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such a starting a new process).
This is what is happening here. Importing your main module executes asyncio.run(main_call()) which in turn would spawn another subprocess and so on.
You need to guard your asyncio.run from being exectuted on import with:
if __name__ == "__main__":
asyncio.run(main_call())
If you don't want to produce a frozen binary (e.g. with pyinstaller) you can ignore the part with the freeze_support().

Multiprocessing.Queue with hugh data causes _wait_for_tstate_lock

An Exception is raised in threading._wait_for_tstate_lock when I transfere hugh data between a Process and a Thread via multiprocessing.Queue.
My minimal working example looks a bit complex first - sorry. I will explain. The original application loads a lot of (not so important) files into RAM. This is done in a separate process to save ressources. The main gui thread shouldn't freez.
The GUI start a separate Thread to prevent the gui event loop from freezing.
This separate Thread then starts one Process which should does the work.
a) This Thread instanciates a multiprocess.Queue (be aware that this is a multiprocessing and not threading!)
b) This is givin to the Process for sharing data from Process back to the Thread.
The Process does some work (3 steps) and .put() the result into the multiprocessing.Queue.
When the Process ends the Thread takes over again and collect the data from the Queue, store it to its own attribute MyThread.result.
The Thread tells the GUI main loop/thread to call a callback function if it has time for.
The callback function (MyWindow::callback_thread_finished()) get the results from MyWindow.thread.result.
The problem is if the data put to the Queue is to big something happen I don't understand - the MyThread never ends. I have to cancle the application via Strg+C.
I got some hints from the docs. But my problem is I did not fully understand the documentation. But I have the feeling that the key of my problems can be found there.
Please see the two red boxex in "Pipes and Queues" (Python 3.5 docs).
That is the full output
MyWindow::do_start()
Running MyThread...
Running MyProcess...
MyProcess stoppd.
^CProcess MyProcess-1:
Exception ignored in: <module 'threading' from '/usr/lib/python3.5/threading.py'>
Traceback (most recent call last):
File "/usr/lib/python3.5/threading.py", line 1288, in _shutdown
t.join()
File "/usr/lib/python3.5/threading.py", line 1054, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.5/threading.py", line 1070, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt
Traceback (most recent call last):
File "/usr/lib/python3.5/multiprocessing/process.py", line 252, in _bootstrap
util._exit_function()
File "/usr/lib/python3.5/multiprocessing/util.py", line 314, in _exit_function
_run_finalizers()
File "/usr/lib/python3.5/multiprocessing/util.py", line 254, in _run_finalizers
finalizer()
File "/usr/lib/python3.5/multiprocessing/util.py", line 186, in __call__
res = self._callback(*self._args, **self._kwargs)
File "/usr/lib/python3.5/multiprocessing/queues.py", line 198, in _finalize_join
thread.join()
File "/usr/lib/python3.5/threading.py", line 1054, in join
self._wait_for_tstate_lock()
File "/usr/lib/python3.5/threading.py", line 1070, in _wait_for_tstate_lock
elif lock.acquire(block, timeout):
KeyboardInterrupt
This is the minimal working example
#!/usr/bin/env python3
import multiprocessing
import threading
import time
import gi
gi.require_version('Gtk', '3.0')
from gi.repository import Gtk
from gi.repository import GLib
class MyThread (threading.Thread):
"""This thread just starts the process."""
def __init__(self, callback):
threading.Thread.__init__(self)
self._callback = callback
def run(self):
print('Running MyThread...')
self.result = []
queue = multiprocessing.Queue()
process = MyProcess(queue)
process.start()
process.join()
while not queue.empty():
process_result = queue.get()
self.result.append(process_result)
print('MyThread stoppd.')
GLib.idle_add(self._callback)
class MyProcess (multiprocessing.Process):
def __init__(self, queue):
multiprocessing.Process.__init__(self)
self.queue = queue
def run(self):
print('Running MyProcess...')
for i in range(3):
self.queue.put((i, 'x'*102048))
print('MyProcess stoppd.')
class MyWindow (Gtk.Window):
def __init__(self):
Gtk.Window.__init__(self)
self.connect('destroy', Gtk.main_quit)
GLib.timeout_add(2000, self.do_start)
def do_start(self):
print('MyWindow::do_start()')
# The process need to be started from a separate thread
# to prevent the main thread (which is the gui main loop)
# from freezing while waiting for the process result.
self.thread = MyThread(self.callback_thread_finished)
self.thread.start()
def callback_thread_finished(self):
result = self.thread.result
for r in result:
print('{} {}...'.format(r[0], r[1][:10]))
if __name__ == '__main__':
win = MyWindow()
win.show_all()
Gtk.main()
Possible duplicate but quite different and IMO without an answer for my situation: Thread._wait_for_tstate_lock() never returns.
Workaround
Using a Manager by modifing line 22 to queue = multiprocessing.Manager().Queue() solve the problem. But I don't know why. My intention of this question is to understand the things behind and not only to make my code work. Even I don't really know what a Manager() is and if it has other (problem causing) implications.
According to the second warning box in the documentation you are linking to you can get a deadlock when you join a process before processing all items in the queue. So starting the process and immediately joining it and then processing the items in the queue is the wrong order of steps. You have to start the process, then receive the items, and then only when all items are received you can call the join method. Define some sentinel value to signal that the process is finished sending data through the queue. None for example if that can't be a regular value you expect from the process.
class MyThread(threading.Thread):
"""This thread just starts the process."""
def __init__(self, callback):
threading.Thread.__init__(self)
self._callback = callback
self.result = []
def run(self):
print('Running MyThread...')
queue = multiprocessing.Queue()
process = MyProcess(queue)
process.start()
while True:
process_result = queue.get()
if process_result is None:
break
self.result.append(process_result)
process.join()
print('MyThread stoppd.')
GLib.idle_add(self._callback)
class MyProcess(multiprocessing.Process):
def __init__(self, queue):
multiprocessing.Process.__init__(self)
self.queue = queue
def run(self):
print('Running MyProcess...')
for i in range(3):
self.queue.put((i, 'x' * 102048))
self.queue.put(None)
print('MyProcess stoppd.')
The documentation in question
reads:
Warning
As mentioned above, if a child process has put items on a queue (and it has not used JoinableQueue.cancel_join_thread), then that process will not terminate until all buffered items have been flushed to the pipe.
This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.
Note that a queue created using a manager does not have this issue. See Programming guidelines.
This is supplementary to the accepted answer, but the edit queue is full.

Call a module's multiprocessing function from a script

My module, which is also a script, calls some internally defined functions that use multiprocessing.
Running the module as a script works just fine on Windows and Linux. Calling its main function from another python script works fine on Linux but not on Windows.
The core, multi-processed function (the function passed to the multiprocessing.Process constructor as the target) never gets executed when my module calls the Process's start() function.
The module must be doing something too demanding for this usage (multiprocessing on Windows when called from a script), but how can I get to the source of this problem?
Here's some example code to demonstrate the behavior. First the module:
# -*- coding: utf-8 -*-
'my_mp_module.py'
import argparse
import itertools
import Queue
import multiprocessing
def meaty_function(**kwargs):
'Do a meaty calculation using multiprocessing'
task_values = kwargs['task_values']
# Set up a queue of tasks to perform, one for each element in the task_values array
in_queue = multiprocessing.Queue()
out_queue = multiprocessing.Queue()
reduce(lambda a, b: a or b,
itertools.imap(in_queue.put, enumerate(task_values)))
core_procargs=(
in_queue ,
out_queue,
)
core_processes = [multiprocessing.Process(target=_core_function,
args=core_procargs) for ii in xrange(len(task_values))]
for p in core_processes:
p.daemon = True # I've tried both ways, setting this to True and False
p.start()
sum_of_results = 0
for result_count in xrange(len(task_values)):
a_result = out_queue.get(block=True)
sum_of_results += a_result
for p in core_processes:
p.join()
return sum_of_results
def _core_function(inp_queue, out_queue):
'Perform the core calculation for each task in the input queue, placing the results in the output queue'
while 1:
try:
task_idx, task_value = inp_queue.get(block=False)
# Perform a calculation with this task value.
task_result = task_idx + task_value # The real calculation is more complicated than this
out_queue.put(task_result)
except Queue.Empty:
break
def get_command_line_arguments(command_line=None):
'parse the given command_line (list of strings) or from sys.argv, return the corresponding argparse.Namespace object'
aparse = argparse.ArgumentParser(description=__doc__)
aparse.add_argument('--task_values', '-t',
action='append',
type=int,
help='''The value for each task to perform.''')
return aparse.parse_args(args=command_line)
def main(command_line=None):
'perform a meaty calculation with the input from the command line, and print the results'
# collect input from the command line
args=get_command_line_arguments(command_line)
keywords = vars(args)
# perform a meaty calculation with the input
meaty_results = meaty_function(**keywords)
# display the results
print(meaty_results)
if __name__ == '__main__':
multiprocessing.freeze_support()
main(command_line=None)
Now the script that calls the module:
# -*- coding: utf-8 -*-
'my_mp_script.py:'
import my_mp_module
import multiprocessing
multiprocessing.freeze_support()
my_mp_module.main(command_line=None)
Running the module as a script gives the expected results:
C:\Users\greg>python -m my_mp_module -t 0 -t 1 -t 2
6
But running another script that simply calls the module's main() function gives an error message under Windows (here I stripped out the error message duplicated from each of the multiple processes):
C:\Users\greg>python my_mp_script.py -t 0 -t 1 -t 2
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Users\greg\AppData\Local\Continuum\anaconda2-64\lib\multiprocessing\forking.py", line 380, in main
prepare(preparation_data)
File "C:\Users\greg\AppData\Local\Continuum\anaconda2-64\lib\multiprocessing\forking.py", line 510, in prepare
'__parents_main__', file, path_name, etc
File "C:\Users\greg\Documents\PythonCode\Scripts\my_mp_script.py", line 7, in <module>
my_mp_module.main(command_line=None)
File "C:\Users\greg\Documents\PythonCode\Lib\my_mp_module.py", line 72, in main
meaty_results = meaty_function(**keywords)
File "C:\Users\greg\Documents\PythonCode\Lib\my_mp_module.py", line 28, in meaty_function
p.start()
File "C:\Users\greg\AppData\Local\Continuum\anaconda2-64\lib\multiprocessing\process.py", line 130, in start
self._popen = Popen(self)
File "C:\Users\greg\AppData\Local\Continuum\anaconda2-64\lib\multiprocessing\forking.py", line 258, in __init__
cmd = get_command_line() + [rhandle]
File "C:\Users\greg\AppData\Local\Continuum\anaconda2-64\lib\multiprocessing\forking.py", line 358, in get_command_line
is not going to be frozen to produce a Windows executable.''')
RuntimeError:
Attempt to start a new process before the current process
has finished its bootstrapping phase.
This probably means that you are on Windows and you have
forgotten to use the proper idiom in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce a Windows executable.
Linux and Windows work a little differently in the way they create additional processes. Linux forks the code but Windows creates a new Python interpreter to run the spawned process. The effect here is that all your code gets re-loaded just as if it were the first time. There is a similar question that might be informative to look at see... How to stop multiprocessing in python running for the full script.
The solution here is to modify the my_mp_script.py script so the call to my_mp_module.main() is guarded like so..
import my_mp_module
import multiprocessing
if __name__ == '__main__':
my_mp_module.main(command_line=None)
Note that I've also removed the freeze_support() functions for now, however those may be acceptable to put back in if needed.

Python Multiprocessing: Crash in subprocess?

What happens when a python script opens subprocesses and one process crashes?
https://stackoverflow.com/a/18216437/311901
Will the main process crash?
Will the other subprocesses crash?
Is there a signal or other event that's propagated?
When using multiprocessing.Pool, if one of the subprocesses in the pool crashes, you will not be notified at all, and a new process will immediately be started to take its place:
>>> import multiprocessing
>>> p = multiprocessing.Pool()
>>> p._processes
4
>>> p._pool
[<Process(PoolWorker-1, started daemon)>, <Process(PoolWorker-2, started daemon)>, <Process(PoolWorker-3, started daemon)>, <Process(PoolWorker-4, started daemon)>]
>>> [proc.pid for proc in p._pool]
[30760, 30761, 30762, 30763]
Then in another window:
dan#dantop:~$ kill 30763
Back to the pool:
>>> [proc.pid for proc in p._pool]
[30760, 30761, 30762, 30767] # New pid for the last process
You can continue using the pool as if nothing happened. However, any work item that the killed child process was running at the time it died will not be completed or restarted. If you were running a blocking map or apply call that was relying on that work item to complete, it will likely hang indefinitely. There is a bug filed for this, but the issue was only fixed in concurrent.futures.ProcessPoolExecutor, rather than in multiprocessing.Pool. Starting with Python 3.3, ProcessPoolExecutor will raise a BrokenProcessPool exception if a child process is killed, and disallow any further use of the pool. Sadly, multiprocessing didn't get this enhancement. For now, if you want to guard against a pool call blocking forever due to a sub-process crashing, you have to use ugly workarounds.
Note: The above only applies to a process in a pool actually crashing, meaning the process completely dies. If a sub-process raises an exception, that will be propagated up the parent process when you try to retrieve the result of the work item:
>>> def f(): raise Exception("Oh no")
...
>>> pool = multiprocessing.Pool()
>>> result = pool.apply_async(f)
>>> result.get()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/multiprocessing/pool.py", line 528, in get
raise self._value
Exception: Oh no
When using a multiprocessing.Process directly, the process object will show that the process has exited with a non-zero exit code if it crashes:
>>> def f(): time.sleep(30)
...
>>> p = multiprocessing.Process(target=f)
>>> p.start()
>>> p.join() # Kill the process while this is blocking, and join immediately ends
>>> p.exitcode
-15
The behavior is similar if an exception is raised:
from multiprocessing import Process
def f(x):
raise Exception("Oh no")
if __name__ == '__main__':
p = Process(target=f)
p.start()
p.join()
print(p.exitcode)
print("done")
Output:
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.2/multiprocessing/process.py", line 267, in _bootstrap
self.run()
File "/usr/lib/python3.2/multiprocessing/process.py", line 116, in run
self._target(*self._args, **self._kwargs)
TypeError: f() takes exactly 1 argument (0 given)
1
done
As you can see, the traceback from the child is printed, but it doesn't affect exceution of the main process, which is able to show the exitcode of the child was 1.

Python Multiprocessing atexit Error "Error in atexit._run_exitfuncs"

I am trying to run a simple multiple processes application in Python. The main thread spawns 1 to N processes and waits until they all done processing. The processes each run an infinite loop, so they can potentially run forever without some user interruption, so I put in some code to handle a KeyboardInterrupt:
#!/usr/bin/env python
import sys
import time
from multiprocessing import Process
def main():
# Set up inputs..
# Spawn processes
Proc( 1).start()
Proc( 2).start()
class Proc ( Process ):
def __init__ ( self, procNum):
self.id = procNum
Process.__init__(self)
def run ( self ):
doneWork = False
while True:
try:
# Do work...
time.sleep(1)
sys.stdout.write('.')
if doneWork:
print "PROC#" + str(self.id) + " Done."
break
except KeyboardInterrupt:
print "User aborted."
sys.exit()
# Main Entry
if __name__=="__main__":
main()
The problem is that when using CTRL-C to exit, I get an additional error even though the processes seem to exit immediately:
......User aborted.
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
File "C:\Python26\lib\atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "C:\Python26\lib\multiprocessing\util.py", line 281, in _exit_function
p.join()
File "C:\Python26\lib\multiprocessing\process.py", line 119, in join
res = self._popen.wait(timeout)
File "C:\Python26\lib\multiprocessing\forking.py", line 259, in wait
res = _subprocess.WaitForSingleObject(int(self._handle), msecs)
KeyboardInterrupt
Error in sys.exitfunc:
Traceback (most recent call last):
File "C:\Python26\lib\atexit.py", line 24, in _run_exitfuncs
func(*targs, **kargs)
File "C:\Python26\lib\multiprocessing\util.py", line 281, in _exit_function
p.join()
File "C:\Python26\lib\multiprocessing\process.py", line 119, in join
res = self._popen.wait(timeout)
File "C:\Python26\lib\multiprocessing\forking.py", line 259, in wait
res = _subprocess.WaitForSingleObject(int(self._handle), msecs)
KeyboardInterrupt
I am running Python 2.6 on Windows. If there is a better way to handle multiprocessing in Python, please let me know.
This is a very old question, but it seems like the accepted answer does not actually fix the problem.
The main issue is that you need to handle the keyboard interrupt in the parent process as well. Additionally to that, in the while loop, you just need to exit the loop, there's no need to call sys.exit()
I've tried to as closely match the example in the original question. The doneWork code does nothing in the example so have removed that for clarity.
import sys
import time
from multiprocessing import Process
def main():
# Set up inputs..
# Spawn processes
try:
processes = [Proc(1), Proc(2)]
[p.start() for p in processes]
[p.join() for p in processes]
except KeyboardInterrupt:
pass
class Proc(Process):
def __init__(self, procNum):
self.id = procNum
Process.__init__(self)
def run(self):
while True:
try:
# Do work...
time.sleep(1)
sys.stdout.write('.')
except KeyboardInterrupt:
print("User aborted.")
break
# Main Entry
if __name__ == "__main__":
main()
Rather then just forcing sys.exit(), you want to send a signal to your threads to tell them to stop. Look into using signal handlers and threads in Python.
You could potentially do this by changing your while True: loop to be while keep_processing: where keep_processing is some sort of global variable that gets set on the KeyboardInterrupt exception. I don't think this is a good practice though.

Categories