ProcessPoolExecutor, BrokenProcessPool handling

ProcessPoolExecutor, BrokenProcessPool handling - python

In this documentation ( https://pymotw.com/3/concurrent.futures/ ) it says:
"The ProcessPoolExecutor works in the same way as ThreadPoolExecutor, but uses processes instead of threads. This allows CPU-intensive operations to use a separate CPU and not be blocked by the CPython interpreter’s global interpreter lock."
This sounds great! It also says:
"If something happens to one of the worker processes to cause it to exit unexpectedly, the ProcessPoolExecutor is considered “broken” and will no longer schedule tasks."
This sounds bad :( So I guess my question is: What is considered "Unexpectedly?" Does that just mean the exit signal is not 1? Can I safely exit the thread and still keep processing a queue? The example is as follows:
from concurrent import futures
import os
import signal
with futures.ProcessPoolExecutor(max_workers=2) as ex:
print('getting the pid for one worker')
f1 = ex.submit(os.getpid)
pid1 = f1.result()
print('killing process {}'.format(pid1))
os.kill(pid1, signal.SIGHUP)
print('submitting another task')
f2 = ex.submit(os.getpid)
try:
pid2 = f2.result()
except futures.process.BrokenProcessPool as e:
print('could not start new tasks: {}'.format(e))

I hadn't see it IRL, but from the code it looks like the returned file descriptors not contains the results_queue file descriptor.
from concurrent.futures.process:
reader = result_queue._reader
while True:
_add_call_item_to_queue(pending_work_items,
work_ids_queue,
call_queue)
sentinels = [p.sentinel for p in processes.values()]
assert sentinels
ready = wait([reader] + sentinels)
if reader in ready: # <===================================== THIS
result_item = reader.recv()
else:
# Mark the process pool broken so that submits fail right now.
executor = executor_reference()
if executor is not None:
executor._broken = True
executor._shutdown_thread = True
executor = None
# All futures in flight must be marked failed
for work_id, work_item in pending_work_items.items():
work_item.future.set_exception(
BrokenProcessPool(
"A process in the process pool was "
"terminated abruptly while the future was "
"running or pending."
))
# Delete references to object. See issue16284
del work_item
the wait function depends on system, but assuming linux OS (at multiprocessing.connection, removed all timeout related code):
def wait(object_list, timeout=None):
'''
Wait till an object in object_list is ready/readable.
Returns list of those objects in object_list which are ready/readable.
'''
with _WaitSelector() as selector:
for obj in object_list:
selector.register(obj, selectors.EVENT_READ)
while True:
ready = selector.select(timeout)
if ready:
return [key.fileobj for (key, events) in ready]
else:
# some timeout code

Related

Get error flag/message from a queued process in Python multiprocessing

I am preparing a Python multiprocessing tool where I use Process and Queue commands. The queue is putting another script in a process to run in parallel. As a sanity check, in the queue, I want to check if there is any error happing in my other script and return a flag/message if there was an error (status = os.system() will run the process and status is a flag for error). But I can't output errors from the queue/child in the consumer process to the parent process. Following are the main parts of my code (shortened):
import os
import time
from multiprocessing import Process, Queue, Lock
command_queue = Queue()
lock = Lock()
p = Process(target=producer, args=(command_queue, lock, test_config_list_path))
for i in range(consumer_num):
c = Process(target=consumer, args=(command_queue, lock))
consumers.append(c)
p.daemon = True
p.start()
for c in consumers:
c.daemon = True
c.start()
p.join()
for c in consumers:
c.join()
if error_flag:
Stop_this_process_and_send_a_message!
def producer(queue, lock, ...):
for config_path in test_config_list_path:
queue.put((config_path, process_to_be_queued))
def consumer(queue, lock):
while True:
elem = queue.get()
if elem is None:
return
status = os.system(elem[1])
if status:
error_flag = 1
time.sleep(3)
Now I want to get that error_flag and use it in the main code to handle things. But seems I can't output error_flag from the consumer (child) part to the main part of the code. I'd appreciate it if someone can help with this.

Given your update, I also pass an multiprocessing.Event instance to your to_do process. This allows you to simply issue a call to wait on the event in the main process, which will block until a call to set is called on it. Naturally, when to_do or one of its threads detects a script error, it would call set on the event after setting error_flag.value to True. This will wake up the main process who can then call method terminate on the process, which will do what you want. On a normal completion of to_do, it still is necessary to call set on the event since the main process is blocking until the event has been set. But in this case the main process will just call join on the process.
Using a multiprocessing.Value instance alone would have required periodically checking its value in a loop, so I think waiting on a multiprocessing.Event is better. I have also made a couple of other updates to your code with comments, so please review them:
import multiprocessing
from ctypes import c_bool
...
def to_do(event, error_flag):
# Run the tests
wrapper_threads.main(event, error_flag)
# on error or normal process completion:
event.set()
def git_pull_change(path_to_repo):
repo = Repo(path)
current = repo.head.commit
repo.remotes.origin.pull()
if current == repo.head.commit:
print("Repo not changed. Sleep mode activated.")
# Call to time.sleep(some_number_of_seconds) should go here, right?
return False
else:
print("Repo changed. Start running the tests!")
return True
def main():
while True:
status = git_pull_change(git_path)
if status:
# The repo was just pulled, so no point in doing it again:
#repo = Repo(git_path)
#repo.remotes.origin.pull()
event = multiprocessing.Event()
error_flag = multiprocessing.Value(c_bool, False, lock=False)
process = multiprocessing.Process(target=to_do, args=(event, error_flag))
process.start()
# wait for an error or normal process completion:
event.wait()
if error_flag.value:
print('Error! breaking the process!!!!!!!!!!!!!!!!!!!!!!!')
process.terminate() # Kill the process
else:
process.join()
break

You should always tag multiprocessing questions with the platform you are running on. Since I do not see your process-creating code within a if __name__ == '__main__': block, I have to assume you are running on a platform that uses OS fork calls to create new processes, such as Linux.
That means your newly created processes inherit the value of error_flag when they are created but for all intents and purposes, if a process modifies this variable, it is modifying a local copy of this variable that exists in an address space that is unique to that process.
You need to create error_flag in shared memory and pass it as an argument to your process:
from multiprocessing import Value
from ctypes import c_bool
...
error_flag = Value(c_bool, False, lock=False)
for i in range(consumer_num):
c = Process(target=consumer, args=(command_queue, lock, error_flag))
consumers.append(c)
...
if error_flag.value:
...
#Stop_this_process_and_send_a_message!
def consumer(queue, lock, error_flag):
while True:
elem = queue.get()
if elem is None:
return
status = os.system(elem[1])
if status:
error_flag.value = True
time.sleep(3)
But I have a questions/comments for you. You have in your original code the following statement:
if error_flag:
Stop_this_process_and_send_a_message!
But this statement is located after you have already joined all the started processes. So what processes are there to stop and where are you sending a message to (you have potentially multiple consumers any of which might be setting the error_flag -- by the way, no need to have this done under a lock since setting the value True is an atomic action). And since you are joining all your processes, i.e. waiting for them to complete, I am not sure why you are making them daemon processes. You are also passing a Lock instance to your producer and consumers, but it is not being used at all.
Your consumers return when they get a None record from the queue. So if you have N consumers, the last N elements of test_config_path need to be None.
I also see no need for having the producer process. The main process could just as well write all the records to the queue either before or even after it starts the consumer processes.
The call to time.sleep(3) you have at the end of function consumer is unreachable.

So the above code summary is the inner process to run some tests in parallel. I removed the def function part from it, but just assume that is the wrapper_threads in the following code summary. Here I'll add the parent process which is checking a variable (let's assume a commit in my git repo). The following process is meant to run indefinitely and when there is a change it will trigger the multiprocess in the main question:
def to_do():
# Run the tests
wrapper_threads.main()
def git_pull_change(path_to_repo):
repo = Repo(path)
current = repo.head.commit
repo.remotes.origin.pull()
if current == repo.head.commit:
print("Repo not changed. Sleep mode activated.")
return False
else:
print("Repo changed. Start running the tests!")
return True
def main():
process = None
while True:
status = git_pull_change(git_path)
if status:
repo = Repo(git_path)
repo.remotes.origin.pull()
process = multiprocessing.Process(target=to_do)
process.start()
if error_flag.value:
print('Error! breaking the process!!!!!!!!!!!!!!!!!!!!!!!')
os.system('pkill -U user XXX')
break
Now I want to propagate that error_flag from the child process to this process and stop process XXX. The problem is that I don't know how to bring that error_flag to this (grand)parent process.

Why does my multiprocess queue not appear to be thread safe?

I am building a watchdog timer that runs another Python program, and if it fails to find a check-in from any of the threads, shuts down the whole program. This is so it will, eventually, be able to take control of needed communication ports. The code for the timer is as follows:
from multiprocessing import Process, Queue
from time import sleep
from copy import deepcopy
PATH_TO_FILE = r'.\test_program.py'
WATCHDOG_TIMEOUT = 2
class Watchdog:
def __init__(self, filepath, timeout):
self.filepath = filepath
self.timeout = timeout
self.threadIdQ = Queue()
self.knownThreads = {}
def start(self):
threadIdQ = self.threadIdQ
process = Process(target = self._executeFile)
process.start()
try:
while True:
unaccountedThreads = deepcopy(self.knownThreads)
# Empty queue since last wake. Add new thread IDs to knownThreads, and account for all known thread IDs
# in queue
while not threadIdQ.empty():
threadId = threadIdQ.get()
if threadId in self.knownThreads:
unaccountedThreads.pop(threadId, None)
else:
print('New threadId < {} > discovered'.format(threadId))
self.knownThreads[threadId] = False
# If there is a known thread that is unaccounted for, then it has either hung or crashed.
# Shut everything down.
if len(unaccountedThreads) > 0:
print('The following threads are unaccounted for:\n')
for threadId in unaccountedThreads:
print(threadId)
print('\nShutting down!!!')
break
else:
print('No unaccounted threads...')
sleep(self.timeout)
# Account for any exceptions thrown in the watchdog timer itself
except:
process.terminate()
raise
process.terminate()
def _executeFile(self):
with open(self.filepath, 'r') as f:
exec(f.read(), {'wdQueue' : self.threadIdQ})
if __name__ == '__main__':
wd = Watchdog(PATH_TO_FILE, WATCHDOG_TIMEOUT)
wd.start()
I also have a small program to test the watchdog functionality
from time import sleep
from threading import Thread
from queue import SimpleQueue
Q_TO_Q_DELAY = 0.013
class QToQ:
def __init__(self, processQueue, threadQueue):
self.processQueue = processQueue
self.threadQueue = threadQueue
Thread(name='queueToQueue', target=self._run).start()
def _run(self):
pQ = self.processQueue
tQ = self.threadQueue
while True:
while not tQ.empty():
sleep(Q_TO_Q_DELAY)
pQ.put(tQ.get())
def fastThread(q):
while True:
print('Fast thread, checking in!')
q.put('fastID')
sleep(0.5)
def slowThread(q):
while True:
print('Slow thread, checking in...')
q.put('slowID')
sleep(1.5)
def hangThread(q):
print('Hanging thread, checked in')
q.put('hangID')
while True:
pass
print('Hello! I am a program that spawns threads!\n\n')
threadQ = SimpleQueue()
Thread(name='fastThread', target=fastThread, args=(threadQ,)).start()
Thread(name='slowThread', target=slowThread, args=(threadQ,)).start()
Thread(name='hangThread', target=hangThread, args=(threadQ,)).start()
QToQ(wdQueue, threadQ)
As you can see, I need to have the threads put into a queue.Queue, while a separate object slowly feeds the output of the queue.Queue into the multiprocessing queue. If instead I have the threads put directly into the multiprocessing queue, or do not have the QToQ object sleep in between puts, the multiprocessing queue will lock up, and will appear to always be empty on the watchdog side.
Now, as the multiprocessing queue is supposed to be thread and process safe, I can only assume I have messed something up in the implementation. My solution seems to work, but also feels hacky enough that I feel I should fix it.
I am using Python 3.7.2, if it matters.

I suspect that test_program.py exits.
I changed the last few lines to this:
tq = threadQ
# tq = wdQueue # option to send messages direct to WD
t1 = Thread(name='fastThread', target=fastThread, args=(tq,))
t2 = Thread(name='slowThread', target=slowThread, args=(tq,))
t3 = Thread(name='hangThread', target=hangThread, args=(tq,))
t1.start()
t2.start()
t3.start()
QToQ(wdQueue, threadQ)
print('Joining with threads...')
t1.join()
t2.join()
t3.join()
print('test_program exit')
The calls to join() means that the test program never exits all by itself since none of the threads ever exit.
So, as is, t3 hangs and the watchdog program detects this and detects the unaccounted for thread and stops the test program.
If t3 is removed from the above program, then the other two threads are well behaved and the watchdog program allows the test program to continue indefinitely.

How to handle abnormal child process termination?

I'm using python 3.7 and following this documentation. I want to have a process, which should spawn a child process, wait for it to finish a task, and get some info back. I use the following code:
if __name__ == '__main__':
q = Queue()
p = Process(target=some_func, args=(q,))
p.start()
print q.get()
p.join()
When the child process finishes correctly there is no problem, and it works great, but the problem starts when my child process is terminated before it finished.
In this case, my application is hanging on wait.
Giving a timeout to q.get() and p.join() not completely solves the issue, because I want to know immediately that the child process died and not to wait to the timeout.
Another problem is that timeout on q.get() yields an exception, which I prefer to avoid.
Can someone suggest me a more elegant way to overcome those issues?

Queue & Signal
One possibility would be registering a signal handler and use it to pass a sentinel value.
On Unix you could handle SIGCHLD in the parent, but that's not an option in your case. According to the signal module:
On Windows, signal() can only be called with SIGABRT, SIGFPE, SIGILL, SIGINT, SIGSEGV, SIGTERM, or SIGBREAK.
Not sure if killing it through Task-Manager will translate into SIGTERM but you can give it a try.
For handling SIGTERM you would need to register the signal handler in the child.
import os
import sys
import time
import signal
from functools import partial
from multiprocessing import Process, Queue
SENTINEL = None
def _sigterm_handler(signum, frame, queue):
print("received SIGTERM")
queue.put(SENTINEL)
sys.exit()
def register_sigterm(queue):
global _sigterm_handler
_sigterm_handler = partial(_sigterm_handler, queue=queue)
signal.signal(signal.SIGTERM, _sigterm_handler)
def some_func(q):
register_sigterm(q)
print(os.getpid())
for i in range(30):
time.sleep(1)
q.put(f'msg_{i}')
if __name__ == '__main__':
q = Queue()
p = Process(target=some_func, args=(q,))
p.start()
for msg in iter(q.get, SENTINEL):
print(msg)
p.join()
Example Output:
12273
msg_0
msg_1
msg_2
msg_3
received SIGTERM
Process finished with exit code 0
Queue & Process.is_alive()
Even if this works with Task-Manager, your use-case sounds like you can't exclude force kills, so I think you're better off with an approach which doesn't rely on signals.
You can check in a loop if your process p.is_alive(), call queue.get() with a timeout specified and handle the Empty exceptions:
import os
import time
from queue import Empty
from multiprocessing import Process, Queue
def some_func(q):
print(os.getpid())
for i in range(30):
time.sleep(1)
q.put(f'msg_{i}')
if __name__ == '__main__':
q = Queue()
p = Process(target=some_func, args=(q,))
p.start()
while p.is_alive():
try:
msg = q.get(timeout=0.1)
except Empty:
pass
else:
print(msg)
p.join()
It would be also possible to avoid an exception, but I wouldn't recommend this because you don't spend your waiting time "on the queue", hence decreasing the responsiveness:
while p.is_alive():
if not q.empty():
msg = q.get_nowait()
print(msg)
time.sleep(0.1)
Pipe & Process.is_alive()
If you intend to utilize one connection per-child, it would however be possible to use a pipe instead of a queue. It's more performant than a queue
(which is mounted on top of a pipe) and you can use multiprocessing.connection.wait (Python 3.3+) to await readiness of multiple objects at once.
multiprocessing.connection.wait(object_list, timeout=None)
Wait till an object in object_list is ready. Returns the list of those objects in object_list which are ready. If timeout is a float then the call blocks for at most that many seconds. If timeout is None then it will block for an unlimited period. A negative timeout is equivalent to a zero timeout.
For both Unix and Windows, an object can appear in object_list if it is a readable Connection object;
a connected and readable socket.socket object; or
the sentinel attribute of a Process object.
A connection or socket object is ready when there is data available to be read from it, or the other end has been closed.
Unix: wait(object_list, timeout) almost equivalent select.select(object_list, [], [], timeout). The difference is that, if select.select() is interrupted by a signal, it can raise OSError with an error number of EINTR, whereas wait() will not.
Windows: An item in object_list must either be an integer handle which is waitable (according to the definition used by the documentation of the Win32 function WaitForMultipleObjects()) or it can be an object with a fileno() method which returns a socket handle or pipe handle. (Note that pipe handles and socket handles are not waitable handles.)
You can use this to await the sentinel attribute of the process and the parental end of the pipe concurrently.
import os
import time
from multiprocessing import Process, Pipe
from multiprocessing.connection import wait
def some_func(conn_write):
print(os.getpid())
for i in range(30):
time.sleep(1)
conn_write.send(f'msg_{i}')
if __name__ == '__main__':
conn_read, conn_write = Pipe(duplex=False)
p = Process(target=some_func, args=(conn_write,))
p.start()
while p.is_alive():
wait([p.sentinel, conn_read]) # block-wait until something gets ready
if conn_read.poll(): # check if something can be received
print(conn_read.recv())
p.join()

Finding the cause of a BrokenProcessPool in python's concurrent.futures

In a nutshell
I get a BrokenProcessPool exception when parallelizing my code with concurrent.futures. No further error is displayed. I want to find the cause of the error and ask for ideas of how to do that.
Full problem
I am using concurrent.futures to parallelize some code.
with ProcessPoolExecutor() as pool:
mapObj = pool.map(myMethod, args)
I end up with (and only with) the following exception:
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore
Unfortunately, the program is complex and the error appears only after the program has run for 30 minutes. Therefore, I cannot provide a nice minimal example.
In order to find the cause of the issue, I wrapped the method that I run in parallel with a try-except-block:
def myMethod(*args):
try:
...
except Exception as e:
print(e)
The problem remained the same and the except block was never entered. I conclude that the exception does not come from my code.
My next step was to write a custom ProcessPoolExecutor class that is a child of the original ProcessPoolExecutor and allows me to replace some methods with cusomized ones. I copied and pasted the original code of the method _process_worker and added some print statements.
def _process_worker(call_queue, result_queue):
"""Evaluates calls from call_queue and places the results in result_queue.
...
"""
while True:
call_item = call_queue.get(block=True)
if call_item is None:
# Wake up queue management thread
result_queue.put(os.getpid())
return
try:
r = call_item.fn(*call_item.args, **call_item.kwargs)
except BaseException as e:
print("??? Exception ???") # newly added
print(e) # newly added
exc = _ExceptionWithTraceback(e, e.__traceback__)
result_queue.put(_ResultItem(call_item.work_id, exception=exc))
else:
result_queue.put(_ResultItem(call_item.work_id,
result=r))
Again, the except block is never entered. This was to be expected, because I already ensured that my code does not raise an exception (and if everything worked well, the exception should be passed to the main process).
Now I am lacking ideas how I could find the error. The exception is raised here:
def submit(self, fn, *args, **kwargs):
with self._shutdown_lock:
if self._broken:
raise BrokenProcessPool('A child process terminated '
'abruptly, the process pool is not usable anymore')
if self._shutdown_thread:
raise RuntimeError('cannot schedule new futures after shutdown')
f = _base.Future()
w = _WorkItem(f, fn, args, kwargs)
self._pending_work_items[self._queue_count] = w
self._work_ids.put(self._queue_count)
self._queue_count += 1
# Wake up queue management thread
self._result_queue.put(None)
self._start_queue_management_thread()
return f
The process pool is set to be broken here:
def _queue_management_worker(executor_reference,
processes,
pending_work_items,
work_ids_queue,
call_queue,
result_queue):
"""Manages the communication between this process and the worker processes.
...
"""
executor = None
def shutting_down():
return _shutdown or executor is None or executor._shutdown_thread
def shutdown_worker():
...
reader = result_queue._reader
while True:
_add_call_item_to_queue(pending_work_items,
work_ids_queue,
call_queue)
sentinels = [p.sentinel for p in processes.values()]
assert sentinels
ready = wait([reader] + sentinels)
if reader in ready:
result_item = reader.recv()
else: #THIS BLOCK IS ENTERED WHEN THE ERROR OCCURS
# Mark the process pool broken so that submits fail right now.
executor = executor_reference()
if executor is not None:
executor._broken = True
executor._shutdown_thread = True
executor = None
# All futures in flight must be marked failed
for work_id, work_item in pending_work_items.items():
work_item.future.set_exception(
BrokenProcessPool(
"A process in the process pool was "
"terminated abruptly while the future was "
"running or pending."
))
# Delete references to object. See issue16284
del work_item
pending_work_items.clear()
# Terminate remaining workers forcibly: the queues or their
# locks may be in a dirty state and block forever.
for p in processes.values():
p.terminate()
shutdown_worker()
return
...
It is (or seems to be) a fact that a process terminates, but I have no clue why. Are my thoughts correct so far? What are possible causes that make a process terminate without a message? (Is this even possible?) Where could I apply further diagnostics? Which questions should I ask myself in order to come closer to a solution?
I am using python 3.5 on 64bit Linux.

I think I was able to get as far as possible:
I changed the _queue_management_worker method in my changed ProcessPoolExecutor module such that the exit code of the failed process is printed:
def _queue_management_worker(executor_reference,
processes,
pending_work_items,
work_ids_queue,
call_queue,
result_queue):
"""Manages the communication between this process and the worker processes.
...
"""
executor = None
def shutting_down():
return _shutdown or executor is None or executor._shutdown_thread
def shutdown_worker():
...
reader = result_queue._reader
while True:
_add_call_item_to_queue(pending_work_items,
work_ids_queue,
call_queue)
sentinels = [p.sentinel for p in processes.values()]
assert sentinels
ready = wait([reader] + sentinels)
if reader in ready:
result_item = reader.recv()
else:
# BLOCK INSERTED FOR DIAGNOSIS ONLY ---------
vals = list(processes.values())
for s in ready:
j = sentinels.index(s)
print("is_alive()", vals[j].is_alive())
print("exitcode", vals[j].exitcode)
# -------------------------------------------
# Mark the process pool broken so that submits fail right now.
executor = executor_reference()
if executor is not None:
executor._broken = True
executor._shutdown_thread = True
executor = None
# All futures in flight must be marked failed
for work_id, work_item in pending_work_items.items():
work_item.future.set_exception(
BrokenProcessPool(
"A process in the process pool was "
"terminated abruptly while the future was "
"running or pending."
))
# Delete references to object. See issue16284
del work_item
pending_work_items.clear()
# Terminate remaining workers forcibly: the queues or their
# locks may be in a dirty state and block forever.
for p in processes.values():
p.terminate()
shutdown_worker()
return
...
Afterwards I looked up the meaning of the exit code:
from multiprocessing.process import _exitcode_to_name
print(_exitcode_to_name[my_exit_code])
whereby my_exit_code is the exit code that was printed in the block I inserted to the _queue_management_worker. In my case the code was -11, which means that I ran into a segmentation fault. Finding the reason for this issue will be a huge task but goes beyond the scope of this question.

If you are using macOS, there is a known issue with how some versions of macOS uses forking that's not considered fork-safe by Python in some scenarios. The workaround that worked for me is to use no_proxy environment variable.
Edit ~/.bash_profile and include the following (it might be better to specify list of domains or subnets here, instead of *)
no_proxy='*'
Refresh the current context
source ~/.bash_profile
My local versions the issue was seen and worked around are: Python 3.6.0 on
macOS 10.14.1 and 10.13.x
Sources:
Issue 30388
Issue 27126

Python MultiProcessing

I'm using Python Python Multiprocessing for a RabbitMQ Consumers.
On Application Start I create 4 WorkerProcesses.
def start_workers(num=4):
for i in xrange(num):
process = WorkerProcess()
process.start()
Below you find my WorkerClass.
The Logic works so far, I create 4 parallel Consumer Processes.
But the Problem is after a Process got killed. I want to create a new Process. The Problem in the Logic below is that the new Process is created as child process from the old one and after a while the memory runs out of space.
Is there any possibility with Python Multiprocessing to start a new process and kill the old one correctly?
class WorkerProcess(multiprocessing.Process):
def ___init__(self):
app.logger.info('%s: Starting new Thread!', self.name)
super(multiprocessing.Process, self).__init__()
def shutdown(self):
process = WorkerProcess()
process.start()
return True
def kill(self):
start_workers(1)
self.terminate()
def run(self):
try:
# Connect to RabbitMQ
credentials = pika.PlainCredentials(app.config.get('RABBIT_USER'), app.config.get('RABBIT_PASS'))
connection = pika.BlockingConnection(
pika.ConnectionParameters(host=app.config.get('RABBITMQ_SERVER'), port=5672, credentials=credentials))
channel = connection.channel()
# Declare the Queue
channel.queue_declare(queue='screenshotlayer',
auto_delete=False,
durable=True)
app.logger.info('%s: Start to consume from RabbitMQ.', self.name)
channel.basic_qos(prefetch_count=1)
channel.basic_consume(callback, queue='screenshotlayer')
channel.start_consuming()
app.logger.info('%s: Thread is going to sleep!', self.name)
# do what channel.start_consuming() does but with stoppping signal
#while self.stop_working.is_set():
# channel.transport.connection.process_data_events()
channel.stop_consuming()
connection.close()
except Exception as e:
self.shutdown()
return 0
Thank You

In the main process, keep track of your subprocesses (in a list) and loop over them with .join(timeout=50) (https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.join).
Then check is he is alive (https://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process.is_alive).
If he is not, replace him with a fresh one.
def start_workers(n):
wks = []
for _ in range(n):
wks.append(WorkerProcess())
wks[-1].start()
while True:
#Remove all terminated process
wks = [p for p in wks if p.is_alive()]
#Start new process
for i in range(n-len(wks)):
wks.append(WorkerProcess())
wks[-1].start()

I would not handle the process pool management myself. Instead, I would use the ProcessPoolExecutor from the concurrent.future module.
No need to inherit the WorkerProcess to inherit the Process class. Just write your actual code in the class and then submit it to a process pool executor. The executor would have a pool of processes always ready to execute your tasks.
This way you can keep things simple and less headache for you.
You can read more about in my blog post here: http://masnun.com/2016/03/29/python-a-quick-introduction-to-the-concurrent-futures-module.html
Example Code:
from concurrent.futures import ProcessPoolExecutor
from time import sleep
def return_after_5_secs(message):
sleep(5)
return message
pool = ProcessPoolExecutor(3)
future = pool.submit(return_after_5_secs, ("hello"))
print(future.done())
sleep(5)
print(future.done())
print("Result: " + future.result())

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

ProcessPoolExecutor, BrokenProcessPool handling - python

Related

Get error flag/message from a queued process in Python multiprocessing

Why does my multiprocess queue not appear to be thread safe?

How to handle abnormal child process termination?

Finding the cause of a BrokenProcessPool in python's concurrent.futures

Python MultiProcessing

Categories

Resources