Multiprocessing Queue - child processes sometimes get stuck and are not reaped - python

First of all, I apologize if the title is a bit weird, but I literally could not think of how to put the problem I am facing into a single line.
So I have the following code:
import time
from multiprocessing import Process, current_process, Manager
from multiprocessing import JoinableQueue as Queue
# from threading import Thread, current_thread
# from queue import Queue

def checker(q):
    count = 0
    while True:
        if not q.empty():
            data = q.get()
            # print(f'{data} fetched by {current_process().name}')
            # print(f'{data} fetched by {current_thread().name}')
            q.task_done()
            count += 1
        else:
            print('Queue is empty now')
            print(current_process().name, '-----', count)
            # print(current_thread().name, '-----', count)
            break  # exit once the queue appears empty
if __name__ == '__main__':
    t = time.time()
    # m = Manager()
    q = Queue()
    # with open("/tmp/c.txt") as ifile:
    #     for line in ifile:
    #         q.put((line.strip()))
    for i in range(1000):
        q.put(i)
    time.sleep(0.1)
    procs = []
    for _ in range(2):
        p = Process(target=checker, args=(q,), daemon=True)
        # p = Thread(target=checker, args=(q,))
        p.start()
        procs.append(p)
    q.join()
    for p in procs:
        p.join()
Sample outputs
1: When the process just hangs
Queue is empty now
Process-2 ----- 501
output hangs at this point
2: When everything works just fine.
Queue is empty now
Process-1 ----- 515
Queue is empty now
Process-2 ----- 485
Process finished with exit code 0
The behavior is intermittent: it happens sometimes but not always.
I have tried using Manager.Queue() as well in place of multiprocessing.Queue(), but with no success; both exhibit the same issue.
I tested this with both multiprocessing and multithreading and I get exactly the same behavior, with one slight difference: with multithreading the behavior occurs much less often than with multiprocessing.
So I think there is something I am missing conceptually or doing wrong, but I am not able to catch it; I have spent way too much time on this and my mind is probably overlooking something very basic.
So any help is appreciated.

I believe you have a race condition in the checker method. You check whether the queue is empty and then dequeue the next task in separate steps. It's usually not a good idea to separate these two kinds of operations without mutual exclusion or locking, because the state of the queue may change between the check and the pop. It may be non-empty, but another process may then dequeue the waiting work before the process which passed the check is able to do so.
However I generally prefer communication over locking whenever possible; it's less error prone and makes one's intentions clearer. In this case, I would send a sentinel value to the worker processes (such as None) to indicate that all work is done. Each worker then just dequeues the next object (which is always thread-safe), and, if the object is None, the sub-process exits.
The example code below is a simplified version of your program, and should work without races:
from multiprocessing import Process, Queue, current_process

def checker(q):
    while True:
        data = q.get()
        if data is None:
            print(f'process {current_process().name} ending')
            return
        else:
            pass  # do work

if __name__ == '__main__':
    q = Queue()
    for i in range(1000):
        q.put(i)
    procs = []
    for _ in range(2):
        q.put(None)  # Sentinel value
        p = Process(target=checker, args=(q,), daemon=True)
        p.start()
        procs.append(p)
    for proc in procs:
        proc.join()
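If you want to keep the q.join() from your original code, so that the main process can wait until every item has actually been processed, the same sentinel approach also works with a JoinableQueue, as long as each worker calls task_done() for every item it gets, including the sentinel. A minimal sketch of that variant (my own illustration, not part of the answer above):

from multiprocessing import Process, JoinableQueue, current_process

def checker(q):
    while True:
        data = q.get()
        if data is None:
            q.task_done()  # the sentinel counts as an item too
            print(f'process {current_process().name} ending')
            return
        # do work with data here
        q.task_done()

if __name__ == '__main__':
    q = JoinableQueue()
    for i in range(1000):
        q.put(i)
    procs = []
    for _ in range(2):
        q.put(None)  # one sentinel per worker
        p = Process(target=checker, args=(q,), daemon=True)
        p.start()
        procs.append(p)
    q.join()  # returns once task_done() has been called for every item
    for p in procs:
        p.join()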

Related


JoinableQueue join() method blocking main thread even after task_done()

In the code below, if I set daemon = True, the consumer quits before reading all queue entries. If the consumer is non-daemonic, the main thread is always blocked, even after task_done() has been called for all the entries.
from multiprocessing import Process, JoinableQueue
import time

def consumer(queue):
    while True:
        final = queue.get()
        print(final)
        queue.task_done()

def producer1(queue):
    for i in "QWERTYUIOPASDFGHJKLZXCVBNM":
        queue.put(i)

if __name__ == "__main__":
    queue = JoinableQueue(maxsize=100)
    p1 = Process(target=consumer, args=((queue),))
    p2 = Process(target=producer1, args=((queue),))
    #p1.daemon = True
    p1.start()
    p2.start()
    print(p1.is_alive())
    print(p2.is_alive())
    for i in range(1, 10):
        queue.put(i)
        time.sleep(0.01)
    queue.join()
Let's see what—I believe—is happening here:
both processes are being started.
the consumer process starts its loop and blocks until a value is received from the queue.
the producer1 process feeds the queue 26 times with a letter while the main process feeds the queue 9 times with a number. The order in which letters or numbers are being fed is not guaranteed—a number could very well show up before a letter.
when both the producer1 and the main processes are done with feeding their data, the queue is being joined. No problem here, the queue can be joined since all the buffered data has been consumed and task_done() has been called after each read.
the consumer process is still running but is blocked until more data to consume shows up.
Looking at your code, I believe that you are confusing the concept of joining processes with the one of joining queues. What you most likely want here is to join processes, you probably don't need a joinable queue at all.
#!/usr/bin/env python3
from multiprocessing import Process, Queue
import time

def consumer(queue):
    for final in iter(queue.get, 'STOP'):
        print(final)

def producer1(queue):
    for i in "QWERTYUIOPASDFGHJKLZXCVBNM":
        queue.put(i)

if __name__ == "__main__":
    queue = Queue(maxsize=100)
    p1 = Process(target=consumer, args=((queue),))
    p2 = Process(target=producer1, args=((queue),))
    p1.start()
    p2.start()
    print(p1.is_alive())
    print(p2.is_alive())
    for i in range(1, 10):
        queue.put(i)
        time.sleep(0.01)
    queue.put('STOP')
    p1.join()
    p2.join()
Also, your producer1 exits on its own after feeding all the letters, but you need a way to tell your consumer process to exit when there will not be any more data for it to process. You can do this by sending a sentinel; here I chose the string 'STOP', but it can be anything.
In fact, this code is not great, since the 'STOP' sentinel could be received before some letters, which would both leave some letters unprocessed and cause a deadlock, because the processes are being joined while the queue still contains data. But this is a different problem; one possible fix is sketched below.
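For completeness, here is one way to avoid that ordering problem (my own sketch, not part of the answer above): make sure every producer has finished putting its data before the sentinel is enqueued, for example by joining producer1 first.

#!/usr/bin/env python3
from multiprocessing import Process, Queue
import time

def consumer(queue):
    # stops as soon as the 'STOP' sentinel is read
    for final in iter(queue.get, 'STOP'):
        print(final)

def producer1(queue):
    for i in "QWERTYUIOPASDFGHJKLZXCVBNM":
        queue.put(i)

if __name__ == "__main__":
    queue = Queue(maxsize=100)
    p1 = Process(target=consumer, args=(queue,))
    p2 = Process(target=producer1, args=(queue,))
    p1.start()
    p2.start()
    for i in range(1, 10):
        queue.put(i)
        time.sleep(0.01)
    p2.join()          # wait until producer1 has put all of its letters
    queue.put('STOP')  # only now is it safe to send the sentinel
    p1.join()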

Simple and safe way to wait for a Python Process to complete when using a Queue

I'm using the Process class to create and manage subprocesses, which may return non-trivial quantities of data. The documentation states that join() is the correct way to wait for a Process to complete (https://docs.python.org/2/library/multiprocessing.html#the-process-class).
However, when using multiprocessing.Queue this can cause a hang after joining the process, as described here: https://bugs.python.org/issue8426 and here https://docs.python.org/2/library/multiprocessing.html#multiprocessing-programming (not a bug).
These docs suggest removing p.join() - but surely this will remove the guarantee that all processes have completed, as Queue.get() only waits for a single item to become available?
How can I wait for completion of all Processes in this case, and ensure I'm collecting output from them all?
A simple example of the hang I'd like to deal with:
from multiprocessing import Process, Queue

class MyClass:
    def __init__(self):
        pass

def example_run(output):
    output.put([MyClass() for i in range(1000)])
    print("Bottom of example_run() - note hangs after this is printed")

if __name__ == '__main__':
    output = Queue()
    processes = [Process(target=example_run, args=(output,)) for x in range(5)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    print("Processes completed")
https://bugs.python.org/issue8426
This means that whenever you use a queue you need to make sure that
all items which have been put on the queue will eventually be removed
before the process is joined. Otherwise you cannot be sure that
processes which have put items on the queue will terminate.
In your example I just added output.get() before calling join() and everything worked fine. We put data in a queue to be used somewhere, so just make sure you consume it:
for p in processes:
    p.start()
print(output.get())
for p in processes:
    p.join()
print("Processes completed")
An inelegant solution is to add
output_final = []
for i in range(5):  # we have 5 processes
    output_final.append(output.get())
before attempting to join any of the processes. This simply tries to get the appropriate number of outputs for the number of processes we've started.
It turns out a much better, more general solution is not to use Process at all; use Pool instead. That way the hassle of starting worker processes and collecting the results is handled for you:
import multiprocessing

class MyClass:
    def __init__(self):
        pass

def example_run(someArbitraryInput):
    foo = [MyClass() for i in range(10000)]
    return foo

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=5)
    output = pool.map(example_run, range(5))
    pool.close(); pool.join()  # make sure the processes are complete and tidy
    print("Processes completed")

Python 3 Multiprocessing queue deadlock when calling join before the queue is empty

I have a question understanding the queue in the multiprocessing module in python 3
This is what they say in the programming guidelines:
Bear in mind that a process that has put items in a queue will wait before
terminating until all the buffered items are fed by the “feeder” thread to
the underlying pipe. (The child process can call the
Queue.cancel_join_thread
method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all
items which have been put on the queue will eventually be removed before the
process is joined. Otherwise you cannot be sure that processes which have
put items on the queue will terminate. Remember also that non-daemonic
processes will be joined automatically.
An example which will deadlock is the following:
from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    p.join()  # this deadlocks
    obj = queue.get()
A fix here would be to swap the last two lines (or simply remove the
p.join() line).
So apparently, queue.get() should not be called after a join().
However there are examples of using queues where get is called after a join like:
import multiprocessing as mp
import random
import string

# define an example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    rand_str = ''.join(random.choice(
                           string.ascii_lowercase
                           + string.ascii_uppercase
                           + string.digits)
                       for i in range(length))
    output.put(rand_str)

if __name__ == "__main__":
    # Define an output queue
    output = mp.Queue()
    # Setup a list of processes that we want to run
    processes = [mp.Process(target=rand_string, args=(5, output))
                 for x in range(2)]
    # Run processes
    for p in processes:
        p.start()
    # Exit the completed processes
    for p in processes:
        p.join()
    # Get process results from the output queue
    results = [output.get() for p in processes]
    print(results)
I've run this program and it works (also posted as a solution to the StackOverFlow question Python 3 - Multiprocessing - Queue.get() does not respond).
Could someone help me understand what the rule for the deadlock is here?
The queue implementation in multiprocessing that allows data to be transferred between processes relies on standard OS pipes.
OS pipes are not infinitely long, so the process which queues data could be blocked in the OS during the put() operation until some other process uses get() to retrieve data from the queue.
For small amounts of data, such as the one in your example, the main process can join() all the spawned subprocesses and then pick up the data. This often works well, but does not scale, and it is not clear when it will break.
But it will certainly break with large amounts of data. The subprocess will be blocked in put() waiting for the main process to remove some data from the queue with get(), but the main process is blocked in join() waiting for the subprocess to finish. This results in a deadlock.
Here is an example where a user had this exact issue. I posted some code in an answer there that helped him solve his problem.
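To make the rule concrete, here is the documentation example quoted in the question with the last two lines swapped, as the guidelines suggest. Draining the queue before joining is exactly what removes the deadlock (this is only an illustration of the fix already described, not new code from the answer):

from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    obj = queue.get()  # drain the queue first, so the child's feeder thread can flush
    p.join()           # now the child can terminate and be joined safely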
Don't call join() on a process object before you have got all messages from the shared queue.
I used the following workaround to allow processes to exit before all of their results have been processed:
import queue  # for the queue.Empty exception

results = []
while True:
    try:
        result = resultQueue.get(False, 0.01)
        results.append(result)
    except queue.Empty:
        pass
    allExited = True
    for t in processes:
        if t.exitcode is None:
            allExited = False
            break
    if allExited & resultQueue.empty():
        break
It can be shortened but I left it longer to be more clear for newbies.
Here resultQueue is the multiprocessing.Queue that was shared with the multiprocessing.Process objects. After this block of code you will have the results list with all the messages from the queue.
The problem is that the input buffer of the queue's underlying pipe may become full, blocking the writer(s) indefinitely until there is enough space to receive the next message. So you have three ways to avoid blocking:
Increase the multiprocessing.connection.BUFFER size (not so good)
Decrease the message size or the number of messages (not so good)
Fetch messages from the queue immediately as they come (good way; a minimal sketch follows below)
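As a rough illustration of the third option (my own sketch, not code from the answer above), the main process can drain the queue while the workers are still running and only join them afterwards:

from multiprocessing import Process, Queue

def worker(worker_id, result_queue):
    # each worker pushes a handful of fairly large results
    for i in range(10):
        result_queue.put((worker_id, 'X' * 100000))

if __name__ == '__main__':
    result_queue = Queue()
    processes = [Process(target=worker, args=(n, result_queue)) for n in range(4)]
    for p in processes:
        p.start()
    # fetch the messages as they come, before joining, so the workers'
    # feeder threads never block on a full pipe
    results = [result_queue.get() for _ in range(4 * 10)]
    for p in processes:
        p.join()
    print(len(results))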

Python multiprocessing with an updating queue and an output queue

How can I write a Python multiprocessing script that uses two queues like these:
one as a working queue that starts with some data and that, depending on conditions of the functions to be parallelized, receives further tasks on the fly,
another that gathers results and is used to write down the result after processing finishes.
I basically need to put some more tasks in the working queue depending on what I found in its initial items. The example I post below is silly (I could transform the item as I like and put it directly in the output Queue), but its mechanics are clear and reflect part of the concept I need to develop.
Here is my attempt:
import multiprocessing as mp

def worker(working_queue, output_queue):
    item = working_queue.get()  # I take an item from the working queue
    if item % 2 == 0:
        output_queue.put(item**2)  # If I like it, I do something with it and conserve the result.
    else:
        working_queue.put(item+1)  # If there is something missing, I do something with it and leave the result in the working queue

if __name__ == '__main__':
    static_input = range(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker, args=(working_q, output_q)) for i in range(mp.cpu_count())]  # I am running as many processes as CPUs my machine has (is this wise?)
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    for result in iter(output_q.get, None):
        print result  # alternatively, I would like to (c)pickle.dump this, but I am not sure if it is possible.
This does not end, nor does it print any result.
At the end of the whole process I would like to ensure that the working queue is empty, and that all the parallel functions have finished writing to the output queue before the latter is iterated to take out the results. Do you have suggestions on how to make it work?
The following code achieves the expected results. It follows the suggestions made by @tawmas.
This code makes it possible to use multiple cores in a situation where the queue that feeds data to the workers can be updated by them during the processing:
import multiprocessing as mp

def worker(working_queue, output_queue):
    while True:
        if working_queue.empty() == True:
            break  # this is the so-called 'poison pill'
        else:
            picked = working_queue.get()
            if picked % 2 == 0:
                output_queue.put(picked)
            else:
                working_queue.put(picked+1)
    return

if __name__ == '__main__':
    static_input = xrange(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    results_bank = []
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker, args=(working_q, output_q)) for i in range(mp.cpu_count())]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    results_bank = []
    while True:
        if output_q.empty() == True:
            break
        results_bank.append(output_q.get_nowait())
    print len(results_bank)  # should equal the size of static_input; in other words, this tells whether all the items placed for processing were actually processed
    results_bank.sort()
    print results_bank
You have a typo in the line that creates the processes. It should be mp.Process, not mp.process. This is what is causing the exception you get.
Also, you are not looping in your workers, so they actually only consume a single item each from the queue and then exit. Without knowing more about the required logic, it's not easy to give specific advice, but you will probably want to enclose the body of your worker function inside a while True loop and add a condition in the body to exit when the work is done.
Please note that, if you do not add a condition to explicitly exit from the loop, your workers will simply stall forever when the queue is empty. You might consider using the so-called poison pill technique to signal the workers they may exit. You will find an example and some useful discussion in the PyMOTW article on Communication Between processes.
As for the number of processes to use, you will need to benchmark a bit to find what works for you, but, in general, one process per core is a good starting point when your workload is CPU bound. If your workload is IO bound, you might have better results with a higher number of workers.
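The poison pill technique is only mentioned in prose above, so here is a minimal sketch of what it could look like (my own illustration, not code from the question or the PyMOTW article), for the simpler case where workers do not put new items back on the working queue:

import multiprocessing as mp

SENTINEL = None  # the poison pill

def worker(working_queue, output_queue):
    while True:
        item = working_queue.get()  # blocks; no racy empty() check needed
        if item is SENTINEL:
            break                   # poison pill received, exit cleanly
        output_queue.put(item ** 2)

if __name__ == '__main__':
    working_q = mp.Queue()
    output_q = mp.Queue()
    n_workers = mp.cpu_count()
    for i in range(100):
        working_q.put(i)
    for _ in range(n_workers):
        working_q.put(SENTINEL)     # one pill per worker
    workers = [mp.Process(target=worker, args=(working_q, output_q))
               for _ in range(n_workers)]
    for p in workers:
        p.start()
    results = [output_q.get() for _ in range(100)]  # drain before joining
    for p in workers:
        p.join()
    print(len(results))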
