Processes fail to exit() when job count exceeds n

I'm running Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win32.
I spawn 4 processes and give them 2 queues, one for tasks and one for results, then join the task queue at the end. When the task count reaches a certain amount (njobs = 10000, for example), some of the children and the main process won't exit, even though all tasks are done.
Why is this?
The code to illustrate this:
    def worker(job_queue, result_queue):
        import Queue
        while True:
            try:
                j = job_queue.get(False)
            except Queue.Empty:
                exit('done')
            else:
                result_queue.put_nowait(j)
                job_queue.task_done()
    if __name__ == "__main__":
        from multiprocessing import JoinableQueue, Process, cpu_count
        job_queue = JoinableQueue()
        result_queue = JoinableQueue()
        njobs = 10000
        for i in xrange(njobs):
            job_queue.put(i)
        cpus = cpu_count()
        for i in xrange(cpus):
            p = Process(target=worker, args=(job_queue, result_queue))
            p.start()
        job_queue.join()
        print("DONE")
The longer each task takes, the fewer tasks it takes for some (or all) of the processes to hang. Originally, I was doing sequence matching with this, and it usually leaves 3 processes hanging when the queue holds about 500 items.

Apparently, having more than 6570 items in a queue might cause a deadlock (more information in this thread). What you can do is empty result_queue at the end of the main execution:
    while not result_queue.empty():
        result_queue.get(False)
        result_queue.task_done()
    print "Done"
Note that you don't have to call exit in the worker function, return is enough:
    except Queue.Empty:
        print "done"
        return
You might also consider using a Pool:
    from multiprocessing import Pool
    def task(arg):
        """Called by the workers"""
        return arg
    def callback(arg):
        """Called by the main process"""
        pass
    if __name__ == "__main__":
        pool = Pool()
        njobs = 10000
        print "Enqueuing tasks"
        for i in xrange(njobs):
            pool.apply_async(task, (i,), callback=callback)
        print "Closing the pool"
        pool.close()
        print "Joining the pool"
        pool.join()
        print "Done"

This is an implementation limit with pipes or sockets, well described in Issue 8426: multiprocessing.Queue fails to get() very large objects. Note that it also applies to a large number of small objects.
Solution
Either:
- make sure to consume the result queue concurrently and fast enough, or
- call Queue.cancel_join_thread() from the child processes (at the risk of losing data that is still buffered).
Documentation
Bear in mind that a process that has put items in a queue will wait
before terminating until all the buffered items are fed by the
“feeder” thread to the underlying pipe. (The child process can call
the cancel_join_thread() method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that
all items which have been put on the queue will eventually be removed
before the process is joined. Otherwise you cannot be sure that
processes which have put items on the queue will terminate. Remember
also that non-daemonic processes will be joined automatically.
— Multiprocessing - Programming guidelines
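The first option can be sketched as follows in Python 3. This is only an illustration under stated assumptions, not the original poster's code: the names worker and run_jobs, the None sentinel, and the "job * 2" stand-in for real work are all made up here. The point is the ordering: the parent reads exactly as many results as it submitted jobs, and only then joins the children.

```python
import multiprocessing as mp

SENTINEL = None  # illustrative marker meaning "no more jobs" (an assumption)


def worker(job_queue, result_queue):
    """Process jobs until the sentinel arrives, then return normally."""
    for job in iter(job_queue.get, SENTINEL):
        result_queue.put(job * 2)  # stand-in for the real work


def run_jobs(njobs, nworkers):
    """Run njobs jobs on nworkers processes and return the results."""
    job_queue = mp.Queue()
    result_queue = mp.Queue()
    workers = [mp.Process(target=worker, args=(job_queue, result_queue))
               for _ in range(nworkers)]
    for p in workers:
        p.start()
    for i in range(njobs):
        job_queue.put(i)
    for _ in workers:
        job_queue.put(SENTINEL)  # one sentinel per worker
    # Drain the results BEFORE joining: exactly njobs results are expected.
    results = [result_queue.get() for _ in range(njobs)]
    for p in workers:
        p.join()  # the children have nothing left to flush at this point
    return results


if __name__ == "__main__":
    out = run_jobs(njobs=10000, nworkers=4)
    print(len(out))
```

Because every result has been read before join() is called, no child is left waiting on its feeder thread, which is the condition the quoted guideline asks for.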

Related

Multiprocessing does not work and hangs on join on windows 10 [duplicate]

I have a question about understanding the queue in the multiprocessing module in Python 3.
This is what they say in the programming guidelines:
Bear in mind that a process that has put items in a queue will wait before
terminating until all the buffered items are fed by the “feeder” thread to
the underlying pipe. (The child process can call the
Queue.cancel_join_thread
method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all
items which have been put on the queue will eventually be removed before the
process is joined. Otherwise you cannot be sure that processes which have
put items on the queue will terminate. Remember also that non-daemonic
processes will be joined automatically.
An example which will deadlock is the following:
    from multiprocessing import Process, Queue
    def f(q):
        q.put('X' * 1000000)
    if __name__ == '__main__':
        queue = Queue()
        p = Process(target=f, args=(queue,))
        p.start()
        p.join() # this deadlocks
        obj = queue.get()
A fix here would be to swap the last two lines (or simply remove the
p.join() line).
So apparently, queue.get() should not be called after a join().
However there are examples of using queues where get is called after a join like:
    import multiprocessing as mp
    import random
    import string
    # define an example function
    def rand_string(length, output):
        """ Generates a random string of numbers, lower- and uppercase chars. """
        rand_str = ''.join(random.choice(
                            string.ascii_lowercase
                            + string.ascii_uppercase
                            + string.digits)
                       for i in range(length))
        output.put(rand_str)
    if __name__ == "__main__":
        # Define an output queue
        output = mp.Queue()
        # Setup a list of processes that we want to run
        processes = [mp.Process(target=rand_string, args=(5, output))
                     for x in range(2)]
        # Run processes
        for p in processes:
            p.start()
        # Exit the completed processes
        for p in processes:
            p.join()
        # Get process results from the output queue
        results = [output.get() for p in processes]
        print(results)
I've run this program and it works (it was also posted as a solution to the Stack Overflow question Python 3 - Multiprocessing - Queue.get() does not respond).
Could someone help me understand what the rule for the deadlock is here?

The queue implementation in multiprocessing that allows data to be transferred between processes relies on standard OS pipes.
OS pipes are not infinitely long, so the process which queues data could be blocked in the OS during the put() operation until some other process uses get() to retrieve data from the queue.
For small amounts of data, such as the one in your example, the main process can join() all the spawned subprocesses and then pick up the data. This often works well, but does not scale, and it is not clear when it will break.
But it will certainly break with large amounts of data. The subprocess will be blocked in put() waiting for the main process to remove some data from the queue with get(), but the main process is blocked in join() waiting for the subprocess to finish. This results in a deadlock.
Here is an example where a user had this exact issue. I posted some code in an answer there that helped him solve his problem.
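The finite size of an OS pipe can be observed directly. The following is a rough sketch, assuming a POSIX system (os.set_blocking may not work on pipes in older Windows builds of Python); the helper name pipe_capacity is invented for this illustration. It writes to a pipe that nobody reads until the operating system refuses to accept more data.

```python
import os


def pipe_capacity(chunk=1024):
    """Return how many bytes fit in an OS pipe before a write would block."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # raise instead of blocking when full
    written = 0
    try:
        while True:
            written += os.write(write_fd, b"x" * chunk)
    except BlockingIOError:
        pass  # the pipe is full: a blocking writer would stall right here
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return written


if __name__ == "__main__":
    print("pipe filled after", pipe_capacity(), "bytes")
```

If the figure reported on your system is smaller than the total pickled size of what the children put on the queue, calling join() before get() can hang in the way described above; small payloads such as two 5-character strings fit comfortably, which is why the second example appears to work.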

Don't call join() on a process object before you have received all messages from the shared queue.
I used the following workaround to let the processes exit before all of their results have been processed:
    import queue  # for queue.Empty
    results = []
    while True:
        try:
            # block for at most 10 ms; get(False, 0.01) would ignore the timeout
            result = resultQueue.get(True, 0.01)
            results.append(result)
        except queue.Empty:
            pass
        allExited = True
        for t in processes:
            if t.exitcode is None:
                allExited = False
                break
        if allExited and resultQueue.empty():
            break
It can be shortened, but I left it longer to make it clearer for newcomers.
Here resultQueue is the multiprocessing.Queue that was shared with the multiprocessing.Process objects. After this block of code, the results list holds all the messages from the queue.
The problem is that the input buffer of the queue's pipe may become full, which blocks the writer(s) indefinitely until there is enough space for the next message. So you have three ways to avoid blocking:
- Increase multiprocessing.connection.BUFSIZE (not so good)
- Decrease the message size or the number of messages (not so good)
- Fetch messages from the queue as soon as they arrive (the good way)

JoinableQueue join() method blocking main thread even after task_done()

In the code below, if I set daemon = True, the consumer quits before reading all queue entries. If the consumer is non-daemon, the main thread stays blocked even after task_done() has been called for all the entries.
    from multiprocessing import Process, JoinableQueue
    import time
    def consumer(queue):
        while True:
            final = queue.get()
            print (final)
            queue.task_done()
    def producer1(queue):
        for i in "QWERTYUIOPASDFGHJKLZXCVBNM":
            queue.put(i)
    if __name__ == "__main__":
        queue = JoinableQueue(maxsize=100)
        p1 = Process(target=consumer, args=((queue),))
        p2 = Process(target=producer1, args=((queue),))
        #p1.daemon = True
        p1.start()
        p2.start()
        print(p1.is_alive())
        print (p2.is_alive())
        for i in range(1, 10):
            queue.put(i)
            time.sleep(0.01)
        queue.join()

Let's see what, I believe, is happening here:
1. Both processes are started.
2. The consumer process starts its loop and blocks until a value is received from the queue.
3. The producer1 process feeds the queue 26 times with a letter while the main process feeds the queue 9 times with a number. The order in which letters or numbers are fed is not guaranteed; a number could very well show up before a letter.
4. When both producer1 and the main process are done feeding their data, the queue is joined. No problem here: the queue can be joined since all the buffered data has been consumed and task_done() has been called after each read.
5. The consumer process is still running but is blocked until more data to consume shows up. Because it is non-daemonic, the main process waits for it at exit, so the program never finishes.
Looking at your code, I believe that you are confusing the concept of joining processes with the one of joining queues. What you most likely want here is to join processes, you probably don't need a joinable queue at all.
    #!/usr/bin/env python3
    from multiprocessing import Process, Queue
    import time
    def consumer(queue):
        for final in iter(queue.get, 'STOP'):
            print(final)
    def producer1(queue):
        for i in "QWERTYUIOPASDFGHJKLZXCVBNM":
            queue.put(i)
    if __name__ == "__main__":
        queue = Queue(maxsize=100)
        p1 = Process(target=consumer, args=((queue),))
        p2 = Process(target=producer1, args=((queue),))
        p1.start()
        p2.start()
        print(p1.is_alive())
        print(p2.is_alive())
        for i in range(1, 10):
            queue.put(i)
            time.sleep(0.01)
        queue.put('STOP')
        p1.join()
        p2.join()
Also, your producer1 exits on its own after feeding all the letters, but you need a way to tell your consumer process to exit when there won't be any more data for it to process. You can do this by sending a sentinel; here I chose the string 'STOP', but it can be anything.
In fact, this code is not great, since the 'STOP' sentinel could be received before some of the letters. That would leave some letters unprocessed and could also cause a deadlock, because the processes are being joined while the queue still contains data. But this is a different problem.
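One way to close that gap, sketched here as an assumption rather than as part of the original answer, is to join the producer before sending the sentinel. Once producer1 has exited, all of its letters have been handed to the pipe, so a 'STOP' sent afterwards cannot overtake them. The consumer returns a count in this sketch only so that its behaviour is easy to check.

```python
from multiprocessing import Process, Queue

STOP = 'STOP'  # sentinel; any value the producers never send would do


def consumer(queue):
    """Consume until the sentinel arrives; return how many items were seen."""
    count = 0
    for item in iter(queue.get, STOP):
        print(item)
        count += 1
    return count


def producer1(queue):
    for letter in "QWERTYUIOPASDFGHJKLZXCVBNM":
        queue.put(letter)


if __name__ == "__main__":
    queue = Queue(maxsize=100)
    p1 = Process(target=consumer, args=(queue,))
    p2 = Process(target=producer1, args=(queue,))
    p1.start()
    p2.start()
    for i in range(1, 10):
        queue.put(i)
    p2.join()        # the producer has exited, so its letters are all queued
    queue.put(STOP)  # the sentinel is therefore the last item on the queue
    p1.join()
```

Joining the producer first is safe here because the consumer keeps draining the queue in the meantime; with several consumers, one sentinel per consumer would be needed.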

python multi-processing zombie processes

I have a simple implementation using Python's multiprocessing module:
    if __name__ == '__main__':
        jobs = []
        while True:
            for i in range(40):
                # fetch one by one from redis queue
                #item = item from redis queue
                p = Process(name='worker '+str(i), target=worker, args=(item,))
                # if p is not running, start p
                if not p.is_alive():
                    jobs.append(p)
                    p.start()
            for j in jobs:
                j.join()
                jobs.remove(j)
    def worker(url_data):
        """worker function"""
        print url_data['link']
What I expect this code to do:
- run in an infinite loop, waiting on the Redis queue;
- if the Redis queue is not empty, fetch an item;
- create 40 multiprocessing.Process instances, no more, no less;
- when a process has finished, start a new one, so that about 40 processes are running at all times.
I read that, to avoid zombie processes, children should be bound (joined) to the parent, which is what I expected to achieve in the second loop. The issue is that on launch it spawns 40 processes, the workers finish and stay in a zombie state until every currently spawned process has finished, and then the same pattern repeats in the next iteration of "while True".
So my question is:
How can I avoid zombie processes and spawn a new process as soon as 1 of the 40 has finished?

For a task like the one you describe, it is usually better to use a different approach based on Pool.
You can have the main process fetch the data and let the workers deal with it.
Here is an example of Pool from the Python docs:
    from multiprocessing import Pool
    def f(x):
        return x*x
    if __name__ == '__main__':
        pool = Pool(processes=4)            # start 4 worker processes
        result = pool.apply_async(f, [10])  # evaluate "f(10)" asynchronously
        print result.get(timeout=1)         # prints "100" unless your computer is *very* slow
        print pool.map(f, range(10))        # prints "[0, 1, 4,..., 81]"
I also suggest using imap instead of map, as it seems your task can be asynchronous.
Roughly, your code will be (pseudocode; the Redis fetch is left unspecified):
    p = Pool(40)
    while True:
        items = items from redis queue
        p.imap_unordered(worker, items)  # unordered version is faster
    def worker(url_data):
        """worker function"""
        print url_data['link']
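A runnable Python 3 variant of that outline might look like the sketch below. The Redis fetch is not shown in the original, so fetch_batch is a purely hypothetical placeholder that returns made-up items, and the pool size of 4 is chosen only to keep the demonstration small. The pool reuses a fixed set of worker processes, so no new process is created per item.

```python
from multiprocessing import Pool


def worker(url_data):
    """Stand-in for the real work; returns the link it handled."""
    return url_data['link']


def fetch_batch(n):
    """Hypothetical placeholder for reading n items from the Redis queue."""
    return [{'link': 'http://example.com/%d' % i} for i in range(n)]


if __name__ == '__main__':
    with Pool(processes=4) as pool:
        # Results are yielded as workers finish, not in submission order.
        for link in pool.imap_unordered(worker, fetch_batch(10)):
            print(link)
```

In a long-running service the batch fetch would sit inside a loop, and the pool would be created once outside it.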

How to list Processes started by multiprocessing Pool?

While attempting to store multiprocessing's process instances in the multiprocessing list variable 'poolList', I am getting the following exception:
SimpleQueue objects should only be shared between processes through inheritance
The reason I would like to store the Process instances in a variable is to be able to terminate all or just some of them later (if, for example, a process freezes). If storing a Process in a variable is not an option, I would like to know how to get or list all the processes started by a multiprocessing Pool. That would be very similar to what the .current_process() method does, except that .current_process() returns only a single process, while I need all the processes started or all the processes currently running.
Two questions:
1. Is it even possible to store an instance of the Process (as returned by mp.current_process())?
2. Currently I am only able to get a single process from inside the function that the process is running (from inside myFunct(), using the .current_process() method). Instead, I would like to list all the processes currently running under multiprocessing. How can I achieve that?
    import multiprocessing as mp
    poolList=mp.Manager().list()
    def myFunct(arg):
        print 'myFunct(): current process:', mp.current_process()
        try: poolList.append(mp.current_process())
        except Exception, e: print e
        for i in range(110):
            for n in range(500000):
                pass
            poolDict[arg]=i
        print 'myFunct(): completed', arg, poolDict
    from multiprocessing import Pool
    pool = Pool(processes=2)
    myArgsList=['arg1','arg2','arg3']
    pool=Pool(processes=2)
    pool.map_async(myFunct, myArgsList)
    pool.close()
    pool.join()

To list the processes started by a Pool() instance (which is what you mean, if I understand you correctly), there is the pool._pool list, which contains the process instances.
However, it is not part of the documented interface and hence really should not be relied upon.
But it seems unlikely that it would change just like that. Would they stop keeping an internal list of the pool's processes, or stop calling it _pool?
It also annoys me that there isn't at least a method for getting the processes.
And handling a breakage caused by a name change should not be that difficult.
Still, use at your own risk:
    from multiprocessing import pool
    # Have to run in main
    if __name__ == '__main__':
        # Create 3 worker processes
        _my_pool = pool.Pool(3)
        # Loop, terminate, and remove from the process list
        # Use a copy [:] of the list to remove items correctly
        for _curr_process in _my_pool._pool[:]:
            print("Terminating process "+ str(_curr_process.pid))
            _curr_process.terminate()
            _my_pool._pool.remove(_curr_process)
        # If you call _repopulate_pool, the pool will again contain 3 worker processes.
        _my_pool._repopulate_pool()
        for _curr_process in _my_pool._pool[:]:
            print("After repopulation "+ str(_curr_process.pid))
The example creates a pool and manually terminates all of its processes.
It is important to remember to remove each process you terminate from the pool yourself if you want Pool() to continue working as usual.
_my_pool._repopulate_pool() increases the number of worker processes to 3 again. It is not needed to answer the question, but it gives a little behind-the-scenes insight.

Yes, you can get all active child processes and act on them based on the process name, e.g.
    multiprocessing.Process(target=foo, name="refresh-reports")
and then
    for p in multiprocessing.active_children():
        if p.name == "refresh-reports":
            p.terminate()

You're creating a managed List object, but then letting the associated Manager object expire.
Process objects aren't shareable because they aren't pickle-able; that is, they aren't simple.
Oddly, the multiprocessing module doesn't have the equivalent of threading.enumerate(); that is, you can't list all outstanding processes. As a workaround, I just store procs in a list. I never terminate() a process, but do sys.exit(0) in the parent. It's rough, because the workers will leave things in an inconsistent state, but it's okay for smaller programs.
To kill a frozen worker, I suggest: 1) worker receives "heartbeat" jobs in a queue every now and then, 2) if parent notices worker A hasn't responded to a heartbeat in a certain amount of time, then p.terminate(). Consider restating the problem in another SO question, as it's interesting.
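The heartbeat idea can be sketched roughly as follows in Python 3. This is one possible reading of the suggestion, not the answerer's code: here the workers report heartbeats on a queue instead of answering heartbeat jobs, and the names stale_workers and worker, the timings, and the simulated freeze are all assumptions made for the illustration.

```python
import multiprocessing as mp
import queue
import time


def stale_workers(last_seen, now, timeout):
    """Return the names whose most recent heartbeat is older than timeout."""
    return sorted(name for name, seen in last_seen.items()
                  if now - seen > timeout)


def worker(name, heartbeats, beats):
    """Send a few heartbeats, then hang to simulate a frozen worker."""
    for _ in range(beats):
        heartbeats.put(name)
        time.sleep(0.1)
    time.sleep(3600)  # simulated freeze


if __name__ == "__main__":
    heartbeats = mp.Queue()
    procs = {name: mp.Process(target=worker, name=name,
                              args=(name, heartbeats, 3))
             for name in ("worker-a", "worker-b")}
    last_seen = {}
    for name, p in procs.items():
        last_seen[name] = time.monotonic()
        p.start()
    while procs:
        try:
            name = heartbeats.get(timeout=0.2)
            if name in procs:  # ignore late beats from terminated workers
                last_seen[name] = time.monotonic()
        except queue.Empty:
            pass
        for name in stale_workers(last_seen, time.monotonic(), timeout=1.0):
            procs[name].terminate()  # the parent gives up on a silent worker
            procs[name].join()
            del procs[name], last_seen[name]
            print("terminated", name)
```

Note that terminating a process while it is using a shared queue can leave that queue unusable, so a real implementation would want to be careful about when terminate() is called.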
To be honest the map stuff is much easier than using a Manager.
Here's a Manager example I've used. A worker adds stuff to a shared list. Another worker occasionally wakes up, processes everything on the list, then goes back to sleep. The code also has verbose logs, which are essential for ease in debugging.
    # producer adds to fixed-sized list; scanner uses them
    import logging, multiprocessing, sys, time
    def producer(objlist):
        '''
        add an item to list every sec; ensure fixed size list
        '''
        logger = multiprocessing.get_logger()
        logger.info('start')
        while True:
            try:
                time.sleep(1)
            except KeyboardInterrupt:
                return
            msg = 'ding: {:04d}'.format(int(time.time()) % 10000)
            logger.info('put: %s', msg)
            del objlist[0]
            objlist.append( msg )
    def scanner(objlist):
        '''
        every now and then, run calculation on objlist
        '''
        logger = multiprocessing.get_logger()
        logger.info('start')
        while True:
            try:
                time.sleep(5)
            except KeyboardInterrupt:
                return
            logger.info('items: %s', list(objlist))
    def main():
        logger = multiprocessing.log_to_stderr(
            level=logging.INFO
        )
        logger.info('setup')
        # create fixed-length list, shared between producer & consumer
        manager = multiprocessing.Manager()
        my_objlist = manager.list( # pylint: disable=E1101
            [None] * 10
        )
        multiprocessing.Process(
            target=producer,
            args=(my_objlist,),
            name='producer',
        ).start()
        multiprocessing.Process(
            target=scanner,
            args=(my_objlist,),
            name='scanner',
        ).start()
        logger.info('running forever')
        try:
            manager.join() # wait until both workers die
        except KeyboardInterrupt:
            pass
        logger.info('done')
    if __name__=='__main__':
        main()
