CODE:
from multiprocessing import Pool

print('parent')
max_processes = 4

def foo(result):
    print(result)

def main():
    pool = Pool(processes=max_processes)
    while True:
        pool.apply_async(foo, 5)

if __name__ == '__main__':
    main()
'parent' gets printed 5 times, so the initial pool processes were created, but nothing is ever printed by the print(result) statement in foo.
You are passing the arguments incorrectly in your call to apply_async. The arguments need to be in a tuple (or other sequence, maybe), but you're passing 5 as a bare number.
Try:
def main():
    pool = Pool(processes=max_processes)
    while True:
        pool.apply_async(foo, (5,))  # make a 1-tuple for the args!
Try using the pool as a context manager, i.e. add with Pool(processes=max_processes) as pool:
with Pool(processes=max_processes) as pool:
    while True:
        pool.apply_async(foo, (5,))  # the args must still be a tuple, as noted above
    ...
Warning: multiprocessing.pool objects have internal resources that need to be properly managed (like any other resource) by using the pool as a context manager or by calling close() and terminate() manually. Failure to do this can lead to the process hanging on finalization.
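For completeness, here is a minimal sketch of managing the pool explicitly instead of using a with block. It reuses foo and max_processes from the question, and the bounded 10-iteration loop is just an illustrative choice so close()/join() can actually finish:

from multiprocessing import Pool

max_processes = 4

def foo(result):
    print(result)

def main():
    pool = Pool(processes=max_processes)
    try:
        for _ in range(10):              # a bounded loop instead of 'while True'
            pool.apply_async(foo, (5,))  # args passed as a 1-tuple
        pool.close()   # no new tasks can be submitted after close()
        pool.join()    # wait for all submitted tasks to finish
    except Exception:
        pool.terminate()  # stop the workers immediately if something goes wrong
        raise

if __name__ == '__main__':
    main()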
I'm trying to implement a function that takes 2 functions as arguments, runs both, returns the value of the function that returns first and kills the slower function before it finishes its execution.
My problem is that when I try to empty the Queue object I use to collect the return values, I get stuck.
Is there a more 'correct' way to handle this scenario or even an existing module? If not, can anyone explain what I'm doing wrong?
Here is my code (the implementation of the above function is 'run_both()'):
import multiprocessing as mp
from time import sleep

Q = mp.Queue()

def dump_queue(queue):
    result = []
    for i in iter(queue.get, 'STOP'):
        result.append(i)
    return result

def rabbit(x):
    sleep(10)
    Q.put(x)

def turtle(x):
    sleep(30)
    Q.put(x)

def run_both(a, b):
    a.start()
    b.start()
    while a.is_alive() and b.is_alive():
        sleep(1)
    if a.is_alive():
        a.terminate()
    else:
        b.terminate()
    a.join()
    b.join()
    return dump_queue(Q)

p1 = mp.Process(target=rabbit, args=(1,))
p1 = mp.Process(target=turtle, args=(2,))
run_both(p1, p2)
Here's an example to call 2 or more functions with multiprocessing and return the fastest result. There are a few important things to note however.
Running multiprocessing code in IDLE sometimes causes problems. This example works, but I did run into that issue trying to solve this.
Multiprocessing code should start from inside an if __name__ == '__main__' clause, or else it will be run again if the main module is re-imported by another process. Read the multiprocessing doc page for more info.
The result queue is passed directly to each process that uses it. If you instead reference a global name in the module, the code fails on Windows because each process gets its own new instance of the queue. Read more here: Multiprocessing Queue.get() hangs.
I have also added a small feature so you know which process's result was actually used.
import multiprocessing as mp
import time
import random

def task(value):
    # our dummy task is to sleep for a random amount of time and
    # return the given arg value
    time.sleep(random.random())
    return value

def process(q, idx, fn, args):
    # simply call function fn with args, and push its result in the queue with its index
    q.put([fn(*args), idx])

def fastest(calls):
    queue = mp.Queue()
    # we must pass the queue directly to each process that may use it
    # or else on Windows, each process will have its own copy of the queue
    # making it useless
    procs = []
    # create a 'mp.Process' that calls our 'process' for each call and start it
    for idx, call in enumerate(calls):
        fn = call[0]
        args = call[1:]
        p = mp.Process(target=process, args=(queue, idx, fn, args))
        procs.append(p)
        p.start()
    # wait for the queue to have something
    result, idx = queue.get()
    for proc in procs:  # kill all processes that may still be running
        proc.terminate()
        # proc may be using queue, so queue may be corrupted.
        # https://docs.python.org/3.8/library/multiprocessing.html?highlight=queue#multiprocessing.Process.terminate
        # we no longer need queue though so this is fine
    return result, idx

if __name__ == '__main__':
    from datetime import datetime
    start = datetime.now()
    print(start)
    # to be compatible with 'fastest', each call is a list with the first
    # element being callable, followed by args to be passed
    calls = [
        [task, 1],
        [task, 'hello'],
        [task, [1, 2, 3]]
    ]
    val, idx = fastest(calls)
    end = datetime.now()
    print(end)
    print('elapsed time:', end - start)
    print('returned value:', val)
    print('from call at index', idx)
Example output:
2019-12-21 04:01:09.525575
2019-12-21 04:01:10.171891
elapsed time: 0:00:00.646316
returned value: hello
from call at index 1
Apart from the typo on the penultimate line which should read:
p2 = mp.Process(target=turtle, args=(2,)) # not p1
the simplest change you can make to get the program to work is to add:
Q.put('STOP')
to the end of turtle() and rabbit().
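A sketch of what the modified workers would look like, based on the question's code:

def rabbit(x):
    sleep(10)
    Q.put(x)
    Q.put('STOP')   # sentinel so dump_queue() knows when to stop reading

def turtle(x):
    sleep(30)
    Q.put(x)
    Q.put('STOP')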
You also don't really need to keep looping to check whether the processes are alive: by definition, if you read the message queue and receive 'STOP', one of them has finished, so you could replace run_both() with:
def run_both(a, b):
    a.start()
    b.start()
    result = dump_queue(Q)
    a.terminate()
    b.terminate()
    return result
You may also need to think about what happens if both processes put messages in the queue at much the same time: they could get mixed up. Consider using two queues, or joining all of a worker's results into a single message rather than collecting multiple values one by one from queue.get().
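For illustration, a hypothetical sketch of the single-message idea, adapting the question's rabbit() (the 'rabbit' label is just an example):

def rabbit(x):
    sleep(10)
    # pack the worker's name and everything it produced into one message,
    # so results from different workers can never interleave in the queue
    Q.put(('rabbit', [x]))
    Q.put('STOP')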
I know the basic usage of multiprocessing pools, and I use the apply_async() function to avoid blocking. My problem code looks like this:
from multiprocessing import Pool, Queue
import time

q = Queue(maxsize=20)
script = "my_path/my_exec_file"

def initQueue():
    ...

def test_func(queue):
    print 'Coming'
    while True:
        do_sth
        ...

if __name__ == '__main__':
    initQueue()
    pool = Pool(processes=3)
    for i in xrange(11, 20):
        result = pool.apply_async(test_func, (q,))
    pool.close()
    while True:
        if q.empty():
            print 'Queue is empty, quit'
            break
        print 'Main Process Listening'
        time.sleep(2)
The output is always 'Main Process Listening'; I never see the word 'Coming'. The code above has no syntax errors and raises no exceptions. Can anyone help? Thanks!
I am attempting to use a partial function so that pool.map() can target a function that has more than one parameter (in this case a Lock() object).
Here is example code (taken from an answer to a previous question of mine):
import multiprocessing
from functools import partial

def target(lock, iterable_item):
    for item in items:
        # Do cool stuff
        if (... some condition here ...):
            lock.acquire()
            # Write to stdout or logfile, etc.
            lock.release()

def main():
    iterable = [1, 2, 3, 4, 5]
    pool = multiprocessing.Pool()
    l = multiprocessing.Lock()
    func = partial(target, l)
    pool.map(func, iterable)
    pool.close()
    pool.join()
However when I run this code, I get the error:
RuntimeError: Lock objects should only be shared between processes through inheritance.
What am I missing here? How can I share the lock between my subprocesses?
You can't pass normal multiprocessing.Lock objects to Pool methods, because they can't be pickled. There are two ways to get around this. One is to create a Manager() and pass a Manager.Lock():
def main():
    iterable = [1, 2, 3, 4, 5]
    pool = multiprocessing.Pool()
    m = multiprocessing.Manager()
    l = m.Lock()
    func = partial(target, l)
    pool.map(func, iterable)
    pool.close()
    pool.join()
This is a little bit heavyweight, though; using a Manager requires spawning another process to host the Manager server. And all calls to acquire/release the lock have to be sent to that server via IPC.
The other option is to pass the regular multiprocessing.Lock() at Pool creation time, using the initializer kwarg. This will make your lock instance global in all the child workers:
def target(iterable_item):
    for item in items:
        # Do cool stuff
        if (... some condition here ...):
            lock.acquire()
            # Write to stdout or logfile, etc.
            lock.release()

def init(l):
    global lock
    lock = l

def main():
    iterable = [1, 2, 3, 4, 5]
    l = multiprocessing.Lock()
    pool = multiprocessing.Pool(initializer=init, initargs=(l,))
    pool.map(target, iterable)
    pool.close()
    pool.join()
The second solution has the side-effect of no longer requiring partial.
Here's a version (using Barrier instead of Lock, but you get the idea) which also works on Windows (where the missing fork causes additional trouble):
import multiprocessing as mp

def procs(uid_barrier):
    uid, barrier = uid_barrier
    print(uid, 'waiting')
    barrier.wait()
    print(uid, 'past barrier')

def main():
    N_PROCS = 10
    with mp.Manager() as man:
        barrier = man.Barrier(N_PROCS)
        with mp.Pool(N_PROCS) as p:
            p.map(procs, ((uid, barrier) for uid in range(N_PROCS)))

if __name__ == '__main__':
    mp.freeze_support()
    main()
I am trying to understand how to use the multiprocessing module in Python. The code below spawns four processes and outputs the results as they become available. It seems to me that there must be a better way to obtain the results from the Queue; some method that does not rely on counting how many items the Queue contains, but just returns items as they become available and then gracefully exits once the queue is empty. The docs say that the Queue.empty() method is not reliable. Is there a better alternative for how to consume the results from the queue?
import multiprocessing as mp
import time

def multby4_wq(x, queue):
    print "Starting!"
    time.sleep(5.0 / x)
    a = x * 4
    queue.put(a)

if __name__ == '__main__':
    queue1 = mp.Queue()
    for i in range(1, 5):
        p = mp.Process(target=multby4_wq, args=(i, queue1))
        p.start()
    for i in range(1, 5):  # This is what I am referring to as counting again
        print queue1.get()
Instead of using a queue, how about using a Pool?
For example:
import multiprocessing as mp
import time

def multby4_wq(x):
    print "Starting!"
    time.sleep(5.0 / x)
    a = x * 4
    return a

if __name__ == '__main__':
    pool = mp.Pool(4)
    for result in pool.map(multby4_wq, range(1, 5)):
        print result
Pass multiple arguments
Assume you have a function that accepts multiple parameters (add in this example). Make a wrapper function that passes its arguments on to add (add_wrapper).
import multiprocessing as mp
import time

def add(x, y):
    time.sleep(1)
    return x + y

def add_wrapper(args):
    return add(*args)

if __name__ == '__main__':
    pool = mp.Pool(4)
    for result in pool.map(add_wrapper, [(1, 2), (3, 4), (5, 6), (7, 8)]):
        print result
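As a side note not covered in the answer above: on Python 3.3+, Pool.starmap() unpacks the argument tuples for you, so no wrapper is needed. A short sketch in Python 3 syntax:

import multiprocessing as mp
import time

def add(x, y):
    time.sleep(1)
    return x + y

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        # starmap unpacks each tuple into add(x, y)
        for result in pool.starmap(add, [(1, 2), (3, 4), (5, 6), (7, 8)]):
            print(result)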
I want a long-running process to return its progress over a Queue (or something similar) which I will feed to a progress bar dialog. I also need the result when the process is completed. A test example here fails with a RuntimeError: Queue objects should only be shared between processes through inheritance.
import multiprocessing, time

def task(args):
    count = args[0]
    queue = args[1]
    for i in xrange(count):
        queue.put("%d mississippi" % i)
    return "Done"

def main():
    q = multiprocessing.Queue()
    pool = multiprocessing.Pool()
    result = pool.map_async(task, [(x, q) for x in range(10)])
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
I've been able to get this to work using individual Process objects (where I am allowed to pass a Queue reference), but then I don't have a pool to manage the many processes I want to launch. Any advice on a better pattern for this?
The following code seems to work:
import multiprocessing, time

def task(args):
    count = args[0]
    queue = args[1]
    for i in xrange(count):
        queue.put("%d mississippi" % i)
    return "Done"

def main():
    manager = multiprocessing.Manager()
    q = manager.Queue()
    pool = multiprocessing.Pool()
    result = pool.map_async(task, [(x, q) for x in range(10)])
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
Note that the Queue comes from manager.Queue() rather than multiprocessing.Queue(). Thanks Alex for pointing me in this direction.
Making q global works...:
import multiprocessing, time

q = multiprocessing.Queue()

def task(count):
    for i in xrange(count):
        q.put("%d mississippi" % i)
    return "Done"

def main():
    pool = multiprocessing.Pool()
    result = pool.map_async(task, range(10))
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
If you need multiple queues, e.g. to avoid mixing up the progress of the various pool processes, a global list of queues should work (of course, each process will then need to know what index in the list to use, but that's OK to pass as an argument;-).
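A minimal sketch of that idea in Python 3 syntax (the names and per-task counts are just illustrative); like the global-queue answer above, it relies on the workers inheriting the queues, i.e. a fork-based start method:

import multiprocessing, time

NUM_TASKS = 4
# one queue per task, created before the Pool so the worker processes inherit them
queues = [multiprocessing.Queue() for _ in range(NUM_TASKS)]

def task(args):
    idx, count = args
    for i in range(count):
        queues[idx].put("%d mississippi" % i)  # progress goes to this task's own queue
    return "Done"

def main():
    pool = multiprocessing.Pool()
    result = pool.map_async(task, [(idx, idx + 1) for idx in range(NUM_TASKS)])
    time.sleep(1)
    for idx, q in enumerate(queues):
        while not q.empty():
            print(idx, q.get())
    print(result.get())

if __name__ == "__main__":
    main()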