Subprocesses complete but do not exit when using multiprocessing in Python

I have my demo code shown below. I realize that all subprocesses have finished, but they do not exit. Is there anything wrong with my code? Python version: 3.7.4, operating system: Windows 10.
import multiprocessing as mp

res_queue = mp.Queue()

def runCalculation(i):
    count_list = []
    total_count = i
    for k in range(100000):
        total_count += k
        count_list.append(total_count)
    print('task {} finished calculation, putting results to queue'.format(i))
    for item in count_list:
        res_queue.put(item)
    print('task {} has put all results to queue'.format(i))

def initPool(res_queue_):
    global res_queue
    res_queue = res_queue_

def mainFunc():
    p = mp.Pool(initializer=initPool, initargs=(res_queue,))
    for i in range(20):
        p.apply_async(runCalculation, args=(i,))
    print('Waiting for all subprocesses done...')
    p.close()
    p.join()
    print('All subprocesses done.')

if __name__ == '__main__':
    mainFunc()

Related

python problem with multiprocessing and for

I'd like to check how much of a difference the for loop makes when it is run with multiprocessing. It doesn't look like the for loop inside do_something is executed when I run the code. Please help me figure out which part I got wrong.
The sum result keeps coming out as zero.
import time
import multiprocessing
from sys import stdout

sum = 0

def do_something():
    for i in range(1000):
        global sum
        sum = sum + 1
        progress = 100 * (i + 1) / 1000  # process percentage
        stdout.write("\r ===== %d%% completed =====" % progress)  # process percentage
        stdout.flush()
    stdout.write("\n")

# str=StringVar()

if __name__ == '__main__':
    start = time.perf_counter()
    processes = []
    for _ in range(1):
        p = multiprocessing.Process(target=do_something)  ##
        p.start()
        processes.append(p)
    for process in processes:
        process.join()
    finish = time.perf_counter()
    print(f'{round(finish-start,2)} sec completed')
    print(sum)

# Result:
0.16 sec completed
0
As @tdelaney commented, the subprocess that is created will be updating an instance of sum that "lives" in its own address space, distinct from the address space of the main process that launched it. The usual solution would be to pass do_something a multiprocessing.Queue instance that it can write the sum to and that the main process can then read (which should be done before joining the subprocess).
In the code below, however, I am using a multiprocessing.Pipe, which is what a multiprocessing.Queue is built on. It is not as flexible as a queue in that it only readily supports a single reader and a single writer, but for this application that is all you need, and it is a much better performer. The call to Pipe() returns two connection objects, one for sending objects and the other for receiving objects.
Note that in your code the final print statement needs to be indented.
You should also refrain from naming variables after built-in functions, e.g. sum.
import time
import multiprocessing
from sys import stdout

def do_something(send_conn):
    the_sum = 0
    for i in range(1000):
        the_sum = the_sum + 1
        progress = 100 * (i + 1) / 1000  # process percentage
        stdout.write("\r ===== %d%% completed =====" % progress)  # process percentage
        stdout.flush()
    stdout.write("\n")
    send_conn.send(the_sum)

if __name__ == '__main__':
    start = time.perf_counter()
    read_conn, send_conn = multiprocessing.Pipe(duplex=False)
    p = multiprocessing.Process(target=do_something, args=(send_conn,))
    p.start()
    the_sum = read_conn.recv()
    p.join()
    finish = time.perf_counter()
    print(f'{round(finish-start,2)} sec completed')
    print(the_sum)
Prints:
===== 100% completed =====
0.16 sec completed
1000
Here is the same code using a multiprocessing.Queue:
import time
import multiprocessing
from sys import stdout

def do_something(queue):
    the_sum = 0
    for i in range(1000):
        the_sum = the_sum + 1
        progress = 100 * (i + 1) / 1000  # process percentage
        stdout.write("\r ===== %d%% completed =====" % progress)  # process percentage
        stdout.flush()
    stdout.write("\n")
    queue.put(the_sum)

if __name__ == '__main__':
    start = time.perf_counter()
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=do_something, args=(queue,))
    p.start()
    the_sum = queue.get()
    p.join()
    finish = time.perf_counter()
    print(f'{round(finish-start,2)} sec completed')
    print(the_sum)
Prints:
===== 100% completed =====
0.17 sec completed
1000

How do multiple processes share a common queue?

I want to start 4 processes which put an integer into a queue when a counter is divisible by 100. At the same time another process continuously reads from the queue and prints the value. Please correct my code so it runs... I am getting the error: 'Queue' object is not iterable.
from multiprocessing import Lock, Process, Queue, current_process
import time
import queue

def doFirstjob(process_Queue):
    i = 0
    while True:
        if i % 100 == 0:
            process_Queue.put(i)
        else:
            i += 1

def doSecondjob(process_Queue):
    while(1):
        if not process_Queue.Empty:
            task = process_Queue.get()
            print("task: ", task)
        else:
            time.sleep(0.2)

def main():
    number_of_processes = 4
    process_Queue = Queue()
    processes = []
    process_Queue.put(1)
    q = Process(target=doSecondjob, args=(process_Queue))
    q.start()
    for w in range(number_of_processes):
        p = Process(target=doFirstjob, args=(process_Queue))
        processes.append(p)
        p.start()

if __name__ == '__main__':
    main()
You were getting the error because Process expects a tuple (or list) for args.
Also, instead of Empty it should be the empty() method.
Change the code to the below.
from multiprocessing import Lock, Process, Queue, current_process
import time
import queue

def doFirstjob(process_Queue):
    i = 0
    while True:
        print("foo")
        if i % 100 == 0:
            process_Queue.put(i)
        else:
            i += 1

def doSecondjob(process_Queue):
    while(1):
        print("bar")
        if not process_Queue.empty():
            task = process_Queue.get()
            print("task: ", task)
        else:
            time.sleep(0.2)

def main():
    number_of_processes = 4
    process_Queue = Queue()
    processes = []
    process_Queue.put(1)
    q = Process(target=doSecondjob, args=(process_Queue,))
    q.start()
    for w in range(number_of_processes):
        p = Process(target=doFirstjob, args=(process_Queue,))
        processes.append(p)
        p.start()

if __name__ == '__main__':
    main()

Python multiprocessing with Queue (split loads dynamically)

I am trying to use multiprocessing to process a very large number of files.
I tried to put the list of files into a queue and have 3 workers split the load through a common Queue. However, this does not seem to work. Probably I am misunderstanding how the queue in the multiprocessing package works.
Below is the example source code:
import multiprocessing
from multiprocessing import Queue

def worker(i, qu):
    """worker function"""
    while ~qu.empty():
        val = qu.get()
        print 'Worker:', i, ' start with file:', val
        j = 1
        for k in range(i*10000, (i+1)*10000):  # some time consuming process
            for j in range(i*10000, (i+1)*10000):
                j = j + k
        print 'Worker:', i, ' end with file:', val

if __name__ == '__main__':
    jobs = []
    qu = Queue()
    for j in range(100, 110):  # file numbers are from 100 to 110
        qu.put(j)
    for i in range(3):  # 3 processes
        p = multiprocessing.Process(target=worker, args=(i, qu))
        jobs.append(p)
        p.start()
    p.join()
Thanks for the comments.
I have come to realize that using a Pool is the best solution.
import multiprocessing
import time

def worker(val):
    """worker function"""
    print 'Worker: start with file:', val
    time.sleep(1.1)
    print 'Worker: end with file:', val

if __name__ == '__main__':
    file_list = range(100, 110)
    p = multiprocessing.Pool(2)
    p.map(worker, file_list)
Three issues:
1) you are joining only on the 3rd process
2) why not use multiprocessing.Pool?
3) race condition on qu.get()
For 1 & 3):
import multiprocessing
from multiprocessing import Queue
from Queue import Empty  # the exception raised on an empty queue (Python 2 stdlib module)

def worker(i, qu):
    """worker function"""
    while 1:
        try:
            val = qu.get(timeout=1)  # wait up to 1 second for the next file number
        except Empty:
            break  # Yay, no race condition
        print 'Worker:', i, ' start with file:', val
        j = 1
        for k in range(i*10000, (i+1)*10000):  # some time consuming process
            for j in range(i*10000, (i+1)*10000):
                j = j + k
        print 'Worker:', i, ' end with file:', val

if __name__ == '__main__':
    jobs = []
    qu = Queue()
    for j in range(100, 110):  # file numbers are from 100 to 110
        qu.put(j)
    for i in range(3):  # 3 processes
        p = multiprocessing.Process(target=worker, args=(i, qu))
        jobs.append(p)
        p.start()
    for p in jobs:  # <--- join on all processes ...
        p.join()
2) For how to use the Pool, see:
https://docs.python.org/2/library/multiprocessing.html
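Not part of the original answer, but for illustration, a minimal sketch of the Pool approach along the lines of the solution shown earlier in this question (the pool size of 3 matches the 3 processes above; the worker body is just a placeholder for the real per-file work):

import multiprocessing

def worker(val):
    """Stand-in for the real per-file work."""
    print 'Worker: start with file:', val
    # ... some time consuming process ...
    print 'Worker: end with file:', val

if __name__ == '__main__':
    pool = multiprocessing.Pool(3)      # 3 worker processes
    pool.map(worker, range(100, 110))   # file numbers 100..109
    pool.close()
    pool.join()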
You are joining only the last of your created processes. That means if the first or the second process is still working while the third has finished, your main process carries on without waiting for the remaining processes to finish.
You should join them all in order to wait until they are finished:
for p in jobs:
    p.join()
Another thing: you should consider using qu.get_nowait() in order to get rid of the race condition between qu.empty() and qu.get().
For example:
try:
    while 1:
        message = qu.get_nowait()
        """ do something fancy here """
except Queue.Empty:  # Queue.Empty is the exception from the standard Queue module
    pass
I hope that helps

Python Multiprocessing Pipe "Deadlock"

I'm facing problems with the following example code:
from multiprocessing import Lock, Process, Queue, current_process

def worker(work_queue, done_queue):
    for item in iter(work_queue.get, 'STOP'):
        print("adding ", item, "to done queue")
        #this works: done_queue.put(item*10)
        done_queue.put(item*1000) #this doesnt!
    return True

def main():
    workers = 4
    work_queue = Queue()
    done_queue = Queue()
    processes = []
    for x in range(10):
        work_queue.put("hi"+str(x))
    for w in range(workers):
        p = Process(target=worker, args=(work_queue, done_queue))
        p.start()
        processes.append(p)
    work_queue.put('STOP')
    for p in processes:
        p.join()
    done_queue.put('STOP')
    for item in iter(done_queue.get, 'STOP'):
        print(item)

if __name__ == '__main__':
    main()
When the done queue becomes big enough (a limit of about 64 KB, I think), the whole thing freezes without any further notice.
What is the general approach for such a situation when the queue becomes too big? Is there some way to remove elements on the fly once they are processed? The Python docs recommend removing the p.join(), but in a real application I cannot estimate when the processes have finished. Is there a simple solution for this problem besides looping indefinitely and using .get_nowait()?
This works for me with 3.4.0alpha4, 3.3, 3.2, 3.1 and 2.6. It tracebacks with 2.7 and 3.0. I pylint'd it, BTW.
#!/usr/local/cpython-3.3/bin/python
'''SSCCE for a queue deadlock'''
import sys
import multiprocessing

def worker(workerno, work_queue, done_queue):
    '''Worker function'''
    #reps = 10 # this worked for the OP
    #reps = 1000 # this worked for me
    reps = 10000 # this didn't
    for item in iter(work_queue.get, 'STOP'):
        print("adding", item, "to done queue")
        #this works: done_queue.put(item*10)
        for thing in item * reps:
            #print('workerno: {}, adding thing {}'.format(workerno, thing))
            done_queue.put(thing)
    done_queue.put('STOP')
    print('workerno: {0}, exited loop'.format(workerno))
    return True

def main():
    '''main function'''
    workers = 4
    work_queue = multiprocessing.Queue(maxsize=0)
    done_queue = multiprocessing.Queue(maxsize=0)
    processes = []
    for integer in range(10):
        work_queue.put("hi"+str(integer))
    for workerno in range(workers):
        dummy = workerno
        process = multiprocessing.Process(target=worker, args=(workerno, work_queue, done_queue))
        process.start()
        processes.append(process)
        work_queue.put('STOP')  # one sentinel per worker
    itemno = 0
    stops = 0
    while True:
        item = done_queue.get()
        itemno += 1
        sys.stdout.write('itemno {0}\r'.format(itemno))
        if item == 'STOP':
            stops += 1
            if stops == workers:
                break
    print('exited done_queue empty loop')
    for workerno, process in enumerate(processes):
        print('attempting process.join() of workerno {0}'.format(workerno))
        process.join()
    done_queue.put('STOP')

if __name__ == '__main__':
    main()
HTH
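Not from the original answer, but distilled down, a minimal sketch of the same drain-before-join idea: the main process reads everything off done_queue before calling join(), and each worker signals completion with its own STOP sentinel (one sentinel per worker on work_queue is an assumption here; the original question used a single one):

from multiprocessing import Process, Queue

def worker(work_queue, done_queue):
    for item in iter(work_queue.get, 'STOP'):
        done_queue.put(item * 1000)
    done_queue.put('STOP')  # each worker announces that it is done

def main():
    workers = 4
    work_queue, done_queue = Queue(), Queue()
    for x in range(10):
        work_queue.put("hi" + str(x))
    for _ in range(workers):
        work_queue.put('STOP')  # one sentinel per worker
    processes = [Process(target=worker, args=(work_queue, done_queue))
                 for _ in range(workers)]
    for p in processes:
        p.start()
    stops = 0
    while stops < workers:  # drain the results BEFORE joining
        item = done_queue.get()
        if item == 'STOP':
            stops += 1
    for p in processes:
        p.join()  # safe now that the queue's underlying pipe has been emptied

if __name__ == '__main__':
    main()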

How do you pass a Queue reference to a function managed by pool.map_async()?

I want a long-running process to return its progress over a Queue (or something similar) which I will feed to a progress bar dialog. I also need the result when the process is completed. A test example here fails with a RuntimeError: Queue objects should only be shared between processes through inheritance.
import multiprocessing, time

def task(args):
    count = args[0]
    queue = args[1]
    for i in xrange(count):
        queue.put("%d mississippi" % i)
    return "Done"

def main():
    q = multiprocessing.Queue()
    pool = multiprocessing.Pool()
    result = pool.map_async(task, [(x, q) for x in range(10)])
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
I've been able to get this to work using individual Process objects (where I am allowed to pass a Queue reference), but then I don't have a pool to manage the many processes I want to launch. Any advice on a better pattern for this?
The following code seems to work:
import multiprocessing, time

def task(args):
    count = args[0]
    queue = args[1]
    for i in xrange(count):
        queue.put("%d mississippi" % i)
    return "Done"

def main():
    manager = multiprocessing.Manager()
    q = manager.Queue()
    pool = multiprocessing.Pool()
    result = pool.map_async(task, [(x, q) for x in range(10)])
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
Note that the Queue is obtained from manager.Queue() rather than multiprocessing.Queue(). Thanks Alex for pointing me in this direction.
Making q global works...:
import multiprocessing, time

q = multiprocessing.Queue()

def task(count):
    for i in xrange(count):
        q.put("%d mississippi" % i)
    return "Done"

def main():
    pool = multiprocessing.Pool()
    result = pool.map_async(task, range(10))
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
If you need multiple queues, e.g. to avoid mixing up the progress of the various pool processes, a global list of queues should work (of course, each process will then need to know what index in the list to use, but that's OK to pass as an argument;-).
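For illustration only (not part of the original answer), a sketch of that multi-queue idea: a global list of queues, with each task told which index to report on. The number of queues and the index scheme here are assumptions, and like the global-queue approach above this relies on the pool processes inheriting module-level globals (i.e. the fork start method).

import multiprocessing, time

NUM_QUEUES = 3  # assumption: one progress queue per group of tasks
queues = [multiprocessing.Queue() for _ in range(NUM_QUEUES)]  # global, so inherited by the pool

def task(args):
    count, queue_index = args  # each task is told which queue to use
    for i in xrange(count):
        queues[queue_index].put("%d mississippi" % i)
    return "Done"

def main():
    pool = multiprocessing.Pool()
    # spread the tasks over the queues by index
    result = pool.map_async(task, [(x, x % NUM_QUEUES) for x in range(10)])
    time.sleep(1)
    for index, q in enumerate(queues):
        while not q.empty():
            print index, q.get()
    print result.get()

if __name__ == "__main__":
    main()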
