python multiprocessing - process hangs on join for large queue

I'm running python 2.7.3 and I noticed the following strange behavior. Consider this minimal example:
from multiprocessing import Process, Queue

def foo(qin, qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        qout.put({'bar': bar})

if __name__ == '__main__':
    import sys

    qin = Queue()
    qout = Queue()
    worker = Process(target=foo, args=(qin, qout))
    worker.start()

    for i in range(100000):
        print i
        sys.stdout.flush()
        qin.put(i**2)

    qin.put(None)
    worker.join()
When I loop over 10,000 or more, my script hangs on worker.join(). It works fine when the loop only goes to 1,000.
Any ideas?

The qout queue in the subprocess gets full. The data you put in it from foo() doesn't fit in the buffer of the OS's pipes used internally, so the subprocess blocks trying to fit more data. But the parent process is not reading this data: it is simply blocked too, waiting for the subprocess to finish. This is a typical deadlock.

There must be a limit on the size of queues. Consider the following modification:
from multiprocessing import Process, Queue

def foo(qin, qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        #qout.put({'bar': bar})

if __name__ == '__main__':
    import sys

    qin = Queue()
    qout = Queue()   ## POSITION 1

    for i in range(100):
        #qout = Queue()   ## POSITION 2
        worker = Process(target=foo, args=(qin, qout))
        worker.start()
        for j in range(1000):
            x = i*100 + j
            print x
            sys.stdout.flush()
            qin.put(x**2)
        qin.put(None)
        worker.join()

    print 'Done!'
This works as-is (with the qout.put line commented out). If you try to save all 100,000 results, qout becomes too large: if I uncomment the qout.put({'bar': bar}) in foo and leave the definition of qout at POSITION 1, the code hangs. If, however, I move the qout definition to POSITION 2, the script finishes.
So in short, you have to be careful that neither qin nor qout becomes too large. (See also: Multiprocessing Queue maxsize limit is 32767)
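Alternatively, instead of re-creating the queue in batches, the parent can simply drain qout before joining the worker, so the child is never stuck flushing results into a full pipe. A sketch of that fix applied to the question's example; the None sentinel on qout and the final print are my additions, not part of the original post:
from multiprocessing import Process, Queue

def foo(qin, qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        qout.put({'bar': bar})
    qout.put(None)  # sentinel: tells the parent there are no more results

if __name__ == '__main__':
    qin = Queue()
    qout = Queue()
    worker = Process(target=foo, args=(qin, qout))
    worker.start()

    for i in range(100000):
        qin.put(i**2)
    qin.put(None)

    # Drain qout BEFORE joining, so the child can flush its queue and exit
    results = []
    while True:
        item = qout.get()
        if item is None:
            break
        results.append(item)

    worker.join()
    print('got %d results' % len(results))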

I had the same problem on Python 3 when I tried to put strings into a queue with a total size of about 5000 characters.
In my project there was a host process that sets up a queue and starts a subprocess, then joins. After the join, the host process reads from the queue. When the subprocess produces too much data, the host hangs on the join. I fixed this by using the following function to wait for the subprocess in the host process:
from multiprocessing import Process, Queue
from queue import Empty

def yield_from_process(q: Queue, p: Process):
    while p.is_alive():
        p.join(timeout=1)
        while True:
            try:
                yield q.get(block=False)
            except Empty:
                break
I read from the queue as soon as it fills, so it never gets very large.
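A usage sketch, reusing foo, qin and qout from the question above (this wiring is my assumption, not the answerer's):
worker = Process(target=foo, args=(qin, qout))
worker.start()

for i in range(100000):
    qin.put(i**2)
qin.put(None)

# Consume results while waiting, so qout never backs up
results = list(yield_from_process(qout, worker))
worker.join()   # returns immediately; the process has already exited
print(len(results))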

I was trying to .get() an async result after the pool had already closed: an indentation error had put the loop outside of the with block.
I had this:
with multiprocessing.Pool() as pool:
    async_results = list()
    for job in jobs:
        async_results.append(
            pool.apply_async(
                _worker_func,
                (job,),
            )
        )
# wrong: the pool is already closed when .get() is called
for async_result in async_results:
    yield async_result.get()
I needed this:
with multiprocessing.Pool() as pool:
    async_results = list()
    for job in jobs:
        async_results.append(
            pool.apply_async(
                _worker_func,
                (job,),
            )
        )
    # right: .get() is called while the pool is still open
    for async_result in async_results:
        yield async_result.get()
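For reference, a self-contained version of the corrected pattern; the job list and the _square worker below are my own placeholders, not names from the original post:
import multiprocessing

def _square(job):
    return job * job

def run_jobs(jobs):
    with multiprocessing.Pool() as pool:
        async_results = [pool.apply_async(_square, (job,)) for job in jobs]
        # .get() must happen while the pool is still open
        for async_result in async_results:
            yield async_result.get()

if __name__ == '__main__':
    print(list(run_jobs(range(10))))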

Related

multiprocessing, threading gets stuck and printing output gets messed up

I'm running multiple threads in Python. I've tried the threading module and the multiprocessing module. Even though the execution gives the correct result, every time the terminal gets stuck and the printing of the output gets messed up.
Here's a simplified version of the code.
import subprocess
import threading
import argparse
import sys

result = []

def check_thread(args, components, id):
    for i in components:
        cmd = <command to be given to terminal>
        output = subprocess.check_output([cmd], shell=True)
        result.append((id, i, output))

def check(args, components):
    # lock = threading.Lock()
    # lock = threading.Semaphore(value=1)
    thread_list = []
    for id in range(3):
        t = threading.Thread(target=check_thread, args=(args, components, id))
        thread_list.append(t)
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()
    for res in result:
        print(res)
    return res

if __name__ == '__main__':
    parser = argparse.ArgumentParser(....)
    parser.add_argument(.....)
    args = parser.parse_args()
    components = ['comp1', 'comp2']
    while True:
        print('SELECTION MENU\n1)\n2)\n')
        option = raw_input('Enter option')
        if option == '1':
            res = check(args, components)
        if option == '2':
            <do something else>
        else:
            sys.exit(0)
I've tried using the multiprocessing module with Process and Pool, tried passing a lock to check_thread, and tried returning a value from check_thread() and using a queue to take in the values, but every time it's the same result: execution is successful, but the terminal gets stuck and the printed output is garbled.
Is there any fix for this? I'm using Python 2.7 on a Linux terminal.
Here is how the garbled output looks (screenshot omitted).
You should use a queue (multiprocessing.Queue) to collect the results, not a plain list.
import multiprocessing as mp

# Define an output queue
output = mp.Queue()

# Define an example function
def function(params, output):
    """Process params and put the result on the output queue."""
    res = params ** 2  # placeholder for whatever processing you need
    output.put(res)

# Set up a list of processes that we want to run
processes = [mp.Process(target=function, args=(5, output)) for x in range(10)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]
print(results)
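One caveat, tying this back to the question at the top of the page: the example above joins before draining the queue, which is fine for small results like these, but with large results you would want to reverse the order (this reordering is mine, not part of the original answer):
# Get the results first, then join: avoids blocking on a full pipe
results = [output.get() for p in processes]
for p in processes:
    p.join()
print(results)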

Python JoinableQueue and Queue Thread Not Completing

I am using the following code to complete a task using multithreading with Queue and JoinableQueue. Sometimes the script executes perfectly; other times it stalls at the end of the task without ending the worker and will not continue on to the next portion of the script. I am new to working with Queue and JoinableQueue and I need to find out why this stalling happens.
Before this part of the code I run another Queue/JoinableQueue worker to download some data, and it works perfectly fine every time. Do I need to close() anything from the first Queue/JoinableQueue? Is there a way to check whether it stalls and, if so, continue on?
Here is my code:
import multiprocessing
from multiprocessing import Queue
from multiprocessing import JoinableQueue
from threading import Thread

def run_this_definition(hr):
    #do things here
    return()

def worker():
    while True:
        item = jq.get()
        run_this_definition(item)
        jq.task_done()
    return()

q = Queue()
jq = JoinableQueue()

number_of_threads = 8
for i in range(number_of_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

input_list = [0,1,2,3,4]
for item in input_list:
    jq.put(item)

jq.join()
print "finished"
The script never prints "finished" when it stalls, but seems to finish all the tasks and stalls at the end of the 'run_this_definition' on the very last item in the Queue.
My guess is that you are using multiprocessing.JoinableQueue(). Use Queue.Queue() instead for threading; it has .join() and .task_done() methods as well. Furthermore, you should pass your queue as an argument to your threads. See the following example:
import threading
from threading import Thread
from Queue import Queue

def worker(jq):
    while True:
        item = jq.get()
        # Do whatever you have to do.
        print '{}: {}'.format(threading.currentThread().name, item)
        jq.task_done()
    return()

number_of_threads = 4
input_list = [1,2,3,4,5]

jq = Queue()
for i in range(number_of_threads):
    t = Thread(target=worker, args=(jq,))
    t.daemon = True
    t.start()

for item in input_list:
    jq.put(item)

jq.join()
print "finished"
The print output from multiple threads might look messy, but as an example it should be fine.
For the future: Please provide a comprehensive example of your code. Neither your imports, nor number_of_threads, run_this_definition or input_list were defined in your example.
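If you would rather not rely on daemon threads to clean up, a common alternative (my sketch, written for Python 3, not part of the answer) is to push one sentinel per thread so the workers exit on their own:
from queue import Queue
from threading import Thread

SENTINEL = None  # marker value, assumed never to appear in the real input

def worker(jq):
    while True:
        item = jq.get()
        if item is SENTINEL:
            jq.task_done()
            break          # clean exit instead of relying on daemon=True
        # ... do the real work here ...
        jq.task_done()

jq = Queue()
threads = [Thread(target=worker, args=(jq,)) for _ in range(4)]
for t in threads:
    t.start()

for item in [1, 2, 3, 4, 5]:
    jq.put(item)
for _ in threads:
    jq.put(SENTINEL)   # one sentinel per worker

jq.join()              # all items (including sentinels) processed
for t in threads:
    t.join()           # workers have exited their loops
print("finished")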

Why does a Python process with input and output queues not join once it is done?

This simple Python3 program using multiprocessing does not seem to work as expected.
All the input processes share an input queue from which they consume data. They all share an output queue where they write a result once they are fully done. I find that this program hangs at the process join(). Why is that?
#!/usr/bin/env python3
import multiprocessing

def worker_func(in_q, out_q):
    print("A worker has started")
    w_results = {}
    while not in_q.empty():
        v = in_q.get()
        w_results[v] = v
    out_q.put(w_results)
    print("A worker has finished")

def main():
    # Input queue to share among processes
    fpaths = [str(i) for i in range(10000)]
    in_q = multiprocessing.Queue()
    for fpath in fpaths:
        in_q.put(fpath)

    # Create processes and start them
    N_PROC = 2
    out_q = multiprocessing.Queue()
    workers = []
    for _ in range(N_PROC):
        w = multiprocessing.Process(target=worker_func, args=(in_q, out_q))
        w.start()
        workers.append(w)
    print("Done adding workers")

    # Wait for processes to finish
    for w in workers:
        w.join()
    print("Done join of workers")

    # Collate worker results
    out_results = {}
    while not out_q.empty():
        out_results.update(out_q.get())

if __name__ == "__main__":
    main()
I get this result from this program when N_PROC = 2:
$ python3 test.py
Done adding workers
A worker has started
A worker has started
A worker has finished
<---- I do not get "A worker has finished" from second worker
<---- I do not get "Done join of workers"
It does not work even with a single child process N_PROC = 1:
$ python3 test.py
Done adding workers
A worker has started
A worker has finished
<---- I do not get "Done join of workers"
If I try a smaller input queue with say 1000 items, everything works fine.
I am aware of some old StackOverflow questions that say that the Queue has a limit. Why is this not documented in the Python3 docs?
What is an alternative solution I can use? I want to use multi-processing (not threading), to split the input among N processes. Once their shared input queue is empty, I want each process to collect its results (can be a big/complex data structure like dict) and return it back to the parent process. How to do this?
This is a classic deadlock caused by your design. When the workers are terminating, they stall because they have not been able to push all their data into out_q, which deadlocks your program. This has to do with the size of the pipe buffer underlying your queue.
When you are using a multiprocessing.Queue, you should empty it before trying to join the feeder process, to make sure that the process does not stall waiting for all the objects to be put into the queue. So moving your out_q.get() calls before joining the processes should solve your problem. You can use a sentinel pattern to detect the end of the computations:
#!/usr/bin/env python3
import multiprocessing
from multiprocessing.queues import Empty

def worker_func(in_q, out_q):
    print("A worker has started")
    w_results = {}
    while not in_q.empty():
        try:
            v = in_q.get(timeout=1)
            w_results[v] = v
        except Empty:
            pass
    out_q.put(w_results)
    out_q.put(None)
    print("A worker has finished")

def main():
    # Input queue to share among processes
    fpaths = [str(i) for i in range(10000)]
    in_q = multiprocessing.Queue()
    for fpath in fpaths:
        in_q.put(fpath)

    # Create processes and start them
    N_PROC = 2
    out_q = multiprocessing.Queue()
    workers = []
    for _ in range(N_PROC):
        w = multiprocessing.Process(target=worker_func, args=(in_q, out_q))
        w.start()
        workers.append(w)
    print("Done adding workers")

    # Collate worker results
    out_results = {}
    n_proc_end = 0
    while not n_proc_end == N_PROC:
        res = out_q.get()
        if res is None:
            n_proc_end += 1
        else:
            out_results.update(res)

    # Wait for processes to finish
    for w in workers:
        w.join()
    print("Done join of workers")

if __name__ == "__main__":
    main()
Also, note that your code has a race condition in it: in_q can be emptied between the moment you check not in_q.empty() and the get(). You should use a non-blocking (or timed-out) get to make sure you don't deadlock waiting on an empty queue.
Finally, you are trying to implement something that looks like a multiprocessing.Pool, which handles this kind of communication in a more robust way. You can also look at the concurrent.futures API, which is even more robust and, in some sense, better designed.
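As a sketch of that last suggestion (my adaptation, not part of the answer), the same split-the-input-and-return-a-dict-per-worker job can be written with concurrent.futures.ProcessPoolExecutor, which handles shipping the results back to the parent for you:
#!/usr/bin/env python3
from concurrent.futures import ProcessPoolExecutor

def worker_func(chunk):
    # Each worker builds its own (possibly large) dict and returns it;
    # the executor takes care of transporting it back to the parent.
    return {v: v for v in chunk}

def main():
    fpaths = [str(i) for i in range(10000)]
    N_PROC = 2
    # Split the input into N_PROC chunks up front instead of sharing a queue
    chunks = [fpaths[i::N_PROC] for i in range(N_PROC)]

    out_results = {}
    with ProcessPoolExecutor(max_workers=N_PROC) as executor:
        for partial in executor.map(worker_func, chunks):
            out_results.update(partial)
    print(len(out_results))

if __name__ == "__main__":
    main()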

python multiprocessing.Process's join can not end

I'm writing a program that uses multiple processes (CPU-bound) and multiple threads (I/O-bound). (The code below is just a sample, not the real program.)
But when the code reaches join(), the program deadlocks.
My code is posted below:
import requests
import time
from multiprocessing import Process, Queue
from multiprocessing.dummy import Pool

start = time.time()

queue = Queue()
rQueue = Queue()

url = 'http://www.bilibili.com/video/av'
for i in xrange(10):
    queue.put(url+str(i))

def goURLsCrawl(queue, rQueue):
    threadPool = Pool(7)
    while not queue.empty():
        threadPool.apply_async(urlsCrawl, args=(queue.get(), rQueue))
    threadPool.close()
    threadPool.join()
    print 'end'

def urlsCrawl(url, rQueue):
    response = requests.get(url)
    rQueue.put(response)

p = Process(target=goURLsCrawl, args=(queue, rQueue))
p.start()
p.join()  # join() is here

end = time.time()
print 'total time %0.4f' % (end-start,)
Thanks in advance.😊
I finally found the reason. As you can see, I imported Queue from multiprocessing, which is meant for communication between processes, but in my code the threads (from multiprocessing.dummy) were accessing it, so something unexpected must be happening behind the scenes.
To correct it, just import the standard library Queue and use it instead of multiprocessing.Queue.
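Another fix, consistent with the pipe-buffer explanation in the first answer on this page: the parent joins p without ever draining rQueue, so the child cannot flush its queue and exit. If you do want the results back in the parent, drain rQueue before p.join(). A sketch of that variant, written for Python 3; the None sentinel and the small picklable result tuple are my additions:
import requests
from multiprocessing import Process, Queue
from multiprocessing.dummy import Pool

def urlsCrawl(url, rQueue):
    response = requests.get(url)
    rQueue.put((url, response.status_code))  # put something small and picklable

def goURLsCrawl(queue, rQueue):
    threadPool = Pool(7)
    while not queue.empty():
        threadPool.apply_async(urlsCrawl, args=(queue.get(), rQueue))
    threadPool.close()
    threadPool.join()
    rQueue.put(None)  # sentinel: no more results are coming

if __name__ == '__main__':
    queue = Queue()
    rQueue = Queue()
    for i in range(10):
        queue.put('http://www.bilibili.com/video/av' + str(i))

    p = Process(target=goURLsCrawl, args=(queue, rQueue))
    p.start()

    # Drain rQueue before joining, so the child can flush its queue and exit
    results = []
    while True:
        item = rQueue.get()
        if item is None:
            break
        results.append(item)

    p.join()
    print('fetched %d responses' % len(results))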

Multiprocessing and global True/False variable

I'm struggling to get my head around multiprocessing and passing a global True/False variable into my function.
After get_data() finishes I want the analysis() function to start and process the data, while fetch() continues running. How can I make this work? TIA
import multiprocessing

ready = False

def fetch():
    global ready
    get_data()
    ready = True
    return

def analysis():
    analyse_data()

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=fetch)
    p2 = multiprocessing.Process(target=analysis)
    p1.start()
    if ready:
        p2.start()
You should run the two processes and use a shared queue to exchange information between them, such as signaling the completion of an action in one of the processes.
Also, you need to have a join() statement to properly wait for completion of the processes you spawn.
from multiprocessing import Process, Queue
import time

def get_data(q):
    # Do something to get data
    time.sleep(2)
    # Put an event in the queue to signal that get_data has finished
    q.put('message from get_data to analyse_data')

def analyse_data(q):
    # Waiting for get_data to finish...
    msg = q.get()
    print msg  # Will print 'message from get_data to analyse_data'
    # get_data has finished

if __name__ == '__main__':
    # Create a queue for exchanging messages between processes
    q = Queue()
    # Create processes, and send the shared queue to them
    processes = [Process(target=get_data, args=(q,)), Process(target=analyse_data, args=(q,))]
    # Start processes
    for p in processes:
        p.start()
    # Wait until all processes complete
    for p in processes:
        p.join()
Your example won't work for a few reasons:
Processes cannot share memory with each other (you can't change the global in one process and see the change in the other).
Even if you could change the global value, you are checking it too soon, and it most likely won't have changed in time.
Read https://docs.python.org/3/library/ipc.html for more possibilities for inter-process-communications
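If all you need is the True/False signal itself, a multiprocessing.Event can replace the global flag. A sketch using the question's structure; the sleep and print are placeholders for the question's get_data() and analyse_data(), and the fetched data itself would still need to travel through a queue as in the answer above:
import multiprocessing
import time

def get_data():
    time.sleep(2)        # placeholder for the real data fetch

def analyse_data():
    print('analysing')   # placeholder for the real analysis

def fetch(ready):
    get_data()
    ready.set()          # signal that the data is ready

def analysis(ready):
    ready.wait()         # block until fetch() has set the event
    analyse_data()

if __name__ == '__main__':
    ready = multiprocessing.Event()
    p1 = multiprocessing.Process(target=fetch, args=(ready,))
    p2 = multiprocessing.Process(target=analysis, args=(ready,))
    p1.start()
    p2.start()           # can start right away; it waits on the event
    p1.join()
    p2.join()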
