multiprocessing, threading gets stuck and printing output gets messed up - python

I'm running multiple threads in Python. I've tried both the threading module and the multiprocessing module. Even though the execution gives the correct result, every time the terminal gets stuck and the printed output gets messed up.
Here's a simplified version of the code.
import subprocess
import threading
import argparse
import sys

result = []

def check_thread(args, components, id):
    for i in components:
        cmd = <command to be given to terminal>
        output = subprocess.check_output([cmd], shell=True)
        result.append((id, i, output))

def check(args, components):
    # lock = threading.Lock()
    # lock = threading.Semaphore(value=1)
    thread_list = []
    for id in range(3):
        t = threading.Thread(target=check_thread, args=(args, components, id))
        thread_list.append(t)
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()
    for res in result:
        print(res)
    return res

if __name__ == '__main__':
    parser = argparse.ArgumentParser(....)
    parser.add_argument(.....)
    args = parser.parse_args()
    components = ['comp1', 'comp2']
    while True:
        print('SELECTION MENU\n1)\n2)\n')
        option = raw_input('Enter option')
        if option == '1':
            res = check(args, components)
        elif option == '2':
            <do something else>
        else:
            sys.exit(0)
I've tried using the multiprocessing module with Process and Pool, tried passing a lock to check_thread, and tried returning a value from check_thread() and collecting the values with a queue, but every time it's the same result: the execution is successful, but the terminal gets stuck and the printed output is shabby.
Is there any fix for this? I'm using Python 2.7 on a Linux terminal.
Here is how the shabby output looks: [screenshot of the garbled terminal output]

You should use a multiprocessing queue, not a list, to collect results from the workers. For example:
import multiprocessing as mp

# Define an output queue
output = mp.Queue()

# Define an example function
def function(params, output):
    """Process params and put the result on the output queue."""
    res = params ** 2          # stand-in for the real work
    output.put(res)

# Set up a list of processes that we want to run
processes = [mp.Process(target=function, args=(5, output)) for x in range(10)]

# Run processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]

print(results)
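Since the question's code uses threading on Python 2, the same idea works with the thread-safe Queue.Queue. The sketch below assumes that setup and uses echo as a stand-in for the real command, which is not shown in the question:
# Minimal Python 2 sketch: the asker's threading setup with a Queue.Queue
# instead of the shared `result` list. `echo` is a stand-in command.
import subprocess
import threading
from Queue import Queue

def check_thread(components, thread_id, out_q):
    for comp in components:
        cmd = 'echo %s %s' % (thread_id, comp)   # stand-in command
        output = subprocess.check_output(cmd, shell=True)
        out_q.put((thread_id, comp, output))

def check(components):
    out_q = Queue()
    threads = [threading.Thread(target=check_thread, args=(components, tid, out_q))
               for tid in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    results = []
    while not out_q.empty():
        results.append(out_q.get())
    return results

if __name__ == '__main__':
    for res in check(['comp1', 'comp2']):
        print(res)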

Related

Python JoinableQueue and Queue Thread Not Completing

I am using the following code to complete a task using multithreading with Queue and JoinableQueue. Sometimes the script executes perfectly; other times it stalls at the end of the task without ending the worker and will not continue on to the next portion of the script. I am new to working with Queue and JoinableQueue, and I need to find out why this stalling happens.
Before this part of the code I run another Queue/JoinableQueue worker to download some data, and it works perfectly fine every time. Do I need to close() anything from the first Queue/JoinableQueue? Is there a way to check whether it stalls and, if so, continue on?
Here is my code:
import multiprocessing
from multiprocessing import Queue
from multiprocessing import JoinableQueue
from threading import Thread

def run_this_definition(hr):
    # do things here
    return()

def worker():
    while True:
        item = jq.get()
        run_this_definition(item)
        jq.task_done()
    return()

q = Queue()
jq = JoinableQueue()

number_of_threads = 8
for i in range(number_of_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

input_list = [0,1,2,3,4]
for item in input_list:
    jq.put(item)

jq.join()
print "finished"
The script never prints "finished" when it stalls, but seems to finish all the tasks and stalls at the end of the 'run_this_definition' on the very last item in the Queue.
My guess is that you are using multiprocessing.JoinableQueue(). Use Queue.Queue() instead for threading; it has .join() and .task_done() methods as well. Furthermore, you should pass your queue as an argument to your threads. See the following example:
import threading
from threading import Thread
from Queue import Queue

def worker(jq):
    while True:
        item = jq.get()
        # Do whatever you have to do.
        print '{}: {}'.format(threading.currentThread().name, item)
        jq.task_done()
    return()

number_of_threads = 4
input_list = [1,2,3,4,5]

jq = Queue()

for i in range(number_of_threads):
    t = Thread(target=worker, args=(jq,))
    t.daemon = True
    t.start()

for item in input_list:
    jq.put(item)

jq.join()
print "finished"
The print output from multiple threads might look messy, but as an example it should be fine.
For the future: please provide a self-contained example of your code. Neither your imports nor number_of_threads, run_this_definition, or input_list were defined in your example.

python multiprocessing.Process's join can not end

I'm writing a program that uses multiple processes (CPU-bound) and multiple threads (I/O-bound). (The code below is just a sample, not the real program.)
But when the code reaches join(), the program deadlocks.
My code is posted below:
import requests
import time
from multiprocessing import Process, Queue
from multiprocessing.dummy import Pool

start = time.time()
queue = Queue()
rQueue = Queue()
url = 'http://www.bilibili.com/video/av'

for i in xrange(10):
    queue.put(url + str(i))

def goURLsCrawl(queue, rQueue):
    threadPool = Pool(7)
    while not queue.empty():
        threadPool.apply_async(urlsCrawl, args=(queue.get(), rQueue))
    threadPool.close()
    threadPool.join()
    print 'end'

def urlsCrawl(url, rQueue):
    response = requests.get(url)
    rQueue.put(response)

p = Process(target=goURLsCrawl, args=(queue, rQueue))
p.start()
p.join()  # join() is here

end = time.time()
print 'total time %0.4f' % (end-start,)
Thanks in advance.😊
I finally found the reason. As you can see, I import Queue from multiprocessing, so that Queue is meant for exchanging data between Processes, but in my code the threads of the pool access it, so something unexpected must be happening behind the scenes.
To correct it, just use Queue.Queue instead of multiprocessing.Queue for the queues the threads touch.
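A minimal Python 2 sketch of that fix, under the assumption that the responses are only needed inside the child process (the queues are created inside the child here, so nothing has to cross the process boundary):
import requests
import time
from Queue import Queue                   # thread-safe queue, used only inside the child
from multiprocessing import Process
from multiprocessing.dummy import Pool

def urlsCrawl(url, rQueue):
    rQueue.put(requests.get(url))

def goURLsCrawl(urls):
    queue = Queue()
    rQueue = Queue()
    for u in urls:
        queue.put(u)
    threadPool = Pool(7)
    while not queue.empty():
        threadPool.apply_async(urlsCrawl, args=(queue.get(), rQueue))
    threadPool.close()
    threadPool.join()
    print 'fetched %d responses' % rQueue.qsize()

if __name__ == '__main__':
    start = time.time()
    urls = ['http://www.bilibili.com/video/av' + str(i) for i in xrange(10)]
    p = Process(target=goURLsCrawl, args=(urls,))
    p.start()
    p.join()                               # no longer deadlocks
    print 'total time %0.4f' % (time.time() - start)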

Multiprocessing and global True/False variable

I'm struggling to get my head around multiprocessing and passing a global True/False variable into my function.
After get_data() finishes I want the analysis() function to start and process the data, while fetch() continues running. How can I make this work? TIA
import multiprocessing

ready = False

def fetch():
    global ready
    get_data()
    ready = True
    return

def analysis():
    analyse_data()

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=fetch)
    p2 = multiprocessing.Process(target=analysis)
    p1.start()
    if ready:
        p2.start()
You should run the two processes and use a shared queue to exchange information between them, such as signaling the completion of an action in one of the processes.
Also, you need to have a join() statement to properly wait for completion of the processes you spawn.
from multiprocessing import Process, Queue
import time

def get_data(q):
    # Do something to get data
    time.sleep(2)
    # Put a message in the queue to signal that get_data has finished
    q.put('message from get_data to analyse_data')

def analyse_data(q):
    # Waiting for get_data to finish...
    msg = q.get()
    print msg  # Will print 'message from get_data to analyse_data'
    # get_data has finished

if __name__ == '__main__':
    # Create a queue for exchanging messages between the processes
    q = Queue()
    # Create the processes and pass the shared queue to them
    processes = [Process(target=get_data, args=(q,)), Process(target=analyse_data, args=(q,))]
    # Start the processes
    for p in processes:
        p.start()
    # Wait until all processes complete
    for p in processes:
        p.join()
Your example won't work for a few reasons:
Processes do not share memory with each other (you can't change the global in one process and see the change in the other).
Even if you could change the global value, you are checking it too soon, and most likely it won't have changed in time.
Read https://docs.python.org/3/library/ipc.html for more options for inter-process communication.
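For a plain ready/not-ready signal like this one, a multiprocessing.Event is another standard primitive. This is an alternative sketch, not from the answer above; the sleep and the print stand in for get_data() and analyse_data():
from multiprocessing import Process, Event
import time

def fetch(ready):
    # get_data() stand-in
    time.sleep(2)
    ready.set()                 # signal that the data is ready

def analysis(ready):
    ready.wait()                # block until fetch() signals
    print('analysing data')     # analyse_data() stand-in

if __name__ == '__main__':
    ready = Event()
    p1 = Process(target=fetch, args=(ready,))
    p2 = Process(target=analysis, args=(ready,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()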

python multiprocessing - process hangs on join for large queue

I'm running Python 2.7.3 and I noticed the following strange behavior. Consider this minimal example:
from multiprocessing import Process, Queue

def foo(qin, qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        qout.put({'bar': bar})

if __name__ == '__main__':
    import sys

    qin = Queue()
    qout = Queue()

    worker = Process(target=foo, args=(qin, qout))
    worker.start()

    for i in range(100000):
        print i
        sys.stdout.flush()
        qin.put(i**2)

    qin.put(None)
    worker.join()
When I loop over 10,000 or more, my script hangs on worker.join(). It works fine when the loop only goes to 1,000.
Any ideas?
The qout queue in the subprocess gets full. The data you put in it from foo() doesn't fit in the buffer of the OS's pipes used internally, so the subprocess blocks trying to fit more data. But the parent process is not reading this data: it is simply blocked too, waiting for the subprocess to finish. This is a typical deadlock.
In practice you have to keep the amount of unread data sitting in the queues bounded. Consider the following modification:
from multiprocessing import Process, Queue

def foo(qin, qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        #qout.put({'bar': bar})

if __name__ == '__main__':
    import sys

    qin = Queue()
    qout = Queue()   ## POSITION 1

    for i in range(100):
        #qout = Queue()   ## POSITION 2

        worker = Process(target=foo, args=(qin, qout))
        worker.start()

        for j in range(1000):
            x = i*100 + j
            print x
            sys.stdout.flush()
            qin.put(x**2)

        qin.put(None)
        worker.join()

    print 'Done!'
This works as-is (with the qout.put line commented out). If you try to save all 100,000 results, then qout becomes too large: if I uncomment qout.put({'bar': bar}) in foo and leave the definition of qout at POSITION 1, the code hangs. If, however, I move the definition of qout to POSITION 2, the script finishes.
So in short, you have to be careful that neither qin nor qout becomes too large. (See also: Multiprocessing Queue maxsize limit is 32767)
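Another common pattern (not from the answer above) is to drain qout in the parent before calling join(), so the worker never blocks trying to flush a full queue. A minimal sketch:
from multiprocessing import Process, Queue

def foo(qin, qout):
    while True:
        bar = qin.get()
        if bar is None:
            break
        qout.put({'bar': bar})
    qout.put(None)                        # sentinel: no more results

if __name__ == '__main__':
    qin = Queue()
    qout = Queue()
    worker = Process(target=foo, args=(qin, qout))
    worker.start()

    for i in range(100000):
        qin.put(i ** 2)
    qin.put(None)

    results = []
    while True:                           # drain qout before join()
        item = qout.get()
        if item is None:
            break
        results.append(item)

    worker.join()
    print('collected %d results' % len(results))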
I had the same problem on Python 3 when trying to put strings with a total size of about 5000 chars into a queue.
In my project there was a host process that sets up a queue, starts a subprocess, and then joins it. After the join, the host process reads from the queue. When the subprocess produces too much data, the host hangs on join. I fixed this by using the following function to wait for the subprocess in the host process:
from multiprocessing import Process, Queue
from queue import Empty

def yield_from_process(q: Queue, p: Process):
    while p.is_alive():
        p.join(timeout=1)
        while True:
            try:
                yield q.get(block=False)
            except Empty:
                break
I read from the queue as soon as it fills, so it never gets very large.
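A hypothetical end-to-end usage of that helper (the produce function, its payload strings, and the item count are illustrative, not from the answer):
from multiprocessing import Process, Queue
from queue import Empty

def produce(q: Queue):
    # Puts more data on the queue than the pipe buffer comfortably holds.
    for i in range(10000):
        q.put('payload %d' % i)

def yield_from_process(q: Queue, p: Process):
    # Same helper as above, repeated so the example is self-contained.
    while p.is_alive():
        p.join(timeout=1)
        while True:
            try:
                yield q.get(block=False)
            except Empty:
                break

if __name__ == '__main__':
    q = Queue()
    p = Process(target=produce, args=(q,))
    p.start()
    count = 0
    for item in yield_from_process(q, p):
        count += 1                        # consume results as they arrive
    print('consumed', count, 'items')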
In my case I was trying to .get() an AsyncResult after the pool had already closed: an indentation error had left the collection loop outside the with block, and the with block terminates the pool on exit, so those .get() calls never return.
I had this:
with multiprocessing.Pool() as pool:
    async_results = list()
    for job in jobs:
        async_results.append(
            pool.apply_async(
                _worker_func,
                (job,),
            )
        )
# wrong: the pool is terminated once the with block exits,
# so these .get() calls never complete
for async_result in async_results:
    yield async_result.get()
I needed this:
with multiprocessing.Pool() as pool:
    async_results = list()
    for job in jobs:
        async_results.append(
            pool.apply_async(
                _worker_func,
                (job,),
            )
        )
    # right: collect the results while the pool is still alive
    for async_result in async_results:
        yield async_result.get()

how to kill zombie processes created by multiprocessing module?

I'm very new to the multiprocessing module, and I just tried to create the following: I have one process whose job is to get messages from RabbitMQ and pass them to an internal queue (multiprocessing.Queue). Then I want to spawn a process whenever a new message comes in. It works, but after the job is finished it leaves a zombie process that is not terminated by its parent. Here is my code:
Main Process:
#!/usr/bin/env python
import multiprocessing
import logging
import consumer
import producer
import worker
import time
import base

conf = base.get_settings()
logger = base.logger(identity='launcher')

request_order_q = multiprocessing.Queue()
result_order_q = multiprocessing.Queue()
request_status_q = multiprocessing.Queue()
result_status_q = multiprocessing.Queue()

CONSUMER_KEYS = [{'queue':'product.order',
                  'routing_key':'product.order',
                  'internal_q':request_order_q}]
                 # {'queue':'product.status',
                 #  'routing_key':'product.status',
                 #  'internal_q':request_status_q}]

def main():
    # Launch consumers
    for key in CONSUMER_KEYS:
        cons = consumer.RabbitConsumer(rabbit_q=key['queue'],
                                       routing_key=key['routing_key'],
                                       internal_q=key['internal_q'])
        cons.start()

    # Check request_order_q; if not empty, spawn a process to handle the message
    while True:
        time.sleep(0.5)
        if not request_order_q.empty():
            handler = worker.Worker(request_order_q.get())
            logger.info('Launching Worker')
            handler.start()

if __name__ == "__main__":
    main()
And here is my Worker:
import multiprocessing
import sys
import time
import base

conf = base.get_settings()
logger = base.logger(identity='worker')

class Worker(multiprocessing.Process):

    def __init__(self, msg):
        super(Worker, self).__init__()
        self.msg = msg
        self.daemon = True

    def run(self):
        logger.info('%s' % self.msg)
        time.sleep(10)
        sys.exit(1)
So after all the messages get processed, I can still see the processes with the ps aux command. But I would really like them to be terminated once finished.
Thanks.
Using multiprocessing.active_children() is better than Process.join. The function active_children() reaps any zombies created since the last call to active_children(). The method join awaits the selected process; during that time, other processes can terminate and become zombies, but the parent process will not notice until the awaited process is joined. To see this in action:
import multiprocessing as mp
import time

def main():
    n = 3
    c = list()
    for i in range(n):
        d = dict(i=i)
        p = mp.Process(target=count, kwargs=d)
        p.start()
        c.append(p)
    for p in reversed(c):
        p.join()
        print('joined')

def count(i):
    print(f'{i} going to sleep')
    time.sleep(i * 10)
    print(f'{i} woke up')

if __name__ == '__main__':
    main()
The above will create 3 processes that terminate 10 seconds apart each. As the code is, the last process is joined first, so the other two, which terminated earlier, will be zombies for 20 seconds. You can see them with:
ps aux | grep Z
There will be no zombies if the processes are awaited in the order in which they terminate. Remove the call to reversed to see this case. However, in real applications we rarely know the order in which the children will terminate, so using multiprocessing.Process.join will result in some zombies.
The alternative active_children does not leave any zombies.
In the above example, replace the loop for p in reversed(c): with:
while True:
    time.sleep(1)
    if not mp.active_children():
        break
and see what happens.
A couple of things:
Make sure the parent joins its children, to avoid zombies. See Python Multiprocessing Kill Processes
You can check whether a child is still running with the is_alive() member function. See http://docs.python.org/2/library/multiprocessing.html#multiprocessing.Process
Use multiprocessing.active_children().
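A minimal self-contained sketch of that advice applied to a polling loop like the question's (the queue of messages and the handle function are illustrative stand-ins for the RabbitMQ consumer and the Worker class):
import multiprocessing as mp
import time

def handle(msg):
    # Stand-in for the real worker logic.
    time.sleep(1)

if __name__ == '__main__':
    q = mp.Queue()
    for i in range(5):
        q.put('message %d' % i)

    deadline = time.time() + 10          # run the loop for a bounded time
    while time.time() < deadline:
        time.sleep(0.5)
        mp.active_children()             # joins finished children, so no zombies pile up
        if not q.empty():
            mp.Process(target=handle, args=(q.get(),)).start()

    for p in mp.active_children():       # final cleanup before exiting
        p.join()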
