multiprocessing - child process constantly sending back results and keeps running - python

Is it possible to have a few child processes run some calculations, send each result to the main process (e.g. to update a PyQt UI), and keep running, so that after a while they send back more data and update the UI again?
With multiprocessing.Queue, it seems like the data can only be sent back after the process has terminated.
So I wonder whether this is possible or not.

I don't know what you mean by "With multiprocessing.queue, it seems like the data can only be sent back after process is terminated". This is exactly the use case that multiprocessing.Queue was designed for.
PyMOTW is a great resource for a whole load of Python modules, including multiprocessing. Check it out here: https://pymotw.com/2/multiprocessing/communication.html
A simple example of how to send ongoing messages from a child to the parent using multiprocessing and loops:
import multiprocessing
def child_process(q):
    for i in range(10):
        q.put(i)
    q.put("done")  # tell the parent process we've finished
def parent_process():
    q = multiprocessing.Queue()
    child = multiprocessing.Process(target=child_process, args=(q,))
    child.start()
    while True:
        value = q.get()
        if value == "done":  # no more values from child process
            break
        print(value)
        # do other stuff, child will continue to run in separate process
    child.join()  # safe: everything the child put on the queue has been read
if __name__ == '__main__':
    parent_process()
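The loop above blocks in q.get(), which is fine in a script but would freeze a GUI event loop. A minimal sketch, assuming the parent polls periodically instead (in a PyQt application that poll could be driven by something like a QTimer, which is not shown here): several long-running workers keep pushing partial results, and the parent drains whatever has arrived without blocking. The function names and the ("progress", step, value) message format are made up for illustration.

```python
import multiprocessing
import queue  # on Python 2 this module is named Queue
import time


def worker(q, n_results, interval):
    """Keep calculating and send each partial result back as soon as it is ready."""
    for step in range(n_results):
        time.sleep(interval)                    # stand-in for a real calculation
        q.put(("progress", step, step * step))
    q.put(("done", None, None))                 # sentinel: this worker has finished


def drain(q):
    """Return everything that has arrived so far, without blocking."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            return items


def run(n_workers=2, n_results=3, interval=0.05):
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(q, n_results, interval))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    received, finished = [], 0
    while finished < n_workers:
        for kind, step, value in drain(q):
            if kind == "done":
                finished += 1
            else:
                received.append((step, value))  # e.g. update the UI here
        time.sleep(0.01)                        # a GUI would return to its event loop instead
    for p in procs:
        p.join()                                # safe: every queued item has been read
    return received
```

Each worker sends exactly one "done" sentinel after its last result, so the parent knows when every worker has finished and can join them without risking the deadlock described further down this page.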

Related

How can I refresh Pool processes reading from Queue?

I'm using multiprocessing Pool + Queue to share processing work between a parent process (processing with GPUs) and child processes (processing on the CPU). My program looks like this:
from multiprocessing import JoinableQueue, Pool
def reader_proc(queue):
    # Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()  # Read the next message from the queue
        if msg == 'DONE':
            break
        do_cpu_work(msg)
if __name__=='__main__':
    queue = JoinableQueue()
    # reader_proc runs as each worker's initializer and loops until 'DONE'
    pool = Pool(4, reader_proc, (queue,))
    for task in GPUWork:
        results = do_task(task)
        for result in results:
            queue.put(result)
    # put 'DONE' on and join and close
I'm having a severe memory leak right now, even after explicitly deleting every variable in the reader_proc and calling gc.collect(). I'm calling into various C++ libraries from the reader_proc and I suspect one of them could be leaking memory. While I try and debug that, I need to get some processing done on this data.
Is there any way to refresh these reader processes? E.g. periodically terminate them and restart them. This exists with maxtasksperchild for a Pool operating on an iterable, but doesn't seem to apply to this Queue / Process based scheme.
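One possible workaround, sketched under assumptions: instead of long-lived readers looping over a queue, submit each CPU work item to the pool as its own task with apply_async, so that maxtasksperchild applies again. do_cpu_work below is a hypothetical stand-in for the leaking C++-backed work, and the worker and task counts are arbitrary.

```python
import multiprocessing
import os


def do_cpu_work(msg):
    """Hypothetical stand-in for the leaky CPU work; reports which process ran it."""
    return os.getpid(), msg * 2


def run(messages, workers=2, tasks_per_child=2):
    # maxtasksperchild makes the pool replace a worker process after it has
    # completed that many tasks, so memory leaked inside a worker is released
    # when that process exits.
    pool = multiprocessing.Pool(processes=workers, maxtasksperchild=tasks_per_child)
    try:
        pending = [pool.apply_async(do_cpu_work, (m,)) for m in messages]
        results = [r.get() for r in pending]    # results come back in submission order
    finally:
        pool.close()
        pool.join()
    return results
```

The trade-off is that every item is pickled and sent to the pool individually; in exchange, with maxtasksperchild=2 no worker process ever handles more than two items before it is replaced.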

python: Why join keeps me waiting?

I want to do clustering on 10,000 models. Before that, I have to calculate the Pearson correlation coefficient for every pair of models. That's a large amount of computation, so I use multiprocessing to spawn processes, assigning the computing job to 16 CPUs. My code is like this:
import numpy as np
from multiprocessing import Process, Queue
def cc_calculator(begin, end, q):
    # position of pair (i, j), i < j, in the condensed upper triangle
    index=lambda i,j,n: i*n+j-i*(i+1)/2-i-1
    for i in range(begin, end):
        for j in range(i+1, nmodel):
            all_cc[i][j]=get_cc(i,j)
            q.put((index(i,j,nmodel),all_cc[i][j]))
def func(i):
    res=(16.0-i)/16  # float division, so the rows are actually split 16 ways
    res=res**0.5
    res=int(nmodel*(1-res))
    return res
nmodel=int(raw_input("Entering the number of models:"))
all_cc=np.zeros((nmodel,nmodel))
ncc=int(nmodel*(nmodel-1)/2)
condensed_cc=[0]*ncc
q=Queue()
mprocess=[]
for i in range(16):
    begin=func(i)
    end=func(i+1)
    p=Process(target=cc_calculator,args=(begin,end,q))
    mprocess+=[p]
    p.start()
for x in mprocess:
    x.join()
while not q.empty():
    (ind, value)=q.get()
    ind=int(ind)
    condensed_cc[ind]=value
np.save("condensed_cc",condensed_cc)
where get_cc(i,j) calculates the correlation coefficient for models i and j. all_cc is an upper triangular matrix and all_cc[i][j] stores the cc value. condensed_cc is a condensed version of all_cc; I'll process it to obtain condensed_dist for the clustering. The func function assigns almost the same amount of computation to each CPU.
The program runs successfully with nmodel=20. When I try to run it with nmodel=10,000, however, it seems that it never ends. I waited about two days and used the top command in another terminal window: no process with command "python" is still running. But the program is still running and there is no output file. When I use Ctrl+C to force it to stop, the traceback points to the line x.join(). nmodel=40 ran fast but failed with the same problem.
Maybe this problem has something to do with q, because if I comment out the line q.put(...), it runs successfully. Or something like this:
q.put(...)
q.get()
It is also OK. But neither method gives a correct condensed_cc: they leave all_cc and condensed_cc unchanged.
Another example with only one subprocess:
from multiprocessing import Process, Queue
def g(q):
    num=10**2
    for i in range(num):
        print '='*10
        print i
        q.put((i,i+2))
        print "qsize: ", q.qsize()
q=Queue()
p=Process(target=g,args=(q,))
p.start()
p.join()
while not q.empty():
    q.get()
It is OK with num=100 but fails with num=10,000. Even with num=100**2, it did print all the values of i and q.qsize(). I cannot figure out why. Also, Ctrl+C gives a traceback pointing to p.join().
I want to say more about the size problem of the queue. The documentation introduces Queue as Queue([maxsize]), and says of the put method: "...block if necessary until a free slot is available." All this makes one think that the subprocess is blocked because the queue has run out of space. However, as I mentioned before, in the second example the output printed on the screen shows an increasing qsize, meaning that the queue is not full. I added one line:
print q.full()
after the print qsize statement; it is always False for num=10,000 while the program is still stuck somewhere. To emphasize one thing: the top command in another terminal shows no process with command python. That really puzzles me.
I'm using python 2.7.9.
I believe the problem you are running into is described in the multiprocessing programming guidelines: https://docs.python.org/2/library/multiprocessing.html#multiprocessing-programming
Specifically this section:
Joining processes that use queues
Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the cancel_join_thread() method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.
An example which will deadlock is the following:
from multiprocessing import Process, Queue
def f(q):
    q.put('X' * 1000000)
if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    p.join()                    # this deadlocks
    obj = queue.get()
A fix here would be to swap the last two lines (or simply remove the p.join() line).
You might also want to check out the section on "Avoid Shared State".
It looks like you are using .join() to avoid the race condition of q.empty() returning True before something is added to it. You should not rely on .empty() at all while using multiprocessing (or multithreading). Instead, you should handle this by signaling from the worker process to the main process when it is done adding items to the queue. This is normally done by placing a sentinel value in the queue, but there are other options as well.
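Applied to the single-subprocess example from the question, a minimal sketch of the sentinel approach: the child puts a None marker after its last result, the parent reads until it sees that marker, and only then calls join(). The num parameter and the choice of None as the sentinel are illustrative assumptions (None must never be a real result).

```python
from multiprocessing import Process, Queue

SENTINEL = None  # marks "no more items"; assumes None is never a real result


def g(q, num):
    for i in range(num):
        q.put((i, i + 2))
    q.put(SENTINEL)           # tell the parent we are finished


def collect(num):
    q = Queue()
    p = Process(target=g, args=(q, num))
    p.start()
    items = []
    while True:
        item = q.get()        # blocks until the child sends something
        if item is SENTINEL:
            break
        items.append(item)
    p.join()                  # the queue has been drained, so join() cannot deadlock
    return items
```

With several workers, as in the 16-process version, the parent would keep reading until it had received one sentinel per worker, and only then join them all.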

Closing SSE connection in browser causes segfault in uWSGI when multiple sets of child processes are running concurrently

I'm building a web application that processes ~60,000 (and growing) large files, performs some analysis and returns a "best guess" that needs to be verified by a user. The files will be refined by category to avoid loading every file, but I'm still left with a scenario where I might have to process 1000+ files at a time.
These are large files that can take up to 8-9 seconds each to process, and in a 1000+ file situation it is impractical to have a user wait 8 seconds between reviews, or 2+ hours while the files are processed beforehand.
To overcome this, I've decided to use multiprocessing to spawn several workers, each of which will pick from a queue of files, process them and insert into an output queue. I have another method that basically polls the output queue for items and then streams them to the client when one becomes available.
We're using gevent with uWSGI and Django in our environment and I'm aware that child process creation via multiprocessing in the context of gevent yields an undesired event loop state in the child. Greenlets spawned before forking are duplicated in the child. Therefore, I've decided to use lets to assist in the handling of the child processes.
This all works beautifully while uninterrupted. However, if a user switches categories while files are still being processed, I close the SSE connection in the browser and open another, which causes a new set of child processes to spawn and kills (or attempts to kill) the existing processes. This causes one of two things to happen.
When I yield out the results to the client, I get an IOError from uWSGI because the connection has closed. I wrapped the entire function in a try...finally and kill all the workers before the function exits.
I can either block while the processes are killed or do it in the background. Each method has different consequences. When trying to kill without blocking, the original processes are never killed, the new processes stop yielding, and any request (from any page) to the server hangs until I manually kill all uWSGI processes.
When blocking, uWSGI reports a segmentation fault, the main worker is killed and restarted, killing all child processes - new and old.
An example of the JavaScript used to open/close the connection:
var state = {};
function analyze(){
    // If a connection exists, close it.
    if (state.evtSource) {
        state.evtSource.close();
    }
    // Create a new connection to the server.
    var evtSource = state.evtSource = new EventSource('?myarg=myval');
    evtSource.onmessage = function(message){
        //do stuff
    };
}
Server-side example code:
try:
    from queue import Empty  # Python 3
except ImportError:
    from Queue import Empty  # Python 2
from item import Item
import lets
import multiprocessing
import time
MAX_WORKERS = 10
# Worker is outside of ProcessFiles because ``lets``
# pickles the target.
def worker(item):
    return item.process()
class ProcessFiles(object):
    def __init__(self):
        # Plain list: multiprocessing.Queue has no append() and cannot be iterated.
        self.input_queue = []
        self.output_queue = multiprocessing.Queue()
        self.file_count = 0
        self.pool = lets.ProcessPool(MAX_WORKERS)
    def query_for_results(self):
        # Query db for records of files to process.
        # Return results and set self.file_count equal to
        # the number of records returned.
        pass
    def start(self):
        # Queue up files to process.
        for result in self.query_for_results():
            item = Item(result)
            self.input_queue.append(item)
        # Spawn up to MAX_WORKERS child processes to analyze
        # all of the items in the input queue. Append processed file
        # to output queue.
        for item in self.input_queue:
            self.pool.apply_async(worker, args=(item,), callback=self.callback)
        # Poll for items to send to client.
        return self.get_processed_items()
    def callback(self, processed):
        self.output_queue.put(processed)
    def get_processed_items(self):
        # Wait for the output queue to hold at least 1 item.
        # When an item becomes available, yield it to client.
        try:
            count = 0
            while count != self.file_count:
                try:
                    item = self.output_queue.get(timeout=1)
                except Empty:
                    # Queue is empty. Wait and retry.
                    time.sleep(1)
                    continue
                count += 1
                yield item
            yield 'end'
        finally:
            # Kill all child processes.
            self.pool.kill(block=True) # <- Causes segfault.
            #self.pool.kill(block=False) # <- Silently fails.
This only happens when a user makes a selection and, while those files are still being processed, makes another selection, effectively closing the current connection and opening a new one, which creates two different sets of child processes.
Why does blocking cause a segmentation fault? Why is the behavior different when blocking vs not blocking? What can I do to kill all the original processes?
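For reference, a minimal sketch of the architecture described in this question (an input queue, several worker processes, an output queue, and a generator that streams results as they arrive), written with the standard multiprocessing module only. It swaps out gevent, lets and uWSGI entirely, so it does not reproduce the segfault; process_file is a hypothetical stand-in for the per-file analysis. The finally block shows one way to make sure the worker processes are gone when the consumer stops early.

```python
import multiprocessing

STOP = None  # sentinel: "no more input" (assumes None is never a real path)


def process_file(path):
    """Hypothetical stand-in for the slow per-file analysis."""
    return path, len(path)


def worker(input_queue, output_queue):
    # iter(callable, sentinel) keeps calling get() until it returns STOP
    for path in iter(input_queue.get, STOP):
        output_queue.put(process_file(path))


def stream_results(paths, n_workers=4):
    """Yield results in completion order, as soon as any worker finishes one."""
    paths = list(paths)
    input_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()
    workers = [multiprocessing.Process(target=worker, args=(input_queue, output_queue))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for path in paths:
        input_queue.put(path)
    for _ in workers:
        input_queue.put(STOP)             # one sentinel per worker
    try:
        for _ in range(len(paths)):
            yield output_queue.get()      # blocks until the next result is ready
    finally:
        # Runs on normal completion and when the consumer stops early,
        # e.g. generator.close() after the client disconnects.
        input_queue.cancel_join_thread()  # don't wait at exit to flush unread input
        for w in workers:
            if w.is_alive():
                w.terminate()
            w.join()
```

Calling close() on the generator (which also happens when a suspended generator is garbage-collected) raises GeneratorExit at the paused yield, so the finally block runs and every worker is terminated and joined before the generator is gone.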

How to list Processes started by multiprocessing Pool?

While attempting to store a multiprocessing Process instance in a multiprocessing (Manager) list variable `poolList`, I am getting the following exception:
SimpleQueue objects should only be shared between processes through inheritance
The reason why I would like to store the Process instances in a variable is to be able to terminate all or just some of them later (if, for example, a process freezes). If storing a Process in a variable is not an option, I would like to know how to get or list all the processes started by a multiprocessing Pool. That would be very similar to what the .current_process() method does, except that .current_process() gets only a single process, while I need all the processes started or all the processes currently running.
Two questions:
Is it even possible to store an instance of the Process (as returned by mp.current_process())?
Currently I am only able to get a single process from inside the function that the process is running (from inside myFunct(), using the .current_process() method).
Instead, I would like to list all the processes currently running under multiprocessing. How can I achieve that?
import multiprocessing as mp
poolList=mp.Manager().list()
poolDict=mp.Manager().dict()
def myFunct(arg):
    print 'myFunct(): current process:', mp.current_process()
    try: poolList.append(mp.current_process())
    except Exception, e: print e
    for i in range(110):
        for n in range(500000):
            pass
        poolDict[arg]=i
    print 'myFunct(): completed', arg, poolDict
from multiprocessing import Pool
myArgsList=['arg1','arg2','arg3']
pool=Pool(processes=2)
pool.map_async(myFunct, myArgsList)
pool.close()
pool.join()
To list the processes started by a Pool() instance (which is what you mean, if I understand you correctly), there is the pool._pool list. It contains the Process instances.
However, it is not part of the documented interface and hence, really should not be used.
BUT... it seems a little unlikely that it would change just like that anyway. I mean, would they stop having an internal list of processes in the pool? Or stop calling it _pool?
And it also annoys me that there isn't at least a get-processes method, or something similar.
And handling it breaking due to some name change should not be that difficult.
But still, use at your own risk:
from multiprocessing import pool
# Have to run in main
if __name__ == '__main__':
    # Create 3 worker processes
    _my_pool = pool.Pool(3)
    # Loop, terminate, and remove from the process list
    # Use a copy [:] of the list to remove items correctly
    for _curr_process in _my_pool._pool[:]:
        print("Terminating process "+ str(_curr_process.pid))
        _curr_process.terminate()
        _my_pool._pool.remove(_curr_process)
    # If you call _repopulate_pool, the pool will again contain 3 worker processes.
    _my_pool._repopulate_pool()
    for _curr_process in _my_pool._pool[:]:
        print("After repopulation "+ str(_curr_process.pid))
The example creates a pool and manually terminates all of its processes.
It is important that you remember to remove each process you terminate from the pool yourself if you want Pool() to continue working as usual.
_my_pool._repopulate_pool() increases the number of worker processes back to 3; it is not needed to answer the question, but it gives a little behind-the-scenes insight.
Yes, you can get all active processes and act on them based on the process name,
e.g.
multiprocessing.Process(target=foo, name="refresh-reports")
and then
for p in multiprocessing.active_children():
    if p.name == "refresh-reports":
        p.terminate()
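A runnable version of that idea, as a sketch: two long-running children with different names, of which only the one named "refresh-reports" is terminated. The sleeper target and the second process name are invented for the example. Note that active_children() only lists live children of the calling process, and that Pool workers are children too, so they also appear in it.

```python
import multiprocessing
import time


def sleeper():
    """Hypothetical long-running job."""
    time.sleep(60)


def terminate_by_name(name):
    """Terminate every live child of this process whose name matches `name`."""
    terminated = []
    for p in multiprocessing.active_children():
        if p.name == name:
            p.terminate()
            p.join()                 # reap it so it no longer shows up as a child
            terminated.append(p.name)
    return terminated
```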
You're creating a managed List object, but then letting the associated Manager object expire.
Process objects aren't shareable because they aren't picklable; that is, they aren't simple.
Oddly, the multiprocessing module doesn't have a full equivalent of threading.enumerate(); the closest is multiprocessing.active_children(), which only lists the live children of the current process. As a workaround, I just store procs in a list. I never terminate() a process, but do sys.exit(0) in the parent. It's rough, because the workers will leave things in an inconsistent state, but it's okay for smaller programs.
To kill a frozen worker, I suggest: 1) the worker receives "heartbeat" jobs in a queue every now and then; 2) if the parent notices that worker A hasn't responded to a heartbeat within a certain amount of time, it calls p.terminate(). Consider restating the problem in another SO question, as it's interesting.
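A minimal sketch of that heartbeat idea, under assumptions: the job and reply queues, the "ping"/"pong" messages and the "freeze" job used to simulate a hang are all made up for illustration. The parent sends a ping and, if no pong comes back within the timeout, terminates the worker.

```python
import multiprocessing
import queue  # on Python 2 this module is named Queue
import time


def worker(jobs, replies):
    """Hypothetical worker: answers heartbeats; a "freeze" job simulates a hang."""
    for job in iter(jobs.get, None):
        if job == "ping":
            replies.put("pong")
        elif job == "freeze":
            time.sleep(3600)


def is_responsive(jobs, replies, timeout):
    """Send one heartbeat and wait up to `timeout` seconds for the answer."""
    jobs.put("ping")
    try:
        return replies.get(timeout=timeout) == "pong"
    except queue.Empty:
        return False


def check_and_kill(proc, jobs, replies, timeout=1.0):
    """Terminate `proc` if it misses a heartbeat; report what happened."""
    if is_responsive(jobs, replies, timeout):
        return "alive"
    proc.terminate()
    proc.join()
    return "terminated"
```

A real implementation would also need to discard stale pongs that arrive after a timeout, for example by tagging each ping with a counter and checking it in the reply.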
To be honest the map stuff is much easier than using a Manager.
Here's a Manager example I've used. A worker adds stuff to a shared list. Another worker occasionally wakes up, processes everything on the list, then goes back to sleep. The code also has verbose logs, which are essential for ease in debugging.
# producer adds to fixed-sized list; scanner uses them
import logging, multiprocessing, sys, time
def producer(objlist):
    '''
    add an item to list every sec; ensure fixed size list
    '''
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        try:
            time.sleep(1)
        except KeyboardInterrupt:
            return
        msg = 'ding: {:04d}'.format(int(time.time()) % 10000)
        logger.info('put: %s', msg)
        del objlist[0]
        objlist.append( msg )
def scanner(objlist):
    '''
    every now and then, run calculation on objlist
    '''
    logger = multiprocessing.get_logger()
    logger.info('start')
    while True:
        try:
            time.sleep(5)
        except KeyboardInterrupt:
            return
        logger.info('items: %s', list(objlist))
def main():
    logger = multiprocessing.log_to_stderr(
        level=logging.INFO
    )
    logger.info('setup')
    # create fixed-length list, shared between producer & consumer
    manager = multiprocessing.Manager()
    my_objlist = manager.list( # pylint: disable=E1101
        [None] * 10
    )
    multiprocessing.Process(
        target=producer,
        args=(my_objlist,),
        name='producer',
    ).start()
    multiprocessing.Process(
        target=scanner,
        args=(my_objlist,),
        name='scanner',
    ).start()
    logger.info('running forever')
    try:
        manager.join() # wait until both workers die
    except KeyboardInterrupt:
        pass
    logger.info('done')
if __name__=='__main__':
    main()

Understanding os.fork and Queue.Queue

I wanted to implement a simple python program using parallel execution. It's I/O bound, so I figured threads would be appropriate (as opposed to processes). After reading the documentation for Queue and fork, I thought something like the following might work.
import os
import Queue
q = Queue.Queue()
if os.fork():  # parent: fork() returns the child's PID here
    while True:
        print q.get()
else:  # child: fork() returns 0 here
    [q.put(x) for x in range(10)]
However, the get() call never returns. I thought it would return once the other thread executes a put() call. Using the threading module, things behave more like I expected:
import Queue
import threading
q = Queue.Queue()
def consume(q):
    while True:
        print q.get()
worker = threading.Thread(target=consume, args=(q,))
worker.start()
[q.put(x) for x in range(10)]
I just don't understand why the fork approach doesn't do the same thing. What am I missing?
The POSIX fork system call creates a new process, rather than a new thread inside the same address space:
The fork() function shall create a new process. The new process (child process) shall be an exact copy of the calling process (parent process) except as detailed below: [...]
So the Queue is duplicated in your first example, rather than shared between the parent and child.
You can use multiprocessing.Queue instead or just use threads like in your second example :)
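A sketch of the first example rewritten with multiprocessing.Queue, which is backed by a pipe and therefore really is shared between the parent and the child. It uses multiprocessing.Process rather than a bare os.fork(), puts None as an end marker, and sends the consumed values back on a second queue only so the result can be checked; those choices are illustrative.

```python
import multiprocessing


def consume(q, out):
    """Child: read values until the None end marker, then report them."""
    seen = []
    for item in iter(q.get, None):
        print(item)
        seen.append(item)
    out.put(seen)


def main():
    q = multiprocessing.Queue()      # pipe-backed, so parent and child share it
    out = multiprocessing.Queue()    # only used to hand the result back for checking
    child = multiprocessing.Process(target=consume, args=(q, out))
    child.start()
    for x in range(10):              # plain loop rather than a list comprehension
        q.put(x)
    q.put(None)                      # end marker: no more values
    seen = out.get()
    child.join()
    return seen
```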
By the way, using list comprehensions just for side effects isn't good practice for several reasons. You should use a for loop instead:
for x in range(10): q.put(x)
To share data between unrelated processes, you can use named pipes. Create the pipe once with os.mkfifo() and then open it with the os.open() function: http://docs.python.org/2/library/os.html#os.open. For example, name the pipe named_pipe='my_pipe' and, in the different Python programs, call os.open(named_pipe, mode), where mode is os.O_WRONLY for the writer, os.O_RDONLY for the reader, and so on. Then write into (or read from) the FIFO. Don't forget to close the pipe and catch exceptions.
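A minimal POSIX-only sketch of the named-pipe approach: os.mkfifo() creates the FIFO, the writer opens it with os.O_WRONLY and the reader with os.O_RDONLY. To keep the example self-contained, both ends live in one script and the writer is a forked child; in two unrelated programs each end would simply open the same path. The temporary directory and the pipe name are arbitrary.

```python
import os
import tempfile


def fifo_roundtrip(lines):
    """Send text lines from a forked writer to the parent through a FIFO (POSIX only)."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "my_pipe")
    os.mkfifo(path)                           # create the named pipe on disk
    pid = os.fork()
    if pid == 0:                              # child: the writer
        status = 1
        try:
            fd = os.open(path, os.O_WRONLY)   # blocks until a reader opens the FIFO
            for line in lines:
                os.write(fd, (line + "\n").encode())
            os.close(fd)                      # the reader sees EOF after this
            status = 0
        finally:
            os._exit(status)                  # never fall through into the parent's code
    fd = os.open(path, os.O_RDONLY)           # parent: the reader
    chunks = []
    try:
        while True:
            data = os.read(fd, 4096)
            if not data:                      # EOF: the writer closed its end
                break
            chunks.append(data)
    finally:
        os.close(fd)
        os.waitpid(pid, 0)
        os.remove(path)
        os.rmdir(directory)
    return b"".join(chunks).decode().splitlines()
```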
Fork creates a new process. The child and parent processes do not share the same Queue: that's why the elements put by the parent process cannot be retrieved by the child.
