Multiprocessing processes in uninterruptible sleep (D) state when run from iPython Notebook - python

I am performing a large parallel mapping computation from within iPython notebook. I am mapping a dataframe by subject and condition to a machine learning prediction function, and I want each subject and condition to be spread among 20 cores.
def map_vars_to_functionPredict(subject, condition):
    ans = map(predictBasic, [subject], [df], [condition])
    return ans

def main_helperPredict(args):
    return map_vars_to_functionPredict(*args)

def parallel_predict(subjects, conditions):
    p = Pool(20)
    # set each matching item into a tuple
    job_args = list(itertools.product(*[subjects, conditions]))
    print job_args
    # map to pool
    ans = p.map(main_helperPredict, job_args)
    p.close()
    p.join()
    return ans
When I run these functions from iPython Notebook after starting the notebook, they run quickly and as expected (in the 'Running' state at ~100% CPU on 20 cores). However, sometimes if I re-run the parallel_predict function right after running it for the first time, all 20 processes are marked as being in the uninterruptible sleep (D) state for no apparent reason. I am not writing anything to disk, just returning the output as a variable in the iPython notebook.
As a last-ditch attempt, I tried including del p after p.join(), and this helped somewhat (the function runs normally more often), but I still occasionally have the issue of processes ending up in the D state, especially if I have a lot of processes in the queue.
Edit:
In general, adding del p after p.join() kept the processes from entering (D) state, but I continued to have an issue where the function would finish all the processes (as far as I could tell from top), but it would not return results. When I stopped the iPython Notebook kernel, I got the error ZMQError: Address already in use.
How should I properly start or finish the multiprocessing Pool to keep this from happening?

I changed four things, and now 1) the processes no longer go into the (D) state and 2) I can run these functions back-to-back and they always return results and don't hang.
In parallel_predict, I added freeze_support(), replaced p.close() with p.terminate(), and added del p. (I also added a print line; I don't think that makes a difference, but I'm including it since all of this is superstition anyway.)
def parallel_predict(subjects, conditions):
    freeze_support()
    p = Pool(20)
    # set each matching item into a tuple
    job_args = list(itertools.product(*[subjects, conditions]))
    print job_args
    # map to pool
    ans = p.map(main_helperPredict, job_args)
    p.terminate()
    p.join()
    del p
    print "finished"
    return ans
Finally, I embedded the line where I call parallel_predict in if __name__ == "__main__" as such:
if __name__ == "__main__":
    all_results = parallel_predict(subjects, conditions)
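For comparison, a more conventional way to create and tear down a Pool on each call in Python 2 (where Pool has no with-statement support) is to pair close() with join() inside try/finally. This is only a sketch of the general pattern, reusing main_helperPredict from above; it is not the fix the poster settled on and it does not specifically address the notebook D-state issue:
from multiprocessing import Pool
import itertools

def parallel_predict(subjects, conditions):
    p = Pool(20)
    try:
        job_args = list(itertools.product(*[subjects, conditions]))
        # map blocks until all jobs are done
        ans = p.map(main_helperPredict, job_args)
    finally:
        p.close()   # no more tasks will be submitted to the pool
        p.join()    # wait for the worker processes to exit
    return ans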

Related

Python multiprocessing module not calling function

I have a program that needs to create several graphs, with each one often taking hours. Therefore I want to run these simultaneously on different cores, but cannot seem to get these processes to run with the multiprocessing module. Here is my code:
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()
        p.join()
(full_graph() has been defined earlier in the program, and is simply a function that runs a collection of other functions)
The function normally outputs some graphs and saves the data to a .txt file. All data is saved to the same 2 text files. However, calling the functions using the above code gives no console output, nor any output to the text files. All that happens is a few-second pause, and then the program exits.
I am using the Spyder IDE with WinPython 3.6.3
Without a simple full_graph sample nobody can tell you what's happening. But your code is inherently wrong.
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()
        p.join()  # <- This would block until p is done
See the comment after p.join(). If your processes really take hours to complete, you would run one process for hours, then the 2nd, then the 3rd: serially, and using a single core.
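If the goal is to run all five graphs at the same time, the usual pattern is to start every process first and only join them afterwards. A minimal sketch, assuming full_graph is defined as in the question:
import multiprocessing

# full_graph is assumed to be defined elsewhere, as in the question

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()    # start all five processes without waiting

    for p in jobs:
        p.join()     # then wait for all of them together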
From the docs: https://docs.python.org/3/library/multiprocessing.html
Process.join: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.join
If the optional argument timeout is None (the default), the method blocks until the process whose join() method is called terminates. If timeout is a positive number, it blocks at most timeout seconds. Note that the method returns None if its process terminates or if the method times out. Check the process’s exitcode to determine if it terminated.
If each process does something different, you should then also pass some args to full_graph (hint: might that be the missing factor?).
You probably want to use an interface like map from Pool
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
And do (from the docs again)
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

multi-process Queue - High RAM consuming

I set up a multiprocessing.Queue to run 2 functions in parallel. The 1st function parses and writes data to a text file, and the 2nd function pulls data from the same text file and shows a live graph. The 2nd function must kick off once the 1st function has created the text file. The code works well.
However:
It takes almost all my RAM (ca. 6 GB). Is that because it is multi-process? In Task Manager I see 3 python.exe processes of 2 GB each running at the same time, whereas when I run only the 1st function (the most RAM-consuming one) I see only 1 python.exe of 2 GB.
Once the code has parsed all the text and the graph has stopped, the processes keep running until I manually terminate the code with the Eclipse console red button. Is that normal?
I have a small script that runs before and outside of the multi-process functions. It provides a value I need in order to run the multi-process functions. It runs OK, but once the multi-process functions are called, it runs again! I don't know why, because it is definitely outside of the multi-process functions.
Part of this was resolved in another stackoverflow question.
# define newlista
# define key
# def get_all(myjson, kind, type)
# def on_data(data)
# small script that runs outside of the multi-process functions

def parsing(q):
    keep_running = True
    numentries = 0
    for text in get_all(newlista, "sentence", "text"):
        if 80 < len(text) < 500:
            firstword = key.lower().split(None, 1)[0]
            if text.lower().startswith(firstword):
                pass
            else:
                on_data(text)
            numentries += 1
            q.put(keep_running)
    keep_running = False
    q.put(keep_running)

def live_graph(q):
    keep_running = True
    while keep_running:
        keep_running = q.get()
        # do the graph updates here

if __name__ == '__main__':
    q = Queue()
    p1 = Process(target=parsing, args=(q,))
    p1.start()
    p2 = Process(target=live_graph, args=(q,))
    p2.start()
UPDATE
The graph function is the one that generates the two extra python.exe processes, and once the 1st function has terminated, the second function keeps running.
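On the third point: on Windows, multiprocessing starts each child process by importing the main module again, so any module-level code that is not guarded by if __name__ == '__main__' runs once per child as well. A minimal sketch of guarding that setup code; get_initial_value and the trivial worker bodies are illustrative stand-ins, not the poster's code:
from multiprocessing import Process, Queue

def get_initial_value():
    # stand-in for the small setup script; only the guarded call below runs it
    return 42

def parsing(q, value):
    q.put(value)
    q.put(None)      # tell the graph process we are done

def live_graph(q):
    while True:
        item = q.get()
        if item is None:
            break

if __name__ == '__main__':
    value = get_initial_value()   # guarded, so it is not re-run in the children
    q = Queue()
    p1 = Process(target=parsing, args=(q, value))
    p2 = Process(target=live_graph, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()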

python: Why join keeps me waiting?

I want to do clustering on 10,000 models. Before that, I have to calculate the Pearson correlation coefficient associated with every two models. That's a large amount of computation, so I use multiprocessing to spawn processes, assigning the computing job to 16 CPUs. My code is like this:
import numpy as np
from multiprocessing import Process, Queue

def cc_calculator(begin, end, q):
    index = lambda i, j, n: i*n + j - i*(i+1)/2 - i - 1
    for i in range(begin, end):
        for j in range(i, nmodel):
            all_cc[i][j] = get_cc(i, j)
            q.put((index(i, j, nmodel), all_cc[i][j]))

def func(i):
    res = (16-i)/16
    res = res**0.5
    res = int(nmodel*(1-res))
    return res

nmodel = int(raw_input("Entering the number of models:"))
all_cc = np.zeros((nmodel, nmodel))
ncc = int(nmodel*(nmodel-1)/2)
condensed_cc = [0]*ncc
q = Queue()
mprocess = []
for ii in range(16):
    begin = func(ii)
    end = func(ii+1)
    p = Process(target=cc_calculator, args=(begin, end, q))
    mprocess += [p]
    p.start()
for x in mprocess:
    x.join()
while not q.empty():
    (ind, value) = q.get()
    ind = int(ind)
    condensed_cc[ind] = value
np.save("condensed_cc", condensed_cc)
where get_cc(i,j) calculates the correlation coefficient associated with models i and j. all_cc is an upper triangular matrix and all_cc[i][j] stores the cc value. condensed_cc is another version of all_cc; I'll process it to obtain condensed_dist to do the clustering. The "func" function helps assign to each cpu almost the same amount of computing.
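For reference, the boundary formula works because row i of the upper triangle contributes roughly nmodel - i pairs, so the fraction of all pairs below row r is about 1 - ((nmodel - r)/nmodel)**2; setting that equal to k/16 and solving for r gives r = nmodel*(1 - sqrt((16 - k)/16)), which is what func(k) computes when the division is done in floating point (under Python 2, (16-i)/16 truncates unless from __future__ import division is in effect). A quick check of the resulting split, as a sketch rather than the poster's code:
nmodel = 10000

def func(i):
    # same formula as in the question, with the division forced to float
    return int(nmodel * (1 - ((16.0 - i) / 16) ** 0.5))

boundaries = [func(k) for k in range(17)]   # func(0) == 0, func(16) == nmodel
chunks = [sum(nmodel - i for i in range(boundaries[k], boundaries[k + 1]))
          for k in range(16)]
print chunks   # each entry should be roughly 1/16 of sum(chunks)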
I run the program successfully with nmodel=20. When I try to run the program with nmodel=10,000, however, it seems that it never ends. I waited about two days and used the top command in another terminal window; no process with the command "python" is still running. But the program is still running and there is no output file. I used Ctrl+C to force it to stop, and it points to the line x.join(). nmodel=40 ran fast but failed with the same problem.
Maybe this problem has something to do with q, because if I comment out the line q.put(...), it runs successfully. Or something like this:
q.put(...)
q.get()
That is also OK. But the two methods will not give a correct condensed_cc; they don't change all_cc or condensed_cc.
Another example with only one subprocess:
from multiprocessing import Process, Queue

def g(q):
    num = 10**2
    for i in range(num):
        print '='*10
        print i
        q.put((i, i+2))
        print "qsize: ", q.qsize()

q = Queue()
p = Process(target=g, args=(q,))
p.start()
p.join()
while not q.empty():
    q.get()
It is OK with num=100 but fails with num=10,000. Even with num=100**2, it did print all the i values and qsizes; I cannot figure out why. Also, Ctrl+C causes a traceback pointing to p.join().
I want to say more about the size problem of the queue. The documentation introduces Queue as Queue([maxsize]), and it says about the put method: "...block if necessary until a free slot is available". All of this makes one think that the subprocess is blocked because the queue has run out of space. However, as I mentioned in the second example above, the output printed on the screen shows an increasing qsize, meaning that the queue is not full. I added one line:
print q.full()
after the qsize print statement; it is always False for num=10,000 while the program is still stuck somewhere. To emphasize one thing: the top command in another terminal shows no process with the command python. That really puzzles me.
I'm using python 2.7.9.
I believe the problem you are running into is described in the multiprocessing programming guidelines: https://docs.python.org/2/library/multiprocessing.html#multiprocessing-programming
Specifically this section:
Joining processes that use queues
Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the cancel_join_thread() method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.
An example which will deadlock is the following:
from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    p.join()                    # this deadlocks
    obj = queue.get()
A fix here would be to swap the last two lines (or simply remove the p.join() line).
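Applied to the example above, the fix simply drains the queue before joining:
from multiprocessing import Process, Queue

def f(q):
    q.put('X' * 1000000)

if __name__ == '__main__':
    queue = Queue()
    p = Process(target=f, args=(queue,))
    p.start()
    obj = queue.get()   # drain the queue first...
    p.join()            # ...then joining cannot deadlock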
You might also want to check out the section on "Avoid Shared State".
It looks like you are using .join to avoid the race condition of q.empty() returning True before something is added to it. You should not rely on .empty() at all while using multiprocessing (or multithreading). Instead, you should handle this by signaling from the worker process to the main process when it is done adding items to the queue. This is normally done by placing a sentinel value in the queue, but there are other options as well.
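A minimal sketch of the sentinel approach, shaped like the code in the question but with illustrative worker arguments and a dummy computation:
from multiprocessing import Process, Queue

SENTINEL = None   # marker meaning "this worker has finished"

def worker(begin, end, q):
    for i in range(begin, end):
        q.put((i, i * i))   # stand-in for the real (index, cc) result
    q.put(SENTINEL)

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=worker, args=(k * 10, (k + 1) * 10, q))
             for k in range(4)]
    for p in procs:
        p.start()

    results = {}
    finished = 0
    while finished < len(procs):   # drain the queue while the workers run
        item = q.get()
        if item is SENTINEL:
            finished += 1
        else:
            ind, value = item
            results[ind] = value

    for p in procs:                # joining afterwards cannot deadlock
        p.join()
    print len(results)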

Python multiprocessing with an updating queue and an output queue

How can I script a Python multiprocessing setup that uses two Queues like these?
one as a working queue that starts with some data and that, depending on conditions of the functions to be parallelized, receives further tasks on the fly,
another that gathers results and is used to write down the result after processing finishes.
I basically need to put some more tasks in the working queue depending on what I found in its initial items. The example I post below is silly (I could transform the item as I like and put it directly in the output Queue), but its mechanics are clear and reflect part of the concept I need to develop.
Hereby my attempt:
import multiprocessing as mp

def worker(working_queue, output_queue):
    item = working_queue.get()   # I take an item from the working queue
    if item % 2 == 0:
        output_queue.put(item**2)     # If I like it, I do something with it and conserve the result.
    else:
        working_queue.put(item+1)     # If there is something missing, I do something with it and leave the result in the working queue

if __name__ == '__main__':
    static_input = range(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker, args=(working_q, output_q)) for i in range(mp.cpu_count())]  # I am running as many processes as CPUs my machine has (is this wise?).
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    for result in iter(output_q.get, None):
        print result   # alternatively, I would like to (c)pickle.dump this, but I am not sure if it is possible.
This does not end nor print any result.
At the end of the whole process I would like to ensure that the working queue is empty, and that all the parallel functions have finished writing to the output queue before the latter is iterated to take out the results. Do you have suggestions on how to make it work?
The following code achieves the expected results. It follows the suggestions made by @tawmas.
This code allows you to use multiple cores in a process where the queue that feeds data to the workers can be updated by them during the processing:
import multiprocessing as mp

def worker(working_queue, output_queue):
    while True:
        if working_queue.empty() == True:
            break   # this is the so-called 'poison pill'
        else:
            picked = working_queue.get()
            if picked % 2 == 0:
                output_queue.put(picked)
            else:
                working_queue.put(picked+1)
    return

if __name__ == '__main__':
    static_input = xrange(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    results_bank = []
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker, args=(working_q, output_q)) for i in range(mp.cpu_count())]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    results_bank = []
    while True:
        if output_q.empty() == True:
            break
        results_bank.append(output_q.get_nowait())
    print len(results_bank)   # length of this list should be equal to static_input, which is the range used to populate the input queue. In other words, this tells whether all the items placed for processing were actually processed.
    results_bank.sort()
    print results_bank
You have a typo in the line that creates the processes. It should be mp.Process, not mp.process. This is what is causing the exception you get.
Also, you are not looping in your workers, so they actually only consume a single item each from the queue and then exit. Without knowing more about the required logic, it's not easy to give specific advice, but you will probably want to enclose the body of your worker function inside a while True loop and add a condition in the body to exit when the work is done.
Please note that, if you do not add a condition to explicitly exit from the loop, your workers will simply stall forever when the queue is empty. You might consider using the so-called poison pill technique to signal the workers they may exit. You will find an example and some useful discussion in the PyMOTW article on Communication Between processes.
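A minimal sketch of that poison-pill idea applied to the worker above; the POISON_PILL name and the result-counting logic in the parent are illustrative, not taken from the original post (they rely on the fact that, in this toy example, every initial item eventually yields exactly one output):
import multiprocessing as mp

POISON_PILL = None   # special value telling a worker to stop

def worker(working_queue, output_queue):
    while True:
        picked = working_queue.get()      # blocks until an item is available
        if picked is POISON_PILL:
            break                         # explicit exit condition
        if picked % 2 == 0:
            output_queue.put(picked**2)
        else:
            working_queue.put(picked + 1) # put follow-up work back on the queue

if __name__ == '__main__':
    working_q = mp.Queue()
    output_q = mp.Queue()
    static_input = range(100)
    for i in static_input:
        working_q.put(i)

    processes = [mp.Process(target=worker, args=(working_q, output_q))
                 for _ in range(mp.cpu_count())]
    for proc in processes:
        proc.start()

    # Every initial item produces exactly one result here, so collect that many
    # results and only then tell the workers to stop.
    results = [output_q.get() for _ in range(len(static_input))]

    for _ in processes:
        working_q.put(POISON_PILL)
    for proc in processes:
        proc.join()
    print sorted(results)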
As for the number of processes to use, you will need to benchmark a bit to find what works for you, but, in general, one process per core is a good starting point when your workload is CPU bound. If your workload is IO bound, you might have better results with a higher number of workers.

multiple output returned from python multiprocessing function

I am trying to use multiprocessing to return a list, but instead of waiting until all processes are done, I get several returns from one return statement in mp_factorizer, like this:
None
None
(returns list)
In this example I used 2 threads. If I used 5 threads, there would be 5 None returns before the list is put out. Here is the code:
def mp_factorizer(nums, nprocs, objecttouse):
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):
            p = multiprocessing.Process(
                    target=worker,
                    args=(nums[chunksize * i:chunksize * (i + 1)],
                          out_q,
                          objecttouse))
            procs.append(p)
            p.start()
        # Collect all results into a single result dict. We know how many dicts
        # with results to expect.
        resultlist = []
        for i in range(nprocs):
            temp = out_q.get()
            index = 0
            for i in temp:
                resultlist.append(temp[index][0][0:])
                index += 1
        # Wait for all worker processes to finish
        for p in procs:
            p.join()
        resultlist2 = [x for x in resultlist if x != []]
        return resultlist2

def worker(nums, out_q, objecttouse):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outlist = []
    for n in nums:
        outputlist = objecttouse.getevents(n)
        if outputlist:
            outlist.append(outputlist)
    out_q.put(outlist)
mp_factorizer gets a list of items, the number of threads, and an object that the worker should use; it then splits up the list of items so all threads get an equal share of the list, and starts the workers.
The workers then use the object to calculate something from the given list and add the result to the queue.
mp_factorizer is supposed to collect all results from the queue, merge them into one large list, and return that list. However, I get multiple returns.
What am I doing wrong? Or is this expected behavior due to the strange way Windows handles multiprocessing?
(Python 2.7.3, Windows 7 64-bit)
EDIT:
The problem was the wrong placement of if __name__ == '__main__':. I found out while working on another problem; see using multiprocessing in a sub process for a complete explanation.
if __name__ == '__main__' is in the wrong place. A quick fix would be to protect only the call to mp_factorizer like Janne Karila suggested:
if __name__ == '__main__':
    print mp_factorizer(list, 2, someobject)
However, on Windows the main file will be executed once at launch plus once for every worker process, in this case 2. So this would be a total of 3 executions of the main module, excluding the protected part of the code.
This can cause problems as soon as there are other computations being made in the same main module, and at the very least it unnecessarily slows down performance. Even though only the worker function should be executed several times, on Windows everything that is not protected by if __name__ == '__main__' will be executed.
So the solution is to protect the whole main process by executing all code only after if __name__ == '__main__'.
If the worker function is in the same file, however, it needs to be excluded from this if statement, because otherwise it cannot be called several times for multiprocessing.
Pseudocode main thread:
# Import stuff
if __name__ == '__main__':
    # execute whatever you want; it will only be executed
    # as often as you intend it to
    # execute the function that starts multiprocessing,
    # in this case mp_factorizer()
    # there is no worker function code here, it's in another file
    pass
Even though the whole main process is protected, the worker function can still be started, as long as it is in another file.
Pseudocode main thread, with worker function:
# Import stuff

# If the worker code is in the main file, exclude it from the if statement:
def worker():
    # worker code
    pass

if __name__ == '__main__':
    # execute whatever you want; it will only be executed
    # as often as you intend it to
    # execute the function that starts multiprocessing,
    # in this case mp_factorizer()
    pass

# All code outside of the if statement will be executed multiple times,
# depending on the number of assigned worker processes.
For a longer explanation with runnable code, see using multiprocessing in a sub process
Your if __name__ == '__main__' statement is in the wrong place. Put it around the print statement to prevent the subprocesses from executing that line:
if __name__ == '__main__':
    print mp_factorizer(list, 2, someobject)
Now you have the if inside mp_factorizer, which makes the function return None when called inside a subprocess.
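Putting both suggestions together, the corrected structure looks roughly like this. This is a sketch, not the exact original code: the result collection is simplified to extend(), and someobject stands in for whatever object provides getevents():
import math
import multiprocessing

def worker(nums, out_q, objecttouse):
    # same worker as before: compute results and push one list to the queue
    outlist = []
    for n in nums:
        outputlist = objecttouse.getevents(n)
        if outputlist:
            outlist.append(outputlist)
    out_q.put(outlist)

def mp_factorizer(nums, nprocs, objecttouse):
    # no __name__ check in here any more
    out_q = multiprocessing.Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):
        p = multiprocessing.Process(
                target=worker,
                args=(nums[chunksize * i:chunksize * (i + 1)], out_q, objecttouse))
        procs.append(p)
        p.start()
    resultlist = []
    for i in range(nprocs):
        resultlist.extend(out_q.get())   # one list per worker
    for p in procs:
        p.join()
    return [x for x in resultlist if x != []]

if __name__ == '__main__':
    # only the call is guarded, so the subprocesses can still import
    # worker and mp_factorizer without re-running this block
    print mp_factorizer(range(10), 2, someobject)   # someobject is a placeholder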
