I am trying to use multiprocessing to return a list, but instead of waiting until all processes are done, I get several returns from one return statement in mp_factorizer, like this:
None
None
(returns list)
In this example I used 2 processes. If I used 5 processes, there would be 5 None returns before the list is printed. Here is the code:
def mp_factorizer(nums, nprocs, objecttouse):
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):
            p = multiprocessing.Process(
                    target=worker,
                    args=(nums[chunksize * i:chunksize * (i + 1)],
                          out_q,
                          objecttouse))
            procs.append(p)
            p.start()
        # Collect all results into a single result dict. We know how many dicts
        # with results to expect.
        resultlist = []
        for i in range(nprocs):
            temp = out_q.get()
            index = 0
            for i in temp:
                resultlist.append(temp[index][0][0:])
                index += 1
        # Wait for all worker processes to finish
        for p in procs:
            p.join()
        resultlist2 = [x for x in resultlist if x != []]
        return resultlist2

def worker(nums, out_q, objecttouse):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outlist = []
    for n in nums:
        outputlist = objecttouse.getevents(n)
        if outputlist:
            outlist.append(outputlist)
    out_q.put(outlist)
mp_factorizer gets a list of items, the number of processes, and an object that the workers should use. It then splits up the list of items so that all processes get an equal share, and starts the workers.
The workers then use the object to calculate something from the given list and add the result to the queue.
mp_factorizer is supposed to collect all results from the queue, merge them into one large list, and return that list. However, I get multiple returns.
What am I doing wrong? Or is this expected behavior due to the strange way Windows handles multiprocessing?
(Python 2.7.3, Windows7 64bit)
EDIT:
The problem was the wrong placement of if __name__ == '__main__':. I found this out while working on another problem; see using multiprocessing in a sub process for a complete explanation.
if __name__ == '__main__' is in the wrong place. A quick fix would be to protect only the call to mp_factorizer like Janne Karila suggested:
if __name__ == '__main__':
    print mp_factorizer(list, 2, someobject)
However, on Windows the main file is executed once when the program starts, plus once for every worker process, in this case 2. So there would be a total of 3 executions of the main file, excluding the protected part of the code.
This can cause problems as soon as other computations are made in the same main file, and at the very least it slows performance down unnecessarily. Even though only the worker function should be executed several times, on Windows everything that is not protected by if __name__ == '__main__' will be executed as well.
So the solution is to protect the whole main program by executing all code only after if __name__ == '__main__':.
If the worker function is in the same file, however, it needs to stay outside of this if statement, because otherwise the worker processes cannot find it when they import the file.
Pseudocode main thread:
# Import stuff
if __name__ == '__main__':
    #execute whatever you want, it will only be executed
    #as often as you intend it to
    #execute the function that starts multiprocessing,
    #in this case mp_factorizer()
    #there is no worker function code here, it's in another file.
Even though the whole main program is protected, the worker function can still be started, as long as it is in another file.
Pseudocode main thread, with worker function:
# Import stuff
#If the worker code is in the main thread, exclude it from the if statement:
def worker():
    #worker code
if __name__ == '__main__':
    #execute whatever you want, it will only be executed
    #as often as you intend it to
    #execute the function that starts multiprocessing,
    #in this case mp_factorizer()
#All code outside of the if statement will be executed multiple times
#depending on the # of assigned worker threads.
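As a runnable illustration of this layout (the squaring worker is only a placeholder for the example):

import multiprocessing

# The worker stays at module level so the spawned processes can import it.
def worker(nums, out_q):
    # Placeholder work for this example: square every number.
    out_q.put([n * n for n in nums])

if __name__ == '__main__':
    # Everything in here runs only once, in the main process.
    out_q = multiprocessing.Queue()
    procs = []
    for chunk in ([1, 2], [3, 4]):
        p = multiprocessing.Process(target=worker, args=(chunk, out_q))
        procs.append(p)
        p.start()
    results = [out_q.get() for _ in procs]
    for p in procs:
        p.join()
    print(results)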
For a longer explanation with runnable code, see using multiprocessing in a sub process
Your if __name__ == '__main__' statement is in the wrong place. Put it around the print statement to prevent the subprocesses from executing that line:
if __name__ == '__main__':
    print mp_factorizer(list, 2, someobject)
Now you have the if inside mp_factorizer, which makes the function return None when called inside a subprocess.
I'm trying to run a few Python processes, and I want to kill all of them as soon as I get a result from one of them. How do I do that?
In the code below, a loop starts 10 processes, each printing "hello world (i)". How can I stop after the first print?
I'll put a small example (modified from https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing):
# MAIN
from multiprocessing import Process, Lock
import globals
import globalsOperations

globals.init()

def f(l, i):
    # l.acquire()
    # try:
    if not globalsOperations.get_my_bool_state():
        print(globalsOperations.get_my_bool_state())
        print('hello world', i)
        globalsOperations.set_my_bool_state(True)
        print(globalsOperations.get_my_bool_state())
    # finally:
    #     l.release()

if __name__ == '__main__':
    lock = Lock()
    for num in range(10):
        Process(target=f, args=(lock, num)).start()
# globals.py
def init():
    global my_bool
    my_bool = False
# globalsOperations.py
import globals

def set_my_bool_state(bool_value):
    globals.my_bool = bool_value

def get_my_bool_state():
    return globals.my_bool
The Lock is commented out because I've already tried to stop after the first success with it, with no luck.
So, to the question: how do I stop after the first result?
Preferably with no memory leaks when releasing the processes.
(I'm not asking a lot of questions here so don't be too harsh on me :) )
thanks!
Your biggest problem is the failure to recognize that each process has its own copy of memory, so when one process modifies a global variable the memory spaces of the other processes are not updated. In short, your program cannot possibly work as written. So the global state either has to be located in shared memory or has to be a managed object represented by a proxy. I have used the latter since it requires the fewest syntactical changes to the way you access your global data. This is a huge topic. See this.
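For reference, the shared-memory route could look roughly like this; a minimal sketch using multiprocessing.Value with a simplified worker (this is not the approach I use below):

from multiprocessing import Process, Value, Lock

def f(done, lock, i):
    # 'done' is a flag living in shared memory, visible to every process.
    with lock:
        if not done.value:
            done.value = True
            print('hello world', i)

if __name__ == '__main__':
    done = Value('b', False)  # shared boolean ('b' = signed char)
    lock = Lock()
    procs = [Process(target=f, args=(done, lock, i)) for i in range(10)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

Note that this only shares the flag; it does not terminate the other processes early, which is what the pool-based approach below adds.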
Second, I would suggest using a multiprocessing pool, e.g. a multiprocessing.pool.Pool instance combined with the imap_unordered method, rather than individual multiprocessing.Process instances. The imap_unordered method returns an iterator that you can use to iterate results from your worker function f as soon as they become available. You now need to modify f to return True or False based upon whether its invocation was the first to set globals.my_bool to True or not. As soon as the main process gets a True result, it can call terminate on the pool, killing any tasks that are running or scheduled to run.
There will be some lag between a task completing successfully and the main process detecting that and terminating the remaining tasks. In that window of time, a few of the other submitted tasks may run to completion.
Finally, globals is a built-in function name and should not be used for other purposes, such as the name of a module or variable. So I will be using the name gbls instead.
And you do need to use locking or multiple tasks can think that they are the first to succeed.
There is a lot here for you to be investigating:
from multiprocessing import Manager, Pool, Lock

def init_processes(g, l):
    """
    Initialize the global variable(s) for each process
    in the multiprocessing pool.

    In this case we initialize variable gbls with a proxy to a
    managed Namespace object.
    """
    global gbls, lock
    gbls, lock = g, l

def set_my_bool_state(bool_value):
    gbls.my_bool = bool_value

def get_my_bool_state():
    return gbls.my_bool

def f(i):
    with lock:
        if not get_my_bool_state():
            print(get_my_bool_state())
            print('hello world', i, flush=True)
            set_my_bool_state(True)
            print(get_my_bool_state())
            return True # we were the first to succeed
        else:
            # A few of these might print before the pool is terminated:
            print('Already set.', i, flush=True)
            return False # we were not the first to succeed

if __name__ == '__main__':
    with Manager() as manager:
        gbls = manager.Namespace()
        gbls.my_bool = False
        lock = Lock()
        pool = Pool(10, initializer=init_processes, initargs=(gbls, lock))
        for result in pool.imap_unordered(f, range(10)):
            if result: # first to succeed:
                break
        pool.terminate() # kill all remaining tasks
        # Wait for all processes to end:
        pool.join()
Prints:
False
hello world 0
True
Already set. 1
Already set. 2
I am in the following setting: I have a method that takes an objective function f as input. As a subroutine of that method I want to evaluate f on a small set of points. Since f has high complexity, I considered doing that in parallel.
All online examples hang even for trivial functions like squaring on sets with 5 points. They are using the multiprocessing library, and I don't know what I am doing wrong. I am not sure how to encapsulate that __name__ == "__main__" statement in my method. (Since it is part of a module, I guess instead of "__main__" I should use the module name?)
The code I have been using looks like this:
from multiprocessing.pool import Pool
from multiprocessing import cpu_count

x = [1,2,3,4,5]
num_cores = cpu_count()

def f(x):
    return x**2

if __name__ == "__main__":
    pool = Pool(num_cores)
    y = list(pool.map(f, x))
    pool.join()
    print(y)
When executing this code in Spyder it takes a bloody long time to finish.
So my main questions are: What am I doing wrong in this code? How can I encapsulate the __name__ statement when this code is part of a bigger method?
Is it even worth parallelizing this? (One function evaluation can take multiple minutes, and in serial this adds up to a total runtime of hours...)
According to the documentation:
close()
Prevents any more tasks from being submitted to the pool. Once all the tasks have been completed the worker processes will exit.
terminate()
Stops the worker processes immediately without completing outstanding work. When the pool object is garbage collected, terminate() will be called immediately.
join()
Wait for the worker processes to exit. One must call close() or terminate() before using join().
So you should add:
from multiprocessing.pool import Pool
from multiprocessing import cpu_count

x = [1,2,3,4,5]

def f(x):
    return x**2

if __name__ == "__main__":
    pool = Pool()
    y = list(pool.map(f, x))
    pool.close()
    pool.join()
    print(y)
You can call Pool without any arguments and it will use cpu_count() by default:
If processes is None then the number returned by cpu_count() is used
About the if __name__ == "__main__" part, read more information here.
So you need to think a bit about which code you want executed only in the main program. The most obvious example is that you want code that creates child processes to run only in the main program - so that should be protected by if __name__ == '__main__'.
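As for encapsulating this when the code is part of a bigger method: only the script you actually run needs the guard, not the module that defines the pool code. A sketch, with made-up module and function names, assuming Python 3:

# mymodule.py (hypothetical) - no __main__ guard needed here, because
# importing this module does not create any processes by itself.
from multiprocessing import Pool

def f(x):
    return x**2

def evaluate_parallel(points):
    # The pool is created only when this function is actually called.
    with Pool() as pool:
        return pool.map(f, points)

# main.py (hypothetical) - only the entry point needs the guard.
from mymodule import evaluate_parallel

if __name__ == "__main__":
    print(evaluate_parallel([1, 2, 3, 4, 5]))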
You might want to look into the chunksize argument of the map function that you are using.
On a large enough input list, a lot of your time is spent simply communicating the arguments to and from the separate parallel processes.
One symptom of this problem is that when you use something like htop all cores are firing but at < 100%.
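For example, something along these lines (the input size and chunk size here are arbitrary, for illustration only):

from multiprocessing import Pool

def f(x):
    return x**2

if __name__ == "__main__":
    x = list(range(1000000))
    with Pool() as pool:
        # Each task now carries 10000 items instead of one, so far fewer
        # messages travel between the main process and the workers.
        y = pool.map(f, x, chunksize=10000)
    print(y[:5])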
I have a program that needs to create several graphs, with each one often taking hours. Therefore I want to run these simultaneously on different cores, but cannot seem to get these processes to run with the multiprocessing module. Here is my code:
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()
        p.join()
(full_graph() has been defined earlier in the program, and is simply a function that runs a collection of other functions)
The function normally outputs some graphs, and saves the data to a .txt file. All data is saved to the same 2 text files. However, calling the functions using the above code gives no console output, nor any output to the text file. All that happens is a few second long pause, and then the program exits.
I am using the Spyder IDE with WinPython 3.6.3
Without a simple full_graph sample nobody can tell you what's happening. But your code is inherently wrong.
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()
        p.join() # <- This would block until p is done
See the comment after p.join(). If your processes really take hours to complete, you would run one process for hours, then the 2nd, then the 3rd: serially, and on a single core.
From the docs: https://docs.python.org/3/library/multiprocessing.html
Process.join: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.join
If the optional argument timeout is None (the default), the method blocks until the process whose join() method is called terminates. If timeout is a positive number, it blocks at most timeout seconds. Note that the method returns None if its process terminates or if the method times out. Check the process’s exitcode to determine if it terminated.
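A minimal fix that keeps the Process approach is to start all the processes first and only then join them, roughly like this (assuming the same import and full_graph as in your code):

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()      # start every process first...
    for p in jobs:
        p.join()       # ...then wait for all of them together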
If each process does something different, you should then also have some args for full_graph (hint: might that be the missing factor?).
You probably want to use an interface like map from Pool
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
And do (from the docs again)
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
I am performing a large parallel mapping computation from within an IPython notebook. I am mapping a dataframe by subject and condition to a machine learning prediction function, and I want each subject and condition to be spread among 20 cores.
def map_vars_to_functionPredict(subject,condition):
    ans = map(predictBasic, [subject],[df],[condition])
    return ans

def main_helperPredict(args):
    return map_vars_to_functionPredict(*args)

def parallel_predict(subjects, conditions):
    p = Pool(20)
    # set each matching item into a tuple
    job_args = list(itertools.product(*[subjects,conditions]))
    print job_args
    # map to pool
    ans = p.map(main_helperPredict, job_args)
    p.close()
    p.join()
    return ans
When I run these functions from iPython Notebook after starting the notebook, they run quickly and as expected (in 'Running' state at ~100% cpu in 20 cores). However, sometimes if I re-run the parallel_predict function right after running it for the first time, all 20 processes are marked as in uninterruptible sleep (D) state for no reason. I am not writing anything to disk, just having the output as a variable in iPython notebook.
As a last-ditch attempt, I have tried including del p after p.join(), and this helped somewhat (the function runs normally more often), but I still occasionally have the issue of processes being in the D state, especially if I have a lot of processes in the queue.
Edit:
In general, adding del p after p.join() kept the processes from entering (D) state, but I continued to have an issue where the function would finish all the processes (as far as I could tell from top), but it would not return results. When I stopped the iPython Notebook kernel, I got the error ZMQError: Address already in use.
How should I properly start or finish the multiprocessing Pool to keep this from happening?
I changed four things and now 1) the processes no longer go into (D) state and 2) I can run these functions back-to-back and they always return results and don't hang.
To parallel_predict, I added freeze_support() and replaced p.close() with p.terminate(). (I also added a print line, which I don't think makes a difference, but I'm including it since all of this is superstition anyway.) I also added del p.
def parallel_predict(subjects, conditions):
    freeze_support()
    p = Pool(20)
    # set each matching item into a tuple
    job_args = list(itertools.product(*[subjects,conditions]))
    print job_args
    # map to pool
    ans = p.map(main_helperPredict, job_args)
    p.terminate()
    p.join()
    del p
    print "finished"
    return ans
Finally, I embedded the line where I call parallel_predict in if __name__ == "__main__" as such:
if __name__ == "__main__":
all_results = parallel_predict(subjects,conditions)
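For what it's worth, the same cleanup can also be written with try/finally so that terminate() and join() always run even if map raises. This is only a sketch of that idea (assuming the same imports and helpers as above), not the exact fix I used:

def parallel_predict(subjects, conditions):
    p = Pool(20)
    try:
        job_args = list(itertools.product(subjects, conditions))
        return p.map(main_helperPredict, job_args)
    finally:
        p.terminate()  # stop the workers even if map() raised
        p.join()       # and wait for them to exit before returning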
How can I write a Python multiprocessing script that uses two Queues like these?:
one as a working queue that starts with some data and that, depending on conditions of the functions to be parallelized, receives further tasks on the fly,
another that gathers results and is used to write down the result after processing finishes.
I basically need to put some more tasks in the working queue depending on what I found in its initial items. The example I post below is silly (I could transform the item as I like and put it directly in the output Queue), but its mechanics are clear and reflect part of the concept I need to develop.
Hereby my attempt:
import multiprocessing as mp

def worker(working_queue, output_queue):
    item = working_queue.get() #I take an item from the working queue
    if item % 2 == 0:
        output_queue.put(item**2) # If I like it, I do something with it and conserve the result.
    else:
        working_queue.put(item+1) # If there is something missing, I do something with it and leave the result in the working queue

if __name__ == '__main__':
    static_input = range(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker,args=(working_q, output_q)) for i in range(mp.cpu_count())] #I am running as many processes as CPU my machine has (is this wise?).
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    for result in iter(output_q.get, None):
        print result #alternatively, I would like to (c)pickle.dump this, but I am not sure if it is possible.
This neither ends nor prints any result.
At the end of the whole process I would like to ensure that the working queue is empty, and that all the parallel functions have finished writing to the output queue before the latter is iterated to take out the results. Do you have suggestions on how to make it work?
The following code achieves the expected results. It follows the suggestions made by @tawmas.
This code makes it possible to use multiple cores in a process where the queue that feeds data to the workers can be updated by them during the processing:
import multiprocessing as mp

def worker(working_queue, output_queue):
    while True:
        if working_queue.empty() == True:
            break #this is the so-called 'poison pill'
        else:
            picked = working_queue.get()
            if picked % 2 == 0:
                output_queue.put(picked)
            else:
                working_queue.put(picked+1)
    return

if __name__ == '__main__':
    static_input = xrange(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    results_bank = []
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker,args=(working_q, output_q)) for i in range(mp.cpu_count())]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    results_bank = []
    while True:
        if output_q.empty() == True:
            break
        results_bank.append(output_q.get_nowait())
    print len(results_bank) # length of this list should be equal to static_input, which is the range used to populate the input queue. In other words, this tells whether all the items placed for processing were actually processed.
    results_bank.sort()
    print results_bank
You have a typo in the line that creates the processes. It should be mp.Process, not mp.process. This is what is causing the exception you get.
Also, you are not looping in your workers, so they actually only consume a single item each from the queue and then exit. Without knowing more about the required logic, it's not easy to give specific advice, but you will probably want to enclose the body of your worker function inside a while True loop and add a condition in the body to exit when the work is done.
Please note that, if you do not add a condition to explicitly exit from the loop, your workers will simply stall forever when the queue is empty. You might consider using the so-called poison pill technique to signal the workers that they may exit. You will find an example and some useful discussion in the PyMOTW article on Communication Between Processes.
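A minimal sketch of that technique (the None sentinel and the squaring worker are placeholders for illustration only):

import multiprocessing as mp

def worker(working_queue, output_queue):
    # Keep consuming until the sentinel (None) arrives, then exit.
    for item in iter(working_queue.get, None):
        output_queue.put(item ** 2)

if __name__ == '__main__':
    working_q = mp.Queue()
    output_q = mp.Queue()
    n_workers = mp.cpu_count()
    for i in range(100):
        working_q.put(i)
    for _ in range(n_workers):
        working_q.put(None)            # one poison pill per worker
    processes = [mp.Process(target=worker, args=(working_q, output_q))
                 for _ in range(n_workers)]
    for proc in processes:
        proc.start()
    results = [output_q.get() for _ in range(100)]  # drain before joining
    for proc in processes:
        proc.join()
    print(sorted(results))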
As for the number of processes to use, you will need to benchmark a bit to find what works for you, but, in general, one process per core is a good starting point when your workload is CPU bound. If your workload is IO bound, you might have better results with a higher number of workers.