I have some code that runs in parallel using the multiprocessing Pool class. Unfortunately, some of the functions I use from another library produce verbose output. To abstract the problem, have a look at the following example:
from multiprocessing import Pool

def f(x):
    print('hello')
    return x*x

p = Pool(10)
p.map(f, [1, 2, 3])
So this will print 'hello' once per element in the list (three times here), even though the pool has 10 workers.
Is there a way to mute the output of the worker processes, or to capture their stdout in a variable? Thanks for any suggestions.
EDIT:
I don't want to redirect stdout for the whole program, just for the processes in my pool.
Use the initializer parameter to call mute in the worker processes:
import sys
import os
from multiprocessing import Pool

def mute():
    # Redirect this worker's stdout to /dev/null
    sys.stdout = open(os.devnull, 'w')

def f(x):
    print('hello')
    return x*x

if __name__ == '__main__':
    p = Pool(10, initializer=mute)
    p.map(f, [1, 2, 3])
    print('Hello')
This prints only
Hello
because only the worker processes have their stdout muted; the parent process still prints normally.
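If you want to capture the workers' output instead of discarding it, one option (a sketch of my own, not part of the original answer) is to redirect stdout inside the worker function itself and return the captured text along with the result:

import io
import contextlib
from multiprocessing import Pool

def f(x):
    # Collect anything printed during this call into a string buffer
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        print('hello')
        result = x*x
    # Send the captured text back alongside the actual result
    return result, buf.getvalue()

if __name__ == '__main__':
    with Pool(4) as p:
        for result, captured in p.map(f, [1, 2, 3]):
            print(result, repr(captured))

Each worker then returns its captured text through the normal result channel, so the parent process can log or inspect it.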
Related
I have this code:
from multiprocessing import Process

def f():
    print('hello')

if __name__ == '__main__':
    p = Process(target=f)
    p.start()
    print(p.is_alive())
    p.join()
Although it prints that the process is alive, the f() function never runs. This is a simple code example that I want to use to understand why it doesn't work.
Here is some pseudocode for what I'm doing:
import multiprocessing as mp
from multiprocessing import Manager
from tqdm import tqdm

def loop(arg):
    # do stuff
    # ...
    results.append(result_of_stuff)

if __name__ == '__main__':
    manager = Manager()
    results = manager.list()

    with mp.get_context('spawn').Pool(4) as pool:
        list(tqdm(pool.imap(loop, ls), total=len(ls)))

    # do stuff with `results`
    # ...
So the issue here is that loop doesn't know about results. I have one working way to do this, which is to use "fork" instead of "spawn", but I need to use "spawn" for reasons beyond the scope of my question.
So what is the minimal change I need to make for this to work? And I really want to keep tqdm, hence the use of imap.
PS: I'm on Linux.
You can use functools.partial to add the extra parameters:
import multiprocessing as mp
import os
from functools import partial
from multiprocessing import Manager
from tqdm import tqdm

def loop(results, arg):
    results.append(len(arg))

def main():
    ctx = mp.get_context("spawn")
    manager = Manager()
    l = manager.list()
    partial_loop = partial(loop, l)
    ls = os.listdir("/tmp")
    with ctx.Pool() as pool:
        results = list(tqdm(pool.imap(partial_loop, ls), total=len(ls)))
    print(f"Sum: {sum(l)}")

if __name__ == "__main__":
    main()
There is some overhead with this approach, since it spawns a child process to host the Manager server.
Since you will process the results in the main process anyway, I would do something like this instead (but this depends on your circumstances, of course):
import multiprocessing as mp
import os
from tqdm import tqdm

def loop(arg):
    return len(arg)

def main():
    ctx = mp.get_context("spawn")
    ls = os.listdir("/tmp")
    with ctx.Pool() as pool:
        results = list(tqdm(pool.imap(loop, ls), total=len(ls)))
    print(f"Sum: {sum(results)}")

if __name__ == "__main__":
    main()
I know you have already accepted an answer, but let me add my "two cents":
The other way of solving your issue is to initialize each process in your pool with the global variable results, as originally intended. The problem was that with spawn, newly created processes do not inherit the address space of the main process (which included the definition of results); instead, execution starts from the top of the program, and the code that creates results never runs because of the if __name__ == '__main__' check. But that is a good thing, because you do not want a separate instance of this list in each worker anyway.
So how do we share the same instance of the global variable results across all processes? This is accomplished by using a pool initializer, as shown below. Also, if you want an accurate progress bar, you should really use imap_unordered instead of imap, so that the progress bar is updated in task-completion order rather than in the order in which tasks were submitted. For example, if the first task submitted happened to be the last one to complete, using imap would leave the progress bar stuck until all the tasks had completed and then it would jump to 100% all at once.
But note: the documentation for imap_unordered only states that results are returned in arbitrary order, not in completion order. It does, however, seem that when a chunksize argument of 1 is used (the default if not explicitly specified), results are returned in completion order. If you do not want to rely on this, use apply_async instead with a callback function that updates the progress bar. See the last code example.
import multiprocessing as mp
from multiprocessing import Manager
from tqdm import tqdm

def init_pool(the_results):
    global results
    results = the_results

def loop(arg):
    import time
    # do stuff
    # ...
    time.sleep(1)
    results.append(arg ** 2)

if __name__ == '__main__':
    manager = Manager()
    results = manager.list()
    ls = list(range(1, 10))
    with mp.get_context('spawn').Pool(4, initializer=init_pool, initargs=(results,)) as pool:
        list(tqdm(pool.imap_unordered(loop, ls), total=len(ls)))
    print(results)
Update: Another (Better) Way
import multiprocessing as mp
from tqdm import tqdm

def loop(arg):
    import time
    # do stuff
    # ...
    time.sleep(1)
    return arg ** 2

if __name__ == '__main__':
    results = []
    ls = list(range(1, 10))
    with mp.get_context('spawn').Pool(4) as pool:
        with tqdm(total=len(ls)) as pbar:
            for v in pool.imap_unordered(loop, ls):
                results.append(v)
                pbar.update(1)
    print(results)
Update: The Safest Way
import multiprocessing as mp
from tqdm import tqdm

def loop(arg):
    import time
    # do stuff
    # ...
    time.sleep(1)
    return arg ** 2

def my_callback(v):
    results.append(v)
    pbar.update(1)

if __name__ == '__main__':
    results = []
    ls = list(range(1, 10))
    with mp.get_context('spawn').Pool(4) as pool:
        with tqdm(total=len(ls)) as pbar:
            for arg in ls:
                pool.apply_async(loop, args=(arg,), callback=my_callback)
            pool.close()
            pool.join()
    print(results)
I'm running multiple threads in Python. I've tried using the threading module and the multiprocessing module. Even though the execution gives the correct result, every time the terminal gets stuck and the printing of the output gets messed up.
Here's a simplified version of the code.
import subprocess
import threading
import argparse
import sys

result = []

def check_thread(args, components, id):
    for i in components:
        cmd = <command to be given to terminal>
        output = subprocess.check_output([cmd], shell=True)
        result.append((id, i, output))

def check(args, components):
    # lock = threading.Lock()
    # lock = threading.Semaphore(value=1)
    thread_list = []
    for id in range(3):
        t = threading.Thread(target=check_thread, args=(args, components, id))
        thread_list.append(t)
    for thread in thread_list:
        thread.start()
    for thread in thread_list:
        thread.join()
    for res in result:
        print(res)
    return res

if __name__ == '__main__':
    parser = argparse.ArgumentParser(....)
    parser.add_argument(.....)
    args = parser.parse_args()
    components = ['comp1', 'comp2']
    while True:
        print('SELECTION MENU\n1)\n2)\n')
        option = raw_input('Enter option')
        if option == '1':
            res = check(args, components)
        elif option == '2':
            <do something else>
        else:
            sys.exit(0)
I've tried using the multiprocessing module with Process and Pool, tried passing a lock to check_thread, and tried returning a value from check_thread() and collecting the values with a queue, but every time it's the same result: execution is successful but the terminal gets stuck and the printed output is shabby.
Is there any fix for this? I'm using Python 2.7 on a Linux terminal.
Here is how the shabby output looks: [screenshot of the garbled terminal output]
You should use a multiprocessing Queue rather than a plain list.
import multiprocessing as mp
import random
import string

# Define an output queue
output = mp.Queue()

# Define an example function
def function(params, output):
    """Generates a random string of numbers, lower- and uppercase chars."""
    # Process params and store the result in res
    # (here: build a random string of length `params`, per the docstring)
    res = ''.join(random.choice(string.ascii_letters + string.digits)
                  for _ in range(params))
    output.put(res)

# Set up a list of processes that we want to run
processes = [mp.Process(target=function, args=(5, output)) for x in range(10)]

# Run the processes
for p in processes:
    p.start()

# Exit the completed processes
for p in processes:
    p.join()

# Get process results from the output queue
results = [output.get() for p in processes]
print(results)
I have the following code but cannot get the results out of the iterator:
from multiprocess import freeze_support
from pathos.multiprocessing import ProcessPool

if __name__ == "__main__":
    freeze_support()
    pool = ProcessPool(nodes=4)
    results = pool.uimap(pow, [1, 2, 3, 4], [5, 6, 7, 8])
    print("...")
    print(list(results))
The code does not error; it just hangs.
There are a couple of subtleties to get this to work, but the short version is that imap and uimap return iterators, unlike map in the standard multiprocessing example. To extract the results, consume the iterator in a for loop. If the worker is defined inside a class, it also needs to be a @staticmethod; see the sketch after the code below.
from multiprocessing import freeze_support
from multiprocessing import Pool

def f(vars):
    return vars[0]**vars[1]

if __name__ == "__main__":
    freeze_support()
    pool = Pool(4)
    for run in pool.imap(f, [(1, 5), (2, 8), (3, 9)]):
        print(run)
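For the class case, here is a minimal sketch of my own (the Runner class and its methods are made up for illustration, not taken from the question) showing the worker declared as a @staticmethod so the pool only has to pickle a plain function:

from multiprocessing import Pool

class Runner:
    # The worker is a @staticmethod so the pool can pickle it by name
    # instead of trying to pickle a bound method and the whole instance.
    @staticmethod
    def f(vars):
        return vars[0] ** vars[1]

    def run(self):
        with Pool(4) as pool:
            # imap returns an iterator, so pull the results out in a for loop
            for result in pool.imap(Runner.f, [(1, 5), (2, 8), (3, 9)]):
                print(result)

if __name__ == "__main__":
    Runner().run()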
I have a simple Python multiprocessing script that sets up a pool of workers that append work output to a Manager list. The script has three call stacks: main calls f1, which spawns several worker processes that call another function, g1. When one attempts to debug the script (incidentally on Windows 7/64-bit/VS 2010/PyTools), it runs into a nested process-creation loop, spawning an endless number of processes. Can anyone determine why? I'm sure I am missing something very simple. Here's the problematic code:
import multiprocessing
import logging

manager = multiprocessing.Manager()
results = manager.list()

def g1(x):
    y = x*x
    print "processing: y = %s" % y
    results.append(y)

def f1():
    logger = multiprocessing.log_to_stderr()
    logger.setLevel(multiprocessing.SUBDEBUG)
    pool = multiprocessing.Pool(processes=4)
    for i in range(0, 15):
        pool.apply_async(g1, [i])
    pool.close()
    pool.join()

def main():
    f1()

if __name__ == "__main__":
    main()
PS: tried adding multiprocessing.freeze_support() to main to no avail.
Basically, what sr2222 mentions in his comment is correct. The multiprocessing manager docs say that the __main__ module must be importable by the children. Each manager object "corresponds to a spawned child process", so each child basically re-imports your module (you can see this by adding a print statement at module scope to my fixed version!), which leads to infinite recursion.
One solution would be to move your manager code into f1():
import multiprocessing
import logging

def g1(results, x):
    y = x*x
    print "processing: y = %s" % y
    results.append(y)

def f1():
    logger = multiprocessing.log_to_stderr()
    logger.setLevel(multiprocessing.SUBDEBUG)
    manager = multiprocessing.Manager()
    results = manager.list()
    pool = multiprocessing.Pool(processes=4)
    for i in range(0, 15):
        pool.apply_async(g1, [results, i])
    pool.close()
    pool.join()

def main():
    f1()

if __name__ == "__main__":
    main()