I am trying to use the multiprocessing package for Python. However, when I try to define a pool in IDLE, it silently crashes (no error message or traceback; IDLE just closes). The same script runs without a problem when executed from the terminal. What gives? I am using Python 2.7 on Ubuntu 12.04.
import multiprocessing
from multiprocessing import Pool

def myfunc(x):
    return x * x

cpu_count = int(multiprocessing.cpu_count() - 1)
pool = Pool(processes=cpu_count)  # Crashes here in IDLE
resultlist = pool.map(myfunc, range(10))
pool.close()
print(resultlist)
The problem is your code: it is missing the if __name__ == '__main__': guard that is an essential part of every working example in the multiprocessing chapter of the docs. The guard keeps each subprocess from re-running the setup and teardown code. Running the following in IDLE (or without IDLE, in a console)
import multiprocessing
from multiprocessing import Pool

def myfunc(x):
    return x * x

if __name__ == '__main__':
    cpu_count = int(multiprocessing.cpu_count() - 1)
    pool = Pool(processes=cpu_count)  # No longer crashes in IDLE
    resultlist = pool.map(myfunc, range(10))
    pool.close()
    print(resultlist)
almost immediately prints
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
If you run your original code in a console (or with IDLE started from a console), you will see an endless stream of error messages as each new process starts up more processes.
Edit: the above behavior is on Windows 10
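A minimal sketch (my own illustration, not code from the question) that makes the re-import visible: with the spawn start method used on Windows, every worker process re-imports the main module, so module-level code runs once per worker in addition to the parent, which is exactly why unguarded pool creation multiplies.

import multiprocessing

# Module-level code: with the spawn start method this line runs again in every
# worker process, not just in the parent.
print(f"module imported in process: {multiprocessing.current_process().name}")

def square(x):
    return x * x

if __name__ == '__main__':
    with multiprocessing.Pool(2) as pool:
        print(pool.map(square, range(4)))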
I suggest you ignore opinions and decide for yourself what tools you want to use, based on current, real facts. Try to use the latest bugfix release of whatever Python version you use. For IDLE in particular, there have been many fixes in the last two years, after a couple of years of stagnation.
Related
I have a command-line program which runs on a single core. It takes an input file, does some calculations, and writes several output files which I then need to parse to store the results.
I have to call the program several times with different input files. To speed things up, I was thinking parallelization would be useful.
Until now I have performed this task by calling each run separately within a loop, using the subprocess module.
I wrote a script which creates a new working folder on every run, then calls the program with its output directed to that folder, and returns some data which I need to store. My question is: how can I adapt the following code, found here, so that it always executes my script with the indicated number of CPUs and stores the output?
Note that each run has a unique running time.
Here is the mentioned code:
import subprocess
import multiprocessing as mp
from tqdm import tqdm

NUMBER_OF_TASKS = 4
progress_bar = tqdm(total=NUMBER_OF_TASKS)

def work(sec_sleep):
    command = ['python', 'worker.py', sec_sleep]
    subprocess.call(command)

def update_progress_bar(_):
    progress_bar.update()

if __name__ == '__main__':
    pool = mp.Pool(NUMBER_OF_TASKS)

    for seconds in [str(x) for x in range(1, NUMBER_OF_TASKS + 1)]:
        pool.apply_async(work, (seconds,), callback=update_progress_bar)

    pool.close()
    pool.join()
I am not entirely clear on what your issue is. I have some recommendations for improvement below, but on the page you link to you seem to say that everything works as expected, and I don't see anything very wrong with the code as long as you are running on Linux.
Since subprocess.call already creates a new process, you only need multithreading to invoke your worker function, work. Had you been using multiprocessing on a platform that uses the spawn method to create new processes (such as Windows), then creating the progress bar outside the if __name__ == '__main__': block would have resulted in 4 additional progress bars that did nothing. Not good! So for portability it is best to create the progress bar inside the if __name__ == '__main__': block.
import subprocess
from multiprocessing.pool import ThreadPool
from tqdm import tqdm

def work(sec_sleep):
    command = ['python', 'worker.py', sec_sleep]
    subprocess.call(command)

def update_progress_bar(_):
    progress_bar.update()

if __name__ == '__main__':
    NUMBER_OF_TASKS = 4
    progress_bar = tqdm(total=NUMBER_OF_TASKS)
    pool = ThreadPool(NUMBER_OF_TASKS)

    for seconds in [str(x) for x in range(1, NUMBER_OF_TASKS + 1)]:
        pool.apply_async(work, (seconds,), callback=update_progress_bar)

    pool.close()
    pool.join()
Note: If your worker.py program prints to the console, it will mess up the progress bar (the progress bar will be re-written repeatedly on multiple lines).
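If you do need the workers' console output, one option (a sketch, under the assumption that collecting worker.py's stdout afterwards is acceptable) is to capture it instead of letting it print, which keeps the progress bar intact:

import subprocess

def work(sec_sleep):
    # capture_output keeps worker.py's prints off the console so tqdm stays clean
    completed = subprocess.run(
        ['python', 'worker.py', sec_sleep],
        capture_output=True,
        text=True,
    )
    return completed.stdout  # available later via the AsyncResult or a callback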
Have you considered importing worker.py (some refactoring of that code might be necessary) instead of invoking a new Python interpreter to execute it (in this case you would want to be explicitly using multiprocessing)? On Windows this might not save you anything, since a new Python interpreter is executed for each new process anyway, but it could save you time on Linux:
from multiprocessing.pool import Pool
from worker import do_work
from tqdm import tqdm

def update_progress_bar(_):
    progress_bar.update()

if __name__ == '__main__':
    NUMBER_OF_TASKS = 4
    progress_bar = tqdm(total=NUMBER_OF_TASKS)
    pool = Pool(NUMBER_OF_TASKS)

    for seconds in [str(x) for x in range(1, NUMBER_OF_TASKS + 1)]:
        pool.apply_async(do_work, (seconds,), callback=update_progress_bar)

    pool.close()
    pool.join()
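For completeness, here is a hypothetical sketch of what worker.py's do_work could look like; the real worker.py presumably does actual calculations rather than sleeping, so treat the body as a placeholder:

# worker.py (hypothetical placeholder)
import sys
import time

def do_work(sec_sleep):
    # stand-in for the real computation; just sleeps for the requested seconds
    time.sleep(int(sec_sleep))
    return sec_sleep

if __name__ == '__main__':
    # preserves the original command-line usage: python worker.py <seconds>
    do_work(sys.argv[1])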
When I run this and enter some input, it goes into the main function but then asks for input again. Why is that happening?
I am running it from the Command Prompt on Windows; the Python version is 3.8.
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
import concurrent.futures

input('?')

def pp(id, lock):
    with lock:
        for i in range(5):
            print(f'{id}=>{i}')

def main():
    pool = ProcessPoolExecutor()
    m = multiprocessing.Manager()
    lock = m.Lock()
    futures = [pool.submit(pp, num, lock) for num in range(10)]
    with concurrent.futures.ThreadPoolExecutor(max_workers=100) as executor:
        executor.map(main, list(range(10)), [lock] * 10)

if __name__ == '__main__':
    main()
Here is the output:
?abc
?abd
????
How can I solve this so the input runs just once?
I cannot reproduce this; it only runs once on my local Python.
What is your Python version?
However, I can recommend putting the input() call inside the if __name__ == "__main__": block. The problem is that your input() runs whenever the module is imported, and on Windows each worker process re-imports the module when it is spawned.
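A minimal sketch of that restructuring (leaving aside the recursive ThreadPoolExecutor call inside main, which looks unintended): with input() under the guard, it only runs in the parent process, even though each spawned worker re-imports the module.

import multiprocessing
from concurrent.futures import ProcessPoolExecutor

def pp(id, lock):
    with lock:
        for i in range(5):
            print(f'{id}=>{i}')

if __name__ == '__main__':
    input('?')  # runs exactly once, in the parent process only
    m = multiprocessing.Manager()
    lock = m.Lock()
    with ProcessPoolExecutor() as pool:
        # the context manager waits for all submitted tasks before exiting
        futures = [pool.submit(pp, num, lock) for num in range(10)]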
I just want to see a simple working example of multiprocessing on Windows, but it doesn't enter/run the functions, either in a Jupyter notebook or when running a saved .py file.
import time
import multiprocessing

s = [1, 4]

def subu(remo):
    s[remo-1] = remo * 9
    print(f'here{remo}')
    return

if __name__ == "__main__":
    p1 = multiprocessing.Process(target=subu, args=[1])
    p2 = multiprocessing.Process(target=subu, args=[2])
    p1.start()
    p2.start()
    p1.join()
    p2.join()
    # print("2222here")
    print(s)
    input()
The output from the .py file is:
[1, 4]
[1, 4]
and the output from the Jupyter notebook is:
[1,4]
whereas I hoped for:
here1
here2
[9,18]
What's wrong with the code above? And what about this code:
import concurrent.futures

thread_num = 2
s = [1, 4]

def subu(remo):
    s[remo-1] = remo * 9
    print(f'here{remo}')
    return

with concurrent.futures.ProcessPoolExecutor() as executor:
    ## or if __name__=="__main__":
    ##...     with concurrent.futures.ProcessPoolExecutor() as executor:
    results = [executor.submit(subu, i) for i in range(thread_num)]
    for f in concurrent.futures.as_completed(results):
        print(f.result())

input()
This does not run at all in Jupyter; it raises the error:
BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.
I sort of understand that I can't expect Jupyter to run multiprocessing, but the saved .py file can't run it either, and it exits without waiting for input().
There are a couple of potential problems. The worker function needs to be importable (at least on Windows) so that it can be found by the subprocesses. And since a subprocess's memory isn't visible to the parent, the results need to be returned. So, put the worker in a separate module:
subumodule.py

def subu(remo):
    remo = remo * 9
    print(f'here{remo}')
    return remo
Then use a process pool's existing infrastructure to return each worker's return value to the parent. You could do:
import multiprocessing
from subumodule import subu  # the worker must be importable by the child processes

if __name__ == "__main__":
    with multiprocessing.Pool(2) as pool:
        s = list(pool.map(subu, (1, 2)))  # map collects the workers' return values
    print(s)
    input()
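pool.map hands back each worker's return value in argument order, so s ends up as [9, 18], which is the result the question was hoping for (the prints become here9 and here18, since the worker now multiplies before printing).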
I have a problem where my .py file, which maxes out the CPU through multiprocessing, stops doing work without the .py file ever exiting.
I am running a heavy task that uses all cores of an old MacBook Pro (2012). The task runs fine at first, and I can see four python3.7 processes appear in the Activity Monitor window. However, after about 20 minutes, those four python3.7 processes disappear from Activity Monitor.
The strangest part is that the multiprocessing .py file is still running, i.e. it never threw an uncaught exception nor exited.
Would you have any ideas as to what's going on? My guesses are 1) it's most likely an error in the script, and 2) the old computer is overheating.
Thanks!
Edit: Below is the multiprocessing code, where the function to execute is func with a list as its argument. I hope this helps!
import multiprocessing

def main():
    pool = multiprocessing.Pool()
    for i in range(24):
        pool.apply_async(func, args=([],))
    pool.close()
    pool.join()

if __name__ == '__main__':
    main()
Use a context manager to handle closing processes properly.
from multiprocessing import Pool

def main():
    with Pool() as p:
        result = p.apply_async(func, args=([],))
        print(result)

if __name__ == '__main__':
    main()
I wasn't sure what you were doing with the for i in range() part.
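On the original symptom of workers disappearing without a visible error: apply_async swallows exceptions unless you call .get() on the returned AsyncResult. A minimal sketch (keeping the original 24 submissions, with func assumed to be defined or imported elsewhere as in the question) that surfaces any worker exception:

from multiprocessing import Pool

def main():
    with Pool() as pool:
        # func is the question's worker function, assumed to be defined or imported here
        results = [pool.apply_async(func, args=([],)) for _ in range(24)]
        for r in results:
            r.get()  # re-raises any exception the worker hit instead of hiding it

if __name__ == '__main__':
    main()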
I would like to use pool.apply or pool.map. I tried to run several examples, but none of them work. The code keeps running and does not produce any errors. Might this have anything to do with my system's settings?
I tried several simple examples.
One of the examples I tried to run:
from multiprocessing import Pool

def doubler(number):
    return number * 2

if __name__ == '__main__':
    numbers = [5, 10, 20]
    pool = Pool(processes=4)
    Print(pool.map(doubler, numbers))
Python keeps running and does not produce any results.