For some reason, I cannot do parallel processing in Python. For example, running the code below gives me runtime errors:
import multiprocessing as mp
import time
def sleep_for_a_bit(seconds):
    print(f'Sleeping {seconds} second(s)')
    time.sleep(seconds)
    print("Done Sleeping")
p1=mp.Process(target=sleep_for_a_bit,args=[1])
p2=mp.Process(target=sleep_for_a_bit,args=[1])
if __name__ == '__main__':
    mp.freeze_support()
    p1.start()
    p2.start()
    finish = time.perf_counter()
    print("finish running after seconds : ", finish)
This is the error message:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
    if __name__ == '__main__':
        freeze_support()
        ...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
I have a Windows desktop and it actually ran for me (maybe I was lucky). But in general, on platforms such as Windows that use spawn to create new processes, you should move any code that you do not want your newly created processes to execute out of global scope. Such processes are created by launching a new Python interpreter and re-executing the program from the top, so any code not contained within an if __name__ == '__main__': block will be run again in every child. So my best suggestion is to try the following (I have made a few corrections to the code):
import multiprocessing as mp
import time
def sleep_for_a_bit(seconds):
    print(f'Sleeping {seconds} second(s)')
    time.sleep(seconds)
    print("Done Sleeping")
if __name__ == '__main__':
    mp.freeze_support()  # not required unless you are creating an .exe file
    p1 = mp.Process(target=sleep_for_a_bit, args=[1])
    p2 = mp.Process(target=sleep_for_a_bit, args=[1])
    start = time.perf_counter()
    p1.start()
    p2.start()
    p1.join()  # wait for process to finish
    p2.join()  # wait for process to finish
    finish = time.perf_counter()
    # perf_counter() is only meaningful when you take the difference between readings:
    print("finish running after seconds : ", finish - start)
Prints:
Sleeping 1 second(s)
Sleeping 1 second(s)
Done Sleeping
Done Sleeping
finish running after seconds : 1.0933153999999998
Related
I have a simple function that I intend to run in parallel using the Python multiprocessing module. However, I get the following error: RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase. The error suggests that I add this:
if __name__ == '__main__':
    freeze_support()
And most posts online suggest the same, like this SO answer.
I added it and it works, but I don't seem to understand why it's necessary for such a simple piece of code.
Code without __name__=="__main__" (throws RuntimeError)
import multiprocessing
import time
start = time.perf_counter()
def do_something():
    print('Sleeping 1 second...')
    time.sleep(1)
    print('Done sleeping...')
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} second(s)')
Code with __name__=="__main__" (doesn't throw RuntimeError)
import multiprocessing
import time
start = time.perf_counter()
def do_something():
    print('Sleeping 1 second...')
    time.sleep(1)
    print('Done sleeping...')

def main():
    p1 = multiprocessing.Process(target=do_something)
    p2 = multiprocessing.Process(target=do_something)
    p1.start()
    p2.start()
    finish = time.perf_counter()
    print(f'Finished in {round(finish - start, 2)} second(s)')

if __name__ == "__main__":
    main()
On Windows, multiprocessing.Process executes a fresh copy of Python to run the code. It has to get the code you want to execute loaded into that process, so it pickles a snapshot of your current environment to expand in the child. For that to work, the child needs to re-import the modules used by the parent. In particular, it needs to import the main script as a module, and when you import a module, any code residing at module level executes.
So let's take the simplest case:
foo.py
import multiprocessing as mp
process = mp.Process(target=print, args=('foo',))
process.start()
process.join()
process.start() executes a new Python interpreter which imports foo.py. And there's the problem: that new foo will create another subprocess, which will again import foo.py. So yet another process is created.
This would go on until you blow up your machine, except that Python detects the problem and raises the exception.
THE FIX
Python modules have the __name__ attribute. If you run your program as a script, __name__ is "__main__"; otherwise, __name__ is the name of your module. So, when a multiprocessing child imports your main script to set up its environment, that module's name is not "__main__". You can use that to make sure your multiprocessing work is only done in the parent process.
import multiprocessing as mp
if __name__ == "__main__":
    # run as top level script, but not as imported module
    process = mp.Process(target=print, args=('foo',))
    process.start()
    process.join()
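You can actually watch this happen. Here is a minimal sketch (the filename probe.py is hypothetical): it prints __name__ at module level, so on a spawn platform you see it once from the parent and once from the child, which imports the file under a different module name:

# probe.py (hypothetical filename)
import multiprocessing as mp

# module-level code: runs in the parent, and runs again when a
# spawned child re-imports this file
print('importing, __name__ ==', __name__)

if __name__ == "__main__":
    process = mp.Process(target=print, args=('hello from the child',))
    process.start()
    process.join()

On Windows this prints importing, __name__ == __main__ from the parent and importing, __name__ == __mp_main__ from the child, which is exactly why the guard keeps the process-creation code from running a second time.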
I have a Python function which calls into a C library I cannot control or update. Unfortunately, there is an intermittent bug in the C library and occasionally it hangs. To protect my application from also hanging, I'm trying to isolate the function call in a ThreadPoolExecutor or ProcessPoolExecutor so only that thread or process crashes.
However, the following code hangs, because the executor cannot shut down while the future is still running!
Is it possible to cancel an executor with a future that has hung?
import time
from concurrent.futures import ThreadPoolExecutor, wait
if __name__ == "__main__":
    def hang_forever(*args):
        print("Starting hang_forever")
        time.sleep(10.0)
        print("Finishing hang_forever")

    print("Starting executor")
    with ThreadPoolExecutor() as executor:
        future = executor.submit(hang_forever)
        print("Submitted future")
        done, not_done = wait([future], timeout=1.0)
        print("Done", done, "Not done", not_done)
        # with never exits because future has hung!
    if len(not_done) > 0:
        raise IOError("Timeout")
The docs say that it's not possible to shut down the executor until all pending futures are done executing:
Regardless of the value of wait, the entire Python program will not exit until all pending futures are done executing.
Calling future.cancel() won't help, as a future that is already running cannot be cancelled. Fortunately, you can solve your problem by using multiprocessing.Process directly instead of ProcessPoolExecutor:
import time
from multiprocessing import Process
def hang_forever():
    while True:
        print('hang forever...')
        time.sleep(1)

def main():
    proc = Process(target=hang_forever)
    print('start the process')
    proc.start()
    time.sleep(1)
    timeout = 3
    print(f'trying to join the process in {timeout} sec...')
    proc.join(timeout)
    if proc.is_alive():
        print('timeout is exceeded, terminate the process!')
        proc.terminate()
        proc.join()
    print('done')

if __name__ == '__main__':
    main()
Output:
start the process
hang forever...
trying to join the process in 3 sec...
hang forever...
hang forever...
hang forever...
hang forever...
timeout is exceeded, terminate the process!
done
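If you also need a result back from the isolated call, one option (a sketch, not part of the original answer; call_c_library is a hypothetical stand-in for the real call) is to have the child send its result over a multiprocessing.Pipe and poll the parent end with a timeout:

import time
from multiprocessing import Process, Pipe

def call_c_library(conn):
    # hypothetical stand-in for the flaky C-library call
    time.sleep(1)
    conn.send(42)

def main():
    parent_conn, child_conn = Pipe()
    proc = Process(target=call_c_library, args=(child_conn,))
    proc.start()
    if parent_conn.poll(3):  # wait up to 3 seconds for a result
        print('result:', parent_conn.recv())
    else:
        print('no result in time, terminating the process')
        proc.terminate()
    proc.join()

if __name__ == '__main__':
    main()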
I am trying to understand how multiprocessing works in Python. Here is a simple piece of code which is not calling the function as I expected it would.
import time
import multiprocessing
def do_something():
    print('Sleep')
    time.sleep(1)
    print('Wake up')
start = time.perf_counter()
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
p1.join()
p2.join()
finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')
In Jupyter Notebook, after executing, I am getting the following output:
Finished in 0.2 second(s)
I thought it would be something like this:
Sleep
Sleep
Wake up
Wake up
Finished in 0.2 second(s)
What am I missing?
You should check the "Programming guidelines" (https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming) to figure out why you need the:
if __name__ == '__main__':
guard in your scripts that use multiprocessing. Since you don't have that in your notebook, it won't work properly.
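More concretely, the spawn start method has to import the module that defines your target function, and code that lives only in notebook cells is not importable. A common workaround (a sketch; the module name workers.py is hypothetical) is to save the worker in a real file next to the notebook and import it:

# workers.py (hypothetical module saved next to the notebook)
import time

def do_something():
    print('Sleep')
    time.sleep(1)
    print('Wake up')

Then the notebook cell only creates and joins the processes:

# notebook cell
import multiprocessing
import time
from workers import do_something  # importable, so spawn can find it

start = time.perf_counter()
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
p1.join()
p2.join()
finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} second(s)')

Note that even then, the children's print output may appear in the terminal running the Jupyter server rather than in the cell output.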
I'm trying to understand the multiprocessing module. Below is my code.
from multiprocessing import Process, current_process
#from time import time
import time
def work(delay):
    p = current_process()
    print p.name, p.pid, p.deamon
    time.sleep(delay)
    print 'Finised deamon work'

def main():
    print 'Starting Main Process'
    p = Process(target=work, args=(2,))
    p.deamon = True
    p.start()
    print 'Exiting Main Process'

if __name__ == '__main__':
    main()
Output:
Starting Main Process
Exiting Main Process
Process-1 7863 True
Finised deamon work
I expect the main process to exit before the daemon process (which sleeps for 2 secs). Since the main process exits, the daemon process should also exit. But the output is confusing me.
Expected Output:
Starting Main Process
Exiting Main Process
Process-1 7863 True
Is my understanding of the multiprocessing module wrong?
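One thing worth checking in the code above: the Process attribute is spelled daemon, not deamon. Assigning p.deamon = True just creates an ordinary attribute, so the child is never actually daemonic, and multiprocessing joins non-daemonic children at interpreter exit, which is why 'Finised deamon work' still prints. A minimal sketch with the spelling fixed (Python 3 print syntax; the short sleep before exiting is added so the child gets a chance to print):

from multiprocessing import Process, current_process
import time

def work(delay):
    p = current_process()
    print(p.name, p.pid, p.daemon)
    time.sleep(delay)
    print('Finished daemon work')  # never reached: the daemonic child is killed first

def main():
    print('Starting Main Process')
    p = Process(target=work, args=(2,))
    p.daemon = True  # must be set before start()
    p.start()
    time.sleep(0.5)  # give the child a moment to start and print
    print('Exiting Main Process')

if __name__ == '__main__':
    main()

With the attribute spelled correctly, the daemonic child is terminated as soon as main() returns, so the expected output above is what you get.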
Take a look at this simple Python code using Process:
from multiprocessing import Process
import time
def f(name):
    time.sleep(100)
    print 'hello', name

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()  # has to be terminated in 5 seconds
    #p.join()
    print "This Needs to be Printed Immediately"
I guess I am looking for a function like p.start(timeout).
I want to terminate the p process if it has not finished on its own within 5 seconds. How can I do that? There seems to be no such function.
If p.join() is uncommented, the following print line has to wait 100 seconds and cannot be 'Printed Immediately'. But I want it done immediately, so p.join() has to be commented out.
Use a separate thread to start the process, wait 5 seconds, then terminate the process. Meanwhile the main thread can do the work you want to happen immediately:
from multiprocessing import Process
import time
import threading
def f(name):
    time.sleep(100)
    print 'hello', name

def run_process_with_timeout(timeout, target, args):
    p = Process(target=target, args=args)
    p.start()
    time.sleep(timeout)
    p.terminate()

if __name__ == '__main__':
    t = threading.Thread(target=run_process_with_timeout, args=(5, f, ('bob',)))
    t.start()
    print "This Needs to be Printed Immediately"
You might want to take a look at that SO thread. Basically, their solution is to use the timeout capability of the threading module by running the process in a separate thread.
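A minimal sketch of that approach (Python 3 print syntax; f sleeps a long time on purpose, as in the question): run the process inside a worker thread, then bound the wait with the thread's join(timeout):

import threading
import time
from multiprocessing import Process

def f(name):
    time.sleep(100)
    print('hello', name)

def run_in_thread(p):
    p.start()
    p.join()  # blocks this worker thread, not the main thread

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    t = threading.Thread(target=run_in_thread, args=(p,))
    t.start()
    print('This Needs to be Printed Immediately')
    t.join(timeout=5)  # the threading module's timeout capability
    if p.is_alive():
        p.terminate()  # f overran its 5-second budget
        p.join()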
You are right: there is no such function in the Python 2.x subprocess library.
However, with Python 3.3 you can use:
import subprocess

p = subprocess.Popen(...)
try:
    p.wait(timeout=5)
except subprocess.TimeoutExpired:
    p.kill()
With older Python versions, you would have to write a loop that calls p.poll() and checks the returncode, e.g. once per second.
This is (like polling in general) not optimal performance-wise, but it always depends on what you expect.
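For example, a polling loop might look like this (a sketch; the sleep 100 command is a hypothetical Unix stand-in for whatever you are actually launching, shown in Python 3 syntax):

import subprocess
import time

p = subprocess.Popen(['sleep', '100'])  # hypothetical long-running command
deadline = time.time() + 5
while p.poll() is None and time.time() < deadline:
    time.sleep(1)  # check the returncode roughly once per second
if p.poll() is None:
    p.kill()  # still running after the timeout
    p.wait()  # reap it so returncode is set
print('returncode:', p.returncode)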
Try something like this:
def run_process_with_timeout(timeout, target, args):
    p = Process(target=target, args=args)
    running = False
    # compute the wall-clock second at which the timeout expires
    # (note: this naive comparison misbehaves if the deadline
    # crosses a minute boundary)
    second = int(time.strftime("%S"))
    if second + timeout > 59:
        second = (second + timeout) - 60
    else:
        second = second + timeout
    print second
    while second > int(time.strftime("%S")):
        if running == False:
            p.start()
            running = True
    p.terminate()
This basically just uses the time module to let a loop run for five seconds and then move on; it assumes the timeout is given in seconds.
Though I'd point out that if this were used with the code the OP originally posted, it would work, as the print statement was outside the loop and would be carried out immediately after calling this function.
Why not use the timeout option of Process.join(), as in:
import sys
...
if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()  # has to be terminated in 5 seconds
    # print immediately and flush output
    print "This Needs to be Printed Immediately"
    sys.stdout.flush()
    p.join(5)
    if p.is_alive():
        p.terminate()