I am trying to understand how multiprocessing works in Python. Here is a simple code which is not calling the function as I expected it would.
import time
import multiprocessing
def do_something():
print('Sleep')
time.sleep(1)
print('Wake up')
start = time.perf_counter()
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
p1.join()
p2.join()
finish = time.perf_counter()
print(f'Finished in {round(finish-start, 2)} second(s)')
In Jupyter Notebook, after executing I am getting following output:
Finished in 0.2 second(s)
I though it would be like something like this:
Sleep
Sleep
Wake up
Wake up
Finished in 0.2 second(s)
What am I missing?
You should check "Programming guidelines" (https://docs.python.org/3/library/multiprocessing.html#multiprocessing-programming) to figure out why you need the:
if __name__ == '__main__' :
guard in your scripts that use multiprocessing. Since you don't have that in your notebooks, it wont work properly.
Related
I can't get this code to run an input whilst another block of code is running. I want to know if there are any workarounds, my code is as follows.
import multiprocessing
def test1():
input('hello')
def test2():
a=True
while a == True:
b = 5
if __name__ == "__main__":
p1 = multiprocessing.Process(target=test1)
p2 = multiprocessing.Process(target=test2)
p1.start()
p2.start()
p1.join()
p2.join()
When the code is run I get an EOF error which apparently happens when the input function is interrupted.
I would have the main process create a daemon thread responsible for doing the input in conjunction with creating the greatly under-utilized full duplex Pipe which provides two two-way communication Connection instances. For simplicity the following demo just creates one Process instance that loops doing input requests echoing the response until the user enters 'quit':
import multiprocessing
import threading
def test1(conn):
while True:
conn.send('Please enter a value: ')
s = conn.recv()
if s == 'quit':
break
print(f'You entered: "{s}"')
def inputter(conn):
while True:
# The contents of the request is the prompt to be used:
prompt = conn.recv()
conn.send(input(prompt))
if __name__ == "__main__":
conn1, conn2 = multiprocessing.Pipe(duplex=True)
t = threading.Thread(target=inputter, args=(conn1,), daemon=True)
p = multiprocessing.Process(target=test1, args=(conn2,))
t.start()
p.start()
p.join()
That's not all of your code, because it doesn't show the multiprocessing. However, the issue is that only the main process can interact with the console. The other processes do not have a stdin. You can use a Queue to communicate with the main process if you need to, but in general you want the secondary processes to be pretty much standalone.
For some reason, I can not do parallel processing by python. Fo example by running the below code, I get runtime errors:
import multiprocessing as mp
import time
def sleep_for_a_bit(seconds):
print(f'Sleeping {seconds} second(s)')
time.sleep(seconds)
print("Done Sleeping")
p1=mp.Process(target=sleep_for_a_bit,args=[1])
p2=mp.Process(target=sleep_for_a_bit,args=[1])
if __name__ == '__main__':
mp.freeze_support()
p1.start()
p2.start()
finish=time.perf_counter()
print("finish running after seconds : ",finish)
this is the error message:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.`
I have a Windows desktop and it actually ran (maybe I was lucky). But in general, on platforms such as Windows that use spawn to create new platforms, you should take all code that you do not want your newly created processes to execute out of global scope since these processes will be created by launching a new Python interpreter and restarting executing from the top of the program and if the code is not contained within a if __name__ == '__main__': block, it will be executed. So my best suggestion is to try the following (I have made a few corrections to the code):
import multiprocessing as mp
import time
def sleep_for_a_bit(seconds):
print(f'Sleeping {seconds} second(s)')
time.sleep(seconds)
print("Done Sleeping")
if __name__ == '__main__':
mp.freeze_support() # not required unless you are creating an .exe file
p1=mp.Process(target=sleep_for_a_bit,args=[1])
p2=mp.Process(target=sleep_for_a_bit,args=[1])
start = time.perf_counter()
p1.start()
p2.start()
p1.join() # wait for process to finish
p2.join() # wait for process to finish
finish=time.perf_counter()
# perf_counter() is only meaningful when you take the difference between readings:
print("finish running after seconds : ", finish - start)
Prints:
Sleeping 1 second(s)
Sleeping 1 second(s)
Done Sleeping
Done Sleeping
finish running after seconds : 1.0933153999999998
I have a simple function that I intend to run in Parallel using the Python multiprocessing module. However I get the following error RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase. The error suggests that I add this:
if __name__ == '__main__':
freeze_support()
And most posts online suggest the same like this SO answer.
I added it and it works but I don't seem to understand why it's necessary for such a simple piece of code.
Code without __name__=="__main__" (throws RuntimeError)
import multiprocessing
import time
start = time.perf_counter()
def do_something():
print('Sleeping 1 second...')
time.sleep(1)
print('Done sleeping...')
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} second(s)')
Code with __name__=="__main__" (doesn't throw RuntimeError)
import multiprocessing
import time
start = time.perf_counter()
def do_something():
print('Sleeping 1 second...')
time.sleep(1)
print('Done sleeping...')
def main():
p1 = multiprocessing.Process(target=do_something)
p2 = multiprocessing.Process(target=do_something)
p1.start()
p2.start()
finish = time.perf_counter()
print(f'Finished in {round(finish - start, 2)} second(s)')
if __name__ == "__main__":
main()
In Windows, multiprocessing.Process executes a fresh copy of python to run the code. It has to get the code you want to execute to load in that process so it pickles a snapshot of your current environment to expand in the child. For that to work, the child needs to reimport modules used by the parent. In particular, it needs to import the main script as a module. When you import, any code residing at module level executes.
So lets make the simplest case
foo.py
import multiprocessing as mp
process = mp.Process(target=print, args=('foo',))
process.start()
process.join()
process.start() executes a new python which imports foo.py. And there's the problem. That new foo will create another subprocess which will again import foo.py. So yet another process is created.
The would go on until you blow up your machine except that python detects the problem and raises the exception.
THE FIX
Python modules have the __name__ attribute. If you run your program as a script, __name__ is "main", otherwise, __name__ is the name of your module. So, when a multiprocessing process is importing your main script to setup your environment, its name is not __main__. You can use that to make sure that your MP work is only done in the parent module.
import multiprocessing as mp
if __name__ == "__main__":
# run as top level script, but not as imported module
process = mp.Process(target=print, args=('foo',))
process.start()
process.join()
Take a look at this simple python code with Process:
from multiprocessing import Process
import time
def f(name):
time.sleep(100)
print 'hello', name
if __name__ == '__main__':
p = Process(target=f, args=('bob',))
p.start()#Has to be terminated in 5 seconds
#p.join()
print "This Needs to be Printed Immediately"
I guess I am looking for a function like p.start(timeout).
I want to terminate the p process if it has not self-finished in like 5 seconds. How can I do that? There seems to be no such function.
If p.join() is uncommented, the following print line will have to wait 100 seconds and can not be 'Printed Immediately'.But I want it be done immediately so the p.join() has to be commented out.
Use a separate thread to start the process, wait 5 seconds, then terminate the process. Meanwhile the main thread can do the work you want to happen immediately:
from multiprocessing import Process
import time
import threading
def f(name):
time.sleep(100)
print 'hello', name
def run_process_with_timeout(timeout, target, args):
p = Process(target=target, args=args)
p.start()
time.sleep(timeout)
p.terminate()
if __name__ == '__main__':
t = threading.Thread(target=run_process_with_timeout, args=(5,f,('bob',)))
t.start()
print "This Needs to be Printed Immediately"
You might want to take a look at that SO thread.
basically their solution is to use the timeout capability of the threading module by running the process in a separate thread.
You are right, there is no such function in Python 2.x in the subprocess library.
However, with Python 3.3 you can use:
p = subprocess.Popen(...)
try:
p.wait(timeout=5)
except TimeoutError:
p.kill()
With older Python versions, you would have to write a loop that calls p.poll() and checks the returncode, e.g. once per second.
This is (like polling in general) not optimal from performance point-of-view, but it always depends on what you expect.
Try something like this:
def run_process_with_timeout(timeout, target, args):
p = Process(target=target, args=args)
running = False
second = int(time.strftime("%S"))
if second+timeout > 59:
second = (second+timeout)-60
else:
second = second+timeout
print second
while second > int(time.strftime("%S")):
if running == False:
p.start()
running = True
p.terminate()
basically just using the time module to allow a loop to run for five seconds and then moving on, this assumes timeout is given in seconds.
Though I'd point out that if this was used with the code the OP originally posted, this would work, as print was in a second function separate from the loop and would be carried out immediately after calling this function.
Why not use the timeout option of Process.join(), as in:
import sys
...
if __name__ == '__main__':
p = Process(target=f, args=('bob',))
p.start()#Has to be terminated in 5 seconds
# print immediately and flush output
print "This Needs to be Printed Immediately"
sys.stdout.flush()
p.join(5)
if p.is_alive():
p.terminate()
I have created two processes but they are not starting according to this code.
any idea what is the problem?
import serial
from multiprocessing import Process
ser=serial.Serial('COM8',115200)
c=" "
out=" "
def pi():
print ("started")
out=" "
while 1:
# loop contents
def man():
while(1):
# loop contents
p1=Process(target=pi,args=())
p2=Process(target=man,args=())
p1.start()
p2.start()
p1.join()
p2.join()
I'll guess you're using windows...
Put your initalisation code in an if __name__ == '__main__': block:
import serial
from multiprocessing import Process
ser=serial.Serial('COM8',115200)
c=" "
out=" "
def pi():
print ("started")
out=" "
while 1:
# loop contents
def man():
while(1):
# loop contents
if __name__ == '__main__':
p1=Process(target=pi,args=())
p2=Process(target=man,args=())
p1.start()
p2.start()
p1.join()
p2.join()
On windows, to work around the lack of fork() each newly started subprocess has to import the __main__ module, so you'll run into an endless loop of spawning processes unless if you don't protect your initialistion code.