Program hangs in debug when multiprocessing process opens another process - python

In a Python program, a process is started using multiprocessing.Process. This process then creates a Pool and gives it some work via the map() method.
When the program is run normally, everything works as expected. However, when it is run under the PyCharm debugger, the call to Pool.map never returns and the program locks up.
The problem is demonstrated in the following simple example:
1) Code:
import multiprocessing

def inc(a):
    return a + 1

def func():
    p = multiprocessing.Pool(2)
    print("before map")
    res = p.map(inc, [1, 4])  # ==> the method hangs in debug.
    print("after call map")
    p.close()
    p.join()
    print(res)

def main():
    p = multiprocessing.Process(target=func)
    p.start()
    p.join()

if __name__ == '__main__':
    main()
2) Output as expected when the program is run normally:
before map
after call map
[2, 5]
Process finished with exit code 0
3) Output when the program is run in the debugger - never completes:
pydev debugger: process 13792 is connecting
Connected to pydev debugger (build 173.4301.16)
before map
Is this just a very annoying debugging issue (maybe caused by the debugger's background threads), or is it a multiprocessing problem that might also appear in a real run?
It should be mentioned that using only one of the subprocess steps, i.e. either just starting a Process() or just using a pool.map(), causes no problems and can be debugged. The problem occurs only with the "nested" subprocessing described above.
I am running PyCharm on a Windows 10 64 bit machine.

I just had a similar problem and found out that I had a breakpoint placed inside the function that I was calling through the map() method. Removing this breakpoint, or muting breakpoints, resolved the problem for me. Hope this helps.

Related

Python Multiprocessing Looping Python File Instead of Starting Process

I'm trying to get started with multiprocessing, and I'm running into some interesting issues. The code I'm using is below (for the record, this example is straight from the multiprocessing documentation):
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob'))
    p.start()
    p.join()
This works fine, and prints "hello bob" as it should. When I add any additional code to the file, though, before or after the if statement, p does not evaluate and my file loops back to the beginning and runs all over again endlessly. For example, the following code gives me this issue:
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob'))
    p.start()
    p.join()

test_input = input("test input")
I am running Python on Windows 10, PyCharm v. 2021.3.2, and Python 3.10.0. Is this an issue any of you have seen before? At this point I'm starting to wonder whether it's an issue between Windows and PyCharm, or Windows and Python, or maybe just a case of inexperience on my part.
Thank you!
That if __name__ == '__main__': guard is important. On systems that don't use fork, it simulates a fork by importing the main script in each worker process without naming it __main__ (it's named __mp_main__ IIRC). Any code that should only run in the "main" script needs to be protected by that guard (this can be done indirectly, by defining a function and calling it within the guarded segment; the function will be defined in the workers, but not run).
So to fix this, all you need to do is indent the test_input = input("test input") so it's protected by the if __name__ == '__main__': guard. In real code, I try to keep the guarded section clean (so I can't accidentally write functions that rely on global state that doesn't exist when it's not run as the main script, and for the mild performance benefits of using function locals over globals), so I'd write it like:
from multiprocessing import Process

def f(name):
    print('hello', name)

def main():
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
    test_input = input("test input")

if __name__ == '__main__':
    main()
but that's not strictly necessary.
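As a quick illustration of the re-import behaviour, here is a minimal sketch (the exact alias used for the re-imported module is an implementation detail of CPython's spawn support):
# Minimal sketch: check what __name__ looks like inside a spawned worker.
# Under the spawn start method, the script is re-imported for the worker,
# so the worker's module is not named '__main__' (CPython currently uses '__mp_main__').
from multiprocessing import Process

def show_name():
    print("worker sees __name__ =", __name__)   # not '__main__'

if __name__ == '__main__':
    print("parent sees __name__ =", __name__)   # '__main__'
    p = Process(target=show_name)
    p.start()
    p.join()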
I thought I would elaborate on ShadowRanger's answer:
On Windows systems new subprocesses are created by the following steps:
A new process is created wherein the Python interpreter is re-launched.
The Python interpreter re-interprets the current source program, executing everything at global scope in order to compile function definitions, initialize global variables, etc.
Finally, your worker function, f in this case, is invoked with memory thus initialized.
The reason for placing the code that creates the subprocess within a block governed by if __name__ == '__main__': is that if you didn't, then because of Step 2 above you would get into a recursive, infinite loop creating new subprocesses ad infinitum. The key point is that only in the main process will the variable __name__ have the value '__main__'; it will have a different value in any subprocess that is created. So the code that creates the new subprocess, i.e. p = Process(target=f, args=('bob',)), will not be executed as part of the initialization of the subprocess.
Your problem arises from the statement test_input = input("test input") being at global scope and not within an if __name__ == '__main__': block, so it is executed as part of the initialization of the subprocess. Your worker function, f, will not run until this prompt for input is satisfied, and then, when it returns, your main process will put out the prompt again. Anyway, this is what I see when the program is run from a Windows command prompt. Perhaps with PyCharm there is a restriction against calling input() from any thread other than the main thread. But even if an exception is being thrown from that statement while creating the subprocess, I still don't quite see how your program would be looping continuously. Unfortunately, I do not have PyCharm installed.
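Here is a minimal sketch of that re-execution (not the OP's exact program; the print stands in for any top-level statement such as the input() call):
# Sketch: under spawn (the Windows default), every top-level statement runs again
# when the child re-imports the script -- including something like input().
from multiprocessing import Process
import os

print("top-level statement executing in process", os.getpid())  # printed by parent AND child

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()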
Regarding ShadowRanger's answer, I think you should also put a comma after 'bob'.
According to https://docs.python.org/3/library/multiprocessing.html
p should be created like this if you want to pass a single argument:
p = Process(target=f, args=('bob',))
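To see why the trailing comma matters: without it the parentheses are only grouping, so args is the string 'bob' rather than a one-element tuple. A tiny sketch:
print(type(('bob')))   # <class 'str'>  -- parentheses alone do not make a tuple
print(type(('bob',)))  # <class 'tuple'> -- the trailing comma does
# Process unpacks args as f(*args), so args='bob' would call f('b', 'o', 'b') instead of f('bob')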

Python Multiprocessing within Jupyter Notebook

I am new to the multiprocessing module in Python and work with Jupyter notebooks. I have tried the following code snippet from PMOTW:
import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
When I run this as is, there is no output.
I have also tried creating a module called worker.py and then importing that to run the code:
import multiprocessing
from worker import worker

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
There is still no output in that case. In the console, I see the following error (repeated multiple times):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 116, in _main
    self = pickle.load(from_parent)
AttributeError: Can't get attribute 'worker' on <module '__main__' (built-in)>
However, I get the expected output when the code is saved as a Python script and executed.
What can I do to run this code directly from the notebook without creating a separate script?
I'm relatively new to parallel computing so I may be wrong with some technicalities. My understanding is this:
Jupyter notebooks don't work with multiprocessing because the module pickles (serialises) data to send to processes.
multiprocess is a fork of multiprocessing that uses dill instead of pickle to serialise data, which allows it to work from within Jupyter notebooks. The API is identical, so the only thing you need to do is change
import multiprocessing
to...
import multiprocess
You can install multiprocess very easily with a simple
pip install multiprocess
You will, however, find that your processes still do not print to the output (although in JupyterLab they will print to the terminal the server is running in). I stumbled upon this post trying to work around this and will edit this post when I find out how.
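For reference, a minimal sketch of the drop-in replacement (assuming multiprocess is installed; the function name square is just an example):
# Sketch: multiprocess mirrors the multiprocessing API but serialises with dill,
# so a function defined in a notebook cell can be sent to the workers.
import multiprocess as mp

def square(i):
    return i * i

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        print(pool.map(square, range(5)))  # [0, 1, 4, 9, 16]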
I'm not an expert either in multiprocessing or in ipykernel (which is used by Jupyter notebook), but since nobody seems to have given an answer, I will tell you what I guessed. I hope somebody complements this later on.
I guess your Jupyter notebook server is running on a Windows host. In multiprocessing there are three different start methods. Let's focus on spawn, which is the default on Windows, and fork, the default on Unix.
Here is a quick overview.
spawn
(CPython) interactive shell - always raises an error
run as a script - okay only if you nest the multiprocessing code under if __name__ == '__main__'
fork
always okay
For example,
import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
This code works when it's saved and run as a script, but raises an error when entered in a Python interactive shell. Here is the implementation of the IPython kernel, and my guess is that it uses some kind of interactive shell and so doesn't go well with spawn (but please don't trust me).
As a side note, I will give you a general idea of how spawn and fork differ. In multiprocessing, each subprocess runs a separate Python interpreter. In particular, with spawn, a child process starts a new interpreter and imports the necessary modules from scratch. It's hard to import code from an interactive shell, so it may raise an error.
fork is different. With fork, a child process copies the main process, including most of the running state of the Python interpreter, and then continues execution. This code will help you understand the concept.
import os
main_pid = os.getpid()
os.fork()
print("Hello world(%d)" % os.getpid()) # print twice. Hello world(id1) Hello world(id2)
if os.getpid() == main_pid:
print("Hello world(main process)") # print once. Hello world(main process)
Much like you, I encountered the attribute error. The problem seems to be related to how Jupyter handles multithreading. The fastest result I got was to follow the multiprocessing example.
So the ThreadPool took care of my issue.
from multiprocessing.pool import ThreadPool as Pool

def worker(i):
    """worker function"""
    print('Worker', i)
    return

pool = Pool(4)
for result in pool.map(worker, range(5)):
    pass  # or print diagnostics
This works for me on macOS (I cannot make it work on Windows):
import multiprocessing as mp

mp_start_count = 0

if __name__ == '__main__':
    if mp_start_count == 0:
        mp.set_start_method('fork')
        mp_start_count += 1
Save the function to a separate Python file, then import the function back in. It should work fine that way.

Is there any interpreter that works well for running multiprocessing in python 2.7 version on windows 7?

I was trying to run a piece of code. This code is all about multiprocessing. It works fine on the command prompt and also generates some output. But when I try to run this code in PyScripter, it just says that the script runs OK; it doesn't generate any output, nor does it display any error message. It doesn't even crash. It would be really helpful if anyone could help me find an interpreter where this multiprocessing works fine.
Here is the piece of code:
from multiprocessing import Process

def wait():
    print "wait"
    clean()

def clean():
    print "clean"

def main():
    p = Process(target=wait)
    p.start()
    p.join()

if _name_=='_main_':
    main()
The normal interpreter works just fine with multiprocessing on Windows 7 for me. (Your IDE might not like multiprocessing.)
You just have to do
if __name__=='__main__':
main()
with 2 underscores (__) each instead of 1 (_).
Also - if you don't have an actual reason not to use it, multiprocessing.Pool is much easier to use than multiprocessing.Process in most cases. Have a look at https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool
An implementation with a Pool would be
import multiprocessing

def wait():
    print "wait"
    clean()

def clean():
    print "clean"

def main():
    p = multiprocessing.Pool()
    p.apply_async(wait)
    p.close()
    p.join()

if __name__=='__main__':
    main()
but which method of Pool to use strongly depends on what you actually want to do.
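For instance, a minimal sketch of Pool.map for the common case of applying one function to many inputs (Python 2 syntax to match the question; the function name double is just an example):
import multiprocessing

def double(x):
    return x * 2

def main():
    p = multiprocessing.Pool()
    results = p.map(double, range(4))  # blocks until every worker has finished
    p.close()
    p.join()
    print results  # [0, 2, 4, 6]

if __name__ == '__main__':
    main()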

IPython: Running commands asynchronously in the background

Say I have a Python command or script that I want to run from IPython asynchronously, in the background, during an IPython session.
I would like to invoke this command from my IPython session and be notified when it is done, or if something fails. I don't want this command to block my IPython prompt.
Are there any IPython magics that support this? If not, what is the recommended way of running asynchronous jobs/scripts/commands (that run locally) on IPython?
For example, say I have a function:
def do_something():
    # This will take a long time
    # ....
    return "Done"
that I have in the current namespace. How can I run it in the background and be notified when it is done?
Yes, try (in a cell):
%%script bash --bg --out script_out
sleep 10
echo hi!
The script magic is documented along with the other IPython magics. The necessary argument here is --bg, which runs the script below in the background (asynchronously) instead of the foreground (synchronously).
GitHub Issue #844 is now resolved.
There used to be a magic function in IPython that would let you do just that:
https://github.com/ipython/ipython/wiki/Cookbook:-Running-a-file-in-the-background
However, it seems that it was removed and is still pending to come back in newer versions:
https://github.com/ipython/ipython/issues/844
It still provides a library to help you achieve it, though:
http://ipython.org/ipython-doc/rel-0.10.2/html/api/generated/IPython.background_jobs.html
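A minimal sketch of that background-jobs helper (assuming a recent IPython, where it lives at IPython.lib.backgroundjobs):
# Sketch: run do_something in a background thread from an IPython session
# and poll its status from the prompt.
from IPython.lib import backgroundjobs as bg

def do_something():
    # This will take a long time
    return "Done"

jobs = bg.BackgroundJobManager()
job = jobs.new(do_something)  # returns immediately; the work runs in the background
jobs.status()                 # lists running/completed jobs
# once the job has finished, job.result should hold the return value ("Done")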
The most general way would be to use the Multiprocessing Module. This should allow you to call functions in your current script in the background (completely new process).
Edit This might not be the cleanest way, but should get the job done.
import time
from multiprocessing import Process, Pipe

ALONGTIME = 3

def do_something(mpPipe):
    # This will take a long time
    print "Do_Something_Started"
    time.sleep(ALONGTIME)
    print "Do_Something_Complete"
    mpPipe.send("Done")

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=do_something, args=(child_conn,))
    p.start()
    p.join()  # block until the process is complete - this should be pushed to the end of your script / managed differently to keep it async :)
    print parent_conn.recv()  # will tell you when it's done.

why process doesn't join and doesn't run?

I have a simple problem to solve (more or less).
If I watch Python multiprocessing tutorials, I see that a process should be started more or less like this:
from multiprocessing import *

def u(m):
    print(m)
    return

A = Process(target=u, args=(0,))
A.start()
A.join()
It should print a 0 but nothing gets printed. Instead it hangs forever at the A.join().
If I manually start the function u by doing this
A.run()
it actually prints 0 in the shell, but it doesn't run concurrently.
For example, the output of the following code:
from multiprocessing import *
from time import sleep

def u(m):
    sleep(1)
    print(m)
    return

A = Process(target=u, args=(1,))
A.start()
print(0)
should be
0
1
but actually is
0
and if I add, before the last line,
A.run()
then the output becomes
1
0
This seems confusing to me...
And if I try to join the process, it waits forever.
However, if it can help in giving me an answer:
my OS is Mac OS X 10.6.8
the Python versions used are 3.1 and 3.3
my computer has one Intel Core i3 processor
--Update--
I have noticed that this strange behaviour is present only when launching the program from IDLE; if I run the program from the terminal, everything works as it is supposed to, so this problem must be connected to some IDLE bug.
But running programs from the terminal is even weirder: using something like range(100000000) uses all my computer's RAM until the end of the program; if I remember well, this shouldn't happen in Python 3, only in older Python versions.
I hope this new information will help you give an answer.
--Update 2--
The bug occurs even if I don't produce any output from my process, because setting this:
def u():
    return
as the target of the process and then starting it, if I try to join the process, IDLE waits forever.
As suggested here and here, the problem is that IDLE overrides sys.stdin and sys.stdout in some weird ways, which do not propagate cleanly to processes you spawn from it (they are not real filehandles).
The first link also indicates it's unlikely to be fixed any time soon ("may be a 'cannot fix' issue", they say).
So unfortunately the only solution I can suggest is not to use IDLE for this script...
Have you tried adding A.join() to your program? I am guessing that your main process is exiting before the child process prints which is causing the output to be hidden. If you tell the main process to wait for the child process (A.join()), I bet you'll see the output you expect.
Given that it only happens with IDLE, I suspect the problem has to do with the stdout used by both processes. Perhaps it's some file-like object that's not safe to use from two different processes.
If you don't have the child process write to stdout, I suspect it will complete and join properly. For example, you could have it write to a file, instead. Or you could set up a pipe between the parent and child.
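For example, a minimal sketch of the write-to-a-file workaround (the file name results.txt is just an example):
# Sketch: avoid IDLE's replaced stdout by having the child process write to a file.
from multiprocessing import Process

def u(m):
    with open("results.txt", "a") as f:
        f.write("%s\n" % m)

if __name__ == '__main__':
    A = Process(target=u, args=(0,))
    A.start()
    A.join()  # should now complete, since the child never touches stdout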
Have you tried unbuffered output? Try importing the sys module and changing the print statement:
print >> sys.stderr, m
How does this affect the behavior? I'm with the others who suspect that IDLE is mucking with the stdio...
