Simple multiprocessing Pool hangs in Jupyter notebook - python

I'm trying to run some multiprocessing in a Jupyter notebook, using python version 3.7.0.
However, even a really simple example seems to hang indefinitely.
After reading this answer I tried explicitly calling .close and .join on the pool, but it still hangs. Example code is below, can anyone tell me what's wrong?
import multiprocessing as mp

def fun(x):
    return 2*x

with mp.Pool() as pool:
    args = list(range(10))
    res = pool.map(fun, args)
    pool.close()
    pool.join()

Another solution is to use the multiprocess module, which has the same API as the multiprocessing library but also works from within Jupyter.

For me, the solution proposed by #Booboo worked:
Write your function in an external file
Import it into your .ipynb file
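A minimal sketch of that approach, assuming the function is saved in a hypothetical helper file named my_funcs.py next to the notebook:

# my_funcs.py (hypothetical helper module saved next to the notebook)
def fun(x):
    return 2 * x

and then, in a notebook cell:

import multiprocessing as mp
from my_funcs import fun  # imported, so spawned worker processes can find it

if __name__ == '__main__':
    with mp.Pool() as pool:
        res = pool.map(fun, list(range(10)))
    print(res)

Because fun now lives in an importable module rather than in the notebook's __main__, the worker processes can load it and the pool no longer hangs.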

Related

Multiprocessing in python [broken pool process]

I am new to multiprocessing and am exploring how to use it. Following the example in the Python documentation, I tried one of my own functions, but my Jupyter notebook gave me an error. What could be the reason?
import concurrent.futures

def dummy(x):
    return x**(1/200)

def main():
    with concurrent.futures.ProcessPoolExecutor() as executer:
        x = [1,2,3,4,5,6]
        future = executer.map(dummy,x)
        for result in future:
            print(result)

if __name__ == '__main__':
    main()
and the error is:
BrokenProcessPool: A process in the process pool was terminated
abruptly while the future was running or pending.
How can I fix this? I have the latest version of Python.
It looks like the problem was that I was using a Jupyter notebook. I just read this line in the documentation: "The main module must be importable by worker subprocesses. This means that ProcessPoolExecutor will not work in the interactive interpreter."
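A minimal sketch of one workaround, assuming the worker function is moved to a hypothetical importable module named dummy_worker.py so the spawned subprocesses can find it:

# dummy_worker.py (hypothetical module saved next to the notebook)
def dummy(x):
    return x ** (1 / 200)

and in the notebook:

import concurrent.futures
from dummy_worker import dummy  # importable by the worker subprocesses

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        for result in executor.map(dummy, [1, 2, 3, 4, 5, 6]):
            print(result)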

Multiprocessing AsyncResult.get() hangs in Python 3.7.2 but not in 3.6

I'm trying to port some code from Python 3.6 to Python 3.7 on Windows 10. I see the multiprocessing code hang when calling .get() on the AsyncResult object. The code in question is much more complicated, but I've boiled it down to something similar to the following program.
import multiprocessing

def main(num_jobs):
    num_processes = max(multiprocessing.cpu_count() - 1, 1)
    pool = multiprocessing.Pool(num_processes)
    func_args = []
    results = []
    try:
        for num in range(num_jobs):
            args = (1, 2, 3)
            func_args.append(args)
            results.append(pool.apply_async(print, args))
        for result, args in zip(results, func_args):
            print('waiting on', args)
            result.get()
    finally:
        pool.terminate()
        pool.join()

if __name__ == '__main__':
    main(5)
This code also runs in Python 2.7. For some reason the first call to get() hangs in 3.7, but everything works as expected on other versions.
I think this is a regression in Python 3.7.2 as described here. It seems to only affect users when running in a virtualenv.
For the time being you can work-around it by doing what's described in this comment on the bug thread.
import _winapi
import multiprocessing.spawn
multiprocessing.spawn.set_executable(_winapi.GetModuleFileName(0))
That will force the subprocesses to spawn using the real python.exe instead of the one in the virtualenv. So this may not be suitable if you're bundling things into an exe with PyInstaller, but it works fine when running from the CLI with a local Python installation.
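For reference, a hedged sketch that applies the workaround only where it seems likely to matter (Windows plus a virtualenv, detected via sys.prefix differing from sys.base_prefix); this guard is my own assumption, not part of the bug-thread comment:

import sys

if sys.platform == 'win32' and sys.prefix != sys.base_prefix:
    # Only inside a Windows virtualenv, per the regression described above
    import _winapi
    import multiprocessing.spawn
    multiprocessing.spawn.set_executable(_winapi.GetModuleFileName(0))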

Python Multiprocessing within Jupyter Notebook

I am new to the multiprocessing module in Python and work with Jupyter notebooks. I have tried the following code snippet from PMOTW:
import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
When I run this as is, there is no output.
I have also tried creating a module called worker.py and then importing that to run the code:
import multiprocessing
from worker import worker

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
There is still no output in that case. In the console, I see the following error (repeated multiple times):
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 106, in spawn_main
    exitcode = _main(fd)
  File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 116, in _main
    self = pickle.load(from_parent)
AttributeError: Can't get attribute 'worker' on <module '__main__' (built-in)>
However, I get the expected output when the code is saved as a Python script and executed.
What can I do to run this code directly from the notebook without creating a separate script?
I'm relatively new to parallel computing, so I may be wrong on some technicalities. My understanding is this:
Jupyter notebooks don't work well with multiprocessing because the module pickles (serialises) data, including functions, to send to the worker processes, and functions defined in a notebook cannot be pickled that way.
multiprocess is a fork of multiprocessing that uses dill instead of pickle to serialise data, which allows it to work from within Jupyter notebooks. The API is identical, so the only thing you need to do is change
import multiprocessing
to...
import multiprocess
You can install multiprocess very easily with a simple
pip install multiprocess
You will, however, find that your processes will still not print to the notebook output (although in JupyterLab they will print to the terminal the server is running in). I stumbled upon this post while trying to work around that and will edit this answer when I find out how.
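A minimal sketch of the drop-in swap, assuming multiprocess has been installed with the pip command above; the worker function and pool size here are just placeholders:

import multiprocess as mp

def worker(i):
    # trivial placeholder task; returning a value avoids relying on print in the workers
    return i * i

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        print(pool.map(worker, range(5)))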
I'm not an expert in either multiprocessing or ipykernel (which is used by the Jupyter notebook), but since nobody seems to have given an answer, I will tell you what I guessed. I hope somebody complements this later on.
I guess your Jupyter notebook server is running on a Windows host. In multiprocessing there are three different start methods. Let's focus on spawn, which is the default on Windows, and fork, the default on Unix.
Here is a quick overview.
spawn
(cpython) interactive shell - always raises an error
run as a script - okay only if the multiprocessing code is nested inside if __name__ == '__main__'
fork
always okay
For example,
import multiprocessing

def worker():
    """worker function"""
    print('Worker')
    return

if __name__ == '__main__':
    multiprocessing.set_start_method('spawn')
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
This code works when it's saved and run as a script, but raises an error when entered in a Python interactive shell. Here is the implementation of the ipython kernel, and my guess is that it uses some kind of interactive shell and so doesn't go well with spawn (but please don't trust me).
As a side note, I will give you a general idea of how spawn and fork differ. In multiprocessing, each subprocess runs a separate Python interpreter. With spawn in particular, a child process starts a fresh interpreter and imports the necessary modules from scratch. It's hard to import code defined in an interactive shell, so it may raise an error.
fork is different. With fork, a child process copies the main process, including most of the running state of the Python interpreter, and then continues execution. This code will help you understand the concept.
import os

main_pid = os.getpid()
os.fork()
print("Hello world(%d)" % os.getpid())  # prints twice: Hello world(id1) Hello world(id2)
if os.getpid() == main_pid:
    print("Hello world(main process)")  # prints once: Hello world(main process)
Much like you, I encountered the attribute error. The problem seems to be related to how Jupyter handles multiprocessing. The fastest result I got was to follow the Multi-processing example.
So the ThreadPool took care of my issue.
from multiprocessing.pool import ThreadPool as Pool

def worker(i):
    """worker function (map passes each item of the iterable as an argument)"""
    print('Worker\n')
    return

pool = Pool(4)
for result in pool.map(worker, range(5)):
    pass  # or print diagnostics
This works for me on a Mac (I cannot make it work on Windows):
import multiprocessing as mp

mp_start_count = 0

if __name__ == '__main__':
    if mp_start_count == 0:
        mp.set_start_method('fork')
        mp_start_count += 1
Save the function to a separate Python file then import the function back in. It should work fine that way.

Is there any interpreter that works well for running multiprocessing in python 2.7 version on windows 7?

I was trying to run a piece of code. This code is all about multiprocessing. It works fine at the command prompt and generates some output. But when I try to run this code in PyScripter, it just says that the script ran OK; it doesn't generate any output, display any error message, or even crash. It would be really helpful if anyone could help me find an interpreter where this multiprocessing code works fine.
Here is the piece of code:
from multiprocessing import Process

def wait():
    print "wait"
    clean()

def clean():
    print "clean"

def main():
    p=Process(target=wait)
    p.start()
    p.join()

if _name_=='_main_':
    main()
The normal interpreter works just fine with multiprocessing on Windows 7 for me. (Your IDE might not like multiprocessing.)
You just have to do
if __name__=='__main__':
main()
with 2 underscores (__) each instead of 1 (_).
Also - if you don't have an actual reason not to use it, multiprocessing.Pool is much easier to use than multiprocessing.Process in most cases. Have a look at https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool
An implementation with a Pool would be
import multiprocessing

def wait():
    print "wait"
    clean()

def clean():
    print "clean"

def main():
    p=multiprocessing.Pool()
    p.apply_async(wait)
    p.close()
    p.join()

if __name__=='__main__':
    main()
but which method of Pool to use strongly depends on what you actually want to do.
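For instance, here is a minimal sketch (Python 2 syntax, to match the question) using Pool.map when the same function should be applied to many inputs; the square function is just a placeholder:

from multiprocessing import Pool

def square(x):
    return x * x

def main():
    pool = Pool()
    # map blocks until every result is ready and keeps the input order
    results = pool.map(square, range(10))
    pool.close()
    pool.join()
    print results

if __name__ == '__main__':
    main()

apply_async is better suited to firing off independent tasks and collecting their results later, while map is the simpler choice for data-parallel work like the above.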

IPython: Running commands asynchronously in the background

Say I have a Python command or script that I want to run from IPython asynchronously, in the background, during an IPython session.
I would like to invoke this command from my IPython session and be notified when it is done, or if something fails. I don't want this command to block my IPython prompt.
Are there any IPython magics that support this? If not, what is the recommended way of running asynchronous jobs/scripts/commands (that run locally) on IPython?
For example, say I have a function:
def do_something():
    # This will take a long time
    # ....
    return "Done"
that I have in the current namespace. How can I run it in the background and be notified when it is done?
Yes, try (in a cell):
%%script bash --bg --out script_out
sleep 10
echo hi!
The script magic is documented along with the other IPython magics. The necessary argument here is --bg, which runs the cell's script in the background (asynchronously) instead of the foreground (synchronously).
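As a usage note (this is my reading of the --out option, so treat it as an assumption): with --bg, the named variable holds the process's stdout pipe rather than the finished text, so the output can be read once the script completes:

# in a later cell, after the background script has finished
print(script_out.read())  # raw bytes from the script's stdout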
GitHub Issue #844 is now resolved.
There used to be a magic function in IPython that would let you do just that:
https://github.com/ipython/ipython/wiki/Cookbook:-Running-a-file-in-the-background
However, it seems that it was removed and is still pending a return in newer versions:
https://github.com/ipython/ipython/issues/844
IPython still provides a library to help you achieve it, though:
http://ipython.org/ipython-doc/rel-0.10.2/html/api/generated/IPython.background_jobs.html
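A minimal sketch of that jobs library from a recent IPython session; the module path IPython.lib.backgroundjobs is my assumption for current releases (the linked docs refer to an older layout):

from IPython.lib import backgroundjobs as bg

def do_something():
    import time
    time.sleep(10)   # stand-in for the long-running work
    return "Done"

jobs = bg.BackgroundJobManager()
job = jobs.new(do_something)   # runs in a background thread; the prompt stays free
jobs.status()                  # shows whether the job is still running
# once finished, job.result should hold the return value ("Done")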
The most general way would be to use the multiprocessing module. This should allow you to call functions in your current script in the background (in a completely new process).
Edit: This might not be the cleanest way, but it should get the job done.
import time
from multiprocessing import Process, Pipe

ALONGTIME = 3

def do_something(mpPipe):
    # This will take a long time
    print "Do_Something_Started"
    time.sleep(ALONGTIME)
    print "Do_Something_Complete"
    mpPipe.send("Done")

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()
    p = Process(target=do_something, args=(child_conn,))
    p.start()
    p.join()  # block until the process is complete - this should be pushed to the end of your script / managed differently to keep it async :)
    print parent_conn.recv()  # will tell you when it's done.
