multiprocessing hangs in Jupyter notebook [duplicate] - python

I am trying to implement multiprocessing in my code, and so, I thought that I would start my learning with some examples. I used the first example found in this documentation.
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
When I run the above code I get an AttributeError: can't get attribute 'f' on <module '__main__' (built-in)>. I do not know why I am getting this error. I am also using Python 3.5 if that helps.

This problem seems to be a design feature of multiprocessing.Pool. See https://bugs.python.org/issue25053. Pool does not always work with objects that are not defined in an imported module, so you have to write your function in a different file and import that module.
File: defs.py
def f(x):
    return x*x
File: run.py
from multiprocessing import Pool
import defs
if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(defs.f, [1, 2, 3]))
If you use print or a different built-in function, the example should work. If this is not a bug (according to the link), the given example is chosen badly.
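One way to see why Pool wants an importable function: the standard pickle module, which multiprocessing uses to send work to the worker processes, stores a function as a reference (module name plus qualified name), not as code. The sketch below illustrates this with math.sqrt; the helper name pickled_reference is hypothetical and exists only for this illustration.

```python
import math
import pickle


def pickled_reference(func):
    """Return the (module, qualified name) pair that pickle records for func."""
    return func.__module__, func.__qualname__


# Pickling a function stores only names, never the function body.
payload = pickle.dumps(math.sqrt)
print(pickled_reference(math.sqrt))
print(b"math" in payload and b"sqrt" in payload)

# Unpickling imports the module and looks the name up again.
print(pickle.loads(payload) is math.sqrt)
```

A worker process has to repeat that lookup, so a function that exists only in an interactive __main__ may not be found there, which is consistent with the AttributeError in the question.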

The multiprocessing module has a major limitation when it comes to IPython use:
Functionality within this package requires that the __main__ module be
importable by the children. [...] This means that some examples, such
as the multiprocessing.pool.Pool examples will not work in the
interactive interpreter. [from the documentation]
Fortunately, there is a fork of the multiprocessing module called multiprocess, which uses dill instead of pickle for serialization and conveniently overcomes this issue.
Just install multiprocess and replace multiprocessing with multiprocess in your imports:
import multiprocess as mp
def f(x):
    return x*x
with mp.Pool(5) as pool:
    print(pool.map(f, [1, 2, 3, 4, 5]))
Of course, externalizing the code as suggested in this answer works as well, but I find it very inconvenient: That is not why (and how) I use IPython environments.
<tl;dr> multiprocessing does not work in IPython environments right away, use its fork multiprocess instead.

This answer is for those who get this error on Windows 10 in 2021.
I've researched this error a bit since I got it myself. I get this error when running any examples from the official Python 3 documentation on multiprocessing.
Test environment:
x86 Windows 10.0.19043.1165 + Python 3.9.2 - there is an error
x86 Windows 10.0.19043.1165 + Python 3.9.6 - there is an error
x86 Windows 10.0.19043.1110 + Python 3.9.6 - there is an error
ARM Windows 10.0.21354.1 + Python 3.9.6 - no error (version from DEV branch)
ARM macOS 11.5.2 + Python 3.9.6 - no errors
I have no way to test this under other conditions. My guess is that the problem lies in Windows, since the bug does not appear in the developer build "10.0.21354.1", although that ARM build probably runs x86 emulation.
Also note that the bug did not occur when Python 3.9.2 was released (February). I worked on the same computer the whole time, so I was surprised when previously working code stopped working although only the Windows version had changed.
I was unable to find a bug report describing a similar situation in the Python bug tracker (I probably did a poor search), and the message marked "Correct answer" refers to a different situation. The problem is easy to reproduce: try any example from the multiprocessing documentation on a freshly installed Windows 10 + Python 3.
Later, I will have the opportunity to check out Python 3.10 and the latest version of Windows 10.
I am also interested in this situation in the context of Windows 11.
If you have information about this error (link to the bug tracker or something similar), be sure to share it.
For now I have switched to Linux to continue working.

Why not use joblib? Your code is equivalent to:
# pip install joblib
from joblib import Parallel, delayed
def f(x):
    return x*x
res = Parallel(
    n_jobs=5
)(
    delayed(f)(x) for x in [1, 2, 3]
)

If you're using Jupyter notebook (like the OP), then defining the function in a separate cell and executing that cell first fixes the problem. The accepted answer works too, but it's more work. Defining the function before, i.e. above the pool, isn't adequate. It has to be in a completely different notebook cell which is executed first.
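In case a separate cell is not enough in your setup, the module approach from the accepted answer can also be driven from the notebook itself. This is a minimal sketch, not a definitive fix: load_worker_module, WORKER_SOURCE and the module name defs are hypothetical names chosen for illustration, and the temporary directory stands in for wherever you would keep the file.

```python
import importlib
import sys
import tempfile
from multiprocessing import Pool
from pathlib import Path

# Source of the worker function; it lives in a real file so that
# worker processes can import it by name.
WORKER_SOURCE = "def f(x):\n    return x * x\n"


def load_worker_module(directory, name="defs"):
    """Write WORKER_SOURCE to <directory>/<name>.py and import it as a module."""
    path = Path(directory) / f"{name}.py"
    path.write_text(WORKER_SOURCE)
    if str(directory) not in sys.path:
        sys.path.insert(0, str(directory))
    # Make sure the import system notices the newly created file.
    importlib.invalidate_caches()
    return importlib.import_module(name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        defs = load_worker_module(tmp)
        with Pool(2) as p:
            print(p.map(defs.f, [1, 2, 3]))
```

The function passed to map is then defs.f, which pickle can reference by module name, just as in the run.py example above.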

Related

Python multiprocessing not working on Windows

I've created a dedicated environment for my new project using Anaconda on Windows 10. I write and run my code in Jupyter Notebook, where I want to use multiprocessing, but when I run even the most straightforward code from the module's documentation, it gets stuck. Here's the code:
from multiprocessing import Pool
def f(x):
    return x*x
if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
Code below also doesn't work:
p = Pool(5)
results = p.map(f, [1, 2, 3])
print(results)
Changing the environment to the base doesn't help.
However, while running the code from PyCharm it works perfectly fine. Also it runs fine in Jupyter on Linux. I assume then it must be something Windows-Jupyter-related.
Versions of libraries I use:
python = 3.10.4
jupyter = 1.0.0
CPU: Intel i5 (but it shouldn't matter I think).
I've found a thread on the same topic: Python multiprocessing on Windows 10
It mentions that this was an issue when using multiprocessing through venv, but that it has been solved since Python 3.7.3.
Any ideas on how to solve the issue? Any workarounds?
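One possible reason the same code behaves differently on Linux and Windows is the process start method: Windows only offers spawn, which starts a fresh interpreter that must import the worker function, while Linux has traditionally defaulted to fork. A small sketch to check what a given environment offers (describe_start_methods is a hypothetical helper name, not part of multiprocessing):

```python
import multiprocessing as mp
import sys


def describe_start_methods():
    """Report the platform and the start methods multiprocessing offers on it."""
    available = mp.get_all_start_methods()
    return {
        "platform": sys.platform,
        "available": available,
        # The documentation states that the first entry is the default.
        "default": available[0],
    }


if __name__ == "__main__":
    print(describe_start_methods())
```

If the default is spawn, a function defined only inside the notebook is a likely suspect, and moving it into an importable module is worth trying.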

ModuleNotFoundError when using multiprocessing

module path lost in multiprocessing spawn (ModuleNotFoundError)
The so-called solution of inserting sys-path above the importing of the module does not work for me.
Here is my main.py
import multiprocessing
from testing import customfunction
customfunction(1,2,3)
if __name__ == "__main__":
    process = multiprocessing.Process(target=customfunction)
    process.start()
    process.join()
    print("DONE")
The main.py works fine up to process.start()
This means customfunction has been imported properly
Here is my testing.py
import random
def customfunction(size, test, hello):
    random.seed(size)
    print(random.random())
    return random.random()
Both main.py and testing.py are in the same folder. A separate folder with an __init__.py file did not work either.
I get this error:
from testing import customfunction
ModuleNotFoundError: No module named 'testing'
I cannot wrap my head around why the created process does not retain the system path needed to import the file. If I place the multiprocessing creation in customfunction it doesn't work either; the same error occurs.
The link I shared at the top does not work for me as well.
Thank you for taking the time to read. If you believe this is a duplicate of another question, please link it and explain, I am new to python.
EDIT:
I am using Windows 10 as my OS
I have installed Spyder 4.1.4 using Anaconda Navigator, Python 3.7.7.
I installed using a executable package.
I have tested this code on VS Code as well.
I am running this via the two IDEs mentioned (e.g. the VS Code PowerShell console and Spyder's Python console, by clicking Run).
I currently believe it is an issue specific to my computer, and I'd like to know whether it is reproducible on other Windows systems and whether the linked "solution" in the first line works. With that I may be able to pinpoint my errors.
This should work if you invoke multiprocessing.Process(target=customfunction, args=(1,2,3)) instead. I can't think of a reason why this would not work on Linux.
Can you update your question and provide the following information?
What OS are you using?
What version of Python are you running, and how was it installed?
How are you running main.py (e.g. from the command line, from an IDE, etc.)?
Any other details about your system configuration that might help others answer your question?
No multiprocessing print outputs (Spyder)
The solution mentioned here seems to solve my problem, I never expected either VS Code or Spyder's Console to have an issue with multiprocessing, but running the code in an external system terminal works.
Thank you to Melih Elibol for helping me think more clearly about the problem, I am new to python.
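For reference, the args=(1, 2, 3) suggestion from the comment above could look like the sketch below. To keep the block self-contained, customfunction is defined inline here instead of being imported from testing.py, and run_in_child is a hypothetical helper name. As noted above, this is best run from an external system terminal.

```python
import multiprocessing
import random


def customfunction(size, test, hello):
    # Same behavior as in testing.py: seed, print one value, return the next.
    random.seed(size)
    print(random.random())
    return random.random()


def run_in_child(*args):
    """Run customfunction(*args) in a child process and return its exit code."""
    process = multiprocessing.Process(target=customfunction, args=args)
    process.start()
    process.join()
    return process.exitcode


if __name__ == "__main__":
    # Everything that starts a process stays under the __main__ guard.
    print("exit code:", run_in_child(1, 2, 3))
    print("DONE")
```

Without args, the child calls customfunction() with no arguments and fails with a TypeError, independently of the import problem.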

Multiprocess not found with Pathos

This is the first time I am using the pathos library in Python under Visual Studio 2019 on Windows 10. When the debugger encounters the line
solver.SetMapper(Pool(self.Config.NumberOfProcessors).map)
I get error
ModuleNotFoundError: No module named 'multiprocess'
I have the following statement at the beginning of my code
from pathos.pools import ProcessPool as Pool
I have C++ compilers (multiple versions of Visual Studio) installed, and I used the latest version of pip to install the packages. I also see that the multiprocess package has been installed under pathos.
I see multiple questions on the same topic on the web, but I have been unable to resolve the issue.
Actually I can reproduce the same situation with a simple example like:
def foo(x):
    return x
def bar(x):
    return foo(x)
x = Pool(4).map(bar, [0, 1])
print(x)
A new installation of python 3.8.1 seems to make this issue go away
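Since a fresh installation made the error go away, one plausible cause is that the interpreter used by the debugger was not the one pip installed into. A minimal sketch for checking this from inside the failing interpreter, using only the standard library (is_installed is a hypothetical helper name):

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if this interpreter can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Shows which Python executable is actually running the code.
    print(sys.executable)
    for name in ("pathos", "multiprocess"):
        print(name, "found" if is_installed(name) else "missing")
```

If multiprocess is reported as missing here but pip lists it, the two are most likely pointing at different Python installations.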

Solution to multiprocessing error in Ipython Notebook?

I'm trying to use the multiprocessing module in Ipython Notebook, however when I execute my code, it crashes the Kernel:
from multiprocessing import Pool
def f(x):
    return x*x
p = Pool(8)
p.map(f, [1, 2, 3])
The docs provide this bit of information:
Functionality within this package requires that the __main__ module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.Pool examples will not work in the interactive interpreter.
However, the module seems to work for some, such as in this tutorial.
Other stackoverflow threads indicate that you should rearrange your code like so:
yet another confusion with multiprocessing error, 'module' object has no attribute 'f'
But I'm still getting the same error:
AttributeError: 'module' object has no attribute 'f'
I'm using Windows 8, Python 2.7.12, and the latest versions of IPython Notebook/Anaconda distribution.
Is there a definitive solution for this bug?
Is the source of the error really the lack of an if __name__ == "__main__": statement?
Is this a Windows specific issue, or does it apply to Unix as well?
