Solution to multiprocessing error in Ipython Notebook?

Solution to multiprocessing error in Ipython Notebook? - python

I'm trying to use the multiprocessing module in Ipython Notebook, however when I execute my code, it crashes the Kernel:
from multiprocessing import Pool
def f(x):
return x*x
p = Pool(8)
p.map(f, [1, 2, 3])
The docs provide this bit of information:
Functionality within this package requires that the main module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the multiprocessing.Pool examples will not work in the interactive interpreter
However, the module seems to work for some, such as in this tutorial.
Other stackoverflow threads indicate that you should rearrange your code like so:
yet another confusion with multiprocessing error, 'module' object has no attribute 'f'
But I'm still getting the same error:
AttributeError: 'module' object has no attribute 'f'
I'm using Windows 8, Python 2.7.12, and the latest versions of IPython Notebook/Anaconda distribution.
Is there a definitive solution for this bug?
Is the source of the error really the lack of if __name__ == "__main__": statement?
Is this a Windows specific issue, or does it apply to Unix as well?

Related

multiprocessing hangs in Jupyter notebook [duplicate]

I am trying to implement multiprocessing in my code, and so, I thought that I would start my learning with some examples. I used the first example found in this documentation.
from multiprocessing import Pool
def f(x):
return x*x
if __name__ == '__main__':
with Pool(5) as p:
print(p.map(f, [1, 2, 3]))
When I run the above code I get an AttributeError: can't get attribute 'f' on <module '__main__' (built-in)>. I do not know why I am getting this error. I am also using Python 3.5 if that helps.

This problem seems to be a design feature of multiprocessing.Pool. See https://bugs.python.org/issue25053. For some reason Pool does not always work with objects not defined in an imported module. So you have to write your function into a different file and import the module.
File: defs.py
def f(x):
return x*x
File: run.py
from multiprocessing import Pool
import defs
if __name__ == '__main__':
with Pool(5) as p:
print(p.map(defs.f, [1, 2, 3]))
If you use print or a different built-in function, the example should work. If this is not a bug (according to the link), the given example is chosen badly.

The multiprocessing module has a major limitation when it comes to IPython use:
Functionality within this package requires that the __main__ module be
importable by the children. [...] This means that some examples, such
as the multiprocessing.pool.Pool examples will not work in the
interactive interpreter. [from the documentation]
Fortunately, there is a fork of the multiprocessing module called multiprocess which uses dill instead of pickle to serialization and overcomes this issue conveniently.
Just install multiprocess and replace multiprocessing with multiprocess in your imports:
import multiprocess as mp
def f(x):
return x*x
with mp.Pool(5) as pool:
print(pool.map(f, [1, 2, 3, 4, 5]))
Of course, externalizing the code as suggested in this answer works as well, but I find it very inconvenient: That is not why (and how) I use IPython environments.
<tl;dr> multiprocessing does not work in IPython environments right away, use its fork multiprocess instead.

This answer is for those who get this error on Windows 10 in 2021.
I've researched this error a bit since I got it myself. I get this error when running any examples from the official Python 3 documentation on multiprocessing.
Test environment:
x86 Windows 10.0.19043.1165 + Python 3.9.2 - there is an error
x86 Windows 10.0.19043.1165 + Python 3.9.6 - there is an error
x86 Windows 10.0.19043.1110 + Python 3.9.6 - there is an error
ARM Windows 10.0.21354.1 + Python 3.9.6 - no error (version from DEV branch)
ARM macOS 11.5.2 + Python 3.9.6 - no errors
I have no way to test this situation in other conditions. But my guess is that the problem is on Windows as there is no such bug in the developer version "10.0.21354.1", but this ARM version probably has x86 emulation.
Also note that there was no such bug at the time Python 3.9.2 was released (February). Since all this time I was working on the same computer, I was surprised by the situation when the previously working code stopped working, and only the version for Windows changed.
I was unable to find a bug request with a similar situation in the Python bug tracker (I probably did a poor search). And the message marked "Correct answer" refers to a different situation. The problem is easy to reproduce, you can try to follow any example from the multiprocessing documentation on a freshly installed Windows 10 + Python 3.
Later, I will have the opportunity to check out Python 3.10 and the latest version of Windows 10.
I am also interested in this situation in the context of Windows 11.
If you have information about this error (link to the bug tracker or something similar), be sure to share it.
At the moment I switched to Linux to continue working.

Why not use joblib? Your code is equivalent to:
# pip install joblib
from joblib import Parallel, delayed
def f(x):
return x*x
res = Parallel(
n_jobs=5
)(
delayed(f)(x) for x in [1, 2, 3]
)

If you're using Jupyter notebook (like the OP), then defining the function in a separate cell and executing that cell first fixes the problem. The accepted answer works too, but it's more work. Defining the function before, i.e. above the pool, isn't adequate. It has to be in a completely different notebook cell which is executed first.

ModuleNotFoundError when using multiprocessing

module path lost in multiprocessing spawn (ModuleNotFoundError)
The so-called solution of inserting sys-path above the importing of the module does not work for me.
Here is my main.py
import multiprocessing
from testing import customfunction
customfunction(1,2,3)
if __name__ == "__main__":
process = multiprocessing.Process(target=customfunction)
process.start()
process.join()
print("DONE")
The main.py works fine up to process.start()
This means customfunction has been imported properly
Here is my testing.py
import random
def customfunction(size, test, hello):
random.seed(size)
print(random.random())
return random.random()
Both main.py and testing.py are in the same folder. A separate folder with an init.py file did not work as well.
I get this error:
from testing import customfunction
ModuleNotFoundError: No module named 'testing'
I can not wrap my head around why does the process created not retain the system pathing in order to import the file. If i place the multiprocessing creation in customfunction it doesn't work either, the same error occurs.
The link I shared at the top does not work for me as well.
Thank you for taking the time to read. If you believe this is a duplicate of another question, please link it and explain, I am new to python.
EDIT:
I am using Windows 10 as my OS
I have installed Spyder 4.1.4 using Anaconda Navigator, Python 3.7.7.
I installed using a executable package.
I have tested this code on VS Code as well.
I am running this via the two IDEs mentioned(E.G VS Code Powershell console and Spyder's Python Console by clicking run)
I generally currently believe it is an issue specific to my computer, and I'll like to know if its replicable in other Windows Systems and whether or not the linked "solution" in the first line works. With that I may be able to pinpoint my errors

This should work if you invoke multiprocessing.Process(target=customfunction, args=(1,2,3)) instead. I can't think of a reason why this would not work on Linux.
Can you update your question and provide the following information?
What OS are you using?
What version of Python are you running, and how was it installed?
How are you running main.py (e.g. from the command line, from an IDE, etc.)?
Any other details about your system configuration that might help others answer your question?

No multiprocessing print outputs (Spyder)
The solution mentioned here seems to solve my problem, I never expected either VS Code or Spyder's Console to have an issue with multiprocessing, but running the code in an external system terminal works.
Thank you to Melih Elibol for helping me think more clearly about the problem, I am new to python.

Cant get attribute error when using Python multiprocessing library [duplicate]

I am trying to implement multiprocessing in my code, and so, I thought that I would start my learning with some examples. I used the first example found in this documentation.
from multiprocessing import Pool
def f(x):
return x*x
if __name__ == '__main__':
with Pool(5) as p:
print(p.map(f, [1, 2, 3]))
When I run the above code I get an AttributeError: can't get attribute 'f' on <module '__main__' (built-in)>. I do not know why I am getting this error. I am also using Python 3.5 if that helps.

This problem seems to be a design feature of multiprocessing.Pool. See https://bugs.python.org/issue25053. For some reason Pool does not always work with objects not defined in an imported module. So you have to write your function into a different file and import the module.
File: defs.py
def f(x):
return x*x
File: run.py
from multiprocessing import Pool
import defs
if __name__ == '__main__':
with Pool(5) as p:
print(p.map(defs.f, [1, 2, 3]))
If you use print or a different built-in function, the example should work. If this is not a bug (according to the link), the given example is chosen badly.

The multiprocessing module has a major limitation when it comes to IPython use:
Functionality within this package requires that the __main__ module be
importable by the children. [...] This means that some examples, such
as the multiprocessing.pool.Pool examples will not work in the
interactive interpreter. [from the documentation]
Fortunately, there is a fork of the multiprocessing module called multiprocess which uses dill instead of pickle to serialization and overcomes this issue conveniently.
Just install multiprocess and replace multiprocessing with multiprocess in your imports:
import multiprocess as mp
def f(x):
return x*x
with mp.Pool(5) as pool:
print(pool.map(f, [1, 2, 3, 4, 5]))
Of course, externalizing the code as suggested in this answer works as well, but I find it very inconvenient: That is not why (and how) I use IPython environments.
<tl;dr> multiprocessing does not work in IPython environments right away, use its fork multiprocess instead.

Why not use joblib? Your code is equivalent to:
# pip install joblib
from joblib import Parallel, delayed
def f(x):
return x*x
res = Parallel(
n_jobs=5
)(
delayed(f)(x) for x in [1, 2, 3]
)

If you're using Jupyter notebook (like the OP), then defining the function in a separate cell and executing that cell first fixes the problem. The accepted answer works too, but it's more work. Defining the function before, i.e. above the pool, isn't adequate. It has to be in a completely different notebook cell which is executed first.

Python multiprocessing ImportError

I’m having trouble using python’s multiprocessing module. This is the first time I’ve tried using the module. I’ve tried simplifying my processing to the bare bones, but keep getting the same error. I’m using python 2.7.2, and Windows 7.
The script I’m trying to run is called learnmp.py, and the error message says that the problem is that it can't find module learnmp.
import multiprocessing
def doSomething():
"""worker function"""
print 'something'
return
if __name__ == '__main__':
jobs = []
for i in range(2):
p = multiprocessing.Process(target=doSomething)
jobs.append(p)
p.start()
The error is :
File “<string>”, line 1, in <module> File “C:\Python27\ArcGISx6410.1\lib\multiprocessing\forking.py”, line 373,
in main prepare(preparation_data) File “C:\Python27\ArcGISx6410.1\lib\multiprocessing\forking.py”, line 482,
in prepare file, path_name, etc = imp.find_module (main_name, dirs)
ImportError: No module named learnmp
What’s causing the error, and how can I solve it?
EDIT: I still don't know what was causing the error, but changing the file name eliminated it.

I know it's been a while, but I ran into this same error, also using the version of Python distributed with ArcGIS, and I've found a solution which at least worked in my case.
The problem that I had was that I was calling my program name, Test.py, as test.py. Note the difference in case.
c:\python27\arcgisx6410.2\python.exe c:\temp\test.py
c:\python27\arcgisx6410.2\python.exe c:\temp\Test.py
This isn't normally an issue if you're not using the multiprocessing library. However, when you write:
if __name__ == '__main__':
what appears to be happening is that the part of the program in main is being bound to the name of the python file. In my case that was test. However, there is no test, just Test. So although Windows will allow case-incorrect filenames in cmd, PowerShell, and in batch files, Python's multiprocessing library balks at this and throws a nasty series of errors.
Hopefully this helps someone.

Looks like you might be going down a rabbit-hole looking into multiprocessing. As the traceback shows, your python install is trying to look in the ArcGIS version of python before actually looking at your system install.
My guess is that the version of python that ships with ArcGIS is slightly customized for some reason or another and can't find your python script. The question then becomes:
Why is your Windows machine looking in ArcGIS for python?
Without looking at your machine at a slightly lower level I can't quite be sure, but if I had to guess, you probably added the ArcGIS directory to your PATH variable in front of the standard python directory, so it looks in ArcGIS first. If you move the ArcGIS path to the end of your PATH variable it should resolve the problem.
Changing your PATH variable: http://www.computerhope.com/issues/ch000549.htm

Microsoft Visual C++ 9.0 is required for some python modules to work in windows,so download below package it will work.
http://aka.ms/vcpython27
This package contains the compiler and set of system headers necessary for producing binary wheels for Python 2.7 packages.

Python ImportError for strptime in spyder for windows 7

I can't for the life of me figure out what is causing this very odd error.
I am running a script in python 2.7 in the spyder IDE for windows 7. It uses datetime.datetime.strptime at one point. I can run the code once and it seems fine (although I haven't finished debugging, so exceptions have been raised and it hasn't completed normally yet), then if I try running it again I get the following (end of traceback only is shown):
File "C:\path\to\test.py", line 220, in std_imp
self.data[key].append(dt.datetime.strptime(string_var, string_format_var))
ImportError: Failed to import _strptime because the import lockis held by another thread.
I am not running multiple threads with Threading etc. The only way to get the code to make it past this point is to completely restart the computer. Restarting spyder won't work. Web searches haven't seemed to yield any clues or indications of others who have had this happen.
Does anyone understand what is going on? Is this some sort of GIL problem? What is the import lock, and why does it seem to be preventing me from importing this method of the datetime module once I have already tried running the code once?

The solution, as noted by mfitzp, was to include a dummy call to datetime.datetime.strptime at the beginning of the script.
e.g.
# This is a throwaway variable to deal with a python bug
throwaway = datetime.datetime.strptime('20110101','%Y%m%d')

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.