How do you run multiprocessing pool.starmap outside of __main__ function? - python

All the documentation I have read says that the Pool should be guarded by a check that the script is running as main; otherwise, there is potential for an infinite loop.
What I see online to do:
if __name__ == "__main__":
    with Pool(processes=5) as pool:
        output = pool.starmap(test_func, list(tuples))
However, I am running the multiprocessing library in a sub-module (not 'main') and not getting any errors (note that I am running it through a Jupyter notebook). I do get an infinite loop when trying to run some integration tests I made with the unittest library.
How can I run the multiprocessing step in a sub-module?
How I am currently doing it:
with Pool(processes=5) as pool:
    output = pool.starmap(test_func, list(tuples))

Since I run most of my code in a Jupyter notebook, I have run into this same problem. From what I have read, Jupyter has a problem with its namespace when the worker processes try to find a function that is defined inside the notebook itself.
The fix for me was to place the function I wanted to run in a separate module.py file, import the function from there, and run the pool that way. I don't know why, but that works when running in Jupyter.
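For illustration, a minimal sketch of that workaround; the file name worker.py and the function body are placeholders:

# worker.py -- the function must live in an importable module,
# not in the notebook itself, so the spawned workers can import it
def test_func(a, b):
    return a + b

Then, in the notebook cell:

from multiprocessing import Pool
from worker import test_func  # imported, not defined in the notebook

tuples = [(1, 2), (3, 4), (5, 6)]
with Pool(processes=5) as pool:
    output = pool.starmap(test_func, tuples)
print(output)  # [3, 7, 11]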

Related

Python interpreter complains about attempt to start new process before current process has finished bootstrapping phase [duplicate]

I am confused about using freeze_support() for multiprocessing, and I get a RuntimeError without it. I am only running a script, not defining a function or a module. Can I still use it? Or should the packages I'm importing be using it?
Here is the documentation.
Note that the specific issue is about scikit-learn calling GridSearchCV, which tries to spawn processes in parallel. I am not sure if my script needs to be frozen for this, or if some of the code that is called (from the Anaconda distro) does. If details are relevant to this question, please head over to the more specific question.
On Windows, all of your multiprocessing-using code must be guarded by if __name__ == "__main__":
So to be safe, I would put all of the code currently at the top level of your script in a main() function, and then just do this at the top level:
if __name__ == "__main__":
    main()
See the "Safe importing of main module" sub-section here for an explanation of why this is necessary. You probably don't need to call freeze_support at all, though it won't hurt anything to include it.
Note that it's a best practice to use the if __name__ == "__main__" guard for scripts anyway, so that code isn't unexpectedly executed if you find you need to import your script into another script at some point in the future.
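For illustration, a minimal sketch of that refactor; the worker function is a placeholder:

# my_script.py -- everything that used to sit at the top level
# now lives in main(), so importing this module has no side effects
from multiprocessing import Pool, freeze_support

def square(x):
    return x * x

def main():
    with Pool(processes=4) as pool:
        print(pool.map(square, [1, 2, 3]))  # [1, 4, 9]

if __name__ == "__main__":
    freeze_support()  # a no-op unless the script is frozen into a
                      # Windows executable, so it is safe to include
    main()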

python multiprocessing pool.map hangs

I cannot make even the simplest of examples of parallel processing using the multiprocessing package run in Python 2.7 (using Spyder as a UI on Windows), and I need help figuring out the issue. I have run conda update, so all of the packages should be up to date and compatible.
Even the first example in the multiprocessing package documentation (given below) won't work; it generates 4 new processes, but the console just hangs. I have tried everything I can find over the last 3 days, but none of the code that runs without hanging will allocate more than 25% of my computing power to this task (I have a 4-core computer).
I have given up on running the procedure I designed and need parallel processing for; at this point I am only trying to get a proof of concept so I can build from there. Can someone explain and point me in the right direction? Thanks
Example 1 from https://docs.python.org/2/library/multiprocessing.html
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    p = Pool()
    print(p.map(f, [1, 2, 3]))
Example 2 (modified from original) from http://chriskiehl.com/article/parallelism-in-one-line/
from multiprocessing import Pool

def fn(i):
    return [i, i*i, i*i*i]

test = range(10)

if __name__ == '__main__':
    pool = Pool()
    # note: this calls the built-in map, not pool.map, so nothing
    # here actually runs in parallel
    results = map(fn, test)
    pool.close()
    pool.join()
I apologize if there is indeed an answer to this, as it seems as though I should be able to manage such a modest task, but I am not a programmer, and the resources I have found have been less than helpful given my very limited level of knowledge. Please let me know what further information is needed.
Thank you.
After installing Spyder on my virtual machine, it seems to be a Spyder-specific bug. Example 1 works in IDLE, when executed via the command line, and when executed from within Spyder (first saved and then run), but not when executed line by line in Spyder.
I would suggest simply creating a new file in Spyder, adding the lines of code, saving it, and then running it.
For related reports see:
https://groups.google.com/forum/#!topic/spyderlib/LP5d8QZTXd0
QtConsole in Spyder cannot use multiprocessing.Manager
Multiprocessing working in Python but not in iPython
https://github.com/spyder-ide/spyder/issues/1900

Puzzling behaviour of __name__ variable using multiprocessing (Python 2.7)

Today I was messing with the multiprocessing library, and I noticed something weird. I was trying to figure out if it was possible to have nested scripts using multiprocessing (a script that uses multiprocessing to run a part of a script, which in turn uses multiprocessing to run more parts of the script). To figure this out, I started looking at what the __name__ variable is for the child scripts, because, if you are familiar with multiprocessing, you know this is going to be a problem.
When I wrote a test script, the result surprised me. Here is the simple script:
from multiprocessing import Pool

def Child(Inf):
    print "Child" + __name__

if __name__ == "__main__":
    Process = Pool(4)
    Process.map(Child, [0, 0, 0, 0])
    print "Parent" + __name__
(don't mind the list of four zeros)
The console printed out this:
Child__main__
Child__main__
Child__main__
Child__main__
Parent__main__
This means that the __name__ of the child processes is also __main__.
if __name__ == "__main__":
This is the part that puzzles me. After testing, it appeared that the child process gets run 4 times, while the if statement only gets run once. This makes sense when reading the code, but the testing shows that all processes report the same name, and the computer shouldn't be able to tell the difference between them, because the child looks at a variable that is no different from the parent's.
I am puzzled by this, because I thought I understood how putting in the if statement prevents the child processes from running the main program as well, but this appears to be untrue.
Am I missing an important clue, or is this just something weird I shouldn't be looking at? :p
Regards,
Harm
What happens is that each process is not fed the same input.
The parent process receives and executes the full .py file you run, while child processes fork from the parent with certain functions loaded into memory and are asked to run one specific function instead of the entire program (which would lead to infinite recursion...).
The fact that the __name__ variable is the same is because each child process is a copy of the parent; they are just at different execution points.
On Windows OS:
I didn't notice it until now, but somehow Python runs the code again when creating multiple processes on Windows.
On Python 3.5 (maybe other versions of Python 3 too, but I didn't check), it will set the __name__ variable to __mp_main__ and avoid the problem.
On Python 2.7, if the value is really __main__, the only explanation I have is that the variable is being set after the input code runs. Otherwise the if block would be executed as well.
Edit: I just tested on Python 2.7.11 and the __name__ variable is set to __parents_main__ before being set to __main__. I would not rely on this because it was changed on Python 3 as you can see.
There is no reason the __name__ should be different at all. It will just give you the name of the module (or __main__ in case of the program itself) where the code using it resides. And you are always calling it either from the original program or from a forked version of it.
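A small Python 3 sketch that makes this visible on Windows, where the spawn start method re-imports the main module; the exact print interleaving will vary, and on Unix with the default fork start method the children simply inherit __main__, matching the Python 2.7 behaviour above:

# name_demo.py -- save and run as a script with Python 3 on Windows
from multiprocessing import Pool

# runs once in the parent and once in every worker process that
# re-imports this module under the spawn start method
print("importing, __name__ =", __name__)

def child(_):
    return __name__  # what the worker sees when the function runs

if __name__ == "__main__":
    with Pool(2) as pool:
        # expect ['__mp_main__', '__mp_main__'] under spawn
        print(pool.map(child, [0, 0]))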

Python Multiprocessing RuntimeError on Windows

I have a class (in a file, let's call it "alpha.py") with a function that uses multiprocessing (processes=2) to fork a process; the class is part of a Python package that I wrote. In a separate Python script (let's call it "beta.py"), I instantiated an object from this class and called the corresponding function that uses multiprocessing. Finally, all of this is wrapped inside a wrapper Python script (let's call this "gamma.py") that handles many different class objects and functions.
Essentially:
Run ./gamma.py from the command line
gamma.py uses subprocess and executes beta.py
beta.py instantiates an object from the alpha.py class and calls the function which uses multiprocessing (processes=2)
This runs with no problems on Mac or Linux. However, it becomes a problem on a Windows machine, and the error (and the documentation) suggest that I should write this somewhere:
if __name__ == '__main__':
    freeze_support()
This other post also mentions doing the same thing.
However, I don't know exactly where these two lines should live. Currently, none of alpha.py, beta.py, or gamma.py contains an if __name__ == '__main__': section. It would be great if somebody could tell me where these two lines should go, and also the rationale behind it.
Actually, freeze_support() is not needed here. You get a RuntimeError because you create and start your new processes at the top level of your beta module.
When a new process is created using multiprocessing on Windows, a new Python interpreter is started in that process, and it tries to import the module containing the target function that should be executed. This is your beta module. When it is imported, all of its top-level statements are executed, which causes a new process to be created and started again, and then, recursively, another process from that process, and so on and so forth.
This is most likely not what you want; new processes should be initialized and started only once, when you run beta.py directly with a subprocess.
The if __name__ == '__main__': guard should be placed in beta.py, and the initialization and start code for your new processes should be moved into that section. After that, when beta.py is imported rather than run directly, no new processes will be started and you will not see any side effects.
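For illustration, a sketch of what beta.py could look like after that change; AlphaClass and run_parallel are placeholders for whatever alpha.py actually defines:

# beta.py
from alpha import AlphaClass  # hypothetical class from alpha.py

def main():
    obj = AlphaClass()
    obj.run_parallel()  # the method that internally uses Pool(processes=2)

if __name__ == '__main__':
    # reached when gamma.py runs "python beta.py" via subprocess,
    # but NOT when a spawned worker re-imports this module on Windows
    main()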

Python multiprocessing Pool start multiple GUI

All,
In my GUI I'm using multiprocessing to run a function, but the pool starts multiple copies of the GUI.
I have read that people add if __name__ == '__main__': to their code, and it seems to work.
But I don't know if this trick will work in my case, or where I have to insert this code.
The function run_func() is launched by a button in the GUI.
How can I block this multiple start?
I have a second question:
How can I unimport setup at the end of the exec?
Thanks a lot!
@pyqtSlot()
def run_func():
    run = """
import os
import sys
from setup import *
print('toto')
print('titi')
"""
    from multiprocessing import Pool
    pool = Pool(processes=4)
    asyncResult = pool.apply_async(exec(run), {}, {}), range(1)
You don't provide much context for your question. Anyway, I made a test removing the from setup import * part from the run string, and it all ran well; hence, it is not a PyQt problem. More likely, at some point you are executing again the module/function that runs the GUI.
For the first question:
I recommend using a debugger and setting some breakpoints where the GUI is launched; then you can use the call stack to figure out what is calling your GUI. That way you will know where the 'main' code block goes, and even whether it is really necessary.
As the debugger, use pdb with in-code breakpoints (remember you are running multiple processes) and put the line:
import pdb; pdb.set_trace()
wherever you want to set a breakpoint.
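Once you know where the application entry point is, the usual fix is to guard it there; a minimal sketch, assuming a hypothetical PyQt5 widget class MainWindow that owns the run_func slot:

# app entry point (module and class names are placeholders)
import sys
from PyQt5.QtWidgets import QApplication
from mywindow import MainWindow  # hypothetical module with your GUI class

if __name__ == '__main__':
    # without this guard, each worker process that re-imports the
    # module on Windows would build and show its own copy of the GUI
    app = QApplication(sys.argv)
    window = MainWindow()
    window.show()
    sys.exit(app.exec_())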
For the second question:
See How do I unload (reload) a Python module?
