Today I was messing with the multiprocessing library, and I noticed something weird. I was trying to figure out whether it is possible to nest scripts using multiprocessing (a script that uses multiprocessing to run part of a script, which in turn uses multiprocessing to run more parts of the script). To figure this out, I started looking at what the __name__ variable is for the child scripts, because, if you are familiar with multiprocessing, you know this is going to be a problem.
When I wrote a test script, the result surprised me. This is the simple script I wrote:
from multiprocessing import Pool

def Child(Inf):
    print "Child" + __name__

if __name__ == "__main__":
    Process = Pool(4)
    Process.map(Child, [0, 0, 0, 0])
    print "Parent" + __name__
(don't mind the list of four zeros)
The console printed out this:
Child__main__
Child__main__
Child__main__
Child__main__
Parent__main__
This means that the __name__ of the child processes is also __main__.
if __name__ == "__main__":
This is the part that puzzles me. After testing, it appears that the child function gets run 4 times, while the if block only gets run once. That makes sense when reading the code, but the testing shows that all the processes are called the same, and the computer shouldn't be able to tell the difference between them, because the child looks at a variable that is no different from the parent's.
I am puzzled by this, because I thought I understood how putting in the if statement prevents the child processes from running the main program as well, but this appears to be untrue.
Am I missing an important clue, or is this just something weird I shouldn't be looking at? :p
Regards,
Harm
What happens is that the processes are not all fed the same input.
The parent process receives and executes the full .py file you run, while the child processes fork from the parent with certain functions already loaded into memory and are asked to run one specific function instead of the entire program (which would lead to infinite recursion...).
The __name__ variable is the same because each child process is a copy of the parent; they are just at different execution points.
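To illustrate the "copy of the parent" point, here is a minimal sketch (my own example, not from the question): a module-level value rebound in the parent before the pool is created is visible to forked workers, because they start as a copy of the parent's memory, whereas spawned workers (the Windows behaviour) re-import the module and see the original value.

from multiprocessing import Pool

counter = 0                    # module-level state

def show(_):
    return counter             # what each worker sees

if __name__ == "__main__":
    counter = 42               # rebinding happens only in the running parent
    pool = Pool(2)
    print(pool.map(show, [0, 0]))   # fork (Linux default): [42, 42]; spawn (Windows): [0, 0]
    pool.close()
    pool.join()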
On Windows OS:
I didn't notice until now, but on Windows Python somehow runs the code again when creating multiple processes.
On Python 3.5 (maybe other versions of Python 3 too, but I didn't check), it will set the __name__ variable to __mp_main__ and avoid the problem.
On Python 2.7, if the value is really __main__, the only explanation I have is that the variable is being set after the input code runs. Otherwise the if block would be executed as well.
Edit: I just tested on Python 2.7.11 and the __name__ variable is set to __parents_main__ before being set to __main__. I would not rely on this, because it was changed in Python 3 as you can see.
There is no reason __name__ should be different at all. It just gives you the name of the module where the code using it resides (or __main__ in the case of the program itself). And you are always reading it either from the original program or from a forked copy of it.
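A tiny illustration of that (the file names here are made up): the same attribute yields different values depending on how the file is reached.

# helpers.py
print(__name__)        # prints "helpers" when imported, "__main__" when run directly

# main.py
import helpers         # the import triggers the line above and prints "helpers"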
Related
I have two Python scripts: one script that triggers an API call, and another script that I want to read information from that call. That is, let's say script1 has a variable "word" that it grabs during its API call, and I want script2 to be able to read it and print it.
I tried using the method of importing (this is pseudo-code):
import script1
print script1.word
The problem is that the first script is a "polling" script, which activates and stays on. So when I imported it, as soon as I did anything in script2 that touched the import from script1, the polling function was activated as part of script2's run, and then I had two scripts running the API.
What I want is for script1 to just store the variable and for script2 to print it (or be able to use it however I want as a normal variable/object in script2).
I hope this makes sense. I can't come up with a simple example to paste here because of the API, so I'm having difficulty making this question clearer, and I'd be happy to answer any questions.
Your first script needs to put its actual polling in a block like this:
if __name__ == '__main__':
    # do polling stuff
so that the polling logic runs when the script itself is executed, but not when it's imported.
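For example, a minimal sketch (the variable and function names are just placeholders for whatever the real polling code does):

# script1.py
word = "hello"               # the value script2 wants to read

def poll_api():
    pass                     # the long-running polling loop would live here

if __name__ == "__main__":
    poll_api()               # only runs when script1.py is executed directly

# script2.py
import script1               # importing no longer starts the polling loop

print(script1.word)          # prints "hello"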
I am confused about using freeze_support() for multiprocessing, and I get a RuntimeError without it. I am only running a script, not defining a function or a module. Can I still use it? Or should the packages I'm importing be using it?
Here is the documentation.
Note that the specific issue is about scikit-learn calling GridSearchCV, which tries to spawn processes in parallel. I am not sure whether my script needs to be frozen for this, or some of the code that's called (from the Anaconda distro). If details are relevant to this question, please head over to the more specific question.
On Windows all of your multiprocessing-using code must be guarded by if __name__ == "__main__":
So, to be safe, I would put all of the code currently at the top level of your script in a main() function, and then just do this at the top level:
if __name__ == "__main__":
main()
See the "Safe importing of main module" sub-section here for an explanation of why this is necessary. You probably don't need to call freeze_support at all, though it won't hurt anything to include it.
Note that it's a best practice to use the if __name__ == "__main__" guard for scripts anyway, so that code isn't unexpectedly executed if you find you need to import your script into another script at some point in the future.
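Applied to the scikit-learn case, the restructuring might look roughly like this (the estimator, parameter grid, and dataset are illustrative, not taken from the asker's script; older scikit-learn releases expose GridSearchCV from sklearn.grid_search rather than sklearn.model_selection):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def main():
    X, y = load_iris(return_X_y=True)
    search = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, n_jobs=-1)
    search.fit(X, y)               # the parallel worker processes are created here
    print(search.best_params_)

if __name__ == "__main__":
    main()                         # nothing heavy runs when the module is merely imported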
I have a class function (let's call it "alpha.py") that uses multiprocessing (processes=2) to fork a process and is part of a Python package that I wrote. In a separate Python script (let's call it "beta.py"), I instantiated an object from this class and called the corresponding function that uses multiprocessing. Finally, all of this is wrapped inside a wrapper Python script (let's call this "gamma.py") that handles many different class objects and functions.
Essentially:
Run ./gamma.py from the command line
gamma.py uses subprocess and executes beta.py
beta.py instantiates an object from the alpha.py class and calls the function which uses multiprocessing (processes=2)
This has no problems being run on a Mac or Linux. However, it becomes a problem on a Windows machine and the error (and documentation) suggests that I should write this somewhere:
if __name__ == '__main__':
    freeze_support()
This other post also mentions doing the same thing.
However, I don't know exactly where these two lines should reside. Currently, none of alpha.py, beta.py, or gamma.py contains an if __name__ == '__main__': section. It would be great if somebody could tell me where these two lines should go, and also the rationale behind it.
Actually, freeze_support() is not needed here. You get a RuntimeError because you create and start your new processes at the top level of your beta module.
When a new process is created with multiprocessing on Windows, a new Python interpreter is started in that process, and it tries to import the module containing the target function to be executed. This is your beta module. When it is imported, all of its top-level statements are executed, which causes a new process to be created and started again, and then, recursively, another process from that process, and so on and so forth.
This is most likely not what you want, so new processes should be initialized and started only once, when you run beta.py directly as a subprocess.
The if __name__ == '__main__': guard should be placed in beta.py; move the initialization and start code for your new processes into that section. After that, when beta.py is imported rather than run directly, no new processes will be started and you will not see any side effects.
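Concretely, beta.py might end up looking like this (the class and method names below are placeholders, since the real ones from alpha.py aren't shown):

# beta.py
from alpha import Alpha        # hypothetical class that uses multiprocessing internally

def main():
    obj = Alpha()
    obj.run_parallel()         # the method that spawns the two worker processes

if __name__ == "__main__":
    main()                     # executed when gamma.py runs beta.py as a subprocess,
                               # but not when a Windows child re-imports beta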
I am writing some multiprocessing code (Python 2.6.4, WinXP) that spawns processes to run background tasks. In playing around with some trivial examples, I am running into an issue where my code just continuously spawns new processes, even though I only tell it to spawn a fixed number.
The program itself runs fine, but if I look in Windows TaskManager, I keep seeing new 'python.exe' processes appear. They just keep spawning more and more as the program runs (eventually starving my machine).
For example, I would expect the code below to launch 2 python.exe processes: the first being the program itself, and the second being the child process it spawns. Any idea what I am doing wrong?
import time
import multiprocessing

class Agent(multiprocessing.Process):

    def __init__(self, i):
        multiprocessing.Process.__init__(self)
        self.i = i

    def run(self):
        while True:
            print 'hello from %i' % self.i
            time.sleep(1)

agent = Agent(1)
agent.start()
It looks like you didn't carefully follow the guidelines in the documentation, specifically this section where it talks about "Safe importing of main module".
You need to protect your launch code with an if __name__ == '__main__': block or you'll get what you're getting, I believe.
I believe it comes down to the multiprocessing module not being able to use os.fork() as it does on Linux, where an already-running process is basically cloned in memory. On Windows (which has no such fork()) it must run a new Python interpreter and tell it to import your main module and then execute the start/run method once that's done. If you have code at "module level", unprotected by the name check, then during the import it starts the whole sequence over again, ad infinitum.
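For reference, a guarded version of the example might look like this (the same code, with only the launch moved under the name check):

import time
import multiprocessing

class Agent(multiprocessing.Process):

    def __init__(self, i):
        multiprocessing.Process.__init__(self)
        self.i = i

    def run(self):
        while True:
            print 'hello from %i' % self.i
            time.sleep(1)

if __name__ == '__main__':
    agent = Agent(1)       # on Windows the child's re-import no longer reaches this line
    agent.start()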
When I run this in Linux with python2.6, I see a maximum of 4 python2.6 processes and I can't guarantee that they're all from this process. They're definitely not filling up the machine.
Do you need a newer Python version? Or is this a Linux/Windows difference?
I don't see anything wrong with that. Works fine on Ubuntu 9.10 (Python 2.6.4).
Are you sure you don't have cron or something starting multiple copies of your script? Or that the spawned script is not calling anything that would start a new instance, for example as a side effect of import if your code runs directly on import?