Python multiprocessing: Why does using Process run my program from the start?

I was having some trouble figuring out why my console would always print the print statement I had at the start of my file. Here's what it looks like:
from multiprocessing import Process
import time
print('hello')  # why does this get printed over and over again?
def func1(num):
    print(num ** 2)
    time.sleep(1)
def func2(num):
    print(num ** 3)
    time.sleep(1)
if __name__ == '__main__':
    counter = 0
    while counter < 10:
        proc1 = Process(target=func1, args=[2])
        proc2 = Process(target=func2, args=[2])
        proc1.start()
        proc2.start()
        proc1.join()
        proc2.join()
        counter += 1
Once I run it, it prints "hello" on every loop. I'm sure I'm just making a dumb mistake, but any help would be great. Thanks.

multiprocessing can fork an existing process or spawn a new process, depending on which start methods your operating system supports. On Windows, which can only spawn (execute a new process), a new instance of Python is started. That instance imports your main module and then rebuilds the state it needs from pickled data sent by the parent, chiefly the target function and its arguments. In theory, that is just enough to get the environment right for the subprocess.
In your case, print is at module level, so it is executed as part of the import in the subprocess. If this is the "__main__" module (the script you run), you can simply move that print inside the if __name__ == "__main__": block. When the file is imported as a module instead of executed as a script, that print won't run.
If it's not the main script module, well, that's messy. The general rule is that modules should be importable without side effects, and that print is a side effect. Best to remove it in that case.
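To make the import behaviour concrete, here is a small simulation rather than real multiprocessing. It assumes that a spawned child re-executes the main script under the name __mp_main__ (which is what CPython's spawn start method does via runpy.run_path); the script contents and the helper run_script_as are hypothetical, and no child process is actually started.

```python
import contextlib
import io
import os
import runpy
import tempfile
import textwrap

# Hypothetical script with one unguarded and one guarded print.
SCRIPT = textwrap.dedent("""\
    print('hello')                 # module level: runs on every import
    if __name__ == '__main__':
        print('main only')         # skipped when imported under another name
""")


def run_script_as(run_name):
    """Execute SCRIPT from a real file under the given __name__ and
    return the lines it printed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_script.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            runpy.run_path(path, run_name=run_name)
    return buf.getvalue().splitlines()


print(run_script_as("__main__"))     # ['hello', 'main only']: running the script
print(run_script_as("__mp_main__"))  # ['hello']: roughly what a spawned child runs
```

The unguarded print fires in both cases, which is why "hello" shows up once per child on Windows.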

Related

Python Multiprocessing Looping Python File Instead of Starting Process

I'm trying to get started with multiprocessing, and I'm running into some interesting issues. The code I'm using is below (for the record, this example is straight from the multiprocessing documentation):
from multiprocessing import Process
def f(name):
    print('hello', name)
if __name__ == '__main__':
    p = Process(target=f, args=('bob'))
    p.start()
    p.join()
This works fine, and prints "hello bob" as it should. When I add any additional code to the file, though, before or after the if statement, p never seems to run, and my file loops back to the beginning and runs all over again endlessly. For example, the following code gives me this issue:
from multiprocessing import Process
def f(name):
    print('hello', name)
if __name__ == '__main__':
    p = Process(target=f, args=('bob'))
    p.start()
    p.join()
test_input = input("test input")
I am running Python on Windows 10, with PyCharm v. 2021.3.2 and Python 3.10.0. Is this an issue that any of you have seen before? At this point I'm starting to wonder if it's an issue between Windows and PyCharm or Windows and Python, or maybe just a case of inexperience on my part.
Thank you!
That if __name__ == '__main__': guard is important. On systems that don't use fork, multiprocessing simulates a fork by importing the main script in each worker process without naming it __main__ (it's named __mp_main__ IIRC). Any code that should only run in the "main" script needs to be protected by that guard (it can be indirectly, by defining a function and calling it within the guarded segment; the function will be defined in the workers, but not run).
So to fix this, all you need to do is indent the test_input = input("test input") so it's protected by the if __name__ == '__main__': guard. In real code, I try to keep the guarded section clean (so I can't accidentally write functions that rely on global state that doesn't exist when it's not run as the main script, and for the mild performance benefits of using function locals over globals), so I'd write it like:
from multiprocessing import Process
def f(name):
    print('hello', name)
def main():
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
    test_input = input("test input")
if __name__ == '__main__':
    main()
but that's not strictly necessary.
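The point that a function "will be defined in the workers, but not run" can be checked without starting a process. This sketch assumes, as above, that a spawned child re-runs the script under the name __mp_main__; it uses runpy.run_path, which returns the executed module's globals. The script text and the helper import_like_child are hypothetical stand-ins.

```python
import contextlib
import io
import os
import runpy
import tempfile
import textwrap

# Hypothetical script in the "define main(), call it under the guard" layout.
SCRIPT = textwrap.dedent("""\
    def main():
        print('starting workers')  # stands in for Process(...).start() etc.

    if __name__ == '__main__':
        main()
""")


def import_like_child():
    """Run SCRIPT the way a spawned child would (under '__mp_main__') and
    return (captured output, module globals)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "guarded_main.py")
        with open(path, "w") as fh:
            fh.write(SCRIPT)
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            module_globals = runpy.run_path(path, run_name="__mp_main__")
    return buf.getvalue(), module_globals


output, module_globals = import_like_child()
print(repr(output))                      # '': main() was never called
print(callable(module_globals["main"]))  # True: but it is defined
```

Keeping everything inside main() means the child only gets definitions, which is exactly what the worker needs.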
I thought I would elaborate on ShadowRanger's answer:
On Windows systems, new subprocesses are created by the following steps:
1. A new process is created in which the Python interpreter is relaunched.
2. The Python interpreter re-interprets the current source program, executing everything at global scope in order to compile function definitions, initialize global variables, etc.
3. Finally, your worker function, f in this case, is invoked with memory thus initialized.
The reason for placing the code that creates the subprocess inside a block governed by if __name__ == '__main__': is that, if you didn't, step 2 above would put you into a recursive loop creating new subprocesses ad infinitum. (Python 3's spawn implementation detects this situation and raises a RuntimeError in the child instead of recursing forever.) The key point is that only in the main process will the variable __name__ have the value '__main__'; it will have a different value in any subprocess that is created. So the code that creates the new subprocess, i.e. p = Process(target=f, args=('bob',)), will not be executed as part of the subprocess's initialization.
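The recursion in step 2 can be modelled in-process. This is a toy model, not how multiprocessing is implemented: a hypothetical start() stands in for Process(...).start(), and every call re-executes the script as a "child" under the name __mp_main__. The max_depth cap is an assumption added only so the unguarded case terminates.

```python
import os
import runpy
import tempfile
import textwrap

# Two hypothetical scripts; start() is a stand-in for Process(...).start().
UNGUARDED = "start()\n"
GUARDED = textwrap.dedent("""\
    if __name__ == '__main__':
        start()
""")


def count_launches(script, max_depth=3):
    """Count how many times start() fires if every child re-runs the
    script under the name '__mp_main__' (step 2 above)."""
    launches = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as fh:
            fh.write(script)

        def run(depth, run_name):
            def start():
                launches.append(depth)
                if depth < max_depth:
                    run(depth + 1, "__mp_main__")  # the child re-executes the script
            runpy.run_path(path, init_globals={"start": start}, run_name=run_name)

        run(0, "__main__")
    return len(launches)


print(count_launches(UNGUARDED))  # 4: stops only because of max_depth
print(count_launches(GUARDED))    # 1: the child skips the guarded start()
```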
Your problem arises because the statement test_input = input("test input") is at global scope rather than inside an if __name__ == '__main__': block, so it will be executed as part of the subprocess's initialization. Your worker function, f, will therefore not run until this prompt for input is satisfied, and then, when it returns, your main process will put out the prompt again. Anyway, this is what I see when the program is run from a Windows command prompt. Perhaps PyCharm has a restriction against calling input from any thread other than the main thread. But even if an exception is being thrown from that statement while creating the subprocess, I still don't quite see how your program would loop continuously. Unfortunately, I do not have PyCharm installed.
Regarding ShadowRanger's answer, I think you should also put a comma after 'bob': ('bob') is just the string 'bob', while ('bob',) is a one-element tuple.
According to https://docs.python.org/3/library/multiprocessing.html, args is the argument tuple for the target invocation, so p should be created like this:
p = Process(target=f, args=('bob',))
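Why the comma matters can be shown without starting a process. In CPython, Process stores its arguments as tuple(args), so a bare string is split into characters. The helper process_style_args below is hypothetical and mimics only that one conversion step; f returns instead of printing so the result can be checked.

```python
def f(name):
    # Same shape as the worker in the question, but returns instead of printing.
    return 'hello ' + name


def process_style_args(args):
    # Mimics the tuple(args) conversion Process applies to its args.
    return tuple(args)


wrong = process_style_args(('bob'))   # ('bob') is just the string 'bob'
right = process_style_args(('bob',))  # the trailing comma makes a 1-tuple

print(wrong)      # ('b', 'o', 'b')
print(f(*right))  # hello bob

try:
    f(*wrong)     # equivalent to f('b', 'o', 'b')
except TypeError as exc:
    print('TypeError:', exc)
```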

Python multiprocessing with print() doesn't work

I'm learning multiprocessing with python and it doesn't seem to work with the print() function and IDLE's shell.
Also, making a process a daemon doesn't seem to work either, as the process doesn't get killed when the main program ends.
Here is the code I wrote; I hope someone can explain what could be wrong:
import multiprocessing
import time
def proc1(x):  # a function to be run inside a process
    for i in range(x):
        print("proc1 is running")  # the child processes' print() output doesn't show up in IDLE's shell
if __name__ == '__main__':  # important: we don't want to create endless subprocesses by mistake
    proc = multiprocessing.Process(name='proc of Python', target=proc1, args=[300000])
    proc.daemon = True  # make the process a daemon so it is killed when the main program ends; doesn't seem to work here, the process keeps running in Task Manager
    proc2 = multiprocessing.Process(name='proc2 of Python', target=proc1, args=[300000])
    proc2.start()
    proc.start()
    # proc.join()
    print('Multi Processing is hard!!\n\n')

os.system() in Python gives infinite loop

My main Python script imports two other scripts, Test1.py and Test2.py.
Test1.py does multiprocessing, and Test2.py runs a simple os.system('ls') command. When Test1.py is finished and Test2.py is called, os.system('ls') goes crazy and creates infinite new processes. Does anyone know why this happens?
# Main
import multiprocessing
import Test1
import Test2
def doSomething():
    # Function 1, file1...file10 contain [name, path]
    data = [file1, file2, file3, file4, file5, file6, file7, file8, file9, file10]
    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=min(len(data), 5))
        print pool.map(Test1.worker, data)
    # Function 2
    Test2.worker()
Test1.py, which calls perl commands:
import subprocess
def worker(data):
    command = 'perl '+data[1].split('data_')[0]+'methods_FastQC\\fastqc '+data[1]+'\\'+data[0]+'\\'+data[0]+' --outdir='+data[1]+'\\_IlluminaResults\\_fastqcAnalysis'
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    process.wait()
    process.stdout.read()
Test2.py should run ONE simple ls command; instead it never stops launching new ones:
import os
def worker():
    command = 'ls'
    os.system(command)
When looking at the processes after the script has started, it seems like the processes from function 1 also don't close properly. In Task Manager I still see 5 extra pythonw.exe processes which don't seem to do anything. They only go away when I close the open shell. Is that probably related to why os.system(command) goes crazy in function 2? Does anyone have a solution? I can't close the shell, because the script isn't finished: it still has to run function 2.
Edit: While trying to find a solution, it also happened that function 1 started by executing the commands from function 2 multiple times, and only after that the perl commands. Which is even more weird.
It seems doSomething() is executed every time your main module is imported, and it can be imported several times by multiprocessing during worker initialization. You could check this by printing the process pid, print(os.getpid()), in Test2.worker().
You should use if __name__ == '__main__': at the module level. It is error-prone to do it inside a function as your code shows.
import multiprocessing
# ...
if __name__ == '__main__':  # at global level
    multiprocessing.freeze_support()
    main()  # it calls do_something() and everything else
See the very first note in the introduction to multiprocessing.
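A guard inside a function only protects the lines inside it. The sketch below assumes, as the answer suspects, that doSomething() is called at module level, and that a spawned worker re-runs the main module under the name __mp_main__. Both scripts are hypothetical reductions of the question's code (the prints stand in for pool.map and os.system), and child_output is a hypothetical helper; nothing is actually spawned.

```python
import contextlib
import io
import os
import runpy
import tempfile
import textwrap

# Shaped like the question: the guard sits inside do_something(),
# but do_something() itself is called at module level.
GUARD_IN_FUNCTION = textwrap.dedent("""\
    def do_something():
        if __name__ == '__main__':
            print('function 1: pool.map(...)')
        print('function 2: os.system(...)')   # not protected by the guard

    do_something()
""")

# Same code with the guard moved to module level, as the answer suggests.
GUARD_AT_MODULE_LEVEL = textwrap.dedent("""\
    def do_something():
        print('function 1: pool.map(...)')
        print('function 2: os.system(...)')

    if __name__ == '__main__':
        do_something()
""")


def child_output(script):
    """Return the lines a script prints when re-imported as '__mp_main__'."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "main_module.py")
        with open(path, "w") as fh:
            fh.write(script)
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            runpy.run_path(path, run_name="__mp_main__")
    return buf.getvalue().splitlines()


print(child_output(GUARD_IN_FUNCTION))      # ['function 2: os.system(...)']
print(child_output(GUARD_AT_MODULE_LEVEL))  # []
```

In the first layout every worker would run "function 2" on import, which matches the runaway os.system calls described in the question.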

python multiprocessing behaviour

I've noticed strange behaviour when running some python code that made use of the multiprocessing library. This is all under Windows and likely a Windows thing, but maybe someone could explain what's happening.
If I create a simple python script and create a pool like so:
import multiprocessing
pool = multiprocessing.Pool()
print "made a pool"
while True:
    pass
when I run the script I see "made a pool" printed 8 times, which would be the default number of processes created by Pool() as I have 8 cores on my machine.
When I change the script to be like so:
import multiprocessing
def run():
    pool = multiprocessing.Pool()
    print "made a pool"
    while True:
        pass
if __name__ == '__main__':
    run()
I see "made a pool" printed once - which is what I would have expected in both cases.
I guess I would normally run any code using the multiprocessing library from a function, but got caught out by this while playing with some code in a single python file. Anyone know why it happens?

Python multiprocessing continuously spawns pythonw.exe processes without doing any actual work

I don't understand why this simple code
# file: mp.py
from multiprocessing import Process
import sys
def func(x):
    print 'works ', x + 2
    sys.stdout.flush()
p = Process(target=func, args=(2,))
p.start()
p.join()
p.terminate()
print 'done'
sys.stdout.flush()
creates "pythonw.exe" processes continuously and it doesn't print anything, even though I run it from the command line:
python mp.py
I am running the latest Python 2.6 on Windows 7, both 32- and 64-bit.
You need to protect the entry point of the program by using if __name__ == '__main__':.
This is a Windows specific problem. On Windows your module has to be imported into a new Python interpreter in order for it to access your target code. If you don't stop this new interpreter running the start up code it will spawn another child, which will then spawn another child, until it's pythonw.exe processes as far as the eye can see.
Other platforms have traditionally used os.fork() to launch the subprocesses, so they don't have the problem of reimporting the module (since Python 3.8, though, macOS defaults to spawn as well).
So your code will need to look like this:
from multiprocessing import Process
import sys
def func(x):
    print 'works ', x + 2
    sys.stdout.flush()
if __name__ == '__main__':
    p = Process(target=func, args=(2,))
    p.start()
    p.join()
    p.terminate()
    print 'done'
    sys.stdout.flush()
According to the programming guidelines for multiprocessing, on Windows you need to use an if __name__ == '__main__': guard.
Funny, works on my Linux machine:
$ python mp.py
works 4
done
$
Is the multiprocessing thing supposed to work on Windows? A lot of programs originated in the Unix world don't handle Windows so well, because Unix uses fork(2) to clone processes quite cheaply, but (it is my understanding) that Windows does not support fork(2) gracefully, if at all.
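On Python 3 you can check which start methods your platform offers, and you can opt into spawn explicitly on Linux to reproduce the Windows behaviour without changing the global default. This is a small sketch using multiprocessing.get_all_start_methods, get_start_method and get_context; the helper name start_method_info is hypothetical.

```python
import multiprocessing as mp


def start_method_info():
    """Report the start methods this platform offers and the current default."""
    return {
        "available": mp.get_all_start_methods(),  # only ['spawn'] on Windows
        "default": mp.get_start_method(),         # without allow_none=True this also locks in the default
    }


if __name__ == '__main__':
    print(start_method_info())
    # A spawn context gives spawn-style Process objects (spawn_ctx.Process)
    # while leaving the global default untouched.
    spawn_ctx = mp.get_context("spawn")
    print(spawn_ctx.get_start_method())  # spawn
```

Running the question's unguarded mp.py with spawn_ctx.Process instead of Process is one way to see the Windows problem on a Linux machine.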
