Python Multiprocessing Looping Python File Instead of Starting Process

I'm trying to get started with multiprocessing, and I'm running into some interesting issues. The code I'm using is below (for the record, this example is straight from the multiprocessing documentation):
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob'))
    p.start()
    p.join()
This works fine, and prints "hello bob" as it should. When I add any additional code to the file, though, before or after the if statement, then p never seems to run, and my file loops back to the beginning and runs all over again endlessly. For example, the following code gives me this issue:
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob'))
    p.start()
    p.join()

test_input = input("test input")
I am running Python on Windows 10, with PyCharm 2021.3.2 and Python 3.10.0. Has anyone seen this issue before? At this point I'm starting to wonder whether it's an issue between Windows and PyCharm, or Windows and Python, or maybe just a case of inexperience on my part.
Thank you!

That if __name__ == '__main__': guard is important. On systems that don't use fork, it simulates a fork by importing the main script in each worker process without naming it __main__ (it's named __mp_main__ IIRC). Any code that should only run in the "main" script needs to be protected by that guard (it can be indirectly, by defining a function and calling it within the guarded segment; the function will be defined in the workers, but not run).
So to fix this, all you need to do is indent the test_input = input("test input") so it's protected by the if __name__ == '__main__': guard. In real code, I try to keep the guarded section clean (so I can't accidentally write functions that rely on global state that doesn't exist when it's not run as the main script, and for the mild performance benefits of using function locals over globals), so I'd write it like:
from multiprocessing import Process

def f(name):
    print('hello', name)

def main():
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
    test_input = input("test input")

if __name__ == '__main__':
    main()
but that's not strictly necessary.
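If you want to see this mechanism for yourself, here is a small diagnostic sketch (my own, not part of the original answer): print __name__ and the pid at module level, and on a spawn-based platform such as Windows you should see the module re-imported in each worker under the name '__mp_main__':

from multiprocessing import Process
import os

# Runs in the parent *and* again in every spawned child, because each child
# re-imports this module to rebuild its state before running the target.
print(f"module loaded: __name__={__name__!r}, pid={os.getpid()}")

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()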

I thought I would elaborate on ShadowRanger's answer:
On Windows systems, new subprocesses are created by the following steps:
1. A new process is created wherein the Python interpreter is re-launched.
2. The Python interpreter re-interprets the current source program, executing everything that is at global scope in order to compile function definitions, initialize global variables, etc.
3. Finally, your worker function, f in this case, is invoked with memory thus initialized.
The reason for placing the code that creates the subprocess within a block governed by if __name__ == '__main__': is that if you didn't, then because of Step 2 above you would get into a recursive, infinite loop creating new subprocesses ad infinitum. The key point is that only in the main process will the variable __name__ have the value '__main__'; it will have a different value in any subprocess that is created. And so the code that creates the new subprocess, i.e. p = Process(target=f, args=('bob',)), will not be executed as part of the initialization of the subprocess.
Your problem arises from the statement test_input = input("test input") being at global scope and not within an if __name__ == '__main__': block, so it is executed as part of the initialization of the subprocess. Your worker function, f, will not run until that prompt for input is satisfied, and when it returns your main process will put out the prompt again. Anyway, this is what I see when the program is run from a Windows command prompt. Perhaps with PyCharm there is a restriction against doing the input statement from any thread other than the main thread. But even if an exception were being thrown from that statement while creating the subprocess, I still don't quite see how your program would loop continuously. Unfortunately, I do not have PyCharm installed.
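In concrete terms, the minimal fix both answers describe (my sketch, not verbatim from the question) is simply to move the input() call inside the guard, so the spawned child never reaches it while re-importing the module:

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
    # Only the parent process ever prompts for input now.
    test_input = input("test input")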

Regarding ShadowRanger's answer, I think you should also put a comma after 'bob'.
According to https://docs.python.org/3/library/multiprocessing.html
p should be created like this, with args passed as a tuple:
p = Process(target=f, args=('bob',))
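To see why the comma matters, note that ('bob') is just a parenthesized string, while ('bob',) is a one-element tuple; Process unpacks args with f(*args), so a bare string would be unpacked character by character. A quick check:

# ('bob') is a str; ('bob',) is a tuple containing one str.
print(type(('bob')))    # <class 'str'>
print(type(('bob',)))   # <class 'tuple'>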


Python multiprocessing: Why does using Process run my program from the start?

I was having some trouble figuring out why my console would always print the print statements I had at the start of my file. Here's what it looks like:
from multiprocessing import Process
import time

print('hello')  # why does this get printed over and over again?

def func1(num):
    print(num ** 2)
    time.sleep(1)

def func2(num):
    print(num ** 3)
    time.sleep(1)

if __name__ == '__main__':
    counter = 0
    while counter < 10:
        proc1 = Process(target=func1, args=[2])
        proc2 = Process(target=func2, args=[2])
        proc1.start()
        proc2.start()
        proc1.join()
        proc2.join()
        counter += 1
Once I run it, it prints "hello" every loop. I'm sure I'm just making a dumb mistake, but any help would be great. Thanks.
multiprocessing can fork an existing process or spawn a new process, depending on which options your operating system supports. On Windows, which can only spawn (execute a new process), a new instance of python is executed. That instance imports the module and then recreates your execution environment by expanding a pickled snapshot of your parent process. Theoretically, just enough to get the environment right for the subprocess.
In your case, print is at the module level, so it is executed as part of the import in the subprocess. If this is the "__main__" module, you can simply put that print in the if __name__ == "__main__": clause. When the file is imported as a module instead of executed as a script, that print won't run.
If it's not the main script module, well, that's messy. The general rule for modules is that they should be importable without side effects, and that print is a side effect. Best to remove it in that case.
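For the "__main__" module case, a minimal rearrangement of the question's code (my sketch) that keeps the module importable without the side-effect print would be:

from multiprocessing import Process
import time

def func1(num):
    print(num ** 2)
    time.sleep(1)

def func2(num):
    print(num ** 3)
    time.sleep(1)

if __name__ == '__main__':
    print('hello')  # now printed once, by the main process only
    counter = 0
    while counter < 10:
        proc1 = Process(target=func1, args=[2])
        proc2 = Process(target=func2, args=[2])
        proc1.start()
        proc2.start()
        proc1.join()
        proc2.join()
        counter += 1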

if __name__ == __main__ not working after compilation

I'm working on a script using the multiprocessing library, and everything works perfectly from my text editor (VS Code):
import multiprocessing

def example_func():
    print("This is a targeted function for multiprocessing")

if __name__ == "__main__":
    print("This is the main session, starting multiprocessing")
    multiprocessing.Process(target=example_func).start()
When I run the code from my text editor, it outputs this:
This is the main session, starting multiprocessing
This is a targeted function for multiprocessing
But after I compile it to an .exe using PyInstaller, something very strange happens: the code starts looping infinitely. It's as if, after compilation, the subprocesses were considered the main session, i.e. as if __name__ == "__main__" were also true in the subprocesses.
Please help, I really need it.
EDIT: Some people told me to compare against a string; I already have it as a string in my script, I just didn't copy it correctly here.
You need to use multiprocessing.freeze_support appropriately when you're freezing to a Windows executable:
if __name__ == "__main__":
    multiprocessing.freeze_support()  # Required for PyInstaller
    print("This is the main session, starting multiprocessing")
    multiprocessing.Process(target=example_func).start()
Without it, the Windows "fork-like" behavior multiprocessing relies on doesn't know where to stop executing code when it launches subprocesses with the same executable.
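Putting that together with the question's code, a complete script might look like this (my assembly of the answer's advice, not verbatim from the answer):

import multiprocessing

def example_func():
    print("This is a targeted function for multiprocessing")

if __name__ == "__main__":
    # freeze_support() must come right after the guard when frozen with PyInstaller.
    multiprocessing.freeze_support()
    print("This is the main session, starting multiprocessing")
    multiprocessing.Process(target=example_func).start()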
It should be a string:
if __name__ == "__main__":
    pass
Note the double quotes: the comparison is against the string "__main__", not a bare name __main__ (which would raise a NameError). I don't really know why it didn't throw an error for you, but in any case __name__ should be compared to the string '__main__'.
You have to compare __name__ to the string '__main__':
import multiprocessing

def example_func():
    print("This is a targeted function for multiprocessing")

if __name__ == "__main__":
    print("This is the main session, starting multiprocessing")
    multiprocessing.Process(target=example_func).start()

Program hangs in debug when multiprocessing process opens another process

In a Python program, a Process is opened using multiprocessing.Process. This process then creates a Pool in order to give it some work using the map() method.
When the program is run normally, everything works as expected. However, when it is run in the PyCharm debugger, the call to Pool.map never returns and the program locks up.
The problem is demonstrated in the following simple example:
1) Code:
import multiprocessing

def inc(a):
    return a + 1

def func():
    p = multiprocessing.Pool(2)
    print("before map")
    res = p.map(inc, [1, 4])  # ==> the method hangs in debug.
    print("after call map")
    p.close()
    p.join()
    print(res)

def main():
    p = multiprocessing.Process(target=func)
    p.start()
    p.join()

if __name__ == '__main__':
    main()
2) Output as expected when program is run:
before map
after call map
[2, 5]
Process finished with exit code 0
3) Output when the program is run in the debugger (never completes):
pydev debugger: process 13792 is connecting
Connected to pydev debugger (build 173.4301.16)
before map
Is this just a very annoying debugging issue (maybe caused by debugger background threads)? Or is it a multiprocessing problem that might also appear in a real run?
It should be mentioned that using only one of the subprocessing steps, i.e. either just opening a Process() or just using pool.map(), causes no problems and can be debugged. The problem occurs only in the "nested" subprocessing, as described.
I am running PyCharm on a Windows 10 64 bit machine.
I just had a similar problem and found out that I had a breakpoint placed inside the function I was calling via the map() method. Removing this breakpoint, or muting breakpoints, resolved the problem for me. Hope this helps.

Difference in behavior between os.fork and multiprocessing.Process

I have this code:
import os

pid = os.fork()
if pid == 0:
    os.environ['HOME'] = "rep1"
    external_function()
else:
    os.environ['HOME'] = "rep2"
    external_function()
and this code:
import os
from multiprocessing import Process, Pipe

def f(conn):
    os.environ['HOME'] = "rep1"
    external_function()
    conn.send(some_data)
    conn.close()

if __name__ == '__main__':
    os.environ['HOME'] = "rep2"
    external_function()
    parent_conn, child_conn = Pipe()
    p = Process(target=f, args=(child_conn,))
    p.start()
    print parent_conn.recv()
    p.join()
The external_function initializes an external program by creating the necessary sub-directories in the directory found in the environment variable HOME. This function does this work only once in each process.
With the first example, which uses os.fork(), the directories are created as expected. But with the second example, which uses multiprocessing, only the directories in rep2 get created.
Why isn't the second example creating directories in both rep1 and rep2?
The answer you are looking for is addressed in detail here. There is also an explanation of the differences between operating systems.
One big issue is that the fork system call does not exist on Windows. Therefore, when running a Windows OS you cannot use this method. multiprocessing is a higher-level interface to execute a part of the currently running program. Therefore it, as forking does, creates a copy of your process's current state. That is to say, it takes care of the forking of your program for you.
Therefore, where available, you could consider fork() a lower-level interface to forking a program, and the multiprocessing library a higher-level interface to forking.
To answer your question directly, there must be some side effect of external_function such that when the code is run in series you get different results than when the calls run at the same time. This is due to how you set up your code, and the lack of differences between os.fork and multiprocessing.Process on systems where os.fork is supported.
The only real difference between os.fork and multiprocessing.Process is portability and library overhead, since os.fork is not supported on Windows, and the multiprocessing framework is included to make multiprocessing.Process work. This is because os.fork is called by multiprocessing.Process, as this answer backs up.
The important distinction, then, is that os.fork copies everything in the current process using Unix's forking, which means at the time of forking both processes are the same apart from their PIDs. On Windows, this is emulated by rerunning all the setup code before the if __name__ == '__main__': line, which is roughly the same as creating a subprocess using the subprocess library.
In your case, the code snippets you provide are doing fairly different things, because you call external_function in the main process before you open the new process in the second snippet, so the two calls run in series but in different processes. Also, the Pipe is unnecessary, as it emulates no functionality from the first snippet.
In Unix, the code snippets:
import os

pid = os.fork()
if pid == 0:
    os.environ['HOME'] = "rep1"
    external_function()
else:
    os.environ['HOME'] = "rep2"
    external_function()
and:
import os
from multiprocessing import Process

def f():
    os.environ['HOME'] = "rep1"
    external_function()

if __name__ == '__main__':
    p = Process(target=f)
    p.start()
    os.environ['HOME'] = "rep2"
    external_function()
    p.join()
should do exactly the same thing, but with a little extra overhead from the included multiprocessing library.
Without further information, we can't figure out what the issue is. If you can provide code that demonstrates the issue, that would help us help you.
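If you want to check which of the two behaviors multiprocessing will use on your platform, a quick sketch (my addition, not from the answer above) is to query the start method:

import multiprocessing as mp

if __name__ == '__main__':
    # Typically 'fork' on Linux, 'spawn' on Windows (and on macOS since Python 3.8).
    print(mp.get_start_method())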

os.system() in Python gives infinite loop

My main Python script imports 2 other scripts, Test1.py and Test2.py.
Test1.py does multiprocessing, and Test2.py does a simple os.system('ls') command. When Test1.py is finished and Test2.py is called, os.system('ls') goes crazy and creates infinite new processes. Does anyone know why this happens?
# Main
import multiprocessing
import Test1
import Test2

def doSomething():
    # Function 1, file1...file10 contain [name, path]
    data = [file1, file2, file3, file4, file5, file6, file7, file8, file9, file10]
    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=min(len(data), 5))
        print pool.map(Test1.worker, data)
    # Function 2
    Test2.worker()
Test1.py calls perl commands:
import subprocess

def worker(data):
    command = 'perl '+data[1].split('data_')[0]+'methods_FastQC\\fastqc '+data[1]+'\\'+data[0]+'\\'+data[0]+' --outdir='+data[1]+'\\_IlluminaResults\\_fastqcAnalysis'
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    process.wait()
    process.stdout.read()
Test2.py should do ONE simple ls command; instead, it never stops making new commands:
import os

def worker():
    command = 'ls'
    os.system(command)
When looking at the processes after the script is started, it seems like the processes from function 1 also don't close properly. Via the Task Manager I still see 5 extra pythonw.exe processes which don't seem to do anything; they only go away when I close the opened shell. That is probably related to why os.system(command) goes crazy in function 2? Does anyone have a solution? I can't close the shell, because the script is not finished since it still has to do function 2.
Edit: While trying to find a solution, it also happened that function 1 started by executing the commands from function 2 multiple times, and only after that the perl commands, which is even more weird.
It seems doSomething() is executed every time your main module is imported, and it can be imported several times by multiprocessing during worker initialization. You could check this by printing the process pid, print(os.getpid()), in Test2.worker().
You should use if __name__ == '__main__': at the module level. It is error-prone to do it inside a function as your code shows.
import multiprocessing
# ...

if __name__ == '__main__':  # at global level
    multiprocessing.freeze_support()
    main()  # it calls do_something() and everything else
See the very first note in the introduction to multiprocessing.
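Applied to the question's code, the restructuring this answer suggests might look like the following sketch (the file1...file10 names are the question's own placeholders, and the pool cleanup is my addition to address the leftover pythonw.exe processes the question mentions):

# Main
import multiprocessing
import Test1
import Test2

def main():
    # Function 1, file1...file10 contain [name, path]
    data = [file1, file2, file3, file4, file5, file6, file7, file8, file9, file10]
    pool = multiprocessing.Pool(processes=min(len(data), 5))
    print(pool.map(Test1.worker, data))
    pool.close()   # let the worker processes exit instead of lingering
    pool.join()
    # Function 2
    Test2.worker()

if __name__ == '__main__':  # at global level
    multiprocessing.freeze_support()
    main()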
