I'm actually working on a script using the multiprocessing library. Everything works perfectly when I run it from my text editor (VS Code):
import multiprocessing

def example_func():
    print("This is a targeted function for multiprocessing")

if __name__ == "__main__":
    print("This is the main session, starting multiprocessing")
    multiprocessing.Process(target=example_func).start()
So when I run the code from my text editor, it outputs this:
This is the main session, starting multiprocessing
This is a targeted function for multiprocessing
But after I compile it to a .exe using PyInstaller, something very strange happens: the script starts looping infinitely. It's as if, after compiling, the child processes were considered the main session, meaning that inside them if __name__ == "__main__" evaluated as true.
Please help, I really need it.
EDIT: some people told me to add quotes around __main__. I already have it as a string in my script; I just didn't copy it correctly here.
You need to call multiprocessing.freeze_support() appropriately when you're freezing to a Windows executable:
if __name__ == "__main__":
    multiprocessing.freeze_support()  # Required for PyInstaller
    print("This is the main session, starting multiprocessing")
    multiprocessing.Process(target=example_func).start()
Without it, the Windows "fork-like" behavior that multiprocessing relies on doesn't know where to stop executing code when it launches subprocesses from the same executable.
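Here's a minimal, self-contained sketch of the fix (function and message names are illustrative, not from your script): freeze_support() should be the first call under the guard, so a frozen child process exits there instead of re-running the parent's startup code.

```python
import multiprocessing

def example_func(queue):
    # Runs in the child process; report back to the parent via the queue.
    queue.put("child ran")

def launch():
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=example_func, args=(queue,))
    p.start()
    result = queue.get()  # blocks until the child has put its message
    p.join()
    return result

if __name__ == "__main__":
    multiprocessing.freeze_support()  # no-op unless running from a frozen executable
    print(launch())
```

When run normally (not frozen), freeze_support() does nothing, so it is safe to leave in during development.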
It is a string:

if __name__ == "__main__":
    pass
Note the double quotes: the comparison is against the string "__main__", not against __main__ as a bare name. I don't really know why it didn't throw an error for you, but __main__ should be the string '__main__'.
You have to compare __name__ to the string '__main__':
import multiprocessing

def example_func():
    print("This is a targeted function for multiprocessing")

if __name__ == "__main__":
    print("This is the main session, starting multiprocessing")
    multiprocessing.Process(target=example_func).start()
Related
I'm trying to get started with multiprocessing, and I'm running into some interesting issues. The code I'm using is below (for the record, this example is straight from the multiprocessing documentation):
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob'))
    p.start()
    p.join()
This works fine and prints "hello bob" as it should. When I add any additional code to the file, though, before or after the if statement, then p does not evaluate, and my file loops back to the beginning and runs all over again endlessly. For example, the following code gives me this issue:
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob'))
    p.start()
    p.join()

test_input = input("test input")
I am running Python on Windows 10, with PyCharm 2021.3.2 and Python 3.10.0. Is this an issue any of you have seen before? At this point I'm starting to wonder whether it's an issue between Windows and PyCharm, or Windows and Python, or maybe just a case of inexperience on my part.
Thank you!
That if __name__ == '__main__': guard is important. On systems that don't use fork, it simulates a fork by importing the main script in each worker process without naming it __main__ (it's named __mp_main__ IIRC). Any code that should only run in the "main" script needs to be protected by that guard (it can be indirectly, by defining a function and calling it within the guarded segment; the function will be defined in the workers, but not run).
So to fix this, all you need to do is indent the test_input = input("test input") so it's protected by the if __name__ == '__main__': guard. In real code, I try to keep the guarded section clean (so I can't accidentally write functions that rely on global state that doesn't exist when it's not run as the main script, and for the mild performance benefits of using function locals over globals), so I'd write it like:
from multiprocessing import Process

def f(name):
    print('hello', name)

def main():
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
    test_input = input("test input")

if __name__ == '__main__':
    main()
but that's not strictly necessary.
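The naming described above is easy to observe directly. In this hypothetical demo (not part of the original question), the worker prints its module's __name__; under the spawn start method the re-imported copy of the script is named "__mp_main__", so the guarded block is skipped in the child.

```python
from multiprocessing import Process

def show_name():
    # Under spawn, the re-imported copy of this script is named
    # "__mp_main__", so the guarded code below does not run again here.
    print("worker sees module __name__ =", show_name.__module__)

if __name__ == "__main__":
    print("parent sees __name__ =", __name__)
    p = Process(target=show_name)
    p.start()
    p.join()
```

On a fork-based system (Linux, macOS with fork) you won't see the re-import at all, which is why this class of bug often only appears on Windows.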
I thought I would elaborate on ShadowRanger's answer:
On Windows systems, new subprocesses are created by the following steps:
1. A new process is created wherein the Python interpreter is re-launched.
2. The Python interpreter re-interprets the current source program, executing everything at global scope, in order to compile function definitions, initialize global variables, etc.
3. Finally, your worker function, f in this case, is invoked with memory thus initialized.
The reason for placing the code that creates the subprocess within a block governed by if __name__ == '__main__': is that if you didn't, then because of Step 2 above you would get into a recursive, infinite loop creating new subprocesses ad infinitum. The key point is that only in the main process will the variable __name__ have the value '__main__'; it will have a different value in any subprocess that is created. And so the code that creates the new subprocess, i.e. p = Process(target=f, args=('bob',)), will not be executed as part of the initialization of the subprocess.
Your problem arises from the statement test_input = input("test input") being at global scope and not within an if __name__ == '__main__': block, so it is executed as part of the initialization of the subprocess. Your worker function, f, will therefore not run until that prompt for input is satisfied, and when it returns, your main process will put out the prompt again. Anyway, this is what I see when the program is run from a Windows command prompt. Perhaps PyCharm has a restriction against issuing the input statement from any thread other than the main thread. But even if an exception were being thrown from that statement while creating the subprocess, I still don't quite see how your program would loop continuously. Unfortunately, I do not have PyCharm installed.
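Step 2 above is easy to observe by printing the process id from module level (a hypothetical demo, not the asker's code): on Windows the unguarded line prints once per process, while the guarded line prints only in the parent.

```python
import os
from multiprocessing import Process

# Module-level code: re-executed in every spawned child during Step 2.
print("initializing in pid", os.getpid())

def f(name):
    print("hello", name, "from pid", os.getpid())

if __name__ == "__main__":
    # Runs only in the parent, so process creation is not recursive.
    p = Process(target=f, args=("bob",))
    p.start()
    p.join()
```

If you move the print under the guard, it fires exactly once, which is the behavior most people expect.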
Regarding ShadowRanger's answer, I think you should also put a comma after 'bob'.
According to https://docs.python.org/3/library/multiprocessing.html, p should be created like this, so that args is a one-element tuple rather than a bare string:
p = Process(target=f, args=('bob',))
When using the multiprocessing library in Python, we know that if __name__ == '__main__' is necessary to fork child processes correctly from the parent. Otherwise there can be a RuntimeError like this:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
But is it possible to avoid if __name__ == '__main__'? What if I am writing a library that spawns multiple processes? Is it possible to eliminate the necessity for users to write if __name__ == '__main__' and deal with tedious details about multiprocessing?
I can write if __name__ == '__main__': inside my library, but in this way the user's code would run multiple times in my multiple processes.
I was having some trouble figuring out why my console would always print the print statements I had at the start of my file. Here's what it looks like:
from multiprocessing import Process
import time

print('hello')  # why does this get printed over and over again?

def func1(num):
    print(num ** 2)
    time.sleep(1)

def func2(num):
    print(num ** 3)
    time.sleep(1)

if __name__ == '__main__':
    counter = 0
    while counter < 10:
        proc1 = Process(target=func1, args=[2])
        proc2 = Process(target=func2, args=[2])
        proc1.start()
        proc2.start()
        proc1.join()
        proc2.join()
        counter += 1
Once I run it, it prints "hello" on every loop. I'm sure I'm just making a dumb mistake, but any help would be great. Thanks.
multiprocessing can fork an existing process or spawn a new process, depending on which options your operating system supports. On Windows, which can only spawn (execute a new process), a new instance of python is executed. That instance imports the module and then recreates your execution environment by expanding a pickled snapshot of your parent process. Theoretically, just enough to get the environment right for the subprocess.
In your case, print is at module level, so it is executed as part of the import in the subprocess. If this is the "__main__" module, you can simply put that print inside the if __name__ == "__main__": clause; when the file is imported as a module instead of executed as a script, that print won't run.
If it's not the main script module, well, that's messier. The general rule for modules is that they should be importable without side effects, and that print is a side effect. Best to remove it in that case.
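As a sketch of that rule (the module and function names here are made up), keep the module level to definitions only, and confine any script-style side effects to functions or to the guarded block:

```python
# mylib.py -- hypothetical helper module that is safe to import from
# multiprocessing workers: importing it produces no output.

def load_config():
    # Definition-time work that computes values is fine; these are placeholders.
    return {"workers": 2, "verbose": False}

def announce():
    print("ready")  # the side effect lives inside a function, not at module level

if __name__ == "__main__":
    # Only runs when executed directly as a script, never on import.
    announce()
```

Importing mylib prints nothing; running it as a script prints "ready". That is exactly the property multiprocessing needs when it re-imports your modules in spawned children.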
I was trying to run a piece of code that is all about multiprocessing. It works fine from the command prompt and generates some output, but when I run it in PyScripter it just says the script ran OK: it generates no output, displays no error message, and doesn't even crash. It would be really helpful if anyone could help me find an interpreter where this multiprocessing code works.
Here is the piece of code:
from multiprocessing import Process

def wait():
    print "wait"
    clean()

def clean():
    print "clean"

def main():
    p = Process(target=wait)
    p.start()
    p.join()

if _name_ == '_main_':
    main()
The normal interpreter works just fine with multiprocessing on Windows 7 for me. (Your IDE might not like multiprocessing.)
You just have to do

if __name__ == '__main__':
    main()

with 2 underscores (__) on each side instead of 1 (_).
Also, if you don't have an actual reason not to use it, multiprocessing.Pool is much easier to use than multiprocessing.Process in most cases. Have a look at https://docs.python.org/2/library/multiprocessing.html#module-multiprocessing.pool
An implementation with a Pool would be

import multiprocessing

def wait():
    print "wait"
    clean()

def clean():
    print "clean"

def main():
    p = multiprocessing.Pool()
    p.apply_async(wait)
    p.close()
    p.join()

if __name__ == '__main__':
    main()
but which method of Pool to use strongly depends on what you actually want to do.
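For instance (a hypothetical Python 3 sketch, separate from the Python 2 code above): pool.map suits "run the same function over a list of inputs and collect ordered results", while apply_async suits one-off background calls whose result you fetch later.

```python
from multiprocessing import Pool

def square(x):
    return x * x

def run():
    with Pool(processes=2) as pool:
        mapped = pool.map(square, [1, 2, 3])      # blocks; results in input order
        pending = pool.apply_async(square, (4,))  # returns immediately
        return mapped, pending.get()              # .get() blocks for the result

if __name__ == "__main__":
    print(run())
```

There are also imap and imap_unordered for streaming results as they complete, which matters for long-running workloads.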
My main Python script imports two other scripts, Test1.py and Test2.py.
Test1.py does multiprocessing, and Test2.py does a simple os.system('ls') command. When Test1.py is finished and Test2.py is called, os.system('ls') goes crazy and creates infinite new processes. Does anyone know why this happens?
# Main
import multiprocessing
import Test1
import Test2

def doSomething():
    # Function 1, file1...file10 contain [name, path]
    data = [file1, file2, file3, file4, file5, file6, file7, file8, file9, file10]
    if __name__ == '__main__':
        pool = multiprocessing.Pool(processes=min(len(data), 5))
        print pool.map(Test1.worker, data)
    # Function 2
    Test2.worker()
Test1.py calls perl commands:

import subprocess

def worker(data):
    command = 'perl '+data[1].split('data_')[0]+'methods_FastQC\\fastqc '+data[1]+'\\'+data[0]+'\\'+data[0]+' --outdir='+data[1]+'\\_IlluminaResults\\_fastqcAnalysis'
    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    process.wait()
    process.stdout.read()
Test2.py should do ONE simple ls command; instead, it never stops issuing new ones:

import os

def worker():
    command = 'ls'
    os.system(command)
Looking at the processes when the script is started, it seems the processes from function 1 also don't close properly. In Task Manager I still see 5 extra pythonw.exe processes that don't appear to do anything; they only go away when I close the shell. That's probably related to why os.system(command) goes crazy in function 2. Does anyone have a solution? I can't just close the shell, because the script isn't finished: it still has to run function 2.
Edit: While trying to find a solution, it also happened that function 1 started by executing the commands from function 2 multiple times, and only after that the perl commands. Which is even more weird.
It seems doSomething() is executed every time your main module is imported, and it can be imported several times by multiprocessing during worker initialization. You can check this by printing the process pid, print(os.getpid()), in Test2.worker().
You should use if __name__ == '__main__': at the module level. It is error-prone to do it inside a function as your code shows.
import multiprocessing

# ...

if __name__ == '__main__':  # at global level
    multiprocessing.freeze_support()
    main()  # it calls do_something() and everything else
See the very first note in the introduction to multiprocessing.
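Putting that advice together, here is a hedged sketch of the restructured main module. The worker and data below are stand-ins for Test1.worker and the file1..file10 list (which I don't have); the point is the shape: definitions at module level, all work behind a module-level guard.

```python
import multiprocessing

def worker(item):
    # Stand-in for Test1.worker; the real one runs a perl subprocess.
    return item * 2

def do_something():
    data = [1, 2, 3]  # stand-in for the [name, path] entries
    with multiprocessing.Pool(processes=min(len(data), 5)) as pool:
        results = pool.map(worker, data)
    print(results)
    # Test2.worker() would be called here, after the pool work has finished
    # and the pool has been cleaned up by the with-block.

if __name__ == '__main__':
    multiprocessing.freeze_support()
    do_something()
```

Because the guard is at module level, the re-imports performed during worker startup define the functions but never call do_something(), so neither the pool creation nor the os.system call can multiply.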