I am quite new to Python and the multiprocessing module. I want to know how to make the child process skip the code at the beginning of the script so it doesn't run again. Any help would be appreciated :)
Code:
import multiprocessing
print("Doing something!!!")
def stuff():
    print("Doing stuff")
if __name__ == '__main__':
    p1 = multiprocessing.Process(target=stuff)
    p1.start()
Output:
Doing something!!!
Doing something!!!
Doing stuff
Desired output:
Doing something!!!
Doing stuff
See the multiprocessing Programming Guidelines in the documentation.
On systems that use the "spawn" or "forkserver" methods of creating processes:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
Your script is being imported into every process, so it will run any global code in all processes. Just move any global code that should run only once into the if __name__ == '__main__': section:
import multiprocessing
def stuff():
    print("Doing stuff")
if __name__ == '__main__':
    print("Doing something!!!")
    p1 = multiprocessing.Process(target=stuff)
    p1.start()
This ensures the function stuff() will be imported and defined in every process, but your print will only run once in the main process.
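To watch the re-import happen, a minimal sketch like the one below can help. The helper name where_am_i is made up for illustration. Assuming the "spawn" start method is in use, the top-level line should appear once per process, and the child should report a module name other than '__main__' (CPython uses '__mp_main__' there), which is why the guarded block is skipped in the child.

```python
import multiprocessing


def where_am_i():
    # Hypothetical helper: report the module name and the process name.
    return "{} in {}".format(__name__, multiprocessing.current_process().name)


# Top-level code: runs in the parent, and again in every spawned child.
print("top level:", where_am_i())


def stuff():
    print("Doing stuff:", where_am_i())


if __name__ == "__main__":
    # Only the parent process reaches this block.
    p1 = multiprocessing.Process(target=stuff)
    p1.start()
    p1.join()
```

With the "fork" start method the child is a copy of the parent and the script is not imported again, so the top-level line would only show up once.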
Suppose I have two functions:
def funct1():
    pass  # does something
def funct2():
    pass  # does something
I want to use them in another function with multiprocessing like so:
def my_funct():
    ##does something
    if __name__ == '__main__':
        p1 = Process(target=funct1)
        p2 = Process(target=funct2)
        p1.start()
        p2.start()
        ##more code
        p1.terminate()
        p2.terminate()
    return something
Basically I want to start and end processes inside my function, but it's not working properly. What would be the correct way to do this?
On platforms that use the spawn method to create new processes, all code at global scope is re-executed in each new child process in order to initialize its memory (e.g. function definitions and global variables). Process-creation code that exists at global scope must therefore be placed within an if __name__ == '__main__': block to prevent the child process from recursively re-executing it. Putting such a test within a function or method, which by definition is not at global scope, would not normally make much sense.
In your case there must be some code in the main script being executed (where __name__ is '__main__', which is also the case when that script is launched as a module with the -m Python flag) that directly or indirectly invokes your my_funct function. It is that code that should be placed in an if __name__ == '__main__': block. For example:
from multiprocessing import Process
def my_funct():
    ##does something
    p1 = Process(target=funct1)
    p2 = Process(target=funct2)
    p1.start()
    p2.start()
    ##more code
    p1.terminate()
    p2.terminate()
    return something
def main():
    # Do some work
    ...
    # Call my_funct, which creates new child processes:
    print(my_funct())
    # Do some more work
    ...
if __name__ == '__main__':
    # The following function invocation is at global scope in the main script
    # and invokes code that will ultimately be creating new child processes:
    main()
If the if __name__ == '__main__': test were instead moved to where you had it originally, then function main would be invoked as part of the initialization of memory for the new processes that were created in function my_funct. But I am sure you would not want main or any part of my_funct to be re-executed, which would happen with this move.
Note
I should add that any code at global scope that you do not want or do not need to be re-executed as part of memory initialization for the new child process should be placed within an if __name__ == '__main__': block, not just process-creation code. Also note that if your my_funct is imported from some module, then __name__ would not be '__main__' in that module to begin with.
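For what it's worth, here is a small runnable sketch of that layout. The bodies of funct1 and funct2 and the helper describe are placeholders I made up. It also waits for the children with join() and returns their exit codes instead of calling terminate(), which is a different choice from the code above and assumes the workers finish on their own.

```python
from multiprocessing import Process


def describe(name):
    # Hypothetical helper, only here so the workers print something.
    return "{} finished".format(name)


def funct1():
    print(describe("funct1"))


def funct2():
    print(describe("funct2"))


def my_funct():
    # Processes are created and cleaned up entirely inside the function.
    p1 = Process(target=funct1)
    p2 = Process(target=funct2)
    p1.start()
    p2.start()
    # Wait for both children to finish (instead of terminate()).
    p1.join()
    p2.join()
    return p1.exitcode, p2.exitcode


def main():
    print(my_funct())


if __name__ == "__main__":
    # The guard lives at global scope, around the call that leads to my_funct.
    main()
```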
I have a multiprocessing pool that runs with 1 thread, and it keeps repeating the code before my function. I have tried different numbers of threads. I write programs like this quite often, so I think I know what is causing the problem, but I don't understand why: I usually use argparse to take file paths from the user, but this time I wanted to use input() instead. No errors are thrown, so I honestly have no clue.
from colorama import Fore
import colorama
import os
import ctypes
import multiprocessing
from multiprocessing import Pool
import random
colorama.init(autoreset=False)
print("headerhere")
#as you can see i used input instead of argparse
g = open(input(Fore.RED + " File Path?: " + Fore.RESET))
gg = open(input(Fore.RED + "File Path?: " + Fore.RESET))
#I messed around with this to see if it was the problem, ultimately disabling it until i fixed it, i just use 1 thread
threads = int(input(Fore.RED + "Amount of Threads?: " + Fore.RESET))
arrange = [lines.replace("\n", "") for lines in g]
good = [items.replace("\n", "") for items in gg]
#this is all of the code before the function that Pool calls
def che(line):
    print("f")
    #i would show my code but as i said this isnt the problem since ive made programs like this before, the only thing i changed is how i take file inputs from the user
def main():
    pool = Pool(1)
    pool.daemon = True
    result = pool.map(che, arrange)
if __name__ == "__main__":
    main()
Here's a minimal, reproducible example of your issue:
from multiprocessing import Pool
print('header')
def func(n):
    print(f'func {n}')
def main():
    pool = Pool(3)
    pool.map(func, [1, 2, 3])
if __name__ == '__main__':
    main()
On OSes where "spawn" (Windows and macOS) or "forkserver" (some Unix) is the default start method, the sub-process imports your script. Since print('header') is at global scope, it runs every time the script is imported into a process, so the output is:
header
header
header
header
func 1
func 2
func 3
A multiprocessing script should have everything meant to run once inside function(s), and they should be called once by the main script via if __name__ == '__main__':, so the solution is to move it into your def main():
from multiprocessing import Pool
def func(n):
    print(f'func {n}')
def main():
    print('header')
    pool = Pool(3)
    pool.map(func, [1, 2, 3])
if __name__ == '__main__':
    main()
Output:
header
func 1
func 2
func 3
If you want the top level code before the definition of che to only be executed in the master process, then place it in a function and call that function in main.
In multiprocessing (with the "spawn" or "forkserver" start methods), the top-level statements will be executed by both the master process and every child process. So, if some code should be executed only by the master and not by the children, then such code should not be placed at the top level. Instead, it should be placed in functions, and these functions should be invoked in the main scope, i.e., in the scope of the if block controlled by __name__ == "__main__" (or called in the main function in your code snippet).
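Applied to the script in the question, that could look roughly like the sketch below. The helper strip_newlines and the fixed sample lines are my own stand-ins: in the real script the input(...) and open(...) calls would sit where the sample data is, inside main(), so the prompts are only shown once by the parent. The worker che returns a value here instead of printing, purely so the result is visible.

```python
from multiprocessing import Pool


def strip_newlines(lines):
    # Same idea as the list comprehensions in the question.
    return [line.replace("\n", "") for line in lines]


def che(line):
    # Placeholder worker: returns the length of each line.
    return len(line)


def main():
    print("headerhere")
    # The prompts and file reads from the question would go here, inside
    # main(). Fixed sample data keeps this sketch non-interactive.
    arrange = strip_newlines(["alpha\n", "beta\n", "gamma\n"])
    with Pool(1) as pool:
        result = pool.map(che, arrange)
    print(result)


if __name__ == "__main__":
    main()
```

Since arrange is now local to main(), it is handed to the workers through pool.map rather than being rebuilt at import time in each child.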
It seems multiprocessing swaps between threads faster so I started working on swapping over but I'm getting some unexpected results. It causes my entire script to loop several times when a thread didn't before.
Snippet example:
from threading import Thread
stuff_needs_done = True
more_stuff_needs_done = True
print("Doing stuff")
def ThreadStuff():
    while True:
        pass  # do stuff here
def OtherThreadStuff():
    while True:
        pass  # do other stuff here
if stuff_needs_done:
    Thread(target=ThreadStuff).start()
if more_stuff_needs_done:
    Thread(target=OtherThreadStuff).start()
This works as I'd expect. The threads start and run until stopped. But when running a lot of these the overhead is higher (so I'm told) so I tried swapping to multiprocessing.
Snippet example:
from multiprocessing import Process
stuff_needs_done = True
more_stuff_needs_done = True
print("Doing stuff")
def ThreadStuff():
    while True:
        pass  # do stuff here
def OtherThreadStuff():
    while True:
        pass  # do other stuff here
if stuff_needs_done:
    stuffproc1 = Process(target=ThreadStuff)
    stuffproc1.start()
if more_stuff_needs_done:
    stuffproc2 = Process(target=OtherThreadStuff)
    stuffproc2.start()
But what seems to happen is the whole thing starts a couple of times so the "Doing stuff" output comes up and a couple of the threads run.
I could put some .join()s in but there is no loop which should cause the print output to run again which means there is nowhere for it to wait.
My hope is this is just a syntax thing but I'm stumped trying to find out why the whole script loops. I'd really appreciate any pointers in the right direction.
This is mentioned in the docs:
Safe importing of main module
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
For example, under Windows running the following module would fail with a RuntimeError:
from multiprocessing import Process
def foo():
    print('hello')
p = Process(target=foo)
p.start()
Instead one should protect the “entry point” of the program by using if __name__ == '__main__': as follows:
from multiprocessing import Process, freeze_support
def foo():
    print('hello')
if __name__ == '__main__':
    freeze_support()
    p = Process(target=foo)
    p.start()
This allows the newly spawned Python interpreter to safely import the module and then run the module’s foo() function.
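Applied to the snippet in the question, a guarded version might look like the sketch below. This is only one way to arrange it: the select_targets helper, the Event used to stop the loops, and the sleep intervals are all my additions so the example terminates by itself; the original loops run forever.

```python
import time
from multiprocessing import Event, Process


def thread_stuff(stop):
    while not stop.is_set():
        time.sleep(0.05)  # placeholder for "do stuff here"


def other_thread_stuff(stop):
    while not stop.is_set():
        time.sleep(0.05)  # placeholder for "do other stuff here"


def select_targets(stuff_needs_done, more_stuff_needs_done):
    # Hypothetical helper: pick the worker functions that should run.
    targets = []
    if stuff_needs_done:
        targets.append(thread_stuff)
    if more_stuff_needs_done:
        targets.append(other_thread_stuff)
    return targets


def main():
    print("Doing stuff")  # now runs once, in the parent only
    stop = Event()
    procs = [Process(target=t, args=(stop,)) for t in select_targets(True, True)]
    for p in procs:
        p.start()
    time.sleep(0.2)  # let the workers run briefly
    stop.set()       # ask them to leave their loops
    for p in procs:
        p.join()


if __name__ == "__main__":
    main()
```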
As I have discovered, Windows is a bit of a pig when it comes to multiprocessing, and I have a question about it.
The pydoc states you should protect the entry point of a windows application when using multiprocessing.
Does this mean only the code which creates the new process?
For example
Script 1
import multiprocessing
def somemethod():
    while True:
        print('do stuff')
# this will need protecting
p = multiprocessing.Process(target=somemethod).start()
# this wont
if __name__ == '__main__':
    p = multiprocessing.Process(target=somemethod).start()
In this script you need to wrap this in if __name__ == '__main__': because the line is spawning the process.
But what about if you had?
Script 2
file1.py
import file2
if __name__ == '__main__':
    p = file2.Aclass().start()
file2.py
import multiprocessing
ITEM = 0
def method1():
    print('method1')
method1()
class Aclass(multiprocessing.Process):
    def __init__(self):
        print('Aclass')
        super(Aclass, self).__init__()
    def run(self):
        print('stuff')
What would need to be protected in this instance?
What would happen if there were an if __name__ == '__main__': block in file2.py? Would the code inside it get executed when a process is created?
NOTE: I know the code will not compile. It's just an example.
The pydoc states you should protect the entry point of a windows application when using multiprocessing.
My interpretation differs: the documentation states
the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
So importing your module (import mymodule) should not create new processes. That is, you can avoid starting processes by protecting your process-creating code with an
if __name__ == '__main__':
    ...
because the code in the ... will only run when your program is run as main program, that is, when you do
python mymodule.py
or when you run it as an executable, but not when you import the file.
So, to answer your question about the file2: no, you do not need protection because no process is started during the import file2.
Also, if you put an if __name__ == '__main__' in file2.py, it would not run because file2 is imported, not executed as main program.
edit: here is an example of what can happen when you do not protect your process-creating code: it might just loop and create a ton of processes.
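If you want to check the import-versus-run difference yourself, here is a self-contained sketch. The module name file2_demo and its contents are invented for the demonstration: the same source is written to a temporary file, imported once (so __name__ is the module name) and then executed once as a script via runpy (so __name__ is '__main__').

```python
import importlib.util
import os
import runpy
import tempfile

# Source of a throwaway module; the flag records whether the guard ran.
SOURCE = '''
ran_guarded_block = False
if __name__ == "__main__":
    ran_guarded_block = True
'''


def guard_results():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file2_demo.py")
        with open(path, "w") as handle:
            handle.write(SOURCE)

        # Case 1: import the file; __name__ is "file2_demo" inside it.
        spec = importlib.util.spec_from_file_location("file2_demo", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        when_imported = module.ran_guarded_block

        # Case 2: run the file as the main program.
        namespace = runpy.run_path(path, run_name="__main__")
        when_run_as_script = namespace["ran_guarded_block"]

    return when_imported, when_run_as_script


print(guard_results())  # prints (False, True)
```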
I want to learn multiprocessing in python. I started reading http://www.doughellmann.com/PyMOTW/multiprocessing/basics.html and I am not able to understand the section on importing target functions.
In particular what does the following sentence mean..
"Wrapping the main part of the application in a check for __main__ ensures that it is not run recursively in each child as the module is imported."
Can someone explain this in more detail with an example ?
http://effbot.org/pyfaq/tutor-what-is-if-name-main-for.htm
http://docs.python.org/tutorial/modules.html#executing-modules-as-scripts
What does if __name__ == "__main__": do?
http://en.wikipedia.org/wiki/Main_function#Python
On Windows, the multiprocessing module imports the __main__ module when spawning a new process. If the code that spawns the new process is not wrapped in an if __name__ == '__main__' block, then importing the main module will again spawn a new process. And so on, ad infinitum.
This issue is also mentioned in the multiprocessing docs in the section entitled "Safe importing of main module". There, you'll find the following simple example:
Running this on Windows:
from multiprocessing import Process
def foo():
    print('hello')
p = Process(target=foo)
p.start()
results in a RuntimeError.
And the fix is to use:
if __name__ == '__main__':
    p = Process(target=foo)
    p.start()
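If you are not on Windows and want to see the same import behaviour, one option (a sketch, assuming Python 3.4 or newer) is to ask for the "spawn" start method explicitly through multiprocessing.get_context. The function name run_with_spawn is made up. The top-level print should then show up once for the parent and once more when the child imports the module, while the guarded block runs in the parent only.

```python
import multiprocessing

# Top-level code: expected to run again when a spawned child imports this file.
print("importing main module")


def foo():
    print("hello")


def run_with_spawn():
    # Request "spawn" explicitly, regardless of the platform default.
    ctx = multiprocessing.get_context("spawn")
    p = ctx.Process(target=foo)
    p.start()
    p.join()
    return p.exitcode


if __name__ == "__main__":
    # Without this guard, each spawned child would try to start another child.
    run_with_spawn()
```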
"""This is my module (mymodule.py)"""
def sum(a,b):
    """>>> sum(1,1)
    2
    >>> sum(1,-1)
    0
    """
    return a+b
# if you run this module using 'python mymodule.py', run a self test
# if you just import this module, you get sum() and other definitions,
# but the self-test isn't run
if __name__=='__main__':
    import doctest
    doctest.testmod()
Ensures that the script being run is in the 'top level environment' for interactivity.
For example if you wanted to interact with the user (launching process) you would want to ensure that it is main.
if __name__ == '__main__':
    do_something()