Suppose I have two functions
def funct1():
    # does something

def funct2():
    # does something
I want to use them in another function with multiprocessing like so:
def my_funct():
    # does something
    if __name__ == '__main__':
        p1 = Process(target=funct1)
        p2 = Process(target=funct2)
        p1.start()
        p2.start()
        # more code
        p1.terminate()
        p2.terminate()
    return something
Basically, I want to start and end processes inside my function, but it's not working properly. What would be the correct way to do this?
On platforms that use the spawn method to create new processes, it is necessary to place process-creation code that exists at global scope within an if __name__ == '__main__': block. This prevents the newly created child process from recursively re-executing the process-creation code, since all code at the global level is re-executed in the child in order to initialize its memory (e.g. function definitions and global variables). Putting such a test inside a function or method, which by definition is not at global scope, would not normally make much sense.
In your case, there must be some code in the main script being executed (where __name__ would be '__main__', unless that script is launched as a module with the -m Python flag) that directly or indirectly invokes your my_funct function. It is that code that should be placed in an if __name__ == '__main__': block. For example:
from multiprocessing import Process

def my_funct():
    # does something
    p1 = Process(target=funct1)
    p2 = Process(target=funct2)
    p1.start()
    p2.start()
    # more code
    p1.terminate()
    p2.terminate()
    return something

def main():
    # Do some work
    ...
    # Call my_funct, which creates new child processes:
    print(my_funct())
    # Do some more work
    ...

if __name__ == '__main__':
    # The following function invocation is at global scope in the main script
    # and invokes code that will ultimately be creating new child processes:
    main()
If the if __name__ == '__main__': test were instead moved to where you had it originally, then function main would be invoked as part of the initialization of memory for the new processes that were created in function my_funct. But I am sure you would not want main or any part of my_funct to be re-executed, which would happen with this move.
Note
I should add that any code at global scope that you do not want or do not need to be re-executed as part of memory initialization for a new child process should be placed within an if __name__ == '__main__': block, not just process-creation code. Also note that if your my_funct is imported from some module, then __name__ would not be '__main__' to begin with.
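To illustrate that last point, here is a minimal sketch, assuming my_funct lives in a hypothetical module named mymodule.py that the main script imports:

# mymodule.py -- when imported, __name__ here is 'mymodule', not '__main__'
from multiprocessing import Process

def funct1():
    print('funct1 running')

def my_funct():
    p1 = Process(target=funct1)
    p1.start()
    p1.join()
    return 'done'

# main.py -- the script actually being run, where __name__ is '__main__'
from mymodule import my_funct

if __name__ == '__main__':
    print(my_funct())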
Related
I am quite new to Python and the multiprocessing module. I want to know how to make the new process skip this line at the beginning of the script so it doesn't get repeated:
print("Doing something!!!")
Any help would be appreciated :)
Code:
import multiprocessing

print("Doing something!!!")

def stuff():
    print("Doing stuff")

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=stuff)
    p1.start()
Output:
Doing something!!!
Doing something!!!
Doing stuff
Desired output:
Doing something!!!
Doing stuff
See the multiprocessing Programming Guidelines in the documentation.
On systems that use the "spawn" or "forkserver" methods of creating processes:
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
Your script is being imported into every process, so it will run any global code in all processes. Just move anything global into the if __name__ == '__main__': section:
import multiprocessing

def stuff():
    print("Doing stuff")

if __name__ == '__main__':
    print("Doing something!!!")
    p1 = multiprocessing.Process(target=stuff)
    p1.start()
This ensures the function stuff() will be imported and defined in every process, but your print will only run once, in the main process.
What I'd like is for the following program to print out:
Running Main
Running Second
Running Main
Running Second
[...]
Code:
from multiprocessing import Process
import time
def main():
    while True:
        print('Running Main')
        time.sleep(1)

def second():
    while True:
        print('Running Second')
        time.sleep(1)

p1 = Process(main())
p2 = Process(second())
p1.start()
p2.start()
But it doesn't have the desired behavior. Instead it just prints out:
Running Main
Running Main
[...]
I suspect my program doesn't work because of the while statement?
Is there any way I can overcome this problem and have my program print out what I mentioned no matter what I execute in my function?
The issue is in how you create the Process objects. The reason only the first function runs is the syntax: instead of handing the function itself to Process as a target, you are calling it. Process(main()) executes main() right away in the parent process, and since main() loops forever, Process(second()) is never even reached.
When you create a Process object, you want to avoid writing this
p1 = Process(target=main())
and rather write
p1 = Process(target=main)
That also means that if you want to pass any input to the function, you would write
p1 = Process(target=main, args=('hi',))
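Putting it together, a minimal corrected sketch of your script (I've also added the if __name__ == '__main__': guard so it behaves the same on platforms that spawn new processes):

from multiprocessing import Process
import time

def main():
    while True:
        print('Running Main')
        time.sleep(1)

def second():
    while True:
        print('Running Second')
        time.sleep(1)

if __name__ == '__main__':
    p1 = Process(target=main)    # pass the function itself, don't call it
    p2 = Process(target=second)
    p1.start()
    p2.start()
    p1.join()
    p2.join()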
I'm trying to divvy up the task of looking up historical stock price data for a list of symbols by using Pool from the multiprocessing library.
This works great until I try to use the data I get back. I have my hist_price function defined and it outputs to a list-of-dicts pcl. I can print(pcl) and it has been flawless, but if I try to print(pcl) after the if __name__=='__main__': block, it blows up saying pcl is undefined. I've tried declaring global pcl in a couple places but it doesn't make a difference.
from multiprocessing import Pool
syms = ['List', 'of', 'symbols']
def hist_price(sym):
    #... lots of code looking up data, calculations, building dicts...
    stlh = {"Sym": sym, "10D Max": pcmax, "10D Min": pcmin}  # simplified
    return stlh

#global pcl
if __name__ == '__main__':
    pool = Pool(4)
    #global pcl
    pcl = pool.map(hist_price, syms)
    print(pcl)  # this works
    pool.close()
    pool.join()

print(pcl)  # says pcl is undefined
#...rest of my code, dependent on pcl...
I've also tried removing the if __name__=='__main__': block but it gives me a RunTimeError telling me specifically to put it back. Is there some other way to call variables to use outside of the if block?
I think there are two parts to your issue. The first is "what's wrong with pcl in the current code?", and the second is "why do I need the if __name__ == "__main__" guard block at all?".
Let's address them in order. The problem with the pcl variable is that it is only defined in the if block, so if the module gets loaded without being run as a script (which is what sets __name__ == "__main__"), it will not be defined when the later code runs.
To fix this, you can change how your code is structured. The simplest fix would be to guard the other bits of the code that use pcl within an if __name__ == "__main__" block too (e.g. indent them all under the current block, perhaps). An alternative fix would be to put the code that uses pcl into functions (which can be declared outside the guard block), then call the functions from within an if __name__ == "__main__" block. That would look something like this:
def do_stuff_with_pcl(pcl):
    print(pcl)

if __name__ == "__main__":
    # multiprocessing code, etc
    pcl = ...
    do_stuff_with_pcl(pcl)
As for why the issue came up in the first place, the ultimate cause is using the multiprocessing module on Windows. You can read about the issue in the documentation.
When multiprocessing creates a new process for its Pool, it needs to initialize that process with a copy of the current module's state. Because Windows doesn't have fork (which copies the parent process's memory into a child process automatically), Python needs to set everything up from scratch. In each child process, it loads the module from its file, and if the module's top-level code tried to create a new Pool, you'd have a recursive situation where each child process would start spawning a whole new set of child processes of its own.
The multiprocessing code has some guards against that, I think (so you won't fork bomb yourself out of simple carelessness), but you still need to do some of the work yourself too, by using if __name__ == "__main__" to guard any code that shouldn't be run in the child processes.
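For what it's worth, here is a rough sketch of how your script could be restructured along those lines; the body of hist_price is just a placeholder, since I don't know what your real lookup and calculations do:

from multiprocessing import Pool

syms = ['List', 'of', 'symbols']

def hist_price(sym):
    # placeholder for the real data lookup and calculations
    pcmax, pcmin = None, None
    return {"Sym": sym, "10D Max": pcmax, "10D Min": pcmin}

def do_stuff_with_pcl(pcl):
    # ...rest of the code that depends on pcl goes here...
    print(pcl)

if __name__ == '__main__':
    pool = Pool(4)
    pcl = pool.map(hist_price, syms)
    pool.close()
    pool.join()
    do_stuff_with_pcl(pcl)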
I am working on the integration of two different frameworks, say Main_process1.py and Main_process2.py. Main_process1.py has a main() and Main_process2.py has another main().
So, I have renamed the main() functions to main1() and main2() in Main_process1.py and Main_process2.py respectively, and created a new file, overall_Main.py.
The new file overall_Main.py will have two processes, one starting main1() from Main_process1.py and the other starting main2() from Main_process2.py.
Basically, what's required is an overall main process that drives the two different main processes. Please give some suggestions.
This is, like, the very basics of Python Multiprocessing:
from multiprocessing import Process
from Main_process1 import main1
from Main_process2 import main2
if __name__ == '__main__':
    p1 = Process(target=main1, args=(...,))  # list your args for main1 here
    p2 = Process(target=main2, args=(...,))  # list your args for main2 here
    p1.start()
    p2.start()
    # your other main code
    p1.join()
    p2.join()
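For completeness, a minimal sketch of what the two renamed modules could look like under these assumptions (the bodies are placeholders for whatever each framework's original main() did):

# Main_process1.py
def main1(*args):
    # framework 1's original main() code goes here
    print('framework 1 running')

# Main_process2.py
def main2(*args):
    # framework 2's original main() code goes here
    print('framework 2 running')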
I am trying to use multiprocessing to return a list, but instead of waiting until all processes are done, I get several returns from one return statement in mp_factorizer, like this:
None
None
(returns list)
In this example I used 2 threads. If I used 5 threads, there would be 5 None returns before the list is put out. Here is the code:
import math
import multiprocessing

def mp_factorizer(nums, nprocs, objecttouse):
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):
            p = multiprocessing.Process(
                target=worker,
                args=(nums[chunksize * i:chunksize * (i + 1)],
                      out_q,
                      objecttouse))
            procs.append(p)
            p.start()

        # Collect all results into a single result dict. We know how many dicts
        # with results to expect.
        resultlist = []
        for i in range(nprocs):
            temp = out_q.get()
            index = 0
            for item in temp:
                resultlist.append(temp[index][0][0:])
                index += 1

        # Wait for all worker processes to finish
        for p in procs:
            p.join()
        resultlist2 = [x for x in resultlist if x != []]
        return resultlist2

def worker(nums, out_q, objecttouse):
    """ The worker function, invoked in a process. 'nums' is a
        list of numbers to factor. The results are placed in
        a dictionary that's pushed to a queue.
    """
    outlist = []
    for n in nums:
        outputlist = objecttouse.getevents(n)
        if outputlist:
            outlist.append(outputlist)
    out_q.put(outlist)
mp_factorizer gets a list of items, the number of threads, and an object that the worker should use; it then splits up the list of items so all threads get an equal share, and starts the workers.
The workers then use the object to calculate something from the given list and add the result to the queue.
mp_factorizer is supposed to collect all results from the queue, merge them into one large list, and return that list. However, I get multiple returns.
What am I doing wrong? Or is this expected behavior due to the way Windows handles multiprocessing?
(Python 2.7.3, Windows 7 64-bit)
EDIT:
The problem was the wrong placement of if __name__ == '__main__':. I found out while working on another problem; see "using multiprocessing in a sub process" for a complete explanation.
if __name__ == '__main__' is in the wrong place. A quick fix would be to protect only the call to mp_factorizer like Janne Karila suggested:
if __name__ == '__main__':
    print mp_factorizer(list, 2, someobject)
However, on Windows the main file will be executed once when you run it, plus once for every worker process, in this case 2, so 3 executions of the main file in total, excluding the part protected by the guard.
This can cause problems as soon as other computations are being made in that same main file, and at the very least it unnecessarily slows things down. Even though only the worker function should be executed several times, on Windows everything that's not protected by if __name__ == '__main__': will be executed.
So the solution is to protect the whole main program by executing all of its code only after
if __name__ == '__main__':.
If the worker function is in the same file, however, it needs to be excluded from this if statement, because otherwise it cannot be found and called by the worker processes.
Pseudocode for the main file:
# Import stuff

if __name__ == '__main__':
    # execute whatever you want, it will only be executed
    # as often as you intend it to

    # execute the function that starts multiprocessing,
    # in this case mp_factorizer()

    # there is no worker function code here, it's in another file.
Even though the whole main file is protected, the worker function can still be started, as long as it lives in another file.
Pseudocode for the main file, with the worker function in the same file:
# Import stuff

# If the worker code is in the main file, exclude it from the if statement:
def worker():
    # worker code
    ...

if __name__ == '__main__':
    # execute whatever you want, it will only be executed
    # as often as you intend it to

    # execute the function that starts multiprocessing,
    # in this case mp_factorizer()
    ...

# All code outside of the if statement will be executed multiple times,
# once per spawned worker process.
For a longer explanation with runnable code, see using multiprocessing in a sub process
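To make this concrete, here is a rough sketch of the restructured main file based on the code above; SomeObject is just a stand-in for the real object you pass in, and the demo call in the guard is only there so the sketch runs on its own:

import math
import multiprocessing

class SomeObject(object):
    # stand-in for the real object; it only needs a getevents() method
    def getevents(self, n):
        return [[n, n * n]]

def worker(nums, out_q, objecttouse):
    # unchanged; it must stay at module level (outside the guard) so the
    # worker processes can find it when they import this file
    outlist = []
    for n in nums:
        outputlist = objecttouse.getevents(n)
        if outputlist:
            outlist.append(outputlist)
    out_q.put(outlist)

def mp_factorizer(nums, nprocs, objecttouse):
    # same body as before, just without the internal if __name__ test
    out_q = multiprocessing.Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):
        p = multiprocessing.Process(
            target=worker,
            args=(nums[chunksize * i:chunksize * (i + 1)], out_q, objecttouse))
        procs.append(p)
        p.start()
    resultlist = []
    for i in range(nprocs):
        temp = out_q.get()
        index = 0
        for item in temp:
            resultlist.append(temp[index][0][0:])
            index += 1
    for p in procs:
        p.join()
    return [x for x in resultlist if x != []]

if __name__ == '__main__':
    print(mp_factorizer(list(range(10)), 2, SomeObject()))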
Your if __name__ == '__main__' statement is in the wrong place. Put it around the print statement to prevent the subprocesses from executing that line:
if __name__ == '__main__':
    print mp_factorizer(list, 2, someobject)
As it is, you have the if inside mp_factorizer, which makes the function return None when it is called inside a subprocess (where __name__ is not '__main__'); that is where the extra None outputs come from.