So I took the following code, ran it, and literally nothing happened. Python acted like it had finished everything (maybe it did) but nothing printed. Any help getting this to work would be greatly appreciated!
import multiprocessing
def worker(number):
    print number
    return
if __name__ == '__main__':
    test = multiprocessing.Process(target=worker, args=[0,1,2,3,4])
    test.start()
Your code should actually result in an error. The args argument to multiprocessing.Process() does not open a process for each argument, it just supplies the arguments in the list to a single function and then calls that function in a child process. To run 5 separate instances like that, you would have to do something like this:
import multiprocessing
def worker(number):
    print number
    return
if __name__ == '__main__':
    procs = []
    for i in range(5):
        procs.append(multiprocessing.Process(target=worker, args=[i]))
    for proc in procs:
        proc.start()
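To see why the original call fails, it may help to reproduce the argument unpacking without any processes involved. The sketch below assumes Python 3 and relies on the documented behaviour that Process.run() calls the target with the items of args as positional arguments; the names outcome and message are only for illustration.

```python
def worker(number):
    return number

args = [0, 1, 2, 3, 4]

try:
    # Process(target=worker, args=args) ends up calling worker(*args)
    # in the child, i.e. worker(0, 1, 2, 3, 4).
    worker(*args)
    outcome = "ran"
    message = ""
except TypeError as exc:
    # worker() accepts one positional argument but was given five.
    outcome = "TypeError"
    message = str(exc)

print(outcome)
```

In a real child process the same TypeError shows up as a traceback written by the child, while the parent carries on as if nothing had gone wrong.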
Your code tries to run worker(0,1,2,3,4) in a new process. If you want to execute worker() function in parallel in multiple processes:
from multiprocessing import Pool
def worker(number):
    return number*number
if __name__ == '__main__':
    pool = Pool() # use all available CPUs
    for square in pool.imap(worker, [0,1,2,3,4]):
        print(square)
Your code results in an error when I run it. Since the items of args are passed as separate arguments, you need to wrap the list in a one-element tuple so that the entire list is passed as a single argument.
import multiprocessing
def worker(number):
    print number
    return
if __name__ == '__main__':
    test = multiprocessing.Process(target=worker, args=([0,1,2,3,4],))
    test.start()
    test.join()
Also, don't forget to join the process at the end.
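The trailing comma in args=([0,1,2,3,4],) is easy to miss, so here is a small sketch (Python 3, no processes involved) of the difference it makes. The variable names are made up for the example.

```python
def worker(number):
    return number

numbers = [0, 1, 2, 3, 4]

one_arg = (numbers,)   # trailing comma: a tuple whose only element is the list
no_tuple = (numbers)   # parentheses alone change nothing: still just the list

# Unpacking the one-element tuple hands the whole list to worker() as 'number'.
received = worker(*one_arg)
print(len(one_arg), received)
```

Passing no_tuple as args would unpack the list itself and call worker with five arguments again.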
I have a multiprocessing pool that runs with 1 thread, and it keeps repeating the code that comes before my function. I have tried different thread counts as well. I write programs like this quite a bit, so I think I know what is causing the problem, but I don't understand why: usually I use argparse to take file paths from the user, but this time I wanted to use input() instead. No errors are thrown, so I honestly have no clue.
from colorama import Fore
import colorama
import os
import ctypes
import multiprocessing
from multiprocessing import Pool
import random
colorama.init(autoreset=False)
print("headerhere")
#as you can see i used input instead of argparse
g = open(input(Fore.RED + " File Path?: " + Fore.RESET))
gg = open(input(Fore.RED + "File Path?: " + Fore.RESET))
#I messed around with this to see if it was the problem, ultimately disabling it until i fixed it, i just use 1 thread
threads = int(input(Fore.RED + "Amount of Threads?: " + Fore.RESET))
arrange = [lines.replace("\n", "")for lines in g]
good = [items.replace("\n", "") for items in gg]
#this is all of the code before the function that Pool calls
def che(line):
    print("f")
    #i would show my code but as i said this isnt the problem since ive made programs like this before, the only thing i changed is how i take file inputs from the user
def main():
    pool = Pool(1)
    pool.daemon = True
    result = pool.map(che, arrange)
if __name__ == "__main__":
    main()
Here's a minimal, reproducible example of your issue:
from multiprocessing import Pool
print('header')
def func(n):
    print(f'func {n}')
def main():
    pool = Pool(3)
    pool.map(func,[1,2,3])
if __name__ == '__main__':
    main()
On OSes where "spawn" (Windows and macOS) or "forkserver" (some Unix) is the default start method, each sub-process imports your script. Since print('header') is at global scope, it runs every time the script is imported into a process: once in the main process and once in each of the three workers. So the output is:
header
header
header
header
func 1
func 2
func 3
A multiprocessing script should have everything meant to run once inside function(s), and they should be called once by the main script via if __name__ == '__main__':, so the solution is to move it into your def main():
from multiprocessing import Pool
def func(n):
    print(f'func {n}')
def main():
    print('header')
    pool = Pool(3)
    pool.map(func,[1,2,3])
if __name__ == '__main__':
    main()
Output:
header
func 1
func 2
func 3
If you want the top level code before the definition of che to only be executed in the master process, then place it in a function and call that function in main.
In multiprocessing, the top-level statements will be interpreted/executed by both the master process and every child process. So, if some code should be executed only by the master and not by the children, it should not be placed at the top level. Instead, such code should be placed in functions, and these functions should be invoked in the main scope, i.e., inside the if block controlled by __main__ (or called in the main function in your code snippet).
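One way to watch this happen is to print the module name and process id at the top level and compare the parent with a child. This is only a sketch for Python 3: report() and child() are names invented here, and how often the first print appears depends on the start method your platform uses. Under "spawn" the child re-imports the file under the name '__mp_main__', so the line prints again; under "fork" the child inherits the already-loaded module and the line prints once.

```python
import multiprocessing
import os

# Top-level statement: runs in the parent, and again in every child
# whenever the start method re-imports this file.
print("module loaded as %r in pid %d" % (__name__, os.getpid()))

def report():
    """Return the name multiprocessing assigned to the current process."""
    return multiprocessing.current_process().name

def child():
    print("child process name: %s" % report())

if __name__ == '__main__':
    # Guarded: only the process launched as the script reaches this block.
    print("parent process name: %s" % report())
    p = multiprocessing.Process(target=child)
    p.start()
    p.join()
```

Anything that must run once (headers, input() prompts, opening files) belongs inside the guarded block or in a function called from it.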
It seems multiprocessing swaps between threads faster so I started working on swapping over but I'm getting some unexpected results. It causes my entire script to loop several times when a thread didn't before.
Snippet example:
from threading import Thread
stuff_needs_done = True
more_stuff_needs_done = True
print "Doing stuff"
def ThreadStuff():
    while 1 == 1:
        pass  # do stuff here
def OtherThreadStuff():
    while 1 == 1:
        pass  # do other stuff here
if stuff_needs_done:
    Thread(target=ThreadStuff).start()
if more_stuff_needs_done:
    Thread(target=OtherThreadStuff).start()
This works as I'd expect. The threads start and run until stopped. But when running a lot of these the overhead is higher (so I'm told) so I tried swapping to multiprocessing.
Snippet example:
from multiprocessing import Process
stuff_needs_done = True
more_stuff_needs_done = True
print "Doing stuff"
def ThreadStuff():
    while 1 == 1:
        pass  # do stuff here
def OtherThreadStuff():
    while 1 == 1:
        pass  # do other stuff here
if stuff_needs_done:
    stuffproc1 = Process(target=ThreadStuff).start()
if more_stuff_needs_done:
    stuffproc1 = Process(target=OtherThreadStuff).start()
But what seems to happen is the whole thing starts a couple of times so the "Doing stuff" output comes up and a couple of the threads run.
I could put some .join()s in but there is no loop which should cause the print output to run again which means there is nowhere for it to wait.
My hope is this is just a syntax thing but I'm stumped trying to find out why the whole script loops. I'd really appreciate any pointers in the right direction.
This is mentioned in the docs:
Safe importing of main module
Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such as starting a new process).
For example, under Windows running the following module would fail with a RuntimeError:
from multiprocessing import Process
def foo():
    print 'hello'
p = Process(target=foo)
p.start()
Instead one should protect the “entry point” of the program by using if __name__ == '__main__': as follows:
from multiprocessing import Process, freeze_support
def foo():
    print 'hello'
if __name__ == '__main__':
    freeze_support()
    p = Process(target=foo)
    p.start()
This allows the newly spawned Python interpreter to safely import the module and then run the module’s foo() function.
When I tried to run the code:
import multiprocessing
def worker():
    """worker function"""
    print 'Worker'
    return
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
The output is blank: the script simply runs without printing "Worker". How can I get the expected output with multiprocessing?
What actually is happening while using multiprocessing?
What is the maximum number of cores we can use for multiprocessing?
I've tried your code in Windows 7, Cygwin, and Ubuntu. For me all the processes finish before the loop comes to an end, so I get all the prints to show, but using join() will guarantee that every process has finished before the script moves on.
import multiprocessing
def worker():
    """worker function"""
    print 'Worker'
    return
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=worker)
        jobs.append(p)
        p.start()
    for p in jobs:
        p.join()
As far as how multiprocessing works in the backend, I'm going to let someone more experienced than myself answer that one :) I'll probably just make a fool of myself.
I get "Worker" printed 5 times on my side. Are you on Python 3? If so, you must use print("Worker"). From my experiments, I think multithreading doesn't mean using multiple cores; it just runs the different threads alternately to give the appearance of parallelism. Try reading the multiprocessing library documentation for more info.
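On the question about the number of cores: multiprocessing.cpu_count() reports how many CPUs the machine has, and that is the usual starting point for sizing a set of workers. The sketch below is written for Python 3. pick_worker_count() is a hypothetical helper, not part of the library, and clamping to the CPU count is just one reasonable policy, since nothing stops you from starting more processes than cores (they simply share them).

```python
import multiprocessing

# Number of CPUs reported by the operating system.
cores = multiprocessing.cpu_count()

def pick_worker_count(requested, available=None):
    """Hypothetical helper: clamp a requested worker count to [1, available]."""
    if available is None:
        available = multiprocessing.cpu_count()
    return max(1, min(requested, available))

if __name__ == '__main__':
    print("CPUs reported:", cores)
    print("workers for a request of 8:", pick_worker_count(8))
```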
I am trying to run 2 things in parallel with multiprocessing, I have this code:
from multiprocessing import Process
def secondProcess():
    x = 0
    while True:
        x += 1
if __name__ == '__main__':
    p = Process(target=secondProcess())
    p.start()
    print "blah"
    p.join()
What seems to happen is that the second process starts running but it does not proceed with running the parent process, it just hangs until the second process finishes (so in this case never). So "blah" is never printed.
How can I make it run both in parallel?
You don't want to call secondProcess. You want to pass it as a parameter.
p = Process(target=secondProcess)
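The difference between secondProcess() and secondProcess can be shown without starting any process. In this Python 3 sketch, second_process is a stand-in that returns immediately instead of looping forever, and calls is a list added only to record when the function body actually runs.

```python
calls = []

def second_process():
    # Stand-in for the real worker: records that it ran, then returns None.
    calls.append("ran")

# With parentheses the function runs right here, in the parent, and what
# would be handed to Process(target=...) is its return value (None).
passed_by_call = second_process()

# Without parentheses nothing runs; the function object itself is passed,
# and Process would call it later in the child.
passed_by_reference = second_process

print(passed_by_call, callable(passed_by_reference), calls)
```

With the original infinite loop, the call with parentheses never returns, which is why the parent never reaches p.start() or the print.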
I am trying to use multiprocessing to return a list, but instead of waiting until all processes are done, I get several returns from one return statement in mp_factorizer, like this:
None
None
(returns list)
in this example I used 2 threads. If I used 5 threads, there would be 5 None returns before the list is being put out. Here is the code:
def mp_factorizer(nums, nprocs, objecttouse):
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):
            p = multiprocessing.Process(
                target=worker,
                args=(nums[chunksize * i:chunksize * (i + 1)],
                      out_q,
                      objecttouse))
            procs.append(p)
            p.start()
        # Collect all results into a single result dict. We know how many dicts
        # with results to expect.
        resultlist = []
        for i in range(nprocs):
            temp = out_q.get()
            index = 0
            for i in temp:
                resultlist.append(temp[index][0][0:])
                index += 1
        # Wait for all worker processes to finish
        for p in procs:
            p.join()
        resultlist2 = [x for x in resultlist if x != []]
        return resultlist2
def worker(nums, out_q, objecttouse):
    """ The worker function, invoked in a process. 'nums' is a
    list of numbers to factor. The results are placed in
    a dictionary that's pushed to a queue.
    """
    outlist = []
    for n in nums:
        outputlist = objecttouse.getevents(n)
        if outputlist:
            outlist.append(outputlist)
    out_q.put(outlist)
mp_factorizer gets a list of items, # of threads, and an object that the worker should use, it then splits up the list of items so all threads get an equal amount of the list, and starts the workers.
The workers then use the object to calculate something from the given list, add the result to the queue.
Mp_factorizer is supposed to collect all results from the queue, merge them to one large list and return that list. However - I get multiple returns.
What am I doing wrong? Or is this expected behavior due to the strange way windows handles multiprocessing?
(Python 2.7.3, Windows7 64bit)
EDIT:
The problem was the wrong placement of if __name__ == '__main__':. I found out while working on another problem, see using multiprocessing in a sub process for a complete explanation.
if __name__ == '__main__' is in the wrong place. A quick fix would be to protect only the call to mp_factorizer like Janne Karila suggested:
if __name__ == '__main__':
    print mp_factorizer(list, 2, someobject)
However, on Windows the main file will be executed once when you launch it, plus once for every worker process, in this case 2. So this would be a total of 3 executions of the main file, excluding the protected part of the code.
This can cause problems as soon as other computations are made at the top level of the same file, and at the very least it slows things down unnecessarily. Even though only the worker function should be executed several times, on Windows everything that is not protected by if __name__ == '__main__' will be executed.
So the solution would be to protect the whole main process by executing all code only after
if __name__ == '__main__'.
If the worker function is in the same file, however, it needs to be excluded from this if statement because otherwise it can not be called several times for multiprocessing.
Pseudocode main thread:
# Import stuff
if __name__ == '__main__':
    #execute whatever you want, it will only be executed
    #as often as you intend it to
    #execute the function that starts multiprocessing,
    #in this case mp_factorizer()
    #there is no worker function code here, it's in another file.
Even though the whole main process is protected, the worker function can still be started, as long as it is in another file.
Pseudocode main thread, with worker function:
# Import stuff
#If the worker code is in the main thread, exclude it from the if statement:
def worker():
    #worker code
if __name__ == '__main__':
    #execute whatever you want, it will only be executed
    #as often as you intend it to
    #execute the function that starts multiprocessing,
    #in this case mp_factorizer()
#All code outside of the if statement will be executed multiple times
#depending on the # of assigned worker threads.
For a longer explanation with runnable code, see using multiprocessing in a sub process
Your if __name__ == '__main__' statement is in the wrong place. Put it around the print statement to prevent the subprocesses from executing that line:
if __name__ == '__main__':
    print mp_factorizer(list, 2, someobject)
Now you have the if inside mp_factorizer, which makes the function return None when called inside a subprocess.
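Putting the pieces together, a restructured version might look roughly like the sketch below. It is written for Python 3 and is not the original program: the objecttouse.getevents(n) call is replaced by squaring each number so the example can run on its own, and split_chunks() is a helper name introduced here for the chunking step. The guard sits only around the top-level call, and the worker stays at module level so child processes can find it.

```python
import math
import multiprocessing

def split_chunks(nums, nprocs):
    """Split nums into nprocs slices of (almost) equal size."""
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    return [nums[chunksize * i:chunksize * (i + 1)] for i in range(nprocs)]

def worker(nums, out_q):
    # Stand-in for objecttouse.getevents(n): square each number.
    out_q.put([n * n for n in nums])

def mp_factorizer(nums, nprocs):
    out_q = multiprocessing.Queue()
    procs = []
    for chunk in split_chunks(nums, nprocs):
        p = multiprocessing.Process(target=worker, args=(chunk, out_q))
        procs.append(p)
        p.start()
    results = []
    for _ in procs:
        # One queue item per worker; read them before joining.
        results.extend(out_q.get())
    for p in procs:
        p.join()
    # Workers may finish in any order, so sort for a stable result.
    return sorted(results)

if __name__ == '__main__':
    # Only the launching process runs this; children skip it on import.
    print(mp_factorizer([1, 2, 3, 4, 5], 2))  # prints [1, 4, 9, 16, 25]
```

Because mp_factorizer no longer contains the guard itself, it always returns a list, and the stray None values disappear.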