I am trying to run two things in parallel with multiprocessing. I have this code:
from multiprocessing import Process

def secondProcess():
    x = 0
    while True:
        x += 1

if __name__ == '__main__':
    p = Process(target=secondProcess())
    p.start()
    print "blah"
    p.join()
What seems to happen is that the second process starts running, but the parent process does not proceed; it hangs until the second process finishes (so in this case, never). So "blah" is never printed.
How can I make them run in parallel?
You don't want to call secondProcess; writing secondProcess() runs the function in the parent process before the Process object is even created. You want to pass the function itself as the target:
p = Process(target=secondProcess)
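For completeness, here is a minimal sketch of the corrected script (only the target argument changes; the infinite loop is kept from the question):

from multiprocessing import Process

def secondProcess():
    x = 0
    while True:
        x += 1

if __name__ == '__main__':
    # pass the function object itself; do not call it here
    p = Process(target=secondProcess)
    p.start()
    print "blah"  # prints immediately, while secondProcess runs in parallel
    p.join()      # note: secondProcess never returns, so this blocks forever

With this change "blah" is printed right away; since secondProcess loops forever, p.join() will still block indefinitely, which may or may not be what you want.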
I have written some very basic code to test multiprocessing in Python.
When I try to run the code on my Windows machine, it does not run, while it works fine on a Linux machine.
Below is the code and the error that it throws.
from multiprocessing import Process
import os
import time

# creating a list to store all the processes that will be created
processes = []

# get the count of CPU cores
cores = os.cpu_count()

def square(n):  # just a simple function for demonstration
    for i in range(n):
        print(i)
        i * i
        time.sleep(0.1)

# create a process per core
for i in range(cores):
    p = Process(target=square, args=(100,))
    processes.append(p)

# starting all the processes
for proc in processes:
    proc.start()

# join the processes
for proc in processes:
    proc.join()

print("All processes are done")
Most Process launches (not threads) in Python start a new interpreter instance that imports your module. This means your module-level code is executed again every time a child process performs that import. (This only applies to the spawn start method, which is the default on Windows.)
To avoid these issues, you have to move your process-creating code under the if __name__ == "__main__": guard, so that only the parent process runs it and each Process can be created correctly.
You fix it like so:
from multiprocessing import Process
import os
import time

def square(n):  # just a simple function for demonstration
    for i in range(n):
        print(i)
        i * i
        time.sleep(0.1)

if __name__ == "__main__":
    # get the count of CPU cores
    cores = os.cpu_count()

    # creating a list to store all the processes that will be created
    processes = []

    # create a process per core
    for i in range(cores):
        p = Process(target=square, args=(100,))
        processes.append(p)

    # starting all the processes
    for proc in processes:
        proc.start()

    # join the processes
    for proc in processes:
        proc.join()

    print("All processes are done")
Result:
1
0
0
1
1
0
2
0
0
1
1
1
2
... Truncated
All processes are done
I want to have a variable number of threads that run at the same time. I tested multiple multithreading examples from multiprocessing, but they don't run at the same time.
To explain it better, here is an example:
from multiprocessing import Pool
import time

def f(x):
    print("a", x)
    time.sleep(1)
    print("b", x)

if __name__ == '__main__':
    with Pool(3) as p:
        for i in range(5):
            p.map(f, [i])
Result:
a 0
b 0
a 1
b 1
a 2
b 2
Here it prints a, waits one second, and then prints b. But I want all the a's to be printed first and then all the b's, with every worker running at the same time, so that the result looks like this:
a0
a1
a2
b0
b1
b2
You mentioned threads but seem to be using processes. The threading module uses threads; the multiprocessing module uses processes. The primary difference is that threads run in the same memory space, while processes have separate memory. If you want to stay with processes, try the code snippet below.
from multiprocessing import Process
import time

def f(x):
    print("a", x)
    time.sleep(1)
    print("b", x)

if __name__ == '__main__':
    for i in range(5):
        p = Process(target=f, args=(i,))
        p.start()
Processes are spawned by creating a Process object and then calling its start() method.
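If you also want the parent to wait for all of the workers to finish, a minimal sketch extending the snippet above might look like this (keeping the Process objects in a list is my addition):

from multiprocessing import Process
import time

def f(x):
    print("a", x)
    time.sleep(1)
    print("b", x)

if __name__ == '__main__':
    processes = [Process(target=f, args=(i,)) for i in range(5)]
    for p in processes:
        p.start()  # start all five workers first...
    for p in processes:
        p.join()   # ...then wait for each of them to finish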
First of all, this is not a thread pool but a process pool. If you want threads, you need to use multiprocessing.dummy.
Second, it seems like you misunderstood the map method. Most importantly, it is blocking. You are calling it with a single-item list each time ([i]), so you don't actually use the Pool's power: you utilize just one process, wait for it to finish, and move on to the next number. To get the output you want, you should instead do:
if __name__ == '__main__':
    with Pool(3) as p:
        p.map(f, range(5))
But note that in this case there is a race between the number of workers and the size of the range: with only 3 workers for 5 items, some b's will be printed before the last a's start. If you want all a's and only then all b's, try using Pool(5), as sketched below.
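A minimal sketch of that variant (same f as above):

from multiprocessing import Pool
import time

def f(x):
    print("a", x)
    time.sleep(1)
    print("b", x)

if __name__ == '__main__':
    # one worker per item, so all five calls run concurrently
    with Pool(5) as p:
        p.map(f, range(5))

With five workers for five items, every call to f starts at roughly the same time, so all a's should appear before any b (the order within each group is still up to the scheduler). And if you really do want threads rather than processes, from multiprocessing.dummy import Pool gives you the same interface backed by threads.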
I have a program that needs to create several graphs, with each one often taking hours. Therefore I want to run these simultaneously on different cores, but cannot seem to get these processes to run with the multiprocessing module. Here is my code:
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()
        p.join()
(full_graph() has been defined earlier in the program, and is simply a function that runs a collection of other functions)
The function normally outputs some graphs and saves the data to a .txt file. All data is saved to the same two text files. However, calling the functions using the above code gives no console output, nor any output to the text files. All that happens is a pause of a few seconds, and then the program exits.
I am using the Spyder IDE with WinPython 3.6.3
Without a simple full_graph sample, nobody can tell you exactly what's happening. But your code is inherently wrong.
if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()
        p.join()  # <- this blocks until p is done
See the comment after p.join(). Because the join is inside the loop, if your processes really take hours to complete, you would run one process for hours, then the 2nd, then the 3rd: serially, and using a single core. The quick fix is to start all the processes first and only then join them, as sketched below.
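A minimal sketch of that fix (assuming full_graph is defined earlier in your program, as you describe):

import multiprocessing

if __name__ == '__main__':
    jobs = []
    for i in range(5):
        p = multiprocessing.Process(target=full_graph)
        jobs.append(p)
        p.start()   # start every process before joining any of them
    for p in jobs:
        p.join()    # now wait for all of them; they run concurrently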
From the docs: https://docs.python.org/3/library/multiprocessing.html
Process.join: https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process.join
If the optional argument timeout is None (the default), the method blocks until the process whose join() method is called terminates. If timeout is a positive number, it blocks at most timeout seconds. Note that the method returns None if its process terminates or if the method times out. Check the process’s exitcode to determine if it terminated.
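For instance (a small sketch of my own illustrating that behaviour, assuming p is a started Process):

p.join(timeout=5)         # wait at most 5 seconds
if p.exitcode is None:    # None means the process has not terminated yet
    print("p is still running")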
If each process does something different, you should then also have some args for full_graph (hint: might that be the missing factor?).
You probably want to use an interface like map from Pool
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.pool.Pool
And do (from the docs again)
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
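Adapted to your case, it might look something like the following sketch; full_graph taking a graph_id parameter is an assumption of mine, since your real function presumably needs to know which graph to build:

from multiprocessing import Pool

def full_graph(graph_id):
    # hypothetical stand-in for your real graph-building function
    print("building graph", graph_id)

if __name__ == '__main__':
    with Pool(5) as p:
        # builds the five graphs concurrently, one per worker process
        p.map(full_graph, range(5))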
So I took the following code, ran it, and literally nothing happened. Python acted like it had finished everything (maybe it did), but nothing was printed. Any help getting this to work would be greatly appreciated!
import multiprocessing

def worker(number):
    print number
    return

if __name__ == '__main__':
    test = multiprocessing.Process(target=worker, args=[0,1,2,3,4])
    test.start()
Your code should actually result in an error. The args argument to multiprocessing.Process() does not open a process for each argument; it supplies all the arguments in the list to a single call of the function in one child process, and worker() only accepts one argument. To run 5 separate instances like that, you would have to do something like this:
import multiprocessing

def worker(number):
    print number
    return

if __name__ == '__main__':
    procs = []
    for i in range(5):
        procs.append(multiprocessing.Process(target=worker, args=[i]))
    for proc in procs:
        proc.start()
Your code tries to run worker(0, 1, 2, 3, 4) in a new process. If you want to execute the worker() function in parallel in multiple processes:
from multiprocessing import Pool

def worker(number):
    return number*number

if __name__ == '__main__':
    pool = Pool()  # use all available CPUs
    for square in pool.imap(worker, [0,1,2,3,4]):
        print(square)
Your code results in an error when I run it. Since args is unpacked into separate arguments, you need to wrap the list in a one-element tuple so that the entire list is passed as a single argument:
import multiprocessing

def worker(number):
    print number
    return

if __name__ == '__main__':
    test = multiprocessing.Process(target=worker, args=([0,1,2,3,4],))
    test.start()
    test.join()
Also, don't forget to join the process at the end.
I am trying to use multiprocessing to return a list, but instead of waiting until all processes are done, I get several returns from one return statement in mp_factorizer, like this:
None
None
(returns list)
In this example I used 2 threads. If I used 5 threads, there would be 5 None returns before the list is put out. Here is the code:
def mp_factorizer(nums, nprocs, objecttouse):
    if __name__ == '__main__':
        out_q = multiprocessing.Queue()
        chunksize = int(math.ceil(len(nums) / float(nprocs)))
        procs = []
        for i in range(nprocs):
            p = multiprocessing.Process(
                target=worker,
                args=(nums[chunksize * i:chunksize * (i + 1)],
                      out_q,
                      objecttouse))
            procs.append(p)
            p.start()
        # Collect all results into a single result dict. We know how many dicts
        # with results to expect.
        resultlist = []
        for i in range(nprocs):
            temp = out_q.get()
            index = 0
            for i in temp:
                resultlist.append(temp[index][0][0:])
                index += 1
        # Wait for all worker processes to finish
        for p in procs:
            p.join()
        resultlist2 = [x for x in resultlist if x != []]
        return resultlist2

def worker(nums, out_q, objecttouse):
    """ The worker function, invoked in a process. 'nums' is a
    list of numbers to factor. The results are placed in
    a dictionary that's pushed to a queue.
    """
    outlist = []
    for n in nums:
        outputlist = objecttouse.getevents(n)
        if outputlist:
            outlist.append(outputlist)
    out_q.put(outlist)
mp_factorizer gets a list of items, the number of worker processes, and an object that the workers should use; it then splits up the list of items so all workers get an equal share, and starts the workers.
The workers then use the object to calculate something from the given list and add the result to the queue.
mp_factorizer is supposed to collect all results from the queue, merge them into one large list, and return that list. However, I get multiple returns.
What am I doing wrong? Or is this expected behavior due to the strange way Windows handles multiprocessing?
(Python 2.7.3, Windows 7 64-bit)
EDIT:
The problem was the wrong placement of if __name__ == '__main__':. I found out while working on another problem; see using multiprocessing in a sub process for a complete explanation.
if __name__ == '__main__' is in the wrong place. A quick fix would be to protect only the call to mp_factorizer, as Janne Karila suggested:
if __name__ == '__main__':
    print mp_factorizer(list, 2, someobject)
However, on Windows the main file will be executed once at launch plus once for every worker process, in this case 2, for a total of 3 executions of everything outside the protected part of the code.
This can cause problems as soon as there are other computations being made at module level, and at the very least it unnecessarily slows down performance. Even though only the worker function should be executed several times, on Windows everything that's not protected by if __name__ == '__main__' will be executed again in each child.
So the solution is to protect the whole main process by executing all code only after if __name__ == '__main__'.
If the worker function is in the same file, however, it needs to stay outside this if statement, because otherwise the child processes cannot import it and call it for multiprocessing.
Pseudocode main thread:
# Import stuff

if __name__ == '__main__':
    # execute whatever you want, it will only be executed
    # as often as you intend it to
    # execute the function that starts multiprocessing,
    # in this case mp_factorizer()
    # there is no worker function code here, it's in another file
    ...
Even though the whole main process is protected, the worker function can still be started, as long as it is in another file.
Pseudocode main thread, with worker function:
# Import stuff

# If the worker code is in the main file, exclude it from the if statement:
def worker():
    # worker code
    ...

if __name__ == '__main__':
    # execute whatever you want, it will only be executed
    # as often as you intend it to
    # execute the function that starts multiprocessing,
    # in this case mp_factorizer()
    ...

# All code outside of the if statement will be executed multiple times,
# depending on the number of assigned worker processes.
For a longer explanation with runnable code, see using multiprocessing in a sub process
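For illustration, here is a minimal runnable file following that layout (the name compute and the squaring work are made up, not from the question):

from multiprocessing import Process, Queue

def compute(out_q, n):
    # worker code: lives at module level so the child processes can import it
    out_q.put(n * n)

if __name__ == '__main__':
    # everything in here runs exactly once, in the parent process only
    out_q = Queue()
    procs = [Process(target=compute, args=(out_q, i)) for i in range(4)]
    for p in procs:
        p.start()
    results = [out_q.get() for _ in procs]  # drain the queue before joining
    for p in procs:
        p.join()
    print results  # e.g. [0, 1, 4, 9] in some order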
Your if __name__ == '__main__' statement is in the wrong place. Put it around the print statement to prevent the subprocesses from executing that line:
if __name__ == '__main__':
    print mp_factorizer(list, 2, someobject)
As your code stands, the if is inside mp_factorizer, which makes the function body be skipped, so it returns None whenever the module-level call is re-executed inside a subprocess.
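Putting the two fixes together, the restructured module might look like this sketch (the collection logic is kept as in the question; somelist and someobject stand in for your actual data and helper object):

import math
import multiprocessing

def mp_factorizer(nums, nprocs, objecttouse):
    # note: no if __name__ guard inside the function any more
    out_q = multiprocessing.Queue()
    chunksize = int(math.ceil(len(nums) / float(nprocs)))
    procs = []
    for i in range(nprocs):
        p = multiprocessing.Process(
            target=worker,
            args=(nums[chunksize * i:chunksize * (i + 1)], out_q, objecttouse))
        procs.append(p)
        p.start()
    resultlist = []
    for i in range(nprocs):
        for item in out_q.get():
            resultlist.append(item[0][0:])
    for p in procs:
        p.join()
    return [x for x in resultlist if x != []]

def worker(nums, out_q, objecttouse):
    outlist = []
    for n in nums:
        outputlist = objecttouse.getevents(n)
        if outputlist:
            outlist.append(outputlist)
    out_q.put(outlist)

if __name__ == '__main__':
    # only the parent process executes this line; the worker processes
    # merely import the module, which defines the functions and nothing else
    print mp_factorizer(somelist, 2, someobject)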