Python: Can you set CPU counts with multiprocessing Process?

With multiprocessing.Pool, there are code samples in the tutorials where you set the number of processes from the CPU count. Can you set the number of CPUs with the multiprocessing.Process approach?
from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])

Actually, a Process represents only one process, which uses only one CPU (if you don't use threads); it is up to you to create as many Processes as you need.
This means that you have to create as many Processes as you have CPUs in order to use all of them (possibly one fewer if you are also doing work in the main process).
You can read the number of CPUs with multiprocessing.cpu_count().
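For illustration, a minimal sketch of that approach (not from the question): spawn one Process per CPU, each handling its own slice of the data. The worker function and the slicing scheme here are placeholder assumptions.
from multiprocessing import Process, cpu_count

def worker(chunk):
    # Placeholder for the real per-process work on one slice of the data.
    for x in chunk:
        pass

if __name__ == '__main__':
    data = list(range(1000))
    n = cpu_count()                           # number of CPUs available
    chunks = [data[i::n] for i in range(n)]   # one slice per process

    procs = [Process(target=worker, args=(c,)) for c in chunks]
    for p in procs:
        p.start()
    for p in procs:
        p.join()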

Related

Add new tasks when the number of active jobs becomes less than N

I am trying to parallelize some very time-consuming tasks across a certain number of cores. My goal is to exploit the server resources as much as possible. The total number of CPUs is 20, but the number of tasks to do is much larger (say, 100).
Each task runs for a different amount of time, so the code below can leave a number of cores idle with no work.
import multiprocessing as mp

def some_task(*args):
    # Something happening here
    pass

cpu_count = mp.cpu_count()
p = mp.Pool(cpu_count)
n_thread = 1
for something in somethings:
    p.apply_async(some_task, args=(something, ))
    if n_thread == cpu_count:
        p.close()
        p.join()
        p = mp.Pool(cpu_count)
        n_thread = 1
        continue
    n_thread += 1
p.close()
if n_thread != 1:
    p.join()
How can I write this so that new tasks are started as soon as the number of active jobs drops below the number of CPUs?
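One common way to get that behaviour, sketched here under the assumption that some_task and somethings are as in the snippet above: submit every task to a single Pool up front and let the pool hand out work as workers become free, instead of batching and recreating the pool.
import multiprocessing as mp

def some_task(*args):
    # Placeholder for the real work.
    pass

if __name__ == '__main__':
    somethings = range(100)   # stand-in for the ~100 tasks

    with mp.Pool(mp.cpu_count()) as p:
        # The pool starts the next task as soon as a worker is free,
        # so no core sits idle while tasks remain.
        async_results = [p.apply_async(some_task, args=(s,)) for s in somethings]
        results = [r.get() for r in async_results]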

Python multiprocessing pool execution time compared to non-multiprocessing execution time

I am currently redesigning a program to use Python's multiprocessing pools. My first impression was that the execution time increased instead of decreased. Therefore, I got curious and wrote a little test script:
import time
import multiprocessing

def simple(x):
    return 2*x

def less_simple(x):
    b = x
    for i in range(0, 100):
        b = b * i
    return 2*x

a = list(range(0, 1000000))

print("without multiprocessing:")
before = time.time()
res = map(simple, a)
after = time.time()
print(str(after - before))
print("-----")

print("with multiprocessing:")
for i in range(1, 5):
    before = time.time()
    with multiprocessing.Pool(processes=i) as pool:
        pool.map(simple, a)
    after = time.time()
    print(str(i) + " processes: " + str(after - before))
I get the following results:
without multiprocessing:
2.384185791015625e-06
with multiprocessing:
1 processes: 0.35068225860595703
2 processes: 0.21297240257263184
3 processes: 0.21887946128845215
4 processes: 0.3474385738372803
When I replace simple with less_simple in both map calls, I get the following results:
without multiprocessing:
2.6226043701171875e-06
with multiprocessing:
1 processes: 3.1453816890716553
2 processes: 1.615351676940918
3 processes: 1.6125438213348389
4 processes: 1.5159809589385986
Honestly, I am a bit confused, because the non-multiprocessing version is always orders of magnitude faster. Additionally, increasing the number of processes seems to have little to no influence on the runtime. Therefore, I have a few questions:
Do I make some mistake in the usage of multiprocessing?
Are my test functions too simple to get a positive impact from multiprocessing?
Is there a way to estimate at which point multiprocessing has an advantage, or do I have to test it?
I did some more research and basically, you are right. Both functions are rather small and somewhat artificial. However, there is a measurable time difference between non-multiprocessing and multiprocessing even for those functions, once you take into account how map works. In Python 3, map only returns a lazy iterator over the results [1], i.e., in the above example the non-multiprocessing timing only measures the creation of that iterator, which is of course very fast.
Therefore, I replaced the map function with a traditional for loop:
for elem in a:
    res = simple(elem)
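Equivalently (a small aside, not from the original post), forcing the lazy map to be consumed gives a fair single-process timing for the same simple and a:
before = time.time()
res = list(map(simple, a))   # list(...) actually evaluates every call
after = time.time()
print("without multiprocessing: " + str(after - before))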
For the simple function, the execution is still faster without multiprocessing because the overhead is too big for such a small function:
without multiprocessing:
0.1392803192138672
with multiprocessing:
1 processes: 0.38080787658691406
2 processes: 0.22507309913635254
3 processes: 0.21307945251464844
4 processes: 0.2152390480041504
However, in case of the function less_simple, you can see an actual advantage of multiprocessing:
without multiprocessing:
3.2029929161071777
with multiprocessing:
1 processes: 3.4934208393096924
2 processes: 1.8259460926055908
3 processes: 1.9196875095367432
4 processes: 1.716357946395874
[1] https://docs.python.org/3/library/functions.html#map

Multiprocessing in Python not faster than doing it sequentially

I want to run something in parallel, but it always ends up slower. Below are two code snippets that can be compared: the multiprocessing version needs 12 seconds on my laptop, the sequential version only 3 seconds. I thought multiprocessing was supposed to be faster.
I know that the task itself does not make much sense; it is only there to compare the two approaches. I also know that bubble sort can be replaced by faster sorting methods.
Thanks.
Multiprocessing way:
from multiprocessing import Process, Manager
import os
import random

myArray = []
for i in range(1000):
    myArray.append(random.randint(1, 1000))

def getRandomSample(myset, sample_size):
    sorted_list = sorted(random.sample(xrange(len(myset)), sample_size))
    return([myset[i] for i in sorted_list])

def bubbleSort(iterator, alist, return_dictionary):
    sample_list = (getRandomSample(alist, 100))
    for passnum in range(len(sample_list)-1, 0, -1):
        for i in range(passnum):
            if sample_list[i] > alist[i+1]:
                temp = alist[i]
                sample_list[i] = alist[i+1]
                sample_list[i+1] = temp
    return_dictionary[iterator] = sample_list

if __name__ == '__main__':
    manager = Manager()
    return_dictionary = manager.dict()
    jobs = []
    for i in range(3000):
        p = Process(target=bubbleSort, args=(i, myArray, return_dictionary))
        jobs.append(p)
        p.start()
    for proc in jobs:
        proc.join()
    print return_dictionary.values()
The other way:
import os
import random

myArray = []
for i in range(1000):
    myArray.append(random.randint(1, 1000))

def getRandomSample(myset, sample_size):
    sorted_list = sorted(random.sample(xrange(len(myset)), sample_size))
    return([myset[i] for i in sorted_list])

def bubbleSort(alist):
    sample_list = (getRandomSample(alist, 100))
    for passnum in range(len(sample_list)-1, 0, -1):
        for i in range(passnum):
            if sample_list[i] > alist[i+1]:
                temp = alist[i]
                sample_list[i] = alist[i+1]
                sample_list[i+1] = temp
    return(sample_list)

if __name__ == '__main__':
    results = []
    for i in range(3000):
        results.append(bubbleSort(myArray))
    print results
Multiprocessing is faster if you have multiple cores and do the parallelization properly. In your example you create 3000 processes, which causes an enormous amount of context switching between them. Instead, use a Pool to schedule the jobs across a fixed number of processes:
from multiprocessing import Pool

# getRandomSample and myArray are defined as in the question's code above.

def bubbleSort(alist):
    sample_list = (getRandomSample(alist, 100))
    for passnum in range(len(sample_list)-1, 0, -1):
        for i in range(passnum):
            if sample_list[i] > alist[i+1]:
                temp = alist[i]
                sample_list[i] = alist[i+1]
                sample_list[i+1] = temp
    return(sample_list)

if __name__ == '__main__':
    pool = Pool(processes=4)
    for x in pool.imap_unordered(bubbleSort, (myArray for x in range(3000))):
        pass
I removed all the output and did some tests on my 4 core machine. As expected the code above was about 4 times faster than your sequential example.
Multiprocessing is not just magically faster: your computer still has to do the same total amount of work. It's like trying to do multiple tasks at once yourself; it doesn't make each task quicker.
In a "normal" program, doing it sequentially is easier to read and write (that it is that much faster here surprises me a little). Multiprocessing is especially useful when you have to wait for something else, like a web request (you can send several at once and don't have to wait for each in turn), or when you have some sort of event loop.
My guess as to why the sequential version is faster is that Python already uses multiprocessing internally wherever it makes sense (don't quote me on that). Also, with threading it has to keep track of what is where, which means more overhead.
So, going back to the real-world example: if you give a task to somebody else and, instead of waiting for it, do other things at the same time, then you are faster.

Deadlock with multiprocessing module

I have a function that, without multiprocessing, loops over an array of 3-tuples and does some calculation. This array can be really long (>1 million entries), so I thought using several processes could help speed things up.
I start with a list of points (random_points) from which I create all possible combinations of three points (combList). This combList is then passed to my function.
The basic code I have works, but only when the random_points list has 18 entries or fewer.
from scipy import stats
import numpy as np
import itertools
import multiprocessing as mp

def calc3PointsList(points, output):
    xy = []
    r = []
    for point in points:
        # do stuff with points and append results to xy and r
        pass
    output.put((xy, r))

output = mp.Queue()
random_points = [(np.array((stats.uniform(-0.5, 1).rvs(), stats.uniform(-0.5, 1).rvs()))) for _ in range(18)]
combList = list(itertools.combinations(random_points, 3))
N = 6
processes = [mp.Process(target=calc3PointsList,
                        args=(combList[(i-1)*len(combList)/(N-1):i*len(combList)/(N-1)], output))
             for i in range(1, N)]
for p in processes:
    p.start()
for p in processes:
    p.join()
results = [output.get() for p in processes]
As soon as the length of the random_points list is longer than 18 the program seems to go into a deadlock. With 18 and lower it just finishes fine. Am I using this whole multiprocessing module the wrong way?
OK, the problem is described in the programming guidelines mentioned by user2667217:
Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the Queue.cancel_join_thread method of the queue to avoid this behaviour.)
This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically.
Removing the join operation made it work. Also, the right way to retrieve the results seems to be:
results = [output.get() for p in processes]
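As a small sketch of that ordering (not taken from the original answer), drain the queue before joining so that each child's feeder thread can flush its buffered items:
# Assumes `processes` and `output` are set up as in the question,
# with each process putting exactly one item on the queue.
for p in processes:
    p.start()

# Get the results first; each get() unblocks a child that is still
# feeding its buffered item into the queue's underlying pipe.
results = [output.get() for p in processes]

# Only join once the queue has been drained.
for p in processes:
    p.join()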
I don't see anything else in what you posted that is clearly wrong, but there is one thing you should definitely do: start new processes inside an if __name__ == "__main__": block, see the programming guidelines.
from scipy import stats
import numpy as np
import itertools
import multiprocessing as mp

def calc3PointsList(points, output):
    xy = []
    r = []
    for point in points:
        # do stuff with points and append results to xy and r
        pass
    output.put((xy, r))

if __name__ == "__main__":
    output = mp.Queue()
    random_points = [(np.array((stats.uniform(-0.5, 1).rvs(), stats.uniform(-0.5, 1).rvs()))) for _ in range(18)]
    combList = list(itertools.combinations(random_points, 3))
    N = 6
    processes = [mp.Process(target=calc3PointsList,
                            args=(combList[(i-1)*len(combList)/(N-1):i*len(combList)/(N-1)], output))
                 for i in range(1, N)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
    results = [output.get() for x in range(output.qsize())]

multiprocessing full capacity in Python

I wrote the following code, which calls the function compute_cluster 6 times in parallel (each run of this function is independent of the others, and each run writes its results to a separate file):
global L
for L in range(6, 24):
    pool = Pool(6)
    pool.map(compute_cluster, range(1, 3))
    pool.close()

if __name__ == "__main__":
    main(sys.argv)
Despite the fact that I'm running this code on an i7 machine, and no matter how large I set the Pool, it only ever runs two processes in parallel. Is there any suggestion on how I can run 6 processes in parallel, such that the first three processes use L=6 and call compute_cluster with parameter values from 1:3 in parallel, while at the same time the other three processes run the same function with the same parameter values but with the global L value set to 7?
Any suggestions are highly appreciated.
There are a few things wrong here. First, as to why you always only have 2 processes going at a time: the reason is that range(1, 3) only returns 2 values, so you're only giving the pool 2 tasks to do before you close it.
The second issue is that you're relying on global state. In this case the code probably works, but it limits your performance, since it is the factor preventing you from using all your cores. I would parallelize the L loop rather than the "inner" range loop. Something like this¹:
from multiprocessing import Pool

def wrapper(tup):
    l, r = tup
    # Even better would be to get rid of `L` and pass it to compute_cluster
    global L
    L = l
    compute_cluster(r)

for r in range(1, 3):
    p = Pool(6)
    p.map(wrapper, [(l, r) for l in range(6, 24)])
    p.close()
This works with the global L because each spawned process picks up its own copy of L -- It doesn't get shared between processes.
¹ Untested code
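As a quick illustration of that point (not from the answer): module-level state modified in a child process is not visible to the parent, since each process works on its own copy.
from multiprocessing import Process

G = 0

def child():
    global G
    G = 42                      # modifies only the child's copy
    print("child sees", G)      # prints 42

if __name__ == "__main__":
    p = Process(target=child)
    p.start()
    p.join()
    print("parent still sees", G)   # prints 0; the change is not shared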
As pointed out in the comments, we can even pull the Pool out of the loop:
p = Pool(6)
p.map(wrapper, [(l, r) for l in range(6, 24) for r in range(1, 3)])
p.close()
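And, following the comment in wrapper about dropping the global entirely, a hypothetical variant in which compute_cluster is assumed to accept L as an explicit parameter (its real signature is not shown in the question):
from multiprocessing import Pool

def compute_cluster(L, r):
    # Hypothetical signature: L passed explicitly instead of set via a global.
    ...

if __name__ == "__main__":
    with Pool(6) as p:
        p.starmap(compute_cluster, [(l, r) for l in range(6, 24) for r in range(1, 3)])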
