Multiprocessing on a set number of cores

Multiprocessing on a set number of cores - python

In the code below, where Function is a function to be called, how might I specify the number of processors to be used as 10?
if __name__ == '__main__':
jobs = []
for l in lst:
p = multiprocessing.Process(target=Function, args=(l,))
jobs.append(p)
p.start()
This code will completely take over my server, so how do I limit it to ten cores? Should I put it in a loop?

Given that you are essentially mapping a function over a list of variables might I suggest that you use multiprocessing.Pool instead.
This is a class which creates a pool of a limited number of worker threads that can then be used to run a function over a list of inputs instead of Process where you create a thread per function call and then run them all at the same time
An example of using it in versions of python < 3.3 would be:
from multiprocessing import Pool
import contextlib
num_threads = 10
with contextlib.closing( Pool(num_threads) ) as pool:
results = pool.map(Function, lst)
If you are using python 3 than the Pool class can use a context manager by default and the code simplifies to:
from multiprocessing import Pool
num_threads = 10
with Pool(num_threads) as pool:
results = pool.map(lst)

Use a process pool to allocate set number of processes to do your task: https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers

Related

Run N processes but never reuse the same process

I like to run a bunch of processes concurrently but never want to reuse an already existing process. So, basically once a process is finished I like to create a new one. But at all times the number of processes should not exceed N.
I don't think I can use multiprocessing.Pool for this since it reuses processes.
How can I achieve this?
One solution would be to run N processes and wait until all processed are done. Then repeat the same thing until all tasks are done. This solution is not very good since each process can have very different runtimes.
Here is a naive solution that appears to work fine:
from multiprocessing import Process, Queue
import random
import os
from time import sleep
def f(q):
print(f"{os.getpid()} Starting")
sleep(random.choice(range(1, 10)))
q.put("Done")
def create_proc(q):
p = Process(target=f, args=(q,))
p.start()
if __name__ == "__main__":
q = Queue()
N = 5
for n in range(N):
create_proc(q)
while True:
q.get()
create_proc(q)

Pool can reuse a process a limited number of times, including one time only when you pass maxtasksperchild=1. You might also try initializer to see if you can run the picky once per process parts of your library there instead of in your pool jobs.

Timing a multiprocessing script

I've stumbled across a weird timing issue while using the multiprocessing module.
Consider the following scenario. I have functions like this:
import multiprocessing as mp
def workerfunc(x):
# timehook 3
# something with x
# timehook 4
def outer():
# do something
mygen = ... (some generator expression)
pool = mp.Pool(processes=8)
# time hook 1
result = [pool.apply(workerfunc, args=(x,)) for x in mygen]
# time hook 2
if __name__ == '__main__':
outer()
I am utilizing the time module to get an arbitrary feeling for how long my functions run. I successfully create 8 separate processes, which terminate without error. The longest time for a worker to finish is about 130 ms (measured between timehook 3 and 4).
I expected (as they are running in parallel) that the time between hook 1 and 2 will be approximately the same. Surprisingly, I get 600 ms as a result.
My machine has 32 cores and should be able to handle this easily. Can anybody give me a hint where this difference in time comes from?
Thanks!

You are using pool.apply which is blocking. Use pool.apply_async instead and then the function calls will all run in parallel, and each will return an AsyncResult object immediately. You can use this object to check when the processes are done and then retrieve the results using this object also.

Since you are using multiprocessing and not multithreading your performance issue is not related to GIL (Python's Global Interpreter Lock).
I've found an interesting link explaining this with an example, you can find it in the bottom of this answer.
The GIL does not prevent a process from running on a different
processor of a machine. It simply only allows one thread to run at
once within the interpreter.
So multiprocessing not multithreading will allow you to achieve true
concurrency.
Lets understand this all through some benchmarking because only that
will lead you to believe what is said above. And yes, that should be
the way to learn — experience it rather than just read it or
understand it. Because if you experienced something, no amount of
argument can convince you for the opposing thoughts.
import random
from threading import Thread
from multiprocessing import Process
size = 10000000 # Number of random numbers to add to list
threads = 2 # Number of threads to create
my_list = []
for i in xrange(0,threads):
my_list.append([])
def func(count, mylist):
for i in range(count):
mylist.append(random.random())
def multithreaded():
jobs = []
for i in xrange(0, threads):
thread = Thread(target=func,args=(size,my_list[i]))
jobs.append(thread)
# Start the threads
for j in jobs:
j.start()
# Ensure all of the threads have finished
for j in jobs:
j.join()
def simple():
for i in xrange(0, threads):
func(size,my_list[i])
def multiprocessed():
processes = []
for i in xrange(0, threads):
p = Process(target=func,args=(size,my_list[i]))
processes.append(p)
# Start the processes
for p in processes:
p.start()
# Ensure all processes have finished execution
for p in processes:
p.join()
if __name__ == "__main__":
multithreaded()
#simple()
#multiprocessed()
Additional information
Here you can find the source of this information and a more detailed technical explanation (bonus: there's also Guido Van Rossum quotes in it :) )

Pythons parallel processing

I am in the following setting: I have a method that takes an objective function f as input. As a subrouting of that method i want to evaluate f on a small set of points. Since f has high complexity i considered doing that in parallel.
All online examples hang up even for trivial functions like squaring on sets with 5 points. They are using the multiprocessing library - and i don't know what i am doing wrong. I am not sure how to encapsulate that __name__ == "__main__" statement in my method. (since it is part of a module - i guess instead of "__main__" i should use the module name?)
Code i have been using looks like
from multiprocessing.pool import Pool
from multiprocessing import cpu_count
x = [1,2,3,4,5]
num_cores = cpu_count()
def f(x):
return x**2
if __name__ == "__main__":
pool = Pool(num_cores)
y = list(pool.map(f, x))
pool.join()
print(y)
When executing this code in my spyder it takes a bloody long time to finish.
So my main questions are: What am i doing wrong in this code? How can i encapsulate the __name__-statement, when this code is part of a bigger method?
Is it even worth it parallelizing this? (one function evaluation can take multiple minutes and in serial this adds up to a total runtime of hours...)

According to documentation :
close()
Prevents any more tasks from being submitted to the pool. Once all the tasks have been completed the worker processes will exit.
terminate()
Stops the worker processes immediately without completing outstanding work. When the pool object is garbage collected
terminate() will be called immediately.
join()
Wait for the worker processes to exit. One must call close() or terminate() before using join().
So you should add :
from multiprocessing.pool import Pool
from multiprocessing import cpu_count
x = [1,2,3,4,5]
def f(x):
return x**2
if __name__ == "__main__":
pool = Pool()
y = list(pool.map(f, x))
pool.close()
pool.join()
print(y)
You can call Pool without any argument and it will use cpu_count by default
If processes is None then the number returned by cpu_count() is used
About the if name == "main", read more informations here.
So you need to think a bit about which code you want executed only in the main program. The most obvious example is that you want code that creates child processes to run only in the main program - so that should be protected by name == 'main'

You might want to look into the chunksize argument of the map function that you are using.
On a large enough input list, a lot of your time is spent simply communicating the arguments to and from the separate parallel processes.
One symptom of this problem is that when you use something like htop all cores are firing but at < 100%.

ThreadPool and Pool for parallel processing

Is there a way to use both ThreadPool and Pool in python to parallelise a loop by specifying the number of CPUs and cores you wish to use?
For example I would have a loop execute as:
from multiprocessing.dummy import Pool as ThreadPool
from tqdm import tqdm
import numpy as np
def my_function(x):
return x + 1
pool = ThreadPool(4)
my_array = np.arange(0,1e6,1)
results = list(tqdm(pool.imap(my_function, my_array),total=len(my_array)))
For 4 cores but it I wanted to spread these out on multiple CPUs as well, is there a simple way to adapt the code?

You are confusing between a core and a CPU. Generally, for all purposes both are considered to be the same(let's call them processor from now on).
When creating a thread pool in python, the threads are user level threads and are run on the same processor, due to Global Interpreter Lock(GIL) in python. As only one thread can control the python interpreter at a time. So, using (python)threads we don't get any real concurrency in data-intensive tasks.
How to solve this? Easy. Spawn multiple python processes running on different processors(each with its own interpreter). This is where the multi processing(mp) module is used, to spawn multiple processes from the parent python process in which it is called.
You can verify this by running htop(on linux, mac) and analysing the number of python processes. In case of mp module, they all will have the same name as the parent script where the pool.map function is called.
Timings for your code on a 8 core mac: 39.7s
Timing for this code on the same machine : 2.9s(note I can use 8 cores at max, but for comparison purposes using only 4)
Below is the modified code:
from multiprocessing.dummy import Pool as ThreadPool
from tqdm import tqdm
import numpy as np
import time
import multiprocessing as mp
def my_function(x):
return x + 1
pool = ThreadPool(4)
my_array = np.arange(0,1e6,1)
t1 = time.time()
# results = list(tqdm(pool.imap(my_function, my_array),total=len(my_array)))
pool = mp.Pool(processes=4) # Generally, set to 2*num_cores you have
res = pool.map(my_function, my_array)
print("Time taken = ", time.time() - t1)

multiprocessing.dummy.Pool is exactly simple ThreadPool, which don't use multicores and multicpus (because of GIL). you must use multiprocessing.Pool to run Process, which is process in your OS (if you define Pool(N) - N is number of this processes, if no - number of your cores in OS is default). Arguments this processes get from internal queue of Pool. 'case of that U will use all cpu and all core in your OS

What is the easiest way to make maximum cpu usage for nested for-loops?

I have code that makes unique combinations of elements. There are 6 types, and there are about 100 of each. So there are 100^6 combinations. Each combination has to be calculated, checked for relevance and then either be discarded or saved.
The relevant bit of the code looks like this:
def modconffactory():
for transmitter in totaltransmitterdict.values():
for reciever in totalrecieverdict.values():
for processor in totalprocessordict.values():
for holoarray in totalholoarraydict.values():
for databus in totaldatabusdict.values():
for multiplexer in totalmultiplexerdict.values():
newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
data_I_need = dosomethingwith(newconfiguration)
saveforlateruse_if_useful(data_I_need)
Now this takes a long time and that is fine, but now I realize this process (making the configurations and then calculations for later use) is only using 1 of my 8 processor cores at a time.
I've been reading up about multithreading and multiprocessing, but I only see examples of different processes, not how to multithread one process. In my code I call two functions: 'dosomethingwith()' and 'saveforlateruse_if_useful()'. I could make those into separate processes and have those run concurrently to the for-loops, right?
But what about the for-loops themselves? Can I speed up that one process? Because that is where the time consumption is. (<-- This is my main question)
Is there a cheat? for instance compiling to C and then the os multithreads automatically?

I only see examples of different processes, not how to multithread one process
There is multithreading in Python, but it is very ineffective because of GIL (Global Interpreter Lock). So if you want to use all of your processor cores, if you want concurrency, you have no other choice than use multiple processes, which can be done with multiprocessing module (well, you also could use another language without such problems)
Approximate example of multiprocessing usage for your case:
import multiprocessing
WORKERS_NUMBER = 8
def modconffactoryProcess(generator, step, offset, conn):
"""
Function to be invoked by every worker process.
generator: iterable object, the very top one of all you are iterating over,
in your case, totalrecieverdict.values()
We are passing a whole iterable object to every worker, they all will iterate
over it. To ensure they will not waste time by doing the same things
concurrently, we will assume this: each worker will process only each stepTH
item, starting with offsetTH one. step must be equal to the WORKERS_NUMBER,
and offset must be a unique number for each worker, varying from 0 to
WORKERS_NUMBER - 1
conn: a multiprocessing.Connection object, allowing the worker to communicate
with the main process
"""
for i, transmitter in enumerate(generator):
if i % step == offset:
for reciever in totalrecieverdict.values():
for processor in totalprocessordict.values():
for holoarray in totalholoarraydict.values():
for databus in totaldatabusdict.values():
for multiplexer in totalmultiplexerdict.values():
newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
data_I_need = dosomethingwith(newconfiguration)
saveforlateruse_if_useful(data_I_need)
conn.send('done')
def modconffactory():
"""
Function to launch all the worker processes and wait until they all complete
their tasks
"""
processes = []
generator = totaltransmitterdict.values()
for i in range(WORKERS_NUMBER):
conn, childConn = multiprocessing.Pipe()
process = multiprocessing.Process(target=modconffactoryProcess, args=(generator, WORKERS_NUMBER, i, childConn))
process.start()
processes.append((process, conn))
# Here we have created, started and saved to a list all the worker processes
working = True
finishedProcessesNumber = 0
try:
while working:
for process, conn in processes:
if conn.poll(): # Check if any messages have arrived from a worker
message = conn.recv()
if message == 'done':
finishedProcessesNumber += 1
if finishedProcessesNumber == WORKERS_NUMBER:
working = False
except KeyboardInterrupt:
print('Aborted')
You can adjust WORKERS_NUMBER to your needs.
Same with multiprocessing.Pool:
import multiprocessing
WORKERS_NUMBER = 8
def modconffactoryProcess(transmitter):
for reciever in totalrecieverdict.values():
for processor in totalprocessordict.values():
for holoarray in totalholoarraydict.values():
for databus in totaldatabusdict.values():
for multiplexer in totalmultiplexerdict.values():
newconfiguration = [transmitter, reciever, processor, holoarray, databus, multiplexer]
data_I_need = dosomethingwith(newconfiguration)
saveforlateruse_if_useful(data_I_need)
def modconffactory():
pool = multiprocessing.Pool(WORKERS_NUMBER)
pool.map(modconffactoryProcess, totaltransmitterdict.values())
You probably would like to use .map_async instead of .map
Both snippets do the same, but I would say in the first one you have more control over the program.
I suppose the second one is the easiest, though :)
But the first one should give you the idea of what is happening in the second one
multiprocessing docs: https://docs.python.org/3/library/multiprocessing.html

you can run your function in this way:
from multiprocessing import Pool
def f(x):
return x*x
if __name__ == '__main__':
p = Pool(5)
print(p.map(f, [1, 2, 3]))
https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Multiprocessing on a set number of cores - python

Use a process pool to allocate set number of processes to do your task: https://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers

Related

Run N processes but never reuse the same process

Timing a multiprocessing script

Pythons parallel processing

ThreadPool and Pool for parallel processing

What is the easiest way to make maximum cpu usage for nested for-loops?

Categories

Resources