I am trying to understand how to use the multiprocessing module in Python. The code below spawns four processes and outputs the results as they become available. It seems to me that there must be a better way to obtain the results from the Queue; some method that does not rely on counting how many items the Queue contains, but that just returns items as they become available and then gracefully exits once the queue is empty. The docs say that the Queue.empty() method is not reliable. Is there a better alternative for how to consume the results from the queue?
import multiprocessing as mp
import time

def multby4_wq(x, queue):
    print "Starting!"
    time.sleep(5.0/x)
    a = x*4
    queue.put(a)

if __name__ == '__main__':
    queue1 = mp.Queue()
    for i in range(1, 5):
        p = mp.Process(target=multby4_wq, args=(i, queue1))
        p.start()
    for i in range(1, 5): # This is what I am referring to as counting again
        print queue1.get()
Instead of using a queue, how about using a Pool?
For example,
import multiprocessing as mp
import time

def multby4_wq(x):
    print "Starting!"
    time.sleep(5.0/x)
    a = x*4
    return a

if __name__ == '__main__':
    pool = mp.Pool(4)
    for result in pool.map(multby4_wq, range(1, 5)):
        print result
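If the reason for reaching for a Queue was simply to consume results as they finish, note that Pool can do that too: imap_unordered yields each result as soon as it is ready, and the loop ends by itself once everything has been processed. A small sketch (Python 3 syntax, not part of the original answer):

import multiprocessing as mp
import time

def multby4_wq(x):
    time.sleep(5.0 / x)
    return x * 4

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        # Results arrive in completion order, and the loop simply
        # finishes when every task has been consumed.
        for result in pool.imap_unordered(multby4_wq, range(1, 5)):
            print(result)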
Pass multiple arguments
Assume you have a function that accepts multiple parameters (add in this example). Make a wrapper function that passes the arguments on to add (add_wrapper).
import multiprocessing as mp
import time

def add(x, y):
    time.sleep(1)
    return x + y

def add_wrapper(args):
    return add(*args)

if __name__ == '__main__':
    pool = mp.Pool(4)
    for result in pool.map(add_wrapper, [(1,2), (3,4), (5,6), (7,8)]):
        print result
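On Python 3.3+ the wrapper is not needed at all: Pool.starmap unpacks each argument tuple for you. A minimal sketch of the same example (my addition, not part of the original answer):

import multiprocessing as mp
import time

def add(x, y):
    time.sleep(1)
    return x + y

if __name__ == '__main__':
    with mp.Pool(4) as pool:
        # starmap unpacks each tuple into add(x, y), so no wrapper is required.
        for result in pool.starmap(add, [(1, 2), (3, 4), (5, 6), (7, 8)]):
            print(result)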
Related
My current project requires using multiple processes. I need to share an array between those processes. The array needs to be able to be written to at any time. And the array has to have multiple dimensions. (example: [["test",2],[87209873,"howdy"]]) I've been looking for an answer to this for a few hours now, but I can't find anything. Please help. Thanks in advance!
Try this:
from multiprocessing import Pool, Manager

def worker(v, array):
    array.append(["test", v])

def main():
    foo = [["test", 2], [87209873, "howdy"]]
    array = Manager().list(foo)
    with Pool(processes=4) as pool:
        pool.starmap(worker, [(i, array) for i in range(4)])
    print(array)

if __name__ == "__main__":
    main()
[EDITED]
If you want the main program to keep running while the calculation proceeds, wrap the pooling in a separate thread:
from multiprocessing import Pool, Manager
from threading import Thread

def _worker(v, array):
    for i in range(10000):
        array.append(["test", v])

def processor(array):
    with Pool(processes=4) as pool:
        pool.starmap(_worker, [(i, array) for i in range(4)])

def main():
    foo = [["test", 2], [87209873, "howdy"]]
    array = Manager().list(foo)
    t = Thread(target=processor, args=(array,))
    t.start()
    print("Good day!")
    # Wait until the thread ends. Without this, you would print the array
    # without knowing whether the thread has finished.
    t.join()
    print(array)

if __name__ == "__main__":
    main()
First of all, a list is not an array. If you want to share a list between different processes, you can use a Manager from the multiprocessing module, for example:
import multiprocessing as mp

def remove_last_element(mp_list: list):
    mp_list.pop()

def append_list(mp_list: list):
    mp_list.append([12, 'New Hello'])

if __name__ == "__main__":
    mp_list = mp.Manager().list()
    mp_list.append(['Hello'])
    print("before multiprocessing:", mp_list)

    worker1 = mp.Process(target=remove_last_element, args=(mp_list,))
    worker2 = mp.Process(target=append_list, args=(mp_list,))

    worker1.start()
    worker2.start()

    worker1.join()
    worker2.join()

    print("after multiprocessing:", mp_list)
>>> before multiprocessing: [['Hello']]
>>> after multiprocessing: [[12, 'New Hello']]
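One caveat worth adding: changes made in place to a mutable item stored inside a Manager list (for example, appending to one of the nested lists) are not propagated back through the proxy; you have to reassign the item, as the multiprocessing documentation notes for list and dict proxies. A small sketch illustrating the difference (my example, not from the original answer):

import multiprocessing as mp

def broken_update(mp_list):
    # This mutates a local copy of the inner list; the manager never sees it.
    mp_list[0].append('ignored')

def working_update(mp_list):
    inner = mp_list[0]      # fetch a copy of the inner list
    inner.append('visible')
    mp_list[0] = inner      # reassign so the proxy records the change

if __name__ == '__main__':
    mp_list = mp.Manager().list([['Hello']])

    p1 = mp.Process(target=broken_update, args=(mp_list,))
    p1.start()
    p1.join()
    print(list(mp_list))    # [['Hello']] -- the in-place append was lost

    p2 = mp.Process(target=working_update, args=(mp_list,))
    p2.start()
    p2.join()
    print(list(mp_list))    # [['Hello', 'visible']]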
I want to use multiprocessing to do the following:
from multiprocessing import Pool, cpu_count

class myClass:
    def proc(self):
        # processing random numbers
        return a

    def gen_data(self):
        with Pool(cpu_count()) as q:
            data = q.map(self.proc, [_ for i in range(cpu_count())])  # What is the correct approach?
        return data
Try this:
def proc(self, i):
    # processing random numbers
    return a

def gen_data(self):
    with Pool(cpu_count()) as q:
        data = q.map(self.proc, [i for i in range(cpu_count())])
    return data
Since you don't have to pass an argument to the processes, there's no reason to use map; just call apply_async() as many times as needed.
Here's what I'm saying:
from multiprocessing import cpu_count
from multiprocessing.pool import Pool
from random import randint

class MyClass:
    def proc(self):
        # processing random numbers
        return randint(1, 10)

    def gen_data(self, num_procs):
        with Pool() as pool:  # The default pool size will be the number of CPUs.
            results = [pool.apply_async(self.proc) for _ in range(num_procs)]
            pool.close()
            pool.join()  # Wait until all worker processes exit.
        return [result.get() for result in results]  # Gather results.

if __name__ == '__main__':
    obj = MyClass()
    print(obj.gen_data(8))
To make my code more "pythonic" and faster, I use multiprocessing and a map function to send it a) the function and b) the range of iterations.
The straightforward solution (i.e., calling tqdm directly on the range, tqdm.tqdm(range(0, 30))) does not work with multiprocessing (as formulated in the code below).
The progress bar is displayed from 0 to 100% (when Python reads the code?) but it does not indicate the actual progress of the map function.
How can one display a progress bar that indicates at which step the 'map' function is?
from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    p = Pool(2)
    r = p.map(_foo, tqdm.tqdm(range(0, 30)))
    p.close()
    p.join()
Any help or suggestions are welcome...
Use imap instead of map, which returns an iterator of the processed values.
from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    with Pool(2) as p:
        r = list(tqdm.tqdm(p.imap(_foo, range(30)), total=30))
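If the individual work items are very cheap, the per-item IPC overhead can dominate; passing imap's documented chunksize parameter batches the inputs while the bar still advances one item at a time. A small variation (my addition):

from multiprocessing import Pool
import tqdm
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    with Pool(2) as p:
        # chunksize=4 ships the inputs to the workers in batches of four,
        # which cuts IPC overhead; results still come back one at a time.
        r = list(tqdm.tqdm(p.imap(_foo, range(30), chunksize=4), total=30))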
Sorry for being late but if all you need is a concurrent map, I added this functionality in tqdm>=4.42.0:
from tqdm.contrib.concurrent import process_map  # or thread_map
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    r = process_map(_foo, range(0, 30), max_workers=2)
References: https://tqdm.github.io/docs/contrib.concurrent/ and https://github.com/tqdm/tqdm/blob/master/examples/parallel_bars.py
It supports max_workers and chunksize and you can also easily switch from process_map to thread_map.
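For instance, if the work is I/O-bound rather than CPU-bound, the thread-based variant is just a different import; a quick sketch:

from tqdm.contrib.concurrent import thread_map
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    # Same call shape as process_map, but runs in a thread pool instead.
    r = thread_map(_foo, range(0, 30), max_workers=2)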
Solution found. Be careful! Due to multiprocessing, the time estimates (iterations per second, total time, etc.) can be unstable, but the progress bar works perfectly.
Note: Context manager for Pool is only available in Python 3.3+.
from multiprocessing import Pool
import time
from tqdm import tqdm

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    with Pool(processes=2) as p:
        max_ = 30
        with tqdm(total=max_) as pbar:
            for _ in p.imap_unordered(_foo, range(0, max_)):
                pbar.update()
You can use p_tqdm instead.
https://github.com/swansonk14/p_tqdm
from p_tqdm import p_map
import time

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    r = p_map(_foo, list(range(0, 30)))
Based on the answer by Xavi Martínez, I wrote the function imap_unordered_bar. It can be used in the same way as imap_unordered, with the only difference that a progress bar is shown.
from multiprocessing import Pool
import time
from tqdm import tqdm

def imap_unordered_bar(func, args, n_processes=2):
    p = Pool(n_processes)
    res_list = []
    with tqdm(total=len(args)) as pbar:
        for i, res in enumerate(p.imap_unordered(func, args)):
            pbar.update()
            res_list.append(res)
    p.close()
    p.join()
    return res_list

def _foo(my_number):
    square = my_number * my_number
    time.sleep(1)
    return square

if __name__ == '__main__':
    result = imap_unordered_bar(_foo, range(5))
import multiprocessing as mp
import tqdm

iterable = ...
num_cpu = mp.cpu_count() - 2  # don't use all CPUs

def func(item):
    # your logic
    ...

if __name__ == '__main__':
    with mp.Pool(num_cpu) as p:
        list(tqdm.tqdm(p.imap(func, iterable), total=len(iterable)))
For a progress bar with apply_async, we can use the following code, as suggested in:
https://github.com/tqdm/tqdm/issues/484
import time
import random
from multiprocessing import Pool
from tqdm import tqdm

def myfunc(a):
    time.sleep(random.random())
    return a ** 2

pool = Pool(2)
pbar = tqdm(total=100)

def update(*a):
    pbar.update()

for i in range(pbar.total):
    pool.apply_async(myfunc, args=(i,), callback=update)

pool.close()
pool.join()
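One hedge on this pattern: if myfunc raises an exception, the success callback never fires and the bar stalls short of 100%. Pool.apply_async also accepts an error_callback, so a variation like the sketch below (my restructuring, wrapped in a main guard) keeps the count honest even when tasks fail:

import time
import random
from multiprocessing import Pool
from tqdm import tqdm

def myfunc(a):
    time.sleep(random.random())
    return a ** 2

if __name__ == '__main__':
    with Pool(2) as pool:
        with tqdm(total=100) as pbar:
            def update(*args):
                pbar.update()

            for i in range(pbar.total):
                # error_callback receives the exception; here we simply
                # advance the bar so failed tasks are still counted.
                pool.apply_async(myfunc, args=(i,), callback=update,
                                 error_callback=lambda e: pbar.update())
            pool.close()
            pool.join()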
Here is my take for when you need to get results back from your parallel functions. This function does a few things (there is another post of mine that explains it further), but the key point is that there is a queue of pending tasks and a queue of completed tasks. As workers finish each task from the pending queue, they add the result to the completed queue, and you can wrap the check on the completed queue with the tqdm progress bar. I am not including the implementation of the do_work() function here; it is not relevant, since the message is to monitor the completed queue and update the progress bar every time a result comes in.
import pickle

import multiprocessing as mp
import psutil
from tqdm import tqdm

# do_work() (not shown in this post) consumes tasks_pending and fills tasks_completed.
SENTINEL = 'STOP'  # marker that tells do_work() there is no more work

def par_proc(job_list, num_cpus=None, verbose=False):
    # Get the number of cores
    if not num_cpus:
        num_cpus = psutil.cpu_count(logical=False)

    print('* Parallel processing')
    print('* Running on {} cores'.format(num_cpus))

    # Set up the queues for sending and receiving data to/from the workers
    tasks_pending = mp.Queue()
    tasks_completed = mp.Queue()

    # Gather processes and results here
    processes = []
    results = []

    # Count tasks
    num_tasks = 0

    # Add the tasks to the queue
    for job in job_list:
        for task in job['tasks']:
            expanded_job = {}
            num_tasks = num_tasks + 1
            expanded_job.update({'func': pickle.dumps(job['func'])})
            expanded_job.update({'task': task})
            tasks_pending.put(expanded_job)

    # Set the number of workers here
    num_workers = min(num_cpus, num_tasks)

    # We need as many sentinels as there are worker processes so that ALL
    # processes exit when there is no more work left to be done.
    for c in range(num_workers):
        tasks_pending.put(SENTINEL)

    print('* Number of tasks: {}'.format(num_tasks))

    # Set up and start the workers
    for c in range(num_workers):
        p = mp.Process(target=do_work, args=(tasks_pending, tasks_completed, verbose))
        p.name = 'worker' + str(c)
        processes.append(p)
        p.start()

    # Gather the results
    completed_tasks_counter = 0
    with tqdm(total=num_tasks) as bar:
        while completed_tasks_counter < num_tasks:
            results.append(tasks_completed.get())
            completed_tasks_counter = completed_tasks_counter + 1
            bar.update()  # advance the bar by one for each completed task

    for p in processes:
        p.join()

    return results
Based on "user17242583" answer, I created the following function. It should be as fast as Pool.map and the results are always ordered. Plus, you can pass as many parameters to your function as you want and not just a single iterable.
from multiprocessing import Pool
from functools import partial
from tqdm import tqdm

def imap_tqdm(function, iterable, processes, chunksize=1, desc=None, disable=False, **kwargs):
    """
    Run a function in parallel with a tqdm progress bar and an arbitrary number of arguments.
    Results are always ordered and the performance should be the same as of Pool.map.
    :param function: The function that should be parallelized.
    :param iterable: The iterable passed to the function.
    :param processes: The number of processes used for the parallelization.
    :param chunksize: The iterable is chopped into chunks of this size, each of which is submitted to the process pool as a separate task.
    :param desc: The description displayed by tqdm in the progress bar.
    :param disable: Disables the tqdm progress bar.
    :param kwargs: Any additional arguments that should be passed to the function.
    """
    if kwargs:
        function_wrapper = partial(_wrapper, function=function, **kwargs)
    else:
        function_wrapper = partial(_wrapper, function=function)

    results = [None] * len(iterable)
    with Pool(processes=processes) as p:
        with tqdm(desc=desc, total=len(iterable), disable=disable) as pbar:
            for i, result in p.imap_unordered(function_wrapper, enumerate(iterable), chunksize=chunksize):
                results[i] = result
                pbar.update()
    return results

def _wrapper(enum_iterable, function, **kwargs):
    i = enum_iterable[0]
    result = function(enum_iterable[1], **kwargs)
    return i, result
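A usage sketch for the helper above (slow_add is a hypothetical function of mine; this assumes imap_tqdm and _wrapper are defined in the same module, since the pool has to be able to pickle the wrapped function):

import time

def slow_add(x, offset=0):
    time.sleep(0.1)
    return x + offset

if __name__ == '__main__':
    # Four worker processes; the extra keyword argument is forwarded to slow_add.
    results = imap_tqdm(slow_add, range(20), processes=4, desc='adding', offset=100)
    print(results)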
This approach is simple and it works.
from multiprocessing.pool import ThreadPool
import time
from tqdm import tqdm

def job():
    time.sleep(1)
    pbar.update()

pool = ThreadPool(5)
with tqdm(total=100) as pbar:
    for i in range(100):
        pool.apply_async(job)
    pool.close()
    pool.join()
import multiprocessing
from multiprocessing import Pool
from source.RUN import *

def func(r, grid, pos, h):
    return r, grid, pos, h

p = multiprocessing.Pool()  # Creates a pool with as many workers as you have CPU cores
results = []

if __name__ == '__main__':
    for i in pos[-1] < 2:
        results.append(Pool.apply_async(LISTE, (r, grid, pos[i,:], h)))
    p.close()
    p.join()

for result in results:
    print('liste', result.get())
I want to create a Pool for the LISTE(r, grid, pos[i, :], h) calls, where i ranges over pos, an ndarray that lives in a different file, and I have to call this whole function from another file inside a while loop. But this code gives an error, and if I use
if __name__ == '__main__':
the code below the if __name__ == '__main__': guard is never reached.
Please give me an idea of how I can make this work.
I'm still having a somewhat difficult time understanding your question, but I think this is what you're looking for:
You want to be able to call a function that, given r, grid, pos and h, creates a pool, iterates over pos, feeds the work to the Pool, and then returns the results. You also want to be able to access that function from different modules. If that's what you're asking, you can do it like this:
async_module.py:
from multiprocessing import Pool

# Not sure where the LISTE function gets defined, but it needs to be in here.
def do_LISTE(args):
    # args is a tuple containing (r, grid, pos[i, :], h);
    # we use tuple expansion (*args) to send each parameter to LISTE
    return LISTE(*args)

def async_process(r, grid, pos, h):
    p = Pool()  # Creates a pool with as many workers as you have CPU cores
    results = p.map(do_LISTE, [(r, grid, pos[i,:], h) for i in pos[-1] < 2])
    p.close()
    p.join()
    return results
Then in some other module:
from async_module import async_process

def do_async_processing():
    r = "something"
    grid = get_grid()
    pos = get_pos()
    h = 345
    results = async_process(r, grid, pos, h)

if __name__ == "__main__":
    do_async_processing()  # Make sure the entry point is protected by `if __name__ == "__main__":`.
I want a long-running process to return its progress over a Queue (or something similar) which I will feed to a progress bar dialog. I also need the result when the process is completed. A test example here fails with a RuntimeError: Queue objects should only be shared between processes through inheritance.
import multiprocessing, time

def task(args):
    count = args[0]
    queue = args[1]
    for i in xrange(count):
        queue.put("%d mississippi" % i)
    return "Done"

def main():
    q = multiprocessing.Queue()
    pool = multiprocessing.Pool()
    result = pool.map_async(task, [(x, q) for x in range(10)])
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
I've been able to get this to work using individual Process objects (where I am allowed to pass a Queue reference), but then I don't have a pool to manage the many processes I want to launch. Any advice on a better pattern for this?
The following code seems to work:
import multiprocessing, time

def task(args):
    count = args[0]
    queue = args[1]
    for i in xrange(count):
        queue.put("%d mississippi" % i)
    return "Done"

def main():
    manager = multiprocessing.Manager()
    q = manager.Queue()
    pool = multiprocessing.Pool()
    result = pool.map_async(task, [(x, q) for x in range(10)])
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
Note that the Queue is obtained from manager.Queue() rather than multiprocessing.Queue(). Thanks Alex for pointing me in this direction.
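An alternative, if you would rather keep a plain multiprocessing.Queue, is to hand it to the pool workers when they are created, via the initializer/initargs parameters of Pool; the queue then lives in a module-level global inside each worker instead of travelling with every task. A sketch of that approach (Python 3 syntax, my addition):

import multiprocessing
import time

def init_worker(queue):
    # Runs once in each worker process; stash the queue in a global.
    global q
    q = queue

def task(count):
    for i in range(count):
        q.put("%d mississippi" % i)
    return "Done"

def main():
    queue = multiprocessing.Queue()
    pool = multiprocessing.Pool(initializer=init_worker, initargs=(queue,))
    result = pool.map_async(task, range(10))
    time.sleep(1)
    while not queue.empty():
        print(queue.get())
    print(result.get())

if __name__ == "__main__":
    main()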
Making q global works...:
import multiprocessing, time

q = multiprocessing.Queue()

def task(count):
    for i in xrange(count):
        q.put("%d mississippi" % i)
    return "Done"

def main():
    pool = multiprocessing.Pool()
    result = pool.map_async(task, range(10))
    time.sleep(1)
    while not q.empty():
        print q.get()
    print result.get()

if __name__ == "__main__":
    main()
If you need multiple queues, e.g. to avoid mixing up the progress of the various pool processes, a global list of queues should work (of course, each process will then need to know what index in the list to use, but that's OK to pass as an argument;-).
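A minimal sketch of that idea (my addition; like the global-queue answer above, it assumes the fork start method so the children inherit the module-level queues):

import multiprocessing
import time

NUM_TASKS = 10
# One queue per task; forked children inherit this list.
queues = [multiprocessing.Queue() for _ in range(NUM_TASKS)]

def task(args):
    count, idx = args           # idx tells the task which queue to report to
    for i in range(count):
        queues[idx].put("%d mississippi" % i)
    return "Done"

def main():
    pool = multiprocessing.Pool()
    result = pool.map_async(task, [(x, x) for x in range(NUM_TASKS)])
    time.sleep(1)
    for idx, q in enumerate(queues):
        while not q.empty():
            print("queue %d: %s" % (idx, q.get()))
    print(result.get())

if __name__ == "__main__":
    main()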