If I call apply_async 10,000 times, assuming the OOM-killer doesn't interfere, will multiprocessing start them all simultaneously, or will it start them in batches? For example, after every 100 starts, would it wait for 90 to finish starting before starting any more?
Dustin
apply_async() is a method of multiprocessing.Pool objects, and delivers all work to the number of processes you specified when you created the Pool. Only that many tasks can run simultaneously. The rest are saved in queues (or pipes) by the multiprocessing machinery, and automatically doled out to processes as they complete tasks already assigned. Much the same is true of all the Pool methods to which you feed multiple work items.
A little more clarification: apply_async doesn't create, or start, any processes. The processes were created when you called Pool(). The processes just sit there and wait until you invoke Pool methods (like apply_async()) that ask for some real work to be done.
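To see the fixed worker count for yourself, here is a minimal Python 3 sketch. It assumes multiprocessing.pool.ThreadPool as a stand-in for Pool (it offers the same apply_async interface but uses threads, so nothing has to be pickled). The names worker_id and distinct_workers are made up for this illustration.

```python
import threading
import time
from multiprocessing.pool import ThreadPool  # assumption: thread-backed stand-in for Pool


def worker_id(_):
    """Pretend to work, then report which worker ran the task."""
    time.sleep(0.01)
    return threading.get_ident()


def distinct_workers(n_workers=4, n_tasks=40):
    """Queue n_tasks with apply_async and return the set of workers that ran them."""
    with ThreadPool(n_workers) as pool:
        pending = [pool.apply_async(worker_id, (i,)) for i in range(n_tasks)]
        return {result.get() for result in pending}


if __name__ == "__main__":
    # However many tasks are queued, no more than n_workers workers ever run them.
    print(len(distinct_workers(4, 40)), "workers handled 40 tasks")
```

Swapping ThreadPool for multiprocessing.Pool and threading.get_ident() for os.getpid() should show the same bound for processes.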
Example
Play with this:
MAX = 100000

from time import sleep

def f(i):
    sleep(0.01)
    return i

def summer(summand):
    global SUM, FINISHED
    SUM += summand
    FINISHED += 1

if __name__ == "__main__":
    import multiprocessing as mp
    SUM = 0
    FINISHED = 0
    pool = mp.Pool(4)
    print "queuing", MAX, "work descriptions"
    for i in xrange(MAX):
        pool.apply_async(f, args=(i,), callback=summer)
        if i % 1000 == 0:
            print "{}/{}".format(FINISHED, i),
    print
    print "closing pool"
    pool.close()
    print "waiting for processes to end"
    pool.join()
    print "verifying result"
    print "got", SUM, "expected", sum(xrange(MAX))
Output is like:
queuing 100000 work descriptions
0/0 12/1000 21/2000 33/3000 42/4000
... stuff chopped for brevity ...
1433/95000 1445/96000 1456/97000 1466/98000 1478/99000
closing pool
waiting for processes to end
... and it waits here "for a long time" ...
verifying result
got 4999950000 expected 4999950000
You can answer most of your questions just by observing its behavior. The work items are queued up quickly. By the time we see "closing pool", all the work items have been queued, but 1478 have already completed, and about 98000 are still waiting for some process to work on them.
If you take the sleep(0.01) out of f(), it's much less revealing, because results come back almost as fast as work items are queued.
Memory use remains trivial no matter how you run it, though. The work items here (the name of the function ("f") and its pickled integer argument) are tiny.
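The experiment above is written for Python 2. A scaled-down Python 3 sketch of the same idea follows. It is an adaptation, not the original code: it assumes ThreadPool instead of a process pool so it runs without pickling concerns, uses a much smaller item count, and wraps everything in a hypothetical helper named queue_then_wait.

```python
import time
from multiprocessing.pool import ThreadPool  # assumption: stand-in for mp.Pool


def f(i):
    time.sleep(0.001)
    return i


def queue_then_wait(n_items=2000, n_workers=4):
    """Queue all items first, then wait; report how many had finished at queue time."""
    total = 0
    finished = 0

    def summer(summand):
        # Callbacks run one at a time in the pool's result-handling thread.
        nonlocal total, finished
        total += summand
        finished += 1

    pool = ThreadPool(n_workers)
    for i in range(n_items):
        pool.apply_async(f, args=(i,), callback=summer)
    finished_when_queued = finished  # usually far fewer than n_items
    pool.close()
    pool.join()  # returns only after every queued item has been processed
    return finished_when_queued, finished, total


if __name__ == "__main__":
    early, done, total = queue_then_wait()
    print("finished while queuing:", early, "finished in the end:", done)
    print("got", total, "expected", sum(range(2000)))
```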
Related
I want to do multiple transformations on some data. I figured I can use multiple Pool.imap's because each of the transformations is just a simple map. And Pool.imap is lazy, so it only does computation when needed.
But strangely, it looks like multiple consecutive Pool.imap's are blocking. And not lazy. Look at the following code as an example.
import time
from multiprocessing import Pool

def slow(n):
    time.sleep(0.01)
    return n*n

for i in [10, 100, 1000]:
    with Pool() as p:
        numbers = range(i)
        iter1 = p.imap(slow, numbers)
        iter2 = p.imap(slow, iter1)
        start = time.perf_counter()
        next(iter2)
        print(i, time.perf_counter() - start)
# Prints
# 10 0.0327413540071575
# 100 0.27094774100987706
# 1000 2.6275791430089157
As you can see the time to the first element is increasing. I have 4 cores on my machine, so it roughly takes 2.5 seconds to process 1000 items with a 0.01 second delay. Hence, I think two consecutive Pool.imap's are blocking. And that the first Pool.imap finishes the entire workload before the second one starts. That is not lazy.
I did some additional research. It does not matter whether I use a process pool or a thread pool. It happens with both Pool.imap and Pool.imap_unordered. The blocking takes longer when I add a third Pool.imap. A single Pool.imap is not blocking. This bug report seems related but different.
TL;DR imap is not a real generator, meaning it does not generate items on-demand (lazy computation aka similar to coroutine), and pools initiate "jobs" in serial.
Longer answer: Every type of submission to a Pool, be it imap, apply, apply_async, etc., gets written to a queue of "jobs". This queue is read by a thread in the main process (pool._handle_tasks) so that jobs can continue to be initiated while the main process goes off and does other things. This thread contains a very simple double for loop (with a lot of error handling) that iterates over each job, then over each task within each job. The inner loop blocks until a worker is available to take each task, meaning tasks (and jobs) are always started serially, in the exact order they were submitted. This does not mean they will finish in that order, which is why map and imap collect results and re-order them back to their original order (handled by the pool._handle_results thread) before passing them back to the main thread.
Rough pseudocode of what's going on:
# task_queue buffers task inputs first in - first out
pool.imap(foo, ("bar", "baz", "bat"), chunksize=1)
# put an iterator on the task queue which will yield "chunks" (a chunk is given to a single worker process to compute)
pool.imap(fun, ("one", "two", "three"), chunksize=1)
# put a second iterator on the task queue

# inside the pool._task_handler thread within the main process
for task in task_queue:  # [imap_1, imap_2]
    # this is actually a while loop in reality that tries to get new tasks until the pool is close()'d
    for chunk in task:
        _worker_input_queue.put(chunk)  # give the chunk to the next available worker
        # This blocks until a worker actually takes the chunk, meaning the loop won't
        # continue until all chunks are taken by workers.

def worker_function(_worker_input_queue, _worker_output_queue):
    while True:
        task = _worker_input_queue.get()  # get the next chunk of tasks
        # if task == StopSignal: break
        result = task.func(task.args)
        _worker_output_queue.put(result)  # results are collected, and re-ordered
                                          # by another thread in the main process
                                          # as they are completed.
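One possible workaround, which is not part of the explanation above, is to avoid chaining imap calls and instead compose the per-item transformations into one function, so that a single imap does all the work for each item. The sketch below assumes the transformations are independent per item. It uses ThreadPool so it runs without a __main__ guard, and slow_twice and first_result_latency are hypothetical names.

```python
import time
from multiprocessing.pool import ThreadPool  # assumption: same imap interface as Pool


def slow(n):
    time.sleep(0.01)
    return n * n


def slow_twice(n):
    """Both transformations applied to one item inside a single task."""
    return slow(slow(n))


def first_result_latency(count, n_workers=4):
    """Return the first result and how long it took to arrive."""
    with ThreadPool(n_workers) as pool:
        start = time.perf_counter()
        first = next(pool.imap(slow_twice, range(count)))
        return first, time.perf_counter() - start


if __name__ == "__main__":
    for count in (10, 100, 1000):
        first, elapsed = first_result_latency(count)
        print(count, first, elapsed)
```

Because only one job is ever on the task queue, the time to the first element should no longer grow with the input size.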
I have a script that executes a certain function using multi-threading. I would like to have only as many threads running in parallel as there are CPU cores.
The current code (1:), which uses threading.Thread, creates 1000 threads and runs them all simultaneously.
I want to turn this into something that runs only a fixed number of threads at the same time (e.g., 8) and puts the rest into a queue until an executing thread/CPU core is free.
1:
import threading

nSim = 1000

def simulation(i):
    print(str(threading.current_thread().getName()) + ': '+ str(i))

if __name__ == '__main__':
    threads = [threading.Thread(target=simulation,args=(i,)) for i in range(nSim)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
Q1: Is code 2: doing what I described (multithreading with a maximum number of threads running simultaneously)? Is it correct? (I think so, but I'm not 100% sure.)
Q2: Now the code initiates 1000 threads at the same time and executes them on 8 threads. Is there a way to initiate a new thread only when an executing thread/CPU core is free, so that I don't have 990 thread calls waiting from the beginning to be executed when possible?
Q3: Is there a way to track which CPU core executed which thread? Just to prove that the code is doing what it should do.
2:
import threading
import multiprocessing
print(multiprocessing.cpu_count())
from concurrent.futures import ThreadPoolExecutor

nSim = 1000

def simulation(i):
    print(str(threading.current_thread().getName()) + ': '+ str(i))

if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=8) as executor:
        for i in range (nSim):
            res = executor.submit(simulation, i)
            print(res.result())
A1: To limit the number of threads that can access a resource simultaneously, you can use threading.Semaphore. Note that 1000 threads will not give you a tremendous speed boost; some articles recommend mp.cpu_count()*1 or mp.cpu_count()*2 threads per process. Also note that threads in Python are good for IO operations, but not for computation, because of the GIL.
A2. Why do you need so many threads if only 8 of them should run simultaneously? Create just 8 threads and supply them with tasks as the tasks become ready; for that you need queue.Queue(), which is thread safe. In your concrete example you could also simply run your test 250 times per thread with a while loop inside the simulation function. By the way, you do not need a Semaphore in that case.
A3. When we are talking about multithreading, you have one process with multiple threads.
import threading
import time
import multiprocessing as mp

def simulation(i, _s):
    # _s is a threading.Semaphore()
    with _s:
        print(str(threading.current_thread().getName()) + ': ' + str(i))
        time.sleep(3)

if __name__ == '__main__':
    print("Cores number: {}".format(mp.cpu_count()))
    # recommended number of threads is mp.cpu_count()*1 or mp.cpu_count()*2 in some articles
    nSim = 25
    s = threading.Semaphore(4)  # max number of threads which can work simultaneously with resource is 4
    threads = [threading.Thread(target=simulation, args=(i, s, )) for i in range(nSim)]
    for t in threads:
        t.start()
    # just to prove that all threads are active in the start and then their number decreases when the work is done
    for i in range(6):
        print("Active threads number {}".format(threading.active_count()))
        time.sleep(3)
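The queue-based approach described in A2 (a fixed set of threads pulling tasks from a queue.Queue) is not shown in the code above, so here is a minimal sketch of it. It assumes all tasks are known up front, which lets a worker stop as soon as the queue is empty. The name run_with_workers and the worker thread names are made up for the example.

```python
import queue
import threading


def run_with_workers(n_tasks, n_workers=8):
    """Run n_tasks on exactly n_workers threads; return (thread name, task) pairs."""
    tasks = queue.Queue()
    for i in range(n_tasks):
        tasks.put(i)

    handled = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                i = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left: every task was queued before the workers started
            with lock:
                handled.append((threading.current_thread().name, i))

    threads = [threading.Thread(target=worker, name="worker-%d" % k)
               for k in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled


if __name__ == "__main__":
    handled = run_with_workers(1000)
    print(len(handled), "tasks on", len({name for name, _ in handled}), "threads")
```

Only n_workers threads are ever created, so no thread sits waiting for a slot the way the semaphore version's threads do.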
A1: No. Your code submits a task, receives a Future in res, and then calls result(), which waits for the result. A new task is handed to a thread only after the previous task is done, so only one of the worker threads is really working at a time.
Take a look at ThreadPool.map (actually Pool.map) instead of submit to distribute tasks among the workers.
A2: Only 8 threads (the number of workers) are used here at most. If using map the input data of the 1000 tasks may be stored (needs memory) but no additional threads are created.
A3: Not that I know of. A thread is not bound to a core, it may switch between them fast.
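As a sketch of the fix for the problem described in A1, the variant below keeps ThreadPoolExecutor but submits every task before waiting on any result, so all 8 workers can be busy at once. The helper name run_all is hypothetical, and simulation returns the thread name instead of printing it so the behavior can be checked.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def simulation(i):
    return threading.current_thread().name, i


def run_all(n_sim=1000, max_workers=8):
    """Submit everything first, then collect results in submission order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(simulation, i) for i in range(n_sim)]
        return [future.result() for future in futures]


if __name__ == "__main__":
    results = run_all()
    print(len(results), "tasks ran on", len({name for name, _ in results}), "threads")
```

executor.map(simulation, range(n_sim)) would be a shorter way to get the same ordered results.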
Please be warned that this demonstration code generates a few GB of data.
I have been using versions of the code below for multiprocessing for some time. It works well when the run time of each process in the pool is similar but if one process takes much longer I end up with many blocked processes waiting on the one, so I'm trying to make it run asynchronously - just for one function at a time.
For example, if I have 70 cores and need to run a function 2000 times I want that to run asynchronously then wait for the last process before calling the next function. Currently it just submits processes in batches of how ever many cores I give it and each batch has to wait for the longest process.
As you can see I've tried using map_async but this is clearly the wrong syntax. Can anyone help me out?
import os
from multiprocessing import Pool

p='PATH/test/'

def f1(tup):
    x,y=tup
    to_write = x*(y**5)
    with open(p+x+str(y)+'.txt','w') as fout:
        fout.write(to_write)

def f2(tup):
    x,y=tup
    print (os.path.exists(p+x+str(y)+'.txt'))

def call_func(f,nos,threads,call):
    print (call)
    for i in range(0, len(nos), threads):
        print (i)
        chunk = nos[i:i + threads]
        tmp = [('args', no) for no in chunk]
        pool.map(f, tmp)
        #pool.map_async(f, tmp)

nos=[i for i in range(55)]
threads=8

if __name__ == '__main__':
    with Pool(processes=threads) as pool:
        call_func(f1,nos,threads,'f1')
        call_func(f2,nos,threads,'f2')
map will only return and map_async will only call the callback after all tasks of the current chunk are done.
So you can either give all tasks to map/map_async at once, or use apply_async (initially called threads times) where the callback calls apply_async for the next task.
If the actual return values of the call don't matter (or at least their order doesn't), imap_unordered may be another efficient solution when giving it all tasks at once (or an iterator/generator producing the tasks on demand)
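A minimal sketch of the imap_unordered option, handing over all tasks at once instead of slicing them into batches. It assumes ThreadPool as a stand-in for Pool (same interface), and a hypothetical work function with uneven run times in place of f1; run_unordered is also a made-up name.

```python
import time
from multiprocessing.pool import ThreadPool  # assumption: stand-in for Pool


def work(tup):
    """Placeholder task whose run time varies with its argument."""
    x, y = tup
    time.sleep(0.001 * (y % 5))
    return x, y


def run_unordered(nos, workers=8):
    """Give the pool every task at once; results come back as they finish."""
    tasks = [('args', no) for no in nos]
    with ThreadPool(workers) as pool:
        return list(pool.imap_unordered(work, tasks))


if __name__ == "__main__":
    done = run_unordered(range(55))
    print(len(done), "tasks finished")
```

Since the pool hands out the next task as soon as any worker is free, one slow task no longer holds back a whole batch. The call still returns only when everything is done, so the next function can be started right after it.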
I'm using multiprocessing Pool to manage tesseract processes (OCRing pages of microfilm). Very often, in a Pool of, say, 20 tesseract processes, a few pages will be more difficult to OCR, and these processes take much longer than the others. In the meantime, the pool is just hanging and most of the CPUs are not being used. I want these stragglers to be left to continue, but I also want to start more processes to fill the many other CPUs that are lying idle while these few sticky pages finish. My question: is there a way to load up new processes to use those idle CPUs? In other words, can the empty spots in the Pool be filled without waiting for the whole pool to complete?
I could use the async version of starmap and then load up a new pool when the current pool has gone down to a certain number of living processes. But this seems inelegant. It would be more elegant to automagically keep slotting in processes as needed.
Here's what my code looks like right now:
import logging
import os
import subprocess
from multiprocessing import Pool
from string import Template

def getMpBatchMap(fileList, commandTemplate, concurrentProcesses):
    mpBatchMap = []
    for i in range(concurrentProcesses):
        fileName = fileList.readline()
        if fileName:
            mpBatchMap.append((fileName, commandTemplate))
    return mpBatchMap

def executeSystemProcesses(objFileName, commandTemplate):
    objFileName = objFileName.strip()
    logging.debug(objFileName)
    objDirName = os.path.dirname(objFileName)
    command = commandTemplate.substitute(objFileName=objFileName, objDirName=objDirName)
    logging.debug(command)
    subprocess.call(command, shell=True)

def process(FILE_LIST_FILENAME, commandTemplateString, concurrentProcesses=3):
    """Go through the list of files and run the provided command against them,
    one at a time. Template string maps the terms $objFileName and $objDirName.
    Example:
    >>> runBatchProcess('convert -scale 256 "$objFileName" "$objDirName/TN.jpg"')
    """
    commandTemplate = Template(commandTemplateString)
    with open(FILE_LIST_FILENAME) as fileList:
        while 1:
            # Get a batch of x files to process
            mpBatchMap = getMpBatchMap(fileList, commandTemplate, concurrentProcesses)
            # Process them
            logging.debug('Starting MP batch of %i' % len(mpBatchMap))
            if mpBatchMap:
                with Pool(concurrentProcesses) as p:
                    poolResult = p.starmap(executeSystemProcesses, mpBatchMap)
                    logging.debug('Pool result: %s' % str(poolResult))
            else:
                break
You're mixing something up here. The pool always keeps a number of specified processes alive. As long as you don't close the pool, either manually or by leaving the with-block of the context-manager, there is no need for you to refill the pool with processes, because they're not going anywhere.
What you probably meant to say is 'tasks', tasks these processes can work on. A task is a per-process-chunk of the iterable you pass to the pool-methods. And yes, there's a way to use idle processes in the pool for new tasks before all previously enqueued tasks have been processed. You already picked the right tool for this, the async-versions of the pool-methods. All you have to do, is to reapply some sort of async pool-method.
from multiprocessing import Pool
import os

def busy_foo(x):
    x = int(x)
    for _ in range(x):
        x - 1
    print(os.getpid(), ' returning: ', x)
    return x

if __name__ == '__main__':
    arguments1 = zip([222e6, 22e6] * 2)
    arguments2 = zip([111e6, 11e6] * 2)
    with Pool(4) as pool:
        results = pool.starmap_async(busy_foo, arguments1)
        results2 = pool.starmap_async(busy_foo, arguments2)
        print(results.get())
        print(results2.get())
Example Output:
3182 returning: 22000000
3185 returning: 22000000
3185 returning: 11000000
3182 returning: 111000000
3182 returning: 11000000
3185 returning: 111000000
3181 returning: 222000000
3184 returning: 222000000
[222000000, 22000000, 222000000, 22000000]
[111000000, 11000000, 111000000, 11000000]
Note above, processes 3182 and 3185 which ended up with the easier task, immediately start with tasks from the second argument-list, without waiting for 3181 and 3184 to complete first.
If you, for some reason, really would like to use fresh processes after some amount of processed tasks per process, there's the maxtasksperchild parameter for Pool. There you can specify after how many tasks the pool should replace the old processes with new ones. The default for this argument is None, so the Pool does not replace processes by default.
How can I script a Python multiprocess that uses two Queues as these ones?:
one as a working queue that starts with some data and that, depending on conditions of the functions to be parallelized, receives further tasks on the fly,
another that gathers results and is used to write down the result after processing finishes.
I basically need to put some more tasks in the working queue depending on what I found in its initial items. The example I post below is silly (I could transform the item as I like and put it directly in the output Queue), but its mechanics are clear and reflect part of the concept I need to develop.
Hereby my attempt:
import multiprocessing as mp

def worker(working_queue, output_queue):
    item = working_queue.get() #I take an item from the working queue
    if item % 2 == 0:
        output_queue.put(item**2) # If I like it, I do something with it and conserve the result.
    else:
        working_queue.put(item+1) # If there is something missing, I do something with it and leave the result in the working queue

if __name__ == '__main__':
    static_input = range(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker,args=(working_q, output_q)) for i in range(mp.cpu_count())] #I am running as many processes as CPU my machine has (is this wise?).
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    for result in iter(output_q.get, None):
        print result #alternatively, I would like to (c)pickle.dump this, but I am not sure if it is possible.
This does not end nor print any result.
At the end of the whole process I would like to ensure that the working queue is empty, and that all the parallel functions have finished writing to the output queue before the latter is iterated to take out the results. Do you have suggestions on how to make it work?
The following code achieves the expected results. It follows the suggestions made by @tawmas.
This code makes it possible to use multiple cores in a process where the queue that feeds data to the workers can be updated by them during processing:
import multiprocessing as mp

def worker(working_queue, output_queue):
    while True:
        if working_queue.empty() == True:
            break # exit condition: stop once the queue looks empty (note that empty() is not fully reliable across processes)
        else:
            picked = working_queue.get()
            if picked % 2 == 0:
                output_queue.put(picked)
            else:
                working_queue.put(picked+1)
    return

if __name__ == '__main__':
    static_input = xrange(100)
    working_q = mp.Queue()
    output_q = mp.Queue()
    results_bank = []
    for i in static_input:
        working_q.put(i)
    processes = [mp.Process(target=worker,args=(working_q, output_q)) for i in range(mp.cpu_count())]
    for proc in processes:
        proc.start()
    for proc in processes:
        proc.join()
    results_bank = []
    while True:
        if output_q.empty() == True:
            break
        results_bank.append(output_q.get_nowait())
    print len(results_bank) # length of this list should be equal to static_input, which is the range used to populate the input queue. In other words, this tells whether all the items placed for processing were actually processed.
    results_bank.sort()
    print results_bank
You have a typo in the line that creates the processes. It should be mp.Process, not mp.process. This is what is causing the exception you get.
Also, you are not looping in your workers, so they actually only consume a single item each from the queue and then exit. Without knowing more about the required logic, it's not easy to give specific advice, but you will probably want to enclose the body of your worker function inside a while True loop and add a condition in the body to exit when the work is done.
Please note that, if you do not add a condition to explicitly exit from the loop, your workers will simply stall forever when the queue is empty. You might consider using the so-called poison pill technique to signal the workers they may exit. You will find an example and some useful discussion in the PyMOTW article on Communication Between processes.
As for the number of processes to use, you will need to benchmark a bit to find what works for you, but, in general, one process per core is a good starting point when your workload is CPU bound. If your workload is IO bound, you might have better results with a higher number of workers.
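The poison pill technique mentioned above could be sketched as follows. This is an illustration under stated assumptions rather than a drop-in fix: it uses threads and queue.Queue so that it runs anywhere, and it relies on task_done()/join() to know when all work, including tasks added on the fly, is finished before the pills are sent. multiprocessing.JoinableQueue provides the same task_done()/join() pair for processes. STOP and run_workers are made-up names.

```python
import queue
import threading

STOP = None  # the "poison pill": a sentinel telling a worker to exit


def worker(working_q, output_q):
    while True:
        item = working_q.get()
        try:
            if item is STOP:
                return
            if item % 2 == 0:
                output_q.put(item ** 2)
            else:
                # Follow-up task; it is queued before task_done() is called below,
                # so join() cannot return while it is still outstanding.
                working_q.put(item + 1)
        finally:
            working_q.task_done()


def run_workers(items, n_workers=4):
    working_q, output_q = queue.Queue(), queue.Queue()
    for i in items:
        working_q.put(i)
    threads = [threading.Thread(target=worker, args=(working_q, output_q))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    working_q.join()          # every task, including follow-ups, has been handled
    for _ in threads:
        working_q.put(STOP)   # one pill per worker
    for t in threads:
        t.join()
    results = []
    while not output_q.empty():
        results.append(output_q.get())
    return sorted(results)


if __name__ == "__main__":
    print(run_workers(range(10)))
```

Sending the pills only after join() returns avoids the race in which a worker exits while another worker is about to put a follow-up task on the queue.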