I've been trying to get to grips with how I can use concurrent.futures to call a function 3 times every second, without waiting for it to return. I will collect the results after I've made all the calls I need to make.
Here is where I am at the moment, and I'm surprised that sleep() within this example function prevents my code from launching the next chunk of 3 function calls. I'm obviously not understanding the documentation well enough here :)
def print_something(thing):
    print(thing)
    time.sleep(10)

# define a generator
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def main():
    chunk_number = 0
    alphabet = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    for current_chunk in chunks(alphabet, 3):  # Restrict to calling the function 3 times per second
        with ProcessPoolExecutor(max_workers=3) as executor:
            futures = { executor.submit(print_something, thing): thing for thing in current_chunk }
        chunk_number += 1
        print('chunk %s' % chunk_number)
        time.sleep(1)

    for result in as_completed(futures):
        print(result.result())
This code results in chunks of 3 being printed with a sleep time of 10s between each chunk. How can I change this to ensure I'm not waiting for the function to return before calling the next batch?
Thanks
First, for each iteration of for current_chunk in chunks(alphabet, 3): you are creating a new ProcessPoolExecutor instance and a new futures dictionary, clobbering the previous ones. So the final loop for result in as_completed(futures): would only print the results from the last chunk submitted. Second, and the reason the code appears to stall, your block governed by with ProcessPoolExecutor(max_workers=3) as executor: will not terminate until the tasks submitted to the executor have completed, and that takes at least 10 seconds. So the next iteration of the for current_chunk in chunks(alphabet, 3): loop won't be executed more often than once every 10 seconds.
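To see that second point in isolation, here is a tiny sketch of my own (not your code): exiting the with block implicitly calls executor.shutdown(wait=True), so the block cannot be left until every submitted task has finished.
from concurrent.futures import ProcessPoolExecutor
import time

def slow(x):
    time.sleep(10)
    return x

if __name__ == '__main__':
    start = time.time()
    with ProcessPoolExecutor(max_workers=3) as executor:
        executor.submit(slow, 'a')
        # leaving the with block triggers executor.shutdown(wait=True)
    print('with block exited after %.1f s' % (time.time() - start))  # ~10 s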
Note also that the block for result in as_completed(futures): needs to be moved within the with ThreadPoolExecutor(max_workers=26) as executor: block for the same reason. That is, if it is placed after, it will not be executed until all the tasks have completed and so you will not be able to get results "as they complete."
You need to do a bit of rearranging as shown below (I have also modified print_something to return something other than None). There should be no hangs now if you have enough workers (26) to run the 26 tasks being submitted. I doubt your desktop (if you are running this on your PC) has 26 cores to support 26 concurrently executing processes. But note that print_something only prints a short string and then sleeps for 10 seconds, which allows it to relinquish its processor to another process in the pool. So while, with CPU-intensive tasks, little is gained by specifying a max_workers value greater than the number of physical processors/cores on your computer, in this case it's OK. When your tasks spend little time executing actual Python byte code, it is more efficient to use threading instead of processes, since the cost of creating threads is much less than the cost of creating processes. However, threading is notoriously poor when the tasks consist largely of Python byte code, since such code cannot be executed concurrently due to serialization by the Global Interpreter Lock (GIL).
Topic for you to research: The Global Interpreter Lock (GIL) and Python byte code execution
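If you want to see that for yourself, here is a small toy benchmark of my own (the exact timings will vary by machine): a purely CPU-bound function gets essentially no speedup from threads because of the GIL, but it does from processes.
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import time

def cpu_bound(n):
    # pure Python byte code, no I/O and no sleeping
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    work = [2_000_000] * 4
    for pool_cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.time()
        with pool_cls(max_workers=4) as executor:
            list(executor.map(cpu_bound, work))
        print(pool_cls.__name__, round(time.time() - start, 2), 'seconds')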
Update to use threads:
So we should substitute a ThreadPoolExecutor, with 26 or more light-weight threads, for the ProcessPoolExecutor. The beauty of the concurrent.futures module is that no other code needs to be changed. But the most important change is the block structure: use a single executor for all the chunks.
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def print_something(thing):
    # NOT cpu-intensive, so threads should work well here
    print(thing)
    time.sleep(10)
    return thing  # so there is a non-None result

# define a generator
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def main():
    chunk_number = 0
    alphabet = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z']
    futures = {}
    with ThreadPoolExecutor(max_workers=26) as executor:
        for current_chunk in chunks(alphabet, 3):  # Restrict to calling the function 3 times per second
            futures.update({executor.submit(print_something, thing): thing for thing in current_chunk})
            chunk_number += 1
            print('chunk %s' % chunk_number)
            time.sleep(1)
        # needs to be within the executor block else it won't run until all futures are complete
        for result in as_completed(futures):
            print(result.result())

if __name__ == '__main__':
    main()
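A small usage note (my own addition, not required for the fix): since the futures dict maps each future back to the thing that was submitted, you can report which input each result came from by replacing the final loop in main() with something like:
for future in as_completed(futures):
    print(futures[future], future.result())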
Related
I have a Python multiprocessing pool doing a very long job that even after a thorough debugging is not robust enough not to fail every 24 hours or so, because it depends on many third-party, non-Python tools with complex interactions. Also, the underlying machine has certain problems that I cannot control. Note that by failing I don't mean the whole program crashing, but some or most of the processes becoming idle because of some errors, and the app itself either hanging or continuing the job just with the processes that haven't failed.
My solution right now is to periodically kill the job, manually, and then just restart from where it was.
Even if it's not ideal, what I want to do now is the following: restart the multiprocessing pool periodically, programmatically, from the Python code itself. I don't really care if this implies killing the pool workers in the middle of their job. What would be the best way to do that?
My code looks like:
with Pool() as p:
    for _ in p.imap_unordered(function, data):
        save_checkpoint()
        log()
What I have in mind would be something like:
start = 0
end = 1000  # magic number
while start + 1 < len(data):
    current_data = data[start:end]
    with Pool() as p:
        for _ in p.imap_unordered(function, current_data):
            save_checkpoint()
            log()
    start += 1
    end += 1
Or:
start = 0
end = 1000  # magic number
while start + 1 < len(data):
    current_data = data[start:end]
    start_timeout(time=TIMEOUT)  # which would be the best way to do that without breaking multiprocessing?
    try:
        with Pool() as p:
            for _ in p.imap_unordered(function, current_data):
                save_checkpoint()
                log()
        start += 1
        end += 1
    except Timeout:
        pass
Or any suggestion you think would be better. Any help would be much appreciated, thanks!
The problem with your current code is that it iterates the multiprocessed results directly, and that call will block. Fortunately there's an easy solution: use apply_async exactly as suggested in the docs. But because of how you describe the use-case here and the failure, I've adapted it somewhat. Firstly, a mock task:
from multiprocessing import Pool, TimeoutError, cpu_count
from time import sleep
from random import randint

def log():
    print("logging is a dangerous activity: wear a hard hat.")

def work(d):
    sleep(randint(1, 100) / 100)
    print("finished working")
    if randint(1, 10) == 1:
        print("blocking...")
        while True:
            sleep(0.1)
    return d
This work function will fail with a probability of 0.1, blocking indefinitely. We create the tasks:
data = list(range(100))
nproc = cpu_count()
And then generate futures for all of them:
while data:
    print(f"== Processing {len(data)} items. ==")
    with Pool(nproc) as p:
        tasks = [p.apply_async(work, (d,)) for d in data]
Then we can try to get the tasks out manually:
failed = []  # tasks that timed out in this round
for task in tasks:
    try:
        res = task.get(timeout=1)
        data.remove(res)
        log()
    except TimeoutError:
        failed.append(task)
        if len(failed) < nproc:
            print(
                f"{len(failed)} processes are blocked,"
                f" but {nproc - len(failed)} remain."
            )
        else:
            break
The controlling timeout here is the timeout to .get. It should be as long as you expect the longest process to take. Note that we detect when the whole pool is tied up and give up.
But since in the scenario you describe some processes are going to take longer than others, we can give 'failed' processes some time to recover. Thus every time a task fails we quickly check whether the others have in fact succeeded:
for task in failed[:]:  # iterate over a copy so items can be removed safely
    try:
        res = task.get(timeout=0.01)
        data.remove(res)
        failed.remove(task)
        log()
    except TimeoutError:
        continue
Whether this is a good addition in your case depends on whether your tasks really are as flaky as I'm guessing they are.
Exiting the context manager for the pool will terminate the pool, so we don't even need to handle that ourselves. If you have significant variation you might want to increase the pool size (thus increasing the number of tasks which are allowed to stall) or allow tasks a grace period before considering them 'failed'.
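For reference, here is my rough sketch of what the pool's context manager gives you for free (an illustration of the documented behaviour, not code from your project):
from multiprocessing import Pool

p = Pool(4)
try:
    pass  # submit work with p.apply_async(...) and collect results here
finally:
    # exiting "with Pool(4) as p:" calls terminate(), which stops any
    # workers that are still stuck instead of waiting for them to finish
    p.terminate()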
I want to do multiple transformations on some data. I figured I can use multiple Pool.imap's because each of the transformations is just a simple map. And Pool.imap is lazy, so it only does computation when needed.
But strangely, it looks like multiple consecutive Pool.imap's are blocking. And not lazy. Look at the following code as an example.
import time
from multiprocessing import Pool

def slow(n):
    time.sleep(0.01)
    return n*n

for i in [10, 100, 1000]:
    with Pool() as p:
        numbers = range(i)
        iter1 = p.imap(slow, numbers)
        iter2 = p.imap(slow, iter1)
        start = time.perf_counter()
        next(iter2)
        print(i, time.perf_counter() - start)
# Prints
# 10 0.0327413540071575
# 100 0.27094774100987706
# 1000 2.6275791430089157
As you can see the time to the first element is increasing. I have 4 cores on my machine, so it roughly takes 2.5 seconds to process 1000 items with a 0.01 second delay. Hence, I think two consecutive Pool.imap's are blocking. And that the first Pool.imap finishes the entire workload before the second one starts. That is not lazy.
I've done some additional research. It does not matter whether I use a process pool or a thread pool. It happens with both Pool.imap and Pool.imap_unordered. The blocking takes longer when I add a third Pool.imap. A single Pool.imap is not blocking. This bug report seems related but different.
TL;DR imap is not a real generator, meaning it does not generate items on demand (lazy computation, as a generator or coroutine would), and pools initiate "jobs" in serial.
longer answer: Every type of submission to a Pool, be it imap, apply, apply_async, etc., gets written to a queue of "jobs". This queue is read by a thread in the main process (pool._handle_tasks) in order to allow jobs to continue to be initiated while the main process goes off and does other things. This thread contains a very simple double for loop (with a lot of error handling) that basically iterates over each job, then over each task within each job. The inner loop blocks until a worker is available to take each task, meaning tasks (and jobs) are always started in serial, in the exact order they were submitted. This does not mean they will finish in perfect serial order, which is why map and imap collect results and re-order them back to their original order (handled by the pool._handle_results thread) before passing them back to the main thread.
Rough pseudocode of what's going on:
# task_queue buffers task inputs, first in - first out

pool.imap(foo, ("bar", "baz", "bat"), chunksize=1)
# put an iterator on the task queue which will yield "chunks"
# (a chunk is given to a single worker process to compute)

pool.imap(fun, ("one", "two", "three"), chunksize=1)
# put a second iterator on the task queue

# inside the pool._task_handler thread within the main process
for task in task_queue:  # [imap_1, imap_2]
    # this is actually a while loop in reality that tries to get
    # new tasks until the pool is close()'d
    for chunk in task:
        _worker_input_queue.put(chunk)  # give the chunk to the next available worker
        # This blocks until a worker actually takes the chunk, meaning the loop won't
        # continue until all chunks are taken by workers.

def worker_function(_worker_input_queue, _worker_output_queue):
    while True:
        task = _worker_input_queue.get()  # get the next chunk of tasks
        # if task == StopSignal: break
        result = task.func(task.args)
        _worker_output_queue.put(result)  # results are collected, and re-ordered
                                          # by another thread in the main process
                                          # as they are completed.
Please be warned that this demonstration code generates a few GB of data.
I have been using versions of the code below for multiprocessing for some time. It works well when the run time of each process in the pool is similar but if one process takes much longer I end up with many blocked processes waiting on the one, so I'm trying to make it run asynchronously - just for one function at a time.
For example, if I have 70 cores and need to run a function 2000 times I want that to run asynchronously, then wait for the last process before calling the next function. Currently it just submits processes in batches of however many cores I give it, and each batch has to wait for the longest process.
As you can see I've tried using map_async but this is clearly the wrong syntax. Can anyone help me out?
import os

p='PATH/test/'

def f1(tup):
    x,y=tup
    to_write = x*(y**5)
    with open(p+x+str(y)+'.txt','w') as fout:
        fout.write(to_write)

def f2(tup):
    x,y=tup
    print (os.path.exists(p+x+str(y)+'.txt'))

def call_func(f,nos,threads,call):
    print (call)
    for i in range(0, len(nos), threads):
        print (i)
        chunk = nos[i:i + threads]
        tmp = [('args', no) for no in chunk]
        pool.map(f, tmp)
        #pool.map_async(f, tmp)

nos=[i for i in range(55)]
threads=8

if __name__ == '__main__':
    with Pool(processes=threads) as pool:
        call_func(f1,nos,threads,'f1')
        call_func(f2,nos,threads,'f2')
map will only return and map_async will only call the callback after all tasks of the current chunk are done.
So you can only either give all tasks to map/map_async at once, or use apply_async (initially called threads times) where the callback calls apply_async for the next task.
If the actual return values of the calls don't matter (or at least their order doesn't), imap_unordered may be another efficient solution when giving it all tasks at once (or an iterator/generator producing the tasks on demand).
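Here is a rough sketch of that apply_async-with-callback idea (my own illustration with a stand-in task function, so adapt it to your f1/f2 and argument tuples):
from collections import deque
from multiprocessing import Pool
import time

def task(n):
    return n * n  # stand-in for your real function

if __name__ == '__main__':
    items = list(range(55))
    pending = deque(items)   # work not yet handed to the pool
    workers = 8
    results = []

    with Pool(processes=workers) as pool:
        def submit_next(result=None):
            # store the finished result, then immediately submit the next
            # item, so one slow task never blocks submission of the rest
            if result is not None:
                results.append(result)
            try:
                n = pending.popleft()
            except IndexError:
                return  # nothing left to submit
            pool.apply_async(task, (n,), callback=submit_next)

        for _ in range(workers):          # prime the pool, one task per worker
            submit_next()
        while len(results) < len(items):  # crude wait; an Event would be tidier
            time.sleep(0.05)

    print(sorted(results))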
I do not get any acceleration using asyncio. This snippet still runs in the same fashion as a sync job. Most of the examples use asyncio.sleep() to impose delay; my question is what happens if part of the code itself imposes the delay, depending on the input parameters.
async def c(n):
    #this loop is supposed to impose delay
    for i in range(1, n * 40000):
        c *= i
    return n

async def f():
    tasks = [c(i) for i in [2,1,3]]
    r=[]
    completed, pending = await asyncio.wait(tasks)
    for item in completed:
        r.append(item.result())
    return r

if __name__=="__main__":
    loop = asyncio.get_event_loop()
    k=loop.run_until_complete(f())
    loop.close()
I expect to get [1,2,3] but I do not (and there is also no time difference compared with running it serially).
asyncio is not about getting acceleration, it's about avoiding "callback hell" when programming in an asynchronous environment, such as (but not limited to) non-blocking IO. Since the code in the question is not asynchronous, there is nothing to gain from using asyncio - but you can look into multiprocessing instead.
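For what it's worth, here is a minimal multiprocessing sketch for this kind of CPU-bound work (my own illustration, using a corrected version of the calculation similar to the one defined just below; treat it as a pointer rather than a drop-in solution):
import math
from multiprocessing import Pool

def long_calc(n):
    p = 1
    for i in range(1, n * 10000):
        p *= i
    return math.log(p)

if __name__ == '__main__':
    with Pool() as pool:
        # the three calculations run in separate processes, so they really
        # do execute in parallel, unlike the un-awaited coroutine version
        print(pool.map(long_calc, [2, 1, 3]))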
In the above case, the function is defined as async, but it runs its entire calculation without awaiting anything. It also contains references to unassigned variables, so let's start with a version that runs:
import asyncio
import math

async def long_calc(n):
    p = 1
    for i in range(1, n * 10000):
        p *= i
    print(math.log(p))
    return p
The print at the end immediately indicates when the calculation is done. Starting several such coroutines "in parallel" is done with asyncio.gather:
async def wait_calcs():
    return await asyncio.gather(*[long_calc(i) for i in [2, 1, 3]])
asyncio.gather will let the calculations run and return once all of them are complete, returning a list of their results in the order in which they appear in the argument list. But the output printed when running loop.run_until_complete(wait_calcs()) shows that the calculations are not really running in parallel:
178065.71824964616
82099.71749644238
279264.3442843094
The results correspond to the [2, 1, 3] order. If the coroutines were running in parallel, the smallest number would appear first because its coroutine has by far the least work to do.
We can force the coroutine to give a chance to other coroutines to run by introducing a no-op sleep in the inner loop:
async def long_calc(n):
    p = 1
    for i in range(1, n * 10000):
        p *= i
        await asyncio.sleep(0)
    print(math.log(p))
    return p
The output now shows that the coroutines were running in parallel:
82099.71749644238
178065.71824964616
279264.3442843094
Note that this version also takes more time to run because it involves more switching between the coroutines and the main loop. The slowdown can be avoided by only sleeping once in a hundred cycles or so.
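For example, a variant of the coroutine above (my own tweak, reusing the same asyncio and math imports) that only yields control every hundredth iteration:
async def long_calc(n):
    p = 1
    for i in range(1, n * 10000):
        p *= i
        if i % 100 == 0:  # hand control back to the event loop only occasionally
            await asyncio.sleep(0)
    print(math.log(p))
    return p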
I am using Python 2.7.
I am currently using ThreadPoolExecutor like this:
params = [1,2,3,4,5,6,7,8,9,10]
with concurrent.futures.ThreadPoolExecutor(5) as executor:
    result = list(executor.map(f, params))
The problem is that f sometimes runs for too long. Whenever I run f, I want to limit its run to 100 seconds, and then kill it.
Eventually, for each element x in param, I would like to have an indication of whether or not f had to be killed, and in case it wasn't - what was the return value.
Even if f times out for one parameter, I still want to run it with the next parameters.
The executor.map method does have a timeout parameter, but it sets a timeout for the entire run, from the time of the call to executor.map, and not for each thread separately.
What is the easiest way to get my desired behavior?
This answer is in terms of python's multiprocessing library, which is usually preferable to the threading library, unless your functions are just waiting on network calls. Note that the multiprocessing and threading libraries have the same interface.
Given your processes run for potentially 100 seconds each, the overhead of creating a process for each one is fairly small in comparison. You probably have to make your own processes to get the necessary control.
One option is to wrap f in another function that will execute for at most 100 seconds:
from multiprocessing import Pool

def timeout_f(arg):
    pool = Pool(processes=1)
    return pool.apply_async(f, [arg]).get(timeout=100)
Then your code changes to:
result = list(executor.map(timeout_f, params))
Alternatively, you could write your own thread/process control:
from multiprocessing import Process
from time import time

def chunks(l, n):
    """ Yield successive n-sized chunks from l. """
    for i in xrange(0, len(l), n):
        yield l[i:i+n]

processes = [Process(target=f, args=(i,)) for i in params]

exit_codes = []
for five_processes in chunks(processes, 5):
    for p in five_processes:
        p.start()
    time_waited = 0
    start = time()
    for p in five_processes:
        if time_waited >= 100:
            p.join(0)
            p.terminate()
            continue
        p.join(100 - time_waited)
        p.terminate()
        time_waited = time() - start
    for p in five_processes:
        exit_codes.append(p.exitcode)
You'd have to get the return values through something like Can I get a return value from multiprocessing.Process?
The exit codes of the processes are 0 if the processes completed and non-zero if they were terminated.
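For reference, one way to get the return values out of bare Process objects, along the lines of the linked answer, is to have each process put its result on a shared queue (my own sketch with a stand-in worker):
from multiprocessing import Process, Queue

def f_with_result(q, i):
    # wrapper that reports (input, result) back on a shared queue;
    # replace the body with a call to your real f
    q.put((i, i * i))

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=f_with_result, args=(q, i)) for i in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join(100)  # same per-batch timeout idea as above
    results = {}
    while not q.empty():  # empty() is approximate; fine here since the results are small
        i, val = q.get()
        results[i] = val
    print(results)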
Techniques from:
Join a group of python processes with a timeout, How do you split a list into evenly sized chunks?
As another option, you could just try to use apply_async on multiprocessing.Pool
from multiprocessing import Pool, TimeoutError
from time import sleep

if __name__ == "__main__":
    pool = Pool(processes=5)
    processes = [pool.apply_async(f, [i]) for i in params]
    results = []
    for process in processes:
        try:
            results.append(process.get(timeout=100))
        except TimeoutError as e:
            results.append(e)
Note that the above possibly waits more than 100 seconds for each process, since if the first one takes 50 seconds to complete, the second process will already have had 50 extra seconds of run time. More complicated logic (such as the previous example) is needed to enforce stricter timeouts.
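One way to tighten that up (a sketch of my own, with stand-in f and params, and assuming there are at least as many pool workers as tasks so everything starts immediately) is to measure every get() against a single shared deadline instead of giving each call its own 100 seconds:
from multiprocessing import Pool, TimeoutError
from time import time

def f(x):
    return x * x  # stand-in for the real f

if __name__ == "__main__":
    params = [1, 2, 3, 4, 5]
    pool = Pool(processes=5)
    async_results = [pool.apply_async(f, [i]) for i in params]
    deadline = time() + 100  # one shared 100-second budget for the whole batch
    results = []
    for res in async_results:
        try:
            # wait only for whatever time remains until the shared deadline
            results.append(res.get(timeout=max(0, deadline - time())))
        except TimeoutError as e:
            results.append(e)
    # note: a timed-out get() only stops the waiting; terminate() is what
    # actually kills any workers that are still running
    pool.terminate()
    pool.join()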