How do I wait for ThreadPoolExecutor.map to finish - python

I have the following code, which has been simplified:
import concurrent.futures

pool = concurrent.futures.ThreadPoolExecutor(8)

def _exec(x):
    return x + x

myfuturelist = pool.map(_exec, [x for x in range(5)])
# How do I wait for my futures to finish?
for result in myfuturelist:
    # Is this how it's done?
    print(result)
# ... stuff that should happen only after myfuturelist is
# completely resolved.
# Documentation says pool.map is asynchronous
The documentation is weak regarding ThreadPoolExecutor.map. Help would be great.
Thanks!

The call to ThreadPoolExecutor.map does not block until all of its tasks are complete. Use wait to do this.
from concurrent.futures import wait, ALL_COMPLETED
...
futures = [pool.submit(fn, args) for args in arg_list]
wait(futures, timeout=whatever, return_when=ALL_COMPLETED) # ALL_COMPLETED is actually the default
do_other_stuff()
You could also call list(results) on the generator returned by pool.map to force the evaluation (which is what you're doing in your original example). If you're not actually using the values returned from the tasks, though, wait is the way to go.
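For instance, a minimal sketch of both approaches, reusing the names from the question:
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

pool = ThreadPoolExecutor(8)

def _exec(x):
    return x + x

# Option 1: force the lazy map generator; this blocks until every result is ready
results = list(pool.map(_exec, range(5)))       # [0, 2, 4, 6, 8]

# Option 2: submit explicitly and block with wait(), ignoring the return values
futures = [pool.submit(_exec, x) for x in range(5)]
wait(futures, return_when=ALL_COMPLETED)
# ... anything placed here runs only after every task has finished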

It's true that Executor.map() will not wait for all futures to finish, because it returns a lazy iterator, as @MisterMiyagi said.
But we can accomplish this by using a with statement:
import time
from concurrent.futures import ThreadPoolExecutor

def hello(i):
    time.sleep(i)
    print(i)

with ThreadPoolExecutor(max_workers=2) as executor:
    executor.map(hello, [1, 2, 3])
print("finish")
# output
# 1
# 2
# 3
# finish
As you can see, finish is printed after 1, 2, 3. It works because Executor has an __exit__() method, whose code is:
def __exit__(self, exc_type, exc_val, exc_tb):
    self.shutdown(wait=True)
    return False
The shutdown method of ThreadPoolExecutor is:
def shutdown(self, wait=True, *, cancel_futures=False):
    with self._shutdown_lock:
        self._shutdown = True
        if cancel_futures:
            # Drain all work items from the queue, and then cancel their
            # associated futures.
            while True:
                try:
                    work_item = self._work_queue.get_nowait()
                except queue.Empty:
                    break
                if work_item is not None:
                    work_item.future.cancel()

        # Send a wake-up to prevent threads calling
        # _work_queue.get(block=True) from permanently blocking.
        self._work_queue.put(None)
    if wait:
        for t in self._threads:
            t.join()
shutdown.__doc__ = _base.Executor.shutdown.__doc__
So by using with, we can get the ability to wait until all futures finish.

Executor.map will run the jobs in parallel, wait for the futures to finish, collect the results and return a generator. It does the waiting for you: each iteration of the generator blocks until the corresponding result is ready. If you set a timeout, it will wait up to the timeout and then raise an exception from the generator.
map(func, *iterables, timeout=None, chunksize=1)
the iterables are collected immediately rather than lazily;
func is executed asynchronously and several calls to func may be made concurrently.
To get a list of futures and do the wait manually, you can use:
myfuturelist = [pool.submit(_exec, x) for x in range(5)]
Executor.submit returns a Future object; calling result() on a future explicitly waits for it to finish:
myfuturelist[0].result()  # wait for the 1st future to finish and return its result
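Regarding the timeout mentioned above, here is a hedged sketch of that behaviour (slow and its sleep duration are made-up stand-ins): iterating the map generator raises concurrent.futures.TimeoutError once the deadline passes.
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def slow(x):
    time.sleep(2)
    return x + x

pool = ThreadPoolExecutor(2)
results = pool.map(slow, range(5), timeout=1)
try:
    for r in results:            # each iteration blocks until the next result is ready
        print(r)
except TimeoutError:
    print("a result was not ready before the 1-second deadline")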

Related

Python concurrency first result ends waiting for not yet done results

What I wish to do is print 'Move on...' after the first True result, without caring about the not-yet-finished I/O-bound tasks. In the case below, two() finishes first and is the only one that returns True, so the program needs to execute like this:
Second
Move on..
NOT:
Second
First
Third
Move on...
import concurrent.futures
import time

def one():
    time.sleep(2)
    print('First')
    return False

def two():
    time.sleep(1)
    print('Second')
    return True

def three():
    time.sleep(4)
    print('Third')
    return False

tasks = [one, two, three]

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    for t in range(len(tasks)):
        executor.submit(tasks[t])
print('Move on...')
A with statement is not what you want here because it waits for all submitted jobs to finish. You need to submit the tasks, as you already do, but then call as_completed to wait for the first task that returns true (and no longer):
executor = concurrent.futures.ThreadPoolExecutor()
futures = [executor.submit(t) for t in tasks]
for f in concurrent.futures.as_completed(futures):
    if f.result():
        break
print('Move on...')
The problem with concurrent.futures.ThreadPoolExecutor is that once tasks are submitted, they will run to completion. The program will print 'Move on...', but if there is in fact nothing else to do, it will not terminate until functions one and three finish (and print their messages). So the program is guaranteed to run for at least 4 seconds.
It is better to use the ThreadPool class in the multiprocessing.pool module, which supports a terminate method that will kill all outstanding tasks. The closest thing to an as_completed method would probably be the imap_unordered method, but that requires a single worker function to be used for all 3 tasks. Instead, we can use apply_async, specifying a callback function to be invoked as results become available:
from multiprocessing.pool import ThreadPool
import time
from threading import Event

def one():
    time.sleep(2)
    print('First')
    return False

def two():
    time.sleep(1)
    print('Second')
    return True

def three():
    time.sleep(4)
    print('Third')
    return False

def my_callback(result):
    if result:
        executor.terminate()  # kill all other tasks
        done_event.set()

tasks = [one, two, three]
executor = ThreadPool(3)
done_event = Event()
for t in tasks:
    executor.apply_async(t, callback=my_callback)
done_event.wait()
print("Moving on ...")

Python: Getting a concurrent.futures Executor to wait for done_callbacks to complete

Is it possible to get a ThreadPoolExecutor to wait for all its futures and their add_done_callback() functions to complete without having to call .shutdown(wait=True)? The following code snippet illustrates the essence of what I'm trying to accomplish, which is to reuse the thread pool between iterations of the outer loop.
from concurrent.futures import ThreadPoolExecutor, wait
import time

def proc_func(n):
    return n + 1

def create_callback_func(fid, sleep_time):
    def callback(future):
        time.sleep(sleep_time)
        fid.write(str(future.result()))
        return
    return callback

num_workers = 4
num_files_write = 3
num_tasks = 8
sleep_time = 1

pool = ThreadPoolExecutor(max_workers=num_workers)
for n in range(num_files_write):
    fid = open(f'test{n}.txt', 'w')
    futs = []
    callback_func = create_callback_func(fid, sleep_time)
    for t in range(num_tasks):
        fut = pool.submit(proc_func, n)
        fut.add_done_callback(callback_func)
        futs.append(fut)
    wait(futs)
    fid.close()
pool.shutdown(wait=True)
Running this code throws a bunch of ValueError: I/O operation on closed file errors, and the three files that get written have contents:
test0.txt -> 1111
test1.txt -> 2222
test2.txt -> 3333
Clearly this is wrong and there should be eight of each numeral. If I create and shutdown a separate ThreadPoolExecutor for each file, then the correct result is achieved. So I know that the Executor has the ability to properly wait for all the callbacks to finish, but can I tell it to do so without shutting it down?
I'm afraid that cannot be done and you are "misusing" the callback.
The primary purpose of the callback is to notify that the scheduled work has been done.
The internal future states are PENDING -> RUNNING -> FINISHED (disregarding cancellations for brevity). When the FINISHED state is reached, the callbacks are invoked, but there is no next state when they finish. That's why it is not possible to synchronize with that event.
The core of the execution of a submitted function in one of the available threads is (simplified):
try:
    result = self.fn(*self.args, **self.kwargs)
except BaseException as exc:
    self.future.set_exception(exc)
else:
    self.future.set_result(result)
where both set_exception and set_result look like this (very simplified):
... save the result/exception
self._state = FINISHED
... wakeup all waiters
self._invoke_callbacks() # this is the last statement
The future is in FINISHED, i.e. "done" state when the "done" callback is called. It would not make sense to notify that the work is done before marking it done.
As you noticed already, in your code:
wait(futs)
fid.close()
the wait returns, the file gets closed, but the callback is not finished yet and fails attempting to write to a closed file.
The second question is why shutdown(wait=True) works. Simply because it waits for all threads:
if wait:
    for t in self._threads:
        t.join()
Those threads also execute the callbacks (see the code snippets above). That's why the callback execution must be finished by the time the threads are finished.
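If you still want to reuse the pool, one workaround (not something the Executor itself provides; the helper name here is made up) is to have the callback signal its own completion, for example by counting down under a lock and setting an Event, and then wait on that instead of wait(futs). A minimal sketch, assuming the callback and task count from the question:
import threading

def make_tracking_callback(real_callback, num_tasks):
    # Hypothetical helper: wraps a callback so the main thread can wait until
    # it has run for every future, without shutting the pool down.
    remaining = [num_tasks]
    lock = threading.Lock()
    all_done = threading.Event()

    def callback(future):
        try:
            real_callback(future)
        finally:
            with lock:
                remaining[0] -= 1
                if remaining[0] == 0:
                    all_done.set()

    return callback, all_done

# Usage sketch: wait on the event instead of wait(futs)
# callback, all_done = make_tracking_callback(callback_func, num_tasks)
# ... submit tasks with fut.add_done_callback(callback) ...
# all_done.wait()
# fid.close()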

Return from function if execution finished within timeout or make callback otherwise

I have a project in Python 3.5 without any usage of asynchronous features. I have to implement the following logic:
def should_return_in_3_sec(some_serious_job, arguments, finished_callback):
    # Start some_serious_job(*arguments) in a task
    # if it finishes within 3 sec:
    #     return result immediately
    # otherwise return None, but do not terminate task.
    # If the task finishes in 1 minute:
    #     call finished_callback(result)
    # else:
    #     call finished_callback(None)
    pass
The function should_return_in_3_sec() should remain synchronous, but it is up to me to write any new asynchronous code (including some_serious_job()).
What is the most elegant and pythonic way to do it?
Fork off a thread doing the serious job, let it write its result into a queue and then terminate. Read from that queue in your main thread with a timeout of three seconds. If the timeout occurs, start another thread and return None. Let the second thread read from the queue with a timeout of one minute; if that times out as well, call finished_callback(None); otherwise call finished_callback(result).
I sketched it like this:
import threading, queue

def should_return_in_3_sec(some_serious_job, arguments, finished_callback):
    result_queue = queue.Queue(1)

    def do_serious_job_and_deliver_result():
        result = some_serious_job(arguments)
        result_queue.put(result)

    threading.Thread(target=do_serious_job_and_deliver_result).start()
    try:
        result = result_queue.get(timeout=3)
    except queue.Empty:  # timeout?
        def expect_and_handle_late_result():
            try:
                result = result_queue.get(timeout=60)
            except queue.Empty:
                finished_callback(None)
            else:
                finished_callback(result)
        threading.Thread(target=expect_and_handle_late_result).start()
        return None
    else:
        return result
The threading module has some simple timeout options, see Thread.join(timeout) for example.
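For reference, a minimal sketch of the Thread.join(timeout) pattern mentioned above (the worker function is a made-up stand-in for the serious job):
import threading
import time

def worker():
    time.sleep(5)          # stand-in for the serious job

t = threading.Thread(target=worker)
t.start()
t.join(timeout=3)          # returns after at most 3 seconds
if t.is_alive():
    print("still running, moving on without the result")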
If you do choose to use asyncio, below is a partial solution to address some of your needs:
import asyncio
import time

async def late_response(task, flag, timeout, callback):
    done, pending = await asyncio.wait([task], timeout=timeout)
    callback(done.pop().result() if done else None)  # will raise an exception if some_serious_job failed
    flag[0] = True  # signal some_serious_job to stop
    return await task

async def launch_job(loop, some_serious_job, arguments, finished_callback,
                     timeout_1=3, timeout_2=5):
    flag = [False]
    task = loop.run_in_executor(None, some_serious_job, flag, *arguments)
    done, pending = await asyncio.wait([task], timeout=timeout_1)
    if done:
        return done.pop().result()  # will raise an exception if some_serious_job failed
    asyncio.ensure_future(
        late_response(task, flag, timeout_2, finished_callback))
    return None

def f(flag, n):
    for i in range(n):
        print("serious", i, flag)
        if flag[0]:
            return "CANCELLED"
        time.sleep(1)
    return "OK"

def finished(result):
    print("FINISHED", result)

loop = asyncio.get_event_loop()
result = loop.run_until_complete(launch_job(loop, f, [1], finished))
print("result:", result)
loop.run_forever()
This will run the job in a separate thread (use loop.set_default_executor(ProcessPoolExecutor()) to run a CPU-intensive task in a process instead). Keep in mind it is bad practice to terminate a process/thread; the code above uses a very simple list to signal the thread to stop (see also threading.Event / multiprocessing.Event).
While implementing your solution, you might discover you would want to modify your existing code to use coroutines instead of threads.
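As a hedged sketch, the same cooperative-stop idea expressed with threading.Event instead of the flag list (names are illustrative only):
import threading
import time

stop_event = threading.Event()

def some_serious_job(stop_event, n):
    for i in range(n):
        if stop_event.is_set():   # cooperative cancellation point, checked once per second
            return "CANCELLED"
        time.sleep(1)
    return "OK"

# From another thread (e.g. the equivalent of late_response above):
# stop_event.set()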

Thread cue is running serially, not parallel?

I'm making remote API calls using threads, using no join so that the program could make the next API call without waiting for the last to complete.
Like so:
def run_single_thread_no_join(function, args):
    thread = Thread(target=function, args=(args,))
    thread.start()
    return
The problem was that I needed to know when all API calls were completed. So I moved to code that uses a queue and join.
Threads seem to run in serial now.
I can't seem to figure out how to get the join to work so that threads execute in parallel.
What am I doing wrong?
def run_que_block(methods_list, num_worker_threads=10):
    '''
    Runs methods on threads. Stores method returns in a list. Then outputs that list
    after all methods in the list have been completed.

    :param methods_list: example ((method name, args), (method_2, args), (method_3, args)
    :param num_worker_threads: The number of threads to use in the block.
    :return: The full list of returns from each method.
    '''
    method_returns = []
    # log = StandardLogger(logger_name='run_que_block')

    # lock to serialize console output
    lock = threading.Lock()

    def _output(item):
        # Make sure the whole print completes or threads can mix up output in one line.
        with lock:
            if item:
                print(item)
            msg = threading.current_thread().name, item
            # log.log_debug(msg)
        return

    # The worker thread pulls an item from the queue and processes it
    def _worker():
        while True:
            item = q.get()
            if item is None:
                break
            method_returns.append(item)
            _output(item)
            q.task_done()

    # Create the queue and thread pool.
    q = Queue()
    threads = []

    # starts worker threads.
    for i in range(num_worker_threads):
        t = threading.Thread(target=_worker)
        t.daemon = True  # thread dies when main thread (only non-daemon thread) exits.
        t.start()
        threads.append(t)

    for method in methods_list:
        q.put(method[0](*method[1]))

    # block until all tasks are done
    q.join()

    # stop workers
    for i in range(num_worker_threads):
        q.put(None)

    for t in threads:
        t.join()

    return method_returns
You're doing all the work in the main thread:
for method in methods_list:
    q.put(method[0](*method[1]))
Assuming each entry in methods_list is a callable and a sequence of arguments for it, this calls each function in the main thread and only puts the result into the queue, which doesn't allow any parallelization aside from printing (which is generally not a big enough cost to justify thread/queue overhead).
Presumably, you want the threads to do the work for each function, so change that loop to:
for method in methods_list:
    q.put(method)  # Don't call it, queue it to be called in a worker
and change the _worker function so it calls the function that does the work in the thread:
def _worker():
    while True:
        entry = q.get()
        if entry is None:        # sentinel from the main thread: stop this worker
            break
        method, args = entry     # Extract and unpack callable and arguments
        item = method(*args)     # Call callable with provided args and store result
        method_returns.append(item)
        _output(item)
        q.task_done()

How to break time.sleep() in a python concurrent.futures

I am playing around with concurrent.futures.
Currently my future calls time.sleep(secs).
It seems that Future.cancel() does less than I thought.
If the future is already executing, then time.sleep() does not get cancelled by it.
The same goes for the timeout parameter of wait(); it does not cancel my time.sleep().
How can I cancel a time.sleep() that is being executed in a concurrent.futures worker?
For testing I use the ThreadPoolExecutor.
If you submit a function to a ThreadPoolExecutor, the executor will run the function in a thread and store its return value in the Future object. Since the number of concurrent threads is limited, you have the option to cancel the pending execution of a future, but once control in the worker thread has been passed to the callable, there's no way to stop execution.
Consider this code:
import concurrent.futures as f
import time

T = f.ThreadPoolExecutor(1)  # Run at most one function concurrently

def block5():
    time.sleep(5)
    return 1

q = T.submit(block5)
m = T.submit(block5)

print(q.cancel())  # Will fail, because q is already running
print(m.cancel())  # Will work, because q is blocking the only thread, so m is still queued
In general, whenever you want to have something cancellable you yourself are responsible for making sure that it is.
There are some off-the-shelf options available, though. For example, consider using asyncio; it also has an example using sleep. The concept circumvents the issue by returning control to an event loop running in the outermost context whenever a potentially blocking operation is about to be called, together with a note that execution should be continued whenever the result is available - or, in your case, after n seconds have passed.
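A minimal sketch of that idea with asyncio (the task and function names are illustrative): asyncio.sleep() is a coroutine, so cancelling the task actually interrupts the wait, unlike time.sleep() in a thread.
import asyncio

async def block5():
    await asyncio.sleep(5)   # suspends and yields to the event loop instead of blocking a thread
    return 1

async def main():
    task = asyncio.create_task(block5())
    await asyncio.sleep(1)
    task.cancel()            # unlike time.sleep(), this wait really is interrupted
    try:
        await task
    except asyncio.CancelledError:
        print("sleep was cancelled")

asyncio.run(main())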
I do not know much about concurrent.futures, but you can use this logic to break out of the sleep. Use a loop instead of time.sleep() or wait():
for i in range(sec):
    sleep(1)
An interrupt or break can then be used to come out of the loop, as sketched below.
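A hedged sketch of that pattern with a stop flag (the Event name and the one-second granularity are just illustrative choices):
import time
import threading

stop = threading.Event()

def interruptible_sleep(secs):
    for _ in range(secs):
        if stop.is_set():      # another thread has called stop.set()
            return False       # sleep was interrupted
        time.sleep(1)
    return True                # slept for the full duration
In practice, stop.wait(timeout=secs) collapses this loop into a single call: it returns True as soon as the event is set and False once the timeout expires.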
I figured it out.
Here is an example:
from concurrent.futures import ThreadPoolExecutor
import queue
import time

class Runner:
    def __init__(self):
        self.q = queue.Queue()
        self.exec = ThreadPoolExecutor(max_workers=2)

    def task(self):
        while True:
            try:
                self.q.get(block=True, timeout=1)
                break
            except queue.Empty:
                pass
            print('running')

    def run(self):
        self.exec.submit(self.task)

    def stop(self):
        self.q.put(None)
        self.exec.shutdown(wait=False, cancel_futures=True)

r = Runner()
r.run()
time.sleep(5)
r.stop()
As written in the documentation, you can use a with statement to ensure threads are cleaned up promptly, as in the example below:
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

# Retrieve a single page and report the URL and contents
def load_url(url, timeout):
    with urllib.request.urlopen(url, timeout=timeout) as conn:
        return conn.read()

# We can use a with statement to ensure threads are cleaned up promptly
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    # Start the load operations and mark each future with its URL
    future_to_url = {executor.submit(load_url, url, 60): url for url in URLS}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            print('%r generated an exception: %s' % (url, exc))
        else:
            print('%r page is %d bytes' % (url, len(data)))
I've faced this same problem recently. I had 2 tasks to run concurrently and one of them had to sleep from time to time. In the code below, suppose task2 is the one that sleeps.
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=2)
executor.submit(task1)
executor.submit(task2)
executor.shutdown(wait=True)
In order to avoid the endless sleep I've extracted task2 to run synchronously. I don't know whether it's a good practice, but it's simple and fits perfectly in my scenario.
from concurrent.futures import ThreadPoolExecutor
executor = ThreadPoolExecutor(max_workers=1)
executor.submit(task1)
task2()
executor.shutdown(wait=True)
Maybe it's useful to someone else.
