Python concurrency: first result ends waiting for not-yet-done results - python

What I want to do is move on after the first True result, not caring about the not-yet-finished I/O-bound tasks. In the case below, two() is the first and only task that returns True, so the program should run like this:
Second
Move on...
NOT:
Second
First
Third
Move on...
import concurrent.futures
import time
def one():
    time.sleep(2)
    print('First')
    return False
def two():
    time.sleep(1)
    print('Second')
    return True
def three():
    time.sleep(4)
    print('Third')
    return False
tasks = [one, two, three]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    for t in range(len(tasks)):
        executor.submit(tasks[t])
print('Move on...')

A with statement is not what you want here, because it waits for all submitted jobs to finish. You need to submit the tasks, as you already do, but then call as_completed and stop waiting as soon as the first task returns True:
executor = concurrent.futures.ThreadPoolExecutor()
futures = [executor.submit(t) for t in tasks]
for f in concurrent.futures.as_completed(futures):
    if f.result():
        break
print('Move on...')

The problem with concurrent.futures.ThreadPoolExecutor is that once tasks are submitted, they will run to completion, so the program will print 'Move on...', but if there is in fact nothing else to do, it will not terminate until functions one and three complete (and print their messages). So the program is guaranteed to run for at least 4 seconds.
It is better to use the ThreadPool class from the multiprocessing.pool module, which supports a terminate method that kills all outstanding tasks. The closest thing to as_completed would probably be the imap_unordered method, but that requires a single worker function to be used for all three tasks. Instead, we can use apply_async with a callback function that is invoked as each result becomes available:
from multiprocessing.pool import ThreadPool
import time
from threading import Event
def one():
    time.sleep(2)
    print('First')
    return False
def two():
    time.sleep(1)
    print('Second')
    return True
def three():
    time.sleep(4)
    print('Third')
    return False
def my_callback(result):
    if result:
        executor.terminate()  # kill all other tasks
        done_event.set()
tasks = [one, two, three]
executor = ThreadPool(3)
done_event = Event()
for t in tasks:
    executor.apply_async(t, callback=my_callback)
done_event.wait()
print("Moving on ...")

Related

How to prevent multiple threads from picking up same task from queue

I want to run multiple threads in parallel. Each thread picks up a task from a task queue and executes that task.
from threading import Thread
from Queue import Queue
import time
class link(object):
    def __init__(self, i):
        self.name = str(i)
def run_jobs_in_parallel(consumer_func, jobs, results, thread_count,
                         async_run=False):
    def consume_from_queue(jobs, results):
        while not jobs.empty():
            job = jobs.get()
            try:
                results.append(consumer_func(job))
            except Exception as e:
                print str(e)
                results.append(False)
            finally:
                jobs.task_done()
    # start worker threads
    if jobs.qsize() < thread_count:
        thread_count = jobs.qsize()
    for tc in range(1, thread_count + 1):
        worker = Thread(
            target=consume_from_queue,
            name="worker_{0}".format(str(tc)),
            args=(jobs, results,))
        worker.start()
    if not async_run:
        jobs.join()
def create_link(link):
    print str(link.name)
    time.sleep(10)
    return True
def consumer_func(link):
    return create_link(link)
# create_link takes a while to execute
jobs = Queue()
results = list()
for i in range(0, 10):
    jobs.put(link(i))
run_jobs_in_parallel(consumer_func, jobs, results, 25, async_run=False)
Now what is happening is: let's say we have 10 link objects in the jobs queue; while the threads are running in parallel, multiple threads end up executing the same task. How can I prevent this from happening?
Note: the above sample code does not exhibit the problem described above, but I have exactly the same code, except that the create_link method does some complex stuff.
I think what you need is a lock object (docs, tutorial + examples). If you create an instance of such an object, you can 'lock' some parts of your code, ensuring that only one thread executes that part at a time.
I guess in your case you want to lock the line job = jobs.get().
First you have to create the lock in a scope where all threads have access to it. (You don't want a lock for every thread, but a single lock shared by all your threads. That means creating the lock inside your thread function just before acquiring it won't work.)
import threading
lock = threading.Lock()
then you can use it on your line like:
lock.acquire()
job = jobs.get()
lock.release()
or
with lock:
    job = jobs.get()
The first thread to reach acquire() will lock the lock. Other threads that try to acquire() the lock will pause until the lock is unlocked again by a call to release().
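Applied to the worker loop from the question, that could look roughly like the sketch below. This is not the asker's exact code: the jobs.empty() check is moved inside the lock so that the check and the get() happen as one atomic step, and consumer_func is the function defined in the question.
import threading

lock = threading.Lock()  # one lock shared by every worker thread

def consume_from_queue(jobs, results):
    while True:
        with lock:
            # Check and fetch under the same lock so no two threads can
            # grab the same job between the empty() test and the get().
            if jobs.empty():
                return
            job = jobs.get()
        try:
            results.append(consumer_func(job))
        except Exception:
            results.append(False)
        finally:
            jobs.task_done()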

How do I wait for ThreadPoolExecutor.map to finish

I have the following code, which has been simplified:
import concurrent.futures
pool = concurrent.futures.ThreadPoolExecutor(8)
def _exec(x):
    return x + x
myfuturelist = pool.map(_exec, [x for x in range(5)])
# How do I wait for my futures to finish?
for result in myfuturelist:
    # Is this how it's done?
    print(result)
# ... stuff that should happen only after myfuturelist is
# completely resolved.
# Documentation says pool.map is asynchronous
The documentation is weak regarding ThreadPoolExecutor.map. Help would be great.
Thanks!
The call to ThreadPoolExecutor.map does not block until all of its tasks are complete. Use wait to do this.
from concurrent.futures import wait, ALL_COMPLETED
...
futures = [pool.submit(fn, args) for args in arg_list]
wait(futures, timeout=whatever, return_when=ALL_COMPLETED) # ALL_COMPLETED is actually the default
do_other_stuff()
You could also call list(results) on the generator returned by pool.map to force the evaluation (which is what you're doing in your original example). If you're not actually using the values returned from the tasks, though, wait is the way to go.
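As a runnable illustration of the wait() approach (the _exec helper from the question is reused here; fn, arg_list, and whatever in the snippet above are placeholders):
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

def _exec(x):
    return x + x

pool = ThreadPoolExecutor(8)
futures = [pool.submit(_exec, x) for x in range(5)]

# Blocks until every future has finished (ALL_COMPLETED is the default).
wait(futures, return_when=ALL_COMPLETED)

print([f.result() for f in futures])  # [0, 2, 4, 6, 8]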
It's true that Executor.map() will not wait for all futures to finish, because it returns a lazy iterator, as @MisterMiyagi said.
But we can accomplish this by using with:
import time
from concurrent.futures import ThreadPoolExecutor
def hello(i):
    time.sleep(i)
    print(i)
with ThreadPoolExecutor(max_workers=2) as executor:
    executor.map(hello, [1, 2, 3])
print("finish")
# output
# 1
# 2
# 3
# finish
As you can see, finish is printed after 1, 2, 3. It works because Executor has an __exit__() method, whose code is:
def __exit__(self, exc_type, exc_val, exc_tb):
    self.shutdown(wait=True)
    return False
The shutdown method of ThreadPoolExecutor is:
def shutdown(self, wait=True, *, cancel_futures=False):
    with self._shutdown_lock:
        self._shutdown = True
        if cancel_futures:
            # Drain all work items from the queue, and then cancel their
            # associated futures.
            while True:
                try:
                    work_item = self._work_queue.get_nowait()
                except queue.Empty:
                    break
                if work_item is not None:
                    work_item.future.cancel()
        # Send a wake-up to prevent threads calling
        # _work_queue.get(block=True) from permanently blocking.
        self._work_queue.put(None)
    if wait:
        for t in self._threads:
            t.join()
shutdown.__doc__ = _base.Executor.shutdown.__doc__
So by using with, we can get the ability to wait until all futures finish.
Executor.map will run the jobs in parallel, wait for the futures to finish, collect the results, and return a generator. It has done the wait for you. If you set a timeout, it will wait until the timeout and throw an exception in the generator.
map(func, *iterables, timeout=None, chunksize=1)
the iterables are collected immediately rather than lazily;
func is executed asynchronously and several calls to func may be made concurrently.
To get a list of futures and do the wait manually, you can use:
myfuturelist = [pool.submit(_exec, x) for x in range(5)]
Executor.submit will return a future object; calling result on a future will explicitly wait for it to finish:
myfuturelist[0].result()  # wait for the 1st future to finish and return its result

Return from function if execution finished within timeout or make callback otherwise

I have a project in Python 3.5 without any usage of asynchronous features. I have to implement the following logic:
def should_return_in_3_sec(some_serious_job, arguments, finished_callback):
    # Start some_serious_job(*arguments) in a task
    # if it finishes within 3 sec:
    #     return result immediately
    # otherwise return None, but do not terminate task.
    # If the task finishes in 1 minute:
    #     call finished_callback(result)
    # else:
    #     call finished_callback(None)
    pass
The function should_return_in_3_sec() should remain synchronous, but it is up to me to write any new asynchronous code (including some_serious_job()).
What is the most elegant and pythonic way to do it?
Fork off a thread doing the serious job, let it write its result into a queue and then terminate. Read from that queue in your main thread with a timeout of three seconds. If the timeout occurs, start another thread and return None. Let the second thread read from the queue with a timeout of one minute; if that times out as well, call finished_callback(None); otherwise call finished_callback(result).
I sketched it like this:
import threading, queue
def should_return_in_3_sec(some_serious_job, arguments, finished_callback):
    result_queue = queue.Queue(1)
    def do_serious_job_and_deliver_result():
        result = some_serious_job(arguments)
        result_queue.put(result)
    threading.Thread(target=do_serious_job_and_deliver_result).start()
    try:
        result = result_queue.get(timeout=3)
    except queue.Empty:  # timeout?
        def expect_and_handle_late_result():
            try:
                result = result_queue.get(timeout=60)
            except queue.Empty:
                finished_callback(None)
            else:
                finished_callback(result)
        threading.Thread(target=expect_and_handle_late_result).start()
        return None
    else:
        return result
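To illustrate how the sketch behaves, a quick driver might look like this; slow_job and on_late_result are names invented for the example, not part of the original answer:
import time

def slow_job(seconds):
    time.sleep(seconds)
    return seconds

def on_late_result(result):
    print("late callback got:", result)

# Finishes within 3 seconds: the result comes back directly, no callback.
print(should_return_in_3_sec(slow_job, 1, on_late_result))   # -> 1

# Takes 10 seconds: returns None after ~3 s; on_late_result(10) fires later.
print(should_return_in_3_sec(slow_job, 10, on_late_result))  # -> None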
The threading module has some simple timeout options, see Thread.join(timeout) for example.
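For example, Thread.join(timeout) simply returns once the timeout expires, so you have to check is_alive() afterwards to see whether the thread actually finished:
import threading
import time

t = threading.Thread(target=time.sleep, args=(10,))
t.start()
t.join(timeout=3)      # gives up waiting after 3 seconds
print(t.is_alive())    # True: the thread is still running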
If you do choose to use asyncio, below is a partial solution that addresses some of your needs:
import asyncio
import time
async def late_response(task, flag, timeout, callback):
    done, pending = await asyncio.wait([task], timeout=timeout)
    callback(done.pop().result() if done else None)  # will raise an exception if some_serious_job failed
    flag[0] = True  # signal some_serious_job to stop
    return await task
async def launch_job(loop, some_serious_job, arguments, finished_callback,
                     timeout_1=3, timeout_2=5):
    flag = [False]
    task = loop.run_in_executor(None, some_serious_job, flag, *arguments)
    done, pending = await asyncio.wait([task], timeout=timeout_1)
    if done:
        return done.pop().result()  # will raise an exception if some_serious_job failed
    asyncio.ensure_future(
        late_response(task, flag, timeout_2, finished_callback))
    return None
def f(flag, n):
    for i in range(n):
        print("serious", i, flag)
        if flag[0]:
            return "CANCELLED"
        time.sleep(1)
    return "OK"
def finished(result):
    print("FINISHED", result)
loop = asyncio.get_event_loop()
result = loop.run_until_complete(launch_job(loop, f, [1], finished))
print("result:", result)
loop.run_forever()
This will run the job in a separate thread (use loop.set_default_executor(ProcessPoolExecutor()) to run a CPU-intensive task in a process instead). Keep in mind that it is bad practice to terminate a process/thread from outside - the code above uses a very simple list to signal the thread to stop (see also threading.Event / multiprocessing.Event).
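If you prefer threading.Event over the bare list, the flag could be swapped out as in this sketch (not part of the original answer; only the worker and the flag handling change):
import threading
import time

stop_event = threading.Event()

def f(stop_event, n):
    for i in range(n):
        print("serious", i)
        if stop_event.is_set():   # replaces the flag[0] check
            return "CANCELLED"
        time.sleep(1)
    return "OK"

# launch_job would then pass stop_event instead of flag, and late_response
# would call stop_event.set() instead of setting flag[0] = True.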
While implementing your solution, you might discover you want to modify your existing code to use coroutines instead of threads.

EC2 Spot Instance Termination & Python 2.7

I know that the termination notice is made available via the metadata URL and that I can do something similar to
if requests.get("http://169.254.169.254/latest/meta-data/spot/termination-time").status_code == 200
in order to determine if the notice has been posted. I run a Python service on my Spot Instances that:
Loops over long polling SQS Queues
If it gets a message, it pauses polling and works on the payload.
Working on the payload can take 5-50 minutes.
Working on the payload will involve spawning a threadpool of up to 50 threads to handle parallel uploading of files to S3, this is the majority of the time spent working on the payload.
Finally, remove the message from the queue, rinse, repeat.
The work is idempotent, so if the same payload runs multiple times, I'm out the processing time/costs, but will not negatively impact the application workflow.
I'm searching for an elegant way to now also poll for the termination notice every five seconds in the background. As soon as the termination notice appears, I'd like to immediately release the message back to the SQS queue in order for another instance to pick it up as quickly as possible.
As a bonus, I'd like to shut down the work, kill off the threadpool, and have the service enter a stasis state. If I terminate the service, supervisord will simply start it back up again.
Even bigger bonus! Is there not a python module available that simplifies this and just works?
I wrote this code to demonstrate how a thread can be used to poll for the Spot instance termination. It first starts up a polling thread, which is responsible for checking the HTTP endpoint.
Then we create a pool of fake workers (mimicking real work to be done) and start running the pool. Eventually the polling thread kicks in (about 10 seconds into execution as implemented) and kills the whole thing.
To prevent the script from continuing to work after Supervisor restarts it, we would simply put a check at the beginning of __main__: if the termination notice is there, sleep for 2.5 minutes, which is longer than the notice lasts before the instance is shut down (a sketch of that check follows the listing below).
#!/usr/bin/env python
import threading
import Queue
import random
import time
import sys
import os
class Instance_Termination_Poll(threading.Thread):
    """
    Sleep for 5 seconds and eventually pretend that we then receive the
    termination event
    if requests.get("http://169.254.169.254/latest/meta-data/spot/termination-time").status_code == 200
    """
    def run(self):
        print("Polling for termination")
        while True:
            for i in range(30):
                time.sleep(5)
                if i == 2:
                    print("Receive Termination Poll!")
                    print("Pretend we returned the message to the queue.")
                    print("Now Kill the entire program.")
                    os._exit(1)
            print("Well now, this is embarrassing!")
class ThreadPool:
    """
    Pool of threads consuming tasks from a queue
    """
    def __init__(self, num_threads):
        self.num_threads = num_threads
        self.errors = Queue.Queue()
        self.tasks = Queue.Queue(self.num_threads)
        for _ in range(num_threads):
            Worker(self.tasks, self.errors)
    def add_task(self, func, *args, **kargs):
        """
        Add a task to the queue
        """
        self.tasks.put((func, args, kargs))
    def wait_completion(self):
        """
        Wait for completion of all the tasks in the queue
        """
        try:
            while True:
                if self.tasks.empty() == False:
                    time.sleep(10)
                else:
                    break
        except KeyboardInterrupt:
            print "Ctrl-c received! Kill it all with Prejudice..."
            os._exit(1)
        self.tasks.join()
class Worker(threading.Thread):
    """
    Thread executing tasks from a given tasks queue
    """
    def __init__(self, tasks, error_queue):
        threading.Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.errors = error_queue
        self.start()
    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception, e:
                print("Exception " + str(e))
                error = {'exception': e}
                self.errors.put(error)
            self.tasks.task_done()
def do_work(n):
    """
    Sleeps a random amount of time, then creates a little CPU usage to
    mimic some work taking place.
    """
    for z in range(100):
        time.sleep(random.randint(3, 10))
        print "Thread ID: {} working.".format(threading.current_thread())
        for x in range(30000):
            x * n
        print "Thread ID: {} done, sleeping.".format(threading.current_thread())
if __name__ == '__main__':
    num_threads = 30
    # Start up the termination polling thread
    term_poll = Instance_Termination_Poll()
    term_poll.start()
    # Create our threadpool
    pool = ThreadPool(num_threads)
    for y in range(num_threads * 2):
        pool.add_task(do_work, n=y)
    # Wait for the threadpool to complete
    pool.wait_completion()
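The start-up check described above is not in the listing itself; under the same assumptions (Python 2.7, the requests library, the standard metadata URL, and a termination notice that lasts about two minutes), it might look something like this:
import sys
import time
import requests

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"

def termination_notice_posted():
    # A 200 response means the termination time has been published for this instance.
    try:
        return requests.get(TERMINATION_URL, timeout=1).status_code == 200
    except requests.exceptions.RequestException:
        return False

if __name__ == '__main__':
    if termination_notice_posted():
        print("Termination notice found, idling instead of taking new work.")
        time.sleep(150)  # 2.5 minutes, longer than the notice window
        sys.exit(0)
    # ...otherwise continue with the normal start-up shown in the listing above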

threading.Condition.wait(timeout) ignores threading.Condition.notify()

I have an application that uses 2 threads. I want to be able to shut down both threads by waiting for a condition variable exitCondition. I am using Python 3.3, which, unlike Python 2.7, makes threading.Condition.wait() return True when the condition was notified and False when a timeout occurred.
#!/usr/bin/python
import threading
from time import sleep
exitCondition = threading.Condition()
def inputActivity():
    while True:
        exitCondition.acquire()
        exitConditionReached = exitCondition.wait(.1)  #<-critical
        print(exitConditionReached)
        exitCondition.release()
        if exitConditionReached:  #exitCondition reached -> shutdown
            return
        else:  #exitCondition not reached -> do work
            sleep(.1)
inThread = threading.Thread(target=inputActivity)
inThread.start()
sleep(.2)  #<-critical
exitCondition.acquire()
exitCondition.notify()
print("exitCondition notified")
exitCondition.release()
inThread.join()
There are two lines marked with a #<-critical comment. If the sleeps are "misaligned" (for example .25 and .1) the program will terminate. If the sleeps are "aligned" (for example .2 and .1) the inThread will run indefinitely, printing False forever. It looks like a race condition to me; apparently, if notify is called at the same time as wait, the notification is not recognized. I was under the impression that the exitCondition.acquire() and exitCondition.release() were supposed to prevent that. The question is why the condition variable is not thread safe and what I can do about it. Ideally I want to write wait(0) with the guarantee that no notification will be swallowed.
If the call to exitCondition.notify occurs while the worker thread is doing work (i.e., is in the sleep(.1) call, or anywhere else other than the .wait call), then the behaviour you describe sounds like exactly what I'd expect. The wait call returns True only if the notification happened during the wait.
It sounds to me as though this is a use-case for a threading.Event instead of a threading.Condition: replace threading.Condition with threading.Event, replace the notify call with a set call, and remove the acquire and release calls altogether (in both threads).
That is, the code should look like this:
#!/usr/bin/python
import threading
from time import sleep
exitCondition = threading.Event()
def inputActivity():
    while True:
        exitConditionReached = exitCondition.wait(.1)  #<-critical
        print(exitConditionReached)
        if exitConditionReached:  #exitCondition reached -> shutdown
            return
        else:  #exitCondition not reached -> do work
            sleep(.1)
inThread = threading.Thread(target=inputActivity)
inThread.start()
sleep(.2)  #<-critical
exitCondition.set()
print("exitCondition set")
inThread.join()
Once you've got that far, you don't need the first .wait: you can replace that with a direct is_set call to see if the exit condition has been set yet:
#!/usr/bin/python
import threading
from time import sleep
exitCondition = threading.Event()
def inputActivity():
    while True:
        if exitCondition.is_set():  #exitCondition reached -> shutdown
            return
        else:  #exitCondition not reached -> do work
            sleep(.1)
inThread = threading.Thread(target=inputActivity)
inThread.start()
sleep(.2)  #<-critical
exitCondition.set()
print("exitCondition set")
inThread.join()
