How to let a Python thread finish gracefully

I'm doing a project involving data collection and logging. I have 2 threads running, a collection thread and a logging thread, both started in main. I'm trying to allow the program to be terminated gracefully with Ctrl-C.
I'm using a threading.Event to signal to the threads to end their respective loops. It works fine to stop the sim_collectData method, but it doesn't seem to be properly stopping the logData thread. The Collection terminated print statement is never executed, and the program just stalls. (It doesn't end, just sits there).
The second while loop in logData is to make sure everything in the queue is logged. The goal is for Ctrl-C to stop the collection thread immediately, then allow the logging thread to finish emptying the queue, and only then fully terminate the program. (Right now, the data is just being printed out - eventually it's going to be logged to a database).
I don't understand why the second thread never terminates. I'm basing what I've done on this answer: Stopping a thread after a certain amount of time. What am I missing?
import random
import threading
import time
import Queue

def sim_collectData(input_queue, stop_event):
    ''' this provides some output simulating the serial
    data from the data logging hardware.
    '''
    n = 0
    while not stop_event.is_set():
        input_queue.put("DATA: <here are some random data> " + str(n))
        stop_event.wait(random.randint(0,5))
        n += 1
    print "Terminating data collection..."
    return

def logData(input_queue, stop_event):
    n = 0
    # we *don't* want to loop based on queue size because the queue could
    # theoretically be empty while waiting on some data.
    while not stop_event.is_set():
        d = input_queue.get()
        if d.startswith("DATA:"):
            print d
        input_queue.task_done()
        n += 1
    # if the stop event is received and the previous loop terminates,
    # finish logging the rest of the items in the queue.
    print "Collection terminated. Logging remaining data to database..."
    while not input_queue.empty():
        d = input_queue.get()
        if d.startswith("DATA:"):
            print d
        input_queue.task_done()
        n += 1
    return

def main():
    input_queue = Queue.Queue()
    stop_event = threading.Event() # used to signal termination to the threads
    print "Starting data collection thread...",
    collection_thread = threading.Thread(target=sim_collectData, args=(input_queue, stop_event))
    collection_thread.start()
    print "Done."
    print "Starting logging thread...",
    logging_thread = threading.Thread(target=logData, args=(input_queue, stop_event))
    logging_thread.start()
    print "Done."
    try:
        while True:
            time.sleep(10)
    except (KeyboardInterrupt, SystemExit):
        # stop data collection. Let the logging thread finish logging everything in the queue
        stop_event.set()

main()

The problem is that your logger is waiting on d = input_queue.get() and will not check the event. One solution is to skip the event completely and invent a unique message that tells the logger to stop. When you get a signal, send that message to the queue.
import threading
import Queue
import random
import time

def sim_collectData(input_queue, stop_event):
    ''' this provides some output simulating the serial
    data from the data logging hardware.
    '''
    n = 0
    while not stop_event.is_set():
        input_queue.put("DATA: <here are some random data> " + str(n))
        stop_event.wait(random.randint(0,5))
        n += 1
    print "Terminating data collection..."
    input_queue.put(None)
    return

def logData(input_queue):
    n = 0
    # we *don't* want to loop based on queue size because the queue could
    # theoretically be empty while waiting on some data.
    while True:
        d = input_queue.get()
        if d is None:
            input_queue.task_done()
            return
        if d.startswith("DATA:"):
            print d
        input_queue.task_done()
        n += 1

def main():
    input_queue = Queue.Queue()
    stop_event = threading.Event() # used to signal termination to the threads
    print "Starting data collection thread...",
    collection_thread = threading.Thread(target=sim_collectData, args=(input_queue, stop_event))
    collection_thread.start()
    print "Done."
    print "Starting logging thread...",
    logging_thread = threading.Thread(target=logData, args=(input_queue,))
    logging_thread.start()
    print "Done."
    try:
        while True:
            time.sleep(10)
    except (KeyboardInterrupt, SystemExit):
        # stop data collection. Let the logging thread finish logging everything in the queue
        stop_event.set()

main()
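For readers on Python 3, here is a minimal sketch of the same sentinel idea. The module is `queue` rather than `Queue`. The names `SENTINEL`, `collector`, `logger` and `run` are hypothetical. Using a dedicated `object()` instead of `None` assumes `None` might one day be a legitimate data item. The sentinel is sent even after an early stop, so the blocking `get()` in the logger always wakes up.

```python
import queue
import threading

SENTINEL = object()  # hypothetical unique end-of-stream marker


def collector(q, stop_event, items):
    """Put items on the queue until told to stop, then send the sentinel."""
    for item in items:
        if stop_event.is_set():
            break
        q.put(item)
    q.put(SENTINEL)  # always wake the logger, even after an early stop


def logger(q, logged):
    """Consume until the sentinel arrives; everything queued before it is logged."""
    while True:
        d = q.get()  # a blocking get is fine: the sentinel guarantees a wake-up
        try:
            if d is SENTINEL:
                return
            logged.append(d)
        finally:
            q.task_done()  # runs for data items and for the sentinel


def run(items, stop_early=False):
    """Drive one collector and one logger; return what the logger saw."""
    q = queue.Queue()
    stop_event = threading.Event()
    if stop_early:
        stop_event.set()  # simulate Ctrl-C before any data is collected
    logged = []
    producer = threading.Thread(target=collector, args=(q, stop_event, items))
    consumer = threading.Thread(target=logger, args=(q, logged))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()  # returns only once the sentinel has been consumed
    q.join()         # every get() was matched by a task_done()
    return logged
```

Because the logger only exits on the sentinel, and the sentinel is always the last item the collector puts, nothing queued before the stop is lost.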

I'm not an expert in threading, but in your logData function the first d=input_queue.get() is blocking, i.e., if the queue is empty it will sit and wait forever until a queue message is received. This is likely why the logData thread never terminates: it's sitting waiting forever for a queue message.
Refer to the Python docs for Queue.get to change this to a non-blocking queue read: use .get(False) or .get_nowait(). Either one raises Queue.Empty when the queue is empty, so you will need to catch that exception.
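Here is a minimal Python 3 sketch of the non-blocking approach. It keeps the question's stop_event design. The loop polls with get_nowait(), sleeps briefly on queue.Empty, and drains whatever is left once the event is set. `POLL_SECONDS` and the helper `log_one` are hypothetical. Collecting into a `logged` list stands in for printing or writing to the database.

```python
import queue
import threading
import time

POLL_SECONDS = 0.05  # hypothetical polling interval


def log_one(input_queue, logged):
    """Log one item if one is available; return False when the queue is empty."""
    try:
        d = input_queue.get_nowait()
    except queue.Empty:
        return False
    if d.startswith("DATA:"):
        logged.append(d)
    input_queue.task_done()
    return True


def log_data(input_queue, stop_event, logged):
    while not stop_event.is_set():
        if not log_one(input_queue, logged):
            time.sleep(POLL_SECONDS)  # nothing yet; avoid a busy loop
    # Stop requested: log whatever the collector managed to enqueue.
    while log_one(input_queue, logged):
        pass
```

The trade-off is latency versus CPU use. A smaller `POLL_SECONDS` reacts faster but wakes the thread more often.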

You are calling a blocking get on your input_queue with no timeout. In either section of logData, if you call input_queue.get() and the queue is empty, it will block indefinitely, preventing the logging_thread from reaching completion.
To fix, you will want to call input_queue.get_nowait() or pass a timeout to input_queue.get().
Here is my suggestion:
def logData(input_queue, stop_event):
    n = 0
    while not stop_event.is_set():
        try:
            d = input_queue.get_nowait()
            if d.startswith("DATA:"):
                print "LOG: " + d
                n += 1
        except Queue.Empty:
            time.sleep(1)
    return
You are also signalling the threads to terminate, but not waiting for them to do so. Consider doing this in your main function.
try:
    while True:
        time.sleep(10)
except (KeyboardInterrupt, SystemExit):
    stop_event.set()
    collection_thread.join()
    logging_thread.join()
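The timeout variant combined with the join() advice might look like the following Python 3 sketch. `TIMEOUT_SECONDS`, `log_data` and `shutdown` are hypothetical names. Here the logger only exits after a get() times out while the stop event is set. That means items still in the queue at shutdown are logged first.

```python
import queue
import threading

TIMEOUT_SECONDS = 0.1  # hypothetical: how often the logger re-checks the event


def log_data(input_queue, stop_event, logged):
    while True:
        try:
            d = input_queue.get(timeout=TIMEOUT_SECONDS)
        except queue.Empty:
            if stop_event.is_set():
                return  # stop requested and nothing left to log
            continue    # just a quiet period: keep waiting
        logged.append(d)
        input_queue.task_done()


def shutdown(stop_event, *threads):
    """Signal every thread to stop, then wait for each one to finish."""
    stop_event.set()
    for t in threads:
        t.join()
```

In the question's main(), the except branch would call shutdown(stop_event, collection_thread, logging_thread).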

Based on tdelaney's answer, I created an iterator-based approach. The iterator exits when the termination message is encountered. I also added a counter of how many get calls are currently blocking, and a stop method that sends just as many termination messages. To prevent a race condition between incrementing and reading the counter, I'm setting a stopping bit there. Furthermore, I don't use None as the termination message, because it cannot necessarily be compared to other data types when using a PriorityQueue.
There are two restrictions that I had no need to eliminate. First, the stop method waits until the queue is empty before shutting down the threads. Second, I did not add any code to make the queue reusable after stop. The latter can probably be added quite easily, while the former requires being careful about concurrency and the context in which the code is used.
You have to decide whether you want stop to also wait for all the termination messages to be consumed. I chose to put the necessary join there, but you may just remove it.
So this is the code:
import threading, queue
from functools import total_ordering

@total_ordering
class Final:
    def __repr__(self):
        return "∞"
    def __lt__(self, other):
        return False
    def __eq__(self, other):
        return isinstance(other, Final)

Infty = Final()

class IterQueue(queue.Queue):
    def __init__(self):
        self.lock = threading.Lock()
        self.stopped = False
        self.getters = 0
        super().__init__()
    def __iter__(self):
        return self
    def get(self):
        raise NotImplementedError("This queue may only be used as an iterator.")
    def __next__(self):
        with self.lock:
            if self.stopped:
                raise StopIteration
            self.getters += 1
        data = super().get()
        if data == Infty:
            self.task_done()
            raise StopIteration
        with self.lock:
            self.getters -= 1
        return data
    def stop(self):
        self.join()
        self.stopped = True
        with self.lock:
            for i in range(self.getters):
                self.put(Infty)
        self.join()

class IterPriorityQueue(IterQueue, queue.PriorityQueue):
    pass
Oh, and I wrote this in Python 3.2. After backporting to Python 2, it becomes:
import threading, Queue
from functools import total_ordering

@total_ordering
class Final:
    def __repr__(self):
        return "Infinity"
    def __lt__(self, other):
        return False
    def __eq__(self, other):
        return isinstance(other, Final)

Infty = Final()

class IterQueue(Queue.Queue, object):
    def __init__(self):
        self.lock = threading.Lock()
        self.stopped = False
        self.getters = 0
        super(IterQueue, self).__init__()
    def __iter__(self):
        return self
    def get(self):
        raise NotImplementedError("This queue may only be used as an iterator.")
    def next(self):
        with self.lock:
            if self.stopped:
                raise StopIteration
            self.getters += 1
        data = super(IterQueue, self).get()
        if data == Infty:
            self.task_done()
            raise StopIteration
        with self.lock:
            self.getters -= 1
        return data
    def stop(self):
        self.join()
        self.stopped = True
        with self.lock:
            for i in range(self.getters):
                self.put(Infty)
        self.join()

class IterPriorityQueue(IterQueue, Queue.PriorityQueue):
    pass
You would use it as follows:
import random
import time

def sim_collectData(input_queue, stop_event):
    ''' this provides some output simulating the serial
    data from the data logging hardware.
    '''
    n = 0
    while not stop_event.is_set():
        input_queue.put("DATA: <here are some random data> " + str(n))
        stop_event.wait(random.randint(0,5))
        n += 1
    print "Terminating data collection..."
    return

def logData(input_queue):
    n = 0
    # we *don't* want to loop based on queue size because the queue could
    # theoretically be empty while waiting on some data.
    for d in input_queue:
        if d.startswith("DATA:"):
            print d
        input_queue.task_done()
        n += 1

def main():
    input_queue = IterQueue()
    stop_event = threading.Event() # used to signal termination to the threads
    print "Starting data collection thread...",
    collection_thread = threading.Thread(target=sim_collectData, args=(input_queue, stop_event))
    collection_thread.start()
    print "Done."
    print "Starting logging thread...",
    logging_thread = threading.Thread(target=logData, args=(input_queue,))
    logging_thread.start()
    print "Done."
    try:
        while True:
            time.sleep(10)
    except (KeyboardInterrupt, SystemExit):
        # stop data collection. Let the logging thread finish logging everything in the queue
        stop_event.set()
        collection_thread.join()  # no item can arrive after stop() has queued the terminators
        input_queue.stop()

main()
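The reason for not using None as the termination message is easy to check in Python 3. None has no ordering relative to other types, so a PriorityQueue cannot place it in its heap. A sentinel that always sorts last, like the Final class above, avoids this. The following is a sketch. The class `_Last`, the constant `LAST` and the helper `drain_in_priority_order` are hypothetical. Unlike Final, `_Last` also defines `__hash__` so it stays hashable.

```python
import functools
import queue


@functools.total_ordering
class _Last:
    """Hypothetical sentinel that compares greater than any other value."""

    def __eq__(self, other):
        return isinstance(other, _Last)

    def __lt__(self, other):
        return False  # never smaller than anything

    def __hash__(self):
        return hash(_Last)


LAST = _Last()


def drain_in_priority_order(items):
    """Queue the items plus the sentinel and return the items in priority order."""
    pq = queue.PriorityQueue()
    pq.put(LAST)  # put first on purpose: it must still come out last
    for item in items:
        pq.put(item)
    out = []
    while True:
        item = pq.get()
        if item == LAST:
            return out
        out.append(item)
```

When an int is compared with LAST, int's own comparison returns NotImplemented. Python then falls back to the reflected method that total_ordering generated on `_Last`, so the sentinel ends up greater than every int.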

Related

RuntimeError: reentrant call inside <_io.BufferedWriter name='<stdout>'>

I'm writing a program which starts one thread to generate "work" and add it to a queue every N seconds. Then, I have a thread pool which processes items in the queue.
The program below works perfectly fine until I comment out/delete line #97 (time.sleep(0.5) in the main function). Once I do that, it generates a RuntimeError when attempting to gracefully stop the program (by sending a SIGINT or SIGTERM to the main process). It even works fine with an extremely small sleep like 0.1s, but has an issue with none at all.
I tried researching "reentrancy" but it went a bit over my head unfortunately.
Can anyone help me to understand this?
Code:
import random
import signal
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor
from datetime import datetime
from queue import Empty, Queue, SimpleQueue
from typing import Any

class UniqueQueue:
    """
    A thread safe queue which can only ever contain unique items.
    """
    def __init__(self) -> None:
        self._q = Queue()
        self._items = []
        self._l = threading.Lock()

    def get(self, block: bool = False, timeout: float | None = None) -> Any:
        with self._l:
            try:
                item = self._q.get(block=block, timeout=timeout)
            except Empty:
                raise
            else:
                self._items.pop(0)
                return item

    def put(self, item: Any, block: bool = False, timeout: float | None = None) -> None:
        with self._l:
            if item in self._items:
                return None
            self._items.append(item)
            self._q.put(item, block=block, timeout=timeout)

    def size(self) -> int:
        return self._q.qsize()

    def empty(self) -> bool:
        return self._q.empty()

def stop_app(sig_num, sig_frame) -> None:
    # global stop_app_event
    print("Signal received to stop the app")
    stop_app_event.set()

def work_generator(q: UniqueQueue) -> None:
    last_execution = time.time()
    is_first_execution = True
    while not stop_app_event.is_set():
        elapsed_seconds = int(time.time() - last_execution)
        if elapsed_seconds <= 10 and not is_first_execution:
            time.sleep(0.5)
            continue
        last_execution = time.time()
        is_first_execution = False
        print("Generating work...")
        for _ in range(100):
            q.put({"n": random.randint(0, 500)})

def print_work(w) -> None:
    print(f"{datetime.now()}: {w}")

def main():
    # Create a work queue
    work_queue = UniqueQueue()
    # Create a thread to generate the work and add to the queue
    t = threading.Thread(target=work_generator, args=(work_queue,))
    t.start()
    # Create a thread pool, get work from the queue, and submit to the pool for processing
    pool = ThreadPoolExecutor(max_workers=20)
    futures: list[Future] = []
    while True:
        print("Processing work...")
        if stop_app_event.is_set():
            print("stop_app_event is set:", stop_app_event.is_set())
            for future in futures:
                future.cancel()
            break
        print("Queue Size:", work_queue.size())
        try:
            while not work_queue.empty():
                work = work_queue.get()
                future = pool.submit(print_work, work)
                futures.append(future)
        except Empty:
            pass
        time.sleep(0.5)  # the sleep referred to as "line #97" above
    print("Stopping the work generator thread...")
    t.join(timeout=10)
    print("Work generator stopped")
    print("Stopping the thread pool...")
    pool.shutdown(wait=True)
    print("Thread pool stopped")

if __name__ == "__main__":
    stop_app_event = threading.Event()
    # signal.signal() takes its arguments positionally
    signal.signal(signal.SIGINT, stop_app)
    signal.signal(signal.SIGTERM, stop_app)
    main()
It's because you called print() in the signal handler, stop_app().
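In C, a signal handler runs asynchronously and interrupts whatever the receiving thread happens to be doing. In Python, the low-level C handler only sets a flag. Your Python handler is then executed in the main thread at a later point (see the reference), which can be while a C-level call such as a buffered write is still in progress. In your case, without the sleep the main thread is almost always inside a print() call when the signal arrives. The handler's own print() then re-enters the same buffered sys.stdout writer while it is still in use, so the term 'reentrant' fits perfectly. The current I/O stack detects this and raises RuntimeError instead of allowing the reentrant call (see the implementation if you are interested).
You can remedy this by writing straight to the underlying file descriptor with os.write(), which bypasses the buffered sys.stdout object, like the following.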
import sys
import os
...
def stop_app(sig_num, sig_frame):
    os.write(sys.stdout.fileno(), b"Signal received to stop the app\n")
    stop_app_event.set()
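Another option avoids doing any I/O in the handler at all. The handler only sets the event, and the main loop prints the message once it notices the event, for example right after its `while` loop ends. A minimal sketch, where `make_handler` and `install` are hypothetical helpers:

```python
import signal
import threading


def make_handler(stop_event):
    """Return a signal handler that only records the request to stop."""
    def handler(sig_num, sig_frame):
        # No print() here: the handler just sets the event, and the main
        # thread reports "Signal received..." once it sees the event.
        stop_event.set()
    return handler


def install(stop_event):
    """Register the handler; must be called from the main thread."""
    handler = make_handler(stop_event)
    signal.signal(signal.SIGINT, handler)
    signal.signal(signal.SIGTERM, handler)
```

With this, stop_app_event is created in the `__main__` block and passed to install(). main() then prints "Signal received to stop the app" when it breaks out of its loop.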

how to terminate a thread from within another thread [duplicate]

How can I start and stop a thread with my poor thread class?
It is in loop, and I want to restart it again at the beginning of the code. How can I do start-stop-restart-stop-restart?
My class:
import threading
import time

class Concur(threading.Thread):
    def __init__(self):
        self.stopped = False
        threading.Thread.__init__(self)
    def run(self):
        i = 0
        while not self.stopped:
            time.sleep(1)
            i = i + 1
In the main code, I want:
inst = Concur()
while condition:
    inst.start()
    # After some operation
    inst.stop()
    # Some other operation
You can't actually stop and then restart a thread since you can't call its start() method again after its run() method has terminated. However you can make one pause and then later resume its execution by using a threading.Condition variable to avoid concurrency problems when checking or changing its running state.
threading.Condition objects have an associated lock (an RLock by default). Calling wait() releases that lock and blocks until another thread calls notify() on the same condition, then reacquires the lock before returning. Here's an example derived from the code in your question which shows this being done. In the example code I've made the Condition variable a part of Thread subclass instances to better encapsulate the implementation and avoid needing to introduce additional global variables:
from __future__ import print_function
import threading
import time

class Concur(threading.Thread):
    def __init__(self):
        super(Concur, self).__init__()
        self.iterations = 0
        self.daemon = True  # Allow main to exit even if still running.
        self.paused = True  # Start out paused.
        self.state = threading.Condition()

    def run(self):
        self.resume()
        while True:
            with self.state:
                while self.paused:  # Re-check after every wake-up.
                    self.state.wait()  # Block execution until notified.
            # Do stuff...
            time.sleep(.1)
            self.iterations += 1

    def pause(self):
        with self.state:
            self.paused = True  # Block self.

    def resume(self):
        with self.state:
            self.paused = False
            self.state.notify()  # Unblock self if waiting.

class Stopwatch(object):
    """ Simple class to measure elapsed times. """
    def start(self):
        """ Establish reference point for elapsed time measurements. """
        self.start_time = time.time()
        return self

    @property
    def elapsed_time(self):
        """ Seconds since started. """
        try:
            return time.time() - self.start_time
        except AttributeError:  # Wasn't explicitly started.
            self.start_time = time.time()
            return 0

MAX_RUN_TIME = 5  # Seconds.
concur = Concur()
stopwatch = Stopwatch()
print('Running for {} seconds...'.format(MAX_RUN_TIME))
concur.start()
while stopwatch.elapsed_time < MAX_RUN_TIME:
    concur.resume()
    # Can also do other concurrent operations here...
    concur.pause()
    # Do some other stuff...
# Show Concur thread executed.
print('concur.iterations: {}'.format(concur.iterations))
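On Python 3 the same pause/resume idea can be written with Condition.wait_for(). It re-checks the predicate after every wake-up, so it also copes with spurious wake-ups. This sketch adds a stop flag so the thread can end cleanly instead of relying on daemon=True. The class `PausableWorker` and its attribute names are hypothetical.

```python
import threading
import time


class PausableWorker(threading.Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self.iterations = 0
        self._want_pause = True   # start out paused, like Concur
        self._want_exit = False
        self._pause_cond = threading.Condition()

    def run(self):
        while True:
            with self._pause_cond:
                # Sleep until resumed or stopped; no polling while paused.
                self._pause_cond.wait_for(
                    lambda: not self._want_pause or self._want_exit)
                if self._want_exit:
                    return
            time.sleep(0.01)  # "do stuff" outside the lock
            self.iterations += 1

    def pause(self):
        with self._pause_cond:
            self._want_pause = True  # takes effect before the next iteration

    def resume(self):
        with self._pause_cond:
            self._want_pause = False
            self._pause_cond.notify()

    def stop(self):
        with self._pause_cond:
            self._want_exit = True
            self._pause_cond.notify()
```

Doing the work outside the `with` block matters. Otherwise pause() and resume() would have to wait for the whole iteration to finish before they could acquire the lock.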
This is David Heffernan's idea fleshed out. The example below runs for 1 second, then stops for 1 second, then runs for 1 second, and so on.
import time
import threading
import datetime as DT
import logging
logger = logging.getLogger(__name__)

def worker(cond):
    i = 0
    while True:
        with cond:
            cond.wait()
            logger.info(i)
            time.sleep(0.01)
            i += 1

logging.basicConfig(level=logging.DEBUG,
                    format='[%(asctime)s %(threadName)s] %(message)s',
                    datefmt='%H:%M:%S')

cond = threading.Condition()
t = threading.Thread(target=worker, args=(cond, ))
t.daemon = True
t.start()

start = DT.datetime.now()
while True:
    now = DT.datetime.now()
    if (now-start).total_seconds() > 60: break
    if now.second % 2:
        with cond:
            cond.notify()
The implementation of stop() would look like this:
def stop(self):
    self.stopped = True
If you want to restart, then you can just create a new instance and start that.
while condition:
    inst = Concur()
    inst.start()
    #after some operation
    inst.stop()
    #some other operation
The documentation for Thread makes it clear that the start() method can only be called once for each instance of the class; calling it a second time raises RuntimeError.
If you want to pause and resume a thread, then you'll need to use a condition variable.
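Here is a minimal Python 3 sketch of the stop-and-recreate approach. It assumes a threading.Event in place of the plain `stopped` attribute, so that stop() also cuts the sleep short instead of waiting up to a full second. The attribute name `_stop_requested` is hypothetical. Avoid naming it `_stop`, because Thread already uses that name internally.

```python
import threading
import time


class Concur(threading.Thread):
    """Stoppable worker; an Event replaces the plain 'stopped' flag."""

    def __init__(self):
        super().__init__()
        self._stop_requested = threading.Event()
        self.iterations = 0

    def run(self):
        # wait() doubles as the sleep: it returns True as soon as stop() is called
        while not self._stop_requested.wait(0.01):
            self.iterations += 1

    def stop(self):
        self._stop_requested.set()
```

In the question's loop this means building `inst = Concur()` inside the loop, then calling `inst.join()` after `inst.stop()` if the next step needs the thread to have actually finished.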

How to allow a class's variables to be modified concurrently by multiple threads

I have a class (MyClass) which contains a queue (self.msg_queue) of actions that need to be run and I have multiple sources of input that can add tasks to the queue.
Right now I have three functions that I want to run concurrently:
MyClass.get_input_from_user()
Creates a window in tkinter that has the user fill out information and when the user presses submit it pushes that message onto the queue.
MyClass.get_input_from_server()
Checks the server for a message, reads the message, and then puts it onto the queue. This method uses functions from MyClass's parent class.
MyClass.execute_next_item_on_the_queue()
Pops a message off of the queue and then acts upon it. It is dependent on what the message is, but each message corresponds to some method in MyClass or its parent which gets run according to a big decision tree.
Process description:
After the class has joined the network, I have it spawn three threads (one for each of the above functions). Each threaded function adds items to the queue with the syntax "self.msg_queue.put(message)" and removes items from the queue with "self.msg_queue.get_nowait()".
Problem description:
The issue I am having is that it seems that each thread is modifying its own queue object (they are not sharing the queue, msg_queue, of the class of which they, the functions, are all members).
I am not familiar enough with Multiprocessing to know what the important error messages are. However, it is stating that it cannot pickle a weakref object (it gives no indication of which object is the weakref object), and that within the queue.put() call the line "self._sem.acquire(block, timeout)" yields a "[WinError 5] Access is denied" error. Would it be safe to assume that this failure is caused by the queue's reference not being copied over properly?
[I am using Python 3.7.2 and the Multiprocessing package's Process and Queue]
[I have seen multiple Q/As about having threads shuttle information between classes--create a master harness that generates a queue and then pass that queue as an argument to each thread. If the functions didn't have to use other functions from MyClass I could see adapting this strategy by having those functions take in a queue and use a local variable rather than class variables.]
[I am fairly confident that this error is not the result of passing my queue to the tkinter object as my unit tests on how my GUI modifies its caller's queue work fine]
Below is a minimal reproducible example for the queue's error:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MyTest:
    def __init__(self):
        self.my_q = Queue()
        self.counter = 0

    def input_function_A(self):
        while True:
            self.my_q.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    def input_function_B(self):
        while True:
            self.counter = 0
            self.my_q.put(self.counter)
            time.sleep(1)

    def output_function(self):
        while True:
            try:
                var = self.my_q.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

    def run(self):
        process_A = Process(target=self.input_function_A)
        process_B = Process(target=self.input_function_B)
        process_C = Process(target=self.output_function)
        process_A.start()
        process_B.start()
        process_C.start()
        # without this it generates the WinError:
        # with this it still behaves as if the two input functions do not modify the queue
        process_C.join()

if __name__ == '__main__':
    test = MyTest()
    test.run()
Indeed, these are not "threads" but "processes". If you were using multithreading rather than multiprocessing, the self.my_q instance would be the same object, placed at the same memory space on the computer.
multiprocessing, however, starts a separate process for each target. On Unix it forks; on Windows it spawns a fresh interpreter and pickles the target, including the bound method's self, over to it. Any data in the original process (the one executing the "run" call) is duplicated into each child, so each subprocess works with its own copy of the instance and its attributes.
The correct way to have various processes share a multiprocessing.Queue object is to pass it as a parameter to the target methods. The simplest way to reorganize your code so that it works is thus:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MyTest:
    def __init__(self):
        self.my_q = Queue()
        self.counter = 0

    def input_function_A(self, q):
        while True:
            q.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    def input_function_B(self, q):
        while True:
            self.counter = 0
            q.put(self.counter)
            time.sleep(1)

    def output_function(self, q):
        while True:
            try:
                var = q.get_nowait()
            except queue.Empty:  # `queue` is the stdlib module, not the argument
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

    def run(self):
        # pass the multiprocessing.Queue instance itself, not the `queue` module
        process_A = Process(target=self.input_function_A, args=(self.my_q,))
        process_B = Process(target=self.input_function_B, args=(self.my_q,))
        process_C = Process(target=self.output_function, args=(self.my_q,))
        process_A.start()
        process_B.start()
        process_C.start()
        # without this it generates the WinError:
        # with this it still behaves as if the two input functions do not modify the queue
        process_C.join()

if __name__ == '__main__':
    test = MyTest()
    test.run()
As you can see, since your class is not actually sharing any data through the instance's attributes, this "class" design does not make much sense for your application, except for grouping the different workers in the same code block.
It would be possible to have a magic multiprocess class with an internal method that actually starts the worker methods and shares the Queue instance. If you have a lot of those in a project, there would be a lot less boilerplate.
Something along these lines:
from multiprocessing import Queue
from multiprocessing import Process
import queue
import time

class MPWorkerBase:
    def __init__(self, *args, **kw):
        self.queue = None
        self.is_parent_process = False
        self.is_child_process = False
        self.processes = []
        # ensure this can be used as a collaborative mixin
        super().__init__(*args, **kw)

    def run(self):
        if self.is_parent_process or self.is_child_process:
            # workers already initialized
            return
        self.queue = Queue()
        processes = []
        cls = self.__class__
        for name in dir(cls):
            method = getattr(cls, name)
            if callable(method) and getattr(method, "_MP_worker", False):
                process = Process(target=self._start_worker, args=(self.queue, name))
                # collect in the local list; self.processes is only assigned below
                processes.append(process)
                process.start()
        # Setting these attributes here ensures the child processes have the initial values for them.
        self.is_parent_process = True
        self.processes = processes

    def _start_worker(self, queue, method_name):
        # this method is called in a new spawned process - attribute
        # changes here no longer reflect attributes on the
        # object in the initial process
        # overwrite queue in this process with the queue object sent over the wire:
        self.queue = queue
        self.is_child_process = True
        # call the worker method
        getattr(self, method_name)()

    def __del__(self):
        for process in self.processes:
            process.join()

def worker(func):
    """decorator to mark a method as a worker that should
    run in its own subprocess
    """
    func._MP_worker = True
    return func

class MyTest(MPWorkerBase):
    def __init__(self):
        super().__init__()
        self.counter = 0

    @worker
    def input_function_A(self):
        while True:
            self.queue.put(self.counter)
            self.counter = self.counter + 1
            time.sleep(0.2)

    @worker
    def input_function_B(self):
        while True:
            self.counter = 0
            self.queue.put(self.counter)
            time.sleep(1)

    @worker
    def output_function(self):
        while True:
            try:
                var = self.queue.get_nowait()
            except queue.Empty:
                var = -1
            except:
                break
            print(var)
            time.sleep(1)

if __name__ == '__main__':
    test = MyTest()
    test.run()
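Here is a module-level variant of the same idea, as a sketch. The worker function lives at module level, so it pickles cleanly under the Windows "spawn" start method. The queue is passed explicitly as an argument, and each producer sends a hypothetical `STOP` marker so the parent knows when to stop reading. None is used as the marker because it is a singleton that survives pickling. An `object()` sentinel would arrive in the parent as a different object and fail the `is` check. The names `producer` and `run_demo` are also hypothetical.

```python
import multiprocessing as mp

STOP = None  # hypothetical end-of-stream marker; None keeps its identity across processes


def producer(q, start, count):
    """Runs in a child process; only the queue passed in as an argument is shared."""
    for i in range(start, start + count):
        q.put(i)
    q.put(STOP)  # tell the parent this producer is finished


def run_demo(n_items=3):
    q = mp.Queue()
    procs = [mp.Process(target=producer, args=(q, base, n_items)) for base in (0, 100)]
    for p in procs:
        p.start()
    received, finished = [], 0
    # Drain the queue BEFORE joining, as the multiprocessing guidelines advise.
    while finished < len(procs):
        item = q.get()
        if item is STOP:
            finished += 1
        else:
            received.append(item)
    for p in procs:
        p.join()
    return sorted(received)


if __name__ == "__main__":
    print(run_demo())
```

The results are sorted because the two producers interleave their puts in no particular order.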

Python threading: can I sleep on two threading.Event()s simultaneously?

If I have two threading.Event() objects, and wish to sleep until either one of them is set, is there an efficient way to do that in python? Clearly I could do something with polling/timeouts, but I would like to really have the thread sleep until one is set, akin to how select is used for file descriptors.
So in the following implementation, what would an efficient non-polling implementation of wait_for_either look like?
a = threading.Event()
b = threading.Event()
wait_for_either(a, b)
Here is a non-polling non-excessive thread solution: modify the existing Events to fire a callback whenever they change, and handle setting a new event in that callback:
import threading

def or_set(self):
    self._set()
    self.changed()

def or_clear(self):
    self._clear()
    self.changed()

def orify(e, changed_callback):
    e._set = e.set
    e._clear = e.clear
    e.changed = changed_callback
    e.set = lambda: or_set(e)
    e.clear = lambda: or_clear(e)

def OrEvent(*events):
    or_event = threading.Event()
    def changed():
        bools = [e.is_set() for e in events]
        if any(bools):
            or_event.set()
        else:
            or_event.clear()
    for e in events:
        orify(e, changed)
    changed()
    return or_event
Sample usage:
def wait_on(name, e):
    print "Waiting on %s..." % (name,)
    e.wait()
    print "%s fired!" % (name,)

def test():
    import time
    e1 = threading.Event()
    e2 = threading.Event()
    or_e = OrEvent(e1, e2)
    threading.Thread(target=wait_on, args=('e1', e1)).start()
    time.sleep(0.05)
    threading.Thread(target=wait_on, args=('e2', e2)).start()
    time.sleep(0.05)
    threading.Thread(target=wait_on, args=('or_e', or_e)).start()
    time.sleep(0.05)
    print "Firing e1 in 2 seconds..."
    time.sleep(2)
    e1.set()
    time.sleep(0.05)
    print "Firing e2 in 2 seconds..."
    time.sleep(2)
    e2.set()
    time.sleep(0.05)
The result of which was:
Waiting on e1...
Waiting on e2...
Waiting on or_e...
Firing e1 in 2 seconds...
e1 fired!or_e fired!
Firing e2 in 2 seconds...
e2 fired!
This should be thread-safe. Any comments are welcome.
EDIT: Oh and here is your wait_for_either function, though the way I wrote the code, it's best to make and pass around an or_event. Note that the or_event shouldn't be set or cleared manually.
def wait_for_either(e1, e2):
    OrEvent(e1, e2).wait()
I think the standard library provides a pretty canonical solution to this problem that I don't see brought up in this question: condition variables. You have your main thread wait on a condition variable, and poll the set of events each time it is notified. It is only notified when one of the events is updated, so there is no wasteful polling. Here is a Python 3 example:
from threading import Thread, Event, Condition
from time import sleep
from random import random

event1 = Event()
event2 = Event()
cond = Condition()

def thread_func(event, i):
    delay = random()
    print("Thread {} sleeping for {}s".format(i, delay))
    sleep(delay)
    event.set()
    with cond:
        cond.notify()
    print("Thread {} done".format(i))

with cond:
    Thread(target=thread_func, args=(event1, 1)).start()
    Thread(target=thread_func, args=(event2, 2)).start()
    print("Threads started")
    while not (event1.is_set() or event2.is_set()):
        print("Entering cond.wait")
        cond.wait()
        print("Exited cond.wait ({}, {})".format(event1.is_set(), event2.is_set()))
print("Main thread done")
Example output:
Thread 1 sleeping for 0.31569427100177794s
Thread 2 sleeping for 0.486548134317051s
Threads started
Entering cond.wait
Thread 1 done
Exited cond.wait (True, False)
Main thread done
Thread 2 done
Note that with no extra threads or unnecessary polling, you can wait for an arbitrary predicate to become true (e.g. for any particular subset of the events to be set). There's also a wait_for wrapper for the "while not predicate(): cond.wait()" pattern, which can make your code a bit easier to read.
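As a sketch, the question's wait_for_either() can be built on Condition.wait_for(). It assumes that every thread that sets one of the events then notifies the shared condition while holding it, as thread_func does above. `set_and_notify` is a hypothetical helper that bundles those two steps.

```python
import threading


def set_and_notify(cond, event):
    """Hypothetical helper: set an event and wake any wait_for_either callers."""
    event.set()
    with cond:
        cond.notify_all()


def wait_for_either(cond, events, timeout=None):
    """Block until any event is set or the timeout expires.

    Returns True if an event is set, False on timeout.
    """
    with cond:
        return cond.wait_for(lambda: any(e.is_set() for e in events), timeout)
```

wait_for() returns the last value of the predicate, which is why the function reports False when it gives up on a timeout.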
One solution (with polling) would be to do sequential waits on each Event in a loop:
def wait_for_either(a, b, tunable_timeout=0.1):
    while True:
        if a.wait(tunable_timeout):
            break
        if b.wait(tunable_timeout):
            break
I think that if you tune the timeout well enough the results would be OK.
The best non-polling approach I can think of is to wait for each one in a different thread, and have each thread set a shared Event that you then wait on in the main thread.
import threading

def repeat_trigger(waiter, trigger):
    waiter.wait()
    trigger.set()

def wait_for_either(a, b):
    trigger = threading.Event()
    ta = threading.Thread(target=repeat_trigger, args=(a, trigger))
    tb = threading.Thread(target=repeat_trigger, args=(b, trigger))
    ta.start()
    tb.start()
    # Now do the union waiting
    trigger.wait()
Pretty interesting, so I wrote an OOP version of the previous solution:
from threading import Event, Thread

class EventUnion(object):
    """Register Event objects and wait for release when any of them is set"""
    def __init__(self, ev_list=None):
        self._trigger = Event()
        if ev_list:
            # Make a list of threads, one for each Event
            self._t_list = [
                Thread(target=self._triggerer, args=(ev, ))
                for ev in ev_list
            ]
        else:
            self._t_list = []

    def register(self, ev):
        """Register a new Event"""
        self._t_list.append(Thread(target=self._triggerer, args=(ev, )))

    def wait(self, timeout=None):
        """Start waiting until any one of the registered Events is set"""
        # Start all the threads. A plain loop is needed: in Python 3, map()
        # is lazy and would never actually call start().
        for t in self._t_list:
            t.start()
        # Now do the union waiting
        return self._trigger.wait(timeout)

    def _triggerer(self, ev):
        ev.wait()
        self._trigger.set()
This is an old question, but I hope this helps someone coming from Google.
The accepted answer is fairly old and will cause an infinite loop for twice-"orified" events.
Here is an implementation using concurrent.futures
import concurrent.futures
from concurrent.futures import ThreadPoolExecutor

def wait_for_either(events, timeout=None, t_pool=None):
    '''blocks until one of the events gets set
    PARAMETERS
    events (list): list of threading.Event objects
    timeout (float): timeout for events (used for polling)
    t_pool (concurrent.futures.ThreadPoolExecutor): optional
    '''
    if any(event.is_set() for event in events):
        # sanity check
        pass
    else:
        t_pool = t_pool or ThreadPoolExecutor(max_workers=len(events))
        tasks = []
        for event in events:
            tasks.append(t_pool.submit(event.wait))
        concurrent.futures.wait(tasks, timeout=timeout,
                                return_when=concurrent.futures.FIRST_COMPLETED)
        # cleanup
        for task in tasks:
            try:
                task.result(timeout=0)
            except concurrent.futures.TimeoutError:
                pass
Testing the function
import threading
import time
from datetime import datetime, timedelta

def bomb(myevent, sleep_s):
    '''set event after sleep_s seconds'''
    with lock:
        print('explodes in ', datetime.now() + timedelta(seconds=sleep_s))
    time.sleep(sleep_s)
    myevent.set()
    with lock:
        print('BOOM!')

lock = threading.RLock()  # so prints don't get jumbled
a = threading.Event()
b = threading.Event()
t_pool = ThreadPoolExecutor(max_workers=2)
threading.Thread(target=bomb, args=(a, 5), daemon=True).start()
threading.Thread(target=bomb, args=(b, 120), daemon=True).start()
with lock:
    print('1 second timeout, no ThreadPool', datetime.now())
wait_for_either([a, b], timeout=1)
with lock:
    print('wait_event_or done', datetime.now())
print('=' * 15)
with lock:
    print('wait for a', datetime.now())
wait_for_either([a, b], t_pool=t_pool)
with lock:
    print('wait_event_or done', datetime.now())
Starting extra threads seems a clear solution, though not a very efficient one.
The function wait_events blocks until any one of the events is set.
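from threading import Event, Thread
def wait_events(*events):
    event_share = Event()
    def set_event_share(event):
        event.wait()
        event.clear()
        event_share.set()
    for event in events:
        # Pass the function and its argument separately: calling
        # set_event_share(event) here would block in the current thread.
        # daemon=True so helpers for events that never fire don't keep
        # the program alive.
        Thread(target=set_event_share, args=(event,), daemon=True).start()
    event_share.wait()
wait_events(event1, event2, event3)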
Extending Claudiu's answer so that you can wait for either:
event 1 OR event 2
event 1 AND event 2
from threading import Thread, Event
class ConditionalEvent(Event):
    def __init__(self, events_list, condition):
        Event.__init__(self)
        self.event_list = events_list
        self.condition = condition
        for e in events_list:
            self._setup(e, self._state_changed)
        self._state_changed()
    def _state_changed(self):
        bools = [e.is_set() for e in self.event_list]
        if self.condition == 'or':
            if any(bools):
                self.set()
            else:
                self.clear()
        elif self.condition == 'and':
            if all(bools):
                self.set()
            else:
                self.clear()
    @staticmethod
    def _custom_set(e):
        e._set()
        for callback in e._callbacks:
            callback()
    @staticmethod
    def _custom_clear(e):
        e._clear()
        for callback in e._callbacks:
            callback()
    def _setup(self, e, changed_callback):
        # Patch each Event only once and keep a list of callbacks, so the
        # same Event can feed several ConditionalEvents (e.g. an 'or' and
        # an 'and') without set() recursing into itself
        if not hasattr(e, '_callbacks'):
            e._callbacks = []
            e._set = e.set
            e._clear = e.clear
            e.set = lambda: ConditionalEvent._custom_set(e)
            e.clear = lambda: ConditionalEvent._custom_clear(e)
        e._callbacks.append(changed_callback)
Example usage is very similar to before:
import time
e1 = Event()
e2 = Event()
# Example to wait for triggering of event 1 OR event 2
or_e = ConditionalEvent([e1, e2], 'or')
# Example to wait for triggering of event 1 AND event 2
and_e = ConditionalEvent([e1, e2], 'and')
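For comparison, here is a sketch (not part of this answer) that gets the same OR/AND behaviour without monkey-patching `set`/`clear`. Instead of separate `threading.Event` objects, a hypothetical `MultiEvent` keeps a group of named flags guarded by one `threading.Condition`; `Condition.wait_for` re-checks its predicate each time a setter calls `notify_all()`. The trade-off is that code setting a flag must call `MultiEvent.set(name)` rather than an individual Event's `set()`.

```python
import threading

class MultiEvent:
    """Hypothetical helper: named flags sharing one Condition, so a waiter
    can block until any (or all) of them are set, with no helper threads."""

    def __init__(self, names):
        self._cond = threading.Condition()
        self._flags = {name: False for name in names}

    def set(self, name):
        with self._cond:
            self._flags[name] = True
            self._cond.notify_all()  # wake every waiter so it re-checks

    def clear(self, name):
        with self._cond:
            self._flags[name] = False

    def wait_any(self, timeout=None):
        """Block until at least one flag is set; return the names that are set
        (an empty set if the timeout expired first)."""
        with self._cond:
            self._cond.wait_for(lambda: any(self._flags.values()), timeout)
            return {name for name, is_set in self._flags.items() if is_set}

    def wait_all(self, timeout=None):
        """Block until every flag is set; return False if the timeout expired."""
        with self._cond:
            return self._cond.wait_for(lambda: all(self._flags.values()), timeout)
```

For example, a producer thread calls `group.set("data_ready")` and the consumer blocks in `group.wait_any()` until that or any other flag in the group is set.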
Not pretty, but you can use two additional threads to multiplex the events...
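import threading
def wait_for_either(a, b):
    flag = threading.Event()  # set by whichever helper thread wakes first
    class Event_Waiter(threading.Thread):
        def __init__(self, event):
            threading.Thread.__init__(self)
            self.daemon = True  # don't keep the program alive waiting on the other event
            self.e = event
        def run(self):
            self.e.wait()
            flag.set()
    a_thread = Event_Waiter(a)
    b_thread = Event_Waiter(b)
    a_thread.start()
    b_thread.start()
    flag.wait()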
Note that you may have to worry about accidentally getting both events if they arrive too quickly. The helper threads (a_thread and b_thread) should synchronize on a lock when trying to set flag, and should then kill the other thread (possibly resetting that thread's event if it was consumed).
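The synchronization worry above can also be handled with a `queue.Queue` instead of an explicit lock. A minimal sketch, using a hypothetical `wait_for_first` helper: each watcher thread pushes its event onto a shared queue and the caller takes whichever arrives first. Queue operations are thread-safe, so near-simultaneous events never clash; later arrivals simply stay in the queue. Watchers for events that are never set stay blocked, so they are daemon threads here.

```python
import queue
import threading

def wait_for_first(events, timeout=None):
    """Return the first of `events` to be set, or None if `timeout` expires."""
    fired = queue.Queue()

    def watch(ev):
        ev.wait()
        fired.put(ev)  # thread-safe: no extra lock needed

    for ev in events:
        # daemon=True so watchers on never-set events don't block interpreter exit
        threading.Thread(target=watch, args=(ev,), daemon=True).start()
    try:
        return fired.get(timeout=timeout)
    except queue.Empty:
        return None
```

The return value tells the caller which event fired, which the flag-based version above does not.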
import logging
import threading
import time
# DEBUG level so the logging.debug() calls below are actually shown
logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-10s) %(message)s')
def wait_for_event_timeout(*events):
    while not all([e.is_set() for e in events]):
        # Wait up to 1 second on each event in turn
        ev_wait_bool = [e.wait(1) for e in events]
        # Process if all events are set. Change all to any to process if any event set
        if all(ev_wait_bool):
            logging.debug('processing event')
        else:
            logging.debug('doing other work')
e1 = threading.Event()
e2 = threading.Event()
t3 = threading.Thread(name='non-block-multi',
                      target=wait_for_event_timeout,
                      args=(e1, e2))
t3.start()
logging.debug('Waiting before calling Event.set()')
time.sleep(5)
e1.set()
time.sleep(10)
e2.set()
logging.debug('Event is set')

Python generator pre-fetch?

I have a generator that takes a long time for each iteration to run. Is there a standard way to have it yield a value, then generate the next value while waiting to be called again?
The generator would be called each time a button is pressed in a GUI, and the user would be expected to consider the result after each button press.
EDIT: a workaround might be:
def initialize():
    global res
    res = next(gen)
def btn_callback():
    global res
    display(res)
    res = next(gen)
    if not res:
        return
If I wanted to do something like your workaround, I'd write a class like this:
class PrefetchedGenerator(object):
    def __init__(self, generator):
        self._data = next(generator)
        self._generator = generator
        self._ready = True
    def next(self):
        if not self._ready:
            self.prefetch()
        self._ready = False
        return self._data
    def prefetch(self):
        if not self._ready:
            self._data = next(self._generator)
            self._ready = True
It is more complicated than your version because it handles the cases where prefetch is never called or is called too many times. The basic idea is that you call .next() when you want the next item, and you call prefetch() when you have "time" to kill.
Your other option is a thread:
import threading
import Queue  # Python 2; the module is named queue in Python 3
class BackgroundGenerator(threading.Thread):
    def __init__(self, generator):
        threading.Thread.__init__(self)
        self.queue = Queue.Queue(1)
        self.generator = generator
        self.daemon = True
        self.start()
    def run(self):
        for item in self.generator:
            self.queue.put(item)
        self.queue.put(None)
    def next(self):
        next_item = self.queue.get()
        if next_item is None:
            raise StopIteration
        return next_item
This will run separately from your main application. Your GUI should remain responsive no matter how long it takes to fetch each iteration.
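A minimal Python 3 sketch of the same thread-based idea, assuming Python 3's `queue` module and the iterator protocol (`__next__`), so the object works directly in a `for` loop. It marks the end with a private sentinel object rather than `None`, so a generator that legitimately yields `None` is not cut short. Exceptions raised inside the wrapped generator are not forwarded to the consumer in this sketch.

```python
import queue
import threading

class BackgroundGenerator(threading.Thread):
    """Run `generator` in a daemon thread, keeping up to `maxsize` items ready."""

    _DONE = object()  # private end marker: cannot collide with a real item

    def __init__(self, generator, maxsize=1):
        super().__init__(daemon=True)
        self.queue = queue.Queue(maxsize)
        self.generator = generator
        self.start()

    def run(self):
        for item in self.generator:
            self.queue.put(item)  # blocks while `maxsize` items are waiting
        self.queue.put(self._DONE)

    def __iter__(self):
        return self

    def __next__(self):
        item = self.queue.get()
        if item is self._DONE:
            # Put the marker back so later calls also stop instead of blocking
            self.queue.put(self._DONE)
            raise StopIteration
        return item
```

Usage is just `for result in BackgroundGenerator(slow_generator()): ...`, where `slow_generator` stands for your own generator.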
No. A generator is not asynchronous. This isn't multiprocessing.
If you want to avoid waiting for the calculation, you should use the multiprocessing package so that an independent process can do your expensive calculation.
You want a separate process which is calculating and enqueueing results.
Your "generator" can then simply dequeue the available results.
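The "separate worker plus a generator that dequeues results" idea can be sketched with `concurrent.futures`, which gives threads and processes the same interface. `prefetching` is a hypothetical helper: it submits the call for the next input before handing back the current result, so that work overlaps with whatever the caller does in between. With a `ThreadPoolExecutor` the overlap only helps if the work releases the GIL (I/O, sleeping, many C extensions); for CPU-bound Python code, pass a `ProcessPoolExecutor` instead, in which case `func` must be a picklable top-level function and the program should start under `if __name__ == "__main__":`.

```python
from concurrent.futures import ThreadPoolExecutor

def prefetching(func, args, executor):
    """Yield func(a) for each a in args, in order, with the next call already submitted."""
    it = iter(args)
    done = object()  # sentinel marking the end of `args`

    arg = next(it, done)
    pending = None if arg is done else executor.submit(func, arg)
    while pending is not None:
        # Submit the following call *before* handing back the current result
        arg = next(it, done)
        following = None if arg is done else executor.submit(func, arg)
        yield pending.result()
        pending = following
```

For example, `with ThreadPoolExecutor(max_workers=1) as ex: for value in prefetching(slow_func, inputs, ex): ...` yields results in input order, where `slow_func` and `inputs` stand for your own code.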
You can definitely do this with generators: put multiple yield statements in your generator so that successive next calls alternate between computing the next value and returning it. Here is an example:
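import itertools, time
def quick_gen():
    counter = itertools.count().next
    def long_running_func():
        time.sleep(2)
        return counter()
    while True:
        x = long_running_func()
        yield    # setup step: the value is computed, nothing is returned
        yield x  # delivery step: returns the precomputed value immediately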
>>> itr = quick_gen()
>>> itr.next() # setup call, takes two seconds
>>> itr.next() # returns immediately
0
>>> itr.next() # setup call, takes two seconds
>>> itr.next() # returns immediately
1
Note that the generator does not automatically do the processing to get the next value; it is up to the caller to call next twice for each value. For your use case you would call next once as a setup, and then each time the user clicks the button you would display the next value generated, then call next again for the pre-fetch.
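A minimal Python 3 sketch of that button flow, assuming a hypothetical `display` function and `on_click` callback in place of real GUI code; `time.sleep` stands in for the slow step. The pre-fetch still runs synchronously inside the callback, after the value has been handed to `display`.

```python
import itertools
import time

def quick_gen(delay=0.01):
    """Two-yield generator: one next() computes, the following next() returns."""
    counter = itertools.count()
    while True:
        time.sleep(delay)  # stand-in for the slow computation
        x = next(counter)
        yield              # setup step: value computed, nothing returned
        yield x            # delivery step: returns immediately

shown = []

def display(value):
    # Hypothetical stand-in for updating the GUI
    shown.append(value)

itr = quick_gen()
next(itr)  # initial setup call, done once before the first click

def on_click():
    # Hypothetical button callback: show the precomputed value,
    # then immediately compute the next one
    display(next(itr))
    next(itr)
```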
I was after something similar. I wanted each request to return a value quickly (if one was ready) while a background thread processed the one after it.
import Queue
import time
import threading
class MyGen():
    def __init__(self):
        self.queue = Queue.Queue()
        # Put a first element into the queue, and initialize our thread
        self.i = 1
        self.t = threading.Thread(target=self.worker, args=(self.queue, self.i))
        self.t.start()
    def __iter__(self):
        return self
    def worker(self, queue, i):
        time.sleep(1)  # Take a while to process
        queue.put(i**2)
    def __del__(self):
        self.stop()
    def stop(self):
        while True:  # Flush the queue
            try:
                self.queue.get(False)
            except Queue.Empty:
                break
        self.t.join()
    def next(self):
        # Wait for the current item, then start a thread to compute the next one
        self.t.join()
        self.i += 1
        self.t = threading.Thread(target=self.worker, args=(self.queue, self.i))
        self.t.start()
        # Now deliver the already-queued element
        while True:
            try:
                print "request at", time.time()
                obj = self.queue.get(False)
                self.queue.task_done()
                return obj
            except Queue.Empty:
                pass
            time.sleep(.001)
if __name__ == '__main__':
    f = MyGen()
    for i in range(5):
        # time.sleep(2) # Comment out to get items as they are ready
        print "*********"
        print f.next()
        print "returned at", time.time()
The code above gave the following results:
*********
request at 1342462505.96
1
returned at 1342462505.96
*********
request at 1342462506.96
4
returned at 1342462506.96
*********
request at 1342462507.96
9
returned at 1342462507.96
*********
request at 1342462508.96
16
returned at 1342462508.96
*********
request at 1342462509.96
25
returned at 1342462509.96
