Python's Queue has a join() method that will block until task_done() has been called on all the items that have been taken from the queue.
Is there a way to periodically check for this condition, or receive an event when it happens, so that you can continue to do other things in the meantime? You can, of course, check if the queue is empty, but that doesn't tell you if the count of unfinished tasks is actually zero.
The Python Queue itself does not support this, so you could try the following:
from threading import Thread

class QueueChecker(Thread):
    def __init__(self, q):
        Thread.__init__(self)
        self.q = q

    def run(self):
        self.q.join()

q_manager_thread = QueueChecker(my_q)
q_manager_thread.start()

while q_manager_thread.is_alive():
    # do other things
    pass

# When the loop exits, the tasks are done, because the thread
# will have returned from blocking on self.q.join() and exited
# its run method.
q_manager_thread.join()  # to clean up the thread
A while loop on the thread's is_alive() might not be exactly what you want, but at least you can now see how to check on the status of the q.join() asynchronously.
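If you would rather receive an event than poll, the same trick can hand the notification to a threading.Event instead. A minimal sketch, assuming Python 3 and reusing my_q from above:

from threading import Thread, Event

def notify_when_done(q, done_event):
    q.join()          # returns once every item has been task_done()-ed
    done_event.set()  # flip the event for anyone watching

all_done = Event()
Thread(target=notify_when_done, args=(my_q, all_done), daemon=True).start()

# Wait with a timeout instead of blocking; wait() returns True once set
while not all_done.wait(timeout=0.5):
    pass  # do other things in the meantime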
I have two threads in a producer-consumer pattern. When the consumer receives data it calls a time-consuming function expensive() and then enters a for loop.
But if new data arrives while the consumer is working, it should abort the current work (exit the loop) and start on the new data.
I tried something like this with a queue.Queue:
q = queue.Queue()

def producer():
    while True:
        ...
        q.put(d)

def consumer():
    while True:
        d = q.get()
        expensive(d)
        for i in range(10000):
            ...
            if not q.empty():
                break
But the problem with this code is that if the producer puts data too fast and the queue accumulates many items, the consumer will do the expensive(d) call plus one loop iteration and then abort, once for each item, which is time-consuming. The code should work, but it is not optimized.
Without modifying the code in expensive, one solution could be to run it as a separate process, which gives you the ability to terminate it prematurely. Since there's no mention of how long expensive runs, this may or may not be more time-efficient, however.
import multiprocessing as mp
import queue

q = queue.Queue()

def producer():
    while True:
        ...
        q.put(d)

def consumer():
    while True:
        d = q.get()
        exp = mp.Process(target=expensive, args=(d,))
        exp.start()
        for i in range(10000):
            ...
            if not q.empty():
                exp.terminate()  # or exp.kill()
                break
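Note that mp.Process (not mp.Thread, which does not exist) is what makes terminate() possible, that Process.kill() requires Python 3.7+, and that on platforms that spawn rather than fork (e.g. Windows) the process-creating code must run under an if __name__ == '__main__': guard and d must be picklable.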
Well, one way is to use a queue design that keeps internal lists of waiting and working threads. You can then create several consumer threads to wait on the queue and, when work arrives, set a known consumer thread to do the work. When the thread has finished, it calls into the queue to remove itself from the working list and add itself to the waiting list.
The consumer threads each have an 'abort' atomic that can signal the thread to finish early. There will be some latency while the thread finishes its current inner-loop iteration, but that will not matter.
If new work arrives at the queue from the producer, and the working list is not empty, the 'abort' flag of the working thread(s) can be set and their priority set to the minimum possible. The new work can then be dispatched onto one of the waiting threads from the pool, setting it working.
The waiting threads will need a 'start' function that signals an event/semaphore/condvar that the waiting thread, well, waits on. That allows the producer that supplied the work to set that specific thread running, rather than the 'usual' practice where any thread from a pool may pick up work.
Such a design allows new work to be started 'immediately', makes the previous work thread irrelevant by de-prioritizing it, and avoids the overheads of thread/process termination.
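Python's threading module offers no portable way to change a thread's priority, so only the abort-flag half of this design translates directly. Here is a minimal, hypothetical sketch of that part (Worker, work_q, and the fixed inner-loop bound are illustrative, assuming Python 3):

import threading
import queue

class Worker(threading.Thread):
    def __init__(self, work_q):
        super().__init__(daemon=True)
        self.work_q = work_q
        self.abort = threading.Event()  # the 'abort' atomic

    def run(self):
        while True:
            item = self.work_q.get()
            for i in range(10000):        # the long inner loop
                if self.abort.is_set():   # latency: at most one iteration
                    self.abort.clear()
                    break
                ...                       # one slice of work on item

work_q = queue.Queue()
w = Worker(work_q)
w.start()
work_q.put("job-1")

# A producer that has newer work can cut the current job short:
w.abort.set()
work_q.put("job-2")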
I was reading about Queue in the Python documentation and this book, and I don't fully understand why my thread hangs. I have the following mcve:
from threading import Thread
import queue

def print_number(number_queue_display):
    while True:
        number = number_queue_display.get()
        print(number)
        number_queue_display.task_done()

number_queue = queue.Queue()
printing_numbers = Thread(target=print_number, args=(number_queue,))
printing_numbers.start()

number_queue.put(5)
number_queue.put(10)
number_queue.put(15)
number_queue.put(20)

number_queue.join()
printing_numbers.join()
The only time it works is if I set the thread to daemon like so:
printing_numbers.setDaemon(True)
but that's because as stated in the Python documentation, the program will exit when only the daemon threads are left. The Python docs example for Queue doesn't use a daemon thread.
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left.
Even if I were to remove the two joins (number_queue.join() and printing_numbers.join()), it still hangs, but I'm unsure why.
Questions:
Why is it hanging?
How do I keep it as a non-daemon thread, but prevent it from hanging?
print_number() is running an infinite loop - it never exits, so the thread never ends. It sits in number_queue_display.get() forever, waiting for another queue item that never appears. Then, since the thread never ends, printing_numbers.join() also waits forever.
So you need some way to tell the thread to quit. One common way is to put a special "sentinel" value on the queue, and have the thread exit when it sees that. For concreteness, here's a complete program, which is very much the same as what you started with. None is used as the sentinel (and is commonly used for this purpose), but any unique object would work. Note that the .task_done() parts were removed, because they no longer serve a purpose.
from threading import Thread
import queue

def print_number(number_queue_display):
    while True:
        number = number_queue_display.get()
        if number is None:
            break
        print(number)

number_queue = queue.Queue()
printing_numbers = Thread(target=print_number, args=(number_queue,))
printing_numbers.start()

number_queue.put(5)
number_queue.put(10)
number_queue.put(15)
number_queue.put(20)
number_queue.put(None)   # tell the thread it's done

printing_numbers.join()  # wait for the thread to exit
I am learning about Thread in Python and am trying to make a simple program, one that uses threads to grab a number off the Queue and print it.
I have the following code
import threading
from Queue import Queue

test_lock = threading.Lock()
tests = Queue()

def start_thread():
    while not tests.empty():
        with test_lock:
            if tests.empty():
                return
            test = tests.get()
            print("{}".format(test))

for i in range(10):
    tests.put(i)

threads = []
for i in range(5):
    threads.append(threading.Thread(target=start_thread))
    threads[i].daemon = True

for thread in threads:
    thread.start()

tests.join()
When run it just prints the values and never exits.
How do I make the program exit when the Queue is empty?
From the docstring of Queue.join():
Blocks until all items in the Queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls task_done()
to indicate the item was retrieved and all work on it is complete.
When the count of unfinished tasks drops to zero, join() unblocks.
So you must call tests.task_done() after processing the item.
Since your threads are daemon threads, and the queue handles concurrent access correctly, you don't need to check whether the queue is empty or use a lock; once tests.join() returns in the main thread, the program exits and the daemon threads are discarded with it. You can just do:
def start_thread():
    while True:
        test = tests.get()
        print("{}".format(test))
        tests.task_done()
I start a bunch of threads working on a queue and I want to kill them when sending the SIGINT (Ctrl+C). What is the best way to handle this?
targets = Queue.Queue()
threads_num = 10
threads = []

for i in range(threads_num):
    t = MyThread()
    t.setDaemon(True)
    threads.append(t)
    t.start()

targets.join()
If you are not interested in letting the other threads shut down gracefully, simply start them in daemon mode and wrap the join of the queue in a terminator thread.
That way, you can make use of the thread's join method -- which supports a timeout and does not block exceptions -- instead of having to wait on the queue's join method.
In other words, do something like this:
term = Thread(target=someQueueVar.join)
term.daemon = True
term.start()

while term.is_alive():
    term.join(3600)
Now, Ctrl+C will terminate the MainThread whereupon the Python Interpreter hard-kills all threads marked as "daemons". Do note that this means that you have to set "Thread.daemon" for all the other threads or shut them down gracefully by catching the correct exception (KeyboardInterrupt or SystemExit) and doing whatever needs to be done for them to quit.
Also note that you absolutely need to pass a number to term.join(); otherwise it, too, will ignore all exceptions. You can select an arbitrarily high number, though.
Isn't Ctrl+C SIGINT?
Anyway, you can install a handler for the appropriate signal, and in the handler:
set a global flag that instructs the workers to exit, and make sure they check it periodically
or put 10 shutdown tokens on the queue, and have the workers exit when they pop this magic token
or set a flag which instructs the main thread to push those tokens, make sure the main thread checks that flag
etc. Mostly it depends on the structure of the application you're interrupting.
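For example, here is a minimal sketch of the shutdown-token variant (the names and worker count are illustrative, assuming CPython 3 on a POSIX system, where the handler runs in the main thread):

import queue
import signal
import threading

NUM_WORKERS = 10
q = queue.Queue()
STOP = object()  # the magic shutdown token

def worker():
    while True:
        item = q.get()
        if item is STOP:   # pop the token and exit
            break
        ...                # process item

def on_sigint(signum, frame):
    for _ in range(NUM_WORKERS):  # one token per worker
        q.put(STOP)

signal.signal(signal.SIGINT, on_sigint)

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()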
One way to do it is to install a signal handler for SIGTERM that directly calls os._exit(signal.SIGTERM). However, unless you specify the optional timeout argument to Queue.get, the signal handler function will not run until after the get method returns. (That's completely undocumented; I discovered it on my own.) So you can specify sys.maxint as the timeout and put your Queue.get call in a retry loop for purity, to get around that.
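In Python 2 terms (sys.maxint does not exist in Python 3), a sketch of this might look as follows; handle() is a hypothetical processing function, and the retry loop will essentially never fire, but it covers the timeout expiring:

import os
import signal
import sys
import Queue

# Hard-exit the whole process on SIGTERM; os._exit skips all cleanup.
signal.signal(signal.SIGTERM, lambda signum, frame: os._exit(signal.SIGTERM))

q = Queue.Queue()

while True:
    try:
        # The timeout makes the underlying wait interruptible, so the
        # signal handler gets a chance to run; a bare get() would not.
        item = q.get(True, sys.maxint)
    except Queue.Empty:
        continue  # the retry loop "for purity"
    handle(item)  # hypothetical processing function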
Why don't you set timeouts for the operations on the queue? Then your threads can regularly check whether they have to finish by testing whether an Event is set.
This is how I tackled this.
import logging
import queue
import threading

class Worker(threading.Thread):
    def __init__(self, task_queue):
        super().__init__()
        self.task_queue = task_queue
        self.shutdown_flag = threading.Event()

    def run(self):
        logging.info('Worker started')
        while not self.shutdown_flag.is_set():
            try:
                task = self.get_task_from_queue()
            except queue.Empty:
                continue
            self.process_task(task)

    def get_task_from_queue(self) -> "Task":
        return self.task_queue.get(block=True, timeout=10)

    def shutdown(self):
        logging.info('Shutdown received')
        self.shutdown_flag.set()
Upon receiving a signal, the main thread sets the shutdown event on the workers. The workers block on the queue, but wake up every 10 seconds to check whether they have received a shutdown signal.
I managed to solve the problem by emptying the queue on KeyboardInterrupt and letting the threads gracefully stop themselves.
I don't know if it's the best way to handle this, but it is simple and quite clean.
targets = Queue.Queue()
threads_num = 10
threads = []

for i in range(threads_num):
    t = MyThread()
    t.setDaemon(True)
    threads.append(t)
    t.start()

while True:
    try:
        # If the queue is empty, exit the loop
        if targets.empty():
            break
    # KeyboardInterrupt handler
    except KeyboardInterrupt:
        print "[X] Interrupt! Killing threads..."
        # Substitute the old queue with a new empty one and exit the loop
        targets = Queue.Queue()
        break

# Join the queue normally
targets.join()
I have a queue that always needs to be ready to process items as they are added to it. The function that runs on each item in the queue creates and starts a thread to execute the operation in the background, so the program can go do other things.
However, the function I am calling on each item in the queue simply starts the thread and then completes execution, regardless of whether or not the thread it started has completed. Because of this, the loop will move on to the next item in the queue before the program is done processing the last item.
Here is code to better demonstrate what I am trying to do:
import threading
import Queue

queue = Queue.Queue()

def addTask():
    queue.put(SomeObject())

def runTests(args):
    op_thread = SomeThread(args)
    op_thread.start()
    # My problem is that once 'op_thread.start()' starts the thread, the
    # 'runTests' function completes, but the operation executed by that
    # thread is not yet done because it is still running in the background.
    # I do not want the 'runTests' function to actually complete execution
    # until the operation in op_thread is done executing.
    # op_thread.join()
    # I tried putting this join after 'op_thread.start()', but that did not
    # solve anything. I have commented it out because it is not necessary to
    # demonstrate what I am trying to do, but I wanted to show that I tried it.

def worker():
    while True:
        try:
            # If an item is put onto the queue, immediately execute it (unless
            # an item on the queue is still being processed, in which case wait
            # for it to complete before moving on to the next item in the queue)
            item = queue.get()
            runTests(item)
            # I want to wait for 'runTests' to complete before moving past this point
        except Queue.Empty, err:
            # If the queue is empty, just keep running the loop until something
            # is put on top of it.
            pass

t = threading.Thread(target=worker)
t.start()
Some notes:
This is all running in a PyGTK application. Once the 'SomeThread' operation is complete, it sends a callback to the GUI to display the results of the operation.
I do not know how much this affects the issue I am having, but I thought it might be important.
A fundamental issue with Python threads is that you can't just kill them - they have to agree to die.
What you should do is:
Implement the thread as a class
Add a threading.Event member which the join method clears and the thread's main loop occasionally checks. If it sees it's cleared, it returns. For this, override threading.Thread.join to clear the event and then call Thread.join on itself
To allow (2), make the read from the Queue block with some small timeout. This way your thread's "response time" to the kill request will be the timeout, and on the other hand no CPU is wasted spinning
Here's some code from a socket client thread I have that has the same issue with blocking on a queue:
class SocketClientThread(threading.Thread):
    """ Implements the threading.Thread interface (start, join, etc.) and
        can be controlled via the cmd_q Queue attribute. Replies are placed
        in the reply_q Queue attribute.
    """
    def __init__(self, cmd_q=None, reply_q=None):
        super(SocketClientThread, self).__init__()
        # Avoid mutable default arguments: create fresh queues per instance
        self.cmd_q = cmd_q if cmd_q is not None else Queue.Queue()
        self.reply_q = reply_q if reply_q is not None else Queue.Queue()
        self.alive = threading.Event()
        self.alive.set()
        self.socket = None
        self.handlers = {
            ClientCommand.CONNECT: self._handle_CONNECT,
            ClientCommand.CLOSE: self._handle_CLOSE,
            ClientCommand.SEND: self._handle_SEND,
            ClientCommand.RECEIVE: self._handle_RECEIVE,
        }

    def run(self):
        while self.alive.isSet():
            try:
                # Queue.get with timeout to allow checking self.alive
                cmd = self.cmd_q.get(True, 0.1)
                self.handlers[cmd.type](cmd)
            except Queue.Empty as e:
                continue

    def join(self, timeout=None):
        self.alive.clear()
        threading.Thread.join(self, timeout)