I have two threads in a producer-consumer pattern. When the consumer receives data, it calls a time-consuming function, expensive(), and then enters a for loop.
But if new data arrives while the consumer is working, it should abort the current work (exit the loop) and start over with the new data.
I tried using a queue.Queue, something like this:
import queue

q = queue.Queue()

def producer():
    while True:
        ...
        q.put(d)

def consumer():
    while True:
        d = q.get()
        expensive(d)
        for i in range(10000):
            ...
            if not q.empty():
                break
But the problem with this code is that if the producer puts data too fast and the queue accumulates many items, the consumer will, for each item, run the expensive(d) call plus one loop iteration and then abort, which is time-consuming. The code works, but it is not optimized.
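One way to avoid paying for expensive(d) on stale items is to drain the queue down to its newest entry before starting work. This is a minimal sketch, assuming a single consumer thread; get_latest is a hypothetical helper, not part of the queue module:

```python
import queue


def get_latest(q):
    """Block until one item is available, then keep only the newest queued item."""
    item = q.get()  # wait for at least one item
    while True:
        try:
            # Each successful get_nowait() replaces the older item with a newer one.
            item = q.get_nowait()
        except queue.Empty:
            return item
```

In the consumer above, d = get_latest(q) would take the place of d = q.get(), so a backlog of ten items costs one expensive() call instead of ten.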
Without modifying the code in expensive, one solution is to run it as a separate process, which gives you the ability to terminate it prematurely. Since the question doesn't say how long expensive runs, this may or may not be more time-efficient, however.
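import multiprocessing as mp
import queue

q = queue.Queue()

def producer():
    while True:
        ...
        q.put(d)

def consumer():
    while True:
        d = q.get()
        # Run expensive() in its own process so it can be stopped early.
        exp = mp.Process(target=expensive, args=(d,))
        exp.start()
        for i in range(10000):
            ...
            if not q.empty():
                exp.terminate()  # or exp.kill()
                break
        exp.join()  # reap the process whether it finished or was terminated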
Well, one way is to use a queue design that keeps internal lists of waiting and working threads. You can then create several consumer threads to wait on the queue and, when work arrives, set a known consumer thread to do the work. When the thread has finished, it calls into the queue to remove itself from the working list and add itself to the waiting list.
The consumer threads each have an 'abort' atomic flag that can signal the thread to finish early. There will be some latency while the thread completes its current inner-loop iteration, but that will not matter.
If new work arrives at the queue from the producer and the working list is not empty, the 'abort' flag of the working thread(s) can be set and their priority set to the minimum possible. The new work can then be dispatched to one of the waiting threads from the pool, setting it to work.
The waiting threads will need a 'start' function that signals an event, semaphore, or condition variable that the waiting thread waits on. That allows the producer that supplied work to set that specific thread running, rather than the 'usual' practice where any thread from a pool may pick up work.
Such a design allows new work to be started 'immediately', makes the previous work thread irrelevant by de-prioritizing it, and avoids the overhead of thread/process termination.
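The abort-flag part of this design can be sketched with threading.Event. This is a simplified illustration under several assumptions: it starts a fresh thread per item instead of keeping a pool of waiting threads, it omits priorities (Python's threading module does not expose them), and AbortableWorker, Dispatcher, and slow_count are hypothetical names.

```python
import threading
import time


class AbortableWorker(threading.Thread):
    """Runs work(data, abort); the work function polls `abort` to exit early."""

    def __init__(self, work, data):
        super().__init__(daemon=True)
        self.abort = threading.Event()
        self.result = None
        self._work = work
        self._data = data

    def run(self):
        self.result = self._work(self._data, self.abort)


class Dispatcher:
    """Starts a worker for each new item and flags the previous worker to abort."""

    def __init__(self, work):
        self._work = work
        self._current = None
        self._lock = threading.Lock()

    def submit(self, data):
        with self._lock:
            if self._current is not None:
                self._current.abort.set()  # ask the old worker to stop
            self._current = AbortableWorker(self._work, data)
            self._current.start()
            return self._current


def slow_count(data, abort):
    """Stand-in for the consumer's for loop; checks the flag once per iteration."""
    for _ in range(10000):
        if abort.is_set():
            return ("aborted", data)
        time.sleep(0.001)
    return ("finished", data)
```

Calling Dispatcher(slow_count).submit(d) from the producer starts work on d at once; the superseded worker exits at its next flag check, so the latency is one loop iteration.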
I have a thread that reads frames from a webcam and puts them in a queue. Another thread then reads the frames as needed.
My problem is that I want to keep the latest frames in the queue even if the consumer thread cannot keep up.
I can do the following in the thread queuing frames:
if q.full():
    drop = q.get()
q.put(new_frame)
But I think this can fail in two ways.
If the queue was full when full() was called but the consumer then fetches a frame, the producer will discard a frame for no reason.
If the queue was full when full() was called and the consumer then fetches all frames from the queue, the producer will block on q.get(), basically breaking the application.
In my case, would using threading.Lock with a simple list be the way to go?
I believe that would work. Acquire the lock in two places: in the consumer, around the call that fetches frames, and in the producer, after it finds the queue full.
Once the producer has seen a full queue and acquired the lock, it checks again whether the queue is still full, and only then decides whether to call get() to drop a frame.
Shared:
lock = threading.Lock()
Thread 1:
if q.full():
    with lock:
        # re-check: the consumer may have fetched frames in the meantime
        if q.full():
            drop = q.get()
        q.put(new_frame)
else:
    q.put(new_frame)
Thread 2:
if not q.empty():
    with lock:
        frame = q.get()
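I have implemented a queue whose put() never blocks; when the queue is full, the oldest item is discarded:
from queue import Queue
from collections import deque

class NonBlockingPutQueue(Queue):
    def put(self, item, block=True, timeout=None):
        # block and timeout are accepted for API compatibility but ignored.
        # Note: dropped items are still counted as unfinished, so join() is unreliable.
        with self.not_full:
            self._put(item)
            self.unfinished_tasks += 1
            self.not_empty.notify()

    def _init(self, maxsize):
        # A bounded deque silently drops its oldest item on append when full.
        # Queue treats maxsize <= 0 as unbounded, so map that to maxlen=None.
        self.queue = deque(maxlen=maxsize if maxsize > 0 else None)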
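The same drop-oldest behaviour can also be built without subclassing Queue, which is closer to the "lock plus a simple list" idea from the question. This is a minimal sketch, assuming the consumer is happy to wait with a timeout; LatestBuffer is a hypothetical name:

```python
import threading
from collections import deque


class LatestBuffer:
    """Keeps only the newest `maxlen` items; put() never blocks."""

    def __init__(self, maxlen):
        self._items = deque(maxlen=maxlen)
        self._cond = threading.Condition()

    def put(self, item):
        with self._cond:
            self._items.append(item)  # the deque drops its oldest item when full
            self._cond.notify()

    def get(self, timeout=None):
        with self._cond:
            # wait_for returns False if the timeout expires with nothing queued
            if not self._cond.wait_for(lambda: len(self._items) > 0, timeout):
                raise TimeoutError("no item arrived in time")
            return self._items.popleft()
```

Because the check and the removal happen under one condition lock, neither of the two races described in the question can occur.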
I am learning about threads in Python and am trying to make a simple program that uses threads to grab a number off a Queue and print it.
I have the following code:
import threading
from Queue import Queue

test_lock = threading.Lock()
tests = Queue()

def start_thread():
    while not tests.empty():
        with test_lock:
            if tests.empty():
                return
            test = tests.get()
        print("{}".format(test))

for i in range(10):
    tests.put(i)

threads = []
for i in range(5):
    threads.append(threading.Thread(target=start_thread))
    threads[i].daemon = True

for thread in threads:
    thread.start()

tests.join()
When run it just prints the values and never exits.
How do I make the program exit when the Queue is empty?
From the docstring of Queue.join():
Blocks until all items in the Queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls task_done()
to indicate the item was retrieved and all work on it is complete.
When the count of unfinished tasks drops to zero, join() unblocks.
So you must call tests.task_done() after processing the item.
Since your threads are daemon threads, and the queue will handle concurrent access correctly, you don't need to check if the queue is empty or use a lock. You can just do:
def start_thread():
    while True:
        test = tests.get()
        print("{}".format(test))
        tests.task_done()
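Put together, a complete version might look like the sketch below. It assumes Python 3 (where the module is named queue rather than Queue), collects the items in a list instead of printing so the result can be checked, and uses the hypothetical names worker and run:

```python
import threading
from queue import Queue


def worker(tasks, seen, seen_lock):
    while True:
        item = tasks.get()  # blocks until an item is available
        try:
            with seen_lock:
                seen.append(item)
        finally:
            tasks.task_done()  # one call per get(), even if processing fails


def run(n_items=10, n_threads=5):
    tasks = Queue()
    seen, seen_lock = [], threading.Lock()
    for _ in range(n_threads):
        # daemon threads do not keep the interpreter alive after join() returns
        threading.Thread(
            target=worker, args=(tasks, seen, seen_lock), daemon=True
        ).start()
    for i in range(n_items):
        tasks.put(i)
    tasks.join()  # returns once task_done() has been called for every item
    return sorted(seen)
```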
I want to set up some processes that take an input and process it; the result of that processing is another task that I want handled. Essentially, each task results in zero or more new tasks (of the same type), and eventually all tasks will yield no new tasks.
I figured a queue would be good for this, so I have an input queue and a results queue to hold the tasks that result in nothing new. At any one time the queue might be empty, but more could be added if another process is working on a task.
Hence, I only want it to end when all processes are simultaneously trying to get from the input queue.
I am completely new to both Python multiprocessing and multiprocessing in general.
Edited to add a basic overview of what I mean:
from multiprocessing import Process, Queue, cpu_count

class Consumer(Process):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self):
        # This is where I would have the process try to get a new task off of the
        # queue, then calculate the results and put them into the queue.
        # After that it would try to get a new task and repeat.
        # If this and all other processes are trying to get and the queue is
        # empty, that is the only time I know that everything is complete and
        # can continue.
        pass

def start_processing():
    in_queue = Queue()
    results_queue = Queue()
    consumers = [Consumer(str(i)) for i in range(cpu_count())]
    for i in consumers:
        i.start()
    # Wait for the above mentioned conditions to be true before continuing
JoinableQueue was designed for this purpose. Joining a JoinableQueue blocks for as long as tasks are still in progress, that is, until task_done() has been called for every item put on the queue.
You can use it as follows: the main process spawns a number of worker processes and passes each of them the JoinableQueue. The worker processes use the queue to produce and consume tasks. The main process waits by joining the queue until no more tasks are in progress. After that, it terminates the worker processes and quits.
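A very simplified example (process_task and initial_tasks are placeholders):
from multiprocessing import JoinableQueue, Process, cpu_count

def consumer(queue):
    while True:
        task = queue.get()
        results = process_task(task)
        # enqueue follow-up tasks before marking this one done, so the
        # unfinished-task count never reaches zero while work remains
        if 'more_tasks' in results:
            for new_task in results['more_tasks']:
                queue.put(new_task)
        # signal the queue that a task has been completed
        queue.task_done()

def main():
    queue = JoinableQueue()
    processes = [Process(target=consumer, args=(queue,), daemon=True)
                 for _ in range(cpu_count())]
    for process in processes:
        process.start()
    for task in initial_tasks:
        queue.put(task)
    queue.join()  # block until all work is done
    for process in processes:
        process.terminate()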
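The accounting can be tried out in a single file with queue.Queue and threads, which expose the same join()/task_done() interface as JoinableQueue. Note the substitution: this sketch uses threads, not processes, and process_task here is a hypothetical stand-in in which a task n > 0 yields two child tasks n - 1 and a task 0 yields none.

```python
import queue
import threading


def process_task(n):
    """Hypothetical task: n > 0 spawns two children n - 1; 0 spawns nothing."""
    return [n - 1, n - 1] if n > 0 else []


def worker(tasks, done, done_lock):
    while True:
        n = tasks.get()
        try:
            for child in process_task(n):
                tasks.put(child)  # children are queued BEFORE task_done()
            with done_lock:
                done.append(n)
        finally:
            tasks.task_done()


def run_all(initial, n_workers=4):
    tasks = queue.Queue()
    done, done_lock = [], threading.Lock()
    for _ in range(n_workers):
        threading.Thread(
            target=worker, args=(tasks, done, done_lock), daemon=True
        ).start()
    for n in initial:
        tasks.put(n)
    tasks.join()  # unblocks only when no task is queued or in progress
    return done
```

Since each worker queues its children before calling task_done(), the unfinished count cannot touch zero while any branch of the task tree is still pending, which is exactly the "all workers idle and queue empty" condition from the question.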
In Python's multiprocessing module there are two kinds of queues:
Queue
JoinableQueue.
What is the difference between them?
Queue
from multiprocessing import Queue
q = Queue()
q.put(item) # Put an item on the queue
item = q.get() # Get an item from the queue
JoinableQueue
from multiprocessing import JoinableQueue
q = JoinableQueue()
q.task_done() # Signal task completion
q.join() # Wait for completion
JoinableQueue has methods join() and task_done(), which Queue hasn't.
class multiprocessing.Queue( [maxsize] )
Returns a process shared queue implemented using a pipe and a few locks/semaphores. When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe.
The usual Queue.Empty and Queue.Full exceptions from the standard library’s Queue module are raised to signal timeouts.
Queue implements all the methods of Queue.Queue except for task_done() and join().
class multiprocessing.JoinableQueue( [maxsize] )
JoinableQueue, a Queue subclass, is a queue which additionally has task_done() and join() methods.
task_done()
Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each get() used to fetch a task, a subsequent call to task_done() tells the queue that the processing on the task is complete.
If a join() is currently blocking, it will resume when all items have been processed (meaning that a task_done() call was received for every item that had been put() into the queue).
Raises a ValueError if called more times than there were items placed in the queue.
join()
Block until all items in the queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls task_done() to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, join() unblocks.
If you use JoinableQueue then you must call JoinableQueue.task_done() for each task removed from the queue or else the semaphore used to count the number of unfinished tasks may eventually overflow, raising an exception.
Based on the documentation, it's hard to be sure that Queue is actually empty. With JoinableQueue you can wait for the queue to empty by calling q.join(). In cases where you want to complete work in distinct batches where you do something discrete at the end of each batch, this could be helpful.
For example, perhaps you process 1000 items at a time through the queue, then send a push notification to a user that you've completed another batch. This would be challenging to implement with a normal Queue.
It might look something like:
import logging
import multiprocessing as mp

logger = logging.getLogger(__name__)
BATCH_SIZE = 1000
STOP_VALUE = 'STOP'
# process, expensive_func, URLS and notify_users are placeholders.

def consume(q):
    for item in iter(q.get, STOP_VALUE):
        try:
            process(item)
        # Be very defensive about errors since they can corrupt pipes.
        except Exception as e:
            logger.error(e)
        finally:
            q.task_done()

if __name__ == '__main__':
    q = mp.JoinableQueue()
    # Start the consumers as plain processes: a queue can be passed to a
    # Process as an argument, but it cannot be pickled into a Pool task.
    consumers = [mp.Process(target=consume, args=(q,))
                 for _ in range(mp.cpu_count())]
    for c in consumers:
        c.start()
    with mp.Pool() as pool:
        for i in range(0, len(URLS), BATCH_SIZE):
            # Put each result of this batch on the queue as it becomes ready.
            for result in pool.imap_unordered(expensive_func, URLS[i:i+BATCH_SIZE]):
                q.put(result)
            # Wait for the queue to empty.
            q.join()
            notify_users()
    # Stop the consumers so we can exit cleanly.
    for _ in consumers:
        q.put(STOP_VALUE)
    for c in consumers:
        c.join()
NB: I haven't actually run this code. Because every result of a batch is put on the queue before q.join() is called, notify_users() runs once per batch, that is, after at most BATCH_SIZE items (the final batch may be smaller). If you need an exact count of processed items shared across processes, you could use an mp.Value('i', 0) and check it whenever join() returns.
Python's Queue has a join() method that will block until task_done() has been called on all the items that have been taken from the queue.
Is there a way to periodically check for this condition, or receive an event when it happens, so that you can continue to do other things in the meantime? You can, of course, check if the queue is empty, but that doesn't tell you if the count of unfinished tasks is actually zero.
The Python Queue itself does not support this, so you could try the following:
from threading import Thread

class QueueChecker(Thread):
    def __init__(self, q):
        Thread.__init__(self)
        self.q = q

    def run(self):
        self.q.join()

q_manager_thread = QueueChecker(my_q)
q_manager_thread.start()
while q_manager_thread.is_alive():
    # do other things
    ...
# when the loop exits the tasks are done
# because the thread will have returned
# from blocking on the q.join and exited
# its run method
q_manager_thread.join()  # to clean up the thread
A while loop on thread.is_alive() might not be exactly what you want, but at least you can see how to check on the status of q.join() asynchronously now.
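If an event-style notification is preferred over polling is_alive(), the helper thread can set a threading.Event once join() returns. This is a small sketch using only documented calls; watch_join is a hypothetical helper name:

```python
import queue
import threading


def watch_join(q):
    """Return an Event that becomes set once q.join() has returned."""
    done = threading.Event()

    def waiter():
        q.join()    # blocks until every put() item has had task_done() called
        done.set()

    threading.Thread(target=waiter, daemon=True).start()
    return done
```

The caller can then test done.is_set() between other pieces of work, or call done.wait(timeout) to block for a bounded time.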