I have a Producer and a Consumer thread (threading.Thread), which share a queue of type Queue.
Producer run:
while self.running:
product = produced() ### I/O operations
queue.put(product)
Consumer run:
while self.running or not queue.empty():
product = queue.get()
time.sleep(several_seconds) ###
consume(product)
Now I need to terminate both threads from main thread, with the requirement that queue must be empty (all consumed) before terminating.
Currently I'm using code like below to terminate these two threads:
main thread stop:
producer.running = False
producer.join()
consumer.running = False
consumer.join()
But I guess it's unsafe if there are more consumers.
In addition, I'm not sure whether the sleep will release schedule to the producer so that it can produce more products. In fact, I find the producer keeps "starving" but I'm not sure whether this is the root cause.
Is there a decent way to deal with this case?
You can put a sentinel object in queue to signal end of tasks, causing all consumers to terminate:
_sentinel = object()
def producer(queue):
while running:
# produce some data
queue.put(data)
queue.put(_sentinel)
def consumer(queue):
while True:
data = queue.get()
if data is _sentinel:
# put it back so that other consumers see it
queue.put(_sentinel)
break
# Process data
This snippet is shamelessly copied from Python Cookbook 12.3.
Use a _sentinel to mark end of queue. None also works if no task produced by producer is None, but using a _sentinel is safer for the more general case.
You don't need to put multiple end markers into queue, for each consumer. You may not be aware of how many threads are consuming. Just put the sentinel back into queue when a consumer finds it, for other consumers to get the signal.
Edit 2:
a) The reason your consumers keep taking so much time is because your loop runs continously even when you have no data.
b) I added code at that bottom that shows how to handle this.
If I understood you correctly, the producer/consumer is a continuous process, e.g. it is acceptable to delay the shutdown until you exit the current blocking I/O and process the data you received from that.
In that case, to shut down your producer and consumer in an orderly fashion, I would add communication from the main thread to the producer thread to invoke a shutdown. In the most general case, this could be a queue that the main thread can use to queue a "shutdown" code, but in the simple case of a single producer that is to be stopped and never restarted, it could simply be a global shutdown flag.
Your producer should check this shutdown condition (queue or flag) in its main loop right before it would start a blocking I/O operation (e.g. after you have finished sending other data to the consumer queue). If the flag is set, then it should put a special end-of-data code (that does not look like your normal data) on the queue to tell the consumer that a shut down is occurring, and then the producer should return (terminate itself).
The consumer should be modified to check for this end-of-data code whenever it pulls data out of the queue. If the end-of-data code is found, it should do an orderly shutdown and return (terminating itself).
If there are multiple consumers, then the producer could queue multiple end-of-data messages -- one for each consumer -- before it shuts down. Since the consumers stop consuming after they read the message, they will all eventually shut down.
Alternatively, if you do not know up-front how many consumers there are, then part of the orderly shut down of the consumer could be re-queueing the end-of-data code.
This will insure that all consumers eventually see the end-of-data code and shut down, and when all are done, there will be one remaining item in the queue -- the end-of-data code queued by the last consumer.
EDIT:
The correct way to represent your end-of-data code is highly application dependent, but in many cases a simple None works very well. Since None is a singleton, the consumer can use the very efficient if data is None construct to deal with the end case.
Another possibility that can be even more efficient in some cases is to set up a try /except outside your main consumer loop, in such a way that if the except happened, it was because you were trying to unpack the data in a way that always works except for when you are processing the end-of-data code.
EDIT 2:
Combining these concepts with your initial code, now the producer does this:
while self.running:
product = produced() ### I/O operations
queue.put(product)
for x in range(number_of_consumers):
queue.put(None) # Termination code
Each consumer does this:
while 1:
product = queue.get()
if product is None:
break
consume(product)
The main program can then just do this:
producer.running = False
producer.join()
for consumer in consumers:
consumer.join()
One observation from your code is that, your consumer will keep on looking for getting some thing from the queue, ideally you should handle that by keeping some timeout and handle Empty exception for the same like below, ideally this helps to check the while self.running or not queue.empty() for every timeout.
while self.running or not queue.empty():
try:
product = queue.get(timeout=1)
except Empty:
pass
time.sleep(several_seconds) ###
consume(product)
I did simulated your situation and created producer and consumer threads, Below is the sample code that is running with 2 producers and 4 consumers it's working very well. hope this helps you!
import time
import threading
from Queue import Queue, Empty
"""A multi-producer, multi-consumer queue."""
# A thread that produces data
class Producer(threading.Thread):
def __init__(self, group=None, target=None, name=None,
args=(), kwargs=None, verbose=None):
threading.Thread.__init__(self, group=group, target=target, name=name,
verbose=verbose)
self.running = True
self.name = name
self.args = args
self.kwargs = kwargs
def run(self):
out_q = self.kwargs.get('queue')
while self.running:
# Adding some integer
out_q.put(10)
# Kepping this thread in sleep not to do many iterations
time.sleep(0.1)
print 'producer {name} terminated\n'.format(name=self.name)
# A thread that consumes data
class Consumer(threading.Thread):
def __init__(self, group=None, target=None, name=None,
args=(), kwargs=None, verbose=None):
threading.Thread.__init__(self, group=group, target=target, name=name,
verbose=verbose)
self.args = args
self.kwargs = kwargs
self.producer_alive = True
self.name = name
def run(self):
in_q = self.kwargs.get('queue')
# Consumer should die one queue is producer si dead and queue is empty.
while self.producer_alive or not in_q.empty():
try:
data = in_q.get(timeout=1)
except Empty, e:
pass
# This part you can do anything to consume time
if isinstance(data, int):
# just doing some work, infact you can make this one sleep
for i in xrange(data + 10**6):
pass
else:
pass
print 'Consumer {name} terminated (Is producer alive={pstatus}, Is Queue empty={qstatus})!\n'.format(
name=self.name, pstatus=self.producer_alive, qstatus=in_q.empty())
# Create the shared queue and launch both thread pools
q = Queue()
producer_pool, consumer_pool = [], []
for i in range(1, 3):
producer_worker = Producer(kwargs={'queue': q}, name=str(i))
producer_pool.append(producer_worker)
producer_worker.start()
for i in xrange(1, 5):
consumer_worker = Consumer(kwargs={'queue': q}, name=str(i))
consumer_pool.append(consumer_worker)
consumer_worker.start()
while 1:
control_process = raw_input('> Y/N: ')
if control_process == 'Y':
for producer in producer_pool:
producer.running = False
# Joining this to make sure all the producers die
producer.join()
for consumer in consumer_pool:
# Ideally consumer should stop once producers die
consumer.producer_alive = False
break
Related
I have two threads in a producer consumer pattern. When the consumer receives data it calls an time consuming function expensive() and then enters in a for loop.
But if while the consumer is working new data arrives, it should abort the current work, (exit the loop) and start with the new data.
I tried with a queue.Queue something like this:
q = queue.Queue()
def producer():
while True:
...
q.put(d)
def consumer():
while True:
d = q.get()
expensive(d)
for i in range(10000):
...
if not q.empty():
break
But the problem with this code is that if the producer put data too too fast, and the queue get to have many items, the consumer will do the expensive(d) call plus one loop iteration and then abort for each item, which is time consuming. The code should work, but is not optimized.
Without modifying the code in expensive one solution could be to run it as a separate process which will provide you the ability to terminateit prematurely. Since there's no mention to how long expensive runs this may or may not be more time efficient, however.
import multiprocessing as mp
q = queue.Queue()
def producer():
while True:
...
q.put(d)
def consumer():
while True:
d = q.get()
exp = mp.Thread(target=expensive, args=(d,))
for i in range(10000):
...
if not q.empty():
exp.terminate() # or exp.kill()
break
Well, one way is to use a queue design that can keep an internal lists of waiting and working threads. You can then create several consumer threads to wait on the queue and, when work arrives, set a known consumer thread to do the work. When the thread has finished, it calls into the queue to remove itself from the working list and add itself to the waiting list.
The consumer threads each have an 'abort' atomic that can signal the thread to finish early. There will be some latency while the thread performs inner loops, but that will not matter....
If new work arrives at the queue from the producer, and the working queue is not empty, the 'abort' bool of the working thread/s can be set and their priority set to the minimum possible. The new work can then be dispatched onto one of the waiting threads from the pool, so setting it working.
The waiting threads will need a 'start' function that signals an event/sema/condvar that the wait thread..well..waits on. That allows the producer that supplied work to set that specific thread running, rather than the 'usual' practice where any thread from a pool may pick up work.
Such a design allows new work to be started 'immediately', makes the previous work thread irrelevant by de-prioritizing it and avoids the overheads of thread/process termination.
I'm using the multiprocessing module to split up a very large task. It works for the most part, but I must be missing something obvious with my design, because this way it's very hard for me to effectively tell when all of the data has been processed.
I have two separate tasks that run; one that feeds the other. I guess this is a producer/consumer problem. I use a shared Queue between all processes, where the producers fill up the queue, and the consumers read from the queue and do the processing. The problem is that there is a finite amount of data, so at some point everyone needs to know that all of the data has been processed so the system can shut down gracefully.
It would seem to make sense to use the map_async() function, but since the producers are filling up the queue, I don't know all of the items up front, so I have to go into a while loop and use apply_async() and try to detect when everything is done with some sort of timeout...ugly.
I feel like I'm missing something obvious. How can this be better designed?
PRODCUER
class ProducerProcess(multiprocessing.Process):
def __init__(self, item, consumer_queue):
self.item = item
self.consumer_queue = consumer_queue
multiprocessing.Process.__init__(self)
def run(self):
for record in get_records_for_item(self.item): # this takes time
self.consumer_queue.put(record)
def start_producer_processes(producer_queue, consumer_queue, max_running):
running = []
while not producer_queue.empty():
running = [r for r in running if r.is_alive()]
if len(running) < max_running:
producer_item = producer_queue.get()
p = ProducerProcess(producer_item, consumer_queue)
p.start()
running.append(p)
time.sleep(1)
CONSUMER
def process_consumer_chunk(queue, chunksize=10000):
for i in xrange(0, chunksize):
try:
# don't wait too long for an item
# if new records don't arrive in 10 seconds, process what you have
# and let the next process pick up more items.
record = queue.get(True, 10)
except Queue.Empty:
break
do_stuff_with_record(record)
MAIN
if __name__ == "__main__":
manager = multiprocessing.Manager()
consumer_queue = manager.Queue(1024*1024)
producer_queue = manager.Queue()
producer_items = xrange(0,10)
for item in producer_items:
producer_queue.put(item)
p = multiprocessing.Process(target=start_producer_processes, args=(producer_queue, consumer_queue, 8))
p.start()
consumer_pool = multiprocessing.Pool(processes=16, maxtasksperchild=1)
Here is where it gets cheesy. I can't use map, because the list to consume is being filled up at the same time. So I have to go into a while loop and try to detect a timeout. The consumer_queue can become empty while the producers are still trying to fill it up, so I can't just detect an empty queue an quit on that.
timed_out = False
timeout= 1800
while 1:
try:
result = consumer_pool.apply_async(process_consumer_chunk, (consumer_queue, ), dict(chunksize=chunksize,))
if timed_out:
timed_out = False
except Queue.Empty:
if timed_out:
break
timed_out = True
time.sleep(timeout)
time.sleep(1)
consumer_queue.join()
consumer_pool.close()
consumer_pool.join()
I thought that maybe I could get() the records in the main thread and pass those into the consumer instead of passing the queue in, but I think I end up with the same problem that way. I still have to run a while loop and use apply_async() Thank you in advance for any advice!
You could use a manager.Event to signal the end of the work. This event can be shared between all of your processes and then when you signal it from your main process the other workers can then gracefully shutdown.
while not event.is_set():
...rest of code...
So, your consumers would wait for the event to be set and handle the cleanup once it is set.
To determine when to set this flag you can do a join on the producer threads and when those are all complete you can then join on the consumer threads.
I would like to strongly recommend SimPy instead of multiprocess/threading to do discrete event simulation.
I have a specific problem.
Main content of program starts with creating Process with dbus loop, where I listen for signals.
Content of signals I store in queues. In next part of main I have a threadpool.
When some thread takes item from queue, it use specific function(detection) to handle request - based on content of item from queue. (There is operation on database, from where I take data and make some operations depends on request)
Every thread in thread pool starts one more thread, which should handle signals (current status and interrupt).
For example: I receive signal, which means I have to handle something on numbers. Any thread from threadpool takes this item from queue and starts function which handle something on numbers - it can take long time. So after any time, I receive signal for current status and I need to send current status of detection - that's why I use threads (for shared memory). Also I can receive interrupt signal from D-Bus ("it takes too long time, so stop this detection and be free for another request"). And the interrupt is the main problem...
So my main questions are:
Is there any way, I can raise exception on interrupt signal and stop function (detection)? (I just found solution, but only for catch in main... but I need to catch it in thread which is in threadpool and raise in thread which is in thread in threadpool)
Second question is about GIL... does my thread with signal receiving receive all signals? I think it doesn't... (Yes, I use threads_init())
program:
SERVICE = multiprocessing.Process(target=dbus_signal_receiver, args=(...))
SERVICE.daemon = True
SERVICE.start()
class worker(threading.Thread):
def __init__(self,...):
threading.Thread.__init__(self)
def run(self):
while True:
#get item from queue
s = threading.Thread(target=curr_and_interr_signal_handle, args=(ID of item from queue,...))
s.daemon = True
s.start()
#start specific detection based on request
for i in range(number of threads):
t = worker(...)
t.daemon = True
t.start()
and I hoped, something like this will work... (but it doesn't)
...
class worker(threading.Thread):
def __init__(self,...):
threading.Thread.__init__(self)
def run(self):
while True:
try:
#get item from queue
s = threading.Thread(target=curr_and_interr_signal_handle, args=(ID of item from queue,...))
s.daemon = True
s.start()
#start specific detection based on request
except raised_interrupt_exception:
#continue - wait for another request from queue
...
Read about 18.8.1.2. Signals and threads
Python signal handlers are always executed in the main Python thread,
even if the signal was received in another thread.
This means that signals can’t be used as a means of inter-thread communication.
You can use the synchronization primitives from the threading module instead.
Besides, only the main thread is allowed to set a new signal handler.
Read about 17.1.7. Event Objects
This is one of the simplest mechanisms for communication between threads: one thread signals an event and other threads wait for it
Isn't clear why you have to use thread in thread.
Why could your worker thread not handle detection?
For instance, the following should be do it:
def run(self):
while self.running.is_set():
#get item from queue
#start specific detection based on request
Is there a specific type of Queue that is "closable", and is suitable for when there are multiple producers, consumers, and the data comes from a stream (so its not known when it will end)?
I've been unable to find a queue that implements this sort of behavior, or a name for one, but it seems like a integral type for producer-consumer type problems.
As an example, ideally I could write code where (1) each producer would tell the queue when it was done, (2) consumers would blindly call a blocking get(), and (3) when all consumers were done, and the queue was empty, all the producers would unblock and receive a "done" notification:
As code, it'd look something like this:
def produce():
for x in range(randint()):
queue.put(x)
sleep(randint())
queue.close() # called once for every producer
def consume():
while True:
try:
print queue.get()
except ClosedQueue:
print 'done!'
break
num_producers = randint()
queue = QueueTypeThatICantFigureOutANameFor(num_producers)
[Thread(target=produce).start() for _ in range(num_producers)]
[Thread(target=consume).start() for _ in range(random())
Also, I'm not looking for the "Poison Pill" solution, where a "done" value is added to the queue for every consumer -- I don't like the inelegance of producers needing to know how many consumers there are.
I'd call that a self-latching queue.
For your primary requirement, combine the queue with a condition variable check that gracefully latches (shuts down) the queue when all producers have vacated:
class SelfLatchingQueue(LatchingQueue):
...
def __init__(self, num_producers):
...
def close(self):
'''Called by a producer to indicate that it is done producing'''
... perhaps check that current thread is a known producer? ...
with self.a_mutex:
self._num_active_producers -= 1
if self._num_active_producers <= 0:
# Future put()s throw QueueLatched. get()s will empty the queue
# and then throw QueueEmpty thereafter
self.latch() # Guess what superclass implements this?
For your secondary requirement (#3 in the original post, finished producers apparently block until all consumers are finished), I'd perhaps use a barrier or just another condition variable. This could be implemented in a subclass of the SelfLatchingQueue, of course, but without knowing the codebase I'd keep this behavior separate from the automatic latching.
I have a queue that always needs to be ready to process items when they are added to it. The function that runs on each item in the queue creates and starts thread to execute the operation in the background so the program can go do other things.
However, the function I am calling on each item in the queue simply starts the thread and then completes execution, regardless of whether or not the thread it started completed. Because of this, the loop will move on to the next item in the queue before the program is done processing the last item.
Here is code to better demonstrate what I am trying to do:
queue = Queue.Queue()
t = threading.Thread(target=worker)
t.start()
def addTask():
queue.put(SomeObject())
def worker():
while True:
try:
# If an item is put onto the queue, immediately execute it (unless
# an item on the queue is still being processed, in which case wait
# for it to complete before moving on to the next item in the queue)
item = queue.get()
runTests(item)
# I want to wait for 'runTests' to complete before moving past this point
except Queue.Empty, err:
# If the queue is empty, just keep running the loop until something
# is put on top of it.
pass
def runTests(args):
op_thread = SomeThread(args)
op_thread.start()
# My problem is once this last line 't.start()' starts the thread,
# the 'runTests' function completes operation, but the operation executed
# by some thread is not yet done executing because it is still running in
# the background. I do not want the 'runTests' function to actually complete
# execution until the operation in thread t is done executing.
"""t.join()"""
# I tried putting this line after 't.start()', but that did not solve anything.
# I have commented it out because it is not necessary to demonstrate what
# I am trying to do, but I just wanted to show that I tried it.
Some notes:
This is all running in a PyGTK application. Once the 'SomeThread' operation is complete, it sends a callback to the GUI to display the results of the operation.
I do not know how much this affects the issue I am having, but I thought it might be important.
A fundamental issue with Python threads is that you can't just kill them - they have to agree to die.
What you should do is:
Implement the thread as a class
Add a threading.Event member which the join method clears and the thread's main loop occasionally checks. If it sees it's cleared, it returns. For this override threading.Thread.join to check the event and then call Thread.join on itself
To allow (2), make the read from Queue block with some small timeout. This way your thread's "response time" to the kill request will be the timeout, and OTOH no CPU choking is done
Here's some code from a socket client thread I have that has the same issue with blocking on a queue:
class SocketClientThread(threading.Thread):
""" Implements the threading.Thread interface (start, join, etc.) and
can be controlled via the cmd_q Queue attribute. Replies are placed in
the reply_q Queue attribute.
"""
def __init__(self, cmd_q=Queue.Queue(), reply_q=Queue.Queue()):
super(SocketClientThread, self).__init__()
self.cmd_q = cmd_q
self.reply_q = reply_q
self.alive = threading.Event()
self.alive.set()
self.socket = None
self.handlers = {
ClientCommand.CONNECT: self._handle_CONNECT,
ClientCommand.CLOSE: self._handle_CLOSE,
ClientCommand.SEND: self._handle_SEND,
ClientCommand.RECEIVE: self._handle_RECEIVE,
}
def run(self):
while self.alive.isSet():
try:
# Queue.get with timeout to allow checking self.alive
cmd = self.cmd_q.get(True, 0.1)
self.handlers[cmd.type](cmd)
except Queue.Empty as e:
continue
def join(self, timeout=None):
self.alive.clear()
threading.Thread.join(self, timeout)
Note self.alive and the loop in run.