Python Queue.join()

Even if I do not set the thread as daemon, shouldn't the program exit by itself once queue.join() completes and unblocks?
#!/usr/bin/python
import Queue
import threading
import time

class workerthread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        print 'In Worker Class'
        while True:
            counter = self.queue.get()
            print 'Going to Sleep'
            time.sleep(counter)
            print 'I am up!'
            self.queue.task_done()

queue = Queue.Queue()
for i in range(10):
    worker = workerthread(queue)
    print 'Going to Thread!'
    worker.daemon = True
    worker.start()

for j in range(10):
    queue.put(j)

queue.join()

When you call queue.join() in the main thread, all it does is block the main thread until the workers have processed everything in the queue. It does not stop the worker threads, which continue executing their infinite loops.
If the worker threads are non-daemon, their continued execution prevents the program from exiting, irrespective of whether the main thread has finished.
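If the workers are left non-daemon, one way to let the program finish cleanly (a sketch, not part of the original answer; None is used here as an ad-hoc stop marker) is to hand each worker a marker once queue.join() has returned:

    def run(self):
        while True:
            counter = self.queue.get()
            if counter is None:          # stop marker: leave the loop
                self.queue.task_done()
                break
            time.sleep(counter)
            self.queue.task_done()

# ... in the main thread, after queue.join() returns:
for _ in range(10):                      # one marker per worker
    queue.put(None)
queue.join()                             # wait until every marker is consumed

With that in place the workers return from run(), so the process exits even without daemon=True.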

I encountered this situation too: everything in the queue had been processed, but the main thread stayed blocked. Here is the code block (the real culprit is the q.not_empty test, see the comment below).
import queue
from time import sleep

def test04():
    q = queue.Queue(10)
    for x in range(10):
        q.put(x)
    # BUG: q.not_empty is a Condition object and therefore always truthy,
    # so this loop never exits and the final q.get() blocks forever on the
    # empty queue. Use `while not q.empty():` instead.
    while q.not_empty:
        print('content--->', q.get())
        sleep(1)
        re = q.task_done()  # task_done() always returns None
        print('state--->', re, '\n')
    q.join()
    print('over\n')

test04()


Signal the end of jobs on the Queue?

Here's example code from the Python documentation:
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
I modified it to fit my use case like this:
import threading
from Queue import Queue

max_threads = 10
q = Queue(maxsize=max_threads + 2)

def worker():
    while True:
        task = q.get(1)
        # do something with the task
        q.task_done()

for i in range(max_threads):
    t = threading.Thread(target=worker)
    t.start()

for task in ['a', 'b', 'c']:
    q.put(task)

q.join()
When I execute it, debugger says that all the jobs were executed, but q.join() seems to wait forever. How can I send a signal to the worker threads that I already sent all the tasks?
This process doesn't finish at .join() because the worker threads keep waiting for new queue data (a blocking .get()).
Here is a method that uses a simple finishUp flag to tell the workers to exit; we set it after .join() returns, meaning all tasks have been processed. I added a timeout to the q.get() call so the workers can periodically check the finishUp flag:
import threading
import queue

max_threads = 5
q = queue.Queue(maxsize=max_threads + 2)
finishUp = False

def worker():
    while True:
        try:
            task = q.get(block=True, timeout=1)
            # do something with the task
            print("processing task for:" + str(task))
            q.task_done()
        except queue.Empty:  # raised when the queue stays empty past the timeout
            if finishUp:
                print("thread finishing because processing is done")
                return

for i in range(max_threads):
    t = threading.Thread(target=worker)
    t.start()

for task in ['a', 'b', 'c']:
    q.put(task)

print("waiting on join")
q.join()
finishUp = True  # let the workers know that they can exit
print("finished")
This produces the following output:
waiting on join
processing task for:a
processing task for:b
processing task for:c
finished
thread finishing because processing is done
thread finishing because processing is done
thread finishing because processing is done
thread finishing because processing is done
thread finishing because processing is done
Process finished with exit code 0
q.join() actually returns. You can test that by putting print("done") after the q.join() line.
....
q.join()
print('done')
Then, why does it not end the program?
Because, by default, threads are non-daemon.
You can set a thread as a daemon thread using <thread_object>.daemon = True:
for i in range(max_threads):
    t = threading.Thread(target=worker)
    t.daemon = True  # <---
    t.start()
According to the threading module documentation:

daemon
    A boolean value indicating whether this thread is a daemon thread (True) or not (False). This must be set before start() is called, otherwise RuntimeError is raised. Its initial value is inherited from the creating thread; the main thread is not a daemon thread and therefore all threads created in the main thread default to daemon = False.
    The entire Python program exits when no alive non-daemon threads are left.
    New in version 2.6.
I defined a DONE object to signal the end of work:
DONE = object()
and literally put it into the queue when the upper level knows that no more data will come:
q.put_nowait(DONE)
In the worker thread, as soon as the object is received, the thread quits. But if other threads are listening on the very same queue, we have to put the object back on the queue:
item = q.get()
if item is DONE:
    q.put_nowait(DONE)
    return
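Putting those fragments together, a minimal self-contained sketch (the four-worker setup and the task list are illustrative, not from the original answer):

import threading
from Queue import Queue

DONE = object()                      # unique sentinel; compare with `is`
q = Queue()

def worker():
    while True:
        item = q.get()
        if item is DONE:
            q.put_nowait(DONE)       # pass the sentinel on for the other workers
            return
        # ... do something with the item here ...
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for task in ['a', 'b', 'c']:
    q.put(task)
q.put_nowait(DONE)                   # no more data will come

for t in threads:
    t.join()                         # every worker has seen DONE and returned

Note that this waits on the threads rather than on q.join(): the re-queued sentinel is never marked task_done(), so q.join() would block.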
cheers :)

Need for while True:

I don't understand why "while True:" is needed in the example below.
import os
import sys
import subprocess
import time
from threading import Thread
from Queue import Queue

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

def do_work(item):
    time.sleep(item)
    print item

q = Queue()
for i in range(2):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

source = [2, 3, 1, 4, 5]
for item in source:
    q.put(item)

q.join()
Because otherwise the worker thread would quit as soon as the first job was processed from the queue. The infinite loop ensures that the worker thread retrieves a new job from the queue when finished.
Update: to summarize the comments to my (admittedly hasty) answer: the worker thread is daemonic (ensured by t.daemon = True), which means that it will automatically terminate when there are only daemonic threads left in the Python interpreter (a more detailed explanation is given here). It is also worth mentioning that the get method of the queue on which the worker operates blocks the thread when the queue is empty to let other threads run while the worker is waiting for more jobs to appear in the queue.
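To make the contrast concrete, here is a hypothetical worker without the loop (not from the original answer; it reuses q and do_work from the example above). Each thread would process exactly one item and then die, leaving the remaining items unprocessed and q.join() blocked forever:

def worker_once():
    item = q.get()   # blocks until one job is available
    do_work(item)
    q.task_done()
    # the function returns here, so the thread exits after a single job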

Python threading: will Event.set() really notify every waiting thread

If I have a threading.Event and the following two lines of code:
event.set()
event.clear()
and I have some threads who are waiting for that event.
My question is related to what happens when calling the set() method:
Can I be ABSOLUTELY sure that all the waiting thread(s) will be notified? (i.e. Event.set() "notifies" the threads)
Or could it happen that those two lines are executed so quickly after each other, that some threads might still be waiting? (i.e. Event.wait() polls the event's state, which might be already "cleared" again)
Thanks for your answers!
In the internals of Python, an event is implemented with a Condition() object.
When you call event.set(), the condition's notify_all() is called (after acquiring the lock, so it cannot be interrupted), and the lock is released only once all the waiting threads have been notified, so you can be sure that every thread already waiting is effectively notified.
Clearing the event immediately after the notification is not a problem, unless the waiting threads also check the event's value with event.is_set(); you only need that kind of check when waiting with a timeout.
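As a rough model of what this answer describes (a simplified sketch, not the exact CPython source), an Event is essentially a boolean flag guarded by a Condition:

import threading

class SimpleEvent(object):
    """Simplified model of threading.Event, for illustration only."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._flag = False

    def set(self):
        with self._cond:
            self._flag = True
            self._cond.notify_all()  # wakes every thread blocked in wait()

    def clear(self):
        with self._cond:
            self._flag = False

    def wait(self, timeout=None):
        with self._cond:
            if not self._flag:
                self._cond.wait(timeout)
            return self._flag  # may already be False again if clear() ran first

The last line also shows why the timeout variant can misfire: a woken thread re-reads the flag after re-acquiring the lock, and by then clear() may already have run.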
Examples:
pseudocode that works:

# in main thread
event = Event()
thread1(event)
thread2(event)
...
event.set()
event.clear()

# in thread code
...
event.wait()
# do the stuff

pseudocode that may not work:

# in main thread
event = Event()
thread1(event)
thread2(event)
...
event.set()
event.clear()

# in thread code
...
while not event.is_set():
    event.wait(timeout_value)
# do the stuff
Edited: in Python >= 2.7, wait(timeout) returns the internal flag, so you can still wait for an event with a timeout and be sure of the state of the event:

event_state = event.wait(timeout)
while not event_state:
    event_state = event.wait(timeout)
It's easy enough to verify that things work as expected (Note: this is Python 2 code, which will need adapting for Python 3):
import threading

e = threading.Event()
threads = []

def runner():
    tname = threading.current_thread().name
    print 'Thread waiting for event: %s' % tname
    e.wait()
    print 'Thread got event: %s' % tname

for t in range(100):
    t = threading.Thread(target=runner)
    threads.append(t)
    t.start()

raw_input('Press enter to set and clear the event:')
e.set()
e.clear()
for t in threads:
    t.join()
print 'All done.'
If you run the above script and it terminates, all should be well :-) Notice that a hundred threads are waiting for the event to be set; it's set and cleared straight away; all threads should see this and should terminate (though not in any definite order, and "All done." can be printed anywhere after the "Press enter" prompt, not just at the very end).
Python 3+
It's just as easy to check that it works:
import threading
import time

lock = threading.Lock()  # just to sync printing
e = threading.Event()
threads = []

def runner():
    tname = threading.current_thread().name
    with lock:
        print('Thread waiting for event ', tname)
    e.wait()
    with lock:
        print('Thread got event: ', tname)

for t in range(8):  # create 8 threads (could be hundreds)
    t = threading.Thread(target=runner)
    threads.append(t)
    t.start()

time.sleep(1)  # force a wait until set/clear
e.set()
e.clear()
for t in threads:
    t.join()
print('Done')

How do I handle exceptions when using threading and Queue?

If I have a program that uses threading and Queue, how do I get exceptions to stop execution? Here is an example program, which is not possible to stop with Ctrl-C (basically ripped from the Python docs).
from threading import Thread
from Queue import Queue
from time import sleep

def do_work(item):
    sleep(0.5)
    print "working", item

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
num_worker_threads = 10
for i in range(num_worker_threads):
    t = Thread(target=worker)
    # t.setDaemon(True)
    t.start()

for item in range(1, 10000):
    q.put(item)

q.join()  # block until all tasks are done
The simplest way is to start all the worker threads as daemon threads, then just have your main loop be
while True:
    sleep(1)
Hitting Ctrl+C will throw an exception in your main thread, and all of the daemon threads will exit when the interpreter exits. This assumes you don't want to perform cleanup in all of those threads before they exit.
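Put together, a minimal sketch of that approach (reusing the question's worker and do_work; nothing here is from the original answer beyond the pattern it describes):

from threading import Thread
from Queue import Queue
from time import sleep

def do_work(item):            # placeholder job, as in the question
    sleep(0.5)
    print "working", item

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(10):
    t = Thread(target=worker)
    t.daemon = True           # daemon workers die when the interpreter exits
    t.start()

for item in range(1, 10000):
    q.put(item)

while True:                   # Ctrl+C raises KeyboardInterrupt here, in the
    sleep(1)                  # main thread; the daemon workers die with it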
A more complex way is to have a global stopped Event:

from Queue import Empty

stopped = Event()

def worker():
    while not stopped.is_set():
        try:
            item = q.get_nowait()
            do_work(item)
        except Empty:  # raised by get_nowait() when the queue is empty
            stopped.wait(1)
Then your main loop can set the stopped Event when it gets a KeyboardInterrupt:
try:
    while not stopped.is_set():
        stopped.wait(1)
except KeyboardInterrupt:
    stopped.set()
This lets your worker threads finish what they're doing, instead of having every worker thread be a daemon and exit in the middle of execution. You can also do whatever cleanup you want.
Note that this example doesn't make use of q.join(); using it makes things more complex, though you still can. If you do, your best bet is to use signal handlers instead of exceptions to detect KeyboardInterrupts. For example:
from signal import signal, SIGINT

def stop(signum, frame):
    stopped.set()

signal(SIGINT, stop)
This lets you define what happens when you hit Ctrl+C without affecting whatever your main loop is in the middle of. So you can keep doing q.join() without worrying about being interrupted by a Ctrl+C. Of course, with my above examples, you don't need to be joining, but you might have some other reason for doing so.
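One way to wire the pieces together (a sketch; do_work is a placeholder as in the question). Note that it waits on the threads rather than on q.join(), so a Ctrl+C that stops the workers early cannot leave the main thread blocked on unfinished tasks:

from Queue import Queue, Empty
from threading import Thread, Event
from signal import signal, SIGINT
from time import sleep

q = Queue()
stopped = Event()

def do_work(item):               # placeholder job
    sleep(0.5)
    print "working", item

def worker():
    while not stopped.is_set():
        try:
            item = q.get_nowait()
            do_work(item)
        except Empty:
            stopped.wait(1)

def stop(signum, frame):
    stopped.set()                # just set the flag; nothing gets interrupted

signal(SIGINT, stop)

threads = [Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()

for item in range(1, 100):
    q.put(item)

while not q.empty() and not stopped.is_set():
    sleep(1)                     # wait for the queue to drain, or for Ctrl+C
stopped.set()                    # tell the workers to exit either way

for t in threads:
    t.join()                     # workers finish their current item and return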

BoundedSemaphore hangs in threads on KeyboardInterrupt

If you raise a KeyboardInterrupt while trying to acquire a semaphore, the threads that also try to release the same semaphore object hang indefinitely.
Code:
import threading
import time

def worker(i, sema):
    time.sleep(2)
    print i, "finished"
    sema.release()

sema = threading.BoundedSemaphore(value=5)
threads = []
for x in xrange(100):
    sema.acquire()
    t = threading.Thread(target=worker, args=(x, sema))
    t.start()
    threads.append(t)
Start this up and then ^C as it is running. It will hang and never exit.
0 finished
3 finished
1 finished
2 finished
4 finished
^C5 finished
Traceback (most recent call last):
File "/tmp/proof.py", line 15, in <module>
sema.acquire()
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/threading.py", line 290, in acquire
self.__cond.wait()
File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/threading.py", line 214, in wait
waiter.acquire()
KeyboardInterrupt
6 finished
7 finished
8 finished
9 finished
How can I get it to let the last few threads die natural deaths and then exit normally? (which it does if you don't try to interrupt it)
You can use the signal module to set a flag that tells the main thread to stop processing:
import threading
import time
import signal
import sys

sigint = False

def sighandler(num, frame):
    global sigint
    sigint = True

def worker(i, sema):
    time.sleep(2)
    print i, "finished"
    sema.release()

signal.signal(signal.SIGINT, sighandler)
sema = threading.BoundedSemaphore(value=5)
threads = []
for x in xrange(100):
    sema.acquire()
    if sigint:
        sys.exit()
    t = threading.Thread(target=worker, args=(x, sema))
    t.start()
    t.join()  # note: joining here serializes the threads
    threads.append(t)
In your original code you could also make the threads daemon threads. When you interrupt the script, the daemon threads all die as you expected.
t = ...
t.setDaemon(True)
t.start()
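Applied to the question's loop, a sketch (worker and the imports as in the question):

sema = threading.BoundedSemaphore(value=5)
for x in xrange(100):
    sema.acquire()
    t = threading.Thread(target=worker, args=(x, sema))
    t.setDaemon(True)   # must be set before start()
    t.start()
# on Ctrl+C the main thread raises KeyboardInterrupt and exits,
# and the daemon workers are killed instead of leaving the process hanging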
In this case, it looks like you might just want to use a thread pool to control the starting and stopping of your threads. You could use Chris Arndt's threadpool library in a manner something like this:
pool = ThreadPool(5)
try:
    # enqueue 100 worker threads
    pool.wait()
except KeyboardInterrupt, k:
    pool.dismiss(5)
    # the program will exit after all running threads are complete
This is bug #11714, and has been patched in newer versions of Python.
If you are using an older Python, you could copy the version of Semaphore found in that patch into your project and use it instead of relying on the buggy version in threading.
# importing modules
import threading
import time

# defining our worker; a counter and the semaphore are passed to it
def worker(i, sema):
    time.sleep(2)
    print i, "finished"
    # releasing the semaphore increments its value
    sema.release()

# creating the semaphore object
sema = threading.BoundedSemaphore(value=5)
# a list to store the created threads
threads = []
for x in xrange(100):
    try:
        sema.acquire()
        t = threading.Thread(target=worker, args=(x, sema))
        t.start()
        threads.append(t)
    # exit once the user hits CTRL+C
    # (or you could make the thread a daemon with t.setDaemon(True))
    except KeyboardInterrupt:
        exit()
