Need for while True: - python

I don't understand why "while True:" is needed in the example below:
import os
import sys
import subprocess
import time
from threading import Thread
from Queue import Queue

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

def do_work(item):
    time.sleep(item)
    print item

q = Queue()
for i in range(2):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

source = [2,3,1,4,5]
for item in source:
    q.put(item)

q.join()

Because otherwise the worker thread would quit as soon as it had processed the first job from the queue. The infinite loop ensures that the worker thread fetches a new job from the queue whenever it finishes the previous one.
Update: to summarize the comments to my (admittedly hasty) answer: the worker thread is daemonic (ensured by t.daemon = True), which means that it will automatically terminate when only daemonic threads are left in the Python interpreter (a more detailed explanation is given here). It is also worth mentioning that the get method of the queue on which the worker operates blocks the thread when the queue is empty, letting other threads run while the worker waits for more jobs to appear in the queue.
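For contrast, here is a small sketch (my own illustration, not from the original post) of what happens without the loop: each worker handles exactly one item and then returns, so with two threads only two of the five items are processed and q.join() blocks forever.

import time
from threading import Thread
from Queue import Queue  # Python 2, matching the snippet above

q = Queue()

def one_shot_worker():
    item = q.get()    # handle a single item ...
    time.sleep(item)
    print item
    q.task_done()     # ... then the function returns and the thread dies

for i in range(2):
    t = Thread(target=one_shot_worker)
    t.daemon = True
    t.start()

for item in [2, 3, 1, 4, 5]:
    q.put(item)

q.join()              # blocks forever: only 2 of the 5 items ever get task_done()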

Related

How to use Queue correctly with ProcessPoolExecutor in Python?

I am trying to use a queue to load up a bunch of tasks and then have a process pool go at it, where each process pops a task off the queue and works on it. I am running into problems in that the setup is not working: something is blocking the processes from getting started, and I need help figuring out the bug. The queue is filled up correctly, but when the individual processes run, they never start processing the task subroutine.
# -*- coding: utf-8 -*-
"""
Created on Tue Aug 30 17:08:42 2022
#author: Rahul
"""
import threading
import queue
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import multiprocessing as mp
import time

q = queue.Queue()

# some worker task
def worker(id, q):
    print(f'{id}:: Worker running', flush=True)
    while q.unfinished_tasks > 0:
        item = q.get()
        print(f'{id}::Working on {item}', flush=True)
        print(f'{id}::Finished {item}', flush=True)
        q.task_done()
        print(f'{id}::Sleeping. Item: {item}', flush=True)
        time.sleep(0.1)
    print(
        f'We reached the end. Queue size is {q.unfinished_tasks}', flush=True)

def main():
    print('running main')
    # Send thirty task requests to the worker.
    for item in range(30):
        q.put(item)
    # Confirm that queue is filled
    print(f'Size of queue {q.unfinished_tasks}')
    id = 0
    # start process pool
    with ProcessPoolExecutor(max_workers=4) as executor:
        executor.map(worker, [1, 2, 3, 4], [q, q, q, q])
    # Block until all tasks are done.
    q.join()
    print('All work completed')

if __name__ == "__main__":
    main()
This produces the following output and then hangs; the keyboard stops responding and I have to shut down the IDE and restart.
running main
Size of queue 30
For multiprocessing, there are two ways to use a queue.
You have to either
use the queue as a shared global via the initializer parameter, or
use a manager.
See Python multiprocessing.Queue vs multiprocessing.manager().Queue() for examples of how to set this up.
Below is an example of using a manager for the OP's use case.
A few things to highlight:
It uses manager.Queue(), which helps share the queue across different processes.
Typically, for a worker process, it's better to use a while True loop that terminates when it sees some SENTINEL value. This allows the worker to keep waiting even when all current work is done, in case more work is coming. It's also more robust than checking q.empty() (or q.unfinished_tasks, which doesn't exist in the multiprocessing version).
Using the SENTINEL approach requires adding 4 SENTINEL values, 1 for each process, after all the tasks.
The with ProcessPoolExecutor ... context manager is blocking, meaning it waits until all processes exit before continuing to the next lines. For non-blocking behaviour you may consider an explicit shutdown instead, e.g.
executor = ProcessPoolExecutor(max_workers=4)
executor.map(...)
executor.shutdown(wait=False)
Now finally, the example solution:
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
import multiprocessing as mp
import time

# some worker task
SENTINEL = 'DONE'

def worker(id, q):
    print(f'{id}:: Worker running', flush=True)
    # better to use the while True with SENTINEL
    # other methods such as checking 'q.empty()' may be unreliable.
    while True:
        item = q.get()
        if item == SENTINEL:
            q.task_done()
            break
        print(f'{id}::Working on {item}', flush=True)
        print(f'{id}::Finished {item}', flush=True)
        q.task_done()
        print(f'{id}::Sleeping. Item: {item}', flush=True)
        time.sleep(0.1)
    print(
        f'We reached the end.', flush=True)

def main():
    print('running main')
    # Send thirty task requests to the worker.
    with mp.Manager() as manager:
        q = manager.Queue()
        for item in range(30):
            q.put(item)
        # adding 4 sentinel values at the end, 1 for each process.
        for _ in range(4):
            q.put(SENTINEL)
        # Confirm that queue is filled
        print(f'Approx queue size: {q.qsize()}')
        id = 0
        # start process pool
        with ProcessPoolExecutor(max_workers=4) as executor:
            executor.map(worker, [1, 2, 3, 4], [q, q, q, q])
            print('working')
        # Block until all tasks are done.
        q.join()
        print('All work completed')

if __name__ == "__main__":
    main()
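For reference, the first option mentioned above (sharing the queue as a global via the pool initializer) could look roughly like this. This is a sketch of my own, not part of the original answer; it assumes Python 3.7+, where ProcessPoolExecutor accepts initializer and initargs, and it uses multiprocessing.JoinableQueue so that task_done() and join() are available.

from concurrent.futures import ProcessPoolExecutor
import multiprocessing as mp

SENTINEL = 'DONE'

def init_worker(shared_q):
    # runs once in each worker process and stores the queue as a module global
    global q
    q = shared_q

def worker(id):
    while True:
        item = q.get()
        if item == SENTINEL:
            q.task_done()
            break
        print(f'{id}::Working on {item}', flush=True)
        q.task_done()

def main():
    jq = mp.JoinableQueue()
    for item in range(30):
        jq.put(item)
    for _ in range(4):
        jq.put(SENTINEL)   # one sentinel per worker process
    with ProcessPoolExecutor(max_workers=4,
                             initializer=init_worker,
                             initargs=(jq,)) as executor:
        executor.map(worker, [1, 2, 3, 4])
    jq.join()
    print('All work completed')

if __name__ == '__main__':
    main()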

Python JoinableQueue and Queue Thread Not Completing

I am using the following code to complete a task using multithreading with Queue and JoinableQueue. Sometimes the script executes perfectly; other times it stalls at the end of the task without ending the worker and will not continue on to the next portion of the script. I am new to working with Queue and JoinableQueue and I need to find out why this stalling happens.
Before this part of the code I run another Queue/JoinableQueue worker to download some data, and it works perfectly fine every time. Do I need to close() anything from the first Queue/JoinableQueue? Is there a way to check whether it stalls and, if so, continue on?
Here is my code:
import multiprocessing
from multiprocessing import Queue
from multiprocessing import JoinableQueue
from threading import Thread

def run_this_definition(hr):
    #do things here
    return()

def worker():
    while True:
        item = jq.get()
        run_this_definition(item)
        jq.task_done()
    return()

q = Queue()
jq = JoinableQueue()

number_of_threads = 8
for i in range(number_of_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

input_list = [0,1,2,3,4]
for item in input_list:
    jq.put(item)

jq.join()
print "finished"
The script never prints "finished" when it stalls, but seems to finish all the tasks and stalls at the end of the 'run_this_definition' on the very last item in the Queue.
My guess is that you are using multiprocessing.JoinableQueue()!? Use Queue.Queue() instead for threading. It has a .join() and a .task_done() method as well. Furthermore, you should pass your queue as an argument to your threads. See the following example:
import threading
from threading import Thread
from Queue import Queue

def worker(jq):
    while True:
        item = jq.get()
        # Do whatever you have to do.
        print '{}: {}'.format(threading.currentThread().name, item)
        jq.task_done()
    return()

number_of_threads = 4
input_list = [1,2,3,4,5]

jq = Queue()
for i in range(number_of_threads):
    t = Thread(target=worker, args=(jq,))
    t.daemon = True
    t.start()

for item in input_list:
    jq.put(item)

jq.join()
print "finished"
The print output from multiple threads might look messy, but as an example it should be fine.
For the future: please provide a comprehensive example of your code. Neither your imports nor number_of_threads, run_this_definition, or input_list were defined in your example.
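As a further sketch (my own, in Python 3 syntax, not the answer's code): if separate processes rather than threads are actually wanted, the JoinableQueue can be kept and paired with multiprocessing.Process instead of threading.Thread.

from multiprocessing import JoinableQueue, Process

def run_this_definition(item):
    print('processing', item, flush=True)   # placeholder for the real work

def worker(jq):
    while True:
        item = jq.get()
        run_this_definition(item)
        jq.task_done()

if __name__ == '__main__':
    jq = JoinableQueue()
    for i in range(4):
        p = Process(target=worker, args=(jq,))
        p.daemon = True   # worker processes are terminated when the main process exits
        p.start()
    for item in [0, 1, 2, 3, 4]:
        jq.put(item)
    jq.join()             # blocks until task_done() was called for every item
    print('finished')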

Python Queue.join()

Even if I do not set the thread as a daemon, shouldn't the program exit by itself once queue.join() completes and unblocks?
#!/usr/bin/python
import Queue
import threading
import time

class workerthread(threading.Thread):
    def __init__(self,queue):
        threading.Thread.__init__(self)
        self.queue=queue
    def run(self):
        print 'In Worker Class'
        while True:
            counter=self.queue.get()
            print 'Going to Sleep'
            time.sleep(counter)
            print ' I am up!'
            self.queue.task_done()

queue=Queue.Queue()
for i in range(10):
    worker=workerthread(queue)
    print 'Going to Thread!'
    worker.daemon=True
    worker.start()
for j in range(10):
    queue.put(j)
queue.join()
When you call queue.join() in the main thread, all it does is block the main thread until the workers have processed everything in the queue. It does not stop the worker threads, which continue executing their infinite loops.
If the worker threads are non-daemon, their continued execution prevents the program from exiting, irrespective of whether the main thread has finished.
I encountered this situation too: everything in the queue had been processed, but the main thread still blocked (in fact inside q.get(), because the while q.not_empty test is always true; not_empty is a Condition object). Here is the code block.
import queue
from time import sleep

def test04():
    q = queue.Queue(10)
    for x in range(10):
        q.put(x)
    # not_empty is a Condition object, so this test is always true;
    # once the queue is drained the next q.get() blocks forever.
    while q.not_empty:
        print('content--->', q.get())
        sleep(1)
        re = q.task_done()
        print('state--->', re, '\n')
    q.join()
    print('over \n')

test04()
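For reference, one way to rewrite the loop so it does not block (a sketch of my own, not the original poster's code): use a non-blocking get and stop when queue.Empty is raised.

import queue
from time import sleep

def test04_fixed():
    q = queue.Queue(10)
    for x in range(10):
        q.put(x)
    while True:
        try:
            item = q.get_nowait()   # non-blocking get
        except queue.Empty:         # queue is drained, stop looping
            break
        print('content--->', item)
        sleep(1)
        q.task_done()
    q.join()   # returns immediately: task_done() was called for every item
    print('over')

test04_fixed()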

Signal the end of jobs on the Queue?

Here's some example code from the Python documentation:
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
I modified it to fit my use case like this:
import threading
from Queue import Queue

max_threads = 10
q = Queue(maxsize=max_threads + 2)

def worker():
    while True:
        task = q.get(1)
        # do something with the task
        q.task_done()

for i in range(max_threads):
    t = threading.Thread(target=worker)
    t.start()

for task in ['a', 'b', 'c']:
    q.put(task)

q.join()
When I execute it, the debugger says that all the jobs were executed, but q.join() seems to wait forever. How can I send a signal to the worker threads that I have already sent all the tasks?
This process doesn't finish at .join() because the worker threads keep waiting for new queue data (blocking in .get()).
Here is a method that uses a simple finishUp flag to tell the workers to exit; we set it after .join() returns, meaning all tasks have been processed. I added a timeout to the q.get() call so the worker can periodically check the finishUp flag.
import threading
import queue

max_threads = 5
q = queue.Queue(maxsize=max_threads + 2)
finishUp = False

def worker():
    while True:
        try:
            task = q.get(block=True, timeout=1)
            # do something with the task
            print("processing task for:" + str(task))
            q.task_done()
        except Exception as ex:  # we get this exception when the queue is empty
            if finishUp:
                print("thread finishing because processing is done")
                return

for i in range(max_threads):
    t = threading.Thread(target=worker)
    t.start()

for task in ['a', 'b', 'c']:
    q.put(task)

print("waiting on join")
q.join()
finishUp = True  # let the workers know that they can exit
print("finished")
This produces the following output:
waiting on join
processing task for:a
processing task for:b
processing task for:c
finished
thread finishing because processing is done
thread finishing because processing is done
thread finishing because processing is done
thread finishing because processing is done
thread finishing because processing is done
Process finished with exit code 0
q.join() actually returns. You can test that by putting print("done") after the q.join() line.
....
q.join()
print('done')
Then why does it not end the program?
Because, by default, threads are non-daemon threads.
You can set a thread as a daemon thread using <thread_object>.daemon = True
for i in range(max_threads):
    t = threading.Thread(target=worker)
    t.daemon = True  # <---
    t.start()
According to the threading module documentation:
daemon
A boolean value indicating whether this thread is a daemon thread (True) or not (False). This must be set before start() is called, otherwise RuntimeError is raised. Its initial value is inherited from the creating thread; the main thread is not a daemon thread and therefore all threads created in the main thread default to daemon = False.
The entire Python program exits when no alive non-daemon threads are left.
New in version 2.6.
I defined a DONE object to signal the end of work:
DONE = object()
and literally put it into the queue when the upper level knows that no more data will come:
q.put_nowait(DONE)
In the worker thread, as soon as the object is received, the thread quits.
But in case other threads are listening on the very same queue, we have to put the object back on the queue:
item = q.get()
if item is DONE:
    q.put_nowait(DONE)
    return
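For completeness, here is one way the fragments above could be assembled into a runnable script (a sketch of my own, with Python 3 module names; note that the threads are joined rather than the queue, because the re-queued DONE object keeps the queue's unfinished-task count above zero):

import threading
import queue

max_threads = 10
q = queue.Queue(maxsize=max_threads + 2)
DONE = object()  # unique sentinel meaning "no more work"

def worker():
    while True:
        task = q.get()
        if task is DONE:
            q.put_nowait(DONE)   # pass the sentinel on to the other workers
            q.task_done()
            return
        print('processing', task)   # do something with the task
        q.task_done()

threads = [threading.Thread(target=worker) for _ in range(max_threads)]
for t in threads:
    t.start()

for task in ['a', 'b', 'c']:
    q.put(task)
q.put_nowait(DONE)   # the upper level knows no more data will come

for t in threads:
    t.join()          # every worker exits after seeing the sentinel
print('finished')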
cheers :)

How do I handle exceptions when using threading and Queue?

If I have a program that uses threading and Queue, how do I get exceptions to stop execution? Here is an example program, which is not possible to stop with Ctrl-C (basically ripped from the Python docs).
from threading import Thread
from Queue import Queue
from time import sleep

def do_work(item):
    sleep(0.5)
    print "working", item

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
num_worker_threads = 10

for i in range(num_worker_threads):
    t = Thread(target=worker)
    # t.setDaemon(True)
    t.start()

for item in range(1, 10000):
    q.put(item)

q.join()  # block until all tasks are done
The simplest way is to start all the worker threads as daemon threads, then just have your main loop be
while True:
    sleep(1)
Hitting Ctrl+C will throw an exception in your main thread, and all of the daemon threads will exit when the interpreter exits. This assumes you don't want to perform cleanup in all of those threads before they exit.
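Put together, the daemon-thread approach could look roughly like this (a sketch assembled from the question's code and the advice above, not the answer's original code; the script runs until you hit Ctrl+C):

from threading import Thread
from Queue import Queue
from time import sleep

q = Queue()

def do_work(item):
    sleep(0.5)
    print "working", item

def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

for i in range(10):
    t = Thread(target=worker)
    t.daemon = True          # workers die when the interpreter exits
    t.start()

for item in range(1, 10000):
    q.put(item)

while True:                  # keep the main thread alive; Ctrl+C lands here
    sleep(1)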
A more complex way is to have a global stopped Event:
stopped = Event()

def worker():
    while not stopped.is_set():
        try:
            item = q.get_nowait()
            do_work(item)
        except Empty:  # import the Empty exception from the Queue module
            stopped.wait(1)
Then your main loop can set the stopped Event when it gets a KeyboardInterrupt:
try:
    while not stopped.is_set():
        stopped.wait(1)
except KeyboardInterrupt:
    stopped.set()
This lets your worker threads finish what they're currently doing before exiting, instead of having every worker thread be a daemon and exit in the middle of execution. You can also do whatever cleanup you want.
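Assembled into one script, the Event-based variant could look like this (a sketch of my own based on the fragments above; the imports and the wiring are my additions, and the script keeps running until Ctrl+C is pressed):

from threading import Thread, Event
from Queue import Queue, Empty
from time import sleep

q = Queue()
stopped = Event()

def do_work(item):
    sleep(0.5)
    print "working", item

def worker():
    while not stopped.is_set():
        try:
            item = q.get_nowait()
            do_work(item)
        except Empty:
            stopped.wait(1)      # queue is empty; poll for the stop signal

for i in range(10):
    Thread(target=worker).start()

for item in range(1, 10000):
    q.put(item)

try:
    while not stopped.is_set():
        stopped.wait(1)
except KeyboardInterrupt:
    stopped.set()                # tell the workers to finish up and exit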
Note that this example doesn't make use of q.join() - this makes things more complex, though you can still use it. If you do, then your best bet is to use signal handlers instead of exceptions to detect KeyboardInterrupts. For example:
from signal import signal, SIGINT

def stop(signum, frame):
    stopped.set()

signal(SIGINT, stop)
This lets you define what happens when you hit Ctrl+C without affecting whatever your main loop is in the middle of. So you can keep doing q.join() without worrying about being interrupted by a Ctrl+C. Of course, with my above examples, you don't need to be joining, but you might have some other reason for doing so.
