I have some code which I hopefully boiled down to a correct MWE.
My goal is to stop the (multiple) threads if a list within the thread has a specific length.
Unlike in the MWE, in my real code the number of iterations needed is not known in advance:
from queue import Queue
from threading import Thread

def is_even(n):
    return n % 2 == 0

class MT(Thread):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue
        self.output = []

    def run(self):
        while len(self.output) < 4:
            task = self.queue.get()
            if is_even(task):
                self.output.append(task)
                self.queue.task_done()
            else:
                self.queue.task_done()
        print(self.output)
        print('done')

queue = Queue(10)
threads = 1
thr = []
for th in range(threads):
    thr.append(MT(queue))
for th in thr:
    th.start()
for i in range(100):
    queue.put(i)
queue.join()
for th in thr:
    th.join()
print('finished')
This code never reaches print('finished')...
To quote the documentation,
Queue.join()
Blocks until all items in the queue have been gotten and processed.
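You have placed 100 items in the queue. The thread pulls items until it has collected 4 even numbers (0, 2, 4 and 6, i.e. the first 7 items) and then completes. Nobody is going to pull the remaining items, so they are never processed and queue.join() never returns. In fact, because the queue was created with a maximum size of 10, the main thread blocks even earlier: once 10 unprocessed items are waiting, queue.put() blocks forever and queue.join() is never reached.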
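One way to avoid the hang, sketched under assumptions of my own (stop_event, producer_done, put_unless_stopped and run_workers are illustrative names, not from the question): a threading.Event lets a worker that has enough results tell the producer and the other workers to stop, the producer uses put() with a timeout so it can notice that event instead of blocking on a full queue, and the main thread joins the threads rather than the queue, because leftover items are deliberately never processed.

```python
import queue
import threading


def is_even(n):
    return n % 2 == 0


class MT(threading.Thread):
    def __init__(self, tasks, stop_event, producer_done, target_len=4):
        super().__init__()
        self.tasks = tasks
        self.stop_event = stop_event        # set when any worker has enough results
        self.producer_done = producer_done  # set when no more tasks will be added
        self.target_len = target_len
        self.output = []

    def run(self):
        while len(self.output) < self.target_len and not self.stop_event.is_set():
            try:
                # A timeout keeps the worker from blocking forever on an empty queue.
                task = self.tasks.get(timeout=0.1)
            except queue.Empty:
                if self.producer_done.is_set():
                    break  # queue drained and nothing more is coming
                continue
            if is_even(task):
                self.output.append(task)
            self.tasks.task_done()
        self.stop_event.set()  # tell the producer and the other workers to stop


def put_unless_stopped(tasks, item, stop_event):
    """Put item on a bounded queue, giving up once stop_event is set."""
    while not stop_event.is_set():
        try:
            tasks.put(item, timeout=0.1)
            return True
        except queue.Full:
            continue
    return False


def run_workers(n_threads=1, n_items=100):
    tasks = queue.Queue(10)
    stop_event = threading.Event()
    producer_done = threading.Event()
    workers = [MT(tasks, stop_event, producer_done) for _ in range(n_threads)]
    for w in workers:
        w.start()
    for i in range(n_items):
        if not put_unless_stopped(tasks, i, stop_event):
            break
    producer_done.set()
    # Join the threads, not the queue: leftover items are never processed,
    # so tasks.join() could still block forever.
    for w in workers:
        w.join()
    return [w.output for w in workers]


if __name__ == '__main__':
    print(run_workers())
    print('finished')
```

With a single thread this collects [0, 2, 4, 6] and returns; with several threads, the first one to reach 4 results stops the others.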
I'm wondering if there can be a sort of deadlock in the following code. I have to read each element of a database (about 1 million items), process it, then collect the results in a unique file.
I've parallelized the execution with multiprocessing using two Queues and three types of processes:
Reader: Main process which reads the database and adds the items it reads to task_queue
Worker: Pool of processes. Each worker gets an item from task_queue, processes the item, saves the results in an intermediate file stored in item_name/item_name.txt and puts the item_name in completed_queue
Writer: Process which gets an item_name from completed_queue, gets the intermediate result from item_name/item_name.txt and writes it to results.txt
from multiprocessing import Pool, Process, Queue

class Computation():
    def __init__(self, K):
        self.task_queue = Queue()
        self.completed_queue = Queue()
        self.n_cpus = K

    def reader(self):
        with open(db, "r") as db_file:
            ... # Read an item
            self.task_queue.put(item)

    def worker(self):
        while True:
            item = self.task_queue.get(True)
            if item == "STOP":
                break
            self.process_item(item)

    def writer_process(self):
        while True:
            f = self.completed_queue.get(True)
            if f == "DONE":
                break
            self.write_f(f)

    def run(self):
        # self.worker is passed as the pool's initializer
        pool = Pool(self.n_cpus, self.worker)
        writer = Process(target=self.writer_process, args=())
        writer.start()
        self.reader()
        pool.close()
        pool.join()
        self.completed_queue.put("DONE")
        writer.join()
The code works, but it seems that sometimes the writer or the pool stops working (or they are very slow). Is a deadlock possible in this scenario?
There are a couple of issues with your code. First, by using the queues as you are, you are in effect creating your own process pool and have no need for the multiprocessing.Pool class at all. You are using a pool initializer as an actual pool worker, which is a bit of a misuse of this class; you would be better off just using regular Process instances (my opinion, anyway).
Second, although it is good that you put a DONE message on completed_queue to signal writer_process to terminate, you have not done the same for the self.n_cpus worker processes, which are waiting for 'STOP' messages. The reader function therefore needs to put self.n_cpus STOP messages on the task queue:
from multiprocessing import Process, Queue

class Computation():
    def __init__(self, K):
        self.task_queue = Queue()
        self.completed_queue = Queue()
        self.n_cpus = K

    def reader(self):
        with open(db, "r") as db_file:
            ... # Read an item
            self.task_queue.put(item)
        # signal to the worker processes to terminate:
        for _ in range(self.n_cpus):
            self.task_queue.put('STOP')

    def worker(self):
        while True:
            item = self.task_queue.get(True)
            if item == "STOP":
                break
            self.process_item(item)

    def writer_process(self):
        while True:
            f = self.completed_queue.get(True)
            if f == "DONE":
                break
            self.write_f(f)

    def run(self):
        processes = [Process(target=self.worker) for _ in range(self.n_cpus)]
        for p in processes:
            p.start()
        writer = Process(target=self.writer_process, args=())
        writer.start()
        self.reader()
        for p in processes:
            p.join()
        self.completed_queue.put("DONE")
        writer.join()
Personally, instead of using 'STOP' and 'DONE' as the sentinel messages, I would use None, assuming that None is never a valid actual message. I tested the above code with a reader that just put the strings of a list on the task queue, a self.process_item(item) that simply appended ' done' to each string and put the result on completed_queue, and a print call in place of self.write_f in writer_process. I did not see any problems with the code as is.
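The test described above can be reconstructed roughly as follows. This is a sketch, not the answerer's actual test: it uses module-level functions (reader, worker, writer and run_pipeline are illustrative names) instead of Computation methods, None sentinels as suggested, and a temporary file in place of results.txt. The ordering in run_pipeline is the important part: the writer only receives its sentinel after every worker has been joined, so it cannot stop before all results have arrived.

```python
import os
import tempfile
from multiprocessing import Process, Queue


def reader(items, task_queue, n_workers):
    # Stand-in for reading the database: enqueue every item,
    # then one None sentinel per worker so every worker exits.
    for item in items:
        task_queue.put(item)
    for _ in range(n_workers):
        task_queue.put(None)


def worker(task_queue, completed_queue):
    # Stand-in for process_item(): append ' done' and hand the result on.
    for item in iter(task_queue.get, None):
        completed_queue.put(item + ' done')


def writer(completed_queue, out_path):
    # Stand-in for write_f(): write every result to a single output file.
    with open(out_path, 'w') as out:
        for result in iter(completed_queue.get, None):
            out.write(result + '\n')


def run_pipeline(items, n_workers=4):
    task_queue, completed_queue = Queue(), Queue()
    fd, out_path = tempfile.mkstemp(suffix='.txt')
    os.close(fd)
    workers = [Process(target=worker, args=(task_queue, completed_queue))
               for _ in range(n_workers)]
    for p in workers:
        p.start()
    w = Process(target=writer, args=(completed_queue, out_path))
    w.start()
    reader(items, task_queue, n_workers)
    for p in workers:
        p.join()               # every worker has seen its sentinel
    completed_queue.put(None)  # only now is it safe to stop the writer
    w.join()
    with open(out_path) as f:
        lines = f.read().splitlines()
    os.remove(out_path)
    return lines


if __name__ == '__main__':
    print(run_pipeline(['a', 'b', 'c']))
```

The `if __name__ == '__main__':` guard matters here: with the spawn start method (the default on Windows and macOS), child processes re-import the main module.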
Update to use a Managed Queue
Disclaimer: I have no experience with mpi4py and have no idea how the queue proxies would get distributed across different computers. This may not be sufficient, as suggested by the article How to share mutliprocessing queue object between multiple computers. However, that code creates instances of Queue.Queue (it is Python 2 code), not the proxies returned by multiprocessing.managers.SyncManager, and the documentation on this is very poor. Try the managed-queue version below to see if it works better (it will be slower).
Because the queues are now proxies returned by manager.Queue(), which only exist inside the with Manager() block in run, I have had to rearrange the code a bit: the queues are now passed explicitly as arguments to the process functions:
from multiprocessing import Process, Manager

class Computation():
    def __init__(self, K):
        self.n_cpus = K

    def reader(self, task_queue):
        with open(db, "r") as db_file:
            ... # Read an item
            task_queue.put(item)
        # signal to the worker processes to terminate:
        for _ in range(self.n_cpus):
            task_queue.put('STOP')

    def worker(self, task_queue, completed_queue):
        while True:
            item = task_queue.get(True)
            if item == "STOP":
                break
            # assumed signature: process_item now needs completed_queue passed in
            self.process_item(item, completed_queue)

    def writer_process(self, completed_queue):
        while True:
            f = completed_queue.get(True)
            if f == "DONE":
                break
            self.write_f(f)

    def run(self):
        with Manager() as manager:
            task_queue = manager.Queue()
            completed_queue = manager.Queue()
            processes = [Process(target=self.worker, args=(task_queue, completed_queue))
                         for _ in range(self.n_cpus)]
            for p in processes:
                p.start()
            writer = Process(target=self.writer_process, args=(completed_queue,))
            writer.start()
            self.reader(task_queue)
            for p in processes:
                p.join()
            completed_queue.put("DONE")
            writer.join()
I've found an example of a producer-consumer with two threads. But when I send a signal to the process to stop it, it doesn't stop; it needs a second signal, e.g. SIGKILL, to stop completely. I thought the problem was with task_done(), but it seems not.
import time
import queue
import threading
import random

class Producer(threading.Thread):
    """
    Produces random integers to a list
    """

    def __init__(self, queue):
        """
        Constructor.
        #param queue queue synchronization object
        """
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        """
        Thread run method. Append random integers to the integers
        list at random time.
        """
        while True:
            integer = random.randint(0, 256)
            self.queue.put(integer)
            print('%d put to queue by %s' % (integer, self.name))
            time.sleep(1)

class Consumer(threading.Thread):
    """
    Consumes random integers from a list
    """

    def __init__(self, queue):
        """
        Constructor.
        #param integers list of integers
        #param queue queue synchronization object
        """
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        """
        Thread run method. Consumes integers from list
        """
        while True:
            integer = self.queue.get()
            print('%d popped from list by %s' % (integer, self.name))
            self.queue.task_done()

def main():
    q = queue.Queue()
    t1 = Producer(q)
    t2 = Consumer(q)
    t1.start()
    t2.start()
    t1.join()
    t2.join()

if __name__ == '__main__':
    main()
Output:
210 put to queue by Thread-1
210 popped from list by Thread-2
Traceback (most recent call last):
  File "/Users/abc/PycharmProjects/untitled1/ssid.py", line 74, in <module>
    main()
  File "/Users/abc/PycharmProjects/untitled1/ssid.py", line 69, in main
    t1.join()
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 1056, in join
    self._wait_for_tstate_lock()
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/threading.py", line 1072, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
KeyboardInterrupt
244 put to queue by Thread-1
244 popped from list by Thread-2
85 put to queue by Thread-1
85 popped from list by Thread-2
160 put to queue by Thread-1
160 popped from list by Thread-2
It's because only the main thread gets stopped by the KeyboardInterrupt. You can observe this by letting your child threads print threading.enumerate(), which returns a list of all live threads, including the main thread.
import time
import queue
import threading
import random

class Producer(threading.Thread):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        while True:
            integer = random.randint(0, 256)
            self.queue.put(integer)
            print(f'{integer} put to queue by {self.name} '
                  f'threads: {threading.enumerate()}')
            time.sleep(1)

class Consumer(threading.Thread):
    def __init__(self, queue):
        super().__init__()
        self.queue = queue

    def run(self):
        while True:
            integer = self.queue.get()
            print(f'{integer} popped from list by {self.name} '
                  f'threads:{threading.enumerate()}')
            self.queue.task_done()

def main():
    q = queue.Queue()
    t1 = Producer(q)
    t2 = Consumer(q)
    # t1.daemon = True
    # t2.daemon = True
    t1.start()
    t2.start()
    t1.join()
    t2.join()

if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        print('got KeyboardInterrupt')
Example output with KeyboardInterrupt. Note that MainThread is listed as 'stopped' after the KeyboardInterrupt:
97 put to queue by Thread-1 threads: [<_MainThread(MainThread, started
139810293606208)>, <Producer(Thread-1, started 139810250913536)>,
<Consumer(Thread-2, started 139810242520832)>]
97 popped from list by Thread-2 threads:[<_MainThread(MainThread, started
139810293606208)>, <Producer(Thread-1, started 139810250913536)>,
<Consumer(Thread-2, started 139810242520832)>]
got KeyboardInterrupt
92 put to queue by Thread-1 threads: [<_MainThread(MainThread, stopped
139810293606208)>, <Producer(Thread-1, started 139810250913536)>,
<Consumer(Thread-2, started 139810242520832)>]
92 popped from list by Thread-2 threads:[<_MainThread(MainThread, stopped
139810293606208)>, <Producer(Thread-1, started 139810250913536)>,
<Consumer(Thread-2, started 139810242520832)>]
You could make the child threads daemons to let them exit with the main thread, but that should only be considered if your threads don't hold any resources. From the threading docs:
Note: Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
The better way would be to catch the KeyboardInterrupt as in the code above and send a sentinel value over the queue to the child threads to let them know they should finish, allowing them to clean up before exiting.
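A minimal sketch of that approach, with some assumptions of my own: None is used as the sentinel (so it must never be a real item), a threading.Event (the mechanism the docs quote suggests) tells the producer to stop, and the producer emits a counter instead of random integers so the result is predictable. The max_items and interval parameters are illustrative additions that make it testable without pressing Ctrl-C.

```python
import queue
import threading

SENTINEL = None  # assumption: None is never a real work item


class Producer(threading.Thread):
    def __init__(self, q, stop_event, max_items=None, interval=1.0):
        super().__init__()
        self.q = q
        self.stop_event = stop_event
        self.max_items = max_items  # optional limit, handy for testing
        self.interval = interval

    def run(self):
        produced = 0
        while not self.stop_event.is_set():
            if self.max_items is not None and produced >= self.max_items:
                break
            self.q.put(produced)
            produced += 1
            # Like time.sleep(), but returns early as soon as the event is set.
            self.stop_event.wait(self.interval)


class Consumer(threading.Thread):
    def __init__(self, q):
        super().__init__()
        self.q = q
        self.consumed = []

    def run(self):
        while True:
            item = self.q.get()
            if item is SENTINEL:
                break  # any clean-up would go here
            print(f'{item} popped from queue by {self.name}')
            self.consumed.append(item)


def main(max_items=None, interval=1.0):
    q = queue.Queue()
    stop_event = threading.Event()
    producer = Producer(q, stop_event, max_items, interval)
    consumer = Consumer(q)
    producer.start()
    consumer.start()
    try:
        # On POSIX, Ctrl-C raises KeyboardInterrupt here (as in the traceback above).
        producer.join()
    except KeyboardInterrupt:
        print('got KeyboardInterrupt')
    finally:
        stop_event.set()  # 1. ask the producer to stop ...
        producer.join()
        q.put(SENTINEL)   # 2. ... then tell the consumer nothing more is coming
        consumer.join()
    return consumer.consumed


if __name__ == '__main__':
    main()
```

The sentinel is only put after the producer has been joined, so the consumer still drains every item that was produced before it exits.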
I use Queue to provide tasks that threads can work on. After all the work in the Queue is done, I see that the threads are still alive, while I expected them to be released. Here is my code. You can see from the console output that the number of active threads keeps increasing with each batch of tasks. How can I release the threads after a batch of work is done?
import threading
import time
from Queue import Queue

class ThreadWorker(threading.Thread):
    def __init__(self, task_queue):
        threading.Thread.__init__(self)
        self.task_queue = task_queue

    def run(self):
        while True:
            work = self.task_queue.get()
            #do some work
            # do_work(work)
            time.sleep(0.1)
            self.task_queue.task_done()

def get_batch_work_done(works):
    task_queue = Queue()
    for _ in range(5):
        t = ThreadWorker(task_queue)
        t.setDaemon(True)
        t.start()
    for work in range(works):
        task_queue.put(work)
    task_queue.join()
    print 'get batch work done'
    print 'active threads count is {}'.format(threading.activeCount())

if __name__ == '__main__':
    for work_number in range(3):
        print 'start with {}'.format(work_number)
        get_batch_work_done(work_number)
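Do a blocking read with a timeout in a loop, and use the Empty exception to terminate. Keep calling task_done() for each item, otherwise task_queue.join() never returns:
from Queue import Empty  # the exception lives in the Queue module, not on the Queue class

def run(self):
    try:
        while True:
            work = self.task_queue.get(True, 0.1)
            #do some work
            # do_work(work)
            self.task_queue.task_done()
    except Empty:
        print "goodbye"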
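A different approach, sketched in Python 3 and not taken from the answer above: instead of relying on a timeout, put one None sentinel per worker after the batch and join the threads. A worker then can't quit early just because its first task took more than 0.1 s to arrive, and get_batch_work_done only returns once its threads have actually exited. The worker function, NUM_WORKERS and the results list are illustrative names.

```python
import queue
import threading
import time

NUM_WORKERS = 5


def worker(task_queue, results):
    # Loop until the None sentinel arrives; returning ends the thread.
    for work in iter(task_queue.get, None):
        time.sleep(0.01)  # placeholder for do_work(work)
        results.append(work)


def get_batch_work_done(works):
    task_queue = queue.Queue()
    results = []
    threads = [threading.Thread(target=worker, args=(task_queue, results))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for work in range(works):
        task_queue.put(work)
    for _ in threads:
        task_queue.put(None)  # one sentinel per worker, after the real work
    for t in threads:
        t.join()  # every worker has returned, so no threads are left behind
    return results


if __name__ == '__main__':
    for work_number in range(3):
        print('start with {}'.format(work_number))
        get_batch_work_done(work_number)
        print('active threads count is {}'.format(threading.active_count()))
```

Because the threads are joined, there is no need for daemon threads or task_queue.join(), and the active thread count stays constant across batches.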
In the program below I have posted 5 jobs to the queue, but have created only 3 threads. When I run the program, only 3 jobs are completed. How am I supposed to complete all 5 jobs with only 3 threads? Is there a way to make a thread that has completed its job take the next job?
import time
import Queue
import threading

class worker(threading.Thread):
    def __init__(self,qu):
        threading.Thread.__init__(self)
        self.que=qu

    def run(self):
        print "Going to sleep.."
        time.sleep(self.que.get())
        print "Slept .."
        self.que.task_done()

q = Queue.Queue()
for j in range(3):
    work = worker(q);
    work.setDaemon(True)
    work.start()
for i in range(5):
    q.put(1)
q.join()
print "done!!"
You need to have your worker threads run in a loop. You can use a sentinel value (like None or a custom class) to tell the workers to shut down after you've put all your actual work items in the queue:
import time
import Queue
import threading

class worker(threading.Thread):
    def __init__(self,qu):
        threading.Thread.__init__(self)
        self.que=qu

    def run(self):
        # This will call self.que.get() until None is returned, at which point the loop will break.
        for item in iter(self.que.get, None):
            print "Going to sleep.."
            time.sleep(item)
            print "Slept .."
            self.que.task_done()
        self.que.task_done()

q = Queue.Queue()
for j in range(3):
    work = worker(q);
    work.setDaemon(True)
    work.start()
for i in range(5):
    q.put(1)
for i in range(3): # Shut down all the workers
    q.put(None)
q.join()
print "done!!"
Another option would be to use a multiprocessing.dummy.Pool, which is a thread pool that Python manages for you:
import time
from multiprocessing.dummy import Pool

def run(i):
    print "Going to sleep..."
    time.sleep(i)
    print "Slept .."

p = Pool(3) # 3 threads in the pool
p.map(run, range(5)) # Calls run(i) for each element i in range(5)
p.close()
p.join()
print "done!!"
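In Python 3, the standard library's concurrent.futures.ThreadPoolExecutor offers a similar managed thread pool. A minimal sketch (run_all is an illustrative name, and run now returns its argument only so there is something to collect):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run(i):
    print("Going to sleep...")
    time.sleep(i)
    print("Slept ..")
    return i


def run_all(jobs, workers=3):
    # At most `workers` jobs run at once; a thread that finishes a job
    # picks up the next one automatically.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(run, jobs))


if __name__ == "__main__":
    print(run_all([1] * 5))
    print("done!!")
```

executor.map returns results in the order of the inputs, and leaving the with block waits for all submitted jobs to finish.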
# multi-processes
from multiprocessing import Process, Queue

class Worker(object):
    def __init__(self, queue):
        self.queue = queue
        self.process_num = 10  # <------------ 10 processes
        self.count = 0

    def start(self):
        for i in range(self.process_num):
            p = Process(target = self.run)
            p.start()
            p.join()

    def run(self):
        while True:
            self.count += 1
            user = self.queue.get()
            # do something not so fast like time.sleep(1)
            print self.count
            if self.queue.empty():
                break
I use Worker(queue).start() to start the program, but the output is not as fast as I expected (it seems only one process is running).
Is there a problem with my code?
Yes: you're only running one process at a time, because you wait for each process to terminate before starting the next one:
def start(self):
    for i in range(self.process_num):
        p = Process(target = self.run)
        p.start()  # <-- starts a new process
        p.join()   # <-- waits for the process to terminate
In other words, you're starting 10 processes, but the second one won't start until the first one terminates and so on.
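A sketch of the start-all-then-join-all pattern, written for Python 3 and with assumptions of my own: module-level functions instead of the Worker class, one None sentinel per process instead of the queue.empty() check (which can race: another process may take the last item between empty() and the next get(), leaving a process blocked on get() forever), and a results queue to collect per-process counts, since each process only increments its own copy of self.count. run, start and the counts are illustrative.

```python
import time
from multiprocessing import Process, Queue


def run(queue, worker_id, results):
    # Loop until this process receives its None sentinel.
    count = 0  # per-process counter: each process has its own copy anyway
    for user in iter(queue.get, None):
        time.sleep(0.05)  # placeholder for the slow per-user work
        count += 1
    results.put((worker_id, count))


def start(users, process_num=10):
    queue, results = Queue(), Queue()
    for user in users:
        queue.put(user)
    for _ in range(process_num):
        queue.put(None)  # one sentinel per process
    processes = [Process(target=run, args=(queue, i, results))
                 for i in range(process_num)]
    for p in processes:
        p.start()  # start every process first...
    # Read the results before joining, so no child blocks while flushing its queue.
    counts = dict(results.get() for _ in processes)
    for p in processes:
        p.join()   # ...and only then wait for them
    return counts


if __name__ == '__main__':
    t0 = time.perf_counter()
    print(start(range(40)))
    print('took %.2f s' % (time.perf_counter() - t0))
```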
For what you're trying to do, it may be better not to manage Process instances manually and to use a process pool (multiprocessing.Pool) instead.
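A minimal process-pool sketch, assuming the per-user work can be written as a module-level function (handle_user is a hypothetical placeholder) that returns a result:

```python
import time
from multiprocessing import Pool


def handle_user(user):
    time.sleep(0.05)  # placeholder for the slow per-user work
    return user


def process_all(users, processes=10):
    # The pool starts `processes` workers once and hands each worker a new
    # item as soon as it finishes the previous one.
    with Pool(processes) as pool:
        return pool.map(handle_user, users)


if __name__ == '__main__':
    print(process_all(range(40)))
```

Pool.map returns the results in input order and blocks until all of them are ready, so no sentinels or manual joins are needed.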