Python multithread using queue - the program gets blocked forever

I am not sure which part of my program is wrong. It blocks forever at the join() calls on the two queues. However, if I remove the two join() calls, the program does not work at all.
import threading
import Queue

queue = Queue.Queue()
out_queue = Queue.Queue()
fruits = ['apple', 'strawberry', 'banana', 'peach', 'rockmelon']

class WorkerThread(threading.Thread):
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        print 'run'
        while not self.queue.empty():
            name = self.queue.get()
            self.out_queue.put(name)
            self.queue.task_done()

def main():
    print 'start'
    for i in xrange(5):
        t = WorkerThread(queue, out_queue)
        t.setDaemon(True)
        t.start()

    # populate the queue
    for fruit in fruits:
        queue.put(fruit)

    queue.join()
    out_queue.join()

    while not out_queue.empty():
        print out_queue.get()
    print 'end'

if __name__ == '__main__':
    main()
Thanks in advance.

You're calling out_queue.join(), which waits until out_queue.task_done() has been called the same number of times out_queue.put() has been called. However, you never call out_queue.task_done(). The simplest fix is to not call out_queue.join() in the first place.
EDIT: you are also populating queue after you start your WorkerThreads. Since each worker exits as soon as it sees an empty queue, there is a chance the worker threads will run and finish before you've had a chance to insert all your elements. Inserting them before starting the worker threads fixes this.
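Putting both fixes together, a corrected main() might look like this (a sketch of just the two changes above):

def main():
    print 'start'
    # populate the queue BEFORE starting the workers, so they cannot
    # see an empty queue and exit early
    for fruit in fruits:
        queue.put(fruit)

    for i in xrange(5):
        t = WorkerThread(queue, out_queue)
        t.setDaemon(True)
        t.start()

    # out_queue.join() is dropped: nothing ever calls its task_done()
    queue.join()

    while not out_queue.empty():
        print out_queue.get()
    print 'end'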

Related

how to add more items to a multiprocessing queue while the script is in motion

I am trying to learn multiprocessing with queues.
What I want to do is figure out when/how to "add more items to the queue" while the script is in motion.
The script below is the baseline I am working from:
import multiprocessing

class MyFancyClass:
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('Doing something fancy in {} for {}!'.format(
            proc_name, self.name))

def worker(q):
    obj = q.get()
    obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    queue.put(MyFancyClass('Fancy Dan'))
    queue.put(MyFancyClass('Frankie'))
    print(queue.qsize())
    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
The 'Fancy Dan' put works, but the 'Frankie' one doesn't. I am able to confirm that Frankie does make it into the queue. I need a spot where I can check for more items and insert them into the queue as needed. If no more items exist, then close the queue once the existing items are cleared.
How do I do this?
Thanks!
Let's make it clear:
the target function worker(q) is called just once in the above scheme. On that first call the function suspends, waiting for the blocking q.get() to return. It gets the MyFancyClass('Fancy Dan') instance from the queue, invokes its do_something method, and finishes.
MyFancyClass('Frankie') is put into the queue but never reaches the Process, because the process's target function is already done.
One way around this is to keep reading from the queue in a loop and wait for a marker item that signals the queue is no longer in use. Let's use a None value.
import multiprocessing

class MyFancyClass:
    def __init__(self, name):
        self.name = name

    def do_something(self):
        proc_name = multiprocessing.current_process().name
        print('Doing something fancy in {} for {}!'.format(proc_name, self.name))

def worker(q):
    while True:
        obj = q.get()
        if obj is None:  # sentinel: queue usage is over, stop the worker
            break
        obj.do_something()

if __name__ == '__main__':
    queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=worker, args=(queue,))
    p.start()
    queue.put(MyFancyClass('Fancy Dan'))
    queue.put(MyFancyClass('Frankie'))
    # print(queue.qsize())
    queue.put(None)
    # Wait for the worker to finish
    queue.close()
    queue.join_thread()
    p.join()
The output:
Doing something fancy in Process-1 for Fancy Dan!
Doing something fancy in Process-1 for Frankie!
One way you could do this is by changing worker to
def worker(q):
    while not q.empty():
        obj = q.get()
        obj.do_something()
The problem with your original code is that worker returns after doing work on a single item from the queue. You need some sort of looping logic.
This solution is imperfect, because empty() is not reliable for multiprocessing queues. It will also fail if the queue becomes empty before you add more items to it (the worker will simply return).
I would suggest using a ProcessPoolExecutor instead.
Its submit method is pretty close to what you're looking for.
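A minimal sketch of that suggestion (the worker function here is illustrative): submit() lets you hand the pool new work at any time, so there is no queue to drain or close yourself.

from concurrent.futures import ProcessPoolExecutor

def do_something(name):
    print('Doing something fancy for {}!'.format(name))

if __name__ == '__main__':
    with ProcessPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(do_something, 'Fancy Dan'),
                   pool.submit(do_something, 'Frankie')]
        # more work can be submitted here whenever it appears
        for f in futures:
            f.result()  # blocks until done; re-raises worker exceptions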

Python Queue.join()

Even if I do not set the threads as daemons, shouldn't the program exit by itself once queue.join() completes and unblocks?
#!/usr/bin/python
import Queue
import threading
import time

class workerthread(threading.Thread):
    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        print 'In Worker Class'
        while True:
            counter = self.queue.get()
            print 'Going to Sleep'
            time.sleep(counter)
            print ' I am up!'
            self.queue.task_done()

queue = Queue.Queue()
for i in range(10):
    worker = workerthread(queue)
    print 'Going to Thread!'
    worker.daemon = True
    worker.start()
for j in range(10):
    queue.put(j)
queue.join()
When you call queue.join() in the main thread, all it does is block the main thread until the workers have processed everything in the queue. It does not stop the worker threads, which keep executing their infinite loops.
If the worker threads are non-daemon, their continuing execution prevents the program from exiting, irrespective of whether the main thread has finished.
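If you want non-daemon workers and a clean exit, one common pattern is a per-worker sentinel (a sketch adapted to the code above, not the only fix): after the real items are processed, put one None per thread and have each worker break when it sees it.

def run(self):
    while True:
        counter = self.queue.get()
        if counter is None:  # sentinel: time to exit the loop
            self.queue.task_done()
            break
        time.sleep(counter)
        self.queue.task_done()

# ... after starting the workers and putting the real items:
queue.join()             # wait until all real work is done
for i in range(10):
    queue.put(None)      # one sentinel per worker thread
queue.join()             # wait until the sentinels are consumed, then exit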
I encountered this situation too: everything in the queue had been processed, but the main thread stayed blocked. Here is the code block (note the q.not_empty pitfall, commented below).
import queue
from time import sleep

def test04():
    q = queue.Queue(10)
    for x in range(10):
        q.put(x)
    # Pitfall: q.not_empty is a Condition object and is always truthy, so
    # this loop never exits; once the queue is empty the next q.get()
    # blocks forever.
    while q.not_empty:
        print('content--->', q.get())
        sleep(1)
        re = q.task_done()  # task_done() returns None
        print('state--->', re, '\n')
    q.join()
    print('over \n')

test04()

Why does this http thread pool die (join), but keep functioning?

The following code takes an initial string ('a', 'b', or 'c'), and the two thread types pass it back and forth, appending 'W' and 'H' to it repeatedly to mark which of the worker thread and the http thread last handled the string.
The code is a simple test of what I eventually want to accomplish: the http thread pool will pull web pages, and the worker thread will add info to a db and then give the http threads more URLs to pull. They just go back and forth. I want both thread pools and queues to stay alive unless BOTH are empty simultaneously. (There are cases where one pool will temporarily run out of things to do, and I don't want it to join, because its companion thread pool will probably be adding more work to its queue soon.)
In the following code, the http thread pool runs out of things to do almost immediately and then joins. But you'll notice that the threads keep functioning.
Why does it do this?
And how do I make it so that neither queue can join until BOTH are simultaneously empty?
from queue import Queue
import threading
import time

class http(threading.Thread):
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        while True:
            row = self.queue.get()
            print(row)
            self.out_queue.put(row + 'H')
            self.queue.task_done()

class worker(threading.Thread):
    def __init__(self, queue, out_queue):
        threading.Thread.__init__(self)
        self.queue = queue
        self.out_queue = out_queue

    def run(self):
        while True:
            time.sleep(1)
            row = self.out_queue.get()
            self.queue.put(row + 'W')
            self.out_queue.task_done()

URL_THREAD_COUNT = 3
rows = [chr(x) for x in range(97, 100)]

def main():
    queue = Queue()
    out_queue = Queue()

    # spawn a pool of threads, and pass them queue instance
    for i in range(URL_THREAD_COUNT):
        t = http(queue, out_queue)
        t.daemon = True
        t.start()

    # populate queue with data
    for row in rows:
        queue.put(row)

    # spawn worker thread
    dt = worker(queue, out_queue)
    dt.daemon = True
    dt.start()

    # time.sleep(5)
    # wait for queues
    queue.join()
    print('EXIT http')
    out_queue.join()
    print('EXIT worker')

start = time.time()
main()
print("Elapsed Time: %s" % (time.time() - start))
"joining" a queue waits until the queue is empty. If worker finishes processing some out_queue messages before the other threads can add more messages, the outer out_queue.join thinks you are done. You may want to add a control message that tells the threads when their work is done so that they can exit, and call thread.join() for them all instead. That will mean keeping a list of threads created in the for loop instead of just abandoning them.

why queue is showing incorrect data?

I am using a queue to pass URLs to download; however, the data looks corrupted when it is received in the thread:
class ThreadedFetch(threading.Thread):
    """ docstring for ThreadedFetch
    """
    def __init__(self, queue, out_queue):
        super(ThreadedFetch, self).__init__()
        self.queue = queue
        self.outQueue = out_queue

    def run(self):
        items = self.queue.get()
        print items

def main():
    for i in xrange(len(args.urls)):
        t = ThreadedFetch(queue, out_queue)
        t.daemon = True
        t.start()

    # populate queue with data
    for url, saveTo in urls_saveTo.iteritems():
        queue.put([url, saveTo, split])

    # wait on the queue until everything has been processed
    queue.join()
The output resulting from run() when I execute main() is:
['http://www.nasa.gov/images/content/607800main_kepler1200_1600-1200.jpg', ['http://broadcast.lds.org/churchmusic/MP3/1/2/nowords/271.mp3', None, 3None, 3]
]
while the expected output is:
['http://www.nasa.gov/images/content/607800main_kepler1200_1600-1200.jpg', None, 3]
['http://broadcast.lds.org/churchmusic/MP3/1/2/nowords/271.mp3', None, 3]
All of the threads print their data at once, so the results are interleaved. If you want threads to display data in production code, you need some way for them to cooperate when writing. One option is a global lock shared by all screen writers; another is the logging module.
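For the logging option, a minimal sketch (the format string is illustrative): logging handlers lock internally, so threads can write concurrently without their output interleaving.

import logging

logging.basicConfig(level=logging.INFO, format='%(threadName)s %(message)s')

# inside ThreadedFetch.run(), instead of a bare print:
def run(self):
    items = self.queue.get()
    logging.info('%s', items)  # thread-safe; no explicit lock needed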
Populate your queue before you start your threads. Add a lock for I/O (for the reason @tdelaney gives: the threads interleave their writes to stdout and the results appear broken). And modify your run method to this:
lock = threading.Lock()

def run(self):
    while True:
        try:
            items = self.queue.get_nowait()
            with lock:
                print items
        except Queue.Empty:
            break
        except Exception as err:
            pass
        self.queue.task_done()
You might also find it easier to do this with concurrent.futures. The documentation has a solid example of calling a method that returns a value in a thread pool.
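A minimal sketch of the concurrent.futures approach (the fetch function and URLs are illustrative): results are collected and printed in the main thread, so no lock is needed.

from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(url, save_to, split):
    # ... download url to save_to here ...
    return [url, save_to, split]

urls_save_to = {'http://example.com/a.jpg': None, 'http://example.com/b.mp3': None}

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(fetch, url, save_to, 3)
               for url, save_to in urls_save_to.items()]
    for future in as_completed(futures):
        print(future.result())  # one finished result at a time, no interleaving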

Checking on a thread / remove from list

I have a thread which extends Thread. The code looks a little like this;
class MyThread(Thread):
    def run(self):
        # Do stuff

my_threads = []
while has_jobs() and len(my_threads) < 5:
    new_thread = MyThread(next_job_details())
    new_thread.start()  # start(), not run(), so the thread runs concurrently
    my_threads.append(new_thread)

for my_thread in my_threads:
    my_thread.join()

# Do stuff
So here in my pseudo code I check to see if there are any jobs (in a db, etc.), and if there are fewer than 5 threads running, I create new ones.
From there I need to check over my threads, and this is where I get stuck. I can use .join(), but my understanding is that it waits until the thread is finished, so if the first thread it checks is still in progress, it waits until that one is done, even if the other threads are already finished...
So is there a way to check if a thread is done, and remove it from the list if so? E.g.:
for my_thread in my_threads:
    if my_thread.done():
        # process results
        del my_threads[my_thread]  # ?? will that work...
As TokenMacGuy says, you should use thread.is_alive() to check whether a thread is still running. To remove threads that are no longer running from your list, you can use a list comprehension:
# assumes each thread is created with a `handled` attribute set to False
for t in my_threads:
    if not t.is_alive():
        # get results from thread
        t.handled = True
my_threads = [t for t in my_threads if not t.handled]
This avoids the problem of removing items from a list while iterating over it.
mythreads = threading.enumerate()
threading.enumerate() returns a list of all Thread objects that are still alive:
https://docs.python.org/3.6/library/threading.html
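For example, a small illustrative filter that drops the main thread from that list to see which workers are still running:

import threading

workers = [t for t in threading.enumerate()
           if t is not threading.main_thread()]
print(workers)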
You need to call thread.is_alive() (spelled isAlive() in older versions of Python) to find out whether the thread is still running.
The answer has been covered, but for simplicity...
# To filter out finished threads
threads = [t for t in threads if t.is_alive()]
# Same thing but for QThreads (if you are using PyQt)
threads = [t for t in threads if t.isRunning()]
A better way is to use the Queue class:
http://docs.python.org/library/queue.html
Look at the good example code at the bottom of the documentation page:
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()  # block until all tasks are done
An easy, thread-safe way to check whether a thread has finished is with pyrvsignal. Install it first:
pip install pyrvsignal
Example:
import time
from threading import Thread
from pyrvsignal import Signal

class MyThread(Thread):
    started = Signal()
    finished = Signal()

    def __init__(self, target, args):
        self.target = target
        self.args = args
        Thread.__init__(self)

    def run(self) -> None:
        self.started.emit()
        self.target(*self.args)
        self.finished.emit()

def do_my_work(details):
    print(f"Doing work: {details}")
    time.sleep(10)

def started_work():
    print("Started work")

def finished_work():
    print("Work finished")

thread = MyThread(target=do_my_work, args=("testing",))
thread.started.connect(started_work)
thread.finished.connect(finished_work)
thread.start()