Why am I unable to join this thread in Python?

I am writing a multithreading class. The class has a parallel_process() function that is overridden with the parallel task. The data to be processed is put in the queue. The worker() function in each thread keeps calling parallel_process() until the queue is empty. Results are put in the results Queue object. The class definition is:
import threading
try:
    from Queue import Queue
except ImportError:
    from queue import Queue

class Parallel:
    def __init__(self, pkgs, common=None, nthreads=1):
        self.nthreads = nthreads
        self.threads = []
        self.queue = Queue()
        self.results = Queue()
        self.common = common
        for pkg in pkgs:
            self.queue.put(pkg)

    def parallel_process(self, pkg, common):
        pass

    def worker(self):
        while not self.queue.empty():
            pkg = self.queue.get()
            self.results.put(self.parallel_process(pkg, self.common))
            self.queue.task_done()
        return

    def start(self):
        for i in range(self.nthreads):
            t = threading.Thread(target=self.worker)
            t.daemon = False
            t.start()
            self.threads.append(t)

    def wait_for_threads(self):
        print('Waiting on queue to empty...')
        self.queue.join()
        print('Queue processed. Joining threads...')
        for t in self.threads:
            t.join()
            print('...Thread joined.')

    def get_results(self):
        results = []
        print('Obtaining results...')
        while not self.results.empty():
            results.append(self.results.get())
        return results
I use it to create a parallel task:
class myParallel(Parallel):  # return square of numbers in a list
    def parallel_process(self, pkg, common):
        return pkg**2

p = myParallel(range(50), nthreads=4)
p.start()
p.wait_for_threads()
r = p.get_results()
print('FINISHED')
However, not all threads join every time the code is run. Sometimes only 2 join, sometimes none do. I do not think I am blocking the threads from finishing. What reason could there be for join() not to work here?

This statement may lead to errors:
while not self.queue.empty():
    pkg = self.queue.get()
With multiple threads pulling items from the queue, there's no guarantee that self.queue.get() will return an item, even if you check that the queue is not empty beforehand. Here is a possible scenario:
Thread 1 checks the queue, finds it is not empty, and enters the while loop.
Control passes to Thread 2, which also checks the queue, finds it is not empty, and enters the while loop. Thread 2 gets an item from the queue. The queue is now empty.
Control passes back to Thread 1, which calls self.queue.get() on the now-empty queue. Because get() blocks by default, Thread 1 waits forever for an item that never arrives, so the thread never finishes and join() never returns.
Instead, fetch items with a non-blocking get inside a try/except, and leave the loop once the queue is empty:
while True:
    try:
        pkg = self.queue.get_nowait()
    except Empty:  # from queue import Empty (from Queue import Empty on Python 2)
        break

@Brendan Abel identified the cause. I'd like to suggest a different solution: queue.join() is usually a Bad Idea too. Instead, create a unique value to use as a sentinel:
class Parallel:
    _sentinel = object()
At the end of __init__(), add one sentinel to the queue for each thread:
        for i in range(nthreads):
            self.queue.put(self._sentinel)
Change the start of worker() like so:
        while True:
            pkg = self.queue.get()
            if pkg is self._sentinel:
                break
By the construction of the queue, it won't be empty until each thread has seen its sentinel value, so there's no need to mess with the unpredictable queue.empty() or queue.qsize().
Also remove the queue.join() and queue.task_done() cruft.
This will give you reliable code that's easy to modify for fancier scenarios. For example, if you want to add more work items while the threads are running, fine - just write another method to say "I'm done adding work items now", and move the loop adding sentinels into that.
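Putting the pieces above together, here is a minimal sketch of the sentinel-based version of the class. It assumes Python 3; the subclass name Squares is only for illustration, and parallel_process() raising NotImplementedError in the base class is my choice, not part of the original code.

```python
import threading
from queue import Queue


class Parallel:
    _sentinel = object()  # unique marker meaning "no more work"

    def __init__(self, pkgs, common=None, nthreads=1):
        self.nthreads = nthreads
        self.threads = []
        self.queue = Queue()
        self.results = Queue()
        self.common = common
        for pkg in pkgs:
            self.queue.put(pkg)
        # One sentinel per worker, queued after all real work items.
        for _ in range(nthreads):
            self.queue.put(self._sentinel)

    def parallel_process(self, pkg, common):
        raise NotImplementedError  # assumption: subclasses must override this

    def worker(self):
        while True:
            pkg = self.queue.get()  # blocking is safe: a sentinel always arrives
            if pkg is self._sentinel:
                break
            self.results.put(self.parallel_process(pkg, self.common))

    def start(self):
        for _ in range(self.nthreads):
            t = threading.Thread(target=self.worker)
            t.start()
            self.threads.append(t)

    def wait_for_threads(self):
        for t in self.threads:
            t.join()

    def get_results(self):
        # Draining with empty() is fine here because all workers have exited.
        results = []
        while not self.results.empty():
            results.append(self.results.get())
        return results


class Squares(Parallel):  # hypothetical subclass for the demo
    def parallel_process(self, pkg, common):
        return pkg ** 2


p = Squares(range(50), nthreads=4)
p.start()
p.wait_for_threads()
squares = sorted(p.get_results())  # completion order is not deterministic
```

Each worker consumes exactly one sentinel and then exits, so join() on the threads is enough and queue.join()/task_done() are not needed.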

Related

Ensuring a python queue that can be populated by multiple threads will always be cleared without polling

I have the code below, which shows how a queue would always be cleared even with multiple threads adding to it. It uses recursion, but a while loop could work as well. Is this bad practice, or is there a scenario where the queue might hold an object that won't get pulled until something else is added to the queue?
The primary purpose is to have a queue that ensures order of execution without the need to continually poll or block with q.get().
import queue
import threading

lock = threading.RLock()
q = queue.Queue()

def execute():
    with lock:
        if not q.empty():
            text = q.get()
            print(text)
            execute()

def add_to_queue(text):
    q.put(text)
    execute()

# Assume multiple threads can call add to queue
add_to_queue("Hello")
This is one solution that uses a timeout on q.get(): one thread pushes to the queue and another reads from it. You could have multiple readers and writers.
import queue
import threading

q = queue.Queue()

def read():
    try:
        while True:
            text = q.get(timeout=1)
            print(text)
    except queue.Empty:
        print("exiting")

def write():
    q.put("Hello")
    q.put("There")
    q.put("My")
    q.put("Friend")

writer = threading.Thread(target=write)
reader = threading.Thread(target=read)
writer.start()
reader.start()
reader.join()
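If the timeout is undesirable, another option is a single consumer thread that blocks on q.get() and stops on a sentinel. This is only a sketch of that idea: the names _STOP, consumer, produce and handled are made up for the example, and it assumes there is a clear point at which all producers are finished.

```python
import queue
import threading

q = queue.Queue()
_STOP = object()  # hypothetical sentinel telling the consumer to exit
handled = []      # items in the order the consumer processed them


def consumer():
    while True:
        text = q.get()  # blocks without busy-waiting until an item arrives
        if text is _STOP:
            break
        handled.append(text)


def produce(prefix):
    for i in range(3):
        q.put(f"{prefix}-{i}")


reader = threading.Thread(target=consumer)
reader.start()

writers = [threading.Thread(target=produce, args=(name,)) for name in ("a", "b")]
for w in writers:
    w.start()
for w in writers:
    w.join()

q.put(_STOP)   # queued after every producer has finished
reader.join()
```

Because only one thread consumes, items are handled in the order they entered the queue, and no lock or recursion is needed around the get.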

Thread queue is running serially, not in parallel?

I'm making remote API calls using threads, using no join so that the program could make the next API call without waiting for the last to complete.
Like so:
def run_single_thread_no_join(function, args):
    thread = Thread(target=function, args=(args,))
    thread.start()
    return
The problem was that I needed to know when all API calls had completed, so I moved to code that uses a queue and join().
The threads seem to run serially now.
I can't figure out how to get the join to work so that the threads execute in parallel.
What am I doing wrong?
def run_que_block(methods_list, num_worker_threads=10):
    '''
    Runs methods on threads. Stores method returns in a list. Then outputs that list
    after all methods in the list have been completed.
    :param methods_list: example ((method name, args), (method_2, args), (method_3, args)
    :param num_worker_threads: The number of threads to use in the block.
    :return: The full list of returns from each method.
    '''
    method_returns = []
    # log = StandardLogger(logger_name='run_que_block')
    # lock to serialize console output
    lock = threading.Lock()

    def _output(item):
        # Make sure the whole print completes or threads can mix up output in one line.
        with lock:
            if item:
                print(item)
            msg = threading.current_thread().name, item
            # log.log_debug(msg)
        return

    # The worker thread pulls an item from the queue and processes it
    def _worker():
        while True:
            item = q.get()
            if item is None:
                break
            method_returns.append(item)
            _output(item)
            q.task_done()

    # Create the queue and thread pool.
    q = Queue()
    threads = []
    # starts worker threads.
    for i in range(num_worker_threads):
        t = threading.Thread(target=_worker)
        t.daemon = True  # thread dies when main thread (only non-daemon thread) exits.
        t.start()
        threads.append(t)
    for method in methods_list:
        q.put(method[0](*method[1]))
    # block until all tasks are done
    q.join()
    # stop workers
    for i in range(num_worker_threads):
        q.put(None)
    for t in threads:
        t.join()
    return method_returns
You're doing all the work in the main thread:
for method in methods_list:
    q.put(method[0](*method[1]))
Assuming each entry in methods_list is a callable and a sequence of arguments for it, you did all the work in the main thread, then put the result from each function call in the queue, which doesn't allow any parallelization aside from printing (which is generally not a big enough cost to justify thread/queue overhead).
Presumably, you want the threads to do the work for each function, so change that loop to:
for method in methods_list:
    q.put(method)  # Don't call it, queue it to be called in worker
and change the _worker function so it calls the function that does the work in the thread, checking for the None stop sentinel before unpacking:
def _worker():
    while True:
        task = q.get()
        if task is None:  # stop sentinel queued by the main thread
            break
        method, args = task  # Extract and unpack callable and arguments
        item = method(*args)  # Call callable with provided args and store result
        method_returns.append(item)
        _output(item)
        q.task_done()
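For reference, a compact runnable sketch of the corrected approach is below. The names run_queue_block and slow_square are hypothetical, the logging and printing from the original are left out, and the try/finally around the call is my addition so that q.join() still returns if a method raises.

```python
import threading
import time
from queue import Queue


def run_queue_block(methods_list, num_worker_threads=10):
    results = []
    results_lock = threading.Lock()
    q = Queue()

    def _worker():
        while True:
            task = q.get()
            if task is None:  # sentinel: no more work for this worker
                break
            method, args = task
            try:
                value = method(*args)  # the call happens in the worker thread
                with results_lock:
                    results.append(value)
            finally:
                q.task_done()  # always mark the task done so q.join() returns

    threads = [threading.Thread(target=_worker) for _ in range(num_worker_threads)]
    for t in threads:
        t.start()
    for task in methods_list:
        q.put(task)  # queue the (callable, args) pair; do not call it here
    q.join()
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return results


def slow_square(x):
    time.sleep(0.2)  # stand-in for a remote API call
    return x * x


start = time.perf_counter()
out = run_queue_block([(slow_square, (n,)) for n in range(5)], num_worker_threads=5)
elapsed = time.perf_counter() - start  # compare against 5 * 0.2 s for a serial run
```

The results arrive in completion order, so sort them or return (index, value) pairs if the caller needs the original order.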

Can a queue worker signal failure to the parent?

Imagine that I have a task queue with a consumer like this (this is almost identical to the sample code here):
def worker(tasks):
    while True:
        try:
            item = tasks.get_nowait()
        except Empty:  # from queue import Empty
            return
        execute(item)
        tasks.task_done()
and a producer like this:
def batch_execute(items, n_threads):
    tasks = Queue()
    for item in items:
        tasks.put(item)
    for n in range(n_threads):
        t = threading.Thread(target=worker, args=(tasks,))  # args must be a tuple
        t.start()
    tasks.join()
This works, except that execute(item) can throw exceptions. If that happens, the given thread will bail, the others keep running, and the tasks.join() will hang indefinitely. Both traits are undesirable. Is there a typical design people use to e.g. "forward" the exception from the child thread into the parent thread and unblock tasks.join()? Or do I have to manually implement all of that around python's Queue class?
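One pattern that is sometimes used for this, sketched here under assumptions rather than offered as the standard answer: catch the exception in the worker, push it onto a second queue, and call task_done() in a finally block so that tasks.join() cannot hang. The errors queue, the returned failures list, and the failing execute() below are all made up for the illustration.

```python
import threading
from queue import Queue, Empty


def execute(item):
    # Hypothetical task that fails on one input to exercise the error path.
    if item == 3:
        raise ValueError(f"bad item: {item}")


def worker(tasks, errors):
    while True:
        try:
            item = tasks.get_nowait()
        except Empty:
            return
        try:
            execute(item)
        except Exception as exc:
            errors.put((item, exc))  # forward the failure to the parent
        finally:
            tasks.task_done()  # runs on success and on failure


def batch_execute(items, n_threads):
    tasks, errors = Queue(), Queue()
    for item in items:
        tasks.put(item)
    threads = [threading.Thread(target=worker, args=(tasks, errors))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    tasks.join()
    for t in threads:
        t.join()
    failures = []
    while not errors.empty():
        failures.append(errors.get())
    return failures  # the parent decides whether to log or re-raise


failures = batch_execute(range(6), n_threads=3)
```

The worker keeps running after a failure, so the remaining items are still processed. concurrent.futures offers a similar effect with less code, since Future.result() re-raises the worker's exception in the caller.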

Why is the queue showing incorrect data?

I have used queue for passing urls to download, however the queue gets corrupted when received in the thread:
class ThreadedFetch(threading.Thread):
    """ docstring for ThreadedFetch
    """
    def __init__(self, queue, out_queue):
        super(ThreadedFetch, self).__init__()
        self.queue = queue
        self.outQueue = out_queue

    def run(self):
        items = self.queue.get()
        print items

def main():
    for i in xrange(len(args.urls)):
        t = ThreadedFetch(queue, out_queue)
        t.daemon = True
        t.start()
    # populate queue with data
    for url, saveTo in urls_saveTo.iteritems():
        queue.put([url, saveTo, split])
    # wait on the queue until everything has been processed
    queue.join()
The output printed by run() when I execute main() is:
['http://www.nasa.gov/images/content/607800main_kepler1200_1600-1200.jpg', ['http://broadcast.lds.org/churchmusic/MP3/1/2/nowords/271.mp3', None, 3None, 3]
]
while expected is
['http://www.nasa.gov/images/content/607800main_kepler1200_1600-1200.jpg', None, 3]
['http://broadcast.lds.org/churchmusic/MP3/1/2/nowords/271.mp3', None, 3]
All of the threads print their data at once and the results are interleaved. If you want threads to display data in production code, you need some way for them to cooperate when writing. One option is a global lock that all screen writers use, another is the logging module.
Populate your queue before you start your threads. Add a lock for I/O (for the reason @tdelaney says: the threads are interleaving writes to stdout and the results appear broken). And modify your run method to this:
lock = threading.Lock()
def run(self):
    while True:
        try:
            items = self.queue.get_nowait()
            with lock:
                print items
        except Queue.Empty:
            break
        except Exception as err:
            pass
        self.queue.task_done()
You might also find that it is easier to do this with concurrent.futures. There is a solid example of using a method that returns a value that is called in a thread pool.
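As a rough illustration of the concurrent.futures route (Python 3 only): fetch() below is a hypothetical stand-in for the real download function and the URLs are placeholders, so treat this as a sketch rather than a drop-in replacement.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(url, save_to, split):
    # Hypothetical stand-in for the real download; returns a value to the caller.
    return (url, len(url))


jobs = [
    ("http://example.com/a.jpg", None, 3),
    ("http://example.com/b.mp3", None, 3),
]

results = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    # Map each future back to its URL so results can be matched up later.
    futures = {pool.submit(fetch, *job): job[0] for job in jobs}
    for future in as_completed(futures):
        url = futures[future]
        results[url] = future.result()  # re-raises any exception from fetch()

for url, value in sorted(results.items()):
    print(url, value)  # printed from the main thread only, so nothing interleaves
```

Since the workers return values instead of printing, no explicit lock or queue is needed in user code.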

Checking on a thread / remove from list

I have a class which extends Thread. The code looks a little like this:
class MyThread(Thread):
    def run(self):
        # Do stuff
        pass

my_threads = []
while has_jobs() and len(my_threads) < 5:
    new_thread = MyThread(next_job_details())
    new_thread.start()  # start(), not run(): run() would execute in the calling thread
    my_threads.append(new_thread)
for my_thread in my_threads:
    my_thread.join()
    # Do stuff
So in my pseudocode I check whether there are any jobs (from a db, etc.), and if there are jobs and fewer than 5 threads running, I create new threads.
From there I check over my threads, and this is where I get stuck. I can use .join(), but my understanding is that it waits until the thread has finished, so if the first thread it checks is still in progress, it waits until that one is done even if the other threads have already finished.
So is there a way to check whether a thread is done, and remove it if so?
e.g.
for my_thread in my_threads:
    if my_thread.done():
        # process results
        del (my_threads[my_thread]) ?? will that work...
As TokenMacGuy says, you should use thread.is_alive() to check if a thread is still running. To remove threads that are no longer running from your list, you can use a list comprehension:
for t in my_threads:
    if not t.is_alive():
        # get results from thread
        t.handled = True
my_threads = [t for t in my_threads if not getattr(t, "handled", False)]
This avoids the problem of removing items from a list while iterating over it.
mythreads = threading.enumerate()
threading.enumerate() returns a list of all Thread objects that are still alive.
https://docs.python.org/3.6/library/threading.html
You need to call thread.is_alive() to find out if the thread is still running (the older spelling, isAlive(), was removed in Python 3.9).
The answer has been covered, but for simplicity...
# To filter out finished threads
threads = [t for t in threads if t.is_alive()]
# Same thing but for QThreads (if you are using PyQt)
threads = [t for t in threads if t.isRunning()]
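To tie the is_alive() answers back to the original question (at most 5 threads, collect results as they finish), here is a small sketch. JobThread, the integer job list, and the result attribute are invented for the example; the short sleep in the loop is just a simple way to avoid spinning.

```python
import threading
import time


class JobThread(threading.Thread):
    def __init__(self, job):
        super().__init__()
        self.job = job
        self.result = None

    def run(self):
        time.sleep(0.01 * self.job)  # stand-in for real work
        self.result = self.job * 2


pending = list(range(8))  # hypothetical job details
running = []
results = []
MAX_THREADS = 5

while pending or running:
    # Top up to the limit while there are jobs left.
    while pending and len(running) < MAX_THREADS:
        t = JobThread(pending.pop(0))
        t.start()
        running.append(t)
    # Collect finished threads and keep the rest, without mutating
    # the list that is being iterated over.
    still_running = []
    for t in running:
        if t.is_alive():
            still_running.append(t)
        else:
            results.append(t.result)
    running = still_running
    time.sleep(0.005)
```

Building a new list each pass sidesteps the delete-while-iterating problem raised in the question.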
A better way is to use the Queue class:
http://docs.python.org/library/queue.html
Look at the example code at the bottom of the documentation page:
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()
for item in source():
    q.put(item)
q.join()  # block until all tasks are done
An easy solution for checking whether a thread has finished. It is thread safe.
Install pyrvsignal:
pip install pyrvsignal
Example:
import time
from threading import Thread
from pyrvsignal import Signal

class MyThread(Thread):
    started = Signal()
    finished = Signal()

    def __init__(self, target, args):
        self.target = target
        self.args = args
        Thread.__init__(self)

    def run(self) -> None:
        self.started.emit()
        self.target(*self.args)
        self.finished.emit()

def do_my_work(details):
    print(f"Doing work: {details}")
    time.sleep(10)

def started_work():
    print("Started work")

def finished_work():
    print("Work finished")

thread = MyThread(target=do_my_work, args=("testing",))
thread.started.connect(started_work)
thread.finished.connect(finished_work)
thread.start()
