Python threading: What am I missing? (task_done() called too many times) - python

My apologies for the long-ish post up front. Hopefully it'll give enough context for a solution. I've tried to create a utility function that will take any number of old classmethods and stick them into a multi-threaded queue:
class QueuedCall(threading.Thread):
def __init__(self, name, queue, fn, args, cb):
threading.Thread.__init__(self)
self.name = name
self._cb = cb
self._fn = fn
self._queue = queue
self._args = args
self.daemon = True
self.start()
def run(self):
r = self._fn(*self._args) if self._args is not None \
else self._fn()
if self._cb is not None:
self._cb(self.name, r)
self._queue.task_done()
Here's what my calling code looks like (within a class)
data = {}
def __op_complete(name, r):
data[name] = r
q = Queue.Queue()
socket.setdefaulttimeout(5)
q.put(QueuedCall('twitter', q, Twitter.get_status, [5,], __op_complete))
q.put(QueuedCall('so_answers', q, StackExchange.get_answers,
['api.stackoverflow.com', 534476, 5], __op_complete))
q.put(QueuedCall('so_user', q, StackExchange.get_user_info,
['api.stackoverflow.com', 534476], __op_complete))
q.put(QueuedCall('p_answers', q, StackExchange.get_answers,
['api.programmers.stackexchange.com', 23901, 5], __op_complete))
q.put(QueuedCall('p_user', q, StackExchange.get_user_info,
['api.programmers.stackexchange.com', 23901], __op_complete))
q.put(QueuedCall('fb_image', q, Facebook.get_latest_picture, None, __op_complete))
q.join()
return data
The problem that I'm running into here is that it seems to work every time on a fresh server restart, but fails every second or third request, with the error:
ValueError: task_done() called too many times
This error presents itself in a random thread every second or third request, so it's rather difficult to nail down exactly what the problem is.
Anyone have any ideas and/or suggestions?
Thanks.
Edit:
I had added prints in an effort to debug this (quick and dirty rather than logging). One print statement (print 'running thread: %s' % self.name) in the first line of run and another right before calling task_done() (print 'thread done: %s' % self.name).
The output of a successful request:
running thread: twitter
running thread: so_answers
running thread: so_user
running thread: p_answers
thread done: twitter
thread done: so_user
running thread: p_user
thread done: so_answers
running thread: fb_image
thread done: p_answers
thread done: p_user
thread done: fb_image
The output of an unsuccessful request:
running thread: twitter
running thread: so_answers
thread done: twitter
thread done: so_answers
running thread: so_user
thread done: so_user
running thread: p_answers
thread done: p_answers
Exception in thread p_answers:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
self.run()
File "/home/demian/src/www/projects/demianbrecht/demianbrecht/demianbrecht/helpers.py", line 37, in run
self._queue.task_done()
File "/usr/lib/python2.7/Queue.py", line 64, in task_done
raise ValueError('task_done() called too many times')
ValueError: task_done() called too many times
running thread: p_user
thread done: p_user
running thread: fb_image
thread done: fb_image

Your approach to this problem is "unconventional". But ignoring that for now ... the issue is simply that in the code you have given
q.put(QueuedCall('twitter', q, Twitter.get_status, [5,], __op_complete))
it is clearly possible for the following workflow to occur
A thread is constructed and started by QueuedCall.__init__
It is then put into the queue q. However ... before the Queue completes its logic for inserting the item, the independent thread has already finished its work and attempted to call q.task_done(). Which causes the error you have (task_done() has been called before the object was safely put into the queue)
How it should be done? You don't insert threads into queues. Queues hold data that threads process. So instead you
Create a Queue. Insert into it jobs you want done (as eg functions, the args they want and the callback)
You create and start worker threads
A worker thread calls
q.get() to get the function to invoke
invokes it
calls q.task_done() to let the queue know the item was handled.

I may be misunderstanding here, but I'm not sure you're using the Queue correctly.
From a brief survey of the docs, it looks like the idea is that you can use the put method to put work into a Queue, then another thread can call get to get some work out of it, do the work, and then call task_done when it has finished.
What your code appears to do is put instances of QueuedCall into a queue. Nothing ever gets from the queue, but the QueuedCall instances are also passed a reference to the queue they're being inserted into, and they do their work (which they know about intrinsically, not because they get it from the queue) and then call task_done.
If my reading of all that is correct (and you don't call the get method from somewhere else I can't see), then I believe I understand the problem.
The issue is that the QueuedCall instances have to be created before they can be put on the queue, and the act of creating one starts its work in another thread. If the thread finishes its work and calls task_done before the main thread has managed to put the QueuedCall into the queue, then you can get the error you see.
I think it only works when you run it the first time by accident. The GIL 'helps' you a lot; it's not very likely that the QueuedCall thread will actually gain the GIL and begin running immediately. The fact that you don't actually care about the Queue other than as a counter also 'helps' this appear to work: it doesn't matter if the QueuedCall hasn't hit the queue yet so long as it's not empty (this QueuedCall can just task_done another element in the queue, and by the time that element calls task_done this one will hopefully be in the queue, and it can be marked as done by that). And adding sleep also makes the new threads wait a bit, giving the main thread time to make sure they're actually in the queue, which is why that masks the problem as well.
Also note that, as far as I can tell from some quick fiddling with an interactive shell, your queue is actually still full at the end, because you never actually get anything out of it. It's just received a number of task_done messages equal to the number of things that were put in it, so the join works.
I think you'll need to radically redesign the way your QueuedCall class works, or use a different synchronisation primitive than a Queue. A Queue is designed to be used to queue work for worker threads that already exist. Starting a thread from within a constructor for an object that you put on the queue isn't really a good fit.

Related

Python: Make sure concurrent Thread is finished

I'm trying to write a python program that works with threads. I'm using concurrent.futures to handle the threads. In the programm one thread should create an PDF-File. Because this task is lasting very long I want to create a thread to handle the creation. But after sometime I want to work with the pdf file again. Therefore, I have to be sure, that the previous thread is finished.
My question is, how can I check if my concurrent future thread is finished or how I can wait on its execution.
Code
if __name__ == '__main__'
with concurrent.futures.ThreadPoolExecutor() as executor:
document = executor.submit(createPDF, pdfValues)
#working on something different ....
#Here I want to work with the pdf file again, via a new thread
#Therefore I want to make sure that the thread above is finished.
with concurrent.futures.ThreadPoolExecutor() as executor:
result = executor.submit(workWithPDF, values)
your document variable is of type concurrent.futures.Future, which has done() method that returns True if it completed, or result() that returns the result and blocks until it ready (if it still isn't)

Python thread run() blocking

I was attempting to create a thread class that could be terminated by an exception (since I am trying to have the thread wait on an event) when I created the following:
import sys
class testThread(threading.Thread):
def __init__(self):
super(testThread,self).__init__()
self.daemon = True
def run(self):
try:
print('Running')
while 1:
pass
except:
print('Being forced to exit')
test1 = testThread()
test2 = testThread()
print(test1.daemon)
test1.run()
test2.run()
sys.exit()
However, running the program will only print out one Running message, until the other is terminated. Why is that?
The problem is that you're calling the run method.
This is just a plain old method that you implement, which does whatever you put in its body. In this case, the body is an infinite loop, so calling run just loops forever.
The way to start a thread is the start method. This method is part of the Thread class, and what it does is:
Start the thread’s activity.
It must be called at most once per thread object. It arranges for the object’s run() method to be invoked in a separate thread of control.
So, if you call this, it will start a new thread, make that new thread run your run() method, and return immediately, so the main thread can keep doing other stuff.1 That's what you want here.
1. As pointed out by Jean-François Fabre, you're still not going to get any real parallelism here. Busy loops are never a great idea in multithreaded code, and if you're running this in CPython or PyPy, almost all of that busy looping is executing Python bytecode while holding the GIL, and only one thread can hold the GIL at a time. So, from a coarse view, things look concurrent—three threads are running, and all making progress. But if you zoom in, there's almost no overlap where two threads progress at once, usually not even enough to make up for the small scheduler overhead.

Threading Hanging Indefinitely

I was reading about Queue in the Python documentation and this book, and I don't fully understand why my thread hangs. I have the following mcve:
from threading import Thread
import queue
def print_number(number_queue_display):
while True:
number = number_queue_display.get()
print(number)
number_queue_display.task_done()
number_queue = queue.Queue()
printing_numbers = Thread(target=print_number, args=(number_queue,),)
printing_numbers.start()
number_queue.put(5)
number_queue.put(10)
number_queue.put(15)
number_queue.put(20)
number_queue.join()
printing_numbers.join()
The only time it works is if I set the thread to daemon like so:
printing_numbers.setDaemon(True)
but that's because as stated in the Python documentation, the program will exit when only the daemon threads are left. The Python docs example for Queue doesn't use a daemon thread.
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left.
Even if I were to remove the two joins(number_queue.join() printing_numbers.join()), it still hangs, but I'm unsure of why.
Questions:
Why is it hanging?
How do I keep it as a non-daemon thread, but prevent it from hanging?
print_number() is running an infinite loop - it never exits, so the thread never ends. It sits in number_queue_display.get() forever, waiting for another queue item that never appears. Then, since the thread never ends, printing_numbers.join() also waits forever.
So you need some way to tell the thread to quit. One common way is to put a special "sentinel" value on the queue, and have the thread exit when it sees that. For concreteness, here's a complete program, which is very much the same as what you started with. None is used as the sentinel (and is commonly used for this purpose), but any unique object would work. Note that the .task_done() parts were removed, because they no longer serve a purpose.
from threading import Thread
import queue
def print_number(number_queue_display):
while True:
number = number_queue_display.get()
if number is None:
break
print(number)
number_queue = queue.Queue()
printing_numbers = Thread(target=print_number, args=(number_queue,),)
printing_numbers.start()
number_queue.put(5)
number_queue.put(10)
number_queue.put(15)
number_queue.put(20)
number_queue.put(None) # tell the thread it's done
printing_numbers.join() # wait for the thread to exit

Can you join a Python queue without blocking?

Python's Queue has a join() method that will block until task_done() has been called on all the items that have been taken from the queue.
Is there a way to periodically check for this condition, or receive an event when it happens, so that you can continue to do other things in the meantime? You can, of course, check if the queue is empty, but that doesn't tell you if the count of unfinished tasks is actually zero.
The Python Queue itself does not support this, so you could try the following
from threading import Thread
class QueueChecker(Thread):
def __init__(self, q):
Thread.__init__(self)
self.q = q
def run(self):
q.join()
q_manager_thread = QueueChecker(my_q)
q_manager_thread.start()
while q_manager_thread.is_alive():
#do other things
#when the loop exits the tasks are done
#because the thread will have returned
#from blocking on the q.join and exited
#its run method
q_manager_thread.join() #to cleanup the thread
a while loop on the thread.is_alive() bit might not be exactly what you want, but at least you can see how to asynchronously check on the status of the q.join now.

Python Queue waiting for thread before getting next item

I have a queue that always needs to be ready to process items when they are added to it. The function that runs on each item in the queue creates and starts thread to execute the operation in the background so the program can go do other things.
However, the function I am calling on each item in the queue simply starts the thread and then completes execution, regardless of whether or not the thread it started completed. Because of this, the loop will move on to the next item in the queue before the program is done processing the last item.
Here is code to better demonstrate what I am trying to do:
queue = Queue.Queue()
t = threading.Thread(target=worker)
t.start()
def addTask():
queue.put(SomeObject())
def worker():
while True:
try:
# If an item is put onto the queue, immediately execute it (unless
# an item on the queue is still being processed, in which case wait
# for it to complete before moving on to the next item in the queue)
item = queue.get()
runTests(item)
# I want to wait for 'runTests' to complete before moving past this point
except Queue.Empty, err:
# If the queue is empty, just keep running the loop until something
# is put on top of it.
pass
def runTests(args):
op_thread = SomeThread(args)
op_thread.start()
# My problem is once this last line 't.start()' starts the thread,
# the 'runTests' function completes operation, but the operation executed
# by some thread is not yet done executing because it is still running in
# the background. I do not want the 'runTests' function to actually complete
# execution until the operation in thread t is done executing.
"""t.join()"""
# I tried putting this line after 't.start()', but that did not solve anything.
# I have commented it out because it is not necessary to demonstrate what
# I am trying to do, but I just wanted to show that I tried it.
Some notes:
This is all running in a PyGTK application. Once the 'SomeThread' operation is complete, it sends a callback to the GUI to display the results of the operation.
I do not know how much this affects the issue I am having, but I thought it might be important.
A fundamental issue with Python threads is that you can't just kill them - they have to agree to die.
What you should do is:
Implement the thread as a class
Add a threading.Event member which the join method clears and the thread's main loop occasionally checks. If it sees it's cleared, it returns. For this override threading.Thread.join to check the event and then call Thread.join on itself
To allow (2), make the read from Queue block with some small timeout. This way your thread's "response time" to the kill request will be the timeout, and OTOH no CPU choking is done
Here's some code from a socket client thread I have that has the same issue with blocking on a queue:
class SocketClientThread(threading.Thread):
""" Implements the threading.Thread interface (start, join, etc.) and
can be controlled via the cmd_q Queue attribute. Replies are placed in
the reply_q Queue attribute.
"""
def __init__(self, cmd_q=Queue.Queue(), reply_q=Queue.Queue()):
super(SocketClientThread, self).__init__()
self.cmd_q = cmd_q
self.reply_q = reply_q
self.alive = threading.Event()
self.alive.set()
self.socket = None
self.handlers = {
ClientCommand.CONNECT: self._handle_CONNECT,
ClientCommand.CLOSE: self._handle_CLOSE,
ClientCommand.SEND: self._handle_SEND,
ClientCommand.RECEIVE: self._handle_RECEIVE,
}
def run(self):
while self.alive.isSet():
try:
# Queue.get with timeout to allow checking self.alive
cmd = self.cmd_q.get(True, 0.1)
self.handlers[cmd.type](cmd)
except Queue.Empty as e:
continue
def join(self, timeout=None):
self.alive.clear()
threading.Thread.join(self, timeout)
Note self.alive and the loop in run.

Categories