Using Python, I was trying to demonstrate how a producer/consumer multi-threading scenario can lead to a deadlock when a consumer thread ends up waiting on a queue that will stay empty for the rest of the execution, and how to solve this while avoiding both starvation and a sudden "dirty" interruption of the program.
So I took the producer/consumer threading code that uses a queue from this nice RealPython article; here are the relevant excerpts of the original code:
def consumer(queue, event):
    """Pretend we're saving a number in the database."""
    while not event.is_set() or not queue.empty():
        message = queue.get()
        logging.info(
            "Consumer storing message: %s (size=%d)", message, queue.qsize()
        )

    logging.info("Consumer received event. Exiting")
if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO,
                        datefmt="%H:%M:%S")

    pipeline = queue.Queue(maxsize=10)
    event = threading.Event()
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        executor.submit(producer, pipeline, event)
        executor.submit(consumer, pipeline, event)

        time.sleep(0.1)
        logging.info("Main: about to set event")
        event.set()
I noticed that, however unlikely it is to occur, the code as it stands can lead to the situation I described: the main thread sets the event and the producer finishes while the consumer is still waiting to get a message from the queue.
With a single consumer, a simple check that the queue is not empty before calling get would suffice (for instance: if not q.empty(): message = q.get()). BUT the problem would still persist in a multi-consumer scenario, because the thread could be swapped out immediately after the emptiness check, another consumer (a second one) could take the message and leave the queue empty, and when the first consumer is swapped back in it would call get on an empty queue and... that's it.
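Just to make the race concrete, this is the check-then-get pattern I mean (an illustration of the hazard, not a fix):
# Consumer 1: the emptiness check and the get are two separate, non-atomic steps
if not q.empty():        # the queue holds one message, so the check passes...
    # ...the scheduler may switch here; consumer 2 runs q.get() and empties the queue...
    message = q.get()    # ...and consumer 1 now blocks forever on the empty queue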
I wanted a solution that would also work in a hypothetical multi-consumer scenario, so I modified the consumer code this way, essentially adding a timeout to the queue get call and handling the resulting exception:
def consumer(q, event, n):
    while not event.is_set() or not q.empty():
        print("Consumer"+n+": Q-get")
        try:
            time.sleep(0.1)  # (I don't really need this, I just want to force a consumer-thread swap at this precise point => showing that, just as expected, things will work well in a multi-consumer scenario also)
            message = q.get(True, 1)
        except queue.Empty:
            print("Consumer"+n+": looping on empty queue")
            time.sleep(0.1)  # (I don't really need this at all... just hoping -unfortunately without success- _main_ to swap on ThreadPoolExecutor)
            continue
        logging.info("Consumer%s storing message: %s (size=%d)", n, message, q.qsize())

    print("Consumer"+n+": ended")
and I also modified the "main" part to make it put a message in the queue and spawn a second consumer instead of a producer:
if __name__ == "__main__":
    format = "%(asctime)s: %(message)s"
    logging.basicConfig(format=format, level=logging.INFO, datefmt="%H:%M:%S")

    pipeline = queue.Queue(maxsize=10)
    event = threading.Event()
    pipeline.put("XxXxX")

    print("Let's start (ThreadPoolExecutor)")
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        #executor.submit(producer, pipeline, event)
        executor.submit(consumer, pipeline, event, '1')
        executor.submit(consumer, pipeline, event, '2')

    print("(_main_ won't even get this far... Why?)")  #!#
    time.sleep(2)
    logging.info("Main: about to set event")
    event.set()
(Please mind that my purpose here is thwarting the consumer-deadlock risk and showing that it is actually avoided; in this phase I don't really need the producer, which is why I made the code not spawn it.)
Now, the problem is that I can't understand why everything seems to work well if the threads are spawned with threading.Thread(...).start(), for instance:
print("Let's start (simple Thread)")
for i in range(1,3): threading.Thread(target=consumer, args=(pipeline, event, str(i))).start()
whereas using concurrent.futures.ThreadPoolExecutor seems to keep the main thread from ever resuming (it doesn't even seem to reach its sleep call), so the execution never ends and the consumers loop forever.
Can you help me understand this difference? Knowing this is important to me, and I think it will help me understand whether this can be solved somehow or whether I'll be forced to drop the ThreadPoolExecutor. Thank you in advance for your precious help!
The problem is that you put event.set() outside the with block that manages the ThreadPoolExecutor. When used as a context manager, on exiting the with block the ThreadPoolExecutor performs the equivalent of .shutdown(wait=True). So the main thread waits there for the workers to finish, which they never will, because you haven't yet set the event.
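Roughly, the with block behaves like this (a simplified sketch, not the exact library code):
executor = concurrent.futures.ThreadPoolExecutor(max_workers=2)
try:
    # ... the body of the with block: submit work, etc. ...
    executor.submit(consumer, pipeline, event, '1')
    executor.submit(consumer, pipeline, event, '2')
finally:
    executor.shutdown(wait=True)  # blocks here until every submitted task has finished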
If you want to be able to tell it to shut down when it can, but not wait immediately, you could do:
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    try:
        #executor.submit(producer, pipeline, event)
        executor.submit(consumer, pipeline, event, '1')
        executor.submit(consumer, pipeline, event, '2')
        executor.shutdown(wait=False)  # No new work can be submitted after this point, and
                                       # workers will opportunistically exit if no work is left
        time.sleep(2)
        logging.info("Main: about to set event")
    finally:
        event.set()  # Set the event *before* we block awaiting shutdown.
                     # Done in a finally block to ensure it's set even on exception,
                     # so the with block doesn't try to block in a case where
                     # it would never finish
# Blocks for full shutdown here; after this, the pool is definitely finished and cleaned up
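If you don't actually need the early shutdown(wait=False), a minimal sketch of the same idea is simply to keep event.set() inside the with block, so the event is already set by the time the implicit shutdown(wait=True) runs:
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    executor.submit(consumer, pipeline, event, '1')
    executor.submit(consumer, pipeline, event, '2')
    time.sleep(2)
    logging.info("Main: about to set event")
    event.set()  # set before leaving the with block, so the blocking shutdown can complete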
Related
In one of my asyncio projects I use one synchronisation method quite a lot and was wondering whether it is some kind of standard tool with a name I could give to Google to learn more. I used the term "1-item queue" only because I don't have a better name. It is a degraded queue and it is NOT related to Queue(maxsize=1).
# [controller] ---- commands ---> [worker]
The controller sends commands to a worker (queue.put, actually put_nowait) and the worker waits for them (queue.get) and executes them, but the special rule is that only the last command is important and immediately replaces all prior unfinished commands. For this reason, there is never more than 1 command waiting for execution in the queue.
To implement this, the controller clears the queue before the put. There is no queue.clear, so it must discard (with get_nowait) the waiting item, if any. (The absence of queue.clear started my doubts resulting in this question.)
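For illustration, the controller side might look roughly like this (my sketch of the description above; cmd_q is a hypothetical asyncio.Queue):
import asyncio

def send_command(cmd_q: asyncio.Queue, command):
    # There is no queue.clear, so discard the waiting item, if any...
    try:
        cmd_q.get_nowait()
    except asyncio.QueueEmpty:
        pass
    # ...so the queue holds at most the single, latest command
    cmd_q.put_nowait(command)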
On the worker's side, if a command execution requires a sleep, it is replaced by newcmd = queue.get with a timeout. When the timeout occurs, it was just a sleep; when the get succeeds, the current work is aborted and the execution of newcmd starts.
The type of queue you are using is not standard - there is such a thing as a one-shot queue, but it's a different thing altogether.
The queue doesn't really fit your use case, though you made it work with some effort. You don't really need queuing of any kind, you need a slot that holds a single object (which can be replaced) and a wakeup mechanism. asyncio.Event can be used for the wakeup and you can attach the payload object (the command) to an attribute of the event. For example:
import asyncio
import itertools

async def worker(evt):
    while True:
        await evt.wait()
        evt.clear()
        if evt.last_command is None:
            continue
        last_command = evt.last_command
        evt.last_command = None
        # execute last_command, possibly with timeout
        print(last_command)

async def main():
    evt = asyncio.Event()
    workers = [asyncio.create_task(worker(evt)) for _ in range(5)]
    for i in itertools.count():
        await asyncio.sleep(1)
        evt.last_command = f"foo {i}"
        evt.set()

asyncio.run(main())
One difference between this and the queue-based approach is that setting the event will wake up all workers (if there is more than one), even if the first worker immediately calls evt.clear(). A queue item, on the other hand, will be guaranteed to be handed off to a single awaiter of queue.get().
I know there are a few questions and answers related to hanging threads in Python, but my situation is slightly different, as the script is hanging AFTER all the threads have completed. The threading script is below, but obviously the first 2 functions are simplified massively.
When I run the script shown, it works. When I use my real functions, the script hangs AFTER THE LAST LINE. So, all the scenarios are processed (and a message is printed to confirm), logStudyData() then collates all the results and writes them to a csv, "Script Complete" is printed, and THEN it hangs.
The script with threading functionality removed runs fine.
I have tried enclosing the main script in try...except but no exception gets logged. If I use a debugger with a breakpoint on the final print and then step it forward, it hangs.
I know there is not much to go on here, but short of including the whole 1500-line script, I don't know what else to do. Any suggestions welcome!
def runScenario(scenario):
    # Do a bunch of stuff
    with lock:
        # access global variables
        pass
    pass

def logStudyData():
    # Combine results from all scenarios into a df and write to csv
    pass

def worker():
    global q
    while True:
        next_scenario = q.get()
        if next_scenario is None:
            break
        runScenario(next_scenario)
        print(next_scenario, " is complete")
        q.task_done()

import threading
from queue import Queue

global q, lock

q = Queue()
threads = []
scenario_list = ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10','s11','s12']

num_worker_threads = 6
lock = threading.Lock()

for i in range(num_worker_threads):
    print("Thread number ", i)
    this_thread = threading.Thread(target=worker)
    this_thread.start()
    threads.append(this_thread)

for scenario_name in scenario_list:
    q.put(scenario_name)

q.join()
print("q.join completed")
logStudyData()
print("script complete")
As the docs for Queue.get say:
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
In other words, there is no way get can ever return None, except by you calling q.put(None) on the main thread, which you don't do.
Notice that the example directly below those docs does this:
for i in range(num_worker_threads):
    q.put(None)
for t in threads:
    t.join()
The second one isn't strictly necessary; you usually get away with not doing it.
But the first one is absolutely necessary. You need to either do this, or come up with some other mechanism to tell your workers to quit. Without that, your main thread just tries to exit, which means it tries to join every worker, but those workers are all blocked forever on a get that will never return, so your program hangs forever.
Building a thread pool may not be rocket science (if only because rocket scientists tend to need their calculations to be deterministic and hard real-time…), but it's not trivial, either, and there are plenty of things you can get wrong. You may want to consider using one of the two already-built threadpools in the Python standard library, concurrent.futures.ThreadPoolExecutor or multiprocessing.dummy.Pool. This would reduce your entire program to:
import concurrent.futures

def work(scenario):
    runScenario(scenario)
    print(scenario, " is complete")

scenario_list = ['s1','s2','s3','s4','s5','s6','s7','s8','s9','s10','s11','s12']

with concurrent.futures.ThreadPoolExecutor(max_workers=6) as x:
    results = list(x.map(work, scenario_list))

print("q.join completed")
logStudyData()
print("script complete")
Obviously you'll still need a lock around any mutable variables you change inside runScenario—although if you're only using a mutable variable there because you couldn't figure out how to return values to the main thread, that's trivial with an Executor: just return the values from work, and then you can use them like this:
for result in x.map(work, scenario_list):
    do_something(result)
I have a Python program that spawns a number of threads. These threads last anywhere between 2 seconds and 30 seconds. In the main thread I want to track whenever each thread completes and print a message. If I just sequentially .join() all threads and the first thread lasts 30 seconds while the others complete much sooner, I wouldn't be able to print the messages sooner; they would all be printed after 30 seconds.
Basically I want to block until any thread completes. As soon as a thread completes, print a message about it and go back to blocking if any other threads are still alive. If all threads are done then exit program.
One way I could think of is to have a queue that is passed to all the threads and block on queue.get(). Whenever a message is received from the queue, print it, check if any other threads are alive using threading.active_count() and if so, go back to blocking on queue.get(). This would work but here all the threads need to follow the discipline of sending a message to the queue before terminating.
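Something along these lines, heavily simplified (the names here are made up just to show the idea):
import threading
import queue
import time

done_q = queue.Queue()

def worker(name, delay, done_q):
    time.sleep(delay)               # stand-in for the real 2-30 second work
    done_q.put(f"{name} finished")  # the "discipline": report before terminating

threads = [threading.Thread(target=worker, args=(f"t{i}", 3 - i, done_q)) for i in range(3)]
for t in threads:
    t.start()
for _ in threads:
    print(done_q.get())             # blocks until the next thread reports completion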
I'm wondering if this is the conventional way of achieving this behavior, or are there any other/better ways?
Here's a variation on #detly's answer that lets you specify the messages from your main thread, instead of printing them from your target functions. This creates a wrapper function which calls your target and then prints a message before terminating. You could modify this to perform any kind of standard cleanup after each thread completes.
#!/usr/bin/python

import threading
import time

def target1():
    time.sleep(0.1)
    print("target1 running")
    time.sleep(4)

def target2():
    time.sleep(0.1)
    print("target2 running")
    time.sleep(2)

def launch_thread_with_message(target, message, args=[], kwargs={}):
    def target_with_msg(*args, **kwargs):
        target(*args, **kwargs)
        print(message)
    thread = threading.Thread(target=target_with_msg, args=args, kwargs=kwargs)
    thread.start()
    return thread

if __name__ == '__main__':
    thread1 = launch_thread_with_message(target1, "finished target1")
    thread2 = launch_thread_with_message(target2, "finished target2")
    print("main: launched all threads")
    thread1.join()
    thread2.join()
    print("main: finished all threads")
The thread needs to be checked using the Thread.is_alive() call.
Why not just have the threads themselves print a completion message, or call some other completion callback when done?
You can then just join these threads from your main program, so you'll see a bunch of completion messages and your program will terminate when they're all done, as required.
Here's a quick and simple demonstration:
#!/usr/bin/python

import threading
import time

def really_simple_callback(message):
    """
    This is a really simple callback. `sys.stdout` already has a lock built-in,
    so this is fine to do.
    """
    print(message)

def threaded_target(sleeptime, callback):
    """
    Target for the threads: sleep and call back with completion message.
    """
    time.sleep(sleeptime)
    callback("%s completed!" % threading.current_thread())

if __name__ == '__main__':
    # Keep track of the threads we create
    threads = []

    # callback_when_done is effectively a function
    callback_when_done = really_simple_callback

    for idx in range(0, 10):
        threads.append(
            threading.Thread(
                target=threaded_target,
                name="Thread #%d" % idx,
                args=(10 - idx, callback_when_done)
            )
        )

    [t.start() for t in threads]
    [t.join() for t in threads]
    # Note that thread #0 runs for the longest, but we'll see its message first!
What I would suggest is a loop like this:
while len(threadSet) > 0:
    time.sleep(1)
    for thread in threadSet.copy():  # iterate over a copy so we can safely remove from the set
        if not thread.is_alive():
            print("Thread " + thread.name + " terminated")
            threadSet.remove(thread)
There is a 1 second sleep, so there will be a slight delay between the thread termination and the message being printed. If you can live with this delay, then I think this is a simpler solution than the one you proposed in your question.
You can let the threads push their results into a queue.Queue. Have another thread wait on this queue and print the message as soon as a new item appears.
I'm not sure I see the problem with using:
threading.active_count()
to track the number of threads that are still active?
Even if you don't know how many threads you're going to launch before starting, it seems pretty easy to track. I usually generate thread collections via a list comprehension; then a simple comparison of active_count() to the list size tells you how many have finished.
See here: http://docs.python.org/library/threading.html
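For example, a quick sketch of that comparison (my illustration; it assumes the main thread is the only other live thread):
import threading

# `threads` is assumed to be the list of started threading.Thread objects
still_running = threading.active_count() - 1  # subtract 1 for the main thread
finished = len(threads) - still_running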
Alternately, once you have your thread objects you can just use the .is_alive() method on them to check.
I just checked by throwing this into a multithread program I have and it looks fine:
for thread in threadlist:
    print(thread.is_alive())
Gives me a list of True/False as the threads turn on and off. So you should be able to do that and check for anything False in order to see if any thread is finished.
I use a slightly different technique because of the nature of the threads I used in my application. To illustrate, this is a fragment of a test-strap program I wrote to scaffold a barrier class for my threading class:
while threads:
    finished = set(threads) - set(threading.enumerate())
    while finished:
        ttt = finished.pop()
        threads.remove(ttt)
    time.sleep(0.5)
Why do I do it this way? In my production code, I have a time limit, so the first line actually reads "while threads and time.time() < cutoff_time". If I reach the cut-off, I then have code to tell the threads to shut down.
I have the code below, but it lives on after the queue is empty. Any insights?
def processor():
    while(1>0):
        if queue.empty() == True:
            print "the Queue is empty!"
            break
        source = queue.get()
        page = urllib2.urlopen(source)
        print page

def main():
    for i in range(threads):
        th = Thread(target=processor)
        th.setDaemon(True)
        th.start()

    queue.join()
It prints queue empty as many times as I have threads and just stands there doing nothing.
You need to call queue.task_done() after printing the page, otherwise join() will block. Each thread, after using get() must call task_done().
See documentation for queue
This part:
while(1>0):
    if queue.empty() == True:
        print "the Queue is empty!"
        break
Above is just plain wrong. queue.get() is blocking, there is absolutely no reason to have a busy loop. It should be deleted.
Your code should look something like this.
def processor():
    source = queue.get()
    page = urllib2.urlopen(source)
    print page
    queue.task_done()

def main():
    for i in range(threads):
        th = Thread(target=processor)
        th.setDaemon(True)
        th.start()

    for source in all_sources:
        queue.put(source)

    queue.join()
It's not the cleanest way to exit, but it will work. Since the processor threads are set to be daemons, the whole process will exit as soon as the main thread is done.
As Max said, we'd need a complete example to help with your behavior, but from the documentation:
Python’s Thread class supports a subset of the behavior of Java’s Thread class; currently, there are no priorities, no thread groups, and threads cannot be destroyed, stopped, suspended, resumed, or interrupted.
It stops being alive when its run() method terminates – either normally, or by raising an unhandled exception. The is_alive() method tests whether the thread is alive.
http://docs.python.org/library/threading.html
The lower level thread module does allow you to manually call exit(). Without a more complete example, I don't know if that's what you need in this case, but I suspect not as Thread objects should automatically end when run() is complete.
http://docs.python.org/library/thread.html
I have a queue that always needs to be ready to process items when they are added to it. The function that runs on each item in the queue creates and starts a thread to execute the operation in the background so the program can go do other things.
However, the function I am calling on each item in the queue simply starts the thread and then completes execution, regardless of whether or not the thread it started has completed. Because of this, the loop moves on to the next item in the queue before the program is done processing the last item.
Here is code to better demonstrate what I am trying to do:
queue = Queue.Queue()
t = threading.Thread(target=worker)
t.start()

def addTask():
    queue.put(SomeObject())

def worker():
    while True:
        try:
            # If an item is put onto the queue, immediately execute it (unless
            # an item on the queue is still being processed, in which case wait
            # for it to complete before moving on to the next item in the queue)
            item = queue.get()
            runTests(item)
            # I want to wait for 'runTests' to complete before moving past this point
        except Queue.Empty, err:
            # If the queue is empty, just keep running the loop until something
            # is put on top of it.
            pass

def runTests(args):
    op_thread = SomeThread(args)
    op_thread.start()
    # My problem is once this last line 't.start()' starts the thread,
    # the 'runTests' function completes operation, but the operation executed
    # by some thread is not yet done executing because it is still running in
    # the background. I do not want the 'runTests' function to actually complete
    # execution until the operation in thread t is done executing.

    """t.join()"""
    # I tried putting this line after 't.start()', but that did not solve anything.
    # I have commented it out because it is not necessary to demonstrate what
    # I am trying to do, but I just wanted to show that I tried it.
Some notes:
This is all running in a PyGTK application. Once the 'SomeThread' operation is complete, it sends a callback to the GUI to display the results of the operation.
I do not know how much this affects the issue I am having, but I thought it might be important.
A fundamental issue with Python threads is that you can't just kill them - they have to agree to die.
What you should do is:
Implement the thread as a class
Add a threading.Event member which the join method clears and the thread's main loop occasionally checks. If it sees it's cleared, it returns. To do this, override threading.Thread.join to clear the event and then call Thread.join on itself.
To allow (2), make the read from the Queue block with some small timeout. This way your thread's "response time" to the kill request will be the timeout, and on the other hand no CPU choking is done.
Here's some code from a socket client thread I have that has the same issue with blocking on a queue:
class SocketClientThread(threading.Thread):
    """ Implements the threading.Thread interface (start, join, etc.) and
        can be controlled via the cmd_q Queue attribute. Replies are placed in
        the reply_q Queue attribute.
    """
    def __init__(self, cmd_q=Queue.Queue(), reply_q=Queue.Queue()):
        super(SocketClientThread, self).__init__()
        self.cmd_q = cmd_q
        self.reply_q = reply_q
        self.alive = threading.Event()
        self.alive.set()
        self.socket = None

        self.handlers = {
            ClientCommand.CONNECT: self._handle_CONNECT,
            ClientCommand.CLOSE: self._handle_CLOSE,
            ClientCommand.SEND: self._handle_SEND,
            ClientCommand.RECEIVE: self._handle_RECEIVE,
        }

    def run(self):
        while self.alive.isSet():
            try:
                # Queue.get with timeout to allow checking self.alive
                cmd = self.cmd_q.get(True, 0.1)
                self.handlers[cmd.type](cmd)
            except Queue.Empty as e:
                continue

    def join(self, timeout=None):
        self.alive.clear()
        threading.Thread.join(self, timeout)
Note self.alive and the loop in run.
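A rough usage sketch (assuming ClientCommand objects from the surrounding project are being queued):
client = SocketClientThread()
client.start()
# ... put ClientCommand instances on client.cmd_q as work arrives ...
client.join()  # clears self.alive; run() notices within ~0.1 s, exits, and the real join returns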