I am currently using worker threads in Python to get tasks from a Queue and execute them, as follows:
from queue import Queue
from threading import Thread

def run_job(item):
    # runs an independent job...
    pass

def workingThread():
    while True:
        item = q.get()
        run_job(item)
        q.task_done()

q = Queue()
num_worker_threads = 2
for i in range(num_worker_threads):
    t = Thread(target=workingThread)
    t.daemon = True
    t.start()

for item in listOfJobs:
    q.put(item)

q.join()
This is functional, but there is an issue: some of the jobs executed by run_job are very memory-demanding and can only be run one at a time. Given that I can identify these at runtime, how could I make the other worker threads halt their execution until such a job is taken care of?
Edit: It has been flagged as a possible duplicate of Python - Thread that I can pause and resume. I have referred to this question before asking, and it surely is a reference that has to be cited. However, I don't think it addresses this situation specifically, as it does not consider the jobs being inside a Queue, nor how to specifically point to the other objects that have to be halted.
I would pause/resume the threads so that the memory-demanding job runs individually. The following thread, Python - Thread that I can pause and resume, indicates how to do that.
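If you want to keep the queue-based workers from the question, here is a minimal sketch of one way to combine the two, using a bounded semaphore so that a memory-heavy job must hold every permit and therefore runs alone; is_memory_heavy() is a hypothetical stand-in for however you identify such jobs at runtime:

import threading
from queue import Queue

NUM_WORKERS = 2
# One permit per worker: normal jobs take one, heavy jobs take all of them.
permits = threading.BoundedSemaphore(NUM_WORKERS)
# Serializes permit acquisition so two heavy jobs cannot deadlock each other.
acquire_gate = threading.Lock()

def is_memory_heavy(item):
    # Hypothetical predicate; replace with your runtime check.
    return getattr(item, "heavy", False)

def workingThread():
    while True:
        item = q.get()
        needed = NUM_WORKERS if is_memory_heavy(item) else 1
        with acquire_gate:
            for _ in range(needed):
                permits.acquire()
        try:
            run_job(item)
        finally:
            for _ in range(needed):
                permits.release()
            q.task_done()

The acquire_gate lock matters: without it, two heavy jobs could each grab part of the permits and then wait forever for the rest.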
I have two threads in a producer-consumer pattern. When the consumer receives data it calls a time-consuming function expensive() and then enters a for loop.
But if new data arrives while the consumer is working, it should abort the current work (exit the loop) and start on the new data.
I tried with a queue.Queue, something like this:
import queue

q = queue.Queue()

def producer():
    while True:
        ...
        q.put(d)

def consumer():
    while True:
        d = q.get()
        expensive(d)
        for i in range(10000):
            ...
            if not q.empty():
                break
But the problem with this code is that if the producer puts data too fast and the queue accumulates many items, the consumer will do the expensive(d) call plus one loop iteration, then abort, for each and every item, which is time-consuming. The code works, but it is not optimized.
Without modifying the code in expensive, one solution could be to run it as a separate process, which gives you the ability to terminate it prematurely. Since there's no mention of how long expensive runs, this may or may not be more time-efficient, however.
import multiprocessing as mp
import queue

q = queue.Queue()

def producer():
    while True:
        ...
        q.put(d)

def consumer():
    while True:
        d = q.get()
        # A process (unlike a thread) can be terminated prematurely.
        exp = mp.Process(target=expensive, args=(d,))
        exp.start()
        for i in range(10000):
            ...
            if not q.empty():
                exp.terminate()  # or exp.kill() on Python 3.7+
                break
Well, one way is to use a queue design that keeps internal lists of waiting and working threads. You can then create several consumer threads to wait on the queue and, when work arrives, set a known consumer thread to do the work. When the thread has finished, it calls into the queue to remove itself from the working list and add itself to the waiting list.
The consumer threads each have an 'abort' flag that can signal the thread to finish early. There will be some latency while the thread completes its current inner-loop iteration, but that will not matter.
If new work arrives at the queue from the producer, and the working list is not empty, the 'abort' flag of the working thread(s) can be set and their priority dropped to the minimum possible. The new work can then be dispatched onto one of the waiting threads from the pool, setting it working.
The waiting threads each need a 'start' function that signals an event/semaphore/condvar that the waiting thread waits on. That allows the producer that supplied the work to set that specific thread running, rather than the 'usual' practice where any thread from a pool may pick up work.
Such a design allows new work to be started 'immediately', makes the previous work thread irrelevant by de-prioritizing it, and avoids the overheads of thread/process termination.
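Here is a minimal sketch of the abort-flag and per-thread start-signal parts of that design (Python's threading module exposes no thread priorities, so the de-prioritizing step is omitted; process_chunk() and the chunked task are hypothetical stand-ins):

import threading

class AbortableWorker(threading.Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self.work_ready = threading.Event()  # per-thread 'start' signal
        self.abort = threading.Event()       # 'abort' flag checked in inner loops
        self.task = None

    def submit(self, task):
        # Called by the producer to set this specific thread working.
        self.task = task
        self.work_ready.set()

    def abort_current(self):
        self.abort.set()

    def run(self):
        while True:
            self.work_ready.wait()
            self.work_ready.clear()
            self.abort.clear()
            for chunk in self.task:          # inner loop of the long job
                if self.abort.is_set():      # latency: at most one iteration
                    break
                process_chunk(chunk)         # hypothetical unit of work

As noted above, the abort latency is bounded by a single inner-loop iteration.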
My script has started many threads and has kept count. The maximum number of threads is now running, and there are more to be run. How can the script wait for any one of the running threads to end, so it can safely start another one? It is using threading.Thread() to create each thread, but that can be changed if there is a better module. I am using Python 3.6.x.
To create a pool of processes and pass tasks to them:
from multiprocessing import Pool

def processing_task(arg1, arg2):
    ...

with Pool() as worker_pool:  # by default creates as many processes as CPUs
    while True:
        task = some_queue_implementation.get()  # some blocking method that receives tasks
        worker_pool.apply_async(processing_task, (task.arg1, task.arg2))
This will create child processes that sit idle until they are passed a task.
Here is a snippet of code I always use when using threads. It:
- sets a certain number of threads;
- ensures that no code coming after the context-manager block executes until all threads complete; and
- cancels the remaining futures and re-raises, if one of the child threads throws an exception.
import concurrent.futures

futures = []
with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    futures.append(executor.submit(function, args))
    for future in concurrent.futures.as_completed(futures):
        if future.exception():
            for child in futures:
                child.cancel()
            raise future.exception()
It has kept count
I hope you are guarding that count with a threading.Lock() and .acquire() ;-)
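For instance, a minimal sketch of a lock-guarded counter (the names are illustrative):

import threading

count_lock = threading.Lock()
running_threads = 0

def note_started():
    global running_threads
    with count_lock:  # 'with' acquires the lock and always releases it
        running_threads += 1

def note_finished():
    global running_threads
    with count_lock:
        running_threads -= 1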
I am learning about threading in Python and am trying to make a simple program, one that uses threads to grab a number off the Queue and print it.
I have the following code:
import threading
from Queue import Queue

test_lock = threading.Lock()
tests = Queue()

def start_thread():
    while not tests.empty():
        with test_lock:
            if tests.empty():
                return
            test = tests.get()
            print("{}".format(test))

for i in range(10):
    tests.put(i)

threads = []
for i in range(5):
    threads.append(threading.Thread(target=start_thread))
    threads[i].daemon = True

for thread in threads:
    thread.start()

tests.join()
When run, it just prints the values and never exits.
How do I make the program exit when the Queue is empty?
From the docstring of Queue.join():
Blocks until all items in the Queue have been gotten and processed.
The count of unfinished tasks goes up whenever an item is added to the
queue. The count goes down whenever a consumer thread calls task_done()
to indicate the item was retrieved and all work on it is complete.
When the count of unfinished tasks drops to zero, join() unblocks.
So you must call tests.task_done() after processing the item.
Since your threads are daemon threads, and the queue will handle concurrent access correctly, you don't need to check if the queue is empty or use a lock. You can just do:
def start_thread():
    while True:
        test = tests.get()
        print("{}".format(test))
        tests.task_done()
I have a huge list of information and my program should analyze each piece of it. To speed this up I want to use threads, but I want to limit them to 5. So I need a loop of 5 threads, and when one finishes its job it should grab a new one, until the end of the list.
But I don't have a clue how to do that. Should I use a queue? For now I am just running 5 threads in the simplest way:

for thread_number in range(5):
    thread = Th(thread_number)
    thread.start()

Thank you!
Separate the idea of worker thread and task -- do not have one worker work on one task, then terminate the thread. Instead, spawn 5 threads, and let them all get tasks from a common queue. Let them each iterate until they receive a sentinel from the queue which tells them to quit.
This is more efficient than continually spawning and terminating threads after they complete just one task.
import logging
import queue
import threading

logger = logging.getLogger(__name__)

N = 100
sentinel = object()

def worker(jobs):
    for task in iter(jobs.get, sentinel):
        logger.info(task)
    logger.info('Done')

def main():
    logging.basicConfig(level=logging.DEBUG,
                        format='[%(asctime)s %(threadName)s] %(message)s',
                        datefmt='%H:%M:%S')
    jobs = queue.Queue()
    # put tasks in the jobs Queue
    for task in range(N):
        jobs.put(task)
    threads = [threading.Thread(target=worker, args=(jobs,))
               for thread_number in range(5)]
    for t in threads:
        t.start()
        jobs.put(sentinel)  # send one sentinel per worker, so each one terminates
    for t in threads:
        t.join()

if __name__ == '__main__':
    main()
It seems that you want a thread pool. If you're using Python 3, you're in luck: there is a ThreadPoolExecutor class.
Otherwise, from this SO question, you can find various solutions, either handcrafted or using hidden modules from the Python library.
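For instance, a minimal sketch with at most 5 worker threads, where analyze() and huge_list stand in for your per-item work and your list of information:

from concurrent.futures import ThreadPoolExecutor

def analyze(item):
    # Hypothetical stand-in for the real per-item analysis.
    return len(str(item))

huge_list = range(1000)  # stand-in for the real data

with ThreadPoolExecutor(max_workers=5) as pool:
    # At most 5 items run concurrently; results come back in input order.
    results = list(pool.map(analyze, huge_list))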
I have a Python program that spawns a number of threads. These threads last anywhere from 2 seconds to 30 seconds. In the main thread I want to track whenever each thread completes and print a message. If I just sequentially .join() all threads, and the first thread lasts 30 seconds while the others complete much sooner, I wouldn't be able to print a message any sooner: all messages would be printed after 30 seconds.
Basically I want to block until any thread completes. As soon as a thread completes, print a message about it and go back to blocking if any other threads are still alive. If all threads are done then exit program.
One way I could think of is to have a queue that is passed to all the threads and block on queue.get(). Whenever a message is received from the queue, print it, check if any other threads are alive using threading.active_count() and if so, go back to blocking on queue.get(). This would work but here all the threads need to follow the discipline of sending a message to the queue before terminating.
I'm wondering if this is the conventional way of achieving this behavior, or are there any other/better ways?
Here's a variation on @detly's answer that lets you specify the messages from your main thread, instead of printing them from your target functions. This creates a wrapper function which calls your target and then prints a message before terminating. You could modify this to perform any kind of standard cleanup after each thread completes.
#!/usr/bin/python
import threading
import time

def target1():
    time.sleep(0.1)
    print("target1 running")
    time.sleep(4)

def target2():
    time.sleep(0.1)
    print("target2 running")
    time.sleep(2)

def launch_thread_with_message(target, message, args=(), kwargs={}):
    def target_with_msg(*args, **kwargs):
        target(*args, **kwargs)
        print(message)
    thread = threading.Thread(target=target_with_msg, args=args, kwargs=kwargs)
    thread.start()
    return thread

if __name__ == '__main__':
    thread1 = launch_thread_with_message(target1, "finished target1")
    thread2 = launch_thread_with_message(target2, "finished target2")
    print("main: launched all threads")
    thread1.join()
    thread2.join()
    print("main: finished all threads")
The thread needs to be checked using the Thread.is_alive() call.
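For example, a minimal sketch (the sleeping worker is just an illustration):

import threading
import time

worker = threading.Thread(target=time.sleep, args=(5,))
worker.start()

if worker.is_alive():  # True until the target function returns
    print("worker is still running")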
Why not just have the threads themselves print a completion message, or call some other completion callback when done?
You can then just join these threads from your main program, so you'll see a bunch of completion messages, and your program will terminate when they're all done, as required.
Here's a quick and simple demonstration:
#!/usr/bin/python
import threading
import time

def really_simple_callback(message):
    """
    This is a really simple callback. `sys.stdout` already has a lock built-in,
    so this is fine to do.
    """
    print(message)

def threaded_target(sleeptime, callback):
    """
    Target for the threads: sleep and call back with completion message.
    """
    time.sleep(sleeptime)
    callback("%s completed!" % threading.current_thread())

if __name__ == '__main__':
    # Keep track of the threads we create
    threads = []
    # callback_when_done is effectively a function
    callback_when_done = really_simple_callback
    for idx in range(10):
        threads.append(
            threading.Thread(
                target=threaded_target,
                name="Thread #%d" % idx,
                args=(10 - idx, callback_when_done)
            )
        )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Note that thread #0 runs for the longest, so we'll see its message last!
What I would suggest is a loop like this:

import time

while threadSet:
    time.sleep(1)
    for thread in list(threadSet):  # iterate over a copy, since we remove members
        if not thread.is_alive():
            print("Thread " + thread.name + " terminated")
            threadSet.remove(thread)
There is a 1-second sleep, so there will be a slight delay between a thread terminating and the message being printed. If you can live with this delay, then I think this is a simpler solution than the one you proposed in your question.
You can let the threads push their results into a queue.Queue. Have another thread wait on this queue and print the message as soon as a new item appears.
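A minimal sketch of that idea; the sleeping workers and the message format are illustrative:

import queue
import threading
import time

done_q = queue.Queue()

def worker(n):
    time.sleep(n)  # stand-in for the real work
    done_q.put("thread %d finished" % n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()

for _ in threads:
    print(done_q.get())  # blocks until the next completion message arrives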
I'm not sure I see the problem with using:
threading.active_count()
to track the number of threads that are still active.
Even if you don't know how many threads you're going to launch before starting, it seems pretty easy to track. I usually generate thread collections via list comprehension; then a simple comparison of active_count() to the list size can tell you how many have finished.
See here: http://docs.python.org/library/threading.html
Alternately, once you have your thread objects you can just use the .is_alive() method on each of them to check.
I just checked by throwing this into a multithreaded program I have, and it looks fine:

for thread in threadlist:
    print(thread.is_alive())

Gives me a list of True/False as the threads turn on and off. So you should be able to do that and check for anything False in order to see if any thread is finished.
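For instance, a minimal sketch of the active_count() comparison; subtracting 1 assumes the main thread is the only other thread running:

import threading
import time

def work(n):
    time.sleep(n)  # stand-in for the real work

threads = [threading.Thread(target=work, args=(i,)) for i in range(10)]
for t in threads:
    t.start()

# active_count() includes the main thread, hence the -1.
still_running = threading.active_count() - 1
finished = len(threads) - still_running
print("%d of %d threads finished" % (finished, len(threads)))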
I use a slightly different technique because of the nature of the threads I used in my application. To illustrate, this is a fragment of a test harness I wrote to scaffold a barrier class for my threading class:
import threading
import time

while threads:
    finished = set(threads) - set(threading.enumerate())
    while finished:
        ttt = finished.pop()
        threads.remove(ttt)
    time.sleep(0.5)
Why do I do it this way? In my production code, I have a time limit, so the first line actually reads "while threads and time.time() < cutoff_time". If I reach the cut-off, I then have code to tell the threads to shut down.
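A minimal sketch of that time-limited variant; the 60-second budget and the shutdown_flag event are illustrative assumptions:

import threading
import time

cutoff_time = time.time() + 60.0  # illustrative 60-second budget

while threads and time.time() < cutoff_time:
    finished = set(threads) - set(threading.enumerate())
    while finished:
        threads.remove(finished.pop())
    time.sleep(0.5)

if threads:              # cut-off reached with threads still running
    shutdown_flag.set()  # hypothetical event the worker threads check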