I'm trying to implement a simple thread pool in Python.
I start a few threads with the following code:
threads = []
for i in range(10):
    t = threading.Thread(target=self.workerFuncSpinner(
        taskOnDeckQueue, taskCompletionQueue, taskErrorQueue, i))
    t.setDaemon(True)
    threads.append(t)
    t.start()
for thread in threads:
    thread.join()
At this point, the worker thread only prints when it starts and when it exits, with a time.sleep in between. The problem is that instead of getting output like:
#All output at the same time
thread 1 starting
thread 2 starting
thread n starting
# 5 seconds pass
thread 1 exiting
thread 2 exiting
thread n exiting
I get:
thread 1 starting
# 5 seconds pass
thread 1 exiting
thread 2 starting
# 5 seconds pass
thread 2 exiting
thread n starting
# 5 seconds pass
thread n exiting
And when I call threading.current_thread(), they all report that they are MainThread.
It's like they're not even threads, but are running in the main thread context.
Help?
Thanks
You are calling workerFuncSpinner in the main thread when creating the Thread object. Use a reference to the method instead:
t = threading.Thread(target=self.workerFuncSpinner,
                     args=(taskOnDeckQueue, taskCompletionQueue, taskErrorQueue, i))
Your original code:
t = threading.Thread(target=self.workerFuncSpinner(
    taskOnDeckQueue, taskCompletionQueue, taskErrorQueue, i))
t.start()
could be rewritten as
# call the method in the main thread
spinner = self.workerFuncSpinner(
    taskOnDeckQueue, taskCompletionQueue, taskErrorQueue, i)
# create a thread that will call whatever `self.workerFuncSpinner` returned,
# with no arguments
t = threading.Thread(target=spinner)
# run whatever workerFuncSpinner returned in a background thread
t.start()
You were calling the method serially in the main thread and nothing in the created threads.
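A minimal, self-contained sketch of the difference between calling the function and passing a reference to it. The `worker` function and the `seen` dictionary are hypothetical stand-ins (they are not from the question) and only record which thread actually ran the function:

```python
import threading

def worker(name, seen):
    # Record the name of the thread that actually executes this function.
    seen[name] = threading.current_thread().name

seen = {}

# Wrong: worker(...) is called right here, in the calling thread. Its return
# value (None) becomes the target, so the new thread has nothing to run.
t_wrong = threading.Thread(target=worker("called", seen))
t_wrong.start()
t_wrong.join()

# Right: pass the function itself and let the new thread call it with args.
t_right = threading.Thread(target=worker, args=("referenced", seen), name="worker-1")
t_right.start()
t_right.join()

print(seen)
```

When run as a plain script, the "called" entry reports MainThread and the "referenced" entry reports worker-1, which matches the symptom described in the question.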
I suspect workerFuncSpinner may be your problem. I would verify that it is not actually running the task, but returning a callable object for the thread to run.
https://docs.python.org/2/library/threading.html#threading.Thread
I have set up a thread pool executor with 4 threads. I have added 2 items to my queue to be processed. When I submit the tasks and retrieve futures, it appears the other 2 threads not processing items in the queue keep running and hang, even if they are not processing anything!
import time
import queue
import concurrent.futures  # "import concurrent" alone does not load the futures submodule

def _read_queue(queue):
    msg = queue.get()
    time.sleep(2)
    queue.task_done()

n_threads = 4
q = queue.Queue()
q.put('test')
q.put("test2")
with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
    futures = []
    for _ in range(n_threads):
        future = pool.submit(_read_queue, q)
        print(future.running())
print("Why am running forever?")
How can I adjust my code so that threads that are not processing anything from the queue are shutdown so my program can terminate?
Because the queue.get() call blocks your ThreadPoolExecutor threads.
for _ in range(n_threads):
    future = pool.submit(_read_queue, q)
    print(future.running())
Let's examine future = pool.submit(_read_queue, q) in each iteration of the for loop.
In the first iteration, pool.submit(_read_queue, q) puts a job into the ThreadPoolExecutor's internal queue (its name is self._work_queue). Whenever a job is put into that queue, the submit method may create a worker thread; call this first one thread1 (I use the names thread1, thread2, ... to make this easier to follow). This thread executes _read_queue, either immediately or only after the fourth iteration of the for loop, depending on the operating system scheduler, and queue.get() returns "test". The thread then sleeps for 2 seconds.
In the second iteration, pool.submit(_read_queue, q) puts another job into the internal queue, and the submit method then checks whether any thread is waiting for a job. None is: the first thread is sleeping (for 2 seconds). So the submit method performs the steps below:
if "there is a thread which will accept a job immediately":  # Step 1
    return
# Step 2
if number_of_created_threads < self._max_workers:  # currently 1
    threading.Thread(...)  # create a new thread
The submit method therefore creates a new thread, thread2, which executes _read_queue, and queue.get() returns "test2". This thread then sleeps for 2 seconds. The queue object q is now empty, so any subsequent get() call will block the calling thread.
In the third iteration, the submit method again puts a job into the internal queue and checks whether any thread is waiting for a job. None is: the first and second threads are both sleeping, so the submit method goes through Step 1 and Step 2 and creates a new thread, thread3, which executes _read_queue just as the other threads did. When thread3 runs, it calls queue.get(), but this blocks thread3: q is empty, and calling get(block=True) on an empty queue blocks the calling thread.
In the fourth iteration, the same thing happens as in the third, and thread4 blocks on queue.get().
Assume 2 seconds have not yet passed: 5 threads are now alive (some sleeping, some blocked). After 2 seconds, thread1 and thread2 terminate*1 (because time.sleep(2) returns), but thread3 and thread4 do not, because queue.get() is blocking them. That is why your main thread, and with it the whole program, waits for them and never terminates.
What can we do in this situation?
We can put two more elements into q. The blocked threads are waiting inside q.get() until an item becomes available, and the only way to end that wait is to call queue.put(something).
Here is one possible solution:
import time
import queue
from concurrent import futures

def _read_queue(queue):
    msg = queue.get()
    time.sleep(2)
    queue.put(None)  # wake up one thread that is still blocked in get()

n_threads = 4
q = queue.Queue()
q.put('test')
q.put("test2")
with futures.ThreadPoolExecutor(max_workers=n_threads) as pool:
    submitted = []  # renamed so the list does not shadow the futures module
    for _ in range(n_threads):
        submitted.append(pool.submit(_read_queue, q))
*1: I said the ThreadPoolExecutor threads terminate once the function finishes, but that depends on shutdown() being called. If we don't call the pool's shutdown() method, the threads do not terminate even after the function has finished. Creating and destroying threads is costly, which is why the thread pool concept exists. (shutdown() is called at the end of the with statement.)
If I'm wrong somewhere please correct me.
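Another option, sketched here under the assumption that a worker finding the queue empty should simply give up, is to use the non-blocking `get_nowait()` and catch `queue.Empty`. The function name `read_queue`, the shortened sleep, and the returned values are illustrative choices, not part of the original code:

```python
import queue
import time
from concurrent.futures import ThreadPoolExecutor

def read_queue(q):
    # Hypothetical variant of _read_queue: give up instead of blocking forever.
    try:
        msg = q.get_nowait()
    except queue.Empty:
        return None  # nothing to do, so this job returns right away
    time.sleep(0.1)  # stand-in for real work
    q.task_done()
    return msg

n_threads = 4
q = queue.Queue()
q.put("test")
q.put("test2")

# Leaving the with block calls shutdown(), which now returns because no job blocks.
with ThreadPoolExecutor(max_workers=n_threads) as pool:
    submitted = [pool.submit(read_queue, q) for _ in range(n_threads)]

results = [f.result() for f in submitted]
print(results)
```

Two of the four jobs receive an item and the other two return None immediately, so the program terminates. If items may still arrive later, `q.get(timeout=...)` would be the analogous choice, with the timeout value depending on the workload.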
Let's say I want to run 10 threads at the same time and, as soon as one finishes, immediately start a new one. How can I do that?
I know I can wait for a thread to finish with thread.join(), but then all 10 threads need to finish; I want to start a new one as soon as any one of them finishes.
Well, what I understand is that you need at most 10 threads executing at the same time.
I suggest using threading.BoundedSemaphore().
Sample code using it is given below:
import threading
from typing import List

sema4 = threading.BoundedSemaphore(10)
# 10 is given as the parameter since at most 10 threads should be executing at the same time

def do_something():
    with sema4:  # blocks while 10 other threads already hold the semaphore
        print("I hope this cleared your doubt :)")

threads_list: List[threading.Thread] = []
# The list above is used to keep the threads
for i in range(100):
    thread = threading.Thread(target=do_something)
    threads_list.append(thread)  # save the thread in order to join it later
    thread.start()  # start the thread
for thread in threads_list:
    thread.join()  # otherwise the main program carries on without waiting for the child threads
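As an alternative sketch of the same requirement, `concurrent.futures.ThreadPoolExecutor` with `max_workers=10` keeps at most 10 tasks running and hands a worker its next task as soon as it finishes the previous one. The `task` function and the `running`/`peak` counters below are hypothetical and exist only to make the concurrency limit observable:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

lock = threading.Lock()
running = 0  # number of tasks executing right now
peak = 0     # highest value `running` ever reached

def task(i):
    # Hypothetical unit of work; it tracks how many tasks overlap.
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.01)  # stand-in for real work
    with lock:
        running -= 1
    return i

# At most 10 tasks run at once; a worker takes the next task when it finishes one.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(task, range(100)))

print(len(results), peak)
```

Unlike the semaphore approach, this creates only 10 threads in total instead of 100, which may matter when the number of tasks is large.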
I had some performance issues with multi-threaded code that parallelizes multiple telnet probes.
Slow
My first implementation was really slow, the same as if the tasks were run sequentially:
for printer in printers:
    …
    thread = threading.Thread(target=collect, args=(task, printers_response), kwargs=kw)
    threads.append(thread)
for thread in threads:
    thread.start()
    thread.join()
Blazingly Fast
for printer in printers:
    …
    thread = threading.Thread(target=collect, args=(task, printers_response), kwargs=kw)
    threads.append(thread)
    thread.start()  # <----- moved this
for thread in threads:
    thread.join()
Question
I don't get why moving the start() call changes the performance so much.
In your first implementation you are actually running the code sequentially because by calling join() immediately after start() the main thread is blocked until the started thread is finished.
In your first implementation, thread.join() blocks the main thread right after each thread is started, so the next thread cannot start until the previous one has finished.
According to threading.Thread.join() documentation:
Wait until the thread terminates.
This blocks the calling thread until the thread whose join() method is called terminates -- either normally or through an unhandled exception -- or until the optional timeout occurs.
In your slow example you start the thread and wait till it is complete, then you iterate to the next thread.
Example
from threading import Thread
from time import sleep

def foo(a, b):
    while True:
        print(a + ' ' + b)
        sleep(1)

ths = []
for i in range(3):
    th = Thread(target=foo, args=('hi', str(i)))
    ths.append(th)
for th in ths:
    th.start()
    th.join()
Produces
hi 0
hi 0
hi 0
hi 0
In your slow solution you are basically not using multithreading at all. It runs a thread, waits for it to finish, and then runs the next one. There is no difference between this and running everything in one thread: you are running the tasks in series.
The second one, on the other hand, starts all the threads and then joins them. This limits the total execution time to roughly that of the longest single thread: you are running them concurrently.
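The effect can be measured with a small sketch. The `task` function below is a hypothetical stand-in for a slow telnet probe (it only sleeps), and the helper names are illustrative:

```python
import threading
import time

def task():
    time.sleep(0.2)  # stand-in for a slow network probe

def run_serial(n):
    # start() immediately followed by join(): only one thread runs at a time
    begin = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=task)
        t.start()
        t.join()
    return time.perf_counter() - begin

def run_overlapped(n):
    # start every thread first, then join them all
    begin = time.perf_counter()
    threads = [threading.Thread(target=task) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - begin

serial = run_serial(3)
overlapped = run_overlapped(3)
print(f"serial: {serial:.2f}s, overlapped: {overlapped:.2f}s")
```

With three tasks, the serial variant needs at least three sleep periods, while the overlapped variant should take close to one. Exact timings depend on the machine.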
Self-taught programming student, so I apologize for all the amateur mistakes. I want to learn some deeper subjects, so I'm trying to understand threading, and exception handling.
import threading
import sys
from time import sleep
from random import randint as r

def waiter(n):
    print("Starting thread " + str(n))
    wait_time = r(1, 10)
    sleep(wait_time)
    print("Exiting thread " + str(n))

if __name__ == '__main__':
    try:
        for i in range(5):
            t = threading.Thread(target=waiter, args=(i+1,))
            t.daemon = True
            t.start()
            sleep(3)
        print('All threads complete!')
        sys.exit(1)
    except KeyboardInterrupt:
        print('')
        sys.exit(1)
This script just starts and stops threads after a random time and will kill the program if it receives a ^C. I've noticed that it doesn't print when some threads finish:
Starting thread 1
Starting thread 2
Starting thread 3
Exiting thread 3
Exiting thread 2
Starting thread 4
Exiting thread 1
Exiting thread 4
Starting thread 5
All threads complete!
In this example, it never states it exits thread 5. I find I can fix this if I comment out the t.daemon = True statement, but then exception handling waits for any threads to finish up.
Starting thread 1
Starting thread 2
^C
Exiting thread 1
Exiting thread 2
I can understand that when dealing with threads, it's best that they complete what they're handling before exiting, but I'm just curious as to why this is. I'd really appreciate any answers regarding the nature of threading and daemons to guide my understanding.
The whole point of a daemon thread is that if it's not finished by the time the main thread finishes, it gets summarily killed. Quoting the docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property or the daemon constructor argument.
Note Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
Now, look at your logic. The main thread only sleeps for 3 seconds after starting thread 5. But thread 5 can sleep for anywhere from 1-10 seconds. So, about 70% of the time, it's not going to be finished by the time the main thread wakes up, prints "All threads complete!", and exits. At that point thread 5 may still have up to another 7 seconds of sleep left, in which case it is killed without ever printing "Exiting thread 5".
If this isn't the behavior you want—if you want the main thread to wait for all the threads to finish—then don't use daemon threads.
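A minimal sketch of one way to make "All threads complete!" true: keep a reference to each thread and join() them all before printing. The sleep durations are shortened here, and the `finished` list is a hypothetical addition used only to show which threads ran to completion:

```python
import threading
from random import uniform
from time import sleep

finished = []  # numbers of the threads that ran to completion

def waiter(n):
    sleep(uniform(0.01, 0.05))  # shortened stand-in for the 1-10 second wait
    finished.append(n)          # list.append is thread-safe in CPython

threads = []
for i in range(5):
    t = threading.Thread(target=waiter, args=(i + 1,))
    t.daemon = True  # still abandoned if the main thread exits early
    t.start()
    threads.append(t)

# Wait for every thread before claiming they are all done.
for t in threads:
    t.join()

print("All threads complete!", sorted(finished))
```

Because the main thread now waits in join(), every waiter finishes before the final message, whether or not the threads are daemons.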
from thread import start_new_thread

num_threads = 0

def heron(a):
    global num_threads
    num_threads += 1
    # code has been left out, see above
    num_threads -= 1
    return new

start_new_thread(heron,(99,))
start_new_thread(heron,(999,))
start_new_thread(heron,(1733,))
start_new_thread(heron,(17334,))
while num_threads > 0:
    pass
This is simple threading code. I want to know why we use the while loop in the last lines.
The final while-loop waits for all of the threads to finish before the main thread exits.
It is an expensive check (100% CPU for the spin-wait). You can improve it in one of two ways:
while num_threads > 0:
    time.sleep(0.1)
or by tracking all the threads in a list and joining them one-by-one:
for worker in worker_threads:
    worker.join()
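A complete, runnable sketch of the join-based variant using the higher-level threading module. The `work` function is a hypothetical stand-in, since the body of heron() was left out of the question, and the `results` dictionary is an illustrative way to collect what each thread computed:

```python
import threading

results = {}
results_lock = threading.Lock()

def work(a):
    # Hypothetical stand-in for heron(); the real body is not shown in the question.
    value = a * 2
    with results_lock:  # protect the shared dictionary
        results[a] = value

worker_threads = [threading.Thread(target=work, args=(a,))
                  for a in (99, 999, 1733, 17334)]
for worker in worker_threads:
    worker.start()
for worker in worker_threads:
    worker.join()  # sleeps until that worker is done; no busy loop needed

print(results)
```

This also sidesteps the shared counter: `num_threads += 1` on a global is not atomic, so updating it from several threads without a lock can miscount.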
We want to keep the process alive until all the child threads finish their work. So we must keep executing something in the main thread as long as any child is alive, hence the check on the num_threads variable.
If it weren't for this, all the child threads would be killed as soon as the main thread finished its work, regardless of whether they had actually finished theirs, so waiting for them is mandatory to ensure everything gets done.
To build on Raymond Hettinger's answer: the parent process starts a number of threads, each of which does work. We then wait for each to exit, so that we can collect and process their output. In this case each worker just outputs to the screen, so the parent just has to join() each task to make sure it ran and exited correctly.
Here's an alternate way to code the above. It uses the higher-level library threading (vs thread), and only calls join() on threads besides the current one. We also use threading.enumerate() instead of manually keeping track of worker threads -- easier.
code:
import threading

def heron(a):
    print('{}: a={}'.format(threading.current_thread(), a))

threading.Thread(target=heron, args=(99,)).start()
threading.Thread(target=heron, args=(999,)).start()
threading.Thread(target=heron, args=(1733,)).start()
threading.Thread(target=heron, args=(17334,)).start()
print('')
print('{} threads, joining'.format(threading.active_count()))
for thread in threading.enumerate():
    print('- {} join'.format(thread))
    if thread == threading.current_thread():
        continue
    thread.join()
print('done')
Example output:
python ./jointhread.py
<Thread(Thread-1, started 140381408802560)>: a=99
<Thread(Thread-2, started 140381400082176)>: a=999
<Thread(Thread-3, started 140381400082176)>: a=1733
2 threads, joining
- <_MainThread(MainThread, started 140381429581632)> join
- <Thread(Thread-4, started 140381408802560)> join
<Thread(Thread-4, started 140381408802560)>: a=17334
done