Some doubts about Thread Pool Executor and Thread in python - python

Recently,I tried to use asyncio to execute multiple blocking operations asynchronously.I used the function loop.run_in_executor,It seems that the function puts tasks into the thread pool.As far as I know about thread pool,it reduces the overhead of creating and destroying threads,because it can put in a new task when a task is finished instead of destroying the thread.I wrote the following code for deeper unstanding.
def blocking_funa():
print('starta')
print('starta')
time.sleep(4)
print('enda')
def blocking_funb():
print('startb')
print('startb')
time.sleep(4)
print('endb')
loop = asyncio.get_event_loop()
tasks = [loop.run_in_executor(None, blocking_funa), loop.run_in_executor(None, blocking_funb)]
loop.run_until_complete(asyncio.wait(tasks))
and the output:
starta
startbstarta
startb
(wait for about 4s)
enda
endb
we can see these two tasks are almost simultaneous.now I use threading module:
threads = [threading.Thread(target = blocking_ioa), threading.Thread(target = blocking_iob)]
for thread in threads:
thread.start()
thread.join()
and the output:
starta
starta
enda
startb
startb
endb
Due to the GIL limitation, only one thread is executing at the same time,so I understand the output.But how does thread pool executor make these two tasks almost simultaneous.What is the different between thread pool and thread?And Why does thread pool look like it's not limited by GIL?

You're not making a fair comparison, since you're joining the first thread before starting the second.
Instead, consider:
import time
import threading
def blocking_funa():
print('a 1')
time.sleep(1)
print('a 2')
time.sleep(1)
print('enda (quick)')
def blocking_funb():
print('b 1')
time.sleep(1)
print('b 2')
time.sleep(4)
print('endb (a few seconds after enda)')
threads = [threading.Thread(target=blocking_funa), threading.Thread(target=blocking_funb)]
for thread in threads:
thread.start()
for thread in threads:
thread.join()
The output:
a 1
b 1
b 2
a 2
enda (quick)
endb (a few seconds after enda)
Considering it hardly takes any time to run a print statement, you shouldn't read too much into the prints in the first example getting mixed up.
If you run the code repeatedly, you may find that b 2 and a 2 will change order more or less randomly. Note how in my posted result, b 2 occurred before a 2.
Also, regarding your remark "Due to the GIL limitation, only one thread is executing at the same time" - you're right that the "execution of any Python bytecode requires acquiring the interpreter lock. This prevents deadlocks (as there is only one lock) and doesn’t introduce much performance overhead. But it effectively makes any CPU-bound Python program single-threaded." https://realpython.com/python-gil/#the-impact-on-multi-threaded-python-programs
The important part there is "CPU-bound" - of course you would still benefit from making I/O-bound code multi-threaded.

Python does a release/acquire on the GIL often. This means that runnable GIL controlled threads will all get little sprints. Its not parallel, just interleaved. More important for your example, python tends to release the GIL when doing a blocking operation. The GIL is released before sleep and also when print enters the C libraries.

Related

How to wait for thread execution to complete before starting new thread?

I have a python code in which I can run a maximum of 10 threads at a time due to GPU and compute limitations. I have 100 folders that I want to process and I want each thread to process one folder. Here is some sample code that I have written to achieve this.
def random_wait(thread_id):
# print('Inside wait')
rand_number = random.randint(3, 9)
# print(f'Random number : {rand_number}')
print(f'Thread {thread_id} waiting for {rand_number} seconds')
time.sleep(rand_number)
print(f'Thread {thread_id} completed execution')
if __name__=='__main__':
total_runs = 6
thread_limit = 3
running_threads = list()
for i in range(total_runs):
print(f'Active threads : {threading.active_count()}')
if threading.active_count() > thread_limit:
print(f'Active thread count exceeded')
# check if an existing thread is alive and for it to finish execution
for running_thread in running_threads:
if not running_thread.is_alive():
# Remove thread
running_threads.remove(running_thread)
print(f'Removing thread: {running_thread}')
else:
thread = threading.Thread(target=random_wait, args=(i,), kwargs={})
running_threads.append(thread)
print(f'Starting thread : {i}')
thread.start()
In this code, I am checking if the number of active threads exceeds the thread limit that I have specified, and the process refrains from creating new threads unless there's space for one more thread to be executed.
I am able to refrain the process from starting new threads. However, I lose the threads that I wanted to start and the code just ends up starting and stopping the first three threads. How can I achieve starting a new thread/processing as soon as there's space for one more? Is there a better way in which I just start 10 threads, but as soon as one thread completes, I assign it to start processing another folder?
You should use a ThreadPoolExecutor from the Python standard library concurrent.futures, it automatically manages a fixed number of threads. If you need to execute the same function with different arguments in parallel (as in a parallel for-loop), you can use the .map() method:
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(10) as e:
results = e.map(work, (arg_1, arg_2, ..., arg_n))
If you need to schedule different work in parallel you should use the .submit() method:
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(10) as e:
future_1 = e.submit(work_1, arg_1)
future_2 = e.submit(work_2, arg_2)
result_1 = future_1.result()
result_2 = future_2.result()
In the second case, .submit() returns a Future object which encapsulates the asynchronous execution of the work. You should store that future and get the result when needed. Note that the context manager (with statement) ensures that the .shutdown() method is call before leaving it, so all works are done after this point.

How does Thread().join work in the following case?

I saw the following code in a thread tutorial:
from time import sleep, perf_counter
from threading import Thread
start = perf_counter()
def foo():
sleep(5)
threads = []
for i in range(100):
t = Thread(target=foo,)
t.start()
threads.append(t)
for i in threads:
i.join()
end = perf_counter()
print(f'Took {end - start}')
When I run it it prints Took 5.014557975. Okay, that part is fine. It does not take 500 seconds as the non threaded version would.
What I don't understand is how .join works. I noticed without calling .join I got Took 0.007060926999999995 which indicates that the main thread ended before the child threads. Since '.join()' is supposed to block, when the first iteration of the loop occurs won't it be blocked and have to wait 5 seconds till the second iteration? How does it still manage to run?
I keep reading python threading is not truly multithreaded and it only appears to be (runs on a single core), but if that is the case then how exactly is the background time running if it's not parallel?
So '.join()' is supposed to block, so when the first iteration of the loop occurs wont it be blocked and it has to wait 5 seconds till the second iteration?
Remember all the threads are started at the same time and all of them take ~5s.
The second for loop waits for all the threads to finish. It will take roughly 5s for the first thread to finish, but the remaining 99 threads will finish roughly at the same time, and so will the remaining 99 iterations of the loop.
By the time you're calling join() on the second thread, it is either already finished or will be within a couple of milliseconds.
I keep reading python threading is not truly multithreaded and it only appears to be (runs on a single core), but if that is the case then how exactly is the background time running if it's not parallel?
It's a topic that has been discussed a lot, so I won't add another page-long answer.
Tl;dr: Yes, Python Multithreading doesn't help with CPU-intensive tasks, but it's just fine for tasks that spend a lot of time on waiting for something else (Network, Disk-I/O, user input, a time-based event).
sleep() belongs to the latter group of tasks, so Multithreading will speed it up, even though it doesn't utilize multiple cores simultaneously.
The OS is in control when the thread starts and the OS will context-switch (I believe that is the correct term) between threads.
time functions access a clock on your computer via the OS - that clock is always running. As long as the OS periodically gives each thread time to access a clock the thread's target can tell if it has been sleeping long enough.
The threads are not running in parallel, the OS periodically gives each one a chance to look at the clock.
Here is a little finer detail for what is happening. I subclassed Thread and overrode its run and join methods to log when they are called.
Caveat The documentation specifically states
only override __init__ and run methods
I was surprised overriding join didn't cause problems.
from time import sleep, perf_counter
from threading import Thread
import pandas as pd
c = {}
def foo(i):
c[i]['foo start'] = perf_counter() - start
sleep(5)
# print(f'{i} - start:{start} end:{perf_counter()}')
c[i]['foo end'] = perf_counter() - start
class Test(Thread):
def __init__(self,*args,**kwargs):
self.i = kwargs['args'][0]
super().__init__(*args,**kwargs)
def run(self):
# print(f'{self.i} - started:{perf_counter()}')
c[self.i]['thread start'] = perf_counter() - start
super().run()
def join(self):
# print(f'{self.i} - joined:{perf_counter()}')
c[self.i]['thread joined'] = perf_counter() - start
super().join()
threads = []
start = perf_counter()
for i in range(10):
c[i] = {}
t = Test(target=foo,args=(i,))
t.start()
threads.append(t)
for i in threads:
i.join()
df = pd.DataFrame(c)
print(df)
0 1 2 3 4 5 6 7 8 9
thread start 0.000729 0.000928 0.001085 0.001245 0.001400 0.001568 0.001730 0.001885 0.002056 0.002215
foo start 0.000732 0.000931 0.001088 0.001248 0.001402 0.001570 0.001732 0.001891 0.002058 0.002217
thread joined 0.002228 5.008274 5.008300 5.008305 5.008323 5.008327 5.008330 5.008333 5.008336 5.008339
foo end 5.008124 5.007982 5.007615 5.007829 5.007672 5.007899 5.007724 5.007758 5.008051 5.007549
Hopefully you can see that all the threads are started in sequence very close together; once thread 0 is joined nothing else happens till it stops (foo ends) then each of the other threads are joined and terminate.
Sometimes a thread terminates before it is even joined - for threads one plus foo ends before the thread is joined.

Python multi-threading performance issue related to start()

I had some performance issues with a multi-threading code to parallelize multiple telnet probes.
Slow
My first implementation was is really slow, same a if the tasks were run sequencially:
for printer in printers:
…
thread = threading.Thread(target=collect, args=(task, printers_response), kwargs=kw)
threads.append(thread)
for thread in threads:
thread.start()
thread.join()
Blastlingly Fast
for printer in printers:
…
thread = threading.Thread(target=collect, args=(task, printers_response), kwargs=kw)
threads.append(thread)
thread.start() # <----- moved this
for thread in threads:
thread.join()
Question
I don't get why moving the start() method change the performance so much.
In your first implementation you are actually running the code sequentially because by calling join() immediately after start() the main thread is blocked until the started thread is finished.
thread.join() is blocking every thread as soon as they are created in your first implementation.
According to threading.Thread.join() documentation:
Wait until the thread terminates.
This blocks the calling thread until the thread whose join() method is called terminates -- either normally or through an unhandled exception or until the optional timeout occurs".
In your slow example you start the thread and wait till it is complete, then you iterate to the next thread.
Example
from threading import Thread
from time import sleep
def foo(a, b):
while True:
print(a + ' ' + b)
sleep(1)
ths = []
for i in range(3):
th = Thread(target=foo, args=('hi', str(i)))
ths.append(th)
for th in ths:
th.start()
th.join()
Produces
hi 0
hi 0
hi 0
hi 0
In your slow solution you are basically not using multithreading at all. Id's running a thread, waiting to finish it and then running another - there is no difference in running everything in one thread and this solution - you are running them in series.
The second one on the other hand starts all threads and then joins them. This solution limits the execution time to the longest execution time of one single thread - you are running them in parallel.

Python: Understanding Threading Module

While learning Python's threading module I've run a simple test. Interesting that the threads are running sequentially and not parallel. Is it possible to modify this test code so a program executes the threads in same fashion as multiprocessing does: in parallel?
import threading
def mySlowFunc(arg):
print "\nStarting...", arg
m=0
for i in range(arg):
m+=i
print '\n...Finishing', arg
myList =[35000000, 45000000, 55000000]
for each in myList:
thread = threading.Thread(target=mySlowFunc, args=(each,) )
thread.daemon = True
thread.start()
thread.join()
print "\n Happy End \n"
REVISED CODE:
This version of the code will initiate 6 'threads' running in 'parallel'. But even while there will be 6 threads only two CPU's threads are actually used (6 other Physical CPU threads will be idling and doing nothing).
import threading
def mySlowFunc(arg):
print "\nStarting " + str(arg) + "..."
m=0
for i in range(arg):
m+=i
print '\n...Finishing ' + str(arg)
myList =[35000000, 45000000, 55000000, 25000000, 75000000, 65000000]
for each in myList:
thread = threading.Thread(target=mySlowFunc, args=(each,) )
thread.daemon = False
thread.start()
print "\n Bottom of script reached \n"
From the docs for the join method:
Wait until the thread terminates. This blocks the calling thread until the thread whose join() method is called terminates – either normally or through an unhandled exception – or until the optional timeout occurs.
Just create a list of threads and join them after launching every single one of them.
Edit:
The threads are executing in parallel, you can think of python's threads like a computer with a single core, the thing is, python's threads are best for IO operations (reading/writing a big file, sending data through a socket, that sort of thing). If you want CPU power you need to use the multiprocessing module
If python didn't have the GIL, you ought to be able to see true parallelism by changing your code to only join after you have started all threads:
threads = []
for each in myList:
t = threading.Thread(target=mySlowFunc, args=(each,) )
t.daemon = True
t.start()
threads.append(t)
for t in threads:
t.join()
With the above code in python, you should at least be able to see interleaving: thread #2 doing some work before thread #1 has completed. But, you won't see genuine parallelism. See the GIL link for more background.

In Python threading, how I can I track a thread's completion?

I've a python program that spawns a number of threads. These threads last anywhere between 2 seconds to 30 seconds. In the main thread I want to track whenever each thread completes and print a message. If I just sequentially .join() all threads and the first thread lasts 30 seconds and others complete much sooner, I wouldn't be able to print a message sooner -- all messages will be printed after 30 seconds.
Basically I want to block until any thread completes. As soon as a thread completes, print a message about it and go back to blocking if any other threads are still alive. If all threads are done then exit program.
One way I could think of is to have a queue that is passed to all the threads and block on queue.get(). Whenever a message is received from the queue, print it, check if any other threads are alive using threading.active_count() and if so, go back to blocking on queue.get(). This would work but here all the threads need to follow the discipline of sending a message to the queue before terminating.
I'm wonder if this is the conventional way of achieving this behavior or are there any other / better ways ?
Here's a variation on #detly's answer that lets you specify the messages from your main thread, instead of printing them from your target functions. This creates a wrapper function which calls your target and then prints a message before terminating. You could modify this to perform any kind of standard cleanup after each thread completes.
#!/usr/bin/python
import threading
import time
def target1():
time.sleep(0.1)
print "target1 running"
time.sleep(4)
def target2():
time.sleep(0.1)
print "target2 running"
time.sleep(2)
def launch_thread_with_message(target, message, args=[], kwargs={}):
def target_with_msg(*args, **kwargs):
target(*args, **kwargs)
print message
thread = threading.Thread(target=target_with_msg, args=args, kwargs=kwargs)
thread.start()
return thread
if __name__ == '__main__':
thread1 = launch_thread_with_message(target1, "finished target1")
thread2 = launch_thread_with_message(target2, "finished target2")
print "main: launched all threads"
thread1.join()
thread2.join()
print "main: finished all threads"
The thread needs to be checked using the Thread.is_alive() call.
Why not just have the threads themselves print a completion message, or call some other completion callback when done?
You can the just join these threads from your main program, so you'll see a bunch of completion messages and your program will terminate when they're all done, as required.
Here's a quick and simple demonstration:
#!/usr/bin/python
import threading
import time
def really_simple_callback(message):
"""
This is a really simple callback. `sys.stdout` already has a lock built-in,
so this is fine to do.
"""
print message
def threaded_target(sleeptime, callback):
"""
Target for the threads: sleep and call back with completion message.
"""
time.sleep(sleeptime)
callback("%s completed!" % threading.current_thread())
if __name__ == '__main__':
# Keep track of the threads we create
threads = []
# callback_when_done is effectively a function
callback_when_done = really_simple_callback
for idx in xrange(0, 10):
threads.append(
threading.Thread(
target=threaded_target,
name="Thread #%d" % idx,
args=(10 - idx, callback_when_done)
)
)
[t.start() for t in threads]
[t.join() for t in threads]
# Note that thread #0 runs for the longest, but we'll see its message first!
What I would suggest is loop like this
while len(threadSet) > 0:
time.sleep(1)
for thread in theadSet:
if not thread.isAlive()
print "Thread "+thread.getName()+" terminated"
threadSet.remove(thread)
There is a 1 second sleep, so there will be a slight delay between the thread termination and the message being printed. If you can live with this delay, then I think this is a simpler solution than the one you proposed in your question.
You can let the threads push their results into a threading.Queue. Have another thread wait on this queue and print the message as soon as a new item appears.
I'm not sure I see the problem with using:
threading.activeCount()
to track the number of threads that are still active?
Even if you don't know how many threads you're going to launch before starting it seems pretty easy to track. I usually generate thread collections via list comprehension then a simple comparison using activeCount to the list size can tell you how many have finished.
See here: http://docs.python.org/library/threading.html
Alternately, once you have your thread objects you can just use the .isAlive method within the thread objects to check.
I just checked by throwing this into a multithread program I have and it looks fine:
for thread in threadlist:
print(thread.isAlive())
Gives me a list of True/False as the threads turn on and off. So you should be able to do that and check for anything False in order to see if any thread is finished.
I use a slightly different technique because of the nature of the threads I used in my application. To illustrate, this is a fragment of a test-strap program I wrote to scaffold a barrier class for my threading class:
while threads:
finished = set(threads) - set(threading.enumerate())
while finished:
ttt = finished.pop()
threads.remove(ttt)
time.sleep(0.5)
Why do I do it this way? In my production code, I have a time limit, so the first line actually reads "while threads and time.time() < cutoff_time". If I reach the cut-off, I then have code to tell the threads to shut down.

Categories