Why do we use while num_threads > 0: pass with threads in Python?

from thread import start_new_thread
num_threads = 0
def heron(a):
    global num_threads
    num_threads += 1
    # code has been left out, see above
    num_threads -= 1
    return new
start_new_thread(heron,(99,))
start_new_thread(heron,(999,))
start_new_thread(heron,(1733,))
start_new_thread(heron,(17334,))
while num_threads > 0:
    pass
This is simple threading code. I want to know why we use the while loop in the last lines.

The final while-loop waits for all of the threads to finish before the main thread exits.
It is an expensive check (the spin-wait keeps a CPU core at 100%). You can improve it in one of two ways:
import time
while num_threads > 0:
    time.sleep(0.1)
or by tracking all the threads in a list and joining them one-by-one:
for worker in worker_threads:
    worker.join()
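A minimal Python 3 sketch of the join-based version, assuming the threading module instead of thread. The body of heron below is a hypothetical stand-in (Heron's iteration for a square root) because the original computation was left out, and the results dict is an assumption used because a thread's return value is discarded. Starting the threads from a list also sidesteps a race in the original: num_threads is incremented inside each thread, so the main loop can see 0 and exit before any thread has run.

```python
import threading

def heron(a, results, tolerance=1e-10):
    # Hypothetical body: Heron's method for sqrt(a), standing in for the elided code
    x = a / 2.0 if a > 1 else 1.0
    while abs(x * x - a) > tolerance * a:
        x = (x + a / x) / 2.0
    results[a] = x  # each thread writes its own key

def main():
    results = {}
    worker_threads = [
        threading.Thread(target=heron, args=(a, results))
        for a in (99, 999, 1733, 17334)
    ]
    for worker in worker_threads:
        worker.start()
    # join() blocks without burning CPU until each worker has finished
    for worker in worker_threads:
        worker.join()
    return results

if __name__ == "__main__":
    for a, root in sorted(main().items()):
        print(a, root)
```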

We want to keep the process alive until all child threads finish their work, so the main thread must keep executing something as long as any child is alive; hence the check on the num_threads variable.
If it weren't for this, with the low-level thread module (on most systems) all child threads would be killed as soon as the main thread finished its work, regardless of whether they had actually finished, so waiting for them is mandatory to ensure everything is done.
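If you want to keep a counter like num_threads, a minimal Python 3 sketch can guard it with a threading.Condition, so the main thread sleeps until the count reaches zero instead of spinning. Here the counter is incremented in the main thread before each thread starts, so the wait cannot end before the workers have run; worker, run and the sleep-based work are hypothetical placeholders.

```python
import threading
import time

num_threads = 0
cond = threading.Condition()

def worker(seconds, done):
    global num_threads
    try:
        time.sleep(seconds)  # hypothetical placeholder for the real work
        done.append(seconds)
    finally:
        # Decrement under the lock and wake the waiting main thread
        with cond:
            num_threads -= 1
            cond.notify_all()

def run(durations):
    global num_threads
    done = []
    for seconds in durations:
        with cond:
            num_threads += 1  # count the thread before it starts
        threading.Thread(target=worker, args=(seconds, done)).start()
    with cond:
        # wait_for re-checks the predicate every time it is notified
        cond.wait_for(lambda: num_threads == 0)
    return done
```

The Condition both protects the counter (a bare += on a global is not atomic) and lets the main thread block instead of polling.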

To build on Raymond Hettinger's answer: the parent process starts a number of threads, each of which does work. We then wait for each to exit, so that we can collect and process their output. In this case each worker just outputs to the screen, so the parent just has to join() each task to make sure it ran and exited correctly.
Here's an alternate way to code the above. It uses the higher-level threading module (instead of thread) and calls join() only on threads other than the current one. It also uses threading.enumerate() instead of manually keeping track of the worker threads, which is easier.
code:
import threading
def heron(a):
    print '{}: a={}'.format(threading.current_thread(), a)
threading.Thread(target=heron, args=(99,)).start()
threading.Thread(target=heron, args=(999,)).start()
threading.Thread(target=heron, args=(1733,)).start()
threading.Thread(target=heron, args=(17334,)).start()
print
print '{} threads, joining'.format(threading.active_count())
for thread in threading.enumerate():
    print '- {} join'.format(thread)
    if thread == threading.current_thread():
        continue
    thread.join()
print 'done'
Example output:
python ./jointhread.py
<Thread(Thread-1, started 140381408802560)>: a=99
<Thread(Thread-2, started 140381400082176)>: a=999
<Thread(Thread-3, started 140381400082176)>: a=1733
2 threads, joining
- <_MainThread(MainThread, started 140381429581632)> join
- <Thread(Thread-4, started 140381408802560)> join
<Thread(Thread-4, started 140381408802560)>: a=17334
done
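The snippet above is Python 2. A Python 3 sketch of the same enumerate-and-join idea could look like this; skipping daemon threads is a tweak added here (it is not in the answer), because daemon threads started by libraries may never exit and would make join() hang.

```python
import threading

def heron(a):
    print('{}: a={}'.format(threading.current_thread(), a))

def main():
    for a in (99, 999, 1733, 17334):
        threading.Thread(target=heron, args=(a,)).start()
    print()
    print('{} threads, joining'.format(threading.active_count()))
    joined = 0
    for thread in threading.enumerate():
        print('- {} join'.format(thread))
        # Joining the current thread raises RuntimeError, and daemon threads
        # may never finish, so skip both
        if thread is threading.current_thread() or thread.daemon:
            continue
        thread.join()
        joined += 1
    print('done')
    return joined

if __name__ == '__main__':
    main()
```

Note that enumerate() still returns every live thread in the process, including non-daemon threads you did not start; tracking your own threads in a list, as in the other answer, avoids that.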

Related

How to wait for thread execution to complete before starting new thread?

I have Python code in which I can run a maximum of 10 threads at a time due to GPU and compute limitations. I have 100 folders that I want to process, and I want each thread to process one folder. Here is some sample code that I have written to achieve this.
import random
import threading
import time
def random_wait(thread_id):
    # print('Inside wait')
    rand_number = random.randint(3, 9)
    # print(f'Random number : {rand_number}')
    print(f'Thread {thread_id} waiting for {rand_number} seconds')
    time.sleep(rand_number)
    print(f'Thread {thread_id} completed execution')
if __name__=='__main__':
    total_runs = 6
    thread_limit = 3
    running_threads = list()
    for i in range(total_runs):
        print(f'Active threads : {threading.active_count()}')
        if threading.active_count() > thread_limit:
            print(f'Active thread count exceeded')
            # check if an existing thread is alive and for it to finish execution
            for running_thread in running_threads:
                if not running_thread.is_alive():
                    # Remove thread
                    running_threads.remove(running_thread)
                    print(f'Removing thread: {running_thread}')
        else:
            thread = threading.Thread(target=random_wait, args=(i,), kwargs={})
            running_threads.append(thread)
            print(f'Starting thread : {i}')
            thread.start()
In this code, I am checking if the number of active threads exceeds the thread limit that I have specified, and the process refrains from creating new threads unless there's space for one more thread to be executed.
I am able to stop the process from starting new threads. However, I lose the threads that I wanted to start, and the code just ends up starting and stopping the first three threads. How can I start a new thread as soon as there's space for one more? Is there a better way in which I just start 10 threads and, as soon as one thread completes, assign it to process another folder?
You should use a ThreadPoolExecutor from the Python standard library concurrent.futures, it automatically manages a fixed number of threads. If you need to execute the same function with different arguments in parallel (as in a parallel for-loop), you can use the .map() method:
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(10) as e:
    results = e.map(work, (arg_1, arg_2, ..., arg_n))
If you need to schedule different work in parallel you should use the .submit() method:
from concurrent.futures import ThreadPoolExecutor
with ThreadPoolExecutor(10) as e:
    future_1 = e.submit(work_1, arg_1)
    future_2 = e.submit(work_2, arg_2)
    result_1 = future_1.result()
    result_2 = future_2.result()
In the second case, .submit() returns a Future object, which encapsulates the asynchronous execution of the work. You should store that future and get the result when needed. Note that the context manager (with statement) ensures that the .shutdown() method is called before leaving it, so all work is done after this point.
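Applied to the question's 100 folders, a minimal sketch might look like this. process_folder and the folder names are hypothetical placeholders, and as_completed (also in concurrent.futures) is used so each result is handled as soon as its folder finishes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def process_folder(folder):
    # Hypothetical stand-in for the real per-folder GPU work
    time.sleep(0.01)
    return folder.upper()

def process_all(folders, max_workers=10):
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # submit() queues every folder; at most max_workers run at once, and a
        # worker picks up the next folder as soon as it finishes the previous one
        futures = {executor.submit(process_folder, f): f for f in folders}
        for future in as_completed(futures):
            folder = futures[future]
            results[folder] = future.result()  # re-raises any exception from the worker
    return results
```

With this design you never count threads yourself: the executor keeps exactly max_workers threads busy until the queue of folders is empty.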

Starting thread after thread is finished

Let's say I want to run 10 threads at the same time and, as soon as one finishes, immediately start a new one. How can I do that?
I know that with thread.join() I can wait for threads to finish, but then all 10 threads need to finish first; I want a new one to start immediately after any one finishes.
As I understand it, you need at most 10 threads executing at the same time.
I suggest using threading.BoundedSemaphore().
Here is sample code that uses it:
import threading
from typing import List
# 10 is passed since your requirement is that at most 10 threads execute in parallel
sema4 = threading.BoundedSemaphore(10)
def do_something():
    # Acquire a slot before doing the work and release it afterwards; all 100
    # threads are created, but an 11th one blocks here until a slot is free
    with sema4:
        print("I hope this cleared your doubt :)")
threads_list: List[threading.Thread] = []
# Above variable is used to save threads
for i in range(100):
    thread = threading.Thread(target=do_something)
    threads_list.append(thread)  # saving thread in order to join it later
    thread.start()  # starting the thread
for thread in threads_list:
    thread.join()  # wait for every thread to finish before continuing
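A different technique, which avoids creating 100 threads at all, is a fixed pool of worker threads pulling tasks from a queue.Queue: each worker starts the next task as soon as it finishes the previous one. A minimal sketch, where do_task and the None sentinel are illustrative choices:

```python
import queue
import threading

NUM_WORKERS = 10

def do_task(n):
    # Hypothetical unit of work
    return n * n

def worker(tasks, results, lock):
    while True:
        n = tasks.get()
        if n is None:  # sentinel: no more work for this worker
            break
        value = do_task(n)
        with lock:
            results.append(value)

def run(task_count=100):
    tasks = queue.Queue()
    results = []
    lock = threading.Lock()
    workers = [threading.Thread(target=worker, args=(tasks, results, lock))
               for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    for n in range(task_count):
        tasks.put(n)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker, queued after all real tasks
    for w in workers:
        w.join()
    return results
```

Because the queue is FIFO, every real task is taken before any worker sees its sentinel.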

Python multi-threading performance issue related to start()

I had some performance issues with a multi-threading code to parallelize multiple telnet probes.
Slow
My first implementation was really slow, as if the tasks were run sequentially:
for printer in printers:
    …
    thread = threading.Thread(target=collect, args=(task, printers_response), kwargs=kw)
    threads.append(thread)
for thread in threads:
    thread.start()
    thread.join()
Blazingly Fast
for printer in printers:
    …
    thread = threading.Thread(target=collect, args=(task, printers_response), kwargs=kw)
    threads.append(thread)
    thread.start()  # <----- moved this
for thread in threads:
    thread.join()
Question
I don't understand why moving the start() call changes the performance so much.
In your first implementation you are actually running the code sequentially because by calling join() immediately after start() the main thread is blocked until the started thread is finished.
In your first implementation, thread.join() blocks the main thread as soon as each worker is started, so the next worker is not started until the previous one has finished.
According to the threading.Thread.join() documentation:
Wait until the thread terminates.
This blocks the calling thread until the thread whose join() method is called terminates -- either normally or through an unhandled exception -- or until the optional timeout occurs.
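The optional timeout mentioned in that quote can be seen in this small Python 3 sketch (slow is a hypothetical task). join() always returns None, so calling is_alive() afterwards is how you tell whether the join timed out.

```python
import threading
import time

def slow():
    time.sleep(0.5)  # hypothetical task that outlives the timeout

def demo():
    t = threading.Thread(target=slow)
    t.start()
    returned = t.join(timeout=0.1)  # waits at most 0.1 s
    timed_out = t.is_alive()        # still running, so the join timed out
    t.join()                        # now wait until it really finishes
    return returned, timed_out, t.is_alive()
```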
In your slow example you start the thread and wait till it is complete, then you iterate to the next thread.
Example
from threading import Thread
from time import sleep
def foo(a, b):
    while True:
        print(a + ' ' + b)
        sleep(1)
ths = []
for i in range(3):
    th = Thread(target=foo, args=('hi', str(i)))
    ths.append(th)
for th in ths:
    th.start()
    th.join()
Produces (and keeps producing forever, because thread 0 never ends, so the first join() never returns):
hi 0
hi 0
hi 0
hi 0
In your slow solution you are basically not using multithreading at all. It runs a thread, waits for it to finish and then runs another; there is no difference between running everything in one thread and this solution: you are running them in series.
The second one, on the other hand, starts all threads and then joins them. With this solution the total execution time is limited to roughly that of the longest-running single thread: you are running them in parallel.
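A small timing sketch makes the difference visible. probe is a hypothetical I/O-bound task simulated with sleep; with three 0.2 s probes the sequential version should take roughly 0.6 s and the parallel one roughly 0.2 s.

```python
import threading
import time

def probe():
    time.sleep(0.2)  # hypothetical I/O-bound task, e.g. waiting on a telnet reply

def run_sequential(n=3):
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=probe)
        t.start()
        t.join()  # blocks here, so the next thread has not been started yet
    return time.perf_counter() - start

def run_parallel(n=3):
    start = time.perf_counter()
    threads = [threading.Thread(target=probe) for _ in range(n)]
    for t in threads:
        t.start()  # every thread is running before we wait on any of them
    for t in threads:
        t.join()
    return time.perf_counter() - start
```

This speed-up applies to work that waits on I/O, like the telnet probes; for CPU-bound Python code, CPython's GIL keeps threads from running bytecode in parallel.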

Python Threading/Daemon

Self-taught programming student, so I apologize for all the amateur mistakes. I want to learn some deeper subjects, so I'm trying to understand threading, and exception handling.
import threading
import sys
from time import sleep
from random import randint as r
def waiter(n):
    print "Starting thread " + str(n)
    wait_time = r(1,10)
    sleep(wait_time)
    print "Exiting thread " + str(n)
if __name__=='__main__':
    try:
        for i in range(5):
            t = threading.Thread(target=waiter, args=(i+1,))
            t.daemon = True
            t.start()
            sleep(3)
        print 'All threads complete!'
        sys.exit(1)
    except KeyboardInterrupt:
        print ''
        sys.exit(1)
This script just starts and stops threads after a random time and will kill the program if it receives a ^C. I've noticed that it doesn't print when some threads finish:
Starting thread 1
Starting thread 2
Starting thread 3
Exiting thread 3
Exiting thread 2
Starting thread 4
Exiting thread 1
Exiting thread 4
Starting thread 5
All threads complete!
In this example, it never states it exits thread 5. I find I can fix this if I comment out the t.daemon = True statement, but then exception handling waits for any threads to finish up.
Starting thread 1
Starting thread 2
^C
Exiting thread 1
Exiting thread 2
I can understand that when dealing with threads, it's best that they complete what they're handling before exiting, but I'm just curious as to why this is. I'd really appreciate any answers regarding the nature of threading and daemons to guide my understanding.
The whole point of a daemon thread is that if it's not finished by the time the main thread finishes, it gets summarily killed. Quoting the docs:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property or the daemon constructor argument.
Note Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an Event.
Now, look at your logic. The main thread only sleeps for 3 seconds after starting thread 5, but thread 5 can sleep for anywhere from 1 to 10 seconds. So about 70% of the time it will not have finished by the time the main thread wakes up, prints "All threads complete!", and exits. If, say, thread 5 drew a wait of 8 seconds, it is still asleep for another 5 seconds at that point, so it is killed without ever printing "Exiting thread 5".
If this isn't the behavior you want—if you want the main thread to wait for all the threads to finish—then don't use daemon threads.
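Following the documentation's advice, a minimal Python 3 sketch can use non-daemon threads plus a threading.Event, so Ctrl+C asks the workers to stop early instead of killing them mid-task. The 0.1 to 0.5 second waits are shortened stand-ins for the question's randint(1, 10), and the log list is a hypothetical way of recording what each thread did (list.append is safe to call from several threads in CPython).

```python
import random
import threading

def waiter(n, stop, log):
    wait_time = random.uniform(0.1, 0.5)
    # Event.wait returns True if stop was set before the timeout expired
    stopped_early = stop.wait(timeout=wait_time)
    log.append((n, "stopped early" if stopped_early else "finished"))

def main(num_threads=5, stop=None):
    stop = stop if stop is not None else threading.Event()
    log = []
    threads = [threading.Thread(target=waiter, args=(i + 1, stop, log))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    try:
        for t in threads:
            # Short timed joins keep the main thread responsive to Ctrl+C
            while t.is_alive():
                t.join(timeout=0.2)
    except KeyboardInterrupt:
        stop.set()  # ask every worker to finish early
        for t in threads:
            t.join()
    return log

if __name__ == "__main__":
    for n, status in main():
        print("Exiting thread", n, status)
```

Every thread gets to record its exit, whether it ran to completion or was asked to stop.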

In Python threading, how I can I track a thread's completion?

I have a Python program that spawns a number of threads. These threads last anywhere from 2 to 30 seconds. In the main thread I want to track whenever each thread completes and print a message. If I just sequentially .join() all threads and the first thread lasts 30 seconds and others complete much sooner, I wouldn't be able to print a message sooner -- all messages will be printed after 30 seconds.
Basically I want to block until any thread completes. As soon as a thread completes, print a message about it and go back to blocking if any other threads are still alive. If all threads are done then exit program.
One way I could think of is to have a queue that is passed to all the threads and block on queue.get(). Whenever a message is received from the queue, print it, check if any other threads are alive using threading.active_count() and if so, go back to blocking on queue.get(). This would work but here all the threads need to follow the discipline of sending a message to the queue before terminating.
I'm wondering whether this is the conventional way of achieving this behavior, or are there other/better ways?
Here's a variation on @detly's answer that lets you specify the messages from your main thread, instead of printing them from your target functions. This creates a wrapper function which calls your target and then prints a message before terminating. You could modify this to perform any kind of standard cleanup after each thread completes.
#!/usr/bin/python
import threading
import time
def target1():
    time.sleep(0.1)
    print "target1 running"
    time.sleep(4)
def target2():
    time.sleep(0.1)
    print "target2 running"
    time.sleep(2)
def launch_thread_with_message(target, message, args=[], kwargs={}):
    def target_with_msg(*args, **kwargs):
        target(*args, **kwargs)
        print message
    thread = threading.Thread(target=target_with_msg, args=args, kwargs=kwargs)
    thread.start()
    return thread
if __name__ == '__main__':
    thread1 = launch_thread_with_message(target1, "finished target1")
    thread2 = launch_thread_with_message(target2, "finished target2")
    print "main: launched all threads"
    thread1.join()
    thread2.join()
    print "main: finished all threads"
The thread needs to be checked using the Thread.is_alive() call.
Why not just have the threads themselves print a completion message, or call some other completion callback when done?
You can then just join these threads from your main program, so you'll see a bunch of completion messages and your program will terminate when they're all done, as required.
Here's a quick and simple demonstration:
#!/usr/bin/python
import threading
import time
def really_simple_callback(message):
    """
    This is a really simple callback. `sys.stdout` already has a lock built-in,
    so this is fine to do.
    """
    print message
def threaded_target(sleeptime, callback):
    """
    Target for the threads: sleep and call back with completion message.
    """
    time.sleep(sleeptime)
    callback("%s completed!" % threading.current_thread())
if __name__ == '__main__':
    # Keep track of the threads we create
    threads = []
    # callback_when_done is effectively a function
    callback_when_done = really_simple_callback
    for idx in xrange(0, 10):
        threads.append(
            threading.Thread(
                target=threaded_target,
                name="Thread #%d" % idx,
                args=(10 - idx, callback_when_done)
            )
        )
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Note that thread #0 runs for the longest and is joined first, but its
    # message is printed last: each thread reports as soon as it finishes
What I would suggest is a loop like this:
while len(threadSet) > 0:
    time.sleep(1)
    for thread in list(threadSet):  # iterate over a copy so threads can be removed safely
        if not thread.isAlive():
            print "Thread " + thread.getName() + " terminated"
            threadSet.remove(thread)
There is a 1 second sleep, so there will be a slight delay between the thread termination and the message being printed. If you can live with this delay, then I think this is a simpler solution than the one you proposed in your question.
You can let the threads push their results into a Queue (queue.Queue in Python 3, Queue.Queue in Python 2). Have another thread wait on this queue and print the message as soon as a new item appears.
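A minimal Python 3 sketch of the queue approach, with the main thread doing the waiting as the question proposes. The worker names and durations are hypothetical. Putting the report in a finally block means a worker still reports if its work raises, which eases the "discipline" concern from the question.

```python
import queue
import threading
import time

def worker(name, seconds, done):
    try:
        time.sleep(seconds)  # hypothetical work lasting `seconds`
    finally:
        done.put(name)       # report completion even if the work raised

def main(durations):
    done = queue.Queue()
    threads = [threading.Thread(target=worker, args=("worker-%d" % i, s, done))
               for i, s in enumerate(durations)]
    for t in threads:
        t.start()
    order = []
    for _ in threads:
        # Blocks until any worker reports, so messages arrive in completion order
        name = done.get()
        print(name, "finished")
        order.append(name)
    for t in threads:
        t.join()  # every worker has reported, so these return almost immediately
    return order
```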
I'm not sure I see the problem with using:
threading.activeCount()
to track the number of threads that are still active?
Even if you don't know how many threads you're going to launch before starting, it seems pretty easy to track. I usually generate thread collections via a list comprehension; then a simple comparison of activeCount() with the list size tells you how many have finished (keep in mind that the count also includes the main thread).
See here: http://docs.python.org/library/threading.html
Alternately, once you have your thread objects you can just use the .isAlive method within the thread objects to check.
I just checked by throwing this into a multithread program I have and it looks fine:
for thread in threadlist:
    print(thread.isAlive())
Gives me a list of True/False as the threads turn on and off. So you should be able to do that and check for anything False in order to see if any thread is finished.
I use a slightly different technique because of the nature of the threads I used in my application. To illustrate, this is a fragment of a test-strap program I wrote to scaffold a barrier class for my threading class:
while threads:
    finished = set(threads) - set(threading.enumerate())
    while finished:
        ttt = finished.pop()
        threads.remove(ttt)
    time.sleep(0.5)
Why do I do it this way? In my production code, I have a time limit, so the first line actually reads "while threads and time.time() < cutoff_time". If I reach the cut-off, I then have code to tell the threads to shut down.
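Another option, not used in the answers above, is to let concurrent.futures do the tracking: wait() with return_when=FIRST_COMPLETED blocks until at least one task finishes, so each completion can be reported as soon as it happens. A minimal sketch with a hypothetical task function:

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def task(seconds):
    time.sleep(seconds)  # hypothetical work lasting `seconds`
    return seconds

def track(durations):
    finished = []
    with ThreadPoolExecutor(max_workers=max(1, len(durations))) as executor:
        pending = {executor.submit(task, s) for s in durations}
        while pending:
            # Block until at least one of the pending futures completes
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                print("task finished after", future.result(), "seconds")
                finished.append(future.result())
    return finished
```

wait() also accepts a timeout argument, but reaching it does not stop tasks that are already running; they would still need a signal such as an Event to shut down at a cut-off time.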
