What happens if I don't join() a python thread? - python

I have a query. I have seen examples where developers write something like the code as follows:
import threading

def do_something():
    return True

t = threading.Thread(target=do_something)
t.start()
t.join()
I know that join() signals the interpreter to wait till the thread is completely executed. But what if I do not write t.join()? Will the thread get closed automatically and will it be reused later?
Please let me know the answer. It's my first attempt at creating a multi-threaded application in Python 3.5.0.

A Python thread is just a regular OS thread. If you don't join it, it still keeps running concurrently with the current thread. It will eventually die when the target function completes or raises an exception. There is no such thing as "thread reuse"; once a thread is dead, it rests in peace.
Unless the thread is a "daemon thread" (via the daemon constructor argument or the daemon property), it will be implicitly joined before the program exits; otherwise, it is killed abruptly.
One thing to remember when writing multithreaded programs in Python is that they are of limited use due to the infamous Global Interpreter Lock (GIL). In short, using threads won't make your CPU-intensive program any faster. They are useful only when you perform something that involves waiting (e.g. waiting for a certain file system event to happen in a thread).
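For instance, a minimal sketch (my own example, not from the answer) where threads are used purely to overlap waiting: the two sleeps run concurrently, so the whole program takes roughly two seconds instead of four.
import threading
import time

def wait_for_event(name):
    # stand-in for I/O or waiting on a file system event
    time.sleep(2)
    print(name, "done waiting")

threads = [threading.Thread(target=wait_for_event, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()  # elapsed time is about 2 seconds, not 4, because the waits overlap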

The join part means the main program will wait for the thread to end before continuing. Without join, the main program will end and the thread will continue.
Now if you set the daemon parameter to True, it means the thread depends on the main program, and it will end when the main program ends.
Here is an example to understand this better:
import threading
import time

def do_something():
    time.sleep(2)
    print("do_something")
    return True

t = threading.Thread(target=do_something)
t.daemon = True  # without the daemon flag, the thread running in parallel continues even after the main program reaches its end
t.start()
t.join()  # with this, the main program will wait until the thread ends
print("end of main program")
no daemon, no join:
end of main program
do_something
daemon only:
end of main program
join only:
do_something
end of main program
daemon and join:
do_something
end of main program
# Note : in this case the daemon parameter is useless

Without join(), non-daemon threads run concurrently with the main thread and are still run to completion.
Without join(), daemon threads also run concurrently with the main thread, but when the main thread completes, any daemon threads that are still running are terminated without completing.
You can see my answer in this post, which explains this in detail.
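As a quick illustration of the daemon case (a minimal sketch of my own, not from the linked post), the daemon thread below never gets to print, because the main thread finishes first and the process exits:
import threading
import time

def slow_work():
    time.sleep(2)
    print("finished")  # never printed: the process exits before the daemon thread gets here

t = threading.Thread(target=slow_work)
t.daemon = True
t.start()
print("end of main program")  # the program exits here, killing the daemon thread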

Related

Trying to understand python multithreading

Please consider this code:
import threading
def printer():
for i in range(2):
with lock:
print ['foo', 'bar', 'baz']
def main():
global lock
lock = threading.Lock()
threads = [threading.Thread(target=printer) for x in xrange(2)]
for t in threads:
t.start()
t.join()
main()
I can understand this code and it is clear: we create two threads and run them sequentially; the second thread only runs when the first one has finished. OK, now consider another variant:
import threading

def printer():
    for i in range(2):
        with lock:
            print ['foo', 'bar', 'baz']

def main():
    global lock
    lock = threading.Lock()
    threads = [threading.Thread(target=printer) for x in xrange(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

main()
What happens here? OK, we run them in parallel, but what is the purpose of making the main thread wait for the child threads in the second variant? How can it influence the output?
In the second variant, the ordering of execution is much less defined.
The lock is released each time through the loop in printer. In both variants, you have two threads, each running two loop iterations.
In the first variant, since only one thread runs at a time, you know the total ordering.
In the second variant, each time the lock is released, the thread running may change.
So you might get
thread 1 loop 1
thread 1 loop 2
thread 2 loop 1
thread 2 loop 2
or perhaps
thread 2 loop 1
thread 1 loop 1
thread 1 loop 2
thread 2 loop 2
The only constraint is that loop 1 within a given thread runs before loop 2 within that thread, and that the two print statements come together, since the lock is held for both of them.
In this particular case I'm not sure the call to t.join() in the second variant has an observable effect. It guarantees that the main thread will be the last thread to end, but I'm not sure that in this code you can observe that in any way. In more complex code, joining the threads can be important so that cleanup actions are only performed after all threads terminate. This can also be very important if you have daemon threads, because the entire program will terminate when all non-daemon threads terminate.
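A minimal sketch of that last point (my own example, not the poster's code): cleanup or post-processing that must only happen once every worker is done belongs after the joins.
import threading

results = []

def worker(n):
    results.append(n * n)

threads = [threading.Thread(target=worker, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# only safe to read or clean up the shared results once every worker has finished
print(sorted(results))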
To better understand multithreading in Python, you first need to understand the relationship between the main thread and the child threads.
The main thread is the entry point of the program; it is created by the system when you run your script. For example, in your script, the main function runs in the main thread.
Child threads are created by the main thread when you instantiate the Thread class.
The most important thing is how the main thread controls a child thread. Basically, the Thread instance is everything the main thread knows about, and can use to control, that child thread. A newly created child thread does not run immediately; it only runs after the main thread calls start() on the Thread instance. After that, you can assume the main thread and the child thread are running in parallel.
Another important thing is how the main thread knows that the child thread's task is done. Although the main thread knows nothing about how the child thread performs its task, it is aware of the child thread's running status: Thread.is_alive() lets the main thread check whether a thread is still running. In practice, Thread.join() is used to make the main thread wait until the child thread is done; this call blocks the main thread.
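As a rough illustration of those two mechanisms (a minimal sketch with names of my own choosing), polling is_alive() versus simply calling join():
import threading
import time

def task():
    time.sleep(1)

child = threading.Thread(target=task)
child.start()

# option 1: poll the child's status while doing other work
while child.is_alive():
    time.sleep(0.1)  # the main thread is free to do other things here

# option 2: block until the child thread is done
# (returns immediately here because the child has already finished)
child.join()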
OK, let's examine the two scripts you are confused by. For the first script:
for t in threads:
    t.start()
    t.join()
The child threads in the loop are started and then joined one by one. Note that start() does not block the main thread, while join() blocks the main thread until that child thread is done. Thus the threads run sequentially.
While for the second script:
for t in threads:
    t.start()
for t in threads:
    t.join()
All child threads are started in the first loop. Since Thread.start() does not block the main thread, all child threads are running in parallel after the first loop. In the second loop, the main thread waits for each child thread to finish, one by one.
Now you should notice the difference between the two scripts: in the first one, the child threads run one by one, while in the second one, they run simultaneously.
There are other topics worth knowing about for Python threading:
(1) How to handle the KeyboardInterrupt exception, e.g. when you want to terminate the program with Ctrl-C. Only the main thread receives the exception, so you have to handle the termination of the child threads yourself.
(2) Multithreading vs. multiprocessing. Although we say that threading is parallel, it is not true parallelism at the CPU level. So if your application is CPU intensive, try multiprocessing; if your application is I/O intensive, multithreading may be sufficient.
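A rough sketch of that contrast (my own example, hedged): the same CPU-bound function run first in threads and then in processes; only the multiprocessing version can use more than one core.
import threading
import multiprocessing

def cpu_bound(n):
    # busy loop standing in for CPU-heavy work
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    # threads: the GIL prevents the Python bytecode from running in parallel
    threads = [threading.Thread(target=cpu_bound, args=(5000000,)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # processes: each worker has its own interpreter and GIL, so several cores can be used
    pool = multiprocessing.Pool(4)
    pool.map(cpu_bound, [5000000] * 4)
    pool.close()
    pool.join()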
By the way, reading through the threading section of the Python documentation and trying some code may help you understand it.
Hope this would be helpful. Thanks.

Killing a script while a function is active

I have a main thread and another thread which starts after threading.Timer(1,success).start() calls it.
In the defined function success I need to kill the whole python script, I tried sys.exit() but that only ends the thread. I can't signal the main thread as the reason the timer went off was because the main thread took too long to respond, so there's no guarantee the signal would be read by the main thread.
I considered using os._exit(), which works, but it's messy as the script is reloaded after a second by another program and memory fills up.
You can join the created thread with a timeout:
join(timeout=None)
With a timeout, the join returns after at most that many seconds, so it will not wait indefinitely for completion.
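A minimal sketch of what that could look like (the names and the timeout value are my own, not from the question):
import threading

def success():
    pass  # the timer callback from the question

t = threading.Timer(1, success)
t.start()

t.join(timeout=5)  # wait at most 5 seconds instead of indefinitely
if t.is_alive():
    print("timer thread is still running after the timeout")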

Why does threading.Thread operate synchronously by blocking execution in python 2.5?

I am limited to Python 2.5, and I thought that threading.Thread was asynchronous. I run python t.py and the script does not return to the shell until 3 seconds have gone by, which means it's blocking. Why is it blocking?
My Code:
#!/usr/bin/python
import threading, time

def doit():
    time.sleep(3)
    print "DONE"

thr = threading.Thread(target=doit, args=(), kwargs={})
thr.start()  # will run "doit"
By default, threads in Python are non-daemonic. A Python application will not exit until all non-daemon threads have completed, so in your case it won't exit until doit has finished. If you want the script to exit immediately upon reaching the end of the main thread, you need to make the thread a daemon by setting the daemon attribute prior to starting the thread:
thr = threading.Thread(target=doit, args=(), kwargs={})
thr.daemon = True
thr.start()
Threading in Python is "kind-of" asynchronous. What does this mean?
Only one thread can be running Python code at one time
threads that run Python code and are CPU-intensive will not benefit
Your issue seems to be that you think a Python thread should keep running after Python itself quits -- that's not how it works. If you do make a thread a daemon then when Python quits those threads just die, instantly -- no cleanup, no error recovery, just dead.
If you want to actually make a daemon process, something that keeps running in the background after the main application exits, you want to look at os.fork(). If you want to do it the easier way, you can try my daemon library, pandaemonium
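A very rough sketch of the os.fork() approach (a simplified, hypothetical example; real daemonization usually also involves a second fork and redirecting the standard streams):
import os
import sys
import time

pid = os.fork()
if pid > 0:
    sys.exit(0)  # the parent exits immediately

os.setsid()  # the child detaches from the controlling terminal

# the "daemon" process keeps running in the background
while True:
    with open("/tmp/background.log", "a") as f:
        f.write("still alive\n")
    time.sleep(60)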

What is a python thread

I have several questions regarding Python threads.
Is a Python thread a Python or OS implementation?
When I use htop a multi-threaded script has multiple entries - the same memory consumption, the same command but a different PID. Does this mean that a [Python] thread is actually a special kind of process? (I know there is a setting in htop to show these threads as one process - Hide userland threads)
Documentation says:
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left.
My interpretation/understanding was: main thread terminates when all non-daemon threads are terminated.
So python daemon threads are not part of Python program if "the entire Python program exits when only daemon threads are left"?
Python threads are implemented using OS threads in all implementations I know (CPython, PyPy, and Jython). For each Python thread, there is an underlying OS thread.
Some operating systems (Linux being one of them) show all different threads launched by the same executable in the list of all running processes. This is an implementation detail of the OS, not of Python. On some other operating systems, you may not see those threads when listing all the processes.
The process will terminate when the last non-daemon thread finishes. At that point, all the daemon threads are terminated. So those threads are part of your process, but they do not prevent it from terminating (whereas a regular thread will). This is implemented in pure Python. A process terminates when the system _exit function is called (it kills all threads), and when the main thread terminates (or sys.exit is called), the Python interpreter checks whether there is another non-daemon thread running. If there is none, it calls _exit; otherwise it waits for the non-daemon threads to finish.
The daemon thread flag is implemented in pure Python by the threading module. When the module is loaded, a Thread object is created to represent the main thread, and its _exitfunc method is registered as an atexit hook.
The code of this function is:
class _MainThread(Thread):

    def _exitfunc(self):
        self._Thread__stop()
        t = _pickSomeNonDaemonThread()
        if t:
            if __debug__:
                self._note("%s: waiting for other threads", self)
        while t:
            t.join()
            t = _pickSomeNonDaemonThread()
        if __debug__:
            self._note("%s: exiting", self)
        self._Thread__delete()
This function is called by the Python interpreter when sys.exit is called, or when the main thread terminates. When the function returns, the interpreter calls the system _exit function; and the function only returns when nothing but daemon threads (if any) are left running.
When the _exit function is called, the OS terminates all of the process's threads, and then terminates the process. The Python runtime will not call the _exit function until all the non-daemon threads are done.
All threads are part of the process.
My interpretation/understanding was: main thread terminates when all
non-daemon threads are terminated.
So python daemon threads are not part of python program if "the entire
Python program exits when only daemon threads are left"?
Your understanding is incorrect. For the OS, a process is composed of many threads, all of which are equal (there is nothing special about the main thread for the OS, except that the C runtime adds a call to _exit at the end of the main function). And the OS doesn't know about daemon threads; that is purely a Python concept.
The Python interpreter uses native threads to implement Python threads, but it has to remember the list of threads it has created. Using its atexit hook, it ensures that _exit is called only when the last non-daemon thread has terminated. By "the entire Python program", the documentation means the whole process.
The following program can help you understand the difference between a daemon thread and a regular thread:
import sys
import time
import threading

class WorkerThread(threading.Thread):
    def run(self):
        while True:
            print 'Working hard'
            time.sleep(0.5)

def main(args):
    use_daemon = False
    for arg in args:
        if arg == '--use_daemon':
            use_daemon = True
    worker = WorkerThread()
    worker.setDaemon(use_daemon)
    worker.start()
    time.sleep(1)
    sys.exit(0)

if __name__ == '__main__':
    main(sys.argv[1:])
If you execute this program with the '--use_daemon' flag, you will see that it only prints a small number of 'Working hard' lines. Without this flag, the program does not terminate even when the main thread finishes, and it keeps printing 'Working hard' lines until it is killed.
I'm not familiar with the implementation, so let's make an experiment:
import threading
import time

def target():
    while True:
        print 'Thread working...'
        time.sleep(5)

NUM_THREADS = 5

for i in range(NUM_THREADS):
    thread = threading.Thread(target=target)
    thread.start()
The number of threads reported using ps -o cmd,nlwp <pid> is NUM_THREADS+1 (one more for the main thread), so as long as the OS tools can detect them, they should be OS threads. I tried with both CPython and Jython and, although in Jython there are some other threads running, for each extra thread that I add, ps increments the thread count by one.
I'm not sure about htop's behaviour, but ps seems to be consistent.
I added the following line before starting the threads:
thread.daemon = True
When I executed it using CPython, the program terminated almost immediately and no process was found using ps, so my guess is that the program terminated together with the threads. In Jython the program behaved the same way as before (it didn't terminate), so maybe there are some other threads from the JVM preventing the program from terminating, or daemon threads aren't supported.
Note: I used Ubuntu 11.10 with python 2.7.2+ and jython 2.2.1 on java1.6.0_23
Python threads are practically an interpreter implementation, because of the so-called Global Interpreter Lock (GIL), even though they technically use the OS-level threading mechanisms. On *nix they use pthreads, but the GIL effectively makes them a hybrid stuck in the application-level threading paradigm. So you will see them multiple times in ps/top output on *nix systems, but they still behave (performance-wise) like software-implemented threads.
No, you are just seeing the underlying thread implementation of your OS. This kind of behaviour is exposed by *nix pthread-like threading, and I'm told even Windows implements threads this way.
When your program closes, it waits for all threads to finish as well. If you have threads that could postpone the exit indefinitely, it may be wise to flag those threads as "daemons" and allow your program to finish even if those threads are still running.
Some reference material you might be interested in:
Linux Gazette: Understanding Threading in Python.
Doug Hellman: Multi-processing techniques in Python
David Beazley: PyCon 2010:Understanding the Python GIL(Video-presentation)
There are great answers to the question here, but I feel the daemon threads question is still not explained in a simple fashion, so this answer addresses just the third question:
"main thread terminates when all non-daemon threads are terminated."
So python daemon threads are not part of Python program if "the entire Python program exits when only daemon threads are left"?
If you think about what a daemon is, it is usually a service: some code that runs in an infinite loop and serves requests, fills queues, accepts connections, etc. Other threads use it. It has no purpose when running by itself (in single-process terms).
So the program can't wait for the daemon thread to terminate, because it might never happen. Python ends the program when all non-daemon threads are done, and at that point it also stops the daemon threads.
To wait until a daemon thread has completed its work, use the join() method.
daemon_thread.join() will make Python wait for the daemon thread as well before exiting. join() also accepts a timeout argument.
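For instance (a small sketch of my own, not from the answer):
import threading
import time

def serve_forever():
    while True:
        time.sleep(0.5)  # stand-in for serving requests

daemon_thread = threading.Thread(target=serve_forever)
daemon_thread.daemon = True
daemon_thread.start()

# give the daemon thread a chance to work before the program exits
daemon_thread.join(timeout=2)  # waits at most 2 seconds; then the program exits and the daemon thread is stopped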

Parent Thread exiting before Child Threads [python]

I'm using Python in a webapp (CGI for testing, FastCGI for production) that needs to send an occasional email (when a user registers or something else important happens). Since communicating with an SMTP server takes a long time, I'd like to spawn a thread for the mail function so that the rest of the app can finish up the request without waiting for the email to finish sending.
I tried using thread.start_new(func, (args)), but the parent returns and exits before the sending is complete, thereby killing the sending process before it does anything useful. Is there any way to keep the process alive long enough for the child process to finish?
Take a look at the join() method of threading.Thread. Basically it will block your calling thread until the child thread has returned (thus preventing it from exiting before it should).
Update:
To avoid making your main thread unresponsive to new requests, you can use a while loop:
while threading.active_count() > 1:  # active_count() includes the main thread itself
    # ... look for new requests to handle ...
    time.sleep(0.1)
    # or try joining your threads with a timeout
    #for thread in my_threads:
    #    thread.join(0.1)
Update 2:
It also looks like thread.start_new(func, args) is obsolete; it was updated to thread.start_new_thread(function, args[, kwargs]). You can also create threads with the higher-level threading package (this is the package that provides the active_count() used in the previous code block):
import threading
my_thread = threading.Thread(target=func, args=(), kwargs={})
my_thread.daemon = True
my_thread.start()
You might want to use threading.enumerate if you have multiple workers and want to see which one(s) are still running.
Another alternative is to use a threading.Event: the main thread sets the event and starts the worker thread off; the worker thread clears the event when it finishes its work, and the main thread checks whether the event is still set to figure out whether it can exit.
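A small sketch of that pattern (the names are my own, assuming the threading module):
import threading
import time

working = threading.Event()

def send_mail():
    time.sleep(2)  # stand-in for talking to the SMTP server
    working.clear()  # the worker unsets the event when it finishes

working.set()  # the main thread sets the event before starting the worker
worker = threading.Thread(target=send_mail)
worker.start()

# ... handle the rest of the request ...

while working.is_set():  # the main thread can exit once the event has been cleared
    time.sleep(0.1)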
