Daemon threads vs daemon processes in Python

Daemon threads vs daemon processes in Python - python

Based on the Python documentation, daemon threads are threads that die once the main thread dies. This seems to be the complete opposite behavior of daemon processes which involve creating a child process and terminating the parent process in order to have init take over the child process (aka killing the parent process does NOT kill the child process).
So why do daemon threads die when the parent dies, is this a misnomer? I would think that "daemon" threads would keep running after the main process has been terminated.

It's just names meaning different things in different contexts.
In case you are not aware, like threading.Thread, multiprocessing.Process also can be flagged as "daemon". Your description of "daemon processes" fits to Unix-daemons, not to Python's daemon-processes.
The docs also have a section about Process.daemon:
... Note that a daemonic process is not allowed to create child processes.
Otherwise a daemonic process would leave its children orphaned if it
gets terminated when its parent process exits. Additionally, these are
not Unix daemons or services, they are normal processes that will be
terminated (and not joined) if non-daemonic processes have exited.
The only thing in common between Python's daemon-processes and Unix-daemons (or Windows "Services") is that you would use them for background-tasks
(for Python: only an option for tasks which don't need proper clean up on shutdown, though).
Python imposes it's own abstraction layer on top of OS-threads and processes. The daemon-attribute for Thread and Process is about this OS-independent, Python-level abstraction.
At the Python-level, a daemon-thread is a thread which doesn't get joined (awaited to exit voluntarily) when the main-thread exits and a daemon-process is a process which gets terminated (not joined) when the parent-process exits. Daemon-threads and processes both experience the same behavior in that their natural exit is not awaited in case the main or parent-process is shutting down. That's all.
Note that Windows doesn't even have the concept of "related processes" like Unix, but Python implements this relationship of "child" and "parent" in a cross-platform manner.
I would think that "daemon" threads would keep running after the main
process has been terminated.
A thread cannot exist outside of a process. A process always hosts and gives context to at least one thread.

Related

Can a Daemon process fork child processes in python?

I have a daemon process that keeps on running which I created using runit package. I want daemon process to listen to a table and perform tasks based on the column of the table which says what task it needs to perform.
EG: table 'A' has column job_type.
I was thinking of forking child processes from this daemon process every time it gets a new task to perform (based on the new row inserted in the table A which daemon listens to).
The multiprocessing module says I can't or shouldn't fork child processes from daemon as if it dies, the children processes are orphaned.
What is a good approach to achieve that Daemons listens to table, based on column value,forks child processes (all independent of each other) which does the task and goes back to the daemon and dies.
I need to use some locking mechanism if the child processes are accessing shared data and modifying it..

I assume the daemon process you have is also spawned from a python script which called multiprocess with daemon=true.
In this case the daemon is running implies that your creator process is still running, so you can just send it a message via pipes to spawn a new process for you. If your daemon needs to talk with this, use sockets or any ipc method of your choice.

Python - calling multiprocessing.pool inside a daemon

I have a Python script which spawns a daemon process. Inside the process, I am using multiprocessing.pool to run 1 to 4 processes simultaneously.
When I run this outside the daemon process, it works perfectly (i.e., when I set run_from_debugger=True - see code below), but if I run the code via a daemon process, (i.e., run_from_debugger=False), async_function is never executed.
Is it possible to use multiprocessing.pool inside a daemon process???
I am using Python-daemon 1.6 as my daemon package (if it matters).
Code:
def loop_callback(params):
#Spawn the process in the pool
# Because loop_callback is called many times, often faster than async_function executes,
# adding them to a pool allows for parallel execution.
pool.apply_async(async_function, params)
def run_service():
# loop is a method that can/will call loop_callback multiple times, and it will call
# loop_callback faster than the code in asyc_function executes
loop(alignment_watch_folder, sleep_duration)
#Class declaration
app = App()
#Declare a pool of processes
# processes=1 indicates serial execution
pool = Pool(processes=4)
#Either run from a daemon process or not
run_from_debugger = False
#Run the daemon process
if run_from_debugger:
run_service()
else:
daemon_runner = runner.DaemonRunner(app)
daemon_runner.do_action()
Any advice would be greatly appreciated.

Quoting from the documentation of multiprocessing:
daemon
The process’s daemon flag, a Boolean value. This must be set before start() is called.
The initial value is inherited from the creating process.
When a process exits, it attempts to terminate all of its daemonic child processes.
Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children
orphaned if it gets terminated when its parent process exits.
Additionally, these are not Unix daemons or services, they are normal
processes that will be terminated (and not joined) if non-daemonic
processes have exited.
Since multiprocessing.Pool has to create worker processes, you cannot daemonaize a process using it.

Processes sharing queue not terminating properly

I have a multiprocessing application where the parent process creates a queue and passes it to worker processes. All processes use this queue for creating a queuehandler for the purpose of logging. There is a worker process reading from this queue and doing logging.
The worker processes continuously check if parent is alive or not. The problem is that when I kill the parent process from command line, all workers are killed except for one. The logger process also terminates. I don't know why one process keeps executing. Is it because of any locks etc in queue? How to properly exit in this scenario? I am using
sys.exit(0)
for exiting.

I would use sys.exit(0) only if there is no other chance. It's always better to cleanly finish each thread / process. You will have some while loop in your Process. So just do break there, so that it can come to an end.
Tidy up before you leave, i.e., release all handles of external resources, e.g., files, sockets, pipes.
Somewhere in these handles might be the reason for the behavior you see.

python/django spawn background process and avoid zombie process

I need to spawn a background process in django, the view returns immediately, the background process continues make some changes, then update the db. This is done by os.spawnl() function to call a separate .py file.
The problem is after the background process is done, it becames a zombie function [python] <defunct>.
How do I avoid that? I followed this and this example but I still got the child process as zombie after the django render process.
I want to take this chance to practice my *nix process management skills so please do me a favor, don't give me Celery or other mq/async task solutions, and I hate dependencies.

This got to long for a comment-
The wait syscall (which os.wait is a wrapper for) reaps exit codes/pids from dead processes. You will want to os.wait in the process that is a generation above your zombie processes; the parent of the zombies processes. The parent processes will receive a SIGCHLD signal when one of its child processes die. If you insist on doing all of this yourself, you will need to install a signal handler to trap for SIGCHLD and in the signal handler call os.wait. Read some documentation on unix process handling and the Python documentation on the os module as there are variations of the os.wait function that will be non-blocking which maybe helpful.
import signal
signal.signal(signal.SIGCHLD, lambda _x,_y: os.wait())

I had a similar problem. I used active_children() from multiprocessing module.
import multiprocessing
# somewhere in middleware or where appropriate call
active_children()

What is a python thread

I have several questions regarding Python threads.
Is a Python thread a Python or OS implementation?
When I use htop a multi-threaded script has multiple entries - the same memory consumption, the same command but a different PID. Does this mean that a [Python] thread is actually a special kind of process? (I know there is a setting in htop to show these threads as one process - Hide userland threads)
Documentation says:
A thread can be flagged as a “daemon thread”. The significance of this
flag is that the entire Python program exits when only daemon threads
are left.
My interpretation/understanding was: main thread terminates when all non-daemon threads are terminated.
So python daemon threads are not part of Python program if "the entire Python program exits when only daemon threads are left"?

Python threads are implemented using OS threads in all implementations I know (C Python, PyPy and Jython). For each Python thread, there is an underlying OS thread.
Some operating systems (Linux being one of them) show all different threads launched by the same executable in the list of all running processes. This is an implementation detail of the OS, not of Python. On some other operating systems, you may not see those threads when listing all the processes.
The process will terminate when the last non-daemon thread finishes. At that point, all the daemon threads will be terminated. So, those threads are part of your process, but are not preventing it from terminating (while a regular thread will prevent it). That is implemented in pure Python. A process terminates when the system _exit function is called (it will kill all threads), and when the main thread terminates (or sys.exit is called), the Python interpreter checks if there is another non-daemon thread running. If there is none, then it calls _exit, otherwise it waits for the non-daemon threads to finish.
The daemon thread flag is implemented in pure Python by the threading module. When the module is loaded, a Thread object is created to represent the main thread, and it's _exitfunc method is registered as an atexit hook.
The code of this function is:
class _MainThread(Thread):
def _exitfunc(self):
self._Thread__stop()
t = _pickSomeNonDaemonThread()
if t:
if __debug__:
self._note("%s: waiting for other threads", self)
while t:
t.join()
t = _pickSomeNonDaemonThread()
if __debug__:
self._note("%s: exiting", self)
self._Thread__delete()
This function will be called by the Python interpreter when sys.exit is called, or when the main thread terminates. When the function returns, the interpreter will call the system _exit function. And the function will terminate, when there are only daemon threads running (if any).
When the _exit function is called, the OS will terminate all of the process threads, and then terminate the process. The Python runtime will not call the _exit function until all the non-daemon thread are done.
All threads are part of the process.
My interpretation/understanding was: main thread terminates when all
non-daemon threads are terminated.
So python daemon threads are not part of python program if "the entire
Python program exits when only daemon threads are left"?
Your understanding is incorrect. For the OS, a process is composed of many threads, all of which are equal (there is nothing special about the main thread for the OS, except that the C runtime add a call to _exit at the end of the main function). And the OS doesn't know about daemon threads. This is purely a Python concept.
The Python interpreter uses native thread to implement Python thread, but has to remember the list of threads created. And using its atexit hook, it ensures that the _exit function returns to the OS only when the last non-daemon thread terminates. When using "the entire Python program", the documentation refers to the whole process.
The following program can help understand the difference between daemon thread and regular thread:
import sys
import time
import threading
class WorkerThread(threading.Thread):
def run(self):
while True:
print 'Working hard'
time.sleep(0.5)
def main(args):
use_daemon = False
for arg in args:
if arg == '--use_daemon':
use_daemon = True
worker = WorkerThread()
worker.setDaemon(use_daemon)
worker.start()
time.sleep(1)
sys.exit(0)
if __name__ == '__main__':
main(sys.argv[1:])
If you execute this program with the '--use_daemon', you will see that the program will only print a small number of Working hard lines. Without this flag, the program will not terminate even when the main thread finishes, and the program will print Working hard lines until it is killed.

I'm not familiar with the implementation, so let's make an experiment:
import threading
import time
def target():
while True:
print 'Thread working...'
time.sleep(5)
NUM_THREADS = 5
for i in range(NUM_THREADS):
thread = threading.Thread(target=target)
thread.start()
The number of threads reported using ps -o cmd,nlwp <pid> is NUM_THREADS+1 (one more for the main thread), so as long as the OS tools detect the number of threads, they should be OS threads. I tried both with cpython and jython and, despite in jython there are some other threads running, for each extra thread that I add, ps increments the thread count by one.
I'm not sure about htop behaviour, but ps seems to be consistent.
I added the following line before starting the threads:
thread.daemon = True
When I executed the using cpython, the program terminated almost immediately and no process was found using ps, so my guess is that the program terminated together with the threads. In jython the program worked the same way (it didn't terminate), so maybe there are some other threads from the jvm that prevent the program from terminating or daemon threads aren't supported.
Note: I used Ubuntu 11.10 with python 2.7.2+ and jython 2.2.1 on java1.6.0_23

Python threads are practically an interpreter implementation, because the so called global interpreter lock (GIL), even if it's technically using the os-level threading mechanisms. On *nix it's utilizing the pthreads, but the GIL effectivly makes it a hybrid stucked to the application-level threading paradigm. So you will see it on *nix systems multiple times in a ps/top output, but it still behaves (performance-wise) like a software-implemented thread.
No, you are just seeing the kind of underlying thread implementation of your os. This kind of behaviur is exposed by *nix pthread-like threading or im told even windows does implement threads this way.
When your program closes, it waits for all threads to finish also. If you have threads, which could postpone the exit indefinitly, it may be wise to flag those threads as "daemons" and allow your program to finish even if those threads are still running.
Some reference material you might be interested:
Linux Gazette: Understanding Threading in Python.
Doug Hellman: Multi-processing techniques in Python
David Beazley: PyCon 2010:Understanding the Python GIL(Video-presentation)

There are great answers to the question, but I feel the daemon threads question is still not explained in a simple fashion. So this answer refers just to the third question
"main thread terminates when all non-daemon threads are terminated."
So python daemon threads are not part of Python program if "the entire Python program exits when only daemon threads are left"?
If you think about what a daemon is, it is usually a service. Some code that runs in an infinite loop, that serves request, fill queues, accepts connections, etc. Other threads use it. It has no purpose when running by itself (in a single process terms).
So the program can't wait for the daemon thread to terminate, because it might never happen. Python will end the program when all non daemon threads are done. It also stops the daemon threads.
To wait until a daemon thread has completed its work, use the join() method.
daemon_thread.join() will make Python to wait for the daemon thread as well before exiting. The join() also accepts a timeout argument.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.