How to track all descendant processes in Linux - python

I am making a library that needs to spawn multiple processes.
I want to be able to know the set of all descendant processes that were spawned during a test. This is useful for terminating well-behaved daemons at the end of a passed test or for debugging deadlocks/hanging processes by getting the stack trace of any processes present after a failing test.
Since some of this requires spawning daemons (fork, fork, then let the parent die), we cannot find all processes by iterating over the process tree.
Currently my approach is:
Register a handler using os.register_at_fork.
On fork, in the child, flock a file and append (pid, process start time) into another file.
Then, when required, we can get the set of child processes by iterating over the entries in the file and keeping the ones where (pid, process start time) matches an existing process.
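A minimal sketch of this bookkeeping (simplified: it locks and appends to a single file, and the path and helper names are illustrative rather than my actual implementation):

import fcntl
import os

PID_LOG = "/tmp/tracked_pids.txt"   # illustrative path

def start_time(pid):
    # Field 22 of /proc/<pid>/stat is the process start time in clock ticks
    # since boot; (pid, start time) disambiguates recycled PIDs.
    with open("/proc/%d/stat" % pid) as f:
        return int(f.read().rsplit(")", 1)[1].split()[19])

def record_child():
    # Runs in the child immediately after fork().
    pid = os.getpid()
    with open(PID_LOG, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)   # serialise concurrent appenders
        f.write("%d %d\n" % (pid, start_time(pid)))

os.register_at_fork(after_in_child=record_child)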
The downsides of this approach are:
Only works with multiprocessing or os.fork - does not work when spawning a new Python process using subprocess or a non-Python process.
Locking around the fork may make things more deterministic during tests than they will be in reality, hiding race conditions.
I am looking for a different way to track child processes that avoids these 2 downsides.
Alternatives I have considered:
Using bcc to register probes of fork/clone - the problem with this is that it requires root, which I think would be kind of annoying for running tests from a contributor point-of-view. Is there something similar that can be done as an unprivileged user just for the current process and descendants?
Using strace (or ptrace) similar to above - the problem with this is the performance impact. Several of the tests are specifically benchmarking startup time and ptrace has a relatively large overhead. Maybe it would be less so if only tracking fork and clone, but it still conflicts with the desire to get the stacks on test timeout.
Can someone suggest an approach to this problem that avoids the pitfalls and downsides of the ones above? I am only interested in Linux right now, and ideally it shouldn't require a kernel later than 4.15.

For subprocess.Popen, there's the preexec_fn argument, which takes a callable -- you can hack your way through it.
Alternatively, take a look at cgroups (control groups) -- I believe they can handle tricky situations such as daemon creation and so forth.
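A rough sketch of what the cgroup idea could look like (assuming a cgroup-v2 layout and a delegated directory the test user may write to; both are assumptions, not part of the suggestion above). The kernel then does the bookkeeping, daemons included, since every descendant stays in the cgroup until it is explicitly moved out:

import os

CGROUP = "/sys/fs/cgroup/test-tracker"   # assumed delegated cgroup directory

def enter_cgroup():
    # Move the current process into the cgroup; every future descendant,
    # including double-forked daemons, inherits the membership.
    os.makedirs(CGROUP, exist_ok=True)
    with open(os.path.join(CGROUP, "cgroup.procs"), "w") as f:
        f.write(str(os.getpid()))

def tracked_pids():
    # All PIDs still alive inside the cgroup (the current process included).
    with open(os.path.join(CGROUP, "cgroup.procs")) as f:
        return [int(line) for line in f]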

Given the constraints from my original post, I used the following approach:
putenv("PID_DIR", <some tempdir>)
For the current process, override fork and clone with versions that record the process start time in $PID_DIR/<pid>. The override is done using plthook and applies to all loaded shared objects. dlopen should also be overridden so that the functions are also overridden in any other dynamically loaded libraries.
Set a library with implementations of __libc_start_main, fork, and clone as LD_PRELOAD.
An initial implementation is available here, used like:
import process_tracker; process_tracker.install()
import os
pid1 = os.fork()
pid2 = os.fork()
pid3 = os.fork()
if pid1 and pid2 and pid3:
    print(process_tracker.children())

Related

Safe to call multiprocessing from a thread in Python?

According to https://github.com/joblib/joblib/issues/180 and Is there a safe way to create a subprocess from a thread in python?, the Python multiprocessing module does not allow use from within threads. Is this true?
My understanding is that it's fine to fork from threads, as long as you aren't holding a threading.Lock when you do so (in the current thread? anywhere in the process?). However, Python's documentation is silent on whether threading.Lock objects are safely shared after a fork.
There's also this: locks shared from the logging module cause issues with fork. https://bugs.python.org/issue6721
I'm not sure how this issue arises. It sounds like the state of any locks in the process is copied into the child process when the current thread forks (which seems like a design error and certain to deadlock). If so, does using multiprocessing really provide any protection against this, since I'm free to create my multiprocessing.Pool after threading.Lock objects have been created and entered by other threads, and after threads have started that use the not-fork-safe logging module? The multiprocessing module docs are also silent about whether multiprocessing.Pools should be allocated before Locks.
Does replacing threading.Lock with multiprocessing.Lock everywhere avoid this issue and allow us to safely combine threads and forks?
It sounds like the state of any locks in the process is copied into the child process when the current thread forks (which seems like a design error and certain to deadlock).
It is not a design error; rather, fork() predates single-process multithreading. The state of all locks is copied into the child process because they're just objects in memory; the entire address space of the process is copied as-is by fork. There are only bad alternatives: either copy all threads over fork, or deny forking in multithreaded applications.
Therefore, fork()ing in a multithreaded program was never a safe thing to do, unless it was then followed by execve() or exit() in the child process.
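A tiny repro of that failure mode on the Python side (illustrative only; running it deliberately leaves the child stuck):

import os
import threading
import time

lock = threading.Lock()

def hold_lock():
    with lock:
        time.sleep(5)              # lock is held while the main thread forks

threading.Thread(target=hold_lock).start()
time.sleep(0.5)                    # make sure the worker thread holds the lock

pid = os.fork()
if pid == 0:
    # The child inherits the lock in its locked state, but not the worker
    # thread that would release it, so this acquire() blocks forever.
    lock.acquire()
    os._exit(0)
else:
    print("parent continues; child", pid, "is deadlocked on the copied lock")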
Does replacing threading.Lock with multiprocessing.Lock everywhere avoid this issue and allow us to safely combine threads and forks?
No. Nothing makes it safe to combine threads and forks; it cannot be done.
The problem is that when you have multiple threads in a process, after the fork() system call you cannot safely continue running the program on POSIX systems.
For example, the Linux manual page fork(2) says:
After a fork(2) in a multithreaded program, the child can safely call only async-signal-safe functions (see signal(7)) until such time as it calls execve(2).
I.e. it is OK to fork() in a multithreaded program and then only call async-signal-safe C functions (which is a rather limited subset of C functions), until the child process has been replaced with another executable!
Unsafe C function calls in the child process then include, for example:
malloc for dynamic memory allocation
any <stdio.h> functions for formatted input/output
most of the pthread_* functions required for thread state handling, including creation of new threads...
Thus there is very little that the child process can actually safely do. Unfortunately, CPython core developers have been downplaying the problems caused by this. Even now the documentation says:
Note that safely forking a multithreaded process is problematic.
Quite a euphemism for "impossible".
It is safe to use multiprocessing from a Python process that has multiple threads of control provided that you're not using the fork start method; in Python 3.4+ it is possible to change the start method. In previous Python versions, including all of Python 2, POSIX systems always behaved as if fork had been specified as the start method; this would result in undefined behaviour.
The problems are not limited to just threading.Lock objects but include all locks held by the C standard library, by C extensions, etc. What is worse, most of the time people will say "it works for me"... until it stops working.
There have even been cases where a seemingly single-threaded Python program is actually multithreaded on Mac OS X, causing failures and deadlocks upon using multiprocessing.
Another problem is that open file handles, their use, and shared sockets might behave oddly in programs that fork, but that would be the case even in single-threaded programs.
TL;DR: using multiprocessing in multithreaded programs, with C extensions, with open sockets, etc.:
fine in 3.4+ on POSIX if you explicitly specify a start method that is not fork;
fine on Windows because it doesn't support forking;
in Python 2 through 3.3 on POSIX: you'll mostly shoot yourself in the foot.
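For reference, a minimal sketch of the 3.4+ escape hatch: either set the default start method once per program, or ask for an explicit context (both are standard multiprocessing APIs; the worker function here is just an example):

import multiprocessing as mp

def work(x):
    return x * x

if __name__ == "__main__":
    mp.set_start_method("spawn")        # or "forkserver"; may be called only once
    ctx = mp.get_context("spawn")       # alternative: an explicit context that
    with ctx.Pool(2) as pool:           # leaves the global default untouched
        print(pool.map(work, range(4)))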

Does python os.fork uses the same python interpreter?

I understand that threads in Python use the same instance of the Python interpreter. My question is: is it the same with processes created by os.fork? Or does each process created by os.fork have its own interpreter?
Whenever you fork, the entire Python process is duplicated in memory (including the Python interpreter, your code and any libraries, the current stack, etc.) to create a second process - one reason why forking a process is much more expensive than creating a thread.
This creates a new copy of the Python interpreter.
One advantage of having two Python interpreters running is that you now have two GILs (Global Interpreter Locks), and therefore can have true multi-processing on a multi-core system.
Threads in one process share the same GIL, meaning only one runs at a given moment, giving only the illusion of parallelism.
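A tiny illustration: both copies continue from the same os.fork() call, each with its own interpreter state from that point on.

import os

pid = os.fork()                    # duplicate the process, interpreter and all
if pid == 0:
    print("child %d has its own copy of the interpreter" % os.getpid())
    os._exit(0)                    # leave without running the parent's cleanup
else:
    os.waitpid(pid, 0)
    print("parent %d saw child %d exit" % (os.getpid(), pid))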
While fork does indeed create a copy of the current Python interpreter rather than running with the same one, it usually isn't what you want, at least not on its own. Among other problems:
There can be problems forking multi-threaded processes on some platforms. And some libraries (most famously Apple's Cocoa/CoreFoundation) may start threads for you in the background, or use thread-local APIs even though you've only got one thread, etc., without your knowledge.
Some libraries assume that every process will be initialized properly, but if you fork after initialization that isn't true. Most infamously, if you let ssl seed its PRNG in the main process, then fork, you now have potentially predictable random numbers, which is a big hole in your security.
Open file descriptors are inherited (as dups) by the children, with details that vary in annoying ways between platforms.
POSIX only requires platforms to implement a very specific set of syscalls between a fork and an exec. If you never call exec, you can only use those syscalls. Which basically means you can't do anything portably.
Anything to do with signals is especially annoying and nonportable after fork.
See POSIX fork or your platform's manpage for details on these issues.
The right answer is almost always to use multiprocessing, or concurrent.futures (which wraps up multiprocessing), or a similar third-party library.
With 3.4+, you can even specify a start method. The fork method basically just calls fork. The forkserver method runs a single "clean" process (no threads, signal handlers, SSL initialization, etc.) and forks off new children from that. The spawn method calls fork then exec, or an equivalent like posix_spawn, to get you a brand-new interpreter instead of a copy. So you can start off with fork, but then if there are any problems, switch to forkserver or spawn and nothing else in your code has to change. Which is pretty nice.
os.fork() is equivalent to the fork() syscall in many UNIX(es). So yes, your subprocess(es) will be separate from the parent and have their own interpreter (as such).
man fork:
FORK(2)
NAME
fork - create a child process
SYNOPSIS
#include <unistd.h>
pid_t fork(void);
DESCRIPTION
fork() creates a new process by duplicating the calling process. The new process, referred to as the child,
is an exact duplicate of the calling process, referred to as the parent, except for the following points:
pydoc os.fork():
os.fork() Fork a child process. Return 0 in the child and the
child’s process id in the parent. If an error occurs OSError is
raised.
Note that some platforms including FreeBSD <= 6.3, Cygwin and OS/2 EMX
have known issues when using fork() from a thread.
See also: Martin Konecny's response as to the why's and advantages of "forking" :)
For brevity, other approaches to concurrency which don't involve a separate process and therefore a separate Python interpreter include:
Green or lightweight threads, à la greenlet
Coroutines, à la Python generators and the newer Python 3 yield from
Async I/O, à la asyncio, Twisted, circuits, etc.
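As a quick taste of the last item, a minimal asyncio sketch (using the later async/await syntax rather than yield from): concurrency in a single process and a single interpreter.

import asyncio

async def task(i):
    await asyncio.sleep(0.1)           # stands in for real non-blocking I/O
    return i * i

async def main():
    print(await asyncio.gather(*(task(i) for i in range(3))))

asyncio.run(main())                    # one process, one thread, no fork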

How to refactor Python code dependant on fork() copying state

I'm working on a large-ish Python code base that's been around for over a decade now. The application in question makes use of forking for its parallelism.
The basic premise is that the user asks the program to build a particular target, we figure out a dependency graph for the target, then from topological partitions in the build graph figure out some tasks we can perform in parallel. We then fork some processes to perform those tasks (from the partition) in parallel.
It all kind of works. However, I'd like to refactor it NOT to depend on fork(). In particular, it's the dependency on state from the master process being available in the child processes that's a problem.
There are a couple of motivating factors for the refactoring:
I'd like to have the code as similar as possible between Linux and Windows (currently on Windows we perform a non-forking build, thus no parallelism).
The forking is a little ugly with respect to other refactoring I want to do (basically, I'd like to have more centralised control and monitoring of building). Instead of forking, I'd like to go through the Python multiprocessing module (which I've used in the past with good results).
The problem is that quite a lot of the data structures currently used by the forked processes (which were set up by the master process) cannot easily be serialized (nor can they be reconstructed by the child process). Open file descriptors are one such example; dependency on object identity (the build graph) is another.
Basically, I'm looking for advice on how to best approach this problem holistically.
I propose the following paradigm:
The master is a single process and does all the dependency resolution, graph partitioning, etc., down to single, individual jobs. Thus there is only one copy of the system state.
These leaf jobs are offloaded using subprocess or multiprocessing or os.system.
The simpler offload mechanism, the more platform independence :)
Leaves are of course asynchronous, so you need a framework for handling asynchronous notifications -- you can use gevent or a library that implements futures. If you are truly hardcore, Twisted. Python 3.x also brings asyncio, which may be useful.
You can also use a resource/executor pool with ad hoc notifications, e.g. a post-order traversal, which I think can be implemented relatively simply using a recursive function, or in your case, a recursive generator.
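A rough sketch of that shape, assuming each leaf job can be reduced to a small picklable description (all names below are illustrative, not from the original code base):

import multiprocessing as mp

def run_leaf_job(job):
    # `job` is a plain dict describing one task (command, inputs, outputs).
    # Workers re-open files and re-derive state instead of inheriting it.
    return ("done", job["target"])

def build(partitions):
    # `partitions`: lists of job dicts in topological order; the master keeps
    # the full build graph and only ships these small descriptions out.
    with mp.Pool() as pool:
        for batch in partitions:
            for status, target in pool.map(run_leaf_job, batch):
                print(status, target)      # centralised monitoring lives here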

Status of mixing multiprocessing and threading in Python

What are best practices or work-arounds for using both multiprocessing and user threads in the same python application in Linux with respect to Issue 6721, Locks in python standard library should be sanitized on fork?
Why do I need both? I use child processes to do heavy computation that produces data structure results that are much too large to return through a queue -- rather, they must be immediately stored to disk. It seemed efficient to have each of these child processes monitored by a separate thread, so that when finished, the thread could handle the I/O of reading the large (e.g. multi-GB) data back into the process where the result was needed for further computation in combination with the results of other child processes.
The child processes would intermittently hang, which I just (after much head pounding) found was 'caused' by using the logging module. Others have documented the problem here:
https://twiki.cern.ch/twiki/bin/view/Main/PythonLoggingThreadingMultiprocessingIntermixedStudy
which points to this apparently unsolved python issue: Locks in python standard library should be sanitized on fork; http://bugs.python.org/issue6721
Alarmed at the difficulty I had tracking this down, I answered:
Are there any reasons not to mix Multiprocessing and Threading module in Python
with the rather unhelpful suggestion to 'Be careful' and links to the above.
But the lengthy discussion re: Issue 6721 suggests that it is a 'bug' to use both multiprocessing (or os.fork) and user threads in the same application. With my limited understanding of the problem, I find too much disagreement in the discussion to conclude what the work-arounds or strategies are for using both multiprocessing and threading in the same application. My immediate problem was solved by disabling logging, but I create a small handful of other (explicit) locks in both parent and child processes, and suspect I am setting myself up for further intermittent deadlocks.
Can you give practical recommendations to avoid deadlocks while using locks and/or the logging module while using threading and multiprocessing in a python (2.7,3.2,3.3) application?
You will be safe if you fork off additional processes while you still have only one thread in your program (that is, fork from the main thread, before spawning worker threads).
Your use case looks like you don't even need multiprocessing module; you can use subprocess (or even simpler os.system-like calls).
See also Is it safe to fork from within a thread?
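A sketch of the ordering that recommendation implies for the use case above: create the process pool while the program is still single-threaded, then start the monitoring threads, which only talk to the already-forked pool (function names are illustrative):

import multiprocessing as mp
import threading

def heavy_compute(path):
    # ... produce a multi-GB result and write it to disk ...
    return path + ".result"

def reader(async_result):
    result_path = async_result.get()    # thread-side I/O once the child is done
    print("loading", result_path)

if __name__ == "__main__":
    pool = mp.Pool(4)                   # fork happens here, before any threads
    pending = [pool.apply_async(heavy_compute, (p,)) for p in ["a", "b"]]
    threads = [threading.Thread(target=reader, args=(r,)) for r in pending]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    pool.close()
    pool.join()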

How to "signal" interested child processes (without signals)?

I'm trying to find a good and simple method to signal child processes (created through SocketServer with ForkingMixIn) from the parent process.
While Unix signals could be used, I want to avoid them since only children who are interested should receive the signal, and it would be overkill and complicated to require some kind of registration mechanism to identify to the parent process who is interested.
(Please don't suggest threads, as this particular program won't work with threads, and thus has to use forks.)
Since you are on a Unix system, semaphores should be the easy answer. Unfortunately, Python does not seem to offer a way to call the semop system call.
If you are using Python 2.6, you may be able to use the multiprocessing module's Condition class.
I have come up with the idea of using a pipe file descriptor that the parent could write to and then read/flush, in combination with select, but this doesn't really qualify as a very elegant design.
In more detail: the parent would create a pipe, the subprocesses would inherit it, the parent process would write to the pipe, thereby waking up any subprocess select():ing on the file descriptor, but the parent would then immediately read from the read end of the pipe and empty it - the only effect being that those child processes that were select():ing on the pipe have been woken up.
As I said, this feels odd and ugly, but I haven't found anything really better yet.
Edit:
It turns out that this doesn't work - some child processes are woken up and some aren't. I've resorted to using a Condition from the multiprocessing module.
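For completeness, a rough sketch of the multiprocessing Condition approach this ended up using (standalone and simplified; the real program forks via SocketServer's ForkingMixIn rather than multiprocessing.Process):

import multiprocessing
import os
import time

cond = multiprocessing.Condition()

def child():
    with cond:
        cond.wait()                 # only children that chose to wait block here
    print("child %d was signalled" % os.getpid())

if __name__ == "__main__":
    kids = [multiprocessing.Process(target=child) for _ in range(3)]
    for k in kids:
        k.start()
    time.sleep(0.5)                 # crude: let the children reach cond.wait()
    with cond:
        cond.notify_all()           # wake every interested child at once
    for k in kids:
        k.join()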
