Safe to call multiprocessing from a thread in Python? - python

According to
https://github.com/joblib/joblib/issues/180, and Is there a safe way to create a subprocess from a thread in python?
the Python multiprocessing module does not allow use from within threads. Is this true?
My understanding is that its fine to fork from threads, as long as you
aren't holding a threading.Lock when you do so (in the current thread? anywhere in the process?). However, Python's documentation is silent on whether threading.Lock objects are safely shared after a fork.
There's also this: locks shared from the logging module causes issues with fork. https://bugs.python.org/issue6721
I'm not sure how this issue arises. It sounds like the state of any locks in the process are copied into the child process when the current thread forks (which seems like a design error and certain to deadlock). If so, does using multiprocessing really provide any protection against this (since I'm free to create my multiprocessing.Pool after threading.Lock is created and entered by other threads, and after threads have started that using the not-fork-safe logging module) -- the multiprocessing module docs are also silent about whether multiprocessing.Pools should be allocated before Locks.
Does replacing threading.Lock with multiprocessing.Lock everywhere avoid this issue and allow us to safely combine threads and forks?

It sounds like the state of any locks in the process are copied into the child process when the current thread forks (which seems like a design error and certain to deadlock).
It is not a design error, rather, fork() predates single-process multithreading. The state of all locks is copied into the child process because they're just objects in memory; the entire address-space of the process is copied as is in fork. There are only bad alternatives: either copy all threads over fork, or deny forking in multithreaded application.
Therefore, fork()ing in a multithreading program was never the safe thing to do, unless then followed by execve() or exit() in the child process.
Does replacing threading.Lock with multiprocessing.Lock everywhere avoid this issue and allow us to safely combine threads and forks?
No. Nothing makes it safe to combine threads and forks, it cannot be done.
The problem is that when you have multiple threads in a process, after fork() system call you cannot continue safely running the program in POSIX systems.
For example, Linux manuals fork(2):
After a fork(2) in a multithreaded program, the child can safely call
only async-signal-safe functions (see signal(7)) until such time as it
calls execve(2).
I.e. it is OK to fork() in a multithreaded program and then only call async-signal-safe C functions (which is a rather limited subset of C functions), until the child process has been replaced with another executable!
Unsafe C function calls in child processes are then for example
malloc for dynamic memory allocation
any <stdio.h> functions for formatted input
most of the pthread_* functions required for thread state handling, including creation of new threads...
Thus there is very little what the child process can actually safely do. Unfortunately CPython core developers have been downplaying the problems caused by this. Even now the documentation says:
Note that safely forking a multithreaded process is
problematic.
Quite an euphemism for "impossible".
It is safe to use multiprocessing from a Python process that has multiple threads of control provided that you're not using the fork start method; in Python 3.4+ it is now possible to change the start method. In previous Python versions including all of Python 2, the POSIX systems always behaved as if fork was specified as the start method; this would result in undefined behaviour.
The problems are not limited to just threading.Lock objects but all locks held by the C standard library, the C extensions etc. What is worse that most of the time people would say "it works for me"... until it stops from working.
There were even a cases where a seemingly single-threading Python program is actually multithreading in MacOS X, causing failures and deadlocks upon using multiprocessing.
Another problem is that all opened file handles, their use, shared sockets might behave oddly in programs that forks, but that would be the case even in single-threaded programs.
TL;DR: using multiprocessing in multithreaded programs, with C extensions, with opened sockets etc:
fine in 3.4+ & POSIX if you explicitly specify a starting method that is not fork,
fine in Windows because it doesn't support forking;
in Python 2 - 3.3 on POSIX: you'll mostly shoot yourself in the foot.

Related

How to manipulate the GIL when many threads need to execute?

My understanding is that the typical GIL manipulations involve, e.g., blocking I/O operations. Hence one would want to release the lock before the I/O operation and reacquire it once it has completed.
I'm currently facing a different scenario with a C extension: I am creating X windows that are exposed to Python via the Canvas class. When the method show() is called on an instance, a new UI thread is started using PyThreads (with a call to PyThread_start_new_thread). This new thread is responsible for drawing on the X window, using the Python code specified in the on_draw method of a subclass of Canvas. A pure C event loop is started in the main thread that simply checks for events on the X window and, for the time being, only captures the WM_DELETE_EVENT.
So I have potentially many threads (one for each X window) that want to execute Python code and the main thread that does not execute any Python code at all.
How do I release/acquire the GIL in order to allow the UI threads to get into the interpreter orderly?
The rule is easy: you need to hold the GIL to access Python machinery (any API starting with Py<...> and any PyObject).
So, you can release it whenever you don't need any of that.
Anything further than this is the fundamental problem of locking granularity: potential benefits vs locking overhead. There was an experiment for Py 1.4 to replace the GIL with more granular locks that failed exactly because the overhead proved prohibitive.
That's why it's typically released for code chunks involving call(s) to extental facilities that can take arbitrary time (especially if they involve waiting for external events) -- if you don't release the lock, Python will be just idling during this time.
Heeding this rule, you will get to your goal automatically: whenever a thread can't proceed further (whether it's I/O, signal from another thread, or even so much as a time.sleep() to avoid a busy loop), it will release the lock and allow other threads to proceed in its stead. The GIL assigning mechanism strives to be fair (see issue8299 for exploration on how fair it is), releasing the programmer from bothering about any bias stemming solely from the engine.
I think the problem stems from the fact that, in my opinion, the official documentation is a bit ambiguous on the meaning of Non-Python created threads. Quoting from it:
When threads are created using the dedicated Python APIs (such as the threading module), a thread state is automatically associated to them and the code showed above is therefore correct. However, when threads are created from C (for example by a third-party library with its own thread management), they don’t hold the GIL, nor is there a thread state structure for them.
I have highlighted in bold the parts that I find off-putting. As I have stated in the OP, I am calling PyThread_start_new_thread. Whilst this creates a new thread from C, this function is not part of a third-party library, but of the dedicated Python (C) APIs. Based on this assumption, I ruled out that I actually needed to use the PyGILState_Ensure/PyGILState_Release paradigm.
As far as I can tell from what I've seen with my experiments, a thread created from C with (just) PyThread_start_new_thread should be considered as a non-Python created thread.

Does python os.fork uses the same python interpreter?

I understand that threads in Python use the same instance of Python interpreter. My question is it the same with process created by os.fork? Or does each process created by os.fork has its own interpreter?
Whenever you fork, the entire Python process is duplicated in memory (including the Python interpreter, your code and any libraries, current stack etc.) to create a second process - one reason why forking a process is much more expensive than creating a thread.
This creates a new copy of the python interpreter.
One advantage of having two python interpreters running is that you now have two GIL's (Global Interpreter Locks), and therefore can have true multi-processing on a multi-core system.
Threads in one process share the same GIL, meaning only one runs at a given moment, giving only the illusion of parallelism.
While fork does indeed create a copy of the current Python interpreter rather than running with the same one, it usually isn't what you want, at least not on its own. Among other problems:
There can be problems forking multi-threaded processes on some platforms. And some libraries (most famously Apple's Cocoa/CoreFoundation) may start threads for you in the background, or use thread-local APIs even though you've only got one thread, etc., without your knowledge.
Some libraries assume that every process will be initialized properly, but if you fork after initialization that isn't true. Most infamously, if you let ssl seed its PRNG in the main process, then fork, you now have potentially predictable random numbers, which is a big hole in your security.
Open file descriptors are inherited (as dups) by the children, with details that vary in annoying ways between platforms.
POSIX only requires platforms to implement a very specific set of syscalls between a fork and an exec. If you never call exec, you can only use those syscalls. Which basically means you can't do anything portably.
Anything to do with signals is especially annoying and nonportable after fork.
See POSIX fork or your platform's manpage for details on these issues.
The right answer is almost always to use multiprocessing, or concurrent.futures (which wraps up multiprocessing), or a similar third-party library.
With 3.4+, you can even specify a start method. The fork method basically just calls fork. The forkserver method runs a single "clean" process (no threads, signal handlers, SSL initialization, etc.) and forks off new children from that. The spawn method calls fork then exec, or an equivalent like posix_spawn, to get you a brand-new interpreter instead of a copy. So you can start off with fork, ut then if there are any problems, switch to forkserver or spawn and nothing else in your code has to change. Which is pretty nice.
os.fork() is equivalent to the fork() syscall in many UNIC(es). So yes your sub-process(es) will be separate from the parent and have a different interpreter (as such).
man fork:
FORK(2)
NAME
fork - create a child process
SYNOPSIS
#include
pid_t fork(void);
DESCRIPTION
fork() creates a new process by duplicating the calling process. The new process, referred to as the child,
is an exact duplicate of the calling process, referred to as the parent, except for the following points:
pydoc os.fork():
os.fork() Fork a child process. Return 0 in the child and the
child’s process id in the parent. If an error occurs OSError is
raised.
Note that some platforms including FreeBSD <= 6.3, Cygwin and OS/2 EMX
have known issues when using fork() from a thread.
See also: Martin Konecny's response as to the why's and advantages of "forking" :)
For brevity; other approaches to concurrency which don't involve a separate process and therefore a separate Python interpreter include:
Green or Lightweight threads; ala greenlet
Coroutines ala Python generators and the new Python 3+ yield from
Async I/O ala asyncio, Twisted, circuits, etc.

Status of mixing multiprocessing and threading in Python

What are best practices or work-arounds for using both multiprocessing and user threads in the same python application in Linux with respect to Issue 6721, Locks in python standard library should be sanitized on fork?
Why do I need both? I use child processes to do heavy computation that produce data structure results that are much too large to return through a queue -- rather they must be immediately stored to disk. It seemed efficient to have each of these child processes monitored by a separate thread, so that when finished, the thread could handle the IO of reading the large (eg multi GB) data back into the process where the result was needed for further computation in combination with the results of other child processes.
The children processes would intermittently hang, which I just (after much head pounding) found was 'caused' by using the logging module. Others have documented the problem here:
https://twiki.cern.ch/twiki/bin/view/Main/PythonLoggingThreadingMultiprocessingIntermixedStudy
which points to this apparently unsolved python issue: Locks in python standard library should be sanitized on fork; http://bugs.python.org/issue6721
Alarmed at the difficulty I had tracking this down, I answered:
Are there any reasons not to mix Multiprocessing and Threading module in Python
with the rather unhelpful suggestion to 'Be careful' and links to the above.
But the lengthy discussion re: Issue 6721 suggests that it is a 'bug' to use both multiprocessing (or os.fork) and user threads in the same application. With my limited understanding of the problem, I find too much disagreement in the discussion to conclude what are the work-arounds or strategies for using both multiprocessing and threading in the same application. My immediate problem was solved by disabling logging, but I create a small handful of other (explicit) locks in both parent and child processes, and suspect I am setting myself up for further intermittent deadlocks.
Can you give practical recommendations to avoid deadlocks while using locks and/or the logging module while using threading and multiprocessing in a python (2.7,3.2,3.3) application?
You will be safe if you fork off additional processes while you still have only one thread in your program (that is, fork from main thread, before spawning worker threads).
Your use case looks like you don't even need multiprocessing module; you can use subprocess (or even simpler os.system-like calls).
See also Is it safe to fork from within a thread?

Python threads all executing on a single core

I have a Python program that spawns many threads, runs 4 at a time, and each performs an expensive operation. Pseudocode:
for object in list:
t = Thread(target=process, args=(object))
# if fewer than 4 threads are currently running, t.start(). Otherwise, add t to queue
But when the program is run, Activity Monitor in OS X shows that 1 of the 4 logical cores is at 100% and the others are at nearly 0. Obviously I can't force the OS to do anything but I've never had to pay attention to performance in multi-threaded code like this before so I was wondering if I'm just missing or misunderstanding something.
Thanks.
Note that in many cases (and virtually all cases where your "expensive operation" is a calculation implemented in Python), multiple threads will not actually run concurrently due to Python's Global Interpreter Lock (GIL).
The GIL is an interpreter-level lock.
This lock prevents execution of
multiple threads at once in the Python
interpreter. Each thread that wants to
run must wait for the GIL to be
released by the other thread, which
means your multi-threaded Python
application is essentially single
threaded, right? Yes. Not exactly.
Sort of.
CPython uses what’s called “operating
system” threads under the covers,
which is to say each time a request to
make a new thread is made, the
interpreter actually calls into the
operating system’s libraries and
kernel to generate a new thread. This
is the same as Java, for example. So
in memory you really do have multiple
threads and normally the operating
system controls which thread is
scheduled to run. On a multiple
processor machine, this means you
could have many threads spread across
multiple processors, all happily
chugging away doing work.
However, while CPython does use
operating system threads (in theory
allowing multiple threads to execute
within the interpreter
simultaneously), the interpreter also
forces the GIL to be acquired by a
thread before it can access the
interpreter and stack and can modify
Python objects in memory all
willy-nilly. The latter point is why
the GIL exists: The GIL prevents
simultaneous access to Python objects
by multiple threads. But this does not
save you (as illustrated by the Bank
example) from being a lock-sensitive
creature; you don’t get a free ride.
The GIL is there to protect the
interpreters memory, not your sanity.
See the Global Interpreter Lock section of Jesse Noller's post for more details.
To get around this problem, check out Python's multiprocessing module.
multiple processes (with judicious use
of IPC) are[...] a much better
approach to writing apps for multi-CPU
boxes than threads.
-- Guido van Rossum (creator of Python)
Edit based on a comment from #spinkus:
If Python can't run multiple threads simultaneously, then why have threading at all?
Threads can still be very useful in Python when doing simultaneous operations that do not need to modify the interpreter's state. This includes many (most?) long-running function calls that are not in-Python calculations, such as I/O (file access or network requests)) and [calculations on Numpy arrays][6]. These operations release the GIL while waiting for a result, allowing the program to continue executing. Then, once the result is received, the thread must re-acquire the GIL in order to use that result in "Python-land"
Python has a Global Interpreter Lock, which can prevent threads of interpreted code from being processed concurrently.
http://en.wikipedia.org/wiki/Global_Interpreter_Lock
http://wiki.python.org/moin/GlobalInterpreterLock
For ways to get around this, try the multiprocessing module, as advised here:
Does running separate python processes avoid the GIL?
AFAIK, in CPython the Global Interpreter Lock means that there can't be more than one block of Python code being run at any one time. Although this does not really affect anything in a single processor/single-core machine, on a mulitcore machine it means you have effectively only one thread running at any one time - causing all the other core to be idle.

Are Python extensions produced by Cython/Pyrex threadsafe?

If not, is there a way I can guarantee thread safety by programming a certain way?
To clarify, when talking about "threadsafe,' I mean Python threads, not OS-level threads.
It all depends on the interaction between your Cython code and Python's GIL, as documented in detail here. If you don't do anything special, Cython-generated code will respect the GIL (as will a C-coded extension that doesn't use the GIL-releasing macros); that makes such code "as threadsafe as Python code" -- which isn't much, but is easier to handle than completely free-threading code (you still need to architect multi-threaded cooperation and synchronization, ideally with Queue instances but possibly with locking &c).
Code that has relinquished the GIL and not yet acquired it back MUST NOT in any way interact with the Python runtime and the objects that the Python runtime uses -- this goes for Cython just as well as for C-coded extensions. The upside of it is of course that such code can run on a separate core (until it needs to sync up or in any way communicate with the Python runtime again, of course).
Python's global interpreter lock means that only one thread can be active in the interpreter at any one time. However, once control is passed out to a C extension another thread can be active within the interpreter. Multiple threads can be created, and nothing prevents a thread from being interrupted within the middle of a critical section. N
on thread-safe code can be implemented within the interpreter, so nothing about code running within the interpreter is inherently thread safe. Code in C or Pyrex modules can still modify data structures that are visible to python code. Native code can, of course, also have threading issues with native data structures.
You can't guarantee thread safety beyond using appropriate design and synchronisation - the GIL on the python interpreter doesn't materially change this.

Categories