Is Malloc safe to use with nogil? - python

Is it safe to allocate memory with malloc with nogil in cython?
Also is it safe to pass pointers with you have multithreaded program running with nogil?

The GIL is in place because CPythons memory management is not thread-safe. As a consequence, you can use nogil in cases where you do not interact with a Python Object, i.e with memory that is handled by Python.
This is mentioned in the documentation for releasing the GIL:
Code in the body of the statement must not manipulate Python objects in any way, and must not call anything that manipulates Python objects without first re-acquiring the GIL. Cython currently does not check this.
So, using malloc, passing pointers and doing anything else that is legal in C is perfectly safe as long as no Python Objects are involved.

Related

Wrapping C++ with Python - Returning References

I'm looking to wrap a small C++ library for use in Python. I've briefly read up on the Python C-API, ctypes, Cython, SWIG, Boost.Python, and CLIF. Which framework (or other) should I use given my specific use-case described below?
Context: I have a multi-process program that does communication via shared memory. I've written Writer and Reader classes to interact with the shared memory and handle synchronization. I have multiple C++ programs working together via the shared memory and want to add a Python program to the party. Its behavior in interacting with the shared memory should be the same as the C++ programs.
Considerations:
(main) The data packets stored in the shared memory are large, so I want to avoid a copy when returning to Python. Specifically, the Reader::read() method returns a const T& reference so the client can directly read from it. I'd like to preserve this behavior in the Python class.
Would like to directly expose the Writer() and Reader() classes as Python classes without re-implementing logic.
I'm only exposing these two classes, not an entire library. I don't mind some upfront work if it means more maintainability.
All things being equal, I'd like to minimize external dependencies. That said, I don't mind dependencies if it leads to a better outcome.

Is it thread safe to modify a static variable?

Since C++11, static variable initialization is guaranteed to be thread safe. But how about modifying a static variable in multiple threads? like below
static int initialized = 0;
Initialize()
{
if (initialized)
return;
initialized = 1; // Is this thread safe?
}
The reason I ask this question is that I am reading the source code for
Py_Initialize(), I am trying to embed Python in a multithreaded C++ application, I am wondering if it is safe to call Py_Initialize() multiple times in several threads? The implementation of Py_Initialize() boils down to
function _Py_InitializeEx_Private, which is like below
// pylifecycle.c
static int initialized = 0;
_Py_InitializeEx_Private(int install_sigs, int install_importlib)
{
if (initialized)
return;
initialized = 1;
// a bunch of other stuff
}
And is the conclusion for C the same as C++?
EDIT
So all the answers are good, I chose the one which clears my head most.
No, static in this context is only about the storage duration (see http://en.cppreference.com/w/c/language/static_storage_duration).
The variable has no extra thread safety at all over some other variable.
Try using std::call_once for this, see http://en.cppreference.com/w/cpp/thread/call_once
It's not thread safe to modify a static variable, but initializing a static variable is thread safe. So you can do:
void my_py_init() {
static bool x = (Py_Initialize(), true);
}
That's it. You can now call my_py_init from as many threads as you want and Py_Initialize will only ever get called once.
Py_Initialize is not thread-safe. You can call it from multiple threads only if you know that the Python interpreter has already been initialized, but if you can prove that it would be silly to call the function.
Indeed, most Python C-API calls are not thread-safe; you need to acquire the Global Interpreter Lock (GIL) in order to interact with the Python interpreter. (See the Python C-API docs for more details. Read it carefully.)
However, as far as I know you cannot use the standard API to acquire the GIL until the interpreter has been initialized. So if you have multiple threads, any of which might initialize the same Python interpreter, you would need to protect the calls to Py_Initialize with your own mutex. You might well be better off doing the initialization once before you start up any threads, if that is possible with your program logic.
The code you cite:
static int initialized = 0;
void Initialize_If_Necessary()
{
if (initialized)
return;
initialized = 1;
// Do the initialization only once
}
is clearly not threadsafe in any language, even if initialized is an atomic type. Suppose two threads were simultaneously executing this code before any initialization happened: both of them see initialized as false, so both of them proceed with the initialization. (If you don't have two cores, you could imagine that the first process is task switched between the test of initialized and the assignment.)
Modifying a static variable across multiple threads is not safe, since if the variable is put into a register, then other cores' information in the same registers will be different (modifying the variable in another thread would be the same as attempting to access that core's version of the register, which contains completely different data).
The first code sample is the typical starting point for what is referred to as 'lazy-initialisation'. It's useful for guaranteeing once-only initialisation of "expensive objects"; but doing so only if needed just before any use of the object.
That specific example doesn't have any serious problems, but it's an oversimplification. And when you look more holistically at lazy-initialisation, you'll see that multi-threaded lazy-initialisation is not a good idea.
The concept of "Thread Safety" goes way beyond just a single variable (static or otherwise). You need to step back and consider things happening to the same1 resources (memory, objects, files, ...) at the same time.
1: Different instances of the same class are not the same thing; but their static members are.
Consider the following extract from your second example.
if (initialized)
return;
initialized = 1;
// a bunch of other stuff
In the first 3 lines, there's no serious harm if multiple threads execute that code approximately concurrently. Some threads might return early; others might be a little "too quick" and all perform the task of setting initialized = 1;. However, that wouldn't be a concern, since no matter how many threads set the shared variable, the net effect is always the same.
The problem comes in with the fourth line. The one almost nonchalantly brushed aside as "a bunch of other stuff". That "other stuff" is the really critical code, because if it's possible for initialized = 1; to be called multiple times, you need to consider the impact of calling "other stuff" multiple times and concurrently.
Now, in the unlikely event you satisfy yourself that "other stuff" can be called multiple times, there's another concern...
Consider the client code that might be using Python.
Py_Initialize();
//use Python
If 2 threads call the above simultaneously; with 1 'returning early' and the other actually performing the initialisation. Then the 'early-returning thread' would start (or try to start) using Python before it's fully initialised!
As a bit of a hack, you might try blocking at the if (initialized) line for the duration of the initialisation process. But this is undesirable for 2 reasons:
Multiple threads are likely to be stuck waiting in the early stages of their processing.
Even after initialisation is complete you'd have a small (but totally wasteful) overhead of checking the lock each time you 'lazy-initialise' the Python framework.
Conclusion
Lazy-initialisation has its uses. But you're much better off not trying to perform the lazy initialisation from multiple threads. Rather have a "safe thread" (main thread is usually good enough) that can perform the lazy-initialisation before even creating any threads that would try to use whatever has been initialised. Then you won't have to worry about the thread-safety at all.

Multithreaded Access to Python bytearray

It seems that since access to NumPy array data doesn't require calls into the Python interpreter, C extensions can manipulate these arrays after releasing the GIL. For instance, in this thread.
The built-in Python type bytearray supports the Buffer Protocol, one member of which is
void *buf
A pointer to the start of the logical structure described by the
buffer fields. [...]
For contiguous arrays, the value points to the beginning of the memory
block.
My question is, can a C extension manipulate this buf after releasing the GIL (Py_BEGIN_ALLOW_THREADS) since accessing it no longer requires calls to the Python C API? Or does the nature of the Python garbage collector forbid this, since the bytearray, and its buf, might be moved during execution?
To clarify the short answer written as comment: you can access the *buf data without holding the GIL, provided you are sure that the Py_buffer struct is "owned" by the thread while it is running without the GIL.
For the sake of completeness, I should add that this may open the door to (very remote) crashes risks: if the GIL-less thread reads the data at *buf while at the same time another GIL-holding thread is running Python code that changes the same data (bytearray[index]=x) then the GIL-less thread can see unexpected changes of the data under its feet. The opposite is true too, and even more annoying (but still theoretical): if the GIL-less thread changes the data at *buf, then other GIL-holding, Python-running threads might see strange results or even maybe crashes if doing some complex reading operations like bytearray.split().

Calling Python code from a C thread

I'm very confused as to how exactly I can ensure thread-safety when calling Python code from a C (or C++) thread.
The Python documentation seems to be saying that the usual idiom to do so is:
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */
/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);
And indeed, this stackoverflow answer seems to confirm as much. But a commenter (with a very high reputation) says otherwise. The commenter says you should use PyEval_RestoreThread()/PyEval_SaveThread().
The docs seem to confirm this:
PyThreadState* PyEval_SaveThread()
Release the global interpreter lock (if it has been created and
thread support is enabled) and reset the thread state to NULL,
returning the previous thread state (which is not NULL). If the lock
has been created, the current thread must have acquired it. (This
function is available even when thread support is disabled at compile
time.)
void PyEval_RestoreThread(PyThreadState *tstate)
Acquire the global interpreter lock (if it has been created and thread
support is enabled) and set the thread state to tstate, which must not
be NULL. If the lock has been created, the current thread must not have
acquired it, otherwise deadlock ensues. (This function is available even
when thread support is disabled at compile time.)
The way the docs describe this, it seems that PyEval_RestoreThread()/PyEval_SaveThread() is basically a mutex lock/unlock idiom. So it would make sense that before calling any Python code from C, you first need to lock the GIL, and then unlock it.
So which is it? When calling Python code from C, should I use:
PyGILState_Ensure()/PyGILState_Release()
or
PyEval_RestoreThread/PyEval_SaveThread?
And what is really the difference?
First, you almost never want to call PyEval_RestoreThread/PyEval_SaveThread. Instead, you want to call the wrapper macros Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS. The documentation is written for those macros, which is why you couldn't find it.
Anyway, either way, you don't use the thread functions/macros to acquire the GIL; you use them to temporarily release the GIL when you've acquired it.
So, why would you ever want to do this? Well, in simple cases you don't; you just need Ensure/Release. But sometimes you need to hold onto your Python thread state until later, but don't need to hold onto the GIL (or even explicitly need to not hold onto the GIL, to allow some other thread to progress so it can signal you). As the docs explain, the most common reasons for this are doing file I/O or extensive CPU-bound computation.
Finally, is there any case where you want to call the functions instead of the macros? Yes, if you want access to the stashed PyThreadState. If you can't think of a reason why you might want that, you probably don't have one.

What are the implications of calling NumPy's C API functions from multiple threads?

This is risky business, and I understand the Global Interpreter Lock to be a formidable foe of parallelism. However, if I'm using NumPy's C API (specifically the PyArray_DATA macro on a NumPy array), are there potential consequences to invoking it from multiple concurrent threads?
Note that I will still own the GIL and not be releasing it with NumPy's threading support. Also, even if NumPy makes no guarantees about thread safety but PyArray_DATA is thread-safe in practice, that's good enough for me.
I'm running Python 2.6.6 with NumPy 1.3.0 on Linux.
Answering my own question here, but after poking into the source code for NumPy 1.3.0, I believe the answer is: Yes, PyArray_DATA is thread-safe.
PyArray_DATA is defined in
ndarrayobject.h:
#define PyArray_DATA(obj) ((void *)(((PyArrayObject *)(obj))->data))
The PyArrayObject struct type is
defined in the same file; the field
of interest is:
char *data;
So now, the question is whether accessing data from multiple threads is safe or not.
Creating a new NumPy array from scratch (i.e., not deriving it from an existing data structure) passes a NULL data pointer to PyArray_NewFromDescr, defined in arrayobject.c.
This causes PyArray_NewFromDescr to invoke PyDataMem_NEW in order to allocate memory for the PyArrayObject's data field. This is simply a macro for malloc:
#define PyDataMem_NEW(size) ((char *)malloc(size))
In summary, PyArray_DATA is thread-safe and as long as the NumPy arrays are created separately, it is safe to write to them from different threads.

Categories