It seems that, since access to NumPy array data doesn't require calls into the Python interpreter, C extensions can manipulate these arrays after releasing the GIL; this is discussed, for instance, in this thread.
The built-in Python type bytearray supports the Buffer Protocol, whose Py_buffer struct has the member
void *buf
A pointer to the start of the logical structure described by the
buffer fields. [...]
For contiguous arrays, the value points to the beginning of the memory
block.
My question is, can a C extension manipulate this buf after releasing the GIL (Py_BEGIN_ALLOW_THREADS) since accessing it no longer requires calls to the Python C API? Or does the nature of the Python garbage collector forbid this, since the bytearray, and its buf, might be moved during execution?
To clarify the short answer written as a comment: you can access the *buf data without holding the GIL, provided you are sure that the Py_buffer struct is "owned" by the thread for as long as it runs without the GIL. Holding the Py_buffer (obtained with PyObject_GetBuffer and not yet given back with PyBuffer_Release) guarantees the exporter keeps the memory alive: a bytearray, for example, refuses to resize while such a buffer is exported, and CPython's garbage collector never moves objects.
For the sake of completeness, I should add that this may open the door to a (very remote) risk of crashes: if the GIL-less thread reads the data at *buf while, at the same time, another GIL-holding thread runs Python code that changes the same data (bytearray[index] = x), then the GIL-less thread can see the data change unexpectedly under its feet. The opposite is true too, and even more annoying (though still theoretical): if the GIL-less thread changes the data at *buf, other GIL-holding, Python-running threads might see strange results, or maybe even crash while performing complex read operations like bytearray.split().
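As an illustration, here is a minimal sketch of the safe pattern described above (the function name zero_buffer is made up): the Py_buffer is acquired and released while holding the GIL, and only the raw memory is touched in between.

#include <Python.h>
#include <string.h>

/* Zero a writable buffer (e.g. a bytearray) with the GIL released. */
static PyObject *
zero_buffer(PyObject *self, PyObject *arg)
{
    Py_buffer view;

    /* Acquire the buffer while holding the GIL; as long as the
       Py_buffer is held, the exporter cannot free or move *buf. */
    if (PyObject_GetBuffer(arg, &view, PyBUF_WRITABLE) < 0)
        return NULL;

    Py_BEGIN_ALLOW_THREADS
    /* Raw memory access only -- no Python C API calls in here. */
    memset(view.buf, 0, view.len);
    Py_END_ALLOW_THREADS

    /* Release the buffer with the GIL held again. */
    PyBuffer_Release(&view);
    Py_RETURN_NONE;
}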
Since C++11, static variable initialization is guaranteed to be thread-safe. But what about modifying a static variable from multiple threads, like the code below?

static int initialized = 0;

void Initialize()
{
    if (initialized)
        return;
    initialized = 1; // Is this thread safe?
}
The reason I ask this question is that I am reading the source code of Py_Initialize(). I am trying to embed Python in a multithreaded C++ application, and I am wondering whether it is safe to call Py_Initialize() multiple times from several threads. The implementation of Py_Initialize() boils down to the function _Py_InitializeEx_Private, which looks like this:
// pylifecycle.c
static int initialized = 0;

void
_Py_InitializeEx_Private(int install_sigs, int install_importlib)
{
    if (initialized)
        return;
    initialized = 1;
    // a bunch of other stuff
}
And is the conclusion for C the same as for C++?
EDIT
All the answers are good; I chose the one that cleared my head the most.
No, static in this context is only about the storage duration (see http://en.cppreference.com/w/c/language/static_storage_duration).
The variable has no extra thread safety at all over some other variable.
Try using std::call_once for this, see http://en.cppreference.com/w/cpp/thread/call_once
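A minimal sketch of that suggestion (the wrapper name initialize_python is made up): the once_flag guarantees Py_Initialize() runs exactly once, and racing threads block until it has finished.

#include <mutex>
#include <Python.h>

static std::once_flag py_init_flag;

// Safe to call from any number of threads: std::call_once runs the
// lambda exactly once and blocks other callers until it returns.
void initialize_python()
{
    std::call_once(py_init_flag, [] { Py_Initialize(); });
}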
It's not thread safe to modify a static variable, but initializing a static variable is thread safe. So you can do:
void my_py_init() {
    // C++11 guarantees this initializer runs exactly once, even if
    // several threads enter my_py_init() concurrently.
    static bool x = (Py_Initialize(), true);
}
That's it. You can now call my_py_init from as many threads as you want and Py_Initialize will only ever get called once.
Py_Initialize is not thread-safe. You can call it from multiple threads only if you know that the Python interpreter has already been initialized, but if you can prove that, it would be silly to call the function at all.
Indeed, most Python C-API calls are not thread-safe; you need to acquire the Global Interpreter Lock (GIL) in order to interact with the Python interpreter. (See the Python C-API docs for more details. Read it carefully.)
However, as far as I know you cannot use the standard API to acquire the GIL until the interpreter has been initialized. So if you have multiple threads, any of which might initialize the same Python interpreter, you would need to protect the calls to Py_Initialize with your own mutex. You might well be better off doing the initialization once before you start up any threads, if that is possible with your program logic.
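For example, a plain-C sketch of that mutex approach (the names are made up), for code bases that cannot rely on C++11:

#include <pthread.h>
#include <Python.h>

static pthread_mutex_t py_init_lock = PTHREAD_MUTEX_INITIALIZER;
static int py_initialized = 0;

void ensure_python_initialized(void)
{
    pthread_mutex_lock(&py_init_lock);
    if (!py_initialized) {
        Py_Initialize();
        py_initialized = 1;
    }
    pthread_mutex_unlock(&py_init_lock);
}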
The code you cite:
static int initialized = 0;

void Initialize_If_Necessary()
{
    if (initialized)
        return;
    initialized = 1;
    // Do the initialization only once
}
is clearly not threadsafe in any language, even if initialized is an atomic type. Suppose two threads were simultaneously executing this code before any initialization happened: both of them see initialized as false, so both of them proceed with the initialization. (If you don't have two cores, you could imagine that the first thread is task-switched out between the test of initialized and the assignment.)
Modifying a static variable from multiple threads is not safe: an unsynchronized read and write of the same variable is a data race, which is undefined behaviour. In practice, the compiler may keep the variable cached in a register, so a modification made by one thread may never become visible to the threads running on other cores.
The first code sample is the typical starting point for what is referred to as 'lazy-initialisation'. It's useful for guaranteeing once-only initialisation of "expensive objects"; but doing so only if needed just before any use of the object.
That specific example doesn't have any serious problems, but it's an oversimplification. And when you look more holistically at lazy-initialisation, you'll see that multi-threaded lazy-initialisation is not a good idea.
The concept of "Thread Safety" goes way beyond just a single variable (static or otherwise). You need to step back and consider things happening to the same¹ resources (memory, objects, files, ...) at the same time.
¹ Different instances of the same class are not the same thing; but their static members are.
Consider the following extract from your second example.
if (initialized)
    return;
initialized = 1;
// a bunch of other stuff
In the first 3 lines, there's no serious harm if multiple threads execute that code approximately concurrently. Some threads might return early; others might be a little "too quick" and all perform the task of setting initialized = 1;. However, that wouldn't be a concern, since no matter how many threads set the shared variable, the net effect is always the same.
The problem comes in with the fourth line. The one almost nonchalantly brushed aside as "a bunch of other stuff". That "other stuff" is the really critical code, because if it's possible for initialized = 1; to be called multiple times, you need to consider the impact of calling "other stuff" multiple times and concurrently.
Now, in the unlikely event you satisfy yourself that "other stuff" can be called multiple times, there's another concern...
Consider the client code that might be using Python.
Py_Initialize();
//use Python
If two threads call the above simultaneously, with one "returning early" and the other actually performing the initialisation, then the early-returning thread would start (or try to start) using Python before it is fully initialised!
As a bit of a hack, you might try blocking at the if (initialized) line for the duration of the initialisation process. But this is undesirable for 2 reasons:
Multiple threads are likely to be stuck waiting in the early stages of their processing.
Even after initialisation is complete you'd have a small (but totally wasteful) overhead of checking the lock each time you 'lazy-initialise' the Python framework.
Conclusion
Lazy-initialisation has its uses. But you're much better off not trying to perform the lazy initialisation from multiple threads. Rather have a "safe thread" (main thread is usually good enough) that can perform the lazy-initialisation before even creating any threads that would try to use whatever has been initialised. Then you won't have to worry about the thread-safety at all.
Is it safe to allocate memory with malloc inside a nogil block in Cython?
Also, is it safe to pass pointers around when you have a multithreaded program running with nogil?
The GIL is in place because CPython's memory management is not thread-safe. As a consequence, you can use nogil in cases where you do not interact with a Python object, that is, with memory that is managed by Python.
This is mentioned in the documentation for releasing the GIL:
Code in the body of the statement must not manipulate Python objects in any way, and must not call anything that manipulates Python objects without first re-acquiring the GIL. Cython currently does not check this.
So, using malloc, passing pointers and doing anything else that is legal in C is perfectly safe as long as no Python Objects are involved.
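For illustration, a minimal Cython sketch of what that permits (the function name is made up): the allocation, the number crunching, and the free all happen inside the nogil block, and no Python object is touched until the GIL is held again.

from libc.stdlib cimport malloc, free

def sum_of_squares(int n):
    cdef double *data
    cdef double total = 0
    cdef int i
    with nogil:
        data = <double *> malloc(n * sizeof(double))
        if data != NULL:
            for i in range(n):
                data[i] = <double> i * i
            for i in range(n):
                total += data[i]
            free(data)
    if data == NULL:
        raise MemoryError()   # raising needs the GIL, so it lives outside the block
    return total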
I'm writing some python code that splices together large files at various points. I've done something similar in C where I allocated a 1MB char array and used that as the read/write buffer. And it was very simple: read 1MB into the char array then write it out.
But with Python I'm assuming it is different: each time I call read() with size = 1M, it will allocate a 1M-long character string. And hopefully, when the buffer goes out of scope, it will be freed in the next gc pass.
Would Python handle the allocation this way? If so, would the constant allocation/deallocation cycle be computationally expensive?
Can I tell Python to use the same block of memory, just like in C? Or is the Python VM smart enough to do it itself?
I guess what I'm essentially aiming for is kinda like an implementation of dd in python.
Search site docs.python.org for readinto to find docs appropriate for the version of Python you're using. readinto is a low-level feature. They'll look a lot like this:
readinto(b)
Read up to len(b) bytes into bytearray b and return the number of bytes read.
Like read(), multiple reads may be issued to the underlying raw stream, unless the latter is interactive.
A BlockingIOError is raised if the underlying raw stream is in non blocking-mode, and has no data available at the moment.
But don't worry about it prematurely. Python allocates and deallocates dynamic memory at a ferocious rate, and it's likely that the cost of repeatedly getting & free'ing a measly megabyte will be lost in the noise. And note that CPython is primarily reference-counted, so your buffer will get reclaimed "immediately" when it goes out of scope. As to whether Python will reuse the same memory space each time, the odds are decent but it's not assured. Python does nothing to try to force that, but depending on the entire allocation/deallocation pattern and the details of the system C's malloc()/free() implementation, it's not impossible it will get reused ;-)
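As a sketch of the dd-style copy described in the question (the file names are placeholders), readinto lets one preallocated buffer be reused for every pass of the loop:

BUF_SIZE = 1024 * 1024  # 1 MiB, as in the C version

buf = bytearray(BUF_SIZE)   # allocated once, reused for every read
view = memoryview(buf)      # lets us write a slice without copying

with open("infile.bin", "rb") as src, open("outfile.bin", "wb") as dst:
    while True:
        n = src.readinto(buf)   # fills buf in place, returns bytes read
        if n == 0:              # 0 means end of file
            break
        dst.write(view[:n])     # write only the part that was filled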
I'm very confused as to how exactly I can ensure thread-safety when calling Python code from a C (or C++) thread.
The Python documentation seems to be saying that the usual idiom to do so is:
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */
/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);
And indeed, this stackoverflow answer seems to confirm as much. But a commenter (with a very high reputation) says otherwise. The commenter says you should use PyEval_RestoreThread()/PyEval_SaveThread().
The docs seem to confirm this:
PyThreadState* PyEval_SaveThread()
Release the global interpreter lock (if it has been created and thread support is enabled) and reset the thread state to NULL, returning the previous thread state (which is not NULL). If the lock has been created, the current thread must have acquired it. (This function is available even when thread support is disabled at compile time.)
void PyEval_RestoreThread(PyThreadState *tstate)
Acquire the global interpreter lock (if it has been created and thread support is enabled) and set the thread state to tstate, which must not be NULL. If the lock has been created, the current thread must not have acquired it, otherwise deadlock ensues. (This function is available even when thread support is disabled at compile time.)
The way the docs describe this, it seems that PyEval_RestoreThread()/PyEval_SaveThread() is basically a mutex lock/unlock idiom. So it would make sense that before calling any Python code from C, you first need to lock the GIL, and then unlock it.
So which is it? When calling Python code from C, should I use:
PyGILState_Ensure()/PyGILState_Release()
or
PyEval_RestoreThread/PyEval_SaveThread?
And what is really the difference?
First, you almost never want to call PyEval_RestoreThread/PyEval_SaveThread. Instead, you want to call the wrapper macros Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS. The documentation is written for those macros, which is why you couldn't find it.
Anyway, either way, you don't use the thread functions/macros to acquire the GIL; you use them to temporarily release the GIL when you've acquired it.
So, why would you ever want to do this? Well, in simple cases you don't; you just need Ensure/Release. But sometimes you need to hold onto your Python thread state until later, but don't need to hold onto the GIL (or even explicitly need to not hold onto the GIL, to allow some other thread to progress so it can signal you). As the docs explain, the most common reasons for this are doing file I/O or extensive CPU-bound computation.
Finally, is there any case where you want to call the functions instead of the macros? Yes, if you want access to the stashed PyThreadState. If you can't think of a reason why you might want that, you probably don't have one.
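To make the two idioms concrete, here is a minimal sketch (the function names are made up): Ensure/Release acquires the GIL on a thread CPython did not create, while the macro pair temporarily drops the GIL around pure C work.

#include <Python.h>

/* Runs on a thread that CPython did not create: acquire the GIL
   (creating a thread state if necessary), call into Python, release. */
void call_python_from_c_thread(PyObject *callable)
{
    PyGILState_STATE gstate = PyGILState_Ensure();
    PyObject *result = PyObject_CallObject(callable, NULL);
    Py_XDECREF(result);   /* a real caller would also check for an exception */
    PyGILState_Release(gstate);
}

/* Runs inside an extension call that already holds the GIL: drop it
   around a long pure-C computation so other Python threads can run. */
static PyObject *
crunch(PyObject *self, PyObject *args)
{
    long total = 0;

    Py_BEGIN_ALLOW_THREADS
    for (long i = 0; i < 100000000L; i++)   /* no Python C API in here */
        total += i;
    Py_END_ALLOW_THREADS

    return PyLong_FromLong(total);
}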
This is risky business, and I understand the Global Interpreter Lock to be a formidable foe of parallelism. However, if I'm using NumPy's C API (specifically the PyArray_DATA macro on a NumPy array), are there potential consequences to invoking it from multiple concurrent threads?
Note that I will still own the GIL and not be releasing it with NumPy's threading support. Also, even if NumPy makes no guarantees about thread safety but PyArray_DATA is thread-safe in practice, that's good enough for me.
I'm running Python 2.6.6 with NumPy 1.3.0 on Linux.
Answering my own question here, but after poking into the source code for NumPy 1.3.0, I believe the answer is: Yes, PyArray_DATA is thread-safe.
PyArray_DATA is defined in ndarrayobject.h:
#define PyArray_DATA(obj) ((void *)(((PyArrayObject *)(obj))->data))
The PyArrayObject struct type is defined in the same file; the field of interest is:
char *data;
So now, the question is whether accessing data from multiple threads is safe or not.
Creating a new NumPy array from scratch (i.e., not deriving it from an existing data structure) passes a NULL data pointer to PyArray_NewFromDescr, defined in arrayobject.c.
This causes PyArray_NewFromDescr to invoke PyDataMem_NEW in order to allocate memory for the PyArrayObject's data field. This is simply a macro for malloc:
#define PyDataMem_NEW(size) ((char *)malloc(size))
In summary, PyArray_DATA is thread-safe, and as long as the NumPy arrays were created separately, it is safe to write to them from different threads.
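As a sketch of the pattern this implies (the function names are made up, and it assumes import_array() has been called and that both arguments are contiguous float64 arrays, which a real implementation would have to check): read the data pointers with PyArray_DATA while holding the GIL, then write through them with the GIL released.

#include <Python.h>
#include <numpy/arrayobject.h>

/* Fill one array's raw buffer; safe without the GIL because it only
   touches the C data pointer, never a Python object. */
static void fill_doubles(double *data, npy_intp n, double value)
{
    npy_intp i;
    for (i = 0; i < n; i++)
        data[i] = value;
}

static PyObject *
fill_two(PyObject *self, PyObject *args)
{
    PyArrayObject *a, *b;
    if (!PyArg_ParseTuple(args, "O!O!",
                          &PyArray_Type, &a, &PyArray_Type, &b))
        return NULL;

    /* Read the data pointers while still holding the GIL. */
    double *pa = (double *)PyArray_DATA(a);
    double *pb = (double *)PyArray_DATA(b);
    npy_intp na = PyArray_SIZE(a);
    npy_intp nb = PyArray_SIZE(b);

    Py_BEGIN_ALLOW_THREADS
    fill_doubles(pa, na, 1.0);   /* each call could run on its own thread */
    fill_doubles(pb, nb, 2.0);
    Py_END_ALLOW_THREADS

    Py_RETURN_NONE;
}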