Is it thread safe to modify a static variable? - python

Since C++11, static variable initialization is guaranteed to be thread safe. But what about modifying a static variable from multiple threads, like below?
static int initialized = 0;

void Initialize()
{
    if (initialized)
        return;
    initialized = 1; // Is this thread safe?
}
The reason I ask is that I am reading the source code for Py_Initialize(). I am trying to embed Python in a multithreaded C++ application, and I am wondering whether it is safe to call Py_Initialize() multiple times from several threads. The implementation of Py_Initialize() boils down to the function _Py_InitializeEx_Private, which looks like this:
// pylifecycle.c
static int initialized = 0;

void
_Py_InitializeEx_Private(int install_sigs, int install_importlib)
{
    if (initialized)
        return;
    initialized = 1;
    // a bunch of other stuff
}
And is the conclusion for C the same as C++?
EDIT
All the answers are good; I chose the one that cleared my head the most.

No, static in this context is only about the storage duration (see http://en.cppreference.com/w/c/language/static_storage_duration).
The variable has no extra thread safety at all over some other variable.
Try using std::call_once for this; see http://en.cppreference.com/w/cpp/thread/call_once.
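A minimal sketch of that approach, assuming the goal is a once-only Py_Initialize() (the function name initialize_python is illustrative):

#include <Python.h>
#include <mutex>

static std::once_flag py_init_flag;

// Safe to call from any number of threads: Py_Initialize() runs exactly
// once, and every caller returns only after that call has completed.
void initialize_python()
{
    std::call_once(py_init_flag, [] { Py_Initialize(); });
}

Unlike the hand-rolled flag, std::call_once also makes threads that lose the race block until the initialization has actually finished.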

It's not thread safe to modify a static variable, but initializing a static variable is thread safe. So you can do:
void my_py_init() {
    static bool x = (Py_Initialize(), true);
}
That's it. You can now call my_py_init from as many threads as you want and Py_Initialize will only ever get called once.

Py_Initialize is not thread-safe. You can call it from multiple threads only if you know that the Python interpreter has already been initialized; but if you can prove that, it would be silly to call the function.
Indeed, most Python C-API calls are not thread-safe; you need to acquire the Global Interpreter Lock (GIL) in order to interact with the Python interpreter. (See the Python C-API docs for more details. Read it carefully.)
However, as far as I know you cannot use the standard API to acquire the GIL until the interpreter has been initialized. So if you have multiple threads, any of which might initialize the same Python interpreter, you would need to protect the calls to Py_Initialize with your own mutex (see the sketch below). You might well be better off doing the initialization once before you start up any threads, if that is possible with your program logic.
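A minimal sketch of that mutex approach, assuming C++11 (the names py_init_mutex and ensure_python_initialized are illustrative):

#include <Python.h>
#include <mutex>

static std::mutex py_init_mutex;
static bool py_is_initialized = false; // guarded by py_init_mutex

void ensure_python_initialized()
{
    std::lock_guard<std::mutex> lock(py_init_mutex);
    if (!py_is_initialized) {
        Py_Initialize();
        py_is_initialized = true;
    }
    // Every caller returns only after initialization has completed.
}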
The code you cite:
static int initialized = 0;

void Initialize_If_Necessary()
{
    if (initialized)
        return;
    initialized = 1;
    // Do the initialization only once
}
is clearly not threadsafe in any language, even if initialized is an atomic type. Suppose two threads were simultaneously executing this code before any initialization happened: both of them could see initialized as false, so both of them would proceed with the initialization. (If you don't have two cores, imagine the first thread being task-switched out between the test of initialized and the assignment.)
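For illustration, a hedged sketch of how the check-and-set itself could be made atomic with C++11 (note that a losing thread must still wait for the winner to finish the real work, which is exactly what std::call_once or a mutex gives you for free):

#include <atomic>

// 0 = not started, 1 = in progress, 2 = done
static std::atomic<int> state{0};

void Initialize_If_Necessary()
{
    int expected = 0;
    if (state.compare_exchange_strong(expected, 1)) {
        // Exactly one thread wins the race and gets here.
        // ... do the initialization only once ...
        state.store(2);
    } else {
        // Losers must still wait until initialization has finished;
        // otherwise they could return and use a half-initialized resource.
        while (state.load() != 2) { /* spin */ }
    }
}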

Modifying a static variable from multiple threads without synchronization is not safe. The compiler is free to cache the variable in a register, and each core has its own registers and caches, so a write performed by one thread may not become visible to another thread in time (or at all). In C and C++ this is a data race, and the behaviour is undefined.

The first code sample is the typical starting point for what is referred to as 'lazy initialisation'. It's useful for guaranteeing once-only initialisation of 'expensive' objects, deferring the work until just before the object is first used (and skipping it entirely if the object is never used).
That specific example doesn't have any serious problems, but it's an oversimplification. And when you look more holistically at lazy-initialisation, you'll see that multi-threaded lazy-initialisation is not a good idea.
The concept of "Thread Safety" goes way beyond just a single variable (static or otherwise). You need to step back and consider things happening to the same1 resources (memory, objects, files, ...) at the same time.
1: Different instances of the same class are not the same thing; but their static members are.
Consider the following extract from your second example.
if (initialized)
    return;
initialized = 1;
// a bunch of other stuff
In the first 3 lines, there's no serious harm if multiple threads execute that code approximately concurrently. Some threads might return early; others might be a little "too quick" and all perform the task of setting initialized = 1;. However, that wouldn't be a concern, since no matter how many threads set the shared variable, the net effect is always the same.
The problem comes in with the fourth line. The one almost nonchalantly brushed aside as "a bunch of other stuff". That "other stuff" is the really critical code, because if it's possible for initialized = 1; to be called multiple times, you need to consider the impact of calling "other stuff" multiple times and concurrently.
Now, in the unlikely event you satisfy yourself that "other stuff" can be called multiple times, there's another concern...
Consider the client code that might be using Python.
Py_Initialize();
//use Python
If two threads call the above simultaneously, with one 'returning early' and the other actually performing the initialisation, then the 'early-returning' thread would start (or try to start) using Python before it is fully initialised!
As a bit of a hack, you might try blocking at the if (initialized) line for the duration of the initialisation process. But this is undesirable for 2 reasons:
Multiple threads are likely to be stuck waiting in the early stages of their processing.
Even after initialisation is complete you'd have a small (but totally wasteful) overhead of checking the lock each time you 'lazy-initialise' the Python framework.
Conclusion
Lazy-initialisation has its uses. But you're much better off not trying to perform the lazy initialisation from multiple threads. Rather have a "safe thread" (main thread is usually good enough) that can perform the lazy-initialisation before even creating any threads that would try to use whatever has been initialised. Then you won't have to worry about the thread-safety at all.
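A minimal sketch of that arrangement in C++ (worker is an illustrative placeholder for whatever your threads actually do):

#include <Python.h>
#include <thread>

void worker()
{
    // Python is guaranteed to be initialised by the time this runs.
    // (You still need to acquire the GIL before touching the C API.)
}

int main()
{
    Py_Initialize();                // once, before any threads exist

    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();

    Py_Finalize();                  // once, after all threads are done
    return 0;
}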

Related

Is embedded python Py_Finalize() blocking?

I'm seeing intermittent crashing when I run large embedded Python programs. My question is: does the Py_Finalize() call block until the Python interpreter is in a safe state before returning? If it doesn't, how do I know when the interpreter has destroyed everything?
My current code looks like this:
Py_Initialize();
...
...
Py_Finalize(); // Unsure if this returns immediately or returns after completing all Finalizing actions
I don't think this totally answers the question I originally asked, but I have found a way to make the garbage collector do a better job when I call Py_Finalize: stop using static class variables in Python.
Old code:
class MyClass(object):
    a = {}

    def __init__(self):
        ...
        ...
New code:
class MyClass(object):
    def __init__(self):
        self.a = {}
        ...
        ...
If I'm right, calling Py_Finalize() will tear down the Python interpreter (some exceptions are listed in [1]).
I would suggest you create a class for the Python interpreter and manually check that all your tasks are finished before calling Py_Finalize(); see the sketch below. In the projects where I have worked with an embedded Python interpreter, this approach suited best.
Hope it helps!
[1] Python Doc: https://docs.python.org/2/c-api/init.html
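A minimal sketch of such a wrapper class in C++ (the class name PyInterpreter is illustrative):

#include <Python.h>

// Ties the interpreter's lifetime to a C++ object: construct it once in
// main(), and Py_Finalize() runs when the object goes out of scope.
class PyInterpreter {
public:
    PyInterpreter() { Py_Initialize(); }
    ~PyInterpreter() {
        // Caller's responsibility: every task/thread that uses Python
        // must have finished before this destructor runs.
        Py_Finalize();
    }
    PyInterpreter(const PyInterpreter&) = delete;
    PyInterpreter& operator=(const PyInterpreter&) = delete;
};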
== EDIT ==
For Py_Finalize()
Bugs and caveats: The destruction of modules and objects in modules is done in random order; this may cause destructors (__del__() methods) to fail when they depend on other objects (even functions) or modules. Dynamically loaded extension modules loaded by Python are not unloaded. Small amounts of memory allocated by the Python interpreter may not be freed (if you find a leak, please report it). Memory tied up in circular references between objects is not freed. Some memory allocated by extension modules may not be freed. Some extensions may not work properly if their initialization routine is called more than once; this can happen if an application calls Py_Initialize and Py_Finalize more than once.
It seems that even if your program calls Py_Initialize() and Py_Finalize() only once, you might run into some trouble (which I never did) and leak a small amount of memory. However, if you only initialize the Python interpreter once and perform tasks while your main program is running (I'm more familiar with this approach), you won't have much trouble.

Multithreaded Access to Python bytearray

It seems that since access to NumPy array data doesn't require calls into the Python interpreter, C extensions can manipulate these arrays after releasing the GIL. For instance, in this thread.
The built-in Python type bytearray supports the Buffer Protocol, one member of which is
void *buf
A pointer to the start of the logical structure described by the buffer fields. [...] For contiguous arrays, the value points to the beginning of the memory block.
My question is, can a C extension manipulate this buf after releasing the GIL (Py_BEGIN_ALLOW_THREADS) since accessing it no longer requires calls to the Python C API? Or does the nature of the Python garbage collector forbid this, since the bytearray, and its buf, might be moved during execution?
To clarify the short answer written as comment: you can access the *buf data without holding the GIL, provided you are sure that the Py_buffer struct is "owned" by the thread while it is running without the GIL.
For the sake of completeness, I should add that this may open the door to (very remote) crash risks: if the GIL-less thread reads the data at *buf while another GIL-holding thread is running Python code that changes the same data (bytearray[index] = x), then the GIL-less thread can see unexpected changes of the data under its feet. The opposite is true too, and even more annoying (but still theoretical): if the GIL-less thread changes the data at *buf, then other GIL-holding, Python-running threads might see strange results or maybe even crashes when doing complex read operations like bytearray.split().
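A hedged sketch of what that can look like inside an extension function (the function name process_buffer and the XOR 'work' are illustrative):

#include <Python.h>

static PyObject *process_buffer(PyObject *self, PyObject *arg)
{
    Py_buffer view;

    /* The GIL must be held for every Python C API call, including this. */
    if (PyObject_GetBuffer(arg, &view, PyBUF_WRITABLE) < 0)
        return NULL;

    Py_BEGIN_ALLOW_THREADS          /* release the GIL */
    {
        unsigned char *p = (unsigned char *)view.buf;
        for (Py_ssize_t i = 0; i < view.len; i++)
            p[i] ^= 0xFF;           /* pure C work, no Python API calls */
    }
    Py_END_ALLOW_THREADS            /* reacquire the GIL */

    PyBuffer_Release(&view);        /* needs the GIL again */
    Py_RETURN_NONE;
}

Holding the Py_buffer also pins the bytearray: CPython refuses to resize a bytearray while buffer exports are active, which is the "ownership" condition mentioned above.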

Calling Python code from a C thread

I'm very confused as to how exactly I can ensure thread-safety when calling Python code from a C (or C++) thread.
The Python documentation seems to be saying that the usual idiom to do so is:
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */
/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);
And indeed, this stackoverflow answer seems to confirm as much. But a commenter (with a very high reputation) says otherwise. The commenter says you should use PyEval_RestoreThread()/PyEval_SaveThread().
The docs seem to confirm this:
PyThreadState* PyEval_SaveThread()
Release the global interpreter lock (if it has been created and thread support is enabled) and reset the thread state to NULL, returning the previous thread state (which is not NULL). If the lock has been created, the current thread must have acquired it. (This function is available even when thread support is disabled at compile time.)

void PyEval_RestoreThread(PyThreadState *tstate)
Acquire the global interpreter lock (if it has been created and thread support is enabled) and set the thread state to tstate, which must not be NULL. If the lock has been created, the current thread must not have acquired it, otherwise deadlock ensues. (This function is available even when thread support is disabled at compile time.)
The way the docs describe this, it seems that PyEval_RestoreThread()/PyEval_SaveThread() is basically a mutex lock/unlock idiom. So it would make sense that before calling any Python code from C, you first need to lock the GIL, and then unlock it.
So which is it? When calling Python code from C, should I use:
PyGILState_Ensure()/PyGILState_Release()
or
PyEval_RestoreThread/PyEval_SaveThread?
And what is really the difference?
First, you almost never want to call PyEval_RestoreThread/PyEval_SaveThread. Instead, you want to call the wrapper macros Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS. The documentation is written for those macros, which is why you couldn't find it.
Anyway, either way, you don't use the thread functions/macros to acquire the GIL; you use them to temporarily release the GIL when you've acquired it.
So, why would you ever want to do this? Well, in simple cases you don't; you just need Ensure/Release. But sometimes you need to hold onto your Python thread state until later, but don't need to hold onto the GIL (or even explicitly need to not hold onto the GIL, to allow some other thread to progress so it can signal you). As the docs explain, the most common reasons for this are doing file I/O or extensive CPU-bound computation.
Finally, is there any case where you want to call the functions instead of the macros? Yes, if you want access to the stashed PyThreadState. If you can't think of a reason why you might want that, you probably don't have one.
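A minimal sketch of that function-based form, assuming you already hold the GIL when you reach it:

/* Temporarily release the GIL around long-running C work, keeping the
   thread state so it can be restored explicitly afterwards. */
PyThreadState *tstate = PyEval_SaveThread();  /* releases the GIL */

/* ... file I/O or CPU-bound computation; no Python C API calls here ... */

PyEval_RestoreThread(tstate);                 /* reacquires the GIL */

This is essentially what Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS expand to, except that the macros hide tstate in a local variable.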

Embedding python in multithreaded C application

I'm embedding the python interpreter in a multithreaded C application and I'm a little confused as to what APIs I should use to ensure thread safety.
From what I gathered, when embedding python it is up to the embedder to take care of the GIL lock before calling any other Python C API call. This is done with these functions:
gstate = PyGILState_Ensure();
// do some python api calls, run python scripts
PyGILState_Release(gstate);
But this alone doesn't seem to be enough. I still got random crashes since it doesn't seem to provide mutual exclusion for the Python APIs.
After reading some more docs I also added:
PyEval_InitThreads();
right after the call to Py_IsInitialized(), but that's where the confusing part comes in. The docs state that this function will:
Initialize and acquire the global interpreter lock
This suggests that when this function returns, the GIL is supposed to be locked and should be unlocked somehow. But in practice this doesn't seem to be required: with this line in place my multithreaded app worked perfectly, and mutual exclusion was maintained by the PyGILState_Ensure/Release functions.
When I tried adding PyEval_ReleaseLock() after PyEval_InitThreads(), the app deadlocked pretty quickly in a subsequent call to PyImport_ExecCodeModule().
So what am I missing here?
I had exactly the same problem, and it is now solved by calling PyEval_SaveThread() immediately after PyEval_InitThreads(), as you suggest above. However, my actual problem was that I called PyEval_InitThreads() after Py_Initialize(), which then caused PyGILState_Ensure() to block when called from different, subsequent native threads. In summary, this is what I do now:
There is a global variable:
static int gil_init = 0;
From the main thread, load the native C extension and start the Python interpreter:
Py_Initialize()
From multiple other threads my app concurrently makes a lot of calls into the Python/C API:
if (!gil_init) {
    gil_init = 1;
    PyEval_InitThreads();
    PyEval_SaveThread();
}
state = PyGILState_Ensure();
// Call Python/C API functions...
PyGILState_Release(state);
From the main thread, stop the Python interpreter:
Py_Finalize()
All other solutions I've tried either caused random Python segfaults or deadlocks/blocking in PyGILState_Ensure().
The Python documentation really should be more clear on this and at least provide an example for both the embedding and extension use cases.
Eventually I figured it out.
After
PyEval_InitThreads();
you need to call
PyEval_SaveThread();
to properly release the GIL for the main thread.
Note that the if (!gil_init) { code in #forman's answer runs only once, so it can just as well be done in the main thread, which lets us drop the flag (to be correct, gil_init would have to be atomic or otherwise synchronized anyway).
PyEval_InitThreads() is meaningful only in CPython 3.6 and older, and has been deprecated in CPython 3.9, so it has to be guarded with a macro.
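For instance, a guard along these lines should work (the 3.7.0 threshold is my assumption, based on the note above):

#if PY_VERSION_HEX < 0x03070000
    PyEval_InitThreads();   /* needed only on CPython 3.6 and older */
#endif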
Given all this, what I am currently using is the following:
In the main thread, run all of
Py_Initialize();
PyEval_InitThreads(); // only on Python 3.6 or older!
/* tstate = */ PyEval_SaveThread(); // maybe save the return value if you need it later
Now, whenever you need to call into Python, do
state = PyGILState_Ensure();
// Call Python/C API functions...
PyGILState_Release(state);
Finally, from the main thread, stop the Python interpreter
PyGILState_Ensure(); // PyEval_RestoreThread(tstate); seems to work just as well
Py_Finalize()
Having a multi-threaded C app trying to communicate from multiple threads to multiple Python threads of a single CPython instance looks risky to me.
As long as only one C thread communicates with Python, you should not have to worry about locking, even if the Python application is multi-threaded.
If you need multiple Python threads, you can set the application up this way and have multiple C threads communicate via a queue with that single C thread, which farms the requests out to the multiple Python threads.
An alternative that might work for you is to have multiple CPython instances, one for each C thread that needs one (of course, communication between the Python programs should go via the C program).
Another alternative might be the Stackless Python interpreter. That does away with the GIL, but I am not sure whether you run into other problems binding it to multiple threads. Stackless was a drop-in replacement for my (single-threaded) C application.

Python C API from C++ app - know when to lock

I am trying to write a C++ class that calls Python methods of a class that does some I/O operations (file, stdout). The problem I have run into is that my class is called from different threads: sometimes the main thread, sometimes different others. Obviously I tried to apply the approach for Python calls in multi-threaded native applications. Basically everything starts from PyEval_AcquireLock and PyEval_ReleaseLock, or just global locks. According to the documentation here, when a thread is already locked a deadlock ensues. When my class is called from the main thread, or from another one that blocks Python execution, I get a deadlock.
Python> Cfunc1() - a C++ func that creates threads internally which lead to calls into "my class"
It gets stuck on PyEval_AcquireLock; obviously Python is already locked, i.e. it is waiting for the C++ Cfunc1 call to complete... It completes fine if I omit those locks. It also completes fine when the Python interpreter is ready for the next user command, i.e. when the thread is calling funcs in the background, not inside of a native call.
I am looking for a workaround. I need to distinguish whether or not taking the global lock is allowed, i.e. whether Python is not locked and is ready to receive the next command... I tried PyGILState_Ensure, but unfortunately I see a hang.
Any known API or solution for this ?
(Python 2.4)
Unless you have wrapped your C++ code quite peculiarly, when any Python thread calls into your C++ code, the GIL is held. You may release it in your C++ code (if you want to do some consuming task that doesn't require any Python interaction), and then will have to acquire it again when you want to do any Python interaction -- see the docs: if you're just using the good old C API, there are macros for that, and the recommended idiom is
Py_BEGIN_ALLOW_THREADS
...Do some blocking I/O operation...
Py_END_ALLOW_THREADS
the docs explain:
The Py_BEGIN_ALLOW_THREADS macro opens a new block and declares a hidden local variable; the Py_END_ALLOW_THREADS macro closes the block. Another advantage of using these two macros is that when Python is compiled without thread support, they are defined empty, thus saving the thread state and GIL manipulations.
So you just don't have to acquire the GIL (and shouldn't) until after you've explicitly released it (ideally with that macro) and need to interact with Python in any way again. (Where the docs say "some blocking I/O operation", it could actually be any long-running operation with no Python interaction whatsoever).
