Calling Python code from a C thread

I'm very confused as to how exactly I can ensure thread-safety when calling Python code from a C (or C++) thread.
The Python documentation seems to be saying that the usual idiom to do so is:
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* Perform Python actions here. */
result = CallSomeFunction();
/* evaluate result or handle exception */
/* Release the thread. No Python API allowed beyond this point. */
PyGILState_Release(gstate);
And indeed, this stackoverflow answer seems to confirm as much. But a commenter (with a very high reputation) says otherwise. The commenter says you should use PyEval_RestoreThread()/PyEval_SaveThread().
The docs seem to confirm this:
PyThreadState* PyEval_SaveThread()
Release the global interpreter lock (if it has been created and
thread support is enabled) and reset the thread state to NULL,
returning the previous thread state (which is not NULL). If the lock
has been created, the current thread must have acquired it. (This
function is available even when thread support is disabled at compile
time.)
void PyEval_RestoreThread(PyThreadState *tstate)
Acquire the global interpreter lock (if it has been created and thread
support is enabled) and set the thread state to tstate, which must not
be NULL. If the lock has been created, the current thread must not have
acquired it, otherwise deadlock ensues. (This function is available even
when thread support is disabled at compile time.)
The way the docs describe this, it seems that PyEval_RestoreThread()/PyEval_SaveThread() is basically a mutex lock/unlock idiom. So it would make sense that before calling any Python code from C, you first need to lock the GIL, and then unlock it.
So which is it? When calling Python code from C, should I use:
PyGILState_Ensure()/PyGILState_Release()
or
PyEval_RestoreThread/PyEval_SaveThread?
And what is really the difference?

First, you almost never want to call PyEval_RestoreThread/PyEval_SaveThread. Instead, you want to call the wrapper macros Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS. The documentation is written for those macros, which is why you couldn't find it.
Anyway, either way, you don't use the thread functions/macros to acquire the GIL; you use them to temporarily release the GIL when you've acquired it.
So, why would you ever want to do this? Well, in simple cases you don't; you just need Ensure/Release. But sometimes you need to hold onto your Python thread state until later, but don't need to hold onto the GIL (or even explicitly need to not hold onto the GIL, to allow some other thread to progress so it can signal you). As the docs explain, the most common reasons for this are doing file I/O or extensive CPU-bound computation.
Finally, is there any case where you want to call the functions instead of the macros? Yes, if you want access to the stashed PyThreadState. If you can't think of a reason why you might want that, you probably don't have one.

Related

Calling Py_EndInterpreter from a C worker thread

The deprecation of Python's PyEval_ReleaseLock has introduced a problem in our codebase: we want to terminate a Python interpreter from a C callback function using Py_EndInterpreter.
So to do that, Python's Docs say that you must hold the GIL when calling this function:
void Py_EndInterpreter(PyThreadState *tstate)
Destroy the (sub-)interpreter represented by the given thread state. The given thread state must be
the current thread state. See the discussion of thread states below. When the call returns, the
current thread state is NULL. All thread states associated with this interpreter are destroyed. (The
global interpreter lock must be held before calling this function and is still held when it returns.)
Py_FinalizeEx() will destroy all sub-interpreters that haven’t been explicitly destroyed at that point.
Great! So we call PyEval_RestoreThread to restore our thread state to the thread we're about to terminate, and then call Py_EndInterpreter.
// Acquire the GIL
PyEval_RestoreThread(thread);
// Tear down the interpreter.
Py_EndInterpreter(thread);
// Now what? We still hold the GIL and we no longer have a valid thread state.
// Previously we did PyEval_ReleaseLock here, but that is now deprecated.
The documentation for PyEval_ReleaseLock says that we should either use PyEval_SaveThread or PyEval_ReleaseThread.
PyEval_ReleaseThread's documentation says that the input thread state must not be NULL. Okay, but we can't pass in the recently deleted thread state.
PyEval_SaveThread will hit a debug assertion if you try to call it after calling Py_EndInterpreter, so that's not an option either.
So, we've currently implemented a hack to get around this issue - we save the thread state of the thread that calls Py_InitializeEx in a global variable, and swap to it after calling Py_EndInterpreter.
// Acquire the GIL
PyEval_RestoreThread(thread);
// Tear down the interpreter.
Py_EndInterpreter(thread);
// Swap to the main thread state.
PyThreadState_Swap(g_init.thread_state_);
PyEval_SaveThread(); // Release the GIL. Probably.
What's the proper solution here? It seems that embedded Python is an afterthought for this API.
Similar question: PyEval_InitThreads in Python 3: How/when to call it? (the saga continues ad nauseam)

Is it thread safe to modify a static variable?

Since C++11, static variable initialization is guaranteed to be thread safe. But what about modifying a static variable from multiple threads, as below?
static int initialized = 0;
void Initialize()
{
    if (initialized)
        return;
    initialized = 1; // Is this thread safe?
}
The reason I ask is that I am reading the source code for Py_Initialize(). I am trying to embed Python in a multithreaded C++ application, and I am wondering whether it is safe to call Py_Initialize() multiple times from several threads. The implementation of Py_Initialize() boils down to the function _Py_InitializeEx_Private, which looks like this:
// pylifecycle.c
static int initialized = 0;
_Py_InitializeEx_Private(int install_sigs, int install_importlib)
{
if (initialized)
return;
initialized = 1;
// a bunch of other stuff
}
And is the conclusion for C the same as C++?
EDIT
So all the answers are good, I chose the one which clears my head most.

No, static in this context is only about the storage duration (see http://en.cppreference.com/w/c/language/static_storage_duration).
The variable has no extra thread safety at all over some other variable.
Try using std::call_once for this, see http://en.cppreference.com/w/cpp/thread/call_once
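A minimal sketch of the std::call_once approach. It assumes a hypothetical stand-in, fake_py_initialize(), in place of Py_Initialize() so that it compiles without the Python headers; the counter and the init_done flag exist only to make the behaviour observable.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for Py_Initialize(); counts how often it runs.
std::atomic<int> init_calls{0};
std::atomic<bool> init_done{false};

void fake_py_initialize() {
    ++init_calls;
    // Simulate slow start-up so that other threads arrive mid-initialization.
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    init_done = true;
}

std::once_flag init_flag;

// Safe to call from any thread: exactly one caller runs the initializer,
// and every other caller blocks until it has finished.
void ensure_initialized() {
    std::call_once(init_flag, fake_py_initialize);
    assert(init_done.load());  // never observed half-initialized
}

// Calls ensure_initialized() from n threads and returns how often the
// initializer actually ran.
int run_from_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(ensure_initialized);
    for (auto& t : threads)
        t.join();
    return init_calls.load();
}
```

Unlike a plain flag, std::call_once makes late arrivals wait for the initializer to complete, so no thread proceeds against a half-initialized resource.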

It's not thread safe to modify a static variable, but initializing a static variable is thread safe. So you can do:
void my_py_init() {
static bool x = (Py_Initialize(), true);
}
That's it. You can now call my_py_init from as many threads as you want and Py_Initialize will only ever get called once.

Py_Initialize is not thread-safe. You can call it from multiple threads only if you know that the Python interpreter has already been initialized, but if you can prove that, it would be silly to call the function.
Indeed, most Python C-API calls are not thread-safe; you need to acquire the Global Interpreter Lock (GIL) in order to interact with the Python interpreter. (See the Python C-API docs for more details. Read it carefully.)
However, as far as I know you cannot use the standard API to acquire the GIL until the interpreter has been initialized. So if you have multiple threads, any of which might initialize the same Python interpreter, you would need to protect the calls to Py_Initialize with your own mutex. You might well be better off doing the initialization once before you start up any threads, if that is possible with your program logic.
The code you cite:
static int initialized = 0;
void Initialize_If_Necessary()
{
if (initialized)
return;
initialized = 1;
// Do the initialization only once
}
is clearly not threadsafe in any language, even if initialized is an atomic type. Suppose two threads were simultaneously executing this code before any initialization happened: both of them see initialized as false, so both of them proceed with the initialization. (If you don't have two cores, you could imagine that the first thread is preempted between the test of initialized and the assignment.)
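A sketch of one way to close the window between the test and the assignment, assuming C++11 and a std::atomic<bool> flag: exchange() performs the test and the set as a single atomic step. The init_runs counter is a hypothetical stand-in for the real one-time work.

```cpp
#include <atomic>
#include <thread>
#include <vector>

std::atomic<bool> initialized{false};
std::atomic<int> init_runs{0};  // how many threads ran the "initialization"

void initialize_if_necessary() {
    // exchange() stores true and returns the previous value in one atomic
    // step, so only the very first caller sees false.
    if (initialized.exchange(true))
        return;   // another thread won; it may not have finished yet
    ++init_runs;  // stand-in for the real one-time initialization
}

// Calls initialize_if_necessary() from n threads and returns how many of
// them performed the initialization.
int run_from_threads(int n) {
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i)
        threads.emplace_back(initialize_if_necessary);
    for (auto& t : threads)
        t.join();
    return init_runs.load();
}
```

This elects exactly one initializing thread, but the losing threads return immediately rather than waiting for initialization to finish, so a mutex or std::call_once is usually the better fit when callers go on to use the initialized resource.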

Modifying a static variable from multiple threads without synchronization is not safe. If the compiler keeps the variable in a register, each core works on its own copy, because registers are private to a core: a write made by one thread is then not visible to a thread running on another core, which sees completely different data in its own register.

The first code sample is the typical starting point for what is referred to as 'lazy-initialisation'. It's useful for guaranteeing once-only initialisation of "expensive objects"; but doing so only if needed just before any use of the object.
That specific example doesn't have any serious problems, but it's an oversimplification. And when you look more holistically at lazy-initialisation, you'll see that multi-threaded lazy-initialisation is not a good idea.
The concept of "Thread Safety" goes way beyond just a single variable (static or otherwise). You need to step back and consider things happening to the same [1] resources (memory, objects, files, ...) at the same time.
[1] Different instances of the same class are not the same thing; but their static members are.
Consider the following extract from your second example.
if (initialized)
return;
initialized = 1;
// a bunch of other stuff
In the first 3 lines, there's no serious harm if multiple threads execute that code approximately concurrently. Some threads might return early; others might be a little "too quick" and all perform the task of setting initialized = 1;. However, that wouldn't be a concern, since no matter how many threads set the shared variable, the net effect is always the same.
The problem comes in with the fourth line. The one almost nonchalantly brushed aside as "a bunch of other stuff". That "other stuff" is the really critical code, because if it's possible for initialized = 1; to be called multiple times, you need to consider the impact of calling "other stuff" multiple times and concurrently.
Now, in the unlikely event you satisfy yourself that "other stuff" can be called multiple times, there's another concern...
Consider the client code that might be using Python.
Py_Initialize();
//use Python
If two threads call the above simultaneously, with one 'returning early' and the other actually performing the initialisation, then the 'early-returning thread' would start (or try to start) using Python before it's fully initialised!
As a bit of a hack, you might try blocking at the if (initialized) line for the duration of the initialisation process. But this is undesirable for 2 reasons:
Multiple threads are likely to be stuck waiting in the early stages of their processing.
Even after initialisation is complete you'd have a small (but totally wasteful) overhead of checking the lock each time you 'lazy-initialise' the Python framework.
Conclusion
Lazy-initialisation has its uses. But you're much better off not trying to perform the lazy initialisation from multiple threads. Rather have a "safe thread" (main thread is usually good enough) that can perform the lazy-initialisation before even creating any threads that would try to use whatever has been initialised. Then you won't have to worry about the thread-safety at all.

PyEval_InitThreads in Python 3: How/when to call it? (the saga continues ad nauseam)

Basically there seems to be massive confusion/ambiguity over when exactly PyEval_InitThreads() is supposed to be called, and what accompanying API calls are needed. The official Python documentation is unfortunately very ambiguous. There are already many questions on stackoverflow regarding this topic, and indeed, I've personally already asked a question almost identical to this one, so I won't be particularly surprised if this is closed as a duplicate; but consider that there seems to be no definitive answer to this question. (Sadly, I don't have Guido Van Rossum on speed-dial.)
Firstly, let's define the scope of the question here: what do I want to do? Well... I want to write a Python extension module in C that will:
Spawn worker threads using the pthread API in C
Invoke Python callbacks from within these C threads
Okay, so let's start with the Python docs themselves. The Python 3.2 docs say:
void PyEval_InitThreads()
Initialize and acquire the global interpreter lock. It should be
called in the main thread before creating a second thread or engaging
in any other thread operations such as PyEval_ReleaseThread(tstate).
It is not needed before calling PyEval_SaveThread() or
PyEval_RestoreThread().
So my understanding here is that:
Any C extension module which spawns threads must call
PyEval_InitThreads() from the main thread before any other threads
are spawned
Calling PyEval_InitThreads locks the GIL
So common sense would tell us that any C extension module which creates threads must call PyEval_InitThreads(), and then release the Global Interpreter Lock. Okay, seems straightforward enough. So prima facie, all that's required would be the following code:
PyEval_InitThreads(); /* initialize threading and acquire GIL */
PyEval_ReleaseLock(); /* Release GIL */
Seems easy enough... but unfortunately, the Python 3.2 docs also say that PyEval_ReleaseLock has been deprecated. Instead, we're supposed to use PyEval_SaveThread in order to release the GIL:
PyThreadState* PyEval_SaveThread()
Release the global interpreter lock (if it has been created and thread
support is enabled) and reset the thread state to NULL, returning the
previous thread state (which is not NULL). If the lock has been
created, the current thread must have acquired it.
Er... okay, so I guess a C extension module needs to say:
PyEval_InitThreads();
PyThreadState* st = PyEval_SaveThread();
Indeed, this is exactly what this stackoverflow answer says. Except when I actually try this in practice, the Python interpreter immediately seg-faults when I import the extension module. Nice.
Okay, so now I'm giving up on the official Python documentation and turning to Google. So, this random blog claims all you need to do from an extension module is to call PyEval_InitThreads(). Of course, the documentation claims that PyEval_InitThreads() acquires the GIL, and indeed, a quick inspection of the source code for PyEval_InitThreads() in ceval.c reveals that it does indeed call the internal function take_gil(PyThreadState_GET());
So PyEval_InitThreads() definitely acquires the GIL. I would think then that you would absolutely need to somehow release the GIL after calling PyEval_InitThreads(). But how? PyEval_ReleaseLock() is deprecated, and PyEval_SaveThread() just inexplicably seg-faults.
Okay... so maybe for some reason which is currently beyond my understanding, a C extension module doesn't need to release the GIL. I tried that... and, as expected, as soon as another thread attempts to acquire the GIL (using PyGILState_Ensure), the program hangs from a deadlock. So yeah... you really do need to release the GIL after calling PyEval_InitThreads().
So again, the question is: how do you release the GIL after calling PyEval_InitThreads()?
And more generally: what exactly does a C-extension module have to do to be able to safely invoke Python code from worker C-threads?

Your understanding is correct: invoking PyEval_InitThreads does, among other things, acquire the GIL. In a correctly written Python/C application, this is not an issue because the GIL will be unlocked in time, either automatically or manually.
If the main thread goes on to run Python code, there is nothing special to do, because the Python interpreter will automatically relinquish the GIL periodically (after a number of bytecode instructions in older versions, after a time interval since Python 3.2), allowing another thread to acquire it, which will relinquish it again, and so on. Additionally, whenever Python is about to invoke a blocking system call, e.g. to read from the network or write to a file, it will release the GIL around the call.
The original version of this answer pretty much ended here. But there is one more thing to take into account: the embedding scenario.
When embedding Python, the main thread often initializes Python and goes on to execute other, non-Python-related tasks. In that scenario there is nothing that will automatically release the GIL, so this must be done by the thread itself. That is in no way specific to the code that calls PyEval_InitThreads; it is expected of all Python/C code invoked with the GIL acquired.
For example, the main() might contain code like this:
Py_Initialize();
PyEval_InitThreads();
Py_BEGIN_ALLOW_THREADS
/* ... call the non-Python part of the application here ... */
Py_END_ALLOW_THREADS
Py_Finalize();
If your code creates threads manually, they need to acquire the GIL before doing anything Python-related, even as simple as Py_INCREF. To do so, use the following:
// Acquire the GIL
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
/* ... call Python code here ... */
// Release the GIL. No Python API allowed beyond this point.
PyGILState_Release(gstate);
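To make the hand-off concrete without the Python headers, here is a toy model only: a flag guarded by a mutex stands in for the GIL, and the class ToyGil and function worker_gets_gil are hypothetical names, not part of the Python/C API. It models a main thread that holds the lock after initialization and a worker that can only proceed once the main thread has released it. The real PyGILState_Ensure() blocks instead of failing; the non-blocking try_acquire() is an assumption made so the sketch terminates in both cases.

```cpp
#include <mutex>
#include <thread>

// Toy model only: a flag guarded by a mutex stands in for the GIL.
class ToyGil {
public:
    // Returns true if the lock was free and is now held by the caller.
    bool try_acquire() {
        std::lock_guard<std::mutex> guard(m_);
        if (held_)
            return false;
        held_ = true;
        return true;
    }

    void release() {
        std::lock_guard<std::mutex> guard(m_);
        held_ = false;
    }

private:
    std::mutex m_;
    bool held_ = false;
};

// Models one worker thread attempting to enter Python once.
// Returns true if the worker managed to take the toy GIL.
bool worker_gets_gil(bool main_releases_first) {
    ToyGil gil;
    gil.try_acquire();            // main thread holds the lock after init
    if (main_releases_first)
        gil.release();            // models Py_BEGIN_ALLOW_THREADS in main

    bool worker_ok = false;
    std::thread worker([&] {
        if (gil.try_acquire()) {  // models PyGILState_Ensure()
            worker_ok = true;     // "call Python code here"
            gil.release();        // models PyGILState_Release()
        }
    });
    worker.join();
    return worker_ok;
}
```

In this model worker_gets_gil(true) succeeds and worker_gets_gil(false) does not, which corresponds to the worker hanging in the real API when the main thread never lets go of the GIL.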

There are two ways to do multithreading with the Python/C API.
1. Execution of different threads with the same interpreter: we can start one Python interpreter and share it across the different threads.
The code is as follows.
int main() {
//initialize Python
Py_Initialize();
PyRun_SimpleString("from time import time,ctime\n"
"print('In Main, Today is',ctime(time()))\n");
//to Initialize and acquire the global interpreter lock
PyEval_InitThreads();
//release the lock
PyThreadState *_save;
_save = PyEval_SaveThread();
// Create threads.
for (int i = 0; i<MAX_THREADS; i++)
{
hThreadArray[i] = CreateThread
//(...
MyThreadFunction, // thread function name
//...)
} // End of main thread creation loop.
// Wait until all threads have terminated.
//...
//Close all thread handles and free memory allocations.
//...
//end python here
//but need to check for GIL here too
PyEval_RestoreThread(_save);
Py_Finalize();
return 0;
}
//the thread function
DWORD WINAPI MyThreadFunction(LPVOID lpParam)
{
//non Pythonic activity
//...
//check for the state of Python GIL
PyGILState_STATE gilState;
gilState = PyGILState_Ensure();
//execute Python here
PyRun_SimpleString("from time import time,ctime\n"
"print('In Thread Today is',ctime(time()))\n");
//release the GIL
PyGILState_Release(gilState);
//other non Pythonic activity
//...
return 0;
}
2. Alternatively, we can run a Python interpreter in the main thread and give each thread its own sub-interpreter. Every thread then runs with its own separate, independent versions of all imported modules, including the fundamental modules: builtins, __main__ and sys.
The code is as follows.
int main()
{
// Initialize the main interpreter
Py_Initialize();
// Initialize and acquire the global interpreter lock
PyEval_InitThreads();
// Release the lock
PyThreadState *_save;
_save = PyEval_SaveThread();
// create threads
for (int i = 0; i<MAX_THREADS; i++)
{
// Create the thread to begin execution on its own.
hThreadArray[i] = CreateThread
//(...
MyThreadFunction, // thread function name
//...); // returns the thread identifier
} // End of main thread creation loop.
// Wait until all threads have terminated.
WaitForMultipleObjects(MAX_THREADS, hThreadArray, TRUE, INFINITE);
// Close all thread handles and free memory allocations.
// ...
//end python here
//but need to check for GIL here too
//re capture the lock
PyEval_RestoreThread(_save);
//end python interpreter
Py_Finalize();
return 0;
}
//the thread functions
DWORD WINAPI MyThreadFunction(LPVOID lpParam)
{
// Non Pythonic activity
// ...
//create a new interpreter
PyEval_AcquireLock(); // acquire lock on the GIL
PyThreadState* pThreadState = Py_NewInterpreter();
assert(pThreadState != NULL); // check for failure
PyEval_ReleaseThread(pThreadState); // release the GIL
// switch in current interpreter
PyEval_AcquireThread(pThreadState);
//execute python code
PyRun_SimpleString("from time import time,ctime\n" "print()\n"
"print('Today is',ctime(time()))\n");
// release current interpreter
PyEval_ReleaseThread(pThreadState);
//now to end the interpreter
PyEval_AcquireThread(pThreadState); // lock the GIL
Py_EndInterpreter(pThreadState);
PyEval_ReleaseLock(); // release the GIL
// Other non Pythonic activity
return 0;
}
It is necessary to note that the Global Interpreter Lock still persists: in spite of giving an individual interpreter to each thread, when it comes to Python execution we can still run only one thread at a time. The GIL is unique to the process, so even with a separate sub-interpreter per thread, we cannot have simultaneous execution of threads.
Sources: Executing a Python interpreter in the main thread and, to each thread we can give its own sub interpreter
Multi threading tutorial (msdn)

I have seen symptoms similar to yours: deadlocks if I only call PyEval_InitThreads(), because my main thread never calls anything from Python again, and segfaults if I unconditionally call something like PyEval_SaveThread(). The symptoms depend on the version of Python and on the situation: I am developing a plug-in that embeds Python for a library that can be loaded as part of a Python extension. The code needs therefore to run independent of whether it is loaded by Python as main.
The following worked for me with both python2.7 and python3.4, and with my library running within Python and outside of Python. In my plug-in init routine, which is executed in the main thread, I run:
Py_InitializeEx(0);
if (!PyEval_ThreadsInitialized()) {
PyEval_InitThreads();
PyThreadState* mainPyThread = PyEval_SaveThread();
}
(mainPyThread is actually some static variable, but I don't think that matters as I never need to use it again).
Then I create threads using pthreads, and in each function that needs to access the Python API, I use:
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
// Python C API calls
PyGILState_Release(gstate);

To quote above:
The short answer: you shouldn't care about releasing the GIL after calling PyEval_InitThreads...
Now, for a longer answer:
I'm limiting my answer to Python extensions (as opposed to embedding Python). If we are only extending Python, then any entry point into your module is from Python. This by definition means that we don't have to worry about calling a function from a non-Python context, which makes things a bit simpler.
If threads have NOT been initialized, then we know there is no GIL (no threads == no need for locking), and thus "It is not safe to call this function when it is unknown which thread (if any) currently has the global interpreter lock" does not apply.
if (!PyEval_ThreadsInitialized())
{
PyEval_InitThreads();
}
After calling PyEval_InitThreads(), a GIL is created and assigned... to our thread, which is the thread currently running Python code. So all is good.
Now, as far as our own launched worker "C"-threads, they will need to ask for the GIL before running relevant code: so their common methodology is as follows:
// Do only non-Python things up to this point
PyGILState_STATE state = PyGILState_Ensure();
// Do Python-things here, like PyRun_SimpleString(...)
PyGILState_Release(state);
// ... and now back to doing only non-Python things
We don't have to worry about deadlock any more than with normal usage of extensions. When we entered our function, we had control over Python, so either we were not using threads (thus, no GIL), or the GIL was already assigned to us. When we give control back to the Python run-time by exiting our function, the normal processing loop will check the GIL and hand control off as appropriate to other requesting objects, including our worker threads via PyGILState_Ensure().
All of this the reader probably already knows. However, the "proof is in the pudding". I've posted a very-minimally-documented example that I wrote today to learn for myself what the behavior actually was, and that things work properly. Sample Source Code on GitHub
I was learning several things with the example, including CMake integration with Python development, SWIG integration with both of the above, and Python behaviors with extensions and threads. Still, the core of the example allows you to:
Load the module -- 'import annoy'
Load zero or more worker threads which do Python things -- 'annoy.annoy(n)'
Clear any worker threads -- 'annoy.annoy(0)'
Provide thread cleanup (on Linux) at application exit
... and all of this without any crashes or segfaults. At least on my system (Ubuntu Linux w/ GCC).

The suggestion to call PyEval_SaveThread works:
PyEval_InitThreads();
PyThreadState* st = PyEval_SaveThread();
However, to prevent a crash when the module is imported, ensure that the Python API calls that perform the import are protected using PyGILState_Ensure and PyGILState_Release,
e.g.
PyGILState_STATE gstate = PyGILState_Ensure();
PyObject *pyModule_p = PyImport_Import(pyModuleName_p);
PyGILState_Release(gstate);

I am confused by this issue too. The following code works by coincidence.
Py_InitializeEx(0);
if (!PyEval_ThreadsInitialized()) {
PyEval_InitThreads();
PyThreadState* mainPyThread = PyEval_SaveThread();
}
My main thread does some Python runtime initialization work and creates other pthreads to handle tasks. I have a better workaround for this. In the main thread:
if (!PyEval_ThreadsInitialized()){
PyEval_InitThreads();
}
// other code
while(alive) {
Py_BEGIN_ALLOW_THREADS
// sleep or other blocking code
Py_END_ALLOW_THREADS
}

You don't need to call that in your extension modules. That's for initializing the interpreter which has already been done if your C-API extension module is being imported. This interface is to be used by embedding applications.
When is PyEval_InitThreads meant to be called?

Embedding python in multithreaded C application

I'm embedding the python interpreter in a multithreaded C application and I'm a little confused as to what APIs I should use to ensure thread safety.
From what I gathered, when embedding python it is up to the embedder to take care of the GIL lock before calling any other Python C API call. This is done with these functions:
gstate = PyGILState_Ensure();
// do some python api calls, run python scripts
PyGILState_Release(gstate);
But this alone doesn't seem to be enough. I still got random crashes since it doesn't seem to provide mutual exclusion for the Python APIs.
After reading some more docs I also added:
PyEval_InitThreads();
right after the call to Py_IsInitialized() but that's where the confusing part comes. The docs state that this function:
Initialize and acquire the global interpreter lock
This suggests that when this function returns, the GIL is supposed to be locked and should be unlocked somehow. But in practice this doesn't seem to be required. With this line in place my multithreaded app worked perfectly and mutual exclusion was maintained by the PyGILState_Ensure/Release functions.
When I tried adding PyEval_ReleaseLock() after PyEval_InitThreads(), the app deadlocked pretty quickly in a subsequent call to PyImport_ExecCodeModule().
So what am I missing here?

I had exactly the same problem and it is now solved by using PyEval_SaveThread() immediately after PyEval_InitThreads(), as you suggest above. However, my actual problem was that I used PyEval_InitThreads() after Py_Initialize(), which then caused PyGILState_Ensure() to block when called from different, subsequent native threads. In summary, this is what I do now:
There is global variable:
static int gil_init = 0;
From a main thread load the native C extension and start the Python interpreter:
Py_Initialize();
From multiple other threads my app concurrently makes a lot of calls into the Python/C API:
if (!gil_init) {
gil_init = 1;
PyEval_InitThreads();
PyEval_SaveThread();
}
state = PyGILState_Ensure();
// Call Python/C API functions...
PyGILState_Release(state);
From the main thread stop the Python interpreter:
Py_Finalize();
All other solutions I've tried either caused random Python segfaults or deadlock/blocking using PyGILState_Ensure().
The Python documentation really should be more clear on this and at least provide an example for both the embedding and extension use cases.

Eventually I figured it out.
After
PyEval_InitThreads();
You need to call
PyEval_SaveThread();
which will properly release the GIL for the main thread.

Note that the if (!gil_init) { code in @forman's answer runs only once, so it can just as well be done in the main thread, which allows us to drop the flag (gil_init would properly have to be atomic or otherwise synchronized).
PyEval_InitThreads() is meaningful only in CPython 3.6 and older, and has been deprecated in CPython 3.9, so it has to be guarded with a macro.
Given all this, what I am currently using is the following:
In the main thread, run all of
Py_Initialize();
PyEval_InitThreads(); // only on Python 3.6 or older!
/* tstate = */ PyEval_SaveThread(); // maybe save the return value if you need it later
Now, whenever you need to call into Python, do
state = PyGILState_Ensure();
// Call Python/C API functions...
PyGILState_Release(state);
Finally, from the main thread, stop the Python interpreter
PyGILState_Ensure(); // PyEval_RestoreThread(tstate); seems to work just as well
Py_Finalize();

Having a multi-threaded C app trying to communicate from multiple threads to multiple Python threads of a single CPython instance looks risky to me.
As long as only one C thread communicates with Python, you should not have to worry about locking, even if the Python application is multi-threaded.
If you need multiple python threads you can set the application up this way and have multiple C threads communicate via a queue with that single C thread that farms them out to multiple Python threads.
An alternative that might work for you is to have multiple CPython instances one for each C thread that needs it (of course communication between Python programs should be via the C program).
Another alternative might be the Stackless Python interpreter. As far as I know, Stackless still has a GIL (its tasklets are scheduled cooperatively), so I am not sure what problems you would run into binding it to multiple threads. Stackless was a drop-in replacement for my (single-threaded) C application.
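The queue arrangement described above (several C threads handing work to one thread that alone talks to Python) could be sketched as follows. This is an illustration under assumptions, not a definitive implementation: PythonGateway and all_tasks_ran_on_one_thread are hypothetical names, and the submitted tasks are plain C++ callables standing in for code that would call the Python/C API.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// One dedicated thread executes every task; in a real program this would be
// the only thread that ever touches the Python C API.
class PythonGateway {
public:
    PythonGateway() : worker_([this] { run(); }) {}

    ~PythonGateway() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();  // the worker drains remaining tasks before exiting
    }

    // Called from any thread: enqueue work for the gateway thread.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty())
                    return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // runs outside the lock, always on the gateway thread
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist first
};

// Usage sketch: several producer threads submit tasks; returns true if every
// task ran and all of them ran on the same (gateway) thread.
bool all_tasks_ran_on_one_thread(int producers, int tasks_each) {
    int executed = 0;
    bool same_thread = true;
    std::thread::id first;
    {
        PythonGateway gateway;
        std::vector<std::thread> threads;
        for (int p = 0; p < producers; ++p) {
            threads.emplace_back([&] {
                for (int i = 0; i < tasks_each; ++i) {
                    gateway.submit([&] {
                        // Only the gateway thread touches these variables.
                        if (executed++ == 0)
                            first = std::this_thread::get_id();
                        else if (first != std::this_thread::get_id())
                            same_thread = false;
                    });
                }
            });
        }
        for (auto& t : threads)
            t.join();
    }  // the gateway destructor drains the queue and joins its thread
    return same_thread && executed == producers * tasks_each;
}
```

Because every task runs on the gateway thread, the producers never need to reason about the GIL themselves; the cost is that all Python work is serialized through one queue.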

Python C API from C++ app - know when to lock

I am trying to write a C++ class that calls Python methods of a class that does some I/O operations (file, stdout) at once. The problem I have run into is that my class is called from different threads: sometimes the main thread, sometimes others. Obviously I tried to apply the approach for Python calls in multi-threaded native applications. Basically everything starts from PyEval_AcquireLock and PyEval_ReleaseLock, or just global locks. According to the documentation here, when the current thread already holds the lock, a deadlock ensues. When my class is called from the main thread or another one that blocks Python execution, I get a deadlock.
Python> Cfunc1() - C++ func that creates threads internally which lead to calls in "my class",
It gets stuck on PyEval_AcquireLock; obviously Python is already locked, i.e. waiting for the C++ Cfunc1 call to complete... It completes fine if I omit those locks. It also completes fine when the Python interpreter is ready for the next user command, i.e. when the thread is calling funcs in the background, not inside of a native call.
I am looking for a workaround. I need to distinguish whether or not taking the global lock is allowed, i.e. Python is not locked and ready to receive the next command... I tried PyGILState_Ensure, but unfortunately I see a hang.
Any known API or solution for this ?
(Python 2.4)

Unless you have wrapped your C++ code quite peculiarly, when any Python thread calls into your C++ code, the GIL is held. You may release it in your C++ code (if you want to do some time-consuming task that doesn't require any Python interaction), and then you will have to acquire it again when you want to do any Python interaction -- see the docs: if you're just using the good old C API, there are macros for that, and the recommended idiom is
Py_BEGIN_ALLOW_THREADS
...Do some blocking I/O operation...
Py_END_ALLOW_THREADS
the docs explain:
The Py_BEGIN_ALLOW_THREADS macro opens
a new block and declares a hidden
local variable; the
Py_END_ALLOW_THREADS macro closes the
block. Another advantage of using
these two macros is that when Python
is compiled without thread support,
they are defined empty, thus saving
the thread state and GIL
manipulations.
So you just don't have to acquire the GIL (and shouldn't) until after you've explicitly released it (ideally with that macro) and need to interact with Python in any way again. (Where the docs say "some blocking I/O operation", it could actually be any long-running operation with no Python interaction whatsoever).
