Python C API from C++ app - know when to lock - python

I am trying to write a C++ class that calls Python methods of a class that does some I/O operations (file, stdout) at once. The problem I have ran into is that my class is called from different threads: sometimes main thread, sometimes different others. Obviously I tried to apply the approach for Python calls in multi-threaded native applications. Basically everything starts from PyEval_AcquireLock and PyEval_ReleaseLock or just global locks. According to the documentation here when a thread is already locked a deadlock ensues. When my class is called from the main thread or other one that blocks Python execution I have a deadlock.
Python> Cfunc1() - C++ func that creates threads internally which lead to calls in "my class",
It stuck on PyEval_AcquireLock, obviously the Python is already locked, i.e. waiting for C++ Cfunc1 call to complete... It completes fine if I omit those locks. Also it completes fine when Python interpreter is ready for the next user command, i.e. when thread is calling funcs in the background - not inside of a native call
I am looking for a workaround. I need to distinguish whether or not the global lock is allowed, i.e. Python is not locked and ready to receive the next command... I tried PyGIL_Ensure, unfortunately I see hang.
Any known API or solution for this ?
(Python 2.4)

Unless you have wrapped your C++ code quite peculiarly, when any Python thread calls into your C++ code, the GIL is held. You may release it in your C++ code (if you want to do some consuming task that doesn't require any Python interaction), and then will have to acquire it again when you want to do any Python interaction -- see the docs: if you're just using the good old C API, there are macros for that, and the recommended idiom is
Py_BEGIN_ALLOW_THREADS
...Do some blocking I/O operation...
Py_END_ALLOW_THREADS
the docs explain:
The Py_BEGIN_ALLOW_THREADS macro opens
a new block and declares a hidden
local variable; the
Py_END_ALLOW_THREADS macro closes the
block. Another advantage of using
these two macros is that when Python
is compiled without thread support,
they are defined empty, thus saving
the thread state and GIL
manipulations.
So you just don't have to acquire the GIL (and shouldn't) until after you've explicitly released it (ideally with that macro) and need to interact with Python in any way again. (Where the docs say "some blocking I/O operation", it could actually be any long-running operation with no Python interaction whatsoever).

Related

How to manipulate the GIL when many threads need to execute?

My understanding is that the typical GIL manipulations involve, e.g., blocking I/O operations. Hence one would want to release the lock before the I/O operation and reacquire it once it has completed.
I'm currently facing a different scenario with a C extension: I am creating X windows that are exposed to Python via the Canvas class. When the method show() is called on an instance, a new UI thread is started using PyThreads (with a call to PyThread_start_new_thread). This new thread is responsible for drawing on the X window, using the Python code specified in the on_draw method of a subclass of Canvas. A pure C event loop is started in the main thread that simply checks for events on the X window and, for the time being, only captures the WM_DELETE_EVENT.
So I have potentially many threads (one for each X window) that want to execute Python code and the main thread that does not execute any Python code at all.
How do I release/acquire the GIL in order to allow the UI threads to get into the interpreter orderly?
The rule is easy: you need to hold the GIL to access Python machinery (any API starting with Py<...> and any PyObject).
So, you can release it whenever you don't need any of that.
Anything further than this is the fundamental problem of locking granularity: potential benefits vs locking overhead. There was an experiment for Py 1.4 to replace the GIL with more granular locks that failed exactly because the overhead proved prohibitive.
That's why it's typically released for code chunks involving call(s) to extental facilities that can take arbitrary time (especially if they involve waiting for external events) -- if you don't release the lock, Python will be just idling during this time.
Heeding this rule, you will get to your goal automatically: whenever a thread can't proceed further (whether it's I/O, signal from another thread, or even so much as a time.sleep() to avoid a busy loop), it will release the lock and allow other threads to proceed in its stead. The GIL assigning mechanism strives to be fair (see issue8299 for exploration on how fair it is), releasing the programmer from bothering about any bias stemming solely from the engine.
I think the problem stems from the fact that, in my opinion, the official documentation is a bit ambiguous on the meaning of Non-Python created threads. Quoting from it:
When threads are created using the dedicated Python APIs (such as the threading module), a thread state is automatically associated to them and the code showed above is therefore correct. However, when threads are created from C (for example by a third-party library with its own thread management), they don’t hold the GIL, nor is there a thread state structure for them.
I have highlighted in bold the parts that I find off-putting. As I have stated in the OP, I am calling PyThread_start_new_thread. Whilst this creates a new thread from C, this function is not part of a third-party library, but of the dedicated Python (C) APIs. Based on this assumption, I ruled out that I actually needed to use the PyGILState_Ensure/PyGILState_Release paradigm.
As far as I can tell from what I've seen with my experiments, a thread created from C with (just) PyThread_start_new_thread should be considered as a non-Python created thread.

Safe to call multiprocessing from a thread in Python?

According to
https://github.com/joblib/joblib/issues/180, and Is there a safe way to create a subprocess from a thread in python?
the Python multiprocessing module does not allow use from within threads. Is this true?
My understanding is that its fine to fork from threads, as long as you
aren't holding a threading.Lock when you do so (in the current thread? anywhere in the process?). However, Python's documentation is silent on whether threading.Lock objects are safely shared after a fork.
There's also this: locks shared from the logging module causes issues with fork. https://bugs.python.org/issue6721
I'm not sure how this issue arises. It sounds like the state of any locks in the process are copied into the child process when the current thread forks (which seems like a design error and certain to deadlock). If so, does using multiprocessing really provide any protection against this (since I'm free to create my multiprocessing.Pool after threading.Lock is created and entered by other threads, and after threads have started that using the not-fork-safe logging module) -- the multiprocessing module docs are also silent about whether multiprocessing.Pools should be allocated before Locks.
Does replacing threading.Lock with multiprocessing.Lock everywhere avoid this issue and allow us to safely combine threads and forks?
It sounds like the state of any locks in the process are copied into the child process when the current thread forks (which seems like a design error and certain to deadlock).
It is not a design error, rather, fork() predates single-process multithreading. The state of all locks is copied into the child process because they're just objects in memory; the entire address-space of the process is copied as is in fork. There are only bad alternatives: either copy all threads over fork, or deny forking in multithreaded application.
Therefore, fork()ing in a multithreading program was never the safe thing to do, unless then followed by execve() or exit() in the child process.
Does replacing threading.Lock with multiprocessing.Lock everywhere avoid this issue and allow us to safely combine threads and forks?
No. Nothing makes it safe to combine threads and forks, it cannot be done.
The problem is that when you have multiple threads in a process, after fork() system call you cannot continue safely running the program in POSIX systems.
For example, Linux manuals fork(2):
After a fork(2) in a multithreaded program, the child can safely call
only async-signal-safe functions (see signal(7)) until such time as it
calls execve(2).
I.e. it is OK to fork() in a multithreaded program and then only call async-signal-safe C functions (which is a rather limited subset of C functions), until the child process has been replaced with another executable!
Unsafe C function calls in child processes are then for example
malloc for dynamic memory allocation
any <stdio.h> functions for formatted input
most of the pthread_* functions required for thread state handling, including creation of new threads...
Thus there is very little what the child process can actually safely do. Unfortunately CPython core developers have been downplaying the problems caused by this. Even now the documentation says:
Note that safely forking a multithreaded process is
problematic.
Quite an euphemism for "impossible".
It is safe to use multiprocessing from a Python process that has multiple threads of control provided that you're not using the fork start method; in Python 3.4+ it is now possible to change the start method. In previous Python versions including all of Python 2, the POSIX systems always behaved as if fork was specified as the start method; this would result in undefined behaviour.
The problems are not limited to just threading.Lock objects but all locks held by the C standard library, the C extensions etc. What is worse that most of the time people would say "it works for me"... until it stops from working.
There were even a cases where a seemingly single-threading Python program is actually multithreading in MacOS X, causing failures and deadlocks upon using multiprocessing.
Another problem is that all opened file handles, their use, shared sockets might behave oddly in programs that forks, but that would be the case even in single-threaded programs.
TL;DR: using multiprocessing in multithreaded programs, with C extensions, with opened sockets etc:
fine in 3.4+ & POSIX if you explicitly specify a starting method that is not fork,
fine in Windows because it doesn't support forking;
in Python 2 - 3.3 on POSIX: you'll mostly shoot yourself in the foot.

Multithreading with Python and C api

I have a C++ program that uses the C api to use a Python library of mine.
Both the Python library AND the C++ code are multithreaded.
In particular, one thread of the C++ program instantiates a Python object that inherits from threading.Thread. I need all my C++ threads to be able to call methods on that object.
From my very first tries (I naively just instantiate the object from the main thread, then wait some time, then call the method) I noticed that the execution of the Python thread associated with the object just created stops as soon as the execution comes back to the C++ program.
If the execution stays with Python (for example, if I call PyRun_SimpleString("time.sleep(5)");) the execution of the Python thread continues in background and everything works fine until the wait ends and the execution goes back to C++.
I am evidently doing something wrong. What should I do to make both my C++ and Python multithreaded and capable of working with each other nicely? I have no previous experience in the field so please don't assume anything!
A correct order of steps to perform what you are trying to do is:
In the main thread:
Initialize Python using Py_Initialize*.
Initialize Python threading support using PyEval_InitThreads().
Start the C++ thread.
At this point, the main thread still holds the GIL.
In a C++ thread:
Acquire the GIL using PyGILState_Ensure().
Create a new Python thread object and start it.
Release the GIL using PyGILState_Release().
Sleep, do something useful or exit the thread.
Because the main thread holds the GIL, this thread will be waiting to acquire the GIL. If the main thread calls the Python API it may release the GIL from time to time allowing the Python thread to execute for a little while.
Back in the main thread:
Release the GIL, enabling threads to run using PyEval_SaveThread()
Before attempting to use other Python calls, reacquire the GIL using PyEval_RestoreThread()
I suspect that you are missing the last step - releasing the GIL in the main thread, allowing the Python thread to execute.
I have a small but complete example that does exactly that at this link.
You probably do not unlock the Global Interpreter Lock when you callback from python's threading.Thread.
Well, if you are using bare python's C API you have some documentation here, about how to release/acquire the GIL. But while using C++, I must warn you that it might broke down upon any exceptions throwing in your C++ code. See here.
In general any of your C++ function that runs for too long should unlock GIL and lock, whenever it use the C Python API again.

working in python console while executing a boost::python module

How can i keep using the console while executing a process from a boost::python module? I figured i have to use threading but I think I'm missing something.
import pk #my boost::python module from c++
import threading
t = threading.Thread(target=pk.showExample, args=())
t.start()
This executes showExample, which runs a Window rendering 3D content. Now i would like to keep on coding in the python console while this window is running. The example above works to show the Window but fails to keep the console interactive. Any Ideas how to do it? Thanks for any suggestions.
Greetings
Chris
Edit: I also tried to make Threads in the showExample() C++ code, didn't work as well. I probably have to make the console a thread, but I have not a clue how and can't find any helpful examples.
Edit2: to make the example more simple I implemented these c++ methods:
void Example::simpleWindow()
{
int running = GL_TRUE;
glfwInit();
glfwOpenWindow(800,600, 8,8,8,8,24,8, GLFW_WINDOW);
glewExperimental = GL_TRUE;
glewInit();
glEnable(GL_DEPTH_TEST);
glEnable(GL_CULL_FACE);
glCullFace(GL_BACK);
while(running)
{
glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT | GL_STENCIL_BUFFER_BIT);
glfwSwapBuffers();
running = !glfwGetKey(GLFW_KEY_ESC) && gkfwGetWindowParam(GLFW_OPENED);
}
}
void Example::makeWindowThread()
{
boost::thread t(simpleWindow);
t.join();
}
There may be some useless lines of code (it was just copy paste of a part from the real method i want to use.) Both methods are static. If I start interactive console in a thread and start the pk.makeWindowThread() in python, i can't give input anymore. Doesn't work if I put the call of pk.makeWindowThread() in a python thread as well. (I am trying to print something in console while showing the window.
When trying to execute a process while keeping the console interactive, then consider using the subprocess or multiprocessing modules. When doing this within Boost.Python, it is probably more appropriate to execute a process in C++ using execv() family of functions.
When trying to spawn a thread while keeping the console interactive, then one must consider the Global Interpreter Lock (GIL). In short, the GIL is a mutex around the interpreter, preventing parallel operations to be performed on Python objects. Thus, at any point in time, a max of one thread, the one that has acquired the GIL, is allowed to perform operations on Python objects.
For multithreaded Python programs with no C or C++ threads, the CPython interpreter functions as a cooperative scheduler, enabling concurrency. Threads will yield control when Python knows the thread is about to perform a blocking call. For example, a thread will release the GIL within time.sleep(). Additionally, the interpreter will force a thread to yield control after certain criteria have been met. For example, after a thread has executed a certain amount of bytecode operations, the interpreter will force it to yield control, allowing other threads to execute.
C or C++ threads are sometimes referred to as alien threads in the Python documentation. The Python interpreter has no ability to force an alien thread to yield control by releasing the GIL. Therefore, alien threads are responsible for managing the GIL to permit concurrent or parallel execution with Python threads. With this in mind, lets examine some of the C++ code:
void Example::makeWindowThread()
{
boost::thread t(simpleWindow);
t.join();
}
This will spawn a thread, and thread::join() will block, waiting for the t thread to complete execution. If this function is exposed to Python via Boost.Python, then the calling thread will block. As only one Python thread is allowed to be executed at any point in time, the calling thread will own the GIL. Once the calling thread blocks on t.join(), all other Python threads will remain blocked, as the interpreter cannot force the thread to yield control. To enable other Python threads to run, the GIL should be released pre-join, and acquired post-join.
void Example::makeWindowThread()
{
boost::thread t(simpleWindow);
release GIL // allow other python threads to run.
t.join();
acquire GIL // execution is going to occur within the interpreter.
}
However, this will still cause the console to block waiting for the thread to complete execution. Instead, consider spawning the thread and detaching from it via thread::detach(). As the calling thread will no longer block, managing the GIL within Example::makeWindowThread is no longer necessary.
void Example::makeWindowThread()
{
boost::thread(simpleWindow).detach();
}
For more details/examples of managing the GIL, please consider reading this answer for a basic implementation overview, and this answer for a much deeper dive into considerations one must take.
You have two options:
start python with the -i flag, that will cause to drop it to the interactive interperter instead of exiting from the main thread
start an interactive session manually:
import code
code.interact()
The second option is particularily useful if you want to run the interactive session in it's own thread, as some libraries (like PyQt/PySide) don't like it when they arn't started from the main thread:
from code import interact
from threading import Thread
Thread(target=interact, kwargs={'local': globals()}).start()
... # start some mainloop which will block the main thread
Passing local=globals() to interact is necessary so that you have access to the scope of the module, otherwise the interpreter session would only have access to the content of the thread's scope.

Are Python extensions produced by Cython/Pyrex threadsafe?

If not, is there a way I can guarantee thread safety by programming a certain way?
To clarify, when talking about "threadsafe,' I mean Python threads, not OS-level threads.
It all depends on the interaction between your Cython code and Python's GIL, as documented in detail here. If you don't do anything special, Cython-generated code will respect the GIL (as will a C-coded extension that doesn't use the GIL-releasing macros); that makes such code "as threadsafe as Python code" -- which isn't much, but is easier to handle than completely free-threading code (you still need to architect multi-threaded cooperation and synchronization, ideally with Queue instances but possibly with locking &c).
Code that has relinquished the GIL and not yet acquired it back MUST NOT in any way interact with the Python runtime and the objects that the Python runtime uses -- this goes for Cython just as well as for C-coded extensions. The upside of it is of course that such code can run on a separate core (until it needs to sync up or in any way communicate with the Python runtime again, of course).
Python's global interpreter lock means that only one thread can be active in the interpreter at any one time. However, once control is passed out to a C extension another thread can be active within the interpreter. Multiple threads can be created, and nothing prevents a thread from being interrupted within the middle of a critical section. N
on thread-safe code can be implemented within the interpreter, so nothing about code running within the interpreter is inherently thread safe. Code in C or Pyrex modules can still modify data structures that are visible to python code. Native code can, of course, also have threading issues with native data structures.
You can't guarantee thread safety beyond using appropriate design and synchronisation - the GIL on the python interpreter doesn't materially change this.

Categories