Embedding a Python interpreter in a multi-threaded C++ program with pybind11

I'm trying to use pybind11 to make a 3rd-party C++ library call a Python method. The library is multithreaded; each thread creates a Python object and then makes numerous calls to the object's methods.
My problem is that the call to py::gil_scoped_acquire acquire; deadlocks. A minimal example which reproduces the problem is given below. What am I doing wrong?
// main.cpp
#include <pybind11/embed.h>
#include <iostream>
#include <thread>
#include <chrono>
#include <vector>

namespace py = pybind11;
using namespace std::chrono_literals;

class Wrapper
{
public:
    Wrapper()
    {
        py::gil_scoped_acquire acquire;
        auto obj = py::module::import("main").attr("PythonClass")();
        _get_x = obj.attr("get_x");
        _set_x = obj.attr("set_x");
    }

    int get_x()
    {
        py::gil_scoped_acquire acquire;
        return _get_x().cast<int>();
    }

    void set_x(int x)
    {
        py::gil_scoped_acquire acquire;
        _set_x(x);
    }

private:
    py::object _get_x;
    py::object _set_x;
};

void thread_func()
{
    Wrapper w;
    for (int i = 0; i < 10; i++)
    {
        w.set_x(i);
        std::cout << "thread: " << std::this_thread::get_id() << " w.get_x(): " << w.get_x() << std::endl;
        std::this_thread::sleep_for(100ms);
    }
}

int main() {
    py::scoped_interpreter python;

    std::vector<std::thread> threads;
    for (int i = 0; i < 5; ++i)
        threads.push_back(std::thread(thread_func));
    for (auto& t : threads)
        t.join();

    return 0;
}
and the Python code:
# main.py
class PythonClass:
    def __init__(self):
        self._x = 0

    def get_x(self):
        return self._x

    def set_x(self, x):
        self._x = x
Related questions can be found here and here, but they did not help me solve the problem.

I managed to resolve the issue by releasing the GIL in the main thread before starting the worker threads (by adding py::gil_scoped_release release;). For anybody who is interested, the following now works (it also cleans up the Python objects):
#include <pybind11/embed.h>
#include <iostream>
#include <thread>
#include <chrono>
#include <sstream>
#include <vector>

namespace py = pybind11;
using namespace std::chrono_literals;

class Wrapper
{
public:
    Wrapper()
    {
        py::gil_scoped_acquire acquire;
        _obj = py::module::import("main").attr("PythonClass")();
        _get_x = _obj.attr("get_x");
        _set_x = _obj.attr("set_x");
    }

    ~Wrapper()
    {
        // give up the references without touching the reference counts,
        // since the GIL is not held here
        _obj.release();
        _get_x.release();
        _set_x.release();
    }

    int get_x()
    {
        py::gil_scoped_acquire acquire;
        return _get_x().cast<int>();
    }

    void set_x(int x)
    {
        py::gil_scoped_acquire acquire;
        _set_x(x);
    }

private:
    py::object _obj;
    py::object _get_x;
    py::object _set_x;
};

void thread_func(int iteration)
{
    Wrapper w;
    for (int i = 0; i < 10; i++)
    {
        w.set_x(i);
        std::stringstream msg;
        msg << "iteration: " << iteration << " thread: " << std::this_thread::get_id() << " w.get_x(): " << w.get_x() << std::endl;
        std::cout << msg.str();
        std::this_thread::sleep_for(100ms);
    }
}

int main() {
    py::scoped_interpreter python;
    py::gil_scoped_release release; // add this to release the GIL

    std::vector<std::thread> threads;
    for (int i = 0; i < 5; ++i)
        threads.push_back(std::thread(thread_func, 1));
    for (auto& t : threads)
        t.join();

    return 0;
}

Related to @bavaza's answer above, there is a way to self-contain both the initialization and the GIL release in a single class. You have to be careful, as that class is now a singleton (no different from scoped_interpreter), but it's possible. Here's the idea:
#include <pybind11/embed.h>
#include <memory>

namespace py = pybind11;

class PythonWrapper {
public:
    PythonWrapper() : m_interpreter() {
        {
            // Do whatever one-time module/object initialization you want here
            py::object obj = py::module::import("main").attr("PythonClass")(); // Speeds up importing later
        } // obj is destroyed here, while the GIL is still held

        // Last line of constructor releases the GIL
        mp_gil_release = std::make_unique<py::gil_scoped_release>();
    }

private:
    py::scoped_interpreter m_interpreter;
    // Important that this is the LAST member, so it gets destructed first, re-acquiring the GIL
    std::unique_ptr<py::gil_scoped_release> mp_gil_release;
};
This would replace the two objects on the stack in main, leaving the Wrapper class unchanged! And if you wanted to have a true singleton for all your Python calls, this would help there too.
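For illustration, a minimal main() using this wrapper might look like the following sketch (it assumes the includes, Wrapper, and thread_func from bavaza's answer above; nothing else changes):

int main() {
    PythonWrapper python; // starts the interpreter, does the one-time setup, then releases the GIL

    std::vector<std::thread> threads;
    for (int i = 0; i < 5; ++i)
        threads.push_back(std::thread(thread_func, 1));
    for (auto& t : threads)
        t.join();

    return 0;
} // the GIL is re-acquired here, then the interpreter shuts down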
Again, thanks to @bavaza for the original solution. It helped me get my head around the right way to use the scoped locks for my own cross-thread usage.

Python is known to have a Global Interpreter Lock.
So you basically need to write your own Python interpreter from scratch, or download the source code of Python and improve it a lot.
If you are on Linux, you could consider running many Python interpreters (using appropriate syscalls(2), with pipe(7) or unix(7) for interprocess communication) - perhaps one Python process communicating with each of your C++ threads.
What am I doing wrong?
Coding in Python something which should be coded otherwise. Did you consider trying SBCL?
Some libraries (e.g. Tensorflow) can be called from both Python and C++. Maybe you could take inspiration from them...
In practice, if you have just a dozen C++ threads on a powerful Linux machine, you could afford having one Python process per C++ thread. So each C++ thread would have its own companion Python process.
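A very rough sketch of that one-process-per-thread idea, using popen(3) and a hypothetical worker.py that reads one command per line on stdin (the script name and the line protocol are made up for illustration, not something from this post):

#include <cstdio>
#include <thread>
#include <vector>

// Each C++ thread launches and owns its own Python worker process.
void thread_func() {
    // One-way pipe for simplicity; a real design would use pipe(2)/fork(2),
    // a socketpair, or a message queue for bidirectional communication.
    std::FILE* py = popen("python3 worker.py", "w");
    if (!py) return;

    for (int i = 0; i < 10; ++i) {
        std::fprintf(py, "set_x %d\n", i); // line-oriented request protocol
        std::fflush(py);
    }
    pclose(py); // closes the worker's stdin, so it can exit cleanly
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 5; ++i)
        threads.emplace_back(thread_func);
    for (auto& t : threads)
        t.join();
    return 0;
}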
Otherwise, budget several years of work to improve the source code of Python to remove its GIL. You might write a GCC plugin to help you with that task - analyzing and understanding the C code of Python.

Related

How to use future / async in cppyy

I'm trying to use future from the C++ STL via cppyy (a C++-Python binding package).
For example, I can run the following code in C++ (which is adapted from this answer):
#include <future>
#include <thread>
#include <chrono>
#include <iostream>

using namespace std;
using namespace chrono_literals;

int main () {
    promise<int> p;
    future<int> f = p.get_future();

    thread t([&p]() {
        this_thread::sleep_for(10s);
        p.set_value(2);
    });

    auto status = f.wait_for(10ms);
    if (status == future_status::ready) {
        cout << "task is ready" << endl;
    } else {
        cout << "task is running" << endl;
    }

    t.join();
    return 0;
}
A similar implementation of the above in Python is
import cppyy

cppyy.cppdef(r'''
#include <future>
#include <thread>
#include <chrono>
#include <iostream>

using namespace std;

int test () {
    promise<int> p;
    future<int> f = p.get_future();

    thread t([&p]() {
        this_thread::sleep_for(10s);
        p.set_value(2);
    });

    auto status = f.wait_for(10ms);
    if (status == future_status::ready) {
        cout << "task is ready" << endl;
    } else {
        cout << "task is running" << endl;
    }

    t.join();
    return 0;
}
''')

cppyy.gbl.test()
And the above code yields
IncrementalExecutor::executeFunction: symbol '__emutls_v._ZSt15__once_callable' unresolved while linking symbol '__cf_4'!
IncrementalExecutor::executeFunction: symbol '__emutls_v._ZSt11__once_call' unresolved while linking symbol '__cf_4'!
It looks like it's caused by using future in cppyy.
Any solutions to this?
Clang 9's JIT does not support thread-local storage the way modern g++ implements it. I will check again when the ongoing upgrade to Clang 13 is finished, which may resolve this issue.
Otherwise, cppyy mixes fine with threaded code (e.g. the above example runs fine on macOS, where Clang is the system compiler); it is just that, while the JIT has this limitation, any TLS use needs to sit in compiled library code.
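A hedged sketch of that workaround: move the promise/future code into an ordinary compiled shared library and only call it through cppyy, so the TLS lives in compiled code rather than in JITted code. The file and library names below are illustrative, not from the answer:

// libfut.cpp - build e.g. with:
//   g++ -std=c++17 -fPIC -shared -o libfut.so libfut.cpp -pthread
// From Python one would then do something along these lines:
//   import cppyy
//   cppyy.include("fut.h")        # header declaring `int test();`
//   cppyy.load_library("libfut")  # the compiled, TLS-using code
//   cppyy.gbl.test()
#include <future>
#include <thread>
#include <chrono>
#include <iostream>

int test() {
    std::promise<int> p;
    std::future<int> f = p.get_future();

    std::thread t([&p]() {
        std::this_thread::sleep_for(std::chrono::seconds(10));
        p.set_value(2);
    });

    auto status = f.wait_for(std::chrono::milliseconds(10));
    std::cout << (status == std::future_status::ready ? "task is ready" : "task is running") << std::endl;

    t.join();
    return 0;
}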

`threading.local` unexpected behavior with Python embedding

I am embedding Python using the C embedding API. The main thread does
Py_Initialize();
PyEval_InitThreads();
Then I have more threads, created by native code, whose lifetime I do not control. They need to call Python too. So far they seemed to work fine with
PyGILState_STATE gil = PyGILState_Ensure();
// ... calls to Python go here ...
PyGILState_Release(gil);
The problem is that, given this simple setup, I faced issues with Python code that uses threading.local. Imagine a secondary thread S that periodically executes increase_counter:
// initialized once at the beginning of the program to
// an instance of threading.local()
PyObject* threadLocal;
...

void increase_counter()
{
    PyGILState_STATE gil = PyGILState_Ensure();

    // this really does C API calls, but for simplicity I'll write Python
    // if hasattr(threadLocal, "counter"):
    //     threadLocal.counter += 1
    // else:
    //     threadLocal.counter = 1
    // end of Python

    PyGILState_Release(gil);
}
Well, the problem is that in thread S multiple calls to increase_counter don't actually increase anything - hasattr always returns False, and the value of counter for this thread is discarded as soon as PyGILState_Release is called.
It only works correctly in S if the whole body of S is wrapped into:
PyGILState_STATE gilForS = PyGILState_Ensure();
PyThreadState* sPythonThreadState = PyEval_SaveThread();

// rest of the S body, that sometimes calls increase_counter

PyEval_RestoreThread(sPythonThreadState);
PyGILState_Release(gilForS);
I can do this for the purpose of this question, but in the actual product the lifetime of S is not controlled by me (it is a thread-pool thread); I only control increase_counter, so I can't make S run PyEval_SaveThread at the beginning, and I can't ensure PyEval_RestoreThread will be called at the end.
What is the proper way to initialize threads like S so that threading.local would correctly work there?
Full example that reproduces the issue, as requested. It prints "set counter" twice instead of "set counter" followed by "Found counter!". It works as expected if I uncomment the code in async_thread, which I can't do in the real product:
#include <Python.h>
#include <pthread.h>

PyObject *threadLocal;

void *async_thread(void *arg);
void increase_counter();

int main(int argc, char *argv[])
{
    Py_Initialize();
    PyEval_InitThreads();

    PyObject *threading = PyImport_ImportModule("threading");
    PyObject *tlocal = PyObject_GetAttrString(threading, "local");
    threadLocal = PyObject_Call(tlocal, PyTuple_New(0), NULL);

    pthread_t async;
    int err = pthread_create(&async, NULL, async_thread, NULL);
    if (err)
    {
        printf("unable to create thread\n");
        exit(-1);
    }

    PyThreadState* ts = PyEval_SaveThread();
    pthread_join(async, NULL);
    PyEval_RestoreThread(ts);

    Py_Finalize();
    pthread_exit(NULL);
}

void *async_thread(void *arg)
{
    //PyGILState_STATE gil = PyGILState_Ensure();
    for (int i = 0; i < 2; i++)
    {
        increase_counter();
    }
    //PyGILState_Release(gil);
    pthread_exit(NULL);
    return NULL;
}

void increase_counter()
{
    PyGILState_STATE gil = PyGILState_Ensure();
    if (PyObject_HasAttrString(threadLocal, "counter"))
    {
        printf("Found counter!\n");
    }
    else
    {
        PyObject *val = PyLong_FromLong(1);
        PyObject_SetAttrString(threadLocal, "counter", val);
        printf("set counter\n");
    }
    PyGILState_Release(gil);
}

Boost Python issue with converters - static linking

I have a question regarding the code below.
It's an example of how to pass a custom class via shared_ptr to embedded Python code, and it works when Boost is dynamically linked.
Unfortunately, the same code with statically linked Boost fails with the following error message:
"No to_python (by-value) converter found for C++ type: class boost::shared_ptr".
I don't understand why a different kind of linking can affect the type recognition of a registered converter. What am I missing?
Can anybody help me out?
Thanks,
Dominik
Example from here.
#include <boost/shared_ptr.hpp>
#include <boost/make_shared.hpp>
#include <boost/python.hpp>
#include <string>
#include <iostream>

namespace bp = boost::python;

struct Foo{
    Foo(){}
    Foo(std::string const& s) : m_string(s){}
    void doSomething() {
        std::cout << "Foo:" << m_string << std::endl;
    }
    std::string m_string;
};

typedef boost::shared_ptr<Foo> foo_ptr;

BOOST_PYTHON_MODULE(hello)
{
    bp::class_<Foo, foo_ptr>("Foo")
        .def("doSomething", &Foo::doSomething)
    ;
};

int main(int argc, char **argv)
{
    Py_Initialize();
    try {
        PyRun_SimpleString(
            "a_foo = None\n"
            "\n"
            "def setup(a_foo_from_cxx):\n"
            " print 'setup called with', a_foo_from_cxx\n"
            " global a_foo\n"
            " a_foo = a_foo_from_cxx\n"
            "\n"
            "def run():\n"
            " a_foo.doSomething()\n"
            "\n"
            "print 'main module loaded'\n"
        );

        foo_ptr a_cxx_foo = boost::make_shared<Foo>("c++");

        inithello();

        bp::object main = bp::object(bp::handle<>(bp::borrowed(
            PyImport_AddModule("__main__")
        )));

        // pass the reference to a_cxx_foo into python:
        bp::object setup_func = main.attr("setup");
        setup_func(a_cxx_foo);

        // now run the python 'main' function
        bp::object run_func = main.attr("run");
        run_func();
    }
    catch (bp::error_already_set) {
        PyErr_Print();
    }

    Py_Finalize();
    return 0;
}
As far as I understand the documentation about Boost.Python linkage, the conversion registry used for automatic conversion of Python objects into C++ objects is not available when statically linked. I'm facing the same issue, and it's a pity that this is the case. I would have expected at least the required converters to be bundled, but I'm afraid they are not, for some reason.

Calling python object's method from c++

I am trying to achieve the following: passing a Python object to a C++ callback chain (which is typical of many popular C++ libraries). In the C++ code, callbacks pass on objects that carry the necessary information for consecutive callbacks in the cascade/chain.
Here is a small test I wrote: we pass a Python object to a C routine (case 1) and call its method. That works OK. But when I pass the Python object to a C++ object and try to call it "inside" the C++ object, I get a segfault.. :(
Here it goes:
c++ module ("some.cpp"):
#include <stdint.h>
#include <iostream>
#include <Python.h>

/* objective:
 * create c++ objects / routines that accept python objects
 * then call methods of the python objects inside c++
 *
 * python objects (including its variables and methods) could be passed along, for example in c++ callback chains ..
 * .. and in the end we could call a python callback
 *
 * Compile and test as follows:
 * python setup.py build_ext
 * [copy/link some.so where test.py is]
 * python test.py
 *
 */

class testclass {
public:
    testclass(int* i, PyObject* po) {
        std::cerr << "testclass constructor! \n";
        i=i; po=po;
    }
    ~testclass() {}

    void runpo() {
        PyObject* name;
        const char* mname="testmethod";
        name=PyString_FromString(mname);
        std::cerr << "about to run the python method .. \n";
        PyObject_CallMethodObjArgs(po, name, NULL);
        std::cerr << ".. you did it - i will buy you a beer!\n";
    }

public:
    int* i;
    PyObject* po;
};

/* Docstrings */
static char module_docstring[] = "hand-made python module";

/* Available functions */
static PyObject* regi_wrapper(PyObject * self, PyObject * args);
void regi(int* i, PyObject* po);

/* Module specification */
static PyMethodDef module_methods[] = {
    {"regi_wrapper",regi_wrapper, METH_VARARGS, "lets see if we can wrap this sucker"},
    {NULL, NULL, 0, NULL}
};

/* Initialize the module */
PyMODINIT_FUNC initsome(void)
{
    PyObject *m = Py_InitModule3("some", module_methods, module_docstring);
    if (m == NULL)
        return;
    // import_array(); // numpy not required here ..
}

static PyObject* regi_wrapper(PyObject * self, PyObject * args)
{
    int* input_i; // whatever input variable
    PyObject* input_po; // python object
    PyObject* ret; // return variable

    // parse arguments
    if (!PyArg_ParseTuple(args, "iO", &input_i, &input_po)) {
        return NULL;
    }

    // https://stackoverflow.com/questions/16606872/calling-python-method-from-c-or-c-callback
    // Py_INCREF(input_po); // need this, right? .. makes no difference

    /* // seems not to make any difference ..
    PyGILState_STATE gstate;
    gstate = PyGILState_Ensure();
    */

    regi(input_i, input_po);

    // PyGILState_Release(gstate); // .. makes no difference
    // Py_DECREF(input_po); // .. makes no difference

    Py_RETURN_TRUE;
}

void regi(int* i, PyObject* po) {
    // search variable and methods from PyObject "po" and call its methods?
    PyObject* name;
    const char* mname="testmethod";
    testclass* testobj;

    testobj=new testclass(i,po);

    /* [insert // in front of this line to test case 1]
    // ***** this one works! *********
    name=PyString_FromString(mname);
    PyObject_CallMethodObjArgs(po, name, NULL);
    */ // [insert // in front of this line to test case 1]

    // *** I WOULD LIKE THIS TO WORK *** but it gives segfault.. :(
    testobj->runpo(); // [uncomment to test case 2]
}
setup.py:
from distutils.core import setup, Extension
# the c++ extension module
extension_mod = Extension("some", ["some.cpp"])
setup(name = "some", ext_modules=[extension_mod])
test.py:
import some

class sentinel:
    def __init__(self):
        pass

    def testmethod(self):
        print "hello from sentinel.testmethod"
        pass

se=sentinel()
some.regi_wrapper(1,se)
This question seems relevant:
Calling python method from C++ (or C) callback
.. however the answer did not help me.
What am I missing/misunderstanding here (my c++ sucks big time, so I might have missed something obvious) .. ?
Also, some bonus questions:
a) I am familiar with SWIG and SWIG "directors".. however, I would like to use SWIG for the general wrapping of the code but my own custom wrapping for the sort of things described in this question (i.e. without directors). Is there any way to achieve this?
b) Any other suggestions for achieving what I am trying to do here are highly appreciated.. is this possible, or just pure insanity?
Using this->po = po; (and likewise this->i = i;) in the constructor solves the "issue" - the original code assigned the parameters to themselves, leaving the members uninitialized. Sorry for the spam! I will leave this here as an example.. maybe someone finds it useful.
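In code, the corrected constructor from some.cpp would look something like this (the Py_XINCREF is an extra precaution I added, not part of the original fix):

testclass(int* i, PyObject* po) {
    std::cerr << "testclass constructor! \n";
    this->i = i;          // assign the parameters to the members ...
    this->po = po;        // ... not to themselves, which left `po` uninitialized
    Py_XINCREF(this->po); // optional: keep the Python object alive while this wrapper exists
}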

How do I get a structure in C++ from Python?

My C++ program:
#include <iostream>
using namespace std;

struct FirstStructure
{
public:
    int first_int;
    int second_int;
};

struct SecondStructure
{
public:
    int third_int;
    FirstStructure ft;
};

int test_structure(SecondStructure ss)
{
    int sum = ss.ft.first_int + ss.ft.second_int + ss.third_int;
    return sum;
}

extern "C"
{
    int test(SecondStructure ss)
    {
        return test_structure(ss);
    }
}
And I compile the cpp file with this command: g++ -fPIC -shared -o array.so array.cpp.
Then I call array.so from Python; my Python program is as follows:
#coding=utf-8
import ctypes
from ctypes import *

class FirstStructure(Structure):
    _fields_ = [
        ("first_int", c_int),
        ("second_int", c_int)
    ]

class SecondStructure(Structure):
    _fields_ = [
        ("third_int", c_int),
        ("ft", FirstStructure)
    ]

if __name__ == '__main__':
    fs = FirstStructure(1, 2)
    ss = SecondStructure(3, fs)
    print ss.ft.first_int
    lib = ctypes.CDLL("./array.so")
    print lib.test(ss)
When I run the Python program, the console shows a "segmentation fault" error. I have read the documentation at https://docs.python.org/2/library/ctypes.html. How do I fix this bug?
You have to declare a function's argument and return types in Python in order to be able to call it properly.
So, insert the following before calling the test function:
lib.test.argtypes = [SecondStructure]
lib.test.restype = ctypes.c_int
Things should work then, as far as I can see...
As long as the number of C-to-Python interfaces remains "small" (whatever that is), I think ctypes is just fine.
OK, I got it. I modified the code as follows:
#include <iostream>
using namespace std;

extern "C"
{
    struct FirstStructure
    {
    public:
        int first_int;
        int second_int;
    };

    struct SecondStructure
    {
    public:
        int third_int;
        FirstStructure ft;
    };

    int test_structure(SecondStructure *ss)
    {
        int sum = ss->ft.first_int + ss->ft.second_int + ss->third_int;
        return sum;
    }

    int test(SecondStructure *ss)
    {
        return test_structure(ss);
    }
}
and with that, the bug was fixed.
Well, if you are intending to design a communication medium between C++ and Python, then I would suggest going for a combination of ZeroMQ and Google Protocol Buffers,
where protobuf would handle serialization/deserialization and ZeroMQ the communication.
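A very rough sketch of the ZeroMQ side of that suggestion, using the plain libzmq C API (the endpoint and payload are made up for illustration, and the protobuf serialization is omitted - the message here is just a raw string):

#include <zmq.h>
#include <cstdio>

int main() {
    void* ctx = zmq_ctx_new();
    void* rep = zmq_socket(ctx, ZMQ_REP);  // the C++ side acts as a reply server
    zmq_bind(rep, "tcp://127.0.0.1:5555"); // a Python client (e.g. pyzmq) would connect here

    char buf[256];
    int n = zmq_recv(rep, buf, sizeof(buf) - 1, 0); // wait for one request
    if (n >= 0) {
        buf[n] = '\0';
        std::printf("received: %s\n", buf);
        zmq_send(rep, "ok", 2, 0); // send a reply back
    }

    zmq_close(rep);
    zmq_ctx_term(ctx);
    return 0;
}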
You might want to have a look at Boost.Python:
https://wiki.python.org/moin/boost.python/SimpleExample
It will allow you to compile Python modules from C++ and define how Python is allowed to access the C++ code in an easy-to-understand fashion.
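A minimal sketch of what the Boost.Python route could look like for the structs above (the module name "structs" and the build setup are assumptions, not from the post):

#include <boost/python.hpp>

struct FirstStructure { int first_int; int second_int; };
struct SecondStructure { int third_int; FirstStructure ft; };

int test(const SecondStructure& ss) {
    return ss.ft.first_int + ss.ft.second_int + ss.third_int;
}

BOOST_PYTHON_MODULE(structs) {
    namespace bp = boost::python;
    // Expose both structs so Python can construct and populate them directly.
    bp::class_<FirstStructure>("FirstStructure")
        .def_readwrite("first_int", &FirstStructure::first_int)
        .def_readwrite("second_int", &FirstStructure::second_int);
    bp::class_<SecondStructure>("SecondStructure")
        .def_readwrite("third_int", &SecondStructure::third_int)
        .def_readwrite("ft", &SecondStructure::ft);
    bp::def("test", &test);
}

On the Python side you would then import the compiled module and call test with the exposed types instead of going through ctypes.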
