`threading.local` unexpected behavior with Python embedding

I am embedding Python using the C embedding API. The main thread calls
Py_Initialize();
PyEval_InitThreads();
Then there are more threads created by native code, whose lifetime I do not control. They need to call Python too. So far they seemed to work fine with
PyGILState_STATE gil = PyGILState_Ensure();
// calls to Python go here
PyGILState_Release(gil);
The problem is that, given this simple setup, I ran into issues with Python code that uses threading.local. Imagine a secondary thread S that periodically executes increase_counter:
// initialized once at the beginning of the program to
// an instance of threading.local()
PyObject* threadLocal;
...
void increase_counter()
{
    PyGILState_STATE gil = PyGILState_Ensure();
    // this really does C API calls, but for simplicity I'll write Python
    if hasattr(threadLocal, "counter"):
        threadLocal.counter += 1
    else:
        threadLocal.counter = 1
    // end of Python
    PyGILState_Release(gil);
}
Well, the problem is that in thread S multiple calls to increase_counter don't actually increase anything - hasattr always returns False, and the value of counter for this thread is discarded as soon as PyGILState_Release is called.
It only works correctly in S if the whole body of S is wrapped into:
PyGILState_STATE gilForS = PyGILState_Ensure();
PyThreadState* sPythonThreadState = PyEval_SaveThread();
// rest of the S body, that sometimes calls increase_counter
PyEval_RestoreThread(sPythonThreadState);
PyGILState_Release(gilForS);
Which I can do for the purpose of this question, but in the actual product the lifetime of S is not controlled by me (it is a thread pool thread), only increase_counter, so I can't make it run PyEval_SaveThread in the beginning, and I can't ensure PyEval_RestoreThread will be called in the end.
What is the proper way to initialize threads like S so that threading.local would correctly work there?
Here is a full example that reproduces the issue, as requested. It prints "set counter" twice instead of "set counter" followed by "Found counter!". It works as expected if I uncomment the code in async_thread, which I can't do in the real product:
#include <Python.h>
#include <pthread.h>
#include <stdio.h>   // printf
#include <stdlib.h>  // exit
PyObject *threadLocal;
void *async_thread(void *arg);
void increase_counter();
int main(int argc, char *argv[])
{
    Py_Initialize();
    PyEval_InitThreads();
    // Create the shared threading.local() instance once, on the main thread.
    PyObject *threading = PyImport_ImportModule("threading");
    PyObject *tlocal = PyObject_GetAttrString(threading, "local");
    threadLocal = PyObject_Call(tlocal, PyTuple_New(0), NULL);
    pthread_t async;
    int err = pthread_create(&async, NULL, async_thread, NULL);
    if (err)
    {
        printf("unable to create thread\n");
        exit(-1);
    }
    // Release the GIL while waiting so the secondary thread can run Python.
    PyThreadState* ts = PyEval_SaveThread();
    pthread_join(async, NULL);
    PyEval_RestoreThread(ts);
    Py_Finalize();
    pthread_exit(NULL);
}
void *async_thread(void *arg)
{
    //PyGILState_STATE gil = PyGILState_Ensure();
    for (int i = 0; i < 2; i++)
    {
        increase_counter();
    }
    //PyGILState_Release(gil);
    pthread_exit(NULL);
    return NULL;
}
void increase_counter()
{
    PyGILState_STATE gil = PyGILState_Ensure();
    if (PyObject_HasAttrString(threadLocal, "counter"))
    {
        printf("Found counter!\n");
    }
    else
    {
        PyObject *val = PyLong_FromLong(1);
        PyObject_SetAttrString(threadLocal, "counter", val);
        printf("set counter\n");
    }
    PyGILState_Release(gil);
}
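A possible reading of the symptom, offered as an assumption about CPython internals rather than something the question confirms: when a native thread has no Python thread state, PyGILState_Ensure creates one, and the matching outermost PyGILState_Release destroys it again, taking the thread-local data with it. The pure-Python sketch below is only an analogy for that lifetime effect. It runs the same counter logic once from several short-lived threads and once from a single long-lived thread. The helper names run_in_fresh_threads and run_in_one_thread are hypothetical.

```python
import threading

thread_local = threading.local()


def increase_counter():
    """Same logic as the C increase_counter: bump a per-thread counter."""
    if hasattr(thread_local, "counter"):
        thread_local.counter += 1
    else:
        thread_local.counter = 1
    return thread_local.counter


def run_in_fresh_threads(calls):
    """Run every call in its own short-lived thread.

    Each thread starts with empty thread-local storage, which is analogous
    to a thread state that is created and destroyed around every call.
    """
    results = []
    for _ in range(calls):
        t = threading.Thread(target=lambda: results.append(increase_counter()))
        t.start()
        t.join()
    return results


def run_in_one_thread(calls):
    """Run all calls inside one thread, so its storage survives between calls."""
    results = []

    def body():
        for _ in range(calls):
            results.append(increase_counter())

    t = threading.Thread(target=body)
    t.start()
    t.join()
    return results


print(run_in_fresh_threads(3))  # [1, 1, 1]
print(run_in_one_thread(3))     # [1, 2, 3]
```

The second case corresponds to the workaround in the question, where an outer PyGILState_Ensure keeps the thread's state alive for the whole body of S.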

Related

Embedding a Python interpreter in a multi-threaded C++ program with pybind11

I'm trying to use pybind11 in order to make a 3rd party C++ library call a Python method. The library is multithreaded, and each thread creates a Python object, and then does numerous calls to the object's methods.
My problem is that the call to py::gil_scoped_acquire acquire; deadlocks. A minimal code which reproduces the problem is given below. What am I doing wrong?
// main.cpp
class Wrapper
{
public:
Wrapper()
{
py::gil_scoped_acquire acquire;
auto obj = py::module::import("main").attr("PythonClass")();
_get_x = obj.attr("get_x");
_set_x = obj.attr("set_x");
}
int get_x()
{
py::gil_scoped_acquire acquire;
return _get_x().cast<int>();
}
void set_x(int x)
{
py::gil_scoped_acquire acquire;
_set_x(x);
}
private:
py::object _get_x;
py::object _set_x;
};
void thread_func()
{
Wrapper w;
for (int i = 0; i < 10; i++)
{
w.set_x(i);
std::cout << "thread: " << std::this_thread::get_id() << " w.get_x(): " << w.get_x() << std::endl;
std::this_thread::sleep_for(100ms);
}
}
int main() {
py::scoped_interpreter python;
std::vector<std::thread> threads;
for (int i = 0; i < 5; ++i)
threads.push_back(std::thread(thread_func));
for (auto& t : threads)
t.join();
return 0;
}
and the Python code:
# main.py
class PythonClass:
    def __init__(self):
        self._x = 0
    def get_x(self):
        return self._x
    def set_x(self, x):
        self._x = x
Related questions can be found here and here, but did not help me solve the problem.

I managed to resolve the issue by releasing the GIL in the main thread, before starting the worker threads (added py::gil_scoped_release release;). For anybody who is interested, the following now works (also added cleaning up Python objects):
#include <pybind11/embed.h>
#include <iostream>
#include <thread>
#include <chrono>
#include <sstream>
namespace py = pybind11;
using namespace std::chrono_literals;
class Wrapper
{
public:
Wrapper()
{
py::gil_scoped_acquire acquire;
_obj = py::module::import("main").attr("PythonClass")();
_get_x = _obj.attr("get_x");
_set_x = _obj.attr("set_x");
}
~Wrapper()
{
_get_x.release();
_set_x.release();
}
int get_x()
{
py::gil_scoped_acquire acquire;
return _get_x().cast<int>();
}
void set_x(int x)
{
py::gil_scoped_acquire acquire;
_set_x(x);
}
private:
py::object _obj;
py::object _get_x;
py::object _set_x;
};
void thread_func(int iteration)
{
Wrapper w;
for (int i = 0; i < 10; i++)
{
w.set_x(i);
std::stringstream msg;
msg << "iteration: " << iteration << " thread: " << std::this_thread::get_id() << " w.get_x(): " << w.get_x() << std::endl;
std::cout << msg.str();
std::this_thread::sleep_for(100ms);
}
}
int main() {
py::scoped_interpreter python;
py::gil_scoped_release release; // add this to release the GIL
std::vector<std::thread> threads;
for (int i = 0; i < 5; ++i)
threads.push_back(std::thread(thread_func, 1));
for (auto& t : threads)
t.join();
return 0;
}

Related to @bavaza's answer above, there is a way to combine both initialization and GIL release in a single class. You have to be careful, as that class is now a singleton (no different from scoped_interpreter), but it's possible. Here's the idea:
#include <pybind11/embed.h>
#include <memory>
namespace py = pybind11;
class PythonWrapper {
public:
    PythonWrapper() : m_interpreter() {
        // Do whatever one-time module/object initialization you want here
        py::object obj = py::module::import("main").attr("PythonClass")(); // Speeds up importing later
        // Last line of constructor releases the GIL
        mp_gil_release = std::make_unique<py::gil_scoped_release>();
    }
private:
    py::scoped_interpreter m_interpreter;
    // Important that this is the LAST member, so it gets destructed first, re-acquiring the GIL
    std::unique_ptr<py::gil_scoped_release> mp_gil_release;
};
This would replace the two objects on the stack in main, leaving the Wrapper class unchanged! And if you wanted to have a true singleton for all your Python calls, this would help there too.
Again, thanks to @bavaza for the original solution. It helped me get my head around the right way to use the scoped locks for my own cross-thread usage.

Python is known to have a Global Interpreter Lock.
So you basically need to write your own Python interpreter from scratch, or download the source code of Python and improve it a lot.
If you are on Linux, you could consider running many Python interpreters (using appropriate syscalls(2), with pipe(7) or unix(7) for interprocess communication) - perhaps one Python process communicating with each of your C++ threads.
What am I doing wrong?
Coding in Python something which should be coded otherwise. Did you consider trying SBCL?
Some libraries (e.g. Tensorflow) can be called from both Python and C++. Maybe you could take inspiration from them...
In practice, if you have just a dozen C++ threads on a powerful Linux machine, you could afford having one Python process per C++ thread. So each C++ thread would have its own companion Python process.
Otherwise, budget several years of work to improve the source code of Python to remove its GIL. You might code your GCC plugin to help you on that task -analyzing and understanding the C code of Python.

Embedding multiple python 3 interpreters with different built-in modules

I embedded the python 3.6 interpreter successfully in a C++ program, but I have a problem.
I'd like to embed two interpreters in the same program:
One which can use my C++ defined module (MyModule)
One which can not use this module.
According to the documentation, I should call PyImport_AppendInittab before Py_Initialize, which makes the module available in the whole program, but I'd like to create separate interpreters with separate built-in modules.
Calling Py_Initialize and Py_Finalize again doesn't help: the module is available in the second interpreter too. Besides, calling the init and finalize functions multiple times causes huge memory leaks, so I think this wouldn't be a good solution even if it worked.
Do you have any idea how to solve this issue?
Full code:
#include <iostream>
#pragma push_macro("_DEBUG")
#undef _DEBUG
#include "Python.h"
#pragma pop_macro("_DEBUG")
static PyObject* Addition (PyObject *self, PyObject *args)
{
double a = 0.0;
double b = 0.0;
if (!PyArg_ParseTuple (args, "dd", &a, &b)) {
return nullptr;
}
double result = a + b;
return PyFloat_FromDouble (result);
}
static PyMethodDef ModuleMethods[] =
{
{"Add", Addition, METH_VARARGS, "Adds numbers."},
{nullptr, nullptr, 0, nullptr}
};
static PyModuleDef ModuleDef = {
PyModuleDef_HEAD_INIT,
"MyModule",
NULL,
-1,
ModuleMethods,
NULL,
NULL,
NULL,
NULL
};
static PyObject* ModuleInitializer (void)
{
return PyModule_Create (&ModuleDef);
}
int main ()
{
Py_SetPythonHome (L".");
Py_SetPath (L"python36.zip\\Lib");
{ // first interpreter
PyImport_AppendInittab ("MyModule", ModuleInitializer);
Py_Initialize ();
PyRun_SimpleString (
"import MyModule\n"
"print (MyModule.Add (1, 2))"
);
Py_Finalize ();
}
{ // second interpreter without AppendInittab (should not find MyModule, but it does)
Py_Initialize ();
PyRun_SimpleString (
"import MyModule\n"
"print (MyModule.Add (1, 2))"
);
Py_Finalize ();
}
system ("pause");
return 0;
}

Use a C exe as Nativehost application instead python in chrome NativeHost messaging

I took this Chrome app example for Native Messaging, which lets a Python program run and talk back to the app. I converted it to an extension and am trying to replace the Python script with a C program, but my C program is not able to communicate back to the extension. My question is: is it possible to use C instead of Python (it looks possible to me, at least)? If it is possible, what am I doing wrong?
Code for replacement of python:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#define NUM_THREADS 2
int t1=0;
void *thr_fwrite(void *arg) {
FILE *p;
if(arg==NULL)
p=(FILE*)arg;
char buffer[200]={"0023 hello how are you q"};;
int len=27;
while(1)
{
if(t1==0){
fwrite(buffer,len,sizeof(char),stdout);
fwrite(buffer,len,sizeof(char),p);
t1=1;
}
}
pthread_exit(NULL);
}
/* thread function */
void *thr_fread(void *arg) {
FILE *p;
if(arg==NULL)
p=(FILE*)arg;
char buffer[200]={""};
int len=0;
while(1)
{
if(t1==1){
fread(buffer,1,sizeof(int),stdin);
if(strcmp(buffer,"")==0)
exit(0);
len=atoi(buffer);
if(len==0 || len>200)
exit(0);
fread(buffer,len,sizeof(char),stdin);
fwrite(buffer,len,sizeof(char),p);
t1=0;
}
}
pthread_exit(NULL);
}
int main(int argc, char **argv) {
pthread_t thr[NUM_THREADS];
int i, rc;
FILE *p;
FILE *q;
p=fopen("~/temp.txt","w+");
q=fopen("~/temp1.txt","w+");
/* create threads */
if ((rc = pthread_create(&thr[0], NULL, thr_fwrite, &p))) {
fprintf(stderr, "error: pthread_create, rc: %d\n", rc);
return EXIT_FAILURE;
}
if ((rc = pthread_create(&thr[1], NULL, thr_fread, &q))) {
fprintf(stderr, "error: pthread_create, rc: %d\n", rc);
return EXIT_FAILURE;
}
/* block until all threads complete */
for (i = 0; i < NUM_THREADS; ++i) {
pthread_join(thr[i], NULL);
}
return EXIT_SUCCESS;
}
partial code for chrome extension code is:
function ConnectHost() {
var hostName = "com.example.nativehost";
port = chrome.runtime.connectNative(hostName);
if(port == null)
{
alert("Could not connect to Host Client");
}
else
{
port.onMessage.addListener(onNativeMessage);
port.onDisconnect.addListener(onDisconnected);
}
function onDisconnected() {
alert(chrome.runtime.lastError.message);
port = null;
}
function onNativeMessage(message) {
alert(message);
SendToNative("exit");
}
Do we have any example already which uses C instead of python?

Chrome doesn't care (and cannot know) what's on the other side as long as the program answers in expected format (see Native Messaging Protocol). Which does not seem to be the case here from a brief glance at your code.
Chrome will only accept messages consisting of 4-byte length header + UTF-8 encoded string that is valid JSON.
There are some examples of C Native Hosts on SO.
An important caveat to consider is that the communication should be in binary mode, not text mode:
Windows-only: Make sure that the program's I/O mode is set to O_BINARY. By default, the I/O mode is O_TEXT, which corrupts the message format as line breaks (\n = 0A) are replaced with Windows-style line endings (\r\n = 0D 0A). The I/O mode can be set using __setmode.
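The framing described in this answer (a 4-byte length header followed by UTF-8 encoded JSON) can be sketched as follows. Python is used only to keep the sketch short; the same byte layout applies to a host written in C. The helper names encode_message and decode_message are hypothetical, and the use of native byte order for the length prefix is taken from Chrome's Native Messaging documentation rather than from the answer itself.

```python
import io
import json
import struct


def encode_message(obj):
    """Frame one message: 32-bit length (native byte order) + UTF-8 JSON."""
    body = json.dumps(obj).encode("utf-8")
    return struct.pack("=I", len(body)) + body


def decode_message(stream):
    """Read one framed message from a binary stream, or None at end of input."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack("=I", header)
    return json.loads(stream.read(length).decode("utf-8"))


# Round-trip a message through an in-memory stream instead of stdin/stdout.
framed = encode_message({"text": "hello"})
print(decode_message(io.BytesIO(framed)))
```

A real host would read from and write to the binary stdin and stdout streams and flush after every message. Note that the C program in the question writes an ASCII length ("0023") and a plain string, neither of which matches this layout.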

Calling python object's method from c++

I am trying to achieve the following: passing a python object to a c++ callback chain (which are typical in many popular c++ libraries). In the c++ code, callbacks pass on objects that have necessary information for consecutive callbacks in the cascade/chain.
Here is a small test I wrote: we pass a python object to a c routine (case 1) and call its method. That works ok. But when I pass the python object to a c++ object and try to call it "inside" the c++ object, I get a segfault.. :(
Here it goes:
c++ module ("some.cpp"):
#include <stdint.h>
#include <iostream>
#include <Python.h>
/* objective:
* create c++ objects / routines that accept python objects
* then call methods of the python objects inside c++
*
* python objects (including its variables and methods) could be passed along, for example in c++ callback chains ..
* .. and in the end we could call a python callback
*
* Compile and test as follows:
* python setup.py build_ext
* [copy/link some.so where test.py is]
* python test.py
*
*/
class testclass {
public:
testclass(int* i, PyObject* po) {
std::cerr << "testclass constructor! \n";
i=i; po=po;
}
~testclass() {}
void runpo() {
PyObject* name;
const char* mname="testmethod";
name=PyString_FromString(mname);
std::cerr << "about to run the python method .. \n";
PyObject_CallMethodObjArgs(po, name, NULL);
std::cerr << ".. you did it - i will buy you a beer!\n";
}
public:
int* i;
PyObject* po;
};
/* Docstrings */
static char module_docstring[] = "hand-made python module";
/* Available functions */
static PyObject* regi_wrapper(PyObject * self, PyObject * args);
void regi(int* i, PyObject* po);
/* Module specification */
static PyMethodDef module_methods[] = {
{"regi_wrapper",regi_wrapper, METH_VARARGS, "lets see if we can wrap this sucker"},
{NULL, NULL, 0, NULL}
};
/* Initialize the module */
PyMODINIT_FUNC initsome(void)
{
PyObject *m = Py_InitModule3("some", module_methods, module_docstring);
if (m == NULL)
return;
// import_array(); // numpy not required here ..
}
static PyObject* regi_wrapper(PyObject * self, PyObject * args)
{
int* input_i; // whatever input variable
PyObject* input_po; // python object
PyObject* ret; // return variable
// parse arguments
if (!PyArg_ParseTuple(args, "iO", &input_i, &input_po)) {
return NULL;
}
// https://stackoverflow.com/questions/16606872/calling-python-method-from-c-or-c-callback
// Py_INCREF(input_po); // need this, right? .. makes no difference
/* // seems not to make any difference ..
PyGILState_STATE gstate;
gstate = PyGILState_Ensure();
*/
regi(input_i, input_po);
// PyGILState_Release(gstate); // .. makes no difference
// Py_DECREF(input_po); // .. makes no difference
Py_RETURN_TRUE;
}
void regi(int* i, PyObject* po) {
// search variable and methods from PyObject "po" and call its methods?
PyObject* name;
const char* mname="testmethod";
testclass* testobj;
testobj=new testclass(i,po);
/* [insert // in front of this line to test case 1]
// ***** this one works! *********
name=PyString_FromString(mname);
PyObject_CallMethodObjArgs(po, name, NULL);
*/ // [insert // in front of this line to test case 1]
// *** I WOULD LIKE THIS TO WORK *** but it gives segfault.. :(
testobj->runpo(); // [uncomment to test case 2]
}
setup.py:
from distutils.core import setup, Extension
# the c++ extension module
extension_mod = Extension("some", ["some.cpp"])
setup(name = "some", ext_modules=[extension_mod])
test.py:
import some
class sentinel:
    def __init__(self):
        pass
    def testmethod(self):
        print "hello from sentinel.testmethod"
        pass
se=sentinel()
some.regi_wrapper(1,se)
This question seems relevant:
Calling python method from C++ (or C) callback
.. however the answer did not help me.
What am I missing/misunderstanding here (my c++ sucks big time, so I might have missed something obvious) .. ?
Also, some bonus questions:
a) I am familiar with swig and swig "directors".. however, I would like to use swig for general wrapping of the code, but my custom wrapping for the sort of things described in this question (i.e. without directors). Is there any way to achieve this?
b) Any other suggestions to achieve what I am trying to achieve here, are highly appreciated.. is this possible or just pure insanity?

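The constructor's i=i; po=po; assigns each parameter to itself, so the members i and po are never initialized, and runpo() then calls a method on a garbage pointer. Using in the constructor
this->i=i; this->po=po;
solves the "issue". Sorry for the spam! I will leave this here as an example.. maybe someone finds it useful.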

Why am I getting this segfault when using the Python/C API?

I am getting a segmentation fault when decrefing a PyObject* in my C++ code using the Python/C API, and I can't figure out why. I am using C++ and Python 2.7. I am using new-style classes for future Python 3 compatibility.
My goal is to create a C++ class MyClass to serve as a wrapper for a class defined in a Python module. In the MyClass constructor, I pass in the name of the Python module, import the module, locate the class (which always has a pre-defined name PyClass), and call that class to create an instance of it. I then store the resulting PyObject* in MyClass for future use. In the MyClass destructor, I decref that stored PyObject* to avoid memory leaks.
I have already verified that everything is working correctly as far as locating the class and creating an instance of it. I have even verified that I can use the stored PyObject* in other MyClass methods, for example, to access methods in the PyClass. However, when the destructor does the decref, it causes a segfault.
Here is a sample of my code. I also call Py_Initialize() and Py_Finalize() elsewhere at appropriate times, and I have left out some of my error-checking code for brevity:
MyPythonModule.py
class PyClass:
    pass
MyClass.h
class MyClass {
public:
MyClass(const char* modulename);
~MyClass();
private:
void* _StoredPtr;
};
MyClass.cpp
#include <Python.h>
#include <iostream>
#include "MyClass.h"
MyClass::MyClass(const char* modulename) {
_StoredPtr = NULL;
PyObject *pName = NULL, *pModule = NULL, *pAttr = NULL;
// Import the Python module.
pName = PyString_FromString(modulename);
if (pName == NULL) {goto error;}
pModule = PyImport_Import(pName);
if (pModule == NULL) {goto error;}
// Create a PyClass instance and store a pointer to it.
pAttr = PyObject_GetAttrString(pModule, "PyClass");
if (pAttr == NULL) {goto error;}
_StoredPtr = (void*) PyObject_CallObject(pAttr, NULL);
Py_DECREF(pAttr);
if (_StoredPtr == NULL) {goto error;}
error:
if (PyErr_Occurred()) {PyErr_Print();}
Py_XDECREF(pName);
Py_XDECREF(pModule);
return;
}
MyClass::~MyClass() {
std::cout << "Starting destructor..." << std::endl;
Py_XDECREF((PyObject*)(_StoredPtr));
std::cout << "Destructor complete." << std::endl;
}
I know that I could avoid the segfault by leaving out the Py_XDECREF() in the destructor, but I am afraid of causing a memory leak because I do not understand exactly why this is happening. It seems especially strange that I can use _StoredPtr successfully in other MyClass methods, yet I can't decref it.
I have also tried storing the PyObject* of the imported module in MyClass and holding on to it until after _StoredPtr is decrefed, but the _StoredPtr decref still segfaults. I tried commenting out the Py_DECREF(pAttr); line, but that doesn't help.
As I mentioned, I can retrieve methods in the PyClass using _StoredPtr, and I have also tried storing these in MyClass and decrefing them in the destructor. When I do this, I can decref _StoredPtr, but then it segfaults when I try to decref the method's PyObject*. If I do this with several methods, it is always the last decref that causes the segfault, no matter what order I put them in.
Any insights as to what's happening here?

This works for me
#include <Python.h>
#include <iostream>
#include "MyClass.h"
MyClass::MyClass(const char* modulename) {
_StoredPtr = NULL;
PyObject *pName = NULL, *pModule = NULL, *pAttr = NULL;
// Import the Python module.
pName = PyString_FromString(modulename);
if (pName == NULL) {goto error;}
pModule = PyImport_Import(pName);
if (pModule == NULL) {goto error;}
// Create a PyClass instance and store a pointer to it.
pAttr = PyObject_GetAttrString(pModule, "PyClass");
if (pAttr == NULL) {goto error;}
_StoredPtr = (void*) PyObject_CallObject(pAttr, NULL);
Py_DECREF(pAttr);
if (_StoredPtr == NULL) {goto error;}
else {
    // do something with _StoredPtr
    Py_XDECREF((PyObject*)_StoredPtr);
}
error:
if (PyErr_Occurred()) {PyErr_Print();}
Py_XDECREF(pName);
Py_XDECREF(pModule);
return;
}
MyClass::~MyClass() {}
I basically moved the XDECREF outside the destructor into the function that is using the PyObject.
