Python C API with recursion - segfaults

I'm using Python's C API (2.7) from C++ to convert a Python tree structure into a C++ tree. The code goes as follows:

1. The Python tree is implemented recursively as a class with a list of children; the leaf nodes are plain integers (not class instances).
2. I load a module and invoke a Python method from C++ (using code from here), which returns an instance of the tree, python_tree, as a PyObject* in C++.
3. I recursively traverse the obtained PyObject*. To obtain the list of children, I do this:
PyObject* attr = PyString_FromString("children");
PyObject* list = PyObject_GetAttr(python_tree, attr);
for (int i = 0; i < PyList_Size(list); i++) {
    PyObject* child = PyList_GetItem(list, i);
    ...
Pretty straightforward, and it works until I eventually hit a segmentation fault at the call to PyObject_GetAttr (Objects/object.c:1193, though I can't step into the API code). It seems to happen on the visit to the last leaf node of the tree.
I'm having a hard time pinning down the problem. Are there any special considerations for doing recursion with the C API? I'm not sure whether I need to be using Py_INCREF/Py_DECREF around these calls. I don't fully understand how the API's reference counting works, to be honest. Any help is much appreciated!
EDIT: Some minimal code:
void VisitTree(PyObject* py_tree) throw (Python_exception)
{
    PyObject* attr = PyString_FromString("children");
    if (PyObject_HasAttr(py_tree, attr)) // segfault on last visit
    {
        PyObject* list = PyObject_GetAttr(py_tree, attr);
        if (list)
        {
            int size = PyList_Size(list);
            for (int i = 0; i < size; i++)
            {
                PyObject* py_child = PyList_GetItem(list, i);
                PyObject* cls = PyString_FromString("ExpressionTree");
                // check if child is class instance or number (terminal)
                if (PyInt_Check(py_child) || PyLong_Check(py_child) || PyString_Check(py_child))
                    ; // terminal - do nothing for now
                else if (PyObject_IsInstance(py_child, cls))
                    VisitTree(py_child);
                else
                    throw Python_exception("unrecognized object from python");
            }
        }
    }
}

One can identify several problems with your Python/C code:
PyObject_IsInstance takes a class, not a string, as its second argument.
There is no code dedicated to reference counting. New references, such as those returned by PyObject_GetAttr, are never released, and borrowed references obtained with PyList_GetItem are never acquired before use. Mixing C++ exceptions with otherwise pure Python/C code aggravates the issue, making correct reference counting even harder to implement.
Important error checks are missing. PyString_FromString can fail when there is insufficient memory; PyList_GetItem can fail if the list shrinks in the meantime; PyObject_GetAttr can fail in some circumstances even after PyObject_HasAttr succeeds.
Here is a rewritten (but untested) version of the code, featuring the following changes:
The utility function GetExpressionTreeClass obtains the ExpressionTree class from the module that defines it. (Fill in the correct module name for my_module.)
Guard is a RAII-style guard class that releases the Python object when leaving the scope. This small and simple class makes reference counting exception-safe, and its constructor handles NULL objects itself. boost::python defines layers of functionality in this style, and I recommend taking a look at it.
All Python_exception throws are now accompanied by setting the Python exception info. The catcher of Python_exception can therefore use PyErr_Print or PyErr_Fetch to print the exception or otherwise find out what went wrong.
The code:
class Guard {
    PyObject *obj;
public:
    Guard(PyObject *obj_): obj(obj_) {
        if (!obj)
            throw Python_exception("NULL object");
    }
    ~Guard() {
        Py_DECREF(obj);
    }
};

PyObject *GetExpressionTreeClass()
{
    PyObject *module = PyImport_ImportModule("my_module");
    Guard module_guard(module);
    return PyObject_GetAttrString(module, "ExpressionTree");
}

void VisitTree(PyObject* py_tree) throw (Python_exception)
{
    PyObject *cls = GetExpressionTreeClass();
    Guard cls_guard(cls);
    PyObject *list = PyObject_GetAttrString(py_tree, "children");
    if (!list && PyErr_ExceptionMatches(PyExc_AttributeError)) {
        PyErr_Clear(); // hasattr does this exact check
        return;
    }
    Guard list_guard(list);
    Py_ssize_t size = PyList_Size(list);
    for (Py_ssize_t i = 0; i < size; i++) {
        PyObject *child = PyList_GetItem(list, i);
        Py_XINCREF(child);
        Guard child_guard(child);
        // check if child is class instance or number (terminal)
        if (PyInt_Check(child) || PyLong_Check(child) || PyString_Check(child))
            ; // terminal - do nothing for now
        else if (PyObject_IsInstance(child, cls))
            VisitTree(child);
        else {
            PyErr_Format(PyExc_TypeError, "unrecognized %s object", Py_TYPE(child)->tp_name);
            throw Python_exception("unrecognized object from python");
        }
    }
}

Related

Revert cppyy automatic mapping of operator() to __getitem__ via C++ pythonization callback

As is also explained in this cppyy issue, an A& operator() on the C++ side is mapped to the Python __getitem__.
On the issue it is suggested to add a special pythonization if this is not the desired result.
An extra constraint in my case is that this has to be added to the C++ class itself, to ensure that the pythonization is always applied.
I'm having trouble figuring out how to do this properly via the Python C API (it's my first time working with that API, so I'm a bit lost).
Minimal reproducer (somewhat contrived, but it shows the problem):
Note in the example below that struct A is code that I can't modify because that class is defined in library code. So the callback has to be in B.
import cppyy

cppyy.include("Python.h")

cppyy.cppdef(r"""
void myprint(PyObject* py) {
    PyObject* tmp = PyObject_Str(py);
    Py_ssize_t size = 0;
    const char* s = PyUnicode_AsUTF8AndSize(tmp, &size);
    std::cout << std::string(s, size) << std::endl;
}

template <typename T>
struct A {
    T& operator[](size_t idx) { return arr[idx]; }
    const T& operator[](size_t idx) const { return arr[idx]; }
    std::array<T, 10> arr{};
};

template <typename T>
struct B : public A<T> {
    B& operator()() { return *this; };
    static void __cppyy_pythonize__(PyObject* klass, const std::string& name) {
        std::cout << "Hello from pythonize" << std::endl;
        PyObject* getitem = PyObject_GetAttrString(klass, "__getitem__");
        myprint(getitem);
    }
};

using tmp = B<double>;
""")

t = cppyy.gbl.B['double']
print(t.__getitem__.__doc__)
I can get the __getitem__ function from the PyObject* klass but, as explained in the docs, the callback happens at the very end after all the internal processing of the class.
Thus the __call__ function, which here is B& operator()(), has already been mapped to __getitem__.
Unfortunately, I can't for the life of me figure out how I would undo that mapping and get back that old __getitem__ function.
Is that operator[]() function even still accessible via the PyObject* klass?
Any help/pointers would be much appreciated :)
First, to answer your question, to find the __getitem__ you want, get it from the base class of klass, not from klass directly. You can also do this in Python, rather than adding pythonizations in C++. In fact, doing this in Python is preferred as then you don't have to deal with the C-API.
However, since the actual bug report is not the one you referenced, but this one, and since the suggestion made there, which you followed here, makes this a classic XY-problem, let me also add that what you really want is to simply do PyObject_DelAttrString(klass, "__getitem__") in your code example.
Completely aside, the code that is giving you trouble here is from the Gaudi project, the core developers of which are the ones who asked for this automatic mapping in the first place. You may want to take this up with them.
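The attribute-deletion fix relies on ordinary Python class semantics: removing __getitem__ from the derived class makes attribute lookup fall through to the base class, where the original mapping still lives. A minimal pure-Python sketch of that mechanism (no cppyy required; class names are illustrative):

```python
class Base:
    def __getitem__(self, idx):
        # stands in for the inherited A<T>::operator[] binding
        return ("base", idx)

class Derived(Base):
    def __getitem__(self, idx):
        # stands in for the unwanted auto-mapped operator()
        return "auto-mapped"

d = Derived()
print(d[3])              # the auto-mapped version shadows the base one
del Derived.__getitem__  # what PyObject_DelAttrString(klass, "__getitem__") does
print(d[3])              # lookup now falls through to Base.__getitem__
```

This is also why deleting the attribute is enough: nothing needs to be "restored", because the base-class slot was never overwritten, only shadowed.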

How do I stop pybind11 from deallocating an object constructed from Python?

So, I know that pybind lets you set a return value policy for methods that you wrap up. However, that doesn't seem to be working for me when I try to use this policy on a constructor. I have a class to wrap my C++ type that looks like this:
class PyComponent {
public:
    static Component* Create(ComponentType type) {
        Component* c = new Component(type);
        // Irrelevant stuff removed here
        return c;
    }
    /// @brief Wrap a behavior for Python
    static void PyInitialize(py::module_& m);
};

void PyComponent::PyInitialize(py::module_& m)
{
    py::class_<Component>(m, "Component")
        .def(py::init<>(&PyComponent::Create), py::return_value_policy::reference)
        ;
}
However, this does NOT stop my Component type from getting deallocated from the Python side if I call Component() and the created object goes out of scope. Any suggestions?
I did figure out the solution to this. It's to pass py::nodelete to the wrapper for my class
void PyComponent::PyInitialize(py::module_& m)
{
    py::class_<Component, std::unique_ptr<Component, py::nodelete>>(m, "Component")
        .def(py::init<>(&PyComponent::Create), py::return_value_policy::reference)
        ;
}

How to achieve polymorphism in Python C API?

I'm writing an alternative to functools.partial that accumulates arguments until there are enough of them to make the call.
I'm using the C API, and I have a tp_call implementation which, when called, returns either a modified version of self or an arbitrary PyObject*.
At first I followed the Defining New Types guide, and then realized that I just can't return different types (PyObject* and MyObject*) from the tp_call implementation.
Then I tried not defining a struct for MyObject at all and using PyObject_SetAttrString in tp_init instead, just like we would in Python. But in that case I got an AttributeError, because you can't set arbitrary attributes on object instances in Python.
What I need is to make my tp_call implementation polymorphic: able to return either MyObject, which is a subclass of PyObject, or PyObject itself.
What is the sane way to do that?
UPDATE #0
That's the code:
static PyObject *Curry_call(Curry *self, PyObject *args, PyObject *kwargs) {
    PyObject *old_args = self->args;
    self->args = PySequence_Concat(self->args, args);
    Py_DECREF(old_args);
    if (self->kwargs == NULL && kwargs != NULL) {
        self->kwargs = kwargs;
        Py_INCREF(self->kwargs);
    } else if (self->kwargs != NULL && kwargs != NULL) {
        PyDict_Merge(self->kwargs, kwargs, 1);
    }
    if ((PyObject_Size(self->args) +
         (self->kwargs != NULL ? PyObject_Size(self->kwargs) : 0)) >=
        self->num_args) {
        return PyObject_Call(self->fn, self->args, self->kwargs);
    } else {
        return (PyObject *)self;
    }
}
UPDATE #1
Why I initially abandoned this implementation: I was getting a segfault with it on subsequent calls of the partial object. I thought it happened because of issues casting Curry* to PyObject*. But I have now fixed the segfault by adding Py_INCREF(self); before return (PyObject *)self;. This seems very strange to me. Should I really INCREF self if I'm returning it, by the C API ownership rules?
If you've defined your MyObject type correctly, you should be able to simply cast your MyObject * to a PyObject * and return that. The first member of a MyObject is a PyObject, and C lets you cast a pointer to a struct to a pointer to the struct's first member and vice versa. I believe the feature exists specifically to allow things like this.
I don't really know your whole code, but as long as MyObject is a PyObject (compatible, i.e. has the same "header" fields, make sure you have a length field), CPython is designed to just take your MyObject as a PyObject; simply cast the pointer to PyObject before returning it.
As you can see here, that is one of the things that is convenient when using C++: You can actually have subclasses with type safety, and you don't have to worry about someone just copying over half of your subclass' instance, for example.
EDIT: because it was asked "isn't this unsafe": yes, it is. But it's only as unsafe as type handling in user code ever gets; CPython lets you do this because it stores and checks the PyTypeObject *ob_type member of the contained PyObject struct. That's about as safe as, for example, C++'s runtime type checking is -- but it's implemented by Python developers as opposed to GCC/clang/MSVC/icc/... developers.

Python C-extension class with a dynamic size

I'm trying to write a C extension that contains a Python class.
The class takes a filename as the constructor parameter fname, then loads that file into memory and stores it as a config.
Below is what I have:
typedef struct {
    PyObject_HEAD
    XmlConfig m_cfg;
} PyAgent;

static int PyAgent_init(PyAgent *self, PyObject *args) {
    const char *fname;
    if (!PyArg_ParseTuple(args, "s", &fname)) {
        return NULL;
    }
    self->m_cfg.Load(fname);
    return 0;
}

static PyTypeObject PyAgentType = {
    PyObject_HEAD_INIT(NULL)
    0,                        /* ob_size */
    "classify.Agent",         /* tp_name */
    ...
}
...
I get a segmentation fault when I try to load the file. As I understand it, this happens because the PyAgent struct's object size increases due to the memory allocated for the file.
I've tried to use PyObject_VAR_HEAD macro but it doesn't help.
Can you give a clue or working similar example?
Maybe self->m_cfg needs to be initialized first? You are calling a C++ method on it, but didn't call its constructor first. The struct is allocated by C code, not C++, so it doesn't know it needs to construct this field.
There are ways to manually call the constructor (something like new(&self->m_cfg) XmlConfig() if I remember correctly), but the easiest is probably to have instead a field XmlConfig *cfg, and allocate and free it as needed with the usual syntax of the C++ operators new and delete (the latter to be put in the tp_dealloc).

Using SWIG to pass C++ object pointers to Python, then back to C++ again

I'm using SWIG to wrap 2 C++ objects, and I am embedding the Python interpreter in my application (i.e. calling Py_Initialize() etc. myself).
The first object is a wrapper for some application data.
The second is a "helper" object, also written in C++, which can perform certain operations based on what it finds in the data object.
The Python script decides when/how/if to invoke the helper object.
So I pass a pointer to my C++ object to SWIG/Python thus:
swig_type_info *ty = SWIG_MangledTypeQuery("_p_MyDataObject");
if (ty == NULL)
{
    Py_Finalize();
    return false;
}
PyObject *data_obj = SWIG_NewPointerObj(PointerToMyDataObject, ty, 0);
if (data_obj == NULL)
{
    Py_Finalize();
    return false;
}
ty = SWIG_MangledTypeQuery("_p_MyHelperObject");
if (ty == NULL)
{
    Py_Finalize();
    return false;
}
PyObject *helper_obj = SWIG_NewPointerObj(PointerToMyHelperObject, ty, 0);
if (helper_obj == NULL)
{
    Py_Finalize();
    return false;
}
PyTuple_SetItem(pArgs, 0, data_obj);
PyTuple_SetItem(pArgs, 1, helper_obj);
PyObject *pValue = PyObject_CallObject(pFunc, pArgs);
if (pValue == NULL)
{
    Py_Finalize();
    return false;
}
In Python, we see something like:
def go(dataobj, helperobj):
    ## if conditions are right....
    helperobj.helpme(dataobj)
Now, this largely works except for one thing. In my C++ code when I am preparing my arguments to pass on to the Python script, I observe the pointer value of PointerToMyDataObject.
When I set a breakpoint in the C++ implementation of helperobj.helpme(), I see that the memory address is different, though it seems to be a pointer to a valid instance of MyDataObject.
This is important to me, as "MyDataObject" is in fact a base class for a few possible derived classes. My helper object wants to perform an appropriate (determined by context) dynamic cast on the pointer it receives to point at the appropriate derived class. That's failing for what I think are obvious reasons now.
I've read some things about "shadow" objects in SWIG, which only adds to my confusion (apologies for my tiny brain :-P)
So, is SWIG making a copy of my object for some reason, and then passing a pointer to the copy? If it is, then I can understand why my assumptions about dynamic casts won't work.
I tried to add this as a comment but struggled with formatting, so... more insight follows:
The problem has to do with pass-by-value. Notice I have 2 implementations of the virtual method helpMe():
bool MyHelperObject::helpMe(MyDataObject mydata_obj)
{
    return common_code(&mydata_obj);
}

bool MyHelperObject::helpMe(MyDataObject *mydata_obj)
{
    return common_code(mydata_obj);
}
Although I provided Python with a pointer, it is calling the pass-by-value version. This explains why I'm getting different pointer values. But what can I do to force a call to the version that takes a pointer argument?
Based on what you've shown I think you want to make sure SWIG only gets to see the pointer version of helpMe. The non-pointer version will be creating a temporary copy and then passing that into the function and it sounds like that isn't what you want.
SWIG will have a hard time picking which version to use since it abstracts the pointer concept slightly to match Python better.
You can hide the non-pointer version from SWIG with %ignore, placed before the declaration or before the %import that exposes it to SWIG in your interface file:
%ignore MyHelperObject::helpMe(MyDataObject mydata_obj);
%import "some.h"
