Python C-extension class with a dynamic size

I am trying to write a C extension that contains a Python class.
This class takes a filename as the constructor parameter fname, then loads that file into memory and stores it as a config.
Here is what I have:
typedef struct {
    PyObject_HEAD
    XmlConfig m_cfg;
} PyAgent;

static int PyAgent_init(PyAgent *self, PyObject *args) {
    const char *fname;
    if (!PyArg_ParseTuple(args, "s", &fname)) {
        return -1;  /* tp_init returns int, so -1 on error, not NULL */
    }
    self->m_cfg.Load(fname);
    return 0;
}
static PyTypeObject PyAgentType = {
    PyObject_HEAD_INIT(NULL)
    0,                          /* ob_size */
    "classify.Agent",           /* tp_name */
    ...
};
...
I get a segmentation fault when I try to load the file. As I understand it, this happens because the PyAgent struct's size grows due to the memory allocated for the file.
I've tried to use PyObject_VAR_HEAD macro but it doesn't help.
Can you give a clue or working similar example?

Maybe self->m_cfg needs to be initialized first? You are calling a C++ method on it, but never called its constructor. The struct is allocated by C code, not C++, so it doesn't know it needs to construct this field.
There are ways to call the constructor manually (placement new: new (&self->m_cfg) XmlConfig()), but the easiest fix is probably to use a pointer field XmlConfig *m_cfg instead, and allocate and free it as needed with the usual C++ operators new and delete (the latter in tp_dealloc).

Related

Garbage collection and custom getter in SWIG

I'm not the author, but there's a public software package I use that seems to be leaking memory (Github issue). I'm trying to figure out how to patch it to make it work correctly.
To narrow the problem down, there's a struct, call it xxx_t. First %extend is used to make a member of the struct available in Python:
%extend xxx_t {
    char *surface;
}
Then there's a custom getter. What exactly it does here isn't important except that it uses new to create a char*.
%{
char* xxx_t_surface_get(xxx *n) {
    char *s = new char[n->length + 1];
    memcpy(s, n->surface, n->length);
    s[n->length] = '\0';
    return s;
}
%}
Currently the code has this line to handle garbage collection:
%newobject surface;
This does not seem to work as expected. %newobject xxx_t::surface; also doesn't work. If I replace it with %newobject xxx_t_surface_get; that doesn't work because the getter function is escaped (inside %{ ... %}).
What is the right way to tell SWIG about the char* so it gets freed?
Before getting started, it's worth pointing out one thing: because you return char*, SWIG's normal string typemaps are used to produce a Python string.
Having said that, let's understand what the currently generated code looks like. We can start our investigation with the following SWIG interface definition to experiment with:
%module test

%inline %{
struct foobar {
};
%}

%extend foobar {
    char *surface;
}
If we run something like this through SWIG we'll see a generated function which wraps your _surface_get code, something like this:
SWIGINTERN PyObject *_wrap_foobar_surface_get(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
  PyObject *resultobj = 0;
  foobar *arg1 = (foobar *) 0 ;
  void *argp1 = 0 ;
  int res1 = 0 ;
  PyObject * obj0 = 0 ;
  char *result = 0 ;

  if (!PyArg_ParseTuple(args,(char *)"O:foobar_surface_get",&obj0)) SWIG_fail;
  res1 = SWIG_ConvertPtr(obj0, &argp1,SWIGTYPE_p_foobar, 0 | 0 );
  if (!SWIG_IsOK(res1)) {
    SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "foobar_surface_get" "', argument " "1"" of type '" "foobar *""'");
  }
  arg1 = reinterpret_cast< foobar * >(argp1);
  result = (char *)foobar_surface_get(arg1);
  resultobj = SWIG_FromCharPtr((const char *)result);
  /* result is never used again from here onwards */
  return resultobj;
fail:
  return NULL;
}
The thing to note here however is that the result of calling your getter is lost when this wrapper returns. That is to say that it isn't even tied to the lifespan of the Python string object that gets returned.
So there are several ways we could fix this:
One option would be to ensure that the generated wrapper calls delete[] on the result of calling your getter, after the SWIG_FromCharPtr has happened. This is exactly what %newobject does in this instance. (See below).
Another alternative would be to keep the allocated buffer alive between calls, probably in some thread-local storage, and track its size to minimise allocations.
Alternatively we could use some kind of RAII-based object to own the temporary buffer and make sure it gets released. (We could even do something neat with operator void* if we wanted.)
If we change our interface to add %newobject like so:
%module test

%inline %{
struct foobar {
};
%}

%newobject surface;

%extend foobar {
    char *surface;
}
Then we see that our generated code now looks like this:
// ....
result = (char *)foobar_surface_get(arg1);
resultobj = SWIG_FromCharPtr((const char *)result);
delete[] result;
We can see this in the real generated code on GitHub too, so this isn't the bug you're looking for.
Typically for C++ I'd lean towards the RAII option. And as it happens there's a neat way to do this from both a SWIG perspective and a C++ one: std::string. So we can fix your leak in a simple and clean way just by doing something like this:
%include <std_string.i> /* If you don't already have this... */

%extend xxx_t {
    std::string surface;
}

%{
std::string xxx_t_surface_get(xxx *n) {
    return std::string(n->surface, n->length);
}
%}
(You'll need to change the setter to match too, unless you made it const so there is no setter.)
The thing about this, though, is that it still makes two allocations for the same output: first the std::string object makes one allocation, and then a second allocation occurs for the Python string object. That's all for a buffer that already exists in C++ anyway. So while this change would be sufficient and correct to fix the leak, you can go further and write a version that avoids the redundant copy:
%extend xxx_t {
    PyObject *surface;
}

%{
PyObject *xxx_t_surface_get(xxx *n) {
    return SWIG_FromCharPtrAndSize(n->surface, n->length);
}
%}

C++ to communicate to Python function

I am new to C++.
I created a DLL which contains a class and functions, and the return type of each function is PyObject* (a Python object). Now I want to write a C++ application which loads the DLL dynamically using the LoadLibrary function.
I was able to execute it by adding the project to the same solution and adding a reference to the DLL.
I am able to load the DLL, but when I call a function it returns a PyObject*. How do I store a return value of type PyObject* in C++?
You should take a look at Python's documentation on the Concrete Objects Layer. Basically you have to convert the PyObject into a C++ type using a function of the form Py<Type>_As<Type>(PyObject* obj), e.g. PyLong_AsLong, where <Type> is the concrete type you want to retrieve.
The API assumes you know which function to call. But, as stated in the docs, you can check the type before use:
...if you receive an object from a Python program and you are not sure that it has the right type, you must perform a type check first; for example, to check that an object is a dictionary, use PyDict_Check().
Here is an example to convert a PyObject into long:
PyObject* some_py_object = /* ... */;
long as_long(PyLong_AsLong(some_py_object));
Py_DECREF(some_py_object);
Here is another, more complicated example converting a Python list into a std::vector:
PyObject* some_py_list = /* ... */;
// assuming the list contains long
std::vector<long> as_vector(PyList_Size(some_py_list));
for(size_t i = 0; i < as_vector.size(); ++i)
{
    // PyList_GetItem returns a borrowed reference: do not Py_DECREF it
    PyObject* item = PyList_GetItem(some_py_list, i);
    as_vector[i] = PyLong_AsLong(item);
}
Py_DECREF(some_py_list);
A last, more complicated example, to parse a Python dict into a std::map:
PyObject* some_py_dict = /* ... */;
// assuming the dict uses long as keys and contains strings as values
std::map<long, std::string> as_map;
// first get the keys (PyDict_Keys returns a new reference)
PyObject* keys = PyDict_Keys(some_py_dict);
size_t key_count = PyList_Size(keys);
// loop on the keys and get the values
for(size_t i = 0; i < key_count; ++i)
{
    // both of these are borrowed references: do not Py_DECREF them
    PyObject* key = PyList_GetItem(keys, i);
    PyObject* item = PyDict_GetItem(some_py_dict, key);
    // add to the map
    as_map.emplace(PyLong_AsLong(key), PyString_AsString(item));
}
Py_DECREF(keys);
Py_DECREF(some_py_dict);

Writing to new Python buffer interface

I have implemented the new Python buffer interface in C++ as outlined here:
https://docs.python.org/2/c-api/buffer.html
I have implemented my getbuffer function, which fills in the Py_buffer struct:
template<typename T>
static int getbuffer(PyObject *obj, Py_buffer *view, int flags)
{
    (void)flags;  /* unused */
    view->buf = (void*)(_Cast<T>(obj)->getbuffer());
    view->obj = NULL;  /* note: the protocol normally expects this set to obj, with Py_INCREF */
    view->len = _Cast<T>(obj)->getbuffersize();
    view->itemsize = 1;
    view->readonly = _Cast<T>(obj)->getreadonly();
    view->ndim = 0;
    view->format = NULL;
    view->shape = NULL;
    view->strides = NULL;
    view->suboffsets = NULL;
    view->internal = NULL;
    return 0;
}
I am creating my Python buffer object in Python and handing it to C++, so I get a PyObject* along with my Py_buffer. My question is: how am I supposed to write to and resize this buffer from C++? I can get access to the pointer directly, and a size. But if it's a newly created buffer, how do I tell it how much space I need? There does not seem to be any resize function for me to call.
I can use int result = PyBuffer_FromContiguous(&m_view, const_cast<void*>(data), pySize, 'A');
to add data to my buffer, but the buffer must already have the correct size or it won't write. I don't think this is the correct way to use it anyway.
Cython is not an option.
You shouldn't resize the Py_buffer directly, since it is just an interface to the data of a PyObject.
Instead, use PyByteArray_Resize() (or possibly _PyString_Resize()) on the underlying PyObject.

How can I create python custom types from C++ using native Python/C API?

The situation is described below.
I have defined a new Python type named "Ex1":
typedef struct {
    PyObject_HEAD
    PyObject * int_id;
    int * value;
} Ex1;
With this type in place, and all the appropriate methods generated and validated in the Python interpreter (it works pretty well), I want to be able to create a Python object of the new Ex1 type from the C++ backend. A typical structure of what I need is:
int main()
{
    // Create Ex1 Object.
    Ex1 Example;
    // Call a Python C-API method to register the Ex1 object with the Python interpreter.
    // Is there any function in the Python API to perform this task?
}
Actually I managed to solve this problem using the Python docs:
https://docs.python.org/2/extending/newtypes.html
First of all it is necessary to define the appropriate methods as described in the Python docs (1). Assuming the PyType created has two attributes (varp1 and varp2):
PyObject * create_Ex1(vartype1 var1, vartype2 var2)
{
    PyObject * pInst = PyObject_CallObject((PyObject *)&Ex1Type, NULL);
    if (pInst == NULL)
        return NULL;  /* construction failed */
    ((Ex1*)pInst)->varp1 = var1;
    ((Ex1*)pInst)->varp2 = var2;
    return pInst;
}
and
static int Ex1_init(Ex1 *self, PyObject *args, PyObject *kwds)
{
    // On init return -1 if error
    self->varp1 = NULL;
    self->varp2 = NULL;
    return 0;
}
This is registered in the static PyTypeObject Ex1Type, as described in the Python docs (1).
Then the object is created and initialized using this line:
PyObject* Ex1_obj = create_Ex1(var1, var2);

Boost::Python: How do I expose a dynamic non-object array to a PyBuf?

I'm working on a Computer Vision system with OpenCV in C++. I wrote a small GUI for it by using Boost::Python and PyQT4. Since I don't want to introduce QT to the C++ project, I need a way to expose Mat::data (an unsigned char * member) to Python in order to create a QImage there.
First I tried it like this:
class_<cv::Mat>("Mat", init<>())
    .add_property("data_", make_getter(&Mat::data));
but then I got this in Python: "TypeError: No to_python (by-value) converter found for C++ type: unsigned char*"
I couldn't write a converter for it because a PyBuf of course needs to know its size.
So my next approach was trying to create a proxy object like this:
struct uchar_array {
    uchar *data;
    size_t size;
    bool copied;
    static const bool debug = true;

    // copy from byte array
    uchar_array(uchar *ptr, size_t size, bool copy) {
        this->size = size;
        this->copied = copy;
        if(copied) {
            data = new uchar[size];
            memcpy(data, ptr, size);
        } else {
            data = ptr;
        }
        if(debug) LOG_ERR("init %d bytes in #%p, mem #%p", size, this, data);
    }

    PyObject *py_ptr() {
        if(debug) LOG_ERR("py_ptr");
        return boost::python::incref(PyBuffer_FromMemory(data, size));
    }

    ~uchar_array() {
        if(debug) LOG_ERR("~uchar_array #%p", this);
        if(copied) {
            if(debug) LOG_ERR("free #%p, mem #%p", this, data);
            delete [] data;
        }
    }
};
And exposing this via a non-member method:
uchar_array *getMatData(Mat &mat) {
    size_t size = mat.rows * mat.cols * mat.elemSize();
    uchar_array *arr = new uchar_array(mat.data, size, true);
    return arr;
}

class_<cv::Mat>("Mat", init<>())
    .def("data", getMatData, with_custodian_and_ward_postcall<1, 0, return_value_policy<manage_new_object> >());

class_<uchar_array, shared_ptr<uchar_array> >("uchar_array", no_init)
    .def("ptr", &uchar_array::py_ptr);
This works and gets me the buffer into Python, but there are two problems with this approach:
I now have to use mat.data().ptr(), it would be nicer to just do mat.data
When doing mat.data().ptr(), it seems the temporary uchar_array gets destructed immediately after ptr() is called, freeing the memory while I still want to use it.
I did several experiments with custodian_and_ward and other policies, but got to a point where I stopped understanding what was going on.
So, could anyone please tell me: What's the preferred way to export an unsigned char * to a PyBuf? In two variants, if possible: allocated for Python so should be freed by Python or as internal pointer where C++ frees it.
char* buffers are not really Python friendly. In my projects (which are not performance sensitive) I would use a std::vector or std::string, depending on what the buffer is intended to contain. Both of these are nicely Python friendly.
If you are not able to alter the underlying data structure, you can use add_property with a pair of getter and setter functions that convert the data to a more convenient structure.
