Garbage collection and custom getter in SWIG - python

I'm not the author, but there's a public software package I use that seems to be leaking memory (Github issue). I'm trying to figure out how to patch it to make it work correctly.
To narrow the problem down, there's a struct, call it xxx_t. First %extend is used to make a member of the struct available in Python:
%extend xxx_t {
char *surface;
}
Then there's a custom getter. What exactly it does here isn't important except that it uses new to create a char*.
%{
char* xxx_t_surface_get(xxx *n) {
char *s = new char [n->length + 1];
memcpy (s, n->surface, n->length);
s[n->length] = '\0';
return s;
}
%}
Currently the code has this line to handle garbage collection:
%newobject surface;
This does not seem to work as expected. %newobject xxx_t::surface; also doesn't work. If I replace it with %newobject xxx_t_surface_get; that doesn't work because the getter function is escaped (inside %{ ... %}).
What is the right way to tell SWIG about the char* so it gets freed?

Before getting started, it's worth pointing out one thing: because you return char*, the wrapper ends up using SWIG's normal string typemaps to produce a Python string.
Having said that let's understand what the code that currently gets generated looks like. We can start our investigation with the following SWIG interface definition to experiment with:
%module test
%inline %{
struct foobar {
};
%}
%extend foobar {
char *surface;
}
If we run something like this through SWIG we'll see a generated function which wraps your _surface_get code, something like this:
SWIGINTERN PyObject *_wrap_foobar_surface_get(PyObject *SWIGUNUSEDPARM(self), PyObject *args) {
PyObject *resultobj = 0;
foobar *arg1 = (foobar *) 0 ;
void *argp1 = 0 ;
int res1 = 0 ;
PyObject * obj0 = 0 ;
char *result = 0 ;
if (!PyArg_ParseTuple(args,(char *)"O:foobar_surface_get",&obj0)) SWIG_fail;
res1 = SWIG_ConvertPtr(obj0, &argp1,SWIGTYPE_p_foobar, 0 | 0 );
if (!SWIG_IsOK(res1)) {
SWIG_exception_fail(SWIG_ArgError(res1), "in method '" "foobar_surface_get" "', argument " "1"" of type '" "foobar *""'");
}
arg1 = reinterpret_cast< foobar * >(argp1);
result = (char *)foobar_surface_get(arg1);
resultobj = SWIG_FromCharPtr((const char *)result);
/* result is never used again from here onwards */
return resultobj;
fail:
return NULL;
}
The thing to note here however is that the result of calling your getter is lost when this wrapper returns. That is to say that it isn't even tied to the lifespan of the Python string object that gets returned.
So there are several ways we could fix this:
One option would be to ensure that the generated wrapper calls delete[] on the result of calling your getter, after the SWIG_FromCharPtr has happened. This is exactly what %newobject does in this instance. (See below).
Another alternative would be to keep the allocated buffer between calls, probably in thread-local storage, and track its size to minimise allocations.
Alternatively we could use some kind of RAII-based object to own the temporary buffer and make sure it gets freed. (We could even do something neat with operator void* if we wanted.)
If we change our interface to add %newobject like so:
%module test
%inline %{
struct foobar {
};
%}
%newobject surface;
%extend foobar {
char *surface;
}
Then we see that our generated code now looks like this:
// ....
result = (char *)foobar_surface_get(arg1);
resultobj = SWIG_FromCharPtr((const char *)result);
delete[] result;
We can see this in the real code from github too, so this isn't the bug that you're looking for.
Typically for C++ I'd lean towards the RAII option. And as it happens there's a neat way to do this from both a SWIG perspective and a C++ one: std::string. So we can fix your leak in a simple and clean way just by doing something like this:
%include <std_string.i> /* If you don't already have this... */
%extend xxx_t {
std::string surface;
}
%{
std::string xxx_t_surface_get(xxx *n) {
return std::string(n->surface, n->length);
}
%}
(You'll need to change the setter to match too though, unless you made it const so there is no setter)
The thing about this, though, is that it still makes two allocations for the same output: first the std::string object makes one, and then a second occurs for the Python string object. That's all for something where the buffer already exists in C++ anyway. So whilst this change would be sufficient and correct to solve the leak, you can also go further and write a version that does less redundant copying:
%extend xxx_t {
PyObject *surface;
}
%{
PyObject *xxx_t_surface_get(xxx *n) {
return SWIG_FromCharPtrAndSize(n->surface, n->length);
}
%}
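As a rough illustration of why the getter passes an explicit length, here is a small Python sketch that uses ctypes as a stand-in for the C++ side. The buffer contents and the length are made up for the example; ctypes.string_at copies a given number of bytes from an address into a new Python object, which is loosely analogous to handing (n->surface, n->length) to SWIG_FromCharPtrAndSize.

```python
import ctypes

# Hypothetical stand-in for the C++ side: one shared buffer in which the
# "surface" of a node is only the first `length` bytes, with no NUL after it.
shared = ctypes.create_string_buffer(b"surface-and-trailing-data")
length = 7

# Copy exactly `length` bytes into a new Python bytes object.
surface = ctypes.string_at(ctypes.addressof(shared), length)

# Without an explicit size, the copy runs on to the first NUL byte.
until_nul = ctypes.string_at(ctypes.addressof(shared))

print(surface)
print(until_nul)
```

Only one copy is made here, straight from the existing buffer into the Python object, which is the point of the last version above.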


swig macro $descriptor within helper function

Intro
My typical swig interface file is similar to the following:
%{ //COPY THIS BLOCK AS IS
#include <headers.h>
static CppClass* get_CppClass_from_swig_object(PyObject* obj)
{
void* self_obj = nullptr;
int ok = SWIG_Python_ConvertPtr(obj, &self_obj, SWIGTYPE_p_CppClass, 0);
if(!SWIG_IsOK(ok))
{
PyErr_SetString(PyExc_TypeError, "Object must be a CppClass");
return nullptr;
}
return reinterpret_cast<CppClass*>(self_obj);
}
static CppClass convert_to_CppClass(PyObject* py_obj)
{
CppClass* converted_ptr = get_CppClass_from_swig_object(py_obj);
if(converted_ptr==nullptr)
throw std::runtime_error("Python object is not a CppClass");
return CppClass(*converted_ptr);
}
%}
%typemap(in) std::vector<CppClass>& (std::vector<CppClass> temp) {
try{
temp = SequenceConverter::to_vector<CppClass>($input, convert_to_CppClass);
$1 = &temp;
}catch(std::exception& e){
PyErr_SetString(PyExc_RuntimeError, e.what());
SWIG_fail;
}
}
%typemap(typecheck, precedence=SWIG_TYPECHECK_CPPCLASS_VECTOR) std::vector<CppClass>& {
$1 = 0;
if(PyTuple_Check($input))
$1 = 1;
else if(PyList_Check($input))
$1 = 1;
}
class CppClass
{
public:
CppClass();
CppClass(const CppClass& other);
//other methods
};
but I would like to avoid explicitly using SWIGTYPE_p_CppClass within get_CppClass_from_swig_object.
As is, it is not possible to use the $descriptor(CppClass) swig macro as I would like to, because the %{ ... %} block is copied as-is rather than interpreted by swig, and so the $descriptor macro would not be expanded. On the other hand, if I removed the % and used a { ... } block, swig would try to wrap the get_CppClass_from_swig_object and convert_to_CppClass functions rather than simply defining them so they can be used in the typemap.
Question
How can I change my file structure and allow using $descriptor macro within the conversion helpers?
TL;DR
%{...%} blocks are neither preprocessed nor wrapped by swig
{...} blocks are both preprocessed and wrapped (but small pieces can be prevented from swig preprocessing with a preceding %)
How can I make swig preprocess but not wrap a piece of code?
I don't think there's a way to make the contents of %{...%} be preprocessed, but not wrapped - for most of what the preprocessor does it relies on a typemap to actually be instantiated to populate the substitution information (although $descriptor could still work and I've wished for it in the past).
My usual solution is to pass the SWIG type info as an argument into functions like that, for example:
%{ //COPY THIS BLOCK AS IS
#include <headers.h>
static CppClass* get_CppClass_from_swig_object(PyObject* obj, swig_type_info *ty)
{
//... use ty instead of $descriptor here
Which means that when you use get_CppClass_from_swig_object in a typemap all you need to do is use $1_descriptor or $descriptor to get the correct value for the second argument.

Returning arguments in SWIG/Python

According to the SWIG docs and the marvelous explanation at "SWIG in typemap works, but argout does not" by @Flexo, the argout typemap turns reference arguments into return values in Python.
I have a scenario in which I pass a dict, which is converted to an unordered_map in typemap(in) and then gets populated in the C++ lib. Stepping through the code, I can see that the mapping has changed after the C++ call returns, so I wonder why it isn't possible to just convert the unordered_map back in place into the dict that was passed. Or is it possible by now and I'm just overlooking something?
Thanks!
I am a little confused as to what exactly you are asking, but my understanding is:
You have an "in" typemap to convert a Python dict to a C++ unordered_map for some function argument.
The function then modifies the unordered_map.
After completion of the function, you want the Python dict updated to the current unordered_map, and are somehow having trouble with this step.
Since you know how to convert a dict to an unordered_map, I assume you basically do know how to convert the unordered_map back to the dict using the Python C-API, but are somehow unsure into which SWIG typemap to put the code.
So, under these assumptions, I'll try to help:
"the argout typemap turns reference arguments into return values in Python". Not really, although it is mostly used for this purpose. An "argout" typemap simply supplies code to deal with some function argument (internally referred to as $1) that is inserted into the wrapper code after the C++ function is called. Compare this with an "in" typemap that supplies code to convert the supplied Python argument $input to a C++ argument $1, which is obviously inserted into the wrapper code before the C++ function is called.
The original passed Python dict can be referred to in the "argout" typemap as $input, and the modified unordered_map as $1 (see the SWIG docs linked above).
Therefore, all you need to do is write an "argout" typemap for the same argument signature as the "in" typemap that you already have, and insert the code (using the Python C-API) to update the contents of the Python dict ($input) to reflect the contents of the unordered_map ($1).
Note that this is different from the classical use of "argout" typemaps, which would typically convert the $1 back to a new Python dict and append this to the Python return object, which you can refer to by $result.
I hope this helps. If you are still stuck at some point, please edit your question to make clear at which point you are having trouble.
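Seen from the Python side, the two "argout" styles differ only in where the caller finds the result. The sketch below uses two pure-Python stand-ins for the wrapped function (hypothetical names, not generated by SWIG) to show the calling conventions: one mutates the dict that was passed in, the other leaves it alone and returns a new one.

```python
def method_in_place(d):
    """Stand-in for a wrapper whose argout typemap updates $input in place."""
    d["BLACK"] = "#000000"  # the caller's dict is modified; nothing is returned


def method_returning_new(d):
    """Stand-in for a wrapper whose argout typemap builds a new $result."""
    out = dict(d)  # the caller's dict is left untouched
    out["BLACK"] = "#000000"
    return out


colors = {"WHITE": "#FFFFFF"}
method_in_place(colors)
print(sorted(colors))      # the original dict now has both keys

original = {"WHITE": "#FFFFFF"}
updated = method_returning_new(original)
print(sorted(original))    # unchanged
print(sorted(updated))     # both keys
```

Which style to pick is a design choice: in-place updates match the C++ signature more closely, while returning a new dict avoids surprising callers who did not expect their argument to change.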
I am well aware that the user has already solved their issue, but here is a solution anyway. Some validation of the input could be added to reject dictionaries that contain non-string keys or values.
Header file
// File: test.h
#pragma once
#include <iostream>
#include <string>
#include <unordered_map>
void method(std::unordered_map<std::string, std::string>* inout) {
for( const auto& n : (*inout) ) {
std::cout << "Key:[" << n.first << "] Value:[" << n.second << "]\n";
}
(*inout)["BLACK"] = "#000000";
};
Interface file
// File : dictmap.i
%module dictmap
%{
#include "test.h"
%}
%include "typemaps.i"
%typemap(in) std::unordered_map<std::string, std::string>* (std::unordered_map<std::string, std::string> temp) {
    PyObject *key, *value;
    Py_ssize_t pos = 0;
    $1 = &temp;
    temp = std::unordered_map<std::string, std::string>();
    /* Copy every key/value pair of the Python dict into the C++ map */
    while (PyDict_Next($input, &pos, &key, &value)) {
        (*$1)[PyString_AsString(key)] = std::string(PyString_AsString(value));
    }
}
%typemap(argout) std::unordered_map<std::string, std::string>* {
    Py_XDECREF($result); /* drop the default return value before replacing it */
    $result = PyDict_New();
    for( const auto& n : *$1) {
        PyObject *val = PyString_FromString(n.second.c_str());
        PyDict_SetItemString($result, n.first.c_str(), val);
        Py_DECREF(val); /* PyDict_SetItemString does not steal the reference */
    }
}
%include "test.h"
Test
import dictmap
out = dictmap.method({'WHITE' : '#FFFFFF'})
Output is an updated dictionary
In [2]: out
Out[2]: {'BLACK': '#000000', 'WHITE': '#FFFFFF'}

Python C-extension class with a dynamic size

I'm trying to write a C extension that contains a Python class.
The class takes a filename as the fname parameter of its constructor, then loads that file into memory and stores it as a config.
Here is what I have:
typedef struct {
PyObject_HEAD
XmlConfig m_cfg;
} PyAgent;
static int PyAgent_init(PyAgent *self, PyObject *args, PyObject *kwds) {
    const char *fname;
    if (!PyArg_ParseTuple(args, "s", &fname)) {
        return -1; /* tp_init returns int: -1 signals an error, not NULL */
    }
    self->m_cfg.Load(fname);
    return 0;
}
static PyTypeObject PyAgentType = {
PyObject_HEAD_INIT(NULL)
0, /* ob_size */
"classify.Agent", /* tp_name */
...
}
...
I get a segmentation fault when I try to load the file. As I understand it, this happens because the size of the PyAgent struct grows when memory is allocated for the file.
I've tried to use PyObject_VAR_HEAD macro but it doesn't help.
Can you give a clue or working similar example?
Maybe self->m_cfg needs to be initialized first? You are calling a C++ method on it, but didn't call its constructor first. The struct is allocated by C code, not C++, so it doesn't know it needs to construct this field.
There are ways to manually call the constructor (something like new(&self->m_cfg) XmlConfig() if I remember correctly; the destructor then has to be called explicitly as well, in tp_dealloc), but the easiest is probably to have a field XmlConfig *cfg instead, and to allocate and free it as needed with the usual C++ operators new and delete (the latter in tp_dealloc).

How do I invoke a method on a C++ class pointer with swig wrappers?

I'm using SWIG to wrap C++ code for use within a Python testing framework. My problem is that I'm getting a pointer to an instance of a class that I then need to invoke methods on. E.g., within my swig file example.i:
iExample* getMyClassInstance();
...
class iExample
{
public:
virtual void somePureVirtualMethod() = 0;
// ...
};
Now, in python, if I had the class, I could just call the method
myClassInstance.somePureVirtualMethod()
However, I don't actually have an instance of the class, of course. I have an opaque pointer generated from SWIG. How do I use it? Obviously in Python I can't do
myClassInstancePtr = example.getMyClassInstance()
myClassInstancePtr->somePureVirtualMethod()
I tried using cpointer.i or pointer.i in swig to generate pointer functions, but that's no good, because it's trying to create copies of the class. This can't even compile with an interface with pure virtual methods, and even if I wasn't using pure virtual methods, I don't want to create a copy of the class, I just want to invoke something on it!
SWIG can handle this just fine. Make sure you define the interface in SWIG and then it won't be opaque. Here's a working example:
%module x
%inline %{
// Define the interface.
struct iExample
{
virtual int somePureVirtualMethod() = 0;
};
iExample* getMyClassInstance();
%}
// Implementation, not exposed to Python
%{
struct Internal : public iExample
{
int somePureVirtualMethod() { return 5; }
};
iExample* getMyClassInstance() { return new Internal(); }
%}
Demo:
>>> import x
>>> i = x.getMyClassInstance()
>>> i.somePureVirtualMethod()
5
However, this implementation will leak an Internal instance. You may want to implement a way to free it automatically. One way is to use %newobject and define a virtual destructor. Python will then delete the object when there are no more references to it.
%module x
%newobject getMyClassInstance;
%inline %{
struct iExample
{
virtual ~iExample() {};
virtual int somePureVirtualMethod() = 0;
};
iExample* getMyClassInstance();
%}
// Implementation
%{
#include <iostream>
struct Internal : public iExample
{
int somePureVirtualMethod() { return 5; }
~Internal() { std::cout << "destroyed" << std::endl; }
};
iExample* getMyClassInstance() { return new Internal(); }
%}
Demo:
>>> import x
>>> i = x.getMyClassInstance()
>>> i.somePureVirtualMethod()
5
>>> i=2 # reassign i
destroyed # garbage-collected
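The "destroyed" line appears as soon as the last reference goes away because CPython frees an object when its reference count drops to zero. A minimal pure-Python sketch of the same behaviour, assuming CPython and using a hypothetical Internal class with __del__ standing in for the C++ destructor:

```python
events = []


class Internal:
    """Pure-Python stand-in for the wrapped C++ object (hypothetical)."""

    def somePureVirtualMethod(self):
        return 5

    def __del__(self):
        # Plays the role of ~Internal() in the C++ example.
        events.append("destroyed")


i = Internal()
value = i.somePureVirtualMethod()
i = 2  # rebinding drops the only reference; CPython runs __del__ right away
print(value, events)
```

Other Python implementations may run the finalizer later, so code that needs deterministic cleanup should not rely on this timing.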
The simplest answer I've found is to edit your example.i to add in some helper functions to do dereferencing. In your swig file example.i:
%{
...
// Helper function to dereference pointers within python
template <typename T>
T& dereference(T* ptr)
{
return *ptr;
}
...
%}
...
// Make every version of the templated functions we'll need
template <typename T> T& dereference(T* ptr);
%template(dereferencePtr_iExample) dereference<iExample>;
Now in python
myClassInstance = example.dereferencePtr_iExample(example.getMyClassInstance())
myClassInstance.somePureVirtualMethod()
I imagine this method should work generically for other languages like perl as well, and you don't have to screw around with SWIG typemaps.

How to get current function name in Python C-API?

I implemented a bunch of functions and they are dispatched from the same C function called by the Python interpreter:
PyObject *
CmdDispatch(PyObject *self, PyObject *args, PyObject *kwargs)
Unexpectedly, self is NULL, and I need to get the function name currently being called. Is there any way to get this information?
I have dozens of functions which are all going through this routine. This command processes all of the options into a C++ map and passes it along to the implementation of each command.
Update:
http://docs.python.org/extending/extending.html#a-simple-example specifically says "The self argument points to the module object for module-level functions; for a method it would point to the object instance.", but I am getting passed null when linking against python 2.6.
I've been trying to solve a very similar problem.
The conclusion I've come to is that there is no way to determine the name of the current function or its caller(s) at the level of the Python C API. The reason is that the Python interpreter puts only pure Python functions (implemented in Python itself) on the call stack. Functions implemented in C, whether or not they are registered in a module's method table, are not put on the Python stack, so it's not possible to find them by inspecting the stack frames.
Here is a quick example in Python illustrating what I wanted to achieve (I assume Juan is asking for similar behaviour):
import sys
def foo():
    print('foo:', sys._getframe(0).f_code.co_name)
def bar():
    print('bar:', sys._getframe(0).f_code.co_name)
foo()
bar()
Here is a close equivalent of this example (based on the Python 3 docs) but implemented using the Python C API:
// Based on example from Python 3.2.1 documentation, 5.4. Extending Embedded Python
// http://docs.python.org/release/3.2.1/extending/embedding.html#extending-embedded-python
//
#include <Python.h>
#include <frameobject.h>
static void foo()
{
PyThreadState * ts = PyThreadState_Get();
PyFrameObject* frame = ts->frame;
while (frame != 0)
{
char const* filename = _PyUnicode_AsString(frame->f_code->co_filename);
char const* name = _PyUnicode_AsString(frame->f_code->co_name);
printf("foo: filename=%s, name=%s\n", filename, name);
frame = frame->f_back;
}
}
static void bar()
{
PyRun_SimpleString(
"import sys\n"
"print(\"bar: filename=%s, name=%s\" % (sys._getframe(0).f_code.co_filename, sys._getframe(0).f_code.co_name))"
);
}
static PyObject* emb_numargs(PyObject *self, PyObject *args)
{
foo();
bar();
PyRun_SimpleString(
"import sys\n"
"print(\"emb_numargs: filename=%s, name=%s\" % (sys._getframe(0).f_code.co_filename, sys._getframe(0).f_code.co_name))"
);
return PyLong_FromLong(0);
}
static PyMethodDef EmbMethods[] = {
{"numargs", emb_numargs, METH_VARARGS, "Return number 0"},
{NULL, NULL, 0, NULL}
};
static PyModuleDef EmbModule = {
PyModuleDef_HEAD_INIT, "emb", NULL, -1, EmbMethods,
NULL, NULL, NULL, NULL
};
static PyObject* PyInit_emb(void)
{
return PyModule_Create(&EmbModule);
}
int main(int argc, char* argv[])
{
PyImport_AppendInittab("emb", &PyInit_emb);
Py_Initialize();
PyRun_SimpleString(
"import emb\n"
"print('This is Zero: ', emb.numargs())\n"
);
Py_Finalize();
return 0;
}
I hope this completes Ned's answer too.
The Python API isn't built to tell you what function it is calling. You've created a function and it is calling it; the API assumes you know what function you've written. You'll need to create a small wrapper function for each of your Python functions. The best way to do this is to register your one C function as one Python function that takes a string as its first argument. Then, in your Python layer, create as many Python functions as you need, each invoking your C function with the proper string argument identifying what function you really want to call.
Another alternative is to rethink the structure of your C code, and have as many functions as you need, each of which invokes your common helper code to process options, etc.
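The first suggestion can be sketched in a few lines of Python. Here cmd_dispatch is a pure-Python stand-in for the single C entry point, and the command names are made up for illustration; in real code the stand-in would be replaced by the function exported from the extension module.

```python
from functools import partial


def cmd_dispatch(name, *args, **kwargs):
    """Stand-in for the one C function; the command name arrives first."""
    return (name, args, kwargs)


# One thin Python-level function per command, each bound to its own name.
move = partial(cmd_dispatch, "move")
resize = partial(cmd_dispatch, "resize")

print(move(1, 2, speed=3))
print(resize(640, 480))
```

The C side then only has to read its first argument to know which command was requested, and the option processing stays in one place.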
Note: error checking of the API calls is omitted for clarity.
This example inserts a new function directly into Python's __builtin__ module, so that it can be called without the module.function form. Just change mymodule to any other module you wish.
PyObject* py_cb(PyObject *self, PyObject *args)
{
    /* self is the user pointer that was attached when the function was created */
    const char *name = (const char *) PyCObject_AsVoidPtr(self);
    printf("%s\n", name);
    Py_RETURN_NONE;
}
int main(int argc, char *argv[])
{
    PyObject *mod, *modname, *dict, *fnc, *usrptr;
    const char *mymodule = "__builtin__";
    PyMethodDef *m;
    const char *method = "hello_python";
    Py_Initialize();
    mod = PyImport_ImportModule(mymodule);
    modname = PyString_FromString(mymodule);
    dict = PyModule_GetDict(mod);
    m = (PyMethodDef *) calloc(1, sizeof(PyMethodDef));
    m->ml_name = strdup(method);
    m->ml_meth = py_cb;
    m->ml_flags = METH_VARARGS; /* py_cb takes (self, args) */
    usrptr = PyCObject_FromVoidPtr("I'm am the hello_python!", NULL);
    fnc = PyCFunction_NewEx(m, usrptr, modname);
    PyDict_SetItemString(dict, method, fnc);
...
When a Python script executes hello_python, the py_cb extension function prints:
I'm am the hello_python!
In this example self carries a const char *, but it can just as well carry a real pointer such as a library context; it is only a matter of changing it to something more interesting.
I don't know if it can be done directly from the C API. At worst, you could call the traceback module from C to get the name of the caller.
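What the traceback module would report can be tried out in plain Python first. The sketch below uses hypothetical helper names; note that it only sees Python-level frames, so from inside a C function the innermost entry would be the Python code that called into C, not the C function itself.

```python
import traceback


def caller_name():
    # extract_stack() lists frames oldest first: [-1] is caller_name itself,
    # [-2] is the Python function that called it.
    return traceback.extract_stack()[-2].name


def hello_python():
    return caller_name()


print(hello_python())
```

From C, the same call could be made by importing the module with PyImport_ImportModule("traceback") and invoking extract_stack through PyObject_CallMethod.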
