C++ and Python 3 memory leak using PyArg_ParseTuple - python

I'm not a C++ developer so i don't really know what I'm doing. Unfortunately I have to debug the following code but I'm not making any progress.
static PyObject* native_deserialize(PyObject *self, PyObject *args){
PyObject * pycontent;
int len;
PyObject * props = NULL;
PyArg_ParseTuple(args, "|SiO", &pycontent, &len, &props);
RecordParser reader("onet_ser_v0");
TrackerListener* listener;
listener = new TrackerListener(props);
#if PY_MAJOR_VERSION >= 3
reader.parse((unsigned char*)PyBytes_AsString(pycontent), len, *listener);
#else
reader.parse((unsigned char*)PyString_AsString(pycontent), len, *listener);
#endif
return listener->obj;
}
Here is the python that calls that code:
clsname, data = pyorient_native.deserialize(content,
content.__sizeof__(), self.props)
This code creates a nasty memory leak. In fact, when I run this code, it kills my memory within 20 minutes.
I looked at the code but can't find the problem in the C++.
How can I prevent rogue C++ code from killing my Python code? Is there a way to flag C++ code from within python to be recycled regardless whether the C++ created a memory leak?
Is there a way I can force the memory to be garbage collected in C++. How can I find the exact leak in C++ by running python?
My biggest issue is understanding Py_XDECREF and Py_XINCREF and the rest of the reference counting macros. I'm reading the docs but obviously I'm missing some context because I can't figure out where and when to use these. I have loads of respect for C++ developers. Their jobs seem unnecessarily difficult :(

It turns out the solution was to Py_XDECREF the reference count for al the created objects. I still don't know exactly how, why and were as many of this still doesn't make sense to me.
I found this page that points out some of the pitfalls of these macros.
https://wingware.com/psupport/python-manual/2.3/ext/node22.html
There is the documentation but that wasn't very helpful.
https://docs.python.org/3/c-api/refcounting.html
Maybe someone can share something else that is easier to consume for us non C++ peoplez?

Related

Handling objects from a C library in python code

I would like to implement a C / C++ library from a .dll file into a Python script to control a piece of i/o equipment called ClipX by HBM (in case anyone needs help with this in the future).
The manufacturer gives an example C implementation, and an example C++ implementation. In the C example, the Connect() function returns some pointer, which is used in subsequent read/write functions. In the C++ example, a ClipX class is used to establish the connection, and read/write functions are methods in that class. I've simplified the code for the purposes of this question.
Basically, I want to connect() to the device, and at some later point read() from it. From what I've read, it seems like Cython would be a good way to wrap connect() and read() as separate functions, and import them as a module into Python. My questions are:
For the C implementation, would I be able to pass MHandle pointer back to Python, after connecting, for later use (i.e. calling the read function)? Would the pointer even have any meaning, being used later in a different function call?
For the C++ implementation, could the dev object be passed to the Python code, to be later passed back for a Read()? Can you do that with arbitrary objects?
I am a mechanical engineer, sorry if this is gibberish or wildly uninformed. Any guidance is very much appreciated.
C Code:
/*From .h file*/
----------------------------------------------------
struct sClipX {
void *obj;
};
typedef struct sClipX * MHandle;
ClipX_API MHandle __stdcall Connect(const char *);
----------------------------------------------------
/*End .h file*/
int main()
{
const char * IP = "172.21.104.76";
MHandle m=Connect(IP);
Read(m, 0x4428);
}
C++ Code:
int main(){
ClipX dev = ClipX();
dev.Connect("172.21.104.76");
dev.Read(0x4428);
C++ functions are callable from C if you declare them as extern "C". This is related to name mangling
The Python interpreter can be extended with C functions. Read carefully the Extending and Embedding the Python Interpreter chapter.
Be careful about C++ exceptions. You don't want them to cross the Python interpreter code. So any extern "C" C++ function called from Python should handle and catch exceptions raised by internal routines.
At last, be careful about memory management and garbage collection. P.Wilson old paper on Uniprocessor garbage collection techniques is relevant, at least for terminology and insights. Or read the GC handbook. Python uses a reference counting scheme and handles specially weak references. Be careful about circular references.
Be of course aware of the GIL in Python. Roughly speaking, you cannot have several threads doing Python things without precautions.
Serialization of device-related data would also be a concern, but you probably don't need it.
Most importantly, document well your code.
Tools like doxygen could help (perhaps with LaTeX or DocBook).
Use of course a good enough version control system. I recommend git. Also a good build automation tool.
My suggestion is to publish your C++ code as open source, e.g. on github or gitlab. You then could get useful code reviews and feedback.
If your hardware + software system is safety-critical, consider static program analysis techniques e.g. with Frama-C or Clang static analyzer or with your own GCC plugin. In a few months (end of 2020), you might try Bismon (read also this draft report).
I am definitely biased, but I do recommend trying some Linux distribution (e.g. Ubuntu or Debian) as your cross-development platform. Be aware that a lot of devices (including RaspBerryPi) are running some embedded Linux system, so the learning effort makes sense. Then read Advanced Linux Programming

Does python code using boost python in cpp make dynamic memory allocation?

In my code, python grammar is written in Cpp code using boost python.
It works well. But I didn't fully understand how it increase object size.
Even, I couldn't prove inner code.
So I worry about memory leak.
Because real code is very huge, system will be able to die.
Below code is simple example which same algorithm applied.
Could anyone tell me that this code make memory leak or how it increase object size?
using namespace boost::python;
dict get_name(){
int i;
dict school;
list class1;
for(i=0;i<10;i++){
class1.append(student[i]);
}
school["class1"] = class1;
return school;
}
This code might or might not leak depending of the held type of whatever student is mapping. But otherwise, no, you should not have to worry about a leak because the objects are wrapped by Python which will delete them whenever they are no longer referenced.

How to stop memory leaks when using `as_ptr()`?

Since it's my first time learning systems programming, I'm having a hard time wrapping my head around the rules. Now, I got confused about memory leaks. Let's consider an example. Say, Rust is throwing a pointer (to a string) which Python is gonna catch.
In Rust, (I'm just sending the pointer of the CString)
use std::ffi::CString;
pub extern fn do_something() -> *const c_char {
CString::new(some_string).unwrap().as_ptr()
}
In Python, (I'm dereferencing the pointer)
def call_rust():
lib = ctypes.cdll.LoadLibrary(rustLib)
lib.do_something.restype = ctypes.c_void_p
c_pointer = lib.do_something()
some_string = ctypes.c_char_p(c_pointer).value
Now, my question is about freeing the memory. I thought it should be freed in Python, but then ownership pops in. Because, as_ptr seems to take an immutable reference. So, I got confused about whether I should free the memory in Rust or Python (or both?). If it's gonna be Rust, then how should I go about freeing it when the control flow has landed back into Python?
Your Rust function do_something constructs a temporary CString, takes a pointer into it, and then drops the CString. The *const c_char is invalid from the instant you return it. If you're on nightly, you probably want CString#into_ptr instead of CString#as_ptr, as the former consumes the CString without deallocating the memory. On stable, you can mem::forget the CString. Then you can worry about who is supposed to free it.
Freeing from Python will be tricky or impossible, since Rust may use a different allocator. The best approach would be to expose a Rust function that takes a c_char pointer, constructs a CString for that pointer (rather than copying the data into a new allocation), and drops it. Unfortunately the middle part (creating the CString) seems impossible on stable for now: CString::from_ptr is unstable.
A workaround would to pass (a pointer to) the entire CString to Python and provide an accessor function to get the char pointer from it. You simply need to box the CString and transmute the box to a raw pointer. Then you can have another function that transmutes the pointer back to a box and lets it drop.

Const correctness of Python's C API

It seems that the Python C API is not consistent with the const correctness of character arrays. For example, PyImport_ImportFrozenModule accepts a char*, whereas PyImport_ImportModule accepts a const char*.
The implication of all this is that in my C++ application that I am writing with an embedded Python interpreter, I sometimes have to cast the string literal that I pass to a Python API call as just a char* (as opposed to const char*), and sometimes I don't. For example:
PyObject *os = PyImport_ImportModule("os"); // Works without the const_cast
PyObject *cwd = PyObject_CallMethod(os, const_cast<char*>("getcwd"), NULL); // Accepts char*, not const char*
If I don't do the const_cast<char*> (or (char*)) on the string literal, I get a compiler warning about casting string literals to char*.
Here are my questions:
Is there an advantage/reason to having some of the functions not take a const char* (and/or why would the Python API not be consistent in this)? My understanding is that if the function can take a string literal, it cannot change the char* so the const modifier would just be reinforcing this. I also believe that the const distinction is not as important for C (for which the API was written) than it is in C++ (correct me if I am wrong... my strength is python, not C/C++). Is the lack of "const correctness" of the Python API because it's simply not as important in C? (There is an old thread on the python mailing list from 2000 asking the same question, but it didn't seem to go anywhere and it is implied the reason might be due to some compilers not supporting const. Since many functions now have const char*, this doesn't seem to apply anymore)
Because my understanding of C++ is limited, I am unsure if I am going about casting string literals properly. The way I see it, I can either one of the following (I am currently doing the first):
// Method 1) Use const_cast<char*>
PyImport_ImportFrozenModule(const_cast<char*>("mymodule"));
// Method 2) Use (char*)
PyImport_ImportFrozenModule((char*) "mymodule");
// Method 3) Use char array
char mod[] = "mymodule";
PyImport_ImportFrozenModule(mod);
Which is the best method do use?
Update:
It looks like the Python3 branch is slowly trying to fix the const correctness issue. For example, the PyImport_ImportFrozenModule function I use as an example above now takes a const char* in Python 3.4, but there are still functions that take only a char*, such as PyLong_FromString.
Based on some mailing list conversations from python-dev, it looks like the initial API just simply wasn't created with const correctness in mind, probably just because Guido didn't think about it. Dating all the way back to 2002, someone asked if there was any desire to address that by adding const-correctness, complaining that it's a pain to always have to do this:
somefunc(const char* modulename, const char* key)
{
... PyImport_ImportModule(const_cast<char*>(modulename)) ...
Guido Van Rossum (the creator of Python) replied (emphasis mine):
I've never tried to enforce const-correctness before, but I've heard
enough horror stories about this. The problem is that it breaks 3rd
party extensions left and right, and fixing those isn't always easy.
In general, whenever you add a const somewhere, it ends up propagating
to some other API, which then also requires a const, which propagates
to yet another API needing a const, ad infinitum.
There was a bit more discussion, but without Guido's support the idea died.
Fast forward nine years, and the topic came up again. This time someone was simply wondering why some functions were const-correct, while others weren't. One of the Python core developers replied with this:
We have been adding const to many places over the years. I think the
specific case was just missed (i.e. nobody cared about adding const
there).
It seems that when it could be done without breaking backwards compatibility, const-correctness has been added to many places in the C API (and in the case of Python 3, in places where it would break backwards compatibility with Python 2), but there was never a real global effort to fix it everywhere. So the situation is better in Python 3, but the entire API is likely not const correct even now.
I'm don't think that the Python community has any preferred way to handle casting with calls that are not const-correct (there's no mention of it in the official C-API style guide), probably because there aren't a ton of people out there interfacing with the C-API from C++ code. I would say the preferred way of doing it from a pure C++ best-practices perspective would be the first choice, though. (I'm by no means a C++ expert, so take that with a grain of salt).
Is there an advantage/reason to having some of the functions not take a const char*?
No. Looks like an oversight in the library's design or, like you say, legacy issues. They could at least have made it consistent, though!
My understanding is that if the function can take a string literal, it cannot change the char* so the const modifier would just be reinforcing this.
Exactly. Their documentation should also specify that the function argument (or, rather, the argument's pointee) shall not be modified during the function call; alas it currently does not say this.
I also believe that the const distinction is not as important for C (for which the API was written) than it is in C++.
Well, not really, at least as far as I know.
The way I see it, I can either one of the following (I am currently doing the first)
(good)
Which is the best method do use?
Well the const_cast will at least make sure that you are only modifying the const-ness, so if you had to choose I'd go with that. But, really, I wouldn't be too bothered about this.

LIBSVM Python ctypes string function pointer segmentation faults

I've been porting a Python package that uses libsvm onto some production servers and ran into a strange segmentation fault which I traced to a ctypes function pointer. I'm trying to determine where the ctypes wrapper failed and if this is a distro specific problem or not.
The system I am running this on is a very clean virtual machine with almost nothing installed:
Solaris 5.11
amd64 pentium_pro+mmx pentium_pro pentium+mmx pentium i486 i386 i86
Python 2.7.2
Now for the problem description and how I narrowed to ctypes. In libsvm you can specify the print function by passing a void (*print_func)(const char *) pointer into the svm_set_print_string_function function. The default with a NULL pointer is to print to stdout. Now the interesting part is that the Python wrapper for libsvm (which works fine on a variety of other systems) makes such a function pointer when asking for quiet mode (no printing) via the following:
PRINT_STRING_FUN = CFUNCTYPE(None, c_char_p)
def print_null(s):
return
if argv[i] == "-q":
self.print_func = PRINT_STRING_FUN(print_null)
libsvm.svm_set_print_string_function(self.print_func)
When I set quiet mode libsvm accepts the function pointer but hangs after a few seconds when calling svm_train then seg faults. I tried making a void * argument function pointer and then casting it to a const char * function pointer with the same results, which means it wasn't the conversion from const char * to a PyStringObject.
Then I finally just wrote a C++ function to set the function pointer to a no-op in the library itself by:
void print_null(const char *) {}
void svm_set_print_null() {
svm_set_print_string_function(&print_null);
}
which worked as expected with no segmentation faults. This leads me to think that the ctypes is failing at some internal point of function pointer conversion. Looking through the ctypes source files hasn't revealed anything obvious to me though I haven't worked a lot with ctypes explicitly so it's difficult to narrow down where the bug might be.
I can use my library addition solution for now, but if I want to silently process the returns I would need to actually be able to pass a function pointer into libsvm. Plus it doesn't give me peace of mind about stability if I need to implement such workarounds without knowing what's the true root cause of the problem.
Has anyone else had problems with libsvm print functions on Solaris or specifically ctypes function pointers in Python on Solaris? I couldn't find anything searching online about either such problems with Solaris. I planning on playing around with library calls and making some function processing libs to find the exact boundaries of failure, but someone else's input might save me a day or two of debug testing.
UPDATE
The problem is reproducible on the 32bit version of Solaris 5.11 as well.

Categories