Memory deallocation from SWIG typemap - python

I am trying to fix a memory leak in a Python wrapper for a C++ DLL.
The leak occurs when assigning a byte buffer to a helper object that has been created in Python:
struct ByteBuffer
{
    int length;
    uint8_t * dataBuf;
};
I want to supply the dataBuf as a Python list, and the typemap I came up with (which works) is this:
%module(directors="1") mymodule

%typemap(in) uint8_t * (uint8_t *temp) {
    int length = PySequence_Length($input);
    temp = new uint8_t[length]; // memory allocated here. How to free?
    for (int i = 0; i < length; i++) {
        PyObject *o = PySequence_GetItem($input, i);
        if (PyNumber_Check(o)) {
            temp[i] = (uint8_t) PyLong_AsLong(o);
            //cout << (int)temp[i] << endl;
        } else {
            PyErr_SetString(PyExc_ValueError, "Sequence elements must be uint8_t");
            return NULL;
        }
    }
    $1 = temp;
}
The problem is that the typemap allocates memory for a new C array each time, and this memory is never freed within the DLL. In other words, the DLL expects the user to manage the memory of the dataBuf of the ByteBuffer. For example, when creating 10000 such objects sequentially in Python and then deleting them, the memory usage rises steadily (a leak):
for i in range(10000):
    byteBuffer = mymodule.ByteBuffer()
    byteBuffer.length = 10000
    byteBuffer.dataBuf = [0] * 10000
    # ... use byteBuffer
    del byteBuffer
Is there a way to delete the allocated dataBuf from python? Thank you for your patience!
Edit: I haven't posted the whole working code to keep this short; if required, I will. Additionally, I am using Python 3.5 x64 and SWIG 3.0.7.

It was far simpler than I thought. I just added this to the .i file:
%typemap(freearg) uint8_t * {
    //cout << "Freeing uint8_t*!!! " << endl;
    if ($1) delete[] $1;
}
Seems to work.
Edit: switched free() to delete[], since the buffer was allocated with new[].
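For context, a sketch of how the two typemaps pair up (SWIG emits the freearg body in the generated wrapper after the wrapped call, and also on argument-conversion error paths, so each allocation in the "in" typemap gets a matching release):

```swig
%typemap(in) uint8_t * (uint8_t *temp) {
    /* ... build temp = new uint8_t[length] from $input, then: */
    $1 = temp;
}
%typemap(freearg) uint8_t * {
    if ($1) delete[] $1;   /* matches the new[] in the "in" typemap */
}
```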

Related

How to force/test malloc failure in shared library when called via Python ctypes

I have a Python program that calls a shared library (libpq in this case) that itself calls malloc under the hood.
I want to be able to test (i.e. in unit tests) what happens when those calls to malloc fail (e.g. when there isn't enough memory).
How can I force that?
Note: I don't think setting a resource limit on the process using ulimit -d would work. The mechanism would need to be precise and robust enough to, say, make a single malloc call inside libpq (for example one inside PQconnectdbParams) fail while all others work fine, across different versions of Python, and even across different resource usages in the same version of Python.
It's possible, but it's tricky. In summary:
- You can override malloc in a shared library, say test_malloc_override.so, and then (on Linux at least) use the LD_PRELOAD environment variable to load it.
- But... Python calls malloc all over the place, and you need those calls to succeed. To isolate the "right" calls to malloc to fail, you can use the glibc functions backtrace and backtrace_symbols to inspect the stack and check whether this is the call that should fail.
- The shared library exposes a small API to control which calls to malloc will fail, so this doesn't need to be hard-coded in the library.
- To allow some calls to malloc to succeed, you need a pointer to the original malloc function. However, to find it you need to call dlsym, which can itself call malloc. So you need to build a simple allocator into the new malloc so that these recursive calls to malloc succeed. Thanks to https://stackoverflow.com/a/10008252/1319998 for this tip.
In more detail:
The shared library code
// In test_override_malloc.c
// Some of this code is inspired by https://stackoverflow.com/a/10008252/1319998
#define _GNU_SOURCE
#include <dlfcn.h>
#include <execinfo.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

// Fails malloc at the fail_in-th call when search_string is in the backtrace
// -1 means never fail
static int fail_in = -1;
static char search_string[1024];

// While finding the original address of malloc, dlsym is called, which may
// itself allocate memory via malloc; those allocations are served from this
// static buffer
static char initialising_buffer[10240];
static int initialising_buffer_pos = 0;

// The pointers to the original memory management functions, called
// when we don't want to fail
static void *(*original_malloc)(size_t) = NULL;
static void (*original_free)(void *ptr) = NULL;

void set_fail_in(int _fail_in, char *_search_string) {
    fail_in = _fail_in;
    strncpy(search_string, _search_string, sizeof(search_string) - 1);
    search_string[sizeof(search_string) - 1] = '\0'; // strncpy may not null-terminate
}

void *
malloc(size_t size) {
    void *memory = NULL;
    int trace_size = 100;
    void *stack[trace_size];
    static int initialising = 0;
    static int level = 0;

    // Save the original malloc/free
    if (!original_malloc) {
        if (initialising) {
            if (size + initialising_buffer_pos >= sizeof(initialising_buffer)) {
                exit(1);
            }
            void *ptr = initialising_buffer + initialising_buffer_pos;
            initialising_buffer_pos += size;
            return ptr;
        }
        initialising = 1;
        original_malloc = dlsym(RTLD_NEXT, "malloc");
        original_free = dlsym(RTLD_NEXT, "free");
        initialising = 0;
    }

    // If we're in a nested malloc call (the backtrace functions below can
    // call malloc), call the original malloc
    if (level) {
        return original_malloc(size);
    }
    ++level;

    if (fail_in == -1) {
        memory = original_malloc(size);
    } else {
        // Find if search_string is in the stack
        backtrace(stack, trace_size);
        char **symbols = backtrace_symbols(stack, trace_size);
        int found = 0;
        for (int i = 0; i < trace_size; ++i) {
            if (strstr(symbols[i], search_string) != NULL) {
                found = 1;
                break;
            }
        }
        free(symbols);

        if (!found) {
            memory = original_malloc(size);
        } else {
            if (fail_in > 0) {
                memory = original_malloc(size);
            }
            --fail_in;
        }
    }
    --level;

    return memory;
}

void free(void *ptr) {
    // Don't pass pointers from the static initialising buffer to the real free
    if (ptr < (void*) initialising_buffer || ptr > (void*)(initialising_buffer + sizeof(initialising_buffer))) {
        original_free(ptr);
    }
}
Compiled with
gcc -shared -fPIC test_override_malloc.c -o test_override_malloc.so -ldl
Example Python code
This could go inside the unit tests
# Inside my_test.py
from ctypes import cdll
cdll.LoadLibrary('./test_override_malloc.so').set_fail_in(0, b'libpq.so')
# ... then call a function in the shared library libpq.so
# The `0` above means the very next call it makes to malloc will fail
Run with
LD_PRELOAD=$PWD/test_override_malloc.so python3 my_test.py
(Admittedly, this might not all be worth it: given how often Python itself calls malloc, situations where every Python allocation is fine but just the one call in the library fails may be unlikely in practice.)

Freeze/Fail when using functional with OpenMP [Pybind11/OpenMP]

I have a problem with the functional feature of Pybind11 when I use it in a for-loop with OpenMP. I've done some research, and my problem sounds pretty similar to the one in this Pull Request from 2 years ago, but although that PR is closed and the issue seems to be fixed, I still see the problem. A code example I created will hopefully explain it better:
b.h
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include <omp.h>

namespace py = pybind11;

class B {
public:
    B(int n, const int& initial_value);
    void map(const std::function<int(int)> &f);
private:
    int n;
    int* elements;
};
b.cpp
#include <pybind11/pybind11.h>
#include <pybind11/functional.h>
#include "b.h"

namespace py = pybind11;

B::B(int n, const int& v)
    : n(n) {
    elements = new int[n];
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        elements[i] = v;
    }
}

void B::map(const std::function<int(int)> &f) {
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        elements[i] = f(elements[i]);
    }
}

PYBIND11_MODULE(m, handle) {
    handle.doc() = "Example Module";
    py::class_<B>(handle, "B")
        .def(py::init<int, int>())
        .def("map", &B::map)
        ;
}
CMakeLists.txt
cmake_minimum_required(VERSION 3.4...3.18)
project(example)

find_package(OpenMP)
add_subdirectory(pybind11)
pybind11_add_module(m b.cpp)

if(OpenMP_CXX_FOUND)
    target_link_libraries(m PUBLIC OpenMP::OpenMP_CXX)
else()
    message(FATAL_ERROR "Your compiler does not support OpenMP")
endif()
test.py
from build.m import *

def test(i):
    return i * 20

b = B(2, 2)
b.map(test)
I basically have an array to which I want to apply a Python function elementwise using a for-loop. I know the issue is specific to functional combined with OpenMP, because elsewhere in my project I use OpenMP successfully, and functional also works as long as I don't use OpenMP.
Edit: It freezes at the map function and has to be terminated. I am using Ubuntu 21.10, Python 3.9, GCC 11.2.0, OpenMP 4.5, and the newest version of the pybind11 repo.
You're likely experiencing a deadlock between OpenMP's scheduler and Python's GIL (Global Interpreter Lock).
I suggest attaching gdb to your process and looking at where the threads are to verify that's really the problem.
IMHO mixing Python functions and OpenMP like that is asking for trouble. If you want multi-threading of Python functions you can use multiprocessing.pool.ThreadPool. But unless your functions release the GIL most of the time you won't benefit from multi-threading.
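A minimal sketch of that suggestion (my example, not the poster's code): mapping a Python function over a list with multiprocessing.pool.ThreadPool instead of OpenMP. As noted above, the GIL serialises pure-Python work, so this only gains real parallelism if the function releases the GIL internally.

```python
from multiprocessing.pool import ThreadPool

def f(x):
    # Stand-in for the Python callback passed to B.map in the question
    return x * 20

elements = [2, 2]
with ThreadPool(4) as pool:
    # Applies f to every element using a pool of Python threads
    results = pool.map(f, elements)

print(results)  # [40, 40]
```

Unlike the OpenMP version, each worker thread here acquires the GIL before running f, so there is no scheduler/GIL deadlock.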

Python embed into C++

I have Python code embedded into C++.
Do I need to release memory (Py_XDECREF) for PyObject *pValue and PyObject *pArgs?
When I do Py_XDECREF(pArgs) and Py_XDECREF(pValue), I get a Segmentation Fault (core dumped).
I think the Python side is still using those variables while C++ tries to release the memory.
What is the best practice for this?
for (int i = 0; i < 100; i++) {
    // ...... do sth .......
    if (pModule != NULL) {
        std::string st = jps.updateZone(worldx_y, lenVect);
        PyObject* pValue = PyBytes_FromString(st.c_str());
        if (pFunc_insert && PyCallable_Check(pFunc_insert)) {
            PyObject *pArgs = PyTuple_New(1);
            PyTuple_SetItem(pArgs, 0, pValue);
            PyObject_CallObject(pFunc_insert, pArgs);
            Py_XDECREF(pArgs);
        }
        Py_XDECREF(pValue);
    }
    // ...... do sth .......
}
PyTuple_SetItem steals a reference to the item, so you must not decref the item: you no longer own a reference to it. You do need to decref the tuple.
If you still get segfaults after that, you have some other bug.

Passing variable from embedded Python to C

Can someone please explain to me how I can pass a variable from embedded Python back to my C program?
I've looked everywhere on the web, and what I found I did not understand, because I know very little Python.
I tried to create a callback function in C, but I did not understand how it is supposed to work.
My main program is in C. There I create a Python object in a thread and call a Python function from a Python script. This function produces values, and I need to pass these values back to the C program for further use.
For embedding, I would recommend looking at the docs on the Python website: https://docs.python.org/3.4/extending/embedding.html#pure-embedding
The section that is of interest to you is:
if (pModule != NULL) {
    pFunc = PyObject_GetAttrString(pModule, argv[2]);
    /* pFunc is a new reference */
    if (pFunc && PyCallable_Check(pFunc)) {
        pArgs = PyTuple_New(argc - 3);
        for (i = 0; i < argc - 3; ++i) {
            pValue = PyLong_FromLong(atoi(argv[i + 3]));
            if (!pValue) {
                Py_DECREF(pArgs);
                Py_DECREF(pModule);
                fprintf(stderr, "Cannot convert argument\n");
                return 1;
            }
            /* pValue reference stolen here: */
            PyTuple_SetItem(pArgs, i, pValue);
        }
        pValue = PyObject_CallObject(pFunc, pArgs);
        Py_DECREF(pArgs);
        if (pValue != NULL) {
            printf("Result of call: %ld\n", PyLong_AsLong(pValue));
            Py_DECREF(pValue);
        }
        else {
            Py_DECREF(pFunc);
            Py_DECREF(pModule);
            PyErr_Print();
            fprintf(stderr, "Call failed\n");
            return 1;
        }
    }
    else {
        if (PyErr_Occurred())
            PyErr_Print();
        fprintf(stderr, "Cannot find function \"%s\"\n", argv[2]);
    }
    Py_XDECREF(pFunc);
    Py_DECREF(pModule);
}
specifically this line:
pValue = PyObject_CallObject(pFunc, pArgs);
This calls a Python function (a callable Python object), pFunc, with the argument tuple pArgs, and returns a Python object, pValue.
I would suggest reading through that entire page to get a better understanding of embedding Python. Also, since you say you know very little Python, I would suggest getting more familiar with the language and how different it is from C/C++. You'll need to know more about how Python works before you can be effective at embedding it.
Edit:
If you need to share memory between your C/C++ code and Python code (running in a separate thread), I don't believe you can share memory directly, at least not in the way you normally would with just C/C++. However, you can create a memory-mapped file to achieve the same effect with about the same performance. Doing so is platform-dependent, and I don't have any experience with it, but here is a link that should help: http://www.codeproject.com/Articles/11843/Embedding-Python-in-C-C-Part-II.
Basically, you create the mmap in C (this is the platform-dependent part) and create an mmap to the same file in your Python code, then write/read to/from the file descriptor in each of your threads.
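The Python half of that approach can be sketched as follows (my illustration, with a made-up file name; in the embedding scenario the C side would mmap the same path):

```python
import mmap
import os
import tempfile

# Hypothetical shared file; the C side would open and mmap the same path
path = os.path.join(tempfile.mkdtemp(), "shared.bin")
with open(path, "wb") as fh:
    fh.write(b"\x00" * 4096)      # pre-size the file; mmap cannot grow it

with open(path, "r+b") as fh:
    mm = mmap.mmap(fh.fileno(), 4096)
    mm[:5] = b"hello"             # "writer" side stores bytes in the mapping
    mm.flush()                    # push the change through to the file
    mm.close()

with open(path, "rb") as fh:      # "reader" side sees the same bytes
    assert fh.read(5) == b"hello"
```

In a real setup you would also need some agreed-upon layout and synchronisation (e.g. a length header or a lock file), since mmap itself provides none.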

PyRun_SimpleString call cause memory corruption?

I am trying to use Python in C++ and have the following code. I intend to build a sys.path.append call from a user-supplied path and run it. It looks like the call to PyRun_SimpleString caused some sort of spillage into a private member variable of the class. How did this happen? I have tried various buffer sizes (50, 150, 200), and it did not change the output.
class Myclass
{
    ...
private:
    char *_modName;
    char *_modDir;
};

Myclass::Myclass()
{
    Py_Initialize();
    PyRun_SimpleString("import sys");
    PyRun_SimpleString((char *)"sys.path.append('/home/userA/Python')");
}
Myclass::init()
{
    // this function is called before Myclass::test()
    // a couple of other Python functions are called here, as listed below:
    // PyString_FromString, PyImport_Import, PyObject_GetAttrString, PyTuple_New,
    // PyTuple_SetItem, PyObject_CallObject, PyDict_GetItemString
}
Myclass::test()
{
    char buffer[150];
    char *strP1 = (char *)"sys.path.append('";
    char *strP2 = (char *)"')";
    strcpy(buffer, strP1);
    strcat(buffer, _modDir);
    strcat(buffer, strP2);
    printf("Before %s\n", _modDir);
    printf("Before %s\n", _modName);
    PyRun_SimpleString(buffer);
    printf("After %s\n", _modName);
}
Here is the output. FYI, I'm using a, b, c, d, f for illustration purposes only. It almost feels like PyRun_SimpleString(buffer) sticks the end of buffer into _modName.
Before /aaa/bbb/ccc/ddd
Before ffffff
After cc/ddd'
Thanks to Klamer Schutte for the hint in the right direction.
The Py_DECREF in my code was the culprit; I'm unfamiliar with how references work, I guess. The Py_DECREF call released pValue and, with it, the content pointed to by _modName. I guess a more beginner question would be: should I have added a Py_INCREF(pValue) after the _modName assignment?
_modName = PyString_AsString(PyDict_GetItemString(pValue, (char*)"modName"));
Py_DECREF(pValue);
