PyRun_SimpleString call causes memory corruption? - python

I am trying to use Python in C++ and have the following code. I intended to parse a user-input path and call sys.path.append on it. It looks like the call to PyRun_SimpleString caused some sort of spillage into a private class variable. How did this happen? I have tried various buffer sizes (50, 150, 200), and it did not change the output.
class Myclass
{
...
private:
char *_modName;
char *_modDir;
};
Myclass::Myclass()
{
Py_Initialize();
PyRun_SimpleString("import sys");
PyRun_SimpleString((char *)"sys.path.append('/home/userA/Python')");
}
void Myclass::init()
{
// this function is called before Myclass::test()
// a couple of other Python functions are called, as listed below:
// PyString_FromString, PyImport_Import, PyObject_GetAttrString, PyTuple_New, PyTuple_SetItem, PyObject_CallObject, PyDict_GetItemString
}
void Myclass::test()
{
char buffer[150];
char *strP1 = (char *)"sys.path.append('";
char *strP2 = (char *)"')";
strcpy (buffer, strP1);
strcat (buffer, _modDir);
strcat (buffer, strP2);
printf("Before %s\n", _modDir);
printf("Before %s\n", _modName);
PyRun_SimpleString(buffer);
printf("After %s\n", _modName);
}
Here is the output. FYI, I'm using a, b, c, d, f for illustration purposes only. It almost feels like PyRun_SimpleString(buffer) sticks the end of buffer into _modName.
Before /aaa/bbb/ccc/ddd
Before ffffff
After cc/ddd'

Thanks to Klamer Schutte for hinting in the correct direction.
The Py_DECREF in my code was the culprit; being unfamiliar with how reference counting works, I didn't see it. The DECREF call released pValue, and with it the contents pointed to by _modName (PyString_AsString returns a pointer into the string object's internal buffer, not a copy). A more beginner-level question, I guess: should I have added a Py_INCREF(pValue) after the _modName assignment? (Yes: either hold a reference to keep the owner alive, or copy the string out before the DECREF.)
_modName = PyString_AsString (PyDict_GetItemString(pValue, (char*)"modName"));
Py_DECREF(pValue);

Related

How to force/test malloc failure in shared library when called via Python ctypes

I have a Python program that calls a shared library (libpq in this case) that itself calls malloc under the hood.
I want to be able to test (i.e. in unit tests) what happens when those calls to malloc fail (e.g. when there isn't enough memory).
How can I force that?
Note: I don't think setting a resource limit on the process using ulimit -d would work. It would need to be precise and robust enough to make, say, a single malloc call inside libpq (for example one inside PQconnectdbParams) fail while all others work fine, across different versions of Python, and even across different resource usages in the same version of Python.
It's possible, but it's tricky. In summary:
You can override malloc in a shared library, say test_malloc_override.so, and then (on Linux at least) use the LD_PRELOAD environment variable to load it.
But... Python calls malloc all over the place, and you need those calls to succeed. To isolate the "right" calls to malloc to fail, you can use the glibc functions backtrace and backtrace_symbols to inspect the stack and see whether it's the right call to fail.
The shared library exposes a small API to control which calls to malloc will fail (so the choice doesn't need to be hard-coded in the library).
To allow some calls to malloc to succeed, you need a pointer to the original malloc function. However, to find this you need to call dlsym, which can itself call malloc. So you need to build a simple allocator into the new malloc so that these (recursive) calls to malloc succeed. Thanks to https://stackoverflow.com/a/10008252/1319998 for this tip.
In more detail:
The shared library code
// In test_override_malloc.c
// Some of this code is inspired by https://stackoverflow.com/a/10008252/1319998
#define _GNU_SOURCE
#include <dlfcn.h>
#include <execinfo.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
// Fails the fail_in-th call to malloc (counting from 0) whose backtrace
// contains search_string; -1 means never fail
static int fail_in = -1;
static char search_string[1024];
// dlsym, called below to find the original malloc, may itself allocate
// memory via malloc; those bootstrap allocations are served from this buffer
static char initialising_buffer[10240];
static int initialising_buffer_pos = 0;
// The pointers to original memory management functions to call
// when we don't want to fail
static void *(*original_malloc)(size_t) = NULL;
static void (*original_free)(void *ptr) = NULL;
void set_fail_in(int _fail_in, char *_search_string) {
fail_in = _fail_in;
strncpy(search_string, _search_string, sizeof(search_string) - 1);
search_string[sizeof(search_string) - 1] = '\0';
}
void *
malloc(size_t size) {
void *memory = NULL;
int trace_size = 100;
void *stack[trace_size];
static int initialising = 0;
static int level = 0;
// Save original
if (!original_malloc) {
if (initialising) {
if (size + initialising_buffer_pos >= sizeof(initialising_buffer)) {
exit(1);
}
void *ptr = initialising_buffer + initialising_buffer_pos;
initialising_buffer_pos += size;
return ptr;
}
initialising = 1;
original_malloc = dlsym(RTLD_NEXT, "malloc");
original_free = dlsym(RTLD_NEXT, "free");
initialising = 0;
}
// If we're in a nested malloc call (the backtrace functions below can call malloc)
// then call the original malloc
if (level) {
return original_malloc(size);
}
++level;
if (fail_in == -1) {
memory = original_malloc(size);
} else {
// Find whether the target caller is in the stack; backtrace returns the
// number of frames actually captured, so only inspect that many
int frames = backtrace(stack, trace_size);
char **symbols = backtrace_symbols(stack, frames);
int found = 0;
for (int i = 0; i < frames; ++i) {
if (strstr(symbols[i], search_string) != NULL) {
found = 1;
break;
}
}
free(symbols);
if (!found) {
memory = original_malloc(size);
} else {
if (fail_in > 0) {
memory = original_malloc(size);
}
--fail_in;
}
}
--level;
return memory;
}
void free(void *ptr) {
if (ptr < (void*) initialising_buffer || ptr > (void*)(initialising_buffer + sizeof(initialising_buffer))) {
original_free(ptr);
}
}
Compiled with
gcc -shared -fPIC test_override_malloc.c -o test_override_malloc.so -ldl
Example Python code
This could go inside the unit tests
# Inside my_test.py
from ctypes import cdll
cdll.LoadLibrary('./test_override_malloc.so').set_fail_in(0, b'libpq.so')
# ... then call a function in the shared library libpq.so
# The `0` above means the very next call it makes to malloc will fail
Run with
LD_PRELOAD=$PWD/test_override_malloc.so python3 my_test.py
(Admittedly, this might not all be worth it: Python calls malloc so much that in most situations an out-of-memory condition seems unlikely to leave Python itself fine while failing just the one call inside the library.)

Bad access when calling PyObject_CallObject for the second time [duplicate]

On the second call of the following code, my app segfaults, so I guess I am missing something:
Py_Initialize();
pName = PyString_FromString("comp_macbeth");
pModule = PyImport_Import(pName);
Py_DECREF(pName);
if(pModule == NULL) {
PyErr_Print();
Py_Finalize();
return;
}
pFunc = PyObject_GetAttrString(pModule, "compute");
/* pFunc is a new reference */
if (!pFunc || !PyCallable_Check(pFunc) ) {
PyErr_Print();
Py_Finalize();
return;
}
Py_Finalize();
comp_macbeth.py imports numpy. If I remove the numpy import, everything is fine. Is it a numpy bug, or am I missing something about imports?
From the Py_Finalize docs:
Some extensions may not work properly if their initialization routine is called more than once; this can happen if an application calls Py_Initialize() and Py_Finalize() more than once.
Apparently Numpy is one of those. See also this message from Numpy-discussion.
Calling Py_Initialize() only once, and cleaning up at exit, is the way to go. (And it should be faster, too!)
I have this in my module initialization part, but the URL does not exist anymore. In case it helps:
// http://numpy.scipy.org/numpydoc/numpy-13.html mentions this must be done in module init, otherwise we will crash
import_array();

Segmentation fault error in python embedded code in C++ code of Omnet++ simple module

I want to call a Python function from C++ code in OMNeT++ simple module.
I debugged the code using gdb. It passes all lines fine, but at the end a segmentation fault occurs after Py_Finalize().
I found the following issue on GitHub that describes the same problem, but it did not help me resolve it.
double result=0;
// 1) Initialise python interpretator
if (!Py_IsInitialized()) {
Py_Initialize();
//Py_AtExit(Py_Finalize);
}
// 2) Initialise python thread mechanism
if (!PyEval_ThreadsInitialized()) {
PyEval_InitThreads();
assert(PyEval_ThreadsInitialized());
}
PyGILState_STATE s = PyGILState_Ensure();
PyRun_SimpleString("import sys; sys.path.append('/home/mypath/')");
PyObject *pName = PyUnicode_DecodeFSDefault((char*)"integrationTest");
PyObject* pModule = PyImport_Import(pName);
if (pModule != NULL)
{
PyObject* pFunction = PyObject_GetAttrString(pModule, (char*)"calculateExecutionTime");
/// changement will be held in this level Args and function result.
PyObject* pArgs = PyTuple_Pack(2,PyFloat_FromDouble(2.0),PyFloat_FromDouble(8.0));
PyObject* pResult = PyObject_CallObject(pFunction, pArgs);
result = (double)PyFloat_AsDouble(pResult);
///////
}
// Clean up
PyGILState_Release(s);
Py_DECREF(pName);
Py_DECREF(pModule);
Py_Finalize();
The problem occurs after the first initialization/finalization of the Python interpreter. What happens during the OMNeT++ simulation is that the Python interpreter is initialized, finalized, re-initialized, and so on. However, Numpy doesn't support this.
So I resolved the problem by initializing the Python interpreter just once, at the beginning of the simulation in the initialize() method, and then calling Py_Finalize() in the destructor.
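The shape of that fix can be sketched without linking Python at all. In this sketch, fake_py_initialize and fake_py_finalize are stand-ins for Py_Initialize()/Py_Finalize() (they are not real API, just counters), and SimModule is a hypothetical OMNeT++-style module; the point is the init-once/finalize-once structure:

```cpp
#include <cassert>

// Stand-ins for Py_Initialize()/Py_Finalize(), counted so the pattern is
// checkable without an embedded interpreter.
int init_calls = 0;
int finalize_calls = 0;
void fake_py_initialize() { ++init_calls; }
void fake_py_finalize() { ++finalize_calls; }

// Hypothetical module: initialize() runs for every module instance, but the
// interpreter must be started at most once per process.
struct SimModule {
    void initialize() {
        static bool started = false;   // process-wide guard
        if (!started) {
            fake_py_initialize();      // real code: Py_Initialize()
            started = true;
        }
    }
    void teardown() {                  // real code: Py_Finalize() in the destructor
        if (init_calls > finalize_calls)
            fake_py_finalize();        // finalize exactly once, at the end
    }
};
```

Many modules can call initialize(), but the interpreter is started once and torn down once, which avoids the initialize/finalize/re-initialize cycle that Numpy cannot survive.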

Python C API free() errors after using Py_SetPath() and Py_GetPath()

I'm trying to figure out why I can't simply get and set the python path through its C API. I am using Python3.6, on Ubuntu 17.10 with gcc version 7.2.0. Compiling with:
gcc pytest.c `python3-config --libs` `python3-config --includes`
#include <Python.h>
int main()
{
Py_Initialize(); // removes error if put after Py_SetPath
printf("setting path\n"); // prints
Py_SetPath(L"/usr/lib/python3.6"); // Error in `./a.out': free(): invalid size: 0x00007fd5a8365030 ***
printf("success\n"); // doesn't print
return 0;
}
Setting the path works fine, unless I also try to get the path prior to doing so. If I get the path at all, even just to print it without modifying the returned value, I get a "double free or corruption" error.
Very confused. Am I doing something wrong or is this a bug? Anyone know a workaround if so?
Edit: it also errors after calling Py_Initialize(); I have updated the code. It now errors even if I don't call Py_GetPath() first.
From alk's answer it seems related to this bug: https://bugs.python.org/issue31532
Here is the workaround I am using. Since you can't call Py_GetPath() before Py_Initialize(), and seemingly you can't call Py_SetPath() after Py_Initialize(), you can add to or read the path like this after calling Py_Initialize():
#include <Python.h>
int main()
{
Py_Initialize();
// get handle to python sys.path object
PyObject *sys = PyImport_ImportModule("sys");
PyObject *path = PyObject_GetAttrString(sys, "path");
// make a list of paths to add to sys.path
PyObject *newPaths = PyUnicode_Split(PyUnicode_FromWideChar(L"a:b:c", -1), PyUnicode_FromWideChar(L":", 1), -1);
// iterate through list and add all paths
for(int i=0; i<PyList_Size(newPaths); i++) {
PyList_Append(path, PyList_GetItem(newPaths, i));
}
// print out sys.path after appends
PyObject *newlist = PyUnicode_Join(PyUnicode_FromWideChar(L":", -1), path);
printf("newlist = %ls\n", PyUnicode_AsWideCharString(newlist, NULL));
return 0;
}
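For reference, the C workaround above is doing the same thing as this plain Python (the a, b, c directories are the same placeholder values used in the C code):

```python
import sys

# Split a colon-separated string of directories and append each entry to
# sys.path, mirroring the PyUnicode_Split / PyList_Append calls above.
new_paths = "a:b:c".split(":")
for p in new_paths:
    if p not in sys.path:
        sys.path.append(p)
```

Doing the manipulation through the sys module sidesteps Py_SetPath() entirely, which is the point of the workaround.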
[the below answer refers to this version of the question.]
From the docs:
void Py_Initialize()
Initialize the Python interpreter. In an application embedding Python, this should be called before using any other Python/C API functions; with the exception of Py_SetProgramName(), Py_SetPythonHome() and Py_SetPath().
But the code you show does call Py_GetPath() before it calls Py_Initialize(), which, per the above paragraph, it implicitly should not.

How can embedded python call a C++ class's function?

So, StackOverflow, I'm stumped.
The code as I have it is a C++ function with embedded Python. I generate a message on the C++ side, send it to the python, get a different message back. I got it to work, I got it tested, so far, so good.
The next step is that I need Python to generate messages on their own and send them into C++. This is where I'm starting to get stuck. After spending a few hours puzzling over the documentation, it seemed like the best way would be to define a module to hold my functions. So I wrote up the following stub:
static PyMethodDef mailbox_methods[] = {
{ "send_external_message",
[](PyObject *caller, PyObject *args) -> PyObject *
{
classname *interface = (classname *)
PyCapsule_GetPointer(PyTuple_GetItem(args, 0), "turkey");
class_field_type toReturn;
toReturn = class_field_type::python2cpp(PyTuple_GetItem(args, 1));
interface->send_message(toReturn);
Py_INCREF(Py_None);
return Py_None;
},
METH_VARARGS,
"documentation" },
{ NULL, NULL, 0, NULL }
};
static struct PyModuleDef moduledef = {
PyModuleDef_HEAD_INIT,
"turkey",
"documentation",
-1,
mailbox_methods
};
//defined in header file, just noting its existence here
PyObject *Module;
PyMODINIT_FUNC PyInit_turkey()
{
Module = PyModule_Create(&moduledef);
return Module;
}
And on the Python side, I had the following receiver code:
import turkey
I get the following response:
ImportError: No module named 'turkey'
Now, here's the part where I get really confused. Here's the code in my initialization function:
PyInit_turkey();
PyObject *interface = PyCapsule_New(this, "instance", NULL);
const char *repr = PyUnicode_AsUTF8(PyObject_Repr(Module));
cout << "REPR: " << repr << "\n";
if (PyErr_Occurred())
PyErr_Print();
It prints out
REPR: <module 'turkey'>
Traceback (most recent call last):
<snipped>
import turkey
ImportError: No module named 'turkey'
So the module exists, but it's never being passed to Python anywhere. I can't find documentation on how to pass it in and get it initialized on the Python side. I realize I'm probably just missing a trivial step, but I can't for the life of me figure out what it is. Can anyone help?
The answer was, in the end, a single function that I was missing, called at the start of my initialization function, before Py_Initialize:
PyImport_AppendInittab("turkey", PyInit_turkey);
Py_Initialize();
PyEval_InitThreads();
The Python documentation mentions PyImport_AppendInittab only in passing, which is why I had such a difficult time making the jump.
To anyone else who finds this in the future: you do not need to create a DLL or use a pre-built library to bring a module into embedded Python. It can be far easier than that.