Memory management of SWIG-generated objects passed to C - Python

I am trying to wrap a library for Python written in C++ using SWIG. The library uses function calls that accept byte buffers as parameters. In Python I am creating these byte buffers using %array_class from SWIG. I made a proof-of-concept program to test this out and noticed a significant memory leak associated with passing these buffers to C++. Specifically, running the code below steadily raises the memory usage of the Python application (as observed in the Task Manager) up to about 250MB, where the program halts. The printouts from C indicate that the program does run as expected, but it just eats up more and more memory. The del buff statement runs, but does nothing to release the memory. I also tried creating and deleting the buffer in each loop, but with the same result.
Running delete x; in C++ crashes my program entirely.
My Swig Interface file:
%module example
%include "carrays.i"
%array_class(uint8_t, buffer);
%{
#include "PythonConnector.h"
%}
%include "PythonConnector.h"
The C++ header file:
class PythonConnector {
public:
    void print_array(uint8_t *x);
};
The minimal C++ function definition:
void PythonConnector::print_array(uint8_t *x)
{
    //int i;
    //for (i = 0; i < 100; i++) {
    //    printf("[%d] = %d\n", i, x[i]);
    //}
    //delete x; // <-- This crashed the program
    return;
}
The tester Python script
import time
import example
sizeBytes = 10000
buff = example.buffer(sizeBytes)
for j in range(1000):
    # Initialize data buffer
    for i in range(sizeBytes):
        buff[i] = i%256
    buff[0] = 0
    example.PythonConnector().print_array(buff.cast())
    print(j)
del buff
time.sleep(10)
Am I missing something? I suspect that SWIG creates some proxy object each time the buffer is passed to C++, and that it is not garbage-collected.
Edit:
SWIG version 3.0.7
CPython version 3.5 x64
Windows 10 OS
Thanks for your help.

OK, thanks to @Flexo, I found the answer.
The problem is that a new example.PythonConnector() is instantiated in each loop iteration. Instantiating it only once, outside the loop, seems to fix the memory problem:
import time
import example
sizeBytes = 10000
buff = example.buffer(sizeBytes)
conn = example.PythonConnector()
for j in range(1000):
    # Initialize data buffer
    for i in range(sizeBytes):
        buff[i] = i%256
    buff[0] = 0
    conn.print_array(buff.cast())
    print(j)
del buff
time.sleep(10)
The question remains, though, why the many connectors in the original code don't get garbage-collected.
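For anyone who wants to dig into that question, here is a minimal diagnostic sketch (assuming the SWIG-generated example module from above is importable). It forces a garbage collection after every call and counts how many PythonConnector proxies are still alive, which should show whether the temporaries themselves are piling up or whether the memory is held elsewhere.

# Diagnostic sketch: count live PythonConnector proxies after each call.
# Assumes the SWIG-generated 'example' module from the question is importable.
import gc
import example

sizeBytes = 10000
buff = example.buffer(sizeBytes)
for j in range(100):
    example.PythonConnector().print_array(buff.cast())
    gc.collect()  # force collection of the temporary proxy object
    live = sum(1 for o in gc.get_objects()
               if type(o).__name__ == 'PythonConnector')
    print(j, "live PythonConnector proxies:", live)
del buff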

Related

How to force/test malloc failure in shared library when called via Python ctypes

I have a Python program that calls a shared library (libpq in this case) that itself calls malloc under the hood.
I want to be able to test (i.e. in unit tests) what happens when those calls to malloc fail (e.g. when there isn't enough memory).
How can I force that?
Note: I don't think setting a resource limit on the process using ulimit -d would work. It would need to be precise and robust enough to make, say, a single malloc call inside libpq, for example one inside PQconnectdbParams, fail, while all others work fine, across different versions of Python, and even across different resource usages in the same version of Python.
It's possible, but it's tricky. In summary:
You can override malloc in a shared library, say test_override_malloc.so, and then (on Linux at least) use the LD_PRELOAD environment variable to load it.
But... Python calls malloc all over the place, and you need those calls to succeed. To make only the "right" calls to malloc fail, you can use the glibc functions backtrace and backtrace_symbols to inspect the stack and decide whether the current call is the one that should fail.
This shared library exposes a small API to control which calls to malloc will fail (so it doesn't need to be hard-coded in the library).
To allow some calls to malloc to succeed, you need a pointer to the original malloc function. However, to find this you need to call dlsym, which can itself call malloc. So you need to build a simple allocator into the new malloc so that these recursive calls to malloc succeed. Thanks to https://stackoverflow.com/a/10008252/1319998 for this tip.
In more detail:
The shared library code
// In test_override_malloc.c
// Some of this code is inspired by https://stackoverflow.com/a/10008252/1319998
#define _GNU_SOURCE
#include <dlfcn.h>
#include <execinfo.h>
#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
// Fails malloc at the fail_in-th call when search_string is in the backtrace
// -1 means never fail
static int fail_in = -1;
static char search_string[1024];
// To find the original address of malloc, dlsym will be called, which may
// itself allocate memory via malloc, so serve those early allocations from
// this static buffer
static char initialising_buffer[10240];
static int initialising_buffer_pos = 0;
// The pointers to original memory management functions to call
// when we don't want to fail
static void *(*original_malloc)(size_t) = NULL;
static void (*original_free)(void *ptr) = NULL;
void set_fail_in(int _fail_in, char *_search_string) {
    fail_in = _fail_in;
    strncpy(search_string, _search_string, sizeof(search_string));
}

void *
malloc(size_t size) {
    void *memory = NULL;
    int trace_size = 100;
    void *stack[trace_size];
    static int initialising = 0;
    static int level = 0;

    // Save original
    if (!original_malloc) {
        if (initialising) {
            if (size + initialising_buffer_pos >= sizeof(initialising_buffer)) {
                exit(1);
            }
            void *ptr = initialising_buffer + initialising_buffer_pos;
            initialising_buffer_pos += size;
            return ptr;
        }

        initialising = 1;
        original_malloc = dlsym(RTLD_NEXT, "malloc");
        original_free = dlsym(RTLD_NEXT, "free");
        initialising = 0;
    }

    // If we're in a nested malloc call (the backtrace functions below can call malloc)
    // then call the original malloc
    if (level) {
        return original_malloc(size);
    }
    ++level;

    if (fail_in == -1) {
        memory = original_malloc(size);
    } else {
        // Find if we're in the stack (only inspect the frames actually captured)
        int num_frames = backtrace(stack, trace_size);
        char **symbols = backtrace_symbols(stack, num_frames);
        int found = 0;
        for (int i = 0; i < num_frames; ++i) {
            if (strstr(symbols[i], search_string) != NULL) {
                found = 1;
                break;
            }
        }
        free(symbols);

        if (!found) {
            memory = original_malloc(size);
        } else {
            if (fail_in > 0) {
                memory = original_malloc(size);
            }
            --fail_in;
        }
    }
    --level;

    return memory;
}

void free(void *ptr) {
    if (ptr < (void*) initialising_buffer || ptr > (void*)(initialising_buffer + sizeof(initialising_buffer))) {
        original_free(ptr);
    }
}
Compiled with
gcc -shared -fPIC test_override_malloc.c -o test_override_malloc.so -ldl
Example Python code
This could go inside the unit tests
# Inside my_test.py
from ctypes import cdll
cdll.LoadLibrary('./test_override_malloc.so').set_fail_in(0, b'libpq.so')
# ... then call a function in the shared library libpq.so
# The `0` above means the very next call it makes to malloc will fail
Run with
LD_PRELOAD=$PWD/test_override_malloc.so python3 my_test.py
(Admittedly, this might all not be worth it: Python calls malloc so much that in a genuine low-memory situation it's unlikely Python itself would be fine while only the one call inside the library fails.)
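To round this off, here is a hypothetical unit-test sketch showing how the injection API above could be used from a test. It assumes psycopg2 (which links libpq.so) is installed, that the test process was started with LD_PRELOAD pointing at test_override_malloc.so, and that a failed malloc inside the connection call surfaces as an OperationalError rather than a crash; none of these names or behaviours are from the original answer.

# my_malloc_test.py - hypothetical sketch, see the assumptions above
from ctypes import cdll

import psycopg2
import pytest

malloc_control = cdll.LoadLibrary('./test_override_malloc.so')

def test_connect_fails_cleanly_when_malloc_fails():
    # Fail the very next malloc made while libpq.so is on the call stack
    malloc_control.set_fail_in(0, b'libpq.so')
    try:
        with pytest.raises(psycopg2.OperationalError):
            psycopg2.connect("host=localhost dbname=test")
    finally:
        # Switch injection off again so the rest of the suite is unaffected
        malloc_control.set_fail_in(-1, b'')

Run, for example, with LD_PRELOAD=$PWD/test_override_malloc.so python3 -m pytest my_malloc_test.py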

Openmesh: updating face normals faster with Python than with C++?

I have created the following simple C++ script with OpenMesh:
#include <string>
#include <OpenMesh/Core/IO/MeshIO.hh>
#include <OpenMesh/Core/Mesh/TriMesh_ArrayKernelT.hh>
struct MyTraits : OpenMesh::DefaultTraits {
    typedef OpenMesh::Vec3d Point;
    typedef OpenMesh::Vec3d Normal;
};
typedef OpenMesh::TriMesh_ArrayKernelT<MyTraits> MyMesh;

int main(int argc, char *argv[]) {
    std::string filename = "filename.stl";
    MyMesh OM_mesh;
    OM_mesh.request_face_normals();
    OM_mesh.request_halfedge_normals();
    OM_mesh.request_vertex_normals();
    OM_mesh.request_face_status();
    OM_mesh.request_edge_status();
    OM_mesh.request_halfedge_status();
    OM_mesh.request_vertex_status();
    OpenMesh::IO::Options ropt;
    ropt += OpenMesh::IO::Options::Binary;
    ropt += OpenMesh::IO::Options::FaceNormal;
    OpenMesh::IO::read_mesh(OM_mesh, filename);
    for(int k=0; k<1000; k++){
        OM_mesh.update_face_normals();
    }
    return 0;
}
Also, I have developed the following simple Python script using the OpenMesh bindings:
import openmesh as OM
filename = "filename.stl"
OM_mesh = OM.TriMesh()
OM_mesh.request_face_normals()
OM_mesh.request_halfedge_normals()
OM_mesh.request_vertex_normals()
OM_mesh.request_face_status()
OM_mesh.request_edge_status()
OM_mesh.request_halfedge_status()
OM_mesh.request_vertex_status()
options = OM.Options()
options += OM.Options.Binary
options += OM.Options.FaceNormal
OM.read_mesh(OM_mesh, filename, options)
for k in range(1000):
    OM_mesh.update_face_normals()
Both scripts update the face normals of the loaded mesh 1000 times. I expected the C++ script to be considerably faster than the Python script, but in fact it is the opposite. I found that the C++ script takes around 8 seconds, while the Python script only takes around 0.3 seconds.
How is this possible? Are the Python bindings doing something other than just "wrapping" the C++ update_face_normals method? Thanks.
I've found that I should use the reading options when I read the file in C++, like this:
OpenMesh::IO::read_mesh(OM_mesh, filename, ropt);
By doing so, C++ becomes faster than Python. However, with .off files this update is not correct, but that is another issue.
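For reference, a minimal timing sketch on the Python side, using the same (older) OpenMesh bindings as in the question and assuming a local "filename.stl"; this is roughly how the 0.3 s figure can be reproduced and compared against the C++ build.

import time
import openmesh as OM

OM_mesh = OM.TriMesh()
OM_mesh.request_face_normals()
options = OM.Options()
options += OM.Options.Binary
options += OM.Options.FaceNormal
OM.read_mesh(OM_mesh, "filename.stl", options)

start = time.perf_counter()
for k in range(1000):
    OM_mesh.update_face_normals()
print("elapsed:", time.perf_counter() - start, "seconds")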

static openCL class not properly released in python module using boost.python

EDIT: Ok, all the edits made the layout of the question a bit confusing so I will try to rewrite the question (not changing the content, but improving its structure).
The issue in short
I have an openCL program that works fine, if I compile it as an executable. Now I try to make it callable from Python using boost.python. However, as soon as I exit Python (after importing my module), python crashes.
The reason seems to have something to do with statically storing CommandQueues for GPU devices only, and with their release mechanism when the program terminates.
MWE and setup
Setup
IDE used: Visual Studio 2015
OS used: Windows 7 64bit
Python version: 3.5
AMD OpenCL APP 3.0 headers
cl2.hpp directly from Khronos as suggested here: empty openCL program throws deprecation warning
Also I have an Intel CPU with integrated graphics hardware and no other dedicated graphics card
I use version 1.60 of the boost library compiled as 64-bit versions
The boost dll I use is called: boost_python-vc140-mt-1_60.dll
The openCL program without python works fine
The python module without openCL works fine
MWE
#include <vector>
#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_TARGET_OPENCL_VERSION 200
#define CL_HPP_MINIMUM_OPENCL_VERSION 200 // I have the same issue for 100 and 110
#include "cl2.hpp"
#include <boost/python.hpp>
using namespace std;
class TestClass
{
private:
    std::vector<cl::CommandQueue> queues;
    TestClass();

public:
    static const TestClass& getInstance()
    {
        static TestClass instance;
        return instance;
    }
};

TestClass::TestClass()
{
    std::vector<cl::Device> devices;
    vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    //remove non 2.0 platforms (as suggested by doqtor)
    platforms.erase(
        std::remove_if(platforms.begin(), platforms.end(),
            [](const cl::Platform& platform)
            {
                int v = cl::detail::getPlatformVersion(platform());
                short version_major = v >> 16;
                return !(version_major >= 2);
            }),
        platforms.end());

    //Get all available GPUs
    for (const cl::Platform& pl : platforms)
    {
        vector<cl::Device> plDevices;
        try {
            pl.getDevices(CL_DEVICE_TYPE_GPU, &plDevices);
        }
        catch (cl::Error&)
        {
            // Doesn't matter. No GPU is available on the current machine for
            // this platform. Just check afterwards, that you have at least one
            // device
            continue;
        }
        devices.insert(end(devices), begin(plDevices), end(plDevices));
    }

    cl::Context context(devices[0]);
    cl::CommandQueue queue(context, devices[0]);
    queues.push_back(queue);
}

int main()
{
    TestClass::getInstance();
    return 0;
}

BOOST_PYTHON_MODULE(FrameWork)
{
    TestClass::getInstance();
}
Calling program
So after compiling the program as a DLL, I start Python and run the following program:
import FrameWork
exit()
While the import works without issues, Python crashes on exit(). So I click on Debug, and Visual Studio tells me there was an exception in the following code section (in cl2.hpp):
template <>
struct ReferenceHandler<cl_command_queue>
{
    static cl_int retain(cl_command_queue queue)
    { return ::clRetainCommandQueue(queue); }
    static cl_int release(cl_command_queue queue)   // -- HERE --
    { return ::clReleaseCommandQueue(queue); }
};
If you compile the above code instead as a simple executable, it works without issues. Also the code works if one of the following is true:
CL_DEVICE_TYPE_GPU is replaced by CL_DEVICE_TYPE_ALL
the line queues.push_back(queue) is removed
Question
So what could be the reason for this and what are possible solutions? I suspect it has something to do with the fact that my testclass is static, but since it works with the executable I am at a loss what is causing it.
I came across a similar problem in the past.
The clRetain* functions are supported from OpenCL 1.2 onwards.
When getting devices for the first GPU platform (platforms[0].getDevices(...) for CL_DEVICE_TYPE_GPU), in your case that platform must happen to be pre-OpenCL 1.2, hence you get a crash. When getting devices of any type (GPU/CPU/...), your first platform changes to an OpenCL 1.2+ one and everything is fine.
To fix the problem set:
#define CL_HPP_MINIMUM_OPENCL_VERSION 110
This will ensure calls to clRetain* aren't made for unsupported platforms (pre OpenCL 1.2).
Update: I think there is a bug in cl2.hpp: despite setting the minimum OpenCL version to 1.1, it still tries to use clRetain* on pre-OpenCL 1.2 devices when creating a command queue.
Setting the minimum OpenCL version to 110 together with version filtering works fine for me.
Complete working example:
#include "stdafx.h"
#include <vector>
#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_TARGET_OPENCL_VERSION 200
#define CL_HPP_MINIMUM_OPENCL_VERSION 110
#include <CL/cl2.hpp>
using namespace std;
class TestClass
{
private:
    std::vector<cl::CommandQueue> queues;
    TestClass();

public:
    static const TestClass& getInstance()
    {
        static TestClass instance;
        return instance;
    }
};

TestClass::TestClass()
{
    std::vector<cl::Device> devices;
    vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    size_t x = 0;
    for (; x < platforms.size(); ++x)
    {
        cl::Platform &p = platforms[x];
        int v = cl::detail::getPlatformVersion(p());
        short version_major = v >> 16;
        if (version_major >= 2) // OpenCL 2.x
            break;
    }
    if (x == platforms.size())
        return; // no OpenCL 2.0 platform available

    platforms[x].getDevices(CL_DEVICE_TYPE_GPU, &devices);
    cl::Context context(devices);
    cl::CommandQueue queue(context, devices[0]);
    queues.push_back(queue);
}

int main()
{
    TestClass::getInstance();
    return 0;
}
Update2:
So what could be the reason for this and what are possible solutions?
I suspect it has something to do with the fact that my testclass is
static, but since it works with the executable I am at a loss what is
causing it.
TestClass being static seems to be the reason. It looks like memory is released in the wrong order when run from Python. To fix that, you may want to add a method which has to be called explicitly to release the OpenCL objects before Python starts releasing memory.
static TestClass& getInstance() // <- const removed
{
    static TestClass instance;
    return instance;
}

void release()
{
    queues.clear();
}

BOOST_PYTHON_MODULE(FrameWork)
{
    TestClass::getInstance();
    TestClass::getInstance().release();
}
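A possible variation on the same idea, if release() were also exported to Python (for example via boost::python::def, which the MWE above does not do): register it with atexit so the OpenCL objects are released at a well-defined point before the interpreter tears the module down. This is only a hedged sketch of the calling side, not something from the original answer.

# Hypothetical sketch: assumes the FrameWork module exports a module-level
# release() function that forwards to TestClass::getInstance().release().
import atexit
import FrameWork

atexit.register(FrameWork.release)

# ... use FrameWork as usual; release() runs before interpreter shutdown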
"I would appreciate an answer that explains to me what the problem actually is and if there are ways to fix it."
First, let me say that doqtor already answered how to fix the issue -- by ensuring a well-defined destruction time of all used OpenCL resources. IMO, this is not a "hack", but the right thing to do. Trying to rely on static init/cleanup magic to do the right thing -- and watching it fail to do so -- is the real hack!
Second, some thoughts about the issue: the actual problem is even more complex than the common static initialization order fiasco stories. It involves DLL loading/unloading order, both in connection with python loading your custom dll at runtime and (more importantly) with OpenCL's installable client driver (ICD) model.
What DLLs are involved when running an application/dll that uses OpenCL? To the application, the only relevant DLL is the opencl.dll you link against. It is loaded into process memory during application startup time (or when your custom DLL which needs opencl is dynamically loaded in python).
Then, at the time when you first call clGetPlatformInfo() or similar in your code, the ICD logic kicks in: opencl.dll will look for installed drivers (on Windows, those are mentioned somewhere in the registry) and dynamically load their respective DLLs (using something like the LoadLibrary() system call). That may be e.g. nvopencl.dll for nvidia, or some other dll for the intel driver you have installed. Now, in contrast to the relatively simple opencl.dll, this ICD dll can and will have a multitude of dependencies of its own -- probably using Intel IPP, or TBB, or whatever. So by now, things have become really messy already.
Now, during shutdown, the windows loader must decide which dlls to unload in which order. When you compile your example in a single executable, the number and order of dlls being loaded/unloaded will certainly be different than in the "python loads your custom dll at runtime" scenario. And that could well be the reason why you experience the problem only in the latter case, and only if you still have an opencl-context+commandqueue alive during shutdown of your custom dll. The destruction of your queue (triggered via clRelease... during static destruction of your testclass instance) is delegated to the intel-icd-dll, so this dll must still be fully functional at that time. If, for some reason, that is not the case (perhaps because the loader chose to unload it or one of the dlls it needs), you crash.
That line of thought reminded me of this article:
https://blogs.msdn.microsoft.com/larryosterman/2004/06/10/dll_process_detach-is-the-last-thing-my-dlls-going-to-see-right/
There's a paragraph, talking about "COM objects", which might be equally applicable to "OpenCL resources":
"So consider the case where you have a DLL that instantiates a COM object at some point during its lifetime. If that DLL keeps a reference to the COM object in a global variable, and doesn’t release the COM object until the DLL_PROCESS_DETACH, then the DLL that implements the COM object will be kept in memory during the lifetime of the COM object. Effectively the DLL implementing the COM object has become dependant on the DLL that holds the reference to the COM object. But the loader has no way of knowing about this dependency. All it knows is that the DLL’s are loaded into memory."
Now, I wrote a lot of words without coming to a definitive proof of what's actually going wrong. The main lesson I learned from bugs like these is: don't enter that snake pit, and do your resource-cleanup in a well-defined place like doqtor suggested. Good night.

Python - C embedded Segmentation fault

I am facing a problem similar to Py_initialize / Py_Finalize not working twice with numpy. The basic coding in C:
Py_Initialize();
import_array();
//Call a python function which imports numpy as a module
//Py_Finalize()
The program is in a loop, and it gives a segfault if the Python code has numpy as one of the imported modules. If I remove numpy, it works fine.
As a temporary workaround I tried not using Py_Finalize(), but that causes huge memory leaks [observed as the memory usage from TOP keeps increasing]. I tried, but did not understand, the suggestion in the link I posted. Can someone please suggest the best way to finalize the call while having imports such as numpy?
Thanks
santhosh.
I recently faced a very similar issue and developed a workaround that works for my purposes, so I thought I would write it here in the hope it might help others.
The problem
I work with a postprocessing pipeline for which I can write my own functor to work on some data passing through the pipeline, and I wanted to be able to use Python scripts for some of the operations.
The problem is that the only thing I can control is the functor itself, which gets instantiated and destroyed at times beyond my control. I furthermore have the problem that even if I do not call Py_Finalize the pipeline sometimes crashes once I pass another dataset through the pipeline.
The solution in a Nutshell
For those who don't want to read the whole story and get straight to the point, here's the gist of my solution:
The main idea behind my workaround is not to link against the Python library, but instead load it dynamically using dlopen and then get all the addresses of the required Python functions using dlsym. Once that's done, one can call Py_Initialize() followed by whatever you want to do with Python functions followed by a call to Py_Finalize() once you're done. Then, one can simply unload the Python library. The next time you need to use Python functions, simply repeat the steps above and Bob's your uncle.
However, if you are importing NumPy at any point between Py_Initialize and Py_Finalize, you will also need to look for all the currently loaded libraries in your program and manually unload those using dlclose.
Detailed workaround
Loading instead of linking Python
The main idea as I mentioned above is not to link against the Python library. Instead, what we will do is load the Python library dynamically using dlopen():
#include <dlfcn.h>
...
void* pHandle = dlopen("/path/to/library/libpython2.7.so", RTLD_NOW | RTLD_GLOBAL);
The code above loads the Python shared library and returns a handle to it (the return type is an obscure pointer type, thus the void*). The second argument (RTLD_NOW | RTLD_GLOBAL) is there to make sure that the symbols are properly imported into the current application's scope.
Once we have a pointer to the handle of the loaded library, we can search that library for the functions it exports using the dlsym function:
#include <dlfcn.h>
...
// Typedef named 'void_func_t': a pointer to a function that takes
// no arguments and returns nothing
typedef void (*void_func_t)(void);
void_func_t MyPy_Initialize = (void_func_t) dlsym(pHandle, "Py_Initialize");
The dlsym function takes two parameters: a pointer to the handle of the library that we obtained previously and the name of the function we are looking for (in this case, Py_Initialize). Once we have the address of the function we want, we can create a function pointer and initialize it to that address. To actually call the Py_Initialize function, one would then simply write:
MyPy_Initialize();
For all the other functions provided by the Python C API, one can just add calls to dlsym, initialize function pointers to their return values, and then use those function pointers instead of the Python functions. One simply has to know the parameter and return types of the Python function in order to create the correct type of function pointer.
Once we are finished with the Python functions and call Py_Finalize using a procedure similar to the one for Py_Initialize one can unload the Python dynamic library in the following way:
dlclose(pHandle);
pHandle = NULL;
Manually unloading NumPy libraries
Unfortunately, this does not solve the segmentation fault problems that occur when importing NumPy. The problem comes from the fact that NumPy also loads some libraries using dlopen (or something equivalent), and those do not get unloaded when you call Py_Finalize. Indeed, if you list all the loaded libraries within your program, you will notice that after closing the Python environment with Py_Finalize, followed by a call to dlclose, some NumPy libraries will remain loaded in memory.
The second part of the solution requires listing all the Python libraries that remain in memory after the call to dlclose(pHandle). Then, for each of those libraries, grab a handle to them and call dlclose on them. After that, they should get unloaded automatically by the operating system.
Fortunately, there are functions under both Windows and Linux (sorry MacOS, couldn't find anything that would work in your case...):
- Linux: dl_iterate_phdr
- Windows: EnumProcessModules in conjunction with OpenProcess and GetModuleFileNameEx
Linux
This is rather straightforward once you read the documentation about dl_iterate_phdr:
#include <link.h>
#include <string>
#include <vector>

// global variables are evil!!! but this is just for demonstration purposes...
std::vector<std::string> loaded_libraries;

// callback function that gets called for every loaded library that
// dl_iterate_phdr finds
int dl_list_callback(struct dl_phdr_info *info, size_t, void *)
{
    loaded_libraries.push_back(info->dlpi_name);
    return 0;
}

int main()
{
    ...
    loaded_libraries.clear();
    dl_iterate_phdr(dl_list_callback, NULL);
    // loaded_libraries now contains a list of all dynamic libraries loaded
    // in your program
    ....
}
Basically, the function dl_iterate_phdr cycles through all the loaded libraries (in the reverse order they were loaded) until either the callback returns something other than 0 or it reaches the end of the list. To save the list, the callback simply adds each element to a global std::vector (one should obviously avoid global variables and use a class for example).
Windows
Under Windows, things get a little more complicated, but still manageable:
#include <windows.h>
#include <psapi.h>

std::vector<std::string> list_loaded_libraries()
{
    std::vector<std::string> m_asDllList;
    HANDLE hProcess(OpenProcess(PROCESS_QUERY_INFORMATION
                                | PROCESS_VM_READ,
                                FALSE, GetCurrentProcessId()));
    if (hProcess) {
        HMODULE hMods[1024];
        DWORD cbNeeded;
        if (EnumProcessModules(hProcess, hMods, sizeof(hMods), &cbNeeded)) {
            const DWORD SIZE(cbNeeded / sizeof(HMODULE));
            for (DWORD i(0); i < SIZE; ++i) {
                TCHAR szModName[MAX_PATH];
                // Get the full path to the module file.
                if (GetModuleFileNameEx(hProcess,
                                        hMods[i],
                                        szModName,
                                        sizeof(szModName) / sizeof(TCHAR))) {
#ifdef UNICODE
                    std::wstring wStr(szModName);
                    std::string tModuleName(wStr.begin(), wStr.end());
#else
                    std::string tModuleName(szModName);
#endif /* UNICODE */
                    if (tModuleName.substr(tModuleName.size()-3) == "dll") {
                        m_asDllList.push_back(tModuleName);
                    }
                }
            }
        }
        CloseHandle(hProcess);
    }
    return m_asDllList;
}
The code in this case is slightly longer than for the Linux case, but the main idea is the same: list all the loaded libraries and save them into a std::vector. Don't forget to also link your program to the Psapi.lib!
Manual unloading
Now that we can list all the loaded libraries, all you need to do is find among those the ones that come from loading NumPy, grab a handle to them and then call dlclose on that handle. The code below will work on both Windows and Linux, provided that you use the dlfcn-win32 library.
#ifdef WIN32
#  include <windows.h>
#  include <psapi.h>
#  include "dlfcn_win32.h"
#else
#  include <dlfcn.h>
#  include <link.h> // for dl_iterate_phdr
#endif /* WIN32 */

#include <string>
#include <vector>

// Function that lists all loaded libraries (not implemented here)
std::vector<std::string> list_loaded_libraries();

int main()
{
    // do some preprocessing stuff...

    // store the list of loaded libraries now;
    // any libraries that get added to the list from now on must be Python
    // libraries
    std::vector<std::string> loaded_libraries(list_loaded_libraries());
    std::size_t start_idx(loaded_libraries.size());

    void* pHandle = dlopen("/path/to/library/libpython2.7.so", RTLD_NOW | RTLD_GLOBAL);
    // Not implemented here: get the addresses of the Python functions you need

    MyPy_Initialize();                    // Needs to be defined somewhere above!
    MyPyRun_SimpleString("import numpy"); // Needs to be defined somewhere above!
    // ...
    MyPyFinalize();                       // Needs to be defined somewhere above!

    // Now list the loaded libraries again and start manually unloading them,
    // starting from the end
    loaded_libraries = list_loaded_libraries();

    // NB: this below assumes that start_idx != 0, which should always hold true
    for(std::size_t i(loaded_libraries.size()-1) ; i >= start_idx ; --i) {
        void* pHandle = dlopen(loaded_libraries[i].c_str(),
#ifdef WIN32
                               RTLD_NOW // no support for RTLD_NOLOAD
#else
                               RTLD_NOW|RTLD_NOLOAD
#endif /* WIN32 */
                              );
        if (pHandle) {
            const unsigned int Nmax(50); // Avoid getting stuck in an infinite loop
            for (unsigned int j(0) ; j < Nmax && !dlclose(pHandle) ; ++j);
        }
    }
}
Final words
The examples shown here capture the basic ideas behind my solution, but can certainly be improved to avoid global variables and facilitate ease of use (for example, I wrote a singleton class that handles the automatic initialization of all the function pointers after loading the Python library).
I hope this can be useful to someone in the future.
References
dl_iterate_phdr: https://linux.die.net/man/3/dl_iterate_phdr
PsAPI library: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684894(v=vs.85).aspx
OpenProcess: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684320(v=vs.85).aspx
EnumProcess: https://msdn.microsoft.com/en-us/library/windows/desktop/ms682629(v=vs.85).aspx
GetModuleFileNameEx: https://msdn.microsoft.com/en-us/library/windows/desktop/ms683198(v=vs.85).aspx
dlfcn-win32 library: https://github.com/dlfcn-win32/dlfcn-win32
I'm not quite sure how you don't seem to understand the solution posted in Py_initialize / Py_Finalize not working twice with numpy. The solution posted is quite simple: call Py_Initialize and Py_Finalize only once for each time your program executes. Do not call them every time you run the loop.
I assume that your program, when it starts, runs some initialization commands (which are only run once). Call Py_Initialize there. Never call it again. Also, I assume that when your program terminates, it has some code to tear down things, dump log files, etc. Call Py_Finalize there. Py_Initialize and Py_Finalize are not intended to help you manage memory in the Python interpreter. Do not use them for that, as they cause your program to crash. Instead, use Python's own functions to get rid of objects you don't want to keep.
If you really MUST create a new environment every time you run your code, you can use Py_NewInterpreter to create a sub-interpreter and Py_EndInterpreter to destroy that sub-interpreter later. They're documented near the bottom of the Python C API page. This works similarly to having a new interpreter, except that modules are not re-initialized each time a sub-interpreter starts.

Segfault on calling standard windows .dll from python ctypes with wine

I'm trying to call some functions from Kernel32.dll in my Python script running on Linux. As Johannes Weiß pointed out in How to call Wine dll from python on Linux?, I'm loading the kernel32.dll.so library via ctypes.cdll.LoadLibrary() and it loads fine. I can see that kernel32 is loaded and even has a GetLastError() function inside. However, whenever I try to call the function, I get a segfault.
import ctypes
kernel32 = ctypes.cdll.LoadLibrary('/usr/lib/i386-linux-gnu/wine/kernel32.dll.so')
print kernel32
# <CDLL '/usr/lib/i386-linux-gnu/wine/kernel32.dll.so', handle 8843c10 at b7412e8c>
print kernel32.GetLastError
# <_FuncPtr object at 0xb740b094>
gle = kernel32.GetLastError
# OK
gle_result = gle()
# fails with
# Segmentation fault (core dumped)
print gle_result
At first I was thinking about calling-convention differences, but that seems to be okay after all. I ended up testing the simple GetLastError() function, which takes no parameters, but I still get a segmentation fault anyway.
My testing system is Ubuntu 12.10, Python 2.7.3 and wine-1.4.1 (everything is 32bit)
UPD
I proceeded with my testing and found several functions that I can call via ctypes without a segfault, for instance Beep() and GetCurrentThread(); many other functions still give me a segfault. I created a small C application to test the kernel32.dll.so library without Python, but I got essentially the same results.
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

int main(int argc, char **argv)
{
    void *lib_handle;
#define LOAD_LIBRARY_AS_DATAFILE 0x00000002
    long (*GetCurrentThread)(void);
    long (*beep)(long,long);
    void (*sleep)(long);
    long (*LoadLibraryExA)(char*, long, long);
    long x;
    char *error;

    lib_handle = dlopen("/usr/local/lib/wine/kernel32.dll.so", RTLD_LAZY);
    if (!lib_handle)
    {
        fprintf(stderr, "%s\n", dlerror());
        exit(1);
    }

    // All the functions are loaded, e.g. sleep != NULL
    GetCurrentThread = dlsym(lib_handle, "GetCurrentThread");
    beep = dlsym(lib_handle, "Beep");
    LoadLibraryExA = dlsym(lib_handle, "LoadLibraryExA");
    sleep = dlsym(lib_handle, "Sleep");
    if ((error = dlerror()) != NULL)
    {
        fprintf(stderr, "%s\n", error);
        exit(1);
    }

    // Works
    x = (*GetCurrentThread)();
    printf("Val x=%ld\n", x);

    // Works (no beeping, but no segfault either)
    (*beep)(500, 500);

    // Segfault
    (*sleep)(5000);

    // Segfault
    (*LoadLibraryExA)("/home/ubuntu/test.dll", 0, LOAD_LIBRARY_AS_DATAFILE);

    printf("The End\n");
    dlclose(lib_handle);
    return 0;
}
I tried using different calling conventions for the Sleep() function but had no luck with that either. When I compare the function declarations/implementations in the Wine sources, they are essentially the same:
Declarations
HANDLE WINAPI GetCurrentThread(void) // http://source.winehq.org/source/dlls/kernel32/thread.c#L573
BOOL WINAPI Beep( DWORD dwFreq, DWORD dwDur ) // http://source.winehq.org/source/dlls/kernel32/console.c#L354
HMODULE WINAPI DECLSPEC_HOTPATCH LoadLibraryExA(LPCSTR libname, HANDLE hfile, DWORD flags) // http://source.winehq.org/source/dlls/kernel32/module.c#L928
VOID WINAPI DECLSPEC_HOTPATCH Sleep( DWORD timeout ) // http://source.winehq.org/source/dlls/kernel32/sync.c#L95
WINAPI is defined to be __stdcall
However some of them works and some don't. As I can understand this sources are for kernel32.dll file and kernel32.dll.so file is a some kind of proxy that supposed to provide access to kernel32.dll for linux code. Probably I need to find exact sources of kernel32.dll.so file and take a look on declarations.
Is there any tool I can use to take a look inside .so file and find out what functions and what calling conventions are used?
The simplest way to examine a DLL is to use the nm command, i.e.
$ nm kernel32.dll.so | grep GetLastError
7b86aae0 T _GetLastError
As others have pointed out, the default calling convention for Windows C DLLs is stdcall. It has nothing to do with using Python. On the Windows platform, ctypes.windll is available.
However, I am not even sure that what you are trying to do is possible at all. Wine is a full-blown Windows emulator, and it is safe to guess that at the very least you would have to start it with wine_init before loading any other functions. The Windows API probably has some state (set when Windows boots).
The easiest way to continue is probably to install a Windows version of Python under Wine and run your script from there.
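For comparison, here is a minimal sketch of what the calling side would look like on an actual Windows Python (or a Windows build of Python installed under Wine, as suggested above), where ctypes.windll applies the stdcall convention that kernel32.dll expects. This illustrates the suggestion; it has not been tested under Wine.

import ctypes

kernel32 = ctypes.windll.kernel32   # windll => stdcall calling convention

kernel32.Beep(500, 500)             # 500 Hz tone for 500 ms
kernel32.Sleep(1000)                # block for one second
print(kernel32.GetLastError())      # last error code of the calling thread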
