Why might this use of new cause a segmentation fault? - python

If I use placement new in the following way, my code appears to run reliably. If I use the commented-out line instead, I frequently get either a segmentation fault (11) or unexpected results. Returning the address of a statically allocated object also works.
I've verified that null is not returned and that this function is called exactly once for the single Python object in my test script.
extern "C"
{
void * hail_detector_new()
{
alignas(alignof(hail::Detector)) static U8 allocation[sizeof(hail::Detector)];
// return new(std::nothrow) hail::Detector;
return new(allocation) hail::Detector;
}
}
Python argument and return type declarations:
f = _library.hail_detector_new
f.restype = c_void_p
f.argtypes = None

class Detector(object):
    def __init__(self):
        self._object = _library.hail_detector_new()
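One thing worth double-checking with this pattern: every exported function that later receives the pointer back must also have its argtypes declared, since ctypes passes arguments to functions with undeclared argtypes as plain C ints, truncating a 64-bit pointer to 32 bits (the same failure mode as the ctypes question further down this page). A minimal sketch, assuming a hypothetical hail_detector_delete export alongside the constructor:

from ctypes import CDLL, c_void_p

_library = CDLL("libhail.dylib")  # assumed library name

# Constructor: returns an opaque pointer.
_library.hail_detector_new.restype = c_void_p
_library.hail_detector_new.argtypes = []

# Hypothetical destructor: declaring argtypes ensures the pointer
# travels back as a full 64-bit value instead of a truncated int.
_library.hail_detector_delete.restype = None
_library.hail_detector_delete.argtypes = [c_void_p]

class Detector(object):
    def __init__(self):
        self._object = _library.hail_detector_new()

    def close(self):
        _library.hail_detector_delete(self._object)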
Versions
Squall: clang --version
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin16.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Squall: python --version
Python 2.7.13 :: Continuum Analytics, Inc.

Related

SWIG, Python and unicode strings

I am developing a Python module using SWIG and linking with a certain library.
I have a typemap like the following:
%typemap(in) MyStringType {
    char* buffer = 0;
    size_t size = 0;
    int alloc = SWIG_OLDOBJ;
    int res = SWIG_AsCharPtrAndSize($input, &buffer, &size, &alloc);
    if (!SWIG_IsOK(res)) {
        SWIG_exception_fail(SWIG_TypeError, "in method '$symname', expecting string");
    }
    $1.p = buffer;
    $1.n = size - 1;
}
I have a function that takes MyStringType as a parameter:
void myFunc(MyStringType type);
Then, when I build the Python module, I call the following in Python 2.7:
myFunc("Test string")
However, when I try with Python 3.4 I get the following:
TypeError: in method 'myFunc', expecting string
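The distinction the typemap hinges on is bytes versus unicode, whose defaults differ between the two interpreter versions. A quick plain-Python illustration (no SWIG involved):

import sys

s = "Test string"
if sys.version_info[0] >= 3:
    # In Python 3 a bare literal is unicode (str); bytes need the b prefix.
    assert isinstance(s, str) and not isinstance(s, bytes)
else:
    # In Python 2 a bare literal is already a byte string.
    assert isinstance(s, bytes)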
I could solve that by adding this to the SWIG interface file:
%begin %{
#define SWIG_PYTHON_STRICT_UNICODE_WCHAR
%}
That solved the problem. However, I still need to add one more macro because some other functions require it. So now it is like this:
%begin %{
#define SWIG_PYTHON_STRICT_BYTE_CHAR
#define SWIG_PYTHON_STRICT_UNICODE_WCHAR
%}
This would make me call myFunc like this:
myFunc(b"Test string")
And everything works fine when using tox on my PC, for Python 2.7, 3.4 and 3.5, but it fails on Travis CI.
After removing the b before "Test string" it succeeds on Travis CI but fails on my PC.
On Travis CI I have: Python 2.7.6, 3.4.3
On my PC: Python 2.7.13, 3.4.8
Any idea why?
Solved, thanks for all the hints.
It was due to different SWIG versions.
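Since builds using different SWIG versions or strict-string macros may expect different argument types, one defensive workaround for a test script (a sketch only, not part of the original fix) is to try bytes first and fall back to str:

def call_with_string(func, text):
    # Try the byte-string signature first, then the unicode one.
    try:
        return func(text.encode("utf-8"))
    except TypeError:
        return func(text)

call_with_string(myFunc, "Test string")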

Qt - How to write "more" data to QProcess?

I am facing an issue I have run into a few times before.
Each time, the issue just resolves itself without me understanding what caused it.
What happens is that I start a Python virtual environment from my C++ code. That works; afterwards, using the write() function, I am able to write commands into that environment, which also works fine. However, I am unable to write my last command to the process.
I thought some buffer might be full, but I didn't find anything about a buffer in the Qt docs.
This is the relevant piece of code:
static QStringList params;
QProcess *p = new QProcess();
params << "-f" << "-c" << "python2" << "/home/John/Desktop/python.log";
qDebug() << "parameters: " << params;
qDebug() << "going to write";
p->start("script", params);
qDebug() << "Turning on new user process...";
while (!p->waitForStarted())
    { qDebug() << "waiting for virtualenv to be ready"; }

successFailWrite = p->write("import imp;\n");
while (!p->waitForBytesWritten());
successFailWrite = p->write("foo = imp.load_source('myTest', '/home/John/recognitionClass.py');\n");
while (!p->waitForBytesWritten());
successFailWrite = p->write("from myTest import recognitionClass;\n");
while (!p->waitForBytesWritten());
successFailWrite = p->write("myClassObj = recognitionClass();\n");
if (successFailWrite != -1)
    { qDebug() << "OK written"; }
while (!p->waitForBytesWritten());

successFailWrite = p->write("habelahabela\n");
if (successFailWrite != -1)
    { qDebug() << "OK written"; }

QString name = "John";
QString processNewUserParameter = "print myClassObj.addNewUser(" + name + ");\n";
QByteArray processNewUserParameterByteArr = processNewUserParameter.toUtf8();
p->write(processNewUserParameterByteArr);
I keep a log file which contains what is written to the Python virtualenv and what is printed:
Script started on Son 27 Aug 2017 20:09:52 CEST
import imp;
foo = imp.load_source('myTest', '/home/John/recognitionClass.py');
from myTest import recognitionClass;
myClassObj = recognitionClass();
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import imp;
>>> foo = imp.load_source('myTest', '/home/John/recognit
<myTest', '/home/John/recogniti onClass.py');
/usr/local/lib/python2.7/dist-packages/sklearn/cross_validation.py:44: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
"This module will be removed in 0.20.", DeprecationWarning)
/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py:43: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.
DeprecationWarning)
>>> from myTest import recognitionClass;
>>> myClassObj = recognitionClass();
>>>
It does print "OK written" twice, which on the one hand proves that I successfully wrote my commands to the process, yet I can't see them in the log.
As you can see, the test sentence "habelahabela" doesn't get written either.
Does anybody have an idea about what I may be doing wrong?
I know that I am writing my commands too quickly to the environment, because as you can see I start by writing "import imp"; it gets buffered, and a little later the buffer gets flushed and the virtualenv executes the command (this is why you see it twice).
Does anybody see why I can't see the test sentence and, more importantly, my actual command "print myClassObj.addNewUser(" + name + ");\n" being printed to the virtual environment?
Thanks
First of all, there is no sense in writing while(!p->waitForBytesWritten());. waitForBytesWritten already blocks your thread without a while loop and, as the name states, waits until bytes are written. It returns false only if there is either a timeout or an error. In the first case you should give it more time to write the bytes. In the second case you should fix the error and only then try again.
The same holds for waitForStarted and all other Qt functions starting with "waitFor...".
So the usage looks like:
if (!p->waitForBytesWritten(-1)) // waits forever until bytes ARE written
{
    qDebug() << "Error while writing bytes";
}
Regarding the question: I believe the problem (or at least part of it) is that you write your last 2 messages into p, but you neither wait for the bytesWritten() signal, nor use the blocking waitForBytesWritten() function. There is probably no error occurring (because p->write(...) does not return -1 at that point), but that does not mean your message has been written yet. In a nutshell: wait for the bytesWritten() signal...
QProcess inherits from QIODevice, so I recommend looking at its docs and learning a bit more about it.

static openCL class not properly released in python module using boost.python

EDIT: Ok, all the edits made the layout of the question a bit confusing so I will try to rewrite the question (not changing the content, but improving its structure).
The issue in short
I have an OpenCL program that works fine if I compile it as an executable. Now I am trying to make it callable from Python using boost.python. However, as soon as I exit Python (after importing my module), Python crashes.
The reason seems to have something to do with statically storing CommandQueues for GPU devices only, and with their release mechanism when the program terminates.
MWE and setup
Setup
IDE used: Visual Studio 2015
OS used: Windows 7 64bit
Python version: 3.5
AMD OpenCL APP 3.0 headers
cl2.hpp directly from Khronos as suggested here: empty openCL program throws deprecation warning
Also I have an Intel CPU with integrated graphics hardware and no other dedicated graphics card
I use version 1.60 of the Boost library, compiled as 64-bit
The boost dll I use is called: boost_python-vc140-mt-1_60.dll
The OpenCL program without Python works fine
The Python module without OpenCL works fine
MWE
#include <vector>
#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_TARGET_OPENCL_VERSION 200
#define CL_HPP_MINIMUM_OPENCL_VERSION 200 // I have the same issue for 100 and 110
#include "cl2.hpp"
#include <boost/python.hpp>

using namespace std;

class TestClass
{
private:
    std::vector<cl::CommandQueue> queues;
    TestClass();
public:
    static const TestClass& getInstance()
    {
        static TestClass instance;
        return instance;
    }
};

TestClass::TestClass()
{
    std::vector<cl::Device> devices;
    vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);

    // remove non 2.0 platforms (as suggested by doqtor)
    platforms.erase(
        std::remove_if(platforms.begin(), platforms.end(),
            [](const cl::Platform& platform)
            {
                int v = cl::detail::getPlatformVersion(platform());
                short version_major = v >> 16;
                return !(version_major >= 2);
            }),
        platforms.end());

    // Get all available GPUs
    for (const cl::Platform& pl : platforms)
    {
        vector<cl::Device> plDevices;
        try {
            pl.getDevices(CL_DEVICE_TYPE_GPU, &plDevices);
        }
        catch (cl::Error&)
        {
            // Doesn't matter. No GPU is available on the current machine for
            // this platform. Just check afterwards, that you have at least one
            // device
            continue;
        }
        devices.insert(end(devices), begin(plDevices), end(plDevices));
    }
    cl::Context context(devices[0]);
    cl::CommandQueue queue(context, devices[0]);
    queues.push_back(queue);
}

int main()
{
    TestClass::getInstance();
    return 0;
}

BOOST_PYTHON_MODULE(FrameWork)
{
    TestClass::getInstance();
}
Calling program
So after compiling the program as a DLL, I start Python and run the following program:
import FrameWork
exit()
While the import works without issues, Python crashes on exit(). So I click on Debug, and Visual Studio tells me there was an exception in the following code section (in cl2.hpp):
template <>
struct ReferenceHandler<cl_command_queue>
{
    static cl_int retain(cl_command_queue queue)
    { return ::clRetainCommandQueue(queue); }
    static cl_int release(cl_command_queue queue) // -- HERE --
    { return ::clReleaseCommandQueue(queue); }
};
If you compile the above code instead as a simple executable, it works without issues. Also the code works if one of the following is true:
CL_DEVICE_TYPE_GPU is replaced by CL_DEVICE_TYPE_ALL
the line queues.push_back(queue) is removed
Question
So what could be the reason for this and what are possible solutions? I suspect it has something to do with the fact that my testclass is static, but since it works with the executable I am at a loss what is causing it.
I came across a similar problem in the past.
clRetain* functions are supported from OpenCL 1.2 onwards.
When getting devices for the first GPU platform (platforms[0].getDevices(...) for CL_DEVICE_TYPE_GPU), in your case it must happen to be a pre-OpenCL 1.2 platform, hence you get a crash. When getting devices of any type (GPU/CPU/...), your first platform changes to an OpenCL 1.2+ one and everything is fine.
To fix the problem set:
#define CL_HPP_MINIMUM_OPENCL_VERSION 110
This will ensure calls to clRetain* aren't made for unsupported platforms (pre-OpenCL 1.2).
Update: I think there is a bug in cl2.hpp which, despite the minimum OpenCL version being set to 1.1, still tries to use clRetain* on pre-OpenCL 1.2 devices when creating a command queue.
Setting the minimum OpenCL version to 110 together with version filtering works fine for me.
Complete working example:
#include "stdafx.h"
#include <vector>
#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_TARGET_OPENCL_VERSION 200
#define CL_HPP_MINIMUM_OPENCL_VERSION 110
#include <CL/cl2.hpp>
using namespace std;
class TestClass
{
private:
std::vector<cl::CommandQueue> queues;
TestClass();
public:
static const TestClass& getInstance()
{
static TestClass instance;
return instance;
}
};
TestClass::TestClass()
{
std::vector<cl::Device> devices;
vector<cl::Platform> platforms;
cl::Platform::get(&platforms);
size_t x = 0;
for (; x < platforms.size(); ++x)
{
cl::Platform &p = platforms[x];
int v = cl::detail::getPlatformVersion(p());
short version_major = v >> 16;
if (version_major >= 2) // OpenCL 2.x
break;
}
if (x == platforms.size())
return; // no OpenCL 2.0 platform available
platforms[x].getDevices(CL_DEVICE_TYPE_GPU, &devices);
cl::Context context(devices);
cl::CommandQueue queue(context, devices[0]);
queues.push_back(queue);
}
int main()
{
TestClass::getInstance();
return 0;
}
Update 2:

So what could be the reason for this and what are possible solutions? I suspect it has something to do with the fact that my testclass is static, but since it works with the executable I am at a loss what is causing it.
TestClass being static seems to be the reason. It looks like memory is released in the wrong order when run from Python. To fix that, you may want to add a method that must be called explicitly to release the OpenCL objects before Python starts releasing memory.
static TestClass& getInstance() // <- const removed
{
    static TestClass instance;
    return instance;
}

void release()
{
    queues.clear();
}

BOOST_PYTHON_MODULE(FrameWork)
{
    TestClass::getInstance();
    TestClass::getInstance().release();
}
"I would appreciate an answer that explains to me what the problem actually is and if there are ways to fix it."
First, let me say that doqtor already answered how to fix the issue -- by ensuring a well-defined destruction time of all used OpenCL resources. IMO, this is not a "hack", but the right thing to do. Trying to rely on static init/cleanup magic to do the right thing -- and watching it fail to do so -- is the real hack!
Second, some thoughts about the issue: the actual problem is even more complex than the common static initialization order fiasco stories. It involves DLL loading/unloading order, both in connection with python loading your custom dll at runtime and (more importantly) with OpenCL's installable client driver (ICD) model.
What DLLs are involved when running an application/dll that uses OpenCL? To the application, the only relevant DLL is the opencl.dll you link against. It is loaded into process memory during application startup time (or when your custom DLL which needs opencl is dynamically loaded in python).
Then, at the time when you first call clGetPlatformInfo() or similar in your code, the ICD logic kicks in: opencl.dll will look for installed drivers (in Windows, those are mentioned somewhere in the registry) and dynamically load their respective dlls (using something like the LoadLibrary() system call). That may be e.g. nvopencl.dll for nvidia, or some other dll for the intel driver you have installed. Now, in contrast to the relatively simple opencl.dll, this ICD dll can and will have a multitude of dependencies of its own -- probably using Intel IPP, or TBB, or whatever. So by now, things have become really messy.
Now, during shutdown, the windows loader must decide which dlls to unload in which order. When you compile your example in a single executable, the number and order of dlls being loaded/unloaded will certainly be different than in the "python loads your custom dll at runtime" scenario. And that could well be the reason why you experience the problem only in the latter case, and only if you still have an opencl-context+commandqueue alive during shutdown of your custom dll. The destruction of your queue (triggered via clRelease... during static destruction of your testclass instance) is delegated to the intel-icd-dll, so this dll must still be fully functional at that time. If, for some reason, that is not the case (perhaps because the loader chose to unload it or one of the dlls it needs), you crash.
That line of thought reminded me of this article:
https://blogs.msdn.microsoft.com/larryosterman/2004/06/10/dll_process_detach-is-the-last-thing-my-dlls-going-to-see-right/
There's a paragraph, talking about "COM objects", which might be equally applicable to "OpenCL resources":
"So consider the case where you have a DLL that instantiates a COM object at some point during its lifetime. If that DLL keeps a reference to the COM object in a global variable, and doesn’t release the COM object until the DLL_PROCESS_DETACH, then the DLL that implements the COM object will be kept in memory during the lifetime of the COM object. Effectively the DLL implementing the COM object has become dependant on the DLL that holds the reference to the COM object. But the loader has no way of knowing about this dependency. All it knows is that the DLL’s are loaded into memory."
Now, I wrote a lot of words without coming to a definitive proof of what's actually going wrong. The main lesson I learned from bugs like these is: don't enter that snake pit, and do your resource-cleanup in a well-defined place like doqtor suggested. Good night.
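To make that well-defined cleanup point concrete from the Python side: assuming the module also exposed TestClass's release() to Python via boost::python::def (which the snippets above do not show), the script could register it to run at interpreter shutdown, before the DLLs are unloaded:

import atexit
import FrameWork

# Hypothetical: assumes FrameWork exports a release() function that
# clears the static TestClass instance's OpenCL command queues.
atexit.register(FrameWork.release)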

Boost::Python Not Finding C++ Class in OSX

I'm porting an application from Linux to OS X, and the Boost::Python integration is failing at run time.
I'm exposing my C++ classes like so:
using namespace scarlet;
BOOST_PYTHON_MODULE(libscarlet) {
    using namespace boost::python;
    class_<VideoEngine, boost::noncopyable>("VideoEngine", no_init)
        .def("play", &VideoEngine::play)
        .def("pause", &VideoEngine::pause)
        .def("isPaused", &VideoEngine::isPaused)
        [...]
    ;
}
I'm importing the library like so:
try {
    boost::python::import("libscarlet");
} catch (boost::python::error_already_set & e) {
    PyErr_Print();
}
Then I inject an instance into the global Python namespace like so:
void VideoEngine::init() {
    [...]
    try {
        auto main_module = boost::python::import("__main__");
        auto main_namespace = main_module.attr("__dict__");
        main_namespace["engine"] = boost::python::object(boost::python::ptr(this));
    } catch (boost::python::error_already_set & e) {
        PyErr_Print();
    }
    [...]
}
It works great in Linux but in OS X an exception is thrown and PyErr_Print() returns TypeError: No Python class registered for C++ class scarlet::VideoEngine.
As far as I can tell the module works without issue when imported via the Python interpreter. It is difficult to test since it is designed to be injected as a pre-constructed instance, but the class and functions are present, as shown below:
$ python
Python 2.7.5 (default, Mar 9 2014, 22:15:05)
[GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import libscarlet
>>> libscarlet.VideoEngine
<class 'libscarlet.VideoEngine'>
>>> libscarlet.VideoEngine.play
<unbound method VideoEngine.play>
Any ideas as to where the incompatibility lies?
Edit: I'm starting to think it might be related to multithreading, since my OS X implementation uses a different threading structure, although all of the calls detailed here happen in the same thread. Could that be the cause of such an issue? Probably not, since it doesn't work on MS Windows in single-threaded mode either.
I have solved this now.
It was caused entirely by Boost::Python being statically compiled; once I recompiled it as a shared library the problem went away entirely, on all platforms.
The lesson: don't compile Boost statically. I'm pretty sure there are warnings against it, and they should be heeded.

How do you pass a void * back and forth to a shared library in python using ctypes?

I believe the basic process is:
from ctypes import *
LEAP = ctypes.CDLL('leap.dylib')
leap_controller = LEAP.leap_controller
leap_controller.res_type = c_void_p
leap_controller_dispose = LEAP.leap_controller_dispose
leap_controller_dispose.argtype = [c_void_p]
However, when I invoke this code it mangles the void * pointer. Debugging from the associated C code, I get:
doug:test doug$ python test.py
Controller init!
returning: 0x7fcf9a0022c0
Created controller
Controller dispose!
argument: 0xffffffff9a0022c0
Segmentation fault: 11
Obviously the free() call fails because the pointer isn't right.
I've tried a couple of variations on this, for example, defining a structure, like:
class LEAP_CONTROLLER(Structure):
    _fields_ = [
        ("data", c_void_p)
    ]

leap_controller = LEAP.leap_controller
leap_controller.res_type = POINTER(LEAP_CONTROLLER)
leap_controller_dispose = LEAP.leap_controller_dispose
leap_controller_dispose.argtype = [POINTER(LEAP_CONTROLLER)]
...but the result always seems to be the same.
Python reads the pointer, but only keeps 32 bits worth of data, and returns a broken pointer with the remaining bits all 1.
Both of the binaries are 64 bit:
Non-fat file: /usr/local/bin/python is architecture: x86_64
Non-fat file: ../../dll/libcleap.dylib is architecture: x86_64
I saw this ancient issue, here:
http://bugs.python.org/issue1703286
...but it's marked as resolved.
A few other random mailing list threads mention ctypes 'truncating 64-bit pointers to 32-bit', but people all seem to have magically resolved their issues without doing anything ('it works now') or gotten no answer to their questions.
Anyone know what I'm doing wrong?
Answer as per the comments; this was a typo in my code.
Per the docs, it's argtypes and restype, not argtype and res_type. – eryksun Oct 4 at 2:31
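For completeness, the corrected declarations from the question, using the real ctypes attribute names:

from ctypes import CDLL, c_void_p

LEAP = CDLL('leap.dylib')

leap_controller = LEAP.leap_controller
leap_controller.restype = c_void_p             # was res_type: silently ignored
leap_controller.argtypes = []

leap_controller_dispose = LEAP.leap_controller_dispose
leap_controller_dispose.restype = None
leap_controller_dispose.argtypes = [c_void_p]  # was argtype: silently ignored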
