What are the closest concepts in Python to namespace and using statements in C++?
The closest equivalent to the namespace keyword found in C++ is the Implicit Namespace Packages facility described in PEP 420 and introduced in Python 3.3. It allows modules in multiple locations to be combined into a single, unified namespace rather than forcing the import of the first valid candidate found on sys.path.
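As a rough sketch of how that looks in practice (the directory names here are hypothetical):

# Layout: neither part1/ns/ nor part2/ns/ contains an __init__.py,
# so "ns" becomes an implicit namespace package spanning both trees.
#
#   part1/ns/mod_a.py
#   part2/ns/mod_b.py

import sys
sys.path.extend(["part1", "part2"])

from ns import mod_a   # found under part1/ns/
from ns import mod_b   # found under part2/ns/ -- same logical package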
There is no direct equivalent of using; importing specific names from a module binds just those names in the current scope.
There isn't really an analogue. Consider this simple header:
// a.h
namespace ns {
    struct A { /* ... */ };
    struct B { /* ... */ };
}
If we were to do this:
#include "a.h"
using ns::A;
The point of that code is to be able to write A unqualified (as opposed to having to write ns::A). Now, you might consider the Python equivalent to be:
from a import A
But regardless of the using declaration, the entire a.h header is still included and compiled, so we can still write ns::B, whereas in the Python version, a.B would not be visible.
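To spell out the Python side (assuming a module a.py that defines classes A and B):

from a import A

A()    # fine: A was bound in the current scope by the import
a.B    # NameError: the name "a" itself was never bound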
The more expansive version:
using namespace ns;
definitely has no Python analogue either, since it brings in all names from namespace ns throughout the code-base - and namespaces can be reopened, so that set of names is open-ended. The most common thing I see beginner C++ programmers do is:
#include <vector>
#include <map>
#include <algorithm>
using namespace std; // bring in EVERYTHING
That one line is kind of equivalent to:
from vector import *
from map import *
from algorithm import *
at least in what it does, but then it only actually brings in what's in namespace std - which isn't necessarily everything.
I'd like a simple example of exporting a function from a C++ Windows DLL.
I'd like to see the header, the .cpp file, and the .def file (if absolutely required).
I'd like the exported name to be undecorated. I'd like to use the most standard calling convention (__stdcall?). I'd like to use __declspec(dllexport) and not have to use a .def file.
For example:
//header
extern "C"
{
__declspec(dllexport) int __stdcall foo(long bar);
}
//cpp
int __stdcall foo(long bar)
{
return 0;
}
I'm trying to avoid having the linker add underscores and/or numbers (byte counts?) to the name.
I'm OK with not supporting dllimport and dllexport using the same header. I don't want any information about exporting C++ class methods, just C-style global functions.
UPDATE
Not including the calling convention (and using extern "C") gives me the export names as I like, but what does that mean? Is whatever default calling convention I'm getting what pinvoke (.NET), declare (vb6), and GetProcAddress would expect? (I guess for GetProcAddress it would depend on the function pointer the caller created).
I want this DLL to be used without a header file, so I don't really need a lot of the fancy #defines to make the header usable by a caller.
I'm OK with an answer being that I have to use a *.def file.
If you want plain C exports, use a C project, not C++. C++ DLLs rely on name-mangling for all the C++isms (namespaces etc.). You can compile your code as C by going into your project settings under C/C++ -> Advanced; there is an option "Compile As" which corresponds to the compiler switches /TP and /TC.
If you still want to use C++ to write the internals of your lib but export some functions unmangled for use outside C++, see the second section below.
Exporting/Importing DLL Libs in VC++
What you really want to do is define a conditional macro in a header that will be included in all of the source files in your DLL project:
#ifdef LIBRARY_EXPORTS
# define LIBRARY_API __declspec(dllexport)
#else
# define LIBRARY_API __declspec(dllimport)
#endif
Then on a function that you want to be exported you use LIBRARY_API:
LIBRARY_API int GetCoolInteger();
In your library build project, add a define for LIBRARY_EXPORTS; this will cause your functions to be exported when building your DLL.
Since LIBRARY_EXPORTS will not be defined in a project consuming the DLL, when that project includes the header file of your library all of the functions will be imported instead.
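For illustration, the consuming side then looks like ordinary usage; a minimal sketch, assuming the header above is called library.h and declares GetCoolInteger as shown:

#include "library.h"  // LIBRARY_EXPORTS is not defined in the consuming
                      // project, so LIBRARY_API expands to dllimport

int main() {
    return GetCoolInteger();  // resolved through the DLL's import library
}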
If your library is to be cross-platform you can define LIBRARY_API as nothing when not on Windows:
#ifdef _WIN32
# ifdef LIBRARY_EXPORTS
# define LIBRARY_API __declspec(dllexport)
# else
# define LIBRARY_API __declspec(dllimport)
# endif
#else
# define LIBRARY_API
#endif
When using dllexport/dllimport you do not need to use DEF files; if you use DEF files you do not need dllexport/dllimport. The two methods accomplish the same task in different ways; I believe dllexport/dllimport is the recommended method of the two.
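For comparison, the DEF-file route looks roughly like this (a sketch; the library name and export are placeholders matching the example above):

; library.def -- hypothetical module-definition file passed to the linker
LIBRARY   library
EXPORTS
    GetCoolInteger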
Exporting unmangled functions from a C++ DLL for LoadLibrary/PInvoke
If you need this to use LoadLibrary and GetProcAddress, or maybe importing from another language (e.g. PInvoke from .NET, or FFI in Python/R etc.), you can use extern "C" inline with your dllexport to tell the C++ compiler not to mangle the names. And since we are using GetProcAddress instead of dllimport we don't need to do the ifdef dance from above, just a simple dllexport:
The Code:
#define EXTERN_DLL_EXPORT extern "C" __declspec(dllexport)

EXTERN_DLL_EXPORT int getEngineVersion() {
    return 1;
}

EXTERN_DLL_EXPORT void registerPlugin(Kernel &K) {
    K.getGraphicsServer().addGraphicsDriver(
        auto_ptr<GraphicsServer::GraphicsDriver>(new OpenGLGraphicsDriver())
    );
}
And here's what the exports look like with Dumpbin /exports:
Dump of file opengl_plugin.dll
File Type: DLL
Section contains the following exports for opengl_plugin.dll
00000000 characteristics
49866068 time date stamp Sun Feb 01 19:54:32 2009
0.00 version
1 ordinal base
2 number of functions
2 number of names
ordinal hint RVA name
1 0 0001110E getEngineVersion = #ILT+265(_getEngineVersion)
2 1 00011028 registerPlugin = #ILT+35(_registerPlugin)
So this code works fine:
m_hDLL = ::LoadLibrary(_T("opengl_plugin.dll"));
m_pfnGetEngineVersion = reinterpret_cast<fnGetEngineVersion *>(
    ::GetProcAddress(m_hDLL, "getEngineVersion")
);
m_pfnRegisterPlugin = reinterpret_cast<fnRegisterPlugin *>(
    ::GetProcAddress(m_hDLL, "registerPlugin")
);
For C++:
I just faced the same issue and I think it is worth mentioning a problem that comes up when one uses both __stdcall (or WINAPI) and extern "C":
As you know, extern "C" removes the decoration, so that instead of:
__declspec(dllexport) int Test(void) --> dumpbin : ?Test@@YAHXZ
you obtain a symbol name undecorated:
extern "C" __declspec(dllexport) int Test(void) --> dumpbin : Test
However __stdcall (= the WINAPI macro, which changes the calling convention) also decorates names, so that if we use both we obtain:
extern "C" __declspec(dllexport) int WINAPI Test(void) --> dumpbin : _Test@0
and the benefit of extern "C" is lost, because the symbol is decorated (with a leading underscore and the number of argument bytes).
Note that this only occurs for the x86 architecture, because
the __stdcall convention is ignored on x64 (MSDN: on x64 architectures, by convention, arguments are passed in registers when possible, and subsequent arguments are passed on the stack).
This is particularly tricky if you are targeting both x86 and x64 platforms.
Two solutions
Use a definition file. But this forces you to maintain the state of the def file.
The simplest way: define the macro (see MSDN):
#define EXPORT comment(linker, "/EXPORT:" __FUNCTION__ "=" __FUNCDNAME__)
and then include the following pragma in the function body:
#pragma EXPORT
Full Example :
int WINAPI Test(void)
{
#pragma EXPORT
    return 1;
}
This will export the function undecorated for both x86 and x64 targets while preserving the __stdcall convention for x86. The __declspec(dllexport) is not required in this case.
I had exactly the same problem; my solution was to use a module definition file (.def) instead of __declspec(dllexport) to define the exports (http://msdn.microsoft.com/en-us/library/d91k01sh.aspx). I have no idea why this works, but it does.
I think __declspec(naked) might get what you want, but it also prevents the compiler from generating the stack management code for the function. extern "C" causes C-style name decoration; remove that and that should get rid of your underscores. The linker doesn't add the underscores, the compiler does. __stdcall causes the argument stack size to be appended.
For more, see:
http://en.wikipedia.org/wiki/X86_calling_conventions
http://www.codeproject.com/KB/cpp/calling_conventions_demystified.aspx
The bigger question is why do you want to do that? What's wrong with the mangled names?
I have experience working with both Boost Multiprecision and with Python's mpmath, separately.
When it comes to making the two communicate (for example, to create Python extensions in C++), my attempts have always involved some sort of wasteful float-to-string and string-to-float conversion.
My question is: is it possible to make both communicate in a more performant (and elegant) way? By that I mean: is there a way to have C++ Boost Multiprecision load from and export to a Python mpmath.mpf object directly, in the same vein as mp++ does via pybind11?
I have been searching for this for quite a bit. The only other similar question I found was about just exporting from Boost Multiprecision to Python (in general) using pybind11, not to a mpmath object directly. And in that question, the OP ended up using the same approach I am trying to avoid (that is, converting from/to strings when communicating from/to C++ and Python).
This only partially answers your question, because the direct answer is: no, it is not possible in a clean way without a wasteful conversion to string. mpmath is a pure-Python library without any parts written in C or C++, so even if you try to skip the "wasteful conversion" by relying on some sort of binary compatibility, your code will be very fragile: it will break every time some Python or mpmath internals change ever so slightly.
However, I needed exactly the same thing, so I settled on an automated conversion registered via boost::python which checks and converts using strings. Inside Python you also create mpmath.mpf objects from strings, so it is very much the same thing, except the code below is faster because the conversion is done in C++.
So here's what works for me:
#include <boost/python.hpp>
#include <iomanip>   // for std::setprecision
#include <iostream>
#include <limits>
#include <sstream>
#include <boost/math/constants/constants.hpp>
#include <boost/multiprecision/cpp_bin_float.hpp>
namespace py = ::boost::python;
using Prec80 = boost::multiprecision::number<boost::multiprecision::cpp_bin_float<80>>;
template<typename ArbitraryReal>
struct ArbitraryReal_to_python {
    static PyObject* convert(const ArbitraryReal& val){
        std::stringstream ss{};
        ss << std::setprecision(std::numeric_limits<ArbitraryReal>::digits10 + 1) << val;
        py::object mpmath = py::import("mpmath");
        mpmath.attr("mp").attr("dps") = int(std::numeric_limits<ArbitraryReal>::digits10 + 1);
        py::object result = mpmath.attr("mpf")(ss.str());
        return boost::python::incref(result.ptr());
    }
};
template<typename ArbitraryReal>
struct ArbitraryReal_from_python {
    ArbitraryReal_from_python(){
        boost::python::converter::registry::push_back(&convertible, &construct, boost::python::type_id<ArbitraryReal>());
    }
    static void* convertible(PyObject* obj_ptr){
        // Accept whatever Python is able to convert into a float.
        // This works with mpmath numbers. However, if you want to accept
        // strings as numbers, this check needs to be a little longer to
        // verify that the string is a valid number.
        PyFloat_AsDouble(obj_ptr);
        return (PyErr_Occurred() == nullptr) ? obj_ptr : nullptr;
    }
    static void construct(PyObject* obj_ptr, boost::python::converter::rvalue_from_python_stage1_data* data){
        std::istringstream ss{ py::call_method<std::string>(obj_ptr, "__str__") };
        void* storage = ((boost::python::converter::rvalue_from_python_storage<ArbitraryReal>*)(data))->storage.bytes;
        new (storage) ArbitraryReal;
        ArbitraryReal* val = (ArbitraryReal*)storage;
        ss >> *val;
        data->convertible = storage;
    }
};
struct Var
{
    Prec80 value{"-71.23"};
    Prec80 get() const { return value; }
    void set(Prec80 val) { value = val; }
};

BOOST_PYTHON_MODULE(pysmall)
{
    ArbitraryReal_from_python<Prec80>();
    py::to_python_converter<Prec80, ArbitraryReal_to_python<Prec80>>();
    py::class_<Var>("Var")
        .add_property("val", &Var::get, &Var::set);
}
Now you compile this code with this command:
g++ -O1 -g pysmall.cpp -o pysmall.so -std=gnu++17 -fPIC -shared -I/usr/include/python3.7m/ -lboost_python37 -lpython3.7m -Wl,-soname,"pysmall.so"
And here is an example python session:
In [1]: import pysmall
In [2]: a=pysmall.Var()
In [3]: a.val
Out[3]: mpf('-71.2299999999999999999999999999999999999999999999999999999999999999999999999999997072')
In [4]: a.val=123.12
In [5]: a.val
Out[5]: mpf('123.120000000000000000000000000000000000000000000000000000000000000000000000000000003')
The C++ code does not care whether mpmath is already imported in Python. If it is, it obtains the existing library handle; if it is not, it imports it.
If you find any room for improvement in this snippet please let me know!
Here's a couple of useful references when I was writing this:
https://misspent.wordpress.com/2009/09/27/how-to-write-boost-python-converters/
https://github.com/bluescarni/mppp/blob/master/include/mp%2B%2B/extra/pybind11.hpp (but I didn't want to use pybind11, just boost::python)
EDIT: I have now finished implementing this in YADE; it works with the EIGEN and CGAL libraries. The part concerning this question is in the file ToFromPythonConverter.hpp.
I have a swigged C++ class container, MyContainer, holding objects of type MyObject, also a C++ class.
The following is the C++ header code (freemenot.h)
#ifndef freemenotH
#define freemenotH
#include <vector>
#include <string>
using std::string;

class MyObject
{
public:
    MyObject(const string& lbl);
    ~MyObject();
    string getLabel();

private:
    string label;
};

class MyContainer
{
public:
    MyContainer();
    ~MyContainer();
    void addObject(MyObject* o);
    MyObject* getObject(unsigned int t);
    int getNrOfObjects();

private:
    std::vector<MyObject*> mObjects;
};
#endif
and this is the source (freemenot.cpp)
#include "freemenot.h"
#include <iostream>
using namespace std;
/* MyObject source */
MyObject::MyObject(const string& lbl)
:
label(lbl)
{ cout<<"In object ctor"<<endl; }
MyObject::~MyObject() { cout<<"In object dtor"<<endl; }
string MyObject::getLabel() { return label; }
/* MyContainer source */
MyContainer::MyContainer() { cout<<"In container ctor"<<endl; }
MyContainer::~MyContainer()
{
cout<<"In container dtor"<<endl;
for(unsigned int i = 0; i < mObjects.size(); i++)
{
delete mObjects[i];
}
}
int MyContainer::getNrOfObjects() { return mObjects.size(); }
void MyContainer::addObject(MyObject* o) { mObjects.push_back(o); }
MyObject* MyContainer::getObject(unsigned int i) { return mObjects[i]; }
Observe that the objects are stored as raw pointers in the vector. The class is designed this way, and the container is thus responsible for freeing the objects in its destructor, as done in the destructor's for loop.
In C++ code, like below, an object o1 is added to the container c, which is returned to client code
MyContainer* getAContainerWithSomeObjects()
{
    MyContainer* c = new MyContainer();
    MyObject* o1 = new MyObject("some label");
    c->addObject(o1);
    return c;
}
The returned container owns its objects, and is responsible for de-allocating them when done. In C++, access to the container's objects is fine after the function above exits.
Exposing the above classes to python, using Swig, will need an interface file. This interface file looks like this
%module freemenot
%{ #include "freemenot.h" %}
%include "std_string.i"
//Expose to Python
%include "freemenot.h"
And to generate a Python module, using CMake, the following CMake script was used.
cmake_minimum_required(VERSION 2.8)
project(freemenot)
find_package(SWIG REQUIRED)
include(UseSWIG)
find_package(PythonInterp)
find_package(PythonLibs)
get_filename_component(PYTHON_LIB_FOLDER ${PYTHON_LIBRARIES} DIRECTORY CACHE)
message("Python lib folder: " ${PYTHON_LIB_FOLDER})
message("Python include folder: " ${PYTHON_INCLUDE_DIRS})
message("Python libraries: " ${PYTHON_LIBRARIES})
set(PyModule "freemenot")
include_directories(
${PYTHON_INCLUDE_PATH}
${CMAKE_CURRENT_SOURCE_DIR}
)
link_directories( ${PYTHON_LIB_FOLDER})
set(CMAKE_MODULE_LINKER_FLAGS ${CMAKE_CURRENT_SOURCE_DIR}/${PyModule}.def)
set_source_files_properties(${PyModule}.i PROPERTIES CPLUSPLUS ON)
set_source_files_properties(${PyModule}.i PROPERTIES SWIG_FLAGS "-threads")
SWIG_ADD_LIBRARY(${PyModule}
MODULE LANGUAGE python
SOURCES ${PyModule}.i freemenot.cpp)
SWIG_LINK_LIBRARIES (${PyModule} ${PYTHON_LIB_FOLDER}/Python37_CG.lib )
# INSTALL PYTHON BINDINGS
# Get the python site packages directory by invoking python
execute_process(COMMAND python -c "import site; print(site.getsitepackages()[0])" OUTPUT_VARIABLE PYTHON_SITE_PACKAGES OUTPUT_STRIP_TRAILING_WHITESPACE)
message("PYTHON_SITE_PACKAGES = ${PYTHON_SITE_PACKAGES}")
install(
TARGETS _${PyModule}
DESTINATION ${PYTHON_SITE_PACKAGES})
install(
FILES ${CMAKE_CURRENT_BINARY_DIR}/${PyModule}.py
DESTINATION ${PYTHON_SITE_PACKAGES}
)
Generating the make files using CMake, and compiling using Borland's bcc32 compiler, a Python module (freemenot) is generated and installed into a valid Python 3 site-packages folder.
Then, in Python, the following script can be used to illustrate the problem:
import freemenot as fmn

def getContainer():
    c = fmn.MyContainer()
    o1 = fmn.MyObject("This is a label")
    o1.thisown = 0
    c.addObject(o1)
    return c

c = getContainer()
print(c.getNrOfObjects())

# If the thisown flag for objects in the getContainer function
# is equal to 1, the following call returns an undefined object.
# If the flag is equal to 0, the following call returns a valid object.
a = c.getObject(0)
print(a.getLabel())
This Python code may look fine, but it doesn't work as expected. The problem is that when the function getContainer() returns, the memory for object o1 is freed, unless the thisown flag is set to zero. Accessing the object after this point, through the returned container, will end in disaster. Observe that there is nothing wrong with this per se, as this is how Python's garbage collection works.
For the above use case, being able to set the Python object's thisown flag inside the addObject function would render the C++ objects usable in Python.
Having the user set this flag is not a good solution.
One could also extend the Python class with an "addObject" function and modify the thisown flag inside it, thereby hiding this memory trick from the user.
Question is, how to get Swig to do this, without extending the class?
I'm looking for using a typemap, or perhaps %pythoncode, but I seem not able to find a good working example.
The above code is to be used by, and passed to, a C++ program that is invoking the Python interpreter. The C++ program is responsible for managing the memory allocated in the Python function, even after Py_Finalize().
The above code can be downloaded from github https://github.com/TotteKarlsson/miniprojects
There are a number of different ways you could solve this problem, so I'll try and explain them each in turn, building on a few things along the way. Hopefully this is useful as a view into the options and innards of SWIG even if you only really need the first example.
Add Python code to modify thisown directly
The solution most like what you proposed relies on using SWIG's %pythonprepend directive to add some extra Python code. You can target it based on the C++ declaration of the overload you care about, e.g.:
%module freemenot
%{ #include "freemenot.h" %}
%include "std_string.i"
%pythonprepend MyContainer::addObject(MyObject*) %{
# mess with thisown
print('thisown was: %d' % args[0].thisown)
args[0].thisown = 0
%}
//Expose to Python
%include "freemenot.h"
The only notable quirk here comes from the fact that the arguments are passed in using *args instead of named arguments, so we have to access the argument by position.
There are several other places/methods to inject extra Python code (provided you're not using -builtin) in the SWIG Python documentation and monkey patching is always an option too.
Use Python's C API to tweak thisown
The next possible option here is to use a typemap that calls the Python C API to perform the equivalent functionality. In this instance I've matched on the argument type and argument name, but that does mean the typemap would get applied to every function receiving a MyObject * named o. (If that would over-match, the easiest solution is to make the argument names in the headers describe the intended semantics, which has the side benefit of making IDEs and documentation clearer.)
%module freemenot
%{ #include "freemenot.h" %}
%include "std_string.i"
%typemap(in) MyObject *o {
    PyObject_SetAttrString($input, "thisown", PyInt_FromLong(0)); // As above, but via the C API
    $typemap(in, MyObject*); // use the default typemap
}
//Expose to Python
%include "freemenot.h"
The most noteworthy point about this example other than the typemap matching is the use of $typemap here to 'paste' another typemap, specifically the default one for MyObject* into our own typemap. It's worth having a look inside the generated wrapper file at a before/after example of what this ends up looking like.
Use SWIG runtime to get at SwigPyObject struct's own member directly
Since we're already writing C++ instead of going via the setattr in the Python code we can adapt this typemap to use more of SWIG's internals and skip a round-trip from C to Python and back to C again.
Internally within SWIG there's a struct that contains the details of each instance, including the ownership, type etc.
We could just cast from PyObject* to SwigPyObject* ourselves directly, but that would require writing error handling/type checking (is this PyObject even a SWIG one?) ourselves and become dependent on the details of the various differing ways SWIG can produce Python interfaces. Instead there's a single function we can call which just handles all that for us, so we can write our typemap like this now:
%module freemenot
%{ #include "freemenot.h" %}
%include "std_string.i"
%typemap(in) MyObject *o {
    // TODO: handle NULL pointer still
    SWIG_Python_GetSwigThis($input)->own = 0; // Safely cast $input from PyObject* to SwigPyObject*
    $typemap(in, MyObject*); // use the default typemap
}
//Expose to Python
%include "freemenot.h"
This is just an evolution of the previous answer really, but implemented purely in the SWIG C runtime.
Copy construct a new instance before adding
There are other ways to approach this kind of ownership problem. Firstly in this specific instance your MyContainer assumes it can always call delete on every instance it stores (and hence owns in these semantics).
The motivating example for this would be if we were also wrapping a function like this:
MyObject *getInstanceOfThing() {
static MyObject a;
return &a;
}
Which introduces a problem with our prior solutions - we set thisown to 0, but here it would already have been 0 and so we still can't legally call delete on the pointer when the container is released.
There's a simple way to deal with this that doesn't require knowing about SWIG proxy internals - assuming MyObject is copy constructable then you can simply make a new instance and be sure that no matter where it came from it's going to be legal for the container to delete it. We can do that by adapting our typemap a little:
%module freemenot
%{ #include "freemenot.h" %}
%include "std_string.i"
%typemap(in) MyObject *o {
    $typemap(in, MyObject*); // use the default typemap as before
    $1 = new $*1_type(*$1);  // but afterwards call copy-ctor
}
//Expose to Python
%include "freemenot.h"
The point to note here is the use of several more SWIG features that let us know the type of the typemap inputs - $*1_type is the type of the typemap argument dereferenced once. We could have just written MyObject here, as that's what it resolves to, but this lets you handle things like templates, if your container is really a template, or re-use the typemap in other similar containers with %apply.
The thing to watch for now is leaks: if you had a C++ function that deliberately returned an instance without thisown being set, on the assumption that the container would take ownership, that assumption would no longer hold.
Give the container a chance to manage ownership
Finally, one of the other techniques I like using a lot isn't directly possible here as currently posed, but is worth mentioning for posterity. If you get the chance to store some additional data alongside each instance in the container, you can call Py_INCREF and retain a reference to the underlying PyObject* no matter where it came from. Provided you then get a callback at destruction time, you can also call Py_DECREF and force the Python runtime to keep the object alive as long as the container. A rough sketch of the idea follows below.
You can also do that even when it's not possible to keep a 1-1 MyObject*/PyObject* pairing alive, by keeping a shadow container alive somewhere as well. That can be hard to do unless you're willing to add another object into the container, subclass it, or can be very certain that the initial Python instance of the container will always live long enough.
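This hypothetical container co-owns the Python proxies of its elements; names like PyAwareContainer are invented for illustration and are not part of SWIG or the posted project (assume freemenot.h from the question is available):

#include <Python.h>
#include <vector>
#include "freemenot.h"  // for MyObject, from the question

// Hypothetical container that keeps the PyObject* of every element
// alive for as long as the container itself lives.
class PyAwareContainer
{
public:
    void addObject(MyObject* o, PyObject* proxy)
    {
        Py_INCREF(proxy);           // keep the Python-side proxy alive
        mObjects.push_back(o);
        mProxies.push_back(proxy);
    }
    ~PyAwareContainer()             // the destruction-time callback
    {
        for (PyObject* p : mProxies)
            Py_DECREF(p);           // release the proxies; Python frees each
                                    // MyObject it still owns via thisown
    }
private:
    std::vector<MyObject*> mObjects;
    std::vector<PyObject*> mProxies;
};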
You're looking for %newobject. Here's a small example:
%module test
%newobject create;
%delobject destroy;
%inline %{
#include <iostream>

struct Test
{
    Test() { std::cout << "create" << std::endl; }
    ~Test() { std::cout << "destroy" << std::endl; }
};

Test* create() { return new Test; }
void destroy(Test* t) { delete t; }
%}
Use:
>>> import test
>>> t1 = test.create() # create a test object
create
>>> t2 = test.Test() # don't really need a create function :)
create
>>> t3 = test.create() # and another.
create
>>> test.destroy(t2) # explicitly destroy one
destroy
>>>
>>>
>>>
>>> ^Z # exit Python and the other two get destroyed.
destroy
destroy
I just wanted thisown to be set to zero in the constructor. I did it in two ways:
I simply added a one-line sed statement to my makefile to append 'self.thisown = 0' at the end of the generated __init__() function of my class.
Using %pythonappend. I figured out two caveats: (a) the %pythonappend statement has to be placed before the C++ class definition; (b) C++ constructor overloads do not matter.
%pythonappend MyApp::MyApp() %{
self.thisown = 0
%}
%include <MyApp.hpp>
This is very related to this question
Regardless of whether or not this is good coding practice, I have come across code that looks like this:
test.hh
#include <vector>
using std::vector;

class Test
{
public:
    vector<double> data;
};
I am trying to swig this using swig3.0 using the following interface file
test.i
%module test_swig
%include "std_vector.i"
namespace std {
    %template(VectorDouble) vector<double>;
};
%{
#include "test.hh"
%}
%naturalvar Test::data;
%include "test.hh"
And the following test code
test.py
import test_swig as test  # module name as declared by %module in test.i

t = test.Test()
print(t)
a = [1, 2, 3]
t.data = a # fails
doing so gives me the following error
in method 'Test_data_set', argument 2 of type 'vector< double >'
This can be fixed by either changing the using std::vector in test.hh to using namespace std or by removing using std::vector and changing vector<double> to std::vector<double>. This is not what I want.
The problem is that I was given this code as is. I am not allowed to make changes, but I am supposed to still make everything available in python via SWIG. What's going on here?
Thanks in advance.
To me, this looks like SWIG does not support the using std::vector; statement correctly. I think it's a SWIG bug. I can think of the following workarounds:
Add using namespace std; to the SWIG interface file (this will only affect the way wrappers are created; the using statement will not enter C++ code) - see the sketch after this list.
Add #define vector std::vector to the SWIG interface file (this will only work if vector is never used as std::vector)
Copy the declarations from the header file to the SWIG interface file, and change vector to std::vector. This will cause SWIG to generate correct wrappers, and again will not affect the C++ library code.
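A minimal sketch of the first workaround applied to the test.i from the question (an untested assumption about placement; the using directive must appear before the %include of the header):

%module test_swig
%include "std_vector.i"

namespace std {
    %template(VectorDouble) vector<double>;
};

%{
#include "test.hh"
%}

using namespace std;  // affects only SWIG's parsing of the header below

%naturalvar Test::data;
%include "test.hh"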
How does module loading work in CPython under the hood? Especially, how does the dynamic loading of extensions written in C work? Where can I learn about this?
I find the source code itself rather overwhelming. I can see that trusty ol' dlopen() and friends is used on systems that support it but without any sense of the bigger picture it would take a long time to figure this out from the source code.
An enormous amount could be written on this topic but as far as I can tell, almost nothing has been — the abundance of webpages describing the Python language itself makes this difficult to search for. A great answer would provide a reasonably brief overview and references to resources where I can learn more.
I'm mostly concerned with how this works on Unix-like systems simply because that's what I know but I am interested in if the process is similar elsewhere.
To be more specific (but also risk assuming too much), how does CPython use the module methods table and initialization function to "make sense" of dynamically loaded C?
TLDR short version bolded.
References to the Python source code are based on version 2.7.6.
Python imports most extensions written in C through dynamic loading. Dynamic loading is an esoteric topic that isn't well documented but it's an absolute prerequisite. Before explaining how Python uses it, I must briefly explain what it is and why Python uses it.
Historically, C extensions to Python were statically linked against the Python interpreter itself. This required Python users to recompile the interpreter every time they wanted to use a new module written in C. As you can imagine, and as Guido van Rossum describes, this became impractical as the community grew. Today, most Python users never compile the interpreter at all. We simply "pip install module" and then "import module", even if that module contains compiled C code.
Linking is what allows us to make function calls across compiled units of code. Dynamic loading solves the problem of linking code when the decision for what to link is made at runtime. That is, it allows a running program to interface with the linker and tell the linker what it wants to link with. For the Python interpreter to import modules with C code, this is what's called for. Writing code that makes this decision at runtime is quite uncommon and most programmers would be surprised that it's possible. Simply put, a C function has an address, it expects you to put certain data in certain places, and it promises to have put certain data in certain places upon return. If you know the secret handshake, you can call it.
The challenge with dynamic loading is that it's incumbent upon the programmer to get the handshake right and there are no safety checks. At least, they're not provided for us. Normally, if we try to call a function name with an incorrect signature we get a compile or linker error. With dynamic loading we ask the linker for a function by name (a "symbol") at runtime. The linker can tell us if that name was found but it can’t tell us how to call that function. It just gives us an address - a void pointer. We can try to cast to a function pointer of some sort but it is solely up to the programmer to get the cast correct. If we get the function signature wrong in our cast, it’s too late for the compiler or linker to warn us. We’ll likely get a segfault after the program careens out of control and ends up accessing memory inappropriately. Programs using dynamic loading must rely on pre-arranged conventions and information gathered at runtime to make proper function calls. Here's a small example before we tackle the Python interpreter.
File 1: main.c
/* gcc-4.8 -o main main.c -ldl */
#include <dlfcn.h> /* key include, also in Python/dynload_shlib.c */

/* used for cast to pointer to function that takes no args and returns nothing */
typedef void (say_hi_type)(void);

int main(void) {
    /* get a handle to the shared library dyload1.so */
    void* handle1 = dlopen("./dyload1.so", RTLD_LAZY);
    /* acquire function ptr through string with name, cast to function ptr */
    say_hi_type* say_hi1_ptr = (say_hi_type*)dlsym(handle1, "say_hi1");
    /* dereference pointer and call function */
    (*say_hi1_ptr)();
    return 0;
}
/* error checking normally follows both dlopen() and dlsym() */
File 2: dyload1.c
/* gcc-4.8 -o dyload1.so dyload1.c -shared -fpic */
/* compile as C, C++ does name mangling -- changes function names */
#include <stdio.h>

void say_hi1() {
    puts("dy1: hi");
}
These files are compiled and linked separately but main.c knows to go looking for ./dyload1.so at runtime. The code in main assumes that dyload1.so will have a symbol "say_hi1". It gets a handle to dyload1.so's symbols with dlopen(), gets the address of a symbol using dlsym(), assumes it is a function that takes no arguments and returns nothing, and calls it. It has no way to know for sure what "say_hi1" is -- a prior agreement is all that keeps us from segfaulting.
What I've shown above is the dlopen() family of functions. Python is deployed on many platforms, not all of which provide dlopen() but most have similar dynamic loading mechanisms. Python achieves portable dynamic loading by wrapping the dynamic loading mechanisms of several operating systems in a common interface.
This comment in Python/importdl.c summarizes the strategy.
/* ./configure sets HAVE_DYNAMIC_LOADING if dynamic loading of modules is
supported on this platform. configure will then compile and link in one
of the dynload_*.c files, as appropriate. We will call a function in
those modules to get a function pointer to the module's init function.
*/
As referenced, in Python 2.7.6 we have these dynload*.c files:
Python/dynload_aix.c Python/dynload_beos.c Python/dynload_hpux.c
Python/dynload_os2.c Python/dynload_stub.c Python/dynload_atheos.c
Python/dynload_dl.c Python/dynload_next.c Python/dynload_shlib.c
Python/dynload_win.c
They each define a function with this signature:
dl_funcptr _PyImport_GetDynLoadFunc(const char *fqname, const char *shortname,
const char *pathname, FILE *fp)
These functions contain the different dynamic loading mechanisms for different operating systems. The mechanism for dynamic loading on Mac OS newer than 10.2 and most Unix(-like) systems is dlopen(), which is called in Python/dynload_shlib.c.
Skimming over dynload_win.c, the analogous function for Windows is LoadLibraryEx(). Its use looks very similar.
At the bottom of Python/dynload_shlib.c you can see the actual call to dlopen() and to dlsym().
handle = dlopen(pathname, dlopenflags);
/* error handling */
p = (dl_funcptr) dlsym(handle, funcname);
return p;
Right before this, Python composes the string with the function name it will look for. The module name is in the shortname variable.
PyOS_snprintf(funcname, sizeof(funcname),
LEAD_UNDERSCORE "init%.200s", shortname);
Python simply hopes there's a function called init{modulename} and asks the linker for it. Starting here, Python relies on a small set of conventions to make dynamic loading of C code possible and reliable.
Let's look at what C extensions must do to fulfill the contract that makes the above call to dlsym() work. For compiled C Python modules, the first convention that allows Python to access the compiled C code is the init{shared_library_filename}() function. For a module named spam compiled as a shared library named "spam.so", we might provide this initspam() function:
PyMODINIT_FUNC
initspam(void)
{
    PyObject *m;

    m = Py_InitModule("spam", SpamMethods);
    if (m == NULL)
        return;
}
If the name of the init function does not match the filename, the Python interpreter cannot know how to find it. For example, renaming spam.so to notspam.so and trying to import gives the following.
>>> import spam
ImportError: No module named spam
>>> import notspam
ImportError: dynamic module does not define init function (initnotspam)
If the naming convention is violated there's simply no telling if the shared library even contains an initialization function.
The second key convention is that once called, the init function is responsible for initializing itself by calling Py_InitModule. This call adds the module to a "dictionary"/hash table kept by the interpreter that maps module name to module data. It also registers the C functions in the method table. After calling Py_InitModule, modules may initialize themselves in other ways such as adding objects. (Ex: the SpamError object in the Python C API tutorial). (Py_InitModule is actually a macro that creates the real init call but with some info baked in like what version of Python our compiled C extension used.)
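For example, a version of initspam() that also adds an exception object might look roughly like this (a sketch along the lines of the spam example in the Python 2 C API tutorial):

static PyObject *SpamError;  /* module-level exception object */

PyMODINIT_FUNC
initspam(void)
{
    PyObject *m;

    m = Py_InitModule("spam", SpamMethods);
    if (m == NULL)
        return;

    /* create a new exception type and expose it as spam.error */
    SpamError = PyErr_NewException("spam.error", NULL, NULL);
    Py_INCREF(SpamError);
    PyModule_AddObject(m, "error", SpamError);
}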
If the init function has the proper name but does not call Py_InitModule(), we get this:
SystemError: dynamic module not initialized properly
Our methods table happens to be called SpamMethods and looks like this.
static PyMethodDef SpamMethods[] = {
    {"system", spam_system, METH_VARARGS,
     "Execute a shell command."},
    {NULL, NULL, 0, NULL}
};
The method table itself and the function signature contracts it entails is the third and final key convention necessary for Python to make sense of dynamically loaded C. The method table is an array of struct PyMethodDef with a final sentinel entry. A PyMethodDef is defined in Include/methodobject.h as follows.
struct PyMethodDef {
    const char  *ml_name;  /* The name of the built-in function/method */
    PyCFunction  ml_meth;  /* The C function that implements it */
    int          ml_flags; /* Combination of METH_xxx flags, which mostly
                              describe the args expected by the C func */
    const char  *ml_doc;   /* The __doc__ attribute, or NULL */
};
The crucial part here is that the second member is a PyCFunction. We passed in the address of a function, so what is a PyCFunction? It's a typedef, also in Include/methodobject.h
typedef PyObject *(*PyCFunction)(PyObject *, PyObject *);
PyCFunction is a typedef for a pointer to a function that returns a pointer to a PyObject and that takes for arguments two pointers to PyObjects. As a lemma to convention three, C functions registered with the method table all have the same signature.
Python circumvents much of the difficulty in dynamic loading by using a limited set of C function signatures. One signature in particular is used for most C functions. Pointers to C functions that take additional arguments can be "snuck in" by casting to PyCFunction. (See the keywdarg_parrot example in the Python C API tutorial.) Even C functions that back Python functions taking no arguments in Python will take two arguments in C (shown below). All functions are also expected to return something (which may just be the None object). Functions that take multiple positional arguments in Python have to unpack those arguments from a single object in C.
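To make that last point concrete, here is roughly what a conforming C function looks like; this follows the spam_system example from the Python 2 C API tutorial (a sketch, not copied verbatim):

static PyObject *
spam_system(PyObject *self, PyObject *args)
{
    const char *command;
    int sts;

    /* the Python-level positional arguments arrive packed in the
       "args" tuple; "s" unpacks one of them as a C string */
    if (!PyArg_ParseTuple(args, "s", &command))
        return NULL; /* NULL return signals an exception to the interpreter */
    sts = system(command);
    return Py_BuildValue("i", sts); /* always return an object */
}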
That's how the data for interfacing with dynamically loaded C functions is acquired and stored. Finally, here's an example of how that data is used.
The context here is that we're evaluating Python "opcodes", instruction by instruction, and we've hit a function call opcode. (see https://docs.python.org/2/library/dis.html. It's worth a skim.) We've determined that the Python function object is backed by a C function. In the code below we check if the function in Python takes no arguments (in Python) and if so, call it (with two arguments in C).
Python/ceval.c.
if (flags & (METH_NOARGS | METH_O)) {
    PyCFunction meth = PyCFunction_GET_FUNCTION(func);
    PyObject *self = PyCFunction_GET_SELF(func);
    if (flags & METH_NOARGS && na == 0) {
        C_TRACE(x, (*meth)(self, NULL));
    }
It does of course take arguments in C - exactly two. Since everything is an object in Python, it gets a self argument. Above, you can see that meth is assigned a function pointer, which is then dereferenced and called. The return value ends up in x.