How to Implement Python Interfaces for C++ Libraries

What is the best/standard way to create Python interfaces for C++ libraries?
I know this question has been asked on here before but that was in 2008 and things may/likely have changed since then.
I've looked around and tested a few different methods but can't decide which is best. I've tried SWIG, ctypes, and cppyy so far and think that cppyy is by far the easiest/fastest to implement. I've seen recommendations for SWIG, but it took a very long time to get working and the results were not impressive. Is there a current standard? Why do people recommend SWIG so much while I hear nothing of cppyy? Thank you.

SWIG has been around since February 1996 and supports a range of languages, not just Python. Although what is now cppyy started in February 2003, as RootPython, it was always embedded with ROOT (http://root.cern.ch) and not available standalone. A full, easy installation with wheels on PyPI for all three major platforms has only existed since March of this year, and on conda-forge (for Linux & Mac) only for the past two months. So even though it has a long pedigree, within the wider Python world cppyy is really quite fresh, which is why I doubt many folks have heard of it yet, whereas SWIG is the (spiritual) ancestor of them all.
The reason for putting in the effort of making cppyy available is that it offers quite a few features that other binders do not have and would not be easy to add: a compliant C++17 parser (thanks to Clang/LLVM); automatic template instantiations, cross-inheritance, and callbacks, all at run-time (thanks to Cling); and much better performance. It also does not create C extension modules, so you only need to recompile cppyy itself for different versions of Python, but none of the bound code.
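To make that concrete, here is a minimal sketch of the run-time model; the add template is purely illustrative:

    import cppyy

    # Hand C++ to Cling as a string; nothing is compiled ahead of time.
    cppyy.cppdef(r"""
    template<typename T>
    T add(T a, T b) { return a + b; }
    """)

    # Each call instantiates the template for the argument types on demand.
    print(cppyy.gbl.add(1, 2))      # instantiates add<int> at run time
    print(cppyy.gbl.add(1.5, 2.5))  # instantiates add<double> at run time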
Now, to your first question of what is best. Well, it depends on the use case. For example, if you need bindings for more languages than just Python, SWIG is your best bet. If you have lots of templates that you cannot all instantiate at build time, need the performance and scaling, or have a C++ framework with lots of interfaces, then cppyy is hard to beat. If you have modern C++ and do not want any run-time dependency on external libraries, then PyBind11 is where it's at.
These days I can't recommend ctypes. The only real benefit is that it is a builtin module for most Pythons in the wild, but with the advent of PyPI and conda that has become moot. If you want a super-light C binder (not C++, but you can wrap those functions with C helpers), then go for CFFI.
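For a sense of how light CFFI is, here is a minimal ABI-mode sketch; cos and libm are real, but treat the library name as platform-dependent:

    from cffi import FFI

    ffi = FFI()
    ffi.cdef("double cos(double x);")   # declare the C signature by hand
    lib = ffi.dlopen("libm.so.6")       # Linux; use the platform's math library elsewhere
    print(lib.cos(0.0))                 # -> 1.0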
As to your question of whether there is a standard: no, there is no one binder that is best for all use cases. There are quite a few more than the ones you have mentioned, but many of them play in the same space (e.g. SWIG vs. SIP, and PyBind11 vs. Boost.Python) and I wouldn't recommend them over the ones you already tried. I do want to point out AutoWIG, a generator utilizing Clang with PyBind11 or Boost.Python code as output; and Cython, a Python-like language for writing C extension modules, which has some (limited) C++ support. I've always felt that Cython was neither here nor there, but lots of people like it, and it's used extensively in the scientific community and in math-heavy code, so that vouches for its quality.
Now, even though there is no "standard", all binders can convert proxies to PyCapsule objects and rebind them. So although it is a bit clunky at times, you can actually mix binders within one application.
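As a rough sketch of that rebinding step, the raw pointer can be pulled out of a capsule through the C-API via ctypes; the capsule name here is hypothetical, since each binder documents its own:

    import ctypes

    # PyCapsule_GetPointer is part of the CPython C-API, reachable via ctypes.
    ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
    ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

    def capsule_to_address(capsule, name=b"binder.Object"):   # name is hypothetical
        # Returns the raw C++ object address; the receiving binder can
        # then rebind it to one of its own proxy types.
        return ctypes.pythonapi.PyCapsule_GetPointer(capsule, name)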
One final point: CFFI and cppyy (through CFFI's backend) have near-native performance on PyPy. Unfortunately, cppyy isn't as up-to-date on PyPy as it is on CPython (e.g. cross-inheritance is still missing), but it's getting there. The other binders work through the Python C-API, which is fully functional on PyPy, but precludes the JIT from doing its work, with reduced performance as result.
Full disclaimer: I'm the author of cppyy, and these days I only use cppyy, CFFI, and PyBind11 for my binding needs.

Related

Fastest way to interface simple C++ function with Python

I have a C++ function that takes in std::vector<std::vector<double> > X and does some operations on X and outputs std::vector<std::vector<double> > X_mod.
I want to be able to quickly make an interface such that I can pass a Python numpy array into this C++ function, and then have the C++ function return X_mod into Python.
I have briefly looked at Boost, but it seems too complicated for this simple purpose.
Any other suggestions on how to write a quick interface for this?
As suggested in the comments, pybind11 can be used to write C++ bindings for Python; see the Pybind documentation and the Pybind GitHub repo for an example of how to use it.
The reason everyone proposes Pybind over Boost can be found in its README:
The main issue with Boost.Python—and the reason for creating such a similar project—is Boost. Boost is an enormously large and complex suite of utility libraries that works with almost every C++ compiler in existence. This compatibility has its cost: arcane template tricks and workarounds are necessary to support the oldest and buggiest of compiler specimens. Now that C++11-compatible compilers are widely available, this heavy machinery has become an excessively large and unnecessary dependency.
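To give a concrete feel for the task in the question, here is a sketch that stays in pure Python by going through cppyy (from the answer above) rather than PyBind11, and that assumes cppyy's implicit list-to-vector conversion; the function body is a stand-in for the real operations:

    import cppyy
    import numpy as np

    cppyy.cppdef(r"""
    #include <vector>
    std::vector<std::vector<double>> modify(const std::vector<std::vector<double>>& X) {
        auto X_mod = X;
        for (auto& row : X_mod)
            for (auto& v : row)
                v *= 2.0;   // stand-in for the real operations on X
        return X_mod;
    }
    """)

    X = np.arange(6.0).reshape(2, 3)
    X_mod = cppyy.gbl.modify(X.tolist())            # list-of-lists converts to the nested vector
    result = np.array([list(row) for row in X_mod]) # back to a numpy array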

How to use a dll compiled with VC++ 2013 with cython?

I am developing a library for 3D and environmental audio in C++, which I wish to use from Python and a myriad of other languages. I wish it to work with Kivy on the iPhone down the line, so I want to use Cython instead of ctypes. To that end, I have already constructed and exposed a C-only api that uses only primitive types, i.e. no structs. I would ideally like to have only one implementation of the Python binding, and believe that on Mac/Linux this won't be a problem (I could be wrong, but that is off-topic for this question).
The problem is that it is using C++11 heavily, and consequently needs to compile with VC++ 2013. Python 2 is still typically compiled with VC++ 2008, and I believe Python 3 is only VC++ 2010.
I am familiar with the rules for mixing dynamic runtimes: don't pass memory out and free it on the other side, never ever send handles back and forth, etc. I am following them. This means that I'm ctypes- and CFFI-ready. All functions are appropriately extern "C" and the like.
Can I use this with Cython safely, and if so, how? My reading is giving me very mixed answers on whether it's safe to use the export library produced by VC++ 2013 with VC++ 2008, and Cython seems to have no built-in functionality for dynamic linking (which won't work on iOS anyway). The only thing I can think of that would work, besides linking with the import library, is to also bind the Windows DLL manipulation functions and use those in Cython, but this defeats the one-implementation goal pretty quickly.
I've already tried researching this myself, but there seems to be no good info, and simply trying it will be inconclusive for weeks; linking is no guarantee of functionality, after all. This is the kind of thing that can very easily seem to work without actually doing so; to that end, sources and links to further reading are much appreciated.
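For reference, the "ctypes-ready" claim above translates to something like the following minimal sketch; the DLL name and function are hypothetical stand-ins for the C-only, primitive-types API described:

    import ctypes

    audio = ctypes.CDLL("audiolib.dll")   # goes through LoadLibrary; no import library needed
    audio.lib_create_source.restype = ctypes.c_int
    audio.lib_create_source.argtypes = [ctypes.c_float, ctypes.c_float, ctypes.c_float]

    handle = audio.lib_create_source(0.0, 1.0, 0.0)   # handle stays opaque, per the mixing rules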

Working OpenSceneGraph bindings for Python?

I'm building a rendering engine in Python for fun. I need to load 3D scenes. Any standard modern format like DAE, 3DS, or MAX would work: I can convert my files easily between standard formats.
OpenSceneGraph seems to be the most comprehensive and well-maintained solution. It would be ideal to be able to use it in Python without much hassle. Are there working Python bindings for OSG that are easy to install, work on Mac OS X (I'm on 10.8), and are compatible with the latest versions of OSG?
I searched around and came across osgswig (http://code.google.com/p/osgswig/) and PyOSG (http://sourceforge.net/projects/pyosg/), but they don't seem to be actively maintained. I don't see any recent activity related to these packages, and it seems that people had trouble running osgswig on OSX. Ideally, I'd like to find something that "just works", without major compilation hassles. I'd like to just install a package and be able to import a module that will let me load COLLADA or 3DS files.
I also came across pycollada (https://github.com/pycollada/pycollada). It seems active, but fairly early-stage. Ideally, I'd like a reasonably comprehensive package that supports specular maps, normal maps, and other reasonably advanced features. Animation would be nice as well.
In summary, I need to load 3D scenes in Python. Bindings for OSG would probably be ideal, because OSG is so comprehensive. But I need something that works on OSX. I would also prefer something that can be installed reasonably easily. Does something like this exist?
Thanks!
Take a look at Open Asset Import Library (short name: Assimp). It is a portable Open Source library to import various well-known 3D model formats in a uniform manner. http://www.assimp.org/
You should look at Panda3D (http://www.panda3d.org/); it's a game engine with extensive Python bindings. It has the features you want: http://www.panda3d.org/manual/index.php/Features
I used it for a few years and it was a solid tool.
I made my own fork of a mirror of a clone of the osgswig project for a similar purpose. I have it working with OpenSceneGraph version 3.2.1 on Windows and Mac, and it's likely I will eventually polish it for Linux too. I'm already delivering one product to customers based on my version of osgswig, and I'm considering making others. Find my fork here:
https://github.com/cmbruns/osgswig
If others show enough interest, I might be coaxed into creating binary installers for my version of the osgswig module, to make installation easier.
If you just want the easiest OpenSceneGraph bindings for OSG 3.2.1, you can stop reading this answer here. Read on for more of my thoughts for the future.
Though I am maintaining a fork of osgswig (as stated above), I sort of hate SWIG, and I would prefer to use bindings based on Boost.Python, rather than on SWIG. For large, complex C++ APIs, like OpenSceneGraph, Boost.Python can be much more elegant than SWIG, both for the API consumer, and for the binding maintainer (me, and me). I found one project using Boost.Python to wrap OSG, at https://code.google.com/p/osgboostpython/, but the developer is lovingly wrapping each part of the interface by hand, and has thus only completed a tiny fraction of the large OpenSceneGraph API.
Taking that Boost.Python based project as inspiration, I created yet another OpenSceneGraph Python binding project, at https://github.com/JaneliaSciComp/osgpyplusplus. Eventually, I want to use this osgpyplusplus project for all my python osg needs. And I would appreciate help in making it ready. Right now, osgpyplusplus suffers from the following weaknesses, compared to osgswig:
osgpyplusplus is not yet used in any working product
The build environment is tricky to set up, requiring both Boost.Python and Pyplusplus
I haven't paid much attention to osgpyplusplus recently, so it might rust away if I continue to ignore it.
Though osgpyplusplus probably wraps most of the OpenSceneGraph API, there are probably some important missing pieces that won't be identified until someone tries to develop a significant project with it.
It would be a lot of work for me to create a binary module installer for osgpyplusplus at this point, so please don't ask me to.

Choosing between Scons and Waf in Large Projects

We are thinking about converting a really large project from using GNU Make to some more modern build tool. My current suggestion is to use SCons or Waf.
Currently:
Build times are around 15 minutes.
Around 100 developers.
About 10 percent of the code is C/C++/Fortran; the rest is Ada (using gnatmake).
Potential hoped-for gains are:
A shared compiler cache to cut down build times and required disk space
Easier maintenance
Does SCons scale well for this task? I've seen comments about it not scaling as well as Waf, but those are a couple of years old. Has SCons gained in performance over the last few years? If not, what is the reason for its poor performance compared to Waf?
I have been developing a tool chain for our company that is built around waf. It targets Fedora, Ubuntu, Arch, Windows, Mac OSX and will be rolled out to our embedded devices doing cross-compilation on various hosts.
We have found the way that waf allows contained extensibility through the tools, features and other methods has made it incredibly easy to customise and extend for our projects.
Personally, I think it is brilliant and find it nicely abstracts the interfaces to different tools that are integrated.
Unfortunately, I have no in-depth experience with SCons, but lots with GNU Make/Autotools. Our decision to go with waf after evaluating build tools came down to needing something that worked well everywhere, which meant a build tool backed by Python, and that was fast. I based my decision on these results and went from there.
In the past, SCons wasn't as performant, but lots of improvements have been added since then.
I like both options and had to make the same decision about 6 months ago. I went with SCons since it appears to have a larger user and support base.
Here is a helpful link that compares SCons to other build tools.
I personally prefer Waf because it's more flexible and doesn't have the variant directory issue.
Waf
Pros:
Separate variant directory; you don't clutter your source folder with object files (SCons also has this, but it's not on by default and takes a few tries to get working)
Very flexible
Automatic dependency sorting
Works on lots of Python versions (CPython 2, CPython 3, Jython, and PyPy)
You distribute it with your application, so users just need Python
Cons:
A pain to extend
Horribly underdocumented (although the examples help)
Doesn't differentiate between GCC and Clang well (not sure if SCons has that problem too)
SCons
Pros:
Much simpler than Waf
Easier to extend than Waf (see here)
Somewhat better documented
Cons:
Scalability issues (note that SCons isn't actually quite that bad, but it still gets a tad slower as size increases)
Ugly (this is 100% personal opinion)
Bottom line
It depends on what you're looking for. In general, Waf seems very good at managing large projects (and not just because of speed), but, if you need to extend it, look elsewhere. On the other hand, SCons is much easier to use.
If you decide to go with Waf, just post your problems to the mailing list.
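For a feel of the two, here is roughly what a minimal build script looks like in each, assuming a single C++ program; file names are illustrative:

    # wscript (Waf) -- run with: waf configure build
    def options(ctx):
        ctx.load('compiler_cxx')

    def configure(ctx):
        ctx.load('compiler_cxx')

    def build(ctx):
        ctx.program(source='main.cpp', target='app')

    # SConstruct (SCons) -- run with: scons
    env = Environment()
    env.Program('app', ['main.cpp'])

Both are plain Python files, which is a large part of the appeal of either tool over Make.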

PyPy -- How can it possibly beat CPython?

From the Google Open Source Blog:
PyPy is a reimplementation of Python in Python, using advanced techniques to try to attain better performance than CPython. Many years of hard work have finally paid off. Our speed results often beat CPython, ranging from being slightly slower, to speedups of up to 2x on real application code, to speedups of up to 10x on small benchmarks.
How is this possible? Which Python implementation was used to implement PyPy? CPython? And what are the chances of a PyPyPy or PyPyPyPy beating their score?
(On a related note... why would anyone try something like this?)
"PyPy is a reimplementation of Python in Python" is a rather misleading way to describe PyPy, IMHO, although it's technically true.
There are two major parts of PyPy.
The translation framework
The interpreter
The translation framework is a compiler. It compiles RPython code down to C (or other targets), automatically adding in aspects such as garbage collection and a JIT compiler. It cannot handle arbitrary Python code, only RPython.
RPython is a subset of normal Python; all RPython code is Python code, but not the other way around. There is no formal definition of RPython, because RPython is basically just "the subset of Python that can be translated by PyPy's translation framework". But in order to be translated, RPython code has to be statically typed (the types are inferred, you don't declare them, but it's still strictly one type per variable), and you can't do things like declaring/modifying functions/classes at runtime either.
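For illustration, here is roughly the kind of thing that separates the two; since there is no formal spec, take this as a sketch:

    def ok(n):
        # translatable: 'total' and 'i' keep one inferred type (int) throughout
        total = 0
        for i in range(n):
            total += i
        return total

    def not_ok(flag):
        # not RPython: 'x' is an int on one path and a str on the other
        if flag:
            x = 1
        else:
            x = "one"
        return x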
The interpreter then is a normal Python interpreter written in RPython.
Because RPython code is normal Python code, you can run it on any Python interpreter. But none of PyPy's speed claims come from running it that way; this is just for a rapid test cycle, because translating the interpreter takes a long time.
With that understood, it should be immediately obvious that speculations about PyPyPy or PyPyPyPy don't actually make any sense. You have an interpreter written in RPython. You translate it to C code that executes Python quickly. There the process stops; there's no more RPython to speed up by processing it again.
So "How is it possible for PyPy to be faster than CPython" also becomes fairly obvious. PyPy has a better implementation, including a JIT compiler (it's generally not quite as fast without the JIT compiler, I believe, which means PyPy is only faster for programs susceptible to JIT-compilation). CPython was never designed to be a highly optimising implementation of the Python language (though they do try to make it a highly optimised implementation, if you follow the difference).
The really innovative bit of the PyPy project is that they don't write sophisticated GC schemes or JIT compilers by hand. They write the interpreter relatively straightforwardly in RPython, and for all RPython is lower level than Python it's still an object-oriented garbage collected language, much more high level than C. Then the translation framework automatically adds things like GC and JIT. So the translation framework is a huge effort, but it applies equally well to the PyPy python interpreter however they change their implementation, allowing for much more freedom in experimentation to improve performance (without worrying about introducing GC bugs or updating the JIT compiler to cope with the changes). It also means when they get around to implementing a Python3 interpreter, it will automatically get the same benefits. And any other interpreters written with the PyPy framework (of which there are a number at varying stages of polish). And all interpreters using the PyPy framework automatically support all platforms supported by the framework.
So the true benefit of the PyPy project is to separate out (as much as possible) all the parts of implementing an efficient platform-independent interpreter for a dynamic language. And then come up with one good implementation of them in one place, that can be re-used across many interpreters. That's not an immediate win like "my Python program runs faster now", but it's a great prospect for the future.
And it can run your Python program faster (maybe).
Q1. How is this possible?
Manual memory management (which is what CPython does with its reference counting) can be slower than automatic management in some cases.
Limitations in the implementation of the CPython interpreter preclude certain optimisations that PyPy can do (e.g. fine-grained locks).
As Marcelo mentioned, the JIT. Being able to confirm the type of an object on the fly can save you multiple pointer dereferences on the way to the method you want to call, as the sketch below illustrates.
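For instance, in a loop like the following, CPython re-does the generic type dispatch on every iteration, while PyPy's JIT can observe the types once and compile a specialized trace:

    def accumulate(values):
        # After a few iterations the JIT records that 'x' is always a float,
        # so 'acc + x' becomes a direct float addition instead of a generic
        # dispatch through type lookups on every step.
        acc = 0.0
        for x in values:
            acc += x
        return acc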
Q2. Which Python implementation was used to implement PyPy?
The PyPy interpreter is implemented in RPython, a statically typed subset of Python (the language, not the CPython interpreter). See https://pypy.readthedocs.org/en/latest/architecture.html for details.
Q3. And what are the chances of a PyPyPy or PyPyPyPy beating their score?
That would depend on the implementation of these hypothetical interpreters. If one of them, for example, took the source, did some kind of analysis on it, and converted it directly into tight target-specific assembly code after running for a while, I imagine it would be quite a bit faster than CPython.
Update: Recently, on a carefully crafted example, PyPy outperformed a similar C program compiled with gcc -O3. It's a contrived case but does exhibit some ideas.
Q4. Why would anyone try something like this?
From the official site (https://pypy.readthedocs.org/en/latest/architecture.html#mission-statement):
We aim to provide:
a common translation and support framework for producing implementations of dynamic languages, emphasizing a clean separation between language specification and implementation aspects. We call this the RPython toolchain.
a compliant, flexible and fast implementation of the Python language which uses the above toolchain to enable new advanced high-level features without having to encode the low-level details.
By separating concerns in this way, our implementation of Python - and other dynamic languages - is able to automatically generate a Just-in-Time compiler for any dynamic language. It also allows a mix-and-match approach to implementation decisions, including many that have historically been outside of a user's control, such as target platform, memory and threading models, garbage collection strategies, and optimizations applied, including whether or not to have a JIT in the first place.
The C compiler gcc is implemented in C; the Haskell compiler GHC is written in Haskell. Is there any reason the Python interpreter/compiler could not be written in Python?
PyPy is implemented in Python, but it implements a JIT compiler to generate native code on the fly.
The reason to implement PyPy on top of Python is probably that it is simply a very productive language, especially since the JIT compiler makes the host language's performance somewhat irrelevant.
PyPy is written in Restricted Python (RPython), a subset of the Python language. It does not run on top of the CPython interpreter, as far as I know: AFAIK, the PyPy interpreter is compiled to machine code, so once installed it does not use a Python interpreter at runtime.
Your question seems to assume that the PyPy interpreter is running on top of CPython while executing code.
Edit: Yes, to use PyPy you first translate the PyPy Python code, either to C (and build with gcc), to JVM bytecode, or to .NET CLI code. See Getting Started.
