I have a C++ function that takes in std::vector<std::vector<double> > X and does some operations on X and outputs std::vector<std::vector<double> > X_mod.
I want to be able to quickly make an interface such that I can pass a Python numpy array into this C++ function, and then have the C++ function return X_mod into Python.
I have briefly looked at Boost, and it seems too complicated for this simple purpose?
Any other suggestions on how to write a quick interface for this?
As suggested in the comments, pybind11 can be used to write C++ bindings for Python; see the pybind11 documentation, the pybind11 GitHub repo, and an example of how to use it.
The reason so many people propose pybind11 over Boost.Python can be found in its README:
The main issue with Boost.Python—and the reason for creating such a similar project—is Boost. Boost is an enormously large and complex suite of utility libraries that works with almost every C++ compiler in existence. This compatibility has its cost: arcane template tricks and workarounds are necessary to support the oldest and buggiest of compiler specimens. Now that C++11-compatible compilers are widely available, this heavy machinery has become an excessively large and unnecessary dependency.
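A minimal binding for the function described in the question might look like the following sketch. The element-doubling body is a placeholder for the real operation; with pybind11/stl.h included, a nested Python list (or a 2-D NumPy array, converted element-wise) maps to the nested vector automatically, at the cost of a copy:

```cpp
// example.cpp -- build with, e.g.:
//   c++ -O3 -shared -std=c++11 -fPIC $(python3 -m pybind11 --includes) \
//       example.cpp -o example$(python3-config --extension-suffix)
#include <pybind11/pybind11.h>
#include <pybind11/stl.h>   // automatic std::vector <-> Python list conversion
#include <vector>

std::vector<std::vector<double>> modify(std::vector<std::vector<double>> X) {
    // Placeholder for the real computation: double every element.
    for (auto &row : X)
        for (auto &v : row)
            v *= 2.0;
    return X;
}

PYBIND11_MODULE(example, m) {
    m.def("modify", &modify, "Return a modified copy of a 2-D array of doubles");
}
```

From Python you would then call `import example; X_mod = example.modify(X)`, where `X` can be a list of lists or a NumPy array (NumPy's own buffer-protocol support in pybind11 can avoid the copy, but the list conversion above is the quickest to get working).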
Related
What is the best/standard way to create Python interfaces for C++ libraries?
I know this question has been asked on here before but that was in 2008 and things may/likely have changed since then.
I've looked around and tested a few different methods but can't decide which is best. I've tried Swig, ctypes, and cppyy so far and think that cppyy is by far the easiest/fastest to implement. I've seen recommendations for Swig but it took a very long time to get Swig working and the results were not impressive. Is there a current standard? Why do people recommend Swig so much but I hear nothing of cppyy? Thank you.
SWIG has been around since February 1996 and supports a range of languages, not just Python. Although what is now cppyy started in February 2003, as RootPython, it was always embedded with ROOT (http://root.cern.ch) and was not available standalone. A full, easy installation with wheels on PyPI for all three major platforms has only existed since March of this year, and on conda-forge (for Linux & Mac) only for the past two months. So, even though it has a long pedigree, within the wider Python world cppyy is really quite fresh, which is why I doubt many folks have heard of it yet, whereas SWIG is the (spiritual) ancestor of them all.
The reason for putting in the effort of making cppyy available is that it offers quite a few features that other binders do not have and would not find easy to add: a compliant C++17 parser (b/c of Clang/LLVM); automatic template instantiations, cross-inheritance, and callbacks, all at run-time (b/c of Cling); and much better performance. It also does not create C extension modules, so you only need to recompile cppyy itself for different versions of Python, but none of the bound code.
Now, to your first question of what is the best. Well, it depends on the use case. For example, if you need more bindings than just Python, SWIG is your best bet. If you have lots of templates that you can not all instantiate at build time, need the performance and scaling, or have a C++ framework with lots of interfaces, then cppyy is hard to beat. If you have modern C++ and do not want any run-time dependency on external libraries, then PyBind11 is where it's at.
These days I can't recommend ctypes. The only real benefit is that it is a builtin module for most Pythons in the wild, but with the advent of PyPI and conda that has become moot. If you want a super-light C binder (not C++, but you can wrap those functions with C helpers), then go for CFFI.
As to your question of whether there is a standard: no, there is no one binder that is best for all use cases. There are even quite a few more than the ones you have mentioned, but many of them play in the same space (e.g. SWIG vs. SIP, and PyBind11 vs. boost.python) and I wouldn't recommend them over the ones you already tried. I do want to point out AutoWIG, a generator utilizing Clang with PyBind11 or boost.python code as output; and Cython, a Python-like language for writing C extension modules, which has some (limited) C++ support. I've always felt that Cython was neither here nor there, but lots of people like it, and it's used extensively in the scientific community and in math-heavy code, so that vouches for its quality.
Now, even though there is no "standard", all binders can convert proxies to PyCapsule objects and rebind them. So although it is a bit clunky at times, you can actually mix binders within one application.
One final point: CFFI and cppyy (through CFFI's backend) have near-native performance on PyPy. Unfortunately, cppyy isn't as up-to-date on PyPy as it is on CPython (e.g. cross-inheritance is still missing), but it's getting there. The other binders work through the Python C-API, which is fully functional on PyPy, but precludes the JIT from doing its work, with reduced performance as result.
Full disclaimer: I'm the author of cppyy, and these days I only use cppyy, CFFI, and PyBind11 for my binding needs.
I am developing a library for 3D and environmental audio in C++, which I wish to use from Python and a myriad of other languages. I wish it to work with Kivy on the iPhone down the line, so I want to use Cython instead of ctypes. To that end, I have already constructed and exposed a C-only api that uses only primitive types, i.e. no structs. I would ideally like to have only one implementation of the Python binding, and believe that on Mac/Linux this won't be a problem (I could be wrong, but that is off-topic for this question).
The problem is that it uses C++11 heavily, and consequently needs to compile with VC++ 2013. Python 2 is still typically compiled with VC++ 2008, and I believe Python 3 uses only VC++ 2010.
I am familiar with the rules for mixing dynamic runtimes: don't pass memory out and free it from the other side, never ever send handles back and forth, etc. I am following them. This means that I'm ctypes- and cffi-ready. All functions are appropriately extern "C" and the like.
Can I use this with Cython safely, and if so, how? My reading is giving me very mixed answers on whether it's safe to use the export library produced by VC++ 2013 with VC++ 2008, and Cython seems to have no built-in functionality for dynamic linking (which won't work on iOS anyway). The only thing I can think of that would work, besides linking with the import library, is to also bind the Windows DLL-manipulation functions and use those from Cython, but this defeats the one-implementation goal pretty quickly.
I've already tried researching this myself, but there seems to be no good info, and simply trying it will be inconclusive for weeks, since successful linking is no guarantee of correct functionality. This is the kind of thing that can very easily seem to work without actually doing so; to that end, sources and links to further reading are much appreciated.
I have a numerical library in FORTRAN (I believe FORTRAN IV) and I want to convert it to Python code. I want real source code that I can import on any Python virtual machine --- Windows, MacOS-X, Linux, Android. I started to do this by hand, but there are about 1,000 routines in the library, so that's not a reasonable solution.
Such a tool exists for Fortran to Lisp, or Fortran to C, or even Fortran to Java. But you will never have a Fortran to Python tool, for a simple reason: unlike Fortran, Lisp or C, Python does not have GOTO [1]. And there are many GOTOs in Fortran (especially Fortran IV) code. Even if there is a theorem by Jacopini stating that you can emulate GOTO with structured programming, it's far too cumbersome to implement a real (and efficient) language conversion tool.
So not only will you need to translate the code of 1,000 routines, but you will also need to understand each algorithm, with all its interwoven GOTOs, and restructure it into a structured program before writing it in Python. Good luck!
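The Jacopini-style emulation mentioned above can be sketched as a dispatch loop: a "label" variable plus a while loop stands in for the jumps. This toy example (my own illustration, not output of any real translator) writes Euclid's algorithm as if mechanically translated from GOTO-heavy Fortran:

```python
# Emulating GOTO with structured code: a while loop dispatches on a "label"
# variable. This mirrors what a mechanical Fortran-to-Python translator would
# have to emit for every jump -- workable, but ugly and slow.

def gcd_with_gotos(a, b):
    """Euclid's algorithm, written as if translated from GOTO-heavy code."""
    label = 10
    while True:
        if label == 10:            # 10 CONTINUE
            label = 30 if b == 0 else 20   # IF (B .EQ. 0) GOTO 30
        elif label == 20:          # 20   swap and reduce
            a, b = b, a % b
            label = 10             #      GOTO 10
        elif label == 30:          # 30   RETURN A
            return a

print(gcd_with_gotos(48, 18))  # 6
```

Multiply this pattern across deeply nested, overlapping jumps and it becomes clear why a naive automatic translation would be both unreadable and inefficient.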
Hey, why do you think a wrapper is bad? Windows, OSX, and Linux all have Fortran and C [2] compilers and good wrappers!
For C (not your language here, but f2c may be an option), there is SWIG, and Fortran has f2py, now integrated with numpy. SWIG has some support for Android.
By the way, instead of converting to "pure" Python, you can use NumPy: NumPy's capabilities are similar to Fortran 90's (see a comparison here), so you may consider first translating your programs to F90 for a smoother transition. There also seems to be a NumPy build for Android. And in case you need NumPy on 64-bit Windows, there are binaries here.
If you decide to use wrappers, gfortran runs on Linux (simply install from distribution packages), Windows (MinGW), and Android.
If you go along that line, don't forget you are compiling Fortran IV code, so there is the usual "one-trip loop" problem (usually a compiler option handles it). You will probably also have to manually convert some old, non-standard statements not accepted by modern compilers.
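The "one-trip loop" issue can be illustrated in Python terms: Fortran IV DO loops checked the bound at the bottom, so the body always ran at least once, whereas modern loop semantics may run it zero times. A faithful translation has to preserve that (the function names here are my own, purely illustrative):

```python
# Fortran IV "DO" loops behave like a do-while: the body executes once before
# the bound is tested. A literal translation must preserve that behaviour.

def one_trip_sum(n, m):
    """Sum i from n to m, Fortran IV style: body runs at least once."""
    total = 0
    i = n
    while True:
        total += i          # body always executes at least once
        i += 1
        if i > m:
            break
    return total

def zero_trip_sum(n, m):
    """Modern semantics: the body may run zero times."""
    return sum(range(n, m + 1))

print(one_trip_sum(5, 3))   # 5  -- one trip even though 5 > 3
print(zero_trip_sum(5, 3))  # 0
```

For well-formed bounds the two agree; the difference only bites on empty ranges, which is exactly the case a compiler option (or manual audit) has to cover.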
You have also, obviously, the option to switch your project language to Lisp or Java...
[1] You may ask: if GOTO is the problem, how come there is a Fortran to Java tool? Well, it uses tricks with the JVM, which internally has a GOTO instruction. There is also a GOTO in Python bytecode (look for JUMP here), so there may be something to investigate there. So my earlier statement is too strong: there could be a Fortran to Python tool using bytecode tricks as in Java. But it remains to be developed, and the availability of good libraries (NumPy, matplotlib, pandas...) makes it unnecessary, to say the least.
I wrote a translator that converts a subset of Fortran into Python (and several other languages). It is only compatible with a small subset of Fortran, but I hope it will still be useful.
The translator can parse this Fortran function:
LOGICAL function is_greater_than(a, b)
    real, intent(in) :: a
    real, intent(in) :: b
    is_greater_than = a > b
end function is_greater_than
...and translate it into this Python function:
def is_greater_than(a, b):
    return a > b
From the Google Open Source Blog:
PyPy is a reimplementation of Python in Python, using advanced techniques to try to attain better performance than CPython. Many years of hard work have finally paid off. Our speed results often beat CPython, ranging from being slightly slower, to speedups of up to 2x on real application code, to speedups of up to 10x on small benchmarks.
How is this possible? Which Python implementation was used to implement PyPy? CPython? And what are the chances of a PyPyPy or PyPyPyPy beating their score?
(On a related note... why would anyone try something like this?)
"PyPy is a reimplementation of Python in Python" is a rather misleading way to describe PyPy, IMHO, although it's technically true.
There are two major parts of PyPy.
The translation framework
The interpreter
The translation framework is a compiler. It compiles RPython code down to C (or other targets), automatically adding in aspects such as garbage collection and a JIT compiler. It cannot handle arbitrary Python code, only RPython.
RPython is a subset of normal Python; all RPython code is Python code, but not the other way around. There is no formal definition of RPython, because RPython is basically just "the subset of Python that can be translated by PyPy's translation framework". But in order to be translated, RPython code has to be statically typed (the types are inferred, you don't declare them, but it's still strictly one type per variable), and you can't do things like declaring/modifying functions/classes at runtime either.
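A rough illustration of the "one type per variable" constraint follows. Both snippets are ordinary Python (RPython has no formal spec, as noted above), so both run under CPython; only the first would survive translation:

```python
# Fine as RPython: every variable has a single, statically inferable type.
def count_positives(values):
    n = 0                        # n is always an int
    for v in values:
        if v > 0:
            n += 1
    return n

# NOT valid RPython: the type of x depends on a runtime value.
# Plain Python doesn't mind; the translator's type inference would reject it.
def not_rpython(flag):
    x = 1
    if flag:
        x = "one"                # x flips from int to str
    return x

print(count_positives([3, -1, 4]))  # 2
print(not_rpython(True))            # 'one' -- runs fine, untranslatable
```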
The interpreter then is a normal Python interpreter written in RPython.
Because RPython code is normal Python code, you can run it on any Python interpreter. But none of PyPy's speed claims come from running it that way; this is just for a rapid test cycle, because translating the interpreter takes a long time.
With that understood, it should be immediately obvious that speculations about PyPyPy or PyPyPyPy don't actually make any sense. You have an interpreter written in RPython. You translate it to C code that executes Python quickly. There the process stops; there's no more RPython to speed up by processing it again.
So "How is it possible for PyPy to be faster than CPython" also becomes fairly obvious. PyPy has a better implementation, including a JIT compiler (it's generally not quite as fast without the JIT compiler, I believe, which means PyPy is only faster for programs susceptible to JIT compilation). CPython was never designed to be a highly optimising implementation of the Python language (though they do try to make it a highly optimised implementation, if you follow the difference).
The really innovative bit of the PyPy project is that they don't write sophisticated GC schemes or JIT compilers by hand. They write the interpreter relatively straightforwardly in RPython, and although RPython is lower level than Python, it's still an object-oriented, garbage-collected language, much higher level than C. The translation framework then automatically adds things like GC and the JIT. So the translation framework is a huge effort, but it applies equally well to the PyPy Python interpreter however they change their implementation, allowing much more freedom to experiment with performance improvements (without worrying about introducing GC bugs or updating the JIT compiler to cope with the changes). It also means that when they get around to implementing a Python 3 interpreter, it will automatically get the same benefits, as will any other interpreter written with the PyPy framework (of which there are a number at varying stages of polish). And all interpreters using the PyPy framework automatically support all platforms supported by the framework.
So the true benefit of the PyPy project is to separate out (as much as possible) all the parts of implementing an efficient platform-independent interpreter for a dynamic language. And then come up with one good implementation of them in one place, that can be re-used across many interpreters. That's not an immediate win like "my Python program runs faster now", but it's a great prospect for the future.
And it can run your Python program faster (maybe).
Q1. How is this possible?
Reference counting (which is what CPython does) can be slower than other automatic memory-management schemes in some cases.
Limitations in the implementation of the CPython interpreter preclude certain optimisations that PyPy can do (eg. fine grained locks).
As Marcelo mentioned, the JIT. Being able to on the fly confirm the type of an object can save you the need to do multiple pointer dereferences to finally arrive at the method you want to call.
Q2. Which Python implementation was used to implement PyPy?
The PyPy interpreter is implemented in RPython, a statically typed subset of Python (the language, not the CPython interpreter). See https://pypy.readthedocs.org/en/latest/architecture.html for details.
Q3. And what are the chances of a PyPyPy or PyPyPyPy beating their score?
That would depend on the implementation of these hypothetical interpreters. If one of them for example took the source, did some kind of analysis on it and converted it directly into tight target specific assembly code after running for a while, I imagine it would be quite faster than CPython.
Update: Recently, on a carefully crafted example, PyPy outperformed a similar C program compiled with gcc -O3. It's a contrived case but does exhibit some ideas.
Q4. Why would anyone try something like this?
From the official site. https://pypy.readthedocs.org/en/latest/architecture.html#mission-statement
We aim to provide:

a common translation and support framework for producing implementations of dynamic languages, emphasizing a clean separation between language specification and implementation aspects. We call this the RPython toolchain.

a compliant, flexible and fast implementation of the Python language which uses the above toolchain to enable new advanced high-level features without having to encode the low-level details.

By separating concerns in this way, our implementation of Python - and other dynamic languages - is able to automatically generate a Just-in-Time compiler for any dynamic language. It also allows a mix-and-match approach to implementation decisions, including many that have historically been outside of a user's control, such as target platform, memory and threading models, garbage collection strategies, and optimizations applied, including whether or not to have a JIT in the first place.
The C compiler gcc is implemented in C; the Haskell compiler GHC is written in Haskell. Is there any reason the Python interpreter/compiler should not be written in Python?
PyPy is implemented in Python, but it implements a JIT compiler to generate native code on the fly.
The reason to implement PyPy on top of Python is probably that it is simply a very productive language, especially since the JIT compiler makes the host language's performance somewhat irrelevant.
PyPy is written in RPython (Restricted Python), a subset of the Python language. It does not run on top of the CPython interpreter, as far as I know. AFAIK, the PyPy interpreter is compiled to machine code, so once installed it does not use a Python interpreter at runtime.
Your question seems to expect the PyPy interpreter is running on top of CPython while executing code.
Edit: Yes, to use PyPy you first translate the PyPy Python code, either to C (built with gcc), to JVM bytecode, or to .NET CLI code. See Getting Started.
A researcher has created a small simulation in MATLAB and we want to make it accessible to others. My plan is to take the simulation, clean up a few things and turn it into a set of functions. Then I plan to compile it into a C library and use SWIG to create a Python wrapper. At that point, I should be able to call the simulation from a small Django application. At least I hope so.
Do I have the right plan? Are there are any serious pitfalls that I'm not aware of at the moment?
One thing to remember is that the MATLAB compiler does not actually compile the MATLAB code into native machine instructions. It simply wraps it into a stand-alone executable or a library with its own runtime engine that runs it. You would be able to run your code without MATLAB installed, and you would be able to interface it with other languages, but it will still be interpreted MATLAB code, so there would be no speedup.
MATLAB Coder, on the other hand, can generate C code from MATLAB. There are some limitations, though: not all MATLAB functions are supported for code generation, and there are things you cannot do, like changing the type of a variable on the fly.
I remember that I was able to wrap a MATLAB simulation into a DLL file and then call it from a Delphi application. It worked really well.
I'd also try ctypes first.
1. Use the MATLAB compiler to compile the code into C.
2. Compile the C code into a DLL.
3. Use ctypes to load and call code from this DLL.
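The ctypes step might look like the following sketch. Since the MATLAB-generated DLL doesn't exist here, it loads the standard C math library as a stand-in; the pattern is identical for any shared library, only the library and function names change:

```python
import ctypes
import ctypes.util

# Locate and load a shared library (stand-in for the compiled MATLAB DLL).
# On Windows you would pass the DLL's path to ctypes.CDLL directly.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C signature so ctypes converts arguments and results correctly;
# without this, doubles would be silently mangled.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(9.0))  # 3.0
```

Declaring argtypes/restype for every exported simulation function is tedious but mechanical, and keeps the Python side free of compiled extension code.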
The hardest step is probably 1, but if you already know MATLAB and have used the MATLAB compiler, you should not have serious problems with it.
Perhaps try ctypes instead of SWIG. If it has been included as a part of Python 2.5, then it must be good :-)