Python C interoperability

I wish to wrap an existing C library (pure C, that is; no C++) in Python so that I can call it from Python scripts. Which of the available approaches (the C API, SWIG, etc.) would be most suitable?

Go with ctypes; it is part of the standard distribution and works very well.
Basically, you can wrap C structures and types in Python classes, and wrap C functions as well. Some types and functionality are already provided by the library.
A couple of caveats, though: passing triple pointers to C routines is not obvious (if you have to), and I could not get it to work with static libraries on Linux (ctypes loads libraries through the dynamic loader, so a static library has to be linked into a shared object first); DLLs and shared objects are fine.
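A minimal sketch of wrapping a C structure and a C function with ctypes. It assumes a POSIX system (Linux or macOS) where ctypes.CDLL(None) reaches the C library already loaded into the Python process; clock_gettime and struct timespec are only convenient stand-ins for your own library, and the field types assume the 64-bit Linux/macOS layout (time_t as a C long). The last lines show how an int *** is spelled in ctypes, since that is the non-obvious case from the caveat.

```python
import ctypes
import os
import time

# Assumption: POSIX system, where CDLL(None) exposes the C library that is
# already loaded into the Python process.
libc = ctypes.CDLL(None, use_errno=True)


class Timespec(ctypes.Structure):
    # Mirrors struct timespec from <time.h>; assumes time_t is a C long,
    # which holds on 64-bit Linux and macOS.
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_nsec", ctypes.c_long)]


# Declare the C prototype: int clock_gettime(clockid_t, struct timespec *)
libc.clock_gettime.argtypes = [ctypes.c_int, ctypes.POINTER(Timespec)]
libc.clock_gettime.restype = ctypes.c_int


def realtime_seconds():
    """Call clock_gettime() through ctypes and convert the struct to a float."""
    ts = Timespec()
    if libc.clock_gettime(time.CLOCK_REALTIME, ctypes.byref(ts)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ts.tv_sec + ts.tv_nsec / 1e9


# int ***: nest POINTER() for the type (e.g. in argtypes) and
# ctypes.pointer() for the values.
IntPtrPtrPtr = ctypes.POINTER(ctypes.POINTER(ctypes.POINTER(ctypes.c_int)))
value = ctypes.c_int(42)
ppp = ctypes.pointer(ctypes.pointer(ctypes.pointer(value)))

print(realtime_seconds())
print(isinstance(ppp, IntPtrPtrPtr), ppp.contents.contents.contents.value)
```

For a real library you would replace CDLL(None) with ctypes.CDLL("path/to/yourlib.so") and declare argtypes/restype for each function you call, exactly as done for clock_gettime here.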

SWIG is great for doing this. Here is a sample tutorial: http://www.swig.org/Doc1.3/Python.html.

Since your code is "pure" C, you might consider using Pyrex/Cython. This is not meant as a vote, and Cython has already been mentioned; I am just pointing out that it is a particularly good choice for pure C.

Related

How does Python call C?

How exactly can Python call a C library? TensorFlow, for example, is (I believe) written mostly in C, but can be used from Python. I'm thinking of implementing something like this in my own (interpreted) programming language (written in Go, but I assume it would be a similar process).
What happens when a Python program calls a C function? I'm thinking either RPC or DLLs, but both of them seem unlikely.
CPython has two main ways to call C code: either by loading a shared library and calling its symbols, or by packaging C code as binary Python modules and calling them from Python code as though they were ordinary Python modules. The latter is how the performance-critical parts of the standard library are implemented, e.g. json.
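A quick way to see the second mechanism at work, assuming CPython: the json package imports C versions of its hot paths from the _json extension module and falls back to pure Python only if that import fails. The module-level names used below (c_scanstring, c_make_scanner, py_scanstring) are CPython implementation details, not public API, so treat this as a sketch for inspection only.

```python
import json
import json.decoder
import json.scanner


def json_uses_c_accelerator():
    """True if the json package picked up the C implementations from _json."""
    return (
        json.decoder.c_scanstring is not None
        and json.scanner.c_make_scanner is not None
        and json.decoder.scanstring is json.decoder.c_scanstring
    )


# On a typical CPython build both lines point at the C extension.
print(json_uses_c_accelerator())
print(json.decoder.scanstring)
```

From the caller's side nothing changes: json.loads is called the same way whether the C or the Python scanner ends up doing the work.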
Loading a shared library and calling functions from it using the ctypes module is rather trivial, and you can find a lot of examples here: https://docs.python.org/3/library/ctypes.html
Packaging your C code as a binary Python module requires a lot of boilerplate and careful attention to details such as reference counting, null pointers, etc., and is documented here: https://docs.python.org/2/extending/extending.html
There are several libraries that automate the process and generate binding code for you. One example is Boost.Python: https://www.boost.org/doc/libs/1_65_0/libs/python/doc/html/tutorial/index.html

How to know what part of CPython is implemented in C?

Parts of the CPython standard library that are written in C are faster than the parts implemented in Python. To optimize code, it would be good to use the functions that are implemented in C. My question is: how can you determine which parts of the standard library are implemented in C?
Parts of the standard library of CPython that were written in C are faster than the parts implemented in Python. To optimise your code, it is good to use the functions implemented in C.
While that's correct, it's only half of the story. All builtins are implemented in C, and a lot of standard-library modules are completely or partly implemented in C. So everything already uses C functions.
For example, collections.Counter is a pure-Python class, but the collections._count_elements function (Python 3) is implemented in C and used by Counter so that it can "count faster". But does that make Counter a C function?
So it's not a clear-cut thing, and you shouldn't expect a Python part to be necessarily (much) slower than if it were implemented in C. Also, "written in C" versus "written in Python" is something of an implementation detail: what's written in Python now could be re-implemented in C in a future version (probably vice versa too, though that happens less frequently, if at all).
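For individual functions you can ask the interpreter directly. This is a heuristic sketch for CPython only: C-level callables show up as a handful of built-in types from the types module, while functions written in Python are plain function objects. Callables produced by tools such as Cython have their own types and would not be recognized by this check.

```python
import collections
import types

# Types CPython uses for callables implemented in C.
_C_CALLABLE_TYPES = (
    types.BuiltinFunctionType,        # len, [].append, collections._count_elements
    types.MethodDescriptorType,       # str.join
    types.WrapperDescriptorType,      # object.__init__
    types.MethodWrapperType,          # (1).__add__
    types.ClassMethodDescriptorType,  # dict.__dict__["fromkeys"]
)


def implemented_in_c(obj):
    """Heuristic: True if obj is a C-level callable in CPython."""
    return isinstance(obj, _C_CALLABLE_TYPES)


print(implemented_in_c(collections._count_elements))  # C helper from _collections
print(implemented_in_c(collections.Counter.update))   # plain Python function
```

This matches the Counter example: the class and its update method are Python, while the helper doing the counting is C.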
how can you determine or know the parts of the standard library that were implemented in C?
You have to investigate that yourself. Some modules are available with both a C implementation and a Python implementation (for example StringIO vs. cStringIO in Python 2), others are completely implemented in C (for example itertools), and others are partly implemented in C (for example collections).
Fortunately, the CPython source code is available on GitHub, but you still have to look in the Lib folder to check whether there's a Python implementation. If there is none, the module is almost certainly written entirely in C; but if there is a .py file (or a subfolder), you still need to check what it imports. For example, collections imports (and overrides) a lot of things from _collections, which is implemented in C.
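Instead of browsing GitHub, a first pass can be done from the interpreter itself, assuming CPython: a module is either compiled into the interpreter, loaded from a compiled extension file, or loaded from Python source. The sketch below only looks at where the top-level module comes from, so a partly-C module such as collections still reports "Python source"; checking its imports is still up to you.

```python
import importlib.machinery
import importlib.util
import sys


def implementation_kind(module_name):
    """Rough classification of where a module's top-level code comes from."""
    if module_name in sys.builtin_module_names:
        return "C (compiled into the interpreter)"
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return None  # no such module
    origin = spec.origin or ""
    if origin == "built-in":
        return "C (compiled into the interpreter)"
    if origin.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES)):
        return "C extension module"
    # "frozen" covers stdlib modules that newer CPython versions freeze
    # into the interpreter as Python bytecode.
    if origin == "frozen" or origin.endswith(".py"):
        return "Python source"
    return "unknown"


for name in ("sys", "itertools", "_collections", "collections", "json", "_json"):
    print(f"{name:14s} {implementation_kind(name)}")
```

Which modules end up built in versus shipped as separate extension files varies by platform and build, so itertools or _json may land in either of the two C categories.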
My question is how can you determine or know the parts of the standard library that were implemented in C?
You can read the sources of the standard library: the Python modules (in the Lib folder) and the C modules (in the Modules folder).
Or, what you should probably do, is measure the performance of your code, and then act based on that.
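Measuring can be as simple as timing the candidate implementations on representative data with timeit. A small sketch comparing a hand-written counting loop with collections.Counter; the data and repeat counts are arbitrary, and which one wins (and by how much) depends on your Python version and data, which is exactly why it is worth measuring.

```python
import collections
import random
import timeit

data = [random.randrange(100) for _ in range(10_000)]


def count_with_dict(items):
    """Hand-written counting loop, for comparison with Counter."""
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    return counts


# Same answer either way; Counter is a dict subclass, so they compare equal.
assert count_with_dict(data) == collections.Counter(data)

timings = {}
for label, stmt in [
    ("dict loop", lambda: count_with_dict(data)),
    ("Counter", lambda: collections.Counter(data)),
]:
    # Best of 3 repeats, 50 calls each.
    timings[label] = min(timeit.repeat(stmt, number=50, repeat=3))
    print(f"{label:10s} {timings[label]:.4f} s for 50 runs")
```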

What are the different options for interfacing C (or C++) with Python?

I know there are many ways to interface C function into Python: the Python C API, scipy.weave, ctypes, pyrex/cython, SWIG, Boost.Python, Psyco... What are each of them best for? Why should I use a given method instead of others? What should be considered when I need to choose a binding between Python and C?
I know of some discussions about that, but they all seem incomplete...
http://wiki.cython.org/SWIG
http://sage.math.washington.edu/tmp/sage-2.8.12.alpha0/doc/prog/node35.html
I know that some questions on StackOverflow are related too. For example:
About interfacing an existing C library
C API vs Cython
I haven't used all these methods although I have investigated them all at one point or another...
The Python C API: For writing C code that compiles to a Python module that can be imported in Python, or for writing a Python module that acts as "glue" code to interface with some C library.
scipy.weave: Allows you to embed bits of C code in your Python code; if you're using NumPy and SciPy for numeric work, look into this. The C code is passed as a string, e.g. weave.inline('printf("%s", foo)').
ctypes: A Python module that allows you to call into C code from your Python code. You basically load the shared library and then make calls into its API. Some work is needed to marshal data into and out of those calls. If you're looking at using an existing C library that you or someone else wrote, I'd start here.
pyrex/cython: Allows you to write Python code (using some special syntax) that gets translated into C code (which can be imported as a Python module) and therefore runs faster than it would through the Python interpreter. This is similar to the "Python C API" route, except that the C code is generated for you. Useful if you have a chunk of code that is your bottleneck and is really slow: rewrite that function using Cython and import it from the calling code.
SWIG: Generates wrapper code for a C/C++ library. You should end up with a python module you can import and use.
Boost.Python: This is the one I know the least about. Looks to me like it's similar to SWIG although you write the wrapper layer yourself, but with a lot of help from Boost macros/functions.
Psyco: Speeds up your python code a bit, I've never had much luck with this. I wouldn't waste your time with it. Profile your code, find your bottlenecks and speed them up using one of the above techniques.
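"Profile your code, find your bottlenecks" can be done with the standard library's cProfile and pstats. A minimal sketch; sum_of_squares is a made-up stand-in for whatever inner loop you suspect, and the report sorted by cumulative time points at the function worth moving to C (or Cython).

```python
import cProfile
import io
import pstats


def sum_of_squares(n):
    """Deliberately naive inner loop: the kind of hotspot worth moving to C."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload():
    return [sum_of_squares(50_000) for _ in range(5)]


profiler = cProfile.Profile()
result = profiler.runcall(workload)  # profile a single call of workload()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)
```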
This is only a brief answer to a portion of your question, but:
ctypes is probably best when you have a preexisting C library that you want to use with Python.
The Python C API is best when you either want to write something in C that utilizes aspects of Python, or want to write an extension for Python in C. (Cython is another way of doing this.)
Of course, both of those are likely elaborated on in much more detail in some of the answers to the SO questions you link to in your question.

Is there a python wrapper for a FastLZ implementation

Looking to use FastLZ in Python, or something similar. Tried Google and didn't find anything. Wondering if there is another algorithm with similar performance available in Python?
What about using ctypes to call directly into fastlz.so (or .dll as the case may be)? It seems to have only 3 entry points, so wrapping them in ctypes should not be hard. Yes, SWIG or a custom C API wrapper should be almost as trivial, but ctypes lets you start experimenting right now even if you don't have a compiler (as long as you can get a working DLL/so of FastLZ for your platform)... hard to beat!-)
Blosc exposes FastLZ and several other compressors in Python.

Prototyping with Python code before compiling

I have been mulling over writing a peak-fitting library for a while. I know Python fairly well and plan on implementing everything in Python to begin with but envisage that I may have to re-implement some core routines in a compiled language eventually.
IIRC, one of Python's original remits was as a prototyping language, however Python is pretty liberal in allowing functions, functors, objects to be passed to functions and methods, whereas I suspect the same is not true of say C or Fortran.
What should I know about designing functions/classes which I envisage will have to interface with the compiled language? And how many of these potential problems are dealt with by libraries such as ctypes, bgen, SWIG, Boost.Python, Cython or Python SIP?
For this particular use case (a fitting library), I imagine allowing users to define mathematical functions (Gaussian, Lorentzian, etc.) as Python functions, which can then be passed to and interpreted by the compiled fitting library. Passing and returning arrays is also essential.
Finally, a question that I can really give a valuable answer to :).
I have investigated f2py, boost.python, swig, cython and pyrex for my work (PhD in optical measurement techniques). I used swig extensively, boost.python some and pyrex and cython a lot. I also used ctypes. This is my breakdown:
Disclaimer: This is my personal experience. I am not involved with any of these projects.
swig:
Does not play well with C++. It should, but name-mangling problems in the linking step were a major headache for me on Linux and Mac OS X. If you have C code and want it interfaced to Python, it is a good solution. I wrapped the GTS for my needs and basically had to write a C shared library that I could connect to. I would not recommend it.
ctypes:
I wrote a libdc1394 (IEEE camera library) wrapper using ctypes and it was a very straightforward experience. You can find the code at https://launchpad.net/pydc1394. It is a lot of work to convert headers to Python code, but then everything works reliably. This is a good way to interface an external library. ctypes is also in the Python standard library, so everyone can use your code right away. It is also a good way to play around with a new library in Python quickly. I can recommend it for interfacing external libraries.
Boost.Python: Very enjoyable. If you already have C++ code of your own that you want to use in Python, go for this. It is very easy to translate C++ class structures into Python class structures this way. I recommend it if you have C++ code that you need in Python.
Pyrex/Cython: Use Cython, not Pyrex. Period. Cython is more advanced and more enjoyable to use. Nowadays, I do everything with Cython that I used to do with SWIG or ctypes. It is also the best way if you have Python code that runs too slowly. The process is absolutely fantastic: you convert your Python modules into Cython modules, build them and keep profiling and optimizing as if it were still Python (no change of tools needed). You can then mix in as much (or as little) C code with your Python code as you need. This is by far faster than having to rewrite whole parts of your application in C; you only rewrite the inner loop.
Timings: ctypes has the highest call overhead (~700ns), followed by boost.python (322ns), then directly by swig (290ns). Cython has the lowest call overhead (124ns) and the best feedback where it spends time on (cProfile support!). The numbers are from my box calling a trivial function that returns an integer from an interactive shell; module import overhead is therefore not timed, only function call overhead is. It is therefore easiest and most productive to get python code fast by profiling and using cython.
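The figures above come from the author's machine. A rough way to measure the same kind of per-call overhead on your own box, assuming a POSIX system for the ctypes row (libc's abs stands in for "a trivial function that returns an integer"); the Cython, SWIG and Boost.Python rows would need their own compiled modules, so they are left out.

```python
import ctypes
import timeit

# Assumption: POSIX system, where CDLL(None) exposes the already-loaded C library.
libc = ctypes.CDLL(None)
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def py_abs(x):
    """Trivial pure-Python function, for comparison."""
    return x if x >= 0 else -x


CALLS = 100_000
candidates = {
    "builtin abs (C via the C API)": abs,
    "pure-Python function": py_abs,
    "ctypes call into libc abs": libc.abs,
}

ns_per_call = {}
for label, fn in candidates.items():
    # Best of 3 runs; "fn" is resolved from the globals dict passed to timeit.
    best = min(timeit.repeat("fn(-5)", globals={"fn": fn}, number=CALLS, repeat=3))
    ns_per_call[label] = best / CALLS * 1e9

for label, ns in ns_per_call.items():
    print(f"{label:32s} {ns:8.1f} ns/call")
```

Expect the absolute numbers to differ by machine and Python version; the timeit statement itself adds a small constant to every row, so compare the rows against each other rather than against the figures quoted above.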
Summary: For your problem, use Cython ;). I hope this rundown will be useful for some people. I'll gladly answer any remaining question.
Edit: I forgot to mention: for numerical purposes (that is, connecting to NumPy), use Cython; it has support for that (its developers basically develop it for this purpose). So this should be another +1 for your decision.
I haven't used SWIG or SIP, but I find writing Python wrappers with boost.python to be very powerful and relatively easy to use.
I'm not clear on what your requirements are for passing types between C/C++ and python, but you can do that easily by either exposing a C++ type to python, or by using a generic boost::python::object argument to your C++ API. You can also register converters to automatically convert python types to C++ types and vice versa.
If you plan use boost.python, the tutorial is a good place to start.
I have implemented something somewhat similar to what you need. I have a C++ function that accepts a Python function and an image as arguments, and applies the Python function to each pixel in the image.
Image* unary(boost::python::object op, Image& im)
{
    Image* out = new Image(im.width(), im.height(), im.channels());
    for (unsigned int i = 0; i < im.size(); i++)
    {
        // Call the Python callable on each pixel and convert the result back to float.
        (*out)[i] = boost::python::extract<float>(op(im[i]));
    }
    return out;
}
In this case, Image is a C++ object exposed to Python (an image with float pixels), and op is a Python-defined function (or really any Python object with a __call__ attribute). You can then use this function as follows (assuming unary is located in a module called image that also contains Image and a load function):
import image
im = image.load('somefile.tiff')
double_im = image.unary(lambda x: 2.0*x, im)
As for using arrays with boost, I personally haven't done this, but I know the functionality to expose arrays to python using boost is available - this might be helpful.
The best way to plan for an eventual transition to compiled code is to write the performance sensitive portions as a module of simple functions in a functional style (stateless and without side effects), which accept and return basic data types.
This will provide a one-to-one mapping from your Python prototype code to the eventual compiled code, and will let you use ctypes easily and avoid a whole bunch of headaches.
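To illustrate the one-to-one mapping, here is a stateless model function whose signature uses only plain floats, next to the C prototype it would become. The prototype double gaussian(double x, double amplitude, double mu, double sigma) is hypothetical. Wrapping the Python function in a ctypes CFUNCTYPE is a cheap check that the signature maps onto C types, and it also yields a real function pointer that a compiled fitting routine could call back into, which is one way to let users keep defining models in Python.

```python
import ctypes
import math

# Hypothetical C prototype this Python function is meant to mirror:
#   double gaussian(double x, double amplitude, double mu, double sigma);


def gaussian(x, amplitude, mu, sigma):
    """Stateless model function: plain floats in, a plain float out."""
    return amplitude * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


# C function-pointer type matching the prototype above.
MODEL_FN = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double, ctypes.c_double,
                            ctypes.c_double, ctypes.c_double)

# A C-callable pointer to the Python function; keep a reference to it for as
# long as any C code might call it.
c_gaussian = MODEL_FN(gaussian)

print(c_gaussian(1.0, 2.0, 0.0, 1.0))                 # same value as gaussian(1.0, 2.0, 0.0, 1.0)
print(ctypes.cast(c_gaussian, ctypes.c_void_p).value)  # address a C routine could receive
```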
For peak fitting, you'll almost certainly need to use arrays, which will complicate things a little, but is still very doable with ctypes.
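For arrays, one pattern that avoids copying is to keep the data in a Python-owned buffer (array.array here; a NumPy array works similarly) and give C a ctypes view of the same memory. In this sketch ctypes.memmove, which is the C library's memmove exposed by ctypes, stands in for a C routine that fills the caller's buffer; a real fitting library would receive c_ptr as its double * argument.

```python
import array
import ctypes

# Python-owned storage, e.g. the y-values a fitting routine should fill in.
samples = array.array("d", [0.0] * 4)

# Zero-copy view of the same memory as a C double[4]; no data is copied.
DoubleArray4 = ctypes.c_double * len(samples)
c_view = DoubleArray4.from_buffer(samples)

# What a C parameter declared as `double *` would receive.
c_ptr = ctypes.cast(c_view, ctypes.POINTER(ctypes.c_double))

# Stand-in for a C routine writing into the caller's buffer.
source = (ctypes.c_double * 4)(1.0, 2.5, 4.0, 8.0)
ctypes.memmove(c_ptr, source, ctypes.sizeof(source))

print(samples)   # array('d', [1.0, 2.5, 4.0, 8.0])
print(c_ptr[2])  # 4.0
```

Because the view shares memory with the array, the array should not be resized while the view is alive (CPython refuses such a resize with a BufferError).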
If you really want to use more complicated data structures, or modify the passed arguments, SWIG or Python's standard C-extension interface will let you do what you want, but with some amount of hassle.
For what you're doing, you may also want to check out NumPy, which might do some of the work you would want to push to C, as well as offering some additional help in moving data back and forth between Python and C.
f2py (part of numpy) is a simpler alternative to SWIG and boost.python for wrapping C/Fortran number-crunching code.
In my experience, there are two easy ways to call into C code from Python code. There are other approaches, all of which are more annoying and/or verbose.
The first and easiest is to compile a bunch of C code as a separate shared library and then call functions in that library using ctypes. Unfortunately, passing anything other than basic data types is non-trivial.
The second easiest way is to write a Python module in C and then call functions in that module. You can pass anything you want to these C functions without having to jump through any hoops. And it's easy to call Python functions or methods from these C functions, as described here: https://docs.python.org/extending/extending.html#calling-python-functions-from-c
I don't have enough experience with SWIG to offer intelligent commentary. And while it is possible to do things like pass custom Python objects to C functions through ctypes, or to define new Python classes in C, these things are annoying and verbose and I recommend taking one of the two approaches described above.
Python is pretty liberal in allowing functions, functors, objects to be passed to functions and methods, whereas I suspect the same is not true of say C or Fortran.
In C you cannot pass a function as an argument to a function, but you can pass a function pointer, which is just as good.
I don't know how much that would help when you are trying to integrate C and Python code but I just wanted to clear up one misconception.
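On the Python side, ctypes models a C function pointer as a CFUNCTYPE type, and a Python function wrapped in one can be handed to C code that expects a callback. A minimal sketch, assuming a POSIX system where ctypes.CDLL(None) exposes the C library's qsort; it is close to the callback example in the ctypes documentation.

```python
import ctypes

# Assumption: POSIX system, where CDLL(None) exposes the already-loaded C library.
libc = ctypes.CDLL(None)

# int (*compar)(const void *, const void *), typed here as pointers to int.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int))

libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def py_compare(a, b):
    # a and b point into the array being sorted; return <0, 0 or >0.
    return (a[0] > b[0]) - (a[0] < b[0])


compare = CMPFUNC(py_compare)  # keep a reference while C may call it

values = (ctypes.c_int * 6)(5, -1, 7, 33, 0, 12)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), compare)
print(list(values))  # [-1, 0, 5, 7, 12, 33]
```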
In addition to the tools above, I can recommend using Pyrex (for creating Python extension modules) or Psyco (as a JIT compiler for Python).
