I have a python class that has a couple performance-sensitive methods that justify being implemented in C. But it also has some methods that don't need to be fast and that would be a giant pain to write in C.
Is there a standard way to have the best of both worlds, where a few core methods are defined in C but some convenience methods are defined in python?
(Ideally it should work for special methods like __str__.)
For example, maybe I could use inheritance. Is that the right way to do it? Are there performance costs?
Try Cython. It really does a fantastic job blending the best features of both languages. No longer do you have to decide between control and performance, and efficiency and ease of development.
If the C code doesn't need to interact with the object itself, possibly you could use the ctypes module to call C functions from your python code.
Put your C code in into a shared library or DLL and then call it from your method.
Related
How exactly can Python call a C library? Tensorflow, for example, I believe is written mostly in C, but can be used from Python. I'm thinking of implementing something like this in my own (interpreted) programming language (written in Go, but I assume it would be a similar process).
What happens when a Python program calls a C function? I'm thinking either RPC or DLLs, but both of them seem unlikely.
cPython has two main ways to call C code: either by loading a shared library and calling its symbols, or by packing C code as Python binary modules and then calling them from Python code as though they were ordinary Python modules, which is how high performance stuff in the standard library is implemented - e.g. json.
Loading a shared library and calling functions from it using the ctypes module is rather trivial, and you can find a lot of examples here: https://docs.python.org/3/library/ctypes.html
Packing your C code as binary Python module requires a lot of boilerplate and careful attention to details such as ref counting, null pointers, etc, and is documented here: https://docs.python.org/2/extending/extending.html
There are several libraries that automate the process and generate binding code for you. One example is boost.python: https://www.boost.org/doc/libs/1_65_0/libs/python/doc/html/tutorial/index.html
I have heard many times that C and Python/Ruby code can be integrated.
Now, my question is, can I use, for example a Python/Ruby ORM from within C?
Yes, but the API would be unlikely to be very nice, especially because the point of an ORM is to return objects and C doesn't have objects, hence making access to the nice OOP API unwieldy.
Even in C++ is would be problematic as the objects would be Python/Ruby objects and the values Python/Ruby objects/values, and you would need to convert back and forth.
You would be better off using a nice database layer especially made for C.
For Ruby, yes, you can by using the Ruby C API. After including ruby.h you can use rb_funcall:
To invoke methods directly, you can use the function below
VALUE rb_funcall(VALUE recv, ID mid, int argc, ...)
This function invokes a method on the recv, with the method name specified by the symbol mid.
This will allow you to call any Ruby method, and thus use any Ruby code from C. It won’t be pretty, though. There are a lot of good resources in SO’s Ruby C API tag wiki.
Is there a difference (in terms of execution time) between implementing a function in Python and implementing it in C and then calling it from Python? If so, why?
Python (at least the "standard" CPython implementation) never actually compiles to native machine code; it compiles to bytecode which is then interpreted. So a C function which is in fact compiled to machine code will run faster; the question is whether it will make a relevant difference. So what's the actual problem you're trying to solve?
If I understand and restate your question properly, you are asking, if wrapping python over a c executable be anyway faster than a pure python module itself? The answer to it is, it depends upon the executable and the kind of task you are performing.
There are a set of modules in Python that are written using Python C-API's. Performance of those would be comparable to wrapping a C executable
On the other hand, wrapping c program would be faster than pure python both implementing the same functionality with a sane logic. Compare difflib usage vs wrapping subprocess over diff.
The C version is often faster, but not always. One of the main points of speedup is that C code does not have to look up values dynamically, like Python (Python has reference semantics). A good example for this is Numpy. Numpy arrays are typed, all values in the array have the same type, and are internally stored in continuous block of memory. This is the main reason that numpy is so much faster, because it skips all the dynamic variable lookup that Python has to do. The most efficient C implementation of an algorithm can become very slow if it operates on Python data structures, where each value has to be looked up dynamically.
A good way to implement such things yourself and save all the hassle of Python C-APIs is to use Cython.
Typically, a function written in C will be substantially faster that the Python equivalent. It is also much more difficult to integrate, since it involves:
compiling C code that #includes the Python headers and exposes appropriate wrapper code so that it is callable from Python;
linking against the correct Python libraries;
deploying the resulting shared library to the appropriate location, so that your Python code can import it.
You would want to be very certain that the benefits outweigh the costs before trying this, which means this should only be reserved for performance-critical sections of your code that you simply can't make fast enough with pure Python.
If you really need to go down this path, Boost.Python can make the task much less painful.
My first "serious" language was Java, so I have comprehended object-oriented programming in sense that elemental brick of program is a class.
Now I write on VBA and Python. There are module languages and I am feeling persistent discomfort: I don't know how should I decompose program in a modules/classes.
I understand that one module corresponds to one knowledge domain, one module should ba able to test separately...
Should I apprehend module as namespace(c++) only?
I don't do VBA but in python, modules are fundamental. As you say, the can be viewed as namespaces but they are also objects in their own right. They are not classes however, so you cannot inherit from them (at least not directly).
I find that it's a good rule to keep a module concerned with one domain area. The rule that I use for deciding if something is a module level function or a class method is to ask myself if it could meaningfully be used on any objects that satisfy the 'interface' that it's arguments take. If so, then I free it from a class hierarchy and make it a module level function. If its usefulness truly is restricted to a particular class hierarchy, then I make it a method.
If you need it work on all instances of a class hierarchy and you make it a module level function, just remember that all the the subclasses still need to implement the given interface with the given semantics. This is one of the tradeoffs of stepping away from methods: you can no longer make a slight modification and call super. On the other hand, if subclasses are likely to redefine the interface and its semantics, then maybe that particular class hierarchy isn't a very good abstraction and should be rethought.
It is matter of taste. If you use modules your 'program' will be more procedural oriented. If you choose classes it will be more or less object oriented. I'm working with Excel for couple of months and personally I choose classes whenever I can because it is more comfortable to me. If you stop thinking about objects and think of them as Components you can use them with elegance. The main reason why I prefer classes is that you can have it more that one. You can't have two instances of module. It allows me use encapsulation and better code reuse.
For example let's assume that you like to have some kind of logger, to log actions that were done by your program during execution. You can write a module for that. It can have for example a global variable indicating on which particular sheet logging will be done. But consider the following hypothetical situation: your client wants you to include some fancy report generation functionality in your program. You are smart so you figure out that you can use your logging code to prepare them. But you can't do log and report simultaneously by one module. And you can with two instances of logging Component without any changes in their code.
Idioms of languages are different and thats the reason a problem solved in different languages take different approaches.
"C" is all about procedural decomposition.
Main idiom in Java is about "class or Object" decomposition. Functions are not absent, but they become a part of exhibited behavior of these classes.
"Python" provides support for both Class based problem decomposition as well as procedural based.
All of these uses files, packages or modules as concept for organizing large code pieces together. There is nothing that restricts you to have one module for one knowledge domain.
These are decomposition and organizing techniques and can be applied based on the problem at hand.
If you are comfortable with OO, you should be able to use it very well in Python.
VBA also allows the use of classes. Unfortunately, those classes don't support all the features of a full-fleged object oriented language. Especially inheritance is not supported.
But you can work with interfaces, at least up to a certain degree.
I only used modules like "one module = one singleton". My modules contain "static" or even stateless methods. So in my opinion a VBa module is not namespace. More often a bunch of classes and modules would form a "namespace". I often create a new project (DLL, DVB or something similar) for such a "namespace".
I have been mulling over writing a peak-fitting library for a while. I know Python fairly well and plan on implementing everything in Python to begin with but envisage that I may have to re-implement some core routines in a compiled language eventually.
IIRC, one of Python's original remits was as a prototyping language, however Python is pretty liberal in allowing functions, functors, objects to be passed to functions and methods, whereas I suspect the same is not true of say C or Fortran.
What should I know about designing functions/classes which I envisage will have to interface into the compiled language? And how much of these potential problems are dealt with by libraries such as cTypes, bgen, SWIG, Boost.Python, Cython or Python SIP?
For this particular use case (a fitting library), I imagine allowing users to define mathematical functions (Guassian, Lorentzian etc.) as Python functions which can then to be passed an interpreted by the compiled code fitting library. Passing and returning arrays is also essential.
Finally a question that I can really put a value answer to :).
I have investigated f2py, boost.python, swig, cython and pyrex for my work (PhD in optical measurement techniques). I used swig extensively, boost.python some and pyrex and cython a lot. I also used ctypes. This is my breakdown:
Disclaimer: This is my personal experience. I am not involved with any of these projects.
swig:
does not play well with c++. It should, but name mangling problems in the linking step was a major headache for me on linux & Mac OS X. If you have C code and want it interfaced to python, it is a good solution. I wrapped the GTS for my needs and needed to write basically a C shared library which I could connect to. I would not recommend it.
Ctypes:
I wrote a libdc1394 (IEEE Camera library) wrapper using ctypes and it was a very straigtforward experience. You can find the code on https://launchpad.net/pydc1394. It is a lot of work to convert headers to python code, but then everything works reliably. This is a good way if you want to interface an external library. Ctypes is also in the stdlib of python, so everyone can use your code right away. This is also a good way to play around with a new lib in python quickly. I can recommend it to interface to external libs.
Boost.Python: Very enjoyable. If you already have C++ code of your own that you want to use in python, go for this. It is very easy to translate c++ class structures into python class structures this way. I recommend it if you have c++ code that you need in python.
Pyrex/Cython: Use Cython, not Pyrex. Period. Cython is more advanced and more enjoyable to use. Nowadays, I do everything with cython that i used to do with SWIG or Ctypes. It is also the best way if you have python code that runs too slow. The process is absolutely fantastic: you convert your python modules into cython modules, build them and keep profiling and optimizing like it still was python (no change of tools needed). You can then apply as much (or as little) C code mixed with your python code. This is by far faster then having to rewrite whole parts of your application in C; you only rewrite the inner loop.
Timings: ctypes has the highest call overhead (~700ns), followed by boost.python (322ns), then directly by swig (290ns). Cython has the lowest call overhead (124ns) and the best feedback where it spends time on (cProfile support!). The numbers are from my box calling a trivial function that returns an integer from an interactive shell; module import overhead is therefore not timed, only function call overhead is. It is therefore easiest and most productive to get python code fast by profiling and using cython.
Summary: For your problem, use Cython ;). I hope this rundown will be useful for some people. I'll gladly answer any remaining question.
Edit: I forget to mention: for numerical purposes (that is, connection to NumPy) use Cython; they have support for it (because they basically develop cython for this purpose). So this should be another +1 for your decision.
I haven't used SWIG or SIP, but I find writing Python wrappers with boost.python to be very powerful and relatively easy to use.
I'm not clear on what your requirements are for passing types between C/C++ and python, but you can do that easily by either exposing a C++ type to python, or by using a generic boost::python::object argument to your C++ API. You can also register converters to automatically convert python types to C++ types and vice versa.
If you plan use boost.python, the tutorial is a good place to start.
I have implemented something somewhat similar to what you need. I have a C++ function that
accepts a python function and an image as arguments, and applies the python function to each pixel in the image.
Image* unary(boost::python::object op, Image& im)
{
Image* out = new Image(im.width(), im.height(), im.channels());
for(unsigned int i=0; i<im.size(); i++)
{
(*out)[i] == extract<float>(op(im[i]));
}
return out;
}
In this case, Image is a C++ object exposed to python (an image with float pixels), and op is a python defined function (or really any python object with a __call__ attribute). You can then use this function as follows (assuming unary is located in the called image that also contains Image and a load function):
import image
im = image.load('somefile.tiff')
double_im = image.unary(lambda x: 2.0*x, im)
As for using arrays with boost, I personally haven't done this, but I know the functionality to expose arrays to python using boost is available - this might be helpful.
The best way to plan for an eventual transition to compiled code is to write the performance sensitive portions as a module of simple functions in a functional style (stateless and without side effects), which accept and return basic data types.
This will provide a one-to-one mapping from your Python prototype code to the eventual compiled code, and will let you use ctypes easily and avoid a whole bunch of headaches.
For peak fitting, you'll almost certainly need to use arrays, which will complicate things a little, but is still very doable with ctypes.
If you really want to use more complicated data structures, or modify the passed arguments, SWIG or Python's standard C-extension interface will let you do what you want, but with some amount of hassle.
For what you're doing, you may also want to check out NumPy, which might do some of the work you would want to push to C, as well as offering some additional help in moving data back and forth between Python and C.
f2py (part of numpy) is a simpler alternative to SWIG and boost.python for wrapping C/Fortran number-crunching code.
In my experience, there are two easy ways to call into C code from Python code. There are other approaches, all of which are more annoying and/or verbose.
The first and easiest is to compile a bunch of C code as a separate shared library and then call functions in that library using ctypes. Unfortunately, passing anything other than basic data types is non-trivial.
The second easiest way is to write a Python module in C and then call functions in that module. You can pass anything you want to these C functions without having to jump through any hoops. And it's easy to call Python functions or methods from these C functions, as described here: https://docs.python.org/extending/extending.html#calling-python-functions-from-c
I don't have enough experience with SWIG to offer intelligent commentary. And while it is possible to do things like pass custom Python objects to C functions through ctypes, or to define new Python classes in C, these things are annoying and verbose and I recommend taking one of the two approaches described above.
Python is pretty liberal in allowing functions, functors, objects to be passed to functions and methods, whereas I suspect the same is not true of say C or Fortran.
In C you cannot pass a function as an argument to a function but you can pass a function pointer which is just as good a function.
I don't know how much that would help when you are trying to integrate C and Python code but I just wanted to clear up one misconception.
In addition to the tools above, I can recommend using Pyrex
(for creating Python extension modules) or Psyco (as JIT compiler for Python).