C++ extension to Python - safe memory access, and memory layout


I'm extending Python code with C++ functions acting on (very large) NumPy arrays.
For legacy reasons I currently have both PyBind and Python API functions, both for Python 3.6 and above.
Whenever I access memory through a pointer, I want to be sure that the memory layout corresponds exactly to the C++ array I expect under that pointer.
I found that a transposed array has exactly the same content under the pointer in both cases. I also found that subarrays sent via the Python API give exactly the same pointer in C++ as the full array would. In the course of development and testing I believe I also observed weirder cases, but I can no longer reproduce them.
I have not found any recipes on the internet so far. My solution is to make a copy of every input array in Python, like
f(a.copy(), b.copy())
It seems to work well.
Is this an optimal/sufficient solution?
I do not have any limitations on how input arrays have been produced. Transpose, subarray, reshape, in any combinations.

With pybind11, you can use the py::array::c_style flag as described in Matt Eding’s link. Numpy’s C API provides much the same functionality via the NPY_ARRAY_C_CONTIGUOUS flag. In either case, the array will be copied implicitly if needed to satisfy the layout requirements; if you prefer to reject such arguments (to avoid silent inefficiency), you’ll have to check the array’s flags yourself.
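
The layout issues from the question can also be observed and handled on the Python side with NumPy alone. The following is a minimal sketch, not code from the answer: it shows that a transpose shares its buffer (and start address) with the original array, and it uses np.ascontiguousarray, which copies only when needed, in place of an unconditional .copy(). The helper name as_c_layout is hypothetical.

```python
import numpy as np

def as_c_layout(arr, dtype=np.float64):
    """Return arr as a C-contiguous array of the given dtype.

    Hypothetical helper: a copy is made only if layout or dtype require it.
    """
    return np.ascontiguousarray(arr, dtype=dtype)

a = np.arange(12, dtype=np.float64).reshape(3, 4)
t = a.T          # transposed view: same buffer, different strides
s = a[:, 1:3]    # sub-array view: not contiguous

# The views share memory with `a`; the transpose even starts at the same
# address, so a raw pointer alone says nothing about shape or strides.
shares_buffer = np.shares_memory(a, t) and np.shares_memory(a, s)
same_start = t.ctypes.data == a.ctypes.data

a_c = as_c_layout(a)   # already C-contiguous float64: no copy is needed
t_c = as_c_layout(t)   # copied, now laid out row by row
s_c = as_c_layout(s)   # copied as well
```

To reject non-contiguous input instead of copying it, the same flags can be checked explicitly, for example with `arr.flags['C_CONTIGUOUS']`, before calling into C++.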

Related

Interpret Python bytecode in C# (with fine control)

For a project idea of mine, I have the following need, which is quite precise:
I would like to be able to execute Python code (pre-compiled beforehand if necessary) on a per-bytecode-instruction basis. I also need to access what's inside the Python VM (frame stack, data stacks, etc.). Ideally, I would also like to remove a lot of Python built-in features and reimplement a few of them my own way (such as file writing).
All of this must be coded in C# (I'm using Unity).
I'm okay with losing a few of Python's actual features, especially concerning complicated stuff with imports, etc. However, I would like most of it to stay intact.
I looked a little bit into IronPython's code but it remains very obscure to me and it seems quite enormous too. I began translating Byterun (a Python bytecode interpreter written in Python) but I face a lot of difficulties as Byterun leverages a lot of Python's features to... interpret Python.
Today, I don't ask for a pre-made solution (except if you have one in mind?), but rather for some advice, places to look at, etc. Do you have any ideas about the things I should research first?

I've tried to do my own implementation of the Python VM in the distant past and learned a lot but never came even close to a fully working implementation. I used the C implementation as a starting point, specifically everything in https://github.com/python/cpython/tree/main/Objects and
https://github.com/python/cpython/blob/main/Python/ceval.c (look for switch(opcode))
Here are some pointers:
Come to grips with the Python object model. Implement an abstract PyObject class with the necessary methods for instancing, attribute access, indexing and slicing, calling, comparisons, arithmetic operations and representation. Provide concrete implementations for None, booleans, ints, floats, strings, tuples, lists and dictionaries.
Implement the core of your VM: a Frame object that loops over the opcodes and dispatches, using a giant switch statement (following the C implementation here), to the corresponding methods of the PyObject. The frame should maintain a stack of PyObjects for the operands of the opcodes. Depending on the opcode, arguments are popped from and pushed onto this stack. A dict can be used to store and retrieve local variables. Use the Frame object to create a PyObject for function objects.
Get familiar with the idea of a namespace and the way Python builds on the concept of namespaces. Implement a module, a class and an instance object, using the dict to map (attribute) names to objects.
Finally, add as many builtin functions as you think you need to get a useful implementation.
I think it is easy to underestimate the amount of work you're getting yourself into, but ... have fun!
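
The Frame idea above can be illustrated with a very small sketch. It is written in Python for brevity (the same structure carries over to C#), and the handful of opcodes is a made-up subset chosen for illustration: real CPython bytecode has many more instructions and changes between versions. The step method executes exactly one instruction, which is the per-instruction control the question asks for.

```python
class Frame:
    """Toy frame: an operand stack, a dict of locals and a program counter.

    The opcode names below are an illustrative subset, not CPython's
    actual instruction set.
    """

    def __init__(self, code):
        self.code = code          # list of (opname, argument) pairs
        self.stack = []           # operand stack
        self.locals = {}          # local variables by name
        self.pc = 0               # index of the next instruction
        self.return_value = None

    def step(self):
        """Execute exactly one instruction; return True once finished."""
        opname, arg = self.code[self.pc]
        self.pc += 1
        # Dispatch on the opcode, like the big switch in ceval.c.
        if opname == "LOAD_CONST":
            self.stack.append(arg)
        elif opname == "LOAD_NAME":
            self.stack.append(self.locals[arg])
        elif opname == "STORE_NAME":
            self.locals[arg] = self.stack.pop()
        elif opname == "BINARY_ADD":
            right = self.stack.pop()
            left = self.stack.pop()
            self.stack.append(left + right)
        elif opname == "BINARY_MULTIPLY":
            right = self.stack.pop()
            left = self.stack.pop()
            self.stack.append(left * right)
        elif opname == "RETURN_VALUE":
            self.return_value = self.stack.pop()
            return True
        else:
            raise NotImplementedError(opname)
        return False

    def run(self):
        """Step until the program returns."""
        while not self.step():
            pass
        return self.return_value


# Hand-written program equivalent to: x = 2; return (x + 3) * x
program = [
    ("LOAD_CONST", 2), ("STORE_NAME", "x"),
    ("LOAD_NAME", "x"), ("LOAD_CONST", 3), ("BINARY_ADD", None),
    ("LOAD_NAME", "x"), ("BINARY_MULTIPLY", None),
    ("RETURN_VALUE", None),
]
result = Frame(program).run()
```

Because all interpreter state lives in the Frame object, the stack and locals can be inspected between any two calls to step.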

Is there a Python library that would have numpy-like data types

I'm a C developer who needs to access an FTDI device using the pyftdi library in order to manipulate the remote registers of the slaves. I found that it is impossible to execute bitwise operations (bitshift, NOT, AND and OR) on things other than int in Python, but I found that the NumPy library has data types that enable such functionality. My problem is that NumPy is very heavy on resources and I wonder if there is an alternative to such a heavy library.
I've tried the native BitField of PyFtdi, which doesn't have such functionality, and the ctypes library, which doesn't either.
This is the kind of code where I would like not to use NumPy for:
import numpy as np

def set_bit(variable, bit_ID):
    variable |= np.ubyte(1 << bit_ID)
    return variable
Again, the main issue is not that it doesn't work; it works, but it is very heavy on resources, and I need the functionality of 8-bit variables with the bitwise operators without constantly switching data types.
I need that kind of variable to avoid casting multiple times my variables using pyftdi's functions:
Acquiring data from it in the Python-native bytes() and converting it to int.
Then using the bitwise operator and restricting to 8 bits the outputs.
Then converting back to bytes() to send them back via the I2C API of PYFTDI.

In the end, I have resorted to casting between data types each time. It's not the most elegant answer to such a problem, but it'll do for now. If someone ever has a miracle solution, feel free to post it ;)
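
One possible way to limit the casting without NumPy, offered as a sketch that has not been tested against pyftdi: indexing a bytes or bytearray object already yields plain ints, and masking with 0xFF keeps every result within 8 bits. The helper names and the sample data are made up for illustration.

```python
def set_bit(value, bit_id):
    """Set one bit and keep the result within 8 bits."""
    return (value | (1 << bit_id)) & 0xFF

def clear_bit(value, bit_id):
    """Clear one bit and keep the result within 8 bits."""
    return value & ~(1 << bit_id) & 0xFF

def invert(value):
    """8-bit bitwise NOT (Python ints are unbounded, hence the mask)."""
    return ~value & 0xFF

raw = bytes([0x12, 0xF0])        # stand-in for data read from the device
regs = bytearray(raw)            # mutable copy; regs[i] is a plain int
regs[0] = set_bit(regs[0], 0)    # 0x12 -> 0x13
regs[1] = clear_bit(regs[1], 7)  # 0xF0 -> 0x70
payload = bytes(regs)            # back to bytes for sending
```

Working on a bytearray keeps the data in one container from read to write, so the only conversion left is the final bytes(regs).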

create ndarray out of c++ pointer

I created a module in c++ and need to use the results in python.
Already wrote a wrapper and it is working with this code
a = np.empty([r, hn])
for i in xrange(r):
    for j in xrange(hn):
        a[i,j]=self.thisptr.H[i*hn+j]
return a
The code is working, but I think there should be an easier and faster way to handle the pointer data.
Sadly I am not used to python and cython and can't figure it out myself.
Any help would be appreciated. :)

Typed memoryviews (http://docs.cython.org/src/userguide/memoryviews.html) are your friend here.
a = np.empty([r,hn])
# interpret the array as a typed memoryview of shape (r, hn)
# and copy into a
# I've assumed the array is of type double* for the sake of answering the question
a[...] = <double[:r,:hn]>self.thisptr.H
It may well not be a huge amount faster (internally it's a loop pretty similar to what you wrote), but it is easier.
Alternatively, even simpler, just using the example from the documentation (http://docs.cython.org/src/userguide/memoryviews.html#coercion-to-numpy)
a = np.asarray(<double[:r,:hn]>self.thisptr.H)
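
For comparison outside Cython, the same "wrap the pointer, then decide whether to copy" idea can be sketched in plain Python with ctypes and NumPy. The buffer below only stands in for memory owned by the C++ side, and the variable names are illustrative. np.ctypeslib.as_array wraps the pointer without copying, so the resulting array is valid only while the underlying memory is alive; calling .copy() yields an array that owns its data.

```python
import ctypes
import numpy as np

r, hn = 3, 4

# Stand-in for a flat double array owned by the C++ object.
buf = (ctypes.c_double * (r * hn))(*range(r * hn))
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))

# Wrap the raw pointer as an (r, hn) array without copying.
view = np.ctypeslib.as_array(ptr, shape=(r, hn))

# Independent copy that stays valid after the C++ memory is freed.
owned = view.copy()

# Writing through the original buffer is visible in the view only.
buf[0] = 99.0
```

After the last line, view[0, 0] reflects the new value while owned[0, 0] keeps the old one, which is the practical difference between the two approaches.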

A possible approach is to manually write the wrapper in C. The struct of your Python object can contain a pointer to the C++ object. Looking at my code (I did this in 2005), I see that I tested for NULL in C functions that need the C++ object and created it on the fly.
Nice side effect is that you don't have to expose all C++ methods 1:1 to Python and you can adjust the interface to make it more Pythonic. In my wrapper, I stored some additional information in the struct to be able to emulate Python list behaviour and to make loading data into the C++ object more efficient.

Embed python into fortran 90

I was looking at the option of embedding Python into Fortran 90 to add Python functionality to my existing Fortran 90 code. I know that it can be done the other way around, by extending Python with Fortran 90 using f2py from NumPy. But I want to keep my super-optimized main loop in Fortran and add Python to do some additional tasks / evaluate further developments before I implement them in Fortran, and also to ease code maintenance. I am looking for answers to the following questions:
1) Is there a library that already exists from which I can embed python into fortran? (I am aware of f2py and it does it the other way around)
2) How do we take care of data transfer from fortran to python and back?
3) How can callback functionality be implemented? (Let me describe the scenario a bit: I have my main_fortran program in Fortran, which calls the Func1_Python module in Python. Now, from this Func1_Python, I want to call another function, say Func2_Fortran, in Fortran.)
4) What would be the impact of embedding the interpreter of python inside fortran in terms of performance....like loading time, running time, sending data (a large array in double precision) across etc.
Thanks a lot in advance for your help!!
Edit1: I want to set the direction of the discussion right by adding some more information about the work I am doing. I work in scientific computing, so I deal a lot with huge arrays / matrices in double precision and with floating-point operations. There are really very few options other than Fortran to do this work for me. The reason I want to include Python in my code is that I can use NumPy for some basic computations if necessary and extend the capabilities of the code with minimal effort. For example, I can use several available libraries to link Python with some other package (say OpenFoam using the PyFoam library).

1. Don't do it
I know that you want to add Python code inside a Fortran program, instead of having a Python program with Fortran extensions. My first piece of advice is to not do this. Fortran is faster than Python at array arithmetic, but Python is easier to write than Fortran, it's easier to extend Python code with OOP techniques, and Python may have access to libraries that are important to you. You mention having a super-optimized main loop in Fortran; Fortran is great for super-optimized inner loops. The logic for passing a Fortran array around in a Python program with NumPy is much more straightforward than what you would have to do to correctly handle a Python object in Fortran.
When I start a scientific computing project from scratch, I always write first in Python, identify performance bottlenecks, and translate those into Fortran. Being able to test faster Fortran code against validated Python code makes it easier to show that the code is working correctly.
Since you have existing code, extending the Python code with a module made in Fortran will require refactoring, but this process should be straightforward. Separate the initialization code from the main loop, break the loop into logical pieces, wrap each of these routines in a Python function, and then your main Python code can call the Fortran subroutines and interleave these with Python functions as appropriate. In this process, you may be able to preserve a lot of the optimizations you have in your main loop in Fortran. F2PY is a reasonably standard tool for this, so it won't be tough to find people who can help you with whatever problems will arise.
2. System calls
If you absolutely must have Fortran code calling Python code, instead of the other way around, the simplest way to do this is to just have the Fortran code write some data to disk, and run the Python code with a SYSTEM or EXECUTE_COMMAND_LINE. If you use EXECUTE_COMMAND_LINE, you can have the Python code output its result to stdout, and the Fortran code can read it as character data; if you have a lot of output (e.g., a big matrix), it would make more sense for the Python code to output a file that the Fortran code then reads. Disk read/write overhead could wind up being prohibitively significant for this. Also, you would have to write Fortran code to output your data, Python code to read it, Python code to output it again, and Fortran code to re-input the data. This code should be straightforward to write and test, but keeping these four parts in sync as you edit the code may turn into a headache.
(This approach is tried in this Stack Overflow question)
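
The Python half of such a file-based exchange might look like the following sketch. The file names, the text format and the doubling step are placeholders chosen for illustration; the Fortran side would write the input file, launch the script with EXECUTE_COMMAND_LINE, and read the output file afterwards. The demonstration at the bottom fakes the Fortran side so that the example is self-contained.

```python
import os
import tempfile
import numpy as np

def process(in_path, out_path):
    """Read a matrix written by the Fortran side, transform it, write it back."""
    data = np.loadtxt(in_path, ndmin=2)   # whitespace-separated text matrix
    result = 2.0 * data                   # placeholder for the real computation
    np.savetxt(out_path, result)
    return result

# Self-contained demonstration: play the role of the Fortran program.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "to_python.txt")     # hypothetical file name
out_path = os.path.join(workdir, "to_fortran.txt")   # hypothetical file name
np.savetxt(in_path, np.array([[1.0, 2.0], [3.0, 4.0]]))

process(in_path, out_path)
round_trip = np.loadtxt(out_path, ndmin=2)
```

Every call pays for interpreter start-up plus two disk round trips, which is why this approach only suits infrequent exchanges.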
3. Embedding Python in C in Fortran
There is no way that I know of to directly pass a Python object in memory to Fortran. However, Fortran code can call C code, and C code can have Python embedded in it. (See the Python tutorial on extending and embedding.) In general, extending Python (like I recommend in point 1) is preferable to embedding it in C/C++. (See Extending Vs. Embedding: There is Only One Correct Decision.) Getting this to work will be a nightmare, because any communication problems between Python and Fortran could happen between Python and C, or between C and Fortran. I don't know if anyone is actually embedding Python in C in Fortran, and so getting help will be difficult.

I have developed the library Forpy that allows you to use Python in Fortran (embedding).
It uses Fortran C interoperability to call Python C API functions.
While I agree that extending (using Fortran in Python) is often preferable, embedding has its uses:
Large, existing Fortran codes might need a substantial amount of refactoring before they can be used from Python - here embedding can save development time
Replacing a part of an existing code with a Python implementation
Temporarily embedding Python to experiment with a given Fortran code: for example to test alternative algorithms or to extract intermediary results
Besides embedding, Forpy also supports extending Python.
With Forpy you can write a Python extension module entirely in Fortran.
An advantage over existing tools such as f2py is that you can use Python datatypes (e.g. to write a function that takes a Python list as argument or a function that returns a Python dict).
Working with existing, possibly legacy, Fortran codes is often very challenging and I think that developers should have tools at their disposal both for embedding and extending Python.

If you are going to embed Python in Fortran, you will have to do it via Fortran's C interface; that's what ISO_C_BINDING is for. I would caution against embedding Python, not because of the technical difficulty in doing so, but because Python (the language or the community) seems adamantly opposed to Python being used as a subordinate language. The common view is that whatever non-Python language your code is currently written in should be broken up into libraries and used to extend Python, never the other way around. So you will see (as here) more responses trying to convince you that you really don't want to do what you actually want to do than actual technical assistance.
This is not flaming or editorializing or making a moral judgment; this is a simple statement of fact. You will not get help from the Python community if you try to embed Python.
If what you need is functionality beyond what the Fortran language itself supports (e.g. filesystem operations) and you do not specifically need Python and you want a language more expressive than C, you may want to look at embedding Lua instead. Unlike Python, Lua is specifically meant to be embedded so you are likely to face much less social and technical resistance.
There are a number of projects which integrate Fortran and Lua, the most complete one I've seen to date is Aotus. The author is very responsive and the integration process is simple.
Admittedly, this does not answer the original question (how to embed a Python interpreter in a Fortran 90 application) but to be fair, none of the other responses do either. I use Python as my portable general-purpose language of choice these days and I'd really prefer to stick with it when extending our primary products (written in Fortran). For the reasons laid out above, I abandoned my attempts to embed Python and switched to embedding Lua; for social reasons, I feel Lua is a much better technical choice. It's not my first choice but it's workable, at least in my case.
Apologies if I've offended anyone; I'm not trying to pick a fight, just relating my experience when researching this specific topic.

There is a very easy way to do this using f2py. Write your Python method and add it as an input to your Fortran subroutine. Declare it in both the cf2py hook and the type declaration as EXTERNAL and also as its return value type, e.g. REAL*8. Your Fortran code will then have a pointer to the address where the Python method is stored. It will be SLOW AS MOLASSES, but for testing out algorithms it can be useful. I do this often (I port a lot of ancient spaghetti Fortran to Python modules...). It's also a great way to use things like optimised SciPy calls in legacy Fortran.

I have tried out several approaches to solve the problem and I have found one possibly optimal way of doing it. I will briefly list down the approaches and the results.
1) Embedding via system call: Every time we want to access Python from Fortran, we use a system call to execute a Python script and exchange data through files. The speed of this approach is limited by disk reads and writes (in this age of cache-level code optimization, going to disk is a mortal sin). Also, we need to initialize the interpreter every time we want to execute the script, which is a considerable overhead. A simple 4th-order Runge-Kutta method running for 300 timesteps took a whopping 59 seconds to execute.
2) Going from Fortran to Python via C: We use the ISO_C bindings to communicate between Fortran and C, and we embed the Python interpreter inside C. I got parts of it working, but in the meantime I found a better way and dropped this idea. I would still like to evaluate it for the sake of completeness, though.
3) Using f2py to import Fortran subroutines to Python (Extending) :
Here, we take the main loop out of Fortran and code it in Python (this approach is called extending Python with Fortran), and we import all Fortran subroutines into Python using f2py (http://cens.ioc.ee/projects/f2py2e/usersguide/). This gives us the flexibility of having the most important part of any scientific application, i.e. the outermost loop (generally the time loop), in Python, so that we can couple it with other applications. The drawback is that we may have to exchange more data between Fortran and Python than is strictly needed. The same 4th-order Runge-Kutta example took 0.372 seconds to execute.
4) Mimicking Embedding via Extending:
So far we have seen the two pure approaches: embedding (the main loop stays in Fortran and we call Python as needed) and extending (the main loop stays in Python and we call Fortran as needed). There is another way of doing it, which I found to be the best. Transferring parts of the main loop into Python leads to an overhead that may not always be necessary. To get rid of this overhead, we can keep the main loop in Fortran, converted into a subroutine without any other changes, and have a pseudo main loop in Python that just calls the main loop in Fortran, so the program executes as if it were our untouched Fortran program. Whenever necessary, we can use a callback function to come back to Python with the required data, execute a script and go back to Fortran again. With this approach, the 4th-order Runge-Kutta method took 0.083 seconds. I profiled the code and found that initializing and loading the Python interpreter took 0.075 seconds and the program itself took only 0.008 seconds (which includes 300 callbacks to Python). The original Fortran code took 0.007 seconds. So we get almost Fortran-like performance with Python-like flexibility using this approach.
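
The control flow of approach 4 can be sketched in pure Python. The function fortran_main_loop below is only a stand-in for the f2py-wrapped Fortran subroutine (its name, signature and the test equation dy/dt = -y are assumptions made for illustration); the point is that the Python "pseudo main loop" makes a single call and regains control only through the callback.

```python
import math

def fortran_main_loop(y0, dt, n_steps, callback):
    """Stand-in for the wrapped Fortran main loop (hypothetical name).

    Integrates dy/dt = -y with classical 4th-order Runge-Kutta and calls
    back into Python once per timestep.
    """
    def rhs(t, y):
        return -y

    t, y = 0.0, y0
    for step in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + dt / 2, y + dt / 2 * k1)
        k3 = rhs(t + dt / 2, y + dt / 2 * k2)
        k4 = rhs(t + dt, y + dt * k3)
        y += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += dt
        callback(step, t, y)   # control briefly returns to Python here
    return y

history = []

def on_step(step, t, y):
    """Python-side callback: record intermediate results."""
    history.append((t, y))

# Pseudo main loop in Python: one call, 300 callbacks.
final = fortran_main_loop(1.0, 0.01, 300, on_step)
exact = math.exp(-3.0)
```

With a real f2py module, only the callback and this single call would live in Python, which matches the small overhead reported above.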

I've just successfully embedded Python into our in-house ~500 KLOC Fortran program with cffi. An important aspect was not to touch the existing code. The program is written in Fortran 95. I wrote a thin Fortran 2003 wrapper using the iso_c_binding module that simply imports data from the various modules, gets C pointers to those data and/or wraps Fortran types into C structs, puts everything into a single type/struct and sends it off to a C function. This C function happens to be a Python function wrapped with cffi. It unpacks the C struct into a more user-friendly Python object, wraps Fortran arrays as NumPy arrays (no copying) and then either drops into an interactive Python console or runs a Python script, based on user configuration. There is no need to write C code except for a single header file. There is obviously quite some overhead, but this functionality is intended for extensibility, not performance.
I would advise against f2py. It's not maintained well and severely limits the public interface of your Fortran code.

I wrote a library for that, forcallpy. It uses a C layer that embeds functions for interpreting Python expressions, and it focuses on argument passing between Fortran and Python to make script calls as easy as possible (it uses embedded NumPy to map Fortran arrays directly into ndarrays, and uses argument names to determine their types on the C/Python side).
You can see some examples in the documentation at readthedocs.
Laurent.

Prototyping with Python code before compiling

I have been mulling over writing a peak-fitting library for a while. I know Python fairly well and plan on implementing everything in Python to begin with but envisage that I may have to re-implement some core routines in a compiled language eventually.
IIRC, one of Python's original remits was as a prototyping language, however Python is pretty liberal in allowing functions, functors, objects to be passed to functions and methods, whereas I suspect the same is not true of say C or Fortran.
What should I know about designing functions/classes which I envisage will have to interface into the compiled language? And how much of these potential problems are dealt with by libraries such as cTypes, bgen, SWIG, Boost.Python, Cython or Python SIP?
For this particular use case (a fitting library), I imagine allowing users to define mathematical functions (Gaussian, Lorentzian etc.) as Python functions which can then be passed to and interpreted by the compiled fitting library. Passing and returning arrays is also essential.

Finally a question that I can really give a valuable answer to :).
I have investigated f2py, boost.python, swig, cython and pyrex for my work (PhD in optical measurement techniques). I used swig extensively, boost.python some and pyrex and cython a lot. I also used ctypes. This is my breakdown:
Disclaimer: This is my personal experience. I am not involved with any of these projects.
swig:
does not play well with C++. It should, but name mangling problems in the linking step were a major headache for me on Linux and Mac OS X. If you have C code and want it interfaced to Python, it is a good solution. I wrapped the GTS for my needs and basically had to write a C shared library which I could connect to. I would not recommend it.
Ctypes:
I wrote a libdc1394 (IEEE Camera library) wrapper using ctypes and it was a very straightforward experience. You can find the code on https://launchpad.net/pydc1394. It is a lot of work to convert headers to Python code, but then everything works reliably. This is a good way if you want to interface an external library. Ctypes is also in the stdlib of Python, so everyone can use your code right away. This is also a good way to play around with a new lib in Python quickly. I can recommend it to interface to external libs.
Boost.Python: Very enjoyable. If you already have C++ code of your own that you want to use in python, go for this. It is very easy to translate c++ class structures into python class structures this way. I recommend it if you have c++ code that you need in python.
Pyrex/Cython: Use Cython, not Pyrex. Period. Cython is more advanced and more enjoyable to use. Nowadays, I do everything with Cython that I used to do with SWIG or ctypes. It is also the best way if you have Python code that runs too slowly. The process is absolutely fantastic: you convert your Python modules into Cython modules, build them and keep profiling and optimizing as if it were still Python (no change of tools needed). You can then mix in as much (or as little) C code with your Python code as you like. This is far faster than having to rewrite whole parts of your application in C; you only rewrite the inner loop.
Timings: ctypes has the highest call overhead (~700ns), followed by boost.python (322ns), then directly by swig (290ns). Cython has the lowest call overhead (124ns) and the best feedback where it spends time on (cProfile support!). The numbers are from my box calling a trivial function that returns an integer from an interactive shell; module import overhead is therefore not timed, only function call overhead is. It is therefore easiest and most productive to get python code fast by profiling and using cython.
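
A measurement of this kind can be reproduced with the standard timeit module. The sketch below times a trivial pure-Python function as a baseline; the same harness could be pointed at a ctypes, Boost.Python, SWIG or Cython wrapper of an equally trivial function. The results depend on the machine and Python version, so no numbers are claimed here.

```python
import timeit

def trivial():
    """Trivial function returning an integer, as in the measurement above."""
    return 1

n_calls = 200000
# Total time for n_calls invocations, excluding import and setup cost.
total_seconds = timeit.timeit(trivial, number=n_calls)
per_call_ns = total_seconds / n_calls * 1e9
print("{:.0f} ns per call".format(per_call_ns))
```

Passing the callable directly to timeit.timeit avoids timing name lookups in a statement string.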
Summary: For your problem, use Cython ;). I hope this rundown will be useful for some people. I'll gladly answer any remaining question.
Edit: I forgot to mention: for numerical purposes (that is, connection to NumPy) use Cython; they have support for it (because they basically develop Cython for this purpose). So this should be another +1 for your decision.

I haven't used SWIG or SIP, but I find writing Python wrappers with boost.python to be very powerful and relatively easy to use.
I'm not clear on what your requirements are for passing types between C/C++ and python, but you can do that easily by either exposing a C++ type to python, or by using a generic boost::python::object argument to your C++ API. You can also register converters to automatically convert python types to C++ types and vice versa.
If you plan to use boost.python, the tutorial is a good place to start.
I have implemented something somewhat similar to what you need. I have a C++ function that
accepts a python function and an image as arguments, and applies the python function to each pixel in the image.
Image* unary(boost::python::object op, Image& im)
{
    Image* out = new Image(im.width(), im.height(), im.channels());
    for(unsigned int i=0; i<im.size(); i++)
    {
        // call the Python callable on each pixel and store the result
        (*out)[i] = extract<float>(op(im[i]));
    }
    return out;
}
In this case, Image is a C++ object exposed to Python (an image with float pixels), and op is a Python-defined function (or really any Python object with a __call__ attribute). You can then use this function as follows (assuming unary is located in a module called image that also contains Image and a load function):
import image
im = image.load('somefile.tiff')
double_im = image.unary(lambda x: 2.0*x, im)
As for using arrays with boost, I personally haven't done this, but I know the functionality to expose arrays to python using boost is available - this might be helpful.

The best way to plan for an eventual transition to compiled code is to write the performance sensitive portions as a module of simple functions in a functional style (stateless and without side effects), which accept and return basic data types.
This will provide a one-to-one mapping from your Python prototype code to the eventual compiled code, and will let you use ctypes easily and avoid a whole bunch of headaches.
For peak fitting, you'll almost certainly need to use arrays, which will complicate things a little, but is still very doable with ctypes.
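
As an illustration of that style, here is a hedged sketch of what a prototype peak function might look like: stateless, taking arrays and floats and returning a new array. The function names and the sample values are made up. The last lines show what a later ctypes call into a compiled routine would receive, namely a pointer to contiguous double data plus a length.

```python
import ctypes
import numpy as np

def gaussian(x, amplitude, center, sigma):
    """Pure function: arrays and floats in, a new array out, no state."""
    return amplitude * np.exp(-0.5 * ((x - center) / sigma) ** 2)

def residuals(params, x, y):
    """Difference between data and model; also free of side effects."""
    amplitude, center, sigma = params
    return y - gaussian(x, amplitude, center, sigma)

# Contiguous float64 data maps one-to-one onto a C `double *`.
x = np.ascontiguousarray(np.linspace(-1.0, 1.0, 5), dtype=np.float64)
y = gaussian(x, 2.0, 0.0, 0.5)

# What a compiled replacement would be handed through ctypes:
x_ptr = x.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
n_points = ctypes.c_size_t(x.size)
```

Because gaussian and residuals only read their arguments, a compiled version with the same signature can be swapped in and checked against the prototype.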
If you really want to use more complicated data structures, or modify the passed arguments, SWIG or Python's standard C-extension interface will let you do what you want, but with some amount of hassle.
For what you're doing, you may also want to check out NumPy, which might do some of the work you would want to push to C, as well as offering some additional help in moving data back and forth between Python and C.

f2py (part of numpy) is a simpler alternative to SWIG and boost.python for wrapping C/Fortran number-crunching code.

In my experience, there are two easy ways to call into C code from Python code. There are other approaches, all of which are more annoying and/or verbose.
The first and easiest is to compile a bunch of C code as a separate shared library and then call functions in that library using ctypes. Unfortunately, passing anything other than basic data types is non-trivial.
The second easiest way is to write a Python module in C and then call functions in that module. You can pass anything you want to these C functions without having to jump through any hoops. And it's easy to call Python functions or methods from these C functions, as described here: https://docs.python.org/extending/extending.html#calling-python-functions-from-c
I don't have enough experience with SWIG to offer intelligent commentary. And while it is possible to do things like pass custom Python objects to C functions through ctypes, or to define new Python classes in C, these things are annoying and verbose and I recommend taking one of the two approaches described above.

Python is pretty liberal in allowing functions, functors, objects to be passed to functions and methods, whereas I suspect the same is not true of say C or Fortran.
In C you cannot pass a function as an argument to a function, but you can pass a function pointer, which is just as good.
I don't know how much that would help when you are trying to integrate C and Python code but I just wanted to clear up one misconception.
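
On the Python side, ctypes can turn a Python function into exactly such a C function pointer. The sketch below assumes a C signature of double (*)(double) for the model function; the names MODEL_FUNC and lorentzian are illustrative. Calling the wrapper from Python goes through the same C-callable entry point that a compiled fitting routine would use.

```python
import ctypes

# Function-pointer type for the C signature: double (*)(double)
MODEL_FUNC = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)

def lorentzian(x):
    """User-defined model written in plain Python."""
    return 1.0 / (1.0 + x * x)

# Wrap the Python function in a C-callable function pointer. A reference
# to c_model must be kept for as long as compiled code may call it.
c_model = MODEL_FUNC(lorentzian)

# The wrapper can be invoked like any C function of that signature.
value = c_model(1.0)
```

Each call through the pointer re-enters the interpreter, so this is convenient for user-defined models but slow inside tight loops.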

In addition to the tools above, I can recommend using Pyrex
(for creating Python extension modules) or Psyco (as JIT compiler for Python).
