I would like to cythonize the following templated C++ class:
template <typename T>
class Fc2Par
{
public:
    Fc2Par(std::string const& par_file);
    ~Fc2Par();
    std::vector<Box<T>> convert_boxes(std::vector<Box<T>> const& boxes) const;
    std::vector<Point<T>> convert_points(std::vector<Point<T>> const& points) const;
private:
    PartitionMap<T> par_map;
    PartitionRTree<T> par_idx;
};
In reality, T will be [int, double] only. Box/Point are additional templated classes, but I'm not sure if I want to expose those in Python. To cythonize, I have the following, but I'm stuck in some areas. I think I can use a fused type for T?
cimport cython
from libcpp.vector cimport vector
from libcpp.string cimport string
my_fused_type = cython.fused_type(cython.int, cython.double)
cdef extern from 'Fc2Par.h':
    cdef cppclass Fc2Par[T]:
        Fc2Par(string&) except +
        vector[Box[T]] convert_boxes(vector[Box[T]]&)
        vector[Point[T]] convert_points(vector[Point[T]]&)

cdef class PyFc2Par:
    cdef Fc2Par* thisptr  # <-- should this be Fc2Par[my_fused_type]*?
    def __cinit__(self, par_file):
        self.thisptr = new Fc2Par[my_fused_type](par_file)
    def __dealloc__(self):
        del self.thisptr
    def convert_boxes(self, boxes):
        # I'm not sure what to do here?
    def convert_points(self, points):
        # This will be very similar to convert_boxes once I figure that out.
Ideally, I want to use the API in python like this:
boxes_int = [(0,0,1,1), (0,0,2,2), ...]
boxes_float = [(0.0,0.0,1.0,1.0), (0.0,0.0,2.0,2.0), ...]
fc2par = PyFc2Par('foo.csv')
converted_int = fc2par.convert_boxes(boxes_int)
converted_float = fc2par.convert_boxes(boxes_float)
Each call returns a list of (xmin, xmax, ymin, ymax) tuples.
My Questions:
Is using a fused type correct in this situation?
If I take a list of tuples, how do I convert them into Box[T]/Point[T] in the Cython code without exposing these classes in Python? Once I have the result, I can convert it back to a list of tuples and return that. I.e., what should the convert_boxes implementation look like?
Thank you for any help.
Question 1 - unfortunately you can't use fused types there. (See previous questions on the subject: c++ class in fused type; Cython: templates in python class wrappers). You have to create a separate wrapper for each different variant. e.g.:
cdef class PyFc2ParInt:
    cdef Fc2Par[int]* thisptr
    # etc ...

cdef class PyFc2ParDouble:
    cdef Fc2Par[double]* thisptr
    # etc ...
This unfortunately involves a lot of unavoidable code duplication.
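If you still want the single PyFc2Par('foo.csv') entry point from the question, one option is a thin plain-Python facade that dispatches to the two typed wrappers. This is only a sketch under a few assumptions of mine: PyFc2ParDouble is defined analogously to PyFc2ParInt, the _int_impl/_dbl_impl names are invented here, the std::string argument is passed as UTF-8 encoded bytes, and building both typed instances up front (i.e. parsing the file twice) is acceptable.
cdef class PyFc2ParInt:
    cdef Fc2Par[int]* thisptr

    def __cinit__(self, par_file):
        # std::string accepts bytes; encoding the Python str here is an assumption
        self.thisptr = new Fc2Par[int](par_file.encode('utf8'))

    def __dealloc__(self):
        del self.thisptr

    # convert_boxes/convert_points follow the outline in the answer to question 2 below

class PyFc2Par:
    # plain-Python facade recovering the single entry point from the question
    def __init__(self, par_file):
        self._int_impl = PyFc2ParInt(par_file)
        self._dbl_impl = PyFc2ParDouble(par_file)

    def convert_boxes(self, boxes):
        # crude dispatch on the type of the first coordinate
        if boxes and isinstance(boxes[0][0], int):
            return self._int_impl.convert_boxes(boxes)
        return self._dbl_impl.convert_boxes(boxes)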
Question 2 - the implementation of convert_boxes essentially involves iterating through a Python list to create your boxes, and then iterating through the returned vector to build a Python list. A rough outline is:
def convert_boxes(self, boxes):
    cdef vector[Box[double]] v  # or vector[Box[int]]
    for b in boxes:
        # I've taken a guess at what the Box constructor looks like
        v.push_back(Box[double](b[0], b[1], b[2], b[3]))
    v = self.thisptr.convert_boxes(v)  # reuse v for the output
    # now iterate through the vector and copy out
    # I've taken a guess at the interface of Box
    output = [(v[i].left, v[i].right, v[i].top, v[i].bottom)
              for i in range(v.size())]
    return output
Note that you need to have told Cython about Box (cdef extern from ...), even if you don't then expose it to Python.
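For completeness, such a declaration might look roughly like the following. The constructor signatures and member names here are guesses (matching the left/right/top/bottom guess above) and must be adjusted to whatever Fc2Par.h actually defines.
cdef extern from 'Fc2Par.h':
    cdef cppclass Box[T]:
        Box()
        Box(T, T, T, T)
        T left
        T right
        T top
        T bottom
    cdef cppclass Point[T]:
        Point()
        Point(T, T)
        T x
        T y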
Related
I'm wrapping a C++ API for use in Python, so I'd like the functionality of the Python wrapper classes to pretty closely mirror that of the C++ classes. In one case, I have two objects that are actually nested structs:
// myheader.hpp
#include <vector>
namespace mynames{
struct Data{
struct Piece{
double piece1;
int piece2;
std::vector<double> piece3;
};
std::vector<Piece> pieces;
};
}
I'd like to interact with this object fluidly in Python as if it were a typical Python class using numpy and extension types. So I began by declaring two extension types:
# mydeclarations.pxd
from libcpp.vector cimport vector
cdef extern from "myheader.hpp" namespace "mynames":
cdef cppclass Data:
vector[Piece] pieces
cdef extern from "myheader.hpp" namespace "mynames::Data":
cdef cppclass Piece:
double piece1
int piece2
vector[double] piece3
Then wrapping in Python:
# mytypes.pyx
cimport mydeclarations as cpp
from cython.operator cimport dereference as deref
from libcpp.vector cimport vector
import numpy as np
cdef class Piece:
    cdef cpp.Piece *_cppPiece

    def __cinit__(self):
        self._cppPiece = new cpp.Piece()

    def __dealloc__(self):
        if self._cppPiece is not NULL:
            del self._cppPiece

    @property
    def piece1(self):
        return self._cppPiece.piece1

    @piece1.setter
    def piece1(self, double d):
        self._cppPiece.piece1 = d

    @property
    def piece2(self):
        return self._cppPiece.piece2

    @piece2.setter
    def piece2(self, int i):
        self._cppPiece.piece2 = i

    # Use cython's automatic type conversion: (cpp)vector <---> (py)list (COPIES)
    @property
    def piece3(self):
        return np.asarray(self._cppPiece.piece3, dtype=np.double)

    @piece3.setter
    def piece3(self, arr):
        self._cppPiece.piece3 = <vector[double]>np.asarray(arr, dtype=np.double)
#----------------------------------------------------------------------
cdef class Data:
    cdef cpp.Data *_cppData

    def __cinit__(self):
        self._cppData = new cpp.Data()

    def __dealloc__(self):
        if self._cppData is not NULL:
            del self._cppData

    @property
    def pieces(self):
        # Create a list of Python objects that hold copies of the C++ data
        cdef Piece pyPiece
        cdef int i
        pyPieces = []
        for i in range(self._cppData.pieces.size()):
            pyPiece = Piece()
            pyPiece._cppPiece[0] = self._cppData.pieces.at(i)
            pyPieces.append(pyPiece)
        return np.asarray(pyPieces, dtype=Piece)

    @pieces.setter
    def pieces(self, arr):
        # Clear the existing vector and create a new one containing copies of the data in arr
        cdef Piece pyPiece
        self._cppData.pieces.clear()
        for pyPiece in arr:
            self._cppData.pieces.push_back(deref(pyPiece._cppPiece))
This is a simple implementation and as far as I can tell it works, but there are some issues:
Since we use copies, there's no in-place functionality of the kind you'd expect if, for example, Piece().piece3 were a Python class attribute holding a numpy array. For example:
a = Piece()
a.piece3 = [1,2,3]
a.piece3[0] = 55 # No effect, need to do b = a.piece3; b[0]=55; a.piece3=b
There's a lot of iterating over data and copying. This is probably an issue when the size of Data.pieces is very large.
Can anyone suggest some better alternatives to address these issues? Although Data is more complicated than Piece, I think they are related and ultimately boil down to wrapping C++ classes with vector attributes for use in Python.
If you want to avoid data copying then it probably involves creating a wrapper class.
cdef class DoubleVector:
    cdef vector[double] *vec
    cdef object owner

    def __dealloc__(self):
        if self.owner is None:  # only delete a vector this class allocated itself
            del self.vec

    @staticmethod
    cdef create_from_existing_Piece(Piece obj):
        cdef DoubleVector out = DoubleVector()
        out.owner = obj
        out.vec = &obj._cppPiece.piece3
        return out

    # create __len__/__getitem__/__setitem__ functions
    # You could also have this class expose the buffer protocol
Here I've assumed that DoubleVector doesn't own its own data the majority of the time. Therefore it keeps a reference to the Python class that owns the C++ class that owns that data (thus ensuring that object lifetimes are preserved).
Some details (mainly creating a nice sequence interface) are left for you to fill in.
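For instance, a minimal version of that sequence interface might look like the sketch below (it assumes from cython.operator cimport dereference as deref at module level, and simple bounds checks instead of full slice support):
    # inside cdef class DoubleVector
    def __len__(self):
        return self.vec.size()

    def __getitem__(self, Py_ssize_t i):
        if i < 0 or i >= <Py_ssize_t>self.vec.size():
            raise IndexError(i)
        return deref(self.vec)[i]

    def __setitem__(self, Py_ssize_t i, double value):
        if i < 0 or i >= <Py_ssize_t>self.vec.size():
            raise IndexError(i)
        deref(self.vec)[i] = value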
Exposing vector[Piece] is more difficult, largely because any changes to the vector (including resizing it) would invalidate any pointers into the vector. Therefore I'd give serious thought to having a different Python interface to C++ interface.
Could you make Data immutable (so that you simply can't change it from Python and so can safely return pointers into it)?
Could you avoid returning Piece from Data.pieces and have functions like get_ith_piece1, get_ith_piece2, get_ith_piece3 (i.e. remove a layer from your Python wrapping)?
Alternatively you could do something like
cdef class BasePiece:
    cdef cpp.Piece* get_piece(self):
        raise NotImplementedError
    # most of the implementation of your Piece class goes here

cdef class Piece(BasePiece):
    # wrapper that owns its own data.
    # largely as before but with
    cdef cpp.Piece* get_piece(self):
        return self._cppPiece
    # ...

cdef class UnownedPiece(BasePiece):
    cdef Data d
    cdef int index

    cdef cpp.Piece* get_piece(self):
        return &self.d._cppData.pieces[self.index]
This is at least safe if the contents of the vector changes (it doesn't point to an existing Piece, but just to the indexed position). You obviously need to be careful about changing the size.
Your getter function for Data.pieces might be something like
@property
def pieces(self):
    l = []
    for i in range(self._cppData.pieces.size()):
        l.append(UnownedPiece(self, i))
    return tuple(l)  # convert to tuple so it's immutable and people
                     # won't be tempted to try to append to it.
There's obviously a number of other approaches that you could take, but you can create a reasonably nice interface with this kind of approach.
The main thing is: restrict the Python interface as much as is possible.
I am using Cython to wrap a C++ library in Python. Unfortunately, I have no access to the C++ library itself, so I have to find a way to wrap the structures and functions as exposed by the library's API.
My question concerns the best way to wrap a C++ structure and, subsequently, how to create a memoryview in Python and pass its pointer (the address of its first element) to a C++ function that takes an array of C++ structures as a parameter.
For example, let say I have the following h file:
//test.h
struct cxxTestData
{
    int m_id;
    double m_value;
};

void processData(cxxTestData* array_of_test_data, int isizeArr);
My pyx file will look like the following
cdef extern from "Test.h":
cdef struct cxxTestData:
int m_id
double m_value
cdef class pyTestData:
cdef cxxTestData cstr
def __init__(self, id, value):
self.cstr.m_id = id
self.cstr.m_value = value
#property
def ID(self):
return self.cstr.m_id
#property
def Value(self):
return self.cstr.m_value
Now, I want to create a number of pyTestData objects and store them in an array of dtype object. Then I want to pass this array as a memoryview to a Cython/Python function.
The wrapping function will have the following signature
cpdef void pyProcessData(pyTestData[::1] test_data_arr)
I have tested the above and it compiles successfully. I was also able to modify the members of each structure. However, this is not quite what I am trying to achieve. My question is how, from this point, I can pass an array of the C++ structures encapsulated in each pyTestData object (via self.cstr).
As an example, please have a look at the following listing:
cpdef void pyProcessData(pyTestData[::1] test_data_arr):
    cdef int isize = test_data_arr.shape[0]
    # here I want to create/insert an array of type cxxTestData to pass it
    # to the cpp function
    # In other words, I want to create an array of [test_data_arr.cstr]
    # I guess I can use cxxTestData[::1] or cxxTestData* via malloc and
    # copy each test_data_arr[i].cstr to this new array
    cdef cxxTestData* testarray = <cxxTestData*>malloc(isize*sizeof(cxxTestData))
    cdef int i
    for i in range(isize):
        testarray[i] = test_data_arr[i].cstr
    processData(&testarray[0], isize)
    for i in range(isize):
        test_data_arr[i].cstr = testarray[i]
    free(testarray)
Has anyone come across such a case? Is there any better way to pass my Python objects to the above function without having to copy the C++ structures internally?
Many thanks in advance and apologies if I do something fundamentally wrong.
Since you want an array of cxxTestData to pass to your C++ functions, the best thing to do is to allocate it as an array. Some untested code that illustrates the approach:
cdef class TestDataArray:
    cdef cxxTestData* array

    def __init__(self, int length):
        self.array = <cxxTestData*>calloc(length, sizeof(cxxTestData))

    def __dealloc__(self):
        free(self.array)

    def __getitem__(self, int idx):
        return PyTestData.from_pointer(&self.array[idx], self)  # see later

    def __setitem__(self, int idx, PyTestData pyobj):  # this is optional
        self.array[idx] = deref(pyobj.cstr)
You then want to slightly modify your PyTestData class so that it holds a pointer rather than holding the class directly. It should also have a field representing the ultimate owner of the data (e.g. the array). This ensures the array is kept alive, and can also allow for the case where the PyTestData owns its own data:
cdef class PyTestData:
    cdef cxxTestData* cstr
    cdef object owner

    def __init__(self, id, value):
        self.owner = None
        self.cstr = <cxxTestData*>malloc(sizeof(cxxTestData))
        self.cstr.m_id = id
        self.cstr.m_value = value

    def __dealloc__(self):
        if self.owner is None:  # i.e. this class owns it
            free(self.cstr)

    @staticmethod
    cdef PyTestData from_pointer(cxxTestData* ptr, owner):
        # calling __new__ avoids calling the constructor
        cdef PyTestData x = PyTestData.__new__(PyTestData)
        x.owner = owner
        x.cstr = ptr
        return x
There is a little extra effort in creating the TestDataArray class, but it stores the data in a format directly usable from C++, and so I think it's the best solution.
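To illustrate how the pieces fit together, a hypothetical wrapper around processData could then pass the underlying array straight through. This sketch assumes a length field is also stored on TestDataArray (the outline above omits it), that processData has been declared in a cdef extern block, and that the ID/Value properties from the question are kept on PyTestData:
cpdef void pyProcessData(TestDataArray arr):
    # no per-element copying: the C++ function sees the structs directly
    processData(arr.array, arr.length)   # 'length' assumed to be stored by __init__

def example():
    arr = TestDataArray(3)
    for i in range(3):
        arr[i] = PyTestData(i, i * 0.5)   # __setitem__ copies the struct in
    pyProcessData(arr)                    # C++ reads/writes the structs in place
    return [(arr[i].ID, arr[i].Value) for i in range(3)]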
I want to pass a list of 2d numpy arrays to a C++ function. My first idea was to use a std::vector<float*> to receive the list of arrays, but I can't find a way to pass the list.
The c++ function looks like this:
double cpp_func(const std::vector<const float*>& vec) {
    return 0.0;
}
The Cython function looks like this:
cpdef py_func(list list_of_array):
    cdef vector[float*] vec
    cdef size_t i
    cdef size_t n = len(list_of_array)
    for i in range(n):
        vec.push_back(&list_of_array[i][0][0])  # error: Cannot take address of Python object
    return cpp_func(vec)
I have tried declaring list_of_array as list[float[:,:]], but that doesn't work either.
I will slightly change the signature of your function:
for every numpy-array the function also needs to know the number of elements in this array
the data is double* rather than float*, because this is what corresponds to NumPy's default float type. But this can be adjusted according to your needs.
That leads to the following C++ interface/code (for convenience I use the verbatim-C-code feature of Cython >= 0.28):
%%cython --cplus -c=-std=c++11
from libcpp.vector cimport vector

cdef extern from *:
    """
    struct Numpy1DArray{
        double *ptr;
        int size;
    };

    static double cpp_func(const std::vector<Numpy1DArray> &vec){
        // Fill with life to see, that it really works:
        double res = 0.0;
        for(const auto &a : vec){
            if(a.size > 0)
                res += a.ptr[0];
        }
        return res;
    }
    """
    cdef struct Numpy1DArray:
        double *ptr
        int size

    double cpp_func(const vector[Numpy1DArray] &vec)
...
The struct Numpy1DArray just bundles the needed information for a np-array, because this is more than just a pointer to contiguous data.
Naive version
Now, writing the wrapper function is pretty straightforward:
%%cython --cplus -c=-std=c++11
....
def call_cpp_func(list_of_arrays):
    cdef Numpy1DArray ar_descr
    cdef vector[Numpy1DArray] vec
    cdef double[::1] ar
    for ar in list_of_arrays:  # coerce elements to double[::1]
        ar_descr.size = ar.size
        if ar.size > 0:
            ar_descr.ptr = &ar[0]
        else:
            ar_descr.ptr = NULL  # set to nullptr
        vec.push_back(ar_descr)
    return cpp_func(vec)
There are some things worth noting:
you need to coerce the elements of the list to something that implements the buffer protocol, otherwise &ar[0] will obviously not work, because Cython would expect ar[0] to be a Python object. By the way, this is what you missed.
I have chosen Cython's memoryviews (i.e. double[::1]) as the target for coercion. The advantage over np.ndarray is that it also works with array.array, and it is automatically checked that the data is contiguous (that is the meaning of ::1).
a common pitfall is to access ar[0] for an empty ndarray - this access must be guarded.
this code is not thread-safe. Another thread could invalidate the pointers, for example by resizing the numpy arrays in-place or by deleting the numpy arrays altogether.
IIRC, for Python 2 you will have to cimport array for the code to work with array.array.
Finally, here is a test that the code works (there is also an array.array in the list to make the point):
import array
import numpy as np
lst = (np.full(3, 1.0), np.full(0, 2.0), array.array('d', [2.0]))
call_cpp_func(lst) # 3.0 as expected!
Thread-safe version
The code above can also be written in a thread-safe manner. The possible problems are:
Another thread could trigger the deletion of the numpy arrays, for example by calling list_of_arrays.clear() - after that there could be no more references to the arrays around and they would get deleted. That means we need to keep a reference to every input array as long as we use the pointers.
Another thread could resize the arrays, thus invalidating the pointers. That means we have to use the buffer protocol: its __getbuffer__ locks the buffer so it cannot be invalidated, and the buffer is released via __releasebuffer__ once we are done with the calculations.
Cython's memoryviews can be used to lock the buffers and to keep a reference to the input arrays around:
%%cython --cplus -c=-std=c++11
....
def call_cpp_func_safe(list_of_arrays):
    cdef Numpy1DArray ar_descr
    cdef vector[Numpy1DArray] vec
    cdef double[::1] ar
    cdef list stay_alive = []
    for ar in list_of_arrays:    # coerce elements to double[::1]
        stay_alive.append(ar)    # keep arrays alive and locked
        ar_descr.size = ar.size
        if ar.size > 0:
            ar_descr.ptr = &ar[0]
        else:
            ar_descr.ptr = NULL  # set to nullptr
        vec.push_back(ar_descr)
    return cpp_func(vec)
There is a small overhead: adding the memoryviews to a list - the price of safety.
Releasing the GIL
One last improvement: the GIL can be released while cpp_func is computed; that means we have to declare cpp_func as nogil and release the GIL while calling the function:
%%cython --cplus -c=-std=c++11
from libcpp.vector cimport vector

cdef extern from *:
    ....
    double cpp_func(const vector[Numpy1DArray] &vec) nogil
...

def call_cpp_func(list_of_arrays):
    ...
    with nogil:
        result = cpp_func(vec)
    return result
Cython will figure out that result is of type double and will thus be able to release the GIL while calling cpp_func.
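For reference, splicing the nogil call into the naive version above gives a complete wrapper (same Numpy1DArray declarations as before; nothing here is new beyond the with nogil block):
def call_cpp_func(list_of_arrays):
    cdef Numpy1DArray ar_descr
    cdef vector[Numpy1DArray] vec
    cdef double[::1] ar
    for ar in list_of_arrays:    # coerce elements to double[::1]
        ar_descr.size = ar.size
        if ar.size > 0:
            ar_descr.ptr = &ar[0]
        else:
            ar_descr.ptr = NULL  # set to nullptr
        vec.push_back(ar_descr)
    with nogil:
        result = cpp_func(vec)   # only C objects are touched here
    return result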
(I think this question can easily be answered by an expert without an actual copy-paste working example, so I did not spend extra time on it…)
I have a C++ method, which returns an array of integers:
int* Narf::foo() {
    int bar[10];
    for (int i = 0; i < 10; i++) {
        bar[i] = i;
    }
    return bar;
}
I created the Cython stuff for its class:
cdef extern from "Narf" namespace "narf":
cdef cppclass Narf:
Narf() except +
int* foo()
And these are my Python wrappers:
cdef class PyNarf:
    cdef Narf c_narf

    def __cinit__(self):
        self.c_narf = Narf()

    def foo(self):
        return self.c_narf.foo()
The problem is the foo method, with its int* return type (other methods which I did not list in this example work perfectly fine!). It does not compile and gives the following error:
def foo(self):
return self.c_narf.foo()
^
------------------------------------------------------------
narf.pyx:39:37: Cannot convert 'int *' to Python object
Of course, it obviously does not accept int * as return type. How do I solve this problem? Is there an easy way of wrapping this int * into a numpy array (I'd prefer numpy), or how is this supposed to be handled?
I'm also not sure how to handle the memory here, since I'm reading in large files etc.
To wrap it in a numpy array, you need to know the size; then you can do it like this:
def foo(self):
    cdef int[::1] view = <int[:self.c_narf.size()]> self.c_narf.foo()
    return np.asarray(view)
The above code assumes that there exists a function self.c_narf.size() that returns the size of the array.
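If you would rather have Python own the memory, a variation is to copy the data into a freshly allocated numpy array instead of viewing the C buffer in place. This is only a sketch: it again assumes the size() helper exists, that numpy is imported as np at module level, and that the pointer returned by foo() is still valid at the time of the copy (the stack-allocated bar in the question's C++ would need to be fixed first for that to hold).
    # inside cdef class PyNarf
    def foo_copy(self):
        cdef int n = self.c_narf.size()      # assumed size helper, as above
        cdef int* data = self.c_narf.foo()
        cdef int[::1] out = np.empty(n, dtype=np.intc)
        cdef int i
        for i in range(n):
            out[i] = data[i]                 # copy so Python owns the result
        return np.asarray(out)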
This looks like it can be solved using the solution to this question: Have pointer to data. Need Numpy array in Fortran order. Want to use Cython
The situation is as follows: I want to wrap the method Unit.getDistance
//Unit.h
class Unit{
    ....
    int getDistance(PositionOrUnit target) const;
};

//PositionUnit.h
class PositionOrUnit{
    PositionOrUnit(Unit unit = nullptr);
    PositionOrUnit(Position pos);
};
The library uses converting constructors to allow Unit or Position to be automatically constructed into a PositionOrUnit object by the compiler, so that it is possible to pass a Unit or Position object directly into this method.
#cUnit.pxd
ctypedef UnitInterface *Unit

cdef cppclass UnitInterface:
    int getDistance(PositionOrUnit target) const

#cPositionUnit.pxd
cdef cppclass PositionOrUnit:
    PositionOrUnit()  # fake for nullptr unit
    PositionOrUnit(Unit unit)
    PositionOrUnit(Position pos)
Now I don't know how to make a converting constructor in Python, so I use subclass polymorphism to get this to work. I declare a base class PositionUnitConverter and have Unit and Position subclass from it.
#PositionUnit.pyx
cdef class PositionUnitConverter:
    cdef cPositionUnit.PositionOrUnit getPositionOrUnit(self):
        return cPositionUnit.PositionOrUnit()

cdef class PositionOrUnit(PositionUnitConverter):
    cdef cPositionUnit.PositionOrUnit thisobj

    cdef cPositionUnit.PositionOrUnit getPositionOrUnit(self):
        return self.thisobj

#Unit.pyx
cdef class Unit(PositionUnitConverter):
    cdef cUnit.Unit thisptr

    cdef cPositionUnit.PositionOrUnit getPositionOrUnit(self):
        return cPositionUnit.PositionOrUnit(self.thisptr)

    def getDistance(self, PositionUnitConverter target):
        return self.thisptr.getDistance(target.getPositionOrUnit())
The end result is that in my Python code I can still call Unit.getDistance with either type of object.
# this works
unit.getDistance(unit2)
But now I get several of these warnings:
CyBW\CyBW.cpp(15698) : warning C4190: 'abstract declarator' has C-linkage specified, but
returns UDT 'BWAPI::PositionOrUnit' which is incompatible with C
../include/BWAPI/PositionUnit.h(13) : see declaration of 'BWAPI::PositionOrUnit'
The .cpp code generated at that line is (split to make it easier to read):
__pyx_vtable_4CyBW_5BWAPI_PositionUnitConverter.getPositionOrUnit = (BWAPI::PositionOrUnit
(*)(struct __pyx_obj_4CyBW_5BWAPI_PositionUnitConverter *))
__pyx_f_4CyBW_5BWAPI_21PositionUnitConverter_getPositionOrUnit;
My Questions are:
Am I doing something wrong to get this warning?
How can I avoid this warning?
Or should I ignore this warning, and if so, how?
If there is anything else I can provide to help, please comment.