I am using Cython to wrap a C++ library in Python. Unfortunately, I have no access to the C++ library itself, so I have to find a way to wrap the structures and functions as exposed by the library's API.
My question concerns the best way to wrap a C++ structure and, subsequently, how to create a memory view in Python and pass its pointer (the address of its first element) to a C++ function that takes an array of C++ structures as a parameter.
For example, let's say I have the following header file:
//test.h
struct cxxTestData
{
    int m_id;
    double m_value;
};

void processData(cxxTestData* array_of_test_data, int isizeArr);
My pyx file will look like the following
cdef extern from "test.h":
    cdef struct cxxTestData:
        int m_id
        double m_value

cdef class pyTestData:
    cdef cxxTestData cstr

    def __init__(self, id, value):
        self.cstr.m_id = id
        self.cstr.m_value = value

    @property
    def ID(self):
        return self.cstr.m_id

    @property
    def Value(self):
        return self.cstr.m_value
Now, I want to create a number of pyTestData objects and store them in an array of dtype object. Then I want to pass this array as a memory view to a Cython/Python function.
The wrapping function will have the following signature:
cpdef void pyProcessData(pyTestData[::1] test_data_arr)
I have tested the above and it compiles successfully. I also managed to modify the members of each structure. However, this is not what I am trying to achieve. My question is how, from this point, I can pass an array of the C++ structures encapsulated in each pyTestData object (via self.cstr).
As an example, please have a look at the following listing:
from libc.stdlib cimport malloc, free

cpdef void pyProcessData(pyTestData[::1] test_data_arr):
    cdef int isize = test_data_arr.shape[0]
    # here I want to create/insert an array of type cxxTestData to pass it
    # to the cpp function
    # In other words, I want to create an array of [test_data_arr.cstr]
    # I guess I can use cxxTestData[::1] or cxxTestData* via malloc and
    # copy each test_data_arr[i].cstr to this new array
    cdef cxxTestData* testarray = <cxxTestData*>malloc(isize*sizeof(cxxTestData))
    cdef int i
    for i in range(isize):
        testarray[i] = test_data_arr[i].cstr
    processData(&testarray[0], isize)
    for i in range(isize):
        test_data_arr[i].cstr = testarray[i]
    free(testarray)
Has anyone come across such a case? Is there a better way to pass my Python objects to the above function without having to copy the C++ structures internally?
Many thanks in advance, and apologies if I am doing something fundamentally wrong.
Since you want an array of cxxTestData to pass to your C++ functions, the best thing to do is to allocate it as an array. Some untested code that illustrates the approach:
from cython.operator cimport dereference as deref
from libc.stdlib cimport calloc, malloc, free

cdef class TestDataArray:
    cdef cxxTestData* array

    def __init__(self, int length):
        self.array = <cxxTestData*>calloc(length, sizeof(cxxTestData))

    def __dealloc__(self):
        free(self.array)

    def __getitem__(self, int idx):
        return PyTestData.from_pointer(&self.array[idx], self)  # see later

    def __setitem__(self, int idx, PyTestData pyobj):  # this is optional
        self.array[idx] = deref(pyobj.cstr)
You then want to slightly modify your PyTestData class so that it holds a pointer rather than holding the struct directly. It should also have a field representing the ultimate owner of the data (e.g. the array). This ensures the array is kept alive, and it also allows for the case where the PyTestData owns its own data:
cdef class PyTestData:
    cdef cxxTestData* cstr
    cdef object owner

    def __init__(self, id, value):
        self.owner = None
        self.cstr = <cxxTestData*>malloc(sizeof(cxxTestData))
        self.cstr.m_id = id
        self.cstr.m_value = value

    def __dealloc__(self):
        if self.owner is None:  # i.e. this class owns it
            free(self.cstr)

    @staticmethod
    cdef PyTestData from_pointer(cxxTestData* ptr, owner):
        # calling __new__ avoids calling the constructor
        cdef PyTestData x = PyTestData.__new__(PyTestData)
        x.owner = owner
        x.cstr = ptr
        return x
There is a little extra effort in creating the TestDataArray class, but it stores the data in a format directly usable from C++, and so I think it's the best solution.
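For completeness, here is a rough, untested sketch of how the original pyProcessData could then be written on top of TestDataArray. It assumes processData has been declared in a cdef extern block, and it passes the length explicitly since the TestDataArray sketch above does not store it:
cpdef void pyProcessData(TestDataArray test_data_arr, int isize):
    # the cxxTestData structs already live in one contiguous C array,
    # so the pointer can be handed straight to the C++ function,
    # with no per-element copying before or after the call
    processData(test_data_arr.array, isize)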
Related
I'm wrapping a C++ API for use in Python, so I'd like the functionality of the Python wrapper classes to pretty closely mirror that of the C++ classes. In one case, I have two objects that are actually nested structs:
// myheader.hpp
#include <vector>

namespace mynames{

struct Data{
    struct Piece{
        double piece1;
        int piece2;
        std::vector<double> piece3;
    };
    std::vector<Piece> pieces;
};

}
I'd like to interact with this object fluidly in Python as if it were a typical Python class using numpy and extension types. So I began by declaring two extension types:
# mydeclarations.pxd
from libcpp.vector cimport vector

cdef extern from "myheader.hpp" namespace "mynames":
    cdef cppclass Data:
        vector[Piece] pieces

cdef extern from "myheader.hpp" namespace "mynames::Data":
    cdef cppclass Piece:
        double piece1
        int piece2
        vector[double] piece3
Then wrapping in Python:
# mytypes.pyx
cimport mydeclarations as cpp
from cython.operator cimport dereference as deref
from libcpp.vector cimport vector
import numpy as np

cdef class Piece:
    cdef cpp.Piece *_cppPiece

    def __cinit__(self):
        self._cppPiece = new cpp.Piece()

    def __dealloc__(self):
        if self._cppPiece is not NULL:
            del self._cppPiece

    @property
    def piece1(self):
        return self._cppPiece.piece1
    @piece1.setter
    def piece1(self, double d):
        self._cppPiece.piece1 = d

    @property
    def piece2(self):
        return self._cppPiece.piece2
    @piece2.setter
    def piece2(self, int i):
        self._cppPiece.piece2 = i

    # Use Cython's automatic type conversion: (cpp)vector <---> (py)list (COPIES)
    @property
    def piece3(self):
        return np.asarray(self._cppPiece.piece3, dtype=np.double)
    @piece3.setter
    def piece3(self, arr):
        self._cppPiece.piece3 = <vector[double]>np.asarray(arr, dtype=np.double)

#----------------------------------------------------------------------
cdef class Data:
    cdef cpp.Data *_cppData

    def __cinit__(self):
        self._cppData = new cpp.Data()

    def __dealloc__(self):
        if self._cppData is not NULL:
            del self._cppData

    @property
    def pieces(self):
        # Create a list of Python objects that hold copies of the C++ data
        cdef Piece pyPiece
        cdef int i
        pyPieces = []
        for i in range(self._cppData.pieces.size()):
            pyPiece = Piece()
            pyPiece._cppPiece[0] = self._cppData.pieces.at(i)
            pyPieces.append(pyPiece)
        return np.asarray(pyPieces, dtype=object)
    @pieces.setter
    def pieces(self, arr):
        # Clear the existing vector and create a new one containing copies of the data in arr
        cdef Piece pyPiece
        self._cppData.pieces.clear()
        for pyPiece in arr:
            self._cppData.pieces.push_back(deref(pyPiece._cppPiece))
This is a simple implementation and as far as I can tell it works, but there are some issues:
Since we use copies, there's no in-place functionality that you might expect if, for example, Piece().piece3 were a Python class attribute holding a numpy array. For example:
a = Piece()
a.piece3 = [1,2,3]
a.piece3[0] = 55 # No effect, need to do b = a.piece3; b[0]=55; a.piece3=b
There's a lot of iterating over data and copying. This is probably an issue when the size of Data.pieces is very large.
Can anyone suggest some better alternatives to address these issues? Although Data is more complicated than Piece, I think the problems are related and ultimately boil down to wrapping C++ classes with vector attributes for use in Python.
If you want to avoid data copying then it probably involves creating a wrapper class.
cdef class DoubleVector:
    cdef vector[double] *vec
    cdef object owner

    def __dealloc__(self):
        if self.owner is None:  # this object owns the vector
            del self.vec

    @staticmethod
    cdef DoubleVector create_from_existing_Piece(Piece obj):
        cdef DoubleVector out = DoubleVector()
        out.owner = obj
        out.vec = &obj._cppPiece.piece3
        return out

    # create __len__/__getitem__/__setitem__ functions
    # You could also have this class expose the buffer protocol
Here I've assumed that DoubleVector doesn't own its own data the majority of the time. Therefore it keeps a reference to the Python class that owns the C++ class that owns that data (thus ensuring that object lifetimes are preserved).
Some details (mainly creating a nice sequence interface) are left for you to fill in; a rough sketch follows.
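As an untested sketch of that sequence interface (it assumes deref has been cimported from cython.operator as above), the methods added inside DoubleVector might look like:
    def __len__(self):
        return self.vec.size()

    def __getitem__(self, int idx):
        if idx < 0 or idx >= <int>self.vec.size():
            raise IndexError(idx)
        return deref(self.vec)[idx]

    def __setitem__(self, int idx, double value):
        if idx < 0 or idx >= <int>self.vec.size():
            raise IndexError(idx)
        deref(self.vec)[idx] = value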
Exposing vector[Piece] is more difficult, largely because any changes to the vector (including resizing it) would invalidate any pointers into it. Therefore I'd give serious thought to having a Python interface that differs from the C++ interface.
Could you make Data immutable (so that you simply can't change it from Python and so can safely return pointers into it)?
Could you avoid returning Piece objects from Data and instead have functions like get_ith_piece1, get_ith_piece2, get_ith_piece3 (i.e. remove a layer from your Python wrapping)? A sketch of this follows.
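For illustration, an untested sketch of that flattened interface on the Data wrapper, reusing the _cppData member from the question:
    def get_ith_piece1(self, int i):
        return self._cppData.pieces.at(i).piece1

    def get_ith_piece2(self, int i):
        return self._cppData.pieces.at(i).piece2

    def get_ith_piece3(self, int i):
        # copies the vector out into a numpy array, as the piece3 property does
        return np.asarray(self._cppData.pieces.at(i).piece3, dtype=np.double)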
Alternatively you could do something like
cdef class BasePiece:
    cdef cpp.Piece* get_piece(self):
        raise NotImplementedError
    # most of the implementation of your Piece class goes here

cdef class Piece(BasePiece):
    # wrapper that owns its own data.
    # largely as before but with
    cdef cpp.Piece* get_piece(self):
        return self._cppPiece
    # ...

cdef class UnownedPiece(BasePiece):
    cdef Data d
    cdef int index

    cdef cpp.Piece* get_piece(self):
        return &self.d._cppData.pieces[self.index]
This is at least safe if the contents of the vector change (it doesn't point to an existing Piece, but just to the indexed position). You obviously need to be careful about changing the size.
Your getter function for Data.pieces might be something like
@property
def pieces(self):
    l = []
    for i in range(self._cppData.pieces.size()):
        l.append(UnownedPiece(self, i))
    return tuple(l)  # convert to tuple so it's immutable and people
                     # won't be tempted to try to append to it.
There's obviously a number of other approaches that you could take, but you can create a reasonably nice interface with this kind of approach.
The main thing is: restrict the Python interface as much as possible.
I have the following code in my Cython wrapper around some C++ code:
# distutils: language = c++
# distutils: sources = symbolic.cpp

from libcpp.vector cimport vector
from libcpp.pair cimport pair
from libcpp.string cimport string
from libcpp cimport bool

cdef extern from "symbolic.h" namespace "metadiff::symbolic":
    cdef cppclass SymbolicMonomial:
        vector[pair[int, int]] powers
        long long coefficient
        SymbolicMonomial()
        SymbolicMonomial(long)
        SymbolicMonomial(const SymbolicMonomial&)
        bool is_constant()
        long long int eval(vector[int]&)
        long long int eval()
        string to_string()
        string to_string_with_star() const

    cdef SymbolicMonomial mul_mm"operator*"(const SymbolicMonomial&, const SymbolicMonomial&)
    # SymbolicMonomial operator*(long long, const SymbolicMonomial&)
    # SymbolicMonomial operator*(const SymbolicMonomial&, long long)

cdef class SymMonomial:
    cdef SymbolicMonomial* thisptr  # hold a C++ instance which we're wrapping

    def __cinit__(self, value=None):
        if value is None:
            self.thisptr = new SymbolicMonomial()
        else:
            self.thisptr = new SymbolicMonomial(<long>value)

    def __dealloc__(self):
        del self.thisptr

    def is_constant(self):
        return self.thisptr.is_constant()

    def eval(self):
        return self.thisptr.eval()

    def __str__(self):
        return self.to_string_with_star()

    def to_string(self):
        return self.thisptr.to_string().decode('UTF-8')

    def to_string_with_star(self):
        return self.thisptr.to_string_with_star().decode('UTF-8')

    def __mul__(self, other):
        return mul_mm(self.thisptr, other)

def variable(variable_id):
    cdef SymMonomial monomial = SymMonomial()
    monomial.thisptr.powers.push_back((variable_id, 1))
    return monomial
However, I never figured out how to call the mul_mm function correctly. It keeps saying Cannot convert 'SymbolicMonomial' to Python object or vice versa. The thing is, I need to be able to multiply two SymMonomials in this way, but for some reason I cannot work out how to do it properly. Any advice?
You have a number of issues:
1) You can't return C++ objects directly to Python - you need to return your wrapper type (assign to thisptr of the wrapper).
2) You can't guarantee either self or other is of the correct type at the point the function is called (see the note in http://docs.cython.org/src/userguide/special_methods.html#arithmetic-methods about how the methods can be called with the operands in either order). To use the C/C++ members of a Cython class you need to ensure Cython knows the object is indeed of that class. I recommend using the <Classname?> style cast (note the question mark), which throws an exception if it doesn't match.
3) You need to get thisptr from other too, rather than just passing the Python wrapper class to your C++ function.
The following should work.
def __mul__(self, other):
    cdef SymMonomial tmp = SymMonomial()
    cdef SymMonomial self2, other2
    try:
        self2 = <SymMonomial?>self
        other2 = <SymMonomial?>other
    except TypeError:
        return NotImplemented  # this is what Python expects for operators
                               # that don't know what to do
    tmp.thisptr[0] = mul_mm(self2.thisptr[0], other2.thisptr[0])
    return tmp
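As a hypothetical usage example (the variable() helper comes from the question):
x = variable(0)
y = variable(1)
z = x * y    # dispatches to the C++ operator* through mul_mm
print(z)     # __str__ calls to_string_with_star()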
I would like to cythonize the following templated C++ class:
template <typename T>
class Fc2Par
{
public:
    Fc2Par(std::string const& par_file);
    ~Fc2Par();

    std::vector<Box<T>> convert_boxes(std::vector<Box<T>> const& boxes) const;
    std::vector<Point<T>> convert_points(std::vector<Point<T>> const& points) const;

private:
    PartitionMap<T> par_map;
    PartitionRTree<T> par_idx;
};
In reality, T will only ever be int or double. Box/Point are additional templated classes, but I'm not sure if I want to expose those in Python. To cythonize, I have the following, but I'm stuck in some areas. I think I can use a fused type for T?
cimport cython
from libcpp.vector cimport vector
from libcpp.string cimport string

my_fused_type = cython.fused_type(cython.int, cython.double)

cdef extern from 'Fc2Par.h':
    cdef cppclass Fc2Par[T]:
        Fc2Par(string&) except +
        vector[Box[T]] convert_boxes(vector[Box[T]]&)
        vector[Point[T]] convert_points(vector[Point[T]]&)

cdef class PyFc2Par:
    cdef Fc2Par* thisptr  # <-- should this be Fc2Par[my_fused_type]*?

    def __cinit__(self, par_file):
        self.thisptr = new Fc2Par[my_fused_type](par_file)

    def __dealloc__(self):
        del self.thisptr

    def convert_boxes(self, boxes):
        # I'm not sure what to do here?
        pass

    def convert_points(self, points):
        # This will be very similar to convert_boxes once I figure that out.
        pass
Ideally, I want to use the API from Python like this:
boxes_int = [(0,0,1,1), (0,0,2,2), ...]
boxes_float = [(0.0,0.0,1.0,1.0), (0.0,0.0,2.0,2.0), ...]
fc2par = PyFc2Par('foo.csv')
converted_int = fc2par.convert_boxes(boxes_int)
converted_float = fc2par.convert_boxes(boxes_float)
They return a list of tuples with xmin,xmax,ymin,ymax.
My Questions:
Is using a fused type correct in this situation?
If I take a list of tuples, how do I convert them into Box[T]/Point[T] in the Cython code without exposing these classes in Python? Once I have the result, I can convert it back to a list of tuples and return that, i.e., what should the convert_boxes implementation look like?
Thank you for any help.
Question 1 - unfortunately you can't use fused types there. (See previous questions on the subject: c++ class in fused type; Cython: templates in python class wrappers). You have to create a separate wrapper class for each variant, e.g.:
cdef class PyFc2ParInt:
    cdef Fc2Par[int]* thisptr
    # etc ...

cdef class PyFc2ParDouble:
    cdef Fc2Par[double]* thisptr
    # etc ...
This unfortunately involves a lot of unavoidable code duplication.
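If you want a single entry point despite the duplication, one untested option is a small Python-level factory; the function name make_fc2par is made up for illustration:
def make_fc2par(par_file, dtype):
    # dispatch to the concrete wrapper for the requested element type
    if dtype is int:
        return PyFc2ParInt(par_file)
    elif dtype is float:
        return PyFc2ParDouble(par_file)
    raise TypeError("dtype must be int or float")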
Question 2. The implementation of convert_boxes essentially involves iterating through a Python list to create your boxes, and then iterating through the returned vector to create a Python list. A rough outline is:
def convert_boxes(self, boxes):
    cdef vector[Box[double]] v  # or vector[Box[int]]
    for b in boxes:
        # I've taken a guess at what the Box constructor looks like
        v.push_back(Box[double](b[0], b[1], b[2], b[3]))
    v = self.thisptr.convert_boxes(v)  # reuse v for the output
    # now iterate through the vector and copy out
    # I've taken a guess at the interface of Box
    output = [(v[i].left, v[i].right, v[i].top, v[i].bottom)
              for i in range(v.size())]
    return output
Note that you need to have told Cython about Box (cdef extern from ...), even if you don't then expose it to Python.
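Such a declaration might look roughly like the following; the header name and member names are guesses, since Box's real interface isn't shown in the question:
cdef extern from "Box.h":
    cdef cppclass Box[T]:
        Box()
        Box(T, T, T, T)
        T left
        T right
        T top
        T bottom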
The slow code below can be made faster by changing its structure, but that is sometimes difficult to do. The cause, I think, is the class instances being stored in an array. I've heard memory views are used to link Python and C arrays, but I'm still pretty new to this (only some Python knowledge).
Is there a way to do the following efficiently?
An example class:
cdef class ClassWithAdditionFunction:
    cdef double value

    def __init__(self, double value):
        self.value = value

    cpdef add_one(self):
        self.value += 1
A slow function:
import numpy as np

cdef unsigned long int i, ii
cdef unsigned long int loops = pow(10, 8)
cdef double value

addition_classes = np.array([None] * 10)
for i in range(len(addition_classes)):
    addition_classes[i] = ClassWithAdditionFunction(value=0)

for i in range(loops // 10):
    for ii in range(10):
        addition_classes[ii].add_one()
Thank you very much for any suggestions!
There are some small things that you could do that should help a little. Really the line of code you want to speed up is addition_classes[ii].add_one(). If you use cython -a to see what's really happening under the hood you'll see that you're making a call to Pyx_GetItemInt, then PyObject_GetAttr, then PyObject_Call. You want to structure your code to avoid these 3 calls.
To avoid the GetItem call, you'll want to use either numpy's buffer interface or memory views. This tells Cython the structure of your array and allows it to pull items from the array more efficiently. In the example below I've used a memory view. If you do something similar, make sure that the array is in fact full of ClassWithAdditionFunction instances, otherwise you'll likely get a segfault.
To avoid the GetAttr call, declare a variable of type ClassWithAdditionFunction and make the method calls on that variable, that way cython knows that the variable has a compiled version of the method which it can use for faster calls.
Lastly, you've already defined add_one as a cpdef method, but I would suggest also adding a return type. Normally we could just use void, but because this is a cpdef function and not a cdef function you could use int instead.
If you put all that together it should look something like:
import numpy as np
cimport cython

cdef class ClassWithAdditionFunction:
    cdef double value

    def __init__(self, double value):
        self.value = value

    cpdef int add_one(self):
        self.value += 1
        return 0

@cython.boundscheck(False)
@cython.wraparound(False)
def main():
    cdef:
        unsigned long int i, ii, loops = 10 ** 6
        ClassWithAdditionFunction addInstance
        double value, y

    addition_classes = np.array([None] * 10)
    cdef ClassWithAdditionFunction[:] arrayview = addition_classes
    for i in range(len(addition_classes)):
        addition_classes[i] = ClassWithAdditionFunction(value=0)

    for i in range(loops // 10):
        for ii in range(10):
            addInstance = arrayview[ii]
            addInstance.add_one()

    return None
I'm trying to wrap two C++ classes: Cluster and ClusterTree. ClusterTree has a method get_current_cluster() that instantiates a Cluster object, and returns a reference to it. ClusterTree owns the Cluster object, and manages its creation and deletion in C++.
I've wrapped Cluster with cython, resulting in PyCluster.
PyCluster should have two ways of creation:
1) By passing in two arrays, which implies that Python should automatically handle deletion (via __dealloc__)
2) By directly passing in a raw C++ pointer (created by ClusterTree's get_current_cluster()). In this case, ClusterTree assumes responsibility for deleting the underlying pointer.
from libcpp cimport bool
from libcpp.vector cimport vector

cdef extern from "../include/Cluster.h" namespace "Terran":
    cdef cppclass Cluster:
        Cluster(vector[vector[double]], vector[int]) except +

cdef class PyCluster:
    cdef Cluster* __thisptr
    __autoDelete = True

    def __cinit__(self, vector[vector[double]] data, vector[int] period):
        self.__thisptr = new Cluster(data, period)

    @classmethod
    def __constructFromRawPointer(self, raw_ptr):
        self.__thisptr = raw_ptr
        self.__autoDelete = False

    def __dealloc__(self):
        if self.__autoDelete:
            del self.__thisptr

cdef extern from "../include/ClusterTree.h" namespace "Terran":
    cdef cppclass ClusterTree:
        ClusterTree(vector[vector[double]], vector[int]) except +
        Cluster& getCurrentCluster()

cdef class PyClusterTree:
    cdef ClusterTree *__thisptr

    def __cinit__(self, vector[vector[double]] data, vector[int] period):
        self.__thisptr = new ClusterTree(data, period)

    def __dealloc__(self):
        del self.__thisptr

    def get_current_cluster(self):
        cdef Cluster* ptr = &(self.__thisptr.getCurrentCluster())
        return PyCluster.__constructFromRawPointer(ptr)
This results in:
Error compiling Cython file:
------------------------------------------------------------
...
def get_current_cluster(self):
cdef Cluster* ptr = &(self.__thisptr.getCurrentCluster())
return PyCluster.__constructFromRawPointer(ptr)
^
------------------------------------------------------------
terran.pyx:111:54: Cannot convert 'Cluster *' to Python object
Note I cannot cdef __init__ or @classmethod functions.
Pointers can only be passed as arguments to cdef'd functions, and __cinit__ has to be def'd. But providing a static factory method is almost the way to go!
cdef Cluster* __thisptr
cdef bool __wrapped  # defaults to False

@staticmethod
cdef PyCluster wrap(Cluster* ptr):
    cdef PyCluster pc = PyCluster([], [])  # Initialize as cheaply as possible
    del pc.__thisptr                       # delete the old pointer to avoid memory leaks!
    pc.__thisptr = ptr
    pc.__wrapped = True
    return pc
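With that static factory in place, PyClusterTree.get_current_cluster could then (untested sketch, names as in the question) simply hand over the raw pointer:
    def get_current_cluster(self):
        return PyCluster.wrap(&(self.__thisptr.getCurrentCluster()))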
I know this is an old question, but after my own recent struggles with Cython I thought I'd post an answer for the sake of posterity.
It seems to me you could use a copy constructor to create a new PyCluster object from an existing Cluster object.
Define the copy constructor in your C++ code, then call it in the Python class definition (in this case, when a pointer or reference is passed) using new. This will work, although it may not be the best or most performant solution.
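A rough, untested sketch of that idea; it assumes the pxd block also declares Cluster's copy constructor (Cluster(const Cluster&) except +), and the from_copy name is made up:
# inside PyCluster:
    @staticmethod
    cdef PyCluster from_copy(const Cluster& c):
        cdef PyCluster pc = PyCluster([], [])  # cheap placeholder construction
        del pc.__thisptr                       # drop the placeholder C++ object
        pc.__thisptr = new Cluster(c)          # invoke the C++ copy constructor
        return pc

# and inside PyClusterTree:
    def get_current_cluster(self):
        return PyCluster.from_copy(self.__thisptr.getCurrentCluster())
Because the wrapper then owns an independent copy, __autoDelete can stay True and the lifetime coupling between PyCluster and ClusterTree disappears, at the cost of copying the data.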