Directly call C++ struct constructor from cython - python

I'm trying to wrap some C++ code that uses structs with constructors, and I can't figure out how.
C++ Struct:
typedef struct point_3d_t
{
    double x;
    double y;
    double z;
    point_3d_t(double x, double y, double z)
        : x(x)
        , y(y)
        , z(z)
    {}
} point_3d;
Cython wrapper:
cdef extern from "./cppdar.hpp":
    ctypedef struct point_3d:
        point_3d(double, double, double)
        double x;
        double y;
        double z;
Now, I'd expect to be able to construct the struct via something like cdef point_3d p1(v, v, v) (from within the cython file), but I can't seem to figure out how to get cython to just use the provided constructor.
I've tried:
cdef point_3d p1(v, v, v)
cdef point_3d p1 = point_3d(v, v, v)
cdef point_3d p1(0, 0, 0)
cdef point_3d p1 = point_3d(0, 0, 0)
Where v is an explicit cdef double v = 0, but none work.
I could fall back to plain cdef point_3d p1, p1.x = nnnn, etc., but that's annoying, and I don't see why I shouldn't be able to use the default constructor, I think.
Trying to research the issue yields a lot of clutter related to class constructors, which hasn't been helpful.

Ok, so the answer is you can't stack-allocate C++ objects with constructor arguments in cython, basically at all:
From: https://groups.google.com/forum/#!topic/cython-users/fuKd-nQLpBs
Yes, it's a limitation, but it's a more fundamental issue than the
parser. The construction and destruction of stack-allocated objects in
C++ is intricately tied to their scope, and scoping rules are
different in Python and C. For example, consider
if some_condition():
    x = Foo(1)
else:
    x = Foo(2, 3)
return x.method()
This simply can't be expressed as such in C++. Conversely
if (some_other_condition()) {
    Foo_with_RAII foo(x)
}
...
wouldn't translate "correctly" to Python scoping rules.
Now there are some cases where it could make sense, but significant
code generation changes would have to be made, as currently all
variables are declared at the top of a function (to follow the C89
standard, which some compilers enforce) but in C++ mode we would have
to defer the declaration of the variable to its instantiation
(including avoiding any automatically inserted C-level {} scopes used
for simplicity in code generation).
As one can always allocate such complicated objects on the heap, this
isn't a significant limitation.
This is extra-double-plus annoying, because it means you simply cannot wrap classes that lack default constructors in many cases.
The horrible, no-good hacky workaround is to wrap the constructors in a simple C(++) function, and then expose that via cython.

Related

Cython: efficient custom numpy 1D array for cdef class

Say we have a class in cython that wraps (via a pointer) a C++ class with unknown/variable size in memory:
// poly.h
#include <vector>
class Poly {
public:
    std::vector<int> v;
    // [...] Methods to initialize/add/multiply/... coefficients [...] e.g.,
    Poly(int len, int val) { for (int i = 0; i < len; i++) { this->v.push_back(val); } }
    // p is a reference, so its members are accessed with '.', not '->'
    void add(Poly& p) { for (std::size_t i = 0; i < this->v.size(); i++) { this->v[i] += p.v[i]; } }
};
We can conveniently expose operations like add in PyPoly using operator overloads (e.g., __add__/__iadd__):
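from libcpp.vector cimport vector

cdef extern from "poly.h":
    cdef cppclass Poly:
        vector[int] v
        Poly(int len, int val)
        void add(Poly& p)

#pywrapper.pyx
cdef class PyPoly:
    cdef Poly* c_poly
    # __cinit__ and __dealloc__ are special methods and must be declared with def
    def __cinit__(self, int l, int val):
        self.c_poly = new Poly(l, val)
    def __dealloc__(self):
        del self.c_poly
    def __add__(PyPoly self, PyPoly other):
        cdef PyPoly new_poly = PyPoly(self.c_poly.v.size(), 0)
        # add() takes a reference, so dereference the pointers with [0]
        new_poly.c_poly.add(self.c_poly[0])
        new_poly.c_poly.add(other.c_poly[0])
        return new_poly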
How to create an efficient 1D numpy array with this cdef class?
The naive way I'm using so far involves a np.ndarray of type object, which benefits from the existing operator overloads:
pypoly_arr = np.array([PyPoly(l=10, val=val) for val in range(10)])
pypoly_sum = np.sum(pypoly_arr) # Works thanks to implemented PyPoly.__add__
However, the above solution has to go through python code to understand the data type and the proper way to deal with __add__, which becomes quite cumbersome for big array sizes.
Inspired by https://stackoverflow.com/a/45150611/9670056, I tried with an array wrapper of my own, but I'm not sure how to create a vector[PyPoly], or whether I should instead just hold a vector of borrowed references vector[Poly*], so that the call to np.sum could be handled (and parallelized) at the C++ level.
Any help/suggestions will be highly appreciated! (especially to rework the question/examples to make them as generic as possible & runnable)
This is not possible in Cython. NumPy does not support native Cython classes as a data type: NumPy itself is written in C and is already compiled by the time your Cython code is compiled, so it cannot use your native type directly. It has to go through an indirection, which is made possible by the CPython object type. That has the downside of being slow (mainly because of the indirection itself, but also a bit because of CPython interpreter overheads). Cython does not reimplement NumPy primitives, as that would be a huge amount of work. NumPy only supports a restricted, predefined set of data types. It does support custom user types, but such types are not as powerful as CPython classes (e.g. you cannot reimplement custom operators on items like you did).
Just-in-time (JIT) compiler modules like Numba can theoretically support this because they reimplement NumPy and generate code at runtime. However, the support for JIT classes in Numba is experimental and AFAIK arrays of JIT classes are not yet supported.
Note that you do not need to build an array in this case. A basic loop is faster and uses less memory. Something (untested) like:
cdef int val
cdef PyPoly pypoly_sum
pypoly_sum = PyPoly(l=10, val=0)
for val in range(1, 10):
    pypoly_sum += PyPoly(l=10, val=val)
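The shape of that loop can be tried out without compiling anything. The following is only a sketch: `Poly` here is a pure-Python stand-in for the PyPoly wrapper (an assumption for illustration, not the Cython class), and `sum_polys` is a hypothetical helper name.

```python
class Poly:
    """Pure-Python stand-in for the PyPoly wrapper (illustration only)."""

    def __init__(self, length, val):
        self.coeffs = [val] * length

    def __add__(self, other):
        # Element-wise sum into a fresh object, mirroring PyPoly.__add__.
        result = Poly(len(self.coeffs), 0)
        result.coeffs = [a + b for a, b in zip(self.coeffs, other.coeffs)]
        return result


def sum_polys(length, count):
    """Accumulate polynomials in a plain loop, without building an array."""
    total = Poly(length, 0)
    for val in range(1, count):
        # No __iadd__ is defined, so += falls back to __add__.
        total += Poly(length, val)
    return total


result = sum_polys(10, 10)
print(result.coeffs[:3])
```

Only one accumulator and one temporary are alive at any time, which is where the memory saving over an object array comes from.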

Can one use the while(file >> ...) C++ idiom to read files in Cython?

I'd like to use the C++ way of reading files in Cython.
I have a simple file reader that looks like this:
std::ifstream file(fileName);
while(file >> chromosome >> start >> end >> junk >> junk >> strand)
{ ... }
Can I do this in Cython?
Better options would probably be to use Python parsing functionality (for example pandas' or numpy's) or, if that isn't flexible enough, to code the reader in pure C++ and then call it from Cython.
However, your approach is also possible in Cython, but to make it work one needs to jump through some hoops:
The whole iostream hierarchy isn't part of the provided libcpp wrappers, so one has to wrap it (done quick and dirty, that is only a few lines).
Because the wrapper below declares only the filename constructor of std::ifstream and no nullary one, we cannot construct it as an object with automatic lifetime in Cython (which requires a nullary constructor for that) and need to take care of memory management ourselves.
Another issue is the wrapping of user-defined conversions. It is not very well described in the documentation (see this SO-question), but only operator bool() is supported, so we need to use C++11 (otherwise it is operator void*() const;).
So here is a quick&dirty proof of concept:
%%cython -+ -c=-std=c++11
cdef extern from "<fstream>" namespace "std" nogil:
    cdef cppclass ifstream:
        # constructor
        ifstream (const char* filename)
        # needed operator>> overloads:
        ifstream& operator>> (int& val)
        # others are
        # ifstream& operator>> (unsigned int& val)
        # ifstream& operator>> (long& val)
        # ...
        bint operator bool() # is needed,
                             # so while(file) can be evaluated

def read_with_cpp(filename):
    cdef int a=0,b=0
    cdef ifstream* foo = new ifstream(filename)
    try:
        while (foo[0] >> a >> b):
            print(a, b)
    finally: # don't forget to call destructor!
        del foo
Strictly speaking, the return type of operator>>(...) is not std::ifstream but std::basic_istream; I'm just too lazy to wrap it as well.
And now:
>>> read_with_cpp(b"my_test_file.txt")
prints the content of the file to console.
However, as stated above, I would write the parsing in pure C++ and consume it from Cython (e.g. by passing a functor, so the cpp-code can use Python functionality). Here is a possible implementation:
%%cython -+
cdef extern from *:
    """
    #include <fstream>
    void read_file(const char* file_name, void(*line_callback)(int, int)){
        std::ifstream file(file_name);
        int a,b;
        while(file>>a>>b){
            line_callback(a,b);
        }
    }
    """
    ctypedef void(*line_callback_type)(int, int)
    void read_file(const char* file_name, line_callback_type line_callback)

# use function pointer to get access to Python functionality in cpp-code:
cdef void line_callback(int a, int b):
    print(a,b)

# expose functionality to pure Python:
def read_with_cpp2(filename):
    read_file(filename, line_callback)
and now calling read_with_cpp2(b"my_test_file.txt") leads to the same result as above.
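For comparison, the pure-Python route mentioned at the start of this answer could look roughly like the sketch below. It only approximates the `while(file >> a >> b)` semantics: reading stops at the first token that is not an integer or when a pair is incomplete. The name `read_pairs` and the sample data are hypothetical, and the whole input is read at once for simplicity.

```python
import io


def read_pairs(stream):
    """Yield (a, b) integer pairs from whitespace-separated text.

    Stops at the first token that cannot be parsed as an integer,
    roughly like a failed `file >> a >> b` extraction in C++.
    """
    tokens = stream.read().split()
    # Step over the tokens two at a time; a trailing unpaired token is ignored.
    for i in range(0, len(tokens) - 1, 2):
        try:
            yield int(tokens[i]), int(tokens[i + 1])
        except ValueError:
            return


# An in-memory file stands in for my_test_file.txt.
sample = io.StringIO("1 2\n3 4\n5 six\n7 8\n")
pairs = list(read_pairs(sample))
print(pairs)
```

One difference worth keeping in mind: C++ stream extraction would accept the leading digits of a token such as `12abc`, whereas `int()` rejects the whole token.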

Are non-member operator overloads (specifically operator==) broken in Cython?

I've got a use case I'm testing where, for performance, it's convenient to write a Cython wrapper class for C++'s unordered_map<int, int> type up front, then pass the wrapper to a bunch of Cython functions that use it, rather than always passing around Python dict (keyed and valued by Python ints that fit in C int) and converting on the fly over and over. I'm also trying to bake in some basic functionality to make it behave like a dict at the Python layer, so it can be worked with without converting en masse from Python dict to Cython unordered_map<int, int> wrapper and back.
My problem occurs in trying to implement __eq__ efficiently in the simplest case (comparing one wrapper instance to another). AFAICT, in theory Cython's libcpp.unordered_map pxd definition pretends unordered_map has a member operator==. This answer claims it should be fine, even though the operator== is actually a non-member function, since the definition only needs to exist so Cython knows it can just put a literal == in the code; the C++ compiler will look up the real overload when compiling the extension module.
But in practice, I can't seem to get it to work without additional manual (and hacky) intervention. For experimenting, I've just been using ipython %%cython magic. Right now what I'm doing is (minimized as much as possible while still exhibiting problem):
>>> %load_ext cython
>>> %%cython -+ --verbose --compile-args=/std:c++latest
... from libcpp.unordered_map cimport unordered_map
... from cython.operator cimport dereference as deref
... cdef extern from "<unordered_map>" namespace "std":
...     bint operator==(unordered_map[int, int]&, unordered_map[int, int]&)
... cdef class UII(object):
...     cdef unordered_map[int, int] c_map
...     def __cinit__(self, dict py_map):
...         cdef int k, v
...         for k, v in py_map.iteritems():
...             self.c_map[k] = v
...     def __eq__(UII self, other):
...         if isinstance(other, UII):
...             return self.c_map == (<UII>other).c_map
...         return NotImplemented
...
To be clear, this works right now, e.g.:
>>> pydict = {1:2, 3:4}; ui = UII(pydict); uieq = UII(pydict); uine = UII({1: 2, 4: 3})
>>> ui == uieq # Compares equal for same value UIIs
True
>>> ui == uine # Correctly determines different valued UII not-equal
False
but it only works because I included:
cdef extern from "<unordered_map>" namespace "std":
    bint operator==(unordered_map[int, int]&, unordered_map[int, int]&)
to explicitly define the non-member overload of operator== for the <int, int> template specifically (because Cython doesn't support generically templated functions, only generically templated classes and explicitly declared templates of functions). If I omit those lines, I get the following error from Cython:
[1/1] Cythonizing C:\Users\ShadowRanger\.ipython\cython\_cython_magic_ea9bfadf105ac88c17e000476fd582dc.pyx
Error compiling Cython file:
------------------------------------------------------------
...
        cdef int k, v
        for k, v in py_map.iteritems():
            self.c_map[k] = v
    def __eq__(UII self, other):
        if isinstance(other, UII):
            return self.c_map == (<UII>other).c_map
                              ^
------------------------------------------------------------
C:\Users\ShadowRanger\.ipython\cython\_cython_magic_ea9bfadf105ac88c17e000476fd582dc.pyx:13:30: Invalid types for '==' (unordered_map[int,int], unordered_map[int,int])
indicating that it believes there is no overload for operator== available. Am I doing something wrong that I can fix, or is Cython broken for non-member operator== (and possibly other non-member functions, despite what this answer claims)? I hate explicitly defining every template specialization for non-member functions; it's not a huge burden for this specific case, but it seems odd that Cython would define member overloads, presumably to solve this specific problem, but not actually be able to use them. I checked the actual text of unordered_map.pxd, and it's definitely defined there:
cdef extern from "<unordered_map>" namespace "std" nogil:
    cdef cppclass unordered_map[T, U]:
        # ... irrelevant stuff expunged ...
        bint operator==(unordered_map&, unordered_map&) # Looks properly defined...
Two things are broken about Cython's handling of non-member operators:
1. Non-member operators defined in a .pxd file outside a class aren't correctly cimported to Cython (the workaround is to do from something cimport *) and thus aren't used. This is what I think I said in my previous answer.
2. Non-member operators defined within a class with two arguments (as in the code you showed in unordered_map.pxd) aren't recognised by Cython and aren't used (despite being defined like that all over the C++ standard library wrappers included in Cython). At one point I tried to submit a patch for this but it was ignored. Update: this second point is no longer true; only point 1 now applies.
What does work is to tell Cython that it's a member operator (even if C++ implements it as a non-member operator). Therefore a simple patch to unordered_map.pxd would work. Note that I'm changing it to be defined with one argument and the C++ implicit this/self:
cdef extern from "<unordered_map>" namespace "std" nogil:
    cdef cppclass unordered_map[T, U]:
        # ... irrelevant stuff expunged ...
        bint operator==(unordered_map&)
Alternatively, you can define it yourself before you need to use it (like you're doing currently) but as a template. This at least saves you from having to define every specialization:
cdef extern from "<unordered_map>" namespace "std":
    bint operator==[R,S](unordered_map[R, S]&, unordered_map[R, S]&)
i.e. the statement in your question
(because Cython doesn't support generically templated functions, only generically templated classes and explicitly declared templates of functions)
isn't true.
It is all a bit of a mess, though.

Writing Cython extension: how to access a C struct internal data from Python?

Disclaimer: I took the following example from the Python Cookbook (O'Reilly).
Let's say I have the following simple struct:
typedef struct {
double x,y;
} Point;
with a function that calculates the Euclidean distance between two Points:
extern double distance(Point* p1, Point* p2);
All this is part of a shared library called points:
points.h - the header file
points.c - the source file
libpoints.so - the library file (the Cython extension links against it)
I have created my wrapping Python script (called pypoints.py):
#include "Python.h"
#include "points.h"

// Destructor for a Point instance
static void del_Point(PyObject* obj) {
    // ...
}

// Constructor for a Point instance
static PyObject* py_Point(PyObject* self, PyObject* args) {
    // ...
}

// Wrapper for the distance function
static PyObject* py_distance(PyObject* self, PyObject* args) {
    // ...
}

// Method table
static PyMethodDef PointsMethods[] = {
    {"Point", py_Point, METH_VARARGS, "Constructor for a Point"},
    {"distance", py_distance, METH_VARARGS, "Calculate Euclidean distance between two Points"},
    {NULL, NULL, 0, NULL} // Sentinel marking the end of the table
};

// Module description
static struct PyModuleDef pointsmodule = {
    PyModuleDef_HEAD_INIT,
    "points", // Name of the module; use "import points" to use
    "A module for working with points", // Doc string for the module
    -1,
    PointsMethods // Methods provided by the module
};
Note that this is just an example. For the struct and function above I can easily use ctypes or cffi but I want to learn how to write Cython extensions. The setup.py is not required here so no need to post it.
Now as you can see the constructor above allows us to do
import points
p1 = points.Point(1, 2) # Calls py_Point(...)
p2 = points.Point(-3, 7) # Calls py_Point(...)
dist = points.distance(p1, p2)
It works great. However what if I want to actually access the internals of the Point structure? For example how would I do
print("p1(x: " + str(p1.x) + ", y: " + str(p1.y) + ")")
As you know, a struct's internals can be accessed directly (in C++ terminology, all struct members are public), so in C code we can easily do
Point p1 = {.x = 1., .y = 2.};
printf("p1(x: %f, y: %f)", p1.x, p1.y);
In Python class members (self.x, self.y) can also be accessed without any getters and setters.
I can write functions which act as an intermediate step:
double x(Point* p);
double y(Point* p);
however I am unsure how to wrap these and how to describe their call inside the table of methods.
How can I do that? I want to have a simple p1.x for getting the x of my Point structure in Python.
I was initially a little confused about this question since it seemed to have no Cython content (sorry for the editing mess resulting from that confusion).
The Python cookbook uses Cython in a very odd way which I wouldn't recommend following. For some reason it wants to use PyCapsules which I have never seen used before in Cython.
# tell Cython about what's in "points.h"
# (this does match the cookbook version)
cdef extern from "points.h":
    ctypedef struct Point:
        double x
        double y
    double distance(Point *, Point *)

# Then we describe a class that has a Point member "pt"
cdef class Py_Point:
    cdef Point pt
    def __init__(self,x,y):
        self.pt.x = x
        self.pt.y = y

    # define properties in the normal Python way
    @property
    def x(self):
        return self.pt.x
    @x.setter
    def x(self,val):
        self.pt.x = val
    @property
    def y(self):
        return self.pt.y
    @y.setter
    def y(self,val):
        self.pt.y = val

def py_distance(Py_Point a, Py_Point b):
    return distance(&a.pt,&b.pt) # get addresses of the Point members
You can then compile it and use it from Python as
from whatever_your_module_is_called import *
# create a couple of points
pt1 = Py_Point(1.3,4.5)
pt2 = Py_Point(1.5,0)
print(pt1.x, pt1.y) # access the members
print(py_distance(pt1,pt2)) # distance between the two
In fairness to the Python Cookbook, it then gives a second example that does something very similar to what I've done (but using a slightly older property syntax from when Cython didn't support the Python-like approach). So if you had read a bit further you wouldn't have needed this question. But avoid mixing Cython and PyCapsules: it isn't a sensible solution and I don't know why they recommended it.
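For comparison, the ctypes route that the question mentions gives the same attribute-style access without writing an extension module. This is only a sketch: the `Point` layout mirrors the struct in points.h, but `distance` below is a pure-Python stand-in, since libpoints.so is not assumed to be available here.

```python
import ctypes
import math


class Point(ctypes.Structure):
    # Mirrors `typedef struct { double x, y; } Point;` from points.h.
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]


def distance(p1, p2):
    # Stand-in for the C `distance` function; with the real library one
    # would load libpoints.so via ctypes and pass pointers to the structs.
    return math.hypot(p1.x - p2.x, p1.y - p2.y)


p1 = Point(1.0, 2.0)
p2 = Point(4.0, 0.0)
p2.y = 6.0  # fields can be read and written as plain attributes
dist = distance(p1, p2)
print(p1.x, p1.y, dist)
```

The trade-off is that ctypes resolves everything at run time, whereas the Cython class above is compiled and type-checked against the header.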

Creating a PyCObject pointer in Cython

A few SciPy functions (like scipy.ndimage.interpolation.geometric_transform) can take pointers to C functions as arguments to avoid having to call a Python callable on each point of the input array.
In a nutshell :
Define a function called my_function somewhere in the C module
Return a PyCObject with the &my_function pointer and (optionally) a void* pointer to pass some global data around
The related API method is PyCObject_FromVoidPtrAndDesc, and you can read Extending ndimage in C to see it in action.
I am very interested in using Cython to keep my code more manageable, but I'm not sure how exactly I should create such an object. Any, well,... pointers?
Just do in Cython the same thing you would do in C: call PyCObject_FromVoidPtrAndDesc directly. Here is an example from your link ported to Cython:
###### example.pyx ######
from libc.stdlib cimport malloc, free
from cpython.cobject cimport PyCObject_FromVoidPtrAndDesc

cdef int _shift_function(int *output_coordinates, double* input_coordinates,
                         int output_rank, int input_rank, double *shift_data):
    cdef double shift = shift_data[0]
    cdef int ii
    for ii in range(input_rank):
        input_coordinates[ii] = output_coordinates[ii] - shift
    return 1

cdef void _shift_destructor(void* cobject, void *shift_data):
    free(shift_data)

def shift_function(double shift):
    """This is the function callable from python."""
    cdef double* shift_data = <double*>malloc(sizeof(shift))
    shift_data[0] = shift
    return PyCObject_FromVoidPtrAndDesc(&_shift_function,
                                        shift_data,
                                        &_shift_destructor)
Performance should be identical to the pure C version.
Note that Cython requires the & operator to get a function's address. Also, Cython lacks the pointer dereference operator *; the indexing equivalent is used instead (*ptr -> ptr[0]).
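One caveat for readers on Python 3: the PyCObject API was removed in Python 3.2 in favour of PyCapsule, so newer code would wrap the function pointer in a capsule instead. The sketch below only illustrates the capsule round trip, using ctypes on CPython rather than Cython; the callback `shift_by_half`, its signature, and the capsule name are hypothetical and are not the ndimage callback signature.

```python
import ctypes

# CPython C-API functions, reached through ctypes (CPython only).
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

PyCapsule_GetPointer = ctypes.pythonapi.PyCapsule_GetPointer
PyCapsule_GetPointer.restype = ctypes.c_void_p
PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

# Hypothetical callback type and function, standing in for _shift_function.
CALLBACK = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)


@CALLBACK
def shift_by_half(value):
    return value - 0.5


# The capsule stores the name pointer without copying it,
# so the bytes object must stay alive as long as the capsule does.
CAPSULE_NAME = b"example.shift_by_half"

# Wrap the raw function address in a capsule (no destructor).
address = ctypes.cast(shift_by_half, ctypes.c_void_p).value
capsule = PyCapsule_New(address, CAPSULE_NAME, None)

# Recover the pointer and call through it to confirm the round trip.
recovered = PyCapsule_GetPointer(capsule, CAPSULE_NAME)
result = CALLBACK(recovered)(2.0)
print(type(capsule).__name__, result)
```

In Cython the equivalent call would be made directly against the C API, in the same spirit as the PyCObject example above.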
I think that is a bad idea. Cython was created to avoid writing PyObjects also! Moreover, in this case, writing the code through Cython probably doesn't improve code maintenance...
Anyway, you can import the PyObject with
from cpython.ref cimport PyObject
in your Cython code.
UPDATE
from cpython cimport *
is safer.
Cheers,
Davide
