Are non-member operator overloads (specifically operator==) broken in Cython? - python

I've got a use case I'm testing where, for performance, it's convenient to write a Cython wrapper class for C++'s unordered_map<int, int> type up front, then pass the wrapper to a bunch of Cython functions that use it, rather than always passing around Python dict (keyed and valued by Python ints that fit in C int) and converting on the fly over and over. I'm also trying to bake in some basic functionality to make it behave like a dict at the Python layer, so it can be worked with without converting en masse from Python dict to Cython unordered_map<int, int> wrapper and back.
My problem occurs in trying to implement __eq__ efficiently in the simplest case (comparing one wrapper instance to another). AFAICT, in theory Cython's libcpp.unordered_map pxd definition pretends unordered_map has a member operator==. This answer claims it should be fine, even though the operator== is actually a non-member function, since the definition only needs to exist so Cython knows it can just put a literal == in the code; the C++ compiler will look up the real overload when compiling the extension module.
But in practice, I can't seem to get it to work without additional manual (and hacky) intervention. For experimenting, I've just been using ipython %%cython magic. Right now what I'm doing is (minimized as much as possible while still exhibiting problem):
>>> %load_ext cython
>>> %%cython -+ --verbose --compile-args=/std:c++latest
... from libcpp.unordered_map cimport unordered_map
... from cython.operator cimport dereference as deref
... cdef extern from "<unordered_map>" namespace "std":
...     bint operator==(unordered_map[int, int]&, unordered_map[int, int]&)
... cdef class UII(object):
...     cdef unordered_map[int, int] c_map
...     def __cinit__(self, dict py_map):
...         cdef int k, v
...         for k, v in py_map.iteritems():
...             self.c_map[k] = v
...     def __eq__(UII self, other):
...         if isinstance(other, UII):
...             return self.c_map == (<UII>other).c_map
...         return NotImplemented
...
To be clear, this works right now, e.g.:
>>> pydict = {1:2, 3:4}; ui = UII(pydict); uieq = UII(pydict); uine = UII({1: 2, 4: 3})
>>> ui == uieq # Compares equal for same value UIIs
True
>>> ui == uine # Correctly determines different valued UII not-equal
False
but it only works because I included:
cdef extern from "<unordered_map>" namespace "std":
    bint operator==(unordered_map[int, int]&, unordered_map[int, int]&)
to explicitly define the non-member overload of operator== for the <int, int> template specifically (because Cython doesn't support generically templated functions, only generically templated classes and explicitly declared templates of functions). If I omit those lines, I get the following error from Cython:
[1/1] Cythonizing C:\Users\ShadowRanger\.ipython\cython\_cython_magic_ea9bfadf105ac88c17e000476fd582dc.pyx
Error compiling Cython file:
------------------------------------------------------------
...
cdef int k, v
for k, v in py_map.iteritems():
self.c_map[k] = v
def __eq__(UII self, other):
if isinstance(other, UII):
return self.c_map == (<UII>other).c_map
^
------------------------------------------------------------
C:\Users\ShadowRanger\.ipython\cython\_cython_magic_ea9bfadf105ac88c17e000476fd582dc.pyx:13:30: Invalid types for '==' (unordered_map[int,int], unordered_map[int,int])
indicating that it believes there is no overload for operator== available. Am I doing something wrong that I can fix, or is Cython broken for non-member operator== (and possibly other non-member functions, despite what this answer claims)? I hate explicitly defining every template specialization for non-member functions; it's not a huge burden for this specific case, but it seems odd that Cython would define member overloads, presumably to solve this specific problem, but not actually be able to use them. I checked the actual text of unordered_map.pxd, and it's definitely defined there:
cdef extern from "<unordered_map>" namespace "std" nogil:
    cdef cppclass unordered_map[T, U]:
        # ... irrelevant stuff expunged ...
        bint operator==(unordered_map&, unordered_map&)  # Looks properly defined...

Two things are broken about Cython's handling of non-member operators:
1. Non-member operators defined in a .pxd file outside a class aren't correctly cimported to Cython (the workaround is to do from something cimport *) and thus aren't used. This is what I think I said in my previous answer.
2. Non-member operators defined within a class with two arguments (as in the code you showed in unordered_map.pxd) aren't recognised by Cython and aren't used (despite being defined like that all over the C++ standard library wrappers included in Cython). At one point I tried to submit a patch for this but it was ignored. Update: this is no longer true; only point 1 now applies.
What does work is to tell Cython that it's a member operator (even if C++ implements it as a non-member operator). Therefore a simple patch to unordered_map.pxd would work. Note that I'm changing it to be defined with one argument and the C++ implicit this/self:
cdef extern from "<unordered_map>" namespace "std" nogil:
    cdef cppclass unordered_map[T, U]:
        # ... irrelevant stuff expunged ...
        bint operator==(unordered_map&)
Alternatively, you can define it yourself before you need to use it (like you're doing currently) but as a template. This at least saves you having to define every specialization
cdef extern from "<unordered_map>" namespace "std":
    bint operator==[R,S](unordered_map[R, S]&, unordered_map[R, S]&)
i.e. the statement in your question
(because Cython doesn't support generically templated functions, only generically templated classes and explicitly declared templates of functions)
isn't true.
It is all a bit of a mess though
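For context, the literal == that Cython writes into the generated code is resolved by the C++ compiler against the standard library's non-member template, which is why one declaration on the Cython side is enough. A minimal C++ sketch of that lookup (the helper name maps_equal is made up for illustration and is not part of Cython or the standard library):

```cpp
#include <unordered_map>

// Hypothetical helper: the == below binds to the non-member template
// operator== that the standard library provides for std::unordered_map,
// whatever the key and value types are.
template <typename K, typename V>
bool maps_equal(const std::unordered_map<K, V>& lhs,
                const std::unordered_map<K, V>& rhs) {
    return lhs == rhs;
}
```

Equality here ignores insertion order, so two maps built from the same pairs in a different order still compare equal.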

Related

Can one use the while(file >> ...) C++ idiom to read files in Cython?

I'd like to use the C++ way of reading files in Cython.
I have a simple file reader that looks like this:
std::ifstream file(fileName);
while(file >> chromosome >> start >> end >> junk >> junk >> strand)
{ ... }
Can I do this in Cython?
Better options would probably be to use Python parsing functionality (for example pandas' or numpy's) or, if that isn't flexible enough, to code the reader in pure C++ and then call it from Cython.
However, your approach is also possible in Cython, but to make it work one needs to jump through some hoops:
The whole iostream hierarchy isn't part of the provided libcpp wrappers, so one has to wrap it (done quick and dirty, that is only a few lines).
Because the ifstream wrapper below declares only the filename constructor and no default constructor, we cannot construct it as an object with automatic lifetime in Cython and need to take care of memory management ourselves.
Another issue is wrapping user-defined conversions. It is not very well described in the documentation (see this SO question), but only operator bool() is supported, so we need to use C++11 (before that, the conversion is operator void*() const).
So here is a quick&dirty proof of concept:
%%cython -+ -c=-std=c++11
cdef extern from "<fstream>" namespace "std" nogil:
    cdef cppclass ifstream:
        # constructor
        ifstream (const char* filename)
        # needed operator>> overloads:
        ifstream& operator>> (int& val)
        # others are
        # ifstream& operator>> (unsigned int& val)
        # ifstream& operator>> (long& val)
        # ...
        bint operator bool()  # is needed,
        # so while(file) can be evaluated
def read_with_cpp(filename):
    cdef int a=0,b=0
    cdef ifstream* foo = new ifstream(filename)
    try:
        while (foo[0] >> a >> b):
            print(a, b)
    finally:  # don't forget to call destructor!
        del foo
Actually, the return type of operator>>(...) is not std::ifstream but std::basic_istream; I'm just too lazy to wrap it as well.
And now:
>>> read_with_cpp(b"my_test_file.txt")
prints the content of the file to console.
However, as stated above, I would write the parsing in pure C++ and consume it from Cython (e.g. by passing a function pointer, so the C++ code can use Python functionality). Here is a possible implementation:
%%cython -+
cdef extern from *:
    """
    #include <fstream>
    void read_file(const char* file_name, void(*line_callback)(int, int)){
        std::ifstream file(file_name);
        int a,b;
        while(file>>a>>b){
            line_callback(a,b);
        }
    }
    """
    ctypedef void(*line_callback_type)(int, int)
    void read_file(const char* file_name, line_callback_type line_callback)
# use function pointer to get access to Python functionality in cpp-code:
cdef void line_callback(int a, int b):
    print(a,b)
# expose functionality to pure Python:
def read_with_cpp2(filename):
    read_file(filename, line_callback)
and now calling read_with_cpp2(b"my_test_file.txt") leads to the same result as above.
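If the C++ side should be testable without a file on disk, one possible variation is to have the reader accept any std::istream and to pass a user-data pointer through to the callback. This is only a sketch: read_pairs, pair_callback and collect_pair are names invented here, not part of the answer's code.

```cpp
#include <istream>
#include <sstream>
#include <utility>
#include <vector>

// Hypothetical callback type: receives one parsed pair plus an opaque pointer.
using pair_callback = void (*)(int, int, void*);

// Hypothetical variant of read_file that reads from any input stream.
inline void read_pairs(std::istream& in, pair_callback cb, void* user_data) {
    int a, b;
    // (in >> a >> b) yields the stream itself; its explicit operator bool()
    // (C++11) is what lets it serve as the loop condition.
    while (in >> a >> b) {
        cb(a, b, user_data);
    }
}

// Example callback: appends each pair to a vector passed via user_data.
inline void collect_pair(int a, int b, void* user_data) {
    auto* out = static_cast<std::vector<std::pair<int, int>>*>(user_data);
    out->emplace_back(a, b);
}
```

The loop stops at the first token that does not parse as an int, which is the same behaviour as the while(file >> ...) idiom in the question.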

Directly call C++ struct constructor from cython

I'm trying to wrap some C++ code that uses structs with constructors, and not figuring out how.
C++ Struct:
typedef struct point_3d_t
{
    double x;
    double y;
    double z;
    point_3d_t(double x, double y, double z)
        : x(x)
        , y(y)
        , z(z)
    {}
} point_3d;
Cython wrapper:
cdef extern from "./cppdar.hpp":
    ctypedef struct point_3d:
        point_3d(double, double, double)
        double x;
        double y;
        double z;
Now, I'd expect to be able to construct the struct via something like cdef point_3d p1(v, v, v) (from within the cython file), but I can't seem to figure out how to get cython to just use the provided constructor.
I've tried:
cdef point_3d p1(v, v, v)
cdef point_3d p1 = point_3d(v, v, v)
cdef point_3d p1(0, 0, 0)
cdef point_3d p1 = point_3d(0, 0, 0)
Where v is an explicit cdef double v = 0, but none work.
The fallback is plain cdef point_3d p1 followed by p1.x = nnnn, etc., but that's annoying, and I don't see why I shouldn't be able to use the provided constructor.
Trying to research the issue yields a lot of clutter related to class constructors, which hasn't been helpful.
Ok, so the answer is you can't stack-allocate C++ objects with constructor arguments in cython, basically at all:
From: https://groups.google.com/forum/#!topic/cython-users/fuKd-nQLpBs
Yes, it's a limitation, but it's a more fundamental issue than the
parser. The construction and destruction of stack-allocated objects in
C++ is intricately tied to their scope, and scoping rules are
different in Python and C. For example, consider
    if some_condition():
        x = Foo(1)
    else:
        x = Foo(2, 3)
    return x.method()
This simply can't be expressed as such in C++. Conversely
    if (some_other_condition()) {
        Foo_with_RIAA foo(x)
    }
    ...
wouldn't translate "correctly" Python scoping rules.
Now there are some cases where it could make sense, but significant
code generation changes would have to be made, as currently all
variables are declared at the top of a function (to follow the C89
standard, which some compilers enforce) but in C++ mode we would have
to defer the declaration of the variable to it's instantiation
(including avoiding any automatically inserted C-level {} scopes used
for simplicity in code generation).
As one can always allocate such complicated objects on the heap, this
isn't a significant limitation.
This is extra-double-plus annoying, because it means you simply cannot wrap classes that lack default constructors in many cases.
The horrible, no-good hacky workaround is to wrap the constructors in a simple C(++) function, and then expose that via cython.
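A minimal sketch of what such a wrapper could look like on the C++ side, assuming the struct from the question. The helper names point_3d_new and point_3d_delete are invented for illustration. Handing back a heap pointer means the Cython side only ever holds a point_3d*, so it never needs a default constructor:

```cpp
// The struct from the question, repeated so the sketch is self-contained.
typedef struct point_3d_t {
    double x;
    double y;
    double z;
    point_3d_t(double x, double y, double z) : x(x), y(y), z(z) {}
} point_3d;

// Hypothetical factory: forwards its arguments to the real constructor.
inline point_3d* point_3d_new(double x, double y, double z) {
    return new point_3d(x, y, z);
}

// Hypothetical matching cleanup function; the caller owns the pointer.
inline void point_3d_delete(point_3d* p) {
    delete p;
}
```

These two functions can then be declared in a cdef extern block like any other C function, with the delete call placed in a finally clause or in __dealloc__.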

Using Cython to wrap a c++ template to accept any numpy array

I'm trying to wrap a parallel sort written in c++ as a template, to use it with numpy arrays of any numeric type. I'm trying to use Cython to do this.
My problem is that I don't know how to pass a pointer to the numpy array data (of a correct type) to a c++ template. I believe I should use fused dtypes for this, but I don't quite understand how.
The code in .pyx file is below
# importing c++ template
cdef extern from "test.cpp":
    void inPlaceParallelSort[T](T* arrayPointer,int arrayLength)
def sortNumpyArray(np.ndarray a):
    # This obviously will not work, but I don't know how to make it work.
    inPlaceParallelSort(a.data, len(a))
In the past I did similar tasks with ugly for-loops over all possible dtypes, but I believe there should be a better way to do this.
Yes, you want to use a fused type to have Cython call the sorting template for the appropriate specialization of the template.
Here's a working example for all non-complex data types that does this with std::sort.
# cython: wraparound = False
# cython: boundscheck = False
cimport cython
cdef extern from "<algorithm>" namespace "std":
    cdef void sort[T](T first, T last) nogil
ctypedef fused real:
    cython.char
    cython.uchar
    cython.short
    cython.ushort
    cython.int
    cython.uint
    cython.long
    cython.ulong
    cython.longlong
    cython.ulonglong
    cython.float
    cython.double
# real[::1] requires a C-contiguous buffer, which the raw pointer range assumes
cpdef void npy_sort(real[::1] a) nogil:
    # std::sort takes a half-open range [first, last): pass one past the final element
    if a.shape[0] > 1:
        sort(&a[0], &a[0] + a.shape[0])
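On the C++ side, a template with the same pointer-plus-length shape as the question's inPlaceParallelSort might look like the sketch below. The name sort_buffer is hypothetical, and plain std::sort stands in for the parallel sort. Note that std::sort expects a half-open range, so the second pointer must be one past the final element:

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for inPlaceParallelSort: sorts `length` elements
// starting at `data`, in place, in ascending order.
template <typename T>
void sort_buffer(T* data, std::size_t length) {
    // [data, data + length) is the half-open range covering every element.
    std::sort(data, data + length);
}
```

Each fused-type specialization in Cython then maps onto one instantiation of this template.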

Passing a Cython extension type method to a pure C function

I have a Cython extension type, let's call it Foo, and I'd like to be able to pass an 'instance' of it (isn't it like a struct?) in to a pure-C function so I can call a method of the extension type from C. I've tried piecing together how I might do this from various SO and mailing list posts but haven't been able to get it to work. My C skills are fairly rusty...
I know the below is wrong for a number of reasons, but hopefully it at least illustrates what I'd like to do...Thanks for your patience!
test.c:
#include <stdlib.h>
#include <stdio.h>
void cfunc(void* foo, double derp) {
    double r = foo.bar(derp);
    printf("%f", r);
}
cy_test.pyx:
import cython
cimport cython
cdef extern from "/tmp/test.c":
    void cfunc(void*, double)
cdef public class Foo[type FooType, object Foo]:
    cdef double spam
    def __init__(self, spam):
        self.spam = spam
    cdef public double bar(self, double q) nogil:
        q *= 15. + self.spam
        return q
cpdef call_func():
    foo = Foo(1.5)
    cfunc(&foo, 10.)
call_func()
Python objects are of type PyObject*, which can be manipulated via the Python/C API. You'll probably find PyObject_CallMethod() and its relatives most useful here.
You may find it easier to write a C function in Cython that does what you want (i.e. a cdef function), and then call that from C code. That way, Cython does all the annoying PyObject* fiddling (including reference counting) and you just get your desired result.
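One way to picture the second suggestion is the usual "opaque pointer plus function pointer" pattern: the C side never looks inside the object, it only hands the pointer back to a shim that knows the real type. The sketch below is an assumption-laden illustration, not Cython output: a plain struct stands in for the extension type, foo_bar plays the role of the cdef shim, and cfunc gains an extra function-pointer parameter that the original does not have.

```cpp
#include <cstdio>

// Hypothetical stand-in for the Cython extension type Foo.
struct Foo {
    double spam;
};

// Plays the role of a cdef shim: casts the opaque pointer back to Foo
// and performs the same computation as Foo.bar in the question.
inline double foo_bar(void* self, double q) {
    const Foo* foo = static_cast<const Foo*>(self);
    return q * (15.0 + foo->spam);
}

// Signature of the shim as seen from the C side.
using bar_fn = double (*)(void*, double);

// C-style function: treats foo as opaque and calls back through bar.
inline double cfunc(void* foo, bar_fn bar, double derp) {
    double r = bar(foo, derp);
    std::printf("%f\n", r);
    return r;
}
```

In the real Cython version the shim would cast the void* to Foo and call self.bar(q), and the caller would have to keep the object alive for the duration of the call.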

Creating a PyCObject pointer in Cython

A few SciPy functions (like scipy.ndimage.interpolation.geometric_transform) can take pointers to C functions as arguments to avoid having to call a Python callable on each point of the input array.
In a nutshell :
Define a function called my_function somewhere in the C module
Return a PyCObject with the &my_function pointer and (optionally) a void* pointer to pass some global data around
The related API method is PyCObject_FromVoidPtrAndDesc, and you can read Extending ndimage in C to see it in action.
I am very interested in using Cython to keep my code more manageable, but I'm not sure how exactly I should create such an object. Any, well,... pointers?
Just do in Cython the same thing you would do in C, call PyCObject_FromVoidPtrAndDesc directly. Here is an example from your link ported to Cython:
###### example.pyx ######
from libc.stdlib cimport malloc, free
from cpython.cobject cimport PyCObject_FromVoidPtrAndDesc
cdef int _shift_function(int *output_coordinates, double* input_coordinates,
                         int output_rank, int input_rank, double *shift_data):
    cdef double shift = shift_data[0]
    cdef int ii
    for ii in range(input_rank):
        input_coordinates[ii] = output_coordinates[ii] - shift
    return 1
cdef void _shift_destructor(void* cobject, void *shift_data):
    free(shift_data)
def shift_function(double shift):
    """This is the function callable from python."""
    cdef double* shift_data = <double*>malloc(sizeof(shift))
    shift_data[0] = shift
    return PyCObject_FromVoidPtrAndDesc(&_shift_function,
                                        shift_data,
                                        &_shift_destructor)
Performance should be identical to pure C version.
Note that Cython requires the & operator to take the function's address. Also, Cython lacks the pointer dereference operator *; the indexing equivalent is used instead (*ptr -> ptr[0]).
I think that is a bad idea. Cython was created to avoid writing PyObjects also! Moreover, in this case, writing the code through Cython probably doesn't improve code maintenance...
Anyway, you can import the PyObject with
from cpython.ref cimport PyObject
in your Cython code.
UPDATE
from cpython cimport *
is safer.
Cheers,
Davide
