I am attempting to extend some scikit-learn classes in the sklearn.neighbors.dist_metrics module. However, that entire module is written in Cython, and my custom class apparently must implement a cdef dist method (as opposed to a def dist).
Thus, in my own module (I am using the jupyter %%cython magic for now) I would like to implement my custom class in Cython, providing the cdef dist method as required. However, there are a number of other cdef declarations in the same module, in the sklearn.neighbors.typedefs module, and possibly in other modules, that I need to import. When I try to import those things, I get errors of various sorts.
When I try a naive import:
%%cython
import numpy as np
from sklearn.neighbors.dist_metrics import DistanceMetric
from sklearn.neighbors.typedefs import DTYPE_t, ITYPE_t

cdef class NewDistance(DistanceMetric):
    cdef inline DTYPE_t dist(self, DTYPE_t* x1, DTYPE_t* x2,
                             ITYPE_t size) nogil except -1:
        return 5
I get errors suggesting that once the things have been "pythonized" I cannot use them in a "cython" definition:
Error compiling Cython file:
...
First base of 'NewDistance' is not an extension type
...
'DTYPE_t' is not a type identifier
There is a cimport keyword, so:
%%cython
import numpy as np
from sklearn.neighbors.dist_metrics cimport DistanceMetric
from sklearn.neighbors.typedefs cimport DTYPE_t, ITYPE_t

cdef class NewDistance(DistanceMetric):
    cdef inline DTYPE_t dist(self, DTYPE_t* x1, DTYPE_t* x2,
                             ITYPE_t size) nogil except -1:
        return 5
Annnnndddd...
Error compiling Cython file:
....
'sklearn/neighbors/dist_metrics.pxd' not found
....
'sklearn/neighbors/typedefs.pxd' not found
How does one go about cimporting Cython declarations from other libraries?
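For what it's worth, cimport resolves .pxd files through Cython's include path, not through Python's import machinery, so the usual fix is to put the directory that contains the sklearn/neighbors/*.pxd files (the site-packages root) on that path. A minimal sketch, assuming your sklearn build actually ships its .pxd files:

```python
import os
import sysconfig

# cimport searches Cython's include path for .pxd files; sklearn's live
# under site-packages, so that root directory must be on the path for
# `from sklearn.neighbors.dist_metrics cimport DistanceMetric` to resolve.
site_packages = sysconfig.get_paths()["purelib"]
print(site_packages)

# In a notebook, pass it to the magic:   %%cython -I <site_packages>
# In a setup.py build, it would go in cythonize(..., include_path=[site_packages]).
```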
Related
Suppose I have a .pxd and a .pyx file that both use function arguments like np.ndarray[DTYPE_double_t, ndim=1] weight. If the ctypedef appears in both files (.pyx and .pxd), I get: 'DTYPE_int_t' redeclared
The start of both files contains the following:
import numpy as np
cimport numpy as np
cimport cython
DTYPE_double = np.float64
DTYPE_int = np.int32
ctypedef np.float64_t DTYPE_double_t
ctypedef np.int32_t DTYPE_int_t
From the Cython documentation on .pxd files:
When accompanying an equally named pyx file, they provide a Cython
interface to the Cython module so that other Cython modules can
communicate with it using a more efficient protocol than the Python
one.
This means that copying declarations from the .pxd file to the .pyx file is an error as it will be included automatically. To compile your code, you must remove the duplication.
You have not stated why you wish to duplicate the code, so if it is important for some purpose please explain why so I or others can help you to resolve the issue.
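To make the fix concrete, here is a sketch of the split (file names are hypothetical): the shared ctypedefs go in the .pxd only, and the .pyx picks them up automatically because it has the same name.

```cython
# mymod.pxd -- declarations live here, and only here
cimport numpy as np

ctypedef np.float64_t DTYPE_double_t
ctypedef np.int32_t DTYPE_int_t

# mymod.pyx -- no ctypedef repeated; the equally named .pxd is included automatically
import numpy as np
cimport numpy as np

def double_weight(np.ndarray[DTYPE_double_t, ndim=1] weight):
    return weight * 2.0
```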
I'm trying to wrap a parallel sort written in C++ as a template, to use it with numpy arrays of any numeric type. I'm trying to use Cython to do this.
My problem is that I don't know how to pass a pointer to the numpy array's data (of the correct type) to a C++ template. I believe I should use fused types for this, but I don't quite understand how.
The code in .pyx file is below
# importing the C++ template
cdef extern from "test.cpp":
    void inPlaceParallelSort[T](T* arrayPointer, int arrayLength)

def sortNumpyArray(np.ndarray a):
    # This obviously will not work, but I don't know how to make it work.
    inPlaceParallelSort(a.data, len(a))
In the past I did similar tasks with ugly for-loops over all possible dtypes, but I believe there should be a better way to do this.
Yes, you want to use a fused type to have Cython call the sorting template for the appropriate specialization of the template.
Here's a working example for all non-complex data types that does this with std::sort.
# cython: wraparound = False
# cython: boundscheck = False
cimport cython

cdef extern from "<algorithm>" namespace "std":
    cdef void sort[T](T first, T last) nogil

ctypedef fused real:
    cython.char
    cython.uchar
    cython.short
    cython.ushort
    cython.int
    cython.uint
    cython.long
    cython.ulong
    cython.longlong
    cython.ulonglong
    cython.float
    cython.double

cpdef void npy_sort(real[:] a) nogil:
    # std::sort takes a half-open range [first, last),
    # so the end pointer is one past the final element
    sort(&a[0], &a[0] + a.shape[0])
I am new to Cython and encountered this code snippet:
import numpy as np
cimport numpy as np
testarray = np.arange(5)
cdef np.ndarray[np.int_t, ndim=1] testarray1 = testarray.copy()
cdef np.ndarray[np.float_t, ndim=1] testarray2 = testarray.astype(np.float)
During compilation, it said Buffer types only allowed as function local variables. However, I am using .copy() and .astype(), which return a copy, not a view. Why is this still happening? How can I get around this?
Thanks!
When you define an array in Cython using np.ndarray[Type, ndim], you are using the Python buffer interface, and buffer-typed variables can't be declared at module level. This is a separate issue from views vs. copies of numpy array data.
Typically, if I want an array as a module-level variable (i.e. not local to a function), I define a typed memoryview and then set it within a function, using something like (untested):
import numpy as np
cimport numpy as np

cdef np.int_t[:] testarray1

def init_arrays(np.int_t[:] testarray):
    global testarray1
    testarray1 = testarray.copy()
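Usage from Python space might then look like this (the module name mymod is hypothetical); the memoryview copy keeps the module-level variable alive independently of the caller's array:

```cython
# in Python, after compiling the extension above as `mymod`:
import numpy as np
import mymod

mymod.init_arrays(np.arange(5))   # testarray1 now holds a copy of the data
```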
Let us have a script foo.pyx with a function in it:
from libc.stdlib cimport malloc, free

def hello():
    cdef int* i = <int *> malloc(sizeof(int))
    i[0] = 1
    trol(i)
    print(i[0])
    free(i)
and a script noo.pyx with the function:
cdef trol(int * i):
    i[0] = 42
The question is: how do I import the trol function from noo.pyx into foo.pyx, so I can use it in the hello function?
This is only a model example, but I think it illustrates the problem well enough.
I tried a simple
from noo import trol
but that throws "Cannot convert 'int *' to Python object".
Edit: I should add that this example works just fine if I put both functions in the same file.
This seems like something obvious to try, but did you try:
from noo cimport trol
If you use import instead of cimport, I think it will try to wrap trol as a Python function, which generates the error you're getting.
The solution eventually was to create an additional .pxd file, which is very similar to a classic .h header file in C. It stores function declarations, and when cimport is called, this is the file where Cython looks for functions and structures.
So to be specific, all I needed to do was to create file noo.pxd containing:
cdef trol(int * i)
and then we can simply cimport this function from foo.pyx by calling
from noo cimport trol
What is confusing is that if you want to create an array you use
chunk = np.array([[94., 3.], [44., 4.]], dtype=np.float64)
But if you want to declare the type inside a buffer reference, you use
cdef func1(np.ndarray[np.float64_t, ndim=2] A):
    print A
Notice the difference between np.float64 and np.float64_t.
My Guesses
I am guessing that a type identifier is what's created explicitly with the Cython C-like typedef syntax
ctypedef np.float64_t dtype_t
But the numpy type is just the Python <type 'type'> type .
>>> type(np.float64)
<type 'type'>
The Numpy documentation on dtypes doesn't help me. http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
In your Cython code, you do:
import numpy as np
cimport numpy as np
The first line imports the numpy module in Python space, but the second line just includes numpy.pxd in Cython space.
You can find numpy.pxd in your Cython install folder. It defines float64_t as:
ctypedef double npy_float64
ctypedef npy_float64 float64_t
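A quick way to see the split between the two spaces is a check you can run in plain Python, no Cython needed:

```python
import numpy as np

# float64_t only exists at the C level, via `cimport numpy`;
# the Python-level numpy module has no such attribute
print(hasattr(np, "float64_t"))        # False

# np.float64 is the Python-space counterpart: an ordinary Python type
# that is also accepted anywhere a dtype is expected
print(isinstance(np.float64, type))    # True
print(np.dtype(np.float64).itemsize)   # 8 bytes: a C double
```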