numpy array with cython - python

I'm trying to port some python code to cython and I'm encountering some minor problems.
Below you see a code snippet (simplified example) of the code.
cimport numpy as np
cimport cython
#cython.boundscheck(False) # turn off bounds-checking for entire function
#cython.wraparound(False)
#cython.nonecheck(False)
def Interpolation(cells, int nmbcellsx):
    cdef np.ndarray[float, ndim=1] celle
    cdef int cellnonzero
    cdef int i, l
    for i in range(nmbcellsx):
        celle = cells[i].e
        cellnonzero = cells[i].nonzero
        for l in range(cellnonzero):
            celle[l] = celle[l] * celle[l]
I don't understand why the innermost loop does not fully translate to C code (i.e. the last line, celle[l] = ...), according to the annotated output from cython -a.
What am I missing here?
Thanks a lot.

I finally realized that a simple "return 0" at the very end of the function solves this problem. However, this behaviour seems quite strange to me. Is this actually a bug?

Related

Cython error on import: C function has wrong signature

As part of a larger Cython project, I am getting an error that seems to indicate that the automatic type merging may be failing. It is possible that I am doing something silly, or that structuring things this way is a code smell. I think I can get around the issue by using void* and casts everywhere, but this seems quite messy for what should be a simple thing.
It seems similar to this issue, "Wrong signature error using function types in multiple modules", but that one involved function pointers.
I have put together a minimal example that demonstrates the issue. The files are as follows.
a.pxd
cdef struct A:
    int a
cdef A makeA(int a)
a.pyx
cdef struct A:
    pass
cdef A makeA(int a):
    cdef A sA
    sA.a = a
    return sA
b.pyx
from cfr.a cimport A, makeA
cdef int get_A_value():
    cdef A sA = makeA(5)
    return sA.a
If I compile and then try to import b in a jupyter-lab notebook I get the following error:
b.pyx in init b()
TypeError: C function cfr.a.makeA has wrong signature (expected struct __pyx_t_3cfr_1a_A (int), got struct __pyx_t_1a_A (int))
EDIT: It looks like it might have something to do with my directory structure. Note that I do from cfr.a cimport A (as this happens to be the layout of my project). I think I need to do from a cimport A and get the import to work through my PYTHONPATH.
So it turned out the issue was in the from cfr.a cimport A line. Somehow this meant that the A type was not seen as the same type as A in the a.pyx file.
To fix use:
from a cimport A
(You may then get a "not found" error, but this can be fixed by adjusting PYTHONPATH.)
In the comments, adding an __init__.pxd file was suggested, but adding a blank one did not work for me.
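The two type names in the error message hint at why the cimport path matters: each appears to embed the module's qualified name, with every component prefixed by its length. The sketch below reproduces that pattern in plain Python. It is inferred only from the two names shown in the TypeError above and is not Cython's actual implementation; the helper name mangled_struct_name is hypothetical.

```python
def mangled_struct_name(qualified_module: str, struct_name: str) -> str:
    """Rebuild the struct type name pattern seen in the error message.

    Assumption: each dotted module component is written as <length><name>
    and the pieces are joined with underscores. This mirrors the message,
    not Cython's source code.
    """
    components = "_".join(
        f"{len(part)}{part}" for part in qualified_module.split(".")
    )
    return f"__pyx_t_{components}_{struct_name}"


# Name b.pyx expects, because it cimports from the package path cfr.a.
expected_by_b = mangled_struct_name("cfr.a", "A")
# Name a.pyx actually provides, because it was built as top-level module "a".
compiled_in_a = mangled_struct_name("a", "A")

print(expected_by_b)
print(compiled_in_a)
print(expected_by_b == compiled_in_a)
```

The two names differ, which is consistent with the signature check failing at import time and with the fix of cimporting from a rather than cfr.a.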

Fast indexing: Cython with numpy array of bool and str

I am trying to speed up a Python script. I have profiled the code and re-factored quite a lot already in pure Python. It seems that I am still spending a lot of time in accessing some numpy arrays in a way that looks like:
KeyArray[BoolArray[index]]
where KeyArray is ndim=2 and contains strings, BoolArray is ndim=1 and contains bool and index is an int.
I am trying to learn Cython to see how much faster it could be. I wrote the following script, which does not work:
import numpy as np
cimport numpy as np
def fastindexer(np.ndarray[np.str_t, ndim=1] KeyArray, np.ndarray[np.bool_t, ndim=2] BoolArray, np.int_t DateIndex):
    cdef np.ndarray[np.str_t, ndim=1] FArray = KeyArray[BoolArray[DateIndex]]
    return FArray
I understand that types str/bool are not available 'as is' in np arrays. I tried to cast as well but I don't understand how this should be written.
All help welcome
As @Joe said, moving a single indexing statement to Cython won't give you any speedup. If you decide to move more of your program to Cython, you need to fix a number of problems.
1) You use def instead of cdef, limiting you to Python-only functionality.
2) You use the old buffer syntax. Read about memoryviews
3) Slicing a 2-D array is slow because a new memoryview is created each time. That said, it still is a lot faster than Python, but for peak performance you would have to use a different approach.
Here's something to get you started.
cpdef func():
    cdef int i
    cdef bool[:] my_bool_array = np.zeros(10, dtype=bool)
    # I'm not sure if this next line is correct
    cdef char[:,:] my_string_array = np.chararray((10, 10))
    cdef char answer
    for i in range(10):
        answer = my_string_array[ my_bool_array[i] ]
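Before moving this lookup to Cython, it can help to pin down what the indexing does in plain NumPy. The sketch below assumes the shapes from the fastindexer signature (a 1-D array of strings and a 2-D array of booleans); the sample data and variable names are made up for illustration. It also shows that a bool array can be viewed as uint8 without copying, which is one commonly used way to hand boolean data to typed code.

```python
import numpy as np

# Hypothetical sample data with the shapes used in the fastindexer signature.
key_array = np.array(["AAA", "BBB", "CCC", "DDD"])      # 1-D, strings
bool_array = np.array([[True, False, True, False],
                       [False, True, True, True]])      # 2-D, one mask per row
date_index = 1

# Boolean-mask indexing: keep the keys where the selected mask row is True.
selected = key_array[bool_array[date_index]]

# Reinterpret the booleans as unsigned bytes; no data is copied.
mask_u8 = bool_array.view(np.uint8)

# The same selection expressed through integer positions.
positions = np.flatnonzero(mask_u8[date_index])
selected_by_position = key_array[positions]

print(selected.tolist())
print(np.shares_memory(mask_u8, bool_array))
```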

Cython's `cdef` raises a NameError where a `def` works just fine

Maybe I am understanding the cdef for function definition incorrectly. E.g., assume I want to write a function to convert a Python list to a C array:
%%cython
cimport cython
from libc.stdlib cimport malloc, free
cdef int* list_to_array(a_list):
    """ Converts a Python list into a C array """
    cdef int *c_array
    cdef int count, n
    c_array = <int *>malloc(len(a_list)*cython.sizeof(int))
    count = len(a_list)
    for i in range(count):
        c_array[i] = a_list[i]
    return c_array
When I now call the function via
list_to_array([1,2,3])
I get a
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-32-8f3f777d7883> in <module>()
----> 1 list_to_array([1,2,3])
NameError: name 'list_to_array' is not defined
However, when I just use def, the function can be called (although it doesn't return what I want; it is only meant to illustrate my problem):
%%cython
cimport cython
from libc.stdlib cimport malloc, free
def list_to_array1(a_list):
    """ Converts a Python list into a C array """
    cdef int *c_array
    cdef int count, n
    c_array = <int *>malloc(len(a_list)*cython.sizeof(int))
    count = len(a_list)
    for i in range(count):
        c_array[i] = a_list[i]
    return 1
list_to_array1([1,2,3])
1
When I tried to use cpdef instead of cdef, I encountered a different issue:
Error compiling Cython file:
------------------------------------------------------------
...
cimport cython
from libc.stdlib cimport malloc, free
cpdef int* list_to_carray(a_list):
^
------------------------------------------------------------
/Users/sebastian/.ipython/cython/_cython_magic_c979dc7a52cdfb492e901a4b337ed2d2.pyx:3:6: Cannot convert 'int *' to Python object
Citing the docs: "The cdef statement is used to make C level declarations".
If you scroll down a bit there, you will see that cdef functions cannot be called from Python, hence your NameError. Try using cpdef.
Note that if you plan to use that function in Python code, it will leak memory. You may also want to have a look at this answer on how and why you should pass a list to, and return a list from, Cython (disclaimer: the answer is mine) to avoid the leak.
EDIT, in reply to updated question:
The error you get once you introduce cpdef happens because a pointer cannot be converted to a Python object in a trivial way. Cython does the hard work for you in the simplest cases, see here. The question you should ask here is why you want to return a C pointer to the Python environment, which does not provide pointers.
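One way to avoid handing a raw pointer back to Python is to return an object that owns a C buffer. As a sketch only (this is not taken from the answer above), the standard-library array module stores values as a contiguous block of C ints and releases it automatically; the function name list_to_int_array is hypothetical.

```python
from array import array


def list_to_int_array(a_list):
    """Copy a Python list of integers into a contiguous buffer of C ints."""
    # Typecode "i" means C signed int; the array object owns the memory,
    # so there is nothing to free by hand.
    return array("i", a_list)


c_ints = list_to_int_array([1, 2, 3])
print(c_ints.typecode, c_ints.tolist())
```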

Creating a PyCObject pointer in Cython

A few SciPy functions (like scipy.ndimage.interpolation.geometric_transform) can take pointers to C functions as arguments to avoid having to call a Python callable on each point of the input array.
In a nutshell:
Define a function called my_function somewhere in the C module
Return a PyCObject with the &my_function pointer and (optionally) a void* pointer to pass some global data around
The related API method is PyCObject_FromVoidPtrAndDesc, and you can read Extending ndimage in C to see it in action.
I am very interested in using Cython to keep my code more manageable, but I'm not sure how exactly I should create such an object. Any, well,... pointers?
In Cython, do the same thing you would do in C: call PyCObject_FromVoidPtrAndDesc directly. Here is an example from your link, ported to Cython:
###### example.pyx ######
from libc.stdlib cimport malloc, free
from cpython.cobject cimport PyCObject_FromVoidPtrAndDesc
cdef int _shift_function(int *output_coordinates, double* input_coordinates,
                         int output_rank, int input_rank, double *shift_data):
    cdef double shift = shift_data[0]
    cdef int ii
    for ii in range(input_rank):
        input_coordinates[ii] = output_coordinates[ii] - shift
    return 1
cdef void _shift_destructor(void* cobject, void *shift_data):
    free(shift_data)
def shift_function(double shift):
    """This is the function callable from python."""
    cdef double* shift_data = <double*>malloc(sizeof(shift))
    shift_data[0] = shift
    return PyCObject_FromVoidPtrAndDesc(&_shift_function,
                                        shift_data,
                                        &_shift_destructor)
Performance should be identical to the pure C version.
Note that Cython requires the & operator to get a function's address. Also, Cython lacks the pointer dereference operator *; the equivalent indexing form is used instead (*ptr -> ptr[0]).
I think that is a bad idea. Cython was also created to avoid writing PyObjects by hand! Moreover, in this case, writing the code in Cython probably doesn't make it easier to maintain...
Anyway, you can import the PyObject with
from cpython.ref cimport PyObject
in your Cython code.
UPDATE
from cpython cimport *
is safer.
Cheers,
Davide

cython with array of pointers

I have a list of numpy.ndarrays (of different lengths) in python and need to have very fast access to those in python. I think an array of pointers would do the trick. I tried:
float_type_t* list_of_arrays[no_of_arrays]
for data_array in python_list_of_arrays:
    list_of_arrays[0] = data_array
But cython complains:
no_of_arrays < Not allowed in a constant expression
I have tried several ways to constify this variable:
cdef extern from *:
ctypedef int const_int "const int"
(there have been more creative attempts), but unfortunately it doesn't work.
Please help.
Why don't you use a numpy object array instead of a list of arrays?
I think the problem you're having is because you are declaring list_of_arrays on the stack, so its size must be known at compile time. You can try some dynamic allocation, like this:
from libc.stdlib cimport malloc, free
# an array of pointers has type float_type_t **, not float_type_t *
cdef float_type_t **list_of_arrays = \
    <float_type_t **>malloc(no_of_arrays * sizeof(float_type_t*))
for i in range(no_of_arrays):
    list_of_arrays[i] = &(data_array[i].data)
# don't forget to free list_of_arrays!
(This assumes data_array is a numpy array.)
But this is still guessing a bit what you want to do.
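The object-array suggestion above can be sketched in plain Python as follows. The sample arrays are made up, and python_list_of_arrays simply reuses the name from the question. The second half shows a different layout that is not from this thread: all values in one contiguous buffer plus an offsets array, assumed here as an alternative when the arrays have different lengths. The helper get_array is hypothetical.

```python
import numpy as np

# Hypothetical input: arrays of different lengths.
python_list_of_arrays = [
    np.array([1.0, 2.0, 3.0]),
    np.array([4.0, 5.0]),
    np.array([6.0, 7.0, 8.0, 9.0]),
]

# Object array: each slot holds a reference to one of the original arrays.
object_array = np.empty(len(python_list_of_arrays), dtype=object)
for k, data_array in enumerate(python_list_of_arrays):
    object_array[k] = data_array

# Alternative (not from the thread): one flat buffer plus start/end offsets.
flat = np.concatenate(python_list_of_arrays)
lengths = [len(a) for a in python_list_of_arrays]
offsets = np.concatenate(([0], np.cumsum(lengths)))


def get_array(k):
    """Return a view of the k-th original array inside the flat buffer."""
    return flat[offsets[k]:offsets[k + 1]]


print(object_array[1])
print(get_array(2))
```

The flat layout keeps all values in a single allocation, so typed code would only need one data pointer and the offsets, at the cost of copying the data once.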
