I am trying to speed up a Python script. I have profiled the code and refactored quite a lot already in pure Python. It seems that I am still spending a lot of time accessing some numpy arrays in a pattern that looks like:
KeyArray[BoolArray[index]]
where KeyArray is ndim=2 and contains strings, BoolArray is ndim=1 and contains bool and index is an int.
I am trying to learn Cython to see how much faster it could be. I wrote the following script, which does not work:
import numpy as np
cimport numpy as np

def fastindexer(np.ndarray[np.str_t, ndim=1] KeyArray,
                np.ndarray[np.bool_t, ndim=2] BoolArray,
                np.int_t DateIndex):
    cdef np.ndarray[np.str_t, ndim=1] FArray = KeyArray[BoolArray[DateIndex]]
    return FArray
I understand that the types str/bool are not available 'as is' in numpy arrays. I tried casting as well, but I don't understand how this should be written.
All help welcome
As @Joe said, moving a single indexing statement to Cython won't gain you any speed. If you decide to move more of your program to Cython, you need to fix a number of problems:
1) You use def instead of cdef, limiting you to Python-only functionality.
2) You use the old buffer syntax. Read about typed memoryviews.
3) Slicing a 2-D array is slow because a new memoryview is created each time. That said, it still is a lot faster than Python, but for peak performance you would have to use a different approach.
Here's something to get you started.
import numpy as np

cpdef func():
    cdef int i
    # there is no native bool memoryview, so view the bool data as unsigned chars
    cdef unsigned char[:] my_bool_array = np.zeros(10, dtype=bool).view(np.uint8)
    # single-byte "strings" can be held as a 2-D array of chars
    cdef char[:, :] my_string_array = np.zeros((10, 10), dtype=np.int8)
    cdef char answer
    for i in range(10):
        answer = my_string_array[i, my_bool_array[i]]
I'm trying to send float16 data to an Nvidia P100 card from some Cython code. When I was using float32, I could define my types in Cython like so:
DTYPE = np.float32
ctypedef np.float32_t DTYPE_t
cdef np.ndarray[DTYPE_t, ndim=2] mat = np.empty((100, 100), dtype=DTYPE)
But Cython can't find a defined type for np.float16_t, so I can't just replace the 32 with 16. If I try to provide another type that takes the same amount of space, like np.uint16_t, I get errors like:
Does not understand character buffer dtype format string ('e')
When I Google, all I can find is a thread from 2011 of people trying to figure out how to support it... surely there must be a solution by now?
I think the answer is "sort of, but it's a reasonable amount of work if you want to do any real calculations".
The basic problem is that C doesn't support a 16-bit float type on most PCs (because the processor instructions don't exist). Therefore, what numpy has done is typedefed a 16-bit unsigned int to store the 16-bit float, and then written a set of functions to convert that to/from the supported float types. Any calculations using np.float16 are actually done on 32-bit or 64-bit floats but the data is stored in the 16-bit format between calculations.
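You can see this storage scheme from plain Python (a quick illustration; the number printed is the IEEE half-precision bit pattern):

import numpy as np

x = np.array([1.0], dtype=np.float16)
print(x.itemsize)         # 2 bytes per element
print(x.view(np.uint16))  # [15360] == [0x3C00], the half-precision bits of 1.0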
The consequence is that Cython has no easy way of generating valid C code for any calculations it needs to do, so you may need to write out that C code yourself.
There are a number of levels of complexity, depending on what you actually want to do:
1) Don't type anything
Cython doesn't actually need you to specify any types - it compiles Python code happily. Therefore, don't assign types to the half-float arrays and just let them be Python objects. This may not be terribly fast but it's worth remembering that it will work.
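For example, a completely untyped function compiles and works with float16 arrays (a minimal sketch; double_untyped is just an illustrative name):

import numpy as np

def double_untyped(x):
    # no cdef typing at all: numpy performs the half<->float conversions internally
    return x * 2

# double_untyped(np.ones(5, dtype=np.float16)) returns a float16 array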
2) To move data you can view it as uint16
If you're just shuffling data around then you can define uint16 arrays and use those to copy it from one place to another. Use the numpy view function to get the data in a format that Cython recognises and to get it back. However, you can't do maths in this mode (the answers will be meaningless).
from libc.stdint cimport uint16_t
import numpy as np

def just_move_stuff(x):
    assert x.dtype == np.float16
    # I've used memoryviews, but cdef np.ndarray should be fine too
    cdef uint16_t[:] x_as_uint = x.view(np.uint16)
    cdef uint16_t[:] y_as_uint = np.empty(x.shape, dtype=np.uint16)
    cdef Py_ssize_t n
    for n in range(x_as_uint.shape[0]):
        y_as_uint[n] = x_as_uint[n]
    return np.asarray(y_as_uint).view(dtype=np.float16)
The view function doesn't make a copy so is pretty cheap to use.
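For example, once the module is compiled (hypothetical usage):

import numpy as np

x = np.arange(5, dtype=np.float16)
y = just_move_stuff(x)
assert y.dtype == np.float16 and (y == x).all()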
3) Do maths with manual conversions
If you want to do any calculations you'll need to use numpy's conversion functions to change your "half-float" data to full floats and back. If you forget to do this the answers you get will be meaningless. Start by including them from numpy/halffloat.h:
cdef extern from "numpy/halffloat.h":
    ctypedef uint16_t npy_half
    # conversion functions
    float npy_half_to_float(npy_half h)
    npy_half npy_float_to_half(float f)

def do_some_maths(x):
    assert x.dtype == np.float16
    cdef uint16_t[:] x_as_uint = x.view(np.uint16)
    cdef uint16_t[:] y_as_uint = np.empty(x.shape, dtype=np.uint16)
    cdef Py_ssize_t n
    for n in range(x_as_uint.shape[0]):
        # widen to float, do the arithmetic, then narrow back to half
        y_as_uint[n] = npy_float_to_half(2 * npy_half_to_float(x_as_uint[n]))
    return np.asarray(y_as_uint).view(dtype=np.float16)
This code requires you to link against the numpy core math library:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
from numpy.distutils.misc_util import get_info
info = get_info('npymath')
ext_modules = [Extension("module_name", ["module_name.pyx"],**info)]
setup(
    ext_modules=cythonize(ext_modules)
)
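Then build as usual, e.g.:

python setup.py build_ext --inplace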
I am new to Cython and encountered this code snippet:
import numpy as np
cimport numpy as np
testarray = np.arange(5)
cdef np.ndarray[np.int_t, ndim=1] testarray1 = testarray.copy()
cdef np.ndarray[np.float_t, ndim=1] testarray2 = testarray.astype(np.float)
During compilation, it said "Buffer types only allowed as function local variables". However, I am using .copy() or .astype(), which returns a copy, not a view. Why is this still happening? How can I get around this?
Thanks!
When you define an array in Cython using np.ndarray[Type, dim], that uses the Python buffer interface, and such variables can't be set at module level. This is a separate issue from views vs copies of numpy array data.
Typically, if I want to have an array as a module-level variable (i.e. not local to a function), I define a typed memoryview and then set it within a function, using something like (untested):
import numpy as np
cimport numpy as np

cdef np.int_t[:] testarray1

def init_arrays(np.int_t[:] testarray):
    global testarray1
    testarray1 = testarray.copy()
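The caller then hands the data over once at start-up, e.g. (untested as well; mymodule is a hypothetical module name):

import numpy as np
import mymodule

mymodule.init_arrays(np.zeros(100, dtype=np.int_))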
I'm trying to port some Python code to Cython and I'm encountering some minor problems.
Below you see a code snippet (simplified example) of the code.
cimport numpy as np
cimport cython

@cython.boundscheck(False)  # turn off bounds-checking for entire function
@cython.wraparound(False)
@cython.nonecheck(False)
def Interpolation(cells, int nmbcellsx):
    cdef np.ndarray[float, ndim=1] celle
    cdef int cellnonzero
    cdef int i, l
    for i in range(nmbcellsx):
        celle = cells[i].e
        cellnonzero = cells[i].nonzero
        for l in range(cellnonzero):
            celle[l] = celle[l] * celle[l]
I don't understand why the innermost loop (i.e. the last line, celle[l] = ...) does not fully translate to C code, according to the cython -a annotation output.
What am I missing here?
Thanks a lot.
I finally realized that a simple "return 0" at the very end of the function solves this problem. However, this behaviour seems quite strange to me. Is this actually a bug?
I have a list of numpy.ndarrays (with different lengths) in Python and need very fast access to them from Cython. I think an array of pointers would do the trick. I tried:
cdef float_type_t *list_of_arrays[no_of_arrays]

for data_array in python_list_of_arrays:
    list_of_arrays[0] = data_array
But Cython complains:
no_of_arrays < Not allowed in a constant expression
I have tried several ways to constify this variable:
cdef extern from *:
    ctypedef int const_int "const int"
(There have been more creative attempts.) However, it unfortunately doesn't work.
Please help.
Why don't you use a numpy object array instead of a list of arrays?
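That is, something along these lines (a sketch of the suggestion; obj_arr is an illustrative name):

import numpy as np

# an object array holds references to the individual, different-length arrays
obj_arr = np.empty(len(python_list_of_arrays), dtype=object)
for i, a in enumerate(python_list_of_arrays):
    obj_arr[i] = a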
I think the problem you're having is because you are declaring list_of_arrays on the stack, so its size must be known at compile time. You can try dynamic allocation instead, like this:
from libc.stdlib cimport malloc, free

cdef float_type_t **list_of_arrays = \
    <float_type_t **>malloc(no_of_arrays * sizeof(float_type_t *))

for i in range(no_of_arrays):
    list_of_arrays[i] = <float_type_t *>data_array[i].data
# don't forget to free list_of_arrays!
(This assumes data_array is a numpy array.)
But this is still guessing a bit what you want to do.
I want to create a signal processing algorithm that needs to hold some internal state in a numpy array.
For speed, I coded that in Cython and declared the state a global variable like this:
import numpy as np
cimport numpy as np
cdef np.ndarray delay_buffer
However, what I really would like to do is this:
import numpy as np
cimport numpy as np
DTYPE = np.float32
ctypedef np.float32_t DTYPE_t
cdef np.ndarray[DTYPE_t] delay_buffer
This I can do anywhere else, but not in the global scope. Is there any way to accomplish this?
Is there any way to accomplish this?
No. As the error says, "Buffer types only allowed as function local variables".
One alternative is to use a monolithic main function. This really only takes indenting everything but it means that you can only share so much.
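That is, something like this (a sketch), with the buffer as a local of one big function:

import numpy as np
cimport numpy as np

DTYPE = np.float32
ctypedef np.float32_t DTYPE_t

def main():
    # buffer-typed locals are allowed here
    cdef np.ndarray[DTYPE_t] delay_buffer = np.zeros(1024, dtype=DTYPE)
    # ... the rest of the program, indented into main() ...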
My favourite alternative would be to upgrade to the modern method of using memoryviews:
cdef DTYPE_t[:] delay_buffer
This should be faster, cleaner, and no less powerful.
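As in the earlier answer, the module-level memoryview can then be assigned from inside a function, e.g. (a sketch):

import numpy as np
cimport numpy as np

DTYPE = np.float32
ctypedef np.float32_t DTYPE_t

cdef DTYPE_t[:] delay_buffer

def init_delay_buffer(int n):
    global delay_buffer
    delay_buffer = np.zeros(n, dtype=DTYPE)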