Using half-precision NumPy floats in Cython

I'm trying to send float16 data to an Nvidia P100 card from some Cython code. When I was using float32, I could define my types in Cython like so:
DTYPE = np.float32
ctypedef np.float32_t DTYPE_t
cdef np.ndarray[DTYPE_t, ndim=2] mat = np.empty((100, 100), dtype=DTYPE)
But Cython can't find a defined type for np.float16_t, so I can't just replace the 32 with 16. If I try to provide another type that takes the same amount of space, like np.uint16_t, I get errors like:
Does not understand character buffer dtype format string ('e')
When I google, all I can find is a thread from 2011 about people trying to figure out how to support it...surely there must be a solution by now?

I think the answer is "sort of, but it's a reasonable amount of work if you want to do any real calculations".
The basic problem is that C doesn't support a 16-bit float type on most PCs (because the processor instructions don't exist). Therefore, what numpy has done is typedefed a 16-bit unsigned int to store the 16-bit float, and then written a set of functions to convert that to/from the supported float types. Any calculations using np.float16 are actually done on 32-bit or 64-bit floats but the data is stored in the 16-bit format between calculations.
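You can see this storage scheme from plain Python; viewing a float16 array as uint16 exposes the raw half-precision bits:

import numpy as np

x = np.array([1.0], dtype=np.float16)
# prints [15360], i.e. 0x3c00, the IEEE 754 half-precision bit pattern for 1.0
print(x.view(np.uint16))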
The consequence is that Cython has no easy way of generating valid C code for any calculations it needs to do, so you may need to write that C code out yourself.
There are a number of levels of complexity, depending on what you actually want to do:
1) Don't type anything
Cython doesn't actually need you to specify any types - it compiles plain Python code happily. Therefore, don't assign types to the half-float arrays and just let them be Python objects. This may not be terribly fast, but it's worth remembering that it will work.
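As a minimal sketch of that option (the function here is purely illustrative), leaving everything untyped keeps all the arithmetic at the NumPy level, where float16 is handled for you:

import numpy as np

def double_untyped(x):
    # x stays a Python object; NumPy does the float16 conversions internally
    return x * 2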
2) To move data you can view it as uint16
If you're just shuffling data around then you can define uint16 arrays and use those to copy it from one place to another. Use the numpy view function to get the data in a format that Cython recognises and to get it back. However, you can't do maths in this mode (the answers will be meaningless).
from libc.stdint cimport uint16_t
import numpy as np

def just_move_stuff(x):
    assert x.dtype == np.float16
    # I've used memoryviews but cdef np.ndarray should be fine too
    cdef uint16_t[:] x_as_uint = x.view(np.uint16)
    cdef uint16_t[:] y_as_uint = np.empty(x.shape, dtype=np.uint16)
    for n in range(x_as_uint.shape[0]):
        y_as_uint[n] = x_as_uint[n]
    return np.asarray(y_as_uint).view(dtype=np.float16)
The view function doesn't make a copy, so it's pretty cheap to use.
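A quick usage sketch from Python, once the module is compiled, to confirm the round trip preserves the dtype:

x = np.arange(5, dtype=np.float16)
y = just_move_stuff(x)
print(y.dtype)               # float16
print(np.array_equal(x, y))  # True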
3) Do maths with manual conversions
If you want to do any calculations you'll need to use numpy's conversion functions to change your "half-float" data to full floats and back. If you forget to do this the answers you get will be meaningless. Start by including them from numpy/halffloat.h:
cdef extern from "numpy/halffloat.h":
    ctypedef uint16_t npy_half
    # conversion functions
    float npy_half_to_float(npy_half h)
    npy_half npy_float_to_half(float f)

def do_some_maths(x):
    assert x.dtype == np.float16
    cdef uint16_t[:] x_as_uint = x.view(np.uint16)
    cdef uint16_t[:] y_as_uint = np.empty(x.shape, dtype=np.uint16)
    for n in range(x_as_uint.shape[0]):
        y_as_uint[n] = npy_float_to_half(2 * npy_half_to_float(x_as_uint[n]))
    return np.asarray(y_as_uint).view(dtype=np.float16)
This code requires you to link against the numpy core math library:
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize
from numpy.distutils.misc_util import get_info

info = get_info('npymath')

ext_modules = [Extension("module_name", ["module_name.pyx"], **info)]

setup(
    ext_modules=cythonize(ext_modules)
)
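After building with python setup.py build_ext --inplace, a quick check from Python (assuming the code above was saved as module_name.pyx to match the setup script):

import numpy as np
import module_name

x = np.ones(4, dtype=np.float16)
print(module_name.do_some_maths(x))  # [2. 2. 2. 2.], still float16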

Related

How to pass void pointer (starting of an array) to C function from Cython? [duplicate]

I have a numpy array which came from a cv2.imread and so has dtype = np.uint8 and ndim = 3.
I want to convert it to a Cython unsigned int* for use with an external cpp library.
I am trying cdef unsigned int* buff = <unsigned int*>im.data, however I get the error Python objects cannot be cast to pointers of primitive types.
What am I doing wrong?
Thanks
The more modern way would be to use a memoryview rather than a pointer:
cdef np.uint32_t[:,:,::1] mv_buff = np.ascontiguousarray(im, dtype=np.uint32)
The [:,:,::1] syntax tells Cython the memoryview is 3D and C contiguous in memory. The advantages of defining the type as a memoryview rather than a numpy array are:
1) it can accept any type that defines the buffer interface, for example the built-in array module, or objects from the PIL imaging library.
2) memoryviews can be passed without holding the GIL, which is useful for parallel code.
To get the pointer from the memoryview get the address of the first element:
cdef np.uint32_t* im_buff = &mv_buff[0,0,0]
This is better than doing <np.uint32_t*>mv_buff.data because it avoids a cast, and casts can often hide errors.
Thanks for your comments. Solved by:
cdef np.ndarray[np.uint32_t, ndim=3, mode='c'] np_buff = np.ascontiguousarray(im, dtype=np.uint32)
cdef unsigned int* im_buff = <unsigned int*>np_buff.data
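Putting it together, here's a sketch of handing that buffer to an external function; the header name mylib.h and the process_image signature are hypothetical stand-ins for the real library:

import numpy as np
cimport numpy as np

cdef extern from "mylib.h":
    # hypothetical external function that consumes the raw buffer
    void process_image(unsigned int* buff, int h, int w, int channels)

def process(im):
    # make the data C contiguous and expose it as a typed memoryview
    cdef np.uint32_t[:, :, ::1] mv_buff = np.ascontiguousarray(im, dtype=np.uint32)
    process_image(<unsigned int*>&mv_buff[0, 0, 0],
                  mv_buff.shape[0], mv_buff.shape[1], mv_buff.shape[2])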

return numpy array of std::variant in c++

I'm trying to create a pybind11 interface which returns a tuple of two numpy arrays from C++ to Python. The subfunction lttb inside the lambda expression returns two vectors of type std::vector<double> and std::vector<std::variant<uint32_t, int64_t, double>>.
If I directly return the vectors, the pybind11/stl.h header takes care of the conversion just fine. The creation of the double numpy array also works flawlessly. However, when trying to create a numpy array of std::variant, I get the following error
..\subprojects\pybind11\include\pybind11/numpy.h(1159): error C2338: Attempt to use a non-POD or unimplemented POD type as a numpy dtype
There is a way to register structured types (https://pybind11.readthedocs.io/en/stable/advanced/pycpp/numpy.html#structured-types) in order to prevent this error message, however I only see a way to make this work with a struct not with an std::variant. Does anyone know how to create a numpy array of std::variant? Should not be that hard as the conversion of an std::vector<std::variant<...>> works out of the box. Here's the code.
.def("lttb",
[&](ScopeDataElement& data,
const std::pair<double, double>& limit,
int64_t masterTick,
long double samplingTime,
size_t destinationSize) {
auto&& [x, y] =
data.lttb(limit, masterTick, samplingTime, destinationSize);
using T = std::decay_t<decltype(x)>::value_type;
using U = std::decay_t<decltype(y)>::value_type;
auto arrx = py::array_t<T>(x.size(), x.data());
auto arry = py::array_t<U>(y.size(), y.data());
return py::make_tuple(arrx, arry);
})

Fast indexing: Cython with numpy array of bool and str

I am trying to speed up a Python script. I have profiled the code and re-factored quite a lot already in pure Python. It seems that I am still spending a lot of time in accessing some numpy arrays in a way that looks like:
KeyArray[BoolArray[index]]
where KeyArray is ndim=2 and contains strings, BoolArray is ndim=1 and contains bool and index is an int.
I am trying to learn Cython to see how faster it could be. I wrote the following script that does not work:
import numpy as np
cimport numpy as np

def fastindexer(np.ndarray[np.str_t, ndim=1] KeyArray,
                np.ndarray[np.bool_t, ndim=2] BoolArray,
                np.int_t DateIndex):
    cdef np.ndarray[np.str_t, ndim=1] FArray = KeyArray[BoolArray[DateIndex]]
    return FArray
I understand that types str/bool are not available 'as is' in np arrays. I tried to cast as well but I don't understand how this should be written.
All help welcome
As @Joe said, moving a single indexing statement to Cython won't give you speed. If you decide to move more of your program to Cython, you need to fix a number of problems.
1) You use def instead of cdef, limiting you to Python-only functionality.
2) You use the old buffer syntax. Read about memoryviews.
3) Slicing a 2-D array is slow because a new memoryview is created each time. That said, it is still a lot faster than Python, but for peak performance you would have to use a different approach.
Here's something to get you started.
import numpy as np
cimport numpy as np

cpdef func():
    cdef int i
    # numpy bool arrays store one byte per element, so view them as uint8
    cdef np.uint8_t[:] my_bool_array = np.zeros(10, dtype=bool).view(np.uint8)
    # I'm not sure if this next line is correct
    cdef char[:, :] my_string_array = np.chararray((10, 10))
    cdef char[:] answer
    for i in range(10):
        answer = my_string_array[my_bool_array[i]]

ctypedef in Cython with numpy: what is right convention?

In Cython when using numpy, what is the point of writing:
cimport numpy as np
import numpy as np
ctypedef np.int_t DTYPE_t
and then using DTYPE_t everywhere instead of just using np.int_t? Does the ctypedef actually do anything differently in the resulting code here?
You can read the notes in the Cython docs; they explain the reason for this notation and these imports.
from __future__ import division
import numpy as np
# "cimport" is used to import special compile-time information
# about the numpy module (this is stored in a file numpy.pxd which is
# currently part of the Cython distribution).
cimport numpy as np
# We now need to fix a datatype for our arrays. I've used the variable
# DTYPE for this, which is assigned to the usual NumPy runtime
# type info object.
DTYPE = np.int
# "ctypedef" assigns a corresponding compile-time type to DTYPE_t. For
# every type in the numpy module there's a corresponding compile-time
# type with a _t-suffix.
ctypedef np.int_t DTYPE_t
# "def" can type its arguments but not have a return type. The type of the
# arguments for a "def" function is checked at run-time when entering the
# function.
#
# The arrays f, g and h are typed as "np.ndarray" instances. The only effect
# this has is to a) insert checks that the function arguments really are
# NumPy arrays, and b) make some attribute access like f.shape[0] much
# more efficient. (In this example this doesn't matter though.)
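Functionally the ctypedef is just an alias; the point of the convention is that the element type is named in one place, so the whole module can be switched to a different dtype by editing two lines. A minimal sketch of the pattern in use (double_it is purely illustrative; np.int_ is the runtime dtype matching the compile-time np.int_t):

import numpy as np
cimport numpy as np

DTYPE = np.int_            # runtime dtype object, used to allocate arrays
ctypedef np.int_t DTYPE_t  # matching compile-time element type

def double_it(np.ndarray[DTYPE_t, ndim=1] arr):
    # DTYPE allocates at runtime; DTYPE_t types the buffer at compile time
    cdef np.ndarray[DTYPE_t, ndim=1] out = np.zeros(arr.shape[0], dtype=DTYPE)
    cdef Py_ssize_t i
    for i in range(arr.shape[0]):
        out[i] = arr[i] * 2
    return out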

cython with array of pointers

I have a list of numpy.ndarrays (with different lengths) in Python and need very fast access to them. I think an array of pointers would do the trick. I tried:
cdef float_type_t* list_of_arrays[no_of_arrays]
for data_array in python_list_of_arrays:
    list_of_arrays[0] = data_array
But cython complains:
no_of_arrays < Not allowed in a constant expression
I have tried several ways to constify this variable:
cdef extern from *:
    ctypedef int const_int "const int"
(there have been more creative attempts) - however it unfortunately doesn't work.
Please help.
Why don't you use a numpy object array instead of a list of arrays?
I think the problem you're having is because you are declaring list_of_arrays on the stack, and its size must be known at compile-time. You can try some dynamic allocation, like this:
from libc.stdlib cimport malloc, free

cdef float_type_t **list_of_arrays = \
    <float_type_t **>malloc(no_of_arrays * sizeof(float_type_t *))
for i in range(no_of_arrays):
    list_of_arrays[i] = <float_type_t *>(<np.ndarray>data_array[i]).data
# don't forget to free list_of_arrays!
(This assumes each element of data_array is a numpy array.)
But this is still guessing a bit at what you want to do.
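As for the object-array suggestion above, a minimal sketch (python_list_of_arrays is the list from the question):

import numpy as np

# an object array stores references to the ndarrays with O(1) indexing
arr_of_arrays = np.empty(len(python_list_of_arrays), dtype=object)
arr_of_arrays[:] = python_list_of_arrays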
