Using the buffer API in Cython - python

I'm working in with a C library that repeatedly calls a user supplied function pointer to get more data. I'd like to write a Cython wrapper in such a way that the Python implementation of that callback can return any reasonable data type like str, bytearray, memory mapped files, and so on (specifically, supports the Buffer interface). what I have so far is:
from cpython.buffer cimport PyBUF_SIMPLE
from cpython.buffer cimport Py_buffer
from cpython.buffer cimport PyObject_GetBuffer
from cpython.buffer cimport PyBuffer_Release
from libc.string cimport memmove
cdef class _callback:
cdef public object callback
cdef public object data
cdef uint16_t GetDataCallback(void * userdata,
uint32_t wantlen, unsigned char * data,
uint32_t * gotlen):
cdef Py_buffer gotdata
box = <_callback> userdata
gotdata_object = box.callback(box.data, wantlen)
if not PyObject_CheckBuffer(gotdata_object):
# sulk
return 1
try:
PyObject_GetBuffer(gotdata_object, &gotdata, PyBUF_SIMPLE)
if not (0 < gotdata.len <= wantlen):
# sulk
return 1
memmove(data, gotdata.buf, gotdata.len)
return 0
finally:
PyBuffer_Release(&gotdata)
The code I want to write would produce equivalent C code, but look like this:
from somewhere cimport something
from libc.string cimport memmove
cdef class _callback:
cdef public object callback
cdef public object data
cdef uint16_t GetDataCallback(void * userdata,
uint32_t wantlen, unsigned char * data,
uint32_t * gotlen):
cdef something gotdata
box = <_callback> userdata
gotdata = box.callback(box.data, wantlen)
if not (0 < gotdata.len <= wantlen):
# sulk
return 1
memmove(data, gotdata.buf, gotdata.len)
return 0
The generated C code looks like what I think it should be doing; but this seems like digging around in the Python API unnecessarily. Does Cython provide a nicer syntax to achieve this effect?

If you want to support everything that implements every variation of the new-style or old-style buffer interface, then you have to use the C API.
But if you don't care about old-style buffers, you can almost always use a memoryview:
Cython memoryviews support nearly all objects exporting the interface of Python new style buffers. This is the buffer interface described in PEP 3118. NumPy arrays support this interface, as do Cython arrays. The “nearly all” is because the Python buffer interface allows the elements in the data array to themselves be pointers; Cython memoryviews do not yet support this.
This of course includes str (or, in 3.x, bytes), bytearray, etc—if you followed the link, you may notice that it links to the same page to explain what it supports that you linked to explain what you want to support.
For 1D arrays of characters (like str), it's:
cdef char [:] gotdata

Related

How to pass void pointer (starting of an array) to C function from Cython? [duplicate]

I have a numpy array which came from a cv2.imread and so has dtype = np.uint8 and ndim = 3.
I want to convert it to a Cython unsigned int* for use with an external cpp library.
I am trying cdef unsigned int* buff = <unsigned int*>im.data however I get the error Python objects cannot be cast to pointers of primitive types
What am I doing wrong?
Thanks
The more modern way would be to use a memoryview rather than a pointer:
cdef np.uint32_t[:,:,::1] mv_buff = np.ascontiguousarray(im, dtype = np.uint32)
The [:,;,::1] syntax tells Cython the memoryview is 3D and C contiguous in memory. The advantage of defining the type to be a memoryview rather than a numpy array is
it can accept any type that defines the buffer interface, for example the built in array module, or objects from the PIL imaging library.
Memoryviews can be passed without holding the GIL, which is useful for parallel code
To get the pointer from the memoryview get the address of the first element:
cdef np.uint32_t* im_buff = &mv_buff[0,0,0]
This is better than doing <np.uint32_t*>mv_buff.data because it avoids a cast, and casts can often hide errors.
thanks for your comments. solved by:
cdef np.ndarray[np.uint32_t, ndim=3, mode = 'c'] np_buff = np.ascontiguousarray(im, dtype = np.uint32)
cdef unsigned int* im_buff = <unsigned int*> np_buff.data

Cython using gmp arithmetic

I'm trying to implement a simple code in cython using Jupyter notebook (I use python 2) and using gmp arithmetic in order to handle very large integers. I'm not a gmp/cython expert. My question is : how do I print the value a in the function fib().
The following code returns {}.
As fas as I can understand it has to do with stdout. For instance I tried gmp_printf and it didn't work.
%%cython --link-args=-lgmp
cdef extern from "gmp.h":
ctypedef struct mpz_t:
pass
cdef void mpz_init(mpz_t)
cdef void mpz_init_set_ui(mpz_t, unsigned int)
cdef void mpz_add(mpz_t, mpz_t, mpz_t)
cdef void mpz_sub(mpz_t, mpz_t, mpz_t)
cdef void mpz_add_ui(mpz_t, const mpz_t, unsigned long int)
cdef void mpz_set(mpz_t, mpz_t)
cdef void mpz_clear(mpz_t)
cdef unsigned long int mpz_get_ui(mpz_t)
cdef void mpz_set_ui(mpz_t, unsigned long int)
cdef int gmp_printf (const char*, ...)
cdef size_t mpz_out_str (FILE , int , const mpz_t)
def fib(unsigned long int n):
cdef mpz_t a,b
mpz_init(a)
mpz_init(b)
mpz_init_set_ui(a,1)
mpz_init_set_ui(b,1)
cdef int i
for i in range(n):
mpz_add(a,a,b)
mpz_sub(b,a,b)
return a
And the result
fib(10)
{}
If I use return mpz_get_ui(a) instead of return a
the code is working fine, but this is not the thing I really want (to get a long integer).
EDIT.
I compared the previous code with another one again in cython but not using mpz.
%%cython
def pyfib(unsigned long int n):
a,b=1,1
for i in range(n):
a=a+b
b=a-b
return a
and finally the same code but using mpz from gmpy2
%%cython
import gmpy2
from gmpy2 import mpz
def pyfib_with_gmpy2(unsigned long int n):
cdef int i
a,b=mpz(1),mpz(1)
for i in range(n):
a=a+b
b=a-b
return a
Then
timeit fib(700000)
1 loops, best of 3: 3.19 s per loop
and
timeit pyfib(700000)
1 loops, best of 3: 11 s per loop
and
timeit pyfib_with_gmpy2(700000)
1 loops, best of 3: 3.28 s per loop
(Answer mostly summarises a bunch of comments)
The immediate issue you were having was that Python has no real way to handle C structs. To get around this, Cython tries to convert structs to dictionaries when they are passed to Python (if possible). In this particular case, mpz_t is treated as "opaque" by C (and thus Cython) so you aren't supposed to know about its members. Therefore Cython "helpfully" converts it to an empty dictionary (a correct representation of all the members it knows about).
For an immediate fix I suggested using the gmpy library, which is an existing Python/Cython wrapping of GMP. This is probably a better choice than repeating the effort to wrap it.
As a general solution to this sort of problem there are two obvious options.
You could create a cdef wrapper class. The documentation I have linked is for C++, but the idea could be applied to C as well (with new/'del' replaced with 'malloc'/'free'). This is ultimately a Python class (so can be returned from Cython to Python) but contains a C struct, which you can manipulate directly in Cython. The approach is pretty well documented and doesn't need repeating here.
You could convert the mpz_t back to a Python integer at the end of the function. I feel this makes most sense, since ultimately they represent the same thing. The code shown below is a rough outline and hasn't been tested (I don't have gmp installed):
cdef mpz_to_py_int(mpz_t x):
# get bytes that describe the integer
cdef const mp_limb_t* x_data = mpz_limbs_read(x)
# view as a unsigned char* (i.e. as bytes)
cdef unsigned char* x_data_bytes = <unsigned char*>x_data
# cast to a memoryview then pass that to the int classmethod "from_bytes"
# assuming big endian (python 3.2+ required)
out = int.from_bytes(<unsigned char[:mpz_size(x):1]>x_data_bytes,'big')
# correct using sign
if mpz_sign(x) < 0:
return -out
else
return out

Obtaining pointer to python memoryview on bytes object

I have a python memoryview pointing to a bytes object on which I would like to perform some processing in cython.
My problem is:
because the bytes object is not writable, cython does not allow constructing a typed (cython) memoryview from it
I cannot use pointers either because I cannot get a pointer to the memoryview start
Example:
In python:
array = memoryview(b'abcdef')[3:]
In cython:
cdef char * my_ptr = &array[0] fails to compile with the message: Cannot take address of Python variable
cdef char[:] my_view = array fails at runtime with the message: BufferError: memoryview: underlying buffer is not writable
How does one solve this?
Ok, after digging through the python api I found a solution to get a pointer to the bytes object's buffer in a memoryview (here called bytes_view = memoryview(bytes())). Maybe this helps somebody else:
from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_ANY_CONTIGUOUS, PyBUF_SIMPLE
cdef Py_buffer buffer
cdef char * my_ptr
PyObject_GetBuffer(bytes, &buffer, PyBUF_SIMPLE | PyBUF_ANY_CONTIGUOUS)
try:
my_ptr = <char *>buffer.buf
# use my_ptr
finally:
PyBuffer_Release(&buffer)
Using a bytearray (as per #CheeseLover's answer) is probably the right way of doing things. My advice would be to work entirely in bytearrays thereby avoiding temporary conversions. However:
char* can be directly created from a Python string (or bytes) - see the end of the linked section:
cdef char * my_ptr = array
# you can then convert to a memoryview as normal in Cython
cdef char[:] mview = <char[:len(array)]>my_ptr
A couple of warnings:
Remember that bytes is not mutable and if you attempt to modify that memoryview is likely to cause issues
my_ptr (and thus mview) are only valid so long as array is valid, so be sure to keep a reference to array for as long as you need access ti the data,
You can use bytearray to create a mutable memoryview. Please note that this won't change the string, only the bytearray
data = bytearray('python')
view = memoryview(data)
view[0] = 'c'
print data
# cython
If you don't want cython memoryview to fail with 'underlying buffer is not writable' you simply should not ask for a writable buffer. Once you're in C domain you can summarily deal with that writability. So this works:
cdef const unsigned char[:] my_view = array
cdef char* my_ptr = <char*>&my_view[0]

Using Cython to wrap a c++ template to accept any numpy array

I'm trying to wrap a parallel sort written in c++ as a template, to use it with numpy arrays of any numeric type. I'm trying to use Cython to do this.
My problem is that I don't know how to pass a pointer to the numpy array data (of a correct type) to a c++ template. I believe I should use fused dtypes for this, but I don't quite understand how.
The code in .pyx file is below
# importing c++ template
cdef extern from "test.cpp":
void inPlaceParallelSort[T](T* arrayPointer,int arrayLength)
def sortNumpyArray(np.ndarray a):
# This obviously will not work, but I don't know how to make it work.
inPlaceParallelSort(a.data, len(a))
In the past I did similar tasks with ugly for-loops over all possible dtypes, but I believe there should be a better way to do this.
Yes, you want to use a fused type to have Cython call the sorting template for the appropriate specialization of the template.
Here's a working example for all non-complex data types that does this with std::sort.
# cython: wraparound = False
# cython: boundscheck = False
cimport cython
cdef extern from "<algorithm>" namespace "std":
cdef void sort[T](T first, T last) nogil
ctypedef fused real:
cython.char
cython.uchar
cython.short
cython.ushort
cython.int
cython.uint
cython.long
cython.ulong
cython.longlong
cython.ulonglong
cython.float
cython.double
cpdef void npy_sort(real[:] a) nogil:
sort(&a[0], &a[a.shape[0]-1])

Creating a PyCObject pointer in Cython

A few SciPy functions (like scipy.ndimage.interpolation.geometric_transform) can take pointers to C functions as arguments to avoid having to call a Python callable on each point of the input array.
In a nutshell :
Define a function called my_function somewhere in the C module
Return a PyCObject with the &my_function pointer and (optionally) a void* pointer to pass some global data around
The related API method is PyCObject_FromVoidPtrAndDesc, and you can read Extending ndimage in C to see it in action.
I am very interested in using Cython to keep my code more manageable, but I'm not sure how exactly I should create such an object. Any, well,... pointers?
Just do in Cython the same thing you would do in C, call PyCObject_FromVoidPtrAndDesc directly. Here is an example from your link ported to Cython:
###### example.pyx ######
from libc.stdlib cimport malloc, free
from cpython.cobject cimport PyCObject_FromVoidPtrAndDesc
cdef int _shift_function(int *output_coordinates, double* input_coordinates,
int output_rank, int input_rank, double *shift_data):
cdef double shift = shift_data[0]
cdef int ii
for ii in range(input_rank):
input_coordinates[ii] = output_coordinates[ii] - shift
return 1
cdef void _shift_destructor(void* cobject, void *shift_data):
free(shift_data)
def shift_function(double shift):
"""This is the function callable from python."""
cdef double* shift_data = <double*>malloc(sizeof(shift))
shift_data[0] = shift
return PyCObject_FromVoidPtrAndDesc(&_shift_function,
shift_data,
&_shift_destructor)
Performance should be identical to pure C version.
Note that Cyhton requires operator & to get function address. Also, Cython lacks pointer dereference operator *, indexing equivalent is used instead (*ptr -> ptr[0]).
I think that is a bad idea. Cython was created to avoid writing PyObjects also! Moreover, in this case, writing the code through Cython probably doesn't improve code maintenance...
Anyway, you can import the PyObject with
from cpython.ref cimport PyObject
in your Cython code.
UPDATE
from cpython cimport *
is safer.
Cheers,
Davide

Categories