Obtaining pointer to python memoryview on bytes object - python

I have a python memoryview pointing to a bytes object on which I would like to perform some processing in cython.
My problem is:
because the bytes object is not writable, cython does not allow constructing a typed (cython) memoryview from it
I cannot use pointers either because I cannot get a pointer to the memoryview start
Example:
In python:
array = memoryview(b'abcdef')[3:]
In cython:
cdef char * my_ptr = &array[0] fails to compile with the message: Cannot take address of Python variable
cdef char[:] my_view = array fails at runtime with the message: BufferError: memoryview: underlying buffer is not writable
How does one solve this?

Ok, after digging through the python api I found a solution to get a pointer to the bytes object's buffer in a memoryview (here called bytes_view = memoryview(bytes())). Maybe this helps somebody else:
from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_ANY_CONTIGUOUS, PyBUF_SIMPLE
cdef Py_buffer buffer
cdef char * my_ptr
# bytes_view is the Python memoryview over the bytes object
PyObject_GetBuffer(bytes_view, &buffer, PyBUF_SIMPLE | PyBUF_ANY_CONTIGUOUS)
try:
    my_ptr = <char *>buffer.buf
    # use my_ptr
finally:
    PyBuffer_Release(&buffer)
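For readers who want to try the same two C API calls without compiling anything, here is a minimal sketch that drives PyObject_GetBuffer and PyBuffer_Release from plain Python through ctypes. It assumes CPython 3 and the documented Py_buffer field layout. The PyBuffer class and the read_through_pointer helper are illustrative names, not part of any library.

```python
import ctypes


class PyBuffer(ctypes.Structure):
    """ctypes mirror of the C-level Py_buffer struct (CPython 3 layout assumed)."""
    _fields_ = [
        ("buf", ctypes.c_void_p),
        ("obj", ctypes.c_void_p),
        ("len", ctypes.c_ssize_t),
        ("itemsize", ctypes.c_ssize_t),
        ("readonly", ctypes.c_int),
        ("ndim", ctypes.c_int),
        ("format", ctypes.c_char_p),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("suboffsets", ctypes.POINTER(ctypes.c_ssize_t)),
        ("internal", ctypes.c_void_p),
    ]


PyBUF_SIMPLE = 0  # same value as in the C headers

_get_buffer = ctypes.pythonapi.PyObject_GetBuffer
_get_buffer.argtypes = [ctypes.py_object, ctypes.POINTER(PyBuffer), ctypes.c_int]
_get_buffer.restype = ctypes.c_int

_release = ctypes.pythonapi.PyBuffer_Release
_release.argtypes = [ctypes.POINTER(PyBuffer)]
_release.restype = None


def read_through_pointer(obj):
    """Copy the bytes that obj exposes, reading them through the raw pointer."""
    buffer = PyBuffer()
    # ctypes.pythonapi re-raises any Python exception set by the call.
    _get_buffer(obj, ctypes.byref(buffer), PyBUF_SIMPLE)
    try:
        return ctypes.string_at(buffer.buf, buffer.len)
    finally:
        # Always pair a successful GetBuffer with a Release.
        _release(ctypes.byref(buffer))


result = read_through_pointer(memoryview(b"abcdef")[3:])
```

The try/finally mirrors the Cython version: the pointer is only used between the two calls, and the buffer is released even if the processing step raises.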

Using a bytearray (as per @CheeseLover's answer) is probably the right way of doing things. My advice would be to work entirely in bytearrays, thereby avoiding temporary conversions. However:
char* can be directly created from a Python string (or bytes) - see the end of the linked section:
cdef char * my_ptr = array
# you can then convert to a memoryview as normal in Cython
cdef char[:] mview = <char[:len(array)]>my_ptr
A couple of warnings:
Remember that bytes is not mutable, so attempting to modify the data through that memoryview is likely to cause issues.
my_ptr (and thus mview) are only valid so long as array is valid, so be sure to keep a reference to array for as long as you need access to the data.
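The first warning can be seen from the Python side too. A minimal sketch, using only the built-in memoryview type: a view over bytes reports itself as read-only and refuses writes, which is the protection the char* cast bypasses.

```python
data = b"abcdef"
view = memoryview(data)[3:]

# A view over an immutable bytes object is flagged as read-only.
is_readonly = view.readonly

try:
    view[0] = ord("x")
    write_rejected = False
except TypeError:
    # Python refuses to write through a read-only view.
    write_rejected = True
```

A pointer obtained by casting has no such flag, so nothing stops a write that would silently corrupt an object Python assumes is immutable.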

You can use bytearray to create a mutable memoryview. Please note that this won't change the string, only the bytearray
data = bytearray(b'python')  # Python 3: bytearray needs bytes, not str
view = memoryview(data)
view[0] = ord('c')  # Python 3: elements are integers
print(data)
# bytearray(b'cython')
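To make the "won't change the string" point concrete, here is a small sketch (Python 3 syntax assumed) that keeps the original bytes object around and checks it after writing through the view.

```python
original = b"python"

# bytearray(original) copies the data into a new, mutable buffer.
data = bytearray(original)
view = memoryview(data)
view[0] = ord("c")

modified = bytes(data)
```

Only the bytearray copy changes, so this approach costs one copy of the data up front in exchange for a writable buffer.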

If you don't want cython memoryview to fail with 'underlying buffer is not writable' you simply should not ask for a writable buffer. Once you're in C domain you can summarily deal with that writability. So this works:
cdef const unsigned char[:] my_view = array
cdef char* my_ptr = <char*>&my_view[0]

Related

How to pass void pointer (starting of an array) to C function from Cython? [duplicate]

I have a numpy array which came from a cv2.imread and so has dtype = np.uint8 and ndim = 3.
I want to convert it to a Cython unsigned int* for use with an external cpp library.
I am trying cdef unsigned int* buff = <unsigned int*>im.data however I get the error Python objects cannot be cast to pointers of primitive types
What am I doing wrong?
Thanks
The more modern way would be to use a memoryview rather than a pointer:
cdef np.uint32_t[:,:,::1] mv_buff = np.ascontiguousarray(im, dtype = np.uint32)
The [:,:,::1] syntax tells Cython the memoryview is 3D and C contiguous in memory. The advantages of defining the type to be a memoryview rather than a numpy array are:
It can accept any type that defines the buffer interface, for example the built-in array module, or objects from the PIL imaging library.
Memoryviews can be passed without holding the GIL, which is useful for parallel code.
To get the pointer from the memoryview get the address of the first element:
cdef np.uint32_t* im_buff = &mv_buff[0,0,0]
This is better than doing <np.uint32_t*>mv_buff.data because it avoids a cast, and casts can often hide errors.
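The same "pointer to the first element" idea can be sketched in plain Python, here with the standard array module standing in for the image data and ctypes standing in for the external library. This is an illustration under those assumptions, not the Cython code path itself.

```python
import array
import ctypes

# "I" is C unsigned int, matching ctypes.c_uint.
pixels = array.array("I", [1, 2, 3, 4])

# buffer_info() returns the address of the first element and the element count.
address, count = pixels.buffer_info()
first = ctypes.cast(address, ctypes.POINTER(ctypes.c_uint))

values = [first[i] for i in range(count)]

# Writes through the pointer land in the array's own memory (no copy was made).
first[0] = 99
```

As with the Cython pointer, the address is only meaningful while pixels is alive and has not been resized.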
thanks for your comments. solved by:
cdef np.ndarray[np.uint32_t, ndim=3, mode = 'c'] np_buff = np.ascontiguousarray(im, dtype = np.uint32)
cdef unsigned int* im_buff = <unsigned int*> np_buff.data

Cython: Memory view of freed memory

In Cython code, I can allocate some memory and wrap it in a memory view, e.g. like this:
cdef double* ptr
cdef double[::1] view
ptr = <double*> PyMem_Malloc(N*sizeof(double))
view = <double[:N]> ptr
If I now free the memory using PyMem_Free(ptr), trying to access elements like ptr[i] throws an error, as it should. However, I can safely try to access view[i] (it does not return the original data though).
My question is this: Is it always safe to just deallocate the pointer? Is the memory view object somehow informed of the memory being freed, or should I manually remove the view somehow? Also, is the memory guaranteed to be freed, even though it is referred to by memory views?
It requires a bit of digging into the C code to show this, but:
The line view = <double[:N]> ptr actually generates a __pyx_array_obj. This is the same type detailed in the documentation as a "Cython array" and cimportable as cython.view.array. The Cython array does have an optional member called callback_free_data that can act as a destructor.
The line translates as:
struct __pyx_array_obj *__pyx_t_1 = NULL;
/* ... */
__pyx_t_1 = __pyx_array_new(__pyx_t_2, sizeof(double), PyBytes_AS_STRING(__pyx_t_3), (char *) "c", (char *) __pyx_v_ptr);
(__pyx_t_2 and __pyx_t_3 are just temporaries storing the size and format respectively). If we look inside __pyx_array_new we see firstly that the array's data member is assigned directly to the value passed as __pyx_v_ptr
__pyx_v_result->data = __pyx_v_buf;
(i.e. a copy is not made) and secondly that callback_free_data is not set. Side note: The C code for cython.view.array is actually generated from Cython code so if you want to investigate further it's probably easier to read that than the generated C.
Essentially, the memoryview holds a cython.view.array which has a pointer to the original data, but no callback_free_data set. When the memoryview dies the destructor for the cython.view.array is called. This cleans up a few internals, but does nothing to free the data it points to (since it has no indication of how to do so).
It is therefore not safe to access the memoryview after you have called PyMem_Free. The fact that you seem to get away with it is luck. It is safe for the memoryview to keep existing though, provided you don't access it. A function like:
def good():
    cdef double* ptr
    cdef double[::1] view
    ptr = <double*> PyMem_Malloc(N*sizeof(double))
    try:
        view = <double[:N]> ptr
        # some other stuff
    finally:
        PyMem_Free(ptr)
    # some other stuff not involving ptr or view
would be fine. A function like:
def bad():
    cdef double* ptr
    cdef double[::1] view
    ptr = <double*> PyMem_Malloc(N*sizeof(double))
    try:
        view = <double[:N]> ptr
        # some other stuff
    finally:
        PyMem_Free(ptr)
    view[0] = 0  # writes to memory that has already been freed
    return view  # hands the caller a view of freed memory
would be a bad idea since it's passing back a memoryview that doesn't point to anything, and accessing view after the data it views has been freed.
You should definitely make sure to call PyMem_Free at some point, otherwise you have a memory leak. One way of doing it if view gets passed around and so the lifetime is hard to track would be to manually create a cython.view.array with callback_free_data set:
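# assumes "from cython cimport view"; itemsize and format are required arguments
cdef view.array my_array = view.array(shape=(N,), itemsize=sizeof(double), format="d", allocate_buffer=False)
my_array.data = <char *> ptr
my_array.callback_free_data = PyMem_Free
view = my_array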
If the lifetime of view is obvious then you can just call PyMem_Free on ptr as you've been doing.
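For contrast, here is a short pure-Python sketch of what "being informed" looks like when a view is obtained through the buffer protocol from an object that owns its memory. The exporter tracks outstanding views and refuses to reallocate while one exists. A view wrapped around a raw pointer, as in the Cython code above, has no such tracking.

```python
data = bytearray(b"abcdef")
view = memoryview(data)

try:
    # Growing the bytearray could move its memory, which would leave
    # the view dangling, so Python refuses while the view is exported.
    data.extend(b"gh")
    resize_blocked = False
except BufferError:
    resize_blocked = True

# Once the view is released, the owner is free to reallocate.
view.release()
data.extend(b"gh")
```

This is why handing ownership to a cython.view.array with callback_free_data is the safer choice when the view outlives the function that allocated the memory.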

Potential memory leak when converting wide char to python string

I have the following code in Cython in the pyx file, which converts wchar_t* to a Python string (unicode):
# All code below is python 2.7.4
cdef wc_to_pystr(wchar_t *buf):
    if buf == NULL:
        return None
    cdef size_t buflen
    buflen = wcslen(buf)
    cdef PyObject *p = PyUnicode_FromWideChar(buf, buflen)
    return <unicode>p
I called this function in a loop like this:
cdef wchar_t* buf = <wchar_t*>calloc(100, sizeof(wchar_t))
# ... copy some wide string to buf
for n in range(30000):
    u = wc_to_pystr(buf)  # <== behaves as if it's a memory leak
free(buf)
I tested this on Windows and the observation is that the memory (as seen in Task Manager) keeps on increasing and hence I suspect that there could be a memory leak here.
This is surprising because:
As per my understanding, the API PyUnicode_FromWideChar() copies the supplied buffer.
Every time the variable 'u' is assigned a different value, the previous value should be freed.
Since the source buffer ('buf') remains as is and is released only after the loop ends, I was expecting that memory should not increase after a certain point at all.
Any idea where I am going wrong? Is there a better way to convert a wide char string to a Python unicode object?
Solved!!
Solution:
(Note: The solution refers to a piece of my code which was not in the question originally. I had no clue while posting that it would hold the key to solve this. Sorry to those who gave it a thought to solve ... )
In cython pyx file, I had declared the python API like:
PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
I checked out the docs at https://github.com/cython/cython/blob/master/Cython/Includes/cpython/init.pxd
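I had declared the return type as PyObject*, so Cython treated the result as a raw pointer and did not take ownership of the new reference that PyUnicode_FromWideChar returns. The <unicode> cast then added a reference of its own, and the original one was never released. The solution was to change the return type in the signature to: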
object PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
As per the docs, with 'object' as the return type Cython takes over the returned reference without adding another one, and releases it when the variable is reassigned, so memory is freed correctly in the for loop. The modified 'wc_to_pystr' looks like this:
cdef wc_to_pystr(wchar_t *buf):
    if buf == NULL:
        return None
    cdef size_t buflen
    buflen = wcslen(buf)
    p = PyUnicode_FromWideChar(buf, buflen)
    return p
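The leak comes down to one owned reference that nobody releases. A minimal sketch of that rule in plain Python, assuming CPython's reference counting (the Payload class and variable names are only for illustration): an object survives as long as any reference to it remains.

```python
import weakref


class Payload:
    """Stand-in for the unicode object created on each loop iteration."""


first = Payload()
probe = weakref.ref(first)  # lets us observe the object without owning it

extra = first               # a second owned reference, like the unreleased PyObject*
del first
alive_with_extra = probe() is not None

del extra                   # releasing the last reference frees the object
freed_after_release = probe() is None
```

In the original code the equivalent of extra was never released, so each of the 30000 strings stayed alive.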

Converting C array to Python bytes

I'm trying to use Cython to run a C++ library in python3 environment.
When I try to return the int array back to python like this:
def readBytes(self, length):
    cdef int *buffer = []
    self.stream.read(buffer, length)
    return buffer
I get the error
return buffer
^
Cannot convert 'int *' to Python object
P.S. I don't get errors if I use
cdef char *buffer = ''
It looks like stream.read() allocates the memory pointed to by buffer. If this is the case, you cannot directly return data allocated in C++ space to Python space. You should:
1) create a python object, or a numpy array if you prefer, in python/cython code
2) copy the data from the allocated memory pointed by *buffer to your new shiny python object allocated in python space. You can then return this object.
This is necessary because python cannot deal with the memory allocated in C space in any way, and the memory allocated by your C code is leaking, i.e. it will not be deallocated.
Now you also asked why you do not get the error with cdef char *buffer = ''. In this latter case, cython recognizes that buffer points to a string, and automatically generates a new python object with the content pointed by buffer. Example follows for ipython:
%%cython
def ReturnThisString():
    cdef char *buffer = 'foobar'
    return buffer
print(ReturnThisString())  # this outputs 'foobar' (b'foobar' on Python 3)
Notice that buffer is initialized by your C compiler on the stack, and there's no guarantee that when you use this function from Python the string will still be there at that memory position. However, when Cython runs the return statement it automatically initializes a Python string from your char * pointer. (In Python 3 I guess it is converted to bytes as @Veedrac says, but this is a minor point.) In this second case, the creation of the Python object and the copy operation are hidden and taken care of by Cython, but they are still there.
char * can be automatically coerced to bytes because Cython treats the two as approximately the same and can do the conversion quickly. Note that char * pointers are assumed to be null-terminated by default, so the copy stops at the first zero byte.
This is not implemented for int *. You will typically want to coerce to a Numpy object (this is actually wrapping the array). If you want something faster, think about cpython.array.
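The two behaviours described above can be sketched in plain Python, with ctypes objects standing in for memory owned by C code. This is an illustration under that assumption, not what Cython generates: int data has to be copied explicitly into a Python-owned container, while char data read as a C string stops at the first zero byte.

```python
import array
import ctypes

# Stand-in for an int buffer filled by C code.
c_values = (ctypes.c_int * 4)(10, 20, 30, 40)

# Copy into an object Python owns; "i" is C signed int.
py_values = array.array("i")
py_values.frombytes(bytes(c_values))

# The copy is independent of the C-side buffer.
c_values[0] = -1

# char data: reading it as a C string stops at the first NUL byte.
text = ctypes.create_string_buffer(b"foo\0bar")
as_c_string = text.value   # null-terminated interpretation
raw_bytes = text.raw       # full buffer, including embedded NULs
```

For binary data that may contain zero bytes, pass the length explicitly (in Cython, by slicing the pointer as buffer[:length]) rather than relying on null termination.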

Using the buffer API in Cython

I'm working with a C library that repeatedly calls a user-supplied function pointer to get more data. I'd like to write a Cython wrapper in such a way that the Python implementation of that callback can return any reasonable data type like str, bytearray, memory mapped files, and so on (specifically, anything that supports the Buffer interface). What I have so far is:
from cpython.buffer cimport PyBUF_SIMPLE
from cpython.buffer cimport Py_buffer
from cpython.buffer cimport PyObject_GetBuffer
from cpython.buffer cimport PyBuffer_Release
from libc.string cimport memmove
cdef class _callback:
    cdef public object callback
    cdef public object data
cdef uint16_t GetDataCallback(void * userdata,
                              uint32_t wantlen, unsigned char * data,
                              uint32_t * gotlen):
    cdef Py_buffer gotdata
    box = <_callback> userdata
    gotdata_object = box.callback(box.data, wantlen)
    if not PyObject_CheckBuffer(gotdata_object):
        # sulk
        return 1
    try:
        PyObject_GetBuffer(gotdata_object, &gotdata, PyBUF_SIMPLE)
        if not (0 < gotdata.len <= wantlen):
            # sulk
            return 1
        memmove(data, gotdata.buf, gotdata.len)
        return 0
    finally:
        PyBuffer_Release(&gotdata)
The code I want to write would produce equivalent C code, but look like this:
from somewhere cimport something
from libc.string cimport memmove
cdef class _callback:
    cdef public object callback
    cdef public object data
cdef uint16_t GetDataCallback(void * userdata,
                              uint32_t wantlen, unsigned char * data,
                              uint32_t * gotlen):
    cdef something gotdata
    box = <_callback> userdata
    gotdata = box.callback(box.data, wantlen)
    if not (0 < gotdata.len <= wantlen):
        # sulk
        return 1
    memmove(data, gotdata.buf, gotdata.len)
    return 0
The generated C code looks like what I think it should be doing; but this seems like digging around in the Python API unnecessarily. Does Cython provide a nicer syntax to achieve this effect?
If you want to support everything that implements every variation of the new-style or old-style buffer interface, then you have to use the C API.
But if you don't care about old-style buffers, you can almost always use a memoryview:
Cython memoryviews support nearly all objects exporting the interface of Python new style buffers. This is the buffer interface described in PEP 3118. NumPy arrays support this interface, as do Cython arrays. The “nearly all” is because the Python buffer interface allows the elements in the data array to themselves be pointers; Cython memoryviews do not yet support this.
This of course includes str (or, in 3.x, bytes), bytearray, etc—if you followed the link, you may notice that it links to the same page to explain what it supports that you linked to explain what you want to support.
For 1D arrays of characters (like str), it's:
cdef const char[:] gotdata  # const so that read-only objects such as bytes are accepted
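The shape of the callback logic can be sketched in plain Python with the built-in memoryview, which accepts the same new-style buffer objects. The fill_from_callback helper and its return convention (number of bytes copied, 0 on failure) are assumptions made for this illustration, not the C library's API.

```python
import array


def fill_from_callback(callback, dest, want_len):
    """Copy up to want_len bytes returned by callback into dest (a bytearray)."""
    got = callback(want_len)
    try:
        # memoryview() raises TypeError for objects without the buffer
        # interface; cast("B") exposes the data as a flat run of bytes.
        view = memoryview(got).cast("B")
    except TypeError:
        return 0
    n = view.nbytes
    if not 0 < n <= want_len:
        return 0
    dest[:n] = view
    return n


dest = bytearray(8)
copied_bytes = fill_from_callback(lambda n: b"abc", dest, 8)
copied_array = fill_from_callback(lambda n: array.array("B", [1, 2]), bytearray(8), 8)
rejected_str = fill_from_callback(lambda n: "text", bytearray(8), 8)
rejected_long = fill_from_callback(lambda n: b"x" * 9, bytearray(8), 8)
```

The callback's return type never appears in the helper: anything that exports a buffer works, which is the property the typed Cython memoryview gives you without calling the C API by hand.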
