In Cython code, I can allocate some memory and wrap it in a memory view, e.g. like this:
cdef double* ptr
cdef double[::1] view
ptr = <double*> PyMem_Malloc(N * sizeof(double))
view = <double[:N]> ptr
If I now free the memory using PyMem_Free(ptr), trying to access elements like ptr[i] throws an error, as it should. However, accessing view[i] still appears to work (it does not return the original data, though).
My question is this: Is it always safe to just deallocate the pointer? Is the memory view object somehow informed of the memory being freed, or should I manually remove the view somehow? Also, is the memory guaranteed to be freed, even though it is referred to by memory views?
It requires a bit of digging into the C code to show this, but:
The line view = <double[:N]> ptr actually generates a __pyx_array_obj. This is the same type detailed in the documentation as a "Cython array" and cimportable as cython.view.array. The Cython array does have an optional member called callback_free_data that can act as a destructor.
The line translates as:
struct __pyx_array_obj *__pyx_t_1 = NULL;
/* ... */
__pyx_t_1 = __pyx_array_new(__pyx_t_2, sizeof(double), PyBytes_AS_STRING(__pyx_t_3), (char *) "c", (char *) __pyx_v_ptr);
(__pyx_t_2 and __pyx_t_3 are just temporaries storing the size and format respectively). If we look inside __pyx_array_new we see firstly that the array's data member is assigned directly to the value passed as __pyx_v_ptr
__pyx_v_result->data = __pyx_v_buf;
(i.e. a copy is not made) and secondly that callback_free_data is not set. Side note: The C code for cython.view.array is actually generated from Cython code so if you want to investigate further it's probably easier to read that than the generated C.
Essentially, the memoryview holds a cython.view.array which has a pointer to the original data, but no callback_free_data set. When the memoryview dies the destructor for the cython.view.array is called. This cleans up a few internals, but does nothing to free the data it points to (since it has no indication of how to do so).
It is therefore not safe to access the memoryview after you have called PyMem_Free. The fact that you seem to get away with it is luck. It is safe for the memoryview to keep existing, though, provided you don't access it. A function like:
def good():
    cdef double* ptr
    cdef double[::1] view
    ptr = <double*> PyMem_Malloc(N * sizeof(double))
    try:
        view = <double[:N]> ptr
        # some other stuff
    finally:
        PyMem_Free(ptr)
    # some other stuff not involving ptr or view
would be fine. A function like:
def bad():
    cdef double* ptr
    cdef double[::1] view
    ptr = <double*> PyMem_Malloc(N * sizeof(double))
    try:
        view = <double[:N]> ptr
        # some other stuff
    finally:
        PyMem_Free(ptr)
    view[0] = 0   # accessing memory that has already been freed
    return view   # handing a dangling view back to the caller
would be a bad idea, since it accesses view after the data it wraps has been freed and passes back a memoryview that no longer points to valid memory.
You should definitely make sure to call PyMem_Free at some point, otherwise you have a memory leak. One way of doing it if view gets passed around and so the lifetime is hard to track would be to manually create a cython.view.array with callback_free_data set:
cdef view.array my_array = view.array(shape=(N,), itemsize=sizeof(double),
                                       format='d', mode='c', allocate_buffer=False)
my_array.data = <char *> ptr
my_array.callback_free_data = PyMem_Free
view = my_array
If the lifetime of view is obvious then you can just call PyMem_Free on ptr as you've been doing.
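Putting that together, here is a minimal sketch (the function name make_view and the cimports are mine, added for illustration) of a function that hands ownership of the buffer to the array, so that PyMem_Free is called automatically once the last reference to the view disappears:

from cython cimport view
from cpython.mem cimport PyMem_Malloc, PyMem_Free

def make_view(Py_ssize_t N):
    cdef double* ptr = <double*> PyMem_Malloc(N * sizeof(double))
    if not ptr:
        raise MemoryError()
    cdef view.array my_array = view.array(shape=(N,), itemsize=sizeof(double),
                                          format='d', mode='c', allocate_buffer=False)
    my_array.data = <char*> ptr
    my_array.callback_free_data = PyMem_Free   # the array now owns the buffer
    cdef double[::1] mview = my_array
    return mview   # safe to return: the data is freed with the last reference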
I have a python memoryview pointing to a bytes object on which I would like to perform some processing in cython.
My problem is:
because the bytes object is not writable, cython does not allow constructing a typed (cython) memoryview from it
I cannot use pointers either because I cannot get a pointer to the memoryview start
Example:
In python:
array = memoryview(b'abcdef')[3:]
In cython:
cdef char * my_ptr = &array[0] fails to compile with the message: Cannot take address of Python variable
cdef char[:] my_view = array fails at runtime with the message: BufferError: memoryview: underlying buffer is not writable
How does one solve this?
OK, after digging through the Python API I found a solution to get a pointer to the bytes object's buffer via a memoryview (here called bytes_view = memoryview(bytes())). Maybe this helps somebody else:
from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_ANY_CONTIGUOUS, PyBUF_SIMPLE

cdef Py_buffer buffer
cdef char * my_ptr

PyObject_GetBuffer(bytes_view, &buffer, PyBUF_SIMPLE | PyBUF_ANY_CONTIGUOUS)
try:
    my_ptr = <char *>buffer.buf
    # use my_ptr
finally:
    PyBuffer_Release(&buffer)
Using a bytearray (as per @CheeseLover's answer) is probably the right way of doing things. My advice would be to work entirely in bytearrays, thereby avoiding temporary conversions. However:
char* can be directly created from a Python string (or bytes) - see the end of the linked section:
cdef char * my_ptr = array
# you can then convert to a memoryview as normal in Cython
cdef char[:] mview = <char[:len(array)]>my_ptr
A couple of warnings:
Remember that bytes is not mutable, so attempting to modify the data through that memoryview is likely to cause issues.
my_ptr (and thus mview) are only valid as long as array is valid, so be sure to keep a reference to array for as long as you need access to the data.
You can use bytearray to create a mutable memoryview. Please note that this won't change the string, only the bytearray:
data = bytearray('python')
view = memoryview(data)
view[0] = 'c'
print data  # prints: cython
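On the Cython side, a writable typed memoryview can then be taken from the bytearray directly without a copy (a small sketch; the function and variable names are just for illustration):

def demo():
    data = bytearray(b'python')
    cdef unsigned char[:] mv = data   # writable typed memoryview, no copy
    mv[0] = ord('c')
    return bytes(data)                # b'cython'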
If you don't want the Cython memoryview to fail with 'underlying buffer is not writable', you simply should not ask for a writable buffer. Once you're in the C domain you can deal with writability as you see fit. So this works:
cdef const unsigned char[:] my_view = array
cdef char* my_ptr = <char*>&my_view[0]
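For example (a small usage sketch; the function name is mine, and const memoryviews require a reasonably recent Cython):

def first_byte(const unsigned char[:] my_view):
    cdef const unsigned char* my_ptr = &my_view[0]
    return my_ptr[0]

# from Python, a read-only buffer is accepted directly:
# first_byte(memoryview(b'abcdef')[3:])   # -> 100, i.e. ord('d')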
I am currently wrapping a library using Cython. For this purpose, I wanted to re-use one function of a pure-C binding.
This is the basic setup:
mylib.pxd
old_lib.c
old_lib.h
In mylib.pxd I do:
cdef extern from "old_lib.h":
    PyObject* get_pyobject()
And then pass old_lib.c as a source file in my extension:
setup(ext_modules=[Extension("mylib", sources=["mylib.pxd", "old_lib.c"])])
In mylib.pxd, I use get_pyobject, which creates a new object that I want to return, like so:
cdef PyObject* ptr
ptr = get_pyobject()
return <object>ptr
This gives me the desired behaviour, but I am afraid that this will leak the ptr reference. Will it? I get confused because I found (old) references saying that you should manage PyObject* references yourself and call Py_INCREF/Py_DECREF accordingly, but the Cython FAQ says:
Note that the lifetime of the object is only bound to its owned references, not to any C pointers that happen to point to it.
Does this mean that whenever the returned value is discarded, the ptr will be garbage collected?
In the old_lib.c the flow goes like this:
PyObject* get_pyobject()
{
    MyType* typeptr = PyObject_NEW_VAR(MyType, &Type, size);
    fill_attribute(typeptr->attrib);
    return (PyObject*)typeptr;
}
Where PyObject_NEW_VAR is implemented in the CPython headers (objimpl.h:196 in my version) using PyObject_InitVar. Thus, the returned reference is a borrowed reference, but as PyObject_MALLOC is used, I guess this is the only reference to this object. Relevant code:
#define PyObject_NEW_VAR(type, typeobj, n) \
( (type *) PyObject_InitVar( \
(PyVarObject *) PyObject_MALLOC(_PyObject_VAR_SIZE((typeobj),(n)) ),\
(typeobj), (n)) )
EDIT:
I have checked, and when using the above code, sys.getrefcount returns 3. So as far as I understand, when I create the object it comes with a refcount of 1. Then, when casting it to object, its refcount is bumped to 2. It will thus never be garbage collected (unless there is a way to remove two references from an object that has only one accessible pointer) and will leak.
If I insert a Py_DECREF, it still works and sys.getrefcount correctly returns 2. I also took the time to rewrite that function directly in Cython, and it returns 2.
Looking at old documentation, PyObject_NEW_VAR is a macro version of the function PyObject_NewVar, which (as @MadPhysicist says) returns a "new reference" (i.e. the object starts with a refcount of 1). I suspect you're no longer encouraged to use the macro, so it has disappeared from the more recent documentation.
The fact that it's implemented in terms of something that returns a "borrowed reference" should probably just be regarded as an implementation detail, and not something that means it returns a "borrowed reference" itself.
Regarding Cython behaviour, the cast to <object> increments the reference count, and so causes a memory leak. My suggested approach for diagnosing it was to look at the reference count, something like this:
from cpython.ref cimport PyObject  # somewhere at the top

def whatever_function():
    cdef PyObject* ptr
    ptr = get_pyobject()
    print ptr.ob_refcnt   # prints 1
    ret_val = <object>ptr
    print ptr.ob_refcnt   # prints 2, but it will only ever be decremented
                          # back to 1, so the object is never freed
    return ret_val
In terms of fixing it you have two choices - you could decrement the reference count once yourself, or you could change the Cython wrapping of the function:
cdef extern from "old_lib.h":
    object get_pyobject()
(don't worry that it doesn't exactly match the header file!). Cython interprets this as "get_pyobject() returns a new reference, so we don't increment it ourselves, but handle the reference counting automatically from here on."
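For reference, a rough sketch of the first option (decrementing once yourself) while keeping the original PyObject* declaration; Py_DECREF here is the one cimported from cpython.ref:

from cpython.ref cimport PyObject, Py_DECREF

def whatever_function():
    cdef PyObject* ptr = get_pyobject()   # new reference, refcount 1
    ret_val = <object>ptr                 # the cast bumps it to 2
    Py_DECREF(ret_val)                    # drop the extra one; ret_val now owns the only reference
    return ret_val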
I'm writing a python wrapper to a C class and I'm allocating memory using PyMem_Malloc as explained here
cdef class SomeMemory:
    cdef double* data

    def __cinit__(self, size_t number):
        # allocate some memory (uninitialised, may contain arbitrary data)
        self.data = <my_data_t*> PyMem_Malloc(number * sizeof(my_data_t))
        if not self.data:
            raise MemoryError()
Then I import and use the class in another script using:
import SomeMemory
sm = SomeMemory(10)
I now would like to access the elements of self.data but I encounter 2 problems
if I type self.data and hit enter, the ipython kernel crashes
if I try to loop on self.data like:
for p in self.data:
    print p
I get an error that self.data is not iterable.
How can I access self.data? Do I need to cast the elements to my_data_t first?
(Heavily edited in light of the updated question)
So - my understanding is that self.data needs to be declared as public to access it from Python:
cdef class SomeMemory:
    cdef public double* data
    # also make sure you define __dealloc__ to remove the data
However, that doesn't seem to be enforced here.
However, your real problem is that Python doesn't know what to do with an object of type double*. It's certainly never going to be iterable, because the information about when to stop is simply not stored (so it would always run off the end).
There are a range of better alternatives:
you store your data as a Python array (see http://docs.cython.org/src/tutorial/array.html for a guide). You have quick access to the raw C pointer from within Cython if you want, but Python also knows what to do with it. Code follows:
from cpython cimport array as c_array
from array import array

cdef class SomeMemory:
    cdef c_array.array data

    def __cinit__(self, size_t number):
        self.data = array('d', [0.0] * number)  # or whatever initial value you need
You store your data as a numpy array. This is possibly slightly more complicated, but has much the same advantages. I won't give an example, but it's easier to find.
You use Cython's typed memoryviews: http://docs.cython.org/src/userguide/memoryviews.html. The advantage of this is that it lets you control the memory management yourself (if that's absolutely important to you); a rough sketch follows after this list. However, I'm not 100% sure that you can access them cleanly in Python (too lazy to test!)
You wrap the data in your own class implementing __getitem__ and __len__ so it can be used as an iterator. This is likely more trouble than it's worth.
I recommend the first option, unless you have good reason for one of the others.
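As promised, a rough sketch of the typed-memoryview option (the as_view property name is mine, added for illustration; __dealloc__ takes care of freeing the buffer):

from cpython.mem cimport PyMem_Malloc, PyMem_Free

cdef class SomeMemory:
    cdef double* data
    cdef Py_ssize_t number

    def __cinit__(self, Py_ssize_t number):
        # uninitialised memory, may contain arbitrary data
        self.data = <double*> PyMem_Malloc(number * sizeof(double))
        if not self.data:
            raise MemoryError()
        self.number = number

    def __dealloc__(self):
        PyMem_Free(self.data)

    property as_view:
        def __get__(self):
            # expose the buffer as a typed memoryview; only valid while
            # this object is alive
            cdef double[::1] v = <double[:self.number]> self.data
            return v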
I have the following code in Cython in the pyx file, which converts wchar_t* to a Python string (unicode):
# All code below is Python 2.7.4
cdef wc_to_pystr(wchar_t *buf):
    if buf == NULL:
        return None

    cdef size_t buflen
    buflen = wcslen(buf)
    cdef PyObject *p = PyUnicode_FromWideChar(buf, buflen)
    return <unicode>p
I called this function in a loop like this:
cdef wchar_t* buf = <wchar_t*>calloc(100, sizeof(wchar_t))
# ... copy some wide string to buf
for n in range(30000):
u = wc_to_pystr(buf) #<== behaves as if its a memory leak
free(buf)
I tested this on Windows and the observation is that the memory (as seen in Task Manager) keeps on increasing and hence I suspect that there could be a memory leak here.
This is surprising because:
As per my understanding, the API PyUnicode_FromWideChar() copies the supplied buffer.
Every time the variable 'u' is assigned a different value, the previous value should be freed up.
Since the source buffer ('buf') remains as is and is released only after the loop ends, I was expecting that memory should not increase after a certain point at all.
Any idea where am I going wrong? Is there a better way to implement Wide Char to python unicode object?
Solved!!
Solution:
(Note: The solution refers to a piece of my code which was not in the question originally. I had no clue while posting that it would hold the key to solve this. Sorry to those who gave it a thought to solve ... )
In cython pyx file, I had declared the python API like:
PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
I checked out the docs at https://github.com/cython/cython/blob/master/Cython/Includes/cpython/init.pxd
I had declared the return type as PyObject*, and hence an extra reference was left behind that I was never releasing explicitly. The solution was to change the return type in the signature like so:
object PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
As per the docs, declaring the return type as object means Cython takes ownership of the returned reference rather than adding an extra incref, and hence in the for loop the memory is freed up correctly. The modified wc_to_pystr looks like this:
cdef wc_to_pystr(wchar_t *buf):
    if buf == NULL:
        return None

    cdef size_t buflen
    buflen = wcslen(buf)
    p = PyUnicode_FromWideChar(buf, buflen)
    return p
I'm wrapping a C library that uses callbacks as external memory allocators. Basically, instead of doing malloc and free itself, it exposes several callbacks for making buffers of specified sizes. Hooking those up with ctypes is fairly easy, and all that seems to be working. However, I can't seem to connect the array to the pointer that's passed in.
The C interface looks something like this:
extern int (*alloc_int_buffer_callback)(some_struct* buffer, uint32_t length);
I've got the struct defined through ctypes, and it seems to match up perfectly.
The ctypes callbacks look something like this:
_int_functype = CFUNCTYPE(c_int, POINTER(some_struct), c_uint)
The problem I'm having is that trying to get the POINTER(some_struct) instance to point at an array of type (c_int * length) doesn't seem to work. Here's the callback I've got right now:
def _alloc_struct_buffer(ptr, length):
    arr_type = SomeStruct * length
    arr = arr_type()
    ptr.contents = cast(addressof(arr), POINTER(SomeStruct))
    return 1
Unfortunately, I'm getting an error along the lines of "Expected SomeStruct instead of LP_SomeStruct". What am I missing?
If you want to return a pointer in a parameter, you need to pass in a pointer to a pointer:
extern int alloc_int_buffer_callback(some_struct **buffer, uint32_t length);
If you simply pass in a pointer, changing the pointer inside the function does nothing, since you passed the pointer by value.
Inside your Python callback, you can use
ptr.contents = cast(arr, POINTER(SomeStruct))
after changing the prototype.
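To illustrate end to end, here is a minimal, self-contained sketch of the pointer-to-pointer pattern (the struct, callback type, and the local test call are made up for the example; a real library would invoke the callback itself). It writes the new pointer back through ptr[0] and keeps a Python reference to each buffer so it isn't garbage collected while the C side may still use it:

from ctypes import (CFUNCTYPE, POINTER, Structure, c_int, c_uint,
                    cast, pointer)

# hypothetical stand-in for the wrapped library's struct
class SomeStruct(Structure):
    _fields_ = [("value", c_int)]

# callback type: int (*)(some_struct **buffer, uint32_t length)
_alloc_functype = CFUNCTYPE(c_int, POINTER(POINTER(SomeStruct)), c_uint)

_live_buffers = []  # keep allocated buffers alive as long as C may use them

def _alloc_struct_buffer(ptr, length):
    arr = (SomeStruct * length)()            # allocate `length` structs
    _live_buffers.append(arr)                # prevent garbage collection
    ptr[0] = cast(arr, POINTER(SomeStruct))  # write the pointer back through **buffer
    return 1

alloc_cb = _alloc_functype(_alloc_struct_buffer)

# quick local check without the C library: call the callback ourselves
out = POINTER(SomeStruct)()
alloc_cb(pointer(out), 4)
out[2].value = 42
print(out[2].value)  # -> 42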