Potential memory leak when converting wide char to python string - python

I have the following code in in cython in the pyx file, which converts wchar_t* to python string (unicode)
// All code below is python 2.7.4
cdef wc_to_pystr(wchar_t *buf):
if buf == NULL:
return None
cdef size_t buflen
buflen = wcslen(buf)
cdef PyObject *p = PyUnicode_FromWideChar(buf, buflen)
return <unicode>p
I called this function in a loop like this:
cdef wchar_t* buf = <wchar_t*>calloc(100, sizeof(wchar_t))
# ... copy some wide string to buf
for n in range(30000):
u = wc_to_pystr(buf) #<== behaves as if its a memory leak
free(buf)
I tested this on Windows and the observation is that the memory (as seen in Task Manager) keeps on increasing and hence I suspect that there could be a memory leak here.
This is surprising because:
As per my understanding the API PyUnicode_FromWideChar() copies the
supplied buffer.
Every-time the variable 'u' is assigned a different value, the previous value
should be freed-up
Since the source buffer ('buf') remains as is and is released only after the loop
ends, I was expecting that memory should not increase after a certain point at all
Any idea where am I going wrong? Is there a better way to implement Wide Char to python unicode object?

Solved!!
Solution:
(Note: The solution refers to a piece of my code which was not in the question originally. I had no clue while posting that it would hold the key to solve this. Sorry to those who gave it a thought to solve ... )
In cython pyx file, I had declared the python API like:
PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
I checked out the docs at https://github.com/cython/cython/blob/master/Cython/Includes/cpython/init.pxd
I had declared return type as PyObject* and hence an additional ref was created which I was not deref-ing explicitly. Solution was to change the return type in the signature like:
object PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
As per docs adding 'object' as return type does not increment any ref count and hence in the for loop memory is freed-up correctly. The modified 'wc_to_pystr' looks like this:
cdef wc_to_pystr(wchar_t *buf):
if buf == NULL:
return None
cdef size_t buflen
buflen = wcslen(buf)
p = PyUnicode_FromWideChar(buf, buflen)
return p

Related

Cython using gmp arithmetic

I'm trying to implement a simple code in cython using Jupyter notebook (I use python 2) and using gmp arithmetic in order to handle very large integers. I'm not a gmp/cython expert. My question is : how do I print the value a in the function fib().
The following code returns {}.
As fas as I can understand it has to do with stdout. For instance I tried gmp_printf and it didn't work.
%%cython --link-args=-lgmp
cdef extern from "gmp.h":
ctypedef struct mpz_t:
pass
cdef void mpz_init(mpz_t)
cdef void mpz_init_set_ui(mpz_t, unsigned int)
cdef void mpz_add(mpz_t, mpz_t, mpz_t)
cdef void mpz_sub(mpz_t, mpz_t, mpz_t)
cdef void mpz_add_ui(mpz_t, const mpz_t, unsigned long int)
cdef void mpz_set(mpz_t, mpz_t)
cdef void mpz_clear(mpz_t)
cdef unsigned long int mpz_get_ui(mpz_t)
cdef void mpz_set_ui(mpz_t, unsigned long int)
cdef int gmp_printf (const char*, ...)
cdef size_t mpz_out_str (FILE , int , const mpz_t)
def fib(unsigned long int n):
cdef mpz_t a,b
mpz_init(a)
mpz_init(b)
mpz_init_set_ui(a,1)
mpz_init_set_ui(b,1)
cdef int i
for i in range(n):
mpz_add(a,a,b)
mpz_sub(b,a,b)
return a
And the result
fib(10)
{}
If I use return mpz_get_ui(a) instead of return a
the code is working fine, but this is not the thing I really want (to get a long integer).
EDIT.
I compared the previous code with another one again in cython but not using mpz.
%%cython
def pyfib(unsigned long int n):
a,b=1,1
for i in range(n):
a=a+b
b=a-b
return a
and finally the same code but using mpz from gmpy2
%%cython
import gmpy2
from gmpy2 import mpz
def pyfib_with_gmpy2(unsigned long int n):
cdef int i
a,b=mpz(1),mpz(1)
for i in range(n):
a=a+b
b=a-b
return a
Then
timeit fib(700000)
1 loops, best of 3: 3.19 s per loop
and
timeit pyfib(700000)
1 loops, best of 3: 11 s per loop
and
timeit pyfib_with_gmpy2(700000)
1 loops, best of 3: 3.28 s per loop
(Answer mostly summarises a bunch of comments)
The immediate issue you were having was that Python has no real way to handle C structs. To get around this, Cython tries to convert structs to dictionaries when they are passed to Python (if possible). In this particular case, mpz_t is treated as "opaque" by C (and thus Cython) so you aren't supposed to know about its members. Therefore Cython "helpfully" converts it to an empty dictionary (a correct representation of all the members it knows about).
For an immediate fix I suggested using the gmpy library, which is an existing Python/Cython wrapping of GMP. This is probably a better choice than repeating the effort to wrap it.
As a general solution to this sort of problem there are two obvious options.
You could create a cdef wrapper class. The documentation I have linked is for C++, but the idea could be applied to C as well (with new/'del' replaced with 'malloc'/'free'). This is ultimately a Python class (so can be returned from Cython to Python) but contains a C struct, which you can manipulate directly in Cython. The approach is pretty well documented and doesn't need repeating here.
You could convert the mpz_t back to a Python integer at the end of the function. I feel this makes most sense, since ultimately they represent the same thing. The code shown below is a rough outline and hasn't been tested (I don't have gmp installed):
cdef mpz_to_py_int(mpz_t x):
# get bytes that describe the integer
cdef const mp_limb_t* x_data = mpz_limbs_read(x)
# view as a unsigned char* (i.e. as bytes)
cdef unsigned char* x_data_bytes = <unsigned char*>x_data
# cast to a memoryview then pass that to the int classmethod "from_bytes"
# assuming big endian (python 3.2+ required)
out = int.from_bytes(<unsigned char[:mpz_size(x):1]>x_data_bytes,'big')
# correct using sign
if mpz_sign(x) < 0:
return -out
else
return out

Pass str as an int array to a Python C extended function (extended using SWIG)

How can I pass a str value (containing 3000 {'0', '1'} bytes) obtained using python code as an argument to a python c extended function (extended using SWIG) which requires int * (fixed length int array) as an input argument? My code is such:
int *exposekey(int *bits) {
int a[1000];
for (int j=2000; j < 3000; j++) {
a[j - 2000] = bits[j];
}
return a;
}
What I've tried was to use ctypes (see below code):
import ctypes
ldpc = ctypes.cdll.LoadLibrary('./_ldpc.so')
arr = (ctypes.c_int * 3072)(<mentioned below>)
ldpc.exposekey(arr)
with 3072 {0, 1} entered in the position. Python returns syntax error : more than 255 arguments. This still doesn't help me to pass assigned str value instead of the initialized ctypes int array.
Other suggestion included using SWIG typemaps but how would that work for converting a str into int * ? Thanks in advance.
Regarding my comment, here are some more details about returning arrays from functions: [SO]: Returning an array using C. In short: ways handle this:
Make the returned variable static
Dynamically allocate it (using malloc (family) or new)
Turn it into an additional argument for the function
Getting that piece of C code to run within the Python interpreter is possible in 2 ways:
[Python 3.Docs]: Extending Python with C or C++ - which creates a C written Python module
A way of doing that is using swig which offers a simple interface for generating the module ([SWIG]: SWIG Basics) saving you the trouble of writing it yourself using [Python 3.Docs]: Python/C API Reference Manual
The other way around, leaving the code in a standard dll which can be accessed via [Python 3.Docs]: ctypes - A foreign function library for Python
Since they both are doing the same thing, mixing them together makes no sense. So, pick the one that best fits your needs.
1. ctypes
This is what you started with
It's one of the ways of doing things using ctypes
ctypes_demo.c:
#include <stdio.h>
#if defined(_WIN32)
# define CTYPES_DEMO_EXPORT_API __declspec(dllexport)
#else
# define CTYPES_DEMO_EXPORT_API
#endif
CTYPES_DEMO_EXPORT_API int exposekey(char *bitsIn, char *bitsOut) {
int ret = 0;
printf("Message from C code...\n");
for (int j = 0; j < 1000; j++)
{
bitsOut[j] = bitsIn[j + 2000];
ret++;
}
return ret;
}
Notes:
Based on comments, I changed the types in the function from int* to char*, because it's 4 times more compact (although it's still ~700% inefficient since 7 bits of each char are ignored versus only one of them being used; that can be fixed, but requires bitwise processing)
I took a and turned into the 2nd argument (bitsOut). I think this is best because it's caller responsibility to allocate and deallocate the array (the 3rd option from the beginning)
I also modified the index range (without changing functionality), because it makes more sense to work with low index values and add something to them in one place, instead of a high index values and subtract (the same) something in another place
The return value is the number of bits set (obviously, 1000 in this case) but it's just an example
printf it's just dummy, to show that the C code gets executed
When dealing with such arrays, it's recommended to pass their dimensions as well, to avoid out of bounds errors. Also, error handling is an important aspect
test_ctypes.py:
from ctypes import CDLL, c_char, c_char_p, c_int, create_string_buffer
bits_string = "010011000110101110101110101010010111011101101010101"
def main():
dll = CDLL("./ctypes_demo.dll")
exposekey = dll.exposekey
exposekey.argtypes = [c_char_p, c_char_p]
exposekey.restype = c_int
bits_in = create_string_buffer(b"\0" * 2000 + bits_string.encode())
bits_out = create_string_buffer(1000)
print("Before: [{}]".format(bits_out.raw[:len(bits_string)].decode()))
ret = exposekey(bits_in, bits_out)
print("After: [{}]".format(bits_out.raw[:len(bits_string)].decode()))
print("Return code: {}".format(ret))
if __name__ == "__main__":
main()
Notes:
1st, I want to mention that running your code didn't raise the error you got
Specifying function's argtypes and restype is mandatory, and also makes things easier (documented in the ctypes tutorial)
I am printing the bits_out array (only the first - and relevant - part, as the rest are 0) in order to prove that the C code did its job
I initialize bits_in array with 2000 dummy 0 at the beginning, as those values are not relevant here. Also, the input string (bits_string) is not 3000 characters long (for obvious reasons). If your bits_string is 3000 characters long you can simply initialize bits_in like: bits_in = create_string_buffer(bits_string.encode())
Do not forget to initialize bits_out to an array with a size large enough (in our example 1000) for its purpose, otherwise segfault might arise when trying to set its content past the size
For this (simple) function, the ctypes variant was easier (at least for me, since I don't use swig frequently), but for more complex functions / projects it will become an overkill and switching to swig would be the right thing to do
Output (running with Python3.5 on Win):
c:\Work\Dev\StackOverflow\q47276327>"c:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" test_ctypes.py
Before: [ ]
Message from C code...
After: [010011000110101110101110101010010111011101101010101]
Return code: 1000
2. swig
Almost everything from the ctypes section, applies here as well
swig_demo.c:
#include <malloc.h>
#include <stdio.h>
#include "swig_demo.h"
char *exposekey(char *bitsIn) {
char *bitsOut = (char*)malloc(sizeof(char) * 1000);
printf("Message from C code...\n");
for (int j = 0; j < 1000; j++) {
bitsOut[j] = bitsIn[j + 2000];
}
return bitsOut;
}
swig_demo.i:
%module swig_demo
%{
#include "swig_demo.h"
%}
%newobject exposekey;
%include "swig_demo.h"
swig_demo.h:
char *exposekey(char *bitsIn);
Notes:
Here I'm allocating the array and return it (the 2nd option from the beginning)
The .i file is a standard swig interface file
Defines the module, and its exports (via %include)
One thing that is worth mentioning is the %newobject directive that deallocates the pointer returned by exposekey to avoid memory leaks
The .h file just contains the function declaration, in order to be included by the .i file (it's not mandatory, but things are more elegant this way)
The rest is pretty much the same
test_swig.py:
from swig_demo import exposekey
bits_in = "010011000110101110101110101010010111011101101010101"
def main():
bits_out = exposekey("\0" * 2000 + bits_in)
print("C function returned: [{}]".format(bits_out))
if __name__ == "__main__":
main()
Notes:
Things make much more sense from Python programmer's PoV
Code is a lot shorter (that is because swig did some "magic" behind the scenes):
The wrapper .c wrapper file generated from the .i file has ~120K
The swig_demo.py generated module has ~3K
I used the same technique with 2000 0 at the beginning of the string
Output:
c:\Work\Dev\StackOverflow\q47276327>"c:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" test_swig.py
Message from C code...
C function returned: [010011000110101110101110101010010111011101101010101]
3. Plain Python C API
I added this part as a personal exercise
This is what swig does, but "manually"
capi_demo.c:
#include "Python.h"
#include "swig_demo.h"
#define MOD_NAME "capi_demo"
static PyObject *PyExposekey(PyObject *self, PyObject *args) {
PyObject *bitsInArg = NULL, *bitsOutArg = NULL;
char *bitsIn = NULL, *bitsOut = NULL;
if (!PyArg_ParseTuple(args, "O", &bitsInArg))
return NULL;
bitsIn = PyBytes_AS_STRING(PyUnicode_AsEncodedString(bitsInArg, "ascii", "strict"));
bitsOut = exposekey(bitsIn);
bitsOutArg = PyUnicode_FromString(bitsOut);
free(bitsOut);
return bitsOutArg;
}
static PyMethodDef moduleMethods[] = {
{"exposekey", (PyCFunction)PyExposekey, METH_VARARGS, NULL},
{NULL}
};
static struct PyModuleDef moduleDef = {
PyModuleDef_HEAD_INIT, MOD_NAME, NULL, -1, moduleMethods
};
PyMODINIT_FUNC PyInit_capi_demo(void) {
return PyModule_Create(&moduleDef);
}
Notes:
It requires swig_demo.h and swig_demo.c (not going to duplicate their contents here)
It only works with Python 3 (actually I got quite some headaches making it work, especially because I was used to PyString_AsString which is no longer present)
Error handling is poor
test_capi.py is similar to test_swig.py with one (obvious) difference: from swig_demo import exposekey should be replaced by from capi_demo import exposekey
The output is also the same to test_swig.py (again, not going to duplicate it here)

Obtaining pointer to python memoryview on bytes object

I have a python memoryview pointing to a bytes object on which I would like to perform some processing in cython.
My problem is:
because the bytes object is not writable, cython does not allow constructing a typed (cython) memoryview from it
I cannot use pointers either because I cannot get a pointer to the memoryview start
Example:
In python:
array = memoryview(b'abcdef')[3:]
In cython:
cdef char * my_ptr = &array[0] fails to compile with the message: Cannot take address of Python variable
cdef char[:] my_view = array fails at runtime with the message: BufferError: memoryview: underlying buffer is not writable
How does one solve this?
Ok, after digging through the python api I found a solution to get a pointer to the bytes object's buffer in a memoryview (here called bytes_view = memoryview(bytes())). Maybe this helps somebody else:
from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_ANY_CONTIGUOUS, PyBUF_SIMPLE
cdef Py_buffer buffer
cdef char * my_ptr
PyObject_GetBuffer(bytes, &buffer, PyBUF_SIMPLE | PyBUF_ANY_CONTIGUOUS)
try:
my_ptr = <char *>buffer.buf
# use my_ptr
finally:
PyBuffer_Release(&buffer)
Using a bytearray (as per #CheeseLover's answer) is probably the right way of doing things. My advice would be to work entirely in bytearrays thereby avoiding temporary conversions. However:
char* can be directly created from a Python string (or bytes) - see the end of the linked section:
cdef char * my_ptr = array
# you can then convert to a memoryview as normal in Cython
cdef char[:] mview = <char[:len(array)]>my_ptr
A couple of warnings:
Remember that bytes is not mutable and if you attempt to modify that memoryview is likely to cause issues
my_ptr (and thus mview) are only valid so long as array is valid, so be sure to keep a reference to array for as long as you need access ti the data,
You can use bytearray to create a mutable memoryview. Please note that this won't change the string, only the bytearray
data = bytearray('python')
view = memoryview(data)
view[0] = 'c'
print data
# cython
If you don't want cython memoryview to fail with 'underlying buffer is not writable' you simply should not ask for a writable buffer. Once you're in C domain you can summarily deal with that writability. So this works:
cdef const unsigned char[:] my_view = array
cdef char* my_ptr = <char*>&my_view[0]

How to use #define as size of char array in Cython

c++ header (some.h) contains:
#define NAME_SIZE 42
struct s_X{
char name[NAME_SIZE + 1]
} X;
I want to use X structure in Python. How could I make it?
I write:
cdef extern from "some.h":
cdef int NAME_SIZE # 42
ctypedef struct X:
char name[NAME_SIZE + 1]
And got an error: Not allowed in a constant expression
It often doesn't really matter what you tell Cython when declaring types - it uses the information for checking you aren't doing anything obviously wrong with type casting and that's it. The cdef extern "some.h" statement ensures that some.h is included into to c-file Cython creates and ultimately that determines what is complied.
Therefore, in this particular case, you can just insert an arbitary number and it will work fine
cdef extern "some.h":
cdef int NAME_SIZE # 42
ctypedef struct X:
char name[2] # you can pick a number at random here
In situations it won't work though, especially where Cython has to actually use the number in the C code it generates. For example:
def some_function():
cdef char_array[NAME_SIZE+1] # won't work! Cython needs to know NAME_SIZE to generate the C code...
# other code follows
(I don't currently have a suggestion as to what to do in this case)
NAME_SIZE doesn't actually exist in your program so you'll probably have to hardcode it in the Python.
Despite how it looks in your C source code, you hardcoded it in the C array declaration, too.

Using Py_buffer and PyMemoryView_FromBuffer with different itemsizes

This question is related to a previous question I asked. Namely this one if anyone is interested. Basically, what I want to do is to expose a C array to Python using a Py_buffer wrapped in a memoryview-object. I've gotten it to work using PyBuffer_FillInfo (work = I can manipulate the data in Python and write it to stdout in C), but if I try to roll my own buffer I get a segfault after the C function returns.
I need to create my own buffer because PyBuffer_FillInfo assumes that the format is char, making the itemsize field 1. I need to be able to provide items of size 1, 2, 4 and 8.
Some code, this is a working example:
Py_buffer *buf = (Py_buffer *) malloc(sizeof(*buf));
int r = PyBuffer_FillInfo(buf, NULL, malloc(sizeof(char) * 4), 4, 0, PyBUF_CONTIG);
PyObject *mv = PyMemoryView_FromBuffer(buf);
//Pack the memoryview object into an argument list and call the Python function
for (blah)
printf("%c\n", *buf->buf++); //this prints the values i set in the Python function
Looking at the implementation of PyBuffer_FillInfo, which is really simple, I rolled my own function to be able to provide custom itemsizes:
//buffer creation function
Py_buffer *getReadWriteBuffer(int nitems, int itemsize, char *fmt) {
Py_buffer *buf = (Py_buffer *) malloc(sizeof(*buf));
buf->obj = NULL
buf->buf = malloc(nitems * itemsize);
buf->len = nitems * itemsize;
buf->readonly = 0;
buf->itemsize = itemsize;
buf->format = fmt;
buf->ndim = 1;
buf->shape = NULL;
buf->strides = NULL;
buf->suboffsets = NULL;
buf->internal = NULL;
return buf;
}
How i use it:
Py_buffer *buf = getReadWriteBuffer(32, 2, "h");
PyObject *mv = PyMemoryView_FromBuffer(buf);
// pack the memoryview into an argument list and call the Python function as before
for (blah)
printf("%d\n", *buf->buf); //this prints all zeroes even though i modify the array in Python
return 0;
//the segfault happens somewhere after here
The result of using my own buffer object is a segfault after the C function returns. I really don't understand why this happens at all. Any help would be most appreciated.
EDIT
According to this question, which I failed to find before, itemsize > 1 might not even be supported at all. Which makes this question even more interesting. Maybe I could use PyBuffer_FillInfo with a large enough block of memory to hold what I want (32 C floats for example). In that case, the question is more about how to assign Python floats to the memoryview object in the Python function. Questions questions.
So, in lack of answers I decided to take another approach than the one I originally intended. Leaving this here in case someone else hits the same snag.
Basically, instead of creating a buffer (or bytearray, equiv.) in C and passing it to Python for the extension user to modify. I simply redesigned the code slightly, so that the user returns a bytearray (or any type that supports the buffer interface) from the Python callback function. This way I need not even worry about the size of the items since, in my case, all the C code does with the returned object is to extract its buffer and copy it to another buffer with a simple memcpy.
Code:
PYGILSTATE_ACQUIRE; //a macro i made
PyObject *result = PyEval_CallObject(python_callback, NULL);
if (!PyObject_CheckBuffer(result))
; //raise exception
Py_buffer *view = (Py_buffer *) malloc(sizeof(*view));
int error = PyObject_GetBuffer(result, view, PyBUF_SIMPLE);
if (error)
; //raise exception
memcpy(my_other_buffer, view->buf, view->len);
PyBuffer_Release(view);
Py_DECREF(result);
PYGILSTATE_RELEASE; //another macro
I hope this helps someone.

Categories