I'm trying to use Cython to run a C++ library in a Python 3 environment.
When I try to return the int array back to python like this:
def readBytes(self, length):
    cdef int *buffer = []
    self.stream.read(buffer, length)
    return buffer
I get the error
return buffer
^
Cannot convert 'int *' to Python object
P.S. I don't get errors if I use
cdef char *buffer = ''
It looks like stream.read() allocates the memory pointed to by buffer. If that is the case, you cannot return data allocated in C++ space to Python space. You should:
1) create a Python object (or a NumPy array, if you prefer) in Python/Cython code;
2) copy the data from the allocated memory pointed to by buffer into your new shiny Python object, which is allocated in Python space. You can then return this object.
This is necessary because Python cannot deal with memory allocated in C space in any way; note also that the memory allocated by your C code is leaking, i.e. it will never be deallocated unless you free it yourself.
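A minimal sketch of that approach could look like the following (this is the method from the question rewritten; it assumes, for illustration, that the caller allocates the buffer and that self.stream.read() fills it with length ints - the copy into a Python list is the important part):

from libc.stdlib cimport malloc, free

def readBytes(self, int length):
    # Allocate a C buffer for the data (assumption: read() fills a
    # caller-provided buffer).
    cdef int *buffer = <int *>malloc(length * sizeof(int))
    if buffer == NULL:
        raise MemoryError()
    try:
        self.stream.read(buffer, length)
        # Copy the C data into a Python list; this object lives in
        # Python space and is safe to return.
        return [buffer[i] for i in range(length)]
    finally:
        # Free the C buffer so it does not leak.
        free(buffer)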
Now you also asked why you do not get the error with cdef char *buffer = ''. In this latter case, Cython recognizes that buffer points to a string and automatically generates a new Python object with the content pointed to by buffer. An example for IPython follows:
%%cython
def ReturnThisString():
    cdef char *buffer = b'foobar'
    return buffer

print(ReturnThisString())  # this outputs b'foobar' on Python 3
Notice that buffer is initialized by your C compiler, and there's no guarantee that when you call this function from Python the string will still be at that memory position. However, when Cython runs the return statement, it automatically initializes a Python string from your char * pointer. (In Python 3 I guess it is converted to bytes, as @Veedrac says, but this is a minor point.) In this second case, the creation of the Python object and the copy operation are hidden and taken care of by Cython, but they are still there.
char can be automatically coerced to bytes because Cython considers the two to be approximately the same and can do the conversion fast. Note that char * buffers are assumed to be null-terminated by default.
This is not implemented for int *. You will typically want to coerce to a NumPy object (which actually wraps the array). If you want something faster, think about cpython.array.
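For instance, a sketch of the cpython.array route might look like this (it assumes the data is already sitting in a C int buffer; the cdef helper is only callable from other Cython code):

from cpython cimport array
from libc.string cimport memcpy
import array

cdef array.array ints_to_pyarray(int *buffer, Py_ssize_t length):
    # Create a Python array.array('i', ...) of the right size...
    cdef array.array template = array.array('i', [])
    cdef array.array result = array.clone(template, length, zero=False)
    # ...and copy the C data into its internal storage.
    memcpy(result.data.as_ints, buffer, length * sizeof(int))
    return result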
Although I found a few similar questions already asked, I couldn't get my head around how to fix this.
Basically I have this function:
Module one.pyx:
from libc.stdlib cimport malloc
from libc.string cimport memset, strcat

cdef char *isInData(data, dataLength):
    cdef char *ret = <char *>malloc(200)
    memset(ret, 0x00, 200)
    if b"Hello" in data and b"World" in data:
        strcat(ret, b"Hello World!")
    return ret
Module two.pyx:
import one
from libc.stdlib cimport malloc, realloc, free

cdef char *tempo():
    cdef char *str
    str = one.isInData(b"Hello what is up World", 23)
    # do some stuff
    free(str)
The issue occurs on the line str = one.isInData(b"Hello what is up World", 23). I am assuming that as soon as isInData's ret is assigned to str, ret is deleted, which causes this issue. Can anyone help me fix this?
import one
This line does a Python import of one. It doesn't know about any cdef functions defined in one (or even that one is a Cython module...). Therefore it assumes that isInData is a Python object that it can look up and that it'll be a callable returning another Python object.
cdef char* str = some_python_function() is unsafe because str points into a Python object. However, the Python object is only a temporary, and it is most likely freed almost immediately.
What you mean is:
cimport one
which tells Cython that it's a Cython module that you're importing, and it should expect to find the declarations at compile-time. You'll probably need to write a pxd file to give the declarations.
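For example, a minimal one.pxd matching the code above might look like this (a sketch; the exact declarations are assumptions based on your snippet):

# one.pxd -- compile-time declarations for one.pyx
cdef char *isInData(data, dataLength)

# two.pyx
cimport one
from libc.stdlib cimport free

cdef void tempo():
    cdef char *s = one.isInData(b"Hello what is up World", 23)
    # do some stuff
    free(s)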
I haven't looked in detail at the rest of your code, so I don't know if you are handling C strings correctly. But in general you may find it easier to just use Python strings rather than messing around with C string handling.
I would like to call C functions in my shared library from Python scripts. The problem arises when passing pointers: the 64-bit addresses seem to be truncated to 32-bit addresses within the called function. Both Python and my library are 64-bit.
The example code below demonstrates the problem. The Python script prints the address of the data being passed to the C function. Then the received address is printed from within the called C function. Additionally, the C function proves that it is 64-bit by printing the size and address of locally allocated memory. If the pointer is used in any other way, the result is a segfault.
CMakeLists.txt
cmake_minimum_required (VERSION 2.6)
add_library(plate MODULE plate.c)
plate.c
#include <stdio.h>
#include <stdlib.h>

void plate(float *in, float *out, int cnt)
{
    void *ptr = malloc(1024);
    fprintf(stderr, "passed address: %p\n", in);
    fprintf(stderr, "local pointer size: %lu\n local pointer address: %p\n", sizeof(void *), ptr);
    free(ptr);
}
test_plate.py
import numpy
import scipy
import ctypes
N = 3
x = numpy.ones(N, dtype=numpy.float32)
y = numpy.ones(N, dtype=numpy.float32)
plate = ctypes.cdll.LoadLibrary('libplate.so')
print 'passing address: %0x' % x.ctypes.data
plate.plate(x.ctypes.data, y.ctypes.data, ctypes.c_int(N))
Output from python-2.7
In [1]: run ../test_plate.py
passing address: 7f9a09b02320
passed address: 0x9b02320
local pointer size: 8
local pointer address: 0x7f9a0949a400
The problem is that the ctypes module doesn't check the function signature of the function you're trying to call. Instead, it bases the C types on the Python types, so the line...
plate.plate(x.ctypes.data, y.ctypes.data, ctypes.c_int(N))
...is passing the first two params as integers. See eryksun's answer for the reason why they're being truncated to 32 bits.
To avoid the truncation, you'll need to tell ctypes that those params are actually pointers with something like...
plate.plate(ctypes.c_void_p(x.ctypes.data),
            ctypes.c_void_p(y.ctypes.data),
            ctypes.c_int(N))
...although what they're actually pointers to is another matter - they may not be pointers to float as your C code assumes.
Update
eryksun has since posted a much more complete answer for the numpy-specific example in this question, but I'll leave this here, since it might be useful in the general case of pointer truncation for programmers using something other than numpy.
Python's PyIntObject uses a C long internally, which is 64-bit on most 64-bit platforms (excluding 64-bit Windows). However, ctypes assigns the converted result to pa->value.i, where value is a union and the i field is a 32-bit int. For the details, see ConvParam in Modules/_ctypes/callproc.c, lines 588-607 and 645-664. ctypes was developed on Windows, where a long is always 32-bit, but I don't know why this hasn't been changed to use the long field instead, i.e. pa->value.l. Probably, it's just more convenient most of the time to default to creating a C int instead of using the full range of the long.
Anyway, this means you can't simply pass a Python int to create a 64-bit pointer. You have to explicitly create a ctypes pointer. You have a number of options for this. If you're not concerned about type safety, the simplest option for a NumPy array is to use its ctypes attribute. This defines the hook _as_parameter_ that lets Python objects set how they're converted in ctypes function calls (see lines 707-719 in the previous link). In this case it creates a void *. For example, you'd call plate like this:
plate.plate(x.ctypes, y.ctypes, N)
However, this doesn't offer any type safety to prevent the function from being called with an array of the wrong type, which will result in either nonsense, bugs, or a segmentation fault. np.ctypeslib.ndpointer solves this problem. This creates a custom type that you can use in setting the argtypes and restype of a ctypes function pointer. This type can verify the array's data type, number of dimensions, shape, and flags. For example:
import numpy as np
import ctypes

c_npfloat32_1 = np.ctypeslib.ndpointer(
    dtype=np.float32,
    ndim=1,
    flags=['C', 'W'])

plate = ctypes.CDLL('libplate.so')

plate.plate.argtypes = [
    c_npfloat32_1,
    c_npfloat32_1,
    ctypes.c_int,
]

N = 3
x = np.ones(N, dtype=np.float32)
y = np.ones(N, dtype=np.float32)

plate.plate(x, y, N)  # the parameter is the array itself
If you don't tell ctypes what types the parameters are, it attempts to infer them from the values that you pass to the function, and this inference will not always work as you need.
The recommended way to deal with this is to set the argtypes attribute of the function and so explicitly tell ctypes what the parameter types are.
plate.plate.argtypes = [
    ctypes.POINTER(ctypes.c_float),
    ctypes.POINTER(ctypes.c_float),
    ctypes.c_int
]
Then you can call the function like this:
plate.plate(x.ctypes.data, y.ctypes.data, N)
Actually, you should set plate.plate.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_int]; then the C function will accept the addresses passed from Python correctly.
I ran into this problem and solved it this way.
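For example (a sketch, reusing the names from the question above):

plate.plate.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_int]
plate.plate(x.ctypes.data, y.ctypes.data, N)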
Context: I built a tree data structure that stores single characters in its nodes in Cython. Now I'm wondering whether I can save memory if I intern all those characters, and whether I should use Py_UNICODE or the regular str as the variable type. This is my stripped-down Node object, using Py_UNICODE:
from libc.stdint cimport uintptr_t
from cpython cimport PyObject

cdef class Node():
    cdef:
        public Py_UNICODE character

    def __init__(self, Py_UNICODE character):
        self.character = character

    def memory(self):
        return <uintptr_t>&self.character
I first tried to see whether the characters are interned automatically. If I import that class in Python and create multiple objects with different or the same character, these are the results I get:
a = Node("a")
a_py = a.character
a2 = Node("a")
b = Node("b")
print(a.memory(), a2.memory(), b.memory())
# 140532544296704 140532548558776 140532544296488
print(id(a.character), id(a2.character), id(b.character), id(a_py))
# 140532923573504 140532923573504 140532923840528 140532923573504
From that I would have concluded that Py_UNICODE is not interned automatically, and that using id() in Python does not give me the actual memory address but that of a copy (I suppose Python does intern single unicode characters automatically and then just returns the address of the interned copy to me).
Next I tried doing the same thing when using str instead. Simply replacing Py_UNICODE with str didn't work, so this is how I'm trying to do it now:
%%cython
from libc.stdint cimport uintptr_t
from cpython cimport PyObject

cdef class Node():
    cdef:
        public str character

    def __init__(self, str character):
        self.character = character

    def memory(self):
        return <uintptr_t>(<PyObject*>self.character)
And these are the results I get with that:
...
print(a.memory(), a2.memory(), b.memory())
# 140532923573504 140532923573504 140532923840528
print(id(a.character), id(a2.character), id(b.character), id(a_py))
# 140532923573504 140532923573504 140532923840528 140532923573504
Based on that, I first thought that single-character str objects are interned in Cython as well, and that Cython just doesn't need to copy the characters from Python, explaining why id() and .memory() give the same address. But then I tried longer strings and got the same results, from which I probably shouldn't conclude that longer strings are also interned automatically? It's also the case that my tree uses less memory if I use Py_UNICODE, which doesn't make much sense if str were interned but Py_UNICODE weren't. Can someone explain this behaviour to me? And how would I go about interning?
(I'm testing this in Jupyter, in case that makes a difference)
Edit: Removed an unnecessary id comparison of nodes instead of characters.
There is a misunderstanding on your side. Py_UNICODE isn't a Python object - it is a typedef for wchar_t.
Only string objects (at least some of them) get interned, not simple C variables of type wchar_t (or, as a matter of fact, of any C type). It wouldn't make any sense either: a wchar_t is most probably 32 bits large, while keeping a pointer to an interned object would cost 64 bits.
Thus, the memory address of the variable self.character (of type Py_UNICODE) is never the same as long as the self objects are different (no matter which value self.character has).
On the other hand, when you access a.character in pure Python, Cython knows that the variable is not a simple 32-bit integer and converts it automatically (character is a property, right?) to a unicode object via PyUnicode_FromOrdinal. The returned string (i.e. a_py) might be "interned" or not.
When the code point of this character is less than 256 (i.e. Latin-1), it gets kind of interned - otherwise not. The first 256 unicode objects consisting of only one character have a special place - not the same as other interned strings (thus the quotes around "interned" above).
Consider:
>>> a = "\u00ff"  # ord(a) == 255
>>> b = "\u00ff"
>>> a is b
True
but
>>> a = "\u0100"  # ord(a) == 256
>>> b = "\u0100"
>>> a is b
False
The key takeaway from this is: use Py_UNICODE - it is cheaper (4 bytes), even if not interned, than interned strings/unicode objects (8 bytes for the reference plus, once, the memory for the interned object) and much cheaper than non-interned objects (which can happen).
Or better, as @user2357112 has pointed out, use Py_UCS4 so that a size of 4 bytes is guaranteed (which is needed to support all possible unicode characters) - wchar_t could be as small as 1 byte (even if this is probably pretty unusual nowadays). If you know more about the characters used, you could fall back to Py_UCS2 or Py_UCS1.
However, when using Py_UCS2 or Py_UCS1, one must take into consideration that Cython will not support the conversion from/to unicode as in the case of Py_UCS4 (or the deprecated Py_UNICODE), and one will have to do it by hand, for example:
%%cython
from libc.stdint cimport uint16_t

# need to wrap the typedef, as Cython doesn't provide it
cdef extern from "Python.h":
    ctypedef uint16_t Py_UCS2

cdef class Node:
    cdef:
        Py_UCS2 character_

    @property
    def character(self):
        # Cython will do the right thing for Py_UCS4
        return <Py_UCS4>(self.character_)

    def __init__(self, str character):
        # unicode -> Py_UCS4 managed by Cython
        # Py_UCS4 -> Py_UCS2 is a simple C cast
        self.character_ = <Py_UCS2><Py_UCS4>(character)
One should also make sure that using Py_UCS2 really saves memory: CPython uses pymalloc, which has an alignment of 8 bytes; that means an object of e.g. 20 bytes will still use 24 bytes (3*8) of memory. Another issue is the alignment of structs produced by the C compiler: for

struct A {
    long long int a;
    long long int b;
    char ch;
};

sizeof(A) is 24 instead of 17, because the compiler pads the struct to keep its members aligned.
If one is really after these two bytes, though, there is a bigger fish to fry: don't make the nodes Python objects, as that brings an overhead of 16 bytes per node for not really needed polymorphism and reference counting - which means the whole data structure should be written in C and wrapped as a whole in Python (see the sketch below). However, here too make sure to allocate memory in the right fashion: the usual C-runtime memory allocators have 32- or 64-byte alignment, i.e. allocating smaller sizes still leads to the usage of 32/64 bytes.
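For illustration, a minimal sketch of that idea (hypothetical names; a plain C node struct, allocated manually, with a single Python wrapper around the whole tree):

%%cython
from libc.stdlib cimport malloc, free

cdef struct CNode:
    Py_UCS4 character
    CNode *first_child
    CNode *next_sibling

cdef CNode *new_node(Py_UCS4 character) except NULL:
    cdef CNode *node = <CNode *>malloc(sizeof(CNode))
    if node == NULL:
        raise MemoryError()
    node.character = character
    node.first_child = NULL
    node.next_sibling = NULL
    return node

cdef void free_tree(CNode *node):
    # recursively free children and siblings
    if node == NULL:
        return
    free_tree(node.first_child)
    free_tree(node.next_sibling)
    free(node)

cdef class Tree:
    # the only Python object: a thin wrapper around the C nodes
    cdef CNode *root

    def __cinit__(self):
        self.root = new_node(u'\0')

    def __dealloc__(self):
        if self.root != NULL:
            free_tree(self.root)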
I have a python memoryview pointing to a bytes object on which I would like to perform some processing in cython.
My problem is:
1) because the bytes object is not writable, Cython does not allow constructing a typed (Cython) memoryview from it;
2) I cannot use pointers either, because I cannot get a pointer to the start of the memoryview.
Example:
In python:
array = memoryview(b'abcdef')[3:]
In cython:
cdef char * my_ptr = &array[0] fails to compile with the message: Cannot take address of Python variable
cdef char[:] my_view = array fails at runtime with the message: BufferError: memoryview: underlying buffer is not writable
How does one solve this?
OK, after digging through the Python API I found a solution for getting a pointer to the buffer behind a bytes object wrapped in a memoryview (here called bytes_view = memoryview(bytes())). Maybe this helps somebody else:
from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_ANY_CONTIGUOUS, PyBUF_SIMPLE

cdef Py_buffer buffer
cdef char *my_ptr

# bytes_view is the memoryview over the bytes object
PyObject_GetBuffer(bytes_view, &buffer, PyBUF_SIMPLE | PyBUF_ANY_CONTIGUOUS)
try:
    my_ptr = <char *>buffer.buf
    # use my_ptr
finally:
    PyBuffer_Release(&buffer)
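Wrapped into a complete function, the same idea might look like this (a sketch; count_zeros is a hypothetical example, and obj can be a bytes object or a memoryview over one, since PyBUF_SIMPLE never requests write access):

%%cython
from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_ANY_CONTIGUOUS, PyBUF_SIMPLE

def count_zeros(obj):
    # works for read-only buffers such as bytes
    cdef Py_buffer buffer
    cdef const char *ptr
    cdef Py_ssize_t i, n = 0
    PyObject_GetBuffer(obj, &buffer, PyBUF_SIMPLE | PyBUF_ANY_CONTIGUOUS)
    try:
        ptr = <const char *>buffer.buf
        for i in range(buffer.len):
            if ptr[i] == 0:
                n += 1
        return n
    finally:
        PyBuffer_Release(&buffer)

count_zeros(memoryview(b'a\x00b\x00')[1:])  # -> 2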
Using a bytearray (as per @CheeseLover's answer) is probably the right way of doing things. My advice would be to work entirely with bytearrays, thereby avoiding temporary conversions. However:
char* can be directly created from a Python string (or bytes) - see the end of the linked section:
cdef char * my_ptr = array
# you can then convert to a memoryview as normal in Cython
cdef char[:] mview = <char[:len(array)]>my_ptr
A couple of warnings:
Remember that bytes is not mutable, and attempting to modify it through that memoryview is likely to cause issues.
my_ptr (and thus mview) are only valid as long as array is valid, so be sure to keep a reference to array for as long as you need access to the data.
You can use bytearray to create a mutable memoryview. Please note that this won't change the string, only the bytearray:

data = bytearray(b'python')
view = memoryview(data)
view[0] = ord('c')
print(data)
# bytearray(b'cython')
If you don't want the Cython memoryview to fail with 'underlying buffer is not writable', you simply should not ask for a writable buffer. Once you're in the C domain, you can deal with the writability as you see fit. So this works:
cdef const unsigned char[:] my_view = array
cdef char* my_ptr = <char*>&my_view[0]