C char array from Python string

I have a list of strings in python which I'm trying to pass down to a C extension for character analysis. I've gotten so far as to have the list broken up into their individual string PyObjects. Next, I'm hoping to split these strings into their individual characters so that every string PyObject is now a corresponding C-type character array. I can't seem to figure out how to do this though.
Here's what I have so far. Currently, after building the .pyd file, it returns a list of 1's to Python as a filler (so everything else works); I just don't know how to split a string PyObject into a C character array.
--- cExt.c ---
#include <Python.h>
#include <stdio.h>
static int *CitemCheck(PyObject *commandString, int commandStringLength) {
// HAALP
//char* commandChars = (char*) malloc(commandStringLength*sizeof(char*));
// char c[] = PyString_AsString("c", commandString);
// printf("%c" , c);
// printf("%s", PyString_AsString(commandString));
// for (int i=0; i<sizeof(commandChars)/sizeof(*commandChars); i++) {
// printf("%s", PyString_AsString(commandString));
// printf("%c", commandChars[i]);
// }
return 1; // TODO: RETURN PROPER RESULTANT
}
static PyObject *ClistCheck(PyObject *commandList, int commandListLength) {
PyObject *results = PyList_New(commandListLength);
for (int index = 0; index < commandListLength; index++) {
PyObject *commandString;
commandString = PyList_GetItem(commandList, index);
int commandStringLength = PyObject_Length(commandString);
// CitemCheck should take string PyObject and its length as int
int x = CitemCheck(commandString, commandStringLength);
PyObject* pyItem = Py_BuildValue("i", x);
PyList_SetItem(results, index, pyItem);
}
return results;
}
static PyObject *parseListCheck(PyObject *self, PyObject *args) {
PyObject *commandList;
int commandListLength;
if (!PyArg_ParseTuple(args, "O", &commandList)){
return NULL;
}
commandListLength = PyObject_Length(commandList);
return Py_BuildValue("O", ClistCheck(commandList, commandListLength));
}
static char listCheckDocs[] =
""; // TODO: ADD DOCSTRING
static PyMethodDef listCheck[] = {
{"listCheck", (PyCFunction) parseListCheck, METH_VARARGS, listCheckDocs},
{NULL,NULL,0,NULL}
};
static struct PyModuleDef DCE = {
PyModuleDef_HEAD_INIT,
"listCheck",
NULL,
-1,
listCheck
};
PyMODINIT_FUNC PyInit_cExt(void){
return PyModule_Create(&DCE);
}
For reference, my temporary extension build file:
--- _c_setup.py ---
(located in same folder as cExt.c)
"""
to build C files, pass:
python _c_setup.py build_ext --inplace clean --all
in a command prompt that is cd'd to the file's directory
"""
import glob
from setuptools import setup, Extension, find_packages
from os import path
here = path.abspath(path.dirname(__file__))
files = [path.split(x)[1] for x in glob.glob(path.join(here, '**.c'))]
extensions = [Extension(
path.splitext(x)[0], [x]
) for x in files]
setup(
ext_modules = extensions,
)

You can use PyUnicode_AsEncodedString, which the documentation describes as follows:
Encode a Unicode object and return the result as Python bytes object. encoding and errors have the same meaning as the parameters of the same name in the Unicode encode() method. The codec to be used is looked up using the Python codec registry. Return NULL if an exception was raised by the codec.
see https://docs.python.org/3/c-api/unicode.html#c.PyUnicode_AsEncodedString
Then PyBytes_AsString gives you a pointer to the internal buffer of the bytes object, which ends with a terminating NUL byte. This buffer must be neither deallocated nor modified. If you need a copy, you can use e.g. strdup.
see https://docs.python.org/3/c-api/bytes.html#c.PyBytes_AsString
Slightly modifying your code it could look like this:
PyObject *encodedString = PyUnicode_AsEncodedString(commandString, "UTF-8", "strict");
if (encodedString) { //returns NULL if an exception was raised
char *commandChars = PyBytes_AsString(encodedString); //pointer refers to the internal buffer of encodedString
if(commandChars) {
printf("the string '%s' consists of the following chars:\n", commandChars);
for (int i = 0; commandChars[i] != '\0'; i++) {
printf("%c ", commandChars[i]);
}
printf("\n");
}
Py_DECREF(encodedString);
}
Testing with:
import cExt
fruits = ["apple", "pears", "cherry", "pear", "blueberry", "strawberry"]
res = cExt.listCheck(fruits)
print(res)
The output would be:
the string 'apple' consists of the following chars:
a p p l e
the string 'pears' consists of the following chars:
p e a r s
the string 'cherry' consists of the following chars:
c h e r r y
the string 'pear' consists of the following chars:
p e a r
the string 'blueberry' consists of the following chars:
b l u e b e r r y
the string 'strawberry' consists of the following chars:
s t r a w b e r r y
[1, 1, 1, 1, 1, 1]
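One thing to keep in mind with this approach: the C loop walks over the bytes of the UTF-8 encoding, not over Python characters. For plain ASCII input such as the fruit names the two coincide, but they diverge as soon as a string contains non-ASCII text. A minimal Python sketch of the difference (the sample strings are arbitrary, chosen only for illustration):

```python
# Compare the number of characters with the number of UTF-8 bytes,
# which is what a C loop over the encoded buffer would iterate over.
samples = ["apple", "na\u00efve", "\u65e5\u672c"]

lengths = {}
for text in samples:
    encoded = text.encode("utf-8", "strict")  # same codec and error mode as the C call
    lengths[text] = (len(text), len(encoded))
    print(repr(text), "chars:", len(text), "bytes:", len(encoded))
```

If the analysis needs real characters rather than bytes, the C side would have to decode the multi-byte sequences or work on the Unicode object directly.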
Side note not directly related to the question:
Your CitemCheck function is declared to return a pointer to int, but judging by how it is called, you want it to return an int value. The function signature should look more like this:
static int CitemCheck(PyObject *commandString, int commandStringLength)
(note the removed * after int).

Related

SWIG struct pointer as output parameter

I have a struct:
struct some_struct_s {
int arg1;
int arg2;
};
I have a C function:
int func(some_struct_s *output);
Both are %included into my SWIG file.
I want some_struct_s *output to be treated like an output parameter. Python example:
int_val, some_struct_output = func()
"Output parameters" is covered in the manual for POD-types (sec 10.1.3), but not for non-POD types.
How do I tell SWIG I want some_struct_s *output to be an output parameter?
From the documentation:
11.5.7 "argout" typemap
The "argout" typemap is used to return values from arguments. This is most commonly used to write wrappers for C/C++ functions that need to return multiple values. The "argout" typemap is almost always combined with an "in" typemap---possibly to ignore the input value....
Here's a complete example for your code (no error checking for brevity):
%module test
// Declare an input typemap that suppresses requiring any input and
// declare a temporary stack variable to hold the return data.
%typemap(in,numinputs=0) some_struct_s* (some_struct_s tmp) %{
$1 = &tmp;
%}
// Declare an output argument typemap. In this case, we'll use
// a tuple to hold the structure data (no error checking).
%typemap(argout) some_struct_s* (PyObject* o) %{
o = PyTuple_New(2);
PyTuple_SET_ITEM(o,0,PyLong_FromLong($1->arg1));
PyTuple_SET_ITEM(o,1,PyLong_FromLong($1->arg2));
$result = SWIG_Python_AppendOutput($result,o);
%}
// Instead of a header file, we'll just declare this code inline.
// This includes the code in the wrapper, as well as telling SWIG
// to create wrappers in the target language.
%inline %{
struct some_struct_s {
int arg1;
int arg2;
};
int func(some_struct_s *output) {
output->arg1 = 1;
output->arg2 = 2;
return 0;
}
%}
Demo below. Note that the int return value of zero as well as the output parameter as a tuple are returned as a list.
>>> import test
>>> test.func()
[0, (1, 2)]
If you don't want typemaps, you can also inject code to create the object and return it to hide it from the user:
%module test
%rename(_func) func; // Give the wrapper a different name
%inline %{
struct some_struct_s {
int arg1;
int arg2;
};
int func(struct some_struct_s *output)
{
output->arg1 = 1;
output->arg2 = 2;
return 0;
}
%}
// Declare your interface
%pythoncode %{
def func():
s = some_struct_s()
r = _func(s)
return r, s
%}
Demo:
>>> import test
>>> r, s = test.func()
>>> r
0
>>> s
<test.some_struct_s; proxy of <Swig Object of type 'some_struct_s *' at 0x000001511D70A880> >
>>> s.arg1
1
>>> s.arg2
2
You can make the typemap language agnostic if you carefully select SWIG macros:
%module test
%typemap(in,numinputs=0) struct some_struct_s *output %{
$1 = malloc(sizeof(struct some_struct_s));
%}
%typemap(argout) struct some_struct_s* output {
// The last parameter passes ownership of the pointer
// to Python so it will be freed when the object's
// reference count goes to zero.
%append_output(SWIG_NewPointerObj($1, $1_descriptor, SWIG_POINTER_OWN));
}
%inline %{
struct some_struct_s {
int arg1;
int arg2;
};
int func(struct some_struct_s *output)
{
output->arg1 = 1;
output->arg2 = 2;
return 0;
}
%}
Demo:
>>> import test
>>> r, s = test.func()
>>> r
0
>>> s
<test.some_struct_s; proxy of <Swig Object of type 'some_struct_s *' at 0x000001DD0425A700> >
>>> s.arg1
1
>>> s.arg2
2

How to get a string from C++ to python when using ctypes and wchar_t?

I can:
Get an integer from C++ and use it in python
Send a python string (as a wchar_t) to C++ and do some logic with it
I cannot
Do step 2 in the opposite direction.
Here is my C++ code (compiled with clion and cygwin as a shared library using C++14).
#include <iostream>
wchar_t aa[2];
extern "C" {
int DoA()
{
return 10;
}
int DoB(wchar_t * in)
{
if (in[1] == 'a')
{
return 25;
}
return 30;
}
wchar_t * DoC()
{
aa[0] = 'a';
aa[1] = 'b';
return aa;
}
}
Here is my python 3.6.1 code that shows what I can and what I cannot do. So how should I get my string and do things with it in python? I expect to use the address with wstring_at to get the value, but it is not working.
from ctypes import *
import os.path
print('Hello')
itExist = os.path.exists('C:/Users/Daan/CLionProjects/stringproblem/cmake-build-release/cygstringproblem.dll')
print(itExist)
lib = cdll.LoadLibrary('C:/Users/Daan/CLionProjects/stringproblem/cmake-build-release/cygstringproblem.dll')
print('dll loaded')
A = lib.DoA()
print(A)
Bx = lib.DoB(c_wchar_p('aaa'))
print(Bx)
By = lib.DoB(c_wchar_p('bbb'))
print(By)
Ca = lib.DoC()
print(Ca)
print('Issue is coming')
Cb = wstring_at(Ca,2)
print(Cb)
Here is the output with error.
Hello
True
dll loaded
10
25
30
-1659080704
Issue is coming
Traceback (most recent call last):
File "ShowProblem.py", line 19, in <module>
Cb = wstring_at(Ca,2)
File "C:\Users\Daan\AppData\Local\Programs\Python\Python36\lib\ctypes\__init__.py", line 504, in wstring_at
return _wstring_at(ptr, size)
OSError: exception: access violation reading 0xFFFFFFFF9D1C7000
I reproduced your problem on Linux and fixed it by declaring the return type of your DoC function:
from ctypes import *
print('Hello')
lib = cdll.LoadLibrary(PATH_TO_YOUR_LIB)
print('dll loaded')
# this line solved the issue for me
lib.DoC.restype = c_wchar_p
A = lib.DoA()
print(A)
Bx = lib.DoB(c_wchar_p('aaa'))
print(Bx)
By = lib.DoB(c_wchar_p('bbb'))
print(By)
Ca = lib.DoC()
print(Ca)
print('Issue is coming')
Cb = wstring_at(Ca,2)
print(Cb)
I also allocated the memory dynamically (some Python expert might comment on this; since nothing ever frees the buffer, I guess that this causes a memory leak):
extern "C" {
int DoA()
{
return 10;
}
int DoB(wchar_t * in)
{
if (in[1] == 'a')
{
return 25;
}
return 30;
}
wchar_t * DoC()
{
wchar_t* aa = new wchar_t[3]; // one extra element for the terminating null that c_wchar_p reads up to
aa[0] = 'a';
aa[1] = 'b';
aa[2] = L'\0';
return aa;
}
}
Let me know if it works on Windows.
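The numbers in the question's output are consistent with why setting restype helps. ctypes assumes a C int return value unless restype says otherwise, so on a 64-bit build a returned pointer is cut down to its low 32 bits. The sketch below only does arithmetic on the two values printed in the question (it does not load any library) and assumes the DLL was 64-bit:

```python
# Values copied from the question's output.
printed_value = -1659080704            # what print(Ca) showed
faulting_address = 0xFFFFFFFF9D1C7000  # address in the access violation

# The low 32 bits that survived the conversion to a C int.
low_32_bits = printed_value & 0xFFFFFFFF

# The same negative int reinterpreted as a 64-bit address (sign-extended).
sign_extended = printed_value & 0xFFFFFFFFFFFFFFFF

print(hex(low_32_bits))
print(hex(sign_extended))
print(sign_extended == faulting_address)
```

The sign-extended value matches the faulting address, which suggests wstring_at was handed a truncated pointer rather than the real address of the buffer.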
If you set the .argtypes and .restype of your wrapped functions, you can call them more naturally. To return a string in a thread-safe way, either allocate the output buffer in Python and pass it in (instead of using a global variable), or return a wide string constant. Here's an example coded for the Microsoft compiler:
test.c
#include <wchar.h>
#include <string.h>
__declspec(dllexport) int DoA(void) {
return 10;
}
__declspec(dllexport) int DoB(const wchar_t* in) {
if(wcslen(in) > 1 && in[1] == 'a') // Make sure not indexing past the end.
return 25;
return 30;
}
// This version good if variable data is returned.
// Need to pass a buffer of sufficient length.
__declspec(dllexport) int DoC(wchar_t* aa, size_t length) {
if(length < 3)
return 0;
aa[0] = 'a';
aa[1] = 'b';
aa[2] = '\0';
return 1;
}
// Safe to return a constant. No memory leak.
__declspec(dllexport) wchar_t* DoD(void) {
return L"abcdefg";
}
test.py
from ctypes import *
# Set up the arguments and return type
lib = CDLL('test')
lib.DoA.argtypes = None
lib.DoA.restype = c_int # default, but just to be thorough.
lib.DoB.argtypes = [c_wchar_p]
lib.DoB.restype = c_int
lib.DoC.argtypes = [c_wchar_p,c_size_t]
lib.DoC.restype = c_int
lib.DoD.argtypes = None
lib.DoD.restype = c_wchar_p
# Map to local namespace functions
DoA = lib.DoA
DoB = lib.DoB
DoD = lib.DoD
# Do some pre- and post-processing to hide the memory details.
def DoC():
tmp = create_unicode_buffer(3) # Writable array of wchar_t.
lib.DoC(tmp,len(tmp)) # Pass the element count; sizeof(tmp) would be the size in bytes.
return tmp.value # return a Python string instead of the ctypes array.
print(DoA())
print(DoB('aaa'))
print(DoB('bbb'))
print(DoC())
print(DoD())
Output:
10
25
30
ab
abcdefg
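When a wide-character buffer is passed together with a length, it matters whether the C side expects a count of wchar_t elements or a size in bytes. The C function DoC above compares the length against the number of elements it writes, so an element count is the matching unit. A small sketch of the difference, using only ctypes itself and no compiled library:

```python
from ctypes import create_unicode_buffer, sizeof, c_wchar

buf = create_unicode_buffer(3)  # room for 2 characters plus the terminating null

elements = len(buf)          # number of wchar_t slots in the array
size_in_bytes = sizeof(buf)  # elements * sizeof(wchar_t); platform dependent

print("elements:", elements)
print("bytes:", size_in_bytes)

buf.value = "ab"             # what the C function would write into the buffer
print(buf.value)
```

On Windows wchar_t is typically 2 bytes and on Linux typically 4, so the byte size differs across platforms while the element count does not.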

Updating an LP_c_ubyte buffer created in a C DLL

I am creating a Python wrapper for a C DLL using Python ctypes.
In the Python code below I am creating an array connectionString of c_ubyte that I need to fill in with individual bytes, for example 1,2,3,4,5,6... This connection string is passed to the DLL's DoCallBack function and printed. A buffer is created for the callback function to fill in, and everything is passed to the Python callback function.
I am looking for a way to update the connectionString bytes before passing them to the DLL's DoCallBack.
Then how to extract the bytes from the connectionString in the python callbackFnk function.
I am looking for a way to update the bytes in outBuffer from the callbackFnk python function
A continuation of this question
In python how do I set the value of a LP_c_ubyte
C DLL Code
typedef void(*FPCallback)(unsigned char * outBuffer, unsigned short MaxOutBufferLength, unsigned char * connectionString);
FPCallback g_Callback;
extern "C" __declspec( dllexport ) void RegisterCallback(void(*p_Callback)( unsigned char * outBuffer, unsigned short MaxOutBufferLength, unsigned char * connectionString)) {
g_Callback = p_Callback ;
}
extern "C" __declspec( dllexport ) void DoCallBack( unsigned char connectionString) {
printf( "connectionString=[%02x %02x %02x %02x %02x %02x...]\n", connectionString[0], connectionString[1], connectionString[2], connectionString[3], connectionString[4], connectionString[5] );
const unsigned short MAX_BUFFER_SIZE = 6 ;
unsigned char outBuffer[MAX_BUFFER_SIZE];
g_Callback( outBuffer, MAX_BUFFER_SIZE, connectionString, 6 );
// Print the results.
printf( "buffer=[%02x %02x %02x %02x %02x %02x...]\n", buffer[0], buffer[1], buffer[2], buffer[3], buffer[4], buffer[5] );
}
Python code
def callbackFnk( outBuffer, outBufferMaxSize, connectionString )
# (Q2) How do I extract individual bytes of the connectionString?
# (Q3) How do I update individual bytes of the out buffer?
customDLL = cdll.LoadLibrary ("customeDLL.dll")
# RegisterCallback
CustomDLLCallbackFUNC = CFUNCTYPE(None, POINTER( c_ubyte), c_ushort, POINTER( c_ubyte) )
CustomDLLCallback_func = CustomDLLCallbackFUNC( callbackFnk )
RegisterCallback = customDLL.RegisterCallback
RegisterCallback.argtypes = [ CustomDLLCallbackFUNC ]
RegisterCallback( CustomDLLCallback_func )
# DoCallBack
DoCallBack = customDLL.DoCallBack
DoCallBack.argtypes = [ POINTER( c_ubyte) ]
connectionString = c_ubyte(6)
# (Q1) How do I update this array of bytes?
# Call the callback
DoCallBack(connectionString)
The OP's example has a number of errors and doesn't compile, so I put this together. I assume connectionString is just a nul-terminated input string, and demonstrate updating the output string in the callback.
Note that for an input string, c_char_p can be the type and a Python byte string can be passed. c_wchar_p is used for Python Unicode strings. The string must not be modified in the C code. The callback receives it as a Python bytes object as well, making it easy to read.
The output buffer can just be indexed, being careful to not index past the length of the buffer. Output buffers allocated by the caller should always be passed as a pointer-and-length.
C++ DLL
#include <stdio.h>
typedef void (*CALLBACK)(const char* string, unsigned char* buffer, size_t size);
CALLBACK g_pCallback;
extern "C" __declspec(dllexport) void RegisterCallback(CALLBACK pCallback) {
g_pCallback = pCallback;
}
extern "C" __declspec(dllexport) void DoCallBack(char* string) {
unsigned char buf[6];
printf("string = %s\n", string);
g_pCallback(string, buf, sizeof(buf));
printf("buf = [%02x %02x %02x %02x %02x %02x]\n", buf[0], buf[1], buf[2], buf[3], buf[4], buf[5]);
}
Python
from ctypes import *
CALLBACK = CFUNCTYPE(None,c_char_p,POINTER(c_ubyte),c_size_t)
@CALLBACK
def callback(string,buf,length):
print(string)
for i in range(length):
buf[i] = i * 2
dll = CDLL('test')
# RegisterCallback
RegisterCallback = dll.RegisterCallback
RegisterCallback.argtypes = [CALLBACK]
RegisterCallback.restype = None
RegisterCallback(callback)
# DoCallBack
DoCallBack = dll.DoCallBack
DoCallBack.argtypes = [c_char_p]
DoCallBack.restype = None
DoCallBack(b'test string')
Output
string = test string
b'test string'
buf = [00 02 04 06 08 0a]
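For the case where the connection string really is a raw byte array rather than a nul-terminated string, the three operations asked about (filling the array, reading bytes through a pointer, writing bytes through a pointer) can be sketched without any DLL. This is only an illustration: fill_buffer is a hypothetical stand-in for the body of a callback, and the pointers are produced with cast instead of arriving from C.

```python
from ctypes import POINTER, c_ubyte, cast

# (Q1) Create an array of 6 bytes and update individual elements.
connection = (c_ubyte * 6)(1, 2, 3, 4, 5, 6)
connection[0] = 0xFF

# A callback declared with POINTER(c_ubyte) receives a pointer like this one.
conn_ptr = cast(connection, POINTER(c_ubyte))

# (Q2) Read individual bytes through the pointer. The pointer carries no
# length, so the count has to be known from elsewhere.
read_back = [conn_ptr[i] for i in range(len(connection))]
print(read_back)

# (Q3) Write individual bytes through a pointer, staying within the length.
def fill_buffer(buf, length):
    # Hypothetical callback body: store i * 2 in each slot.
    for i in range(length):
        buf[i] = (i * 2) & 0xFF

out = (c_ubyte * 6)()
fill_buffer(cast(out, POINTER(c_ubyte)), len(out))
print(list(out))
```

Writes made through the pointer land in the original array, which is why the C caller sees the values after the callback returns.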

Python CFFI doesn't copy typedef from cdef() into generated C file

I'm feeding a generated header file into ffi.cdef(), with a bunch of typedefs like this at the beginning:
typedef enum
{
LE_GPIO_EDGE_NONE = 0,
LE_GPIO_EDGE_RISING = 1,
// ...etc...
}
le_gpio_Edge_t;
Then I try to compile it:
with open(args.api_name + '_cdef.h') as f:
cdef = f.read()
ffibuilder.cdef(cdef)
if __name__ == "__main__":
ffibuilder.compile(verbose=True)
But it generates C code like this:
static int _cffi_const_LE_GPIO_EDGE_RISING(unsigned long long *o)
{
int n = (LE_GPIO_EDGE_RISING) <= 0;
*o = (unsigned long long)((LE_GPIO_EDGE_RISING) | 0); /* check that LE_GPIO_EDGE_RISING is an integer */
return n;
}
Which causes the build to fail, because the symbol LE_GPIO_EDGE_RISING isn't defined anywhere (or referenced anywhere else)
le_gpio.c: In function ‘_cffi_const_LE_GPIO_EDGE_RISING’:
le_gpio.c:494:12: error: ‘LE_GPIO_EDGE_RISING’ undeclared (first use in this function)
int n = (LE_GPIO_EDGE_RISING) <= 0;
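cdef() only declares the types and functions to CFFI; it does not copy them into the generated C file. The C source passed to ffibuilder.set_source is what gets placed there, so the type definition has to be supplied to set_source as well: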
import cffi
ffibuilder = cffi.FFI()
tdef = r"""
typedef enum
{
LE_GPIO_EDGE_NONE = 0,
LE_GPIO_EDGE_RISING = 1,
// ...etc...
} le_gpio_Edge_t;
"""
ffibuilder.set_source("package._foo", tdef)
ffibuilder.cdef(tdef)
if __name__ == "__main__":
ffibuilder.compile(verbose=True)
See documentation for c_header_source argument of set_source.

C equivalent to python pickle (object serialization)?

What would be the C equivalent to this python code?
Thanks.
data = gather_me_some_data()
# where data = [ (metric, datapoints), ... ]
# and datapoints = [ (timestamp, value), ... ]
serialized_data = cPickle.dumps(data, protocol=-1)
length_prefix = struct.pack("!L", len(serialized_data))
message = length_prefix + serialized_data
C has no built-in serialization mechanism, because type information is not available at run time. You have to inject some type information yourself and then reconstruct the required object from it. So first define all the structs you might need:
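#include <stdlib.h> /* malloc, free */
#include <string.h> /* memcpy */
#define MY_DATA_SIZE 16 /* arbitrary size, chosen for this example */
#define SERIALIZED_DATA_MAX_SIZE 64 /* arbitrary; must fit the type tag plus the largest struct */
typedef struct {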
int myInt;
float myFloat;
unsigned char myData[MY_DATA_SIZE];
} MyStruct_1;
typedef struct {
unsigned char myUnsignedChar;
double myDouble;
} MyStruct_2;
Then define an enum that lists every struct type you have:
typedef enum {
ST_MYSTRUCT_1,
ST_MYSTRUCT_2
} MyStructType;
Define a helper function that returns the size of any of these structs:
int GetStructSize(MyStructType structType) {
switch (structType) {
case ST_MYSTRUCT_1:
return sizeof(MyStruct_1);
case ST_MYSTRUCT_2:
return sizeof(MyStruct_2);
default:
// OOPS no such struct in our pocket
return 0;
}
}
Then define serialize function:
void BinarySerialize(
MyStructType structType,
void * structPointer,
unsigned char * serializedData) {
int structSize = GetStructSize(structType);
if (structSize != 0) {
// copy struct metadata to serialized bytes
memcpy(serializedData, &structType, sizeof(structType));
// copy struct itself
memcpy(serializedData+sizeof(structType), structPointer, structSize);
}
}
And de-serialization function:
void BinaryDeserialize(
MyStructType structTypeDestination,
void ** structPointer,
unsigned char * serializedData)
{
// get source struct type
MyStructType structTypeSource;
memcpy(&structTypeSource, serializedData, sizeof(structTypeSource));
// get source struct size
int structSize = GetStructSize(structTypeSource);
if (structTypeSource == structTypeDestination && structSize != 0) {
*structPointer = malloc(structSize);
memcpy(*structPointer, serializedData+sizeof(structTypeSource), structSize);
}
}
Serialization usage example:
MyStruct_2 structInput = {0x69, 0.1};
MyStruct_1 * structOutput_1 = NULL;
MyStruct_2 * structOutput_2 = NULL;
unsigned char testSerializedData[SERIALIZED_DATA_MAX_SIZE] = {0};
// serialize structInput
BinarySerialize(ST_MYSTRUCT_2, &structInput, testSerializedData);
// try to de-serialize to something
BinaryDeserialize(ST_MYSTRUCT_1, &structOutput_1, testSerializedData);
BinaryDeserialize(ST_MYSTRUCT_2, &structOutput_2, testSerializedData);
// determine which object was de-serialized
// (plus you will get code-completion support about object members from IDE)
if (structOutput_1 != NULL) {
// do something with structOutput_1
free(structOutput_1);
}
else if (structOutput_2 != NULL) {
// do something with structOutput_2
free(structOutput_2);
}
I think this is the simplest serialization approach in C, but it has some problems:
The structs must not contain pointers, because when serializing a pointer you cannot know how much memory it refers to, nor where or how to restore the pointed-to data.
This example is sensitive to system endianness: the bytes are copied exactly as they are laid out in memory (big-endian or little-endian), so you need to reverse bytes where necessary when reading multi-byte values such as the enum on a different platform (or refactor the code to be more portable).
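For comparison, the Python side of the question avoids the endianness problem by fixing the byte order explicitly: the "!" prefix in struct.pack selects network (big-endian) order with no padding. A minimal sketch of a tagged, length-prefixed message in that style follows; TAG_STRUCT_2 and the field layout are assumptions made for illustration, loosely mirroring ST_MYSTRUCT_2 and MyStruct_2 above.

```python
import struct

TAG_STRUCT_2 = 1          # hypothetical type tag, plays the role of ST_MYSTRUCT_2
PAYLOAD_FORMAT = "!LBd"   # tag (uint32), unsigned char, double; big-endian, unpadded

def serialize(tag, my_unsigned_char, my_double):
    payload = struct.pack(PAYLOAD_FORMAT, tag, my_unsigned_char, my_double)
    # Same framing as the question: 4-byte big-endian length, then the payload.
    return struct.pack("!L", len(payload)) + payload

def deserialize(message):
    (length,) = struct.unpack_from("!L", message, 0)
    payload = message[4:4 + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return struct.unpack(PAYLOAD_FORMAT, payload)

message = serialize(TAG_STRUCT_2, 0x69, 0.1)
decoded = deserialize(message)
print(len(message), decoded)
```

A C implementation that wants to interoperate with such a message would write each field in the same fixed byte order (for example with htonl for the 32-bit values) instead of copying whole structs with memcpy.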
If you can use C++, there is the PicklingTools library
