I am using C++ to implement a DLL, and Python to call it. When I pass a NumPy array to the DLL, a bug occurs.
My C++ code is:
__declspec(dllexport) void image(float * img, int m, int n)
{
    for (int i = 0; i < m*n; i++)
    {
        printf("%d ", img[i]);
    }
}
In the above code, I just receive the NumPy array and print it.
Then, my python code to use this dll is:
import ctypes
import numpy as np
lib = ctypes.cdll.LoadLibrary("./bin/Release/dllAPI.dll")
img = np.random.random(size=[3, 3])*10
img = img.astype(np.float)
img_p = img.ctypes.data_as(ctypes.POINTER(ctypes.c_float))
lib.image.argtypes = (ctypes.POINTER(ctypes.c_float), ctypes.c_int, ctypes.c_int)
lib.image.restype = None
print('c++ result is: ')
lib.image(img_p, 3, 3)
print('\n original data is:')
print(img)
The printed information is:
c++ result is:
-536870912 -1073741824 0 1610612736 -1073741824 1073741824 1610612736 536870912 -2147483648
original data is:
[[7.76128455 3.16101652 7.44757958]
[2.32058998 9.96955139 3.26344099]
[9.42976627 1.34360611 8.4006054 ]]
My C++ code prints garbage numbers, and it looks like the memory is being read incorrectly.
My environment is:
win 10
vs 2015 x64
python 3.7
ctypes 1.1.0
How can I pass a NumPy array to C++ via ctypes? Any suggestion is appreciated.
----------------- update ----------------------
The C++ code had a bug in the printf format specifier, so I updated it as:
__declspec(dllexport) void image(float * img, int m, int n)
{
    for (int i = 0; i < m*n; i++)
    {
        printf("%f ", img[i]);
    }
}
However, the printed information is still incorrect:
c++ result is:
12425670907412733350375273403699429376.000000 2.468873 0.000000 2.502769 4435260031901892608.000000 2.458449 -0.000000 2.416936 -4312230798506823897841664.000000
original data is:
[[7.50196262 8.08859399 7.33518741]
[6.67098506 0.04736352 9.5017838 ]
[3.47544102 9.09726041 0.48091646]]
So after a bit of digging I found that np.float is actually equivalent to ctypes.c_double: it is just an alias for Python's built-in float, which is a C double, i.e. a 64-bit float. You should do img.astype(np.float32) (or equivalently img.astype(ctypes.c_float)) so that the pointer points to valid 32-bit data. If you want to reproduce this, try printing img_p[0] in Python.
Note: using the explicit, fixed-width cast is probably best practice here, because the width behind a generic dtype alias can vary between implementations, leading to exactly this error (even though both types are nominally "float").
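A standalone sketch of the fix (without the DLL): cast to an explicit 32-bit dtype so the buffer actually holds c_float-sized elements, then verify that the ctypes pointer reads back the same values Python sees.

```python
import ctypes

import numpy as np

# The fix: an explicit 32-bit dtype, so the buffer layout matches
# the float* that the DLL expects.
img = (np.random.random(size=[3, 3]) * 10).astype(np.float32)
img_p = img.ctypes.data_as(ctypes.POINTER(ctypes.c_float))

# img_p now reads back the same values Python sees, instead of garbage.
assert abs(img_p[0] - float(img[0, 0])) < 1e-6
```

With np.float (a C double), every element is 8 bytes, so reinterpreting the buffer as 4-byte floats produces the garbage shown above.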
Related
Getting memory allocation errors when running a compiled version of the following code. This is an application where a struct of pointers is defined; I would like to assign a value through the pointer and then pass this struct to C code. I have seen other examples and questions on this subject and I believe this is being done correctly, but I am still having issues.
The code will compile fine, however it crashes Python when running it. Debugging with Visual Studio, it is showing a memory access violation. I have researched this quite a bit but am unable to come up with a reason why this is happening. Was able to reproduce this on a different computer.
I believe it has something to do with how the struct test_M is being allocated and referenced on the stack. I've tried several different variations of defining the test_M.param.gain_val, the one shown does allow the code to compile fine and I can get the output to print on the screen. However, Python crashes immediately after this.
Unfortunately I can not modify the c code because this is the format auto-generated code from Matlab/Simulink embedded coder.
Any help would be appreciated.
Using:
python = 3.6
cython = 0.26
numpy = 1.13.1
Visual Studio 2017 v15
ccodetest.c
#include <stdlib.h>

typedef struct P_T_ P_T;
typedef struct tag_T RT_MODEL_T;

struct P_T_ {
    double gain_val;
};

struct tag_T {
    P_T *param;
};

void compute(double array_in[4], RT_MODEL_T *const test_M)
{
    P_T *test_P = ((P_T *) test_M->param);
    int size;
    size = sizeof(array_in);
    int i;
    for (i=0; i<size; i++)
    {
        array_in[i] = array_in[i] * test_P->gain_val;
    }
}
cython_param_test.pyx
cimport cython
import numpy as np
cimport numpy as np
from cpython.mem cimport PyMem_Malloc, PyMem_Free

np.import_array()

cdef extern from "ccodetest.c":
    ctypedef tag_T RT_MODEL_T
    ctypedef P_T_ P_T
    cdef struct P_T_:
        double gain_val
    cdef struct tag_T:
        P_T *param
    void compute(double array_in[4], RT_MODEL_T *const test_M)

cdef double array_in[4]

def run(
        np.ndarray[np.double_t, ndim=1, mode='c'] x_in,
        np.ndarray[np.double_t, ndim=2, mode='c'] x_out,
        np.ndarray[np.double_t, ndim=1, mode='c'] gain):
    cdef RT_MODEL_T* test_M = <RT_MODEL_T*> PyMem_Malloc(sizeof(RT_MODEL_T))
    global array_in
    test_M.param.gain_val = <double>gain
    cdef int idx
    try:
        for idx in range(len(x_in)):
            array_in[idx] = x_in[idx]
        compute(array_in, test_M)
        for idx in range(len(x_in)):
            x_out[idx] = array_in[idx]
    finally:
        PyMem_Free(test_M)
    return None
setup.py
import numpy
from numpy.distutils.core import setup
from Cython.Distutils import build_ext

def configuration(parent_package='', top_path=None):
    from numpy.distutils.misc_util import Configuration
    config = Configuration('', parent_package, top_path)
    config.add_extension('cython_param_test',
                         sources=['cython_param_test.pyx'],
                         # libraries=['m'],
                         depends=['ccodetest.c'],
                         include_dirs=[numpy.get_include()])
    return config

if __name__ == '__main__':
    params = configuration(top_path='').todict()
    params['cmdclass'] = dict(build_ext=build_ext)
    setup(**params)
run_cython_param_test.py
import cython_param_test
import numpy as np
n_samples = 4
x_in = np.arange(n_samples, dtype='double') % 4
x_out = np.empty((n_samples, 1))
gain = np.ones(1, dtype='double') * 5
cython_param_test.run(x_in, x_out, gain)
print(x_out)
cdef RT_MODEL_T* test_M = <RT_MODEL_T*> PyMem_Malloc(sizeof(RT_MODEL_T))
You allocate space for a RT_MODEL_T. test_M has one member, a pointer to a P_T. Allocating space for the RT_MODEL_T only allocates space to store the pointer - it doesn't allocate a P_T to be pointed to. Where param points is completely arbitrary at the moment and is most likely a memory address that you aren't allowed to write to.
test_M.param.gain_val = ...
You attempt to write to an element of the P_T pointed to by param. However, param does not point to an allocated P_T.
... = <double>gain
You attempt to cast a NumPy array to a double, which does not make sense. You probably want the first element of the array (gain[0]), or you should pass gain as a plain double rather than a NumPy array of doubles.
Since test_M and its contents don't need to live beyond the end of the function they're allocated in, I'd be tempted to allocate them on the stack instead, and that way you can completely avoid malloc and free:
cdef RT_MODEL_T test_M # not a pointer
cdef P_T p_t_instance
p_t_instance.gain_val = gain # or gain[0]?
test_M.param = &p_t_instance
# ...
compute(array_in, &test_M) # pass the address of `test_M`
Only do this if you are sure of the required lifetime of test_M and the P_T it holds a pointer to.
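The same pitfall can be reproduced in plain Python with ctypes (an analogy to the Cython structs above, not the asker's actual build): allocating the outer struct only gives you space for the pointer itself; the pointee must be created separately before you dereference it.

```python
import ctypes

class P_T(ctypes.Structure):
    _fields_ = [("gain_val", ctypes.c_double)]

class RT_MODEL_T(ctypes.Structure):
    _fields_ = [("param", ctypes.POINTER(P_T))]

m = RT_MODEL_T()      # space for the pointer only; param starts out NULL
assert not m.param    # dereferencing it here would be the crash in question

p = P_T(gain_val=5.0)        # allocate the pointee...
m.param = ctypes.pointer(p)  # ...then point param at it
assert m.param.contents.gain_val == 5.0
```

In the Cython version, the analogous step is either the stack allocation shown above or a second PyMem_Malloc for the P_T.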
I'm trying to pick up Cython.
import counter

cdef public void increment():
    counter.increment()

cdef public int get():
    return counter.get()

cdef public void say(int times):
    counter.say(times)
This is the "glue code" I'm using to call functions from counter.py, a pure Python source code file. It's laid out like this:
count = 0

def increment():
    global count
    count += 1

def get():
    global count
    return count

def say(times):
    global count
    print(str(count) * times)
I have successfully compiled and run this program, and the functions work fine. However, a very strange thing occurred when I tested this program:
#include <Python.h>
#include "glue.h"

int main(int argc, char *argv[]) {
    Py_Initialize();

    // The following two lines add the current working directory
    // to the environment variable `PYTHONPATH`. This allows us
    // to import Python modules in this directory.
    PyRun_SimpleString("import sys");
    PyRun_SimpleString("sys.path.append(\".\")");

    PyInit_glue();

    // Tests
    for (int i = 0; i < 10; i++)
    {
        increment();
    }
    int x = get();
    printf("Incremented %d times\n", x);

    printf("The binary representation of the number 42 is");
    say(3);

    Py_Finalize();
    return 0;
}
I would expect the program to produce this output:
Incremented 10 times
The binary representation of the number 42 is
101010
However, it prints this:
Incremented 10 times
101010
The binary representation of the number 42 is
But if I change the line
printf("The binary representation of the number 42 is");
to
printf("The binary representation of the number 42 is\n");
then the output is corrected.
This seems strange to me. I understand that if I want to print the output of a Python function, I might just as well return it to C and store it in a variable, and use C's printf() rather than the native Python print(). But I would be very interested to hear the reason this is happening. After all, the printf() statement is reached before the say() statement (I double checked this in gdb just to make sure). Thanks for reading.
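For what it's worth, the cause is that C's stdio buffer is separate from Python's: the printf() text without a trailing newline sits in C's buffer while Python's print() flushes its own stream immediately, so the Python output overtakes it. The same interleaving can be sketched from the Python side with ctypes (assuming a Unix libc; on Windows you would load msvcrt instead):

```python
import ctypes

# C stdio buffers independently of Python's sys.stdout, so text written by
# printf() without a trailing "\n" can appear *after* later Python output.
libc = ctypes.CDLL(None)  # on Windows: ctypes.CDLL("msvcrt")
libc.printf(b"from C, no newline yet...")
libc.fflush(None)  # fflush(NULL) flushes all open C output streams
print(" and now Python output follows in order")
```

In the C program above, adding "\n" works because stdout is line-buffered on a terminal; an explicit fflush(stdout) before calling say() would fix it too.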
I'm trying to optimize a piece of Python code using AVX, and I'm using ctypes to access the C++ function. Sometimes the function segfaults and sometimes it doesn't. I think it may have something to do with alignment?
Maybe someone can help me with this; I'm kind of stuck here.
Python-Code:
from ctypes import *
import numpy as np

# path_cnt
path_cnt = 16
c_path_cnt = c_int(path_cnt)

# ndarray1
ndarray1 = np.ones(path_cnt, dtype=np.float32, order='C')
ndarray1.setflags(align=1, write=1)
c_ndarray1 = ndarray1.ctypes.data_as(POINTER(c_float))

# ndarray2
ndarray2 = np.ones(path_cnt, dtype=np.float32, order='C')
ndarray2.setflags(align=1, write=1)
c_ndarray2 = ndarray2.ctypes.data_as(POINTER(c_float))

# call function
finance = cdll.LoadLibrary(".../libfin.so")
finance.foobar.argtypes = [c_void_p, c_void_p, c_int]
finance.foobar(c_ndarray1, c_ndarray2, c_path_cnt)

x = 0
while x < path_cnt:
    print(c_ndarray1[x])
    x += 1
C++ Code
#include <immintrin.h>

extern "C" {
    int foobar(float * ndarray1, float * ndarray2, int path_cnt)
    {
        for (int i = 0; i < path_cnt; i = i + 8)
        {
            __m256 arr1 = _mm256_load_ps(&ndarray1[i]);
            __m256 arr2 = _mm256_load_ps(&ndarray2[i]);
            __m256 add = _mm256_add_ps(arr1, arr2);
            _mm256_store_ps(&ndarray1[i], add);
        }
        return 0;
    }
}
And now the odd output behavior: making the same call in the terminal twice gives different results!
tobias#tobias-Lenovo-U310:~/workspace/finance$ python finance.py
Segmentation fault (core dumped)
tobias#tobias-Lenovo-U310:~/workspace/finance$ python finance.py
2.0
2.0
2.0
2.0
2.0
2.0
2.0
2.0
Thanks in advance!
There are aligned and unaligned load instructions. The aligned ones will fault if you violate the alignment rules, but they are faster. The unaligned ones accept any address and do loads/shifts internally to get the data you want. You are using the aligned version, _mm256_load_ps and can just switch to the unaligned version _mm256_loadu_ps without any intermediate allocation.
A good vectorizing compiler will include a lead-in loop to reach an aligned address, then a body to work on aligned data, then a final loop to clean up any stragglers.
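If you want to keep the aligned loads instead, another option (sketched below, assuming 32-byte alignment is what _mm256_load_ps requires) is to build the NumPy array on an aligned address yourself, by over-allocating a byte buffer and slicing to the first aligned offset:

```python
import numpy as np

def aligned_empty(n, dtype=np.float32, alignment=32):
    """Return an n-element array whose data pointer is 32-byte aligned.

    NumPy does not guarantee this much alignment by default, so we
    over-allocate raw bytes and slice to the first aligned offset.
    """
    itemsize = np.dtype(dtype).itemsize
    buf = np.empty(n * itemsize + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment
    return buf[offset:offset + n * itemsize].view(dtype)

a = aligned_empty(16)
assert a.ctypes.data % 32 == 0  # safe to hand to _mm256_load_ps
```

This avoids both the unaligned-load penalty and the copy-in/copy-out workaround below.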
Alright, I think I found a solution; it's not very elegant, but at least it works!
There should be a better way, though. Anyone have any suggestions?
#include <immintrin.h>

extern "C" {
    int foobar(float * ndarray1, float * ndarray2, int path_cnt)
    {
        float * test = (float*)_mm_malloc(path_cnt*sizeof(float), 32);
        float * test2 = (float*)_mm_malloc(path_cnt*sizeof(float), 32);
        // copy to aligned memory (this part is kinda stupid)
        for (int i = 0; i < path_cnt; i++)
        {
            test[i] = ndarray1[i];
            test2[i] = ndarray2[i];
        }
        for (int i = 0; i < path_cnt; i = i + 8)
        {
            __m256 arr1 = _mm256_load_ps(&test[i]);
            __m256 arr2 = _mm256_load_ps(&test2[i]);
            __m256 add = _mm256_add_ps(arr1, arr2);
            _mm256_store_ps(&test[i], add);
        }
        // and copy everything back!
        for (int i = 0; i < path_cnt; i++)
        {
            ndarray1[i] = test[i];
        }
        _mm_free(test);
        _mm_free(test2);
        return 0;
    }
}
Coming from MATLAB, I am looking for a way to create functions in Python that wrap C functions. I came across Cython, ctypes, and SWIG. My intent is not to improve speed (although it would certainly help).
Could someone recommend a decent solution for this purpose?
Edit: What's the most popular/widely adopted way of doing this job?
Thanks.
I've found that weave works pretty well for shorter functions and has a very simple interface.
To give you an idea of just how easy the interface is, here's an example (taken from the PerformancePython website). Notice how multi-dimensional array conversion is handled for you by the converter (in this case Blitz).
from scipy import weave
from scipy.weave import converters

def inlineTimeStep(self, dt=0.0):
    """Takes a time step using inlined C code -- this version uses
    blitz arrays."""
    g = self.grid
    nx, ny = g.u.shape
    dx2, dy2 = g.dx**2, g.dy**2
    dnr_inv = 0.5/(dx2 + dy2)
    u = g.u

    code = """
           #line 120 "laplace.py"  // only useful for debugging
           double tmp, err, diff;
           err = 0.0;
           for (int i=1; i<nx-1; ++i) {
               for (int j=1; j<ny-1; ++j) {
                   tmp = u(i,j);
                   u(i,j) = ((u(i-1,j) + u(i+1,j))*dy2 +
                             (u(i,j-1) + u(i,j+1))*dx2)*dnr_inv;
                   diff = u(i,j) - tmp;
                   err += diff*diff;
               }
           }
           return_val = sqrt(err);
           """
    # compiler keyword only needed on windows with MSVC installed
    err = weave.inline(code,
                       ['u', 'dx2', 'dy2', 'dnr_inv', 'nx', 'ny'],
                       type_converters=converters.blitz,
                       compiler='gcc')
    return err
I have the following DLL ('arrayprint.dll') function that I want to use in Python via ctypes:
__declspec(dllexport) void PrintArray(int* pArray) {
    int i;
    for(i = 0; i < 5; pArray++, i++) {
        printf("%d\n", *pArray);
    }
}
My Python script is as follows:
from ctypes import *
fiveintegers = c_int * 5
x = fiveintegers(2,3,5,7,11)
px = pointer(x)
mydll = CDLL('arrayprint.dll')
mydll.PrintArray(px)
The final function call outputs the following:
2
3
5
7
11
2226984
What is the 2226984 and how do I get rid of it? It doesn't look to be the decimal value for the memory address of the DLL, x, or px.
Thanks,
Mike
(Note: I'm not actually using PrintArray for anything; it was just the easiest example I could find that generated the same behavior as the longer function I'm using.)
mydll.PrintArray.restype = None
mydll.PrintArray(px)
By default ctypes assumes the function returns an int, so after the call it interprets whatever happens to be in the return register as the result; in the interactive interpreter that bogus return value is then echoed, which is the extra number you see. Setting restype to None tells ctypes the function returns nothing.
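To see the effect of restype in isolation, here is a standalone sketch using the C math library rather than arrayprint.dll: without declaring restype, ctypes would misread sqrt's double return value as an int; declaring it makes the call behave.

```python
import ctypes
import ctypes.util

# Standalone demonstration of restype using the C math library.
libm = ctypes.CDLL(ctypes.util.find_library("m") or None)
libm.sqrt.argtypes = [ctypes.c_double]

# The default restype is c_int, which would misinterpret the double
# return value; declaring the true return type fixes the call.
libm.sqrt.restype = ctypes.c_double
assert libm.sqrt(9.0) == 3.0
```

The same two-line fix (argtypes plus restype) is good hygiene for every foreign function you call through ctypes, even void ones like PrintArray.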