I'm using Cython in CPython 3.6 and I'd like to mark some pointers as "not aliased", to improve performance and be explicit semantically.
In C this is done with the restrict (__restrict, __restrict__) keyword. But how do I use restrict on my cdef variables from a Cython .pyx code?
Thanks!
Cython doesn't have syntax or support for the restrict keyword (yet?). Your best bet is to code this function in C; the most convenient way is probably to use verbatim C code, as in this dummy example:
%%cython
cdef extern from *:
    """
    void c_fun(int * CYTHON_RESTRICT a, int * CYTHON_RESTRICT b){
        a[0] = b[0];
    }
    """
    void c_fun(int *a, int *b)

# example usage
def doit(k):
    cdef int i=k
    cdef int j=k+1
    c_fun(&i,&j)
    print(i)
Here I use Cython's (undocumented) define CYTHON_RESTRICT, which is defined as
// restrict
#ifndef CYTHON_RESTRICT
#if defined(__GNUC__)
#define CYTHON_RESTRICT __restrict__
#elif defined(_MSC_VER) && _MSC_VER >= 1400
#define CYTHON_RESTRICT __restrict
#elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define CYTHON_RESTRICT restrict
#else
#define CYTHON_RESTRICT
#endif
#endif
so it works not only with C99-compliant C compilers. In the long run, however, it is probably best to define something similar yourself rather than depend on an undocumented feature.
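If you go that route, a minimal sketch of such a self-defined macro (the name MY_RESTRICT and the helper my_copy are mine, mirroring the define above) could look like this:
%%cython
cdef extern from *:
    """
    /* hypothetical self-defined macro, mirroring CYTHON_RESTRICT above */
    #if defined(__GNUC__)
      #define MY_RESTRICT __restrict__
    #elif defined(_MSC_VER) && _MSC_VER >= 1400
      #define MY_RESTRICT __restrict
    #elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
      #define MY_RESTRICT restrict
    #else
      #define MY_RESTRICT
    #endif

    static void my_copy(int * MY_RESTRICT dst, int * MY_RESTRICT src){
        dst[0] = src[0];
    }
    """
    void my_copy(int *dst, int *src)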
My question is similar to the one asked here - Passing C++ vector to Numpy through Cython without copying and taking care of memory management automatically. I am also getting a segmentation fault, but before I can fix it I have some compilation errors to take care of with the move method in Cython.
This is the simplest example I have (just an extension of the Rectangle example provided on the net).
What I want to do
My C++ code returns a deque of Points. It could return a vector of points as well; any container will do.
In the Cython code I want to store the returned deque of points (one for each iteration of the for loop) in a collection (such as a vector or deque). This collection will be an instance variable.
Later on I want to iterate through the collection and convert that deque of deques of points into a Python list of lists. This is where I believe I am getting the segmentation fault.
Point.h
#ifndef POINT_H
#define POINT_H
class Point
{
private:
double coordinate1,coordinate2;
public:
virtual double getCoordinate1() const;
virtual double getCoordinate2() const ;
virtual void setCoordinate1(double coordinate1);
virtual void setCoordinate2(double coordinate2);
};
Rectangle.h
#include <deque>
#include "Point.h"
using std::deque;
deque<Point> getAllPoints(Point query);
Rectangle.cpp
#include "Rectangle.h"
deque<Point> Rectangle::getAllPoints(Point query)
{
    deque<Point> deq;
    for (int i = 0; i < 10000; ++i)
    {
        deq.push_back(query);
    }
    return deq;
}
Note: unlike the linked question, I am not returning an address but returning the container by value.
rect.pxd
from libcpp.deque cimport deque

cdef extern from "<utility>" namespace "std" nogil:
    T move[T](T)

cdef extern from "Point.h":
    cdef cppclass Point:
        Point() nogil except +
        double getCoordinate1()
        double getCoordinate2()
        void setCoordinate1(double coordinate1) nogil
        void setCoordinate2(double coordinate2) nogil
    cdef cppclass SphericalPoint(Point):
        SphericalPoint() nogil except +
        double getCoordinate1()
        double getCoordinate2()
        void setCoordinate1(double lat) nogil
        void setCoordinate2(double lon) nogil

cdef extern from "Rectangle.h" namespace "shapes":
    cdef cppclass Rectangle:
        Rectangle(int, int, int, int) except + nogil
        deque[Point] getArea(Point p) nogil
And finally
rect.pyx
cdef class PyRectangle:
    cdef Rectangle *rect
    cdef deque[Point] colOfPoints

    def __cinit__(self, int x0, int y0, int x1, int y1):
        self.rect = new Rectangle(x0, y0, x1, y1)
        self.colOfPoints = deque[Point]()

    def performCalc(self, maxDistance, chunk):
        cdef deque[Point] area
        cdef double[:,:] gPoints
        gPoints = memoryview(chunk)
        for i in range(0, len(gPoints)):
            with nogil:
                area = self.getArea(gPoints[i])
                self.colOfPoints = move(area)

    cdef deque[Point] getArea(self, double[:] p) nogil:
        cdef deque[Point] area
        area = self.rect.getArea(point)
        return area
I believe I am setting C++17 correctly in setup.py.
setup.py
os.environ['CFLAGS'] = '-O3 -Wall -std=c++17'
ext_modules = [Extension("rect",
                         ["rect.pyx", "Rectangle.cpp"],
                         include_dirs=['/usr/local/include'],
                         extra_link_args=["-std=c++17"],
                         language='c++',
                         )]
extensions = cythonize(ext_modules, language_level="3")
I am getting these compilation errors
rect.cpp: In function ‘PyObject* __pyx_pf_4rect_11PyRectangle_6performCalc(__pyx_obj_4rect_PyRectangle*, PyObject*, PyObject*)’:
rect.cpp:3781:81: error: no matching function for call to ‘move<std::deque<Point, std::allocator<Point> > >(std::deque<Point>&)’
__pyx_v_self->colOfPoints = std::move<std::deque<Point> >(__pyx_v_area);
^
In file included from /usr/include/c++/7/bits/nested_exception.h:40:0,
from /usr/include/c++/7/exception:143,
from /usr/include/c++/7/ios:39,
from rect.cpp:632:
/usr/include/c++/7/bits/move.h:98:5: note: candidate: template<class _Tp> constexpr typename std::remove_reference< <template-parameter-1-1> >::type&& std::move(_Tp&&)
move(_Tp&& __t) noexcept
^~~~
/usr/include/c++/7/bits/move.h:98:5: note: template argument deduction/substitution failed:
rect.cpp:3781:81: note: cannot convert ‘__pyx_v_area’ (type ‘std::deque<Point>’) to type ‘std::deque<Point>&&’
__pyx_v_self->colOfPoints = std::move<std::deque<Point> >(__pyx_v_area);
^
In file included from /usr/include/c++/7/bits/char_traits.h:39:0,
from /usr/include/c++/7/ios:40,
from rect.cpp:632:
/usr/include/c++/7/bits/stl_algobase.h:479:5: note: candidate: template<class _II, class _OI> _OI std::move(_II, _II, _OI)
move(_II __first, _II __last, _OI __result)
^~~~
/usr/include/c++/7/bits/stl_algobase.h:479:5: note: template argument deduction/substitution failed:
rect.cpp:3781:81: note: candidate expects 3 arguments, 1 provided
__pyx_v_self->colOfPoints = std::move<std::deque<Point> >(__pyx_v_area);
^
In file included from /usr/include/c++/7/deque:66:0,
from rect.cpp:636:
/usr/include/c++/7/bits/deque.tcc:1048:5: note: candidate: template<class _Tp> std::_Deque_iterator<_Tp, _Tp&, _Tp*> std::move(std::_Deque_iterator<_Tp, const _Tp&, const _Tp*>, std::_Deque_iterator<_Tp, const _Tp&, const _Tp*>, std::_Deque_iterator<_Tp, _Tp&, _Tp*>)
move(_Deque_iterator<_Tp, const _Tp&, const _Tp*> __first,
^~~~
/usr/include/c++/7/bits/deque.tcc:1048:5: note: template argument deduction/substitution failed:
rect.cpp:3781:81: note: candidate expects 3 arguments, 1 provided
__pyx_v_self->colOfPoints = std::move<std::deque<Point> >(__pyx_v_area);
^
In file included from /usr/include/c++/7/deque:64:0,
from rect.cpp:636:
/usr/include/c++/7/bits/stl_deque.h:424:5: note: candidate: template<class _Tp> std::_Deque_iterator<_Tp, _Tp&, _Tp*> std::move(std::_Deque_iterator<_Tp, _Tp&, _Tp*>, std::_Deque_iterator<_Tp, _Tp&, _Tp*>, std::_Deque_iterator<_Tp, _Tp&, _Tp*>)
move(_Deque_iterator<_Tp, _Tp&, _Tp*> __first,
^~~~
/usr/include/c++/7/bits/stl_deque.h:424:5: note: template argument deduction/substitution failed:
rect.cpp:3781:81: note: candidate expects 3 arguments, 1 provided
__pyx_v_self->colOfPoints = std::move<std::deque<Point> >(__pyx_v_area);
Below is the slightly simplified version that I used to reproduce your issue. I'm just including it as an illustration of how to cut your example down further - note that I don't need separate C++ files - I can just include the code directly in the pyx file by putting it in the verbatim-code docstring.
# distutils: language = c++
from libcpp.deque cimport deque

cdef extern from *:
    """
    #include <deque>
    using std::deque;

    class Point{};

    deque<Point> getAllPoints() {
        deque<Point> deq;
        for (int i=0; i<10000; ++i) {
            deq.push_back(Point{});
        }
        return deq;
    }
    """
    cdef cppclass Point:
        pass
    deque[Point] getAllPoints()

cdef extern from "<utility>" namespace "std" nogil:
    T move[T](T)

cdef class PyRectangle:
    cdef deque[Point] colOfPoints

    def something(self):
        cdef deque[Point] area = self.getArea()
        self.colOfPoints = move(area)

    cdef deque[Point] getArea(self):
        return getAllPoints()
The basic problem is that when Cython generates C++ code for templates it writes std::move<deque<Point>>(area) rather than std::move(area) and letting C++ deduce the template type. For reasons I don't fully understand, this seems to generate the wrong code a lot of the time.
I have two and a half solutions:
Don't tell Cython that move is a template function. Instead just tell it about the overloads you want:
cdef extern from "<utility>" namespace "std" nogil:
    deque[Point] move(deque[Point])
This is the easiest option, I think, and probably the approach I'd take.
If you avoid creating the temporary area then the C++ code does work with the template:
self.colOfPoints = move(self.getArea())
I suspect in this case you might not even need the move - C++ will probably automatically use the move assignment operator anyway.
This answer claims that a small wrapper helps Cython call the correct move (it also slightly missed the actual problem on the question it was answering...). I haven't tested it myself, but it might be worth a go if you want a templated move.
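For reference, a minimal sketch of what such a wrapper might look like (my own untested guess; the name cymove is hypothetical):
cdef extern from *:
    """
    #include <utility>
    #include <type_traits>
    /* hypothetical wrapper: tolerates Cython's explicit template arguments,
       because the parameter is an lvalue reference */
    template <typename T>
    typename std::remove_reference<T>::type&& cymove(T& t) {
        return std::move(t);
    }
    """
    T cymove[T](T) # as above, this doesn't need to match the C++ signature exactly

# then, instead of move:
# self.colOfPoints = cymove(area)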
I have a C function whose signature looks like this:
typedef double (*func_t)(double*, int);
int some_f(func_t myFunc);
I would like to pass a Python function (not necessarily explicitly) as an argument to some_f. Unfortunately, I can't afford to alter the declaration of some_f; that is, I shouldn't change the C code.
One obvious thing I tried to do is to create a basic wrapping function like this:
cdef double wrapping_f(double *d, int i /*?, object f */):
    # do stuff
    return <double>f(d_t)
However, I can't come up with a way to actually "put" the Python function f inside wrapping_f's body.
There is a very bad solution to this problem: I could use a global object variable. However, this forces me to copy-and-paste multiple instances of essentially the same wrapper function, each using a different global function (I am planning to use multiple Python functions simultaneously).
I keep my other answer for historical reasons - it shows that there is no way to do what you want without JIT compilation, and it helped me to understand how good @DavidW's advice in this answer was.
For the sake of simplicity, I use a slightly simpler function signature and trust you to adapt it to your needs.
Here is a blueprint for a closure, which lets ctypes do the jit-compilation behind the scenes:
%%cython
#needs Cython >= 0.28 to run because of verbatim C-code
cdef extern from *: #fill some_f with life
    """
    typedef int (*func_t)(int);
    static int some_f(func_t fun){
        return fun(42);
    }
    """
    ctypedef int (*func_t)(int)
    int some_f(func_t myFunc)

#works with any recent Cython version:
import ctypes

cdef class Closure:
    cdef object python_fun
    cdef object jitted_wrapper

    def inner_fun(self, int arg):
        return self.python_fun(arg)

    def __cinit__(self, python_fun):
        self.python_fun=python_fun
        ftype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int) #define the signature
        self.jitted_wrapper=ftype(self.inner_fun)            #jit the wrapper

    cdef func_t get_fun_ptr(self):
        return (<func_t *><size_t>ctypes.addressof(self.jitted_wrapper))[0]

def use_closure(Closure closure):
    print(some_f(closure.get_fun_ptr()))
And now using it:
>>> cl1, cl2=Closure(lambda x:2*x), Closure(lambda x:3*x)
>>> use_closure(cl1)
84
>>> use_closure(cl2)
126
This answer is more in a do-it-yourself style, and while not uninteresting, you should refer to my other answer for a concise recipe.
This answer is a hack and a little over the top; it only works on 64-bit Linux and probably should not be recommended - yet I just cannot stop myself from posting it.
There are actually four versions:
how easy life could be if the API took the possibility of closures into consideration
using a global state to produce a single closure [also considered by you]
using multiple global states to produce multiple closures at the same time [also considered by you]
using jit-compiled functions to produce an arbitrary number of closures at the same time
For the sake of simplicity I chose a simpler signature of func_t - int (*func_t)(void).
I know you cannot change the API. Yet I cannot embark on a journey full of pain without mentioning how simple it could be... There is a quite common trick to fake closures with function pointers - just add an additional parameter to your API (normally a void *), i.e.:
%%cython
#version 1: Life could be so easy
# needs Cython >= 0.28 because of the verbatim C-code feature
cdef extern from *: #fill some_f with life
    """
    typedef int (*func_t)(void *);
    static int some_f(func_t fun, void *params){
        return fun(params);
    }
    """
    ctypedef int (*func_t)(void *)
    int some_f(func_t myFunc, void *params)

cdef int fun(void *obj):
    print(<object>obj)
    return len(<object>obj)

def doit(s):
    cdef void *params = <void*>s
    print(some_f(&fun, params))
We basically use void *params to pass the inner state of the closure to fun, so the result of fun can depend on this state.
The behavior is as expected:
>>> doit('A')
A
1
But alas, the API is what it is. We could use a global pointer and a wrapper to pass the information:
%%cython
#version 2: Use a global variable for information exchange
# needs Cython >= 0.28 because of the verbatim C-code feature
cdef extern from *:
    """
    typedef int (*func_t)();
    static int some_f(func_t fun){
        return fun();
    }
    static void *obj_a=NULL;
    """
    ctypedef int (*func_t)()
    int some_f(func_t myFunc)
    void *obj_a

cdef int fun(void *obj):
    print(<object>obj)
    return len(<object>obj)

cdef int wrap_fun():
    global obj_a
    return fun(obj_a)

cdef func_t create_fun(obj):
    global obj_a
    obj_a=<void *>obj
    return &wrap_fun

def doit(s):
    cdef func_t fun = create_fun(s)
    print(some_f(fun))
With the expected behavior:
>>> doit('A')
A
1
create_fun is just a convenience: it sets the global object and returns the corresponding wrapper around the original function fun.
NB: It would be safer to make obj_a a Python object, because the void * could become dangling - but to keep the code closer to versions 1 and 4 we use void * instead of object.
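A minimal sketch of that safer variant (reusing fun and func_t from the listing above; the _safe names are mine):
# hypothetical safer variant of version 2: keep the state as a Python object
cdef object py_obj_a = None        # holding a reference keeps the object alive

cdef int wrap_fun_safe():
    return fun(<void *>py_obj_a)

cdef func_t create_fun_safe(obj):
    global py_obj_a
    py_obj_a = obj                 # no dangling pointer: the object is owned here
    return &wrap_fun_safe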
But what if there is more than one closure in use at the same time, let's say two? Obviously, with the approach above, we need two global objects and two wrapper functions to achieve our goal:
%%cython
#version 3: two function pointers at the same time
cdef extern from *:
    """
    typedef int (*func_t)();
    static int some_f(func_t fun){
        return fun();
    }
    static void *obj_a=NULL;
    static void *obj_b=NULL;
    """
    ctypedef int (*func_t)()
    int some_f(func_t myFunc)
    void *obj_a
    void *obj_b

cdef int fun(void *obj):
    print(<object>obj)
    return len(<object>obj)

cdef int wrap_fun_a():
    global obj_a
    return fun(obj_a)

cdef int wrap_fun_b():
    global obj_b
    return fun(obj_b)

cdef func_t create_fun(obj) except NULL:
    global obj_a, obj_b
    if obj_a == NULL:
        obj_a=<void *>obj
        return &wrap_fun_a
    if obj_b == NULL:
        obj_b=<void *>obj
        return &wrap_fun_b
    raise Exception("Not enough slots")

cdef void delete_fun(func_t fun):
    global obj_a, obj_b
    if fun == &wrap_fun_a:
        obj_a=NULL
    if fun == &wrap_fun_b:
        obj_b=NULL

def doit(s):
    ss = s+s
    cdef func_t fun1 = create_fun(s)
    cdef func_t fun2 = create_fun(ss)
    print(some_f(fun2))
    print(some_f(fun1))
    delete_fun(fun1)
    delete_fun(fun2)
After compiling, as expected:
>>> doit('A')
AA
2
A
1
But what if we have to provide an arbitrary number of function-pointers at the same time?
The problem is that we need to create the wrapper functions at run time, because there is no way to know how many we will need at compile time, so the only thing I can think of is to JIT-compile these wrapper functions when they are needed.
The wrapper function looks quite simple, here in assembler:
wrapper_fun:
    movq address_of_params, %rdi ; void *param is the parameter of fun
    movq address_of_fun, %rax    ; address of the function which should be called
    jmp *%rax                    ; jmp instead of call because it is the last operation
The addresses of params and of fun will be known at run time, so we just have to "link" - i.e. replace the placeholders in the resulting machine code.
In my implementation I'm following more or less this great article: https://eli.thegreenplace.net/2017/adventures-in-jit-compilation-part-4-in-python/
%%cython
#version 4: jit-compiled wrapper
from libc.string cimport memcpy

cdef extern from *:
    """
    typedef int (*func_t)(void);
    static int some_f(func_t fun){
        return fun();
    }
    """
    ctypedef int (*func_t)()
    int some_f(func_t myFunc)

cdef extern from "sys/mman.h":
    void *mmap(void *addr, size_t length, int prot, int flags,
               int fd, size_t offset)
    int munmap(void *addr, size_t length)
    int PROT_READ      # #define PROT_READ 0x1 /* Page can be read. */
    int PROT_WRITE     # #define PROT_WRITE 0x2 /* Page can be written. */
    int PROT_EXEC      # #define PROT_EXEC 0x4 /* Page can be executed. */
    int MAP_PRIVATE    # #define MAP_PRIVATE 0x02 /* Changes are private. */
    int MAP_ANONYMOUS  # #define MAP_ANONYMOUS 0x20 /* Don't use a file. */

#              |-----8-byte-placeholder ---|
blue_print = b'\x48\xbf\x00\x00\x00\x00\x00\x00\x00\x00' # movabs 8-byte-placeholder,%rdi
blue_print+= b'\x48\xb8\x00\x00\x00\x00\x00\x00\x00\x00' # movabs 8-byte-placeholder,%rax
blue_print+= b'\xff\xe0'                                 # jmpq *%rax ; jump to address in %rax

cdef func_t link(void *obj, void *fun_ptr) except NULL:
    cdef size_t N=len(blue_print)
    cdef char *mem=<char *>mmap(NULL, N,
                                PROT_READ | PROT_WRITE | PROT_EXEC,
                                MAP_PRIVATE | MAP_ANONYMOUS,
                                -1, 0)
    if <long long int>mem==-1:
        raise OSError("failed to allocate mmap")

    #copy the blueprint:
    memcpy(mem, <char *>blue_print, N)
    #inject the object address:
    memcpy(mem+2, &obj, 8)
    #inject the function address:
    memcpy(mem+2+8+2, &fun_ptr, 8)

    return <func_t>(mem)

cdef int fun(void *obj):
    print(<object>obj)
    return len(<object>obj)

cdef func_t create_fun(obj) except NULL:
    return link(<void *>obj, <void *>&fun)

cdef void delete_fun(func_t fun):
    munmap(fun, len(blue_print))

def doit(s):
    ss, sss = s+s, s+s+s
    cdef func_t fun1 = create_fun(s)
    cdef func_t fun2 = create_fun(ss)
    cdef func_t fun3 = create_fun(sss)
    print(some_f(fun2))
    print(some_f(fun1))
    print(some_f(fun3))
    delete_fun(fun1)
    delete_fun(fun2)
    delete_fun(fun3)
And now, the expected behavior:
>>> doit('A')
AA
2
A
1
AAA
3
After looking at this, maybe there is a chance the API can be changed after all?
Dealing with large matrices (NxM with 1K <= N <= 20K and 10K <= M <= 200K), I often need to pass Numpy matrices to C++ through Cython to get the job done, and this works as expected and without copying.
However, there are times when I need to initialize and preprocess a matrix in C++ and pass it to Numpy (Python 3.6). Let's assume the matrices are linearized (so the size is N*M and it's a 1D matrix - col/row major doesn't matter here). Following the information here - exposing C-computed arrays in Python without data copies - and modifying it for C++ compatibility, I'm able to pass a C++ array.
The problem is that if I want to use a std::vector instead of initializing an array, I get a segmentation fault. For example, consider the following files:
fast.h
#include <iostream>
#include <vector>
using std::cout; using std::endl; using std::vector;
int* doit(int length);
fast.cpp
#include "fast.h"

int* doit(int length) {
    // Something really heavy
    cout << "C++: doing it fast " << endl;
    vector<int> WhyNot;
    // Heavy stuff - like reading a big file and preprocessing it
    for(int i=0; i<length; ++i)
        WhyNot.push_back(i); // heavy stuff
    cout << "C++: did it really fast" << endl;
    return &WhyNot[0]; // or WhyNot.data()
}
faster.pyx
cimport numpy as np
import numpy as np
from libc.stdlib cimport free
from cpython cimport PyObject, Py_INCREF
np.import_array()

cdef extern from "fast.h":
    int* doit(int length)

cdef class ArrayWrapper:
    cdef void* data_ptr
    cdef int size

    cdef set_data(self, int size, void* data_ptr):
        self.data_ptr = data_ptr
        self.size = size

    def __array__(self):
        print("Cython: __array__ called")
        cdef np.npy_intp shape[1]
        shape[0] = <np.npy_intp> self.size
        ndarray = np.PyArray_SimpleNewFromData(1, shape,
                                               np.NPY_INT, self.data_ptr)
        print("Cython: __array__ done")
        return ndarray

    def __dealloc__(self):
        print("Cython: __dealloc__ called")
        free(<void*>self.data_ptr)
        print("Cython: __dealloc__ done")

def faster(length):
    print("Cython: calling C++ function to do it")
    cdef int *array = doit(length)
    print("Cython: back from C++")
    cdef np.ndarray ndarray
    array_wrapper = ArrayWrapper()
    array_wrapper.set_data(length, <void*> array)
    print("Ctyhon: array wrapper set")
    ndarray = np.array(array_wrapper, copy=False)
    ndarray.base = <PyObject*> array_wrapper
    Py_INCREF(array_wrapper)
    print("Cython: all done - returning")
    return ndarray
setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
import numpy

ext_modules = [Extension(
    "faster",
    ["faster.pyx", "fast.cpp"],
    language='c++',
    extra_compile_args=["-std=c++11"],
    extra_link_args=["-std=c++11"]
)]

setup(
    cmdclass = {'build_ext': build_ext},
    ext_modules = ext_modules,
    include_dirs=[numpy.get_include()]
)
If you build this with
python setup.py build_ext --inplace
and run it in the Python 3.6 interpreter, entering the following gives a segfault after a couple of tries:
>>> from faster import faster
>>> a = faster(1000000)
Cython: calling C++ function to do it
C++: doing it fast
C++: did it really fast
Cython: back from C++
Ctyhon: array wrapper set
Cython: __array__ called
Cython: __array__ done
Cython: all done - returning
>>> a = faster(1000000)
Cython: calling C++ function to do it
C++: doing it fast
C++: did it really fast
Cython: back from C++
Ctyhon: array wrapper set
Cython: __array__ called
Cython: __array__ done
Cython: all done - returning
Cython: __dealloc__ called
Segmentation fault (core dumped)
A couple of things to note:
If you use an array instead of a vector (in fast.cpp), this works like a charm!
If you call faster(1000000) and put the result into something other than the variable a, this works.
If you enter a smaller number like faster(10), you get more detailed info like:
Cython: calling C++ function to do it
C++: doing it fast
C++: did it really fast
Cython: back from C++
Ctyhon: array wrapper set
Cython: __array__ called
Cython: __array__ done
Cython: all done - returning
Cython: __dealloc__ called <--- Perhaps this happened too early or late?
*** Error in 'python': double free or corruption (fasttop): 0x0000000001365570 ***
======= Backtrace: =========
More info here ....
It's really puzzling why this doesn't happen with arrays, no matter what!
I make use of vectors a lot and would love to be able to use them in these scenarios.
I think @FlorianWeimer's answer provides a decent solution (allocate a vector and pass that into your C++ function), but it should be possible to return a vector from doit and avoid copies by using the move constructor.
from libcpp.vector cimport vector

cdef extern from "<utility>" namespace "std" nogil:
    T move[T](T) # don't worry that this doesn't quite match the c++ signature

cdef extern from "fast.h":
    vector[int] doit(int length)

# define ArrayWrapper as holding a vector
cdef class ArrayWrapper:
    cdef vector[int] vec
    cdef Py_ssize_t shape[1]
    cdef Py_ssize_t strides[1]

    # constructor and destructor are fairly unimportant now since
    # vec will be destroyed automatically.

    cdef set_data(self, vector[int]& data):
        self.vec = move(data)
        # @ead suggests `self.vec.swap(data)` instead
        # to avoid having to wrap move

    # now implement the buffer protocol for the class,
    # which makes it generally useful to anything that expects an array
    def __getbuffer__(self, Py_buffer *buffer, int flags):
        # relevant documentation http://cython.readthedocs.io/en/latest/src/userguide/buffer.html#a-matrix-class
        cdef Py_ssize_t itemsize = sizeof(self.vec[0])

        self.shape[0] = self.vec.size()
        self.strides[0] = sizeof(int)
        buffer.buf = <char *>&(self.vec[0])
        buffer.format = 'i'
        buffer.internal = NULL
        buffer.itemsize = itemsize
        buffer.len = self.vec.size() * itemsize # product(shape) * itemsize
        buffer.ndim = 1
        buffer.obj = self
        buffer.readonly = 0
        buffer.shape = self.shape
        buffer.strides = self.strides
        buffer.suboffsets = NULL
You should then be able to use it as:
cdef vector[int] array = doit(length)
cdef ArrayWrapper w = ArrayWrapper()
w.set_data(array) # "array" itself is invalid from here on
numpy_array = np.asarray(w)
Edit: Cython isn't hugely good with C++ templates - it insists on writing std::move<vector<int>>(...) rather than std::move(...) and letting C++ deduce the types. This sometimes causes problems with std::move. If you're having issues with it then the best solution is usually to tell Cython about only the overloads you want:
cdef extern from "<utility>" namespace "std" nogil:
    vector[int] move(vector[int])
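Alternatively, as the comment in set_data above already hints, vector's swap sidesteps the issue, since nothing about std::move has to be declared to Cython at all; a minimal sketch, replacing set_data above:
cdef set_data(self, vector[int]& data):
    # swap moves the contents of 'data' into self.vec and leaves 'data' empty,
    # with no move declaration needed
    self.vec.swap(data)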
When you return from doit, the WhyNot object goes out of scope, and the array elements are deallocated. This means that &WhyNot[0] is no longer a valid pointer. You need to store the WhyNot object somewhere else, probably in a place provided by the caller.
One way to do this is to split doit into three functions: doit_allocate, which allocates the vector and returns a pointer to it; doit as before (but with an argument which receives a pointer to the preallocated vector); and doit_free, which deallocates the vector.
Something like this:
vector<int> *
doit_allocate()
{
    return new vector<int>;
}

int *
doit(vector<int> *WhyNot, int length)
{
    // Something really heavy
    cout << "C++: doing it fast " << endl;
    // Heavy stuff - like reading a big file and preprocessing it
    for(int i=0; i<length; ++i)
        WhyNot->push_back(i); // heavy stuff
    cout << "C++: did it really fast" << endl;
    return &WhyNot->front();
}

void
doit_free(vector<int> *WhyNot)
{
    delete WhyNot;
}
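On the Cython side, the three functions could then be wired up roughly like this (a sketch only; the extern declarations, the VectorOwner class and the ownership policy are my assumptions, not part of the answer above):
from libcpp.vector cimport vector

cdef extern from "fast.h":
    vector[int]* doit_allocate()
    int* doit(vector[int]* WhyNot, int length)
    void doit_free(vector[int]* WhyNot)

cdef class VectorOwner:
    cdef vector[int]* vec

    def __cinit__(self):
        self.vec = doit_allocate()   # the wrapper owns the vector

    def __dealloc__(self):
        if self.vec != NULL:
            doit_free(self.vec)      # freed only when the Python object dies

def faster(int length):
    cdef VectorOwner owner = VectorOwner()
    cdef int* data = doit(owner.vec, length)
    # 'data' stays valid for as long as 'owner' is alive, so any NumPy view
    # built on it must keep a reference to 'owner' (e.g. via the buffer protocol)
    return owner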
Cython equivalent of a C define
#define myfunc(Node x,...) SetNode(x.getattributeNode(),__VA_ARGS__)
I have a C API function SetNode which takes as its first argument a node of a struct type, followed by a variable number of further arguments (from 0 to N).
Here is a C example that solves such a problem:
exampleAPI.c
#include<stdarg.h>
float sumN(int len,...){
    va_list argp;
    int i;
    float s=0;
    va_start(argp,len);
    for(i=0;i<len;i++){
        s+=va_arg(argp,int);
    }
    va_end(argp);
    return s;
}
exampleAPI.h
#include<stdarg.h>
float sumN(int len,...);
examplecode.c
#include<stdarg.h>
#include"exampleAPI.h"
int len(float first,...){
    va_list argp;
    int i=1;
    va_start(argp,first);
    while(1){
        if(va_arg(argp,float)==NULL){
            return i;
        }
        else{
            i++;
        }
    }
    va_end(argp);
}
#define sum(...) sumN(len(__VA_ARGS__),__VA_ARGS__)
Now calling
sum(1,2,3,4);
will return 10.000000
sum(1.5,6.5);
will return 8.00000
I need a Cython alternative for the C definition below, not for the example above, because I have a C API with a SetNode function that takes a variable number of arguments, and I want to wrap it in Cython and call it from Python:
#define myfunc(Node x,...) SetNode(x.getattributeNode(),__VA_ARGS__)
Here Node is a class defined in Cython which holds a C struct as an attribute, and getattributeNode() is a method of the Node class which returns the C struct that needs to be passed into the C API.
cdef extern from "Network.h":
    ctypedef struct node_bn:
        pass
    node_bn* SetNode(node_bn* node, ...)

cdef class Node:
    cdef node_bn *node

    cdef getattributeNode(self):
        return self.node

    def setNode(self, *arg):
        self.node = SetNode(self.node, *arg) # Error: cannot convert Python objects to C type
An alternative thing I tried:
cdef extern from "stdarg.h":
    ctypedef struct va_list:
        pass
    ctypedef struct fake_type:
        pass
    void va_start(va_list, void* arg)
    void* va_arg(va_list, fake_type)
    void va_end(va_list)
    fake_type int_type "int"

cdef extern from "Network.h":
    ctypedef struct node_bn:
        pass
    node_bn* VSetNode(node_bn* node, va_list argp)

cdef class Node:
    cdef node_bn *node

    cdef getattributeNode(self):
        return self.node

    cpdef _setNode(self, node_bn *node, ...):
        cdef va_list argp
        va_start(argp, node)
        self.node = VSetNode(node, argp)
        va_end(argp)

    def setNode(self, *arg):
        self._setNode(self.node, *arg)
This works fine when the argument list is empty:
n = Node()
n.setNode() # This works
n.setNode("top", 1) # error: takes exactly one argument (3 given) in self._setNode(self.node, *arg)
If anyone could suggest a Cython equivalent, it would be great.
I don't think it's easily done through Cython (the problem is telling Cython what type conversions to do for an arbitrary number of arguments). The best I can suggest is to use the standard-library ctypes module for this specific case and wrap the rest in Cython.
For the sake of an example, I've used a very simple sum function. va_sum.h contains:
typedef struct { double val; } node_bn;
node_bn* sum_va(node_bn* node,int len, ...);
/* on windows this must be:
__declspec(dllexport) node_bn* sum_va(node_bn* node,int len, ...);
*/
and va_sum.c contains:
#include <stdarg.h>
#include "va_sum.h"

node_bn* sum_va(node_bn* node, int len, ...) {
    int i;
    va_list vl;
    va_start(vl, len);
    for (i=0; i<len; ++i) {
        node->val += va_arg(vl, double);
    }
    va_end(vl);
    return node;
}
I've written it so it adds everything to a field in a structure just to demonstrate that you can pass pointers to structures without too much trouble.
The Cython file is:
# definition of the structure
cdef extern from "va_sum.h":
    ctypedef struct node_bn:
        double val

# obviously you'll want to wrap things in terms of Python-accessible classes,
# but this at least demonstrates how it works
def test_sum(*args):
    cdef node_bn input_node
    cdef node_bn* output_node_p
    input_node.val = 5.0 # create a node, and set an initial value

    from ctypes import CDLL, c_double, c_void_p
    import os.path

    # load the Cython library with ctypes to gain access to the "sum_va" function,
    # assuming you've linked it in when you build the Cython module
    full_path = os.path.realpath(__file__)
    this_file_library = CDLL(full_path)

    # I treat all my arguments as doubles - you may need to do
    # something more sophisticated, but the idea is the same:
    # convert them to the c-type the function is expecting
    args = [ c_double(arg) for arg in args ]

    sum_va = this_file_library.sum_va
    sum_va.restype = c_void_p # it returns a pointer

    # pass the pointers as a void pointer
    # my c compiler warns me if I use int instead of long,
    # but which integer type you have to use is likely system dependent
    # and somewhere you have to be careful
    output_node_p_as_integer = sum_va(c_void_p(<long>&input_node), len(args),
                                      *args)

    # unfortunately getting the output needs a two-stage typecast - first to a long, then to a pointer
    output_node_p = <node_bn*>(<long>(output_node_p_as_integer))

    return output_node_p.val
You need to compile your va_sum.c together with your Cython file (e.g. by adding sources = ['cython_file.pyx','va_sum.c'] in setup.py).
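For instance, a minimal setup.py along those lines (the file and module names are placeholders) might look like:
# hypothetical setup.py compiling the Cython wrapper together with va_sum.c
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize

ext = Extension("cython_file",  # must match the name of the .pyx module
                sources=["cython_file.pyx", "va_sum.c"])

setup(ext_modules=cythonize([ext]))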
Ctypes is probably a bit slower than Cython (I think there's a reasonable overhead on each call), and it's odd to mix them, but this should at least let you write the main wrapper in Cython, and use ctypes to get round the specific limitation.
This is probably not the proper answer, since I am not sure I understand the question fully. I would have replied in a comment, but the code formatting is too poor.
In Python the functions sum and len are available:
def my_len(*args):
    return len(args)

def my_sum(*args):
    return sum(args)

print("len =", my_len("hello", 123, "there"))
print("sum =", my_sum(6.5, 1.5, 2.0))
outputs:
len = 3
sum = 10.0
So, I have a C++ class that I wrap in C so I can use it in Python via ctypes.
Declaration of C++ class:
// Test.h
class Test
{
public:
    static double Add(double a, double b);
};

//Test.cpp
#include "stdafx.h"
#include "Test.h"

double Test::Add(double a, double b)
{
    return a + b;
}
C wrap:
// cdll.h
#ifndef WRAPDLL_EXPORTS
#define WRAPDLL_API __declspec(dllexport)
#else
#define WRAPDLL_API __declspec(dllimport)
#endif

#include "Test.h"

extern "C"
{
    WRAPDLL_API struct TestC;
    WRAPDLL_API TestC* newTest();
    WRAPDLL_API double AddC(TestC* pc, double a, double b);
}

//cdll.cpp
#include "stdafx.h"
#include "cdll.h"

TestC* newTest()
{
    return (TestC*) new Test;
}

double AddC(TestC* pc, double a, double b)
{
    return ((Test*)pc)->Add(a, b);
}
Python script:
import ctypes
t = ctypes.CDLL('../Debug/cdll.dll')
a = t.newTest()
t.AddC(a, 2, 3)
The result of t.AddC(a, 2, 3) is always some negative integer. I suspect there is a problem with the pointer, but I do not know what the problem is.
Does anyone have any ideas?
As AddC only calls a static function, the pointer is not your problem.
You need to pass double values to AddC, and get a double type back:
t.AddC.restype = ctypes.c_double
t.AddC(a, ctypes.c_double(2), ctypes.c_double(3))
The documentation for ctypes explains all this.
As stated in the documentation
By default functions are assumed to return the C int type. Other return types can be specified by setting the restype attribute of the function object.
So just set the return type (and, if you want to pass plain Python floats rather than c_double instances, the argument types as well):
t.AddC.restype = ctypes.c_double
t.AddC.argtypes = [ctypes.c_void_p, ctypes.c_double, ctypes.c_double]
t.AddC(a, 2.0, 3.0)
and you'll get 5.0 instead.
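Putting both answers together, a minimal corrected version of the calling script (the DLL path is as in the question) might look like:
import ctypes

# hypothetical corrected script: declare return and argument types before calling
t = ctypes.CDLL('../Debug/cdll.dll')

t.newTest.restype = ctypes.c_void_p   # opaque pointer to the wrapped Test object
t.AddC.restype = ctypes.c_double      # AddC returns a C double
t.AddC.argtypes = [ctypes.c_void_p, ctypes.c_double, ctypes.c_double]

a = t.newTest()
print(t.AddC(a, 2, 3))                # argtypes convert 2 and 3 to double -> prints 5.0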