NumPy C extension with SWIG: unknown-length output arrays

I would like to wrap a C function with SWIG.
The function takes a couple of arrays (of the same length) as input and returns three more arrays.
However, it is not possible to predict the length of the returned arrays beforehand; they are dynamically allocated inside the function.
Is it possible to wrap such a function with SWIG (using numpy.i), and if so, how?
A simplified function declaration looks like:
int func(double **a, double **b, long int *N, double *x, double *y, long int *Nx, long int *Ny);
Here Nx and Ny are known beforehand, but N (the length of a and b) is not; a and b are allocated (with malloc) inside the function.

It seems that SWIG (or any other Python wrapper generator for that matter) cannot do this.
I ended up writing the Python wrapper by hand, which is actually quite easy, using PyArray_SimpleNew or PyArray_SimpleNewFromData to create the output arrays.
With the latter, one has to be extra careful not to create memory leaks.
After playing with it a bit, I found the former, combined with a simple memcpy, to be safer.
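For reference, a minimal sketch of such a hand-written wrapper for the func() declared above (not the exact original code; the usual NumPy extension boilerplate, including the import_array() call and input validation, is omitted for brevity):

#include <Python.h>
#include <numpy/arrayobject.h>
#include <stdlib.h>
#include <string.h>

static PyObject *py_func(PyObject *self, PyObject *args)
{
    PyArrayObject *x_arr, *y_arr;
    if (!PyArg_ParseTuple(args, "O!O!",
                          &PyArray_Type, &x_arr, &PyArray_Type, &y_arr))
        return NULL;

    double *a, *b;
    long int N;
    long int Nx = (long int)PyArray_DIM(x_arr, 0);
    long int Ny = (long int)PyArray_DIM(y_arr, 0);
    func(&a, &b, &N,
         (double *)PyArray_DATA(x_arr), (double *)PyArray_DATA(y_arr),
         &Nx, &Ny);

    /* PyArray_SimpleNew allocates buffers that NumPy itself owns; copy
       the malloc'ed results into them and free the C buffers, so no
       memory leaks are possible. */
    npy_intp dims[1] = { (npy_intp)N };
    PyObject *a_out = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
    PyObject *b_out = PyArray_SimpleNew(1, dims, NPY_DOUBLE);
    memcpy(PyArray_DATA((PyArrayObject *)a_out), a, N * sizeof(double));
    memcpy(PyArray_DATA((PyArrayObject *)b_out), b, N * sizeof(double));
    free(a);
    free(b);
    return Py_BuildValue("NN", a_out, b_out);
}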

Related

Assigning ndarray in cython to new variable very slow? Or what is going on here?

I am fairly new to Cython and I am wondering why the following takes so long:
cpdef test(a):
    cdef np.ndarray[dtype=int] b
    for i in range(10):
        b = a

a = np.array([1, 2, 3], dtype=int)
t = timeit.Timer(functools.partial(test.test, a))
print(t.timeit(1000000))
-> 0.5446977 seconds
If I comment out the cdef declaration, this is done in no time. If I declare a as np.ndarray in the function header, nothing changes. Also, id(a) == id(b), so no new objects are created.
Similar behaviour can be observed when calling a function that takes many ndarrays as arguments, e.g.
cpdef foo(np.ndarray a, np.ndarray b, np.ndarray c, .....)
Can anybody help me? What am I missing here?
Edit:
I noticed the following:
This is slow:
cpdef foo(np.ndarray[dtype=int, ndim=1] a, np.ndarray[dtype=int, ndim=1] b, np.ndarray[dtype=int, ndim=1] c):
    return
This is faster:
def foo(np.ndarray[dtype=int, ndim=1] a, np.ndarray[dtype=int, ndim=1] b, np.ndarray[dtype=int, ndim=1] c):
    return
This is the fastest:
cpdef foo(a, b, c):
    return
The function foo() is called very frequently (many millions of times) in my project from many different locations, and does some calculations with the three numpy arrays (it doesn't change their content, however).
I basically need the speed of knowing the data type inside the arrays while also having very low function-call overhead. What would be the most adequate solution for this?
b = a generates a bunch of type checking: Cython must verify that a is actually an ndarray and make sure it exports the buffer protocol with an appropriate element type. In exchange for this one-off cost you get fast indexing of single elements.
If you're not doing indexing of single elements then typing as np.ndarray is literally pointless and you're pessimizing your code. If you are doing this indexing then you can get significant optimizations.
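For contrast, a hedged sketch (hypothetical function, not from the question) of where the typing does pay off, namely tight per-element indexing:

cimport numpy as np

cpdef long total(np.ndarray[np.int64_t, ndim=1] a):
    # the one-off buffer check at the call makes this loop compile down
    # to direct memory reads instead of Python-level indexing
    cdef long s = 0
    cdef Py_ssize_t i
    for i in range(a.shape[0]):
        s += a[i]
    return s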
If I comment out the cdef declaration, this is done in no time.
This is often a sign that the C compiler has realized the entire function does nothing and optimized it out completely. And therefore your measurement may be meaningless.
cpdef foo(np.ndarray a, np.ndarray b, np.ndarray c, .....)
Just specifying the type as np.ndarray without specifying the element dtype usually gains you very little, and is probably not worthwhile.
If you have a function that you're calling millions of times, then it is likely that the input arrays come from somewhere and can be typed earlier, at a point that runs less frequently. For example, they might be slices taken from a larger array.
The newer memoryview syntax (int[:]) is quick to slice, so for example if you already have a 2D memoryview (int[:, :] x) it's very quick to generate a 1D memoryview from it (e.g. x[:, 0]), and it's quick to pass existing memoryviews into a cdef function with memoryview arguments. (Note that (a) I'm unsure whether all of this applies to np.ndarray too, and (b) setting up a fresh memoryview is likely to cost about the same as an np.ndarray, so I'm only suggesting them because I know slicing is quick.)
Therefore my main suggestion is to move the typing outwards to try to reduce the number of fresh initializations of these typed arrays. If that isn't possible then I think you may be stuck.
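To make that concrete, a minimal sketch (illustrative names, not from the original question) of moving the typing outwards: the boundary function acquires the typed view once, and the frequently-called inner function receives cheap memoryview slices:

cdef double row_sum(double[:] row):
    # hot function: receives an existing view, so no fresh buffer checks
    cdef double s = 0
    cdef Py_ssize_t i
    for i in range(row.shape[0]):
        s += row[i]
    return s

def process(double[:, :] data):
    # typed once per call from Python; the column slices below are cheap
    cdef double total = 0
    cdef Py_ssize_t j
    for j in range(data.shape[1]):
        total += row_sum(data[:, j])
    return total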

The simplest way to pass pointer to contiguous data from Python to C

I am using ctypes to call a function in C. The function expects a pointer to the first element of contiguous data and the number of elements.
One thing that works is something like this:
a=15 # a could be any number
temp = numpy.array([a]*14, dtype=numpy.int8)
c_function(temp.ctypes.data_as(ctypes.c_void_p), 14)
This is really cumbersome, as it requires both numpy and ctypes. Is there a simpler way that works in both Python 2 and Python 3? (AFAIK bytes([a]*14) works, but only in Python 3.)
EDIT: More interestingly, this also works(!):
a=15 # a could be any number
temp = chr(a)*14
c_function(temp, 14)
There were suggestions in other threads that one could pass a pointer to the first element of the contiguous data, as in Passing memoryview to C function, but I was unable to make this work.
Preliminaries
Python does not have pointers. You cannot create a pointer in Python, though Python variables act in some ways like pointers. You can create Python objects that represent lower-level pointers, but what you actually seem to want is to feed your C function a pointer to Python-managed data, which is an altogether different thing.
What ctypes does for you
You seem to have settled on using ctypes for the actual function call, so the general question expressed in the question title is a little overbroad for what you actually want to know. The real question seems to be more like "How do I get ctypes to pass a C pointer to Python-managed data to a C function?"
According to the ctypes Python 2 docs, in Python 2,
None, integers, longs, byte strings and unicode strings are the only
native Python objects that can directly be used as parameters in these
function calls. None is passed as a C NULL pointer, byte strings and
unicode strings are passed as pointer to the memory block that
contains their data (char * or wchar_t *). [...]
(emphasis added).
It's more or less the same list in Python 3 ...
None, integers, bytes objects and (unicode) strings
... with the same semantics.
Note well that ctypes takes care of the conversion from Python object to corresponding C representation -- nowhere does Python code handle C pointers per se, nor even direct representations of them.
Relevant C details
In many C implementations, all object pointer types have the same representation and can be used semi-interchangeably, but pointers of type char * are guaranteed by the standard to have the same size and representation as pointers of type void *. These two pointer types are guaranteed to be interchangeable as function parameters and return values, among other things.
Synthesis
How convenient! It is acceptable to call your C function with a first argument of type char * when the function declares that parameter to be of type void *, and that is exactly what ctypes will arrange for you when the Python argument is a byte string (Python 2) or a bytes object (Python 3). The C function will receive a pointer to the object's data, not to the object itself. This provides a simpler and better way forward than going through numpy or a similar package, and it is basically the approach that you appended to your question. Thus, supposing that c_function identifies a ctypes-wrapped C function, you could do this (Python 3):
n = 15
c_function(b'0' * n, n)
Of course, you can also create a variable for the object and pass that, instead, which would allow you to afterward see whatever the C function has done with the contents of the object.
Do note, however, that
Byte strings and bytes objects are immutable as far as Python is concerned. You can get yourself in trouble if you use a C function to change the contents of a bytes object that other Python code assumes will never change.
The C side cannot determine the size of the data from a pointer to it. That is presumably the purpose of the second parameter. If you tell the function that the object is larger than it really is, and the function relies on that to try to modify bytes past the end of the actual data, then you will have difficult-to-debug trouble, from corruption of other data to memory leaks. If you're lucky, your program will crash.
It depends on what Python implementation you use, but typically the elements of a Unicode string are larger than one byte each. Save yourself some trouble and use byte strings / bytes instead.
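If the C side is meant to write into the buffer, a mutable object sidesteps the immutability caveat above. A minimal sketch, reusing the asker's c_function and assuming it has already been loaded via ctypes:

import ctypes

n = 14
buf = ctypes.create_string_buffer(n)   # n zero bytes, mutable
c_function(buf, n)                     # the C side may write into buf
data = buf.raw                         # read back the possibly-modified bytes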

Python-like Coding in C for Pointers

I am transitioning from Python to C, so my question might appear naive. I am reading a tutorial on Python-C bindings, and it mentions that:
In C, all parameters are pass-by-value. If you want to allow a function to change a variable in the caller, then you need to pass a pointer to that variable.
Question: Why can't we simply re-assign the values inside the function and be free of pointers?
The following code uses pointers:
#include <stdio.h>
int i = 24;
int increment(int *j) {
    (*j)++;
    return *j;
}
int main(void) {
    increment(&i);
    printf("i = %d\n", i);
    return 0;
}
Now this can be replaced with the following code that doesn't use pointers:
#include <stdio.h>
int i = 24;
int increment(int j) {
    j++;
    return j;
}
int main(void) {
    i = increment(i);
    printf("i = %d\n", i);
    return 0;
}
You can only return one thing from a function. If you need to update multiple parameters, or you need to use the return value for something other than the updated variable (such as an error code), you need to pass a pointer to the variable.
Getting this out of the way first - pointers are fundamental to C programming. You cannot be “free” of pointers when writing C. You might as well try to never use if statements, arrays, or any of the arithmetic operators. You cannot use a substantial chunk of the standard library without using pointers.
“Pass by value” means, among other things, that the formal parameter j in increment and the actual parameter i in main are separate objects in memory, and changing one has absolutely no effect on the other. The value of i is copied to j when the function is called, but any changes to i are not reflected in j and vice-versa.
We work around this in C by using pointers. Instead of passing the value of i to increment, we pass its address (by value), and then dereference that address with the unary * operator.
This is one of the cases where we have to use pointers. The other case is when we track dynamically-allocated memory. Pointers are also useful (if not strictly required) for building containers (lists, trees, queues, stacks, etc.).
Passing a value as a parameter and returning its updated value works, but only for a single parameter. Passing multiple parameters and returning their updated values in a struct type can work, but is not good style if you’re doing it just to avoid using pointers. It’s also not possible if the function must update parameters and return some kind of status (such as the scanf library function, for example).
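To illustrate that last point, a small hedged sketch (hypothetical function): the return value carries a status code while the results come back through pointer parameters:

#include <stdio.h>
int divide(int num, int den, int *quot, int *rem) {
    if (den == 0)
        return -1;     /* status: failure; outputs left untouched */
    *quot = num / den;
    *rem = num % den;
    return 0;          /* status: success */
}
int main(void) {
    int q, r;
    if (divide(17, 5, &q, &r) == 0)
        printf("17 = 5*%d + %d\n", q, r);
    return 0;
}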
Similarly, using file-scope variables does not scale and creates maintenance headaches. There are times when it’s not the wrong answer, but in general it’s not a good idea.
So, imagine you need to pass large arrays or other data structures that need modification. If you apply the approach you used to increment an integer, you create a copy of that large array on every call to the function. Obviously, creating a copy is not memory-friendly; instead, we pass a pointer to the function and do the updates on the single array (or whatever it is).
Plus, as the other answer mentioned, if you need to update many parameters, then returning them the way you declared is impossible.

2D MemoryView from dynamic arrays in Cython

I am aware of this question, but I was looking for a simpler way to generate 2d memoryviews from C arrays. Since I am a C and Cython noobie, could someone please explain why something like
cdef int[:, :] get_zeros(int d):
    # get 2-row array of zeros with d as second dimension
    cdef int i
    cdef int *arr = <int *> malloc(sizeof(int) * d)
    for i in range(d):
        arr[i] = 0
    cdef int[:, :] arr_view
    arr_view[0, :] = <int[:d]>arr
    arr_view[1, :] = <int[:d]>arr
    return arr_view
won't work?
When compiling it I get "Cannot assign type 'int[::1]' to 'int'" as the error. Does this mean that the 2D memoryview is collapsed to 1D by the first assignment, or is it because memoryviews need contiguous blocks, etc.?
It's obviously quite hard to "explain why something [...] won't work", because ultimately it's just a design decision that could have been taken differently. But:
Cython memoryviews are designed to be pretty dumb. All they do is provide some nice syntax to access the memory of something that implements the Python buffer protocol, and then have a tiny bit of additional syntax to let you do things like get a 1D memoryview of a pointer.
Additionally, the memoryview as a whole wraps something. When you create cdef int[:, :] arr_view it's invalid until you do arr_view = something. Attempts to assign to part of it are nonsense, since (a) it'd delegate the assignment to the thing it wraps using the buffer protocol and (b) exactly how the assignment would work would depend on what format of buffer protocol you were wrapping. What you've done might be valid if wrapping an "indirect" buffer protocol object but would make no sense if wrapping a contiguous array. Since arr_view could be wrapping either the Cython compiler has to treat it as an error.
The question you link to implements the buffer protocol and so is the correct way to implement this kind of array. What you're attempting to do is to take the extra syntax that gives a 1D memoryview from a pointer and force that into part of a 2D memoryview in the vague hope that this might work. This requires a lot of logic that goes well beyond the scope of what a Cython memoryview was designed to do.
There's probably a couple of additional points worth making:
Memoryviews of pointers don't handle freeing of the pointers (since it'd be pretty much impossible for them to second-guess what you want). You have to handle this logic yourself. Your current design would leak memory, if it worked. In the design you linked to, the wrapping class could implement this in __dealloc__ (although it isn't shown in that answer), which is much better.
My personal view is that "ragged arrays" (2D arrays of pointers to pointers) are awful. They require a lot of allocation and deallocation. There's lots of opportunity to half-initialize them. Access to them requires a couple of levels of indirection and so is slow. The only thing going for them is that they provide arr[idx1][idx2] syntax in C. In general I much prefer Numpy's approach of allocating a 1D array and using shape/strides to work out where to index. (Obviously, if you're wrapping an existing library, the layout may not be your choice...)
In addition to the wonderful answer @DavidW has provided, I would like to add some more info. In your included code, I see that you are malloc-ing an array of ints and then zeroing out the contents in a for loop. A more convenient way of accomplishing this is to use C's calloc function instead, which guarantees a pointer to zeroed memory and does not require a for loop afterwards.
Additionally, you could create a single int * that points to an "array" of data that is calloced to a total size of 2 * d * sizeof(int). This would ensure that both of the "rows" of data are contiguous with each other instead of separate and ragged. This could then be cast directly to a 2d memoryview.
As promised in the comments, here is what that conversion code could look like (with calloc use included):
cdef int[:, :] get_zeros(int d):
    cdef int *arr = <int *>calloc(2 * d, sizeof(int))
    cdef int[:, :] arr_view = <int[:2, :d]>arr
    return arr_view
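As written, though, nothing ever frees arr, so this version still leaks memory (the point raised in the other answer). A hedged sketch of one fix, using Cython's view.array and its callback_free_data hook so the buffer is freed when the view is garbage-collected (get_zeros_owned is an illustrative name):

from libc.stdlib cimport calloc, free
from cython cimport view

cdef int[:, :] get_zeros_owned(int d):
    cdef int *arr = <int *>calloc(2 * d, sizeof(int))
    if arr == NULL:
        raise MemoryError()
    # wrap the pointer; view.array will call free(arr) on deallocation
    cdef view.array wrapper = <int[:2, :d]>arr
    wrapper.callback_free_data = free
    return wrapper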
There also appears to be a calloc equivalent in the Python C API per the docs, if you want to try it out. However, it does not appear to be wrapped in Cython's mem.pxd module, which is why you were likely not able to find it. You could declare a similar extern block in your code to wrap it like the other functions included in that link.
And here is a bonus link if you want to know more about writing an allocator to dole out memory from a large block if you go the pre-allocation route (i.e. what PyMem_* functions likely do behind the scenes, but more tunable and under your control for your specific use case).

How do I wrap this C function, with multiple arguments, with ctypes?

I have the function prototype here:
extern "C" void __stdcall__declspec(dllexport) ReturnPulse(double*,double*,double*,double*,double*);
I need to write some python to access this function that is in a DLL.
I have loaded the DLL, but each of the double* is actually pointing to a variable number of doubles (an array), and I'm having trouble getting it to function properly.
Thanks all!
To make an array with, say, 7 doubles:
arr7 = ctypes.c_double * 7
x = arr7()
and pass x to your function where it wants a double*. Or, if you need to initialize x as you make it:
x = arr7(*(i * 0.1 for i in xrange(7)))
(note the unpacking: the array constructor takes individual values, not a generator) and the like. You can loop over x, index it, and so on.
I haven't looked at ctypes too much, but try using a numpy array of the right type. If that doesn't just automatically work, they also have a ctypes attribute that should contain a pointer to the data.
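For what it's worth, a hedged sketch of that numpy route (the DLL name and array lengths are made up; adjust to whatever ReturnPulse actually expects). Since the prototype is __stdcall, WinDLL is the matching loader:

import ctypes
import numpy as np

lib = ctypes.WinDLL("pulse.dll")          # hypothetical DLL name
dbl_p = ctypes.POINTER(ctypes.c_double)
lib.ReturnPulse.argtypes = [dbl_p] * 5
lib.ReturnPulse.restype = None

# five arrays; the sizes here are placeholders
arrays = [np.zeros(100, dtype=np.float64) for _ in range(5)]
lib.ReturnPulse(*(a.ctypes.data_as(dbl_p) for a in arrays))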
