I am aware of this question, but I was looking for a simpler way to generate 2d memoryviews from C arrays. Since I am a C and Cython noobie, could someone please explain why something like
cdef int[:, :] get_zeros(int d):
    # get 2-row array of zeros with d as second dimension
    cdef int i
    cdef int *arr = <int *> malloc(sizeof(int) * d)
    for i in range(d):
        arr[i] = 0
    cdef int[:, :] arr_view
    arr_view[0, :] = <int[:d]>arr
    arr_view[1, :] = <int[:d]>arr
    return arr_view
won't work?
When compiling it I get the error Cannot assign type 'int[::1]' to 'int'. Does this mean that the 2d memoryview is collapsed to 1d by the first assignment, or is it because memoryviews need contiguous blocks, etc.?
It's obviously quite hard to "explain why something [...] won't work", because ultimately it's just a design decision that could have been taken differently. But:
Cython memoryviews are designed to be pretty dumb. All they do is provide some nice syntax to access the memory of something that implements the Python buffer protocol, and then have a tiny bit of additional syntax to let you do things like get a 1D memoryview of a pointer.
Additionally, the memoryview as a whole wraps something. When you create cdef int[:, :] arr_view it's invalid until you do arr_view = something. Attempts to assign to part of it are nonsense, since (a) it'd delegate the assignment to the thing it wraps using the buffer protocol and (b) exactly how the assignment would work would depend on what format of buffer protocol you were wrapping. What you've done might be valid if wrapping an "indirect" buffer protocol object but would make no sense if wrapping a contiguous array. Since arr_view could be wrapping either the Cython compiler has to treat it as an error.
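The "a memoryview is only valid once it wraps something" point can be seen from pure Python too, using the built-in memoryview as a rough analogue of Cython's (the names below are mine):

```python
import numpy as np

# A memoryview only becomes usable once it wraps an object that
# exports the buffer protocol - here, a NumPy array.
arr = np.zeros((2, 3), dtype=np.intc)  # the wrapped object
view = memoryview(arr)                 # valid: it now wraps arr's buffer

view[0, 1] = 5                         # assignment is delegated to arr's memory
print(arr[0, 1])                       # the original array sees the write
```

There is no way to construct the `view` first and then assign rows into it piecemeal; it has to be given a complete buffer-exporting object up front.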
The question you link to implements the buffer protocol and so is the correct way to implement this kind of array. What you're attempting to do is to take the extra syntax that gives a 1D memoryview from a pointer and force that into part of a 2D memoryview in the vague hope that this might work. This requires a lot of logic that goes well beyond the scope of what a Cython memoryview was designed to do.
There's probably a couple of additional points worth making:
Memoryviews of pointers don't handle freeing of the pointers (since it'd be pretty much impossible for them to second-guess what you want), so you have to handle this logic yourself. Your current design would leak memory, if it worked. In the design you linked to, the wrapping class can implement this in __dealloc__ (although it isn't shown in that answer), which is much better.
My personal view is that "ragged arrays" (2D arrays of pointers to pointers) are awful. They require a lot of allocation and deallocation. There's lots of opportunity to half-initialize them. Access to them requires a couple of levels of indirection and so is slow. The only thing going for them is that they provide an arr[idx1][idx2] syntax in C. In general I much prefer Numpy's approach of allocating a 1D array and using shape/strides to work out where to index. (Obviously if you're wrapping an existing library then it may not be your choice...)
In addition to the wonderful answer @DavidW has provided, I would like to add some more info. In your included code, I see that you are malloc-ing an array of ints and then zeroing out the contents in a for loop. A more convenient way of accomplishing this is to use C's calloc function instead, which guarantees a pointer to zeroed memory and does not require a for loop afterwards.
Additionally, you could create a single int * that points to an "array" of data that is calloced to a total size of 2 * d * sizeof(int). This would ensure that both of the "rows" of data are contiguous with each other instead of separate and ragged. This could then be cast directly to a 2d memoryview.
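The contiguous-block idea can be sketched in pure Python with NumPy standing in for calloc and the memoryview cast (variable names are mine):

```python
import numpy as np

# Allocate one flat, zeroed block of 2 * d elements, then view it as
# a 2 x d array - no ragged rows, no second level of pointers.
d = 5
flat = np.zeros(2 * d, dtype=np.intc)  # plays the role of calloc(2 * d, sizeof(int))
view = flat.reshape(2, d)              # plays the role of <int[:2, :d]>arr

view[1, 3] = 7                         # write through the 2-D view...
print(flat[1 * d + 3])                 # ...and the flat block sees it at row*d + col
```

Both "rows" are just offsets into the same single allocation, which is exactly what the cast memoryview gives you in Cython.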
As promised in the comments, here is what that conversion code could look like (with calloc use included):
cdef int[:, :] get_zeros(int d):
    cdef int *arr = <int *>calloc(2 * d, sizeof(int))
    cdef int[:, :] arr_view = <int[:2, :d]>arr
    return arr_view
There also appears to be a calloc equivalent in the Python C API per the docs if you want to try it out. However, it does not appear to be wrapped in Cython's mem.pxd module, which is why you were likely not able to find it. You could declare a similar extern block in your code to wrap it like the other functions included in that link.
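For what it's worth, such an extern block might look like this (a sketch of the wrapping approach just described; PyMem_Calloc is in the C API from Python 3.5 onwards):

```cython
# Hand-written declarations, since cython.mem doesn't expose these.
cdef extern from "Python.h":
    void* PyMem_Calloc(size_t nelem, size_t elsize)
    void PyMem_Free(void* ptr)
```

Memory obtained from PyMem_Calloc must be released with PyMem_Free, not C's free.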
And here is a bonus link if you want to know more about writing an allocator to dole out memory from a large block if you go the pre-allocation route (i.e. what PyMem_* functions likely do behind the scenes, but more tunable and under your control for your specific use case).
Related
I am fairly new to cython and I am wondering why the following takes very long:
cpdef test(a):
    cdef np.ndarray[dtype=int] b
    for i in range(10):
        b = a

a = np.array([1, 2, 3], dtype=int)
t = timeit.Timer(functools.partial(test.test, a))
print(t.timeit(1000000))
-> 0.5446977 seconds
If I comment out the cdef declaration this is done in no time. If I declare "a" as np.ndarray in the function header nothing changes. Also, id(a) == id(b), so no new objects are created.
Similar behaviour can be observed when calling a function that takes many ndarray as args, e.g.
cpdef foo(np.ndarray a, np.ndarray b, np.ndarray c, ..... )
Can anybody help me? What am I missing here?
Edit:
I noticed the following:
This is slow:
cpdef foo(np.ndarray[dtype=int, ndim=1] a, np.ndarray[dtype=int, ndim=1] b, np.ndarray[dtype=int, ndim=1] c):
    return
This is faster:
def foo(np.ndarray[dtype=int, ndim=1] a, np.ndarray[dtype=int, ndim=1] b, np.ndarray[dtype=int, ndim=1] c):
    return
This is the fastest:
cpdef foo(a, b, c):
    return
The function foo() is called very frequently (many million times) in my project from many different locations and does some calculus with the three numpy arrays (however, it doesn't change their content).
I basically need the speed of knowing the data type inside of the arrays while also having a very low function-call overhead. What would be the most adequate solution for this?
b = a generates a bunch of type checking that needs to identify whether the type of a is actually an ndarray and makes sure it exports the buffer protocol with an appropriate element type. In exchange for this one-off cost you get fast indexing of single elements.
If you're not doing indexing of single elements then typing as np.ndarray is literally pointless and you're pessimizing your code. If you are doing this indexing then you can get significant optimizations.
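To illustrate the trade-off, here's a sketch (function name is mine) of the kind of function where the typed declaration does pay off, because the body does per-element indexing:

```cython
cimport numpy as np

cpdef long sum_elements(np.ndarray[np.int64_t, ndim=1] a):
    # The one-off type/buffer check happens on entry; after that,
    # each a[i] below compiles to a direct memory read rather than
    # a Python-level __getitem__ call.
    cdef long total = 0
    cdef Py_ssize_t i
    for i in range(a.shape[0]):
        total += a[i]
    return total
```

If the body did no indexing (like the `b = a` loop in the question), the entry check would be pure cost with nothing to offset it.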
If i comment out the cdef declaration this is done in no-time.
This is often a sign that the C compiler has realized the entire function does nothing and optimized it out completely. And therefore your measurement may be meaningless.
cpdef foo(np.ndarray a, np.ndarray b, np.ndarray c, ..... )
just specifying the type as np.ndarray without specifying the element dtype usually gains you very little, and is probably not worthwhile.
If you have a function that you're calling millions of times then it is likely that the input arrays come from somewhere, and can be pre-typed, probably with less frequency. For example they might come by taking slices from a larger array?
The newer memoryview syntax (int[:]) is quick to slice, so for example if you already have a 2D memoryview (int[:,:] x) it's very quick to generate a 1D memoryview from it (e.g. x[:,0]), and it's quick to pass existing memoryviews into a cdef function with memoryview arguments. (Note that (a) I'm just unsure if all of this applies to np.ndarray too, and (b) setting up a fresh memoryview is likely to be about the same cost as an np.ndarray, so I'm only suggesting using them because I know slicing is quick.)
Therefore my main suggestion is to move the typing outwards to try to reduce the number of fresh initializations of these typed arrays. If that isn't possible then I think you may be stuck.
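A sketch of "moving the typing outwards" (function names are mine): pay for the typed initialization once at the outer call, then pass cheap memoryview slices into the hot inner function:

```cython
cdef double row_sum(double[:] row):
    # Hot inner function: receives an already-constructed view,
    # so no fresh buffer check per call.
    cdef double s = 0.0
    cdef Py_ssize_t j
    for j in range(row.shape[0]):
        s += row[j]
    return s

def process_all(double[:, :] x):
    # The one typed initialization happens here, at the boundary.
    cdef double total = 0.0
    cdef Py_ssize_t i
    for i in range(x.shape[0]):
        total += row_sum(x[i, :])  # slicing an existing view is cheap
    return total
```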
This code compiles and runs just fine:
cdef enum:
    SIZE = 1000000

cdef int ar[SIZE]
cdef int i
for i in range(SIZE):
    ar[i] = i
print(ar[5])
but this code:
cdef enum:
    SIZE = 1000000

def test():
    cdef int ar[SIZE]
    cdef int i
    for i in range(SIZE):
        ar[i] = i
    print(ar[5])

test()
crashes the python kernel (I'm running this with jupyter magic).
Is there some limit to how large arrays can be inside of functions? If there is, is there a way to remove that limit?
In the first case, the array is a global, statically defined plain C array, which lives in static storage (the data segment), not on the stack. In the second case, the local-variable array is allocated on the stack. The thing is, the stack has a fixed maximum size (generally something like 2 MiB, though this varies a lot between platforms). If you try to write beyond the limit of the stack, you get a crash, or a nice stack-overflow error if you are lucky. Note that function calls and local variables temporarily take some space on the stack.

While the stack can be resized, the best solution is to use dynamic allocation for relatively big arrays or ones with a variable size. This is possible in Cython using malloc and free (see the documentation), but you should be careful not to forget the free (nor to free the array twice). An alternative solution is to create a Numpy array and then work with memory views (this is more expensive but prevents any possible leak).
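A sketch of the heap-allocated version of the crashing function, assuming the array really needs to be that large:

```cython
from libc.stdlib cimport malloc, free

def test():
    cdef Py_ssize_t SIZE = 1000000
    cdef Py_ssize_t i
    # Heap allocation: the stack-size limit no longer applies here.
    cdef int *ar = <int *> malloc(SIZE * sizeof(int))
    if ar == NULL:
        raise MemoryError()
    try:
        for i in range(SIZE):
            ar[i] = i
        print(ar[5])
    finally:
        free(ar)  # runs exactly once, even if an exception is raised
```

The try/finally guards against both of the pitfalls mentioned above: forgetting the free and freeing twice.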
I would like to have a cython array of a cdef class:
cdef class Child:
    cdef int i
    def do(self):
        self.i += 1

cdef class Mother:
    cdef Child[:] array_of_child
    def __init__(self):
        for i in range(100):
            self.array_of_child[i] = Child()
The answer is no - it is not really possible in a useful way: newsgroup post of essentially the same question
It wouldn't be possible to have a direct array (allocated in a single chunk) of Childs. Partly because, if somewhere else ever gets a reference to a Child in the array, that Child has to be kept alive (but not the whole array) which wouldn't be possible to ensure if they were all allocated in the same chunk of memory. Additionally, were the array to be resized (if this is a requirement) then it would invalidate any other references to the objects within the array.
Therefore you're left with having an array of pointers to Child. Such a structure would be fine, but internally would look almost exactly like a Python list (so there's really no benefit to doing something more complicated in Cython...).
There are a few sensible workarounds:
The workaround suggested in the newsgroup post is just to use a python list. You could also use a numpy array with dtype=object. If you need to access a cdef function in the class you can do a cast first:
cdef Child c = <Child?>a[0]  # omit the ? if you don't want
                             # the overhead of checking the type
c.some_cdef_function()
Internally both these options are stored as a C array of PyObject pointers to your Child objects and so are not as inefficient as you probably assume.
A further possibility might be to store your data as a C struct (cdef struct ChildStruct: ....) which can be readily stored as an array. When you need a Python interface to that struct you can either define Child so it contains a copy of ChildStruct (but modifications won't propagate back to your original array), or a pointer to ChildStruct (but you need to be careful with ensuring that the memory is not freed while the Child pointing to it is alive).
You could use a Numpy structured array - this is pretty similar to using an array of C structs except Numpy handles the memory, and provides a Python interface.
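A pure-Python sketch of the structured-array option, assuming Child only holds the int field i as in the question (the field name mirrors that):

```python
import numpy as np

# Each record has one int field "i", mirroring the Child class above;
# all 100 records live in a single contiguous block owned by NumPy.
children = np.zeros(100, dtype=[('i', np.intc)])

children['i'] += 1        # the bulk equivalent of calling do() on every child
print(children[0]['i'])
```

Because NumPy owns the memory, there is nothing to free, and the same dtype works from Cython as a typed memoryview over the struct.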
The memoryview syntax in your question is valid: cdef Child[:] array_of_child. This can be initialized from a numpy array of dtype object:
array_of_child = np.array([Child() for i in range(100)])
In terms of data-structure, this is an array of pointers (i.e. the same as a Python list, but can be multi-dimensional). It avoids the need for <Child> casting. The important thing it doesn't do is any kind of type-checking - if you feed an object that isn't Child into the array then it won't notice (because the underlying dtype is object), but will give nonsense answers or segmentation faults.
In my view this approach gives you a false sense of security about two things: first that you have made a more efficient data structure (you haven't, it's basically the same as a list); second that you have any kind of type safety. However, it does exist. (If you want to use memoryviews, e.g. for multi-dimensional arrays, it would probably be better to use a memoryview of type object - this is honest about the underlying dtype)
Some background for the question:
I'm trying to optimise a custom neural network code.
It relies heavily on loops and I decided to use cython to speed up the calculation.
I followed the usual online tips: Declare all local variables with appropriate cdefs and switch off boundscheck and nonecheck. This barely gave me 10% performance.
Well, my code relies on lots of class members. Therefore I decided to convert the entire class into a cdef class. Turns out that cython doesn't allow numpy ndarrays as types for class members. Instead one has to use memoryviews.
Unfortunately the two types seem to be vastly incompatible.
I already ran into this problem: Cython memoryview transpose: Typeerror
To sum it up: You can store an np.ndarray in a memoryview. You can transpose it and store the returned array in a memview. But not if that memview is a class member. Then you have to create an intermediate memview, store the result in that and assign the intermediate memview to the class member.
Here's the code ( many thanks to DavidW)
cdef double[:,:,:,:] temporary_view_of_transpose

# temporary_view_of_transpose now "looks at" the memory allocated by transpose
# no square brackets!
temporary_view_of_transpose = out_image.transpose(1, 0, 2, 3)

# data is copied from temporary_view_of_transpose to self.y
self.y[...] = temporary_view_of_transpose  # (remembering that self.y must be
                                           # the correct shape before this assignment)
Now I've got a new problem.
The code above is from the so-called "forward-pass". There is also a corresponding backward-pass, which does all the calculations backward (for analytical gradients).
This means that for the backward pass, I have to transpose the memoryview and store it in a numpy array:
cdef np.ndarray[DTYPE_t, ndim=4] d_out_image = self.d_y.transpose(1, 0, 2, 3)
d_y has to be a class member, therefore it has to be a memoryview. Memoryviews don't allow transposing. They have a .T method, but that doesn't help me.
Actual Question:
How do I correctly store a numpy array as a class member of a cdef class?
If the answer is :"as a memoryview", how do I transpose a memoryview ?
I think the best answer is "you store the numpy array as an untyped Python object":
cdef class C:
    cdef object array

    def example_function(self):
        # if you want to use the fast Cython array indexing in a function
        # you can do:
        cdef np.ndarray[np.float64_t, ndim=4] self_array = self.array
        # or
        cdef np.float64_t[:,:,:,:] self_array2 = self.array
        # note that neither of these are copies - they're references
        # to exactly the same array, and so if you modify one it'll modify
        # self.array too

    def function2(self):
        return self.array.transpose(1, 0, 2, 3)  # works fine!
The small cost to doing it this way is that there's a bit of type-checking at the start of example_function to check that it is actually a 4D numpy array with the correct dtype. Provided you do a decent amount of work in the function that shouldn't matter.
As an alternative (if you decide you really want to store them as memoryviews) you could use np.asarray to convert it back to a numpy array without making a copy (i.e. they share data).
e.g.
cdef np.ndarray[DTYPE_t, ndim=4] d_out_image = np.asarray(self.d_y).transpose(1, 0, 2, 3)
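You can see the no-copy behaviour of np.asarray from pure Python, using the built-in memoryview as a stand-in for the typed class member (names are mine):

```python
import numpy as np

a = np.arange(24, dtype=np.float64).reshape(2, 3, 4, 1)
mv = memoryview(a)                    # stand-in for the typed memoryview member
b = np.asarray(mv).transpose(1, 0, 2, 3)  # no copy: b shares a's data

b[2, 1, 3, 0] = -1.0                  # write through the transposed view...
print(a[1, 2, 3, 0])                  # ...and the original array sees it
```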
For a given ctype array or a Python list, how does one cast a Python object to a Cython void pointer?
The way I'm doing it now is something like this(_arr is python list):
cdef int *int_t = <int*> malloc(cython.sizeof(int) * len(_arr))
if int_t is NULL:
    raise MemoryError()
for i in xrange(len(_arr)):
    int_t[i] = _arr[i]
After this I have int_t, in which I have the entire array. But I want the thing to be more general and support other types, not just int. Do I have to do the same thing for each type, or is there any generic way in which this can be done?
First of all, you should be aware that numpy arrays support quite a range of data types. Here is a general introduction on how to work with numpy arrays in cython. If you want to do numerical stuff that should do the trick. Also, numpy does support arrays of Python objects. But then again, the array lookups are optimized but interaction with the objects is not, since they are still Python objects.
If you are thinking about doing this with arbitrary types, i.e. converting arbitrary Python objects into some kind of C type or C++ object, you have to do it manually for every type. And this makes sense, too, since automatic casting between a potentially dynamic data structure and a static one is not always obvious, especially not to a compiler.