Some background for the question:
I'm trying to optimise a custom neural network code.
It relies heavily on loops and I decided to use cython to speed up the calculation.
I followed the usual online tips: declare all local variables with appropriate cdefs and switch off boundscheck and nonecheck. This barely gave me a 10% performance gain.
Well, my code relies on lots of class members. Therefore I decided to convert the entire class into a cdef class. Turns out that cython doesn't allow numpy ndarrays as types for class members. Instead one has to use memoryviews.
Unfortunately the two types seem to be vastly incompatible.
I already ran into this problem: Cython memoryview transpose: Typeerror
To sum it up: You can store an np.ndarray in a memoryview. You can transpose it and store the returned array in a memview. But not if that memview is a class member. Then you have to create an intermediate memview, store the result in that and assign the intermediate memview to the class member.
Here's the code ( many thanks to DavidW)
cdef double[:,:,:,:] temporary_view_of_transpose

# temporary_view_of_transpose now "looks at" the memory allocated by transpose
# no square brackets!
temporary_view_of_transpose = out_image.transpose(1, 0, 2, 3)

# data is copied from temporary_view_of_transpose to self.y
self.y[...] = temporary_view_of_transpose  # (remembering that self.y must be the correct shape before this assignment)
Now I've got a new problem.
The code above is from the so-called "forward-pass". There is also a corresponding backward-pass, which does all the calculations backward (for analytical gradients).
This means that for the backward pass, I have to transpose the memoryview and store it in a numpy array:
cdef np.ndarray[DTYPE_t, ndim=4] d_out_image = self.d_y.transpose(1, 0, 2, 3)
d_y has to be a class member, therefore it has to be a memoryview. Memoryviews don't allow transposing. They have a .T method, but that doesn't help me.
Actual Question:
How do I correctly store a numpy array as a class member of a cdef class?
If the answer is "as a memoryview", how do I transpose a memoryview?
I think the best answer is "you store the numpy array as an untyped python object"
cdef class C:
    cdef object array

    def example_function(self):
        # if you want to use the fast Cython array indexing in a function
        # you can do:
        cdef np.ndarray[np.float64_t, ndim=4] self_array = self.array
        # or
        cdef np.float64_t[:,:,:,:] self_array2 = self.array
        # note that neither of these are copies - they're references
        # to exactly the same array and so if you modify one it'll modify
        # self.array too

    def function2(self):
        return self.array.transpose(1, 0, 2, 3)  # works fine!
The small cost to doing it this way is that there's a bit of type-checking at the start of example_function to check that it is actually a 4D numpy array with the correct dtype. Provided you do a decent amount of work in the function that shouldn't matter.
As an alternative (if you decide you really want to store them as memoryviews) you could use np.asarray to convert it back to a numpy array without making a copy (i.e. they share data).
e.g.
cdef np.ndarray[DTYPE_t, ndim=4] d_out_image = np.asarray(self.d_y).transpose(1, 0, 2, 3)
Related
I am fairly new to cython and I am wondering why the following takes so long:
cpdef test(a):
    cdef np.ndarray[dtype=int] b
    for i in range(10):
        b = a

a = np.array([1, 2, 3], dtype=int)
t = timeit.Timer(functools.partial(test.test, a))
print(t.timeit(1000000))
-> 0.5446977 Seconds
If I comment out the cdef declaration, this is done in no time. If I declare "a" as np.ndarray in the function header, nothing changes. Also, id(a) == id(b), so no new objects are created.
Similar behaviour can be observed when calling a function that takes many ndarrays as args, e.g.
cpdef foo(np.ndarray a, np.ndarray b,np.ndarray c, ..... )
Can anybody help me? What am I missing here?
Edit:
I noticed the following:
This is slow:
cpdef foo(np.ndarray[dtype=int, ndim=1] a, np.ndarray[dtype=int, ndim=1] b, np.ndarray[dtype=int, ndim=1] c):
    return
This is faster:
def foo(np.ndarray[dtype=int, ndim=1] a, np.ndarray[dtype=int, ndim=1] b, np.ndarray[dtype=int, ndim=1] c):
    return
This is the fastest:
cpdef foo(a, b, c):
    return
The function foo() is called very frequently (many millions of times) in my project from many different locations and does some calculations with the three numpy arrays (it doesn't change their content, however).
I basically need the speed of knowing the data type inside the arrays while also having very low function-call overhead. What would be the most appropriate solution for this?
b = a generates a bunch of type checking that needs to identify whether the type of a is actually an ndarray and makes sure it exports the buffer protocol with an appropriate element type. In exchange for this one-off cost you get fast indexing of single elements.
If you're not doing indexing of single elements then typing as np.ndarray is literally pointless and you're pessimizing your code. If you are doing this indexing then you can get significant optimizations.
If i comment out the cdef declaration this is done in no-time.
This is often a sign that the C compiler has realized the entire function does nothing and optimized it out completely. And therefore your measurement may be meaningless.
cpdef foo(np.ndarray a, np.ndarray b,np.ndarray c, ..... )
Just specifying the type as np.ndarray without specifying the element dtype usually gains you very little, and is probably not worthwhile.
If you have a function that you're calling millions of times then it is likely that the input arrays come from somewhere and can be typed there, probably less frequently. For example, they might come from taking slices of a larger array?
The newer memoryview syntax (int[:]) is quick to slice, so for example if you already have a 2D memoryview (int[:,:] x) it's very quick to generate a 1D memoryview from it (e.g. x[:,0]), and it's quick to pass existing memoryviews into a cdef function with memoryview arguments. (Note that (a) I'm not sure whether all of this applies to np.ndarray too, and (b) setting up a fresh memoryview is likely to cost about the same as an np.ndarray, so I'm only suggesting them because I know slicing is quick.)
Therefore my main suggestion is to move the typing outwards to try to reduce the number of fresh initializations of these typed arrays. If that isn't possible then I think you may be stuck.
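As an illustrative sketch of "moving the typing outwards" (the function names and the double dtype here are mine, not from the question): type the buffer once in an outer function, then pass cheap memoryview slices to a typed inner function inside the hot loop.
cdef double inner(double[:] a, double[:] b, double[:] c):
    # memoryview arguments: calls from other Cython code skip Python-level overhead
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in range(a.shape[0]):
        total += a[i] * b[i] + c[i]
    return total

def outer(double[:, :] big):
    # acquire and type-check the buffer once here, then slice cheaply in the loop
    cdef Py_ssize_t row
    cdef double acc = 0.0
    for row in range(big.shape[0]):
        acc += inner(big[row, :], big[row, :], big[row, :])
    return acc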
I would like to have a cython array of a cdef class:
cdef class Child:
    cdef int i

    def do(self):
        self.i += 1

cdef class Mother:
    cdef Child[:] array_of_child

    def __init__(self):
        for i in range(100):
            self.array_of_child[i] = Child()
The answer is no - it is not really possible in a useful way: newsgroup post of essentially the same question
It wouldn't be possible to have a direct array (allocated in a single chunk) of Childs. Partly because, if somewhere else ever gets a reference to a Child in the array, that Child has to be kept alive (but not the whole array) which wouldn't be possible to ensure if they were all allocated in the same chunk of memory. Additionally, were the array to be resized (if this is a requirement) then it would invalidate any other references to the objects within the array.
Therefore you're left with having an array of pointers to Child. Such a structure would be fine, but internally would look almost exactly like a Python list (so there's really no benefit to doing something more complicated in Cython...).
There are a few sensible workarounds:
The workaround suggested in the newsgroup post is just to use a python list. You could also use a numpy array with dtype=object. If you need to access a cdef function of the class you can do a cast first:
cdef Child c = <Child?>a[0]  # omit the ? if you don't want
                             # the overhead of checking the type
c.some_cdef_function()
Internally both these options are stored as a C array of PyObject pointers to your Child objects and so are not as inefficient as you might assume.
A further possibility might be to store your data as a C struct (cdef struct ChildStruct: ....) which can readily be stored as an array. When you need a Python interface to that struct you can either define Child so it contains a copy of ChildStruct (but modifications won't propagate back to your original array), or a pointer to ChildStruct (but then you need to be careful to ensure that the memory is not freed while the Child pointing to it is alive).
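For instance, a minimal sketch of the struct option (the field and class names are placeholders, not from the question) might look like this:
from libc.stdlib cimport calloc, free

cdef struct ChildStruct:
    int i

cdef class Mother:
    cdef ChildStruct* children
    cdef int n

    def __cinit__(self, int n):
        # one contiguous allocation holding all the children
        self.n = n
        self.children = <ChildStruct*>calloc(n, sizeof(ChildStruct))
        if self.children == NULL:
            raise MemoryError()

    def __dealloc__(self):
        free(self.children)

    def do(self, int idx):
        # operate on the struct in place; no Python objects involved
        self.children[idx].i += 1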
You could use a Numpy structured array - this is pretty similar to using an array of C structs except Numpy handles the memory, and provides a Python interface.
The memoryview syntax in your question is valid: cdef Child[:] array_of_child. This can be initialized from a numpy array of dtype object:
array_of_child = np.array([Child() for i in range(100)])
In terms of data-structure, this is an array of pointers (i.e. the same as a Python list, but can be multi-dimensional). It avoids the need for <Child> casting. The important thing it doesn't do is any kind of type-checking - if you feed an object that isn't Child into the array then it won't notice (because the underlying dtype is object), but will give nonsense answers or segmentation faults.
In my view this approach gives you a false sense of security about two things: first that you have made a more efficient data structure (you haven't, it's basically the same as a list); second that you have any kind of type safety. However, it does exist. (If you want to use memoryviews, e.g. for multi-dimensional arrays, it would probably be better to use a memoryview of type object - this is honest about the underlying dtype)
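A hedged sketch of that object-memoryview alternative (reusing Child from the question; the function name is illustrative) could look like:
import numpy as np

def make_children():
    cdef object[:] array_of_child = np.array([Child() for i in range(100)], dtype=object)

    # cast when you need the typed interface; keep the ? so the type is actually checked
    cdef Child c = <Child?>array_of_child[0]
    c.do()
    return array_of_child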
I am aware of this question, but I was looking for a simpler way to generate 2d memoryviews from C arrays. Since I am a C and Cython newbie, could someone please explain why something like
cdef int[:, :] get_zeros(int d):
    # get 2-row array of zeros with d as second dimension
    cdef int i
    cdef int *arr = <int *> malloc(sizeof(int) * d)
    for i in range(d):
        arr[i] = 0
    cdef int[:, :] arr_view
    arr_view[0, :] = <int[:d]>arr
    arr_view[1, :] = <int[:d]>arr
    return arr_view
won't work?
When compiling it I get Cannot assign type 'int[::1]' to 'int' as the error. Does this mean that the 2d memview is collapsed by the first assign statement to 1d, or is it because memoryviews need contiguous blocks, etc.?
It's obviously quite hard to "explain why something [...] won't work", because ultimately it's just a design decision that could have been taken differently. But:
Cython memoryviews are designed to be pretty dumb. All they do is provide some nice syntax to access the memory of something that implements the Python buffer protocol, and then have a tiny bit of additional syntax to let you do things like get a 1D memoryview of a pointer.
Additionally, the memoryview as a whole wraps something. When you create cdef int[:, :] arr_view it's invalid until you do arr_view = something. Attempts to assign to part of it are nonsense, since (a) it'd delegate the assignment to the thing it wraps using the buffer protocol and (b) exactly how the assignment would work would depend on what format of buffer protocol you were wrapping. What you've done might be valid if wrapping an "indirect" buffer protocol object but would make no sense if wrapping a contiguous array. Since arr_view could be wrapping either the Cython compiler has to treat it as an error.
The question you link to implements the buffer protocol and so is the correct way to implement this kind of array. What you're attempting to do is to take the extra syntax that gives a 1D memoryview from a pointer and force that into part of a 2D memoryview in the vague hope that this might work. This requires a lot of logic that goes well beyond the scope of what a Cython memoryview was designed to do.
There's probably a couple of additional points worth making:
Memoryviews of pointers don't handle freeing of the pointers (since it'd be pretty much impossible for them to second-guess what you want). You have to handle this logic yourself. Your current design would leak memory, if it worked. In the design you linked to, the wrapping class could implement this in __dealloc__ (although it isn't shown in that answer), which is much better.
My personal view is that "ragged arrays" (2D arrays of pointers to pointers) are awful. They require a lot of allocation and deallocation. There's lots of opportunity to half-initialize them. Access to them requires a couple of levels of indirection and so is slow. The only thing going for them is that they provide an arr[idx1][idx2] syntax in C. In general I much prefer Numpy's approach of allocating a 1D array and using shape/strides to work out where to index. (Obviously if you're wrapping an existing library then it may not be your choice...)
In addition to the wonderful answer @DavidW has provided, I would like to add some more info. In your included code, I see that you are malloc-ing an array of ints and then zeroing out the contents in a for-loop. A more convenient way of accomplishing this is to use C's calloc function instead, which guarantees a pointer to zeroed memory and would not require a for loop afterwards.
Additionally, you could create a single int * that points to an "array" of data that is calloced to a total size of 2 * d * sizeof(int). This would ensure that both of the "rows" of data are contiguous with each other instead of separate and ragged. This could then be cast directly to a 2d memoryview.
As promised in the comments, here is what that conversion code could look like (with calloc use included):
from libc.stdlib cimport calloc

cdef int[:, :] get_zeros(int d):
    cdef int *arr = <int *>calloc(2 * d, sizeof(int))
    cdef int[:, :] arr_view = <int[:2, :d]>arr
    return arr_view
There also appears to be a calloc equivalent in the python c-api per the docs if you want to try it out. However, it does not appear to be wrapped in cython's mem.pxd module, which is why you were likely not able to find it. You could declare a similar extern block in your code to wrap it like the other functions included in that link.
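A hedged sketch of such an extern block (assuming the PyMem_Calloc / PyMem_Free names from the C-API docs) would be:
cdef extern from "Python.h":
    void* PyMem_Calloc(size_t nelem, size_t elsize)
    void PyMem_Free(void *p)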
And here is a bonus link if you want to know more about writing an allocator to dole out memory from a large block if you go the pre-allocation route (i.e. what PyMem_* functions likely do behind the scenes, but more tunable and under your control for your specific use case).
numba.jit() allows entering a type signature, but I can't figure out what the signature for a zero-dimensional array is.
For example:
numba.jit('void(float32, float32[:])')
says the function return is void and the input arguments are a float32 scalar and a float32 1-D array.
But what if instead of a scalar I want to pass in a 0-dimensional array? What's the type signature? I tried the obvious float32[] but that didn't seem to work.
In case you are wondering how one gets a 0-D array in numpy you do it like this:
a = numpy.array(2)
which is different than
a = numpy.array([2])
the latter is a 1-D array.
This is how you can do it using numba.types.Array:
import numba as nb
import numpy as np

#        |---------0d int array---------|
@nb.njit(nb.types.Array(nb.int64, 0, "C")())
def func():
    return np.array(2)
Here I specified that the returned value will be a C-contiguous int64 array with 0 dimensions. Adjust these as needed.
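For the original question (a 0-d array as an argument rather than a return value), a hedged sketch using the same Array type could be (the function name is illustrative):
import numba as nb
import numpy as np

@nb.njit(nb.float32(nb.types.Array(nb.float32, 0, "C")))
def read_scalar(a):
    return a.item()  # .item() extracts the single value from the 0-d array

read_scalar(np.array(2.0, dtype=np.float32))  # -> 2.0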
In my experience there is rarely a use-case (see "Benefit and Limitations of Ahead-of-Time compilation") for explicitly typed functions in numba - except for compilation times or in case one needs to avoid numba using already inferred types when it should compile a new function. So, personally, I wouldn't use these signatures.
I have a Cython function that takes a 2d nd.array (numpy array) of integers and returns a 1d numpy array whose length is the same as the input 2d array.
import numpy as np
cimport numpy as np
np.import_array()
cimport cython

def func(np.ndarray[np.float_t, ndim=2] input_arr):
    cdef np.ndarray[np.float_t, ndim=1] new_arr = ...
    # do stuff
    return new_arr
In another loop in the program, I want to call func, but pass it a 2d array that is created dynamically from another 2d array. Right now I have:
my_2d_numpy_array = np.array([[0.5, 0.1], [0.1, 10]])  # assume this is defined
cdef int N = 10000
cdef int k

for j in xrange(N):
    # find some element k of interest
    # create a 2d array on the fly containing just the k-th row, and pass it to func()
    func(np.array([my_2d_numpy_array[k]], dtype=float))  # KEY LINE
This works, but I think that the call to np.array each time inside the loop creates a huge overhead, because it goes back to Python. Since func only reads the array and doesn't modify it, how can I just pass it a view of the array as a pointer, without making a new array by going back to Python? I'm only interested in pulling out the kth row of my_2d_numpy_array and passing that to func()
Update: A related question: if I am using an nd.array inside the loop but don't need the full functionality of nd.array in func, can I make func instead take something like a static C array and somehow treat the nd.array as that? Will that save costs? Presumably then you don't have to pass an object to func (nd.array is an object)
You want to use Cython memory views.
They are designed for passing array slices between functions that are a part of the same Cython module.
You may need to inline the function within your Cython module to get the full performance benefit, but that isn't always necessary.
You can take a look at the documentation.
I recently wrote a rather lengthy answer to another question that looks in to when memory views should be used.
If you want a more detailed examination of why slicing works well with memory views, have a look at this blog post.
If you don't use memory views, the slicing involving NumPy arrays still involves a Python call and is not performed in C.
For your specific case, here are a few thoughts:
If you are passing array slices between functions in your Cython module you should be able to use a memory view to pass the slices.
This approach does depend on compile-time optimizations, so if you need to pass an array between two functions that are compiled at separate times, you will have to use a pointer to pass data between functions.
This will mean doing some careful pointer arithmetic, but it should still work.
If you need to do slicing and use NumPy functions, you may just end up having to use NumPy arrays, but it could be worth trying to use NumPy arrays and memory views that view the same data.
That way you will be able to pass slices as memory views, while only having to create NumPy arrays when you really need them.
Also, I would recommend making the function func a C-function so that you don't have to go through the overhead of calling a Python function when you call it.
You can do that by using the cdef or cpdef keyword to declare it.
Use cdef if you don't need to call it from outside the module.
Use cpdef if you want a C function and a corresponding Python wrapper that is accessible to Python.
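As a rough sketch of what that could look like (row_sum here is a made-up stand-in for func):
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cdef double row_sum(double[:] row):
    # typed memoryview argument: no Python call overhead when called from Cython code
    cdef Py_ssize_t i
    cdef double total = 0.0
    for i in range(row.shape[0]):
        total += row[i]
    return total

# in the calling loop: slicing a 2D memoryview yields a 1D view without copying
cdef double[:, :] data = my_2d_numpy_array
result = row_sum(data[k, :])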
func(my_2d_numpy_array[k:k+1])
Slicing my_2d_numpy_array instead of indexing it gets you the view you wanted with the shape you wanted.
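A quick illustration of the shape difference (both expressions are views, not copies):
import numpy as np

a = np.array([[0.5, 0.1], [0.1, 10]])
print(a[1].shape)    # (2,)   - indexing drops a dimension
print(a[1:2].shape)  # (1, 2) - slicing keeps it 2-D, matching func's ndim=2 argument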