numba.jit() accepts a type signature, but I can't figure out what the signature for a zero-dimensional array is.
For example:
numba.jit('void(float32, float32[:])')
says the function return is void and the input arguments are float32 scalar and float32 1-D array.
But what if, instead of a scalar, I want to pass in a 0-dimensional array? What's the type signature? I tried the obvious float32[], but that didn't seem to work.
In case you are wondering how one gets a 0-D array in numpy you do it like this:
a = numpy.array(2)
which is different than
a = numpy.array([2])
the latter is a 1-D array.
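You can confirm the difference via ndim:
>>> import numpy
>>> numpy.array(2).ndim
0
>>> numpy.array([2]).ndim
1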
This is how you can do it using numba.types.Array:
import numba as nb
import numpy as np
#        |---------0d int array---------|
@nb.njit(nb.types.Array(nb.int64, 0, "C")())
def func():
    return np.array(2)
Here I declared that the returned value will be a C-contiguous int64 array with 0 dimensions. Adjust these as needed.
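For the original question of passing a 0-d array in as an argument, the same nb.types.Array type should also work inside the argument list. A minimal sketch (my own, just building the signature object rather than compiling anything):

import numba as nb

# 0-d, C-contiguous float32 array type
zero_d_f32 = nb.types.Array(nb.float32, 0, "C")
# the signature from the question, with the scalar swapped for the 0-d array
sig = nb.void(nb.float32, zero_d_f32)
print(sig)  # a void signature taking a float32 scalar and a 0-d float32 array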
In my experience there is rarely a use-case (see "Benefit and Limitations of Ahead-of-Time compilation") for explicitly typed functions in numba, except to reduce compilation times or when you need to keep numba from reusing already-inferred types where it should compile a new function. So, personally, I wouldn't use these signatures.
Can mypy check that a NumPy array of floats is passed as a function argument? For the code below mypy is silent when an array of integers or booleans is passed.
import numpy as np
import numpy.typing as npt
def half(x: npt.NDArray[np.cfloat]):
    return x/2
print(half(np.full(4,2.1)))
print(half(np.full(4,6))) # want mypy to complain about this
print(half(np.full(4,True))) # want mypy to complain about this
Mypy can check the types of values passed as function arguments, but it currently has limited support for NumPy arrays. You can use the numpy.typing.NDArray type hint, as in your code, to declare that half takes a NumPy array of complex floats. However, mypy will not raise an error when an array of integers or booleans is passed: np.full (without an explicit dtype) is typed to return NDArray[Any], and an Any dtype is compatible with every dtype parameter, so no mismatch is detected. To ensure that only arrays of complex floats are passed to half, you will need to write an explicit runtime check within the function to validate the input.
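A minimal sketch of such a runtime check (my own suggestion, not part of the original answer; np.issubdtype against np.complexfloating is one way to accept exactly the complex floating dtypes):

import numpy as np
import numpy.typing as npt

def half(x: npt.NDArray[np.cfloat]):
    # runtime guard, since mypy cannot see the dtype of np.full(...)
    if not np.issubdtype(x.dtype, np.complexfloating):
        raise TypeError(f"expected a complex-float array, got dtype {x.dtype}")
    return x / 2

print(half(np.full(4, 2.1 + 0j)))  # ok: complex128
half(np.full(4, 6))                # now fails loudly with a TypeError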
I tried giving numba a go, as I was told it works very well for numerical/scientific computing applications. However, it seems that I've already run into a problem in the following scenario:
I have a function that computes a 12x12 Jacobian matrix, represented by a numpy array, and then returns this Jacobian. However, when I attempt to decorate said function with @numba.njit, I get the following error:
This is not usually a problem with Numba itself but instead often caused by
the use of unsupported features or an issue in resolving types.
As a basic example of my usage, the following code tries to declare a 12x12 numpy zero matrix, but it fails:
import numpy as np
import numba
@numba.njit
def numpy_matrix_test():
    A = np.zeros([12,12])
    return A

A_out = numpy_matrix_test()
print(A_out)
Since I assumed declaring numpy arrays in such a way was common enough that numba would be able to handle them, I'm quite surprised.
The assumption that the functions called inside a numba-jitted function are the same functions you get outside of one is understandable, but wrong. In reality, numba delegates behind the scenes to its own implementations instead of using the "real" NumPy functions. So it isn't really np.zeros that is called in the jitted function; it's numba's own version. Some differences between Numba and NumPy are therefore unavoidable.
For example, you cannot use a list for the shape; it has to be a tuple (lists and arrays produce the exception you've encountered). So the correct syntax would be:
@numba.njit
def numpy_matrix_test():
    A = np.zeros((12, 12))
    return A
Something similar applies to the dtype argument. It has to be a real NumPy/numba type; a Python type cannot be used:
@numba.njit
def numpy_matrix_test():
    A = np.zeros((12, 12), dtype=int)  # to make it work, use numba.int64 instead of int here
    return A
Even if "plain" NumPy allows it:
np.zeros((12, 12), dtype=int)
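Putting both fixes together, a version like the following should compile (a sketch; I've used np.int64 for the dtype here, and numba.int64 works as well):

import numba
import numpy as np

@numba.njit
def numpy_matrix_test():
    # tuple for the shape, NumPy/numba type for the dtype
    return np.zeros((12, 12), dtype=np.int64)

print(numpy_matrix_test().dtype)  # int64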
Do you perhaps mean numpy.zeros((12,12)), because you want a shape of 12 rows and 12 columns?
Numpy Zeros reference
Some background for the question:
I'm trying to optimise a custom neural network code.
It relies heavily on loops and I decided to use cython to speed up the calculation.
I followed the usual online tips: declare all local variables with appropriate cdefs and switch off boundscheck and nonecheck. This barely gave me a 10% performance gain.
Well, my code relies on lots of class members. Therefore I decided to convert the entire class into a cdef class. Turns out that cython doesn't allow numpy ndarrays as types for class members. Instead one has to use memoryviews.
Unfortunately the two types seem to be vastly incompatible.
I already ran into this problem: Cython memoryview transpose: Typeerror
To sum it up: You can store an np.ndarray in a memoryview. You can transpose it and store the returned array in a memview. But not if that memview is a class member. Then you have to create an intermediate memview, store the result in that and assign the intermediate memview to the class member.
Here's the code (many thanks to DavidW):

cdef double[:,:,:,:] temporary_view_of_transpose

# no square brackets!
# temporary_view_of_transpose now "looks at" the memory allocated by transpose
temporary_view_of_transpose = out_image.transpose(1, 0, 2, 3)

# data is copied from temporary_view_of_transpose to self.y
self.y[...] = temporary_view_of_transpose  # (remembering that self.y must be the correct shape before this assignment)
Now I've got a new problem.
The code above is from the so-called "forward-pass". There is also a corresponding backward-pass, which does all the calculations backward (for analytical gradients).
This means that for the backward pass, I have to transpose the memoryview and store it in a numpy array:
cdef np.ndarray[DTYPE_t, ndim=4] d_out_image = self.d_y.transpose(1, 0, 2, 3)
d_y has to be a class member, therefore it has to be a memoryview. Memoryviews don't allow transposing. They have a .T method, but that doesn't help me.
Actual Question:
How do I correctly store a numpy array as a class member of a cdef class?
If the answer is :"as a memoryview", how do I transpose a memoryview ?
I think the best answer is: "you store the numpy array as an untyped Python object".
cdef class C:
    cdef object array

    def example_function(self):
        # if you want to use the fast Cython array indexing in a function
        # you can do:
        cdef np.ndarray[np.float64_t, ndim=4] self_array = self.array
        # or
        cdef np.float64_t[:,:,:,:] self_array2 = self.array
        # note that neither of these are copies - they're references
        # to exactly the same array and so if you modify one it'll modify
        # self.array too

    def function2(self):
        return self.array.transpose(1, 0, 2, 3)  # works fine!
The small cost to doing it this way is that there's a bit of type-checking at the start of example_function to check that it is actually a 4D numpy array with the correct dtype. Provided you do a decent amount of work in the function, that shouldn't matter.
As an alternative (if you decide you really want to store them as memoryviews) you could use np.asarray to convert it back to a numpy array without making a copy (i.e. they share data).
e.g.
cdef np.ndarray[DTYPE_t, ndim=4] d_out_image = np.asarray(self.d_y).transpose(1, 0, 2, 3)
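You can see the no-copy behaviour from plain Python, using a built-in memoryview as a stand-in for a Cython typed memoryview (my own illustration):

import numpy as np

a = np.zeros((2, 3))
b = np.asarray(memoryview(a))  # no copy: b views a's buffer
b[0, 0] = 42.0
print(a[0, 0])  # 42.0 -- both names share the same memory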
I have a Cython function that takes a 2d np.ndarray (NumPy array) of integers and returns a 1d NumPy array whose length matches that of the input 2d array.
import numpy as np
cimport numpy as np
np.import_array()
cimport cython
def func(np.ndarray[np.float_t, ndim=2] input_arr):
    cdef np.ndarray[np.float_t, ndim=1] new_arr = ...
    # do stuff
    return new_arr
In another loop in the program, I want to call func, but pass it a 2d array that is created dynamically from another 2d array. Right now I have:
my_2d_numpy_array = np.array([[0.5, 0.1], [0.1, 10]]) # assume this is defined
cdef int N = 10000
cdef int k
for j in xrange(N):
    # find some element k of interest
    # create a 2d array on the fly containing just the k-th row and pass it to func()
    func(np.array([my_2d_numpy_array[k]], dtype=float))  # KEY LINE
This works, but I think that the call to np.array each time inside the loop creates a huge overhead, because it goes back to Python. Since func only reads the array and doesn't modify it, how can I just pass it a view of the array as a pointer, without making a new array by going back to Python? I'm only interested in pulling out the kth row of my_2d_numpy_array and passing that to func()
Update: A related question: if I am using an np.ndarray inside the loop but don't need the full functionality of np.ndarray in func, can I make func instead take something like a static C array and somehow treat the np.ndarray as that? Will that save costs? Presumably then you don't have to pass an object to func (an ndarray is an object).
You want to use Cython memory views.
They are designed for passing array slices between functions that are a part of the same Cython module.
You may need to inline the function within your Cython module to get the full performance benefit, but that isn't always necessary.
You can take a look at the documentation.
I recently wrote a rather lengthy answer to another question that looks into when memory views should be used.
If you want a more detailed examination of why slicing works well with memory views, have a look at this blog post.
If you don't use memory views, the slicing involving NumPy arrays still involves a Python call and is not performed in C.
For your specific case, here are a few thoughts:
If you are passing array slices between functions in your Cython module you should be able to use a memory view to pass the slices.
This approach does depend on compile-time optimizations, so if you need to pass an array between two functions that are compiled at separate times, you will have to use a pointer to pass data between functions.
This will mean doing some careful pointer arithmetic, but it should still work.
If you need to do slicing and use NumPy functions, you may just end up having to use NumPy arrays, but it could be worth trying to use NumPy arrays and memory views that view the same data.
That way you will be able to pass slices as memory views, while only having to create NumPy arrays when you really need them.
Also, I would recommend making the function func a C-function so that you don't have to go through the overhead of calling a Python function when you call it.
You can do that by using the cdef or cpdef keyword to declare it.
Use cdef if you don't need to call it from outside the module.
Use cpdef if you want a C function and a corresponding Python wrapper that is accessible to Python.
func(my_2d_numpy_array[k:k+1])
Slicing my_2d_numpy_array instead of indexing it gets you the view you wanted with the shape you wanted.
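A quick illustration (mine, not from the original answer) of the shape difference and of the fact that the slice is a view:

import numpy as np

a = np.array([[0.5, 0.1], [0.1, 10]])
k = 1
print(a[k].shape)          # (2,)   -- indexing drops a dimension
print(a[k:k+1].shape)      # (1, 2) -- slicing keeps the 2-d shape
print(a[k:k+1].base is a)  # True   -- the slice is a view, no new data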
I'm working on a project in Python requiring a lot of numerical array calculations. Unfortunately (or fortunately, depending on your POV), I'm very new to Python, but have been doing MATLAB and Octave programming (APL before that) for years. I'm very used to having every variable automatically typed to a matrix float, and still getting used to checking input types.
In many of my functions, I require the input S to be a numpy.ndarray of size (n,p), so I have to both test that type(S) is numpy.ndarray and get the values (n,p) = numpy.shape(S). One potential problem is that the input could be a list/tuple/int/etc.; another is that the input could be an array of shape (), i.e. S.ndim == 0. It occurred to me that I could simultaneously test the variable type, fix the S.ndim == 0 problem, and then get my dimensions like this:
# first simultaneously test for ndarray and get proper dimensions
try:
    if (S.ndim == 0):
        S = S.copy(); S.shape = (1,1);
    # define dimensions p, and p2
    (p,p2) = numpy.shape(S);
except AttributeError:  # got here because input is not something array-like
    raise AttributeError("blah blah blah");
Though it works, I'm wondering if this is a valid thing to do? The docstring for ndim says
If it is not already an ndarray, a conversion is
attempted.
and we surely know that numpy can easily convert an int/tuple/list to an array, so I'm confused why an AttributeError is being raised for these types of inputs, when numpy should be doing this:
numpy.array(S).ndim;
which should work.
When doing input validation for NumPy code, I always use np.asarray:
>>> np.asarray(np.array([1,2,3]))
array([1, 2, 3])
>>> np.asarray([1,2,3])
array([1, 2, 3])
>>> np.asarray((1,2,3))
array([1, 2, 3])
>>> np.asarray(1)
array(1)
>>> np.asarray(1).shape
()
This function has the nice feature that it only copies data when necessary; if the input is already an ndarray, the data is left in-place (only the type may be changed, because it also gets rid of that pesky np.matrix).
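For example, you can verify that nothing is copied for an existing ndarray:

>>> a = np.array([1, 2, 3])
>>> np.asarray(a) is a
True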
The docstring for ndim says
That's the docstring for the function np.ndim, not the ndim attribute, which non-NumPy objects don't have. You could use that function, but the effect would be that the data might be copied twice, so instead do:
S = np.asarray(S)
(p, p2) = S.shape
This will raise a ValueError if S.ndim != 2.
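For instance (error message from Python 3):

>>> S = np.asarray(5)   # 0-d array, S.shape == ()
>>> (p, p2) = S.shape
Traceback (most recent call last):
  ...
ValueError: not enough values to unpack (expected 2, got 0)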
[Final note: you don't need ; in Python if you just follow the indentation rules. In fact, Python programmers eschew the semicolon.]
Given the comments to @larsmans' answer, you could try:
if not isinstance(S, np.ndarray):
    raise TypeError("Input not a ndarray")
if S.ndim == 0:
    S = np.reshape(S, (1,1))
(p, p2) = S.shape
First, you check explicitly whether S is a (subclass of) ndarray. Then, you use np.reshape to reshape your data if needed (note that np.reshape returns a view when it can, not a copy). At last, you get the dimensions.
Note that in most cases, the np functions will first try to access the corresponding method of an ndarray, then attempt to convert the input to an ndarray (sometimes keeping it a subclass, as in np.asanyarray, sometimes not, as in np.asarray). In other words, it's generally more efficient to use the method rather than the function: that's why we're using S.shape and not np.shape(S).
Another point: np.asarray, np.asanyarray, np.atleast_1d... are all particular cases of the more generic function np.array. For example, asarray sets the optional copy argument of array to False, asanyarray does the same and also sets subok=True, atleast_1d sets ndmin=1, atleast_2d sets ndmin=2... In other words, it's always possible to use np.array with the appropriate arguments instead. But as mentioned in some comments, it's a matter of style: shortcuts can often improve readability, which is always an objective to keep in mind.
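A quick illustration of those special cases, using the (now discouraged) np.matrix as the classic ndarray subclass:

>>> m = np.matrix([[1, 2]])
>>> type(np.asarray(m))      # subok=False: downcast to plain ndarray
<class 'numpy.ndarray'>
>>> type(np.asanyarray(m))   # subok=True: the subclass survives
<class 'numpy.matrix'>
>>> np.atleast_2d(np.array(5)).shape
(1, 1)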
In any case, when you use np.array(..., copy=True), you're explicitly asking for a copy of your initial data, a bit like doing list([...]). Even if nothing else changed, your data will be copied. That has the advantages of its drawbacks (as we say in French): you could, for example, change the order from row-first C to column-first F. But anyway, you get the copy you wanted.
With np.array(input, copy=False), a copy is avoided whenever possible: if input was already an ndarray (with a compatible dtype), the result points to the same block of memory as input, so no memory is wasted; if it wasn't, a new array is created "from scratch". The interesting case is of course when input was already an ndarray.
Using this new array in a function may or may not change the original input, depending on the function. You have to check the documentation of the function you want to use to see whether it returns a copy or not. The NumPy developers try hard to limit unnecessary copies (following the Python example), but sometimes it can't be avoided. The documentation should tell explicitly what happens, if it doesn't or it's unclear, please mention it.
np.array(...) may raise some exceptions if something goes awry. For example, trying to use dtype=float with an input like ["STRING", 1] will raise a ValueError. However, I must admit I can't remember which exceptions arise in which cases; please edit this post accordingly.
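For instance (the exact message varies with the NumPy and Python versions):

>>> np.array(["STRING", 1], dtype=float)
Traceback (most recent call last):
  ...
ValueError: could not convert string to float: 'STRING'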
Welcome to Stack Overflow. This comes down to almost a style choice, but the most common way I've seen to deal with this kind of situation is to convert the input to an array. NumPy provides some useful tools for this. numpy.asarray has already been mentioned, but here are a few more: numpy.atleast_1d is similar to asarray, but reshapes () arrays to be (1,); numpy.atleast_2d is the same as above but reshapes 0d and 1d arrays to be 2d, i.e. (3,) to (1, 3). The reason we convert "array_like" inputs to arrays is partly just because we're lazy; for example, sometimes it can be easier to write foo([1, 2, 3]) than foo(numpy.array([1, 2, 3])). But this is also the design choice made within numpy itself. Notice that the following works:
>>> numpy.mean([1., 2., 3.])
2.0
In the docs for numpy.mean we can see that the argument a should be "array_like".
Parameters
----------
a : array_like
Array containing numbers whose mean is desired. If `a` is not an
array, a conversion is attempted.
That being said, there are situations when you want to only accept arrays as arguments and not all "array_like" types.