Calling a numba.guvectorize'ed function inside of another

I'm trying to call a vectorized function (numba.guvectorize) inside another guvectorize'd function:
import numpy as np
from numba import guvectorize
@guvectorize("(float64[:], float64[:])", "(n) -> (n)")
def pregain(device_gain, out):
    gain = np.cumsum(device_gain)[:-1]
    out[:] = np.concatenate((np.array([0.0]), gain))
@guvectorize("(float64[:], float64[:])", "(n) -> (n)")
def call_pregain(device_gain, out):
    pgain = pregain(device_gain)
    out[:] = pgain
call_pregain(np.array([12, -1.5, 8, -1, 2, -0.8, 15]))
But I get this error and numba falls back to object mode:
NumbaWarning: Compilation is falling back to object mode WITHOUT looplifting enabled because Function "call_pregain" failed type inference due to: Untyped global name 'pregain': Cannot determine Numba type of <class 'numba.np.ufunc.gufunc.GUFunc'>
File "mvce.py", line 13:
def call_pregain(device_gain, out):
    pgain = pregain(device_gain)
    ^

Related

How do you use the user_data argument in scipy.LowLevelCallable in conjunction with scipy.ndimage.generic_filter?

I would like to use SciPy's LowLevelCallable utility to speed up my call to generic_filter with my own function, which takes two arrays as input parameters. In this working example they show how a regular call to generic_filter could look.
The callback for LowLevelCallable has a fixed signature:
int callback(double *buffer, npy_intp filter_size,
             double *return_value, void *user_data)
so the only way to pass the second array is via the user_data pointer. However, to create a carray object I need both the pointer to the array and its length. How can I modify my function wrapper to pass two objects to my function?
from numba import cfunc, carray, jit
from numba.types import intc, CPointer, float64, intp, voidptr
from scipy import LowLevelCallable, ndimage
import numpy as np
image = np.random.random((128, 128))
footprint = np.array([[0, 1, 0],
                      [1, 1, 1],
                      [0, 1, 0]], dtype=bool)
def jit_filter_function(filter_function):
    jitted_function = jit(filter_function, nopython=True)
    @cfunc(intc(CPointer(float64), intp, CPointer(float64), voidptr))
    def wrapped(values_ptr, len_values, result, data):
        values = carray(values_ptr, (len_values,), dtype=float64)
        more = carray(...)  # what do I put here?
        result[0] = jitted_function(values, more)
        return 1
    return LowLevelCallable(wrapped.ctypes)
@jit_filter_function
def fmin(values: np.ndarray, more: np.ndarray):
    result = np.inf
    for v in values:
        if v < result:
            result = v
    return result
ndimage.generic_filter(image, fmin, footprint=footprint, extra_arguments=(np.array([1, 2, 3]),))

TypingError when using numpy.stack() with numba njit

The original issue is connected to using np.linspace with arrays as start and stop parameters, though right now I'm having issues with the workaround I came up with.
Take the following:
from numba import njit
import numpy as np
@njit
def f1():
    start = np.array([0.1, 1.0], np.float32)
    stop = np.array([1.0, 10.0], np.float32)
    return np.linspace(start, stop, 10)
f1()
This raises an error: although Numba documents np.linspace as supporting "only the 3-argument form", what that actually means is "the 3-argument form with scalar values for start and stop".
So I came up with the following workaround:
import numpy as np
from numba import njit
@njit
def f2():
    start = np.array([0.1, 1.0], np.float32)
    stop = np.array([1.0, 10.0], np.float32)
    pts_0 = np.linspace(start[0], stop[0], 10).astype(np.float32)  # works
    pts_1 = np.linspace(start[1], stop[1], 10).astype(np.float32)  # works
    return np.stack([pts_0, pts_1]).T  # error
which raises this error:
---------------------------------------------------------------------------
TypingError Traceback (most recent call last)
c:\Users\X\Desktop\X\data_analysis.ipynb Cell 46' in <cell line: 18>()
15 pts_1 = np.linspace(start[1], stop[1], 10).astype(np.float32)
16 return np.stack([pts_0, pts_1]).T
---> 18 r = f2()
File c:\Users\X\miniconda3\envs\X\lib\site-packages\numba\core\dispatcher.py:468, in _DispatcherBase._compile_for_args(self, *args, **kws)
464 msg = (f"{str(e).rstrip()} \n\nThis error may have been caused "
465 f"by the following argument(s):\n{args_str}\n")
466 e.patch_message(msg)
--> 468 error_rewrite(e, 'typing')
469 except errors.UnsupportedError as e:
470 # Something unsupported is present in the user code, add help info
471 error_rewrite(e, 'unsupported_error')
File c:\Users\X\miniconda3\envs\X\lib\site-packages\numba\core\dispatcher.py:409, in _DispatcherBase._compile_for_args.<locals>.error_rewrite(e, issue_type)
407 raise e
408 else:
--> 409 raise e.with_traceback(None)
TypingError: Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<function stack at 0x00000186F280CAF0>) found for signature:
>>> stack(list(array(float32, 1d, C))<iv=None>)
Again, according to the documentation, np.stack is supported (with no side comments on this one either).
What am I missing?

np.stack is supported, but so far Numba expects a tuple of arrays rather than a list. Here is the fixed code:
@njit
def f2():
    start = np.array([0.1, 1.0], np.float32)
    stop = np.array([1.0, 10.0], np.float32)
    pts_0 = np.linspace(start[0], stop[0], 10).astype(np.float32)  # works
    pts_1 = np.linspace(start[1], stop[1], 10).astype(np.float32)  # works
    return np.stack((pts_0, pts_1)).T  # works
By the way, note that np.stack((pts_0, pts_1)).T is not very efficient, since it creates temporary arrays and returns a non-contiguous view. Since the purpose of using Numba is to speed up code, consider using basic loops, which should be faster here. The same applies to astype(np.float32): a loop can cast the values in place. Memory accesses and allocations are expensive, and this is often what makes NumPy code slow (along with the lack of special-purpose functions). Such memory-bound operations are expected to become relatively more expensive in the future (for more information, read about the "memory wall"), so it is best to avoid them.
Here is a significantly faster version with basic loops:
@njit
def f2():
    start1, start2 = np.float32(0.1), np.float32(1.0)
    stop1, stop2 = np.float32(1.0), np.float32(10.0)
    steps = 10
    delta = np.float32(1 / (steps - 1))
    res = np.empty((steps, 2), dtype=np.float32)
    for i in range(steps):
        res[i, 0] = start1 + (stop1 - start1) * (delta * i)
        res[i, 1] = start2 + (stop2 - start2) * (delta * i)
    return res
Note that results can be slightly different due to 32-bit FP rounding.
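To make the rounding note concrete, the sketch below runs both variants in plain NumPy and compares them with a tolerance rather than exact equality. The @njit decorators are deliberately left off so that it runs without Numba, and the function names stack_version and loop_version are made up for this illustration.

```python
import numpy as np

def stack_version():
    # linspace per column, cast to float32, then stack and transpose.
    start = np.array([0.1, 1.0], np.float32)
    stop = np.array([1.0, 10.0], np.float32)
    pts_0 = np.linspace(start[0], stop[0], 10).astype(np.float32)
    pts_1 = np.linspace(start[1], stop[1], 10).astype(np.float32)
    return np.stack((pts_0, pts_1)).T

def loop_version():
    # Same points computed element by element into a preallocated array.
    start1, start2 = np.float32(0.1), np.float32(1.0)
    stop1, stop2 = np.float32(1.0), np.float32(10.0)
    steps = 10
    delta = np.float32(1 / (steps - 1))
    res = np.empty((steps, 2), dtype=np.float32)
    for i in range(steps):
        res[i, 0] = start1 + (stop1 - start1) * (delta * i)
        res[i, 1] = start2 + (stop2 - start2) * (delta * i)
    return res

a = stack_version()
b = loop_version()
print(a.shape, b.shape)                          # both are (10, 2)
print(np.allclose(a, b, rtol=1e-5, atol=1e-6))   # equal within float32 tolerance
print(a.flags["C_CONTIGUOUS"], b.flags["C_CONTIGUOUS"])
```

The last line shows the layout difference mentioned above: the transposed result is a non-contiguous view, while the preallocated array is C-contiguous.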

How to use supported numpy and math functions with CUDA in Python?

According to numba 0.51.2 documentation, CUDA Python supports several math functions. However, it doesn't work in the following kernel function:
@cuda.jit
def find_angle(angles):
    i, j = cuda.grid(2)
    if i < angles.shape[0] and j < angles.shape[1]:
        angles[i][j] = math.atan2(j, i)
The output:
numba.core.errors.LoweringError: Failed in nopython mode pipeline (step: nopython mode backend)
No definition for lowering <built-in function atan2>(int64, int64) -> float64
Am I using the function incorrectly?

The hint to the source of the problem is here:
No definition for lowering <built-in function atan2>(int64, int64) -> float64
The arguments returned by cuda.grid() (i.e. i, j which you are passing to atan2) are integer values because they are related to indexing.
Numba can't find a version of atan2 that takes two integer arguments and returns a floating-point value:
float64 = atan2(int64, int64)
One possible solution is to convert your atan2 input arguments to match the type that numba seems to want to return from that function, which is evidently float64:
from numba import cuda, float64
import numpy
import math
@cuda.jit
def find_angle(angles):
    i, j = cuda.grid(2)
    if i < angles.shape[0] and j < angles.shape[1]:
        angles[i][j] = math.atan2(float64(j), float64(i))
block_x = 32
block_y = 32
block = (block_x, block_y)
x = 256
y = 256
grid = (x//block_x, y//block_y)  # not for arbitrary x and y
angles = numpy.ones((x, y), numpy.float64)
find_angle[grid, block](angles)
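The comment "not for arbitrary x and y" refers to the floor division: if x or y is not a multiple of the block size, the last partial block is dropped. A common way around this is to round the block count up, which is safe here because the kernel already guards against out-of-range indices. The sketch below shows only the host-side arithmetic; the helper name launch_grid is made up for illustration.

```python
def launch_grid(shape, block):
    """Blocks per grid dimension, rounded up so every element gets a thread."""
    return tuple((n + b - 1) // b for n, b in zip(shape, block))

block = (32, 32)
print(launch_grid((256, 256), block))  # (8, 8), same as x // block_x here
print(launch_grid((300, 200), block))  # (10, 7), floor division would give (9, 6)
```

The resulting tuple can be used in place of grid in the kernel launch above.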

Device function throws nopython exception when its returning a list instead of an integer

A device function I have written always throws a nopython exception, and I do not understand why or where my error is.
Here is a small example that represents my problem.
I have the following device function that I call from a kernel:
@cuda.jit(device=True)
def sub_stuff(vec_a, vec_b):
    x0 = vec_a[0] - vec_b[0]
    x1 = vec_a[1] - vec_b[1]
    x2 = vec_a[2] - vec_b[2]
    return [x0, x1, x2]
The kernel that calls this function looks like this:
@cuda.jit
def kernel_via_polygon(vectors_a, vectors_b, result_array):
    pos = cuda.grid(1)
    if pos < vectors_a.size and pos < result_array.size:
        result_array[pos] = sub_stuff(vectors_a[pos], vectors_b[pos])
The three input arrays are the following:
vectors_a = np.arange(1, 10).reshape((3, 3))
vectors_b = np.arange(1, 10).reshape((3, 3))
result = np.zeros_like(vectors_a)
When I now call the kernel via kernel_via_polygon(vectors_a, vectors_b, result), a nopython error is thrown. If the device function returns only an integer value, the error does not occur.
Can someone explain to me where my mistake is?
Edit: FYI, as answered by talonmies, list construction isn't supported in device code. An alternative that helped me is using tuples, which are supported.

The source of your error is that the device function sub_stuff is attempting to create a list in GPU code, and that isn't supported.
About the best you can do would be something like this:
from numba import jit, guvectorize, int32, int64, float64
from numba import cuda
import numpy as np
import math
@cuda.jit(device=True)
def sub_stuff(vec_a, vec_b, result):
    # Write the element-wise difference into the caller-provided row.
    for i in range(vec_a.shape[0]):
        result[i] = vec_a[i] - vec_b[i]
@cuda.jit
def kernel_via_polygon(vectors_a, vectors_b, result_array):
    pos = cuda.grid(1)
    # Guard on the number of rows (shape[0]) rather than .size, so that
    # threads past the last row do not index out of bounds.
    if pos < vectors_a.shape[0] and pos < result_array.shape[0]:
        sub_stuff(vectors_a[pos], vectors_b[pos], result_array[pos])
vectors_a = 100 + np.arange(1, 10).reshape((3, 3))
vectors_b = np.arange(1, 10).reshape((3, 3))
result = np.zeros_like(vectors_a)
kernel_via_polygon[1,10](vectors_a, vectors_b, result)
print(result)
which uses a loop to iterate over the individual array slices and perform the subtraction between each element.
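As a host-side sanity check, the same row-by-row subtraction can be reproduced in plain NumPy and compared against whatever the kernel writes into result. This is only a CPU reference sketch using the input arrays from the example above; the name expected is made up here, and no GPU is involved.

```python
import numpy as np

vectors_a = 100 + np.arange(1, 10).reshape((3, 3))
vectors_b = np.arange(1, 10).reshape((3, 3))

expected = np.empty_like(vectors_a)
for pos in range(vectors_a.shape[0]):        # one "thread" per row
    for i in range(vectors_a.shape[1]):      # the loop inside sub_stuff
        expected[pos, i] = vectors_a[pos, i] - vectors_b[pos, i]

print(expected)  # every entry is 100 for these inputs
```

Comparing the kernel output with np.array_equal(result, expected) is one way to confirm that only the three valid rows were processed.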

Invalid operand types for '*' (double[::1]; double[::1])

I instantiated the memoryviews in my class as follows:
from __future__ import division
import numpy as np
import pylab as plt
cimport numpy as np
cimport cython
cdef class fit(object):
    cdef public double[::1] shear_g1, shear_g2, shear_z, halo_pos_arcsec
    cdef public double[:,::1] shear_pos_arcsec, source_zpdf
    cdef char* path
    cdef double omega_m, omega_l, h, sigma_g
    @cython.boundscheck(False)
    @cython.cdivision(True)
    @cython.wraparound(False)
    @cython.nonecheck(False)
    def __init__(self, shear_g1, shear_g2, shear_pos_arcsec, shear_z, halo_pos_arcsec, double halo_z, source_zpdf, sigma_g, path=None, omega_m=None, omega_l=None, h=None ):
        self.shear_g1 = shear_g1
        self.shear_g2 = shear_g2
        self.shear_pos_arcsec = shear_pos_arcsec
        self.shear_z = shear_z
        self.halo_pos_arcsec = halo_pos_arcsec
        self.halo_z = halo_z
        self.sigma_g = sigma_g
        self.shear_zpdf= source_zpdf
        if path is None:
            raise ValueError("Could not find a path to the file which contains the table of angular diameter distances")
        self.path = path
        self.n_model_evals = 0
        self.gaussian_prior_theta = [{'mean' : 14, 'std': 0.5}]
        if omega_m is None:
            self.omega_m=0.3
        if omega_l is None:
            self.omega_l=1-self.omega_m
        if h is None:
            self.h=1.
    def plot(self, g1, g2):
        emag=np.sqrt(g1**2+g2**2)
        ephi=0.5*np.arctan2(g2,g1)
        nuse=1
        quiver_scale=10
        plt.quiver(self.shear_pos_arcsec[::nuse,0], self.shear_pos_arcsec[::nuse,1], emag[::nuse]*np.cos(ephi)[::nuse], emag[::nuse]*np.sin(ephi)[::nuse], linewidths=0.001, headwidth=0., headlength=0., headaxislength=0., pivot='mid', color='r', label='original', scale=quiver_scale)
        plt.xlim([min(self.shear_pos_arcsec[::nuse,0]),max(self.shear_pos_arcsec[::nuse,0])])
        plt.ylim([min(self.shear_pos_arcsec[::nuse,1]),max(self.shear_pos_arcsec[::nuse,1])])
        plt.axis('equal')
    def plot_res(self, model_g1, model_g2, show=False):
        res1 , res2 = self.shear_g1 - model_g1, self.shear_g2 - model_g2
        emag_data=np.sqrt(self.shear_g1*self.shear_g1+self.shear_g2*self.shear_g1)
        ephi_data=0.5*np.arctan2(self.shear_g2,self.shear_g1)
        emag_res=np.sqrt(res1**2+res2**2)
        ephi_res=0.5*np.arctan2(res2,res1)
        emag_model=np.sqrt(model_g1**2+model_g2**2)
        ephi_model=0.5*np.arctan2(model_g2,model_g1)
        plt.figure()
        plt.subplot(3,1,1)
        self.plot(self.shear_g1,self.shear_g2)
        plt.subplot(3,1,2)
        self.plot(model_g1,model_g2)
        plt.subplot(3,1,3)
        self.plot(res1 , res2)
        if show:
            plt.show()
But I got this error message regarding the operations on the memoryviews.
Error compiling Cython file:
------------------------------------------------------------
...
def plot_res(self, model_g1, model_g2, show=False):
res1 , res2 = self.shear_g1 - model_g1, self.shear_g2 - model_g2
emag_data=np.sqrt(self.shear_g1*self.shear_g1+self.shear_g2*self.shear_g1)
^
------------------------------------------------------------
model.pyx:90:40: Invalid operand types for '*' (double[::1]; double[::1])
I am wondering how I should carry out mathematical operations on the memoryviews?

As mentioned in the comments, you can use np.asarray() to temporarily view your memoryviews as NumPy arrays without copying the data, at the cost of some overhead. A faster solution is to loop over the memoryviews and operate on them element-wise.
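The no-copy behaviour of np.asarray() can be illustrated in plain Python. Cython typed memoryviews cannot be created outside a .pyx file, so this sketch substitutes Python's built-in memoryview over an array.array buffer as a stand-in; that substitution, the variable names, and the sample values are all assumptions made for illustration.

```python
import array
import numpy as np

# Stand-ins for the double[::1] attributes shear_g1 and shear_g2.
buf_g1 = array.array("d", [0.1, -0.2, 0.3])
buf_g2 = array.array("d", [0.05, 0.4, -0.1])
view_g1 = memoryview(buf_g1)
view_g2 = memoryview(buf_g2)

# Wrap the views as ndarrays; arithmetic operators now work.
g1 = np.asarray(view_g1)
g2 = np.asarray(view_g2)
emag = np.sqrt(g1 * g1 + g2 * g2)
print(emag)

# Writing through the ndarray changes the original buffer: no copy was made.
g1[0] = 1.0
print(buf_g1[0])  # 1.0
```

Inside the Cython class, the same idea would mean wrapping self.shear_g1 and self.shear_g2 with np.asarray() before multiplying them.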
