I've run into a problem using the Numba jit decorator (@nb.jit). Here is the warning from Jupyter Notebook:
NumbaWarning:
Compilation is falling back to object mode WITH looplifting enabled because Function "get_nb_freq" failed type inference due to: No implementation of function Function(<function dot at 0x00000190AC399B80>) found for signature:
The complete output was attached as a screenshot.
Here is my code:
import numba
import numpy as np
Num_celltype = 4  # number of cell types
@numba.jit
def get_nb_freq(nb_count=None, onehot_ct=None):
    # nb_freq = onehot_ct.T @ nb_count
    nb_freq = np.dot(onehot_ct.T, nb_count)
    res = nb_freq / nb_freq.sum(axis=1).reshape(Num_celltype, -1)
    return res
## onehot_ct is an array with shape (921600, 4)
## nb_count is an array with the same shape as onehot_ct
Based on the shapes you mentioned, we can create the arrays as:
onehot_ct = np.random.rand(921600, 4)
nb_count = np.random.rand(921600, 4)
Your code then works correctly and returns something like:
[[0.25013102754197963 0.25021461207825463 0.2496806287276126 0.24997373165215303]
[0.2501574139037384 0.25018726649940737 0.24975108864220968 0.24990423095464467]
[0.25020550587624757 0.2501303498983212 0.24978335463279314 0.24988078959263807]
[0.2501855533482036 0.2500913419625523 0.24979681404573967 0.24992629064350436]]
So the code itself works, and the problem seems to be related to the array types, which Numba cannot infer. An explicit signature may help here, since it lets us declare the argument and return types of the function manually. Based on the error, I think the following signature will solve your issue:
@nb.jit("float64[:, ::1](float64[:, ::1], float32[:, ::1])")
def get_nb_freq(nb_count=None, onehot_ct=None):
    nb_freq = np.dot(onehot_ct.T, nb_count)
    res = nb_freq / nb_freq.sum(axis=1).reshape(4, -1)
    return res
But it still fails if you test it with get_nb_freq(nb_count.astype(np.float64), onehot_ct.astype(np.float32)), so another cause could be the mismatched dtypes passed to np.dot (Numba's np.dot expects both operands to have the same dtype). Casting onehot_ct to np.float64 inside the function solves the issue:
@nb.jit("float64[:, ::1](float64[:, ::1], float32[:, ::1])")
def get_nb_freq(nb_count, onehot_ct):
    # Cast to float64 so both np.dot operands share one dtype
    nb_freq = np.dot(onehot_ct.astype(np.float64).T, nb_count)
    res = nb_freq / nb_freq.sum(axis=1).reshape(4, -1)
    return res
With this correction it ran on my machine. I recommend writing Numba equivalents (like this one for np.dot) instead of using np.dot or …, since they can be much faster.
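Following that recommendation, here is a minimal sketch of a loop-based replacement for np.dot(onehot_ct.T, nb_count) plus the row normalisation. It is not the answerer's code: the name get_nb_freq_loops is hypothetical, and the try/except fallback only exists so the sketch still runs (without any speed-up) where Numba is not installed. Because it never calls np.dot, the two inputs do not need to share a dtype.

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: run as plain Python if Numba is not installed
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda f: f


@njit
def get_nb_freq_loops(nb_count, onehot_ct):
    n_rows, n_ct = onehot_ct.shape
    n_cols = nb_count.shape[1]
    nb_freq = np.zeros((n_ct, n_cols))  # float64 accumulator
    # Equivalent of onehot_ct.T @ nb_count, accumulated one input row at a time
    for r in range(n_rows):
        for c in range(n_ct):
            w = onehot_ct[r, c]
            if w != 0:  # one-hot rows are mostly zeros, so skip them
                for k in range(n_cols):
                    nb_freq[c, k] += w * nb_count[r, k]
    # Normalise each row to sum to 1 (assumes every cell type occurs at least once)
    for c in range(n_ct):
        s = 0.0
        for k in range(n_cols):
            s += nb_freq[c, k]
        for k in range(n_cols):
            nb_freq[c, k] /= s
    return nb_freq
```

Mixing float32 one-hot data with float64 counts is fine here, because each multiplication is promoted to float64 before it is added to the accumulator.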
Related
I'm trying out Numba, the Python package that is said to make my NumPy arrays super fast. I want to run my function in nopython mode. What it essentially does is take in a 20x20 array, assign random numbers to each of its elements, calculate its inverse matrix, then return it.
But here's the problem: when I initialize the array result with np.zeros(), my script crashes and gives me an error message 'overload of function zeros'.
Could someone kindly tell me what is going on? Much appreciated.
from numba import njit
import time
import numpy as np
import random
arr = np.zeros((20,20),dtype = float)
@njit
def aFunctionWithNumba (incomingArray):
    result = np.zeros(np.shape(incomingArray), dtype = float)
    for i in range(len(incomingArray[0])):
        for j in range(len(incomingArray[1])):
            incomingArray[i,j] = random.randrange(105150,1541586)
    result = np.linalg.inv(incomingArray)
    return result
t0 = time.time()
fastArray = aFunctionWithNumba(arr)
t1 = time.time()
s1 = t1 - t0
Here's the full error message:
Exception has occurred: TypingError
Failed in nopython mode pipeline (step: nopython frontend)
No implementation of function Function(<built-in function zeros>) found for signature:
>>> zeros(UniTuple(int64 x 2), dtype=Function(<class 'float'>))
There are 2 candidate implementations:
- Of which 2 did not match due to:
Overload of function 'zeros': File: numba\core\typing\npydecl.py: Line 511.
With argument(s): '(UniTuple(int64 x 2), dtype=Function(<class 'float'>))':
No match.
During: resolving callee type: Function(<built-in function zeros>)
During: typing of call at c:\Users\Eric\Desktop\testNumba.py (9)
File "testNumba.py", line 9:
def aFunctionWithNumba (incomingArray):
    result = np.zeros(np.shape(incomingArray), dtype = float)
    ^
File "C:\Users\Eric\Desktop\testNumba.py", line 25, in <module>
    fastArray = aFunctionWithNumba(arr)
The error
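You should use NumPy or Numba types inside JITted functions; here dtype=float (Python's built-in float) is the argument Numba could not match, as the error message shows.
If you change the following line, your code works: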
result = np.zeros(np.shape(incomingArray), dtype=np.float64)
But your code will be more generic using:
result = np.zeros(incomingArray.shape, dtype=incomingArray.dtype)
Or, even better:
result = np.zeros_like(incomingArray)
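A small sketch of the three accepted forms side by side inside one jitted function (make_buffers is a hypothetical name; the try/except fallback only lets the sketch run where Numba is not installed):

```python
import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: run as plain Python if Numba is not installed
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda f: f


@njit
def make_buffers(incomingArray):
    # A NumPy dtype object instead of the built-in float
    r1 = np.zeros(incomingArray.shape, dtype=np.float64)
    # Reuse the input's dtype so the function stays generic
    r2 = np.zeros(incomingArray.shape, dtype=incomingArray.dtype)
    # Shortest form: same shape and dtype as the input
    r3 = np.zeros_like(incomingArray)
    return r1, r2, r3
```

Only the first form hard-codes float64; the other two follow whatever dtype the caller passes in.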
The timing
The first time you call a JITted function it will take some time to compile it, much longer than the time it will take to execute it. So you should call the function with the same parameter types once before you make any timings.
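A minimal sketch of that pattern, assuming a hypothetical fill_random function in place of the question's code: one untimed warm-up call triggers compilation, and only the second call is measured with time.perf_counter. The loops also take their bounds from a.shape, so non-square arrays work too. Under the plain-Python fallback the timing is, of course, not meaningful.

```python
import time

import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: plain Python, so the timing is not meaningful
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda f: f


@njit
def fill_random(a, lo, hi):
    # Loop bounds come from a.shape, so non-square arrays work too
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            a[i, j] = lo + (hi - lo) * np.random.random()
    return a


arr = np.zeros((20, 20))
fill_random(arr, 105150.0, 1541586.0)  # warm-up call: includes compilation
t0 = time.perf_counter()
fill_random(arr, 105150.0, 1541586.0)  # timed call: reuses the compiled code
elapsed = time.perf_counter() - t0
```

The warm-up must use the same argument types (here a 2-D float64 array and two floats), otherwise Numba compiles a new specialization on the timed call.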
Additional optimization
If you are interested in comparing the execution time of nested loops with or without Numba, your code is fine. Otherwise you can replace the loops with something like:
incomingArray[:] = np.random.random(incomingArray.shape) * (1541586 - 105150) + 105150
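If the filling does not need to happen inside Numba, NumPy's Generator API can draw the values directly. A minimal sketch, assuming integer values like those from random.randrange are wanted (the line above produces floats instead); the seed is arbitrary and only there for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
incomingArray = np.zeros((20, 20))
# Integers in [105150, 1541586), the same half-open range as random.randrange
incomingArray[:] = rng.integers(105150, 1541586, size=incomingArray.shape)
result = np.linalg.inv(incomingArray)
```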
I am trying to compile the following function with Numba:
@njit(fastmath=True, nogil=True)
def generate_items(array, start):
    array_positions = np.empty(SIZE, dtype=np.int64)
    count = 0
    while count < SIZE - start:
        new_array = mutate(np.empty(0, dtype=np.uint8))
        if len(new_array) > 0:
            array_positions[count] = len(array)  # <<=== FAILS HERE
            array = np.append(array, np.append(new_array, 255))
            count += 1
    return array, array_positions
But it fails on the indicated line above with this error message:
numba.core.errors.TypingError: Failed in nopython mode pipeline (step: nopython frontend)
Cannot unify array(uint8, 1d, C) and array(int64, 1d, C) for 'array.3', defined at ...
Which doesn't seem to make sense since I'm just assigning an int (the result on len) to an array that has a dtype of np.int64?
Note that array is of type np.uint8 - but I'm not assigning the array itself so this message makes no sense to me.
I attempted to refactor to this:
tmp = len(array) # <<=== FAILS HERE
array_positions[count] = tmp
But then it fails there... same message.
I also tried replacing len(array) by array.size since this is a 1d array, but same error.
Can anyone see why this is failing?
I'm on Python 3.7 and Numba 0.50
Thanks!
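The first issue is that you use an unsupported function, mutate (as stated here).
Next, as the error says, you combine arrays with different data types in
np.append(array, np.append(new_array, 255))
array starts out as uint8, but the literal 255 is typed as a wider integer (int64, as the error message shows), so np.append(new_array, 255) returns an int64 array and the outer np.append does too. The variable array would therefore change type from uint8 to int64 between loop iterations, which Numba cannot unify; that is why the error points at array rather than at the len() line. If you had used @jit, it would have shown you a warning about falling back to object mode, but because you enforced nopython mode with the @njit decorator, it throws an error.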
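The promotion can be reproduced in plain NumPy, which applies the same kind of rule here. A minimal sketch with made-up array contents: appending a Python int promotes the result, while appending np.uint8(255) keeps every piece uint8. Assuming Numba follows the same promotion, that change should let array keep one type across iterations, though this has not been checked against Numba 0.50.

```python
import numpy as np

array = np.zeros(3, dtype=np.uint8)
new_array = np.array([7, 8], dtype=np.uint8)

# The Python int 255 becomes a default-integer array inside np.append,
# so the inner result, and therefore the outer result, is promoted.
promoted = np.append(array, np.append(new_array, 255))

# Keeping every piece uint8 keeps the result uint8.
kept = np.append(array, np.append(new_array, np.uint8(255)))
```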
Cheers.
I am new to cython and have the following code for a numpy for loop that I am trying to optimize. So far, this Cython code isn't much faster than the numpy for loop.
# cython: infer_types = True
import numpy as np
cimport numpy
DTYPE = np.double
def hdcfTransfomation(scanData):
    cdef Py_ssize_t Position
    scanLength = scanData.shape[0]
    hdcfFunction_np = np.zeros(scanLength, dtype = DTYPE)
    cdef double [::1] hdcfFunction = hdcfFunction_np
    for position in range(scanLength - 1):
        topShift = scanData[1 + position:]
        bottomShift = scanData[:-(position + 1)]
        arrayDiff = np.subtract(topShift, bottomShift)
        arraySquared = np.square(arrayDiff)
        arrayMean = np.mean(arraySquared, axis = 0)
        hdcfFunction[position] = arrayMean
    return hdcfFunction
I know that using C math library functions would be more ideal than calling back into the numpy language (subtract, square, mean), but I am not sure where I can find a list of functions that can be called in this manner.
I have been trying to figure out ways to optimize this code by using different types, etc., but nothing is providing the performance that I think is possible with a fully optimized implementation in Cython.
Here is a working example of the numpy for-loop:
import numpy as np
def hdcfTransfomation(scanData):
    scanLength = scanData.shape[0]
    hdcfFunction = np.zeros(scanLength)
    for position in range(scanLength - 1):
        topShift = scanData[1 + position:]
        bottomShift = scanData[:-(position + 1)]
        arrayDiff = np.subtract(topShift, bottomShift)
        arraySquared = np.square(arrayDiff)
        arrayMean = np.mean(arraySquared, axis = 0)
        hdcfFunction[position] = arrayMean
    return hdcfFunction
scanDataArray = np.random.rand(80000, 1)
transformedScan = hdcfTransfomation(scanDataArray)
Always provide as much information as possible (some example data, Python/Cython version, compiler version/settings and CPU model).
Without that it is quite hard to compare any timings. For example, this problem benefits quite a lot from SIMD vectorization. It makes a real difference which compiler you use, or whether you want to redistribute a compiled version that should also run on low-end or quite old CPUs (e.g. without AVX).
I am not very familiar with Cython, but I think your main problem is the missing declaration for scanData. The C compiler may also need additional flags like -march=native, but the exact syntax is compiler-dependent. I am also not sure how Cython or the C compiler optimizes this part:
arrayDiff = np.subtract(topShift, bottomShift)
arraySquared = np.square(arrayDiff)
arrayMean = np.mean(arraySquared, axis = 0)
If these loops (all vectorized commands are actually loops) are not fused, and temporary arrays are created instead, as in pure Python, this will slow down the code. It would also be a good idea to create a 1D array first (e.g. scanData = scanData[:, 0]).
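In plain NumPy, the same two ideas (work on a 1D array, and create fewer temporaries per iteration) look roughly like this. It is only a sketch with a hypothetical function name; np.dot(d, d) stands in for square-then-mean, so each lag allocates one temporary instead of three.

```python
import numpy as np


def hdcf_fewer_temporaries(scanData):
    # Flatten (n, 1) input to a contiguous 1D float64 array first
    x = np.ascontiguousarray(scanData, dtype=np.float64).reshape(-1)
    n = x.shape[0]
    out = np.zeros(n)
    for position in range(n - 1):
        d = x[position + 1:] - x[:n - position - 1]  # one temporary per lag
        # dot(d, d) sums the squares without materialising d**2
        out[position] = np.dot(d, d) / d.size
    return out
```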
As I said, I am not that familiar with Cython, so I tried what is possible with Numba. At least it shows what should also be possible with a reasonably good Cython implementation.
Maybe easier to optimize for the compiler
import numba as nb
import numpy as np
# scanData is a 1D-array here
@nb.njit(fastmath=True, error_model='numpy', parallel=True)
def hdcfTransfomation(scanData):
    scanLength = scanData.shape[0]
    hdcfFunction = np.zeros(scanLength, dtype=scanData.dtype)
    for position in nb.prange(scanLength - 1):
        topShift = scanData[1 + position:]
        bottomShift = scanData[:scanData.shape[0] - (position + 1)]
        sum = 0.
        jj = 0
        for i in range(scanLength - (position + 1)):
            jj += 1
            sum += (topShift[i] - bottomShift[i])**2
        hdcfFunction[position] = sum / jj
    return hdcfFunction
I also used parallelization here, because the problem is embarrassingly parallel. At least with a size of 80_000 and Numba it doesn't matter if you use a slightly modified version of your code (1D-array), or the code above.
Timings
#Quadcore Core i7-4th gen,Numba 0.4dev,Python 3.6
scanData=np.random.rand(80_000)
#The first call to the function isn't measured (compilation overhead),but the following calls.
Pure Python: 5900ms
Numba single-threaded: 947ms
Numba parallel: 260ms
Especially for arrays larger than np.random.rand(80_000) there may be better approaches (loop tiling for better cache usage), but for this size this should be more or less OK (at least it fits in the L3 cache).
Naive GPU Implementation
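import numpy as np
from numba import cuda, float32
@cuda.jit('void(float32[:], float32[:])')
def hdcfTransfomation_gpu(scanData, out_data):
    scanLength = scanData.shape[0]
    position = cuda.grid(1)
    if position < scanLength - 1:
        sum = float32(0.)
        offset = 1 + position
        for i in range(scanLength - offset):
            sum += (scanData[i + offset] - scanData[i])**2
        out_data[position] = sum / (scanLength - offset)
scanData = np.random.rand(80_000).astype(np.float32)
res_3 = np.zeros_like(scanData)  # output buffer
# Round the block count up; the bounds check in the kernel skips extra threads
hdcfTransfomation_gpu[(scanData.shape[0] + 63) // 64, 64](scanData, res_3)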
This gives about 400 ms on a GT640 (float32) and 970 ms (float64). For a good implementation, shared arrays should be considered.
Putting Cython aside, does this do the same thing as your current code but without a for loop? We can tighten it up and correct for inaccuracies, but the first port of call is to try to apply NumPy operations to 2D arrays before turning to Cython for loops. It's too long to put in a comment.
import numpy as np
# Setup
arr = np.random.choice(np.arange(10), 100).reshape(10, 10)
top_shift = arr[:, :-1]
bottom_shift = arr[:, 1:]
arr_diff = top_shift - bottom_shift
arr_squared = np.square(arr_diff)
arr_mean = arr_squared.mean(axis=1)
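Along the same lines, the loop over lags in the question's hdcfTransfomation can be removed entirely. This is a different technique from the answers above (swapped in here, not what they suggested): expanding the square gives sum((a - b)**2) = sum(a**2) + sum(b**2) - 2*sum(a*b), where the first two terms come from a cumulative sum and the cross term is the autocorrelation, which one zero-padded FFT yields for all lags at once. A sketch with hypothetical names, assuming float64 input; the subtraction can lose a few digits when the signal's mean is large compared with its differences.

```python
import numpy as np


def hdcf_loop(x):
    """Direct version: mean of squared differences for every lag 1..n-1."""
    n = x.shape[0]
    out = np.zeros(n)
    for position in range(n - 1):
        d = x[position + 1:] - x[:n - position - 1]
        out[position] = np.mean(d * d)
    return out


def hdcf_fft(x):
    """Same result via sum((a-b)^2) = sum(a^2) + sum(b^2) - 2*sum(a*b)."""
    x = np.asarray(x, dtype=np.float64).reshape(-1)
    n = x.shape[0]
    lags = np.arange(1, n)  # lag k = position + 1
    counts = n - lags  # number of pairs at each lag
    csum = np.concatenate(([0.0], np.cumsum(x * x)))
    top = csum[n] - csum[lags]  # sum of x[i + k]**2 over the pairs
    bottom = csum[counts]  # sum of x[i]**2 over the pairs
    # Autocorrelation sum_i x[i] * x[i + k] for all k, zero-padded to at
    # least 2n - 1 points so the circular FFT product does not wrap around
    nfft = 1 << (2 * n - 1).bit_length()
    spec = np.fft.rfft(x, nfft)
    acf = np.fft.irfft(spec * np.conj(spec), nfft)[:n]
    out = np.zeros(n)
    out[:-1] = (top + bottom - 2.0 * acf[lags]) / counts
    return out
```

The FFT version costs O(n log n) instead of O(n²), which is where the gain comes from for 80000 samples; hdcf_loop is only there as a reference to check it against.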
I'm trying to initialize two vectors in memory using gsl_vector_set(). In the main code it is initialized to zero on default, but I wanted to initialize them to some non-zero value. I made a test code based on a working function that uses the gsl_vector_set() function.
from ctypes import *;
gsl = cdll.LoadLibrary('libgsl-0.dll');
gsl.gsl_vector_get.restype = c_double;
gsl.gsl_matrix_get.restype = c_double;
gsl.gsl_vector_set.restype = c_double;
foo = dict(
x_ht = [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0],
x_ht_m = [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
);
for f in range(0,18):
    gsl.gsl_vector_set(foo['x_ht_m'],f,c_double(1.0));
    gsl.gsl_vector_set(foo['x_ht'],f,c_double(1.0));
When I run the code I get this error.
ArgumentError: argument 1: <type 'exceptions.TypeError'>: Don't know how to convert parameter 1
I'm new to using ctypes and GSL functions, so I'm not sure what the issue is or what the error message means. I am also not sure if there is a better way I should be saving a vector to memory.
Thank you @CristiFati for pointing out that I needed gsl_vector_calloc in my test code. I also noticed that in the main code I was working on, the vector I needed to set was
NAV.KF_dictnry['x_hat_m']
instead of
NAV.KF_dictnry['x_ht_m']
So I fixed the test code to mirror the real code a bit better by creating a class holding the dictionary, and added the ability to change each value in the vector to an arbitrary value.
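from ctypes import *;
gsl = cdll.LoadLibrary('libgsl-0.dll');
gsl.gsl_vector_get.restype = c_double;
gsl.gsl_matrix_get.restype = c_double;
gsl.gsl_vector_set.restype = None;  # gsl_vector_set returns void
# gsl_vector_calloc returns a pointer; ctypes assumes a C int by default,
# which can truncate the pointer on 64-bit Python
gsl.gsl_vector_calloc.restype = c_void_p;
gsl.gsl_vector_calloc.argtypes = [c_size_t];
gsl.gsl_vector_get.argtypes = [c_void_p, c_size_t];
gsl.gsl_vector_set.argtypes = [c_void_p, c_size_t, c_double];
class foo(object):
    fu = dict(
        x_hat = gsl.gsl_vector_calloc(c_size_t(18)),
        x_hat_m = gsl.gsl_vector_calloc(c_size_t(18)),
    );
x_ht = [1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
        1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]
x_ht_m = [1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,
          1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0]
for f in range(0,18):
    gsl.gsl_vector_set(foo.fu['x_hat_m'],f,c_double(x_ht_m[f]));
    gsl.gsl_vector_set(foo.fu['x_hat'],f,c_double(x_ht[f]));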
After running I checked with:
gsl.gsl_vector_get(foo.fu['x_hat_m'],0)
and got out a 1.0 (worked for the entire vector).
Turned out to just be some stupid mistakes on my end.
Thanks again!
I am writing code that needs to do some indexing in Python using Numba.
However, I cannot get it to work.
It seems something is prohibited.
The code is as follows:
from numba import cuda
import numpy as np
@cuda.jit
def function(output, size, random_array):
    i_p, i_k1, i_k2 = cuda.grid(3)
    if i_p<size and i_k1<size and i_k2<size:
        a1=i_p**2+i_k1
        a2=i_p**2+i_k2
        a3=i_k1**2+i_k2**2
        a=[a1,a2,a3]
        for i in range(len(random_array)):
            output[i_p,i_k1,i_k2,i] = a[int(random_array[i])]
output=cuda.device_array((10,10,10,5))
random_array=cuda.to_device(np.array([np.random.random()*3 for i in range(5)]))
size=10
threadsperblock = (8, 8, 8)
blockspergridx=(size + (threadsperblock[0] - 1)) // threadsperblock[0]
blockspergrid = ((blockspergridx, blockspergridx, blockspergridx))
# Start the kernel
function[blockspergrid, threadsperblock](output, size, random_array)
print(output.copy_to_host())
It yields an error:
LoweringError: Failed at nopython (nopython mode backend)
'CUDATargetContext' object has no attribute 'build_list'
File "<ipython-input-57-6058e2bfe8b9>", line 10
[1] During: lowering "$40.21 = build_list(items=[Var(a1, <ipython-input-57-6058e2bfe8b9> (7)), Var(a2, <ipython-input-57-6058e2bfe8b9> (8)), Var(a3, <ipython-input-57-6058e2bfe8b9> (9))])" at <ipython-input-57-6058e2bfe8b9> (10
Can anyone help me with this?
One option is to pass a in as an input of the function as well, but when a is really large, like a 1000*1000*1000*7 array, I always run out of memory.
The problem has nothing to do with array indexing. Within the kernel, this line:
a=[a1,a2,a3]
is not supported. You cannot create a list within a @cuda.jit function. The exact list of supported Python types within kernels is documented here.
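One way around the list, sketched under assumptions: pick the value with branches instead of indexing a list, so only scalar arithmetic remains. The select_value function below is hypothetical and written as plain Python; the assumption (untested on a GPU here) is that the same body compiles as a @cuda.jit(device=True) function and replaces the loop body with output[i_p, i_k1, i_k2, i] = select_value(i_p, i_k1, i_k2, int(random_array[i])). cuda.local.array with a constant size is another option for a small per-thread buffer. Either way, a never has to be passed in from the host, which avoids the memory problem. reference_output is a host-side NumPy version of the expected result for checking the kernel's output.

```python
import numpy as np


def select_value(i_p, i_k1, i_k2, idx):
    """Scalar-only replacement for [a1, a2, a3][idx]: no list is built."""
    if idx == 0:
        return i_p * i_p + i_k1
    elif idx == 1:
        return i_p * i_p + i_k2
    return i_k1 * i_k1 + i_k2 * i_k2


def reference_output(size, random_array):
    """Host-side NumPy version of the kernel's output, for checking results."""
    i_p, i_k1, i_k2 = np.meshgrid(
        np.arange(size), np.arange(size), np.arange(size), indexing="ij"
    )
    a = np.stack([i_p**2 + i_k1, i_p**2 + i_k2, i_k1**2 + i_k2**2], axis=-1)
    # astype truncates toward zero like int(), for the non-negative inputs here
    idx = random_array.astype(np.int64)
    return a[..., idx].astype(np.float64)  # shape (size, size, size, len(idx))
```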