I'm using Numba to speed up my code, which works fine without Numba. But after adding @jit, it crashes with this error:
Traceback (most recent call last):
File "C:\work_asaaki\code\gbc_classifier_train_7.py", line 54, in <module>
gentlebooster.train(X_train, y_train, boosting_rounds)
File "C:\work_asaaki\code\gentleboost_c_class_jit_v7_nolimit.py", line 298, in train
self.g_per_round, self.g = train_function(X, y, H)
File "C:\Anaconda\lib\site-packages\numba\dispatcher.py", line 152, in _compile_for_args
return self.jit(sig)
File "C:\Anaconda\lib\site-packages\numba\dispatcher.py", line 143, in jit
return self.compile(sig, **kws)
File "C:\Anaconda\lib\site-packages\numba\dispatcher.py", line 250, in compile
locals=self.locals)
File "C:\Anaconda\lib\site-packages\numba\compiler.py", line 183, in compile_bytecode
flags.no_compile)
File "C:\Anaconda\lib\site-packages\numba\compiler.py", line 323, in native_lowering_stage
lower.lower()
File "C:\Anaconda\lib\site-packages\numba\lowering.py", line 219, in lower
self.lower_block(block)
File "C:\Anaconda\lib\site-packages\numba\lowering.py", line 254, in lower_block
raise LoweringError(msg, inst.loc)
numba.lowering.LoweringError: Internal error:
NotImplementedError: ('cast', <llvm.core.Instruction object at 0x000000001801D320>, slice3_type, int64)
File "gentleboost_c_class_jit_v7_nolimit.py", line 103
Line 103 is below, in a loop:
weights = np.empty([n,m])
for curr_n in range(n):
    weights[curr_n,:] = 1.0/(n) # this is line 103
where n is a constant already defined somewhere above in my code.
How can I remove the error? What "lowering" is going on? I'm using Anaconda 2.0.1 with Numba 0.13.x and Numpy 1.8.x on a 64-bit machine.
Based on this: https://gist.github.com/cc7768/bc5b8b7b9052708f0c0a,
I figured out how to avoid the issue. Instead of using the colon : to refer to an entire row/column, I expanded the loop into two nested loops that index each dimension of the array explicitly:
weights = np.empty([n,m])
for curr_n in range(n):
    for curr_m in range(m):
        weights[curr_n,curr_m] = 1.0/(n)
There were other places later in my code where I used the colon, but they didn't cause errors further down; I'm not sure why.
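For context, here is a minimal self-contained sketch of the workaround inside a jitted function (the function and argument names are illustrative, not the original train_function):

import numpy as np
from numba import jit

@jit
def init_weights(n, m):
    # Element-wise assignment avoids the slice assignment that Numba 0.13.x
    # could not lower ("cast ... slice3_type, int64").
    weights = np.empty((n, m))
    for curr_n in range(n):
        for curr_m in range(m):
            weights[curr_n, curr_m] = 1.0 / n
    return weights

print(init_weights(4, 3))  # 4x3 array filled with 0.25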
Related
I'm trying to use Scatterv to distribute parts of an array to each of my processors, but the line where I run the Scatterv call fails, and I get this error:
Traceback (most recent call last):
File "<ipython-input-16-e1f960b94347>", line 1, in <module>
comm.Scatterv([init_data, (sendcount,split)], init_data_local, root=0)
File "mpi4py/MPI/Comm.pyx", line 626, in mpi4py.MPI.Comm.Scatterv
File "mpi4py/MPI/msgbuffer.pxi", line 538, in mpi4py.MPI._p_msg_cco.for_scatter
File "mpi4py/MPI/msgbuffer.pxi", line 440, in mpi4py.MPI._p_msg_cco.for_cco_send
File "mpi4py/MPI/msgbuffer.pxi", line 266, in mpi4py.MPI.message_vector
File "mpi4py/MPI/msgbuffer.pxi", line 100, in mpi4py.MPI.message_basic
KeyError: '38w'
I have no idea what I'm doing wrong or how to fix this error. Any help would be appreciated!
EDIT: Here is a reproducible example of the code. Changing the data type of the init_data array changes the number following KeyError, but still gives the same error. My choice of '<U38' as the dtype is because that is what np.loadtxt uses when loading the array within my actual code.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank==0:
    init_data=np.ones((5187,3), dtype='<U38')
    length=len(init_data[:,0])
else:
    length=None
    init_data=None

length=comm.bcast(length, root=0)

sendcount=[]
split=[]
for r in range(size):
    split.append(r*length//size)
    if r<size-1:
        sendcount.append(length//size)
    else:
        sendcount.append(length-(r*length//size))
sendcount=tuple(sendcount)
split=tuple(split)

init_data_local=np.empty((sendcount[rank], 3),dtype=str)
comm.Scatterv([init_data, (sendcount,split)], init_data_local, root=0)
I'm learning how to use Numba to speed up functions with jit and vectorize. I didn't have any issues with the jit version of this code, but I am getting an index error with vectorize. I suspect the answer to this question is on the right track in suggesting a type error, but I'm not sure how the indexing should change. Included below is the function I've been playing around with, which outputs the Fibonacci numbers up to a chosen index of the sequence. What is going wrong with the indexing, and how can I correct my code to account for it?
from numba import vectorize
import numpy as np
from timeit import timeit
@vectorize
def fib(n):
    '''
    Adjusted from:
    https://lectures.quantecon.org/py/numba.html
    https://en.wikipedia.org/wiki/Fibonacci_number
    https://www.geeksforgeeks.org/program-for-nth-fibonacci-number/
    '''
    if n == 1:
        return np.ones(1)
    elif n > 1:
        x = np.empty(n)
        x[0] = 1
        x[1] = 1
        for i in range(2,n):
            x[i] = x[i-1] + x[i-2]
        return x
    else:
        print('WARNING: Check validity of input.')

print(timeit('fib(10)', globals={'fib':fib}))
Which results in the following error output.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/llvmlite/ir/instructions.py", line 619, in __init__
typ = typ.elements[i]
AttributeError: 'PointerType' object has no attribute 'elements'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/galen/Projects/myjekyllblog/test_code/quantecon_2.py", line 27, in <module>
print(timeit('fib(10)', globals={'fib':fib}))
File "/usr/lib/python3.6/timeit.py", line 233, in timeit
return Timer(stmt, setup, timer, globals).timeit(number)
File "/usr/lib/python3.6/timeit.py", line 178, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/dufunc.py", line 166, in _compile_for_args
return self._compile_for_argtys(tuple(argtys))
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/dufunc.py", line 188, in _compile_for_argtys
cres, actual_sig)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/ufuncbuilder.py", line 157, in _build_element_wise_ufunc_wrapper
cres.objectmode, cres)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/wrappers.py", line 220, in build_ufunc_wrapper
env=envptr)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/wrappers.py", line 130, in build_fast_loop_body
env=env)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/wrappers.py", line 23, in _build_ufunc_loop_body
store(retval)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/wrappers.py", line 126, in store
out.store_aligned(retval, ind)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/wrappers.py", line 276, in store_aligned
self.context.pack_value(self.builder, self.fe_type, value, ptr)
File "/usr/local/lib/python3.6/dist-packages/numba/targets/base.py", line 482, in pack_value
dataval = self.data_model_manager[ty].as_data(builder, value)
File "/usr/local/lib/python3.6/dist-packages/numba/datamodel/models.py", line 558, in as_data
elems = self._as("as_data", builder, value)
File "/usr/local/lib/python3.6/dist-packages/numba/datamodel/models.py", line 530, in _as
self.get(builder, value, i)))
File "/usr/local/lib/python3.6/dist-packages/numba/datamodel/models.py", line 558, in as_data
elems = self._as("as_data", builder, value)
File "/usr/local/lib/python3.6/dist-packages/numba/datamodel/models.py", line 530, in _as
self.get(builder, value, i)))
File "/usr/local/lib/python3.6/dist-packages/numba/datamodel/models.py", line 624, in get
name="extracted." + self._fields[pos])
File "/usr/local/lib/python3.6/dist-packages/llvmlite/ir/builder.py", line 911, in extract_value
instr = instructions.ExtractValue(self.block, agg, idx, name=name)
File "/usr/local/lib/python3.6/dist-packages/llvmlite/ir/instructions.py", line 622, in __init__
% (list(indices), agg.type))
TypeError: Can't index at [0] in i8*
The error occurs because you are trying to vectorize a function that is essentially not vectorizable. I think you are confusing how @jit and @vectorize work. To speed up your functions you use @jit, while @vectorize is used to create NumPy universal functions. See the official documentation here:
Using vectorize(), you write your function as operating over input scalars, rather than arrays. Numba will generate the surrounding loop (or kernel) allowing efficient iteration over the actual inputs.
So it is essentially not possible to create a NumPy universal function with the same functionality as your Fibonacci function. Here is the link to the official documentation on universal functions if you are interested.
So in order to use @vectorize, you need to write a function that operates on scalars and can therefore be used as a NumPy universal function. For your purpose of speeding up your code, you simply need to use @jit.
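For example, a minimal sketch of the @jit approach (an illustrative rewrite, not the asker's original code) could look like this:

from numba import jit
import numpy as np

@jit(nopython=True)
def fib(n):
    # Build the first n Fibonacci numbers with an ordinary loop;
    # @jit compiles the whole function, loop included.
    x = np.ones(n)
    for i in range(2, n):
        x[i] = x[i-1] + x[i-2]
    return x

print(fib(10))  # [ 1.  1.  2.  3.  5.  8. 13. 21. 34. 55.]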
I am confused. I have 21 files that were generated by the same process, and I am filtering them all with a Savitzky-Golay filter using the same parameters.
It works normally for some files, but at some point, I receive the ValueError: array must not contain infs or NaNs. The problem is, I checked the file and there aren't any infs or NaNs!
print "nan", df.isnull().sum()
print "inf", np.isinf(df).sum()
gives
nan x 0
T 0
std_T 0
sterr_T 0
dtype: int64
inf x 0
T 0
std_T 0
sterr_T 0
dtype: int64
So could the problem be in how the filter is applied? Could it result from, for example, the choice of window length or polyorder relative to the length or step of the data?
Complete traceback:
Traceback (most recent call last):
File "<ipython-input-7-40b33049ef41>", line 1, in <module>
runfile('D:/data/scripts/DailyProfiles_Gradients.py', wdir='D:/data/DFDP2/DFDP2B/DTS/DTS_scripts')
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
execfile(filename, namespace)
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "D:/data/scripts/DailyProfiles_Gradients.py", line 142, in <module>
grad = gradient(y, x, scale,PO)
File "D:/data/scripts/DailyProfiles_Gradients.py", line 76, in Tgradient
smoothed=savgol_filter(list(x), scale, PO, deriv=1, delta=dy[0])
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\signal\_savitzky_golay.py", line 337, in savgol_filter
coeffs = savgol_coeffs(window_length, polyorder, deriv=deriv, delta=delta)
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\signal\_savitzky_golay.py", line 140, in savgol_coeffs
coeffs, _, _, _ = lstsq(A, y)
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\linalg\basic.py", line 822, in lstsq
b1 = _asarray_validated(b, check_finite=check_finite)
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\_lib\_util.py", line 187, in _asarray_validated
a = toarray(a)
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 1033, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
This problem is rather specific to the data and method, and I have not been able to produce a minimal reproducible example. I am not asking anyone to fix my code; I am just asking for some brainstorming: What aspects have I not checked yet that might be causing the error? What should the function parameters look like, other than the window length being an odd number greater than the polyorder?
I am grateful for the discussion that has arisen, it helped, eventually.
I can reproduce the error ValueError: array must not contain infs or NaNs if delta is extremely small (e.g. delta=1e-310). Check your code and data to ensure that the values that you pass for delta are reasonable.
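For reference, a minimal sketch of that reproduction (the signal and parameter values here are illustrative):

import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(0, 1, 101)
y = np.sin(2 * np.pi * x)

# A sensible delta works:
ok = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=x[1] - x[0])

# A near-zero delta can make 1/delta overflow to inf inside savgol_coeffs,
# and the internal lstsq call then rejects the coefficient vector:
bad = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=1e-310)
# ValueError: array must not contain infs or NaNs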
I persisted a TfidfVectorizer using the joblib module. The object that I ran through the fit_transform method was a list of strings.
The resulting matrix had 263744 columns.
I am running a list of strings through the transform method, and I get the following error.
Any clues?
File "/usr/local/lib/python2.7/dist- packages/sklearn/feature_extraction/text.py",
line 1334, in transform
return self._tfidf.transform(X, copy=False)
File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py",
line 1037, in transform
X = X * self._idf_diag
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/base.py", line
318, in __mul__
return self._mul_sparse_matrix(other)
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.py",
line 487, in _mul_sparse_matrix
other = self.__class__(other) # convert to this format
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.py",
line 31, in __init__
arg1 = arg1.asformat(self.format)
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/base.py",
line 219, in asformat
return getattr(self,'to' + format)()
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/dia.py",
line 241, in tocsr
return self.tocoo().tocsr()
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/dia.py",
line 249, in tocoo
num_offsets, offset_len = self.data.shape
AttributeError: 'NDArrayWrapper' object has no attribute 'shape'
Assuming you are persisting the trained transformer or pipeline to disk, and then reloading it before seeing the error, you could:
Try saving the original (working) object using the compress keyword argument to joblib.dump, with an integer value greater than 0:
_ = joblib.dump(python_object, persisted_file_name, compress=3)
If the persisted file is being moved to a new location, make sure to copy all the file pieces. If it is large, joblib will split it up, e.g.:
persisted_model.joblib.pkl
persisted_model.joblib.pkl_01.npy
persisted_model.joblib.pkl_02.npy
joblib docs
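For instance, a minimal save/load round trip with compression might look like this (file and variable names are illustrative):

import joblib  # older scikit-learn versions bundled this as sklearn.externals.joblib
from sklearn.feature_extraction.text import TfidfVectorizer

train_docs = ["first training document", "second training document"]
vectorizer = TfidfVectorizer()
vectorizer.fit(train_docs)

# compress > 0 writes a single self-contained file instead of extra *.npy pieces
joblib.dump(vectorizer, "tfidf_vectorizer.joblib.pkl", compress=3)

# ...later, possibly after moving the file to another machine...
restored = joblib.load("tfidf_vectorizer.joblib.pkl")
print(restored.transform(["a new document to transform"]).shape)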
I am running the code below. It runs fine for the first iteration, but when the second iteration starts it gives me a key error. I notice that an "L" is appended to the key automatically when the second iteration starts.
Link to my code below:
Code for KNN having issues here
Link for the data that I am using is below:
Data used for the code
I'm not sure why this is happening. Can someone please let me know what is causing the issue? Help is greatly appreciated!
Traceback (most recent call last):
File "C:/Python27/myScripts/KNN.py", line 114, in <module>
pred_lst.append(predict_output_of_query(10.0, features_train, df_housePrice_train, features_test[i]))
File "C:/Python27/myScripts/KNN.py", line 96, in predict_output_of_query
avg1 += output_train["price"][i]
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 557, in __getitem__
result = self.index.get_value(self, key)
File "C:\Python27\lib\site-packages\pandas\core\index.py", line 1790, in get_value
return self._engine.get_value(s, k)
File "pandas\index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas\index.c:3204)
File "pandas\index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas\index.c:2903)
File "pandas\index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)
File "pandas\hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)
File "pandas\hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)
KeyError: 6818L
Now, I looked only at your get_numpy_data definition and I think it doesn't work as you'd expect. For example, the line
features_train, output_train = get_numpy_data(df_housePrice_train, feature_list, 'price')
seems to modify df_housePrice_train. And output_train becomes a NumPy array containing the string "price".
Update:
The line distances = [] should really be inside the function compute_distances. As written, the function appends elements to distances on every execution. Indices (positions) of some of those elements are then used to index a data frame. On the first execution everything works fine, but later the list grows and some indices become larger than the size of the data frame.
Update 2:
For completeness: KeyError: 6818L means that the long integer 6818 (the trailing L is just Python 2's long-integer suffix) is not a valid key in df_housePrice_train.
Needed code modification:
## KNN.py, line 61:
#distances = []  # <- delete this line

def compute_distances(features_instances, features_query):
    distances = []  # <-- add here
    # rest of the function body...
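To illustrate the failure mode with a toy stand-in (this is not the asker's actual KNN.py, just a hypothetical miniature of the same bug):

import numpy as np
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # stand-in for output_train["price"]

distances = []  # module-level list: shared and growing across calls (the bug)

def predict_nearest(query):
    # Appends to the shared list instead of starting from a fresh local one.
    for value in prices.values:
        distances.append(abs(value - query))
    # argmin is a position in the *accumulated* list, not in prices
    return prices[int(np.argmin(distances))]

print(predict_nearest(10.0))  # first call: position 4 is a valid key, prints 5.0
print(predict_nearest(1.0))   # second call: argmin is 5, not a key in prices -> KeyError: 5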