I am running the code below and it works fine for the first iteration, but when the second iteration starts it gives me a KeyError. I notice that a string "L" is appended to the key automatically when the second iteration starts.
Link to my code below:
Code for KNN having issues here
Link for the data that I am using is below:
Data used for the code
Not sure why this is happening. Can someone please let me know what is causing the issue? Help is greatly appreciated!
Traceback (most recent call last):
File "C:/Python27/myScripts/KNN.py", line 114, in <module>
pred_lst.append(predict_output_of_query(10.0, features_train, df_housePrice_train, features_test[i]))
File "C:/Python27/myScripts/KNN.py", line 96, in predict_output_of_query
avg1 += output_train["price"][i]
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 557, in __getitem__
result = self.index.get_value(self, key)
File "C:\Python27\lib\site-packages\pandas\core\index.py", line 1790, in get_value
return self._engine.get_value(s, k)
File "pandas\index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas\index.c:3204)
File "pandas\index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas\index.c:2903)
File "pandas\index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas\index.c:3843)
File "pandas\hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6525)
File "pandas\hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas\hashtable.c:6463)
KeyError: 6818L
Now, I looked only at your get_numpy_data definition and I think it doesn't work as you'd expect. For example, the line
features_train, output_train = get_numpy_data(df_housePrice_train, feature_list, 'price')
seems to modify df_housePrice_train, and output_train becomes a NumPy array containing the string "price".
Update:
The line distances = [] should really be inside the function compute_distances. The function appends elements to distances on every call, and the indices (positions) of some of those elements are then used to index a data frame. On the first call everything works fine, but the list keeps growing across calls, so eventually some indices exceed the size of the data frame.
Update 2:
For completeness: KeyError: 6818L means that the long integer 6818 is not a valid key in df_housePrice_train. The trailing L is not part of the key; in Python 2 it is simply how a long-integer literal is displayed, so nothing is actually appending a string "L" to your keys.
Needed code modification:
## KNN.py, line 61:
# distances = []  # <- delete this line
def compute_distances(features_instances, features_query):
    distances = []  # <-- add here
    # rest of the function body...
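For reference, here is a minimal sketch of the corrected function; the Euclidean distance used here is an assumption on my part, since the full body of compute_distances isn't shown in the post:
import numpy as np

def compute_distances(features_instances, features_query):
    distances = []  # local to the function, so it is reset on every call
    for row in features_instances:
        # Euclidean distance between one training instance and the query
        distances.append(np.sqrt(np.sum((row - features_query) ** 2)))
    return distances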
I am working on collecting and pre-processing data from Excel files written in 2021.
My code is given below:
import pandas as pd

AE_cor = pd.DataFrame()
global fnames
for i in fnames:
    # AE_team = pd.read_excel(f'{i}', header=3)
    AE_team = pd.read_excel(f'{i}')
    team = AE_team.iloc[5, 0]
    date = AE_team.iloc[0, 5]
fnames contains multiple Excel file paths received from QFileDialog.getOpenFileNames.
Present output:
Traceback (most recent call last):
File "c:\Users\My\Desktop\python workspace\.vscode\business_expenses\Agency_expense.py", line 71, in create_table
team = AE_team.iloc[5,0]
File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 889, in __getitem__
return self._getitem_tuple(key)
File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 1450, in _getitem_tuple
self._has_valid_tuple(tup)
File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 723, in _has_valid_tuple
self._validate_key(k, i)
File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 1358, in _validate_key
self._validate_integer(key, axis)
File "C:\Python39\lib\site-packages\pandas\core\indexing.py", line 1444, in _validate_integer
raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds
The strange thing is that the Excel files written in 2020 work fine. Why does this error occur? Can anyone help me understand it?
The error IndexError: single positional indexer is out-of-bounds is telling you that one of the DataFrames doesn't have the number of rows or columns you expect it to have.
Running print(AE_team.head(10)) before team = AE_team.iloc[5,0] will likely help you figure out which file triggers the error and what shape it actually has.
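As an illustrative sketch (the shape check and the message below are my additions, not part of the original code), you could guard the positional indexing so that files with an unexpected layout are reported instead of crashing:
print(AE_team.shape)      # (rows, columns) actually loaded from this file
print(AE_team.head(10))   # eyeball the first rows
if AE_team.shape[0] > 5 and AE_team.shape[1] > 5:
    team = AE_team.iloc[5, 0]
    date = AE_team.iloc[0, 5]
else:
    print(f'Unexpected layout in {i}: {AE_team.shape}')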
I'm learning how to use Numba to speed up functions with jit and vectorize. I didn't have any issues with the jit version of this code, but I am getting an index error with vectorize. I suspect this question's answer is getting at the right idea that there is a type error, but I'm not confident about which direction to take in changing the indexing. Included below is the function I've been playing around with, which outputs the Fibonacci numbers up to a chosen index of the sequence. What is going wrong with the indexing, and how can I correct my code to account for it?
from numba import vectorize
import numpy as np
from timeit import timeit

@vectorize
def fib(n):
    '''
    Adjusted from:
    https://lectures.quantecon.org/py/numba.html
    https://en.wikipedia.org/wiki/Fibonacci_number
    https://www.geeksforgeeks.org/program-for-nth-fibonacci-number/
    '''
    if n == 1:
        return np.ones(1)
    elif n > 1:
        x = np.empty(n)
        x[0] = 1
        x[1] = 1
        for i in range(2, n):
            x[i] = x[i-1] + x[i-2]
        return x
    else:
        print('WARNING: Check validity of input.')

print(timeit('fib(10)', globals={'fib': fib}))
This results in the following error output:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/llvmlite/ir/instructions.py", line 619, in __init__
typ = typ.elements[i]
AttributeError: 'PointerType' object has no attribute 'elements'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/galen/Projects/myjekyllblog/test_code/quantecon_2.py", line 27, in <module>
print(timeit('fib(10)', globals={'fib':fib}))
File "/usr/lib/python3.6/timeit.py", line 233, in timeit
return Timer(stmt, setup, timer, globals).timeit(number)
File "/usr/lib/python3.6/timeit.py", line 178, in timeit
timing = self.inner(it, self.timer)
File "<timeit-src>", line 6, in inner
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/dufunc.py", line 166, in _compile_for_args
return self._compile_for_argtys(tuple(argtys))
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/dufunc.py", line 188, in _compile_for_argtys
cres, actual_sig)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/ufuncbuilder.py", line 157, in _build_element_wise_ufunc_wrapper
cres.objectmode, cres)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/wrappers.py", line 220, in build_ufunc_wrapper
env=envptr)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/wrappers.py", line 130, in build_fast_loop_body
env=env)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/wrappers.py", line 23, in _build_ufunc_loop_body
store(retval)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/wrappers.py", line 126, in store
out.store_aligned(retval, ind)
File "/usr/local/lib/python3.6/dist-packages/numba/npyufunc/wrappers.py", line 276, in store_aligned
self.context.pack_value(self.builder, self.fe_type, value, ptr)
File "/usr/local/lib/python3.6/dist-packages/numba/targets/base.py", line 482, in pack_value
dataval = self.data_model_manager[ty].as_data(builder, value)
File "/usr/local/lib/python3.6/dist-packages/numba/datamodel/models.py", line 558, in as_data
elems = self._as("as_data", builder, value)
File "/usr/local/lib/python3.6/dist-packages/numba/datamodel/models.py", line 530, in _as
self.get(builder, value, i)))
File "/usr/local/lib/python3.6/dist-packages/numba/datamodel/models.py", line 558, in as_data
elems = self._as("as_data", builder, value)
File "/usr/local/lib/python3.6/dist-packages/numba/datamodel/models.py", line 530, in _as
self.get(builder, value, i)))
File "/usr/local/lib/python3.6/dist-packages/numba/datamodel/models.py", line 624, in get
name="extracted." + self._fields[pos])
File "/usr/local/lib/python3.6/dist-packages/llvmlite/ir/builder.py", line 911, in extract_value
instr = instructions.ExtractValue(self.block, agg, idx, name=name)
File "/usr/local/lib/python3.6/dist-packages/llvmlite/ir/instructions.py", line 622, in __init__
% (list(indices), agg.type))
TypeError: Can't index at [0] in i8*
The error occurs because you are trying to vectorize a function that is essentially not vectorizable. I think you are confusing how @jit and @vectorize work. To speed up your functions, you use @jit, while @vectorize is used to create NumPy universal functions. See the official documentation here:
Using vectorize(), you write your function as operating over input scalars, rather than arrays. Numba will generate the surrounding loop (or kernel) allowing efficient iteration over the actual inputs.
So it is essentially not possible to create a NumPy universal function with the same functionality as your Fibonacci function. Here is the link to the official documentation on universal functions if you are interested.
So in order to use @vectorize, you need to write a function that can work as a NumPy universal function. For your purpose of speeding up your code, you simply need to use @jit.
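For example, here is a minimal sketch of a jitted version (my rewrite, assuming n >= 1; nopython=True asks Numba to compile the whole function without falling back to object mode):
from numba import jit
import numpy as np

@jit(nopython=True)
def fib(n):
    # Start from ones so the first two Fibonacci numbers are already set
    x = np.ones(n)
    for i in range(2, n):
        x[i] = x[i-1] + x[i-2]
    return x

print(fib(10))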
I persisted a TfidfVectorizer using the joblib module. The object that I ran through the fit_transform method was a list of strings.
The resulting matrix had 263744 columns.
Now I am running a list of strings through the transform method, and I get the following error.
Any clues?
File "/usr/local/lib/python2.7/dist- packages/sklearn/feature_extraction/text.py",
line 1334, in transform
return self._tfidf.transform(X, copy=False)
File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py",
line 1037, in transform
X = X * self._idf_diag
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/base.py", line
318, in __mul__
return self._mul_sparse_matrix(other)
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.py",
line 487, in _mul_sparse_matrix
other = self.__class__(other) # convert to this format
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/compressed.py",
line 31, in __init__
arg1 = arg1.asformat(self.format)
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/base.py",
line 219, in asformat
return getattr(self,'to' + format)()
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/dia.py",
line 241, in tocsr
return self.tocoo().tocsr()
File "/usr/local/lib/python2.7/dist-packages/scipy/sparse/dia.py",
line 249, in tocoo
num_offsets, offset_len = self.data.shape
AttributeError: 'NDArrayWrapper' object has no attribute 'shape'
Assuming you are persisting the trained transformer or pipeline to disk and then reloading it before seeing the error, you could:
Try saving the original (working) object using the compress keyword argument to joblib.dump, with an integer value greater than 0:
_ = joblib.dump(python_object, persisted_file_name, compress=3)
If the persisted file is being moved to a new location, make sure to copy all the file pieces. If it is large, joblib will split it up, e.g.:
persisted_model.joblib.pkl
persisted_model.joblib.pkl_01.npy
persisted_model.joblib.pkl_02.npy
joblib docs
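Putting those together, here is a minimal sketch of the round trip; the file name, the vectorizer variable, and list_of_strings are illustrative placeholders (and on old scikit-learn versions joblib lives at sklearn.externals.joblib):
import joblib

# compress=3 keeps everything in one file, so there are no
# *_NN.npy pieces that can get lost when the file is moved
joblib.dump(vectorizer, 'tfidf_vectorizer.joblib.pkl', compress=3)

# later / elsewhere:
vectorizer = joblib.load('tfidf_vectorizer.joblib.pkl')
X = vectorizer.transform(list_of_strings)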
Following up from here: Calculating percentage of Bounding box overlap, for image detector evaluation, I'm getting an error at this line:
poly_clipped = poly.clip_to_bbox(clip_rect).to_polygons()[0]
This is the error:
File "C:\work_asaaki\code\detection.py", line 32, in clip_boxes
poly_clipped = poly.clip_to_bbox(clip_rect).to_polygons()[0]
File "C:\Anaconda\lib\site-packages\matplotlib\path.py", line 909, in clip_to_bbox
return self.make_compound_path(*paths)
File "C:\Anaconda\lib\site-packages\matplotlib\path.py", line 328, in make_compound_path
vertices = np.vstack([x.vertices for x in args])
File "C:\Anaconda\lib\site-packages\numpy\core\shape_base.py", line 228, in vstack
return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
ValueError: need at least one array to concatenate
This doesn't always happen; it depends on the specific set of polygons. What I'm trying to understand is when exactly it fails, and how I can solve the issue.
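For reference, one guard that avoids crashes like this (my assumption about the failure mode, not from the original post: clip_to_bbox ends up calling make_compound_path with zero paths when the polygon doesn't overlap the clip rectangle):
# Only clip when the path actually overlaps the bounding box
if poly.intersects_bbox(clip_rect):
    poly_clipped = poly.clip_to_bbox(clip_rect).to_polygons()[0]
else:
    poly_clipped = None  # lies entirely outside clip_rect; skip or handle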
I'm using Numba to speed up my code, which works fine without Numba. But after adding @jit, it crashes with this error:
Traceback (most recent call last):
File "C:\work_asaaki\code\gbc_classifier_train_7.py", line 54, in <module>
gentlebooster.train(X_train, y_train, boosting_rounds)
File "C:\work_asaaki\code\gentleboost_c_class_jit_v7_nolimit.py", line 298, in train
self.g_per_round, self.g = train_function(X, y, H)
File "C:\Anaconda\lib\site-packages\numba\dispatcher.py", line 152, in _compile_for_args
return self.jit(sig)
File "C:\Anaconda\lib\site-packages\numba\dispatcher.py", line 143, in jit
return self.compile(sig, **kws)
File "C:\Anaconda\lib\site-packages\numba\dispatcher.py", line 250, in compile
locals=self.locals)
File "C:\Anaconda\lib\site-packages\numba\compiler.py", line 183, in compile_bytecode
flags.no_compile)
File "C:\Anaconda\lib\site-packages\numba\compiler.py", line 323, in native_lowering_stage
lower.lower()
File "C:\Anaconda\lib\site-packages\numba\lowering.py", line 219, in lower
self.lower_block(block)
File "C:\Anaconda\lib\site-packages\numba\lowering.py", line 254, in lower_block
raise LoweringError(msg, inst.loc)
numba.lowering.LoweringError: Internal error:
NotImplementedError: ('cast', <llvm.core.Instruction object at 0x000000001801D320>, slice3_type, int64)
File "gentleboost_c_class_jit_v7_nolimit.py", line 103
Line 103 is below, in a loop:
weights = np.empty([n,m])
for curr_n in range(n):
    weights[curr_n,:] = 1.0/(n) # this is line 103
where n is a constant already defined somewhere above in my code.
How can I remove the error? What "lowering" is going on? I'm using Anaconda 2.0.1 with Numba 0.13.x and Numpy 1.8.x on a 64-bit machine.
Based on this gist: https://gist.github.com/cc7768/bc5b8b7b9052708f0c0a,
I figured out how to avoid the issue. Instead of using the colon : to refer to an entire row/column, I expanded the single loop into two nested loops that explicitly index each dimension of the array:
weights = np.empty([n,m])
for curr_n in range(n):
    for curr_m in range(m):
        weights[curr_n,curr_m] = 1.0/(n)
There were other places later in my code where I used the colon, but they didn't cause errors further down; I'm not sure why.