UserWarning: X scores are null at iteration - python

I am trying to run CCA for a multi-label text classification problem, but I keep getting the following warning and an error which I think are related:
warnings.warn('Maximum number of iterations reached')
/Library/Python/2.7/site-packages/sklearn/cross_decomposition/pls_.py:290: UserWarning: X scores are null at iteration 0
warnings.warn('X scores are null at iteration %s' % k)
warnings.warn('Maximum number of iterations reached')
/Library/Python/2.7/site-packages/sklearn/cross_decomposition/pls_.py:290: UserWarning: X scores are null at iteration 1
warnings.warn('X scores are null at iteration %s' % k)
...
for all 400 iterations, and then the following error at the end, which I think is a side effect of the warning above:
Traceback (most recent call last):
File "scikit_fb3.py", line 477, in <module>
getCCA(shorttestfilepathPreProcessed)
File "scikit_fb3.py", line 318, in getCCA
X_CCA = cca.fit(x_array, Y_indicator).transform(X)
File "/Library/Python/2.7/site-packages/sklearn/cross_decomposition/pls_.py", line 368, in transform
Xc = (np.asarray(X) - self.x_mean_) / self.x_std_
File "/usr/local/bin/src/scipy/scipy/sparse/compressed.py", line 389, in __sub__
raise NotImplementedError('adding a nonzero scalar to a '
NotImplementedError: adding a nonzero scalar to a sparse matrix is not supported
What could possibly be wrong?

CCA doesn't support sparse matrices. By default, you should assume scikit-learn estimators do not grok sparse matrices and check their docstrings to find out if by chance you found one that does.
(I admit the warning could have been friendlier.)
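If your feature matrix is a scipy sparse matrix, one workaround is to densify it before fitting. A minimal sketch, assuming the densified data fits in memory (x_array and Y_indicator are the names from the question; n_components is chosen arbitrarily here):
import scipy.sparse as sp
from sklearn.cross_decomposition import CCA

cca = CCA(n_components=2)  # pick n_components to suit your problem
x_dense = x_array.toarray() if sp.issparse(x_array) else x_array  # CCA needs dense input
# Y_indicator may need the same treatment if it is sparse.
X_CCA = cca.fit(x_dense, Y_indicator).transform(x_dense)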

Related

python ahrs complementary filter yields error for shape

I'm trying to use ahrs' Complementary filter, but can't get it working.
The shape of the data:
for ele in zip(gl, al):
    comp.append(ahrs.filters.complementary.Complementary(ele[0], ele[1], None, 200))
where ele[0] (gyroscope data) looks like [ 0.75 0.6875 -0.625 ] and ele[1] (accelerometer data) looks like [ 0.03125 -0.08935547 1.01123047].
The datatype of both arrays is <class 'numpy.ndarray'>, as required by Complementary.
The error comes from the library itself:
Traceback (most recent call last):
File "/home/luke/test_imu.py", line 111, in <module>
comp.append(ahrs.filters.complementary.Complementary(ele[0],ele[1],None,200))
File "/usr/lib/python3.9/site-packages/ahrs/filters/complementary.py", line 158, in __init__
self.Q = self._compute_all()
File "/usr/lib/python3.9/site-packages/ahrs/filters/complementary.py", line 184, in _compute_all
Q[0] = self.am_estimation(self.acc[0], self.mag[0]) if self.q0 is None else self.q0.copy()
File "/usr/lib/python3.9/site-packages/ahrs/filters/complementary.py", line 283, in am_estimation
if acc.shape[-1] != 3:
IndexError: tuple index out of range
The problem, therefore, is that the line Q[0] = self.am_estimation(self.acc[0], self.mag[0]) if self.q0 is None else self.q0.copy() takes the first element of the 3-element ndarray via self.acc[0], so the value it passes on has the wrong shape.
So is there an error in the framework or in my data structure? How do I fix this error?
The acc and gyr arguments are expected to be N x 3 arrays (one row per sample); by the looks of it you are only passing 1 x 3 arrays.
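A minimal sketch of stacking all samples into N x 3 arrays and building the filter once (untested; gl and al are taken from the question, and the constructor arguments are kept in the same order as in your call, i.e. gyroscope, accelerometer, magnetometer, frequency):
import numpy as np
import ahrs

gyr_data = np.vstack(gl)  # (N, 3) array of gyroscope samples
acc_data = np.vstack(al)  # (N, 3) array of accelerometer samples

# Build the filter once with all samples instead of once per 1x3 sample.
comp_filter = ahrs.filters.complementary.Complementary(gyr_data, acc_data, None, 200)
quaternions = comp_filter.Q  # one estimated quaternion per sample (attribute seen in the traceback)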

How to avoid numeric error when normalizing min-max near zero?

I am using
from sklearn import preprocessing
v01 = preprocessing.minmax_scale(v01, feature_range=(rf_imp_vec_truncated.min(), rf_imp_vec_truncated.max()))
and it usually works, except that sometimes I get errors like:
preprocessing.minmax_scale(v01, feature_range=(rf_imp_vec_truncated.min(), rf_imp_vec_truncated.max()))
File "C:\Code\EPMD\Kodex\EPD_Prerequisite\python_3.7.6\Lib\site-packages\sklearn\preprocessing\_data.py", line 510, in minmax_scale
X = s.fit_transform(X)
File "C:\Code\EPMD\Kodex\EPD_Prerequisite\python_3.7.6\Lib\site-packages\sklearn\base.py", line 571, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "C:\Code\EPMD\Kodex\EPD_Prerequisite\python_3.7.6\Lib\site-packages\sklearn\preprocessing\_data.py", line 339, in fit
return self.partial_fit(X, y)
File "C:\Code\EPMD\Kodex\EPD_Prerequisite\python_3.7.6\Lib\site-packages\sklearn\preprocessing\_data.py", line 365, in partial_fit
" than maximum. Got %s." % str(feature_range))
ValueError: Minimum of desired feature range must be smaller than maximum. Got (-6.090366306515144e-15, -6.090366306515144e-15).
This looks like a numerical precision issue, and in this case I would simply like to get a flat line.
How can I get around this without too much code uglification?
Are you sure you're interpreting the meaning of feature_range correctly? The docs mention that it is the range in which you want the output data, say [0, 1].
The docs also state that feature_range[0] (i.e., the minimum) must be strictly less than feature_range[1] (i.e., the maximum). In your case both are equal (-6.09e-15), hence the error.
The cleanest solution I could find for this was to add epsilon to the max:
v01 = preprocessing.minmax_scale(v01, feature_range=(rf_imp_vec_truncated.min(), rf_imp_vec_truncated.max() + np.finfo(rf_imp_vec_truncated.dtype).eps))
Now they are no longer equal.
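If you would rather handle the degenerate case explicitly and get the flat line you mention, here is a minimal sketch (assuming v01 and rf_imp_vec_truncated are NumPy arrays, as in the question):
import numpy as np
from sklearn import preprocessing

lo, hi = rf_imp_vec_truncated.min(), rf_imp_vec_truncated.max()
if np.isclose(lo, hi):
    # Degenerate target range: return a constant ("flat") vector instead of scaling.
    v01 = np.full_like(v01, lo)
else:
    v01 = preprocessing.minmax_scale(v01, feature_range=(lo, hi))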

False ValueError with scipy.signal.savgol_filter

I am confused. I have 21 files that were generated by the same process, and I am filtering them all with a Savitzky-Golay filter using the same parameters.
It works normally for some files, but at some point, I receive the ValueError: array must not contain infs or NaNs. The problem is, I checked the file and there aren't any infs or NaNs!
print "nan", df.isnull().sum()
print "inf", np.isinf(df).sum()
gives
nan x 0
T 0
std_T 0
sterr_T 0
dtype: int64
inf x 0
T 0
std_T 0
sterr_T 0
dtype: int64
So could the problem be in the implementation of the filter? Could it result, for example, from the choice of window length or polyorder relative to the length or step of the data?
Complete traceback:
Traceback (most recent call last):
File "<ipython-input-7-40b33049ef41>", line 1, in <module>
runfile('D:/data/scripts/DailyProfiles_Gradients.py', wdir='D:/data/DFDP2/DFDP2B/DTS/DTS_scripts')
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 714, in runfile
execfile(filename, namespace)
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
exec(compile(scripttext, filename, 'exec'), glob, loc)
File "D:/data/scripts/DailyProfiles_Gradients.py", line 142, in <module>
grad = gradient(y, x, scale,PO)
File "D:/data/scripts/DailyProfiles_Gradients.py", line 76, in Tgradient
smoothed=savgol_filter(list(x), scale, PO, deriv=1, delta=dy[0])
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\signal\_savitzky_golay.py", line 337, in savgol_filter
coeffs = savgol_coeffs(window_length, polyorder, deriv=deriv, delta=delta)
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\signal\_savitzky_golay.py", line 140, in savgol_coeffs
coeffs, _, _, _ = lstsq(A, y)
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\linalg\basic.py", line 822, in lstsq
b1 = _asarray_validated(b, check_finite=check_finite)
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\scipy\_lib\_util.py", line 187, in _asarray_validated
a = toarray(a)
File "C:\Users\me\AppData\Local\Continuum\Anaconda2\lib\site-packages\numpy\lib\function_base.py", line 1033, in asarray_chkfinite
"array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs
This problem is rather specific to the data and method, and I have not been able to produce a minimal reproducible example. I am not asking you to fix my code; I am just asking for some brainstorming: What aspects have I not checked yet that might be causing the error? What should the function parameters look like, other than the window length having to be an odd number greater than the polyorder?
I am grateful for the discussion that arose; it helped in the end.
I can reproduce the error ValueError: array must not contain infs or NaNs if delta is extremely small (e.g. delta=1e-310). Check your code and data to ensure that the values that you pass for delta are reasonable.
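A small self-contained sketch of that behaviour with synthetic data (not the asker's data); the second call should reproduce the error described above:
import numpy as np
from scipy.signal import savgol_filter

x = np.linspace(0.0, 1.0, 101)
y = np.sin(2 * np.pi * x)

ok = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=x[1] - x[0])  # works
bad = savgol_filter(y, window_length=11, polyorder=3, deriv=1, delta=1e-310)      # ValueError
In the question, delta comes from dy[0], so it is worth printing that value for the files that fail; if it is zero or denormally small, that would explain the error.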

declare a variable as *not* an integer in sage/maxima solve

I am trying to solve symbolically a simple equation for x:
solve(x^K + d == R, x)
I am declaring these variables and assumptions:
var('K, d, R')
assume(K>0)
assume(K, 'real')
assume(R>0)
assume(R<1)
assume(d<R)
assumptions()
> [K > 0, K is real, R > 0, R < 1, d < R]
Yet when I run the solve, I obtain the following error:
Error in lines 1-1
Traceback (most recent call last):
File "/projects/sage/sage-7.3/local/lib/python2.7/site-packages/smc_sagews/sage_server.py", line 957, in execute
exec compile(block+'\n', '', 'single') in namespace, locals
...
File "/projects/sage/sage-7.3/local/lib/python2.7/site-packages/sage/interfaces/interface.py", line 671, in __init__
raise TypeError(x)
TypeError: Computation failed since Maxima requested additional constraints; using the 'assume' command before evaluation may help (example of legal syntax is 'assume(K>0)', see assume? for more details)
Is K an integer?
Apparently, Maxima is asking whether K is an integer, even though I explicitly declared it 'real'!
How can I tell Maxima that it should not assume that K is an integer?
I am simply expecting (R-d)^(1/K) or exp(log(R-d)/K) as the answer.
The assumption framework in both Sage and Maxima is fairly weak, though in this case it doesn't matter, since integers are real numbers anyway.
However, you might want to try assume(K, 'noninteger'), because apparently Maxima does support this particular assumption (I had not seen it before). I can't try this right now, unfortunately; good luck!
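A minimal sketch of the full sequence with that extra declaration (untested here; in Sage the symbol x is predefined, so it needs no var call):
var('K, d, R')
assume(K > 0)
assume(K, 'real')
assume(K, 'noninteger')  # tell Maxima not to treat K as an integer
assume(R > 0)
assume(R < 1)
assume(d < R)
solve(x^K + d == R, x)
# the asker expects something like [x == (R - d)^(1/K)]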

float division by zero error related to ngram and nltk

My task is to use the 10-fold cross-validation method with uni-, bi- and trigrams on a corpus and compare their accuracy. However, I am stuck with a float division error. All of this code was given by the question setter except for the loop, so the error is probably there. Here we only use the first 1000 sentences to test the program; that line will be removed once I know the program runs.
import codecs
mypath = "/Users/myname/Desktop/"
corpusFile = codecs.open(mypath + "estonianSample.txt",mode="r",encoding="latin-1")
sentences = [[tuple(w.split("/")) for w in line[:-1].split()] for line in corpusFile.readlines()]
corpusFile.close()
from math import ceil
N=len(sentences)
chunkSize = int(ceil(N/10.0))
sentences = sentences[:1000]
chunks=[sentences[i:i+chunkSize] for i in range(0, N, chunkSize)]
for i in range(10):
    training = reduce(lambda x, y: x + y, [chunks[j] for j in range(10) if j != i])
    testing = chunks[i]
    from nltk import UnigramTagger, BigramTagger, TrigramTagger
    t1 = UnigramTagger(training)
    t2 = BigramTagger(training, backoff=t1)
    t3 = TrigramTagger(training, backoff=t2)
    t3.evaluate(testing)
This is what the error says:
runfile('/Users/myname/pythonhw3.py', wdir='/Users/myname')
Traceback (most recent call last):
File "<ipython-input-1-921164840ebd>", line 1, in <module>
runfile('/Users/myname/pythonhw3.py', wdir='/Users/myname')
File "/Users/myname/anaconda/lib/python2.7/site-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "/Users/myname/pythonhw3.py", line 34, in <module>
t3.evaluate(testing)
File "/Users/myname/anaconda/lib/python2.7/site-packages/nltk/tag/api.py", line 67, in evaluate
return accuracy(gold_tokens, test_tokens)
File "/Users/myname/anaconda/lib/python2.7/site-packages/nltk/metrics/scores.py", line 40, in accuracy
return float(sum(x == y for x, y in izip(reference, test))) / len(test)
ZeroDivisionError: float division by zero
The error occurs because the chunk you are evaluating on is empty: NLTK's accuracy divides by len(test), and for an empty test set that length is zero. Most likely this is because chunkSize is computed from the full corpus length N before sentences is truncated to 1000, which leaves every chunk after the first one empty.
The line specifically triggering the error is
t3.evaluate(testing)
What you can do instead is,
try:
    t3.evaluate(testing)
except ZeroDivisionError:
    # Do whatever you want it to do
    print(0)
It works on my end. Try it out!
The answer is four years later, but hopefully, a fellow net citizen can find this helpful.
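If the empty folds described above are indeed the cause, an alternative to swallowing the exception is to fix the chunking itself: compute the chunk size only after truncating the corpus, so every fold is non-empty. A minimal sketch using the names from the question:
from math import ceil

sentences = sentences[:1000]           # truncate first
N = len(sentences)
chunkSize = int(ceil(N / 10.0))        # chunk size now based on the truncated length
chunks = [sentences[i:i + chunkSize] for i in range(0, N, chunkSize)]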
