Weird error I cannot get my head around in the pearsonr function - Python

My code is as follows:
predict_n = model.predict(x_test)
predict_n = predict_n.astype(np.float64)
corr_value, p_value = pearsonr(predict_n, y_test)
print(corr_value, round(p_value, 4))
print(esc('31;1;4') + "correlation:" + corr_value + " p_value:" + p_value)
fig = plt.figure(figsize=(30, 30))
plot_corr(val_d[:, i, j], predict_n[:, i, j], corrs[i, j])
When it hits the third line, it outputs this:
TypeError Traceback (most recent call last)
<ipython-input-18-b71eb959e83c> in <module>
66 predict_n=model.predict(x_test)
67 predict_n=predict_n.astype(np.float64)
---> 68 corr_value, p_value = pearsonr(predict_n, y_test)
69 print(corr_value,round(p_value,4))
70 print(esc('31;1;4') +"correlation:"+corr_value+" p_value:"+p_value)
~\Anaconda3\envs\deeplearning\lib\site-packages\scipy\stats\stats.py in pearsonr(x, y)
3517 return dtype(np.sign(x[1] - x[0])*np.sign(y[1] - y[0])), 1.0
3518
-> 3519 xmean = x.mean(dtype=dtype)
3520 ymean = y.mean(dtype=dtype)
3521
~\Anaconda3\envs\deeplearning\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims)
149 is_float16_result = True
150
--> 151 ret = umr_sum(arr, axis, dtype, out, keepdims)
152 if isinstance(ret, mu.ndarray):
153 ret = um.true_divide(
TypeError: No loop matching the specified signature and casting was found for ufunc add
From what I found on this site, this error has to do with variable types. That's why I added the second line above, to cast both arrays to float64. When I, for example, run:
print(np.shape(predict_n))
print(np.shape(y_test))
print(predict_n.dtype)
print(y_test.dtype)
I get this output:
(367, 100, 1)
(367, 100, 1)
float64
float64
Can anyone please help me figure this out?

The shapes of your inputs are (367, 100, 1). pearsonr requires the inputs to be 1-d arrays¹. Unfortunately, that cryptic error message provides no help for figuring out what is wrong!
If your intent is to treat each input as a 1-d sequence of 36700 values, you can use pearsonr(predict_n.ravel(), y_test.ravel()).
If you expected pearsonr to implicitly loop over one of the dimensions, you'll have to write your own loop to do that.
¹ Eventually pearsonr will be enhanced with an axis argument, but for now, its inputs must be 1-d.
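A minimal sketch of such a per-slice loop, assuming you want one correlation per position along the second axis (the one of length 100); predict_n and y_test come from the question, everything else is illustrative:

import numpy as np
from scipy.stats import pearsonr

# predict_n and y_test both have shape (367, 100, 1)
n_steps = predict_n.shape[1]
corrs = np.empty(n_steps)
pvals = np.empty(n_steps)
for j in range(n_steps):
    # slice out the 367 samples at step j as 1-d arrays
    corrs[j], pvals[j] = pearsonr(predict_n[:, j, 0], y_test[:, j, 0])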

Related

SciPy - ValueError: all the input array dimensions for the concatenation axis must match exactly

So I'm working on a problem where I simply need to use SciPy to perform linear regression and get the weights and the statistics on those weights, but I'm getting the error
"ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 11 and the array at index 1 has size 100"
The code is simply:
from scipy import stats
x = x_copy
y = y_copy
stats.linregress(x, y)
Here x is a pandas DataFrame and y is a NumPy array.
Checking x.shape and y.shape, x is (100, 11) and y is (100,). Running the exact same matrices through np.linalg.lstsq and sklearn.linear_model.LinearRegression works fine and outputs the weights, but as far as I'm aware I need SciPy to get the statistics on the weights themselves.
I've also checked x.dtypes and all variables are float64, and y.dtype also returns float64. I've also tried replacing x in the regression call with x.to_numpy() in case there was something with the headers/index, but I received the same issue.
Any suggestions?
Edit:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_28012/197024375.py in <module>
4 y = y_copy
5
----> 6 stats.linregress(x.values, y)
7
8 x.values.shape
~\anaconda3\lib\site-packages\scipy\stats\_stats_mstats_common.py in linregress(x, y, alternative)
153 # ssxm = mean( (x-mean(x))^2 )
154 # ssxym = mean( (x-mean(x)) * (y-mean(y)) )
--> 155 ssxm, ssxym, _, ssym = np.cov(x, y, bias=1).flat
156
157 # R-value
<__array_function__ internals> in cov(*args, **kwargs)
~\anaconda3\lib\site-packages\numpy\lib\function_base.py in cov(m, y, rowvar, bias, ddof, fweights, aweights, dtype)
2426 if not rowvar and y.shape[0] != 1:
2427 y = y.T
-> 2428 X = np.concatenate((X, y), axis=0)
2429
2430 if ddof is None:
<__array_function__ internals> in concatenate(*args, **kwargs)
ValueError: all the input array dimensions for the concatenation axis must match exactly, but along dimension 1, the array at index 0 has size 11 and the array at index 1 has size 100
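scipy.stats.linregress performs simple linear regression on two 1-d sequences of measurements, which is why the (100, 11) x fails: internally np.cov(x, y) tries to stack the two inputs and hits the shape mismatch above. For statistics on the weights of a multiple regression, statsmodels is one option; a minimal sketch, assuming x and y as described in the question:

import statsmodels.api as sm

X_design = sm.add_constant(x)       # prepend an intercept column
result = sm.OLS(y, X_design).fit()  # ordinary least squares
print(result.summary())             # weights, standard errors, t-stats, p-values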

Getting an error with shap plotting

X = df.copy()
# Save and drop labels
y = df['class']
X = X.drop('class', axis=1)
cat_features = list(range(0, X.shape[1]))
model = CatBoostClassifier(iterations=2000, learning_rate=0.1, random_seed=12)
model.fit(X, y, verbose=False, plot=False)
explainer = shap.Explainer(model)
shap_values = explainer(X)
shap.force_plot(explainer.expected_value, shap_values[0:5,:],X.iloc[0:5,:], plot_cmap="DrDb")
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-170-ba1eca12b9ed> in <module>
----> 1 shap.force_plot(10, shap_values[0:5,:],X.iloc[0:5,:], plot_cmap="DrDb")
~\anaconda3\lib\site-packages\shap\plots\_force.py in force(base_value, shap_values, features, feature_names, out_names, link, plot_cmap, matplotlib, show, figsize, ordering_keys, ordering_keys_time_format, text_rotation, contribution_threshold)
101
102 if type(shap_values) != np.ndarray:
--> 103 return visualize(shap_values)
104
105 # convert from a DataFrame or other types
~\anaconda3\lib\site-packages\shap\plots\_force.py in visualize(e, plot_cmap, matplotlib, figsize, show, ordering_keys, ordering_keys_time_format, text_rotation, min_perc)
343 return AdditiveForceArrayVisualizer(e, plot_cmap=plot_cmap, ordering_keys=ordering_keys, ordering_keys_time_format=ordering_keys_time_format)
344 else:
--> 345 assert False, "visualize() can only display Explanation objects (or arrays of them)!"
346
347 class BaseVisualizer:
AssertionError: visualize() can only display Explanation objects (or arrays of them)!
I was trying to plot with shap and my data, but I got an error and I don't actually understand why. I haven't found anything about this. Could someone explain how to avoid this error?
explainer.expected_value
-5.842052267820879
You should change the last line to this: shap.force_plot(explainer.expected_value, shap_values.values[0:5, :], X.iloc[0:5, :], plot_cmap="DrDb")
Call shap_values.values instead of just shap_values, because shap_values is an Explanation object that holds the Shapley values, the base values, and the data. I had the same problem until I inspected the variable.
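For reference, a quick way to do that inspection (attribute names as in recent shap versions):

print(type(shap_values))         # an Explanation object, not a plain array
print(shap_values.values.shape)  # the raw SHAP values that force_plot expects
print(shap_values.base_values)   # per-sample expected values
print(shap_values.data)          # the underlying feature values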

How to use the cov function on the iris dataset in Python

I want to get the covariance matrix from the iris data set, https://www.kaggle.com/jchen2186/machine-learning-with-iris-dataset/data
I am using NumPy and the function np.cov(iris):
with open("Iris.csv") as iris:
reader = csv.reader(iris)
data = []
next(reader)
for row in reader:
data.append(row)
for i in data:
i.pop(0)
i.pop(4)
iris = np.array(data)
np.cov(iris)
And I get this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-4-bfb836354075> in <module>
----> 1 np.cov(iris)
D:\Anaconda\lib\site-packages\numpy\lib\function_base.py in cov(m, y, rowvar, bias, ddof, fweights, aweights)
2300 w *= aweights
2301
-> 2302 avg, w_sum = average(X, axis=1, weights=w, returned=True)
2303 w_sum = w_sum[0]
2304
D:\Anaconda\lib\site-packages\numpy\lib\function_base.py in average(a, axis, weights, returned)
354
355 if weights is None:
--> 356 avg = a.mean(axis)
357 scl = avg.dtype.type(a.size/avg.size)
358 else:
D:\Anaconda\lib\site-packages\numpy\core\_methods.py in _mean(a, axis, dtype, out, keepdims)
73 is_float16_result = True
74
---> 75 ret = umr_sum(arr, axis, dtype, out, keepdims)
76 if isinstance(ret, mu.ndarray):
77 ret = um.true_divide(
TypeError: cannot perform reduce with flexible type
I don't understand what this means.
That error means the array holds strings, not numbers: csv.reader yields strings, so np.array(data) has a string ("flexible") dtype that np.cov cannot sum over. If you want to keep your code, cast first, e.g. iris = np.array(data).astype(float). You could also read Iris.csv with the pandas.read_csv function and then select the appropriate columns, as sketched below.
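A minimal sketch of that pandas route, assuming the Kaggle file's Id and Species column names:

import numpy as np
import pandas as pd

df = pd.read_csv("Iris.csv")
# keep only the four numeric measurement columns
X = df.drop(columns=["Id", "Species"]).to_numpy(dtype=float)
np.cov(X, rowvar=False)  # 4x4 covariance matrix of the features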
Alternatively, here is a small set of commands to ease this task. They use scikit-learn and NumPy to load the iris dataset, obtain X and y, and compute the covariance matrix:
from sklearn.datasets import load_iris
import numpy as np

data = load_iris()
X = data['data']
y = data['target']
np.cov(X, rowvar=False)  # variables are the columns here, so set rowvar=False
Hope this has helped.

Using numpy.min to get the minimum value in a float64 numpy array, but getting the error 'numpy.float64' object cannot be interpreted as an index

I was trying to get the minimum value between tr_loss and val_loss using numpy.min:
np.min(np.min(tr_loss), np.min(val_loss))
tr_loss and val_loss are NumPy arrays returned from model.fit in Keras:
'tr_loss': [0.84579472304284575, 0.77913203762769701, 0.76625978895127778, 0.75814685845822094, 0.75486504282504319, 0.74989902700781819, 0.74833822523653504, 0.74695981823652979, 0.74483485338091848, 0.74150521695762872]
'val_loss': [0.76307238261103627, 0.75163262798049202, 0.74257619685517573, 0.75038179922993964, 0.72936564083517463, 0.73233943380595634, 0.72518632964207708, 0.74037907492741795, 0.7237680551772061, 0.73257833277079065]
But I keep getting this error
TypeError Traceback (most recent call last)
<ipython-input-35-e82cb24a3b5d> in <module>()
3
4
----> 5 y_ax_min = np.min(np.min(tr_loss), np.min(val_loss)) - .1
6 y_ax_max = np.max(np.max(tr_loss), np.max(val_loss)) + .1
7 plt.figure(figsize=(8, 8),dpi=500)
D:\Anaconda\envs\py27\lib\site-packages\numpy\core\fromnumeric.pyc in
amin(a, axis, out, keepdims)
2347 pass
2348 else:
-> 2349 return amin(axis=axis, out=out, **kwargs)
2350
2351 return _methods._amin(a, axis=axis,
D:\Anaconda\envs\py27\lib\site-packages\numpy\core\_methods.pyc in _amin(a,
axis, out, keepdims)
27
28 def _amin(a, axis=None, out=None, keepdims=False):
---> 29 return umr_minimum(a, axis, None, out, keepdims)
30
31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
TypeError: 'numpy.float64' object cannot be interpreted as an index
Does anyone know where the problem is?
When comparing two scalar values, numpy.minimum should be used instead of numpy.min. The second positional argument of np.min is axis, so np.min(np.min(tr_loss), np.min(val_loss)) passes the float np.min(val_loss) as an axis index, which raises the TypeError above.
The comments on this question helped me. Thanks to all those who commented.
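A minimal sketch of the fix, reusing the names from the question:

import numpy as np

# np.minimum compares its two arguments element-wise (here, two scalars),
# rather than misreading the second one as an axis index
y_ax_min = np.minimum(np.min(tr_loss), np.min(val_loss)) - .1
y_ax_max = np.maximum(np.max(tr_loss), np.max(val_loss)) + .1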

LDA python library not taking sparse matrix as input

I am trying to use the lda 1.0.2 package for Python.
The documentation says that sparse matrices are acceptable, but when I pass a sparse matrix to the transform() function, it throws the error:
The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all().
The transform() function works fine with a dense matrix.
Has anybody else faced a similar problem?
Any help would be great! Thanks in advance :)
I just got the same error. To reproduce:
from scipy.sparse import csr_matrix
import lda
X = csr_matrix([[1,0],[0,1]])
lda_test = lda.LDA(n_topics=2, n_iter=10)
lda_test.fit(X)
X_trans = lda_test.transform(X)
Which produces the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-192-a1a0875bac02> in <module>()
5 lda_test = lda.LDA(n_topics=2, n_iter=10)
6 lda_test.fit(X)
----> 7 X_trans = lda_test.transform(X)
C:\Users\lidw6lw\PortablePython\App\lib\site-packages\lda\lda.pyc in transform(self, X, max_iter, tol)
173 n_topics = len(self.components_)
174 doc_topic = np.empty((len(X), n_topics))
--> 175 WS, DS = lda.utils.matrix_to_lists(X)
176 # TODO: this loop is parallelizable
177 for d in range(len(X)):
C:\Users\lidw6lw\PortablePython\App\lib\site-packages\lda\utils.pyc in matrix_to_lists(doc_word)
44 if np.count_nonzero(doc_word.sum(axis=1)) != doc_word.shape[0]:
45 logger.warning("all zero row in document-term matrix found")
---> 46 if np.count_nonzero(doc_word.sum(axis=0)) != doc_word.shape[1]:
47 logger.warning("all zero column in document-term matrix found")
48 sparse = True
C:\Users\lidw6lw\PortablePython\App\lib\site-packages\numpy\core\_methods.pyc in _sum(a, axis, dtype, out, keepdims)
23 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
24 return um.add.reduce(a, axis=axis, dtype=dtype,
---> 25 out=out, keepdims=keepdims)
26
27 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):
C:\Users\lidw6lw\PortablePython\App\lib\site-packages\scipy\sparse\base.pyc in __bool__(self)
181 return True if self.nnz == 1 else False
182 else:
--> 183 raise ValueError("The truth value of an array with more than one "
184 "element is ambiguous. Use a.any() or a.all().")
185 __nonzero__ = __bool__
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all().
Looks like it's due to lda.utils.matrix_to_lists
Both of the below work just fine:
X_trans = lda_test.fit(X.toarray())
X_trans2 = lda_test.fit_transform(X)
EDIT: It's actually the transform function that doesn't account for sparse matrices properly. Make a copy of the package, and in the code for transform just replace len(X) with X.shape[0] and comment out the np.atleast_2d(X) line. So the section right below the docstring in transform looks like this:
# X = np.atleast_2d(X)
phi = self.components_
alpha = self.alpha
# for debugging, let's not worry about the documents
n_topics = len(self.components_)
doc_topic = np.empty((X.shape[0], n_topics))
WS, DS = lda.utils.matrix_to_lists(X)
# TODO: this loop is parallelizable
for d in range(X.shape[0]):
I got a similar error recently:
ValueError: expected sparse matrix with integer values, found float values
This fixed the issue:
model.fit(X.toarray().astype(int))
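If converting a large document-term matrix to dense is too costly, casting the sparse matrix itself to an integer dtype may also satisfy that check; a hedged sketch, not verified against lda 1.0.2:

import numpy as np

# scipy sparse matrices support astype, so the values can be cast to integers
# without densifying; that lda then accepts the sparse input is an assumption
model.fit(X.astype(np.int64))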
