np.linalg.inv() leads to array full of np.nan

I am trying to calculate the inverse of a matrix via:
A = pd.read_csv("A_use.csv", header = None)
I = np.identity(len(A))
INV = np.linalg.inv(I - A)
However, the resulting array is full of np.nan.
I don't understand why that's the case.
I've tried to replace all np.nan values in A (although there shouldn't be any) via A[np.isnan(A)] = 0 but the problem persists.
Any suggestions?

Not all matrices have an inverse. A matrix has an inverse only if its determinant is non-zero.
Check first whether
np.linalg.det(I - A) != 0
If it's non-zero, then you should be able to do
np.linalg.inv(I - A)
Second, make sure I - A does not contain a single NaN value. If it does, computing its inverse will result in a matrix of NaN values.
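As a rough illustration of these two checks (not part of the original answer), here is a minimal sketch. The helper name is_safely_invertible is made up for this example, and it uses np.linalg.matrix_rank rather than comparing the determinant with zero, on the assumption that a floating-point determinant is rarely exactly 0 even for a singular matrix.

```python
import numpy as np

def is_safely_invertible(M):
    """Hypothetical helper: True if M is square, fully finite, and has full rank."""
    M = np.asarray(M, dtype=float)
    if M.ndim != 2 or M.shape[0] != M.shape[1]:
        return False
    # Any NaN or inf in the input makes the inverse meaningless.
    if not np.isfinite(M).all():
        return False
    # Full rank means the matrix is invertible (up to numerical tolerance).
    return bool(np.linalg.matrix_rank(M) == M.shape[0])

singular = np.array([[1.0, 2.0], [2.0, 4.0]])    # second row is 2x the first
regular = np.array([[2.0, 0.0], [0.0, 0.5]])
with_nan = np.array([[1.0, np.nan], [0.0, 1.0]])

print(is_safely_invertible(singular))
print(is_safely_invertible(regular))
print(is_safely_invertible(with_nan))
```

In the question's setting the argument would be I - A, converted to a NumPy array first.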

The problem is in A.
There could be nan values in the dataframe.
The matrix could be singular; please check that np.linalg.det(A) is not 0.
I would then pass np.linalg.inv a NumPy array, converted with pd.DataFrame.to_numpy (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_numpy.html)

The reason is an np.inf value in at least column 293. The np.inf value can be replaced via A[np.isinf(A)] = 0. If np.inf is replaced with zero, there are no np.nan values in L.
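For anyone trying to find such a value in their own data, a small sketch along these lines may help. The 3x3 frame is a made-up stand-in for A_use.csv, and replacing non-finite entries with 0 is only reasonable if a zero makes sense for your data. Note that np.where below also replaces NaN, not just inf.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the CSV data; column 2 contains an infinite entry.
A = pd.DataFrame([[0.1, 0.2, 0.0],
                  [0.0, 0.1, np.inf],
                  [0.3, 0.0, 0.2]])

values = A.to_numpy(dtype=float)

# Column positions that contain at least one NaN or +/-inf.
bad_columns = np.flatnonzero(~np.isfinite(values).all(axis=0))
print(bad_columns)

# Replace every non-finite entry with 0 before inverting I - A.
cleaned = np.where(np.isfinite(values), values, 0.0)
INV = np.linalg.inv(np.identity(len(cleaned)) - cleaned)
print(np.isfinite(INV).all())
```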

Related

Change 0 values to nan values in numpy array changes everything to nan

I have numpy array with the shape of (1212,2117).
The array contains pixels with value 0 or values that are greater than 0, and looks like this:
I want to give the 0 pixels value of no data. I have tried to do it this way:
arr=arr.astype('float')
arr[arr==0]=np.nan
The result seems to be a chart that is all NaN, with one little square:
plt.imshow(test)
However, it seems that all the values were changed, because when I check the max or min value of this array I get nan:
test.max()
>>>nan
test.min()
>>>nan
I would like to understand why I get this result and how can I correctly give no data values for pixels with value of 0.

You have the reason and solution in the docs (Notes section).
NaN values are propagated, that is if at least one item is NaN, the
corresponding max value will be NaN as well. To ignore NaN values
(MATLAB behavior), please use nanmax.
np.nanmax(arr)
# and
np.nanmin(arr)
Should give the expected result.
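To make the difference concrete, here is a tiny sketch with a made-up 2x3 array in place of the real image:

```python
import numpy as np

# Made-up pixel data; zeros stand for "no data".
arr = np.array([[0, 3, 7],
                [2, 0, 5]]).astype(float)
arr[arr == 0] = np.nan            # only the zero pixels become NaN

plain_max = arr.max()             # NaN propagates through max()
ignoring_nan_max = np.nanmax(arr) # NaN entries are skipped
ignoring_nan_min = np.nanmin(arr)

print(plain_max, ignoring_nan_max, ignoring_nan_min)
```

So the other values are still there; it is only max() and min() that report NaN as soon as a single NaN is present.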

Make a list
Iterate through every pixel, check if it is a nan, then append 0, if not then append the number.
np.array(your array)
Although it is unpythonic, it may get the job done.

Numpy element wise minimum for masked array return valid value instead of masked value

I'm looping over a large masked array along a specified axis, using element-wise comparison between the current index values and the previous minima.
Below is a simplified example to illustrate the problem. If one of the values is masked and the other is valid, numpy.minimum() returns a masked value. Is it possible to change this behaviour, such that the valid value is returned instead?
x = np.random.randn(5).round(2)
x[[2,4]] = np.nan
print(x)
print(np.minimum(x[-1], x[-2]))
y = np.ma.masked_invalid(x)
print(y)
print(np.minimum(y[-1], y[-2]))
>> [-1.21 2.02 nan -0.31 nan]
>> nan
>> [-1.21 2.02 -- -0.31 --]
>> --
Instead, I would like the return value to be -0.31.
In this simple example, I could use y.compressed() or something like that, but in my case, I have several dimensions, so that doesn't work.
Currently, I use a very high fill_value, which works because there's a physical constraint on the data, but I would like to have a more general solution.
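One possible approach, which does not come from the original thread: np.fmin is the NaN-ignoring counterpart of np.minimum, so for float data the masked entries can be filled with NaN and compared with np.fmin. This is only a sketch and assumes the data is floating point, so that NaN is available as a fill value.

```python
import numpy as np

x = np.array([-1.21, 2.02, np.nan, -0.31, np.nan])
y = np.ma.masked_invalid(x)

# np.minimum propagates NaN; np.fmin returns the non-NaN operand if there is one.
propagated = np.minimum(x[-1], x[-2])
ignored = np.fmin(x[-1], x[-2])

# For the masked array, fill the masked entries with NaN first.
filled = y.filled(np.nan)
ignored_masked = np.fmin(filled[-1], filled[-2])

print(propagated, ignored, ignored_masked)
```

Because np.fmin is an ordinary element-wise ufunc, the same call works on whole slices along any axis of a multi-dimensional array.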

sklearn error ValueError: Input contains NaN, infinity or a value too large for dtype('float64')

I am using sklearn and having a problem with the affinity propagation. I have built an input matrix and I keep getting the following error.
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
I have run
np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True
I tried using
mat[np.isfinite(mat) == True] = 0
to remove the infinite values but this did not work either.
What can I do to get rid of the infinite values in my matrix, so that I can use the affinity propagation algorithm?
I am using anaconda and python 2.7.9.

This might happen inside scikit, and it depends on what you're doing. I recommend reading the documentation for the functions you're using. You might be using one that depends, for example, on your matrix being positive definite, and your matrix may not fulfil that criterion.
EDIT: How could I miss that:
np.isnan(mat.any()) #and gets False
np.isfinite(mat.all()) #and gets True
is obviously wrong. Right would be:
np.any(np.isnan(mat))
and
np.all(np.isfinite(mat))
You want to check whether any of the elements are NaN, and not whether the return value of the any function is a number...
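A short sketch of why the two spellings give different answers, using a made-up 2x2 matrix that contains both NaN and inf:

```python
import numpy as np

mat = np.array([[1.0, np.nan],
                [np.inf, 4.0]])

# Wrong order: mat.any() / mat.all() collapse the array to one boolean first,
# so isnan / isfinite only look at that single True/False value.
wrong_nan_check = np.isnan(mat.any())
wrong_finite_check = np.isfinite(mat.all())

# Right order: test every element, then reduce.
has_nan = np.any(np.isnan(mat))
all_finite = np.all(np.isfinite(mat))

print(wrong_nan_check, wrong_finite_check)
print(has_nan, all_finite)
```

The first pair of checks reports a clean matrix even though it is not, which matches what the question observed.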

I got the same error message when using sklearn with pandas. My solution is to reset the index of my dataframe df before running any sklearn code:
df = df.reset_index()
I encountered this issue many times when I removed some entries in my df, such as
df = df[df.label=='desired_one']

This is my function (based on this) to clean the dataset of nan, Inf, and missing cells (for skewed datasets):
import pandas as pd
import numpy as np
def clean_dataset(df):
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)

In most cases, getting rid of infinite and null values solves this problem.
First, turn the infinite values into NaN:
df.replace([np.inf, -np.inf], np.nan, inplace=True)
Then get rid of the null values the way you like: a specific value such as 999, the mean, or your own function to impute missing values.
df.fillna(999, inplace=True)

This is the check on which it fails:
https://github.com/scikit-learn/scikit-learn/blob/0.17.X/sklearn/utils/validation.py#L51
Which says
def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
    # false positives from overflow in sum method.
    if (X.dtype.char in np.typecodes['AllFloat'] and not np.isfinite(X.sum())
            and not np.isfinite(X).all()):
        raise ValueError("Input contains NaN, infinity"
                         " or a value too large for %r." % X.dtype)
So make sure that your input contains no NaN values, that all the values are actually floats, and that none of them is Inf.

The Dimensions of my input array were skewed, as my input csv had empty spaces.

With this version of python 3:
/opt/anaconda3/bin/python --version
Python 3.6.0 :: Anaconda 4.3.0 (64-bit)
Looking at the details of the error, I found the lines of code causing the failure:
/opt/anaconda3/lib/python3.6/site-packages/sklearn/utils/validation.py in _assert_all_finite(X)
56 and not np.isfinite(X).all()):
57 raise ValueError("Input contains NaN, infinity"
---> 58 " or a value too large for %r." % X.dtype)
59
60
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
From this, I was able to extract the correct way to test what was going on with my data using the same test which fails given by the error message: np.isfinite(X)
Then with a quick and dirty loop, I was able to find that my data indeed contains nans:
print(p[:,0].shape)
index = 0
for i in p[:,0]:
    if not np.isfinite(i):
        print(index, i)
    index +=1
(367340,)
4454 nan
6940 nan
10868 nan
12753 nan
14855 nan
15678 nan
24954 nan
30251 nan
31108 nan
51455 nan
59055 nan
...
Now all I have to do is remove the values at these indexes.
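The same search, and the removal step, can also be done without a Python loop. This is a sketch with a small made-up array p standing in for the real data:

```python
import numpy as np

# Made-up stand-in for the real data; rows 1 and 3 have a bad first column.
p = np.array([[1.0, 0.5],
              [np.nan, 0.1],
              [3.0, 0.2],
              [np.inf, 0.3]])

# Row positions where the first column is NaN or infinite.
bad_rows = np.flatnonzero(~np.isfinite(p[:, 0]))
print(bad_rows)

# Keep only the rows in which every value is finite.
p_clean = p[np.isfinite(p).all(axis=1)]
print(p_clean.shape)
```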

None of the answers here worked for me. This was what worked.
Test_y = np.nan_to_num(Test_y)
It replaces infinite values with very large finite values and NaN values with zero.
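A quick sketch of what np.nan_to_num does with its default arguments, on a made-up array:

```python
import numpy as np

Test_y = np.array([1.0, np.nan, np.inf, -np.inf])
cleaned = np.nan_to_num(Test_y)

# NaN becomes 0.0; +inf and -inf become the largest and smallest finite float64.
print(cleaned)
print(np.isfinite(cleaned).all())
```

Bear in mind that the substituted values are extreme, so they can dominate a model fit even though the validation error goes away.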

I had the same error, and in my case X and y were dataframes so I had to convert them to matrices first:
X = X.values.astype(float)
y = y.values.astype(float)
Edit: The originally suggested X.as_matrix() is deprecated

The problem seems to occur in the DecisionTreeClassifier input check. Try:
X_train = X_train.replace((np.inf, -np.inf, np.nan), 0).reset_index(drop=True)

I had the error after trying to select a subset of rows:
df = df.reindex(index=my_index)
Turns out that my_index contained values that were not contained in df.index, so the reindex function inserted some new rows and filled them with nan.
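Here is a minimal sketch of that effect with a made-up frame. The names missing_labels and safe are only used for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"value": [10.0, 20.0, 30.0]}, index=[0, 1, 2])
my_index = [0, 2, 5]   # label 5 does not exist in df.index

# reindex silently adds a row for label 5 and fills it with NaN.
reindexed = df.reindex(index=my_index)
print(reindexed)

# Labels that would be invented by reindex.
missing_labels = pd.Index(my_index).difference(df.index)
print(list(missing_labels))

# Selecting only labels that really exist avoids the NaN row.
safe = df.loc[df.index.intersection(my_index)]
print(safe)
```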

Remove all infinite values:
(and replace with min or max for that column)
import numpy as np
# generate example matrix
matrix = np.random.rand(5,5)
matrix[0,:] = np.inf
matrix[2,:] = -np.inf
>>> matrix
array([[ inf, inf, inf, inf, inf],
[0.87362809, 0.28321499, 0.7427659 , 0.37570528, 0.35783064],
[ -inf, -inf, -inf, -inf, -inf],
[0.72877665, 0.06580068, 0.95222639, 0.00833664, 0.68779902],
[0.90272002, 0.37357483, 0.92952479, 0.072105 , 0.20837798]])
# find min and max values for each column, ignoring nan, -inf, and inf
mins = [np.nanmin(matrix[:, i][matrix[:, i] != -np.inf]) for i in range(matrix.shape[1])]
maxs = [np.nanmax(matrix[:, i][matrix[:, i] != np.inf]) for i in range(matrix.shape[1])]
# go through matrix one column at a time and replace + and -infinity
# with the max or min for that column
for i in range(matrix.shape[1]):
    matrix[:, i][matrix[:, i] == -np.inf] = mins[i]
    matrix[:, i][matrix[:, i] == np.inf] = maxs[i]
>>> matrix
array([[0.90272002, 0.37357483, 0.95222639, 0.37570528, 0.68779902],
[0.87362809, 0.28321499, 0.7427659 , 0.37570528, 0.35783064],
[0.72877665, 0.06580068, 0.7427659 , 0.00833664, 0.20837798],
[0.72877665, 0.06580068, 0.95222639, 0.00833664, 0.68779902],
[0.90272002, 0.37357483, 0.92952479, 0.072105 , 0.20837798]])

I found that after calling pct_change on a new column, NaN appeared in one of the rows. I removed the NaN row with the following code:
df = df.replace([np.inf, -np.inf], np.nan)
df = df.dropna()
df = df.reset_index()

I got the same error. It worked after calling df.fillna(-99999, inplace=True) before doing any replacement, substitution, etc.

I would like to propose a solution for numpy that worked well for me. The lines
import numpy as np
inputArray[inputArray == np.inf] = np.finfo(np.float64).max
substitute all positive infinite values of a numpy array with the maximum float64 number.

Puff !! In my case the problem was about NaN values...
You can list your columns that had NaN with this function
your_data.isnull().sum()
and then you can fill these NAN values in your dataset file.
Here is the code for how to "Replace NaN with zero and infinity with large finite numbers."
your_data[:] = np.nan_to_num(your_data)
(quoted from the numpy.nan_to_num documentation)

In my case the problem was that many scikit functions return numpy arrays, which are devoid of pandas index. So there was an index mismatch when I used those numpy arrays to build new DataFrames and then I tried to mix them with the original data.
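A small sketch of how this mismatch produces NaN. The predictions array is a made-up stand-in for something returned by a scikit-learn call.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"label": ["a", "b", "a", "b"], "x": [1.0, 2.0, 3.0, 4.0]})
subset = df[df.label == "b"]      # keeps the original index labels 1 and 3

# Stand-in for a plain NumPy result: it carries no pandas index.
predictions = np.array([0.1, 0.9])

# A fresh Series gets labels 0 and 1, so only label 1 lines up with subset.
misaligned = subset.assign(pred=pd.Series(predictions))
print(misaligned)

# Reusing the subset's own index keeps the rows aligned.
aligned = subset.assign(pred=pd.Series(predictions, index=subset.index))
print(aligned)
```

Calling reset_index(drop=True) on the filtered frame before combining is another way to get matching labels.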

dataset = dataset.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False)
This worked for me

I had the same issue, in my case the answer was simply that I had a cell in my CSV with no value ("x,y,z,,"). Putting a default value in fixed it for me.
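To see how an empty CSV cell turns into NaN, here is a sketch that reads a made-up two-line CSV from a string instead of a file:

```python
import io
import pandas as pd

# Second row has an empty cell in the middle column.
csv_text = "1,2,3\n4,,6\n"
A = pd.read_csv(io.StringIO(csv_text), header=None)

# Number of missing values per column.
nan_per_column = A.isnull().sum()
print(nan_per_column)
```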

Using isneginf may help.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.isneginf.html#numpy.isneginf
x[numpy.isneginf(x)] = 0 #0 is the value you want to replace with

Note: This solution only applies if you consciously want to keep NaN entries in your dataset.
This error happened to me when I was using some scikit-learn functionality (in my case, GridSearchCV). Under the hood I was using an xgboost XGBClassifier, which handles NaN data gracefully. However, GridSearchCV was using the sklearn.utils.validation module, which enforced the absence of missing data in the input by calling the _assert_all_finite function. This ultimately caused an error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float64')
Sidenote: _assert_all_finite accepts an allow_nan argument, which, if set to True, would not be causing issues. However, scikit-learn API does not allow us to have control over this argument.
Solution
My solution was to use the patch function from the mock module to silence _assert_all_finite so that it does not raise ValueError. Here is a snippet:
from unittest import mock
import sklearn
with mock.patch("sklearn.utils.validation._assert_all_finite"):
    ...  # your code that raises ValueError goes here
This replaces _assert_all_finite with a dummy mock function, so the check won't get executed.
Please note that patching is not a recommended practice and might result in unpredictable behaviour!
EDIT:
This Pull Request should resolve the issue (though the fix has not been released as of Jan 2022)

If you're running an estimator, it could be that your learning rate is too high. I passed in the wrong array to a grid search by accident and ended up training with a learning rate of 500, which I could see causing issues with the training process.
Basically it's not necessarily only your inputs that have to all be valid, but the intermediate data as well.

After a long time of dealing with this problem, I realized that it happens because, in some splits of the training and testing sets, there are columns whose value is the same for all rows. Some calculations in some algorithms may then lead to infinite results. If your data is ordered so that neighbouring rows are likely to be similar, shuffling the data can help. This is a bug with scikit. I'm using version 0.23.2.

If you happen to use the "kc_house_data.csv" dataset (which some commenters and many data-science newcomers seem to use, because it's presented in lots of popular course material), the data is faulty and the true source for the error.
To fix it, as of 2022:
Delete the last (empty) line in the csv file
There are two lines that contain one empty data value "x,x,,x,x" - to fix it, don't delete the comma, instead add a random integer value like 2000, so it looks like this "x,x,2000,x,x"
Don't forget to save and reload in your project.
All the other answers are helpful and correct, but not in this case:
If you use kc_house_data.csv you need to fix the data in the file, nothing else will help, the empty data field will shift the other data around randomly and generate weird bugs that are hard to track to the source!

In my case the algorithm required data to be between (0,1) noninclusive. My quite brutal solution was to add a small random number to all desired values:
y_train = pd.DataFrame(y_train).applymap(lambda x: x + np.random.rand()/100000.0)["col_name"]
y_train[y_train >= 1] = 0.999999
where y_train was originally in the range [0,1].
This is definitely not suitable for all cases, as you are messing with your input data but can be a solution if you have sparse data and only need a quick forecast

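Try
mat.sum()
If the sum of your data is infinite (greater than the maximum float value, which is about 3.4e+38 for float32 and 1.8e+308 for float64), the fast path in the check below fails and the slower element-wise test decides whether you get that error.
See the _assert_all_finite function in validation.py from the scikit source code: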
if is_float and np.isfinite(X.sum()):
    pass
elif is_float:
    msg_err = "Input contains {} or a value too large for {!r}."
    if (allow_nan and np.isinf(X).any() or
            not allow_nan and not np.isfinite(X).all()):
        type_err = 'infinity' if allow_nan else 'NaN, infinity'
        # print(X.sum())
        raise ValueError(msg_err.format(type_err, X.dtype))
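The two checks that appear in the quoted function can be tried out on their own. This sketch uses a made-up two-element array whose values are individually finite but whose sum overflows float64, which is the situation the element-wise np.isfinite(X).all() test is there to handle.

```python
import numpy as np

X = np.array([1e308, 1e308])      # each value fits in float64

# The sum exceeds the float64 range and overflows to inf.
with np.errstate(over="ignore"):
    total = X.sum()

sum_is_finite = np.isfinite(total)
all_values_finite = np.isfinite(X).all()

print(total, sum_is_finite, all_values_finite)
```

Running mat.sum() and np.isfinite(mat).all() side by side on your own matrix shows which of the two is failing.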

Numpy: multiplying with NaN values without using nan_to_num

I was able to optimise some operations in my program quite a bit using numpy. When I profiled a run, I noticed that most of the time is spent in numpy.nan_to_num. I'd like to improve this even further.
The sort of calculations occurring are multiplication of two arrays for which one of the arrays could contain nan values. I want these to be treated as zeros, but I can't initialise the array with zeros, as nan has a meaning later on and can't be set to 0. Is there a way of doing multiplications (and additions) with nan being treated as zero?
From the nan_to_num docstring, I can see a new array is produced which may explain why it's taking so long.
Replace nan with zero and inf with finite numbers.
Returns an array or scalar replacing Not a Number (NaN) with zero,...
A function like nansum for arbitrary arithmetic operations would be great.

Here's some example data:
import numpy as np
a = np.random.rand(1000, 1000)
a[a < 0.1] = np.nan # set some random values to nan
b = np.ones_like(a)
One option is to use np.where to set the value of the result to 0 wherever one of your arrays is equal to NaN:
result = np.where(np.isnan(a), 0, a * b)
If you have to do several operations on an array that contains NaNs, you might consider using masked arrays, which provide a more general method for dealing with missing or invalid values:
masked_a = np.ma.masked_invalid(a)
result2 = masked_a * b
Here, result2 is another np.ma.masked_array whose .mask attribute is set according to where the NaN values were in a. To convert this back to a normal np.ndarray with the masked values replaced by 0s, you can use the .filled() method, passing in the fill value of your choice:
result_filled = result2.filled(0)
assert np.all(result_filled == result)
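Another variation, not from the answer above, is to use the out and where arguments that NumPy ufuncs such as np.multiply accept. The product is then written only where a is not NaN, and the rest of a pre-zeroed output array is left untouched. Whether this is actually faster than np.where for a given workload is an assumption that should be checked by profiling.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((100, 100))
a[a < 0.1] = np.nan               # set some random values to NaN
b = np.ones_like(a)

# Start from zeros and compute a * b only at positions where a is valid.
result = np.zeros_like(a)
np.multiply(a, b, out=result, where=~np.isnan(a))

# a itself still contains its NaN markers for later use.
print(np.isnan(a).any(), np.isnan(result).any())
```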

setting null values in a numpy array

how do I null certain values in numpy array based on a condition?
I don't understand why I end up with 0 instead of null or empty values where the condition is not met... b is a numpy array populated with 0 and 1 values, c is another fully populated numpy array. All arrays are 71x71x166
a = np.empty(((71,71,166)))
d = np.empty(((71,71,166)))
for indexes, value in np.ndenumerate(b):
    i,j,k = indexes
    a[i,j,k] = np.where(b[i,j,k] == 1, c[i,j,k], d[i,j,k])
I want to end up with an array which only has values where the condition is met and is empty everywhere else but with out changing its shape
FULL ISSUE FOR CLARIFICATION as asked for:
I start with a float populated array with shape (71,71,166)
I make an int array based on a cutoff applied to the float array basically creating a number of bins, roughly marking out 10 areas within the array with 0 values in between
What I want to end up with is an array with shape (71,71,166) which has the average values in a particular array direction (assuming vertical direction, if you think of a 3D array as a 3D cube) of a certain "bin"...
so I was trying to loop through the "bins" b == 1, b == 2 etc, sampling the float where that condition is met but being null elsewhere so I can take the average, and then recombine into one array at the end of the loop....
Not sure if I'm making myself understood. I'm using the np.where and using the indexing as I keep getting errors when I try and do it without although it feels very inefficient.

Consider this example:
import numpy as np
data = np.random.random((4,3))
mask = np.random.randint(0, 2, (4, 3))
data[mask==0] = np.nan
The data will be set to nan wherever the mask is 0. You can use any kind of condition you want, of course, or do something different for different values in b.
To erase everything except a specific bin, try the following:
c[b!=1] = np.nan
So, to make a copy of everything in a specific bin:
a = np.copy(c)
a[b!=1] = np.nan
To get the average of everything in a bin:
np.mean(c[b==1])
So perhaps this might do what you want (where bins is a list of bin values):
a = np.empty(c.shape)
a[b==0] = np.nan
for bin in bins:
    a[b==bin] = np.mean(c[b==bin])

np.empty sometimes fills the array with 0's; the contents of an empty() array are undefined, so 0 is perfectly valid. For example, try this instead:
d = np.nan * np.empty((71, 71, 166))
But consider using numpy's strength, and don't iterate over the array:
a = np.where(b, c, d)
(since b is 0 or 1, I've excluded the explicit comparison b == 1.)
You may even want to consider using a masked array instead:
a = np.ma.masked_where(b, c)
which seems to make more sense with respect to your question: "how do I null certain values in a numpy array based on a condition" (replace null with mask and you're done).
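Putting the pieces together on a small made-up array (2x3x4 instead of 71x71x166), a sketch of the vectorised version might look like this. Here the mask condition is written as b != 1 so that everything outside bin 1 is hidden; that choice is an assumption about what the question wants.

```python
import numpy as np

c = np.arange(24, dtype=float).reshape(2, 3, 4)   # stand-in for the float data
b = np.zeros(c.shape, dtype=int)
b[0] = 1                                          # mark the first slab as bin 1

# Keep values where b == 1, NaN elsewhere; the shape is unchanged.
a = np.where(b == 1, c, np.nan)
bin_mean = np.nanmean(a)

# The masked-array equivalent hides everything outside bin 1.
masked = np.ma.masked_where(b != 1, c)

print(a.shape, bin_mean, masked.mean())
```

np.nanmean and the masked array's mean() also accept an axis argument, which is what you would use for averages along one direction of the cube.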
