How do I remove NaN values from a NumPy array?

How do I remove NaN values from a NumPy array?
[1, 2, NaN, 4, NaN, 8] ⟶ [1, 2, 4, 8]

To remove NaN values from a NumPy array x:
x = x[~numpy.isnan(x)]
Explanation
The inner function numpy.isnan returns a boolean/logical array which has the value True everywhere that x is not-a-number. Since we want the opposite, we use the logical-not operator ~ to get an array with Trues everywhere that x is a valid number.
Lastly, we use this logical array to index into the original array x, in order to retrieve just the non-NaN values.
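A minimal sketch of the steps above, using the values from the question. The names nan_mask and cleaned are only illustrative and do not come from the answer:

```python
import numpy as np

x = np.array([1, 2, np.nan, 4, np.nan, 8])

# Boolean array that is True wherever x is NaN
nan_mask = np.isnan(x)

# Invert the mask and use it to select only the non-NaN entries
cleaned = x[~nan_mask]

print(cleaned)  # [1. 2. 4. 8.]
```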

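filter(lambda v: v == v, x)
works for both lists and NumPy arrays,
since v != v is true only for NaN. In Python 3, filter returns an iterator, so wrap the call in list(...) if you need a list.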
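A short sketch of that self-comparison trick on both a list and an array, assuming Python 3 (the variable names are made up for the example):

```python
import numpy as np

values = [1.0, 2.0, float("nan"), 4.0]

# NaN is the only float that is not equal to itself, so v == v drops it.
# filter() is lazy in Python 3, so list() is used to materialize the result.
kept_from_list = list(filter(lambda v: v == v, values))

# The same predicate works element by element on a 1-D NumPy array
arr = np.array(values)
kept_from_array = list(filter(lambda v: v == v, arr))
```

For large arrays the boolean-mask approach is likely to be faster, since this version calls a Python function per element.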

For me, the answer by @jmetz didn't work; however, using pandas isnull() did.
x = x[~pd.isnull(x)]

Try this:
import math
print([value for value in x if not math.isnan(value)])
For more, read up on list comprehensions.

@jmetz's answer is probably the one most people need; however, it yields a one-dimensional array, which makes it unusable for removing entire rows or columns from matrices.
To do that, reduce the logical array to one dimension, then use it to index the target array. For instance, the following removes every row that has at least one NaN value:
x = x[~numpy.isnan(x).any(axis=1)]
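A minimal sketch of the same reduction applied to rows and, by switching the axis, to columns. The matrix m is invented for the illustration:

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [np.nan, 5.0, 6.0],
              [7.0, 8.0, np.nan]])

nan_mask = np.isnan(m)

# Keep rows without any NaN: reduce across the columns of each row (axis=1)
rows_kept = m[~nan_mask.any(axis=1)]

# Keep columns without any NaN: reduce across the rows of each column (axis=0)
cols_kept = m[:, ~nan_mask.any(axis=0)]
```

Here rows_kept holds only the first row and cols_kept only the middle column.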

As shown by others
x[~numpy.isnan(x)]
works, but it raises an error if the NumPy dtype is not a native numeric type, for example if it is object. In that case you can use pandas:
x[~pandas.isna(x)] or x[~pandas.isnull(x)]
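A small sketch of the object-dtype case, assuming pandas is installed. The array contents are made up for the example:

```python
import numpy as np
import pandas as pd

# An object array mixing numbers, None, NaN and a string
obj = np.array([1.0, None, np.nan, "a"], dtype=object)

# np.isnan has no implementation for object dtype and raises TypeError
try:
    np.isnan(obj)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# pd.isna treats both None and NaN as missing and works on object arrays
cleaned = obj[~pd.isna(obj)]
```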

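If you're using numpy:
# first build a boolean mask that is True where the values are finite
ii = np.isfinite(x)
# then select those values
x = x[ii]
Note that np.isfinite also excludes positive and negative infinity, not only NaN.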
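To make the difference between the two masks concrete, here is a minimal comparison on an invented array that contains infinities as well as NaN:

```python
import numpy as np

x = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])

# Removes NaN only; the infinities are kept
only_nan_removed = x[~np.isnan(x)]

# Removes NaN and both infinities
finite_only = x[np.isfinite(x)]
```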

The accepted answer changes shape for 2d arrays.
I present a solution here, using the Pandas dropna() functionality.
It works for 1D and 2D arrays. In the 2D case you can choose whether to drop the row or the column containing np.nan.
import pandas as pd
import numpy as np
def dropna(arr, *args, **kwarg):
    assert isinstance(arr, np.ndarray)
    dropped = pd.DataFrame(arr).dropna(*args, **kwarg).values
    if arr.ndim == 1:
        dropped = dropped.flatten()
    return dropped
x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan ,1700])
y = np.array([[1400, 1500, 1600], [np.nan, 0, np.nan] ,[1700,1800,np.nan]] )
print('='*20+' 1D Case: ' +'='*20+'\nInput:\n',x,sep='')
print('\ndropna:\n',dropna(x),sep='')
print('\n\n'+'='*20+' 2D Case: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna (rows):\n',dropna(y),sep='')
print('\ndropna (columns):\n',dropna(y,axis=1),sep='')
print('\n\n'+'='*20+' y[np.logical_not(np.isnan(y))] for 2D: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna:\n',y[np.logical_not(np.isnan(y))],sep='')
Result:
==================== 1D Case: ====================
Input:
[1400. 1500. 1600. nan nan nan 1700.]
dropna:
[1400. 1500. 1600. 1700.]
==================== 2D Case: ====================
Input:
[[1400. 1500. 1600.]
[ nan 0. nan]
[1700. 1800. nan]]
dropna (rows):
[[1400. 1500. 1600.]]
dropna (columns):
[[1500.]
[ 0.]
[1800.]]
==================== y[np.logical_not(np.isnan(y))] for 2D: ====================
Input:
[[1400. 1500. 1600.]
[ nan 0. nan]
[1700. 1800. nan]]
dropna:
[1400. 1500. 1600. 0. 1700. 1800.]

When doing the above:
x = x[~numpy.isnan(x)]
or
x = x[numpy.logical_not(numpy.isnan(x))]
I found that assigning the result back to the same variable (x) did not remove the NaN values, so I had to use a different variable. Assigning to a different variable removed the NaNs.
e.g.
y = x[~numpy.isnan(x)]
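For comparison, a minimal sketch suggesting that assigning back to the same name does take effect in the plain case. One possible explanation for the behaviour described above (an assumption, not something the answer states) is that a second name still referred to the original array:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
alias = x  # hypothetical second name bound to the original array

# Boolean indexing builds a new array; x is rebound to it
x = x[~np.isnan(x)]

print(np.isnan(x).any())      # False
print(np.isnan(alias).any())  # True, the original array is unchanged
```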

In case it helps, for simple 1D arrays:
>>> x = np.array([np.nan, 1, 2, 3, 4])
>>> x[~np.isnan(x)]
array([1., 2., 3., 4.])
but if you want to extend this to matrices and keep a 2D result by dropping whole rows:
>>> x = np.array([
...     [np.nan, np.nan],
...     [np.nan, 0],
...     [1, 2],
...     [3, 4]
... ])
>>> x[~np.isnan(x).any(axis=1)]
array([[1., 2.],
       [3., 4.]])
I encountered this issue when dealing with pandas .shift() functionality, and I wanted to avoid using .apply(..., axis=1) at all cost due to its inefficiency.

Simply fill the NaN positions with a value instead:
x = numpy.array([
    [0.99929941, 0.84724713, -0.1500044],
    [-0.79709026, numpy.nan, -0.4406645],
    [-0.3599013, -0.63565744, -0.70251352]])
x[numpy.isnan(x)] = .555
print(x)
# [[ 0.99929941 0.84724713 -0.1500044 ]
# [-0.79709026 0.555 -0.4406645 ]
# [-0.3599013 -0.63565744 -0.70251352]]

pandas provides missing-value handling that works across data types; see the user guide:
https://pandas.pydata.org/docs/user_guide/missing_data.html
The np.isnan() function is not compatible with all data types, e.g.
>>> import numpy as np
>>> values = [np.nan, "x", "y"]
>>> np.isnan(values)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The pd.isna() and pd.notna() functions are compatible with many data types and pandas introduces a pd.NA value:
>>> import numpy as np
>>> import pandas as pd
>>> values = pd.Series([np.nan, "x", "y"])
>>> values
0 NaN
1 x
2 y
dtype: object
>>> values.loc[pd.isna(values)]
0 NaN
dtype: object
>>> values.loc[pd.isna(values)] = pd.NA
>>> values.loc[pd.isna(values)]
0 <NA>
dtype: object
>>> values
0 <NA>
1 x
2 y
dtype: object
#
# using map with lambda, or a list comprehension
#
>>> values = [np.nan, "x", "y"]
>>> list(map(lambda x: pd.NA if pd.isna(x) else x, values))
[<NA>, 'x', 'y']
>>> [pd.NA if pd.isna(x) else x for x in values]
[<NA>, 'x', 'y']

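The simplest way, if you want to replace NaN values rather than remove them, is:
numpy.nan_to_num(x)
This returns an array of the same shape with each NaN replaced by 0.0.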
Documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html
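A short sketch of what nan_to_num does on an invented array. The nan keyword used in the second call is, as far as I know, available from NumPy 1.17 onward:

```python
import numpy as np

x = np.array([1.0, np.nan, 4.0])

# Default behaviour: NaN becomes 0.0 and the length is unchanged
filled = np.nan_to_num(x)

# A custom fill value can be passed with the nan keyword
filled_custom = np.nan_to_num(x, nan=-1.0)
```

Since the array keeps its length, this is a fill rather than a removal; use the boolean mask when the NaN entries should actually disappear.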

Related

Get values of pandas series from an array of index locations

I have a 2-D array of index locations into a pandas Series. I would like to create a 2-D array of the values from the Series that correspond to those locations.
For example:
import pandas as pd
import numpy as np
A = pd.Series(data=[1,2,3,4,5])
idx = np.array([[0,2,3],[2,3,1]])
Would like to return:
B = np.array([[1,3,4],[3,4,2]])
I know I could do this as a loop:
B = np.zeros((2,3))
for i in [0,1]:
    B[i,:] = A[idx[i]]
However, in practice I need to do this repeatedly, so I would like to broadcast the index locations directly. Pandas is not necessary; I'm happy to do it all in numpy if that is easier.

Something like this might work:
A[idx.flatten()].values.reshape(idx.shape)

A[idx] gives a Cannot index with multidimensional key error.
In [190]: A = pd.Series(data=[1,2,3,4,5])
...: idx = np.array([[0,2,3],[2,3,1]])
But the 1-D array derived from the Series can be indexed this way:
In [191]: A.values
Out[191]: array([1, 2, 3, 4, 5])
In [192]: A.values[idx]
Out[192]:
array([[1, 3, 4],
[3, 4, 2]])
numpy has no problems returning an array with a dimension that matches idx.
Indexing the Series like this returns a Series - which by definition is 1d:
In [194]: A[idx.ravel()]
Out[194]:
0 1
2 3
3 4
2 3
3 4
1 2
dtype: int64
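Putting the two observations together, a minimal sketch that produces the requested 2-D array in one step. It uses to_numpy() in place of .values and assumes the index array holds positions, which coincide with the labels here because the Series has the default integer index:

```python
import numpy as np
import pandas as pd

A = pd.Series(data=[1, 2, 3, 4, 5])
idx = np.array([[0, 2, 3], [2, 3, 1]])

# Index the underlying 1-D array with the 2-D integer array;
# NumPy returns a result with the same shape as idx
B = A.to_numpy()[idx]
```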

Find all NaN slice in numpy array

I have a four-dimensional NumPy ndarray (time, pressure level, latitude, longitude), and I want to check for each time and pressure level (dimensions 0 and 1) whether there is an all-NaN slice along the latitude or longitude dimensions (2 and 3).
I'd like to do it in a vectorized way, so without looping over the array, but I can't figure out how.
import numpy as np
a=np.ones([2,3,5,5])
a[0,2,:,2]=np.nan*np.ones_like(a[0,2,:,2])
a[0,1,1,:]=np.nan*np.ones_like(a[0,1,1,:])
a[0,0,1,2]=np.nan
a[1,1,:,2]=np.nan*np.ones_like(a[0,2,:,2])
a[1,1,1,:]=np.nan*np.ones_like(a[0,1,1,:])
print(a)
The array now holds ones (i.e. numbers), and in some locations slices of only NaNs. I'd like to know these locations. So in this case, I need to find that the NaN slices are at [0,2,:,2], [0,1,1,:], [1,1,:,2], and [1,1,1,:].

You should use the np.isnan function, which creates a boolean array of the same shape as your original array. Then use boolean reduction operations like np.all. The following code stores in idx the indices of the rows (reduced with axis=1) in which all the elements are np.nan.
arr = np.array([[0, 0, 0], [np.nan, np.nan, np.nan], [1, np.nan, 1]])
arr_isnan = np.isnan(arr)
idx = np.argwhere(arr_isnan.all(axis=1))
Output:
>>>print(idx)
[[1]]
Following your example, this method gives this output:
arr_isnan = np.isnan(a)
idx = np.argwhere(arr_isnan.all(axis=2))
>>>print(idx) #[0,2,:,2] and [1,1,:,2] because axis=2
array([[0, 2, 2],
[1, 1, 2]], dtype=int64)
>>>print(a[idx[:,0], idx[:,1], :, idx[:,2]])
[[nan nan nan nan nan]
[nan nan nan nan nan]]
So you just have to adjust the position of ":" according to the axis.
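As a sketch of that adjustment, the same reduction can be run once per axis on the array from the question to locate all four slices. The result names all_nan_lat and all_nan_lon are made up for the example:

```python
import numpy as np

a = np.ones([2, 3, 5, 5])
a[0, 2, :, 2] = np.nan
a[0, 1, 1, :] = np.nan
a[0, 0, 1, 2] = np.nan
a[1, 1, :, 2] = np.nan
a[1, 1, 1, :] = np.nan

nan_mask = np.isnan(a)

# All-NaN along latitude (axis 2): each row is (time, level, longitude)
all_nan_lat = np.argwhere(nan_mask.all(axis=2))

# All-NaN along longitude (axis 3): each row is (time, level, latitude)
all_nan_lon = np.argwhere(nan_mask.all(axis=3))
```

The single NaN at a[0,0,1,2] is not reported by either reduction, because the rest of its slice holds valid numbers.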

Inconsistent behavior of jitted function

I have a very simple function like this one:
import numpy as np
from numba import jit
import pandas as pd
@jit
def f_(n, x, y, z):
    for i in range(n):
        z[i] = x[i] * y[i]
f_(df.shape[0], df["x"].values, df["y"].values, df["z"].values)
To which I pass
df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 4, 5], "z": np.nan})
I expected the function to modify the z column in place, like this:
>>> f_(df.shape[0], df["x"].values, df["y"].values, df["z"].values)
>>> df
x y z
0 1 3 3.0
1 2 4 8.0
2 3 5 15.0
This works fine most of the time, but somehow fails to modify the data in other cases.
I double-checked things and:
I haven't found any problems with the data points that could cause this.
I see that the data is modified as expected when I print the result.
If I return the z array from the function, it is modified as expected.
Unfortunately I couldn't reduce the problem to a minimal reproducible case. For example, removing unrelated columns seems to "fix" the problem, making reduction impossible.
Am I using jit in a way it is not intended to be used? Are there any edge cases I should be aware of? Or is it likely to be a bug?
Edit:
I found the source of the problem. It occurs when data contains duplicated column names:
>>> df_ = pd.read_json('{"schema": {"fields":[{"name":"index","type":"integer"},{"name":"v","type":"integer"},{"name":"y","type":"integer"},
... {"name":"v","type":"integer"},{"name":"x","type":"integer"},{"name":"z","type":"number"}],"primaryKey":["index"],"pandas_version":"0.20.
... 0"}, "data": [{"index":0,"v":0,"y":3,"v":0,"x":1,"z":null}]}', orient="table")
>>> f_(df_.shape[0], df_["x"].values, df_["y"].values, df_["z"].values)
>>> df_
v y v x z
0 0 3 0 1 NaN
If the duplicate is removed, the function works as expected:
>>> df_.drop("v", axis="columns", inplace=True)
>>> f_(df_.shape[0], df_["x"].values, df_["y"].values, df_["z"].values)
>>> df_
y x z
0 3 1 3.0

Ah, that's because in your "failing case" the df["z"].values returns a copy of what is stored in the 'z' column of df. It has nothing to do with the numba function:
>>> import pandas as pd
>>> import numpy as np
>>> df = pd.DataFrame([[0, 3, 0, 1, np.nan]], columns=['v', 'y', 'v', 'x', 'z'])
>>> np.shares_memory(df['z'].values, df['z'])
False
While in the "working case" it's a view into the 'z' column:
>>> df = pd.DataFrame([[0, 3, 1, np.nan]], columns=['v', 'y', 'x', 'z'])
>>> np.shares_memory(df['z'].values, df['z'])
True
NB: It's actually quite funny that this works, because the copy is made when you do df['z'] not when you access the .values.
The take-away here is that you cannot expect that indexing a DataFrame or accessing the .values of a Series will always return a view, so updating the column in place may not change the values of the original. Duplicate column names are not the only possible cause. When the values property returns a copy and when it returns a view is not always clear (except for a pd.Series, where it is always a view). But these are just implementation details, so it's never a good idea to rely on a specific behavior here. The only guarantee .values makes is that it returns a numpy.ndarray containing the same values.
However it's pretty easy to avoid that problem by simply returning the modified z column from the function:
import numba as nb
import numpy as np
import pandas as pd
@nb.njit
def f_(n, x, y, z):
    for i in range(n):
        z[i] = x[i] * y[i]
    return z  # this is new
Then assign the result of the function to the column:
>>> df = pd.DataFrame([[0, 3, 0, 1, np.nan]], columns=['v', 'y', 'v', 'x', 'z'])
>>> df['z'] = f_(df.shape[0], df["x"].values, df["y"].values, df["z"].values)
>>> df
v y v x z
0 0 3 0 1 3.0
>>> df = pd.DataFrame([[0, 3, 1, np.nan]], columns=['v', 'y', 'x', 'z'])
>>> df['z'] = f_(df.shape[0], df["x"].values, df["y"].values, df["z"].values)
>>> df
v y x z
0 0 3 1 3.0
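The same return-and-assign pattern can be sketched without numba. The helper multiply_into below is a hypothetical plain-Python stand-in for the jitted function, and the explicit copy is my own addition so that the example does not depend on whether pandas hands out a view:

```python
import numpy as np
import pandas as pd


def multiply_into(x, y, z):
    # Plain-Python stand-in for the jitted loop in the answer
    for i in range(len(z)):
        z[i] = x[i] * y[i]
    return z


# Frame with a duplicated column name, as in the failing case
df = pd.DataFrame([[0, 3, 0, 1, np.nan]], columns=["v", "y", "v", "x", "z"])

# Work on an explicit, writable copy instead of relying on a view
z = df["z"].to_numpy(copy=True)

# Assign the returned array back to the column
df["z"] = multiply_into(df["x"].to_numpy(), df["y"].to_numpy(), z)
```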
In case you're interested in what currently happens in your specific case: as I mentioned, these are implementation details, so don't take this as given; it's just the way it's implemented now. A DataFrame stores the columns that have the same dtype together in a multidimensional NumPy array. This can be seen if you access the blocks attribute (deprecated because the internal storage may change in the near future):
>>> df = pd.DataFrame([[0, 3, 0, 1, np.nan]], columns=['v', 'y', 'v', 'x', 'z'])
>>> df.blocks
{'float64':
z
0 NaN
,
'int64':
v y v x
0 0 3 0 1}
Normally it's very easy to create a view into that block, by translating the column name to the column index of the corresponding block. However, if you have a duplicate column name, then accessing an arbitrary column cannot be guaranteed to be a view. For example, if you want to access 'v' then it has to index the Int64 Block with index 0 and 2:
>>> df = pd.DataFrame([[0, 3, 0, 1, np.nan]], columns=['v', 'y', 'v', 'x', 'z'])
>>> df['v']
v v
0 0 0
Technically it could be possible to index the non-duplicated columns as views (and in this case even for the duplicated column, for example by using Int64Block[::2] but that's a very special case...). Pandas opts for the safe option to always return a copy if there are duplicate column names (makes sense if you think about it. Why should indexing one column return a view and another returns a copy). The indexing of the DataFrame has an explicit check for duplicate columns and treats them differently (resulting in copies):
def _getitem_column(self, key):
    """ return the actual column """
    # get column
    if self.columns.is_unique:
        return self._get_item_cache(key)
    # duplicate columns & possible reduce dimensionality
    result = self._constructor(self._data.get(key))
    if result.columns.is_unique:
        result = result[key]
    return result
The columns.is_unique is the important line here. It's True for your "normal case" but "False" for the "failing case".

Convert Select Columns in Pandas Dataframe to Numpy Array

I would like to convert everything but the first column of a pandas dataframe into a numpy array. For some reason, using the columns= parameter of DataFrame.as_matrix() is not working.
df:
viz a1_count a1_mean a1_std
0 n 3 2 0.816497
1 n 0 NaN NaN
2 n 2 51 50.000000
I tried X=df.as_matrix(columns=[df[1:]]) but this yields an array of all NaNs

The easy way is the "values" property: df.iloc[:,1:].values
a=df.iloc[:,1:]
b=df.iloc[:,1:].values
print(type(df))
print(type(a))
print(type(b))
which prints the types:
<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.frame.DataFrame'>
<class 'numpy.ndarray'>

Please use the pandas to_numpy() method. Below is an example:
>>> import pandas as pd
>>> df = pd.DataFrame({"A":[1, 2], "B":[3, 4], "C":[5, 6]})
>>> df
A B C
0 1 3 5
1 2 4 6
>>> s_array = df[["A", "B", "C"]].to_numpy()
>>> s_array
array([[1, 3, 5],
[2, 4, 6]])
>>> t_array = df[["B", "C"]].to_numpy()
>>> print (t_array)
[[3 5]
[4 6]]
Hope this helps. You can select any number of columns using
columns = ['col1', 'col2', 'col3']
df1 = df[columns]
Then apply to_numpy() method.

The columns parameter accepts a collection of column names. You're passing a list containing a dataframe with two rows:
>>> [df[1:]]
[ viz a1_count a1_mean a1_std
1 n 0 NaN NaN
2 n 2 51 50]
>>> df.as_matrix(columns=[df[1:]])
array([[ nan, nan],
[ nan, nan],
[ nan, nan]])
Instead, pass the column names you want:
>>> df.columns[1:]
Index(['a1_count', 'a1_mean', 'a1_std'], dtype='object')
>>> df.as_matrix(columns=df.columns[1:])
array([[ 3. , 2. , 0.816497],
[ 0. , nan, nan],
[ 2. , 51. , 50. ]])

Hope this easy one liner helps:
cols_as_np = df[df.columns[1:]].to_numpy()
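Applied to a frame shaped like the one in the question, a minimal sketch might look as follows. The frame is rebuilt by hand here, with real NaN values assumed for the missing entries:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "viz": ["n", "n", "n"],
    "a1_count": [3, 0, 2],
    "a1_mean": [2, np.nan, 51],
    "a1_std": [0.816497, np.nan, 50.0],
})

# Select every column except the first by position, then convert
X = df.iloc[:, 1:].to_numpy()
```

Because the remaining columns are all numeric, the result is a float64 array and the missing entries stay as NaN.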

The best way to convert to a NumPy array is '.to_numpy(self, dtype=None, copy=False)', which is new in version 0.24.0.
You can also use '.array'.
Pandas .as_matrix has been deprecated since version 0.23.0.

Instead of .as_matrix(), use .values, because the former was deprecated. See:
'DataFrame' object has no attribute 'as_matrix'

A short one-liner uses .as_matrix() (deprecated since pandas 0.23.0; on newer versions use .to_numpy() or .values instead):
df.iloc[:,[1,2,3]].as_matrix()
Gives:
array([[3, 2, 0.816497],
[0, 'NaN', 'NaN'],
[2, 51, 50.0]], dtype=object)
By using indices of the columns, you can use this code for any dataframe with different column names.
Here are the steps for your example:
import pandas as pd
columns = ['viz', 'a1_count', 'a1_mean', 'a1_std']
index = [0,1,2]
vals = {'viz': ['n','n','n'], 'a1_count': [3,0,2], 'a1_mean': [2,'NaN', 51], 'a1_std': [0.816497, 'NaN', 50.000000]}
df = pd.DataFrame(vals, columns=columns, index=index)
Gives:
viz a1_count a1_mean a1_std
0 n 3 2 0.816497
1 n 0 NaN NaN
2 n 2 51 50
Then:
x1 = df.iloc[:,[1,2,3]].as_matrix()
Gives:
array([[3, 2, 0.816497],
[0, 'NaN', 'NaN'],
[2, 51, 50.0]], dtype=object)
Where x1 is numpy.ndarray.
