I have a four-dimensional NumPy ndarray (time, pressure level, latitude, longitude), and I want to check, for each time and pressure level (dimensions 0 and 1), whether there is an all-NaN slice along the latitude or longitude dimension (2 and 3).
I'd like to do it in a vectorized way, i.e. without looping over the array, but I can't figure out how.
import numpy as np

a = np.ones([2, 3, 5, 5])
a[0, 2, :, 2] = np.nan  # all-NaN slice along latitude (axis 2)
a[0, 1, 1, :] = np.nan  # all-NaN slice along longitude (axis 3)
a[0, 0, 1, 2] = np.nan  # a single NaN, not a full slice
a[1, 1, :, 2] = np.nan  # all-NaN slice along latitude (axis 2)
a[1, 1, 1, :] = np.nan  # all-NaN slice along longitude (axis 3)
print(a)
The array now holds ones (i.e. numbers) and, in some locations, slices of only NaNs. I'd like to know these locations. So in this case, I need to find that the NaN slices are at [0,2,:,2], [0,1,1,:], [1,1,:,2], and [1,1,1,:].
You should use the np.isnan function, which creates a boolean array of the same shape as your original array, then reduce it with boolean operations like np.all. For example, the following code stores in idx the indices of the rows (axis=1) whose elements are all NaN.
arr = np.array([[0, 0, 0], [np.nan, np.nan, np.nan], [1, np.nan, 1]])
arr_isnan = np.isnan(arr)
idx = np.argwhere(arr_isnan.all(axis=1))
Output:
>>>print(idx)
[[1]]
Following your example, this method gives the following output:
arr_isnan = np.isnan(a)
idx = np.argwhere(arr_isnan.all(axis=2))
>>>print(idx) #[0,2,:,2] and [1,1,:,2] because axis=2
array([[0, 2, 2],
[1, 1, 2]], dtype=int64)
>>>print(a[idx[:,0], idx[:,1], :, idx[:,2]])
[[nan nan nan nan nan]
[nan nan nan nan nan]]
So you just have to adjust the position of ":" according to the axis along which you reduce.
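If you also need a single yes/no flag per (time, pressure level) pair covering both dimensions at once, here is a minimal sketch building on the array a defined above:
nan_mask = np.isnan(a)
lat_slices = nan_mask.all(axis=2)  # shape (2, 3, 5): True where a[t, p, :, lon] is all NaN
lon_slices = nan_mask.all(axis=3)  # shape (2, 3, 5): True where a[t, p, lat, :] is all NaN
has_nan_slice = lat_slices.any(axis=-1) | lon_slices.any(axis=-1)  # shape (2, 3)
print(has_nan_slice)
# [[False  True  True]
#  [False  True False]]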
Related
I have a list of numbers
a = [1, 2, 3, 4, 5]
and an existing array
b = np.array([[np.nan, 10, np.nan],
              [11, 12, 13],
              [np.nan, 14, np.nan]])
How can I place the numbers from list a into the elements of array b that contain a number, so that I get
c = [[np.nan, 1, np.nan],
[2, 3, 4],
[np.nan, 5, np.nan]]
Maybe it can be done with loops, but I want to avoid them because the length of the list and the dimensions of the array will change. However, the length of the list will always match the number of elements in the array that are not np.nan.
Here is an approach to solve it without using loops.
First, we flatten the array b to a 1D array and replace its non-NaN values with the contents of a. Then we convert the array back to its initial shape. Note that flatten() returns a copy, so b itself is unchanged and the reshaped result must be assigned (here to c):
flat_b = b.flatten()
flat_b[~np.isnan(flat_b)] = a
c = flat_b.reshape(b.shape)
You can use np.isnan to create a boolean mask, then use it in indexing [1]. Since boolean-mask assignment fills values in row-major (C) order, the elements of a land exactly where the non-NaN entries appear, reading left to right, top to bottom.
m = np.isnan(b)
b[~m] = a
print(b)
[[nan 1. nan]
[ 2. 3. 4.]
[nan 5. nan]]
1. NumPy's Boolean Indexing
c = b.copy()
current = 0
for i in range(len(c)):
    for j in range(len(c[i])):
        if not np.isnan(c[i][j]) and current < len(a):
            c[i][j] = a[current]
            current += 1
While this may look long and complicated, it only has O(n) complexity: it simply iterates through the 2D array and replaces the non-NaN values with the next value from a. Note that the check must use np.isnan, because a comparison like c[i][j] != np.nan is True even for NaN values.
I want to count the number of 'nan' values per column inside a matrix full of string values. Like this one:
m:
[['CB_2' 'CB_3']
['CB_1-1' 'CB_4-1']
['CB_1-2' 'CB_4-2']
['CB_2-1' 'CB_5-1']
['CB_2-2' 'CB_5-2']
[nan 'CB_6-1']
[nan 'CB_6-2']]
I tried using np.count_nonzero(~np.isnan(m)), but it seems to work only with numerical values. Perhaps I should convert the NaNs into empty strings or zeros (?).
Also, I created a sample NumPy array with strings to try several options (np.array([['a','b'],['c','d'],['e','f'],['e','g'],['k','ñ'],['w','q'],['y','d']])), but when I use np.nan it doesn't seem to work correctly, since it stores the NaN value as the string 'nan'.
You can transform the array into something numerical (I could not reproduce an array with real NaNs, since NumPy casts them to the string 'nan' here, but you can make the function return 0 for non-strings):
def f(x):
    if isinstance(x, str):
        if x == 'nan':
            return 0
        else:
            return 1
    return 0
vf = np.vectorize(f)
x = np.array([['CB_2', 'CB_3'],
['CB_1-1', 'CB_4-1'],
['CB_1-2', 'CB_4-2'],
['CB_2-1', 'CB_5-1'],
['CB_2-2', 'CB_5-2'],
[np.nan, 'CB_6-1'],
[np.nan, 'CB_6-2']])
>>> x
array([['CB_2', 'CB_3'],
['CB_1-1', 'CB_4-1'],
['CB_1-2', 'CB_4-2'],
['CB_2-1', 'CB_5-1'],
['CB_2-2', 'CB_5-2'],
['nan', 'CB_6-1'],
['nan', 'CB_6-2']], dtype='<U6')
>>> vf(x)
array([[1, 1],
[1, 1],
[1, 1],
[1, 1],
[1, 1],
[0, 1],
[0, 1]])
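To get the per-column counts that the question asks for, here is a short follow-up sketch; it relies on the fact (shown above) that NumPy stored the NaNs as the literal string 'nan' in this string-dtype array:
# count the 'nan' entries per column with an elementwise string comparison
nan_counts = (x == 'nan').sum(axis=0)
print(nan_counts)  # [2 0]
Alternatively, vf(x) marks the valid entries with 1, so x.shape[0] - vf(x).sum(axis=0) gives the same result.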
How do I remove NaN values from a NumPy array?
[1, 2, NaN, 4, NaN, 8] ⟶ [1, 2, 4, 8]
To remove NaN values from a NumPy array x:
x = x[~numpy.isnan(x)]
Explanation
The inner function numpy.isnan returns a boolean/logical array which has the value True everywhere that x is not-a-number. Since we want the opposite, we use the logical-not operator ~ to get an array with Trues everywhere that x is a valid number.
Lastly, we use this logical array to index into the original array x, in order to retrieve just the non-NaN values.
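As a minimal self-contained check, using the example from the question:
import numpy as np

x = np.array([1, 2, np.nan, 4, np.nan, 8])
x = x[~np.isnan(x)]
print(x)  # [1. 2. 4. 8.]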
filter(lambda v: v == v, x)
works for both lists and NumPy arrays, since v != v holds only for NaN. (In Python 3, filter returns an iterator, so wrap the call in list() to materialize the result.)
For me the answer by @jmetz didn't work; however, using pandas isnull() did.
x = x[~pd.isnull(x)]
Try this:
import math
print([value for value in x if not math.isnan(value)])
For more, read up on list comprehensions.
@jmetz's answer is probably the one most people need; however, it yields a one-dimensional array, which makes it unusable for removing entire rows or columns of matrices.
To do so, reduce the logical array to one dimension, then index the target array. For instance, the following removes rows which have at least one NaN value:
x = x[~numpy.isnan(x).any(axis=1)]
As shown by others,
x[~numpy.isnan(x)]
works. But it will throw an error if the NumPy dtype is not a native data type, for example if it is object. In that case you can use pandas:
x[~pandas.isna(x)] or x[~pandas.isnull(x)]
If you're using NumPy, you can also use np.isfinite (note that this filters out infinities as well as NaNs):
# first get a boolean mask of the finite values
ii = np.isfinite(x)
# then select them
x = x[ii]
The accepted answer changes the shape for 2D arrays.
Here is a solution using the pandas dropna() functionality.
It works for 1D and 2D arrays. In the 2D case you can choose whether to drop the row or column containing np.nan.
import pandas as pd
import numpy as np
def dropna(arr, *args, **kwarg):
    assert isinstance(arr, np.ndarray)
    dropped = pd.DataFrame(arr).dropna(*args, **kwarg).values
    if arr.ndim == 1:
        dropped = dropped.flatten()
    return dropped
x = np.array([1400, 1500, 1600, np.nan, np.nan, np.nan, 1700])
y = np.array([[1400, 1500, 1600], [np.nan, 0, np.nan], [1700, 1800, np.nan]])
print('='*20+' 1D Case: ' +'='*20+'\nInput:\n',x,sep='')
print('\ndropna:\n',dropna(x),sep='')
print('\n\n'+'='*20+' 2D Case: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna (rows):\n',dropna(y),sep='')
print('\ndropna (columns):\n',dropna(y,axis=1),sep='')
print('\n\n'+'='*20+' x[np.logical_not(np.isnan(x))] for 2D: ' +'='*20+'\nInput:\n',y,sep='')
print('\ndropna:\n',x[np.logical_not(np.isnan(x))],sep='')
Result:
==================== 1D Case: ====================
Input:
[1400. 1500. 1600. nan nan nan 1700.]
dropna:
[1400. 1500. 1600. 1700.]
==================== 2D Case: ====================
Input:
[[1400. 1500. 1600.]
[ nan 0. nan]
[1700. 1800. nan]]
dropna (rows):
[[1400. 1500. 1600.]]
dropna (columns):
[[1500.]
[ 0.]
[1800.]]
==================== x[np.logical_not(np.isnan(x))] for 2D: ====================
Input:
[[1400. 1500. 1600.]
[ nan 0. nan]
[1700. 1800. nan]]
dropna:
[1400. 1500. 1600. 1700.]
Doing the above:
x = x[~numpy.isnan(x)]
or
x = x[numpy.logical_not(numpy.isnan(x))]
I found that resetting to the same variable (x) did not remove the actual NaN values, and I had to use a different variable. Setting it to a different variable removed the NaNs.
e.g.
y = x[~numpy.isnan(x)]
In case it helps, for simple 1d arrays:
x = np.array([np.nan, 1, 2, 3, 4])
x[~np.isnan(x)]
>>> array([1., 2., 3., 4.])
but if you wish to expand to matrices and preserve the shape:
x = np.array([
[np.nan, np.nan],
[np.nan, 0],
[1, 2],
[3, 4]
])
x[~np.isnan(x).any(axis=1)]
>>> array([[1., 2.],
[3., 4.]])
I encountered this issue when dealing with the pandas .shift() functionality, and I wanted to avoid using .apply(..., axis=1) at all costs due to its inefficiency.
Simply fill with a replacement value:
x = numpy.array([
[0.99929941, 0.84724713, -0.1500044],
[-0.79709026, numpy.nan, -0.4406645],
[-0.3599013, -0.63565744, -0.70251352]])
x[numpy.isnan(x)] = .555
print(x)
# [[ 0.99929941 0.84724713 -0.1500044 ]
# [-0.79709026 0.555 -0.4406645 ]
# [-0.3599013 -0.63565744 -0.70251352]]
pandas provides a consistent way to represent missing values across all data types:
https://pandas.pydata.org/docs/user_guide/missing_data.html
The np.isnan() function is not compatible with all data types, e.g.
>>> import numpy as np
>>> values = [np.nan, "x", "y"]
>>> np.isnan(values)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
The pd.isna() and pd.notna() functions are compatible with many data types and pandas introduces a pd.NA value:
>>> import numpy as np
>>> import pandas as pd
>>> values = pd.Series([np.nan, "x", "y"])
>>> values
0 NaN
1 x
2 y
dtype: object
>>> values.loc[pd.isna(values)]
0 NaN
dtype: object
>>> values.loc[pd.isna(values)] = pd.NA
>>> values.loc[pd.isna(values)]
0 <NA>
dtype: object
>>> values
0 <NA>
1 x
2 y
dtype: object
#
# using map with lambda, or a list comprehension
#
>>> values = [np.nan, "x", "y"]
>>> list(map(lambda x: pd.NA if pd.isna(x) else x, values))
[<NA>, 'x', 'y']
>>> [pd.NA if pd.isna(x) else x for x in values]
[<NA>, 'x', 'y']
The simplest way is:
numpy.nan_to_num(x)
Documentation: https://docs.scipy.org/doc/numpy/reference/generated/numpy.nan_to_num.html
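Note that this replaces NaNs rather than removing them; a quick sketch of the behavior:
import numpy as np

x = np.array([1, 2, np.nan, 4, np.nan, 8])
print(np.nan_to_num(x))          # [1. 2. 0. 4. 0. 8.]
# the replacement value is configurable via the nan keyword (NumPy >= 1.17)
print(np.nan_to_num(x, nan=-1))  # [ 1.  2. -1.  4. -1.  8.]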
I want to find the indices of rows based on criteria over certain columns.
So, something like:
import numpy as np
x = np.random.rand(4, 5)
x[2, 2] = 0
x[2, 3] = 0
x[3, 1] = 0
x[1, 3] = 0
Now, I want to get the indices of the rows where either column 3 or column 4 is zero. How can one do that with NumPy? Do I need to make multiple calls to nonzero for each column and combine the resulting indices using a set or something like that?
Use np.where; the first array in the returned tuple holds the row indices:
np.where(x[:,[3,4]]==0)
Out[79]: (array([1, 2], dtype=int64), array([0, 0], dtype=int64))
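If you want the distinct row indices directly, one possible sketch:
# distinct rows where column 3 or column 4 contains a zero
rows = np.nonzero((x[:, [3, 4]] == 0).any(axis=1))[0]
print(rows)  # [1 2]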