Remove NaN from 2D numpy array - python

For example, if I have the 2D array as follows.
[[1, 2, 3, NaN],
 [4, 5, NaN, NaN],
 [6, NaN, NaN, NaN]]
The desired result is
[[1, 2, 3],
 [4, 5],
 [6]]
How should I do this transformation?
I found that
x = x[~numpy.isnan(x)]
only generates [1, 2, 3, 4, 5, 6], which has been squeezed into a one-dimensional array.
Thanks!

Just apply that isnan mask on a row-by-row basis:
In [135]: [row[~np.isnan(row)] for row in arr]
Out[135]: [array([1., 2., 3.]), array([4., 5.]), array([6.])]
Boolean masking as in x[~numpy.isnan(x)] produces a flattened result because, in general, the result is ragged like this and can't be formed into a 2D array.
The source array must be float dtype, because np.nan is a float:
In [138]: arr = np.array([[1,2,3,np.nan],[4,5,np.nan,np.nan],[6,np.nan,np.nan,np.nan]])
In [139]: arr
Out[139]:
array([[ 1.,  2.,  3., nan],
       [ 4.,  5., nan, nan],
       [ 6., nan, nan, nan]])
With object dtype the numbers can stay integers, but np.isnan(arr) won't work on such an array.
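A hedged sketch of a per-element workaround for the object-dtype case (my own, not part of the answer above):

```python
import numpy as np

# Hypothetical object-dtype array that keeps the numbers as Python ints
arr_obj = np.array([[1, 2, np.nan], [4, 5, np.nan]], dtype=object)

# np.isnan(arr_obj) raises TypeError on object dtype, but a scalar
# check per element still works:
mask = np.array([[isinstance(v, float) and np.isnan(v) for v in row]
                 for row in arr_obj])
cleaned = [row[~m] for row, m in zip(arr_obj, mask)]
```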
If the original is a list, rather than an array:
In [146]: alist = [[1,2,3,np.nan],[4,5,np.nan,np.nan],[6,np.nan,np.nan,np.nan]]
In [147]: alist
Out[147]: [[1, 2, 3, nan], [4, 5, nan, nan], [6, nan, nan, nan]]
In [148]: [[i for i in row if ~np.isnan(i)] for row in alist]
Out[148]: [[1, 2, 3], [4, 5], [6]]
The flat array could be turned into a list of arrays with split:
In [152]: np.split(arr[~np.isnan(arr)],(3,5))
Out[152]: [array([1., 2., 3.]), array([4., 5.]), array([6.])]
where the (3, 5) split parameter could be determined by counting the non-NaN values in each row, but that's more work and isn't guaranteed to be faster than the row iteration.
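For instance, a sketch of that counting approach, deriving the split points from the per-row non-NaN counts:

```python
import numpy as np

arr = np.array([[1, 2, 3, np.nan],
                [4, 5, np.nan, np.nan],
                [6, np.nan, np.nan, np.nan]])

# Count the non-NaN entries in each row, then use the cumulative
# counts (dropping the last) as split points into the flat result.
counts = (~np.isnan(arr)).sum(axis=1)   # [3, 2, 1]
splits = np.cumsum(counts)[:-1]         # [3, 5]
rows = np.split(arr[~np.isnan(arr)], splits)
```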

Related

How to stack uneven numpy arrays?

How can I stack the elements at the same respective index from each array in a list of arrays?
arrays = [np.array([1,2,3,4,5]),
          np.array([6,7,8,9]),
          np.array([11,22,33,44,55]),
          np.array([2,4])]
output = [[1,6,11,2],
          [2,7,22,4],
          [3,8,33],
          [4,9,44],
          [5,55]]
arrays is a list of arrays of uneven lengths. The first array in output (a list is fine too) contains all available index-0 elements from each array. The next array within output contains all available index-1 elements, and so on...
Closest thing I can find (but requires same shape arrays) is:
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.stack((a, b), axis=-1)
# which gives
array([[1, 2],
       [2, 3],
       [3, 4]])
Thanks.
This gets you close. You can't really have a ragged 2D array like the one in your example output.
import numpy as np
arrays = [np.array([1,2,3,4,5]),
          np.array([6,7,8,9]),
          np.array([11,22,33,44,55]),
          np.array([2,4])]
maxx = max(x.shape[0] for x in arrays)
for x in arrays:
    x.resize(maxx, refcheck=False)  # pads the shorter arrays with zeros in place
output = np.stack(arrays, axis=1)
print(output)
C:\tmp>python x.py
[[ 1  6 11  2]
 [ 2  7 22  4]
 [ 3  8 33  0]
 [ 4  9 44  0]
 [ 5  0 55  0]]
You could just wrap it in a DataFrame first:
import pandas as pd
arr = pd.DataFrame(arrays).values.T
Output:
array([[ 1.,  6., 11.,  2.],
       [ 2.,  7., 22.,  4.],
       [ 3.,  8., 33., nan],
       [ 4.,  9., 44., nan],
       [ 5., nan, 55., nan]])
Though if you really want it with different sizes, go with:
arr = [x.dropna().values for _, x in pd.DataFrame(arrays).items()]
(.items() replaces the older .iteritems(), which was removed in pandas 2.0.)
Output:
[array([ 1,  6, 11,  2]),
 array([ 2,  7, 22,  4]),
 array([ 3.,  8., 33.]),
 array([ 4.,  9., 44.]),
 array([ 5., 55.])]
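As an alternative sketch (my own, not from either answer above), the standard library's itertools.zip_longest can build the ragged "transpose" directly, padding short arrays with a sentinel that is then dropped:

```python
import numpy as np
from itertools import zip_longest

arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([6, 7, 8, 9]),
          np.array([11, 22, 33, 44, 55]),
          np.array([2, 4])]

# zip_longest pads the shorter arrays with None; filtering the None
# values out afterwards leaves the ragged columns from the question.
output = [[v for v in col if v is not None]
          for col in zip_longest(*arrays)]
```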

Compare Two Matrices Containing NaN and Mask the Element Values in Both Matrices Where at Least One of Them Contains NaN in Python

I have two 3D matrices (A and B), both containing NaN in random elements. I am comparing these two matrices and in places where at least one of them contains NaN I want both of them to contain NaN. In other words, if the both of them don't already contain NaN at that index, I want to replace that index value with NaN. Is there an efficient way to do this with a python function?
import numpy as np
# Create the fake variables A and B. Here is what A and B look like initially:
A = np.array([[1, 2, np.nan], [4, np.nan, 6], [np.nan, 8, 9]])
B = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 8, 9]])
# What I want A and B to look like in the end:
A
array([[ 1.,  2., nan],
       [ 4., nan, nan],
       [nan,  8.,  9.]])
B
array([[ 1.,  2., nan],
       [ 4., nan, nan],
       [nan,  8.,  9.]])
You need numpy.isnan() and boolean indexing.
>>> A[np.isnan(B)] = np.nan
>>> B[np.isnan(A)] = np.nan
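Equivalently, you could build the union mask once and apply it to both arrays, which avoids having to think about the order of the two assignments; a minimal sketch:

```python
import numpy as np

A = np.array([[1, 2, np.nan], [4, np.nan, 6], [np.nan, 8, 9]])
B = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 8, 9]])

# True wherever either array holds NaN; apply it to both.
mask = np.isnan(A) | np.isnan(B)
A[mask] = np.nan
B[mask] = np.nan
```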

Python: properly iterating through a dictionary of numpy arrays

Given the following numpy arrays:
import numpy
a=numpy.array([[1,1,1],[1,1,1],[1,1,1]])
b=numpy.array([[2,2,2],[2,2,2],[2,2,2]])
c=numpy.array([[3,3,3],[3,3,3],[3,3,3]])
and this dictionary containing them all:
mydict={0:a,1:b,2:c}
What is the most efficient way of iterating through mydict so as to compute the average numpy array, which has (1+2+3)/3 = 2 as its values?
My attempt fails as I am giving it too many values to unpack. It is also extremely inefficient as it has an O(n^3) time complexity:
aver = numpy.empty([a.shape[0], a.shape[1]])
for c, v in mydict.values():
    for i in range(0, a.shape[0]):
        for j in range(0, a.shape[1]):
            aver[i][j] = mydict[c][i][j]  # <- too many values to unpack
The final result should be:
In[17]: aver
Out[17]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
EDIT
I am not looking for an average value for each numpy array. I am looking for an average value for each element of my collection of numpy arrays. This is a minimal example; the real thing I am working on has over 120,000 elements per array, and for the same position the values change from array to array.
I think you're making this harder than it needs to be. Either sum them and divide by the number of terms:
In [42]: v = mydict.values()
In [43]: sum(v) / len(v)
Out[43]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
Or stack them into one big array -- which it sounds like is the format they probably should have been in to start with -- and take the mean over the stacked axis:
In [44]: np.array(list(v)).mean(axis=0)
Out[44]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
You really shouldn't be using a dict of numpy.arrays. Just use a multi-dimensional array:
>>> bigarray = numpy.array([arr.tolist() for arr in mydict.values()])
>>> bigarray
array([[[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]],

       [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]],

       [[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]]])
>>> bigarray.mean(axis=0)
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
You should modify your code to not even work with a dict. Especially not a dict with integer keys...
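As a sketch of that advice (assuming Python 3, where dicts preserve insertion order), the dict values can be stacked and averaged in one step:

```python
import numpy as np

# Hypothetical stand-in for the dict from the question
mydict = {0: np.full((3, 3), 1),
          1: np.full((3, 3), 2),
          2: np.full((3, 3), 3)}

# Stack the values along a new leading axis, then average over it.
aver = np.stack(list(mydict.values())).mean(axis=0)
```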

Replace values in numpy array containing NaN

aa = np.array([2.0, np.NaN])
aa[aa>1.0] = np.NaN
On running the code above, I get the following warning. I understand the reason for this warning, but how can I avoid it?
RuntimeWarning: invalid value encountered in greater
Store the indices of the valid elements (the non-NaNs). Use these indices to index into the array and perform the comparison to get a mask, then index into those indices with that mask to retrieve the indices corresponding to the original order. Using those original-order indices, assign NaN to the corresponding elements of the input array.
Thus, an implementation/solution would be -
idx = np.flatnonzero(~np.isnan(aa))
aa[idx[aa[idx] > 1.0]] = np.nan
Sample run -
In [106]: aa # Input array with NaNs
Out[106]: array([ 0., 3., nan, 0., 9., 6., 6., nan, 18., 6.])
In [107]: idx = np.flatnonzero(~np.isnan(aa)) # Store valid indices
In [108]: idx
Out[108]: array([0, 1, 3, 4, 5, 6, 8, 9])
In [109]: aa[idx[aa[idx] > 1.0]] = np.nan # Do the assignment
In [110]: aa # Verify
Out[110]: array([ 0., nan, nan, 0., nan, nan, nan, nan, nan, nan])
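A simpler alternative (not from the answer above) is to silence the warning for just the comparison with numpy.errstate; NaN compares False, so the NaN element is simply left untouched. Note that recent NumPy versions may no longer emit this warning for NaN comparisons at all:

```python
import numpy as np

aa = np.array([2.0, np.nan])

# Temporarily suppress "invalid value" floating-point warnings
# while the comparison runs.
with np.errstate(invalid='ignore'):
    aa[aa > 1.0] = np.nan
```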

Column_match([[1],[1,1]]) <--- how to make dimensions match with NA values?

Is there a flag for this? Please see the intended output below.
>>> numpy.column_stack([[1], [1,2]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/numpy/lib/shape_base.py", line 296, in column_stack
    return _nx.concatenate(arrays,1)
ValueError: array dimensions must agree except for d_0
Input
[[1],[1,2]]
Intended Output
[[NA,1], [1,2]]
In general
[[1],[2,2],[3,3,3],...,[n,n,n,n,n...,n]]
to
[[NA, NA, NA,..., NA,1], [NA, NA, ..., 2, 2], ...[n,n,n,n,n]]
where the columns may form a triangular zero matrix initially. Yes, you can understand the term NA as None. I almost got the triangular matrix below.
>>> a=[[1],[2,2],[3,3,3]]
>>> a
[[1], [2, 2], [3, 3, 3]]
>>> len(a)
3
>>> N=len(a)
>>> [aa+['']*(N-len(aa)) for aa in a]
[[1, '', ''], [2, 2, ''], [3, 3, 3]]
>>> transpose([aa+['']*(N-len(aa)) for aa in a])
array([['1', '2', '3'],
       ['', '2', '3'],
       ['', '', '3']],
      dtype='|S4')
a pure numpy solution:
>>> lili = [[1],[2,2],[3,3,3],[4,4,4,4]]
>>> y = np.nan*np.ones((4,4))
>>> y[np.tril_indices(4)] = np.concatenate(lili)
>>> y
array([[  1.,  nan,  nan,  nan],
       [  2.,   2.,  nan,  nan],
       [  3.,   3.,   3.,  nan],
       [  4.,   4.,   4.,   4.]])
>>> y[:,::-1]
array([[ nan,  nan,  nan,   1.],
       [ nan,  nan,   2.,   2.],
       [ nan,   3.,   3.,   3.],
       [  4.,   4.,   4.,   4.]])
I'm not sure which triangular array you want; there is also np.triu_indices.
(Maybe not always faster, but easy to read.)
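A plain-Python variant of the same idea (my own sketch), left-padding each row with NaN so the result matches the orientation of the intended output:

```python
import numpy as np

a = [[1], [2, 2], [3, 3, 3]]
n = max(len(row) for row in a)

# Prepend NaN to each row until every row has length n.
padded = np.array([[np.nan] * (n - len(row)) + row for row in a])
```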
column_stack adds a column to an array; that column is supposed to be a smaller (1D) array.
When I try:
from numpy import *
x = array([0])
z = array([1, 2])
doing this:
r = column_stack((x, z))
raises the same ValueError as in your traceback, because the lengths differ.
So, in order to add a column to your first array, maybe this:
n = array([9])
arr = ([column_stack((n, x))], z)
It displays this:
>>> arr
([array([[9, 0]])], array([1, 2]))
It has the same look as your "intended output".
Hope this was helpful!
