Remove NaN from 2D numpy array - python

For example, if I have the 2D array as follows.
[[1, 2, 3, NaN],
 [4, 5, NaN, NaN],
 [6, NaN, NaN, NaN]]
The desired result is
[[1, 2, 3],
 [4, 5],
 [6]]
How should I do this transformation?
I found that
x = x[~numpy.isnan(x)]
only generates [1, 2, 3, 4, 5, 6], which has been squeezed into a one-dimensional array.
Thanks!

Just apply that isnan mask on a row-by-row basis:
In [135]: [row[~np.isnan(row)] for row in arr]
Out[135]: [array([1., 2., 3.]), array([4., 5.]), array([6.])]
Boolean masking as in x[~numpy.isnan(x)] produces a flattened result because, in general, the result is ragged like this and can't be formed into a 2D array.
The source array must be float dtype, because np.nan is a float:
In [138]: arr = np.array([[1,2,3,np.nan],[4,5,np.nan,np.nan],[6,np.nan,np.nan,np.nan]])
In [139]: arr
Out[139]:
array([[ 1.,  2.,  3., nan],
       [ 4.,  5., nan, nan],
       [ 6., nan, nan, nan]])
With object dtype the numbers can stay integers, but np.isnan(arr) won't work on such an array.
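A hedged sketch of a per-element workaround for the object-dtype case (my own, not part of the answer above):

```python
import numpy as np

# Hypothetical object-dtype array that keeps the numbers as Python ints
arr_obj = np.array([[1, 2, np.nan], [4, 5, np.nan]], dtype=object)

# np.isnan(arr_obj) raises TypeError on object dtype, but a scalar
# check per element still works:
mask = np.array([[isinstance(v, float) and np.isnan(v) for v in row]
                 for row in arr_obj])
cleaned = [row[~m] for row, m in zip(arr_obj, mask)]
```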
If the original is a list, rather than an array:
In [146]: alist = [[1,2,3,np.nan],[4,5,np.nan,np.nan],[6,np.nan,np.nan,np.nan]]
In [147]: alist
Out[147]: [[1, 2, 3, nan], [4, 5, nan, nan], [6, nan, nan, nan]]
In [148]: [[i for i in row if ~np.isnan(i)] for row in alist]
Out[148]: [[1, 2, 3], [4, 5], [6]]
The flat array could be turned into a list of arrays with split:
In [152]: np.split(arr[~np.isnan(arr)],(3,5))
Out[152]: [array([1., 2., 3.]), array([4., 5.]), array([6.])]
where the (3, 5) split parameter could be determined by counting the non-NaN values in each row, but that's more work and isn't guaranteed to be faster than the row iteration.
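For instance, a sketch of that counting approach, deriving the split points from the per-row non-NaN counts:

```python
import numpy as np

arr = np.array([[1, 2, 3, np.nan],
                [4, 5, np.nan, np.nan],
                [6, np.nan, np.nan, np.nan]])

# Count the non-NaN entries in each row, then use the cumulative
# counts (dropping the last) as split points into the flat result.
counts = (~np.isnan(arr)).sum(axis=1)   # [3, 2, 1]
splits = np.cumsum(counts)[:-1]         # [3, 5]
rows = np.split(arr[~np.isnan(arr)], splits)
```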

Related

How to stack uneven numpy arrays?

How can I stack the elements at the same respective index from each array in a list of arrays?
arrays = [np.array([1,2,3,4,5]),
          np.array([6,7,8,9]),
          np.array([11,22,33,44,55]),
          np.array([2,4])]
output = [[1,6,11,2],
          [2,7,22,4],
          [3,8,33],
          [4,9,44],
          [5,55]]
arrays is a list of arrays of uneven lengths. The first array in output (a list is fine too) contains all available index-0 elements from each array. The next array within output contains all available index-1 elements, and so on...
Closest thing I can find (but requires same shape arrays) is:
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.stack((a, b), axis=-1)
# which gives
array([[1, 2],
       [2, 3],
       [3, 4]])
Thanks.
This gets you close. You can't really have a ragged 2D array like the one in your example output.
import numpy as np
arrays = [np.array([1,2,3,4,5]),
          np.array([6,7,8,9]),
          np.array([11,22,33,44,55]),
          np.array([2,4])]
maxx = max(x.shape[0] for x in arrays)
for x in arrays:
    x.resize(maxx, refcheck=False)  # pads the shorter arrays with zeros in place
output = np.stack(arrays, axis=1)
print(output)
C:\tmp>python x.py
[[ 1  6 11  2]
 [ 2  7 22  4]
 [ 3  8 33  0]
 [ 4  9 44  0]
 [ 5  0 55  0]]
You could just wrap it in a DataFrame first:
import pandas as pd
arr = pd.DataFrame(arrays).values.T
Output:
array([[ 1.,  6., 11.,  2.],
       [ 2.,  7., 22.,  4.],
       [ 3.,  8., 33., nan],
       [ 4.,  9., 44., nan],
       [ 5., nan, 55., nan]])
Though if you really want it with different sizes, go with:
arr = [x.dropna().values for _, x in pd.DataFrame(arrays).items()]
(.items() replaces the older .iteritems(), which was removed in pandas 2.0.)
Output:
[array([ 1,  6, 11,  2]),
 array([ 2,  7, 22,  4]),
 array([ 3.,  8., 33.]),
 array([ 4.,  9., 44.]),
 array([ 5., 55.])]
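As an alternative sketch (my own, not from either answer above), the standard library's itertools.zip_longest can build the ragged "transpose" directly, padding short arrays with a sentinel that is then dropped:

```python
import numpy as np
from itertools import zip_longest

arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([6, 7, 8, 9]),
          np.array([11, 22, 33, 44, 55]),
          np.array([2, 4])]

# zip_longest pads the shorter arrays with None; filtering the None
# values out afterwards leaves the ragged columns from the question.
output = [[v for v in col if v is not None]
          for col in zip_longest(*arrays)]
```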

Compare Two Matrices Containing NaN and Mask the Element Values in Both Matrices Where at Least One of Them Contains NaN in Python

I have two 3D matrices (A and B), both containing NaN in random elements. I am comparing these two matrices and in places where at least one of them contains NaN I want both of them to contain NaN. In other words, if the both of them don't already contain NaN at that index, I want to replace that index value with NaN. Is there an efficient way to do this with a python function?
import numpy as np
# Create the fake variables A and B. Here is what A and B look like initially:
A = np.array([[1, 2, np.nan], [4, np.nan, 6], [np.nan, 8, 9]])
B = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 8, 9]])
# What I want A and B to look like in the end:
A
array([[ 1.,  2., nan],
       [ 4., nan, nan],
       [nan,  8.,  9.]])
B
array([[ 1.,  2., nan],
       [ 4., nan, nan],
       [nan,  8.,  9.]])
You need numpy.isnan() and boolean indexing.
>>> A[np.isnan(B)] = np.nan
>>> B[np.isnan(A)] = np.nan
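Equivalently, you could build the union mask once and apply it to both arrays, which avoids having to think about the order of the two assignments; a minimal sketch:

```python
import numpy as np

A = np.array([[1, 2, np.nan], [4, np.nan, 6], [np.nan, 8, 9]])
B = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 8, 9]])

# True wherever either array holds NaN; apply it to both.
mask = np.isnan(A) | np.isnan(B)
A[mask] = np.nan
B[mask] = np.nan
```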

Python: properly iterating through a dictionary of numpy arrays

Given the following numpy arrays:
import numpy
a=numpy.array([[1,1,1],[1,1,1],[1,1,1]])
b=numpy.array([[2,2,2],[2,2,2],[2,2,2]])
c=numpy.array([[3,3,3],[3,3,3],[3,3,3]])
and this dictionary containing them all:
mydict={0:a,1:b,2:c}
What is the most efficient way of iterating through mydict so as to compute the average numpy array, which has (1+2+3)/3 = 2 as its values?
My attempt fails as I am giving it too many values to unpack. It is also extremely inefficient as it has an O(n^3) time complexity:
aver = numpy.empty([a.shape[0], a.shape[1]])
for c, v in mydict.values():
    for i in range(0, a.shape[0]):
        for j in range(0, a.shape[1]):
            aver[i][j] = mydict[c][i][j]  # <- too many values to unpack
The final result should be:
In[17]: aver
Out[17]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
EDIT
I am not looking for an average value for each numpy array. I am looking for an average value for each element of my collection of numpy arrays. This is a minimal example; the real thing I am working on has over 120,000 elements per array, and for the same position the values change from array to array.
I think you're making this harder than it needs to be. Either sum them and divide by the number of terms:
In [42]: v = mydict.values()
In [43]: sum(v) / len(v)
Out[43]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
Or stack them into one big array -- which it sounds like is the format they probably should have been in to start with -- and take the mean over the stacked axis:
In [44]: np.array(list(v)).mean(axis=0)
Out[44]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
You really shouldn't be using a dict of numpy.arrays. Just use a multi-dimensional array:
>>> bigarray = numpy.array([arr.tolist() for arr in mydict.values()])
>>> bigarray
array([[[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]],

       [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]],

       [[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]]])
>>> bigarray.mean(axis=0)
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
You should modify your code to not even work with a dict. Especially not a dict with integer keys...
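As a sketch of that advice (assuming Python 3, where dicts preserve insertion order), the dict values can be stacked and averaged in one step:

```python
import numpy as np

# Hypothetical stand-in for the dict from the question
mydict = {0: np.full((3, 3), 1),
          1: np.full((3, 3), 2),
          2: np.full((3, 3), 3)}

# Stack the values along a new leading axis, then average over it.
aver = np.stack(list(mydict.values())).mean(axis=0)
```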

Replace values in numpy array containing NaN

aa = np.array([2.0, np.NaN])
aa[aa>1.0] = np.NaN
On running the code above, I get the following warning. I understand the reason for this warning, but how can I avoid it?
RuntimeWarning: invalid value encountered in greater
Store the indices of the valid elements (the non-NaNs). Use these indices to index into the array and perform the comparison to get a mask, then index into those indices with that mask to retrieve the indices corresponding to the original order. Using those original-order indices, assign NaN to the corresponding elements of the input array.
Thus, an implementation/solution would be -
idx = np.flatnonzero(~np.isnan(aa))
aa[idx[aa[idx] > 1.0]] = np.nan
Sample run -
In [106]: aa # Input array with NaNs
Out[106]: array([ 0., 3., nan, 0., 9., 6., 6., nan, 18., 6.])
In [107]: idx = np.flatnonzero(~np.isnan(aa)) # Store valid indices
In [108]: idx
Out[108]: array([0, 1, 3, 4, 5, 6, 8, 9])
In [109]: aa[idx[aa[idx] > 1.0]] = np.nan # Do the assignment
In [110]: aa # Verify
Out[110]: array([ 0., nan, nan, 0., nan, nan, nan, nan, nan, nan])
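A simpler alternative (not from the answer above) is to silence the warning for just the comparison with numpy.errstate; NaN compares False, so the NaN element is simply left untouched. Note that recent NumPy versions may no longer emit this warning for NaN comparisons at all:

```python
import numpy as np

aa = np.array([2.0, np.nan])

# Temporarily suppress "invalid value" floating-point warnings
# while the comparison runs.
with np.errstate(invalid='ignore'):
    aa[aa > 1.0] = np.nan
```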

Column_match([[1],[1,1]]) <--- how to make dimensions match with NA values?

Is there a flag for this? Please see the intended output below.
>>> numpy.column_stack([[1], [1,2]])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/pymodules/python2.7/numpy/lib/shape_base.py", line 296, in column_stack
    return _nx.concatenate(arrays,1)
ValueError: array dimensions must agree except for d_0
Input
[[1],[1,2]]
Intended Output
[[NA,1], [1,2]]
In general
[[1],[2,2],[3,3,3],...,[n,n,n,n,n...,n]]
to
[[NA, NA, NA,..., NA,1], [NA, NA, ..., 2, 2], ...[n,n,n,n,n]]
where the columns may form a triangular zero matrix initially. Yes, you can understand the term NA as None. I almost got the triangular matrix below.
>>> a=[[1],[2,2],[3,3,3]]
>>> a
[[1], [2, 2], [3, 3, 3]]
>>> len(a)
3
>>> N=len(a)
>>> [aa+['']*(N-len(aa)) for aa in a]
[[1, '', ''], [2, 2, ''], [3, 3, 3]]
>>> transpose([aa+['']*(N-len(aa)) for aa in a])
array([['1', '2', '3'],
       ['', '2', '3'],
       ['', '', '3']],
      dtype='|S4')
a pure numpy solution:
>>> lili = [[1],[2,2],[3,3,3],[4,4,4,4]]
>>> y = np.nan*np.ones((4,4))
>>> y[np.tril_indices(4)] = np.concatenate(lili)
>>> y
array([[  1.,  nan,  nan,  nan],
       [  2.,   2.,  nan,  nan],
       [  3.,   3.,   3.,  nan],
       [  4.,   4.,   4.,   4.]])
>>> y[:,::-1]
array([[ nan,  nan,  nan,   1.],
       [ nan,  nan,   2.,   2.],
       [ nan,   3.,   3.,   3.],
       [  4.,   4.,   4.,   4.]])
I'm not sure which triangular array you want; there is also np.triu_indices.
(Maybe not always faster, but easy to read.)
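A plain-Python variant of the same idea (my own sketch), left-padding each row with NaN so the result matches the orientation of the intended output:

```python
import numpy as np

a = [[1], [2, 2], [3, 3, 3]]
n = max(len(row) for row in a)

# Prepend NaN to each row until every row has length n.
padded = np.array([[np.nan] * (n - len(row)) + row for row in a])
```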
column_stack adds a column to an array; that column is supposed to be a smaller (1D) array.
When I try:
from numpy import *
x = array([0])
z = array([1, 2])
doing this:
r = column_stack((x, z))
raises the same ValueError as in your traceback, because the lengths differ.
So, in order to add a column to your first array, maybe this:
n = array([9])
arr = ([column_stack((n, x))], z)
It displays this:
>>> arr
([array([[9, 0]])], array([1, 2]))
It has the same look as your "intended output".
Hope this was helpful!
