Avoid NaN values and add two matrices element-wise in Python

I have two 4D matrices which I would like to add. The matrices have exactly the same dimensions and number of elements, but both contain randomly distributed NaN values.
I would prefer to add them as below using numpy.nansum.
(1) if two values are added I want the sum to be a value,
(2) if a value and a NaN are added I want the sum to be the value and
(3) if two NaN are added I want the sum to be NaN.
Here is what I tried:
a[6x7x180x360]
b[6x7x180x360]
C=np.nansum[(a,b)]
C=np.nansum(np.dstack((a,b)),2)
But I am unable to get a resultant matrix with the same dimensions as the inputs; the result C should have shape [6x7x180x360].
Can anyone help with this? Thank you in advance.

You could use np.stack((a,b)) to stack along a new 0-axis, then call nansum to sum along that 0-axis:
C = np.nansum(np.stack((a,b)), axis=0)
For example,
In [34]: a = np.random.choice([1,2,3,np.nan], size=(6,7,180,360))
In [35]: b = np.random.choice([1,2,3,np.nan], size=(6,7,180,360))
In [36]: np.stack((a,b)).shape
Out[36]: (2, 6, 7, 180, 360)
In [37]: np.nansum(np.stack((a,b)), axis=0).shape
Out[37]: (6, 7, 180, 360)
You had the right idea, but np.dstack stacks along the third axis, which is not desirable here since you already have 4 axes:
In [31]: np.dstack((a,b)).shape
Out[31]: (6, 7, 360, 360)
Regarding your point (3):
Note that the behavior of np.nansum depends on the NumPy version:
In NumPy versions <= 1.8.0 Nan is returned for slices that are all-NaN or
empty. In later versions zero is returned.
If you are using NumPy version > 1.8.0, then you may have to use a solution such as
Maarten Fabré's to address this issue.
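A quick runnable sketch of that caveat, using small 1D arrays instead of the question's 4D ones: on NumPy > 1.8.0, nansum satisfies rules (1) and (2) but an all-NaN slice sums to zero, so rule (3) fails.

```python
import numpy as np

a = np.array([1.0, np.nan, np.nan])
b = np.array([2.0, 5.0, np.nan])

# Rule (1): 1+2 -> 3. Rule (2): NaN+5 -> 5. But rule (3) is violated:
# on modern NumPy an all-NaN slice sums to 0.0, not NaN.
summed = np.nansum(np.stack((a, b)), axis=0)
print(summed)  # [3. 5. 0.]
```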

I believe the function np.nansum is not appropriate in your case. If I understand your question correctly, you wish to do an element-wise addition of two matrices with a little logic regarding the NaN values.
Here is the full example on how to do it:
import numpy as np
a = np.array([[np.nan, 2],
              [3, np.nan]])
b = np.array([[3, np.nan],
              [1, np.nan]])
result = np.add(a,b)
a_is_nan = np.isnan(a)
b_is_nan = np.isnan(b)
result_is_nan = np.isnan(result)
mask_a = np.logical_and(result_is_nan, np.logical_not(a_is_nan))
result[mask_a] = a[mask_a]
mask_b = np.logical_and(result_is_nan, np.logical_not(b_is_nan))
result[mask_b] = b[mask_b]
print(result)
A little bit of explanation:
The first operation is np.add(a,b). This adds both matrices and any NaN element will produce a result of NaN also.
To select the NaN values from either arrays, we use a logical mask:
# result_is_nan is a boolean array containing True wherever the result is NaN. This occurs when either of the two elements was NaN
result_is_nan = np.isnan(result)
# mask_a is a boolean array which 'flags' elements that are NaN in result but were not NaN in a !
mask_a = np.logical_and(result_is_nan, np.logical_not(a_is_nan))
# Using that mask, we assign those value to result
result[mask_a] = a[mask_a]
There you have it!

I think the easiest way is to use np.where
result = np.where(
    np.isnan(a + b),
    np.where(np.isnan(a), b, a),
    a + b
)
This reads as:
if a + b is not NaN, use a + b; otherwise use a, unless a is itself NaN, in which case use b. Whether b is also NaN then makes no difference: if both are NaN, the result is NaN, as required.
Alternatively, you can use it like this:
result2 = np.where(
    np.isnan(a) & np.isnan(b),
    np.nan,
    np.nansum(np.stack((a, b)), axis=0)
)
np.testing.assert_equal(result, result2) passes
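As a small self-contained check (using a 2x2 example rather than the question's 4D arrays), the np.where expression enforces all three rules:

```python
import numpy as np

a = np.array([[np.nan, 2.0], [3.0, np.nan]])
b = np.array([[3.0, np.nan], [1.0, np.nan]])

# Rule (1): 3+1 -> 4. Rule (2): NaN+3 -> 3 and 2+NaN -> 2.
# Rule (3): NaN+NaN -> NaN.
result = np.where(np.isnan(a + b),
                  np.where(np.isnan(a), b, a),
                  a + b)
print(result)  # [[ 3.  2.]
               #  [ 4. nan]]
```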

Related

Python Numpy 2D Array Number of Rows For Empty Matrix is Still 1

Say I have matrices A, B, and C. When I initialize the matrices to
A = np.array([[]])
B = np.array([[1,2,3]])
C = np.array([[1,2,3],[4,5,6]])
Then
B.shape[0]
C.shape[0]
give 1 and 2, respectively (as expected), but
A.shape[0]
gives 1, just like B.shape[0].
What is the simplest way to get the number of rows of a given matrix, while still ensuring that an empty matrix like A gives a value of zero?
After searching stack overflow for awhile, I couldn't find an answer, so I'm posting my own below, but if you can come up with a cleaner, more general answer, I'll accept your answer instead. Thanks!
A = np.array([[]])
That's a 1-by-0 array. You seem to want a 0-by-3 array. Such an array is almost completely useless, but if you really want one, you can make one:
A = np.zeros([0, 3])
Then you'll have A.shape[0] == 0.
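A short check of the difference between the two shapes:

```python
import numpy as np

empty = np.array([[]])
print(empty.shape)      # (1, 0): one row, zero columns

A = np.zeros([0, 3])
print(A.shape)          # (0, 3): zero rows, three columns
print(A.shape[0])       # 0
```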
You could qualify the shape[0] by multiplying it with a test of whether size is nonzero:
In [121]: A.shape[0]*(A.size>0)
Out[121]: 0
In [122]: B.shape[0]*(B.size>0)
Out[122]: 1
In [123]: C.shape[0]*(C.size>0)
Out[123]: 2
or test the number of columns
In [125]: A.shape[0]*(A.shape[1]>0)
Out[125]: 0
What's distinctive about A is the number of columns, the 2nd dimension.
Using
A.size // (len(A[0]) or 1)
B.size // (len(B[0]) or 1)
C.size // (len(C[0]) or 1)
yields 0, 1, and 2, respectively. (Floor division, //, keeps the result an integer under Python 3; the "or 1" guards against dividing by zero when there are no columns.)
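A quick check of this trick, written with floor division so the counts come out as ints under Python 3:

```python
import numpy as np

A = np.array([[]])
B = np.array([[1, 2, 3]])
C = np.array([[1, 2, 3], [4, 5, 6]])

# size // columns gives the row count; "or 1" guards against a
# zero-column matrix such as A
rows = [M.size // (len(M[0]) or 1) for M in (A, B, C)]
print(rows)  # [0, 1, 2]
```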

Subtract two arrays, keep only values where the other has no data

I've been trying to subtract (and combine, multiply, etc.) two NumPy arrays so that the resulting array has values only in places where the other array has no data.
Like if I have matrices a and b, a-b would give c:
a = np.array([0,2,3,0])
b = np.array([1,0,3,0])
c = np.array([0,2,0,0])
I've already tried multiplying b by a very big number, but then I couldn't figure out how to get rid of the negative values. There is also the issue that the arrays a and b mark missing values as -999.
Help would be much appreciated! Thanks!
How about this?
>>> a = np.array([0,2,3,0])
>>> b = np.array([1,0,3,0])
>>> a[b!=0] = 0
>>> a
array([0, 2, 0, 0])
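The question also mentions missing values stored as -999. The same masking idea extends to that case by treating both 0 and -999 in b as "no data"; the sentinel value here is an assumption taken from the question, so adjust it to your dataset:

```python
import numpy as np

a = np.array([0, 2, 3, -999])
b = np.array([1, 0, -999, 0])

# Zero out a wherever b actually has data, i.e. b is neither 0 nor -999
c = a.copy()
c[(b != 0) & (b != -999)] = 0
print(c)  # [   0    2    3 -999]
```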

Numpy: 2D array access with 2D array of indices

I have two arrays, one is a matrix of index pairs,
a = array([[[0,0],[1,1]],[[2,0],[2,1]]], dtype=int)
and another which is a matrix of data to access at these indices
b = array([[1,2,3],[4,5,6],[7,8,9]])
and I want to be able to use the indices of a to get the entries of b. Just doing:
>>> b[a]
does not work, as it gives one row of b for each entry in a, i.e.
array([[[[1, 2, 3],
         [1, 2, 3]],
        [[4, 5, 6],
         [4, 5, 6]]],
       [[[7, 8, 9],
         [1, 2, 3]],
        [[7, 8, 9],
         [4, 5, 6]]]])
when I would like to use the index pair in the last axis of a to give the two indices of b:
array([[1,5],[7,8]])
Is there a clean way of doing this, or do I need to reshape b and combine the columns of a in a corresponding manner?
In my actual problem a has about 5 million entries and b is 100-by-100, so I'd like to avoid for loops.
Actually, this works:
b[a[:, :, 0],a[:, :, 1]]
Gives array([[1, 5],
[7, 8]]).
For this case, this works
tmp = a.reshape(-1,2)
b[tmp[:,0], tmp[:,1]]
A more general solution, whenever you want to use a 2D array of indices of shape (n,m) with arbitrary large dimension m, named inds, in order to access elements of another 2D array of shape (n,k), named B:
# row offsets into the flattened B: row i of B starts at flat index i * B.shape[1]
offset = np.arange(B.shape[0]) * B.shape[1]
# np.take "flattens" B and selects elements from it based on the flat indices
Result = np.take(B, offset[:, np.newaxis] + inds)
Another solution, which doesn't use np.take and I find more intuitive, is the following:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
The advantage of this syntax is that it can be used both for reading elements from B based on inds (like np.take), as well as for assignment.
You can test this by using, e.g.:
n, m = 100, 50
B = 1/(np.arange(n*m).reshape(n, -1) + 1)
inds = np.random.randint(0, B.shape[1], (B.shape[0], B.shape[1]))
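A small worked example of that expression, with a 3x4 array and a hand-picked inds of two columns per row (note that B and inds need not have the same number of columns):

```python
import numpy as np

B = np.arange(12, dtype=float).reshape(3, 4)   # [[0,1,2,3],[4,5,6,7],[8,9,10,11]]
inds = np.array([[0, 2], [1, 1], [3, 0]])      # column indices, one set per row

# Read: broadcasting pairs each row index with its row of column indices
picked = B[np.expand_dims(np.arange(B.shape[0]), -1), inds]
print(picked)  # [[ 0.  2.]
               #  [ 5.  5.]
               #  [11.  8.]]

# The same indexing expression also works on the left-hand side of an assignment:
B[np.expand_dims(np.arange(B.shape[0]), -1), inds] = -1.0
```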

Index of multidimensional array

I have a problem using multi-dimensional vectors as indices for multi-dimensional vectors. Say I have C.ndim == idx.shape[0], then I want C[idx] to give me a single element. Allow me to explain with a simple example:
from numpy import arange, array
A = arange(0, 10)
B = 10 + A
C = array([A.T, B.T])
C = C.T
idx = array([3, 1])
Now, C[3] gives me the third row, and C[1] gives me the first row. C[idx] then will give me a vstack of both rows. However, I need to get C[3,1]. How would I achieve that given arrays C, idx?
/edit:
An answer suggested tuple(idx). This works perfectly for a single idx. But:
Let's take it to the next level: say INDICES is an array in which I have stacked several idx-shaped index vectors vertically. tuple(INDICES) will give me one long tuple, so C[tuple(INDICES)] won't work. Is there a clean way of doing this, or will I need to iterate over the rows?
If you convert idx to a tuple, it'll be interpreted as basic and not advanced indexing:
>>> C[3,1]
13
>>> C[tuple(idx)]
13
For the vector case:
>>> idx
array([[3, 1],
[7, 0]])
>>> C[3,1], C[7,0]
(13, 7)
>>> C[tuple(idx.T)]
array([13, 7])
>>> C[idx[:,0], idx[:,1]]
array([13, 7])
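Putting it together as a runnable script, with the same arrays as in the question:

```python
import numpy as np

A = np.arange(0, 10)
B = 10 + A
C = np.array([A, B]).T          # shape (10, 2); column 0 is A, column 1 is B

idx = np.array([3, 1])
print(C[tuple(idx)])            # 13, i.e. C[3, 1]

# For a stack of index pairs, transpose first so each coordinate axis
# becomes one element of the tuple (row indices, column indices):
INDICES = np.array([[3, 1], [7, 0]])
print(C[tuple(INDICES.T)])      # [13  7]
```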

how to match two numpy array of unequal length?

I have two 1D numpy arrays of unequal length. I want to make pairs (array1_element, array2_element) of the elements which are close to each other. Consider the following example:
a = [1,2,3,8,20,23]
b = [1,2,3,5,7,21,35]
The expected result is
[(1,1),
(2,2),
(3,3),
(8,7),
(20,21),
(23,25)]
It is important to note that 5 is left alone. This could easily be done with loops, but I have very large arrays. I considered using nearest-neighbour search, but that felt like killing a sparrow with a cannon.
Can anybody please suggest any elegant solution.
Thanks a lot.
How about using the Needleman-Wunsch algorithm? :)
The scoring matrix would be trivial, as the "distance" between two numbers is just their difference.
But that will probably feel like killing a sparrow with a tank ...
You could use the built-in map function to vectorize a function that does this (in Python 3, map returns an iterator, hence the list() call). For example:
ar1 = np.array([1,2,3,8,20,23])
ar2 = np.array([1,2,3,5,7,21,35])
def closest(ar1, ar2, i):
    x = np.abs(ar1[i] - ar2)
    index = np.where(x == x.min())
    value = ar2[index]
    return value
def find(x):
    return closest(ar1, ar2, x)
c = np.array(list(map(find, range(ar1.shape[0]))))
In the example above, it looked like you wanted to exclude values once they had been paired. In that case, you could include a removal process in the first function like this, but be very careful about how array 1 is sorted:
def closest(ar1, ar2, i):
    x = np.abs(ar1[i] - ar2)
    index = np.where(x == x.min())
    value = ar2[index]
    ar2[ar2 == value] = -10000000
    return value
The best method I can think of is to use a loop. If the loop is slow in Python, you can use Cython to speed up your code.
I think one can do it like this:
create two new structured arrays with a second field that is 0 or 1, indicating which input array each value came from, i.e. the key
concatenate both arrays
sort the combined array along the first field (the values)
use two stacks: go through the array pushing elements with key 1 onto the left stack; once you cross an element with key 0, push them onto the right stack instead. When you reach the second element with key 0, compare the first key-0 element against the tops of the left and right stacks and take the closest value (perhaps subject to a maximum distance), then switch stacks and continue.
The sort should be the slowest step, and the maximum total space for the stacks is n or m.
You can do the following:
a = np.array([1,2,3,8,20,23])
b = np.array([1,2,3,5,7,21,25])
def find_closest(a, sorted_b):
    j = np.searchsorted(.5*(sorted_b[1:] + sorted_b[:-1]), a, side='right')
    return sorted_b[j]
b.sort()  # or b = np.sort(b), if you don't want to modify b in-place
print(np.c_[a, find_closest(a, b)])
# ->
# array([[ 1, 1],
# [ 2, 2],
# [ 3, 3],
# [ 8, 7],
# [20, 21],
# [23, 25]])
This should be pretty fast. It works because, for each number in a, searchsorted finds which interval between midpoints of consecutive sorted b values it falls into, and the index of that interval is exactly the index of the closest number in b.
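To see the midpoint trick at work on its own (the query values here are example choices):

```python
import numpy as np

b = np.array([1, 2, 3, 5, 7, 21, 25])
midpoints = 0.5 * (b[1:] + b[:-1])
print(midpoints)                 # [ 1.5  2.5  4.   6.  14.  23. ]

# For each query, searchsorted counts how many midpoints it lies to the
# right of; that count is exactly the index of the nearest element of b
# (with side='right', ties on a midpoint break toward the larger value).
queries = np.array([8, 20, 23])
j = np.searchsorted(midpoints, queries, side='right')
print(b[j])                      # [ 7 21 25]
```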
