Numpy - find rows where all elements are finite - python

I have data which are stored in a numpy array with n rows and p columns.
I would like to check which rows are fully finite and store this information in a boolean array to use it as a mask somewhere.
I have solved it for the p=2 case, but would like to solve it for all cases
My code looks like this:
import numpy as np
raw_test = np.array([[0, np.nan], [0, 0], [np.nan, np.nan]])
test = np.isfinite(raw_test)
def multiply(x):
    return x[0] * x[1]
np.apply_along_axis(multiply, 1, test)

You can use numpy.isnan to check which of the items are NaN, then use numpy.all and numpy.where to find the indices of the rows whose entries are all True (here, the rows in which every element is NaN).
>>> np.isnan(raw_test)
array([[False,  True],
       [False, False],
       [ True,  True]], dtype=bool)
>>> np.all(np.isnan(raw_test), axis=1)
array([False, False,  True], dtype=bool)
>>> np.where(np.all(np.isnan(raw_test), axis=1))[0]
array([2])
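The question asks for the rows that are fully finite rather than the rows that are all NaN. A minimal sketch for any number of columns p, assuming the same raw_test array as in the question (the name finite_rows is only illustrative):

```python
import numpy as np

raw_test = np.array([[0, np.nan], [0, 0], [np.nan, np.nan]])

# True for each row in which every element is finite (no NaN, no +/-inf)
finite_rows = np.isfinite(raw_test).all(axis=1)
print(finite_rows)            # [False  True False]

# The boolean array can be used directly as a row mask
print(raw_test[finite_rows])  # [[0. 0.]]
```

Reducing with all(axis=1) replaces the hand-written product of two columns, so the same line works for p=2 and for any other p.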

Another option is to use a masked_array:
import numpy as np
raw_test = np.array([[0, np.nan], [0, 0], [np.nan, np.nan]])
test = np.ma.masked_invalid(raw_test)
print(test)
# [[0.0 --]
#  [0.0 0.0]
#  [-- --]]
def multiply(x):
    return x[0] * x[1]
print(np.apply_along_axis(multiply, 1, test))
yields
[ nan 0. nan]
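If the goal is a boolean row mask rather than a product, the mask of the masked array can be reduced per row. This is a sketch only, assuming the same raw_test data; row_has_invalid and finite_rows are illustrative names:

```python
import numpy as np

raw_test = np.array([[0, np.nan], [0, 0], [np.nan, np.nan]])
masked = np.ma.masked_invalid(raw_test)

# getmaskarray always returns a full boolean array, even when nothing is masked
row_has_invalid = np.ma.getmaskarray(masked).any(axis=1)

# A row is fully finite when none of its entries is masked
finite_rows = ~row_has_invalid
print(finite_rows)  # [False  True False]
```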

Related

How to count repeated elements in a numpy 2d array?

I have many very large padded numpy 2d arrays, simplified to array A, shown below. Array Z is the basic pad array:
A = np.array(([1 , 2, 3], [2, 3, 4], [0, 0, 0], [0, 0, 0], [0, 0, 0]))
Z = np.array([0, 0, 0])
How to count the number of pads in array A in the simplest / fastest pythonic way?
This works (zCount=3), but seems verbose, loopy and unpythonic:
zCount = 0
for a in A:
    if a.any() == Z.any():
        zCount += 1
zCount
Also tried a one-line list comprehension, which doesn't work (don't know why not):
[zCount += 1 for a in A if a.any() == Z.any()]
zCount
Also tried a list count, but 'truth value of array with more than one element is ambiguous':
list(A).count(Z)
Have searched for a simple numpy expression without success. np.count_nonzero gives full elementwise boolean for [0]. Is there a one-word / one-line counting expression for [0, 0, 0]? (My actual arrays are approx. shape (100,30) and I have up to millions of these. I am trying to deal with them in batches, so any simple time savings generating a count would be helpful). thx
Try:
>>> np.equal(A, Z).all(axis=1).sum()
3
Step by step:
>>> np.equal(A, Z)
array([[False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]])
>>> np.equal(A, Z).all(axis=1)
array([False, False, True, True, True])
>>> np.equal(A, Z).all(axis=1).sum()
3
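Because the pad row here is all zeros, a row is a pad exactly when none of its elements is nonzero, so the comparison against Z can be skipped. The sketch below also counts pads for a whole batch at once; the stacked batch array is a hypothetical stand-in for the many (100, 30) arrays mentioned in the question:

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 3, 4], [0, 0, 0], [0, 0, 0], [0, 0, 0]])

# A row is a pad row when it contains no nonzero element
pad_count = np.count_nonzero(~A.any(axis=1))
print(pad_count)  # 3

# Hypothetical batch: several padded arrays stacked into shape (batch, rows, cols)
batch = np.stack([A, np.zeros_like(A)])

# One pad count per array in the batch
pad_counts = (batch == 0).all(axis=2).sum(axis=1)
print(pad_counts)  # [3 5]
```

This shortcut only holds for an all-zero pad row; for any other pad value, the np.equal(A, Z) form is the one to use.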

How can I compare two matrixes for similarity using Python?

Python 3:
How can I compare two matrices of similar shape to one another?
For example, let's say we have matrix x:
1 0 1
0 0 1
1 1 0
I would like to compare this to matrix y:
1 0 1
0 0 1
1 1 1
Which would give me a score, for example, 8/9 as 8/9 of the items were the same, with the exception of that last digit that went from 0 to 1. The matrices I am dealing with are much larger, but their dimensions are consistent for comparison.
There must be a library of some sort that can do this. Any thoughts?
If you are using numpy, you can simply use np.mean() on the boolean array after comparison as follows.
import numpy as np
m1 = np.array([
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
])
m2 = np.array([
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
])
score = np.mean(m1 == m2)
print(score) # prints 0.888..
You can do this easily with numpy arrays.
import numpy as np
a = np.array([
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
])
b = np.array([
    [1, 0, 1],
    [0, 0, 1],
    [1, 1, 1],
])
print(np.sum(a == b) / a.size)
Gives back 0.889.
If your matrices are represented using the third-party library Numpy (which provides a lot of other useful stuff for dealing with matrices, as well as any kind of rectangular, multi-dimensional array):
>>> import numpy as np
>>> x = np.array([[1,0,1],[0,0,1],[1,1,0]])
>>> y = np.array([[1,0,1],[0,0,1],[1,1,1]])
Then finding the number of corresponding equal elements is as simple as:
>>> (x == y).sum() / x.size
0.8888888888888888
This works because x == y "broadcasts" the comparison to each corresponding element pair:
>>> x == y
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True, False]])
and then we add up the boolean values (converted to integer, True has a value of 1 and False has a value of 0) and divide by the total number of elements.
If you are using NumPy you can compare them and get the following output:
import numpy as np
a = np.array([[1,0,1],[0,0,1],[1,1,0]])
b = np.array([[1,0,1],[0,0,1],[1,1,1]])
print(a == b)
Out: array([[ True,  True,  True],
            [ True,  True,  True],
            [ True,  True, False]])
To count the matches you can reshape the matrices to a list and count the matching values:
import numpy as np
a = np.array([[1,0,1],[0,0,1],[1,1,0]])
b = np.array([[1,0,1],[0,0,1],[1,1,1]])
res = list(np.array(a==b).reshape(-1,))
print(f'{res.count(True)}/{len(res)}')
Out: 8/9
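To get the score as an exact fraction such as 8/9 without building a Python list, the matches can be counted with np.count_nonzero. This is a sketch; match_score is a hypothetical helper name, and the shape check is an added assumption about how mismatched inputs should be handled:

```python
from fractions import Fraction

import numpy as np


def match_score(a, b):
    """Return the fraction of positions at which a and b hold equal values."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError("matrices must have the same shape")
    matches = int(np.count_nonzero(a == b))
    return Fraction(matches, a.size)


x = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 0]])
y = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 1]])

score = match_score(x, y)
print(score)         # 8/9
print(float(score))  # 0.8888888888888888
```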

Elegant way to check co-ordinates of a 2D NumPy array lie within a certain range

So let us say we have a 2D NumPy array (denoting co-ordinates) and I want to check whether all the co-ordinates lie within a certain range. What is the most Pythonic way to do this? For example:
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
# ALL THE COORDINATES WITHIN x -> 0 to 4 AND y -> 0 to 4 SHOULD
# BE PUT IN b (x and y ranges might not be equal)
b = # DO SOME OPERATION
>>> b
[[3, 4],
 [0, 0]]
If the range is the same for both directions, x, and y, just compare them and use all:
import numpy as np
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
a[(a >= 0).all(axis=1) & (a <= 4).all(axis=1)]
# array([[3, 4],
#        [0, 0]])
If the ranges are not the same, you can also compare to an iterable of the same size as that axis (so two here):
mins = 0, 1 # x_min, y_min
maxs = 4, 10 # x_max, y_max
a[(a >= mins).all(axis=1) & (a <= maxs).all(axis=1)]
# array([[1, 5],
#        [3, 4]])
To see what is happening here, let's have a look at the intermediate steps:
The comparison gives a per-element result of the comparison, with the same shape as the original array:
a >= mins
# array([[False,  True],
#        [ True,  True],
#        [ True,  True],
#        [ True,  True],
#        [ True,  True],
#        [ True, False],
#        [False, False]], dtype=bool)
Using numpy.ndarray.all, you can check whether all values are truthy, similarly to the built-in function all:
(a >= mins).all()
# False
With the axis argument, you can restrict this to only compare values along one (or multiple) axis of the array:
(a >= mins).all(axis=1)
# array([False, True, True, True, True, False, False], dtype=bool)
(a >= mins).all(axis=0)
# array([False, False], dtype=bool)
Note that the output has the same shape as the array, except that every dimension named in axis has been reduced away, leaving a single True/False for each position along the remaining axes.
When an array is indexed with a one-dimensional sequence of True, False values, the sequence is matched against the first axis. Here an array of shape (7, 2) is indexed with a mask of shape (7,), so each True selects an entire row of the original array.
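The two all(axis=1) calls can also be folded into one by combining the element-wise comparisons first. A minimal sketch, where rows_within is a hypothetical helper name and the bounds are the ones used in the examples above:

```python
import numpy as np


def rows_within(points, mins, maxs):
    """Boolean mask of the rows whose coordinates all lie in [mins, maxs]."""
    points = np.asarray(points)
    # Element-wise test per coordinate; mins and maxs broadcast across rows
    inside = (points >= mins) & (points <= maxs)
    # Keep a row only if every coordinate passed
    return inside.all(axis=1)


a = np.array([[-1, 2], [1, 5], [6, 7], [5, 2], [3, 4], [0, 0], [-1, -1]])

mask = rows_within(a, mins=(0, 1), maxs=(4, 10))
b = a[mask]
print(b)
# [[1 5]
#  [3 4]]
```

Scalars work for mins and maxs as well, which covers the case where both directions share the same range.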

Numpy matrix comparison to several criteria

I'm working on comparing values in a numpy matrix.
Initially I wanted to check if any of the values in the matrix m were smaller than X, so I used:
(m<(X)).any()
Which worked fine, but now I would like it to ignore all 0 values in the matrix, so in essence to tell me if any values in the matrix m are in that range 0 < m < X.
I've figured out a way to do this with a while loop, but was hoping that there might be a function similar to the one above that could do the trick?
Many Thanks
Much like here, you can do
np.where(np.logical_and(0<a,a<6))
And it will give you two arrays, which tell you the locations in your matrix.
(array([0, 0, 1, 1, 1], dtype=int32),
array([1, 2, 0, 1, 2], dtype=int32))
Unlike the above, you have an n-dimensional array, and the output of that may not be as useful as using a masked array
b=np.ma.masked_where(np.logical_or(a<=0,a>=6),a)
b
Out[40]:
masked_array(data =
 [[-- 1 2]
 [3 4 5]
 [-- -- --]],
             mask =
 [[ True False False]
 [False False False]
 [ True  True  True]],
       fill_value = 999999)
Since that can give you a more useful array that preserves location.
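For the yes/no question originally asked (is any value strictly between 0 and X?), the two comparisons can be combined and reduced with any(). This is a sketch: the matrix m below is an assumed example chosen to be consistent with the outputs shown in this answer, and X = 6 is likewise assumed:

```python
import numpy as np

m = np.arange(9).reshape(3, 3)  # assumed example data: values 0..8
X = 6                           # assumed upper bound

# True where the value is nonzero-positive and below X; zeros are excluded
in_range = (m > 0) & (m < X)

print(in_range.any())  # True
print(m[in_range])     # [1 2 3 4 5]
```

Note that m > 0 also excludes negative values; if only exact zeros should be ignored, m != 0 would be the comparison to use instead.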

Vectorized element assignment involving comparisons between matrices in Numpy

I'm currently trying to replace the for-loops in this code chunk with vectorized operations in Numpy:
def classifysignal(samplemat, binedges, nbinmat, nodatacode):
    ndata, nsignals = np.shape(samplemat)
    classifiedmat = np.zeros(shape=(ndata, nsignals))
    ncounts = 0
    for i in range(ndata):
        for j in range(nsignals):
            classifiedmat[i,j] = nbinmat[j]
            for e in range(nbinmat[j]):
                if samplemat[i,j] == nodatacode:
                    classifiedmat[i,j] == nodatacode
                    break
                elif samplemat[i,j] <= binedges[j, e]:
                    classifiedmat[i,j] = e
                    ncounts += 1
                    break
    ncounts = float(ncounts/nsignals)
    return classifiedmat, ncounts
However, I'm having a little trouble conceptualizing how to replace the third for loop (i.e. the one beginning with for e in range(nbinmat[j]), since it entails comparing individual elements of two separate matrices before assigning a value, with the indices of these elements (i and e) being completely decoupled from each other. Is there a simple way to do this using whole-array operations, or would sticking with for-loops be best?
PS: My first Stackoverflow question, so if anything's unclear/more details are needed, please let me know! Thanks.
Without some concrete examples and explanation it's hard (or at least a lot of work) to figure out what you are trying to do, especially in the inner loop. So let's tackle a few pieces and try to simplify them.
In [59]: C=np.zeros((3,4),int)
In [60]: N=np.arange(4)
In [61]: C[:]=N
In [62]: C
Out[62]:
array([[0, 1, 2, 3],
       [0, 1, 2, 3],
       [0, 1, 2, 3]])
means that classifiedmat[i,j] = nbinmat[j] can be moved out of the loops
classifiedmat = np.zeros(samplemat.shape)
classifiedmat[:] = nbinmat
and
In [63]: S=np.arange(12).reshape(3,4)
In [64]: C[S>8]=99
In [65]: C
Out[65]:
array([[ 0,  1,  2,  3],
       [ 0,  1,  2,  3],
       [ 0, 99, 99, 99]])
suggests that
if samplemat[i,j] == nodatacode:
    classifiedmat[i,j] == nodatacode
could be replaced with
classifiedmat[samplemat==nodatacode] = nodatacode
I haven't worked out whether loop and break modifies this replacement or not.
A possible model for the inner loop is:
In [83]: B=np.array((np.arange(4),np.arange(2,6))).T
In [84]: for e in range(2):
   ....:     C[S<=B[:,e]]=e
   ....:
In [85]: C
Out[85]:
array([[ 1,  1,  1,  1],
       [ 0,  1,  2,  3],
       [ 0, 99, 99, 99]])
You could also compare all values of S and B with:
In [86]: S[:,:,None]<=B[None,:,:]
Out[86]:
array([[[ True,  True],
        [ True,  True],
        [ True,  True],
        [ True,  True]],

       [[False, False],
        [False, False],
        [False, False],
        [False, False]],

       [[False, False],
        [False, False],
        [False, False],
        [False, False]]], dtype=bool)
The fact that you are iterating over:
for e in range(nbinmat[j]):
may throw out all these equivalences. I'm not going to try to figure out its significance. But maybe I've given you some ideas.
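One way the pieces might be combined is to restrict the broadcast comparison to the first nbinmat[j] edges of each signal and then take the first True along the edge axis. This is a sketch under stated assumptions, not a verified drop-in replacement: it assumes binedges has shape (nsignals, max_bins), that every nbinmat[j] is at least 1, and that the line classifiedmat[i,j] == nodatacode in the question was meant to be an assignment. The function name classify_signal_vectorized and the sample data are hypothetical.

```python
import numpy as np


def classify_signal_vectorized(samplemat, binedges, nbinmat, nodatacode):
    samplemat = np.asarray(samplemat)
    binedges = np.asarray(binedges)
    nbinmat = np.asarray(nbinmat)
    nsignals = samplemat.shape[1]
    max_bins = binedges.shape[1]

    # below[i, j, e] is True where samplemat[i, j] <= binedges[j, e]
    below = samplemat[:, :, None] <= binedges[None, :, :]
    # Only the first nbinmat[j] edges of signal j take part in the search
    valid = np.arange(max_bins)[None, :] < nbinmat[:, None]
    below &= valid[None, :, :]

    nodata = samplemat == nodatacode
    found = below.any(axis=2) & ~nodata  # a bin edge was reached
    first = below.argmax(axis=2)         # index of the first True edge

    # Default value is nbinmat[j]; astype makes a writable float copy
    classified = np.broadcast_to(nbinmat, samplemat.shape).astype(float)
    classified[found] = first[found]
    classified[nodata] = nodatacode      # assumed intent of the original line
    ncounts = found.sum() / nsignals
    return classified, ncounts


# Hypothetical data: 3 samples, 2 signals, up to 3 bin edges per signal
samplemat = np.array([[0.5, 2.5], [-9.0, 1.0], [5.0, 0.0]])
binedges = np.array([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]])
nbinmat = np.array([3, 2])

classified, ncounts = classify_signal_vectorized(samplemat, binedges, nbinmat, -9.0)
print(classified)
print(ncounts)
```

The intermediate boolean array has shape (ndata, nsignals, max_bins), so memory use grows with the number of bin edges; for very large inputs the loop over e shown earlier may be the more practical compromise.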
Well, if you want to use vector operations you need to solve the problem using linear algebra. I can't rethink the problem for you, but the general approach I would take is something like:
res = Subtract samplemat from binedges
res = Normalize values in res to 0 and 1 (use clip?). i.e if > 0, then 1 else 0.
ncount = sum ( res )
classifiedMat = res * binedges
And so on.
