How to count repeated elements in a numpy 2d array? - python

I have many very large padded numpy 2d arrays, simplified to array A, shown below. Array Z is the basic pad array:
A = np.array(([1 , 2, 3], [2, 3, 4], [0, 0, 0], [0, 0, 0], [0, 0, 0]))
Z = np.array([0, 0, 0])
How to count the number of pads in array A in the simplest / fastest pythonic way?
This works (zCount=3), but seems verbose, loopy and unpythonic:
zCount = 0
for a in A:
if a.any() == Z.any():
zCount += 1
zCount
Also tried a one-line list comprehension, which doesn't work (dont know why not):
[zCount += 1 for a in A if a.any() == Z.any()]
zCount
Also tried a list count, but 'truth value of array with more than one element is ambiguous':
list(A).count(Z)
Have searched for a simple numpy expression without success. np.count_nonzero gives full elementwise boolean for [0]. Is there a one-word / one-line counting expression for [0, 0, 0]? (My actual arrays are approx. shape (100,30) and I have up to millions of these. I am trying to deal with them in batches, so any simple time savings generating a count would be helpful). thx

Try:
>>> np.equal(A, Z).all(axis=1).sum()
3
Step by step:
>>> np.equal(A, Z)
array([[False, False, False],
[False, False, False],
[ True, True, True],
[ True, True, True],
[ True, True, True]])
>>> np.equal(A, Z).all(axis=1)
array([False, False, True, True, True])
>>> np.equal(A, Z).all(axis=1).sum()
3

Related

Count number of transitions in each row of a Numpy array

I have a 2D boolean array
a=np.array([[True, False, True, False, True],[True , True, True , True, True], [True , True ,False, False ,False], [False, True , True, False, False], [True , True ,False, True, False]])
I would like to create a new array, providing count of True-False transitions in each row of this array.
The desired result is count=[2, 0, 1, 1, 2]
I operate with a large numpy array, so I don't apply cycle to browse through all lines.
I tried to adopt available solutions to a 2D array with counting for each line separately, but did not succeed.
Here is a possible solution:
b = a.astype(int)
c = (b[:, :-1] - b[:, 1:])
count = (c == 1).sum(axis=1)
Result:
>>> count
array([2, 0, 1, 1, 2])

Numpy getting row indices of last two elements of each column in mask

I have a boolean mask shaped (M, N). Each column in the mask may have a different number of True elements, but is guaranteed to have at least two. I want to find the row index of the last two such elements as efficiently as possible.
If I only wanted one element, I could do something like (M - 1) - np.argmax(mask[::-1, :], axis=0). However, that won't help me get the second-to-last index.
I've come up with an iterative solution using np.where or np.nonzero:
M = 4
N = 3
mask = np.array([
[False, True, True],
[True, False, True],
[True, False, True],
[False, True, False]
])
result = np.zeros((2, N), dtype=np.intp)
for col in range(N):
result[:, col] = np.flatnonzero(mask[:, col])[-2:]
This creates the expected result:
array([[1, 0, 1],
[2, 3, 2]], dtype=int64)
I would like to avoid the final loop. Is there a reasonably vectorized form of the above? I am looking for specifically two rows, which are always guaranteed to exist. A general solution for arbitrary element counts is not required.
An argsort does it -
In [9]: np.argsort(mask,axis=0,kind='stable')[-2:]
Out[9]:
array([[1, 0, 1],
[2, 3, 2]])
Another with cumsum -
c = mask.cumsum(0)
out = np.where((mask & (c>=c[-1]-1)).T)[1].reshape(-1,2).T
Specifically for exactly two rows, one way with argmax -
c = mask.copy()
idx = len(c)-c[::-1].argmax(0)-1
c[idx,np.arange(len(idx))] = 0
idx2 = len(c)-c[::-1].argmax(0)-1
out = np.vstack((idx2,idx))

Elegant way to check co-ordinates of a 2D NumPy array lie within a certain range

So let us say we have a 2D NumPy array (denoting co-ordinates) and I want to check whether all the co-ordinates lie within a certain range. What is the most Pythonic way to do this? For example:
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
#ALL THE COORDINATES WITHIN x-> 0 to 4 AND y-> 0 to 4 SHOULD
BE PUT IN b (x and y ranges might not be equal)
b = #DO SOME OPERATION
>>> b
>>> [[3,4],
[0,0]]
If the range is the same for both directions, x, and y, just compare them and use all:
import numpy as np
a = np.array([[-1,2], [1,5], [6,7], [5,2], [3,4], [0, 0], [-1,-1]])
a[(a >= 0).all(axis=1) & (a <= 4).all(axis=1)]
# array([[3, 4],
# [0, 0]])
If the ranges are not the same, you can also compare to an iterable of the same size as that axis (so two here):
mins = 0, 1 # x_min, y_min
maxs = 4, 10 # x_max, y_max
a[(a >= mins).all(axis=1) & (a <= maxs).all(axis=1)]
# array([[1, 5],
# [3, 4]])
To see what is happening here, let's have a look at the intermediate steps:
The comparison gives a per-element result of the comparison, with the same shape as the original array:
a >= mins
# array([[False, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, True],
# [ True, False],
# [False, False]], dtype=bool)
Using nmpy.ndarray.all, you get if all values are truthy or not, similarly to the built-in function all:
(a >= mins).all()
# False
With the axis argument, you can restrict this to only compare values along one (or multiple) axis of the array:
(a >= mins).all(axis=1)
# array([False, True, True, True, True, False, False], dtype=bool)
(a >= mins).all(axis=0)
# array([False, False], dtype=bool)
Note that the output of this is the same shape as array, except that all dimnsions mentioned with axis have been contracted to a single True/False.
When indexing an array with a sequence of True, False values, it is cast to the right shape if possible. Since we index an array with shape (7, 2) with an (7,) = (7, 1) index, the values are implicitly repeated along the second dimension, so these values are used to select rows of the original array.

Delete rows from a multidimensional array in Python

Im trying to delete specific rows in my numpy array that following certain conditions.
This is an example:
a = np.array ([[1,1,0,0,1],
[0,0,1,1,1],
[0,1,0,1,1],
[1,0,1,0,1],
[0,0,1,0,1],
[1,0,1,0,0]])
I want to able to delete all rows, where specific columns are zero, this array could be a lot bigger.
In this example, if first two element are zero, or if last two elements are zero, the rows will be deleted.
It could be any combination, no only first element or last ones.
This should be the final:
a = np.array ([[1,1,0,0,1],
[0,1,0,1,1],
[1,0,1,0,1]])
For example If I try:
a[:,0:2] == 0
After reading:
Remove lines with empty values from multidimensional-array in php
and this question: How to delete specific rows from a numpy array using a condition?
But they don't seem to apply to my case, or probably I'm not understanding something here as nothing works my case.
This gives me all rows there the first two cases are zero, True, True
array([[False, False],
[ True, True],
[ True, False],
[False, True],
[ True, True],
[False, True]])
and for the last two columns being zero, the last row should be deleted too. So at the end I will only be left with 2 rows.
a[:,3:5] == 0
array([[ True, False],
[False, False],
[False, False],
[ True, False],
[ True, False],
[ True, True]])
Im trying something like this, but I don't understand now how to tell it to only give me the rows that follow the condition, although this only :
(a[a[:,0:2]] == 0).all(axis=1)
array([[ True, True, False, False, False],
[False, False, True, True, False],
[False, False, False, False, False],
[False, False, False, False, False],
[False, False, True, True, False],
[False, False, False, False, False]])
(a[((a[:,0])& (a[:,1])) ] == 0).all(axis=1)
and this shows everything as False
could you please guide me a bit?
thank you
Just adding in the question, that the case it wont always be the first 2 or the last 2. If my matrix has 35 columns, it could be the column 6th to 10th, and then column 20th and 25th. An user will be able to decide which columns they want to get deleted.
Try this
idx0 = (a[:,0:2] == 0).all(axis=1)
idx1 = (a[:,-2:] == 0).all(axis=1)
a[~(idx0 | idx1)]
The first two steps select the indices of the rows that match your filtering criteria. Then do an or (|) operation, and the not (~) operation to obtain the final indices you want.
If I understood correctly you could do something like this:
import numpy as np
a = np.array([[1, 1, 0, 0, 1],
[0, 0, 1, 1, 1],
[0, 1, 0, 1, 1],
[1, 0, 1, 0, 1],
[0, 0, 1, 0, 1],
[1, 0, 1, 0, 0]])
left = np.count_nonzero(a[:, :2], axis=1) != 0
a = a[left]
right = np.count_nonzero(a[:, -2:], axis=1) != 0
a = a[right]
print(a)
Output
[[1 1 0 0 1]
[0 1 0 1 1]
[1 0 1 0 1]]
Or, a shorter version:
left = np.count_nonzero(a[:, :2], axis=1) != 0
right = np.count_nonzero(a[:, -2:], axis=1) != 0
a = a[(left & right)]
Use the following mask:
[np.any(a[:,:2], axis=1) & np.any(a[:,:-2], axis=1)]
if you want to create a filtered view:
a[np.any(a[:,:2], axis=1) & np.any(a[:,:-2], axis=1)]
if you want to create a new array:
np.delete(a,np.where(~(np.any(a[:,:2], axis=1) & np.any(a[:,:-2], axis=1))), axis=0)

Vectorized element assignment involving comparisons between matrices in Numpy

I'm currently trying to replace the for-loops in this code chunk with vectorized operations in Numpy:
def classifysignal(samplemat, binedges, nbinmat, nodatacode):
ndata, nsignals = np.shape(samplemat)
classifiedmat = np.zeros(shape=(ndata, nsignals))
ncounts = 0
for i in range(ndata):
for j in range(nsignals):
classifiedmat[i,j] = nbinmat[j]
for e in range(nbinmat[j]):
if samplemat[i,j] == nodatacode:
classifiedmat[i,j] == nodatacode
break
elif samplemat[i,j] <= binedges[j, e]:
classifiedmat[i,j] = e
ncounts += 1
break
ncounts = float(ncounts/nsignals)
return classifiedmat, ncounts
However, I'm having a little trouble conceptualizing how to replace the third for loop (i.e. the one beginning with for e in range(nbinmat[j]), since it entails comparing individual elements of two separate matrices before assigning a value, with the indices of these elements (i and e) being completely decoupled from each other. Is there a simple way to do this using whole-array operations, or would sticking with for-loops be best?
PS: My first Stackoverflow question, so if anything's unclear/more details are needed, please let me know! Thanks.
Without some concrete examples and explanation it's hard (or at least work) to figure out what you are trying to do, especially in the inner loop. So let's tackle a few pieces and try to simplify them
In [59]: C=np.zeros((3,4),int)
In [60]: N=np.arange(4)
In [61]: C[:]=N
In [62]: C
Out[62]:
array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
means that classifiedmat[i,j] = nbinmat[j] can be moved out of the loops
classifiedmat = np.zeros(samplemat.shape)
classifiedmat[:] = nbinmat
and
In [63]: S=np.arange(12).reshape(3,4)
In [64]: C[S>8]=99
In [65]: C
Out[65]:
array([[ 0, 1, 2, 3],
[ 0, 1, 2, 3],
[ 0, 99, 99, 99]])
suggests that
if samplemat[i,j] == nodatacode:
classifiedmat[i,j] == nodatacode
could be replaced with
classifiedmat[samplemat==nodatacode] = nodatacode
I haven't worked out whether loop and break modifies this replacement or not.
a possible model for inner loop is:
In [83]: B=np.array((np.arange(4),np.arange(2,6))).T
In [84]: for e in range(2):
C[S<=B[:,e]]=e
....:
In [85]: C
Out[85]:
array([[ 1, 1, 1, 1],
[ 0, 1, 2, 3],
[ 0, 99, 99, 99]])
You could also compare all values of S and B with:
In [86]: S[:,:,None]<=B[None,:,:]
Out[86]:
array([[[ True, True],
[ True, True],
[ True, True],
[ True, True]],
[[False, False],
[False, False],
[False, False],
[False, False]],
[[False, False],
[False, False],
[False, False],
[False, False]]], dtype=bool)
The fact that you are iterating over:
for e in range(nbinmat[j]):
may throw out all these equivalences. I'm not going try to figure out its significance. But maybe I've given you some ideas.
Well, if you want to use vector operations you need to solve the problem using linear algebra. I can't rethink the problem for you, but the general approach I would take is something like:
res = Subtract samplemat from binedges
res = Normalize values in res to 0 and 1 (use clip?). i.e if > 0, then 1 else 0.
ncount = sum ( res )
classifiedMat = res * binedges
And so on.

Categories