Use numpy.argwhere to obtain the matching values in an np.array - python

I'd like to use np.argwhere() to obtain the values in an np.array.
For example:
z = np.arange(9).reshape(3,3)
[[0 1 2]
 [3 4 5]
 [6 7 8]]
zi = np.argwhere(z % 3 == 0)
[[0 0]
 [1 0]
 [2 0]]
I want this array: [0, 3, 6] and did this:
t = [z[tuple(i)] for i in zi] # -> [0, 3, 6]
I assume there is an easier way.

Why not simply use masking here:
z[z % 3 == 0]
For your sample matrix, this will generate:
>>> z[z % 3 == 0]
array([0, 3, 6])
If you index a matrix with a boolean array of the same shape, you get a 1-d array containing the elements of that matrix where the boolean array is True.
This will furthermore work more efficiently, since the filtering is done at the numpy level (whereas a list comprehension works at the Python interpreter level).
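For reference, a minimal self-contained sketch of the masking approach (np.extract is shown only as an optional functional spelling of the same selection):

import numpy as np

z = np.arange(9).reshape(3, 3)
mask = z % 3 == 0             # boolean array with the same shape as z
print(z[mask])                # -> [0 3 6]
print(np.extract(mask, z))    # -> [0 3 6], functional spelling of the same selection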

Source for argwhere
def argwhere(a):
    """
    Find the indices of array elements that are non-zero, grouped by element.
    ...
    """
    return transpose(nonzero(a))
np.where (when called with a single argument) is the same as np.nonzero.
In [902]: z=np.arange(9).reshape(3,3)
In [903]: z%3==0
Out[903]:
array([[ True, False, False],
       [ True, False, False],
       [ True, False, False]], dtype=bool)
In [904]: np.nonzero(z%3==0)
Out[904]: (array([0, 1, 2], dtype=int32), array([0, 0, 0], dtype=int32))
In [905]: np.transpose(np.nonzero(z%3==0))
Out[905]:
array([[0, 0],
       [1, 0],
       [2, 0]], dtype=int32)
In [906]: z[[0,1,2], [0,0,0]]
Out[906]: array([0, 3, 6])
z[np.nonzero(z%3==0)] is equivalent to using I,J as indexing arrays:
In [907]: I,J =np.nonzero(z%3==0)
In [908]: I
Out[908]: array([0, 1, 2], dtype=int32)
In [909]: J
Out[909]: array([0, 0, 0], dtype=int32)
In [910]: z[I,J]
Out[910]: array([0, 3, 6])
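Tying the two answers together, indexing with the nonzero tuple directly gives the same values as the boolean mask, and (one possible shortcut) the argwhere result can be turned back into that tuple form with a transpose:

import numpy as np

z = np.arange(9).reshape(3, 3)
print(z[np.nonzero(z % 3 == 0)])   # -> [0 3 6]
zi = np.argwhere(z % 3 == 0)
print(z[tuple(zi.T)])              # -> [0 3 6], the argwhere rows reassembled into an index tuple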

Related

Rows of sparse matrix where no column is zero

I have a matrix like this:
A = sp.csr_matrix(np.array(
    [[1, 1, 2, 1],
     [0, 0, 2, 0],
     [1, 4, 1, 1],
     [0, 1, 0, 0]]))
I want to get all the rows where all columns are nonzero, so I can then get their sum. Either as an array:
rows = [True, False, True, False]
result = A[rows].sum()
Or as indices:
rows = [0, 2]
result = A[rows].sum()
I am stuck however at the first part, figuring out which rows to include in the sum, as most results seem to be looking for the opposite (rows where all columns are zero).
In [35]: from scipy import sparse
In [36]: A = sparse.csr_matrix(np.array(
    ...:     [[1, 1, 2, 1],
    ...:      [0, 0, 2, 0],
    ...:      [1, 4, 1, 1],
    ...:      [0, 1, 0, 0]]))
In [37]: A
Out[37]:
<4x4 sparse matrix of type '<class 'numpy.int64'>'
with 10 stored elements in Compressed Sparse Row format>
Sparse matrices don't implement 'all/any' kinds of operations, because those operations treat 0's as significant values, and the sparse formats don't store zeros.
all on the dense equivalent works nicely:
In [41]: A.A.all(axis=1)
Out[41]: array([ True, False, True, False])
On the sparse one we can cast the dtype to boolean and sum along the axis, then test the result against the full row length:
In [42]: A.astype(bool).sum(axis=1)
Out[42]:
matrix([[4],
        [1],
        [4],
        [1]])
In [43]: A.astype(bool).sum(axis=1).A1==4
Out[43]: array([ True, False, True, False])
Notice that the sparse sum returns a np.matrix. I used A1 to turn that into a 1d array.
If the matrix isn't too large, working with the dense array may be faster. Sparse operations like sum are actually performed with matrix multiplication.
In [51]: A.astype(bool)@np.ones(4,int)
Out[51]: array([4, 1, 4, 1])
Or we could convert it to lil format, and look at the length of the 'rows':
In [67]: A.tolil().data
Out[67]:
array([list([1, 1, 2, 1]), list([2]), list([1, 4, 1, 1]), list([1])],
dtype=object)
In [68]: [len(i) for i in A.tolil().data]
Out[68]: [4, 1, 4, 1]
But wait, there's more. The indptr attribute of the csr is:
In [69]: A.indptr
Out[69]: array([ 0, 4, 5, 9, 10], dtype=int32)
In [70]: np.diff(A.indptr)
Out[70]: array([4, 1, 4, 1], dtype=int32)
I've omitted some test timings, but this last is clearly the fastest!
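A sketch tying the indptr idea back to the original goal (this assumes the csr data contains no explicitly stored zeros, which is the usual case):

import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[1, 1, 2, 1],
                                [0, 0, 2, 0],
                                [1, 4, 1, 1],
                                [0, 1, 0, 0]]))

# stored entries per row; a row is fully nonzero when that count
# equals the number of columns
rows = np.diff(A.indptr) == A.shape[1]
result = A[rows].sum()     # -> 12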
It is a bit easier to do for numpy arrays than for sparse ones. If you do not mind converting to numpy as an intermediate step, you can get the right rows via
(A.toarray() != 0).all(axis=1)
to produce
array([ True, False, True, False])
and then use it in indexing A as such:
A[(A.toarray() != 0).all(axis=1),:].sum()
returns 12
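If you want the row indices instead of the boolean mask (the rows = [0, 2] form from the question), one possible follow-up is:

import numpy as np
from scipy import sparse

A = sparse.csr_matrix([[1, 1, 2, 1],
                       [0, 0, 2, 0],
                       [1, 4, 1, 1],
                       [0, 1, 0, 0]])
rows = np.flatnonzero((A.toarray() != 0).all(axis=1))   # -> array([0, 2])
result = A[rows].sum()                                  # -> 12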

Convert a list to a numpy mask array

Given a list like indice = [1, 0, 2] and a dimension m = 3, I want to get a mask array like this:
>>> import numpy as np
>>> mask_array = np.array([ [1, 1, 0], [1, 0, 0], [1, 1, 1] ])
>>> mask_array
[[1, 1, 0],
 [1, 0, 0],
 [1, 1, 1]]
Given m = 3, each row of mask_array has length 3 (its axis=1 size), and the number of rows equals the length of indice.
To convert indice to mask_array, the rule is: in each row, set to 1 every entry whose column index is less than or equal to the corresponding value in indice. For example, indice[0] = 1, so the first row is [1, 1, 0], given the dimension is 3.
In NumPy, are there any APIs which can be used to do this?
Sure, just use broadcasting with arange(m); make sure to use an np.array for the indices, not a list...
>>> indice = [1, 0, 2]
>>> m = 3
>>> np.arange(m) <= np.array(indice)[..., None]
array([[ True,  True, False],
       [ True, False, False],
       [ True,  True,  True]])
Note that the [..., None] just reshapes the indices array so that the broadcasting works the way we want, like this:
>>> indices = np.array(indice)
>>> indices
array([1, 0, 2])
>>> indices[...,None]
array([[1],
       [0],
       [2]])
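If you specifically want the 0/1 integer mask from the question rather than booleans, the same comparison just needs a final cast (a small sketch):

import numpy as np

indice = [1, 0, 2]
m = 3
mask_array = (np.arange(m) <= np.array(indice)[:, None]).astype(int)
# -> [[1 1 0]
#     [1 0 0]
#     [1 1 1]]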

Count different non-zero values of two 2D arrays in Python numpy

Using NumPy, I would like to find out how many values in the 2D array array1 differ from the values at the same positions (x, y) in array2, counting only positions where array2 is not equal to 0.
array1 = numpy.array([[1, 2], [3, 0]])
array2 = numpy.array([[1, 2], [0, 3]])
print(numpy.count_nonzero(array1 != array2)) # 2
The example above prints 2, because 0 and 3 are different. Is there any way not to count a difference when the value in array2 is 0? Something like this (which does not work, raising ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()):
print(numpy.count_nonzero(array1 != array2 and array2 != 0)) # Should be 1.
a = np.array([[1, 2], [3, 0]])
b = np.array([[1, 2], [0, 3]])
Filter out b's zero values
np.nonzero returns indices; this uses multidimensional index arrays to filter out the zero values.
In [144]: b.nonzero()
Out[144]: (array([0, 0, 1], dtype=int64), array([0, 1, 1], dtype=int64))
In [145]: a[b.nonzero()]
Out[145]: array([1, 2, 0])
In [146]: b[b.nonzero()]
Out[146]: array([1, 2, 3])
In [147]: c = a[b.nonzero()] != b[b.nonzero()]
In [148]: c.sum()
Out[148]: 1
This uses boolean indexing to filter out the zero values.
In [149]: b != 0
Out[149]:
array([[ True,  True],
       [False,  True]], dtype=bool)
In [150]: a[b != 0]
Out[150]: array([1, 2, 0])
In [151]: b[b != 0]
Out[151]: array([1, 2, 3])
In [152]: c = a[b != 0] != b[b != 0]
In [153]: c.sum()
Out[153]: 1
You can achieve that by replacing and with multiplication:
print(numpy.count_nonzero((array1 != array2) * (array2 != 0)))
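Equivalently, here is a sketch of the same condition using the element-wise & operator on the boolean arrays (this is what the and in the question was reaching for):

import numpy

array1 = numpy.array([[1, 2], [3, 0]])
array2 = numpy.array([[1, 2], [0, 3]])
print(numpy.count_nonzero((array1 != array2) & (array2 != 0)))   # -> 1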

2d numpy mask not working as expected

I'm trying to turn a 2x3 numpy array into a 2x2 array by removing select indexes.
I think I can do this with a mask array with true/false values.
Given
[ 1, 2, 3],
[ 4, 1, 6]
I want to remove one element from each row to give me:
[ 2, 3],
[ 4, 6]
However this method isn't working quite like I would expect:
import numpy as np
in_array = np.array([
    [1, 2, 3],
    [4, 1, 6]
])
mask = np.array([
    [False, True, True],
    [True, False, True]
])
print(in_array[mask])
Gives me:
[2 3 4 6]
Which is not what I want. Any ideas?
The only thing 'wrong' with that result is its shape - 1d rather than 2d. But what if your mask was
mask = np.array([
    [False, True, False],
    [True, False, True]
])
1 value in the first row, 2 in the second. It couldn't return that as a 2d array, could it?
So the default behavior when masking like this is to return a 1d, or raveled result.
Boolean indexing like this is effectively a where indexing:
In [19]: np.where(mask)
Out[19]: (array([0, 0, 1, 1], dtype=int32), array([1, 2, 0, 2], dtype=int32))
In [20]: in_array[_]
Out[20]: array([2, 3, 4, 6])
It finds the elements of the mask which are true, and then selects the corresponding elements of the in_array.
Maybe the transpose of where is easier to visualize:
In [21]: np.argwhere(mask)
Out[21]:
array([[0, 1],
       [0, 2],
       [1, 0],
       [1, 2]], dtype=int32)
and indexing iteratively:
In [23]: for ij in np.argwhere(mask):
    ...:     print(in_array[tuple(ij)])
    ...:
2
3
4
6
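If, as in this example, the mask keeps the same number of elements in every row, one way to get the 2x2 result (a sketch that assumes an equal count per row) is to reshape the raveled result:

import numpy as np

in_array = np.array([[1, 2, 3],
                     [4, 1, 6]])
mask = np.array([[False, True, True],
                 [True, False, True]])
out = in_array[mask].reshape(in_array.shape[0], -1)
# -> [[2 3]
#     [4 6]]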

Instantiate a matrix with x zeros and the rest ones

I would like to be able to quickly instantiate a matrix where the first few (variable number of) cells in a row are 0, and the rest are ones.
Imagine we want a 4x3 matrix (4 rows, 3 columns).
I have instantiated the matrix first as all ones:
ones = np.ones([4,3])
Then imagine we have an array that announces how many leading zeros there are:
arr = np.array([2,1,3,0]) # first row has 2 zeroes, second row 1 zero, etc
Required result:
array([[0, 0, 1],
       [0, 1, 1],
       [0, 0, 0],
       [1, 1, 1]])
Obviously this could be done the opposite way as well, but I'd prefer the approach where 1 is the default value and the zeros are filled in.
What would be the best way to avoid some silly loop?
Here's one way. n is the number of columns in the result. The number of rows is determined by len(arr).
In [29]: n = 5
In [30]: arr = np.array([1, 2, 3, 0, 3])
In [31]: (np.arange(n) >= arr[:, np.newaxis]).astype(int)
Out[31]:
array([[0, 1, 1, 1, 1],
       [0, 0, 1, 1, 1],
       [0, 0, 0, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1]])
There are two parts to the explanation of how this works. First, how do you create a row with m zeros and n-m ones? For that, we use np.arange to create a row with the values [0, 1, ..., n-1]:
In [35]: n
Out[35]: 5
In [36]: np.arange(n)
Out[36]: array([0, 1, 2, 3, 4])
Next, compare that array to m:
In [37]: m = 2
In [38]: np.arange(n) >= m
Out[38]: array([False, False, True, True, True], dtype=bool)
That gives an array of boolean values; the first m values are False and the rest are True. By casting those values to integers, we get an array of 0s and 1s:
In [39]: (np.arange(n) >= m).astype(int)
Out[39]: array([0, 0, 1, 1, 1])
To perform this over an array of m values (your arr), we use broadcasting; this is the second key idea of the explanation.
Note what arr[:, np.newaxis] gives:
In [40]: arr
Out[40]: array([1, 2, 3, 0, 3])
In [41]: arr[:, np.newaxis]
Out[41]:
array([[1],
       [2],
       [3],
       [0],
       [3]])
That is, arr[:, np.newaxis] reshapes arr into a 2-d array with shape (5, 1). (arr.reshape(-1, 1) could have been used instead.) Now when we compare this to np.arange(n) (a 1-d array with length n), broadcasting kicks in:
In [42]: np.arange(n) >= arr[:, np.newaxis]
Out[42]:
array([[False,  True,  True,  True,  True],
       [False, False,  True,  True,  True],
       [False, False, False,  True,  True],
       [ True,  True,  True,  True,  True],
       [False, False, False,  True,  True]], dtype=bool)
As @RogerFan points out in his comment, this is basically an outer product of the arguments, using the >= operation.
A final cast to type int gives the desired result:
In [43]: (np.arange(n) >= arr[:, np.newaxis]).astype(int)
Out[43]:
array([[0, 1, 1, 1, 1],
       [0, 0, 1, 1, 1],
       [0, 0, 0, 1, 1],
       [1, 1, 1, 1, 1],
       [0, 0, 0, 1, 1]])
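Since the comparison is essentially an outer product with >=, a hedged alternative spelling of the same computation uses the ufunc's outer method:

import numpy as np

arr = np.array([1, 2, 3, 0, 3])
n = 5
# result[i, j] is 1 exactly where j >= arr[i], i.e. arr[i] <= j
result = np.less_equal.outer(arr, np.arange(n)).astype(int)   # same 0/1 array as Out[43] above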
Not as concise as I wanted (I was experimenting with mask_indices), but this will also do the job:
>>> n = 3
>>> zeros = [2, 1, 3, 0]
>>> numpy.array([[0] * zeros[i] + [1]*(n - zeros[i]) for i in range(len(zeros))])
array([[0, 0, 1],
       [0, 1, 1],
       [0, 0, 0],
       [1, 1, 1]])
It works very simply: it concatenates the one-element lists [0] and [1], each repeated the required number of times, building the array row by row.
