Suppose we have 2 tensors like these:
A = [[1, 2, 3, 4],
[5, 6, 7, 8]]
B = [[True, True, True, True],
[True, False, True, True]]
I want to extract, from each row of A, the K left-most elements whose corresponding boolean mask value in B is True. In the above example, if K=2, the result should be
C = [[1, 2],
[5, 7]]
6 is not included in C because its corresponding boolean mask is False.
I was able to do that with the following code:
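import tensorflow as tf
A, B = tf.constant(A), tf.constant(B)  # the lists above, as tensors
K = 2
batch_size = 2
C = tf.zeros((batch_size, K), tf.int32)
for batch_idx in tf.range(batch_size):
    a = A[batch_idx]
    b = B[batch_idx]
    tmp = tf.boolean_mask(a, b)  # keep the entries whose mask is True
    tmp = tmp[:K]                # then take the K left-most of them
    # Write the row into C; this assumes every row has at least K True values
    C = tf.tensor_scatter_nd_update(
        C, [[batch_idx]], tf.expand_dims(tmp, axis=0))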
But I don't want to iterate over A and B with for loop.
Is there any way to do this with matrix operators only?
Not sure if it will work for all corner cases, but you could try using tf.ragged.boolean_mask:
import tensorflow as tf
A = [[1, 2, 3, 4],
[5, 6, 7, 8]]
B = [[True, True, True, True],
[True, False, True, True]]
K = 2
tmp = tf.ragged.boolean_mask(A, B)
C = tmp[:, :K].to_tensor()
print(C)
Output for K = 2:
tf.Tensor(
[[1 2]
[5 7]], shape=(2, 2), dtype=int32)
Output for K = 3:
tf.Tensor(
[[1 2 3]
[5 7 8]], shape=(2, 3), dtype=int32)
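For comparison, here is a NumPy sketch of the same selection without ragged tensors. It uses a different technique (a stable argsort of the inverted mask), not the answer's method, and the names first_k_true and fill are hypothetical. One corner case applies to both versions: a row with fewer than K True values gets padded (to_tensor() pads with zeros by default); here the fill argument plays that role. The same steps should map onto tf.argsort(..., stable=True) (after casting the inverted mask to an integer type) and tf.gather(..., batch_dims=1), but that translation is an untested assumption.

```python
import numpy as np

def first_k_true(A, B, K, fill=0):
    """Per row, take the first K values of A where B is True; pad short rows with fill."""
    A = np.asarray(A)
    B = np.asarray(B, dtype=bool)
    # A stable argsort of ~B puts the True positions (key False) first, in original order
    order = np.argsort(~B, axis=1, kind="stable")[:, :K]
    C = np.take_along_axis(A, order, axis=1)
    # If a row has fewer than K True values, the tail of `order` points at False positions
    valid = np.take_along_axis(B, order, axis=1)
    return np.where(valid, C, fill)

A = [[1, 2, 3, 4],
     [5, 6, 7, 8]]
B = [[True, True, True, True],
     [True, False, True, True]]
print(first_k_true(A, B, 2))  # [[1 2]
                              #  [5 7]]
```

With K=4 the second row has only three True values, so its last slot becomes the fill value 0, mirroring the zero padding of to_tensor().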
Suppose we have two arrays like these:
A=np.array([[1, 4, 3, 0, 5],[6, 0, 7, 12, 11],[20, 15, 34, 45, 56]])
B=np.array([[4, 5, 6, 7]])
I want to write code that finds the indices of the elements of A whose values appear in the array B.
For example, I want the final result to be something like this:
C=[[0 1]
[0 4]
[1 0]
[1 2]]
Can anybody provide a solution or a hint?
Do you mean this?
In [375]: np.isin(A,B[0])
Out[375]:
array([[False, True, False, False, True],
[ True, False, True, False, False],
[False, False, False, False, False]])
In [376]: np.argwhere(np.isin(A,B[0]))
Out[376]:
array([[0, 1],
[0, 4],
[1, 0],
[1, 2]])
B has shape (1,4), where the leading 1 isn't necessary. That's why I used B[0], though isin (via in1d) ravels it anyway.
The where result is often more useful,
In [381]: np.where(np.isin(A,B))
Out[381]: (array([0, 0, 1, 1]), array([1, 4, 0, 2]))
though it's a bit harder to understand.
Another way to get the isin array:
In [383]: (A==B[0,:,None,None]).any(axis=0)
Out[383]:
array([[False, True, False, False, True],
[ True, False, True, False, False],
[False, False, False, False, False]])
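To tie these together, a small sketch showing that argwhere is just the column-stacked (transposed) form of the where tuple, and that the broadcasting comparison builds the same mask as isin. The variable names mask, rows, cols and pairs are illustrative only.

```python
import numpy as np

A = np.array([[1, 4, 3, 0, 5], [6, 0, 7, 12, 11], [20, 15, 34, 45, 56]])
B = np.array([[4, 5, 6, 7]])

mask = np.isin(A, B)          # element-wise membership test, same shape as A
rows, cols = np.where(mask)   # one index array per dimension
pairs = np.argwhere(mask)     # the same indices, one (row, col) pair per match

# argwhere == where/nonzero tuple stacked as columns
assert (pairs == np.column_stack((rows, cols))).all()
# The broadcasting alternative produces the same boolean mask
assert (mask == (A == B[0, :, None, None]).any(axis=0)).all()
print(pairs.tolist())  # [[0, 1], [0, 4], [1, 0], [1, 2]]
```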
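You can try it this way, using np.where() on each value of B. The matches come out grouped by value of B rather than in row-major order of A; for this data the two orders happen to coincide.
import numpy as np
index = []
for num in B:              # each row of B
    for nums in num:       # each value in that row
        x, y = np.where(A == nums)
        # a value may occur several times (or not at all) in A, so keep every match
        index.extend(zip(x.tolist(), y.tolist()))
index = np.array(index)
print(index)
>> [[0 1]
 [0 4]
 [1 0]
 [1 2]]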
With zip and np.where:
>>> list(zip(*np.where(np.in1d(A, B).reshape(A.shape))))
[(0, 1), (0, 4), (1, 0), (1, 2)]
Alternatively:
>>> np.vstack(np.where(np.isin(A,B))).transpose()
array([[0, 1],
[0, 4],
[1, 0],
[1, 2]], dtype=int64)
From the NumPy docs:
>>> np.where([[True, False], [True, True]],
... [[1, 2], [3, 4]],
... [[9, 8], [7, 6]])
array([[1, 8],
[3, 4]])
Am I right in assuming that [[True, False], [True, True]] is the condition, and [[1, 2], [3, 4]] and [[9, 8], [7, 6]] are x and y, respectively, per the documented parameters?
Then how exactly is the function choosing the elements in the following examples?
Also, why is the element type in these examples a list?
>>> np.where([[True, False,True], [False, True]], [[1, 2,56], [3, 4]], [[9, 8,79], [7, 6]])
array([list([1, 2, 56]), list([3, 4])], dtype=object)
>>> np.where([[False, False,True,True], [False, True]], [[1, 2,56,69], [3, 4]], [[9, 8,90,100], [7, 6]])
array([list([1, 2, 56, 69]), list([3, 4])], dtype=object)
In the first case, each argument is a (2,2) array (or rather a list that can be made into one). For each True in the condition, it returns the corresponding element of x, giving [[1, -], [3, 4]], and for each False, the corresponding element of y, giving [[-, 8], [-, -]].
In the second case, the lists are ragged:
In [1]: [[True, False,True], [False, True]]
Out[1]: [[True, False, True], [False, True]]
In [2]: np.array([[True, False,True], [False, True]])
Out[2]: array([list([True, False, True]), list([False, True])], dtype=object)
The array has shape (2,) and holds 2 lists. When cast to boolean, it becomes a 2-element array with both elements True; only an empty list would produce False.
In [3]: _.astype(bool)
Out[3]: array([ True, True])
So where just returns the x values.
This second case is understandable, but pathological.
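A sketch that reproduces the ragged case on a recent NumPy. Since NumPy 1.24, building an array from ragged nested lists without dtype=object raises ValueError, so dtype=object is passed explicitly here (an adaptation, not what the question ran). The condition is cast with astype(bool) to make the truthiness step visible.

```python
import numpy as np

# dtype=object is required on NumPy >= 1.24 for ragged nested lists
cnd = np.array([[True, False, True], [False, True]], dtype=object)
x = np.array([[1, 2, 56], [3, 4]], dtype=object)
y = np.array([[9, 8, 79], [7, 6]], dtype=object)

print(cnd.shape)            # (2,): two list elements, not a 2-D grid
print(cnd.astype(bool))     # [ True  True]: any non-empty list is truthy
res = np.where(cnd.astype(bool), x, y)
print(res.tolist())         # [[1, 2, 56], [3, 4]]: everything comes from x

# Only an empty list is falsy
print(np.array([[1, 2], []], dtype=object).astype(bool))  # [ True False]
```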
More details
Let's demonstrate where in more detail, with a simpler case. Same condition array:
In [57]: condition = np.array([[True, False], [True, True]])
In [58]: condition
Out[58]:
array([[ True, False],
[ True, True]])
The single-argument version, which is equivalent to condition.nonzero():
In [59]: np.where(condition)
Out[59]: (array([0, 1, 1]), array([0, 0, 1]))
Some find it easier to visualize the transpose of that tuple - the 3 pairs of coordinates where condition is True:
In [60]: np.argwhere(condition)
Out[60]:
array([[0, 0],
[1, 0],
[1, 1]])
Now the simplest 3-argument version, with scalar values:
In [61]: np.where(condition, True, False) # same as condition
Out[61]:
array([[ True, False],
[ True, True]])
In [62]: np.where(condition, 100, 200)
Out[62]:
array([[100, 200],
[100, 100]])
A good way of visualizing this action is with two masked assignments.
In [63]: res = np.zeros(condition.shape, int)
In [64]: res[condition] = 100
In [65]: res[~condition] = 200
In [66]: res
Out[66]:
array([[100, 200],
[100, 100]])
Another way to do this is to initialize an array with the y value(s), and then use the indices returned by where to fill in the x value.
In [69]: res = np.full(condition.shape, 200)
In [70]: res
Out[70]:
array([[200, 200],
[200, 200]])
In [71]: res[np.where(condition)] = 100
In [72]: res
Out[72]:
array([[100, 200],
[100, 100]])
If x and y are arrays, not scalars, this masked assignment will require refinements, but hopefully for a start this will help.
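One way to make that refinement, sketched under the assumption that np.where's behavior can be modeled as "broadcast, then two masked assignments": broadcast condition, x and y to a common shape, then index x and y with the same mask as the result. The function name where_by_masks is hypothetical.

```python
import numpy as np

def where_by_masks(condition, x, y):
    """Mimic np.where(condition, x, y) with two masked assignments."""
    # Broadcast all three inputs to one common shape, as np.where does
    cond, xb, yb = np.broadcast_arrays(np.asarray(condition, dtype=bool),
                                       np.asarray(x), np.asarray(y))
    res = np.empty(cond.shape, dtype=np.result_type(xb, yb))
    res[cond] = xb[cond]      # x values where the condition holds
    res[~cond] = yb[~cond]    # y values everywhere else
    return res

condition = np.array([[True, False], [True, True]])
x = np.array([[1, 2], [3, 4]])
y = np.array([[9, 8], [7, 6]])
print(where_by_masks(condition, x, y))      # [[1 8]
                                            #  [3 4]]
print(where_by_masks(condition, 100, 200))  # [[100 200]
                                            #  [100 100]]
```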
np.where(condition,x,y)
It checks the condition element-wise: where it is True it returns the element of x, otherwise the element of y.
np.where([[True, False], [True, True]],
[[1, 2], [3, 4]],
[[9, 8], [7, 6]])
Here your condition is [[True, False], [True, True]],
x = [[1 , 2] , [3 , 4]]
y = [[9 , 8] , [7 , 6]]
The first condition element is True, so it returns 1 (from x) instead of 9 (from y).
The second is False, so it returns 8 (from y) instead of 2 (from x).
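The element-by-element rule can be written out as a plain-Python model, assuming condition, x and y are nested lists of the same 2-D shape (no broadcasting). The name where_2d is hypothetical; it only illustrates the selection rule.

```python
def where_2d(condition, x, y):
    """Pure-Python model of np.where for same-shaped 2-D nested lists."""
    return [[xi if ci else yi for ci, xi, yi in zip(crow, xrow, yrow)]
            for crow, xrow, yrow in zip(condition, x, y)]

print(where_2d([[True, False], [True, True]],
               [[1, 2], [3, 4]],
               [[9, 8], [7, 6]]))  # [[1, 8], [3, 4]]
```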
After reading about broadcasting, as @hpaulj suggested, I think I know how the function works.
It tries to broadcast the 3 arrays; if the broadcast succeeds, it uses the True and False values to pick elements from either x or y.
In the example
>>>np.where([[True, False,True], [False, True]], [[1, 2,56], [3, 4]], [[9, 8,79], [7, 6]])
We have
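cnd=np.array([[True, False,True], [False, True]], dtype=object)  # dtype=object is required for ragged lists on NumPy >= 1.24
x=np.array([[1, 2,56], [3, 4]], dtype=object)
y=np.array([[9, 8,79], [7, 6]], dtype=object)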
Now
>>>x.shape
Out[7]: (2,)
>>>y.shape
Out[8]: (2,)
>>>cnd.shape
Out[9]: (2,)
So all three are just arrays with 2 elements (of type list), even the condition (cnd). Both [True, False, True] and [False, True] are non-empty lists, so both evaluate as True, and both elements are selected from x.
>>>np.where([[True, False,True], [False, True]], [[1, 2,56], [3, 4]], [[9, 8,79], [7, 6]])
Out[10]: array([list([1, 2, 56]), list([3, 4])], dtype=object)
I also tried a more complex example (a 2x2x2 broadcast), and the same explanation holds.
np.where([[[True,False],[True,True]], [[False,False],[True,False]]],
[[[12,45],[10,50]], [[100,10],[17,81]]],
[[[90,93],[85,13]], [[12,345], [190,56,34]]])
Where
cnd=np.array([[[True,False],[True,True]], [[False,False],[True,False]]])
x=np.array([[[12,45],[10,50]], [[100,10],[17,81]]])
y=np.array( [[[90,93],[85,13]], [[12,345], [190,56,34]]], dtype=object)
Here cnd and x have the shape (2,2,2) and y has the shape (2,2).
>>>cnd.shape
Out[14]: (2, 2, 2)
>>>x.shape
Out[15]: (2, 2, 2)
>>>y.shape
Out[16]: (2, 2)
Now, as @hpaulj commented, y will be broadcast to (2,2,2).
It looks like this:
>>>cnd
Out[6]:
array([[[ True, False],
[ True, True]],
[[False, False],
[ True, False]]])
>>>x
Out[7]:
array([[[ 12, 45],
[ 10, 50]],
[[100, 10],
[ 17, 81]]])
>>>np.broadcast_to(y,(2,2,2))
Out[8]:
array([[[list([90, 93]), list([85, 13])],
[list([12, 345]), list([190, 56, 34])]],
[[list([90, 93]), list([85, 13])],
[list([12, 345]), list([190, 56, 34])]]], dtype=object)
And the result can be easily predicted to be
>>>np.where([[[True,False],[True,True]], [[False,False],[True,False]]], [[[12,45],[10,50]], [[100,10],[17,81]]],[[[90,93],[85,13]], [[12,345], [190,56,34]]])
Out[9]:
array([[[12, list([85, 13])],
[10, 50]],
[[list([90, 93]), list([85, 13])],
[17, list([190, 56, 34])]]], dtype=object)
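The same rule with a regular numeric y makes the broadcast easier to read. This sketch uses a hypothetical y of shape (2,2) (the first half of the question's y, without the ragged [190,56,34] row); it is repeated along the leading axis to match cnd and x.

```python
import numpy as np

cnd = np.array([[[True, False], [True, True]], [[False, False], [True, False]]])
x = np.array([[[12, 45], [10, 50]], [[100, 10], [17, 81]]])
y = np.array([[90, 93], [85, 13]])   # hypothetical y, shape (2, 2)

res = np.where(cnd, x, y)
# Same as broadcasting y to the full (2, 2, 2) shape explicitly first
assert (res == np.where(cnd, x, np.broadcast_to(y, cnd.shape))).all()
print(res.tolist())  # [[[12, 93], [10, 50]], [[90, 93], [17, 13]]]
```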
Using numpy, I have a matrix called points.
points
=> matrix([[0, 2],
[0, 0],
[1, 3],
[4, 6],
[0, 7],
[0, 3]])
If I have the tuple (1, 3), I want to find the row in points that matches these numbers (in this case, the row index is 2).
I tried using np.where:
np.where(points == (1, 3))
=> (array([2, 2, 5]), array([0, 1, 1]))
What is the meaning of this output? Can it be used to find the row where (1, 3) occurs?
You just need to look for ALL matches along each row, like so:
np.where((a==(1,3)).all(axis=1))[0]
Steps involved, using the given sample:
In [17]: a # Input matrix
Out[17]:
matrix([[0, 2],
[0, 0],
[1, 3],
[4, 6],
[0, 7],
[0, 3]])
In [18]: (a==(1,3)) # Matrix of broadcasted matches
Out[18]:
matrix([[False, False],
[False, False],
[ True, True],
[False, False],
[False, False],
[False, True]], dtype=bool)
In [19]: (a==(1,3)).all(axis=1) # Look for ALL matches along each row
Out[19]:
matrix([[False],
[False],
[ True],
[False],
[False],
[False]], dtype=bool)
In [20]: np.where((a==(1,3)).all(1))[0] # Use np.where to get row indices
Out[20]: array([2])
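As for what the question's np.where(points == (1, 3)) output means: it is a (row indices, column indices) tuple listing every element-wise match. Pairing the two arrays makes that visible. This sketch uses a plain ndarray instead of np.matrix (NumPy no longer recommends the matrix class) and np.flatnonzero as a shortcut for np.where(...)[0] on the 1-D row test.

```python
import numpy as np

points = np.array([[0, 2], [0, 0], [1, 3], [4, 6], [0, 7], [0, 3]])
target = (1, 3)

matches = points == target            # element-wise, broadcast over rows
rows, cols = np.where(matches)
print(list(zip(rows.tolist(), cols.tolist())))  # [(2, 0), (2, 1), (5, 1)]

# Row 5 matches only in column 1, so require every column to match
row_idx = np.flatnonzero(matches.all(axis=1))
print(row_idx)  # [2]
```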
How do I get the masked data only without flattening the data into a 1D array? That is, suppose I have a numpy array
a = np.array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
and I mask all elements greater than 1,
b = ma.masked_greater(a, 1)
masked_array(data =
[[0 1 -- --]
[0 1 -- --]
[0 1 -- --]],
mask =
[[False False True True]
[False False True True]
[False False True True]],
fill_value = 999999)
How do I get only the masked elements without flattening the output? That is, I need to get
array([[ 2, 3],
[2, 3],
[2, 3]])
Let's try an example that produces a ragged result, with a different number of 'masked' values in each row.
In [292]: a=np.arange(12).reshape(3,4)
In [293]: a
Out[293]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [294]: a<6
Out[294]:
array([[ True, True, True, True],
[ True, True, False, False],
[False, False, False, False]], dtype=bool)
Here is the flattened list of values that match this condition. Boolean indexing can't return a regular 2d array here, so it falls back to a flattened array.
In [295]: a[a<6]
Out[295]: array([0, 1, 2, 3, 4, 5])
Do the same thing, but iterating row by row:
In [296]: [a1[a1<6] for a1 in a]
Out[296]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)]
Trying to make an array of the result produces an object type array, which is little more than a list in an array wrapper:
In [297]: np.array([a1[a1<6] for a1 in a], dtype=object)
Out[297]: array([array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)], dtype=object)
The fact that the result is ragged is a good indicator that it is difficult, if not impossible, to perform that action with one vectorized operation.
Here's another way of producing the list of arrays. With sum I find how many elements there are in each row, and then use this to split the flattened array into sublists.
In [320]: idx=(a<6).sum(1).cumsum()[:-1]
In [321]: idx
Out[321]: array([4, 6], dtype=int32)
In [322]: np.split(a[a<6], idx)
Out[322]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=float64)]
It does use 'flattening', and for these small examples it is slower than the row iteration. (Don't worry about the empty float array; split had to construct something and used a default dtype.)
A different mask, without empty rows, clearly shows the equivalence of the 2 approaches:
In [344]: mask=np.tri(3,4,dtype=bool) # lower tri
In [345]: mask
Out[345]:
array([[ True, False, False, False],
[ True, True, False, False],
[ True, True, True, False]], dtype=bool)
In [346]: idx=mask.sum(1).cumsum()[:-1]
In [347]: idx
Out[347]: array([1, 3], dtype=int32)
In [348]: [a1[m] for a1,m in zip(a,mask)]
Out[348]: [array([0]), array([4, 5]), array([ 8, 9, 10])]
In [349]: np.split(a[mask],idx)
Out[349]: [array([0]), array([4, 5]), array([ 8, 9, 10])]
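When every row has the same number of selected elements, as in the question's own example, the flattened selection can simply be reshaped back to 2-D; otherwise the split approach above is needed. A sketch combining both, using numpy.ma.getmaskarray to recover the boolean mask from the masked array:

```python
import numpy as np
import numpy.ma as ma

a = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3],
              [0, 1, 2, 3]])
b = ma.masked_greater(a, 1)

mask = ma.getmaskarray(b)        # True where an element was masked (a > 1)
counts = mask.sum(axis=1)        # masked elements per row
if (counts == counts[0]).all():
    # Equal counts: the flattened selection reshapes cleanly back to 2-D
    result = a[mask].reshape(len(a), counts[0])
else:
    # Ragged: fall back to a list of per-row arrays
    result = np.split(a[mask], counts.cumsum()[:-1])
print(result)  # [[2 3]
               #  [2 3]
               #  [2 3]]
```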
Zip the two lists together, then filter:
data = [[0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]
mask = [[False, False, True, True],
        [False, False, True, True],
        [False, False, True, True]]
zipped = zip(data, mask)  # pairs each data row with its mask row
masked = []
for lst, row_mask in zipped:
    pairs = zip(lst, row_mask)  # [(0, False), (1, False), (1, True), (1, True)]
    masked.append([num for num, b in pairs if b])
print(masked)  # [[1, 1], [1, 1], [1, 1]]
or more succinctly:
masked = [[num for num, b in zip(lst, row_mask) if b] for lst, row_mask in zip(data, mask)]
print(masked)  # [[1, 1], [1, 1], [1, 1]]
Thanks to vectorization in NumPy, you can use np.where to select items from the first array and use None (or some arbitrary value) to mark the places where a value has been masked out. Note that None forces a less compact object-dtype array, so you may prefer -1 or some other special value.
import numpy as np
a = np.array([
[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
mask = np.array([[ True, True, True, True],
[ True, False, True, True],
[False, True, True, False]])
np.where(mask, a, None)
This produces
array([[0, 1, 2, 3],
[0, None, 2, 3],
[None, 1, 2, None]], dtype=object)
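Following the suggestion to use a special value instead of None, here is a sketch with -1 as the sentinel (an assumption: -1 must never occur in the real data). It also shows a numpy.ma masked array as an alternative that keeps data and mask together; note that in numpy.ma a True mask means "invalid", hence ~mask.

```python
import numpy as np

a = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3],
              [0, 1, 2, 3]])
mask = np.array([[ True,  True,  True,  True],
                 [ True, False,  True,  True],
                 [False,  True,  True, False]])

filled = np.where(mask, a, -1)   # stays an integer array; -1 marks dropped values
print(filled)  # [[ 0  1  2  3]
               #  [ 0 -1  2  3]
               #  [-1  1  2 -1]]

# Masked-array alternative: numpy.ma masks the entries where its mask is True
m = np.ma.masked_array(a, mask=~mask)
assert (m.filled(-1) == filled).all()
```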