Suppose we have two arrays like these:
A=np.array([[1, 4, 3, 0, 5],[6, 0, 7, 12, 11],[20, 15, 34, 45, 56]])
B=np.array([[4, 5, 6, 7]])
I want to write code that finds the indices of the elements of an array such as A whose values appear in the array B.
For example, I want the final result to be something like this:
C=[[0 1]
[0 4]
[1 0]
[1 2]]
Can anybody provide a solution or a hint?
Do you mean this?
In [375]: np.isin(A,B[0])
Out[375]:
array([[False, True, False, False, True],
[ True, False, True, False, False],
[False, False, False, False, False]])
In [376]: np.argwhere(np.isin(A,B[0]))
Out[376]:
array([[0, 1],
[0, 4],
[1, 0],
[1, 2]])
B has shape (1,4), where the leading 1 isn't necessary. That's why I used B[0], though isin (via in1d) ravels it anyway.
The where result is often more useful,
In [381]: np.where(np.isin(A,B))
Out[381]: (array([0, 0, 1, 1]), array([1, 4, 0, 2]))
though it's a bit harder to understand.
Another way to get the isin array:
In [383]: (A==B[0,:,None,None]).any(axis=0)
Out[383]:
array([[False, True, False, False, True],
[ True, False, True, False, False],
[False, False, False, False, False]])
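As a sketch, the isin and broadcast-comparison approaches above can be wrapped in small helpers and checked against each other. The names find_value_indices and find_value_indices_broadcast are hypothetical, not NumPy functions:

```python
import numpy as np

def find_value_indices(A, values):
    """Return the (row, col) pairs of A whose value appears in `values`."""
    values = np.ravel(values)        # B has shape (1, 4); flatten it
    mask = np.isin(A, values)        # boolean array with the same shape as A
    return np.argwhere(mask)         # one row per True element

def find_value_indices_broadcast(A, values):
    """Same result via broadcasting: compare A against every value at once."""
    values = np.ravel(values)
    # values[:, None, None] has shape (k, 1, 1); against A (m, n) this gives (k, m, n)
    mask = (A == values[:, None, None]).any(axis=0)
    return np.argwhere(mask)

A = np.array([[1, 4, 3, 0, 5], [6, 0, 7, 12, 11], [20, 15, 34, 45, 56]])
B = np.array([[4, 5, 6, 7]])
C = find_value_indices(A, B)
print(C)
```

The broadcast version builds a temporary boolean array of shape (len(values), *A.shape), so for many search values isin is usually the lighter option.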
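You can also use np.where() in a loop over the values of B:
import numpy as np

index = []
for num in B:            # each row of B (here just one)
    for nums in num:     # each value to look for
        x, y = np.where(A == nums)
        index.extend(zip(x, y))   # one (row, col) pair per match
print(np.array(index))
# [[0 1]
#  [0 4]
#  [1 0]
#  [1 2]]
Note that the pairs come out in the order of the values in B, not in the row-major order of A; in this example the two happen to coincide.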
With zip and np.where:
>>> list(zip(*np.where(np.in1d(A, B).reshape(A.shape))))
[(0, 1), (0, 4), (1, 0), (1, 2)]
Alternatively:
>>> np.vstack(np.where(np.isin(A,B))).transpose()
array([[0, 1],
[0, 4],
[1, 0],
[1, 2]], dtype=int64)
From the numpy docs:
>>> np.where([[True, False], [True, True]],
... [[1, 2], [3, 4]],
... [[9, 8], [7, 6]])
array([[1, 8],
[3, 4]])
Am I right in assuming that [[True, False], [True, True]] is the condition, and [[1, 2], [3, 4]] and [[9, 8], [7, 6]] are x and y respectively, per the parameters in the docs?
Then how exactly is the function choosing the elements in the following examples?
Also, why is the element type in these examples a list?
>>> np.where([[True, False,True], [False, True]], [[1, 2,56], [3, 4]], [[9, 8,79], [7, 6]])
array([list([1, 2, 56]), list([3, 4])], dtype=object)
>>> np.where([[False, False,True,True], [False, True]], [[1, 2,56,69], [3, 4]], [[9, 8,90,100], [7, 6]])
array([list([1, 2, 56, 69]), list([3, 4])], dtype=object)
In the first case (the docs example), each argument is a (2,2) array (or rather a list that can be made into such an array). For each True in the condition, it returns the corresponding element of x, [[1, -], [3, 4]], and for each False, the element of y, [[-, 8], [-, -]].
In the second case, the lists are ragged:
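A minimal sketch of that element-wise rule, reproducing the docs example and checking it against an explicit loop over every index:

```python
import numpy as np

condition = np.array([[True, False], [True, True]])
x = np.array([[1, 2], [3, 4]])
y = np.array([[9, 8], [7, 6]])

# Vectorized selection
result = np.where(condition, x, y)

# Equivalent explicit loop: take x where condition is True, else y
manual = np.empty_like(x)
for idx in np.ndindex(condition.shape):
    manual[idx] = x[idx] if condition[idx] else y[idx]

print(result)
```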
In [1]: [[True, False,True], [False, True]]
Out[1]: [[True, False, True], [False, True]]
In [2]: np.array([[True, False,True], [False, True]])
Out[2]: array([list([True, False, True]), list([False, True])], dtype=object)
The array has shape (2,) and holds 2 lists. When cast to boolean it becomes a 2-element array with both values True; only an empty list would produce False.
In [3]: _.astype(bool)
Out[3]: array([ True, True])
where then returns just the x values.
This second case is understandable, but pathological.
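Note that in NumPy 1.24 and later, building an array from ragged nested lists without dtype=object raises a ValueError, so the ragged calls above may fail instead of returning an object array. A sketch that reproduces the old behaviour explicitly, assuming we build the 1-d object arrays by hand (object_array is a hypothetical helper):

```python
import numpy as np

def object_array(items):
    """Build a 1-d object array whose elements are the given Python objects."""
    arr = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        arr[i] = item            # store each list as a single element
    return arr

cnd = object_array([[True, False, True], [False, True]])
x = object_array([[1, 2, 56], [3, 4]])
y = object_array([[9, 8, 79], [7, 6]])

# Casting to bool uses Python truthiness: any non-empty list is True
mask = cnd.astype(bool)
print(mask)                      # both True

# Every element of mask is True, so where picks both lists from x
result = np.where(mask, x, y)
print(result)
```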
More details
Let's demonstrate where in more detail with a simpler case, using the same condition array:
In [57]: condition = np.array([[True, False], [True, True]])
In [58]: condition
Out[58]:
array([[ True, False],
[ True, True]])
The single-argument version, which is equivalent to condition.nonzero():
In [59]: np.where(condition)
Out[59]: (array([0, 1, 1]), array([0, 0, 1]))
Some find it easier to visualize the transpose of that tuple, i.e. the 3 pairs of coordinates where condition is True:
In [60]: np.argwhere(condition)
Out[60]:
array([[0, 0],
[1, 0],
[1, 1]])
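A short sketch confirming the relationship: argwhere is the transpose of the nonzero/where tuple, and that tuple can be used directly as an index:

```python
import numpy as np

condition = np.array([[True, False], [True, True]])

rows, cols = np.where(condition)      # single-argument form: tuple of index arrays
pairs = np.argwhere(condition)        # one (row, col) row per True element

# argwhere is the transpose of the where/nonzero tuple
same = np.array_equal(pairs, np.transpose(np.nonzero(condition)))
print(same)

# Using the tuple as an index picks out exactly the True positions
print(condition[rows, cols])
```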
Now the simplest 3-argument version, with scalar values:
In [61]: np.where(condition, True, False) # same as condition
Out[61]:
array([[ True, False],
[ True, True]])
In [62]: np.where(condition, 100, 200)
Out[62]:
array([[100, 200],
[100, 100]])
A good way to visualize this action is with two masked assignments:
In [63]: res = np.zeros(condition.shape, int)
In [64]: res[condition] = 100
In [65]: res[~condition] = 200
In [66]: res
Out[66]:
array([[100, 200],
[100, 100]])
Another way is to initialize an array with the y value(s), and then use the nonzero indices from where to fill in the x value:
In [69]: res = np.full(condition.shape, 200)
In [70]: res
Out[70]:
array([[200, 200],
[200, 200]])
In [71]: res[np.where(condition)] = 100
In [72]: res
Out[72]:
array([[100, 200],
[100, 100]])
If x and y are arrays, not scalars, this masked assignment will require refinements, but hopefully for a start this will help.
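One such refinement, sketched under the assumption that x and y either match condition's shape or broadcast to it: copy y, then index the right-hand side with the same mask so both sides select the same number of elements.

```python
import numpy as np

condition = np.array([[True, False], [True, True]])
x = np.array([[1, 2], [3, 4]])
y = np.array([[9, 8], [7, 6]])

# Start from a copy of y, then overwrite the positions where condition is True.
# With an array (not scalar) x, the right-hand side must be masked too.
res = y.copy()
res[condition] = x[condition]
print(res)

# If x only broadcasts to condition's shape (here one value per column),
# expand it first so it can be masked the same way.
x_row = np.array([10, 20])
res2 = y.copy()
res2[condition] = np.broadcast_to(x_row, condition.shape)[condition]
print(res2)
```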
np.where(condition,x,y)
It checks the condition element by element: where it is True it returns the element from x, otherwise the element from y.
np.where([[True, False], [True, True]],
[[1, 2], [3, 4]],
[[9, 8], [7, 6]])
Here your condition is [[True, False], [True, True]],
x = [[1 , 2] , [3 , 4]]
y = [[9 , 8] , [7 , 6]]
The first condition element is True, so it returns 1 (from x) instead of 9 (from y).
The second condition element is False, so it returns 8 (from y) instead of 2 (from x).
After reading about broadcasting, as @hpaulj suggested, I think I understand how the function works.
It tries to broadcast the 3 arrays together; if the broadcast succeeds, it uses the True and False values to pick elements from either x or y.
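A sketch of that broadcasting step with well-formed (non-ragged) inputs, assuming NumPy 1.20+ for np.broadcast_shapes: a (2, 3) condition, a (3,) x and a scalar y all broadcast to (2, 3), while an incompatible (2,) x makes where raise a ValueError.

```python
import numpy as np

cnd = np.array([[True, False, True],
                [False, True, False]])   # shape (2, 3)
x = np.array([10, 20, 30])               # shape (3,): repeated along the rows
y = -1                                   # scalar: broadcast everywhere

# All three broadcast to a common shape before elements are selected
common = np.broadcast_shapes(cnd.shape, x.shape, np.shape(y))
print(common)

result = np.where(cnd, x, y)
print(result)

# Shapes (2, 3) and (2,) are incompatible, so where fails
try:
    np.where(cnd, np.array([1, 2]), y)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print("broadcast_ok:", broadcast_ok)
```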
In the example
>>>np.where([[True, False,True], [False, True]], [[1, 2,56], [3, 4]], [[9, 8,79], [7, 6]])
We have
cnd=np.array([[True, False,True], [False, True]])
x=np.array([[1, 2,56], [3, 4]])
y=np.array([[9, 8,79], [7, 6]])
Now
>>>x.shape
Out[7]: (2,)
>>>y.shape
Out[8]: (2,)
>>>cnd.shape
Out[9]: (2,)
So all three are just arrays with 2 elements (of type list), even the condition (cnd). Both [True, False, True] and [False, True] are therefore evaluated as True, and both elements are selected from x.
>>>np.where([[True, False,True], [False, True]], [[1, 2,56], [3, 4]], [[9, 8,79], [7, 6]])
Out[10]: array([list([1, 2, 56]), list([3, 4])], dtype=object)
I also tried a more complex example (a 2x2x2 broadcast), and this explanation still holds.
np.where([[[True,False],[True,True]], [[False,False],[True,False]]],
[[[12,45],[10,50]], [[100,10],[17,81]]],
[[[90,93],[85,13]], [[12,345], [190,56,34]]])
Where
cnd=np.array([[[True,False],[True,True]], [[False,False],[True,False]]])
x=np.array([[[12,45],[10,50]], [[100,10],[17,81]]])
y=np.array( [[[90,93],[85,13]], [[12,345], [190,56,34]]])
Here cnd and x have the shape (2,2,2) and y has the shape (2,2).
>>>cnd.shape
Out[14]: (2, 2, 2)
>>>x.shape
Out[15]: (2, 2, 2)
>>>y.shape
Out[16]: (2, 2)
Now, as @hpaulj commented, y will be broadcast to (2,2,2).
It should look like this:
>>>cnd
Out[6]:
array([[[ True, False],
[ True, True]],
[[False, False],
[ True, False]]])
>>>x
Out[7]:
array([[[ 12, 45],
[ 10, 50]],
[[100, 10],
[ 17, 81]]])
>>>np.broadcast_to(y,(2,2,2))
Out[8]:
array([[[list([90, 93]), list([85, 13])],
[list([12, 345]), list([190, 56, 34])]],
[[list([90, 93]), list([85, 13])],
[list([12, 345]), list([190, 56, 34])]]], dtype=object)
The result can then easily be predicted:
>>>np.where([[[True,False],[True,True]], [[False,False],[True,False]]], [[[12,45],[10,50]], [[100,10],[17,81]]],[[[90,93],[85,13]], [[12,345], [190,56,34]]])
Out[9]:
array([[[12, list([85, 13])],
[10, 50]],
[[list([90, 93]), list([85, 13])],
[17, list([190, 56, 34])]]], dtype=object)
I'm trying to turn a 2x3 numpy array into a 2x2 array by removing select indexes.
I think I can do this with a mask array with true/false values.
Given
[ 1, 2, 3],
[ 4, 1, 6]
I want to remove one element from each row to give me:
[ 2, 3],
[ 4, 6]
However, this method isn't working quite as I expected:
import numpy as np
in_array = np.array([
[ 1, 2, 3],
[ 4, 1, 6]
])
mask = np.array([
[False, True, True],
[True, False, True]
])
print(in_array[mask])
Gives me:
[2 3 4 6]
Which is not what I want. Any ideas?
The only thing 'wrong' with that is the shape: 1d rather than 2d. But what if your mask was
mask = np.array([
[False, True, False],
[True, False, True]
])
1 value in the first row, 2 in the second. It couldn't return that as a 2d array, could it?
So the default behavior when masking like this is to return a 1d (raveled) result.
Boolean indexing like this is effectively indexing with the result of where:
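When every row of the mask keeps the same number of elements, as in the original question, the raveled result can be reshaped back to 2d, because boolean indexing walks the array in row-major order. A sketch; the helper name mask_rows is hypothetical, and it refuses ragged masks:

```python
import numpy as np

def mask_rows(arr, mask):
    """Apply a boolean mask row-wise, keeping a 2-d result.

    Only possible when every row keeps the same number of elements.
    """
    counts = mask.sum(axis=1)                 # number of True values per row
    if not np.all(counts == counts[0]):
        raise ValueError("rows keep different numbers of elements")
    # Boolean indexing ravels in row-major order, so reshaping
    # restores one output row per input row.
    return arr[mask].reshape(arr.shape[0], counts[0])

in_array = np.array([[1, 2, 3], [4, 1, 6]])
mask = np.array([[False, True, True], [True, False, True]])
print(mask_rows(in_array, mask))
```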
In [19]: np.where(mask)
Out[19]: (array([0, 0, 1, 1], dtype=int32), array([1, 2, 0, 2], dtype=int32))
In [20]: in_array[_]
Out[20]: array([2, 3, 4, 6])
It finds the elements of the mask which are true, and then selects the corresponding elements of the in_array.
Maybe the transpose of where is easier to visualize:
In [21]: np.argwhere(mask)
Out[21]:
array([[0, 1],
[0, 2],
[1, 0],
[1, 2]], dtype=int32)
and indexing iteratively:
In [23]: for ij in np.argwhere(mask):
...: print(in_array[tuple(ij)])
...:
2
3
4
6
Suppose I have a column vector y with length n, and I have a matrix X of size n*m. I want to check for each element i in y, whether the element is in the corresponding row in X. What is the most efficient way of doing this?
For example:
y = [1,2,3,4].T
and
X =[[1, 2, 3],[3, 4, 5],[4, 3, 2],[2, 2, 2]]
Then the output should be
[1, 0, 1, 0] or [True, False, True, False]
whichever is easier.
Of course we can use a for loop to iterate through both y and X, but is there any more efficient way of doing this?
A vectorized approach using broadcasting:
((X == y[:,None]).any(1)).astype(int)
Sample run:
In [41]: X # Input 1
Out[41]:
array([[1, 2, 3],
[3, 4, 5],
[4, 3, 2],
[2, 2, 2]])
In [42]: y # Input 2
Out[42]: array([1, 2, 3, 4])
In [43]: X == y[:,None] # Broadcasted comparison
Out[43]:
array([[ True, False, False],
[False, False, False],
[False, True, False],
[False, False, False]], dtype=bool)
In [44]: (X == y[:,None]).any(1) # Check for any match along each row
Out[44]: array([ True, False, True, False], dtype=bool)
In [45]: ((X == y[:,None]).any(1)).astype(int) # Convert to 1s and 0s
Out[45]: array([1, 0, 1, 0])
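The same one-liner wrapped as a small function and checked against the plain Python loop the question mentions. The name row_contains is hypothetical:

```python
import numpy as np

def row_contains(X, y):
    """For each i, test whether y[i] appears anywhere in row X[i]."""
    X = np.asarray(X)
    y = np.asarray(y)
    # y[:, None] has shape (n, 1) and broadcasts against X's (n, m)
    return (X == y[:, None]).any(axis=1)

X = np.array([[1, 2, 3], [3, 4, 5], [4, 3, 2], [2, 2, 2]])
y = np.array([1, 2, 3, 4])

hits = row_contains(X, y)
# Reference result from a plain Python loop over the rows
loop_hits = [bool(yi in row) for yi, row in zip(y, X)]
print(hits.astype(int))
```

The broadcast comparison allocates an (n, m) boolean temporary, the same size as X, so memory use stays proportional to the input.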
How do I get the masked data only without flattening the data into a 1D array? That is, suppose I have a numpy array
a = np.array([[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3]])
and I mask all elements greater than 1,
b = ma.masked_greater(a, 1)
masked_array(data =
[[0 1 -- --]
[0 1 -- --]
[0 1 -- --]],
mask =
[[False False True True]
[False False True True]
[False False True True]],
fill_value = 999999)
How do I get only the masked elements without flattening the output? That is, I need to get
array([[ 2, 3],
[2, 3],
[2, 3]])
Let's try an example that produces a ragged result, with a different number of 'masked' values in each row.
In [292]: a=np.arange(12).reshape(3,4)
In [293]: a
Out[293]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
In [294]: a<6
Out[294]:
array([[ True, True, True, True],
[ True, True, False, False],
[False, False, False, False]], dtype=bool)
Here is the flattened array of values that match this condition. Indexing can't return a regular 2d array, so it has to fall back to a flattened one:
In [295]: a[a<6]
Out[295]: array([0, 1, 2, 3, 4, 5])
Doing the same thing, but iterating row by row:
In [296]: [a1[a1<6] for a1 in a]
Out[296]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)]
Trying to make an array of the result produces an object type array, which is little more than a list in an array wrapper:
In [297]: np.array([a1[a1<6] for a1 in a])
Out[297]: array([array([0, 1, 2, 3]), array([4, 5]), array([], dtype=int32)], dtype=object)
The fact that the result is ragged is a good indicator that it is difficult, if not impossible, to perform that action with one vectorized operation.
Here's another way of producing the list of arrays. With sum I find how many elements there are in each row, and then use this to split the flattened array into sublists.
In [320]: idx=(a<6).sum(1).cumsum()[:-1]
In [321]: idx
Out[321]: array([4, 6], dtype=int32)
In [322]: np.split(a[a<6], idx)
Out[322]: [array([0, 1, 2, 3]), array([4, 5]), array([], dtype=float64)]
It does use 'flattening', and for these small examples it is slower than the row iteration. (Don't worry about the empty float array; split had to construct something and used a default dtype.)
A different mask, without empty rows, clearly shows the equivalence of the 2 approaches:
In [344]: mask=np.tri(3,4,dtype=bool) # lower tri
In [345]: mask
Out[345]:
array([[ True, False, False, False],
[ True, True, False, False],
[ True, True, True, False]], dtype=bool)
In [346]: idx=mask.sum(1).cumsum()[:-1]
In [347]: idx
Out[347]: array([1, 3], dtype=int32)
In [348]: [a1[m] for a1,m in zip(a,mask)]
Out[348]: [array([0]), array([4, 5]), array([ 8, 9, 10])]
In [349]: np.split(a[mask],idx)
Out[349]: [array([0]), array([4, 5]), array([ 8, 9, 10])]
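A sketch that packages both approaches as functions (the names rows_by_iteration and rows_by_split are hypothetical) and runs them on both masks from above:

```python
import numpy as np

def rows_by_iteration(a, mask):
    """Ragged result: one 1-d array per row, built with a Python loop."""
    return [row[m] for row, m in zip(a, mask)]

def rows_by_split(a, mask):
    """Same result: flatten with the mask, then split at per-row boundaries."""
    # The cumulative count of True values per row gives the split points;
    # the last boundary is the end of the array, so it is dropped.
    idx = mask.sum(axis=1).cumsum()[:-1]
    return np.split(a[mask], idx)

a = np.arange(12).reshape(3, 4)
masks = (a < 6, np.tri(3, 4, dtype=bool))
for mask in masks:
    print([r.tolist() for r in rows_by_iteration(a, mask)])
    print([r.tolist() for r in rows_by_split(a, mask)])
```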
Zip the two lists together, and then filter:
data = [[0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 1]]
mask = [[False, False, True, True],
        [False, False, True, True],
        [False, False, True, True]]
masked = []
for lst, row_mask in zip(data, mask):
    pairs = zip(lst, row_mask)  # (0, False), (1, False), (1, True), (1, True)
    masked.append([num for num, b in pairs if b])  # keep values whose mask is True
print(masked)  # [[1, 1], [1, 1], [1, 1]]
or more succinctly:
masked = [[num for num, b in zip(lst, row_mask) if b]
          for lst, row_mask in zip(data, mask)]
print(masked)  # [[1, 1], [1, 1], [1, 1]]
Thanks to vectorization in numpy, you can use np.where to select items from the first array and use None (or some arbitrary value) to mark the places where a value has been masked out. Note that None forces a less compact object-dtype representation of the array, so you may want to use -1 or some other special value instead.
import numpy as np
a = np.array([
    [0, 1, 2, 3],
    [0, 1, 2, 3],
    [0, 1, 2, 3]])
mask = np.array([[ True,  True, True,  True],
                 [ True, False, True,  True],
                 [False,  True, True, False]])
# keep a where mask is True, put None everywhere else
np.where(mask, a, None)
This produces:
array([[0, 1, 2, 3],
[0, None, 2, 3],
[None, 1, 2, None]], dtype=object)
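A minimal sketch of the -1 variant suggested above, assuming -1 never occurs in the real data. It keeps an integer dtype, and numpy.ma expresses the same idea without a sentinel (note that in numpy.ma, True in the mask means "hidden", the opposite of the selection mask used here):

```python
import numpy as np

a = np.array([[0, 1, 2, 3],
              [0, 1, 2, 3],
              [0, 1, 2, 3]])
mask = np.array([[True, True, True, True],
                 [True, False, True, True],
                 [False, True, True, False]])

SENTINEL = -1                     # assumption: -1 is not a legitimate data value
filled = np.where(mask, a, SENTINEL)
print(filled)
print(filled.dtype)               # an integer dtype, no object array

# Hide the entries that were NOT selected, instead of overwriting them
hidden = np.ma.masked_array(a, mask=~mask)
print(hidden)
```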