I want to know where array a is equal to any of the values in array b.
For example,
import numpy as np
a = np.random.randint(0,16, size=(3,4))
b = np.array([2,3,9])
# like this, but for any size b:
locations = np.nonzero((a==b[0]) | (a==b[1]) | (a==b[2]))
The reason is so I can change the values in a from (any of b) to another value:
a[locations] = 99
Or-ing the equality checks is not a great solution, because I would like to do this without knowing the size of b ahead of time. Is there an array solution?
[edit]
There are now two good answers to this question, one using broadcasting with extra dimensions and another using np.in1d. Both work for the specific case in this question. I ended up using np.isin instead, since it seems more agnostic to the shapes of both a and b (it returns a mask with the same shape as a, so no reshape is needed).
I accepted the answer that taught me about in1d since that led me to my preferred solution.
You can use np.in1d then reshape back to a's shape so you can set the values in a to your special flag.
import numpy as np
np.random.seed(410012)
a = np.random.randint(0, 16, size=(3, 4))
#array([[ 8, 5, 5, 15],
# [ 3, 13, 8, 10],
# [ 3, 11, 0, 10]])
b = np.array([[2,3,9], [4,5,6]])
a[np.in1d(a, b).reshape(a.shape)] = 999
#array([[ 8, 999, 999, 15],
# [999, 13, 8, 10],
# [999, 11, 0, 10]])
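For comparison, here is a minimal sketch of the same replacement using np.isin, which returns a mask with the same shape as a, so the reshape step disappears. np.in1d is deprecated as of NumPy 2.0 in favor of np.isin, so the sketch avoids it. The values of a are copied from the output above rather than regenerated from the seed.

```python
import numpy as np

# The array printed above, hardcoded so the sketch does not depend on the seed.
a = np.array([[8, 5, 5, 15],
              [3, 13, 8, 10],
              [3, 11, 0, 10]])
b = np.array([[2, 3, 9], [4, 5, 6]])

# np.isin checks every element of a against all elements of b
# (b is treated as a flat collection) and returns a mask shaped like a.
mask = np.isin(a, b)
a[mask] = 999
print(a)
```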
Or-ing the equality checks is not a great solution, because I would like to do this without knowing the size of b ahead of time.
EDIT:
Vectorized equivalent to the code you have written above -
import numpy as np
a = np.random.randint(0,16, size=(3,4))
b = np.array([2,3,9])
locations = np.nonzero((a==b[0]) | (a==b[1]) | (a==b[2]))
locations2 = np.nonzero((a[None,:,:]==b[:,None,None]).any(0))
np.allclose(locations, locations2)
True
This shows that the output is exactly the same as yours, without the need to spell out b[0], b[1], ... or to use a for loop.
Explanation -
Broadcasting can help here. You want to compare each element of the (3,4) matrix a with each value in b, which has shape (3,). The resulting boolean array is therefore three (3,4) matrices, i.e. an array of shape (3,3,4).
Once you have that, take an element-wise ANY (logical OR) across the three (3,4) matrices. That reduces the (3,3,4) array to a (3,4) one.
Finally, use np.nonzero to find the locations where the values are True.
The above 3 steps can be done as follows -
Broadcasting comparison operation:
a[None,:,:]==b[:,None,None] #(1,3,4) == (3,1,1) -> (3,3,4)
Reduction using OR logic:
(a[None,:,:]==b[:,None,None]).any(0) #(3,3,4) -> (3,4)
Get non-zero locations:
np.nonzero((a[None,:,:]==b[:,None,None]).any(0))
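The three steps above assume a is 2-D, because b[:,None,None] adds exactly two axes. A hedged sketch of the same idea that appends the comparison axis at the end instead, so it should work for an a with any number of dimensions and a 1-D b. The function name mask_equal_any is made up for illustration, and the sample a is copied from the np.isin session below.

```python
import numpy as np

def mask_equal_any(a, b):
    """Boolean mask, same shape as a, True where a equals any value of b."""
    b = np.asarray(b).ravel()
    # a[..., None] has shape a.shape + (1,); comparing it with b (shape (k,))
    # broadcasts to a.shape + (k,), one slice per value of b.
    eq = a[..., None] == b
    # OR-reduce over the trailing b axis to get back to a.shape.
    return eq.any(axis=-1)

a = np.array([[12, 2, 15, 11],
              [12, 15, 5, 10],
              [4, 2, 14, 7]])
b = np.array([2, 4, 5, 12])
mask = mask_equal_any(a, b)
print(mask.astype(int))
print(np.nonzero(mask))
```

Putting the new axis last keeps a's own axes untouched, so the final reduction always targets axis=-1 regardless of a.ndim.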
numpy.isin works on multi-dimensional a and b.
In [1]: import numpy as np
In [2]: a = np.random.randint(0, 16, size=(3, 4)); a
Out[2]:
array([[12, 2, 15, 11],
[12, 15, 5, 10],
[ 4, 2, 14, 7]])
In [3]: b = [2, 4, 5, 12]
In [4]: c = [[2, 4], [5, 12]]
In [5]: np.isin(a, b).astype(int)
Out[5]:
array([[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 0, 0]])
In [6]: np.isin(a, c).astype(int)
Out[6]:
array([[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 0, 0]])
In [7]: a[np.isin(a, b)] = 99; a
Out[7]:
array([[99, 99, 15, 11],
[99, 15, 99, 10],
[99, 99, 14, 7]])
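If you would rather not modify a in place, one possible variant is to combine the np.isin mask with np.where, which builds a new array. np.isin also accepts invert=True to get the complementary mask. The array below is copied from the session above.

```python
import numpy as np

a = np.array([[12, 2, 15, 11],
              [12, 15, 5, 10],
              [4, 2, 14, 7]])
b = [2, 4, 5, 12]

# New array: 99 wherever a is in b, the original value elsewhere; a is untouched.
replaced = np.where(np.isin(a, b), 99, a)

# invert=True marks the elements that are NOT in b.
not_in_b = np.isin(a, b, invert=True)

print(replaced)
print(not_in_b.astype(int))
```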
I'm iterating over the rows of a numpy array, and for each element I want to store, in another array, the index of the lowest value within the cluster of 3 elements around it. This should be in the context of left, middle, right: the left and right edges only look at two values ('left and middle' or 'middle and right'), but every element in between should look at all 3.
For loops do this trivially, but it's very slow. Some kind of numpy vectorization would probably speed this up.
For example:
[1 18 3 6 2]
# should give the indices...
[0 0 2 4 4] # matching values 1 1 3 2 2
Slow for loop of an implementation:
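import numpy as np
array = np.array([[1, 18, 3, 6, 2]])   # sample input (a single row)
height, width = array.shape
other_array = np.empty_like(array)
for y in range(height):
    for x in range(width):
        i = 0 if x == 0 else x - 1     # window starts one to the left, clamped at the edge
        other_array[y,x] = np.argmin(array[y,i:x+2]) + i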
NOTE: See update below for a solution with no for loops.
This works for an array of any number of dimensions:
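import numpy as np

def window_argmin(arr):
    # Pad only the last axis, one element on each side, with max+1
    padded = np.pad(
        arr,
        [(0,)] * (arr.ndim-1) + [(1,)],
        'constant',
        constant_values=np.max(arr)+1,
    )
    # Stack the 3-element windows along a new second-to-last axis
    slices = np.concatenate(
        [
            padded[..., np.newaxis, i:i+3]
            for i in range(arr.shape[-1])
        ],
        axis=-2,
    )
    # Position of the minimum inside each window, shifted by the window start
    return (
        np.argmin(slices, axis=-1) +
        np.arange(-1, arr.shape[-1]-1)
    )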
The code uses np.pad to pad the last dimension of the array with one extra element on the left and one on the right, so we can always use windows of 3 elements for the argmin. It sets the extra elements to max+1 so they'll never be picked by argmin.
Then it uses an np.concatenate of a list of slices to add a new dimension with each of 3-element windows. This is the only place we're using a for loop and we're only looping over the last dimension, once, to create the separate 3-element windows. (See update below for a solution that removes this for loop.)
Finally, we call np.argmin on each of the windows.
np.argmin returns a position (0, 1 or 2) inside each window, so we need to convert these into indices of the original array. We do that by adding the offset of the first element of each window (which is -1 for the first window, since its first element is padding). The adjustment is a simple addition of an arange array, which broadcasts over all the leading dimensions.
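To make the padding and offset steps concrete, here is a small hedged walk-through on the 1-D sample array. It builds the windows with a plain list comprehension for readability rather than speed, and the intermediate names are illustrative only.

```python
import numpy as np

x = np.array([1, 18, 3, 6, 2])

# Step 1: pad one element on each side with a value larger than any entry,
# so the padding can never be the minimum of a window.
padded = np.pad(x, 1, mode="constant", constant_values=x.max() + 1)
# padded -> [19, 1, 18, 3, 6, 2, 19]

# Step 2: one 3-element window per original position.
windows = np.array([padded[i:i + 3] for i in range(x.size)])

# Step 3: argmin gives a position 0, 1 or 2 inside each window.
local = windows.argmin(axis=1)

# Step 4: window i starts at original index i - 1, so add that offset.
result = local + np.arange(-1, x.size - 1)
print(result)  # [0 0 2 4 4]
```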
Here's a test with your sample array:
>>> x = np.array([1, 18, 3, 6, 2])
>>> window_argmin(x)
array([0, 0, 2, 4, 4])
And a 3d example:
>>> z
array([[[ 1, 18, 3, 6, 2],
[ 1, 2, 3, 4, 5],
[ 3, 6, 19, 19, 7]],
[[ 1, 18, 3, 6, 2],
[99, 4, 4, 67, 2],
[ 9, 8, 7, 6, 3]]])
>>> window_argmin(z)
array([[[0, 0, 2, 4, 4],
[0, 0, 1, 2, 3],
[0, 0, 1, 4, 4]],
[[0, 0, 2, 4, 4],
[1, 1, 1, 4, 4],
[1, 2, 3, 4, 4]]])
UPDATE: Here's a version using stride_tricks that doesn't use any for loops:
import numpy as np

def window_argmin(arr):
    padded = np.pad(
        arr,
        [(0,)] * (arr.ndim-1) + [(1,)],
        'constant',
        constant_values=np.max(arr)+1,
    )
    # View of shape arr.shape + (3,): window j on the last axis starts at padded[..., j]
    slices = np.lib.stride_tricks.as_strided(
        padded,
        shape=arr.shape + (3,),
        strides=padded.strides + (padded.strides[-1],),
    )
    return (
        np.argmin(slices, axis=-1) +
        np.arange(-1, arr.shape[-1]-1)
    )
What helped me come up with the stride tricks solution was this numpy issue asking to add a sliding window function, linking to an example implementation of it, so I just adapted it for this specific case. It's still pretty much magic to me, but it works. 😁
Tested and works as expected for arrays of different numbers of dimensions.
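On NumPy 1.20 or newer, np.lib.stride_tricks.sliding_window_view wraps the same strided-view idea in a read-only API that does not require computing strides by hand. A minimal sketch of the same function on top of it, assuming that NumPy version is available (window_argmin_swv is a hypothetical name):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def window_argmin_swv(arr):
    """Index of the minimum over each left/middle/right window of the last axis."""
    arr = np.asarray(arr)
    # Pad only the last axis with max+1 so the padding never wins the argmin.
    pad_width = [(0, 0)] * (arr.ndim - 1) + [(1, 1)]
    padded = np.pad(arr, pad_width, mode="constant",
                    constant_values=arr.max() + 1)
    # Read-only view of shape arr.shape + (3,): one window per position.
    windows = sliding_window_view(padded, 3, axis=-1)
    # Position inside the window, shifted to an index along the original axis.
    return windows.argmin(axis=-1) + np.arange(-1, arr.shape[-1] - 1)

print(window_argmin_swv(np.array([1, 18, 3, 6, 2])))
```

Unlike as_strided, sliding_window_view checks the window size against the array shape, so a wrong shape raises an error instead of silently reading memory outside the array.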
import numpy as np
array = [1, 18, 3, 6, 2]
array.insert(0, np.max(array) + 1) # prepend a sentinel larger than every element (shifts the array right by one)
# [19, 1, 18, 3, 6, 2]
other_array = [ np.argmin(array[i-1:i+2]) + i - 2 for i in range(1, len(array)) ]
array.pop(0) # remove the sentinel to restore the original array
# [1, 18, 3, 6, 2]
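A variant of the same list-based idea that leaves the input list untouched by working on a padded copy. The function name is illustrative. Slicing past the end of a list simply truncates, which is why only the left side needs a sentinel.

```python
import numpy as np

def window_argmin_list(values):
    # Copy with a sentinel larger than every element prepended, so position
    # i of the original list sits at position i + 1 in `padded`.
    padded = [max(values) + 1] + list(values)
    # The window for original index i covers original i-1..i+1, i.e. padded[i:i+3];
    # slicing past the end just truncates, so the right edge needs no sentinel.
    return [int(np.argmin(padded[i:i + 3])) + i - 1 for i in range(len(values))]

array = [1, 18, 3, 6, 2]
print(window_argmin_list(array))
print(array)
```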
I have the following function, which applies the histogram intersection kernel to 2 arrays:
def histogram_intersection_kernel(X, Y):
    k = np.array([])
    for x_i,y_i in zip(X,Y):
        k = np.append(k,np.minimum(x_i,y_i))
    return np.sum(k)
Now, let's say I have the following matrix "mat":
[[1,0,0,2,3],
[2,3,4,0,1],
[3,3,5,0,1]]
I would like to find an efficient way to get the matrix that is the result of applying "histogram_intersection_kernel" to all of the combinations of rows in mat. In this example it would be:
[[6,2,2],
[2,10,10],
[2,10,12]]
Extend dimensions to 3D and leverage broadcasting -
np.minimum(a[:,None,:],a[None,:,:]).sum(axis=2)
Or simply -
np.minimum(a[:,None],a).sum(2)
Sample run -
In [248]: a
Out[248]:
array([[1, 0, 0, 2, 3],
[2, 3, 4, 0, 1],
[3, 3, 5, 0, 1]])
In [249]: np.minimum(a[:,None],a).sum(2)
Out[249]:
array([[ 6, 2, 2],
[ 2, 10, 10],
[ 2, 10, 12]])
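A quick way to sanity-check the broadcast version is to compare it with a direct pairwise evaluation of the kernel. In the sketch below, hist_intersection is a simplified single-pair kernel, equivalent to the question's function but without the np.append loop. Note that the broadcast creates an intermediate array of shape (n_rows, n_rows, n_cols), so memory grows quadratically with the number of rows.

```python
import numpy as np

def hist_intersection(x, y):
    # Sum of the element-wise minima of two 1-D histograms.
    return np.minimum(x, y).sum()

a = np.array([[1, 0, 0, 2, 3],
              [2, 3, 4, 0, 1],
              [3, 3, 5, 0, 1]])

# Broadcast version: (n, 1, d) vs (n, d) -> (n, n, d), then sum over d.
K = np.minimum(a[:, None], a).sum(2)

# Reference: evaluate the kernel for every pair of rows explicitly.
K_ref = np.array([[hist_intersection(x, y) for y in a] for x in a])

print(K)
print(np.array_equal(K, K_ref))
```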
I have a 2D array and a 1D array. I am looking to find the two rows that contain each of the values from the 1D array at least once, as follows:
import numpy as np
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
results = []
for elem in B:
    results.append(np.where(A==elem)[0])
This works and results in the following array:
[array([0, 5], dtype=int64),
array([0, 3], dtype=int64),
array([2, 4], dtype=int64),
array([0, 2], dtype=int64)]
But this is probably not the best way of proceeding. Following the answers given in this question (Search Numpy array with multiple values), I tried the following solutions:
out1 = np.where(np.in1d(A, B))
num_arr = np.sort(B)
idx = np.searchsorted(B, A)
idx[idx==len(num_arr)] = 0
out2 = A[A == num_arr[idx]]
But these give me incorrect values:
In [36]: out1
Out[36]: (array([ 0, 1, 2, 6, 8, 9, 13, 17], dtype=int64),)
In [37]: out2
Out[37]: array([0, 3, 1, 2, 3, 1, 2, 0])
Thanks for your help
If you need to know whether each row of A contains ANY element of B, regardless of which element of B it is, you can use:
np.isin(A, B).any(axis=1)
which gives:
array([ True, False, True, True, True, True])
Since you're dealing with a 2D array*, you can use broadcasting to compare B with the raveled version of A. This gives you the matching indices in the raveled (flat) array. You can then convert them back to indices in the original array using np.unravel_index.
In [50]: d = np.where(B[:, None] == A.ravel())[1]
In [51]: np.unravel_index(d, A.shape)
Out[51]: (array([0, 5, 0, 3, 2, 4, 0, 2]), array([0, 2, 2, 0, 0, 1, 1, 2]))
# the first array (row indices) is the expected result
* From the NumPy documentation: For 3-dimensional arrays this is certainly efficient in terms of lines of code, and, for small data sets, it can also be computationally efficient. For large data sets, however, the creation of the large 3-d array may result in sluggish performance.
Also, broadcasting is a powerful tool for writing short and usually intuitive code that does its computations very efficiently in C. However, there are cases when broadcasting uses unnecessarily large amounts of memory for a particular algorithm. In these cases, it is better to write the algorithm's outer loop in Python. This may also produce more readable code, as algorithms that use broadcasting tend to become more difficult to interpret as the number of dimensions in the broadcast increases.
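If you also want the per-element grouping produced by the loop in the question, one possible way (a sketch, not the only one) is to count the matches for each element of B and split the row indices at those counts. This relies on np.where returning indices in row-major order, so all matches for B[0] come first, then those for B[1], and so on.

```python
import numpy as np

A = np.array([[0, 3, 1],
              [9, 4, 6],
              [2, 7, 3],
              [1, 8, 9],
              [6, 2, 7],
              [4, 8, 0]])
B = np.array([0, 1, 2, 3])

# (len(B), A.size) boolean matrix: row k marks where A equals B[k].
eq = B[:, None] == A.ravel()
flat_idx = np.where(eq)[1]
rows, cols = np.unravel_index(flat_idx, A.shape)

# Number of matches per element of B, used to cut `rows` into one group each.
counts = eq.sum(axis=1)
groups = np.split(rows, np.cumsum(counts)[:-1])
print(groups)
```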
Is something like this what you are looking for?
import numpy as np
from itertools import combinations
A = np.array([[0, 3, 1],
[9, 4, 6],
[2, 7, 3],
[1, 8, 9],
[6, 2, 7],
[4, 8, 0]])
B = np.array([0,1,2,3])
for i in combinations(A, 2):
    if np.all(np.isin(B, np.hstack(i))):
        print(i[0], ' ', i[1])
which prints the following:
[0 3 1] [2 7 3]
[0 3 1] [6 2 7]
Note: this solution does NOT require the rows to be consecutive. Please let me know if that is required.
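The same pair search can be sketched without a Python loop over combinations, at the cost of an (n_rows, n_rows, len(B)) intermediate. This is a hedged alternative, not part of the answer above: first record which values of B each row contains, then OR the records of every pair of rows and keep each unordered pair once.

```python
import numpy as np

A = np.array([[0, 3, 1],
              [9, 4, 6],
              [2, 7, 3],
              [1, 8, 9],
              [6, 2, 7],
              [4, 8, 0]])
B = np.array([0, 1, 2, 3])

# M[r, k] is True when row r of A contains B[k].
M = (A[:, :, None] == B).any(axis=1)
# covers[i, j] is True when rows i and j together contain every value of B.
covers = (M[:, None, :] | M[None, :, :]).all(axis=-1)
# Keep each unordered pair once (i < j), like itertools.combinations.
i, j = np.nonzero(np.triu(covers, k=1))
for r1, r2 in zip(i, j):
    print(A[r1], ' ', A[r2])
```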
I am trying to extract the four corner elements of a NumPy 2D array:
import numpy as np
data = np.arange(16).reshape((4, -1))
#array([[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11],
# [12, 13, 14, 15]])
The expected output is either [[0,3],[12,15]] or [0,3,12,15] (anything goes). True 2D fancy indexing delivers only the ends of the main diagonal:
data[[0,-1],[0,-1]]
#array([ 0, 15])
Pseudo-2D fancy indexing (first row-wise, then column-wise) delivers the right answer, but looks awkward:
data[[0,-1]][:,[0,-1]]
#array([[ 0, 3],
# [12, 15]])
Is there a way to use true fancy indexing, such as data[XXX,YYY], where XXX and YYY are lists/arrays/slices, to extract all four corners?
You can do:
data[[0, 0, -1, -1], [0, -1, 0, -1]]
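This works because fancy indexing with two equal-length index lists picks the elements (row[k], col[k]) pairwise, so listing every row/column combination explicitly yields the four corners. A quick check, with an optional reshape to get the 2x2 layout:

```python
import numpy as np

data = np.arange(16).reshape((4, -1))

# Index pairs (0, 0), (0, -1), (-1, 0), (-1, -1): the four corners.
corners = data[[0, 0, -1, -1], [0, -1, 0, -1]]
print(corners)
print(corners.reshape(2, 2))
```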
Here are two possibilities (OK, the first one isn't actually fancy indexing):
>>> a = np.arange(9).reshape(3, 3)
>>>
>>> m, n = a.shape
>>> a[::m-1, ::n-1]
array([[0, 2],
[6, 8]])
>>>
>>> a[np.ix_((0,-1), (0,-1))]
array([[0, 2],
[6, 8]])
More explicitly:
>>> idx = np.ix_((0,-1), (0,-1))
>>> idx
(array([[ 0],
[-1]]), array([[ 0, -1]]))
>>> a[idx]
array([[0, 2],
[6, 8]])
The trick is to leverage broadcasting on the indices. np.ix_ takes care of the details.
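To see what np.ix_ does, the same broadcasting can be written by hand: a column vector of row indices and a row vector of column indices broadcast to a 2x2 grid of index pairs. As a side note, the slicing version a[::m-1, ::n-1] fails for an array with a single row or column, because the step becomes 0 and NumPy raises a ValueError; the index-based versions do not have that problem.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Row indices as a column (shape (2, 1)), column indices as a row (shape (1, 2)).
rows = np.array([0, -1])[:, None]
cols = np.array([0, -1])[None, :]

# Broadcasting the two index arrays gives a (2, 2) grid of (row, col) pairs.
manual = a[rows, cols]
via_ix = a[np.ix_([0, -1], [0, -1])]
print(manual)
print(np.array_equal(manual, via_ix))
```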