Numpy select elements with a condition along axis - python

I have a 2D numpy array x as:
[ [ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12],
[13, 14, 15],
[16, 17, 18],
[19, 20, 21],
[22, 23, 24],
[25, 26, 27],
[28, 29, 30],
[31, 32, 33],
[34, 35, 36],
[37, 38, 39],
[40, 41, 42],
[43, 44, 45],
[46, 47, 48],
[49, 50, 51],
[52, 53, 54],
[55, 56, 57],
[58, 59, 60] ]
I want to extract the indices of the rows in which any element is less than 25. So the output I need is [0,1,2,3,4,5,6,7], just the rows, but np.where(x < 25) gives me the 2D indices of every matching value. In other words, I want the indices of all rows of x where at least one element is less than 25, but what I am getting are the indices of all elements of x that are less than 25.
What should I do? Is there a specific function for this or should I just select the unique values from the returned list of arguments?

One way would be this:
import numpy as np
# x is your array
x1 = (x < 25).sum(axis = 1)
rows = np.where(x1 > 0)[0]
The row indices are in rows as array([0, 1, 2, 3, 4, 5, 6, 7]).
You can also use nonzero as:
rows = np.nonzero((x < 25).sum(axis = 1))[0]
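A variation on the same idea, offered as a minimal sketch: reducing the boolean mask with any(axis=1) instead of summing it states the "at least one element" condition directly. The array x is rebuilt here with np.arange so the snippet runs on its own, and the names mask and rows are only illustrative.

```python
import numpy as np

# Rebuild the 20 x 3 array from the question: 1..60, row by row
x = np.arange(1, 61).reshape(20, 3)

# True for every row that contains at least one element below 25
mask = (x < 25).any(axis=1)

# Indices of the rows where the mask is True
rows = np.flatnonzero(mask)
print(rows)  # [0 1 2 3 4 5 6 7]
```

Requiring every element of the row to be below the threshold would only need all(axis=1) in place of any(axis=1).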

Related

NumPy Array Fill Rows Downward By Indexed Sections

Let's say I have the following (fictitious) NumPy array:
arr = np.array(
[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24],
[25, 26, 27, 28],
[29, 30, 31, 32],
[33, 34, 35, 36],
[37, 38, 39, 40]
]
)
And for row indices idx = [0, 2, 3, 5, 8, 9] I'd like to repeat the values in each row downward until it reaches the next row index:
np.array(
[[1, 2, 3, 4],
[1, 2, 3, 4],
[9, 10, 11, 12],
[13, 14, 15, 16],
[13, 14, 15, 16],
[21, 22, 23, 24],
[21, 22, 23, 24],
[21, 22, 23, 24],
[33, 34, 35, 36],
[37, 38, 39, 40]
]
)
Note that idx will always be sorted and have no repeat values. While I can accomplish this by doing something like:
for start, stop in zip(idx[:-1], idx[1:]):
    for i in range(start, stop):
        arr[i] = arr[start]
# Handle last index in `idx`
start, stop = idx[-1], arr.shape[0]
for i in range(start, stop):
    arr[i] = arr[start]
Unfortunately, I have many, many arrays like this, and the loop becomes slow as the arrays get larger (in both the number of rows and the number of columns) and as idx gets longer. The final goal is to plot these as heatmaps in matplotlib, which I already know how to do. Another approach that I tried was using np.tile:
for start, stop in zip(idx[:-1], idx[1:]):
    reps = max(0, stop - start)
    arr[start:stop] = np.tile(arr[start], (reps, 1))
# Handle last index in `idx`
start, stop = idx[-1], arr.shape[0]
reps = max(0, stop - start)  # recompute for the final section
arr[start:stop] = np.tile(arr[start], (reps, 1))
But I am hoping that there's a way to get rid of the slow for-loop.
Try np.diff to find the repetition for each row, then np.repeat:
# this assumes `idx` is a standard list as in the question
np.repeat(arr[idx], np.diff(idx+[len(arr)]), axis=0)
Output:
array([[ 1, 2, 3, 4],
[ 1, 2, 3, 4],
[ 9, 10, 11, 12],
[13, 14, 15, 16],
[13, 14, 15, 16],
[21, 22, 23, 24],
[21, 22, 23, 24],
[21, 22, 23, 24],
[33, 34, 35, 36],
[37, 38, 39, 40]])
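If idx is a NumPy array rather than a list, idx+[len(arr)] would add element-wise instead of appending. A minimal sketch of the same np.diff / np.repeat idea that works for both cases is shown below; the names counts and filled are illustrative, and it assumes idx starts at 0 (otherwise the rows before idx[0] are dropped from the result).

```python
import numpy as np

arr = np.arange(1, 41).reshape(10, 4)
idx = np.array([0, 2, 3, 5, 8, 9])  # sorted, no repeats, starts at 0

# Number of rows each indexed row has to cover: the gap to the next index,
# with len(arr) appended as the end of the last section
counts = np.diff(np.append(idx, len(arr)))

# Repeat each selected row as many times as its section is long
filled = np.repeat(arr[idx], counts, axis=0)
print(filled)
```

Unlike the loops in the question, this builds a new array and leaves arr unchanged.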

Error in comparing of two arrays: DeprecationWarning: elementwise comparison failed; this will raise an error in the future

I have an array a with the shape of (1000000,32) and array b with the shape of (10000,32)
I want to find the indices of the rows of a that match rows of b.
I have written the following code:
I = np.argwhere((a == b[:, None]).all(axis=2))[:, 1]
When I test it on other cases it works very well, but for my current arrays it gives the following error:
...\Anaconda3\lib\site-packages\ipykernel_launcher.py:111: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
AttributeError: 'bool' object has no attribute 'all'
Any idea what the source of the error is? Thanks.
Run:
result = (a[:, np.newaxis] == b).all(-1).any(-1)
Steps:
a[:, np.newaxis] == b - an element-wise comparison. The first and second indices are the row indices in a and b; the third index is the column index in both rows.
….all(-1) - does a[i] equal b[j] (are all elements of both rows equal)?
….any(-1) - does a[i] have a "counterpart" in any row of b?
To check the result of each step, use two small arrays, e.g. up to 10 rows and 2 columns.
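With shapes (1000000, 32) and (10000, 32), the fully broadcast comparison would have 1000000 × 10000 × 32 elements, roughly 320 GB at one byte per boolean, which may well be why the comparison fails on the real data. The sketch below applies the same steps to one block of rows of a at a time; the function name rows_in and the chunk parameter are hypothetical, not part of NumPy.

```python
import numpy as np

def rows_in(a, b, chunk=1000):
    """Boolean mask marking the rows of `a` that equal some row of `b`."""
    mask = np.zeros(len(a), dtype=bool)
    for start in range(0, len(a), chunk):
        block = a[start:start + chunk]
        # (chunk, 1, k) == (m, k) broadcasts to (chunk, m, k)
        mask[start:start + chunk] = (block[:, np.newaxis] == b).all(-1).any(-1)
    return mask

a = np.arange(60).reshape(10, 6)
b = np.arange(24).reshape(4, 6)

indices = np.flatnonzero(rows_in(a, b, chunk=3))
print(indices)  # [0 1 2 3]
```

The chunk size trades memory for loop overhead: each step allocates about chunk × len(b) × a.shape[1] booleans.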
np.arange(a.shape[0])[np.isin(a,b).all(axis=1)]
>>> import numpy as np
>>> a=np.arange(60).reshape(10,6)
>>> a
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59]])
>>> b=np.arange(24).reshape(4,6)
>>> b
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
>>> np.arange(a.shape[0])[np.isin(a,b).all(axis=1)]
array([0, 1, 2, 3])
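np.isin() tests whether each element of a is present anywhere in b. The questioner is already familiar with all() and the axis=1 argument, so np.isin(a,b).all(axis=1) produces a boolean array that selects from the row indices of a, which are given by np.arange(a.shape[0]). Note that the test is element-wise: a row of a whose elements all occur somewhere in b, even in different rows or in a different order, is selected too, so this identifies whole rows only when the values cannot mix in that way.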

How to access array by a list of point coordinates

I have an array A = np.ones((4,4,4)) and another array B that holds the coordinates of points in A; let's assume that B = [[2,2,2], [3,2,1]].
I tried to access A by array indexing like A[B], but it didn't work.
How can I do it in an elegant way that also works when B has more dimensions, such as a B of shape (10, 20, 3)?
You can pass a list of coordinates, but you should transpose the list first, so that the coordinates along the i-th dimension are passed as the i-th element of the index, for example with:
A[tuple(np.transpose(B))]
For a 4×4×4 matrix:
>>> A = np.arange(64).reshape(4,4,4)
>>> A
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]],
[[48, 49, 50, 51],
[52, 53, 54, 55],
[56, 57, 58, 59],
[60, 61, 62, 63]]])
we get for the given coordinates:
>>> A[tuple(np.transpose(B))]
array([42, 57])
and if we calculate these manually, we get:
>>> A[2,2,2]
42
>>> A[3,2,1]
57
Background:
A[1,2,3] is short for A[(1,2,3)] (so the index is a tuple). You can fetch multiple items with A[([2,3], [2,2], [2,1])], where each list holds the indices along one axis.
Since the data is represented as [[2,2,2], [3,2,1]], we first need to transpose it to [[2,3], [2,2], [2,1]]. Next we wrap it in a tuple and use that to subscript A.
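For a B with more dimensions, such as shape (10, 20, 3), np.transpose reverses every axis, so the result would come back with shape (20, 10). A minimal sketch that moves only the coordinate axis to the front with np.moveaxis keeps the layout of B; the array B_grid and its random contents are made up for illustration.

```python
import numpy as np

A = np.arange(64).reshape(4, 4, 4)
B = np.array([[2, 2, 2], [3, 2, 1]])

# Move the coordinate axis (the last one) to the front, then unpack it
# into one index array per axis of A
values = A[tuple(np.moveaxis(B, -1, 0))]
print(values)  # [42 57]

# Hypothetical higher-dimensional case: a 10 x 20 grid of 3D coordinates
B_grid = np.random.default_rng(0).integers(0, 4, size=(10, 20, 3))
grid_values = A[tuple(np.moveaxis(B_grid, -1, 0))]
print(grid_values.shape)  # (10, 20)
```

For a 2D B the two approaches give the same result, since moving the last of two axes to the front is a plain transpose.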

Pytorch tensor indexing: How to gather rows by tensor containing indices

I have the tensors:
ids: shape (7000,1) containing indices like [[1],[0],[2],...]
x: shape(7000,3,255)
The ids tensor encodes the index along the second dimension of x (the one of size 3) that should be selected.
I want to gather the selected slices in a resulting vector:
result: shape (7000,255)
Background:
I have some scores (shape = (7000,3)) for each of the 3 elements and want only to select the one with the highest score. Therefore, I used the function
ids = torch.argmax(scores,1,True)
giving me the maximum ids. I already tried to do it with gather function:
result = x.gather(1,ids)
but that didn't work.
Here is a solution you may be looking for. Expand ids to the shape that gather expects before passing it to torch.gather(x, 1, ids):
ids = ids.repeat(1, 255).view(-1, 1, 255)
An example as below:
x = torch.arange(24).view(4, 3, 2)
"""
tensor([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]])
"""
ids = torch.randint(0, 3, size=(4, 1))
"""
tensor([[0],
[2],
[0],
[2]])
"""
idx = ids.repeat(1, 2).view(4, 1, 2)
"""
tensor([[[0, 0]],
[[2, 2]],
[[0, 0]],
[[2, 2]]])
"""
torch.gather(x, 1, idx)
"""
tensor([[[ 0, 1]],
[[10, 11]],
[[12, 13]],
[[22, 23]]])
"""
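The example above leaves a singleton dimension in the output, whereas the question asks for a result of shape (7000, 255). A minimal sketch of the remaining step, on the same small shapes, is given below; the random scores tensor is a stand-in for the real scores, and expand is used in place of repeat only to avoid copying the index.

```python
import torch

x = torch.arange(24).view(4, 3, 2)         # stands in for shape (7000, 3, 255)
scores = torch.rand(4, 3)                   # made-up scores, one per slice
ids = torch.argmax(scores, 1, keepdim=True) # shape (4, 1)

# Broadcast the index over the last dimension: (4, 1) -> (4, 1, 2)
idx = ids.unsqueeze(-1).expand(-1, -1, x.shape[2])

# Gather along dim 1, then drop the singleton dimension: (4, 1, 2) -> (4, 2)
result = torch.gather(x, 1, idx).squeeze(1)

# Equivalent selection with advanced indexing
alt = x[torch.arange(x.shape[0]), ids.squeeze(1)]
print(result.shape, torch.equal(result, alt))
```

The advanced-indexing form is shorter; gather may still be preferable when the index already has the expanded shape.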
Using the example of David Ng, I found another way to do it:
idx = ids.flatten() + torch.arange(0,4*3,3)
tensor([ 0, 5, 6, 11])
x.view(-1,2)[idx]
tensor([[ 0, 1],
[10, 11],
[12, 13],
[22, 23]])
Another solution, which may provide a better memory read pattern when the number of dimensions is higher:
# data
x = torch.arange(60).reshape(3, 4, 5)
# index
y = torch.randint(0, 4, (12,), dtype=torch.int64).reshape(3, 4)
# result
z = x[torch.arange(x.shape[0]).repeat_interleave(x.shape[1]), y.flatten()]
z = z.reshape(x.shape)
An example result for x, y and z:
tensor([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
tensor([[1, 1, 2, 3],
[3, 1, 1, 0],
[1, 1, 1, 1]])
tensor([[[ 5, 6, 7, 8, 9],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[35, 36, 37, 38, 39],
[25, 26, 27, 28, 29],
[25, 26, 27, 28, 29],
[20, 21, 22, 23, 24]],
[[45, 46, 47, 48, 49],
[45, 46, 47, 48, 49],
[45, 46, 47, 48, 49],
[45, 46, 47, 48, 49]]])

What does this slicing mean [:, :, 0]?

I am trying to understand this code
I can't get my head around what this line is doing. The flow variable is an array of flow vectors with one for each pixel in the image (so a 2d array).
fx, fy = flow[:, :, 0], flow[:, :, 1]
Any help would be appreciated
Let us first simplify the expression. Your code:
fx, fy = flow[:, :, 0], flow[:, :, 1]
is equivalent to:
fx = flow[:, :, 0]
fy = flow[:, :, 1]
So it boils down to what flow[:, :, 0] means. The expression requires flow to be a NumPy array with at least three dimensions (call the number of dimensions N). Then flow[:,:,0] is an (N-1)-dimensional array in which the index along the third dimension is always 0.
In the context of image processing, an image is usually a 3D array (given it has color) with dimensions h × w × 3 (three color channels). There, flow[:,:,0] would produce an h × w view in which, for each pixel, we select the red channel (given the red channel is the first channel). Here the last axis presumably holds the components of each flow vector instead, so fx and fy would be the first and second component of the flow at every pixel.
So if flow is a 5 × 4 × 3 array, like:
>>> flow
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]],
[[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]],
[[24, 25, 26],
[27, 28, 29],
[30, 31, 32],
[33, 34, 35]],
[[36, 37, 38],
[39, 40, 41],
[42, 43, 44],
[45, 46, 47]],
[[48, 49, 50],
[51, 52, 53],
[54, 55, 56],
[57, 58, 59]]])
Then we will obtain for each 3-tuple the first element, making it:
>>> flow[:,:,0]
array([[ 0, 3, 6, 9],
[12, 15, 18, 21],
[24, 27, 30, 33],
[36, 39, 42, 45],
[48, 51, 54, 57]])
and by querying flow[:,:,1], we obtain:
>>> flow[:,:,1]
array([[ 1, 4, 7, 10],
[13, 16, 19, 22],
[25, 28, 31, 34],
[37, 40, 43, 46],
[49, 52, 55, 58]])
Mind that these are views: if you alter flow, it will affect fx and fy as well, even if you made these assignments beforehand.
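The view behaviour can be checked with a short sketch. The array reuses the 5 × 4 × 3 example values, and the name fx_copy is illustrative: it shows that an explicit .copy() is needed if the slice should be independent of flow.

```python
import numpy as np

flow = np.arange(60).reshape(5, 4, 3)

fx = flow[:, :, 0]              # basic slicing returns a view
fx_copy = flow[:, :, 0].copy()  # an independent copy

print(np.shares_memory(fx, flow))       # True
print(np.shares_memory(fx_copy, flow))  # False

# Changing flow afterwards is visible through the view, not through the copy
flow[0, 0, 0] = -1
print(fx[0, 0], fx_copy[0, 0])  # -1 0
```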
