Let's say I have the following (fictitious) NumPy array:
arr = np.array(
[[1, 2, 3, 4],
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24],
[25, 26, 27, 28],
[29, 30, 31, 32],
[33, 34, 35, 36],
[37, 38, 39, 40]
]
)
And for row indices idx = [0, 2, 3, 5, 8, 9] I'd like to repeat the values in each row downward until it reaches the next row index:
np.array(
[[1, 2, 3, 4],
[1, 2, 3, 4],
[9, 10, 11, 12],
[13, 14, 15, 16],
[13, 14, 15, 16],
[21, 22, 23, 24],
[21, 22, 23, 24],
[21, 22, 23, 24],
[33, 34, 35, 36],
[37, 38, 39, 40]
]
)
Note that idx will always be sorted and contain no repeated values. I can accomplish this with something like:
for start, stop in zip(idx[:-1], idx[1:]):
    for i in range(start, stop):
        arr[i] = arr[start]
# Handle last index in `idx`
start, stop = idx[-1], arr.shape[0]
for i in range(start, stop):
    arr[i] = arr[start]
Unfortunately, I have many, many arrays like this, and the loop becomes slow as the arrays grow (in both the number of rows and the number of columns) and as idx gets longer. The final goal is to plot these as heatmaps in matplotlib, which I already know how to do. Another approach that I tried uses np.tile:
for start, stop in zip(idx[:-1], idx[1:]):
    reps = max(0, stop - start)
    arr[start:stop] = np.tile(arr[start], (reps, 1))
# Handle last index in `idx`
start, stop = idx[-1], arr.shape[0]
reps = stop - start  # recompute: the value left over from the loop does not apply here
arr[start:stop] = np.tile(arr[start], (reps, 1))
But I am hoping that there's a way to get rid of the slow for-loop.
Try np.diff to find the number of repetitions for each row, then np.repeat:
# this assumes `idx` is a standard list as in the question
np.repeat(arr[idx], np.diff(idx+[len(arr)]), axis=0)
Output:
array([[ 1, 2, 3, 4],
[ 1, 2, 3, 4],
[ 9, 10, 11, 12],
[13, 14, 15, 16],
[13, 14, 15, 16],
[21, 22, 23, 24],
[21, 22, 23, 24],
[21, 22, 23, 24],
[33, 34, 35, 36],
[37, 38, 39, 40]])
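If idx is a NumPy array instead of a list, `idx+[len(arr)]` adds element-wise rather than appending, so the snippet above would not work as written. A minimal sketch that handles both cases with np.append follows; the helper name `forward_fill_rows` is made up for illustration, and it assumes idx starts at 0 (otherwise the rows before idx[0] are dropped from the result).

```python
import numpy as np

def forward_fill_rows(arr, idx):
    """Repeat each row arr[idx[k]] down to (but excluding) row idx[k + 1].

    Hypothetical helper; assumes idx is sorted, unique, and starts at 0.
    """
    idx = np.asarray(idx)
    # Rows per block: gaps between consecutive indices, plus the tail block.
    counts = np.diff(np.append(idx, len(arr)))
    return np.repeat(arr[idx], counts, axis=0)

arr = np.arange(1, 41).reshape(10, 4)
idx = [0, 2, 3, 5, 8, 9]
filled = forward_fill_rows(arr, idx)

# Cross-check against an explicit loop like the one in the question.
expected = arr.copy()
for start, stop in zip(idx, idx[1:] + [len(arr)]):
    expected[start:stop] = arr[start]
assert np.array_equal(filled, expected)
```

Unlike the loops in the question, this returns a new array and leaves arr unchanged.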
I have an array a with shape (1000000,32) and an array b with shape (10000,32).
I want to find the indices of the rows of a that also appear as rows of b.
I have written the following code:
I = np.argwhere((a == b[:, None]).all(axis=2))[:, 1]
When I test it in other cases it works very well, but for my current arrays it gives the following error:
...\Anaconda3\lib\site-packages\ipykernel_launcher.py:111: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
AttributeError: 'bool' object has no attribute 'all'
Any idea what the source of the error is? Thanks.
Run:
result = (a[:, np.newaxis] == b).all(-1).any(-1)
Steps:
a[:, np.newaxis] == b - element-wise comparison. The first and second indices are the row indices in a and b; the third index is the column index within both rows.
….all(-1) - does a[i] equal b[j] (are all elements of both rows equal)?
….any(-1) - does a[i] have a counterpart in any row of b?
To check the result of each step, try two small arrays with e.g. up to 10 rows and 2 columns.
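The expression above yields a boolean mask over the rows of a; np.flatnonzero turns it into indices. With the shapes from the question, the intermediate broadcast array would hold 1000000 × 10000 × 32 booleans (roughly 320 GB), which is a plausible cause of the failed comparison. Below is a rough sketch that compares a against b one chunk at a time; the function name and the `chunk_size` parameter are assumptions made for illustration, not part of NumPy.

```python
import numpy as np

def rows_of_a_in_b(a, b, chunk_size=1000):
    """Return indices of rows of `a` that equal some row of `b`.

    Hypothetical helper: works through `a` in chunks so the temporary
    boolean array has shape (chunk_size, len(b), n_cols) at most.
    """
    hits = []
    for start in range(0, len(a), chunk_size):
        chunk = a[start:start + chunk_size]
        # (chunk, 1, cols) == (rows_b, cols) -> (chunk, rows_b, cols)
        mask = (chunk[:, np.newaxis] == b).all(-1).any(-1)
        hits.append(np.flatnonzero(mask) + start)
    return np.concatenate(hits) if hits else np.empty(0, dtype=np.intp)

a = np.arange(60).reshape(10, 6)
b = a[[7, 2, 4]]
found = rows_of_a_in_b(a, b, chunk_size=4)
print(found)  # [2 4 7]
```

The chunk size trades memory for the number of Python-level iterations; with b of shape (10000, 32), a chunk of 1000 rows needs about 320 MB for the temporary mask.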
np.arange(a.shape[0])[np.isin(a,b).all(axis=1)]
>>> import numpy as np
>>> a=np.arange(60).reshape(10,6)
>>> a
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23],
[24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35],
[36, 37, 38, 39, 40, 41],
[42, 43, 44, 45, 46, 47],
[48, 49, 50, 51, 52, 53],
[54, 55, 56, 57, 58, 59]])
>>> b=np.arange(24).reshape(4,6)
>>> b
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11],
[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])
>>> np.arange(a.shape[0])[np.isin(a,b).all(axis=1)]
array([0, 1, 2, 3])
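np.isin() is described in the NumPy docs; it tests whether each element of a is in b. The questioner is already familiar with all() and the use of the axis=1 argument. So np.isin(a,b).all(axis=1) produces a boolean array that selects from among the indexes of a, which are represented by np.arange(a.shape[0]). Note that np.isin tests individual elements against all of b, not whole rows, so a row of a whose elements each occur somewhere in b is selected even if it matches no single row of b.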
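Because np.isin works element by element, it can report rows that are not actually rows of b. A small, contrived example (the arrays are invented for illustration) compares it with the row-wise broadcasting test:

```python
import numpy as np

b = np.array([[1, 2],
              [3, 4]])
a = np.array([[1, 2],   # a real row of b
              [1, 4],   # not a row of b, but 1 and 4 both occur in b
              [5, 6]])  # no element occurs in b

# Element-wise membership: every element of row 1 is "somewhere in b".
isin_idx = np.arange(a.shape[0])[np.isin(a, b).all(axis=1)]

# Row-wise comparison via broadcasting only accepts complete matching rows.
row_idx = np.flatnonzero((a[:, np.newaxis] == b).all(-1).any(-1))

print(isin_idx)  # [0 1]
print(row_idx)   # [0]
```

The isin approach is therefore only safe when no row of a can be assembled from elements scattered across different rows of b, as happens to be the case in the arange example.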
I have an array A = np.ones((4,4,4)) and another array B that holds the coordinates of points in A; let's assume that B = [[2,2,2], [3,2,1]].
I tried to access A by array indexing like A[B], but it didn't work.
How can I do this in an elegant way that also works when B has more dimensions, e.g. a B of shape (10, 20, 3)?
You can pass a list of coordinates, but you have to transpose it first, so that the items of the i-th dimension are passed as the i-th element of the index, for example with:
A[tuple(np.transpose(B))]
For a 4×4×4 matrix:
>>> A = np.arange(64).reshape(4,4,4)
>>> A
array([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]],
[[48, 49, 50, 51],
[52, 53, 54, 55],
[56, 57, 58, 59],
[60, 61, 62, 63]]])
we get for the given coordinates:
>>> A[tuple(np.transpose(B))]
array([42, 57])
and if we calculate these manually, we get:
>>> A[2,2,2]
42
>>> A[3,2,1]
57
Background:
A[1,2,3] is short for A[(1,2,3)] (that is, indexing with a tuple). You can fetch multiple items with A[([2,3], [2,2], [2,1])], but that requires the data to be transposed first.
Since the data is represented as [[2,2,2], [3,2,1]], we first transpose it to [[2,3], [2,2], [2,1]]. Next we wrap it in a tuple and use that to subscript A.
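For a B with extra leading dimensions, such as shape (10, 20, 3), np.transpose reverses all axes, so the result would come back with shape (20, 10). One possible variant, sketched here with a made-up B of shape (2, 2, 3), moves only the coordinate axis to the front with np.moveaxis so the leading shape is preserved:

```python
import numpy as np

A = np.arange(64).reshape(4, 4, 4)

# Coordinates with extra leading dimensions: shape (2, 2, 3).
B = np.array([[[2, 2, 2], [3, 2, 1]],
              [[0, 0, 0], [1, 2, 3]]])

# Move the coordinate axis to the front, then unpack it into one
# index array per dimension of A.
values = A[tuple(np.moveaxis(B, -1, 0))]

print(values.shape)  # (2, 2)
print(values)
# [[42 57]
#  [ 0 27]]
```

For a two-dimensional B, np.moveaxis(B, -1, 0) and np.transpose(B) give the same result.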
I have the tensors:
ids: shape (7000,1) containing indices like [[1],[0],[2],...]
x: shape(7000,3,255)
The ids tensor encodes, for each row, the index along the second dimension of x (the one of size 3) that should be selected.
I want to gather the selected slices in a resulting vector:
result: shape (7000,255)
Background:
I have some scores (shape = (7000,3)) for each of the 3 elements and want only to select the one with the highest score. Therefore, I used the function
ids = torch.argmax(scores,1,True)
giving me the maximum ids. I already tried to do it with gather function:
result = x.gather(1,ids)
but that didn't work.
Here is a solution that may be what you are looking for:
ids = ids.repeat(1, 255).view(-1, 1, 255)
An example is shown below; the expanded index is then passed to torch.gather:
x = torch.arange(24).view(4, 3, 2)
"""
tensor([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]],
[[12, 13],
[14, 15],
[16, 17]],
[[18, 19],
[20, 21],
[22, 23]]])
"""
ids = torch.randint(0, 3, size=(4, 1))
"""
tensor([[0],
[2],
[0],
[2]])
"""
idx = ids.repeat(1, 2).view(4, 1, 2)
"""
tensor([[[0, 0]],
[[2, 2]],
[[0, 0]],
[[2, 2]]])
"""
torch.gather(x, 1, idx)
"""
tensor([[[ 0, 1]],
[[10, 11]],
[[12, 13]],
[[22, 23]]])
"""
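Applied to the shapes in the question, the gather result still carries a singleton dimension that has to be squeezed to reach (7000, 255). The following is a sketch under the assumption that ids comes from torch.argmax with keepdim=True as in the question; the sizes n, k, d are small stand-ins chosen for illustration.

```python
import torch

n, k, d = 7, 3, 5  # small stand-ins for 7000, 3, 255
x = torch.arange(n * k * d).view(n, k, d)
scores = torch.rand(n, k)
ids = torch.argmax(scores, 1, True)            # shape (n, 1)

# Expand ids to (n, 1, d) so gather picks the same slice for every feature.
index = ids.unsqueeze(-1).expand(-1, -1, d)
result = torch.gather(x, 1, index).squeeze(1)  # shape (n, d)

# Equivalent advanced-indexing form, used here as a cross-check.
alt = x[torch.arange(n), ids.squeeze(1)]
assert torch.equal(result, alt)
print(result.shape)
```

expand creates a view instead of copying the index d times, which is the main difference from the repeat-based version above.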
Using the example of David Ng, I found another way to do it:
idx = ids.flatten() + torch.arange(0,4*3,3)
tensor([ 0, 5, 6, 11])
x.view(-1,2)[idx]
tensor([[ 0, 1],
[10, 11],
[12, 13],
[22, 23]])
Another solution, which may give a better memory read pattern when there are more dimensions:
# data
x = torch.arange(60).reshape(3, 4, 5)
# index
y = torch.randint(0, 4, (12,), dtype=torch.int64).reshape(3, 4)
# result
z = x[torch.arange(x.shape[0]).repeat_interleave(x.shape[1]), y.flatten()]
z = z.reshape(x.shape)
Example values of x, y and z:
tensor([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
tensor([[1, 1, 2, 3],
[3, 1, 1, 0],
[1, 1, 1, 1]])
tensor([[[ 5, 6, 7, 8, 9],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[35, 36, 37, 38, 39],
[25, 26, 27, 28, 29],
[25, 26, 27, 28, 29],
[20, 21, 22, 23, 24]],
[[45, 46, 47, 48, 49],
[45, 46, 47, 48, 49],
[45, 46, 47, 48, 49],
[45, 46, 47, 48, 49]]])
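The same selection, z[i, j] = x[i, y[i, j]], can also be written by broadcasting a column of batch indices against y, which avoids the explicit repeat, flatten and reshape. This is a sketch using the example values shown above rather than random indices:

```python
import torch

x = torch.arange(60).reshape(3, 4, 5)
y = torch.tensor([[1, 1, 2, 3],
                  [3, 1, 1, 0],
                  [1, 1, 1, 1]])

# Version from the text: flat batch indices paired with flat row indices.
z = x[torch.arange(x.shape[0]).repeat_interleave(x.shape[1]), y.flatten()]
z = z.reshape(x.shape)

# Broadcasting a (3, 1) column of batch indices against the (3, 4) index
# tensor y selects the same rows directly, with result shape (3, 4, 5).
z_alt = x[torch.arange(x.shape[0])[:, None], y]
assert torch.equal(z, z_alt)
```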
I am trying to understand this code
I can't get my head around what this line is doing. The flow variable is an array of flow vectors with one for each pixel in the image (so a 2d array).
fx, fy = flow[:, :, 0], flow[:, :, 1]
Any help would be appreciated
Let us first simplify the expression. Your code:
fx, fy = flow[:, :, 0], flow[:, :, 1]
is equivalent to:
fx = flow[:, :, 0]
fy = flow[:, :, 1]
So it now boils down to what flow[:, :, 0] means. It implies that flow is a NumPy array with at least three dimensions (let N be the number of dimensions). flow[:,:,0] is then an (N-1)-dimensional array in which index 0 is always picked along the third dimension.
In the context of image processing, a color image is usually a 3D array with dimensions h × w × 3 (three color channels). There, flow[:,:,0] would produce an h × w view that selects, for each pixel, the first channel (the red channel, if red is stored first). In your case each pixel holds a flow vector instead of a color, so fx and fy are the first and second components of that vector at every pixel.
So if flow is a 5 × 4 × 3 matrix, like:
>>> flow
array([[[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]],
[[12, 13, 14],
[15, 16, 17],
[18, 19, 20],
[21, 22, 23]],
[[24, 25, 26],
[27, 28, 29],
[30, 31, 32],
[33, 34, 35]],
[[36, 37, 38],
[39, 40, 41],
[42, 43, 44],
[45, 46, 47]],
[[48, 49, 50],
[51, 52, 53],
[54, 55, 56],
[57, 58, 59]]])
Then we will obtain for each 3-tuple the first element, making it:
>>> flow[:,:,0]
array([[ 0, 3, 6, 9],
[12, 15, 18, 21],
[24, 27, 30, 33],
[36, 39, 42, 45],
[48, 51, 54, 57]])
and by querying flow[:,:,1], we obtain:
>>> flow[:,:,1]
array([[ 1, 4, 7, 10],
[13, 16, 19, 22],
[25, 28, 31, 34],
[37, 40, 43, 46],
[49, 52, 55, 58]])
Mind that these are views: if you alter flow, it will affect fx and fy as well, even if you made these assignments before the change.
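The view behaviour can be checked with np.shares_memory. A small sketch on the same 5 × 4 × 3 example array, where the values written into flow are arbitrary:

```python
import numpy as np

flow = np.arange(60).reshape(5, 4, 3)
fx, fy = flow[:, :, 0], flow[:, :, 1]

print(fx.shape)                    # (5, 4)
print(np.shares_memory(fx, flow))  # True: fx is a view, not a copy

# Writing into flow is visible through the view ...
flow[0, 0, 0] = -1
print(fx[0, 0])                    # -1

# ... unless an independent copy was taken explicitly.
fx_copy = flow[:, :, 0].copy()
flow[0, 0, 0] = 99
print(fx_copy[0, 0])               # -1
```

Calling .copy() is the usual way to decouple fx and fy from flow when later in-place changes should not propagate.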