Indexing tensor with binary matrix in numpy


I have a tensor A such that A.shape = (32, 19, 2) and a binary matrix B such that B.shape = (32, 19). Is there a one-line operation I can perform to get a matrix C, where C.shape = (32, 19) and C[i, j] = A[i, j, B[i, j]]?
Essentially, I want to use B as an index matrix: if B[i, j] = 1, I take A[i, j, 1] to form C[i, j], and if B[i, j] = 0, I take A[i, j, 0].

np.where to the rescue. It's the same principle as mtrw's answer:
In [344]: A=np.arange(4*3*2).reshape(4,3,2)
In [345]: B=np.zeros((4,3),dtype=int)
In [346]: B[[0,1,1,2,3],[0,0,1,2,2]]=1
In [347]: B
Out[347]:
array([[1, 0, 0],
       [1, 1, 0],
       [0, 0, 1],
       [0, 0, 1]])
In [348]: np.where(B,A[:,:,1],A[:,:,0])
Out[348]:
array([[ 1,  2,  4],
       [ 7,  9, 10],
       [12, 14, 17],
       [18, 20, 23]])
np.choose can be used if the last dimension is larger than 2 (but smaller than 32). choose operates on a list or on the first dimension of an array, hence the rollaxis:
In [360]: np.choose(B,np.rollaxis(A,2))
Out[360]:
array([[ 1,  2,  4],
       [ 7,  9, 10],
       [12, 14, 17],
       [18, 20, 23]])
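A self-contained sketch of the np.choose route with more than two planes, assuming a random integer A with K = 3 planes (the shapes and values are made up for illustration). It uses np.moveaxis, which NumPy's documentation recommends over np.rollaxis; for this call both simply move the last axis to the front.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3  # number of planes along the last axis (illustrative value)
A = rng.integers(0, 100, size=(4, 3, K))
B = rng.integers(0, K, size=(4, 3))  # every B[i, j] is in {0, ..., K-1}

# np.choose computes C[i, j] = choices[B[i, j]][i, j],
# so the plane axis of A has to become the first axis.
C = np.choose(B, np.moveaxis(A, -1, 0))

# Reference result built with an explicit loop
expected = np.array([[A[i, j, B[i, j]] for j in range(A.shape[1])]
                     for i in range(A.shape[0])])
print(np.array_equal(C, expected))  # True
```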
B can also be used directly as an index. The trick is to specify the other dimensions in a way that broadcasts to the same shape.
In [373]: A[np.arange(A.shape[0])[:,None], np.arange(A.shape[1])[None,:], B]
Out[373]:
array([[ 1,  2,  4],
       [ 7,  9, 10],
       [12, 14, 17],
       [18, 20, 23]])
This last approach can be modified to work when B does not match the first two dimensions of A.
np.ix_ may simplify this indexing:
I, J = np.ix_(np.arange(4),np.arange(3))
A[I, J, B]
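A runnable version of the np.ix_ variant, reusing the A and B from the session above but deriving the ranges from A.shape instead of hard-coding 4 and 3. np.ix_ returns open-mesh index arrays of shapes (4, 1) and (1, 3), which broadcast against B's (4, 3) shape.

```python
import numpy as np

A = np.arange(4 * 3 * 2).reshape(4, 3, 2)
B = np.zeros((4, 3), dtype=int)
B[[0, 1, 1, 2, 3], [0, 0, 1, 2, 2]] = 1

# Open-mesh row and column indices that broadcast with B
I, J = np.ix_(np.arange(A.shape[0]), np.arange(A.shape[1]))
C = A[I, J, B]  # C[i, j] = A[i, j, B[i, j]]

print(I.shape, J.shape)  # (4, 1) (1, 3)
print(C)
# [[ 1  2  4]
#  [ 7  9 10]
#  [12 14 17]
#  [18 20 23]]
```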

You can do it using list comprehension:
C = np.array([[A[i, j, B[i, j]] for j in range(A.shape[1])] for i in range(A.shape[0])])

C = A[:,:,0]*(B==0) + A[:,:,1]*(B==1) should work. You can generalize this as np.sum([A[:,:,k]*(B==k) for k in np.arange(A.shape[-1])], axis=0) if you need to index more planes.
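A quick check of the generalized mask-and-sum expression for more than two planes, assuming random integer data with K = 4 planes (illustrative only). It compares the result with the broadcast fancy-indexing answer above.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4  # illustrative number of planes
A = rng.integers(-50, 50, size=(5, 6, K))
B = rng.integers(0, K, size=(5, 6))

# Each plane is multiplied by a 0/1 mask that is 1 where B selects it.
# Exactly one mask is 1 at each (i, j), so the sum keeps only the chosen value.
C_masked = np.sum([A[:, :, k] * (B == k) for k in range(A.shape[-1])], axis=0)

# Same selection via advanced indexing with broadcast row/column indices
C_indexed = A[np.arange(A.shape[0])[:, None], np.arange(A.shape[1]), B]

print(np.array_equal(C_masked, C_indexed))  # True
```

The masked sum builds K temporary planes, so it does more work than direct indexing. For float arrays holding inf or NaN it can also disagree with indexing, because 0 * inf and 0 * NaN are NaN.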

Related

np.dot in NumPy printing the transpose of what should be expected

I'm really new to Python and am wondering why this is printing the opposite of expected. A (7x4)(4x2)(2x1) multiplication should result in a 7x1 column vector.
import numpy as np
nutrition = np.array([[61, 100, 7, 2.2, 1, 7, 215],
                      [156, 340, 18, 7, 44, 5, 0],
                      [19, 110, 9, 3.3, 0, 6, 16],
                      [27, 60, 2, 0.5, 8, 2, 16]])
meals = np.array([[2, 1, 0, 0],
                  [0, 1, 1, 1]])
M = np.array([40, 10])
print(np.dot(nutrition.T, np.dot(meals.T, M)))
Instead, it is printing a 1x7 row vector:
[13140. 26700. 1570. 564. 2360. 890. 17520.]
Any explanation or problems to look into would be appreciated.

Your array M is of shape (2,) and NOT (2,1):
print(M.shape)
(2,)
Hence, the output shape is (7,), not (7,1). It is a 1-D array, which NumPy prints as a single row:
print(np.dot(nutrition.T, np.dot(meals.T, M)).shape)
(7,)
If you want a (7,1) output, simply reshape your M to (2,1):
M = M.reshape(-1,1)
#[[40]
# [10]]
And output would be:
[[13140.]
 [26700.]
 [ 1570.]
 [  564.]
 [ 2360.]
 [  890.]
 [17520.]]
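A self-contained sketch comparing both shapes side by side, using the question's data (with the missing closing parenthesis on meals restored). M[:, np.newaxis] is an equivalent way to write M.reshape(-1, 1).

```python
import numpy as np

nutrition = np.array([[61, 100, 7, 2.2, 1, 7, 215],
                      [156, 340, 18, 7, 44, 5, 0],
                      [19, 110, 9, 3.3, 0, 6, 16],
                      [27, 60, 2, 0.5, 8, 2, 16]])
meals = np.array([[2, 1, 0, 0],
                  [0, 1, 1, 1]])

M_1d = np.array([40, 10])      # shape (2,): a 1-D array, neither row nor column
M_col = M_1d[:, np.newaxis]    # shape (2, 1): an explicit column vector

out_1d = np.dot(nutrition.T, np.dot(meals.T, M_1d))
out_col = np.dot(nutrition.T, np.dot(meals.T, M_col))

print(out_1d.shape)   # (7,)
print(out_col.shape)  # (7, 1)
```

The values are identical in both cases; only the number of dimensions differs, so out_col.ravel() recovers the 1-D result.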

How does "Fancy Indexing with Broadcasting and Boolean Masking" work?

I came across this snippet of code in Jake Vanderplas's Data Science Handbook. The concept of using Broadcasting along with Fancy Indexing here wasn't clear to me. Please explain.
In[5]: X = np.arange(12).reshape((3, 4))
       X
Out[5]: array([[ 0,  1,  2,  3],
               [ 4,  5,  6,  7],
               [ 8,  9, 10, 11]])
In[6]: row = np.array([0, 1, 2])
       col = np.array([2, 1, 3])
In[7]: X[row[:, np.newaxis], col]
Out[7]: array([[ 2,  1,  3],
               [ 6,  5,  7],
               [10,  9, 11]])
It says: "Here, each row value is matched with each column vector, exactly as we saw in broadcasting of arithmetic operations. For example:"
In[8]: row[:, np.newaxis] * col
Out[8]: array([[0, 0, 0],
               [2, 1, 3],
               [4, 2, 6]])

If you use an integer array to index another array,
you essentially loop over the given indices, pick the corresponding elements (which may themselves be arrays) along the axis you are indexing, and stack them together.
arr55 = np.arange(25).reshape((5, 5))
# array([[ 0,  1,  2,  3,  4],
#        [ 5,  6,  7,  8,  9],
#        [10, 11, 12, 13, 14],
#        [15, 16, 17, 18, 19],
#        [20, 21, 22, 23, 24]])
arr53 = arr55[:, [3, 3, 4]]
# pick the elements at (arr[:, 3], arr[:, 3], arr[:, 4])
# array([[ 3,  3,  4],
#        [ 8,  8,  9],
#        [13, 13, 14],
#        [18, 18, 19],
#        [23, 23, 24]])
So if you index an (n, m) array with a row index of length k (or a column index of length l), the resulting shape is:
A_nm[row, :] -> A_km
A_nm[:, col] -> A_nl
If, however, you use two arrays row and col to index an array,
you loop over both indices simultaneously and stack the elements (which may still be arrays) found at each paired position.
Here row and col must have the same length (more precisely, they must broadcast to a common shape).
A_nm[row, col] -> A_k
arr3 = arr55[[0, 2, 4], [3, 3, 4]]
# pick the elements at (arr[0, 3], arr[2, 3], arr[4, 4])
# array([ 3, 13, 24])
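A small runnable check of the paired-index rule: indexing with two equal-length arrays gives the same result as looping over zip(rows, cols). The names rows, cols, paired and looped are just illustrative.

```python
import numpy as np

arr55 = np.arange(25).reshape((5, 5))
rows = np.array([0, 2, 4])
cols = np.array([3, 3, 4])

# Paired (advanced) indexing: element k comes from (rows[k], cols[k])
paired = arr55[rows, cols]

# The same selection written as an explicit loop over the index pairs
looped = np.array([arr55[r, c] for r, c in zip(rows, cols)])

print(paired)  # [ 3 13 24]
```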
Now, finally, for your question: you can use broadcasting while indexing arrays. Sometimes you do not want only the elements
(arr[0, 3], arr[2, 3], arr[4, 4])
to be picked, but rather the expanded version:
(arr[0, [3, 3, 4]], arr[2, [3, 3, 4]], arr[4, [3, 3, 4]])
# each row value is matched with each column vector
This matching (broadcasting) works exactly as in arithmetic operations.
The example may be slightly misleading, though, because the values produced by the multiplication are irrelevant for indexing.
What matters are the combinations and the resulting shape:
row * col
# element-wise multiplication, resulting in 3 numbers
row[:, np.newaxis] * col
# each row value is *matched* with each column value, resulting in a 3x3 array
The example is meant to emphasize this matching of row and col.
We can have a look and play around with the different possibilities:
n = 3
m = 4
X = np.arange(n*m).reshape((n, m))
row = np.array([0, 1, 2]) # k = 3
col = np.array([2, 1, 3]) # l = 3
X[row, :] # A_nm[row, :] -> A_km
# array([[ 0,  1,  2,  3],
#        [ 4,  5,  6,  7],
#        [ 8,  9, 10, 11]])
X[:, col]  # A_nm[:, col] -> A_nl
# array([[ 2,  1,  3],
#        [ 6,  5,  7],
#        [10,  9, 11]])
X[row, col]  # A_nm[row, col] -> A_l == A_k
# array([ 2,  5, 11])
X[row, :][:, col] # A_nm[row, :][:, col] -> A_km[:, col] -> A_kl
# == X[:, col][row, :]
# == X[row[:, np.newaxis], col] # A_nm[row[:, np.newaxis], col] -> A_kl
# array([[ 2,  1,  3],
#        [ 6,  5,  7],
#        [10,  9, 11]])
X[row, col[:, np.newaxis]]
# == X[row[:, np.newaxis], col].T
# array([[ 2,  6, 10],
#        [ 1,  5,  9],
#        [ 3,  7, 11]])
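A compact way to see the rule at work: the index arrays are broadcast together first, and the result takes their common shape, with Y[i, j] = X[row[i], col[j]]. np.broadcast is used here only to report that common shape.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

r = row[:, np.newaxis]  # shape (3, 1)
# Shape of the broadcast index arrays, which is also the shape of the result
print(np.broadcast(r, col).shape)  # (3, 3)

Y = X[r, col]
# Element-wise meaning: Y[i, j] == X[row[i], col[j]]
Y_loop = np.array([[X[i, j] for j in col] for i in row])
print(np.array_equal(Y, Y_loop))  # True
```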

I came here looking for an answer to this question, and hpaulj's comment helped me. I'm going to expand on it.
In the following snippet,
import numpy as np
X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
Y = X[row.reshape(-1, 1), col]
the indices we're passing to X are being broadcast.
The code below, which follows the NumPy broadcasting rules explicitly but uses far more memory, performs the same indexing:
# Make the row and column indices 'conformable'
R = np.repeat(row.reshape(-1, 1), len(col), axis=1)  # repeat row index across columns
C = np.repeat(col.reshape(1, -1), len(row), axis=0)  # repeat column index across rows
Y = X[R, C] # Y[i, j] = X[R[i, j], C[i, j]]
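For comparison, np.broadcast_arrays produces the same conformable R and C, but as views whose repeated axis has stride 0, so the index values are not duplicated in memory. A sketch, reusing the arrays from the snippet above:

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# Explicit, memory-hungry index grids (as in the np.repeat version)
R_rep = np.repeat(row.reshape(-1, 1), len(col), axis=1)
C_rep = np.repeat(col.reshape(1, -1), len(row), axis=0)

# Broadcast views with the same values; the repeated axis has stride 0
R, C = np.broadcast_arrays(row.reshape(-1, 1), col)

print(R.strides[1], C.strides[0])  # 0 0
print(np.array_equal(X[R, C], X[R_rep, C_rep]))  # True
print(np.array_equal(X[R, C], X[row.reshape(-1, 1), col]))  # True
```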

view_as_windows on 4d array

Given an ndarray of shape (batch_size, w, h, c) and a patch size (p, p), I want to extract patches of shape (p, p, c) from each 3D matrix. No patches will overlap, so the stride can be thought of as p.
This should return an array of shape (batch_size * p * p, p, p, c).
Here is a minimal example using skimage.util.view_as_windows:
import numpy as np
import skimage
a = np.arange(8*8*2).reshape((8, 8, 2))
b = a * 2
c = np.concatenate((a[np.newaxis, :, :, :], b[np.newaxis, :, :, :]), axis = 0)
d = skimage.util.view_as_windows(c, 2, step = 2).reshape((8*2*2, 2, 2, 2))
However, only the alternate values are what I expect:
d[0]
Out[183]:
array([[[ 0,  1],
        [ 2,  3]],
       [[16, 17],
        [18, 19]]])
d[1]
Out[184]:
array([[[ 0,  2],
        [ 4,  6]],
       [[32, 34],
        [36, 38]]])
d[2]
Out[185]:
array([[[ 4,  5],
        [ 6,  7]],
       [[20, 21],
        [22, 23]]])
d[3]
Out[186]:
array([[[ 8, 10],
        [12, 14]],
       [[40, 42],
        [44, 46]]])
d[4]
Out[187]:
array([[[ 8,  9],
        [10, 11]],
       [[24, 25],
        [26, 27]]])
Thus, d[::2] is close to what I want, but half the values are lost.
I am not sure if the problem is the window size or the step, or even if my problem is possible using view_as_windows, so I am open to any efficient suggestion.

First, I think you mean to return a volume of shape (batch_size * w/p * h/p, p, p, c): if the patches are non-overlapping, the product of the dimensions should be the same before and after patching.
Having gotten that out of the way, here's my attempt. I'm changing the batch size and channel dimensions to make it clearer which is which.
import numpy as np
from skimage import util
batch = np.arange(4*8*8*3).reshape((4, 8, 8, 3))
blocked = util.view_as_blocks(batch, (1, 2, 2, 3))  # shape (4, 4, 4, 1, 1, 2, 2, 3)
patches = blocked.reshape((64, 2, 2, 3))  # 4 images * 4 * 4 blocks each
print(patches[0].transpose((2, 0, 1)))
print(patches[1].transpose((2, 0, 1)))
which gives:
[[[ 0  3]
  [24 27]]
 [[ 1  4]
  [25 28]]
 [[ 2  5]
  [26 29]]]
and
[[[ 6  9]
  [30 33]]
 [[ 7 10]
  [31 34]]
 [[ 8 11]
  [32 35]]]
Unfortunately, the reshape triggers a copy. I'm not sure whether there is a way to avoid it, but hopefully this is not your main computational/memory concern.
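If scikit-image is not at hand, the same (batch_size * w/p * h/p, p, p, c) layout can be sketched in plain NumPy with a reshape and a transpose. This is a swapped-in technique rather than the view_as_blocks call above, extract_patches is a hypothetical helper name, and it assumes w and h are multiples of p. It orders patches the same way as the view_as_blocks version (image, then block row, then block column).

```python
import numpy as np

def extract_patches(batch, p):
    """Hypothetical helper: split (n, w, h, c) into non-overlapping (p, p, c) patches.

    Assumes w and h are multiples of p.
    Returns an array of shape (n * (w // p) * (h // p), p, p, c).
    """
    n, w, h, c = batch.shape
    if w % p or h % p:
        raise ValueError("w and h must be multiples of p")
    # Split each spatial axis into (number of blocks, block size)
    x = batch.reshape(n, w // p, p, h // p, p, c)
    # Move the two block-index axes next to the batch axis
    x = x.transpose(0, 1, 3, 2, 4, 5)
    # Merge batch and block indices; like the view_as_blocks version, this copies
    return x.reshape(-1, p, p, c)

batch = np.arange(4 * 8 * 8 * 3).reshape((4, 8, 8, 3))
patches = extract_patches(batch, 2)
print(patches.shape)  # (64, 2, 2, 3)
print(np.array_equal(patches[1], batch[0, 0:2, 2:4, :]))  # True
```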

Remove/delete each minimum value in each row of a NxM matrix?

Is it possible to remove/delete each minimum value in each row of an NxM matrix, creating a new matrix?
I've tried this so far without any luck:
for n in range(0,len(matrix_name)):
    Ma = grades.remove(np.min(matrix_name[n,:]))
and this too:
for n in range(0,len(matrix_name)):
    Ma = np.delete(matrix_name,np.min(matrix_name[n,:]))

If duplicates are not an issue or if deleting only one of them per row is acceptable:
m, n = a.shape
np.where(np.arange(n-1) < a.argmin(axis=1)[:, None], a[:, :-1], a[:, 1:])
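The one-liner above, wrapped in a hypothetical helper (drop_row_min) and run on a small made-up array with a repeated minimum, to show that only the first occurrence per row is removed. Output column j comes from column j of a when j is left of the minimum and from column j + 1 otherwise.

```python
import numpy as np

def drop_row_min(a):
    """Hypothetical helper: drop the first minimum of every row of a 2-D array."""
    m, n = a.shape
    first_min = a.argmin(axis=1)[:, None]  # column of the first minimum, shape (m, 1)
    # Left of the minimum take a[:, j]; from the minimum on take a[:, j + 1]
    return np.where(np.arange(n - 1) < first_min, a[:, :-1], a[:, 1:])

a = np.array([[3, 1, 2],
              [5, 4, 4],
              [0, 7, 0]])
print(drop_row_min(a))
# [[3 2]
#  [5 4]
#  [7 0]]
```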

If it is acceptable to build the desired result from the original array, instead of modifying the original array by deleting the minimum values, then this approach should do the job:
# some test array
In [19]: arr
Out[19]:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19]])
In [20]: r, c = arr.shape
# find the minimum along axis 1 (i.e. along rows)
In [21]: min_vals = np.min(arr, axis=1, keepdims=True)
# drop the minimum of each row, then reshape the remaining values to a 2D array
In [22]: (arr[np.where(arr != min_vals)]).reshape(r, c-1)
Out[22]:
array([[ 1,  2,  3],
       [ 5,  6,  7],
       [ 9, 10, 11],
       [13, 14, 15],
       [17, 18, 19]])
Note: This approach assumes that there's only one minimum value in each row.
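A sketch of the same approach with a plain boolean mask (equivalent to the np.where form here), plus a made-up array dup that shows what the single-minimum assumption guards against: when a row's minimum is repeated, every copy is dropped and the reshape fails.

```python
import numpy as np

arr = np.arange(20).reshape(5, 4)
r, c = arr.shape
min_vals = arr.min(axis=1, keepdims=True)  # shape (5, 1), broadcasts across columns

# The boolean mask selects the same elements as arr[np.where(arr != min_vals)]
result = arr[arr != min_vals].reshape(r, c - 1)

# A repeated minimum removes too many elements, so the reshape raises ValueError
dup = np.array([[1, 1, 2],
                [3, 4, 5]])
try:
    dup[dup != dup.min(axis=1, keepdims=True)].reshape(2, 2)
    failed = False
except ValueError:
    failed = True
print(failed)  # True
```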

When getting an ROI from a numpy array (opencv image) why does img[y0:y1, x0:x1] seem to use an inconsistent range of indices?

OpenCV uses numpy arrays in order to store image data. In this question and accepted answer I was told that to access a subregion of interest in an image, I could use the form roi = img[y0:y1, x0:x1].
I am confused because when I create a numpy array in the terminal and test, I don't seem to be getting this behavior. Below I want to get the roi [[6,7], [11,12]], where y0 = index 1, y1 = index 2, and x0 = index 0, x1 = index 1.
Why then do I get what I want only with arr[1:3, 0:2]? I expected to get it with arr[1:2, 0:1].
It seems that when I slice an n-by-n ndarray[a:b, c:d], a and c are the expected range of indices 0..n-1, but b and d are indices ranging 1..n.

In your posted example numpy and cv2 are working as expected. Indexing and slicing in numpy, just as in Python in general, are 0-based, and a slice a:b covers the half-open range [a, b), i.e. it does not include b.
Recreate your example:
>>> import numpy as np
>>> arr = np.arange(1,26).reshape(5,5)
>>> arr
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])
So the statement arr[1:2, 0:1] means get the value(s) at row=1 (row 1 up to but not including 2) and column=0 (we expect 6):
>>> arr[1:2, 0:1]
array([[6]])
Similarly for arr[1:3, 0:2] we expect rows 1,2 and columns 0,1:
>>> arr[1:3, 0:2]
array([[ 6,  7],
       [11, 12]])
So if you want rows a through b and columns c through d with both ends included, what you really need is:
arr[a:b+1, c:d+1]
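A small sketch of that rule as a hypothetical helper, roi_inclusive, which takes inclusive bounds and adds 1 to each stop index. On the example array it returns the region the question was after.

```python
import numpy as np

def roi_inclusive(img, y0, y1, x0, x1):
    """Hypothetical helper: rows y0..y1 and columns x0..x1, both ends included."""
    # Python slices exclude the stop index, hence the + 1 on each stop
    return img[y0:y1 + 1, x0:x1 + 1]

arr = np.arange(1, 26).reshape(5, 5)
print(roi_inclusive(arr, 1, 2, 0, 1))
# [[ 6  7]
#  [11 12]]
```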
Further examples:
Suppose you need all columns but just rows 0 and 1:
>>> arr[:2, :]
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])
Here arr[:2, :] means all rows up to, but not including 2, followed by all columns :.
Suppose you want every other column, starting at column index 0 (and all rows):
>>> arr[:, ::2]
array([[ 1,  3,  5],
       [ 6,  8, 10],
       [11, 13, 15],
       [16, 18, 20],
       [21, 23, 25]])
where ::2 follows the start:stop:step notation (where stop is not inclusive).

