Select from a 3-dimensional array with a 2-dimensional array - python

I have two arrays:
a: a 3-dimensional source array (N x M x 2)
b: a 2-dimensional index array (N x M) containing 0s and 1s.
I want to use the indices in b to select the corresponding elements of a in its third dimension. The resulting array should have the dimensions N x M. Here is the example as code:
import numpy as np

a = np.array(  # dims: 3x3x2
    [[[ 0,  1],
      [ 2,  3],
      [ 4,  5]],
     [[ 6,  7],
      [ 8,  9],
      [10, 11]],
     [[12, 13],
      [14, 15],
      [16, 17]]]
)
b = np.array(  # dims: 3x3
    [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]]
)

# select the elements in a according to b
# to achieve this result:
desired = np.array(
    [[ 1,  3,  5],
     [ 7,  9, 11],
     [13, 15, 17]]
)
At first I thought this must have a simple solution, but I could not find one at all. Since I would like to port it to TensorFlow, I would appreciate it if somebody knows a NumPy-style solution for this.
Edit: The third dimension of a might contain more than two elements, so b might also contain indices other than 0 and 1 - it is not a boolean mask.

We can use np.where for this:
np.where(b, a[:, :, 1], a[:, :, 0])
Output:
array([[ 1,  3,  5],
       [ 7,  9, 11],
       [13, 15, 17]])
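Note that this relies on a having exactly two entries along its last axis and on b containing only 0s and 1s. For the general case mentioned in the edit, a hedged sketch along the same lines (not part of the original answer) could use np.choose, which is fine for a small number of channels:

channels = np.moveaxis(a, -1, 0)   # shape (C, N, M), assuming a has shape (N, M, C)
result = np.choose(b, channels)    # result[i, j] == a[i, j, b[i, j]]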

As @jdehesa suggests, we can use np.ogrid to obtain the indices for the first two axes:
ax0, ax1 = np.ogrid[:b.shape[0], :b.shape[1]]
And then we can use b to directly index along the last axis. Note that ax0 and ax1 will be broadcast to the shape of b:
desired = a[ax0, ax1, b]
print(desired)
[[ 1  3  5]
 [ 7  9 11]
 [13 15 17]]
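As a quick check for the 3x3 example, the open grids have shapes that broadcast to b's shape:

print(ax0.shape, ax1.shape)  # (3, 1) (1, 3) -> broadcast together they cover b's shape (3, 3)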

I added some solutions for TensorFlow.
import tensorflow as tf

a = tf.constant([[[ 0,  1], [ 2,  3], [ 4,  5]],
                 [[ 6,  7], [ 8,  9], [10, 11]],
                 [[12, 13], [14, 15], [16, 17]]], dtype=tf.float32)
b = tf.constant([[1, 1, 1], [1, 1, 1], [1, 1, 1]], dtype=tf.int32)

# 1. use tf.gather_nd
col, row = tf.meshgrid(tf.range(a.shape[0]), tf.range(a.shape[1]))
idx = tf.stack([row, col, b], axis=-1)  # thanks to @jdehesa's suggestion
result1 = tf.gather_nd(a, idx)

# 2. use tf.reduce_sum
mask = tf.one_hot(b, depth=a.shape[-1], dtype=tf.float32)
result2 = tf.reduce_sum(a * mask, axis=-1)

# 3. use tf.boolean_mask (the mask must be boolean)
mask = tf.cast(tf.one_hot(b, depth=a.shape[-1]), tf.bool)
result3 = tf.reshape(tf.boolean_mask(a, mask), b.shape)

with tf.Session() as sess:
    print('method 1: \n', sess.run(result1))
    print('method 2: \n', sess.run(result2))
    print('method 3: \n', sess.run(result3))
method 1:
 [[ 1.  3.  5.]
 [ 7.  9. 11.]
 [13. 15. 17.]]
method 2:
 [[ 1.  3.  5.]
 [ 7.  9. 11.]
 [13. 15. 17.]]
method 3:
 [[ 1.  3.  5.]
 [ 7.  9. 11.]
 [13. 15. 17.]]
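If you are on TF 2.x, a more direct option (not in the original answer, so treat it as a sketch) is tf.gather with batch_dims, which picks one element along the last axis per (row, column) position:

result4 = tf.gather(a, b, axis=2, batch_dims=2)  # shape (3, 3); assumes TF 2.x eager execution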

You can use np.take_along_axis:
import numpy as np
a = np.array(
    [[[ 0,  1],
      [ 2,  3],
      [ 4,  5]],
     [[ 6,  7],
      [ 8,  9],
      [10, 11]],
     [[12, 13],
      [14, 15],
      [16, 17]]])
b = np.array(
    [[1, 1, 1],
     [1, 1, 1],
     [1, 1, 1]])
print(np.take_along_axis(a, b[..., np.newaxis], axis=-1)[..., 0])
# [[ 1  3  5]
#  [ 7  9 11]
#  [13 15 17]]
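The trailing [..., 0] just drops the length-1 axis that np.take_along_axis keeps; an equivalent spelling, if you prefer it, is:

np.take_along_axis(a, b[..., np.newaxis], axis=-1).squeeze(axis=-1)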

Related

Multiple numpy arrays to bytes

Input/Output:
[array([[  2.120417 , -13.725279 ],
        [  2.066555 , -13.953174 ]], dtype=float32),
 array([[ 1.952603,  6.800025],
        [ 1.952603,  6.800025]], dtype=float32)]
b"\x40\x07\xb4\xea\xc1\x5b\x9a\xbe\x3f\xf9\xee\xe5\x40\xd9\x99\xce\x40\x04\x42\x70\xc1\x5f\x40\x33\x3f\xf9\xee\xe5\x40\xd9\x99\xce"
Each array contains multiple x, y coordinates (floats). I want to take the element at one index from the first array (one element is a pair of x, y coords), then the element at the same index from the next array, and so on; once every array has been visited for the first index, move on to the next index.
IIUC, you can hstack and ravel:
np.hstack([arr1, arr2, arr3]).ravel()
Output:
array([ 0, 1, 4, 5, 8, 9, 2, 3, 6, 7, 10, 11])
Used input ([arr1, arr2, arr3]):
[array([[0, 1],
        [2, 3]]),
 array([[4, 5],
        [6, 7]]),
 array([[ 8,  9],
        [10, 11]])]
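The question ultimately wants bytes; the interleaved result can then be serialized. A hedged sketch, assuming a float32 output buffer (the example bytes in the question look like big-endian float32, so pick the byte order accordingly):

interleaved = np.hstack([arr1, arr2, arr3]).ravel()
raw = interleaved.astype('>f4').tobytes()  # '>f4' = big-endian float32; use '<f4' or np.float32 for little-endian/native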

Pytorch/NumPy batched submatrix indexing

There's a single source (square) matrix L of shape (N, N)
import torch as pt
import numpy as np
N = 4
L = pt.arange(N*N).reshape(N, N) # or np.arange(N*N).reshape(N, N)
L = tensor([[ 0,  1,  2,  3],
            [ 4,  5,  6,  7],
            [ 8,  9, 10, 11],
            [12, 13, 14, 15]])
and a matrix (vector of vectors) of boolean masks m of shape (K, N) according to which I'd like to extract submatrices from L.
K = 3
m = tensor([[ True,  True, False, False],
            [False,  True,  True, False],
            [False,  True, False,  True]])
I know how to extract a single submatrix using a single mask vector by calling L[m[i]][:, m[i]] for any i. So, for example, for i=0, we'd get
tensor([[0, 1],
        [4, 5]])
but I need to perform the operation along the entire "batch" dimension. The end result I'm looking for then could be achieved by
res = []
for i in range(K):
    res.append(L[m[i]][:, m[i]])
output = pt.stack(res)
however, I hope there is a better solution that avoids the for loop. I realize that the loop-based solution itself would break if the sum of m along the last dimension (dim/axis=1) weren't constant, but if I can guarantee that it is, is there a better solution? If there isn't, would changing the selector representation help? I chose boolean masks for convenience, but I am happy to use a different representation for better performance.
Notice that you can get the first square by indexing together with broadcasting:
r = torch.tensor([0, 1])
L[r[:, None], r]
output:
tensor([[0, 1],
        [4, 5]])
The same principle can be applied to the second square:
r = torch.tensor([1, 2])
L[r[:, None], r]
output:
tensor([[ 5,  6],
        [ 9, 10]])
In combination you get:
i = torch.tensor([[0, 1], [1, 2]])
L[i[:,:,None], i[:,None]]
output:
tensor([[[ 0,  1],
         [ 4,  5]],

        [[ 5,  6],
         [ 9, 10]]])
All 3 squares:
i = torch.tensor([
    [0, 1],
    [1, 2],
    [1, 3],
])
L[i[:, :, None], i[:, None]]
output:
tensor([[[ 0,  1],
         [ 4,  5]],

        [[ 5,  6],
         [ 9, 10]],

        [[ 5,  7],
         [13, 15]]])
To summarize, I would suggest using indices instead of a mask.
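If the data naturally arrives as a boolean mask, one hedged way to derive the index matrix i (assuming, as the question guarantees, that every row of m selects the same number of entries) is:

i = m.nonzero()[:, 1].reshape(K, -1)     # tensor([[0, 1], [1, 2], [1, 3]])
output = L[i[:, :, None], i[:, None]]    # same result as stacking the for-loop output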

Why is the output of torch.lstsq drastically different than np.linalg.lstsq?

PyTorch provides an lstsq function, but the result it returns differs drastically from numpy's version. Here is an example input and both results:
import numpy as np
import torch
a = torch.tensor([[1., 1, 1],
                  [2, 3, 4],
                  [3, 5, 2],
                  [4, 2, 5],
                  [5, 4, 3]])
b = torch.tensor([[-10., -3],
                  [ 12, 14],
                  [ 14, 12],
                  [ 16, 16],
                  [ 18, 16]])
a1 = a.clone().numpy()
b1 = b.clone().numpy()
x, r = torch.lstsq(b, a)
x1, res, r1, s = np.linalg.lstsq(b1, a1)
print(f'torch_x: {x}')
print(f'torch_r: {r}\n')
print(f'np_x: {x1}')
print(f'np_res: {res}')
print(f'np_r1(rank): {r1}')
print(f'np_s: {s}')
Output:
torch_x: tensor([[ 2.0000,  1.0000],
        [ 1.0000,  1.0000],
        [ 1.0000,  2.0000],
        [10.9635,  4.8501],
        [ 8.9332,  5.2418]])
torch_r: tensor([[-7.4162, -6.7420, -6.7420],
        [ 0.2376, -3.0896,  0.1471],
        [ 0.3565,  0.5272,  3.0861],
        [ 0.4753, -0.3952, -0.4312],
        [ 0.5941, -0.1411,  0.2681]])

np_x: [[-0.11452514 -0.10474861 -0.28631285]
 [ 0.35913807  0.33719075  0.54070234]]
np_res: [ 5.4269753 10.197526   1.4185953]
np_r1(rank): 2
np_s: [43.057705  5.199417]
What am I missing here?
torch.lstsq(a, b) solves min_X ‖bX − a‖_2,
while np.linalg.lstsq(a, b) solves min_X ‖aX − b‖_2.
So swap the order of the parameters you pass.
Here's a sample:
import numpy as np
import torch
a = torch.tensor([[1., 1, 1],
                  [2, 3, 4],
                  [3, 5, 2],
                  [4, 2, 5],
                  [5, 4, 3]])
b = torch.tensor([[-10., -3],
                  [ 12, 14],
                  [ 14, 12],
                  [ 16, 16],
                  [ 18, 16]])
a1 = a.clone().numpy()
b1 = b.clone().numpy()
x, _ = torch.lstsq(a, b)
x1, res, r1, s = np.linalg.lstsq(b1, a1)
print(f'torch_x: {x[:b.shape[1]]}')
print(f'np_x: {x1}')
Results:
torch_x: tensor([[-0.1145, -0.1047, -0.2863],
        [ 0.3591,  0.3372,  0.5407]])
np_x: [[-0.11452514 -0.10474861 -0.28631285]
 [ 0.35913807  0.33719075  0.54070234]]
link to torch doc
link to numpy doc
Also note that the rank returned by np.linalg.lstsq is the rank of its first parameter. To get a rank in PyTorch, use the torch.matrix_rank() function (torch.linalg.matrix_rank in newer releases).
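For instance, with the swapped call torch.lstsq(a, b) the coefficient matrix is b, so its rank is the one to compare against numpy's output (a minimal sketch using the tensors above):

print(torch.matrix_rank(b))  # tensor(2), matching np_r1(rank)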

numpy nansum across first index

I have an example 2 x 2 x 2 array:
np.array([[[1, 2],
           [3, 4]],
          [[5, 6],
           [7, 8]]])
I want the nansum of the array across the first index as follows:
Sum all values in:
[[1, 2],
 [3, 4]]
and
[[5, 6],
 [7, 8]]
The sum of the first array would be 10 and the second would be 26
i.e.
array([10, 26])
I think you are looking for this:
a = np.array([[[1, 2],
               [3, 4]],
              [[5, 6],
               [7, 8]]])
np.nansum(a, axis=(1, 2))
# array([10, 26])
because you want to sum over axes 1 and 2 only, and get one number per entry along axis 0.
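To see the nan handling in action, a small sketch with a NaN injected (not part of the original question):

a_nan = a.astype(float)
a_nan[0, 0, 0] = np.nan
np.nansum(a_nan, axis=(1, 2))  # array([ 9., 26.]) -- the NaN is ignored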

How to stack elements from array excluding a single index

I have a 5x10x100 array and want to exclude one index and stack the rest, resulting in a 40x100 array.
old_arr.shape
>> (5, 10, 100)
I tried the following single-line comprehension:
i_to_exclude = 4
new_arr = np.array([element for i, element in enumerate(old_arr) if i != i_to_exclude])
new_arr.shape
>> (4, 10, 100)
I'm not sure how to make the comprehension stack the sub-arrays instead of just collecting them along a new first axis.
Try this:
np.vstack(np.delete(old_arr, i_to_exclude, axis=0))
example:
old_arr = np.arange(16).reshape((4, 2, 2))
# array([[[ 0,  1],
#         [ 2,  3]],
#        [[ 4,  5],
#         [ 6,  7]],
#        [[ 8,  9],
#         [10, 11]],
#        [[12, 13],
#         [14, 15]]])

i_to_exclude = 3
new_arr = np.vstack(np.delete(old_arr, i_to_exclude, axis=0))
# array([[ 0,  1],
#        [ 2,  3],
#        [ 4,  5],
#        [ 6,  7],
#        [ 8,  9],
#        [10, 11]])
This will also be faster than looping in Python.
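As a quick check against the shapes from the question (a sketch with random data standing in for the real array):

old_arr = np.random.rand(5, 10, 100)
i_to_exclude = 4
new_arr = np.vstack(np.delete(old_arr, i_to_exclude, axis=0))
print(new_arr.shape)  # (40, 100)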
