Numpy select matrix specified by a matrix of indices, from multidimensional array - python

I have a numpy array a of size 5x5x4x5x5. I have another matrix b of size 5x5. I want to get a[i,j,b[i,j]] for i from 0 to 4 and for j from 0 to 4. This will give me a 5x5x1x5x5 matrix. Is there any way to do this without just using 2 for loops?

Let's think of the array a as 100 (= 5 x 5 x 4) matrices of size (5, 5). So, if you could get a linear index for each triplet (i, j, b[i, j]), you are done. That's where np.ravel_multi_index comes in. Following is the code.
import numpy as np
import itertools

# create some matrices
a = np.random.randint(0, 10, (5, 5, 4, 5, 5))
b = np.random.randint(0, 4, (5, 5))

# creating all possible triplets - (ind1, ind2, ind3)
inds = list(itertools.product(range(5), range(5)))
ind1, ind2 = zip(*inds)
ind3 = b.flatten()
allInds = np.array([ind1, ind2, ind3])
linearInds = np.ravel_multi_index(allInds, (5, 5, 4))

# reshaping the input array
a_reshaped = np.reshape(a, (100, 5, 5))

# selecting the appropriate indices
res1 = a_reshaped[linearInds, :, :]

# reshaping back into the desired shape
res1 = np.reshape(res1, (5, 5, 1, 5, 5))

# verifying with the brute-force method
res2 = np.empty((5, 5, 1, 5, 5))
for i in range(5):
    for j in range(5):
        res2[i, j, 0] = a[i, j, b[i, j], :, :]

print(np.all(res1 == res2))  # should print True

There's np.take_along_axis for exactly this purpose:
np.take_along_axis(a, b[:, :, None, None, None], axis=2)
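For reference, a minimal runnable sketch (with random a and b as stand-ins, shaped as in the question):
import numpy as np

a = np.random.randint(0, 10, (5, 5, 4, 5, 5))
b = np.random.randint(0, 4, (5, 5))

# b is lifted to the same number of dimensions as a; take_along_axis then picks
# a[i, j, b[i, j], :, :] for every (i, j), keeping a length-1 axis 2.
res = np.take_along_axis(a, b[:, :, None, None, None], axis=2)
print(res.shape)  # (5, 5, 1, 5, 5)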

Related

In numpy, multiply two structured matrices concisely

I have two matrices. The first has the following structure:
[[1, 0, a],
[0, 1, b],
[1, 0, c],
[0, 1, d]]
where 1, 0, a, b, c, and d are scalars. The matrix is 4 by 3.
The second is just a 2 by 3 matrix:
[[r1],
[r2]]
where r1 and r2 are the first and second rows respectively, each having 3 elements.
I would like the output to be:
[[r1, 0, a*r1],
[0, r1, b*r1],
[r2, 0, c*r2],
[0, r2, d*r2]]
which would be a 4 by 9 matrix.
This is similar to the Kronecker product, except separately for each row of the second matrix. Of course this could be done with cumbersome loops which I want to avoid.
How can I do this concisely?
You can do exactly what you said in the last line: do a separate Kronecker product for each row of the second matrix and then concatenate the results.
Let's assume that the two matrices are called x (4 by 3) and y (2 by 3). The first thing to do is to split x in two parts because only half matrix participates in each part of the product.
x = x.reshape(2, 2, 3)
Then you can calculate the two products separately:
z0 = np.kron(x[0], y[0])
z1 = np.kron(x[1], y[1])
Finally, concatenate the two results along the first axis:
z = np.concatenate([z0, z1], axis=0)
Or if, like me, you enjoy big ugly one-liners you can do:
z = np.concatenate([np.kron(xr, yr) for xr, yr in zip(x.reshape(2, 2, 3), y)], axis=0)
In the general case you mentioned in the comments, it would become:
z = np.concatenate([np.kron(xr, yr) for xr, yr in zip(x.reshape(int(n / 2), 2, 3), y)], axis=0)
This gives the same results as the explicit loop, which I believe can be numba.jit-compiled:
def solve_explicit(x, y):
    # sanity checks
    assert x.shape[0] == 2 * y.shape[0]
    assert x.shape[1] == y.shape[1]
    n = x.shape[0]
    z = np.zeros((n, 9))
    for i in range(n):
        for j in range(3):
            for k in range(3):
                z[i, k + 3 * j] = x[i, j] * y[i // 2, k]
    return z
Using broadcasting, with x.shape (n, 3), and y.shape (n//2, 3):
out = (x.reshape(-1, 2, 3, 1) * y.reshape(-1, 1, 1, 3)).reshape(-1, 9)
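As a quick sanity check of the broadcasting one-liner against per-row Kronecker products (a sketch with random stand-ins for x and y, assuming an even number of rows as in the question):
import numpy as np

n = 4                                    # number of rows of x, assumed even
x = np.random.rand(n, 3)
y = np.random.rand(n // 2, 3)

out = (x.reshape(-1, 2, 3, 1) * y.reshape(-1, 1, 1, 3)).reshape(-1, 9)

# Reference: one Kronecker product per pair of x-rows and y-row, then stack.
ref = np.concatenate([np.kron(xr, yr) for xr, yr in zip(x.reshape(-1, 2, 3), y)], axis=0)
print(np.allclose(out, ref))             # True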
I personally would use np.einsum in this situation because I think it's easier to understand than broadcasting.
import numpy as np
(a, b, c, d) = np.random.rand(4)
x = np.array([[1, 0, a], [0, 1, b], [1, 0, c], [0, 1, d]])
y = np.random.rand(2, 3)
z = np.einsum("ij,ik->ijk", x.reshape(-1, 6), y).reshape(-1, 9)
# timeit magic commands.
# %timeit -n 50000 np.einsum("ij,ik->ijk", x.reshape(-1, 6), y).reshape(-1, 9)
# %timeit -n 50000 (x.reshape(-1, 2, 3, 1) * y.reshape(-1, 1, 1, 3)).reshape(-1, 9)
Some good references on Einstein summation in NumPy: [2, 3, 4].

Indexing an array of unknown length in Python (Numpy, PyTorch)

I have an array of shape [m, 2, m, 2, ...]. By this, I mean that it has dimensions of size m and 2 that repeat a number of times L. I would like a solution of the following that works for any given L.
Example:
For L=1 the array would be of shape [m, 2]
For L=2 the array would be of shape [m, 2, m, 2]
For L=3 the array would be of shape [m, 2, m, 2, m, 2]
And so on...
I would like to index this array, in the dims of size m, with another array indices of shape [L, N] such as to eventually obtain an array of size [N, 2, 2, ...].
For a given L (e.g. L=3), I would do the indexing as follows,
array[indices[0], :, indices[1], :, indices[2], :]
resulting in an array of shape [N, 2, 2, 2].
Is there a smart way to do the indexing for generic L?
(Hope to have made the question clear!)
Edit 1:
To give idea of behavior, an ugly solution:
def indexing(array, indices):
    L = indices.shape[0]
    if L == 1:
        array = array[indices[0]]
    elif L == 2:
        array = array[indices[0], :, indices[1], :]
    elif L == 3:
        array = array[indices[0], :, indices[1], :, indices[2], :]
    elif L == 4:
        array = array[indices[0], :, indices[1], :, indices[2], :, indices[3], :]
    # etc...
    return array
And a use example:
import torch
m = 5
N = 4
L = 3
array = torch.randn(m, 2, m, 2, m, 2)
indices = torch.randint(m, size=(L, N))
indexing(array, indices).shape # torch.Size([4, 2, 2, 2])
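One way to handle a generic L is to build that index tuple programmatically and index once; a minimal sketch (the indexing_generic helper is just illustrative, and it assumes NumPy-style advanced indexing, which PyTorch uses for this pattern):
import torch

def indexing_generic(array, indices):
    # Illustrative helper: build the tuple (indices[0], :, indices[1], :, ..., indices[L-1], :)
    # so that array[tuple(idx)] is equivalent to writing the expression out by hand.
    idx = []
    for level in indices:        # each row of indices has shape [N]
        idx.append(level)
        idx.append(slice(None))
    return array[tuple(idx)]

m, N, L = 5, 4, 3
array = torch.randn(m, 2, m, 2, m, 2)
indices = torch.randint(m, size=(L, N))
print(indexing_generic(array, indices).shape)  # torch.Size([4, 2, 2, 2])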
You can use len()! Pretty simple usage:
length = len(array)
for i in range(0, length):
    # do something
You can also access the last item of the array whatever its length is by indexing -1, like so:
array = [1, 1, 5, 2, 4, ..., 99]
print(array[-1]) # 99

How do I reduce the use of for loops using numpy?

Basically, I have three arrays that I multiply with values from 0 to 2, expanding the number of rows to the number of products (the values to be multiplied are the same for each array). From there, I want to calculate the product of every combination of rows from all three arrays. So I have three arrays
A = np.array([1, 2, 3])
B = np.array([1, 2, 3])
C = np.array([1, 2, 3])
and I'm trying to reduce the operation given below
search_range = np.linspace(0, 2, 11)
results = np.array([[0, 0, 0]])
for i in search_range:
    for j in search_range:
        for k in search_range:
            sm = i*A + j*B + k*C
            results = np.append(results, [sm], axis=0)
What I tried doing:
A = np.array([[1, 2, 3]])
B = np.array([[1, 2, 3]])
C = np.array([[1, 2, 3]])
n = 11
scale = np.linspace(0, 2, n).reshape(-1, 1)
A = np.repeat(A, n, axis=0) * scale
B = np.repeat(B, n, axis=0) * scale
C = np.repeat(C, n, axis=0) * scale
results = np.array([[0, 0, 0]])
for i in range(n):
    A_i = A[i]
    for j in range(n):
        B_j = B[j]
        C_k = C
        sm = A_i + B_j + C_k
        results = np.append(results, sm, axis=0)
which only removes the last for loop. How do I reduce the other for loops?
You can get the same result like this:
search_range = np.linspace(0, 2, 11)
search_range = np.array(np.meshgrid(search_range, search_range, search_range))
search_range = search_range.T.reshape(-1, 3)
sm = search_range[:, 0, None]*A + search_range[:, 1, None]*B + search_range[:, 2, None]*C
results = np.concatenate(([[0, 0, 0]], sm))
Instead of using three nested loops to get every combination of elements in search_range, I used the meshgrid function to convert search_range into a 2D array of every possible combination, so its three columns play the roles of i, j and k.
And finally, as suggested by @Mercury, you can use indexing on the new search_range array to generate the result. For example, search_range[:, 1, None] is an array of shape (1331, 1) containing, for each combination, the element at index 1 (the j factor). The concatenate is only there because you wanted the results array to start with the default row [0, 0, 0], so I prepended it to sm; otherwise, the sm array already contains the answer.
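To see what the combination grid looks like, here is a minimal sketch with 3 points per axis instead of 11:
import itertools
import numpy as np

s = np.linspace(0, 2, 3)   # 3 points instead of 11, to keep the example small
grid = np.array(np.meshgrid(s, s, s)).T.reshape(-1, 3)

print(grid.shape)          # (27, 3): one row per (i, j, k) combination
# The rows are exactly the 3**3 combinations a triple nested loop would produce:
combos = set(map(tuple, itertools.product(s, s, s)))
print(set(map(tuple, grid)) == combos)  # True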

(Numpy or PyTorch) Sum array elements for given bins

I would like this problem to be solved using PyTorch tensors. If there is no efficient solution in torch, then feel free to suggest a numpy solution.
Let a be a 1-dimensional tensor (or numpy array), and bins_indices be a tensor (numpy array) of integers between 0 (inclusive) and n (exclusive). I want to compute the array bins that at position i contains the sum of the elements of a[bins_indices == i].
n = 3
a = [1, 4, 3, -2, 5] # Values
bins_indices = [0, 0, 1, 2, 0] # Correspondent bin indices
bins = [10, 3, -2] # bins[0] = 1 + 4 + 5 etc. bins has 3 elements since n=3
If you can provide also a way of making this work for batches I would be immensely grateful to you!
Not sure if this is the best way but here is another solution:
>>> bins = torch.unique(bins_indices)
>>> vfunc = np.vectorize( lambda x: torch.sum( a[ bins_indices == x ] ) )
>>> vfunc( bins )
array([10, 3, -2])
Here's a one-line Numpy solution I could think of:
bins = [np.sum(a[np.argwhere(bins_indices == i).flatten()]) for i in range(n)]
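If NumPy is acceptable, np.bincount with weights does the same thing in a single vectorized call; a short sketch (assuming a and bins_indices are NumPy arrays):
import numpy as np

n = 3
a = np.array([1, 4, 3, -2, 5])
bins_indices = np.array([0, 0, 1, 2, 0])

# bincount sums a[i] into slot bins_indices[i]; minlength pads empty bins with 0.
bins = np.bincount(bins_indices, weights=a, minlength=n)
print(bins)  # [10.  3. -2.]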
PyTorch 1.12 added the function scatter_reduce_ to perform exactly this kind of operation:
import torch
n = 3
a = torch.tensor([1, 4, 3, -2, 5]) # Values
bins_indices = torch.tensor([0, 0, 1, 2, 0]) # Correspondent bin indices
target_bins = torch.tensor([10, 3, -2]) # bins[0] = 1 + 4 + 5 etc. bins has 3 elements since n=3
bins = torch.zeros(3, dtype=a.dtype)
bins.scatter_reduce_(dim=0, src=a, index=bins_indices, reduce="sum")
assert torch.allclose(target_bins, bins)

"Multiply" 1d numpy array with a smaller one and sum the result

I want to "multiply" (for lack of better description) a numpy array X of size M with a smaller numpy array Y of size N, for every N elements in X. Then, I want to sum the resulting array (almost like a dotproduct).
I hope the example makes it more clear:
Example
X = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Y = [1,2,3]
Z = mymul(X, Y)
= [0*1, 1*2, 2*3, 3*1, 4*2, 5*3, 6*1, 7*2, 8*3, 9*1]
= [ 0, 2, 6, 3, 8, 15, 6, 14, 24, 9]
result = sum(Z) = 87
X and Y can be of varying lengths and Y is always smaller than X, but not necessarily divisible (e.g. M % N != 0)
I have some solutions but they are quite slow. I'm hoping there is a faster way to do this.
import numpy as np
X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=int)
Y = np.array([1,2,3], dtype=int)
# these work but are slow for large X, Y
# simple for-loop
t = 0
for i in range(len(X)):
    t += X[i] * Y[i % len(Y)]
print(t)  # 87
# extend Y M/N times so np.dot can be applied
Ytiled = np.tile(Y, int(np.ceil(len(X) / len(Y))))[:len(X)]
t = np.dot(X, Ytiled)
print(t) #87
Resize Y to the same length as X and then use matrix multiplication:
In [52]: np.dot(X, np.resize(Y,len(X)))
Out[52]: 87
An alternative to np.resize would be tiling. Hence, np.tile(Y, (m+n-1)//n)[:m], with m, n = len(X), len(Y), could replace np.resize(Y, len(X)) as a faster option.
Another approach, without resizing Y, for memory efficiency:
In [79]: m,n = len(X), len(Y)
In [80]: s = n*(m//n)
In [81]: X2D = X[:s].reshape(-1,n)
In [82]: X2D.dot(Y).sum() + np.dot(X[s:],Y[:m-s])
Out[82]: 87
Alternatively, we can use np.einsum('ij,j->',X2D,Y) to replace X2D.dot(Y).sum().
You can use convolve (documentation):
np.convolve(X, Y[::-1])[len(Y) - 1 :: len(Y)].sum()
Remember to reverse the second array. With the default 'full' mode, every len(Y)-th element starting at index len(Y) - 1 is the dot product of one length-len(Y) chunk of X with Y (including the trailing partial chunk), so summing those elements gives the result.
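A quick check on the example arrays (a sketch reusing X and Y from the question):
import numpy as np

X = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
Y = np.array([1, 2, 3])

# In full mode, element len(Y)-1 + i*len(Y) of the convolution with the reversed
# kernel is the dot product of X[i*len(Y):(i+1)*len(Y)] with Y.
print(np.convolve(X, Y[::-1])[len(Y) - 1 :: len(Y)].sum())  # 87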
