Indexing an array of unknown length in Python (Numpy, PyTorch)

I have an array of shape [m, 2, m, 2, ...], that is, a pair of dimensions of sizes m and 2 repeated L times. I would like a solution to the following problem that works for any given L.
Example:
For L=1 the array would be of shape [m, 2]
For L=2 the array would be of shape [m, 2, m, 2]
For L=3 the array would be of shape [m, 2, m, 2, m, 2]
And so on...
I would like to index this array along the dimensions of size m with another array, indices, of shape [L, N], so as to obtain an array of shape [N, 2, 2, ...].
For a given L (e.g. L=3), I would do the indexing as follows:
array[indices[0], :, indices[1], :, indices[2], :]
resulting in an array of shape [N, 2, 2, 2].
Is there a smart way to do the indexing for generic L?
(Hope to have made the question clear!)
Edit 1:
To give an idea of the desired behavior, here is an ugly solution:
def indexing(array, indices):
    L = indices.shape[0]
    if L == 1:
        array = array[indices[0]]
    elif L == 2:
        array = array[indices[0], :, indices[1], :]
    elif L == 3:
        array = array[indices[0], :, indices[1], :, indices[2], :]
    elif L == 4:
        array = array[indices[0], :, indices[1], :, indices[2], :, indices[3], :]
    # etc...
    return array
And a use example:
import torch
m = 5
N = 4
L = 3
array = torch.randn(m, 2, m, 2, m, 2)
indices = torch.randint(m, size=(L, N))
indexing(array, indices).shape # torch.Size([4, 2, 2, 2])

You can use len()! Pretty simple usage:
length = len(array)
for i in range(0, length):
    # do something
You can also access the last item of the array whatever its length is by indexing -1, like so:
array = [1, 1, 5, 2, 4, ..., 99]
print(array[-1]) # 99
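For the question as asked, where the number of dimensions depends on L, one possible approach is to build the index tuple programmatically instead of writing it out by hand. The following is a minimal sketch under stated assumptions: it uses NumPy, the helper name index_m_dims is made up for illustration, and slice(None) stands in for each ":" in the hand-written version. The same tuple is expected to work on a PyTorch tensor as well, but that is not checked here.

```python
import numpy as np

def index_m_dims(array, indices):
    """Index every size-m axis of `array` with the matching row of `indices`.

    Hypothetical helper: `array` has shape (m, 2) * L, `indices` has shape (L, N).
    """
    idx = []
    for row in indices:
        idx.append(row)          # advanced index for one size-m axis
        idx.append(slice(None))  # same as ":" for the following size-2 axis
    return array[tuple(idx)]

m, N, L = 5, 4, 3
rng = np.random.default_rng(0)
array = rng.standard_normal((m, 2) * L)    # shape (5, 2, 5, 2, 5, 2)
indices = rng.integers(0, m, size=(L, N))  # shape (3, 4)

result = index_m_dims(array, indices)
print(result.shape)
```

Because the index arrays are separated by slices, the broadcast dimension of size N is placed first, which gives the [N, 2, 2, ...] layout the question asks for.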

Related

Pytorch tensor multi-dimensional selection

I have a question about efficient multi-dimensional selection from a PyTorch tensor.
Assume I have a tensor a, with
B, V, d = 2, 20000, 64
a = torch.rand(B, V, d)
and a tensor b, with
N, k = 30000, 10  # b holds k indices per row, each in [0, V)
b = torch.randint(0, V, (B, N, k))
The target is to construct a selected tensor from a, namely
help_1 = a[:, None, :, :].repeat(1, N, 1, 1) # [B, N, V, d]
help_2 = b[:, :, :, None].expand(-1,-1,-1,d) # [B, N, k, d]
c = torch.gather(help_1, dim=2, index=help_2)
This operation does produce the desired result, but it is not very efficient because it creates a very large helper tensor, help_1, of size [2, 30000, 20000, 64]. Does anyone have an idea for doing this without creating such a large helper tensor for the selection? Thank you!

You could use broadcasting with the indexing to save memory. Something like the following would work.
idx0 = torch.arange(B, device=b.device).reshape(-1, 1, 1, 1) # [B, 1, 1, 1]
idx1 = b[..., None] # [B, N, k, 1]
idx2 = torch.arange(d, device=b.device).reshape(1, 1, 1, -1) # [1, 1, 1, d]
c = a[idx0, idx1, idx2] # [B, N, k, d]
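To make the broadcasting idea easier to check, here is a small sketch of the same selection written with NumPy rather than PyTorch. It uses made-up, much smaller sizes and compares the result with an explicit loop. It also assumes that only the batch index needs to be broadcast, since the trailing d axis can simply be left unindexed.

```python
import numpy as np

rng = np.random.default_rng(0)
B, V, d, N, k = 2, 7, 3, 5, 4  # small, made-up sizes for illustration
a = rng.random((B, V, d))
b = rng.integers(0, V, size=(B, N, k))

# A batch index of shape [B, 1, 1] broadcasts against b of shape [B, N, k];
# the last axis of a (size d) is not indexed and is kept whole.
batch = np.arange(B)[:, None, None]
c = a[batch, b]  # shape [B, N, k, d]

# Reference result built with explicit loops.
expected = np.empty((B, N, k, d))
for i in range(B):
    for j in range(N):
        expected[i, j] = a[i, b[i, j]]

print(c.shape, np.array_equal(c, expected))
```

No [B, N, V, d] helper is materialized; only the index arrays and the output are allocated.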

How do I reduce the use of for loops using numpy?

Basically, I have three arrays, each of which I scale by every value in a range from 0 to 2 (the scale values are the same for each array), so that each array expands into one row per scale value. From there, I want to compute the sum for every combination of rows from all three arrays. So I have three arrays
A = np.array([1, 2, 3])
B = np.array([1, 2, 3])
C = np.array([1, 2, 3])
and I'm trying to reduce the operation given below
search_range = np.linspace(0, 2, 11)
results = np.array([[0, 0, 0]])
for i in search_range:
    for j in search_range:
        for k in search_range:
            sm = i*A + j*B + k*C
            results = np.append(results, [sm], axis=0)
What I tried doing:
A = np.array([[1, 2, 3]])
B = np.array([[1, 2, 3]])
C = np.array([[1, 2, 3]])
n = 11
scale = np.linspace(0, 2, n).reshape(-1, 1)
A = np.repeat(A, n, axis=0) * scale
B = np.repeat(B, n, axis=0) * scale
C = np.repeat(C, n, axis=0) * scale
results = np.array([[0, 0, 0]])
for i in range(n):
    A_i = A[i]
    for j in range(n):
        B_j = B[j]
        C_k = C
        sm = A_i + B_j + C_k
        results = np.append(results, sm, axis=0)
This only removes the innermost for loop. How do I get rid of the other two?

You can get the same result like this:
search_range = np.linspace(0, 2, 11)
search_range = np.array(np.meshgrid(search_range, search_range, search_range))
search_range = search_range.T.reshape(-1, 3)
sm = search_range[:, 0, None]*A + search_range[:, 1, None]*B + search_range[:, 2, None]*C
results = np.concatenate(([[0, 0, 0]], sm))
Instead of using three nested loops to get every combination of elements in search_range, I used the meshgrid function to convert search_range into a 2D array containing every possible combination; the three columns of that array then play the roles of i, j and k.
Finally, as suggested by @Mercury, you can index the new search_range array to generate the result. For example, search_range[:, 1, None] is an array of shape (1331, 1) that holds, as single-element rows, the element at index 1 of every combination in search_range. The concatenate call is only there because you wanted the results array to start with the row [[0, 0, 0]], so I appended sm to it; otherwise, the sm array already contains the answer.
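One caveat worth checking: with the default meshgrid indexing followed by .T, the combinations appear to come out in a different order from the nested loops. That is invisible here because A, B and C are identical, but it would matter if they differed. The sketch below is one possible variant that keeps the loop order by passing indexing="ij" and stacking on the last axis. The values of B and C are made up so that the ordering can actually be tested, and the matrix product is just a compact way of writing i*A + j*B + k*C for every row.

```python
import numpy as np

A = np.array([1.0, 2.0, 3.0])
B = np.array([4.0, 5.0, 6.0])  # made-up values, distinct from A
C = np.array([7.0, 8.0, 9.0])  # made-up values, distinct from A and B
search_range = np.linspace(0, 2, 11)

# indexing="ij" makes the first grid the slowest-varying one, like the outer loop.
grids = np.meshgrid(search_range, search_range, search_range, indexing="ij")
coeffs = np.stack(grids, axis=-1).reshape(-1, 3)  # each row is (i, j, k)

# Row r of sm equals i*A + j*B + k*C for the r-th combination.
sm = coeffs @ np.stack([A, B, C])

# Reference: the original triple loop, without the leading [0, 0, 0] row.
loop = np.array([i * A + j * B + k * C
                 for i in search_range
                 for j in search_range
                 for k in search_range])

print(sm.shape, np.allclose(sm, loop))
```

If the leading [[0, 0, 0]] row is still wanted, it can be prepended with np.concatenate exactly as in the answer.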

Numpy: assign different values to different rows using mask

Say I have an array p of shape (m, n) and a threshold vector Ts of shape (m,). I want to replace values in p using the following rule:
for i in range(m):
    for j in range(n):
        if p[i, j] > Ts[i]:
            p[i, j] = Ts[i]
My implementation is:
newP = np.zeros_like(p)
cond = p > Ts[:, None]
newP += cond * Ts[:, None]
newP += ~cond * p
p = newP
It definitely looks ugly. Is there a way to do this in a p[cond] = Ts style? Thanks :)
An example:
# m = 2, n = 5
p = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])
Ts = np.array([3, 8])
expected_new_p = np.array([[1, 2, 3, 3, 3],
                           [6, 7, 8, 8, 8]])

You can simply use np.where. Where the condition p < Ts[:, None] holds, it returns the respective p value; otherwise it returns the element from the broadcast Ts.
np.where(p < Ts[:, None], p, Ts[:, None])
array([[1, 2, 3, 3, 3],
       [6, 7, 8, 8, 8]])

I think the most readable solution would be to use np.minimum() to extract element-wise minimums between p and the broadcasted array Ts[:,None]:
p = np.minimum(p, Ts[:,None])

You can use np.tile to expand the threshold array to the shape of the input array, so that you can use boolean indexing directly. This might be useful for you:
m, n = 3, 4
x = np.random.random((m,n))
t = np.random.random((m))
mask = x > t[:,np.newaxis]
x[mask] = np.tile(t[:,np.newaxis], (1,n))[mask] #assigning values of t for True values to corresponding elements

You can compare p and Ts by adding an extra dimension to Ts, and extract the locations where p > Ts with np.where. Then overwrite those locations with the values from Ts:
i = np.where(p > Ts[:, None])
p[i] = Ts[i[0]]
Above, i is a tuple of arrays containing the indices along each dimension of p, so i[0] holds the row index of every element to replace.
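As a quick sanity check, the approaches suggested in this thread can be run side by side on the example from the question. This is only a sketch: the variable names via_where, via_minimum and via_mask are made up, and np.nonzero is used in place of the single-argument np.where, which returns the same index tuple.

```python
import numpy as np

p = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10]])
Ts = np.array([3, 8])

# Ts[:, None] has shape (m, 1) and broadcasts across the columns of p.
via_where = np.where(p < Ts[:, None], p, Ts[:, None])
via_minimum = np.minimum(p, Ts[:, None])

# In-place style: find the (row, column) pairs above the threshold,
# then assign each of them the threshold of its own row.
via_mask = p.copy()
rows, cols = np.nonzero(via_mask > Ts[:, None])
via_mask[rows, cols] = Ts[rows]

print(via_minimum)
```

All three produce the expected_new_p array from the question; np.minimum is the shortest, while the mask version is the closest to the p[cond] = Ts style that was asked for.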

Numpy select matrix specified by a matrix of indices, from multidimensional array

I have a numpy array a of size 5x5x4x5x5. I have another matrix b of size 5x5. I want to get a[i,j,b[i,j]] for i from 0 to 4 and for j from 0 to 4. This will give me a 5x5x1x5x5 matrix. Is there any way to do this without just using 2 for loops?

Let's think of the matrix a as 100 (= 5 x 5 x 4) matrices of size (5, 5). So, if you could get a linear index for each triplet (i, j, b[i, j]), you are done. That's where np.ravel_multi_index comes in. The code follows.
import numpy as np
import itertools
# create some matrices
a = np.random.randint(0, 10, (5, 5, 4, 5, 5))
b = np.random.randint(0, 4, (5, 5))
# creating all possible triplets - (ind1, ind2, ind3)
inds = list(itertools.product(range(5), range(5)))
(ind1, ind2), ind3 = zip(*inds), b.flatten()
allInds = np.array([ind1, ind2, ind3])
linearInds = np.ravel_multi_index(allInds, (5,5,4))
# reshaping the input array
a_reshaped = np.reshape(a, (100, 5, 5))
# selecting the appropriate indices
res1 = a_reshaped[linearInds, :, :]
# reshaping back into desired shape
res1 = np.reshape(res1, (5, 5, 1, 5, 5))
# verifying with the brute force method
res2 = np.empty((5, 5, 1, 5, 5))
for i in range(5):
    for j in range(5):
        res2[i, j, 0] = a[i, j, b[i, j], :, :]
print(np.all(res1 == res2)) # should print True

There's np.take_along_axis exactly for this purpose:
np.take_along_axis(a,b[:,:,None,None,None],axis=2)
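The take_along_axis call can be checked against the brute-force loops from the question. The sketch below does that on random data. It also shows, as an alternative that is not part of the answer, plain advanced indexing with np.indices, which returns the same values without the length-1 axis.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 10, size=(5, 5, 4, 5, 5))
b = rng.integers(0, 4, size=(5, 5))

# b is expanded to shape (5, 5, 1, 1, 1) so it broadcasts over the last two axes.
res = np.take_along_axis(a, b[:, :, None, None, None], axis=2)  # (5, 5, 1, 5, 5)

# Alternative: index the first three axes together; result has shape (5, 5, 5, 5).
i, j = np.indices(b.shape)
res_adv = a[i, j, b]

# Brute-force reference.
expected = np.empty((5, 5, 1, 5, 5), dtype=a.dtype)
for r in range(5):
    for c in range(5):
        expected[r, c, 0] = a[r, c, b[r, c]]

print(res.shape, np.array_equal(res, expected))
```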

Given distances and values array, return sorted filtered values in numpy

I am not sure what the title of this question should be. But let's say we have two arrays, values and distances.
values = np.array([[-1,-1,-1],
                   [1, 2, 0],
                   [-1,-1,-1]])
distances = np.array([[1,2,3],
                      [6,5,4],
                      [7,8,9]])
I would like to get the values that are non-negative and have them ordered by their corresponding distances, based on the distances array.
So with the example above, the non-negative values are [1,2,0] and their distances are [6,5,4]. Thus, sorting by the corresponding distance, I would like to have [0,2,1] as the answer.
My code is below. It works, but I would like a solution that uses only numpy. I'm sure that would be more efficient than this:
import numpy as np
import heapq
def get_sorted_values(seek_val, values, distances):
    r, c = np.where(values >= seek_val)
    di = distances[r, c]
    vals = values[r, c]
    print("di", di)
    print("vals", vals)
    if len(di) >= 1:
        heap = []
        for d, v in zip(di,vals):
            heapq.heappush(heap, (d,v))
        lists = []
        while heap:
            d, v = heapq.heappop(heap)
            lists.append(v)
        return lists
    else:
        ## NOTHING FOUND
        return None
Input:
seek_val = 0
values = np.array([[-1,-1,-1],
                   [1,2,0],
                   [-1,-1,-1]])
distances = np.array([[1,2,3],
                      [6,5,4],
                      [7,8,9]])
print("Ans:",get_sorted_values(seek_val, values, distances))
Output:
di [6 5 4]
vals [1 2 0]
Ans: [0, 2, 1]

"one liner":
values[np.where(values >= 0)][np.argsort(distances[np.where(values >= 0)])]
Out[981]: array([0, 2, 1])
Repeating np.where(values >= 0) is inefficient, so store the result in a variable if values is big:
v_indx = np.where(values >= 0)
values[v_indx][np.argsort(distances[v_indx])]

Try np.argsort
import numpy as np
values = np.array([[-1,-1,-1],
                   [ 1, 2, 0],
                   [-1,-1,-1]])
distances = np.array([[1, 2, 3],
                      [6, 5, 4],
                      [7, 8, 9]])
print(values[values >= 0])
# [1 2 0]
print(distances[values >= 0])
# [6 5 4]
print('Ans:', values[values >= 0][np.argsort(distances[values >= 0])])
# Ans: [0 2 1]
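Putting the pieces together, the question's get_sorted_values function could be rewritten without the heap along the following lines. This is a sketch, not a drop-in replacement: it returns a NumPy array rather than a list, it keeps the "return None when nothing matches" behavior, and it uses a stable sort, so ties in distance keep their original order instead of being broken by value as the (d, v) heap tuples would.

```python
import numpy as np

def get_sorted_values(seek_val, values, distances):
    """Return the values >= seek_val, ordered by their corresponding distance."""
    mask = values >= seek_val  # computed once and reused
    if not mask.any():
        return None            # nothing found
    order = np.argsort(distances[mask], kind="stable")
    return values[mask][order]

values = np.array([[-1, -1, -1],
                   [ 1,  2,  0],
                   [-1, -1, -1]])
distances = np.array([[1, 2, 3],
                      [6, 5, 4],
                      [7, 8, 9]])

result = get_sorted_values(0, values, distances)
print("Ans:", result)  # Ans: [0 2 1]
```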
