Numpy - Index last dimension of array with index array - python

I'm trying to index the last dimension of a 3D matrix with a matrix consisting of indices that I wish to keep.
I have a matrix of thrust values with shape:
(3, 3, 5)
I would like to filter the last index according to some criteria so that it is reduced from size 5 to size 1. I have already found the indices in the last dimension that fit my criteria:
[[0 0 1]
[0 0 1]
[1 4 4]]
What I want to achieve: for the first row and first column I want the 0th index of the last dimension. For the first row and third column I want the 1st index of the last dimension. In terms of indices to keep the final matrix will become a (3, 3) 2D matrix like this:
[[0,0,0], [0,1,0], [0,2,1];
[1,0,0], [1,1,0], [1,2,1];
[2,0,1], [2,1,4], [2,2,4]]
I'm pretty confident numpy can achieve this, but I'm unable to figure out exactly how. I'd rather not build a construction with nested for loops.
I have already tried:
minValidPower = totalPower[:, :, tuple(indexMatrix)]
But this results in a (3, 3, 3, 3) matrix, so I am not entirely sure how I'm supposed to approach this.

With a as input array and idx as the indexing one -
np.take_along_axis(a,idx[...,None],axis=-1)[...,0]
Alternatively, with open-grids -
I,J = np.ogrid[:idx.shape[0],:idx.shape[1]]
out = a[I,J,idx]
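As a quick sanity check, the two one-liners above can be compared against the nested-loop version the question wanted to avoid. This is only a sketch: the array `a` is filled with `np.arange` as a stand-in for the real thrust values, and the names `expected`, `out_take` and `out_grid` are made up for the illustration.

```python
import numpy as np

# Stand-in for the (3, 3, 5) thrust array; a[i, j, k] == 15*i + 5*j + k.
a = np.arange(3 * 3 * 5).reshape(3, 3, 5)
idx = np.array([[0, 0, 1],
                [0, 0, 1],
                [1, 4, 4]])

# Reference result built with explicit loops.
expected = np.empty(idx.shape, dtype=a.dtype)
for i in range(idx.shape[0]):
    for j in range(idx.shape[1]):
        expected[i, j] = a[i, j, idx[i, j]]

# Approach 1: take_along_axis needs idx to have the same number of
# dimensions as a, hence the trailing axis that is dropped again afterwards.
out_take = np.take_along_axis(a, idx[..., None], axis=-1)[..., 0]

# Approach 2: open grids of shape (3, 1) and (1, 3) broadcast against idx.
I, J = np.ogrid[:idx.shape[0], :idx.shape[1]]
out_grid = a[I, J, idx]

print(out_take)
```

Both results have shape (3, 3) and agree with the loop version.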

You can build corresponding index arrays for the first two dimensions. Those would basically be:
[0 1 2]
[0 1 2]
[0 1 2]
[0 0 0]
[1 1 1]
[2 2 2]
You can construct these using the meshgrid function. I stored them as m1 and m2 in the example:
import numpy as np
vals = np.arange(3*3*5).reshape(3, 3, 5) # test sample
m1, m2 = np.meshgrid(range(3), range(3), indexing='ij') # row and column indices
m3 = np.array([[0, 0, 1], [0, 0, 1], [1, 4, 4]]) # indices into the last dimension
sel_vals = vals[m1, m2, m3]
The shape of the result matches the shape of the indexing arrays m1, m2 and m3.
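To see why the shape rule matters, the sketch below contrasts indexing only the last axis with the (3, 3) index array against indexing all three axes together. The expression `vals[:, :, m3]` is used here as an analogue of the attempt in the question, and the names `naive` and `paired` are illustrative only.

```python
import numpy as np

vals = np.arange(3 * 3 * 5).reshape(3, 3, 5)
m3 = np.array([[0, 0, 1],
               [0, 0, 1],
               [1, 4, 4]])
m1, m2 = np.meshgrid(range(3), range(3), indexing='ij')

# Slices on the first two axes keep both of them, and the (3, 3) index
# array replaces the last axis, so the result has shape (3, 3, 3, 3).
naive = vals[:, :, m3]

# Three index arrays of the same shape are matched up element by element,
# so the result has the shape of the index arrays: (3, 3).
paired = vals[m1, m2, m3]

print(naive.shape, paired.shape)
```

In this sketch `paired[i, j]` equals `naive[i, j, i, j]`, that is, the wanted values sit on a "diagonal" of the larger array.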

Related

Slicing a 2D NumPy Array by all zero rows

This is essentially the 2D array equivalent of slicing a Python list into smaller lists at the indexes that store a particular value. I'm running a program that extracts a large amount of data from a CSV file and copies it into a 2D NumPy array. The basic format of these arrays is something like this:
[[0 8 9 10]
[9 9 1 4]
[0 0 0 0]
[1 2 1 4]
[0 0 0 0]
[1 1 1 2]
[39 23 10 1]]
I want to separate my NumPy array along rows that contain all zero values to create a set of smaller 2D arrays. The successful result for the above starting array would be the arrays:
[[0 8 9 10]
[9 9 1 4]]
[[1 2 1 4]]
[[1 1 1 2]
[39 23 10 1]]
I've thought about simply iterating down the array and checking if the row has all zeros but the data I'm handling is substantially large. I have potentially millions of rows of data in the text file and I'm trying to find the most efficient approach as opposed to a loop that could waste computation time. What are your thoughts on what I should do? Is there a better way?
a is your array. You can use any to find all zero rows, remove them, and then use split to split by their indices:
#not_all_zero rows indices
idx = np.flatnonzero(a.any(1))
#all_zero rows indices
idx_zero = np.delete(np.arange(a.shape[0]),idx)
#select not_all_zero rows and split by all_zero row indices
output = np.split(a[idx],idx_zero-np.arange(idx_zero.size))
output:
[array([[ 0, 8, 9, 10],
[ 9, 9, 1, 4]]),
array([[1, 2, 1, 4]]),
array([[ 1, 1, 1, 2],
[39, 23, 10, 1]])]
You can use the np.all function to check for rows which are all zeros, and then index appropriately.
# assume `x` is your data
indices = np.all(x == 0, axis=1)
zeros = x[indices]
nonzeros = x[np.logical_not(indices)]
The all function accepts an axis argument (as do many NumPy functions), which indicates the axis along which to operate. axis=1 here means the reduction runs across the columns of each row, so you get back a boolean array of shape (x.shape[0],), which can be used to directly index x.
Note that this will be much faster than a for-loop over the rows, especially for large arrays.
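The boolean mask on its own only separates zero rows from non-zero rows. One way to get the separate blocks the question asks for is to combine it with np.split, as in the sketch below. The names `is_zero_row`, `pieces` and `blocks` are made up for this illustration, and the list comprehensions loop over the blocks, not over individual rows.

```python
import numpy as np

a = np.array([[0, 8, 9, 10],
              [9, 9, 1, 4],
              [0, 0, 0, 0],
              [1, 2, 1, 4],
              [0, 0, 0, 0],
              [1, 1, 1, 2],
              [39, 23, 10, 1]])

# True for every row that contains only zeros.
is_zero_row = np.all(a == 0, axis=1)

# Split in front of each all-zero row; every piece after the first one
# therefore starts with an all-zero row.
pieces = np.split(a, np.flatnonzero(is_zero_row))

# Drop the all-zero rows from each piece and discard pieces that end up empty
# (this happens when two all-zero rows are adjacent).
blocks = [p[~np.all(p == 0, axis=1)] for p in pieces]
blocks = [b for b in blocks if len(b)]

for b in blocks:
    print(b)
```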

how to apply unique_with_counts over 2d array in tensorflow

I have this tensor:
tf_a1 = [[0 3 1 22]
[3 5 2 2]
[2 6 3 13]
[1 7 0 3 ]
[4 9 11 10]]
What I want to do is to find the unique values being repeated more than a threshold across all columns.
For example here, 3 repeated in 4 columns. 0 repeated in 2 columns. 2 repeated in 3 columns and so on.
I want my output to look like this (suppose the threshold is 2, so values repeated more than 2 times are masked):
[[F T F F]
[T F T T]
[T F T F]
[F F F T]
[F F F F]]
This is what I have done:
y, idx, count = tf.unique_with_counts(tf_a1)
tf.where(tf.where(count, tf_a1, tf.zeros_like(tf_a1)))
But it raises the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: unique
expects a 1D vector. [Op:UniqueWithCounts]
Thanks.
Currently, the version of unique_with_counts that supports axis is not public yet. If you want to apply unique_with_counts to a multi-dimensional tensor, you can call it like this in TF 1.14:
from tensorflow.python.ops import gen_array_ops
# tensor 'x' is [[1, 0, 0],
# [1, 0, 0],
# [2, 0, 0]]
y, idx, count = gen_array_ops.unique_with_counts_v2(x, [0])
y ==> [[1, 0, 0],
[2, 0, 0]]
idx ==> [0, 0, 1]
count ==> [2, 1]
Since I cannot comment on @sariii's post: it should be tf.greater(count, 2). Also, I do not think that solution satisfies the requirement of the problem. For example, suppose the first column is all 2s and the other columns contain no 2. According to your requirement, 2 should be counted once. But if you reshape to 1-D first, you lose this information. Please correct me if I'm wrong.
issue#16503
It seems unique_with_counts should support multidimensional tensors and an axis argument, however I could not find anything in the documentation.
The workaround for me was to first reshape my tensor to a 1-D array, then apply unique_with_counts, and then reshape the result back to the original size:
Thanks to @a_guest for sharing the idea.
token_idx = tf.reshape(tf_a1, [-1])
y, idx, count = tf.unique_with_counts(token_idx)
masked = tf.greater_equal(count, 2)
backed_same_size = tf.gather(masked, idx)
tf.reshape(backed_same_size, shape=tf.shape(tf_a1))
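The flatten, count, gather and reshape steps can be illustrated with NumPy, which runs without a TensorFlow session. This is a NumPy analogue and not the TensorFlow code itself: `np.unique` plays the role of `tf.unique_with_counts`, and fancy indexing replaces `tf.gather`. The strict comparison `counts > threshold` is an assumption taken from the expected mask in the question, and, like the workaround above, it counts occurrences over the whole array and not per column.

```python
import numpy as np

tf_a1 = np.array([[0, 3, 1, 22],
                  [3, 5, 2, 2],
                  [2, 6, 3, 13],
                  [1, 7, 0, 3],
                  [4, 9, 11, 10]])
threshold = 2

# Flatten, then get each distinct value, the position of every element
# in the list of distinct values, and how often each value occurs.
flat = tf_a1.reshape(-1)
values, inverse, counts = np.unique(flat, return_inverse=True,
                                    return_counts=True)

# Mark values occurring more than `threshold` times, map the flags back
# onto every element, and restore the original shape.
frequent = counts > threshold
mask = frequent[inverse].reshape(tf_a1.shape)

print(mask)
```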

numpy get row index where elements in certain columns are zero

I want to find indexes of row based on criteria over certain columns
So, something like:
import numpy as np
x = np.random.rand(4, 5)
x[2, 2] = 0
x[2, 3] = 0
x[3, 1] = 0
x[1, 3] = 0
Now, I want to get the index of the rows where either of columns 3 or 4 are zeros. How can one do that with numpy? Do I need to make multiple calls to nonzero for each column and combine these indices using a set or something like that?
Use np.where; the first array in the returned tuple holds the row indices:
np.where(x[:,[3,4]]==0)
Out[79]: (array([1, 2], dtype=int64), array([0, 0], dtype=int64))
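If only the row indices are needed, and each row should be reported once even when both columns are zero, a boolean reduction is an alternative. The sketch below assumes a seeded generator and a small offset so that no entry is zero by chance; the names `cols`, `row_has_zero` and `rows` are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 5)) + 0.1   # offset: no entry is zero by accident
x[2, 2] = 0
x[2, 3] = 0
x[3, 1] = 0
x[1, 3] = 0

cols = [3, 4]

# True for each row that has a zero in at least one of the chosen columns.
row_has_zero = (x[:, cols] == 0).any(axis=1)

# Convert the boolean mask to row indices.
rows = np.flatnonzero(row_has_zero)

print(rows)
```

Replacing `.any(axis=1)` with `.all(axis=1)` would instead select rows where every chosen column is zero.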

Why does dim=1 return row indices in torch.argmax?

I am working on argmax function of PyTorch which is defined as:
torch.argmax(input, dim=None, keepdim=False)
Consider an example
a = torch.randn(4, 4)
print(a)
print(torch.argmax(a, dim=1))
Here when I use dim=1 instead of searching column vectors, the function searches for row vectors as shown below.
print(a) :
tensor([[-1.7739, 0.8073, 0.0472, -0.4084],
[ 0.6378, 0.6575, -1.2970, -0.0625],
[ 1.7970, -1.3463, 0.9011, -0.8704],
[ 1.5639, 0.7123, 0.0385, 1.8410]])
print(torch.argmax(a, dim=1))
tensor([1, 1, 0, 3])
My assumption is that dim=0 represents rows and dim=1 represents columns.
It's time to correctly understand how the axis or dim argument works in PyTorch:
The following annotated example shows the direction in which each dimension runs:
   |
   v
 dim-0  --->  ----->  dim-1  ------>  ----->  -------->  dim-1
   |   [[-1.7739,  0.8073,  0.0472, -0.4084],
   v    [ 0.6378,  0.6575, -1.2970, -0.0625],
   |    [ 1.7970, -1.3463,  0.9011, -0.8704],
   v    [ 1.5639,  0.7123,  0.0385,  1.8410]]
   |
   v
# argmax (indices where max values are present) along dimension-1
In [215]: torch.argmax(a, dim=1)
Out[215]: tensor([1, 1, 0, 3])
Note: dim (short for 'dimension') is the torch equivalent of 'axis' in NumPy.
Dimensions are defined as shown in the above excellent answer. I have highlighted the way I understand dimensions in Torch and Numpy (dim and axis respectively) and hope that this will be helpful to others.
Notice that only the specified dimension's index varies during the argmax operation, and the specified dimension's index range reduces to a single index once the operation is completed. Let tensor A have M rows and N columns and consider the sum operation for simplicity. The shape of A is (M, N). If dim=0 is specified, then the vectors A[0,:], A[1,:], ..., A[M-1,:] are summed elementwise and the result is another tensor with 1 row and N columns. Notice that only the 0th dimension's indices vary, from 0 through M-1. Similarly, if dim=1 is specified, then the vectors A[:,0], A[:,1], ..., A[:,N-1] are summed elementwise and the result is another tensor with M rows and 1 column.
An example is given below:
>>> A = torch.tensor([[1,2,3], [4,5,6]])
>>> A
tensor([[1, 2, 3],
[4, 5, 6]])
>>> S0 = torch.sum(A, dim = 0)
>>> S0
tensor([5, 7, 9])
>>> S1 = torch.sum(A, dim = 1)
>>> S1
tensor([ 6, 15])
In the above sample code, the first sum operation specifies dim=0, so A[0,:] and A[1,:], which are [1,2,3] and [4,5,6], are summed elementwise, resulting in [5, 7, 9]. When dim=1 is specified, the vectors A[:,0], A[:,1], and A[:,2], which are [1, 4], [2, 5], and [3, 6], are added elementwise to give [6, 15].
Note also that the specified dimension collapses. Again let A have the shape (M, N). If dim=0, then the result will have the shape (1, N), where dimension 0 is reduced from M to 1. Similarly if dim=1, then the result would have the shape (M, 1), where N is reduced to 1. Note also that shapes (1, N) and (M,1) are represented by a single-dimensional tensor with N and M elements respectively.
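The collapsing of the specified dimension can be checked on the arrays from this thread. The sketch below uses NumPy and not PyTorch, relying on the note above that `dim` corresponds to NumPy's `axis`. The `keepdims=True` calls are added only to make the (1, N) and (M, 1) shapes visible.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (M, N) = (2, 3)

s0 = A.sum(axis=0)                 # rows are added together
s1 = A.sum(axis=1)                 # columns are added together

# With keepdims=True the reduced axis is kept with length 1.
s0_kept = A.sum(axis=0, keepdims=True)   # shape (1, 3)
s1_kept = A.sum(axis=1, keepdims=True)   # shape (2, 1)

# The 4x4 array from the question: argmax along axis 1 yields one
# column index per row, which is why the answer has 4 entries.
a = np.array([[-1.7739,  0.8073,  0.0472, -0.4084],
              [ 0.6378,  0.6575, -1.2970, -0.0625],
              [ 1.7970, -1.3463,  0.9011, -0.8704],
              [ 1.5639,  0.7123,  0.0385,  1.8410]])
per_row = a.argmax(axis=1)         # column index of the maximum in each row
per_col = a.argmax(axis=0)         # row index of the maximum in each column

print(s0, s1, per_row, per_col)
```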
