How to convert 2D matrix to 3D by embeddings in numpy? [duplicate] - python

I'm trying to index the last dimension of a 3D matrix with a matrix consisting of indices that I wish to keep.
I have a matrix of thrust values with shape:
(3, 3, 5)
I would like to filter the last index according to some criteria so that it is reduced from size 5 to size 1. I have already found the indices in the last dimension that fit my criteria:
[[0 0 1]
[0 0 1]
[1 4 4]]
What I want to achieve: for the first row and first column I want the 0th index of the last dimension. For the first row and third column I want the 1st index of the last dimension. In terms of indices to keep the final matrix will become a (3, 3) 2D matrix like this:
[[0,0,0], [0,1,0], [0,2,1];
[1,0,0], [1,1,0], [1,2,1];
[2,0,1], [2,1,4], [2,2,4]]
I'm pretty confident numpy can achieve this, but I'm unable to figure out exactly how. I'd rather not build a construction with nested for loops.
I have already tried:
minValidPower = totalPower[:, :, tuple(indexMatrix)]
But this results in a (3, 3, 3, 3) matrix, so I am not entirely sure how I'm supposed to approach this.

With a as input array and idx as the indexing one -
Alternatively, with open-grids -
I,J = np.ogrid[:idx.shape[0],:idx.shape[1]]
out = a[I,J,idx]
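A minimal sketch of the open-grid indexing on a small test array, cross-checked against np.take_along_axis. The sample data and the take_along_axis comparison are assumptions added for illustration and are not taken from the answer above.

```python
import numpy as np

# Test data (assumed for illustration): shape (3, 3, 5)
a = np.arange(3 * 3 * 5).reshape(3, 3, 5)
idx = np.array([[0, 0, 1],
                [0, 0, 1],
                [1, 4, 4]])

# Open grids: I has shape (3, 1), J has shape (1, 3); both broadcast against idx
I, J = np.ogrid[:idx.shape[0], :idx.shape[1]]
out = a[I, J, idx]  # out[i, j] == a[i, j, idx[i, j]]

# Cross-check: pick along the last axis, then drop the singleton dimension
alt = np.take_along_axis(a, idx[..., None], axis=-1)[..., 0]
```

Both results have shape (3, 3), the shape of idx after broadcasting.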

You can build corresponding index arrays for the first two dimensions. Those would basically be:
[0 1 2]
[0 1 2]
[0 1 2]
[0 0 0]
[1 1 1]
[2 2 2]
You can construct these using the meshgrid function. I stored them as m1, and m2 in the example:
import numpy as np
vals = np.arange(3*3*5).reshape(3, 3, 5) # test sample
m1, m2 = np.meshgrid(range(3), range(3), indexing='ij')
m3 = np.array([[0, 0, 1], [0, 0, 1], [1, 4, 4]])
sel_vals = vals[m1, m2, m3]
The shape of the result matches the shape of the indexing arrays m1, m2 and m3.


Slicing a 2D NumPy Array by all zero rows

This is essentially the 2D array equivalent of slicing a Python list into smaller lists at indexes that store a particular value. I'm running a program that extracts a large amount of data out of a CSV file and copies it into a 2D NumPy array. The basic format of these arrays is something like this:
[[0 8 9 10]
[9 9 1 4]
[0 0 0 0]
[1 2 1 4]
[0 0 0 0]
[1 1 1 2]
[39 23 10 1]]
I want to separate my NumPy array along rows that contain all zero values to create a set of smaller 2D arrays. The successful result for the above starting array would be the arrays:
[[0 8 9 10]
[9 9 1 4]]
[[1 2 1 4]]
[[1 1 1 2]
[39 23 10 1]]
I've thought about simply iterating down the array and checking if the row has all zeros but the data I'm handling is substantially large. I have potentially millions of rows of data in the text file and I'm trying to find the most efficient approach as opposed to a loop that could waste computation time. What are your thoughts on what I should do? Is there a better way?
a is your array. You can use any to find all zero rows, remove them, and then use split to split by their indices:
#not_all_zero rows indices
idx = np.flatnonzero(a.any(1))
#all_zero rows indices
idx_zero = np.delete(np.arange(a.shape[0]),idx)
#select not_all_zero rows and split by all_zero row indices
output = np.split(a[idx],idx_zero-np.arange(idx_zero.size))
[array([[ 0, 8, 9, 10],
[ 9, 9, 1, 4]]),
array([[1, 2, 1, 4]]),
array([[ 1, 1, 1, 2],
[39, 23, 10, 1]])]
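If the data can contain leading, trailing, or consecutive all-zero rows, np.split returns empty blocks at those positions. The following is a sketch of the same approach with those empties filtered out; the helper name split_on_zero_rows and the sample array are hypothetical, introduced only for illustration.

```python
import numpy as np

def split_on_zero_rows(a):
    """Split a 2D array into blocks separated by all-zero rows (hypothetical helper)."""
    keep = a.any(axis=1)                  # True for rows with at least one nonzero value
    idx_zero = np.flatnonzero(~keep)      # positions of the all-zero rows
    # After the zero rows are dropped, the k-th zero row sits k positions earlier
    cuts = idx_zero - np.arange(idx_zero.size)
    blocks = np.split(a[keep], cuts)
    # Leading, trailing or consecutive zero rows yield empty blocks; drop them
    return [b for b in blocks if b.shape[0] > 0]

# Sample with two consecutive zero rows (assumed data)
a = np.array([[0, 8, 9, 10],
              [9, 9, 1, 4],
              [0, 0, 0, 0],
              [0, 0, 0, 0],
              [1, 2, 1, 4],
              [0, 0, 0, 0],
              [1, 1, 1, 2],
              [39, 23, 10, 1]])
blocks = split_on_zero_rows(a)
```

Without the final filter, the two consecutive zero rows would produce an extra block of shape (0, 4).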
You can use the np.all function to check for rows which are all zeros, and then index appropriately.
# assume `x` is your data
indices = np.all(x == 0, axis=1)
zeros = x[indices]
nonzeros = x[np.logical_not(indices)]
The all function accepts an axis argument (as do many NumPy functions), which indicates the axis along which to operate. axis=1 here reduces across the columns of each row, so you get back a boolean array of shape (x.shape[0],), one entry per row, which can be used to directly index x.
Note that this will be much faster than a for-loop over the rows, especially for large arrays.
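To make the effect of the axis argument concrete, here is a small sketch on the array from the question. The variable names is_zero_row and per_column are assumptions chosen for illustration.

```python
import numpy as np

x = np.array([[0, 8, 9, 10],
              [9, 9, 1, 4],
              [0, 0, 0, 0],
              [1, 2, 1, 4],
              [0, 0, 0, 0],
              [1, 1, 1, 2],
              [39, 23, 10, 1]])

is_zero_row = np.all(x == 0, axis=1)  # one boolean per row, shape (7,)
per_column = np.all(x == 0, axis=0)   # one boolean per column, shape (4,)
nonzeros = x[~is_zero_row]            # boolean mask selects the rows to keep
```

Note that this only separates zero rows from nonzero rows; it does not by itself split the array into blocks.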

how to apply unique_with_counts over 2d array in tensorflow

I have this tensor:
tf_a1 = [[0 3 1 22]
[3 5 2 2]
[2 6 3 13]
[1 7 0 3 ]
[4 9 11 10]]
What I want to do is to find the unique values being repeated more than a threshold across all columns.
For example here, 3 repeated in 4 columns. 0 repeated in 2 columns. 2 repeated in 3 columns and so on.
I want my output to look like this (suppose the threshold is 2, so values repeated more than 2 times will be masked):
[[F T F F]
[T F T T]
[T F T F]
[F F F T]
[F F F F]]
This is what I have done:
y, idx, count = tf.unique_with_counts(tf_a1)
tf.where(tf.where(count, tf_a1, tf.zeros_like(tf_a1)))
But it raises the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: unique
expects a 1D vector. [Op:UniqueWithCounts]
Currently, the version of unique_with_counts that supports axis is not part of the public API. If you want to apply unique_with_counts to a multi-dimensional tensor, you can call it like this in tf 1.14:
from tensorflow.python.ops import gen_array_ops
# tensor 'x' is [[1, 0, 0],
# [1, 0, 0],
# [2, 0, 0]]
y, idx, count = gen_array_ops.unique_with_counts_v2(x, [0])
y ==> [[1, 0, 0],
[2, 0, 0]]
idx ==> [0, 0, 1]
count ==> [2, 1]
Since I cannot comment on @sariii's post: it should be tf.greater(count, 2). Also, I do not think this solution satisfies the requirement of the problem. For example, suppose the first column is all 2s and the other columns contain no 2. According to your requirement, 2 should be counted once. But if you reshape to 1D first, you lose this information. Please correct me if I'm wrong.
It seems unique_with_counts should support multidimensional tensors and an axis argument, but I could not find anything in the documentation.
The workaround for me was to first reshape my tensor to 1D, then apply unique_with_counts, and then map the result back to the original shape (thanks to @a_guest for sharing the idea):
token_idx = tf.reshape(tf_a1, [-1])
y, idx, count = tf.unique_with_counts(token_idx)
masked = tf.greater_equal(count, 2)
backed_same_size = tf.gather(masked, idx)
tf.reshape(backed_same_size, shape=tf.shape(tf_a1))
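The same flatten, count, and gather-back idea can be sketched in NumPy, which makes the result easy to inspect. This is an illustrative analog rather than the TensorFlow code itself; the variable names are assumptions. It uses a strict comparison (count > threshold), which reproduces the mask shown in the question.

```python
import numpy as np

a = np.array([[0, 3, 1, 22],
              [3, 5, 2, 2],
              [2, 6, 3, 13],
              [1, 7, 0, 3],
              [4, 9, 11, 10]])
threshold = 2

flat = a.ravel()  # 1D view, analogous to tf.reshape(tf_a1, [-1])
values, inverse, counts = np.unique(flat, return_inverse=True, return_counts=True)

# counts[inverse] gives each element its own occurrence count (like tf.gather)
mask = (counts > threshold)[inverse].reshape(a.shape)
```

Because the array is flattened first, a value is counted once per occurrence anywhere in the array, not once per column it appears in.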

numpy get row index where elements in certain columns are zero

I want to find indexes of row based on criteria over certain columns
So, something like:
import numpy as np
x = np.random.rand(4, 5)
x[2, 2] = 0
x[2, 3] = 0
x[3, 1] = 0
x[1, 3] = 0
Now, I want to get the indices of the rows where either column 3 or column 4 is zero. How can one do that with numpy? Do I need to make multiple calls to nonzero for each column and combine these indices using a set or something like that?
Use np.where; the first array in the returned tuple holds the row indices:
Out[79]: (array([1, 2], dtype=int64), array([0, 0], dtype=int64))
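A sketch of one way to get such a result, assuming "columns 3 or 4" means the zero-based column indices 3 and 4. The seeded generator and the +1.0 shift are assumptions added so that only the planted zeros are zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 5)) + 1.0  # shifted away from zero (assumption for a deterministic demo)
x[2, 2] = 0
x[2, 3] = 0
x[3, 1] = 0
x[1, 3] = 0

cols = [3, 4]  # columns to test
# Row indices where any of the selected columns is zero
rows = np.flatnonzero((x[:, cols] == 0).any(axis=1))

# np.where on the column subset returns (row indices, positions within cols)
row_idx, col_idx = np.where(x[:, cols] == 0)
```

A single comparison over the column subset is enough; there is no need to call nonzero per column and merge the results. If a row has zeros in both columns, np.where lists it twice, whereas the any-based version reports it once.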

which numpy command could I use to subtract vectors with different dimensions many times?

I have to write this function:
in which x is an array with shape [150, 2] and c is [N, 2] (let's suppose N = 20). From each row of x, with components xi (i = 1, 2), I have to subtract every row of c, like this: ([x11-c11, x12-c12]) ... ([x11-cN1, x12-cN2]), for all 150 samples.
I've transformed them so that they have the same dimensions and I can subtract them, but the result of the function should be a vector. How can I write this in numpy?
Thank you
OK, let's suppose x has shape (5, 2) and c has shape (3, 2).
This is what I obtained by transforming the dimensions of the two arrays. The problem is that I have to do this with a for loop, because the exp function should give me a vector as a result, so I have to obtain a sort of matrix divided into N blocks.
From what I understand of the issue, the problem seems to be in the way you are calculating the vector norm, not in the subtraction. Using your example, but calculating exp(-||x-c||), try:
x = np.linspace(8,17,10).reshape((5,2))
c = np.linspace(1,6,6).reshape((3,2))
sub = np.linalg.norm(x[:,None] - c, axis=-1)  # pairwise distances ||x_i - c_j||, shape (5, 3)
np.exp(-sub)  # gives the (5, 3) array below
array([[ 5.02000299e-05, 8.49325705e-04, 1.43695961e-02],
[ 2.96711024e-06, 5.02000299e-05, 8.49325705e-04],
[ 1.75373266e-07, 2.96711024e-06, 5.02000299e-05],
[ 1.03655678e-08, 1.75373266e-07, 2.96711024e-06],
[ 6.12664624e-10, 1.03655678e-08, 1.75373266e-07]])
(5, 3)
numpy.linalg.norm will try to return some kind of matrix norm across all the dimensions of its input unless you tell it explicitly which axis represents the vector components.
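The difference the axis argument makes can be seen in a short sketch using the same x and c. The names diff, total, pairwise, and result are assumptions chosen for illustration.

```python
import numpy as np

x = np.linspace(8, 17, 10).reshape((5, 2))
c = np.linspace(1, 6, 6).reshape((3, 2))

# Broadcasting: (5, 1, 2) - (3, 2) -> (5, 3, 2), every x row minus every c row
diff = x[:, None, :] - c

total = np.linalg.norm(diff)              # no axis: one scalar over all entries
pairwise = np.linalg.norm(diff, axis=-1)  # norm over the 2 components -> shape (5, 3)

result = np.exp(-pairwise)                # exp(-||x_i - c_j||) for every pair
```

Here result[i, j] corresponds to sample i and centre j, so no explicit loop over the N rows of c is needed.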
If I understand correctly, try whether this gives the expected result, though there is still the problem that the result has the same shape as x:
import numpy as np
x = np.arange(10).reshape(5,2)
c = np.arange(6).reshape(3,2)
c_col_sum = np.sum(c, axis=0)
for (h,k), value in np.ndenumerate(x):
    x[h,k] = c.shape[0] * x[h,k] - c_col_sum[k]
Initially x is:
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
And c is:
[[0 1]
[2 3]
[4 5]]
After the function x becomes:
[[-6 -6]
[ 0 0]
[ 6 6]
[12 12]
[18 18]]
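The loop above computes, for each entry of x, the sum over all rows of c of (x - c). A vectorized sketch of the same arithmetic, with a broadcasting cross-check, might look like this; the names out and check are assumptions.

```python
import numpy as np

x = np.arange(10).reshape(5, 2)
c = np.arange(6).reshape(3, 2)

# Sum over the N rows of c of (x - c) equals N * x - (column sums of c)
out = c.shape[0] * x - c.sum(axis=0)

# Cross-check via explicit pairwise differences, shape (5, 3, 2), summed over c
check = (x[:, None, :] - c).sum(axis=1)
```

Unlike the loop, this leaves x unmodified and returns the result as a new array.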
