How to convert 2D matrix to 3D by embeddings in numpy? [duplicate] - python

I'm trying to index the last dimension of a 3D matrix with a matrix consisting of indices that I wish to keep.
I have a matrix of thrust values with shape:
(3, 3, 5)
I would like to filter the last dimension according to some criteria so that it is reduced from size 5 to size 1. I have already found the indices in the last dimension that fit my criteria:
[[0 0 1]
[0 0 1]
[1 4 4]]
What I want to achieve: for the first row and first column I want the 0th index of the last dimension. For the first row and third column I want the 1st index of the last dimension. In terms of the indices to keep, the final matrix will become a (3, 3) 2D matrix like this:
[[0,0,0], [0,1,0], [0,2,1],
 [1,0,0], [1,1,0], [1,2,1],
 [2,0,1], [2,1,4], [2,2,4]]
I'm pretty confident numpy can achieve this, but I'm unable to figure out exactly how. I'd rather not build a construction with nested for loops.
I have already tried:
minValidPower = totalPower[:, :, tuple(indexMatrix)]
But this results in a (3, 3, 3, 3) matrix, so I am not entirely sure how I'm supposed to approach this.
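To see where the (3, 3, 3, 3) shape comes from, here is a small sketch with a stand-in for totalPower (the values are assumed, built with np.arange). An integer array used to index one axis replaces that axis with the index array's own shape, so a (3, 3) index on the last axis of a (3, 3, 5) array gives (3, 3) + (3, 3): every (row, column) pair receives all nine picks instead of only its own.

```python
import numpy as np

# Stand-in for totalPower (assumed values), shape (3, 3, 5)
total_power = np.arange(3 * 3 * 5).reshape(3, 3, 5)
index_matrix = np.array([[0, 0, 1],
                         [0, 0, 1],
                         [1, 4, 4]])

# The two sliced axes are kept unchanged; the last axis is replaced by the
# (3, 3) shape of the index array, so nothing pairs idx[i, j] with (i, j).
outer = total_power[:, :, index_matrix]
print(outer.shape)  # (3, 3, 3, 3)
```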

With a as the input array and idx as the index array, use np.take_along_axis:
np.take_along_axis(a,idx[...,None],axis=-1)[...,0]
Alternatively, with open grids (np.ogrid):
I,J = np.ogrid[:idx.shape[0],:idx.shape[1]]
out = a[I,J,idx]
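A self-contained sketch of both one-liners on a stand-in array (values assumed, built with np.arange) and the index matrix from the question, showing that they pick the same elements:

```python
import numpy as np

a = np.arange(3 * 3 * 5).reshape(3, 3, 5)   # stand-in data (assumed)
idx = np.array([[0, 0, 1],
                [0, 0, 1],
                [1, 4, 4]])

# take_along_axis needs idx to have the same ndim as a, hence idx[..., None];
# the trailing length-1 axis is dropped again with [..., 0].
out1 = np.take_along_axis(a, idx[..., None], axis=-1)[..., 0]

# Open grids: I has shape (3, 1) and J has shape (1, 3); they broadcast
# with idx (3, 3) so that out2[i, j] == a[i, j, idx[i, j]].
I, J = np.ogrid[:idx.shape[0], :idx.shape[1]]
out2 = a[I, J, idx]

print(out1.tolist())  # [[0, 5, 11], [15, 20, 26], [31, 39, 44]]
```

The ogrid version avoids materialising full (3, 3) row and column index arrays, because broadcasting expands I and J on the fly.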

You can build corresponding index arrays for the first two dimensions. Those would basically be the row indices:
[[0 0 0]
 [1 1 1]
 [2 2 2]]
and the column indices:
[[0 1 2]
 [0 1 2]
 [0 1 2]]
You can construct these with np.meshgrid (using indexing='ij'). I stored them as m1 and m2 in the example:
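import numpy as np
vals = np.arange(3*3*5).reshape(3, 3, 5) # test sample
# row (m1) and column (m2) index arrays, each of shape (3, 3)
m1, m2 = np.meshgrid(range(3), range(3), indexing='ij')
# indices into the last dimension (the matrix from the question)
m3 = np.array([[0, 0, 1], [0, 0, 1], [1, 4, 4]])
sel_vals = vals[m1, m2, m3]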
The shape of the result matches the shape of the indexing arrays m1, m2 and m3.
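As a check, the sketch below compares the meshgrid result with the nested-loop version the question wanted to avoid; the loop is there only for verification, and vals is the same arange test sample.

```python
import numpy as np

vals = np.arange(3 * 3 * 5).reshape(3, 3, 5)   # test sample, as in the answer
m3 = np.array([[0, 0, 1], [0, 0, 1], [1, 4, 4]])

# indexing='ij' makes m1 vary down the rows and m2 across the columns
m1, m2 = np.meshgrid(range(3), range(3), indexing='ij')
sel_vals = vals[m1, m2, m3]

# Reference result built element by element with explicit loops
expected = np.empty((3, 3), dtype=vals.dtype)
for i in range(3):
    for j in range(3):
        expected[i, j] = vals[i, j, m3[i, j]]

print(np.array_equal(sel_vals, expected))  # True
```

With the default indexing='xy', meshgrid would swap the roles of the two outputs, which is why 'ij' is passed here.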

Related

Slicing a 2D NumPy Array by all zero rows

This is essentially the 2D-array equivalent of slicing a Python list into smaller lists at the indices that store a particular value. I'm running a program that extracts a large amount of data from a CSV file and copies it into a 2D NumPy array. The basic format of these arrays is something like this:
[[0 8 9 10]
[9 9 1 4]
[0 0 0 0]
[1 2 1 4]
[0 0 0 0]
[1 1 1 2]
[39 23 10 1]]
I want to separate my NumPy array along rows that contain all zero values to create a set of smaller 2D arrays. The successful result for the above starting array would be the arrays:
[[0 8 9 10]
[9 9 1 4]]
[[1 2 1 4]]
[[1 1 1 2]
[39 23 10 1]]
I've thought about simply iterating down the array and checking whether each row is all zeros, but the data I'm handling is substantially large. I potentially have millions of rows of data in the text file, and I'm trying to find the most efficient approach rather than a loop that could waste computation time. What are your thoughts on what I should do? Is there a better way?
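Assuming a is your array: use any to find the rows that are not all zero, keep only those rows, and then use np.split to split them at the positions of the all-zero rows:
import numpy as np
# indices of rows that are not all zero
idx = np.flatnonzero(a.any(1))
# indices of the all-zero rows
idx_zero = np.delete(np.arange(a.shape[0]), idx)
# select the non-zero rows and split at the zero-row positions; each zero row
# removed before a split point shifts that point up by one, hence the offset
output = np.split(a[idx], idx_zero - np.arange(idx_zero.size))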
output:
[array([[ 0, 8, 9, 10],
[ 9, 9, 1, 4]]),
array([[1, 2, 1, 4]]),
array([[ 1, 1, 1, 2],
[39, 23, 10, 1]])]
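The same any/delete/split steps can be wrapped in a small helper. The name split_on_zero_rows is hypothetical, and dropping empty pieces is an added assumption: without it, leading, trailing, or consecutive all-zero rows produce empty arrays in the output.

```python
import numpy as np

def split_on_zero_rows(a):
    """Hypothetical helper: split a 2-D array into blocks separated by
    all-zero rows, discarding any empty blocks."""
    nonzero_rows = np.flatnonzero(a.any(axis=1))
    zero_rows = np.delete(np.arange(a.shape[0]), nonzero_rows)
    # Each zero row removed before a split point shifts it up by one.
    pieces = np.split(a[nonzero_rows], zero_rows - np.arange(zero_rows.size))
    return [p for p in pieces if p.shape[0] > 0]

a = np.array([[0, 8, 9, 10],
              [9, 9, 1, 4],
              [0, 0, 0, 0],
              [1, 2, 1, 4],
              [0, 0, 0, 0],
              [1, 1, 1, 2],
              [39, 23, 10, 1]])
blocks = split_on_zero_rows(a)

# Leading, consecutive and trailing zero rows only yield empty pieces,
# which the filter removes.
edge = split_on_zero_rows(np.array([[0, 0], [1, 2], [0, 0], [0, 0], [3, 4], [0, 0]]))
```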
You can use the np.all function to check for rows which are all zeros, and then index appropriately.
# assume `x` is your data
indices = np.all(x == 0, axis=1)
zeros = x[indices]
nonzeros = x[np.logical_not(indices)]
The all function accepts an axis argument (as do many NumPy functions), which indicates the axis along which to operate. Here axis=1 reduces across the columns of each row, so you get back one boolean per row, i.e. an array of shape (x.shape[0],), which can be used directly to index x.
Note that this will be much faster than a for-loop over the rows, especially for large arrays.
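The boolean mask from np.all can also drive the split itself, rather than only separating the zero rows from the non-zero ones. A minimal sketch, reusing the example array from the question:

```python
import numpy as np

x = np.array([[0, 8, 9, 10],
              [9, 9, 1, 4],
              [0, 0, 0, 0],
              [1, 2, 1, 4],
              [0, 0, 0, 0],
              [1, 1, 1, 2],
              [39, 23, 10, 1]])

is_zero = np.all(x == 0, axis=1)       # one boolean per row
zero_idx = np.flatnonzero(is_zero)     # positions of the all-zero rows

# Splitting at the zero rows leaves each separator at the top of its piece;
# splitting the mask the same way lets us drop exactly those rows.
pieces = np.split(x, zero_idx)
masks = np.split(is_zero, zero_idx)
blocks = [p[~m] for p, m in zip(pieces, masks)]
blocks = [b for b in blocks if len(b)]  # drop empty blocks, e.g. from consecutive zero rows
```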

how to apply unique_with_counts over 2d array in tensorflow

I have this tensor:
tf_a1 = [[0 3 1 22]
[3 5 2 2]
[2 6 3 13]
[1 7 0 3 ]
[4 9 11 10]]
What I want to do is find the unique values that appear in more columns than a given threshold.
For example, here 3 appears in 4 columns, 0 appears in 2 columns, 2 appears in 3 columns, and so on.
I want my output to be like this (suppose the threshold is 2, so values that appear in more than 2 columns are masked as True):
[[F T F F]
[T F T T]
[T F T F]
[F F F T]
[F F F F]]
This is what I have done:
y, idx, count = tf.unique_with_counts(tf_a1)
tf.where(tf.where(count, tf_a1, tf.zeros_like(tf_a1)))
But it raises the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: unique expects a 1D vector. [Op:UniqueWithCounts]
Thanks.
Currently, the version of unique_with_counts that supports an axis argument is not part of the public API. If you want to use unique_with_counts on a multi-dimensional tensor, you can call it like this in TF 1.14:
from tensorflow.python.ops import gen_array_ops
# tensor 'x' is [[1, 0, 0],
# [1, 0, 0],
# [2, 0, 0]]
y, idx, count = gen_array_ops.unique_with_counts_v2(x, [0])
y ==> [[1, 0, 0],
[2, 0, 0]]
idx ==> [0, 0, 1]
count ==> [2, 1]
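For comparison, NumPy exposes the same kind of operation publicly through np.unique with an axis argument. This is a NumPy sketch, not a TensorFlow API, using the same small x as above:

```python
import numpy as np

x = np.array([[1, 0, 0],
              [1, 0, 0],
              [2, 0, 0]])

# axis=0 treats each row as one element; return_inverse and return_counts
# play the roles of idx and count in unique_with_counts.
y, idx, count = np.unique(x, axis=0, return_inverse=True, return_counts=True)
idx = idx.reshape(-1)  # keep idx 1-D across NumPy versions

print(y.tolist())      # [[1, 0, 0], [2, 0, 0]]
print(idx.tolist())    # [0, 0, 1]
print(count.tolist())  # [2, 1]
```

One difference to keep in mind: np.unique returns its unique rows sorted, while TensorFlow's unique ops keep the order of first occurrence. For this input both orders coincide.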
Since I cannot comment on @sariii's post: it should be tf.greater(count, 2). Also, I do not think this solution satisfies the requirement of the problem. For example, if the first column is all 2s and no other column contains a 2, then according to your requirement 2 should be counted once. But in your solution, reshaping to 1-D first loses this information. Please correct me if I'm wrong.
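Counting distinct columns per value, as this comment asks, can be sketched in NumPy by first recording which columns each unique value occurs in. This is NumPy rather than TensorFlow, and the helper name column_count_mask is hypothetical:

```python
import numpy as np

def column_count_mask(a, threshold):
    """Hypothetical helper: True where the value of a[i, j] occurs in more
    than `threshold` distinct columns of the 2-D array a."""
    vals, inverse = np.unique(a, return_inverse=True)
    inverse = inverse.reshape(a.shape)  # position of each element in vals
    # presence[v, j] is True if vals[v] occurs at least once in column j,
    # so repeats inside a single column are only counted once.
    presence = np.zeros((vals.size, a.shape[1]), dtype=bool)
    presence[inverse, np.arange(a.shape[1])] = True
    col_counts = presence.sum(axis=1)   # number of columns containing each value
    return col_counts[inverse] > threshold

tf_a1 = np.array([[0, 3, 1, 22],
                  [3, 5, 2, 2],
                  [2, 6, 3, 13],
                  [1, 7, 0, 3],
                  [4, 9, 11, 10]])
mask = column_count_mask(tf_a1, 2)
print(mask.astype(int))
```

For [[2, 0], [2, 1], [2, 5]], the value 2 occurs three times but in only one column, so with a threshold of 1 this helper leaves it unmasked, whereas a count over the flattened array would mask it.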
According to issue #16503, it seems unique_with_counts should support multidimensional input and an axis argument; however, I could not find anything about it in the documentation.
Thanks to @a_guest for sharing the idea. The workaround for me was to first reshape my tensor to a 1-D array, then apply unique_with_counts, and then reshape the result back to the original shape:
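import tensorflow as tf
# flatten to 1-D, because unique_with_counts expects a 1-D vector
token_idx = tf.reshape(tf_a1, [-1])
y, idx, count = tf.unique_with_counts(token_idx)
# counts strictly greater than the threshold (2)
masked = tf.greater(count, 2)
# map each element back to its value's mask entry, then restore the shape
backed_same_size = tf.gather(masked, idx)
tf.reshape(backed_same_size, shape=tf.shape(tf_a1))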
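The flatten, count, gather, reshape pipeline can be mirrored step by step in NumPy, which makes each intermediate easy to inspect. This is a NumPy stand-in for the TensorFlow calls, not TensorFlow itself; np.unique sorts its values whereas tf.unique_with_counts keeps first-occurrence order, but the per-element mask comes out the same.

```python
import numpy as np

tf_a1 = np.array([[0, 3, 1, 22],
                  [3, 5, 2, 2],
                  [2, 6, 3, 13],
                  [1, 7, 0, 3],
                  [4, 9, 11, 10]])
threshold = 2

flat = tf_a1.reshape(-1)                                   # like tf.reshape(tf_a1, [-1])
y, idx, count = np.unique(flat, return_inverse=True, return_counts=True)
masked = count > threshold                                 # like tf.greater(count, 2)
backed_same_size = masked[idx]                             # like tf.gather(masked, idx)
result = backed_same_size.reshape(tf_a1.shape)             # back to the original shape
```

Here this matches the expected mask only because no value repeats inside a single column; the counts are over all elements, not over distinct columns.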

Numpy - Index last dimension of array with index array

numpy get row index where elements in certain columns are zero

I want to find the indices of rows based on a criterion over certain columns, so something like:
import numpy as np
x = np.random.rand(4, 5)
x[2, 2] = 0
x[2, 3] = 0
x[3, 1] = 0
x[1, 3] = 0
Now, I want to get the indices of the rows where either column 3 or column 4 is zero. How can one do that with NumPy? Do I need to make multiple calls to nonzero, one for each column, and combine these indices using a set or something like that?
Using np.where, the first array in the returned tuple holds the row indices:
np.where(x[:,[3,4]]==0)
Out[79]: (array([1, 2], dtype=int64), array([0, 0], dtype=int64))
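One caveat with np.where: a row that has zeros in both selected columns is reported once per matching column. Reducing over the selected columns first gives one entry per row. A sketch with a deterministic stand-in for the random array, plus one extra zero (an assumption added for illustration) so that row 1 is zero in both columns:

```python
import numpy as np

x = np.ones((4, 5))  # deterministic stand-in for np.random.rand(4, 5)
x[2, 2] = 0
x[2, 3] = 0
x[3, 1] = 0
x[1, 3] = 0
x[1, 4] = 0          # extra zero (assumed) so row 1 matches in both columns

rows, cols = np.where(x[:, [3, 4]] == 0)
print(rows)          # [1 1 2]: row 1 appears once per matching column

# any(axis=1) asks "is column 3 or column 4 zero?" for each row
row_idx = np.flatnonzero((x[:, [3, 4]] == 0).any(axis=1))
print(row_idx)       # [1 2]
```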

which numpy command could I use to subtract vectors with different dimensions many times?

I have to write this function:
in which x is an array with dimensions [150, 2] and c is [N, 2] (let's suppose N = 20). From each component x_i (i = 1, 2) of every sample I have to subtract the components of c in this way: [x11-c11, x12-c12], ..., [x11-cN1, x12-cN2], for all 150 samples.
I've transformed them so that they have the same dimensions and I can subtract them, but the result of the function should be a vector. How can I write this in NumPy?
Thank you
OK, let's suppose x has shape (5, 2) and c has shape (3, 2).
This is what I have obtained by transforming the dimensions of the two arrays. The problem is that I have to do this with a for loop, because the exp function should give me a vector as a result, so I have to obtain a sort of matrix divided into N blocks.
From what I understand of the issue, the problem seems to be in the way you are calculating the vector norm, not in the subtraction. Using your example, but calculating exp(-||x-c||), try:
import numpy as np
x = np.linspace(8,17,10).reshape((5,2))
c = np.linspace(1,6,6).reshape((3,2))
# x[:,None] has shape (5, 1, 2); broadcasting against c (3, 2) gives (5, 3, 2)
sub = np.linalg.norm(x[:,None] - c, axis=-1)
np.exp(-sub)
array([[ 5.02000299e-05, 8.49325705e-04, 1.43695961e-02],
[ 2.96711024e-06, 5.02000299e-05, 8.49325705e-04],
[ 1.75373266e-07, 2.96711024e-06, 5.02000299e-05],
[ 1.03655678e-08, 1.75373266e-07, 2.96711024e-06],
[ 6.12664624e-10, 1.03655678e-08, 1.75373266e-07]])
np.exp(-sub).shape
(5, 3)
Unless you tell numpy.linalg.norm explicitly which axis holds the vector components, it computes a single norm over its whole input (with the default ord and no axis, the 2-norm of the flattened array) instead of one norm per vector.
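A sketch that checks the broadcast version against explicit loops (one block of N values per sample) and shows what norm returns when axis is omitted; the loops are only for verification:

```python
import numpy as np

x = np.linspace(8, 17, 10).reshape((5, 2))
c = np.linspace(1, 6, 6).reshape((3, 2))

diff = x[:, None] - c                             # diff[i, j] = x[i] - c[j], shape (5, 3, 2)
result = np.exp(-np.linalg.norm(diff, axis=-1))   # shape (5, 3)

# Same computation with explicit loops over samples i and centres j
loop = np.empty((x.shape[0], c.shape[0]))
for i in range(x.shape[0]):
    for j in range(c.shape[0]):
        loop[i, j] = np.exp(-np.linalg.norm(x[i] - c[j]))

print(diff.shape, result.shape)   # (5, 3, 2) (5, 3)
print(np.allclose(result, loop))  # True
print(np.linalg.norm(diff))       # no axis: one scalar for the whole array
```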
If I understand correctly, try whether this gives the expected result, although there is still the problem that the result has the same shape as x:
import numpy as np
x = np.arange(10).reshape(5,2)
c = np.arange(6).reshape(3,2)
c_col_sum = np.sum(c, axis=0)
for (h,k), value in np.ndenumerate(x):
x[h,k] = c.shape[0] * x[h,k] - c_col_sum[k]
Initially x is:
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
And c is:
[[0 1]
[2 3]
[4 5]]
After the function x becomes:
[[-6 -6]
[ 0 0]
[ 6 6]
[12 12]
[18 18]]
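The loop above computes, for each element, the sum over j of (x[i] - c[j]), which equals N * x[i] minus the column sums of c, with N = c.shape[0]. A sketch of the same result without ndenumerate (it builds a new array instead of modifying x in place):

```python
import numpy as np

x = np.arange(10).reshape(5, 2)
c = np.arange(6).reshape(3, 2)

# sum_j (x[i] - c[j]) = N * x[i] - sum_j c[j], so the loop becomes one expression
vectorized = c.shape[0] * x - c.sum(axis=0)

# The same value written directly as a sum of pairwise differences
pairwise = (x[:, None] - c).sum(axis=1)

print(vectorized.tolist())  # [[-6, -6], [0, 0], [6, 6], [12, 12], [18, 18]]
```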
