How to apply unique_with_counts over a 2D array in TensorFlow - Python

I have this tensor:
tf_a1 = [[0 3 1 22]
[3 5 2 2]
[2 6 3 13]
[1 7 0 3 ]
[4 9 11 10]]
What I want to do is find the unique values that are repeated more than a threshold number of times across all columns.
For example, here 3 is repeated in 4 columns, 0 in 2 columns, 2 in 3 columns, and so on.
I want my output to be like this (suppose the threshold is 2, so positions whose value is repeated more than 2 times are masked as True):
[[F T F F]
[T F T T]
[T F T F]
[F F F T]
[F F F F]]
This is what I have done:
y, idx, count = tf.unique_with_counts(tf_a1)
tf.where(tf.where(count, tf_a1, tf.zeros_like(tf_a1)))
But it raises the error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: unique expects a 1D vector. [Op:UniqueWithCounts]
Thanks.

Currently, the version of unique_with_counts that supports an axis argument is not part of the public API yet. If you want to apply unique_with_counts to a multi-dimensional tensor, you can call it like this in TF 1.14:
import tensorflow as tf
from tensorflow.python.ops import gen_array_ops

x = tf.constant([[1, 0, 0],
                 [1, 0, 0],
                 [2, 0, 0]])
# axis is passed as a list; [0] treats each row as one element
y, idx, count = gen_array_ops.unique_with_counts_v2(x, [0])
# y ==> [[1, 0, 0],
#        [2, 0, 0]]
# idx ==> [0, 0, 1]
# count ==> [2, 1]
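For comparison, NumPy's np.unique accepts an axis argument directly, which makes it handy for sanity-checking what the row-wise TensorFlow call should return. This is a NumPy cross-check, not a TensorFlow API. One difference is worth hedging on: tf.unique documents that its output keeps the order of first occurrence, whereas np.unique sorts the unique rows, so the two only line up here because this x is already in sorted order.

```python
import numpy as np

# Same example tensor as above, as a NumPy array
x = np.array([[1, 0, 0],
              [1, 0, 0],
              [2, 0, 0]])

# axis=0 treats each row as one element; counts are per unique row
y, inverse, count = np.unique(x, axis=0, return_inverse=True, return_counts=True)
print(y)                # [[1 0 0]
                        #  [2 0 0]]
print(inverse.ravel())  # [0 0 1]  (flattened for display)
print(count)            # [2 1]
```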
Since I cannot comment on @sariii's post: it should be tf.greater(count, 2). Also, I do not think this solution satisfies the requirement of the problem. For example, suppose the first column is all 2s and no other column contains a 2. According to your requirement, 2 should then be counted as appearing in only 1 column, but if you reshape to 1-D first, that information is lost. Please correct me if I'm wrong.
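If the threshold is meant to count columns rather than raw occurrences, as this comment suggests, one way to express that is to deduplicate each column first and only then count. The sketch below does this in NumPy as an illustration of the logic (it is not TensorFlow code), and column_presence_mask is a hypothetical helper name. On the question's tf_a1 both interpretations give the same mask, because no value repeats within a single column there.

```python
import numpy as np

def column_presence_mask(a, threshold):
    """Mark cells whose value appears in more than `threshold` distinct columns.

    Hypothetical helper: each column is deduplicated first, so repeats
    inside a single column count only once.
    """
    per_column = [np.unique(a[:, j]) for j in range(a.shape[1])]
    values, n_columns = np.unique(np.concatenate(per_column), return_counts=True)
    # `values` is sorted and contains every entry of `a`, so searchsorted
    # returns the exact position of each cell's value in `values`.
    pos = np.searchsorted(values, a)
    return n_columns[pos] > threshold

tf_a1 = np.array([[0, 3, 1, 22],
                  [3, 5, 2, 2],
                  [2, 6, 3, 13],
                  [1, 7, 0, 3],
                  [4, 9, 11, 10]])
print(column_presence_mask(tf_a1, 2))

# The comment's counter-example: 2 fills the first column only.
b = np.array([[2, 0],
              [2, 1],
              [2, 5]])
print(column_presence_mask(b, 1))  # all False: 2 appears in just one column
```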
See issue #16503.

It seems unique_with_counts should support multi-dimensional tensors and an axis argument, but I could not find anything about it in the documentation.
The workaround for me was to first reshape my tensor to a 1-D array, then apply unique_with_counts, and then reshape the result back to the original size.
Thanks to @a_guest for sharing the idea.
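import tensorflow as tf

tf_a1 = tf.constant([[0, 3, 1, 22], [3, 5, 2, 2], [2, 6, 3, 13], [1, 7, 0, 3], [4, 9, 11, 10]])
token_idx = tf.reshape(tf_a1, [-1])  # flatten to 1-D
y, idx, count = tf.unique_with_counts(token_idx)
masked = tf.greater(count, 2)  # True for values repeated more than 2 times
backed_same_size = tf.gather(masked, idx)  # look up each element's flag
result = tf.reshape(backed_same_size, shape=tf.shape(tf_a1))  # back to 5x4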
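The same flatten, count, gather, reshape pipeline can be mirrored in NumPy to check it against the expected mask from the question. Here np.unique with return_inverse and return_counts plays the role of tf.unique_with_counts, and its inverse array corresponds to idx. This is a cross-check sketch rather than TensorFlow code, and it uses a strict > comparison because the question masks values repeated more than the threshold.

```python
import numpy as np

tf_a1 = np.array([[0, 3, 1, 22],
                  [3, 5, 2, 2],
                  [2, 6, 3, 13],
                  [1, 7, 0, 3],
                  [4, 9, 11, 10]])
threshold = 2

flat = tf_a1.reshape(-1)                     # like tf.reshape(tf_a1, [-1])
values, inverse, count = np.unique(flat, return_inverse=True, return_counts=True)
masked = count > threshold                   # like tf.greater(count, 2)
mask = masked[inverse].reshape(tf_a1.shape)  # like tf.gather + tf.reshape
print(mask)
```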

Related

Generalized version of np.roll

I have a 2D array
a = np.array([[0,1,2,3],[4,5,6,7]])
that is a 2x4 array. I need to shift the elements of each of the two rows (the slices along axis 0), but with different steps, say 1 for the first and 2 for the second, so that the output will be
np.array([[1,2,3,0],[6,7,4,5]])
With np.roll this doesn't seem possible; at least, I don't see any useful hint in the documentation. Is there another function that does this?
This is an attempt at a generalized version of numpy.roll.
import numpy as np

a = np.array([[0,1,2,3],[4,5,6,7]])

def roll(a, shifts, axis):
    assert a.shape[axis] == len(shifts)
    return np.stack([
        np.roll(np.take(a, i, axis), shifts[i]) for i in range(len(shifts))
    ], axis)

print(a)
print(roll(a, [-1, -2], 0))
print(roll(a, [1, 2, 1, 0], 1))
prints
[[0 1 2 3]
[4 5 6 7]]
[[1 2 3 0]
[6 7 4 5]]
[[4 1 6 3]
[0 5 2 7]]
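Here, the parameter a is a numpy.array, shifts is a sequence with one shift amount per slice along axis (so len(shifts) must equal a.shape[axis]), and axis selects the slices; each slice is then rolled with np.roll. Note that this was only tested on two-dimensional arrays, however: for higher-dimensional input each slice is itself multi-dimensional, and np.roll without an axis argument shifts its flattened elements.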
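For the 2-D, one-shift-per-row case in the question, the Python-level loop inside roll can also be replaced by a single advanced-indexing step. This is a sketch under that assumption (2-D input, each row shifted along axis 1 only); roll_rows is a hypothetical name, and the index arithmetic reproduces what np.roll does to each row.

```python
import numpy as np

def roll_rows(a, shifts):
    """Roll each row of a 2-D array by its own amount (hypothetical helper).

    Uses the same sign convention as np.roll: a positive shift moves
    elements towards higher column indices.
    """
    shifts = np.asarray(shifts)
    n_rows, n_cols = a.shape
    # np.roll places source column (j - shift) % n_cols at output column j.
    cols = (np.arange(n_cols)[None, :] - shifts[:, None]) % n_cols
    # Row indices of shape (n_rows, 1) broadcast against cols (n_rows, n_cols).
    return a[np.arange(n_rows)[:, None], cols]

a = np.array([[0, 1, 2, 3], [4, 5, 6, 7]])
print(roll_rows(a, [-1, -2]))
# [[1 2 3 0]
#  [6 7 4 5]]
```

Building the (n_rows, n_cols) index array costs memory proportional to a itself, which is usually an acceptable trade for avoiding one np.roll call per row.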

Slicing a 2D NumPy Array by all zero rows

This is essentially the 2D array equivalent of slicing a Python list into smaller lists at indexes that store a particular value. I'm running a program that extracts a large amount of data out of a CSV file and copies it into a 2D NumPy array. The basic format of these arrays is something like this:
[[0 8 9 10]
[9 9 1 4]
[0 0 0 0]
[1 2 1 4]
[0 0 0 0]
[1 1 1 2]
[39 23 10 1]]
I want to separate my NumPy array along rows that contain all zero values to create a set of smaller 2D arrays. The successful result for the above starting array would be the arrays:
[[0 8 9 10]
[9 9 1 4]]
[[1 2 1 4]]
[[1 1 1 2]
[39 23 10 1]]
I've thought about simply iterating down the array and checking whether each row is all zeros, but the data I'm handling is substantially large. I potentially have millions of rows of data in the text file, and I'm trying to find the most efficient approach rather than a loop that could waste computation time. What are your thoughts on what I should do? Is there a better way?
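Assuming a is your array, you can use any to find the rows that are not all zero, keep only those, and then use np.split at the positions where the all-zero rows used to be:
import numpy as np

# indices of rows that are not all zero
idx = np.flatnonzero(a.any(1))
# indices of all-zero rows
idx_zero = np.delete(np.arange(a.shape[0]), idx)
# keep the non-zero rows and split them at the removed rows' positions
# (subtracting arange accounts for the rows already removed before each one)
output = np.split(a[idx], idx_zero - np.arange(idx_zero.size))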
output:
[array([[ 0, 8, 9, 10],
[ 9, 9, 1, 4]]),
array([[1, 2, 1, 4]]),
array([[ 1, 1, 1, 2],
[39, 23, 10, 1]])]
You can use the np.all function to check for rows which are all zeros, and then index appropriately.
# assume `x` is your data
indices = np.all(x == 0, axis=1)
zeros = x[indices]
nonzeros = x[np.logical_not(indices)]
The all function accepts an axis argument (as do many NumPy functions), which indicates the axis along which to reduce. Here axis=1 reduces across the columns of each row, so you get back a boolean array of shape (x.shape[0],), which can be used to index x directly.
Note that this will be much faster than a for-loop over the rows, especially for large arrays.
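The boolean mask from np.all can also drive the splitting itself. In the sketch below (split_on_zero_rows is a hypothetical helper), np.cumsum over the mask gives every row the number of all-zero rows above it, which serves as a block label. This avoids the empty pieces that the np.split-based answer returns when all-zero rows are adjacent or sit at the start or end of the array.

```python
import numpy as np

def split_on_zero_rows(x):
    """Split x into blocks separated by all-zero rows (hypothetical helper).

    Leading, trailing, or consecutive all-zero rows produce no empty blocks.
    """
    zero = np.all(x == 0, axis=1)      # True for each all-zero row
    keep = x[~zero]
    if keep.shape[0] == 0:
        return []
    # Running count of zero rows seen so far = block label of each kept row.
    labels = np.cumsum(zero)[~zero]
    # Start a new block wherever the label changes between kept rows.
    cuts = np.flatnonzero(np.diff(labels)) + 1
    return np.split(keep, cuts)

x = np.array([[0, 8, 9, 10],
              [9, 9, 1, 4],
              [0, 0, 0, 0],
              [1, 2, 1, 4],
              [0, 0, 0, 0],
              [1, 1, 1, 2],
              [39, 23, 10, 1]])
for block in split_on_zero_rows(x):
    print(block)
```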

Numpy - Index last dimension of array with index array

I'm trying to index the last dimension of a 3D matrix with a matrix consisting of indices that I wish to keep.
I have a matrix of thrust values with shape:
(3, 3, 5)
I would like to filter the last index according to some criteria so that it is reduced from size 5 to size 1. I have already found the indices in the last dimension that fit my criteria:
[[0 0 1]
[0 0 1]
[1 4 4]]
What I want to achieve: for the first row and first column I want the 0th index of the last dimension. For the first row and third column I want the 1st index of the last dimension. In terms of the indices to keep, the final matrix will become a (3, 3) 2D matrix like this:
[[0,0,0], [0,1,0], [0,2,1];
[1,0,0], [1,1,0], [1,2,1];
[2,0,1], [2,1,4], [2,2,4]]
I'm pretty confident numpy can achieve this, but I'm unable to figure out exactly how. I'd rather not build a construction with nested for loops.
I have already tried:
minValidPower = totalPower[:, :, tuple(indexMatrix)]
But this results in a (3, 3, 3, 3) matrix, so I am not entirely sure how I'm supposed to approach this.
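To see where the (3, 3, 3, 3) shape comes from: when an integer index array is combined with plain slices, NumPy keeps the sliced axes and replaces the indexed axis with the full shape of the index array, so a (3, 3) index array in the last position gives (3, 3) + (3, 3). Pairing it with row and column index arrays that broadcast to (3, 3) instead selects one element per (row, column) pair. The array below is a made-up stand-in for totalPower, and the index matrix is used directly as an integer array.

```python
import numpy as np

total_power = np.arange(3 * 3 * 5).reshape(3, 3, 5)  # stand-in data, shape (3, 3, 5)
index_matrix = np.array([[0, 0, 1],
                         [0, 0, 1],
                         [1, 4, 4]])

# Slices on the first two axes: every (row, col) pair is combined with
# every entry of index_matrix.
print(total_power[:, :, index_matrix].shape)  # (3, 3, 3, 3)

# Index arrays on all three axes broadcast together to (3, 3).
rows = np.arange(3)[:, None]  # shape (3, 1)
cols = np.arange(3)[None, :]  # shape (1, 3)
print(total_power[rows, cols, index_matrix].shape)  # (3, 3)
```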
With a as the input array and idx as the index array, you can use np.take_along_axis:
np.take_along_axis(a,idx[...,None],axis=-1)[...,0]
Alternatively, with open grids (np.ogrid):
I,J = np.ogrid[:idx.shape[0],:idx.shape[1]]
out = a[I,J,idx]
You can build corresponding index arrays for the first two dimensions. Those would basically be:
[[0 1 2]
 [0 1 2]
 [0 1 2]]
and
[[0 0 0]
 [1 1 1]
 [2 2 2]]
You can construct these with the np.meshgrid function. In the example they are stored as m1 (row indices) and m2 (column indices):
import numpy as np

vals = np.arange(3*3*5).reshape(3, 3, 5)  # test sample
m1, m2 = np.meshgrid(range(3), range(3), indexing='ij')
m3 = np.array([[0, 0, 1], [0, 0, 1], [1, 4, 4]])
sel_vals = vals[m1, m2, m3]
The shape of the result matches the shape of the indexing arrays m1, m2 and m3.
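As a quick check on the three answers, the sketch below runs them on the same np.arange test sample and confirms they select identical (3, 3) results. Each element is vals[i, j, idx[i, j]], which for this particular sample equals 15*i + 5*j + idx[i, j].

```python
import numpy as np

vals = np.arange(3 * 3 * 5).reshape(3, 3, 5)  # same test sample as above
idx = np.array([[0, 0, 1],
                [0, 0, 1],
                [1, 4, 4]])

# 1) take_along_axis: idx[..., None] has shape (3, 3, 1), matching vals' ndim
out_take = np.take_along_axis(vals, idx[..., None], axis=-1)[..., 0]

# 2) open grids: I has shape (3, 1), J has shape (1, 3); both broadcast with idx
I, J = np.ogrid[:idx.shape[0], :idx.shape[1]]
out_ogrid = vals[I, J, idx]

# 3) full grids from meshgrid
m1, m2 = np.meshgrid(range(3), range(3), indexing='ij')
out_mesh = vals[m1, m2, idx]

print(out_take)
# [[ 0  5 11]
#  [15 20 26]
#  [31 39 44]]
print(np.array_equal(out_take, out_ogrid) and np.array_equal(out_take, out_mesh))  # True
```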

Iterating through matrix in python using numpy

I want to generate a resulting matrix by iterating through 5 different matrices. First, I want to take the first value of every matrix, average these values, and store the result as the first value of the resulting matrix, and so on for each position. Can anyone tell me how to do this in Python using the NumPy library?
In general you want to avoid (potentially slow) python-based looping and let numpy do (faster) c-based looping (or no looping at all).
Most people call this approach of removing explicit loops (NumPy) vectorization, which is usually very important for performance.
The following example creates 5 NumPy arrays of shape (3, 3) and computes a new matrix of the same shape containing all the averages (the element-wise mean over the matrix cells; we interpret the 2D arrays as matrices). The np.matrix type also exists, but it is more or less deprecated and not used here; most NumPy users should use arrays instead of matrices.
Code:
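import numpy as np
a, b, c, d, e = [np.random.randint(0, 5, size=(3,3)) for _ in range(5)]
# stack into shape (5, 3, 3); named 'stacked' to avoid shadowing the built-in all()
stacked = np.stack((a, b, c, d, e), axis=0)
print(stacked.shape)
x = np.mean(stacked, axis=0)  # element-wise mean over the 5 matrices
print(a)
print(b)
print(c)
print(d)
print(e)
print(x)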
Out:
(5, 3, 3)
[[0 0 0]
[0 1 0]
[2 4 0]]
[[4 2 0]
[3 3 4]
[0 4 0]]
[[3 4 0]
[2 2 1]
[0 0 4]]
[[3 1 2]
[4 3 4]
[2 0 3]]
[[3 4 2]
[3 1 0]
[1 0 0]]
[[ 2.6 2.2 0.8]
[ 2.4 2. 1.8]
[ 1. 1.6 1.4]]
If you still want to loop over a 2D array, you can use a nested loop like:
for row in range(array.shape[0]):
    for col in range(array.shape[1]):
        cell_value = array[row, col]
        ...  # do something with cell_value
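To connect the two answers, the sketch below fills the result with the nested loop and checks it against the vectorized np.mean(..., axis=0). The seeded np.random.default_rng generator is an addition for reproducibility and is not part of the original answer. With five 3x3 matrices the loop is perfectly fine; the vectorized form is the one that scales to large arrays.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the run reproducible
matrices = [rng.integers(0, 5, size=(3, 3)) for _ in range(5)]

# Explicit nested loop: average the values at each (row, col) position.
result = np.empty((3, 3))
for row in range(result.shape[0]):
    for col in range(result.shape[1]):
        result[row, col] = sum(m[row, col] for m in matrices) / len(matrices)

# Vectorized equivalent from the answer above.
vectorized = np.mean(np.stack(matrices, axis=0), axis=0)
print(np.allclose(result, vectorized))  # True
```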
