I get a TypeError: Failed to convert object. Is there some way to do tf.not_equal() or an equivalent on a sparse tensor? It must stay sparse; converting to dense is not permitted.
Supposing you want to compare two sparse tensors containing numbers, I think it's easiest to subtract one from the other and keep the resulting non-zero values as "True" with tf.sparse_retain(). DomJack's answer only works if you want to compare a sparse tensor to a constant, but that case is much easier with tf.sparse_retain(), as in the function sparse_not_equal_to_constant() below.

(Please note this is not an exact not_equal operation, because it only tests the stored values for inequality. Since the non-listed elements of a sparse tensor are zero, if the constant we're comparing against is itself non-zero, then the rest of the matrix should also be marked as not equal. That's best handled when converting back to dense, with the default_value parameter, while considering where the matrix had stored values to start with.)

Tested code for comparing two sparse tensors, including a function to compare one to a constant:
import tensorflow as tf
import numpy as np

def sparse_not_equal_to_constant(s, c):
    a = tf.sparse_retain(s, tf.not_equal(c, s.values))
    return tf.SparseTensor(
        a.indices,
        tf.ones_like(a.values, dtype=tf.bool),
        dense_shape=s.dense_shape)

def sparse_not_equal(a, b):
    neg_b = tf.SparseTensor(b.indices, -b.values, dense_shape=b.dense_shape)
    difference = tf.sparse_add(a, neg_b)
    return sparse_not_equal_to_constant(difference, 0.0)

# test data
a = tf.SparseTensor([[0, 0], [1, 4], [2, 3]], [5.0, 6, 7], dense_shape=(5, 5))
b = tf.SparseTensor([[0, 0], [0, 2], [2, 3]], [5.0, 6, 2], dense_shape=(5, 5))

e = sparse_not_equal(a, b)
f = tf.sparse_tensor_to_dense(e, default_value=False)

with tf.Session() as sess:
    print(sess.run(f))
Outputs:
[[False False True False False]
[False False False False True]
[False False False True False]
[False False False False False]
[False False False False False]]
as expected.
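Regarding the caveat above about comparing to a non-zero constant: here is a minimal sketch (same TF 1.x API; the helper name is mine) of an exact not_equal against a constant. It keeps every stored position explicitly, so stored values equal to c densify to False, and lets default_value cover the implicit zeros:

def sparse_not_equal_to_constant_exact(s, c):
    # Keep ALL stored positions, marking each True/False explicitly,
    # so a stored value equal to c densifies to False rather than default_value.
    neq = tf.SparseTensor(s.indices, tf.not_equal(s.values, c), dense_shape=s.dense_shape)
    # Non-stored elements are implicitly zero, so they differ from c exactly when c != 0.
    return tf.sparse_tensor_to_dense(neq, default_value=(c != 0))

Here default_value is a plain Python bool computed from the constant, which is what "considering where the matrix had values to start with" boils down to in this case.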
I think you'll have to operate on the indices/values independently.
import numpy as np
import tensorflow as tf

def sparse_not_equal(sparse_tensor, value):
    indices = sparse_tensor.indices
    values = sparse_tensor.values
    condition = tf.squeeze(tf.where(tf.not_equal(values, value)), axis=-1)
    indices = tf.gather(indices, condition)
    values = tf.ones(shape=(tf.shape(indices)[0],), dtype=tf.bool)
    return tf.SparseTensor(
        indices,
        values,
        sparse_tensor.dense_shape)

def get_sparse():
    vals = tf.constant([2, 3, 4, 2])
    indices = tf.constant(np.array([[1], [4], [5], [10]]))
    dense_shape = [16]
    return tf.SparseTensor(indices, vals, dense_shape)

sparse_tensor = get_sparse()
sparse_filtered = sparse_not_equal(sparse_tensor, 2)

with tf.Session() as sess:
    s = sess.run(sparse_filtered)
    print(s)
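For reference, only the entries not equal to 2 survive, so the printed SparseTensorValue should have indices [[4], [5]], values [True, True], and dense_shape [16].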
Assume that I have a numpy array A with n dimensions, which might be very large, and assume that I have k 1-dimensional boolean masks M1, ..., Mk.
I would like to extract from A an n-dimensional array B which contains all the elements of A located at indices where the "outer-AND" of all the masks is True.
...but I would like to do this without first forming the (possibly very large) "outer-AND" of all the masks, and without having to extract the specified elements one axis at a time, creating (possibly many) intermediate copies in the process.
The example below demonstrates the two ways of extracting the elements from A just described:
from functools import reduce
import numpy as np

m = 100

for _ in range(m):
    n = np.random.randint(0, 10)
    k = np.random.randint(0, n + 1)
    A_shape = tuple(np.random.randint(0, 10, n))
    A = np.random.uniform(-1, 1, A_shape)
    M_lst = [np.random.randint(0, 2, dim).astype(bool) for dim in A_shape]

    # creating shape of B:
    B_shape = tuple(map(np.count_nonzero, M_lst)) + A_shape[len(M_lst):]
    # size of B:
    B_size = np.prod(B_shape)

    # --- USING "OUTER-AND" OF ALL MASKS --- #
    # creating "outer-AND" of all masks:
    M = reduce(np.bitwise_and, (np.expand_dims(M, tuple(np.r_[:i, i+1:n])) for i, M in enumerate(M_lst)), True)
    # extracting elements from A and reshaping to the correct shape:
    B1 = A[M].reshape(B_shape)
    # checking that the correct number of elements was extracted
    assert B1.size == B_size
    # THE PROBLEM WITH THIS METHOD IS THE POSSIBLY VERY LARGE OUTER-AND OF ALL THE MASKS!

    # --- USING ONE MASK AT A TIME --- #
    B2 = A
    for i, M in enumerate(M_lst):
        B2 = B2[tuple(slice(None) for _ in range(i)) + (M,)]
    assert B2.size == np.prod(B_shape)
    assert B2.shape == B_shape
    # THE PROBLEM WITH THIS METHOD IS THE POSSIBLY LARGE NUMBER OF POSSIBLY LARGE INTERMEDIATE COPIES!

    assert np.all(B1 == B2)

    # EDIT 1:
    # USING np.ix_ AS SUGGESTED BY Chrysophylaxs
    i = np.ix_(*M_lst)
    B3 = A[i]
    assert B3.shape == B_shape
    assert B3.size == B_size
    assert np.prod(list(map(np.size, i))) == B_size

print(f'All three methods worked all {m} times')
Is there a smarter (more efficient) way to do this, possibly using an existing numpy function?
IIUC, you're looking for np.ix_; an example:
import numpy as np
arr = np.arange(60).reshape(3, 4, 5)
x = [True, False, True]
y = [False, True, True, False]
z = [False, True, False, True, False]
out = arr[np.ix_(x, y, z)]
out:
array([[[ 6, 8],
[11, 13]],
[[46, 48],
[51, 53]]])
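Under the hood, np.ix_ converts each boolean mask to its integer indices and reshapes them into an "open mesh" that broadcasts against the others, so the full outer-AND mask is never materialized. A minimal sketch of the equivalent manual indexing, assuming the same arr, x, y, z as above:

ix, iy, iz = np.flatnonzero(x), np.flatnonzero(y), np.flatnonzero(z)
out2 = arr[ix[:, None, None], iy[None, :, None], iz[None, None, :]]
assert np.array_equal(out, out2)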
I don't fully understand how I should use tf.gather_nd() to pick elements along some axis of a multi-dimensional tensor. Let's take a small example (if I get an answer for this simple example, it also solves my more complex original problem). Say I have an RGB image and I am trying to pick the smallest pixel value along the channels (the last dimension if the data order is (B,H,W,C)). I know this can be done with tf.reduce_min(x, axis=-1), but I would like to know whether it is also possible to do the same thing with tf.argmin() and tf.gather_nd()?
from skimage import data
import tensorflow as tf
import numpy as np
# Load RGB image from skimage, cast it to float32 and put it in order (B,H,W,C)
image = data.astronaut()
image = tf.cast(image, tf.float32)
image = tf.expand_dims(image, axis=0)
# Take minimum pixel value of each channel in a way number 1
min_along_channels_1 = tf.reduce_min(image, axis=-1)
# Take minimum pixel value of each channel in a way number 2
# The goal is that min_along_channels_1 is equal to min_along_channels_2
idxs = tf.argmin(image, axis=-1)
min_along_channels_2 = tf.gather_nd(image, idxs) # This line gives error :(
You will have to use tf.meshgrid, which creates a rectangular grid from two one-dimensional arrays representing the indices along the first and second dimensions, since tf.gather_nd needs to know exactly where to extract values across all dimensions. Here is a simplified example:
import tensorflow as tf
image = tf.random.normal((1, 4, 4, 3))
image = tf.squeeze(image, axis=0)
idx = tf.argmin(image, axis=-1)
ij = tf.stack(tf.meshgrid(
tf.range(image.shape[0], dtype=tf.int64),
tf.range(image.shape[1], dtype=tf.int64),
indexing='ij'), axis=-1)
gather_indices = tf.concat([ij, tf.expand_dims(idx, axis=-1)], axis=-1)
result = tf.gather_nd(image, gather_indices)
print('First option -->', tf.reduce_min(image, axis=-1))
print('Second option -->', result)
First option --> tf.Tensor(
[[-0.53245485 -0.29117298 -0.64434254 -0.8209638 ]
[-0.9386176 -0.5993224 -0.597746 -1.5392851 ]
[-0.5478666 -1.5280861 -1.0344954 -1.920418 ]
[-0.5580688 -1.425873 -1.9276617 -1.0668412 ]], shape=(4, 4), dtype=float32)
Second option --> tf.Tensor(
[[-0.53245485 -0.29117298 -0.64434254 -0.8209638 ]
[-0.9386176 -0.5993224 -0.597746 -1.5392851 ]
[-0.5478666 -1.5280861 -1.0344954 -1.920418 ]
[-0.5580688 -1.425873 -1.9276617 -1.0668412 ]], shape=(4, 4), dtype=float32)
Or with your example:
from skimage import data
import tensorflow as tf
import numpy as np
image = data.astronaut()
image = tf.cast(image, tf.float32)
image = tf.expand_dims(image, axis=0)
min_along_channels_1 = tf.reduce_min(image, axis=-1)
image = tf.squeeze(image, axis=0)
idx = tf.argmin(image, axis=-1)
ij = tf.stack(tf.meshgrid(
tf.range(image.shape[0], dtype=tf.int64),
tf.range(image.shape[1], dtype=tf.int64),
indexing='ij'), axis=-1)
gather_indices = tf.concat([ij, tf.expand_dims(idx, axis=-1)], axis=-1)
min_along_channels_2 = tf.gather_nd(image, gather_indices)
print(tf.equal(min_along_channels_1, min_along_channels_2))
tf.Tensor(
[[[ True True True ... True True True]
[ True True True ... True True True]
[ True True True ... True True True]
...
[ True True True ... True True True]
[ True True True ... True True True]
[ True True True ... True True True]]], shape=(1, 512, 512), dtype=bool)
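Incidentally, on newer TF 2.x versions you can skip building the meshgrid by passing batch_dims to tf.gather_nd; a sketch, assuming the same squeezed image and idx as in the first example above:

alternative = tf.gather_nd(image, tf.expand_dims(idx, axis=-1), batch_dims=2)
print(tf.reduce_all(tf.equal(alternative, tf.reduce_min(image, axis=-1))))  # expect True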
Take two arrays of arbitrary shape, where each dimension of the second is less than or equal to the corresponding dimension of the first. For example:
np.random.seed(8675309)
a = np.random.choice(10, 3**3).reshape(3,3,3)
b = np.zeros(2**3).reshape(2,2,2)
What I want is the following:
c = a[:b.shape[0], :b.shape[1], :b.shape[2]]
but for an array b with arbitrary shape, potentially with fewer dimensions. How could I do this programmatically? Such that
def reference_slicer(a, b):
    ???
    return c
reference_slicer(a,b) == c
You mean something like this?
def reference_slicer(a, b):
    index = [slice(0, dim) for dim in b.shape]
    for i in range(len(b.shape), len(a.shape)):
        index.append(slice(0, a.shape[i]))
    return a[tuple(index)]  # index with a tuple; indexing with a plain list is deprecated

reference_slicer(a, b) == c
# array([[[ True,  True],
#         [ True,  True]],
#        [[ True,  True],
#         [ True,  True]]])
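A more compact variant of the same idea (a sketch relying on the fact that axes omitted from the index are taken in full, so the trailing slices are unnecessary):

def reference_slicer(a, b):
    # one slice per dimension of b; any remaining axes of a are kept whole
    return a[tuple(slice(dim) for dim in b.shape)]

(reference_slicer(a, b) == c).all()  # True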
I have an ndarray of shape [batch_size, seq_len, num_features]. However, some of the elements at the end of the sequential dimension are not needed, so I want to drop them and merge the sequential dimension into the batch dimension. For example, the ndarray a I want to manipulate is
batch_size = 2
seq_len = 3
num_features = 1
a = np.random.randn(batch_size, seq_len, num_features)
mask = np.ones((batch_size, seq_len), dtype=bool)  # np.bool is deprecated; use the builtin bool
mask[0][1:] = 0
mask[1][2:] = 0
"""
>>> a = [[[-0.3908401 ]
[ 0.89686512]
[ 0.07594243]]
[[-0.12256737]
[-1.00838131]
[ 0.56543754]]]
mask=[[ True False False]
[ True True False]]
"""
where mask is used to indicate whether the elements in a are useful. I can get what I want using the following code
res = []
for seq, m in zip(a, mask):
    res.append(seq[:sum(m)])
np.concatenate(res, axis=0)
"""
>>> array([[-0.3908401 ],
       [-0.12256737],
       [-1.00838131]])
"""
I'm wondering if there is a more elegant way to do this in numpy?
Not sure if this is what you're asking, but the result looks right:
res = a[mask]
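A boolean mask spanning the first two axes picks out the True positions in row-major order, so for the example data this matches the loop-based result; a quick check, assuming the same a and mask as in the question:

assert np.array_equal(a[mask], np.concatenate([seq[:m.sum()] for seq, m in zip(a, mask)], axis=0))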
Since the batch and seq dimensions are going to be merged, you can reshape a into a 2D array of shape (batch_size * seq_len, num_features) and mask into a 1D array of length batch_size * seq_len.
Next, simply filter the useful samples using a boolean index. See the code:
mask2d = mask.reshape(-1) # or mask.ravel()
a2d = a.reshape(-1, num_features)
result = a2d[mask2d]
I have a tensor of shape Nx7, which looks something like this:
[[ 0.97863993  0.64479575 -0.202357    0.94678476  0.0080051   0.44507797  0.47864 ]
 [ 0.05914348 -0.72649432  0.193803    0.47295245  0.8381458   0.30449861  0.46783 ]]
I have another tensor of the same shape, which is a boolean mask:
[[ True False  True  True False  True False]
 [False  True False False  True False False]]
I want to get the argmax of each row in the first tensor, but only of those elements for which the mask is True, so basically the argmax of the following array:
[[ 0.97863993      X      -0.202357    0.94678476      X      0.44507797      X  ]
 [     X      -0.72649432      X           X       0.8381458       X          X  ]]
Which should thus become:
[0
4]
Is this possible in TensorFlow? I am trying to figure it out with tf.boolean_mask, but I don't see how to deal with different rows having differing numbers of True values in the mask.
Input code in TF:
mask = tf.placeholder(shape=[None, 7], dtype=tf.bool)
val = tf.placeholder(shape=[None, 7], dtype=tf.float32)
arg_max = ???
Note that I want negative values to be handled correctly as well (otherwise the method proposed by Ishant Mrinal would work).
Convert the boolean array into a ±1 float array (True maps to +1, False to -1):
# mask = tf.placeholder(shape=[None, 7], dtype=tf.bool)
# mask = tf.cast(mask, dtype=tf.float32)
mask = tf.placeholder(shape=[None, 7], dtype=tf.float32)
val = tf.placeholder(shape=[None, 7], dtype=tf.float32)
argmax = tf.argmax(tf.multiply(val, mask), axis=1)
sess.run(argmax, {val: your_val_array, mask: 2 * mask_bool_array.astype(float) - 1})
To emulate a masked argmax, you can set values outside of the mask to -inf, for example like this:
masked_val = tf.minimum(val, (2 * tf.to_float(mask) - 1) * np.inf)
masked_arg_max = tf.argmax(masked_val, axis=1)
Alternatively, to compute masked_val, you could use
masked_val = tf.where(mask, val, -tf.ones_like(val) * np.inf)
which is arguably clearer, but may waste memory.
For a masked argmin, you would do the opposite:
masked_val = tf.maximum(val, (1 - 2 * tf.to_float(mask)) * np.inf)
masked_arg_min = tf.argmin(masked_val, axis=1)
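For the data in the question, a quick check of the masked argmax (a sketch in the same TF 1.x style as the placeholders above):

import numpy as np
import tensorflow as tf

val = tf.placeholder(shape=[None, 7], dtype=tf.float32)
mask = tf.placeholder(shape=[None, 7], dtype=tf.bool)
# True positions keep their value (min with +inf); False positions become -inf
masked_val = tf.minimum(val, (2 * tf.to_float(mask) - 1) * np.inf)
masked_arg_max = tf.argmax(masked_val, axis=1)

val_np = np.array([
    [0.97863993, 0.64479575, -0.202357, 0.94678476, 0.0080051, 0.44507797, 0.47864],
    [0.05914348, -0.72649432, 0.193803, 0.47295245, 0.8381458, 0.30449861, 0.46783]],
    dtype=np.float32)
mask_np = np.array([
    [True, False, True, True, False, True, False],
    [False, True, False, False, True, False, False]])

with tf.Session() as sess:
    print(sess.run(masked_arg_max, {val: val_np, mask: mask_np}))  # prints [0 4]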