Removing low quality tensor predictions from softmax - python

I want to apply a filter to a tensor and remove values that do not meet my criteria. For example, let's say I have a tensor that looks like this:
softmax_tensor = [[ 0.05 , 0.05, 0.2, 0.7], [ 0.25 , 0.25, 0.3, 0.2 ]]
Right now, the classifier picks the argmax of the tensors to predict:
predictions = [[3],[2]]
But this isn't exactly what I want because I lose information about the confidence of that prediction. I would rather not make a prediction than make an incorrect one. So what I would like to do is return filtered tensors like so:
new_softmax_tensor = [[ 0.05 , 0.05, 0.2, 0.7]]
new_predictions = [[3]]
If this were straight-up python, I'd have no trouble:
new_softmax_tensor = []
new_predictions = []
for idx, listItem in enumerate(softmax_tensor):
    # get the two highest values and see if they are far enough apart
    M = max(listItem)
    M2 = max(n for n in listItem if n != M)
    if M - M2 > 0.3:  # just making up a criterion here
        new_softmax_tensor.append(listItem)
        new_predictions.append([listItem.index(M)])
but given that tensorflow works on tensors, I'm not sure how to do this - and if I did, would it break the computation graph?
A previous SO post suggested using tf.gather_nd, but in that scenario they already had a tensor that they wanted to filter on. I've also looked at tf.cond but still don't understand. I would imagine many other people would benefit from this exact same solution.
Thanks all.

Two things that I would do to solve your problem:
First, I would return the value of the softmax tensor. Find a reference to it (either keep the reference from when you create it, or retrieve it from the appropriate tensor collection), then evaluate it with sess.run([softmaxtensor, prediction], feed_dict=..). After that you can manipulate it in Python as much as you like.
Second, if you want to stay within the graph, I would use the built-in tf.where(), which works much like the np.where function from the NumPy package.

Ok. I've got it sorted out now. Here is a working example.
import tensorflow as tf
#Set dummy example tensor
original_softmax_tensor = tf.Variable([
#Set dummy prediction tensor
original_predictions = tf.Variable([3,3,4,4],name='original_predictions')
#Now create a place to store my new variables
new_softmax_tensor = original_softmax_tensor
new_predictions = original_predictions
#set my cutoff variable
min_diff = tf.constant(0.3)
init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)  # execute init_op
    # There's probably a better way to do this, but I had to do this hack to get
    # the difference between the top 2 scores
    tmp_diff1, _ = tf.nn.top_k(original_softmax_tensor, k=2, sorted=True)
    tmp_diff2, _ = tf.nn.top_k(original_softmax_tensor, k=1, sorted=True)
    # subtracting the top-2 scores from the max score makes the first column '0'
    actual_diff = tf.subtract(tmp_diff2, tmp_diff1)
    # The max value of each row is the gap between the top two scores
    actual_diff = tf.reduce_max(actual_diff, reduction_indices=[1])
    # Create a boolean tensor that says to keep or not
    cond_result = actual_diff > min_diff
    # Keep only the values I want
    new_predictions = tf.boolean_mask(original_predictions, cond_result)
    new_softmax_tensor = tf.boolean_mask(new_softmax_tensor, cond_result)
    # return these if this is in a function
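For comparison, here is a minimal NumPy sketch of the same top-2 margin filter, which can be handy for checking the graph version against known inputs. The function name filter_by_margin and the 0.3 default are illustrative assumptions, not part of the original posts. Unlike the plain-Python loop in the question, tied maxima are treated as a margin of zero here.

```python
import numpy as np

def filter_by_margin(probs, min_diff=0.3):
    """Keep only rows whose top score beats the runner-up by more than min_diff."""
    probs = np.asarray(probs, dtype=float)
    # After partitioning, the last two columns hold the two largest values per row
    # (second largest in column -2, largest in column -1).
    top2 = np.partition(probs, -2, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    keep = margin > min_diff
    # Return the surviving rows and their argmax predictions.
    return probs[keep], probs.argmax(axis=1)[keep]

softmax_tensor = [[0.05, 0.05, 0.2, 0.7], [0.25, 0.25, 0.3, 0.2]]
kept_probs, kept_preds = filter_by_margin(softmax_tensor)
print(kept_probs, kept_preds)
```

With the example from the question, only the first row survives, because its margin (0.5) exceeds the cutoff while the second row's margin (0.05) does not.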


How to build TF tensor with ones in specified locations - batch compatible

I apologize for the poor question title but I'm not sure quite how to phrase it. Here's the problem I'm trying to solve: I have two NNs working off of the same input dataset in my code. One of them is a traditional network while the other is used to limit the acceptable range of the first. This works by using a tf.where() statement which works fine in most cases, such as this toy example:
pcts= [0.04,0.06,0.06,0.06,0.06,0.06,0.06,0.04,0.04,0.04]
legal_actions = tf.where(pcts>=0.05, tf.ones_like(pcts), tf.zeros_like(pcts))
Which gives the correct result: legal_actions = [0,1,1,1,1,1,1,0,0,0]
I can then multiply this by the output of my first network to limit its Q values to only those of the legal actions. In a case like the above this works great.
However, it is also possible that my original vector looks something like this, with low values in the middle of the high values: pcts= [0.04,0.06,0.06,0.04,0.04,0.06,0.06,0.04,0.04,0.04]
Using the same code as above my legal_actions comes out as this: legal_actions = [0,1,1,0,0,1,1,0,0,0]
Based on the code I have this is correct, however, I'd like to include any zeros in the middle as part of my legal_actions. In other words, I'd like this second example to be the same as the first. Working in basic TF this is easy to do in several different ways, such as in this reproducible example (it's also easy to do with sparse tensors):
import tensorflow as tf
pcts= tf.placeholder(tf.float32, shape=(10,))
legal_actions = tf.where(pcts>=0.05, tf.ones_like(pcts), tf.zeros_like(pcts))
mask = tf.where(tf.greater(legal_actions,0))
legals = tf.cast(tf.range(tf.reduce_min(mask),tf.reduce_max(mask)+1),tf.int64)
oh = tf.one_hot(legals,10)
oh = tf.reduce_sum(oh,0)
with tf.Session() as sess:
    print(sess.run(oh, feed_dict={pcts: [0.04,0.06,0.06,0.04,0.04,0.06,0.06,0.04,0.04,0.04]}))
The problem that I'm running into is when I try to apply this to my actual code which is reading in batches from a file. I can't figure out a way to fill in the "gaps" in my tensor without the range function and/or I can't figure out how to make the range function work with batches (it will only make one range at a time, not one per batch, as near as I can tell). Any suggestions on how to either make what I'm working on work or how to solve the problem a completely different way would be appreciated.
Try this code:
import tensorflow as tf
pcts = tf.random.uniform((2,3,4))
a = pcts>=0.5
shape = tf.shape(pcts)[-1]
a = tf.reshape(a, (-1, shape))
a = tf.cast(a, dtype=tf.float32)
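def rng(t):
    # running max from the left and from the right; a position is legal only if both have seen a 1
    left = tf.scan(lambda a, x: tf.maximum(a, x), t)
    right = tf.scan(lambda a, x: tf.maximum(a, x), t, reverse=True)
    return tf.minimum(left, right)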
a = tf.map_fn(lambda x: rng(x), a)
a = tf.reshape(a, (tf.shape(pcts)))
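The idea behind the answer above (a running maximum from the left, a running maximum from the right, then the element-wise minimum of the two) can be sketched in NumPy to see what it does on the vectors from the question. The helper name fill_gaps is an assumption for illustration only; it works on any batch shape because it operates along the last axis.

```python
import numpy as np

def fill_gaps(mask):
    """Set to 1 every position between the first and last 1 along the last axis."""
    mask = np.asarray(mask, dtype=np.float32)
    # 1 from the first 1 onwards
    left = np.maximum.accumulate(mask, axis=-1)
    # 1 up to and including the last 1 (cumulative max over the reversed axis)
    right = np.maximum.accumulate(mask[..., ::-1], axis=-1)[..., ::-1]
    # a position is legal only if it lies inside both ranges
    return np.minimum(left, right)

pcts = np.array([
    [0.04, 0.06, 0.06, 0.04, 0.04, 0.06, 0.06, 0.04, 0.04, 0.04],
    [0.04, 0.06, 0.06, 0.06, 0.06, 0.06, 0.06, 0.04, 0.04, 0.04],
])
legal_actions = fill_gaps(pcts >= 0.05)
print(legal_actions)
```

Both rows come out as [0, 1, 1, 1, 1, 1, 1, 0, 0, 0], which is the behaviour the question asks for.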

Logical operation on the contents of a tensor

I have a list named 'datastate' of shape (?,10) which is getting filled with the results of 10 batch samples in tensorflow (all tensors). In other words, with a batch size of 256, this will be populated with 10 different tensors of size 256.
in pseudocode below ....
datastate = {}
for sample in range(num_samples):
    datastate[sample] = batch_results
What I would like to do next is define a variable like 'datastate_change', which would determine if the i-th record of batch_results was changed versus the (i-1)th record of batch_results. This might look something like the following if Pandas style syntax worked ... but I'm not clear on how to do this inside of tf during the loop:
for sample in range(num_samples):
    datastate[sample] = batch_results
    datastate_change[sample] = batch_results - batch_results.shift(1)
To be a bit more concrete, if a single instance of batch_results are [1,1,1,0,1] I would like to have datastate[1] = [1,1,1,0,1] and datastate_change[1] = [1,0,0,-1,1]
Found a satisfactory answer on my own. The key was that NumPy is a better analogue than pandas here.
First I create a copy of my datastate which is padded along the top with zeros
Then I slice off the bottom row of this copy
Lastly I subtract the two.
top_paddings = tf.constant([[1, 0]]) #New tensor with the 'top' being zeros
top_padded_datastate_[sample] = tf.pad(datastate[sample], top_paddings, "CONSTANT")
top_padded_datastate[sample] = top_padded_datastate_[sample][:-1]
datastate_changes[sample] = tf.subtract(datastate[sample], top_padded_datastate[sample])
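The pad, slice, subtract steps above can be sketched in NumPy on the concrete example from the question. The function name change_from_previous is hypothetical and only used for illustration.

```python
import numpy as np

def change_from_previous(values):
    """Difference between each record and the one before it (first record compared to 0)."""
    values = np.asarray(values)
    # pad one zero at the front, then drop the last element: a shift by one
    shifted = np.pad(values, (1, 0), mode="constant")[:-1]
    return values - shifted

batch_results = [1, 1, 1, 0, 1]
datastate_change = change_from_previous(batch_results)
print(datastate_change)
```

For [1, 1, 1, 0, 1] this gives [1, 0, 0, -1, 1], matching the desired output stated in the question.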

Apparently contradictory results from topk/sort and pick

I'm predicting roughly one of 100K possible outputs with a MXNet model, using a fairly standard softmax output. I want to compare the probability assigned to the true label versus the top predictions under the model. To get the former I'm using the pick operator; for the latter I've tried the cheap version (topk operator) and the expensive version (sort/argsort + slice).
In both cases I'm getting contradictory results. Specifically, there are numerous cases where the probability of the true label (retrieved with pick) is significantly higher than the highest probability output (retrieved with topk/sort). I think this means I'm doing something wrong but don't understand what. It does not happen for all predictions, but it does for a significant fraction.
Can anybody give me a hint as to what is going on?
Code follows:
for batch in data_iter:
    model.forward(batch, is_train=False)
    predictions = model.get_outputs()[0]
    labels = batch.label[0].as_in_context(predictions.context)
    # scores = mx.nd.topk(predictions, axis=1, k=6, ret_typ='value')
    scores = mx.nd.sort(predictions, axis=1, is_ascend=0)
    scores = mx.nd.slice_axis(scores, axis=1, begin=0, end=6)
    label_score = mx.nd.pick(predictions, labels, axis=1)
    equal = label_score.asnumpy() <= scores.asnumpy()[:, 0]
    if not np.all(equal):
        # I think this should never happen but it does frequently
        print("label score exceeds top score")
Testing with MXNet 1.1.0, the following code shows that the problem doesn't happen:
import mxnet as mx
import numpy as np
from mxnet import nd

for _ in range(10):
    predictions = nd.random.uniform(shape=(100, 100000))
    labels = nd.array(np.random.randint(0, 99999, size=(100, 1)))
    scores = mx.nd.sort(predictions, axis=1, is_ascend=0)
    scores = mx.nd.slice_axis(scores, axis=1, begin=0, end=6)
    label_score = mx.nd.pick(predictions, labels, axis=1)
    equal = label_score.asnumpy() <= scores.asnumpy()[:, 0]
    if not np.all(equal):
        print("label score exceeds top score")
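One way to narrow this down is to redo the comparison in plain NumPy on arrays obtained with asnumpy(), so that the pick and sort operators are taken out of the picture. The sketch below is only an assumption about how such a cross-check could look; find_violations is a hypothetical helper, and random data stands in for real model outputs.

```python
import numpy as np

def find_violations(predictions, labels, k=6):
    """Return row indices where the label's score exceeds the top score."""
    predictions = np.asarray(predictions)
    # flatten labels of shape (N,) or (N, 1) into integer indices
    labels = np.asarray(labels).astype(np.int64).reshape(-1)
    # top-k scores per row, in descending order
    top_scores = -np.sort(-predictions, axis=1)[:, :k]
    # score assigned to the true label of each row
    label_score = predictions[np.arange(predictions.shape[0]), labels]
    return np.nonzero(label_score > top_scores[:, 0])[0]

rng = np.random.default_rng(0)
predictions = rng.random((100, 1000))
labels = rng.integers(0, 1000, size=(100, 1))
violations = find_violations(predictions, labels)
print(violations)
```

If this check reports no violations on the real outputs while the MXNet version does, the discrepancy is more likely in how the labels or outputs are passed to the operators than in the model itself.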

TensorFlow Averaging with Dynamic Lengths

I am trying to do a Mean operation given the actual lengths of sequences. (Masking Zero vectors)
My inputs, sequence_outputs, have shape (batch_size, max_len, dimensions).
I have a tensor that stores the actual lengths of each sequence in the batch. I used the function from
def length(sequence):
    used = tf.sign(tf.reduce_max(tf.abs(sequence), reduction_indices=2))
    length = tf.reduce_sum(used, reduction_indices=1)
    length = tf.cast(length, tf.int64)
    return length
I do this:
lengths = length(sequence_outputs)
lengths = tf.cast(lengths, tf.float32)
lengths = tf.expand_dims(lengths,1)
sentence_outputs = tf.reduce_sum(sequence_outputs,1) / lengths
The graph compiles but I am getting NaN loss values. Furthermore my lengths become negative values when debugging with eval().
This seems to be a simple problem but I've been stuck on it for some time and would appreciate some help!
I see no issue. Your code is slightly over-complicated. The following code
import numpy as np
import tensorflow as tf
# creating data
B = 15
MAX_LEN = 10  # not given in the original post; assumed here so the snippet runs
data = np.zeros([B, MAX_LEN], dtype=np.float32)
for b in range(B):
    current_len = np.random.randint(2, MAX_LEN)
    current_vector = np.concatenate([np.random.randn(current_len), np.zeros(MAX_LEN - current_len)], axis=-1)
    print("{}\t\t{}".format(current_vector, current_vector.shape))
    data[b, ...] = current_vector
data_op = tf.convert_to_tensor(data)
def tf_length(x):
    assert len(x.get_shape().as_list()) == 2
    # count as float32 so the division below does not mix dtypes
    length = tf.count_nonzero(x, axis=1, keepdims=True, dtype=tf.float32)
    return length
# keepdims=True keeps the shape (B, 1) so it lines up with tf_length
x = tf.reduce_sum(data_op, axis=1, keepdims=True) / tf_length(data_op)
# test gradients
grads = tf.gradients(tf.reduce_mean(x), [data_op])
with tf.Session() as sess:
    x_val, grads_val = sess.run([x, grads])
    print(x_val)
runs perfectly fine here without any NaNs. Are you sure you are really using this code? If I had to guess, I would bet you forgot the tf.abs somewhere in your sequence length computation.
Be aware: your length function, as well as tf_length in this post, assumes non-zero values in the sequence! Calculating the sequence length should be the task of the data producer, and the result should be fed into the computation graph. Everything else I consider a hacky solution.
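As a sketch of the recommendation above, the mean can be computed from lengths supplied by the data producer instead of being inferred from zeros. The NumPy version below is an illustration under that assumption; masked_mean is a hypothetical name, and clamping the divisor to 1 is one possible way to avoid the 0/0 that produces NaN for empty sequences.

```python
import numpy as np

def masked_mean(sequence_outputs, lengths):
    """Mean over the time axis, counting only the first lengths[i] steps of each sequence."""
    sequence_outputs = np.asarray(sequence_outputs, dtype=np.float32)
    lengths = np.asarray(lengths)
    max_len = sequence_outputs.shape[1]
    # mask[i, t] is 1 for valid steps (t < lengths[i]) and 0 for padding
    mask = (np.arange(max_len)[None, :] < lengths[:, None]).astype(np.float32)
    summed = (sequence_outputs * mask[:, :, None]).sum(axis=1)
    # clamp to 1 so an empty sequence yields 0 instead of NaN
    safe_lengths = np.maximum(lengths, 1).astype(np.float32)[:, None]
    return summed / safe_lengths

batch = np.zeros((2, 3, 2), dtype=np.float32)  # (batch_size, max_len, dimensions)
batch[0, :2] = [[1.0, 2.0], [3.0, 4.0]]        # first sequence has 2 valid steps
means = masked_mean(batch, lengths=[2, 0])     # second sequence is empty
print(means)
```

The first row averages to [2, 3] and the empty second row to [0, 0], with no NaN in the result.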

How does one read TensorBoard histograms for a 1D example in TensorFlow?

I made the simplest 1D example for TensorBoard (tracking the minimization of a quadratic) but I get plots that don't make sense to me and I can't figure out why. Is it my own implementation or is TensorBoard buggy?
Here are the plots:
Usually I think of histograms as bar graphs that encode probability distributions (or frequency counts). I assume that the y-axis shows the values and the x-axis the count? Since my number of steps is 120, that seemed like a reasonable guess.
and Scalar plot:
Why is there a strange line going through my plots?
The code that produced it (you should be able to copy paste it and run it):
## run cmd to collect model: python --logdir=/tmp/playground_tmp
## show board on browser run cmd: tensorboard --logdir=/tmp/playground_tmp
## browser: http://localhost:6006/
import tensorflow as tf
# x variable
x = tf.Variable(10.0,name='x')
# b placeholder (simulates the "data" part of the training)
b = tf.placeholder(tf.float32)
# make model (1/2)(x-b)^2
xx_b = 0.5*tf.pow(x-b,2)
y = xx_b
learning_rate = 1.0
# get optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate)
# gradient variable list = [ (gradient,variable) ]
gv = opt.compute_gradients(y,[x])
# transformed gradient variable list = [ (T(gradient),variable) ]
decay = 0.9 # decay the gradient for the sake of the example
# apply transformed gradients
tgv = [ (decay*g, v) for (g,v) in gv] #list [(grad,var)]
apply_transform_op = opt.apply_gradients(tgv)
# track value of x
x_scalar_summary = tf.scalar_summary("x", x)
x_histogram_sumarry = tf.histogram_summary('x_his', x)
with tf.Session() as sess:
    merged = tf.merge_all_summaries()
    tensorboard_data_dump = '/tmp/playground_tmp'
    writer = tf.train.SummaryWriter(tensorboard_data_dump, sess.graph)
    sess.run(tf.initialize_all_variables())
    epochs = 120
    for i in range(epochs):
        b_val = 1.0  # fake data (in SGD it would be different on every epoch)
        # applies the gradients
        [summary_str_apply_transform, _] = sess.run([merged, apply_transform_op], feed_dict={b: b_val})
        writer.add_summary(summary_str_apply_transform, i)
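For reference, the update applied by the code above can be reproduced without TensorFlow, which gives a rough idea of what a single clean run should look like in the scalar plot. This is a plain-Python sketch; the function name simulate is an assumption, while the constants are taken from the code in the question.

```python
def simulate(x0=10.0, b=1.0, learning_rate=1.0, decay=0.9, epochs=120):
    """Gradient descent on 0.5 * (x - b)**2 with the gradient scaled by decay."""
    xs = [x0]
    x = x0
    for _ in range(epochs):
        grad = x - b                             # d/dx of 0.5 * (x - b)^2
        x = x - learning_rate * decay * grad     # transformed gradient step
        xs.append(x)
    return xs

xs = simulate()
print(xs[:4], xs[-1])
```

The values drop from 10 to approximately 1.9, 1.09, 1.009 and then settle at 1, so a single run should produce one monotone curve; extra lines in the plot would then come from something other than the optimization itself.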
I ran into the same problem, with multiple lines appearing in the Instance tab in TensorBoard. (When I tried your code, TensorBoard showed the duplicate-graph warning below and presented only one curve, which is better than what I got with my own code.)
WARNING:tensorflow:Found more than one graph event per run. Overwriting the graph with the newest event.
Nevertheless, the solution is the same as the one @Olivier Moindrot mentioned: delete the old logs. TensorBoard may sometimes cache results, so you may also want to restart the TensorBoard service.
The way to make sure we present the newest summary, as the MNIST example shows, is to log to a new folder:
if tf.gfile.Exists(FLAGS.summaries_dir):
    tf.gfile.DeleteRecursively(FLAGS.summaries_dir)
tf.gfile.MakeDirs(FLAGS.summaries_dir)
Link to full source, with TF version r0.10:
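The same "start from an empty log folder" idea can be sketched with only the Python standard library, for cases where tf.gfile is not at hand. The helper name fresh_log_dir is hypothetical, and the temporary directory below only stands in for a real summaries directory.

```python
import os
import shutil
import tempfile

def fresh_log_dir(path):
    """Delete any old event files under path and recreate it empty."""
    if os.path.exists(path):
        shutil.rmtree(path)
    os.makedirs(path)
    return path

# Usage: a temporary directory stands in for the real summaries directory.
root = tempfile.mkdtemp()
log_dir = os.path.join(root, "playground_tmp")
os.makedirs(log_dir)
open(os.path.join(log_dir, "old_events"), "w").close()  # pretend a stale log exists
fresh_log_dir(log_dir)
print(os.listdir(log_dir))
```

Calling this before creating the summary writer means each run starts from a clean directory, so stale event files from earlier runs are not drawn alongside the new curve.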
