Only evaluate non-zero values of tf.Tensor - python

I am trying to train a Neural Network using Keras, and I am using my own metric function as the loss function. The reason for this is that the actual values in the test set contain a lot of NaN values. Let me give an example of the actual values in the test set:
12
NaN
NaN
NaN
8
NaN
NaN
3
In the preprocessing of my data, I replaced all the NaN values with zeros, so the above example contains zeros on each NaN row.
The Neural Network produces an output like this:
14
12
9
9
8
7
6
3
I only want to calculate the root mean squared error between the non-zero values. So for the example above, it should only calculate the RMSE for rows 1, 5 and 8. To do this, I created the following function:
from sklearn.metrics import mean_squared_error
from math import sqrt
[...]
def evaluation_metric(y_true, y_pred):
    # keep only the entries where the ground truth is non-zero
    mask = np.nonzero(y_true)
    y_true = y_true[mask]
    y_pred = y_pred[mask]
    error = sqrt(mean_squared_error(y_true, y_pred))
    return error
When I test the function by hand, feeding it the actual values from the test set and the output of a neural network initialized with random weights, it works well and produces an error value. I am able to optimize this error measure by adjusting the weights of the network with an evolutionary approach.
Now, I want to train the network with evaluation_metric as the loss function using the model.compile function from Keras. When I run:
model.compile(loss=evaluation_metric, optimizer='rmsprop', metrics=[evaluation_metric])
I get the following error:
TypeError: Using a tf.Tensor as a Python bool is not allowed. Use if t is not None: instead of if t: to test if a tensor is defined, and use TensorFlow ops such as tf.cond to execute subgraphs conditioned on the value of a tensor.
I think this has to do with the usage of np.nonzero. Since I am working with Keras, I should probably use a function from the Keras backend, or use something like tf.cond to check for the non-zero values of y_true.
Can someone help me with this?
EDIT
The code works after applying the following fix:
def evaluation_metric(y_true, y_pred):
    y_true = y_true * (y_true != 0)
    y_pred = y_pred * (y_true != 0)
    error = root_mean_squared_error(y_true, y_pred)
    return error
Along with the following function for calculating the RMSE of a tf object:
from keras import backend as K

def root_mean_squared_error(y_true, y_pred):
    return K.sqrt(K.mean(K.square(y_pred - y_true), axis=-1))
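For completeness, here is a sketch of a masked RMSE written entirely with Keras backend ops; this is an assumption-based variant (not the original fix) that builds an explicit float mask and averages only over the non-zero entries:
def masked_rmse(y_true, y_pred):
    # mask is 1.0 where the ground truth is non-zero, 0.0 elsewhere
    mask = K.cast(K.not_equal(y_true, 0), K.floatx())
    squared_error = K.square((y_pred - y_true) * mask)
    # divide by the number of non-zero entries rather than the full length
    return K.sqrt(K.sum(squared_error) / K.maximum(K.sum(mask), 1.0))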

Yes, indeed the problem lies in using numpy functions. Here is a quick fix:
def evaluation_metric(y_true, y_pred):
    y_true = y_true * (y_true != 0)
    y_pred = y_pred * (y_true != 0)
    error = sqrt(mean_squared_error(y_true, y_pred))
    return error

I would write the metric in TensorFlow myself, like this:
import tensorflow as tf
import numpy as np

data = np.array([0, 1, 2, 0, 0, 3, 7, 0]).astype(np.float32)
pred = np.random.randn(8).astype(np.float32)
gt = np.random.randn(8).astype(np.float32)

data_op = tf.convert_to_tensor(data)
pred_op = tf.convert_to_tensor(pred)
gt_op = tf.convert_to_tensor(gt)

expected = np.sqrt(((gt[data != 0] - pred[data != 0]) ** 2).mean())

def nonzero_mean(gt_op, pred_op, data_op):
    # mask is 1 where data is non-zero, 0 elsewhere
    mask_op = 1 - tf.cast(tf.equal(data_op, 0), tf.float32)
    actual_op = ((gt_op - pred_op) * mask_op) ** 2
    # average only over the non-zero positions
    actual_op = tf.reduce_sum(actual_op) / tf.cast(tf.count_nonzero(mask_op), tf.float32)
    actual_op = tf.sqrt(actual_op)
    return actual_op

with tf.Session() as sess:
    actual = sess.run(nonzero_mean(gt_op, pred_op, data_op))
    print(actual, expected)
The y_true != 0 comparison is not possible in plain TensorFlow. I am not sure if Keras does some magic here.
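As a further sketch of my own (an assumption, not part of the original answer), the same masking can be written in plain TensorFlow with tf.boolean_mask, which drops the zero positions before the reduction:
def nonzero_rmse(y_true_op, y_pred_op):
    # keep only the positions where the ground truth is non-zero
    mask_op = tf.not_equal(y_true_op, 0)
    true_nz = tf.boolean_mask(y_true_op, mask_op)
    pred_nz = tf.boolean_mask(y_pred_op, mask_op)
    return tf.sqrt(tf.reduce_mean(tf.square(true_nz - pred_nz)))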

Related

Creating BLEU loss method on tensorflow gives "No gradient provided"

I need to build a custom loss method based on BLEU. I'm passing my LabelEncoder in the constructor so I can inverse-transform labels and predictions and calculate the BLEU distance.
Here is my Loss class
class CIMCodeSuccessiveLoss(Loss):
    def __init__(self, labelEncoder: LabelEncoder):
        super().__init__()
        self.le = labelEncoder

    def bleu_score(self, true_label, pred_label):
        cim_true_label = self.le.inverse_transform(true_label.numpy())
        cim_pred_label = self.le.inverse_transform(pred_label.numpy())
        bleu_scores = [sentence_bleu(list(one_true_label),
                                     list(one_pred_label),
                                     weights=(0.5, 0.25, 0.125, 0.125))
                       for one_true_label, one_pred_label in zip(cim_true_label, cim_pred_label)]
        return np.float32(bleu_scores)

    def call(self, y_true, y_pred):
        labeled_y_pred = tf.cast(tf.argmax(y_pred, axis=-1), tf.int32)
        bleu = tf.py_function(self.bleu_score, (tf.reshape(y_true, [-1]), labeled_y_pred), tf.float32)
        return tf.reduce_sum(tf.square(1 - bleu))
The bleu_score method calculates the correct scores and returns a NumPy array.
When I try to return the squared sum, I get this error:
raise ValueError(f"No gradients provided for any variable: {variable}.
I'm also providing the model:
inputs = tf.keras.Input(shape=(1,), dtype=tf.string)
x = vectorize_layer(inputs)
x = Embedding(vocab_size, embedding_dim, name="embedding")(x)
x = LSTM(units=32, name="lstm")(x)
outputs = Dense(classes_number, name="classification")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="first_cim_classifier")
model.summary()

# we add early stopping for our model.
early_stopping = EarlyStopping(monitor='loss', patience=2)

model.compile(
    loss=CIMCodeSuccessiveLoss(le),
    optimizer=tf.keras.optimizers.Adam(),
    metrics=["accuracy", "crossentropy"],
    run_eagerly=True)

trained_model = model.fit(np.array(x_train), np.array(y_train), batch_size=64, epochs=10,
                          validation_data=(np.array(x_val), np.array(y_val)),
                          callbacks=[early_stopping])
Any help is appreciated. Thanks in advance.
To calculate the loss you use tf.argmax(y_pred, axis=-1). argmax is not differentiable, so automatic differentiation cannot compute gradients through it. You have to remove this method; for example (depending on your data), you can change the output layer to softmax and the labels to one-hot.
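A hedged sketch of that suggestion, reusing classes_number and the integer labels y_train from the question (everything else is assumed):
# sketch: softmax output layer plus one-hot labels, so no argmax is needed in the loss
outputs = Dense(classes_number, activation="softmax", name="classification")(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs, name="first_cim_classifier")
y_train_onehot = tf.one_hot(np.array(y_train), depth=classes_number)
model.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])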
The issue is that the argmax function is not differentiable, which is problematic when including it in a loss function:
labeled_y_pred = tf.cast(tf.argmax(y_pred, axis=-1), tf.int32)
One way to work around this is to use a differentiable approximation of the argmax function, similar to the smooth maximum function:
argmax(x) ≈ sum_i [ i * exp(β * x_i) ] / sum_j exp(β * x_j)
As β approaches infinity, this approaches the true argmax. For your purposes, β=10 or β=100 should accomplish your goals.
In Tensorflow, this could be accomplished as follows:
def differentiable_argmax_approx(x, beta=10, axis=None):
    # softmax-weighted index: approximates argmax(x) along the given axis
    return tf.reduce_sum(tf.cumsum(tf.ones_like(x)) * tf.exp(beta * x)
                         / tf.reduce_sum(tf.exp(beta * x), axis=axis), axis=axis) - 1
Then changing the original line to:
labeled_y_pred = tf.cast(differentiable_argmax_approx(y_pred, axis=-1), tf.int32)
We can verify the functionality with a simple test case:
beta = 10
x = np.array([1, 2, 3, 10, 4, 5], dtype=np.float)
y = differentiable_argmax_approx(x, beta)
assert x.argmax() == y
One caveat to this approach: if the maximum value is not unique along the axis that we're applying the function to, the result will be the arithmetic mean of the indices. Providing another test case to illustrate:
beta = 10
x = np.array([1, 2, 10, 3, 10], dtype=np.float)
y = differentiable_argmax_approx(x, beta)
assert y == 3
The result is 3 here, because we have two occurrences of the maximum value (10): one at index 2, and the other at index 4. In contrast, the regular argmax function returns the first index of the maximum argument.
Another improvement would be moving more computation into Tensorflow functions. To start, instead of using sklearn's LabelEncoder, to apply a mapping in the loss function, you could use a tf.lookup.StaticHashTable to accomplish the same objective with the Tensorflow API. To convert from a LabelEncoder to a tf.lookup.StaticHashTable, you can use the following function:
def convert_label_encoder_to_static_hash_table(le: LabelEncoder,
                                               default_value: int = -1) -> tf.lookup.StaticHashTable:
    static_hash_table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            tf.convert_to_tensor(le.classes_),
            tf.convert_to_tensor(le.transform(le.classes_))),
        default_value=default_value)
    return static_hash_table
Or, for your purposes, since you're applying the inverse mapping (to go from integers -> strings), you may want to swap the keys and the values:
def convert_label_encoder_to_static_hash_table(le: LabelEncoder,
                                               default_value: str = "") -> tf.lookup.StaticHashTable:
    static_hash_table = tf.lookup.StaticHashTable(
        tf.lookup.KeyValueTensorInitializer(
            tf.convert_to_tensor(le.transform(le.classes_)),
            tf.convert_to_tensor(le.classes_)),
        default_value=default_value)
    return static_hash_table
and, in the initializer:
def __init__(self, labelEncoder: LabelEncoder):
    super().__init__()
    self.table = convert_label_encoder_to_static_hash_table(labelEncoder)
By operating on tf.Tensor objects, you can utilize tf.map_fn instead of using a for-loop and converting to a numpy array/lists - your loss function would become:
def bleu_score(self, true_label, pred_label):
    cim_true_label = self.table[true_label]
    cim_pred_label = self.table[pred_label]
    bleu_scores = tf.map_fn(
        lambda x: sentence_bleu([str(x[0])], [str(x[1])], weights=(0.5, 0.25, 0.125, 0.125)),
        elems=tf.stack([(ground_truth, pred) for ground_truth, pred in
                        zip(cim_true_label, cim_pred_label)]),
        fn_output_signature=tf.float32)  # BLEU scores are floats
    return bleu_scores
This should also remove the need to call tf.py_function in the loss computation, since the bleu_score function is now built entirely from TensorFlow operations instead of calling native Python functions.
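If the loss really is expressed entirely in TensorFlow ops, one plausible follow-up (my assumption, not stated in the answer) is that run_eagerly=True can be dropped from model.compile:
# sketch: compiling without forcing eager execution once the loss is graph-compatible
model.compile(loss=CIMCodeSuccessiveLoss(le),
              optimizer=tf.keras.optimizers.Adam(),
              metrics=["accuracy"])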

Tensorflow AutoGraph Polynomial Model With Multiple Outputs

I have a tensorflow model whose outputs correspond to coefficients of multiple polynomials. Note that my model actually has another set of outputs (multi-output), but I've mocked this below just by returning the input in addition to the polynomial coefficients.
I'm having a lot of trouble during the training of the model, related to tensor shapes. I've verified that the model is able to predict on sample inputs, and that the loss function works on sample outputs. But during training, it immediately throws an error (see below).
For every input, the model takes in a fixed embedding-size input, and outputs coefficients for 2 polynomials of degree 2. For example, the output on a single input can look like:
[array([[[1, 2, 3],
         [4, 5, 6]]]),
 [...]]
corresponding to polynomials [1*x^2+2*x+3, 4*x^2+5*x+6]. Note that I've hidden the second output.
I noticed that tf.math.polyval requires a list of coefficients, which makes it awkward with AutoGraph. So I implemented my own version of Horner's algorithm with pure tensors.
import numpy as np
import tensorflow as tf
import logging
import tensorflow.keras as K
@tf.function
def tensor_polyval(coeffs, x):
    """
    Calculates polynomial scalars from a tensor of polynomial coefficients.
    Tensorflow tf.math.polyval requires a list of coefficients, which isn't compatible with AutoGraph.

    # Inputs:
      - coeffs (NxD Tensor): each row of coeffs corresponds to r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D]
      - x: Scalar!

    # Output:
      - r[0]*x^(D-1)+r[1]*x^(D-2)...+r[D] for each row in coeffs
    """
    p = coeffs[:, 0]
    for i in range(1, coeffs.shape[1]):
        tf.autograph.experimental.set_loop_options(
            shape_invariants=[(p, tf.TensorShape([None]))])
        c = coeffs[:, i]
        p = tf.add(c, tf.multiply(x, p))
    return p

@tf.function
def coeffs_to_poly(coeffs, n):
    # Converts a NxD array of coefficients to N evaluated polynomials at x=n
    return tensor_polyval(coeffs, tf.convert_to_tensor(n))
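As a quick sanity check of my own (example values assumed, not from the question), the helper can be evaluated eagerly on a small tensor:
# Horner evaluation of [1*x^2+2*x+3, 4*x^2+5*x+6] at x=2 should give [11., 32.]
coeffs = tf.constant([[1., 2., 3.], [4., 5., 6.]])
print(coeffs_to_poly(coeffs, tf.constant(2.)))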
Now here's a super-simplified example of my model, loss function and training routine:
def model_init(embedDim=8, polyDim=2, terms=2):
    input = K.Input(shape=(embedDim,))
    x = K.layers.Reshape((embedDim,))(input)
    aCoeffs = K.layers.Dense((polyDim+1)*terms, activation='tanh')(x)
    aCoeffs = K.layers.Reshape((terms, polyDim+1))(aCoeffs)
    model = K.Model(inputs=input, outputs=[aCoeffs, input])
    return model

def get_random_batch(batch, embedDim, dtype='float64'):
    x = np.random.randn(batch, embedDim).astype(dtype)
    y = np.array([1. for i in range(batch)]).astype(dtype)
    return [x, y]

embedDim = 8
polyDim = 2
terms = 2
dataType = 'float64'
tf.keras.backend.set_floatx(dataType)

@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
                                                      tf.constant(5, dtype=dataType)),
                           y_pred)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))

model = model_init(embedDim, polyDim, terms)

XTrain, yTrain = get_random_batch(batch=128,
                                  embedDim=embedDim)

# Init Model
LR = 0.001
loss = test_loss
epochs = 5

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=loss)
hist = model.fit(XTrain,
                 yTrain,
                 batch_size=4,
                 epochs=epochs,
                 max_queue_size=10, workers=2, use_multiprocessing=True)
The error I get is related to the tensor_polyval function:
<ipython-input-15-f96bd099fe08>:3 test_loss *
an = tf.vectorized_map(lambda y_p: coeffs_to_poly(y_p[0],
<ipython-input-5-7205207d12fd>:23 coeffs_to_poly *
return tensor_polyval(coeffs, tf.convert_to_tensor(n))
<ipython-input-5-7205207d12fd>:13 tensor_polyval *
p = coeffs[:, 0]
...
ValueError: Index out of range using input dim 1; input has only 1 dims for '{{node strided_slice}} = StridedSlice[Index=DT_INT32, T=DT_DOUBLE, begin_mask=1, ellipsis_mask=0, end_mask=1, new_axis_mask=0, shrink_axis_mask=2](coeffs, strided_slice/stack, strided_slice/stack_1, strided_slice/stack_2)' with input shapes: [3], [2], [2], [2] and with computed input tensors: input[3] = <1 1>.
What's frustrating is that I'm perfectly able to predict with the model on sample inputs and also calculate a sample loss:
test_loss(yTrain[0:5],
          model.predict(XTrain[0:5]),
          dtype=dataType)
which runs just fine.
In the test_loss function, I'm just referring to the first output, via y_p[0]. It tries to calculate the value of the polynomials at n=5 and then outputs an average over everything (again, this is just mocked code). As I understand it, y_p[1] would refer to the second output (in this case, a copy of the input). I would think tf.vectorized_map should be operating across all outputs of the model batch, but it seems to be slicing one extra dimension??
I noticed that the code does train if I remove the ", input" output in the model (making it a single-output model) and change y_p[0] to y_p in test_loss. I have no idea why it breaks when adding the extra output, as my understanding of tf.vectorized_map implies that it acts separately on each element of the list y_pred.
If we need the single loss function to receive multiple outputs altogether, perhaps we can concatenate them together to form one output.
In this case:
Changes to the model structure, here we pack the outputs:
def model_init(embedDim=8, polyDim=2, terms=2):
    input = K.Input(shape=(embedDim, ))
    x = K.layers.Reshape((embedDim, ))(input)
    aCoeffs = K.layers.Dense((polyDim + 1) * terms, activation='tanh')(x)
    # pack the two outputs, add flatten layers if their shapes are not batch*K
    outputs = K.layers.Concatenate()([aCoeffs, input])
    model = K.Model(inputs=input, outputs=outputs)
    model.summary()
    return model
Changes to the loss function, here we unpack the outputs:
# the loss function needs to know these
polyDim = 2
terms = 2

@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    """Loss function for flattened outputs."""
    # unpack multiple outputs
    offset = (polyDim + 1) * terms
    aCoeffs = tf.reshape(y_pred[:, :offset], [-1, terms, polyDim + 1])
    inputs = y_pred[:, offset:]
    print(aCoeffs, inputs)

    # do something with the two unpacked outputs, like below
    an = tf.vectorized_map(
        lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=dataType)),
        aCoeffs)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))
Notice that the loss function relies on the knowledge of the original shapes of the outputs in order to restore them. Consider sub-classing tf.keras.losses.Loss.
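A minimal sketch of that subclassing idea, with polyDim and terms taken from the answer and everything else assumed:
class PolyLoss(tf.keras.losses.Loss):
    # carry the output-shape knowledge on the loss object itself
    def __init__(self, polyDim=2, terms=2, **kwargs):
        super().__init__(**kwargs)
        self.polyDim = polyDim
        self.terms = terms
        self.offset = (polyDim + 1) * terms

    def call(self, y_true, y_pred):
        aCoeffs = tf.reshape(y_pred[:, :self.offset], [-1, self.terms, self.polyDim + 1])
        an = tf.vectorized_map(
            lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=y_pred.dtype)),
            aCoeffs)
        return tf.reduce_mean(tf.reduce_mean(an, axis=-1))

# usage sketch: model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=PolyLoss())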
P.S. For anyone who simply needs different losses for the multiple outputs:
Define loss functions for the two outputs.
@tf.function
def test_loss(y_true, y_pred, dtype=dataType):
    """Loss function for output 1
    (only changed y_p[0] to y_p)"""
    an = tf.vectorized_map(
        lambda y_p: coeffs_to_poly(y_p, tf.constant(5, dtype=dataType)),
        y_pred)
    return tf.reduce_mean(tf.reduce_mean(an, axis=-1))

@tf.function
def dummy_loss(y_true, y_pred, dtype=dataType):
    """Loss function for output 2, i.e. the input, for debugging.
    Better use 0 instead of 1.2345"""
    return tf.constant(1.2345, dataType)
Change to model.compile:
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=LR), loss=[test_loss, dummy_loss])

Keras replace log(0) in custom loss function

I am trying to use Poisson unscaled deviance as a loss function for my neural network, but there's a major flaw with this: y_true can take (and very often will take) the value 0.
Unscaled deviance works like this for the Poisson case:
If y_true = 0, then loss = 2 * d * y_pred
If y_true > 0, then loss = 2 * d * (y_true * log(y_true) - y_true * log(y_pred) - y_true + y_pred)
Note that as soon as log(0) is computed, the loss becomes -inf, so my goal is to prevent this from happening.
I tried using the switch function to solve this, but here's the catch:
If I get the value log(0), I don't want to replace it with 0 (with K.zeros()), because that would amount to assuming y_true = 1, since log(1) = 0.
Therefore I want to try using a large negative value in this case (-10000 for example) but I don't know how to do this since K.variable(-10000) gives the error:
ValueError: Rank of `condition` should be less than or equal to rank of `then_expression` and `else_expression`. ndim(condition)=1, ndim(then_expression)=0
Using K.zeros_like(y_true) instead of K.variable(-10000) will work for Keras, but it is mathematically incorrect, and the optimisation doesn't work properly because of this.
I'd like to know how to replace the log by a large negative value in the switch function. Here's my attempt:
def custom_loss3(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1]

    # condition
    loss_value = KB.switch(KB.less_equal(y_true, 0),
                           2 * d * y_pred,
                           2 * d * (y_true * KB.switch(KB.less_equal(y_true, 0),
                                                       KB.variable(-10000),
                                                       KB.log(y_true))
                                    - y_true * KB.switch(KB.less_equal(y_pred, 0.),
                                                         KB.variable(-10000),
                                                         KB.log(y_pred))
                                    - y_true + y_pred))
    return loss_value
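One way to avoid the rank error (a sketch of my own, assuming y_pred has the same shape as y_true) is to make the large negative value a tensor of matching shape, e.g. with KB.ones_like, so the condition and both branches of KB.switch have the same rank:
def custom_loss3(data, y_pred):
    y_true = data[:, 0]
    d = data[:, 1]
    # a tensor of -10000s with the same shape as y_true / y_pred
    neg_floor = -10000.0 * KB.ones_like(y_true)
    safe_log_true = KB.switch(KB.less_equal(y_true, 0), neg_floor, KB.log(y_true))
    safe_log_pred = KB.switch(KB.less_equal(y_pred, 0.), neg_floor, KB.log(y_pred))
    loss_value = KB.switch(KB.less_equal(y_true, 0),
                           2 * d * y_pred,
                           2 * d * (y_true * safe_log_true - y_true * safe_log_pred
                                    - y_true + y_pred))
    return loss_value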

Weight different misclassifications differently keras

I want to increase the loss for false positive predictions during training by creating a custom loss function.
The class_weight parameter in model.fit() does not work for this issue. The class_weight is already set to { 0: 1, 1:23 } as I have skewed training data where there are 23 times as many non-true labels as there are true labels.
I am not too experienced when working with the keras backend. I have mostly worked with the functional model.
What I want to create is:
def weighted_binary_crossentropy(y_true, y_pred):
    # where y_true == 0 and y_pred == 1:
    #     weight this loss and make it 50 times larger
    # return loss
I can do simple stuff with the tensors such as getting the mean squared error but I have no idea how to do logical stuff.
I have tried a hacky solution which doesn't work and feels totally wrong:
def weighted_binary_crossentropy(y_true, y_pred):
    false_positive_weight = 50
    thresh = 0.5
    y_pred_true = K.greater_equal(thresh, y_pred)
    y_not_true = K.less_equal(thresh, y_true)
    false_positive_tensor = K.equal(y_pred_true, y_not_true)
    loss_weights = K.ones_like(y_pred) + false_positive_weight * false_positive_tensor
    return K.binary_crossentropy(y_true, y_pred) * loss_weights
I am using python 3 with keras 2 and tensorflow as backend.
Thanks in advance!
I think you're almost there...
from keras.losses import binary_crossentropy

def weighted_binary_crossentropy(y_true, y_pred):
    false_positive_weight = 50
    thresh = 0.5
    y_pred_true = K.greater_equal(thresh, y_pred)
    y_not_true = K.less_equal(thresh, y_true)
    false_positive_tensor = K.equal(y_pred_true, y_not_true)

    # changing from here

    # first let's transform the bool tensor into numbers - maybe you need float64 depending on your configuration
    false_positive_tensor = K.cast(false_positive_tensor, 'float32')

    # and let's create its complement (the non false positives)
    complement = 1 - false_positive_tensor

    # now we're going to separate two groups
    falsePosGroupTrue = y_true * false_positive_tensor
    falsePosGroupPred = y_pred * false_positive_tensor

    nonFalseGroupTrue = y_true * complement
    nonFalseGroupPred = y_pred * complement

    # let's calculate one crossentropy loss for each group
    # (directly from the keras loss functions imported above)
    falsePosLoss = binary_crossentropy(falsePosGroupTrue, falsePosGroupPred)
    nonFalseLoss = binary_crossentropy(nonFalseGroupTrue, nonFalseGroupPred)

    # return them weighted:
    return (false_positive_weight * falsePosLoss) + nonFalseLoss
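A hedged usage sketch (assuming the functional model and `from keras import backend as K` mentioned in the question):
# compile the model with the custom weighted loss in place of the stock binary crossentropy
model.compile(optimizer='adam',
              loss=weighted_binary_crossentropy,
              metrics=['accuracy'])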

Custom loss function implementation

I'm trying to implement a new loss function of my own.
When I tried to debug it (or print from inside it), I noticed it is called only once, when the model is created.
How can I know what y_pred and y_true contain (shapes, data, etc.) if I cannot step into this function while fitting the model?
I wrote this loss function:
def my_loss(y_true, y_pred):
    # run over the sequence, jump by 3
    # calc the label
    # if the label is incorrect, punish
    y_pred = K.reshape(y_pred, (1, 88, 3))
    y_pred = K.argmax(y_pred, axis=1)
    zero_count = K.sum(K.clip(y_pred, 0, 0))
    one_count = K.sum(K.clip(y_pred, 1, 1))
    two_count = K.sum(K.clip(y_pred, 2, 2))
    zero_punish = 1 - zero_count / K.count_params(y_true)
    one_punish = 1 - one_count / K.count_params(y_true)
    two_punish = 1 - two_count / K.count_params(y_true)
    false_arr = K.not_equal(y_true, y_pred)
    mask0 = K.equal(y_true, K.zeros_like(y_pred))
    mask0_miss = K.dot(false_arr, mask0) * zero_punish
    mask1 = K.equal(y_true, K.ones_like(y_pred))
    mask1_miss = K.dot(false_arr, mask1) * one_punish
    mask2 = K.equal(y_true, K.zeros_like(y_pred) + 2)
    mask2_miss = K.dot(false_arr, mask2) * two_punish
    return K.sum(mask0_miss) + K.sum(mask1_miss) + K.sum(mask2_miss)
It fails on:
theano.gof.fg.MissingInputError: A variable that is an input to the graph was
neither provided as an input to the function nor given a value. A chain of
variables leading from this input to an output is [/dense_1_target, Shape.0].
This chain may not be unique
Backtrace when the variable is created:
How can I fix it?
You have to understand that Theano is a symbolic language. For example, when we define the following loss function in Keras:
def myLossFn(y_true, y_pred):
    return K.mean(K.abs(y_pred - y_true), axis=-1)
Theano is just making a symbolic rule in a computational graph, which would be executed when it gets values i.e. when you train the model with some mini-batches.
As far as your question on how to debug your model goes, you can use theano.function for that. Now, you want to know if your loss calculation is correct. You do the following.
You can implement a python/numpy version of your loss function. Pass two random vectors to your numpy loss function and get a number. To verify that theano gives a nearly identical result, define something as follows.
import numpy as np
import theano
from theano import tensor as T
from keras import backend as K

# symbolic inputs (float32 vectors, so they match the numpy arrays below)
Y_true = T.fvector('Y_true')
Y_pred = T.fvector('Y_pred')
out = K.mean(K.abs(Y_pred - Y_true), axis=-1)
f = theano.function([Y_true, Y_pred], out)

# creating some values
y_true = np.random.random((10,)).astype('float32')
y_pred = np.random.random((10,)).astype('float32')

numpy_loss_result = np.mean(np.abs(y_true - y_pred))
theano_loss_result = f(y_true, y_pred)

# check if both are close enough
print(numpy_loss_result - theano_loss_result)  # should be less than 1e-5
Basically, theano.function is a way to plug in values and evaluate those symbolic expressions. I hope this helps.
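A hedged variant of the same check, staying inside the Keras backend API instead of calling theano directly (my sketch, not part of the original answer):
# evaluate the symbolic loss on concrete arrays via a backend function
yt = K.placeholder(shape=(None,))
yp = K.placeholder(shape=(None,))
debug_fn = K.function([yt, yp], [myLossFn(yt, yp)])
print(debug_fn([y_true, y_pred])[0])  # should match numpy_loss_result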
