Writing a custom loss function element by element for Keras

I am new to machine learning, Python, and TensorFlow. I am used to coding in C++ or C#, and it is difficult for me to use tf.backend.
I am trying to write a custom loss function for an LSTM network that tries to predict whether the next element of a time series will be positive or negative. My code runs nicely with the binary_crossentropy loss function. I now want to improve my network with a loss function that adds the value of the next time-series element if the predicted probability is greater than 0.5 and subtracts it if the probability is less than or equal to 0.5.
I tried something like this:
def customLossFunction(y_true, y_pred):
    temp = 0.0
    for i in range(0, len(y_true)):
        if y_pred[i] > 0.5:
            temp += y_true[i]
        else:
            temp -= y_true[i]
    return temp
Obviously, the dimensions are wrong, but since I cannot step into my function while debugging, it is very hard to get a grasp of the dimensions here.
Can you please tell me if I can use an element-by-element function? If yes, how? And if not, could you help me with tf.backend?
Thanks a lot

Among the Keras backend functions, you have greater, which you can use:
import keras.backend as K
def customLossFunction(yTrue, yPred):
    greater = K.greater(yPred, 0.5)
    greater = K.cast(greater, K.floatx())  # has zeros and ones
    multiply = (2 * greater) - 1           # has -1 and 1
    modifiedTrue = multiply * yTrue

    # here, it's important to know which dimension you want to sum
    return K.sum(modifiedTrue, axis=?)
The axis parameter should be used according to what you want to sum.
axis=0 -> batch or sample dimension (number of sequences)
axis=1 -> time steps dimension (if you're using return_sequences = True until the end)
axis=2 -> predictions for each step
Now, if you have only a 2D target:
axis=0 -> batch or sample dimension (number of sequences)
axis=1 -> predictions for each sequence
If you simply want to sum everything for every sequence, just omit the axis parameter.
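As a quick shape check of those axis choices (my own illustration, with hypothetical shapes, not part of the original answer):
import numpy as np
import keras.backend as K

x = K.constant(np.ones((4, 10, 1)))   # 4 sequences, 10 steps, 1 value per step
print(K.int_shape(K.sum(x, axis=2)))  # (4, 10): one sum per time step
print(K.int_shape(K.sum(x)))          # (): everything collapsed to a scalar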
Important note about this function:
Since it contains only values from yTrue, it cannot backpropagate to change the weights. This will lead to a "None values not supported" error or something very similar.
Although yPred (the tensor connected to the model's weights) is used in the function, it is used only to get a true/false condition, which is not differentiable.
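A possible workaround (my suggestion, not part of the answer above) is to replace the hard threshold with a smooth surrogate such as K.tanh, so a gradient can flow back through yPred; sharpness is a made-up parameter controlling how closely it approximates the hard step:
import keras.backend as K

def customLossFunctionSmooth(yTrue, yPred, sharpness=10.0):
    # approaches -1 below 0.5 and +1 above, but remains differentiable
    sign = K.tanh(sharpness * (yPred - 0.5))
    return K.sum(sign * yTrue)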

Related

Efficient batch derivative operations in PyTorch

I am using PyTorch to implement a neural network that has (say) 5 inputs and 2 outputs:
class myNetwork(nn.Module):
    def __init__(self):
        super(myNetwork, self).__init__()
        self.layer1 = nn.Linear(5, 32)
        self.layer2 = nn.Linear(32, 2)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = self.layer2(x)
        return x
Obviously, I can feed this an (N x 5) tensor and get an (N x 2) result:
net = myNetwork()
nbatch = 100
inp = torch.rand([nbatch,5])
inp.requires_grad = True
out = net(inp)
I would now like to compute the derivatives of the NN output with respect to one element of the input vector (let's say the 5th element), for each example in the batch. I know I can calculate the derivatives of one element of the output with respect to all inputs using torch.autograd.grad, and I could use this as follows:
deriv = torch.zeros([nbatch, 2])
for i in range(nbatch):
    for j in range(2):
        deriv[i, j] = torch.autograd.grad(out[i, j], inp, retain_graph=True)[0][i, 4]
However, this seems very inefficient: it calculates the gradient of out[i,j] with respect to every single element in the batch, and then discards all except one. Is there a better way to do this?
By virtue of backpropagation, if you only computed the gradient with respect to a single input, the computational savings wouldn't necessarily amount to much: you would only save some work in the first layer, since all later layers need to be backpropagated through either way.
So this may not be the optimal way, but it doesn't actually create much overhead, especially if your network has many layers.
By the way, is there a reason you need to loop over nbatch? If you wanted the gradient of each element of a batch with respect to a parameter, I could understand that, because PyTorch will lump them together, but you seem to be solely interested in the input...
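One standard trick, sketched below (my own suggestion, reusing net, inp and out from the question): since each row of out depends only on the same row of inp in this feed-forward network, the gradient of out[:, j].sum() with respect to inp stacks the per-example gradients d out[i, j] / d inp[i, :]. That needs one backward pass per output dimension (2 here) instead of one per (example, output) pair (200 here):
deriv = torch.zeros([nbatch, 2])
for j in range(2):
    # row i of g is d out[i, j] / d inp[i, :], because the examples are independent
    g = torch.autograd.grad(out[:, j].sum(), inp, retain_graph=True)[0]
    deriv[:, j] = g[:, 4]  # derivative w.r.t. the 5th input element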

Compute logarithm of nonzero values in a tensor with keras

I am trying to implement a custom loss function, and it requires taking the logarithm of values in the output tensor from the model. The tensor may contain zeros as well, so I want to take only the nonzero values and compute the logarithm.
The output tensor is of shape (20,224,224). I could get the number of nonzero elements along axis 0 using the below function.
# To get the number of nonzero elements along axis 0
count = K.tf.count_nonzero(y, axis=0)
But I couldn't understand how to calculate the log of only the nonzero values. I could come up with a NumPy solution, as below, but I am not sure about its Keras equivalent.
loss = np.log2(y, out=np.zeros_like(y), where=(y!=0))
Can someone help me with calculating the logarithm of the nonzero values along axis 0 of the tensor?
A possible approach might leverage:
tf.where(
    condition,
    x=None,
    y=None,
    name=None
)
and the related functions listed at https://www.tensorflow.org/api_docs/python/tf, which are similar to the NumPy versions you are exploring.
For example, you might use the tf.where() function above to test for zero entries and select either from the original tensor (x) or from a same-shaped tensor of ones (y), depending on the result. Then you can compute the log of the resulting tensor and do the final summation over that result.
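A minimal sketch of that substitution idea (the tensor and names here are mine, just to show the mechanics):
import tensorflow as tf

y = tf.constant([[2.0, 0.0], [0.0, 4.0]])

# Replace zeros with ones; log(1.0) = 0, so those entries drop out of the sum.
safe_y = tf.where(tf.equal(y, 0.0), tf.ones_like(y), y)
log_sum = tf.math.reduce_sum(tf.math.log(safe_y), axis=0)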
Below is an example, run in Colab with eager execution, that shows the idea a bit more explicitly (this version instead uses tf.boolean_mask to keep only the positive entries).
%tensorflow_version 2.x
# Above only works in Google Colab.
# Also, note eager execution is default True for Tensorflow 2.0
import tensorflow as tf
y_true = tf.Variable([[2.0],[0.0],[0.0],[3.0],[4.0]])
y_pred = tf.Variable([[2.0],[-1.0],[1.0],[3.0],[4.0]])
def my_loss_function(y_true, y_pred):
    print('y_true:')
    print(y_true)
    print('y_pred:')
    print(y_pred)
    y_zeros = tf.zeros_like(y_pred)
    print('y_zeros:')
    print(y_zeros)
    y_mask = tf.math.greater(y_pred, y_zeros)
    print('y_mask:')
    print(y_mask)
    res = tf.boolean_mask(y_pred, y_mask)
    print('res:')
    print(res)
    logres = tf.math.log(res)
    print('logres:')
    print(logres)
    finres = tf.math.reduce_sum(logres)
    print('finres:')
    print(finres)
    return finres
myres = my_loss_function(y_true, y_pred)
print(myres)
I hope this helps.

How to ignore part of input and output in Keras?

I'm trying to train a model that takes n values as input and outputs n values. The problem is that n can range from 1 to 700, so I built a network with 700 inputs and 700 outputs. The extra inputs and outputs are set to zero.
When training the model, I don't care whether the extra outputs are accurate. So I tried to define my own loss function as follows:
def mse_truncate(y_true, y_pred):
    def fn(x):
        return tf.cond(x < 0.01, lambda: 0.0, lambda: 1.0)
    # Ignore the squared error if y_true[i] is near zero
    sgn = tf.map_fn(fn, y_true)
    return K.mean(sgn * K.square(y_true - y_pred), axis=-1)
This function works in the console.
But when I compile the model, I get an error:
model.compile(optimizer='sgd',loss=mse_truncate, metrics=['accuracy'])
ValueError: Shape must be rank 0 but is rank 1 for 'loss_5/dense_2_loss/map/while/cond/Switch' (op: 'Switch') with input shapes: [?], [?].
Can someone tell me what's wrong here?
Or are there better ways to handle the variable length input and output?
Note:
More on the problem: the input is a sequence (length <= 700), and the output is the distance between the first element and each element in the sequence.
You could use tf.where and tf.gather to only take those values that you care about into consideration, e.g.:
indices = tf.where(tf.greater(y_true, 0.01))  # or `tf.less`, `tf.equal` etc.
loss = K.mean(K.square(tf.gather(y_true, indices) - tf.gather(y_pred, indices)))
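Alternatively (a variant of my own, not something the answer above spells out), you can avoid gathering altogether by multiplying with a 0/1 mask; this keeps the shapes static and stays differentiable with respect to y_pred:
import keras.backend as K

def mse_truncate(y_true, y_pred):
    # 1.0 where y_true is above the threshold, 0.0 for the padded entries
    mask = K.cast(K.greater(y_true, 0.01), K.floatx())
    sq_err = mask * K.square(y_true - y_pred)
    # average only over the unmasked entries (guarding against all-zero rows)
    return K.sum(sq_err, axis=-1) / K.maximum(K.sum(mask, axis=-1), 1.0)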

tensorflow: gradients for a custom loss function

I have an LSTM predicting time series values in TensorFlow.
The model works with MSE as the loss function.
However, I'd like to create a custom loss function where one of the error values is multiplied by two (therefore producing a higher error value).
In my batch of size 10, I want the 3rd value of the first input to be multiplied by 2, but because this is a time series, that position corresponds to the second value in the second input and the first value in the third input.
The error I get is:
ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients
How do I make the gradients?
def loss_function(y_true, y_pred, peak_value=3, weight=2):
    # peak_value is where the multiplication happens on the first line
    # weight is how much the error is multiplied by
    all_dif = tf.squared_difference(y_true, y_pred)  # should be shape=[10,10]
    peak = [peak_value] * 10
    listy = range(0, 10)
    c = [(i - j) % 10 for i, j in zip(peak, listy)]
    for i in range(0, 10):
        indices = [[i, c[i]]]
        values = [1.0]
        shape = [10, 10]
        delta = tf.SparseTensor(indices, values, shape)
        all_dif = all_dif + tf.sparse_tensor_to_dense(delta)
    return tf.reduce_sum(all_dif)
I believe the pseudocode would look something like this:
@tf.custom_gradient
def loss_function(y_true, y_pred, peak_value=3, weight=2):
    ## your code
    def grad(dy):
        return dy * partial_derivative
    return loss, grad
Here partial_derivative is the analytically evaluated partial derivative of your loss function. If your loss function is a function of more than one variable, it will require a partial derivative with respect to each variable, I believe.
If you need more information, the documentation is good: https://www.tensorflow.org/api_docs/python/tf/custom_gradient
And I've yet to find an example of this functionality embedded in a model that's not a toy.
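In the meantime, here is a minimal toy sketch of the structure (the names and the analytic derivatives are mine, just for illustration):
import tensorflow as tf

# A toy loss with hand-written gradients: sum of squared differences.
@tf.custom_gradient
def sum_squared_error(y_true, y_pred):
    diff = y_pred - y_true
    loss = tf.reduce_sum(tf.square(diff))
    def grad(dy):
        # one analytic partial derivative per input, in the same order
        return dy * (-2.0 * diff), dy * (2.0 * diff)
    return loss, grad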

Keras & TensorFlow: getting 2nd derivative of f(x) wrt x, where dim(x) = (1, n)

I'm working in Keras with TensorFlow under the hood. I have a deep neural model (a predictive autoencoder). I'm doing something somewhat similar to this: https://arxiv.org/abs/1612.00796 -- I'm trying to understand the influence of variables in a given layer on the output.
For this I need to find the 2nd derivative (Hessian) of the loss (L) with respect to the output of a particular layer (s):
Diagonal entries would be sufficient. L is a scalar; s is 1 by n.
What I tried first:
dLds = tf.gradients(L, s) # works fine to get first order derivatives
d2Lds2 = tf.gradients(dLds, s) # throws an error
TypeError: Second-order gradient for while loops not supported.
I also tried:
d2Lds2 = tf.hessians(L, s)
ValueError: Computing hessians is currently only supported for one-dimensional tensors. Element number 0 of `xs` has 2 dimensions.
I cannot change the shape of s because it's part of the neural network (the LSTM's state). The first dimension (batch_size) is already set to 1; I don't think I can get rid of it.
I cannot reshape s because that breaks the flow of the gradients, e.g.:
tf.gradients(L, tf.reduce_sum(s, axis=0))
gives:
[None]
Any ideas on what I can do in this situation?
This is not supported at the moment. See this report.
