I am trying to implement a custom loss function that requires taking the logarithm of values in the model's output tensor. The tensor may also contain zeros, so I want to take only the non-zero values and compute the logarithm.
The output tensor has shape (20, 224, 224). I could get the number of nonzero elements along axis 0 using the function below.
#To get the number of nonzero elements along the axis 0
count = K.tf.count_nonzero(y,axis=0)
But I couldn't work out how to compute the log over only the nonzero entries. I came up with the NumPy solution below, but I'm not sure about its Keras equivalent.
loss = np.log2(y, out=np.zeros_like(y), where=(y!=0))
Can someone help me compute the logarithm of the non-zero entries along axis 0 of this tensor?
A possible approach might leverage:
tf.where(
condition,
x=None,
y=None,
name=None
)
and the related functions listed at https://www.tensorflow.org/api_docs/python/tf, which are similar to the NumPy versions you are exploring.
For example, you could use tf.where() to test for zero entries and select either the original tensor (x) or a same-shaped tensor of ones (y), depending on the result. You can then take the log of the resulting tensor and do the final summation over that result.
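A minimal sketch of that tf.where() idea, assuming TF 2.x (the toy tensor y stands in for your (20, 224, 224) output): zero entries are swapped for ones before the log, so they contribute log(1) = 0 to the result.

import tensorflow as tf

y = tf.constant([[2.0, 0.0], [0.0, 4.0], [8.0, 16.0]])

safe_y = tf.where(tf.not_equal(y, 0.0), y, tf.ones_like(y))   # zeros -> 1.0
log_y = tf.math.log(safe_y) / tf.math.log(2.0)                # log2, to match the numpy snippet
result = tf.reduce_sum(log_y, axis=0)                         # combine along axis 0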
The example below, run in Colab with eager execution, shows the idea a bit more explicitly.
%tensorflow_version 2.x
# Above only works in Google Colab.
# Also, note eager execution is default True for Tensorflow 2.0
import tensorflow as tf
y_true = tf.Variable([[2.0],[0.0],[0.0],[3.0],[4.0]])
y_pred = tf.Variable([[2.0],[-1.0],[1.0],[3.0],[4.0]])
def my_loss_function(y_true, y_pred):
    print('y_true:')
    print(y_true)
    print('y_pred:')
    print(y_pred)

    y_zeros = tf.zeros_like(y_pred)
    print('y_zeros:')
    print(y_zeros)

    y_mask = tf.math.greater(y_pred, y_zeros)
    print('y_mask:')
    print(y_mask)

    res = tf.boolean_mask(y_pred, y_mask)
    print('res:')
    print(res)

    logres = tf.math.log(res)
    print('logres:')
    print(logres)

    finres = tf.math.reduce_sum(logres)
    print('finres:')
    print(finres)

    return finres
myres = my_loss_function(y_true, y_pred)
print(myres)
I hope this helps.
I am working on a NN with Pytorch which simply maps points from the plane into real numbers, for example
model = nn.Sequential(nn.Linear(2,2),nn.ReLU(),nn.Linear(2,1))
What I want to do, since this network defines a map h:R^2->R, is to compute the gradient of this mapping h in the training loop. So for example
for it in range(epochs):
    pred = model(X_train)
    grad = torch.autograd.grad(pred, X_train)
    ....
The training set has been defined as a tensor that requires gradients. My problem is that even though the output for each fixed point is a scalar, I am propagating a set of N = 100 points, so the output is actually an N x 1 tensor. This leads to the error: autograd can only compute the gradient of scalar functions.
In fact, with the small change
pred = torch.sum(model(X_train))
everything works perfectly. However, I am interested in all the individual gradients, so is there a way to compute them all together?
Computing the sum as above does give exactly the result I expect, of course, but I wanted to know whether this is the only possibility.
There are other possibilities, but using .sum() is the simplest way. Calling .sum() on the final prediction vector and computing dpred/dinput will give you the desired result. Here is why:
Since pred = sum_i f(x_i), where i indexes the inputs,
dpred/dinput is the matrix [dpred/dx_0, dpred/dx_1, ...].
Consider dpred/dx_0: it equals df(x_0)/dx_0, because df(x_i)/dx_0 = 0 for every i != 0.
PS: Please excuse the crappy mathematical expressions... SO does not support latex/math expressions.
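A minimal sketch of the .sum() trick (the names model and X_train mirror the question; since the network maps R^2 -> R, each output depends only on its own input point, so the summed gradient splits back into per-point gradients):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 2), nn.ReLU(), nn.Linear(2, 1))
X_train = torch.randn(100, 2, requires_grad=True)

pred = model(X_train)                              # shape (100, 1)
grads, = torch.autograd.grad(pred.sum(), X_train)  # shape (100, 2): dh/dx for every point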
I've noticed that TensorFlow's automatic differentiation does not give the same values as finite differences when the loss function converts the input to a NumPy array to calculate the output value. Here's a minimal working example of the problem:
import tensorflow as tf
import numpy as np
def lossFn(inputTensor):
    # Input is a rank-2 square tensor
    return tf.linalg.trace(inputTensor @ inputTensor)

def lossFnWithNumpy(inputTensor):
    # Same function, but converts the input to a numpy array before computing the trace
    inputArray = inputTensor.numpy()
    return tf.linalg.trace(inputArray @ inputArray)
N = 2
tf.random.set_seed(0)
randomTensor = tf.random.uniform([N, N])
# Prove that the two functions give the same output; evaluates to exactly zero
print(lossFn(randomTensor) - lossFnWithNumpy(randomTensor))
theoretical, numerical = tf.test.compute_gradient(lossFn, [randomTensor])
# These two values match
print(theoretical[0])
print(numerical[0])
theoretical, numerical = tf.test.compute_gradient(lossFnWithNumpy, [randomTensor])
# The theoretical value is [0 0 0 0]
print(theoretical[0])
print(numerical[0])
The function tf.test.compute_gradient computes the 'theoretical' gradient using automatic differentiation and the numerical gradient using finite differences. As the code shows, if I use .numpy() in the loss function, automatic differentiation does not calculate the gradient.
Could anybody explain the reason for this?
From the guide Introduction to Gradients and Automatic Differentiation:
The tape can't record the gradient path if the calculation exits TensorFlow. For example:
x = tf.Variable([[1.0, 2.0],
                 [3.0, 4.0]], dtype=tf.float32)

with tf.GradientTape() as tape:
    x2 = x**2

    # This step is calculated with NumPy
    y = np.mean(x2, axis=0)

    # Like most ops, reduce_mean will cast the NumPy array to a constant tensor
    # using `tf.convert_to_tensor`.
    y = tf.reduce_mean(y, axis=0)

print(tape.gradient(y, x))
outputs None
In your case, the NumPy value is cast back to a constant tensor in the call to tf.linalg.trace, and TensorFlow cannot compute gradients through that constant.
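A minimal sketch contrasting the two cases (TF 2.x eager, using a GradientTape instead of tf.test.compute_gradient):

import tensorflow as tf

x = tf.random.uniform([2, 2])

with tf.GradientTape() as tape:
    tape.watch(x)
    loss_tf = tf.linalg.trace(x @ x)            # stays inside TensorFlow
print(tape.gradient(loss_tf, x))                # a real gradient tensor (2 * transpose(x))

with tf.GradientTape() as tape:
    tape.watch(x)
    x_np = x.numpy()                            # leaves TensorFlow; the tape loses track here
    loss_np = tf.linalg.trace(x_np @ x_np)      # numpy result is cast back to a constant tensor
print(tape.gradient(loss_np, x))                # None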
I'd like to train a neural network in Python and Keras using a metric learning custom loss function. The loss minimizes the distances of the outputs for similar inputs and maximizes the distances between dissimilar ones. The part considering similar inputs is:
# function to create a pairwise similarity matrix, i.e
# L[i,j] == 1 for similar samples i, j and 0 otherwise
def build_indicator_matrix(y_, thr=0.1):
    # y_: contains the labels of the samples,
    #     samples are similar in case of same label
    # prevent checking equality of floats --> check if absolute
    # differences are below threshold
    lbls_diff = K.expand_dims(y_, axis=0) - K.expand_dims(y_, axis=1)
    lbls_thr = K.less(K.abs(lbls_diff), thr)
    # cast bool tensor back to float32
    L = K.cast(lbls_thr, 'float32')

    # POSSIBLE WORKAROUND
    #L = K.sum(L, axis=2)
    return L
# function to compute the (squared) Euclidean distances between all pairs
# of samples, store in DIST[i,j] the distance between output y_pred[i,:] and y_pred[j,:]
def compute_pairwise_distances(y_pred):
    DIFF = K.expand_dims(y_pred, axis=0) - K.expand_dims(y_pred, axis=1)
    DIST = K.sum(K.square(DIFF), axis=-1)
    return DIST
# function to compute the average distance between all similar samples
def my_loss(y_true, y_pred):
    # y_true: contains true labels of the samples
    # y_pred: contains network outputs
    L = build_indicator_matrix(y_true)
    DIST = compute_pairwise_distances(y_pred)
    return K.mean(DIST * L, axis=1)
For training, I pass a numpy array y of shape (n,) as the target variable to my_loss. However, I found (using the computational graph in TensorBoard) that the TensorFlow backend creates a 2D variable out of y (displayed shape ? x ?), and hence L in build_indicator_matrix is not 2- but 3-dimensional (shape ? x ? x ? in TensorBoard). This causes net.evaluate() and net.fit() to compute wrong results.
Why does TensorFlow create a 2D rather than a 1D array? And how does this affect net.evaluate() and net.fit()?
As quick workarounds I found that either replacing build_indicator_matrix() with static numpy code for computing L, or collapsing the "fake" dimension with the line L = K.sum(L, axis=2), solves the problem. In the latter case, however, the output of K.eval(build_indicator_matrix(y)) is only of shape (n,) and not (n,n), so I do not understand why this workaround still yields correct results. Why does TensorFlow introduce the additional dimension?
My library versions are:
keras: 2.2.4
tensorflow: 1.8.0
numpy: 1.15.0
This is because evaluate and fit work in batches.
The first dimension you see in TensorBoard is the batch dimension, which is unknown in advance and therefore denoted ?.
When you use custom metrics or losses, remember that the tensors you get (y_true and y_pred) are the ones corresponding to the current batch.
For more info, show us how you call both those functions.
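In the meantime, if the extra dimension is indeed Keras feeding y_true as (batch, 1), one hedged sketch of a fix is to flatten the labels at the top of build_indicator_matrix instead of collapsing afterwards with K.sum(L, axis=2):

from keras import backend as K

def build_indicator_matrix(y_, thr=0.1):
    y_ = K.reshape(y_, (-1,))   # (batch, 1) -> (batch,), so the expand_dims pair yields a (batch, batch) matrix
    lbls_diff = K.expand_dims(y_, axis=0) - K.expand_dims(y_, axis=1)
    return K.cast(K.less(K.abs(lbls_diff), thr), 'float32')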
I want to create a custom metric for Pearson correlation as defined here.
I'm not sure how exactly to apply it to batches of y_pred and y_true.
What I did:
def pearson_correlation_f(y_true, y_pred):
    y_true, _ = tf.split(y_true[:, 1:], 2, axis=1)
    y_pred, _ = tf.split(y_pred[:, 1:], 2, axis=1)

    fsp = y_pred - K.mean(y_pred, axis=-1, keepdims=True)
    fst = y_true - K.mean(y_true, axis=-1, keepdims=True)

    corr = K.mean(K.sum(fsp * fst, axis=-1)) / K.mean(
        K.sqrt(K.sum(K.square(y_pred - K.mean(y_pred, axis=-1, keepdims=True)), axis=-1) *
               K.sum(K.square(y_true - K.mean(y_true, axis=-1, keepdims=True)), axis=-1)))

    return corr
Is it necessary for me to use keepdims and handle the batch dimension manually and then take the mean over it? Or does Keras somehow do this automatically?
When you use K.mean without an axis, Keras automatically calculates the mean for the entire batch.
And the backend already has standard deviation functions, so it might be cleaner (and perhaps faster) to use them.
If your true data is shaped like (BatchSize, 1), I'd say keepdims is unnecessary. Otherwise I'm not sure, and it would be good to test the results.
(I don't understand why you use split; it also seems unnecessary.)
So, I'd try something like this:
fsp = y_pred - K.mean(y_pred)   #K.mean without an axis is a scalar here, automatically subtracted from all elements in y_pred
fst = y_true - K.mean(y_true)

devP = K.std(y_pred)
devT = K.std(y_true)

return K.mean(fsp * fst) / (devP * devT)
If it's relevant to have the loss for each feature instead of putting them all in the same group:
#original shapes: (batch, 10)

fsp = y_pred - K.mean(y_pred, axis=0)   #you take the mean over the batch, keeping the features separate
fst = y_true - K.mean(y_true, axis=0)
#mean shape: (1,10)
#fst shape keeps (batch,10)

devP = K.std(y_pred, axis=0)
devT = K.std(y_true, axis=0)
#dev shape: (1,10)

return K.sum(K.mean(fsp * fst, axis=0) / (devP * devT))
#mean shape: (1,10), making all tensors in the expression be (1,10)
#sum is only necessary because we need a single loss value
Summing over the ten features or taking their mean gives essentially the same thing, one being 10 times the other. (That is not very relevant to Keras models, since it only affects the effective learning rate, and many optimizers quickly find their way around this.)
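For completeness, a hedged sketch packaging the per-feature variant above into a single metric function (using a mean over the features rather than the sum, which only rescales the value and keeps it in the usual [-1, 1] range):

from keras import backend as K

def pearson_correlation_f(y_true, y_pred):
    fsp = y_pred - K.mean(y_pred, axis=0)   # center each feature over the batch
    fst = y_true - K.mean(y_true, axis=0)
    devP = K.std(y_pred, axis=0)
    devT = K.std(y_true, axis=0)
    return K.mean(K.mean(fsp * fst, axis=0) / (devP * devT))

# e.g. model.compile(optimizer='adam', loss='mse', metrics=[pearson_correlation_f])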
I'm working in Keras with TensorFlow under the hood. I have a deep neural model (a predictive autoencoder). I'm doing something somewhat similar to this: https://arxiv.org/abs/1612.00796 -- I'm trying to understand the influence of the variables in a given layer on the output.
For this I need to find the 2nd derivative (Hessian) of the loss (L) with respect to the output of a particular layer (s).
The diagonal entries would be sufficient. L is a scalar, s is 1 by n.
What I tried first:
dLds = tf.gradients(L, s) # works fine to get first order derivatives
d2Lds2 = tf.gradients(dLds, s) # throws an error
TypeError: Second-order gradient for while loops not supported.
I also tried:
d2Lds2 = tf.hessians(L, s)
ValueError: Computing hessians is currently only supported for one-dimensional tensors. Element number 0 of `xs` has 2 dimensions.
I cannot change the shape of s because it's part of the neural network (the LSTM's state). The first dimension (batch_size) is already set to 1; I don't think I can get rid of it.
I also cannot reshape s because that breaks the flow of the gradients, e.g.:
tf.gradients(L, tf.reduce_sum(s, axis=0))
gives:
[None]
Any ideas on what I can do in this situation?
This is not supported at the moment. See this report.
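For reference, the usual TF 1.x pattern for the diagonal Hessian entries, when the graph contains no while loops, looks something like the sketch below (a toy loss standing in for L, not the LSTM case; with an LSTM state the inner tf.gradients call hits the same while-loop limitation):

import tensorflow as tf

n = 3
s = tf.placeholder(tf.float32, shape=[1, n])
L = tf.reduce_sum(tf.square(s))               # toy scalar loss

dLds = tf.gradients(L, s)[0]                  # shape (1, n)
diag = [tf.gradients(dLds[0, i], s)[0][0, i]  # d^2 L / d s_i^2, one entry at a time
        for i in range(n)]
d2Lds2_diag = tf.stack(diag)                  # the n diagonal Hessian entries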