I am trying to calculate the tilted loss, which in turn will be used in Keras. However, I must be doing something wrong, since I am getting negative loss values (which ought to be impossible). Can anyone point out what I've done wrong? I'm assuming it's the Theano syntax that I have got wrong.
The loss is defined mathematically as:
where $\xi_i = y_i - f_i$, $y_i$ is the observation, and $f_i$ is the prediction. I am after the mean loss, so I have defined my loss function as:
$$
\mathcal{L} = \frac{\alpha\sum \xi_i-\sum I(\xi_i<0)\xi_i}{N}
$$
where $I()$ is the indicator function, which takes the value 1 if its argument is true and 0 otherwise.
Hence my loss function is defined as follows:
def tilted_loss2(y,f):
    q = 0.05
    e = (y-f)
    return (q*tt.sum(e)-tt.sum(e[e<0]))/e.shape[0]
However, when I run my network I get negative values. Is there something wrong with the Theano syntax here? My biggest suspicion is this part: tt.sum(e[e<0]). Can you slice like this?
Any thoughts would be appreciated.
You cannot slice like this; see this answer.
You need to change your loss function as follows:
def tilted_loss2(y,f):
    q = 0.05
    e = (y-f)
    return (q*tt.sum(e)-tt.sum(e[(e<0).nonzero()]))/e.shape[0]
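As far as I know, older Theano versions have no boolean dtype, so e<0 is an integer tensor of 0s and 1s, and indexing with it picks elements 0 and 1 repeatedly instead of filtering. That would explain the negative loss. The NumPy sketch below only imitates that behaviour by casting the mask to int8 (an assumption made for illustration; it is not Theano code), and shows that nonzero() recovers the intended selection:

```python
import numpy as np

e = np.array([0.5, -1.0, 2.0])

# NumPy boolean mask: selects only the negative residuals.
masked = e[e < 0]

# Casting the mask to integers turns it into the index array [0, 1, 0],
# which selects e[0], e[1], e[0] instead of filtering.
as_indices = e[(e < 0).astype(np.int8)]

# nonzero() converts the mask into the positions of its true entries,
# which is the form used in the corrected loss above.
via_nonzero = e[(e < 0).nonzero()]

print(masked, as_indices, via_nonzero)
```

Note that the integer-indexed version sums to 0.0 here rather than -1.0, so the second term of the loss comes out wrong.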
You can also try this workaround, which uses the abs function instead of slicing syntax that might not work:
def tilted_loss2(y,f):
    q = 0.05
    e = (y-f)
    return (q*tt.sum(e)-tt.sum(e-abs(e))/2.)/e.shape[0]
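The abs version works because (e - |e|)/2 equals e where e is negative and 0 elsewhere. Here is a minimal NumPy sketch (not Theano; the function names and the random data are made up for illustration) that checks this against the per-element form max(q*e, (q-1)*e) and confirms the loss is non-negative:

```python
import numpy as np

def tilted_loss_np(y, f, q=0.05):
    """Mean tilted loss: (q * sum(e) - sum of negative e) / N, with e = y - f."""
    e = y - f
    negative_part = (e - np.abs(e)) / 2.0  # equals e where e < 0, else 0
    return (q * np.sum(e) - np.sum(negative_part)) / e.shape[0]

def tilted_loss_reference(y, f, q=0.05):
    """Same loss written per element as max(q * e, (q - 1) * e)."""
    e = y - f
    return np.mean(np.maximum(q * e, (q - 1.0) * e))

rng = np.random.default_rng(0)
y = rng.normal(size=1000)  # synthetic observations
f = rng.normal(size=1000)  # synthetic predictions

loss = tilted_loss_np(y, f)
reference = tilted_loss_reference(y, f)
print(loss, reference)
```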
Related
I'm trying to implement a ranking loss: the loss should grow the further the model's ranking of the images (from worst to best) is from the true ranking. My algorithm sorts the predicted score array according to the true scores of the image batch. Then, starting from the largest predicted score, it checks how far that score is from the first position in the array and assigns a loss based on that distance; for the second largest, it checks how far it is from the second position, and so on.
To do this, I'm using tf.nn.top_k and other functions that, to my knowledge, are all differentiable, but I still get the "No gradients provided" error.
Can someone please tell me which part I am doing wrong?
Please note that the global sub_tensor is a workaround (replacing correct_indices) to avoid using a range, which I know is non-differentiable: it is a fixed array defined outside the function, holding a range of the batch length [0-32]. This still didn't work.
sub_tensor = constant(np.array([np.arange(32)],dtype='int32'))
def get_ranking_loss(y_true,y_pred):
    global sub_tensor
    _, y_true_ind_k = tf.nn.top_k(y_true, y_true.shape[1])
    sorted_y_pred = tf.gather(y_pred,y_true_ind_k)
    _, y_pred_ind_k = tf.nn.top_k(sorted_y_pred, sorted_y_pred.shape[1])
    # correct_indices = tf.range(0,sorted_y_pred.shape[1])
    subtracted = tf.math.subtract(y_pred_ind_k,sub_tensor)
    absolute = tf.abs(subtracted)
    absolute = tf.cast(absolute, float64)
    return tf.reduce_sum(absolute)
I tried to change almost all functions to be tf functions only, but no luck.
I am implementing a custom operation whose gradients must be calculated. The following is the function:
def difference(prod,box):
    result = tf.Variable(tf.zeros((prod.shape[0],box.shape[1]),dtype=tf.float16))
    for i in tf.range(0,prod.shape[0]):
        for j in tf.range(0,box.shape[1]):
            result[i,j].assign((tf.reduce_prod(box[:,j])-tf.reduce_prod(prod[i,:]))/tf.reduce_prod(box[:,j]))
    return result
I am unable to calculate the gradients with respect to box; tape.gradient() returns None. Here is the code I wrote to calculate the gradients:
prod = tf.constant([[3,4,5],[4,5,6],[1,3,3]],dtype=tf.float16)
box = tf.Variable([[4,5],[5,6],[5,7]],dtype=tf.float16)
with tf.GradientTape() as tape:
    tape.watch(box)
    loss = difference(prod,box)
print(tape.gradient(loss,box))
I am not able to find the reason for unconnected gradients. Is the result variable causing it? Kindly suggest an alternative implementation.
Yes, in order to calculate gradients we need a set of (differentiable) operations on your variables.
You should re-write difference as a function of the 2 input tensors. I think (though happy to confess I am not 100% sure!) that it is the use of 'assign' that makes the gradient tape fall over.
Perhaps something like this:
def difference(prod, box):
    box_red = tf.reduce_prod(box, axis=0)
    prod_red = tf.reduce_prod(prod, axis=1)
    return (tf.expand_dims(box_red, 0) - tf.expand_dims(prod_red, 1)) / tf.expand_dims(box_red, 0)
would get you the desired result
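A quick way to convince yourself that the broadcast version matches the original double loop is to compare the two in plain NumPy. NumPy is used here only as a stand-in for the TensorFlow ops, and difference_loop and difference_broadcast are illustrative names. This checks the values only; whether the tape returns a gradient still has to be confirmed in TensorFlow.

```python
import numpy as np

def difference_loop(prod, box):
    """Element-by-element version, mirroring the loops in the question."""
    result = np.zeros((prod.shape[0], box.shape[1]))
    for i in range(prod.shape[0]):
        for j in range(box.shape[1]):
            box_p = np.prod(box[:, j])
            result[i, j] = (box_p - np.prod(prod[i, :])) / box_p
    return result

def difference_broadcast(prod, box):
    """Broadcast version, mirroring the reduce_prod / expand_dims answer."""
    box_red = np.prod(box, axis=0)    # one product per column of box
    prod_red = np.prod(prod, axis=1)  # one product per row of prod
    return (box_red[None, :] - prod_red[:, None]) / box_red[None, :]

prod = np.array([[3, 4, 5], [4, 5, 6], [1, 3, 3]], dtype=np.float64)
box = np.array([[4, 5], [5, 6], [5, 7]], dtype=np.float64)

loop_result = difference_loop(prod, box)
broadcast_result = difference_broadcast(prod, box)
print(np.allclose(loop_result, broadcast_result))
```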
I'm trying to implement the loss function in http://anthology.aclweb.org/W16-1617 in PyTorch. It is shown as follows:
I've implemented the loss as follows:
class CosineContrastiveLoss(nn.Module):
    """
    Cosine contrastive loss function.
    Based on: http://anthology.aclweb.org/W16-1617
    Maintain 0 for match, 1 for not match.
    If they match, loss is 1/4(1-cos_sim)^2.
    If they don't, it's cos_sim^2 if cos_sim < margin or 0 otherwise.
    Margin in the paper is ~0.4.
    """
    def __init__(self, margin=0.4):
        super(CosineContrastiveLoss, self).__init__()
        self.margin = margin
    def forward(self, output1, output2, label):
        cos_sim = F.cosine_similarity(output1, output2)
        loss_cos_con = torch.mean((1-label) * torch.div(torch.pow((1.0-cos_sim), 2), 4) +
                                  (label) * torch.pow(cos_sim * torch.lt(cos_sim, self.margin), 2))
        return loss_cos_con
However, I'm getting an error saying:
TypeError: mul received an invalid combination of arguments - got (torch.cuda.ByteTensor), but expected one of:
* (float value)
didn't match because some of the arguments have invalid types: (torch.cuda.ByteTensor)
* (torch.cuda.FloatTensor other)
didn't match because some of the arguments have invalid types: (torch.cuda.ByteTensor)
I know that torch.lt() returns a ByteTensor, but if I try to coerce it to a FloatTensor with torch.Tensor.float() I get AttributeError: module 'torch.autograd.variable' has no attribute 'FloatTensor'.
I'm really not sure where to go from here. It seems logical to me to do an element-wise multiplication between the cosine similarity tensor and a tensor with 0 or 1 based on a less-than rule.
Maybe you can try float() method on the variable directly?
Variable(torch.zeros(5)).float() - works for me, for instance
I know this question is old, but like many others I came here to find out how to use cosine similarity in a contrastive loss.
The formula presented in the article does not seem correct to me.
If you look at the "<" operator in formula 13, Ew < m as drawn in figure (2) of the article can never happen. I think that equation 13 should be the following:
[Figure: plot of equation (13) as printed in the article (it does not look like Figure 2); caption: "Wrong (13) equation"]
[Figure: plot of the equation equivalent to Figure 2 (m=0.4); caption: "Correct (13) equation"]
I am trying to write a gradient descent function in python as part of a multivariate linear regression exercise. It runs, but does not compute the correct answer. My code is below. I've been trying for weeks to finish this problem but have made zero progress.
I believe that I understand the concept of gradient descent to optimize a multivariate linear regression function and also that the 'math' is correct. I believe that the error is in my code, but I am still learning python. Your help is very much appreciated.
def regression_gradient_descent(feature_matrix,output,initial_weights,step_size,tolerance):
    from math import sqrt
    converged = False
    weights = np.array(initial_weights)
    while not converged:
        predictions = np.dot(feature_matrix,weights)
        errors = predictions - output
        gradient_sum_squares = 0
        for i in range(len(weights)):
            derivative = -2 * np.dot(errors[i],feature_matrix[i])
            gradient_sum_squares = gradient_sum_squares + np.dot(derivative, derivative)
            weights[i] = weights[i] - step_size * derivative[i]
        gradient_magnitude = sqrt(gradient_sum_squares)
        print(gradient_magnitude)
        if gradient_magnitude < tolerance:
            converged = True
    return(weights)
Feature matrix is:
sales = gl.SFrame.read_csv('kc_house_data.csv',column_type_hints = {'bathrooms':float, 'waterfront':int, 'sqft_above':int, 'sqft_living15':float,'grade':int, 'yr_renovated':int, 'price':float, 'bedrooms':float, 'zipcode':str,'long':float, 'sqft_lot15':float, 'sqft_living':float, 'floors':str, 'condition':int,'lat':float, 'date':str, 'sqft_basement':int, 'yr_built':int, 'id':str, 'sqft_lot':int,'view':int})
I'm calling the function as:
train_data,test_data = sales.random_split(.8,seed=0)
simple_features = ['sqft_living']
my_output= 'price'
(simple_feature_matrix, output) = get_numpy_data(train_data, simple_features, my_output)
initial_weights = np.array([-47000., 1.])
step_size = 7e-12
tolerance = 2.5e7
simple_weights = regression_gradient_descent(simple_feature_matrix, output,initial_weights,step_size,tolerance)
Note: get_numpy_data is just a function that converts everything into arrays, and it works as intended.
Update: I fixed the formula to:
derivative = 2 * np.dot(errors,feature_matrix)
and it seems to have worked. The derivation of this formula in my online course used
-2 * np.dot(errors,feature_matrix)
and I'm not sure why this formula did not provide the correct answer.
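A likely explanation for the sign, assuming the course defined the errors as output - predictions (the question defines them as predictions - output): the factor -2 belongs to that other convention. With errors = predictions - output, the gradient of the residual sum of squares is 2 * np.dot(errors, feature_matrix). The sketch below checks this against central finite differences; rss, rss_gradient and the random data are made up for illustration and are not part of the original code.

```python
import numpy as np

def rss(X, y, w):
    """Residual sum of squares for weights w."""
    residual = X @ w - y
    return float(residual @ residual)

def rss_gradient(X, y, w):
    """Analytic gradient with errors = predictions - output."""
    errors = X @ w - y
    return 2.0 * (X.T @ errors)  # same values as 2 * np.dot(errors, X)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])  # intercept + one feature
y = rng.normal(size=20)
w = np.array([0.3, -1.2])

# Central finite differences along each coordinate direction.
eps = 1e-6
basis = np.eye(len(w))
numeric = np.array([
    (rss(X, y, w + eps * basis[k]) - rss(X, y, w - eps * basis[k])) / (2 * eps)
    for k in range(len(w))
])
analytic = rss_gradient(X, y, w)
print(numeric, analytic)
```

Flipping the sign of the analytic gradient while keeping the update weights - step_size * derivative would move the weights uphill, which matches the behaviour described in the question.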
The step size seems too small, and the tolerance unusually big. Perhaps you meant to use them the other way around?
In general, the step size is determined by trial and error: the "natural" step size α=1 might lead to divergence, so one can lower the value (e.g. α=1/2, α=1/4, etc.) until convergence is achieved. Don't start with a very small step size.
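The halving procedure can be sketched roughly as follows. This is only an illustration on small synthetic data: gradient_descent, find_step_size, max_iter and max_halvings are hypothetical names and settings, not anything from the question, and real data with large feature values may need far more halvings.

```python
import numpy as np

def gradient_descent(X, y, w0, step_size, tolerance, max_iter=10_000):
    """Least-squares gradient descent; returns (weights, converged)."""
    w = np.array(w0, dtype=float)
    with np.errstate(over="ignore", invalid="ignore"):
        for _ in range(max_iter):
            gradient = 2.0 * (X.T @ (X @ w - y))
            if not np.all(np.isfinite(gradient)):
                return w, False  # the iteration blew up: step size too large
            if np.linalg.norm(gradient) < tolerance:
                return w, True
            w = w - step_size * gradient
    return w, False  # ran out of iterations

def find_step_size(X, y, w0, tolerance, start=1.0, max_halvings=60):
    """Start at `start` and halve the step size until gradient descent converges."""
    step = start
    for _ in range(max_halvings):
        w, converged = gradient_descent(X, y, w0, step, tolerance)
        if converged:
            return step, w
        step /= 2.0
    raise RuntimeError("no step size converged")

# Synthetic data lying exactly on the line y = 1 + 2x.
x = np.linspace(0.0, 1.0, 50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

step, weights = find_step_size(X, y, w0=[0.0, 0.0], tolerance=1e-6)
print(step, weights)
```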
One of my operations needs integers, but the output of a convolution is float.
This means I need to use tf.floor, tf.ceil, tf.cast, etc. to handle it.
But these operations produce None gradients, since operations like tf.floor are not differentiable.
So I tried the approaches below.
First, a detour:
out1 = tf.subtract(vif, tf.subtract(vif, tf.floor(vif)))
But the output of test.compute_gradient_error is 500 or 0, which I don't think is a reasonable gradient.
Second, overriding the gradient function of floor:
import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops
# Register a pass-through gradient to use in place of the Floor gradient.
@ops.RegisterGradient("CustomFloor")
def _custom_floor_grad(op, grads):
    return [grads]
A, B = 50, 7
shape = [A, B]
f = np.ones(shape, dtype=np.float32)
vif = tf.constant(f, dtype=tf.float32)
# out1 = tf.subtract(vif, tf.subtract(vif, tf.floor(vif)))
with tf.get_default_graph().gradient_override_map({"Floor": "CustomFloor"}):
    out1 = tf.floor(vif)
with tf.Session() as sess:
    err1 = tf.test.compute_gradient_error(vif, shape, out1, shape)
    print(err1)
The output of test.compute_gradient_error is 500 or 1, so this doesn't work either.
Question: is there a way to get integers while keeping backpropagation working? (Values like 2.0 or 5.0 are OK.)
In general, it's inadvisable to solve discrete problems with gradient descent. You should be able to express integer solvers in TF to some extent, but you're more or less on your own.
FWIW, the floor function looks like a staircase. Its derivative is zero everywhere except at the integers, where you have a Dirac impulse pointing upwards, like a rake if you wish. The Dirac impulse has finite area but no finite value.
The canonical way to tackle these problems is to relax the hard floor constraint, replacing it with something that is (at least once) differentiable, i.e. smooth.
There are multiple ways to do this. Perhaps the most popular are:
Hack up a function that looks like what you want. For instance a piece-wise linear function that slopes down quickly, but not vertically.
Replace step functions by sigmoids
Use a filter approximation which is well understood if it's a time series
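As a rough illustration of the sigmoid option, floor(x) on a bounded range can be written as a sum of unit steps at 1, 2, 3, ..., and each step can be swapped for a steep sigmoid. The NumPy sketch below is one possible relaxation under those assumptions, not a TensorFlow recipe; soft_floor, max_value and sharpness are made-up names, and the input range [0, max_value) is assumed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_floor(x, max_value=10, sharpness=50.0):
    """Smooth stand-in for floor(x) on [0, max_value): a sum of sigmoid steps."""
    x = np.asarray(x, dtype=float)
    steps = np.arange(1, max_value + 1)  # step locations 1, 2, ..., max_value
    return sigmoid(sharpness * (x[..., None] - steps)).sum(axis=-1)

def soft_floor_grad(x, max_value=10, sharpness=50.0):
    """Derivative of soft_floor: finite everywhere, peaked near the integers."""
    x = np.asarray(x, dtype=float)
    steps = np.arange(1, max_value + 1)
    s = sigmoid(sharpness * (x[..., None] - steps))
    return (sharpness * s * (1.0 - s)).sum(axis=-1)

x = np.array([0.5, 2.5, 7.5])
print(soft_floor(x), np.floor(x))
print(soft_floor_grad(np.array([3.0, 3.5])))
```

Larger sharpness values track floor more closely but push the gradient towards zero away from the integers, so the value is a trade-off to tune.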