I want to implement an accuracy function for a triplet loss network so that I can see how well the algorithm is doing during training. I have tried something so far, but I'm not sure whether it can actually work, and I'm also having trouble implementing it in Keras. My idea was to compare the predicted anchor-positive and anchor-negative distances (in y_pred), so that the positive distance should be low enough and the negative one large enough:
def accuracy(_, y_pred):
    pos_threshold = 0.4
    neg_threshold = 0.6
    return K.mean(y_pred[0] < pos_threshold and y_pred[1] > neg_threshold)
The problem with this is that I couldn't figure out how to implement the "and" condition in Keras.
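A minimal sketch of how the "and" condition could be written, assuming tf.keras and assuming y_pred has shape (batch_size, 2) with the anchor-positive distance in column 0 and the anchor-negative distance in column 1 (the post does not state the layout). The function name, the default thresholds as keyword arguments, and the demo values are illustrative only. Python's `and` cannot combine tensors element-wise, whereas tf.logical_and can:

```python
import tensorflow as tf

def threshold_accuracy(_, y_pred, pos_threshold=0.4, neg_threshold=0.6):
    # Assumption: column 0 = anchor-positive distance,
    #             column 1 = anchor-negative distance.
    pos_ok = y_pred[:, 0] < pos_threshold
    neg_ok = y_pred[:, 1] > neg_threshold
    # Element-wise AND of the two boolean tensors.
    both_ok = tf.logical_and(pos_ok, neg_ok)
    # Fraction of triplets in the batch that satisfy both conditions.
    return tf.reduce_mean(tf.cast(both_ok, tf.float32))

# Hypothetical batch of four [pos, neg] distance pairs.
demo = tf.constant([[0.1, 0.9], [0.5, 0.9], [0.1, 0.2], [0.3, 0.7]])
demo_accuracy = float(threshold_accuracy(None, demo))
```

Only the first and last rows of the demo batch satisfy both thresholds, so the function returns half of the batch as correct.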
Then I tried to find something on the topic of accuracy for triplet loss. One way of doing it is to define accuracy as the proportion of triplets in which the predicted distance between the anchor image and the positive image is less than the distance between the anchor image and the negative image. I have even bigger problems implementing this in Keras.
I tried this (although I don't know whether it does what I described):
K.mean(y_pred[0] < y_pred[1])
which gives me an accuracy of around 0.5 all the time (probably just noise). So I still don't know whether the model is bad or the accuracy function is bad.
So my question is: how can I implement any reasonable accuracy function in Keras? I don't really mind whether it is one of these two.
This is what I use (the condition y_pred[0] < y_pred[1]), taking the batch dimension into account. Note that I'm not taking a mean, so that it supports sample weights.
def triplet_accuracy(_, y_pred):
    '''
    Input: y_pred shape is (batch_size, 2)
        [pos, neg]
    Output: shape (batch_size, 1)
        loss[i] = 1 if y_pred[i, 0] < y_pred[i, 1] else 0
    '''
    subtraction = K.constant([-1, 1], shape=(2, 1))
    diff = K.dot(y_pred, subtraction)
    loss = K.maximum(K.sign(diff), K.constant(0))
    return loss
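For comparison, here is a hedged sketch of the same per-sample check written with column slicing, assuming tf.keras and the (batch_size, 2) layout [pos, neg] described in the docstring. The function name and demo values are made up for illustration. It also hints at why K.mean(y_pred[0] < y_pred[1]) may hover around 0.5: y_pred[0] selects the first sample of the batch, not the first column, so that expression compares two unrelated samples.

```python
import tensorflow as tf

def triplet_accuracy_sliced(_, y_pred):
    # y_pred[:, 0] is the positive distance of every sample,
    # y_pred[:, 1] the negative distance of every sample.
    # Returns shape (batch_size,) instead of (batch_size, 1).
    return tf.cast(y_pred[:, 0] < y_pred[:, 1], tf.float32)

# Hypothetical batch: one correct triplet, one wrong, one tie.
demo = tf.constant([[0.2, 0.8], [0.9, 0.1], [0.5, 0.5]])
per_sample = triplet_accuracy_sliced(None, demo).numpy().tolist()
```

A tie counts as incorrect here, which matches the sign-based version, where sign(0) = 0.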
This is my first post, so I will try to detail all the relevant info. If anything is missing, please let me know!
I am currently trying to create a CNN (based on U-Net) for image segmentation on grayscale images.
I have created a custom function to calculate a combined Dice and binary cross-entropy loss; see below.
import tensorflow

def dice_BCE_coef_loss(y_true, y_pred):
    smooth = 1
    bce_weight = 0.5
    #y_true_f = tensorflow.math.reduce_sum(y_true)
    #y_pred_f = tensorflow.math.reduce_sum(y_pred)
    y_true_f = tensorflow.reshape(y_true, [-1])
    y_pred_f = tensorflow.reshape(y_pred, [-1])
    intersection = tensorflow.math.reduce_sum(y_true_f * y_pred_f)
    union = tensorflow.math.reduce_sum(y_true_f + y_pred_f)
    dice_coef = (2*intersection + smooth) / (union + smooth)
    dice_loss = 1 - dice_coef
    BCE = tensorflow.keras.losses.BinaryCrossentropy(from_logits=True)(y_true, y_pred)
    dice_BCE = tensorflow.math.reduce_mean(BCE * bce_weight + dice_loss * (1 - bce_weight))
    return dice_BCE
I then add this to my model as the loss.
model.compile(optimizer=tensorflow.keras.optimizers.Adam(learning_rate=1e-3),
              loss=dice_BCE_coef_loss,
              metrics=['accuracy']
              )
The issue is that when I calculate the dice_BCE manually, the value differs from the loss reported during training. To check whether the reported value was correct across the whole dataset (my manual check used a single image), I reduced my dataset to a single image and mask, yet the two still didn't match.
Image showing the discrepancy between the loss and my expected dice_BCE loss (I hope a picture is allowed in this case)
The loss always remains around 0.48 after several epochs and never really improves from there. Sometimes the output mask is really close (with a good expected dice_BCE to match), yet training ends up diverging, because the loss it seems to train on can be improved in other ways that increase the expected dice loss.
The dice loss (judging by the loss value of the epoch) is also far lower than when calculated through the function: around 0.001, even when the accuracy of the prediction is ~50% and it visibly looks incorrect.
Can anybody explain how this loss is calculated and why it does not match what I expect it to be?
I have read through similar posts on here but can't find anything useful.
Any suggestions of what to look into next or resources to investigate further if this is obvious please do let me know! Thank you in advance
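One thing that may be worth ruling out, sketched here in plain Python: the loss above passes from_logits=True, which treats y_pred as raw logits and applies a sigmoid internally. If the model's last layer already has a sigmoid activation (an assumption, the post does not show the model), the BCE term is computed on doubly squashed values and will not match a manual calculation done on probabilities. The helper names and the four-pixel example values below are hypothetical:

```python
import math

def bce_from_probs(y_true, probs, eps=1e-7):
    # Mean binary cross-entropy, treating inputs as probabilities.
    total = 0.0
    for t, p in zip(y_true, probs):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(y_true)

def bce_from_logits(y_true, logits):
    # Same loss, but a sigmoid is applied to the inputs first.
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return bce_from_probs(y_true, probs)

def dice_loss(y_true, y_pred, smooth=1.0):
    # Mirrors the flattened dice term of the loss in the post.
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    union = sum(y_true) + sum(y_pred)
    return 1.0 - (2.0 * intersection + smooth) / (union + smooth)

y_true = [1.0, 0.0, 1.0, 0.0]
y_pred = [0.9, 0.1, 0.8, 0.2]  # hypothetical sigmoid outputs

dice = dice_loss(y_true, y_pred)
as_probs = 0.5 * bce_from_probs(y_true, y_pred) + 0.5 * dice
as_logits = 0.5 * bce_from_logits(y_true, y_pred) + 0.5 * dice
```

For the same prediction, the value computed with the extra sigmoid (as_logits) comes out noticeably higher than the one computed on probabilities (as_probs), so a mismatch of this kind would show up even on a single image.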
I'm currently working on a dataset where I have to predict an integer output, ranging from 1 to N. I've built a network with MSE as the loss function, but I feel that MSE may not be an ideal loss to minimize for integer outputs.
I'm also rounding my predictions to get integer outputs. Is there a way to optimize the model better for integer outputs?
Can anyone provide some help on how to deal with integer outputs/targets? This is the loss function I'm using right now:
model.compile(optimizer=SGD(0.001), loss='mse')
You are using the wrong loss: mean squared error is a loss for regression, and you have a classification problem (discrete outputs, not continuous).
So for this your model should have a softmax output layer:
model.add(Dense(N, activation="softmax"))
And you should be using a classification loss:
model.compile(optimizer=SGD(0.001), loss='sparse_categorical_crossentropy')
This works provided your labels are integers in the [0, N-1] range, so shift your 1-to-N targets down by one before training. To make a prediction, do:
output = np.argmax(model.predict(some_data), axis=1) + 1
The +1 maps the predicted class index (0 to N-1) back to your original 1-to-N labels.
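The label shift can be illustrated with a small NumPy sketch. The value of N, the targets, and the probability rows (a stand-in for what model.predict would return) are made up for illustration:

```python
import numpy as np

N = 5
labels = np.array([1, 3, 5, 2])   # hypothetical targets in 1..N
train_labels = labels - 1         # 0..N-1, as sparse_categorical_crossentropy expects

# Stand-in for model.predict(some_data): one row of class
# probabilities per sample, N columns.
probs = np.array([
    [0.70, 0.10, 0.10, 0.05, 0.05],
    [0.10, 0.10, 0.60, 0.10, 0.10],
    [0.05, 0.05, 0.10, 0.10, 0.70],
    [0.20, 0.50, 0.10, 0.10, 0.10],
])

# argmax yields class indices 0..N-1; +1 restores the 1..N labels.
predicted = np.argmax(probs, axis=1) + 1
```

Here the predictions recover the original 1-to-N targets because each row puts most of its probability mass on the correct class.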
Ordinal regression could be an appropriate approach if predicting the wrong month but close to the true month is considered a smaller mistake than predicting a value one year earlier or later. Only you can know that, based on the specific problem you want to solve.
I found an implementation of an appropriate loss function on GitHub (no affiliation). For completeness, I copy the code from that repo below:
from keras import backend as K
from keras import losses

def loss(y_true, y_pred):
    weights = K.cast(
        K.abs(K.argmax(y_true, axis=1) - K.argmax(y_pred, axis=1))/(K.int_shape(y_pred)[1] - 1),
        dtype='float32'
    )
    return (1.0 + weights) * losses.categorical_crossentropy(y_true, y_pred)
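To see what the weighting does, here is a NumPy re-implementation of the same formula as a sketch (not the repo's code). The function name, the clipping constant eps, and the two example predictions are assumptions for illustration. Both predictions assign the same probability to the true class, but one puts its argmax one class away and the other four classes away:

```python
import numpy as np

def ordinal_weighted_cce(y_true, y_pred, eps=1e-7):
    n_classes = y_pred.shape[1]
    # Distance between true and predicted class, scaled to [0, 1].
    distance = np.abs(np.argmax(y_true, axis=1) - np.argmax(y_pred, axis=1))
    weights = distance / (n_classes - 1)
    # Plain categorical cross-entropy per sample.
    cce = -np.sum(y_true * np.log(np.clip(y_pred, eps, 1.0)), axis=1)
    return (1.0 + weights) * cce

y_true = np.array([[1, 0, 0, 0, 0], [1, 0, 0, 0, 0]], dtype=float)
y_pred = np.array([
    [0.2, 0.5, 0.1, 0.1, 0.1],  # argmax is 1 class away from the truth
    [0.2, 0.1, 0.1, 0.1, 0.5],  # argmax is 4 classes away from the truth
])
near, far = ordinal_weighted_cce(y_true, y_pred)
```

The cross-entropy term is identical for both rows, so the difference between near and far comes entirely from the distance weight (1.25 versus 2.0).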
I'm trying to use exact match / subset accuracy as a metric for my Keras model. I understand basically how it's supposed to work, but I'm having a hard time with the tensor manipulation.
I'm working on a multilabel classification task with 55 possible labels. I'm considering an output > 0.5 to be a positive for that label. I want a metric that describes how often the output exactly matches the true labels.
My approach is to convert y_true to tf.bool, and y_pred > 0.5 to tf.bool, and then return a tensor containing True if they match exactly, and False otherwise. It appears to be working when I do basic tests, but when I train the model, it stays at 0.0000 without ever changing.
def subset_accuracy(y_true, y_pred):
    y_pred_bin = tf.cast(y_pred > 0.5, tf.bool)
    equality = tf.equal(tf.cast(y_true, tf.bool), y_pred_bin)
    return tf.equal(
        tf.cast(tf.math.count_nonzero(equality), tf.int32),
        tf.size(y_true)
    )
I am expecting to see the metric slowly climb, even if it only goes up to 50% or something. But it's staying at 0.0.
Here is another option, tested with tensorflow 2.3:
def subset_accuracy(y_true, y_pred):
    threshold = tf.constant(.8, tf.float32)
    gtt_pred = tf.math.greater(y_pred, threshold)
    gtt_true = tf.math.greater(y_true, threshold)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(gtt_pred, gtt_true), tf.float32), axis=-1)
    return accuracy
I would imagine that tf.cast(y_true, tf.bool) could be a problem, as it casts float to bool, so depending on how tf deals with it internally, it may first get cast to int, so anything < 1.0 will be zero, and then to bool. That's why nothing will match and you only get zero accuracy.
The above suggestion avoids that problem.
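Two details may also matter here. The metric in the question compares the number of matching entries across the whole batch with tf.size(y_true), so it is only True when every label of every sample in the batch matches. The reduce_mean over axis=-1 in the option above instead gives the fraction of labels matched per sample, which is not an exact match either. A hedged sketch of a per-sample exact match, assuming y_true and y_pred have shape (batch_size, num_labels); the function name and demo values are hypothetical:

```python
import tensorflow as tf

def exact_match_accuracy(y_true, y_pred, threshold=0.5):
    pred_pos = y_pred > threshold
    true_pos = tf.cast(y_true, y_pred.dtype) > threshold
    # A sample counts as correct only if ALL of its labels match.
    row_matches = tf.reduce_all(tf.equal(pred_pos, true_pos), axis=-1)
    # One 0/1 value per sample; Keras averages these over the batch.
    return tf.cast(row_matches, tf.float32)

# Hypothetical batch with 3 labels per sample.
y_true = tf.constant([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
y_pred = tf.constant([[0.9, 0.2, 0.7], [0.1, 0.8, 0.6]])
per_sample = exact_match_accuracy(y_true, y_pred).numpy().tolist()
```

The first sample matches on all three labels; the second has one false positive, so it counts as a miss.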
Suggestion: test your metric independently from the model. Take a model (an untrained one works as well) and run model.evaluate on a single batch. Then compute the metric manually from the output of model.predict.
Make sure that your computation and the metric the model outputs come to the same result and that the result makes sense for the values in this batch.
Once you are sure that your metric is mathematically correct, you can then try to debug your model.
It is not clear from your code snippet what you consider to be subset accuracy.
For instance, Keras defines categorical_accuracy as:
def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())
How do you intend your accuracy metric to be different? Do you just need to ensure that the value is greater than 0.5? Perhaps you could modify the Keras metric.
This question comes from watching the following video on TensorFlow and Reinforcement Learning from Google I/O 18: https://www.youtube.com/watch?v=t1A3NTttvBA
Here they train a very simple RL algorithm to play the game of Pong.
In the slides they use, the loss is defined like this (at approx. 11m 25s):
loss = -R(sampled_actions * log(action_probabilities))
Further on, they show the following code (at approx. 20m 26s):
# loss
cross_entropies = tf.losses.softmax_cross_entropy(
    onehot_labels=tf.one_hot(actions, 3), logits=Ylogits)
loss = tf.reduce_sum(rewards * cross_entropies)
# training operation
optimizer = tf.train.RMSPropOptimizer(learning_rate=0.001, decay=0.99)
train_op = optimizer.minimize(loss)
Now my question is this: they use +1 for winning and -1 for losing as rewards. In the code provided, won't any cross-entropy loss that is multiplied by a negative reward be very low? And if the training operation uses the optimizer to minimize the loss, isn't the algorithm then trained to lose?
Or is there something fundamental I'm missing (probably because of my very limited mathematical skills)?
Great question, Corey. I am also wondering what exactly this popular loss function in RL means. I've seen many implementations of it, but many contradict each other. As I understand it, it means this:
Loss = - log(pi) * A
where A is the advantage compared to a baseline. In Google's case they used a baseline of 0, so A = R. This is multiplied by the log-probability of the specific action taken at that specific time, so in your example above, where the actions are one-hot encoded as [1, 0, 0], we ignore the 0s and only keep the 1. Hence the equation above.
If you naively calculate this loss for a negative reward:
Loss = - (-1) * log(P)
But for any P less than 1, the log of that value is negative. Therefore you get a negative loss, which could be interpreted as "very good" but doesn't really make physical sense.
The correct way:
However, in my opinion (and others please correct me if I'm wrong), you do not use the loss value directly. You take the gradient of the loss, that is, the derivative of -log(pi)*A.
Therefore, you would have:
-(d(pi) / pi) * A
Now a negative reward flips the sign of this gradient, so a minimization step makes that action less likely rather than more likely.
I hope this makes sense.
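The effect of the reward's sign can be checked numerically with a small plain-Python sketch. It is a toy setup, not the code from the talk: three logits, one sampled action, a single gradient-descent step on loss = reward * cross_entropy, and a made-up learning rate. It uses the standard result that the gradient of softmax cross-entropy with respect to the logits is (probabilities - one_hot):

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def step(logits, action, reward, lr=0.5):
    # loss = reward * cross_entropy = -reward * log(p[action])
    # d(loss)/d(logit_k) = reward * (p_k - one_hot_k)
    probs = softmax(logits)
    return [z - lr * reward * (p - (1.0 if k == action else 0.0))
            for k, (z, p) in enumerate(zip(logits, probs))]

logits = [0.0, 0.0, 0.0]  # hypothetical starting policy (uniform)
action = 0                # the action that was sampled

p_before = softmax(logits)[action]
p_after_win = softmax(step(logits, action, reward=+1.0))[action]
p_after_loss = softmax(step(logits, action, reward=-1.0))[action]
```

Minimizing with a reward of +1 raises the probability of the sampled action, while a reward of -1 lowers it, so in this toy case the agent is pushed away from actions that led to a loss rather than trained to lose.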
I'm working on a 200-class classification task (though it's a little different, because there might be multiple 1's in the y vector) using a 4-layer fully-connected neural network. Most of the time y (the label vector) contains one or two 1's, and that's where the problem is. When training, the model tends to predict all the labels as zero, even where they should be 1.
Thus the accuracy is low (less than 99%, which is about the same as or worse than an all-zero prediction). The activation function for each layer is sigmoid. Could you give me some advice on how to improve the model?
This is my loss function. The accuracy looks low because an all-zero prediction would already get almost 99% accuracy.
loss = tf.reduce_mean(tf.reduce_sum(
    -(sum_all - sum_one) / sum_all * tf.multiply(ys, tf.log(prediction))
    - sum_one / sum_all * tf.multiply((one - ys), tf.log(one - prediction)),
    reduction_indices=[1]))
sum_one indicates the number of 1's in the label. I implemented a weighting here.
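The all-zero baseline mentioned above can be worked out directly. This short sketch assumes accuracy is measured per label over the 200 outputs; the helper name is made up for illustration:

```python
def all_zero_accuracy(num_labels, num_positives):
    # Every negative label is predicted correctly, every positive is missed.
    return (num_labels - num_positives) / num_labels

one_positive = all_zero_accuracy(200, 1)   # label vector with a single 1
two_positives = all_zero_accuracy(200, 2)  # label vector with two 1's
```

With one or two positives out of 200 labels, predicting all zeros already scores between 99% and 99.5% on this measure while finding none of the positive labels, which is why element-wise accuracy says little about this task.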