I'm currenly working on a dataset where I've to predict an integer output. It starts from 1 to N. I've build a network with loss function mse. But I feel like mse loss function may not be an ideal loss function to minimize in the case of integer output.
I'm also round my prediction to get integer output. Is there a way to make/optimize the model better in case of integer output.
Can anyone provide some help on how to deal with integer output/targets. This is the loss function I'm using right now.
model.compile(optimizer=SGD(0.001), loss='mse')
You are using the wrong loss, mean squared error is a loss for regression, and you have a classification problem (discrete outputs, not continuous).
So for this your model should have a softmax output layer:
model.add(Dense(N, activation="softmax"))
And you should be using a classification loss:
model.compile(optimizer=SGD(0.001), loss='sparse_categorical_crossentropy')
Assuming your labels are integers in the [0, N-1] range (off by one), this should work. To make a prediction, you should do:
output = np.argmax(model.predict(some_data), axis=1) + 1
The +1 is because integer labels go from 0 to N-1
Ordinal regression could be an appropriate approach, in case predicting the wrong month but close to the true month is considered a smaller mistake than predicting a value one year earlier or later. Only you can know that, based on the specific problem you want to solve.
I found an implementation of the appropriate loss function on github (no affiliation). For completeness, below I copy-paste the code from that repo:
from keras import backend as K
from keras import losses
def loss(y_true, y_pred):
weights = K.cast(
K.abs(K.argmax(y_true, axis=1) - K.argmax(y_pred, axis=1))/(K.int_shape(y_pred)[1] - 1),
dtype='float32'
)
return (1.0 + weights) * losses.categorical_crossentropy(y_true, y_pred)
Related
Hello I am in need of a custom regularization term to add to my (binary cross entropy) Loss function. Can somebody help me with the Tensorflow syntax to implement this?
I simplified everything as much as possible so it could be easier to help me.
The model takes a dataset 10000 of 18 x 18 binary configurations as input and has a 16x16 of a configuration set as output. The neural network consists only of 2 Convlutional layer.
My model looks like this:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
EPOCHS = 10
model = models.Sequential()
model.add(layers.Conv2D(1,2,activation='relu',input_shape=[18,18,1]))
model.add(layers.Conv2D(1,2,activation='sigmoid',input_shape=[17,17,1]))
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),loss=tf.keras.losses.BinaryCrossentropy())
model.fit(initial.reshape(10000,18,18,1),target.reshape(10000,16,16,1),batch_size = 1000, epochs=EPOCHS, verbose=1)
output = model(initial).numpy().reshape(10000,16,16)
Now I wrote a function which I'd like to use as an aditional regularization terme to have as a regularization term. This function takes the true and the prediction. Basically it multiplies every point of both with its 'right' neighbor. Then the difference is taken. I assumed that the true and prediction term is 16x16 (and not 10000x16x16). Is this correct?
def regularization_term(prediction, true):
order = list(range(1,4))
order.append(0)
deviation = (true*true[:,order]) - (prediction*prediction[:,order])
deviation = abs(deviation)**2
return 0.2 * deviation
I would really appreciate some help with adding something like this function as a regularization term to my loss for helping the neural network to train better to this 'right neighbor' interaction. I'm really struggling with using the customizable Tensorflow functionalities a lot.
Thank you, much appreciated.
It is quite simple. You need to specify a custom loss in which you define your adding regularization term. Something like this:
# to minimize!
def regularization_term(true, prediction):
order = list(range(1,4))
order.append(0)
deviation = (true*true[:,order]) - (prediction*prediction[:,order])
deviation = abs(deviation)**2
return 0.2 * deviation
def my_custom_loss(y_true, y_pred):
return tf.keras.losses.BinaryCrossentropy()(y_true, y_pred) + regularization_term(y_true, y_pred)
model.compile(optimizer='Adam', loss=my_custom_loss)
As stated by keras:
Any callable with the signature loss_fn(y_true, y_pred) that returns
an array of losses (one of sample in the input batch) can be passed to
compile() as a loss. Note that sample weighting is automatically
supported for any such loss.
So be sure to return an array of losses (EDIT: as I can see now it is possible to return also a simple scalar. It doesn't matter if you use for example the reduce function). Basically y_true and y_predicted have as first dimension the batch size.
here details: https://keras.io/api/losses/
I'm trying to use exact match / subset accuracy as a metric for my Keras model. I understand basically how it's supposed to work, but I'm having a hard time with the tensor manipulation.
I'm working on a multilabel classification task with 55 possible labels. I'm considering an output > 0.5 to be a positive for that label. I want a metric that describes how often the output exactly matches the true labels.
My approach is to convert y_true to tf.bool, and y_pred > 0.5 to tf.bool, and then return a tensor containing True if they match exactly, and False otherwise. It appears to be working when I do basic tests, but when I train the model, it stays at 0.0000 without ever changing.
def subset_accuracy(y_true, y_pred):
y_pred_bin = tf.cast(y_pred > 0.5, tf.bool)
equality = tf.equal(tf.cast(y_true, tf.bool), y_pred_bin)
return tf.equal(
tf.cast(tf.math.count_nonzero(equality), tf.int32),
tf.size(y_true)
)
I am expecting to see the metric slowly climb, even if it only goes up to 50% or something. But it's staying at 0.0.
Here is another option, tested with tensorflow 2.3:
def subset_accuracy(y_true, y_pred):
threshold = tf.constant(.8, tf.float32)
gtt_pred = tf.math.greater(y_pred, threshold)
gtt_true = tf.math.greater(y_true, threshold)
accuracy = tf.reduce_mean(tf.cast(tf.equal(gtt_pred, gtt_true), tf.float32), axis=-1)
return accuracy
I would imagine that tf.cast(y_true, tf.bool) could be a problem, as it casts float to bool, so depending on how tf deals with it internally, it may first get cast to int, so anything < 1.0 will be zero, and then to bool. That's why nothing will match and you only get zero accuracy.
The above suggestion avoids that problem.
Suggestion: test your metric independently from the model. Use a model (untrained works as well) and model.evaluate a single batch. Compute the metric manually by using the output of model.predict.
Make sure that your computation and the metric the model outputs come to the same result and that the result makes sense for the values in this batch.
Once you are sure that your loss is really mathematically correct; you can then try to debug your model.
It is not clear from your code snippet what you consider to be subset accuracy.
For instance Keras defines categorical_acuracy as:
def categorical_accuracy(y_true, y_pred):
return K.cast(K.equal(K.argmax(y_true, axis=-1),
K.argmax(y_pred, axis=-1)),
K.floatx())
How do you intend your accuracy metric to be different. Just ensure that the value is greater than 0.5 ? Perhaps you may consider modifying the Keras metric.
I am new to tensorflow
In a part of a code for a tensorflow session, there is :
loss = tf.nn.softmax_cross_entropy_with_logits_v2(
logits=net, labels=self.out_placeholder, name='cross_entropy')
self.loss = tf.reduce_mean(loss, name='mean_squared_error')
I want to use mean_squared_error loss function for this purpose. I found this loss function in tensorflow website:
tf.losses.mean_squared_error(
labels,
predictions,
weights=1.0,
scope=None,
loss_collection=tf.GraphKeys.LOSSES,
reduction=Reduction.SUM_BY_NONZERO_WEIGHTS
)
I need this loss function for a regression problem.
I tried:
loss = tf.losses.mean_squared_error(predictions=net, labels=self.out_placeholder)
self.loss = tf.reduce_mean(loss, name='mean_squared_error')
Where net = tf.matmul(input_tensor, weights) + biases
However, I'm not sure if it's the correct way.
First of all keep in mind that cross-entropy is mainly used for classification, while MSE is used for regression.
In your case cross entropy measures the difference between two distributions (the real occurences, called labels - and your predictions)
So while the first loss functions works on the result of the softmax layer (which can be seen as a probability distribution), the second one works directly on the floating point output of your network (which is no probability distribution) - therefore they cannot be simply exchanged.
I want to implement an accuracy function for a triplet loss network so that I know, how does the algorithm works during the training. So far I have tried something, but I'm not sure whether it actually can work and also I have troubles implementing it in keras. My idea was to compare the predicted anchor-positive and anchor-negative distances (in y_pred), so that the positive distance should be low enough and the negative one large enough:
def accuracy(_, y_pred):
pos_treshold = 0.4
neg_treshold = 0.6
return K.mean(y_pred[0] < pos_treshold and y_pred[1] > neg_treshold)
The problem with this is that I couldn't figure out how to implement this and condition in keras.
Then I tried to find something on this topic of accuracy for triplet loss. One way of doing it is to define the accuracy as a proportion of the number of triplets in which the predicted distance between the anchor image and the positive image is less than the one between the anchor image and the negative image. With this I have even bigger problems in implementing it in keras.
I tried this (although I don't know whether it does what I described):
K.mean(y_pred[0] < y_pred[1])
which gives me accuracy around 0.5 all the time (probably some random stuff). So still I don't know whether the model is bad or the accuracy function is bad.
So my question is how to implement any reasonable accuracy function in keras? Whether it would be one of these two I don't really care.
That's what I use (condition y_pred[0] < y_pred[1]), while taking into account the batch dimension. Note that I'm not using a mean, so that it would support sample-weight.
def triplet_accuracy(_, y_pred):
'''
Input: y_pred shape is (batch_size, 2)
[pos, neg]
Output: shape (batch_size, 1)
loss[i] = 1 if y_pred[i, 0] < y_pred[i, 1] else 0
'''
subtraction = K.constant([-1, 1], shape=(2, 1))
diff = K.dot(y_pred, subtraction)
loss = K.maximum(K.sign(diff), K.constant(0))
return loss
This question is about the tf.losses.huber_loss() function and how it can be applied on scalars rather than vectors. Thank you for your time!
My model is similar to a classification problem like MNIST. I based my code on the TensorFlow layers tutorial and made changes where I saw fit. I do not think the exact code is needed for my question.
I have lables that take integer values in {0,..,8}, that are converted into onehot labels like this:
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=n_classes)
The last layer in the model is
logits = tf.layers.dense(inputs=dense4, units=n_classes)
which is converted into predictions like this:
predictions = {"classes": tf.argmax(input=logits, axis=1), "probabilities": tf.nn.softmax(logits, name="softmax_tensor")}
From the tutorial, I started with the tf.losses.softmax_cross_entropy() loss function. But in my model, I am predicting in which discretized bin a value will fall. So I started looking for a loss function that would translate that a prediction of one bin off is less of a problem than two bins off. Something like the absolute_difference or Huber function.
The code
onehot_labels = tf.one_hot(indices=tf.cast(labels, tf.int32), depth=n_classes)
loss = tf.losses.softmax_cross_entropy(onehot_labels=onehot_labels, logits=logits)
in combination with the optimizer:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=ps.learning_rate)
works without any errors. When changing to the Huber function:
loss = tf.losses.huber_loss(labels=onehot_labels, predictions=logits)
there are still no errors. But at this point I am unsure about what exactly happens. Based on the reduction definition I expect that the Huber function is applied pairwise for elements of the vectors and then summed up or averaged.
I would like to apply the Huber function only on the label integer (in {0,...,9}) and predicted value:
preds = tf.argmax(input=logits, axis=1)
So this is what I tried:
loss = tf.losses.huber_loss(labels=indices, predictions=preds)
This is raising the error
ValueError: No gradients provided for any variable
I have found two common causes that I do not think are happening in my situation:
This where there is no path between tf.Variable objects and the loss function. But since my prediction code is often used and the labels were provided as integers, I do not think this applies here.
The function is not derivable into a gradient. But the Huber function does work when vectors are used as input, so I do not think this is the case.
My question is: what code lets me use the Huber loss function on my two integer tensors (labels and predictions)?