I want to define a loss function for 3-best classification - python

I'm designing a classification model.
I have a problem: there are many categories that have similar features.
I think the best option would be to re-generate the category hierarchy, but the categories are fixed.
So I focused on 3-best accuracy instead of 1-best accuracy.
I want to define a loss function for 3-best accuracy.
I don't care where the correct answer lands within positions 1 - 3.
Is there any good loss function for that, or how can I define one?

You can use keras.metrics.top_k_categorical_accuracy to calculate top-k accuracy, but that is an accuracy metric, not a loss. There is no built-in top-k loss function in TensorFlow or Keras as of now. A loss function must be differentiable to work with gradient-based learning methods, and top-k accuracy, like plain accuracy, is not differentiable. Hence it can be used as an accuracy metric but not as a learning objective, and you won't find any built-in method for this. However, there are research papers aiming to solve this problem; you might want to have a look at Learning with Average Top-k Loss and Smooth Loss Functions for Deep Top-k Classification.

You can use either of the below:
top_k_categorical_accuracy
keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k=3)
sparse_top_k_categorical_accuracy
keras.metrics.sparse_top_k_categorical_accuracy(y_true, y_pred, k=3)
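For instance, here is a minimal sketch (assuming TF/Keras 2.x, one-hot labels, and a made-up toy model purely for illustration) of the usual workaround: train with the differentiable categorical cross-entropy loss while tracking top-3 accuracy as a metric:
import tensorflow as tf

# Hypothetical toy model: 64 input features, 10 classes.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation='relu', input_shape=(64,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(
    loss='categorical_crossentropy',                          # differentiable training objective
    optimizer='adam',
    metrics=[tf.keras.metrics.TopKCategoricalAccuracy(k=3)],  # top-3 accuracy, reported only
)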

Related

Creating a criterion that measures the F1 Loss

I am currently creating a criterion to measure the MSE loss using:
loss_fcn = torch.nn.MSELoss()
loss = loss_fcn(logits[getMaskForBatch(subgraph)], labels.float())
Now I need to change it to the F1 score, but I cannot seem to find a library that could be used for it.
In general, the loss function you need depends on the task.
A loss function, also known as an objective, cost, or error function, is in a sense the counterpart of the optimization function: the loss function produces the loss, and the optimization function reduces it. :) The two should live in equilibrium so that we don't overfit.
PyTorch regression losses:
nn.L1Loss - L1 loss (MAE)
nn.MSELoss - L2 loss (MSE)
nn.SmoothL1Loss - Huber loss
PyTorch classification losses:
nn.CrossEntropyLoss
nn.KLDivLoss
nn.NLLLoss
PyTorch GAN training:
nn.MarginRankingLoss
So if you used nn.MSELoss you probably need to stay with regression, because F1 is a classification metric.
If you really need the F1 score for some other reason, you may use scikit-learn.
Why do you need to do that?
The F1 score is usually an evaluation metric, not a loss function. Moreover, to use the F1 score as a loss function you would have to ensure that it is differentiable and convex (which is probably not the case, otherwise it would already be in the literature).
There are many loss functions that could suit your problem, like cross-entropy, negative log-likelihood, CTC loss, etc.
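As a minimal sketch of that split (the tensors below are made-up placeholders): train with the differentiable cross-entropy loss and compute F1 with scikit-learn purely as an evaluation metric:
import torch
from sklearn.metrics import f1_score

loss_fcn = torch.nn.CrossEntropyLoss()      # differentiable objective used for backprop

logits = torch.randn(8, 3)                  # placeholder model outputs: 8 samples, 3 classes
labels = torch.randint(0, 3, (8,))          # placeholder integer class labels

loss = loss_fcn(logits, labels)             # this is what you minimise
preds = logits.argmax(dim=1)                # hard predictions for the metric
f1 = f1_score(labels.numpy(), preds.numpy(), average='macro')  # evaluation only, not differentiated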

Optimizing for accuracy instead of loss in Keras model

If I correctly understood the significance of the loss function to the model, it directs the model to be trained by minimizing the loss value. So, for example, if I want my model to be trained to have the least mean absolute error, I should use MAE as the loss function. Why is it, then, that you sometimes see someone wanting to achieve the best possible accuracy, yet building the model to minimize a completely different function? For example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
How come the model above is trained to give us the best acc, since during its training it tries to minimize another function (MSE)? I know that, once trained, the model's metric reports the best acc found during training.
My doubt is: shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE? If done that way, wouldn't the model give us even higher accuracy, since it knows it has to maximize it during its training?
To start with, the code snippet you have used as example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
is actually invalid (although Keras will not produce any error or warning) for a very simple and elementary reason: MSE is a valid loss for regression problems, for which accuracy is meaningless (accuracy is meaningful only for classification problems, where MSE in turn is not a valid loss function). For details (including a code example), see my own answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)?; for a similar situation in scikit-learn, see my own answer in this thread.
Moving on to your general question: in regression settings we usually don't need a separate performance metric, and we normally use just the loss function itself for this purpose, i.e. the correct code for the example you have used would simply be
model.compile(loss='mean_squared_error', optimizer='sgd')
without any metrics specified. We could of course use metrics='mse', but this is redundant and not really needed. Sometimes people use something like
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mse','mae'])
i.e. optimise the model according to the MSE loss, but also report its performance in terms of mean absolute error (MAE) in addition to MSE.
Now, your question:
shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE?
is indeed valid, at least in principle (save for the reference to MSE), but only for classification problems, where, roughly speaking, the situation is as follows: we cannot use the vast arsenal of convex optimization methods to directly maximize the accuracy, because accuracy is not a differentiable function; so, we need a proxy differentiable function to use as the loss. The most common example of such a loss function suitable for classification problems is the cross entropy.
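As a minimal sketch of that classification setting (the model shape here is just an illustrative placeholder): you optimise the differentiable cross-entropy and merely report accuracy:
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(20,)),
    tf.keras.layers.Dense(5, activation='softmax'),
])

model.compile(
    loss='categorical_crossentropy',  # differentiable proxy that is actually minimised
    optimizer='sgd',
    metrics=['accuracy'],             # non-differentiable, reported only
)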
Rather unsurprisingly, this question of yours pops up from time to time, albeit in slight variations in context; see for example own answers in
Cost function training target versus accuracy desired goal
Targeting a specific metric to optimize in tensorflow
For the interplay between loss and accuracy in the special case of binary classification, you may find my answers in the following threads useful:
Loss & accuracy - Are these reasonable learning curves?
How does Keras evaluate the accuracy?

The Accuracy Metric Purpose

I am using Keras to build a CNN, and I have run into a misunderstanding about what the accuracy metric does exactly.
I have done some research and it appears that it returns the accuracy of the model. Where is this information stored exactly? Does this metric affect the epoch results?
I cannot find any resources that actually describe in depth what the accuracy metric does. How are my results affected by using this metric?
model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer='adam',
    metrics=['accuracy']
)
The Keras documentation does not explain the purpose of this metric.
In the case of your question it is easier to check the Keras source code directly, because deep learning frameworks tend to have rather thin documentation.
First, you need to find how the string representations of the metric are processed:
if metric in ('accuracy', 'acc'):
    metric_fn = metrics_module.categorical_accuracy
This leads to the metrics module, where the categorical_accuracy function is defined:
def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())
This function clearly returns a tensor, while only a single number is presented in the logs, so there is a wrapper function that processes the tensor of comparison results:
weighted_metric_fn = weighted_masked_objective(metric_fn)
This wrapper function contains the logic for calculating the final value. As no weights and masks are defined, simple averaging is used:
return K.mean(score_array)
So the resulting formula is:
accuracy = (1/N) * sum over the N samples of [argmax(y_true) == argmax(y_pred)]
i.e. the fraction of samples whose predicted class matches the true class.
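A tiny sketch of that computation (the numbers are made up, assuming TF/Keras 2.x):
import numpy as np
import tensorflow as tf

y_true = np.array([[0, 0, 1],
                   [0, 1, 0],
                   [1, 0, 0]], dtype='float32')        # one-hot labels
y_pred = np.array([[0.1, 0.2, 0.7],                    # argmax matches -> 1
                   [0.6, 0.3, 0.1],                    # argmax differs -> 0
                   [0.8, 0.1, 0.1]], dtype='float32')  # argmax matches -> 1

per_sample = tf.keras.metrics.categorical_accuracy(y_true, y_pred)  # tensor [1., 0., 1.]
print(float(tf.reduce_mean(per_sample)))                            # 0.666..., the number shown in the logs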
P.S. I slightly disagree with @VnC, because accuracy and precision are different terms. Accuracy is the rate of correct predictions in a classification task, while precision is the rate of correct predictions among the positive predicted values.
It is only used to report on your model's performance, e.g. how accurate your predictions are, and shouldn't affect the training in any way.
Accuracy basically means precision:
precision = true_positives / ( true_positives + false_positives )
I would recommend using f1_score, as it combines precision and recall.
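For example, with scikit-learn (the labels below are made up just to illustrate the formulas):
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 1, 1]

print(precision_score(y_true, y_pred))  # 3 / (3 + 1) = 0.75
print(recall_score(y_true, y_pred))     # 3 / (3 + 1) = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean of the two = 0.75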
Hope that clears it up.
Any metric is a function of the model's predictions and the ground truth, same as a loss. The accuracy of a model by itself makes no sense; it's not a property of the model alone, but of the model together with the dataset on which it is evaluated.
Accuracy in particular is a metric used for classification, and it is simply the ratio between the number of correct predictions (prediction equal to label) and the total number of data points in the dataset.
Any metric that is evaluated during training/evaluation is information for you; it's not used to train the model. Only the loss function is used for the actual training of the weights.

Scikit learn LogisticRegression log loss increases when adding features

I am performing a multinomial logistic regression on variables in the NHTS 2017 dataset. According to the docs, sklearn.linear_model.LogisticRegression uses cross-entropy loss (log loss) as the loss function to optimize the model. However, as I add new features and fit the model, the loss does not seem to decrease monotonically. Specifically, if I fit household driver count to vehicle ownership (driver count is the single most predictive variable for vehicle ownership), I get less loss than if I indiscriminately fit all of the variables.
Possibly this is because sklearn.metrics.log_loss computes something different from the actual loss function optimised by LogisticRegression. Possibly the problem has become so non-convex that it finds a poor solution. Can anybody help explain why my loss would increase as I add features?
There could be multiple reasons, but my guess is the following:
penalty - by default, logistic regression is trained with an l2 penalty to prevent overfitting. In this case, the loss function being minimised is the cross-entropy loss plus the l2 norm of the weights. As a result, adding more features will not necessarily guarantee that the cross-entropy itself decreases.
By the way, it seems like your goal is to get the highest score (lowest loss) on the training set. I'm not going to dispute that, but you may want to look into test/validation sets instead.
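A minimal sketch of that effect on synthetic data (assuming scikit-learn >= 1.2, where penalty=None is accepted): the default l2-penalised fit minimises cross-entropy plus a regularisation term, so the plain training log loss reported by sklearn.metrics.log_loss need not keep decreasing:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

# Made-up multinomial problem standing in for the NHTS variables.
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)

for penalty in (None, 'l2'):                            # unregularised vs the default l2 penalty
    clf = LogisticRegression(penalty=penalty, max_iter=5000).fit(X, y)
    print(penalty, log_loss(y, clf.predict_proba(X)))   # training log loss for each fit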

Exact definitions of loss functions in sklearn.linear_model.SGDClassifier

I know that I can change the loss function to one of the following:
loss : str, 'hinge' or 'log' or 'modified_huber'
The loss function to be used. Defaults to 'hinge'. The hinge loss is
a margin loss used by standard linear SVM models. The 'log' loss is
the loss of logistic regression models and can be used for
probability estimation in binary classifiers. 'modified_huber'
is another smooth loss that brings tolerance to outliers.
But what are the definitions of these functions?
I understand that hinge is max(0, 1 - margin). What are the others?
Here are the graphs of all these functions, taken from the scikit-learn example gallery: [plot of the hinge, log, and modified Huber losses as functions of the decision margin]
In the current dev version of the example, the losses are implemented inline in the script.
scikit-learn's source code is available on GitHub, so you can examine it. The list of loss functions can be found in sklearn/linear_model/stochastic_gradient.py, and the definitions of those losses are in sklearn/linear_model/sgd_fast.pyx#L46.
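For reference, here is a hedged sketch of those definitions written as plain Python functions of the margin m = y * f(x) with y in {-1, +1} (this mirrors what sgd_fast.pyx implements, but the code below is only illustrative):
import numpy as np

def hinge(m):
    # Standard linear-SVM hinge loss.
    return np.maximum(0.0, 1.0 - m)

def logistic_loss(m):
    # Logistic regression loss (negative log-likelihood in margin form).
    return np.log(1.0 + np.exp(-m))

def modified_huber(m):
    # Quadratically smoothed hinge: squared hinge for m >= -1, linear (-4m) below.
    return np.where(m >= -1.0, np.maximum(0.0, 1.0 - m) ** 2, -4.0 * m)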
