I am currently creating criterion to measure the MSE loss function using:
loss_fcn = torch.nn.MSELoss()
loss = loss_fcn(logits[getMaskForBatch(subgraph)], labels.float())
Now I need to change it to F1 score but I cannot seem to find one library that could be used for it
In particular, depending on the task you need to have a specific loss function.
Loss function also known as objective, cost or error function is somehow opposite to the optimization function. Loss function creates the loss, optimization function reduces the loss. :). These two functions should live in equilibrium so we don't overfit.
PyTorch Regression losses:
nn.L1Loss L1 Loss (MAE)
nn.MSELoss L2 Loss (MSE)
nn.SmoothL1Loss Huber
PyTorch Classification losses:
nn.CrossEntropyLoss
nn.KLDivLoss
nn.NLLLoss
PyTorch GAN training
nn.MarginRankingLoss
So if you used nn.MSELoss you probably need to stay with regression, because F1 is a classification metric.
If you really need F1 score for some other reason, you may use scikit learn.
Why do you need to do that?
F1 score is usually an evaluation metrics not a loss function. Moreover, to use F1 score as a loss function you have to ensure that it's differentiable and convex (which I think probably is not the case otherwise it would already be in the literature).
There're many loss functions that could sweet with your problem like crossentropy, negative log likelyhood, CTC loss, etc.
Related
I searched all over the web but I couldn't find it. how does Keras calculate the loss if we have multiple output values.
It depends on which loss function you are using.
Usually, For each batch during training, Keras will call the
loss function to compute the loss and use it to perform a Gradient
Descent step. Moreover, it will keep track of the total loss since the
beginning of the epoch, and it will display the mean loss.
Regarding multiple outputs, same process happens and calculation wise you can pick any loss function mentioned in the document here and check the examples.
Let's say for SparseCategoricalCrossentropy you can check this document.
I'm design a classification model.
I have a problem, there are many categories which has similar features.
I think best options is re-generate category hierarchy, but those are fixed.
So, I focused on 3-best accuracy, instead of 1-best accuracy.
I want to defined a loss function for 3-best accuracy.
I don't care where is the answer in position 1 - 3.
Is there any good loss function for that? of How can I define it?
You can use keras.metrics.top_k_categorical_accuracy for calculating accuracy. But this one is accuracy metric. I don't think there is any inbuilt top_k loss function in TensorFlow or Keras as of now. A loss function should be differentiable to work with gradient based learning methods. While top_k is not a differentiable function. Just like accuracy metric. Hence it can be used as accuracy metric but not as learning objective. So you won't find any inbuilt method for this, however there are other research papers aiming to solve this problems. You might want to have a look at Learning with Average Top-k Loss and Smooth Loss Functions for Deep Top-k Classification.
you can use any of the below
top_k_categorical_accuracy
keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k=3)
sparse_top_k_categorical_accuracy
keras.metrics.sparse_top_k_categorical_accuracy(y_true, y_pred, k=3)
If I correctly understood the significance of the loss function to the model, it directs the model to be trained based on minimizing the loss value. So for example, if I want my model to be trained in order to have the least mean absolute error, i should use the MAE as the loss function. Why is it, for example, sometimes you see someone wanting to achieve the best accuracy possible, but building the model to minimize another completely different function? For example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
How come the model above is trained to give us the best acc, since during it's training it will try to minimize another function (MSE). I know that, when already trained, the metric of the model will give us the best acc found during the training.
My doubt is: shouldn't the focus of the model during it's training to maximize acc (or minimize 1/acc) instead of minimizing MSE? If done in that way, wouldn't the model give us even higher accuracy, since it knows it has to maximize it during it's training?
To start with, the code snippet you have used as example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
is actually invalid (although Keras will not produce any error or warning) for a very simple and elementary reason: MSE is a valid loss for regression problems, for which problems accuracy is meaningless (it is meaningful only for classification problems, where MSE is not a valid loss function). For details (including a code example), see own answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)?; for a similar situation in scikit-learn, see own answer in this thread.
Continuing to your general question: in regression settings, usually we don't need a separate performance metric, and we normally use just the loss function itself for this purpose, i.e. the correct code for the example you have used would simply be
model.compile(loss='mean_squared_error', optimizer='sgd')
without any metrics specified. We could of course use metrics='mse', but this is redundant and not really needed. Sometimes people use something like
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mse','mae'])
i.e. optimise the model according to the MSE loss, but show also its performance in the mean absolute error (MAE) in addition to MSE.
Now, your question:
shouldn't the focus of the model during its training to maximize acc (or minimize 1/acc) instead of minimizing MSE?
is indeed valid, at least in principle (save for the reference to MSE), but only for classification problems, where, roughly speaking, the situation is as follows: we cannot use the vast arsenal of convex optimization methods in order to directly maximize the accuracy, because accuracy is not a differentiable function; so, we need a proxy differentiable function to use as loss. The most common example of such a loss function suitable for classification problems is the cross entropy.
Rather unsurprisingly, this question of yours pops up from time to time, albeit in slight variations in context; see for example own answers in
Cost function training target versus accuracy desired goal
Targeting a specific metric to optimize in tensorflow
For the interplay between loss and accuracy in the special case of binary classification, you may find my answers in the following threads useful:
Loss & accuracy - Are these reasonable learning curves?
How does Keras evaluate the accuracy?
I am performing a Multinomial Logistic Regression on variables in the NHTS 2017 dataset. According to the docs, sklearn.linear_model.LogisticRegression uses cross-entropy loss (log loss) as the loss function to optimize the model. However, as I add new features and fit the model, the loss does not seem to be monotone decreasing. Specifically, if I fit household driver count to vehicle ownership, (driver count is the single most predictive variable for vehicle ownership), I get less loss than if I indiscriminately fit all of the variables.
Possibly this is due to sklearn.metrics.log_loss doing something different than the actual loss function for LogisticRegression. Possibly the problem has become so non-convex that it finds a crappy solution. Can anybody hep explain why my loss would increase as I add features?
There could be multiple reasons but my guess is the following:
penalty - by default logistic regression is trained with a l2
penalty to prevent overfitting. In this case, the loss function is cross entropy loss plus the l2 norm of weights. As a result, more features will not necessarily guarantee that the cross entropy itself decreases.
Btw, it seems like your goal is to get the highest score (lowest loss) on a training set. I am not gonna dispute that but maybe look into test/validation sets.
I know that I may change loss function to one of the following:
loss : str, 'hinge' or 'log' or 'modified_huber'
The loss function to be used. Defaults to 'hinge'. The hinge loss is
a margin loss used by standard linear SVM models. The 'log' loss is
the loss of logistic regression models and can be used for
probability estimation in binary classifiers. 'modified_huber'
is another smooth loss that brings tolerance to outliers.
But what the definitions of this functions?
I understand that hinge is max(0, 1 - margin). And what are others too?
Here are the graphs of all these functions, taken from the scikit-learn example gallery:
In the current dev version of the example, the losses are implemented inline in the script.
sklearn's source code is available on GitHub, so you can examine it. List of loss functions can be found in sklearn/linear_model/stochastic_gradient.py. Definitions of that losses are here: sklearn/linear_model/sgd_fast.pyx#L46