I am using Keras to build a CNN, and I am unsure what exactly the Accuracy metric does.
I have done some research, and it appears that it returns the accuracy of the model. Where is this information stored exactly? Does this metric affect the epoch results?
I cannot find any resources that describe in depth what the Accuracy metric does. How are my results affected by using this metric?
model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer='adam',
    metrics=['accuracy']
)
The Keras documentation does not explain the purpose of this metric.
For your question it is easier to check the Keras source code, because deep learning frameworks in general have poor documentation.
Firstly, you need to find how string representations are processed:
if metric in ('accuracy', 'acc'):
    metric_fn = metrics_module.categorical_accuracy
This leads to the metrics module, where the categorical_accuracy function is defined:
def categorical_accuracy(y_true, y_pred):
    return K.cast(K.equal(K.argmax(y_true, axis=-1),
                          K.argmax(y_pred, axis=-1)),
                  K.floatx())
Clearly the function returns a tensor, whereas only a single number is shown in the logs, so there is a wrapper function that processes the tensor of comparison results:
weighted_metric_fn = weighted_masked_objective(metric_fn)
This wrapper function contains the logic for calculating the final values. As no weights and masks are defined, just a simple averaging is used:
return K.mean(score_array)
So, there is an equation as a result:
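In words: the logged accuracy is the number of samples whose predicted class (the argmax of the prediction vector) matches the true class, divided by the number of samples. A minimal pure-Python sketch of that computation follows; the data is made up, and the helper names `argmax` and `categorical_accuracy_sketch` are illustrative, not part of Keras.

```python
def argmax(values):
    """Index of the largest element (the first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def categorical_accuracy_sketch(y_true, y_pred):
    """Mean of per-sample matches between argmax(y_true) and argmax(y_pred)."""
    matches = [float(argmax(t) == argmax(p)) for t, p in zip(y_true, y_pred)]
    return sum(matches) / len(matches)


# Made-up data: 4 samples, 3 classes, one-hot labels.
y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.5, 0.3]]

accuracy = categorical_accuracy_sketch(y_true, y_pred)
print(accuracy)  # 3 of the 4 predictions match, so 0.75
```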
P.S. I slightly disagree with @VnC, because accuracy and precision are different terms. Accuracy is the rate of correct predictions in a classification task, whereas precision is the positive predictive value, i.e. the share of predicted positives that are truly positive.
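To make the distinction concrete, here is a small sketch with made-up binary labels in which the two quantities differ. The variable names are illustrative only.

```python
# Made-up labels: one true positive, one false positive, two true negatives.
y_true = [1, 0, 0, 0]
y_pred = [1, 1, 0, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct

print(accuracy, precision)  # 0.75 0.5
```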
It is only used to report on your model's performance, i.e. how accurate your predictions are, and shouldn't affect the model in any way.
Accuracy basically means precision:
precision = true_positives / ( true_positives + false_positives )
I would recommend using f1_score, as it combines precision and recall.
Hope that clears it up.
Any metric is a function of the model's predictions and the ground truth, same as a loss. The accuracy of a model by itself makes no sense; it's not a property of the model alone, but also of the dataset on which the model is evaluated.
Accuracy in particular is a metric used for classification, and it is just the ratio between the number of correct predictions (prediction equal to label), and the total number of data points in the dataset.
Any metric that is evaluated during training/evaluation is information for you; it's not used to train the model. Only the loss function is used for the actual training of the weights.
In many ML applications a weighted loss may be desirable since some types of incorrect predictions might be worse outcomes than other errors. E.g. in medical binary classification (healthy/ill) a false negative, where the patient doesn't get further examinations is a worse outcome than a false positive, where a follow-up examination will reveal the error.
So if I define a weighted loss function like this:
def weighted_loss(prediction, target):
    if prediction == target:
        return 0  # correct, no loss
    elif prediction == 0:  # class 0 is healthy
        return 100  # false negative, very bad
    else:
        return 1  # false positive, incorrect
How can I pass something equivalent to this to scikit-learn classifiers like Random Forests or SVM classifiers?
I am afraid your question is ill-posed, stemming from a fundamental confusion between the different notions of loss and metric.
Loss functions do not work with prediction == target-type conditions - this is what metrics (like accuracy, precision, recall etc) do - which, however, play no role during loss optimization (i.e. training), and serve only for performance assessment. Loss does not work with hard class predictions; it only works with the probabilistic outputs of the classifier, where such equality conditions never apply.
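A small sketch may help here. It uses binary cross-entropy as the loss (an assumption; the point holds for other probabilistic losses) and two made-up sets of predicted probabilities. Both give identical hard predictions at a 0.5 threshold, so the accuracy is the same, yet the loss clearly tells them apart.

```python
import math


def binary_cross_entropy(y_true, probs):
    """Average negative log-likelihood of the true labels."""
    total = 0.0
    for t, p in zip(y_true, probs):
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def accuracy_at(y_true, probs, threshold=0.5):
    """Accuracy of the hard predictions obtained by thresholding the probabilities."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(t == q for t, q in zip(y_true, preds)) / len(y_true)


y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]    # made-up outputs of a confident model
hesitant = [0.6, 0.4, 0.55, 0.45]   # made-up outputs of a hesitant model

acc_confident = accuracy_at(y_true, confident)
acc_hesitant = accuracy_at(y_true, hesitant)
loss_confident = binary_cross_entropy(y_true, confident)
loss_hesitant = binary_cross_entropy(y_true, hesitant)

print(acc_confident, acc_hesitant)    # identical accuracies
print(loss_confident, loss_hesitant)  # the confident model has the lower loss
```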
An additional layer of "insulation" between loss and metrics is the choice of a threshold, which is necessary for converting the probabilistic outputs of a classifier (only thing that matters during training) to "hard" class predictions (only thing that matters for the business problem under consideration). And again, this threshold plays absolutely no role during model training (where the only relevant quantity is the loss, which knows nothing about thresholds and hard class predictions); as nicely put in the Cross Validated thread Reduce Classification Probability Threshold:
the statistical component of your exercise ends when you output a probability for each class of your new sample. Choosing a threshold beyond which you classify a new observation as 1 vs. 0 is not part of the statistics any more. It is part of the decision component.
Although you can certainly try to optimize this (decision) threshold with extra procedures outside of the narrowly-defined model training (i.e. loss minimization), as you briefly describe in the comments, your expectation that
I am pretty sure that I'd get better results if the decision boundaries drawn by the RBFs took that into account, when fitting to the data
with something similar to your weighted_loss function is futile.
So, no function similar to your weighted_loss shown here (essentially a metric, and not a loss function, despite its name) that employs equality conditions like prediction == target can be used for model training.
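The costs from the question can still be used after training, when picking the decision threshold. The following is a minimal sketch under stated assumptions: the probabilities are made-up values of P(ill) from some already-trained classifier, class 1 means ill, and the function name `total_cost` and the candidate thresholds are hypothetical.

```python
def total_cost(y_true, probs, threshold, fn_cost=100, fp_cost=1):
    """Cost of the hard decisions obtained by thresholding P(ill)."""
    cost = 0
    for t, p in zip(y_true, probs):
        pred = 1 if p >= threshold else 0
        if pred == 0 and t == 1:
            cost += fn_cost  # ill patient sent home
        elif pred == 1 and t == 0:
            cost += fp_cost  # healthy patient gets a follow-up examination
    return cost


y_true = [0, 0, 0, 0, 1, 1]
probs = [0.1, 0.2, 0.4, 0.6, 0.3, 0.8]  # made-up P(ill) on a validation set

thresholds = [0.1, 0.3, 0.5, 0.7, 0.9]
costs = {th: total_cost(y_true, probs, th) for th in thresholds}
best_threshold = min(thresholds, key=lambda th: costs[th])

print(costs)
print(best_threshold)  # the low threshold 0.3 wins because false negatives are expensive
```

The model itself is untouched by this procedure; only the conversion from probabilities to decisions changes.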
The discussion in the following SO threads might also be useful in clarifying the issue:
Loss & accuracy - Are these reasonable learning curves?
What is the difference between loss function and metric in Keras? (despite the title, the definitions are generally applicable and not only for Keras)
Cost function training target versus accuracy desired goal
How to interpret loss and accuracy for a machine learning model
I'm designing a classification model.
My problem is that many categories have similar features.
I think the best option would be to regenerate the category hierarchy, but the categories are fixed.
So I am focusing on 3-best accuracy instead of 1-best accuracy.
I want to define a loss function for 3-best accuracy.
I don't care whether the correct answer is in position 1, 2, or 3.
Is there a good loss function for that? Or how can I define one?
You can use keras.metrics.top_k_categorical_accuracy to calculate accuracy, but this is an accuracy metric. I don't think there is any built-in top_k loss function in TensorFlow or Keras as of now. A loss function must be differentiable to work with gradient-based learning methods, and top_k, just like the accuracy metric, is not a differentiable function. Hence it can be used as an accuracy metric but not as a learning objective. So you won't find a built-in method for this; however, there are research papers that aim to solve this problem. You might want to have a look at Learning with Average Top-k Loss and Smooth Loss Functions for Deep Top-k Classification.
You can use either of the following:
top_k_categorical_accuracy
keras.metrics.top_k_categorical_accuracy(y_true, y_pred, k=3)
sparse_top_k_categorical_accuracy
keras.metrics.sparse_top_k_categorical_accuracy(y_true, y_pred, k=3)
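For intuition about what these metrics report, here is a pure-Python sketch of top-k accuracy with integer labels. It is an illustration of the idea, not the Keras implementation; the function name `top_k_accuracy_sketch` and the scores are made up.

```python
def top_k_accuracy_sketch(true_labels, scores, k=3):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for label, row in zip(true_labels, scores):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        hits += label in ranked[:k]
    return hits / len(true_labels)


# Made-up scores for 4 samples and 4 classes (no ties within a row).
true_labels = [0, 2, 3, 1]
scores = [
    [0.50, 0.25, 0.15, 0.10],  # true class ranked 1st
    [0.40, 0.30, 0.20, 0.10],  # true class ranked 3rd
    [0.40, 0.30, 0.20, 0.10],  # true class ranked 4th
    [0.05, 0.60, 0.25, 0.10],  # true class ranked 1st
]

top1 = top_k_accuracy_sketch(true_labels, scores, k=1)
top3 = top_k_accuracy_sketch(true_labels, scores, k=3)
print(top1, top3)  # 0.5 0.75
```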
I am trying to do binary classification, and one class (0) is approximately one third the size of the other class (1). When I run the raw data through a normal feed-forward neural network, the accuracy is about 0.78. However, when I implement class_weights, the accuracy drops to about 0.49. The ROC curve also seems to do better without the class_weights. Why does this happen, and how can I fix it?
I have already tried changing the model and implementing regularization, dropout, etc., but nothing seems to change the overall accuracy.
This is how I get my weights:
class_weights = class_weight.compute_class_weight(class_weight='balanced', classes=np.unique(y_train), y=y_train)
class_weight_dict = dict(enumerate(class_weights))
Here are the results without the weights:
Here are the results with the weights:
I would expect the results to be better with the class_weights, but the opposite seems to be true. Even the ROC curve does not seem to do any better with the weights.
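For reference, scikit-learn documents the 'balanced' heuristic as n_samples / (n_classes * count of each class). The sketch below reproduces that formula in pure Python on a toy label vector with the 1:3 ratio described above; the helper name `balanced_class_weights` and the data are illustrative.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Per-class weight: n_samples / (n_classes * number of samples in that class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in sorted(counts.items())}


# Toy labels: class 0 is one third the size of class 1.
y_train = [0] * 25 + [1] * 75
class_weight_dict = balanced_class_weights(y_train)

print(class_weight_dict)  # class 0 gets weight 2.0, class 1 roughly 0.67
```

With these weights, an error on a class-0 sample counts three times as much in the loss as an error on a class-1 sample.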
Due to the class imbalance a very weak baseline of always selecting the majority class will get accuracy of approximately 75%.
The validation curve of the network that was trained without class weights appears to show that it is picking a solution close to always selecting the majority class. This can be seen from the network not improving much over the validation accuracy it gets in the 1st epoch.
I would recommend looking into the confusion matrix, precision and recall metrics to get more information about which model is better.
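The majority-class baseline can be checked with a few lines. This sketch uses a toy label vector with a 25/75 split (an assumption matching the approximate ratio in the question) and shows that the baseline reaches 75% accuracy while never identifying the minority class.

```python
# Toy data: class 1 is the majority (75%), class 0 the minority (25%).
y_true = [1] * 75 + [0] * 25
y_pred = [1] * len(y_true)  # baseline: always predict the majority class

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)

accuracy = (tp + tn) / len(y_true)
recall_minority = tn / (tn + fp)  # share of class-0 samples that were found

print([[tn, fp], [fn, tp]])       # confusion matrix, rows = true class 0, 1
print(accuracy, recall_minority)  # 0.75 0.0
```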
This answer seems too late, but I hope it is helpful anyway. I just want to add four points:
Since the proportion of your data is minority: 25% and majority: 75%, accuracy is computed as:
accuracy = (true positive + true negative) / (true positive + true negative + false positive + false negative)
Thus, if you look at accuracy as a metric, almost any model could achieve around 75% accuracy by simply predicting the majority class all the time. That's why the model was not able to predict correctly on the validation set.
While with class weights, the learning curve was not smooth but the model actually started to learn and it failed from time to time on the validation set.
As already stated, switching to a metric such as the F1 score might help. I see that you are using TensorFlow; TensorFlow Addons has an F1 score metric, which is described in its documentation. Personally, I looked at the classification report in scikit-learn. Say you want to see the model's performance on the validation set (X_val, y_val):
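from sklearn.metrics import classification_report
# model.predict returns probabilities; threshold them to get class labels
# (assumes a single sigmoid output, as is usual for binary classification)
y_prob = model.predict(X_val, batch_size=64, verbose=1)
y_predict = (y_prob > 0.5).astype(int).ravel()
print(classification_report(y_val, y_predict))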
Other techniques you might want to try are combining upsampling and downsampling, or SMOTE.
Best of luck!
If I correctly understood the significance of the loss function to the model, it directs the model to be trained based on minimizing the loss value. So, for example, if I want my model to be trained to have the least mean absolute error, I should use the MAE as the loss function. Why is it that you sometimes see someone wanting to achieve the best accuracy possible, but building the model to minimize a completely different function? For example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
How come the model above is trained to give us the best acc, since during its training it will try to minimize another function (MSE)? I know that, once trained, the metric of the model will give us the best acc found during the training.
My doubt is: shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE? If done that way, wouldn't the model give us even higher accuracy, since it knows it has to maximize it during its training?
To start with, the code snippet you have used as example:
model.compile(loss='mean_squared_error', optimizer='sgd', metrics='acc')
is actually invalid (although Keras will not produce any error or warning) for a very simple and elementary reason: MSE is a valid loss for regression problems, for which accuracy is meaningless (it is meaningful only for classification problems, where MSE is not a valid loss function). For details (including a code example), see my answer in What function defines accuracy in Keras when the loss is mean squared error (MSE)?; for a similar situation in scikit-learn, see my answer in this thread.
Continuing to your general question: in regression settings, usually we don't need a separate performance metric, and we normally use just the loss function itself for this purpose, i.e. the correct code for the example you have used would simply be
model.compile(loss='mean_squared_error', optimizer='sgd')
without any metrics specified. We could of course use metrics='mse', but this is redundant and not really needed. Sometimes people use something like
model.compile(loss='mean_squared_error', optimizer='sgd', metrics=['mse','mae'])
i.e. optimise the model according to the MSE loss, but show also its performance in the mean absolute error (MAE) in addition to MSE.
Now, your question:
shouldn't the focus of the model during its training be to maximize acc (or minimize 1/acc) instead of minimizing MSE?
is indeed valid, at least in principle (save for the reference to MSE), but only for classification problems, where, roughly speaking, the situation is as follows: we cannot use the vast arsenal of convex optimization methods in order to directly maximize the accuracy, because accuracy is not a differentiable function; so, we need a proxy differentiable function to use as loss. The most common example of such a loss function suitable for classification problems is the cross entropy.
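A toy sketch of why accuracy gives the optimizer nothing to work with: for a one-parameter logistic model p = sigmoid(w * x) on made-up, perfectly separable data, accuracy is the same for every positive w (so its gradient with respect to w is zero there), while cross-entropy keeps decreasing as w grows. The data, the model, and the chosen values of w are assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Made-up 1-D data: negative inputs belong to class 0, positive inputs to class 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]


def accuracy(w):
    """Accuracy of the thresholded predictions of the model p = sigmoid(w * x)."""
    preds = [1 if sigmoid(w * x) >= 0.5 else 0 for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


def cross_entropy(w):
    """Average binary cross-entropy of the same model."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(ys)


weights = [0.5, 1.0, 2.0]
accuracies = [accuracy(w) for w in weights]
losses = [cross_entropy(w) for w in weights]

print(accuracies)  # flat: the same value for every w tried
print(losses)      # strictly decreasing as w grows
```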
Rather unsurprisingly, this question of yours pops up from time to time, albeit in slight variations in context; see for example my answers in
Cost function training target versus accuracy desired goal
Targeting a specific metric to optimize in tensorflow
For the interplay between loss and accuracy in the special case of binary classification, you may find my answers in the following threads useful:
Loss & accuracy - Are these reasonable learning curves?
How does Keras evaluate the accuracy?
I have a skewed dataset (5,000,000 positive examples and only 8,000 negative [binary classified]), and thus, I know, accuracy is not a useful model evaluation metric. I know how to calculate precision and recall mathematically, but I am unsure how to implement them in Python code.
When I train the model on all the data I get 99% accuracy overall but 0% accuracy on the negative examples (i.e. it classifies everything as positive).
I have built my current model in PyTorch with criterion = nn.CrossEntropyLoss() and optimiser = optim.Adam().
So, my question is, how do I implement precision and recall into my training to produce the best model possible?
Thanks in advance
Implementations of precision, recall, F1 score, and other metrics are usually imported from the scikit-learn library in Python.
link: http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
Regarding your classification task, the number of positive training samples simply eclipses the number of negative samples. Try training with a reduced number of positive samples or generating more negative samples. I am not sure deep neural networks could provide you with an optimal result given the class skew.
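Reducing the number of positive samples can be as simple as random undersampling. The sketch below is one hypothetical way to do it in pure Python; the function name `undersample_majority`, its `ratio` parameter, and the toy data are all assumptions, and dedicated libraries offer more refined strategies.

```python
import random


def undersample_majority(samples, labels, majority_label, ratio=1.0, seed=0):
    """Keep every minority sample and a random subset of the majority samples.

    ratio is the desired majority/minority count after resampling.
    """
    rng = random.Random(seed)
    majority = [i for i, y in enumerate(labels) if y == majority_label]
    minority = [i for i, y in enumerate(labels) if y != majority_label]
    keep = min(len(majority), int(ratio * len(minority)))
    chosen = sorted(rng.sample(majority, keep) + minority)
    return [samples[i] for i in chosen], [labels[i] for i in chosen]


# Toy stand-in for the skewed data: 1000 positives, 10 negatives.
labels = [1] * 1000 + [0] * 10
samples = list(range(len(labels)))

new_samples, new_labels = undersample_majority(samples, labels, majority_label=1)
print(new_labels.count(1), new_labels.count(0))  # 10 10
```

Evaluation should still be done on data with the original class distribution, otherwise the reported precision will be misleading.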
Negative samples can be generated using the Synthetic Minority Over-sampling Technique (SMOTE). This link is a good place to start.
Link: https://www.analyticsvidhya.com/blog/2017/03/imbalanced-classification-problem/
Try using simple models such as logistic regression or random forest first and check if there is any improvement in the F1 score of the model.
To add to the other answer, some classifiers have a parameter called class_weight which lets you modify the loss function. By penalizing wrong predictions on the minority class more, you can train your classifier to learn to predict both classes.
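To illustrate what such a weight does to the loss, here is a pure-Python sketch of a class-weighted binary cross-entropy. It is not any particular library's implementation; normalising by the sum of the weights is one possible convention and is an assumption here, as are the weights and the data.

```python
import math


def weighted_cross_entropy(y_true, probs, class_weight):
    """Weighted average of per-sample log-loss; each weight comes from the true class."""
    total, weight_sum = 0.0, 0.0
    for t, p in zip(y_true, probs):
        w = class_weight[t]
        loss = -math.log(p if t == 1 else 1.0 - p)
        total += w * loss
        weight_sum += w
    return total / weight_sum


# Made-up case: a model that says "positive" with probability 0.9 for everything.
y_true = [1, 1, 1, 1, 0]
probs = [0.9] * 5

plain = weighted_cross_entropy(y_true, probs, {0: 1.0, 1: 1.0})
weighted = weighted_cross_entropy(y_true, probs, {0: 4.0, 1: 1.0})

print(plain, weighted)  # the weighted loss punishes the missed negative much harder
```

Because the always-positive solution now has a higher loss, the optimizer has more incentive to move away from it.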
For a PyTorch-specific answer, you can refer to this link
As mentioned in the other answer, over and undersampling strategies can be used. If you are looking for something better, take a look at this paper