Evaluation of multilabel and multiclass data labels - python

Are there any evaluation metrics available for multiclass-multilabel classification?
For example, I'm taking part in the following competition on Kaggle, which requires ROC AUC as the evaluation metric: http://www.kaggle.com/c/mlsp-2013-birds
Is it possible to do this using sklearn?

There's this library from Kaggle's Director of Engineering:
https://github.com/benhamner/Metrics/tree/master/Python

As of 2021, sklearn.metrics includes several functions you can use to evaluate multiclass-multilabel classification models. For example, accuracy_score computes the subset accuracy: the fraction of predictions in which every predicted label for a sample is correct. The hamming_loss function computes the Hamming loss, the fraction of individual labels that are incorrectly predicted, over a given test set. You can find an in-depth discussion of the available metrics here.
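As a quick illustration, a minimal sketch with made-up multilabel indicator arrays:

import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss, roc_auc_score

# Toy multilabel data: rows are samples, columns are labels
y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
y_pred = np.array([[1, 0, 0], [0, 1, 0], [1, 1, 1]])
print(accuracy_score(y_true, y_pred))  # subset accuracy: 1/3, only the middle row matches exactly
print(hamming_loss(y_true, y_pred))    # 2 of the 9 individual labels are wrong: ~0.22

# ROC AUC (the metric in the Kaggle competition above) needs scores rather than hard labels
y_score = np.array([[0.9, 0.2, 0.4], [0.1, 0.8, 0.3], [0.7, 0.6, 0.5]])
print(roc_auc_score(y_true, y_score, average='macro'))  # per-label AUCs, averaged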

Related

Is it possible to apply sklearn evaluation metrics such as precision, recall, and f1_score to my problem?

My deep learning task is to classify images into 5 different categories. I used the ImageDataGenerator utility to split my dataset into train and test sets. I've developed a CNN model architecture and evaluated the model's performance on a test dataset, which gave me an accuracy of 83%.
Is it possible to apply sklearn evaluation metrics such as precision, recall, f1_score, etc. to evaluate my test results? If yes, how can I do it?
Yes, you can, as long as your model outputs either class labels or probabilities as its predictions.
If your model predicts the encoded (integer) labels directly, you can use
sklearn.metrics.precision_score(y_true, model.predict(test_x), average='macro')
(with more than two classes you must pass an averaging scheme such as 'macro', 'micro', or 'weighted'; the default average='binary' raises an error).
On the other hand, if the model predicts probabilities, which is normally the case, you first have to convert them to class labels using argmax. So if you have a batch of test_x data, you can use
sklearn.metrics.precision_score(y_true, np.argmax(model.predict(test_x), axis=1), average='macro')
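If you want per-class numbers in one shot, classification_report wraps precision, recall, and F1; a minimal sketch, reusing model, test_x, and the integer labels y_true from above:

import numpy as np
from sklearn.metrics import classification_report

y_pred = np.argmax(model.predict(test_x), axis=1)  # probabilities -> class labels
print(classification_report(y_true, y_pred))       # per-class precision, recall, and F1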

Nature and redundancy of classifiers

I am applying a set of linear and non-linear classification models in a classification task. The input data are language vectors (CountVectorizer, Word2Vec) and binary labels. In scikit-learn, I selected the following estimators:
LogisticRegression(),
LinearSVC(),
XGBClassifier(),
SGDClassifier(),
SVC(), # Radial basis function kernel
BernoulliNB(), # Naive Bayes seems widely used for LV models
KNeighborsClassifier(),
RandomForestClassifier(),
MLPClassifier()
Question: Am I correct that LinearSVC() is a linear classifier, at least in the case of a binary estimator?
Question: In view of experts, is there any significant redundancy among the classifiers?
Thanks for clarification.
LogisticRegression(), LinearSVC(), SGDClassifier() and BernoulliNB() are linear models.
With the default loss function SGDClassifier() works as a linear SVM; with log loss, as a logistic regression, so one of these three is redundant. You could also replace LogisticRegression() with LogisticRegressionCV(), which has built-in tuning of the regularization hyperparameter.
XGBClassifier() and all the others are non-linear.
The list seems to include all the major sklearn classifiers.
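To make the SGDClassifier() equivalence concrete, a minimal sketch (the loss is named 'log_loss' in recent scikit-learn releases, 'log' in older ones):

from sklearn.linear_model import SGDClassifier

svm_like = SGDClassifier(loss='hinge')        # the default: behaves like LinearSVC()
logreg_like = SGDClassifier(loss='log_loss')  # behaves like LogisticRegression()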

How to perform cross validation on NMF Python

I am trying to perform cross-validation on NMF to find the best parameters to use. I tried using sklearn's cross-validation but got an error stating that NMF does not have a scoring method. Could anyone here help me with that? Thank you all.
A property of NMF is that it is an unsupervised (machine learning) method. This generally means that there is no labeled data that can serve as a 'gold standard'.
In the case of NMF, you cannot define the 'desired' outcome beforehand.
The cross-validation in sklearn is designed for supervised machine learning, in which you have labeled data by definition.
What cross-validation does is hold out a set of labeled data, train a model on the data that is left over, and evaluate this model on the held-out set. Any metric can be used for this evaluation, for example accuracy, precision, recall, or F-measure, and computing these measures requires labeled data.
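That said, a common workaround is to score NMF by its reconstruction error on held-out rows. A minimal sketch using GridSearchCV with a custom scorer, assuming X is your nonnegative data matrix; the parameter grid is only illustrative, and held-out reconstruction error still tends to favor larger n_components, so treat it as a rough guide:

import numpy as np
from sklearn.decomposition import NMF
from sklearn.model_selection import GridSearchCV

def neg_reconstruction_error(estimator, X, y=None):
    # GridSearchCV maximizes the score, so negate the Frobenius reconstruction error
    W = estimator.transform(X)
    return -np.linalg.norm(X - W @ estimator.components_)

search = GridSearchCV(NMF(max_iter=500),
                      param_grid={'n_components': [5, 10, 20]},
                      scoring=neg_reconstruction_error, cv=3)
search.fit(X)  # no labels needed; NMF's fit ignores y
print(search.best_params_)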

How to use SciKit Random Forests's oob_decision_function_ for learning curves?

Can someone explain how to use the oob_decision_function_ attribute for the python SciKit Random Forest Classifier? I want to use it to plot learning curves comparing training and validation error against different training set sizes in order to identify overfitting and other problems. Can't seem to find any information about how to do this.
You can pass a custom scoring function into any of the scoring parameters in the model evaluation functions; it needs to have the signature (classifier, X, y_true) -> score.
For your case you could use something like
from sklearn.model_selection import learning_curve  # sklearn.learning_curve in older releases

# r must be a RandomForestClassifier constructed with oob_score=True, or oob_score_ won't exist
learning_curve(r, X, y, cv=3, scoring=lambda c, x, y: c.oob_score_)
This will compute 3-fold cross-validated OOB scores against different training set sizes. By the way, random forests are fairly resistant to overfitting; that's one of their benefits.
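To actually plot the curve, here is a minimal sketch assuming X and y are your data. Note that this custom scorer ignores the held-out fold entirely, so the train and validation score arrays both contain the same OOB scores; plot one of them against training set size:

import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

rf = RandomForestClassifier(oob_score=True)  # required so oob_score_ exists after fitting
sizes, train_scores, oob_scores = learning_curve(
    rf, X, y, cv=3, scoring=lambda c, x, y: c.oob_score_)

plt.plot(sizes, oob_scores.mean(axis=1), marker='o', label='OOB score')
plt.xlabel('training set size')
plt.ylabel('score')
plt.legend()
plt.show()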

Scikit-learn categorisation: binomial log regression?

I have texts that are rated on a continuous scale from -100 to +100. I am trying to classify them as positive or negative.
How can you perform binomial log regression to get the probability that test data is -100 or +100?
The closest I have got is SGDClassifier(penalty='l2', alpha=1e-05, n_iter=10), but this doesn't provide the same results as SPSS when I use binomial log regression to predict the probability of -100 and +100. So I'm guessing this is not the right function?
SGDClassifier provides access to several linear classifiers, all trained with stochastic gradient descent. It defaults to a linear support vector machine unless you call it with a different loss function; loss='log' (named 'log_loss' in recent scikit-learn releases) gives a probabilistic logistic regression.
See the documentation at:
http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier
Alternatively, you could use sklearn.linear_model.LogisticRegression to classify your texts with a logistic regression.
It's not clear to me that you will get exactly the same results as you do with SPSS due to differences in implementation. However, I would not expect to see statistically significant differences.
Edited to add:
My suspicion is that the 99% accuracy you're getting with the SPSS logistic regression is training set accuracy, while the 87% that you're seeing with the scikit-learn logistic regression is test set accuracy. I found this question on the Data Science Stack Exchange where a different person is attempting an extremely similar problem and getting ~99% accuracy on training sets and 90% test set accuracy.
https://datascience.stackexchange.com/questions/987/text-categorization-combining-different-kind-of-features
My recommended path forward is as follows: try several different basic classifiers in scikit-learn, including the standard logistic regression and a linear SVM, and also rerun the SPSS logistic regression several times with different train/test subsets of your data and compare the results. If you continue to see a large divergence across classifiers that can't be accounted for by ensuring similar train/test data splits, then post the results that you're seeing into your question, and we can move forward from there.
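For the first step, a minimal sketch of that comparison, assuming X is your vectorized text and y the binary labels:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

for clf in (LogisticRegression(), LinearSVC()):
    scores = cross_val_score(clf, X, y, cv=5)  # 5-fold test-set accuracy
    print(type(clf).__name__, scores.mean())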
Good luck!
If pos/neg, or the probability of pos, is really the only thing you need as output, then you can derive binary labels y as
y = score > 0
assuming you have the scores in a NumPy array score.
You can then feed this to a LogisticRegression instance, using the continuous score to derive relative weights for the samples:
import numpy as np
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
sample_weight = np.abs(score)  # weight each sample by the magnitude of its rating
sample_weight /= sample_weight.sum()
clf.fit(X, y, sample_weight=sample_weight)
This gives maximum weight to tweets with scores ±100, and a weight of zero to tweets that are labeled neutral, varying linearly between the two.
If the dataset is very large, then as @brentlance showed, you can use SGDClassifier, but you have to give it loss="log" (loss="log_loss" in newer releases) if you want a logistic regression model; otherwise, you'll get a linear SVM.
