Does SHAP support logistic regression models?
Running the following code, I get:
import shap
from sklearn.linear_model import LogisticRegression

logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
predictions = logmodel.predict(X_test)
explainer = shap.TreeExplainer(logmodel)
Exception: Model type not yet supported by TreeExplainer: <class 'sklearn.linear_model.logistic.LogisticRegression'>
P.S. You are supposed to use a different explainer for different model types.
SHAP is model agnostic by definition. It looks like you have simply chosen an explainer that doesn't suit your model type. I suggest looking at KernelExplainer, which the creators describe as:
An implementation of Kernel SHAP, a model agnostic method to estimate SHAP values for any model. Because it makes no assumptions about the model type, KernelExplainer is slower than the other model type specific algorithms.
The documentation for SHAP is mostly solid and has some decent examples.
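A minimal sketch of that approach (assuming the X_train/X_test from the question; KernelExplainer wraps the model's prediction function and needs a background dataset, which is usually summarized for speed):
background = shap.kmeans(X_train, 10)                                # summarize the background data
explainer = shap.KernelExplainer(logmodel.predict_proba, background)
shap_values = explainer.shap_values(X_test)                          # one array per class for a classifier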
explainer = shap.LinearExplainer(logmodel, X_train) should work, as Logistic Regression is a linear model (note that LinearExplainer also expects background data alongside the model).
Logistic Regression is a linear model, so you should use the linear explainer.
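A minimal sketch of that (assuming the same X_train/X_test as above, with the training data passed as the background):
explainer = shap.LinearExplainer(logmodel, X_train)   # linear model + background data
shap_values = explainer.shap_values(X_test)           # per-feature contributions
shap.summary_plot(shap_values, X_test)                # optional: visualize them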
Related
I have recently developed a scikit-learn estimator (a classifier) and I now want to add sample_weight to the estimator. The reason is so I can apply boosting (i.e. AdaBoost) to the estimator (as AdaBoost requires sample_weight to be present in the estimator).
I had a look at a few different scikit-learn estimators such as linear regression, logistic regression and SVM, but they all seem to handle sample_weight differently, and it's not very clear to me:
Linear regression:
https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/linear_model/_base.py#L375
Logistic regression:
https://github.com/scikit-learn/scikit-learn/blob/95d4f0841/sklearn/linear_model/_logistic.py#L1459
SVM:
https://github.com/scikit-learn/scikit-learn/blob/95d4f0841d57e8b5f6b2a570312e9d832e69debc/sklearn/svm/_base.py#L796
So I am confused now and want to know: how do I add sample_weight to my estimator? Is there a standard way of doing this in scikit-learn, or does it just depend on the estimator? Any templates or examples would be really appreciated. Many thanks in advance.
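For reference, the common convention across scikit-learn estimators is to accept sample_weight as an optional keyword argument of fit, defaulting to uniform weights when it is None. A toy sketch (WeightedMajorityClassifier is a made-up example, not a real sklearn class):
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin

class WeightedMajorityClassifier(BaseEstimator, ClassifierMixin):
    """Toy estimator: predicts the class with the largest total sample weight."""

    def fit(self, X, y, sample_weight=None):
        # scikit-learn convention: sample_weight is an optional keyword
        # argument of fit; None means all samples count equally.
        y = np.asarray(y)
        if sample_weight is None:
            sample_weight = np.ones(len(y))
        self.classes_ = np.unique(y)
        totals = [sample_weight[y == c].sum() for c in self.classes_]
        self.majority_ = self.classes_[int(np.argmax(totals))]
        return self

    def predict(self, X):
        return np.full(len(X), self.majority_)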
This is my understanding of OvO versus OvA:
One versus One is binary classification like Banana versus Orange. One versus All/Rest classification turns it into multiple different binary classification problems.
My implementation in Python for these two strategies yields very similar results:
OvA:
model = LogisticRegression(random_state=0, multi_class='ovr', solver='lbfgs')
model.fit(x,y)
model.predict(x)
OvO:
model = LogisticRegression()
model.fit(x,y)
model.predict(x)
I wanted to confirm my understanding and implementation is correct since I get similar results.
I need to implement the OvO and OvA strategies for multiclass classification using logistic regression.
I ended up using the built-in sklearn classes OneVsRestClassifier and OneVsOneClassifier.
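A minimal sketch of that approach (wrapping LogisticRegression in the two multiclass meta-estimators, using the x and y from the question):
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# One-vs-Rest: one binary classifier per class (that class vs. everything else)
ova = OneVsRestClassifier(LogisticRegression()).fit(x, y)

# One-vs-One: one binary classifier per pair of classes
ovo = OneVsOneClassifier(LogisticRegression()).fit(x, y)

ova.predict(x)
ovo.predict(x)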
I'm using SGDClassifier with loss function = "hinge". But hinge loss does not support probability estimates for class labels.
I need probabilities for calculating roc_curve. How can I get probabilities for hinge loss in SGDClassifier without using SVC from svm?
I've seen people mention about using CalibratedClassifierCV to get the probabilities but I've never used it and I don't know how it works.
I really appreciate the help. Thanks
In the strict sense, that's not possible.
Support vector machine classifiers are non-probabilistic: they use a hyperplane (a line in 2D, a plane in 3D and so on) to separate points into one of two classes. A point's prediction is determined directly by which side of the hyperplane it falls on.
This is in contrast with probabilistic classifiers like logistic regression and decision trees, which generate a probability for every point that is then converted to a prediction.
CalibratedClassifierCV is a sort of meta-estimator; to use it, you simply pass your instance of a base estimator to its constructor, so this will work:
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import SGDClassifier

base_model = SGDClassifier(loss="hinge")
model = CalibratedClassifierCV(base_model)
model.fit(X, y)
model.predict_proba(X)
What it does is perform internal cross-validation to fit a calibrator that maps the decision scores to probability estimates. Note that this is essentially what sklearn.svm.SVC does internally when probability=True.
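For the ROC curve specifically, a minimal sketch (assuming binary labels; roc_curve needs the probability of the positive class):
from sklearn.metrics import roc_curve

probs = model.predict_proba(X)[:, 1]        # probability of the positive class
fpr, tpr, thresholds = roc_curve(y, probs)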
I am applying a set of linear and non-linear classification models in a classification task. The input data are language vectors (CountVectorizer, Word2Vec) and binary labels. In scikit-learn, I selected the following estimators:
LogisticRegression(),
LinearSVC(),
XGBClassifier(),
SGDClassifier(),
SVC(), # Radial basis function kernel
BernoulliNB(), # Naive Bayes seems widely used for LV models
KNeighborsClassifier(),
RandomForestClassifier(),
MLPClassifier()
Question: Am I correct that LinearSVC() is a linear classifier, at least for the case of a binary estimator?
Question: In view of experts, is there any significant redundancy among the classifiers?
Thanks for clarification.
LogisticRegression(), LinearSVC(), SGDClassifier() and BernoulliNB() are linear models.
With its default loss function, SGDClassifier() works as a linear SVM; with log loss it works as a logistic regression, so one of those three is redundant. Also, you could replace LogisticRegression() with LogisticRegressionCV(), which has built-in optimization of the regularization hyperparameter.
XGBClassifier() and all the others are non-linear.
The list seems to include all the major sklearn classifiers.
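As a sketch of the overlap mentioned above (the loss names follow the SGDClassifier API; "log" was renamed to "log_loss" in newer scikit-learn versions):
from sklearn.linear_model import SGDClassifier

svm_like = SGDClassifier(loss="hinge")   # the default: behaves like a linear SVM
logreg_like = SGDClassifier(loss="log")  # behaves like logistic regression ("log_loss" in newer versions)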
I'm trying a range of online classifiers in the scikit-learn library to train a model on huge data. I found there are many classifiers supporting partial_fit, allowing for incremental learning. I want to use Ridge Regression in this setting, but could not find it in the implementation. Is there an alternative model that can do this in sklearn?
sklearn.linear_model.SGDClassifier supports partial_fit; its loss function can be 'hinge', 'log', 'modified_huber', 'squared_hinge' or 'perceptron'.
sklearn.linear_model.SGDRegressor also supports partial_fit; its default loss function is 'squared_loss', and the possible values are 'squared_loss', 'huber', 'epsilon_insensitive', or 'squared_epsilon_insensitive'.
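A minimal sketch of using SGDRegressor as an incremental stand-in for Ridge regression (squared loss plus an L2 penalty gives a ridge-style objective; the batches iterable is just illustrative):
from sklearn.linear_model import SGDRegressor

# 'squared_loss' was renamed to 'squared_error' in newer scikit-learn versions
model = SGDRegressor(loss="squared_loss", penalty="l2", alpha=1e-4)

for X_batch, y_batch in batches:   # batches: an iterable of mini-batches (illustrative)
    model.partial_fit(X_batch, y_batch)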