scikit-learn classification on soft labels - python

According to the documentation it is possible to specify different loss functions to SGDClassifier. And as far as I understand log loss is a cross-entropy loss function which theoretically can handle soft labels, i.e. labels given as some probabilities [0,1].
The question is: is it possible to use SGDClassifier with log loss function out the box for classification problems with soft labels? And if not - how this task (linear classification on soft labels) can be solved using scikit-learn?
UPDATE:
The way target is labeled and by the nature of the problem hard labels don't give good results. But it is still a classification problem (not regression) and I wan't to keep probabilistic interpretation of the prediction so regression doesn't work out of the box too. Cross-entropy loss function can handle soft labels in target naturally. It seems that all loss functions for linear classifiers in scikit-learn can only handle hard labels.
So the question is probably:
How to specify my own loss function for SGDClassifier, for example. It seems scikit-learn doesn't stick to the modular approach here and changes need to be done somewhere inside it's sources

I recently had this problem and came up with a nice fix that seems to work.
Basically, transform your targets to log-odds-ratio space using the inverse sigmoid function. Then fit a linear regression. Then, to do inference, take the sigmoid of the predictions from the linear regression model.
So say we have soft targets/labels y ∈ (0, 1) (make sure to clamp the targets to say [1e-8, 1 - 1e-8] to avoid instability issues when we take logs).
We take the inverse sigmoid, then we fit a linear regression (assuming predictor variables are in matrix X):
y = np.clip(y, 1e-8, 1 - 1e-8) # numerical stability
inv_sig_y = np.log(y / (1 - y)) # transform to log-odds-ratio space
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, inv_sig_y)
Then to make predictions:
def sigmoid(x):
ex = np.exp(x)
return ex / (1 + ex)
preds = sigmoid(lr.predict(X_new))
This seems to work, at least for my use case. My guess is that it's not far off what happens behind the scenes for LogisticRegression anyway.
Bonus: this also seems to work with other regression models in sklearn, e.g. RandomForestRegressor.

According to the docs,
The ‘log’ loss gives logistic regression, a probabilistic classifier.
In general a loss function is of the form Loss( prediction, target ), where prediction is the model's output, and target is the ground-truth value. In the case of logistic regression, prediction is a value on (0,1) (i.e., a "soft label"), while target is 0 or 1 (i.e., a "hard label").
So in answer to your question, it depends on if you are referring to the prediction or target. Generally speaking, the form of the labels ("hard" or "soft") is given by the algorithm chosen for prediction and by the data on hand for target.
If your data has "hard" labels, and you desire a "soft" label output by your model (which can be thresholded to give a "hard" label), then yes, logistic regression is in this category.
If your data has "soft" labels, then you would have to choose a threshold to convert them to "hard" labels before using typical classification methods (i.e., logistic regression). Otherwise, you could use a regression method where the model is fit to predict the "soft" target. In this latter approach, your model could give values outside of (0,1), and this would have to be handled.

for those interested, i've implemented a custom class that behaves like a normal classifier, but takes a any regressor in the cosntructor to perform the transformation suggested by #nlml:
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.utils.validation import check_array
from scipy.special import softmax
import numpy as np
def _log_odds_ratio_scale(X):
X = np.clip(X, 1e-8, 1 - 1e-8) # numerical stability
X = np.log(X / (1 - X)) # transform to log-odds-ratio space
return X
class FuzzyTargetClassifier(ClassifierMixin, BaseEstimator):
def __init__(self, regressor):
'''
Fits regressor in the log odds ratio space (inverse crossentropy) of target variable.
during transform, rescales back to probability space with softmax function
Parameters
---------
regressor: Sklearn Regressor
base regressor to fit log odds ratio space. Any valid sklearn regressor can be used here.
'''
self.regressor = regressor
return
def fit(self, X, y=None, **kwargs):
#ensure passed y is onehotencoded-like
y = check_array(y, accept_sparse=True, dtype = 'numeric', ensure_min_features=1)
self.regressors_ = [clone(self.regressor) for _ in range(y.shape[1])]
for i in range(y.shape[1]):
self._fit_single_regressor(self.regressors_[i], X, y[:,i], **kwargs)
return self
def _fit_single_regressor(self, regressor, X, ysub, **kwargs):
ysub = _log_odds_ratio_scale(ysub)
regressor.fit(X, ysub, **kwargs)
return regressor
def decision_function(self,X):
all_results = []
for reg in self.regressors_:
results = reg.predict(X)
if results.ndim < 2:
results = results.reshape(-1,1)
all_results.append(results)
results = np.hstack(all_results)
return results
def predict_proba(self, X):
results = self.decision_function(X)
results = softmax(results, axis = 1)
return results
def predict(self, X):
results = self.decision_function(X)
results = results.argmax(1)
return results

Related

Minimizing BER for sklearn linear models

I'm using sklearn's LinearSVC and LogisticRegression models for classification. I want the optimization in training to minimize BER (balanced error rate). I noticed that you can set the class_weight parameter in the models to 'balanced', which should (according to documentation) 'automatically adjust weights inversely proportional to class frequencies in the input data'. However, you can also pass sample weights as a parameter to the fit() method. I'm wondering if these are equivalent paths of achieving the same goal? Anything I should be aware of / concerned about here?
--
(Added detail below based on bot suggestion)
For example:
You could define
model = sklearn.svm.LinearSVC(class_weight = 'balanced')
and then fit
model.fit(X_train, y_train)
OR you could define
model = sklearn.svm.LinearSVC()
and then fit
wts = _array of length n (number of samples) formulated to balance the classes_
model.fit(X_train,y_train, sample_weight = wts)
...
Are these equivalent and do either/both satisfy my goal of minimizing BER?

Python: statsmodels - what does .predict(X) actually predict?

I'm a bit confused as to what the line model.predict(X) actually predicts. I can't find anything on it with a Google search.
import statsmodels.api as sm
# Step 1) Load data into dataframe
df = pd.read_csv('my_data.csv')
# Step 2) Separate dependent and independent variables
X = df['independent_variable']
y = df["dependent_variable"]
# Step 3) using OLS -fit a linear regression
model = sm.OLS(y, X).fit()
predictions = model.predict(X) # make predictions
predictions
I'm not sure what predictions is showing? Is it predicting the next x amount of rows or something? Aren't I just passing in my independent variables?
You are fitting an OLS model from your data, which is most likely interpreted as an array. The predict method will returns an array of fitted values given the trained model.
In other words, from statsmodels documentation:
Return linear predicted values from a design matrix.
Similar to the sk-learn. After model = sm.OLS(y, X).fit(), you will have a model, then predictions = model.predict(X) is not predict next x amount of rows, it will predict from your X, the training dataset. The model using ordinary least squares will be a function of "x" and the output should be:
$$ \hat{y}=f(x) $$
If you want to predict the new X, you need to split X into training and testing dataset.
Actually you are doing it wrong
The predict method is use to predict next values
After separating dependent and I dependent values
You can split the data in two part train and test
From sklearn.model_selection import train_test_split
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,0.2)
This will make X_train ur 80% of total data with only independent variable
And you can put your y_test in predict method to check how well the model is performing

Underfitting, Overfitting, Good_Generalization

So as a part of my assignment I'm applying linear and lasso regressions, and here's Question 7.
Based on the scores from question 6, what gamma value corresponds to a
model that is underfitting (and has the worst test set accuracy)? What
gamma value corresponds to a model that is overfitting (and has the
worst test set accuracy)? What choice of gamma would be the best
choice for a model with good generalization performance on this
dataset (high accuracy on both training and test set)?
Hint: Try plotting the scores from question 6 to visualize the
relationship between gamma and accuracy. Remember to comment out the
import matplotlib line before submission.
This function should return one tuple with the degree values in this order: (Underfitting, Overfitting, Good_Generalization) Please note there is only one correct solution.
I really need help, I can't really think of any way to solve this last question. What code should I use to determine (Underfitting, Overfitting, Good_Generalization) and why???
Thanks,
Data set: http://archive.ics.uci.edu/ml/datasets/Mushroom?ref=datanews.io
Here's my code from question 6:
from sklearn.svm import SVC
from sklearn.model_selection import validation_curve
def answer_six():
# SVC requires kernel='rbf', C=1, random_state=0 as instructed
# C: Penalty parameter C of the error term
# random_state: The seed of the pseudo random number generator
# used when shuffling the data for probability estimates
# e radial basis function kernel, or RBF kernel, is a popular
# kernel function used in various kernelized learning algorithms,
# In particular, it is commonly used in support vector machine
# classification
model = SVC(kernel='rbf', C=1, random_state=0)
# Return numpy array numbers spaced evenly on a log scale (start,
# stop, num=50, endpoint=True, base=10.0, dtype=None, axis=0)
gamma = np.logspace(-4,1,6)
# Create a Validation Curve for model and subsets.
# Create parameter name and range regarding gamma. Test Scoring
# requires accuracy.
# Validation curve requires X and y.
train_scores, test_scores = validation_curve(model, X_subset, y_subset, param_name='gamma', param_range=gamma, scoring ='accuracy')
# Determine mean for scores and tests along columns (axis=1)
sc = (train_scores.mean(axis=1), test_scores.mean(axis=1))
return sc
answer_six()
Well, make yourself familiar with overfitting. You are supposed to produce something like this: Article on this topic
On the left you have underfitting, on the right overfitting... Where both errors are low you have good generalisation.
And these things are a function of gamma (the regularizor)
Overfitting = your model false
if model false
scatter it
change linear to poly or suport vector with working kernel...
Underfitting = your dataset false
add new data ideal correleated ...
check by nubers
score / accuracy of test and train if test and train high and no big difference you are doiing good ...
if test low or train low then you facing overfitting / underfitting
hope explained you ...

How to use a custom loss function in a Neural Network with MLPClassifier Sklearn?

I would like to use a custom loss function to train a neural network in scikit learn; using MLPClassifier. I would like to give more importance to larger values. Therefore, I would like to use something like the mean square error but multiplying the numerator by y. Thus, it would look like :
1/n∑y(yi-y(hat)i)^2
Here is the code of my model:
mlp10 = MLPClassifier(hidden_layer_sizes=(150,100,50,25,10), max_iter=1000,
random_state=42)
mlp10.fit(X_train, y_train)
How can I modify the loss function ?
I don't believe you can modify the loss function directly as there is no parameter for it in the construction of the classifier and the documentation explicitly specifies that it's optimizing using the log-loss function. If you're willing to be a bit flexible, you might be able to get the effect you're looking for simply by an transform of the y values before training and then use the inverse transform to recover the predicted ys after testing.
For instance, mapping y_prime = transform(y) and y = inverse_transform(y_prime) on each value where you define transform and inverse_transform as:
def transform(y):
return y ** 2
def inverse_transform(y_prime):
return math.sqrt(y_prime)
would cause larger values of y to have more influence in the training. Obviously you could experiment with different transforms to see what works best for your use-case. The key is just to make sure that transform is superlinear.
Before training you'd need to do:
y_train = map(transform, y_train)
And after calling predict:
y_predict = model.predict(x)
y_predict = map(inverse_transform, y_predict)

post-process cross-validated prediction before scoring

I have a regression problem, where I am cross-validating the results and evaluating the performance. I know beforehand that the ground truth cannot be smaller than zero. Therefore, I would like to intercept the predictions, before they are fed to the score metric, to clip the predictions to zero. I thought that using the make_scorer function would be useful to do this. Is it possible to somehow post-process the predictions after cross-validation, but before applying an evaluation metric to it?
from sklearn.metrics import mean_squared_error, r2_score, make_scorer
from sklearn.model_selection import cross_validate
# X = Stacked feature vectors
# y = ground truth vector
# regr = some regression estimator
#### How to indicate that the predictions need post-processing
#### before applying the score function???
scoring = {'r2': make_scorer(r2_score),
'neg_mse': make_scorer(mean_squared_error)}
scores = cross_validate(regr, X, y, scoring=scoring, cv=10)
PS: I know there are constrained estimators, but I wanted to see how a heuristic approach like this would perform.
One thing you can do is wrap those scorers you're looking to use (r2_score, mean_squared_error) in a custom scorer function using make_scorer() as you suggested.
Take a look at this part of the sklearn documentation and this Stack Overflow post for some examples. In particular, your function can do something like this:
def clipped_r2(y_true, y_pred):
y_pred_clipped = np.clip(y_pred, 0, None)
return r2_score(y_true, y_pred_clipped)
def clipped_mse(y_true, y_pred):
y_pred_clipped = (y_pred, 0, None)
return mean_squared_error(y_true, y_pred_clipped)
This allows you to do the post-processing right within the scorer before calling the scoring function (in this case r2_score or mean_squared_error). Then to use it just use make_scorer like you were doing above, setting greater_is_better according to whether the scorer is a scoring function (like r2, greater is better), or loss function (mean_squared_error is better when it's 0, i.e. less):
scoring = {'r2': make_scorer(clipped_r2, greater_is_better=True),
'neg_mse': make_scorer(clipped_mse, greater_is_better=False)}
scores = cross_validate(regr, X, y, scoring=scoring, cv=10)

Categories