f1_score metric in lightgbm - python

I want to train an LGB model with a custom metric: f1_score with weighted average.
I went through the advanced examples of LightGBM over here and found the implementation of a custom binary error function. I implemented a similar function to return f1_score, as shown below.
from sklearn.metrics import f1_score

def f1_metric(preds, train_data):
    labels = train_data.get_label()
    return 'f1', f1_score(labels, preds, average='weighted'), True
I tried to train the model by passing f1_metric to the feval parameter, as shown below.
evals_results = {}
bst = lgb.train(params,
                dtrain,
                valid_sets=[dvalid],
                valid_names=['valid'],
                evals_result=evals_results,
                num_boost_round=num_boost_round,
                early_stopping_rounds=early_stopping_rounds,
                verbose_eval=25,
                feval=f1_metric)
Then I get ValueError: Found input variables with inconsistent numbers of samples:
The training set is being passed to the function rather than the validation set.
How can I configure it so that the validation set is passed and the f1_score is returned?

The docs are a bit confusing. When describing the signature of the function that you pass to feval, they call its parameters preds and train_data, which is a bit misleading.
But the following seems to work:
import numpy as np
from sklearn.metrics import f1_score

def lgb_f1_score(y_hat, data):
    y_true = data.get_label()
    y_hat = np.round(y_hat)  # scikit-learn's f1_score doesn't accept probabilities
    return 'f1', f1_score(y_true, y_hat), True
evals_result = {}
clf = lgb.train(param, train_data,
                valid_sets=[val_data, train_data],
                valid_names=['val', 'train'],
                feval=lgb_f1_score,
                evals_result=evals_result)
lgb.plot_metric(evals_result, metric='f1')
To use more than one custom metric, define one overall custom metric function just like above, in which you calculate all metrics and return a list of tuples, as sketched below.
Edit: Fixed the code; of course with F1, bigger is better, so the third return value should be set to True.
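For example, a single feval that reports both F1 and precision could look like the following sketch (the 0.5 threshold and the precision metric are illustrative assumptions; feval callables may return a list of (name, value, is_higher_better) tuples):

import numpy as np
from sklearn.metrics import f1_score, precision_score

def lgb_f1_and_precision(y_hat, data):
    y_true = data.get_label()
    y_hat = np.where(y_hat < 0.5, 0, 1)  # 0.5 threshold is an assumption (binary case)
    return [
        ('f1', f1_score(y_true, y_hat), True),                # higher is better
        ('precision', precision_score(y_true, y_hat), True),  # higher is better
    ]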

Regarding Toby's answer:
def lgb_f1_score(y_hat, data):
    y_true = data.get_label()
    y_hat = np.round(y_hat)  # scikit-learn's f1_score doesn't accept probabilities
    return 'f1', f1_score(y_true, y_hat), True
I suggest changing the y_hat part to this:
y_hat = np.where(y_hat < 0.5, 0, 1)
Reason:
I used y_hat = np.round(y_hat) and found that during training the LightGBM model will sometimes (very unlikely, but still a chance) treat our y prediction as multiclass instead of binary.
My speculation:
Sometimes the y prediction will be low or high enough to be rounded to a negative value or to 2? I'm not sure, but when I changed the code to use np.where, the bug was gone.
It cost me a morning to figure out this bug, although I'm not really sure whether the np.where solution is good.

Related

How to use a custom loss function in a Neural Network with MLPClassifier Sklearn?

I would like to use a custom loss function to train a neural network in scikit-learn, using MLPClassifier. I would like to give more importance to larger values. Therefore, I would like to use something like the mean squared error but with the summand multiplied by y. Thus, it would look like:
(1/n) Σᵢ yᵢ (yᵢ − ŷᵢ)²
Here is the code of my model:
mlp10 = MLPClassifier(hidden_layer_sizes=(150, 100, 50, 25, 10), max_iter=1000,
                      random_state=42)
mlp10.fit(X_train, y_train)
How can I modify the loss function?
I don't believe you can modify the loss function directly, as there is no parameter for it in the construction of the classifier and the documentation explicitly specifies that it optimizes using the log-loss function. If you're willing to be a bit flexible, you might be able to get the effect you're looking for simply by a transform of the y values before training and then using the inverse transform to recover the predicted ys after testing.
For instance, mapping y_prime = transform(y) and y = inverse_transform(y_prime) on each value, where you define transform and inverse_transform as:
import math

def transform(y):
    return y ** 2

def inverse_transform(y_prime):
    return math.sqrt(y_prime)
would cause larger values of y to have more influence in the training. Obviously you could experiment with different transforms to see what works best for your use-case. The key is just to make sure that transform is superlinear.
Before training you'd need to do:
y_train = list(map(transform, y_train))  # wrap in list() so scikit-learn gets a sequence (Python 3)
And after calling predict:
y_predict = model.predict(x)
y_predict = list(map(inverse_transform, y_predict))
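A minimal end-to-end sketch of this transform idea, written with numpy so the transforms apply element-wise; note that MLPRegressor is swapped in here as an assumption, since squaring and square-rooting only make sense for numeric targets rather than class labels:

import numpy as np
from sklearn.neural_network import MLPRegressor

# toy data: positive targets so the square/sqrt transform is well defined
rng = np.random.default_rng(42)
X_train = rng.random((200, 5))
y_train = 10 * X_train[:, 0] + rng.random(200)

# train on the transformed targets so larger values weigh more in the squared error
model = MLPRegressor(hidden_layer_sizes=(50, 25), max_iter=2000, random_state=42)
model.fit(X_train, np.square(y_train))

# invert the transform on the predictions
y_predict = np.sqrt(np.clip(model.predict(X_train), 0, None))  # clip: raw predictions can dip below 0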

post-process cross-validated prediction before scoring

I have a regression problem, where I am cross-validating the results and evaluating the performance. I know beforehand that the ground truth cannot be smaller than zero. Therefore, I would like to intercept the predictions, before they are fed to the score metric, to clip the predictions to zero. I thought that using the make_scorer function would be useful to do this. Is it possible to somehow post-process the predictions after cross-validation, but before applying an evaluation metric to it?
from sklearn.metrics import mean_squared_error, r2_score, make_scorer
from sklearn.model_selection import cross_validate
# X = Stacked feature vectors
# y = ground truth vector
# regr = some regression estimator
#### How to indicate that the predictions need post-processing
#### before applying the score function???
scoring = {'r2': make_scorer(r2_score),
           'neg_mse': make_scorer(mean_squared_error)}
scores = cross_validate(regr, X, y, scoring=scoring, cv=10)
PS: I know there are constrained estimators, but I wanted to see how a heuristic approach like this would perform.
One thing you can do is wrap those scorers you're looking to use (r2_score, mean_squared_error) in a custom scorer function using make_scorer() as you suggested.
Take a look at this part of the sklearn documentation and this Stack Overflow post for some examples. In particular, your function can do something like this:
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

def clipped_r2(y_true, y_pred):
    y_pred_clipped = np.clip(y_pred, 0, None)  # clip negative predictions to zero
    return r2_score(y_true, y_pred_clipped)

def clipped_mse(y_true, y_pred):
    y_pred_clipped = np.clip(y_pred, 0, None)
    return mean_squared_error(y_true, y_pred_clipped)
This allows you to do the post-processing right within the scorer before calling the scoring function (in this case r2_score or mean_squared_error). Then to use it just use make_scorer like you were doing above, setting greater_is_better according to whether the scorer is a scoring function (like r2, greater is better), or loss function (mean_squared_error is better when it's 0, i.e. less):
scoring = {'r2': make_scorer(clipped_r2, greater_is_better=True),
           'neg_mse': make_scorer(clipped_mse, greater_is_better=False)}
scores = cross_validate(regr, X, y, scoring=scoring, cv=10)
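Each entry of the dict returned by cross_validate is an array with one score per fold, keyed as test_<name>; a small sketch of reading them (names taken from the snippet above):

# per-fold results; with greater_is_better=False, make_scorer negates the MSE,
# so values closer to zero are better
print(scores['test_r2'])
print(scores['test_neg_mse'])
print(scores['test_r2'].mean(), scores['test_neg_mse'].mean())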

Logistic Regression mean square error

NOTE: I appreciate the massive quantity of comments suggesting that this is an inappropriate way to quantify model performance. However, that is irrelevant to my error, and this error occurs for a variety of other metrics. Also, see here for the appropriate way to respond when you think the OP is "asking the wrong question".
I have an sklearn logistic model for which I am attempting to get the RMSE. However, when I call .predict_proba, I get a vector of probabilities, whereas my y_test is in its categorical form, which sklearn.linear_model.LogisticRegression just sort of dealt with automagically.
How do I reconcile these two things to get the RMSE?
>>> sklearn.metrics.mean_squared_error(y_test, pred_proba, sample_weight=weights_test)
ValueError: y_true and y_pred have different number of output (1!=13)
predict_proba is predicting the probability that a sample belongs to a class. The arg max of those probabilities is the predicted class (categorical form). RMSE is not a metric for classification. If you want to evaluate your model, consider a different metric like accuracy_score:
from sklearn.metrics import accuracy_score
predictions = your_model.predict(X_test)
print("Accuracy: %.3f" % accuracy_score(y_test, predictions))
The brier score, basically the mean squared error, is a known and valid loss function for classification models that leverage probability scores; I would take a look at that as well.
To your particular issue, you want to compare the probabilities returned for your target class, i.e. for a binary class problem:
from sklearn.metrics import brier_score_loss
probs = your_model.predict_proba(X_test)
brier_score_loss(y_true, probs[:, 1])
I'm not sure brier is formally defined for multiclass problems. I would point to the idea of mean misclassification error, which averages the error across classes.
To leverage this within the sklearn API, encode your y_true categorically, i.e. each class gets its own column, and call
sklearn.metrics.mean_squared_error(y_true, probs, multioutput='uniform_average')
Here is how you can calculate RMSE:
import numpy as np
from sklearn.metrics import mean_squared_error
x = np.arange(10)
y = x
rmse = np.sqrt(mean_squared_error(x, y))
One can transform the y_test into a format compatible with the predict_proba output as follows:
model = sklearn.linear_model.LogisticRegression().fit(X,y) # or whatever model
label_encoder = sklearn.preprocessing.LabelEncoder()
label_encoder.classes_ = model.classes_
y_test_onehot = sklearn.preprocessing.OneHotEncoder().fit_transform(label_encoder.transform(y_test).reshape((-1,1)))
You can now apply any of the metrics in sklearn.metrics. This is essential for computing, say, the Brier score.
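For instance, the multi-output MSE mentioned earlier could be computed against the one-hot matrix like this (a sketch; pred_proba is assumed to come from model.predict_proba(X_test), and .toarray() densifies the sparse one-hot output):

import numpy as np
from sklearn.metrics import mean_squared_error

pred_proba = model.predict_proba(X_test)  # shape (n_samples, n_classes)
mse = mean_squared_error(y_test_onehot.toarray(), pred_proba, multioutput='uniform_average')
rmse = np.sqrt(mse)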

sklearn RandomizedSearchCV extract confusion matrix for different folds

I try to calculate an aggregated confusion matrix to evaluate my model:
cv_results = cross_validate(estimator, dataset.data, dataset.target, scoring=scoring,
                            cv=Config.CROSS_VALIDATION_FOLDS, n_jobs=N_CPUS,
                            return_train_score=False)
But I don't know how to extract the single confusion matrices of the different folds. In a scorer I can compute it:
scoring = {
    'cm': make_scorer(confusion_matrix)
}
, but I cannot return the confusion matrix, because a scorer has to return a number instead of an array. If I try it, I get the following error:
ValueError: scoring must return a number, got [[...]] (<class 'numpy.ndarray'>) instead. (scorer=cm)
I wondered whether it would be possible to store the confusion matrices in a global variable, but I had no success using
global cm_list
cm_list.append(confusion_matrix(y_true, y_pred))
in a custom scorer.
Thanks in advance for any advice.
To return a confusion matrix for each fold, you can call confusion_matrix from the metrics module in each iteration (fold), which will give you an array as output. The input is the y_true and y_predict values obtained for that fold.
from sklearn import metrics
print(metrics.confusion_matrix(y_true, y_predict))
array([[327582, 264313],
       [167523, 686735]])
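A minimal sketch of that per-fold loop (the dataset, estimator, and StratifiedKFold split are illustrative assumptions, not from the question):

import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# toy stand-ins; swap in your own estimator and data
X, y = load_iris(return_X_y=True)
estimator = DecisionTreeClassifier(random_state=0)

fold_cms = []
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in skf.split(X, y):
    estimator.fit(X[train_idx], y[train_idx])
    y_predict = estimator.predict(X[test_idx])
    fold_cms.append(confusion_matrix(y[test_idx], y_predict))

aggregated_cm = np.sum(fold_cms, axis=0)  # element-wise sum over folds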
Alternatively, if you are using pandas, it has a crosstab function:
df_conf = pd.crosstab(y_true, y_predict, rownames=['Actual'], colnames=['Predicted'], margins=True)
print(df_conf)
Predicted       0       1     All
Actual
0          332553   58491  391044
1           97283  292623  389906
All        429836  351114  780950
The problem was that I could not access the estimator after RandomizedSearchCV had finished, because I did not know that RandomizedSearchCV implements a predict method. Here is my personal solution:
r_search = RandomizedSearchCV(estimator=estimator, param_distributions=param_distributions,
                              n_iter=n_iter, cv=cv, scoring=scorer, n_jobs=n_cpus,
                              refit=next(iter(scorer)))
r_search.fit(X, y_true)
y_pred = r_search.predict(X)
cm = confusion_matrix(y_true, y_pred)

scikit-learn classification on soft labels

According to the documentation it is possible to specify different loss functions to SGDClassifier. And as far as I understand log loss is a cross-entropy loss function which theoretically can handle soft labels, i.e. labels given as some probabilities [0,1].
The question is: is it possible to use SGDClassifier with log loss function out the box for classification problems with soft labels? And if not - how this task (linear classification on soft labels) can be solved using scikit-learn?
UPDATE:
Because of the way the target is labeled and the nature of the problem, hard labels don't give good results. But it is still a classification problem (not regression), and I want to keep the probabilistic interpretation of the prediction, so regression doesn't work out of the box either. A cross-entropy loss function can handle soft labels in the target naturally. It seems that all loss functions for linear classifiers in scikit-learn can only handle hard labels.
So the question is probably:
How do I specify my own loss function for SGDClassifier, for example? It seems scikit-learn doesn't stick to a modular approach here, and changes would need to be made somewhere inside its sources.
I recently had this problem and came up with a nice fix that seems to work.
Basically, transform your targets to log-odds-ratio space using the inverse sigmoid function. Then fit a linear regression. Then, to do inference, take the sigmoid of the predictions from the linear regression model.
So say we have soft targets/labels y ∈ (0, 1) (make sure to clamp the targets to say [1e-8, 1 - 1e-8] to avoid instability issues when we take logs).
We take the inverse sigmoid, then we fit a linear regression (assuming predictor variables are in matrix X):
y = np.clip(y, 1e-8, 1 - 1e-8) # numerical stability
inv_sig_y = np.log(y / (1 - y)) # transform to log-odds-ratio space
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X, inv_sig_y)
Then to make predictions:
def sigmoid(x):
    ex = np.exp(x)
    return ex / (1 + ex)

preds = sigmoid(lr.predict(X_new))
This seems to work, at least for my use case. My guess is that it's not far off what happens behind the scenes for LogisticRegression anyway.
Bonus: this also seems to work with other regression models in sklearn, e.g. RandomForestRegressor.
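For instance, swapping the base model in the snippet above (a sketch reusing X, inv_sig_y, X_new, and sigmoid from that snippet; the hyperparameters are arbitrary):

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X, inv_sig_y)                # fit in log-odds-ratio space
preds = sigmoid(rf.predict(X_new))  # map predictions back to (0, 1)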
According to the docs,
The ‘log’ loss gives logistic regression, a probabilistic classifier.
In general a loss function is of the form Loss( prediction, target ), where prediction is the model's output, and target is the ground-truth value. In the case of logistic regression, prediction is a value on (0,1) (i.e., a "soft label"), while target is 0 or 1 (i.e., a "hard label").
So in answer to your question, it depends on if you are referring to the prediction or target. Generally speaking, the form of the labels ("hard" or "soft") is given by the algorithm chosen for prediction and by the data on hand for target.
If your data has "hard" labels, and you desire a "soft" label output by your model (which can be thresholded to give a "hard" label), then yes, logistic regression is in this category.
If your data has "soft" labels, then you would have to choose a threshold to convert them to "hard" labels before using typical classification methods (i.e., logistic regression). Otherwise, you could use a regression method where the model is fit to predict the "soft" target. In this latter approach, your model could give values outside of (0,1), and this would have to be handled.
For those interested, I've implemented a custom class that behaves like a normal classifier, but takes any regressor in the constructor to perform the transformation suggested by @nlml:
import numpy as np
from scipy.special import softmax
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.utils.validation import check_array


def _log_odds_ratio_scale(X):
    X = np.clip(X, 1e-8, 1 - 1e-8)  # numerical stability
    X = np.log(X / (1 - X))         # transform to log-odds-ratio space
    return X


class FuzzyTargetClassifier(ClassifierMixin, BaseEstimator):

    def __init__(self, regressor):
        '''
        Fits a regressor in the log-odds-ratio space (inverse cross-entropy) of the target variable.
        During inference, rescales back to probability space with the softmax function.

        Parameters
        ----------
        regressor : sklearn regressor
            Base regressor to fit in log-odds-ratio space. Any valid sklearn regressor can be used here.
        '''
        self.regressor = regressor
        return

    def fit(self, X, y=None, **kwargs):
        # ensure the passed y is one-hot-encoded-like (one column of soft probabilities per class)
        y = check_array(y, accept_sparse=True, dtype='numeric', ensure_min_features=1)
        self.regressors_ = [clone(self.regressor) for _ in range(y.shape[1])]
        for i in range(y.shape[1]):
            self._fit_single_regressor(self.regressors_[i], X, y[:, i], **kwargs)
        return self

    def _fit_single_regressor(self, regressor, X, ysub, **kwargs):
        ysub = _log_odds_ratio_scale(ysub)
        regressor.fit(X, ysub, **kwargs)
        return regressor

    def decision_function(self, X):
        all_results = []
        for reg in self.regressors_:
            results = reg.predict(X)
            if results.ndim < 2:
                results = results.reshape(-1, 1)
            all_results.append(results)
        results = np.hstack(all_results)
        return results

    def predict_proba(self, X):
        results = self.decision_function(X)
        results = softmax(results, axis=1)
        return results

    def predict(self, X):
        results = self.decision_function(X)
        results = results.argmax(1)
        return results
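A quick usage sketch of the class above, with synthetic soft labels and RandomForestRegressor as the base regressor (both the data and the hyperparameters are illustrative assumptions):

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y_soft = rng.dirichlet(np.ones(3), size=200)  # 3 classes, each row sums to 1

clf = FuzzyTargetClassifier(RandomForestRegressor(n_estimators=50, random_state=0))
clf.fit(X, y_soft)

proba = clf.predict_proba(X)  # shape (200, 3), rows sum to 1 via softmax
labels = clf.predict(X)       # hard labels via argmax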
