So I am relatively new to the ML/AI game in Python, and I'm currently working on implementing a custom objective function for XGBoost.
My differential equation knowledge is pretty rusty, so I've created a custom objective function with a gradient and hessian that model the mean squared error function that is run as the default objective in XGBRegressor, to make sure that I am doing all of this correctly. The problem is, the results of the model (the error outputs) are close but not identical for the most part (and way off for some points). I don't know what I'm doing wrong, or how that could be possible if I am computing things correctly. If you all could look at this and maybe provide insight into where I am wrong, that would be awesome!
The original code without a custom function is:
import xgboost as xgb
reg = xgb.XGBRegressor(n_estimators=150,
                       max_depth=2,
                       objective="reg:squarederror",
                       n_jobs=-1)
reg.fit(X_train, y_train)
y_pred_test = reg.predict(X_test)
and my custom objective function for MSE is as follows:
def gradient_se(y_true, y_pred):
    # Compute the gradient of the squared error.
    return (-2 * y_true) + (2 * y_pred)

def hessian_se(y_true, y_pred):
    # Compute the hessian for the squared error.
    return 0 * (y_true + y_pred) + 2

def custom_se(y_true, y_pred):
    # Squared error objective. A simplified version of MSE used as
    # the objective function.
    grad = gradient_se(y_true, y_pred)
    hess = hessian_se(y_true, y_pred)
    return grad, hess
the documentation reference is here
Thanks!
According to the documentation, the library passes the predicted values (y_pred in your case) and the ground truth values (y_true in your case) in this order.
Because of this ordering, the arguments of your custom_se(y_true, y_pred) function actually arrive swapped, and they are then passed on in that swapped order to both the gradient_se and hessian_se functions. For the hessian it doesn't make a difference, since the hessian should return 2 for all x values, and you've done that correctly.
For the gradient_se function, however, the swap effectively flips the signs of y_true and y_pred, so the gradient you return has the wrong sign.
The correct implementation is as follows:
def gradient_se(y_pred, y_true):
    # Compute the gradient of the squared error.
    return 2 * (y_pred - y_true)

def hessian_se(y_pred, y_true):
    # Compute the hessian for the squared error.
    return 0 * y_true + 2

def custom_se(y_pred, y_true):
    # Squared error objective. A simplified version of MSE used as
    # the objective function.
    grad = gradient_se(y_pred, y_true)
    hess = hessian_se(y_pred, y_true)
    return grad, hess
Update: Please keep in mind that the native XGBoost implementation and the implementation of the sklearn wrapper for XGBoost use a different ordering of the arguments. The native implementation takes predictions first and true labels (dtrain) second, while the sklearn implementation takes the true labels (dtrain) first and the predictions second.
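For completeness, here is a minimal sketch (not part of the original answer; X_train and y_train are assumed to exist as in the question) of how the corrected objective can be plugged into each API, with the argument order from the note above:

import numpy as np
import xgboost as xgb

# sklearn wrapper: the custom objective receives (y_true, y_pred)
def custom_se_sklearn(y_true, y_pred):
    grad = 2 * (y_pred - y_true)
    hess = np.full_like(y_pred, 2.0)
    return grad, hess

reg = xgb.XGBRegressor(n_estimators=150, max_depth=2, objective=custom_se_sklearn)
reg.fit(X_train, y_train)

# native API: the custom objective receives (preds, dtrain)
def custom_se_native(preds, dtrain):
    y = dtrain.get_label()
    grad = 2 * (preds - y)
    hess = np.full_like(preds, 2.0)
    return grad, hess

dtrain = xgb.DMatrix(X_train, label=y_train)
bst = xgb.train({"max_depth": 2}, dtrain, num_boost_round=150, obj=custom_se_native)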
Related
How do I get the y_true and y_pred values for this loss function I wrote:
def kullback_leibler_divergence(y_true, y_pred):
    y_true = K.clip(y_true, K.epsilon(), 1)
    y_pred = K.clip(y_pred, K.epsilon(), 1)
    return K.sum(y_true * K.log(y_true / y_pred), axis=-1)
I'm not sure if it helps, but in machine learning:
y_true refers to the true values, the original labels.
y_pred contains the predictions of your classifier, i.e. y_pred = clf.predict(X), where X are the input features (not y_true).
The loss function essentially computes how far the predictions are from the 'true' labels.
As I understood your question, this might be useful. You can use from tensorflow.keras import backend as K to calculate anything involving y_true and y_pred, as in the following example.
When you compile the model, you should pass the function as the loss; Keras will then provide the y_true and y_pred values to it.
loss: String (name of objective function), objective function, or tf.keras.losses.Loss instance. See tf.keras.losses. An objective function is any callable with the signature loss = fn(y_true, y_pred).
# compile the model
model.compile(optimizer='adam', loss=kullback_leibler_divergence, metrics=['accuracy'])
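For illustration, here is a minimal self-contained sketch (the tiny model and random data are purely hypothetical) showing that fit() supplies y_true and y_pred to the custom loss automatically:

import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K

def kullback_leibler_divergence(y_true, y_pred):
    y_true = K.clip(y_true, K.epsilon(), 1)
    y_pred = K.clip(y_pred, K.epsilon(), 1)
    return K.sum(y_true * K.log(y_true / y_pred), axis=-1)

# a toy model and random data, just to show that fit() feeds
# y_true (the labels below) and y_pred (the model output) into the loss
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss=kullback_leibler_divergence, metrics=["accuracy"])

X = np.random.rand(32, 4).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 3, size=32), num_classes=3)
model.fit(X, y, epochs=1, verbose=0)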
I'm trying to use Squared Normalized Error as the objective function for my XGBoost regressor, following the documentation hints here: https://xgboost.readthedocs.io/en/latest/tutorials/custom_metric_obj.html. My objective function equation is:
((prediction - observation) / standard_deviation(observations))^2
While trying to develop it I encountered the following issues:
I am wondering if such an objective function is allowed, since the standard deviation contains information about all observations (labels), while the loss is calculated for each training example individually.
If my approach is correct, I am wondering how to calculate the gradient and hessian of this objective function. I analyzed the squared error loss function here: Creating a Custom Objective Function in for XGBoost.XGBRegressor, but failed to understand why x = (predictions - observations) is treated as a single parameter. In other words, why do we write the loss function as x^2 instead of (x - y)^2, where x and y correspond to predictions and observations respectively?
EDIT: I use XGBoost for the task of photovoltaic (PV) yield forecasting and I make predictions for multiple systems using one model. I would like to have a low percentage error for all systems, regardless of their size. However, squared error makes training focus on the largest systems, as their error is naturally the largest. I changed the objective function to:
(prediction - observation)^2 / system_size
and made system_size a global variable, since adding extra input arguments to the gradient and hessian functions is not allowed. The code runs without errors, but the predictions fall within a very small range. The gradient can simply be divided by system_sizes, since dividing the loss by a constant just divides its derivative by the same constant. Code I managed to develop so far:
import numpy as np
from xgboost import DMatrix

def gradient_sne(predt: np.ndarray, dtrain: DMatrix) -> np.ndarray:
    # Compute the gradient of the squared normalized error.
    # system_sizes is a global variable (see the explanation above).
    y = dtrain.get_label()
    return 2 * (predt - y) / system_sizes

def hessian_sne(predt: np.ndarray, dtrain: DMatrix) -> np.ndarray:
    # Compute the hessian for the squared error.
    y = dtrain.get_label()
    return 0 * y + 2

def custom_sne(y_pred, y_true):
    # Squared normalized error objective. A simplified version of MSNE used
    # as the objective function.
    grad = gradient_sne(y_pred, y_true)
    hess = hessian_sne(y_pred, y_true)
    return grad, hess

# Customized metric
def nrmse(predt: np.ndarray, dtrain: DMatrix):
    ''' Root mean squared normalized error metric. '''
    y = dtrain.get_label()
    predt[predt < 0] = 0  # all negative predictions are set to zero
    std_dev = np.std(y)
    elements = np.power(((y - predt) / std_dev), 2)
    return 'RMSNE', float(np.sqrt(np.sum(elements) / len(y)))
I use Python 3.7.5 and xgboost 1.0.2. I would appreciate your help very much.
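For what it's worth, here is a minimal sketch of how these two callables would be wired into the native training API in xgboost 1.0.x (X_train, y_train, X_valid, y_valid and system_sizes are assumed to exist):

import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)

params = {"max_depth": 2, "eta": 0.1}
bst = xgb.train(
    params,
    dtrain,
    num_boost_round=150,
    obj=custom_sne,    # custom objective: called as (preds, dtrain) -> grad, hess
    feval=nrmse,       # custom evaluation metric: (preds, dtrain) -> (name, value)
    evals=[(dtrain, 'train'), (dvalid, 'valid')],
)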
Given a custom loss function in keras:
def my_custom_loss_func(self, y_true, y_pred):
    # some code
Is it possible to get the values in y_pred to do some calculations for the loss function? I tried, but someone told me y_pred is just a placeholder and the actual value of y_pred cannot be extracted. You can just use Keras backend functions to process y_pred, but you cannot actually access the values in it, such as y_pred[1], or something like that.
What I want to do is something like: if "the top 10 values in y_pred are negative and the bottom 10 are negative", then "return a very high cost because I do not want the network to be optimized this way".
Yes, no worries. Here is some sample code.
def my_custom_loss_func(self, y_true, y_pred):
    position_vector_initial = x[0, 0:3]  # global variable
    position_vector_now = y_pred[0, 0:3]
    angle = angle_between(position_vector_initial, position_vector_now)
    if angle < 0:
        return high_loss
    else:
        return kb.mean(kb.sum(kb.square(y_true - y_pred)))
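One caveat: in graph mode, a plain Python if on a Keras tensor is evaluated once when the graph is built, not per batch. Below is a hedged sketch of the same idea using backend ops (K.switch) instead; position_vector_initial is assumed to be a constant or tensor of shape (3,) defined elsewhere, and the cosine of the angle stands in for the angle_between helper, so adapt the condition to your own definition of the angle:

from tensorflow.keras import backend as K

def my_custom_loss_func(y_true, y_pred):
    position_vector_now = y_pred[0, 0:3]
    # cosine of the angle between the two position vectors
    cos_angle = K.sum(position_vector_initial * position_vector_now) / (
        K.sqrt(K.sum(K.square(position_vector_initial))) *
        K.sqrt(K.sum(K.square(position_vector_now))) + K.epsilon())
    mse = K.mean(K.sum(K.square(y_true - y_pred)))
    # branch on the tensor condition inside the graph instead of a Python `if`;
    # here: penalize heavily when the vectors point in opposite directions (angle > 90 degrees)
    return K.switch(cos_angle < 0, K.constant(1e6), mse)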
I'm trying to use machine learning techniques to predict the time to an event. My predictions will be probability vectors v of length 20, with v[i] being the probability that the event occurs in i + 1 days (i ranges from 0 to 19).
How can I test the custom loss and metric functions I write?
I'd like to use the following loss and metric to train a model:
Here's how I tried to implement it:
import numpy as np
from keras import backend as K

def weighted_meansquare(y_true, y_pred):
    w = K.constant(np.array([i + 1 for i in range(20)]))
    return K.sum(K.square(w * y_pred - w * y_true))

def esperance_metric(y_true, y_pred):
    w = K.constant(np.array([i + 1 for i in range(20)]))
    return K.sum(w * y_true - w * y_true)
I expected the model to minimize the metric (which is basically an expectation, since my model returns a probability vector). Yet when I try to fit my model, I see that the metric is always 0.0000e+00.
What I'm looking for is:
some specific tips about how to code these functions
some general tips about testing keras.backend functions
You have a typo in your definition of esperance_metric: you use y_true - y_true instead of y_pred - y_true, which is why your metric is always 0.
I also see a mistake in weighted_meansquare. You should multiply by w after taking the square, as follows:
K.sum(w * K.square(y_pred - y_true))
In general, if you want to test backend functions you can try evaluating them with K.eval. For example:
y_pred = K.constant([1.] * 20)
y_true = K.constant([0.] * 20)
print(K.eval(esperance_metric(y_true, y_pred)))
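Putting both fixes together, here is a minimal sketch of the corrected functions plus a quick K.eval check:

import numpy as np
from keras import backend as K

def weighted_meansquare(y_true, y_pred):
    w = K.constant(np.array([i + 1 for i in range(20)], dtype="float32"))
    return K.sum(w * K.square(y_pred - y_true))

def esperance_metric(y_true, y_pred):
    w = K.constant(np.array([i + 1 for i in range(20)], dtype="float32"))
    return K.sum(w * y_pred - w * y_true)

y_pred = K.constant([1.] * 20)
y_true = K.constant([0.] * 20)
print(K.eval(esperance_metric(y_true, y_pred)))     # 210.0 (no longer identically zero)
print(K.eval(weighted_meansquare(y_true, y_pred)))  # 210.0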
Is there any way to calculate residual deviance of a scikit-learn logistic regression model? This is a standard output from R model summaries, but I couldn't find it any of sklearn's documentation.
As suggested by #russell-richie, it should be model.predict_proba
Don't forget the argument normalize=False in function metrics.log_loss() to return the sum of the per-sample losses.
So to complete #ingo's answer, to obtain the model deviance with sklearn.linear_model.LogisticRegression, you can compute:
def deviance(X, y, model):
    return 2 * metrics.log_loss(y, model.predict_proba(X), normalize=False)
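For example, a quick sketch of how this would be used (X_train, y_train and the fitted model are hypothetical placeholders):

from sklearn import metrics
from sklearn.linear_model import LogisticRegression

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # max_iter just to avoid convergence warnings
print(deviance(X_train, y_train, model))  # analogous to R's residual deviance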
Actually, you can. Deviance is closely related to cross entropy, which is in sklearn.metrics.log_loss. Deviance is just 2*(loglikelihood_of_saturated_model - loglikelihood_of_fitted_model). Scikit-learn can (without larger tweaks) only handle classification of individual instances, so the log-likelihood of the saturated model is going to be zero. Cross entropy, as returned by log_loss, is the negative log-likelihood. Thus, the deviance is simply:
def deviance(X, y, model):
    return 2 * metrics.log_loss(y, model.predict_log_proba(X))
I know this is a very late answer, but I hope it helps anyway.
You cannot do it in scikit-learn, but check out statsmodels' GLMResults (API).
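For reference, a minimal sketch of what that looks like (X and y are hypothetical: a feature matrix and a binary target):

import statsmodels.api as sm

X_const = sm.add_constant(X)  # statsmodels does not add an intercept by itself
glm = sm.GLM(y, X_const, family=sm.families.Binomial())
result = glm.fit()
print(result.deviance)       # residual deviance, as in R's summary()
print(result.null_deviance)  # null deviance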
Here is a Python implementation of explained_deviance that implements the discussion from this thread: GitHub code
import numpy as np
from scipy.special import softmax, expit
from sklearn.metrics import log_loss
from sklearn.dummy import DummyClassifier

# deviance function
def explained_deviance(y_true, y_pred_logits=None, y_pred_probas=None,
                       returnloglikes=False):
    """Computes an explained_deviance score, to be comparable to explained_variance"""
    assert y_pred_logits is not None or y_pred_probas is not None, (
        "Either the predicted probabilities (y_pred_probas) or the predicted logit "
        "values (y_pred_logits) should be provided. But neither of the two were provided.")

    if y_pred_logits is not None and y_pred_probas is None:
        # check if binary or multiclass classification
        if y_pred_logits.ndim == 1:
            y_pred_probas = expit(y_pred_logits)
        elif y_pred_logits.ndim == 2:
            y_pred_probas = softmax(y_pred_logits, axis=1)  # softmax per sample (row-wise)
        else:  # invalid
            raise ValueError(f"logits passed seem to have incorrect shape of {y_pred_logits.shape}")

    if y_pred_probas.ndim == 1:
        y_pred_probas = np.stack([1 - y_pred_probas, y_pred_probas], axis=-1)

    # compute a null model's predicted probability
    X_dummy = np.zeros(len(y_true))
    y_null_probas = DummyClassifier(strategy='prior').fit(X_dummy, y_true).predict_proba(X_dummy)
    # strategy : {"most_frequent", "prior", "stratified", "uniform", "constant"}

    # suggestion from https://stackoverflow.com/a/53215317
    llf = -log_loss(y_true, y_pred_probas, normalize=False)
    llnull = -log_loss(y_true, y_null_probas, normalize=False)

    ### McFadden’s pseudo-R-squared: 1 - (llf / llnull)
    explained_deviance = 1 - (llf / llnull)
    ## Cox & Snell’s pseudo-R-squared: 1 - exp((llnull - llf)*(2/nobs))
    # explained_deviance = 1 - np.exp((llnull - llf) * (2 / len(y_pred_probas)))  ## TODO, not implemented
    if returnloglikes:
        return explained_deviance, {'loglike_model': llf, 'loglike_null': llnull}
    else:
        return explained_deviance
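Example usage (a sketch; clf, X_test and y_test are hypothetical names for a fitted sklearn classifier and held-out data):

probas = clf.predict_proba(X_test)
print(explained_deviance(y_test, y_pred_probas=probas))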