My model performs a multi-class (3) classification task.
I would like to change the way model "fits". Instead of calculation of a metric such as acc or logloss - I would like to run a simulation on whole data set to see how the model performs after each fit, in real-time.
Please note that simulation != loss/error. Simulation takes into the consideration time component of the data, the sequence in which events occur. Whereas the loss function simply calculates the error based on true values.
Currently I do the simulation after the whole "fitting" process has been done:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
all_ds = lgb.Dataset(X, label=y)
train_ds = lgb.Dataset(X_train, label=y_train)
test_ds = lgb.Dataset(X_test, label=y_test)
params = {
'device_type': "gpu",
'objective': 'multiclass',
'metric': 'multi_logloss',
"boosting_type": "gbdt",
"num_class": 3,
'random_state': 123
}
# fit
model = lgb.train(
params,
train_ds,
num_boost_round=20
valid_sets=[test_ds]
)
# make prediction on a whole data set
y_pred= model.predict(all_ds)
# simulate
simulation_result = simulate(X, y_pred) # float value
current process is:
fit step 1 - error x
fit step 2 - error y
..
fit step 20 - error z
simulate - see how the model performs
I would like to change the process to
fit step 1 - simulate - use result of simulation as an error
fit step 2 - simulate - use result of simulation as an error
..
fit step 20 - simulate - use result of simulation as an error
Is there a way to achieve it through a custom callback or a custom evaluation metric or some other way?
I tried creating a custom eval metric, unfortunately I cannot invoke predict() from within the function. Moreover I find the preds parameter value to be something I cannot simply use without transformations of some sort.. It contains some sort of multidimensional array that I have no idea how to convert to actual predictions.
def customEvalMetric(preds, eval_data):
# how to invoke predict() method on a whole dataset here?
# OR how to convert preds to one-hot encoded values?
# simulation_result = simulate(all_ds, ..?..)
return 'simulation_result', simulation_result, True
and using as
model = lgb.train(
params,
train_ds,
num_boost_round=20
valid_sets=[all_ds],
feval=customEvalMetric,
)
p.s. now that I think about it - I could in theory fit once in a loop, then use init_model to load the existing model weights.. Is this the only way?
I suppose this question is applicable to other tree boosting libraries since the API are similar (xgboost for example)
The custom eval function should work. As per the docs, preds is:
The predicted values. Predicted values are returned before any transformation, e.g. they are raw margin instead of probability of positive class for binary task.
So if this is a classification problem, you might need to apply the softmax transformation to each row. For a regression problem, you should be able to use this output as-is.
Related
I am trying to use keras to fit a CNN model to classify 2 classes of data . I have imbalanced dataset I want to balance the data. I don't know can I use class_weight in model.fit_generator . I wonder if I used class_weight="balanced" in model.fit_generator
The main code:
def generate_arrays_for_training(indexPat, paths, start=0, end=100):
while True:
from_=int(len(paths)/100*start)
to_=int(len(paths)/100*end)
for i in range(from_, int(to_)):
f=paths[i]
x = np.load(PathSpectogramFolder+f)
x = np.expand_dims(x, axis=0)
if('P' in f):
y = np.repeat([[0,1]],x.shape[0], axis=0)
else:
y =np.repeat([[1,0]],x.shape[0], axis=0)
yield(x,y)
history=model.fit_generator(generate_arrays_for_training(indexPat, filesPath, end=75),
validation_data=generate_arrays_for_training(indexPat, filesPath, start=75),
steps_per_epoch=int((len(filesPath)-int(len(filesPath)/100*25))),
validation_steps=int((len(filesPath)-int(len(filesPath)/100*75))),
verbose=2,
epochs=15, max_queue_size=2, shuffle=True, callbacks=[callback])
If you don't want to change your data creation process, you can use class_weight in your fit generator. You can use dictionary to set your class_weight and observe with fine tuning. For instance when class_weight is not used, and you have 50 examples for class0 and 100 examples for class1. Then, loss function calculate loss uniformly. It means that class1 will be a problem. But, when you set:
class_weight = {0:2 , 1:1}
It means that loss function will give 2 times weight to your class 0 now. Therefore, misclassification of underrepresented data will take 2 times more punishment than before. Thus, model can handle imbalanced data.
If you use class_weight='balanced' model can make that setting automatically. But my suggestion is that, create a dictionary like class_weight = {0:a1 , 1:a2} and try different values for a1 and a2, so you can understand difference.
Also, you can use undersampling methods for imbalanced data instead of using class_weight. Check Bootstrapping methods for that purpose.
I have created a custom error metric which prints as I run the XGBoost xgb.train but does not actually have any affect on the output. From what I can tell it is simply printing the custom error metric for the round but not using that to determine the accuracy.
I think this because the prediction outputs is exactly the same to when I use the default error metric. I have also tried hard coding the error output to a static 1 so that the output should be random but the result was exactly the same.
Do I need to create a custome objective function for the custom error metric to work?
Thanks!
My code:
# xgboost fitting with arbitrary parameters
xgb_params_1 = list(
objective = "reg:linear",
eta = 0.2,
max.depth = 6,
booster = "gbtree"
)
evalerror <- function(preds, dtrain) {
labels <- getinfo(dtrain, "label")
score <- as.numeric((sum(preds[1:1000]) - sum(labels[1:1000] )) / sum(labels[1:1000]) )
return(list(metric="custom_error",value=1))
}
myWatch <- list(val=dvalid,train=dtrain)
# fit the model with the arbitrary parameters specified above
xgb_1 = xgb.train(data = dtrain,
params = xgb_params_1,
nrounds = 150,
nthread = 6,
verbose = T,
print_every_n = 50,
watchlist = myWatch,
early_stop_round = 1000,
eval_metric = evalerror,
disable_default_eval_metric = 1
)
# Perform a prediction
pred <- predict(xgb_1, dvalid)
results <- cbind(as.data.table(pred), as.data.table(data[year > trainEndDate,"total_installs"]))
#Compute test RMSE
sqrt(mean((results$pred - results$total_installs)**2))
Printed error metrics:
Custom eval_metric is just for evaluation purposes. It is displayed at every round (when using watches) and it is useful to tune number of boosting rounds, and you can use it when you do cross-validation to tune your parameters to maximise/minimise your metric. I use it in particular to tune my learning rate to make the model converge faster with less rounds.
Custom objective function is a completely different beast and it is not the same as evaluation metric. It is more of a type of model like classification, regression etc. It drives the convergence of the model. If you still want it here is an example of xgboost regression objective.
I'm working on a Kaggle competition (https://www.kaggle.com/c/house-prices-advanced-regression-techniques#evaluation) and it states that my model will be evaluated by:
Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)
I couldn't find this in the docs (it's basically RMSE(log(truth), log(prediction)), so I went about writing a custom scorer:
def custom_loss(truth, preds):
truth_logs = np.log(truth)
print(truth_logs)
preds_logs = np.log(preds)
numerator = np.sum(np.square(truth_logs - preds_logs))
return np.sum(np.sqrt(numerator / len(truth)))
custom_scorer = make_scorer(custom_loss, greater_is_better=False)
Two questions:
1) Should my custom loss function return a numpy array of scores (one for each (truth, prediction) pair? Or should it be the total loss over those (truth, prediction) pairs, returning a single number?
I looked into the docs but they weren't super helpful re: what my custom loss function should return.
2) When I run:
xgb_model = xgb.XGBRegressor()
params = {"max_depth": [3, 4], "learning_rate": [0.05],
"n_estimators": [1000, 2000], "n_jobs": [8], "subsample": [0.8], "random_state": [42]}
grid_search_cv = GridSearchCV(xgb_model, params, scoring=custom_scorer,
n_jobs=8, cv=KFold(n_splits=10, shuffle=True, random_state=42), verbose=2)
grid_search_cv.fit(X, y)
grid_search_cv.best_score_
I get back:
-0.12137097567803554
which is very surprising. Given that my loss function is taking RMSE(log(truth) - log(prediction)), I shouldn't be able to have a negative best_score_.
Any idea why it's negative?
Thanks!
1) You should return a single number as loss, not array. GridSearchCV will sort the params accroding to the results of this scorer.
By the way instead of defining a custom metric, you can use mean_squared_log_error, which does what you want.
2) Why does it return negative? - Without your actual data and complete code we cant say.
You should be careful with the notation.
There are 2 levels of optimization here:
The loss function optimized when the XGBRegressor is fitted to the data.
The scoring function that is optimized during the grid search.
I prefer calling the second scoring function instead of loss function, since loss function usually refers to a term that is subject to optimization during the model fitting process itself.
However, your custom function only specifies 2. whilst leaving 1. untouched. In case you want to change the loss function of XGBRegressor see here. Most regression models have several criteria from which you can choose such as mean_square_error or mean_absolute_error.
Note, that passing customized loss functions is not supported at the moment (see reasons here and here).
The make_scorer function sign flips if greater_is_better is False
I have built a TensorFlow model that uses a DNNClassifier to classify input into two categories.
My problem is that Outcome 1 occurs upwards of 90-95% of the time. Therefore, TensorFlow is giving me the same probabilities for all of my predictions.
I am trying to predict the other outcome (e.g. having a false positive for Outcome 2 is preferable to missing a possible occurrence of Outcome 2). I know that in machine learning in general, in this case it would be worthwhile to try to upweight Outcome 2.
However, I don't know how to do this in TensorFlow. The documentation alludes to it being possible, but I can't find any examples of what it would actually look like. Has anyone has successfully done this, or does anyone know where I could find some example code or a thorough explanation (I'm using Python)?
Note: I have seen exposed weights being manipulated when someone is using the more fundamental parts of TensorFlow and not an estimator. For maintenance reasons, I need to do this using an estimator.
tf.estimator.DNNClassifier constructor has weight_column argument:
weight_column: A string or a _NumericColumn created by
tf.feature_column.numeric_column defining feature column representing
weights. It is used to down weight or boost examples during training.
It will be multiplied by the loss of the example. If it is a string,
it is used as a key to fetch weight tensor from the features. If it is
a _NumericColumn, raw tensor is fetched by key weight_column.key, then
weight_column.normalizer_fn is applied on it to get weight tensor.
So just add a new column and fill it with some weight for the rare class:
weight = tf.feature_column.numeric_column('weight')
...
tf.estimator.DNNClassifier(..., weight_column=weight)
[Update] Here's a complete working example:
import numpy as np
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('mnist', one_hot=False)
train_x, train_y = mnist.train.next_batch(1024)
test_x, test_y = mnist.test.images, mnist.test.labels
x_column = tf.feature_column.numeric_column('x', shape=[784])
weight_column = tf.feature_column.numeric_column('weight')
classifier = tf.estimator.DNNClassifier(feature_columns=[x_column],
hidden_units=[100, 100],
weight_column=weight_column,
n_classes=10)
# Training
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'x': train_x, 'weight': np.ones(train_x.shape[0])},
y=train_y.astype(np.int32),
num_epochs=None, shuffle=True)
classifier.train(input_fn=train_input_fn, steps=1000)
# Testing
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'x': test_x, 'weight': np.ones(test_x.shape[0])},
y=test_y.astype(np.int32),
num_epochs=1, shuffle=False)
acc = classifier.evaluate(input_fn=test_input_fn)
print('Test Accuracy: %.3f' % acc['accuracy'])
Context:
I am using Passive Aggressor from scikit library and confused whether to use warm start or partial fit.
Efforts hitherto:
Referred this thread discussion:
https://github.com/scikit-learn/scikit-learn/issues/1585
Gone through the scikit code for _fit and _partial_fit.
My observations:
_fit in turn calls _partial_fit.
When warm_start is set, _fit calls _partial_fit with self.coef_
When _partial_fit is called without coef_init parameter and self.coef_ is set, it continues to use self.coef_
Question:
I feel both are ultimately providing the same functionalities.Then, what is the basic difference between them? In which contexts, either of them are used?
Am I missing something evident? Any help is appreciated!
I don't know about the Passive Aggressor, but at least when using the SGDRegressor, partial_fit will only fit for 1 epoch, whereas fit will fit for multiple epochs (until the loss converges or max_iter
is reached). Therefore, when fitting new data to your model, partial_fit will only correct the model one step towards the new data, but with fit and warm_start it will act as if you would combine your old data and your new data together and fit the model once until convergence.
Example:
from sklearn.linear_model import SGDRegressor
import numpy as np
np.random.seed(0)
X = np.linspace(-1, 1, num=50).reshape(-1, 1)
Y = (X * 1.5 + 2).reshape(50,)
modelFit = SGDRegressor(learning_rate="adaptive", eta0=0.01, random_state=0, verbose=1,
shuffle=True, max_iter=2000, tol=1e-3, warm_start=True)
modelPartialFit = SGDRegressor(learning_rate="adaptive", eta0=0.01, random_state=0, verbose=1,
shuffle=True, max_iter=2000, tol=1e-3, warm_start=False)
# first fit some data
modelFit.fit(X, Y)
modelPartialFit.fit(X, Y)
# for both: Convergence after 50 epochs, Norm: 1.46, NNZs: 1, Bias: 2.000027, T: 2500, Avg. loss: 0.000237
print(modelFit.coef_, modelPartialFit.coef_) # for both: [1.46303288]
# now fit new data (zeros)
newX = X
newY = 0 * Y
# fits only for 1 epoch, Norm: 1.23, NNZs: 1, Bias: 1.208630, T: 50, Avg. loss: 1.595492:
modelPartialFit.partial_fit(newX, newY)
# Convergence after 49 epochs, Norm: 0.04, NNZs: 1, Bias: 0.000077, T: 2450, Avg. loss: 0.000313:
modelFit.fit(newX, newY)
print(modelFit.coef_, modelPartialFit.coef_) # [0.04245779] vs. [1.22919864]
newX = np.reshape([2], (-1, 1))
print(modelFit.predict(newX), modelPartialFit.predict(newX)) # [0.08499296] vs. [3.66702685]
If warm_start = False, each subsequent call to .fit() (after an initial call to .fit() or partial_fit()) will reset the model's trainable parameters for the initialisation. If warm_start = True, each subsequent call to .fit() (after an initial call to .fit() or partial_fit()) will retain the values of the model's trainable parameters from the previous run, and use those initially.
Regardless of the value of warm_start, each call to partial_fit() will retain the previous run's model parameters and use those initially.
Example using MLPRegressor:
import sklearn.neural_network
import numpy as np
np.random.seed(0)
x = np.linspace(-1, 1, num=50).reshape(-1, 1)
y = (x * 1.5 + 2).reshape(50,)
cold_model = sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(), warm_start=False, max_iter=1)
warm_model = sklearn.neural_network.MLPRegressor(hidden_layer_sizes=(), warm_start=True, max_iter=1)
cold_model.fit(x,y)
print cold_model.coefs_, cold_model.intercepts_
#[array([[0.17009494]])] [array([0.74643783])]
cold_model.fit(x,y)
print cold_model.coefs_, cold_model.intercepts_
#[array([[-0.60819342]])] [array([-1.21256186])]
#after second run of .fit(), values are completely different
#because they were re-initialised before doing the second run for the cold model
warm_model.fit(x,y)
print warm_model.coefs_, warm_model.intercepts_
#[array([[-1.39815616]])] [array([1.651504])]
warm_model.fit(x,y)
print warm_model.coefs_, warm_model.intercepts_
#[array([[-1.39715616]])] [array([1.652504])]
#this time with the warm model, params change relatively little, as params were
#not re-initialised during second call to .fit()
cold_model.partial_fit(x,y)
print cold_model.coefs_, cold_model.intercepts_
#[array([[-0.60719343]])] [array([-1.21156187])]
cold_model.partial_fit(x,y)
print cold_model.coefs_, cold_model.intercepts_
#[array([[-0.60619347]])] [array([-1.21056189])]
#with partial_fit(), params barely change even for cold model,
#as no re-initialisation occurs
warm_model.partial_fit(x,y)
print warm_model.coefs_, warm_model.intercepts_
#[array([[-1.39615617]])] [array([1.65350392])]
warm_model.partial_fit(x,y)
print warm_model.coefs_, warm_model.intercepts_
#[array([[-1.39515619]])] [array([1.65450372])]
#and of course the same goes for the warm model
First, let us look at the difference between .fit() and .partial_fit().
.fit() would let you train from the scratch. Hence, you could think of this as a option that can be used only once for a model. If you call .fit() again with a new set of data, the model would be build on the new data and will have no influence of previous dataset.
.partial_fit() would let you update the model with incremental data. Hence, this option can be used more than once for a model. This could be useful, when the whole dataset cannot be loaded into the memory, refer here.
If both .fit() or .partial_fit() are going to be used once, then it makes no difference.
warm_start can be only used in .fit(), it would let you start the learning from the co-eff of previous fit(). Now it might sound similar to the purpose to partial_fit(), but recommended way would be partial_fit(). May be do the partial_fit() with same incremental data few number of times, to improve the learning.
About difference. Warm start it just an attribute of class. Partial fit it is method of this class. It's basically different things.
About same functionalities. Yes, partial fit will use self.coef_ because it still needed to get some values to update on training period. And for empty coef_init we just put zero values to self.coef_ and go to the next step of training.
Description.
For first start:
Whatever how (with or without warm start). We will train on zero coefficients but in result we will save average of our coefficients.
N+1 start:
With warm start. We will check via method _allocate_parameter_mem our previous coefficients and take it to train. In result save our average coefficients.
Without warm start. We will put zero coefficients (as first start) and go to training step. In result we will still write average coefficients to memory.