I am having a hard time understanding why there seem to be two search space definitions in the same program flow. The tune.Tuner() object takes a param_space argument, where we can set up the hyperparameter space to search; however, it can also take a scheduler.
As an example, I have a HuggingFace transformer set up with a Population Based Training scheduler, which has its own hyperparam_mutations argument that looks like another hyperparameter space to search.
What is the interaction between these two spaces?
If I just want to perturb learning_rate to see its effect on my accuracy, would I put this into the tuner's param_space or into the scheduler's hyperparam_mutations?
import ray
import torch
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import PopulationBasedTraining
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)
device = "cuda" if torch.cuda.is_available() else "cpu"
num_tune_trials = 3
batch_size = 2
num_labels = 2
model_ckpt = 'imaginary_ckpt'
model_name = f"{model_ckpt}-finetuned"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
def model_init():
return AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=num_labels).to(device)
def training_args():
return TrainingArguments(output_dir=model_name,
num_train_epochs=4,
learning_rate=2e-5,
per_device_train_batch_size=batch_size,
per_device_eval_batch_size=batch_size,
weight_decay=0.01,
evaluation_strategy="epoch",
push_to_hub=False,
log_level="error")
def trainer_hyperparam():
    return Trainer(args=training_args(),
compute_metrics=compute_metrics,
train_dataset=data_encoded["train"],
eval_dataset=data_encoded["validation"],
model_init=model_init,
tokenizer=tokenizer)
trainer = trainer_hyperparam()
tune_config = {
"per_device_train_batch_size": batch_size,
"per_device_eval_batch_size": batch_size,
}
scheduler = PopulationBasedTraining(
time_attr="training_iteration",
metric="eval_accuracy",
mode="max",
perturbation_interval=1,
hyperparam_mutations={
"weight_decay": tune.uniform(0.005, 0.02),
"learning_rate": tune.uniform(1e-3, 1e-6),
"per_device_train_batch_size": [4,5,6,7,8,9],
},
)
reporter = CLIReporter(
parameter_columns={
"weight_decay": "w_decay",
"learning_rate": "lr",
"per_device_train_batch_size": "train_bs/gpu",
},
metric_columns=["eval_accuracy", "eval_loss", "epoch", "training_iteration"],
)
trainer.hyperparameter_search(
hp_space=lambda _: tune_config,
backend="ray",
n_trials=num_tune_trials,
resources_per_trial={"cpu": 4, "gpu": 1},
scheduler=scheduler,
keep_checkpoints_num=1,
checkpoint_score_attr="training_iteration",
stop=None,
progress_reporter=reporter,
local_dir="~/ray_results/",
name="tune_transformer_pbt",
)
This section in one of the PBT user guides touches on both questions.
In particular, the param_space is used to get the initial samples, and the hyperparam_mutations specifies the resample distributions (resampling being one of the possible mutation operations) and determines which parameters actually get mutated. If not specified in param_space, PBT samples from hyperparam_mutations initially.
If you only want learning rate to be mutated, then that's the only one that should be specified in hyperparam_mutations.
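A minimal sketch of how the two fit together with tune.Tuner (the train_fn trainable here is a toy stand-in for the HF Trainer integration above, the exact reporting call varies a bit between Ray versions, and a real PBT run also needs the trainable to save/load checkpoints):
from ray import tune
from ray.tune.schedulers import PopulationBasedTraining

def train_fn(config):
    # Toy trainable: just report a fake "eval_accuracy" so the sketch is self-contained.
    acc = 0.0
    for _ in range(5):
        acc += config["learning_rate"]
        tune.report({"eval_accuracy": acc})

pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    perturbation_interval=1,
    # Only learning_rate is listed here, so it is the only thing PBT perturbs/resamples.
    hyperparam_mutations={"learning_rate": tune.loguniform(1e-6, 1e-3)},
)

tuner = tune.Tuner(
    train_fn,
    # param_space seeds the initial population; fixed values stay fixed.
    param_space={
        "learning_rate": tune.loguniform(1e-6, 1e-3),   # initial samples
        "per_device_train_batch_size": 2,               # constant, never mutated
    },
    tune_config=tune.TuneConfig(
        metric="eval_accuracy",
        mode="max",
        scheduler=pbt,
        num_samples=3,
    ),
)
results = tuner.fit()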
Related
I have a function that loads a pre-trained model from Hugging Face, fine-tunes it for sentiment analysis, then calculates the F1 score and returns the result.
The problem is that when I call this function multiple times with the exact same arguments, it gives the exact same metric score, which is expected, except for the first time, which is different. How is that possible?
This is my function, written based on this tutorial from Hugging Face:
import uuid
import numpy as np
from datasets import (
load_dataset,
load_metric,
DatasetDict,
concatenate_datasets
)
from transformers import (
AutoTokenizer,
AutoModelForSequenceClassification,
DataCollatorWithPadding,
TrainingArguments,
Trainer,
)
CHECKPOINT = "distilbert-base-uncased"
SAVING_FOLDER = "sst2"
def custom_train(datasets, checkpoint=CHECKPOINT, saving_folder=SAVING_FOLDER):
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
def tokenize_function(example):
return tokenizer(example["sentence"], truncation=True)
tokenized_datasets = datasets.map(tokenize_function, batched=True)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
saving_folder = f"{SAVING_FOLDER}_{str(uuid.uuid1())}"
training_args = TrainingArguments(saving_folder)
trainer = Trainer(
model,
training_args,
train_dataset=tokenized_datasets["train"],
eval_dataset=tokenized_datasets["validation"],
data_collator=data_collator,
tokenizer=tokenizer,
)
trainer.train()
predictions = trainer.predict(tokenized_datasets["test"])
print(predictions.predictions.shape, predictions.label_ids.shape)
preds = np.argmax(predictions.predictions, axis=-1)
metric_fun = load_metric("f1")
metric_result = metric_fun.compute(predictions=preds, references=predictions.label_ids)
return metric_result
Then I run this function several times with the same datasets and append the returned F1 score each time:
raw_datasets = load_dataset("glue", "sst2")
small_datasets = DatasetDict({
"train": raw_datasets["train"].select(range(100)).flatten_indices(),
"validation": raw_datasets["validation"].select(range(100)).flatten_indices(),
"test": raw_datasets["validation"].select(range(100, 200)).flatten_indices(),
})
results = []
for i in range(4):
result = custom_train(small_datasets)
results.append(result)
And then when I check the results list:
[{'f1': 0.7755102040816325}, {'f1': 0.5797101449275361}, {'f1': 0.5797101449275361}, {'f1': 0.5797101449275361}]
Something that may come to mind is that when I load a pre-trained model, the head will be initialized with random weights, and that is why the results are different. If that is the case, why is only the first one different while the others are exactly the same?
Sylvain Gugger answered this question here: https://discuss.huggingface.co/t/multiple-training-will-give-exactly-the-same-result-except-for-the-first-time/8493
You need to set the seed before instantiating your model, otherwise the random head is not initialized the same way, that’s why the first run will always be different.
The subsequent runs are all the same because the seed has been set by the Trainer in the train method.
To set the seed:
from transformers import set_seed
set_seed(42)
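In the custom_train function from the question, the key point is that the seed is set before AutoModelForSequenceClassification.from_pretrained is called, e.g.:
from transformers import set_seed

def custom_train(datasets, checkpoint=CHECKPOINT, saving_folder=SAVING_FOLDER):
    set_seed(42)  # seed BEFORE loading the model, so the randomly initialized
                  # classification head is the same on every call
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    # ... rest of the function unchanged ...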
I would like to get the best model to use later in the notebook to predict using a different test batch.
Reproducible example (taken from the Optuna GitHub):
import lightgbm as lgb
import numpy as np
import sklearn.datasets
import sklearn.metrics
from sklearn.model_selection import train_test_split
import optuna
# FYI: Objective functions can take additional arguments
# (https://optuna.readthedocs.io/en/stable/faq.html#objective-func-additional-args).
def objective(trial):
data, target = sklearn.datasets.load_breast_cancer(return_X_y=True)
train_x, valid_x, train_y, valid_y = train_test_split(data, target, test_size=0.25)
dtrain = lgb.Dataset(train_x, label=train_y)
dvalid = lgb.Dataset(valid_x, label=valid_y)
param = {
"objective": "binary",
"metric": "auc",
"verbosity": -1,
"boosting_type": "gbdt",
"lambda_l1": trial.suggest_loguniform("lambda_l1", 1e-8, 10.0),
"lambda_l2": trial.suggest_loguniform("lambda_l2", 1e-8, 10.0),
"num_leaves": trial.suggest_int("num_leaves", 2, 256),
"feature_fraction": trial.suggest_uniform("feature_fraction", 0.4, 1.0),
"bagging_fraction": trial.suggest_uniform("bagging_fraction", 0.4, 1.0),
"bagging_freq": trial.suggest_int("bagging_freq", 1, 7),
"min_child_samples": trial.suggest_int("min_child_samples", 5, 100),
}
# Add a callback for pruning.
pruning_callback = optuna.integration.LightGBMPruningCallback(trial, "auc")
gbm = lgb.train(
param, dtrain, valid_sets=[dvalid], verbose_eval=False, callbacks=[pruning_callback]
)
preds = gbm.predict(valid_x)
pred_labels = np.rint(preds)
accuracy = sklearn.metrics.accuracy_score(valid_y, pred_labels)
return accuracy
My understanding is that the study below will tune for accuracy. I would like to somehow retrieve the best model from the study (not just the parameters) without saving it as a pickle; I just want to use the model somewhere else in my notebook.
if __name__ == "__main__":
study = optuna.create_study(
pruner=optuna.pruners.MedianPruner(n_warmup_steps=10), direction="maximize"
)
study.optimize(objective, n_trials=100)
print("Best trial:")
trial = study.best_trial
print(" Params: ")
for key, value in trial.params.items():
print(" {}: {}".format(key, value))
desired output would be
best_model = ~model from above~
new_target_pred = best_model.predict(new_data_test)
sklearn.metrics.accuracy_score(new_target_test, new_target_pred)
Short addition to Toshihiko Yanase's answer, because the condition study.best_trial == trial was never True for me. This was the case even when both (Frozen)Trial objects had the same content, so it is likely a bug in Optuna. Changing the condition to study.best_trial.number == trial.number solves the problem for me.
Also if you prefer to not use globals in Python, you can use the study and trial user attributes
def objective(trial):
gbm = ...
trial.set_user_attr(key="best_booster", value=gbm)
def callback(study, trial):
if study.best_trial.number == trial.number:
study.set_user_attr(key="best_booster", value=trial.user_attrs["best_booster"])
if __name__ == "__main__":
study = optuna.create_study(
pruner=optuna.pruners.MedianPruner(n_warmup_steps=10), direction="maximize"
)
study.optimize(objective, n_trials=100, callbacks=[callback])
best_model=study.user_attrs["best_booster"]
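With the booster in hand, the usage the question asked for works directly (new_data_test and new_target_test are placeholders for your own held-out batch; the rounding mirrors the objective above):
new_target_pred = np.rint(best_model.predict(new_data_test))
print(sklearn.metrics.accuracy_score(new_target_test, new_target_pred))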
I think you can use the callback argument of Study.optimize to save the best model. In the following code example, the callback checks whether a given trial corresponds to the best trial and saves the model as a global variable best_booster.
best_booster = None
gbm = None
def objective(trial):
global gbm
# ...
def callback(study, trial):
global best_booster
if study.best_trial == trial:
best_booster = gbm
if __name__ == "__main__":
study = optuna.create_study(
pruner=optuna.pruners.MedianPruner(n_warmup_steps=10), direction="maximize"
)
study.optimize(objective, n_trials=100, callbacks=[callback])
If you define your objective function as a class, you can remove the global variables. I created a notebook as a code example. Please take a look at it:
https://colab.research.google.com/drive/1ssjXp74bJ8bCAbvXFOC4EIycBto_ONp_?usp=sharing
I would like to somehow retrieve the best model from the study (not just the parameters) without saving it as a pickle
FYI, if you can pickle the boosters, I think you can make the code simple by following this FAQ.
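For reference, a minimal sketch of that FAQ pattern (the file names here are just illustrative): pickle the booster inside objective keyed by the trial number, then load the one matching study.best_trial.number after optimization.
import pickle

def objective(trial):
    # ... build dtrain/dvalid and train gbm exactly as in the question ...
    with open(f"booster_trial_{trial.number}.pkl", "wb") as fout:
        pickle.dump(gbm, fout)
    return accuracy

# after study.optimize(...):
with open(f"booster_trial_{study.best_trial.number}.pkl", "rb") as fin:
    best_booster = pickle.load(fin)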
I know this has already been answered, but there is a straightforward way of doing this with the Optuna LightGBM integration, LightGBMTuner, released in late 2020.
In short, you can do what you want, i.e. save the best booster, as follows:
import optuna.integration.lightgbm as lgb
dtrain = lgb.Dataset(X,Y,categorical_feature = 'auto')
params = {
"objective": "binary",
"metric": "auc",
"verbosity": -1,
"boosting_type": "gbdt",
}
tuner = lgb.LightGBMTuner(
params, dtrain, verbose_eval=100, early_stopping_rounds=1000,
model_dir= 'directory_to_save_boosters'
)
tuner.run()
Please note that the main thing here is to specify a model_dir directory to save the models in each of the iterations.
There is usually no need for a pruning callback as the optimization is done using a combination of Bayesian methods and expert heuristics and the search is usually over in around 60-64 iterations.
Then you can get the best model from the model directory you specified above using the single line
tuner.get_best_booster()
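For example (X_new is just a placeholder for whatever data you want to score later in the notebook):
best_booster = tuner.get_best_booster()
preds = best_booster.predict(X_new)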
I optimized my Keras model using hyperopt. Now how do I save the best optimized Keras model and its weights to disk?
My code:
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from sklearn.metrics import roc_auc_score
import sys
X = []
y = []
X_val = []
y_val = []
space = {'choice': hp.choice('num_layers',
[ {'layers':'two', },
{'layers':'three',
'units3': hp.uniform('units3', 64,1024),
'dropout3': hp.uniform('dropout3', .25,.75)}
]),
'units1': hp.choice('units1', [64,1024]),
'units2': hp.choice('units2', [64,1024]),
'dropout1': hp.uniform('dropout1', .25,.75),
'dropout2': hp.uniform('dropout2', .25,.75),
'batch_size' : hp.uniform('batch_size', 20,100),
'nb_epochs' : 100,
'optimizer': hp.choice('optimizer',['adadelta','adam','rmsprop']),
'activation': 'relu'
}
def f_nn(params):
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.optimizers import Adadelta, Adam, rmsprop
print ('Params testing: ', params)
model = Sequential()
model.add(Dense(output_dim=params['units1'], input_dim = X.shape[1]))
model.add(Activation(params['activation']))
model.add(Dropout(params['dropout1']))
model.add(Dense(output_dim=params['units2'], init = "glorot_uniform"))
model.add(Activation(params['activation']))
model.add(Dropout(params['dropout2']))
if params['choice']['layers']== 'three':
model.add(Dense(output_dim=params['choice']['units3'], init = "glorot_uniform"))
model.add(Activation(params['activation']))
model.add(Dropout(params['choice']['dropout3']))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=params['optimizer'])
model.fit(X, y, nb_epoch=params['nb_epochs'], batch_size=params['batch_size'], verbose = 0)
pred_auc =model.predict_proba(X_val, batch_size = 128, verbose = 0)
acc = roc_auc_score(y_val, pred_auc)
print('AUC:', acc)
sys.stdout.flush()
return {'loss': -acc, 'status': STATUS_OK}
trials = Trials()
best = fmin(f_nn, space, algo=tpe.suggest, max_evals=100, trials=trials)
print('best: ')
print(best)
The Trials object stores a lot of relevant information about each iteration of hyperopt. We can also ask it to store the trained model.
You have to make a few small changes in your code base to achieve this.
-- return {'loss': -acc, 'status': STATUS_OK}
++ return {'loss': -acc, 'status': STATUS_OK, 'Trained_Model': model}
Note: 'Trained_Model' is just a key; you can use any other string.
best = fmin(f_nn, space, algo=tpe.suggest, max_evals=100, trials=trials)
model = getBestModelfromTrials(trials)
Retrieve the trained model from the trials object:
import numpy as np
from hyperopt import STATUS_OK
def getBestModelfromTrials(trials):
valid_trial_list = [trial for trial in trials
if STATUS_OK == trial['result']['status']]
losses = [ float(trial['result']['loss']) for trial in valid_trial_list]
index_having_minimum_loss = np.argmin(losses)
best_trial_obj = valid_trial_list[index_having_minimum_loss]
return best_trial_obj['result']['Trained_Model']
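Since the question also asks about persisting the weights, once the best Keras model is retrieved this way it can be written to disk as usual (the filename is arbitrary):
best_model = getBestModelfromTrials(trials)
best_model.save("best_keras_model.h5")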
Note: I have used this approach in Scikit-Learn classes.
Make f_nn return the model.
def f_nn(params):
# ...
return {'loss': -acc, 'status': STATUS_OK, 'model': model}
The models will be available on the trials object under results. I put in some sample data, and print(trials.results) spits out:
[{'loss': 2.8245880603790283, 'status': 'ok', 'model': <keras.engine.training.Model object at 0x000001D725F62B38>}, {'loss': 2.4592788219451904, 'status': 'ok', 'model': <keras.engine.training.Model object at 0x000001D70BC3ABA8>}]
Use np.argmin to find the smallest loss, then save using model.save
best_model = trials.results[np.argmin([r['loss'] for r in trials.results])]['model']
best_model.save('best_model.h5')  # filename is up to you
(Side note, in C# this would be trials.results.min(r => r.loss).model... if there's a better way to do this in Python please let me know!)
You may wish to use attachments on the trial object if you're using MongoDB, as the model may be very large:
attachments - a dictionary of key-value pairs whose keys are short strings (like filenames) and whose values are potentially long strings (like file contents) that should not be loaded from a database every time we access the record. (Also, MongoDB limits the length of normal key-value pairs so once your value is in the megabytes, you may have to make it an attachment.) Source.
I don't know how to pass some variable to f_nn or another hyperopt target explicitly, but I've used two approaches to do the same task.
The first approach is a global variable (I don't like it, because it's unclear) and the second is to save the metric value to a file, then read it and compare it with the current metric. The latter approach seems better to me.
def f_nn(params):
...
# I omit a part of the code
pred_auc =model.predict_proba(X_val, batch_size = 128, verbose = 0)
acc = roc_auc_score(y_val, pred_auc)
try:
    with open("metric.txt") as f:
        best_acc = float(f.read().strip())  # best AUC seen so far
except FileNotFoundError:
    best_acc = -1.0  # no previous run yet, so always save this one
if acc > best_acc:
    model.save("model.hd5")  # save the best model to disk and overwrite the metric
    with open("metric.txt", "w") as f:
        f.write(str(acc))
print('AUC:', acc)
sys.stdout.flush()
return {'loss': -acc, 'status': STATUS_OK}
trials = Trials()
best = fmin(f_nn, space, algo=tpe.suggest, max_evals=100, trials=trials)
print('best: ')
print(best)
from keras.models import load_model
best_model = load_model("model.hd5")
This approach has several advantages: you can keep the metric and the model together, and even put them under some version control or data version control system, so you can restore the results of an experiment in the future.
Edit
It can cause unexpected behaviour if there's a metric file left over from a previous run and you don't delete it. So you may want to adapt the code: remove the metric file after the optimization, or use a timestamp etc. to distinguish your experiments' data.
It is easy to use a global variable to save the model, but I would recommend saving it as an attribute under the trials object for clarity. In my experience with hyperopt, unless you wrap ALL the remaining parameters (that are not tuned) into a dict to feed into the objective function (e.g. objective_fn = partial(objective_fn_withParams, otherParams=otherParams)), it is very difficult to avoid global variables.
Example provided below:
trials = Trials()
trials.mybest = None # initialize an attribute for saving model later
best = fmin(f_nn, space, algo=tpe.suggest, max_evals=100, trials=trials)
trials.mybest['model'].save("model.hd5")
## In your optimization objective function
def f_nn(params):
global trials
model = trainMyKerasModelWithParams(..., params)
...
pred_auc =model.predict_proba(X_val, batch_size = 128, verbose = 0)
acc = roc_auc_score(y_val, pred_auc)
loss = -acc
## Track only best model (for saving later)
if ((trials.mybest is None)
or (loss < trials.mybest['loss'])):
trials.mybest = {'loss': loss,'model': model}
...
##
I wish to use a lift metric, via lift_score(), as the evaluation metric in an xgboost tree model, so I set
.cv( ...,
feval = lift_score,
...,
)
but it shows the error:
TypeError: len() of unsized object
It might be because my dataset is of an int type, but the xgboost tree only accepts integer data. I am not sure how to fix this problem.
Below is my code:
import xgboost as xgb
from mlxtend.evaluate import lift_score
t_params = { 'objective': 'binary:logistic',
'eta': 0.1,
'subsample': 0.8,
'colsample_bytree': 0.8,
'max_depth': 4,
'min_child_weight': 6,
'seed': 0,
}
xgdmat = xgb.DMatrix( X_train, y_train ) # my data
cv_xgb = xgb.cv( params = t_params,
dtrain = xgdmat,
feval = lift_score,
maximize = True,
num_boost_round = 600,
nfold = 5,
early_stopping_rounds = 100
)
Why? Because the API expectations were not met:
While the feval parameter can be set freely, the xgboost.cv() method API has some expectations that ought to be fulfilled.
The simpler part:
# user defined evaluation function, return a pair metric_name, result
so your metric-evaluator function has to deliver a compatible result, best as:
return 'error', <_a_custom_LIFT_score_>
Not meeting this requirement (detected by testing whether len(...) was at least 2) is what actually triggered the TypeError exception shown above. So this part is solved.
Next, the harder part to meet (taken from the source code):
# NOTE: when you do customized loss function, the default prediction value is margin
# this may make builtin evaluation metric not function properly
# for example, we are doing logistic loss, the prediction is score before logistic transformation
# the builtin evaluation error assumes input is after logistic transformation
# Take this in mind when you use the customization, and maybe you need write customized evaluation function
def evalerror(preds, dtrain):
labels = dtrain.get_label()
# return a pair metric_name, result
# since preds are margin(before logistic transformation, cutoff at 0)
return 'error', float(sum(labels != (preds > 0.0))) / len(labels)
Call-interface matching issues:
def eval_LIFT( ModelPREDICTIONS, dtrain ):
# a thin wrapper to mediate conversion from
# Xgboost.cv() <feval>-FUN call-signature
# to a target lift_score() call-signature
return 'LIFT', lift_score( dtrain.get_label(),
ModelPREDICTIONS
)
because the mlxtend.evaluate.lift_score() call signature does not match the xgboost.cv()-native parameter ordering, but has its own:
def lift_score( y_target,
y_predicted,
binary = True,
positive_label = 1
):
"""
Lift measures the degree to which the predictions of a
classification model are better than randomly-generated predictions.
In terms of True Positives (TP), True Negatives (TN),
False Positives (FP), and False Negatives (FN), the lift score is
computed as:
[ TP/(TP+FN) ] / [ (TP+FP) / (TP+TN+FP+FN) ]
...
"""
...
The simple wrapper above will do the simpler part.
If the warnings from the xgboost source code above, about the prediction values being raw margins, also apply to your LIFT-metric case, the value-adaptation step will have to take place inside eval_LIFT(), before passing the correctly adapted values to mlxtend.evaluate.lift_score(), which expects un-biased values.
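Putting it together, the xgb.cv() call from the question then just passes the wrapper instead of lift_score directly, along the lines of:
cv_xgb = xgb.cv( params                = t_params,
                 dtrain                = xgdmat,
                 feval                 = eval_LIFT,  # the thin wrapper from above
                 maximize              = True,
                 num_boost_round       = 600,
                 nfold                 = 5,
                 early_stopping_rounds = 100
                 )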
I am using the dev version of the Python sklearn package, which has an NN implementation.
My task is to train 4 NNs with different input data and then average the predictions
X_median = preprocessing.scale(data_median)
X_min = preprocessing.scale(data_min)
X_max = preprocessing.scale(data_max)
X_mean = preprocessing.scale(data_mean)
I create the neural networks like this
NN1 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
NN2 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
NN3 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
NN4 = MLPClassifier(hidden_layer_sizes = (50), activation = 'logistic', algorithm='adam', alpha= 0 , max_iter = 40, batch_size = 10, learning_rate = 'adaptive', shuffle = True, random_state=1)
(a standard sklearn class)
and I want to train them on the datasets described above.
Without using a pool, my code would look like this:
NN1.fit(X_mean,train_y)
NN2.fit(X_median,train_y)
NN3.fit(X_min,train_y)
NN4.fit(X_max,train_y)
Of course, since all 4 training runs are independent, I want to run them in parallel, and I assume I should use a Pool for this. However, I do not completely understand how the computation is performed. I would expect to write something like this:
pool = Pool()
pool.apply_async(NN1.fit, args = (X_mean, train_y))
However, this does not produce any results; I can even write it like this (passing only one argument) and the program will finish without any errors!
pool.apply_async(NN1.fit, args = (X_mean,)).
What is the correct way to perform such computations?
Can someone recommend a good resource for understanding the usage of Python multiprocessing?
Finally I made it work!
I based my solution on this answer. So, first create two helper functions:
1)
def Myfunc(MyNN, X, train_y):
    MyNN.fit(X, train_y)
    return MyNN
This one is just to make the desired function global so it can be fed to the pool methods.
2)
def test_star(a_b):
return Myfunc(*a_b)
This is the key part: a helper function that takes one argument (a tuple) and unpacks it into the number of arguments Myfunc needs.
Then just create
mylist = [(NN_mean,X_mean, train_y), (NN_median,X_median, train_y)]
and execute
NN_mean, NN_median = pool.map(test_star, mylist)
From my point of view this solution is super ugly, but it works. I hope someone can create a more elegant one and post it :)
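For completeness: one reason the original apply_async() attempt seemed to do nothing is that the returned AsyncResult was never collected with .get(), so any exception in the worker was silently swallowed, and the fitting happened on a copy of the estimator in the child process anyway. On Python 3 a somewhat cleaner variant of the same idea uses Pool.starmap, which unpacks the argument tuples itself (a sketch, assuming NN1..NN4 and the scaled arrays from the question are already defined):
from multiprocessing import Pool

def fit_one(nn, X, y):
    # fit in the worker process and send the fitted estimator back to the parent
    nn.fit(X, y)
    return nn

if __name__ == "__main__":
    jobs = [(NN1, X_mean,   train_y),
            (NN2, X_median, train_y),
            (NN3, X_min,    train_y),
            (NN4, X_max,    train_y)]
    with Pool(processes=4) as pool:
        NN1, NN2, NN3, NN4 = pool.starmap(fit_one, jobs)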