AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_' - python

I'm trying to tune my model using the Grid search model in #kaggle notebook. In order to benefit from the GPU, I used this package hummingbird-ml. Thanks in advance
However, I get the following issue:
AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_'
Here is my code:
from hummingbird.ml import convert
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR
from sklearn.metrics import make_scorer, mean_squared_error
from pprint import pprint
# Hyper-tunning for SVM regressor
import numpy as np
base_svr = SVR()
scorer = make_scorer(mean_squared_error, greater_is_better=False)
param_grid_svr = {'C': [0.01, 0.1,1, 10, 100],
'gamma': [1,0.1, 0.01, 0.001, 0.0001],
'kernel': ['linear', 'poly', 'rbf', 'sigmoid', 'precomputed'],
'epsilon': [0.01, 0.1, 0.2 , 0.3, 1]}
pprint(param_grid_svr)
# Create a GridSearchCV object and fit it to the training data
svr_gs = GridSearchCV(base_svr,param_grid_svr, n_jobs = -1 , scoring=scorer, cv=3 ,refit=True,verbose=2)
# Converting scikit-learn model to PyTorch on CPU
svr_gs_pytorch = convert(svr_gs, 'torch')
# Switching PyTorch from CPU to GPU
%%capture
svr_gs_pytorch.to('cuda')
# Train the model in GPU
svr_gs_pytorch.fit(X_train,y_train)
# print best parameter after tuning
svr_gs_pytorch.best_params_

Related

Hyper-tuning of SVM regressor using Grid search and cuML

I tried tuning the SVM regressor parameters using the code below. However, during the search for the best params, the grid-search model tends to choose the first kernel of the model within the proposed kernels, every time. I tried different combinations to see if I could reach good results.
I attempted to use various cross-validations(3, 5-default, 7).
I used/removed the score function of the GridSearch model, to see if this impact.
I also tried with higher values of the penalty score (see the attached).
Please can anyone confirm this?
And is the kernels' order in the list/Grid params impact the search?
Note: I used the cuML to benefit from using the GPU and to speed up the search.
I can send you the dataset if you want
from cuml.svm import SVR
from sklearn.model_selection import GridSearchCV
#from sklearn.svm import SVR
from sklearn.metrics import make_scorer, mean_squared_error
from pprint import pprint
# Hyper-tunning for SVM regressor
import numpy as np
base_svr = SVR()
# If you want to use your custom score function, specify the function and use it.
def my_rmse_loss_func(y_true, y_pred):
return np.sqrt(mean_squared_error((y_true, y_pred)))
scorer = make_scorer(mean_squared_error, greater_is_better=False)#,, squared = False
param_grid_svr = {'C': [0.0001, 0.01, 0.1, 0.9,1, 1.1, 2,3], # I used 10, 100
'gamma': ['scale', 'auto'],
'kernel': ['rbf', 'poly', 'sigmoid', 'precomputed','linear'], #,
'epsilon': [0.01, 0.1, 0.2 ,0.22, 0.3, 1]}
# Create a GridSearchCV object and fit it to the training data with 7 cv
svr_cv = GridSearchCV(base_svr,param_grid_svr, n_jobs = -1 , # Use 4
scoring='neg_mean_squared_error', cv = 7 ,verbose=3 ,return_train_score =True ) # ,refit=True mean_squared_error, scoring='neg_mean_squared_error' ,
# Train the model in GPU
svr_cv.fit(X_train,y_train)
# Predictions
## By using the re-trained model
predictions_RF_regr_tuned= svr_cu.predict(X_test)

how to implement RFECV and gridsearchCV in svm

I try to implement gridsearchcv and then rfecv on svm
from sklearn.feature_selection import RFECV
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report
param_grid = {'C': [0.1, 1, 10, 100, 1000],
'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
'kernel': ['linear','rbf','sigmoid']}
estimator = SVC(probability = True, coef0 = 1.0)
clf = GridSearchCV (estimator, param_grid, cv=10, verbose=True)
clf.fit(data_train, label_train)
selector = RFECV(estimator, step = 1, cv= 10)
selector.fit(data_train, label_train)
label_predicted = selector.predict(data_test)
print(classification_report(label_test, label_predicted, digits=4))
it shows an error
ValueError: when importance_getter=='auto', the underlying estimator SVC should have coef_ or feature_importances_ attribute. Either pass a fitted estimator to feature selector or call fit before calling transform.
here is output image

Optimizing learning rate and number of estimators for multioutput gradient boosting

I have a dataset with multiple outputs and am trying to use gradient boosting to predict all the values at once. I imported MultiOutputRegressor so multiple outputs can be predicted at once; I'm able to make it work for the default gradient boosting function. However, I'm running into an error when I try to optimize the gradient boosting function for each output.
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn import ensemble
params = {'max_depth': 3, 'n_estimators': 100, 'learning_rate': 0.1}
gradient_regressor = MultiOutputRegressor(ensemble.GradientBoostingRegressor(**params))
GradBoostModel = gradient_regressor.fit(X_train, y_train)
prediction_GradBoost = GradBoostModel.predict(X_test)
LR = {'learning_rate':[0.15, 0.125, 0.1, 0.75, 0.05], 'n_estimators':[50, 75, 100, 150, 200, 250, 300, 400]}
tuning = GridSearchCV(estimator = GradBoostModel, param_grid = LR, scoring = 'r2')
tuning.fit(X_train, y_train)
tuning.best_params_, tuning.best_score_
I'm trying to use GridSearchCV to cycle through the listed learning rates and number of estimators to find the optimal values. But, I get the following error:
Invalid parameter learning_rate for estimator MultiOutputRegressor.
Check the list of available parameters with `estimator.get_params().keys()`
I think I understand the reason for the error: when I try to optimize the gradient boosting parameters, they are passed through the MultiOutputRegressor, which doesn't recognize them. Is this the case? Also, how can I change my code, such that I can optimize these parameters for each output?
Indeed the params are prefixed with estimator__, in general, to find out what params to use downstream in your pipeline use the .get_params().keys() method on your model, eg:
print(GradBoostModel.get_params().keys())
dict_keys(['estimator__alpha', 'estimator__ccp_alpha', 'estimator__criterion', 'estimator__init', 'estimator__learning_rate',...
Full working example with the linnerud dataset:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.datasets import load_linnerud
from sklearn.model_selection import train_test_split, GridSearchCV
import numpy as np
# Data
rng = np.random.RandomState(0)
X, y = load_linnerud(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rng)
# Model
params = {'max_depth': 3, 'n_estimators': 100, 'learning_rate': 0.1}
gradient_regressor = MultiOutputRegressor(GradientBoostingRegressor(**params))
GradBoostModel = gradient_regressor.fit(X_train, y_train)
prediction_GradBoost = GradBoostModel.predict(X_test)
LR = {'estimator__learning_rate': [0.15, 0.125, 0.1, 0.75, 0.05], 'estimator__n_estimators': [50, 75, 100, 150, 200, 250, 300, 400]}
print('Params from GradBoostModel', GradBoostModel.get_params().keys())
tuning = GridSearchCV(estimator=GradBoostModel, param_grid=LR, scoring='r2')
tuning.fit(X_train, y_train)

'GridSearchCV' object has no attribute 'best_params_' when using LogisticRegression

Below is the code that I am trying to execute
# Train a logistic regression model, report the coefficients and model performance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn import metrics
clf = LogisticRegression().fit(X_train, y_train)
params = {'penalty':['l1','l2'],'dual':[True,False],'C':[0.001, 0.01, 0.1, 1, 10, 100, 1000], 'fit_intercept':[True,False],
'solver':['saga']}
gridlog = GridSearchCV(clf, params, cv=5, n_jobs=2, scoring='roc_auc')
cv_scores = cross_val_score(gridlog, X_train, y_train)
#find best parameters
print('Logistic Regression parameters: ',gridlog.best_params_) # throws error
The last code line above is where the error is being thrown from. I have used this exact same code to run other models. Any idea why I may be facing this issue?
You need to fit gridlog first. cross_val_score will not do this, it returns the scores & nothing else.
Hence, as gridlog isn't trained, it throws error.
Below code works perfectly fine:
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
diabetes = datasets.load_breast_cancer()
x = diabetes.data[:150]
y = diabetes.target[:150]
clf = LogisticRegression().fit(x, y)
params = {'C':[0.001, 0.01, 0.1, 1, 10, 100, 1000]}
gridlog = GridSearchCV(clf, params, cv=2, n_jobs=2,
scoring='roc_auc')
gridlog.fit(x,y) # <- missing in your code
cv_scores = cross_val_score(gridlog, x, y)
print(cv_scores)
#find best parameters
print('Logistic Regression parameters: ',gridlog.best_params_)
# result:
Logistic regression parameters: {'C': 1}
Your code should be updated such that the LogisticRegression classifier is passed to the GridSearch (not its fit):
from sklearn.datasets import load_breast_cancer # For example only
X_train, y_train = load_breast_cancer(return_X_y=True)
params = {'penalty':['l1', 'l2'],'dual':[True, False],'C':[0.001, 0.01, 0.1, 1, 10, 100, 1000], 'fit_intercept':[True, False],
'solver':['saga']}
gridlog = GridSearchCV(LogisticRegression(), params, cv=5, n_jobs=2, scoring='roc_auc')
gridlog.fit(X_train, y_train)
#find best parameters
print('Logistic Regression parameters: ', gridlog.best_params_) # Now it displays all the parameters selected by the grid search
Results
Logistic Regression parameters: {'C': 0.1, 'dual': False, 'fit_intercept': True, 'penalty': 'l2', 'solver': 'saga'}
Note, as #desertnaut pointed out, you don't use cross_val_score for GridSearchCV.
See a complete example of how to use GridSearch here.
The example use a SVC classifier instead of a LogisticRegression, but the approach is the same.

GridSearchCV - FitFailedWarning: Estimator fit failed

I am running this:
# Hyperparameter tuning - Random Forest #
# Hyperparameters' grid
parameters = {'n_estimators': list(range(100, 250, 25)), 'criterion': ['gini', 'entropy'],
'max_depth': list(range(2, 11, 2)), 'max_features': [0.1, 0.2, 0.3, 0.4, 0.5],
'class_weight': [{0: 1, 1: i} for i in np.arange(1, 4, 0.2).tolist()], 'min_samples_split': list(range(2, 7))}
# Instantiate random forest
from sklearn.ensemble import RandomForestClassifier
classifier = RandomForestClassifier(random_state=0)
# Execute grid search and retrieve the best classifier
from sklearn.model_selection import GridSearchCV
classifiers_grid = GridSearchCV(estimator=classifier, param_grid=parameters, scoring='balanced_accuracy',
cv=5, refit=True, n_jobs=-1)
classifiers_grid.fit(X, y)
and I am receiving this warning:
.../anaconda/lib/python3.7/site-packages/sklearn/model_selection/_validation.py:536:
FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
TypeError: '<' not supported between instances of 'str' and 'int'
Why is this and how can I fix it?
I had similar issue of FitFailedWarning with different details, after many runs I found, the parameter value passing has the error, try
parameters = {'n_estimators': [100,125,150,175,200,225,250],
'criterion': ['gini', 'entropy'],
'max_depth': [2,4,6,8,10],
'max_features': [0.1, 0.2, 0.3, 0.4, 0.5],
'class_weight': [0.2,0.4,0.6,0.8,1.0],
'min_samples_split': [2,3,4,5,6,7]}
This will pass for sure, for me it happened in XGBClassifier, somehow the values datatype mixing up
One more is if the value exceeds the range, for example in XGBClassifier 'subsample' paramerters max value is 1.0, if it is set as 1.1, FitFailedWarning will occur
For me this was giving same error but after removing none from max_dept it is fitting properly.
param_grid={'n_estimators':[100,200,300,400,500],
'criterion':['gini', 'entropy'],
'max_depth':['None',5,10,20,30,40,50,60,70],
'min_samples_split':[5,10,20,25,30,40,50],
'max_features':[ 'sqrt', 'log2'],
'max_leaf_nodes':[5,10,20,25,30,40,50],
'min_samples_leaf':[1,100,200,300,400,500]
}
code which is running properly:
param_grid={'n_estimators':[100,200,300,400,500],
'criterion':['gini', 'entropy'],
'max_depth':[5,10,20,30,40,50,60,70],
'min_samples_split':[5,10,20,25,30,40,50],
'max_features':[ 'sqrt', 'log2'],
'max_leaf_nodes':[5,10,20,25,30,40,50],
'min_samples_leaf':[1,100,200,300,400,500]
}
I too got same error and when I passed hyperparameters as in MachineLearningMastery, I got output without warning...
Try this way if anyone get similar issues...
# grid search logistic regression model on the sonar dataset
from pandas import read_csv
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.model_selection import GridSearchCV
# load dataset
url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/sonar.csv'
dataframe = read_csv(url, header=None)
# split into input and output elements
data = dataframe.values
X, y = data[:, :-1], data[:, -1]
# define model
model = LogisticRegression()
# define evaluation
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=3, random_state=1)
# define search space
space = dict()
space['solver'] = ['newton-cg', 'lbfgs', 'liblinear']
space['penalty'] = ['none', 'l1', 'l2', 'elasticnet']
space['C'] = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 100]
# define search
search = GridSearchCV(model, space, scoring='accuracy', n_jobs=-1, cv=cv)
# execute search
result = search.fit(X, y)
# summarize result
print('Best Score: %s' % result.best_score_)
print('Best Hyperparameters: %s' % result.best_params_)
Make sure the y-variable is an int, not bool or str.
Change your last line of code to make the y series a 0 or 1, for example:
classifiers_grid.fit(X, list(map(int, y)))

Categories