GridSearch for an estimator inside a OneVsRestClassifier - python

I want to perform GridSearchCV on an SVC model, but one that uses the one-vs-all strategy. For the latter part, I can just do this:
model_to_set = OneVsRestClassifier(SVC(kernel="poly"))
My problem is with the parameters. Let's say I want to try the following values:
parameters = {"C":[1,2,4,8], "kernel":["poly","rbf"],"degree":[1,2,3,4]}
In order to perform GridSearchCV, I should do something like:
cv_generator = StratifiedKFold(y, k=10)
model_tunning = GridSearchCV(model_to_set, param_grid=parameters, score_func=f1_score, n_jobs=1, cv=cv_generator)
However, when I execute it I get:
Traceback (most recent call last):
File "/.../main.py", line 66, in <module>
argclass_sys.set_model_parameters(model_name="SVC", verbose=3, file_path=PATH_ROOT_MODELS)
File "/.../base.py", line 187, in set_model_parameters
model_tunning.fit(self.feature_encoder.transform(self.train_feats), self.label_encoder.transform(self.train_labels))
File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 354, in fit
return self._fit(X, y)
File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 392, in _fit
for clf_params in grid for train, test in cv)
File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 473, in __call__
self.dispatch(function, args, kwargs)
File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 296, in dispatch
job = ImmediateApply(func, args, kwargs)
File "/usr/local/lib/python2.7/dist-packages/sklearn/externals/joblib/parallel.py", line 124, in __init__
self.results = func(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/sklearn/grid_search.py", line 85, in fit_grid_point
clf.set_params(**clf_params)
File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 241, in set_params
% (key, self.__class__.__name__))
ValueError: Invalid parameter kernel for estimator OneVsRestClassifier
Basically, since the SVC is inside a OneVsRestClassifier and that's the estimator I send to the GridSearchCV, the SVC's parameters can't be accessed.
In order to accomplish what I want, I see two solutions:
When creating the SVC, somehow tell it not to use the one-vs-one strategy but the one-vs-all.
Somehow indicate the GridSearchCV that the parameters correspond to the estimator inside the OneVsRestClassifier.
I have yet to find a way to do either of these alternatives. Do you know if there's a way to do one of them? Or maybe you could suggest another way to get the same result?
Thanks!

When you use nested estimators with grid search you can scope the parameters with __ as a separator. In this case the SVC model is stored as an attribute named estimator inside the OneVsRestClassifier model:
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import f1_score
iris = load_iris()
model_to_set = OneVsRestClassifier(SVC(kernel="poly"))
parameters = {
    "estimator__C": [1, 2, 4, 8],
    "estimator__kernel": ["poly", "rbf"],
    "estimator__degree": [1, 2, 3, 4],
}
model_tunning = GridSearchCV(model_to_set, param_grid=parameters,
                             score_func=f1_score)
model_tunning.fit(iris.data, iris.target)
print model_tunning.best_score_
print model_tunning.best_params_
That yields:
0.973290762737
{'estimator__kernel': 'poly', 'estimator__C': 1, 'estimator__degree': 2}
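If you also want the tuned SVC back afterwards, it is available on the refitted wrapper; a short sketch (best_estimator_ is the refitted OneVsRestClassifier):
best_ovr = model_tunning.best_estimator_       # refitted OneVsRestClassifier
print best_ovr.estimator.get_params()          # the inner SVC template carrying the winning C/kernel/degree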

For Python 3 and newer scikit-learn versions (GridSearchCV now lives in sklearn.model_selection and takes scoring instead of score_func), the following code should be used:
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score
iris = load_iris()
model_to_set = OneVsRestClassifier(SVC(kernel="poly"))
parameters = {
    "estimator__C": [1, 2, 4, 8],
    "estimator__kernel": ["poly", "rbf"],
    "estimator__degree": [1, 2, 3, 4],
}
model_tunning = GridSearchCV(model_to_set, param_grid=parameters,
                             scoring='f1_weighted')
model_tunning.fit(iris.data, iris.target)
print(model_tunning.best_score_)
print(model_tunning.best_params_)

The same estimator__ prefix works for any estimator wrapped in a OneVsRestClassifier, for example an SGDClassifier:
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import GridSearchCV
param_grid = {"estimator__alpha": [10**-5, 10**-3, 10**-1, 10**1, 10**2]}
clf = OneVsRestClassifier(SGDClassifier(loss='log', penalty='l1'))
model = GridSearchCV(clf, param_grid, scoring='f1_micro', cv=2, n_jobs=-1)
model.fit(x_train_multilabel, y_train)  # x_train_multilabel / y_train: your own multilabel training data

Related

sklearn VotingClassifier with RandomizedSearchCV gives pickle error

I'm trying to get randomized hyperparameter search to work with the voting classifier from sklearn by adapting the example given in the sklearn documentation.
I've seen this minimal working example, but it breaks in many ways using my version of sklearn.
Here is a stripped-down example:
import numpy as np
from sklearn import __version__ as skv
from sklearn.ensemble import RandomForestClassifier as RFClassi
from sklearn.ensemble import HistGradientBoostingClassifier as HGBClassi
from sklearn.tree import DecisionTreeClassifier as DTClassi
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import VotingClassifier
from sklearn.datasets import load_iris
print(f"sklearn version: {skv}")
df_X, target = load_iris(return_X_y=True, as_frame=True)
ensemble = ['rf','dtree','hgb']
hy_pa_grid = {
    'hgb': dict(learning_rate=list(np.linspace(0.01, 0.5, 10).round(3))),
    'rf': dict(criterion=['gini', 'entropy']),
    'dtree': dict(criterion=['gini', 'entropy']),
}
clfs = {'hgb': HGBClassi(), 'rf': RFClassi(), 'dtree': DTClassi()}
vc = VotingClassifier(estimators=clfs.items(), voting='soft')
params = {
    f"{c}__{p}": hy_pa_grid[c][p]
    for c in ensemble
    for p in hy_pa_grid[c].keys()
}
print("\n".join(map(str,params.items())))
clf = RandomizedSearchCV(estimator = vc, param_distributions = params)
clf.fit(df_X,target)
The output I get is this:
sklearn version: 1.1.3
{'rf__criterion': ['gini', 'entropy'], 'dtree__criterion': ['gini', 'entropy'], 'hgb__learning_rate': [0.01, 0.064, 0.119, 0.173, 0.228, 0.282, 0.337, 0.391, 0.446, 0.5]}
Traceback (most recent call last):
File "vc.py", line 34, in <module>
clf.fit(df_X,target)
File "/home/USER/.local/lib/python3.8/site-packages/sklearn/model_selection/_search.py", line 789, in fit
base_estimator = clone(self.estimator)
File "/home/USER/.local/lib/python3.8/site-packages/sklearn/base.py", line 87, in clone
new_object_params[name] = clone(param, safe=False)
File "/home/USER/.local/lib/python3.8/site-packages/sklearn/base.py", line 68, in clone
return copy.deepcopy(estimator)
File "/usr/lib/python3.8/copy.py", line 161, in deepcopy
rv = reductor(4)
TypeError: cannot pickle 'dict_items' object
Any ideas for getting round this? I also tried doing it with GridSearchCV, as in the example, but I get the same error.
Oops, it turns out the problem was in
estimators = clfs.items()
All was well once I wrapped it in tuple() so it is an actual tuple of (name, estimator) pairs rather than a dict_items view, which cannot be deep-copied when the search clones the estimator.
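For reference, a minimal sketch of the fix (wrapping the view; a plain list works just as well):
clfs = {'hgb': HGBClassi(), 'rf': RFClassi(), 'dtree': DTClassi()}
vc = VotingClassifier(estimators=tuple(clfs.items()), voting='soft')  # or list(clfs.items())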

Userwarning with GridSearchCV scoring and refit params

I am using GridSearchCV for tuning my MNB model. However, I keep getting a UserWarning when setting the GridSearchCV scoring and refit params. I've read the docs and other related StackOverflow questions over and over and followed the answers, but I'm still getting errors. It works when using only one metric, though. I am having a hard time understanding what's wrong.
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import make_scorer
from sklearn.metrics import accuracy_score
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
parameters = {'alpha': [1, 0.1, 0.01, 0.001, 0.0001, 0.00001]}
scorers = {
    "accuracy": make_scorer(accuracy_score),
    "precision": make_scorer(precision_score),
    "recall": make_scorer(recall_score),
    "f1_score": make_scorer(f1_score)
}
mnb = MultinomialNB()
classifier = GridSearchCV(mnb, parameters, return_train_score=False, cv=10, scoring=scorers, refit='accuracy')
classifier.fit(x_train_features, y)
Error
UserWarning,
/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py:774: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/sklearn/model_selection/_validation.py", line 761, in _score
scores = scorer(estimator, X_test, y_test)
File "/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py", line 103, in __call__
score = scorer._score(cached_call, estimator, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_scorer.py", line 264, in _score
return self._sign * self._score_func(y_true, y_pred, **self._kwargs)
File "/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py", line 1765, in precision_score
zero_division=zero_division,
File "/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py", line 1544, in precision_recall_fscore_support
labels = _check_set_wise_labels(y_true, y_pred, average, labels, pos_label)
File "/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_classification.py", line 1357, in _check_set_wise_labels
f"pos_label={pos_label} is not a valid label. It "
ValueError: pos_label=1 is not a valid label. It should be one of ['Negative', 'Positive']
I didn't know how to override the pos_label parameter of the scoring metrics with the setup above, so I replaced my dataset labels with Positive -> 1 and Negative -> 0 instead.
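As an alternative to relabelling, pos_label can be forwarded to each metric through make_scorer; a sketch, assuming the original string labels:
scorers = {
    "accuracy": make_scorer(accuracy_score),
    "precision": make_scorer(precision_score, pos_label="Positive"),
    "recall": make_scorer(recall_score, pos_label="Positive"),
    "f1_score": make_scorer(f1_score, pos_label="Positive")
}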

Implementing GridSearchCV and Pipelines to perform Hyperparameters Tuning for KNN Algorithm

I have been reading about performing hyperparameter tuning for the KNN algorithm, and understood that the best practice is to make sure that for each fold my dataset is normalized and oversampled using a pipeline (to avoid data leakage and overfitting).
What I'm trying to do is identify the best number of neighbors (n_neighbors) that gives me the best accuracy in training. In the code I have set the number of neighbors to the range (1, 50) and the number of cross-validation folds to cv=10.
My code below:
# dataset reading & preprocessing libraries
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
#oversmapling
from imblearn.over_sampling import SMOTE
#KNN Model related Libraries
import cuml
from imblearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV
from cuml.neighbors import KNeighborsClassifier
#loading the dataset
df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/dataset/IanDataset.csv")
#filling missing values with zeros
df = df.fillna(0)
#replace the data in from being objects to integers
df["command response"].replace({"b'0'": "0", "b'1'": "1"}, inplace=True)
df["binary result"].replace({"b'0'": "0", "b'1'": "1"}, inplace=True)
#change the datatype of some features to be able to be used later
df["command response"] = pd.to_numeric(df["command response"]).astype(float)
df["binary result"] = pd.to_numeric(df["binary result"]).astype(int)
# dataset splitting
X = df.iloc[:, 0:17]
y_bin = df.iloc[:, 17]
# spliting the dataset into train and test for binary classification
X_train, X_test, y_bin_train, y_bin_test = train_test_split(X, y_bin, random_state=0, test_size=0.2)
# making a pipeline that normalizes, oversamples and classifies before GridSearchCV
pipe = Pipeline([
    ('normalization', MinMaxScaler()),
    ('oversampling', SMOTE()),
    ('classifier', KNeighborsClassifier(metric='eculidean', output='input'))
])
# using GridSearchCV
neighbors = list(range(1, 50))
parameters = {
    'classifier__n_neighbors': neighbors
}
grid_search = GridSearchCV(pipe, parameters, cv=10)
grid_search.fit(X_train, y_bin_train)
print("Best Accuracy: {}" .format(grid_search.best_score_))
print("Best num of neighbors: {}" .format(grid_search.best_estimator_.get_params()['n_neighbors']))
At the step grid_search.fit(X_train, y_bin_train), the program repeatedly raises this error:
/usr/local/lib/python3.7/site-packages/sklearn/model_selection/_validation.py:619: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/sklearn/model_selection/_validation.py", line 598, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/usr/local/lib/python3.7/dist-packages/imblearn/pipeline.py", line 266, in fit
self._final_estimator.fit(Xt, yt, **fit_params_last_step)
File "/usr/local/lib/python3.7/site-packages/cuml/internals/api_decorators.py", line 409, in inner_with_setters
return func(*args, **kwargs)
File "cuml/neighbors/kneighbors_classifier.pyx", line 176, in cuml.neighbors.kneighbors_classifier.KNeighborsClassifier.fit
File "/usr/local/lib/python3.7/site-packages/cuml/internals/api_decorators.py", line 409, in inner_with_setters
return func(*args, **kwargs)
File "cuml/neighbors/nearest_neighbors.pyx", line 397, in cuml.neighbors.nearest_neighbors.NearestNeighbors.fit
ValueError: Metric is not valid. Use sorted(cuml.neighbors.VALID_METRICS[brute]) to get valid options.
I'm not sure which side this error is coming from: is it because I'm importing the KNN algorithm from the cuML library instead of sklearn? Or is there something wrong with my Pipeline and GridSearchCV implementation?
This error indicates you've passed an invalid value for the metric parameter (in both scikit-learn and cuML). You've misspelled "euclidean".
import cuml
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split, GridSearchCV
from cuml.neighbors import KNeighborsClassifier

X, y = datasets.make_classification(
    n_samples=100
)

pipe = Pipeline([
    ('normalization', MinMaxScaler()),
    ('oversampling', SMOTE()),
    ('classifier', KNeighborsClassifier(metric='euclidean', output='input'))
])

parameters = {
    'classifier__n_neighbors': [1, 3, 6]
}

grid_search = GridSearchCV(pipe, parameters, cv=2)
grid_search.fit(X, y)

GridSearchCV(cv=2,
             estimator=Pipeline(steps=[('normalization', MinMaxScaler()),
                                       ('oversampling', SMOTE()),
                                       ('classifier', KNeighborsClassifier())]),
             param_grid={'classifier__n_neighbors': [1, 3, 6]})

Error when trying to run a GridSearchCV on sklearn Pipeline

I'm trying to run a sklearn pipeline with TFIDF vectorizer and XGBoost Classifier through a GridSearchCV, but it doesn't work because of an internal error. The data is 4000 sentences, marked either true or false (1 or 0). This is the code:
import numpy as np
import pandas as pd
from gensim import utils
import gensim.parsing.preprocessing as gsp
from sklearn.pipeline import Pipeline
from sklearn.base import BaseEstimator
from sklearn.feature_extraction.text import TfidfVectorizer
import xgboost as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score
train = pd.read_csv("train_data.csv")
test = pd.read_csv("test_data.csv")
train_x = train.iloc[:, 0]
train_y = train.iloc[:, 1]
test_x = test.iloc[:, 0]
test_y = test.iloc[:, 1]
folds = 4
xgb_parameters = {
    'xgboost__n_estimators': [1000, 1500],
    'xgboost__max_depth': [12, 15],
    'xgboost__learning_rate': [0.1, 0.12],
    'xgboost__objective': ['binary:logistic']
}
model = Pipeline(steps=[('tfidf', TfidfVectorizer()),
                        ('xgboost', xgb.XGBClassifier())])
gs_cv = GridSearchCV(estimator=model,
                     param_grid=xgb_parameters,
                     n_jobs=1,
                     refit=True,
                     cv=2,
                     scoring=f1_score)
gs_cv.fit(train_x, train_y)
But I am getting an error:
>>> gs_cv.fit(train_x, train_y)
C:\Users\draga\miniconda3\lib\site-packages\xgboost\sklearn.py:888: UserWarning: The use of label encoder in XGBClassifier is deprecated and will be removed in a future release. To remove this warning, do the following: 1) Pass option use_label_encoder=False when constructing XGBClassifier object; and 2) Encode your labels (y) as integers starting with 0, i.e. 0, 1, 2, ..., [num_class - 1].
[21:31:18] WARNING: C:/Users/Administrator/workspace/xgboost-win64_release_1.3.0/src/learner.cc:1061: Starting in XGBoost 1.3.0, the default evaluation metric used with the objective 'binary:logistic' was changed from 'error' to 'logloss'. Explicitly set eval_metric if you'd like to restore the old behavior.
C:\Users\draga\miniconda3\lib\site-packages\sklearn\utils\validation.py:70: FutureWarning: Pass labels=0 0
1 1
2 1
3 0
4 1
..
2004 0
2005 0
2008 0
2009 0
2012 0
Name: Bad Sentence, Length: 2000, dtype: int64 as keyword args. From version 1.0 (renaming of 0.25) passing these as positional arguments will result in an error
warnings.warn(f"Pass {args_msg} as keyword args. From version "
C:\Users\draga\miniconda3\lib\site-packages\sklearn\model_selection\_validation.py:683: UserWarning: Scoring failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 674, in _score
scores = scorer(estimator, X_test, y_test)
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 74, in inner_f
return f(**kwargs)
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\metrics\_classification.py", line 1068, in f1_score
return fbeta_score(y_true, y_pred, beta=1, labels=labels,
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\metrics\_classification.py", line 1192, in fbeta_score
_, _, f, _ = precision_recall_fscore_support(y_true, y_pred,
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
return f(*args, **kwargs)
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\metrics\_classification.py", line 1461, in precision_recall_fscore_support
labels = _check_set_wise_labels(y_true, y_pred, average, labels,
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\metrics\_classification.py", line 1274, in _check_set_wise_labels
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\metrics\_classification.py", line 83, in _check_targets
check_consistent_length(y_true, y_pred)
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 259, in check_consistent_length
lengths = [_num_samples(X) for X in arrays if X is not None]
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 259, in <listcomp>
lengths = [_num_samples(X) for X in arrays if X is not None]
File "C:\Users\draga\miniconda3\lib\site-packages\sklearn\utils\validation.py", line 192, in _num_samples
raise TypeError(message)
TypeError: Expected sequence or array-like, got <class 'sklearn.pipeline.Pipeline'>
What could be the problem?
Do I need to include the transform method for TfidfVectorizer() in the pipeline?
The main problem is your scoring parameter for the search. Scorers for hyperparameter tuners in sklearn need to have the signature (estimator, X, y). You can use the make_scorer convenience function, or in this case just pass the name as a string, scoring="f1".
See the scikit-learn docs for the list of built-in scorers and information on scorer signatures.
(You do not need to explicitly use the transform method; that's handled internally by the pipeline.)
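A minimal sketch of the corrected call (same pipeline as above; only the scoring argument changes):
gs_cv = GridSearchCV(estimator=model,
                     param_grid=xgb_parameters,
                     n_jobs=1,
                     refit=True,
                     cv=2,
                     scoring="f1")   # or scoring=make_scorer(f1_score), importing make_scorer from sklearn.metrics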

RandomForest score method ValueError

I am trying to find the score of a given data set with respect to some training data. I have written the following code:
from sklearn.ensemble import RandomForestClassifier
import numpy as np
randomForest = RandomForestClassifier(n_estimators = 200)
li_train1 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
li_train2 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
li_text1 = [[10,20,30,40,50,60,70,80,90], [10,20,30,40,50,60,70,80,90]]
li_text2 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
randomForest.fit(li_train1, li_train2)
output = randomForest.score(li_train1, li_text1)
On running the program I am getting the error:
Traceback (most recent call last):
File "trial.py", line 16, in <module>
output = randomForest.score(li_train1, li_text1)
File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 349, in score
return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 89, in _check_targets
raise ValueError("{0} is not supported".format(y_type))
ValueError: multiclass-multioutput is not supported
On checking the documentation related to the score method it says:
score(X, y, sample_weight=None)
X : array-like, shape = (n_samples, n_features)
Test samples.
y : array-like, shape = (n_samples) or (n_samples, n_outputs)
True labels for X.
Both X and y in my case are arrays, 2d arrays.
I also went through this question but I couldn't understand where am I going wrong.
EDIT
So as per the answer and the comments that follow, I have edited the program as follows:
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
import numpy as np
randomForest = RandomForestClassifier(n_estimators = 200)
mlb = MultiLabelBinarizer()
li_train1 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
li_train2 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
li_text1 = [100,200]
li_text2 = [[1,2,3,4,5,6,7,8,9],[1,2,3,4,5,6,7,8,9]]
randomForest.fit(li_train1, li_train2)
output = randomForest.score(li_train1, li_text1)
After this edit I am getting the error:
Traceback (most recent call last):
File "trial.py", line 19, in <module>
output = randomForest.score(li_train1, li_text1)
File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 349, in score
return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 172, in accuracy_score
y_type, y_true, y_pred = _check_targets(y_true, y_pred)
File "/usr/local/lib/python2.7/dist-packages/sklearn/metrics/classification.py", line 82, in _check_targets
"".format(type_true, type_pred))
ValueError: Can't handle mix of binary and multiclass-multioutput
According to the documentation:
Warning: At present, no metric in sklearn.metrics supports the multioutput-multiclass classification task.
The score method invokes sklearn's accuracy metric but this isn't supported for the multi-class, multi-output classification problem you've defined.
It's not clear from your question if you really intend to solve a multi-class, multi-output problem. If that's not your intention, then you should restructure your input arrays.
If on the other hand you really want to solve this kind of problem, you'll simply need to define your own scoring function.
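For illustration only, a sketch of such a custom scorer that averages plain accuracy over each output column (the name multioutput_accuracy and the per-column averaging are my own choices, not from the original answer):
import numpy as np
from sklearn.metrics import accuracy_score

def multioutput_accuracy(estimator, X, y_true):
    # score each output column separately, then average
    y_pred = np.asarray(estimator.predict(X))
    y_true = np.asarray(y_true)
    return np.mean([accuracy_score(y_true[:, i], y_pred[:, i])
                    for i in range(y_true.shape[1])])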
UPDATE
Since you are not solving a multi-class, multi-label problem you should restructure your data so that it looks something like this:
from sklearn.ensemble import RandomForestClassifier

randomForest = RandomForestClassifier(n_estimators=200)

# training data
X = [
    [1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 5, 6, 7, 8, 9]
]
y = [0, 1]

# fit the model
randomForest.fit(X, y)

# test data
Xtest = [
    [1, 2, 0, 4, 5, 6, 0, 8, 9],
    [1, 1, 3, 1, 5, 0, 7, 8, 9]
]
ytest = [0, 1]

output = randomForest.score(Xtest, ytest)
print(output)  # 0.5
