AttributeError: 'SVC' object has no attribute 'best_estimator_' [closed] - python

I'm trying to use GridSearchCV with a linear SVM, but I get this error:
AttributeError: 'SVC' object has no attribute 'best_estimator_'
The code for the linear SVM:
classifier = SVC()
classifier = GridSearchCV(classifier, {'C':[0.001, 0.01, 0.1, 1, 10,0.1, 100, 1000]}, cv=3, n_jobs=4)
classifier = SVC(kernel='linear')
classifier.fit(train_vectors, train_labels)
classifier = classifier.best_estimator_
Can anyone help?

The third line of your code reassigns classifier to a plain SVC(kernel='linear'), so the object you fit is an SVC rather than the GridSearchCV, and a plain SVC has no best_estimator_ attribute. Do this instead:
classifier = SVC(kernel='linear')
gridsearch = GridSearchCV(classifier, {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}, cv=3, n_jobs=4)
gridsearch.fit(train_vectors, train_labels)
best_params = gridsearch.best_params_
classifier = gridsearch.best_estimator_
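With the default refit=True, GridSearchCV also refits the best estimator on the full training set, so you can predict directly from the fitted search object without extracting best_estimator_ yourself. A small sketch (test_vectors is assumed to exist alongside train_vectors):
predictions = gridsearch.predict(test_vectors)  # delegates to the refitted best estimator
print(gridsearch.best_params_, gridsearch.best_score_)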

Related

I was trying to evaluate the efficiency of different ML algorithms but I am getting this error [closed]

This is my code.
names = []
res = []
for name, model in models:
    kfold = StratifiedKFold(n_splits=10, random_state=None)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    res.append(cv_results.mean())
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))
pyplot.ylim(.990, .999)
pyplot.bar(names, res, color='maroon', width=0.6)
pyplot.title('Algorithm Comparison')
pyplot.show()
This error occurred when I executed the code.
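Nothing in the snippet defines results, models, or pyplot, so the most likely failure is a NameError on results.append(cv_results). A minimal runnable sketch of the same comparison, with assumed stand-in data and models, looks like this:
from matplotlib import pyplot
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X_train, Y_train = load_breast_cancer(return_X_y=True)  # stand-in data for the sketch
models = [('LR', LogisticRegression(max_iter=1000)), ('RF', RandomForestClassifier())]

names, res, results = [], [], []  # results must be initialized before the loop
for name, model in models:
    kfold = StratifiedKFold(n_splits=10)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    res.append(cv_results.mean())
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))

pyplot.bar(names, res, color='maroon', width=0.6)
pyplot.title('Algorithm Comparison')
pyplot.show()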

Scikit-learn has very low accuracy on LOGISTIC REGRESSION, Random Forest, SVM but has high accuracy on linear regression [closed]

This is my dataset
I converted the string-type columns to categorical like this:
df2['Sex'] = df['Sex'].astype('category')
df2['Housing'] = df['Housing'].astype('category')
df2['Saving accounts'] = df['Saving accounts'].astype('category')
df2['Checking account'] = df['Checking account'].astype('category')
df2['Purpose'] = df['Purpose'].astype('category')
To train the model:
train, test = train_test_split(df2, test_size=0.2)
Y_train = pd.DataFrame()
Y_test = pd.DataFrame()
Y_train["score"] = train["score"]
Y_test["score"] = test["score"]
X_train = train.drop('score', axis=1)
X_test = test.drop('score', axis=1)
lr = LogisticRegression(penalty='l1', C=0.9, solver='liblinear', n_jobs=-1)
lr.fit(X_train, Y_train)
Y_pred = lr.predict(X_test)
My accuracy with LogisticRegression, RandomForest, or SVM is very low:
from sklearn.metrics import accuracy_score
accuracy_score(Y_test,Y_pred)
0.05
Your problem is regression, but you tried classification models (LogisticRegression, SVM and RandomForest). You should try RandomForestRegressor, SVR (as opposed to SVC), etc. For the same reason, accuracy_score is the wrong metric here: it only counts exact matches, which will be close to zero for a continuous target; use a regression metric such as mean absolute error or R² instead.
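A minimal sketch of the regression route, reusing X_train/X_test/Y_train/Y_test from the question. It assumes the category columns still need numeric codes (the models require numeric input) and scores with regression metrics instead of accuracy_score:
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, r2_score

# the 'category' columns have to be encoded as numbers before fitting
for col in ['Sex', 'Housing', 'Saving accounts', 'Checking account', 'Purpose']:
    X_train[col] = X_train[col].cat.codes
    X_test[col] = X_test[col].cat.codes

reg = RandomForestRegressor(n_estimators=200, random_state=0)
reg.fit(X_train, Y_train.values.ravel())  # ravel() flattens the one-column DataFrame
Y_pred = reg.predict(X_test)
print(mean_absolute_error(Y_test, Y_pred), r2_score(Y_test, Y_pred))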

I want to know the resulting label for all of the input data [closed]

I trained a model with the random forest algorithm, and I want to append the prediction results to the existing input data. How do I do that? scikit-learn provides evaluation metrics for the result, such as accuracy, precision, recall, and F1 score, but I am not sure whether it has a function that returns the predicted label for each input, the way Keras does. I don't know where to start with the code, so I'm just asking the question.
Usually, you have something like this when using sklearn:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

input_data = pd.read_csv("path/to/data")
features = ["area", "location", "rooms"]
y = input_data["Price"]
X = input_data[features]
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
model = RandomForestRegressor()  # use RandomForestClassifier for a categorical target
model.fit(train_X, train_y)
Now your model is trained. As you mentioned you could get different metrics from the model using sklearn on your validation set.
Getting the output labels from the model means getting predictions (inference):
output_label = model.predict(val_X)
# this is an ndarray with the same length as val_y
x = val_X.copy()  # keep the original feature columns instead of rebuilding the frame
x["output_label"] = output_label
Or you could use numpy.concatenate to append the labels directly to your input data
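For example, a short sketch of that last step (val_X stays a pandas DataFrame after train_test_split, so either route works):
import numpy as np

# pandas route: keep the feature columns and add the predictions as a new column
labelled = val_X.copy()
labelled["output_label"] = output_label

# numpy route: stack the predictions as an extra column on the raw values
labelled_array = np.concatenate([val_X.to_numpy(), output_label.reshape(-1, 1)], axis=1)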

Append values from function to list [closed]

I use this function:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
for i in range(X.shape[1]):
    clf = LogisticRegression(random_state=0, verbose=0, max_iter=1000).fit(X[:, i].reshape(-1, 1), y)
    print(clf.score(X[:, i].reshape(-1, 1), y))
and I get 4 values as output:
0.7466666666666667
0.5533333333333333
0.9533333333333334
0.96
But when I try to add these 4 values to a list:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
for i in range(X.shape[1]):
    my_list = []
    clf = LogisticRegression(random_state=0, verbose=0, max_iter=1000).fit(X[:, i].reshape(-1, 1), y)
    my_list.append(clf.score(X[:, i].reshape(-1, 1), y))
print(my_list)
I get only last value:
[0.96]
I want to get :
[0.7466666666666667, 0.5533333333333333, 0.9533333333333334, 0.96]
How could I do that?
Make sure to declare the list outside the loop so it doesn't get reset every iteration!
my_list = []
for i in range(X.shape[1]):
    # previously the list was reset on each iteration
    clf = LogisticRegression(random_state=0, verbose=0, max_iter=1000).fit(X[:, i].reshape(-1, 1), y)
    my_list.append(clf.score(X[:, i].reshape(-1, 1), y))
You could also use a list comprehension to avoid the confusion:
my_list = [
    LogisticRegression(random_state=0, verbose=0, max_iter=1000)
    .fit(X[:, i].reshape(-1, 1), y)
    .score(X[:, i].reshape(-1, 1), y)
    for i in range(X.shape[1])
]
Both produce the list:
[0.7466666666666667, 0.5533333333333333, 0.9533333333333334, 0.96]

Speed Improvements to Leave One Group Out in Large Datasets [closed]

I am performing classification with LogisticRegression over a large dataset (1.5 million observations) using LeaveOneGroupOut cross-validation, implemented in scikit-learn. My code takes around 2 days to run, and I would appreciate your input on how to make it faster. A snippet of my code is shown below:
grp = data['id_x'].values
logo = LeaveOneGroupOut()
LogReg = LogisticRegression()
params_grid = {'C': [0.78287388, 1.19946909, 1.0565957, 0.69874106, 0.88427995, 1.33028731, 0.51466415, 0.91421747, 1.25318725, 0.82665192, 1, 10],
               'penalty': ['l1', 'l2']}
random_search = RandomizedSearchCV(LogReg, param_distributions=params_grid, n_iter=3, cv=logo, scoring='accuracy')
random_search.fit(X, y, groups=grp)
print(random_search.best_params_)
print(random_search.best_score_)
I am going to make the following assumptions:
1- you are using scikit-learn.
2- you need your code to be faster.
To get your final results faster, you can train multiple models at once by running them in parallel. To do so, set the n_jobs parameter in scikit-learn. A sensible value is the number of CPU cores (if nothing else is running on the machine while training), or one less than that to keep a core free.
Examples:
RandomizedSearchCV in parallel:
random_search = RandomizedSearchCV(LogReg, n_jobs=3, param_distributions=params_grid, n_iter=3, cv=logo, scoring='accuracy')
LogisticRegression in parallel:
LogisticRegression(n_jobs=3)
I recommend parallelizing only RandomizedSearchCV.
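If you do not want to count cores by hand, n_jobs=-1 tells scikit-learn to use all available cores; for example:
random_search = RandomizedSearchCV(LogReg, n_jobs=-1, param_distributions=params_grid,
                                   n_iter=3, cv=logo, scoring='accuracy')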
It might be helpful to also look at the original scikit-learn documentation:
sklearn.linear_model.LogisticRegression
sklearn.model_selection.RandomizedSearchCV
