I was trying to evaluate the efficiency of different ML algorithms but I am getting this error [closed] - python

This is my code:
names = []
res = []
for name, model in models:
    kfold = StratifiedKFold(n_splits=10, random_state=None)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    res.append(cv_results.mean())
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))
pyplot.ylim(.990, .999)
pyplot.bar(names, res, color='maroon', width=0.6)
pyplot.title('Algorithm Comparison')
pyplot.show()
This error occurred when I executed the code.
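The error message itself was not posted, but a likely culprit is that results is never initialized before results.append(cv_results), which would raise NameError: name 'results' is not defined. A minimal corrected sketch, assuming models, X_train, and Y_train are already defined:

from matplotlib import pyplot
from sklearn.model_selection import StratifiedKFold, cross_val_score

names = []
res = []
results = []  # initialize before the loop; otherwise results.append raises a NameError
for name, model in models:
    kfold = StratifiedKFold(n_splits=10)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    res.append(cv_results.mean())
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))
pyplot.ylim(.990, .999)
pyplot.bar(names, res, color='maroon', width=0.6)
pyplot.title('Algorithm Comparison')
pyplot.show()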

Related

How to scale datasets correctly [closed]

Which one is more correct, or is there another way to scale data? (I've used StandardScaler as an example.)
I've tried each approach and computed the accuracy of every model; there is no meaningful difference, but I want to know which way is more correct.
dataset = pd.read_csv("wine.csv")
x = dataset.iloc[:, :13]
y = dataset.iloc[:, 13]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.8, random_state=0)
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.fit_transform(x_test)
or
dataset = pd.read_csv("wine.csv")
x = dataset.iloc[:, :13]
y = dataset.iloc[:, 13]
sc = StandardScaler()
x = sc.fit_transform(x)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.8, random_state=0)
or
dataset = pd.read_csv("wine.csv")
x = dataset.iloc[:, :13]
y = dataset.iloc[:, 13]
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.8, random_state=0)
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
Test data should not be seen or used during the training of a model, since it is used to assess the model's performance.
Therefore the last option is the correct one. The scaling parameters should be computed solely on the training set, as follows:
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
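As a side note, scikit-learn's Pipeline enforces this pattern automatically: the scaler is fitted only on whatever data is passed to fit, which also keeps cross-validation free of leakage. A minimal sketch reusing the variables above, with LogisticRegression as a stand-in final estimator:

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The pipeline fits the scaler on x_train only, then transforms x_test internally
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(x_train, y_train)
accuracy = pipe.score(x_test, y_test)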

I want to know the resulting label for all of the input data [closed]

Training was carried out using the random forest algorithm. I want to append the prediction results to the existing input data; how do I do that? scikit-learn provides model evaluation metrics such as accuracy, precision, recall, and F1 score, but I am not sure whether there is a function that returns the label of the prediction result, as Keras does. I don't know where to start with the code, so I'll just ask a question.
Usually, you have something like this when using sklearn:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

input_data = pd.read_csv("path/to/data")
features = ["area", "location", "rooms"]
y = input_data["Price"]
X = input_data[features]
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1)
model = RandomForestRegressor()  # sklearn has no RandomForest class; the regressor fits the numeric Price target
model.fit(train_X, train_y)
Now your model is trained. As you mentioned, you can get different metrics from the model on your validation set using sklearn.
Getting the output label from the model means getting predictions (inference):
output_label = model.predict(val_X)
# output_label is an ndarray with the same length as val_y
x = val_X.copy()
x["output_label"] = output_label
Or you could use numpy.concatenate to append the labels directly to your input data, as in the sketch below.
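For example, a minimal sketch of the concatenate route, assuming val_X is the DataFrame from the snippet above:

import numpy as np

# Stack the predicted labels as an extra column next to the raw feature values
combined = np.concatenate([val_X.to_numpy(), output_label.reshape(-1, 1)], axis=1)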

copying column data into another dataframe using iloc [closed]

After splitting into x_train and y_train:
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
print(x_train.shape, y_train.shape)
(354, 13) (354,)
I need to join the y_train column back onto x_train; Price is the new column.
x_train['Price'] = y_train, but this does not work.
I am trying to use iloc as follows, but it gives a warning:
x_train['price'] = y_train.iloc[0:354]
Please help me out with this.
You get that warning because x_train is a view of X. Using an example:
df = pd.DataFrame(np.random.uniform(0, 1, (100, 4)),
                  columns=['x1', 'x2', 'x3', 'y'])
X = df[['x1', 'x2', 'x3']]
Y = df[['y']]
x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
You can see:
x_train._is_view
True
If I try to run your code, I get the same warning.
See this post about views of a DataFrame and also this one on dealing with the warning. What you can do is make a copy if you don't think the warning is an issue:
x_train = x_train.copy()
x_train['Price'] = y_train
Or use insert:
x_train.insert(x_train.shape[1], "Price", y_train)
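Another option is DataFrame.assign, which returns a new frame rather than modifying a view, so it never triggers the warning; squeeze() below turns a single-column y_train into a Series that aligns on the index:

x_train = x_train.assign(Price=y_train.squeeze())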

AttributeError: 'SVC' object has no attribute 'best_estimator_' [closed]

I'm trying to use GridSearchCV for SVM linear but I get this error:
AttributeError: 'SVC' object has no attribute 'best_estimator_'
The code of the linear SVM:
classifier = SVC()
classifier = GridSearchCV(classifier, {'C':[0.001, 0.01, 0.1, 1, 10,0.1, 100, 1000]}, cv=3, n_jobs=4)
classifier = SVC(kernel='linear')
classifier.fit(train_vectors, train_labels)
classifier = classifier.best_estimator_
Can anyone help?
Do this:
classifier = SVC(kernel='linear')
gridsearch = GridSearchCV(classifier, {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000]}, cv=3, n_jobs=4)
gridsearch.fit(train_vectors, train_labels)
best_params = gridsearch.best_params_
classifier = gridsearch.best_estimator_
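Since GridSearchCV refits the best estimator on the full training set by default (refit=True), the fitted search object can also be used for prediction directly; test_vectors below is a hypothetical stand-in for your own data:

predictions = gridsearch.predict(test_vectors)  # delegates to the refit best estimator
print(gridsearch.best_score_)                   # mean cross-validated score of the best parameters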

Speed Improvements to Leave One Group Out in Large Datasets [closed]

I am performing classification by LogisticRegression over a large dataset (1.5 million observations) using LeaveOneGroupOut cross-validation. I am using scikit-learn for implementation. My code takes around 2 days to run and I would appreciate your inputs on how to make it faster. A snippet of my code is shown below:
grp = data['id_x'].values
logo = LeaveOneGroupOut()
LogReg = LogisticRegression()
params_grid = {'C': [0.78287388, 1.19946909, 1.0565957, 0.69874106, 0.88427995, 1.33028731, 0.51466415, 0.91421747, 1.25318725, 0.82665192, 1, 10],
               'penalty': ['l1', 'l2']}
random_search = RandomizedSearchCV(LogReg, param_distributions=params_grid, n_iter=3, cv=logo, scoring='accuracy')
random_search.fit(X, y, grp)
print(random_search.best_params_)
print(random_search.best_score_)
I am going to make the following assumptions:
1- you are using scikit-learn.
2- you need your code to be faster.
To get your final results faster, you can train multiple models at once by running them in parallel. To do so, set the n_jobs parameter in scikit-learn. A reasonable value for n_jobs is the number of CPU cores on your machine, or one less than that if you are running anything else while training the model.
Examples:
RandomizedSearchCV in parallel:
random_search = RandomizedSearchCV(LogReg, n_jobs=3, param_distributions=params_grid, n_iter=3, cv=logo, scoring='accuracy')
LogisticRegression in parallel:
LogisticRegression(n_jobs=3)
I recommend parallelizing only RandomizedSearchCV, since nesting parallelism in both the search and the estimator can oversubscribe your CPU cores.
It might be helpful to also look at the original scikit-learn documentation:
sklearn.linear_model.LogisticRegression
sklearn.model_selection.RandomizedSearchCV
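Putting these suggestions together, a minimal sketch with n_jobs=-1 (which uses all available cores) and a shortened C grid for brevity, assuming X, y, and grp are defined as in the question:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, RandomizedSearchCV

logo = LeaveOneGroupOut()
log_reg = LogisticRegression(solver='liblinear')  # liblinear supports both the l1 and l2 penalties
params_grid = {'C': [0.1, 1, 10], 'penalty': ['l1', 'l2']}

random_search = RandomizedSearchCV(log_reg, param_distributions=params_grid,
                                   n_iter=3, cv=logo, scoring='accuracy',
                                   n_jobs=-1)  # -1 uses all available CPU cores
random_search.fit(X, y, groups=grp)
print(random_search.best_params_)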
