Logistic Regression - "multiclass-multioutput is not supported" + errors - Python

I am new to Python, and while doing a logistic regression I'm running into a few issues, shown below. Here's my code, followed by the error messages:
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_curve, roc_auc_score
X = dataset_df
Y = dataset_df
X_train, X_test, y_train, y_test\
= train_test_split(X, y, test_size = 0.3, random_state=1)
X_train, X_validation, y_train, y_validation\
= train_test_split(X_train, y_train, test_size = 0.3, random_state=1)
sc = StandardScaler()
sc.fit(X_train)
X_train_Std = sc.transform(X_train)
lr_classifier = LogisticRegression(C = 1000, random_state= 1)
rf_classifier = RandomForestClassifier(max_depth=5, random_state= 1)
rf_classifier.fit(X_train_Std, y_train)
rf_classifier.predict_proba(sc.transform(X_validation))
Then here:
roc_auc_score(y_true=y_test, y_score=lr_2.predict(X_test_std_pca_1))
NameError: name 'lr_2' is not defined
And there:
max_depth_params = [2, 3, 5, 10]
for max_depth in max_depth_params:
    rf_classifier = RandomForestClassifier(max_depth=max_depth, random_state=1)
    rf_classifier.fit(X_train_Std, y_train)
    y_pred2 = rf_classifier.predict(sc.transform(X_validation))
    print('max depth param:', max_depth, 'accuracy:', accuracy_score(y_true=y_validation, y_pred=y_pred2))
ValueError: multiclass-multioutput is not supported
And there:
lr_classifier.fit(X_train_Std, y_train)
y_pred = lr_classifier.predict(sc.transform(X_validation))
ValueError: y should be a 1d array, got an array of shape (3876, 16) instead.
And finally:
y_pred2 = rf_classifier.predict(sc.transform(X_validation))
print('Misclassified samples {0} out of {1}, i.e. {2:.2f}% accurate'.\
format((y_validation != y_pred).sum(), len(y_validation), (1 - (y_validation != y_pred).sum()/len(y_validation))*100))
TypeError: unsupported format string passed to Series.__format__
So many error messages that I feel as if my head is going to explode. If someone could help, I'd be very grateful 🙏

The NameError is a variable-name error: lr_2 is not a variable you have defined; you named your classifier lr_classifier.
Your target y is not a single column; because you assigned the whole DataFrame to it, it is 2-D, which is why scikit-learn complains that multiclass-multioutput is not supported.
The next error states the same issue explicitly: y should be a 1-D array, but it has shape (3876, 16).
The TypeError means an unsupported data type, here a pandas Series, was passed to the {2:.2f} format specifier.
Try to debug the errors one at a time, and please learn the Python basics before moving on to machine learning.
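As a minimal sketch of how the pieces could fit together, assuming dataset_df is the DataFrame from the question and that it has a single label column named 'target' (a hypothetical name; substitute your real target column), with the remaining columns as numeric features:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Hypothetical column name: replace 'target' with the actual label column.
X = dataset_df.drop(columns=['target'])
y = dataset_df['target']                 # 1-D target, not the whole DataFrame

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
X_train, X_validation, y_train, y_validation = train_test_split(
    X_train, y_train, test_size=0.3, random_state=1)

sc = StandardScaler()
X_train_std = sc.fit_transform(X_train)

lr_classifier = LogisticRegression(C=1000, random_state=1)
lr_classifier.fit(X_train_std, y_train)
y_pred = lr_classifier.predict(sc.transform(X_validation))

# With a 1-D target, the comparison below yields a plain integer and a float,
# so the {2:.2f} format specifier works.
print('Misclassified samples {0} out of {1}, i.e. {2:.2f}% accurate'.format(
    (y_validation != y_pred).sum(), len(y_validation),
    (1 - (y_validation != y_pred).sum() / len(y_validation)) * 100))
print('accuracy:', accuracy_score(y_true=y_validation, y_pred=y_pred))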

Related

Can anyone help me fix my polynomial regression model with feature scaling and transformations?

The error I got is:
ValueError: shapes (1,2) and (15,) not aligned: 2 (dim 1) != 15 (dim 0)
Code:
import numpy as np
import pandas as pd
dataset = pd.read_csv('music.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, -1].values
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
y = le.fit_transform(y)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
poly_reg = PolynomialFeatures(degree = 4)
X_poly = poly_reg.fit_transform(X_train)
regressor = LinearRegression()
regressor.fit(X_poly, y_train)
print(y_train.inverse_transform((regressor.predict([[33,0]]))))
This is the full error:
print(y_test.inverse_transform((regressor.predict([[33,0]]))))
AttributeError: 'numpy.ndarray' object has no attribute 'inverse_transform'
Process finished with exit code 1
y_train is a numpy array, but you are treating it as a fitted encoder.
The problem is y_train.inverse_transform(regressor.predict([[33,0]])). It should be le.inverse_transform(regressor.predict([[33,0]])), since le is the fitted LabelEncoder.
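As a hedged sketch of the full prediction path, reusing the fitted objects from the question (poly_reg, regressor, le): the raw input also has to be expanded with poly_reg.transform before calling regressor.predict (that mismatch is what the "shapes (1,2) and (15,) not aligned" error refers to), and the continuous regression output has to be rounded to an integer class index before le.inverse_transform will accept it.
import numpy as np

# Hedged sketch, reusing the fitted objects from the question (poly_reg, regressor, le).
raw_input = [[33, 0]]
poly_input = poly_reg.transform(raw_input)        # expand to the 15 degree-4 features used in training
pred = regressor.predict(poly_input)              # continuous output from LinearRegression
# Round to the nearest encoded label and clip to the valid range before decoding.
label_idx = np.clip(np.round(pred).astype(int), 0, len(le.classes_) - 1)
print(le.inverse_transform(label_idx))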

r2 score turns out to be negative

I'm studying support vector regression, but I've run into a problem: my r2 score comes out negative. Is that normal, or is there something in my code I can change to fix it?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
df = pd.read_csv('Position_Salaries.csv')
df.head()
X = df.iloc[:, 1:2].values
y = df.iloc[:, -1].values
from sklearn.preprocessing import StandardScaler
y = y.reshape(len(y),1)
x_scaler = StandardScaler()
y_scaler = StandardScaler()
X = x_scaler.fit_transform(X)
y = y_scaler.fit_transform(y)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 42)
regressor = SVR(kernel="rbf")
regressor.fit(x_train,y_train.ravel())
y_pred = y_scaler.inverse_transform(regressor.predict(x_scaler.transform(x_test)))
from sklearn.metrics import r2_score
r2_score(y_scaler.inverse_transform(y_test), y_pred)
My output is -0.5313206322807349
In this part, your X is already in its scaled version:
X = x_scaler.fit_transform(X)
In this part, your x_test is therefore also already scaled:
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 42)
So when making predictions, you shouldn't transform your input again, since x_test is already scaled:
y_pred = y_scaler.inverse_transform(regressor.predict(x_scaler.transform(x_test)))
From the documentation of sklearn.metrics.r2_score:
Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of y, disregarding the input features, would get a R^2 score of 0.0.
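A minimal sketch of the corrected prediction and scoring step, reusing the question's variable names (and assuming a reasonably recent scikit-learn, where StandardScaler.inverse_transform expects 2-D input):
from sklearn.metrics import r2_score

# x_test already comes from the scaled X, so it goes straight into predict();
# the 1-D prediction is reshaped to 2-D because StandardScaler.inverse_transform
# expects a 2-D array in recent scikit-learn versions.
y_pred = y_scaler.inverse_transform(regressor.predict(x_test).reshape(-1, 1))
print(r2_score(y_scaler.inverse_transform(y_test), y_pred))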

What does the error mean and how to fix it - "ValueError: query data dimension must match training data dimension"

I am trying to write the code for K-NN.
Below is my code. I know the issue is in predict(), but I am not able to figure out how to fix it.
# Importing the libraries
import numpy as np
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('UniversalBank.csv')
X = dataset.iloc[:, [1,2,3,5,6,7,8,10,11,12,13]].values
y = dataset.iloc[:,9].values
#Splitting the dataset to training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state= 0)
#Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#Fitting the classifier to training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train,y_train)
#Predicting the test results
y_pred = classifier.predict(X_test)

ValueError: Can't handle mix of continuous and multiclass

I want to evaluate the model I've built in scikit-learn on the data I've used here. I am using the DecisionTreeClassifier.score function, but when running the code I receive a ValueError:
Can't handle mix of continuous and multiclass.
Here is the code I use:
from sklearn import datasets
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
nba = pd.read_excel(r"C:\Users\user\Desktop\nba.xlsx")
X = nba.drop('平均得分', axis = 1)
y = nba['平均得分']
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size = 0.20)
nba_tree = DecisionTreeClassifier()
nba_tree.fit(X_train, y_train.astype('int'))
y_pred = nba_tree.predict(X_test)
nba_tree.score(X_test, y_test)
It looks like your target variable 平均得分 is a continuous variable. Probably you are trying to solve a regression problem. If that is the case, try DecisionTreeRegressor instead of DecisionTreeClassifier.
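If it is indeed a regression problem, a minimal sketch of the swap, reusing the question's variables, could look like this; note that .score() then reports R^2 rather than accuracy:
from sklearn.tree import DecisionTreeRegressor

# DecisionTreeRegressor handles a continuous target directly, so there is no
# need to cast y_train to int.
nba_tree = DecisionTreeRegressor()
nba_tree.fit(X_train, y_train)
y_pred = nba_tree.predict(X_test)
print(nba_tree.score(X_test, y_test))   # R^2 on the held-out split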

Error code with Preprocessor Scaling?

I'm using KNN and wanted to experiment with different normalizers (Normalizer(), MinMaxScaler(), StandardScaler(), etc.).
I have loaded the data into a variable called X:
X = pd.read_csv('C:/Users/rmahesh/documents/parkinson.csv')
After doing some data wrangling, I try and run this code:
from sklearn import preprocessing
from sklearn.decomposition import PCA
T = preprocessing.Normalizer().fit(X)
from sklearn.cross_validation import train_test_split
T_train, T_test, y_train, y_test = train_test_split(T, y, test_size = 0.3, random_state = 7)
from sklearn.svm import SVC
model = SVC()
model = model.fit(T_train, y_train)
score = model.score(T_test, y_test)
print(score)
The specific error code I am getting is this:
TypeError: Singleton array array(Normalizer(copy=True, norm='l2'), dtype=object) cannot be considered a valid collection.
The line in which the error appears is:
T_train, T_test, y_train, y_test = train_test_split(T, y,
test_size = 0.3, random_state = 7)
Any help would be greatly appreciated!
You're fitting your normalizer and then treating it as an array directly. Replace
T = preprocessing.Normalizer().fit(X)
with
T = preprocessing.Normalizer().fit_transform(X)
so that the actual output of the normalization is used instead; .fit() returns the Normalizer object itself, not the transformed data.
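A minimal sketch of the corrected flow, assuming y holds the target labels (not shown in the question) and noting that in current scikit-learn train_test_split lives in sklearn.model_selection rather than the old sklearn.cross_validation module used above:
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# fit_transform returns the normalized data; .fit alone returns the Normalizer object.
T = preprocessing.Normalizer().fit_transform(X)

T_train, T_test, y_train, y_test = train_test_split(T, y, test_size=0.3, random_state=7)

model = SVC()
model.fit(T_train, y_train)
print(model.score(T_test, y_test))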
