I have data about Parkinson's patients stored in the dataframe X, and whether a patient has Parkinson's is indicated by y (0 or 1). This is retrieved by:
X=pd.read_csv('parkinsons.data',index_col=0)
y=X['status']
X=X.drop(['status'],axis=1)
Then, I create training and test samples:
X_train, y_train, X_test, y_test = train_test_split(X,y,test_size=0.3,random_state=7)
I want to use SVC on this training data:
svc=SVC()
svc.fit(X_train,y_train)
Then, I get the error:
ValueError: bad input shape (59, 22).
What did I do wrong and how can I get rid of this error?
The problem is in how you unpack the output of train_test_split. Careful! train_test_split returns the X splits first, followed by the y splits, so what you are naming y_train is actually X_test. Change the assignment and it should work:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=7)
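For completeness, a minimal sketch of the corrected workflow (assuming the usual scikit-learn imports and the file from the question):
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the data and separate the features from the target column
X = pd.read_csv('parkinsons.data', index_col=0)
y = X['status']
X = X.drop(['status'], axis=1)

# Correct unpacking order: X splits first, then y splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

# y_train is now a 1-D label vector, so fit() no longer complains about its shape
svc = SVC()
svc.fit(X_train, y_train)
print(svc.score(X_test, y_test))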
Either keep your current unpacking order; note that with that ordering the variable you named X_test actually holds the training labels, which is why it has to be passed to fit():
X_train, y_train, X_test, y_test = train_test_split(X, y, test_size=0.3, random_state=7)
svc = SVC()
svc.fit(X_train, X_test)
Or use the conventional order:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3,random_state=7)
svc=SVC()
svc.fit(X_train,y_train)
I prefer the second one, since the variable names then match what they actually contain.
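Either way, a quick shape check right after the split catches this kind of mix-up early. A small sketch (assuming X and y as defined in the question); the key point is that the labels must be one-dimensional:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=7)

# Features should be 2-D and labels 1-D; a y_train shaped like (59, 22)
# means the unpacking order above is wrong.
print(X_train.shape, y_train.shape)
assert X_train.shape[0] == y_train.shape[0]
assert y_train.ndim == 1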
I get a warning when running the code below (screenshot of the warning omitted). Help please?
I already tried adding .values to the X's, but that still resulted in an error. Any suggestions?
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# df is the dataframe with the economic data (loaded elsewhere)
X = df[['Personal income','Personal saving']]
y = df['Gross domestic product']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
regr = linear_model.LinearRegression().fit(X_train, y_train)
sample = [10000, 1000]
sample_pred = regr.predict([sample])
As stated in this issue, https://github.com/tylerjrichards/Getting-Started-with-Streamlit-for-Data-Science/issues/5, converting the X_train dataframe to a NumPy array (X_train.values) before fitting removes the warning.
It did in my testing. You can try either:
X_train, X_test, y_train, y_test = train_test_split(X.values, y, test_size=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X.to_numpy(), y, test_size=0.2, random_state=42)
In addition, this warning doesn't affect the calculation precision; you can ignore it and keep working until you update your library versions.
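A minimal sketch of the fix in context (assuming X and y as defined in the question; the idea is simply to fit and predict on plain NumPy arrays so scikit-learn never sees pandas feature names):
import numpy as np
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Split on the raw values so the model is fitted without feature names
X_train, X_test, y_train, y_test = train_test_split(X.values, y, test_size=0.2, random_state=42)

regr = linear_model.LinearRegression().fit(X_train, y_train)

# Predicting on a plain 2-D array is then consistent with how the model was fitted
sample = np.array([[10000, 1000]])
sample_pred = regr.predict(sample)
print(sample_pred)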
I'm trying to split my data into training, validation, and test sets using fast_ml for a machine learning task. Both my input and output data are read from .npy files via np.load. The input "P" is an array of shape (100000, 4, 4, 6, 1) and the target "Q" is a vector of shape (100000,). I use the code below:
from fast_ml.model_development import train_valid_test_split
X_train, y_train, X_valid, y_valid, X_test, y_test = train_valid_test_split(
    P, Q, train_size=0.8, valid_size=0.1, test_size=0.1)
However, I receive this error:
AttributeError: 'numpy.ndarray' object has no attribute 'drop'
This solved my problem:
from sklearn.model_selection import train_test_split
X_train, X_rem, y_train, y_rem = train_test_split(P,Q, train_size=0.8)
X_valid, X_test, y_valid, y_test = train_test_split(X_rem,y_rem, test_size=0.5)
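A small self-contained sketch of that two-stage split, using dummy arrays with the shapes from the question so the resulting sizes are illustrative:
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy arrays with the shapes from the question
P = np.random.rand(100000, 4, 4, 6, 1)
Q = np.random.randint(0, 2, size=100000)

# First keep 80% for training, then split the remaining 20% in half
# (random_state is only there to make the sketch reproducible)
X_train, X_rem, y_train, y_rem = train_test_split(P, Q, train_size=0.8, random_state=0)
X_valid, X_test, y_valid, y_test = train_test_split(X_rem, y_rem, test_size=0.5, random_state=0)

print(X_train.shape, X_valid.shape, X_test.shape)
# (80000, 4, 4, 6, 1) (10000, 4, 4, 6, 1) (10000, 4, 4, 6, 1)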
Trying to fit a logistic regression model but receiving the error below:
ValueError: bad input shape (330, 5)
from sklearn.model_selection import train_test_split
X = ad_data[['Daily Time Spent on Site','Age','Area Income','Daily Internet Usage','Male']]
y= ad_data['Clicked on Ad']
X_train, y_train, X_test, y_test = train_test_split(X,y,test_size=0.33,random_state=42)
logmodel = LogisticRegression()
logmodel.fit(X_train,y_train)
The error is not very verbose, but I think you should unpack the output of train_test_split this way:
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.33,random_state=42)
Refer to: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
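For reference, a minimal corrected sketch (assuming ad_data is already loaded as in the question):
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = ad_data[['Daily Time Spent on Site','Age','Area Income','Daily Internet Usage','Male']]
y = ad_data['Clicked on Ad']

# X splits come first, then y splits
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

logmodel = LogisticRegression()
logmodel.fit(X_train, y_train)
print(logmodel.score(X_test, y_test))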
Just trying to do a simple nearest-neighbors classification, but I'm baffled by the error I get from:
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X_train, y_train)
which produces:
ValueError: Found input variables with inconsistent numbers of samples: [489, 1890]
Can anyone explain what I am missing?
The error is telling you that X_train and y_train do not contain the same number of samples. Revisit your train/test split and make sure you are executing it properly. For example, if you are using sklearn.model_selection.train_test_split, you would do it like so:
X_train, X_test, y_train, y_test = train_test_split(X, y)
neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X_train, y_train)
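A quick sanity check before fitting makes this kind of mismatch easy to diagnose. A small sketch, assuming X and y are your full feature matrix and label vector:
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X_train, X_test, y_train, y_test = train_test_split(X, y)  # default 75% / 25% split

# The arrays passed to fit() must have the same number of rows; if they do not,
# the unpacking order above is the first thing to check.
assert len(X_train) == len(y_train), (len(X_train), len(y_train))

neigh = KNeighborsClassifier(n_neighbors=3)
neigh.fit(X_train, y_train)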
Please check the order in which you unpack train_test_split; it should be exactly as below:
X_train, X_test, y_train, y_test=train_test_split(X,y)
The wrong order produces the error "ValueError: Found input variables with inconsistent numbers of samples: ..."
I want to split my data into train and test sets, along with a vector that contains image names (it serves me as an index and reference).
My data are:
data has a shape of (2440, 3072)
labels has a shape of (2440,)
name_images has a shape of (2440,)
from sklearn.cross_validation import train_test_split
x_train, x_test, y_train, y_test= train_test_split(data, labels, test_size=0.3)
But I also want to split name_images into name_images_train and name_images_test consistently with the split of data and labels.
I tried:
x_train, x_test, y_train, y_test, name_images_train, name_images_test = train_test_split(data, labels, name_images, test_size=0.3)
but it doesn't preserve the correspondence.
Any suggestions? Thank you.
EDIT1:
x_train, x_test, y_train, y_test = train_test_split(data, labels, test_size=0.3, random_state=42)
name_images_train, name_images_test = train_test_split(name_images, test_size=0.3, random_state=42)
EDIT1 doesn't preserve the correspondence either.
There are multiple ways to accomplish this.
The most straightforward is to use the random_state parameter of train_test_split. As the documentation states:
random_state : int or RandomState
Pseudo-random number generator state used for random sampling.
When you fix the random_state, the indices generated for splitting the arrays into train and test are exactly the same each time.
So change your code to:
x_train, x_test, y_train, y_test, name_images_train, name_images_test = train_test_split(
    data, labels, name_images, test_size=0.3, random_state=42)
For more understanding on random_state, see my answer here:
https://stackoverflow.com/a/42197534/3374996
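A small sketch of that single-call approach with dummy data of the shapes from the question, just to show that the three splits stay aligned:
import numpy as np
from sklearn.model_selection import train_test_split

# Dummy data matching the shapes in the question
data = np.random.rand(2440, 3072)
labels = np.random.randint(0, 10, size=2440)
name_images = np.array(['img_%d' % i for i in range(2440)])

# Passing all three arrays to one call splits them with the same shuffled indices
x_train, x_test, y_train, y_test, name_images_train, name_images_test = train_test_split(
    data, labels, name_images, test_size=0.3, random_state=42)

print(x_train.shape, y_train.shape, name_images_train.shape)
# (1708, 3072) (1708,) (1708,)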
In my case, I realized that my input arrays were not in the proper order in the first place. So for future Googlers: you may want to double-check whether (data, labels) are in the same order or not.