How to predict a single instance with Logistic Regression using scikit-learn? - python

I'm trying to build a Logistic Regression model that can predict the class of a new instance.
Here is what I've done:
path = 'diabetes.csv'
df = pd.read_csv(path, header = None)
print "Classifying with Logistic Regression"
values = df.values
X = values[1:,0:8]
y = values[1:,8]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.10, random_state=42)
model=LogisticRegression()
model.fit(X_train,y_train)
X_test = []
X_test.append(int(pregnancies_info))
X_test.append(int(glucose_info))
X_test.append(int(blood_press_info))
X_test.append(int(skin_thickness_info))
X_test.append(int(insulin_info))
X_test.append(float(BMI_info))
X_test.append(float(dpf_info))
X_test.append(int(age_info))
#X_test = np.array(X_test).reshape(-1, 1)
print X_test
y_pred=model.predict(X_test)
if y_pred == 0:
    Label(login_screen, text="Healthy").pack()
if y_pred == 1:
    Label(login_screen, text="Diabetes Mellitus").pack()
pregnancies_entry.delete(0, END)
glucose_entry.delete(0, END)
blood_press_entry.delete(0, END)
skin_thickness_entry.delete(0, END)
insulin_entry.delete(0, END)
BMI_entry.delete(0, END)
dpf_entry.delete(0, END)
age_entry.delete(0, END)
But I got this error:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
If I uncomment the line X_test = np.array(X_test).reshape(-1, 1), this error appears instead:
File "/anaconda2/lib/python2.7/site-packages/sklearn/linear_model/base.py", line 305, in decision_function
% (X.shape[1], n_features))
ValueError: X has 1 features per sample; expecting 8

You have to pass it as
X_test = np.array(X_test).reshape(1, -1)
or you can directly do
y_pred = model.predict([X_test])
The reason is that predict expects a 2D array of shape (n_samples, n_features). When you have only one record to predict, wrap it in a list (a list of lists) and pass that. Hope it helps.
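A minimal sketch of both options, assuming synthetic data in place of diabetes.csv (the feature values and the label rule below are made up for illustration, and new_record is a hypothetical name):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the diabetes data: 200 samples, 8 features.
rng = np.random.RandomState(42)
X = rng.rand(200, 8)
y = (X[:, 1] > 0.5).astype(int)  # made-up binary label

model = LogisticRegression()
model.fit(X, y)

# One new record as a flat list of 8 feature values.
new_record = [0.1, 0.9, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]

# Option 1: reshape the 1D array to shape (1, n_features).
as_row = np.array(new_record).reshape(1, -1)
pred_reshaped = model.predict(as_row)

# Option 2: wrap the flat list in another list (a list of lists).
pred_wrapped = model.predict([new_record])

print(as_row.shape)                  # one sample, eight features
print(pred_reshaped, pred_wrapped)   # both are arrays holding one label
```

Either way, predict returns an array with one element, so the label itself is pred_reshaped[0].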

Related

getting shape errors for .score method from sklearn

df = pd.read_csv('../input/etu-ai-club-competition-2/train.csv')
df.shape
(750000,77)
X = df.drop(columns = 'Target')
y = df['Target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
model = MLPRegressor(hidden_layer_sizes = 60, activation = "relu", solver = "adam")
model
model.fit(X_train, y_train)
pr = model.predict(X_test)
pr.shape
(187500,)
model.score(y_test, pr)
ValueError: Expected 2D array, got 1D array instead:
array=[-120.79511811 -394.11307519 -449.59524477 ... -432.46130084 -492.81440014
-753.02016315].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
I just started getting into ML. I don't really understand why I need a 2D array to get the score, or how to convert mine into one. I tried reshaping it as the error suggests, but then I get ValueError: X has 1 features, but MLPRegressor is expecting 76 features as input. and ValueError: X has 187500 features, but MLPRegressor is expecting 76 features as input. when reshaping to (-1, 1) and (1, -1) respectively.
The correct way to call the score method would be:
model.score(X_test, y_test)
Internally, score first computes the predictions from X_test and then compares them with y_test using a scoring function (R^2 for regressors).
If you want to pass the predictions directly, you need to use one of the scoring functions in the sklearn.metrics module, as explained here:
https://scikit-learn.org/0.15/modules/model_evaluation.html
Note: you might also want to have a look at the example code in the MLPRegressor documentation:
https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html
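A small sketch of the two routes, assuming synthetic data and swapping in LinearRegression for MLPRegressor to keep the example fast (score behaves the same way for scikit-learn regressors, returning R^2):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data: 400 samples, 5 features (made up for illustration).
rng = np.random.RandomState(0)
X = rng.rand(400, 5)
y = X @ np.array([1.0, -2.0, 0.5, 3.0, 0.0]) + 0.1 * rng.randn(400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pr = model.predict(X_test)

# Route 1: score takes the test features and the true targets.
score_from_model = model.score(X_test, y_test)

# Route 2: a metric function takes the true targets and the predictions.
score_from_metric = r2_score(y_test, pr)

print(score_from_model, score_from_metric)
```

The two numbers agree, because score for a regressor computes the same R^2 from its own predictions.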

sklearn linear_model LinearRegression, ValueError: Expected 2D array, got 1D array instead

I am trying to fit data to my model.
This is the data:
le = sklearn.preprocessing.LabelEncoder()
date = le.fit_transform(list(data["Date"]))
_open = le.fit_transform(list(data["Open"]))
high = le.fit_transform(list(data["High"]))
low = le.fit_transform(list(data["Low"]))
adj_close = le.fit_transform(list(data["Adj Close"]))
volume = le.fit_transform(list(data["Volume"]))
X = list(date)
y = list(zip(high, low, _open, adj_close, volume))
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(X, y, test_size=0.1)
But when I try to fit the data into the model as displayed below
linear = sklearn.linear_model.LinearRegression()
linear.fit(x_train, y_train)
I get this error
ValueError: Expected 2D array, got 1D array instead:
array=[2088 311 1839 ... 2422 64 1705].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or
array.reshape(1, -1) if it contains a single sample.
Thanks
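Try this:
# train_test_split returns lists when given lists, so convert to NumPy arrays first (needs import numpy as np)
x_train = np.array(x_train).reshape(-1, 1)
x_test = np.array(x_test).reshape(-1, 1)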
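As a rough illustration of why the column reshape is needed, here is a sketch with a tiny made-up dataset (x_list and y_list are hypothetical stand-ins for the encoded date and the zipped targets):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# A single feature stored as a flat list, and two targets per sample.
x_list = [0, 1, 2, 3, 4, 5]
y_list = [(2 * i, 3 * i + 1) for i in x_list]

# reshape(-1, 1) turns the flat list into one column: shape (n_samples, 1).
x_col = np.array(x_list).reshape(-1, 1)

linear = LinearRegression()
linear.fit(x_col, y_list)

# New inputs need the same (n_samples, 1) layout.
pred = linear.predict(np.array([6]).reshape(-1, 1))

print(x_col.shape)
print(pred.shape)
```

Here -1 lets NumPy infer the number of rows, and 1 says there is a single feature per row.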

Python - Predicting test data that is smaller than train data

I have preprocessed some data to train a Multinomial Naive Bayes classifier. The train data is 80% of my data and the test data is 20%.
The train data is an array of size 8452 and the test data is an array of size 4231.
If I want to see the predictions for the train data, the following code runs just fine:
multiNB = MultinomialNB()
model = multiNB.fit(x_train, y_train)
y_preds = model.predict(x_train)
but if I want to predict on my test data, i.e.
y_preds = model.predict(x_test)
I get the following error:
ValueError: matmul: Input operand 1 has a mismatch in its core dimension 0,
with gufunc signature (n?,k),(k,m?)->(n?,m?) (size 8452 is different from 4231)
If I need to provide more information about my code please ask, but I am stuck here and I do not really understand what is causing that error, and any help is welcomed.
This is how I obtained my train-test sets:
total_count = len(tokenised_reviews)
split = int(total_count * 0.8)
shuffle = np.random.permutation(total_count)
x = []
y = []
for i in range(total_count):
x.append(x_data[shuffle[i]])
y.append(y_data[shuffle[i]])
x_train = x[:split]
x_test = x[split:]
y_train = y[:split]
y_test = y[split:]
This is too long to type as a comment. I got a very weird structure when I tried your code, and since I have no idea what x_data is, it is hard to say exactly what causes the error.
I suspect something went wrong when putting the data back into a list, so if you do this:
# assumes x_data and y_data are NumPy arrays, so they can be indexed with an index array
total_count = len(x_data)
split = int(total_count * 0.8)
shuffle = np.random.permutation(total_count)
x_train = x_data[shuffle[:split]]
x_test = x_data[shuffle[split:]]
y_train = y_data[shuffle[:split]]
y_test = y_data[shuffle[split:]]
You should get your x_train and x_test as a subset of the original data.
Or you can simply do:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x_data, y_data, test_size=0.2)
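A minimal sketch of the index-based split, assuming x_data is a 2D NumPy array with one row per sample (the toy arrays below are made up). The point to check is that the train and test sets end up with the same number of columns, since predict fails when the feature counts differ:

```python
import numpy as np

# Toy data: 10 samples, 5 features each, plus one label per sample.
x_data = np.arange(50).reshape(10, 5)
y_data = np.arange(10)

rng = np.random.RandomState(0)
shuffle = rng.permutation(len(x_data))
split = int(len(x_data) * 0.8)

# First 80% of the shuffled indices for training, the rest for testing.
train_idx, test_idx = shuffle[:split], shuffle[split:]
x_train, x_test = x_data[train_idx], x_data[test_idx]
y_train, y_test = y_data[train_idx], y_data[test_idx]

# Rows differ between the sets, columns (features) must match.
print(x_train.shape, x_test.shape)
```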

How to use the predict() function for a GLMGam model with BSplines (statsmodels API, Python)?

I have a dataset of 25544 observations and 7 explanatory variables, which I split into a train set and a test set. Then I fit a GLMGam model with BSplines on the train set.
y = dfop[['RATIO_OPENING']]
X = dfop.loc[:, ~dfop.columns.isin(['MED_RATIO_OPENING','RATIO_OPENING','OD_UNDIR_CITY_PAIR','MONTH'])]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
x_spline = X_train[['DISTANCE', 'CITY_POP_A','CITY_POP_B','A_GDP_PPP_1990_2015_5arcmin','A_HDI_1990_2015','B_GDP_PPP_1990_2015_5arcmin','B_HDI_1990_2015']]
bs = BSplines(x_spline, df=[3,3,3,3,3,3,3], degree=[2,2,2,2,2,2,2])
poisson = GLMGam(y_train, x_spline, smoother=bs, family=sm.families.Poisson())
poisson_fit = poisson.fit()
I want to predict the dependent variable on the test set.
X_test = X_test[['DISTANCE', 'CITY_POP_A','CITY_POP_B','A_GDP_PPP_1990_2015_5arcmin','A_HDI_1990_2015','B_GDP_PPP_1990_2015_5arcmin','B_HDI_1990_2015']]
results = poisson_fit.predict(exog=X_test, transform=True)
The last line returns the following error.
ValueError: shapes (6386,7) and (21,) not aligned: 7 (dim 1) != 21 (dim 0)
What is the correct syntax for the prediction?

How can I predict on the trained SVR model and resolve error Value Error: X.shape[1] = 1 should be equal to 22

I have a dataset with more than 2000 rows and 23 columns, including the age column. I have completed all of the steps for fitting an SVR. Now I want to make predictions with the trained model, which is where I pass X_test to it, but I get this error:
ValueError: X.shape[1] = 1 should be equal to 22, the number of features at training time
How can I resolve this problem? How should I write the code for making predictions with the trained SVR model?
import pandas as pd
import numpy as np
from sklearn.svm import SVR
# Make fake dataset
dataset = pd.DataFrame(data= np.random.rand(2000,22))
dataset['age'] = np.random.randint(2, size=2000)
# Separate the target from the other features
target = dataset['age']
data = dataset.drop('age', axis = 1)
X_train, y_train = data.loc[:1000], target.loc[:1000]
X_test, y_test = data.loc[1001], target.loc[1001]
X_test = np.array(X_test).reshape((len(X_test), 1))
print(X_test.shape)
SupportVectorRefModel = SVR()
SupportVectorRefModel.fit(X_train, y_train)
y_pred = SupportVectorRefModel.predict(X_test)
Output:
ValueError: X.shape[1] = 1 should be equal to 22, the number of features at training time
Your reshaping of X_test is not correct; it should be:
X_test = np.array(X_test).reshape(1, -1)
print(X_test.shape)
# (1, 22)
With that change, the rest of your code runs OK:
y_pred = SupportVectorRefModel.predict(X_test)
y_pred
# array([0.90156667])
UPDATE
In the case as you show it in your code, obviously X_test consists of one single sample, as defined here:
X_test, y_test = data.loc[1001], target.loc[1001]
But if (as I suspect) this is not what you actually want, but in fact you want the rest of your data as your test set, you should change the definition to:
X_test, y_test = data.loc[1001:], target.loc[1001:]
X_test.shape
# (999, 22)
and without any reshaping
y_pred = SupportVectorRefModel.predict(X_test)
y_pred.shape
# (999,)
i.e. a y_pred of 999 predictions.
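The difference between the two selections can be sketched as follows, assuming a smaller made-up dataset than in the question. Selecting a single label with .loc gives a 1D Series, while a list of labels or a slice keeps the 2D layout that predict needs:

```python
import numpy as np
import pandas as pd
from sklearn.svm import SVR

# Made-up data: 200 rows, 22 feature columns, and a random target.
rng = np.random.RandomState(0)
data = pd.DataFrame(rng.rand(200, 22))
target = pd.Series(rng.rand(200))

X_train, y_train = data.loc[:99], target.loc[:99]
model = SVR().fit(X_train, y_train)

single_series = data.loc[150]    # one label: 1D Series, needs reshape(1, -1)
single_frame = data.loc[[150]]   # list with one label: stays 2D
rest = data.loc[100:]            # slice: stays 2D

pred_single = model.predict(single_frame)
pred_rest = model.predict(rest)

print(single_series.shape, single_frame.shape, rest.shape)
print(pred_single.shape, pred_rest.shape)
```

Using data.loc[[150]] is an alternative to reshaping by hand when only one row is wanted.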
