Error: Shapes (1,4) and (14,14) not aligned - python

So I'm a newbie to Machine Learning and a little baffled by this error:
Shapes (1,4) and (14,14) not aligned: 4 (dim 1) != 14 (dim 0)
Here is the full error:
File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/extmath.py", line 140, in safe_sparse_dot
return np.dot(a, b)
ValueError: shapes (1,4) and (14,14) not aligned: 4 (dim 1) != 14 (dim 0)
My test set has 4 rows of data and training set 14 rows of data, as indicated by (1,4) and (14,14). At least I think that's what that means.
I'm trying to fit a simple linear regression to a Training set as indicated by my code below:
# Fit Simple Linear Regression to Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X_train = X_train.reshape(1,-1)
y_train = y_train.reshape(1,-1)
regressor.fit(X_train, y_train)
Then predict the Test Set Results:
# Predicting the Test Set Results
X_test = X_test.reshape(1,-1)
y_pred = regressor.predict(X_test)
My code is failing on the last line with the above error:
y_pred = regressor.predict(X_test)
Any hints in the right direction would be great.
Here is my whole code sample:
# Simple Linear Regression
# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Import dataset
dataset = pd.read_csv('NBA.csv')
X = dataset.iloc[:, 1].values
y = dataset.iloc[:, :-1].values
# Splitting the dataset into Train and Test
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
# None
# Fit Simple Linear Regression to Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X_train = X_train.reshape(1,-1)
y_train = y_train.reshape(1,-1)
regressor.fit(X_train, y_train)
# Predicting the Test Set Results
X_test = X_test.reshape(1,-1)
y_pred = regressor.predict(X_test)
** EDIT **
I checked the shape of X and y. Here is my output below:
dataset = pd.read_csv('NBA.csv')
X = dataset.iloc[:, 1].values
y = dataset.iloc[:, :-1].values
print(X.shape)
print(y.shape)
-->(18,)
-->(18, 1)

Please replace reshape(1,-1) to reshape(-1, 1) for all usages. The former transforms an array into (1 person x n features) and the latter does (n persons x 1 feature). feature is hight, in this case.
If you modified import section as below, there is no need to reshape the array since their shapes are already satisfy the form of (n persons x 1 feature).
# Import dataset
dataset = pd.read_csv('NBA.csv')
X = dataset.iloc[:, 1].values
y = dataset.iloc[:, 0].values
X = X.reshape(-1, 1)
y = y.reshape(-1, 1)
In an early age of the sklearn, you can feed vector as inputs. But recently it has changed and now you need to explicitly indicate whether the vector is (1 sample x n features) or (n samples x 1 feature) by using reshape or some other methods.

Related

Trying to train simple linear regression algorithm. Keep getting error [duplicate]

While practicing Simple Linear Regression Model I got this error,
I think there is something wrong with my data set.
Here is my data set:
Here is independent variable X:
Here is dependent variable Y:
Here is X_train
Here Is Y_train
This is error body:
ValueError: Expected 2D array, got 1D array instead:
array=[ 7. 8.4 10.1 6.5 6.9 7.9 5.8 7.4 9.3 10.3 7.3 8.1].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
And this is My code:
import pandas as pd
import matplotlib as pt
#import data set
dataset = pd.read_csv('Sample-data-sets-for-linear-regression1.csv')
x = dataset.iloc[:, 1].values
y = dataset.iloc[:, 2].values
#Spliting the dataset into Training set and Test Set
from sklearn.cross_validation import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= 0.2, random_state=0)
#linnear Regression
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train,y_train)
y_pred = regressor.predict(x_test)
Thank you
You need to give both the fit and predict methods 2D arrays. Your x_train and x_test are currently only 1 dimensional. What is suggested by the console should work:
x_train= x_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)
This uses numpy's reshape to transform your array. For example, x = [1, 2, 3] wopuld be transformed to a matrix x' = [[1], [2], [3]] (-1 gives the x dimension of the matrix, inferred from the length of the array and remaining dimensions, 1 is the y dimension - giving us a n x 1 matrix where n is the input length).
Questions about reshape have been answered in the past, this for example should answer what reshape(-1,1) fully means: What does -1 mean in numpy reshape? (also some of the other below answers explain this very well too)
A lot of times when doing linear regression problems, people like to envision this graph
On the input, we have an X of X = [1,2,3,4,5]
However, many regression problems have multidimensional inputs. Consider the prediction of housing prices. It's not one attribute that determines housing prices. It's multiple features (ex: number of rooms, location, etc. )
If you look at the documentation you will see this
It tells us that rows consist of the samples while the columns consist of the features.
However, consider what happens when he have one feature as our input. Then we need an n x 1 dimensional input where n is the number of samples and the 1 column represents our only feature.
Why does the array.reshape(-1, 1) suggestion work? -1 means choose a number of rows that works based on the number of columns provided. See the image for how it changes in the input.
If you look at documentation of LinearRegression of scikit-learn.
fit(X, y, sample_weight=None)
X : numpy array or sparse matrix of shape [n_samples,n_features]
predict(X)
X : {array-like, sparse matrix}, shape = (n_samples, n_features)
As you can see X has 2 dimensions, where as, your x_train and x_test clearly have one.
As suggested, add:
x_train = x_train.reshape(-1, 1)
x_test = x_test.reshape(-1, 1)
Before fitting and predicting the model.
Use
y_pred = regressor.predict([[x_test]])
I would suggest to reshape X at the beginning before you do the split into train and test dataset:
import pandas as pd
import matplotlib as pt
#import data set
dataset = pd.read_csv('Sample-data-sets-for-linear-regression1.csv')
x = dataset.iloc[:, 1].values
y = dataset.iloc[:, 2].values
# Here is the trick
x = x.reshape(-1,1)
#Spliting the dataset into Training set and Test Set
from sklearn.cross_validation import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size= 0.2, random_state=0)
#linnear Regression
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(x_train,y_train)
y_pred = regressor.predict(x_test)
This is what I use
X_train = X_train.values.reshape(-1, 1)
y_train = y_train.values.reshape(-1, 1)
X_test = X_test.values.reshape(-1, 1)
y_test = y_test.values.reshape(-1, 1)
This is the solution
regressor.predict([[x_test]])
And for polynomial regression:
regressor_2.predict(poly_reg.fit_transform([[x_test]]))
Modify
regressor.fit(x_train,y_train)
y_pred = regressor.predict(x_test)
to
regressor.fit(x_train.values.reshape(-1,1),y_train)
y_pred = regressor.predict(x_test.values.reshape(-1,1))

after using logistics regression i got ValueError: y should be a 1d array, got an array of shape (295, 7) instead

#splitting the dataset into dependent(y) and independent variable(x)
x = training_data.iloc[:,[0,2,3,4,5,6,7]].values
y = training_data.iloc[:,1].values
from sklearn.model_selection import train_test_split
x_train,y_train,x_test,y_test = train_test_split(x,y,test_size = 0.3,random_state = 0)
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(x_train,y_train)
i am trying to use logistic regression to train independent(x_train) and dependent variable(y_train) but everytime i run the code i see error
ValueError: y should be a 1d array, got an array of shape (295, 7) instead.
i don't know what to do
You have an error when making the train_test_split.
Be aware of output variables order, the correct output is like below:
X_train, X_test, y_train, y_test = train_test_split(x,y,test_size = 0.3,random_state=0)
Just changing this line, your problem should disappear.

Shapes not aligned when fitting polynomial regression

Long time listener, first time caller...
I know a similar question has been answered in the past (see here for other thread I have referenced), but I am still having difficulties. How can I get my regression to fit? My code is below:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
#data
np.random.seed(0)
n = 15
x = np.linspace(0,10,n) + np.random.randn(n)/5
y = np.sin(x)+x/6 + np.random.randn(n)/10
X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=0)
#regression fitting
X_predict_input = np.linspace(0,10,100).reshape(-1,1)
y_train = y_train.reshape((-1,1))
X_train = X_train.reshape((-1,1))
#looping through different degree values
for i, degree in enumerate([1,3,6,9]):
poly = PolynomialFeatures(degree=degree)
X_train_poly = poly.fit_transform(X_train)
linreg = LinearRegression().fit(X_train_poly, y_train)
result[i,:] = linreg.predict(X_predict_input)
I tried to fix the shaping issues with X_train and y_train, but after looking into each shape, I am thinking that the X_train_poly is what is driving this error...
X_train shape: (11, 1)
y_train shape: (11, 1)
X_train_poly shape: (11, 10)
Respective error message:
ValueError: shapes (100,1) and (2,1) not aligned: 1 (dim 1) != 2 (dim 0)
When I try to address the shape inconsistencies in X_train_poly by the following...
X_train_poly = poly.fit_transform(X_train).reshape((-1,1))
...I receive this error:
ValueError: Found input variables with inconsistent numbers of samples: [22, 11]
I have spent an embarrassing amount of time on this, so any insight at all would be greatly appreciated!
Thank you in advance :)
I think the problem is quite simple. You're using the PolynomialFeatures transform to generate features for the training data but when it comes to prediction, you're not applying the same transform to the input data.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
# data
np.random.seed(0)
n = 15
x = np.linspace(0, 10, n) + np.random.randn(n)/5
y = np.sin(x) + x/6 + np.random.randn(n)/10
X_train, X_test, y_train, y_test = train_test_split(x.reshape((-1, 1)),
y.reshape((-1, 1)),
random_state=0)
# Check data matrices are in columns
assert(X_train.shape == (11, 1))
assert(y_train.shape == (11, 1))
# Build library of polynomial features
degree = 3
poly = PolynomialFeatures(degree)
X_train_poly = poly.fit_transform(X_train)
assert(X_train_poly.shape == (11, 4))
# Fit model
linreg = LinearRegression().fit(X_train_poly, y_train)
# Make prediction
X_predict = np.linspace(0, 10, 100).reshape(-1, 1)
X_predict_poly = poly.fit_transform(X_predict)
y_predict = linreg.predict(X_predict_poly)
assert(y_predict.shape == X_predict.shape)
Update:
To avoid the inconvenience of having to apply the transform every time you make a prediction, you might want to check out sklearn.Pipeline:
# Using a pipeline to automate the input transformation
from sklearn.pipeline import Pipeline
poly = PolynomialFeatures(degree)
model = LinearRegression()
pipeline = Pipeline(steps=[('t', poly), ('m', model)])
linreg = pipeline.fit(X_train, y_train)
y_predict2 = linreg.predict(X_predict)
assert(np.array_equal(y_predict, y_predict2))

How can I predict on the trained SVR model and resolve error Value Error: X.shape[1] = 1 should be equal to 22

I have datasets that have more than 2000 rows and 23 columns including the age column. I have completed all of the processes for SVR. Now I want to predict the trained SVR model is where I need to input X_test to the model? Have faced an error that is
ValueError: X.shape[1] = 1 should be equal to 22, the number of features at training time
How may I resolve this problem? How may I write code for making predictions on the trained SVR model?
import pandas as pd
import numpy as np
# Make fake dataset
dataset = pd.DataFrame(data= np.random.rand(2000,22))
dataset['age'] = np.random.randint(2, size=2000)
# Separate the target from the other features
target = dataset['age']
data = dataset.drop('age', axis = 1)
X_train, y_train = data.loc[:1000], target.loc[:1000]
X_test, y_test = data.loc[1001], target.loc[1001]
X_test = np.array(X_test).reshape((len(X_test), 1))
print(X_test.shape)
SupportVectorRefModel = SVR()
SupportVectorRefModel.fit(X_train, y_train)
y_pred = SupportVectorRefModel.predict(X_test)
Output:
ValueError: X.shape[1] = 1 should be equal to 22, the number of features at training time
Your reshaping of X_test is not correct; it should be:
X_test = np.array(X_test).reshape(1, -1)
print(X_test.shape)
# (1, 22)
With that change, the rest of your code runs OK:
y_pred = SupportVectorRefModel.predict(X_test)
y_pred
# array([0.90156667])
UPDATE
In the case as you show it in your code, obviously X_test consists of one single sample, as defined here:
X_test, y_test = data.loc[1001], target.loc[1001]
But if (as I suspect) this is not what you actually want, but in fact you want the rest of your data as your test set, you should change the definition to:
X_test, y_test = data.loc[1001:], target.loc[1001:]
X_test.shape
# (999, 22)
and without any reshaping
y_pred = SupportVectorRefModel.predict(X_test)
y_pred.shape
# (999,)
i.e. a y_pred of 999 predictions.

Expected 2D array error not getting resolved

i am trying to use my machine learning model on dataset where i have only two columns while standard scaling them,i got the error expected 2D array but got 1 .
Below is the code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Splitting the dataset into the Training set and Test set
"""from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"""
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)
# Fitting SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)
# Predicting a new result
y_pred = regressor.predict(6.5)
y_pred = sc_y.inverse_transform(y_pred)
# Visualising the SVR results
plt.scatter(X, y, color = 'red')
plt.plot(X, regressor.predict(X), color = 'blue')
plt.title('Truth or Bluff (SVR)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
when i try to put
y = sc_y.fit_transform([y])
like this i received no error but when i execute next 3 lines i receive another error.
which is bad input shape (1, 10)
can anyone help me on this?
The StandardScaler() function in sklearn expects the input(X) to be in the following format:
X : numpy array of shape [n_samples, n_features]
So, reshape X to (-1,1) if you have only one feature column.
sc_X.fit_transform(X.reshape[-1,1])
This should work!

Categories