i am trying to use my machine learning model on dataset where i have only two columns while standard scaling them,i got the error expected 2D array but got 1 .
Below is the code:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Splitting the dataset into the Training set and Test set
"""from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)"""
# Feature Scaling
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
sc_y = StandardScaler()
X = sc_X.fit_transform(X)
y = sc_y.fit_transform(y)
# Fitting SVR to the dataset
from sklearn.svm import SVR
regressor = SVR(kernel = 'rbf')
regressor.fit(X, y)
# Predicting a new result
y_pred = regressor.predict(6.5)
y_pred = sc_y.inverse_transform(y_pred)
# Visualising the SVR results
plt.scatter(X, y, color = 'red')
plt.plot(X, regressor.predict(X), color = 'blue')
plt.title('Truth or Bluff (SVR)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
when i try to put
y = sc_y.fit_transform([y])
like this i received no error but when i execute next 3 lines i receive another error.
which is bad input shape (1, 10)
can anyone help me on this?
The StandardScaler() function in sklearn expects the input(X) to be in the following format:
X : numpy array of shape [n_samples, n_features]
So, reshape X to (-1,1) if you have only one feature column.
sc_X.fit_transform(X.reshape[-1,1])
This should work!
Related
I am trying to train my data for forecasting by support vector regression. I exclude time series before doing regression because time is not an input. But I need time data in plotting and should be in order. When it comes to plotting, I am having a problem with getting actual data time index values. I need actual corresponding time series for y_test and y_pred. When I tried to get original datetime index, plotting is not correct and not in the date order corresponding with the y series.
The output should be time(in order such as from 01/01/2021 to 31/12/2021) vs y_pred and y_test.
Here is my dataset: https://github.com/ozgurylc/Dataset
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
dataset = pd.read_csv('Combined_MET_PV_data.csv')
# takes necessary columns
df = dataset[['referenceTime', 'dew_point_temp', 'air_temp', 'relative_humidity',
'irradiance', 'wind_speed', 'wind_category',
'hour_harmonic', 'AC_Power_IV2']]
print(df)
X = df.iloc[:, :-1].values # does not take Power
y = df.iloc[:, -1].values # only takes Power
print(X)
print(y)
print(X.shape, y.shape)
y = np.reshape(y, (-1,1))
print(y.shape)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
print("Train Shape: {} {} \nTest Shape: {} {}".format(X_train.shape, y_train.shape,
X_test.shape, y_test.shape))
X_train = X_train[:, 1:] # excludes referenceTime from X_train
X_test1 = X_test[:, 1:] #excludes referenceTime fron X_test
print(X_test[:, 0].tolist()) # this keeps referenceTime
print(X_test)
Here is where regression is done:
# scaling
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test1 = sc_X.transform(X_test1)
sc_y = StandardScaler()
y_train = sc_y.fit_transform(y_train)
y_test = sc_y.transform(y_test)
y_train = y_train.reshape((-1,))
svr_linear = svm.SVR(kernel='rbf')
svr_linear.fit(X_train, y_train)
print(svr_linear.score(X_test1, y_test))
y_pred = svr_linear.predict(X_test1)
print(y_pred)
# in the following code X_test[:,0] where time index is kept.
plot_1 = plt.plot(X_test[:, 0], y_test, color='red', linewidth=2)
plot_2 = plt.plot(X_test[:, 0], y_pred, color='blue', linewidth=2, linestyle='--')
plt.show()
I study support vector regression but I faced a problem: my r2 score becomes negative. Is that normal or is there any changeable part in my code to fix this?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVR
df = pd.read_csv('Position_Salaries.csv')
df.head()
X = df.iloc[:, 1:2].values
y = df.iloc[:, -1].values
from sklearn.preprocessing import StandardScaler
y = y.reshape(len(y),1)
x_scaler = StandardScaler()
y_scaler = StandardScaler()
X = x_scaler.fit_transform(X)
y = y_scaler.fit_transform(y)
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 42)
regressor = SVR(kernel="rbf")
regressor.fit(x_train,y_train.ravel())
y_pred = y_scaler.inverse_transform(regressor.predict(x_scaler.transform(x_test)))
from sklearn.metrics import r2_score
r2_score(y_scaler.inverse_transform(y_test), y_pred)
My output is -0.5313206322807349
In this part, your X is in scaled version
X = x_scaler.fit_transform(X)
In this part, your x_test also in scaled version
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.4, random_state = 42)
When creating prediction, you shouldn't transform your input again since your x_test already in scaled version
y_pred = y_scaler.inverse_transform(regressor.predict(x_scaler.transform(x_test)))
From the documentation of sklearn.metrics.r2_score.
Best possible score is 1.0 and it can be negative (because the model
can be arbitrarily worse). A constant model that always predicts the
expected value of y, disregarding the input features, would get a R^2
score of 0.0.
Per documentation:
Best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse)
I am trying to write the code for K-NN
Below is my code. - I know that issue is in `predict() but I am not able to figure out how o fix it.
# Importing the libraries
import numpy as np
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('UniversalBank.csv')
X = dataset.iloc[:,[ 1,2,3,5,6,7,8,10,11,12,13]].values #,
y = dataset.iloc[:,9].values
#Splitting the dataset to training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state= 0)
#Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
#Fitting the classifier to training set
from sklearn.neighbors import KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train,y_train)
#Predicting the test results
y_pred = classifier.predict(X_test)
I am trying to plot an 8x8 correlation matrix between the different feature scores and the corresponding chances of admit. May I know how I am supposed to do so?
import tensorflow as tf
import numpy as np
import pylab as plt
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
admit_data = np.genfromtxt('admission_predict.csv', delimiter= ',')
X_data, Y_data = admit_data[1:,1:8], admit_data[1:,-1]
x_train, x_test, y_train_, y_test_ = train_test_split(
X_data,
Y_data,
test_size=0.3,
random_state=42
)
scaler = preprocessing.StandardScaler()
x_train = scaler.fit_transform(x_train)
x_test = scaler.fit_transform(x_test)
y_train = y_train_.reshape(len(y_train_), no_labels)
y_test = y_test_.reshape(len(y_test_), no_labels)
data = admit_data
df = pd.DataFrame(data, columns = ['Serial No.','GRE Score','TOEFL Score','University Rating','SOP','LOR','CGPA','Research','Chance of Admit'])
df.corr()
This is the code I'm reading now and my file looks like this
Please help me plot a 8x8 correlation matrix as of now my code doesn't return a 8x8 correlation matrix
What about
import matplotlib.pyplot as plt
cors = df.corr()
plt.matshow(cors)
plt.yticks(range(cors.shape[1]), cors.columns, fontsize=7)
plt.xticks(range(cors.shape[1]), cors.columns, fontsize=7, rotation=90)
plt.colorbar()
to use all except "Serial No" column use this cors instead:
cors = df.drop("Serial No.", axis=1).corr()
So I'm a newbie to Machine Learning and a little baffled by this error:
Shapes (1,4) and (14,14) not aligned: 4 (dim 1) != 14 (dim 0)
Here is the full error:
File "/Users/jim/anaconda3/lib/python3.6/site-packages/sklearn/utils/extmath.py", line 140, in safe_sparse_dot
return np.dot(a, b)
ValueError: shapes (1,4) and (14,14) not aligned: 4 (dim 1) != 14 (dim 0)
My test set has 4 rows of data and training set 14 rows of data, as indicated by (1,4) and (14,14). At least I think that's what that means.
I'm trying to fit a simple linear regression to a Training set as indicated by my code below:
# Fit Simple Linear Regression to Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X_train = X_train.reshape(1,-1)
y_train = y_train.reshape(1,-1)
regressor.fit(X_train, y_train)
Then predict the Test Set Results:
# Predicting the Test Set Results
X_test = X_test.reshape(1,-1)
y_pred = regressor.predict(X_test)
My code is failing on the last line with the above error:
y_pred = regressor.predict(X_test)
Any hints in the right direction would be great.
Here is my whole code sample:
# Simple Linear Regression
# Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Import dataset
dataset = pd.read_csv('NBA.csv')
X = dataset.iloc[:, 1].values
y = dataset.iloc[:, :-1].values
# Splitting the dataset into Train and Test
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling
# None
# Fit Simple Linear Regression to Training Set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
X_train = X_train.reshape(1,-1)
y_train = y_train.reshape(1,-1)
regressor.fit(X_train, y_train)
# Predicting the Test Set Results
X_test = X_test.reshape(1,-1)
y_pred = regressor.predict(X_test)
** EDIT **
I checked the shape of X and y. Here is my output below:
dataset = pd.read_csv('NBA.csv')
X = dataset.iloc[:, 1].values
y = dataset.iloc[:, :-1].values
print(X.shape)
print(y.shape)
-->(18,)
-->(18, 1)
Please replace reshape(1,-1) to reshape(-1, 1) for all usages. The former transforms an array into (1 person x n features) and the latter does (n persons x 1 feature). feature is hight, in this case.
If you modified import section as below, there is no need to reshape the array since their shapes are already satisfy the form of (n persons x 1 feature).
# Import dataset
dataset = pd.read_csv('NBA.csv')
X = dataset.iloc[:, 1].values
y = dataset.iloc[:, 0].values
X = X.reshape(-1, 1)
y = y.reshape(-1, 1)
In an early age of the sklearn, you can feed vector as inputs. But recently it has changed and now you need to explicitly indicate whether the vector is (1 sample x n features) or (n samples x 1 feature) by using reshape or some other methods.