I am trying to build a model for an application.
I have used both Ridge regression and SVR from sklearn, and they seem to give different results even though I tried to keep the parameters the same.
I set the regularization parameter to 1 in both models (they both use L2 regularization).
There is an extra parameter for the poly kernel (coef0), which I set to zero.
The data are standardized.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn import svm

# Ridge: squared loss with an L2 penalty (alpha)
linear_ridge = Ridge(alpha=1.0)
linear_ridge.fit(np.array(X_train), np.array(y_train))

# SVR with a degree-1 poly kernel: epsilon-insensitive loss, L2 regularization via C
model_SVR_poly = svm.SVR(kernel='poly', coef0=0.0, degree=1, C=1.0, epsilon=0.1)
model_SVR_poly.fit(np.array(X_train), np.array(y_train))
Linear_ridge_pred = linear_ridge.predict(test_data[start_data:]) * Y_std[0] + Y_mean[0]
svr_poly_pred = model_SVR_poly.predict(test_data[start_data:]) * Y_std[0] + Y_mean[0]
If the value of epsilon is decreased to 0.0, the SVR undershoots even more than the Ridge, and if it is increased, it overshoots more.
In the testing phase, the Ridge seems to undershoot while the SVR seems to overshoot.
What is the difference between the two implementations, in my case or in general?
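To make the comparison concrete, here is a minimal, self-contained sketch (synthetic standardized data standing in for my real X_train / y_train) that fits both models with epsilon set to 0, so that the epsilon-insensitive zone is removed and the remaining differences are the loss function (absolute vs squared error) and how the regularization strength enters (C multiplies the loss in SVR, alpha multiplies the penalty in Ridge):
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVR

# synthetic standardized data standing in for my real X_train / y_train
rng = np.random.RandomState(0)
X_demo = rng.randn(200, 3)
y_demo = X_demo @ np.array([0.5, -0.2, 0.1]) + 0.05 * rng.randn(200)

ridge = Ridge(alpha=1.0).fit(X_demo, y_demo)                            # squared loss + alpha * L2 penalty
svr_lin = SVR(kernel='linear', C=1.0, epsilon=0.0).fit(X_demo, y_demo)  # absolute loss scaled by C + L2 penalty

print("Ridge coefficients:", ridge.coef_)
print("SVR coefficients:  ", svr_lin.coef_.ravel())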
Indeed, there are some differences between the implementations of Ridge() and SVR(), as you are pointing out.
On one side, there's a difference in the loss function, as you can see here (epsilon-insensitive loss and squared epsilon-insensitive loss) vs here (Ridge loss, i.e. squared error plus an L2 penalty). This is also emphasized in this example from the sklearn documentation, which however compares Kernel Ridge Regression and SVR with a non-linear kernel.
In addition to this, the fact that you're using SVR with a polynomial kernel of degree 1 adds a further difference: as you can see here and here (SVR is built on top of the LibSVM library), there's a further parameter (gamma) to be considered (you might set it equal to 1 for convenience; it equals 'scale' by default).
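To make the gamma point concrete: the polynomial kernel used by LibSVM is (gamma * <x, x'> + coef0)^degree, so with degree=1 and coef0=0 it reduces to gamma * <x, x'>, and gamma=1 makes it coincide with the plain linear kernel. A quick sketch on toy data (not your data) to verify this:
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X_toy = np.sort(5 * rng.rand(40, 1), axis=0)
y_toy = np.sin(X_toy).ravel()

# degree-1 poly kernel with gamma=1 and coef0=0 is exactly the linear kernel
svr_poly1 = SVR(kernel='poly', degree=1, gamma=1, coef0=0.0, C=1.0, epsilon=0.1).fit(X_toy, y_toy)
svr_lin = SVR(kernel='linear', C=1.0, epsilon=0.1).fit(X_toy, y_toy)

# should print True (up to solver tolerance)
print(np.allclose(svr_poly1.predict(X_toy), svr_lin.predict(X_toy), atol=1e-3))
With the default gamma='scale' (1 / (n_features * X.var())), the degree-1 poly kernel becomes a rescaled linear kernel, which alone can shift the fit relative to Ridge.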
Here is the difference in fitting that I could get by adjusting this toy example (with non-tuned parameters). I've also tried to consider LinearSVR(), which has some further differences with respect to SVR(), as you can see e.g. here or here.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import LinearSVR, SVR
import matplotlib.pyplot as plt
np.random.seed(42)
# #############################################################################
# Generate sample data
X = np.sort(5 * np.random.rand(40, 1), axis=0)
y = np.sin(X).ravel()
# #############################################################################
# Add noise to targets
y[::5] += 3 * (0.5 - np.random.rand(8))
# #############################################################################
# Fit regression model
svr_lin = SVR(kernel='linear', C=1, tol=1e-5)
svr_lins = LinearSVR(loss='squared_epsilon_insensitive', C=1, tol=1e-5, random_state=42)
svr_poly = SVR(kernel='poly', C=1, degree=1, gamma=1, tol=1e-5, coef0=0.0)
ridge = Ridge(alpha=1, random_state=42)
y_lin = svr_lin.fit(X, y).predict(X)
y_lins = svr_lins.fit(X, y).predict(X)
y_poly = svr_poly.fit(X, y).predict(X)
y_ridge = ridge.fit(X, y).predict(X)
coef_y_lin, intercept_y_lin = svr_lin.coef_, svr_lin.intercept_
coef_y_lins, intercept_y_lins = svr_lins.coef_, svr_lins.intercept_
coef_y_ridge, intercept_y_ridge = ridge.coef_, ridge.intercept_
# #############################################################################
# Look at the results
lw = 2
plt.figure(figsize=(10,5))
plt.scatter(X, y, color='darkorange', label='data')
plt.plot(X, y_lins, color='navy', lw=lw, label='Linear model (LinearSVR) %s, %s' %
(coef_y_lins, intercept_y_lins))
plt.plot(X, y_lin, color='red', lw=lw, label='Linear model (SVR) %s, %s' % (coef_y_lin, intercept_y_lin))
plt.plot(X, y_poly, color='cornflowerblue', lw=lw, label='Polynomial model of degree 1 (SVR)')
plt.plot(X, y_ridge, color='g', lw=lw, label='Ridge %s, %s' % (coef_y_ridge, intercept_y_ridge))
plt.xlabel('data')
plt.ylabel('target')
plt.title('Support Vector Regression')
plt.legend()
plt.axis([0, 5, -1, 1.5])
plt.show()
I am trying to create a model that describes the behaviour of my data. I tried simple linear regression, simple polynomial regression, and polynomial regression with regularization and cross-validation. I followed this explanation to perform the regressions.
The problem is that all models give a negative R^2 on the test data. I tried 1st, 2nd, and 3rd degree polynomial models, and it only gets worse.
I was wondering whether somebody could help me figure out what is wrong? Or what model can I use to get rid of the negative R^2 and obtain a normal one?
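For reference, sklearn's score() here is the coefficient of determination R^2 = 1 - SS_res / SS_tot, computed against the mean of the targets passed to score(); it becomes negative whenever the predictions are worse than simply predicting that mean. A minimal check with made-up numbers (not my data):
import numpy as np
from sklearn.metrics import r2_score

# made-up targets and predictions, only to illustrate the sign of R^2
y_true = np.array([1780.0, 1785.0, 1790.0, 1788.0])
y_pred = np.array([1700.0, 1705.0, 1710.0, 1708.0])  # systematically off

ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
print(1 - ss_res / ss_tot)        # negative: worse than predicting the mean
print(r2_score(y_true, y_pred))   # same value from sklearn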
Summary for the simple linear regression
MAE, MSE, RMSE and R^2 for the simple linear regression
MAE, MSE, RMSE and R^2 for the simple polynomial regression
MAE, MSE, RMSE and R^2 for the polynomial regression with regularization and cross-validation
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
from pandas import DataFrame
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
# Import function to automatically create polynomial features
from sklearn.preprocessing import PolynomialFeatures
# Import Linear Regression and a regularized regression function
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import LassoCV
#Initial data (Three independent variables - Cycle, Internal Resistance and CV Capacity; One dependent - Full Capacity)
SoH = {'Cycle': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28],
'Internal_Resistance': [0.039684729, 0.033377614, 0.031960606, 0.03546798, 0.036786229, 0.03479803, 0.026613861, 0.028650246, 0.028183795, 0.035455215, 0.029205355, 0.033891692, 0.026988849, 0.025647298, 0.033970376, 0.03172454, 0.032437203, 0.033771218, 0.030939938, 0.036919977, 0.027832869, 0.028602469, 0.023065191, 0.028890529, 0.026640394, 0.031488253, 0.02865842, 0.027648949],
'CV_Capacity': [389.9270401, 307.7366414, 357.6412139, 192.134787, 212.415946, 204.737916, 166.506029, 157.826878, 196.432589, 181.937188, 192.070363, 209.890964, 198.978988, 206.126864, 185.631644, 193.776497, 200.61431, 174.359373, 177.503285, 174.07905, 170.654873, 184.528031, 208.065379, 210.134795, 208.199237, 184.693507, 193.00402, 191.913131],
'Full_Capacity': [1703.8575, 1740.7017, 1760.66, 1775.248302, 1771.664053, 1781.958089, 1783.2295, 1784.500912, 1779.280477, 1780.175547, 1800.761265, 1789.047162, 1791.763677, 1787.014667, 1796.520256, 1798.349587, 1791.776304, 1788.892761, 1791.990303, 1790.307248, 1796.580484, 1803.89133, 1793.305294, 1784.638742, 1780.056339, 1783.081746, 1772.001436, 1794.182046]
}
#Data to test the model
Test = {'Cycle': [29, 30, 31, 32, 33, 34, 35],
'Internal_Resistance': [0.026217822, 0.032549629, 0.025744309, 0.027945824, 0.027332509, 0.027960729, 0.028969193],
'CV_Capacity': [196.610972, 194.915587, 183.209067, 182.41669, 204.018257, 179.929472, 189.576431],
'Full_Capacity': [1777.880947, 1792.21646, 1785.653845, 1788.401923, 1782.983718, 1793.939504, 1788.67233]
}
#Convert initial data into DataFrame
df = DataFrame(SoH,columns=['Cycle','Internal_Resistance','CV_Capacity','Full_Capacity'])
df1 = DataFrame(SoH,columns=['Cycle','Internal_Resistance','CV_Capacity'])
X = df1.to_numpy()
print(df.head())
print()
#Convert data to test the model into DataFrame
dft = DataFrame(Test,columns=['Cycle','Internal_Resistance','CV_Capacity','Full_Capacity'])
dft1 = DataFrame(Test,columns=['Cycle','Internal_Resistance','CV_Capacity'])
Xt = dft1.to_numpy()
#Plot the Full Capacity vs predictors (Cycle, Internal Resistance and CV Capacity)
for i in df.columns:
    df.plot.scatter(i, 'Full_Capacity', edgecolors=(0,0,0), s=50, c='g', grid=True)
#STATSMODELS
# Fitting data with statsmodels
X1 = df[['Cycle','Internal_Resistance','CV_Capacity']]
Y1 = df['Full_Capacity']
X1 = sm.add_constant(X1.values) # adding a constant
model = sm.OLS(Y1, X1).fit()
predictions = model.predict(X1)
print_model = model.summary()
print(print_model)
print()
#SCIKIT LEARN (Simple polynomial regression and polynomial regression with regularization and cross-validation)
# Fitting data - simple polynomial regression (1st degree)
linear_model = LinearRegression(normalize=True)
X_linear=df.drop('Full_Capacity',axis=1)
y_linear=df['Full_Capacity']
X_linear_test=dft.drop('Full_Capacity',axis=1)
y_linear_test=dft['Full_Capacity']
linear_model.fit(X_linear,y_linear)
y_pred_linear = linear_model.predict(X_linear)
y_pred_linear_test = linear_model.predict(X_linear_test)
#Coefficients for the model
coeff_linear = pd.DataFrame(linear_model.coef_,index=df.drop('Full_Capacity',axis=1).columns, columns=['Linear model coefficients'])
print(coeff_linear)
print()
#Metrics of the model
MAE_linear = mean_absolute_error(y_linear, y_pred_linear)
print("Mean absolute error of linear model:",MAE_linear)
MSE_linear = mean_squared_error(y_linear, y_pred_linear)
print("Mean-squared error of linear model:",MSE_linear)
RMSE_linear = np.sqrt(MSE_linear)
print("Root-mean-squared error of linear model:",RMSE_linear)
print()
MAE_linear_test = mean_absolute_error(y_linear_test, y_pred_linear_test)
print("Mean absolute error of linear model (validation):",MAE_linear_test)
MSE_linear_test = mean_squared_error(y_linear_test, y_pred_linear_test)
print("Mean-squared error of linear model (validation):",MSE_linear_test)
RMSE_linear_test = np.sqrt(MSE_linear_test)
print("Root-mean-squared error of linear model (validation):",RMSE_linear_test)
print()
print ("R2 value of linear model:",linear_model.score(X_linear,y_linear))
print ("R2 value of linear model (validation):",linear_model.score(X_linear_test,y_linear_test))
print()
#Plot predicted values vs actual values
plt.figure(figsize=(12,8))
plt.xlabel("Predicted value with linear fit",fontsize=20)
plt.ylabel("Actual y-values",fontsize=20)
plt.grid(1)
plt.scatter(y_pred_linear,y_linear,edgecolors=(0,0,0),lw=2,s=80)
plt.plot(y_pred_linear,y_pred_linear, 'k--', lw=2)
plt.figure(figsize=(12,8))
plt.xlabel("Predicted (validation) value with linear fit",fontsize=20)
plt.ylabel("Actual (validation) y-values",fontsize=20)
plt.grid(1)
plt.scatter(y_pred_linear_test,y_linear_test,edgecolors=(0,0,0),lw=2,s=80)
plt.plot(y_pred_linear_test,y_pred_linear_test, 'k--', lw=2)
#Fitting data - simple polynomial regression (3rd degree)
poly = PolynomialFeatures(3,include_bias=False)
X_poly = poly.fit_transform(X)
X_poly_feature_name = poly.get_feature_names(['Feature'+str(l) for l in range(1,4)])
print()
print()
print("3rd degree polynomial regression")
print()
print()
print(X_poly_feature_name)
print(len(X_poly_feature_name))
print()
df_poly = pd.DataFrame(X_poly, columns=X_poly_feature_name)
print(df_poly.head())
print()
df_poly['y']=df['Full_Capacity']
print(df_poly.head())
print()
X_train=df_poly.drop('y',axis=1)
y_train=df_poly['y']
#Testing the model
test = PolynomialFeatures(3,include_bias=False)
X_test=test.fit_transform(Xt)
X_test_feature_name = test.get_feature_names(['Feature'+str(l) for l in range(1,4)])
print(X_test_feature_name)
print(len(X_test_feature_name))
print()
df_test = pd.DataFrame(X_test, columns=X_test_feature_name)
print(df_test.head())
print()
df_test['y']=dft['Full_Capacity']
#Data to test the polynomial models
X_testo=df_test.drop('y',axis=1)
y_testo=df_test['y']
poly = LinearRegression(normalize=True)
model_poly=poly.fit(X_train,y_train)
y_poly = poly.predict(X_train)
y_poly_test = np.array(poly.predict(X_testo))
coeff_poly = pd.DataFrame(model_poly.coef_,index=df_poly.drop('y',axis=1).columns, columns=['Coefficients polynomial model'])
print(coeff_poly)
print()
#Metrics of the polynomial model
MAE_poly = mean_absolute_error(y_train, y_poly)
print("Mean absolute error of simple polynomial model:",MAE_poly)
MSE_poly = mean_squared_error(y_train, y_poly)
print("Mean-squared error of simple polynomial model:",MSE_poly)
RMSE_poly = np.sqrt(MSE_poly)
print("Root-mean-squared error of simple polynomial model:",RMSE_poly)
print()
MAE_poly_test = mean_absolute_error(y_testo, y_poly_test)
print("Mean absolute error of simple polynomial model (validation):",MAE_poly_test)
MSE_poly_test = mean_squared_error(y_testo, y_poly_test)
print("Mean-squared error of simple polynomial model (validation):",MSE_poly_test)
RMSE_poly_test = np.sqrt(MSE_poly_test)
print("Root-mean-squared error of simple polynomial model (validation):",RMSE_poly_test)
print()
print ("R2 value of simple polynomial model:",model_poly.score(X_train,y_train))
print ("R2 value of simple polynomial model (validation):",model_poly.score(X_testo,y_testo))
print()
plt.figure(figsize=(12,8))
plt.xlabel("Predicted value with simple polynomial model",fontsize=20)
plt.ylabel("Actual y-values",fontsize=20)
plt.grid(1)
plt.scatter(y_poly,y_train,edgecolors=(0,0,0),lw=2,s=80)
plt.plot(y_poly,y_poly, 'k--', lw=2)
plt.figure(figsize=(12,8))
plt.xlabel("Predicted (validation) value with Simple polynomial model",fontsize=20)
plt.ylabel("Actual (validation) y-values",fontsize=20)
plt.grid(1)
plt.scatter(y_poly_test,y_testo,edgecolors=(0,0,0),lw=2,s=80)
plt.plot(y_poly_test,y_poly_test, 'k--', lw=2)
#Fitting data with a polynomial model with regularization and cross-validation
model1 = LassoCV(cv=10,verbose=0,normalize=True,eps=0.001,n_alphas=100, fit_intercept = True, tol=0.0001,max_iter=10000)
model1.fit(X_train,y_train)
y_pred1 = np.array(model1.predict(X_train))
y_pred2 = np.array(model1.predict(X_testo))
print()
print()
print("3rd degree polynomial regression with regularization and cross-validation")
print()
print()
coeff1 = pd.DataFrame(model1.coef_,index=df_poly.drop('y',axis=1).columns, columns=['Coefficients Metamodel'])
print(coeff1)
print()
print(coeff1[coeff1['Coefficients Metamodel']!=0])
print("Intercept of the new polynomial model:",model1.intercept_)
print()
#Metrics of the polynomial model with regularization and cross-validation
MAE_1 = mean_absolute_error(y_train, y_pred1)
print("Mean absolute error of the new polynomial model:",MAE_1)
MSE_1 = mean_squared_error(y_train, y_pred1)
print("Mean-squared error of the new polynomial model:",MSE_1)
RMSE_1 = np.sqrt(MSE_1)
print("Root-mean-squared error of the new polynomial model:",RMSE_1)
print()
MAE_1_test = mean_absolute_error(y_testo, y_pred2)
print("Mean absolute error of the new polynomial model (validation):",MAE_1_test)
MSE_1_test = mean_squared_error(y_testo, y_pred2)
print("Mean-squared error of the new polynomial model (validation):",MSE_1_test)
RMSE_1_test = np.sqrt(MSE_1_test)
print("Root-mean-squared error of the new polynomial model (validation):",RMSE_1_test)
print()
print ("R2 value of the new polynomial model:",model1.score(X_train,y_train))
print ("R2 value of the new polynomial model (validation):",model1.score(X_testo,y_testo))
print ("Alpha of the new polynomial model:",model1.alpha_)
print()
plt.figure(figsize=(12,8))
plt.xlabel("Predicted value with Metamodel",fontsize=20)
plt.ylabel("Actual y-values",fontsize=20)
plt.grid(1)
plt.scatter(y_pred1,y_train,edgecolors=(0,0,0),lw=2,s=80)
plt.plot(y_pred1,y_pred1, 'k--', lw=2)
plt.figure(figsize=(12,8))
plt.xlabel("Predicted (validation) value with Metamodel",fontsize=20)
plt.ylabel("Actual (validation) y-values",fontsize=20)
plt.grid(1)
plt.scatter(y_pred2,y_testo,edgecolors=(0,0,0),lw=2,s=80)
plt.plot(y_pred2,y_pred2, 'k--', lw=2)
I'm currently using TensorFlow and sklearn to try to make a model that can predict the amount of sales for a certain product, X, based on the outdoor temperature in Celsius.
I took my temperature dataset and set it as the x variable, and the amount of sales as the y variable. As seen in the picture below, there is some sort of correlation between the temperature and the amount of sales:
First and foremost, I tried linear regression to see how well it would fit. This is the code for that:
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x_train, y_train) #fit tries to fit the x variable and y variable.
#Let's try to plot it out.
y_pred = model.predict(x_train)
plt.scatter(x_train,y_train)
plt.plot(x_train,y_pred,'r')
plt.legend(['Predicted Line', 'Observed data'])
plt.show()
This resulted in a predicted line that had a pretty poor fit:
A very nice feature of sklearn, however, is that you can predict a value for a given temperature, so if I were to write
model.predict(15)
I'd get the output
array([6949.05567873])
This is exactly what I want; I just wanted the line to fit better, so instead I tried polynomial regression with sklearn by doing the following:
from sklearn.preprocessing import PolynomialFeatures
poly = PolynomialFeatures(degree=8, include_bias=False)  # include_bias=False: LinearRegression fits the intercept itself
x_new = poly.fit_transform(x_train)
new_model = LinearRegression()
new_model.fit(x_new,y_train)
#plotting
y_prediction = new_model.predict(x_new) #this actually predicts x...?
plt.scatter(x_train,y_train)
plt.plot(x_new[:,0], y_prediction, 'r')
plt.legend(['Predicted line', 'Observed data'])
plt.show()
The line seems to fit better now:
My problem now is that I can't use new_model.predict(x), since it results in "ValueError: shapes (1,1) and (8,) not aligned: 1 (dim 1) != 8 (dim 0)". I understand that this is because I'm using an 8th-degree polynomial, but is there any way for me to predict the y value based on ONE temperature using the polynomial regression model?
Try using new_model.predict([[x**a for a in range(1, 9)]]) (note the 2-D input),
or, according to your previously used code, you can do new_model.predict(poly.fit_transform([[x]])).
Since you fit a curve of the form
y = a*x^1 + b*x^2 + ... + h*x^8
you need to transform your input in the same manner, i.e. turn it into a polynomial without the intercept term (include_bias=False). That is what you passed into the LinearRegression training function, and it learns one coefficient per polynomial term. The plot you've shown only uses the x^1 term you indexed into (x_new[:,0]), which means the data you're feeding the model has more columns than what you plotted.
One last note: always make sure your training data and future/validation data undergo the same preprocessing steps, to ensure your model works (see the pipeline sketch right below).
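Here is a minimal sketch of that idea (using sklearn's Pipeline rather than your exact code; x_train/y_train stand for your original arrays), which bundles the polynomial transform and the regression so that predict() applies the same transformation automatically:
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# bundle the transform and the regression so fit() and predict()
# always apply the same preprocessing
poly_model = make_pipeline(
    PolynomialFeatures(degree=8, include_bias=False),
    LinearRegression(),
)
poly_model.fit(x_train, y_train)  # x_train shaped (n_samples, 1)

# predict for a single temperature, e.g. 15 degrees (note the 2-D shape)
print(poly_model.predict(np.array([[15]])))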
Here's some more detail. Let's start by running your code on synthetic data:
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures
from numpy.random import rand
x_train = rand(1000,1)
y_train = rand(1000,1)
poly = PolynomialFeatures(degree=8, include_bias=False)  # include_bias=False: LinearRegression fits the intercept itself
x_new = poly.fit_transform(x_train)
new_model = LinearRegression()
new_model.fit(x_new,y_train)
#plotting
y_prediction = new_model.predict(x_new) #this predicts y
plt.scatter(x_train,y_train)
plt.plot(x_new[:,0], y_prediction, 'r')
plt.legend(['Predicted line', 'Observed data'])
plt.show()
Now we can predict the y value by transforming an x value into a degree-8 polynomial without an intercept, e.g. for x = 0.25 (note the 2-D input shape):
print(new_model.predict(poly.fit_transform([[0.25]])))
[[0.47974408]]