Simple Linear Regression in Python

I am trying to implement this algorithm to find the intercept and slope for a single variable:
Here is my Python code to update the intercept and slope, but it is not converging. RSS is increasing with each iteration rather than decreasing, and after some iterations it becomes infinite. I cannot find any error in my implementation of the algorithm. How can I solve this problem? I have attached the CSV file too.
Here is the code.
import pandas as pd
import numpy as np

#Defining gradient_decend
#This function takes X values, Y values and a vector of w0(intercept),w1(slope)
#INPUT FEATURES=X (sq.feet of house size)
#TARGET VALUE=Y (price of house)
#W=np.array([w0,w1]).reshape(2,1)
#W=[w0,
#   w1]
def gradient_decend(X,Y,W):
    intercept=W[0][0]
    slope=W[1][0]
    #Here I will get a list
    #the list is like this
    #gd=[sum(predicted_value-(intercept+slope*x)),
    #    sum((predicted_value-(intercept+slope*x))*x)]
    gd=[sum(y-(intercept+slope*x) for x,y in zip(X,Y)),
        sum(((y-(intercept+slope*x))*x) for x,y in zip(X,Y))]
    return np.array(gd).reshape(2,1)

#Defining Residual Sum of Squares
def RSS(X,Y,W):
    return sum((y-(W[0][0]+W[1][0]*x))**2 for x,y in zip(X,Y))

#Reading training data
training_data=pd.read_csv("kc_house_train_data.csv")

#Defining fixed parameters
#Learning rate
n=0.0001
iteration=1500
#Intercept
w0=0
#Slope
w1=0
#Creating a 2x1 vector of the w0,w1 parameters
W=np.array([w0,w1]).reshape(2,1)

#Running gradient descent
for i in range(iteration):
    W=W+((2*n)*(gradient_decend(training_data["sqft_living"],training_data["price"],W)))
    print RSS(training_data["sqft_living"],training_data["price"],W)
Here is the CSV file.

Firstly, I find that when writing machine learning code, it's best NOT to use complex list comprehensions, because anything that you can iterate over
is easier to read if written with normal loops and indentation, and/or
can be done with numpy broadcasting.
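For instance, here is a minimal sketch (with made-up toy arrays, not the house data) of the same per-point errors written as a comprehension and as a broadcast expression:

import numpy as np

X = np.array([1.0, 2.0, 3.0])   # toy feature values
Y = np.array([2.5, 4.0, 5.5])   # toy target values
intercept, slope = 1.0, 1.5

# explicit comprehension: readable, but slow on large arrays
errors_loop = [y - (intercept + slope * x) for x, y in zip(X, Y)]

# numpy broadcasting: the same computation on whole arrays at once
errors_np = Y - (intercept + slope * X)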
And using proper variable names can help you better understand the code. Using Xs, Ys and Ws as shorthand is nice only if you're good at math. Personally, I don't use them in code, especially when writing in Python. From import this: explicit is better than implicit.
My rule of thumb is to remember that if I write code I can't read one week later, it's bad code.
First, let's decide what the input parameters for gradient descent are. You will need:
feature_matrix (The X matrix, type: numpy.array, a matrix of N * D size, where N is the no. of rows/datapoints and D is the no. of columns/features)
output (The Y vector, type: numpy.array, a vector of size N)
initial_weights (type: numpy.array, a vector of size D).
Additionally, to check for convergence you will need:
step_size (the magnitude of change when iterating through to change the weights; type: float, usually a small number)
tolerance (the criterion for breaking the iterations: when the gradient magnitude is smaller than the tolerance, assume that your weights have converged; type: float, usually a small number but much bigger than the step size).
Now to the code.
def regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance):
    converged = False # Set a boolean to check for convergence
    weights = np.array(initial_weights) # make sure it's a numpy array
    while not converged:
        # compute the predictions based on feature_matrix and weights.
        # iterate through the rows and find the single scalar predicted
        # value for each weight * column.
        # hint: a dot product can solve this easily
        predictions = [??? for row in feature_matrix]
        # compute the errors as predictions - output
        errors = predictions - output
        gradient_sum_squares = 0 # initialize the gradient sum of squares
        # while we haven't reached the tolerance yet, update each feature's weight
        for i in range(len(weights)): # loop over each weight
            # Recall that feature_matrix[:, i] is the feature column associated with weights[i]
            # compute the derivative for weight[i]:
            # Hint: the derivative is = 2 * dot product of feature_column and errors.
            derivative = 2 * ????
            # add the squared value of the derivative to the gradient magnitude (for assessing convergence)
            gradient_sum_squares += (derivative * derivative)
            # subtract the step size times the derivative from the current weight
            weights[i] -= (step_size * derivative)
        # compute the square-root of the gradient sum of squares to get the gradient magnitude:
        gradient_magnitude = ???
        # Then check whether the magnitude is lower than the tolerance.
        if ???:
            converged = True
    # Once the while loop breaks, return the weights.
    return(weights)
I hope the extended pseudo-code helps you better understand the gradient descent. I won't fill in the ??? so as to not spoil your homework.
Note that your RSS code is also unreadable and unmaintainable. It's easier to do just:
>>> import numpy as np
>>> prediction = np.array([1,2,3])
>>> output = np.array([1,1,5])
>>> residual = output - prediction
>>> RSS = sum(residual * residual)
>>> RSS
5
Going through the numpy basics will take you a long way in machine learning and matrix-vector manipulation without going nuts with iteration: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.html

I have solved my own problem!
Here is how I solved it.
import numpy as np
import pandas as pd
import math
from sys import stdout

#This function takes the pandas dataframe, the input features list and the target column name
def get_numpy_data(data, features, output):
    #Adding a constant column with value 1 in the dataframe.
    data['constant'] = 1
    #Adding the name of the constant column in the feature list.
    features = ['constant'] + features
    #Creating the feature matrix (selecting columns and converting to matrix).
    features_matrix=data[features].as_matrix()
    #The target column is converted to a numpy array
    output_array=np.array(data[output])
    return(features_matrix, output_array)

def predict_outcome(feature_matrix, weights):
    weights=np.array(weights)
    predictions = np.dot(feature_matrix, weights)
    return predictions

def errors(output,predictions):
    errors=predictions-output
    return errors

def feature_derivative(errors, feature):
    derivative=np.dot(2,np.dot(feature,errors))
    return derivative

def regression_gradient_descent(feature_matrix, output, initial_weights, step_size, tolerance):
    converged = False
    #Initial weights are converted to a numpy array
    weights = np.array(initial_weights)
    while not converged:
        # compute the predictions based on feature_matrix and weights:
        predictions=predict_outcome(feature_matrix,weights)
        # compute the errors as predictions - output:
        error=errors(output,predictions)
        gradient_sum_squares = 0 # initialize the gradient
        # while not converged, update each weight individually:
        for i in range(len(weights)):
            # Recall that feature_matrix[:, i] is the feature column associated with weights[i]
            feature=feature_matrix[:, i]
            # compute the derivative for weight[i]:
            #predict=predict_outcome(feature,weights[i])
            #err=errors(output,predict)
            deriv=feature_derivative(error,feature)
            # add the squared derivative to the gradient magnitude
            gradient_sum_squares=gradient_sum_squares+(deriv**2)
            # update the weight based on step size and derivative:
            weights[i]=weights[i] - np.dot(step_size,deriv)
        gradient_magnitude = math.sqrt(gradient_sum_squares)
        stdout.write("\r%d" % int(gradient_magnitude))
        stdout.flush()
        if gradient_magnitude < tolerance:
            converged = True
    return(weights)

#Example of Implementation
#Importing Training and Testing Data
# train_data=pd.read_csv("kc_house_train_data.csv")
# test_data=pd.read_csv("kc_house_test_data.csv")
# simple_features = ['sqft_living', 'sqft_living15']
# my_output= 'price'
# (simple_feature_matrix, output) = get_numpy_data(train_data, simple_features, my_output)
# initial_weights = np.array([-100000., 1., 1.])
# step_size = 7e-12
# tolerance = 2.5e7
# simple_weights = regression_gradient_descent(simple_feature_matrix, output,initial_weights, step_size,tolerance)
# print simple_weights

It is so simple
def mean(values):
    return sum(values)/float(len(values))

def variance(values, mean):
    return sum([(x-mean)**2 for x in values])

def covariance(x, mean_x, y, mean_y):
    covar = 0.0
    for i in range(len(x)):
        covar += (x[i]-mean_x) * (y[i]-mean_y)
    return covar

def coefficients(dataset):
    x = []
    y = []
    for line in dataset:
        xi, yi = map(float, line.split(','))
        x.append(xi)
        y.append(yi)
    dataset.close()
    x_mean, y_mean = mean(x), mean(y)
    b1 = covariance(x, x_mean, y, y_mean)/variance(x, x_mean)
    b0 = y_mean-b1*x_mean
    return [b0, b1]

dataset = open('trainingdata.txt')
b0, b1 = coefficients(dataset)
n=float(raw_input())
print(b0+b1*n)
reference : www.machinelearningmastery.com/implement-simple-linear-regression-scratch-python/
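As a quick sanity check of those formulas, here is a hypothetical run on a tiny in-memory dataset (not the trainingdata.txt file used above), reusing the mean, variance and covariance functions defined there:

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # y = 2*x, so we expect b1 = 2 and b0 = 0
x_mean, y_mean = mean(x), mean(y)
b1 = covariance(x, x_mean, y, y_mean) / variance(x, x_mean)
b0 = y_mean - b1 * x_mean
print(b0, b1)               # prints 0.0 2.0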

Related

Parameter Optimization Using Minimize in Python

I have written the following two functions to calibrate a model:
The main function is:
def function_Price(para,y,t,T,tau,N,C):
    # y = price array
    # C = auto- and cross-correlation array
    # a = parameters that need to be calibrated
    a=para[0:]
    temp=0
    for j in range(N):
        price_j = a[j]*C[j]*P[t:T-tau,j]
        temp=temp+price_j
    Price=temp
    return Price
The objective function is:
def GError_function_Price(para,y,k,t,T,tau,N,C):
    # k is the price that needs to be fitted
    return sum((function_Price(para,y,t,T,tau,N,C)-k[t+tau:T]) ** 2)
Now, I am calling these two functions to do the optimization of the model:
import numpy as np
from scipy.optimize import minimize

# Prices (example)
y = np.array([[1,2,3,4,5,4], [4,5,6,7,8,9], [6,7,8,7,8,6], [13,14,15,11,12,19]])
# Correlation (example)
Corr= np.array([[1,2,3,4,5,4], [4,5,6,7,8,9], [6,7,8,7,8,6], [13,14,15,11,12,19],[1,2,3,4,5,4],[6,7,8,7,8,6]])

# Define
tau=1
Size = y.shape
N = Size[1]
T = Size[0]
t=0

# Initial values
para=np.zeros(N)

# Bounds
B = np.zeros(shape=(N,2))
for n in range(N):
    B[n][0]= float('-inf')
    B[n][1]= float('inf')

# Calibration
A = np.zeros(shape=(N,N))
for i in range(N):
    k=y[:,i] # fitted one
    C=Corr[i,:]
    parag=minimize(GError_function_Price,para,args=(y,Y,t,T,tau,N,C),method='SLSQP',bounds=B)
    A[i,:]=parag.x
Once I run the model, it should produce an N by N array of optimized parameter values. But except for the first column, the rest stays zero. Something is wrong.
Can you help me fix the problem, please?
I know how to do it in Matlab.
The following is the Matlab code:
The main function:
function Price=function_Price(para,P,t,T,tau,N,C)
    a=para(:,:);
    temp=0;
    for j=1:N
        price_j = a(j).*C(j).*P(t:T-tau,j);
        temp=temp+price_j;
    end
    Price=temp;
end
The objective function:
function gerr=GError_function_Price(para,P,Y,t,T,tau,N,C)
    gerr=sum((function_Price(para,P,t,T,tau,N,C)-Y(t+tau:T)).^2);
end
Now, I call these two functions in the following way:
P = [1,2,3,4,5,4;4,5,6,7,8,9;6,7,8,7,8,6;13,14,15,11,12,19];
AutoAndCrossCorr= [1,2,3,4,5,4;4,5,6,7,8,9;6,7,8,7,8,6;13,14,15,11,12,19;1,2,3,4,5,4;6,7,8,7,8,6];
tau=1;
Size = size(P);
N =6;
T =4;
t=1;
for i=1:N
    Y=P(:,i); % fitted one
    C=AutoAndCrossCorr(i,:);
    para=zeros(1,N);
    lb= repmat(-inf,N,1);
    ub= repmat(inf ,N,1);
    parag=fminsearchbnd(@(para)abs(GError_function_Price(para,P,Y,t,T,tau,N,C)),para,lb,ub);
    a(i,:)=parag;
end
The problem seems to be that you're passing the result of a function call to minimize, rather than the function itself. The arguments get passed by the args parameter. So instead of:
minimize(GError_function_Price(para,y,k,t,T,tau,N,C),para,method='SLSQP',bounds=B)
the following should work:
minimize(GError_function_Price,para,args=(y,k,t,T,tau,N,C),method='SLSQP',bounds=B)
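As a standalone illustration of that calling pattern (with a made-up quadratic objective, not your pricing model), scipy expects something like:

import numpy as np
from scipy.optimize import minimize

# toy objective: squared error between a[0]*x and y
def objective(a, x, y):
    return np.sum((a[0] * x - y) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])

# pass the function object itself; the extra arguments go through args
result = minimize(objective, x0=np.zeros(1), args=(x, y), method='SLSQP')
print(result.x)   # roughly [2.0]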

Python: Develop a Multiple Linear Regression Model From Scratch

I am trying to create a multiple linear regression model from scratch in python. Dataset used: Boston Housing Dataset from Sklearn. Since my focus was on the model building I did not perform any pre-processing steps on the data. However, I used an OLS model to calculate p-values and dropped 3 features from the data. After that, I used a Linear Regression model to find out the weights for each feature.
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
X=load_boston()
data=pd.DataFrame(X.data,columns=X.feature_names)
y=X.target
data.head()
#dropping three features
data=data.drop(['INDUS','NOX','AGE'],axis=1)
#new shape of the data (506,10) not including the target variable
#Passed the whole dataset to Linear Regression Model
model_lr=LinearRegression()
model_lr.fit(data,y)
model_lr.score(data,y)
0.7278959820021539
model_lr.intercept_
22.60536462807957 #----- intercept value
model_lr.coef_
array([-0.09649731, 0.05281081, 2.3802989 , 3.94059598, -1.05476566,
0.28259531, -0.01572265, -0.75651996, 0.01023922, -0.57069861]) #--- coefficients
Now I wanted to calculate the coefficients manually in Excel before creating the model in Python. To calculate the weight of each feature I used this formula (applied to each feature column x separately):
b = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
To calculate the intercept I used the formula
b0 = mean(y) - b1*mean(x1) - b2*mean(x2) - ... - bn*mean(xn)
The intercept value from my calculations was 22.63551387 (almost the same as that of the model).
The problem is that the weights of the features from my calculation are far off from those of the sklearn linear model.
-0.002528644 #-- CRIM
-0.001028914 #-- Zn
-0.038663314 #-- CHAS
-0.035026972 #-- RM
-0.014275311 #-- DIS
-0.004058291 #-- RAD
-0.000241103 #-- TAX
-0.015035534 #-- PTRATIO
-0.000318376 #-- B
-0.006411897 #-- LSTAT
Using the first row as test data to check my calculations, I get 22.73167044199992, while the Linear Regression model predicts 30.42657776. The original value is 24.
But as soon as I check other rows, the sklearn model shows more variation, while the predictions made with the weights from my calculation all stay close to 22.
I think I am making a mistake in calculating the weights, but I am not sure where the problem is. Is there a mistake in my calculation? Why are all my coefficients from the calculation so close to 0?
Here is my code for calculating the coefficients (beginner here):
x_1=[]
x_2=[]
for i,j in zip(data['CRIM'],y):
    mean_x=data['CRIM'].mean()
    mean_y=np.mean(y)
    c=i-mean_x*(j-mean_y)
    d=(i-mean_x)**2
    x_1.append(c)
    x_2.append(d)
print(sum(x_1)/sum(x_2))
Thank you for reading this long post, I appreciate it.
It seems like the trouble lies in the coefficient calculation. The formula you have given for calculating the coefficients is in scalar form, used for the simplest case of linear regression, namely with only one feature x.
EDIT
Now, after seeing your code for the coefficient calculation, the problem is clearer.
You cannot use this equation to calculate the coefficients of each feature independently of each other, as each coefficient will depend on all the features. I suggest you take a look at the derivation of the solution to this least squares optimization problem in the simple case here and in the general case here. And as a general tip, stick with the matrix implementation whenever you can, as it is radically more efficient.
However, in this case we have a 10-dimensional feature vector, and so in matrix notation the solution becomes β = (X^T X)^-1 X^T y (see the derivation here).
I suspect you made some computational error here, as implementing this in Python using the scalar formula is more tedious and untidy than the matrix equivalent. But since you haven't shared this piece of your code, it's hard to know.
Here's an example of how you would implement it:
def calc_coefficients(X,Y):
X=np.mat(X)
Y = np.mat(Y)
return np.dot((np.dot(np.transpose(X),X))**(-1),np.transpose(np.dot(Y,X)))
def score_r2(y_pred,y_true):
ss_tot=np.power(y_true-y_true.mean(),2).sum()
ss_res = np.power(y_true -y_pred,2).sum()
return 1 -ss_res/ss_tot
X = np.ones(shape=(506,11))
X[:,1:] = data.values
B=calc_coefficients(X,y)
##### Coeffcients
B[:]
matrix([[ 2.26053646e+01],
[-9.64973063e-02],
[ 5.28108077e-02],
[ 2.38029890e+00],
[ 3.94059598e+00],
[-1.05476566e+00],
[ 2.82595310e-01],
[-1.57226536e-02],
[-7.56519964e-01],
[ 1.02392192e-02],
[-5.70698610e-01]])
#### Intercept
B[0]
matrix([[22.60536463]])
y_pred = np.dot(np.transpose(B),np.transpose(X))
##### First 5 rows predicted
np.array(y_pred)[0][:5]
array([30.42657776, 24.80818347, 30.69339701, 29.35761397, 28.6004966 ])
##### First 5 rows Ground Truth
y[:5]
array([24. , 21.6, 34.7, 33.4, 36.2])
### R^2 score
score_r2(y_pred,y)
0.7278959820021539
Complete Solution - 2020 - boston dataset
As the other answer said, to compute the coefficients for the linear regression you have to compute
β = (X^T X)^-1 X^T y
This gives you the coefficients (all the Bs for the features plus the intercept).
Be sure to add a column of all ones to X to compute the intercept (more in the code).
Main.py
from sklearn.datasets import load_boston
import numpy as np
from CustomLibrary import CustomLinearRegression
from CustomLibrary import CustomMeanSquaredError
boston = load_boston()
X = np.array(boston.data, dtype="f")
Y = np.array(boston.target, dtype="f")
regression = CustomLinearRegression()
regression.fit(X, Y)
print("Projection matrix sk:", regression.coefficients, "\n")
print("bias sk:", regression.intercept, "\n")
Y_pred = regression.predict(X)
loss_sk = CustomMeanSquaredError(Y, Y_pred)
print("Model performance:")
print("--------------------------------------")
print("MSE is {}".format(loss_sk))
print("\n")
CustomLibrary.py
import numpy as np

class CustomLinearRegression():
    def __init__(self):
        self.coefficients = None
        self.intercept = None

    def fit(self, x, y):
        x = self.add_one_column(x)
        x_T = np.transpose(x)
        inverse = np.linalg.inv(np.dot(x_T, x))
        pseudo_inverse = inverse.dot(x_T)
        coef = pseudo_inverse.dot(y)
        self.intercept = coef[0]
        self.coefficients = coef[1:]
        return coef

    def add_one_column(self, x):
        '''
        the fit method returns the coefficients including the intercept,
        so to get the intercept + the x feature coefficients we have to add
        one column (at the beginning) of all ones
        '''
        X = np.ones(shape=(x.shape[0], x.shape[1] + 1))
        X[:, 1:] = x
        return X

    def predict(self, x):
        predicted = np.array([])
        for sample in x:
            result = self.intercept
            for idx, feature_value_in_sample in enumerate(sample):
                result += feature_value_in_sample * self.coefficients[idx]
            predicted = np.append(predicted, result)
        return predicted

def CustomMeanSquaredError(Y, Y_pred):
    mse = 0
    for idx, data in enumerate(Y):
        mse += (data - Y_pred[idx])**2
    return mse * (1 / len(Y))

How is cv_values_ computed in sklearn.linear_model.RidgeCV?

The reproducible example to fix the discussion:
from sklearn.linear_model import RidgeCV
from sklearn.datasets import load_boston
from sklearn.preprocessing import scale
boston = scale(load_boston().data)
target = load_boston().target
import numpy as np
alphas = np.linspace(1.0,200.0, 5)
fit0 = RidgeCV(alphas=alphas, store_cv_values = True, gcv_mode='eigen').fit(boston, target)
fit0.alpha_
fit0.cv_values_[:,0]
The question: what formula is used to compute fit0.cv_values_?
Edit:
@Abhinav Arora's answer below seems to suggest that fit0.cv_values_[:,0][0], the first entry of fit0.cv_values_[:,0], would be
(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2
where fit1 is a ridge regression with alpha = 1.0, fitted to the data-set from which observation 0 was removed.
Let's see:
1) create new dataset with first row of original dataset removed:
from sklearn.linear_model import Ridge
boston1 = np.delete(boston, (0), axis=0)
target1 = np.delete(target, (0), axis=0)
2) fit a ridge model with alpha = 1.0 on this truncated dataset:
fit1 = Ridge(alpha=1.0).fit(boston1, target1)
3) check the MSE of that model on the first data-point:
(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2
it is array([ 37.64650853]), which is not the same as what is produced by fit0.cv_values_[:,0], namely:
fit0.cv_values_[:,0][0]
which is 37.495629960571137
What gives?
Quoting from the Sklearn documentation:
Cross-validation values for each alpha (if store_cv_values=True and
cv=None). After fit() has been called, this attribute will contain the
mean squared errors (by default) or the values of the
{loss,score}_func function (if provided in the constructor).
Since you have not provided any scoring function in the constructor and also not provided anything for the cv argument, this attribute should store the mean squared error for each sample using Leave-One-Out cross-validation. The general formula for Mean Squared Error is
MSE = (1/n) * Σ (Y_i - Ŷ_i)^2
where Ŷ (with the cap) is the prediction of your regressor and Y is the true value.
In your case, you are doing Leave-One-Out cross-validation. Therefore, in every fold you have only 1 test point and thus n = 1. So fit0.cv_values_[:,0] will simply give you the squared error for every point in your training data set when it was part of the test fold and when the value of alpha was 1.0.
Hope that helps.
Let's look - it's open source after all
The first call to fit makes a call upwards to its parent, _BaseRidgeCV (line 997, in that implementation). We haven't provided a cross-validation generator, so we make another call upwards to _RidgeGCV.fit. There's plenty of math in the documentation of this function, but we're so close to the source that I'll let you go and read about it.
Here's the actual source
v, Q, QT_y = _pre_compute(X, y)
n_y = 1 if len(y.shape) == 1 else y.shape[1]
cv_values = np.zeros((n_samples * n_y, len(self.alphas)))
C = []

scorer = check_scoring(self, scoring=self.scoring, allow_none=True)
error = scorer is None

for i, alpha in enumerate(self.alphas):
    weighted_alpha = (sample_weight * alpha
                      if sample_weight is not None
                      else alpha)
    if error:
        out, c = _errors(weighted_alpha, y, v, Q, QT_y)
    else:
        out, c = _values(weighted_alpha, y, v, Q, QT_y)
    cv_values[:, i] = out.ravel()
    C.append(c)
Note the unexciting _pre_compute function:
def _pre_compute(self, X, y):
    # even if X is very sparse, K is usually very dense
    K = safe_sparse_dot(X, X.T, dense_output=True)
    v, Q = linalg.eigh(K)
    QT_y = np.dot(Q.T, y)
    return v, Q, QT_y
Abhinav has explained what's going on at a mathematical level - it's simply accumulating the weighted mean squared error. The details of the implementation, and where it differs from yours, can be evaluated step by step from the code.
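If it helps build intuition for what those _errors/_values helpers exploit, here is a minimal sketch of the closed-form leave-one-out identity for ridge regression, assuming no intercept and no sample weights (a deliberately simplified setting of my own; differences in how RidgeCV preprocesses the data and handles the intercept are a plausible source of the small gap you observed, though I haven't traced that through this exact sklearn version):

import numpy as np

# hypothetical toy data, just to illustrate the identity
rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = rng.randn(20)
alpha = 1.0

# "hat" matrix of ridge without intercept: H = X (X^T X + alpha*I)^-1 X^T
H = X.dot(np.linalg.inv(X.T.dot(X) + alpha * np.eye(X.shape[1]))).dot(X.T)
y_hat = H.dot(y)

# leave-one-out residuals in closed form, without refitting n times
loo_residuals = (y - y_hat) / (1 - np.diag(H))

# brute-force check: refit with each point left out and compare
for i in range(len(y)):
    mask = np.arange(len(y)) != i
    w = np.linalg.solve(X[mask].T.dot(X[mask]) + alpha * np.eye(X.shape[1]),
                        X[mask].T.dot(y[mask]))
    assert np.isclose(loo_residuals[i], y[i] - X[i].dot(w))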

Using scipy.optimize.fmin_bfgs for a logistic regression classifier leads to a divide-by-zero error

I'm trying to implement my binary logistic regression classifier and I decided to use scipy.optimize.fmin_bfgs to minimize the objective function (loglikelihood) as stated in the following formula:
The gradient of this objective function is calculated as:
where:
Now I have my LogisticRegression class which has the sigmoid function to calculate the sigmoid, the loglikelihood function to calculate the loglikelihood, and gradient to calculate the gradient. Finally, I have my learn_classifier method which calls the optimize.fmin_bfgs function to find the best weight vector. My training data set consists of 2013 tuples and each tuple has 113 attributes where the first attribute is the outcome (taking either one or zero). Here is my code:
from features_reader import FeaturesReader
import numpy as np
from scipy import optimize
from scipy.optimize import check_grad

class LogisticRegression:
    def __init__(self, features_reader = FeaturesReader()):
        features_reader.read_features()
        fHeight = len(features_reader.feature_data)
        fWidth = len(features_reader.feature_data[0])
        tHeight = len(features_reader.test_data)
        tWidth = len(features_reader.test_data[0])
        self.training_data = np.zeros((fHeight, fWidth))
        self.testing_data = np.zeros((tHeight, tWidth))
        print 'training data size: ', self.training_data.shape
        print 'testing data size: ', self.testing_data.shape
        for index, item in enumerate(features_reader.feature_data):
            self.training_data[index, 0] = item['outcome']
            self.training_data[index, 1:] = np.array([value for key, value in item.items() if key!='outcome'])

    def sigmoid(self, v_x, v_weight):
        return 1.0/(1.0 + np.exp(-np.dot(v_x, v_weight[1:])+v_weight[0]))

    def loglikelihood(self, v_weight, v_x, v_y):
        return -1*np.sum(v_y*np.log(self.sigmoid(v_x, v_weight)) + (1-v_y)*(np.log(1-self.sigmoid(v_x, v_weight))))

    def gradient(self, v_weight, v_x, v_y):
        gradient = np.zeros(v_weight.shape[0])
        for row, y in zip(v_x,v_y):
            new_row = np.ones(1+row.shape[0])
            new_row[1:] = row
            y_prime = self.sigmoid(new_row[1:], v_weight)
            gradient+=(y_prime-y)*new_row
        return gradient

    def learn_classifier(self):
        result = optimize.fmin_bfgs(f=self.loglikelihood,
                                    x0=np.zeros(self.training_data.shape[1]),
                                    fprime=self.gradient,
                                    args=(self.training_data[:,1:], self.training_data[:,0]))
        return result

def main():
    features_reader = FeaturesReader(filename = 'features.csv', features_file = 'train_filter1.arff')
    logistic_regression = LogisticRegression(features_reader)
    result = logistic_regression.learn_classifier()
    print result

if __name__ == "__main__":
    main()
The FeaturesReader class is the parser that reads the CSV file, which I did not paste here. But I'm pretty sure the init function correctly parses the CSV into a 2-D numpy array that represents the training data. This 2-D array has shape (2013, 113), where the first column is the training output. When I run the learn_classifier function it gives these warnings and terminates:
training data size: (2013, 113)
testing data size: (4700, 113)
logistic_regression.py:26: RuntimeWarning: overflow encountered in exp
return 1.0/(1.0 + np.exp(-np.dot(v_x, v_weight[1:])+v_weight[0]))
logistic_regression.py:30: RuntimeWarning: divide by zero encountered in log
return -1*np.sum(v_y*np.log(self.sigmoid(v_x, v_weight)) + (1-v_y)*(np.log(1-self.sigmoid(v_x, v_weight))))
logistic_regression.py:30: RuntimeWarning: invalid value encountered in multiply
return -1*np.sum(v_y*np.log(self.sigmoid(v_x, v_weight)) + (1-v_y)*(np.log(1-self.sigmoid(v_x, v_weight))))
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: nan
Iterations: 1
Function evaluations: 32
Gradient evaluations: 32
So I got these three errors: 1. divide by zero encountered in log, 2. overflow encountered in exp, 3. invalid value encountered in multiply. And the algorithm terminates after the first iteration, which is abnormal. I have no clue why these are happening. Do you think I did something wrong in calculating the loglikelihood and gradient? Where else do you think these errors originate? To be more specific, in my loglikelihood function the v_weight parameter is assumed to be 1D (shape = 113), my v_x is 2D with shape (2013, 112) (because I do not count the outcome column), and my v_y is 1D with shape (2013,).
I've been dealing with this problem myself. The call to np.exp can very quickly create numbers large enough that they overflow a float64. Even using numpy's longdouble datatype (which may provide more resolution, depending on your CPU) I run into numerical issues.
I found that, at least in some cases, using feature scaling and normalization solves the numerical problems. If you have n features, for each one subtract the mean of that feature, then divide by its standard deviation. The numpy code is:
for col in range(X.shape[1]):
    X[:, col] -= X[:, col].mean()
    X[:, col] /= X[:, col].std()
There's probably a more vectorized way to do it without the explicit loop. : )
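For what it's worth, a vectorized version of that standardization (assuming X is a plain 2-D numpy array of features) could look like:

# subtract each column's mean and divide by its standard deviation, all at once
X = (X - X.mean(axis=0)) / X.std(axis=0)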
You can also look at my toy implementation of it here.
Alternatively, you could look at this technique, which describes taking the log of the exponential in order to prevent overflow.
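A sketch of that idea (my own formulation, not taken from the linked technique), using numpy's built-in logaddexp: since log(1 + exp(-z)) equals np.logaddexp(0, -z), the negative log-likelihood can be computed without ever forming exp of a huge number:

import numpy as np

def negative_loglikelihood(v_weight, v_x, v_y):
    # z is the linear score for each sample; note the question's code subtracts
    # the bias, while this sketch uses the more common w0 + x.w convention
    z = np.dot(v_x, v_weight[1:]) + v_weight[0]
    # -log(sigmoid(z))   = log(1 + exp(-z)) = logaddexp(0, -z)
    # -log(1-sigmoid(z)) = log(1 + exp(z))  = logaddexp(0, z)
    return np.sum(v_y * np.logaddexp(0, -z) + (1 - v_y) * np.logaddexp(0, z))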

Debugging backpropagation algorithm

I am trying to implement the back-propagation algorithm using numpy in Python. I have been using this site to implement the matrix form of back-propagation. While testing this code on XOR, my network does not converge even after multiple runs of thousands of iterations. I think there is some sort of logic error. I would be very grateful if anyone would be willing to look it over. Fully runnable code can be found on GitHub.
import numpy as np

def backpropagate(network, tests, iterations=50):
    #convert tests into numpy matrices
    tests = [(np.matrix(inputs, dtype=np.float64).reshape(len(inputs), 1),
              np.matrix(expected, dtype=np.float64).reshape(len(expected), 1))
             for inputs, expected in tests]
    for _ in range(iterations):
        #accumulate the weight and bias deltas
        weight_delta = [np.zeros(matrix.shape) for matrix in network.weights]
        bias_delta = [np.zeros(matrix.shape) for matrix in network.bias]
        #iterate over the tests
        for potentials, expected in tests:
            #input the potentials into the network
            #calling the network with trace == True returns a list of matrices,
            #representing the potentials of each layer
            trace = network(potentials, trace=True)
            errors = [expected - trace[-1]]
            #iterate over the layers backwards
            for weight_matrix, layer in reversed(list(zip(network.weights, trace))):
                #compute the error vector for a layer
                errors.append(np.multiply(weight_matrix.transpose()*errors[-1],
                                          network.sigmoid.derivative(layer)))
            #remove the input layer
            errors.pop()
            errors.reverse()
            #compute the deltas for bias and weight
            for index, error in enumerate(errors):
                bias_delta[index] += error
                weight_delta[index] += error * trace[index].transpose()
        #apply the deltas
        for index, delta in enumerate(weight_delta):
            network.weights[index] += delta
        for index, delta in enumerate(bias_delta):
            network.bias[index] += delta
Additionally, here is the code that computes the output, and my sigmoid function. It is less likely that the bug lies here; I was able to train a network to simulate XOR using simulated annealing.
# the call function of the neural network
def __call__(self, potentials, trace=True):
    #ensure the input is properly formatted
    potentials = np.matrix(potentials, dtype=np.float64).reshape(len(potentials), 1)
    #accumulate the trace
    trace = [potentials]
    #iterate over the weights
    for index, weight_matrix in enumerate(self.weights):
        potentials = weight_matrix * potentials + self.bias[index]
        potentials = self.sigmoid(potentials)
        trace.append(potentials)
    return trace

#The sigmoid function that is stored in the network
def sigmoid(x):
    return np.tanh(x)
sigmoid.derivative = lambda x : (1 - np.square(x))
The problem is a missing step-size parameter. The gradient should additionally be scaled so that you don't take the whole step in weight space at once. So instead of network.weights[index] += delta and network.bias[index] += delta, it should be:
def backpropagate(network, tests, stepSize = 0.01, iterations=50):
    #...
    network.weights[index] += stepSize * delta
    #...
    network.bias[index] += stepSize * delta
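As a toy illustration of why the scaling matters (unrelated to the network code above), stepping by the raw gradient can blow up while a scaled step converges:

# minimize f(w) = 2*w**2, whose gradient is 4*w
w_raw, w_scaled = 1.0, 1.0
stepSize = 0.01
for _ in range(20):
    w_raw    -= 4 * w_raw                 # full raw-gradient step: |w| triples every iteration
    w_scaled -= stepSize * 4 * w_scaled   # scaled step: w shrinks by 4% every iteration
print(w_raw)      # huge (about 3.5e9) -> diverged
print(w_scaled)   # close to 0 -> converging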
