I am performing a least squares regression as below (univariate). I would like to express the significance of the result in terms of R^2. Numpy returns a value of unscaled residual, what would be a sensible way of normalizing this.
field_clean,back_clean = rid_zeros(backscatter,field_data)
num_vals = len(field_clean)
x = field_clean[:,row:row+1]
y = 10*log10(back_clean)
A = hstack([x, ones((num_vals,1))])
soln = lstsq(A, y )
m, c = soln [0]
residues = soln [1]
print residues
See http://en.wikipedia.org/wiki/Coefficient_of_determination
Your R2 value =
1 - residual / sum((y - y.mean())**2)
which is equivalent to
1 - residual / (n * y.var())
As an example:
import numpy as np
# Make some data...
n = 10
x = np.arange(n)
y = 3 * x + 5 + np.random.random(n)
# Note that polyfit is an easier way to do this...
# It would just be "model, resid = np.polyfit(x,y,1,full=True)[:2]"
A = np.vstack((x, np.ones(n))).T
model, resid = np.linalg.lstsq(A, y)[:2]
r2 = 1 - resid / (y.size * y.var())
print r2
Related
I wrote some code that performs gradient descent on a couple of data points.
For some reason the curve is not converging correctly, but I have no idea why that is. I always end up with an exploding tail.
Am I doing one of the computations wrong? Am I actually getting stuck in a local minimum or is it something else?
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
def estimate(weights, x, order):
est = 0
for i in range(order):
est += weights[i] * x ** i
return est
def cost_function(x, y, weights, m):
cost = 0
for i in range(m-1):
cost += (((weights[i] * x ** i) - y) ** 2)
return (np.sum(cost ** 2) / ( 2 * m ))
def descent(A, b, iterations, descent_rate, order):
x = A.T[0]
y = b.reshape(4)
# features
ones = np.vstack(np.ones(len(A)))
x = np.vstack(A.T[0])
x2 = np.vstack(A.T[0] ** 2)
# Our feature matrix
features = np.concatenate((ones,x,x2), axis = 1).T
# Initialize our coefficients to zero
weights = np.zeros(order + 1)
m = len(y)
# gradient descent
for i in range(iterations):
est = estimate(weights, x, order).T
difference = est - y
weights = weights + (-descent_rate * (1/m) * np.matmul(difference, features.T)[0])
cost = cost_function(x, y, weights, m)
print(cost)
plt.scatter(x,y)
u = np.linspace(0,3,100)
plt.plot(u, (u ** 2) * weights[2] + u * weights[1] + weights[0], '-')
plt.show()
A = np.array(((0,1),
(1,1),
(2,1),
(3,1)))
b = np.array((1,2,0,3), ndmin = 2 ).T
iterations = 150
descent_rate = 0.01
order = 2
descent(A, b, iterations, descent_rate, order)
I would like to avoid getting stuck in such a minimum. I have attempted setting the initial weights to random values but to no avail, sometimes it dips a bit more but then gives me the same behaviour again.
Here is the one of the plots that I am getting:
And here is the expected result obtained by a least squares solution:
Your estimate function should be
def estimate(weights, x, order):
est = 0
for i in range(order+1):
est += weights[i] * x ** i
return est
Better yet, since the order information is already present in the size of the weights vector, remove the redundancy with:
def estimate(weights, x):
est = 0
for i in range(len(weights)):
est += weights[i] * x ** i
return est
This is what I got when using your code and running 2000 iterations:
I made a gradient descent algorithm in Python and it doesn't work. My m and b values keep increasing and never stop until I get the -inf error or the overflow encountered in square error.
import numpy as np
x = np.array([2,3,4,5])
y = np.array([5,7,9,5])
m = np.random.randn()
b = np.random.randn()
error = 0
lr = 0.0001
for q in range(1000):
for i in range(len(x)):
ypred = m*x[i] + b
error += (ypred - y[i]) **2
m = m - (x * error) *lr
b = b - (lr * error)
print(b,m)
I expected my algorithm to return the best m and b values for my data (x and y) but it didn't work. What is going wrong?
import numpy as np
x = np.array([2,3,4,5])
y = 0.3*x+0.6
m = np.random.randn()
b = np.random.randn()
lr = 0.001
for q in range(100000):
ypred = m*x + b
error = (1./(2*len(x))) * np.sum(np.square(ypred - y)) #eq 1
m = m - lr * np.sum((ypred - y)*x)/len(x) # eq 2 and eq 4
b = b - lr * np.sum(ypred - y)/len(x) # eq 3 and eq 5
print (m , b)
Output:
0.30007724168011807 0.5997039817571881
Math behind it
Use numpy vectorized operations to avoid loops.
I think you implemented the formula incorrectly:
Use summation on x - error
divide by length of x
See below code:
import numpy as np
x = np.array([2,3,4,5])
y = np.array([5,7,9,11])
m = np.random.randn()
b = np.random.randn()
error = 0
lr = 0.1
print(b, m)
for q in range(1000):
ypred = []
for i in range(len(x)):
temp = m*x[i] + b
ypred.append(temp)
error += temp - y[i]
m = m - np.sum(x * (ypred-y)) *lr/len(x)
b = b - np.sum(lr * (ypred-y))/len(x)
print(b,m)
Output:
-1.198074371762264 0.058595039571115955 # initial weights
0.9997389097653074 2.0000681277214487 # Final weights
I'm starting the ML journey and I'm having troubles with this coding exercise
here is my code
import numpy as np
import pandas as pd
import scipy.optimize as op
# Read the data and give it labels
data = pd.read_csv('ex2data2.txt', header=None, name['Test1', 'Test2', 'Accepted'])
# Separate the features to make it fit into the mapFeature function
X1 = data['Test1'].values.T
X2 = data['Test2'].values.T
# This function makes more features (degree)
def mapFeature(x1, x2):
degree = 6
out = np.ones((x1.shape[0], sum(range(degree + 2))))
curr_column = 1
for i in range(1, degree + 1):
for j in range(i+1):
out[:,curr_column] = np.power(x1, i-j) * np.power(x2, j)
curr_column += 1
return out
# Separate the data into training and target, also initialize theta
X = mapFeature(X1, X2)
y = np.matrix(data['Accepted'].values).T
m, n = X.shape
cols = X.shape[1]
theta = np.matrix(np.zeros(cols))
#Initialize the learningRate(sigma)
learningRate = 1
# Define the Sigmoid Function (Output between 0 and 1)
def sigmoid(z):
return 1 / (1 + np.exp(-z))
def cost(theta, X, y, learningRate):
# This is require to make the optimize function work
theta = theta.reshape(-1, 1)
error = sigmoid(X # theta)
first = np.multiply(-y, np.log(error))
second = np.multiply(1 - y, np.log(1 - error))
j = np.sum((first - second)) / m + (learningRate * np.sum(np.power(theta, 2)) / 2 * m)
return j
# Define the gradient of the cost function
def gradient(theta, X, y, learningRate):
# This is require to make the optimize function work
theta = theta.reshape(-1, 1)
error = sigmoid(X # theta)
grad = (X.T # (error - y)) / m + ((learningRate * theta) / m)
grad_no = (X.T # (error - y)) / m
grad[0] = grad_no[0]
return grad
Result = op.minimize(fun=cost, x0=theta, args=(X, y, learningRate), method='TNC', jac=gradient)
opt_theta = np.matrix(Result.x)
def predict(theta, X):
sigValue = sigmoid(X # theta.T)
p = sigValue >= 0.5
return p
p = predict(opt_theta, X)
print('Train Accuracy: {:f}'.format(np.mean(p == y) * 100))
So, when the learningRate = 1, the accuracy should be around 83,05% but I'm getting 80.5% and when the learningRate = 0, the accuracy should be 91.52% but I'm getting 87.28%
So the question is What am I doing wrong? Why my accuracy is below the problem default answer?
Hope someone can guide me in the right direction. Thanks!
P.D: Here is the dataset, maybe it can help
https://raw.githubusercontent.com/TheGirlWhiteWithBandages/Machine-Learning-Algorithms/master/Logistic%20Regression/ex2data2.txt
Hey guys I found a way to make it even better!
Here is the code
import numpy as np
import pandas as pd
import scipy.optimize as op
from sklearn.preprocessing import PolynomialFeatures
# Read the data and give it labels
data = pd.read_csv('ex2data2.txt', header=None, names=['Test1', 'Test2', 'Accepted'])
# Separate the data into training and target
X = (data.iloc[:, 0:2]).values
y = (data.iloc[:, 2:3]).values
# Modify the features to a certain degree (Polynomial)
poly = PolynomialFeatures(6)
m = y.size
XX = poly.fit_transform(data.iloc[:, 0:2].values)
# Initialize Theta
theta = np.zeros(XX.shape[1])
# Define the Sigmoid Function (Output between 0 and 1)
def sigmoid(z):
return(1 / (1 + np.exp(-z)))
# Define the Regularized cost function
def costFunctionReg(theta, reg, *args):
# This is require to make the optimize function work
h = sigmoid(XX # theta)
first = np.log(h).T # - y
second = np.log(1 - h).T # (1 - y)
J = (1 / m) * (first - second) + (reg / (2 * m)) * np.sum(np.square(theta[1:]))
return J
# Define the Regularized gradient function
def gradientReg(theta, reg, *args):
theta = theta.reshape(-1, 1)
h = sigmoid(XX # theta)
grad = (1 / m) * (XX.T # (h - y)) + (reg / m) * np.r_[[[0]], theta[1:]]
return grad.flatten()
# Define the predict Function
def predict(theta, X):
sigValue = sigmoid(X # theta.T)
p = sigValue >= 0.5
return p
# A loop to test between different values for sigma (reg parameter)
for i, Sigma in enumerate([0, 1, 100]):
# Optimize costFunctionReg
res2 = op.minimize(costFunctionReg, theta, args=(Sigma, XX, y), method=None, jac=gradientReg)
# Get the accuracy of the model
accuracy = 100 * sum(predict(res2.x, XX) == y.ravel()) / y.size
# Get the Error between different weights
error1 = costFunctionReg(res2.x, Sigma, XX, y)
# print the accuracy and error
print('Train accuracy {}% with Lambda = {}'.format(np.round(accuracy, decimals=4), Sigma))
print(error1)
Thanks for all your help!
try out this:
# import library
import pandas as pd
import numpy as np
dataset = pd.read_csv('ex2data2.csv',names = ['Test #1','Test #2','Accepted'])
# splitting to x and y variables for features and target variable
x = dataset.iloc[:,:-1].values
y = dataset.iloc[:,-1].values
print('x[0] ={}, y[0] ={}'.format(x[0],y[0]))
m, n = x.shape
print('#{} Number of training samples, #{} features per sample'.format(m,n))
# import library FeatureMapping
from sklearn.preprocessing import PolynomialFeatures
# We also add one column of ones to interpret theta 0 (x with power of 0 = 1) by
include_bias as True
pf = PolynomialFeatures(degree = 6, include_bias = True)
x_poly = pf.fit_transform(x)
pd.DataFrame(x_poly).head(5)
m,n = x_poly.shape
# define theta as zero
theta = np.zeros(n)
# define hyperparameter λ
lambda_ = 1
# reshape (-1,1) because we just have one feature in y column
y = y.reshape(-1,1)
def sigmoid(z):
return 1/(1+np.exp(-z))
def lr_hypothesis(x,theta):
return np.dot(x,theta)
def compute_cost(theta,x,y,lambda_):
theta = theta.reshape(n,1)
infunc1 = -y*(np.log(sigmoid(lr_hypothesis(x,theta)))) - ((1-y)*(np.log(1 - sigmoid(lr_hypothesis(x,theta)))))
infunc2 = (lambda_*np.sum(theta[1:]**2))/(2*m)
j = np.sum(infunc1)/m+ infunc2
return j
# gradient[0] correspond to gradient for theta(0)
# gradient[1:] correspond to gradient for theta(j) j>0
def compute_gradient(theta,x,y,lambda_):
gradient = np.zeros(n).reshape(n,)
theta = theta.reshape(n,1)
infunc1 = sigmoid(lr_hypothesis(x,theta))-y
gradient_in = np.dot(x.transpose(),infunc1)/m
gradient[0] = gradient_in[0,0] # theta(0)
gradient[1:] = gradient_in[1:,0]+(lambda_*theta[1:,]/m).reshape(n-1,) # theta(j) ; j>0
gradient = gradient.flatten()
return gradient
You can now test your cost and gradient without optimization. Th below code will optimize the model:
# hyperparameters
m,n = x_poly.shape
# define theta as zero
theta = np.zeros(n)
# define hyperparameter λ
lambda_array = [0, 1, 10, 100]
import scipy.optimize as opt
for i in range(0,len(lambda_array)):
# Train
print('======================================== Iteration {} ===================================='.format(i))
optimized = opt.minimize(fun = compute_cost, x0 = theta, args = (x_poly, y,lambda_array[i]),
method = 'TNC', jac = compute_gradient)
new_theta = optimized.x
# Prediction
y_pred_train = predictor(x_poly,new_theta)
cm_train = confusion_matrix(y,y_pred_train)
t_train,f_train,acc_train = acc(cm_train)
print('With lambda = {}, {} correct, {} wrong ==========> accuracy = {}%'
.format(lambda_array[i],t_train,f_train,acc_train*100))
Now you should see output like this :
=== Iteration 0 === With lambda = 0, 104 correct, 14 wrong ==========> accuracy = 88.13559322033898%
=== Iteration 1 === With lambda = 1, 98 correct, 20 wrong ==========> accuracy = 83.05084745762711%
=== Iteration 2 === With lambda = 10, 88 correct, 30 wrong ==========> accuracy = 74.57627118644068%
=== Iteration 3 === With lambda = 100, 72 correct, 46 wrong ==========> accuracy = 61.016949152542374%
Writing this algorithm for my final year project. Used gradient descent to find the best fit line. I tried solving it with excel too using Multi-regression. The values are different.
The csv file is attached here https://drive.google.com/file/d/1-UaU34w3c5-VunYrVz9fD7vRb0c-XDqk/view?usp=sharing. The first 3 columns are independent variables (x1,x2,x3) and the last is dependent (y).
Its a different question, If you could explain why the answer is different from excel values?
import numpy as np
import random
import pandas as pd
def gradientDescent(x, y, theta, alpha, m, numIterations):
xTrans = x.transpose()
for i in range(0, numIterations):
hypothesis = np.dot(x, theta)
loss = hypothesis - y
cost = np.sum(loss ** 2) / (2 * m)
print("Iteration %d | Cost: %f" % (i, cost))
gradient = np.dot(xTrans, loss) / m
theta = theta - alpha * gradient
return theta
df = pd.read_csv(r'C:\Users\WELCOME\Desktop\FinalYearPaper\ConferencePaper\NewTrain.csv', 'rU', delimiter=",",header=None)
df.columns = ['x0','Speed','Feed','DOC','Roughness']
print(df)
y = np.array(df['Roughness'])
#x = np.array(d)
x = np.array(df.drop(['Roughness'],1))
#x[:,2:3] = 1.0
print (x)
print(y)
m, n = np.shape(x)
print(m,n)
numIterations= 50000
alpha = 0.000001
theta = np.ones(n)
theta = gradientDescent(x, y, theta, alpha, m, numIterations)
print(theta)
I was trying to match the orthogonal polynomials in the following code in R:
X <- cbind(1, poly(x = x, degree = 9))
but in python.
To do this I implemented my own method for giving orthogonal polynomials:
def get_hermite_poly(x,degree):
#scipy.special.hermite()
N, = x.shape
##
X = np.zeros( (N,degree+1) )
for n in range(N):
for deg in range(degree+1):
X[n,deg] = hermite( n=deg, z=float(x[deg]) )
return X
though it does not seem to match it. Does someone know type of orthogonal polynomial it uses? I tried search in the documentation but didn't say.
To give some context I am trying to implement the following R code in python (https://stats.stackexchange.com/questions/313265/issue-with-convergence-with-sgd-with-function-approximation-using-polynomial-lin/315185#comment602020_315185):
set.seed(1234)
N <- 10
x <- seq(from = 0, to = 1, length = N)
mu <- sin(2 * pi * x * 4)
y <- mu
plot(x,y)
X <- cbind(1, poly(x = x, degree = 9))
# X <- sapply(0:9, function(i) x^i)
w <- rnorm(10)
learning_rate <- function(t) .1 / t^(.6)
n_samp <- 2
for(t in 1:100000) {
mu_hat <- X %*% w
idx <- sample(1:N, n_samp)
X_batch <- X[idx,]
y_batch <- y[idx]
score_vec <- t(X_batch) %*% (y_batch - X_batch %*% w)
change <- score_vec * learning_rate(t)
w <- w + change
}
plot(mu_hat, ylim = c(-1, 1))
lines(mu)
fit_exact <- predict(lm(y ~ X - 1))
lines(fit_exact, col = 'red')
abs(w - coef(lm(y ~ X - 1)))
because it seems to be the only one that works with gradient descent with linear regression with polynomial features.
I feel that any orthogonal polynomial (or at least orthonormal) should work and give a hessian with condition number 1 but I can't seem to make it work in python. Related question: How does one use Hermite polynomials with Stochastic Gradient Descent (SGD)?
poly uses QR factorization, as described in some detail in this answer.
I think that what you really seem to be looking for is how to replicate the output of R's poly using python.
Here I have written a function to do that based on R's implementation. I have also added some comments so that you can see the what the equivalent statements in R look like:
import numpy as np
def poly(x, degree):
xbar = np.mean(x)
x = x - xbar
# R: outer(x, 0L:degree, "^")
X = x[:, None] ** np.arange(0, degree+1)
#R: qr(X)$qr
q, r = np.linalg.qr(X)
#R: r * (row(r) == col(r))
z = np.diag((np.diagonal(r)))
# R: Z = qr.qy(QR, z)
Zq, Zr = np.linalg.qr(q)
Z = np.matmul(Zq, z)
# R: colSums(Z^2)
norm1 = (Z**2).sum(0)
#R: (colSums(x * Z^2)/norm2 + xbar)[1L:degree]
alpha = ((x[:, None] * (Z**2)).sum(0) / norm1 +xbar)[0:degree]
# R: c(1, norm2)
norm2 = np.append(1, norm1)
# R: Z/rep(sqrt(norm1), each = length(x))
Z = Z / np.reshape(np.repeat(norm1**(1/2.0), repeats = x.size), (-1, x.size), order='F')
#R: Z[, -1]
Z = np.delete(Z, 0, axis=1)
return [Z, alpha, norm2];
Checking that this works:
x = np.arange(10) + 1
degree = 9
poly(x, degree)
The first row of the returned matrix is
[-0.49543369, 0.52223297, -0.45342519, 0.33658092, -0.21483446,
0.11677484, -0.05269379, 0.01869894, -0.00453516],
compared to the same operation in R
poly(1:10, 9)
# [1] -0.495433694 0.522232968 -0.453425193 0.336580916 -0.214834462
# [6] 0.116774842 -0.052693786 0.018698940 -0.004535159