Predict function for multiple linear regression - python

I am trying to write a predict function for a homework problem. It should take the dot product of a matrix (x) and a vector (y) and insert the results into a NumPy array:
def predict(x, y):
    y_hat = np.empty
    for j in range(len(y)):
        y_hat[i] = np.dot(x, y)
    return y_hat
There is an error message on y_hat[i] = np.dot(x,y)

There are two errors in the code:
np.empty() takes the desired shape as its argument. Here you need np.empty([len(y), len(x)]): if x is a matrix and y is a vector, np.dot(x, y) is a vector of length len(x), so this allocates a placeholder row for each result of np.dot().
The variable i is not defined; the loop variable is j.
so:
def predict(x, y):
    y_hat = np.empty([len(y), len(x)])
    for j in range(len(y)):
        y_hat[j] = np.dot(x, y)
    return y_hat
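As a quick shape check (an illustrative sketch with made-up values, not part of the original answer): with a 3x2 matrix and a length-2 vector, np.dot(x, y) has length len(x) == 3, so predict returns a (2, 3) array containing one such row per element of y:

import numpy as np

x = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # shape (3, 2)
y = np.array([0.5, -1.0])                            # shape (2,)

print(np.dot(x, y))          # length-3 vector
print(predict(x, y).shape)   # (2, 3)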

Related

Understanding JAX argnums parameter in its gradient function

I'm trying to understand the behaviour of argnums in JAX's gradient function.
Suppose I have the following function:
def make_mse(x, t):
    def mse(w, b):
        return np.sum(jnp.power(x.dot(w) + b - t, 2))/2
    return mse
And I'm taking the gradient in the following way:
w_gradient, b_gradient = grad(make_mse(train_data, y), (0,1))(w,b)
argnums=(0, 1) in this case, but what does it mean? With respect to which variables is the gradient calculated? What would be the difference if I used argnums=0 instead?
Also, can I use the same function to get the Hessian matrix?
I looked at the JAX help section about it, but couldn't figure it out.
When you pass multiple argnums to grad, the result is a function that returns a tuple of gradients, equivalent to computing each one separately:
def f(x, y):
    return x ** 2 + x * y + y ** 2

df_dxy = grad(f, argnums=(0, 1))
df_dx = grad(f, argnums=0)
df_dy = grad(f, argnums=1)

x = 3.0
y = 4.25
assert df_dxy(x, y) == (df_dx(x, y), df_dy(x, y))
If you want to compute mixed second derivatives, you can do this by repeatedly applying the gradient:
d2f_dxdy = grad(grad(f, argnums=0), argnums=1)
assert d2f_dxdy(x, y) == 1
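On the Hessian part of the question: besides composing grad with itself, JAX provides jax.hessian, which follows the same argnums convention. A short sketch, using the same f as above (an illustration, not part of the original answer):

from jax import hessian

def f(x, y):
    return x ** 2 + x * y + y ** 2

x = 3.0
y = 4.25

# Second derivative with respect to the first argument only.
d2f_dx2 = hessian(f, argnums=0)
print(d2f_dx2(x, y))            # 2.0

# Hessian over both arguments, returned as nested tuples
# ((d2f/dx2, d2f/dxdy), (d2f/dydx, d2f/dy2)) -> values ((2.0, 1.0), (1.0, 2.0)).
H = hessian(f, argnums=(0, 1))
print(H(x, y))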

How to return intermediate gradients (for non-leaf nodes) in pytorch?

My question concerns the syntax of pytorch's register_hook.
x = torch.tensor([1.], requires_grad=True)
y = x**2
z = 2*y
x.register_hook(print)
y.register_hook(print)
z.backward()
outputs:
tensor([2.])
tensor([4.])
This snippet simply prints the gradients of z w.r.t. x and y.
Now my (most likely trivial) question is how to return the intermediate gradients (rather than only printing)?
UPDATE:
It appears that calling retain_grad() solves the issue for leaf nodes, e.g. y.retain_grad().
However, retain_grad does not seem to solve it for non-leaf nodes. Any suggestions?
I think you can use those hooks to store the gradients in a global variable:
grads = []
x = torch.tensor([1.], requires_grad=True)
y = x**2 + 1
z = 2*y
x.register_hook(lambda d: grads.append(d))
y.register_hook(lambda d: grads.append(d))
z.backward()
But you most likely also need to remember which tensor each gradient was computed for. In that case, we can slightly extend the above by using a dict instead of a list:
grads = {}
x = torch.tensor([1.,2.], requires_grad=True)
y = x**2 + 1
z = 2*y
def store(grad, parent):
    print(grad, parent)
    grads[parent] = grad.clone()

x.register_hook(lambda grad: store(grad, x))
y.register_hook(lambda grad: store(grad, y))
z.sum().backward()
Now you can, for example, access tensor y's grad simply using grads[y].
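Continuing the dict example, here is a quick check of what ends up in grads after the backward pass (the values follow from z = 2*(x**2 + 1); this is an illustration, not part of the original answer):

print(grads[y])   # tensor([2., 2.])  -> d(z.sum())/dy
print(grads[x])   # tensor([4., 8.])  -> d(z.sum())/dx = 4*x for x = [1., 2.]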

Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

I'm working through my Matlab code for the Andrew Ng Coursera course and turning it into Python. I am working on non-regularized logistic regression, and after writing my gradient and cost functions I needed something similar to fminunc; after some googling, I found a couple of options. They both return the same results, but those results do not match Andrew Ng's expected-results code. Others seem to be getting this to work correctly, but I'm wondering why my specific code does not return the desired result when using the scipy.optimize functions, even though it does for the cost and gradient pieces earlier in the code.
The data I'm using can be found at the link below;
ex2data1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op
#Machine Learning Online Class - Exercise 2: Logistic Regression
#Load Data
#The first two columns contains the exam scores and the third column contains the label.
data = pd.read_csv('ex2data1.txt', header = None)
X = np.array(data.iloc[:, 0:2]) #100 x 2
y = np.array(data.iloc[:,2]) #100 x 1
y.shape = (len(y), 1)
#Creating sub-dataframes for plotting
pos_plot = data[data[2] == 1]
neg_plot = data[data[2] == 0]
#==================== Part 1: Plotting ====================
#We start the exercise by first plotting the data to understand the
#problem we are working with.
print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.')
plt.plot(pos_plot[0], pos_plot[1], "+", label = "Admitted")
plt.plot(neg_plot[0], neg_plot[1], "o", label = "Not Admitted")
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend()
plt.show()
def sigmoid(z):
    '''
    SIGMOID Compute sigmoid function
    g = SIGMOID(z) computes the sigmoid of z.
    Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    vector or scalar).
    '''
    g = 1 / (1 + np.exp(-z))
    return g

def costFunction(theta, X, y):
    '''
    COSTFUNCTION Compute cost and gradient for logistic regression
    J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
    parameter for logistic regression and the gradient of the cost
    w.r.t. to the parameters.
    '''
    m = len(y) #number of training examples
    h = sigmoid(X.dot(theta)) #logistic regression hypothesis
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
    #h is 100x1, y is 100x1, these end up as 2 vectors we subtract from each other
    #then we sum the values by rows
    #cost function for logistic regression
    return J

def gradient(theta, X, y):
    m = len(y)
    grad = np.zeros((theta.shape))
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)): #number of rows in theta
        XT = X[:,i]
        XT.shape = (len(X),1)
        grad[i] = (1/m) * np.sum((h-y)*XT) #updating each row of the gradient
    return grad
#============ Part 2: Compute Cost and Gradient ============
#In this part of the exercise, you will implement the cost and gradient
#for logistic regression. You need to complete the code in costFunction.m
#Add intercept term to x and X_test
Bias = np.ones((len(X), 1))
X = np.column_stack((Bias, X))
#Initialize fitting parameters
initial_theta = np.zeros((len(X[0]), 1))
#Compute and display initial cost and gradient
(cost, grad) = costFunction(initial_theta, X, y), gradient(initial_theta, X, y)
print('Cost at initial theta (zeros): %f' % cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros):')
print(grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628')
#Compute and display cost and gradient with non-zero theta
test_theta = np.array([[-24], [0.2], [0.2]]);
(cost, grad) = costFunction(test_theta, X, y), gradient(test_theta, X, y)
print('\nCost at test theta: %f' % cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta:')
print(grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')
result = op.fmin_tnc(func = costFunction, x0 = initial_theta, fprime = gradient, args = (X,y))
result[1]
Result = op.minimize(fun = costFunction,
                     x0 = initial_theta,
                     args = (X, y),
                     method = 'TNC',
                     jac = gradient, options={'gtol': 1e-3, 'disp': True, 'maxiter': 1000})
theta = Result.x
theta
test = np.array([[1, 45, 85]])
prob = sigmoid(test.dot(theta))
print('For a student with scores 45 and 85, we predict an admission probability of %f,' % prob)
print('Expected value: 0.775 +/- 0.002\n')
This was a very difficult problem to debug, and illustrates a poorly documented aspect of the scipy.optimize interface. The documentation vaguely indicates that theta will be passed around as a vector:
Minimization of scalar function of one or more variables.
In general, the optimization problems are of the form:
minimize f(x) subject to
g_i(x) >= 0, i = 1,...,m
h_j(x) = 0, j = 1,...,p
where x is a vector of one or more variables.
What's important is that they really mean vector in the most primitive sense, a 1-dimensional array. So you have to expect that whenever theta is passed into one of your callbacks, it will be passed in as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, from 2-d column arrays).
I don't know exactly why it's causing a problem in your case, but it's easily fixed regardless. You just have to add the following at the top of both your cost function and your gradient function:
theta = theta.reshape(-1, 1)
This guarantees that theta will be a 2-d column array, as expected. Once you've done this, the results are correct.
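Concretely, the patched functions from the question would look something like this (a sketch; only the theta = theta.reshape(-1, 1) line at the top of each body is new):

def costFunction(theta, X, y):
    theta = theta.reshape(-1, 1)  #<-- force theta back into a 2-d column array
    m = len(y)
    h = sigmoid(X.dot(theta))
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
    return J

def gradient(theta, X, y):
    theta = theta.reshape(-1, 1)  #<-- same fix here
    m = len(y)
    grad = np.zeros((theta.shape))
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)):
        XT = X[:, i]
        XT.shape = (len(X), 1)
        grad[i] = (1/m) * np.sum((h - y)*XT)
    return grad

The second answer below makes the related point that returning grad.ravel() (a 1-d array) from the gradient function avoids trouble with some of the other optimization methods.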
I have had similar issues with SciPy dealing with the same problem as you. As senderle points out, the interface is not the easiest to deal with, especially combined with the numpy array interface... Here is my implementation, which works as expected.
Defining the cost and gradient functions
Note that initial_theta is passed as a simple array of shape (3,) and converted to a column vector of shape (3,1) within the function. The gradient function then returns grad.ravel(), which has shape (3,) again. This is important, as doing otherwise caused an error message with various optimization methods in scipy.optimize.
Note that different methods have different behaviours but returning .ravel() seems to fix most issues...
import pandas as pd
import numpy as np
import scipy.optimize as opt

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def CostFunc(theta, X, y):
    #Initializing variables
    m = len(y)
    J = 0
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    J = (1/m) * ((-y.T @ np.log(h)) - (1 - y).T @ np.log(1-h))
    return J

def Gradient(theta, X, y):
    #Initializing variables
    m = len(y)
    theta = theta[:, np.newaxis]
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    grad = (1/m) * (X.T @ (h - y))
    return grad.ravel() #<-- This is the trick
Initializing variables and parameters
Note that initial_theta.shape returns (3,)
X = data1.iloc[:,0:2].values
m,n = X.shape
X = np.concatenate((np.ones(m)[:,np.newaxis],X),1)
y = data1.iloc[:,-1].values[:,np.newaxis]
initial_theta = np.zeros((n+1))
Calling Scipy.optimize
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
Any comments from more knowledgeable people are welcome; this SciPy interface is a mystery to me. Thanks!
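As a usage note (not part of the original answer): opt.minimize returns an OptimizeResult, so with the model object above, the fitted parameters and the exercise's test prediction can be read out roughly like this:

theta = model.x                  # fitted parameters as a 1-d array of shape (3,)
print(model.fun)                 # final value of the cost function

test = np.array([1, 45, 85])     # bias term plus the two exam scores
prob = sigmoid(test @ theta)     # predicted admission probability, expected ~0.775 per the exercise
print(prob)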

Passing an array where single value is expected?

I am trying to implement a simple optimization problem in Python. I am currently struggling with the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
As far as I understand, this means that I am somewhere trying to plug in an array where only a single value can be accepted. Nevertheless, I haven't managed to come up with a solution, nor have I discovered where the problem is.
My code follows
def validationCurve(X, y, Xval, yval):
    #[lmbda_vec, error_train, error_val] =
    # VALIDATIONCURVE(X, y, Xval, yval) returns the train and
    # validation errors (in error_train, error_val) for different
    # values of lmbda. Given the training set (X,y) and validation
    # set (Xval, yval).
    lmbda_vec = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1];
    m = len(y);
    X = numpy.concatenate((numpy.ones((m,1)), X), axis = 1);
    n = len(Xval);
    Xval = numpy.concatenate((numpy.ones((n,1)), Xval), axis = 1);
    error_train = numpy.zeros((len(lmbda_vec), 1));
    error_val = numpy.zeros((len(lmbda_vec), 1));
    for i in range(0, len(lmbda_vec)):
        lmbda = lmbda_vec[i];
        theta = trainLinearReg(X, y, lmbda);
        error_train[i] = linearRegCostFunction(X, y, theta, lmbda);
        error_val[i] = linearRegCostFunction(Xval, yval, theta, lmbda);
    return lmbda_vec, error_train, error_val

def trainLinearReg(X, y, lmbda):
    #[theta] = TRAINLINEARREG (X, y, lmbda) trains linear
    # regression using the dataset (X, y) and regularization
    # parameter lmbda. Returns the trained parameters theta.
    alpha = 1 # learning rate
    num_iters = 200 # number of iterations
    initial_theta = (numpy.zeros((len(X[0,:]),1))) #initial guess
    #Create "short hand" for the cost function to be minimized
    costFunction = lambda t: linearRegCostFunction(X, y, t, lmbda);
    #Minimize using Conjugate Gradient
    theta = minimize(costFunction, initial_theta, method = 'Newton-CG',
                     jac = True, options = {'maxiter': 200})
    return theta

def linearRegCostFunction(X, y, theta, lmbda):
    # [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lmbda)
    # computes the cost of using theta as the parameter for
    # linear regression to fit the data points in X and y.
    # Returns the cost in J and the gradient in grad.
    # Initialize some useful values
    m, n = X.shape; # number of training examples
    J = 0;
    grad = numpy.zeros((n, 1))
    J = numpy.dot((y - X @ theta).T, (y - X @ theta)) + lmbda*(theta[1:].T @ theta[1:])
    J = J/m
    grad = (X.T @ (y - X @ theta))/m
    grad[1:] += (lmbda*theta[1:])/m
    grad = grad[:];
    return grad
I am trying to obtain an optimal regularization parameter by computing the cost function and minimizing it with respect to theta.
My input values are:
X.shape = (100,25), y.shape = (100,1)
Xval.shape = (55,25), yval.shape = (55,1)
Outputted errors are:
--> 129 lmbda_vec, error_train, error_val = validationCurve(Xtrain, ytrain, Xvalid, yvalid)
---> 33 theta = trainLinearReg(X, y, lmbda);
---> 49 theta = minimize(costFunction, initial_theta, method = 'Newton-CG', jac = True, options = {'maxiter': 200})
Later I want to use the optimized model to predict y on new X.
Could you please advise me where the problem in my code is?
Also, if you observe any points for improvement in my code, please let me know. I will be glad to hear them and improve.
Thank you!
Nimitz14 hit the general explanation: you've supplied an array where a scalar is required. This caused a run-time error. The problem is how to fix it from here.
First, try slapping your favourite debugger on the problem, so you can stop the program at useful spots and figure out exactly what array is causing the problem. This should help you determine where it originated.
Failing that, place some strategic print statements along the call route, printing each argument just before the function call. Then examine the signature (call sequence) of each function, and see where you might have given an array in place of a scalar. I don't see it, but ...
Is it possible that you've somehow redefined True as an array?
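If it helps, here is a minimal sketch of that print-statement idea, using the variable names from the question (an illustration, not part of the original answer). Since jac = True tells minimize that the objective returns a (cost, gradient) pair, checking what costFunction actually returns is a good first step:

# Inside trainLinearReg, just before the call to minimize:
print('X', X.shape, 'y', y.shape, 'initial_theta', initial_theta.shape)
ret = costFunction(initial_theta)
print('costFunction returns:', type(ret), numpy.shape(ret))   # with jac=True this should be a (cost, gradient) tuple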

python lmfit "object too deep for desired array"

I am trying out lmfit, using the example problem below. In this example, I am simply solving for x in a system Ax = y. Here A is a 3*2 array and y is a 3*1 array. I have declared all of them as arrays.
import numpy as np
from lmfit import minimize, Parameters
A = np.array([1,2,-1,3,-2,5])
A = A.reshape(3,2)
y = np.array([12, 13, 21])
def residual(params, A, y, eps_y=1):
    x = params['x'].value
    y_hat = np.dot(A, x)
    return (y - y_hat)/eps_y
x = np.array([0,0])
params = Parameters()
params.add('x', x)
out = minimize(residual, params, args=(A,y))
print out.value
When running this I get an error: "ValueError: object too deep for desired array".
I have found instances of similar problems researching here and on the web. In general, the most often cited reason is that A, x and y should be arrays and not matrices. Also, in some solutions, x and y are asked to be kept as vectors with shape (len(v),). The code above already complies with these suggestions, but I am still getting "ValueError: object too deep for desired array".
I have wasted quite a bit of time trying to solve this problem and am stumped now. Any help on this will be very welcome.
The documentation for Parameter is here:
http://newville.github.io/lmfit-py/parameters.html#Parameter
It specifically states that the value of a parameter must be a numerical value, and not an array of any kind. So instead of doing:
x = np.array([0,0])
params.add('x', x)
do:
params.add('x0', 0)
params.add('x1', 0)
and then change the residuals function to:
def residual(params, A, y, eps_y=1):
    x0 = params['x0'].value
    x1 = params['x1'].value
    y_hat = np.dot(A, [x0, x1])
    return (y - y_hat)/eps_y
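Putting it together, a full sketch might look like this (not from the original answer; it assumes a reasonably recent lmfit where minimize returns a MinimizerResult with a .params attribute):

import numpy as np
from lmfit import minimize, Parameters

A = np.array([1, 2, -1, 3, -2, 5]).reshape(3, 2)
y = np.array([12, 13, 21])

def residual(params, A, y, eps_y=1):
    x0 = params['x0'].value
    x1 = params['x1'].value
    y_hat = np.dot(A, [x0, x1])
    return (y - y_hat)/eps_y

params = Parameters()
params.add('x0', value=0)
params.add('x1', value=0)

out = minimize(residual, params, args=(A, y))
print(out.params['x0'].value, out.params['x1'].value)   # least-squares solution for x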
