I am trying to implement a simple optimization problem in Python. I am currently struggling with the following error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
As far as I understand, this means that somewhere I am passing an array where only a single value is accepted. Nevertheless, I haven't managed to come up with a solution, nor have I discovered where the problem is.
My code follows:
import numpy
from scipy.optimize import minimize
def validationCurve(X, y, Xval, yval):
    # [lmbda_vec, error_train, error_val] =
    # VALIDATIONCURVE(X, y, Xval, yval) returns the train and
    # validation errors (in error_train, error_val) for different
    # values of lmbda, given the training set (X, y) and validation
    # set (Xval, yval).
    lmbda_vec = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1]
    m = len(y)
    X = numpy.concatenate((numpy.ones((m, 1)), X), axis=1)
    n = len(Xval)
    Xval = numpy.concatenate((numpy.ones((n, 1)), Xval), axis=1)
    error_train = numpy.zeros((len(lmbda_vec), 1))
    error_val = numpy.zeros((len(lmbda_vec), 1))
    for i in range(0, len(lmbda_vec)):
        lmbda = lmbda_vec[i]
        theta = trainLinearReg(X, y, lmbda)
        error_train[i] = linearRegCostFunction(X, y, theta, lmbda)
        error_val[i] = linearRegCostFunction(Xval, yval, theta, lmbda)
    return lmbda_vec, error_train, error_val
def trainLinearReg(X, y, lmbda):
    # [theta] = TRAINLINEARREG(X, y, lmbda) trains linear
    # regression using the dataset (X, y) and regularization
    # parameter lmbda. Returns the trained parameters theta.
    alpha = 1  # learning rate
    num_iters = 200  # number of iterations
    initial_theta = numpy.zeros((len(X[0, :]), 1))  # initial guess
    # Create "short hand" for the cost function to be minimized
    costFunction = lambda t: linearRegCostFunction(X, y, t, lmbda)
    # Minimize using Conjugate Gradient
    theta = minimize(costFunction, initial_theta, method='Newton-CG',
                     jac=True, options={'maxiter': 200})
    return theta
def linearRegCostFunction(X, y, theta, lmbda):
    # [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lmbda)
    # computes the cost of using theta as the parameter for
    # linear regression to fit the data points in X and y.
    # Returns the cost in J and the gradient in grad.
    # Initialize some useful values
    m, n = X.shape  # number of training examples
    J = 0
    grad = numpy.zeros((n, 1))
    J = (numpy.dot((y - X @ theta).T, (y - X @ theta)) +
         lmbda * (theta[1:].T @ theta[1:]))
    J = J / m
    grad = (X.T @ (y - X @ theta)) / m
    grad[1:] += (lmbda * theta[1:]) / m
    grad = grad[:]
    return grad
I am trying to obtain an optimal regularization parameter by computing cost function and minimizing with respect to theta.
My input values are:
X.shape = (100,25), y.shape = (100,1)
Xval.shape = (55,25), yval.shape = (55,1)
The traceback is:
--> 129 lmbda_vec, error_train, error_val = validationCurve(Xtrain, ytrain, Xvalid, yvalid)
---> 33 theta = trainLinearReg(X, y, lmbda)
---> 49 theta = minimize(costFunction, initial_theta, method='Newton-CG', jac=True, options={'maxiter': 200})
Later I want to use the optimized model to predict y for new X.
Could you please advise me where the problem in my code is?
Also, if you see any room for improvement in my code, please let me know. I will be glad to hear it and improve.
Thank you!
Nimitz14 gave the general explanation: you've supplied an array where a scalar is required, which caused a run-time error. The problem is how to fix it from here.
First, try slapping your favourite debugger on the problem, so you can stop the program at useful spots and figure out exactly what array is causing the problem. This should help you determine where it originated.
Failing that, place some strategic print statements along the call route, printing each argument just before the function call. Then examine the signature (call sequence) of each function, and see where you might have given an array in place of a scalar. I don't see it, but ...
Is it possible that you've somehow redefined True as an array?
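A sketch of one possible fix, based on my own reading of the question's code rather than anything confirmed in this thread: with jac=True, minimize expects the objective to return a (cost, gradient) tuple, but linearRegCostFunction returns only grad. minimize also passes theta around as a 1-D array and returns an OptimizeResult whose .x attribute holds the solution, not theta itself. In addition, X.T @ (y - X @ theta) / m is the negative of the squared-error gradient. The version below assumes the course's 1/(2m) cost convention, leaves the bias term unregularized, and uses method='CG' to match the "Conjugate Gradient" comment; the snake_case names are hypothetical replacements for the originals.

```python
import numpy as np
from scipy.optimize import minimize


def linear_reg_cost_function(theta, X, y, lmbda):
    """Regularized linear-regression cost and gradient (sketch).

    theta: 1-D array (SciPy always passes x0 as 1-D)
    X:     (m, n) design matrix, bias column included
    y:     1-D array of length m
    Returns (J, grad) so it can be used with minimize(..., jac=True).
    """
    m = X.shape[0]
    residual = X @ theta - y                    # shape (m,)
    reg = theta.copy()
    reg[0] = 0.0                                # do not regularize the bias term
    J = (residual @ residual + lmbda * (reg @ reg)) / (2 * m)
    grad = (X.T @ residual + lmbda * reg) / m   # shape (n,), same as theta
    return J, grad


def train_linear_reg(X, y, lmbda, maxiter=200):
    """Return the fitted theta as a 1-D array (not the OptimizeResult)."""
    initial_theta = np.zeros(X.shape[1])        # 1-D initial guess
    res = minimize(linear_reg_cost_function, initial_theta,
                   args=(X, np.ravel(y), lmbda),   # flatten y to 1-D
                   method='CG', jac=True, options={'maxiter': maxiter})
    return res.x
```

theta comes first in the signature because SciPy calls fun(x, *args). For the validation curve, the cost alone is the first element of the returned tuple, e.g. linear_reg_cost_function(theta, Xval, yval.ravel(), lmbda)[0].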
Related
I started my ML journey by taking Andrew Ng's Machine Learning course on Coursera, and tried to implement multivariate linear regression in Python. However, I'm having a lot of trouble parsing the data and converting it into a properly working NumPy array. There seem to be so many NumPy functions that do the same kind of processing that it's hard to figure out which function does what. The main problem is that this choice affects the results of the algorithm I'm running.
When I convert the data into a NumPy matrix with np.matrix(), I get a cost of 2064911681.6185248. I get the same cost when I use the DataFrame's .values instead of np.matrix(). However, every Python solution for this problem I found online gets a cost of 2105448288.629247, using np.newaxis for X and Y. Whenever I try to use np.newaxis, I get a TypeError saying the key is invalid.
My question is: why does parsing the data in different ways give different costs, even though the data has the shape I want? I've provided my code below. Is there a single efficient and correct way to convert the data into NumPy arrays?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Load the data
data = pd.read_csv("ex1data2.txt", header=None, names=["Size", "No. of Bedrooms","Price"])
print(data.head(),"\n")
#Initialize columns and size of dataset
cols = data.shape[1]
m = data.shape[0]
theta = np.matrix(np.zeros(cols))
ones = np.ones((m,1))
#Initializing Parameters
alpha = 0.01
iters = 400
cost_list = np.zeros(iters)
#Setting X and Y
X = np.matrix(data.iloc[:,:cols-1])
Y = np.matrix(data.iloc[:,cols-1:])
#Feature Scaling and Adding Ones to X
X = (X - np.mean(X)) / np.std(X)
X = np.hstack((ones, X))
print(f"X Shape: {X.shape} \nY Shape: {Y.shape} \nTheta Shape: {theta.shape} \n")
#Defining Algorithms
def hypothesis(X, theta):
    h = np.dot(X, theta.T)
    return h
def cost_function(X, Y, theta, m):
    squared_error = np.power((hypothesis(X, theta) - Y), 2)
    J = np.sum(squared_error) / (2*m)
    return J
def gradient_descent(X, Y, theta, m, alpha, iters):
    for i in range(iters):
        error = hypothesis(X, theta) - Y
        temp = np.dot(error.T, X)
        theta = theta - ((alpha/m) * temp)
        cost_list[i] = cost_function(X, Y, theta, m)
    return theta, cost_list
#Printing Initial and Final Values
print(f"Initial Theta = {theta}")
print(f"Initial Cost = {cost_function(X, Y, theta, m)}")
new_theta, cost_list = gradient_descent(X, Y, theta, m, alpha, iters)
print(f"Final Theta = {new_theta}")
print(f"Final Cost = {cost_function(X, Y, new_theta, m)}")
#Plotting Cost vs Iterations
plt.plot(cost_list, color="Red", label="Cost")
plt.xlabel("Iterations")
plt.ylabel("Cost")
plt.title("Cost vs Iterations")
plt.legend()
plt.show()
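One thing worth checking, as a likely (but unconfirmed) source of the discrepancy: np.mean(X) and np.std(X) without an axis argument reduce over all entries of the array, so both feature columns are shifted and scaled by the same scalar. Per-feature scaling needs axis=0. The course's MATLAB std normalizes by N-1, which corresponds to ddof=1 in NumPy, so that is another place where results can differ. The NumPy documentation also no longer recommends np.matrix; plain ndarrays work for all of this. A small illustration with made-up numbers:

```python
import numpy as np

# Two features on very different scales (illustrative values only)
X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 3.0],
              [1416.0, 2.0]])

global_mu = np.mean(X)              # a single scalar over all 8 entries
per_col_mu = np.mean(X, axis=0)     # one mean per column, shape (2,)
per_col_sigma = np.std(X, axis=0)   # one std per column (ddof=0, NumPy's default)

X_global = (X - global_mu) / np.std(X)
X_per_col = (X - per_col_mu) / per_col_sigma

print(X_global.mean(axis=0))    # columns are NOT centred individually
print(X_per_col.mean(axis=0))   # approximately [0, 0]
print(X_per_col.std(axis=0))    # [1, 1]
```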
I am using the code below to compute the 1-norm of my error, and I have two issues. First, when I plot the error against the step size h, the error values are very small, in the range 10^-14 to 10^-16. Second, my attempt to apply np.polyfit to the graph (underneath) does not draw a fitted line, although it does output values. The value of p[0] is not exact, so I believe something is wrong, but it is "close" to the expected value of 3. Is this just a matter of wrong input, or bad data?
import numpy as np
from matplotlib import pyplot
def rk3(A,bvector,y0,interval,N):
    x0=interval[0]
    x_end=interval[1]
    x=np.linspace(x0,x_end,N+1)
    h=(x_end-x0)/N
    y=np.zeros((N+1,len(y0)))
    y[0, :] = y0
    for n in range(N):
        y_1=y[n,:]+h*(np.dot(A,y[n,:])+bvector(x[n]))
        y_2=(3/4)*y[n,:]+(1/4)*y_1+(1/4)*h*(np.dot(A,y_1)+bvector(x[n]+h))
        y[n+1,:]=(1/3)*y[n,:]+(2/3)*y_2+(2/3)*h*(np.dot(A,y_2)+bvector(x[n]+(1/2)*h))
    return x,y
err_vals = []
h_vals = []
yvals = []
for k in range(2,11): #for the range of N=40k, where k=1,...,10
    N=40*k
    x, y = rk3(A,bvector,y0,[0,0.1],N)
    yc = y[-1,:]
    h = (x[-1]-x[0])/N
    h_vals.append(h)
    yvals.append(yc)
    yn = y[:,1]
    abs_err = np.zeros(N)
    print("The value of y at k=",k," is ",yc)
    for j in range(1,N):
        y_exact=np.array([np.exp(-1000*x[j]), (1000/999)*(np.exp(-x[j])-np.exp(-1000*x[j]))])
        y_exact_2 = y_exact[1]
        abs_err[j] = np.abs((y[j, 1] - y_exact_2)/y_exact_2)
    Error = h*np.sum(abs_err[j])
    err_vals.append(Error)
p = np.polyfit(np.log(h_vals), np.log(err_vals), 1)
pyplot.loglog(h_vals,err_vals,"kx")
pyplot.xlabel("h")
pyplot.ylabel("Error")
pyplot.loglog(h,np.exp(p[1])*h**(p[0]), 'r--')
print("Best fit line slope ",format(p[0]))
My modification of your code (below) gives a completely straight line with a slope close to 3 for the integration over the interval [0, 0.01].
For the given interval [0, 0.1], the slope is about 1/3 larger. The error profiles, that is, the absolute error divided by the expected global error power of the step size (h^3), give a converging pattern, confirming that the method converges with order 3.
The error bound 2e7*h^3 is rather large, which shows why this combination of problem and method can become very problematic for larger step sizes.
The error is computed from L1 norms, as the norm of the difference between the numerical and exact solutions relative to the norm of the numerical solution:
Error = sum(abs((y-y_exact(x))[:,1]))/sum(abs(y[:,1]))
This gives a mathematically sound quantity. Summing local relative errors can distort the total error where the exact solution has a root or small values. Still, even your computation method of integrating the local relative error, leaving out the first data point (where the exact solution is zero),
Error = sum(abs((y[1:,1]/y_exact(x)[1:,1]-1)))*h
gives a similar linear plot, with the range shifted down to 1e-7..1e-9 and the slope staying at 3.0293.
Note that if you want to use the list h_vals in a computation, like the one that plots the fitted line, you have to convert it into a NumPy array first.
h=np.asarray(h_vals)
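To see both points in isolation, here is a sketch using synthetic data rather than the ODE itself: arithmetic such as h**p works on a NumPy array but raises TypeError on a Python list, and fitting a line to log(error) against log(h) with np.polyfit recovers the order as the slope. The errors are generated as C*h**3 with a hypothetical constant C, so the fit should return a slope of 3.

```python
import numpy as np

h_vals = [0.01 / (40 * k) for k in range(2, 11)]   # step sizes as a Python list
C = 2e7                                            # hypothetical error constant

try:
    h_vals ** 3                                    # lists do not support **
except TypeError as exc:
    print("list fails:", exc)

h = np.asarray(h_vals)                             # convert once, then vectorize
err_vals = C * h**3                                # synthetic third-order errors

# The slope of log(err) against log(h) estimates the convergence order
p = np.polyfit(np.log(h), np.log(err_vals), 1)
print("slope:", p[0], "constant:", np.exp(p[1]))

fitted = np.exp(p[1]) * h**p[0]                    # the dashed best-fit line
```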
Complete code:
import numpy as np
import matplotlib.pyplot as plt
def rk3(A,bvector,y0,interval,N):
    """Solves an IVP y'=f(x, y(x)) on x in [x0, x_end] with y(x0) = y0 using N steps of a third-order Runge-Kutta method."""
    x=np.linspace(*interval,N+1)
    h=x[1]-x[0]
    y=np.zeros((N+1,len(y0)))
    y[0, :] = y0
    for n in range(N):
        y_1=y[n]+h*(np.dot(A,y[n])+bvector(x[n]))
        y_2=(3/4)*y[n,:]+(1/4)*y_1+(1/4)*h*(np.dot(A,y_1)+bvector(x[n]+h))
        y[n+1]=(1/3)*y[n]+(2/3)*y_2+(2/3)*h*(np.dot(A,y_2)+bvector(x[n]+0.5*h))
    return x,y
A = np.array([[-1000.0,0.0],[1000.0,-1.0]])
bvector = lambda x: 0
y_exact = lambda x: np.array([np.exp(-1000*x), (1000/999)*(np.exp(-x)-np.exp(-1000*x))]).T
y0 = y_exact(0)
plt.figure(figsize=(6,3))
h_vals, y_vals, err_vals = [],[],[]
for k in range(2,11): #for the range of N=40k, where k=1,...,10
    N=40*k
    x, y = rk3(A,bvector,y0,[0,0.01],N)
    yc = y[-1,:]
    h = x[1]-x[0]
    plt.plot(x,(y-y_exact(x))[:,1]/h**3)
    h_vals.append(h)
    y_vals.append(yc)
    yn = y[:,1]
    print("The value of y at k=",k," is ",yc)
    Error = sum(abs((y-y_exact(x))[:,1]))/sum(abs(y[:,1]))
    err_vals.append(Error)
plt.grid(); plt.show()
p = np.polyfit(np.log(h_vals), np.log(err_vals), 1)
plt.figure(figsize=(6,4))
plt.loglog(h_vals,err_vals,"kx")
h=np.asarray(h_vals)
plt.plot(h,np.exp(p[1])*h**(p[0]), '--r', lw=0.5)
plt.xlabel("h")
plt.ylabel("Error")
plt.grid(); plt.show()
print("Best fit line slope ",format(p[0]))
I am trying to use an optimization algorithm from SciPy. It works fine when I call it without the Jacobian (gradient) function. I believe the problem I get when I pass the gradient is that the minimize function itself changes the shape of the initial guess x0, as you can see from the output of the code below.
Input:
import numpy as np
from costFunction import *
import scipy.optimize as op
def sigmoid(z):
    epsilon = np.finfo(z.dtype).eps
    g = 1/(1+np.exp(-z))
    g = np.clip(g,epsilon,1-epsilon)
    return g
def costFunction(theta,X,y):
    m = y.size
    h = sigmoid(X @ theta)
    J = 1/(m)*(-y.T @ np.log(h)-(1-y).T @ np.log(1-h))
    grad = 1/m*X.T @ (h-y)
    print('Shape of theta is',np.shape(theta),'\n')
    print('Shape of gradient is',np.shape(grad),'\n')
    return J, grad
X = np.array([[1, 3],[5,7]])
y = np.array([[1],[0]])
m,n = np.shape(X)
one_vec = np.ones((m,1))
X = np.hstack((one_vec,X))
initial_theta = np.zeros((n+1,1))
print ('Running costFunction before executing minimize function...\n')
cost, grad = costFunction(initial_theta,X,y) #To test the shape of gradient before calling minimize
print ('Executing minimize function...\n')
Result = op.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
Output:
Running costFunction before executing minimize function...
Shape of theta is (3, 1)
Shape of gradient is (3, 1)
Executing minimize function...
Shape of theta is (3,)
Shape of gradient is (3, 2)
Traceback (most recent call last):
  File "C:/Users/#####/minimizeshapechange.py", line 34, in <module>
    Result = op.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
  File "C:\Users\#####\anaconda3\lib\site-packages\scipy\optimize\_minimize.py", line 453, in minimize
    **options)
  File "C:\Users\#####\anaconda3\lib\site-packages\scipy\optimize\tnc.py", line 409, in _minimize_tnc
    xtol, pgtol, rescale, callback)
ValueError: tnc: invalid gradient vector from minimized function.
Process finished with exit code 1
I will not analyze your exact computations, but some remarks:
(1) Your gradient is broken!
SciPy expects the partial derivatives as an array with the same shape as your x0.
Your gradient has shape (3,2), while (n+1, 1) is expected.
Compare with the example given in the tutorial, which uses scipy.optimize.rosen_der (der = derivative).
(2) It seems your SciPy version is a bit older, because mine (0.19.0) tells me:
ValueError: tnc: invalid gradient vector from minimized function.
Some supporting source-code from scipy:
if (PyArray_SIZE(arr_grad) != py_state->n)
{
    PyErr_SetString(PyExc_ValueError,
        "tnc: invalid gradient vector from minimized function.");
    goto failure;
}
Remark: the code above was changed/introduced about 5 years ago. If you really don't get this error with the code you listed (after removing the import of costFunction), it seems you are using scipy < v0.13.0b1, which I do not recommend! I assume you are using some outdated, unofficial Windows-based distribution with an old SciPy. You should change that!
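Regarding remark (1), here is where the (3, 2) most likely comes from (my reading of the printed shapes, not something SciPy reports): minimize flattens x0, so theta arrives with shape (3,); X @ theta is then (2,), and h - y with y of shape (2, 1) broadcasts to (2, 2), so X.T @ (h - y) is (3, 2). A minimal sketch reproducing this, followed by one possible fix that flattens y and returns a 1-D gradient (cost_and_grad is a hypothetical name):

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0, 3.0],
              [1.0, 5.0, 7.0]])        # bias column already added
y_col = np.array([[1.0], [0.0]])       # column vector, shape (2, 1)
theta = np.zeros(3)                    # what minimize passes in: shape (3,)

h = 1 / (1 + np.exp(-(X @ theta)))     # shape (2,)
print((h - y_col).shape)               # (2, 2): broadcasting, not elementwise subtraction
print((X.T @ (h - y_col)).shape)       # (3, 2): the "invalid gradient vector"


def cost_and_grad(theta, X, y):
    """Logistic-regression cost and gradient; expects 1-D theta and 1-D y."""
    m = y.size
    h = 1 / (1 + np.exp(-(X @ theta)))
    eps = np.finfo(float).eps
    h = np.clip(h, eps, 1 - eps)       # avoid log(0), as in the question's sigmoid
    J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m           # shape (3,), same as theta
    return J, grad


res = minimize(cost_and_grad, np.zeros(3), args=(X, y_col.ravel()),
               method='TNC', jac=True)
print(res.x.shape)
```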
I had the same problem with SciPy, trying to do the same thing as you. I don't understand exactly why this solves the problem, but playing with array shapes until it worked gave me the following.
Gradient function defined as follows
def Gradient(theta,X,y):
    #Initializing variables
    m = len(y)
    theta = theta[:,np.newaxis] #<---- THIS IS THE TRICK
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    grad = (1/m)*(X.T @ ( h - y))
    return grad #< --- also works with grad.ravel()
initial_theta initialized as:
initial_theta = np.zeros((n+1))
initial_theta.shape
(3,)
i.e. a simple NumPy array rather than a column vector.
Gradient function returns:
Gradient(initial_theta,X,y).shape
(3,1) or (3,), depending on whether the function returns grad or grad.ravel()
scipy.optimize called as
import scipy.optimize as opt
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
What does not work with SciPy:
an initial_theta of shape (3,1), created with initial_theta = np.zeros((n+1))[:,np.newaxis], crashes the scipy.optimize.minimize call with
ValueError: tnc: invalid gradient vector from minimized function.
If someone could clarify these points, that would be great! Thanks
Your costFunction code is wrong; maybe you should look at this:
def costFunction(theta,X,y):
    h_theta = sigmoid(X @ theta)
    J = (-y) * np.log(h_theta) - (1 - y) * np.log(1 - h_theta)
    return np.mean(J)
Please copy and paste each part (In 1, In 2, and so on) into a separate Jupyter cell.
# In 1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
filepath =('C:/Pythontry/MachineLearning/dataset/couresra/ex2data1.txt')
data =pd.read_csv(filepath,sep=',',header=None)
#print(data)
X = data.values[:,:2] #(100,2)
y = data.values[:,2:3] #(100,1)
#print(np.shape(y))
#In 2
#%% ==================== Part 1: Plotting ====================
postive_value = data.loc[data[2] == 1]
#print(postive_value.values[:,2:3])
negative_value = data.loc[data[2] == 0]
#print(len(postive_value))
#print(len(negative_value))
ax1 = postive_value.plot(kind='scatter',x=0,y=1,s=50,color='b',marker="+",label="Admitted") # S is line width #https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter
ax2 = negative_value.plot(kind='scatter',x=0,y=1,s=50,color='y',ax=ax1,label="Not Admitted")
ax1.set_xlabel("Exam 1 score")
ax2.set_ylabel("Exam 2 score")
plt.show()
#print(ax1 == ax2)
#print(np.shape(X))
# In 3
#============ Part 2: Compute Cost and Gradient ===========
[m,n] = np.shape(X) #(100,2)
print(m,n)
additional_coulmn = np.ones((m,1))
X = np.append(additional_coulmn,X,axis=1)
initial_theta = np.zeros((n+1), dtype=int)
print(initial_theta)
# In 4
# Sigmoid and cost function
def sigmoid(z):
    g = np.zeros(np.shape(z))
    g = 1/(1+np.exp(-z))
    return g
def costFunction(theta, X, y):
    J = 0
    #print(theta)
    receive_theta = np.array(theta)[np.newaxis]  # turns the 1-D theta into a 2-D row array, shape (1, n+1)
    #print(receive_theta)
    theta = np.transpose(receive_theta)  # column vector, shape (n+1, 1)
    #print(np.shape(theta))
    #grad = np.zeros(np.shape(theta))
    z = np.dot(X,theta)  # where z = theta*X
    #print(z)
    h = sigmoid(z)  # formula h(x) = g(z) where g = 1/(1+e^(-z)), shape (100,1)
    #print(np.shape(h))
    #J = np.sum(((-y)*np.log(h)-(1-y)*np.log(1-h))/m);
    J = np.sum(np.dot((-y.T),np.log(h))-np.dot((1-y).T,np.log(1-h)))/m
    #J = (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
    #error = h-y
    #print(np.shape(error))
    #print(np.shape(X))
    grad = np.dot(X.T,(h-y))/m
    #print(grad)
    return J,grad
#In5
[cost, grad] = costFunction(initial_theta, X, y)
print('Cost at initial theta (zeros):', cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros): \n',grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n')
# In 6: compute and display cost and gradient with non-zero theta
test_theta = [-24, 0.2, 0.2]
#test_theta_value = np.array([-24, 0.2, 0.2])[np.newaxis] #This command is used to create the 1D row array
#test_theta = np.transpose(test_theta_value) # Transpose
#test_theta = test_theta_value.transpose()
[cost, grad] = costFunction(test_theta, X, y)
print('\nCost at test theta: \n', cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta: \n',grad);
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')
# In 7
# ============= Part 3: Optimizing using range =============
import scipy.optimize as opt
#initial_theta_initialize = np.array([0, 0, 0])[np.newaxis]
#initial_theta = np.transpose(initial_theta_initialize)
print ('Executing minimize function...\n')
# Working models
#result = opt.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
result = opt.fmin_tnc(func=costFunction, x0=initial_theta, args=(X, y))
# Not working model
#costFunction(initial_theta,X,y)
#model = opt.minimize(fun = costFunction, x0 = initial_theta, args = (X, y), method = 'TNC',jac = costFunction)
print('Thetas found by fmin_tnc function: ', result);
print('Cost at theta found : \n', cost);
print('Expected cost (approx): 0.203\n');
print('theta: \n',result[0]);
print('Expected theta (approx):\n');
print(' -25.161\n 0.206\n 0.201\n');
Result:
Executing minimize function...
Thetas found by fmin_tnc function: (array([-25.16131854, 0.20623159, 0.20147149]), 36, 0)
Cost at theta found :
0.218330193827
Expected cost (approx): 0.203
theta:
[-25.16131854 0.20623159 0.20147149]
Expected theta (approx):
-25.161
0.206
0.201
SciPy's fmin_tnc doesn't work well with column or row vectors. It expects the parameters as a 1-D array.
Python Implementation of Andrew Ng’s Machine Learning Course (Part 2.1)
opt.fmin_tnc(func = costFunction, x0 = theta.flatten(),fprime = gradient, args = (X, y.flatten()))
What worked for me is to reshape y as a vector (1-D) rather than a matrix (2-D array). I simply used the following code and then reran the SciPy's minimize function and it worked.
y = np.reshape(y,100) #e.g., if your y variable has 100 data points.
A little late, but I also started implementing Andrew Ng's assignment in Python and put a lot of effort into resolving this issue. It finally works for me.
This blog helped me, but with one change in the fmin_tnc call (see below):
result = op.fmin_tnc(func=costFunction, x0=initial_theta, fprime=None, approx_grad=True, args=(X, y))
I got this info from here.
I'm converting my MATLAB code for Andrew Ng's Coursera course into Python. I am working on non-regularized logistic regression, and after writing my gradient and cost functions I needed something similar to fminunc; after some googling, I found a couple of options. They both return the same results, but those do not match the expected results in Andrew Ng's code. Others seem to get this to work correctly, so I'm wondering why my code does not return the desired result when using scipy.optimize functions, even though it does for the cost and gradient pieces earlier in the code.
The data I'm using can be found at the link below;
ex2data1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op
#Machine Learning Online Class - Exercise 2: Logistic Regression
#Load Data
#The first two columns contains the exam scores and the third column contains the label.
data = pd.read_csv('ex2data1.txt', header = None)
X = np.array(data.iloc[:, 0:2]) # 100 x 2
y = np.array(data.iloc[:,2]) # 100 x 1 after the reshape below
y.shape = (len(y), 1)
#Creating sub-dataframes for plotting
pos_plot = data[data[2] == 1]
neg_plot = data[data[2] == 0]
#==================== Part 1: Plotting ====================
#We start the exercise by first plotting the data to understand the
#the problem we are working with.
print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.')
plt.plot(pos_plot[0], pos_plot[1], "+", label = "Admitted")
plt.plot(neg_plot[0], neg_plot[1], "o", label = "Not Admitted")
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend()
plt.show()
def sigmoid(z):
    '''
    SIGMOID Compute sigmoid function
    g = SIGMOID(z) computes the sigmoid of z.
    Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    vector or scalar).
    '''
    g = 1 / (1 + np.exp(-z))
    return g
def costFunction(theta, X, y):
    '''
    COSTFUNCTION Compute cost and gradient for logistic regression
    J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
    parameter for logistic regression and the gradient of the cost
    w.r.t. the parameters.
    '''
    m = len(y) # number of training examples
    h = sigmoid(X.dot(theta)) # logistic regression hypothesis
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
    # h is 100x1, y is 100x1; these end up as 2 vectors we subtract from each other,
    # then we sum the values by rows
    # cost function for logistic regression
    return J
def gradient(theta, X, y):
    m = len(y)
    grad = np.zeros((theta.shape))
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)): # number of rows in theta
        XT = X[:,i]
        XT.shape = (len(X),1)
        grad[i] = (1/m) * np.sum((h-y)*XT) # updating each row of the gradient
    return grad
#============ Part 2: Compute Cost and Gradient ============
#In this part of the exercise, you will implement the cost and gradient
#for logistic regression. You neeed to complete the code in costFunction.m
#Add intercept term to x and X_test
Bias = np.ones((len(X), 1))
X = np.column_stack((Bias, X))
#Initialize fitting parameters
initial_theta = np.zeros((len(X[0]), 1))
#Compute and display initial cost and gradient
(cost, grad) = costFunction(initial_theta, X, y), gradient(initial_theta, X, y)
print('Cost at initial theta (zeros): %f' % cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros):')
print(grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628')
#Compute and display cost and gradient with non-zero theta
test_theta = np.array([[-24], [0.2], [0.2]]);
(cost, grad) = costFunction(test_theta, X, y), gradient(test_theta, X, y)
print('\nCost at test theta: %f' % cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta:')
print(grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')
result = op.fmin_tnc(func = costFunction, x0 = initial_theta, fprime = gradient, args = (X,y))
result[1]
Result = op.minimize(fun = costFunction,
x0 = initial_theta,
args = (X, y),
method = 'TNC',
jac = gradient, options={'gtol': 1e-3, 'disp': True, 'maxiter': 1000})
theta = Result.x
theta
test = np.array([[1, 45, 85]])
prob = sigmoid(test.dot(theta))
print('For a student with scores 45 and 85, we predict an admission probability of %f,' % prob)
print('Expected value: 0.775 +/- 0.002\n')
This was a very difficult problem to debug, and illustrates a poorly documented aspect of the scipy.optimize interface. The documentation vaguely indicates that theta will be passed around as a vector:
Minimization of scalar function of one or more variables.
In general, the optimization problems are of the form:
minimize f(x) subject to
g_i(x) >= 0, i = 1,...,m
h_j(x) = 0, j = 1,...,p
where x is a vector of one or more variables.
What's important is that they really mean vector in the most primitive sense, a 1-dimensional array. So you have to expect that whenever theta is passed into one of your callbacks, it will be passed in as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, from 2-d column arrays).
I don't know exactly why it's causing a problem in your case, but it's easily fixed regardless. You just have to add the following at the top of both your cost function and your gradient function:
theta = theta.reshape(-1, 1)
This guarantees that theta will be a 2-d column array, as expected. Once you've done this, the results are correct.
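The likely reason it goes wrong without the reshape (my reading, consistent with the shapes in the question): X.dot(theta) with a 1-D theta has shape (100,), while y is (100, 1), so -y*np.log(h) broadcasts to a (100, 100) array and np.sum adds up all 10,000 cross terms. The cost (and the gradient) is then silently wrong, with no error raised. A small demonstration with random, made-up data; the alternative of flattening y instead of reshaping theta is shown too:

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def cost(theta, X, y):
    """Same formula as costFunction in the question."""
    m = len(y)
    h = sigmoid(X.dot(theta))
    return (1 / m) * np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h))


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(5), rng.normal(size=(5, 2))])   # illustrative data
y = rng.integers(0, 2, size=(5, 1)).astype(float)            # column vector (5, 1)
theta = np.array([0.1, -0.2, 0.3])                           # 1-D, as SciPy passes it

h = sigmoid(X.dot(theta))
print((-y * np.log(h)).shape)                # (5, 5): silently broadcast

wrong = cost(theta, X, y)                    # sums a 5 x 5 array
right = cost(theta.reshape(-1, 1), X, y)     # the reshape(-1, 1) fix
same = cost(theta, X, y.ravel())             # alternative: make y 1-D instead
print(wrong, right, same)
```

Every entry of the broadcast (5, 5) array is positive and its diagonal holds exactly the correct terms, so the broken cost is always larger than the correct one.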
I have had similar issues with SciPy dealing with the same problem as you. As senderle points out, the interface is not the easiest to deal with, especially combined with the NumPy array interface. Here is my implementation, which works as expected.
Defining the cost and gradient functions
Note that initial_theta is passed as a simple array of shape (3,) and converted to a column vector of shape (3,1) within the function. The gradient function then returns grad.ravel(), which has shape (3,) again. This is important, as doing otherwise caused error messages with various optimization methods in scipy.optimize.
Note that different methods behave differently, but returning .ravel() seems to fix most issues.
import pandas as pd
import numpy as np
import scipy.optimize as opt
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def CostFunc(theta,X,y):
    #Initializing variables
    m = len(y)
    J = 0
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    J = (1/m) * ( (-y.T @ np.log(h)) - (1 - y).T @ np.log(1-h))
    return J
def Gradient(theta,X,y):
    #Initializing variables
    m = len(y)
    theta = theta[:,np.newaxis]
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    grad = (1/m)*(X.T @ ( h - y))
    return grad.ravel() #<-- This is the trick
Initializing variables and parameters
Note that initial_theta.shape returns (3,)
X = data1.iloc[:,0:2].values
m,n = X.shape
X = np.concatenate((np.ones(m)[:,np.newaxis],X),1)
y = data1.iloc[:,-1].values[:,np.newaxis]
initial_theta = np.zeros((n+1))
Calling Scipy.optimize
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
Any comments from more knowledgeable people are welcome; this SciPy interface is a mystery to me. Thanks!
I have the following code to minimize the Cost Function with its gradient.
def trainLinearReg( X, y, lamda ):
    # theta = zeros( shape(X)[1], 1 )
    theta = random.rand( shape(X)[1], 1 ) # random initialization of theta
    result = scipy.optimize.fmin_cg( computeCost, fprime = computeGradient, x0 = theta,
                                     args = (X, y, lamda), maxiter = 200, disp = True, full_output = True )
    return result[1], result[0]
But I am having this warning:
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 8403387632289934651424768.000000
Iterations: 0
Function evaluations: 15
Gradient evaluations: 3
My computeCost and computeGradient are defined as
def computeCost( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m = shape(y)[0]
    J = 0
    grad = zeros( shape(theta) )
    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
    return J[0]
def computeGradient( theta, X, y, lamda ):
    theta = theta.reshape( shape(X)[1], 1 )
    m = shape(y)[0]
    J = 0
    grad = zeros( shape(theta) )
    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
    grad = (1.0 / m) * (X.T.dot(h - y)) + (lamda / m) * theta
    return grad.flatten()
I have reviewed these similar questions:
scipy.optimize.fmin_bfgs: “Desired error not necessarily achieved due to precision loss”
scipy.optimize.fmin_cg: "'Desired error not necessarily achieved due to precision loss.'
scipy is not optimizing and returns "Desired error not necessarily achieved due to precision loss"
But I still cannot solve my problem. How can I make the minimization converge instead of getting stuck at the first iteration?
ANSWER:
I solved this problem based on @lejlot's comments below.
He is right. The values in X were too large because I did not assign the normalized features back to the variable that is used for training. Even though this is a small mistake, it shows where to look when encountering such problems: a cost function value that is far too large suggests that something is wrong with the data set.
The previous wrong one:
X_poly = polyFeatures(X, p)
X_norm, mu, sigma = featureNormalize(X_poly)
X_poly = c_[ones((m, 1)), X_poly]
The correct one:
X_poly = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)
X_poly = c_[ones((m, 1)), X_poly]
where X_poly is what is actually used in the subsequent training:
cost, theta = trainLinearReg(X_poly, y, lamda)
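For context, a featureNormalize in the style of the course might look like the sketch below. This is a hypothetical implementation, since the asker's own function is not shown. It returns a new, normalized array and leaves its input untouched, which is why X_norm, mu, sigma = featureNormalize(X_poly), followed by further use of X_poly, trains on the unscaled polynomial features. The inputs are illustrative values.

```python
import numpy as np


def featureNormalize(X):
    """Hypothetical sketch: scale each column to zero mean and unit std.

    Returns (X_norm, mu, sigma); X itself is not modified.
    ddof=1 mirrors MATLAB's std, which the course code uses.
    """
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)
    X_norm = (X - mu) / sigma
    return X_norm, mu, sigma


x = np.array([-15.9, -29.2, 36.2, 37.5, -48.1, 8.5])     # illustrative inputs
X_poly = np.column_stack([x**p for p in range(1, 9)])    # powers 1..8

X_norm, mu, sigma = featureNormalize(X_poly)
print(np.abs(X_poly).max())   # huge: x**8 is on the order of 1e13 here
print(np.abs(X_norm).max())   # order 1 after normalization
```

Feeding the unscaled X_poly to fmin_cg gives a cost on the order of the huge values seen in the warning, which matches the "precision loss" symptom.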
For my implementation, scipy.optimize.fmin_cg also failed with the above-mentioned error for some initial guesses. Then I changed it to the BFGS method and it converged.
scipy.optimize.minimize(fun, x0, args=(), method='BFGS', jac=None, tol=None, callback=None, options={'disp': False, 'gtol': 1e-05, 'eps': 1.4901161193847656e-08, 'return_all': False, 'maxiter': None, 'norm': inf})
It seems that this error can still be unavoidable with CG, because CG can end up with a non-descent direction.
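A minimal sketch of switching methods through the unified scipy.optimize.minimize interface, assuming a cost/gradient pair that takes a 1-D theta. The regularized least-squares objective and the data are made up for illustration. Passing the analytic gradient via jac avoids the finite-difference approximation that jac=None implies, and the result is compared against the closed-form solution.

```python
import numpy as np
from scipy.optimize import minimize


def cost(theta, X, y, lam):
    r = X @ theta - y
    return (r @ r + lam * theta[1:] @ theta[1:]) / (2 * len(y))


def grad(theta, X, y, lam):
    g = X.T @ (X @ theta - y)
    g[1:] += lam * theta[1:]        # bias term is not regularized
    return g / len(y)


rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.normal(size=(30, 2))])
y = X @ np.array([0.5, -1.0, 2.0]) + 0.05 * rng.normal(size=30)

x0 = np.zeros(3)                    # 1-D initial guess
results = {}
for method in ("CG", "BFGS"):
    res = minimize(cost, x0, args=(X, y, 0.1), method=method, jac=grad)
    results[method] = res
    print(method, res.success, res.nit, res.x)

# Closed-form minimizer of the same objective, for comparison
L = np.diag([0.0, 1.0, 1.0])
theta_star = np.linalg.solve(X.T @ X + 0.1 * L, X.T @ y)
```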
I too faced this problem, and even after searching a lot nothing helped, as the solutions were not clearly explained.
Then I read the documentation of scipy.optimize.fmin_cg, where it is clearly stated that the parameter x0 must be a 1-D array.
My approach was the same as yours: I passed a 2-D matrix as x0, and I always got a precision-loss or divide-by-zero error and the same warning as you.
Then I changed my approach: I passed theta as a 1-D array and converted it into a 2-D matrix inside the computeCost and computeGradient functions, which worked for me, and I got the expected results.
My solution for logistic regression:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
theta = np.zeros(features)
def computeCost(theta,X, Y):
    x = np.matrix(X.values)
    y = np.matrix(Y.values)
    theta = np.matrix(theta)
    xtheta = np.matmul(x,theta.T)
    hx = sigmoid(xtheta)
    cost = (np.multiply(y,np.log(hx)))+(np.multiply((1-y),np.log(1-hx)))
    return -(np.sum(cost))/m
def computeGradient(theta, X, Y):
    x = np.matrix(X.values)
    y = np.matrix(Y.values)
    theta = np.matrix(theta)
    grad = np.zeros(features)
    xtheta = np.matmul(x,theta.T)
    hx = sigmoid(xtheta)
    error = hx-y
    for i in range(0,features,1):
        term = np.multiply(error,x[:,i])
        grad[i] = (np.sum(term))/m
    return grad
import scipy.optimize as opt
result = opt.fmin_tnc(func=computeCost, x0=theta, fprime=computeGradient, args=(X, Y))
print(computeCost(result[0], X, Y))
Note again that theta has to be a 1-D array.
So in your code, change theta in trainLinearReg to theta = random.randn(features)
I faced this problem today.
I then noticed that my cost function was implemented the wrong way and was producing very large errors, which is why SciPy could not make progress. Hope this helps someone like me.
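A cheap way to catch a wrong cost or gradient before blaming the optimizer is scipy.optimize.check_grad, which returns the norm of the difference between an analytic gradient and a finite-difference estimate. A sketch with a least-squares cost and a deliberately broken (hypothetical) gradient with a flipped sign, to show what a mismatch looks like:

```python
import numpy as np
from scipy.optimize import check_grad


def cost(theta, X, y):
    r = X @ theta - y
    return r @ r / (2 * len(y))


def good_grad(theta, X, y):
    return X.T @ (X @ theta - y) / len(y)


def bad_grad(theta, X, y):
    return X.T @ (y - X @ theta) / len(y)   # sign error, for illustration


rng = np.random.default_rng(2)
X = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])
y = rng.normal(size=20)
theta0 = rng.normal(size=3)

print(check_grad(cost, good_grad, theta0, X, y))  # tiny: finite-difference noise
print(check_grad(cost, bad_grad, theta0, X, y))   # large: the gradient is wrong
```

Running this check once at a random theta takes milliseconds and rules out the most common cause of "Desired error not necessarily achieved due to precision loss".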