scipy.optimize.minimize() in place of gradient desc

scipy.optimize.minimize() in place of gradient desc - python

Given a set of points we'd like to find a straight line fitting the data best possible. I have implemented a version where the minimization of the cost function is done via gradient descent, and now I'd like to use the algorithm from scipy (scipy.optimize.minimize).
I tried this:
def gradDescVect(X, y, theta, alpha, iters):
m = shape(X)[0]
grad = copy(theta)
for c in range(0, iters):
error_sum = hypo(X, grad) - y
error_sum = X.T.dot(error_sum)
grad -= (alpha/m)*error_sum
return grad
def computeCostScipy(theta, X, y):
"""Compute cost, vectorized version"""
m = len(y)
term = hypo(X, theta) - y
print((term.T.dot(term) / (2 * m))[0, 0])
return (term.T.dot(term) / (2 * m))[0, 0]
def findMinTheta( theta, X, y):
result = scipy.optimize.minimize( computeCostScipy, theta, args=(X, y), method='BFGS', options={"maxiter":5000, "disp":True} )
return result.x, result.fun
This actually works pretty fine and gives results very close results to the original grad desc version.
The only problem seems that fminunc() stops execution before reaching minimum value of costFunction().
its sure that costFunction() works fine, and no. of iteration of minimize() is set greater than max iters it takes.
Output:
Optimization terminated successfully.
Current function value: 15.024985
Iterations: 2
Function evaluations: 16
Gradient evaluations: 4
Cost : 15.0249848024
Thteta : [ 0.15232531 0.93072285]
while the correct results are :
Cost : 4.48338825659
Theta : [[-3.63029144]
[ 1.16636235]]
See how close the results are:

Related

Gradient descent function in python - error in loss function or weights

Im working with the gradient function for an exercise but i still couldn't get the expected outcome. That is, i receive 2 error messages:
Wrong output for the loss function. Check how you are implementing the matrix multiplications.
Wrong values for weight's matrix theta. Check how you are updating the matrix of weights.
When applying the function (see below) i notice that the cost decreases at each iteration but still it does not converge to the desired outcome in the exercise. I already tried several adaptations on the formula but couldn't solve it yet.
# gradientDescent
def gradientDescent(x, y, theta, alpha, num_iters):
Input:
x: matrix of features which is (m,n+1)
y: corresponding labels of the input matrix x, dimensions (m,1)
theta: weight vector of dimension (n+1,1)
alpha: learning rate
num_iters: number of iterations you want to train your model for
Output:
J: the final cost
theta: your final weight vector
Hint: you might want to print the cost to make sure that it is going down.
### START CODE HERE ###
# get 'm', the number of rows in matrix x
m = len(x)
for i in range(0, num_iters):
# get z, the dot product of x and theta
# z = predictins
z = np.dot(x, theta)
h = sigmoid(z)
loss = z - y
# calculate the cost function
J = (-1/m) * np.sum(loss)
print("Iteration %d | Cost: %f" % (i, J))#
gradient = np.dot(x.T, loss)
#update theta
theta = theta - (1/m) * alpha * gradient
### END CODE HERE ###
J = float(J)
return J, theta

The issue is that i wrongly applied the formula of the cost function and the formula for calculating the weights:
𝐽=−1/𝑚×(𝐲𝑇⋅𝑙𝑜𝑔(𝐡)+(1−𝐲)𝑇⋅𝑙𝑜𝑔(1−𝐡))
𝜃=𝜃−𝛼/𝑚×(𝐱𝑇⋅(𝐡−𝐲))
The solution is:
J = (-1/m) * (np.dot(y.T, np.log(h)) + (np.dot((1-y).T, np.log(1-h)))
theta = theta - (alpha/m) * gradient

Understanding JAX argnums parameter in its gradient function

I'm trying to understand the behaviour of argnums in JAX's gradient function.
Suppose I have the following function:
def make_mse(x, t):
def mse(w,b):
return np.sum(jnp.power(x.dot(w) + b - t, 2))/2
return mse
And I'm taking the gradient in the following way:
w_gradient, b_gradient = grad(make_mse(train_data, y), (0,1))(w,b)
argnums= (0,1) in this case, but what does it mean? With respect to which variables the gradient is calculated? What will be the difference if I will use argnums=0 instead?
Also, can I use the same function to get the Hessian matrix?
I looked at JAX help section about it, but couldn't figure it out

When you pass multiple argnums to grad, the result is a function that returns a tuple of gradients, equivalent to if you had computed each separately:
def f(x, y):
return x ** 2 + x * y + y ** 2
df_dxy = grad(f, argnums=(0, 1))
df_dx = grad(f, argnums=0)
df_dy = grad(f, argnums=1)
x = 3.0
y = 4.25
assert df_dxy(x, y) == (df_dx(x, y), df_dy(x, y))
If you want to compute a mixed second derivatives, you can do this by repeatedly applying the gradient:
d2f_dxdy = grad(grad(f, argnums=0), argnums=1)
assert d2f_dxdy(x, y) == 1

Python: How to prevent Scipy's optimize.minimize function from changing the shape of initial guess x0?

I am trying to implement the optimization algorithm from Scipy. It works fine when I implement it without inputting the Jacobian gradient function. I believe the issue that I am getting when I input the gradient is because the minimize function itself is changing the shape of the initial guess x0. You can see this from the output of the code below.
Input:
import numpy as np
from costFunction import *
import scipy.optimize as op
def sigmoid(z):
epsilon = np.finfo(z.dtype).eps
g = 1/(1+np.exp(-z))
g = np.clip(g,epsilon,1-epsilon)
return g
def costFunction(theta,X,y):
m = y.size
h = sigmoid(X#theta)
J = 1/(m)*(-y.T#np.log(h)-(1-y).T#np.log(1-h))
grad = 1/m*X.T#(h-y)
print ('Shape of theta is',np.shape(theta),'\n')
print ('Shape of gradient is',np.shape(grad),'\n')
return J, grad
X = np.array([[1, 3],[5,7]])
y = np.array([[1],[0]])
m,n = np.shape(X)
one_vec = np.ones((m,1))
X = np.hstack((one_vec,X))
initial_theta = np.zeros((n+1,1))
print ('Running costFunction before executing minimize function...\n')
cost, grad = costFunction(initial_theta,X,y) #To test the shape of gradient before calling minimize
print ('Executing minimize function...\n')
Result = op.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
Output:
Running costFunction before executing minimize function...
Shape of theta is (3, 1)
Traceback (most recent call last):
Shape of gradient is (3, 1)
Executing minimize function...
Shape of theta is (3,)
File "C:/Users/#####/minimizeshapechange.py", line 34, in <module>
Shape of gradient is (3, 2)
Result = op.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
File "C:\Users\#####\anaconda3\lib\site-packages\scipy\optimize\_minimize.py", line 453, in minimize
**options)
File "C:\Users\#####\anaconda3\lib\site-packages\scipy\optimize\tnc.py", line 409, in _minimize_tnc
xtol, pgtol, rescale, callback)
ValueError: tnc: invalid gradient vector from minimized function.
Process finished with exit code 1

I will not analyze your exact computations, but some remarks:
(1) Your gradient is broken!
scipy expects a partial derivative resulting in an array of shape equal to your x0.
your gradient is of shape (3,2), while (n+1, 1) is expected
compare with the example given in the tutorial which uses scipy.optimize.rosen_der (der = derivative)
(2) It seems your scipy-version is a bit older, because mine (0.19.0) is telling me:
ValueError: tnc: invalid gradient vector from minimized function.
Some supporting source-code from scipy:
if (PyArray_SIZE(arr_grad) != py_state->n)
{
PyErr_SetString(PyExc_ValueError,
"tnc: invalid gradient vector from minimized function.");
goto failure;
Remark: This code above was changed / touched / introduced 5 years ago. If you really don't get this error while using your code listed (with removal of the import of costFunction), it seems you are using scipy < v0.13.0b1, which i do no recommend! I assume you are using some deprecated windows-based inofficial distribution with outdated scipy. You should change that!

I had the same problem with Scipy trying to do the same thing as you. I don't understand exactly why this solves the problem but playing with array shapes until it worked gave me the following:
Gradient function defined as follows
def Gradient(theta,X,y):
#Initializing variables
m = len(y)
theta = theta[:,np.newaxis] #<---- THIS IS THE TRICK
grad = np.zeros(theta.shape)
#Vectorized computations
z = X # theta
h = sigmoid(z)
grad = (1/m)*(X.T # ( h - y));
return grad #< --- also works with grad.ravel()
Initial_theta initialized as
initial_theta = np.zeros((n+1))
initial_theta.shape
(3,)
i.e. a simple numpy array rather than a column vector.
Gradient function returns
Gradient(initial_theta,X,y).shape
(3,1) or (3,) depending on whether the function returns grad or grad.ravel
scipy.optimize called as
import scipy.optimize as opt
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
What does not work with Scipy
initial_theta of shape (3,1) using initial_theta = np.zeros((n+1))[:,np.newaxis] crashes the scipy.minimize function call.
ValueError: tnc: invalid gradient vector from minimized function.
If someone could clarify these points that would be great ! Thanks

your code of costFunctuion is wrong,maybe you should look that
def costFunction(theta,X,y):
h_theta = sigmoid(X#theta)
J = (-y) * np.log(h_theta) - (1 - y) * np.log(1 - h_theta)
return np.mean(J)

please copy and past in jpuiter in1 and so on in separte cell
In 1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
filepath =('C:/Pythontry/MachineLearning/dataset/couresra/ex2data1.txt')
data =pd.read_csv(filepath,sep=',',header=None)
#print(data)
X = data.values[:,:2] #(100,2)
y = data.values[:,2:3] #(100,1)
#print(np.shape(y))
#In 2
#%% ==================== Part 1: Plotting ====================
postive_value = data.loc[data[2] == 1]
#print(postive_value.values[:,2:3])
negative_value = data.loc[data[2] == 0]
#print(len(postive_value))
#print(len(negative_value))
ax1 = postive_value.plot(kind='scatter',x=0,y=1,s=50,color='b',marker="+",label="Admitted") # S is line width #https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter
ax2 = negative_value.plot(kind='scatter',x=0,y=1,s=50,color='y',ax=ax1,label="Not Admitted")
ax1.set_xlabel("Exam 1 score")
ax2.set_ylabel("Exam 2 score")
plt.show()
#print(ax1 == ax2)
#print(np.shape(X))
# In 3
#============ Part 2: Compute Cost and Gradient ===========
[m,n] = np.shape(X) #(100,2)
print(m,n)
additional_coulmn = np.ones((m,1))
X = np.append(additional_coulmn,X,axis=1)
initial_theta = np.zeros((n+1), dtype=int)
print(initial_theta)
# In4
#Sigmoid and cost function
def sigmoid(z):
g = np.zeros(np.shape(z));
g = 1/(1+np.exp(-z));
return g
def costFunction(theta, X, y):
J = 0;
#print(theta)
receive_theta = np.array(theta)[np.newaxis] ##This command is used to create the 1D array
#print(receive_theta)
theta = np.transpose(receive_theta)
#print(np.shape(theta))
#grad = np.zeros(np.shape(theta))
z = np.dot(X,theta) # where z = theta*X
#print(z)
h = sigmoid(z) #formula h(x) = g(z) whether g = 1/1+e(-z) #(100,1)
#print(np.shape(h))
#J = np.sum(((-y)*np.log(h)-(1-y)*np.log(1-h))/m);
J = np.sum(np.dot((-y.T),np.log(h))-np.dot((1-y).T,np.log(1-h)))/m
#J = (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
#error = h-y
#print(np.shape(error))
#print(np.shape(X))
grad =np.dot(X.T,(h-y))/m
#print(grad)
return J,grad
#In5
[cost, grad] = costFunction(initial_theta, X, y)
print('Cost at initial theta (zeros):', cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros): \n',grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n')
In6 # Compute and display cost and gradient with non-zero theta
test_theta = [-24, 0.2, 0.2]
#test_theta_value = np.array([-24, 0.2, 0.2])[np.newaxis] #This command is used to create the 1D row array
#test_theta = np.transpose(test_theta_value) # Transpose
#test_theta = test_theta_value.transpose()
[cost, grad] = costFunction(test_theta, X, y)
print('\nCost at test theta: \n', cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta: \n',grad);
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')
#IN6
# ============= Part 3: Optimizing using range =============
import scipy.optimize as opt
#initial_theta_initialize = np.array([0, 0, 0])[np.newaxis]
#initial_theta = np.transpose(initial_theta_initialize)
print ('Executing minimize function...\n')
# Working models
#result = opt.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
result = opt.fmin_tnc(func=costFunction, x0=initial_theta, args=(X, y))
# Not working model
#costFunction(initial_theta,X,y)
#model = opt.minimize(fun = costFunction, x0 = initial_theta, args = (X, y), method = 'TNC',jac = costFunction)
print('Thetas found by fmin_tnc function: ', result);
print('Cost at theta found : \n', cost);
print('Expected cost (approx): 0.203\n');
print('theta: \n',result[0]);
print('Expected theta (approx):\n');
print(' -25.161\n 0.206\n 0.201\n');
Result:
Executing minimize function...
Thetas found by fmin_tnc function: (array([-25.16131854, 0.20623159, 0.20147149]), 36, 0)
Cost at theta found :
0.218330193827
Expected cost (approx): 0.203
theta:
[-25.16131854 0.20623159 0.20147149]
Expected theta (approx):
-25.161
0.206
0.201

scipy’s fmin_tnc doesn’t work well with column or row vector. It expects the parameters to be in an array format.
Python Implementation of Andrew Ng’s Machine Learning Course (Part 2.1)
opt.fmin_tnc(func = costFunction, x0 = theta.flatten(),fprime = gradient, args = (X, y.flatten()))

What worked for me is to reshape y as a vector (1-D) rather than a matrix (2-D array). I simply used the following code and then reran the SciPy's minimize function and it worked.
y = np.reshape(y,100) #e.g., if your y variable has 100 data points.

Little bit late but I also started anderw assignment to implement with Python and put a lot of effort to resolve the mentioned issue. Finally is works for me.
This blog help me but with one changes in fmin_tnc function calling, refer below :-
result = op.fmin_tnc(func=costFunction, x0=initial_theta, fprime=None, approx_grad=True, args=(X, y)) Got this info from here

Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

I'm working through my Matlab code for the Andrew NG Coursera course and turning it into python. I am working on non-regularized logistic regression and after writing my gradient and cost functions I needed something similar to fminunc and after some googling, I found a couple options. They are both returning the same results, but they do not match what is in Andrew NG's expected results code. Others seem to be getting this to work correctly, but I'm wondering why my specific code does not seem to return the desired result when using scipy.optimize functions, but does for the cost and gradient pieces earlier in the code.
The data I'm using can be found at the link below;
ex2data1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op
#Machine Learning Online Class - Exercise 2: Logistic Regression
#Load Data
#The first two columns contains the exam scores and the third column contains the label.
data = pd.read_csv('ex2data1.txt', header = None)
X = np.array(data.iloc[:, 0:2]) #100 x 3
y = np.array(data.iloc[:,2]) #100 x 1
y.shape = (len(y), 1)
#Creating sub-dataframes for plotting
pos_plot = data[data[2] == 1]
neg_plot = data[data[2] == 0]
#==================== Part 1: Plotting ====================
#We start the exercise by first plotting the data to understand the
#the problem we are working with.
print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.')
plt.plot(pos_plot[0], pos_plot[1], "+", label = "Admitted")
plt.plot(neg_plot[0], neg_plot[1], "o", label = "Not Admitted")
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend()
plt.show()
def sigmoid(z):
'''
SIGMOID Compute sigmoid function
g = SIGMOID(z) computes the sigmoid of z.
Instructions: Compute the sigmoid of each value of z (z can be a matrix,
vector or scalar).
'''
g = 1 / (1 + np.exp(-z))
return g
def costFunction(theta, X, y):
'''
COSTFUNCTION Compute cost and gradient for logistic regression
J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
parameter for logistic regression and the gradient of the cost
w.r.t. to the parameters.
'''
m = len(y) #number of training examples
h = sigmoid(X.dot(theta)) #logisitic regression hypothesis
J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
#h is 100x1, y is %100x1, these end up as 2 vector we subtract from each other
#then we sum the values by rows
#cost function for logisitic regression
return J
def gradient(theta, X, y):
m = len(y)
grad = np.zeros((theta.shape))
h = sigmoid(X.dot(theta))
for i in range(len(theta)): #number of rows in theta
XT = X[:,i]
XT.shape = (len(X),1)
grad[i] = (1/m) * np.sum((h-y)*XT) #updating each row of the gradient
return grad
#============ Part 2: Compute Cost and Gradient ============
#In this part of the exercise, you will implement the cost and gradient
#for logistic regression. You neeed to complete the code in costFunction.m
#Add intercept term to x and X_test
Bias = np.ones((len(X), 1))
X = np.column_stack((Bias, X))
#Initialize fitting parameters
initial_theta = np.zeros((len(X[0]), 1))
#Compute and display initial cost and gradient
(cost, grad) = costFunction(initial_theta, X, y), gradient(initial_theta, X, y)
print('Cost at initial theta (zeros): %f' % cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros):')
print(grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628')
#Compute and display cost and gradient with non-zero theta
test_theta = np.array([[-24], [0.2], [0.2]]);
(cost, grad) = costFunction(test_theta, X, y), gradient(test_theta, X, y)
print('\nCost at test theta: %f' % cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta:')
print(grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')
result = op.fmin_tnc(func = costFunction, x0 = initial_theta, fprime = gradient, args = (X,y))
result[1]
Result = op.minimize(fun = costFunction,
x0 = initial_theta,
args = (X, y),
method = 'TNC',
jac = gradient, options={'gtol': 1e-3, 'disp': True, 'maxiter': 1000})
theta = Result.x
theta
test = np.array([[1, 45, 85]])
prob = sigmoid(test.dot(theta))
print('For a student with scores 45 and 85, we predict an admission probability of %f,' % prob)
print('Expected value: 0.775 +/- 0.002\n')

This was a very difficult problem to debug, and illustrates a poorly documented aspect of the scipy.optimize interface. The documentation vaguely indicates that theta will be passed around as a vector:
Minimization of scalar function of one or more variables.
In general, the optimization problems are of the form:
minimize f(x) subject to
g_i(x) >= 0, i = 1,...,m
h_j(x) = 0, j = 1,...,p
where x is a vector of one or more variables.
What's important is that they really mean vector in the most primitive sense, a 1-dimensional array. So you have to expect that whenever theta is passed into one of your callbacks, it will be passed in as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, from 2-d column arrays).
I don't know exactly why it's causing a problem in your case, but it's easily fixed regardless. You just have to add the following at the top of both your cost function and your gradient function:
theta = theta.reshape(-1, 1)
This guarantees that theta will be a 2-d column array, as expected. Once you've done this, the results are correct.

I have had similar issues with Scipy dealing with the same problem as you. As senderle points out the interface is not the easiest to deal with, especially combined with the numpy array interface... Here is my implementation which works as expected.
Defining the cost and gradient functions
Note that initial_theta is passed as a simple array of shape (3,) and converted to a column vector of shape (3,1) within the function. The gradient function then returns the grad.ravel() which has shape (3,) again. This is important as doing otherwise caused an error message with various optimization methods in Scipy.optimize.
Note that different methods have different behaviours but returning .ravel() seems to fix most issues...
import pandas as pd
import numpy as np
import scipy.optimize as opt
def sigmoid(x):
return 1 / (1 + np.exp(-x))
def CostFunc(theta,X,y):
#Initializing variables
m = len(y)
J = 0
grad = np.zeros(theta.shape)
#Vectorized computations
z = X # theta
h = sigmoid(z)
J = (1/m) * ( (-y.T # np.log(h)) - (1 - y).T # np.log(1-h));
return J
def Gradient(theta,X,y):
#Initializing variables
m = len(y)
theta = theta[:,np.newaxis]
grad = np.zeros(theta.shape)
#Vectorized computations
z = X # theta
h = sigmoid(z)
grad = (1/m)*(X.T # ( h - y));
return grad.ravel() #<-- This is the trick
Initializing variables and parameters
Note that initial_theta.shape returns (3,)
X = data1.iloc[:,0:2].values
m,n = X.shape
X = np.concatenate((np.ones(m)[:,np.newaxis],X),1)
y = data1.iloc[:,-1].values[:,np.newaxis]
initial_theta = np.zeros((n+1))
Calling Scipy.optimize
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
Any comments from more knowledgeable people are welcome, this Scipy interface is a mystery to me, thanks

fmin_cg: Desired error not necessarily achieved due to precision loss

I have the following code to minimize the Cost Function with its gradient.
def trainLinearReg( X, y, lamda ):
# theta = zeros( shape(X)[1], 1 )
theta = random.rand( shape(X)[1], 1 ) # random initialization of theta
result = scipy.optimize.fmin_cg( computeCost, fprime = computeGradient, x0 = theta,
args = (X, y, lamda), maxiter = 200, disp = True, full_output = True )
return result[1], result[0]
But I am having this warning:
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 8403387632289934651424768.000000
Iterations: 0
Function evaluations: 15
Gradient evaluations: 3
My computeCost and computeGradient are defined as
def computeCost( theta, X, y, lamda ):
theta = theta.reshape( shape(X)[1], 1 )
m = shape(y)[0]
J = 0
grad = zeros( shape(theta) )
h = X.dot(theta)
squaredErrors = (h - y).T.dot(h - y)
# theta[0] = 0.0
J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
return J[0]
def computeGradient( theta, X, y, lamda ):
theta = theta.reshape( shape(X)[1], 1 )
m = shape(y)[0]
J = 0
grad = zeros( shape(theta) )
h = X.dot(theta)
squaredErrors = (h - y).T.dot(h - y)
# theta[0] = 0.0
J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
grad = (1.0 / m) * (X.T.dot(h - y)) + (lamda / m) * theta
return grad.flatten()
I have reviewed these similar questions:
scipy.optimize.fmin_bfgs: “Desired error not necessarily achieved due to precision loss”
scipy.optimize.fmin_cg: "'Desired error not necessarily achieved due to precision loss.'
scipy is not optimizing and returns "Desired error not necessarily achieved due to precision loss"
But still cannot have the solution to my problem. How to let the minimization function process converge instead of being stuck at first?
ANSWER:
I solve this problem based on #lejlot 's comments below.
He is right. The data set X is to large since I did not properly return the correct normalized value to the correct variable. Even though this is a small mistake, it indeed can give you the thought where should we look at when encountering such problems. The Cost Function value is too large leads to the possibility that there are some wrong with my data set.
The previous wrong one:
X_poly = polyFeatures(X, p)
X_norm, mu, sigma = featureNormalize(X_poly)
X_poly = c_[ones((m, 1)), X_poly]
The correct one:
X_poly = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)
X_poly = c_[ones((m, 1)), X_poly]
where X_poly is actually used in the following traing as
cost, theta = trainLinearReg(X_poly, y, lamda)

ANSWER:
I solve this problem based on #lejlot 's comments below.
He is right. The data set X is to large since I did not properly return the correct normalized value to the correct variable. Even though this is a small mistake, it indeed can give you the thought where should we look at when encountering such problems. The Cost Function value is too large leads to the possibility that there are some wrong with my data set.
The previous wrong one:
X_poly = polyFeatures(X, p)
X_norm, mu, sigma = featureNormalize(X_poly)
X_poly = c_[ones((m, 1)), X_poly]
The correct one:
X_poly = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)
X_poly = c_[ones((m, 1)), X_poly]
where X_poly is actually used in the following traing as
cost, theta = trainLinearReg(X_poly, y, lamda)

For my implementation scipy.optimize.fmin_cg also failed with the above-mentioned error in some initial guesses. Then I changed it to the BFGS method and converged.
scipy.optimize.minimize(fun, x0, args=(), method='BFGS', jac=None, tol=None, callback=None, options={'disp': False, 'gtol': 1e-05, 'eps': 1.4901161193847656e-08, 'return_all': False, 'maxiter': None, 'norm': inf})
seems that this error in cg is inevitable still as,
CG ends up with a non-descent direction

I too faced this problem and even after searching a lot for solutions nothing happened as the solutions were not clearly defined.
Then I read the documentation from scipy.optimize.fmin_cg where it is clearly mentioned that parameter x0 must be a 1-D array.
My approach was same as that of you wherein I passed 2-D matrix as x0 and I always got some precision error or divide by zero error and same warning as you got.
Then I changed my approach and passed theta as a 1-D array and converted that array into 2-D matrix inside the computeCost and computeGradient function which worked for me and I got the results as expected.
My solutiion for Logistic Regression
def sigmoid(z):
return 1 / (1 + np.exp(-z))
theta = np.zeros(features)
def computeCost(theta,X, Y):
x = np.matrix(X.values)
y = np.matrix(Y.values)
theta = np.matrix(theta)
xtheta = np.matmul(x,theta.T)
hx = sigmoid(xtheta)
cost = (np.multiply(y,np.log(hx)))+(np.multiply((1-y),np.log(1-hx)))
return -(np.sum(cost))/m
def computeGradient(theta, X, Y):
x = np.matrix(X.values)
y = np.matrix(Y.values)
theta = np.matrix(theta)
grad = np.zeros(features)
xtheta = np.matmul(x,theta.T)
hx = sigmoid(xtheta)
error = hx-Y
for i in range(0,features,1):
term = np.multiply(error,x[:,i])
grad[i] = (np.sum(term))/m
return grad
import scipy.optimize as opt
result = opt.fmin_tnc(func=computeCost, x0=theta, fprime=computeGradient, args=(X, Y))
print cost(result[0],X, Y)
Note Again that theta has to be a 1-D array
So in your code modify theta in trainLinearReg to theta = random.randn(features)

I today faced this problem.
I then noticed that my cost function was implemented wrong way and was producing high scaled errors due to which scipy was asking for more data. Hope this helps for someone like me.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

scipy.optimize.minimize() in place of gradient desc - python

Related

Gradient descent function in python - error in loss function or weights

Understanding JAX argnums parameter in its gradient function

Python: How to prevent Scipy's optimize.minimize function from changing the shape of initial guess x0?

Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

fmin_cg: Desired error not necessarily achieved due to precision loss

Categories

Resources