Questions about using scipy to deal with Lasso - python

I'm struggling to use scipy for lasso regression.
I wrote code that should, in theory, minimize the problem

minimize over w: (y - Xw)^T (y - Xw) + lambda * ||w||_p

where p = 1.
However, it raises the error below, and I'm wondering where I went wrong. The shape of X is (50, 3) and y is (50, 1):
ValueError: The user-provided objective function must return a scalar value.
My code goes like this:
def norm_one_regression(w, X, y, lambda_):
    a = y - X @ w
    a = a.T @ a
    return a + lambda_ * np.linalg.norm(w, 1)

# using scipy
w0 = np.ones((3, 1))
X0 = data['x']
y0 = data['y']
lambda_0 = 1
minimum = sop.fmin(norm_one_regression, w0, args=(X0, y0, lambda_0))
print(minimum)
Thanks for your consideration and help.
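A likely cause, judging from the shapes described (this is my reading, not a confirmed answer): sop.fmin flattens w0 to shape (3,), so X @ w has shape (50,) while y has shape (50, 1); the subtraction broadcasts to (50, 50), and a.T @ a is then a matrix rather than a scalar. A minimal sketch of one possible fix, using hypothetical stand-in data with the stated shapes:

import numpy as np
import scipy.optimize as sop

def norm_one_regression(w, X, y, lambda_):
    # fmin passes w as a 1-D array of shape (3,)
    r = y.ravel() - X @ w  # keep the residual 1-D
    return float(r @ r + lambda_ * np.linalg.norm(w, 1))  # return a plain scalar

rng = np.random.default_rng(0)
X0 = rng.normal(size=(50, 3))
y0 = rng.normal(size=(50, 1))
w0 = np.ones(3)  # 1-D initial guess
minimum = sop.fmin(norm_one_regression, w0, args=(X0, y0, 1.0))
print(minimum)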

Related

2 Different Costs in Multivariate Linear Regression due to Numpy

I started my ML journey by taking Andrew Ng's Machine Learning course on Coursera and tried to implement multivariate linear regression in Python. However, I'm having a lot of trouble parsing the data and converting it into a properly working NumPy array. There seem to be so many NumPy functions that perform the same kind of processing on the data that it's hard to figure out which function does what. The major problem is that this affects the algorithm I'm running.
When I implement the code using np.matrix() to convert the data into a NumPy matrix, I get a cost of 2064911681.6185248. I get the same cost when I use .values instead of np.matrix(). However, every Python solution for this problem online gets a cost of 2105448288.629247 by using np.newaxis for X and Y. Whenever I try to use np.newaxis, I get a TypeError saying the key is invalid.
My question is: why does parsing the data in different ways give different costs, even though the data shape is what I want it to be? I've provided my code below. Is there a single efficient and correct way to convert the data into NumPy arrays?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#Load the data
data = pd.read_csv("ex1data2.txt", header=None, names=["Size", "No. of Bedrooms","Price"])
print(data.head(),"\n")
#Initialize columns and size of dataset
cols = data.shape[1]
m = data.shape[0]
theta = np.matrix(np.zeros(cols))
ones = np.ones((m,1))
#Initializing Parameters
alpha = 0.01
iters = 400
cost_list = np.zeros(iters)
#Setting X and Y
X = np.matrix(data.iloc[:,:cols-1])
Y = np.matrix(data.iloc[:,cols-1:])
#Feature Scaling and Adding Ones to X
X = (X - np.mean(X)) / np.std(X)
X = np.hstack((ones, X))
print(f"X Shape: {X.shape} \nY Shape: {Y.shape} \nTheta Shape: {theta.shape} \n")
#Defining Algorithms
def hypothesis(X, theta):
    h = np.dot(X, theta.T)
    return h

def cost_function(X, Y, theta, m):
    squared_error = np.power((hypothesis(X, theta) - Y), 2)
    J = np.sum(squared_error) / (2*m)
    return J

def gradient_descent(X, Y, theta, m, alpha, iters):
    for i in range(iters):
        error = hypothesis(X, theta) - Y
        temp = np.dot(error.T, X)
        theta = theta - ((alpha/m) * temp)
        cost_list[i] = cost_function(X, Y, theta, m)
    return theta, cost_list
#Printing Initial and Final Values
print(f"Inital Theta = {theta}")
print(f"Inital Cost = {cost_function(X, Y, theta, m)}")
new_theta, cost_list = gradient_descent(X, Y, theta, m, alpha, iters)
print(f"Final Theta = {new_theta}")
print(f"Final Cost = {cost_function(X, Y, new_theta, m)}")
#Plotting Cost vs Iterations
plt.plot(cost_list, color="Red", label="Cost")
plt.xlabel("Iterations")
plt.ylabel("Cost")
plt.title("Cost vs Iterations")
plt.legend()
plt.show()
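One plausible source of the discrepancy (my assumption, not stated in the question): np.mean(X) and np.std(X) without an axis argument scale by the statistics of the whole matrix, whereas most reference solutions normalize each feature column separately, and that alone changes the cost. A minimal sketch of per-column scaling on plain NumPy arrays (variable names mirror the question's code):

import numpy as np
import pandas as pd

data = pd.read_csv("ex1data2.txt", header=None, names=["Size", "No. of Bedrooms", "Price"])
X = data.iloc[:, :-1].to_numpy(dtype=float)   # plain ndarray instead of np.matrix
Y = data.iloc[:, -1:].to_numpy(dtype=float)

# normalize each column with its own mean and standard deviation
X = (X - X.mean(axis=0)) / X.std(axis=0)
X = np.hstack((np.ones((X.shape[0], 1)), X))  # add the intercept column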

Python: How to prevent Scipy's optimize.minimize function from changing the shape of initial guess x0?

I am trying to implement the optimization algorithm from Scipy. It works fine when I implement it without inputting the Jacobian gradient function. I believe the issue that I am getting when I input the gradient is because the minimize function itself is changing the shape of the initial guess x0. You can see this from the output of the code below.
Input:
import numpy as np
from costFunction import *
import scipy.optimize as op
def sigmoid(z):
    epsilon = np.finfo(z.dtype).eps
    g = 1/(1+np.exp(-z))
    g = np.clip(g, epsilon, 1-epsilon)
    return g

def costFunction(theta, X, y):
    m = y.size
    h = sigmoid(X @ theta)
    J = 1/(m)*(-y.T @ np.log(h) - (1-y).T @ np.log(1-h))
    grad = 1/m * X.T @ (h-y)
    print('Shape of theta is', np.shape(theta), '\n')
    print('Shape of gradient is', np.shape(grad), '\n')
    return J, grad
X = np.array([[1, 3],[5,7]])
y = np.array([[1],[0]])
m,n = np.shape(X)
one_vec = np.ones((m,1))
X = np.hstack((one_vec,X))
initial_theta = np.zeros((n+1,1))
print ('Running costFunction before executing minimize function...\n')
cost, grad = costFunction(initial_theta,X,y) #To test the shape of gradient before calling minimize
print ('Executing minimize function...\n')
Result = op.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
Output:
Running costFunction before executing minimize function...

Shape of theta is (3, 1)

Shape of gradient is (3, 1)

Executing minimize function...

Shape of theta is (3,)

Shape of gradient is (3, 2)

Traceback (most recent call last):
  File "C:/Users/#####/minimizeshapechange.py", line 34, in <module>
    Result = op.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
  File "C:\Users\#####\anaconda3\lib\site-packages\scipy\optimize\_minimize.py", line 453, in minimize
    **options)
  File "C:\Users\#####\anaconda3\lib\site-packages\scipy\optimize\tnc.py", line 409, in _minimize_tnc
    xtol, pgtol, rescale, callback)
ValueError: tnc: invalid gradient vector from minimized function.

Process finished with exit code 1
I will not analyze your exact computations, but some remarks:
(1) Your gradient is broken!
scipy expects a partial derivative resulting in an array of shape equal to your x0.
Your gradient has shape (3, 2), while a shape matching x0 (here (n+1, 1)) is expected.
Compare with the example given in the tutorial, which uses scipy.optimize.rosen_der (der = derivative).
(2) It seems your scipy version is a bit old, because mine (0.19.0) tells me:
ValueError: tnc: invalid gradient vector from minimized function.
Some supporting source code from scipy:
if (PyArray_SIZE(arr_grad) != py_state->n)
{
    PyErr_SetString(PyExc_ValueError,
                    "tnc: invalid gradient vector from minimized function.");
    goto failure;
Remark: the code above was changed / touched / introduced 5 years ago. If you really don't get this error while running the code you listed (with the import of costFunction removed), you are probably using scipy < v0.13.0b1, which I do not recommend! I assume you are using some deprecated, unofficial Windows-based distribution with an outdated scipy. You should change that!
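A minimal sketch of the shape fix described above (my adaptation of the question's code, not part of the original answer): keep theta 1-D everywhere so the returned gradient matches x0.

import numpy as np
import scipy.optimize as op

def sigmoid(z):
    g = 1 / (1 + np.exp(-z))
    eps = np.finfo(float).eps
    return np.clip(g, eps, 1 - eps)  # clip to avoid log(0)

def costFunction(theta, X, y):
    # theta arrives as a 1-D array of shape (n+1,)
    m = y.size
    h = sigmoid(X @ theta)                    # shape (m,)
    J = (-y @ np.log(h) - (1 - y) @ np.log(1 - h)) / m
    grad = X.T @ (h - y) / m                  # shape (n+1,), same as x0
    return J, grad

X = np.array([[1., 3.], [5., 7.]])
y = np.array([1., 0.])                        # 1-D targets
X = np.hstack((np.ones((X.shape[0], 1)), X))
initial_theta = np.zeros(X.shape[1])          # 1-D initial guess
Result = op.minimize(costFunction, initial_theta, args=(X, y),
                     method='TNC', jac=True, options={'maxiter': 400})
print(Result.x)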
I had the same problem with Scipy trying to do the same thing as you. I don't understand exactly why this solves the problem but playing with array shapes until it worked gave me the following:
Gradient function defined as follows
def Gradient(theta, X, y):
    #Initializing variables
    m = len(y)
    theta = theta[:, np.newaxis]  #<---- THIS IS THE TRICK
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    grad = (1/m) * (X.T @ (h - y))
    return grad  #<--- also works with grad.ravel()
Initial_theta initialized as
initial_theta = np.zeros((n+1))
initial_theta.shape
(3,)
i.e. a simple numpy array rather than a column vector.
Gradient function returns
Gradient(initial_theta,X,y).shape
(3, 1) or (3,), depending on whether the function returns grad or grad.ravel()
scipy.optimize called as
import scipy.optimize as opt
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
What does not work with Scipy
initial_theta of shape (3,1) using initial_theta = np.zeros((n+1))[:,np.newaxis] crashes the scipy.minimize function call.
ValueError: tnc: invalid gradient vector from minimized function.
If someone could clarify these points, that would be great! Thanks
Your costFunction code is wrong; maybe you should look at this:
def costFunction(theta, X, y):
    h_theta = sigmoid(X @ theta)
    J = (-y) * np.log(h_theta) - (1 - y) * np.log(1 - h_theta)
    return np.mean(J)
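For context, here is a minimal sketch of how a scalar-only cost like this could be passed to scipy (my illustration, not part of the original answer); with no jac supplied, minimize approximates the gradient numerically:

import numpy as np
import scipy.optimize as opt

def sigmoid(z):
    g = 1 / (1 + np.exp(-z))
    eps = np.finfo(float).eps
    return np.clip(g, eps, 1 - eps)  # clip to avoid log(0)

def costFunction(theta, X, y):
    h_theta = sigmoid(X @ theta)
    J = (-y) * np.log(h_theta) - (1 - y) * np.log(1 - h_theta)
    return np.mean(J)

# hypothetical small data set with an intercept column already added
X = np.array([[1., 1., 3.], [1., 5., 7.]])
y = np.array([1., 0.])
initial_theta = np.zeros(X.shape[1])

res = opt.minimize(costFunction, initial_theta, args=(X, y), method='TNC')
print(res.x)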
Please copy and paste In 1 and so on into separate Jupyter cells.
In 1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
filepath =('C:/Pythontry/MachineLearning/dataset/couresra/ex2data1.txt')
data =pd.read_csv(filepath,sep=',',header=None)
#print(data)
X = data.values[:,:2] #(100,2)
y = data.values[:,2:3] #(100,1)
#print(np.shape(y))
#In 2
#%% ==================== Part 1: Plotting ====================
postive_value = data.loc[data[2] == 1]
#print(postive_value.values[:,2:3])
negative_value = data.loc[data[2] == 0]
#print(len(postive_value))
#print(len(negative_value))
ax1 = postive_value.plot(kind='scatter',x=0,y=1,s=50,color='b',marker="+",label="Admitted") # S is line width #https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.scatter.html#matplotlib.axes.Axes.scatter
ax2 = negative_value.plot(kind='scatter',x=0,y=1,s=50,color='y',ax=ax1,label="Not Admitted")
ax1.set_xlabel("Exam 1 score")
ax2.set_ylabel("Exam 2 score")
plt.show()
#print(ax1 == ax2)
#print(np.shape(X))
# In 3
#============ Part 2: Compute Cost and Gradient ===========
[m,n] = np.shape(X) #(100,2)
print(m,n)
additional_coulmn = np.ones((m,1))
X = np.append(additional_coulmn,X,axis=1)
initial_theta = np.zeros((n+1), dtype=int)
print(initial_theta)
# In4
#Sigmoid and cost function
def sigmoid(z):
    g = np.zeros(np.shape(z));
    g = 1/(1+np.exp(-z));
    return g

def costFunction(theta, X, y):
    J = 0;
    #print(theta)
    receive_theta = np.array(theta)[np.newaxis]  ## This command is used to create the 1D array
    #print(receive_theta)
    theta = np.transpose(receive_theta)
    #print(np.shape(theta))
    #grad = np.zeros(np.shape(theta))
    z = np.dot(X, theta)  # where z = theta*X
    #print(z)
    h = sigmoid(z)  # formula h(x) = g(z) where g = 1/(1+e(-z))  # (100,1)
    #print(np.shape(h))
    #J = np.sum(((-y)*np.log(h)-(1-y)*np.log(1-h))/m);
    J = np.sum(np.dot((-y.T), np.log(h)) - np.dot((1-y).T, np.log(1-h)))/m
    #J = (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()
    #error = h-y
    #print(np.shape(error))
    #print(np.shape(X))
    grad = np.dot(X.T, (h-y))/m
    #print(grad)
    return J, grad
#In5
[cost, grad] = costFunction(initial_theta, X, y)
print('Cost at initial theta (zeros):', cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros): \n',grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n')
# In 6
# Compute and display cost and gradient with non-zero theta
test_theta = [-24, 0.2, 0.2]
#test_theta_value = np.array([-24, 0.2, 0.2])[np.newaxis] #This command is used to create the 1D row array
#test_theta = np.transpose(test_theta_value) # Transpose
#test_theta = test_theta_value.transpose()
[cost, grad] = costFunction(test_theta, X, y)
print('\nCost at test theta: \n', cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta: \n',grad);
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')
#IN6
# ============= Part 3: Optimizing using range =============
import scipy.optimize as opt
#initial_theta_initialize = np.array([0, 0, 0])[np.newaxis]
#initial_theta = np.transpose(initial_theta_initialize)
print ('Executing minimize function...\n')
# Working models
#result = opt.minimize(costFunction,initial_theta,args=(X,y),method='TNC',jac=True,options={'maxiter':400})
result = opt.fmin_tnc(func=costFunction, x0=initial_theta, args=(X, y))
# Not working model
#costFunction(initial_theta,X,y)
#model = opt.minimize(fun = costFunction, x0 = initial_theta, args = (X, y), method = 'TNC',jac = costFunction)
print('Thetas found by fmin_tnc function: ', result);
print('Cost at theta found : \n', cost);
print('Expected cost (approx): 0.203\n');
print('theta: \n',result[0]);
print('Expected theta (approx):\n');
print(' -25.161\n 0.206\n 0.201\n');
Result:
Executing minimize function...
Thetas found by fmin_tnc function: (array([-25.16131854, 0.20623159, 0.20147149]), 36, 0)
Cost at theta found :
0.218330193827
Expected cost (approx): 0.203
theta:
[-25.16131854 0.20623159 0.20147149]
Expected theta (approx):
-25.161
0.206
0.201
scipy’s fmin_tnc doesn’t work well with column or row vectors; it expects the parameters as a flat 1-D array.
Python Implementation of Andrew Ng’s Machine Learning Course (Part 2.1)
opt.fmin_tnc(func = costFunction, x0 = theta.flatten(),fprime = gradient, args = (X, y.flatten()))
What worked for me was to reshape y as a vector (1-D) rather than a matrix (2-D array). I simply used the following code, then reran SciPy's minimize function, and it worked.
y = np.reshape(y, 100)  # e.g., if your y variable has 100 data points
A little bit late, but I also started the Andrew Ng assignment in Python and put a lot of effort into resolving the mentioned issue. Finally it works for me.
This blog helped me, but with one change in the fmin_tnc function call; refer below:
result = op.fmin_tnc(func=costFunction, x0=initial_theta, fprime=None, approx_grad=True, args=(X, y))
Got this info from here

Cost Function and Gradient Seem to be Working, but scipy.optimize functions are not

I'm working through my Matlab code for the Andrew Ng Coursera course and turning it into Python. I am working on non-regularized logistic regression, and after writing my gradient and cost functions I needed something similar to fminunc; after some googling, I found a couple of options. They both return the same results, but they do not match what is in Andrew Ng's expected-results code. Others seem to get this to work correctly, but I'm wondering why my specific code does not return the desired result when using scipy.optimize functions, yet does for the cost and gradient pieces earlier in the code.
The data I'm using can be found at the link below;
ex2data1
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize as op
#Machine Learning Online Class - Exercise 2: Logistic Regression
#Load Data
#The first two columns contains the exam scores and the third column contains the label.
data = pd.read_csv('ex2data1.txt', header = None)
X = np.array(data.iloc[:, 0:2]) #100 x 3
y = np.array(data.iloc[:,2]) #100 x 1
y.shape = (len(y), 1)
#Creating sub-dataframes for plotting
pos_plot = data[data[2] == 1]
neg_plot = data[data[2] == 0]
#==================== Part 1: Plotting ====================
#We start the exercise by first plotting the data to understand the
#the problem we are working with.
print('Plotting data with + indicating (y = 1) examples and o indicating (y = 0) examples.')
plt.plot(pos_plot[0], pos_plot[1], "+", label = "Admitted")
plt.plot(neg_plot[0], neg_plot[1], "o", label = "Not Admitted")
plt.xlabel('Exam 1 score')
plt.ylabel('Exam 2 score')
plt.legend()
plt.show()
def sigmoid(z):
    '''
    SIGMOID Compute sigmoid function
    g = SIGMOID(z) computes the sigmoid of z.
    Instructions: Compute the sigmoid of each value of z (z can be a matrix,
    vector or scalar).
    '''
    g = 1 / (1 + np.exp(-z))
    return g

def costFunction(theta, X, y):
    '''
    COSTFUNCTION Compute cost and gradient for logistic regression
    J = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
    parameter for logistic regression and the gradient of the cost
    w.r.t. to the parameters.
    '''
    m = len(y)  # number of training examples
    h = sigmoid(X.dot(theta))  # logistic regression hypothesis
    J = (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))
    # h is 100x1 and y is 100x1; these end up as two vectors we subtract from each other,
    # then we sum the values by rows
    # cost function for logistic regression
    return J

def gradient(theta, X, y):
    m = len(y)
    grad = np.zeros((theta.shape))
    h = sigmoid(X.dot(theta))
    for i in range(len(theta)):  # number of rows in theta
        XT = X[:, i]
        XT.shape = (len(X), 1)
        grad[i] = (1/m) * np.sum((h-y)*XT)  # updating each row of the gradient
    return grad
#============ Part 2: Compute Cost and Gradient ============
#In this part of the exercise, you will implement the cost and gradient
#for logistic regression. You need to complete the code in costFunction.m
#Add intercept term to x and X_test
Bias = np.ones((len(X), 1))
X = np.column_stack((Bias, X))
#Initialize fitting parameters
initial_theta = np.zeros((len(X[0]), 1))
#Compute and display initial cost and gradient
(cost, grad) = costFunction(initial_theta, X, y), gradient(initial_theta, X, y)
print('Cost at initial theta (zeros): %f' % cost)
print('Expected cost (approx): 0.693\n')
print('Gradient at initial theta (zeros):')
print(grad)
print('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628')
#Compute and display cost and gradient with non-zero theta
test_theta = np.array([[-24], [0.2], [0.2]]);
(cost, grad) = costFunction(test_theta, X, y), gradient(test_theta, X, y)
print('\nCost at test theta: %f' % cost)
print('Expected cost (approx): 0.218\n')
print('Gradient at test theta:')
print(grad)
print('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n')
result = op.fmin_tnc(func = costFunction, x0 = initial_theta, fprime = gradient, args = (X,y))
result[1]
Result = op.minimize(fun = costFunction,
x0 = initial_theta,
args = (X, y),
method = 'TNC',
jac = gradient, options={'gtol': 1e-3, 'disp': True, 'maxiter': 1000})
theta = Result.x
theta
test = np.array([[1, 45, 85]])
prob = sigmoid(test.dot(theta))
print('For a student with scores 45 and 85, we predict an admission probability of %f,' % prob)
print('Expected value: 0.775 +/- 0.002\n')
This was a very difficult problem to debug, and illustrates a poorly documented aspect of the scipy.optimize interface. The documentation vaguely indicates that theta will be passed around as a vector:
Minimization of scalar function of one or more variables.
In general, the optimization problems are of the form:
minimize f(x) subject to
g_i(x) >= 0, i = 1,...,m
h_j(x) = 0, j = 1,...,p
where x is a vector of one or more variables.
What's important is that they really mean vector in the most primitive sense, a 1-dimensional array. So you have to expect that whenever theta is passed into one of your callbacks, it will be passed in as a 1-d array. But in numpy, 1-d arrays sometimes behave differently from 2-d row arrays (and, obviously, from 2-d column arrays).
I don't know exactly why it's causing a problem in your case, but it's easily fixed regardless. You just have to add the following at the top of both your cost function and your gradient function:
theta = theta.reshape(-1, 1)
This guarantees that theta will be a 2-d column array, as expected. Once you've done this, the results are correct.
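A minimal sketch of that fix applied to the question's two callbacks (my adaptation, assuming numpy as np and the sigmoid defined in the question above; vectorizing the gradient and returning it flattened to 1-D are my additions, not part of the answer's suggestion, but they sidestep the shape issues discussed earlier):

def costFunction(theta, X, y):
    theta = theta.reshape(-1, 1)  # ensure a 2-D column vector, whatever scipy passes in
    m = len(y)
    h = sigmoid(X.dot(theta))
    return (1/m) * np.sum((-y*np.log(h)) - ((1-y)*np.log(1-h)))

def gradient(theta, X, y):
    theta = theta.reshape(-1, 1)  # same reshape at the top of the gradient
    m = len(y)
    h = sigmoid(X.dot(theta))
    return ((1/m) * X.T.dot(h - y)).ravel()  # 1-D gradient, matching the shape of x0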
I have had similar issues with Scipy dealing with the same problem as you. As senderle points out the interface is not the easiest to deal with, especially combined with the numpy array interface... Here is my implementation which works as expected.
Defining the cost and gradient functions
Note that initial_theta is passed as a simple array of shape (3,) and converted to a column vector of shape (3, 1) within the function. The gradient function then returns grad.ravel(), which has shape (3,) again. This is important, as doing otherwise caused error messages with various optimization methods in scipy.optimize.
Note that different methods have different behaviours but returning .ravel() seems to fix most issues...
import pandas as pd
import numpy as np
import scipy.optimize as opt
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def CostFunc(theta, X, y):
    #Initializing variables
    m = len(y)
    J = 0
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    J = (1/m) * ((-y.T @ np.log(h)) - (1 - y).T @ np.log(1-h))
    return J

def Gradient(theta, X, y):
    #Initializing variables
    m = len(y)
    theta = theta[:, np.newaxis]
    grad = np.zeros(theta.shape)
    #Vectorized computations
    z = X @ theta
    h = sigmoid(z)
    grad = (1/m) * (X.T @ (h - y))
    return grad.ravel()  #<-- This is the trick
Initializing variables and parameters
Note that initial_theta.shape returns (3,)
X = data1.iloc[:,0:2].values
m,n = X.shape
X = np.concatenate((np.ones(m)[:,np.newaxis],X),1)
y = data1.iloc[:,-1].values[:,np.newaxis]
initial_theta = np.zeros((n+1))
Calling Scipy.optimize
model = opt.minimize(fun = CostFunc, x0 = initial_theta, args = (X, y), method = 'TNC', jac = Gradient)
Any comments from more knowledgeable people are welcome; this Scipy interface is a mystery to me. Thanks!

fmin_cg: Desired error not necessarily achieved due to precision loss

I have the following code to minimize the Cost Function with its gradient.
def trainLinearReg(X, y, lamda):
    # theta = zeros(shape(X)[1], 1)
    theta = random.rand(shape(X)[1], 1)  # random initialization of theta
    result = scipy.optimize.fmin_cg(computeCost, fprime=computeGradient, x0=theta,
                                    args=(X, y, lamda), maxiter=200, disp=True,
                                    full_output=True)
    return result[1], result[0]
But I am having this warning:
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 8403387632289934651424768.000000
Iterations: 0
Function evaluations: 15
Gradient evaluations: 3
My computeCost and computeGradient are defined as
def computeCost(theta, X, y, lamda):
    theta = theta.reshape(shape(X)[1], 1)
    m = shape(y)[0]
    J = 0
    grad = zeros(shape(theta))
    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
    return J[0]

def computeGradient(theta, X, y, lamda):
    theta = theta.reshape(shape(X)[1], 1)
    m = shape(y)[0]
    J = 0
    grad = zeros(shape(theta))
    h = X.dot(theta)
    squaredErrors = (h - y).T.dot(h - y)
    # theta[0] = 0.0
    J = (1.0 / (2 * m)) * (squaredErrors) + (lamda / (2 * m)) * (theta.T.dot(theta))
    grad = (1.0 / m) * (X.T.dot(h - y)) + (lamda / m) * theta
    return grad.flatten()
I have reviewed these similar questions:
scipy.optimize.fmin_bfgs: “Desired error not necessarily achieved due to precision loss”
scipy.optimize.fmin_cg: "'Desired error not necessarily achieved due to precision loss.'
scipy is not optimizing and returns "Desired error not necessarily achieved due to precision loss"
But I still cannot find the solution to my problem. How can I get the minimization process to converge instead of getting stuck at the very start?
ANSWER:
I solved this problem based on @lejlot's comments below.
He is right. The values in the data set X are too large because I did not return the normalized values to the correct variable. Even though this is a small mistake, it does show where to look when encountering such problems: a cost function value this large suggests something is wrong with the data set.
The previous wrong one:
X_poly = polyFeatures(X, p)
X_norm, mu, sigma = featureNormalize(X_poly)
X_poly = c_[ones((m, 1)), X_poly]
The correct one:
X_poly = polyFeatures(X, p)
X_poly, mu, sigma = featureNormalize(X_poly)
X_poly = c_[ones((m, 1)), X_poly]
where X_poly is actually used in the following traing as
cost, theta = trainLinearReg(X_poly, y, lamda)
For my implementation, scipy.optimize.fmin_cg also failed with the above-mentioned error for some initial guesses. Then I switched to the BFGS method and it converged.
scipy.optimize.minimize(fun, x0, args=(), method='BFGS', jac=None, tol=None, callback=None, options={'disp': False, 'gtol': 1e-05, 'eps': 1.4901161193847656e-08, 'return_all': False, 'maxiter': None, 'norm': inf})
It seems this error is still hard to avoid with CG, as CG can end up with a non-descent direction.
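As an illustration (my own sketch, not code from the answer), here is how a regularized linear-regression cost and gradient like the question's can be handed to minimize with method='BFGS'; the toy data and the cleaned-up function bodies are assumptions on my part:

import numpy as np
from scipy.optimize import minimize

# toy data standing in for the question's X_poly and y
rng = np.random.default_rng(0)
X = np.hstack((np.ones((12, 1)), rng.normal(size=(12, 2))))
y = rng.normal(size=(12, 1))
lamda = 1.0

def computeCost(theta, X, y, lamda):
    theta = theta.reshape(-1, 1)
    m = y.shape[0]
    r = X.dot(theta) - y
    J = (r.T.dot(r) + lamda * theta.T.dot(theta)) / (2 * m)
    return J.item()  # scalar cost

def computeGradient(theta, X, y, lamda):
    theta = theta.reshape(-1, 1)
    m = y.shape[0]
    h = X.dot(theta)
    grad = X.T.dot(h - y) / m + (lamda / m) * theta
    return grad.ravel()  # 1-D gradient

theta0 = np.zeros(X.shape[1])  # 1-D initial guess
res = minimize(computeCost, theta0, args=(X, y, lamda), method='BFGS',
               jac=computeGradient, options={'disp': True})
print(res.x)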
I too faced this problem, and even after searching a lot for solutions nothing worked, because the solutions were not clearly explained.
Then I read the documentation for scipy.optimize.fmin_cg, where it is clearly stated that the parameter x0 must be a 1-D array.
My approach was the same as yours: I passed a 2-D matrix as x0 and always got a precision error or a divide-by-zero error, plus the same warning you got.
Then I changed my approach: I passed theta as a 1-D array and converted that array into a 2-D matrix inside the computeCost and computeGradient functions. That worked for me and I got the results I expected.
My solution for Logistic Regression:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

theta = np.zeros(features)

def computeCost(theta, X, Y):
    x = np.matrix(X.values)
    y = np.matrix(Y.values)
    theta = np.matrix(theta)
    xtheta = np.matmul(x, theta.T)
    hx = sigmoid(xtheta)
    cost = (np.multiply(y, np.log(hx))) + (np.multiply((1 - y), np.log(1 - hx)))
    return -(np.sum(cost)) / m

def computeGradient(theta, X, Y):
    x = np.matrix(X.values)
    y = np.matrix(Y.values)
    theta = np.matrix(theta)
    grad = np.zeros(features)
    xtheta = np.matmul(x, theta.T)
    hx = sigmoid(xtheta)
    error = hx - y
    for i in range(0, features, 1):
        term = np.multiply(error, x[:, i])
        grad[i] = (np.sum(term)) / m
    return grad

import scipy.optimize as opt
result = opt.fmin_tnc(func=computeCost, x0=theta, fprime=computeGradient, args=(X, Y))
print(computeCost(result[0], X, Y))
Note again that theta has to be a 1-D array.
So in your code, modify theta in trainLinearReg to theta = random.randn(features).
I faced this problem today.
I then noticed that my cost function was implemented the wrong way and was producing errors on a large scale, which is why scipy was asking for more data. I hope this helps someone like me.

python lmfit "object too deep for desired array"

I am trying out lmfit, using the example problem below. In this example, I am simply solving for x in a system Ax = y. Here A is a 3x2 array and y is a 3x1 array. I have declared all of them as arrays.
import numpy as np
from lmfit import minimize, Parameters
A = np.array([1,2,-1,3,-2,5])
A = A.reshape(3,2)
y = np.array([12, 13, 21])
def residual(params, A, y, eps_y=1):
    x = params['x'].value
    y_hat = np.dot(A, x)
    return (y - y_hat)/eps_y
x = np.array([0,0])
params = Parameters()
params.add('x', x)
out = minimize(residual, params, args=(A,y))
print out.value
When running this I get an error: "ValueError: object too deep for desired array".
I have found instances of similar problems by researching here and on the web. The reason most often cited is that A, x and y should be arrays and not matrices. Also, in some solutions x and y are required to be kept as vectors with shape (len(v),). The code above already complies with these suggestions, but I am still getting "ValueError: object too deep for desired array".
I have wasted quite a bit of time trying to solve this problem and am stumped now. Any help on this will be very welcome.
The documentation for Parameter is here:
http://newville.github.io/lmfit-py/parameters.html#Parameter
It specifically states that the value of a parameter must be a numerical value, and not an array of any kind. So instead of doing:
x = np.array([0,0])
params.add('x', x)
do:
params.add('x0', 0)
params.add('x1', 0)
and then change the residuals function to:
def residual(params, A, y, eps_y=1):
    x0 = params['x0'].value
    x1 = params['x1'].value
    y_hat = np.dot(A, [x0, x1])
    return (y - y_hat)/eps_y
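Putting the pieces of this answer together, here is a minimal end-to-end sketch (my assembly, not the answer's verbatim code; the printed attribute assumes a recent lmfit where minimize returns a MinimizerResult whose .params hold the fitted values):

import numpy as np
from lmfit import minimize, Parameters

A = np.array([1, 2, -1, 3, -2, 5]).reshape(3, 2)
y = np.array([12, 13, 21])

def residual(params, A, y, eps_y=1):
    x0 = params['x0'].value
    x1 = params['x1'].value
    y_hat = np.dot(A, [x0, x1])
    return (y - y_hat) / eps_y

params = Parameters()
params.add('x0', value=0)  # one scalar Parameter per unknown
params.add('x1', value=0)

out = minimize(residual, params, args=(A, y))
print(out.params['x0'].value, out.params['x1'].value)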
