I am having trouble using scipy.optimize.minimize() to train a logistic neuron.
My cost and gradient functions have been tested successfully.
minimize() raises "IndexError: too many indices for array".
I am using method='CG', but the error is the same with other methods.
res = minimize(loCostEntro, W, args=(XX,Y,lmbda), method='CG', jac=loGradEntro, options={'maxiter': 500})
W (weights), XX (training set) and Y (results) are all 2D NumPy arrays.
Please find below the code of the gradient and the cost functions:
def loOutput(X, W):
    Z = np.dot(X, W)
    O = misc.sigmoid(Z)
    return O

def loCostEntro(W, X, Y, lmbda=0):
    m = len(X)
    O = loOutput(X, W)
    cost = -1 * (1 / m) * (np.log(O).T.dot(Y) + np.log(1 - O).T.dot(1 - Y)) \
        + (lmbda / (2 * m)) * np.sum(np.square(W[1:]))
    return cost[0,0]

def loGradEntro(W, X, Y, lmbda=0):
    m = len(X)
    O = loOutput(X, W)
    GRAD = (1 / m) * np.dot(X.T, (O - Y)) + (lmbda / m) * np.r_[[[0]], W[1:].reshape(-1, 1)]
    return GRAD
Thanks to this working example, I figured out what was wrong. scipy.optimize.minimize() passes a 1D weights array (W) to my gradient and cost functions, whereas my functions only supported 2D arrays.
So reshaping W in the dot product as below fixed the issue:
def loOutput(X, W):
    Z = np.dot(X, W.reshape(-1, 1))  # reshape(-1, 1) because minimize() passes a 1-D W
    O = misc.sigmoid(Z)
    return O
By the way, I ran into a similar problem after fixing this one: the gradient function must return a 1D gradient. So I added:
def loGradEntroFlatten(W, X, Y, lmbda=0):
    return loGradEntro(W, X, Y, lmbda).flatten()
and I updated the call:
res = minimize(loCostEntro, W, args=(XX,Y,lmbda), method='CG', jac=loGradEntroFlatten, options={'maxiter': 500})
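The shape requirement described above can be sketched in a self-contained way. This is only an illustration under assumptions: the data is synthetic, and the names `sigmoid`, `cost` and `grad` are hypothetical stand-ins for the functions in the question. The point it shows is that scipy.optimize.minimize() works with a 1-D parameter vector of shape (n,), and that the `jac` callable should return an array of that same shape.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, X, y):
    # w is a 1-D array of shape (n,); X is (m, n); y is (m,)
    p = sigmoid(X @ w)
    eps = 1e-12  # guards against log(0)
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def grad(w, X, y):
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y)  # 1-D, shape (n,), same as w

# Synthetic data (an assumption, not the data from the question)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = (X[:, 1] + 0.5 * rng.normal(size=50) > 0).astype(float)

W0 = np.zeros((3, 1))  # 2-D column, as in the question
w0 = W0.ravel()        # hand the optimizer a 1-D vector instead
res = minimize(cost, w0, args=(X, y), method='CG', jac=grad,
               options={'maxiter': 500})
print(res.x.shape)
```

Keeping the targets 1-D as well means no reshaping is needed inside the cost and gradient functions.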
Related
I just started an ML course and I'm trying to run gradient descent in Python. The functions below work fine, but when I move on to the actual learning step, I can't get the expected output or learn the right parameters, as you can tell from the decision boundary I plotted afterwards. I'm trying to figure out why.
plotting the decision boundary
import numpy as np

def sigmoid(z):
    sigma = 1/(1+np.exp(-z))
    return sigma

def compute_cost(X, y, w, b):
    y_hat = sigmoid((X * np.expand_dims(w, axis=0)).sum(axis=1) + b)
    total_cost = (-y * np.log(y_hat) - (1-y) * np.log(1-y_hat)).mean()
    return total_cost

def compute_gradient(X, y, w, b):
    z = w * X + b
    yhat = sigmoid(z)
    y1 = np.expand_dims(y, axis=1)
    error = yhat - y1
    db = error.mean()
    dw_j1 = (X * error)
    dw_j = np.mean(dw_j1,axis=0)
    return dw_j, db
Before building the gradient descent function, I tested all of the above with my training data, and they all work and output the correct numbers. I would really appreciate it if you could spot my mistakes.
Learning parameters with gradient descent
import math

def gradient_descent(X, y, w, b, alpha, num_iters):
    m = len(X)
    J_history = []
    wb_history = []
    for i in range(num_iters):
        cost = compute_cost(X, y, w, b)
        dw_j, db = compute_gradient(X, y, w, b)
        w = w - alpha * dw_j
        b = b - alpha * db
        wb_history.append((w,b))
        J_history.append(cost)
        if i % math.ceil(num_iters/10) == 0 or i == (num_iters-1):
            print(f"Iteration {i:4}: Cost {float(J_history[-1]):8.2f}")
    return w, b, J_history, wb_history

np.random.seed(1)
initial_w = 0.01 * (np.random.rand(2) - 0.5)
initial_b = -8
iterations = 10000
alpha = 0.001
w, b, J_history, _ = gradient_descent(X_train ,y_train, initial_w, initial_b, alpha, iterations)
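One thing that may be worth checking (a sketch, not a confirmed diagnosis) is that compute_cost sums over the features before applying the sigmoid, while compute_gradient uses `w * X + b`, which keeps one value per sample and per feature. The snippet below uses made-up data to compare the two shapes. It also shows a vectorized gradient, hypothetically named `compute_gradient_vectorized`, that uses the same linear term as the cost.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data: 5 samples, 2 features
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
w = np.array([0.1, -0.2])
b = 0.5

z_elementwise = w * X + b  # shape (5, 2): one value per sample and feature
z_linear = X @ w + b       # shape (5,): one value per sample, as in compute_cost

def compute_gradient_vectorized(X, y, w, b):
    error = sigmoid(X @ w + b) - y  # shape (m,)
    dw = X.T @ error / len(y)       # shape (n,)
    db = error.mean()
    return dw, db

print(z_elementwise.shape, z_linear.shape)
print(compute_gradient_vectorized(X, y, w, b))
```

A finite-difference comparison against the cost function is a quick way to confirm whether a gradient implementation matches the cost it is supposed to differentiate.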
import numpy as np
import scipy.optimize as opt

def linear_regression(theta, X, y, lamb):
    # X(12,1+1) theta(2,1) y(12,1)
    m = X.shape[0]
    ones = np.ones([m, 1])
    X = np.hstack([ones, X])
    h = X.dot(theta)
    # cost function
    J = 1 / 2 / m * np.sum(np.power(h - y, 2)) + lamb / 2 / m * np.sum(np.power(theta[1:], 2))
    # gradient X(12,2) X.T(2,12) (h-y)(12,1) sum_error(2,1)
    sum_error = 1 / m * X.T.dot(h - y)
    temp = theta
    temp[0] = 0
    gradient = sum_error + lamb / m * temp
    return J, gradient

def f(theta, X, y, lamb):
    J, gradient = linear_regression(theta, X, y, lamb)
    return J

def fprime(theta, X, y, lamb):
    J, gradient = linear_regression(theta, X, y, lamb)
    return gradient

J, gradient = linear_regression(theta,X,y,1)
# theta needs to be a vector, not a matrix
result = opt.fmin_cg(f, theta, fprime=fprime, args=(X,y,1))
print(result[0])
Explanation:
opt.fmin_cg(f, theta, fprime=fprime, args=(X,y,1)) needs two callables, f and fprime.
f and fprime each return one of the two values returned by linear_regression(theta, X, y, lamb).
It is convenient to compute the cost and the gradient in the same function.
Question:
Is there an easy way to extract two callables from linear_regression(theta, X, y, lamb)?
Calling J, gradient = linear_regression(theta,X,y,1) and passing the results to opt.fmin_cg(J, theta, fprime=gradient, args=(X,y,1)) does not work.
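One possible approach, sketched here under assumptions, is to use scipy.optimize.minimize() with `jac=True`, which tells the optimizer that the objective returns the pair (cost, gradient). The function name `cost_and_grad` and the synthetic data are hypothetical. The sketch also copies theta before zeroing its first entry, since `temp = theta` followed by `temp[0] = 0` would modify theta itself.

```python
import numpy as np
from scipy.optimize import minimize

def cost_and_grad(theta, X, y, lamb):
    # theta: (n + 1,), X: (m, n), y: (m,)
    m = X.shape[0]
    Xb = np.hstack([np.ones((m, 1)), X])  # prepend the bias column
    residual = Xb @ theta - y
    J = residual @ residual / (2 * m) + lamb / (2 * m) * np.sum(theta[1:] ** 2)
    reg = theta.copy()  # copy so the optimizer's theta is left untouched
    reg[0] = 0.0        # the bias term is not regularized
    grad = Xb.T @ residual / m + lamb / m * reg
    return J, grad

# Synthetic data on the line y = 2 + 3x (an assumption for illustration)
X = np.linspace(0, 1, 12).reshape(-1, 1)
y = 2.0 + 3.0 * X[:, 0]
theta0 = np.zeros(2)

# jac=True: the objective returns (cost, gradient) in a single call
res = minimize(cost_and_grad, theta0, args=(X, y, 0.0), jac=True, method='CG')
print(res.x)
```

With this form the cost and gradient are computed together once per evaluation, so the two wrapper functions f and fprime are no longer needed.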
I have a problem where I have to create a dataset.
Afterwards, I have to use Theano to get the w_0 and w_1 parameters of the following model:
y = log(1 + w_0 * |x|) + (w_1 * |x|)
I have created the dataset and computed the w_0 and w_1 values with NumPy using the code below. I have studied this thoroughly but still don't know how to compute w_0 and w_1 with Theano. How can I do that?
It would be a great help, thank you :)
The code I am using:
import numpy as np
import math
import theano as t

# code to generate the dataset
trX = np.linspace(-1, 1, 101)
trY = np.linspace(-1, 1, 101)
for i in range(len(trY)):
    trY[i] = math.log(1 + 0.5 * abs(trX[i])) + trX[i] / 3 + np.random.randn() * 0.033

# code that produces w0 and w1; I want to compute these with Theano
X = np.column_stack((np.ones(101, dtype=trX.dtype), trX))
print(X.shape)
Xplus = np.linalg.pinv(X)  # pseudo-inverse of X
w_opt = Xplus @ trY  # the @ symbol denotes matrix multiplication
print(w_opt)
x = abs(trX)  # abs returns the absolute value of each element
y = trY
for i in range(len(trX)):
    y[i] = math.log(1 + w_opt[0] * x[i]) + (w_opt[1] * x[i])
Good morning Hina Malik,
With the gradient descent algorithm and the right model, this problem can be solved. You should also create two shared variables (w and c), one for each parameter.
import numpy as np
import theano
import theano.tensor as T

X = T.scalar()
Y = T.scalar()

def model(X, w, c):
    return X * w + c

w = theano.shared(np.asarray(0., dtype = theano.config.floatX))
c = theano.shared(np.asarray(0., dtype = theano.config.floatX))
y = model(X, w, c)
learning_rate=0.01
cost = T.mean(T.sqr(y - Y))
gradient_w = T.grad(cost = cost, wrt = w)
gradient_c = T.grad(cost = cost, wrt = c)
updates = [[w, w - gradient_w * learning_rate], [c, c - gradient_c * learning_rate]]
train = theano.function(inputs = [X, Y], outputs = cost, updates = updates)
coste=[]  # stores the cost values so they can be plotted later
for i in range(101):
    for x, y in zip(trX, trY):
        cost_i = train(x, y)
    coste.append(cost_i)
w0=float(w.get_value())
w1=float(c.get_value())
print(w0,w1)
I also replied to the same, or a very similar, question on the Spanish version of Stack Overflow here: go to solution
I hope this helps.
Best regards
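The reply above fits a linear model X * w + c. For the model stated in the question, y = log(1 + w_0 * |x|) + w_1 * |x|, the same update rule applies with different partial derivatives. The sketch below uses plain NumPy instead of Theano (a deliberate substitution, so the gradients are written out by hand rather than obtained from T.grad). The synthetic targets, the learning rate, the iteration count and the names `predict` and `loss` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.abs(np.linspace(-1, 1, 101))
# Synthetic targets generated from the model itself; the values 0.5 and 1/3
# and the noise level are assumptions for this illustration
y = np.log(1 + 0.5 * x) + x / 3 + rng.normal(scale=0.033, size=x.shape)

def predict(w0, w1, x):
    return np.log(1 + w0 * x) + w1 * x

def loss(w0, w1):
    return np.mean((predict(w0, w1, x) - y) ** 2)

w0, w1 = 0.0, 0.0
learning_rate = 0.1
initial_loss = loss(w0, w1)
for _ in range(5000):
    err = predict(w0, w1, x) - y
    # Partial derivatives of the mean squared error
    grad_w0 = 2 * np.mean(err * x / (1 + w0 * x))
    grad_w1 = 2 * np.mean(err * x)
    w0 -= learning_rate * grad_w0
    w1 -= learning_rate * grad_w1
final_loss = loss(w0, w1)
print(w0, w1, initial_loss, final_loss)
```

On [0, 1] the two terms of this model behave similarly, so the individual values of w_0 and w_1 may converge slowly even when the loss is already small.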
I wrote some code that performs gradient descent on a couple of data points.
For some reason the curve is not converging correctly, but I have no idea why that is. I always end up with an exploding tail.
Am I doing one of the computations wrong? Am I actually getting stuck in a local minimum or is it something else?
Here is my code:
import numpy as np
import matplotlib.pyplot as plt

def estimate(weights, x, order):
    est = 0
    for i in range(order):
        est += weights[i] * x ** i
    return est

def cost_function(x, y, weights, m):
    cost = 0
    for i in range(m-1):
        cost += (((weights[i] * x ** i) - y) ** 2)
    return (np.sum(cost ** 2) / ( 2 * m ))

def descent(A, b, iterations, descent_rate, order):
    x = A.T[0]
    y = b.reshape(4)
    # features
    ones = np.vstack(np.ones(len(A)))
    x = np.vstack(A.T[0])
    x2 = np.vstack(A.T[0] ** 2)
    # Our feature matrix
    features = np.concatenate((ones,x,x2), axis = 1).T
    # Initialize our coefficients to zero
    weights = np.zeros(order + 1)
    m = len(y)
    # gradient descent
    for i in range(iterations):
        est = estimate(weights, x, order).T
        difference = est - y
        weights = weights + (-descent_rate * (1/m) * np.matmul(difference, features.T)[0])
        cost = cost_function(x, y, weights, m)
        print(cost)
    plt.scatter(x,y)
    u = np.linspace(0,3,100)
    plt.plot(u, (u ** 2) * weights[2] + u * weights[1] + weights[0], '-')
    plt.show()

A = np.array(((0,1),
              (1,1),
              (2,1),
              (3,1)))
b = np.array((1,2,0,3), ndmin = 2 ).T
iterations = 150
descent_rate = 0.01
order = 2
descent(A, b, iterations, descent_rate, order)
I would like to avoid getting stuck in such a minimum. I have tried setting the initial weights to random values, but to no avail: sometimes the curve dips a bit more, but then it shows the same behaviour again.
Here is one of the plots that I am getting:
And here is the expected result obtained by a least squares solution:
Your estimate function should be
def estimate(weights, x, order):
    est = 0
    for i in range(order+1):
        est += weights[i] * x ** i
    return est
Better yet, since the order information is already present in the size of the weights vector, remove the redundancy with:
def estimate(weights, x):
    est = 0
    for i in range(len(weights)):
        est += weights[i] * x ** i
    return est
This is what I got when using your code and running 2000 iterations:
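As a further sketch (not part of the fix above), the same polynomial can be evaluated without an explicit loop by building the powers of x with np.vander. The names `estimate_loop` and `estimate_vectorized` and the sample weights are hypothetical, and the snippet only checks that both forms agree.

```python
import numpy as np

def estimate_loop(weights, x):
    est = 0
    for i in range(len(weights)):
        est += weights[i] * x ** i
    return est

def estimate_vectorized(weights, x):
    # Columns are x**0, x**1, ..., x**(len(weights) - 1)
    features = np.vander(x, N=len(weights), increasing=True)
    return features @ weights

x = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([1.0, -2.0, 0.5])  # made-up coefficients
print(estimate_loop(weights, x))
print(estimate_vectorized(weights, x))
```

Because the number of columns follows len(weights), the polynomial order cannot get out of sync with the weight vector.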
I am trying to port the gradient descent algorithm for linear regression from Andrew Ng's Machine Learning course to Python, but for some reason my implementation is not working correctly.
Here's my implementation in Octave, which works correctly:
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
    m = length(y); % number of training examples
    J_history = zeros(num_iters, 1);
    for iter = 1:num_iters
        prediction = X*theta;
        margin_error = prediction - y;
        gradient = 1/m * (alpha * (X' * margin_error));
        theta = theta - gradient;
        J_history(iter) = computeCost(X, y, theta);
    end
end
However, when I translate this to Python, it does not give me accurate results. The cost seems to go up rather than down.
Here's my implementation in Python:
def gradientDescent(x, y, theta, alpha, iters):
    m = len(y)
    J_history = np.matrix(np.zeros((iters,1)))
    for i in range(iters):
        prediction = x*theta.T
        margin_error = prediction - y
        gradient = 1/m * (alpha * (x.T * margin_error))
        theta = theta - gradient
        J_history[i] = computeCost(x,y,theta)
    return theta,J_history
My code runs without raising any errors. Please note that this is theta:
theta = np.matrix(np.array([0,0]))
alpha and iters are set to this:
alpha = 0.01
iters = 1000
When I run it, opt_theta, cost = gradientDescent(x, y, theta, alpha, iters) and print out opt_theta, I get this:
matrix([[ 2.36890383e+16, -1.40798902e+16],
[ 2.47503758e+17, -2.36890383e+16]])
when I should get this:
matrix([[-3.24140214, 1.1272942 ]])
What am I doing wrong?
Edit:
Cost function
def computeCost(x, y, theta):
    # Get length of data set
    m = len(y)
    # We get theta transpose because we are working with a numpy array [0,0] for example
    prediction = x * theta.T
    J = 1/(2*m) * np.sum(np.power((prediction - y), 2))
    return J
Look at this:
>>> A = np.matrix([3,3,3])
>>> B = np.matrix([[1,1,1], [2,2,2]])
>>> A-B
matrix([[2, 2, 2],
[1, 1, 1]])
The matrices are broadcast together.
"it's because np.matrix inherits from np.array. np.matrix overrides multiplication, but not addition and subtraction"
In your situation, theta (1x2) minus gradient (2x1) gives a 2x2 result. Try transposing the gradient before subtracting:
theta = theta - gradient.T
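The shape change can be reproduced in isolation. This small sketch uses plain NumPy arrays and made-up gradient values, since subtraction broadcasts the same way for np.matrix as described above.

```python
import numpy as np

theta = np.zeros((1, 2))             # shape (1, 2), like theta in the question
gradient = np.array([[0.1], [0.2]])  # shape (2, 1), made-up values

wrong = theta - gradient    # (1, 2) - (2, 1) broadcasts to (2, 2)
right = theta - gradient.T  # (1, 2) - (1, 2) stays (1, 2)

print(wrong.shape)  # (2, 2)
print(right.shape)  # (1, 2)
```

Printing the shape of theta after the first iteration is a quick way to catch this kind of silent broadcasting.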