I tried using the minimize function from the scipy package as in the code below.
When I use the jac option with approx_fprime, the iteration count is 0 and the optimization doesn't work.
But when I pass jac=rosen_der, it works!
import numpy as np
from scipy.optimize import minimize, approx_fprime

def rosen(x):
    """The Rosenbrock function"""
    return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)

def rosen_der(x):
    # derivative of the Rosenbrock function
    xm = x[1:-1]
    xm_m1 = x[:-2]
    xm_p1 = x[2:]
    der = np.zeros_like(x)
    der[1:-1] = 200*(xm-xm_m1**2) - 400*(xm_p1 - xm**2)*xm - 2*(1-xm)
    der[0] = -400*x[0]*(x[1]-x[0]**2) - 2*(1-x[0])
    der[-1] = 200*(x[-1]-x[-2]**2)
    return der

x0 = np.array([1.3, 0.7])
eps = np.sqrt(np.finfo(float).eps)
fprime = lambda x: np.array(approx_fprime(x0, rosen, eps))
res = minimize(rosen, x0, method='CG', jac=fprime, options={'maxiter': 10, 'disp': True})
print(res.x)
[ 515.40001106 -197.99999905]
[ 515.4 -198. ]
98.10000000000005
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 98.100000
Iterations: 0
Function evaluations: 33
Gradient evaluations: 21
[1.3 0.7]
I checked that approx_fprime returns an ndarray, just like rosen_der, and the values are the same too.
Why doesn't the optimization work?
Your function fprime is a function of x but approximates the derivative at x0. Consequently, you're evaluating the gradient at the initial guess x0 in every iteration. You should evaluate/approximate the derivative at x instead:
fprime = lambda x : approx_fprime(x, rosen, eps)
Note that approx_fprime already returns an np.ndarray, so there's no need for the extra np.array call.
It's also worth mentioning that you don't need to pass approximated derivatives at all: if you don't pass any derivatives (i.e. jac=None), minimize approximates them by finite differences by default. However, minimize uses approx_derivative under the hood instead of approx_fprime, as it provides support for evaluating derivatives at variable bounds.
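For completeness, a minimal sketch of the corrected call (reusing rosen, x0 and eps from above), plus the jac=None variant that lets minimize approximate the derivatives itself:

fprime = lambda x: approx_fprime(x, rosen, eps)   # evaluate the gradient at x, not x0
res = minimize(rosen, x0, method='CG', jac=fprime, options={'maxiter': 10, 'disp': True})

# or simply rely on minimize's default finite-difference approximation
res2 = minimize(rosen, x0, method='CG', jac=None, options={'maxiter': 10, 'disp': True})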
I need to optimize a function f with respect to a vector x; f takes a constant matrix m as input and returns a scalar v >= 0.
MWE with random numbers:
import numpy as np
from scipy.optimize import minimize

np.random.seed(1)
m = np.array([[1,0,0.15],[2,0,0.15],[1.5,0.2,0.2],[3,0.5,0.1],[2.2,0.1,0.15]])
x0 = np.random.rand(5)*2

def f(x, m):
    pg = -np.concatenate((-arr[:, :2], x.reshape(-1, 1)), axis=1).sum(axis=1)
    return sum(arr[:, 2] * pg)

res = minimize(
    f, x0,
    method='nelder-mead', args=(m,),
    options={'xatol': 1e-8, 'maxiter': 1e+4, 'disp': True}
)
How do I set up a constraint on the output value? As far as I can tell from the docs, I can only set constraints on the inputs. I read this post suggesting minimize_scalar, but that can only be used when the input is a scalar as well.
Simply add the constraint f(x,m) >= 0:
import numpy as np
from scipy.optimize import minimize

np.random.seed(1)
m = np.array([[1,0,0.15],[2,0,0.15],[1.5,0.2,0.2],[3,0.5,0.1],[2.2,0.1,0.15]])
x0 = np.random.rand(5)*2

def f(x, m):
    pg = -np.concatenate((-arr[:, :2], x.reshape(-1, 1)), axis=1).sum(axis=1)
    return sum(arr[:, 2] * pg)

# add the constraint f(x, m) >= 0
con = [{'type': 'ineq', 'fun': lambda x: f(x, m)}]

res = minimize(
    f, x0,
    constraints=con,
    method='nelder-mead', args=(m,),
    options={'xatol': 1e-8, 'maxiter': 1e+4, 'disp': True}
)
Alternatively, you can enforce a positive objective function value by minimizing some vector norm of your objective, e.g. f(x,m)**2. You wouldn't need a constraint then.
PS: The second argument of your function should probably be arr instead of m.
PPS: Since both your objective function and the constraint are continuously differentiable, a gradient-based algorithm will very likely perform much better than Nelder-Mead, even if the gradient is approximated by finite differences.
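To illustrate the PPS, here is a minimal sketch of the same problem with the gradient-based SLSQP method, assuming the arr/m mix-up noted in the PS has been fixed inside f:

import numpy as np
from scipy.optimize import minimize

np.random.seed(1)
m = np.array([[1,0,0.15],[2,0,0.15],[1.5,0.2,0.2],[3,0.5,0.1],[2.2,0.1,0.15]])
x0 = np.random.rand(5)*2

def f(x, m):
    # same objective as above, with the second argument actually used
    pg = -np.concatenate((-m[:, :2], x.reshape(-1, 1)), axis=1).sum(axis=1)
    return sum(m[:, 2] * pg)

con = [{'type': 'ineq', 'fun': lambda x: f(x, m)}]
res = minimize(f, x0, args=(m,), method='SLSQP', constraints=con, options={'disp': True})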
I am trying to solve these simple simultaneous equations using scipy's fsolve function:
x + 2 = 10 &
x^2 = 64.
I am expecting 8 as the solution. However, I'm getting an error saying "minpack.error: Result from function call is not a proper array of floats."
I am pretty new to Python's scientific libraries. Can someone please explain how to resolve this error? Thanks!
from scipy.optimize import fsolve

def equations(p):
    x = p
    return (x - 8, x**2 - 64)

x = fsolve(equations, 1)
print(x)
When you look at how fsolve is defined in the scipy module, we see:

def fsolve(func, x0, args=(), fprime=None, full_output=0,
           col_deriv=0, xtol=1.49012e-8, maxfev=0, band=None,
           epsfcn=None, factor=100, diag=None):
    """
    Find the roots of a function.

    Return the roots of the (non-linear) equations defined by
    ``func(x) = 0`` given a starting estimate.

    Parameters
    ----------
    func : callable ``f(x, *args)``
        A function that takes at least one (possibly vector) argument,
        and returns a value of the same length.
    """
So your input value for p should consist of just as many elements as are returned by your function. Try for example:
from scipy.optimize import fsolve
import numpy as np

def equations(p):
    x1 = p[0]
    x2 = p[1]
    return x1 - 8, x2**2 - 64

x = fsolve(equations, np.array([1, 2]))
print(x)
which gives 8, 8 as an answer.
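As an aside, if the intent really is a single unknown x satisfying both equations, fsolve cannot handle more equations than unknowns, but scipy.optimize.least_squares can; a minimal sketch:

from scipy.optimize import least_squares

def residuals(p):
    x = p[0]
    return [x + 2 - 10, x**2 - 64]   # both equations written as residuals

sol = least_squares(residuals, x0=[1.0])
print(sol.x)   # converges to [8.]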
I am trying to use a SciPy global optimizer to find the global minimum of a function, but instead of returning the global minimum, it gets stuck in a local minimum.
The code:

import numpy as np
from scipy import optimize

def f(x):
    return x**2 + 10*np.sin(x)

x = np.arange(-10, 10, 0.1)
print(optimize.basinhopping(f, 3))

Can anyone tell me why? And which method in SciPy do you think is best for global optimization?
Only SHGO will provide you with guarantees. It also had better performance in my benchmarking exercise. Basin-hopping can spend too much time around one possible local minimum, whereas the homology-based technique (i.e. shgo, which you can also pip install and use separately from scipy if you want) avoids this in a cunning fashion.
I should add that if you really want to be careful, you should change the default SHGO sampling (it will use Sobol by default, which technically breaks the guarantee).
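For instance, a minimal sketch of calling shgo on the function from the question (the bounds are an illustrative choice covering the region plotted in the question, and the sampling method is set explicitly per the caveat above):

import numpy as np
from scipy.optimize import shgo

def f(x):
    # shgo passes x as a length-1 array here
    return x[0]**2 + 10*np.sin(x[0])

# explicitly pick a non-Sobol sampling scheme
res = shgo(f, bounds=[(-10, 10)], sampling_method='simplicial')
print(res.x, res.fun)   # expect roughly x = -1.3064, f = -7.9458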
Plain scipy.optimize.minimize finds the global minimum for your problem. Run the following code:
import math
import numpy as np
from scipy.optimize import minimize

def objective(x):
    # min f(x)
    return x[0]**2 + 10*math.sin(x[0])

if __name__ == "__main__":
    # initial guesses
    n = 1
    x0 = np.zeros(n)
    x0[0] = 90.0

    # show initial objective
    print('\nInitial Objective: ' + str(objective(x0)))

    # state bounds
    b1 = (-100.0, 100.0)
    bds = (b1,)
    print("\n")
    solution = minimize(objective, x0, method='SLSQP', jac=None, bounds=bds,
                        tol=1e-20, constraints=(),
                        options={'maxiter': 1000000, 'ftol': 1e-20, 'disp': True})

    # z is a numpy.ndarray vector
    z = solution.x

    # show final objective
    print('\nFinal Objective')
    print('f* = ' + str(objective(z)))

    # print solution
    print('\nSolution')
    print('x1* = ' + str(z[0]))
which outputs:
Initial Objective: 10005.063656411097
Optimization terminated successfully. (Exit mode 0)
Current function value: -7.9458233756152845
Iterations: 7
Function evaluations: 34
Gradient evaluations: 7
Final Objective
f* = -7.9458233756152845
Solution
x1* = -1.3064400096083357
I tried to use scipy.optimize.minimize to estimate the parameters of a logistic regression. Before this, I wrote the log-likelihood function and the gradient of the log-likelihood function. I then used the Nelder-Mead and BFGS algorithms, respectively. It turned out the latter failed but the former succeeded. Because BFGS also uses the gradient, I suspect the gradient part (the function log_likelihood_gradient(x, y)) has a bug. But it could also be a numerical issue, since I got the warning Desired error not necessarily achieved due to precision loss.
I'm using UCLA's tutorial and dataset (see link), and the correct estimates are:
FYI, I also highly suggest reading this post to understand the problem setup and the preprocessing of the dataset.
Here is my code. The first part is data preparation; you can simply run it to get yourself a copy of the data:
import numpy as np
import pandas as pd
from scipy.optimize import minimize
from patsy import dmatrices
from urllib.request import urlretrieve

url = 'http://www.ats.ucla.edu/stat/data/binary.csv'
urlretrieve(url, './admit_rate.csv')
data = pd.read_csv('./admit_rate.csv')

y, x = dmatrices('admit ~ gre + gpa + C(rank)', data, return_type='dataframe')
y = np.array(y, dtype=np.float64)
x = np.array(x, dtype=np.float64)
The second part is the MLE part: log_likelihood defines the sum of the log-likelihood functions, and log_likelihood_gradient defines the gradient of that sum. Both come from the formula below:
def sigmoid(x):
    """
    Logistic function
    """
    return 1.0 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    """
    Logistic function's first derivative
    """
    return sigmoid(x) * (1 - sigmoid(x))

def log_likelihood(x, y):
    """
    The sum of the log likelihood functions
    """
    n1, n2 = x.shape

    def log_likelihood_p(p):
        """
        p: betas/parameters in logistic regression, to be estimated
        """
        p = p.reshape(n2, 1)
        sig = sigmoid(np.dot(x, p)).reshape(n1, 1)
        return np.sum(y * np.log(sig) + (1 - y) * np.log(1 - sig))

    return log_likelihood_p

def log_likelihood_gradient(x, y):
    """
    The gradient of the sum of log likelihood functions, used in optimization
    """
    n1, n2 = x.shape

    def log_likelihood_gradient_p(p):
        """
        p: betas/parameters in logistic regression, to be estimated
        """
        p = p.reshape(n2, 1)
        xp = np.dot(x, p)
        sig = sigmoid(xp)
        sig_der = sigmoid_derivative(xp)
        return np.sum(y * sig_der / sig * x - (1 - y) * sig_der / (1 - sig) * x, axis=0)

    return log_likelihood_gradient_p

def negate(f):
    return lambda *args, **kwargs: -f(*args, **kwargs)
The third part is the optimization. I started with the initial value
x0 = np.array([-1, -0.5, -1, -1.5, 0, 1])
For Nelder-Mead, it works and outputs
array([-3.99005691e+00, -6.75402757e-01, -1.34021420e+00, -1.55145450e+00, 2.26447746e-03, 8.04046227e-01]).
But then when I tried the BFGS algorithm, it failed at
x0 = np.array([-1, -0.5, -1, -1.5, 0, 1]).
Even when I gave an even closer initial value, like the output from the Nelder-Mead run, it failed with the warning Desired error not necessarily achieved due to precision loss.
Nelder-Mead

x0 = np.array([-1, -0.5, -1, -1.5, 0, 1])
estimator1 = minimize(negate(log_likelihood(x, y)), x0,
                      method='nelder-mead', options={'disp': True})
print(estimator1.success)
estimator1.x
BFGS

x0 = np.array([-3.99005691e+00, -6.75402757e-01, -1.34021420e+00,
               -1.55145450e+00, 2.26447746e-03, 8.04046227e-01])
estimator2 = minimize(negate(log_likelihood(x, y)), x0, method='BFGS',
                      jac=log_likelihood_gradient(x, y), options={'disp': True})
print(estimator2.success)
I know my post is a bit long, but I'd be glad to answer any questions you have. Can anyone help?
Consider the following (convex) optimization problem:
minimize 0.5 * y.T * y
s.t. A*x - b == y
where the optimization (vector) variables are x and y and A, b are a matrix and vector, respectively, of appropriate dimensions.
The code below finds a solution easily using the SLSQP method from Scipy:
import numpy as np
from scipy.optimize import minimize

# problem dimensions:
n = 10   # arbitrary integer set by user
m = 2 * n

# generate parameters A, b:
np.random.seed(123)   # for reproducibility of results
A = np.random.randn(m, n)
b = np.random.randn(m)

# objective function:
def obj(z):
    vy = z[n:]
    return 0.5 * vy.dot(vy)

# constraint function:
def cons(z):
    vx = z[:n]
    vy = z[n:]
    return A.dot(vx) - b - vy

# constraints input for SLSQP:
cons = ({'type': 'eq', 'fun': cons})

# generate a random initial estimate:
z0 = np.random.randn(n + m)

sol = minimize(obj, x0=z0, constraints=cons, method='SLSQP', options={'disp': True})
Optimization terminated successfully. (Exit mode 0)
Current function value: 2.12236220865
Iterations: 6
Function evaluations: 192
Gradient evaluations: 6
Note that the constraint function is a convenient 'array-output' function.
Now, instead of an array-output function for the constraint, one could in principle use an equivalent set of 'scalar-output' constraint functions (in fact, the scipy.optimize documentation discusses only this type of constraint function as input to minimize).
Here is the equivalent constraint set followed by the output of minimize (same A, b, and initial value as the above listing):
# this is the i-th element of cons(z):
def cons_i(z, i):
    vx = z[:n]
    vy = z[n:]
    return A[i].dot(vx) - b[i] - vy[i]

# list of scalar-output constraints input for SLSQP:
cons_per_i = [{'type': 'eq', 'fun': lambda z: cons_i(z, i)} for i in np.arange(m)]

sol2 = minimize(obj, x0=z0, constraints=cons_per_i, method='SLSQP', options={'disp': True})
Singular matrix C in LSQ subproblem (Exit mode 6)
Current function value: 6.87999270692
Iterations: 1
Function evaluations: 32
Gradient evaluations: 1
Evidently, the algorithm fails (the returned objective value is actually the objective value for the given initialization), which I find a bit weird. Note that running [cons_per_i[i]['fun'](sol.x) for i in np.arange(m)] shows that sol.x, obtained using the array-output constraint formulation, satisfies all the scalar-output constraints in cons_per_i as expected (within numerical tolerance).
I would appreciate it if anyone has an explanation for this issue.
You've run into the "late binding closures" gotcha. All the calls to cons_i are being made with the second argument equal to 19.
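To see the gotcha in isolation, here is a minimal, self-contained sketch (independent of the optimization problem) of how all the lambdas end up sharing the final value of i:

# late-binding closures in a nutshell: i is looked up when the lambda is
# *called*, not when it is defined
funcs = [lambda: i for i in range(3)]
print([f() for f in funcs])         # [2, 2, 2] -- all see the final i

# binding i as a default argument captures its value at definition time
funcs_fixed = [lambda i=i: i for i in range(3)]
print([f() for f in funcs_fixed])   # [0, 1, 2]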
A fix is to use the 'args' key of the dictionary that defines each constraint instead of the lambda closures:
cons_per_i = [{'type':'eq', 'fun': cons_i, 'args': (i,)} for i in np.arange(m)]
With this, the minimization works:
In [417]: sol2 = minimize(obj, x0 = z0, constraints = cons_per_i, method = 'SLSQP', options={'disp': True})
Optimization terminated successfully. (Exit mode 0)
Current function value: 2.1223622086
Iterations: 6
Function evaluations: 192
Gradient evaluations: 6
You could also use the suggestion made in the linked article, which is to use a lambda expression with a second argument that has the desired default value:

cons_per_i = [{'type': 'eq', 'fun': lambda z, i=i: cons_i(z, i)} for i in np.arange(m)]