I have a reasonably simple constrained optimization problem but get different answers depending on how I do it. Let's get the import and a pretty print function out of the way first:
import numpy as np
from scipy.optimize import minimize, LinearConstraint, NonlinearConstraint, SR1
def print_res( res, label ):
    print("\n\n ***** ", label, " ***** \n")
    print(res.message)
    print("obj func value at solution", obj_func(res.x))
    print("starting values: ", x0)
    print("ending values: ", res.x.astype(int) )
    print("% diff", (100.*(res.x-x0)/x0).astype(int) )
    print("target achieved?",target,res.x.sum())
The sample data is very simple:
n = 5
x0 = np.arange(1,6) * 10_000
target = x0.sum() + 5_000 # increase sum from 150,000 to 155,000
Here's the constrained optimization (including jacobians). In words, the objective function I want to minimize is just the sum of squared percentage changes from the initial values to final values. The linear equality constraint is simply requiring x.sum() to equal a constant.
def obj_func(x):
    return ( ( ( x - x0 ) / x0 ) ** 2 ).sum()
def obj_jac(x):
    return 2. * ( x - x0 ) / x0 ** 2
def constr_func(x):
    return x.sum() - target
def constr_jac(x):
    return np.ones(n)
And for comparison, I've re-factored as an unconstrained minimization by using the equality constraint to replace x[0] with a function of x[1:]. Note that the unconstrained function is passed x0[1:] whereas the constrained function is passed x0.
def unconstr_func(x):
    x_one = target - x.sum()
    first_term = ( ( x_one - x0[0] ) / x0[0] ) ** 2
    second_term = ( ( ( x - x0[1:] ) / x0[1:] ) ** 2 ).sum()
    return first_term + second_term
I then try to minimize in three ways:
Unconstrained with 'Nelder-Mead'
Constrained with 'trust-constr' (w/ & w/o jacobian)
Constrained with 'SLSQP' (w/ & w/o jacobian)
Code:
##### (1) unconstrained
res0 = minimize( unconstr_func, x0[1:], method='Nelder-Mead') # OK, but weird note
res0.x = np.hstack( [target - res0.x.sum(), res0.x] )
print_res( res0, 'unconstrained' )
##### (2a) constrained -- trust-constr w/ jacobian
nonlin_con = NonlinearConstraint( constr_func, 0., 0., constr_jac )
resTCjac = minimize( obj_func, x0, method='trust-constr',
jac='2-point', hess=SR1(), constraints = nonlin_con )
print_res( resTCjac, 'trust-const w/ jacobian' )
##### (2b) constrained -- trust-constr w/o jacobian
nonlin_con = NonlinearConstraint( constr_func, 0., 0. )
resTC = minimize( obj_func, x0, method='trust-constr',
jac='2-point', hess=SR1(), constraints = nonlin_con )
print_res( resTC, 'trust-const w/o jacobian' )
##### (3a) constrained -- SLSQP w/ jacobian
eq_cons = { 'type': 'eq', 'fun' : constr_func, 'jac' : constr_jac }
resSQjac = minimize( obj_func, x0, method='SLSQP',
jac = obj_jac, constraints = eq_cons )
print_res( resSQjac, 'SLSQP w/ jacobian' )
##### (3b) constrained -- SLSQP w/o jacobian
eq_cons = { 'type': 'eq', 'fun' : constr_func }
resSQ = minimize( obj_func, x0, method='SLSQP',
jac = obj_jac, constraints = eq_cons )
print_res( resSQ, 'SLSQP w/o jacobian' )
Here is some simplified output (and of course you can run the code to get the full output):
starting values: [10000 20000 30000 40000 50000]
***** (1) unconstrained *****
Optimization terminated successfully.
obj func value at solution 0.0045454545454545305
ending values: [10090 20363 30818 41454 52272]
***** (2a) trust-const w/ jacobian *****
The maximum number of function evaluations is exceeded.
obj func value at solution 0.014635854609684874
ending values: [10999 21000 31000 41000 51000]
***** (2b) trust-const w/o jacobian *****
`gtol` termination condition is satisfied.
obj func value at solution 0.0045454545462939935
ending values: [10090 20363 30818 41454 52272]
***** (3a) SLSQP w/ jacobian *****
Optimization terminated successfully.
obj func value at solution 0.014636111111111114
ending values: [11000 21000 31000 41000 51000]
***** (3b) SLSQP w/o jacobian *****
Optimization terminated successfully.
obj func value at solution 0.014636111111111114
ending values: [11000 21000 31000 41000 51000]
Notes:
(1) & (2b) are plausible solutions in that they achieve significantly lower objective function values and intuitively we'd expect the variables with larger starting values to move more (both absolutely and in percentage terms) than the smaller ones.
Adding the jacobian to 'trust-constr' causes it to get the wrong answer (or at least a worse answer) and also to exceed the maximum number of function evaluations. Maybe the jacobian is wrong, but the function is so simple that I'm pretty sure it's correct (?)
'SLSQP' doesn't seem to work w/ or w/o the jacobian supplied, but works very fast and claims to terminate successfully. This seems very worrisome in that getting the wrong answer and claiming to have terminated successfully is pretty much the worst possible outcome.
Initially I used very small starting values and targets (just 1/1,000 of what I have above) and in that case all 5 approaches above work fine and give the same answers. My sample data is still extremely small, and it seems kinda bizarre for it to handle 10, 20, ..., 50 but not 10,000, 20,000, ..., 50,000.
FWIW, note that the 3 incorrect results all hit the target by adding 1,000 to each initial value -- this satisfies the constraint but comes nowhere near minimizing the objective function (b/c variables with higher initial values should be increased more than lower ones to minimize the sum of squared percentage differences).
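As a cross-check on which answers are right: the objective is quadratic and the constraint is linear, so setting the gradient of the Lagrangian to zero gives a closed form in which each shift is proportional to x0_i**2. Here is a minimal sketch (the names c, x_star and obj_star are mine, not from the code above):

```python
import numpy as np

x0 = np.arange(1, 6) * 10_000.0
target = x0.sum() + 5_000

# Stationarity: 2*(x_i - x0_i)/x0_i**2 is the same multiplier for every i,
# so x_i - x0_i = c * x0_i**2, and the constraint fixes c.
c = (target - x0.sum()) / (x0 ** 2).sum()
x_star = x0 + c * x0 ** 2

obj_star = (((x_star - x0) / x0) ** 2).sum()
print(x_star.astype(int))    # [10090 20363 30818 41454 52272]
print(obj_star)              # about 0.0045454545 (= 1/220)
print(x_star.sum(), target)
```

This agrees with the values reported for (1) and (2b), which supports reading those two as the correct solutions.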
So my question is really just what is happening here and why do only (1) and (2b) seem to work?
More generally, I'd like to find a good python-based approach to this and similar optimization problems and will consider answers using other packages besides scipy although the best answer would ideally also address what is going on with scipy here (e.g. is this user error or a bug I should post to github?).
Here is how this problem could be solved using nlopt which is a library for nonlinear optimization which I've been pretty impressed with.
First, the objective function and gradient are both defined using the same function:
def obj_func(x, grad):
    if grad.size > 0:
        grad[:] = obj_jac(x)
    return ( ( ( x/x0 - 1 )) ** 2 ).sum()
def obj_jac(x):
    return 2. * ( x - x0 ) / x0 ** 2
def constr_func(x, grad):
    if grad.size > 0:
        grad[:] = constr_jac(x)
    return x.sum() - target
def constr_jac(x):
    return np.ones(n)
Then, to run the minimization using Nelder-Mead and SLSQP:
import nlopt

# note: nlopt calls the objective as f(x, grad), so unconstr_func must accept a grad argument too
opt = nlopt.opt(nlopt.LN_NELDERMEAD,len(x0)-1)
opt.set_min_objective(unconstr_func)
opt.set_ftol_abs(1e-15)
xopt = opt.optimize(x0[1:].copy())
xopt = np.hstack([target - xopt.sum(), xopt])
fval = opt.last_optimum_value()
print_res(xopt,fval,"Nelder-Mead")
opt = nlopt.opt(nlopt.LD_SLSQP,len(x0))
opt.set_min_objective(obj_func)
opt.add_equality_constraint(constr_func)
opt.set_ftol_abs(1e-15)
xopt = opt.optimize(x0.copy())
fval = opt.last_optimum_value()
print_res(xopt,fval,"SLSQP w/ jacobian")
And here are the results:
***** Nelder-Mead *****
obj func value at solution 0.00454545454546
result: 3
starting values: [ 10000. 20000. 30000. 40000. 50000.]
ending values: [10090 20363 30818 41454 52272]
% diff [0 1 2 3 4]
target achieved? 155000.0 155000.0
***** SLSQP w/ jacobian *****
obj func value at solution 0.00454545454545
result: 3
starting values: [ 10000. 20000. 30000. 40000. 50000.]
ending values: [10090 20363 30818 41454 52272]
% diff [0 1 2 3 4]
target achieved? 155000.0 155000.0
When testing this out, I think I discovered what the issue with the original attempt was. If I set the absolute tolerance on the function to 1e-8, which is roughly the size of the default tolerances in the scipy functions, I get:
***** Nelder-Mead *****
obj func value at solution 0.0045454580693
result: 3
starting values: [ 10000. 20000. 30000. 40000. 50000.]
ending values: [10090 20363 30816 41454 52274]
% diff [0 1 2 3 4]
target achieved? 155000.0 155000.0
***** SLSQP w/ jacobian *****
obj func value at solution 0.0146361108503
result: 3
starting values: [ 10000. 20000. 30000. 40000. 50000.]
ending values: [10999 21000 31000 41000 51000]
% diff [9 5 3 2 2]
target achieved? 155000.0 155000.0
which is exactly what you were seeing. So my guess is that during SLSQP the minimizer ends up at a point in the parameter space where the next step changes the objective by less than 1e-8, and it stops there.
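One way to see why a tolerance-based stopping rule could trigger at the "add 1,000 to everything" point is to look at how flat the objective is there. The sketch below only evaluates the question's objective and gradient at that point (x_shift is my name for it). It is consistent with the guess above but does not prove it:

```python
import numpy as np

x0 = np.arange(1, 6) * 10_000.0
x_shift = x0 + 1_000.0           # the point where the failing runs stopped

def obj_func(x):
    return (((x - x0) / x0) ** 2).sum()

def obj_jac(x):
    return 2.0 * (x - x0) / x0 ** 2

grad = obj_jac(x_shift)
print(obj_func(x_shift))         # about 0.0146361, as in the failing outputs
print(np.abs(grad).max())        # about 2e-05
```

Because x is of order 1e4 while the objective is of order 1e-2, every gradient component is tiny in absolute terms, so moving x by a whole unit changes the objective by only about 1e-5 or less.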
This is a partial answer to the question that I'm putting here to keep the question from getting even bigger, but I'd still love to see a more comprehensive and explanatory answer. These answers are based on comments from two others, but neither of them fully wrote out the code, and I thought it would make sense to make that explicit so here it is:
Fixing 2a (trust-constr with jacobian)
It seems that the key here with regard to the Jacobian and Hessian is to specify neither or both (but not the Jacobian only). @SubhaneilLahiri commented to this effect, and there was also a warning to this effect that I initially failed to notice:
UserWarning: delta_grad == 0.0. Check if the approximated function is linear. If the function is linear better results can be obtained by defining the Hessian as zero instead of using quasi-Newton approximations.
So I fixed it by defining the hessian function:
def constr_hess(x,v):
    return np.zeros([n,n])
and adding it to the constraint
nonlin_con = NonlinearConstraint( constr_func, 0., 0., constr_jac, constr_hess )
Fixing 3a & 3b (SLSQP)
This just seemed to be a matter of making the tolerance smaller, as suggested by @user545424. So I just added options={'ftol':1e-15} to the minimization:
resSQjac = minimize( obj_func, x0, method='SLSQP',
options={'ftol':1e-15},
jac = obj_jac, constraints = eq_cons )
I am trying to minimize variance across a portfolio of 100 securities.
def portvol(w, x):
    return np.dot(w.T, np.dot(x, w))*252
covmat = annreturn.cov()
w0 = np.ones(len(covmat)) * (1 / len(covmat)) #equal weighting initially
bounds = ((0,1),) * len(covmat)
constraints = {'fun': lambda i: np.sum(i)-1.0, 'type': 'eq'}
optweights = minimize(portvol, w0, args = (covmat,), method = 'SLSQP', bounds = bounds, constraints = constraints)
annreturn.cov() is a 100x100 DataFrame. The output is the same .01 even weightings I started with and this failure message:
message: 'Inequality constraints incompatible'
nfev: 102
nit: 1
njev: 1
status: 4
success: False
This is how I calculated annualized returns...
annreturn = data.pct_change() #again, assuming percentage change
annreturn = annreturn.iloc[1:]
annreturn = (annreturn+1)**252-1
If you don't notice anything off the bat, that's OK. It took me 2 days to realize I hadn't divided my pct_change() result by 100. Time well spent. I was getting correlations to the powers of like 15+. Here is what the last line should have looked like; with it, the minimize function from the original question works fine.
annreturn = (annreturn/100+1)**252-1
Sorry if anyone took time on this without the above piece!
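To see how much the units matter in that last line, here is a small arithmetic sketch. The 0.5 value is a made-up daily return, and it assumes (as the fix above implies) that the data was in percent units rather than fractions:

```python
daily_pct = 0.5    # hypothetical daily return expressed in percent (0.5%)

# Intended: convert percent to a fraction before compounding over 252 days.
annual_ok = (daily_pct / 100 + 1) ** 252 - 1

# Bug: treating the percent value as a fraction compounds 50% per day.
annual_bug = (daily_pct + 1) ** 252 - 1

print(annual_ok)    # about 2.5
print(annual_bug)   # astronomically large, above 1e40
```

Returns of that magnitude produce a covariance matrix that is hopelessly scaled, which is one plausible way for SLSQP to fail on the first iteration.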
I am looking for a way to set a fixed step size when solving my initial value problem with a Runge-Kutta method in Python. How can I tell scipy.integrate.RK45 to keep a constant step size during its integration procedure?
Thank you very much.
The scipy.integrate solvers normally use an adaptive step size, controlled by a tolerance (TOL) on the one-step error. That error is usually estimated by comparing against a second numerical method. For example, RK45 compares a 5th-order Runge-Kutta result with a 4th-order one to estimate the error and choose the integration step.
Hence, if you must integrate ODEs with a fixed step, effectively turn off the TOL check by setting atol and rtol to rather large constants. For example:
solve_ivp(your_function, t_span=[0, 10], y0=..., method="RK45", max_step=0.01, atol=1, rtol=1)
With the tolerance set that loose, the integration step will be the max_step you choose.
It is quite easy to code the Butcher tableau for the Dormand-Prince RK45 method.
0     |
1/5   | 1/5
3/10  | 3/40         9/40
4/5   | 44/45        −56/15        32/9
8/9   | 19372/6561   −25360/2187   64448/6561   −212/729
1     | 9017/3168    −355/33       46732/5247   49/176    −5103/18656
1     | 35/384       0             500/1113     125/192   −2187/6784      11/84
------+--------------------------------------------------------------------------------------
      | 35/384       0             500/1113     125/192   −2187/6784      11/84      0
      | 5179/57600   0             7571/16695   393/640   −92097/339200   187/2100   1/40
first in a function for a single step
import numpy as np
def DoPri45Step(f,t,x,h):
    k1 = f(t,x)
    k2 = f(t + 1./5*h, x + h*(1./5*k1) )
    k3 = f(t + 3./10*h, x + h*(3./40*k1 + 9./40*k2) )
    k4 = f(t + 4./5*h, x + h*(44./45*k1 - 56./15*k2 + 32./9*k3) )
    k5 = f(t + 8./9*h, x + h*(19372./6561*k1 - 25360./2187*k2 + 64448./6561*k3 - 212./729*k4) )
    k6 = f(t + h, x + h*(9017./3168*k1 - 355./33*k2 + 46732./5247*k3 + 49./176*k4 - 5103./18656*k5) )
    v5 = 35./384*k1 + 500./1113*k3 + 125./192*k4 - 2187./6784*k5 + 11./84*k6
    k7 = f(t + h, x + h*v5)
    v4 = 5179./57600*k1 + 7571./16695*k3 + 393./640*k4 - 92097./339200*k5 + 187./2100*k6 + 1./40*k7
    return v4,v5
and then in a standard fixed-step loop
def DoPri45integrate(f, t, x0):
    N = len(t)
    x = [x0]
    for k in range(N-1):
        v4, v5 = DoPri45Step(f,t[k],x[k],t[k+1]-t[k])
        x.append(x[k] + (t[k+1]-t[k])*v5)
    return np.array(x)
Then test it for some toy example with known exact solution y(t)=sin(t)
def mms_ode(t,y): return np.array([ y[1], np.sin(np.sin(t))-np.sin(t)-np.sin(y[0]) ])
mms_x0 = [0.0, 1.0]
and plot the error scaled by h^5
import matplotlib.pyplot as plt
for h in [0.2, 0.1, 0.08, 0.05, 0.01][::-1]:
    t = np.arange(0,20,h)
    y = DoPri45integrate(mms_ode,t,mms_x0)
    plt.plot(t, (y[:,0]-np.sin(t))/h**5, 'o', ms=3, label = "h=%.4f"%h)
plt.grid(); plt.legend(); plt.show()
to get the confirmation that this is indeed an order 5 method, as the graphs of the error coefficients come close together.
By looking at the implementation of the step, you'll find that the best you can do is to control the initial step size (within the bounds set by the minimum and maximum step size) by setting the attribute h_abs prior to calling RK45.step:
In [27]: rk = RK45(lambda t, y: t, 0, [0], 1e6)
In [28]: rk.h_abs = 30
In [29]: rk.step()
In [30]: rk.step_size
Out[30]: 30.0
If what you need is output at a fixed step size, then I highly recommend the scipy.integrate.solve_ivp function and its t_eval argument.
This function wraps all of the scipy.integrate ODE solvers in one interface, so you choose the solver through its method argument. Fortunately, the default method is RK45, so you don't have to bother with that.
What is more interesting for you is the t_eval argument, which takes a flat array. The function samples the solution curve at the t_eval values and returns only these points. So if you want uniform sampling, pass numpy.linspace(t0, tf, samplingResolution) as t_eval, where t0 is the start and tf is the end of the simulation.
This way you get uniform sampling without having to resort to a fixed step size, which causes instability for some ODEs.
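A minimal sketch of the t_eval approach, using a toy decay equation of my own choosing (y' = -y, exact solution exp(-t)) rather than anything from the question:

```python
import numpy as np
from scipy.integrate import solve_ivp

def decay(t, y):
    # toy right-hand side, used only for illustration
    return -y

t0, tf = 0.0, 1.0
t_eval = np.linspace(t0, tf, 11)          # uniform output grid, spacing 0.1

sol = solve_ivp(decay, (t0, tf), [1.0], t_eval=t_eval)   # method defaults to RK45

print(sol.t)                                             # the requested grid
print(np.abs(sol.y[0] - np.exp(-sol.t)).max())           # error at the sampled points
```

The solver still picks its internal steps adaptively; only the reported points are uniform.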
You've said you want fixed time-step behaviour, not just a fixed evaluation time step. Therefore, you have to "hack" your way through that if you do not want to reimplement the solver yourself. Just set the integration tolerances atol and rtol to 1e90, and max_step and first_step to the value dt of the time step you want to use. This way the estimated integration error, measured against the tolerance, will always be very small, thus tricking the solver into not shrinking the time step dynamically.
However, only use this trick with EXPLICIT algorithms (RK23, RK45, DOP853)!
The implicit algorithms from "solve_ivp" (Radau, BDF, maybe LSODA as well) adjust the precision of the nonlinear Newton solver according to atol and rtol, so you might end up with a solution which does not make any sense...
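Here is a sketch of that trick on a toy problem (y' = -y is my choice of test equation). It checks the step sizes the solver actually took by differencing sol.t:

```python
import numpy as np
from scipy.integrate import solve_ivp

def decay(t, y):
    # toy right-hand side, used only for illustration
    return -y

dt = 0.1
sol = solve_ivp(decay, (0.0, 1.0), [1.0], method="RK45",
                first_step=dt, max_step=dt, atol=1e90, rtol=1e90)

steps = np.diff(sol.t)     # internal steps, since t_eval is not given
print(steps)               # steps of size dt, possibly plus a tiny rounding step at the end
print(abs(sol.y[0, -1] - np.exp(-1.0)))
```

Because the times are accumulated in floating point, the last step may be a very small remainder needed to land exactly on the end time.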
I suggest writing your own fixed-step RK4 program in Python. There are many examples on the internet to help. That guarantees that you know precisely how each value is being computed. Furthermore, there will normally be no 0/0 calculations, and if there are, they will be easy to trace and will prompt another look at the ODEs being solved.
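For reference, a classical fixed-step RK4 integrator fits in a few lines. This is a minimal sketch with function names of my own choosing, tested on y' = -y:

```python
import numpy as np

def rk4_step(f, t, y, h):
    # one classical 4th-order Runge-Kutta step of size h
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def rk4_integrate(f, t, y0):
    # t is the full time grid; the step is whatever spacing the grid has
    y = [np.asarray(y0, dtype=float)]
    for k in range(len(t) - 1):
        y.append(rk4_step(f, t[k], y[k], t[k + 1] - t[k]))
    return np.array(y)

t = np.linspace(0.0, 1.0, 101)                 # fixed step h = 0.01
y = rk4_integrate(lambda t, y: -y, t, [1.0])
err = abs(y[-1, 0] - np.exp(-1.0))
print(err)                                     # small, since the global error scales like h**4
```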
Consider the following (convex) optimization problem:
minimize 0.5 * y.T * y
s.t. A*x - b == y
where the optimization (vector) variables are x and y and A, b are a matrix and vector, respectively, of appropriate dimensions.
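Since y is fully determined by x through the constraint, this is ordinary linear least squares in x, so any solver's result can be cross-checked against numpy.linalg.lstsq. Here is a minimal sketch; the seed and the names x_ls, y_ls and obj_ls are mine, so the numbers will not match the listings in this post:

```python
import numpy as np

rng = np.random.default_rng(0)     # arbitrary seed, for illustration only
n, m = 10, 20
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

# minimize 0.5 * ||A x - b||^2 directly
x_ls, res, rank, sv = np.linalg.lstsq(A, b, rcond=None)
y_ls = A @ x_ls - b                # the y that satisfies the equality constraint
obj_ls = 0.5 * y_ls @ y_ls

print(obj_ls, 0.5 * res[0])        # the two values agree
```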
The code below finds a solution easily using the SLSQP method from Scipy:
import numpy as np
from scipy.optimize import minimize
# problem dimensions:
n = 10 # arbitrary integer set by user
m = 2 * n
# generate parameters A, b:
np.random.seed(123) # for reproducibility of results
A = np.random.randn(m,n)
b = np.random.randn(m)
# objective function:
def obj(z):
    vy = z[n:]
    return 0.5 * vy.dot(vy)
# constraint function:
def cons(z):
    vx = z[:n]
    vy = z[n:]
    return A.dot(vx) - b - vy
# constraints input for SLSQP:
cons = ({'type': 'eq','fun': cons})
# generate a random initial estimate:
z0 = np.random.randn(n+m)
sol = minimize(obj, x0 = z0, constraints = cons, method = 'SLSQP', options={'disp': True})
Optimization terminated successfully. (Exit mode 0)
Current function value: 2.12236220865
Iterations: 6
Function evaluations: 192
Gradient evaluations: 6
Note that the constraint function is a convenient 'array-output' function.
Now, instead of an array-output function for the constraint, one could in principle use an equivalent set of 'scalar-output' constraint functions (actually, the scipy.optimize documentation discusses only this type of constraint functions as input to minimize).
Here is the equivalent constraint set followed by the output of minimize (same A, b, and initial value as the above listing):
# this is the i-th element of cons(z):
def cons_i(z, i):
    vx = z[:n]
    vy = z[n:]
    return A[i].dot(vx) - b[i] - vy[i]
# list of scalar-output constraints input for SLSQP:
cons_per_i = [{'type':'eq', 'fun': lambda z: cons_i(z, i)} for i in np.arange(m)]
sol2 = minimize(obj, x0 = z0, constraints = cons_per_i, method = 'SLSQP', options={'disp': True})
Singular matrix C in LSQ subproblem (Exit mode 6)
Current function value: 6.87999270692
Iterations: 1
Function evaluations: 32
Gradient evaluations: 1
Evidently, the algorithm fails (the returning objective value is actually the objective value for the given initialization), which I find a bit weird. Note that running [cons_per_i[i]['fun'](sol.x) for i in np.arange(m)] shows that sol.x, obtained using the array-output constraint formulation, satisfies all scalar-output constraints of cons_per_i as expected (within numerical tolerance).
I would appreciate if anyone has some explanation for this issue.
You've run into the "late binding closures" gotcha. All the calls to cons_i are being made with the second argument equal to 19.
A fix is to use the args dictionary element in the dictionary that defines the constraints instead of the lambda function closures:
cons_per_i = [{'type':'eq', 'fun': cons_i, 'args': (i,)} for i in np.arange(m)]
With this, the minimization works:
In [417]: sol2 = minimize(obj, x0 = z0, constraints = cons_per_i, method = 'SLSQP', options={'disp': True})
Optimization terminated successfully. (Exit mode 0)
Current function value: 2.1223622086
Iterations: 6
Function evaluations: 192
Gradient evaluations: 6
You could also use the suggestion made in the linked article, which is to use a lambda expression with a second argument that has the desired default value:
cons_per_i = [{'type':'eq', 'fun': lambda z, i=i: cons_i(z, i)} for i in np.arange(m)]
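The gotcha and the default-argument fix can be reproduced without scipy. In this small sketch, cons_i is a simplified stand-in that just returns z[i]:

```python
def cons_i(z, i):
    # simplified stand-in for the real constraint: pick the i-th entry
    return z[i]

m = 3
z = [10, 20, 30]

# Late binding: every lambda looks up i when it is called, after the loop has finished.
late = [lambda z: cons_i(z, i) for i in range(m)]
late_values = [f(z) for f in late]
print(late_values)        # [30, 30, 30]

# Default argument: the current value of i is captured when each lambda is created.
bound = [lambda z, i=i: cons_i(z, i) for i in range(m)]
bound_values = [f(z) for f in bound]
print(bound_values)       # [10, 20, 30]
```

With the late-binding version all m constraints are the same function, which is why the constraint Jacobian is singular in the SLSQP run.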
I'm using SciPy for optimization and the method SLSQP seems to ignore my constraints.
Specifically, I want x[3] and x[4] to be in the range [0, 1].
I'm getting the message: 'Inequality constraints incompatible'
Here are the results of the execution, followed by example code (which uses a dummy function):
status: 4
success: False
njev: 2
nfev: 24
fun: 0.11923608071680103
x: array([-10993.4278558 , -19570.77080806, -23495.15914299, -26531.4862831 ,
4679.97660534])
message: 'Inequality constraints incompatible'
jac: array([ 12548372.4766904 , 12967696.88362279, 39928956.72239509,
-9224613.99092537, 3954696.30747453, 0. ])
nit: 2
Here is my code:
from random import random
from scipy.optimize import minimize
def func(x):
    """ dummy function to optimize """
    print('x'+str(x))
    return random()
my_constraints = ({'type':'ineq', 'fun':lambda x: 1-x[3]-x[4]},
                  {'type':'ineq', 'fun':lambda x: x[3]},
                  {'type':'ineq', 'fun':lambda x: x[4]},
                  {'type':'ineq', 'fun':lambda x: 1-x[4]},
                  {'type':'ineq', 'fun':lambda x: 1-x[3]})
minimize(func, [57.9499 ,-18.2736,1.1664,0.0000,0.0765],
         method='SLSQP',constraints=my_constraints)
EDIT -
The problem persists even when removing the first constraint.
The problem persists when I try to use the bounds argument instead.
i.e.,
bounds_pairs = [(None,None),(None,None),(None,None),(0,1),(0,1)]
minimize(f,initial_guess,method=method_name,bounds=bounds_pairs,constraints=non_negative_prob)
I know this is a very old question, but I was intrigued.
When does it happen?
This problem occurs when the optimisation function is not reliably differentiable. If you use a nice smooth function like this:
import numpy
opt = numpy.array([2, 2, 2, 2, 2])
def func(x):
    return sum((x - opt)**2)
The problem goes away.
How do I impose hard constraints?
Note that none of the constrained algorithms in scipy.optimize.minimize guarantee that the function will never be evaluated outside the constraints. If this is a requirement for you, you should rather use transformations. For instance, to ensure that no negative values for x[3] are ever used, you can use a transformation x3_real = 10**x[3]. This way x[3] can take any value, but the variable you actually use will never be negative.
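A sketch of the transformation idea follows. The power-of-ten map is the one mentioned above. The logistic map onto (0, 1) is my own addition for the [0, 1] range asked about in the question, and to_positive, to_unit_interval and wrap are hypothetical helper names:

```python
import numpy as np

def to_positive(u):
    # any real u maps to a strictly positive value
    return 10.0 ** u

def to_unit_interval(u):
    # logistic map: any real u maps into (0, 1)
    return 1.0 / (1.0 + np.exp(-u))

def wrap(func):
    """Return an objective over unconstrained variables whose entries 3 and 4
    are mapped into (0, 1) before func sees them."""
    def wrapped(u):
        x = np.array(u, dtype=float)
        x[3:5] = to_unit_interval(x[3:5])
        return func(x)
    return wrapped

u = np.linspace(-5.0, 5.0, 11)
print(to_positive(u).min() > 0)                              # True
print(to_unit_interval(u).min(), to_unit_interval(u).max())  # both strictly inside (0, 1)
```

The wrapped objective can then be passed to an unconstrained method, and the optimizer's result is mapped through the same transformation to recover x[3] and x[4].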
Deeper analysis
Investigating the Fortran code for slsqp yields the following insights into when this error occurs. The routine returns a MODE variable, which can take on these values:
C* MODE = -1: GRADIENT EVALUATION, (G&A) *
C* 0: ON ENTRY: INITIALIZATION, (F,G,C&A) *
C* ON EXIT : REQUIRED ACCURACY FOR SOLUTION OBTAINED *
C* 1: FUNCTION EVALUATION, (F&C) *
C* *
C* FAILURE MODES: *
C* 2: NUMBER OF EQUALITY CONTRAINTS LARGER THAN N *
C* 3: MORE THAN 3*N ITERATIONS IN LSQ SUBPROBLEM *
C* 4: INEQUALITY CONSTRAINTS INCOMPATIBLE *
C* 5: SINGULAR MATRIX E IN LSQ SUBPROBLEM *
C* 6: SINGULAR MATRIX C IN LSQ SUBPROBLEM *
The part which assigns mode 4 (which is the error you are getting) is as follows:
C SEARCH DIRECTION AS SOLUTION OF QP - SUBPROBLEM
CALL dcopy_(n, xl, 1, u, 1)
CALL dcopy_(n, xu, 1, v, 1)
CALL daxpy_sl(n, -one, x, 1, u, 1)
CALL daxpy_sl(n, -one, x, 1, v, 1)
h4 = one
CALL lsq (m, meq, n , n3, la, l, g, a, c, u, v, s, r, w, iw, mode)
C AUGMENTED PROBLEM FOR INCONSISTENT LINEARIZATION
IF (mode.EQ.6) THEN
IF (n.EQ.meq) THEN
mode = 4
ENDIF
ENDIF
So basically it attempts to find a descent direction. If the constraints are active, it attempts derivative evaluation along the constraint and fails with a singular matrix in the LSQ subproblem (mode = 6). It then reasons that if all the constraint equations were evaluated and none yielded a successful descent direction, this must be a contradictory set of constraints (mode = 4).
I have a function which is actually a call to another program (some Fortran code). When I call this function (run_moog) I can pass 4 variables, and it returns 6 values. These values should all be close to 0 (in order to minimize). However, I combined them like this: np.sum(results**2). Now I have a scalar function. I would like to minimize this function, i.e. get np.sum(results**2) as close to zero as possible.
Note: When this function (run_moog) takes the 4 input parameters, it creates an input file for the Fortran code that depends on these parameters.
I have tried several ways to optimize this from the scipy docs. But none works as expected. The minimization should be able to have bounds on the 4 variables. Here is an attempt:
from scipy.optimize import minimize # Tried others as well from the docs
x0 = 4435, 3.54, 0.13, 2.4
bounds = [(4000, 6000), (3.00, 4.50), (-0.1, 0.1), (0.0, None)]
a = minimize(fun_mmog, x0, bounds=bounds, method='L-BFGS-B') # I've tried several different methods here
print(a)
This then gives me
status: 0
success: True
nfev: 5
fun: 2.3194639999999964
x: array([ 4.43500000e+03, 3.54000000e+00, 1.00000000e-01,
2.40000000e+00])
message: 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL'
jac: array([ 0., 0., -54090399.99999981, 0.])
nit: 0
The third parameter changes slightly, while the others are exactly the same. Also there have been 5 function calls (nfev) but no iterations (nit). The output from scipy is shown here.
Couple of possibilities:
Try COBYLA. It should be derivative-free, and supports inequality constraints.
You can't use different epsilons via the normal interface; so try scaling your first variable by 1e4. (Divide it going in, multiply coming back out.)
Skip the normal automatic jacobian constructor, and make your own:
Say you're trying to use SLSQP, and you don't provide a jacobian function. It makes one for you. The code for it is in approx_jacobian in slsqp.py. Here's a condensed version:
def approx_jacobian(x,func,epsilon,*args):
    x0 = asfarray(x)
    f0 = atleast_1d(func(*((x0,)+args)))
    jac = zeros([len(x0),len(f0)])
    dx = zeros(len(x0))
    for i in range(len(x0)):
        dx[i] = epsilon
        jac[i] = (func(*((x0+dx,)+args)) - f0)/epsilon
        dx[i] = 0.0
    return jac.transpose()
You could try replacing that loop with:
    for (i, e) in zip(range(len(x0)), epsilon):
        dx[i] = e
        jac[i] = (func(*((x0+dx,)+args)) - f0)/e
        dx[i] = 0.0
You can't provide this as the jacobian to minimize, but fixing it up for that is straightforward:
import numpy as np

def construct_jacobian(func,epsilon):
    # epsilon is a sequence holding one finite-difference step per variable
    def jac(x, *args):
        x0 = np.asarray(x, dtype=float)
        f0 = np.atleast_1d(func(*((x0,)+args)))
        jac = np.zeros([len(x0),len(f0)])
        dx = np.zeros(len(x0))
        for (i, e) in zip(range(len(x0)), epsilon):
            dx[i] = e
            jac[i] = (func(*((x0+dx,)+args)) - f0)/e
            dx[i] = 0.0
        return jac.transpose()
    return jac
You can then call minimize like:
minimize(fun_mmog, x0,
jac=construct_jacobian(fun_mmog, [1e0, 1e-4, 1e-4, 1e-4]),
bounds=bounds, method='SLSQP')
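The scaling suggestion above (divide going in, multiply coming back out) can be sketched as a thin wrapper around the objective. scale_inputs and toy_objective are hypothetical names, and the toy function merely stands in for the expensive external call:

```python
import numpy as np

def scale_inputs(func, scale):
    """Return a function of scaled variables u, where the real input is u * scale."""
    scale = np.asarray(scale, dtype=float)
    def scaled(u, *args):
        return func(np.asarray(u, dtype=float) * scale, *args)
    return scaled

def toy_objective(x):
    # stand-in objective with one variable of order 1e3 and one of order 1
    return ((x[0] - 5000.0) / 1e4) ** 2 + (x[1] - 3.5) ** 2

scale = np.array([1e4, 1.0])
x0 = np.array([4435.0, 3.54])

f_scaled = scale_inputs(toy_objective, scale)
u0 = x0 / scale                      # divide going in
print(f_scaled(u0), toy_objective(x0))
# after optimizing over u, recover x with: x_opt = u_opt * scale
```

Any bounds have to be divided by the same scale factors before they are handed to the optimizer.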
It sounds like your target function doesn't have well-behaved derivatives. The line in the output jac: array([ 0., 0., -54090399.99999981, 0.]) means that only changes in the third variable are significant. And because the derivative with respect to this variable is virtually infinite, there is probably something wrong in the function. That is also why the third variable ends up at its maximum.
I would suggest that you take a look at the derivatives, at least at a few points in your parameter space. Compute them using finite differences and the default step size of SciPy's fmin_l_bfgs_b, 1e-8. Here is an example of how you could compute the derivatives.
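A minimal forward-difference sketch of that check follows. fd_gradient is a hypothetical helper, and the quadratic test function stands in for the real target function:

```python
import numpy as np

def fd_gradient(func, x, eps=1e-8):
    """Forward-difference gradient of a scalar function, one variable at a time."""
    x = np.asarray(x, dtype=float)
    f0 = func(x)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (func(x + step) - f0) / eps
    return grad

g = fd_gradient(lambda x: np.sum(x ** 2), [1.0, 2.0, 3.0])
print(g)    # close to [2, 4, 6]
```

If a component comes out as exactly 0 or as an enormous number, the step is probably too small or too large for that variable's scale, or the function is not smooth there.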
Try also plotting your target function. For instance, keep two of the parameters constant and let the two others vary. If the function has multiple local optima, you shouldn't use gradient-based methods like BFGS.
How difficult is it to get an analytical expression for the gradient? If you have that you can then approximate the product of Hessian with a vector using finite difference. Then you can use other optimization routines available.
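Given an analytical gradient, the Hessian-vector product can be approximated as (grad(x + eps*v) - grad(x)) / eps. Here is a minimal sketch on a quadratic whose Hessian is known exactly; hess_vec and Q are my own illustrative names:

```python
import numpy as np

def hess_vec(grad, x, v, eps=1e-6):
    """Finite-difference approximation of the Hessian at x applied to v."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    return (grad(x + eps * v) - grad(x)) / eps

# f(x) = 0.5 * x.T @ Q @ x has gradient Q @ x and Hessian Q
Q = np.array([[2.0, 1.0], [1.0, 3.0]])

def grad(x):
    return Q @ x

v = np.array([0.5, 2.0])
hv = hess_vec(grad, [1.0, -1.0], v)
print(hv, Q @ v)    # both close to [3.0, 6.5]
```

A function of this kind is what minimize expects as its hessp argument, for example with method='Newton-CG'.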
Among the various optimization routines available in SciPy, the one called TNC (a truncated Newton method, i.e. Newton conjugate gradient with truncation) is quite robust to the numerical values associated with the problem.
The Nelder-Mead Simplex Method (suggested by Cristián Antuña in the comments above) is well known to be a good choice for optimizing (possibly ill-behaved) functions with no knowledge of derivatives (see Numerical Recipes in C, Chapter 10).
There are two somewhat specific aspects to your question. The first is the constraints on the inputs, and the second is a scaling problem. The following suggests solutions to these points, but you might need to manually iterate between them a few times until things work.
Input Constraints
Assuming your input constraints form a convex region (as your examples above indicate, but I'd like to generalize it a bit), then you can write a function
def is_in_bounds(p):
    # Return True if p is in the bounds, False otherwise
    raise NotImplementedError
Using this function, assume that the algorithm wants to move from point from_ to point to, where from_ is known to be in the region. Then the following function will efficiently find the furthermost point on the line between the two points on which it can proceed:
import numpy as np
from numpy.linalg import norm
def progress_within_bounds(from_, to, eps):
    """
    from_ -- source (in region)
    to -- target point
    eps -- Euclidean precision along the line
    """
    from_ = np.asarray(from_, dtype=float)
    to = np.asarray(to, dtype=float)
    # stop once the remaining segment is shorter than eps
    if norm(to - from_) < eps:
        return from_
    mid = (from_ + to) / 2
    if is_in_bounds(mid):
        return progress_within_bounds(mid, to, eps)
    return progress_within_bounds(from_, mid, eps)
(Note that this function can be optimized for some regions, but it's hardly worth the bother, as it doesn't even call your original object function, which is the expensive one.)
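Here is a small self-contained usage sketch of the same bisection idea, written iteratively and using a unit box as the region. The box, the function name progress_within_bounds_iter and the test points are my own choices for illustration:

```python
import numpy as np

lower = np.array([0.0, 0.0])
upper = np.array([1.0, 1.0])

def is_in_bounds(p):
    # example region: the unit box, boundary included
    return bool(np.all(p >= lower) and np.all(p <= upper))

def progress_within_bounds_iter(from_, to, eps):
    """Bisect the segment from_ -> to; from_ always stays inside the region."""
    from_ = np.asarray(from_, dtype=float)
    to = np.asarray(to, dtype=float)
    while np.linalg.norm(to - from_) >= eps:
        mid = (from_ + to) / 2
        if is_in_bounds(mid):
            from_ = mid      # the midpoint is feasible, so advance to it
        else:
            to = mid         # the midpoint is infeasible, so shorten the segment
    return from_

p = progress_within_bounds_iter([0.0, 0.0], [2.0, 0.5], 1e-6)
print(p)    # the last feasible point on the segment, close to [1.0, 0.25]
```

As with the recursive version, this relies on the region being convex, so that feasibility along the segment changes only once.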
One of the nice aspects of Nelder-Mead is that the function does a series of steps which are so intuitive. Some of these points can obviously throw you out of the region, but it's easy to modify this. Here is an implementation of Nelder Mead with modifications made marked between pairs of lines of the form ##################################################################:
import copy

'''
    Pure Python/Numpy implementation of the Nelder-Mead algorithm.
    Reference: https://en.wikipedia.org/wiki/Nelder%E2%80%93Mead_method
'''

def nelder_mead(f, x_start,
                step=0.1, no_improve_thr=10e-6, no_improv_break=10, max_iter=0,
                alpha = 1., gamma = 2., rho = -0.5, sigma = 0.5, prog_eps = 1e-6):
    '''
        @param f (function): function to optimize, must return a scalar score
            and operate over a numpy array of the same dimensions as x_start
        @param x_start (numpy array): initial position
        @param step (float): look-around radius in initial step
        @no_improve_thr, no_improv_break (float, int): break after no_improv_break iterations with
            an improvement lower than no_improve_thr
        @max_iter (int): always break after this number of iterations.
            Set it to 0 to loop indefinitely.
        @alpha, gamma, rho, sigma (floats): parameters of the algorithm
            (see Wikipedia page for reference)
        @prog_eps (float): precision passed to progress_within_bounds
            (the listing used this name without defining it; the default here is a placeholder)
    '''

    # init
    dim = len(x_start)
    prev_best = f(x_start)
    no_improv = 0
    res = [[x_start, prev_best]]

    for i in range(dim):
        x = copy.copy(x_start)
        x[i] = x[i] + step
        score = f(x)
        res.append([x, score])

    # simplex iter
    iters = 0
    while 1:
        # order
        res.sort(key = lambda x: x[1])
        best = res[0][1]

        # break after max_iter
        if max_iter and iters >= max_iter:
            return res[0]
        iters += 1

        # break after no_improv_break iterations with no improvement
        print('...best so far:', best)

        if best < prev_best - no_improve_thr:
            no_improv = 0
            prev_best = best
        else:
            no_improv += 1

        if no_improv >= no_improv_break:
            return res[0]

        # centroid
        x0 = [0.] * dim
        for tup in res[:-1]:
            for i, c in enumerate(tup[0]):
                x0[i] += c / (len(res)-1)

        # reflection
        xr = x0 + alpha*(x0 - res[-1][0])
        ##################################################################
        ##################################################################
        xr = progress_within_bounds(x0, x0 + alpha*(x0 - res[-1][0]), prog_eps)
        ##################################################################
        ##################################################################
        rscore = f(xr)
        if res[0][1] <= rscore < res[-2][1]:
            del res[-1]
            res.append([xr, rscore])
            continue

        # expansion
        if rscore < res[0][1]:
            xe = x0 + gamma*(x0 - res[-1][0])
            ##################################################################
            ##################################################################
            xe = progress_within_bounds(x0, x0 + gamma*(x0 - res[-1][0]), prog_eps)
            ##################################################################
            ##################################################################
            escore = f(xe)
            if escore < rscore:
                del res[-1]
                res.append([xe, escore])
                continue
            else:
                del res[-1]
                res.append([xr, rscore])
                continue

        # contraction
        xc = x0 + rho*(x0 - res[-1][0])
        ##################################################################
        ##################################################################
        xc = progress_within_bounds(x0, x0 + rho*(x0 - res[-1][0]), prog_eps)
        ##################################################################
        ##################################################################
        cscore = f(xc)
        if cscore < res[-1][1]:
            del res[-1]
            res.append([xc, cscore])
            continue

        # reduction
        x1 = res[0][0]
        nres = []
        for tup in res:
            redx = x1 + sigma*(tup[0] - x1)
            score = f(redx)
            nres.append([redx, score])
        res = nres
Note: this implementation is GPL-licensed, which is either fine for you or not. It's extremely easy to write a modified NM from any pseudocode, though, and you might want to throw in simulated annealing in any case.
Scaling
This is a trickier problem, but jasaarim has made an interesting point regarding that. Once the modified NM algorithm has found a point, you might want to draw a contour plot (matplotlib.pyplot.contour) while fixing a few dimensions, in order to see how the function behaves. At this point, you might want to rescale one or more of the dimensions and rerun the modified NM.