Consider the following (convex) optimization problem:
minimize 0.5 * y.T * y
s.t. A*x - b == y
where the optimization (vector) variables are x and y and A, b are a matrix and vector, respectively, of appropriate dimensions.
The code below finds a solution easily using the SLSQP method from Scipy:
import numpy as np
from scipy.optimize import minimize
# problem dimensions:
n = 10 # arbitrary integer set by user
m = 2 * n
# generate parameters A, b:
np.random.seed(123) # for reproducibility of results
A = np.random.randn(m,n)
b = np.random.randn(m)
# objective function:
def obj(z):
vy = z[n:]
return 0.5 * vy.dot(vy)
# constraint function:
def cons(z):
vx = z[:n]
vy = z[n:]
return A.dot(vx) - b - vy
# constraints input for SLSQP:
cons = ({'type': 'eq','fun': cons})
# generate a random initial estimate:
z0 = np.random.randn(n+m)
sol = minimize(obj, x0 = z0, constraints = cons, method = 'SLSQP', options={'disp': True})
Optimization terminated successfully. (Exit mode 0)
Current function value: 2.12236220865
Iterations: 6
Function evaluations: 192
Gradient evaluations: 6
Note that the constraint function is a convenient 'array-output' function.
Now, instead of an array-output function for the constraint, one could in principle use an equivalent set of 'scalar-output' constraint functions (actually, the scipy.optimize documentation discusses only this type of constraint functions as input to minimize).
Here is the equivalent constraint set followed by the output of minimize (same A, b, and initial value as the above listing):
# this is the i-th element of cons(z):
def cons_i(z, i):
vx = z[:n]
vy = z[n:]
return A[i].dot(vx) - b[i] - vy[i]
# listable of scalar-output constraints input for SLSQP:
cons_per_i = [{'type':'eq', 'fun': lambda z: cons_i(z, i)} for i in np.arange(m)]
sol2 = minimize(obj, x0 = z0, constraints = cons_per_i, method = 'SLSQP', options={'disp': True})
Singular matrix C in LSQ subproblem (Exit mode 6)
Current function value: 6.87999270692
Iterations: 1
Function evaluations: 32
Gradient evaluations: 1
Evidently, the algorithm fails (the returning objective value is actually the objective value for the given initialization), which I find a bit weird. Note that running [cons_per_i[i]['fun'](sol.x) for i in np.arange(m)] shows that sol.x, obtained using the array-output constraint formulation, satisfies all scalar-output constraints of cons_per_i as expected (within numerical tolerance).
I would appreciate if anyone has some explanation for this issue.
You've run into the "late binding closures" gotcha. All the calls to cons_i are being made with the second argument equal to 19.
A fix is to use the args dictionary element in the dictionary that defines the constraints instead of the lambda function closures:
cons_per_i = [{'type':'eq', 'fun': cons_i, 'args': (i,)} for i in np.arange(m)]
With this, the minimization works:
In [417]: sol2 = minimize(obj, x0 = z0, constraints = cons_per_i, method = 'SLSQP', options={'disp': True})
Optimization terminated successfully. (Exit mode 0)
Current function value: 2.1223622086
Iterations: 6
Function evaluations: 192
Gradient evaluations: 6
You could also use the the suggestion made in the linked article, which is to use a lambda expression with a second argument that has the desired default value:
cons_per_i = [{'type':'eq', 'fun': lambda z, i=i: cons_i(z, i)} for i in np.arange(m)]
Related
I tried using minimize function in scipy packages like below code
When I use jac option = approx_fprime, iteration is 0 and optimization doesn't work.
But When I use jac option = rosen_der, it worked!
import numpy as np
from scipy.optimize import minimize, approx_fprime
def rosen(x):
"""The Rosenbrock function"""
return sum(100.0*(x[1:]-x[:-1]**2.0)**2.0 + (1-x[:-1])**2.0)
def rosen_der(x):
# derivative of rosenbrock function
xm = x[1:-1]
xm_m1 = x[:-2]
xm_p1 = x[2:]
der = np.zeros_like(x)
der[1:-1] = 200*(xm-xm_m1**2) - 400*(xm_p1 - xm**2)*xm - 2*(1-xm)
der[0] = -400*x[0]*(x[1]-x[0]**2) - 2*(1-x[0])
der[-1] = 200*(x[-1]-x[-2]**2)
return der
x0=np.array([1.3, 0.7])
eps = np.sqrt(np.finfo(float).eps)
fprime = lambda x : np.array(approx_fprime(x0, rosen, eps))
res = minimize(rosen, x0, method='CG', jac=fprime, options={'maxiter':10, 'disp': True})
print(res.x)
[ 515.40001106 -197.99999905]
[ 515.4 -198. ]
98.10000000000005
Warning: Desired error not necessarily achieved due to precision loss.
Current function value: 98.100000
Iterations: 0
Function evaluations: 33
Gradient evaluations: 21
[1.3 0.7]
I checked approx_fprime is ndarray, same rosen_der, and value is same too.
Why optimization doesn't work??
Your function fprime is a function of x but approximates the derivative at x0. Consequently, you're evaluating the gradient at the initial guess x0 in every iteration. You should evaluate/approximate the derivative at x instead:
fprime = lambda x : approx_fprime(x, rosen, eps)
Note that approx_fprime already returns an np.ndarray, so there's no need for the extra np.array call.
It's also worth mentioning that you don't need to pass approximated derivatives as minimize approximates them by finite differences by default once you don't pass any derivatives, i.e. jac=None. However, minimize uses approx_derivative under the hood instead of approx_fprime as it provides support for evaluating derivatives at variable bounds.
I am trying to code an optimizer finding the optimal constant parameters so as to minimize the MSE between an array y and a generic function over X. The generic function is given in pre-order, so for example if the function over X is x1 + c*x2 the function would be [+, x1, *, c, x2]. The objective in the previous example, would be minimizing:
sum_for_all_x (y - (x1 + c*x2))^2
I show next what I have done to solve the problem. Some things that sould be known are:
X and y are torch tensors.
constants is the list of values to be optimized.
def loss(self, constants, X, y):
stack = [] # Stack to save the partial results
const = 0 # Index of constant to be used
for idx in self.traversal[::-1]: # Reverse the prefix notation
if idx > Language.max_variables: # If we are dealing with an operator
function = Language.idx_to_token[idx] # Get its associated function
first_operand = stack.pop() # Get first operand
if function.arity == 1: # If the arity of the operator is one (e.g sin)
stack.append(function.function(first_operand)) # Append result
else: # Same but if arity is 2
second_operand = stack.pop() # Need a second operand
stack.append(function.function(first_operand, second_operand))
elif idx == 0: # If it is a constant -> idx 0 indicates a constant
stack.append(constants[const]*torch.ones(X.shape[0])) # Append constant
const += 1 # Update
else:
stack.append(X[:, idx - 1]) # Else append the associated column of X
prediction = stack[0]
return (y - prediction).pow(2).mean().cpu().numpy()
def optimize_constants(self, X, y):
'''
# This function optimizes the constants of the expression tree.
'''
if 0 not in self.traversal: # If there are no constants to be optimized return
return self.traversal
x0 = [0 for i in range(len(self.constants))] # Initial guess
ini = time.time()
res = minimize(self.loss, x0, args=(X, y), method='BFGS', options={'disp': True})
print(res)
print('Time:', time.time() - ini)
The problem is that the optimizer theoretically terminates successfully but does not iterate at all. The output res would be something like that:
Optimization terminated successfully.
Current function value: 2.920725
Iterations: 0
Function evaluations: 2
Gradient evaluations: 1
fun: 2.9207253456115723
hess_inv: array([[1]])
jac: array([0.])
message: 'Optimization terminated successfully.'
nfev: 2
nit: 0
njev: 1
status: 0
success: True
x: array([0.])
So far I have tried to:
Change the method in the minimizer (e.g Nelder-Mead, SLSQP,...) but it happens the same with all of them.
Change the way I return the result (e.g (y - prediction).pow(2).mean().item())
It seems that scipy optimize minimize does not work well with Pytorch. Changing the code to use numpy ndarrays solved the problem.
I'm trying to minimize a constrained function of several variables adopting the algorithm scipy.optimize.minimize. The function concerns the minimization of 3*N parameters, where Nis an input. More specifically, my minimization parameters are given in three arrays H = H[0],H[1],...,H[N-1], a = a[0],a[1],...,a[N-1] and b = b[0],b[1],...,b[N-1] which I concatenated in only one array named mins, with len(mins)=3*N.
Those parameters are also subjected to constraints as follows:
0 <= H and sum(H) = 0.5
0 <= a <= Pi/2
0 <= b <= Pi/2
So, my code for the constraints read as:
import numpy as np
# constraints on x:
def Hlhs(mins): # left hand side
return np.diag(np.ones(N)) # mins.reshape(3,N)[0]
def Hrhs(mins): # right hand side
return np.sum(mins.reshape(3,N)[0]) - 0.5
con1H = {'type': 'ineq', 'fun': lambda H: Hlhs(H)}
con2H = {'type': 'eq', 'fun': lambda H: Hrhs(H)}
# constraints on a:
def alhs(mins):
return np.diag(np.ones(N)) # mins.reshape(3,N)[1]
def arhs(mins):
return -np.diag(np.ones(N)) # mins.reshape(3,N)[1] + (np.ones(N))*np.pi/2
con1a = {'type': 'ineq', 'fun': lambda a: alhs(a)}
con2a = {'type': 'ineq', 'fun': lambda a: arhs(a)}
# constraints on b:
def blhs(mins):
return np.diag(np.ones(N)) # mins.reshape(3,N)[2]
def brhs(mins):
return -np.diag(np.ones(N)) # mins.reshape(3,N)[2] + (np.ones(N))*np.pi/2
con1b = {'type': 'ineq', 'fun': lambda b: blhs(b)}
con2b = {'type': 'ineq', 'fun': lambda b: brhs(b)}
My function, with the other parameters (and adopting N=3) to be minimized, is given by (I'm sorry if it is too long):
gamma = 17
C = 85
T = 0
Hf = 0.5
Li = 2
Bi = 1
N = 3
def FUN(mins):
H, a, b = mins.reshape(3,N)
S1 = 0; S2 = 0
B = np.zeros(N); L = np.zeros(N);
for i in range(N):
sbi=Bi; sli=Li
for j in range(i+1):
sbi += 2*H[j]*np.tan(b[j])
sli += 2*H[j]*np.tan(a[j])
B[i]=sbi
L[i]=sli
for i in range(N):
S1 += (C*(1-np.sin(a[i])) + T*np.sin(a[i])) * (Bi*H[i]+H[i]**2*np.tan(b[i]))/np.cos(a[i]) + \
(C*(1-np.sin(b[i])) + T*np.sin(b[i])) * (Li*H[i]+H[i]**2*np.tan(a[i]))/np.cos(b[i])
S2 += (gamma*H[0]/12)*(Bi*Li + 4*(B[0]-H[0]*np.tan(b[0]))*(L[0]-H[0]*np.tan(a[0])) + B[0]*L[0])
j=1
while j<(N):
S2 += (gamma*H[j]/12)*(B[j-1]*L[j-1] + 4*(B[j]-H[j]*np.tan(b[j]))*(L[j]-H[j]*np.tan(a[j])) + B[j]*L[j])
j += 1
F = 2*(S1+S2)
return F
And, finally, adopting an initial guess for the values as 0, the minimization is given by:
x0 = np.zeros(3*N)
res = scipy.optimize.minimize(FUN,x0,constraints=(con1H,con2H,con1a,con2a,con1b,con2b),tol=1e-25)
My problems are:
a) Observing the result res, some values got negative even though I have constraints for them to be positive. The success of the minimization was False, and the message was: Positive directional derivative for linesearch. Also, the result is very far from the minimum expected.
b) Adopting the method='trust-constr' I got a value closer to what I was expecting but with a false success and the message The maximum number of function evaluations is exceeded.. Is there any way to improve this?
I know that there is a minimum very close to these values:
H = [0.2,0.15,0.15]
a = [1.0053,1.0053,1.2566]
b = [1.0681,1.1310,1.3195]
where the value for the function is 123,45. I've checked the function several times and it seems to be working properly. Can anyone help me to find where my problem is? I've tried to change xtol and maxiter but with no success.
Here are a few hints:
Your initial point x0 is not feasible since it doesn't satisfy the constraint sum(H) = 0.5. Providing a feasible initial point should fix your first problem.
Except for the constraint sum(H) = 0.5, all constraints are simple bounds on the variables. In general, it's recommended to pass variable bounds via the bounds parameter of minimize. You can simply define and pass all the bounds like this
from scipy.optimize import minimize
import numpy as np
# ..your variables and functions ..
bounds = [(0, None)]*N + [(0, np.pi/2)]*2*N
x0 = np.zeros(3*N)
x0[0] = 0.5
res = minimize(FUN, x0, constraints=(con2H,), bounds=bounds,
method="trust-constr", options={'maxiter': 20000})
where each tuple contains the lower and upper bound for each variable.
Unfortunately, 'trust-constr' has still trouble to converge to a local minimizer. In this case, you can either try other initial points or you can use the state-of-the-art open source solver Ipopt instead. The Cython wrapper cyipopt provides a interface similar to scipy:
from cyipopt import minimize_ipopt
# rest as above
res = minimize_ipopt(FUN, x0, constraints=(con2H,), bounds=bounds)
this gives me a solution with objective value 122.9.
Last but not least, it's always a good idea to provide exact gradients, jacobians and hessians.
I am trying to solve a couple minimization problems using Python but the setup with constraints is difficult for me to understand. I have:
minimize: x+y+2z^2
subject to: x = 1 and x^2+y^2 = 1
This is very easy obviously and I know the solution is x=1,y=0,z=0. I tried to use scipy.optimize.L-BFGS-B but had issues.
I also have:
minimize: 2x1^2+x2^2
subject to: x1+x2=1
I need to use a gradient based optimizer so I chose scipy.optimizer.COBYLA but had issues using an equality constraint as it only takes inequality constraints. The code for this is:
def objective(x):
x1 = x[0]
x2 = x[1]
return 2*(x1**2)+ x2
def constraint1(x):
return x[0]+x[1]-1
#Try an initial condition of x1=1 and x2=0
#Our initial condition satisfies the constraint already
x0 = [0.3,0.7]
print(objective(x0))
xnew = [0.25,0.75]
print(objective(xnew))
#Since we have already calculated on paper we know that x1 and x2 fall between 0 and 1
#We can set our bounds for both variables as being between 0 and 1
b = (0,1)
bounds = (b,b)
#Lets make note of the type of constraint we have for out optimizer
con1 = {'type': 'eq', 'fun':constraint1}
cons = [con1]
sol_gradient = minimize(objective,x0,method='COBYLA',bounds=bounds, constraints=cons)
Then I get error about using equality constraints with this optimizer.
A few things:
Your objective function does not match with the description you have provided. Should it be this: 2*(x1**2) + x2**2?
From the docs scipy.optimize.minimize you can see that COBYLA does not support eq as a constraint. From the page:
Note that COBYLA only supports inequality constraints.
Since you said you want to use a Gradient based optimizer, one option could be to use the Sequential Least Squares Programming (SLSQP) optimizer.
Below is the code replacing 'COBYLA' with 'SLSQP' and changing the objective function according to 1:
def objective(x):
x1 = x[0]
x2 = x[1]
return 2*(x1**2)+ x2**2
def constraint1(x):
return x[0]+x[1]-1
#Try an initial condition of x1=1 and x2=0
#Our initial condition satisfies the constraint already
x0 = [0.3,0.7]
print(objective(x0))
xnew = [0.25,0.75]
print(objective(xnew))
#Since we have already calculated on paper we know that x1 and x2 fall between 0 and 1
#We can set our bounds for both variables as being between 0 and 1
b = (0,1)
bounds = (b,b)
#Lets make note of the type of constraint we have for out optimizer
con1 = {'type': 'eq', 'fun':constraint1}
cons = [con1]
sol_gradient = minimize(objective,x0,method='SLSQP',bounds=bounds, constraints=cons)
print(sol_gradient)
Which gives the final answer as:
fun: 0.6666666666666665
jac: array([1.33333336, 1.33333335])
message: 'Optimization terminated successfully'
nfev: 7
nit: 2
njev: 2
status: 0
success: True
x: array([0.33333333, 0.66666667])
I have a logarithmic function being fit to a set of points
def log_func(x, a, b):
return a * np.log(x) + b
popt, pcov = curve_fit(log_func, x, yn)
This results in a plot as follows - Plotted Curve
However, the system has constraints that the range should be fixed between 0 and 100. I've specifically passed points at those bounds (i.e. x = np.array([3200 ... other points ... 42000 ]) and y = np.array([0 ... other points ... 100 ] ) but obviously the curve does not necessarily fix those values.
I've read that I can add bounds to the parameters (so a and b here), but is there a way to constrain the output by specifically forcing the curve through two endpoints. Or alternatively, do I have to introduce some sort of extreme penalization to the function to result in parameters that will be force a result between 0 and 100?
You could formulate the curve fitting problem as a constrained optimization problem and use scipy.optimize.minimize to solve it. Considering a data set for which the optimal a should be positive, it follows that the requirement on the range of the fitted function is equivalent to the constraints a*np.log(3200)+b>=0 and a*np.log(42000)+b<=100.
One can proceed as follows (I used a simple data set).
from scipy.optimize import minimize
x = np.array([3200, 14500, 42000])
yn = np.array([0, 78, 100])
def LS_obj(p):
a, b = p
return ((log_func(x, a, b) - yn)**2).sum()
cons = ({'type': 'ineq', 'fun': lambda p: p[0] * np.log(3200) + p[1]},
{'type': 'ineq', 'fun': lambda p: 100 -p[0] * np.log(42000) - p[1]})
p0 = [10,-100] #initial estimate
sol = minimize(LS_obj,p0 ,constraints=cons)
print(sol.x) #optimal parameters
[ 36.1955 -285.316 ]
The following figure compares the curve_fit and minimize solutions. As expected, the minimize solution is within the required range.