I don't have much experience with Genetic Algorithms, so I would like to ask the community for some useful comments. I apologize in advance for any terminology errors; please correct me where needed.
The problem I want to optimize is optimal power flow in an islanded microgrid. This simple microgrid has 2 diesel generators (DG), 1 PV array, 1 Energy Storage System (ESS) and a load. Let's assume we know the Load and PV array output power values for the next periods.
So, the objective function to be minimized is the OPEX, i.e. the sum of every microgrid component's operational cost at each moment t in the period T, roughly of the form

$$\min \; OPEX = \sum_{t=1}^{T}\sum_{i}\big(a_i P_{i,t} + b_i\big)\,u_{i,t}$$

where a, b are operational cost coefficients, u is the diesel generator binary (0/1 or ON/OFF) status variable and P is the output power of the microgrid component at time t.
And here are some of the constraints (the real problem is heavily and nonlinearly constrained, so I wrote down only three of them):
Power balance: $u_{1,t}P_{DG1,t} + u_{2,t}P_{DG2,t} + P_{PV,t} + P_{ESS,t} = P_{Load,t}$
ESS maximum depth of discharge: $SOC_{min} \le SOC_t \le SOC_{max}$
Diesel genset power limits: $P_i^{min} \le P_{i,t} \le P_i^{rated}$ for $i \in \{DG1, DG2\}$
So, it's a mixed-integer problem with nonlinear constraints. I tried to adapt the problem for solving with a Genetic Algorithm, using the pymoo Python library for multi-objective optimization with the NSGA2 algorithm. Let's consider T = 7, and for this T we have some Load and PV power data:
import numpy as np

from pymoo.model.problem import FunctionalProblem
from pymoo.algorithms.nsga2 import NSGA2
from pymoo.factory import get_sampling, get_crossover, get_mutation, get_termination
from pymoo.operators.mixed_variable_operator import MixedVariableSampling, MixedVariableMutation, MixedVariableCrossover
from pymoo.optimize import minimize
PV = np.array([10, 19.8, 16, 25, 7.8, 42.8, 10]) #PV inverter output power, kW
Load = np.array([100, 108, 150, 150, 90, 16, 170]) #Load, kW
balance_eps = 0.001 #equality constraint relaxing coefficient
DG1_pmin = 0.3 #DG1 min power
DG2_pmin = 0.3 #DG2 min power
P_dg1 = 75 #DG1 rated power, kW
P_dg2 = 75 #DG2 rated power, kW
P_PV_inv = 50 #PV inverter rated power, kW
P_ESS_inv = 30 #ESS bidirectional inverter absolute rated discharge/charge power, kW
ESS_c = 100 #ESS capacity, kWh
SOC_min = 30
SOC_max = 100
objs = [lambda x: x[0]*x[2]*200 + x[1]*x[3]*200 + x[4]*0.002]  # objective function
constr_eq = [lambda x: ((Load[t] - x[0]*x[2] - x[1]*x[3] - x[4] - PV[t] )**2)]
constr_ieq = [lambda x: -SOC_t + 100*x[4]/ESS_c + SOC_min,
lambda x: SOC_t - 100*x[4]/ESS_c - SOC_max]
n_var = 5  # number of decision variables per time step
problem = FunctionalProblem(n_var, objs, constr_eq=constr_eq, constr_eq_eps=1e-03, constr_ieq=constr_ieq,
                            xl=np.array([0, 0, DG1_pmin*P_dg1, DG2_pmin*P_dg2, -P_ESS_inv]),
                            xu=np.array([1, 1, P_dg1, P_dg2, P_ESS_inv]))
mask = ["int", "int", "real", "real", "real"]
sampling = MixedVariableSampling(mask, {
    "real": get_sampling("real_random"),
    "int": get_sampling("int_random")})

crossover = MixedVariableCrossover(mask, {
    "real": get_crossover("real_sbx", prob=1.0, eta=3.0),
    "int": get_crossover("int_sbx", prob=1.0, eta=3.0)})

mutation = MixedVariableMutation(mask, {
    "real": get_mutation("real_pm", eta=3.0),
    "int": get_mutation("int_pm", eta=3.0)})

algorithm = NSGA2(
    pop_size=150,
    sampling=sampling,
    crossover=crossover,
    mutation=mutation,
    eliminate_duplicates=True)
We have n_var = 5 decision variables being optimized: $x = [u_{DG1}, u_{DG2}, P_{DG1}, P_{DG2}, P_{ESS}]$. We also need access to the previous value of SOC, which is updated as $SOC_t = SOC_{t-1} - 100 \cdot P_{ESS,t} / ESS_c$.
I wrote a loop to implement a consecutive, step-by-step optimization chain:
x = []
s = []
SOC_t = 100  # SOC at t = -1
for t in range(0, 7):
    res = minimize(
        problem,
        algorithm,
        seed=1,
        termination=get_termination("n_gen", 300),
        save_history=True, verbose=False)
    SOC_t = SOC_t - 100*res.X[4]/ESS_c
    print(res.X[:2], np.around(res.X[2:].astype(np.double), 3), np.around(SOC_t, 2))
    x.append(res.X)
    s.append(SOC_t)
So, we have initialized a population of size 150 for every time step t, and the individuals in those populations looked like $[u_{DG1}, u_{DG2}, P_{DG1}, P_{DG2}, P_{ESS}]$. Running this code, I get the following optimization results:
[1 1] [27.272 34.635 28.071] 71.93
[0 1] [28.127 58.168 30. ] 41.93
[1 1] [50.95 71.423 11.599] 30.33
[1 1] [53.966 70.97 0.034] 30.3
[1 1] [24.636 59.236 -1.702] 32.0
[0 0] [40.831 29.184 -26.832] 58.83
[1 1] [68.299 63.148 28.572] 30.26
Even my little experience with Genetic Algorithms allows me to state that such an approach is inappropriate and inefficient.
So, here is my question (if you're still reading my post :)
Is there a way to optimize such a problem not by consecutive optimization of a particular variable set at each t, but by defining the individuals in the population as arrays of size (T, n_var)?
For the problem described, an individual in the population would then look like a $T \times 5$ matrix with one row $[u_{DG1,t}, u_{DG2,t}, P_{DG1,t}, P_{DG2,t}, P_{ESS,t}]$ per time step.
Is it possible to implement such an approach? If yes, how can it be done in pymoo?
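To make it concrete, here is a rough, untested sketch of what I imagine, using pymoo's elementwise Problem API (the class name and the reshaping are just my guesses):

from pymoo.model.problem import Problem

T, n_var_t = 7, 5  # 7 time steps, 5 decision variables per time step

class MicrogridProblem(Problem):
    def __init__(self):
        # bounds of one time step, tiled T times for the flattened (T, 5) individual
        xl = np.tile([0, 0, DG1_pmin*P_dg1, DG2_pmin*P_dg2, -P_ESS_inv], T)
        xu = np.tile([1, 1, P_dg1, P_dg2, P_ESS_inv], T)
        super().__init__(n_var=T*n_var_t, n_obj=1, n_constr=3*T,
                         xl=xl, xu=xu, elementwise_evaluation=True)

    def _evaluate(self, x, out, *args, **kwargs):
        x = x.reshape(T, n_var_t)  # rows: [u_dg1, u_dg2, P_dg1, P_dg2, P_ess]
        # OPEX over the whole horizon
        opex = np.sum(x[:, 0]*x[:, 2]*200 + x[:, 1]*x[:, 3]*200 + x[:, 4]*0.002)
        # relaxed power balance at every t
        balance = (Load - x[:, 0]*x[:, 2] - x[:, 1]*x[:, 3] - x[:, 4] - PV)**2 - balance_eps
        # SOC trajectory over the whole horizon, starting from SOC = 100
        soc = 100 - np.cumsum(100*x[:, 4]/ESS_c)
        out["F"] = [opex]
        out["G"] = np.concatenate([balance, SOC_min - soc, soc - SOC_max])

I suppose the mixed-variable mask would then also have to be repeated for every time step, i.e. mask = ["int", "int", "real", "real", "real"] * T.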
Thank you very much for your time! Any comments and suggestions will be appreciated.
Related
#Error: setting an array element with a sequence
I am trying to minimize the downside risk.
I have a two-dimensional array of returns with shape (1000, 10), and the portfolio starts with $100. Compound that 10 times by each return in a row. Do that for all the rows. Compare the last cell's value for each row with the mean of the last column's values. Keep the value if it's less than the mean, or else zero. So we will have an array of shape (1000, 1). At the end I am finding the standard deviation of that.
Objective is to minimize the standard deviation.
Constraints: weights need to be less than 1
the expected return, i.e. wt*ret, should be equal to a value like 7%. I have to do that for a couple of values like 7%, 8%, or 10%.
import numpy as np
from gekko import GEKKO

wt = np.array([0.4, 0.3, 0.3])
cov = np.array([[0.00026566, 0.00016167, 0.00011949],
                [0.00016167, 0.00065866, 0.00021662],
                [0.00011949, 0.00021662, 0.00043748]])
ret =[.098, 0.0620,.0720]
iterations = 10000
return_sim = np.random.multivariate_normal(ret, cov, iterations)
def simulations(wt):
    downside = []
    fund_ret = np.zeros((1000, 10))
    prt_ret = np.dot(return_sim, wt)
    re_ret = np.array(prt_ret).reshape(1000, 10)  # 10 years
    for m in range(len(re_ret)):
        fund_ret[m][0] = 100 * (1 + re_ret[m][0])  # start with $100
        for n in range(9):
            fund_ret[m][n+1] = fund_ret[m][n] * (1 + re_ret[m][n+1])
    mean = np.mean(fund_ret[:, -1])  # just need the last column and all rows
    for i in range(1000):
        downside.append(np.maximum((mean - fund_ret[i, -1]), 0))
    return np.std(downside)
b = GEKKO()
w = b.Array(b.Var,3,value=0.33,lb=1e-5, ub=1)
b.Equation(b.sum(w)<=1)
b.Equation(np.dot(w,ret) == .07)
b.Minimize(simulations(w))
b.solve(disp=False)
#simulations(wt)
If you comment out the GEKKO section and call the simulation function at the bottom, it works fine.
In this case, you would want to consider a different optimizer such as scipy.optimize.minimize. The function np.std() is not currently supported in Gekko. Gekko compiles the model into byte-code for automatic differentiation, so you need to fit the problem into a form that is supported. Gekko's approach has several advantages, especially for large-scale or non-linear problems. For small problems with fewer than 100 variables and nearly linear constraints, an optimizer such as scipy.optimize.minimize is often a viable option. Here is your problem with a solution:
import numpy as np
from scipy.optimize import minimize
wt = np.array([0.4, 0.3, 0.3])
cov = np.array([[0.00026566, 0.00016167, 0.00011949],
[0.00016167, 0.00065866, 0.00021662],
[0.00011949, 0.00021662, 0.00043748]])
ret =[.098, 0.0620,.0720]
iterations = 10000
return_sim = np.random.multivariate_normal(ret, cov, iterations)
def simulations(wt):
    downside = []
    fund_ret = np.zeros((1000, 10))
    prt_ret = np.dot(return_sim, wt)
    re_ret = np.array(prt_ret).reshape(1000, 10)  # 10 years
    for m in range(len(re_ret)):
        fund_ret[m][0] = 100 * (1 + re_ret[m][0])  # start with $100
        for n in range(9):
            fund_ret[m][n+1] = fund_ret[m][n] * (1 + re_ret[m][n+1])
    # just need the last column and all rows
    mean = np.mean(fund_ret[:, -1])
    for i in range(1000):
        downside.append(np.maximum((mean - fund_ret[i, -1]), 0))
    return np.std(downside)
b = (1e-5,1); bnds=(b,b,b)
cons = ({'type': 'ineq', 'fun': lambda x: sum(x)-1},\
{'type': 'eq', 'fun': lambda x: np.dot(x,ret)-.07})
sol = minimize(simulations,wt,bounds=bnds,constraints=cons)
w = sol.x
print(w)
This produces the solution sol with optimal values w=sol.x:
fun: 6.139162309118155
jac: array([ 8.02691203, 10.04863131, 9.49171901])
message: 'Optimization terminated successfully.'
nfev: 33
nit: 6
njev: 6
status: 0
success: True
x: array([0.09741111, 0.45326888, 0.44932001])
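As a quick sanity check, you can evaluate the two constraint functions at the returned weights; both should be (approximately) satisfied:

print(sum(w) - 1)             # inequality constraint from cons: should be >= 0 (it is ~0 here)
print(np.dot(w, ret) - 0.07)  # equality constraint from cons: should be ~0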
I'm using an example of linear regression from Bayesian Methods for Hackers but having trouble expanding it to my usage.
I have observations on a random variable, an assumed distribution on that random variable, and finally another assumed distribution on that random variable for which I have observations. I have tried to model it with intermediate distributions on a and b, but it complains: "Wrong number of dimensions: expected 0, got 1 with shape (788,)".
To describe the actual model, I am predicting the conversion rate for a certain amount (n) of cultivating emails. My prior is that the conversion rate (described by a Beta function on alpha and beta) will be updated by having alpha and beta scaled by some factors (0,inf] a and b, which start at 1 for n=0 and increase to their max value at some threshold.
import numpy as np
import pymc3 as pm

# Generate predictive data, X and target data, Y
data = [
{'n': 0 , 'trials': 120, 'successes': 1},
{'n': 5 , 'trials': 111, 'successes': 2},
{'n': 10, 'trials': 78 , 'successes': 1},
{'n': 15, 'trials': 144, 'successes': 3},
{'n': 20, 'trials': 280, 'successes': 7},
{'n': 25, 'trials': 55 , 'successes': 1}]
X = np.empty(0)
Y = np.empty(0)
for dat in data:
    X = np.insert(X, 0, np.ones(dat['trials']) * dat['n'])
    target = np.zeros(dat['trials'])
    target[:dat['successes']] = 1
    Y = np.insert(Y, 0, target)
with pm.Model() as model:
    alpha = pm.Uniform("alpha_n", 5, 13)
    beta = pm.Uniform("beta_n", 1000, 1400)
    n_sat = pm.Gamma("n_sat", alpha=20, beta=2, testval=10)
    a_gamma = pm.Gamma("a_gamma", alpha=18, beta=15)
    b_gamma = pm.Gamma("b_gamma", alpha=18, beta=27)
    a_slope = pm.Deterministic('a_slope', 1 + (X/n_sat)*(a_gamma-1))
    b_slope = pm.Deterministic('b_slope', 1 + (X/n_sat)*(b_gamma-1))
    a = pm.math.switch(X >= n_sat, a_gamma, a_slope)
    b = pm.math.switch(X >= n_sat, b_gamma, b_slope)
    p = pm.Beta("p", alpha=alpha*a, beta=beta*b)
    observed = pm.Bernoulli("observed", p, observed=Y)
Is there a way to get this to work?
Data
First, note that the total likelihood of repeated Bernoulli trials is exactly a binomial likelihood, so there is no need to expand to individual trials in your data. I'd also suggest using a Pandas DataFrame to manage your data - it helps to keep things tidy:
import pandas as pd
df = pd.DataFrame({
'n': [0, 5, 10, 15, 20, 25],
'trials': [120, 111, 78, 144, 280, 55],
'successes': [1, 2, 1, 3, 7, 1]
})
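To spell out why the per-trial expansion is unnecessary: for a fixed p, the product of independent Bernoulli likelihoods depends on the data only through the number of successes k, which is exactly the binomial kernel,

$$\prod_{i=1}^{n} p^{y_i}(1-p)^{1-y_i} \;=\; p^{k}(1-p)^{n-k}, \qquad k=\sum_{i=1}^{n} y_i .$$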
Solution
This will help simplify the model, but the solution really is to add a shape argument to the p random variable so that PyMC3 knows how to interpret the one-dimensional parameters. The fact is that you do want a different p distribution for each n case you have, so there is nothing conceptually wrong here.
with pm.Model() as model:
    # conversion rate hyperparameters
    alpha = pm.Uniform("alpha_n", 5, 13)
    beta = pm.Uniform("beta_n", 1000, 1400)
    # switchpoint prior
    n_sat = pm.Gamma("n_sat", alpha=20, beta=2, testval=10)
    a_gamma = pm.Gamma("a_gamma", alpha=18, beta=15)
    b_gamma = pm.Gamma("b_gamma", alpha=18, beta=27)
    # NB: I removed pm.Deterministic b/c (a|b)_slope[0] is constant
    #     and this causes issues when using ArviZ
    a_slope = 1 + (df.n.values/n_sat)*(a_gamma-1)
    b_slope = 1 + (df.n.values/n_sat)*(b_gamma-1)
    a = pm.math.switch(df.n.values >= n_sat, a_gamma, a_slope)
    b = pm.math.switch(df.n.values >= n_sat, b_gamma, b_slope)
    # conversion rates
    p = pm.Beta("p", alpha=alpha*a, beta=beta*b, shape=len(df.n))
    # observations
    pm.Binomial("observed", n=df.trials, p=p, observed=df.successes)
    trace = pm.sample(5000, tune=10000)
This samples nicely and yields reasonable intervals on the conversion rates, but the fact that the posteriors for alpha_n and beta_n go right up to your prior boundaries is a bit concerning.
I think the reason for this is that for each condition you only run 55-280 trials; if the conditions were independent (worst case), conjugacy would tell us that your Beta hyperparameters should be in that range. Since you are doing a regression, the best-case scenario for information sharing across the trials would put your hyperparameters in the range of the sum of trials (788), but that's an upper limit. Because you're outside this range, the concern is that you're forcing the model to be more precise in its estimates than the evidence really supports. However, one can justify this if the prior is based on strong independent evidence.
Otherwise, I'd suggest expanding the ranges on those priors that affect the final alpha*a and beta*b numbers (the sums of those should be close to your trial counts in the posterior).
Alternative Model
I'd probably do something along the following lines, which I think has a more transparent parameterization, though it's not completely identical to your model:
with pm.Model() as model_br_sp:
    # regression coefficients
    alpha = pm.Normal("alpha", mu=0, sd=1)
    beta = pm.Normal("beta", mu=0, sd=1)
    # saturation parameters
    saturation_point = pm.Gamma("saturation_point", alpha=20, beta=2)
    max_success_rate = pm.Beta("max_success_rate", 1, 9)
    # probability of conversion
    success_rate = pm.Deterministic("success_rate",
        pm.math.switch(df.n.values > saturation_point,
                       max_success_rate,
                       max_success_rate*pm.math.sigmoid(alpha + beta*df.n)))
    # observations
    pm.Binomial("successes", n=df.trials, p=success_rate, observed=df.successes)
    trace_br_sp = pm.sample(draws=5000, tune=10000)
Here we map the predictor space to probability space through a sigmoid that maxes out at the maximum success rate. The prior on the saturation point is identical to yours, while that on the maximum success rate is weakly informative (Beta[1,9], though I will say it runs on a flat prior nearly as well). This also samples well and gives similar intervals (though the switchpoint seems to dominate more).
We can compare the two models and see that there isn't a significant difference in their explanatory power:
import arviz as az
model_compare = az.compare({'Binomial Regression w/ Switchpoint': trace_br_sp,
'Original Model': trace})
az.plot_compare(model_compare)
Given this simulated data:
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.statespace.structural import UnobservedComponents
np.random.seed(12345)
ar = np.r_[1, 0.9]
ma = np.array([1])
arma_process = ArmaProcess(ar, ma)
X = 100 + arma_process.generate_sample(nsample=100)
y = 1.2 * X + np.random.normal(size=100)
We build a UnobservedComponents model with the first 70 points to run inferences on the last 30 points like so:
model = UnobservedComponents(y[:70], level='llevel', exog=X[:70])
f_model = model.fit()
forecaster = f_model.get_forecast(
steps=30,
exog=X[70:].reshape(-1, 1)
)
conf_int = forecaster.conf_int()
If we observe the mean for the 95% confidence interval, we get the following:
conf_int.mean(axis=0)
array([118.19789195, 122.14101161])
But when trying to get the same values through model simulations, we don't quite get the same results. Here's the script we run for the simulated boundaries:
sim_model = UnobservedComponents(np.zeros(30), level='llevel', exog=X[70:])
res = []
predicted_state = f_model.predicted_state[..., -1]
predicted_state_cov = f_model.predicted_state_cov[..., -1]
for i in range(1000):
    init_state = np.random.multivariate_normal(
        predicted_state,
        predicted_state_cov
    )
    sim = sim_model.simulate(
        f_model.params,
        30,
        initial_state=init_state)
    res.append(sim.mean())
Printing the lower 2.5 and upper 97.5 percentile we get:
np.percentile(res, [2.5, 97.5])
array([119.06735028, 121.26810407])
As we use model simulations to distinguish signal from noise in data, this difference ended up being big enough to lead to contradictory conclusions. If we make for instance:
y[70:] += 1
Then according to the first technique we conclude the new y carries no signal as its mean is lower than 122.14. But the same is not true if we use the second technique: as the upper boundary is 121.2, we conclude that there's signal.
What we are trying to understand now is whether this is expected. Shouldn't the lower and upper 95% confidence interval of both techniques be equal?
I'm trying to implement a numerical gradient calculation in numpy to be used as the callback function for the gradient in cyipopt. My understanding of the numpy gradient function is that it should return the gradient calculated at a point based on a finite difference approximation.
I don't understand how I would be able to implement the gradient of a nonlinear function with this module. The sample problem given appears to be a linear function.
>>> f = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
>>> np.gradient(f)
array([ 1. , 1.5, 2.5, 3.5, 4.5, 5. ])
>>> np.gradient(f, 2)
array([ 0.5 , 0.75, 1.25, 1.75, 2.25, 2.5 ])
My code snippet is as follows:
import numpy as np
# Hock & Schittkowski test problem #40
x = np.mgrid[0.75:0.85:0.01, 0.75:0.8:0.01, 0.75:0.8:0.01, 0.75:0.8:0.01]
# target is evaluation at x = [0.8, 0.8, 0.8, 0.8]
f = -x[0] * x[1] * x[2] * x[3]
g = np.gradient(f)
print(g)
The other downside of this is that I have to evaluate the function at several points (and it returns the gradient at several points).
Is there a better option in numpy/scipy for the gradient to be numerically evaluated at a single point so I can implement this as a callback function?
First of all, some warnings:
numerical-optimization is hard to do right
ipopt is very complex software
combining ipopt with numerical-differentiation sounds like you are asking for trouble, but that depends on your problem of course
ipopt is almost always based on automatic-differentiation tools and not numerical-differentiation!
And some more:
as this is a complex task and the state of python + ipopt is not as nice as in some other languages (julia + JuMP for example), it's a bit of work
And some alternatives:
use pyomo which wraps ipopt and has automatic-differentiation
use casadi which also wraps ipopt and has automatic-differentiation
use autograd to automatically calculate gradients on a subset of numpy-code
then use cyipopt to add those
scipy.optimize.minimize with solvers SLSQP or COBYLA, which can do everything for you (SLSQP can use equality and inequality constraints; COBYLA only inequality constraints, where emulating an equality constraint by x >= y together with x <= y can work); a short sketch of this follows below
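For illustration, a minimal sketch of that scipy-based alternative on the same Hock & Schittkowski #40 test problem; it should land near the ipopt solution shown further down (objective ≈ -0.25):

import numpy as np
from scipy.optimize import minimize

# HS #40: minimize -x1*x2*x3*x4 subject to three equality constraints
obj = lambda x: -np.prod(x)
cons = ({'type': 'eq', 'fun': lambda x: x[0]**3 + x[1]**2 - 1},
        {'type': 'eq', 'fun': lambda x: x[0]**2 * x[3] - x[2]},
        {'type': 'eq', 'fun': lambda x: x[3]**2 - x[1]})
res = minimize(obj, x0=[0.8, 0.8, 0.8, 0.8], method='SLSQP', constraints=cons)
print(res.x, res.fun)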
Approaching your task with your tools
Your complete example-problem is defined in Test Examples for Nonlinear Programming Codes:
Here is some code, based on numerical-differentiation, solving your test-problem, including the official setup (function, gradients, start-point, bounds, ...)
import numpy as np
import scipy.sparse as sps
import ipopt
from scipy.optimize import approx_fprime
class Problem40(object):
    """ # Hock & Schittkowski test problem #40

    Basic structure follows:
    - cyipopt example from https://pythonhosted.org/ipopt/tutorial.html#defining-the-problem
    - which follows ipopt's docs from: https://www.coin-or.org/Ipopt/documentation/node22.html

    Changes:
    - numerical-diff using scipy for function & constraints
    - removal of hessian-calculation
    - we will use limited-memory approximation
    - ipopt docs: https://www.coin-or.org/Ipopt/documentation/node31.html
    - (because i'm too lazy to reason about the math; lagrange and co.)
    """
    def __init__(self):
        self.num_diff_eps = 1e-8  # maybe tuning needed!

    def objective(self, x):
        # callback for objective
        return -np.prod(x)  # -x1 x2 x3 x4

    def constraint_0(self, x):
        return np.array([x[0]**3 + x[1]**2 - 1])

    def constraint_1(self, x):
        return np.array([x[0]**2 * x[3] - x[2]])

    def constraint_2(self, x):
        return np.array([x[3]**2 - x[1]])

    def constraints(self, x):
        # callback for constraints
        return np.concatenate([self.constraint_0(x),
                               self.constraint_1(x),
                               self.constraint_2(x)])

    def gradient(self, x):
        # callback for gradient
        return approx_fprime(x, self.objective, self.num_diff_eps)

    def jacobian(self, x):
        # callback for jacobian
        return np.concatenate([
            approx_fprime(x, self.constraint_0, self.num_diff_eps),
            approx_fprime(x, self.constraint_1, self.num_diff_eps),
            approx_fprime(x, self.constraint_2, self.num_diff_eps)])

    def hessian(self, x, lagrange, obj_factor):
        return False  # we will use quasi-newton approaches to use hessian-info

    # progress callback
    def intermediate(
            self,
            alg_mod,
            iter_count,
            obj_value,
            inf_pr,
            inf_du,
            mu,
            d_norm,
            regularization_size,
            alpha_du,
            alpha_pr,
            ls_trials
            ):
        print("Objective value at iteration #%d is - %g" % (iter_count, obj_value))
# Remaining problem definition; still following official source:
# http://www.ai7.uni-bayreuth.de/test_problem_coll.pdf
# start-point -> infeasible
x0 = [0.8, 0.8, 0.8, 0.8]
# variable-bounds -> empty => np.inf-approach deviates from cyipopt docs!
lb = [-np.inf, -np.inf, -np.inf, -np.inf]
ub = [np.inf, np.inf, np.inf, np.inf]
# constraint bounds -> c == 0 needed -> both bounds = 0
cl = [0, 0, 0]
cu = [0, 0, 0]
nlp = ipopt.problem(
n=len(x0),
m=len(cl),
problem_obj=Problem40(),
lb=lb,
ub=ub,
cl=cl,
cu=cu
)
# IMPORTANT: need to use limited-memory / lbfgs here as we didn't give a valid hessian-callback
nlp.addOption(b'hessian_approximation', b'limited-memory')
x, info = nlp.solve(x0)
print(x)
print(info)
# CORRECT RESULT & SUCCESSFUL STATE
Output:
******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
Ipopt is released as open source code under the Eclipse Public License (EPL).
For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************
This is Ipopt version 3.12.8, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).
Number of nonzeros in equality constraint Jacobian...: 12
Number of nonzeros in inequality constraint Jacobian.: 0
Number of nonzeros in Lagrangian Hessian.............: 0
Total number of variables............................: 4
variables with only lower bounds: 0
variables with lower and upper bounds: 0
variables with only upper bounds: 0
Total number of equality constraints.................: 3
Total number of inequality constraints...............: 0
inequality constraints with only lower bounds: 0
inequality constraints with lower and upper bounds: 0
inequality constraints with only upper bounds: 0
Objective value at iteration #0 is - -0.4096
iter objective inf_pr inf_du lg(mu) ||d|| lg(rg) alpha_du alpha_pr ls
0 -4.0960000e-01 2.88e-01 2.53e-02 0.0 0.00e+00 - 0.00e+00 0.00e+00 0
Objective value at iteration #1 is - -0.255391
1 -2.5539060e-01 1.28e-02 2.98e-01 -11.0 2.51e-01 - 1.00e+00 1.00e+00h 1
Objective value at iteration #2 is - -0.249299
2 -2.4929898e-01 8.29e-05 3.73e-01 -11.0 7.77e-03 - 1.00e+00 1.00e+00h 1
Objective value at iteration #3 is - -0.25077
3 -2.5076955e-01 1.32e-03 3.28e-01 -11.0 2.46e-02 - 1.00e+00 1.00e+00h 1
Objective value at iteration #4 is - -0.250025
4 -2.5002535e-01 4.06e-05 1.93e-02 -11.0 4.65e-03 - 1.00e+00 1.00e+00h 1
Objective value at iteration #5 is - -0.25
5 -2.5000038e-01 6.57e-07 1.70e-04 -11.0 5.46e-04 - 1.00e+00 1.00e+00h 1
Objective value at iteration #6 is - -0.25
6 -2.5000001e-01 2.18e-08 2.20e-06 -11.0 9.69e-05 - 1.00e+00 1.00e+00h 1
Objective value at iteration #7 is - -0.25
7 -2.5000000e-01 3.73e-12 4.42e-10 -11.0 1.27e-06 - 1.00e+00 1.00e+00h 1
Number of Iterations....: 7
(scaled) (unscaled)
Objective...............: -2.5000000000225586e-01 -2.5000000000225586e-01
Dual infeasibility......: 4.4218750883118219e-10 4.4218750883118219e-10
Constraint violation....: 3.7250202922223252e-12 3.7250202922223252e-12
Complementarity.........: 0.0000000000000000e+00 0.0000000000000000e+00
Overall NLP error.......: 4.4218750883118219e-10 4.4218750883118219e-10
Number of objective function evaluations = 8
Number of objective gradient evaluations = 8
Number of equality constraint evaluations = 8
Number of inequality constraint evaluations = 0
Number of equality constraint Jacobian evaluations = 8
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations = 0
Total CPU secs in IPOPT (w/o function evaluations) = 0.016
Total CPU secs in NLP function evaluations = 0.000
EXIT: Optimal Solution Found.
[ 0.79370053 0.70710678 0.52973155 0.84089641]
{'x': array([ 0.79370053, 0.70710678, 0.52973155, 0.84089641]), 'g': array([ 3.72502029e-12, -3.93685085e-13, 5.86974913e-13]), 'obj_val': -0.25000000000225586, 'mult_g': array([ 0.49999999, -0.47193715, 0.35355339]), 'mult_x_L': array([ 0., 0., 0., 0.]), 'mult_x_U': array([ 0., 0., 0., 0.]), 'status': 0, 'status_msg': b'Algorithm terminated successfully at a locally optimal point, satisfying the convergence tolerances (can be specified by options).'}
Remarks about the code
We use scipy's approx_fprime, which was basically added for all the gradient-based optimizers in scipy.optimize
As stated in the sources, I did not take care of ipopt's need for the Hessian, and we used ipopt's Hessian approximation instead
the basic idea is described at wiki: LBFGS
I ignored ipopt's need for the sparsity structure of the Jacobian of the constraints
a default assumption is used: the Hessian structure defaults to a lower triangular matrix, and I won't give any guarantees on what can happen here (bad performance vs. breaking everything)
I think you have some kind of misunderstanding about what is a mathematical function and what is its numerical implementation.
You should define your function as:
def func(x1, x2, x3, x4):
    return -x1*x2*x3*x4
Now you want to evaluate your function at specific points, which you can do using the np.mgrid you provided.
If you want to compute your gradient, use scipy.misc.derivative (https://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.derivative.html); watch out, the default parameter for dx is usually bad, change it to 1e-5. There is no difference between linear and non-linear functions for the numerical evaluation, only that for a non-linear function the gradient won't be the same everywhere.
What you did with np.gradient was actually to compute the gradient from the points in your array, the definition of your function being hidden by your definition of f, thus not allowing for multiple gradient evaluations at different points. Also, using your method makes you dependent on your discretisation step.
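For example, here is a small sketch of evaluating the gradient of func at a single point, one partial derivative per coordinate (gradient_at is just an illustrative helper, not a library function):

import numpy as np
from scipy.misc import derivative

def func(x1, x2, x3, x4):
    return -x1 * x2 * x3 * x4

def gradient_at(point, dx=1e-5):
    # numerical gradient at a single point: one central difference per coordinate
    point = np.asarray(point, dtype=float)
    grad = np.zeros_like(point)
    for i in range(point.size):
        def partial(xi, i=i):
            p = point.copy()
            p[i] = xi
            return func(*p)
        grad[i] = derivative(partial, point[i], dx=dx)
    return grad

print(gradient_at([0.8, 0.8, 0.8, 0.8]))  # approx. [-0.512, -0.512, -0.512, -0.512]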
During optimization, it is often helpful to normalize the input parameters to make them on the same order of magnitude, so the convergence can be much better. For example, if we want to minimize f(x), while a reasonable approximation is x0=[1e3, 1e-4], it might be helpful to normalize x0[0] and x0[1] to about the same order of magnitude (often O(1)).
My question is: I have been using scipy.optimize, specifically the L-BFGS-B algorithm. Do I need to normalize manually by writing a function, or does the algorithm already do that for me?
Thank you!
I wrote a quick small program to test your question.
To summarize: if the parameters are within a couple orders of magnitude of each other, then the algorithm handles it (it successfully converges and does not need to do significantly more function evaluations).
On the other hand, when you start getting beyond a factor of 10000, the algorithm starts to break down and errors out.
Here is the program:
import scipy.optimize

def test_lbfgsb():
    def surface(x):
        return (x[0] - 3.0) ** 2 + (factor * x[1] - 4.0) ** 2

    factor = None
    for exponent in range(0, 9):
        magnitude = 10 ** exponent
        factors = [x * magnitude for x in [1, 3]]
        for factor in factors:
            optimization_result = scipy.optimize.minimize(surface, [0, 0], method='l-bfgs-b')
            desc = 'at factor %d' % (factor)
            if not optimization_result.success:
                print('%s FAILURE (%s)' % (desc, optimization_result.message))
            else:
                print('%s, found min at %s, after %d evaluations' % (
                    desc, optimization_result.x, optimization_result.nfev))

test_lbfgsb()
Here is its output:
at factor 1, found min at [ 3.00000048 4.00000013], after 12 evaluations
at factor 3, found min at [ 2.99999958 1.33333351], after 36 evaluations
at factor 10, found min at [ 3.00000059 0.39999999], after 28 evaluations
at factor 30, found min at [ 2.99999994 0.13333333], after 36 evaluations
at factor 100, found min at [ 3.00000013 0.03999999], after 40 evaluations
at factor 300, found min at [ 3. 0.01333333], after 52 evaluations
at factor 1000, found min at [ 3. 0.00399999], after 64 evaluations
at factor 3000, found min at [ 3.00000006e+00 1.33332833e-03], after 72 evaluations
at factor 10000, found min at [ 3.00002680e+00 3.99998309e-04], after 92 evaluations
at factor 30000, found min at [ 3.00000002e+00 1.33328333e-04], after 104 evaluations
at factor 100000 FAILURE (ABNORMAL_TERMINATION_IN_LNSRCH)
at factor 300000, found min at [ 3.00013621e+00 1.33292531e-05], after 196 evaluations
at factor 1000000, found min at [ 3.00000348e-12 3.99500004e-06], after 60 evaluations
at factor 3000000 FAILURE (ABNORMAL_TERMINATION_IN_LNSRCH)
at factor 10000000 FAILURE (ABNORMAL_TERMINATION_IN_LNSRCH)
at factor 30000000 FAILURE (ABNORMAL_TERMINATION_IN_LNSRCH)
at factor 100000000 FAILURE (ABNORMAL_TERMINATION_IN_LNSRCH)
at factor 300000000, found min at [ 3.33333330e-17 8.33333350e-09], after 72 evaluations
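As a follow-up on the original question of manual normalization: if you do hit that breakdown regime, one simple pattern is to optimize in a rescaled space and map back to the original units inside the objective. A rough sketch (the objective here is just an example, not your actual function):

import numpy as np
import scipy.optimize

# example objective with badly scaled parameters (minimum at [3e3, 4e-4])
def objective(x):
    return (x[0] - 3e3) ** 2 + (1e4 * x[1] - 4.0) ** 2

x0 = np.array([1e3, 1e-4])   # poorly scaled initial guess
scale = np.abs(x0)           # characteristic magnitudes of each parameter

def scaled_objective(z):
    # z is O(1); rescale back to the original units before evaluating
    return objective(z * scale)

res = scipy.optimize.minimize(scaled_objective, x0 / scale, method='L-BFGS-B')
print(res.x * scale)         # solution reported in the original units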