scipy minimize COBYLA callback not working - python

I am using scipy.optimize's minimize function, and I would like to terminate the search as soon as the function value drops below some threshold. I have tried using a callback that returns True when the above-mentioned condition is met, but in my code the search just continues.
I also have another, more fundamental issue with the callback structure the documentation requires: my function is pretty expensive to evaluate, and with the callback I evaluate it twice for the same set of parameters (once in the callback and once in the actual iteration), so being spared that extra computational cost would also be nice.
Below is my code:
class MinimizeStopper(object):
    def __init__(self, maximal_non_overlap=0.05):
        self.max_non_overlap = maximal_non_overlap

    def __call__(self, xk):
        res = fit_min(xk)
        return (res <= self.max_non_overlap)

my_cb = MinimizeStopper(0.1)
print(scipy.optimize.minimize(fit_min, ansatz_params[1], callback=my_cb.__call__, method='COBYLA'))

I guess scipy.optimize.minimize's documentation is not 100% clear in regard to the callbacks. AFAIK, only the 'trust-constr' method gets terminated by the callback once the callback returns True. For all the remaining methods the callback can only be used for logging purposes.
Regarding the second part of your question, I cite the docs:
For ‘trust-constr’ it is a callable with the signature:
callback(xk, OptimizeResult state) -> bool
where xk is the current parameter vector and state is an OptimizeResult object, with the same fields as the ones from the return.
Thus, assuming you're open to 'trust-constr', you don't need to evaluate your objective function again, since you can directly access the objective value at the current iteration via state.fun:
from scipy.optimize import minimize

def cb(xk, state, threshold_value=0.1):
    return state.fun <= threshold_value

res = minimize(your_fun, x0=x0, callback=cb, method="trust-constr")
Since your objective function is expensive to evaluate, it's highly recommended to pass the gradient and Hessian as well, provided your objective is twice continuously differentiable and both are known. Otherwise, both get approximated by finite differences, which will be quite slow.
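For illustration, here is a minimal, self-contained sketch that combines the early-stopping callback with an analytic gradient and Hessian; the toy quadratic objective below is only a stand-in for your expensive function:

import numpy as np
from scipy.optimize import minimize

def fun(x):
    return x[0]**2 + 10.0 * x[1]**2

def grad(x):
    return np.array([2.0 * x[0], 20.0 * x[1]])

def hess(x):
    return np.array([[2.0, 0.0], [0.0, 20.0]])

def cb(xk, state, threshold_value=0.1):
    # returning True terminates 'trust-constr' once the objective drops below the threshold
    return state.fun <= threshold_value

res = minimize(fun, x0=np.array([1.0, 1.0]), jac=grad, hess=hess,
               callback=cb, method="trust-constr")
print(res.fun, res.x)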

Related

How to make a function picklable so that it can be minimized through the parallel version of scipy.optimize.differential_evolution

I need to minimize a function using the scipy implementation of differential evolution.
I'd like to exploit parallelism to speed up the computation and I tried setting workers=-1.
I get an error, and after searching I found that the problem is that the function I'm trying to minimize is not picklable.
I need help understanding how to make it picklable.
The function to minimize works in the following way:
A class object has an attribute vector, the observed data.
One method of the class takes some parameters and computes an estimate of the vector.
The function to minimize computes the mean squared error between the vector and the computed estimate.
The pseudocode of the function could be something like this:
def function_to_minimize(self, parameters):
    true_vector = self.true_vector
    estimated_vector = self.estimate_vector(parameters)
    return mse(true_vector, estimated_vector)
Something like this should work:
import numpy as np

class Objective(object):
    def __init__(self, data):
        self.measured_data = data

    def __call__(self, parameters):
        # needs to return a scalar value
        estimated_vector = self.estimate_vector(parameters)
        return np.sum(np.power(self.measured_data - estimated_vector, 2))

    def estimate_vector(self, parameters):
        # calculate what you expect to happen with the parameters
        pass
You should pass Objective(data) to differential_evolution as the function to minimize. On macOS and Windows, which use spawn as the default way of creating new processes, this class should be defined in a file that is importable.
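A minimal usage sketch, assuming estimate_vector is filled in; the data file, bounds, and two-parameter model below are placeholders for your own:

import numpy as np
from scipy.optimize import differential_evolution

if __name__ == "__main__":
    data = np.loadtxt("measured_data.txt")   # placeholder for your observed vector
    bounds = [(0.0, 1.0), (0.0, 10.0)]       # one (low, high) pair per parameter
    result = differential_evolution(Objective(data), bounds, workers=-1)
    print(result.x, result.fun)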

Why are some variables attached to self in the "Specifying sparse partial derivatives for a simple vectorized component" video?

In this video, Justin has written the following in the compute function of his component.
def compute(self, inputs, outputs):
    self.alp_sc = 0.91
    T0 = 28.      # reference temperature
    eff0 = .285   # efficiency at ref temp
    T1 = -150.
    eff1 = 0.335
    delta_T = inputs['T'] - T0
    self.slope = (eff1 - eff0) / (T1 - T0)
    outputs['eta'] = (eff0 + self.slope * delta_T) / self.alp_sc
Why does he choose to use self.slope and self.alp_sc instead of plain local variables? Is it something important for OpenMDAO/vectorised components, or just an arbitrary choice?
The attributes (e.g. self.alp_sc) are constants that were never intended to change for that sample component. They do not need to be input variables, because they will never be connected to.
It is considered a best practice to define partial derivatives for all outputs with respect to all inputs in any component. Therefore, by making these values attributes, we can respect the best practice and avoid declaring partials for things we know will never change.
If you continue to watch the video, self.slope and self.alp_sc are also used in the compute_partials method. self.alp_sc is constant, so it could have been declared in the initialize method of the class (if it is a parameter that you might want to change for different instances) or outside of the class (if it is really a constant). The same applies to self.slope, which in this case is also constant. But imagine the slope depended on the inputs of your component and had to be recalculated in each iteration (and, say, was also computationally expensive, which in this example it clearly is not). In that case you could save some computation by storing the value in a class attribute (self.slope) and simply reusing it in the derivative computation.
One thing that must be ensured is that in each iteration compute is called before compute_partials (otherwise you could end up using an obsolete value from the previous iteration in the derivative calculation), but I think that is always the case in current OpenMDAO 3.0.
It is quite common that you need to calculate the same quantities for the function and for the derivatives. Storing them in an attribute is one way to do it (less computation); calling the same function within compute and compute_partials (twice the computation, but less memory) is another.
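For concreteness, here is a hedged sketch of how such a component might look, reusing the stored attributes in compute_partials; the class name and scalar shapes are my own, and only the numbers come from the snippet above:

import openmdao.api as om

class SolarCellEfficiency(om.ExplicitComponent):
    def setup(self):
        self.add_input('T', val=28.0)
        self.add_output('eta', val=0.285)
        self.declare_partials('eta', 'T')

    def compute(self, inputs, outputs):
        self.alp_sc = 0.91
        T0, eff0 = 28.0, 0.285     # reference temperature and efficiency
        T1, eff1 = -150.0, 0.335
        self.slope = (eff1 - eff0) / (T1 - T0)
        outputs['eta'] = (eff0 + self.slope * (inputs['T'] - T0)) / self.alp_sc

    def compute_partials(self, inputs, partials):
        # reuse the values stored during compute() instead of recomputing them
        partials['eta', 'T'] = self.slope / self.alp_sc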
From the snippet of code it looks like def compute(self, inputs, outputs) is a method inside a class; this can be seen from self being contained in its parameter list.
self.slope and self.alp_sc will be variables that are declared within the scope of the object that the compute() method is part of.
See classes in Python for more on this.

scipy.optimize.least_squares - limit number of jacobian evaluations

I am trying to use scipy.optimize.least_squares(fun=my_fun, jac=my_jac, max_nfev=1000) with two callable functions: my_fun and my_jac.
Both functions, my_fun and my_jac, use external software to evaluate their values. This task is very time consuming, so I would prefer to control the number of evaluations of both.
The trf method uses the my_fun function to evaluate whether the trust region is adequate, and the my_jac function to determine both the cost function and the Jacobian matrix.
There is an input parameter max_nfev. Does this parameter count only the fun evaluations, or does it also count the jac evaluations?
Moreover, in MATLAB the lsqnonlin function has two parameters, MaxIterations and MaxFunctionEvaluations. Do equivalents exist in scipy.optimize.least_squares?
Thanks
Alon
According to the help of scipy.optimize.least_squares, max_nfev is the maximum number of function evaluations before the routine exits:
max_nfev : None or int, optional
Maximum number of function evaluations before the termination.
If None (default), the value is chosen automatically:
Again, according to the help, there is no MaxIterations argument, but you can define the tolerance in f (ftol), that is, the function you want to minimize, or in x (xtol), the solution, before exiting the code.
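For example, a hedged sketch of how those limits might be set; my_fun, my_jac, and x0 are the names from the question, with x0 a placeholder starting point:

from scipy.optimize import least_squares

res = least_squares(my_fun, x0, jac=my_jac,
                    max_nfev=1000,  # cap on residual-function evaluations
                    ftol=1e-8,      # stop when the change in the cost is small
                    xtol=1e-8)      # stop when the step is small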
You can also use scipy.optimize.minimize(). There you can define a maxiter argument in the options dictionary.
If you do so, beware that the function you want to minimize must be your cost function, meaning that you will have to code your least-squares cost yourself.
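A sketch of what that might look like, assuming my_fun returns the residual vector (my_fun and x0 come from the question):

import numpy as np
from scipy.optimize import minimize

def cost(x):
    r = my_fun(x)        # residual vector from the external software
    return np.dot(r, r)  # sum of squared residuals

res = minimize(cost, x0, method="L-BFGS-B", options={"maxiter": 50})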
I hope this is clear and useful to you.

Can I pass the objective and derivative functions to scipy.optimize.minimize as one function?

I'm trying to use scipy.optimize.minimize to minimize a complicated function. I noticed in hindsight that the minimize function takes the objective and derivative functions as separate arguments. Unfortunately, I've already defined a function which returns the objective function value and first-derivative values together -- because the two are computed simultaneously in a for loop. I don't think there is a good way to separate my function into two without the program essentially running the same for loop twice.
Is there a way to pass this combined function to minimize?
(FYI, I'm writing an artificial neural network backpropagation algorithm, so the for loop is used to loop over training data. The objective and derivatives are accumulated concurrently.)
Yes, you can pass them in a single function:
import numpy as np
from scipy.optimize import minimize

def f(x):
    return np.sin(x) + x**2, np.cos(x) + 2*x

sol = minimize(f, [0], jac=True, method='L-BFGS-B')
Something else that might work: you can memoize the function, meaning that if it gets called with the same inputs a second time, it simply returns the same outputs corresponding to those inputs without doing any actual work the second time. What happens behind the scenes is that the results get cached. In the context of a nonlinear program, there could be thousands of calls, which implies a large cache. Often with memoizers you can specify a cache limit, and the population will be managed by an eviction policy (least-recently-used in the case of functools.lru_cache). In other words, you still benefit fully in your particular case, because the inputs will be the same only when you need to return the function value and derivative around the same point in time. So a small cache should suffice.
You don't say whether you are using Python 2 or 3. In Python 3.2+, you can use functools.lru_cache as a decorator to provide this memoization. Then you write your code like this:
import functools

@functools.lru_cache(maxsize=None)
def original_fn(x):
    # ... the expensive loop that computes both the value and the derivative ...
    # note: lru_cache needs hashable arguments, so x must be e.g. a tuple, not a NumPy array
    return fnvalue, fnderiv

def new_fn_value(x):
    fnvalue, fnderiv = original_fn(x)
    return fnvalue

def new_fn_deriv(x):
    fnvalue, fnderiv = original_fn(x)
    return fnderiv
Then you pass each of the new functions to minimize. You still pay a penalty because of the second call, but it will do no work if x is unchanged. You will need to research what "unchanged" means in the context of floating-point numbers, particularly since the change in x will shrink as the minimization begins to converge.
There are lots of recipes for memoization in Python 2.x if you look around a bit.
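Since lru_cache needs hashable arguments and minimize passes NumPy arrays, here is a hedged sketch of a tiny single-entry memoizer keyed on the array's bytes; the sine-plus-quadratic objective is only a placeholder:

import numpy as np

def memoize_last(fn):
    """Cache only the most recent (x, result) pair."""
    cache = {"key": None, "value": None}
    def wrapped(x):
        key = np.asarray(x, dtype=float).tobytes()
        if key != cache["key"]:
            cache["key"] = key
            cache["value"] = fn(x)
        return cache["value"]
    return wrapped

@memoize_last
def original_fn(x):
    x = np.asarray(x, dtype=float)
    value = np.sum(np.sin(x) + x**2)   # placeholder objective
    deriv = np.cos(x) + 2 * x          # placeholder gradient
    return value, deriv

def new_fn_value(x):
    return original_fn(x)[0]

def new_fn_deriv(x):
    return original_fn(x)[1]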
Did I make any sense at all?

Multithreaded calls to the objective function of scipy.optimize.leastsq

I'm using scipy.optimize.leastsq in conjunction with a simulator. leastsq calls a user-defined objective function and passes an input vector to it. In turn, the objective function returns an error vector. leastsq optimizes the input vector in such a way that the sum of the squares of the error vector is minimized.
In my case the objective function will run a whole simulation each time it is called. The employed simulator is single-threaded and needs several minutes for each run. I'd therefore like to run multiple instances of the simulator at once. However, calls to the objective function are performed serially.
How can I get leastsq to perform multiple calls to the objective function at once?
There's a good opportunity to speed up leastsq by supplying your own function to calculate the derivatives (the Dfun parameter), provided you have several parameters. If this function is not supplied, leastsq iterates over each of the parameters to calculate the derivative each time, which is time consuming. This appears to take the majority of the time in the fitting.
You can use your own Dfun function which calculates the derivatives for each parameter using a multiprocessing.Pool to do the work. These derivatives can be calculated independently and should be trivially parallelised.
Here is a rough example, showing how to do this:
import numpy as np
import multiprocessing
import scipy.optimize

def calcmod(params):
    """Return the model (func is the user's model function)."""
    return func(params)

def delta(params):
    """Difference between model and data (y is the measured data)."""
    return calcmod(params) - y

pool = multiprocessing.Pool(4)

def Dfun(params):
    """Calculate the derivative for each parameter using the pool."""
    zeropred = calcmod(params)
    derivparams = []
    step = 1e-4
    for i in range(len(params)):
        copy = np.array(params)
        copy[i] += step
        derivparams.append(copy)
    results = pool.map(calcmod, derivparams)
    derivs = [(r - zeropred) / step for r in results]
    return derivs

retn = scipy.optimize.leastsq(delta, inputparams, gtol=0.01,
                              Dfun=Dfun, col_deriv=1)
The algorithm used by leastsq, Levenberg-Marquardt, needs to know the value of the objective function at the current point before determining the next point. In short, there is no straightforward way to parallelize such a serial algorithm.
You can, however, parallelize your objective function in some cases. This can be done, if it's of the form:
def objective_f(params):
    r = np.zeros([200], float)
    for j in range(200):
        r[j] = run_simulation(j, params)
    return r

def run_simulation(j, params):
    r1 = ...  # compute the j-th entry of the result
    return r1
Here, you can clearly parallelize across the loop over j, for instance using the multiprocessing module. Something like this (untested):
def objective_f(params):
    r = np.zeros([200], float)

    def parameters():
        for j in range(200):
            yield j, params

    pool = multiprocessing.Pool()
    # starmap unpacks each (j, params) tuple into run_simulation's two arguments
    r[:] = pool.starmap(run_simulation, parameters())
    return r
Another opportunity for parallelization occurs if you have to fit multiple data sets --- this is an (embarrassingly) parallel problem, and the different data sets can be fitted in parallel.
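As an illustration of that last point, a hedged sketch of fitting several independent data sets in parallel; the straight-line model and synthetic data are placeholders:

import multiprocessing
import numpy as np
from scipy.optimize import leastsq

def fit_one(dataset):
    x, y = dataset
    def residuals(p):
        return y - (p[0] * x + p[1])    # placeholder model: a straight line
    params, ier = leastsq(residuals, np.array([1.0, 0.0]))
    return params

if __name__ == "__main__":
    datasets = [(np.arange(10.0), 2.0 * np.arange(10.0) + 1.0) for _ in range(8)]
    with multiprocessing.Pool() as pool:
        fits = pool.map(fit_one, datasets)
    print(fits)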
If this does not help, you can look into discussion on parallelization of the LM algorithm in the literature. For instance: http://dl.acm.org/citation.cfm?id=1542338 The main optimization suggested in this paper seems to be parallelization of the numerical computation of the Jacobian. You can do this by supplying your own parallelized Jacobian function to leastsq. The remaining suggestion of the paper, speculatively parallelizing Levenberg-Marquardt search steps, is however more difficult to implement and requires changes in the LM algorithm.
I'm not aware of Python (or other language) libraries implementing optimization algorithms targeted for parallel computation, although there may be some. If you manage to implement/find one of them, please advertise this on the Scipy users mailing list --- there is certainly interest in one of these!
Does this help?
http://docs.python.org/library/multiprocessing.html
I've always found Pool to be the simplest way to multiprocess with Python.
NumPy/SciPy's functions are usually optimized for multithreading. Did you look at your CPU utilization to confirm that only one core is being used while the simulation is being run? Otherwise you have nothing to gain from running multiple instances.
If it is, in fact, single threaded, then your best option is to employ the multiprocessing module. It runs several instances of the Python interpreter so you can make several simultaneous calls to SciPy.
Have you tried scipy.optimize.least_squares? It is a much better option, and when I use it to optimize a function it uses all the available threads, which is exactly what you asked for.
