Python: Levenberg-Marquardt Algorithm parallelisation

I have a bit of code that fits a theoretical prediction to experimental data, and I want to run a LMA (Levenberg-Marquardt Algorithm) to fit the theory to the experiment. The calculations are non-trivial: each model takes ~10-30 minutes to calculate on a single processor. However, the problem is embarrassingly parallelisable, and the code is currently set up to submit the different components (of a single iteration) to a cluster computer (that calculation still takes ~1-2 minutes).
This submission script is wrapped in a callable function within Python, so setting it up with the scipy LMA (scipy.optimize.leastsq) is relatively trivial. However, I imagine the scipy LMA will pass each individual calculation (for gauging the gradient) in serial and wait for the return, whereas I'd prefer the LMA to send an entire set of calculations at a time and then await the return. The Python submission script looks a bit like:
    def submission_script(number_iterations, number_parameters, value_parameters):
        fitness_parameter = [0] * number_iterations
        # <fun stuff>
        return fitness_parameter
Where the "value_parameters" is a nested list of dimensions [number_iterations][number_parameters] which contains the variables that are to be calculated for each model, "number_parameters" is the number of parameters that are to be fitted, "number_iterations" is the number of models to be calculated (so each step, to gauge the gradient, the LMA calculates 2*number_parameters models), and "fitness_parameter" is the value that has to be minimised (and has the dimensions [iterations]).
Now, obviously, I could write my own LMA, but that would be reinventing the wheel - I was wondering whether there is anything out there that would satisfy my needs (or whether the scipy LMA can be used in this way).
A Gauss-Newton algorithm should also work, as the starting point should be near the minimum. The ability to constrain the fit (i.e. to set maximum and minimum values for the fitted parameters) would be nice, but isn't necessary.

The scipy.optimize.leastsq function gives you the opportunity to provide a function J for evaluating the Jacobian at a given parameter vector. You could implement a multiprocessing solution for calculating this matrix instead of having scipy.optimize.leastsq approximate it by serially calling your function f.
Unfortunately the LMA implementation in scipy uses separate functions for f and J. You may want to cache information calculated in f in order to reuse it in J if it is called with the same parameter vector. Alternatively you can implement your own LMA version that uses a single fJ call.
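As a minimal sketch of that idea, assuming a forward-difference Jacobian passed via the Dfun argument (the toy exponential residual function and multiprocessing.Pool stand in for the real cluster submission):

    import numpy as np
    from multiprocessing import Pool
    from scipy.optimize import leastsq

    # Toy stand-in for the expensive model; in the question's setting this
    # would wrap the cluster submission script and return the residual vector.
    def residuals(params):
        x = np.linspace(0.0, 1.0, 50)
        data = 2.0 * np.exp(-3.0 * x)  # fake "experimental" data
        return params[0] * np.exp(-params[1] * x) - data

    def parallel_jacobian(params, eps=1e-6):
        """Forward-difference Jacobian; all perturbed models run in parallel."""
        p = np.asarray(params, dtype=float)
        trials = [p] + [p + eps * np.eye(len(p))[j] for j in range(len(p))]
        with Pool() as pool:
            results = pool.map(residuals, trials)
        r0 = results[0]
        # column j approximates d(residuals)/d(params[j])
        return np.column_stack([(rj - r0) / eps for rj in results[1:]])

    if __name__ == "__main__":
        popt, ier = leastsq(residuals, [1.0, 1.0], Dfun=parallel_jacobian)
        print(popt)  # should approach [2.0, 3.0]

With 2 * number_parameters worth of perturbations batched into one pool.map call, each Jacobian evaluation costs roughly one model runtime instead of many in series.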

It turns out this is basically a duplicate question - it has been asked and answered at the link below.
Multithreaded calls to the objective function of scipy.optimize.leastsq

Related

Trouble with implementing gradient descent methods for optimizing structural geometries

A problem I'm currently working on requires me to optimize some dimension parameters for a structure in order to prevent buckling while still not being over-engineered. I've been able to solve it using iterative (semi-brute-force) methods; however, I'm wondering if there is a way to implement a gradient descent method to optimize the parameters. More background is given below:
Let's say we are trying to optimize three length/thickness parameters, (t1, t2, t3).
We initialize these parameters with some random guess (t1, t2, t3)_g. Through some transformation applied to each of these parameters (weights and biases), the aim is to obtain (t1, t2, t3)_ideal such that three main criteria (R1, R2, R3)_ideal are met. The criteria are calculated by using (t1, t2, t3)_i as inputs to some structural equations, where i indexes the inputs at each iteration. Following this, some kind of loss function could be used to calculate the error, (R1, R2, R3)_i - (R1, R2, R3)_ideal.
My confusion lies in the fact that traditionally, (t1, t2, t3)_ideal would be known, the cost would be a function of the error between (t1, t2, t3)_ideal and (t1, t2, t3)_i, and subsequent iterations would follow. However, in a case where (t1, t2, t3)_ideal is unknown and the known targets (R1, R2, R3)_ideal are an indirect function of the inputs, how would gradient descent be implemented? How would minimizing the cost relate to the step change in (t1, t2, t3)_i?
P.S: Sorry about the formatting, I cannot embed latex images until my reputation is higher.
I'm having some difficulty understanding how the constraints you're describing are calculated. I'd imagine the quantity you're trying to minimize is the total material used or the cost of construction, not the "error" you describe?
I don't know the details of your specific problem, but it's probably a safe bet that the cost function isn't convex. Any gradient-based optimization algorithm carries the risk of getting stuck in a local minimum. If the cost function isn't computationally intensive to evaluate then I'd recommend you use an algorithm like differential evolution that starts with a population of initial guesses scattered throughout the parameter space. SciPy has a nice implementation of it that allows for constraints (and includes a final gradient-based "polishing" step).
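For illustration, a minimal sketch of that suggestion, with made-up structural equations standing in for the real ones (responses, R_ideal, and the bounds here are all assumptions):

    import numpy as np
    from scipy.optimize import differential_evolution

    # Hypothetical stand-in for the structural equations: maps thicknesses
    # (t1, t2, t3) to the response criteria (R1, R2, R3).
    def responses(t):
        return np.array([t[0] * t[1], t[1] + t[2] ** 2, t[0] / t[2]])

    R_ideal = np.array([2.0, 5.0, 0.5])  # known target criteria

    def cost(t):
        # sum-of-squares mismatch between achieved and target criteria
        return np.sum((responses(t) - R_ideal) ** 2)

    bounds = [(0.1, 10.0)] * 3  # per-parameter (min, max) constraints
    result = differential_evolution(cost, bounds, seed=0)  # polish=True by default
    print(result.x, result.fun)

The key point is that the cost is defined on the (R1, R2, R3) mismatch, so (t1, t2, t3)_ideal never needs to be known in advance; the optimizer only needs the forward map from inputs to criteria.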

lmfit/scipy.optimize minimization methods description?

Is there any place with a brief description of each of the algorithms available through the method parameter of the minimize function in the lmfit package? Neither there nor in the SciPy documentation is there an explanation of the details of each algorithm. Right now I know I can choose between them, but I don't know which one to choose...
My current problem
I am using lmfit in Python to minimize a function. I want to minimize the function within a finite and predefined range where the function has the following characteristics:
It is almost zero everywhere, to the point of being numerically indistinguishable from zero almost everywhere.
It has a very, very sharp peak at some point.
The peak can be anywhere within the region.
This makes many minimization algorithms fail. Right now I am using a combination of the brute-force method (method="brute") to find a point close to the peak and then feeding this value to the Nelder-Mead algorithm (method="nelder") to perform the final minimization. It works roughly 50% of the time; the other 50% of the time it fails to find the minimum. I wonder if there are better algorithms for cases like this one...
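For reference, the two-stage combination described above looks roughly like this in lmfit (with a made-up narrow-peak objective and illustrative bounds):

    import lmfit

    # Toy objective with a very narrow, deep minimum inside [-1, 1]; the
    # real function would come from the problem at hand.
    def objective(params):
        x = params["x"].value
        return -1.0 / ((x - 0.3217) ** 2 + 1e-6)

    params = lmfit.Parameters()
    params.add("x", value=0.0, min=-1.0, max=1.0, brute_step=0.001)

    # Coarse global grid scan to land near the peak...
    scan = lmfit.minimize(objective, params, method="brute")
    # ...then local refinement starting from the best grid point.
    refined = lmfit.minimize(objective, scan.params, method="nelder")
    print(refined.params["x"].value)

Whether this succeeds hinges on the brute grid being fine enough (brute_step) relative to the peak width; if the peak falls between grid points, the Nelder-Mead stage starts in the flat region and stalls.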
I think it is a fair point that docs for lmfit (such as https://lmfit.github.io/lmfit-py/fitting.html#fit-methods-table) and scipy.optimize (such as https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#optimization-scipy-optimize) do not give detailed mathematical descriptions of the algorithms.
Then again, most of the docs for scipy, numpy, and related libraries describe how to use the methods, but do not describe in much mathematical detail how the algorithms work.
In fairness, the different optimization algorithms share many features and the differences between them can get pretty technical. All of these methods try to minimize some metric (often called "cost" or "residual") by changing the values of parameters for the supplied function.
It sort of takes a textbook (or at least a Wikipedia page) to establish the concepts and mathematical terms used for these methods, and then a paper (or at least a Wikipedia page) to describe how each method differs from the others. So, I think the basic answer would be to look up the different methods.

Why are components executed two times for each Gauss-Seidel iteration? (OpenMDAO 2.4.0)

I've been using the NonLinearBlockGS as nonlinear_solver for my MDO system consisting of ExplicitComponents, and this works as expected. At first I was using it with simple mathematical functions (hence runtime << 1 s), but now I'm also implementing a system of multiple explicit components that have runtimes of around one minute or more. That's when I noticed that the NonLinearBlockGS solver actually needs to run the tools in the coupled system twice per iteration. These runs originate from self._iter_execute() and self._run_apply() in the _run_iterator() method of the solver (class Solver in file solver.py).
My main question is, are two runs per iteration really required, and if so, why?
It seems the first component run (self._iter_execute()) uses an initial guess for the feedback variables that need to be converged, and then runs the components sequentially while updating any feedforward data. This is the step I would expect for Gauss-Seidel. But then the second component run (self._run_apply()) runs the components again with the updated feedback variables that resulted from the first run, while keeping the feedforwards as they were in that first run. If I'm not mistaken, this information is then (only) used to assess the convergence of that iteration (self._iter_get_norm()).
Instead of having this second run inside the iteration, wouldn't it be more efficient to continue directly to the next iteration? In that iteration we can use the new values of the feedback variables, do another self._iter_execute() with updated feedforward data, and then assess convergence based on the difference between the results of those two iterations. Of course this means we need at least two iterations to assess convergence, but it saves one component run per iteration. (This is actually the existing implementation I have for converging these components in MATLAB, and it works as expected: it finds the same converged design, but with half the number of component runs.)
So another way of putting this is: why do we need the self._run_apply() in each iteration when doing Gauss-Seidel convergence? (And could this be turned off?)
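To make the proposed scheme concrete outside OpenMDAO, here is a toy sketch with two made-up coupled functions, judging convergence on the change between successive iterates:

    # Toy coupled system: y1 = f1(y2) and y2 = f2(y1).
    def f1(y2):
        return 0.5 * y2 + 1.0

    def f2(y1):
        return 0.25 * y1 + 2.0

    def gauss_seidel(tol=1e-10, max_iter=100):
        y1, y2 = 0.0, 0.0  # initial guess for the feedback variables
        for k in range(1, max_iter + 1):
            y1_new = f1(y2)      # sequential sweep; feedforward data is
            y2_new = f2(y1_new)  # updated within the same iteration
            # Convergence is judged on the change between successive
            # iterates, so each iteration runs the components exactly once
            # (but at least two iterations are needed to declare convergence).
            if abs(y1_new - y1) < tol and abs(y2_new - y2) < tol:
                return y1_new, y2_new, k
            y1, y2 = y1_new, y2_new
        return y1, y2, max_iter

    print(gauss_seidel())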
There are a couple of different aspects to your question. First, I'll address the details of solve_nonlinear vs apply_nonlinear. In the underlying mathematical algorithms of OpenMDAO, based on the MAUD framework, solve_nonlinear computes only the output values (it does not set residuals). apply_nonlinear computes only the residuals (and does not set outputs).
For sub-classes of ExplicitComponent, the user only implements a compute method, and the base class implements both solve_nonlinear and apply_nonlinear using compute.
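For instance, a minimal sketch (the component and its values here are made up; om is the conventional alias for openmdao.api):

    import openmdao.api as om

    class Doubler(om.ExplicitComponent):
        """Minimal explicit component: the user writes only compute()."""

        def setup(self):
            self.add_input("x", val=1.0)
            self.add_output("y", val=0.0)

        def compute(self, inputs, outputs):
            # solve_nonlinear uses this to set the outputs; apply_nonlinear
            # uses it to form the residual r_y = compute(x) - y.
            outputs["y"] = 2.0 * inputs["x"]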
As you described, the OpenMDAO V2.4 implementation of NonlinearBlockGaussSeidel performs, for each iteration, one recursive solve_nonlinear call on its group and then calls apply_nonlinear to check the residual and test for convergence.
However, you're also correct that we could be doing this more efficiently. The modification you suggested to the algorithm would work, and we'll put it on the development pipeline for V2.6 (as of the time of this post, we're just about to release V2.5 and there won't be time to add this into that release).

How should I scipy.optimize a multivariate and non-differentiable function with boundaries?

I have come across the following optimization problem:
The target function is a multivariate, non-differentiable function that takes a list of scalars as its argument and returns a scalar. It is non-differentiable in the sense that the computation within the function is based on pandas and a series of rolling, std, etc. operations.
The pseudo code is below:
    def target_function(x: list) -> float:
        # calculations
        return output
Besides, each component of the x argument has its own bounds defined as a tuple (min, max). So how should I use the scipy.optimize library to find the global minimum of this function? Could any other libraries help?
I have already tried scipy.optimize.brute, which took forever, and scipy.optimize.minimize, which never produced a seemingly correct answer.
basinhopping, brute, and differential_evolution are the methods available for global optimization. As you've already discovered, brute-force global optimization is not going to be particularly efficient.
Differential evolution is a stochastic method that should do better than brute-force, but may still require a large number of objective function evaluations. If you want to use it, you should play with the parameters and see what will work best for your problem. This tends to work better than other methods if you know that your objective function is not "smooth": there could be discontinuities in the function or its derivatives.
Basin-hopping, on the other hand, makes stochastic jumps but also uses local relaxation after each jump. This is useful if your objective function has many local minima, but due to the local relaxation used, the function should be smooth. If you can't easily get at the gradient of your function, you could still try basin-hopping with one of the local minimizers which doesn't require this information.
The advantage of the scipy.optimize.basinhopping routine is that it is very customizable. You can use take_step to define a custom random jump, accept_test to override the test used for deciding whether to proceed with or discard the results of a random jump and relaxation, and minimizer_kwargs to adjust the local minimization behavior. For example, you might override take_step to stay within your bounds, and then select perhaps the L-BFGS-B minimizer, which can numerically estimate your function's gradient as well as take bounds on the parameters. L-BFGS-B does work better if you give it a gradient, but I've used it without one and it still is able to minimize well. Be sure to read about all of the parameters on the local and global optimization routines and adjust things like tolerances as acceptable to improve performance.
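Putting those pieces together, a hedged sketch along these lines (the objective and bounds are made up; the real objective would wrap the pandas computation) might look like:

    import numpy as np
    from scipy.optimize import basinhopping

    bounds = [(-5.0, 5.0), (-5.0, 5.0)]  # hypothetical per-component bounds

    def target_function(x):
        # stand-in for the pandas-based objective; any scalar function works
        return np.sum(np.abs(x)) + np.sin(5.0 * x[0])

    class BoundedStep:
        """Random jump that is clipped to stay inside the bounds."""
        def __init__(self, bounds, stepsize=0.5):
            self.lo, self.hi = np.array(bounds).T
            self.stepsize = stepsize  # basinhopping may adapt this attribute
        def __call__(self, x):
            jump = np.random.uniform(-self.stepsize, self.stepsize, len(x))
            return np.clip(x + jump, self.lo, self.hi)

    result = basinhopping(
        target_function,
        x0=np.zeros(2),
        take_step=BoundedStep(bounds),
        minimizer_kwargs={"method": "L-BFGS-B", "bounds": bounds},
        niter=200,
    )
    print(result.x, result.fun)

Passing bounds both to the custom take_step and to the L-BFGS-B local minimizer keeps every candidate inside the feasible region.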

Is there any way to do scipy.optimize.minimize (or something functionally equivalent) in parallel?

I have a multivariate optimization problem that I want to run. Each evaluation is quite slow, so obviously the ability to farm it out to multiple machines would be quite nice. I have no trouble writing the code to dispatch jobs to other machines. However, scipy.optimize.minimize calls each evaluation sequentially; it won't give me another set of parameters to evaluate until the previous one returns.
Now, I know that the "easy" solution would be "run your evaluation task in a parallel manner - break it up". Indeed, while that is possible to some extent, it only goes so far; the communication overhead rises the more you split it up, until splitting further actually starts to slow you down. Having another axis in which to parallelize - namely, the minimization function itself - would greatly increase scalability.
Is there no way to do this with scipy.optimize.minimize? Or with any other utility that performs in a roughly functionally equivalent manner (trying to find as low a minimum as possible)? Surely it's possible for a minimization utility to make use of parallelism, particularly on a multivariate optimization problem where there are many axes whose gradients relative to each other at given data points need to be examined.
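One hedged workaround, in the same spirit as supplying a parallel Jacobian to leastsq above, is to pass minimize a custom jac callable that evaluates all finite-difference perturbations concurrently (toy objective and multiprocessing.Pool used here for illustration; a cluster dispatcher could replace the pool):

    import numpy as np
    from multiprocessing import Pool
    from scipy.optimize import minimize

    def slow_objective(x):
        # stand-in for the expensive evaluation dispatched to other machines
        return float(np.sum((x - np.arange(len(x))) ** 2))

    def parallel_gradient(x, eps=1e-6):
        """Forward-difference gradient; all perturbations evaluated at once."""
        x = np.asarray(x, dtype=float)
        trials = [x] + [x + eps * np.eye(len(x))[i] for i in range(len(x))]
        with Pool() as pool:
            f = pool.map(slow_objective, trials)
        return (np.array(f[1:]) - f[0]) / eps

    if __name__ == "__main__":
        res = minimize(slow_objective, np.zeros(4), jac=parallel_gradient,
                       method="L-BFGS-B")
        print(res.x)  # should approach [0, 1, 2, 3]

This parallelizes the gradient axes of each step, not the line searches themselves, but for a multivariate problem that is often where most of the wall-clock time goes.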
