I am trying to find the global minimum of an objective function using basinhopping, but for a majority of the time it is stuck at a local minimum. I read through the document for basinhopping, and found the interval and accept_test might be helpful, but now the question is what values to give them, e.g. I want my objective function to go as close to 0 as possible (1e-5 close) without spending too much time at very large values like 4 or 5. As for interval how does one know how often a stepsize is being updated?
Here is how i'm looking for a "global" minimum:
np.random.seed(555) # Seeded to allow replication.
minimizer_kwargs = {"method": "L-BFGS-B", "bounds": bnds,, tol=1e-4}
ret = basinhopping(merit_function, abcdex, minimizer_kwargs=minimizer_kwargs, niter=10)
zoom = ret['x']
res = minimize(merit_function, zoom, method = 'L-BFGS-B', bounds=bnds, tol=1e-9)
print res
If you're stuck in a local minimum then that likely means you need a bigger stepsize. You can set the stepsize with the keyword "stepsize".
An appropriate stepsize depends on the problem, but luckily basinhopping will adjust the stepsize automatically. How often it does this depends on the "interval" keyword. Every interval iterations the stepsize will be increased or decreased by a factor of 0.9. If the initial guess for the stepsize is way off this can still take some time. If you decrease the interval to 10 (or so) this should be much faster.
I don't think accept_test will help you here. That can be used to, for example, enforce forbidden regions of configuration space.
Related
EDIT: looks like this was already answered before here
It didn't show up in my searches because I didn't know the right nomenclature. I'll leave the question here for now in case someone arrives here because of the constraints.
I'm trying to optimize a function which is flat on almost all points ("steps function", but in a higher dimension).
The objective is to optimize a set of weights, that must sum to one, and are the parameters of a function which I need to minimize.
The problem is that, as the function is flat at most points, gradient techniques fail because they immediately converge on the starting "guess".
My hypothesis is that this could be solved with (a) Annealing or (b) Genetic Algos. Scipy sends me to basinhopping. However, I cannot find any way to use the constraint (the weights must sum to 1) or ranges (weights must be between 0 and 1) using scipy.
Actual question: How can I solve a minimization problem without gradients, and also use constraints and ranges for the input variables?
The following is a toy example (evidently this one could be solved using the gradient):
# import minimize
from scipy.optimize import minimize
# define a toy function to minimize
def my_small_func(g):
x = g[0]
y = g[1]
return x**2 - 2*y + 1
# define the starting guess
start_guess = [.5,.5]
# define the acceptable ranges (for [g1, g2] repectively)
my_ranges = ((0,1),(0,1))
# define the constraint (they must always sum to 1)
def constraint(g):
return g[0] + g[1] - 1
cons = {'type':'eq', 'fun': constraint}
# minimize
minimize(my_small_func, x0=start_guess, method='SLSQP',
bounds=rranges, constraints=cons)
I usually use R so maybe this is a bad answer, but anyway here goes.
You can solve optimization problems like the using a global optimizer. An example of this is Differential Evolution. The linked method does not use gradients. As for constraints, I usually build them manually. That looks something like this:
# some dummy function to minimize
def objective.function(a, b)
if a + b != 1 # if some constraint is not met
# return a very high value, indicating a very bad fit
return(10^90)
else
# do actual stuff of interest
return(fit.value)
Then you simply feed this function to the differential evolution package function and that should do the trick. Methods like differential evolution are made to solve in particular very high dimensional problems. However the constraint you mentioned can be a problem as it will likely result in very many invalid parameter configurations. This is not necessarily a problem for the algorithm, but is simply means you need to do a lot of tweaking and need to expect a lot of waiting time. Depending on your problem, you could try optimizing weights/ parameters in blocks. That means, optimize parameters given a set of weights, then optimize weights given the previous set of parameters and repeat that many times.
Hope this helps :)
Currently I'm using PuLP to solve a maximization problem. It works fine, but I'd like to be able to get the N-best solutions instead of just one. Is there a way to do this in PuLP or any other free/Python solution? I toyed with the idea of just randomly picking some of the variables from the optimal solution and throwing them out and re-running, but this seems like a total hack.
If your problem is fast to solve, you can try to limit the objective from above step by step. For examle, if the objective value of the optimal solution is X, try to re-run the problem with an additional constraint:
problem += objective <= X - eps, ""
where the reduction step eps depends on your knowledge of the problem.
Of course, if you just pick some eps blindly and get a solution, you don't know if the solution is the 2nd best, 10th best or 1000-th best ... But you can do some systematic search (binary, grid) on the eps parameter (if the problem is really fast to solve).
So I figured out how (by RTFM) to get multiple soutions. In my code I essentially have:
number_unique = 1 # The number of variables that should be unique between runs
model += objective
model += constraint1
model += constraint2
model += constraint3
for i in range(1,5):
model.solve()
selected_vars = []
for p in vars:
if p_vars[p].value() != 0:
selected_vars.append(p)
print_results()
# Add a new constraint that the sum of all of the variables should
# not total up to what I'm looking for (effectively making unique solutions)
model += sum([p_vars[p] for p in selected_vars]) <= 10 - number_unique
This works great, but I've realized that I really do need to go the random route. I've got 10 different variables and by only throwing out a couple of them, my solutions tend to have the same heavy weighted vars in all the permutations (which is to be expected).
I have a function I want to minimize with scipy.optimize.fmin. Note that I force a print when my function is evaluated.
My problem is, when I start the minimization, the value printed decreases untill it reaches a certain point (the value 46700222.800). There it continues to decrease by very small bites, e.g., 46700222.797,46700222.765,46700222.745,46700222.699,46700222.688,46700222.678
So intuitively, I feel I have reached the minimum, since the length of each step are minus then 1. But the algorithm keeps running untill I get a "Maximum number of function evaluations has been exceeded" error.
My question is: how can I force my algorithm to accept the value of the parameter when the function evaluation reaches a value from where it does not really evolve anymore (let say, I don't gain more than 1 after an iteration). I read that the options ftol could be used but it has absolutely no effect on my code. In fact, I don't even know what value to put for ftol. I tried everything from 0.00001 to 10000 and there is still no convergence.
There is actually no need to see your code to explain what is happening. I will answer point by point quoting you.
My problem is, when I start the minimization, the value printed decreases
untill it reaches a certain point (the value 46700222.800). There it
continues to decrease by very small bites, e.g.,
46700222.797,46700222.765,46700222.745,46700222.699,46700222.688,46700222.678
Notice that the difference between the last 2 values is -0.009999997913837433, i.e. about 1e-2. In the convention of minimization algorithm, what you call values is usually labelled x. The algorithm stops if these 2 conditions are respected AT THE SAME TIME at the n-th iteration:
convergence on x: the absolute value of the difference between x[n] and the next iteration x[n+1] is smaller than xtol
convergence on f(x): the absolute value of the difference between f[n] and f[n+1] is smaller than ftol.
Moreover, the algorithm stops also if the maximum number of iterations is reached.
Now notice that xtol defaults to a value of 1e-4, about 100 times smaller than the value 1e-2 that appears for your case. The algorithm then does not stop, because the first condition on xtol is not respected, until it reaches the maximum number of iterations.
I read that the options ftol could be used but it has absolutely no
effect on my code. In fact, I don't even know what value to put for
ftol. I tried everything from 0.00001 to 10000 and there is still no
convergence.
This helped you respecting the second condition on ftol, but again the first condition was never reached.
To reach your aim, increase also xtol.
The following methods will also help you more in general when debugging the convergence of an optimization routine.
inside the function you want to minimize, print the value of x and the value of f(x) before returning it. Then run the optimization routine. From these prints you can decide sensible values for xtol and ftol.
consider nondimensionalizing the problem. There is a reason if ftol and xtol default both to 1e-4. They expect you to formulate the problem so that x and f(x) are of order O(1) or O(10), say numbers between -100 and +100. If you carry out the nondimensionalization you handle a simpler problem, in the way that you often know what values to expect and what tolerances you are after.
if you are interested just in a rough calculation and can't estimate typical values for xtol and ftol, and you know (or you hope) that your problem is well behaved, i.e. that it will converge, you can run fmin in a try block, pass to fmin only maxiter=20 (say), and catch the error regarding the Maximum number of function evaluations has been exceeded.
I just spent three hours digging into the source code of scipy.minimize. In it, the "while" loop in function "_minimize_neldermead" deals with the convergence rule:
if (numpy.max(numpy.ravel(numpy.abs(sim[1:] - sim[0]))) <= xtol and
numpy.max(numpy.abs(fsim[0] - fsim[1:])) <= ftol):
break"
"fsim" is the variable that stores results from functional evaluation. However, I found that fsim[0] = f(x0) which is the function evaluation of the initial value, and it never changes during the "while" loop. fsim[1:] updates itself all the time. The second condition of the while loop was never satisfied. It might be a bug. But my knowledge of mathematical optimization is far from enough to judge it.
My current solution: design your own system to control the convergence. Add this in your function:
global x_old, Q_old
if (np.absolute(x_old-x).sum() <= 1e-4) and (np.absolute(Q_old-Q).sum() <= 1e-4):
return None
x_old = x; Q_old = Q
Here Q=f(x). Don't forget to give them an initial value.
Update 01/30/15:
I got it! This should be the correct code for the second line of the if function (i.e. remove numpy.absolute):
numpy.max(fsim[0] - fsim[1:]) <= ftol)
btw, this is my first debugging of a open source software. I just created an issue on GitHub.
Update 01/31/15 - 1:
I don't think my previous update is correct. Nevertheless, this is the a screenshot of the iterations of a function using the original code.
It prints the values of sim and fsim variable for each iteration. As you can see, the changes of each iteration is less than both of xtol and ftol values, but it just kept going without stopping. The original code compares the difference between fsim[0] and the rest of fsim values, i.e. the value here is always 87.63228689 - 87.61312213 = .01916476, which is greater than ftol=1e-2.
Update 01/31/15 - 2:
Here is the data and code that I used to reproduce the previous results. It includes two data files and one iPython Notebook file.
From the documentation it looks like you DO want to change the ftol arg.
Post your code so we can look at your progress.
edit: Try increasing xtol as well.
Your question is a bit ambiguous. Are you printing the value of your function, or the point where it is evaluated?
My understanding of xtol and ftol is as follows. The iteration stops
when the change in the value of the function between iterations is less than ftol
AND
when the change in x between successive iterations is less than xtol
When you say "...accept the value of the parameter...", this suggests you should change xtol.
I have been playing around with Python and math lately, and I ran in to something I have yet to be able to figure out. Namely, is it possible, given an arbitrary lambda, to return the inverse of that lambda for mathematical operations? That is, invertLambda such that invertLambda(lambda x:(x+2))(2) = 0. The fact that lambdas are restricted to expressions gives me hope, but so far I have not been able to make it work. I understand that any result would have problems with functions that lose information, but I am willing to restrict users and myself to lossless functions if I have to.
Of course not: if lambda is not an injective function, you cannot invert it. Example: you cannot invert lambda mapping x to x*x, since the sign of the original x is lost.
Leaving injectivity aside, there are functions which are computationally very complex to invert. Consider, for example, restoring the original value from its md5 hash. (For a lambda calculating md5 hash, inverted function must break md5 in cryptological sense!)
Edit:
indeed, we can theoretically make lambdas invertable if we restrict the expressions which can be used there. For example, if the lambda is a linear function of 1 argument, we can easily invert it. If it's a polynomial of degree > 4, we have a problem with algebraically exact solution.
Of course, we could refrain from exact solution, and just invert the function numerically. This is possible, using, well, any method of numerical solving of the equation lambda(x) = value will do (the simplest be binary search).
I am a bit late, but I just published a python package that does this precisely. You may want to borrow some ideas from it:
https://pypi.python.org/pypi/pynverse
It essentially follows this strategy:
Figure out if the function is increasing or decreasing. For this two reference points ref1 and ref2 are needed:
In case of a finite interval, the points ref points are 1/4 and 3/4 through the interval.
In an infinite interval any two values work really.
If f(ref1) < f(ref2), the function is increasing, otherwise is decreasing.
Figure out the image of the function in the interval.
If values are provided, then those are used.
In a closed interval just calculate f(a) and f(b), where a and b are the ends of the interval.
In an open interval try to calculate f(a) and f(b), if this works those are used, otherwise it will be assume to be (-Inf, Inf).
Built a bounded function with the following conditions:
bounded_f(x):
return -Inf if x below interval, and f is increasing.
return +Inf if x below interval, and f is decreasing.
return +Inf if x above interval, and f is increasing.
return -Inf if x above interval, and f is decreasing.
return f(x) otherwise
If the required number y0 for the inverse is outside the image, raise an exception.
Find roots for bounded_f(x)-y0, by minimizing (bounded_f(x)-y0)**2, using the Brent method, making sure that the algorithm for minimising starts in a point inside the original interval by setting ref1, ref2 as brackets. As soon as if goes outside the allowed intervals, bounded_f returns infinite, forcing the algorithm to go back to search inside the interval.
Check that the solutions are accurate and they meet f(x0)=y0 to some desired precision, raising a warning otherwise.
Of course, as Vlad pointed out, the function has to be invertible for the inverse to exist, and also continuous in the domain for this to work.
I have spent many hours trying to figure what is going on here.
The function 'grad_logp' in the code below is called many times in my program, and cProfile and runsnakerun the visualize the results reveals that the function grad_logp spends about .00004s 'locally' every call not in any functions it calls and the function 'n' spends about .00006s locally every call. Together these two times make up about 30% of program time that I care about. It doesn't seem like this is function overhead as other python functions spend far less time 'locally' and merging 'grad_logp' and 'n' does not make my program faster, but the operations that these two functions do seem rather trivial. Does anyone have any suggestions on what might be happening?
Have I done something obviously inefficient? Am I misunderstanding how cProfile works?
def grad_logp(self, variable, calculation_set ):
p = params(self.p,self.parents)
return self.n(variable, self.p)
def n (self, variable, p ):
gradient = self.gg(variable, p)
return np.reshape(gradient, np.shape(variable.value))
def gg(self, variable, p):
if variable is self:
gradient = self._grad_logps['x']( x = self.value, **p)
else:
gradient = __builtin__.sum([self._pgradient(variable, parameter, value, p) for parameter, value in self.parents.iteritems()])
return gradient
Functions coded in C are not instrumented by profiling; so, for example, any time spent in sum (which you're spelling __builtin__.sum) will be charged to its caller. Not sure what np.reshape is, but if it's numpy.reshape, the same applies there.
Your "many hours" might be better spent making your code less like a maze of twisty little passages and also documenting it.
The first method's arg calculation_set is NOT USED.
Then it does p = params(self.p,self.parents) but that p is NOT USED.
variable is self???
__builtin__.sum???
Get it firstly understandable, secondly correct. Then and only then, worry about the speed.