I have a function I want to minimize with scipy.optimize.fmin. Note that I force a print when my function is evaluated.
My problem is that when I start the minimization, the printed value decreases until it reaches a certain point (the value 46700222.800). From there it continues to decrease in very small steps, e.g., 46700222.797, 46700222.765, 46700222.745, 46700222.699, 46700222.688, 46700222.678.
So intuitively, I feel I have reached the minimum, since each step decreases the value by less than 1. But the algorithm keeps running until I get a "Maximum number of function evaluations has been exceeded" error.
My question is: how can I force the algorithm to accept the value of the parameter when the function evaluation reaches a value from which it does not really evolve anymore (say, when I don't gain more than 1 per iteration)? I read that the option ftol could be used, but it has absolutely no effect on my code. In fact, I don't even know what value to give ftol. I tried everything from 0.00001 to 10000 and there is still no convergence.
There is actually no need to see your code to explain what is happening; I will answer point by point, quoting you.
My problem is that when I start the minimization, the printed value decreases
until it reaches a certain point (the value 46700222.800). From there it
continues to decrease in very small steps, e.g.,
46700222.797, 46700222.765, 46700222.745, 46700222.699, 46700222.688, 46700222.678
Notice that the difference between the last two values is -0.009999997913837433, i.e. about 1e-2 in magnitude. In the conventions of minimization algorithms, what you call the value is usually labelled x. The algorithm stops only if these two conditions are respected AT THE SAME TIME at the n-th iteration:
convergence on x: the absolute value of the difference between x[n] and the next iteration x[n+1] is smaller than xtol
convergence on f(x): the absolute value of the difference between f(x[n]) and f(x[n+1]) is smaller than ftol.
The algorithm also stops if the maximum number of iterations (or function evaluations) is reached.
Now notice that xtol defaults to 1e-4, about 100 times smaller than the 1e-2 that appears in your case. The algorithm therefore does not stop, because the first condition on xtol is never respected, until it reaches the maximum number of function evaluations.
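As a quick check with the numbers you posted (assuming, as above, that the printed values are the x the algorithm moves through):

step = abs(46700222.688 - 46700222.678)  # last difference, about 1e-2
xtol = 1e-4                               # fmin's default
print(step <= xtol)                       # False -> no convergence on x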
I read that the options ftol could be used but it has absolutely no
effect on my code. In fact, I don't even know what value to put for
ftol. I tried everything from 0.00001 to 10000 and there is still no
convergence.
This helped you satisfy the second condition, on ftol, but the first condition, on xtol, was still never met.
To reach your aim, increase xtol as well.
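For example, with a stand-in objective that mimics your large values (a minimal sketch; the function and starting point are placeholders for yours):

from scipy.optimize import fmin

def objective(x):
    value = 46700222.0 + (x[0] - 3.0) ** 2
    print(x, value)  # the print you already force on every evaluation
    return value

# fmin stops only when the simplex spread in x is below xtol
# AND the spread in f(x) is below ftol, so loosen BOTH:
x_opt = fmin(objective, [0.0], xtol=1e-2, ftol=1.0)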
The following methods will also help you more generally when debugging the convergence of an optimization routine.
inside the function you want to minimize, print the value of x and the value of f(x) before returning. Then run the optimization routine. From these prints you can decide sensible values for xtol and ftol.
consider nondimensionalizing the problem (see the sketch after this list). There is a reason ftol and xtol both default to 1e-4: they expect you to formulate the problem so that x and f(x) are of order O(1) or O(10), say numbers between -100 and +100. If you carry out the nondimensionalization you handle a simpler problem, in the sense that you often know what values to expect and what tolerances you are after.
if you are interested in just a rough calculation and can't estimate typical values for xtol and ftol, and you know (or hope) that your problem is well behaved, i.e. that it will converge, you can pass a small maxiter (say 20) to fmin and simply take the result it returns. Note that fmin prints the "Maximum number of function evaluations has been exceeded" message as a warning rather than raising an error, so to detect this case programmatically, check the warnflag returned with full_output=True.
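A hedged sketch of the last two points (the scales X0 and F0 and the dummy objective are illustrative assumptions, not your actual problem):

from scipy.optimize import fmin

def objective(x):  # dummy stand-in for your function
    return 46700222.0 + (x[0] - 3.0) ** 2

# Nondimensionalize: pick typical scales so the optimizer sees O(1) numbers.
X0, F0 = 100.0, 4.67e7
scaled = lambda u: objective([u[0] * X0]) / F0
u_opt = fmin(scaled, [0.0])  # the default xtol = ftol = 1e-4 is now sensible
x_opt = u_opt[0] * X0

# Rough-calculation route: cap the budget and inspect the warn flag.
xopt, fopt, n_iter, n_calls, warnflag = fmin(objective, [0.0], maxiter=20,
                                             full_output=True, disp=False)
if warnflag != 0:  # 1: max function evaluations, 2: max iterations
    print("stopped on the budget, not on the tolerances")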
I just spent three hours digging into the source code of scipy.optimize.minimize. In it, the "while" loop in the function "_minimize_neldermead" handles the convergence test:
if (numpy.max(numpy.ravel(numpy.abs(sim[1:] - sim[0]))) <= xtol and
        numpy.max(numpy.abs(fsim[0] - fsim[1:])) <= ftol):
    break
"fsim" is the variable that stores results from functional evaluation. However, I found that fsim[0] = f(x0) which is the function evaluation of the initial value, and it never changes during the "while" loop. fsim[1:] updates itself all the time. The second condition of the while loop was never satisfied. It might be a bug. But my knowledge of mathematical optimization is far from enough to judge it.
My current solution: design your own system to control convergence. Add this inside your function:
global x_old, Q_old  # previous x and previous Q = f(x); requires numpy imported as np
# home-made stopping rule: if neither x nor Q moved by more than 1e-4, bail out
if (np.absolute(x_old - x).sum() <= 1e-4) and (np.absolute(Q_old - Q).sum() <= 1e-4):
    return None  # returning None makes fmin error out, which stops the run
x_old = x
Q_old = Q
Here Q = f(x). Don't forget to give x_old and Q_old initial values.
Update 01/30/15:
I got it! This should be the correct code for the second line of the if statement (i.e. remove the numpy.abs):
numpy.max(fsim[0] - fsim[1:]) <= ftol)
By the way, this is my first time debugging open-source software. I just created an issue on GitHub.
Update 01/31/15 - 1:
I don't think my previous update is correct. Nevertheless, this is a screenshot of the iterations of a function using the original code.
It prints the values of the sim and fsim variables at each iteration. As you can see, the change at each iteration is smaller than both the xtol and ftol values, but it just keeps going without stopping. The original code compares the difference between fsim[0] and the rest of the fsim values, i.e. the value here is always 87.63228689 - 87.61312213 = 0.01916476, which is greater than ftol = 1e-2.
Update 01/31/15 - 2:
Here are the data and code that I used to reproduce the previous results. They include two data files and one IPython Notebook file.
From the documentation it looks like you DO want to change the ftol arg.
Post your code so we can look at your progress.
edit: Try increasing xtol as well.
Your question is a bit ambiguous. Are you printing the value of your function, or the point where it is evaluated?
My understanding of xtol and ftol is as follows. The iteration stops
when the change in the value of the function between iterations is less than ftol
AND
when the change in x between successive iterations is less than xtol
When you say "...accept the value of the parameter...", this suggests you should change xtol.
I am setting up to use SciPy's basin-hopping global optimizer. Its documentation for parameter T states
T: float, optional
The “temperature” parameter for the accept or reject criterion. Higher “temperatures” mean that larger jumps in function value will be accepted. For best results T should be comparable to the separation (in function value) between local minima.
When it says "function value", does that mean the expected return value of the cost function func? Or the value passed to it? Or something else?
I read the source, and I see where T is passed to the Metropolis acceptance criterion, but I do not understand how it is used when converted to "beta".
I'm unfamiliar with the algorithm, but if you keep reading the documentation on the link you posted you'll find this:
Choosing T: The parameter T is the "temperature" used in the Metropolis criterion. Basinhopping steps are always accepted if func(xnew) < func(xold). Otherwise, they are accepted with probability: exp(-(func(xnew) - func(xold)) / T). So, for best results, T should be comparable to the typical difference (in function values) between local minima. (The height of "walls" between local minima is irrelevant.)
So I believe T should be comparable to typical differences in the value of the function you are trying to optimize, func, between local minima. This makes sense if you look at that probability expression: you'd be comparing a difference in function values to what is meant to be a kind of "upper bound" for the step. For example, if one local minimum is at func = 10 and another at func = 14, you might consider T = 4 an appropriate value.
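To make the criterion concrete, here is a hedged sketch of the accept/reject rule the docs describe (beta in the source is just 1/T; the function name here is illustrative):

import numpy as np

rng = np.random.default_rng()

def metropolis_accept(f_new, f_old, T=4.0):
    if f_new < f_old:
        return True  # downhill moves are always accepted
    # uphill moves are accepted with probability exp(-(f_new - f_old) / T)
    return rng.random() < np.exp(-(f_new - f_old) / T)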
I'm looking for a Python algorithm to find the root of a function f(x) using bisection, equivalent to scipy.optimize.bisect, but allowing for discontinuities (jumps) in f. The function f is weakly monotonic.
It would be nice, but not necessary, for the algorithm to flag if the crossing (root) is directly 'at' a jump, and in this case to return the exact value x at which the relevant jump occurs (i.e. the x for which sign(f(x-e)) != sign(f(x+e)) and abs(f(x-e) - f(x+e)) > a for infinitesimal e > 0 and non-infinitesimal a > 0). It is also okay if instead the algorithm simply returns an x within a certain tolerance in this case.
As the function is only weakly monotonic, it can have flat areas, and theoretically these can occur 'at' the root, i.e. where f = 0: f(x) = 0 for an entire range x in [x_0, x_1]. In this case, again, it would be nice but not necessary for the algorithm to flag this particularity and to, say, ensure an x from the range [x_0, x_1] is returned.
As long as you supply (possibly very small) strictly positive values for xtol and rtol, the function will work with discontinuities:
>>> import numpy as np
>>> from scipy import optimize
>>> optimize.bisect(f=np.sign, a=-1, b=1, xtol=0.0001, rtol=0.001)
0.0
If you look in the scipy codebase at the C implementation of this function, you can see that it is a very simple routine that makes no assumptions about continuity. It basically takes two points with a sign change between them and switches to a smaller range that still has a sign change, until the iterations run out or the tolerances are met.
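A minimal pure-Python sketch of that loop (illustrative only, not scipy's actual C code):

def bisect_simple(f, a, b, xtol=1e-4):
    fa = f(a)
    while b - a > xtol:
        m = (a + b) / 2.0
        fm = f(m)
        if fm == 0:
            return m
        if (fa < 0) == (fm < 0):  # keep the half that still has a sign change
            a, fa = m, fm
        else:
            b = m
    return (a + b) / 2.0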
Given your requirement that functions may be discontinuous or flat, it is in fact necessary (for any algorithm) to supply these tolerances. Without them, it could be impossible for an optimization function to converge to a solution.
I am trying to find the global minimum of an objective function using basinhopping, but most of the time it gets stuck at a local minimum. I read through the documentation for basinhopping and found that interval and accept_test might be helpful, but now the question is what values to give them. E.g., I want my objective function to get as close to 0 as possible (within 1e-5) without spending too much time at very large values like 4 or 5. As for interval, how does one know how often the stepsize is being updated?
Here is how I'm looking for a "global" minimum:
import numpy as np
from scipy.optimize import basinhopping, minimize

np.random.seed(555)  # Seeded to allow replication.
minimizer_kwargs = {"method": "L-BFGS-B", "bounds": bnds, "tol": 1e-4}
ret = basinhopping(merit_function, abcdex, minimizer_kwargs=minimizer_kwargs, niter=10)
zoom = ret['x']
res = minimize(merit_function, zoom, method='L-BFGS-B', bounds=bnds, tol=1e-9)
print(res)
If you're stuck in a local minimum then that likely means you need a bigger stepsize. You can set the stepsize with the keyword "stepsize".
An appropriate stepsize depends on the problem, but luckily basinhopping will adjust the stepsize automatically. How often it does this depends on the interval keyword: every interval iterations the stepsize is increased or decreased by a factor of 0.9. If the initial guess for the stepsize is way off, this can still take some time. If you decrease the interval to 10 (or so), it should adapt much faster.
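For instance (reusing the names from your snippet; the values here are illustrative):

ret = basinhopping(merit_function, abcdex,
                   minimizer_kwargs=minimizer_kwargs,
                   niter=100,
                   stepsize=2.0,  # larger initial random displacement
                   interval=10)   # adapt the stepsize every 10 iterations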
I don't think accept_test will help you here. That can be used to, for example, enforce forbidden regions of configuration space.
TL;DR: How to minimize a fairly smooth function that returns an integer value (not a float)?
>>> import scipy.optimize as opt
>>> opt.fmin(lambda p: 0.1 * p[0] ** 2 + 0.1 * p[1] ** 2, (-10, 9))
Optimization terminated successfully.
Current function value: 0.000000
Iterations: 49
Function evaluations: 92
array([ -3.23188819e-05, -1.45087583e-06])
>>> opt.fmin(lambda p: int(0.1 * p[0] ** 2 + 0.1 * p[1] ** 2), (-10, 9))
Optimization terminated successfully.
Current function value: 17.000000
Iterations: 17
Function evaluations: 60
array([-9.5 , 9.45])
Trying to minimize a function that accepts floating-point parameters but returns an integer, I'm running into the problem that the solver terminates immediately. This effect is demonstrated in the examples above: notice that when the return value is truncated to an int, the evaluation terminates prematurely.
I assume this is happening because the solver detects no change in the derivative: the first time it perturbs a parameter, the change it makes is too small to alter the integer value, so the difference between the first and second results is exactly 0, incorrectly indicating that a minimum has been found.
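A hedged illustration of that plateau, using the same toy function as above:

f = lambda p: int(0.1 * p[0] ** 2 + 0.1 * p[1] ** 2)
print(f((-10.0, 9.0)), f((-10.001, 9.0)))  # 18 18 -> locally flat to the solver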
I've had better luck with optimize.anneal. And although the function's return value is an integer, I've plotted some regions of it in three dimensions and it's actually pretty smooth, so I was hoping a derivative-aware minimizer would work better.
I've reverted to manually graphing to explore the space, but I'd like to introduce a couple more parameters so it'd be great if I could get this working.
The function I'm trying to minimize can't be made to return a float. It's counting the number of successful hits from a cross-validation, and I'm having the optimizer alter parameters on the model.
Any ideas?
UPDATE
Found a similar question: How to force larger steps on scipy.optimize functions?
In general, minimization on an integer space is an entirely different field called integer programming (or discrete optimization). The addition of integer constraints actually creates quite a few algorithmic difficulties that render continuous methods unfit. Look into scipy.optimize.anneal.
I have been playing around with Python and math lately, and I ran into something I have not yet been able to figure out. Namely, is it possible, given an arbitrary lambda, to return the inverse of that lambda for mathematical operations? That is, an invertLambda such that invertLambda(lambda x: (x+2))(2) = 0. The fact that lambdas are restricted to expressions gives me hope, but so far I have not been able to make it work. I understand that any result would have problems with functions that lose information, but I am willing to restrict users and myself to lossless functions if I have to.
Of course not: if the lambda is not an injective function, you cannot invert it. Example: you cannot invert the lambda mapping x to x*x, since the sign of the original x is lost.
Leaving injectivity aside, there are functions that are computationally very complex to invert. Consider, for example, restoring the original value from its MD5 hash. (For a lambda calculating an MD5 hash, the inverse function would have to break MD5 in the cryptographic sense!)
Edit:
Indeed, we can theoretically make lambdas invertible if we restrict the expressions that can be used in them. For example, if the lambda is a linear function of one argument, we can easily invert it. If it's a polynomial of degree greater than 4, there is no general algebraically exact solution.
Of course, we could refrain from an exact solution and just invert the function numerically. This is possible using any method for numerically solving the equation lambda(x) = value (the simplest being binary search).
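A minimal sketch of that numerical route using scipy's bisection (the helper name invert_lambda is illustrative; it assumes f is monotonic and the root is bracketed by lo and hi):

from scipy.optimize import bisect

def invert_lambda(f, lo, hi):
    # numerically invert a monotonic f on [lo, hi]
    return lambda y: bisect(lambda x: f(x) - y, lo, hi)

inv = invert_lambda(lambda x: x + 2, -100.0, 100.0)
print(inv(2))  # ~0.0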
I am a bit late, but I just published a Python package that does precisely this. You may want to borrow some ideas from it:
https://pypi.python.org/pypi/pynverse
It essentially follows this strategy:
Figure out whether the function is increasing or decreasing. For this, two reference points ref1 and ref2 are needed:
In the case of a finite interval, the reference points are 1/4 and 3/4 of the way through the interval.
In an infinite interval, practically any two values work.
If f(ref1) < f(ref2), the function is increasing; otherwise it is decreasing.
Figure out the image of the function in the interval.
If values are provided, then those are used.
In a closed interval just calculate f(a) and f(b), where a and b are the ends of the interval.
In an open interval, try to calculate f(a) and f(b); if this works, those are used, otherwise the image is assumed to be (-Inf, Inf).
Build a bounded function with the following conditions:
bounded_f(x):
    return -Inf  if x is below the interval and f is increasing
    return +Inf  if x is below the interval and f is decreasing
    return +Inf  if x is above the interval and f is increasing
    return -Inf  if x is above the interval and f is decreasing
    return f(x)  otherwise
If the required number y0 for the inverse is outside the image, raise an exception.
Find roots of bounded_f(x) - y0 by minimizing (bounded_f(x) - y0)**2 with the Brent method, making sure that the minimization starts at a point inside the original interval by setting ref1 and ref2 as brackets. As soon as it goes outside the allowed interval, bounded_f returns infinity, forcing the algorithm back to searching inside the interval.
Check that the solutions are accurate and they meet f(x0)=y0 to some desired precision, raising a warning otherwise.
Of course, as Vlad pointed out, the function has to be invertible for the inverse to exist, and also continuous in the domain for this to work.
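A hedged usage sketch (inversefunc is the entry point named in the pynverse documentation; check the PyPI page above for the current API):

from pynverse import inversefunc

cube = lambda x: x ** 3
invcube = inversefunc(cube)  # returns a callable inverse
print(invcube(27))           # ~3.0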