I am setting up SciPy's basin-hopping global optimizer. Its documentation for the parameter T states:
T: float, optional
The “temperature” parameter for the accept or reject criterion. Higher “temperatures” mean that larger jumps in function value will be accepted. For best results T should be comparable to the separation (in function value) between local minima.
When it says "function value", does that mean the expected return value of the cost function func? Or the value passed to it? Or something else?
I read the source, and I see where T is passed to the Metropolis acceptance criterion, but I do not understand how it is used when converted to "beta".
I'm unfamiliar with the algorithm, but if you keep reading the documentation at the link you posted, you'll find this:
Choosing T: The parameter T is the “temperature” used in the Metropolis criterion. Basinhopping steps are always accepted if func(xnew) < func(xold). Otherwise, they are accepted with probability: exp(-(func(xnew) - func(xold)) / T). So, for best results, T should be comparable to the typical difference (in function values) between local minima. (The height of “walls” between local minima is irrelevant.)
So T is measured in the same units as the return value of the cost function func: it should be comparable to the typical difference in func between local minima. This makes sense if you look at that probability expression -- you are comparing a difference in function values against what acts as a kind of "upper bound" for the step. For example, if one local minimum has func = 10 and another has func = 14, you might consider T = 4 an appropriate value.
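For illustration, here is a minimal sketch of choosing T this way (the double-well cost function and the specific numbers are made up for demonstration):

import numpy as np
from scipy.optimize import basinhopping

# Hypothetical double-well cost: local minima near x = -1 (func ~ 0)
# and x = +1 (func ~ 4), i.e. separated by roughly 4 in function value.
def func(x):
    x = np.asarray(x).ravel()[0]
    return 10.0 * (x**2 - 1.0)**2 + 2.0 * (x + 1.0)

# T is chosen comparable to that typical separation in func.
result = basinhopping(func, x0=[0.0], T=4.0, niter=200)
print(result.x, result.fun)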
In this video, Justin writes the following in the compute method of his component.
def compute(self, inputs, outputs):
    self.alp_sc = 0.91
    T0 = 28.      # reference temperature
    eff0 = .285   # efficiency at reference temperature
    T1 = -150.
    eff1 = 0.335
    delta_T = inputs['T'] - T0
    self.slope = (eff1 - eff0) / (T1 - T0)
    outputs['eta'] = (eff0 + self.slope * delta_T) / self.alp_sc
Why does he use self.slope and self.alp_sc instead of plain local variables? Is it something important for OpenMDAO/vectorized components, or just an arbitrary choice?
The attributes (e.g. self.alp_sc) are constants that were never intended to change for that sample component. They do not need to be input variables, because they will never be connected to.
It is considered a best practice to define partial derivatives for all outputs with respect to all inputs in any component. Therefore, by making these values attributes, we can respect the best practice and avoid declaring partials for things we know will never change.
If you continue to watch the video, self.slope and self.alp_sc are also used in the compute_partials method. self.alp_sc is constant, so it could have been declared in the initialize method of the class (if it is a parameter you may want to change for different instances) or outside the class (if it is really a constant). self.slope in this case is also constant, so the same applies. But imagine the slope depended on the inputs of your component and had to be recalculated in each iteration (and, say, was also very computationally expensive, which in this example it clearly is not). In that case you could save some computation by storing the value in a class attribute (self.slope) in compute and just reusing it in the derivative computation.
One thing that must be ensured is that, in each iteration, compute is called before compute_partials (otherwise you could end up using a stale value from the previous iteration in the derivative calculation), but I think that is always true in the current OpenMDAO 3.0.
It is quite common to need the same quantities in both the function and the derivatives. Storing them in an attribute (less computation) is one way to do it; calling the same helper function from both compute and compute_partials (twice the computation, less stored state) is another.
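As a minimal sketch of that caching pattern (this hypothetical component is simplified from the one in the video; names and values are illustrative):

import openmdao.api as om

class SolarCellEfficiency(om.ExplicitComponent):

    def setup(self):
        self.add_input('T', val=28.0)
        self.add_output('eta', val=0.285)
        self.declare_partials('eta', 'T')

    def compute(self, inputs, outputs):
        alp_sc = 0.91
        # Imagine this were expensive and input-dependent:
        self.slope = (0.335 - 0.285) / (-150.0 - 28.0)
        outputs['eta'] = (0.285 + self.slope * (inputs['T'] - 28.0)) / alp_sc

    def compute_partials(self, inputs, partials):
        # Reuse the value cached by compute instead of recomputing it.
        partials['eta', 'T'] = self.slope / 0.91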
From the snippet of code it looks like def compute(self, inputs, outputs) is a method inside a class. This can be seen from self appearing in its signature.
self.slope and self.alp_sc are attributes stored on the instance that the compute() method belongs to.
See classes in Python for more on this.
I'm looking for a Python algorithm to find the root of a function f(x) using bisection, equivalent to scipy.optimize.bisect, but allowing for discontinuities (jumps) in f. The function f is weakly monotonic.
It would be nice but not necessary for the algorithm to flag if the crossing (root) is directly 'at' a jump, and in this case to return the exact value x at which the relevant jump occurs (i.e. the x for which sign(f(x-e)) != sign(f(x+e)) and abs(f(x-e) - f(x+e)) > a for infinitesimal e > 0 and non-infinitesimal a > 0). It is also okay if the algorithm, for example, simply returns an x within a certain tolerance in this case.
As the function is only weakly monotonic, it can have flat areas, and theoretically these can occur 'at' the root, i.e. where f = 0: f(x) = 0 for an entire range x in [x_0, x_1]. In this case again, it would be nice but not necessary for the algorithm to flag this particularity and, say, ensure an x from the range [x_0, x_1] is returned.
As long as you supply (possibly very small) strictly positive values for xtol and rtol, the function will work with discontinuities:
>>> import numpy as np
>>> from scipy import optimize
>>> optimize.bisect(f=np.sign, a=-1, b=1, xtol=0.0001, rtol=0.001)
0.0
If you look in the scipy codebase at the C source code implementation of the function, you can see that this is a very simple function that makes no assumptions about continuity. It basically takes two points between which the function changes sign, and repeatedly switches to a smaller subinterval that still has a sign change, until the iterations run out or the tolerances are met.
Given your requirements that functions might be discontinuous / flat, it is in fact necessary (for any algorithm) to supply these tolerances. Without them, it could be impossible for an optimization function to converge to a solution.
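If you do want the jump flagged explicitly, a hand-rolled bisection along these lines works (a sketch; the function name and the jump test are my own choices, not scipy API):

def bisect_discontinuous(f, a, b, xtol=1e-12, jump_tol=1e-8):
    # Bisection for weakly monotonic f with a sign change on [a, b].
    # Returns (x, at_jump): at_jump is True when the sign change sits
    # at a discontinuity rather than an actual zero of f.
    fa, fb = f(a), f(b)
    assert fa * fb < 0, "f(a) and f(b) must have opposite signs"
    while b - a > xtol:
        m = 0.5 * (a + b)
        fm = f(m)
        if fm == 0:
            return m, False
        if fa * fm < 0:
            b, fb = m, fm
        else:
            a, fa = m, fm
    # The bracket is now tiny; if f still jumps across it by more than
    # jump_tol, the sign change is a discontinuity, not a root.
    return 0.5 * (a + b), abs(fb - fa) > jump_tol

x, at_jump = bisect_discontinuous(lambda x: 1.0 if x >= 0.3 else -1.0, -1.0, 1.0)
print(x, at_jump)  # x ~ 0.3, at_jump == True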
I am trying to use scipy.optimize.least_squares(fun=my_fun, jac=my_jac, max_nfev=1000) with two callable functions: my_fun and my_jac.
Both functions, my_fun and my_jac, use external software to evaluate their values. This is very time-consuming, so I would like to control the number of evaluations of each.
The trf method uses my_fun to evaluate whether the trust region is adequate and my_jac to determine both the cost function and the Jacobian matrix.
There is an input parameter max_nfev. Does this parameter count only fun evaluations, or does it also count jac evaluations?
Moreover, in MATLAB the lsqnonlin function has two parameters, MaxIterations and MaxFunctionEvaluations. Do equivalents exist in scipy.optimize.least_squares?
Thanks
Alon
According to the help of scipy.optimize.least_squares, max_nfev is the maximum number of function evaluations before the program exits:
max_nfev : None or int, optional
Maximum number of function evaluations before the termination.
If None (default), the value is chosen automatically:
Again, according to the help, there is no MaxIterations argument, but you can set a tolerance on f (ftol, the function you want to minimize) or on x (xtol, the solution) as the exit criterion.
You can also use scipy.optimize.minimize(). There, you can define a maxiter argument, which goes in the options dictionary.
If you do so, beware that the function you want to minimize must be your cost function, meaning that you will have to code the least-squares objective yourself.
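For illustration, a short sketch (the residual function here is made up) showing both knobs side by side:

import numpy as np
from scipy.optimize import least_squares, minimize

def residuals(x):  # hypothetical residual vector
    return np.array([x[0] - 1.0, 10.0 * (x[1] - x[0]**2)])

# least_squares: cap evaluations with max_nfev, or let ftol/xtol/gtol
# trigger termination earlier.
res = least_squares(residuals, x0=[2.0, 2.0], max_nfev=50, ftol=1e-10)

# minimize: the cost function must be coded by hand, and maxiter goes
# into the options dictionary.
cost = lambda x: 0.5 * np.sum(residuals(x)**2)
res2 = minimize(cost, x0=[2.0, 2.0], options={'maxiter': 50})
print(res.x, res2.x)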
I hope this will be clear and useful to you.
Fairly simple question: how should I use @pm.stochastic? I have read some blog posts that claim @pm.stochastic expects a negative log value:
@pm.stochastic(observed=True)
def loglike(value=data):
    # some calculations that generate a numeric result
    return -np.log(result)
I tried this recently but found really bad results. Since I also noticed that some people used np.log instead of -np.log, I gave it a try and it worked much better. What is @pm.stochastic really expecting? I'm guessing there was some confusion about the required sign due to a very popular example using something like np.log(1/(1+t_1-t_0)), which was written as -np.log(1+t_1-t_0).
Another question: what is this decorator doing with the value argument? As I understand it, we start with some proposed values for the priors that enter the likelihood, and the idea of @pm.stochastic is basically to produce a number to compare against the number generated in the previous iteration of the sampling process. The likelihood should receive the value argument and some values for the priors, but I'm not sure this is all value does, because it is the only required argument and yet I can write:
@pm.stochastic(observed=True)
def loglike(value=[1]):
    data = [3, 5, 1]  # some data
    # some calculations that generate a numeric result
    return np.log(result)
And as far as I can tell, that produces the same result as before. Maybe it works this way because I added observed=True to the decorator. Had I tried this on a stochastic variable with the default observed=False, value would be changed in each iteration in search of a better likelihood.
@pm.stochastic is a decorator, so it is expecting a function. The simplest way to use it is to give it a function that includes value as one of its arguments, and returns a log-likelihood.
You should use the @pm.stochastic decorator to define a custom prior for a parameter in your model. You should use the @pm.observed decorator to define a custom likelihood for data. Both of these decorators will create a pm.Stochastic object, which takes its name from the function it decorates, and has all the familiar methods and attributes (here is a nice article on Python decorators).
Examples:
A parameter a that has a triangular distribution a priori:
@pm.stochastic
def a(value=.5):
    if 0 <= value < 1:
        return np.log(1. - value)
    else:
        return -np.inf
Here value=.5 is used as the initial value of the parameter, and changing it to value=1 raises an exception, because it is outside of the support of the distribution.
A likelihood b that is normally distributed, centered at a, with a fixed precision:
@pm.observed
def b(value=[.2, .3], mu=a):
    return pm.normal_like(value, mu, 100.)
Here value=[.2,.3] is used to represent the observed data.
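To actually sample from this model (a sketch assuming the PyMC 2 MCMC interface, with arbitrary iteration counts, continuing from the definitions of a and b above):

mcmc = pm.MCMC([a, b])
mcmc.sample(iter=10000, burn=5000)

# Posterior samples of the parameter a:
a_samples = mcmc.trace('a')[:]
print(a_samples.mean())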
I've put this together in a notebook that shows it all in action here.
Yes, the confusion is easy, since @pm.stochastic returns a likelihood, which is essentially the opposite of an error. So you take the negative log of your custom error function and return THAT as your log-likelihood.
I have been playing around with Python and math lately, and I ran into something I have yet to be able to figure out. Namely, is it possible, given an arbitrary lambda, to return the inverse of that lambda for mathematical operations? That is, invertLambda such that invertLambda(lambda x: (x+2))(2) = 0. The fact that lambdas are restricted to expressions gives me hope, but so far I have not been able to make it work. I understand that any result would have problems with functions that lose information, but I am willing to restrict users and myself to lossless functions if I have to.
Of course not: if the lambda is not an injective function, you cannot invert it. Example: you cannot invert a lambda mapping x to x*x, since the sign of the original x is lost.
Leaving injectivity aside, there are functions which are computationally very complex to invert. Consider, for example, restoring the original value from its md5 hash. (For a lambda calculating an md5 hash, the inverse function must break md5 in the cryptographic sense!)
Edit:
Indeed, we can theoretically make lambdas invertible if we restrict the expressions that can be used in them. For example, if the lambda is a linear function of one argument, we can easily invert it. If it is a polynomial of degree > 4, we have a problem finding an algebraically exact solution.
Of course, we could refrain from an exact solution and just invert the function numerically. This is possible: any method of numerically solving the equation lambda(x) = value will do (the simplest being binary search).
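A minimal sketch of that numerical approach for a monotonically increasing lambda (the bracket [lo, hi] and the tolerance are assumptions):

def invert(f, lo=-1e6, hi=1e6, tol=1e-9):
    # Numerically invert a monotonically increasing f via binary search.
    def f_inverse(y):
        a, b = lo, hi
        while b - a > tol:
            m = 0.5 * (a + b)
            if f(m) < y:
                a = m
            else:
                b = m
        return 0.5 * (a + b)
    return f_inverse

inv = invert(lambda x: x + 2)
print(inv(2))  # ~ 0.0, matching invertLambda(lambda x: (x+2))(2) = 0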
I am a bit late, but I just published a Python package that does precisely this. You may want to borrow some ideas from it:
https://pypi.python.org/pypi/pynverse
It essentially follows this strategy:
Figure out whether the function is increasing or decreasing. For this, two reference points ref1 and ref2 are needed:
In the case of a finite interval, the reference points are 1/4 and 3/4 of the way through the interval.
In an infinite interval, any two values really work.
If f(ref1) < f(ref2), the function is increasing; otherwise it is decreasing.
Figure out the image of the function in the interval.
If values are provided, then those are used.
In a closed interval just calculate f(a) and f(b), where a and b are the ends of the interval.
In an open interval, try to calculate f(a) and f(b); if this works, those are used, otherwise the image is assumed to be (-Inf, Inf).
Build a bounded function with the following conditions:
bounded_f(x):
    return -Inf if x is below the interval and f is increasing
    return +Inf if x is below the interval and f is decreasing
    return +Inf if x is above the interval and f is increasing
    return -Inf if x is above the interval and f is decreasing
    return f(x) otherwise
If the required number y0 for the inverse is outside the image, raise an exception.
Find roots of bounded_f(x) - y0 by minimizing (bounded_f(x) - y0)**2 with the Brent method, making sure that the minimization starts at a point inside the original interval by setting ref1, ref2 as brackets. As soon as it goes outside the allowed interval, bounded_f returns infinity, forcing the algorithm back to searching inside the interval.
Check that the solutions are accurate and that they meet f(x0) = y0 to the desired precision, raising a warning otherwise.
Of course, as Vlad pointed out, the function has to be invertible for the inverse to exist, and also continuous in the domain for this to work.
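For completeness, typical pynverse usage looks like this (from memory of the package's README; check the link above for the authoritative examples):

import numpy as np
from pynverse import inversefunc

cube = lambda x: x**3
invcube = inversefunc(cube)
print(invcube(27))  # ~ 3.0

# Restrict the domain to make a non-injective function invertible:
invsin = inversefunc(np.sin, domain=[-np.pi / 2, np.pi / 2])
print(invsin(0.5))  # ~ 0.5236 (pi/6)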