Optuna suggest float log=True


How can I have optuna suggest float numeric values from this list:
[1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
I'm using this Python code snippet:
trial.suggest_float("lambda", 1e-6, 1.0, log=True)
It correctly suggests values between 1e-6 and 1.0, but it suggests other values in the range, not just the values explicitly in the list above. What am I doing wrong?

What's wrong with suggest_categorical (in this instance)?
The suggest_categorical approach works alright, but it's subtly going to hurt the efficiency of your search in a way that can sometimes be extremely significant (especially if your list is large or you use this categorical approach many times in the same search space). Optuna considers each value in the list to be its own separate entity that cannot be ordered or compared to others. For example, I often use suggest_categorical to select between different algorithms, e.g. trial.suggest_categorical("algo", ["dbscan", "kmeans"]). But of course, it would be silly to say that dbscan > kmeans, or to find the value "halfway between" dbscan and kmeans. Optuna treats your list of numbers the same way. Even if it finds that performance is steadily and substantially decreasing as you move from 1.0 to 1e-5, it will still try 1e-6 because it cannot extrapolate on the trend. Note this is actually good, when your values are truly categorical and un-ordered, but bad in your case since we are trying parameters that are almost certainly bad.
What to do instead
trial.suggest_float selects between the min and max values provided in a continuous manner, unless the "step" argument is provided.
For example, trial.suggest_float("x", 0, 10) can return 0.0, 6.5, 3.25846, or anything else between 0 and 10. With step=0.5, it can only return numbers divisible by 0.5.
Sadly, the Optuna docs state:
The step and log arguments cannot be used at the same time. To set the step argument to a float number, set the log argument to False.
However, you can get around this. What I would suggest is having Optuna give you the exponent, and you calculate the actual value yourself. For example:
exp = trial.suggest_int("exp", -6, 0)
x = 10 ** exp
Essentially, any time you can express the parameters you want as some function (in this case 10^x for integer x between -6 and 0), it is better to code that function directly, with Optuna supplying the inputs, than to bend Optuna's functions too far to your purposes.
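A minimal sketch of the exponent approach described above. The helper name suggest_power_of_ten and the placeholder loss are made up for illustration and are not part of Optuna; the only Optuna call assumed is trial.suggest_int.

```python
import math


def suggest_power_of_ten(trial, name, low_exp, high_exp):
    """Return 10**k for an integer k suggested in [low_exp, high_exp].

    Hypothetical helper (not an Optuna function): the sampler only sees the
    ordered integer exponent, so it can follow trends across magnitudes.
    """
    exponent = trial.suggest_int(f"{name}_exp", low_exp, high_exp)
    # Use a float base so negative exponents are always valid.
    return 10.0 ** exponent


def objective(trial):
    lam = suggest_power_of_ten(trial, "lambda", -6, 0)
    # Placeholder loss for illustration only: smallest at lambda = 1e-3.
    return (math.log10(lam) + 3.0) ** 2


# Typical use, assuming Optuna is installed:
#   import optuna
#   study = optuna.create_study(direction="minimize")
#   study.optimize(objective, n_trials=20)
```

With this naming, the study records the parameter lambda_exp (the exponent) rather than lambda itself, so the actual value has to be recomputed as 10 ** exponent when reading the results.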

For selecting from a list, use suggest_categorical.
trial.suggest_categorical("lambda", [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0])


Exclude/Ignore data region in polynomial fit (zfit)

I wanted to know if there's a way to exclude one or more data regions in a polynomial fit. Currently this doesn't seem to work as I would expect. Here is a small example:
import numpy as np
import pandas as pd
import zfit
# Create test data
left_data = np.random.uniform(0, 3, size=1000).tolist()
mid_data = np.random.uniform(3, 6, size=5000).tolist()
right_data = np.random.uniform(6, 9, size=1000).tolist()
testsample = pd.DataFrame(left_data + mid_data + right_data, columns=["x"])
# Define fit parameter
coeff1 = zfit.Parameter('coeff1', 0.1, -3, 3)
coeff2 = zfit.Parameter('coeff2', 0.1, -3, 3)
# Define Space for the fit
obs_all = zfit.Space("x", limits=(0, 9))
# Perform the fit
bkg_fit = zfit.pdf.Chebyshev(obs=obs_all, coeffs=[coeff1, coeff2], coeff0=1)
new_testsample = zfit.Data.from_pandas(obs=obs_all, df=testsample.query("x<3 or x>6"), weights=None)
nll = zfit.loss.UnbinnedNLL(model=bkg_fit, data=new_testsample)
minimizer = zfit.minimize.Minuit()
result = minimizer.minimize(nll)
[Image: TestSample.png]
Here I've created a small test sample from three uniformly distributed data sets. I only want to use the data in x < 3 OR x > 6 and ignore the 'peak' in between. Because the two outer regions have equal shape and height, I'd expect coeff1 and coeff2 to be (nearly) zero and the fitted curve to be a straight, horizontal line. Obviously this doesn't happen, because zfit assumes that there are simply no entries between 3 and 6.
I also tried using MultiSpaces to ignore that region via
limit1 = zfit.Space("x", limits=(0, 3))
limit2 = zfit.Space("x", limits=(6, 9))
obs_data = limit1 + limit2
But this leads to a
ValueError: obs need to be a Space with exactly one limit if rescaling is requested.
Does anyone have an idea how to solve this?
Thanks in advance ^^

Indeed, this is a bit of a tricky problem, but one that may just need a small update in zfit.
What you are doing is correct: simply use only the data in the desired region. However, this is not the whole story because there is a "normalization range": probabilistically speaking, it's like a conditioning on a certain region as we know the data can only be in a specific region. Hence the normalization of the PDF should only integrate over the included (LOW and HIGH) regions.
This can normally be done in two ways:
Using multispace
Using the multispace property, as you do. This should work (though it is most probably not the way to go in the future), except for a quirk in the polynomial PDFs: the polynomials are defined from -1 to 1, so the data is currently rescaled to lie within -1 and 1 (using the "space" property of the PDF). That rescaling currently requires a simple space with a single limit (a multispace could in principle also be allowed, by using the minimum and maximum of its limits).
Simultaneous fit
As mentioned in the comments by @jtlz2, you can do a simultaneous fit. That is nothing to worry about; it simply splits the likelihood into two parts. As the likelihood is a product of probabilities, we can conceptually split it into two products and multiply them (or add their logs).
So you can have the pdf fit the lower region and the upper at the same time. However, this does not solve the problem of the normalization: what should the PDF be normalized to? We will run into the same problem.
Solution 1: different space and norm
The space and the normalization range are, however, not the same thing. By default, the space (usually called 'obs') is also used as the normalization range, but that is not required. So you could use one space going from the lowest to the largest point as the obs and then set the norm range to your multispace (set_norm should do it, or set_norm_range if you're not using the newest version). This, I think, should do the trick.
Solution 2: manual re-scaling
The actual problem is that it complains about the re-scaling to -1 and 1 that can't be done. Every polynomial which does that can also be told not to do that by using the apply_scaling=False argument. With that, you're responsible to scale the data within -1 and 1 (as the polynomials are not defined outside) and there should not be any error.
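For Solution 2, the re-scaling itself is just an affine map of the fit range onto [-1, 1]. A minimal sketch, assuming the full range (0, 9) from the question; rescale_to_unit_interval is a hypothetical helper, and whether zfit applies exactly this map internally is an assumption:

```python
import numpy as np


def rescale_to_unit_interval(x, lower, upper):
    """Affinely map values from [lower, upper] onto [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return 2.0 * (x - lower) / (upper - lower) - 1.0


# Edges of the two regions kept in the question: (0, 3) and (6, 9).
region_edges = np.array([0.0, 3.0, 6.0, 9.0])
scaled_edges = rescale_to_unit_interval(region_edges, 0.0, 9.0)
```

After such a manual re-scaling, the fitted coefficients refer to the scaled variable, so the limits of the spaces used for the fit would presumably have to be given in scaled coordinates as well.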

Vectorized computation of log(n!)

I have an (arbitrarily shaped) array X of integers, and I would like to compute the logarithm of the factorial of each entry (Precisely, not through the Gamma function).
The numbers are big enough that
np.log(scipy.special.factorial(X))
is unfeasible. So I want to do something like np.sum(np.log(np.arange(2,X+1)), axis=-1)
But the arange() function gives a different size for each entry, so this doesn't work. I thought about padding with ones, but I'm not sure how to do this.
Can this be done in a vectorized way?

I don't see what problem you have with the gamma function. The gamma function isn't an approximation, and while approximations may be involved in the computation of scipy.special.gammaln, there's no reason to expect those approximations to be worse than the error involved in computing the result manually. scipy.special.gammaln seems like the perfect tool for the job:
X_log_factorials = scipy.special.gammaln(X+1)
If you want to do this manually anyway, you could take the logarithms of all positive integers up to the maximum of your array, compute a cumulative sum, and then select the log-factorials you're interested in:
logarithms = numpy.log(numpy.arange(1, X.max()+1))
log_factorials = numpy.cumsum(logarithms)
X_log_factorials = log_factorials[X-1]
(If you want to handle 0!, you will need to make a minor adjustment, such as by setting X_log_factorials[X==0] = 0.)
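One way to fold the 0! adjustment into the manual approach is to prepend a zero to the cumulative sum, so that the table can be indexed directly by n. A sketch, assuming X holds non-negative integers (log_factorial is just an illustrative name):

```python
import numpy as np


def log_factorial(X):
    """Elementwise log(n!) for an array of non-negative integers."""
    X = np.asarray(X)
    # log(1), log(2), ..., log(max(X))
    logarithms = np.log(np.arange(1, X.max() + 1))
    # table[n] == log(n!), with table[0] == log(0!) == 0
    table = np.concatenate(([0.0], np.cumsum(logarithms)))
    # Fancy indexing keeps the (arbitrary) shape of X.
    return table[X]


X = np.array([[0, 1, 5], [10, 20, 3]])
X_log_factorials = log_factorial(X)
```

The lookup table has max(X) + 1 entries, so this only makes sense when the largest entry is small enough to fit in memory; otherwise gammaln is the more practical choice.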

scipy leastsq fit - penalize certain solutions

I have implemented an algorithm that is able to fit multiple data sets at the same time. It is based on this solution: multi fit
The target function is too complex to show here (LaFortune scatter model), so I will use the target function from the solution for explanation:
def lor_func(x,c,par):
    a,b,d=par
    return a/((x-c)**2+b**2)
How can I punish the fitting algorithm if it chooses a parameter set par that results in lor_func < 0?
A negative value for the target function is valid from a mathematical point of view. So the parameter set par resulting in this negative target function might be the solution with the least error. But I want to exclude such solutions as they are not physically valid.
A function like:
def lor_func(x,c,par):
    a,b,d=par
    value = a/((x-c)**2+b**2)
    return max(0, value)
does not work as the fit returns wrong data as it optimizes the 0-values too. The result will then be different from the correct one.

You could use the bounds argument of scipy.optimize.least_squares:
res = least_squares(func, x_guess, args=(Gd, K),
                    bounds=([0.0, -100, 0, 0],
                            [1.0, 0.0, 10, 1]),
                    max_nfev=100000, verbose=1)
like I did here:
Suggestions for fitting noisy exponentials with scipy curve_fit?
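Applied to the example function from the question, the bounds idea could look roughly like this. It is only a sketch on synthetic, noise-free data: the unused parameter d is dropped, and it relies on the fact that for this particular function the sign of the output is the sign of a, so a lower bound of 0 on a rules out negative values. For a more complex model, the bounds that guarantee a non-negative output would have to be worked out separately.

```python
import numpy as np
from scipy.optimize import least_squares


def lor_func(x, c, par):
    a, b = par
    return a / ((x - c) ** 2 + b ** 2)


def residuals(par, x, y, c):
    # least_squares minimizes the sum of squares of this vector.
    return lor_func(x, c, par) - y


# Synthetic data generated with a = 2.0, b = 1.5 (illustrative values).
x = np.linspace(-5.0, 5.0, 101)
c = 0.5
y = lor_func(x, c, (2.0, 1.5))

res = least_squares(
    residuals,
    x0=[1.0, 1.0],
    args=(x, y, c),
    # a >= 0 keeps lor_func non-negative; b > 0 avoids a zero denominator.
    bounds=([0.0, 1e-6], [np.inf, np.inf]),
)
fitted = lor_func(x, c, res.x)
```

Unlike clipping the model output with max(0, value), the bounds restrict the parameters themselves, so the optimizer never evaluates a physically invalid parameter set.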

Using python SALib saltelli.sample method with boolean or discrete input parameters

I would like to use the Sobol method to run a sensitivity analysis on a complex model in python. This model includes continuous, discrete, as well as boolean input parameters.
Is it possible to use the SALib python package to perform this analysis? Specifically, can I use the saltelli.sample method to generate quasi-random sets of input parameters when some of them don't actually have upper or lower bounds but instead only several discrete options (like 0 or 1, for instance)?
Here is an example of the saltelli.sample method (which generates low-discrepancy sequences) from the SALib documentation:
from SALib.sample import saltelli
import numpy as np
problem = {
    'num_vars': 3,
    'names': ['x1', 'x2', 'x3'],
    'bounds': [[-np.pi, np.pi]]*3
}
# Generate samples
param_values = saltelli.sample(problem, 1000, calc_second_order=True)
My question is, how (if at all) can I use this method if my input parameters are more like this:
x1: continuous (so possible values could be 0, 0.01, 1.2...0.987)
x2: boolean (so possible values are 0 or 1)
x3: discrete (so possible values are 0, 0.25, 0.5, 0.75, or 1)

Solution posted on Github and wanted to share here for others:
Right now there is no way to (properly) sample discrete or boolean values. So I'd suggest a hack: sample a continuous range and round to the nearest value you want.
If it's a boolean variable, sample on [0,1] and just round up or down. If it's discrete with N outcomes, sample on [0,N] and round to the nearest integer.
There was a blog post about this a while back:
https://waterprogramming.wordpress.com/2014/02/11/extensions-of-salib-for-more-complex-sensitivity-analyses/ (item 2 on the list)
It is a little hacky, but I think this is the more-or-less accepted way of doing things, especially if you're mixing continuous and discrete variables.
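A sketch of that hack for the three variables in the question. Two assumptions to flag: the random array below is only a stand-in for the output of saltelli.sample with bounds [0, 1] for every variable, and snap_to_levels (a made-up helper, not part of SALib) uses equal-width bins instead of plain rounding so that every level gets the same share of the sampled range.

```python
import numpy as np


def snap_to_levels(u, levels):
    """Map samples u in [0, 1] onto discrete levels using equal-width bins."""
    u = np.asarray(u, dtype=float)
    levels = np.asarray(levels)
    # Bin index 0..len(levels)-1; u == 1.0 is clipped into the last bin.
    idx = np.minimum((u * len(levels)).astype(int), len(levels) - 1)
    return levels[idx]


# Stand-in for param_values = saltelli.sample(problem, ...) on [0, 1]^3.
rng = np.random.default_rng(0)
param_values = rng.uniform(0.0, 1.0, size=(8, 3))

x1 = param_values[:, 0]                                    # continuous
x2 = snap_to_levels(param_values[:, 1], [0, 1])            # boolean
x3 = snap_to_levels(param_values[:, 2], [0, 0.25, 0.5, 0.75, 1])  # discrete
model_inputs = np.column_stack([x1, x2, x3])
```

The model would then be evaluated on model_inputs, while the sampling and the problem definition stay continuous.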

Integer step size in scipy optimize minimize

I have a computer vision algorithm I want to tune up using scipy.optimize.minimize. Right now I only want to tune up two parameters but the number of parameters might eventually grow so I would like to use a technique that can do high-dimensional gradient searches. The Nelder-Mead implementation in SciPy seemed like a good fit.
I got the code all set up but it seems that the minimize function really wants to use floating point values with a step size that is less than one. The current parameters are both integers: one has a step size of one and the other has a step size of two (i.e. the value must be odd; if it isn't, the thing I am trying to optimize will convert it to an odd number). Roughly, one parameter is a window size in pixels and the other parameter is a threshold (a value from 0-255).
For what it is worth I am using a fresh build of scipy from the git repo. Does anyone know how to tell scipy to use a specific step size for each parameter? Is there some way I can roll my own gradient function? Is there a scipy flag that could help me out? I am aware that this could be done with a simple parameter sweep, but I would eventually like to apply this code to much larger sets of parameters.
The code itself is dead simple:
import numpy as np
from scipy.optimize import minimize
from ScannerUtil import straightenImg
import bson
def doSingleIteration(parameters):
    # do some machine vision magic
    # return the difference between my value and the truth value
    ...
parameters = np.array([11,10])
res = minimize( doSingleIteration, parameters, method='Nelder-Mead',options={'xtol': 1e-2, 'disp': True,'ftol':1.0,}) #not sure if these params do anything
print("~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~")
print(res)
This is what my output looks like. As you can see we are repeating a lot of runs and not getting anywhere in the minimization.
+++++++++++++++++++++++++++++++++++++++++
[ 11. 10.] <-- Output from scipy minimize
{'block_size': 11, 'degree': 10} <-- input to my algorithm rounded and made int
+++++++++++++++++++++++++++++++++++++++++
120 <-- output of the function I am trying to minimize
+++++++++++++++++++++++++++++++++++++++++
[ 11.55 10. ]
{'block_size': 11, 'degree': 10}
+++++++++++++++++++++++++++++++++++++++++
120
+++++++++++++++++++++++++++++++++++++++++
[ 11. 10.5]
{'block_size': 11, 'degree': 10}
+++++++++++++++++++++++++++++++++++++++++
120
+++++++++++++++++++++++++++++++++++++++++
[ 11.55 9.5 ]
{'block_size': 11, 'degree': 9}
+++++++++++++++++++++++++++++++++++++++++
120
+++++++++++++++++++++++++++++++++++++++++
[ 11.1375 10.25 ]
{'block_size': 11, 'degree': 10}
+++++++++++++++++++++++++++++++++++++++++
120
+++++++++++++++++++++++++++++++++++++++++
[ 11.275 10. ]
{'block_size': 11, 'degree': 10}
+++++++++++++++++++++++++++++++++++++++++
120
+++++++++++++++++++++++++++++++++++++++++
[ 11. 10.25]
{'block_size': 11, 'degree': 10}
+++++++++++++++++++++++++++++++++++++++++
120
+++++++++++++++++++++++++++++++++++++++++
[ 11.275 9.75 ]
{'block_size': 11, 'degree': 9}
+++++++++++++++++++++++++++++++++++++++++
120
+++++++++++++++++++++++++++++++++++++++++
~~~
SNIP
~~~
+++++++++++++++++++++++++++++++++++++++++
[ 11. 10.0078125]
{'block_size': 11, 'degree': 10}
+++++++++++++++++++++++++++++++++++++++++
120
Optimization terminated successfully.
Current function value: 120.000000
Iterations: 7
Function evaluations: 27
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
status: 0
nfev: 27
success: True
fun: 120.0
x: array([ 11., 10.])
message: 'Optimization terminated successfully.'
nit: 7

Assuming that the function to minimize is arbitrarily complex (nonlinear), this is a very hard problem in general. It cannot be guaranteed to be solved optimally unless you try every possible option. I do not know if there are any integer-constrained nonlinear optimizers (I somewhat doubt it), and I will assume you know that Nelder-Mead should work fine if it were a continuous function.
Edit: Considering the comment from @Dougal I will just add here: set up a coarse+fine grid search first; if you then feel like checking whether Nelder-Mead works (and converges faster), the points below may help...
But maybe some points that help:
Considering how difficult the whole integer constraint is, maybe it would be an option to do some simple interpolation to help the optimizer. It should still converge to an integer solution. Of course this requires calculating extra points, but it might solve many other problems. (Even in linear integer programming it's common to solve the unconstrained system first, AFAIK.)
Nelder-Mead starts with N+1 points. These are hard-wired in scipy (at least older versions) to (1+0.05) * x0[j] (for j in all dimensions, unless x0[j] is 0), which you will see in your first evaluation steps. Maybe these can be supplied in newer versions; otherwise you could just change/copy the scipy code (it is pure python) and set it to something more reasonable. Or, if you feel that is simpler, scale all input variables down so that (1+0.05)*x0 is of sensible size.
Maybe you should cache all function evaluations, since if you use Nelder-Mead I would guess you can always run into duplicate evaluations (at least at the end).
You have to check how likely it is that Nelder-Mead will just shrink to a single value and give up, because it always finds the same result.
You generally must check whether your function is well behaved at all... This optimization is doomed if the function does not change smoothly over the parameter space, and even then it can easily run into local minima if it has any. (Since you cached all evaluations, see the caching point above, you could at least plot those and have a look at the error landscape without needing to do any extra evaluations.)
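The caching point above could be sketched like this. The names snap_params, make_cached and expensive_loss are hypothetical, the toy loss stands in for the real machine vision run, and block_size / degree are taken from the log output in the question:

```python
def snap_params(params):
    """Snap raw optimizer values to the grid the algorithm really uses."""
    block_size = 2 * int(round((params[0] - 1) / 2)) + 1  # odd: 1, 3, 5, ...
    degree = int(round(params[1]))
    return block_size, degree


def make_cached(func):
    """Wrap func(block_size, degree) so each grid point is evaluated once."""
    cache = {}

    def wrapped(params):
        key = snap_params(params)
        if key not in cache:
            cache[key] = func(*key)
        return cache[key]

    wrapped.cache = cache  # exposed so the evaluated points can be plotted
    return wrapped


calls = []


def expensive_loss(block_size, degree):
    # Toy stand-in for the real machine vision run.
    calls.append((block_size, degree))
    return (block_size - 15) ** 2 + (degree - 4) ** 2


cached_loss = make_cached(expensive_loss)
for point in ([11.0, 10.0], [11.55, 10.0], [11.0, 10.25]):
    cached_loss(point)  # all three snap to the same grid point (11, 10)
```

Passing cached_loss to minimize instead of the raw function means the repeated requests inside one grid cell cost nothing, and cached_loss.cache afterwards holds every grid point that was actually evaluated.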

Unfortunately, Scipy's built-in optimization tools don't easily allow for this. But never fear; it sounds like you have a convex problem, and so you should be able to find a unique optimum, even if it won't be mathematically pretty.
Two options that I've implemented for different problems are creating a custom gradient descent algorithm, and using bisection on a series of univariate problems. If you're doing cross-validation in your tuning, your loss function unfortunately won't be smooth (because of noise from cross-validation on different datasets), but will be generally convex.
To implement gradient descent numerically (without having an analytical method for evaluating the gradient), choose a test point and a second point that is delta away from your test point in all dimensions. Evaluating your loss function at these two points can allow you to numerically compute a local subgradient. It is important that delta be large enough that it steps outside of local minima created by cross-validation noise.
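A sketch of such a numerical gradient. Note that it is a variation on the description above: it uses a central difference per coordinate (two evaluations per parameter) rather than a single offset point, and coarse_gradient is an illustrative name. The step delta is deliberately large, e.g. at least one grid step for integer parameters.

```python
import numpy as np


def coarse_gradient(loss, point, delta):
    """Central-difference gradient estimate with a deliberately coarse step."""
    point = np.asarray(point, dtype=float)
    # Allow one common step or one step per parameter.
    delta = np.broadcast_to(np.asarray(delta, dtype=float), point.shape)
    grad = np.empty_like(point)
    for i in range(point.size):
        step = np.zeros_like(point)
        step[i] = delta[i]
        grad[i] = (loss(point + step) - loss(point - step)) / (2.0 * delta[i])
    return grad


def toy_loss(p):
    # Smooth convex stand-in for a real tuning loss.
    return (p[0] - 3.0) ** 2 + 2.0 * (p[1] + 1.0) ** 2


grad = coarse_gradient(toy_loss, [0.0, 0.0], delta=1.0)
```

A descent step would then move against grad and snap the result back to the integer grid before the next evaluation.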
A slower but potentially more robust alternative is to implement bisection for each parameter you're testing. If you know that the problem is jointly convex in your two parameters (or n parameters), you can separate this into n univariate optimization problems, and write a bisection algorithm which recursively homes in on the optimal parameters. This can help handle some types of quasiconvexity (e.g. if your loss function takes a background noise value for part of its domain, and is convex in another region), but requires a good guess as to the bounds for the initial iteration.
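For a single integer parameter, the univariate search could be sketched as a ternary search, which is one bisection-style scheme among several. It assumes the loss is strictly unimodal on the given bounds; flat plateaus, such as a constant background value, can mislead it. integer_ternary_search is an illustrative name.

```python
def integer_ternary_search(f, lo, hi):
    """Return an integer in [lo, hi] minimizing f.

    Assumes f is strictly unimodal on the integers lo..hi.
    """
    while hi - lo > 2:
        third = (hi - lo) // 3
        m1 = lo + third
        m2 = hi - third
        if f(m1) < f(m2):
            hi = m2 - 1  # the minimum lies left of m2
        else:
            lo = m1 + 1  # the minimum lies right of m1
    # At most three candidates remain; check them directly.
    return min(range(lo, hi + 1), key=f)


# Toy example: a threshold in 0..255 with its optimum at 7.
best_threshold = integer_ternary_search(lambda t: (t - 7) ** 2, 0, 255)
```

For a parameter that must be odd, the same search can be run over k with the actual value computed as 2 * k + 1.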
If you simply snap the requested x values to an integer grid without fixing xtol to map to that gridsize, you risk having the solver request two points within a grid cell, receiving the same output value, and concluding that it is at a minimum.
No easy answer, unfortunately.

Snap your floats x, y (a.k.a. winsize, threshold) to an integer grid inside your function, like this:
def func( x, y ):
    x = round( x )
    y = round( (y - 1) / 2 ) * 2 + 1 # 1 3 5 ...
    ...
Then Nelder-Mead will see function values only on the grid, and should give you near-integer x, y.
(If you'd care to post your code someplace, I'm looking for test cases for a Nelder-Mead
with restarts.)

The Nelder-Mead minimize method now lets you specify the initial simplex vertex points, so you should be able to set the simplex points far apart, and the simplex will then flop around and find the minimum and converge when the simplex size drops below 1.
https://docs.scipy.org/doc/scipy/reference/optimize.minimize-neldermead.html#optimize-minimize-neldermead
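A minimal sketch of that idea for a two-parameter problem like the one in the question. The objective is a toy stand-in that snaps its inputs to the integer grid, and the simplex vertices and tolerances are illustrative guesses rather than recommended values:

```python
import numpy as np
from scipy.optimize import minimize


def snapped_objective(p):
    # Snap to the grid: odd block size, integer degree.
    block_size = 2 * int(round((p[0] - 1) / 2)) + 1
    degree = int(round(p[1]))
    # Toy loss with its optimum at block_size = 15, degree = 4.
    return float((block_size - 15) ** 2 + (degree - 4) ** 2)


# N + 1 = 3 vertices, spread over several grid cells in each direction.
init_simplex = np.array([[11.0, 10.0], [21.0, 10.0], [11.0, 2.0]])

res = minimize(
    snapped_objective,
    x0=init_simplex[0],
    method="Nelder-Mead",
    options={"initial_simplex": init_simplex, "xatol": 1.0, "fatol": 0.5},
)
```

Because the vertices start several grid cells apart, the first reflections and expansions see different function values instead of the flat plateau inside a single cell. Whether the run ends at the true optimum still depends on the shape of the objective.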

The problem is that the algorithm gets stuck trying to shrink its (N+1) simplex.
I'd highly recommend that anyone new to the concept learn more about the geometrical shape of a simplex and figure out how the input parameters relate to the points on the simplex. Once you get a grasp of that, then as I.P. Freeley suggested, the problem can be solved by defining strong initial points for your simplex. Note that this is different from defining your x0 and goes into Nelder-Mead's dedicated options. Here is an example of a higher (4-) dimensional problem. Also note that the initial simplex has to have N+1 points: in this case 5, and in your case 3.
init_simplex = np.array([[1, .1, .3, .3], [.1, 1, .3, .3], [.1, .1, 5, .3],
[.1, .1, .3, 5], [1, 1, 5, 5]])
minimum = minimize(Optimize.simplex_objective, x0=np.array([.01, .01, .01, .01]),
method='Nelder-Mead',
options={'adaptive': True, 'xatol': 0.1, 'fatol': .00001,
'initial_simplex': init_simplex})
In this example the x0 gets ignored because of the definition of initial_simplex. Another useful option in high-dimensional problems is the 'adaptive' option, which takes the number of parameters into account when setting the model's operational coefficients (i.e. α, γ, ρ and σ for reflection, expansion, contraction and shrink respectively). And if you haven't already, I'd also recommend familiarizing yourself with the steps of the algorithm.
Now, as for the reason this problem is happening: it's because the method gets no good results from an expansion, so it keeps shrinking the simplex smaller and smaller, trying to find a better solution that may or may not exist.
