Binary variables for minimization by scipy differential evolution - python

I have a non-linear minimization problem that takes a combination of continuous and binary variables as input. Think of it as a network flow problem with valves, for which the throughput can be controlled, and with pumps, for which you can change the direction.
A "natural," minimalistic formulation could be:
arg( min( f(x1,y2,y3) )) s.t.
x1 \in [0,1] //a continuous variable
y2,y3 \in {0,1} //two binary variables
The objective function is deterministic, but expensive to solve. If I leave away the binary variables, Scipy's differential evolution algorithm turns out to be a useful solution approach for my problem (converging faster than basin hopping).
There is some evidence available already with regard to the inclusion of integer variables in a differential evolution-based minimization problem. The suggested approaches turn y2,y3 into continuous variables x2,x3 \in [0,1], and then modify the objective function as follows:
(i) f(x1, round(x2), round(x3))
(ii) f(x1,x2,x3) + K( (x2-round(x2))^2 + (x3-round(x3))^2 )
with K a tuning parameter
A third, and probably naive approach would be to combine the binary variables into a single continuous variable z \in [0,1], and thereby to reduce the number of optimization variables.
For instance,
if z<0.25: y2=y3=0
elif z<0.5: y2=1, y3=0
elif z<0.75: y2=0, y3=1
else: y2=y3=1.
Which one of the above should be preferred, and why? I'd be very curious to hear how binary variables can be integrated in a continuous differential evolution algorithm (such as Scipy's) in a smart way.
PS. I'm aware that there's some literature available that proposes dedicated mixed-integer evolutionary algorithms. For now, I'd like to stay with Scipy.

I'd be very curious to hear how binary variables can be integrated in a continuous differential evolution algorithm
wrapdisc is a package that is a thin wrapper which will let you optimize binary variables alongside floats with various scipy.optimize optimizers. There is a usage example in its readme. With it, you don't have to adapt your objective function at all.
As of v2.0.0, it has two possible encodings for binary:
ChoiceVar: This uses one-hot max encoding. Two floats are used to represent the binary variable.
GridVar: This uses rounding. One float is used to represent the binary variable.
Although neither of these two variable types were made for binary, they both can support it just the same. On average, GridVar requires fewer function evaluations because it uses one less float than ChoiceVar.

When scipy 1.9 is released the differential_evolution function will gain an integrality parameter that will allow the user to indicate which parameters should be considered as integers. For binary selection one would use bounds of (0,1) for an integer parameter.

Related

Scalar minimization using scipy (`minimize` vs `minimize_scalar`)

I have a polynomial function for which I would like to find all local extrema. I can evaluate the polynomial via P(x) and to its derivative via d_P(x).
My first thought was to use minimize_scalar, however this does not seem to be able to take advantage of the fact that I can evaluate the derivative. Alternatively, I can use the more general minimize function and provide the gradient.
Is there a rule of thumb about which method will work better, or is this something where I should test out both methods and see what works better. Since the function I am optimizing is a polynomial (well behaved) I wonder if it really matters so much which I use, but if someone has a more background that would be great.
In particular, P(x) is the (unique) polynomial of degree n which alternatively attains a value of 1 or -1 on a set of n-1 points.
Here is a sample of the P(x) scaled so that P(0)=1. Note that the y axis is plotted on a symlog scale.
Since you have a continuous scalar function, the documentation of minimize_scalar suggests a more discrete optimization approach. Since it doesn't use gradient information you won't have trouble with noise/discontinuities/discreteness in your objective. However, if you use minimize in conjunction with a gradient based method then you will have trouble with convergence for noise/discontinuities/discreteness.
If the objective function is fist order continuous then both minimize and minimize_scalar should yield the same solution for a given bound.

The difference between C++ (LAPACK, sgels) and Python (Numpy, lstsq) results

I am comparing the numerical results of C++ and Python computations. In C++, I make use of LAPACK's sgels function to compute the coefficients of a linear regression problem. In Python, I use Numpy's linalg.lstsq function for a similar task.
What is the mathematical difference between the methods used by sgels and linalg.lstsq?
What is the expected error (e.g. 6 significant digits) when comparing the results (i.e. the regression coefficients) numerically?
FYI: I am by no means a C++ or Python expert, which makes it difficult to understand what is going on inside the functions.
Taking a look at the source of numpy, in the file linalg.py, lstsq relies on LAPACK's zgelsd() for complex and dgelsd() for real. Here are the differences to sgels():
dgelsd() is for double while sgels() is for float. There is a difference of precision...
dgels() makes use the QR factorization of the matrix A and assumes that A has full rank. The condition number of the matrix must be reasonable to get a significant result. See this course for getting the logic of the method. On the other hand, dgelsd() makes use of the Singular value decomposition of A. In particular, A can be rank defiencient and small singular values are discarted depending of the additional argument rcond or machine precision. Notice that numpy's default value for rcond is -1: negative values refers to machine precision. See this course for the logic.
According to the benchmark of LAPACK, on can expect dgels() to be about 5 time faster than dgelsd().
You may see significant differences between the result of sgels() and dgelsd() if the matrix is ill conditionned. Indeed, there is a bound on the error of the linear regression which depends on the algorithm and the value of rcond() that is used. See the user guide of LAPACK on, Error Bounds for Linear Least Squares Problems for estimates of the errors and Further Details: Error Bounds for Linear Least Squares Problems for technical details.
As a conclusion, sgels() and dgels() can be used if the measures in b are accurate and easily related to the explanatory variables. For instance, if sensors are placed at the exits of exhaust pipes, it's easy to guess which motors are running. But sometimes, the linear link between the source and the measures is not precisely known (uncertainty on the terms of A) or discriminating polluters on the base of measurements becomes more difficult (Some polluters are far from the set of sensors and A is ill-conditionned). In this kind of situation, dgelsd() and tunning the rcond argument can help. Whenever in doubt, use dgelsd() and estimate the error on the estimated x according to LAPACK's user guide.

Scipy Differential Evolution with integers

I'm trying to run an optimization with scipy.optimize.differential_evolution. The code calls for bounds for each variable in x. But I want to a solution where parts of x must be integers, while others can range freely as floats. The relevant part of my code looks like
bounds = [(0,3),(0,3),(0,3),???,???]
result = differential_evolution(func, bounds)
What do I replace the ???'s with to force those variables to be ints in a given range?
As noted in the comments there isn't direct support for a "integer constraint".
You could however minimize a modified objective function, e.g.:
def func1(x):
return func(x) + K * (x[3] - round(x[3]))**2
and this will force x[3] towards an integer value (unfortunately you have to tune the K parameter).
An alternative is to round (some of) the real-valued parameters before evaluating the objective function:
def func1(x):
z = x;
z[3] = round(z[3])
return func(z)
Both are common techniques to approach a discrete optimization problem using Differential Evolution and they work quite well.
Differential evolution can support integer constraint but the current scipy implementation would need to be changed.
From the scipy source code it appears that their DE is based Storn, R and Price, K, Differential Evolution - a Simple and Efficient Heuristic for Global Optimization over Continuous Spaces, Journal of Global Optimization, 1997
However there has been progress in this field as pointed out by this review paper Recent advances in differential evolution – An updated survey
There are a few papers which introduce changes to the algorithm so that it can handle ints. I have not had time to look at all the options but perhaps this paper could help.
An Improved Differential Evolution Algorithm for Mixed Integer Programming Problems
What do I replace the ???'s with to force those variables to be ints in a given range?
wrapdisc is a package that is a thin wrapper which will let you optimize bounded integer variables alongside floats with various scipy.optimize optimizers. There is a usage example in its readme. With it, you don't have to adapt your objective function at all. It internally uses rounding in order to support integers, although this detail is hidden from the user.

Solving a system of coupled iterative equations containing lots of heavy integrals and derivatives

I am trying to solve a system of coupled iterative equations, each of which containing lots of integrations and derivatives.
First I used maxima (embedded in Sage) to solve it analytically, but the solution was too dependent on the initial guesses I had make for my unknown functions, constant initial guesses yielded answering back almost immediately while symbolic functions when used as initial guesses yielded the system to go deep into calculations, sometimes seemingly never ending ones.
However, what I tried with Sage was actually a simplified version of my original equations so I thought it might be the case that I have no other choice rather than to treat the integrations and derivatives numerically, however, I had some not ignorable problems:
integrations were only allowed to have numerical limits and variables were not allowed as e.g. their upper limits (I thought maybe a numerical method algorithm is faster than the analytic one even-though I leave a variable or parameter in its calculations, but it just didn't work so).
integrands couldn't also admit extra variables and parameters w.r.t. which not being integrated.
the derivative function was itself a big obstacle, as I wasn't able to compute partial derivatives or to use a derivative in the integrand of an integral.
To get rid of all the problems with numerical derivative I substitute it by the symbolic diff() function and the speed improvement was still hopeful but the problems with numerical integration persist.
Now I have three questions:
a- Is it right to conclude there is no other way for me rather than to discretize the equations and do a complete numerical treatment instead of a mixed one?
b- If so then is there any way to do this automatically? My equations are not DE ones to use ODEint or else, they are iterative equations, I have integrations and derivatives only to update my unknowns at each step to their newer values.
c- If my calculations are so huge in size is there any suggestion on switching from python to fortran or things like that as well?
Best Regards

Continuous mutual information in Python

[Frontmatter] (skip this if you just want the question):
I'm currently looking at using Shannon-Weaver Mutual Information and normalized redundancy to measure the degree of information masking between bags of discrete and continuous feature values, organized by feature. Using this method, it is my goal to construct an algorithm that looks very similar to ID3, but instead of using Shannon entropy, the algorithm will seek (as a loop constraint) to maximize or minimize shared information between a single feature and a collection of features based on the complete input feature space, adding new features to the latter collection if (and only if) they increase or decrease mutual information, respectively. This, in effect, moves ID3's decision algorithm into pairspace, stapling an ensemble approach to it with all of the expected time and space complexities of both methods.
[/Frontmatter]
On to the question: I'm trying to get a continuous integrator working in Python using SciPy. Because I'm working with comparisons of discrete and continuous variables, my current strategy for each comparison for feature-feature pairs is as follows:
Discrete feature versus discrete feature: use the discrete form of mutual information. This results in a double summation of the probabilities, which my code handles without issue.
All other cases (discrete versus continuous, the inverse, and continuous versus continuous): use the continuous form, using a Gaussian estimator to smooth out the probability density functions.
It is possible for me to perform some kind of discretization for the latter cases, but because my input data sets are not inherently linear, this is potentially needlessly complex.
Here's the salient code:
import math
import numpy
import scipy
from scipy.stats import gaussian_kde
from scipy.integrate import dblquad
# Constants
MIN_DOUBLE = 4.9406564584124654e-324
# The minimum size of a Float64; used here to prevent the
# logarithmic function from hitting its undefined region
# at its asymptote of 0.
INF = float('inf') # The floating-point representation for "infinity"
# x and y are previously defined as collections of
# floating point values with the same length
# Kernel estimation
gkde_x = gaussian_kde(x)
gkde_y = gaussian_kde(y)
if len(binned_x) != len(binned_y) and len(binned_x) != len(x):
x.append(x[0])
y.append(y[0])
gkde_xy = gaussian_kde([x,y])
mutual_info = lambda a,b: gkde_xy([a,b]) * \
math.log((gkde_xy([a,b]) / (gkde_x(a) * gkde_y(b))) + MIN_DOUBLE)
# Compute MI(X,Y)
(minfo_xy, err_xy) = \
dblquad(mutual_info, -INF, INF, lambda a: 0, lambda a: INF)
print 'minfo_xy = ', minfo_xy
Note that overcounting exactly one point is done deliberately to prevent a singularity in SciPy's gaussian_kde class. As the size of x and y mutually approach infinity, this effect becomes negligible.
My current snag is in trying to get multiple integration working against a Gaussian kernel density estimate in SciPy. I've been trying to use SciPy's dblquad to perform the integration, but in the latter case, I receive an astounding spew of the following messages.
When I set numpy.seterr ( all='ignore' ):
Warning: The ocurrence of roundoff error is detected, which prevents
the requested tolerance from being achieved. The error may be
underestimated.
And when I set it to 'call' using an error handler:
Floating point error (underflow), with flag 4
Floating point error (invalid value), with flag 8
Pretty easy to figure out what's going on, right? Well, almost: IEEE 754-2008 and SciPy only tell me what's going on here, not why or how to work around it.
The upshot: minfo_xy generally resolves to nan; its sampling is insufficient to prevent information from becoming lost or invalid when performing Float64 math.
Is there a general workaround for this problem when using SciPy?
Even better: if there is a robust, canned implementation of continuous mutual information for Python with an interface that takes two collections of floating point values or a merged collection of pairs, it would resolve this complete problem. Please link it if you know of one that exists.
Thank you in advance.
Edit: this resolves the nan propagation issue in the example above:
mutual_info = lambda a,b: gkde_xy([a,b]) * \
math.log((gkde_xy([a,b]) / ((gkde_x(a) * gkde_y(b)) + MIN_DOUBLE)) \
+ MIN_DOUBLE)
However, the question of roundoff correction remains, as does the request for a more robust implementation. Any help in either domain would be greatly appreciated.
Before trying more radical solutions like reframing the problem or using different integration tools, see if this helps. Replace INF=float('INF') with INF=1E12 or some other large number -- that may eliminate NaN results created by simple arithmetic operations on the input variables.
No promises on this one, but it is sometimes helpful to try a quick fix before engaging in a significant algorithmic rewrite or substitution of alternate tools.

Categories