AMPL vs GAMS MINLP portfolio optimization syntax - Python

I am looking for a MINLP optimizer to solve a portfolio optimisation problem that minimizes x'.S.x, where x is a vector and S is a given matrix. There are integer constraints on which the elements of x depend, e.g. x[i] = g[i].K[i], where g[i] is an integer and K[i] is a given vector; we need to find the g[i]s while minimizing the objective.
I am considering using AMPL or GAMS. The main program is in Python. I am not sure whether these are the best MINLP tools out there, but there seem to be some examples on both websites. Regarding the matrix multiplication in the minimization objective, it is not clear to me whether there is a simple way of writing this in AMPL, or do I need to write it as an algebraic expansion? Can you provide an example of the x'.S.x operation in the AMPL language?
As for GAMS, I see the package is free only for usage with a limited number of variables. I was therefore leaning towards AMPL, but for smaller problems GAMS might be the solution if I cannot figure out the AMPL notation for matrix-vector multiplications.

The AMPL syntax is very straightforward:
sum{i in I, j in I} x[i]*S[i,j]*x[j]
Note that many portfolio models do not need a full-blown MINLP solver but can be solved with the quadratic (and SOCP) capabilities present in systems like CPLEX and Gurobi. Your question is somewhat difficult to parse (at least for me), but I believe your model falls in this category.
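For intuition, that AMPL sum is exactly the quadratic form x'Sx. A quick way to sanity-check the algebraic expansion, sketched here in plain Python with a made-up 2x2 matrix (not data from the question):

```python
# Sanity check: the expanded double sum equals the quadratic form x'Sx.
# S and x are made-up example data.
S = [[2.0, 0.5],
     [0.5, 1.0]]   # symmetric, covariance-like matrix
x = [3.0, 4.0]

# Algebraic expansion, mirroring: sum{i in I, j in I} x[i]*S[i,j]*x[j]
n = len(x)
quad = sum(x[i] * S[i][j] * x[j] for i in range(n) for j in range(n))
print(quad)  # 46.0
```

With NumPy arrays the same value would be `x @ S @ x`, so the AMPL expansion loses nothing relative to matrix notation.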

Related

Binary variables for minimization by scipy differential evolution

I have a non-linear minimization problem that takes a combination of continuous and binary variables as input. Think of it as a network flow problem with valves, for which the throughput can be controlled, and with pumps, for which you can change the direction.
A "natural," minimalistic formulation could be:
arg( min( f(x1,y2,y3) )) s.t.
x1 \in [0,1] //a continuous variable
y2,y3 \in {0,1} //two binary variables
The objective function is deterministic but expensive to evaluate. If I leave out the binary variables, SciPy's differential evolution algorithm turns out to be a useful solution approach for my problem (converging faster than basin hopping).
There is some evidence available already with regard to the inclusion of integer variables in a differential evolution-based minimization problem. The suggested approaches turn y2,y3 into continuous variables x2,x3 \in [0,1], and then modify the objective function as follows:
(i) f(x1, round(x2), round(x3))
(ii) f(x1,x2,x3) + K( (x2-round(x2))^2 + (x3-round(x3))^2 )
with K a tuning parameter
A third, and probably naive approach would be to combine the binary variables into a single continuous variable z \in [0,1], and thereby to reduce the number of optimization variables.
For instance,
if z<0.25: y2=y3=0
elif z<0.5: y2=1, y3=0
elif z<0.75: y2=0, y3=1
else: y2=y3=1.
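The first two approaches are just wrappers around the objective. A minimal sketch of (i) and (ii), with a hypothetical stand-in objective f and the penalty weight K from above:

```python
# Hypothetical stand-in objective with one continuous and two binary inputs.
def f(x1, y2, y3):
    return (x1 - 0.3) ** 2 + 2 * y2 - y3

# (i) Round the relaxed variables before evaluating f.
def f_rounded(x1, x2, x3):
    return f(x1, round(x2), round(x3))

# (ii) Keep the relaxation but penalize the distance from integrality.
def f_penalized(x1, x2, x3, K=10.0):
    return f(x1, x2, x3) + K * ((x2 - round(x2)) ** 2 + (x3 - round(x3)) ** 2)

print(f_rounded(0.3, 0.2, 0.9))    # evaluates f(0.3, 0, 1) = -1.0
print(f_penalized(0.3, 0.2, 0.9))  # relaxed value plus penalty term
```

Either wrapped function can then be handed to a continuous optimizer such as scipy.optimize.differential_evolution with bounds (0, 1) on the relaxed variables.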
Which one of the above should be preferred, and why? I'd be very curious to hear how binary variables can be integrated in a continuous differential evolution algorithm (such as Scipy's) in a smart way.
PS. I'm aware that there's some literature available that proposes dedicated mixed-integer evolutionary algorithms. For now, I'd like to stay with Scipy.
I'd be very curious to hear how binary variables can be integrated in a continuous differential evolution algorithm
wrapdisc is a thin wrapper package that lets you optimize binary variables alongside floats with various scipy.optimize optimizers. There is a usage example in its readme. With it, you don't have to adapt your objective function at all.
As of v2.0.0, it has two possible encodings for binary:
ChoiceVar: This uses one-hot max encoding. Two floats are used to represent the binary variable.
GridVar: This uses rounding. One float is used to represent the binary variable.
Although neither of these two variable types was made specifically for binary, they both support it just the same. On average, GridVar requires fewer function evaluations because it uses one less float than ChoiceVar.
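The two encodings boil down to how relaxed floats are decoded into a bit. A schematic illustration of the idea (this is not wrapdisc's actual code):

```python
# Schematic decoding of a binary variable from relaxed floats.
# Mimics the idea behind the two encodings; NOT wrapdisc's implementation.

def decode_choice(f0, f1):
    """One-hot max encoding: two floats, the larger one's index wins."""
    return 0 if f0 >= f1 else 1

def decode_grid(f):
    """Rounding encoding: a single float in [0, 1] is rounded."""
    return int(round(f))

print(decode_choice(0.2, 0.7))  # 1
print(decode_grid(0.2))         # 0
```

This makes the evaluation-count difference concrete: GridVar spends one optimizer dimension per binary variable, ChoiceVar spends two.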
When SciPy 1.9 is released, the differential_evolution function will gain an integrality parameter that allows the user to indicate which parameters should be treated as integers. For binary selection one would use bounds of (0, 1) for an integer parameter.

Python external constraint function in MINLP

Is it possible to add an external custom function as a dynamic constraint in Mixed Integer Nonlinear Programming libraries in Python? I am working with boolean variables and NumPy matrices (size m x n) where I want to minimize the sum of total values requested (e.g. tot_vals = 2, 3, ..., n). I want to add some "spatial" constraints; I've created the functions (based on boolean indexing) and I am trying to implement them in my optimization procedure. In CVXPY it fails, as I can only add CVXPY-formatted constraints (as far as I know); PuLP fails, as it works only for LP problems. Maybe a choice could be Pyomo, OpenOpt or PySCIPOpt?
Thank you in advance for your help
With PySCIPOpt this is possible. You would need to create a custom constraint handler which checks the current LP solution for feasibility and possibly adds valid inequalities to cut off the infeasibility at the next node.
One example of this procedure is the TSP implementation in PySCIPOpt. This is also explained in some more detail in this tutorial article about PySCIPOpt.

Gurobi-style model construction for Scipy.linprog?

I want to compare Gurobi's and SciPy's linear programming tools, such as linprog. SciPy requires you to specify problems in a matrix/list/vector form, while Gurobi works like here, such that
m = Model()
m.addVar(...)    # for variables
m.addConstr(...) # for constraints
m.update()       # for updating the model
m.optimize()     # for optimizing the model
m.params         # for getting parameters
m._vars          # for getting variables
In comparison, SciPy's linprog expects:
Minimize: c^T * x
Subject to: A_ub * x <= b_ub
A_eq * x == b_eq
c : array_like
Coefficients of the linear objective function to be minimized.
A_ub : array_like, optional
2-D array which, when matrix-multiplied by x, gives the values of the upper-bound inequality constraints at x.
b_ub : array_like, optional
1-D array of values representing the upper-bound of each inequality constraint (row) in A_ub.
A_eq : array_like, optional
2-D array which, when matrix-multiplied by x, gives the values of the equality constraints at x.
b_eq : array_like, optional
1-D array of values representing the RHS of each equality constraint (row) in A_eq.
bounds : sequence, optional
My goal is to write the model only once and still benchmark the results with both solvers. To speed up comparing the solvers:
Does there exist Gurobi-style model construction of LP problems for Scipy?
Does there exist some package to make the two methods interchangeable (I could write scipy-style for Gurobi or in Gurobi-style for Scipy)?
Does scipy provide any other interface to specify linear programming problems?
That sounds like a lot of work to show the obvious:
Commercial solvers like Gurobi are much faster and much more robust than non-commercial ones
There are also high-quality benchmarks by H. Mittelmann showing this (CLP and GLPK being the most popular non-commercial ones)
While scipy's linprog is ok, it's much worse than the open-source competition including CBC/CLP, GLPK, lpSolve...
Speed and robustness!
Also: scipy's linprog really seems unmaintained (see its open issues)
There are some ways you could do that:
A) Use linprog's way of problem definition and convert it to Gurobi-style
it is very easy to convert matrix form to Gurobi's modelling
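Approach A really is mechanical. A minimal sketch of the reverse direction, going from Gurobi-like dict-style constraints (variable name to coefficient) to linprog's A_ub/b_ub matrix form, using plain Python lists and made-up variable names:

```python
# Convert dict-style constraints {var: coef} <= rhs into linprog's
# dense matrix form (A_ub, b_ub). Variable names and data are made up.
variables = ["x", "y", "z"]
constraints = [
    ({"x": 1.0, "y": 2.0}, 10.0),   # x + 2y      <= 10
    ({"y": -1.0, "z": 3.0}, 6.0),   #     -y + 3z <= 6
]

# Each row of A_ub holds the coefficient of every variable (0 if absent).
A_ub = [[coefs.get(v, 0.0) for v in variables] for coefs, _rhs in constraints]
b_ub = [rhs for _coefs, rhs in constraints]

print(A_ub)  # [[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]]
print(b_ub)  # [10.0, 6.0]
```

The same row-building loop, run against a Gurobi model's variables and linear expressions, gives you the matrices linprog wants.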
B) Use cvxpy as the modelling tool, grab the standard form and write a wrapper for Gurobi (actually, there is one) and linprog (again easy). This would allow a very powerful modelling language to be used with both
Disadvantage: non-transparent transformations depending on the problem (e.g. abs(some_vector) might introduce auxiliary variables)
C) Write some MPS reader, or take one from other tools, to model your problems within Gurobi, write them out, and read & solve them within linprog
Candidate tools: cvxopt's MPS reader (well hidden in the docs), some GLPK interface or even some CBC interface
(maybe hidden transformations here too)
No matter what you do, analysing the solution process will be a big part of your code, as linprog might fail a lot. It is also not able to handle big sparse models.
Remarks based on your gurobi-example
Your example (TSP) is a MIP, not an LP
For MIPs, everything said above gets worse (especially the performance differences between commercial and open-source solvers)
There is no MIP solver within scipy!

Methods of discrete optimization of particular function in Python

I have a matrix over Z_2 with large dimensions (e.g. 20000 vectors of 200 elements). Each vector contains the same number of ones. I want to find a minimal set of vectors whose bitwise OR is the all-ones vector. This can be solved by dynamic programming, but the time complexity of that solution is atrocious. I want to apply some optimization like annealing or a genetic algorithm to find a more or less good approximation of the answer. But I have no experience optimizing such functions and just don't know what to try first or where to start. I want to learn some optimization in Python by working on this problem, so advice on a pythonic way of doing discrete optimization here would be appreciated!
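Before reaching for annealing or genetic algorithms, note that this is exactly minimum set cover (each vector is the set of positions where it has a one), for which the greedy heuristic is the standard first thing to try. A small sketch using Python integer bitmasks on a toy instance:

```python
# Greedy set-cover heuristic: repeatedly pick the vector that covers the
# most still-uncovered positions. Vectors are encoded as int bitmasks.
def greedy_cover(masks, n_bits):
    target = (1 << n_bits) - 1
    covered, chosen = 0, []
    while covered != target:
        # Index of the mask adding the most new bits to the cover.
        best = max(range(len(masks)),
                   key=lambda i: bin(masks[i] & ~covered).count("1"))
        if masks[best] & ~covered == 0:
            return None  # no vector adds coverage: instance is infeasible
        covered |= masks[best]
        chosen.append(best)
    return chosen

# Toy instance: 4 positions, three vectors.
masks = [0b1100, 0b0011, 0b0110]
print(greedy_cover(masks, 4))  # [0, 1] covers all four positions
```

The greedy choice gives a ln(n)-factor approximation guarantee, and a run like this also makes a good baseline to compare any annealing or genetic approach against.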

How do you show cost function per iteration in scikit-learn?

I've been running some linear/logistic regression models recently, and I wanted to know how you can output the cost function for each iteration. One of the parameters in scikit-learn's LinearRegression is max_iter, but in reality you need to see cost vs. iteration to find out what this value really needs to be, i.e. whether the benefit is worth the computational time of running more iterations, etc.
I'm sure I'm missing something but I would have thought there was a method that outputted this information?
Thanks in advance!
When fitting any estimator, one has to understand whether there is any iteration (implying computing a cost function) or an analytical exact solution.
Linear Regression
In fact, Linear Regression, i.e. minimization of the ordinary least squares, is not an algorithm but a minimization problem that can be solved using different techniques. Without getting into the details of the statistical part described here:
There are at least three methods used in practice for computing least-squares solutions: the normal equations, QR decomposition, and singular value decomposition.
As far as I got into the details of the code, it seems that the computational time is spent computing the analytical exact solution, not iterating over a cost function. But I bet this depends on your system being under-, well- or over-determined, as well as the language and library you are using.
Logistic Regression
Like Linear Regression, Logistic Regression is a minimization problem that can be solved using different techniques which, for scikit-learn, are: newton-cg, lbfgs, liblinear and sag.
As you mentioned, sklearn.linear_model.LogisticRegression includes the max_iter argument, meaning it includes iterations*. These stop either because the updated argument doesn't change anymore, up to a certain epsilon value, or because the maximum number of iterations is reached.
*As mentioned in the doc, it includes iterations only for some of the solvers:
Useful only for the newton-cg, sag and lbfgs solvers. Maximum number of iterations taken for the solvers to converge.
In fact, each solver involves its own implementation, such as here for the liblinear solver.
I would recommend using the verbose argument, maybe set to 2 or 3 for maximum verbosity. Depending on the solver, it might print the cost function error. However, I don't understand how you are planning to use this information.
Another solution might be to code your own solver and print the cost function at each iteration.
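As a concrete version of that last suggestion, here is a hand-rolled gradient-descent fit for one-feature least squares that records the cost at every iteration. It is a minimal illustrative sketch in plain Python, not scikit-learn's implementation:

```python
# Minimal gradient descent for 1-D least squares, recording the cost
# (mean squared error) at each iteration. Illustrative only.
def fit_gd(xs, ys, lr=0.05, n_iter=500):
    w, b = 0.0, 0.0
    costs = []
    n = len(xs)
    for _ in range(n_iter):
        preds = [w * x + b for x in xs]
        errs = [p - y for p, y in zip(preds, ys)]
        costs.append(sum(e * e for e in errs) / n)      # MSE this iteration
        w -= lr * 2 * sum(e * x for e, x in zip(errs, xs)) / n
        b -= lr * 2 * sum(errs) / n
    return w, b, costs

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]      # exactly y = 2x + 1
w, b, costs = fit_gd(xs, ys)
print(costs[0], costs[-1])     # cost shrinks across iterations
```

Plotting the costs list gives exactly the cost-vs-iteration curve the question asks for, which is how you would judge a sensible max_iter for an iterative solver.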
Curiosity killed the cat, but I checked the source code of scikit-learn, which involves quite a few more layers.
First, sklearn.linear_model.LinearRegression uses fit to train its parameters.
Then, in the source code of fit, they use the ordinary least squares solver of NumPy (source).
Finally, NumPy's least squares function uses the function scipy.linalg.lapack.dgelsd, a wrapper for the LAPACK (Linear Algebra PACKage) routine DGELSD, written in Fortran (source).
That is to say, getting into the error calculation, if any, is not easy for the scikit-learn developers. However, for the various uses of LinearRegression and many more I have had, the trade-off between cost function and iteration time is well addressed.
