In an optimization problem developed in PuLP, I use the following objective function:
objective = p.lpSum(vec[r] for r in range(0,len(vec)))
All variables are non-negative integers, so the sum over the vector gives the total number of units for my problem.
Now I am struggling with the fact that PuLP returns only one of many optimal solutions, and I would like to narrow the solution space down to favor the solution set with the smallest standard deviation of the decision variables.
E.g. say vec is a vector with elements 6 and 12. Then 7/11, 8/10, and 9/9 are equally feasible solutions, and I would like PuLP to arrive at 9/9.
Then the objective
objective = p.lpSum(vec[r]*vec[r] for r in range(0,len(vec)))
would obviously create a cost function that would help here, but alas, it is non-linear and PuLP throws an error.
Can anyone point me to a potential solution?
Instead of minimizing the standard deviation (which is inherently non-linear), you could minimize the range or bandwidth. Along the lines of:
minimize maxv-minv
maxv >= vec[r] for all r
minv <= vec[r] for all r
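A minimal PuLP sketch of this range formulation, assuming (hypothetically) that the optimal total from the original objective has already been found and is fixed by a constraint; n and total_units are placeholder values chosen to match the 6/12 example:

import pulp as p

n = 2                  # placeholder: two decision variables, as in the 6/12 example
total_units = 18       # placeholder: the fixed optimal total

prob = p.LpProblem("MinRange", p.LpMinimize)
vec = [p.LpVariable(f"v{r}", lowBound=0, cat=p.LpInteger) for r in range(n)]
maxv = p.LpVariable("maxv")
minv = p.LpVariable("minv")

prob += maxv - minv                  # minimize maxv - minv
prob += p.lpSum(vec) == total_units  # keep the original total fixed
for r in range(n):
    prob += maxv >= vec[r]           # maxv >= vec[r] for all r
    prob += minv <= vec[r]           # minv <= vec[r] for all r

prob.solve()
print([p.value(v) for v in vec])     # expected: [9, 9]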
I am new to linear programming and am hoping to get some help in understanding how to include intercept terms in the objective for a piecewise function (see the code example below).
import pulp as pl
# Pieces
pieces = [1, 2]
# Problem
prob = pl.LpProblem('Example', pl.LpMaximize)
# Decision Vars
x_vars = pl.LpVariable.dict('x', pieces, 0, None, pl.LpInteger)
y_vars = pl.LpVariable.dict('y', pieces, 0, None, pl.LpInteger)
# Objective
prob += (-500+10*x_vars[1]) + (150+9*x_vars[2]) + (2+9.1*y_vars[1]) + (4+6*y_vars[2])
# Constraints
prob += pl.lpSum(x_vars[i] for i in pieces) + pl.lpSum(y_vars[i] for i in pieces) <= 1100
prob += x_vars[1] <= 700
prob += x_vars[2] <= 700
prob += y_vars[1] <= 400
prob += y_vars[2] <= 400
# Solve
prob.solve()
# Results
for v in prob.variables():
    print(v.name, "=", pl.value(v))
The terms included in the objective function are the piecewise intercepts and coefficients obtained from univariate piecewise regression models. For example, the linear regression model for x is yhat=-500+10*x for the first piece, and yhat=150+9*x for the second piece. Likewise, for y we have yhat=2+9.1*y and yhat=4+6*y for the first and second pieces, respectively.
If I remove and/or change any of the intercept values, I arrive at the same solution. I would have thought that each intercept is required for producing the estimates in the objective function. Have I not specified the objective function properly? Or are the intercept terms not required (and therefore not taken into account) in this type of LP formulation?
I don't exactly understand what you are trying to achieve. But let me try to explain what we normally do when we talk about piecewise linear functions.
A piecewise linear function is completely determined by its breakpoints. E.g., the input is just these points:
xbar = [1,3,6,10]
ybar = [6,2,8,7]
These points you have to calculate in advance, outside the optimization model. The intercept and slope of each segment are encoded in these points. Note that the intercept cannot be ignored: dropping it would lead to a very different segment. Calculating these breakpoints needs care: without correct breakpoints, your model will not function properly.
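To make this concrete for the question above, here is a hypothetical sketch of turning the two fitted pieces for x into breakpoints, assuming the pieces meet where the two regression lines intersect (the actual break location must come from the regression fit):

# Piece 1: yhat = -500 + 10*x; piece 2: yhat = 150 + 9*x (from the question).
# Assumption: the pieces meet where the two lines intersect.
x_break = (150 - (-500)) / (10 - 9)                 # = 650
xbar = [0, x_break, 700]                            # domain: 0 <= x <= 700
ybar = [-500, -500 + 10 * x_break, 150 + 9 * 700]   # [-500, 6000, 6450]
print(list(zip(xbar, ybar)))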
When using such a piecewise linear function, we want to maintain a mapping between x and y (both decision variables). I.e., for any feasible solution we always want:
y = f(x)
where f represents the piecewise linear function. This means that besides choosing a segment, we need to interpolate between the breakpoints (i.e. we want to trace the piecewise curve through them). The formulations below essentially form the constraint y=f(x), but in such a way that it is accepted by a MIP (Mixed-Integer Programming) solver.
To interpolate between the breakpoints, we can use a lot of different formulations. The simplest is to use SOS2 variables. (SOS2 stands for Special Ordered Sets of Type 2, a construct that is supported by most high-end solvers.) The formulation would look like:

x = Σk λ[k]·xbar[k]
y = Σk λ[k]·ybar[k]
Σk λ[k] = 1
λ[k] ≥ 0, with λ forming a SOS2 set (at most two λ[k] are nonzero, and they must be adjacent)

Here x, y, and λ are decision variables (and xbar, ybar are data, i.e. constants). k is the set of points (here: k=1,...,4).
Not all solvers and modeling tools support SOS2 variables. Here is another formulation, using binary variables z[s] and segment slopes c[s] = (ybar[s+1] − ybar[s])/(xbar[s+1] − xbar[s]):

x = xbar[1] + Σs δ[s]
y = ybar[1] + Σs c[s]·δ[s]
δ[s] ≥ (xbar[s+1] − xbar[s])·z[s]
δ[s+1] ≤ (xbar[s+2] − xbar[s+1])·z[s]
0 ≤ δ[s] ≤ xbar[s+1] − xbar[s], z[s] ∈ {0,1}

Here s is the segment index: s=1,2,3. This is sometimes called the incremental formulation: δ[s+1] can only become positive once segment s has been filled completely (z[s]=1).
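As an illustration (my own sketch, not code from the original answer), the incremental formulation for the example breakpoints can be written in PuLP; the default CBC solver handles the binary variables:

import pulp as pl

xbar = [1, 3, 6, 10]
ybar = [6, 2, 8, 7]
n = len(xbar) - 1                                # number of segments
slope = [(ybar[s+1] - ybar[s]) / (xbar[s+1] - xbar[s]) for s in range(n)]

prob = pl.LpProblem('Piecewise', pl.LpMaximize)
x = pl.LpVariable('x', xbar[0], xbar[-1])
y = pl.LpVariable('y')
delta = [pl.LpVariable(f'delta_{s}', 0, xbar[s+1] - xbar[s]) for s in range(n)]
z = [pl.LpVariable(f'z_{s}', cat=pl.LpBinary) for s in range(n - 1)]

prob += y                                        # demo objective: maximize f(x)
prob += x == xbar[0] + pl.lpSum(delta)
prob += y == ybar[0] + pl.lpSum(slope[s] * delta[s] for s in range(n))
for s in range(n - 1):
    prob += delta[s] >= (xbar[s+1] - xbar[s]) * z[s]      # fill segment s first
    prob += delta[s+1] <= (xbar[s+2] - xbar[s+1]) * z[s]  # before entering s+1

prob.solve()
print(pl.value(x), pl.value(y))                  # expected: 6.0 8.0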
These are just two formulations. There are many others. Note that some solvers and modeling tools have special constructs to express piecewise linear functions. But all these share the idea of providing a collection of breakpoints.
This is very different from what you did. But this is what we typically do to model piecewise linear functions in Mixed-Integer Programming models.
A well-written reference is: H. Paul Williams, Model Building in Mathematical Programming, Wiley. You are encouraged to consult this practical book: it is very good.
I have to implement the cosine function in Python using its Taylor series. I have to print its values and the absolute and relative errors for all the values in listt below. Here is what I tried:
import math

def expression(x, k):
    return (-1)**k/(math.factorial(2*k))*(x**(2*k))

pi = math.pi
listt = [0, pi/6, pi/4, pi/3, pi/2, (2*pi)/3, (3*pi)/4, (5*pi)/6, pi]
sum = 0
for x in listt:
    k = 0
    relative_error = 1
    while relative_error > 10**(-15):
        sum += expression(x, k)
        absolute_error = abs(math.cos(x) - sum)
        relative_error = absolute_error/abs(math.cos(x))
        k += 1
    print("%.20f" % x, '\t', "%.20f" % sum, '\t', "%.20f" % math.cos(x), '\t', "%.20f" % absolute_error, '\t', "%.20f" % relative_error)
This, however, produces a huge error. The cause is probably that I perform all those subtractions. However, I don't know how to avoid them. I ran into the same problem when computing e^x for negative x, but there I just computed e^(-x) if x was negative and I could then just write that e^x=1/e^(-x). Is there some similar trick here?
Also, note that my while loop is infinite, because the relative error always stays huge.
Fatal error: You set sum=0 outside the loop over the test points. You need to reset it for every test point. This error could be easily avoided if you made the approximate cosine computation a separate function.
It is always a better idea to compute the terms of the series by multiplying the previous term by the ratio of consecutive terms. This avoids computing the factorial and all the difficulties that its growth may produce.
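A minimal sketch of both fixes: the term recurrence term_k = term_(k-1) * (-x^2)/((2k-1)(2k)), with the sum reset for every test point by moving the computation into a function. Terminating on the size of the term also avoids the division by zero at x = pi/2:

import math

def cos_taylor(x, tol=1e-15):
    # Taylor series for cos(x); each term is built from the previous one
    term = 1.0          # k = 0 term: x**0 / 0!
    total = term
    k = 0
    while abs(term) > tol:
        k += 1
        term *= -x * x / ((2*k - 1) * (2*k))   # ratio of consecutive terms
        total += term
    return total

pi = math.pi
for x in [0, pi/6, pi/4, pi/3, pi/2, 2*pi/3, 3*pi/4, 5*pi/6, pi]:
    approx, exact = cos_taylor(x), math.cos(x)
    print("%.6f  %.17f  %.17f  %.3e" % (x, approx, exact, abs(approx - exact)))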
That said, the growth of terms is the same as those for the exponential series, so the largest term is where 2*k is about |x|. For the given points that should not be prohibitively large.
To be a little more precise, the error of a cosine partial sum is smaller than the next term, as the series is alternating. The term of degree 4*k for |x| <= 4 has the approximate bound, using Stirling's formula for the factorial,
4^(4*k)/(4*k/e)^(4*k) = (e/k)^(4*k) < (3/k)^(4*k)
which for k=6 gives an upper bound of 2^(-24) ~ 10^(-7). Thus there should be no catastrophic errors for the test values.
See also Calculate maclaurin series for sin using C, Issue with program to approximate sin and cosine values, Sine calculation with Taylor series not working
I was trying to use FiPy to solve a set of PDEs when I realized the command sweep was not working the way I thought it would. Here is a sample with part of my code:
from pylab import *
import sys
from fipy import *
viscosity = 5.55555555556e-06
Pe =5.
pfi=100.
lfi=0.01
Ly=1.
Nx =200
Ny=100
Lx=Ly*Nx/Ny
dL=Ly/Ny
mesh = PeriodicGrid2DTopBottom(nx=Nx, ny=Ny, dx=dL, dy=dL)
x, y = mesh.cellCenters
xVelocity = CellVariable(mesh=mesh, hasOld=True, name='X velocity')
xVelocity.constrain(Pe, mesh.facesLeft)
xVelocity.constrain(Pe, mesh.facesRight)
rad=0.1
var1 = DistanceVariable(name='distance to center', mesh=mesh, value=numerix.sqrt((x-Nx*dL/2.)**2+(y-Ny*dL/2.)**2))
pi_fi= CellVariable(mesh=mesh, value=0.,name='Fluid-interface energy map')
pi_fi.setValue(pfi*exp(-1.*(var1-rad)/lfi), where=(var1 > rad) )
pi_fi.setValue(pfi, where=(var1 <= rad))
xVelocityEq = DiffusionTerm(coeff=viscosity) - ImplicitSourceTerm(pi_fi)
xres=10.
while xres > 1.e-6:
    xVelocity.updateOld()
    mySolver = LinearGMRESSolver(iterations=1000, tolerance=1.e-6)
    xres = xVelocityEq.sweep(var=xVelocity, solver=mySolver)
    print('Result =', xres)
# That's it
In short, I am declaring an equation called xVelocityEq and solving it using sweep. Here is my output:
Result = 0.0007856742013190237
Result = 6.414470433257661e-07
As you can see, the while loop ends after two iterations. My first question is: why is my first residual error (=0.0007856742013190237) higher than the solver's tolerance? I thought that, since xVelocityEq corresponds to a linear system, solver tolerance and residual error would mean the same thing.
If I increase the no. of iterations in mySolver from 1000 to 10000, I get the following output:
Result = 0.0007856742013190237
Result = 2.4619110931978988e-09
Why did the second residual change, given that the first remained the same?
If I increase the tolerance in mySolver from 1.e-6 to 7.e-4, I get the following output:
Result = 0.0007856742013190237
Result = 6.414470433257661e-07
Note that these residuals are the same as in the first output. Now if I try to further increase the tolerance to 8.e-4, here's what I get as output:
Result = 0.0007856742013190237
Result = 0.0007856742013190237
Result = 0.0007856742013190237
Result = 0.0007856742013190237
Result = 0.0007856742013190237
...
At this point I was completely lost. Why do the residuals have the same values for all solver tolerances smaller than 7.e-4? And why are these residuals constant and equal to 0.0007856742013190237 for solver tolerances higher than 7.e-4?
If I change the mySolver to LinearLUSolver (iterations=1000, tolerance=1.e-6), here's what I get:
Result = 0.0007856742013190237
Result = 1.6772757200988522e-18
Why in the world is my first residual the same as before, even though I have changed the solver?
why is my first residual error (=0.0007856742013190237) higher than the solver's tolerance?
The residual calculated by .sweep() is computed before the solver is invoked to calculate a new solution vector. The matrix L and right-hand-side vector b are assembled based on the initial value of the solution vector x.
The residual is a measure of how well the current solution vector satisfies the non-linear PDE. The solver tolerance places a limit on how hard the solver should work to satisfy the linear system of equations discretized from the PDE.
Even if the PDE is linear (e.g., the diffusion coefficient is not a function of the solution variable), the initial value presumably doesn't solve the PDE, so the residual is large. After the solver is invoked, x should solve the PDE to within the solver tolerance. If the PDE is non-linear, then a well-converged solution to the linear algebra is still probably not a good solution to the PDE; that's what sweeping is for.
I thought that, since xVelocityEq corresponds to a linear system, solver tolerance and residual error would mean the same thing.
There wouldn't be any utility in keeping track of both. In addition to the residual being computed before the solve and the solver tolerance being used to terminate the solve, there are different normalizations that can be used, and a lot of the solver documentation can be kind of sketchy. FiPy uses |L x - b|_2 as its residual. Solvers may normalize by the magnitude of b, the diagonal of L, or the phase of the moon, all of which can make it hard to directly compare the residual with the tolerance.
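A toy numpy illustration of the distinction (not FiPy's actual internals): the residual |L x - b|_2 is large when evaluated at the pre-solve value of x and small after the solve:

import numpy as np

# Hypothetical 2x2 system standing in for the discretized PDE
L = np.array([[4.0, -1.0],
              [-1.0, 4.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)                    # value before the solve
print(np.linalg.norm(L @ x - b))   # large: this is what sweep() reports

x = np.linalg.solve(L, b)          # what the linear solver produces
print(np.linalg.norm(L @ x - b))   # tiny: satisfied to solver precision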
Why did the second residual change, given that the first remained the same?
By allowing 10000 iterations instead of 1000, the solver was able to drive to a more exacting tolerance which, in turn, led to a smaller residual for the next sweep.
Why do the residuals have the same values for all solver tolerances smaller than 7.e-4? And why are these residuals constant and equal to 0.0007856742013190237 for solver tolerances higher than 7.e-4?
Probably because the solver is failing and so not changing the value of the solution vector. Some solvers don't report this. In other cases, we should be doing a better job of reporting that fact to you.
Why in the world is my first residual the same as before, even though I have changed the solver?
The residual is not a property of the solver. It is a property of the discretized system of equations that approximates your PDE. Those linear algebra equations are then the input to the solver.
I've implemented the mutual information formula in Python using pandas and numpy:
import numpy as np
import pandas as pd

def mutual_info(p):
    # p is a DataFrame holding the joint distribution p(x, y)
    p_x = p.sum(axis=1)   # marginal of the row variable
    p_y = p.sum(axis=0)   # marginal of the column variable
    I = 0.0
    for i_y in p.index:
        for i_x in p.columns:
            I += p.loc[i_y, i_x]*np.log2(p.loc[i_y, i_x]/(p_x[i_y]*p_y[i_x]))
    return I
However, if a cell in p has a zero probability, then np.log2(p.loc[i_y,i_x]/(p_x[i_y]*p_y[i_x])) is negative infinity, and multiplying it by the zero probability yields NaN.
What is the right way to work around that?
For various theoretical and practical reasons (e.g., see Competitive Distribution Estimation: Why is Good-Turing Good), you might consider never using a zero probability with the log loss measure.
So, say, if you have a probability vector p, then, for some small scalar α > 0, you would use α·u + (1 − α)·p, where u is the uniform vector (each entry 1/n). Unfortunately, there are no general guidelines for choosing α, and you'll have to assess this further down the calculation.
For the Kullback-Leibler distance, you would of course apply this to each of the inputs.
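A minimal sketch of that mixture applied to a joint distribution before calling the mutual_info function above (alpha = 1e-3 is an arbitrary placeholder, not a recommendation):

import pandas as pd

def smooth(p, alpha=1e-3):
    # Mix with the uniform distribution so no cell is exactly zero
    return alpha / p.size + (1 - alpha) * p

p = pd.DataFrame([[0.50, 0.00],
                  [0.25, 0.25]])
print(mutual_info(smooth(p)))   # finite now: the zero cell has been lifted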
I have a set of measured radii (t + epsilon + error) at equally spaced angles.
The model is a circle of radius R with center at (r, Alpha), with small added noise and some random error values which are much bigger than the noise.
The problem is to find the center of the circle model (r, Alpha) and the radius of the circle R. But the fit should not be too sensitive to the random errors (in the data points below, at positions 7 and 14).
Some radii could be missing, therefore the simple mean would not work here.
I tried least-squares optimization, but it reacts strongly to the errors.
Is there a way to optimize least deltas but not the least squares of delta in Python?
Model:
n=36
R=100
r=10
Alpha=2*Pi/6
Data points:
[95.85, 92.66, 94.14, 90.56, 88.08, 87.63, 88.12, 152.92, 90.75, 90.73, 93.93, 92.66, 92.67, 97.24, 65.40, 97.67, 103.66, 104.43, 105.25, 106.17, 105.01, 108.52, 109.33, 108.17, 107.10, 106.93, 111.25, 109.99, 107.23, 107.18, 108.30, 101.81, 99.47, 97.97, 96.05, 95.29]
It seems like your main problem here is going to be removing outliers. There are a couple of ways to do this, but for your application, your best bet is probably just to remove items based on their distance from the median (since the median is much less sensitive to outliers than the mean).
If you're using numpy, that would look like this:
import numpy as np

def remove_outliers(data_points, margin=1.5):
    nd = np.abs(data_points - np.median(data_points))
    s = nd / np.median(nd)
    return data_points[s < margin]
After which you should run least squares.
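For example, with some made-up values containing two obvious outliers, the function above keeps only the inliers:

import numpy as np

points = np.array([90.0, 92.0, 152.9, 91.0, 65.4, 93.0])
print(remove_outliers(points))   # -> [90. 92. 91. 93.]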
If you're not using numpy, you can do something similar with native Python lists:
def median(points):
    return sorted(points)[len(points) // 2]   # middle element (integer division)

def remove_outliers(data_points, margin=1.5):
    m = median(data_points)
    centered_points = [abs(point - m) for point in data_points]
    centered_median = median(centered_points)
    ratios = [datum / centered_median for datum in centered_points]
    return [point for i, point in enumerate(data_points) if ratios[i] < margin]
If you're looking to simply not count outliers as highly, you can just calculate the mean of your dataset, which is exactly what a least-squares fit of a constant produces.
If you're looking for something a little better, I might suggest throwing your data through some kind of low-pass filter, though I don't think that's really needed here. A simple low-pass filter would look as follows (note: alpha is a number you will have to fiddle with to get your desired output):
def low_pass(data, alpha):
    new_data = [data[0]]
    for i in range(1, len(data)):
        new_data.append(alpha * data[i] + (1 - alpha) * new_data[i-1])
    return new_data
At which point your least squares optimization should work fine.
Replying to your final question
Is there a way to optimize least deltas but not the least squares of delta in Python?
Yes: pick an optimization method (for example the downhill simplex implemented in scipy.optimize.fmin) and use the sum of absolute deviations as the merit function. Your dataset is small, so I suppose that any general-purpose optimization method will converge quickly. (In the case of non-linear least-squares fitting it is also possible to use a general-purpose optimization algorithm, but it's more common to use the Levenberg-Marquardt algorithm, which minimizes sums of squares.)
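A sketch of that approach for the circle model (the model function and the starting guess below are my own illustration, not from the original post): along direction theta, the distance from the measurement origin to a circle of radius R whose center sits at polar coordinates (r, alpha) is r*cos(theta - alpha) + sqrt(R^2 - r^2*sin(theta - alpha)^2).

import numpy as np
from scipy.optimize import fmin

data = np.array([95.85, 92.66, 94.14, 90.56, 88.08, 87.63, 88.12, 152.92,
                 90.75, 90.73, 93.93, 92.66, 92.67, 97.24, 65.40, 97.67,
                 103.66, 104.43, 105.25, 106.17, 105.01, 108.52, 109.33,
                 108.17, 107.10, 106.93, 111.25, 109.99, 107.23, 107.18,
                 108.30, 101.81, 99.47, 97.97, 96.05, 95.29])
theta = np.linspace(0, 2*np.pi, len(data), endpoint=False)

def model(params, theta):
    # Distance from the origin to the circle along direction theta
    R, r, alpha = params
    return r*np.cos(theta - alpha) + np.sqrt(R**2 - (r*np.sin(theta - alpha))**2)

def l1_merit(params):
    # Sum of absolute deviations instead of sum of squares
    return np.abs(data - model(params, theta)).sum()

R, r, alpha = fmin(l1_merit, x0=[100.0, 10.0, 1.0])
print(R, r, alpha)   # expect values near the stated model (R=100, r=10)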
If you are interested in when minimizing absolute deviations instead of squares has a theoretical justification, see Numerical Recipes, chapter on Robust Estimation.
From the practical side, the sum of absolute deviations may not have a unique minimum.
In the trivial case of two points, say (0,5) and (1,9), and a constant function y=a, any value of a between 5 and 9 gives the same sum (4). There is no such problem when the deviations are squared.
If minimizing absolute deviations does not work, you may consider a heuristic procedure to identify and remove outliers, such as RANSAC or ROUT.