Add step size to a linear optimization - python

I'm working on a blending problem similar to the PuLP example.
I have this constraint to make sure the quantity produced is the desired one:
prob += lpSum([KG[i] * deposit_vars[i] for i in deposit]) == 64, "KGRequirement"
But I also need to add another constraint for a minimum value other than zero, because it is not convenient to take, for example, 0.002 kg of one ingredient; I have to take either 0 or at least 2 kg, so valid values are e.g. 0, 2, 2.3, 6, 3.23.
I tried to make it this way:
for i in deposit:
    prob += (KG[i] * deposit_vars[i] == 0) or (TM[i] * deposit_vars[i] >= 30)
But that is not working and it just makes the problem infeasible.
EDIT
This is my current code:
import pulp
from pulp import *
import pandas as pd
food = ["f1","f2","f3","f4"]
KG = [10,20,50,80]
Protein = [18,12,16,18]
Grass = [13,14,13,16]
price_per_kg = [15,11,10,22]
## protein,carbohydrates,kg
df = pd.DataFrame({"tkid":food,"KG":KG,"Protein":Protein,"Grass":Grass,"value":price_per_kg})
deposit = df["tkid"].values.tolist()
factor_volumen = 1
costs = dict((k,v) for k,v in zip(df["tkid"],df["value"]))
Protein = dict((k,v) for k,v in zip(df["tkid"],df["Protein"]))
Grass = dict((k,v) for k,v in zip(df["tkid"],df["Grass"]))
KG = dict((k,v) for k,v in zip(df["tkid"],df["KG"]))
prob = LpProblem("The Whiskas Problem", LpMinimize)
deposit_vars = LpVariable.dicts("Ingr",deposit,0)
prob += lpSum([costs[i]*deposit_vars[i] for i in deposit]), "Total Cost of Ingredients per can"
#prob += lpSum([deposit_vars[i] for i in deposit]) == 1.0, "PercentagesSum"
prob += lpSum([Protein[i] *KG[i] * deposit_vars[i] for i in deposit]) >= 17.2*14, "ProteinRequirement"
prob += lpSum([Grass[i] *KG[i] * deposit_vars[i] for i in deposit]) >= 12.8*14, "FatRequirement"
prob += lpSum([KG[i] * deposit_vars[i] for i in deposit]) == 14, "KGRequirement"
prob += lpSum([KG[i] * deposit_vars[i] for i in deposit]) <= 80, "KGRequirement1"
prob.writeLP("WhiskasModel.lp")
prob.solve()
# The status of the solution is printed to the screen
print("Status:", LpStatus[prob.status])
# Each of the variables is printed with its resolved optimum value
for v in prob.variables():
    print(v.name, "=", v.varValue)
# The optimised objective function value is printed to the screen
print("Total Cost of Ingredients per can =", value(prob.objective))
The new constraint I want to add relates to this part:
prob += lpSum([KG[i] * deposit_vars[i] for i in deposit]) <= 80, "KGRequirement1"
Where I want the product KG[i] * deposit_vars[i] to be either 0 or between a and b.

In the traditional linear programming formulation, all variables, the objective function, and the constraints need to be continuous. What you are asking is how to make this variable discrete, i.e. able to accept only the values a, b, ... and nothing in between. When you have a combination of continuous and discrete variables, that is called a mixed integer problem (MIP). See the PuLP documentation, which reflects this explanation; I suggest you carefully read the blending problem's mentions of "integers", which are scattered about the page. According to PuLP's documentation, it can solve MIP problems by calling external MIP solvers, some of which are already included.
Without a minimal working example, it is a little tricky to explain how to implement this. One way would be to specify the variable(s) as integer, with the values they can take supplied as a dict. Keeping the default solver, COIN-OR's CBC solver, will then solve the MIP. Meanwhile, here are a couple of resources to help you move forward:
https://www.toptal.com/algorithms/mixed-integer-programming#example-problem-scheduling
Note how it uses the CBC solver, which is the default solver, to solve this problem.
http://yetanothermathprogrammingconsultant.blogspot.com/2018/08/scheduling-easy-mip.html
A more explicit example of how they set up their integer variables and call the CBC solver.
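As a minimal illustration of the integer-variable route this answer suggests, in PuLP (reusing the names from the question's code; note that integrality alone does not capture the "0 or at least 2 kg" rule, which the next answer addresses):

# Declare the ingredient amounts as discrete rather than continuous
deposit_vars = LpVariable.dicts("Ingr", deposit, lowBound=0, cat="Integer")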

'or' is not something you can use in an LP/MIP model directly. Remember, an LP/MIP consists of a linear objective and linear constraints.
To model x = 0 or x ≥ L you can use so-called semi-continuous variables. Most advanced solvers support them, but I don't believe PuLP does. As a workaround you can also use a binary variable δ:
δ*L ≤ x ≤ δ*U
where U is an upper bound on x. It is easy to see this works:
δ = 0 ⇒ x = 0
δ = 1 ⇒ L ≤ x ≤ U
Semi-continuous variables don't require these constraints. Just tell the solver that variable x is semi-continuous with bounds [L, U] (or just L if there is no upper bound).
The constraint
a*x=0 or L ≤ a*x ≤ U
can be rewritten as
δ*L ≤ a*x ≤ δ*U
with δ a binary variable
This is a fairly standard formulation. Semi-continuous variables are often used in finance (portfolio models) to prevent small allocations.
All of this keeps the model perfectly linear (not quadratic), so one can use a standard MIP solver and a standard LP/MIP modeling tool such as Pulp.
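A minimal sketch of this binary-variable workaround applied to the question's model, assuming L = 2 kg as the minimum batch and U = 14 (the total KG requirement) as a safe upper bound; both numbers come from the question and should be adapted as needed:

L_min, U_max = 2, 14  # minimum batch size and an upper bound on KG[i]*deposit_vars[i]
delta = LpVariable.dicts("Use", deposit, cat="Binary")
for i in deposit:
    # forces KG[i]*deposit_vars[i] to 0 when delta[i] = 0, and into [L_min, U_max] when delta[i] = 1
    prob += KG[i] * deposit_vars[i] >= L_min * delta[i]
    prob += KG[i] * deposit_vars[i] <= U_max * delta[i]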

Related

Add penalty parameter to linear program pulp

I'm writing an LP problem in PuLP in Python. I'm not new to LP, but I am new to PuLP. So far I have a couple of constraints that are implemented correctly; they are simple and I know how they work. The problem is about assigning containers to voyages:
# All containers assigned to only 1 voyage
for i in cntrs:
    prob += lpSum([x[(i,v)] for v in voyages]) <= 1
# Container to right destination
for v in voyages:
    prob += lpSum([x[(i,v)] * posibleDest.loc[i,v] for i in cntrs]) == 1
# Weight capacity of voyages
for v in voyages:
    for b in barges:
        prob += lpSum([weight[i] * x[(i,v)] for i in cntrs]) <= voyWCap[v]
# Type capacity of voyages
for c in cats:
    for v in voyages:
        prob += cntrCat.loc[i,c] * x[(i,v)] <= bargeCATCAP.loc[c,b] * voyBarge.loc[b,v]
# TEU cap of voyages
for v in voyages:
    for b in barges:
        prob += lpSum([cntrTEU[i] * x[(i,v)] for i in cntrs]) <= voyTEUCap[v]
I tested the program and it works just fine; however, I'm stuck at a particular part. I want to add a parameter 'Tardy' which, if the container arrives too late/early, gives the container a 'penalty value'. My objective function is to minimize unused space, so adding the sum of penalties times a big number should 'push' the program into trying to get everything in the right time window.
Now my problem; I know this works, only not how to program it.
What I've done so far;
My objective function is as follows
prob += lpSum([(TEUcap[b] * voyBarge.loc[b,v]) - (x[(i,v)] * cntrTEU[i]) + Tardy[i] * M]
              for i in cntrs
              for b in barges
              for v in voyages)
Where M is a very big number
I've created a dictionary (Tardy) of 0's and a loop to fill that dictionary:
Tardy = dict.fromkeys(cntrs, 0)
for i in cntrs:
    for v in voyages:
        if cntrDest.dot(voyArive).loc[i,v] != 0:
            if cntrDest.dot(voyArive).loc[i,v] * x[(i,v)] <= (cntrOpen.dot(voyDest)).loc[i,v] * x[(i,v)]:
                Tardy[i] = 1
            elif cntrDest.dot(voyArive).loc[i,v] * x[(i,v)] >= (cntrClose.dot(voyDest)).loc[i,v] * x[(i,v)]:
                Tardy[i] = 1
            else:
                Tardy[i] = 0
In words: most of my parameters are matrices. If there is a value (not 0) for cntrDest.dot(voyArive).loc[i,v], it means there is an arrival datetime for container i on voyage v. If this value is greater than the close datetime, or smaller than the open datetime, that container should get a penalty (Tardy[container] = 1).
Because x is an LpVariable, x[(i,v)] is always 0 before the problem is solved; therefore Tardy is always set to 1.
I think I have to add a prob += somewhere, but I can't figure out how to make the program take it into account. If anyone could help me make it work, or has another suggestion on how to program it, that would be greatly appreciated!
Kind regards
You can't formulate your model "conditionally". Tardy is a variable in your model, and you cannot assign values to it within a conditional statement (if-elif-else) inside of a linear program, because the value of the dependent variables (x in this case) is not known when the problem is formulated and handed over to the solver. So we need to try something else and reformulate.
It isn't totally clear how you are handling times within your model, but it appears that the containers have due dates and the voyages have arrival times, which would be the basis for calculating Tardy. So you should introduce Tardy[i] as a non-negative real variable and simply constrain it to be larger than the difference between the arrival time and the due date. That only applies if container i actually goes on voyage v, so we multiply that delta by the selection binary variable x to enforce it only in the case of selection. In pseudo-code:
Tardy[i] >= (arrival_time[v] - due_time[i]) * x[i,v]
and then build that into your pulp model for each i,v in the model
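A hedged sketch of what that could look like in PuLP, where arrival_time and due_time are hypothetical dicts standing in for however the model actually stores its times:

# Tardy[i] is a non-negative tardiness variable per container; the constraint
# pushes it above zero whenever container i's selected voyage arrives late.
Tardy = LpVariable.dicts("Tardy", cntrs, lowBound=0)
for i in cntrs:
    for v in voyages:
        # (arrival_time[v] - due_time[i]) is a constant, so this stays linear
        prob += Tardy[i] >= (arrival_time[v] - due_time[i]) * x[(i,v)]

The objective can then penalize lpSum(Tardy[i] for i in cntrs) * M, as in the question.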

Add constraint in GurobiPy using conditional decision variable

I am new to optimisation and have a fairly basic query.
I have a model with two decision variables x and y that vary in time. I'd like to add a conditional constraint on y at time t depending upon x[t-1], so I've implemented the following code:
for t in model.timesteps:
    if t > 1:
        if model.x[t-1] <= 1:
            model.addConstr(model.y[t] >= 100)
        elif model.x[t-1] <= 0.5:
            model.addConstr(model.y[t] >= 50)
        elif model.x[t-1] <= 0.3:
            model.addConstr(model.y[t] >= 20)
However, the above code produces the error:
File "tempconstr.pxi", line 44, in gurobipy.TempConstr.bool
gurobipy.GurobiError: Constraint has no bool value (are you trying "lb <= expr <= ub"?)
Having done a little reading on previous related queries on this page, I believe I might need to use a binary indicator variable to implement the above, but I'm not certain whether that would solve the issue.
Could anyone point me in the right direction here please?
Many thanks in advance!
First, I assume the order of your conditions is incorrect (as written, the first branch always fires): you intended the right-hand side to be 20 for 0 ≤ x[t-1] ≤ 0.3, 50 for 0.3 < x[t-1] ≤ 0.5, and 100 for 0.5 < x[t-1] ≤ 1.0.
The bigger issue is that you were mixing Python programming with MIP modeling. What you need is to convert that logic into a MIP model. There are several ways to do this. One is to use a piecewise linear constraint to represent the right-hand-side values of the y[t] constraints. However, I prefer to model this explicitly. There are a few similar options; here is one I think is easy to understand: add binary variables z[0], z[1], and z[2] to represent the three ranges of x[t-1]. This gives the following code:
for t in model.timesteps:
    if t > 1:
        z = model.addVars(3, vtype='B', name="z_%s" % str(t))
        model.addConstr(x[t-1] <= z.prod([0.3, 0.5, 1.0]))
        model.addConstr(y[t] >= z.prod([20, 50, 100]))
        model.addConstr(z.sum() == 1)
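For reference, here is a self-contained sketch of the same formulation with the products written out via gp.quicksum; the toy horizon and the assumption that x lies in [0, 1] are for illustration only:

import gurobipy as gp
from gurobipy import GRB

m = gp.Model("conditional_floors")
T = range(1, 6)                # hypothetical timesteps
x = m.addVars(T, lb=0.0, ub=1.0, name="x")
y = m.addVars(T, lb=0.0, name="y")

breakpoints = [0.3, 0.5, 1.0]  # upper end of each x-range
floors = [20, 50, 100]         # matching lower bound on y

for t in T:
    if t > 1:
        z = m.addVars(3, vtype=GRB.BINARY, name="z_%s" % t)
        # x[t-1] must fit under the breakpoint of the chosen range
        m.addConstr(x[t-1] <= gp.quicksum(breakpoints[k] * z[k] for k in range(3)))
        # y[t] must respect the floor of the chosen range
        m.addConstr(y[t] >= gp.quicksum(floors[k] * z[k] for k in range(3)))
        # exactly one range is chosen
        m.addConstr(z.sum() == 1)

Assuming the objective pushes y down, the solver will pick the smallest floor consistent with x[t-1], which is the intended behaviour.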

Initial Guess/Warm start in CVXPY: give a hint of the solution

In this bit of code:
import cvxpy as cvx

# Examples: linear programming
# Create two scalar optimization variables.
x = cvx.Variable()
y = cvx.Variable()
# Create 4 constraints.
constraints = [x >= 0,
               y >= 0,
               x + y >= 1,
               2*x + y >= 1]
# Form objective.
obj = cvx.Minimize(x + y)
# Form and solve problem.
prob = cvx.Problem(obj, constraints)
prob.solve(warm_start=True)  # Returns the optimal value.
print("status:", prob.status)
print("optimal value", prob.value)
print("optimal var", x.value, y.value)
I'm looking for a way to choose the warm-start value myself (for example x = 1/2 and y = 1/2), rather than have it be the previous solver result.
Is there any way to give the solver this input? And if not, is there a non-commercial alternative to cvxpy?
To the 2021 readers: today it is impossible (in cvxpy) to help the solver with an initial guess. Warm start currently only works when you solve the same problem with different parameter values, initializing with the previous solution (see https://github.com/cvxpy/cvxpy/issues/1355).
You can manually assign the values using x.value = 1/2 and then pass the warm_start=True parameter to one of the available solvers. Keep in mind that not all solvers allow this; one that does is SCS.
More info available on: https://www.cvxpy.org/tutorial/advanced/index.html
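Putting the two answers together, a small sketch for the question's example (the 1/2 values are the asker's own guess, and SCS is just one solver reported to accept user-supplied warm starts):

x.value = 0.5
y.value = 0.5
# warm_start=True asks a supporting solver (e.g. SCS) to start from the assigned .value
prob.solve(solver=cvx.SCS, warm_start=True)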

Generalized Random Response for local differential privacy implementation

I have been tasked with implementing a local (non-interactive) differential privacy mechanism. I am working with a large database of census data. The only sensitive attribute is "Number of children" which is a numerical value ranging from 0 to 13.
I decided to go with the Generalized Random Response mechanism as it seems like the most intuitive method. This mechanism is described here and presented here.
After loading each value into an array (ignoring the other attributes for now), I perform the perturbation as follows.
import math
import random

d = 14   # values may range from 0 to 13
eps = 1  # epsilon level of privacy
p = math.exp(eps)/(math.exp(eps)+d-1)
q = 1/(math.exp(eps)+d-1)
p_dataset = []
for row in dataset:
    coin = random.random()
    if coin <= p:
        p_dataset.append(row)
    else:
        p_dataset.append(random.randint(0,13))
Unless I have misinterpreted the definition, I believe this will guarantee epsilon differential privacy on p_dataset.
However, I am having difficulty understanding how the aggregator must interpret this dataset. Following the presentation above, I attempted to implement a method for estimating the number of individuals who answered a particular value.
v = 0   # we are estimating the number of individuals in the dataset who answered 0
nv = 0  # number of users in the perturbed dataset who answered the value
for row in p_dataset:
    if row == v:
        nv += 1
Iv = nv * p + (n - nv) * q
estimation = (Iv - (n*q)) / (p-q)
I do not know if I have correctly implemented the method described as I do not completely understand what it is doing, and cannot find a clear definition.
Regardless, I used this method to estimate the total number of individuals who answered each value in the dataset, with epsilon ranging from 1 to 14, and then compared this to the actual values. The results are below (please excuse the formatting).
As you can see, the utility of the dataset suffers greatly for low values of epsilon. Additionally, when executed multiple times, there was relatively little deviation in estimations, even for small values of epsilon.
For example, when estimating the number of participants who answered 0 and using an epsilon of 1, all estimations seemed to be centered around 1600, with the largest distance between estimations being 100. Considering the actual value of this query is 5969, I am led to believe that I may have implemented something incorrectly.
Is this the expected behaviour of the Generalized Random Response mechanism, or have I made a mistake in my implementation?
I think that when giving a false answer, we cannot directly use p_dataset.append(random.randint(0,13)), because that range still contains the true answer:
import numpy as np

max_v = 13
min_v = 0
for row in dataset:  # row is the true value from the dataset
    coin = random.random()
    if coin <= p:
        p_dataset.append(row)
    else:
        # build the list of candidate answers, excluding the true value
        if row == min_v:
            ans = np.arange(min_v + 1, max_v + 1).tolist()
        elif row == max_v:
            ans = np.arange(min_v, max_v).tolist()
        else:
            ans = np.arange(min_v, row).tolist() + np.arange(row + 1, max_v + 1).tolist()
        # pick one of the d-1 other values (random.sample(ans, 1) would append a one-element list)
        p_dataset.append(random.choice(ans))
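On the aggregation side (which the question also asks about): with the corrected perturbation above, the probability of reporting the true value is p and of reporting any specific other value is (1-p)/(d-1), which equals q, so the standard unbiased frequency estimator is (nv - n*q) / (p - q). A sketch, assuming the perturbed data is a flat list of ints:

import math

def estimate_count(p_dataset, v, eps, d=14):
    # Estimate how many users truly hold value v, given the perturbed dataset.
    n = len(p_dataset)
    p = math.exp(eps) / (math.exp(eps) + d - 1)
    q = 1 / (math.exp(eps) + d - 1)
    nv = sum(1 for row in p_dataset if row == v)  # observed (perturbed) count
    return (nv - n * q) / (p - q)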

Find N symbols as unlike each other as possible, given a matrix of "likeness" of all existing symbols?

I am trying to generate a set of symbols (currently bezier curves), and I need them to be as unlike to each other as possible. I have a function that compares two symbols with each other, and therefore I have a big matrix of values of how alike any two given symbols are.
I'm sure there are better ways than brute-forcing this. I have been looking at various solutions to vaguely related geometric problems ("find the max-area polygon given a set of points"), but they all seem to be solved geometrically using pairs of coordinates, which I don't have. The same goes for "find numbers which sum up to a certain number"; I can't generalize the solutions given there.
I am absolutely at a loss on how to proceed now. It might be possible to somehow position the N symbols in an N-dimensional cube and play with the distances between the points as a n-dimensional polygon, but that's over my head both to formalize and to actually write.
Currently the most I have is finding the N most-alike and least-alike symbols, as long as N is something brute-forceable.
I'd be grateful for any pointers on how to proceed.
I'm assuming your problem works like this:
Given N symbols indexed by [1, ..., N], you can calculate the distance from symbol i to symbol j using a distance function dist(i, j). Your goal is to choose a subset of M symbols, where M <= N, that maximizes the sum of dist(i, j) over all selected pairs. (Note: I assume that dist(i, i) == 0 and dist(i, j) == dist(j, i), so we only consider pairs with i < j.)
I don't know if there's a direct way to solve this (you could try asking on math.stackexchange.com), but this can be written as an optimization problem with binary variables, roughly like this:
decision variables:
    # whether to include symbol i
    x[i] (binary, i in range(N))
objective:
    maximize sum(x[i]*x[j]*dist(i, j) for i in range(N) for j in range(i))
constraint:
    sum(x[i] for i in range(N)) == M
This version has a quadratic objective function, and not every solver can handle that. However, you can also write it as a linear programming problem with binary variables:
decision variables:
    # whether to include symbol i
    x[i] (binary, i in range(N))
    # whether both symbols i and j are included
    xx[i, j] (i in range(N), j in range(i))
objective:
    maximize sum(xx[i, j]*dist(i, j) for i in range(N) for j in range(i))
constraints:
    sum(x[i] for i in range(N)) == M
    xx[i, j] <= x[i], (i in range(N), j in range(i))
    xx[i, j] <= x[j], (i in range(N), j in range(i))
You can implement either of the problems above in a python-based optimization package and then have that package use a standard mixed-integer solver to find the answer. The pulp or pyomo packages would be good choices for this. In my experience, pulp is easier to get started with, but pyomo is better for large, standardized problems where you may run the same model with different data.
By default, pulp uses the CBC solver, which cannot handle quadratic objectives. So if you use the quadratic form, you will need to install a solver that can handle quadratic integer problems; if your problem is small, the community version of gurobi or cplex should work. Even if you use the linear form, you may want to use cplex or gurobi as the solver, because they are much faster than glpk and cbc.
Here's an example of solving the linear form of this model using the pulp package (you'll need to use pip install pulp to get that package):
from itertools import combinations
from pulp import *

def dist(i, j):
    # use simple distance for testing
    return abs(i - j)

N = 20  # number of symbols
M = 10  # number of symbols to select

# create problem object
prob = LpProblem("Greatest_Distance_Problem", LpMaximize)

# define decision variables
include = LpVariable.dicts("Include", range(N), 0, 1, LpInteger)
include_both = LpVariable.dicts("Include_Both", list(combinations(range(N), 2)), 0, 1)

# add objective function
prob += (
    lpSum([include_both[i, j]*dist(i, j) for i, j in combinations(range(N), 2)]),
    "Total Distance"
)

# define constraints
prob += (
    lpSum(include[i] for i in range(N)) == M,
    "Select M symbols"
)
for i, j in combinations(range(N), 2):
    prob += (include_both[i, j] <= include[i], "")
    prob += (include_both[i, j] <= include[j], "")

prob.solve()
print("Status: {}".format(LpStatus[prob.status]))
print("Included: {}".format([i for i in range(N) if value(include[i]) == 1]))
