Add penalty parameter to linear program pulp - python

I'm writing an LP problem with PuLP in Python. I'm not new to LP, but I am new to PuLP. So far I have a few constraints that are implemented correctly; they are simple and I know how they work. The problem is about assigning containers to voyages:
# All containers assigned to only 1 voyage
for i in cntrs:
    prob += lpSum([x[(i,v)] for v in voyages]) <= 1
# Container to right destination
for v in voyages:
    prob += lpSum([x[(i,v)] * posibleDest.loc[i,v] for i in cntrs]) == 1
# Weight capacity of voyages
for v in voyages:
    for b in barges:
        prob += lpSum([weight[i] * x[(i,v)] for i in cntrs]) <= voyWCap[v]
# Type capacity of voyages
for c in cats:
    for v in voyages:
        prob += cntrCat.loc[i,c] * x[(i,v)] <= bargeCATCAP.loc[c,b] * voyBarge.loc[b,v]
# TEU cap of voyages
for v in voyages:
    for b in barges:
        prob += lpSum([cntrTEU[i] * x[(i,v)] for i in cntrs]) <= voyTEUCap[v]
I tested the program and it works just fine, but I'm stuck on one particular part. I want to add a parameter 'Tardy' which gives a container a 'penalty value' if it arrives too late or too early. My objective function minimizes unused space, so adding the sum of penalties times a big number should 'push' the program into trying to get everything into the right time window.
Now my problem: I know this approach works, but not how to program it.
What I've done so far;
My objective function is as follows
prob += lpSum([(TEUcap[b] * voyBarge.loc[b,v]) - (x[(i,v)] * cntrTEU[i]) + Tardy[i] * M
               for i in cntrs
               for b in barges
               for v in voyages])
Where M is a very big number
I've created a dictionary (Tardy) with 0's and a loop to fill that dictionary;
Tardy = dict.fromkeys(cntrs,0)
for i in cntrs:
    for v in voyages:
        if cntrDest.dot(voyArive).loc[i,v] != 0:
            if cntrDest.dot(voyArive).loc[i,v] * x[(i,v)] <= (cntrOpen.dot(voyDest)).loc[i,v] * x[(i,v)]:
                Tardy[i] = 1
            elif cntrDest.dot(voyArive).loc[i,v] * x[(i,v)] >= (cntrClose.dot(voyDest)).loc[i,v] * x[(i,v)]:
                Tardy[i] = 1
            else:
                Tardy[i] = 0
In words: most of my parameters are matrices. If there is a nonzero value for cntrDest.dot(voyArive).loc[i,v], there is an arrival datetime for container i on voyage v. If this value is greater than the close datetime or smaller than the open datetime, that container should get a penalty (Tardy[container] = 1).
Because x is an LpVariable, x[(i,v)] is always 0 before the problem is solved; therefore Tardy is always 1.
I think I have to 'paste' a prob+= in somewhere, but I can't figure out how to let the program take it into account. If anyone could help me make it work, or have another suggestion on how to program it, that would be greatly appreciated!
Kind regards

You can't formulate your model "conditionally". Tardy is a variable in your model, and you cannot assign values to it inside a conditional statement (if-elif-else), because the values of the variables it depends on (x in this case) are not known when the problem is formulated and handed over to the solver. So we need to reformulate.
It isn't totally clear how you are handling times within your model, but it appears that the containers have due-dates and the voyages have arrival times, which would be the basis of calculating Tardy. So, you should introduce Tardy[i] as a non-negative real value and just constrain it to be larger than the difference between the arrival time and the due date. That assumes that container 'i' goes on that particular voyage 'v'. So, we need to multiply that delta by the selection binary variable x to only apply in the case of selection. In pseudo-code:
Tardy[i] >= (arrival_time[v] - due_time[i]) * x[i,v]
and then build that into your pulp model for each i,v in the model
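
A minimal sketch of how that constraint could look in PuLP. The toy data (two containers, two voyages, arrival_time, due_time, the penalty weight M and the one-container-per-voyage capacity) is hypothetical and only there to make the example solvable; the real model would keep its own objective terms and constraints. Earliness could be penalized the same way with a second non-negative variable bounded below by (open_time[i] - arrival_time[v]) * x[i,v].

```python
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum, PULP_CBC_CMD, value

# Hypothetical toy data: times in hours since some reference point.
cntrs = ["c1", "c2"]
voyages = ["v1", "v2"]
arrival_time = {"v1": 10, "v2": 20}
due_time = {"c1": 12, "c2": 15}
M = 1000  # penalty weight per hour of lateness (assumed value)

prob = LpProblem("tardiness_sketch", LpMinimize)

# x[(i, v)] = 1 if container i travels on voyage v.
x = {(i, v): LpVariable(f"x_{i}_{v}", cat=LpBinary) for i in cntrs for v in voyages}
# tardy[i] is a non-negative continuous variable, not a precomputed 0/1 flag.
tardy = {i: LpVariable(f"tardy_{i}", lowBound=0) for i in cntrs}

# Objective: only the penalty term is shown here.
prob += lpSum(M * tardy[i] for i in cntrs)

for i in cntrs:
    # Toy assignment rule: every container goes on exactly one voyage.
    prob += lpSum(x[(i, v)] for v in voyages) == 1
    for v in voyages:
        # Lateness only counts when container i is actually on voyage v.
        prob += tardy[i] >= (arrival_time[v] - due_time[i]) * x[(i, v)]

# Toy capacity: one container per voyage, so one container must take the late voyage.
for v in voyages:
    prob += lpSum(x[(i, v)] for i in cntrs) <= 1

prob.solve(PULP_CBC_CMD(msg=False))
total_penalty = value(prob.objective)
```

Because the solver chooses x and tardy together, it puts c1 on the early voyage and accepts five hours of lateness for c2, which is the cheaper of the two possible assignments.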

Related

Enumerating Constraints in LP Model in Pulp Package

I am creating a linear optimization model in Python using the PuLP package. I am wondering if there is a simple way to add constraints to a model without hard-coding every variable. For example, I am currently using a for loop to create the following set partitioning constraints:
for j in range(0,(len(excel_data_df))) :
    i = j * 3
    OptModel += x[i] + x[i + 1] + x[i + 2] == 1
This works for smaller problems. However as the variable i gets larger it becomes very time consuming to add all of the indices in the constraint.
Would it be possible to loop through all of the values which i can take and then generate the OptModel+= line of code automatically? For example if the variable i is 100 I would want the code to generate the following without having to manually add each x[i] variable.
for j in range(0,(len(excel_data_df))) :
    for i in range(0,100):
        OptModel += x[0] + x[1] + x[2] + ......+ x[100] == 1
You can use the lpSum function, which sums over a list of variables, so all you need is a way to generate the indexes you want to sum over. In your second example you could do the following (note that range(0, 100) stops at x[99]; use range(0, 101) if x[100] should be included):
OptModel += lpSum([x[i] for i in range(0, 100)]) == 1

My code suddenly stops writing at beyond 15500 iterations

I'm learning to code in Python and I'm trying to recreate a program I wrote in college.
The code is based on a 2D Ising model applied to epidemiology. What it does is:
- It constructs a 2D 100x100 array using numpy and assigns a value of -1 to each element.
- The energy is calculated by the function calc_h in the script below.
- The code then randomly selects a cell from the lattice, changes its value to 1, and calculates the energy of the system again.
- The code then checks whether the energy of the system is less than or equal to that of the previous configuration. If it is, it "accepts" the change. If it isn't, a probability is compared to a random number to determine whether the change is "accepted". This part is done in the metropolis function.
- The code repeats the process in a while loop until the maximum specified iteration, max_iterations.
- The code tallies the number of elements with a -1 value (the s variable) and the number of elements with a 1 value (the i variable) in the countSI function. The script appends to a text file every 500 iterations.
THE PROBLEM
I ran the script and besides taking too long to execute, the tallying stops at 15500. The code doesn't throw any error, but it just keeps going. I waited for around 3 hours for it to finish but it still goes only up to 15500 iterations.
I've tried commenting out the writing to csv block and instead printing the values first to observe it as it happens, and there I see, it stops at 15500 again.
I have no idea what's wrong as it doesn't throw in any error, and the code doesn't stop.
Here's the whole script. I put a description of what the part does below each block:
import numpy as np
import random as r
import math as m
import csv
init_size = input("Input array size: ")
size = int(init_size)
This part initializes the size of the 2D array. For observation purposes, I selected a 100 by 100 lattice.
def check_overflow(indx, size):
    if indx == size - 1:
        return -indx
    else:
        return 1
I use this function for the calc_h function, to initialize a circular boundary condition. Simply put, the edges of the lattice are connected to one another.
def calc_h(pop, J1, size):
    h_sum = 0
    r = 0
    c = 0
    while r < size:
        buffr = check_overflow(r, size)
        while c < size:
            buffc = check_overflow(c, size)
            h_sum = h_sum + J1*pop[r,c] * pop[r,c-1] * pop[r-1,c] * pop[r+buffr,c] * pop[r,c+buffc]
            c = c + 1
        c = 0
        r = r + 1
    return h_sum
This function calculates the energy of the system by summing, over all cells, the product of a cell's value and the values of its top, bottom, left and right neighbors, multiplied by a constant J.
def metropolis(h, h0, T_):
    if h <= h0:
        return 1
    else:
        rand_h = r.random()
        p = m.exp(-(h - h0)/T_)
        if rand_h <= p:
            return 1
        else:
            return 0
This determines whether the change from -1 to 1 is accepted depending on what calc_h gets.
def countSI(pop, sz, iter):
    s = np.count_nonzero(pop == -1)
    i = np.count_nonzero(pop == 1)
    row = [iter, s, i]
    with open('data.txt', 'a') as si_csv:
        tally_data = csv.writer(si_csv)
        tally_data.writerow(row)
        si_csv.seek(0)
This part tallies the number of -1's and 1's in the lattice.
def main():
    J = 1
    T = 4.0
    max_iterations = 150000
    population = np.full((size, size), -1, np.int8) #initialize population array
The 2D array is initialized in population.
    h_0 = calc_h(population, J, size)
    turn = 1
    while turn <= max_iterations:
        inf_x = r.randint(1,size) - 1
        inf_y = r.randint(1,size) - 1
        while population[inf_x,inf_y] == 1:
            inf_x = r.randint(1,size) - 1
            inf_y = r.randint(1,size) - 1
        population[inf_x, inf_y] = 1
        h = calc_h(population, J, size)
        accept_i = metropolis(h,h_0,T)
This is the main loop, where a random cell is selected, and whether the change is accepted or not is determined by the function metropolis.
        if (accept_i == 0):
            population[inf_x, inf_y] = -1
        if turn%500 == 0 :
            countSI(population, size, turn)
The script tallies every 500th iteration.
        turn = turn + 1
        h_0 = h
main()
The expected output is a text file with the tallies of the number of the s and i every 500th iteration. something that looks like this:
500,9736,264
1000,9472,528
1500,9197,803
2000,8913,1087
2500,8611,1389
3000,8292,1708
3500,7968,2032
4000,7643,2357
4500,7312,2688
5000,6960,3040
5500,6613,3387
6000,6257,3743
6500,5913,4087
7000,5570,4430
7500,5212,4788
I have no idea where to start at a solution. At first, I thought it was the writing to csv that's causing the problem, but probing through the print function proves otherwise. I tried to make it as concise as I can.
I hope you guys can help! I really wanna learn this language and start simulating a lot of stuff, and I think this mini project is a great starting step for me.
Thanks a lot!
Answer provided by @randomir in the comments:
Your code is probably wrong. It will block in that nested while loop whenever the number of spins to flip is smaller than the number of iterations. In your example from the previous comment, the size of the population is 10000 and you want to flip 15500 spins. Note that once a spin is flipped up (with 100% prob), it will be flipped down with a smaller prob, due to Metropolis sampling.
works.
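
One way to avoid the blocking inner loop, sketched under the assumption that the simulation should simply stop once no cell with value -1 is left. The helper name pick_susceptible and the 3x3 lattice are hypothetical; the energy calculation and the Metropolis step are left out so the termination behavior is easy to see.

```python
import numpy as np

def pick_susceptible(population, rng):
    """Return (row, col) of a random cell equal to -1, or None if none remain."""
    candidates = np.argwhere(population == -1)
    if len(candidates) == 0:
        return None  # nothing left to flip, so the caller can stop
    row, col = candidates[rng.integers(len(candidates))]
    return int(row), int(col)

rng = np.random.default_rng(0)
population = np.full((3, 3), -1, dtype=np.int8)

flips = 0
for turn in range(20):  # deliberately more iterations than cells
    cell = pick_susceptible(population, rng)
    if cell is None:
        break  # the rejection-sampling while loop would spin forever here
    population[cell] = 1
    flips += 1
```

Sampling only from the cells that are still -1 also removes the repeated retries that slow the original loop down as the lattice fills up.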

Add step size to a linear optimization

I'm working on a blending problem similar to the pulp example
I have this constraint to make sure the quantity produced is the desired one:
prob += lpSum([KG[i] * deposit_vars[i] for i in deposit]) == 64, "KGRequirement"
But I also need another constraint for a nonzero minimum value, because it is not practical to take, for example, 0.002 kg of an ingredient. I have to take either 0 or at least 2 kg, so valid amounts are e.g. 0, 2, 2.3, 6, 3.23.
I tried to make it this way:
for i in deposit:
    prob += (KG[i] * deposit_vars[i] == 0) or (TM[i] * deposit_vars[i] >= 30)
But that does not work; it just makes the problem infeasible.
EDIT
This my current code:
import pulp
from pulp import *
import pandas as pd
food = ["f1","f2","f3","f4"]
KG = [10,20,50,80]
Protein = [18,12,16,18]
Grass = [13,14,13,16]
price_per_kg = [15,11,10,22]
## protein,carbohydrates,kg
df = pd.DataFrame({"tkid":food,"KG":KG,"Protein":Protein,"Grass":Grass,"value":price_per_kg})
deposit = df["tkid"].values.tolist()
factor_volumen = 1
costs = dict((k,v) for k,v in zip(df["tkid"],df["value"]))
Protein = dict((k,v) for k,v in zip(df["tkid"],df["Protein"]))
Grass = dict((k,v) for k,v in zip(df["tkid"],df["Grass"]))
KG = dict((k,v) for k,v in zip(df["tkid"],df["KG"]))
prob = LpProblem("The Whiskas Problem", LpMinimize)
deposit_vars = LpVariable.dicts("Ingr",deposit,0)
prob += lpSum([costs[i]*deposit_vars[i] for i in deposit]), "Total Cost of Ingredients per can"
#prob += lpSum([deposit_vars[i] for i in deposit]) == 1.0, "PercentagesSum"
prob += lpSum([Protein[i] *KG[i] * deposit_vars[i] for i in deposit]) >= 17.2*14, "ProteinRequirement"
prob += lpSum([Grass[i] *KG[i] * deposit_vars[i] for i in deposit]) >= 12.8*14, "FatRequirement"
prob += lpSum([KG[i] * deposit_vars[i] for i in deposit]) == 14, "KGRequirement"
prob += lpSum([KG[i] * deposit_vars[i] for i in deposit]) <= 80, "KGRequirement1"
prob.writeLP("WhiskasModel.lp")
prob.solve()
# The status of the solution is printed to the screen
print ("Status:", LpStatus[prob.status])
# Each of the variables is printed with it's resolved optimum value
for v in prob.variables():
    print (v.name, "=", v.varValue)
# The optimised objective function value is printed to the screen
print ("Total Cost of Ingredients per can = ", value(prob.objective))
The new constraint I want to add is in this part:
prob += lpSum([KG[i] * deposit_vars[i] for i in deposit]) <= 80, "KGRequirement1"
where I want the product KG[i] * deposit_vars[i] to be either 0 or between a and b.
In the traditional linear programming formulation, all variables, objective function(s), and constraints need to be continuous. What you are asking is how to make this variable discrete, i.e. it can only take the values a, b, ... and nothing in between. A problem that combines continuous and discrete variables is called a mixed integer problem (MIP). See the PuLP documentation, which reflects this explanation. I suggest you carefully read what the blending problem pages say about "integers"; the mentions are scattered about the page. According to PuLP's documentation, it can solve MIP problems by calling an external MIP solver, some of which are already included.
Without a minimum working example, it is a little tricky to explain how to implement this. One way would be to specify the variable(s) as integer, with the values they can take given as a dict. The default solver, COIN-OR's CBC, will then solve the MIP. Meanwhile, here are a couple of resources to help you move forward:
https://www.toptal.com/algorithms/mixed-integer-programming#example-problem-scheduling
Note how it uses CBC solver, which is the default solver, to solve this problem
http://yetanothermathprogrammingconsultant.blogspot.com/2018/08/scheduling-easy-mip.html
A more explicit example on how they set-up their integer variables and call the CBC solver
'or' is not something you can use in an LP / MIP model directly. Remember, an LP/MIP consists of a linear objective and linear constraints.
To model x=0 or x≥L you can use so-called semi-continuous variables. Most advanced solvers support them. I don't believe PuLP supports them, however. As a workaround you can use a binary variable δ:
δ*L ≤ x ≤ δ*U
where U is an upperbound on x. It is easy to see this works:
δ = 0 ⇒ x = 0
δ = 1 ⇒ L ≤ x ≤ U
Semi-continuous variables don't require these constraints. Just tell the solver variable x is semi-continuous with bounds [L,U] (or just L if there is no upperbound).
The constraint
a*x=0 or L ≤ a*x ≤ U
can be rewritten as
δ*L ≤ x*a ≤ δ*U
δ binary variable
This is a fairly standard formulation. Semi-continuous variables are often used in finance (portfolio models) to prevent small allocations.
All of this keeps the model perfectly linear (not quadratic), so one can use a standard MIP solver and a standard LP/MIP modeling tool such as Pulp.
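
A minimal sketch of the binary-variable workaround in PuLP. The two ingredients, their costs, the bounds L and U, and the required total are all hypothetical values picked so that the minimum-amount rule changes the answer; they are not the data from the question.

```python
from pulp import LpProblem, LpMinimize, LpVariable, LpBinary, lpSum, PULP_CBC_CMD, value

# Hypothetical data: cost per kg, and "either 0 or between L and U kg".
cost = {"f1": 1.0, "f2": 3.0}
L, U = 2.0, 10.0
total = 11.0  # kg that must be produced in total

prob = LpProblem("semi_continuous_sketch", LpMinimize)
x = {i: LpVariable(f"x_{i}", lowBound=0) for i in cost}          # kg taken of ingredient i
delta = {i: LpVariable(f"delta_{i}", cat=LpBinary) for i in cost}  # 1 if ingredient i is used

prob += lpSum(cost[i] * x[i] for i in cost)
prob += lpSum(x[i] for i in cost) == total

for i in cost:
    # delta = 0 forces x = 0; delta = 1 forces L <= x <= U.
    prob += x[i] >= L * delta[i]
    prob += x[i] <= U * delta[i]

prob.solve(PULP_CBC_CMD(msg=False))
amounts = {i: x[i].varValue for i in cost}
total_cost = value(prob.objective)
```

Without the δ constraints (and with the same upper bound of 10 kg on each ingredient) the cheapest blend would take 10 kg of f1 and only 1 kg of f2. With them, that 1 kg is no longer allowed, so the solver moves to 9 kg of f1 and 2 kg of f2.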

Generalized Random Response for local differential privacy implementation

I have been tasked with implementing a local (non-interactive) differential privacy mechanism. I am working with a large database of census data. The only sensitive attribute is "Number of children" which is a numerical value ranging from 0 to 13.
I decided to go with the Generalized Random Response mechanism as it seems like the most intuitive method. This mechanism is described here and presented here.
After loading each value into an array (ignoring the other attributes for now), I perform the perturbation as follows.
d = 14 # values may range from 0 to 13
eps = 1 # epsilon level of privacy
p = (math.exp(eps)/(math.exp(eps)+d-1))
q = 1/(math.exp(eps)+d-1)
p_dataset = []
for row in dataset:
    coin = random.random()
    if coin <= p:
        p_dataset.append(row)
    else:
        p_dataset.append(random.randint(0,13))
Unless I have misinterpreted the definition, I believe this will guarantee epsilon differential privacy on p_dataset.
However, I am having difficulty understanding how the aggregator must interpret this dataset. Following the presentation above, I attempted to implement a method for estimating the number of individuals who answered a particular value.
v = 0 # we are estimating the number of individuals in the dataset who answered 0
nv = 0 # number of users in the perturbed dataset who answered the value
for row in p_dataset:
    if row == v:
        nv += 1
Iv = nv * p + (n - nv) * q
estimation = (Iv - (n*q)) / (p-q)
I do not know if I have correctly implemented the method described as I do not completely understand what it is doing, and cannot find a clear definition.
Regardless, I used this method to estimate the total amount of individuals who answered each value in the dataset with a value for epsilon ranging from 1 to 14, and then compared this to the actual values. The results are below (please excuse the formatting).
As you can see, the utility of the dataset suffers greatly for low values of epsilon. Additionally, when executed multiple times, there was relatively little deviation in estimations, even for small values of epsilon.
For example, when estimating the number of participants who answered 0 and using an epsilon of 1, all estimations seemed to be centered around 1600, with the largest distance between estimations being 100. Considering the actual value of this query is 5969, I am led to believe that I may have implemented something incorrectly.
Is this the expected behaviour of the Generalized Random Response mechanism, or have I made a mistake in my implementation?
I think that when generating a false answer we cannot directly use p_dataset.append(random.randint(0,13)), because its range includes the true answer. Instead, sample from the values other than the true one:
max_v = 13
min_v = 0
for row in dataset:  # row is the true value in dataset
    coin = random.random()
    if coin <= p:
        p_dataset.append(row)
    else:
        # candidate answers: every value in [min_v, max_v] except the true one
        a = np.arange(min_v, row).tolist()
        b = np.arange(row + 1, max_v + 1).tolist()
        ans = a + b
        # random.randint(0, 13) has a small problem here: it can also return the true value
        p_dataset.append(random.choice(ans))
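
A sketch of the perturbation together with a frequency estimator, using only the standard library. It assumes the usual generalized randomized response setup, where the true value is reported with probability p and each of the other d-1 values with probability q, so the expected observed count is n_v*p + (n - n_v)*q and the unbiased estimate is (nv - n*q) / (p - q). The function names and the synthetic dataset are hypothetical. Note that the two-step expression in the question, Iv = nv*p + (n - nv)*q followed by (Iv - n*q) / (p - q), simplifies algebraically to nv itself, which may explain why the estimates stayed far from the true counts.

```python
import math
import random

def grr_probabilities(eps, d):
    """Return (p, q): probability of reporting the truth, and of each other value."""
    denom = math.exp(eps) + d - 1
    return math.exp(eps) / denom, 1.0 / denom

def grr_perturb(value, eps, d, rng):
    """Report `value` with probability p, otherwise one of the other d-1 values uniformly."""
    p, _ = grr_probabilities(eps, d)
    if rng.random() < p:
        return value
    other = rng.randrange(d - 1)  # index into the d-1 values different from `value`
    return other if other < value else other + 1

def grr_estimate(reports, v, eps, d):
    """Unbiased estimate of how many users truly hold value v."""
    p, q = grr_probabilities(eps, d)
    n = len(reports)
    nv = sum(1 for r in reports if r == v)
    return (nv - n * q) / (p - q)

rng = random.Random(0)
d, eps = 14, 1.0
# Synthetic data: 6000 users answer 0, the remaining 14000 answer 1..13 at random.
truth = [0] * 6000 + [rng.randrange(1, d) for _ in range(14000)]
reports = [grr_perturb(t, eps, d, rng) for t in truth]
estimate = grr_estimate(reports, 0, eps, d)
```

With epsilon = 1 and 14 possible values, p is only about 0.17, so individual estimates are noisy (a standard deviation of a few hundred for this dataset size), but they are centered on the true count rather than on the raw perturbed count.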

Avoid underflow using exp and minimum positive float128 in numpy

I am trying to calculate the following ratio:
w(i) / sum(w(j)), where the w are updated using an exponentially decreasing function, i.e. w(i) = w(i) * exp(-k), k being a positive parameter. All the numbers are non-negative.
This ratio is then used in a formula (multiplied by a constant, with another constant added). As expected, I soon run into underflow problems.
I guess this happens often but can someone give me some references on how to deal with this? I did not find an appropriate transformation so one thing I tried to do is set some minimum positive number as a safety threshold but I did not manage to find which is the minimum positive float (I am representing numbers in numpy.float128). How can I actually get the minimum positive such number on my machine?
The code looks like this:
w = np.ones(n, dtype='float128')
lt = np.ones(n)
for t in range(T):
    p = (1-k) * w / w.sum() + (k/n)
    # Process a subset of the n elements, call it set I, j is some range()
    for i in I:
        s = p[list(j[i])].sum()
        lt /= s
        w[s] *= np.exp(-k * lt)
where k is some constant in (0,1) and n is the length of the array
When working with exponentially small numbers it's usually better to work in log space. For example, log(w*exp(-k)) = log(w) - k, which won't have any over/underflow problems unless k is itself exponentially large or w is zero. And, if w is zero, numpy will correctly return -inf. Then, when doing the sum, you factor out the largest term:
log_w = np.log(w) - k
max_log_w = np.max(log_w)
# Individual terms in the following may underflow, but then they wouldn't
# contribute to the sum anyways.
log_sum_w = max_log_w + np.log(np.sum(np.exp(log_w - max_log_w)))
log_ratio = log_w - log_sum_w
This probably isn't exactly what you want since you could just factor out the k completely (assuming it's a constant and not an array), but it should get you on your way.
Scikit-learn implements a similar thing with extmath.logsumexp, but it's basically the same as the above. SciPy also provides this as scipy.special.logsumexp.
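
A small self-contained sketch of the same idea, assuming the weights are stored as logarithms throughout. The helper name normalized_from_log and the example values are hypothetical. The last lines also show np.finfo, which reports the smallest positive normal number of a floating-point type; np.longdouble is used because np.float128 is not defined on every platform.

```python
import numpy as np

def normalized_from_log(log_w):
    """Return w / sum(w) given log(w), without ever forming w itself."""
    log_w = np.asarray(log_w, dtype=float)
    max_log_w = np.max(log_w)
    # Factoring out the largest term keeps the biggest exponent at exactly 0.
    log_sum_w = max_log_w + np.log(np.sum(np.exp(log_w - max_log_w)))
    return np.exp(log_w - log_sum_w)

# Weights around exp(-1000) underflow to 0.0 in float64 if exponentiated directly.
log_w = np.array([-1000.0, -1001.0, -1002.0])
naive = np.exp(log_w)                 # all zeros, so naive / naive.sum() is undefined
ratios = normalized_from_log(log_w)   # well-defined, sums to 1

# Smallest positive normal value of the extended-precision type on this machine.
tiny = np.finfo(np.longdouble).tiny
```

In an update loop, this means keeping log_w, subtracting k (times whatever factor applies) at each step, and calling the helper only when the ratios are actually needed.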
