Background: I am trying to allocate students to teams, where each student has a ranked list of preferences for which other students they want on their team. I have an objective function that I want to minimise, subject to 3 constraints (written in the image below). In my DB I have a set of students along with their preferences (for example, student i rating student j as their 3rd choice).
If student A rates student B as their 1st choice, that preference will have a weighting of 1 which is why the objective function is set to minimise.
Mathematical Formula:
Question: I am unsure whether I have written the constraints and variables correctly in PuLP, and I can't find any closely related resources that do team allocation with preferences. I'm very new to PuLP and am struggling to figure out whether what I've written is syntactically correct. Thanks for any help!
Here is the code that I have written in my file:
from pulp import *
model = LpProblem("Team Allocation Problem", LpMinimize)
############ VARIABLES ############
students = [1,...,20]
n = 20
# this will be imported from the database
r = [[...],...,[...]]
team_sizes = [5,5,5,5]
num_teams = len(z)
# x(ik) = 1 if student i in team k
x_vars = [[LpVariable("x%d%d" % (i,k), cat='Binary')
           for k in range(num_teams)]
          for i in range(num_students)]
# y(ijk) = 1 if student i and j are in team k
y_vars = [[[LpVariable("y%d%d%d" % (i,j,k), cat='Binary')
            for k in range(num_teams)]
           for j in range(num_students)]
          for i in range(num_students)]
############ OBJECTIVE FUNCTION ############
for i in range(num_students):
    for j in range(num_students):
        if i!=j:
            for k in range(num_teams):
                model += r[i][j] * y_vars[i][j][k], "Minimize the sum of rank points in the team"
############ CONSTRAINTS ############
# C1: Every student is on exactly one team
for i in range(num_students):
    for k in range(num_teams):
        model += lpSum(x_vars[i][k]) == 1
# C2: Every team has the right size
for k in range(num_teams):
    for i in range(num_students):
        model += lpSum(x_vars[i][k]) == team_sizes[k]
# C3:
for i in range(num_students):
    for j in range(num_students):
        if i != j:
            for k in range(num_teams):
                model += 1 + y_vars[i][j][k] >= x_vars[i][k] + x_vars[j][k]
# Solve and Print
model.solve()
print("Status:", model.status)
(1) Make sure the sense of the objective is correct. The way I read your problem, you should maximize.
(2) The proper linearization of
y(i,j,k) = x(i,k)*x(j,k)
is
y(i,j,k) <= x(i,k)
y(i,j,k) <= x(j,k)
y(i,j,k) >= x(i,k)+x(j,k)-1
Sometimes we can drop some of these constraints because of how the objective works. Make sure you have verified that you can indeed drop y(i,j,k) <= x(i,k) and y(i,j,k) <= x(j,k).
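As a quick sanity check of point (2), the three inequalities can be enumerated over all binary values. This is only an illustrative sketch; the helper names `feasible_y` and `feasible_y_lower_only` are made up for this example and are not part of PuLP.

```python
from itertools import product

def feasible_y(x1, x2):
    """Binary y values allowed by all three linearization inequalities."""
    return [y for y in (0, 1)
            if y <= x1 and y <= x2 and y >= x1 + x2 - 1]

def feasible_y_lower_only(x1, x2):
    """Binary y values allowed when only y >= x1 + x2 - 1 is kept."""
    return [y for y in (0, 1) if y >= x1 + x2 - 1]

for x1, x2 in product((0, 1), repeat=2):
    # With all three inequalities, the only feasible y is the product x1*x2.
    print(x1, x2, feasible_y(x1, x2), feasible_y_lower_only(x1, x2))
```

With only the lower bound kept, y is free to be 1 even when one of the students is not in the team, so dropping the two upper bounds is safe only if the objective never benefits from setting y to 1.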
(3) This is (almost) the same question as Algorithms for optimal student seating arrangements.
(4) Regarding the comment "I want to minimise the objective in this case: if someone rates someone as their first choice they'll essentially be given 1 point, 2 points for second, 3 points for third, etc.": you cannot have 0 = no points, 1 = best, 2 = second best, ... in your formulation, because a minimising model would then prefer a pair with no stated preference (0 points) over a first choice (1 point). I suggest recoding your points as 0 = no points, 1 = ok, 2 = better, 3 = best (this is just preprocessing of the data), and then maximizing instead of minimizing. You can add -1, -2, ... for dislikes if you want.
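Putting points (1), (2) and (4) together, a minimal PuLP sketch could look like the following. The `score` matrix and `team_sizes` are made-up toy data (higher score = more preferred, 0 = not ranked), and the variable layout is one possible choice, not the asker's exact model.

```python
import pulp

# Hypothetical toy data: score[i][j] is how much student i wants student j.
score = [
    [0, 3, 0, 0],
    [3, 0, 0, 0],
    [0, 0, 0, 3],
    [0, 0, 3, 0],
]
team_sizes = [2, 2]
students = range(len(score))
teams = range(len(team_sizes))
pairs = [(i, j) for i in students for j in students if i != j]

model = pulp.LpProblem("team_allocation", pulp.LpMaximize)
# x[i, k] = 1 if student i is in team k
x = pulp.LpVariable.dicts("x", [(i, k) for i in students for k in teams], cat="Binary")
# y[i, j, k] = 1 if students i and j are both in team k
y = pulp.LpVariable.dicts("y", [(i, j, k) for (i, j) in pairs for k in teams], cat="Binary")

# Objective: a single lpSum over all ordered pairs and teams.
model += pulp.lpSum(score[i][j] * y[(i, j, k)] for (i, j) in pairs for k in teams)

# C1: every student is on exactly one team (sum over teams).
for i in students:
    model += pulp.lpSum(x[(i, k)] for k in teams) == 1
# C2: every team has the right size (sum over students).
for k in teams:
    model += pulp.lpSum(x[(i, k)] for i in students) == team_sizes[k]
# C3: full linearization of y = x[i,k] * x[j,k].
for (i, j) in pairs:
    for k in teams:
        model += y[(i, j, k)] <= x[(i, k)]
        model += y[(i, j, k)] <= x[(j, k)]
        model += y[(i, j, k)] >= x[(i, k)] + x[(j, k)] - 1

model.solve(pulp.PULP_CBC_CMD(msg=False))
team_of = {i: next(k for k in teams if pulp.value(x[(i, k)]) > 0.5) for i in students}
print(pulp.LpStatus[model.status], team_of)
```

Note that the objective is added once as a single `lpSum`, and that C1 and C2 each sum over one index inside the `lpSum` rather than looping over both indices.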
I'm working on a dynamic programming problem, although I'm not quite sure whether it really is dynamic programming, since the moving average M depends on the previous M. There is no need to consider efficiency. The problem requires selling a product over T time periods and maximizing the total actual sale amount. The total number of products is N, and I plan to sell n_0, n_1, ⋯, n_{T−1} products in the respective periods, with ∑ n_i = N.
In short, the goal is to find the optimal schedule n_0, n_1, ⋯, n_{T−1} with ∑ n_i = N that maximizes ∑ S_i.
The actual sale amount S_i depends on the current moving average M_i and the current n_i.
Assume that α = 0.001 and π = 0.5.
Initialize M = 0. Then for i = 0, 1, …, T−1:
Compute the new moving average M_i = ⌈0.5 ∗ (M_{i−1} + n_i)⌉, where M_{−1} = 0 is the initial value.
At time i we sell S_i = ⌈(1 − α ∗ M_i^π) ∗ n_i⌉ products.
Continue this process until the last time period. For example, assuming we already know n_i for all periods, the trading proceeds as follows:
import math
import numpy as np

M = 0
T = 4
N = 10000
alpha = 1e-3
pi = 0.5
S = np.zeros(T,dtype='i')
n = np.array([5000,1000,2000,2000])
print(n)
total = 0
for i in range(T):
    # update the moving average first, then compute the sale for this period
    M = math.ceil(0.5*(M + n[i]))
    S[i] = math.ceil((1 - alpha*M**pi)*n[i])
    total += S[i]
    print('at time %d, M = %d and we sell %d products' %(i,M,S[i]))
print('total sold =', total)
My idea is to keep track of the state, indexed by the time period t, the number of products left n, and the moving average m, and to store the best achievable sales in a three-dimensional array. I think the moving average always lies in [0, n], but I'm still confused about how to program it. Could someone suggest how to fix the problems in my code? Thank you very much.
Below is my rough code, but the output looks a little strange.
def DPtry(N,T,alpha,pi,S):
    schedule = np.zeros(T)
    M = 0
    for n in range(0,N+1):
        for m in range(0,n+1):
            S[T-1,n,m] = math.ceil((1 - alpha*m**pi)*n)
    for k in range(1,T):
        t = T - k - 1
        print("t = ",t)
        for n in range(0,N+1):
            for m in range(0,n+1):
                best = -1
                for plan in range(0,n+1):
                    salenow = math.ceil((1 - alpha*m**pi)*plan)
                    M = math.ceil(0.5*(m + plan))
                    salelater = S[t+1,n-plan,M]
                    candidate = salenow + salelater
                    if candidate > best:
                        best = candidate
                S[t,n,m] = best
    print(S[0,N,0])
N = 100
T = 5
pi = .5
alpha = 1e-3
S = np.zeros((T,N+1,N+1))
DPtry(N,T,alpha,pi,S)
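One possible way to organise the recursion is a memoised function over the state (t, remaining, m). The sketch below is an assumption-laden illustration rather than a fix of the code above. It applies the sale formula to the updated moving average, as in the simulation loop. It forces everything left to be sold in the last period so that ∑ n_i = N. It also lets m range independently of the number of products remaining. The names `simulate` and `best_schedule` are hypothetical.

```python
import math
from functools import lru_cache

def simulate(plan, alpha=1e-3, pi=0.5):
    """Total sold for a given schedule, using the update rule from the question."""
    m, total = 0, 0
    for n_i in plan:
        m = math.ceil(0.5 * (m + n_i))                    # new moving average
        total += math.ceil((1 - alpha * m ** pi) * n_i)   # sale in this period
    return total

def best_schedule(N, T, alpha=1e-3, pi=0.5):
    """Return (best total, schedule) via memoised recursion over (t, remaining, m)."""
    @lru_cache(maxsize=None)
    def value(t, remaining, m):
        last = (t == T - 1)
        # In the last period everything left must be sold so that sum(n_i) == N.
        plans = [remaining] if last else range(remaining + 1)
        best = (-math.inf, ())
        for plan in plans:
            new_m = math.ceil(0.5 * (m + plan))
            sold = math.ceil((1 - alpha * new_m ** pi) * plan)
            tail_total, tail = (0, ()) if last else value(t + 1, remaining - plan, new_m)
            candidate = (sold + tail_total, (plan,) + tail)
            if candidate[0] > best[0]:
                best = candidate
        return best
    return value(0, N, 0)

# Small illustrative instance; alpha is enlarged so the penalty is visible.
total, schedule = best_schedule(N=20, T=3, alpha=0.05)
print(total, schedule)
```

The number of states grows roughly like T·N², so this form is only practical for small N; a bottom-up table over the same state would follow the same recurrence.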
I am looking to formulate a (I think) complex LP problem in Python using PuLP.
The optimization goal is to maximize profit margin (aggregate acquisition cost vs aggregate sale revenue plus some future appreciation (FV)) on a basket of products for purchase.
The LP decision variables are pricing 'statistics' distinct for each product.
There are two constraints. First, the bid for a particular product cannot use a pricing statistic greater than some maximum value (1.0 in this case). Second, the aggregate ratio of sum(FV) of won items to net revenue must be <= -2.0.
The price I'd bid is a function of a theoretical price plus a theoretical future value (FV) minus a theoretical cost. These 3 inputs are static, but the pricing statistic scales (or weights) the impact of the FV on the bid, and that statistic is what I'd like to solve for. Higher statistic -> higher bid. The trick is that once you change the statistic, you change the bid, and this changes the aggregates that PuLP is trying to optimize. I figured this would be OK since the bid price is a closed-form linear formula, but please see below for how I tried to tackle it.
I also have the actual price the item sold for, so can compare the model's output price to the actual price to determine whether I would have bought in that case.
Concretely:
Bid[item j] = Theo price + (Statistic to be tuned[product i] * FV) - (Costs + Expenses)
There are 10 products to tune for, and j total items, non-uniformly distributed throughout a dataset.
If my output bid price based on the parameter being tuned in the LP > actual winning bid price, then consider the item purchased, and add it to the objective function.
Can someone please help me formulate this in PuLP? Maybe this is MIP? If so, I am unsure how to represent it formally.
What I have so far is the following:
from pulp import LpMaximize, LpProblem, LpStatus, lpSum, LpVariable, LpBinary
import pandas as pd
df= pd.read_excel('data.xlsx')
#create matrices and set variables
MAX_STAT = 1.0
RATIO_CONSTRAINT = -2.0
PRODUCTS = [0,1,2,3,4,5,6,7,8,9]
ITEMS = df['ITEMS'].tolist() # IDs
#1xj dicts
ITEM_PRODUCT = {ITEMS[i]:df['PRODUCT'].iloc[i] for i in range(len(df))}
ACTUAL_PX = {ITEMS[i]:df['ACTUAL_PX'].iloc[i] for i in range(len(df))}
COST = {ITEMS[i]:df['COST'].iloc[i] for i in range(len(df))}
EXPENSE = {ITEMS[i]:df['EXPENSE'].iloc[i] for i in range(len(df))}
#ixj dicts
THEO_PX = {ITEMS[i]:[df['THEO_PX'].iloc[i] if PRODUCTS[ITEMS[i]] == x else 0 for x in PRODUCTS] for i in range(len(df))}
QUANTITY = {ITEMS[i]:[df['QUANTITY'].iloc[i] if PRODUCTS[ITEMS[i]] == x else 0 for x in PRODUCTS] for i in range(len(df))}
FV = {ITEMS[i]:[df['FV'].iloc[i] if PRODUCTS[ITEMS[i]] == x else 0 for x in PRODUCTS] for i in range(len(df))}
use_vars = {j:[i if ITEM_PRODUCT[j] == i else 0 for i in PRODUCTS] for j in ITEMS}
#Define the model
model = LpProblem(name="maximize_margin", sense=LpMaximize)
#Define decision variables
strategy_statistic = LpVariable.dicts('StrategyStat', [(j,i) for j in ITEMS for i in PRODUCTS], 0, MAX_STAT)
#other variables dependent on the statistic
strategy_bid = {(j,i):strategy_statistic[(j,i)]*FV[j][i]+THEO_PX[j][i]-COST[j]-EXPENSE[j] for j in ITEMS for i in PRODUCTS}
win_loss = {(j,i):1 if strategy_bid[(j,i)] >= ACTUAL_PX[j] else 0 for j in ITEMS for i in PRODUCTS}
aggQuantity = lpSum(win_loss[(j,i)]*QUANTITY[j][i]*use_vars[j][i] for j in ITEMS for i in PRODUCTS)
aggTheo = lpSum(win_loss[(j,i)]*THEO_PX[j][i]*QUANTITY[j][i]*use_vars[j][i] for j in ITEMS for i in PRODUCTS)
aggFV = lpSum(win_loss[(j,i)]*FV[j][i]*QUANTITY[j][i]*use_vars[j][i] for j in ITEMS for i in PRODUCTS)
aggBidNotional = lpSum(win_loss[(j,i)]*strategy_bid[(j,i)]*QUANTITY[j][i]*use_vars[j][i] for j in ITEMS for i in PRODUCTS)
model += (aggTheo - aggBidNotional + aggFV)
model += (aggFV / (aggTheo - aggBidNotional)) <= RATIO_CONSTRAINT
Currently seeing an error on the last line saying that:
TypeError: Expressions cannot be divided by a non-constant expression
But I think there is more wrong with this formulation than that...
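One part that a solver cannot handle as written is `win_loss`: a Python `if` on an expression containing decision variables is evaluated once while the model is built, not by the solver. A common MIP pattern is a binary "won" variable tied to the bid with big-M constraints, plus an auxiliary variable for the product of that binary and the statistic. The sketch below uses made-up data for a single product, folds expenses into `cost`, and assumes `BIG_M` exceeds any possible gap between bid and actual price; it leaves out the ratio constraint.

```python
import pulp

MAX_STAT = 1.0
BIG_M = 1000.0  # assumption: larger than any |bid - actual| in the data

# Hypothetical toy data: one product, two items.
items = {
    "A": dict(theo=100.0, fv=10.0, cost=5.0, actual=98.0, qty=1),
    "B": dict(theo=50.0, fv=4.0, cost=2.0, actual=60.0, qty=1),
}

model = pulp.LpProblem("maximize_margin_sketch", pulp.LpMaximize)
stat = pulp.LpVariable("stat", 0, MAX_STAT)                # statistic to tune
win = pulp.LpVariable.dicts("win", list(items), cat="Binary")
z = pulp.LpVariable.dicts("z", list(items), 0, MAX_STAT)   # z[j] models win[j] * stat

margin_terms = []
for j, d in items.items():
    bid = d["theo"] + stat * d["fv"] - d["cost"]
    # win[j] = 1 requires bid >= actual; win[j] = 0 requires bid <= actual.
    model += bid >= d["actual"] - BIG_M * (1 - win[j])
    model += bid <= d["actual"] + BIG_M * win[j]
    # Linearization of z[j] = win[j] * stat.
    model += z[j] <= MAX_STAT * win[j]
    model += z[j] <= stat
    model += z[j] >= stat - MAX_STAT * (1 - win[j])
    # On a won item: theo - bid + fv = cost + fv * (1 - stat).
    margin_terms.append(d["qty"] * ((d["cost"] + d["fv"]) * win[j] - d["fv"] * z[j]))

model += pulp.lpSum(margin_terms)
model.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[model.status], pulp.value(stat),
      {j: pulp.value(win[j]) for j in items})
```

A bid exactly equal to the actual price may be counted either way in this sketch. The ratio constraint could in principle be handled by multiplying both sides by the denominator, but the direction of the inequality then depends on the denominator's sign, which would need to be fixed or modelled separately.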
I've been using a pre-existing assignment tool which I use for allocating undergraduate students to projects. I am now keen to try and build an assignment tool in python which will allow me to add and adjust constraints as we are faced with extraordinary pressures on space usage due to COVID.
The basis of the task is to place as many possible students with their favored supervisors. I have 85 students who provided a 1-to-5 rank order of preferred supervisors, which allows me to tune in a cost variable. In addition, there are 40 supervisors each with varying levels of capacity; some can take 2 students, some 3 etc, with total capacity ca 100.
I have been using the Python implementation of Google OR-Tools and have attempted the "Assignment with Task Sizes" strategy so far, with both CP-SAT and MIP. I can produce a solution using CP-SAT for a small dummy data set, but when I use last year's data, with a cost matrix of size 85x40, I haven't been able to generate an assignment solution. In contrast, the MIP solver approach produces an assignment with "0 cost" and no actual assignments. I have also started to construct a minimum-cost flow model, but so far I have not been able to get the program to run on my input data.
My general question is "what is the best general approach for such an assignment problem of more agents than tasks, but tasks with capacity to accept greater than 1 agent."
Happy to provide what code I have if useful.
Thanks, Dave
EDIT
The code I have been using to attempt a CP-SAT solution is below. I initially populate a np-array of sizes defined by number of students and number of staff and populate this with the value 100. This is what I use for elements which are not a choice in the cost matrix. I have taken the student's choices ( 1-to-5) and squared to give a bit of differentiation:
from __future__ import print_function
from ortools.sat.python import cp_model
import time
import pandas as pd
import numpy as np
capacity=pd.read_excel(r'C:\Users\Dave\Documents\assign_2019.xlsx',
                       index_col=0, sheet_name='capacity')
capacity=capacity.reset_index()
choices=pd.read_excel(r'C:\Users\Dave\Documents\assign_2019.xlsx',
                      index_col=0, sheet_name='choices')
choices=choices.reset_index()
array=np.empty((len(choices), len(capacity)))
array.fill(100)
cost = pd.DataFrame(data=array)
cost.index=choices['student']
cost.columns=capacity['staff']
choices=choices.set_index(['student'])
for i in choices.index:
    for j in choices.columns:
        s=choices.loc[i, j]
        cost.loc[i,s]=j**2
cost=cost.to_numpy()
cost=cost.astype(int)
sizes = capacity['capacity']
sizes=sizes.to_numpy()
sizes=sizes.astype(int)
def main():
    model = cp_model.CpModel()
    start = time.time()
    num_workers = len(cost)
    num_tasks = len(cost[1])
    # Variables
    x = []
    for i in range(num_workers):
        t = []
        for j in range(num_tasks):
            t.append(model.NewIntVar(0, 1, "x[%i,%i]" % (i, j)))
        x.append(t)
    x_array = [x[i][j] for i in range(num_workers) for j in
               range(num_tasks)]
    # Constraints
    # Each staff is allocated no more than capacity.
    [model.Add(sum(x[i][j] for i in range(num_workers)) <= sizes[j])
     for j in range(num_tasks)]
    # Number of projects allocated to a student is 1.
    [model.Add(sum(x[i][j] for j in range(num_tasks)) == 1)
     for i in range(num_workers)]
    model.Minimize(sum([np.dot(x_row, cost_row) for (x_row, cost_row) in
                        zip(x, cost)]))
    solver = cp_model.CpSolver()
    status = solver.Solve(model)
    if status == cp_model.OPTIMAL:
        print('Minimum cost = %i' % solver.ObjectiveValue())
        print()
        for i in range(num_workers):
            for j in range(num_tasks):
                if solver.Value(x[i][j]) >= 1:
                    print('Worker ', i, ' assigned to task ', j, ' Cost = ',
                          cost[i][j])
        print()
    end = time.time()
    print("Time = ", round(end - start, 4), "seconds")
if __name__ == '__main__':
    main()
EDIT 2:
Code used for MIP solver (data input is the same as the CP-SAT attempt described above):
def main():
    solver = pywraplp.Solver('SolveAssignmentProblem',
                             pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
    start = time.time()
    num_workers = len(cost)
    num_tasks = len(cost[1])
    # Variables
    x = {}
    for i in range(num_workers):
        for j in range(num_tasks):
            x[i, j] = solver.IntVar(0, 1, 'x[%i,%i]' % (i, j))
    # Constraints
    # Staff can accept students up to capacity.
    for i in range(num_workers):
        solver.Add(solver.Sum([x[i, j] for j in range(num_tasks)]) <=
                   sizes[j])
    # Student is allocated 1 project.
    for j in range(num_tasks):
        solver.Add(solver.Sum([x[i, j] for i in range(num_workers)]) == 1)
    solver.Minimize(solver.Sum([cost[i][j] * x[i,j] for i in
                                range(num_workers)
                                for j in range(num_tasks)]))
    print('Minimum cost = ', solver.Objective().Value())
    print()
    for i in range(num_workers):
        for j in range(num_tasks):
            if x[i, j].solution_value() > 0:
                print('Worker', i,' assigned to task', j, ' Cost = ',
                      cost[i][j])
    print()
    end = time.time()
    print("Time = ", round(end - start, 4), "seconds")
if __name__ == '__main__':
    main()
The outcome of the MIP attempt is:
Minimum cost = 0.0
Time = 0.553 seconds
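Two observations on the MIP listing as posted, offered tentatively: `solver.Solve()` is never called before the objective is read, which would be consistent with a reported cost of 0 and no assignments, and the two constraint loops appear to have the student and staff indices swapped. As for the general question, one approach that avoids a MIP solver altogether is to replicate each supervisor's column once per unit of capacity and solve a plain rectangular assignment problem. The sketch below uses SciPy's `linear_sum_assignment` rather than OR-Tools, with made-up costs and capacities.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical toy data: 3 students (rows), 2 supervisors (columns).
cost = np.array([[1, 4],
                 [1, 4],
                 [1, 9]])
capacity = np.array([2, 1])   # supervisor 0 takes 2 students, supervisor 1 takes 1

# One column per available place: slot_owner[c] is the supervisor owning slot c.
slot_owner = np.repeat(np.arange(len(capacity)), capacity)
expanded = cost[:, slot_owner]            # shape: (students, total capacity)

# Rectangular assignment: every student gets exactly one slot.
rows, cols = linear_sum_assignment(expanded)
assignment = {int(s): int(slot_owner[c]) for s, c in zip(rows, cols)}
total = int(expanded[rows, cols].sum())
print(assignment, total)
```

This requires total capacity to be at least the number of students, which matches the figures in the question (85 students, capacity of about 100). Side constraints beyond simple capacities would still call for a MIP or CP-SAT model.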
I'm trying to code the secretary problem in Python with a Monte Carlo simulation (without using e). The essence of the problem is here: https://en.wikipedia.org/wiki/Secretary_problem
The problem is described as follows: imagine an administrator who wants to hire the best secretary out of n rankable applicants for a position. The applicants are interviewed one by one in random order. A decision about each particular applicant is to be made immediately after the interview. Once rejected, an applicant cannot be recalled. During the interview, the administrator can rank the applicant among all applicants interviewed so far but is unaware of the quality of yet unseen applicants. The question is about the optimal strategy (stopping rule) to maximize the probability of selecting the best applicant. Taken from: https://www.geeksforgeeks.org/secretary-problem-optimal-stopping-problem/
Table that I'm checking my code against:
Here is my python code so far:
n = 7; # of applicants
m = 10000; # of repeats
plot = np.zeros(1);
for i in range (2,m): #multiple runs
    array = np.random.randint(1,1000,n);
    for j in range(2,n): #over range of array
        test = 0;
        if array[j] > array[1] and array[j] == array.max():
            plot=plot+1
            test = 1;
            break
        if array[j]> array[1]:
            test = 2;
            break
print(plot/m)
print(array)
print("j = ",j)
print("test = ",test)
I am doing something wrong in my code, because I'm unable to replicate the table. In the above code I've tried to use 7 applicants and take the best applicant after the first 2.
plot/m should output the percentage in column three of the table for the given number of applicants and 'take the best after' value.
Answered! As below.
Additional code:
import numpy as np
import matplotlib.pyplot as plt
import time
plt.style.use('seaborn-whitegrid')
n = 150 #total number of applicants
nplot = np.empty([1,1])
#take = 3 #not necessary, turned into J below:
for k in range(2,n):
    m = 10000 #number of repeats
    plot = np.empty([1,1]);
    for j in range(1,k):
        passed = 0
        for i in range (0,m): #multiple runs
            array = np.random.rand(k);
            picked = np.argmax(array[j:]>max(array[0:j])) + j
            best = np.argmax(array)
            if best == picked:
                passed = passed+1
        #print(passed/m)
        plot = np.append(plot,[passed/m])
    #print(plot)
    plot = plot[1:];
    x = range(1,k);
    y = plot
    #print("N = ",k)
    print("Check ",plot.argmax()," if you have ",k," applicants", round(100* plot.max(),2),"% chance of finding the best applicant")
    nplot =np.append(nplot,plot.max())
# Plot:
nplot = nplot[1:];
x = range(2,n);
y = nplot
plt.plot(x, y, 'o', color='black');
plt.xlabel("Number of Applicants")
plt.ylabel("Probability of Best Applicant")
Here is something that seems to do the job and is a bit simpler. Comments:
Use argmax both to determine who the best secretary is and to pick the first applicant whose grade beats everyone in the initial (skipped) group.
Draw the grades from a real-valued distribution to reduce the odds of two secretaries having the same grade.
Hence:
import numpy as np
n = 7
take = 2
m = 100000
passed = 0
for i in range (0,m): #multiple runs
    array = np.random.rand(n);
    picked = np.argmax(array[take:]>max(array[0:take])) + take
    best = np.argmax(array)
    if best == picked:
        passed = passed+1
print(passed/m)
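The simulated value can be compared with the exact success probability of the "skip the first r, then take the first applicant better than all of them" rule, which is (r/n)·∑_{i=r+1}^{n} 1/(i−1) for r ≥ 1 and 1/n for r = 0. The sketch below evaluates that formula with exact fractions; the function name `success_probability` is made up for this example.

```python
from fractions import Fraction

def success_probability(n, r):
    """Exact chance of picking the best of n applicants when the first r are skipped."""
    if r == 0:
        # Nothing is skipped, so the first applicant is always hired.
        return Fraction(1, n)
    return Fraction(r, n) * sum(Fraction(1, i - 1) for i in range(r + 1, n + 1))

n = 7
for r in range(n):
    p = success_probability(n, r)
    print(r, p, round(float(p), 4))

best_r = max(range(n), key=lambda r: success_probability(n, r))
print("best number to skip:", best_r)
```

For n = 7 and r = 2 this gives 29/70, about 0.4143, which is the value the Monte Carlo estimate with `take = 2` should approach as m grows.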
I'm new to mixed-integer optimization problems. Currently, I'm using the PuLP Python interface with the default CBC solver to solve the problem.
The problem is to improve resource utilization in a cancer clinic model; below is the code with the objective function and constraints. When I use prob.solve(), I have 3 different questions:
1. I get values 1.0 for BeginTreatment variable however I do not get values 1.0 for ContinueTreatment variable?
2. Based on chair continuity constraint, slots numbered more than 29 should not be assigned to pat_type of 8 as there are max 40 slots only available. But I still see that? (and not only for pat_type of 8 but others too)
3. Should I try a different solver instead of the default CBC solver of pulp? If yes, how do I do that?
import pulp
# Indices and Parameters
chair = list(range(1,24))
pat_type = list(range(1,9))
slot = list(range(1,41))
type_and_demand = {1:24,2:10,3:13,4:9,5:7,6:6,7:2,8:1}
type_and_slots = {1:1,2:4,3:8,4:12,5:16,6:20,7:24,8:28}
# Decision Variables
Y = pulp.LpVariable.dicts("BeginTreatment",
                          (chair,pat_type,slot),0,1,pulp.LpBinary)
X = pulp.LpVariable.dicts("ContinueTreatment",
                          (chair,pat_type,slot),0,1,pulp.LpBinary)
# Objective Function
prob = pulp.LpProblem("ChairUtilization", pulp.LpMaximize)
prob += pulp.lpSum([Y[i][j][t] for i in chair for j in pat_type for t in
                    slot])
# Constraints
# Patient Type 1 Continuity constraint
for i in chair:
    for t in slot:
        for j in range(1,2):
            prob += X[i][j][t] == 0
# Chair Continuity Constraint
for i in chair:
    for j in range(2,9):
        for t in range(1,(len(slot)-type_and_slots[j]+1)+1):
            prob += (pulp.lpSum([X[i][j][u] for u in range(t+1,t+type_and_slots[j])])
                     == (type_and_slots[j] - 1)*Y[i][j][t])
# No more than one patient per chair
for t in slot:
    for i in chair:
        prob += (pulp.lpSum([X[i][j][t] for j in pat_type])
                 + pulp.lpSum([Y[i][j][t] for j in pat_type]) <= 1)
# No new arrivals during lunch time period
prob += pulp.lpSum([Y[i][j][t] for i in chair for j in pat_type for t in
                    range(19,23)]) == 0
# Patient Mix
for j in pat_type:
    prob += (pulp.lpSum([Y[i][j][t] for i in chair for t in slot])
             == type_and_demand[j])
prob.solve()
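A possible reading of questions 1 and 2, not verified against the full model: the continuity constraint is only written for start slots up to len(slot) - duration + 1, so later starts are unconstrained and need no ContinueTreatment slots. In addition, the equality form appears to make overlapping windows contradict each other for treatments of 3 or more slots, because the window of a neighbouring slot with Y = 0 forces the same X variables to 0. Together these would push all starts into the late, unconstrained slots. The toy sketch below (one chair, one hypothetical patient type of 3 slots, 5 slots in total) forbids late starts explicitly and uses X >= Y per slot instead of the summed equality.

```python
import pulp

chairs = [1]
slots = list(range(1, 6))   # hypothetical 5-slot horizon
duration = {1: 3}           # hypothetical: type 1 needs 3 consecutive slots
demand = {1: 1}
types = list(duration)

prob = pulp.LpProblem("ChairUtilizationToy", pulp.LpMaximize)
Y = pulp.LpVariable.dicts("Begin", (chairs, types, slots), cat=pulp.LpBinary)
X = pulp.LpVariable.dicts("Continue", (chairs, types, slots), cat=pulp.LpBinary)
prob += pulp.lpSum(Y[i][j][t] for i in chairs for j in types for t in slots)

for i in chairs:
    for j in types:
        last_start = len(slots) - duration[j] + 1
        for t in slots:
            if t > last_start:
                # A treatment starting here could not finish within the day.
                prob += Y[i][j][t] == 0
            else:
                # A start at t occupies the chair in the following slots.
                for u in range(t + 1, t + duration[j]):
                    prob += X[i][j][u] >= Y[i][j][t]
    for t in slots:
        # No more than one patient per chair and slot.
        prob += pulp.lpSum(X[i][j][t] + Y[i][j][t] for j in types) <= 1
for j in types:
    prob += pulp.lpSum(Y[i][j][t] for i in chairs for t in slots) == demand[j]

# The solver is passed explicitly; CBC is the one bundled with PuLP.
prob.solve(pulp.PULP_CBC_CMD(msg=False))
start = next(t for t in slots if pulp.value(Y[1][1][t]) > 0.5)
print(pulp.LpStatus[prob.status], "start slot:", start)
```

Regarding question 3, a different solver is selected by passing another solver object to `prob.solve(...)`. In recent PuLP versions `pulp.listSolvers(onlyAvailable=True)` reports which ones are installed. Changing the solver would not by itself alter which solutions the constraints allow.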