Adding soft constraints to a scheduling problem in OR-Tools (Python)

tools, and I'm trying to use it to generate a timetable for a high school.
The variables are lessons, rooms, and timeslots, and the goal, of course, is to assign every lesson to a room and a timeslot while respecting the given constraints.
The problem is that the documentation doesn't seem to discuss soft versus hard constraints, and all the constraints I've added are hard ones. Is there a way to implement a soft constraint for this example, even just a simple one?
Thanks in advance.

Probably a duplicate of "Do Google Optimization Tools support Soft Constraints?", but I'll add some examples with CP-SAT.
Here's a simple soft limit example:
from ortools.sat.python import cp_model
model = cp_model.CpModel()
solver = cp_model.CpSolver()
x = [model.NewBoolVar("") for i in range(10)]
# hard constraint: number of 1's >= 4
model.Add(sum(x) >= 4)
# soft constraint: number of 1's <= 5
delta = model.NewIntVar(-5, 5, "")
excess = model.NewIntVar(0, 5, "")
model.Add(delta == sum(x) - 5)
model.AddMaxEquality(excess, [delta, 0])
model.Minimize(excess)
solver.Solve(model)
print([solver.Value(i) for i in x])
print(solver.Value(excess))
See a more complex example here
And here's one that maximizes the number of fulfilled requests:
from ortools.sat.python import cp_model
model = cp_model.CpModel()
solver = cp_model.CpSolver()
x = [model.NewIntVar(0, 10, "") for i in range(10)]
# request: sum() <= 10
req1 = model.NewBoolVar("")
model.Add(sum(x) <= 10).OnlyEnforceIf(req1)
# request: sum() >= 5
req2 = model.NewBoolVar("")
model.Add(sum(x) >= 5).OnlyEnforceIf(req2)
# request: sum() >= 100
req3 = model.NewBoolVar("")
model.Add(sum(x) >= 100).OnlyEnforceIf(req3)
model.Maximize(req1 + req2 + req3)
solver.Solve(model)
print(solver.Value(sum(x)))
print(solver.ObjectiveValue())
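To connect this to the timetable: hard constraints decide which assignments are feasible, while each soft constraint adds a weighted penalty to the objective whenever it is violated. The sketch below shows that pattern without a solver, by brute force on a hypothetical toy instance (the lessons, teachers, rooms, slots and weights are all made up for illustration). In CP-SAT the same idea would be expressed with one BoolVar per possible violation and `model.Minimize` over the weighted sum, as in the examples above.

```python
from itertools import product

# Hypothetical toy instance (not from the question): lesson -> teacher.
LESSONS = {"math": "alice", "physics": "alice", "chemistry": "alice", "pe": "bob"}
ROOMS = ["r101", "gym"]
SLOTS = [0, 1, 2]  # slot 2 is the last period of the day


def is_feasible(assignment):
    """Hard constraints: no room double-booked, no teacher teaching two lessons at once."""
    used_rooms = set()
    busy_teachers = set()
    for lesson, (room, slot) in assignment.items():
        teacher = LESSONS[lesson]
        if (room, slot) in used_rooms or (teacher, slot) in busy_teachers:
            return False
        used_rooms.add((room, slot))
        busy_teachers.add((teacher, slot))
    return True


def penalty(assignment):
    """Soft constraints: every violation adds its (hypothetical) weight to the cost."""
    cost = 0
    for lesson, (room, slot) in assignment.items():
        if slot == SLOTS[-1]:
            cost += 1  # prefer not to use the last period
        if lesson == "pe" and room != "gym":
            cost += 3  # PE should preferably take place in the gym
    return cost


def solve():
    """Enumerate all assignments, keep the feasible ones, return the lowest-penalty one."""
    options = list(product(ROOMS, SLOTS))
    best, best_cost = None, None
    for choice in product(options, repeat=len(LESSONS)):
        assignment = dict(zip(LESSONS, choice))
        if not is_feasible(assignment):
            continue
        cost = penalty(assignment)
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


best, best_cost = solve()
print(best_cost)  # 1
print(best)
```

Because alice's three lessons must be in three different slots, one of them has to use the last period, so the best achievable penalty is 1. The soft constraint is violated once instead of making the model infeasible, which is the behaviour a soft constraint is meant to give.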

Related

Definition of binary variable in Pyomo is not working

I'm new to Pyomo (and optimization) and am trying to reproduce a simple approach (see comment from Fengyuan-Shi on https://github.com/Pyomo/pyomo/issues/821) to create a maximum constraint using the Big M method and binary variables. The code returns the correct answer, but the variables u_1 and u_2, which are supposed to be binary (taking values of 0 or 1 only), are actually taking values between 0 and 1. Can anyone see what I'm doing wrong?
import pyomo.environ as pyomo
m = pyomo.ConcreteModel()
m.x = pyomo.Param(initialize=5)
m.y = pyomo.Param(initialize=9)
m.z = pyomo.Var(domain = pyomo.NonNegativeReals)
m.u_1 = pyomo.Var(domain = pyomo.Binary)
m.u_2 = pyomo.Var(domain = pyomo.Binary)
m.M = pyomo.Param(initialize=1e3) # Big M
m.o = pyomo.Objective(expr = m.z + 8)
m.cons = pyomo.ConstraintList()
# ensure z is the maximum of x and y, per comment from Fengyuan Shi on https://github.com/Pyomo/pyomo/issues/821
# =============================================================================
m.cons.add(m.x <= m.z)
m.cons.add(m.y <= m.z)
m.cons.add(m.x >= m.z - m.M*(1-m.u_1))
m.cons.add(m.y >= m.z - m.M*(1-m.u_2))
m.cons.add(m.u_1 + m.u_2 >= 1)
m.pprint()
solver = pyomo.SolverFactory('ipopt')
status = solver.solve(m)
print("Status = %s" % status.solver.termination_condition)
for v in m.component_objects(pyomo.Var, active=True):
    print("Variable component object", v, v.value)
When the code is run, the output is:
Variable component object z 8.999999912504697 (correct, the maximum of x = 5 and y=9)
Variable component object u_1 0.5250908817936364 (expected this to be either 0 or 1)
Variable component object u_2 0.5274112114061761 (expected this to be either 0 or 1)
Your construct appears correct. You need to use a different solver.
ipopt is intended for continuous nonlinear problems and does not support integer requirements (which include binary variables); it treats every variable as continuous.
The problem you have is completely linear, so you should be using a linear solver that supports MIP formulations. Your problem is a "Mixed Integer Program" because of the binary requirements. I'd suggest cbc or glpk, both of which are freeware.
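To see why a continuous solver has no reason to return 0/1 here, it can help to evaluate the question's five constraints by hand. The sketch below is plain Python (no Pyomo), using the question's values x = 5, y = 9 and M = 1e3; `satisfies_big_m` is a hypothetical helper, not a Pyomo function. With u_1 and u_2 restricted to {0, 1}, the only feasible choice is u_1 = 0, u_2 = 1, which pins z to exactly 9 = max(x, y); with the fractional values ipopt returned, the constraints are still satisfied, so nothing forces them to be integral. With a MIP solver installed, the change in the question's code would be to use `pyomo.SolverFactory('cbc')` or `pyomo.SolverFactory('glpk')` instead of `'ipopt'`.

```python
from itertools import product

X, Y, BIG_M = 5, 9, 1e3  # values from the question's model


def satisfies_big_m(z, u1, u2, x=X, y=Y, big_m=BIG_M):
    """Return True if (z, u1, u2) satisfies the five constraints in m.cons."""
    return (x <= z
            and y <= z
            and x >= z - big_m * (1 - u1)
            and y >= z - big_m * (1 - u2)
            and u1 + u2 >= 1)


# Binary u: enumerate all 0/1 combinations together with integer z in a small range.
binary_feasible = [(z, u1, u2)
                   for z in range(20)
                   for u1, u2 in product((0, 1), repeat=2)
                   if satisfies_big_m(z, u1, u2)]
print(binary_feasible)  # [(9, 0, 1)]

# Continuous u: the fractional values reported by ipopt are feasible as well.
print(satisfies_big_m(9, 0.5250908817936364, 0.5274112114061761))  # True
```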

How to formulate linear programming optimization in PuLP?

I am looking to formulate a (I think) complex LP problem in Python using PuLP.
The optimization goal is to maximize profit margin (aggregate acquisition cost vs aggregate sale revenue plus some future appreciation (FV)) on a basket of products for purchase.
The LP decision variables are pricing 'statistics' distinct for each product.
The constraint is that the bid for a particular product cannot use a pricing statistic greater than some maximum value (1.0 in this case), and that the aggregate ratio of sum(FV) of won items to net revenue must be <= -2.0.
The price I'd bid is a function of a theoretical price plus a theoretical future value (FV) minus a theoretical cost. These 3 inputs are static - but the pricing statistic scales (or weights) the impact of the FV on the bid, which is what I'd like to solve for. Higher statistic -> higher bid. The trick is that once you change the statistic, you change the bid, and this changes the aggregates that PuLP is trying to optimize for. I figured this would be ok since the bid price is a closed form linear formula, but please see below for how I tried to tackle.
I also have the actual price each item sold for, so I can compare the model's output price to the actual price to determine whether I would have bought the item in that case.
Concretely:
Bid[item j] = Theo price + (Statistic to be tuned[product i] * FV) - (Costs + Expenses)
There are 10 products to tune for, and j total items, non-uniformly distributed throughout a dataset.
If my output bid price (based on the parameter being tuned in the LP) is greater than the actual winning bid price, the item is considered purchased and is added to the objective function.
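The bid formula above is linear in the statistic, but whether an item is won is not: for an item with FV > 0, the bid reaches the actual price exactly when the statistic is at least a per-item threshold, so "won" is a step function of the decision variable. The plain-Python sketch below (function names and numbers are hypothetical, not from the question's data) makes that threshold explicit. A step like this is normally modelled with a binary variable per item, which is why the formulation leans toward a MIP; by contrast, the conditional expression in the `win_loss` line is evaluated by Python while the model is being built, so it cannot react to the value the solver later picks for the statistic.

```python
def bid(theo_px, stat, fv, cost, expense):
    """Bid = Theo price + stat * FV - (Costs + Expenses), as stated in the question."""
    return theo_px + stat * fv - (cost + expense)


def win_threshold(theo_px, fv, cost, expense, actual_px):
    """Smallest stat with bid >= actual_px; assumes fv > 0 so the bid grows with stat."""
    if fv <= 0:
        raise ValueError("threshold form assumes FV > 0")
    return (actual_px - theo_px + cost + expense) / fv


# Hypothetical item: theo 100, FV 20, cost 10, expense 5, actually sold for 95.
t = win_threshold(theo_px=100, fv=20, cost=10, expense=5, actual_px=95)
print(t)                       # 0.5
print(bid(100, t, 20, 10, 5))  # 95.0, equal to the actual price
MAX_STAT = 1.0
print(t <= MAX_STAT)           # True: this item can be won within the stat bound
```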
Can someone please help me formulate this in PuLP? Maybe this is MIP? If so, I am unsure how to represent it formally.
What I have so far is the following:
from pulp import LpMaximize, LpProblem, LpStatus, lpSum, LpVariable, LpBinary
import pandas as pd
df= pd.read_excel('data.xlsx')
#create matrices and set variables
MAX_STAT = 1.0
RATIO_CONSTRAINT = -2.0
PRODUCTS = [0,1,2,3,4,5,6,7,8,9]
ITEMS = df['ITEMS'].tolist() # IDs
#1xj dicts
ITEM_PRODUCT = {ITEMS[i]:df['PRODUCT'].iloc[i] for i in range(len(df))}
ACTUAL_PX = {ITEMS[i]:df['ACTUAL_PX'].iloc[i] for i in range(len(df))}
COST = {ITEMS[i]:df['COST'].iloc[i] for i in range(len(df))}
EXPENSE = {ITEMS[i]:df['EXPENSE'].iloc[i] for i in range(len(df))}
#ixj dicts
THEO_PX = {ITEMS[i]:[df['THEO_PX'].iloc[i] if PRODUCTS[ITEMS[i]] == x else 0 for x in PRODUCTS] for i in range(len(df))}
QUANTITY = {ITEMS[i]:[df['QUANTITY'].iloc[i] if PRODUCTS[ITEMS[i]] == x else 0 for x in PRODUCTS] for i in range(len(df))}
FV = {ITEMS[i]:[df['FV'].iloc[i] if PRODUCTS[ITEMS[i]] == x else 0 for x in PRODUCTS] for i in range(len(df))}
use_vars = {j:[i if ITEM_PRODUCT[j] == i else 0 for i in PRODUCTS] for j in ITEMS}
#Define the model
model = LpProblem(name="maximize_margin", sense=LpMaximize)
#Define decision variables
strategy_statistic = LpVariable.dicts('StrategyStat', [(j,i) for j in ITEMS for i in PRODUCTS], 0, MAX_STAT)
#other variables dependent on the statistic
strategy_bid = {(j,i):strategy_statistic[(j,i)]*FV[j][i]+THEO_PX[j][i]-COST[j]-EXPENSE[j] for j in ITEMS for i in PRODUCTS}
win_loss = {(j,i):1 if strategy_bid[(j,i)] >= ACTUAL_PX[j] else 0 for j in ITEMS for i in PRODUCTS}
aggQuantity = lpSum(win_loss[(j,i)]*QUANTITY[j][i]*use_vars[j][i] for j in ITEMS for i in PRODUCTS)
aggTheo = lpSum(win_loss[(j,i)]*THEO_PX[j][i]*QUANTITY[j][i]*use_vars[j][i] for j in ITEMS for i in PRODUCTS)
aggFV = lpSum(win_loss[(j,i)]*FV[j][i]*QUANTITY[j][i]*use_vars[j][i] for j in ITEMS for i in PRODUCTS)
aggBidNotional = lpSum(win_loss[(j,i)]*strategy_bid[(j,i)]*QUANTITY[j][i]*use_vars[j][i] for j in ITEMS for i in PRODUCTS)
model += (aggTheo - aggBidNotional + aggFV)
model += (aggFV / (aggTheo - aggBidNotional)) <= RATIO_CONSTRAINT
Currently seeing an error on the last line saying that:
TypeError: Expressions cannot be divided by a non-constant expression
But I think there is more wrong with this formulation than that...
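On the TypeError itself: PuLP only accepts linear expressions, so a ratio of two expressions has to be multiplied out before it can be added as a constraint. Writing F for aggFV and D for aggTheo - aggBidNotional (with D ≠ 0), the direction of the resulting inequality depends on the sign of D, which the question does not state:

```latex
\frac{F}{D} \le -2 \iff
\begin{cases}
F + 2D \le 0, & D > 0,\\
F + 2D \ge 0, & D < 0.
\end{cases}
```

If the sign of D is known in advance, the matching case can be added as `model += aggFV + 2 * (aggTheo - aggBidNotional) <= 0` (or `>= 0`).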

Assignment problem: more agents than tasks, but tasks with multiple capacity

I've been using a pre-existing assignment tool which I use for allocating undergraduate students to projects. I am now keen to try and build an assignment tool in python which will allow me to add and adjust constraints as we are faced with extraordinary pressures on space usage due to COVID.
The basis of the task is to place as many students as possible with their favored supervisors. I have 85 students who provided a 1-to-5 rank order of preferred supervisors, which allows me to tune a cost variable. In addition, there are 40 supervisors, each with a different capacity (some can take 2 students, some 3, etc.), with a total capacity of about 100.
I have been using Google OR-Tools, python implementation, and have attempted "Assignment with Task Sizes" strategy so far, both CP-SAT and MIP. I can produce a solution using CP-SAT for a small dummy data set, but when I use last years data with a cost matrix of size 85x40, I haven't been able to generate an assignment solution. In contrast, the MIP solver approach produces an assignment with "0 cost" and no actual assignments. I have also started to construct a minimum flow model but so far I have not been able to get the program to run on my input data.
My general question is "what is the best general approach for such an assignment problem of more agents than tasks, but tasks with capacity to accept greater than 1 agent."
Happy to provide what code I have if useful.
Thanks, Dave
EDIT
The code I have been using to attempt a CP-SAT solution is below. I first create a NumPy array with one row per student and one column per staff member, and fill it with the value 100; this is the cost used for supervisors a student did not choose. I then take each student's choice ranks (1 to 5) and square them to give a bit of differentiation:
from __future__ import print_function
from ortools.sat.python import cp_model
import time
import pandas as pd
import numpy as np
capacity=pd.read_excel(r'C:\Users\Dave\Documents\assign_2019.xlsx',
index_col=0, sheet_name='capacity')
capacity=capacity.reset_index()
choices=pd.read_excel(r'C:\Users\Dave\Documents\assign_2019.xlsx',
index_col=0, sheet_name='choices')
choices=choices.reset_index()
array=np.empty((len(choices), len(capacity)))
array.fill(100)
cost = pd.DataFrame(data=array)
cost.index=choices['student']
cost.columns=capacity['staff']
choices=choices.set_index(['student'])
for i in choices.index:
    for j in choices.columns:
        s=choices.loc[i, j]
        cost.loc[i,s]=j**2
cost=cost.to_numpy()
cost=cost.astype(int)
sizes = capacity['capacity']
sizes=sizes.to_numpy()
sizes=sizes.astype(int)
def main():
    model = cp_model.CpModel()
    start = time.time()
    num_workers = len(cost)
    num_tasks = len(cost[1])
    # Variables
    x = []
    for i in range(num_workers):
        t = []
        for j in range(num_tasks):
            t.append(model.NewIntVar(0, 1, "x[%i,%i]" % (i, j)))
        x.append(t)
    x_array = [x[i][j] for i in range(num_workers) for j in
               range(num_tasks)]
    # Constraints
    # Each staff is allocated no more than capacity.
    [model.Add(sum(x[i][j] for i in range(num_workers)) <= sizes[j])
     for j in range(num_tasks)]
    # Number of projects allocated to a student is 1.
    [model.Add(sum(x[i][j] for j in range(num_tasks)) == 1)
     for i in range(num_workers)]
    model.Minimize(sum([np.dot(x_row, cost_row) for (x_row, cost_row) in
                        zip(x, cost)]))
    solver = cp_model.CpSolver()
    status = solver.Solve(model)
    if status == cp_model.OPTIMAL:
        print('Minimum cost = %i' % solver.ObjectiveValue())
        print()
        for i in range(num_workers):
            for j in range(num_tasks):
                if solver.Value(x[i][j]) >= 1:
                    print('Worker ', i, ' assigned to task ', j, ' Cost = ',
                          cost[i][j])
        print()
    end = time.time()
    print("Time = ", round(end - start, 4), "seconds")
if __name__ == '__main__':
    main()
EDIT 2:
Code used for MIP solver (data input is the same as the CP-SAT attempt described above):
from ortools.linear_solver import pywraplp
def main():
    solver = pywraplp.Solver('SolveAssignmentProblem',
                             pywraplp.Solver.CBC_MIXED_INTEGER_PROGRAMMING)
    start = time.time()
    num_workers = len(cost)
    num_tasks = len(cost[1])
    # Variables
    x = {}
    for i in range(num_workers):
        for j in range(num_tasks):
            x[i, j] = solver.IntVar(0, 1, 'x[%i,%i]' % (i, j))
    # Constraints
    # Staff can accept students up to capacity.
    for i in range(num_workers):
        solver.Add(solver.Sum([x[i, j] for j in range(num_tasks)]) <=
                   sizes[j])
    # Student is allocated 1 project.
    for j in range(num_tasks):
        solver.Add(solver.Sum([x[i, j] for i in range(num_workers)]) == 1)
    solver.Minimize(solver.Sum([cost[i][j] * x[i,j] for i in
                                range(num_workers)
                                for j in range(num_tasks)]))
    print('Minimum cost = ', solver.Objective().Value())
    print()
    for i in range(num_workers):
        for j in range(num_tasks):
            if x[i, j].solution_value() > 0:
                print('Worker', i,' assigned to task', j, ' Cost = ', cost[i][j])
    print()
    end = time.time()
    print("Time = ", round(end - start, 4), "seconds")
if __name__ == '__main__':
    main()
The outcome of the MIP attempt is:
Minimum cost = 0.0
Time = 0.553 seconds
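Two observations on the listings above, followed by a sketch. In the EDIT 2 code, `solver.Solve()` is never called before `solver.Objective().Value()` is read, and the two constraint loops are swapped: the capacity loop runs over students and reuses the leftover `j` from the previous loop, while the "1 project per student" loop runs over staff. Either would be consistent with a reported cost of 0.0 and no assignments. On the general question, one standard way to handle tasks that accept several agents is to expand each supervisor into as many identical "slots" as their capacity, which turns the problem into an ordinary rectangular assignment (85 students by about 100 slots). The sketch below illustrates that reduction on a hypothetical three-student instance using standard-library brute force; `expand_slots` and `best_assignment` are illustrative names, and brute force is only viable for tiny inputs.

```python
from itertools import permutations


def expand_slots(capacities):
    """Supervisor j with capacity c becomes c slots; returns the supervisor index of each slot."""
    return [j for j, cap in enumerate(capacities) for _ in range(cap)]


def best_assignment(cost, capacities):
    """Brute-force minimum-cost assignment of every student to a distinct slot.

    cost[s][j] is the cost of giving student s to supervisor j.
    Only suitable for tiny instances: it enumerates all slot permutations.
    """
    slots = expand_slots(capacities)
    best_total, best = None, None
    for chosen in permutations(range(len(slots)), len(cost)):
        total = sum(cost[s][slots[k]] for s, k in enumerate(chosen))
        if best_total is None or total < best_total:
            best_total, best = total, [slots[k] for k in chosen]
    return best_total, best


# Hypothetical data in the question's style: squared rank for a choice, 100 otherwise.
cost = [
    [1, 4],    # student 0: 1st choice supervisor 0, 2nd choice supervisor 1
    [1, 4],    # student 1: same preferences
    [1, 100],  # student 2: only chose supervisor 0
]
capacities = [2, 1]  # supervisor 0 takes two students, supervisor 1 takes one

total, supervisor_of = best_assignment(cost, capacities)
print(total, supervisor_of)  # 6 [0, 1, 0]
```

For real data, the expanded cost matrix (students as rows, slots as columns) can be handed to a rectangular assignment solver; for example, `scipy.optimize.linear_sum_assignment` accepts non-square cost matrices. The CP-SAT and MIP models in the question express the same thing directly through per-supervisor capacity constraints.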

Coding a secretary problem (Monte Carlo) - problems with python code

Trying to code the secretary problem in python by doing a Monte Carlo simulation (without using e). The essence of the problem is here: https://en.wikipedia.org/wiki/Secretary_problem
Described as: "Imagine an administrator who wants to hire the best secretary out of n rankable applicants for a position. The applicants are interviewed one by one in random order. A decision about each particular applicant is to be made immediately after the interview. Once rejected, an applicant cannot be recalled. During the interview, the administrator can rank the applicant among all applicants interviewed so far but is unaware of the quality of yet unseen applicants. The question is about the optimal strategy (stopping rule) to maximize the probability of selecting the best applicant." Taken from: https://www.geeksforgeeks.org/secretary-problem-optimal-stopping-problem/
Table that I'm checking my code against:
Here is my python code so far:
import numpy as np
n = 7; # of applicants
m = 10000; # of repeats
plot = np.zeros(1);
for i in range (2,m): #multiple runs
    array = np.random.randint(1,1000,n);
    for j in range(2,n): #over range of array
        test = 0;
        if array[j] > array[1] and array[j] == array.max():
            plot=plot+1
            test = 1;
            break
        if array[j]> array[1]:
            test = 2;
            break
print(plot/m)
print(array)
print("j = ",j)
print("test = ",test)
I am doing something wrong in my code, because I'm unable to replicate the table. In the code above I've tried to use 7 applicants and take the best applicant after the first 2.
The plot/m should output the percentage in column three given the number of applicants and 'take the best after'.
Answered! As below.
Additional code:
import numpy as np
import matplotlib.pyplot as plt
import time
plt.style.use('seaborn-v0_8-whitegrid')  # this style was named 'seaborn-whitegrid' before matplotlib 3.6
n = 150 #total number of applicants
nplot = np.empty([1,1])
#take = 3 #not necessary, turned into j below:
for k in range(2,n):
    m = 10000 #number of repeats
    plot = np.empty([1,1]);
    for j in range(1,k):
        passed = 0
        for i in range (0,m): #multiple runs
            array = np.random.rand(k);
            picked = np.argmax(array[j:]>max(array[0:j])) + j
            best = np.argmax(array)
            if best == picked:
                passed = passed+1
        #print(passed/m)
        plot = np.append(plot,[passed/m])
    #print(plot)
    plot = plot[1:];
    x = range(1,k);
    y = plot
    #print("N = ",k)
    # plot[0] corresponds to j = 1, so the best j is argmax + 1
    print("Check ",plot.argmax() + 1," if you have ",k," applicants", round(100* plot.max(),2),"% chance of finding the best applicant")
    nplot =np.append(nplot,plot.max())
# Plot:
nplot = nplot[1:];
x = range(2,n);
y = nplot
plt.plot(x, y, 'o', color='black');
plt.xlabel("Number of Applicants")
plt.ylabel("Probability of Best Applicant")
plt.show()
Here is something that seems to do the job and is a bit simpler. Comments:
Use argmax to determine who is the best secretary, or to pick the first one with a better grade than everyone in the observed group.
Draw grades from a real-valued (continuous) distribution to reduce the odds of two secretaries having the same grade.
Hence:
import numpy as np
n = 7
take = 2
m = 100000
passed = 0
for i in range (0,m): #multiple runs
    array = np.random.rand(n);
    picked = np.argmax(array[take:]>max(array[0:take])) + take
    best = np.argmax(array)
    if best == picked:
        passed = passed+1
print(passed/m)
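For checking the Monte Carlo results against a table, the success probability of this stopping rule (reject the first `take` applicants, then pick the first one better than all of them) also has a closed form. The best applicant is at position i > take with probability 1/n, and is then picked exactly when the best of the first i - 1 applicants lies among the first `take`, which happens with probability take/(i - 1). Summing gives P(n, take) = (take / n) * (1/take + 1/(take+1) + ... + 1/(n-1)). The standard-library sketch below computes that value exactly and compares it with a seeded simulation of the same rule; the function names are illustrative.

```python
import random
from fractions import Fraction


def exact_success_probability(n, take):
    """P(best applicant is picked) when the first `take` applicants are only observed."""
    if take == 0:
        return Fraction(1, n)  # the first applicant is always picked
    return Fraction(take, n) * sum(Fraction(1, k) for k in range(take, n))


def simulate(n, take, runs, rng):
    """Monte Carlo estimate of the same rule, with continuous scores as in the answer."""
    passed = 0
    for _ in range(runs):
        scores = [rng.random() for _ in range(n)]
        threshold = max(scores[:take], default=float("-inf"))
        # First applicant after the observation phase who beats all of them;
        # if nobody does, the last applicant is taken (and cannot be the best).
        picked = next((i for i in range(take, n) if scores[i] > threshold), n - 1)
        if picked == scores.index(max(scores)):
            passed += 1
    return passed / runs


n = 7
exact = exact_success_probability(n, 2)
print(exact)  # 29/70, about 0.414
best_take = max(range(n), key=lambda t: exact_success_probability(n, t))
print(best_take)  # 2
print(simulate(n, 2, 20000, random.Random(0)))  # close to 0.414
```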

Different forms of genetic algorithm

I wrote a code that implements a simple genetic algorithm to maximize:
f(x) = 15x - x^2
The function has its maximum at 7.5, so the output should be 7 or 8, since the individuals are integers.
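Since each chromosome is 4 bits long, the whole search space is just the integers 0 to 15, so the target can be checked by enumerating it. The small check below reuses the question's fitness function. It also shows that f(0) = f(15) = 0, which matters for the roulette-wheel selection, because `prob = [e/s for e in fittness]` divides by zero if every individual in a population happens to have fitness 0.

```python
def fit(x):
    return 15 * x - x ** 2


values = {x: fit(x) for x in range(16)}  # every possible 4-bit chromosome
best = max(values.values())
print(best, [x for x, v in values.items() if v == best])  # 56 [7, 8]
print(values[0], values[15])                               # 0 0
```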
When I run the code 10 times I get 7 or 8 around three times out of 10.
What modification should I make to further improve the algorithm and what are different types of genetic algorithms?
Here is the code:
from random import *
import numpy as np
#fitness function
def fit(x):
    return 15*x -x**2
#convert binary list to decimal number
def to_dec(x):
    return int("".join(str(e) for e in x), 2)
#picks pairs from the original population
def gen_pairs(populationl, prob):
    pairsl = []
    test = [0, 1, 2, 3, 4, 5]
    for i in range(3):
        pair = []
        for j in range(2):
            temp = np.random.choice(test, p=prob)
            pair.append(populationl[temp].copy())
        pairsl.append(pair)
    return pairsl
#mating function
def cross_over(prs, mp):
    new = []
    for pr in prs:
        if mp[prs.index(pr)] == 1:
            index = np.random.choice([1,2,3], p=[1/3, 1/3, 1/3])
            pr[0][:index], pr[1][:index] = pr[1][:index], pr[0][:index]
    for pr in prs:
        new.append(pr[0])
        new.append(pr[1])
    return new
#mutation
def mutation(x):
    for chromosome in x:
        for gene in chromosome:
            mutation_prob = np.random.choice([0, 1], p=[0.999, .001])
            if mutation_prob == 1:
                #m_index = np.random.choice([0,1,2,3])
                if gene == 0:
                    gene = 1
                else:
                    gene = 0
#generate initial population
randlist = lambda n:[randint(0,1) for b in range(1, n+1)]
for j in range(10):
    population = [randlist(4) for i in range(6)]
    for _ in range(20):
        fittness = [fit(to_dec(y)) for y in population]
        s = sum(fittness)
        prob = [e/s for e in fittness]
        pairsg = gen_pairs(population.copy(), prob)
        mating_prob = []
        for i in pairsg:
            mating_prob.append(np.random.choice([0,1], p=[0.4,0.6]))
        new_population = cross_over(pairsg, mating_prob)
        mutated = mutation(new_population)
        decimal_p = [to_dec(i)for i in population]
        decimal_new = [to_dec(i)for i in new_population]
        # print(decimal_p)
        # print(decimal_new)
        population = new_population
    print(decimal_new)
This is a very typical situation with evolutionary algorithms. Success rate is a quite common metric, and 30% is a decent result.
As an example, I recently implemented a GP/GE solver for the Santa Fe Trail problem, and it shows a success rate of 30% or less.
How to improve success rate
A personal interpretation of the problem based on limited experience follows.
An evolutionary algorithm fails to find a solution close to the global optimum when it converges around a local optimum or gets stuck on a large plateau, and does not have enough diversity in its population to escape this trap by finding a better region.
You may try to supply your algorithm with more diversity by increasing the size of the population. Or you may look into techniques like novelty search, and quality diversity.
By the way, here is a very nice interactive demonstration of novelty search vs. fitness search: http://eplex.cs.ucf.edu/noveltysearch/userspage/demo.html
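On the diversity point, mutation is normally one of the main sources of new genetic material, and in the question's code it has no effect: `for gene in chromosome` binds `gene` to each value in turn, so `gene = 1` only rebinds the loop variable and never changes the list, and `mutation` returns None (so `mutated` is always None). A minimal corrected version, as a sketch (the name `mutate` and the `rng` parameter are illustrative, not from the question), builds a new population in which each bit is flipped with probability `rate`:

```python
import random


def mutate(population, rate=0.001, rng=random):
    """Return a new population; each bit is flipped independently with probability `rate`."""
    return [[1 - gene if rng.random() < rate else gene for gene in chromosome]
            for chromosome in population]


population = [[0, 1, 1, 1], [1, 0, 0, 0]]
print(mutate(population, rate=1.0))  # [[1, 0, 0, 0], [0, 1, 1, 1]]: every bit flipped
print(mutate(population, rate=0.0))  # an unchanged copy
```

In the question's loop this would replace `mutated = mutation(new_population)` with `new_population = mutate(new_population)`. Note also that at the question's rate of 0.001, with 6 chromosomes of 4 bits over 20 generations, each run makes 480 mutation draws, so only 0.48 bit flips are expected per run; a rate of about 1 / (number of bits) is a commonly used rule of thumb when more diversity is wanted.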
