I want to solve a (convex) mixed integer program as well as its continuous relaxation using cvxpy. Is there a way to use the same implementation of the objective and the constraints for both calculations?
As an example, take a look at the MIP example problem from the cvxpy website with an added constraint x[0] >= 2:
import cvxpy as cp
import numpy as np

np.random.seed(0)
m, n = 40, 25
A = np.random.rand(m, n)
b = np.random.randn(m)
# Construct a CVXPY problem
x = cp.Variable(n, integer=True)  # x is an integer variable
obj = cp.sum_squares(A @ x - b)
objective = cp.Minimize(obj)
constraint = [x[0] >= 2]
prob = cp.Problem(objective, constraint)
prob.solve()
print("The optimal value is", prob.value)
print("A solution x is")
print(x.value)
x = cp.Variable(n) # Now, x is no longer an integer variable but continuous
obj = cp.sum_squares(A @ x - b)  # I want to leave out this line (1)
constraint = [x[0] >= 2] # I want to leave out this line (2)
objective = cp.Minimize(obj)
prob = cp.Problem(objective, constraint)
prob.solve()
print("The optimal value is", prob.value)
print("A solution x is")
print(x.value)
When leaving out line (2), the problem is solved without the constraint. When leaving out line (1), the mixed integer problem is solved (so, changing 'x' to a continuous variable did not have any effect).
I want to avoid reimplementing the objective function and constraints because a missed copy and paste may lead to weird, hard-to-find errors.
Thank you for your help!
Edit: Thank you, Sascha, for your reply. You are right, outsourcing the model building solves the problem. So
class ModelBuilder:
    m, n = 40, 25
    A = np.random.rand(m, n)
    b = np.random.randn(m)

    def __init__(self, solve_continuous):
        np.random.seed(0)
        if solve_continuous:
            self.x = cp.Variable(self.n)
        else:
            self.x = cp.Variable(self.n, integer=True)

    @staticmethod
    def constraint_func(x):
        return [x[0] >= 2]

    def objective_func(self, x):
        return cp.sum_squares(self.A @ x - self.b)

    def build_problem(self):
        objective = cp.Minimize(self.objective_func(self.x))
        constraint = self.constraint_func(self.x)
        return cp.Problem(objective, constraint)
# Construct and solve mixed integer problem
build_cont_model = False
MIP_Model = ModelBuilder(build_cont_model)
MIP_problem = MIP_Model.build_problem()
MIP_problem.solve()
print("The optimal value is", MIP_problem.value)
print("A solution x is")
print(MIP_Model.x.value)
# Construct and solve continuous problem
build_cont_model = True
Cont_Model = ModelBuilder(build_cont_model)
Cont_problem = Cont_Model.build_problem()
Cont_problem.solve()
print("The optimal value is", Cont_problem.value)
print("A solution x is")
print(Cont_Model.x.value)
works just as expected. The fact that I did not come up with this simple idea myself shows me that I do not yet understand the concept of applying a cvxpy.Variable to an expression.
In my first attempt, I defined the variable x and used it when defining obj. Then, I reassigned x (one line before (1)). I thought that obj was linked to x by a pointer or something similar, so that it would change its behavior as well. Apparently, this is not the case.
Do you know any resources that could help me understand this behavior? Or is it obvious to anyone that is familiar with Python? Then, where could I learn about it?
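To check my understanding with plain Python (no cvxpy involved; the names a and expr are just for illustration):

a = [1, 2, 3]
expr = a           # expr now references the same list object as a
a = [4, 5, 6]      # rebinding the name a points it at a new object
print(expr)        # [1, 2, 3] -- expr still references the original list

If that is the right mental model, then obj simply kept a reference to the first Variable object, and rebinding the name x afterwards could not affect it. (Ned Batchelder's article "Facts and myths about Python names and values" covers this in detail.)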
I'm building a simple N-body integrator in Python 3.5 that implements leapfrog timestepping as a position Verlet scheme.
In essence, it updates two lists of floats, x_tmp and v_tmp, back and forth, where I need a call to self.forces() to update v_tmp.
It is this function call that slows me down incredibly (I've profiled). The call does nothing out of the ordinary, just a few square roots and some adding and dividing of numbers.
for t in range(self.max_timesteps):
    # For all objects do the position verlet with list comprehensions
    x_tmp = [x_tmp[j] + 0.5*self.timestep*v_tmp[j] for j in range(self.num_objects)]
    v_tmp = [v_tmp[j] + self.timestep*self.forces(x_tmp[j]) for j in range(self.num_objects)]
    x_tmp = [x_tmp[j] + 0.5*self.timestep*v_tmp[j] for j in range(self.num_objects)]
    if t % self.outputtime == 0:
        self.x_list[outputcounter] = x_tmp
        self.v_list[outputcounter] = v_tmp
and the function self.forces() is
def forces(self, x):
    r = np.sqrt(x[0]**2 + x[1]**2 + x[2]**2)  # spherical radius
    R = math.hypot(x[0], x[1])                # cylindrical radius

    def _f1(r, x, y, z):  # bulge contribution
        f = -G*self.Mb/(r*(r + self.rb)**2)
        return np.array([f*x, f*y, f*z])

    def _f2(R, x, y, z):  # disc contribution
        rr = math.hypot(self.disc_b, z)
        arr = self.disc_a + rr
        arrR = math.hypot(arr, R)
        f = -G*self.Md/arrR**3.0
        fz = f*(arr/rr)
        return np.array([f*x, f*y, fz*z])

    def _f3(r, x, y, z):  # halo contribution
        f = -self.Vh**2/(self.rh**2 + r**2)
        return np.array([f*x, f*y, f*z])

    a = _f1(r, x[0], x[1], x[2]) + _f2(R, x[0], x[1], x[2]) + _f3(r, x[0], x[1], x[2])
    return np.array((a[0], a[1], a[2]))
Now both x_tmp = ... lines in the code block above scale well with num_objects (their cost barely grows), but the v_tmp line with the function call in it scales linearly with num_objects.
This is pretty bad. With max_timesteps = 10^6 the runtime of this code is roughly num_objects seconds, so if I want to compute 200 objects, it takes me 200 seconds. This is completely unacceptable.
However, I'm a bit at a loss for what to do here, as I've already optimized a few 2D square roots with math.hypot() and a couple of other things. But the forces() call is still incredibly slow, something that would never happen in C or equivalent.
So now I'm asking for help: is there anything obvious that I have overlooked in optimizing those function calls? Or could I quickly build a C function to call that would speed things up?
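One idea I have is to vectorize forces() over all objects at once with numpy, so the Python-level call overhead is paid once per timestep rather than once per object. An untested sketch of just the bulge term _f1, assuming positions are kept in an array X of shape (num_objects, 3):

import numpy as np

# Placeholder constants, assumed for illustration only
G, Mb, rb = 4.3e-6, 1.0e10, 1.0

def f1_vectorized(X):
    """Bulge force for all objects at once; X has shape (num_objects, 3)."""
    r = np.linalg.norm(X, axis=1)       # spherical radii, shape (num_objects,)
    f = -G * Mb / (r * (r + rb)**2)     # one scalar factor per object
    return f[:, None] * X               # broadcast the factor over x, y, z

# The position Verlet step then becomes three whole-array operations:
# X += 0.5*dt*V;  V += dt*f1_vectorized(X);  X += 0.5*dt*V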
Any ideas appreciated.
I developed a SCIP/MIP model that solves the LP relaxation and branches on 0-1 variables by hand. However, it is quite inefficient, as I have not figured out how to use the relevant SCIP callbacks.
Here is my code:
isMIP = False
while True:
    model.optimize()
    if isMIP:
        print("Optimal value:", model.getObjVal())
        break
    else:
        print("Intermediate value:", model.getObjVal())
        x, y, u = model.data
        fracvars = []
        for j in y:
            w = model.getVal(y[j])
            if 0.001 < w < 0.999:
                fracvars.append([j, abs(w - 0.5)])
        if fracvars:
            fracvars.sort(key=itemgetter(1))
            min_var, min_value = fracvars[0]  # most fractional variable
            model.freeTransform()
            model.chgVarType(y[min_var], "I")  # the very inefficient part...
            print("Integer constraint on y[%s]" % min_var)
        else:
            isMIP = True
Could anyone help me speed up the code? Many thanks.
Please see http://scip.zib.de/doc-5.0.1/html/BRANCH.php for how to write a branching rule and http://scip.zib.de/doc-5.0.1/html/SEPA.php for cutting plane separators (I am still not sure what you want to do exactly...). This is the description for C plugins, but the equivalents should exist in PySCIPOpt or should be easy to add if you know what you need.
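For instance, a custom branching rule in PySCIPOpt might look roughly like the following sketch; the class and method names are from my recollection of the PySCIPOpt interface, so please check them against its documentation:

from pyscipopt import Model, Branchrule, SCIP_RESULT

class MostFractionalBranching(Branchrule):
    # Branch on the LP candidate whose fractional part is closest to 0.5
    def branchexeclp(self, allowaddcons):
        lpcands, lpcandssol, lpcandsfrac = self.model.getLPBranchCands()[:3]
        if not lpcands:
            return {"result": SCIP_RESULT.DIDNOTRUN}
        best = min(range(len(lpcands)), key=lambda i: abs(lpcandsfrac[i] - 0.5))
        self.model.branchVar(lpcands[best])
        return {"result": SCIP_RESULT.BRANCHED}

model = Model()
# ... build the model with binary variables y ...
model.includeBranchrule(MostFractionalBranching(), "mostfrac",
                        "branch on the most fractional variable",
                        priority=10000000, maxdepth=-1, maxbounddist=1)

This lets SCIP keep a single branch-and-bound tree instead of re-solving from scratch after every chgVarType/freeTransform round trip.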
I have an undirected graph with 1034 vertices and 53498 edges. I'm computing the preferential attachment index for the vertices. The preferential attachment similarity between two vertices is defined as the product of the degree of the first vertex and the degree of the second vertex. I noticed that my computation is very slow: it took 2.7 minutes for the graph just mentioned. I'm not sure whether my algorithm is slow or something else is wrong. I would be very thankful if someone could have a little look at my code.
Edit: I just realized that S is a 1034-by-1034 matrix. Looking at the nested for-loops, it seems that it's an O(n^2) algorithm! I guess that is why it's slow. Don't you agree?
def pa(graph):
    """
    Calculates the Preferential Attachment index.
    Returns S, the similarity matrix.
    """
    A = gts.adjacency(graph)
    S = np.zeros(A.shape)
    for i in xrange(S.shape[0]):
        for j in xrange(S.shape[0]):
            i_degree = graph.vertex(i).out_degree()
            j_degree = graph.vertex(j).out_degree()
            factor = i_degree * j_degree
            S[i, j] = factor
    return S
With all I know about it, these are the speedups I can suggest:
Zeroth speedup: i_degree does not depend on j, so move it up one level
def pa(graph):
    A = gts.adjacency(graph)
    S = np.zeros(A.shape)
    for i in xrange(S.shape[0]):
        i_degree = graph.vertex(i).out_degree()  # easy to see that this can be put here instead, since it does not depend on j
        for j in xrange(S.shape[0]):
            j_degree = graph.vertex(j).out_degree()
            factor = i_degree * j_degree
            S[i, j] = factor
    return S
First speedup: call out_degree() only N times instead of 2N^2 times.
def pa2(graph):
    A = gts.adjacency(graph)
    i_degree = numpy.zeros(A.shape[0])
    for i in xrange(A.shape[0]):
        i_degree[i] = graph.vertex(i).out_degree()
    S = np.zeros(A.shape)
    for i in xrange(S.shape[0]):
        for j in xrange(S.shape[0]):
            S[i, j] = i_degree[i]*i_degree[j]
    return S
Second speedup: numpy broadcasting instead of Python for-loops.
def pa3(graph):
    A = gts.adjacency(graph)
    i_degree = numpy.zeros(A.shape[0])
    for i in xrange(A.shape[0]):
        i_degree[i] = graph.vertex(i).out_degree()
    S = i_degree[:, None]*i_degree[None, :]
    return S
This exploits the symmetry of your problem.
Note: [None,:] does the same as [numpy.newaxis,:]. If you wanted to keep your original code, you could also put an @memoize decorator on the out_degree() method, but memoization is best used on recursive functions, and this is not one of those cases.
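For reference, the broadcasting line in pa3 is just an outer product; a quick self-contained check (the array deg is made up for illustration):

import numpy as np

deg = np.array([3.0, 1.0, 4.0])
S_broadcast = deg[:, None] * deg[None, :]  # (3, 1) * (1, 3) -> (3, 3)
S_outer = np.outer(deg, deg)               # same matrix
assert np.allclose(S_broadcast, S_outer)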
The subset sum problem is well-known for being NP-complete, but there are various tricks to solve versions of the problem somewhat quickly.
The usual dynamic programming algorithm requires space that grows with the target sum. My question is: can we reduce this space requirement?
I am trying to solve a subset sum problem with a modest number of elements but a very large target sum. The number of elements is too large for the exponential time algorithm (and shortcut method) and the target sum is too large for the usual dynamic programming method.
Consider this toy problem that illustrates the issue: given the set A = [2, 3, 6, 8], find the number of subsets that sum to target = 11. Enumerating all subsets, we see the answer is 2: (3, 8) and (2, 3, 6).
The dynamic programming solution gives the same result, of course - ways[11] returns 2:
def subset_sum(A, target):
    ways = [0] * (target + 1)
    ways[0] = 1
    ways_next = ways[:]
    for x in A:
        for j in range(x, target + 1):
            ways_next[j] += ways[j - x]
        ways = ways_next[:]
    return ways[target]
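A quick sanity check on the toy instance:

print(subset_sum([2, 3, 6, 8], 11))  # prints 2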
Now consider the target sum target = 1100 with the set A = [200, 300, 600, 800]. Clearly there are still 2 solutions: (300, 800) and (200, 300, 600). However, the ways array has grown by a factor of 100.
Is it possible to skip over certain weights when filling out the dynamic programming storage array? For my example problem I could compute the greatest common divisor of the input set and then divide all items by that constant, but this won't work for my real application.
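For concreteness, that GCD reduction would look like this on the toy instance (a sketch; as noted, it does not help in my real application):

from functools import reduce
from math import gcd

A = [200, 300, 600, 800]
target = 1100
g = reduce(gcd, A + [target])     # g == 100 here
A_small = [a // g for a in A]     # [2, 3, 6, 8]
target_small = target // g        # 11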
This SO question is related, but those answers don't use the approach I have in mind. The second comment by Akshay on this page says:
...in the cases where n is very small (eg. 6) and sum is very large
(eg. 1 million) then the space complexity will be too large. To avoid
large space complexity n HASHTABLES can be used.
This seems closer to what I'm looking for, but I can't seem to actually implement the idea. Is this really possible?
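The closest I can get to that suggestion is a dict-based version that stores only the sums that are actually reachable (a sketch; I suspect the number of reachable sums can itself blow up):

def subset_sum_dict(A, target):
    ways = {0: 1}                          # sum -> number of subsets reaching it
    for x in A:
        for s, c in list(ways.items()):    # snapshot, so each item is used once
            if s + x <= target:
                ways[s + x] = ways.get(s + x, 0) + c
    return ways.get(target, 0)

print(subset_sum_dict([200, 300, 600, 800], 1100))  # prints 2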
Edited to add: A smaller example of a problem to solve. There is 1 solution.
target = 5213096522073683233230240000
A = [2316931787588303659213440000,
1303274130518420808307560000,
834095443531789317316838400,
579232946897075914803360000,
425558899761116998631040000,
325818532629605202076890000,
257436865287589295468160000,
208523860882947329329209600,
172333769324749858949760000,
144808236724268978700840000,
123386899930738064691840000,
106389724940279249657760000,
92677271503532146368537600,
81454633157401300519222500,
72153585080604612224640000,
64359216321897323867040000,
57762842349846905631360000,
52130965220736832332302400,
47284322195679666514560000,
43083442331187464737440000,
39418499221729173786240000,
36202059181067244675210000,
33363817741271572692673536,
30846724982684516172960000,
28604096143065477274240000,
26597431235069812414440000,
24794751591313594450560000,
23169317875883036592134400,
21698632766175580575360000,
20363658289350325129805625,
19148196591638873216640000,
18038396270151153056160000,
17022355990444679945241600]
A real problem is:
target = 262988806539946324131984661067039976436265064677212251086885351040000
A = [116883914017753921836437627140906656193895584300983222705282378240000,
65747201634986581032996165266759994109066266169303062771721337760000,
42078209046391411861117545770726396229802410348353960173901656166400,
29220978504438480459109406785226664048473896075245805676320594560000,
21468474003260924418937523352411426647858372626711204170357987840000,
16436800408746645258249041316689998527266566542325765692930334440000,
12987101557528213537381958571211850688210620477887024745031375360000,
10519552261597852965279386442681599057450602587088490043475414041600,
8693844844295746252297013588993057072273225278585528961549928960000,
7305244626109620114777351696306666012118474018811451419080148640000,
6224587137040149683597270084426981690799173128454727836375984640000,
5367118500815231104734380838102856661964593156677801042589496960000,
4675356560710156873457505085636266247755823372039328908211295129600,
4109200102186661314562260329172499631816641635581441423232583610000,
3639983481521748430892521260443459881470796742937193786669693440000,
3246775389382053384345489642802962672052655119471756186257843840000,
2914003396564502206448583502127866774917064428556368433095682560000,
2629888065399463241319846610670399764362650646772122510868853510400,
2385386000362324935437502594712380738650930291856800463373109760000,
2173461211073936563074253397248264268068306319646382240387482240000,
1988573206351200938616141104476672789688204647842814753019927040000,
1826311156527405028694337924076666503029618504702862854770037160000,
1683128361855656474444701830829055849192096413934158406956066246656,
1556146784260037420899317521106745422699793282113681959093996160000,
1443011284169801504153550952356872298690068941987447193892375040000,
1341779625203807776183595209525714165491148289169450260647374240000,
1250838556670374906691960338012080744048823137584838292922165760000,
1168839140177539218364376271409066561938955843009832227052823782400,
1094646437211014876720019400903392201607763016346356924399106560000,
1027300025546665328640565082293124907954160408895360355808145902500,
965982760477305139144112620999228563585913919842836551283325440000,
909995870380437107723130315110864970367699185734298446667423360000,
858738960130436976757500934096457065914334905068448166814319513600,
811693847345513346086372410700740668013163779867939046564460960000,
768411414287644482489363509326632509674989232073666182868912640000,
728500849141125551612145875531966693729266107139092108273920640000,
691620793004461075955252231602997965644352569828303092930664960000,
657472016349865810329961652667599941090662661693030627717213377600,
625791330255672395317036671188673352614551016483550865168079360000,
596346500090581233859375648678095184662732572964200115843277440000,
568931977371436071675467087219123799753953628290345594563299840000,
543365302768484140768563349312066067017076579911595560096870560000,
519484062301128541495278342848474027528424819115480989801255014400,
497143301587800234654035276119168197422051161960703688254981760000,
476213321032044045508347054897310957784092466595223632570186240000,
456577789131851257173584481019166625757404626175715713692509290000,
438132122515529069774235170457376054037925971973698044293020160000,
420782090463914118611175457707263962298024103483539601739016561664,
404442609057972047876946806715939986830088526993021531852188160000,
389036696065009355224829380276686355674948320528420489773499040000,
374494562534633427030238036407319297168052779889230688624970240000,
360752821042450376038387738089218074672517235496861798473093760000,
347753793771829850091880543559722282890929011143421158461997158400,
335444906300951944045898802381428541372787072292362565161843560000,
323778155173833578494287055791985197213007158728485381455075840000,
312709639167593726672990084503020186012205784396209573230541440000,
302199145693704480473409550206308504954053507241841138853071360000,
292209785044384804591094067852266640484738960752458056763205945600,
282707666261699891568916593460940582033071824431295083135592960000,
273661609302753719180004850225848050401940754086589231099776640000,
265042888929147215048611399412486748738992254650755607041456640000,
256825006386666332160141270573281226988540102223840088952036475625,
248983485481605987343890803377079267631966925138189113455039385600,
241495690119326284786028155249807140896478479960709137820831360000,
234340660761814501342824380545368657996226388663143017230461440000,
227498967595109276930782578777716242591924796433574611666855840000,
220952578483466770957349011608519198854244960871423861446658560000,
214684740032609244189375233524114266478583726267112041703579878400,
208679870295533683104133831435857945991878646837700655494453760000,
202923461836378336521593102675185167003290944966984761641115240000,
197401994025105141026072179446079922264038329650750423033879040000,
192102853571911120622340877331658127418747308018416545717228160000,
187014262428406274938300203425450649910232934881573156328451805184,
182125212285281387903036468882991673432316526784773027068480160000,
177425404985627474536673746714144021883127046501745489011223040000,
172905198251115268988813057900749491411088142457075773232666240000,
168555556186474170249629649778586749838977769381324948621621760000,
164368004087466452582490413166899985272665665423257656929303344400]
In the particular comment you linked to, the suggestion is to use a hashtable to only store values which actually arise as a sum of some subset. In the worst case, this is exponential in the number of elements, so it is basically equivalent to the brute force approach you already mentioned and ruled out.
In general, there are two parameters to the problem - the number of elements in the set and the size of the target sum. Naive brute force is exponential in the first, while the standard dynamic programming solution is exponential in the second. This works well when one of the parameters is small, but you already indicated that both parameters are too big for an exponential solution. Therefore, you are stuck with the "hard" general case of the problem.
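One option worth mentioning for the in-between regime: for the smaller n = 33 example (though not for the n = 78 one), the classic Horowitz-Sahni meet-in-the-middle approach is feasible, since each half only has about 2^17 subsets. A rough sketch:

from collections import Counter
from itertools import combinations

def count_subsets_mitm(A, target):
    # Split A in half and enumerate all subset sums of each half
    half = len(A) // 2
    left, right = A[:half], A[half:]

    def all_sums(items):
        sums = Counter()
        for k in range(len(items) + 1):
            for comb in combinations(items, k):
                sums[sum(comb)] += 1
        return sums

    right_sums = all_sums(right)
    # Each left sum s pairs with the right sums equal to target - s
    return sum(c * right_sums[target - s] for s, c in all_sums(left).items())

print(count_subsets_mitm([2, 3, 6, 8], 11))  # prints 2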
Most NP-complete problems have some underlying graph, whether implicit or explicit. Using graph partitioning and DP, such problems can be solved in time exponential in the treewidth of that graph but only polynomial in its size, with the treewidth held constant. Of course, without access to your data, it is impossible to say what the underlying graph might look like, or whether it falls into one of the classes of graphs with bounded treewidth that can be solved efficiently.
Edit: I just wrote the following code to show what I meant by reducing the problem modulo small numbers. It solves your first problem in less than a second, but it doesn't work on the larger problem (though it does reduce it to n=57, log(t)=68).
target = 5213096522073683233230240000
A = [2316931787588303659213440000,
1303274130518420808307560000,
834095443531789317316838400,
579232946897075914803360000,
425558899761116998631040000,
325818532629605202076890000,
257436865287589295468160000,
208523860882947329329209600,
172333769324749858949760000,
144808236724268978700840000,
123386899930738064691840000,
106389724940279249657760000,
92677271503532146368537600,
81454633157401300519222500,
72153585080604612224640000,
64359216321897323867040000,
57762842349846905631360000,
52130965220736832332302400,
47284322195679666514560000,
43083442331187464737440000,
39418499221729173786240000,
36202059181067244675210000,
33363817741271572692673536,
30846724982684516172960000,
28604096143065477274240000,
26597431235069812414440000,
24794751591313594450560000,
23169317875883036592134400,
21698632766175580575360000,
20363658289350325129805625,
19148196591638873216640000,
18038396270151153056160000,
17022355990444679945241600]
import itertools, time
from fractions import gcd

def gcd_r(seq):
    return reduce(gcd, seq)

def miniSolve(t, vals):
    vals = [x for x in vals if x and x <= t]
    for k in range(len(vals) + 1):  # + 1 so the full set is also tried
        for sub in itertools.combinations(vals, k):
            if sum(sub) == t:
                return sub
    return None

def tryMod(n, state, answer):
    t, vals, mult = state
    mods = [x % n for x in vals if x % n]
    if (t % n or mods) and sum(mods) < n:
        print 'Filtering with', n
        print t.bit_length(), len(vals)
    else:
        return state

    newvals = list(vals)
    tmod = t % n
    if not tmod:
        # Residues cannot wrap around n, so no element with a nonzero residue can be used
        for x in vals:
            if x % n:
                newvals.remove(x)
    else:
        if len(set(mods)) != len(mods):
            # don't want to deal with the complexity of multisets for now
            print 'skipping', n
        else:
            mini = miniSolve(tmod, mods)
            if mini is None:
                return None
            mini = set(mini)
            for x in vals:
                mod = x % n
                if mod:
                    if mod in mini:
                        t -= x
                        answer.add(x * mult)
                    newvals.remove(x)
    g = gcd_r(newvals + [t])
    t = t // g
    newvals = [x // g for x in newvals]
    mult *= g
    return (t, newvals, mult)

def solve(t, vals):
    answer = set()
    mult = 1
    for d in itertools.count(2):
        if not t:
            return answer
        elif not vals or t < min(vals):
            return None  # no solution
        res = tryMod(d, (t, vals, mult), answer)
        if res is None:
            return None
        t, vals, mult = res
        if len(vals) < 23:
            break
        if (d % 10000) == 0:
            print 'd', d

    # don't want to deal with the complexity of multisets for now
    assert(len(set(vals)) == len(vals))
    rest = miniSolve(t, vals)
    if rest is None:
        return None
    answer.update(x * mult for x in rest)
    return answer

start_t = time.time()
answer = solve(target, A)
assert(answer <= set(A) and sum(answer) == target)
print answer