I am working on a large quadratic programming problem. I would like to feed the Q matrix defining the objective function into IBM's CPLEX using the Python API. The Q matrix is built as a scipy lil_matrix because it is sparse. Ideally, I would like to pass the matrix to CPLEX directly. Does CPLEX accept a scipy lil_matrix?
I can convert Q to the list-of-lists format which CPLEX accepts, let's call it qMat. But qMat becomes too large and the machine runs out of memory (even with 120 GB).
Below is my work-in-progress code. In the actual problem n is around half a million, m is around 5 million, and Q is given rather than randomly generated as in the example below.
from __future__ import division
import random

import numpy as np
import cplex
from scipy import sparse

n = 10
m = 5

def create():
    # Build a random sparse symmetric Q (stand-in for the real data).
    Q = sparse.lil_matrix((n, n))
    nums = random.sample(range(0, n), m)
    for i in nums:
        for j in nums:
            a = random.uniform(0, 1)
            Q[i, j] = a
            Q[j, i] = a
    return Q

def convert(Q):
    # Convert the lil_matrix into the list-of-lists format
    # that objective.set_quadratic() expects.
    qMat = [[[], []] for _ in range(n)]
    for k in range(n):  # note: range(n-1) would drop the last row
        qMat[k][0] = Q.rows[k]
        qMat[k][1] = Q.data[k]
    return qMat

Q = create()
qMat = convert(Q)

my_prob = cplex.Cplex()
my_prob.objective.set_quadratic(qMat)
If n = 500000 and m = 5000000, that is n × m = 2.5e12 non-zeroes. For each of these you'd need roughly one double for the non-zero value and one CPXDIM for the index, i.e. 8 + 4 = 12 bytes per non-zero. This would give:
>>> print(2.5e12 * 12 / 1024. / 1024. / 1024.)
27939.6772385
That is roughly 27,940 GB, or about 28 TB of memory! It's not clear exactly how many non-zeros you plan on having, but using this calculation you can easily find out whether what you're asking is even possible.
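As a rough sanity check (a small sketch, not part of the original answer), you can wrap that arithmetic in a helper and plug in your true non-zero count:

def qp_memory_gb(nnz, bytes_per_nonzero=12):
    # ~8 bytes for the double value plus ~4 bytes for the CPXDIM index
    return nnz * bytes_per_nonzero / 1024.0 ** 3

print(qp_memory_gb(2.5e12))  # ~27940 GB, i.e. ~28 TB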
As mentioned in the comments, the CPLEX Python API does not accept scipy lil matrices. You could try docplex, which is numpy friendly, or you could even try generating an LP file directly.
Using something like the following is probably your best bet in terms of reducing the conversion overhead (I think I made an off-by-one error in the comments section above):
my_prob.objective.set_quadratic(list(zip(Q.rows, Q.data)))
or
my_prob.objective.set_quadratic([[row, data] for row, data in zip(Q.rows, Q.data)])
At any rate, you should play with these to see what gives the best performance (in terms of speed and memory).
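For example, a quick way to compare the two conversions on a small instance before committing to the full-size problem (an illustrative sketch; Q here is the lil_matrix built above):

import timeit

t_zip = timeit.timeit(lambda: list(zip(Q.rows, Q.data)), number=1000)
t_listcomp = timeit.timeit(lambda: [[row, data] for row, data in zip(Q.rows, Q.data)], number=1000)
print(t_zip, t_listcomp)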
Here is my problem:
I would like to define an array of persons and change the entries of this array in a for loop. Since I also want to look at the asymptotics of the resulting distribution, I want to repeat this simulation quite a lot of times, so I use a matrix and store the array from each repetition in its own row. I know how to do this with two for loops:
import random
import numpy as np

nobs = 100
rep = 10**2
steps = 10**2
dmoney = 1

state = np.matrix([[10] * nobs] * rep)

for i in range(steps):
    for j in range(rep):
        # pick two persons and move one unit of money between them
        sample = random.sample(range(state.shape[1]), 2)
        state[j, sample[0]] = state[j, sample[0]] + dmoney
        state[j, sample[1]] = state[j, sample[1]] - dmoney
I thought of using the multiprocessing library, but I don't know how to do it, because in my simple mind the workers would manipulate the same global matrix in parallel, which I have read is not a good idea.
So, how can I speed up these calculations?
Thanks in advance.
OK, so this might not be much use; I haven't profiled it to see if there's a speed-up, but list comprehensions will be a little faster than normal loops anyway.
...
y_ix = np.arange(rep)  # create once, as it is the same for each loop
for i in range(steps):
    # presumably the two locations in the population to swap need refreshing each loop;
    # replace=False keeps the two picks distinct, matching random.sample
    x_ix = np.array([np.random.choice(nobs, 2, replace=False) for j in range(rep)])
    state[y_ix, x_ix[:, 0]] += dmoney
    state[y_ix, x_ix[:, 1]] -= dmoney
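For reference, here is a self-contained version of that idea assembled from the question's setup (a sketch: it uses a plain ndarray via np.full rather than np.matrix, and replace=False so the two picks stay distinct, as with random.sample):

import numpy as np

nobs, rep, steps, dmoney = 100, 10**2, 10**2, 1
state = np.full((rep, nobs), 10)   # one row per repetition
y_ix = np.arange(rep)              # row indices, fixed across iterations

for _ in range(steps):
    x_ix = np.array([np.random.choice(nobs, 2, replace=False) for _ in range(rep)])
    state[y_ix, x_ix[:, 0]] += dmoney
    state[y_ix, x_ix[:, 1]] -= dmoney

print(state.sum())  # total money is conserved: rep * nobs * 10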
PS: what numpy splits over multiple processors depends on which libraries it was compiled against (BLAS etc.); you will be able to find info online about this.
EDIT: I can confirm, after comparing the original with the numpy-indexed version above, that the original method is faster!
I have this decision variable:
x = {}
for j in range(10):
    for i in range(500000):
        x[i, j] = m.addVar(vtype=GRB.BINARY, name="x%d%d" % (i, j))
I also need to add constraints on each x[i,j] variable, like this:
for p in range(10):
    for u in range(500000):
        m.addConstr(x[u, p-1] <= x[u, p])
This is taking a very long time (more than 12 hours), and then an out-of-memory pop-up appears on my computer.
Can someone help me improve this constraint-addition step?
Most likely, you are running out of physical memory and using virtual (swap) memory. This would not cause your computer to report an out-of-memory warning or error.
I rewrote your code as follows:
from gurobipy import *

m = Model()

x = {}
for j in range(10):
    for i in range(500000):
        x[i, j] = m.addVar(vtype=GRB.BINARY, name="x%d%d" % (i, j))
m.update()

for p in range(10):
    for u in range(500000):
        try:
            m.addConstr(x[u, p-1] <= x[u, p])
        except KeyError:
            # x[u, -1] does not exist, so the p == 0 border case is skipped
            pass
m.update()
I tested this using Gurobi Optimizer 6.5.2 on a computer with an Intel Xeon E3-1240 processor (3.40 GHz) and 32 GB of physical memory. It was able to formulate the variables and constraints in 1 minute 14 seconds. You might be able to save a small amount of memory using a list, but I believe that Gurobi Var and Constr objects require far more memory than a Python dict or list.
General remark:
Adding 5 million constraints looks quite costly in general.
Specific remarks:
Approach: you are wasting time and space by using dictionaries. Despite their constant access complexity, the constants are big, and they also waste memory. In a simple 2-dimensional case like this, stick to arrays!
Validity: your indexing misses the border case of the first element (p = 0), so the indexing breaks!
Try this much more efficient approach, using numpy arrays:
import numpy as np
from gurobipy import *

N = 10
M = 500000

m = Model("Testmodel")

x = np.empty((N, M), dtype=object)
for i in range(N):
    for j in range(M):
        x[i, j] = m.addVar(vtype=GRB.BINARY, name="x%d%d" % (i, j))
m.update()

for u in range(M):           # I switched the loop order
    for p in range(1, N):    # and I'm handling the border case
        m.addConstr(x[p-1, u] <= x[p, u])
Result:
~2 minutes
~2.5GB memory (complete program incl. Gurobi's internals)
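One further note, not from either answer above: Gurobi 7.0 and later also provide the batch helpers Model.addVars and Model.addConstrs, which push the Python loops into the library. Assuming they are available in your version, the model could be built roughly like this:

from gurobipy import *

N, M = 10, 500000
m = Model("Testmodel")

# addVars returns a tupledict keyed by (p, u)
x = m.addVars(N, M, vtype=GRB.BINARY, name="x")

# one generator expression instead of nested Python loops
m.addConstrs((x[p-1, u] <= x[p, u] for p in range(1, N) for u in range(M)),
             name="chain")
m.update()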
I'm a bit of a newbie to both Matlab and Python, so many apologies if this question is a bit dumb...
I'm trying to convert some Matlab code over to Python using numpy and scipy, and things were going fine until I reached a sparse-matrix call that someone wrote. The Matlab code is:
unwarpMatrix = sparse(phaseOrigin, ceil([1:nRead*nSlice*nPhaseDmap]/expan), 1, numPoints, numPoints)/expan;
Here's my python code (with my thought process) leading up to my attempt at conversion. For a given dataset I was testing with (in both Matlab and Python):
nread = 64
nslice = 28
nphasedmap = 3200
expan = 100
numpoints = 57344
Thus, the lengths of the phaseorigin, s, and j arrays are 5734400 (and I've confirmed that the functions that create my phaseorigin array output exactly the same result that Matlab does).
import numpy as np
from scipy.sparse import csc_matrix

# Matlab sparse takes: S = sparse(i,j,s,m,n)
# Generates an m by n sparse matrix such that: S(i(k),j(k)) = s(k)
# scipy csc matrix takes: csc_matrix((data, ij), shape=(M, N))
# Matlab code is: unwarpMatrix = sparse(phaseOrigin, ceil([1:nRead*nSlice*nPhaseDmap]/expan), 1, numPoints, numPoints)/expan;

size = nread * nslice * nphasedmap

# i would be the phaseorigin variable
j = np.ceil(np.arange(1, size + 1, dtype=np.double) / expan)
# Matlab apparently treats '1' as a scalar, so I should be tiling 1 to the same size as j and phaseorigin
s = np.tile(1, size)

unwarpmatrix = csc_matrix((s, (phaseorigin, j)), shape=(numpoints, numpoints)) / expan
So when I try to run my Python code I get:
ValueError: column index exceedes matrix dimensions
This doesn't occur when I run the Matlab code even though the array sizes are larger than the defined matrix size...
What am I doing wrong? I've obviously screwed something up... Thanks very much in advance for any help!
The problem is: Python indices start from 0, whereas Matlab indices start from 1. So for an array of size 57344, in Python the first element is arr[0] and the last element is arr[57343].
Your variable j has values from 1 to 57344. You probably see the problem. Creating your j like this would solve it:
j = np.floor(np.arange(0,size, dtype=np.double)/expan)
Still, better to check this before using...
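Putting it together, a sketch of the full conversion might look like the following, reusing the variables defined in the question and assuming phaseorigin was also produced with Matlab-style 1-based indices and so needs the same shift (if it is already 0-based, drop the - 1):

import numpy as np
from scipy.sparse import csc_matrix

size = nread * nslice * nphasedmap

# 0-based equivalents of the Matlab index vectors
i = np.asarray(phaseorigin) - 1                         # only if phaseorigin is 1-based
j = np.floor(np.arange(size) / expan).astype(np.int64)  # same as ceil([1:size]/expan) - 1
s = np.ones(size)

unwarpmatrix = csc_matrix((s, (i, j)), shape=(numpoints, numpoints)) / expan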
What's the best (fastest) way to do this?
This generates what I believe is the correct answer, but obviously at N = 10e6 it is painfully slow. I think I need to keep the Xi values so I can correctly calculate the standard deviation, but are there any techniques to make this run faster?
import numpy as np
from numpy import random

def randomInterval(a, b):
    # uniform sample from [a, b)
    r = ((b - a) * float(random.random(1)) + a)
    return r

N = 10e6
Sum = 0
x = []
for sample in range(0, int(N)):
    n = randomInterval(-5., 5.)
    while n == 5.0:
        n = randomInterval(-5., 5.)  # since X is [-5,5)
    Sum += n
    x = np.append(x, n)
A = Sum / N

summation = 0
for sample in range(0, int(N)):
    summation += (x[sample] - A) ** 2.0  # accumulate the squared deviations
standard_deviation = np.sqrt((1. / N) * summation)
You made a decent attempt, but make sure you understand this rather than copying it verbatim, since this is homework:
import numpy as np
N = int(1e6)
a = np.random.uniform(-5,5,size=(N,))
standard_deviation = np.std(a)
This assumes you can use a package like numpy (you tagged the question as such). If you can, there is a whole host of methods for creating and operating on arrays of data, which avoids explicit looping (it's done under the hood in an efficient manner). It would be good to take a look at the documentation to see what features are available and how to use them:
http://docs.scipy.org/doc/numpy/reference/index.html
Using the formulas found on this wiki page for Variance, you could compute it in one loop without storing a list of the random numbers (assuming you didn't need them elsewhere).
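For illustration, here is a sketch of that last idea (not code from the answer): a single pass keeps a running count, mean, and sum of squared differences (Welford's update), so the samples never need to be stored:

import numpy as np

def streaming_std(n_samples, low=-5.0, high=5.0):
    # One-pass (Welford) standard deviation of uniform samples.
    count, mean, m2 = 0, 0.0, 0.0
    for _ in range(n_samples):
        x = np.random.uniform(low, high)
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    return np.sqrt(m2 / count)   # population standard deviation

print(streaming_std(10**5))      # should be close to 10 / sqrt(12) ≈ 2.89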
The context: my Python code passes arrays of 2D vertices to OpenGL.
I tested two approaches, one with ctypes and the other with struct, the latter being more than twice as fast.
from random import random
points = [(random(), random()) for _ in xrange(1000)]

from ctypes import c_float
def array_ctypes(points):
    n = len(points)
    return n, (c_float * (2 * n))(*[u for point in points for u in point])

from struct import pack
def array_struct(points):
    n = len(points)
    return n, pack("f" * 2 * n, *[u for point in points for u in point])
Any other alternative?
Any hint on how to accelerate such code (and yes, this is one bottleneck of my code)?
You can pass numpy arrays to PyOpenGL without incurring any overhead. (The data attribute of the numpy array is a buffer that points to the underlying C data structure that contains the same information as the array you're building)
import numpy as np

def array_numpy(points):
    n = len(points)
    return n, np.array(points, dtype=np.float32)
On my computer, this is about 40% faster than the struct-based approach.
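If you want to reproduce such a comparison yourself, a minimal timing harness could look like this (an illustrative sketch; it assumes array_ctypes, array_struct and array_numpy from this thread are defined in the same module):

import timeit
from random import random

points = [(random(), random()) for _ in range(1000)]
setup = "from __main__ import array_ctypes, array_struct, array_numpy, points"

for fn in ("array_ctypes", "array_struct", "array_numpy"):
    t = timeit.timeit("%s(points)" % fn, setup=setup, number=1000)
    print(fn, t)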
You could try Cython. For me, this gives:
function        usec per loop
                Python    Cython
array_ctypes      1370      1220
array_struct       384       249
array_numpy        336       339
So Numpy only gives 15% benefit on my hardware (old laptop running WindowsXP), whereas Cython gives about 35% (without any extra dependency in your distributed code).
If you can loosen your requirement that each point is a tuple of floats, and simply make 'points' a flattened list of floats:
def array_struct_flat(points):
    n = len(points)
    return pack(
        "f" * n,
        *[
            coord
            for coord in points
        ]
    )

points = [random() for _ in xrange(1000 * 2)]
then the resulting output is the same, but the timing goes down further:
function             usec per loop
                     Python    Cython
array_struct_flat    157
Cython might be capable of doing substantially better than this too, if someone smarter than me wanted to add static type declarations to the code. (Running 'cython -a test.pyx' is invaluable for this: it produces an HTML file showing where the slowest plain Python is in your code (yellow) versus Python that has been converted to pure C (white). That's why I spread the code above over so many lines: the colouring is done per line, so spreading it out helps.)
Full Cython instructions are here:
http://docs.cython.org/src/quickstart/build.html
Cython might produce similar performance benefits across your whole codebase, and in ideal conditions, with proper static typing applied, can improve speed by factors of ten or a hundred.
There's another idea I stumbled across. I don't have time to profile it right now, but in case someone else does:
# untested, but I'm fairly confident it runs
# using 'flattened points' list, i.e. a list of n*2 floats
points = [random() for _ in xrange(1000 * 2)]
c_array = (c_float * len(points))()   # create the ctypes array, still unpopulated
c_array[:] = points                   # fill it via slice assignment
That is, first we create the ctypes array but don't populate it. Then we populate it using the slice notation. People smarter than I tell me that assigning to a slice like this may help performance. It allows us to pass a list or iterable directly on the RHS of the assignment, without having to use the *iterable syntax, which would perform some intermediate wrangling of the iterable. I suspect that this is what happens in the depths of creating pyglet's Batches.
Presumably you could just create c_array once, then just reassign to it (the final line in the above code) every time the points list changes.
There is probably an alternative formulation which accepts the original definition of points (a list of (x, y) tuples). Something like this:
# very untested, likely contains errors
# using a list of n tuples of two floats
from itertools import chain
points = [(random(), random()) for _ in xrange(1000)]
c_array = (c_float * (len(points) * 2))()
c_array[:] = list(chain.from_iterable(points))   # flatten the tuples into the array
If performance is an issue, you do not want to use ctypes arrays with the star operation (e.g., (ctypes.c_float * size)(*t)).
In my test, pack is fastest, followed by the use of the array module with a cast of the address (or using the from_buffer function).
import timeit
repeat = 100
setup="from struct import pack; from random import random; import numpy; from array import array; import ctypes; t = [random() for _ in range(2* 1000)];"
print(timeit.timeit(stmt="v = array('f',t); addr, count = v.buffer_info();x = ctypes.cast(addr,ctypes.POINTER(ctypes.c_float))",setup=setup,number=repeat))
print(timeit.timeit(stmt="v = array('f',t);a = (ctypes.c_float * len(v)).from_buffer(v)",setup=setup,number=repeat))
print(timeit.timeit(stmt='x = (ctypes.c_float * len(t))(*t)',setup=setup,number=repeat))
print(timeit.timeit(stmt="x = pack('f'*len(t), *t);",setup=setup,number=repeat))
print(timeit.timeit(stmt='x = (ctypes.c_float * len(t))(); x[:] = t',setup=setup,number=repeat))
print(timeit.timeit(stmt='x = numpy.array(t,numpy.float32).data',setup=setup,number=repeat))
The array.array approach is slightly faster than Jonathan Hartley's approach in my test, while the numpy approach has about half the speed:
python3 convert.py
0.004665990360081196
0.004661010578274727
0.026358536444604397
0.0028003649786114693
0.005843495950102806
0.009067213162779808
The net winner is pack.
You can use array (notice also the generator expression instead of the list comprehension):
array("f", (u for point in points for u in point)).tostring()
Another optimization would be to keep the points flattened from the beginning.
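For instance, a small sketch of that last suggestion (keeping the points flat from the start):

from array import array
from random import random

# points kept flat from the beginning: [x0, y0, x1, y1, ...]
points = [random() for _ in range(1000 * 2)]
buf = array("f", points).tostring()   # use .tobytes() on Python 3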