I am trying to parallelize an interaction with a Python object that is computationally expensive. I would like to use Ray to do this but so far my best efforts have failed.
The object is a CPLEX model object and I'm trying to add a set of constraints for a list of conditions.
Here's my setup:
import numpy as np
import docplex.mp.model as cpx
import ray
m = cpx.Model(name="mymodel")
def mask_array(arr, mask_val):
array_mask = np.argwhere(arr == mask_val)
arg_slice = [i[0] for i in array_mask]
return arg_slice
weeks = [1,3,7,8,9]
const = 1.5
r = rate = np.array(df['r'].tolist(), dtype=np.float)
x1 = m.integer_var_list(data_indices, lb=lower_bound, ub=upper_bound)
x2 = m.dot(x1, r)
#ray.remote
def add_model_constraint(m, x2, x2sum, const):
m.add_constraint(x2sum <= x2*const)
return m
x2sums = []
for w in weeks:
arg_slice = mask_array(x2, w)
x2sum = m.dot([x2[i] for i in arg_slice], r[arg_slice])
x2sums.append(x2sum)
#: this is the expensive part
for x2sum in x2sums:
add_model_constraint.remote(m, x2, x2sum, const)
In a nutshell, what I'm doing is creating a model object, some variables, and then looping over a set of weeks in order to build a constraint. I subset my variable, compute some dot products and apply the constraint. I would like to be able to create the constraint in parallel because it takes a while but so far my code just hangs and I'm not sure why.
I don't know if I should return the model object in my function because by default the m.add_constraint method modifies the object in place. But at the same time I know Ray returns references to the remote value so yea, not sure what's supposed to happen there.
Is this at all a valid use of ray? It it reasonable to expect to be able to modify a CPLEX object in this way (or any other arbitrary python object)?
I am new to Ray so I may be structuring this all wrong, or maybe this will never work for X, Y, and Z reason which would also be good to know.
The Model object is not designed to be used in parallel. You cannot add constraints from multiple threads at the same time. This will result in undefined behavior. You will have to at least a lock to make sure only thread at a time adds constraints.
Note that parallel model building may not be a good idea at all: the order of constraints will be more or less random. On the other hand, behavior of the solver may depend on the order of constraints (this is called performance variability). So you may have a hard time reproducing certain results/behavior.
I understand the primary issue was the performance of module building.
From the code you sent, I have two suggestions to address this:
post constraints in batches, that is store constraints in a list and add them once using Model.add_constraints(), this should be more efficient than adding them one at a time.
experiment with Model.dotf() (functional-style scalar product). It avoids building auxiliary lists, passing instead a function of the key , returning the coefficient.
This method is new in Docplex version 2.12.
For example, assuming a list of 3 variables:
abc = m.integer_var_list(3, name=["a", "b", "c"]) m.dotf(abc, lambda
k: k+2)
docplex.mp.LinearExpression(a+2b+3c)
Model.dotf() is usually faster than Model.dot()
Related
Context:
I'm developing an optimizer by using SciPy's differential evolution package. I get some good results with worker = 1, but I would like to speed up the runtime.
I checked the following thread already regarding How to enable parallel in scipy.optimize.differential_evolution? Even if I add if __name__ == "main": and set workers = -1, the runtime is exactly the same. I tested my code on my local machine (2 physical, 4 logical processors) or on our server environment (16 cores)
I tested the following use case https://medium.com/#grvsinghal/speed-up-your-code-using-multiprocessing-in-python-36e4e703213e. Changing the number of workers does impact the runtime, so parallel processing work on my laptop and server as well.
Consequently, my hypothesis is that the way I defined my objective and constraint function might be the problem.
Pseudo Code:
The code is for work, so I can't share it. My objective and constraint functions need a lot of constants, hence I wrapped them into a Class. I know that the args parameter is for that, but the constraint function doesn't have that parameter.
The code structure looks the following way:
class MyClass:
def __init__(configuration, array1, array2, dataframe):
# assigning attributes so that
self.something = datafame[column1]
...
def obj(self, x):
# based on the initialized values + optimized parameters, it calculates the objective
def cons(self, x):
# based on the initialized values + optimized parameters, it calculates the constraint violations
Then, I create a class instance o = MyClass() and call the differential evolution function with the class module: differential_evolution(func = o.obj, ...).
Question:
Has anyone faced the following issue, i.e. even if you set multiple workers the code runs on one?
Do you have any suggestions on how to better design the objective and constraint functions so that they are eligible for parallel processing?
Thank you!
Multiple symfit model instances share parameter objects with the same name. I'd like to understand where this behaviour comes from, what it's intent is and if it's possible to deactivate.
To illustrate what I mean, a minimial example:
import symfit as sf
# Create Parameters and Variables
a = sf.Parameter('a',value=0)
b = sf.Parameter('b',value=1,fixed=True)
x, y = sf.variables('x, y')
# Instanciate two models
model1=sf.Model({y:a*x+b})
model2=sf.Model({y:a*x+b})
# They are indeed not the same
id(model1) == id(model2)
>>False
# There are two parameters
print(model1.params)
>>[a,b]
print(model1.params[1].name, model1.params[1].value)
>>b 1
print(model2.params[1].name, model2.params[1].value)
>>b 1
#They are initially identical
# We want to manually modify the fixed one in only one model
model1.params[1].value = 3
# Both have changed
print(model1.params[1].name, model1.params[1].value)
>>b 3
print(model2.params[1].name, model2.params[1].value)
>>b 3
id(model1.params[1]) == id(model2.params[1])
>>True
# The parameter is the same object
I want to fit multiple data streams with different models, but different fixed paramter values dependent on the data stream. Renaming the parameters in each instance of the model would work, but is ugly given that the paramter represents the same quantity. Processing them sequentially and modifying the parameters in between is possible, but I worry about unintended interactions between steps.
PS: Can someone with sufficient reputation please create the symfit tag
Excellent question. In principle this is because Parameter objects are a subclass of sympy.Symbol, and from its docstring:
Symbols are identified by name and assumptions:
>>> from sympy import Symbol
>>> Symbol("x") == Symbol("x")
True
>>> Symbol("x", real=True) == Symbol("x", real=False)
False
This is fundamental to the inner working of sympy, and therefore something we also use in symfit. But the value and fixed arguments are not viewed as assumptions, so they are not used to distinguish parameters.
Now, to your question on how this would affect fitting. Like you say, working sequentially is a good solution, and one that will not have any side effects:
model = sf.Model({y:a*x+b})
b.fixed = True
fit_results = []
for b_value, xdata, ydata in datastream:
b.value = b_value
fit = Fit(model, x=xdata, y=ydata)
fit_results.append(fit.execute())
So there is no need to define a new Parameter every iteration, the b.value attribute will be the same within each loop so there is no way this can go wrong. The only way I can imagine this going wrong is if you use threading, that will probably create some race conditions. But threading is not desirable for CPU bound tasks anyway, multiprocessing is the way to go. And in that case, separate processes will be spawned, creating separate microcosms, so there should be no problem there either.
I hope this answers your question, if not let me know.
p.s. I'm slowly answering my way up to 1500 to make that tag, but if someone beats me to it I'd be all the happier for it of course ;)
Suppose I have some kind of monte carlo simulation f(x), where x is some parameter that I want to hold constant for N trials. The obvious way to parallelize this and collect the output is:
results = pool.map(f,x * np.ones(N))
However, the meaning of x in my program is really not sequence-like, so typing x * np.ones(N) seems kind of silly. None of the map variants seem to do exactly this, but maybe there exists some kind of function like:
results = pool.const_map(f,x)
Such a function could return a multiset rather than a list because the idea "repeatedly do something" does not have a notion of "order". Or maybe the developers have some interest in adding such a function? I could submit a pull request eventually when I find the time.
I'm trying to use scipy.optimize.minimize to minimize a complicated function. I noticed in hindsight that the minimize function takes the objective and derivative functions as separate arguments. Unfortunately, I've already defined a function which returns the objective function value and first-derivative values together -- because the two are computed simultaneously in a for loop. I don't think there is a good way to separate my function into two without the program essentially running the same for loop twice.
Is there a way to pass this combined function to minimize?
(FYI, I'm writing an artificial neural network backpropagation algorithm, so the for loop is used to loop over training data. The objective and derivatives are accumulated concurrently.)
Yes, you can pass them in a single function:
import numpy as np
from scipy.optimize import minimize
def f(x):
return np.sin(x) + x**2, np.cos(x) + 2*x
sol = minimize(f, [0], jac=True, method='L-BFGS-B')
Something that might work is: you can memoize the function, meaning that if it gets called with the same inputs a second time, it will simply return the same outputs corresponding to those inputs without doing any actual work the second time. What is happening behind the scenes is that the results are getting cached. In the context of a nonlinear program, there could be thousands of calls which implies a large cache. Often with memoizers(?), you can specify a cache limit and the population will be managed FIFO. IOW you still benefit fully for your particular case because the inputs will be the same only when you are needing to return function value and derivative around the same point in time. So what I'm getting at is that a small cache should suffice.
You don't say whether you are using py2 or py3. In Py 3.2+, you can use functools.lru_cache as a decorator to provide this memoization. Then, you write your code like this:
#functools.lru_cache
def original_fn(x):
blah
return fnvalue, fnderiv
def new_fn_value(x):
fnvalue, fnderiv = original_fn(x)
return fnvalue
def new_fn_deriv(x):
fnvalue, fnderiv = original_fn(x)
return fnderiv
Then you pass each of the new functions to minimize. You still have a penalty because of the second call, but it will do no work if x is unchanged. You will need to research what unchanged means in the context of floating point numbers, particularly since the change in x will fall away as the minimization begins to converge.
There are lots of recipes for memoization in py2.x if you look around a bit.
Did I make any sense at all?
I'm using scipy.optimize.leastsq in conjunction with a simulator. leastsq calls a user-defined objective function and passes an input vector to it. In turn, the objective function returns an error vector. leastsq optimizes the input vector in such a way that the sum of the squares of the error vector is minimized.
In my case the objective function will run a whole simulation each time it is called. The employed simulator is single-threaded and needs several minutes for each run. I'd therefore like to run multiple instances of the simulator at once. However, calls to the objective function are performed serially.
How can I get leastsq to perform multiple calls to the objective function at once?
There's a good opportunity to speed up leastsq by supplying your own function to calculate the derivatives (the Dfun parameter), providing you have several parameters. If this function is not supplied, leastsq iterates over each of the parameters to calculate the derivative each time, which is time consuming. This appears to take the majority of the time in the fitting.
You can use your own Dfun function which calculates the derivatives for each parameter using a multiprocessing.Pool to do the work. These derivatives can be calculated independently and should be trivially parallelised.
Here is a rough example, showing how to do this:
import numpy as np
import multiprocessing
import scipy.optimize
def calcmod(params):
"""Return the model."""
return func(params)
def delta(params):
"""Difference between model and data."""
return calcmod(params) - y
pool = multiprocessing.Pool(4)
def Dfun(params):
"""Calculate derivatives for each parameter using pool."""
zeropred = calcmod(params)
derivparams = []
delta = 1e-4
for i in range(len(params)):
copy = np.array(params)
copy[i] += delta
derivparams.append(copy)
results = pool.map(calcmod, derivparams)
derivs = [ (r - zeropred)/delta for r in results ]
return derivs
retn = scipy.optimize.leastsq(leastfuncall, inputparams, gtol=0.01,
Dfun=Dfun, col_deriv=1)
The algorithm used by leastsq, Levenberg-Marquardt, needs to know the value of the objective function at the current point before determining the next point. In short, there is no straightforward way to parallelize such a serial algorithm.
You can, however, parallelize your objective function in some cases. This can be done, if it's of the form:
def objective_f(params):
r = np.zeros([200], float)
for j in range(200):
r[j] = run_simulation(j, params)
return
def run_simulation(j, params):
r1 = ... compute j-th entry of the result ...
return r1
Here, you can clearly parallelize across the loop over j, for instance using the multiprocessing module. Something like this: (untested)
def objective_f(params):
r = np.zeros([200], float)
def parameters():
for j in range(200):
yield j, params
pool = multiprocessing.Pool()
r[:] = pool.map(run_simulation, parameters())
return r
Another opportunity for parallelization occurs if you have to fit multiple data sets --- this is an (embarassingly) parallel problem, and the different data sets can be fitted in parallel.
If this does not help, you can look into discussion on parallelization of the LM algorithm in the literature. For instance: http://dl.acm.org/citation.cfm?id=1542338 The main optimization suggested in this paper seems to be parallelization of the numerical computation of the Jacobian. You can do this by supplying your own parallelized Jacobian function to leastsq. The remaining suggestion of the paper, speculatively parallelizing Levenberg-Marquardt search steps, is however more difficult to implement and requires changes in the LM algorithm.
I'm not aware of Python (or other language) libraries implementing optimization algorithms targeted for parallel computation, although there may be some. If you manage to implement/find one of them, please advertise this on the Scipy users mailing list --- there is certainly interest in one of these!
Does this help?
http://docs.python.org/library/multiprocessing.html
I've always found Pool to be the simplest to multiprocess with python.
NumPy/SciPy's functions are usually optimized for multithreading. Did you look at your CPU utilization to confirm that only one core is being used while the simulation is being ran? Otherwise you have nothing to gain from running multiple instances.
If it is, in fact, single threaded, then your best option is to employ the multiprocessing module. It runs several instances of the Python interpreter so you can make several simultaneous calls to SciPy.
Have you used scipy.least_squares, it is a much better option, and when I use it to optimize a function it uses all the available threads. Therefore exactly what you asked