I'm using scipy.optimize.leastsq in conjunction with a simulator. leastsq calls a user-defined objective function and passes an input vector to it. In turn, the objective function returns an error vector. leastsq optimizes the input vector in such a way that the sum of the squares of the error vector is minimized.
In my case the objective function will run a whole simulation each time it is called. The employed simulator is single-threaded and needs several minutes for each run. I'd therefore like to run multiple instances of the simulator at once. However, calls to the objective function are performed serially.
How can I get leastsq to perform multiple calls to the objective function at once?
There's a good opportunity to speed up leastsq by supplying your own function to calculate the derivatives (the Dfun parameter), providing you have several parameters. If this function is not supplied, leastsq iterates over each of the parameters to calculate the derivative each time, which is time consuming. This appears to take the majority of the time in the fitting.
You can use your own Dfun function which calculates the derivatives for each parameter using a multiprocessing.Pool to do the work. These derivatives can be calculated independently and should be trivially parallelised.
Here is a rough example, showing how to do this:
import numpy as np
import multiprocessing
import scipy.optimize
def calcmod(params):
"""Return the model."""
return func(params)
def delta(params):
"""Difference between model and data."""
return calcmod(params) - y
pool = multiprocessing.Pool(4)
def Dfun(params):
"""Calculate derivatives for each parameter using pool."""
zeropred = calcmod(params)
derivparams = []
delta = 1e-4
for i in range(len(params)):
copy = np.array(params)
copy[i] += delta
derivparams.append(copy)
results = pool.map(calcmod, derivparams)
derivs = [ (r - zeropred)/delta for r in results ]
return derivs
retn = scipy.optimize.leastsq(leastfuncall, inputparams, gtol=0.01,
Dfun=Dfun, col_deriv=1)
The algorithm used by leastsq, Levenberg-Marquardt, needs to know the value of the objective function at the current point before determining the next point. In short, there is no straightforward way to parallelize such a serial algorithm.
You can, however, parallelize your objective function in some cases. This can be done, if it's of the form:
def objective_f(params):
r = np.zeros([200], float)
for j in range(200):
r[j] = run_simulation(j, params)
return
def run_simulation(j, params):
r1 = ... compute j-th entry of the result ...
return r1
Here, you can clearly parallelize across the loop over j, for instance using the multiprocessing module. Something like this: (untested)
def objective_f(params):
r = np.zeros([200], float)
def parameters():
for j in range(200):
yield j, params
pool = multiprocessing.Pool()
r[:] = pool.map(run_simulation, parameters())
return r
Another opportunity for parallelization occurs if you have to fit multiple data sets --- this is an (embarassingly) parallel problem, and the different data sets can be fitted in parallel.
If this does not help, you can look into discussion on parallelization of the LM algorithm in the literature. For instance: http://dl.acm.org/citation.cfm?id=1542338 The main optimization suggested in this paper seems to be parallelization of the numerical computation of the Jacobian. You can do this by supplying your own parallelized Jacobian function to leastsq. The remaining suggestion of the paper, speculatively parallelizing Levenberg-Marquardt search steps, is however more difficult to implement and requires changes in the LM algorithm.
I'm not aware of Python (or other language) libraries implementing optimization algorithms targeted for parallel computation, although there may be some. If you manage to implement/find one of them, please advertise this on the Scipy users mailing list --- there is certainly interest in one of these!
Does this help?
http://docs.python.org/library/multiprocessing.html
I've always found Pool to be the simplest to multiprocess with python.
NumPy/SciPy's functions are usually optimized for multithreading. Did you look at your CPU utilization to confirm that only one core is being used while the simulation is being ran? Otherwise you have nothing to gain from running multiple instances.
If it is, in fact, single threaded, then your best option is to employ the multiprocessing module. It runs several instances of the Python interpreter so you can make several simultaneous calls to SciPy.
Have you used scipy.least_squares, it is a much better option, and when I use it to optimize a function it uses all the available threads. Therefore exactly what you asked
Related
I am looking for a differential evolution algorithm (hopefully the one from Scipy) I could use in an unorthodox way. I would like that for each generation, the DE gives me all the child members of the new generation in advance and that I evaluate them all at once in my objective function.
The reason is that my objective function calls COMSOL. I can do a batch of calculations in a COMSOL that COMSOl is going to parallelize carefully, so I don't want the DE to parallelize it itself. So in the end, I want to calculate all the members in one call of COMSOL. Do you have any idea of a package in Python with this kind of freedom?
Thank you for your help!
You can vectorise differential_evolution by using the ability of the workers keyword to accept a map-like callable that is sent the entire population and is expected to return an array with the function values evaluated for the entire population:
from scipy.optimize import rosen, differential_evolution
bounds=[(0, 10), (0, 10)]
def maplike_fun(func, x):
# x.shape == (S, N), where S is the size of the population and N
# is the number of parameters. This is where you'd call out from
# Python to COMSOL, instead of the following line.
return func(x.T)
res = differential_evolution(rosen, bounds, workers=maplike_fun, polish=False, updating='deferred')
From scipy 1.9 there will also be a vectorized keyword, which will send the entire population to the objective function at each iteration.
Context:
I'm developing an optimizer by using SciPy's differential evolution package. I get some good results with worker = 1, but I would like to speed up the runtime.
I checked the following thread already regarding How to enable parallel in scipy.optimize.differential_evolution? Even if I add if __name__ == "main": and set workers = -1, the runtime is exactly the same. I tested my code on my local machine (2 physical, 4 logical processors) or on our server environment (16 cores)
I tested the following use case https://medium.com/#grvsinghal/speed-up-your-code-using-multiprocessing-in-python-36e4e703213e. Changing the number of workers does impact the runtime, so parallel processing work on my laptop and server as well.
Consequently, my hypothesis is that the way I defined my objective and constraint function might be the problem.
Pseudo Code:
The code is for work, so I can't share it. My objective and constraint functions need a lot of constants, hence I wrapped them into a Class. I know that the args parameter is for that, but the constraint function doesn't have that parameter.
The code structure looks the following way:
class MyClass:
def __init__(configuration, array1, array2, dataframe):
# assigning attributes so that
self.something = datafame[column1]
...
def obj(self, x):
# based on the initialized values + optimized parameters, it calculates the objective
def cons(self, x):
# based on the initialized values + optimized parameters, it calculates the constraint violations
Then, I create a class instance o = MyClass() and call the differential evolution function with the class module: differential_evolution(func = o.obj, ...).
Question:
Has anyone faced the following issue, i.e. even if you set multiple workers the code runs on one?
Do you have any suggestions on how to better design the objective and constraint functions so that they are eligible for parallel processing?
Thank you!
I am trying to parallelize an interaction with a Python object that is computationally expensive. I would like to use Ray to do this but so far my best efforts have failed.
The object is a CPLEX model object and I'm trying to add a set of constraints for a list of conditions.
Here's my setup:
import numpy as np
import docplex.mp.model as cpx
import ray
m = cpx.Model(name="mymodel")
def mask_array(arr, mask_val):
array_mask = np.argwhere(arr == mask_val)
arg_slice = [i[0] for i in array_mask]
return arg_slice
weeks = [1,3,7,8,9]
const = 1.5
r = rate = np.array(df['r'].tolist(), dtype=np.float)
x1 = m.integer_var_list(data_indices, lb=lower_bound, ub=upper_bound)
x2 = m.dot(x1, r)
#ray.remote
def add_model_constraint(m, x2, x2sum, const):
m.add_constraint(x2sum <= x2*const)
return m
x2sums = []
for w in weeks:
arg_slice = mask_array(x2, w)
x2sum = m.dot([x2[i] for i in arg_slice], r[arg_slice])
x2sums.append(x2sum)
#: this is the expensive part
for x2sum in x2sums:
add_model_constraint.remote(m, x2, x2sum, const)
In a nutshell, what I'm doing is creating a model object, some variables, and then looping over a set of weeks in order to build a constraint. I subset my variable, compute some dot products and apply the constraint. I would like to be able to create the constraint in parallel because it takes a while but so far my code just hangs and I'm not sure why.
I don't know if I should return the model object in my function because by default the m.add_constraint method modifies the object in place. But at the same time I know Ray returns references to the remote value so yea, not sure what's supposed to happen there.
Is this at all a valid use of ray? It it reasonable to expect to be able to modify a CPLEX object in this way (or any other arbitrary python object)?
I am new to Ray so I may be structuring this all wrong, or maybe this will never work for X, Y, and Z reason which would also be good to know.
The Model object is not designed to be used in parallel. You cannot add constraints from multiple threads at the same time. This will result in undefined behavior. You will have to at least a lock to make sure only thread at a time adds constraints.
Note that parallel model building may not be a good idea at all: the order of constraints will be more or less random. On the other hand, behavior of the solver may depend on the order of constraints (this is called performance variability). So you may have a hard time reproducing certain results/behavior.
I understand the primary issue was the performance of module building.
From the code you sent, I have two suggestions to address this:
post constraints in batches, that is store constraints in a list and add them once using Model.add_constraints(), this should be more efficient than adding them one at a time.
experiment with Model.dotf() (functional-style scalar product). It avoids building auxiliary lists, passing instead a function of the key , returning the coefficient.
This method is new in Docplex version 2.12.
For example, assuming a list of 3 variables:
abc = m.integer_var_list(3, name=["a", "b", "c"]) m.dotf(abc, lambda
k: k+2)
docplex.mp.LinearExpression(a+2b+3c)
Model.dotf() is usually faster than Model.dot()
I’m trying to solve a simple ODE to visualise the temporal response, which works well for constant input conditions using the new solve_ivp integration API in SciPy. For example:
def dN1_dt_simple(t, N1):
return -100 * N1
sol = solve_ivp(fun=dN1_dt_simple, t_span=[0, 100e-3], y0=[N0,])
However, I wonder is it possible to plot the response to a time-varying input? For instance, rather than having y0 fixed at N0, can I find the response to a simple sinusoid?
Is there a compatible way to pass time-varying input conditions into the API?
The function you pass to solve_ivp has a t in its signature for exactly this reason. You can do with it whatever you like¹. For example, to get a smooth, one-time pulse, you can do:
from numpy import pi, cos
def fun(t,N1):
input = 1-cos(t) if 0<t<2*pi else 0
return -100*N1 + input
sol = solve_ivp(fun=fun, t_span=[0,20], y0=[N0])
Note that y0 is not the input in your use of the term, but the initial condition. It is defined and makes sense for one point in time only – where you start your integration/simulation.
With ODEs, you typically model external inputs as forces or similar (affecting the time derivative of the system like in the above example) rather than direct changes to the state.
With this approach and in your context of an excitable system, N0 is already the outcome of some external input.
¹ As long as it is sufficiently smooth for the needs of the respective integrator, usually continuously differentiable (C¹). If you want to implement a step, better use a very sharp sigmoid instead.
I'm trying to use scipy.optimize.minimize to minimize a complicated function. I noticed in hindsight that the minimize function takes the objective and derivative functions as separate arguments. Unfortunately, I've already defined a function which returns the objective function value and first-derivative values together -- because the two are computed simultaneously in a for loop. I don't think there is a good way to separate my function into two without the program essentially running the same for loop twice.
Is there a way to pass this combined function to minimize?
(FYI, I'm writing an artificial neural network backpropagation algorithm, so the for loop is used to loop over training data. The objective and derivatives are accumulated concurrently.)
Yes, you can pass them in a single function:
import numpy as np
from scipy.optimize import minimize
def f(x):
return np.sin(x) + x**2, np.cos(x) + 2*x
sol = minimize(f, [0], jac=True, method='L-BFGS-B')
Something that might work is: you can memoize the function, meaning that if it gets called with the same inputs a second time, it will simply return the same outputs corresponding to those inputs without doing any actual work the second time. What is happening behind the scenes is that the results are getting cached. In the context of a nonlinear program, there could be thousands of calls which implies a large cache. Often with memoizers(?), you can specify a cache limit and the population will be managed FIFO. IOW you still benefit fully for your particular case because the inputs will be the same only when you are needing to return function value and derivative around the same point in time. So what I'm getting at is that a small cache should suffice.
You don't say whether you are using py2 or py3. In Py 3.2+, you can use functools.lru_cache as a decorator to provide this memoization. Then, you write your code like this:
#functools.lru_cache
def original_fn(x):
blah
return fnvalue, fnderiv
def new_fn_value(x):
fnvalue, fnderiv = original_fn(x)
return fnvalue
def new_fn_deriv(x):
fnvalue, fnderiv = original_fn(x)
return fnderiv
Then you pass each of the new functions to minimize. You still have a penalty because of the second call, but it will do no work if x is unchanged. You will need to research what unchanged means in the context of floating point numbers, particularly since the change in x will fall away as the minimization begins to converge.
There are lots of recipes for memoization in py2.x if you look around a bit.
Did I make any sense at all?