I'm working on developing an adaptive rejection sampler in Numba. I'd like to use a class to implement it since I think it'll make the code a lot cleaner, and I see that classes are supported in Numba. My class would be a lot more general/useful if it could take functions as input, i.e. the log pdf of the distribution I want to sample from. Is there any way to do that? The alternative I guess is to define the log pdf equations in the class definition itself.
Why do I want to do this? The sampler will be used as part of a Gibbs sampling scheme, so speed-ups in each sampling step are crucial. I have to simulate from a distribution I know only up to a normalizing constant, and adaptive rejection sampling is a general technique that will help me sample without needing to know this normalizing constant. There is a Python implementation of an adaptive rejection sampler floating around Stack Overflow, but it's too slow for my purposes. It also randomly breaks for some reason on some simulated data that it should work on. I've had luck with Numba on other parts of my project, including a more than 100x speed-up on one part of the Gibbs sampler.
Numba functions cannot take functions as input arguments. The official docs recommend possibly using closures in a function factory as a workaround in some cases:
http://numba.pydata.org/numba-doc/latest/user/faq.html#can-i-pass-a-function-as-an-argument-to-a-jitted-function
Copying the code example from the above link, in case the URL ever becomes invalid:
from numba import jit

def make_f(g):
    # Note: a new f() is compiled each time make_f() is called!
    @jit(nopython=True)
    def f(x):
        return g(x) + g(-x)
    return f

f = make_f(my_g_function)
result = f(1)
Not sure if this would work in your particular case though. I think defining the functions you want as class methods would be a better strategy, although without a code example, I'm just guessing.
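If the closure route does work for you, a minimal sketch of what it might look like for a log-pdf (the density and function names here are made-up placeholders, not part of your code):

from numba import njit

# Hypothetical unnormalized log-density; it must itself be jitted so the
# compiled closure can call it in nopython mode
@njit
def log_pdf(x):
    return -0.5 * x * x  # standard normal, up to the normalizing constant

def make_log_pdf_eval(log_pdf):
    # A new jitted evaluator is compiled for each log_pdf passed in
    @njit
    def evaluate(x):
        return log_pdf(x)
    return evaluate

evaluate = make_log_pdf_eval(log_pdf)
evaluate(1.0)  # compiles on first call, then runs in nopython mode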
Related
Is there any place with a brief description of each of the algorithms available for the method parameter of the minimize function in the lmfit package? Both there and in the SciPy documentation there is no explanation of the details of each algorithm. Right now I know I can choose between them, but I don't know which one to choose...
My current problem
I am using lmfit in Python to minimize a function. I want to minimize the function within a finite and predefined range where the function has the following characteristics:
It is almost zero everywhere, which makes it numerically indistinguishable from zero almost everywhere.
It has a very, very sharp peak in some point.
The peak can be anywhere within the region.
This makes many minimization algorithms fail. Right now I am using a combination of the brute-force method (method="brute") to find a point close to the peak, then feeding this value to the Nelder-Mead algorithm (method="nelder") to perform the final minimization. This works about 50% of the time; the other 50% of the time it fails to find the minimum. I wonder if there are better algorithms for cases like this one...
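For reference, a minimal sketch of that two-stage brute-then-Nelder-Mead workflow (the objective, its peak location, and the grid density are invented for illustration; substitute your own):

import numpy as np
import lmfit

# Hypothetical objective with one very narrow minimum near x = 3.137;
# nearly zero everywhere else, as in the problem described above
def objective(params):
    x = params['x'].value
    return -np.exp(-(x - 3.137) ** 2 / 1e-3)

params = lmfit.Parameters()
params.add('x', value=5.0, min=0.0, max=10.0)

# Stage 1: coarse grid search over the bounded range (Ns points per parameter)
coarse = lmfit.minimize(objective, params, method='brute', Ns=1000)

# Stage 2: polish the brute-force candidate with Nelder-Mead
result = lmfit.minimize(objective, coarse.params, method='nelder')
print(result.params['x'].value)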
I think it is a fair point that docs for lmfit (such as https://lmfit.github.io/lmfit-py/fitting.html#fit-methods-table) and scipy.optimize (such as https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#optimization-scipy-optimize) do not give detailed mathematical descriptions of the algorithms.
Then again, most of the docs for scipy, numpy, and related libraries describe how to use the methods, but do not describe in much mathematical detail how the algorithms work.
In fairness, the different optimization algorithms share many features and the differences between them can get pretty technical. All of these methods try to minimize some metric (often called "cost" or "residual") by changing the values of parameters for the supplied function.
It sort of takes a textbook (or at least a Wikipedia page) to establish the concepts and mathematical terms used for these methods, and then a paper (or at least a Wikipedia page) to describe how each method differs from the others. So, I think the basic answer would be to look up the different methods.
I have come upon the following optimization problem:
The target function is a multivariate, non-differentiable function that takes a list of scalars as argument and returns a scalar. It is non-differentiable in the sense that the computation within the function is based on pandas and a series of rolling, std, etc. operations.
The pseudo code is below:
def target_function(x: list) -> float:
    # calculations
    return output
Besides, each component of the x argument has its own bounds defined as a tuple (min, max). So how should I use the scipy.optimize library to find the global minimum of this function? Any other libraries could help?
I have already tried scipy.optimize.brute, which took practically forever, and scipy.optimize.minimize, which never produced a seemingly correct answer.
basinhopping, brute, and differential_evolution are the methods available for global optimization. As you've already discovered, brute-force global optimization is not going to be particularly efficient.
Differential evolution is a stochastic method that should do better than brute-force, but may still require a large number of objective function evaluations. If you want to use it, you should play with the parameters and see what will work best for your problem. This tends to work better than other methods if you know that your objective function is not "smooth": there could be discontinuities in the function or its derivatives.
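As a sketch of how little wiring differential_evolution needs (the stand-in objective and example bounds below are invented; swap in your own):

import numpy as np
from scipy.optimize import differential_evolution

# Hypothetical stand-in for the pandas-based objective; deliberately non-smooth
def target_function(x):
    return np.sum(np.abs(x - 0.3)) + float(x[0] > 0.5)

bounds = [(0.0, 1.0), (-5.0, 5.0)]  # one (min, max) tuple per component of x
result = differential_evolution(target_function, bounds, maxiter=1000, tol=1e-7,
                                seed=1)
print(result.x, result.fun)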
Basin-hopping, on the other hand, makes stochastic jumps but also uses local relaxation after each jump. This is useful if your objective function has many local minima, but due to the local relaxation used, the function should be smooth. If you can't easily get at the gradient of your function, you could still try basin-hopping with one of the local minimizers which doesn't require this information.
The advantage of the scipy.optimize.basinhopping routine is that it is very customizable. You can use take_step to define a custom random jump, accept_test to override the test used for deciding whether to proceed with or discard the results of a random jump and relaxation, and minimizer_kwargs to adjust the local minimization behavior. For example, you might override take_step to stay within your bounds, and then select perhaps the L-BFGS-B minimizer, which can numerically estimate your function's gradient as well as take bounds on the parameters. L-BFGS-B does work better if you give it a gradient, but I've used it without one and it still is able to minimize well. Be sure to read about all of the parameters on the local and global optimization routines and adjust things like tolerances as needed to improve performance.
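A sketch of that combination, bounded random jumps plus the L-BFGS-B local minimizer (the objective and bounds are again stand-ins):

import numpy as np
from scipy.optimize import basinhopping

# Hypothetical stand-in objective and bounds; swap in your own
def target_function(x):
    return np.sum(np.abs(x - 0.3))

bounds = [(0.0, 1.0), (-5.0, 5.0)]
lower = np.array([b[0] for b in bounds])
upper = np.array([b[1] for b in bounds])

class BoundedStep:
    # Random jump that is clipped to stay inside the box constraints
    def __init__(self, stepsize=0.5):
        self.stepsize = stepsize  # basinhopping adapts this attribute automatically

    def __call__(self, x):
        x = x + np.random.uniform(-self.stepsize, self.stepsize, size=x.shape)
        return np.clip(x, lower, upper)

result = basinhopping(
    target_function,
    x0=(lower + upper) / 2,
    niter=100,
    take_step=BoundedStep(),
    minimizer_kwargs={"method": "L-BFGS-B", "bounds": bounds},
)
print(result.x, result.fun)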
After taking a couple advanced statistics courses, I decided to code some functions/classes to just automate estimating parameters for different distributions via MLE. In Matlab, the below is something I easily coded once:
function [ params, max, confidence_interval ] = newroutine( fun, data, guesses )

    lh = @(x,data) -sum(log(fun(x,data))); % Gets log-likelihood from user-defined fun.
    options = optimset('Display', 'off', 'MaxIter', 1000000, 'TolX', 10^-20, 'TolFun', 10^-20);
    [theta, max1] = fminunc(@(x) lh(x,data), guesses, options);

    params = theta;
    max = max1;
end
Where I just have to correctly specify the underlying pdf equation as fun, and with more code I can calculate p-values, confidence-intervals, etc.
With Python, however, all the sources I've found on MLE automation (for ex., here and here) insist that the easiest way to do this is to delve into OOP using a subclass of statsmodels' GenericLikelihoodModel, which seems way too complicated for me. My reasoning is that, since the log-likelihood can be automatically created from the pdf (at least for the vast majority of functions), and scipy.stats.<dist>.fit() already easily returns MLE estimates, it seems ridiculous to have to write ~30 lines of class code each time you have a new distribution to fit.
I realize that doing it the way the two links suggest allows you to automatically tap into statsmodels' functions, but it honestly does not seem simpler than tapping into scipy oneself and writing much simpler functions.
Am I missing an easier way to perform basic MLE, or is there a real good reason for the way statsmodels does this?
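For what it's worth, a direct translation of the Matlab routine above takes only a few lines with scipy.optimize.minimize. A sketch, where the normal-distribution example and the tolerance values are placeholders:

import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

def newroutine(fun, data, guesses):
    # fun(theta, data) returns the pdf of each observation given parameters theta
    nll = lambda theta: -np.sum(np.log(fun(theta, data)))
    res = minimize(nll, guesses, method='Nelder-Mead',
                   options={'xatol': 1e-10, 'fatol': 1e-10, 'maxfev': 100000})
    return res.x, -res.fun

# Example: MLE for a normal distribution; abs() guards against negative
# scale proposals during the simplex search
data = np.random.default_rng(1).normal(loc=2.0, scale=1.5, size=500)
params, loglike = newroutine(lambda t, d: norm.pdf(d, loc=t[0], scale=abs(t[1])),
                             data, guesses=[0.0, 1.0])
print(params, loglike)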
I wrote the first post outlining the various methods, and I think it is fair to say that while I recommend the statsmodels approach, I did so to leverage the postestimation tools it provides and to get standard errors every time a model is estimated.
When using minimize, the Python equivalent of fminunc (as you outline in your example), oftentimes I am forced to use "Nelder-Mead" or some other gradient-free method to get convergence. Since I need standard errors for statistical inference, this entails an additional step using numdifftools to recover the Hessian. So in the end, the method you propose has its complications too (for my work). If all you care about is the maximum likelihood estimate and not inference, then the approach you outline is probably best, and you are correct that you don't need the machinery of statsmodels.
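To illustrate that extra step, a small sketch of recovering standard errors with numdifftools after a gradient-free fit (the toy normal likelihood is just a stand-in):

import numpy as np
import numdifftools as nd
from scipy.optimize import minimize

# Toy negative log-likelihood (normal model) standing in for a real one
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)

def neg_loglike(theta):
    mu, sigma = theta
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + (data - mu)**2 / sigma**2)

res = minimize(neg_loglike, x0=[0.0, 1.0], method='Nelder-Mead')
hess = nd.Hessian(neg_loglike)(res.x)  # numerical Hessian at the MLE
cov = np.linalg.inv(hess)              # inverse Hessian approximates the covariance
std_errors = np.sqrt(np.diag(cov))
print(res.x, std_errors)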
FYI: in a later post, I use your approach combined with autograd for significant speedups of big maximum likelihood models. I haven't successfully gotten this to work with statsmodels.
I am preconditioning a matrix using spilu. However, to pass this preconditioner into cg (the built-in conjugate gradient method), it is necessary to use the LinearOperator function. Can someone explain the matvec parameter to me, and why I need to use it? Below is my current code:
import scipy.sparse.linalg as scla

Ainv = scla.spilu(A, drop_tol=1e-7)
Ainv = scla.LinearOperator(Ainv.shape, matvec=Ainv)
scla.cg(A, b, maxiter=maxIterations, M=Ainv)
However this doesn't work, and I am given the error TypeError: 'SuperLU' object is not callable. I have played around and tried
Ainv = scla.LinearOperator(Ainv.shape, matvec=Ainv.solve)
instead. This seems to work, but I want to know why matvec needs Ainv.solve rather than just Ainv, and whether that is the right thing to feed to LinearOperator.
Thanks for your time
Without having much experience with this part of scipy, some comments:
According to the docs you don't have to use LinearOperator: the preconditioner parameter is documented as M : {sparse matrix, dense matrix, LinearOperator}, so you can use explicit matrices too!
The idea/advantage of the LinearOperator:
Many iterative methods (e.g. cg, gmres) do not need to know the individual entries of a matrix to solve a linear system A*x=b. Such solvers only require the computation of matrix vector products docs
Depending on the task, sometimes even matrix-free approaches are available which can be much more efficient
The working approach you presented is indeed the correct one (some other sources do it similarly, and some course materials do it like that)
The point of using solve() here, rather than the inverse matrix, is to avoid forming the inverse explicitly (which can be very costly)
A similar idea is very common in BFGS-based optimization algorithms, although the Wikipedia article might not give much insight here
scipy has an extra LinearOperator for exactly this purpose of not forming the inverse explicitly! (although I think it's only used for statistics / finishing off some optimization; but I successfully built some L-BFGS-based optimizers with it)
Source: a scicomp.stackexchange discussion of this idea (without touching scipy)
Because of that, I would assume spilu follows the same approach (returning an object with a solve method)
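Putting the pieces together, a minimal runnable sketch of the working pattern (the tridiagonal test system here is invented just so the snippet is self-contained):

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as scla

# Small SPD test system, made up so the example is self-contained
n = 100
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format='csc')
b = np.ones(n)

ilu = scla.spilu(A, drop_tol=1e-7)
# matvec must be a *callable* computing M @ x. The SuperLU object itself is not
# callable (hence the TypeError), but ilu.solve(x) applies the approximate
# inverse to x, which is exactly what the preconditioner should do
M = scla.LinearOperator(A.shape, matvec=ilu.solve)

x, info = scla.cg(A, b, M=M, maxiter=1000)
print(info)  # 0 means successful convergence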
I have a bit of code that fits a theoretical prediction to experimental data, and I want to run an LMA (Levenberg-Marquardt algorithm) to fit the theory to experiment. The calculations are non-trivial: each model takes ~10-30 minutes to calculate on a single processor. However, the problem is embarrassingly parallelisable, and the code is currently set up to submit the different components (of a single iteration) to a cluster computer (this calculation still takes ~1-2 minutes).
Now, this submission script is set up as a callable function within Python, so hooking it up to the scipy LMA (scipy.optimize.leastsq) is relatively trivial. However, the scipy LMA will, I imagine, pass each individual calculation (for gauging the gradient) in serial and wait for the return, whereas I'd prefer the LMA to send out an entire set of calculations at a time and then await the return. The Python submission script looks a bit like:
def submission_script(number_iterations, number_parameters, value_parameters):
    fitness_parameter = [0] * number_iterations
    <fun stuff>
    return fitness_parameter
Where the "value_parameters" is a nested list of dimensions [number_iterations][number_parameters] which contains the variables that are to be calculated for each model, "number_parameters" is the number of parameters that are to be fitted, "number_iterations" is the number of models to be calculated (so each step, to gauge the gradient, the LMA calculates 2*number_parameters models), and "fitness_parameter" is the value that has to be minimised (and has the dimensions [iterations]).
Now, obviously, I could write my own LMA, but that is a little bit of reinventing the wheel - I was wondering if there was anything out there that would satisfy my needs (or if the scipy LMA can be used in this way).
A Gauss-Newton algorithm should also work, as the starting point should be near the minimum. The ability to constrain the fit (i.e. set maximum and minimum values for the fitted parameters) would be nice, but isn't necessary.
The scipy.optimize.leastsq function gives you the opportunity to provide a function (the Dfun argument) for evaluating the Jacobian for a given parameter vector. You could implement a multiprocessing solution to calculate this matrix instead of having scipy.optimize.leastsq approximate it by serially calling your function f.
Unfortunately, the LMA implementation in scipy uses separate functions for f and J. You may want to cache information calculated in f so that it can be reused in J when it is called with the same parameter vector. Alternatively, you can implement your own LMA version that uses a single fJ call.
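For instance, a rough sketch of a parallel finite-difference Jacobian passed via Dfun (the residual function here is a trivial stand-in for the expensive cluster calculation, and multiprocessing stands in for the cluster submission):

import numpy as np
from multiprocessing import Pool
from scipy.optimize import leastsq

def residuals(params):
    # Trivial stand-in for the expensive model calculation
    x = np.linspace(0.0, 1.0, 50)
    return params[0] * np.exp(params[1] * x) - np.exp(0.5 * x)

def parallel_jacobian(params, eps=1e-6):
    # Forward-difference Jacobian: all perturbed models are evaluated in one
    # batch rather than serially, which is the behavior asked for above
    n = len(params)
    perturbed = [np.asarray(params, dtype=float) + eps * np.eye(n)[i]
                 for i in range(n)]
    f0 = residuals(params)
    with Pool() as pool:
        cols = pool.map(residuals, perturbed)
    return np.column_stack([(c - f0) / eps for c in cols])

if __name__ == '__main__':
    popt, ier = leastsq(residuals, x0=[0.5, 0.5], Dfun=parallel_jacobian)
    print(popt)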
Found that this is basically a repeated question: it has been asked and answered at the link below.
Multithreaded calls to the objective function of scipy.optimize.leastsq