Maximum Likelihood Estimation with statsmodels overcomplicates things? Hoping for Recommendations - python

After taking a couple advanced statistics courses, I decided to code some functions/classes to just automate estimating parameters for different distributions via MLE. In Matlab, the below is something I easily coded once:
function [ params, max, confidence_interval ] = newroutine( fun, data, guesses )
lh = #(x,data) -sum(log(fun(x,data))); %Gets log-likelihood from user-defined fun.
options = optimset('Display', 'off', 'MaxIter', 1000000, 'TolX', 10^-20, 'TolFun', 10^-20);
[theta, max1] = fminunc(#(x) lh(x,data), guesses,options);
params = theta
max = max1
end
Where I just have to correctly specify the underlying pdf equation as fun, and with more code I can calculate p-values, confidence-intervals, etc.
With Python, however, all the sources I've found on MLE automation (for ex., here and here) insist that the easiest way to do this is to delve into OOP using a subclass of statsmodel's, GenericLikelihoodModel, which seems way too complicated for me. My reasoning is that, since the log-likelihood can be automatically created from the pdf (at least for the vast majority of functions), and scipy.stats."random_dist".fit() already easily returns MLE estimates, it seems ridiculous to have to write ~30 lines of class code each time you have a new dist. to fit.
I realize that doing it the way the two links suggests allows you to automatically tap into statsmodel's functions, but it honestly does not seem simpler than tapping into scipy oneself and writing much simpler functions.
Am I missing an easier way to perform basic MLE, or is there a real good reason for the way statsmodels does this?

I wrote the first post outlining the various methods, and I think it is fair to say that while I recommend the statsmodels approach, I did so to leverage the postestimation tools it provides and to get standard errors every time a model is estimated.
When using minimize, the python equivalent of fminunc (as you outline in your example), oftentimes I am forced to use "Nelder-Meade" or some other gradiant-free method to get convergence . Since I need standard errors for statistical inference, this entails an additional step using numdifftools to recover the hessian. So in the end, the method you propose has its complications too (for my work). If all you care about is the maximum likelihood estimate and not inference, then the approach you outline is probably best and you are correct that you don't need the machinery of statsmodel.
FYI: in a later post, I use your approach combined with autograd for significant speedups of big maximum likelihood models. I haven't successfully gotten this to work with statsmodels.

Related

lmfit/scipy.optimize minimization methods description?

Is there any place with a brief description of each of the algorithms for the parameter method in the minimize function of the lmfit package? Both there and in the documentation of SciPy there is no explanation about the details of each algorithm. Right now I know I can choose between them but I don't know which one to choose...
My current problem
I am using lmfit in Python to minimize a function. I want to minimize the function within a finite and predefined range where the function has the following characteristics:
It is almost zero everywhere, which makes it to be numerically identical to zero almost everywhere.
It has a very, very sharp peak in some point.
The peak can be anywhere within the region.
This makes many minimization algorithms to not work. Right now I am using a combination of the brute force method (method="brute") to find a point close to the peak and then feed this value to the Nelder-Mead algorithm (method="nelder") to finally perform the minimization. It is working approximately 50 % of the times, and the other 50 % of the times it fails to find the minimum. I wonder if there are better algorithms for cases like this one...
I think it is a fair point that docs for lmfit (such as https://lmfit.github.io/lmfit-py/fitting.html#fit-methods-table) and scipy.optimize (such as https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#optimization-scipy-optimize) do not give detailed mathematical descriptions of the algorithms.
Then again, most of the docs for scipy, numpy, and related libraries describe how to use the methods, but do not describe in much mathematical detail how the algorithms work.
In fairness, the different optimization algorithms share many features and the differences between them can get pretty technical. All of these methods try to minimize some metric (often called "cost" or "residual") by changing the values of parameters for the supplied function.
It sort of takes a text book (or at least a Wikipedia page) to establish the concepts and mathematical terms used for these methods, and then a paper (or at least a Wikipedia page) to describe how each method differs from the others. So, I think the basic answer would be to look up the different methods.

Custom roll fix effects in regressions

In a regressions, I know instead of using fixed effects (say on country or year), you can specify these effects with dummy variables and this is a perfectly legitimate method. However, when the number of effects (e.g. countries) is large, then it becomes very computational difficult. Instead you would run the fixed effects model. I am curious what R's plm or Stata's xtreg,fe does behind the scenes. Specifically, I wanted to try custom rolling my own panel regression...looking for some likelihood function (or way to condition the likelihood) I can throw into optim and have some fun. Ideas?
how plm and xtreg,fedo behind the screen are well documented in their instructions, I didn't read them through but I think that when you run pgmm or some thing like logit regression in plm,the algorithm actually is maximum likelihood given the nature of these problems.
and it would be interesting to write all these by yourself.

Numba jitclass with function input

I'm working on developing an adaptive rejection sampler in Numba. I'd like to use a class to implement it since I think it'll make the code a lot cleaner, and I see that classes are supported in Numba. My class would be a lot more general/useful if it could take functions as input, i.e. the log pdf of the distribution I want to sample from. Is there any way to do that? The alternative I guess is to define the log pdf equations in the class definition itself.
Why do I want to do this? The sampler will be used as part of a Gibbs sampling scheme, so speed ups in each sampling step are crucial. I have to simulate from a distribution I know only up to a normalizing constant, and adaptive rejection sampling is a general technique that will help me sample without needing to know this normalizing constant. There is a python implementation of an adaptive rejection sampler floating around stack overflow, but it's too slow for my purposes. It also randomly breaks for some reason on some simulated data that it should work on. I've had luck with numba on other parts of my project, including a greater than 100x speed up on one part of the Gibbs sampler.
Numba functions cannot take functions as input arguments. The official docs recommend possibly using closures in a function factory as a workaround in some cases:
http://numba.pydata.org/numba-doc/latest/user/faq.html#can-i-pass-a-function-as-an-argument-to-a-jitted-function
Copying the code example from the above link in case the url ever becomes invalid:
def make_f(g):
# Note: a new f() is compiled each time make_f() is called!
#jit(nopython=True)
def f(x):
return g(x) + g(-x)
return f
f = make_f(my_g_function)
result = f(1)
Not sure if this would work in your particular case though. I think defining the functions you want as class methods would be a better strategy, although without a code example, I'm just guessing.

Method of moments in scipy?

Following from this question, is there a way to use any method other than MLE (maximum-likelihood estimation) for fitting a continuous distribution in scipy? I think that my data may be resulting in the MLE method diverging, so I want to try using the method of moments instead, but I can't find out how to do it in scipy. Specifically, I'm expecting to find something like
scipy.stats.genextreme.fit(data, method=method_of_moments)
Does anyone know if this is possible, and if so how to do it?
Few things to mention:
1) scipy does not have support for GMM. There is some support for GMM via statsmodels (http://statsmodels.sourceforge.net/stable/gmm.html), you can also access many R routines via Rpy2 (and R is bound to have every flavour of GMM ever invented): http://rpy.sourceforge.net/rpy2/doc-2.1/html/index.html
2) Regarding stability of convergence, if this is the issue, then probably your problem is not with the objective being maximised (eg. likelihood, as opposed to a generalised moment) but with the optimiser. Gradient optimisers can be really fussy (or rather, the problems we give them are not really suited for gradient optimisation, leading to poor convergence).
If statsmodels and Rpy do not give you the routine you need, it is perhaps a good idea to write out your moment computation out verbose, and see how you can maximise it yourself - perhaps a custom-made little tool would work well for you?

Does Matlab's fminimax apply Pareto optimality?

I am working on multi-objective optimization in Matlab, and am using the fiminimax in the Optimization toolbox. I want to know if fminimax applies Pareto optimization, and if not, why? Also, can you suggest a multi-objective optimization package in Matlab or Python that does use Pareto?
For python, DEAP may be the one you're looking for. Extensive documentation with a lot of real life examples, and a really helpful Google Groups forum. It implements two robust MO algorithms: NSGA-II and SPEA-II.
Edit (as requested)
I am using DEAP for my MSc thesis, so I will let you know how we are using Pareto optimality. Setting DEAP up is pretty straight-forward, as you will see in the examples. Use this one as a starting point. This is the short version, which uses the built-in algorithms and operators. Read both and then follow these guidelines.
As the OneMax example is single-objective, it doesn't use MO algorithms. However, it's easy to implement them:
Change your evaluation function so it returns a n-tuple with the desired scores. If you want to minimize standard deviation too, something like return sum(individual), numpy.std(individual) would work.
Also, modify the weights parameter of the base.Fitness object so it matches that returned n-tuple. A positive float means maximization, while a negative one means minimization. You can use any real number, but I would stick with 1.0 and -1.0 for the sake of simplicity.
Change your genetic operators to cxSimulatedBinaryBounded(), mutPolynomialBounded() and selNSGA2(), for crossover, mutation and selection operations, respectively. These are the suggested methods, as they were developed by the NSGA-II authors.
If you want to use one of the embedded ready-to-go algorithms in DEAP, choose MuPlusLambda().
When calling the algorithm, remember to change the halloffame parameter from HallOfFame() to ParetoFront(). This will return all non-dominated individuals, instead of the best lexicographically sorted "best individuals in all generations". Then you can resolve your Pareto Front as desired: weighted sum, custom lexicographic sorting, etc.
I hope that helps. Take into account that there's also a full, somehow more advanced, NSGA2 example available here.
For fminimax and fgoalattain it looks like the answer is no. However, the genetic algorithm solver, gamultiobj, is Pareto set-based, though I'm not sure if it's the kind of multi-objective optimization function you want to use. gamultiobj implements the NGSA-II evolutionary algorithm. There's also this package that implements the Strengthen Pareto Evolutionary Algorithm 2 (SPEA-II) in C with a Matlab mex interface. It's a bit old so you might want to recompile it (you'll need to anyways if you're not on Windows 32-bit).

Categories