In sklearn.linear_model.Ridge, what exactly is the solver parameter doing? - python

In the sklearn.linear_model.Ridge method, there is a parameter, solver : {‘auto’, ‘svd’, ‘cholesky’, ‘lsqr’, ‘sparse_cg’, ‘sag’, ‘saga’}.
According to the documentation, we should choose a different solver depending on the type of the data (dense or sparse), or just use auto.
So, in my opinion, we just choose a specific solver to make the calculation fast for the corresponding data.
Are my thoughts right or wrong?
If you don't mind, could anyone give me some advice? I searched but couldn't find anything confirming or refuting my thoughts.
Sincere thanks.

You are almost right.
Some solvers work only with a specific type of data (dense vs. sparse) or a specific type of problem (e.g. non-negative weights).
However, in many cases you can use several solvers (e.g. for sparse problems you have at least sag, sparse_cg, and lsqr). These solvers have different characteristics: some of them work better in some cases and others work better in other cases. In some cases a solver may not even converge.
In many cases, the simple engineering answer is to use whichever solver works best on your data: just test all of them and measure the time to an answer (see the sketch below).
If you want a more precise answer, you should dig into the documentation of the referenced methods (e.g. scipy.sparse.linalg.lsqr).
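A minimal sketch of that timing comparison, using synthetic dense data (the dataset size, alpha, and solver list are purely illustrative; the "best" solver depends on your own X and y):

import time
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic dense regression problem; replace with your own data.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 200))
y = X @ rng.normal(size=200) + rng.normal(scale=0.1, size=5000)

for solver in ["auto", "svd", "cholesky", "lsqr", "sparse_cg", "sag", "saga"]:
    model = Ridge(alpha=1.0, solver=solver)
    start = time.perf_counter()
    model.fit(X, y)
    print(f"{solver:10s} {time.perf_counter() - start:.4f} s")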

Related

lmfit/scipy.optimize minimization methods description?

Is there any place with a brief description of each of the algorithms available for the method parameter of the minimize function in the lmfit package? Neither there nor in the SciPy documentation is there an explanation of the details of each algorithm. Right now I know I can choose between them, but I don't know which one to choose...
My current problem
I am using lmfit in Python to minimize a function. I want to minimize the function within a finite and predefined range where the function has the following characteristics:
It is almost zero everywhere, which makes it numerically identical to zero almost everywhere.
It has a very, very sharp peak at some point.
The peak can be anywhere within the region.
This makes many minimization algorithms fail. Right now I am using a combination of the brute-force method (method="brute") to find a point close to the peak and then feeding this value to the Nelder-Mead algorithm (method="nelder") to perform the final minimization, roughly as in the sketch below. It works approximately 50 % of the time, and the other 50 % of the time it fails to find the minimum. I wonder if there are better algorithms for cases like this one...
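For reference, a minimal sketch of that brute-then-Nelder-Mead chaining, assuming a one-parameter objective with a very narrow peak (the function, bounds, and step size are illustrative, not the actual problem):

import numpy as np
import lmfit

# Toy objective: essentially zero everywhere except a very sharp dip near x = 0.3.
def objective(params):
    x = params['x'].value
    return -np.exp(-((x - 0.3) / 1e-3) ** 2)

params = lmfit.Parameters()
params.add('x', value=0.0, min=-1.0, max=1.0, brute_step=1e-3)

# Stage 1: coarse global scan over the bounded range.
coarse = lmfit.minimize(objective, params, method='brute')

# Stage 2: polish the best brute-force candidate with Nelder-Mead.
result = lmfit.minimize(objective, coarse.params, method='nelder')
print(result.params['x'].value)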
I think it is a fair point that docs for lmfit (such as https://lmfit.github.io/lmfit-py/fitting.html#fit-methods-table) and scipy.optimize (such as https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#optimization-scipy-optimize) do not give detailed mathematical descriptions of the algorithms.
Then again, most of the docs for scipy, numpy, and related libraries describe how to use the methods, but do not describe in much mathematical detail how the algorithms work.
In fairness, the different optimization algorithms share many features and the differences between them can get pretty technical. All of these methods try to minimize some metric (often called "cost" or "residual") by changing the values of parameters for the supplied function.
It sort of takes a textbook (or at least a Wikipedia page) to establish the concepts and mathematical terms used for these methods, and then a paper (or at least a Wikipedia page) to describe how each method differs from the others. So, I think the basic answer would be to look up the different methods.

Scipy Linear algebra LinearOperator function utilised in Conjugate Gradient

I am preconditioning a matrix using spilu; however, to pass this preconditioner into cg (the built-in conjugate gradient method) it is necessary to use the LinearOperator function. Can someone explain the matvec parameter to me, and why I need to use it? Below is my current code:
Ainv = scla.spilu(A, drop_tol=1e-7)
Ainv = scla.LinearOperator(Ainv.shape, matvec=Ainv)
scla.cg(A, b, maxiter=maxIterations, M=Ainv)
However, this doesn't work and I am given the error TypeError: 'SuperLU' object is not callable. I have played around and tried
Ainv = scla.LinearOperator(Ainv.shape, matvec=Ainv.solve)
instead. This seems to work, but I want to know why matvec needs Ainv.solve rather than just Ainv, and whether it is the right thing to feed to LinearOperator.
Thanks for your time
Without having much experience with this part of scipy, some comments:
According to the docs you don't have to use LinearOperator; the parameter is documented as M : {sparse matrix, dense matrix, LinearOperator}, so you can use explicit matrices too!
The idea/advantage of the LinearOperator:
Many iterative methods (e.g. cg, gmres) do not need to know the individual entries of a matrix to solve a linear system A*x=b. Such solvers only require the computation of matrix-vector products (see the docs).
Depending on the task, sometimes even matrix-free approaches are available, which can be much more efficient.
The working approach you presented is indeed the correct one (other sources do it similarly, and some course materials do it like that).
The point of using solve() here rather than the inverse matrix is to avoid forming the inverse explicitly (which might be very costly).
A similar idea is very common in BFGS-based optimization algorithms, although the wiki might not give much insight here.
scipy has an extra LinearOperator for exactly this purpose of not forming the inverse explicitly (although I think it's only used for statistics / completing some optimization; I successfully built some L-BFGS-based optimizers with it).
Source: a scicomp.stackexchange discussion of this that does not touch scipy.
And because of that, I would assume spilu goes for exactly this too (returning an object with a solve method).
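Putting the pieces together, a minimal self-contained sketch of the working pattern (the matrix A and right-hand side b are a small synthetic example, not the asker's actual system):

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as scla

# Small synthetic SPD system (1D Laplacian) just to demonstrate the pattern.
n = 1000
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = scla.spilu(A, drop_tol=1e-7)                   # SuperLU object: has .solve(), but is not callable
M = scla.LinearOperator(A.shape, matvec=ilu.solve)   # matvec must be a callable v -> (approx. A^-1) @ v

x, info = scla.cg(A, b, M=M, maxiter=200)
print("converged" if info == 0 else f"cg info = {info}")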

How to implement multiple testing for scipy.stats tests

I have a dataframe of values from various samples from two groups. I performed a scipy.stats.ttest on these, which works perfectly, but I am a bit concerned that so much testing may introduce multiple-testing error.
I wonder how to implement MTC (multiple testing correction) here. Is there some function in scipy or statsmodels which would perform the tests directly and apply MTC to the resulting series of p-values, or can I apply an MTC function to a list of p-values without problems?
I know that statsmodels may include such functions, but what it has in power it unfortunately lacks in manageability and documentation (indeed, that's not the developers' fault; there are only three of them for such a huge project). Anyway, I am a little stuck here, so I'll gladly take any suggestions. I didn't ask this on CrossValidated because it is more about the implementation than the statistics.
Edit 9th Oct 2019:
this link works as of today
https://www.statsmodels.org/stable/generated/statsmodels.stats.multitest.multipletests.html
original answer (returns 404 now)
statsmodels.sandbox.stats.multicomp.multipletests takes an array of p-values and returns the adjusted p-values. The documentation is pretty clear.
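A minimal sketch of the second option (running the tests yourself, then correcting the list of p-values), using the current statsmodels.stats.multitest.multipletests on synthetic data; method="fdr_bh" (Benjamini-Hochberg) is just one of the available corrections:

import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests

# Synthetic example: one t-test per feature (row), then correct all p-values at once.
rng = np.random.default_rng(0)
group_a = rng.normal(size=(100, 20))   # 100 features x 20 samples
group_b = rng.normal(size=(100, 20))

pvals = stats.ttest_ind(group_a, group_b, axis=1).pvalue

reject, pvals_corrected, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(reject.sum(), "features significant after correction")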

deterministic annealing method

I have run into a shape-matching problem, and one term I read about is deterministic annealing. I want to use this method to convert discrete problems, e.g. the travelling salesman problem, into continuous problems, which could help avoid getting stuck in local minima. I don't know whether there is already an implementation of this statistical method, and implementing it myself seems a bit challenging because I couldn't completely understand what the method does and couldn't find enough documentation. Can somebody explain it further, or point me to a library, especially in Python, where it is already implemented?
You can find an explanation under Simulated annealing. Also, take a look at scipy.optimize.anneal.
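Note that scipy.optimize.anneal has since been removed from SciPy; scipy.optimize.dual_annealing is the closest currently available routine. A minimal sketch on a toy multimodal function (the objective and bounds are illustrative):

import numpy as np
from scipy.optimize import dual_annealing

# Toy multimodal objective with several local minima.
def objective(x):
    return np.sum(x ** 2) + 10 * np.sin(5 * x[0])

bounds = [(-5.0, 5.0), (-5.0, 5.0)]
result = dual_annealing(objective, bounds)
print(result.x, result.fun)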

Uses for secondary returns of scipy.optimize.leastsq?

I have been using scipy.optimize.leastsq quite a bit lately, but whenever I call it I only use the return "x" (the solution) from this long list of return values. I can't see myself needing any of the other values it returns. I'm curious, has anyone used them? Did it work well for what you used it for?
They are really useful if you want to look into how well the fit worked. For instance, cov_x is the covariance matrix. Its diagonal entries are the estimation errors squared, so if you have parameters x[i] then sqrt(cov_x[i,i]) will be the estimated uncertainties of these parameters. Its off-diagonal entries, on the other hand, tell you something about the correlations between fit parameters. The Wikipedia article about the covariance matrix is very informative on the mathematical details.
The other values are intended more for debugging as far as I can see; one could probably design the API somewhat differently to handle this kind of thing via exceptions instead, but the information there can still be very useful if required.
Many of these return values reflect (in variable names and values) the outputs of the underlying Fortran code from MINPACK (lmdif and/or lmder). Why both 'ier' and 'mesg' are returned, while other things are stuffed in infodict, and why the spelling follows Fortran77 conventions is beyond me.
It's unfortunate the return is not more Pythonic (say, returning an OptimizeResult instance, as the newer minimize() does, perhaps adding a 'covariance' member and maybe more from infodict). I think that would require a wrapper level around leastsq().
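As an illustration of the points above, a minimal sketch that requests the extra return values with full_output=1 and turns cov_x into parameter uncertainties (the model and data are synthetic; note that cov_x must be scaled by the residual variance to get the parameter covariance):

import numpy as np
from scipy.optimize import leastsq

# Linear model y = a*x + b fitted to noisy synthetic data.
def residuals(p, x, y):
    a, b = p
    return y - (a * x + b)

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.3, size=x.size)

popt, cov_x, infodict, mesg, ier = leastsq(residuals, x0=[1.0, 0.0],
                                           args=(x, y), full_output=1)

dof = x.size - len(popt)
s_sq = np.sum(infodict['fvec'] ** 2) / dof       # residual variance
perr = np.sqrt(np.diag(cov_x * s_sq))            # 1-sigma parameter uncertainties
print(popt, perr, "ier =", ier, mesg)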
