The difference between C++ (LAPACK, sgels) and Python (Numpy, lstsq) results - python

I am comparing the numerical results of C++ and Python computations. In C++, I make use of LAPACK's sgels function to compute the coefficients of a linear regression problem. In Python, I use Numpy's linalg.lstsq function for a similar task.
What is the mathematical difference between the methods used by sgels and linalg.lstsq?
What is the expected error (e.g. 6 significant digits) when comparing the results (i.e. the regression coefficients) numerically?
FYI: I am by no means a C++ or Python expert, which makes it difficult to understand what is going on inside the functions.

Taking a look at the source of numpy, in the file linalg.py, lstsq relies on LAPACK's zgelsd() for complex and dgelsd() for real. Here are the differences to sgels():
dgelsd() is for double while sgels() is for float. There is a difference of precision...
dgels() makes use the QR factorization of the matrix A and assumes that A has full rank. The condition number of the matrix must be reasonable to get a significant result. See this course for getting the logic of the method. On the other hand, dgelsd() makes use of the Singular value decomposition of A. In particular, A can be rank defiencient and small singular values are discarted depending of the additional argument rcond or machine precision. Notice that numpy's default value for rcond is -1: negative values refers to machine precision. See this course for the logic.
According to the benchmark of LAPACK, on can expect dgels() to be about 5 time faster than dgelsd().
You may see significant differences between the result of sgels() and dgelsd() if the matrix is ill conditionned. Indeed, there is a bound on the error of the linear regression which depends on the algorithm and the value of rcond() that is used. See the user guide of LAPACK on, Error Bounds for Linear Least Squares Problems for estimates of the errors and Further Details: Error Bounds for Linear Least Squares Problems for technical details.
As a conclusion, sgels() and dgels() can be used if the measures in b are accurate and easily related to the explanatory variables. For instance, if sensors are placed at the exits of exhaust pipes, it's easy to guess which motors are running. But sometimes, the linear link between the source and the measures is not precisely known (uncertainty on the terms of A) or discriminating polluters on the base of measurements becomes more difficult (Some polluters are far from the set of sensors and A is ill-conditionned). In this kind of situation, dgelsd() and tunning the rcond argument can help. Whenever in doubt, use dgelsd() and estimate the error on the estimated x according to LAPACK's user guide.

Related

Solving a large (150 variable) system of linear, ordinary differential equations; running into floating point rounding and/or stiffness problems

EDIT: Original post too vague. I am looking for an algorithm to solve a large-system, solvable, linear IVP that can handle very small floating point values. Solving for the eigenvectors and eigenvalues is impossible with numpy.linalg.eig() as the returned values are complex and should not be, it does not support numpy.float128 either, and the matrix is not symmetric so numpy.linalg.eigh() won't work. Sympy could do it given an infinite amount of time, but after running it for 5 hours I gave up. scipy.integrate.solve_ivp() works with implicit methods (have tried Radau and BDF), but the output is wildly wrong. Are there any libraries, methods, algorithms, or solutions for working with this many, very small numbers?
Feel free to ignore the rest of this.
I have a 150x150 sparse (~500 nonzero entries of 22500) matrix representing a system of first order, linear differential equations. I'm attempting to find the eigenvalues and eigenvectors of this matrix to construct a function that serves as the analytical solution to the system so that I can just give it a time and it will give me values for each variable. I've used this method in the past for similar 40x40 matrices, and it's much (tens, in some cases hundreds of times) faster than scipy.integrate.solve_ivp() and also makes post model analysis much easier as I can find maximum values and maximum rates of change using scipy.optimize.fmin() or evaluate my function at inf to see where things settle if left long enough.
This time around, however, numpy.linalg.eig() doesn't seem to like my matrix and is giving me complex values, which I know are wrong because I'm modeling a physical system that can't have complex rates of growth or decay (or sinusoidal solutions), much less complex values for its variables. I believe this to be a stiffness or floating point rounding problem where the underlying LAPACK algorithm is unable to handle either the very small values (smallest is ~3e-14, and most nonzero values are of similar scale) or disparity between some values (largest is ~4000, but values greater than 1 only show up a handful of times).
I have seen suggestions for similar users' problems to use sympy to solve for the eigenvalues, but when it hadn't solved my matrix after 5 hours I figured it wasn't a viable solution for my large system. I've also seen suggestions to use numpy.real_if_close() to remove the imaginary portions of the complex values, but I'm not sure this is a good solution either; several eigenvalues from numpy.linalg.eig() are 0, which is a sign of error to me, but additionally almost all the real portions are of the same scale as the imaginary portions (exceedingly small), which makes me question their validity as well. My matrix is real, but unfortunately not symmetric, so numpy.linalg.eigh() is not viable either.
I'm at a point where I may just run scipy.integrate.solve_ivp() for an arbitrarily long time (a few thousand hours) which will probably take a long time to compute, and then use scipy.optimize.curve_fit() to approximate the analytical solutions I want, since I have a good idea of their forms. This isn't ideal as it makes my program much slower, and I'm also not even sure it will work with the stiffness and rounding problems I've encountered with numpy.linalg.eig(); I suspect Radau or BDF would be able to navigate the stiffness, but not the rounding.
Anybody have any ideas? Any other algorithms for finding eigenvalues that could handle this? Can numpy.linalg.eig() work with numpy.float128 instead of numpy.float64 or would even that extra precision not help?
I'm happy to provide additional details upon request. I'm open to changing languages if needed.
As mentioned in the comment chain above the best solution for this is to use a Matrix Exponential, which is a lot simpler (and apparently less error prone) than diagonalizing your system with eigenvectors and eigenvalues.
For my case I used scipy.sparse.linalg.expm() since my system is sparse. It's fast, accurate, and simple. My only complaint is the loss of evaluation at infinity, but it's easy enough to work around.

lmfit/scipy.optimize minimization methods description?

Is there any place with a brief description of each of the algorithms for the parameter method in the minimize function of the lmfit package? Both there and in the documentation of SciPy there is no explanation about the details of each algorithm. Right now I know I can choose between them but I don't know which one to choose...
My current problem
I am using lmfit in Python to minimize a function. I want to minimize the function within a finite and predefined range where the function has the following characteristics:
It is almost zero everywhere, which makes it to be numerically identical to zero almost everywhere.
It has a very, very sharp peak in some point.
The peak can be anywhere within the region.
This makes many minimization algorithms to not work. Right now I am using a combination of the brute force method (method="brute") to find a point close to the peak and then feed this value to the Nelder-Mead algorithm (method="nelder") to finally perform the minimization. It is working approximately 50 % of the times, and the other 50 % of the times it fails to find the minimum. I wonder if there are better algorithms for cases like this one...
I think it is a fair point that docs for lmfit (such as https://lmfit.github.io/lmfit-py/fitting.html#fit-methods-table) and scipy.optimize (such as https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html#optimization-scipy-optimize) do not give detailed mathematical descriptions of the algorithms.
Then again, most of the docs for scipy, numpy, and related libraries describe how to use the methods, but do not describe in much mathematical detail how the algorithms work.
In fairness, the different optimization algorithms share many features and the differences between them can get pretty technical. All of these methods try to minimize some metric (often called "cost" or "residual") by changing the values of parameters for the supplied function.
It sort of takes a text book (or at least a Wikipedia page) to establish the concepts and mathematical terms used for these methods, and then a paper (or at least a Wikipedia page) to describe how each method differs from the others. So, I think the basic answer would be to look up the different methods.

Characteristic polynomial of binary matrix over the field F2 or GF(2)

How can the characteristic polynomial of a binary matrix (one with only zeros and ones) be found programmatically, where the process operates in the finite field F2 (also known as GF(2)) and the coefficients are zeros and ones?
Here's what I have tried:
SymPy's charpoly() method doesn't give the answer I want, since it doesn't operate on the field F2 and gives a polynomial with coefficients well beyond 0 and 1. However, is it possible to adapt the output of charpoly() to return the characteristic polynomial over F2, or to have the charpoly() method operate on that field?
This repository is about the most convenient thing I could find that could solve this question. As of this writing I am trying it out now. However, it is very slow (is on track to take many hours) for the sizes of matrices I am interested in (128x128 to 256x256). Moreover, I had to modify the source code to fit my needs since the code, as is, doesn't take arbitrary matrices.
I am asking this question because finding the characteristic polynomial in F2 is part of the process of calculating the appropriate jump parameter for certain random number generators (see my note on this).
As it turns out, the coefficients of the characteristic polynomial returned by charpoly() can be adapted for the GF(2) finite field, and it's easy to do: odd coefficients become ones, and even coefficients become zeros. And this is enough for my purposes. Therefore, my issue is solved.

How do you show cost function per iteration in scikit-learn?

I've been running some linear/logistic regression models recently, and I wanted to know how you can output the cost function for each iteration. One of the parameters in sci-kit LinearRegression is 'maxiter', but in reality you need to see cost vs iteration to find out what this value really needs to be i.e. is the benefit worth the computational time to run more iterations etc
I'm sure I'm missing something but I would have thought there was a method that outputted this information?
Thanks in advance!
One has to understand if there is any iteration (implying computing a cost function) or an analytical exact solution, when fitting any estimator.
Linear Regression
In fact, Linear Regression - ie Minimization of the Ordinary Least Square - is not an algorithm but a minimization problem that can be solved using different techniques. And those techniques
Not getting into the details of the statistical part described here :
There are at least three methods used in practice for computing least-squares solutions: the normal equations, QR decomposition, and singular value decomposition.
As far as I got into the details of the codes, it seems that the computational time is involved by getting the analytical exact solution, not iterating over the cost function. But I bet they depend on your system being under-, well- or over-determined, as well as the language and library you are using.
Logistic Regression
As Linear Regression, Logistic Regression is a minimization problem that can be solved using different techniques that, for scikit-learn, are : newton-cg, lbfgs, liblinear and sag.
As you mentionned, sklearn.linear_model.LogisticRegression includes the max_iter argument, meaning it includes iterations*. Those are controled either because the updated argument doesn't change anymore - up to a certain epsilon value - or because it reached the maximum number of iterations.
*As mentionned in the doc, it includes iterations only for some of the solvers
Useful only for the newton-cg, sag and lbfgs solvers. Maximum number of iterations taken for the solvers to converge.
In fact, each solver involves its own implementation, such as here for the liblinear solver.
I would recommand to use the verbose argument, maybe equal to 2 or 3 to get the maximum value. Depending on the solver, it might print the cost function error. However, I don't understand how you are planning to use this information.
Another solution might be to code your own solver and print the cost function at each iteration.
Curiosity kills cat but I checked the source code of scikit which involves many more.
First, sklearn.linear_model.LinearRegression use a fit to train its parameters.
Then, in the source code of fit, they use the Ordinary Least Square of Numpy (source).
Finally, Numpy's Least Square function uses the functionscipy.linalg.lapack.dgelsd, a wrapper to the LAPACK (Linear Algebra PACKage) function DGELSD written in Fortran (source).
That is to say that getting into the error calculation, if any, is not easy for scikit-learn developers. However, for the various using of LinearRegression and many more I had, the trade-off between cost-function and iteration time is well-adressed.

Comparing fsolve results in python and matlab

I have a follow up question to the post written a couple days ago, thank you for the previous feedback:
Finding complex roots from set of non-linear equations in python
I have gotten the set non-linear equations set up in python now so that fsolve will handle the real and imaginary parts independently. However, there are still problems with the python "fsolve" converging to the correct solution. I have exactly the same inputs that are used in Matlab, and after double checking, the set of equations are exactly the same as well. Matlab, no matter how I set the initial values, will always converge to the correct solution. With python however, every initial condition produces a different result, and never the correct one. After a fraction of a second, the following warning appears with python:
/opt/local/Library/Frameworks/Python.framework/Versions/Current/lib/python2.7/site-packages/scipy/optimize/minpack.py:227:
RuntimeWarning: The iteration is not making good progress, as measured by the
improvement from the last ten iterations.
warnings.warn(msg, RuntimeWarning)
I was wondering if there are some known differences between the fsolve in python and Matlab, and if there are some known methods to optimize the performance in python.
Thank you very much
I don't think that you should rely on the fact that the names are the same. I see from your other question that you are specifying that Matlab's fsolve use the 'levenberg-marquardt' algorithm rather than the default. Python's scipy.optimize.fsolve uses MINPACK's hybrd algorithms. Levenberg-Marquardt finds roots approximately by minimizing the sum of squares of the function and is quite robust. It is not a true root-finding method like the default 'trust-region-dogleg' algorithm. I don't know how the hybrd schemes work, but they claim to be a modification of Powell's method.
If you want something similar to what you're doing in Matlab, I'd look for an optimization scheme that implements Levenberg-Marquardt, such as scipy.optimize.root, which you were also using in your previous question. Is there a reason why you're not using that?

Categories