Precision of numpy's eigvalsh - python

First I find the eigenvalues of a (4000x4000) matrix by using numpy.linalg.eigvalsh. Then, I change the boundary conditions, expecting only a minor change in the eigenvalues.
Subtracting the eigenvalues is vulnerable to floating point errors, so I've used some relative tolerance.
Now say I have an eigenvalue A = 1.0001e-10, and another B = 1.0050e-10. According to my humble knowledge of floating-point arithmetic, A - B != 0. The problem is that these numbers come from linear algebra calculations involving many orders of magnitude. Other eigenvalues might, for example, be of order 1.
The question is, what is the precision of eigenvalues calculated using numpy.linalg.eigvalsh? Is this precision relative to the value (A * eps), or is it relative to the largest eigenvalue? Or perhaps relative to elements of the original matrix?
For example, this matrix:
1 1e-20
1e-20 3
gives the same eigenvalues as this:
1 1e-5
1e-5 3
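For example (a quick check of my own with plain numpy, not part of the original question):
import numpy as np

A = np.array([[1.0, 1e-20],
              [1e-20, 3.0]])
B = np.array([[1.0, 1e-5],
              [1e-5, 3.0]])

# The exact eigenvalues of [[1, e], [e, 3]] are 2 -/+ sqrt(1 + e**2), so the
# 1e-5 off-diagonal shifts them by only ~(1e-5)**2 / 2 = 5e-11; at default
# print precision both results therefore look like [1. 3.]
print(np.linalg.eigvalsh(A))
print(np.linalg.eigvalsh(B))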

I'm not sure if LAPACK is used underneath eigvalsh, but this might be of interest:
LAPACK error bounds for the symmetric/unsymmetric eigenproblem:
http://www.netlib.org/lapack/lug/node89.html
http://www.netlib.org/lapack/lug/node91.html

First, the solver is not exact. Second, your example matrices are poorly conditioned: the diagonal elements are orders of magnitude larger than the off-diagonal ones. This is always going to cause numerical issues.
From simple algebra, the determinant of the second matrix is (1 * 3) - (1e-5 * 1e-5) = 3 - 1e-10. You can already see that what matters is the square of your smallest element - twice its order of magnitude. (The same applies to the eigenvalues.) Even though linalg uses double precision, because the solver is approximate you get the same answer. If you change the small values to 1e-3 you start seeing a difference, because now the precision is on the order of the numerical approximation.
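A quick check of that last claim (again a sketch of mine):
import numpy as np

# with a 1e-3 off-diagonal the shift (~5e-7) is large enough to be visible
C = np.array([[1.0, 1e-3],
              [1e-3, 3.0]])
print(np.linalg.eigvalsh(C))   # roughly [0.9999995, 3.0000005]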
This specific problem has been asked before. You can see in this answer how to use sympy to solve the eigenvalues with arbitrary precision.

Related

Rounding errors: deal with operation on vectors with very small components

Imagine you have some vectors (could be torch tensors or numpy arrays) with a huge number of components, each one very small (~1e-10).
Let's say that we want to calculate the norm of one of these vectors (or the dot product between two of them). Even with a float64 data type, each component is ~1e-10, while the product of two components (during the norm/dot product computation) can easily reach ~1e-20, causing a lot of rounding errors that, summed together, return a wrong result.
Is there a way to deal with this situation? (For example, is there a way to define arbitrary-precision arrays for these operations, or some built-in operator that takes care of that automatically?)
You are dealing with two different issues here:
Underflow / Overflow
Calculating the norm of very small values may underflow to zero when you calculate the square. Large values may overflow to infinity. This can be solved by using a stable norm algorithm.
A simple way to deal with this is to scale the values temporarily. See for example this:
import numpy as np

a = np.array((1e-30, 2e-30), dtype='f4')
np.linalg.norm(a)                  # result is 0 due to underflow in single precision
scale = 1. / np.max(np.abs(a))
np.linalg.norm(a * scale) / scale  # result is 2.236e-30
This is now a two-pass algorithm because you have to iterate over all your data before determining a scaling value. If this is not to your liking, there are single-pass algorithms, though you probably don't want to implement them in Python. The classic would be Blue's algorithm:
http://degiorgi.math.hr/~singer/aaa_sem/Float_Norm/p15-blue.pdf
A simpler but much less efficient way is to simply chain calls to hypot (which uses a stable algorithm). You should never do this, but just for completeness:
import math

norm = 0.
for value in a:
    norm = math.hypot(norm, value)
Or even a hierarchical version like this to reduce the number of numpy calls:
norm = a
while len(norm) > 1:
    hlen = len(norm) >> 1
    front, back = norm[:hlen], norm[hlen: 2 * hlen]
    tail = norm[2 * hlen:]  # only present when the length is not even
    norm = np.append(np.hypot(front, back), tail)
norm = norm[0]
You are free to combine these strategies. For example if you don't have your data available all at once but blockwise (e.g. because the data set is too large and you read it from disk), you can pick a scaling value per block, then chain the blocks together with a few calls to hypot.
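A rough sketch of such a blockwise combination (the helper name blockwise_norm and the splitting into ten chunks are illustrative choices of mine):
import numpy as np

def blockwise_norm(blocks):
    # accumulate the l2 norm over blocks, scaling each block locally
    total = 0.0
    for block in blocks:
        peak = np.max(np.abs(block))
        if peak == 0.0:
            continue                          # an all-zero block contributes nothing
        scale = 1. / peak
        block_norm = np.linalg.norm(block * scale) / scale
        total = np.hypot(total, block_norm)   # chain the per-block norms
    return total

a = np.full(1_000_000, 1e-30, dtype='f4')
print(blockwise_norm(np.array_split(a, 10)))  # ~1e-27, no underflow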
Rounding errors
You accumulate rounding errors, especially when accumulating values of different magnitude. If you accumulate values of different signs, you may also experience catastrophic cancellation. To avoid these issues, you need to use a compensated summation scheme. Python provides a very good one with math.fsum.
So if you absolutely need highest accuracy, go with something like this:
math.sqrt(math.fsum(np.square(a * scale))) / scale
Note that this is overkill for a simple norm since there are no sign changes in the accumulation (so no cancellation) and the squaring increases all differences in magnitude so that the result will always be dominated by its largest components, unless you are dealing with a truly horrifying dataset. That numpy does not provide built-in solutions for these issues tells you that the naive algorithm is actually good enough for most real-world applications. No reason to go overboard with the implementation before you actually run into trouble.
Application to dot products
I've focused on the l2 norm because that is the case that is more generally understood to be hazardous. Of course you can apply similar strategies to a dot product.
np.dot(a, b)
ascale = 1. / np.max(np.abs(a))
bscale = 1. / np.max(np.abs(b))
np.dot(a * ascale, b * bscale) / (ascale * bscale)
This is particularly useful if you use mixed precision. For example the dot product could be calculated in single precision but the division by (ascale * bscale) could take place in double or even extended precision.
And of course math.fsum is still available: dot = math.fsum(a * b)
Bonus thoughts
The scaling itself introduces some rounding errors, because nothing guarantees that the scaled values are exactly representable in floating point. However, you can avoid this by picking a scaling factor that is an exact power of 2. Multiplying by a power of 2 is always exact in FP (assuming you stay in the representable range). You can get the exponent with math.frexp.
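A small sketch of that trick, using math.frexp to read off the exponent and math.ldexp to build an exact power-of-two scale (variable names are mine):
import math
import numpy as np

a = np.array((3e-30, 4e-30))

# frexp returns (mantissa, exponent) with value = mantissa * 2**exponent,
# so 2**-exponent is an exact power of two close to 1 / max(|a|)
_, exponent = math.frexp(np.max(np.abs(a)))
scale = math.ldexp(1.0, -exponent)        # exactly 2.0**-exponent

print(np.linalg.norm(a * scale) / scale)  # ~5e-30; both scalings are exact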

Vectorized computation of log(n!)

I have an (arbitrarily shaped) array X of integers, and I would like to compute the logarithm of the factorial of each entry (Precisely, not through the Gamma function).
The numbers are big enough that
np.log(scipy.special.factorial(X))
is unfeasible. So I want to do something like np.sum(np.log(np.arange(2,X+1)), axis=-1)
But the arange() function gives a different size for each entry, so this doesn't work. I thought about padding with ones, but I'm not sure how to do this.
Can this be done in a vectorized way?
I don't see what problem you have with the gamma function. The gamma function isn't an approximation, and while approximations may be involved in the computation of scipy.special.gammaln, there's no reason to expect those approximations to be worse than the error involved in computing the result manually. scipy.special.gammaln seems like the perfect tool for the job:
X_log_factorials = scipy.special.gammaln(X+1)
If you want to do this manually anyway, you could take the logarithms of all positive integers up to the maximum of your array, compute a cumulative sum, and then select the log-factorials you're interested in:
logarithms = numpy.log(numpy.arange(1, X.max()+1))
log_factorials = numpy.cumsum(logarithms)
X_log_factorials = log_factorials[X-1]
(If you want to handle 0!, you will need to make a minor adjustment, such as by setting X_log_factorials[X==0] = 0.)
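As a quick sanity check (a sketch with an arbitrary example array, not from the original answer), both routes agree once the 0! adjustment is applied:
import numpy
import scipy.special

X = numpy.array([[0, 1, 5], [10, 20, 50]])

# route 1: gammaln, since log(n!) = gammaln(n + 1)
via_gammaln = scipy.special.gammaln(X + 1)

# route 2: cumulative sum of logarithms, with the 0! fix-up
log_factorials = numpy.cumsum(numpy.log(numpy.arange(1, X.max() + 1)))
X_log_factorials = log_factorials[X - 1]
X_log_factorials[X == 0] = 0

print(numpy.allclose(via_gammaln, X_log_factorials))   # True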

What do extra results of numpy.polyfit mean?

When creating a line of best fit with numpy's polyfit, you can specify the parameter full to be True. This returns 4 extra values, apart from the coefficients. What do these values mean and what do they tell me about how well the function fits my data?
https://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html
What I'm doing is:
bestFit = np.polyfit(x_data, y_data, deg=1, full=True)
and I get the result:
(array([ 0.00062008, 0.00328837]), array([ 0.00323329]), 2,
 array([ 1.30236506, 0.55122159]), 1.1102230246251565e-15)
The documentation says that the four extra pieces of information are: residuals, rank, singular_values, and rcond.
Edit:
I am looking for a further explanation of how rcond and singular_values describe goodness of fit.
Thank you!
how rcond and singular_values describe goodness of fit.
Short answer: they don't.
They do not describe how well the polynomial fits the data; that is what the residuals are for. They describe how numerically robust the computation of that polynomial was.
rcond
The value of rcond is not really about the quality of the fit; it describes the process by which the fit was obtained, namely a least-squares solution of a linear system. Most of the time the user of polyfit does not provide this parameter, so a suitable value is picked by polyfit itself. This value is then returned to the user for their information.
rcond is used for truncation in ill-conditioned matrices. The least-squares solver does two things:
Finds x that minimizes the norm of residuals Ax-b
If multiple x achieve this minimum, returns x with the smallest norm among those.
The second clause occurs when some changes of x do not affect the right-hand side at all. But since floating point computations are imperfect, usually what happens is that some changes of x affect the right hand side very little. And this is where rcond is used to decide when "very little" should be considered as "zero up to noise".
For example, consider the system
x1 = 1
x1 + 0.0000000001 * x2 = 2
This one can be solved exactly: x1 = 1 and x2 = 10000000000. But... that tiny coefficient (that in reality, came after some matrix manipulations) has some numeric error in it; for all we know it could be negative, or zero. Should we let it have such huge influence on the solution?
So, in such a situation the matrix (specifically its singular values) gets truncated at level rcond. This leaves
x1 = 1
x1 = 2
for which the least-squares solution is x1 = 1.5, x2 = 0. Note that this solution is robust: no huge numbers from tiny fluctuations of coefficients.
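The same effect can be reproduced with np.linalg.lstsq, the least-squares routine that accepts this rcond parameter (a sketch; the printed values are approximate):
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1e-10]])
b = np.array([1.0, 2.0])

# keep all singular values: exact but wildly sensitive solution
x_full, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x_full)    # roughly [1.0, 1.0e10]

# truncate singular values below rcond * largest: robust minimum-norm solution
x_trunc, *_ = np.linalg.lstsq(A, b, rcond=1e-8)
print(x_trunc)   # roughly [1.5, 0.0]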
Singular values
When one solves a linear system Ax = b in the least squares sense, the singular values of A determine how numerically tricky this is. Specifically, large disparity between largest and smallest singular values is problematic: such systems are ill-conditioned. An example is
0.835*x1 + 0.667*x2 = 0.168
0.333*x1 + 0.266*x2 = 0.067
The exact solution is (1, -1). But if the right hand side is changed from 0.067 to 0.066, the solution is (-666, 834) -- totally different. The problem is that the singular values of A are (roughly) 1 and 1e-6; this magnifies any changes on the right by the factor of 1e6.
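A quick check of those numbers (a sketch of mine; the outputs are approximate):
import numpy as np

A = np.array([[0.835, 0.667],
              [0.333, 0.266]])

print(np.linalg.solve(A, [0.168, 0.067]))   # roughly [ 1., -1.]
print(np.linalg.solve(A, [0.168, 0.066]))   # roughly [-666., 834.]
print(np.linalg.cond(A))                    # ~1e6, i.e. ill-conditioned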
Unfortunately, polynomial fitting often results in ill-conditioned matrices. For example, fitting a polynomial of degree 24 to 25 equally spaced data points is inadvisable.
import numpy as np
x = np.arange(25)
np.polyfit(x, x, 24, full=True)
The singular values are
array([4.68696731e+00, 1.55044718e+00, 7.17264545e-01, 3.14298605e-01,
1.16528492e-01, 3.84141241e-02, 1.15530672e-02, 3.20120674e-03,
8.20608411e-04, 1.94870760e-04, 4.28461687e-05, 8.70404409e-06,
1.62785983e-06, 2.78844775e-07, 4.34463936e-08, 6.10212689e-09,
7.63709211e-10, 8.39231664e-11, 7.94539407e-12, 6.32326226e-13,
4.09332903e-14, 2.05501534e-15, 7.55397827e-17, 4.81104905e-18,
8.98275758e-20])
which, with the default value of rcond (5.55e-15 here), gets four of them truncated to 0.
The difference in magnitude between smallest and largest singular values indicates that perturbing the y-values by numbers of size 1e-15 can result in changes of about 1 to the coefficients. (Not every perturbation will do that, just some that happen to align with a singular vector for a small singular value).
Rank
The effective rank is just the number of singular values above the rcond threshold. In the above example it's 21. This means that even though the fit is for 25 points, and we get a polynomial with 25 coefficients, there are only 21 degrees of freedom in the solution.
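Those quantities can be read back directly from the example above (a sketch; the printed numbers follow the answer):
import numpy as np

x = np.arange(25)
coeffs, residuals, rank, sv, rcond = np.polyfit(x, x, 24, full=True)
print(rank)    # 21: four singular values fell below the threshold
print(rcond)   # ~5.55e-15, the default for 25 data points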

Calculating mutual information in python returns nan

I've implemented the mutual information formula in python using pandas and numpy
import numpy as np

def mutual_info(p):
    # p is a pandas DataFrame holding the joint probability table
    p_x = p.sum(axis=1)
    p_y = p.sum(axis=0)
    I = 0.0
    for i_y in p.index:
        for i_x in p.columns:
            # note: .ix is removed in current pandas; .loc is the replacement
            I += (p.ix[i_y, i_x] * np.log2(p.ix[i_y, i_x] / (p_x[i_y] * p[i_x]))).values[0]
    return I
However, if a cell in p has a zero probability, then np.log2(p.ix[i_y,i_x]/(p_x[i_y]*p[i_x])) is negative infinity, and the whole expression is multiplied by zero and returns NaN.
What is the right way to work around that?
For various theoretical and practical reasons (e.g., see Competitive Distribution Estimation: Why is Good-Turing Good), you might consider never using a zero probability with the log loss measure.
So, say, if you have a probability vector p, then, for some small scalar α > 0, you would use αu + (1 - α)p, where u is the uniform distribution (so the mixture is still a probability vector). Unfortunately, there are no general guidelines for choosing α, and you'll have to assess this further down the calculation.
For the Kullback-Leibler distance, you would of course apply this to each of the inputs.
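A minimal numpy sketch of that idea applied to a joint probability table (the function name and the value alpha = 1e-6 are placeholders of my own; as noted, there is no general rule for choosing α):
import numpy as np

def smoothed_mutual_info(p, alpha=1e-6):
    # mix the joint table with the uniform distribution so no cell is exactly zero
    p = np.asarray(p, dtype=float)
    p = alpha / p.size + (1.0 - alpha) * p
    p_x = p.sum(axis=1, keepdims=True)   # marginal of the row variable
    p_y = p.sum(axis=0, keepdims=True)   # marginal of the column variable
    return np.sum(p * np.log2(p / (p_x * p_y)))

# a joint table with a zero cell no longer yields NaN
joint = np.array([[0.50, 0.00],
                  [0.25, 0.25]])
print(smoothed_mutual_info(joint))   # ~0.31 bits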

How to compute scipy sparse matrix determinant without turning it to dense?

I am trying to figure out the fastest method to find the determinant of sparse, symmetric, real matrices in Python. I am using the scipy sparse module, but am really surprised that there is no determinant function. I am aware I could use LU factorization to compute the determinant, but don't see an easy way to do it, because the return of scipy.sparse.linalg.splu is an object and instantiating dense L and U matrices is not worth it - I may as well do sp.linalg.det(A.todense()) where A is my scipy sparse matrix.
I am also a bit surprised that others have not faced the problem of efficient determinant computation within scipy. How would one use splu to compute the determinant?
I looked into pysparse and scikits.sparse.cholmod. The latter is not practical for me right now - it needs extra package installation, and I am not sure how fast the code is before I go to all the trouble.
Any solutions? Thanks in advance.
Here are some references I provided as part of an answer here.
I think they address the actual problem you are trying to solve:
notes for an implementation in the Shogun library
Erlend Aune, Daniel P. Simpson: Parameter estimation in high dimensional Gaussian distributions, particularly section 2.1 (arxiv:1105.5256)
Ilse C.F. Ipsen, Dean J. Lee: Determinant Approximations (arxiv:1105.0437)
Arnold Reusken: Approximation of the determinant of large sparse symmetric positive definite matrices (arxiv:hep-lat/0008007)
Quoting from the Shogun notes:
The usual technique for computing the log-determinant term in the likelihood expression relies on Cholesky factorization of the matrix, i.e. Σ = LL^T (L is the lower triangular Cholesky factor), and then using the diagonal entries of the factor to compute log(det(Σ)) = 2 * sum_{i=1}^{n} log(L_ii). However, for sparse matrices, as covariance matrices usually are, the Cholesky factors often suffer from fill-in phenomena - they turn out to be not so sparse themselves. Therefore, for large dimensions this technique becomes infeasible because of a massive memory requirement for storing all these irrelevant non-diagonal coefficients of the factor. While ordering techniques have been developed to permute the rows and columns beforehand in order to reduce fill-in, e.g. approximate minimum degree (AMD) reordering, these techniques depend largely on the sparsity pattern and are therefore not guaranteed to give better results.
Recent research shows that using a number of techniques from complex analysis, numerical linear algebra and greedy graph coloring, we can, however, approximate the log-determinant up to an arbitrary precision [Aune et. al., 2012]. The main trick lies within the observation that we can write log(det(Σ)) as trace(log(Σ)), where log(Σ) is the matrix-logarithm.
The "standard" way to solve this problem is with a cholesky decomposition, but if you're not up to using any new compiled code, then you're out of luck. The best sparse cholesky implementation is Tim Davis's CHOLMOD, which is licensed under the LGPL and thus not available in scipy proper (scipy is BSD).
You can use scipy.sparse.linalg.splu to obtain sparse matrices for the lower (L) and upper (U) triangular matrices of an M=LU decomposition:
import numpy as np
from scipy.sparse.linalg import splu

lu = splu(M)
The determinant det(M) can be then represented as:
det(M) = det(LU) = det(L)det(U)
The determinant of triangular matrices is just the product of the diagonal terms:
diagL = lu.L.diagonal()
diagU = lu.U.diagonal()
d = diagL.prod()*diagU.prod()
However, for large matrices underflow or overflow commonly occurs, which can be avoided by working with the logarithms.
diagL = diagL.astype(np.complex128)
diagU = diagU.astype(np.complex128)
logdet = np.log(diagL).sum() + np.log(diagU).sum()
Note that I invoke complex arithmetic to account for negative numbers that might appear in the diagonals. Now, from logdet you can recover the determinant:
det = np.exp(logdet) # usually underflows/overflows for large matrices
whereas the sign of the determinant can be calculated directly from diagL and diagU (important for example when implementing Crisfield's arc-length method):
sign = swap_sign*np.sign(diagL).prod()*np.sign(diagU).prod()
where swap_sign is a term that accounts for the number of permutations in the LU decomposition. Thanks to Luiz Felippe Rodrigues, it can be calculated:
swap_sign = (-1)**minimumSwaps(lu.perm_r)
def minimumSwaps(arr):
    """
    Minimum number of swaps needed to order a
    permutation array
    """
    # from https://www.thepoorcoder.com/hackerrank-minimum-swaps-2-solution/
    a = dict(enumerate(arr))
    b = {v: k for k, v in a.items()}
    count = 0
    for i in a:
        x = a[i]
        if x != i:
            y = b[i]
            a[y] = x
            b[x] = y
            count += 1
    return count
Things start to go wrong with the determinant of sparse tridiagonal (-1 2 -1) around N=1e6 using both SuperLU and CHOLMOD...
The determinant should be N+1.
It's probably propagation of error when calculating the product of the U diagonal:
from scipy.sparse import diags
from scipy.sparse.linalg import splu
from sksparse.cholmod import cholesky
from math import exp
n=int(5e6)
K = diags([-1.],-1,shape=(n,n)) + diags([2.],shape=(n,n)) + diags([-1.],1,shape=(n,n))
lu = splu(K.tocsc())
diagL = lu.L.diagonal()
diagU = lu.U.diagonal()
det=diagL.prod()*diagU.prod()
print(det)
factor = cholesky(K.tocsc())
ld = factor.logdet()
print(exp(ld))
Output:
4999993.625461911
4999993.625461119
Even if U is accurate to 10-13 digits, this might be expected:
n=int(5e6)
print(n*diags([1-0.00000000000025],0,shape=(n,n)).diagonal().prod())
4999993.749444371
