I'm trying to get into Singular Value Decomposition (SVD). I've found this YouTube lecture that contains an example. However, when I try this example in numpy, I'm getting "kind of" different results. In this example the input matrix is
A = [ [1,1,1,0,0], [3,3,3,0,0], [4,4,4,0,0], [5,5,5,0,0], [0,2,0,4,4], [0,0,0,5,5], [0,1,0,2,2] ]
A = np.asarray(A)
print(A)
[[1 1 1 0 0]
[3 3 3 0 0]
[4 4 4 0 0]
[5 5 5 0 0]
[0 2 0 4 4]
[0 0 0 5 5]
[0 1 0 2 2]]
The rank of this matrix is 3 (np.linalg.matrix_rank(A)). The lecture states that the number of singular values is the rank of the matrix, and in the example the Sigma matrix S is indeed of size 3x3. However, when I perform
U, S, V = np.linalg.svd(A)
matrix S contains 5 values. On the other hand, the first 3 values match the ones in the example, and the other 2 are basically 0. Can I assume that I get more singular values than the rank because of the numerical algorithm behind SVD and the finite representation of real numbers on computers, or something along those lines?
As mentioned on this page, numpy internally uses the LAPACK routine _gesdd to compute the SVD. Now, if you look at the _gesdd documentation, it mentions:
To find the SVD of a general matrix A, call the LAPACK routine ?gebrd or ?gbbrd for reducing A to a bidiagonal matrix B by a unitary (orthogonal) transformation: A = QBPᴴ. Then call ?bdsqr, which forms the SVD of a bidiagonal matrix: B = U₁ΣV₁ᴴ.
So, there are 2 steps involved here:
1. Bidiagonalization by an orthogonal transformation (Householder transformations).
2. Getting the SVD of the bidiagonal matrix, using the implicit zero-shift QR algorithm.
The QR algorithm is an iterative algorithm, meaning you don't get an "exact" answer; you get better and better approximations with each iteration and stop when the change in values falls below a threshold, so it is "approximate" in that sense.
Thus, along with the issue of numerical accuracy due to the finite machine representation of reals, even if we had infinite representational capacity we would still get "approximate" results (if we ran the algorithm for a finite time) because of the iterative nature of the algorithm.
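To make that concrete, here is a minimal sketch (using the matrix A from the question; the tolerance formula is an assumption modelled on the documented default of np.linalg.matrix_rank) showing that the two trailing singular values sit at machine-precision scale and can be treated as zero:
import numpy as np

A = np.array([[1, 1, 1, 0, 0],
              [3, 3, 3, 0, 0],
              [4, 4, 4, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 2, 0, 4, 4],
              [0, 0, 0, 5, 5],
              [0, 1, 0, 2, 2]], dtype=float)

U, S, Vt = np.linalg.svd(A)
# np.linalg.svd always returns min(A.shape) = 5 singular values,
# regardless of the rank of A.
print(len(S))

# Tolerance similar to the default used by np.linalg.matrix_rank.
tol = S.max() * max(A.shape) * np.finfo(A.dtype).eps
print(np.sum(S > tol))  # 3, matching np.linalg.matrix_rank(A)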
I have a large numpy array A of shape M*3, whose elements of each row are unique, non-negative integers ranging from 0 to N - 1. In fact, each row corresponds to a triangle in my finite element analysis.
For example, M=4, N=5, and a matrix A looks like the following
array([[0, 1, 2],
[0, 2, 3],
[1, 2, 4],
[3, 2, 4]])
Now, I need to construct another array B of size M*N, such that
B[m,n] = 1 if n is in A[m], or else 0
The corresponding B for the exemplary A above would be
1 1 1 0 0
1 0 1 1 0
0 1 1 0 1
0 0 1 1 1
A loop-based code would be
B = np.zeros((M, N))
for m in range(M):
    for n in A[m]:
        B[m, n] = 1
But since M and N are large (on the order of 10^6 each), how can I use good Numpy indexing techniques to accelerate this process? Besides, I feel that sparse matrix techniques are also needed, since M * N one-byte entries amount to about 10**12 bytes, i.e. roughly 1000 GB.
In general, I feel that using numpy's vectorization techniques, such as indexing and broadcasting, looks more like an ad-hoc, error-prone activity relying on quite a bit of street smarts (or art, if you prefer). Are there any programming language efforts that can systematically convert your loop-based code to a high-performance vectorized version?
You can directly create a sparse csr-matrix from your data
As you already mentioned in your question, a dense matrix of uint8 values would need about 1 TB. By using a sparse matrix, this can be reduced to approx. 19 MB, as shown in the example below.
Creating inputs of a relevant size
This should have been included in the question, as it gives a hint about the sparsity of the matrix.
from scipy import sparse
import numpy as np
M=int(1e6)
N=int(1e6)
A=np.random.randint(low=0,high=N,size=(M,3))
Creating a sparse csr-matrix
Have a look at the scipy docs; for a general overview, the Wikipedia article on the CSR format could also be useful.
# array of ones, one per non-zero value (3 MB if uint8)
data = np.ones(A.size, dtype=np.uint8)
# you already have the column indices; they are expected as a 1D array (12 MB)
indices = A.reshape(-1)
# a new row begins every A.shape[1] entries (4 MB)
indptr = np.arange(0, A.size + 1, A.shape[1])
B=sparse.csr_matrix((data,indices,indptr),shape=(M, N))
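As a quick sanity check (my own addition, using the small example A from the question rather than the random one above), the sparse construction reproduces the loop-based B:
from scipy import sparse
import numpy as np

A = np.array([[0, 1, 2],
              [0, 2, 3],
              [1, 2, 4],
              [3, 2, 4]])
M, N = 4, 5

data = np.ones(A.size, dtype=np.uint8)
indices = A.reshape(-1)
indptr = np.arange(0, A.size + 1, A.shape[1])
B = sparse.csr_matrix((data, indices, indptr), shape=(M, N))

print(B.toarray())
# [[1 1 1 0 0]
#  [1 0 1 1 0]
#  [0 1 1 0 1]
#  [0 0 1 1 1]]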
I've been facing an interesting Python problem. I've tried to invert the 3x3 matrix A
[[1 2 3]
[4 5 6]
[7 8 9]]
and then multiply it by the initial one: A⁻¹A. Instead of the identity matrix (with all diagonal elements equal to one) I got this:
[[ 12. 8. 8.]
[-16. -8. 0.]
[ 4. 0. 0.]]
The problem occurs only in this specific case. Matrices with other values give right results. Here is the code:
import numpy as np
np.set_printoptions(precision=2,suppress=True)
A = np.array([1,2,3,4,5,6,7,8,9])
A = A.reshape(3,3)
print(A)
print(np.linalg.det(A))
print(np.matmul(np.linalg.inv(A),A))
Output:
[[1 2 3]
[4 5 6]
[7 8 9]]
6.66133814775094e-16
[[ 12. 8. 8.]
[-16. -8. 0.]
[ 4. 0. 0.]]
As others have pointed out, a singular matrix is non-invertible, so you get a nonsense answer from A^-1 A.
Numpy includes a handy function to check the condition number
np.linalg.cond(A)
# 5.0522794445385096e+16
As Wikipedia states, this is a measure of the sensitivity of the solution x of Ax = b to small changes in the matrix values of A (kind of like a generalized derivative). The large value indicates that A is "ill-conditioned" and can result in unstable values. This is intrinsic to the real-valued matrix but can be worsened by floating point arithmetic.
cond is more useful than looking at np.linalg.det(A) to know whether your matrix will be well-behaved, because it is not sensitive to the scale of values in A (whereas the norm and determinant are). As an example, here is a matrix with small values that really has no issue with invertibility:
A = 1e-10*np.random.random(size=(3,3))
np.linalg.det(A)
# 2.128774239739163e-31
# ^^ this looks really bad...
np.linalg.cond(A)
# 8.798791503909136
# nevermind, it's probably ok
A_ident = np.matmul(np.linalg.inv(A), A)
np.linalg.norm(A_ident - np.identity(3))
# 5.392490230798587e-16
# A^(-1)*A is very close to the identity matrix, not ill-conditioned.
The determinant of this matrix is 0, since
import numpy as np
np.set_printoptions(precision=2, suppress=True)

A = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
A = A.reshape(3, 3)
print(A)

# print determinant
print(np.linalg.det(A))
returns
[[1 2 3]
[4 5 6]
[7 8 9]]
0.0
so you have a matrix that has no computable inverse.
Your matrix is not invertible; see e.g. Wolfram Alpha, which says the matrix is singular.
You may be misled by the fact that Python printed a nonzero value for the determinant (6.66133814775094e-16); however, this value is so close to 0 that you should treat it as such. The operations that computers perform on floating point numbers are usually not completely accurate (see e.g. the question Why are floating point numbers inaccurate?), and that is why the computed determinant is close to zero but not exactly zero.
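For what it's worth, a tiny check (my own addition) using exact integer cofactor expansion along the first row confirms that the true determinant is exactly zero, so the 6.66e-16 is pure rounding noise:
# Cofactor expansion of det([[1,2,3],[4,5,6],[7,8,9]]) along the first row,
# done in exact integer arithmetic.
det_exact = 1*(5*9 - 6*8) - 2*(4*9 - 6*7) + 3*(4*8 - 5*7)
print(det_exact)  # 0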
I want to find the association between variables, and Cramér's V works like a treat for matrices larger than 2x2. However, for matrices with low frequencies, it does not work well. For the following contingency matrix, I get a result of 0.5. How can I account for this?
1 2
a 2 0
b 0 2
Here is my code:
import numpy as np
import scipy.stats as ss

def cramers_stat(confusion_matrix):
    chi2 = ss.chi2_contingency(confusion_matrix)[0]
    n = confusion_matrix.sum().sum()
    return np.sqrt(chi2 / (n * (min(confusion_matrix.shape) - 1)))

result = cramers_stat(confusion_matrix)
print(result)
confusion_matrix is my input, in this case the matrix I mentioned above. I understand that for good results I need cell frequencies above 5, but for a perfect association like the one above I expected the result to be 1.
When you compute the Cramér coefficient, you must compute chi2 without continuity correction. For a 2x2 matrix, chi2_contingency uses continuity correction by default. So you must tell chi2_contingency to not use continuity correction by giving the argument correction=False:
chi2 = ss.chi2_contingency(confusion_matrix, correction=False)[0]
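With that change, the function returns 1 for the perfect-association table from the question. A quick sketch (assuming scipy.stats is imported as ss, as in the original code):
import numpy as np
import scipy.stats as ss

def cramers_stat(confusion_matrix):
    # correction=False disables the continuity correction that
    # chi2_contingency applies by default to 2x2 tables.
    chi2 = ss.chi2_contingency(confusion_matrix, correction=False)[0]
    n = confusion_matrix.sum()
    return np.sqrt(chi2 / (n * (min(confusion_matrix.shape) - 1)))

table = np.array([[2, 0],
                  [0, 2]])
print(cramers_stat(table))  # 1.0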
I am trying to find an appropriate function for the permeability of cells under varying conditions. If I assume constant permeability, I can fit it to the experimental data and use sklearn's PolynomialFeatures together with a LinearModel (as explained in this post) in order to determine a correlation between the conditions and the permeability. However, the permeability is not constant, so now I am trying to fit my model with the permeability as a function of the process conditions. The PolynomialFeatures module of sklearn is quite nice to use.
Is there an equivalent function within scipy or numpy which allows me to create a polynomial model (including interaction terms, e.g. a*x[0]*x[1], etc.) of varying order without writing the whole function by hand?
The standard polynomial class in numpy seems not to support interaction terms.
I'm not aware of such a function that does exactly what you need, but you can achieve it using a combination of itertools and numpy.
If you have n_features predictor variables, you essentially must generate all vectors of length n_features whose entries are non-negative integers and sum to the specified order. Each new feature column is the product of the old features raised component-wise to the powers given by one of these vectors.
For example, if order = 3 and n_features = 2, one of the new features will be the old features raised to the respective powers [2,1]. I've written some code below for arbitrary order and number of features. I've adapted the generation of vectors that sum to order from this post.
import itertools
import numpy as np
from scipy.special import binom

def polynomial_features_with_cross_terms(X, order):
    """
    X: numpy ndarray
        Matrix of shape `(n_samples, n_features)` to be transformed.
    order: integer, default 2
        Order of polynomial features to be computed.

    returns: T, powers.
        `T` is a matrix of shape `(n_samples, n_poly_features)`.
        Note that `n_poly_features` is equal to:

            `n_features+order-1` choose `n_features-1`

        See: https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29#Theorem_two

        `powers` is a matrix of shape `(n_features, n_poly_features)`.
        Each column specifies the power by row of the respective feature,
        in the respective column of `T`.
    """
    n_samples, n_features = X.shape
    n_poly_features = int(binom(n_features+order-1, n_features-1))
    powers = np.zeros((n_features, n_poly_features))
    T = np.zeros((n_samples, n_poly_features), dtype=X.dtype)

    combos = itertools.combinations(range(n_features+order-1), n_features-1)
    for i, c in enumerate(combos):
        powers[:, i] = np.array([
            b-a-1 for a, b in zip((-1,)+c, c+(n_features+order-1,))
        ])
        T[:, i] = np.prod(np.power(X, powers[:, i]), axis=1)

    return T, powers
Here's some example usage:
>>> X = np.arange(-5,5).reshape(5,2)
>>> T, p = polynomial_features_with_cross_terms(X, order=3)
>>> print(X)
[[-5 -4]
 [-3 -2]
 [-1  0]
 [ 1  2]
 [ 3  4]]
>>> print(p)
[[ 0.  1.  2.  3.]
 [ 3.  2.  1.  0.]]
>>> print(T)
[[ -64  -80 -100 -125]
 [  -8  -12  -18  -27]
 [   0    0    0   -1]
 [   8    4    2    1]
 [  64   48   36   27]]
Finally, I should mention that the SVM polynomial kernel achieves exactly this effect without explicitly computing the polynomial map. There are of course pros and cons to this, but I figured I should mention it for you to consider if you haven't yet.
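If that route interests you, here is a rough sketch (my own illustration with made-up data, not part of the original answer) of a polynomial kernel in a support vector regressor, where the interaction terms are handled implicitly:
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.rand(50, 2)                      # 50 samples, 2 features
y = X[:, 0] * X[:, 1] + X[:, 0] ** 2     # target with an interaction term

# kernel='poly' evaluates (gamma*<x, x'> + coef0)**degree implicitly,
# so the cross terms never have to be materialised as columns.
model = SVR(kernel='poly', degree=3, coef0=1.0)
model.fit(X, y)
print(model.predict(X[:5]))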
Using an example from Andrew Ng's class (finding parameters for linear regression using the normal equation):
With Python:
X = np.array([[1, 2104, 5, 1, 45], [1, 1416, 3, 2, 40], [1, 1534, 3, 2, 30], [1, 852, 2, 1, 36]])
y = np.array([[460], [232], [315], [178]])
θ = ((np.linalg.inv(X.T.dot(X))).dot(X.T)).dot(y)
print(θ)
Result:
[[ 7.49398438e+02]
[ 1.65405273e-01]
[ -4.68750000e+00]
[ -4.79453125e+01]
[ -5.34570312e+00]]
With Julia:
X = [1 2104 5 1 45; 1 1416 3 2 40; 1 1534 3 2 30; 1 852 2 1 36]
y = [460; 232; 315; 178]
θ = ((X' * X)^-1) * X' * y
Result:
5-element Array{Float64,1}:
207.867
0.0693359
134.906
-77.0156
-7.81836
Furthermore, when I multiply X by Julia's θ (but not Python's), I get numbers close to y.
I can't figure out what I am doing wrong. Thanks!
Using X^-1 vs the pseudo inverse
pinv(X), which corresponds to the pseudo-inverse, is more broadly applicable than inv(X), which X^-1 equates to. Neither Julia nor Python does well using inv, but in this case Julia apparently does better. But if you change the expression to
julia> z=pinv(X'*X)*X'*y
5-element Array{Float64,1}:
188.4
0.386625
-56.1382
-92.9673
-3.73782
you can verify that X*z = y
julia> X*z
4-element Array{Float64,1}:
460.0
232.0
315.0
178.0
A more numerically robust approach in Python, without having to do the matrix algebra yourself, is to use numpy.linalg.lstsq to do the regression:
In [29]: np.linalg.lstsq(X, y)
Out[29]:
(array([[ 188.40031942],
[ 0.3866255 ],
[ -56.13824955],
[ -92.9672536 ],
[ -3.73781915]]),
array([], dtype=float64),
4,
array([ 3.08487554e+03, 1.88409728e+01, 1.37100414e+00,
1.97618336e-01]))
(Compare the solution vector with #waTeim's answer in Julia).
You can see the source of the ill-conditioning by printing the matrix inverse you're calculating:
In [30]: np.linalg.inv(X.T.dot(X))
Out[30]:
array([[ -4.12181049e+13, 1.93633440e+11, -8.76643127e+13,
-3.06844458e+13, 2.28487459e+12],
[ 1.93633440e+11, -9.09646601e+08, 4.11827338e+11,
1.44148665e+11, -1.07338299e+10],
[ -8.76643127e+13, 4.11827338e+11, -1.86447963e+14,
-6.52609055e+13, 4.85956259e+12],
[ -3.06844458e+13, 1.44148665e+11, -6.52609055e+13,
-2.28427584e+13, 1.70095424e+12],
[ 2.28487459e+12, -1.07338299e+10, 4.85956259e+12,
1.70095424e+12, -1.26659193e+11]])
Eeep!
Taking the dot product of this with X.T leads to a catastrophic loss of precision.
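An alternative in the same spirit (my own aside, not part of the original answer) is to skip the explicit inverse entirely and use the SVD-based pseudo-inverse, which should give the same minimum-norm solution as lstsq here:
import numpy as np

X = np.array([[1, 2104, 5, 1, 45],
              [1, 1416, 3, 2, 40],
              [1, 1534, 3, 2, 30],
              [1,  852, 2, 1, 36]])
y = np.array([[460], [232], [315], [178]])

# np.linalg.pinv computes the Moore-Penrose pseudo-inverse via SVD,
# so the ill-conditioned product X.T @ X is never formed.
theta = np.linalg.pinv(X).dot(y)
print(theta)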
Notice that X is a 4x5 matrix, or in statistical terms, that you have fewer observations than parameters to estimate. Therefore, the least squares problem has infinitely many solutions with the sum of the squared errors exactly equal to zero. In this case, the normal equations don't help you much because the matrix X'X is singular. Instead, you should just find a solution to X*b=y.
Most numerical linear algebra systems are based on the FORTRAN package LAPACK, which uses a pivoted QR factorization for solving the problem X*b=y. Since there are infinitely many solutions, LAPACK picks the solution with the smallest norm. In Julia, you can get this solution, simply by writing
float(X)\y
(Unfortunately, the float part is necessary right now, but that will change.)
In exact arithmetic, you should get the same solution as the one above with either of your proposed methods, but the floating point representation of your problem introduces small rounding errors, and these errors will affect the calculated solution. The effect of the rounding errors on the solution is much larger when using the normal equations than when using the QR factorization directly on X.
This also holds true in the usual case where X has more rows than columns, so it is often recommended that you avoid the normal equations when solving least squares problems. However, when X has many more rows than columns, the matrix X'X is relatively small. In this case, it will be much faster to solve the problem with the normal equations than with a QR factorization. In many statistical problems, the extra numerical error is extremely small compared to the statistical error, so the loss of precision due to the normal equations can simply be ignored.
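To illustrate that trade-off, here is a small sketch of my own with synthetic data (not from the answer above): for a tall, well-conditioned X, the normal equations only require solving a small k-by-k system and agree closely with a QR/SVD-based solver.
import numpy as np

rng = np.random.RandomState(0)
X = rng.standard_normal((10000, 5))          # many more rows than columns
beta = np.arange(1.0, 6.0)
y = X.dot(beta) + 0.01 * rng.standard_normal(10000)

b_normal = np.linalg.solve(X.T.dot(X), X.T.dot(y))   # normal equations: 5x5 solve
b_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]       # QR/SVD-based least squares

print(np.max(np.abs(b_normal - b_lstsq)))            # should be tiny for this well-conditioned X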