Given a query vector q (a one-hot vector) of size 50000x1 and a large sparse matrix A of size 50000x50000 with about 0.3 billion nonzeros (nnz), I want to compute r = (A + A^2 + ... + A^S)q (usually 4 <= S <= 6).
I can compute the above expression iteratively using a loop:
import numpy as np

r = np.zeros((50000, 1))
for i in range(S):
    q = A.dot(q)   # q now holds A^(i+1) applied to the original one-hot vector
    r += q
but I want a faster method.
My first thought was that A can be symmetric, so an eigen decomposition would help to compute the powers of A. But since A is a large sparse matrix, the decomposition produces dense factors of the same size as A, which degrades performance in terms of both memory and speed.
I also considered a low-rank approximation, but A is large and sparse, so I am not sure which target rank would be appropriate.
It is totally fine to pre-compute something, such as B = A + A^2 + ... + A^S, but I need the final computation to be fast: computing Bq should take less than 40 ms.
Is there any reference, paper, or trick for that?
Even if the matrix were not sparse, the iterative method would be the way to go.
Multiplying A.dot(q) has complexity O(N^2), while computing A.dot(A^i) has complexity O(N^3).
The fact that q is sparse (indeed much sparser than A) may help.
For the first iteration, A @ q can be computed as A[q_hot_index, :].T (a single row of A, which equals the q_hot_index-th column when A is symmetric).
For the second iteration, the vector A @ q has the same expected density as A itself (about 12% here: 0.3e9 nonzeros out of 50000^2 entries), so it is still worthwhile to treat it as sparse.
From the third iteration onwards, A^i @ q will be dense.
Since you are accumulating the result, it is good that your r is not sparse: it avoids index manipulation.
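Here is a minimal sketch of that iteration with scipy.sparse (poly_times_onehot and q_hot_index are illustrative names; A is assumed to be stored as a csr_matrix):

import numpy as np
from scipy import sparse

def poly_times_onehot(A, q_hot_index, S):
    # (A + A^2 + ... + A^S) q for a one-hot q, with A a csr_matrix.
    # First step: A @ q is just column q_hot_index of A.
    x = A[:, q_hot_index].toarray().ravel()
    r = x.copy()
    for _ in range(S - 1):
        x = A.dot(x)   # dense mat-vec once the intermediate vector fills up
        r += x
    return r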
There are several different ways to store sparse matrices. I can't say I understand all of them in depth, but I think csr_matrix and csc_matrix are the most compact for generic sparse matrices.
Eigen decomposition is good when you need to compute P(A) itself; to compute P(A)*q, the eigen decomposition becomes advantageous only when P(A) has a degree of the order of the size of A. The eigen decomposition has complexity O(N^3), a matrix-vector product has complexity O(N^2), and the evaluation of a polynomial P(A) of degree D using the eigen decomposition can be achieved in O(N^3 + N*D).
Edit: answering questions from the comments
"it prevents index manipulation" <- Could you elaborate this?
Suppose you have a sparse matrix [0,0,0,2,0,7,0]. This could be described as ((3,2), (5,7)). Now suppose you assign 1 to one element and it becomes [0,0,0,2,1,7,0], now represented as ((3,2), (4,1), (5,7)). The assignment is performed by insertion into an array, and inserting into an array has complexity O(nnz), where nnz is the number of nonzero elements. With a dense matrix you can always modify one element in O(1).
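A small illustration with scipy.sparse (the SparseEfficiencyWarning it triggers is SciPy flagging exactly this O(nnz) insertion cost):

import numpy as np
from scipy import sparse

dense = np.array([0., 0., 0., 2., 0., 7., 0.])
dense[4] = 1.0                   # O(1): overwrite one slot in place

sp = sparse.csr_matrix(dense)    # stored as indices [3, 5] and data [2., 7.]
sp[0, 4] = 1.0                   # O(nnz): the index/data arrays must be rebuilt;
                                 # SciPy emits a SparseEfficiencyWarning here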
What is the N in the complexity?
It is the number of rows (or columns) of the matrix A.
About the eigen decomposition, do you mean that computing r can be achieved in O(N^3 + N*D) rather than O(N^3 + N^2)?
Computing P(A) directly has complexity O(N^3 * D) (with a different constant), so for big matrices computing P(A) via the eigen decomposition is probably the most efficient option. But computing P(A)x iteratively has complexity O(N^2 * D), so it is probably not a good idea to compute P(A)x via the eigen decomposition unless D is large (> N), as far as speed is concerned.
I would like to generate invertible matrices (specifically those from GL(n), a general linear group of size n) using Tensorflow and/or Numpy for use with my neural network.
How can this be done and what would be the best way of doing so?
I understand there is a way to generate symmetric invertible matrices by computing (A + A.T)/2 for arbitrary square matrices A, however, I would like mine to not just be symmetric.
I happened to find one way that I believe can generate a large variety of random invertible matrices, using diagonal dominance.
The theorem is that, for an nxn matrix, if in every row the absolute value of the diagonal element is larger than the sum of the absolute values of the off-diagonal elements in that row, then the matrix is invertible (strict diagonal dominance; here is the corresponding Wikipedia article: https://en.wikipedia.org/wiki/Diagonally_dominant_matrix).
The following code snippet therefore generates a random invertible matrix.
import numpy as np

n = 5  # size of the invertible matrix I wish to generate
m = np.random.rand(n, n)
mx = np.sum(np.abs(m), axis=1)  # absolute row sums
np.fill_diagonal(m, mx)         # each diagonal entry now dominates its row
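As a quick sanity check (purely illustrative, not part of the original snippet):

# Each diagonal entry now equals the full absolute row sum of the original m,
# so it strictly dominates the remaining entries of its row (almost surely).
off_diag_sums = np.sum(np.abs(m), axis=1) - np.abs(np.diag(m))
assert np.all(np.abs(np.diag(m)) > off_diag_sums)
print(np.linalg.matrix_rank(m) == n)   # True: m is invertible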
I have a NumPy array vectors = np.random.randn(rows, cols). I want to find differences between its rows according to some other array diffs which is sparse and "2-hot": containing a 1 in its column corresponding to the first row of vectors and a -1 corresponding to the second row. Perhaps an example shall make it clearer:
diffs = np.array([[ 1,  0, -1],
                  [ 1, -1,  0]])
then I can compute the row differences simply as diffs @ vectors.
Unfortunately this is slow for diffs of 10_000x1000 and vectors of 1000x15_000. I can get a speedup by using scipy.sparse: sparse.csr_matrix(diffs) @ vectors, but even this takes 300 ms.
Possibly this is simply as fast as it gets, but part of me wonders whether matrix multiplication is the wisest choice for this task.
What's more, I need to take the absolute value afterwards, so really I'm doing np.abs(sparse.csr_matrix(diffs) @ vectors), which adds ~200 ms for a grand total of ~500 ms.
I can compute the row differences simply as diffs @ vectors.
This is very inefficient. A matrix multiplication runs in O(n*m*k) time for an (n, m) matrix multiplied by an (m, k) one. In your case, there are only two nonzero values per row, and you do not actually need a multiplication by 1 or -1. Your problem can be computed in O(n*k) time (i.e. m times faster).
Unfortunately this is slow for diffs of 10_000x1000 and vectors 1000x15_000. I can get a speedup by using scipy.sparse.
The thing is that the input data representation is inefficient. When diffs is an array of size (10_000, 1000), it is not reasonable to use a dense matrix that is ~1000 times bigger than needed, nor a sparse matrix format that is not optimized for rows with only two non-zero values (especially 1 and -1). Instead, store the positions of the non-zero values in a 2D array, say sel_rows, of shape (2, n), where the first row contains the location of the 1 and the second row the location of the -1 in each row of diffs. Then you can extract rows of vectors with, for example, vectors[sel_rows[0]], and perform the final operation as vectors[sel_rows[0, :]] - vectors[sel_rows[1, :]] (see the sketch below). This approach should be drastically faster than a dense matrix product, and it may be a bit faster than the sparse one depending on the target machine.
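A sketch of that index-based approach in plain NumPy, reusing diffs and vectors from the question (it assumes every row of diffs has exactly one +1 and one -1):

import numpy as np

# sel_rows has shape (2, n): row 0 holds the column index of the +1,
# row 1 holds the column index of the -1, for each row of diffs.
plus_rows, plus_cols = np.nonzero(diffs == 1)
minus_rows, minus_cols = np.nonzero(diffs == -1)
sel_rows = np.empty((2, diffs.shape[0]), dtype=np.intp)
sel_rows[0, plus_rows] = plus_cols
sel_rows[1, minus_rows] = minus_cols

# Row differences without any matrix product.
result = vectors[sel_rows[0]] - vectors[sel_rows[1]]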
While the above solution is simple, it creates multiple temporary arrays that are not cache-friendly, since your output array alone takes 10_000 * 15_000 * 8 = 1.1 GiB (which is quite large). You can use Numba to remove the temporary arrays and improve performance; multiple threads can be used to improve it even further. Here is an untested code:
import numpy as np
import numba as nb

@nb.njit('(int_[:,::1], float64[:,::1])', parallel=True)
def compute(diffs, vectors):
    n, k = diffs.shape[0], vectors.shape[1]
    assert diffs.shape[1] == 2
    res = np.empty((n, k))
    for i in nb.prange(n):
        a, b = diffs[i]
        for j in range(k):
            # Compute np.abs here if needed, to avoid
            # creating new temporary arrays
            res[i, j] = vectors[a, j] - vectors[b, j]
    return res
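A possible call pattern, reusing sel_rows from the sketch above (untested, like the kernel itself; the index dtype must match the integer type declared in the signature):

# pairs[i] = (row of the +1, row of the -1) for output row i; the dtype is
# assumed here to match the 'int_' type in the @nb.njit signature.
pairs = sel_rows.T.astype(np.int_)
res = np.abs(compute(pairs, vectors))   # or fold the abs into the inner loop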
The Numba kernel above should be nearly optimal: it should be memory-bound and able to saturate the memory bandwidth. Note that writing such huge arrays to memory takes some time, as does reading the input array (twice). On x86-64 platforms, a basic implementation has to move about 4.4 GiB of data from/to RAM, so on a mainstream PC with 20 GiB/s of RAM bandwidth this alone takes about 220 ms. In fact, the sparse-matrix timing was not so bad for a sequential implementation.
If this is not enough for you, you can use single-precision floating-point numbers instead of double precision (twice as fast). You could also use a low-level C/C++ implementation to reduce the memory bandwidth used (thanks to non-temporal store instructions, roughly 30% faster). There is not much more to do.
I have a very big sparse matrix A, 7Mi-by-7Mi. I am using Matlab's eigs(A, k) function, which can calculate the first k eigenvalues and eigenvectors.
I need all of its eigenvectors and eigenvalues, but I can't store all of the eigenvectors because that requires too much memory.
Is there any way (in Matlab or Python) to get the eigenvectors one by one in a for loop, i.e. in the i-th iteration I get the i-th eigenvector and eigenvalue?
If you have a good guess of how large the eigenvalue you are looking for is, say lambda_guess, you can use power iteration on
(A - lambda_guess * Id)^-1
This approach is sometimes referred to as the inverse-shift (shift-and-invert) method. It converges to the eigenvalue closest to lambda_guess (and the better your guess, the faster the convergence). Note that you would not store the inverse, but only compute the solution of
x_next_iter = solve(A - lambda_guess * Id, x_iter), possibly itself with an iterative linear solver.
I would combine this with a subspace iteration method using a subspace of size at least two. This way, on your first iteration, you can find the smallest and second smallest eigenvalues lambda1, lambda2.
Then you can try lambda_guess = lambda2 + epsilon, so that the first and second eigenvectors output correspond to the second and third smallest eigenvalues, respectively. (If the first eigenvalue of this iteration is not the same as the value of lambda2 from your previous iteration, you need to make epsilon smaller and repeat; in practice you would test that their difference is small enough, to account for roundoff error and the fact that iterative methods are never exact.) You repeat this until you reach the eigenvalue you are looking for. It's going to be slow, but you will only keep two eigenvectors in memory at any time.
NOTE: we assume all eigenvalues are distinct; otherwise this problem will not have a low-memory solution with the usual techniques. In general, if the maximal multiplicity of an eigenvalue is m, you need m vectors in memory for subspace iteration to converge.
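A minimal sketch of the basic inverse-shift step in Python with scipy.sparse (single-vector power iteration only, not the subspace variant; for a matrix this large the sparse LU factorization may itself be too expensive, in which case lu.solve would be replaced by an iterative solver such as scipy.sparse.linalg.gmres):

import numpy as np
from scipy import sparse
from scipy.sparse.linalg import splu

def inverse_shift_power(A, lambda_guess, num_iters=200, tol=1e-10):
    # Power iteration on (A - lambda_guess*I)^-1: converges to the eigenpair
    # of A whose eigenvalue is closest to lambda_guess.
    n = A.shape[0]
    lu = splu((A - lambda_guess * sparse.identity(n)).tocsc())  # factor once
    x = np.random.rand(n)
    x /= np.linalg.norm(x)
    for _ in range(num_iters):
        y = lu.solve(x)                 # never forms the inverse explicitly
        y /= np.linalg.norm(y)
        if y @ x < 0:                   # fix the sign so convergence can be tested
            y = -y
        if np.linalg.norm(y - x) < tol:
            break
        x = y
    return x @ (A @ x), x               # Rayleigh quotient and eigenvector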
I need to write an algorithm that deals with some square matrices of very low rank (compared to their dimension). I'd like to represent such a matrix as a sum of "products" of a (d, 1)-matrix with a (1, d)-matrix, storing only the list of vectors.
I'd also like left and right matrix multiplication to be carried out by applying the other matrix to the stored vectors: i.e. if $M = \sum_i v_i w_i^T$, then $TM = \sum_i (T v_i) w_i^T$, and similarly for right multiplication.
I've not seen any such thing in scipy, but it would be really useful, since a matrix multiplication then reduces to a handful of matrix-vector multiplications.
Please note that the rank of my matrices is about 20, while their dimension is about 400,000, so this would save a lot of computation time.
Please also note that such matrices are not sparse; they are just low rank and already decomposed into a sum of products of a (d, 1)-matrix with a (1, d)-matrix.
How would you advise doing such a thing? Where can I find references on adding a matrix type to scipy?
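One way to get this behaviour without a new scipy matrix type is to keep the vectors in two dense arrays V and W of shape (d, r) and wrap them in a scipy.sparse.linalg.LinearOperator; this is only a sketch of the idea under that representation, not a full matrix class:

import numpy as np
from scipy.sparse.linalg import LinearOperator

def low_rank_operator(V, W):
    # Represents M = sum_i V[:, i] w_i^T = V @ W.T without ever forming it.
    d = V.shape[0]
    return LinearOperator(
        (d, d),
        matvec=lambda x: V @ (W.T @ x),    # M @ x   in O(d*r) instead of O(d^2)
        rmatvec=lambda x: W @ (V.T @ x),   # M.T @ x
    )

# Left multiplication by another map T only touches the left factors:
# T @ M = (T @ V) @ W.T, so applying T to the r columns of V is enough.

d, r = 400_000, 20
V, W = np.random.randn(d, r), np.random.randn(d, r)
M = low_rank_operator(V, W)
x = np.random.randn(d)
y = M.matvec(x)                            # never builds the d-by-d matrix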
I've got a 2x2 matrix defined by the variables J00, J01, J10, J11 coming in from other inputs. Since the matrix is small, I was able to compute the spectral norm by first computing the trace and determinant
J_T = tf.reduce_sum([J00, J11])
J_ad = tf.reduce_prod([J00, J11])
J_cb = tf.reduce_prod([J01, J10])
J_det = tf.reduce_sum([J_ad, -J_cb])
and then solving the quadratic
L1 = J_T/2.0 + tf.sqrt(J_T**2/4.0 - J_det)
L2 = J_T/2.0 - tf.sqrt(J_T**2/4.0 - J_det)
spectral_norm = tf.maximum(L1, L2)
This works, but it looks rather ugly and it isn't generalizable to larger matrices. Is there cleaner way (maybe a method call that I'm missing) to compute spectral_norm?
The spectral norm of a matrix J equals the largest singular value of the matrix.
Therefore you can use tf.svd() to perform the singular value decomposition, and take the largest singular value:
spectral_norm = tf.svd(J, compute_uv=False)[..., 0]
where J is your matrix.
Notes:
I use compute_uv=False since we are interested only in singular values, not singular vectors.
J does not need to be square.
This solution works also for the case where J has any number of batch dimensions (as long as the two last dimensions are the matrix dimensions).
The ellipsis ... operation works as in NumPy.
I take the 0 index because we are interested only in the largest singular value.
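For the original 2x2 case, a small usage sketch (written against TensorFlow 2, where the call is tf.linalg.svd; J00, J01, J10, J11 are the scalar tensors from the question):

import tensorflow as tf

# Stack the four entries into a 2x2 matrix; this also works if the
# entries carry leading batch dimensions.
J = tf.stack([tf.stack([J00, J01], axis=-1),
              tf.stack([J10, J11], axis=-1)], axis=-2)

# Largest singular value = spectral norm.
spectral_norm = tf.linalg.svd(J, compute_uv=False)[..., 0]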