I am using Python 3.23 and I am want to multiply a sparse VECTOR with a dense MATRIX. The idea of first unfolding the sparse vector into a dense one and then multiplying is of course silly from any standpoint except for mem management until the actual unfolding. It will be more expensive with zeros in there...
Also, does any one know of a good way for SciPy to keep one dimensional matrices in sparse mode? The only one (admittedly) i have used is the classical notation of three vectors (x,y,value), so i have had to use np.ones(len(...)) to get it to work.
Well.. comments welcome!
Store the vector using the Scipy sparse matrix classes:
x = csr_matrix(np.random.rand(1000) > 0.99).T
print x.shape # (1000, 1)
Related
I am trying to get rid of the for loop and instead do an array-matrix multiplication to decrease the processing time when the weights array is very large:
import numpy as np
sequence = [np.random.random(10), np.random.random(10), np.random.random(10)]
weights = np.array([[0.1,0.3,0.6],[0.5,0.2,0.3],[0.1,0.8,0.1]])
Cov_matrix = np.matrix(np.cov(sequence))
results = []
for w in weights:
result = np.matrix(w)*Cov_matrix*np.matrix(w).T
results.append(result.A)
Where:
Cov_matrix is a 3x3 matrix
weights is an array of n lenght with n 1x3 matrices in it.
Is there a way to multiply/map weights to Cov_matrix and bypass the for loop? I am not very familiar with all the numpy functions.
I'd like to reiterate what's already been said in another answer: the np.matrix class has much more disadvantages than advantages these days, and I suggest moving to the use of the np.array class alone. Matrix multiplication of arrays can be easily written using the # operator, so the notation is in most cases as elegant as for the matrix class (and arrays don't have several restrictions that matrices do).
With that out of the way, what you need can be done in terms of a call to np.einsum. We need to contract certain indices of three matrices while keeping one index alone in two matrices. That is, we want to perform w_{ij} * Cov_{jk} * w.T_{ki} with a summation over j, k, giving us an array with i indices. The following call to einsum will do:
res = np.einsum('ij,jk,ik->i', weights, Cov_matrix, weights)
Note that the above will give you a single 1d array, whereas you originally had a list of arrays with shape (1,1). I suspect the above result will even make more sense. Also, note that I omitted the transpose in the second weights argument, and this is why the corresponding summation indices appear as ik rather than ki. This should be marginally faster.
To prove that the above gives the same result:
In [8]: results # original
Out[8]: [array([[0.02803215]]), array([[0.02280609]]), array([[0.0318784]])]
In [9]: res # einsum
Out[9]: array([0.02803215, 0.02280609, 0.0318784 ])
The same can be achieved by working with the weights as a matrix and then looking at the diagonal elements of the result. Namely:
np.diag(weights.dot(Cov_matrix).dot(weights.transpose()))
which gives:
array([0.03553664, 0.02394509, 0.03765553])
This does more calculations than necessary (calculates off-diagonals) so maybe someone will suggest a more efficient method.
Note: I'd suggest slowly moving away from np.matrix and instead work with np.array. It takes a bit of getting used to not being able to do A*b but will pay dividends in the long run. Here is a related discussion.
I'm trying to generate a kernel function for GP using only Matrix operations (no loops).
Vectors where no problem taking advantage of broadcasting
def kernel(A,B):
return 1/np.exp(np.linalg.norm(A-B.T))**2
A and B are both [n,1] vectors, but with [n,m] shaped matrices It just doesn't work. (Also tried reshaping to [1,n,m])
I'm interested on computing an X Matrix where every ij-th element is defined by Ai-Bj.
Now I'm working on Numpy but my final objective is implement this on Tensorflow.
Thanks in Advance.
Suppose I have two dense matrices U (10000x50) and V(50x10000), and one sparse matrix A(10000x10000). Each element in A is either 1 or 0. I hope to find A*(UV), noting that '*' is element-wise multiplication. To solve the problem, Scipy/numpy will calculate a dense matrix UV first. But UV is dense and large (10000x10000) so it's very slow.
Because I only need a few elements of UV indicated by A, it should save a lot of time if only necessary elements are calculated instead of calculating all elements then filtering using A. Is there a way to instruct scipy to do this?
BTW, I used Matlab to solve this problem and Matlab is smart enough to find what I'm trying to do and works efficiently.
Update:
I found Matlab calculated UV fully as scipy does. My scipy installation is simply too slow...
Here's a test script and possible speedup. The basic idea is to use the nonzero coordinates of A to select rows and columns of U and V, and then use einsum to perform a subset of the possible dot products.
import numpy as np
from scipy import sparse
#M,N,d = 10,5,.1
#M,N,d = 1000,50,.1
M,N,d = 5000,50,.01 # about the limit for my memory
A=sparse.rand(M,M,d)
A.data[:] = 1 # a sparse 0,1 array
U=(np.arange(M*N)/(M*N)).reshape(M,N)
V=(np.arange(M*N)/(M*N)).reshape(N,M)
A1=A.multiply(U.dot(V)) # the direct solution
A2=np.einsum('ij,ik,kj->ij',A.A,U,V)
print(np.allclose(A1,A2))
def foo(A,U,V):
# use A to select elements of U and V
A3=A.copy()
U1=U[A.row,:]
V1=V[:,A.col]
A3.data[:]=np.einsum('ij,ji->i',U1,V1)
return A3
A3 = foo(A,U,V)
print(np.allclose(A1,A3.A))
The 3 solutions match. For large arrays, foo is about 2x faster than the direct solution. For small size, the pure einsum is competitive, but bogs down for large arrays.
The use of dot in foo would have computed too many products, ij,jk->ik as opposed to ij,ji->i.
I'm looking for simple sparse vector implementation that can be mapped into memory, similarly to numpy.memmap.
Unfortunately, numpy implementation deals only with full vector. Example usage:
vec = SparseVector('/tmp/file.dat') # SparseVector is the class I'm looking for
vec[10] = 10
vec[50] = 21
for key in vec:
print vec[key] # 10, 21
I foung scipy class representing sparse matrix, however 2 dimensions are clumsy to use as I'd need to make matrix with only one row a then use vec[0,i].
Any suggestions?
Someone else was just asking about 1d sparse vectors, only they wanted to take advantage of the scipy.sparse method of handling duplicate indices.
is there something like coo_matrix but for sparse vectors?
As shown there, a coo_matrix actually consists of 3 numpy arrays, data, row, col. Other formats rearrange the values in other ways, lil for example has 2 nested lists, one for the data, another for the coordinates. dok is a regular dictionary, with (i,j) tuples as keys.
In theory then a sparse vector will require 2 arrays. Or as your example shows it could be a simple dictionary.
So you could implement a mmap sparse vector by using two mmap arrays. As far as I know there isn't a mmap version of the scipy sparse matrices, though it's not something I've looked for.
But what functionality do you want? What dimension? So large that a dense version would not fit in regular memory? Are you doing math with it? Or just data lookup?
I have a matrix B that is square and dense, and a matrix A that is rectangular and sparse.
Is there a way to efficiently compute the product B^-1 * A?
So far, I use (in numpy)
tmp = B.inv()
return tmp * A
which, I believe, makes us of A's sparsity. I was thinking about using the sparse method
numpy.sparse.linalg.spsolve, but this requires B, and not A, to be sparse.
Is there another way to speed things up?
Since the matrix to be inverted is dense, spsolve is not the tool you want. In addition, it is bad numerical practice to calculate the inverse of a matrix and multiply it by another - you are much better off using LU decomposition, which is supported by scipy.
Another point is that unless you are using the matrix class (I think that the ndarray class is better, this is something of a question of taste), you need to use dot instead of the multiplication operator. And if you want to efficiently multiply a sparse matrix by a dense matrix, you need to use the dot method of the sparse matrix. Unfortunately this only works if the first matrix is sparse, so you need to use the trick which Anycorn suggested of taking the transpose to swap the order of operations.
Here is a lazy implementation which doesn't use the LU decomposition, but which should otherwise be efficient:
B_inv = scipy.linalg.inv(B)
C = (A.transpose().dot(B_inv.transpose())).transpose()
Doing it properly with the LU decomposition involves finding a way to efficiently multiply a triangular matrix by a sparse matrix, which currently eludes me.