Sparse matrix factorization with Nimfa is very slow with implicit zeros - python

I am trying to factorize very large matrices with the Python library Nimfa.
Since the matrix is so large I am unable to instantiate it in a dense format in memory, so instead I use scipy.sparse.csr_matrix.
The library has a sparse matrix function called Snmf: Sparse Nonnegative Matrix Factorization (SNMF), which appears to be what I am looking for.
When trying it out I ran into serious performance issues with the factorization (in speed, not in memory representation): I have not yet been able to factor even a simple, sparse 10 x 95 matrix.
This is how I build the test matrix:
import random
from scipy.sparse import lil_matrix, csc_matrix

m1 = lil_matrix((10, 95))
for i in range(10):
    for j in range(95):
        if random.random() > 0.8:
            m1[i, j] = 1
m1 = csc_matrix(m1)
and this is how I run it
from time import time
import numpy
import nimfa

t = time()
fctr = nimfa.mf(m1,
                seed="random_vcol",
                rank=2,
                method="snmf",
                max_iter=15,
                initialize_only=True,
                version='r',
                eta=1.,
                beta=1e-4,
                i_conv=10,
                w_min_change=0)
print(numpy.shape(m1))
a = nimfa.mf_run(fctr)
print(a.coef())
print(a.basis())
print(time() - t)
This doesn't seem to finish at all, but if I do m1.todense() it finishes in seconds. Since I am unable to instantiate my real matrix densely, this is not really a good solution for me.
I have tried different scipy.sparse matrix formats, but to no avail: csc_matrix, csr_matrix and dok_matrix.
Am I using the wrong matrix format? What matrix operations does the snmf algorithm need to execute quickly? Is there some other mistake I am overlooking?

I did some digging, and there seems to be a bug in their sparse implementation. What it is, I don't know, but if you look at line 289 in _spfcnlls, len(f_set) never decreases and the loop runs forever. When the matrix is not sparse, that method is never called. I opened an issue on the GitHub repository here.
In the meantime, is there a factorization function in numpy or scipy that would fit your needs?
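For instance (my suggestion, not something from the original answer): scikit-learn's NMF accepts scipy.sparse input directly, so something like the sketch below might work as a stopgap. Its solver behaves differently from Nimfa's SNMF, and the parameters here are illustrative assumptions only.
from sklearn.decomposition import NMF

# Hedged sketch: factor the sparse test matrix m1 with scikit-learn's NMF,
# which accepts scipy.sparse matrices; rank 2, as in the question.
model = NMF(n_components=2, init='random', max_iter=200)
W = model.fit_transform(m1)  # basis matrix, shape (10, 2)
H = model.components_        # coefficient matrix, shape (2, 95)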

Related

Matrix multiplication in CUDA running out of memory

I am trying to compute a matrix multiplication using this script:
import numpy as np
import math
from timeit import default_timer as timer
from numba import cuda
from numba import *

def mult(a, b):
    return a * b

mult_gpu = cuda.jit(restype=float32, argtypes=[float32, float32], device=True)(mult)

@cuda.jit(argtypes=[float32[:, :], float32[:, :], float32[:, :, :]])
def mult_kernel(a, b, c):
    Ni = c.shape[0]
    Nj = c.shape[1]
    Nk = c.shape[2]
    startX, startY, startZ = cuda.grid(3)
    gridX = cuda.gridDim.x * cuda.blockDim.x
    gridY = cuda.gridDim.y * cuda.blockDim.y
    gridZ = cuda.gridDim.z * cuda.blockDim.z
    for i in range(startX, Ni, gridX):
        for j in range(startY, Nj, gridY):
            for k in range(startZ, Nk, gridZ):
                c[i, j, k] = mult_gpu(a[i, k], b[j, k])

def main():
    A = np.ones((20, 50000), dtype=np.float32)
    B = np.ones((3072, 50000), dtype=np.float32)
    C = np.ones((20, 3072, 50000), dtype=np.float32)
    (Ni, Nj, Nk) = C.shape
    my_gpu = cuda.get_current_device()
    thread_ct = my_gpu.WARP_SIZE
    block_ct_x = int(math.ceil(float(Ni) / thread_ct))
    block_ct_y = int(math.ceil(float(Nj) / thread_ct))
    block_ct_z = int(math.ceil(float(Nk) / thread_ct))
    blockdim = thread_ct, thread_ct, thread_ct
    griddim = block_ct_x, block_ct_y, block_ct_z
    print("Threads per block:", blockdim)
    print("Blocks per grid:", griddim)
    start = timer()
    Cg = cuda.to_device(C)
    mult_kernel[griddim, blockdim](A, B, Cg)
    Cg.to_host()
    dt = timer() - start
    print("Computation done in %f s" % (dt))
    print('C[:3,1,1] = ', C[:3, 1, 1])
    print('C[-3:,1,1] = ', C[-3:, 1, 1])

if __name__ == '__main__':
    main()
Executing this yields an error:
numba.cuda.cudadrv.driver.CudaAPIError: [2] Call to cuMemAlloc results in CUDA_ERROR_OUT_OF_MEMORY
How could I fix this memory problem?
Moreover, even using smaller matrices
A=np.ones((20,500),dtype=np.float32)
B=np.ones((372,500),dtype=np.float32)
C=np.ones((20,372,500),dtype=np.float32)
there is still an error:
numba.cuda.cudadrv.driver.CudaAPIError: [1] Call to cuLaunchKernel results in CUDA_ERROR_INVALID_VALUE
I was inspired by the Mandelbrot example when implementing the computation above.
EDIT1
In order to resolve any confusion, this is actually a 3D matrix by 3D matrix multiplication:
A=np.ones((20,1,50000),dtype=np.float32)
B=np.ones((1,3072,50000),dtype=np.float32)
C=np.ones((20,3072,50000),dtype=np.float32)
I skipped one dimension in A and B because it is not necessary for the computation.
EDIT2
My GPU is:
In [1]: from numba import cuda
In [2]: gpu=cuda.get_current_device()
In [3]: gpu.name
Out[3]: 'GeForce GT 750M'
EDIT3
Based on the memory of my GPU (2 GB), I halved the size of each dimension:
dimx=10
dimy=1536
dimz=25000
A=np.ones((dimx,dimz),dtype=np.float32)
B=np.ones((dimy,dimz),dtype=np.float32)
C=np.ones((dimx,dimy,dimz),dtype=np.float32)
But I still receive the CUDA_ERROR_OUT_OF_MEMORY error. How could one explain this?
The calculation yields a size of about 1.7GB for the 3 matrices:
(10*1536*25000*4.+10*25000*4+1536*25000*4.)/(10**9)=1.6906
Regarding the first problem, you're running out of memory. A major contributor is that this isn't how a matrix-matrix multiply is normally structured. Normally, as you multiply row and column elements together, you keep a running sum, then store that sum in the appropriate location of the product (result) matrix. This allows the c matrix to be much smaller: it need only be 2-dimensional, not 3-dimensional. You may wish to study the linear-algebra definition of matrix-matrix multiply. When you multiply a 2D matrix by a 2D matrix, the result is a 2D matrix, not a 3D matrix.
In a nutshell, something like this:
for i in range(startX, Ni, gridX):
    for j in range(startY, Nj, gridY):
        c[i, j] = 0
        for k in range(startZ, Nk, gridZ):
            c[i, j] = c[i, j] + mult_gpu(a[i, k], b[j, k])
And adjust your c shape accordingly; see the sketch below.
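As a concrete sketch of that idea (my own variant, using modern numba's type-inferring @cuda.jit rather than the question's explicit signatures): launch a 2D grid and let each thread own the full k loop, so no two threads ever write the same c[i, j].
from numba import cuda

@cuda.jit
def matmul_kernel(a, b, c):
    # 2D grid-stride matrix multiply with a per-thread running sum.
    ni = c.shape[0]
    nj = c.shape[1]
    nk = a.shape[1]
    start_x, start_y = cuda.grid(2)
    stride_x = cuda.gridDim.x * cuda.blockDim.x
    stride_y = cuda.gridDim.y * cuda.blockDim.y
    for i in range(start_x, ni, stride_x):
        for j in range(start_y, nj, stride_y):
            acc = 0.0
            for k in range(nk):  # each thread owns the complete k loop
                acc += a[i, k] * b[j, k]
            c[i, j] = acc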
If you actually need the individual products in 3D form as you are doing here, then there is not much I can say except that you will need to scale the problem to fit in the GPU memory size for whatever GPU you are using.
Regarding the second problem, you have a problem here:
thread_ct=my_gpu.WARP_SIZE
...
blockdim=thread_ct,thread_ct,thread_ct
WARP_SIZE is 32 (presumably), so you are asking for a 3D block of dimensions 32*32*32 = 32K threads. But CUDA threadblocks are limited to a maximum of 1024 threads, and that limit applies to the product of the individual dimensions.
If you change your thread_ct to 8, for example:
thread_ct=8
You should be able to get past this particular issue.
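For example (a minimal sketch, reusing Ni, Nj, Nk from the question's main()):
import math

thread_ct = 8  # 8 * 8 * 8 = 512 threads per block, under the 1024 limit
blockdim = (thread_ct, thread_ct, thread_ct)
griddim = tuple(int(math.ceil(float(n) / thread_ct)) for n in (Ni, Nj, Nk))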

Use of scipy sparse in ode solver

I am trying to solve a differential equation system
x' = Ax, with x(0) = f(x),
in Python, where A is indeed a complex sparse matrix.
For now I have been solving the system using the scipy.integrate.complex_ode class in the following way.
def to_solver_function(time, vector):
    sendoff = np.dot(A, np.transpose(vector))
    return sendoff

solver = complex_ode(to_solver_function)
solver.set_initial_value(f(x), 0)
solution = [f(x)]
for time in time_grid:
    next = solver.integrate(time)
    solution.append(next)
This has been working OK, but I need to "tell the solver" that my matrix is sparse. I figured out that I should use
Asparse = sparse.lil_matrix(A)
but how do I change my solver to work with this?
How large and sparse is A?
It looks like A is just a constant in this function:
def to_solver_function(time, vector):
    sendoff = np.dot(A, np.transpose(vector))
    return sendoff
Is vector 1d? Then np.transpose(vector) does nothing.
For calculation purposes you want
Asparse = sparse.csr_matrix(A)
Does np.dot(Asparse, vector) work? np.dot is supposed to be sparse-aware. If not, try Asparse*vector. This probably produces a dense matrix, so you may need (Asparse*vector).A1 to produce a 1d array.
But check the timings: Asparse needs to be quite large and very sparse to perform faster than A in a dot product.
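Putting that together, here is a minimal sketch of the sparse version (assuming the A, f(x) and time_grid from the question; csr_matrix.dot returns a 1d array when vector is 1d):
from scipy.sparse import csr_matrix
from scipy.integrate import complex_ode

Asparse = csr_matrix(A)  # CSR is efficient for matrix-vector products

def to_solver_function(time, vector):
    return Asparse.dot(vector)  # sparse mat-vec: 1d in, 1d out

solver = complex_ode(to_solver_function)
solver.set_initial_value(f(x), 0)
solution = [f(x)]
for time in time_grid:
    solution.append(solver.integrate(time))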

How can I generate a large identity matrix in python without running into "memory full"?

I am using the code below:
import numpy.matlib

n = 40000
numpy.matlib.identity(n)
You can do this with scipy using sparse matrix representation:
import numpy as np
from scipy.sparse import identity

n = 30000
a = np.identity(n)
print(a.nbytes)
b = identity(n)
print(b.data.nbytes)
The difference is huge (quadratic): 7200000000 vs 240000 bytes. np.identity stores all n² float64 entries (30000² × 8 = 7.2 GB), while the sparse identity stores only the n nonzero diagonal entries (30000 × 8 = 240 kB).
You can also try to decrease the size by providing an appropriate dtype, like a = np.identity(n, dtype='int8'), but this will only reduce the size linearly (here by a factor of 8 relative to float64).
In the same way you can do b = identity(n, dtype='int8', format='dia'), which will reduce the size even further, to 30000 bytes.
But the most important thing is: what are you planning to do with this matrix? (I highly doubt you just want to create it.) Some operations do not support sparse matrices, in which case you either have to buy more memory or come up with smart linear-algebra approaches that operate on parts of the matrices, store intermediate results on disk, and merge them back together.

root mean square in numpy and complications of matrix and arrays of numpy

Can anyone direct me to the section of the numpy manual where I can find functions to accomplish root-mean-square calculations?
(I know this can be accomplished using np.mean and np.abs; isn't there a built-in, and if not, why? Just curious, no offense.)
Can anyone explain the complications of matrices vs. arrays, just in the following case: U is a matrix (T-by-N, or you could say T cross N), Ue is another matrix (T-by-N), and I define k as a numpy array (U[ind,:] is still a matrix) in the following fashion:
k = np.array(U[ind,:])
When I print k or type k in ipython, it displays the following:
k = array([[2., 3., ..., 9.]])
You see the double square brackets (which make it multi-dimensional, I guess), which give it the shape (1, N).
But I can't assign it to an array defined in this way, with shape (N,):
l = np.zeros(N)
l[:] = k[:]
error:
matrix dimensions incompatible
Is there a way to accomplish the vector assignment I intend to do? Please don't tell me to do l = k (that defeats the purpose; I get different errors in the program, and I know the reasons; if needed I can attach the piece of code). Writing a loop is the dumb way, which I'm using for the time being.
I hope I was able to explain the problems I'm facing.
Regards.
For the RMS, I think this is the clearest:
from numpy import mean, sqrt, square, arange
a = arange(10) # For example
rms = sqrt(mean(square(a)))
The code reads like you say it: "root-mean-square".
For rms, the fastest expression I have found for small x.size (~ 1024) and real x is:
import numpy as np

def rms(x):
    return np.sqrt(x.dot(x) / x.size)
This seems to be around twice as fast as the linalg.norm version (ipython %timeit on a really old laptop).
If you want complex arrays handled more appropriately then this also would work:
def rms(x):
    return np.sqrt(np.vdot(x, x) / x.size)
However, this version is nearly as slow as the norm version and only works for flat arrays.
For the RMS, how about
norm(V) / sqrt(V.size)
(with norm from numpy.linalg and sqrt from numpy)?
I don't know why it's not built in. I like
def rms(x, axis=None):
    return sqrt(mean(x**2, axis=axis))
If you have NaNs in your data, you can do
def nanrms(x, axis=None):
    return sqrt(nanmean(x**2, axis=axis))
Try this:
U = np.zeros((N,N))
ind = 1
k = np.zeros(N)
k[:] = U[ind,:]
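If U really is a np.matrix, as in the question, a hedged alternative is to flatten the (1, N) row explicitly (the names here mirror the question's):
import numpy as np

U = np.matrix(np.zeros((5, 4)))    # a T-by-N matrix, as in the question
ind = 1
k = np.asarray(U[ind, :]).ravel()  # (1, N) matrix row -> (N,) array

l = np.zeros(U.shape[1])
l[:] = k                           # shapes now match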
I use this for RMS, all using NumPy, and let it also have an optional axis similar to other NumPy functions:
import numpy as np
rms = lambda V, axis=None: np.sqrt(np.mean(np.square(V), axis))
If you have complex vectors and are using PyTorch, the vector norm is the fastest approach on CPU & GPU:
import torch

batch_size, length = 512, 4096
batch = torch.randn(batch_size, length, dtype=torch.complex64)
scale = 1 / length ** 0.5
rms_power = batch.norm(p=2, dim=-1, keepdim=True)
batch_rms = batch / (rms_power * scale)
Using a batch vdot like goodboy's approach is 60% slower than the above. Using a naïve method similar to deprecated's approach is 85% slower than the above.

Python Inverse of a Matrix

How do I get the inverse of a matrix in Python? I've implemented it myself, but it's pure Python, and I suspect there are faster modules out there to do it.
You should have a look at numpy if you do matrix manipulation. This is a module mainly written in C, which will be much faster than programming in pure python. Here is an example of how to invert a matrix, and do other matrix manipulation.
from numpy import matrix
from numpy import linalg

A = matrix([[1, 2, 3], [11, 12, 13], [21, 22, 23]])  # Creates a matrix.
x = matrix([[1], [2], [3]])                          # A column vector.
y = matrix([[1, 2, 3]])                              # A row vector.
print(A.T)                 # Transpose of A.
print(A * x)               # Matrix multiplication of A and x.
print(A.I)                 # Inverse of A.
print(linalg.solve(A, x))  # Solve the linear equation system Ax = b.
You can also have a look at the array module, which is a much more efficient implementation of lists when you have to deal with only one data type.
Make sure you really need to invert the matrix. This is often unnecessary and can be numerically unstable. When most people ask how to invert a matrix, they really want to know how to solve Ax = b, where A is a matrix and x and b are vectors. It's more efficient and more accurate to use code that solves Ax = b for x directly than to calculate A's inverse and then multiply the inverse by b. Even if you need to solve Ax = b for many b values, it's not a good idea to invert A: save the Cholesky factorization of A instead and reuse it, but don't invert it.
See Don't invert that matrix.
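As a small sketch of that advice (my own example; Cholesky requires A to be symmetric positive definite):
import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[4., 1.],
              [1., 3.]])  # symmetric positive definite
factor = cho_factor(A)    # factor once...

# ...then reuse the factorization for each right-hand side,
# never forming A's inverse explicitly.
for b in (np.array([1., 0.]), np.array([0., 1.])):
    x = cho_solve(factor, b)
    print(x)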
It is a pity that the chosen matrix, repeated here again, is singular (each row differs from the previous one by [10, 10, 10], so the rows are linearly dependent):
A = matrix( [[1,2,3],[11,12,13],[21,22,23]])
By definition, the inverse of A, when multiplied by the matrix A itself, must give the identity matrix. The A chosen in the much-praised explanation does not do that. In fact, just looking at the computed inverse gives a clue that the inversion did not work correctly: look at the magnitude of the individual terms; they are very, very big compared with the terms of the original A matrix.
It is remarkable how often humans, when picking an example matrix, manage to pick a singular one!
I did have a problem with the solution, so I looked into it further. On the Ubuntu/Kubuntu platform, the Debian numpy package does not have the matrix and linalg sub-packages, so in addition to importing numpy, scipy needs to be imported as well.
If the diagonal terms of A are multiplied by a large enough factor, say 2, the matrix will most likely cease to be singular or near-singular. So
A = matrix( [[2,2,3],[11,24,13],[21,22,46]])
becomes neither singular nor nearly singular, and the example gives meaningful results. When dealing with floating-point numbers one must be watchful for the effects of unavoidable round-off errors.
For those like me, who were looking for a pure Python solution without pandas or numpy involved, check out the following GitHub project: https://github.com/ThomIves/MatrixInverse.
It generously provides a very good explanation of what the process looks like "behind the scenes". The author has nicely described the step-by-step approach and presented some practical examples, all easy to follow.
This is just a little code snippet from there to illustrate the approach very briefly (AM is the source matrix, IM is the identity matrix of the same size):
def invert_matrix(AM, IM):
    # Gauss-Jordan elimination (without pivoting): reduce AM to the
    # identity while applying the same row operations to IM.
    for fd in range(len(AM)):
        # Scale the focus row so its diagonal element becomes 1.
        fdScaler = 1.0 / AM[fd][fd]
        for j in range(len(AM)):
            AM[fd][j] *= fdScaler
            IM[fd][j] *= fdScaler
        # Eliminate the focus column from every other row.
        for i in list(range(len(AM)))[0:fd] + list(range(len(AM)))[fd+1:]:
            crScaler = AM[i][fd]
            for j in range(len(AM)):
                AM[i][j] = AM[i][j] - crScaler * AM[fd][j]
                IM[i][j] = IM[i][j] - crScaler * IM[fd][j]
    return IM
But please do follow the entire thing; you'll learn a lot more than by just copy-pasting this code! There's a Jupyter notebook as well, by the way.
Hope that helps someone. I personally found it extremely useful for my very particular task (an absorbing Markov chain), where I wasn't able to use any non-standard packages.
You could calculate the determinant of the matrix, which is recursive, and then form the adjugate matrix. Here is a short tutorial.
I think this only works for square matrices (but the inverse is only defined for square matrices anyway).
Another way of computing the inverse involves Gram-Schmidt orthogonalization: factoring A = QR gives an orthogonal matrix Q, and the transpose of an orthogonal matrix is its inverse! So A⁻¹ = R⁻¹Qᵀ; a sketch follows below.
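A minimal sketch of that QR route (my own illustration, using numpy's built-in QR instead of a hand-rolled Gram-Schmidt):
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[2., 2., 3.],
              [11., 24., 13.],
              [21., 22., 46.]])
Q, R = np.linalg.qr(A)            # A = QR: Q orthogonal, R upper triangular
A_inv = solve_triangular(R, Q.T)  # solves R @ A_inv = Q.T, i.e. A_inv = R^-1 Q^T
print(np.allclose(A @ A_inv, np.eye(3)))  # True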
Numpy will be suitable for most people, but you can also do matrices in Sympy
Try running these commands at http://live.sympy.org/
M = Matrix([[1, 3], [-2, 3]])
M
M**-1
For fun, try M**(1/2)
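Just to sanity-check that example offline (my own snippet; the inverse entries were computed by hand from det(M) = 9):
from sympy import Matrix, Rational

M = Matrix([[1, 3], [-2, 3]])
assert M**-1 == Matrix([[Rational(1, 3), Rational(-1, 3)],
                        [Rational(2, 9), Rational(1, 9)]])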
If you hate numpy, get out RPy and your local copy of R, and use it instead.
(I would also echo the advice to make sure you really need to invert the matrix. numpy's linalg.solve and R's solve() function, for example, don't actually do a full inversion, since it is unnecessary.)
