Until now I used numpy.linalg.eigvals to calculate the eigenvalues of quadratic matrices with at least 1000 rows/columns and, for most cases, about a fifth of its entries non-zero (I don't know if that should be considered a sparse matrix). I found another topic indicating that scipy can possibly do a better job.
However, since I have to calculate the eigenvalues for hundreds of thousands of large matrices of increasing size (possibly up to 20000 rows/columns and yes, I need ALL of their eigenvalues), this will always take awfully long. If I can speed things up, even just the tiniest bit, it would most likely be worth the effort.
So my question is: Is there a faster way to calculate the eigenvalues when not restricting myself to python?
#HighPerformanceMark is correct in the comments, in that the algorithms behind numpy (LAPACK and the like) are some of the best, but perhaps not state of the art, numerical algorithms out there for diagonalizing full matrices. However, you can substantially speed things up if you have:
Sparse matrices
If your matrix is sparse, i.e. the number of filled entries is k, is such that k<<N**2 then you should look at scipy.sparse.
Banded matrices
There are numerous algorithms for working with matrices of a specific banded structure.
Check out the solvers in scipy.linalg.solve.banded.
Largest Eigenvalues
Most of the time, you don't really need all of the eigenvalues. In fact, most of the physical information comes from the largest eigenvalues and the rest are simply high frequency oscillations that are only transient. In that case you should look into eigenvalue solutions that quickly converge to those largest eigenvalues/vectors such as the Lanczos algorithm.
An easy way to maybe get a decent speedup with no code changes (especially on a many-core machine) is to link numpy to a faster linear algebra library, like MKL, ACML, or OpenBLAS. If you're associated with an academic institution, the excellent Anaconda python distribution will let you easily link to MKL for free; otherwise, you can shell out $30 (in which case you should try the 30-day trial of the optimizations first) or do it yourself (a mildly annoying process but definitely doable).
I'd definitely try a sparse eigenvalue solver as well, though.
Related
EDIT: Original post too vague. I am looking for an algorithm to solve a large-system, solvable, linear IVP that can handle very small floating point values. Solving for the eigenvectors and eigenvalues is impossible with numpy.linalg.eig() as the returned values are complex and should not be, it does not support numpy.float128 either, and the matrix is not symmetric so numpy.linalg.eigh() won't work. Sympy could do it given an infinite amount of time, but after running it for 5 hours I gave up. scipy.integrate.solve_ivp() works with implicit methods (have tried Radau and BDF), but the output is wildly wrong. Are there any libraries, methods, algorithms, or solutions for working with this many, very small numbers?
Feel free to ignore the rest of this.
I have a 150x150 sparse (~500 nonzero entries of 22500) matrix representing a system of first order, linear differential equations. I'm attempting to find the eigenvalues and eigenvectors of this matrix to construct a function that serves as the analytical solution to the system so that I can just give it a time and it will give me values for each variable. I've used this method in the past for similar 40x40 matrices, and it's much (tens, in some cases hundreds of times) faster than scipy.integrate.solve_ivp() and also makes post model analysis much easier as I can find maximum values and maximum rates of change using scipy.optimize.fmin() or evaluate my function at inf to see where things settle if left long enough.
This time around, however, numpy.linalg.eig() doesn't seem to like my matrix and is giving me complex values, which I know are wrong because I'm modeling a physical system that can't have complex rates of growth or decay (or sinusoidal solutions), much less complex values for its variables. I believe this to be a stiffness or floating point rounding problem where the underlying LAPACK algorithm is unable to handle either the very small values (smallest is ~3e-14, and most nonzero values are of similar scale) or disparity between some values (largest is ~4000, but values greater than 1 only show up a handful of times).
I have seen suggestions for similar users' problems to use sympy to solve for the eigenvalues, but when it hadn't solved my matrix after 5 hours I figured it wasn't a viable solution for my large system. I've also seen suggestions to use numpy.real_if_close() to remove the imaginary portions of the complex values, but I'm not sure this is a good solution either; several eigenvalues from numpy.linalg.eig() are 0, which is a sign of error to me, but additionally almost all the real portions are of the same scale as the imaginary portions (exceedingly small), which makes me question their validity as well. My matrix is real, but unfortunately not symmetric, so numpy.linalg.eigh() is not viable either.
I'm at a point where I may just run scipy.integrate.solve_ivp() for an arbitrarily long time (a few thousand hours) which will probably take a long time to compute, and then use scipy.optimize.curve_fit() to approximate the analytical solutions I want, since I have a good idea of their forms. This isn't ideal as it makes my program much slower, and I'm also not even sure it will work with the stiffness and rounding problems I've encountered with numpy.linalg.eig(); I suspect Radau or BDF would be able to navigate the stiffness, but not the rounding.
Anybody have any ideas? Any other algorithms for finding eigenvalues that could handle this? Can numpy.linalg.eig() work with numpy.float128 instead of numpy.float64 or would even that extra precision not help?
I'm happy to provide additional details upon request. I'm open to changing languages if needed.
As mentioned in the comment chain above the best solution for this is to use a Matrix Exponential, which is a lot simpler (and apparently less error prone) than diagonalizing your system with eigenvectors and eigenvalues.
For my case I used scipy.sparse.linalg.expm() since my system is sparse. It's fast, accurate, and simple. My only complaint is the loss of evaluation at infinity, but it's easy enough to work around.
I'm trying to solve a system of equations that is a 1 Million x 1 Million square matrix and one 1 Million solution vector.
To do this, I'm using np.linalg.solve(matrix, answers) but it's taking a very long time.
Is there a way to speed it up?
Thanks #Chris but that doesn't answer the question since I've also tried using the Scipy module and it still takes a very long time to solve. I don't think my computer can hold that much data in RAM
OK for clarity, I've just found out that the name of the matrix that I'm trying to solve is a Hilbert matrix
Please reconsider the need for solving such a HUGE system unless your system is very sparse.
Indeed, this is barely possible to store the input/output on a PC storage device: the input dense matrix takes 8 TB with double-precision values and the output will certainly also takes few TB not to mention a temporary data storage is needed to compute the result (at least 8 TB for a dense matrix). Sparse matrices can help a lot if your input matrix is almost full of zeros but you need the matrix to contain >99.95% of zeros so to store it in your RAM.
Furthermore, the time complexity of solving a system is O(n m min(n,m)) so O(n^3) in your case (see: this post). This means a several billion billions operations. A basic mainstream processor do not exceed 0.5 TFlops. In fact, my relatively good i5-9600KF reach 0.3 TFlops in the LINPACK computationally intensive benchmark. This means the computation will certainly take a month to compute assuming is is bounded only by the speed of a mainstream processor. Actually, solving a large system of equations is known to be memory bound so it will be much slower in practice because modern RAM are a bottleneck in modern computers (see: memory wall). So for a mainstream PC, this should take from from several months to a year assuming the computation can be done in your RAM which is not possible as said before for a dense system. Since high-end SSD are about an order of magnitude slower than the RAM of a good PC, you should expect the computation to take several years. Not to mention a 20 TB high-end SSD is very expensive and it might be a good idea to consider power outages and OS failure for such a long computational time... Again, sparse matrices can help a lot, but note that solving sparse systems is known to be significantly slower than dense one unless the number of zeros is pretty small.
Such systems are solved on supercomputers (or at least large computing clusters), not regular PCs. This requires to use distributed computing and tools likes MPI and distributed linear solvers. A whole field of research is working on this topic to make them efficient on large scale systems.
Note that computing approximations can be faster, but one should solve the space problem in the first place...
Right now I am using the numpy.linalg.solve to solve my matrix, but the fact that I am using it to solve a 5000*17956 matrix makes it really time consuming. It runs really slow and It have taken me more than an hour to solve. The running time for this is probably O(n^3) for solving matrix equation but I never thought it would be that slow. Is there any way to solve it faster in Python?
My code is something like that, to solve a for the equation BT * UT = BT*B a, where m is the number of test cases (in my case over 5000), B is a data matrix m*17956, and u is 1*m.
C = 0.005 # hyperparameter term for regulization
I = np.identity(17956) # 17956*17956 identity matrix
rhs = np.dot(B.T, U.T) # (17956*m) * (m*1) = 17956*1
lhs = np.dot(B.T, B)+C*I # (17956*m) * (m*17956) = 17956*17956
a = np.linalg.solve(lhs, rhs) # B.T u = B.T B a, solve for a (17956*1)
Update (2 July 2018): The updated question asks about the impact of a regularization term and the type of data in the matrices. In general, this can make a large impact in terms of the datatypes a particular CPU is most optimized for (as a rough rule of thumb, AMD is better with vectorized integer math and Intel is better with vectorized floating point math when all other things are held equal), and the presence of a large number of zero values can allow for the use of sparse matrix libraries. In this particular case though, the changes on the main diagonal (well under 1% of all the values in consideration) will have a negligible impact in terms of runtime.
TLDR;
An hour is reasonable (a cubic regression suggests that this would take around 83 minutes on my machine -- a low-end chromebook).
The pre-processing to generate lhs and rhs account for almost none of that time.
You won't be able to solve that exact problem much faster than with numpy.linalg.solve.
If m is small as you suggest and if B is invertible, you can instead solve the equation U.T=Ba in a minute or less.
If this is part of a larger problem, this costly intermediate step might be able to be simplified away from a mathematical framework.
Performance bottlenecks really should be addressed with profiling to figure out which step is causing the issues.
Since this comes from real-world data, you might be able to get away with fewer features (either directly or through a reduction step like PCA, NMF, or LLE), depending on the end goal.
As mentioned in another answer, if the matrix is sufficiently sparse you can get away with sparse linear algebra routines to great effect (many natural language processing data sources are like this).
Since the output is a 1D vector, I would use np.dot(U, B).T instead of np.dot(B.T, U.T). Transposes are neat that way. This avoids doing the transpose on a big matrix like B, though since you have a cubic operation as the dominant step this doesn't matter much for your problem.
Depending on whether you need the original data anymore and if the matrices involved have any other special properties, you might be able to fiddle with the parameters in scipy.linalg.solve instead for a gain.
I've had mixed success replacing large matrix equations with block matrix equations falling back on numpy routines. That approach typically saves 5-20% over numpy approaches and takes 1% or so off scipy approaches on my system. I haven't fully explored the reason for the discrepancy.
Assuming your matrix is sparse, the scipy.sparse.linalg module will be useful. Here is the documentation for the whole module, and here is the documentation for spsolve.
recently I'm working on a problem which requires
diagonalizing a huge hermitian matrix to get all the eigenvalues.
Currently I'm using Mathematica to do the job.
However it is not applicable due to the limitation of memory
when the matrix size approaches (2^15,2^15), where the diagonalization costs approximately 32 GBs memory.
I've tried using python by importing the matrix from mathematica,
import numpy as np
from scipy.io import mmread
from scipy.sparse import csc_matrix
#importing sparse matrix to save space
h = mmread("h.mtx")
h = csc_matrix(h)
#diagonlizing the dense one
ev = np.linalg.eigvalsh(h.todense())
It works but unfortunately an order of magnitude slower than Mathematica.
So, is there any other possible solutions, say, C++?
I know nothing about C++ so I guess the simplest way may be importing the
matrix to C++ and diagonalizing.
Thanks!
Running some preliminary test using this matrix:
http://math.nist.gov/MatrixMarket/data/NEP/h2plus/qc2534.html
I determined that the conversion to dense does not take up much of the time. The eigenvalue calculation does.
Numpy uses highly-optimized Lapack routines to calculate. These are the same you'd use in C++. Therefore C++ won't give you much of a speedup. If you want a speedup use the sparseness as a property, go to a better computer or switch to a distributed matrix storage(lot's of labor here).
P.S: if you do this for a university project you might want to look around if your university has a cluster of some sort. A cluster node typically has lots of memory. If not, check amazons AWS EC2 or googles compute engine for instances with lot's of ram.
Edit:
Here Wolfram says what Mathematica does behind the scenes: http://reference.wolfram.com/language/tutorial/LinearAlgebraAppendix.html#83486633
Arpack is a (arnoldi)subspace solver, giving you only the highest or lowest k-eigenvalues, ATLAS is just a Lapack implementation and the rest seems to be for solving linear systems.
All methods giving you the full eigenspectrum will require the matrix decomposition of a NxN matrix. If you only want k vectors there are methods which reduce it to a decomposition of a k x k-matrix.
There are modern alternatives to Arpack(http://slepc.upv.es/ or the one that comes with MKL), but they all give you a subspace.
c++ won't help much.
In python you can delegate easily to C++ and a lot of scipy routines will do just that (for performance). I also expect that if you only time the eigen value line you will get similar performance to Matematica and the difference in performance comes from reading the data.
The best solution is to look for a more appropriate algorithm, maybe something that operates on the sparse matrix directly, or decompose the original into smaller matrices and combine them.
To make the original solution more tractable you could try increasing the amount of swap space. In linux it's a dedicated partition, in windows it's a setting. This should allow Matematica/python to use more memory, but it's going to be much slower due to memory trashing. Get an SSD to speed this setup up, but note that it's going to be destroyed faster due to often writes. Or even better buy more RAM.
Using Scikit-learn (v 0.15.2) for non-negative matrix factorization on a large sparse matrix (less than 1% values > 0). I want to find factors by minimizing errors only on non-zero values of the matrix (i.e., do not calculate errors for entries that are zero), and to favor sparsity. I'm not sure if something's wrong with what I'm trying. The scikit-learn package's NMF and ProjectedGradientNMF have worked well for me before. But it seems that when the matrix size increases, the factorization is terribly slow.
I'm talking about matrices with > 10^10 cells. For matrix with ~10^7 cells, I find the executime time to be good.
The parameters I've used are as follows: nmf_model = NMF(n_components = 100, init='nndsvd', random_state=0, tol = 0.01, sparseness='data').
When I tried slightly different parameters (change to init=random), I get the following warning. After the warning, the execution of the script halts.
/lib/python2.7/site-packages/sklearn/decomposition/nmf.py:252: UserWarning: Iteration limit reached in nls subproblem.
warnings.warn("Iteration limit reached in nls subproblem.")
Is there a way to make this faster and solve the above problem? I've tried using numpy sparse matrix (column- and row-sparse), but surprisingly - it's slower on the test I did with a smaller matrix (~10^7 cells).
Considering that one would have to run multiple iterations of such a factorization (to choose an ideal number of factors and k-fold cross validation), a faster way to solve this problem is highly desirable.
I'm also open to suggestions of package/tools that's not based on sklearn or Pyhon. I understand questions about package/tool choices are not encouraged, but for such a specific use-case, knowing what techniques others in the field use would be very helpful.
Maybe a few words on what the initial problem is about could enable us to give better answers.
Matrix Factorization on a very large matrix is always going to be slow due to the nature of the problem.
Suggestions:
Reducing n_components to < 20 will speed it up somewhat. However, the only real improvement in speed will be achieved by limiting the size of the matrix.
With a matrix like you describe, one could think that you are trying to factorize a term frequency matrix. If this is so, you could try to use the vectorization functions in scikit-learn to limit the size of the matrix. Most of them have a max_features parameter. Example:
vectorizer = TfidfVectorizer(
max_features=10000,
ngram_range=(1,2))
tfidf = vectorizer.fit_transform(data)
This will significantly speed up the problem solving.
Should I be completely wrong and this is not a term frequency problem, I would still look into ways to limit the initial matrix you are trying to factorize.
You might want to take a look at this article which discusses more recent techniques on NMF: http://www.cc.gatech.edu/~hpark/papers/nmf_blockpivot.pdf
The idea is to work only on the nonzero entries for factorization which reduces computational time especially when the matrix/matrices involved is/are very sparse.
Also, one of the authors from the same article created NMF implementations on github including the ones mentioned in their article. Here's the link: https://github.com/kimjingu/nonnegfac-python
Hope that helps.
Old question, new answer.
The OP asks for "zero-masked" NMF, where zeros are treated as missing values. This will never be faster than normal NMF. Consider NMF by alternating least squares, here the left-hand side of systems of equations is generally constant (it is simply the tcrossprod of W or H), but in zero-masked NMF it needs to be re-calculated for every single sample or feature.
I've implemented zero-masked NMF in the RcppML R package. You can install it from CRAN and use the nmf function setting the mask_zeros argument to TRUE:
install.packages("RcppML")
A <- rsparsematrix(1000, 1000, 0.1) # simulate random matrix
model <- RcppML::nmf(A, k = 10, mask_zeros = TRUE)
My NMF implementation is faster than scikit-learn without masking zeros, and shouldn't be impossibly slow for 99% sparse matrices.