Solve very large system of linear equations with Numpy

I'm trying to solve a system of equations with a 1 million x 1 million square matrix and a 1-million-entry solution vector.
To do this, I'm using np.linalg.solve(matrix, answers) but it's taking a very long time.
Is there a way to speed it up?
Thanks @Chris, but that doesn't answer the question: I've also tried the SciPy module and it still takes a very long time to solve. I don't think my computer can hold that much data in RAM.
OK, for clarity: I've just found out that the matrix I'm trying to solve is a Hilbert matrix.

Please reconsider the need for solving such a HUGE system unless your system is very sparse.
Indeed, it is barely possible to store the data on a PC storage device: the dense input matrix takes 8 TB with double-precision values, and while the solution vector itself is only 8 MB, temporary storage is needed to compute the result (at least another 8 TB for a dense matrix, e.g. for the LU factors). Sparse matrices can help a lot if your input matrix is almost full of zeros, but the matrix needs to contain >99.95% zeros to fit in your RAM.
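The storage figures can be reproduced with a back-of-the-envelope sketch. It assumes float64 values and, for the sparse case, SciPy-style CSR storage with 4-byte (int32) column indices and row pointers; that index width is an assumption (SciPy switches to int64 indices when the sizes no longer fit in int32).

```python
n = 1_000_000
bytes_per_double = 8

# Dense float64 storage: n * n values.
dense_tb = n * n * bytes_per_double / 1e12

# CSR storage when only 0.05% of the entries are non-zero (>99.95% zeros):
# 8-byte values + 4-byte column indices, plus n + 1 four-byte row pointers.
nnz = n * n * 5 // 10_000
csr_gb = (nnz * (bytes_per_double + 4) + (n + 1) * 4) / 1e9

print(f"dense matrix: {dense_tb:.0f} TB")
print(f"CSR with {nnz:,} non-zeros: {csr_gb:.1f} GB")
```

So even at 99.95% zeros the sparse matrix needs roughly 6 GB, which is about what a typical PC can still hold in RAM.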
Furthermore, the time complexity of solving a system is O(n m min(n,m)), so O(n^3) in your case (see: this post). This means on the order of a billion billion (10^18) operations. A basic mainstream processor does not exceed 0.5 TFlops. In fact, my relatively good i5-9600KF reaches 0.3 TFlops in the computationally intensive LINPACK benchmark. This means the computation will take about a month, assuming it is bounded only by the speed of a mainstream processor. Actually, solving a large system of equations is known to be memory bound, so it will be much slower in practice because RAM is a bottleneck in modern computers (see: memory wall). So on a mainstream PC, this should take from several months to a year, assuming the computation can be done in RAM, which, as said before, is not possible for a dense system. Since high-end SSDs are about an order of magnitude slower than the RAM of a good PC, you should expect the computation to take several years. Not to mention that a 20 TB high-end SSD is very expensive, and that over such a long computation you would have to plan for power outages and OS failures... Again, sparse matrices can help a lot, but note that solving sparse systems is known to be significantly slower than solving dense ones unless the number of non-zeros is pretty small.
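The time estimate can be sketched the same way. This assumes the standard (2/3)n^3 flop count of an LU factorization and a CPU running at the quoted 0.3 TFlops the whole time, i.e. the optimistic compute-bound case; memory-bound behaviour only makes it worse.

```python
n = 1_000_000
flops = 2 * n**3 / 3    # leading-order flop count of an LU factorization
peak = 0.3e12           # 0.3 TFlops, the LINPACK figure quoted above
seconds = flops / peak  # best case: the CPU runs at peak the whole time
days = seconds / 86_400

print(f"{flops:.1e} flops -> at least {days:.0f} days")
```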
Such systems are solved on supercomputers (or at least large computing clusters), not regular PCs. This requires distributed computing, tools like MPI, and distributed linear solvers. A whole field of research is working on making them efficient on large-scale systems.
Note that computing approximations can be faster, but the storage problem has to be solved first...
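One way to attack the storage problem is to never store the matrix and use an iterative solver that only needs matrix-vector products. The sketch below uses SciPy's LinearOperator with conjugate gradient (cg); the operator is a made-up, well-conditioned tridiagonal stand-in computed on the fly, n is kept small, and CG assumes the matrix is symmetric positive definite. For a dense matrix each product still costs O(n^2) operations, so this saves memory, not necessarily time.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

n = 10_000  # small size for illustration


def matvec(x):
    """Apply a hypothetical tridiagonal matrix (3 on the diagonal, -1 next to it)
    to x without ever storing the matrix."""
    x = np.ravel(x)
    y = 3.0 * x
    y[:-1] -= x[1:]  # superdiagonal contribution
    y[1:] -= x[:-1]  # subdiagonal contribution
    return y


A = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
b = np.ones(n)

x, info = cg(A, b)  # info == 0 means the default tolerance was reached
residual = np.linalg.norm(matvec(x) - b) / np.linalg.norm(b)
print(info, residual)
```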

Related

Memory requirement for PCA/kPCA

Is there a way to know exactly how much memory I will need to do PCA/kPCA in Python?
For example, if I have a matrix of N rows and M columns:
What memory will I need if N = 100, 10000, 100000?
And does M have an effect on the memory needed for PCA/kPCA?

Interesting question. From what I can glean from the paper Online principal component analysis in high dimension; which algorithm to choose? by Cardot & Degras (2015), the complexity of PCA depends on the SVD (singular value decomposition) step. Thus, the space complexity of 'batch' (in-memory) PCA is, using your notation, at least O(NM) for the data, plus O(M^2) for the covariance matrix, so O(NM + M^2) in total.
This is all backed up by this tutorial paper by Li et al. (Note that your N is their m and your M is their n.) They give more detail, though, showing that about 3 copies are made of each matrix, and they show how the choice of algorithm matters.
This answer to a question about NumPy's SVD implementation might help you compute the exact memory footprint of the process. Or check out a tool like memory-profiler, which will also give you an exact answer.
In short, the memory required depends on the algorithm but is very roughly 3 × (NM + M^2) values. You can assume that each float in those arrays needs 8 bytes of memory.
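A minimal sketch that turns this rule of thumb into numbers for the N values in the question. The factor of 3 copies and 8-byte floats come from the estimate above; M = 1000 is a hypothetical column count, and real usage depends on the algorithm, so measure with a profiler when it matters.

```python
def pca_memory_bytes(n_rows, n_cols, copies=3, bytes_per_float=8):
    """Very rough batch-PCA memory estimate: copies * (N*M + M^2) floats."""
    return copies * (n_rows * n_cols + n_cols ** 2) * bytes_per_float


M = 1_000  # hypothetical number of columns
for N in (100, 10_000, 100_000):
    gb = pca_memory_bytes(N, M) / 1e9
    print(f"N = {N:>7,}, M = {M}: ~{gb:.2f} GB")
```

Because of the M^2 term, M matters even when N is small: doubling M roughly quadruples the covariance part.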

Fastest way to compute large amount of fixed points in python?

I have a large number of one-dimensional nonlinear fixed-point problems to solve. What is the most efficient numerical solver? I'm currently using scipy.optimize.fixed_point, and it takes around 17 s to run 1000 of my tasks. Thanks for any suggestions.

If these are all 1D, you can take the fixed_point source
(https://github.com/scipy/scipy/blob/v1.5.2/scipy/optimize/minpack.py#L876),
simplify it (you can decide once on the acceleration strategy, with no need for _lazywhere etc.) and compile it with either Cython or Numba.
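A minimal sketch of that idea for a hypothetical scalar problem x = cos(c*x). The loop mirrors the default 'del2' (Steffensen/Aitken) acceleration of scipy.optimize.fixed_point with its defaults xtol=1e-8 and maxiter=500, but without the array machinery. Numba is optional: if it isn't installed, the functions simply run as plain Python. Unlike SciPy, this version returns nan instead of raising RuntimeError when it does not converge.

```python
import math

try:
    from numba import njit  # optional: compiles the functions below if installed
except ImportError:
    def njit(func):  # fallback: leave the functions as plain Python
        return func


@njit
def g(x, c):
    # Hypothetical 1D problem: find x such that x = cos(c * x)
    return math.cos(c * x)


@njit
def fixed_point_del2(x0, c, xtol=1e-8, maxiter=500):
    """Scalar fixed-point iteration with Steffensen (del2) acceleration."""
    p0 = x0
    for _ in range(maxiter):
        p1 = g(p0, c)
        p2 = g(p1, c)
        d = p2 - 2.0 * p1 + p0
        # Aitken's delta-squared step; fall back to p2 when the denominator vanishes
        p = p0 - (p1 - p0) ** 2 / d if d != 0.0 else p2
        relerr = (p - p0) / p0 if p0 != 0.0 else p
        if abs(relerr) < xtol:
            return p
        p0 = p
    return math.nan  # no convergence (SciPy raises RuntimeError instead)


# Solve many independent problems; each call is a tight scalar loop.
cs = (0.5, 1.0, 1.2)
solutions = [fixed_point_del2(1.0, c) for c in cs]
print(solutions)
```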

Efficient way to solve matrix equation in Python

Right now I am using numpy.linalg.solve to solve my matrix equation, but since the matrix is 5000*17956, it is really time consuming. It runs really slowly and has taken more than an hour to solve. The running time for solving a matrix equation is probably O(n^3), but I never thought it would be that slow. Is there any way to solve it faster in Python?
My code is something like this. It solves for a in the equation B.T U.T = B.T B a (with a regularization term C I added to B.T B), where m is the number of test cases (in my case over 5000), B is an m*17956 data matrix, and U is 1*m.
```python
import numpy as np

# B (m x 17956 data matrix) and U (1 x m) are assumed to be defined already
C = 0.005  # hyperparameter for the regularization term
I = np.identity(17956)  # 17956 x 17956 identity matrix
rhs = np.dot(B.T, U.T)  # (17956 x m) * (m x 1) = 17956 x 1
lhs = np.dot(B.T, B) + C * I  # (17956 x m) * (m x 17956) = 17956 x 17956
a = np.linalg.solve(lhs, rhs)  # solve (B.T B + C I) a = B.T u for a (17956 x 1)
```

Update (2 July 2018): The updated question asks about the impact of a regularization term and the type of data in the matrices. In general, this can make a large impact in terms of the datatypes a particular CPU is most optimized for (as a rough rule of thumb, AMD is better with vectorized integer math and Intel is better with vectorized floating point math when all other things are held equal), and the presence of a large number of zero values can allow for the use of sparse matrix libraries. In this particular case though, the changes on the main diagonal (well under 1% of all the values in consideration) will have a negligible impact in terms of runtime.
TL;DR:
An hour is reasonable (a cubic regression suggests that this would take around 83 minutes on my machine, a low-end Chromebook).
The pre-processing to generate lhs and rhs accounts for almost none of that time.
You won't be able to solve that exact problem much faster than with numpy.linalg.solve.
If m is small as you suggest and if B is invertible, you can instead solve the equation U.T=Ba in a minute or less.
If this is part of a larger problem, this costly intermediate step might be simplified away mathematically.
Performance bottlenecks really should be addressed with profiling to figure out which step is causing the issues.
Since this comes from real-world data, you might be able to get away with fewer features (either directly or through a reduction step like PCA, NMF, or LLE), depending on the end goal.
As mentioned in another answer, if the matrix is sufficiently sparse you can get away with sparse linear algebra routines to great effect (many natural language processing data sources are like this).
Since the output is a 1D vector, I would use np.dot(U, B).T instead of np.dot(B.T, U.T). Transposes are neat that way. This avoids doing the transpose on a big matrix like B, though since you have a cubic operation as the dominant step this doesn't matter much for your problem.
Depending on whether you need the original data anymore and if the matrices involved have any other special properties, you might be able to fiddle with the parameters in scipy.linalg.solve instead for a gain.
I've had mixed success replacing large matrix equations with block matrix equations falling back on numpy routines. That approach typically saves 5-20% over numpy approaches and takes 1% or so off scipy approaches on my system. I haven't fully explored the reason for the discrepancy.
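A small sketch comparing a few of these options on random stand-in data (sizes shrunk so it runs instantly). It uses np.dot(U, B).T for the right-hand side, scipy.linalg.solve with assume_a="pos" (valid here because B.T B + C I is symmetric positive definite, so a Cholesky-based solver applies), and, as a swapped-in technique not mentioned above, the push-through identity (B^T B + C I)^{-1} B^T = B^T (B B^T + C I)^{-1}, which replaces the 17956 x 17956 solve with an m x m one. scipy.linalg.solve also has overwrite_a and check_finite parameters if you no longer need the original lhs.

```python
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)
m, n = 50, 200  # stand-ins for m ~ 5000 test cases and 17956 features
B = rng.standard_normal((m, n))
U = rng.standard_normal((1, m))
C = 0.005

rhs = np.dot(U, B).T  # same values as np.dot(B.T, U.T), shape (n, 1)
lhs = np.dot(B.T, B) + C * np.identity(n)

# 1) The original dense n x n solve
a_numpy = np.linalg.solve(lhs, rhs)

# 2) lhs is symmetric positive definite, so a Cholesky-based solve applies
a_pos = scipy.linalg.solve(lhs, rhs, assume_a="pos")

# 3) Push-through identity: only an m x m system is solved
a_dual = B.T @ np.linalg.solve(np.dot(B, B.T) + C * np.identity(m), U.T)
```

When m is much smaller than the number of features, option 3 avoids the cubic cost in 17956 entirely.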

Assuming your matrix is sparse, the scipy.sparse.linalg module will be useful. Here is the documentation for the whole module, and here is the documentation for spsolve.
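A minimal spsolve sketch, assuming a hypothetical tridiagonal system built with scipy.sparse.diags; only about 3n non-zeros are stored instead of n * n entries. spsolve works best with CSC or CSR input.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

n = 100_000
# Hypothetical sparse system: 4 on the diagonal, -1 on the two neighbouring diagonals.
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

x = spsolve(A, b)  # sparse direct solve
print(np.linalg.norm(A @ x - b))
```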

convert a sparse matrix to dense and get the full eigenvalues

Recently I've been working on a problem that requires diagonalizing a huge Hermitian matrix to get all the eigenvalues.
Currently I'm using Mathematica to do the job. However, it is not applicable due to the memory limitation when the matrix size approaches (2^15, 2^15), where the diagonalization needs approximately 32 GB of memory.
I've tried using Python by importing the matrix from Mathematica:
```python
import numpy as np
from scipy.io import mmread
from scipy.sparse import csc_matrix

# import the sparse matrix to save space
h = mmread("h.mtx")
h = csc_matrix(h)
# diagonalize the dense version (eigvalsh: Hermitian input, eigenvalues only)
ev = np.linalg.eigvalsh(h.toarray())
```
It works, but unfortunately it is an order of magnitude slower than Mathematica.
So, are there any other possible solutions, say, C++? I know nothing about C++, so I guess the simplest way may be importing the matrix into C++ and diagonalizing it there.
Thanks!

Running some preliminary tests using this matrix:
http://math.nist.gov/MatrixMarket/data/NEP/h2plus/qc2534.html
I determined that the conversion to dense does not take up much of the time. The eigenvalue calculation does.
NumPy uses highly optimized LAPACK routines for the calculation. These are the same routines you'd use in C++, so C++ won't give you much of a speedup. If you want a speedup, exploit the sparseness, move to a better computer, or switch to distributed matrix storage (lots of labor here).
P.S.: If you are doing this for a university project, you might want to check whether your university has a cluster of some sort. A cluster node typically has lots of memory. If not, check Amazon's AWS EC2 or Google's Compute Engine for instances with lots of RAM.
Edit:
Here Wolfram says what Mathematica does behind the scenes: http://reference.wolfram.com/language/tutorial/LinearAlgebraAppendix.html#83486633
ARPACK is an (Arnoldi) subspace solver, giving you only the highest or lowest k eigenvalues; ATLAS is just a LAPACK implementation, and the rest seem to be for solving linear systems.
All methods giving you the full eigenspectrum will require the decomposition of the N x N matrix. If you only want k vectors, there are methods which reduce it to a decomposition of a k x k matrix.
There are modern alternatives to ARPACK (http://slepc.upv.es/ or the one that comes with MKL), but they all give you a subspace.
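For comparison, SciPy exposes ARPACK through scipy.sparse.linalg.eigsh (for real symmetric or complex Hermitian matrices). This sketch uses a made-up sparse symmetric test matrix and checks the k largest eigenvalues from eigsh against the full dense spectrum from np.linalg.eigvalsh; it only pays off when k is much smaller than N.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

rng = np.random.default_rng(0)
n = 300
# Hypothetical test matrix: diag(0, 1, ..., n-1) plus a small sparse symmetric perturbation
mask = rng.random((n, n)) < 0.02
S = sp.csr_matrix(0.1 * rng.standard_normal((n, n)) * mask)
A = ((S + S.T) * 0.5 + sp.diags(np.arange(n, dtype=float))).tocsr()

k = 5
# ARPACK (implicitly restarted Lanczos): only the k largest eigenvalues
top_k = np.sort(eigsh(A, k=k, which="LA", return_eigenvectors=False))
# LAPACK on the dense matrix: the full spectrum, in ascending order
full = np.linalg.eigvalsh(A.toarray())
print(top_k)
print(full[-k:])
```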

C++ won't help much.
In Python you can easily delegate to C++, and a lot of SciPy routines do just that (for performance). I also expect that if you time only the eigenvalue line, you will get performance similar to Mathematica, and that the difference in performance comes from reading the data.
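A minimal way to check where the time goes is to time the file read and the eigenvalue call separately. In this sketch a small random sparse symmetric matrix, written to a temporary file, stands in for the question's "h.mtx"; the size and density are arbitrary assumptions.

```python
import os
import tempfile
import time

import numpy as np
import scipy.sparse as sp
from scipy.io import mmread, mmwrite

# Hypothetical stand-in for "h.mtx": a random sparse symmetric 500 x 500 matrix
rng = np.random.default_rng(0)
n = 500
S = sp.csr_matrix(rng.standard_normal((n, n)) * (rng.random((n, n)) < 0.01))
H = (S + S.T) * 0.5

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "h.mtx")
    mmwrite(path, H)

    t0 = time.perf_counter()
    h = mmread(path)                      # reading/parsing the file
    t1 = time.perf_counter()
    ev = np.linalg.eigvalsh(h.toarray())  # the eigenvalue computation alone
    t2 = time.perf_counter()

print(f"read: {t1 - t0:.4f} s, eigvalsh: {t2 - t1:.4f} s")
```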
The best solution is to look for a more appropriate algorithm, maybe something that operates on the sparse matrix directly, or decompose the original into smaller matrices and combine them.
To make the original solution more tractable, you could try increasing the amount of swap space. On Linux it's a dedicated partition; on Windows it's a setting. This should allow Mathematica/Python to use more memory, but it's going to be much slower due to memory thrashing. Get an SSD to speed this setup up, but note that it will wear out faster due to frequent writes. Or, even better, buy more RAM.

The fastest way to calculate eigenvalues of large matrices

Until now I have used numpy.linalg.eigvals to calculate the eigenvalues of square matrices with at least 1000 rows/columns and, in most cases, about a fifth of their entries non-zero (I don't know if that should be considered a sparse matrix). I found another topic indicating that SciPy can possibly do a better job.
However, since I have to calculate the eigenvalues for hundreds of thousands of large matrices of increasing size (possibly up to 20000 rows/columns and yes, I need ALL of their eigenvalues), this will always take awfully long. If I can speed things up, even just the tiniest bit, it would most likely be worth the effort.
So my question is: Is there a faster way to calculate the eigenvalues when not restricting myself to python?

@HighPerformanceMark is correct in the comments that the algorithms behind NumPy (LAPACK and the like) are some of the best, though perhaps not state-of-the-art, numerical algorithms out there for diagonalizing full matrices. However, you can speed things up substantially if you have:
Sparse matrices
If your matrix is sparse, i.e. the number of filled entries k is such that k << N**2, then you should look at scipy.sparse.
Banded matrices
There are numerous algorithms for working with matrices of a specific banded structure.
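Check out the banded routines in scipy.linalg, such as scipy.linalg.solve_banded for linear systems and scipy.linalg.eig_banded / eigvals_banded for eigenvalues.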
Largest Eigenvalues
Most of the time, you don't really need all of the eigenvalues. In fact, most of the physical information comes from the largest eigenvalues and the rest are simply high frequency oscillations that are only transient. In that case you should look into eigenvalue solutions that quickly converge to those largest eigenvalues/vectors such as the Lanczos algorithm.
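For the banded case, a minimal sketch with scipy.linalg.eigvals_banded on a hypothetical random symmetric tridiagonal matrix, checked against the dense np.linalg.eigvalsh result. The only tricky part is the upper banded storage layout described in the comments.

```python
import numpy as np
from scipy.linalg import eigvals_banded

rng = np.random.default_rng(0)
n = 1000
diag = rng.standard_normal(n)
off = rng.standard_normal(n - 1)

# Upper banded storage for a symmetric matrix with one super-diagonal:
# row 0 holds the super-diagonal (its first entry is unused), row 1 the diagonal.
a_band = np.zeros((2, n))
a_band[0, 1:] = off
a_band[1, :] = diag
ev_banded = eigvals_banded(a_band)  # all n eigenvalues, ascending

# Same matrix stored densely, for comparison
dense = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
ev_dense = np.linalg.eigvalsh(dense)
```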

An easy way to maybe get a decent speedup with no code changes (especially on a many-core machine) is to link NumPy to a faster linear algebra library, like MKL, ACML, or OpenBLAS. If you're associated with an academic institution, the excellent Anaconda Python distribution will let you easily link to MKL for free; otherwise, you can shell out $30 (in which case you should try the 30-day trial of the optimizations first) or do it yourself (a mildly annoying process but definitely doable).
I'd definitely try a sparse eigenvalue solver as well, though.
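To see which library your NumPy is actually linked against, and to measure whether a switch helped, a quick sketch; the 1000 x 1000 size is arbitrary, and the output format of np.show_config() differs between NumPy versions.

```python
import time

import numpy as np

# Show which BLAS/LAPACK libraries this NumPy build is linked against
np.show_config()

# Time a full symmetric eigenvalue computation; rerun after switching BLAS to compare
n = 1000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
a = a + a.T  # symmetric, so eigvalsh applies
t0 = time.perf_counter()
w = np.linalg.eigvalsh(a)
elapsed = time.perf_counter() - t0
print(f"eigvalsh on a {n} x {n} matrix: {elapsed:.3f} s")
```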