I have been experimenting with MPI in Python through mpi4py, and it is working amazingly well for parallelizing some code I developed. However, I rely heavily on NumPy for matrix manipulation, and I have a question about using NumPy together with MPI.
Take, for instance, the dot function in NumPy. Say I have two huge matrices A and B and I want to compute their matrix product A*B:
numpy.dot(A, B)
I would like to know how to spread this function call over the whole cluster. I can chunk B column-wise into smaller matrices, distribute the pieces across the cluster nodes, compute the partial products there, and then regroup the results (sketched below), but that feels like a clumsy workaround. Is there a better solution?
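For reference, the column-chunking workaround I have in mind looks roughly like this; it is only a sketch using mpi4py's pickle-based (lowercase) communication calls, and the matrix sizes are just placeholders:

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

if rank == 0:
    A = np.random.rand(4000, 3000)
    B = np.random.rand(3000, 2000)
    # split B column-wise, one chunk per rank
    B_chunks = np.array_split(B, size, axis=1)
else:
    A = None
    B_chunks = None

# every rank needs all of A, but only its own slice of B
A = comm.bcast(A, root=0)
B_local = comm.scatter(B_chunks, root=0)

C_local = A.dot(B_local)                 # local partial product

C_chunks = comm.gather(C_local, root=0)  # regroup on the root
if rank == 0:
    C = np.hstack(C_chunks)              # equals numpy.dot(A, B)

The uppercase, buffer-based Bcast/Scatterv/Gatherv calls would avoid the pickling overhead, but this shows the idea.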
For some days I have been trying to compute the nearest positive semi-definite matrix to a very large covariance matrix so that I can sample from it.
I tried MATLAB for this, but the memory usage is insane and it always crashes eventually, with no error message or log file as far as I could find. The function used for the calculation can be found here: https://www.mathworks.com/matlabcentral/fileexchange/42885-nearestspd. Optimizing the function to eliminate intermediate matrices seemed to reduce the memory usage, but it eventually crashes in much the same way.
I found this approach for the calculation, https://stackoverflow.com/a/63131309/18660401, and switched to Python in the hope of finding GPU libraries to accelerate the computation, but I cannot seem to find an up-to-date library that supports computing the eigenvectors the way the NumPy function does. This is the function I am using:
import numpy as np

def get_near_psd(A):
    # symmetrize, then clip negative eigenvalues to zero
    C = (A + A.T) / 2
    eigval, eigvec = np.linalg.eig(C)
    eigval[eigval < 0] = 0
    return eigvec.dot(np.diag(eigval)).dot(eigvec.T)
I am currently trying to run the same function with numba, hoping that the translation to LLVM is enough to make the calculation run in reasonable time; I only modified the version above by adding the @jit decorator from numba.
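For completeness, this is roughly what that numba version looks like; it is only a sketch that assumes numba's nopython mode handles these linalg calls on my matrices:

import numpy as np
from numba import jit

@jit(nopython=True)
def get_near_psd_numba(A):
    # same algorithm as above, just compiled by numba;
    # np.maximum is used instead of boolean-mask assignment for the compiled version
    C = (A + A.T) / 2
    eigval, eigvec = np.linalg.eig(C)
    eigval = np.maximum(eigval, 0.0)
    return eigvec.dot(np.diag(eigval)).dot(eigvec.T)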
There does not seem to be a well-optimized way to do this as far as I can find on my own, so any suggestion to crack this is very much appreciated.
Edit: The matrix is a two-dimensional 60416x60416 covariance matrix, and it will be used to generate new samples from the distribution given by the mean and covariance computed from a set of samples produced with a GAN. For training purposes, samples also need to be generated by randomly sampling that distribution, for which I intend to use numpy's multivariate_normal function (a placeholder sketch of that step is below).
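The sampling step I have in mind is essentially the following; the sizes here are tiny placeholders, in my case d would be 60416 and cov the repaired covariance matrix:

import numpy as np

rng = np.random.default_rng(0)

d, n_samples = 5, 128          # placeholders for illustration
mean = np.zeros(d)
cov = np.eye(d)

# draw n_samples vectors from N(mean, cov); result has shape (n_samples, d)
samples = rng.multivariate_normal(mean, cov, size=n_samples)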
A very up-to-date library that does have these capabilities, including GPU support, is PyTorch; check out the examples for the torch.linalg.eig function and the corresponding accelerated function torch.linalg.eigh for symmetric/Hermitian matrices. You do have to convert the matrices from NumPy arrays to PyTorch tensors first (and convert the result back afterwards), but you can definitely use it in a very similar way.
Of course this library can't just magically give you more memory either, but it is highly optimized.
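A minimal sketch of that conversion, assuming the matrix (and the eigendecomposition workspace) fits on the device you pick, could look like this:

import numpy as np
import torch

def get_near_psd_torch(A_np):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # symmetrize in numpy, then move to a torch tensor on the chosen device
    C = torch.from_numpy((A_np + A_np.T) / 2).to(device)
    # eigh exploits the symmetry and is faster and more stable than eig here
    eigval, eigvec = torch.linalg.eigh(C)
    eigval = torch.clamp(eigval, min=0)         # clip negative eigenvalues
    out = (eigvec * eigval) @ eigvec.T          # eigvec @ diag(eigval) @ eigvec.T
    return out.cpu().numpy()                    # back to numpy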
I have a weighted graph data structure used in a machine learning algorithm that requires frequent alterations (insertions and deletions of both vertices and edges). I am currently using an adjacency matrix implemented as a NumPy 2D array, with entries
G[i, j] = w_ij if (i, j) is an edge, else 0
This works well for graphs with |V| < 1,500 vertices, but beyond that the search, insert and delete operations get really slow.
Since I am using a vectorized optimization of the graph embedding based on the weights, I need NumPy arrays, so plain Python lists are not feasible in this case.
Are there any efficient graph implementations written in Python that I can use for storing and operating on such graphs?
As mentioned in the question, it is very hard to beat the performance of an adjacency list when the graph is sparse. Adjacency matrices will always waste a lot of space on sparse graphs, so you will probably have to find an alternative to using NumPy arrays for every operation.
Some of the possible solutions to your problem may be:
Use an adjacency list structure for the other operations and convert it to a 2D numpy array when necessary (this may not be efficient).
Use a sparse matrix, so you can still do matrix operations without converting back and forth; you can read more about them in this blog post. Note that you will have to replace some of the numpy operations with their scipy.sparse equivalents in your code if you opt for this solution (see the sketch after this list).
Try the NetworkX library, which is one of the best out there for handling graph data structures.
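A minimal sketch of the sparse-matrix route with scipy.sparse (the sizes and edges below are purely illustrative):

import numpy as np
from scipy.sparse import lil_matrix

n = 2000                       # number of vertices
G = lil_matrix((n, n))         # LIL format is cheap to modify

# insert / update an edge weight (kept symmetric for an undirected graph)
G[3, 7] = 0.5
G[7, 3] = 0.5

# delete an edge
G[3, 7] = 0
G[7, 3] = 0

# convert to CSR for fast vectorized arithmetic, e.g. the embedding update
A = G.tocsr()
degrees = np.asarray(A.sum(axis=1)).ravel()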
Recently I have been working on a problem that requires diagonalizing a huge Hermitian matrix to get all the eigenvalues. Currently I am using Mathematica to do the job, but it is no longer usable due to memory limitations once the matrix size approaches (2^15, 2^15), where the diagonalization costs approximately 32 GB of memory.
I have tried Python, importing the matrix from Mathematica:
import numpy as np
from scipy.io import mmread
from scipy.sparse import csc_matrix
# import as a sparse matrix to save space
h = mmread("h.mtx")
h = csc_matrix(h)

# diagonalize the dense version to get all eigenvalues
ev = np.linalg.eigvalsh(h.todense())
It works, but unfortunately it is an order of magnitude slower than Mathematica.
So, are there any other possible solutions, say, in C++? I know nothing about C++, so I guess the simplest way would be to export the matrix to C++ and diagonalize it there.
Thanks!
Running some preliminary tests with this matrix:
http://math.nist.gov/MatrixMarket/data/NEP/h2plus/qc2534.html
I determined that the conversion to dense does not take up much of the time; the eigenvalue calculation does.
NumPy uses highly optimized LAPACK routines for this. They are the same routines you would use from C++, so C++ won't give you much of a speedup. If you want a speedup, exploit the sparsity as a property, move to a machine with more memory, or switch to distributed matrix storage (lots of labor here).
P.S.: if you are doing this for a university project, you might want to check whether your university has a cluster of some sort; cluster nodes typically have lots of memory. If not, check Amazon's AWS EC2 or Google's Compute Engine for instances with plenty of RAM.
Edit:
Here Wolfram describes what Mathematica does behind the scenes: http://reference.wolfram.com/language/tutorial/LinearAlgebraAppendix.html#83486633
ARPACK is an (Arnoldi) subspace solver that gives you only the largest or smallest k eigenvalues, ATLAS is just a LAPACK implementation, and the rest seems to be for solving linear systems.
Any method that gives you the full eigenspectrum requires decomposing the full NxN matrix. If you only want k eigenvectors, there are methods that reduce the problem to the decomposition of a k x k matrix.
There are modern alternatives to ARPACK (http://slepc.upv.es/ or the solver that comes with MKL), but they all give you a subspace.
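If a partial spectrum is enough for you, a rough sketch with SciPy's ARPACK wrapper looks like this (k and which are illustrative choices):

from scipy.io import mmread
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import eigsh

h = csc_matrix(mmread("h.mtx"))

# k smallest-algebraic eigenvalues of the sparse Hermitian matrix,
# without ever forming the dense matrix
vals = eigsh(h, k=50, which="SA", return_eigenvectors=False)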
C++ won't help much.
In Python you can easily delegate to C++, and a lot of scipy routines do just that (for performance). I also expect that if you time only the eigenvalue line you will get performance similar to Mathematica, and that the difference comes from reading the data.
The best solution is to look for a more appropriate algorithm, maybe something that operates on the sparse matrix directly, or to decompose the original matrix into smaller matrices and combine the results.
To make the original solution more tractable you could try increasing the amount of swap space (in Linux it is a dedicated partition, in Windows it is a setting). This should allow Mathematica/Python to use more memory, but it will be much slower due to memory thrashing. An SSD will speed this setup up, but note that it will wear out faster because of the frequent writes. Or, even better, buy more RAM.
I want to create a 2D matrix in Python with an equal number of rows and columns, around 231,000 of each. Most of the cell entries will be zero; some [i][j] entries will be non-zero.
The reason for creating this matrix is to apply SVD and get the [U S V] matrices with a rank of, say, 30.
Can anyone give me an idea of how to implement this with the proper libraries? I tried a pandas DataFrame, but it raises a MemoryError. I have also looked at scipy.sparse matrices, but couldn't figure out how to use them to compute the SVD.
I think this is a duplicate question, but I'll answer it anyway.
There are several libraries in Python aimed at computing partial SVDs of very sparse matrices.
My personal preference is scipy.sparse.linalg.svds, an ARPACK implementation of iterative partial SVD calculation.
You can also try the function sparsesvd.sparsesvd, which uses the SVDLIBC implementation, or scipy.linalg.svd, which uses the LAPACK implementation (but computes the full dense SVD).
To convert your table to a format these algorithms can use, you will need scipy.sparse, which gives you the csc_matrix class.
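A minimal sketch of the svds route; the random matrix here is just a stand-in for your own (row, column, value) entries:

import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

n = 231000
# placeholder data; in practice build the matrix from your own triplets, e.g.
# A = sp.csc_matrix((values, (rows, cols)), shape=(n, n))
A = sp.random(n, n, density=1e-6, format="csc", random_state=0)

# rank-30 truncated SVD (svds returns singular values in ascending order)
U, s, Vt = svds(A, k=30)
print(U.shape, s.shape, Vt.shape)   # (231000, 30) (30,) (30, 231000)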
Use the links above to help you out; there are a lot of resources already here on Stack Overflow and many more on the internet.
I need to run the following Python (NumPy) code:
numpy.linalg.svd(M.dot(M.T))
but M is a dense float64 matrix of shape 100224 x 349800, which obviously does not fit in memory.
I know PyTables is supposed to be able to do out-of-core operations, but I have only found examples of element-wise operations, nothing like a dot product or an SVD.
Is this even possible in Python? And if not, what are my options? (Any programming language is probably fine.)